<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Digital Mishnah</title>
	<atom:link href="http://www.digitalmishnah.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.digitalmishnah.org</link>
	<description>Developing a Digital Edition of the Mishnah</description>
	<lastBuildDate>Sun, 28 Apr 2013 01:04:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>Demo Feedback Page</title>
		<link>http://www.digitalmishnah.org/demo/feedback/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=feedback</link>
		<comments>http://www.digitalmishnah.org/demo/feedback/#comments</comments>
		<pubDate>Sun, 28 Apr 2013 00:40:12 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[demo]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=195</guid>
		<description><![CDATA[Notes on Version 0.0.3 Thank you for visiting the feedback page for the Digital Mishnah Project Demo. Please use the comments area on this page to provide feedback on the project and the demo. Some basic versioning information: For reference purposes I am calling this version 0.0.3. When we can introduce some statistical tools, and customizable output, we can start numbering versions 0.1.0, etc., and we&#8217;ll be moving from demo to alpha development phase. This version of the demo adds some styling changes, changes to the interface where witnesses are accessed for comparison/collation, and one major output change: the ability<div class="readmore"><a href="http://www.digitalmishnah.org/demo/feedback/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<h3>Notes on Version 0.0.3</h3>
<p>Thank you for visiting the feedback page for the Digital Mishnah Project Demo.</p>
<p>Please use the comments area on this page to provide feedback on the project and the demo.</p>
<p>Some basic versioning information:</p>
<p>For reference purposes I am calling this version 0.0.3. When we can introduce some statistical tools, and customizable output, we can start numbering versions 0.1.0, etc., and we&#8217;ll be moving from demo to alpha development phase.</p>
<p>This version of the demo adds some styling changes, changes to the interface where witnesses are accessed for comparison/collation, and one major output change: the ability to view the output of comparison for an individual mishnah or a whole chapter. Version 0.0.2 introduced the ability to browse through witnesses by page, column, or chapter. Version 0.0.1 demonstrated the basic functionalities.</p>
<p>Previous versions of the compare/collate functions of the demo are still available for comparison. (Prior versions of the browse function have been overwritten.)</p>
<ul>
<li>For the immediately prior version (let&#8217;s call it 0.0.2) see <a title="here" href="http://dev.digitalmishnah.org/viewer/text/collate-hl" target="_blank">here</a>. The web version calls the same stylesheets.</li>
<li>Of the earliest version (0.0.1), version <a title="0.0.1B" href="http://dev.digitalmishnah.org/viewer-collatex-1.3/text/demo" target="_blank">0.0.1B </a>has been retained since it uses a somewhat different alignment algorithm, and users may wish to compare results. Allowing users to select and compare the output of different alignment methods is a desideratum.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/demo/feedback/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Version of Demo</title>
		<link>http://www.digitalmishnah.org/uncategorized/new-version-of-demo/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=new-version-of-demo</link>
		<comments>http://www.digitalmishnah.org/uncategorized/new-version-of-demo/#comments</comments>
		<pubDate>Sun, 24 Feb 2013 03:14:51 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=188</guid>
		<description><![CDATA[We have released a new version of the demo. Much of the change is in styling and branding, but there are new texts added, some new views, and a new naming convention. New texts. Gradually, I am replacing the sample files, which contained just Bava Metsi&#8217;a Ch. 2, with transcriptions covering all of tractate Neziqin (the Bavot). Currently, this applies to the Maimonides autograph, Paris BNF Héb. 328-329, and the Naples editio princeps (with the marginalia from the copy in the National Library of Israel). Work is ongoing on other witnesses. Some new Genizah fragments have been added, and, in the<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/new-version-of-demo/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>We have released a new version of the demo. Much of the change is in styling and branding, but there are new texts added, some new views, and a new naming convention.</p>
<p><em><strong>New texts.</strong></em> Gradually, I am replacing the sample files, which contained just <em>Bava Metsi&#8217;a</em> Ch. 2, with transcriptions covering all of tractate Neziqin (the Bavot). Currently, this applies to the Maimonides autograph, Paris BNF Héb. 328-329, and the Naples <em>editio princeps</em> (with the marginalia from the copy in the National Library of Israel). Work is ongoing on other witnesses. Some new Genizah fragments have been added, and, in the next release, I hope to be able to show some samples of virtually joined manuscripts that can be broken out into the individual fragments.</p>
<p><em><strong>New views.</strong></em> Users can now browse through documents page by page or column by column, and they can see witnesses chunked by chapter in a compact view.</p>
<p><em><strong>New naming convention.</strong></em> Sigla for the manuscripts will now be based on the recent <em>Thesaurus of Talmudic Manuscripts</em>. Print editions will be based on serial numbers in similar format. We are experimenting with a convention for sigla that is slightly more informative, so that it will be possible to tell that a given witness includes the Mishnah alone, or a commentary in Hebrew or Arabic, and perhaps other data such as region and date of hand. (This last will require expert typing of the manuscripts.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/new-version-of-demo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Answering the Mail</title>
		<link>http://www.digitalmishnah.org/uncategorized/answering-the-mail/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=answering-the-mail</link>
		<comments>http://www.digitalmishnah.org/uncategorized/answering-the-mail/#comments</comments>
		<pubDate>Wed, 14 Nov 2012 04:04:24 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=183</guid>
		<description><![CDATA[I had promised to respond to comments on the demo, so, at long last, here goes. Request for greater highlighting of collation options (Tim Finney). In fact, CollateX has several alignment methods built into libraries that can be utilized. This is outside of what I feel comfortable talking about (I don&#8217;t really read Java &#8230; yet) but there is no reason we can&#8217;t allow users to select methods and see what yields the best results. Don&#8217;t build unnecessary mechanisms (Desmond Schmidt). Well taken. As a non-programmer, I&#8217;m not always the best judge of what is difficult or simple to build.<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/answering-the-mail/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>I had promised to respond to comments on the demo, so, at long last, here goes.</p>
<ol>
<li>Request for greater highlighting of collation options (Tim Finney). In fact, CollateX has several alignment methods built into libraries that can be utilized. This is outside of what I feel comfortable talking about (I don&#8217;t really read Java &#8230; yet) but there is no reason we can&#8217;t allow users to select methods and see what yields the best results.</li>
<li>Don&#8217;t build unnecessary mechanisms (Desmond Schmidt). Well taken. As a non-programmer, I&#8217;m not always the best judge of what is difficult or simple to build. The point though was to allow manual error-correction of the alignment by adding or deleting cells in a table row. As for the order of witnesses, my own sense is that it is extremely useful for visually examining groupings of manuscripts.</li>
<li>Apparatus unnecessary (Desmond Schmidt), or unwieldy (Daniel Stoekl, Naftali Cohn). Well, Stoekl, a potential user, suggests that the print-type apparatus is useful. It is a way of compactly summarizing data. My include-everything model is in fact unwieldy, and the suggestion to leave out readings that are identical with the base text would simplify the situation. Just how text families can be generated and then used in the apparatus is a discussion for a later day, but it is definitely a desideratum.</li>
<li>Additional textual detail; handling absence of evidence (Daniel Stoekl, Naftali Cohn). These are important points. For collation, I made the decision to present a simplified text, but obviously this will have to be made more complex. I don&#8217;t think additional tagging is necessary in most cases; different processing is. For additions and corrections in a second hand, we effectively generate an additional witness, but ignore the readings of that secondary witness except when they differ from the primary witness. For dealing with highly lacunose texts, the method will be to have a reference text that includes individual addressing for each word in the Mishnah. The tagging in the lacunose text aligns the text and lacunae with the reference text. At a minimum, this allows us to identify &#8220;gaps&#8221; to be ignored and &#8220;gaps&#8221; to be processed. A reference text of the Bavot exists, and I am working on extending it further, but we are still working on the pointing mechanism.</li>
<li>Search functionality (Naftali Cohn). Yes, but what? Ironically, I can envision complex searches (a particular abbreviation in texts in Sephardic hands) more easily than simple searches. What should a search for &#8220;Rabbi Meir&#8221; or &#8220;Prohibited&#8221; return?</li>
<li>Other matters (Naftali Cohn). My December and January task is to start working on page-by-page and chapter-by-chapter views, especially now that my text sample includes extended runs of text. I&#8217;d also like to be able to generate apparatus or alignments for a whole chapter.</li>
</ol>
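<p>The handling of lacunose texts in item 4 can be pictured with a minimal sketch in Python. Everything here is an assumption made for illustration: the address format, the function name <code>classify_gaps</code>, and the three labels are hypothetical, and the project's own tagging and pointing mechanism is not shown. The idea is only that once each word of the reference text has an address, a gap in a witness can be sorted into "no evidence, ignore" or "real absence, process".</p>

```python
def classify_gaps(reference, readings, lacunae):
    """Label every reference address for one witness.

    reference: ordered list of word addresses in the reference text
    readings:  dict mapping an address to the word the witness has there
    lacunae:   list of (first, last) address pairs lost to physical damage
    """
    position = {addr: i for i, addr in enumerate(reference)}

    # Expand each damaged span into the set of addresses it covers.
    damaged = set()
    for first, last in lacunae:
        damaged.update(reference[position[first]:position[last] + 1])

    labels = {}
    for addr in reference:
        if addr in readings:
            labels[addr] = "present"
        elif addr in damaged:
            labels[addr] = "ignore"    # no evidence: the text is lost
        else:
            labels[addr] = "process"   # the witness is intact but lacks the word
    return labels


# Hypothetical addresses: tractate.chapter.mishnah.word
reference = ["BM.2.1.1", "BM.2.1.2", "BM.2.1.3", "BM.2.1.4", "BM.2.1.5"]
readings = {"BM.2.1.1": "a", "BM.2.1.5": "e"}
lacunae = [("BM.2.1.2", "BM.2.1.3")]
labels = classify_gaps(reference, readings, lacunae)
```

<p>In this toy case words 2 and 3 fall in a damaged span and are ignored, while word 4 is missing from intact text and would be reported as a variant.</p>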
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/answering-the-mail/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drowning in Texts</title>
		<link>http://www.digitalmishnah.org/uncategorized/drowning-in-texts/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=drowning-in-texts</link>
		<comments>http://www.digitalmishnah.org/uncategorized/drowning-in-texts/#comments</comments>
		<pubDate>Sun, 21 Oct 2012 03:08:49 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=178</guid>
		<description><![CDATA[The comments on the demo deserve a full response (although the short response is: thank you and, in almost all cases, I agree). However, for this post I want to report on progress in getting and identifying texts for the extended demo. We have made the decision to build out from the sample chapter in Bava Metsi&#8217;a to all of tractate Neziqin (the &#8220;Bavot&#8221;), a 30-chapter and 13-14,000-word base text to work with. Michael Krupp has generously provided transcriptions of 4 orders for three manuscripts (Kaufmann, Parma de Rossi 138, and Cambridge Add. 470.1). The first is now available in<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/drowning-in-texts/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>The comments on the demo deserve a full response (although the short response is: thank you and, in almost all cases, I agree). However, for this post I want to report on progress in getting and identifying texts for the extended demo. We have made the decision to build out from the sample chapter in Bava Metsi&#8217;a to all of tractate Neziqin (the &#8220;Bavot&#8221;), a 30-chapter and 13-14,000-word base text to work with.</p>
<p>Michael Krupp has generously provided transcriptions of 4 orders for three manuscripts (Kaufmann, Parma de Rossi 138, and Cambridge Add. 470.1). The first is now available in an <a href="http://kaufmann.mtak.hu/en/ms50/ms50-coll1.htm">electronic version</a> that is far better than what was available to Krupp when the transcriptions were made. The transcription of the Cambridge ms is presumably based on the edition of it by Lowe in the nineteenth century, and the Cambridge Libraries reported recently that the manuscript itself would be available online. (At least, that&#8217;s what the Genizah Unit said on Facebook on July 4.) So there is room for improving the texts, and there are resources available to do so. This should facilitate making substantial blocks of text available rather quickly. The problem is actually finding the time to encode the texts &#8230;</p>
<p>Meanwhile, with the participation of the Lieberman Institute, under the direction of Shamma Friedman and with the aid of Leor Jacoby, I am gradually filling out the corpus of texts available. I say gradually not because the work on the part of the Institute transcribers is slow, but because our agreement is for transcribers to provide transcriptions, while I see to the conversion to XML.</p>
<p>Those in the &#8220;biz&#8221; know that Yad Izhak Ben-Zvi and the Friedenberg Genizah Project recently published a three volume Thesaurus of Talmudic Manuscripts, edited by Sussman. The detailed information on joins makes it easier to prioritize fragments to transcribe. (It also leaves me feeling &#8220;scooped,&#8221; since my discoveries of joins were in most cases, possibly in all, anticipated by the Thesaurus, which was not yet available when I started working on this project.) On the basis of that catalog, the number of distinct shelfmarks for witnesses (once we include all the fragments of joined manuscripts where one or more fragment has text in the Bavot) runs to 200.</p>
<p>So, aside from wondering about next steps on the application that will drive the edition, I am drowning in texts. Happily, but drowning nonetheless.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/drowning-in-texts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finally, a live demo</title>
		<link>http://www.digitalmishnah.org/uncategorized/live-demo/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=live-demo</link>
		<comments>http://www.digitalmishnah.org/uncategorized/live-demo/#comments</comments>
		<pubDate>Thu, 30 Aug 2012 20:19:01 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=164</guid>
		<description><![CDATA[I am pleased to say that with a lot of work on a lot of people&#8217;s part, there is now a live demo of the Digital Mishnah Project. The demo is just that: a demonstration of possible functionalities. This post will outline some of the features that were always meant to be temporary and some new planned or desired features, and then invite comments. What will be changed The selection of witnesses. Entering numerals is unwieldy. Ideally, users should be able to slide text &#8220;icons&#8221; around (as one does with a pivot table in Excel, for instance) Output in browse functions. A<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/live-demo/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>I am pleased to say that with a lot of work on a lot of people&#8217;s part, there is now a <strong><a title="Digital Mishnah Demo Page" href="http://www.digitalmishnah.org/demo/">live demo of the Digital Mishnah Project</a></strong>. The demo is just that: a demonstration of possible functionalities. This post will outline some of the features that were always meant to be temporary and some new planned or desired features, and then invite comments.</p>
<h3>What will be changed</h3>
<ul>
<li>The selection of witnesses. Entering numerals is unwieldy. Ideally, users should be able to slide text &#8220;icons&#8221; around (as one does with a pivot table in Excel, for instance)</li>
<li>Output in browse functions. A single chapter was used for the demo version. Future versions will allow users to select specific chapters and/or specific ms pages and progress by page or chapter. Metadata should perhaps be hideable.</li>
<li>Output in collate functions. The demo groups output together; these are actually alternative functions.</li>
</ul>
<h3>Additional basic functionalities</h3>
<ul>
<li>Ability to download or print results.</li>
<li>Ability to compare longer texts (whole chapters).</li>
<li>Improved collation&#8211;and/or the ability to select alternative collation methods</li>
</ul>
<h3>Desiderata</h3>
<ul>
<li>Statistical tools, such as multi-dimensional scaling and clustering, to group manuscripts and display results</li>
<li>Since there will inevitably be errors in collation, ability to correct alignment and re-run various operations</li>
<li>Dynamic synoptic view, in which two or more witnesses can be viewed in parallel columns, with the ability to highlight textual differences or other features.</li>
</ul>
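<p>Statistical tools such as multi-dimensional scaling and clustering generally start from a matrix of pairwise distances between witnesses. The following is a minimal sketch, in Python, of one way such a matrix might be derived from an alignment table. The function name, the sigla, and the choice to skip columns where either witness has no token are all assumptions for illustration, not a description of the demo's code; the scaling and clustering steps themselves are not shown.</p>

```python
from itertools import combinations


def disagreement_distances(table):
    """Pairwise distances from an alignment table.

    table: dict mapping a witness siglum to its row of aligned tokens,
           with None where the witness has no token in that column.
    Returns a dict mapping (siglum_a, siglum_b) to the share of columns,
    among those where both witnesses have a token, in which they differ.
    """
    distances = {}
    for a, b in combinations(sorted(table), 2):
        shared = [(x, y) for x, y in zip(table[a], table[b])
                  if x is not None and y is not None]
        differing = sum(1 for x, y in shared if x != y)
        # With no shared columns there is no evidence either way.
        distances[(a, b)] = differing / len(shared) if shared else None
    return distances


# Hypothetical sigla and tokens
table = {
    "A": ["x", "y", "z", None],
    "B": ["x", "q", "z", "w"],
    "C": ["x", "y", None, "w"],
}
distances = disagreement_distances(table)
```

<p>Whether a missing token should count as a disagreement or as absent evidence is a substantive editorial choice; the sketch takes the second option only to keep the example short.</p>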
<h3>Please Comment, Please Help</h3>
<p>Please use the comment function on this post to note errors, queries, and advice. And if you are interested in contributing, please do get in touch.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/live-demo/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Midsummer Update</title>
		<link>http://www.digitalmishnah.org/uncategorized/midsummer-update/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=midsummer-update</link>
		<comments>http://www.digitalmishnah.org/uncategorized/midsummer-update/#comments</comments>
		<pubDate>Tue, 10 Jul 2012 20:46:01 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=141</guid>
		<description><![CDATA[In addition to getting the demo ready to go live&#8211;it&#8217;s ready to go!&#8211;this summer&#8217;s agenda has been to add texts and add reference material. We now have two sets of reference data ready to implement. The heavy lifting for this was done by Atara Siegel, an undergraduate at Stern College, who worked for me for several weeks this summer. Atara prepared the lists, and, for the newly expanded sample text (tractates Bava Qamma, Bava Metsi&#8217;a and Bava Batra) also linked the relevant words in the reference text to the names list. Personal Names. This list is based on the list<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/midsummer-update/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>In addition to getting the demo ready to go live&#8211;it&#8217;s ready to go!&#8211;this summer&#8217;s agenda has been to add texts and add reference material.</p>
<p>We now have two sets of reference data ready to implement. The heavy lifting for this was done by Atara Siegel, an undergraduate at Stern College, who worked for me for several weeks this summer. Atara prepared the lists, and, for the newly expanded sample text (tractates Bava Qamma, Bava Metsi&#8217;a and Bava Batra) also linked the relevant words in the reference text to the names list.</p>
<ul>
<li><em><strong>Personal Names</strong></em>. This list is based on the list of Tannaim in the Mishnah in Albeck, <em>Mavo la-mishnah</em>, cross-referenced with the relevant names from Stemberger-Strack, <em>Introduction to the Talmud and Midrash</em>.</li>
<li><em><strong>Place Names</strong></em>. This list is based on three sources: B-Z Segal, <em>Ha-geografya ba-mishnah</em>, conveniently digitized <a title="here" href="http://www.shechem.org/torah/geomishna/index.html" target="_blank">here</a>, cross-referenced with Tsafrir, et al., <em>Tabula Imperii Romani: Iudaea-Palaestina</em>, and G. Reeg, <em>Die Ortsnamen Israels nach der rabbinischen Literatur</em>. (Note: Map references are given according to the Survey of Israel coordinates; we will have to find alternatives for non-Palestine sites.)</li>
</ul>
<p>In addition, we continue to add to the corpus of texts. The last of the planned witnesses for Bava Metsi&#8217;a Chapter 2 (my initial sample text) will be done by the end of the summer, thanks to Bruce Roth, a graduate student at the Baltimore Hebrew Institute at Towson University, and student transcribers at Catholic University are preparing Genizah fragments. Working with the Lieberman Institute in Israel, I am preparing to have a number of witnesses to all three Bavot. We are starting with the Maimonides autograph and the Paris MS (Bibliothèque nationale de France, Heb 328-329).</p>
<p>I keep holding out hope that the state of the Naples first edition is good enough that one should be able to OCR the text, but my experiments thus far have been disappointing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/midsummer-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Almost Ready for Prime Time</title>
		<link>http://www.digitalmishnah.org/uncategorized/almost-ready-for-prime-time/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=almost-ready-for-prime-time</link>
		<comments>http://www.digitalmishnah.org/uncategorized/almost-ready-for-prime-time/#comments</comments>
		<pubDate>Thu, 24 May 2012 17:44:09 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=136</guid>
		<description><![CDATA[We now have two versions of a demo up and ready to run. Both allow a user to pull data from the witness files, containing manuscript transcriptions, select texts to compare, run the texts through a version of CollateX, then present the results as an alignment table (a &#8220;synopsis&#8221; or &#8220;partitur&#8221; in some text-critical dialects), and as a text with apparatus. The second of these is still buggy (and the cause of both a couple of late nights and the lateness of this post (for which I apologize heartily to the nice people at MITH)), but it does<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/almost-ready-for-prime-time/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>We now have two versions of a demo up and ready to run. Both allow a user to pull data from the witness files, containing manuscript transcriptions, select texts to compare, run the texts through a version of CollateX, then present the results as an alignment table (a &#8220;synopsis&#8221; or &#8220;partitur&#8221; in some text-critical dialects), and as a text with apparatus.</p>
<p>The second of these is still buggy (and the cause of both a couple of late nights and the lateness of this post (for which I apologize heartily to the nice people at MITH)), but it does a couple of additional things:</p>
<ul>
<li><strong><em>Prioritization</em></strong>. While the ability to generate all sorts of different apparatus is a desideratum, at present what we <em><strong>can</strong></em> do is choose the order in which results are presented, and, in the case of presenting a text with apparatus, the first text chosen becomes the base text for comparison.</li>
<li><strong><em>Tokenizing</em></strong>. I am now able to tokenize in two steps. The first step produces &#8220;rich&#8221; tokens that retain data about the individual words (e.g., abbreviations, which should be compared based on their expanded text rather than on the abbreviation as written), as well as other data in the text (page breaks, etc.). From there we can create &#8220;regularized&#8221; tokens. For now I have regularized the tokens by removing all yods and waws. Additional candidates might include dealing with prepositions that are sometimes but not always attached in medieval Mishnah manuscripts (shel, e.g.), final aleph/heh, and final nun/mem. &#8220;Simple&#8221; tokens are passed to CollateX (or, we allow CollateX to process &#8220;rich&#8221; tokens) and the resulting collation output is merged with the rich tokens.</li>
<li><strong><em>Presentation</em></strong>. Because the &#8220;rich&#8221; tokens retain information about the witness, it is possible to generate a &#8220;text-with-apparatus&#8221; in which the base text can be presented with formatting and contextual information that may be useful to the reader. (Disclaimer: Here is a big bug: The XSLT that joins the two lists of tokens inserts the non-words (page breaks etc.) in a position that is offset by one location. Any suggestions?)</li>
</ul>
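<p>The two-step tokenization and the re-join described in the list above can be sketched as follows. This is a Python illustration under stated assumptions, not the project's XSLT: the <code>RichToken</code> fields, the function names, and the <code>"pb"</code> marker for a page break are hypothetical, and the collation step is stood in for by passing the simple tokens straight through rather than calling CollateX. The merge shows one way to keep non-words at their own position, and is not a diagnosis of the offset bug mentioned above.</p>

```python
from dataclasses import dataclass
from typing import Optional

YOD, WAW = "\u05d9", "\u05d5"  # the two letters removed during regularization


@dataclass
class RichToken:
    """Hypothetical rich token: a word with its data, or a non-word marker."""
    written: str                      # text as written in the witness
    expanded: Optional[str] = None    # expansion, if the word is an abbreviation
    kind: str = "word"                # "word", or a non-word such as "pb" (page break)


def regularize(token: RichToken) -> str:
    """Compare abbreviations by their expansion, then drop all yods and waws."""
    text = token.expanded or token.written
    return text.replace(YOD, "").replace(WAW, "")


def simple_tokens(rich: list[RichToken]) -> list[str]:
    """Only words go to the collator; non-words stay behind in the rich list."""
    return [regularize(t) for t in rich if t.kind == "word"]


def merge(rich: list[RichToken], collated: list[str]) -> list[tuple[RichToken, Optional[str]]]:
    """Re-join collation output with the rich tokens.

    Words consume the next collated token; non-words are emitted at their own
    position without consuming one, so they cannot shift by a slot.
    """
    words = iter(collated)
    return [(t, next(words) if t.kind == "word" else None) for t in rich]


# A word, a page break, and an abbreviation with its expansion
rich = [
    RichToken("\u05e9\u05d5\u05e8"),
    RichToken("", kind="pb"),
    RichToken("\u05e8'", expanded="\u05e8\u05d1\u05d9"),
]
simple = simple_tokens(rich)   # stand-in for what would be sent to the collator
merged = merge(rich, simple)
```

<p>Driving the merge from the rich list, rather than from the collation output, means the position of a page break never depends on counting words.</p>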
<p>Next up: modifying the demo to present multi-column synopses, and linking in Talmudic and Commentary citations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/almost-ready-for-prime-time/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Housekeeping</title>
		<link>http://www.digitalmishnah.org/uncategorized/housekeeping/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=housekeeping</link>
		<comments>http://www.digitalmishnah.org/uncategorized/housekeeping/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 17:25:09 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Fragments]]></category>
		<category><![CDATA[Mishnah]]></category>
		<category><![CDATA[Synopsis]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=119</guid>
		<description><![CDATA[This Site I&#8217;ve now updated the &#8220;Examples of Work&#8221; page to include viewable samples. Thanks to Kirsten Keister for setting up the light box format to view the samples. The examples include two samples of work that processes more than one text (collation, synopsis) and a number of examples of manuscripts. The Project I&#8217;ve been working on two issues. One is pointing. I now have a complete set of pointers from the reference file (ref.xml) to the witness files for locating spans of damaged text and page and fragment beginnings and ends for fragmentary texts. Of course, because nothing is<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/housekeeping/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<h3>This Site</h3>
<p>I&#8217;ve now updated the &#8220;Examples of Work&#8221; page to include viewable samples. Thanks to Kirsten Keister for setting up the light box format to view the samples. The examples include two samples of work that processes more than one text (collation, synopsis) and a number of examples of manuscripts.</p>
<h3>The Project</h3>
<p>I&#8217;ve been working on two issues. One is pointing. I now have a complete set of pointers from the reference file (ref.xml) to the witness files for locating spans of damaged text and page and fragment beginnings and ends for fragmentary texts. Of course, because nothing is simple, the direction of all of these will have to be reversed, so that the individual witnesses point into the reference text.<br />
In addition, I&#8217;ve improved the tokenization process, so that I can process &#8220;rich&#8221; tokens, retaining data about the word in question (e.g., that it is an abbreviation, or deleted &#8230;.; hold a regularized spelling as well as the original) as well as simple tokens, and re-join a collation based on simple tokens with the complex tokens.</p>
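<p>The reversal of pointer direction mentioned above amounts to inverting a mapping. A minimal sketch in Python, assuming the pointers can be read as pairs of a witness identifier and a location within that witness; the identifiers, the address format, and the function name are hypothetical and do not reflect the actual structure of ref.xml or the witness files.</p>

```python
from collections import defaultdict


def reverse_pointers(ref_to_witness):
    """Invert pointers so that each witness points into the reference text.

    ref_to_witness: dict mapping a reference address to a list of
                    (witness_id, witness_location) pairs.
    Returns a dict mapping witness_id to {witness_location: reference_address}.
    """
    witness_to_ref = defaultdict(dict)
    for ref_addr, targets in ref_to_witness.items():
        for witness_id, location in targets:
            witness_to_ref[witness_id][location] = ref_addr
    return dict(witness_to_ref)


# Hypothetical addresses and witness identifiers
ref_to_witness = {
    "BM.2.1.1": [("W1", "frag1.line3"), ("W2", "fol2r.line1")],
    "BM.2.1.7": [("W1", "frag1.line5")],
}
by_witness = reverse_pointers(ref_to_witness)
```

<p>After the inversion, each witness carries its own pointers, so a new witness can be added without touching the reference file.</p>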
<h3>Text Geek Heaven</h3>
<p>Along the way, I&#8217;ve discovered some joining Genizah fragments. The coolest by far on a technical, jigsaw-puzzle level is the four-way join between TS AS 78.69, TS AS 78.162, TS AS 78.235 and TS NS 329.286 (Cambridge). The four fragments adjoin yet another, TS E2.71. This will be featured as a <a title="Fragment of the Month" href="http://www.lib.cam.ac.uk/Taylor-Schechter/fotm/">Fragment of the Month</a> of the Taylor-Schechter Genizah Research Unit. Look for it there!<br />
Cool, in that they join material from multiple cities, are:</p>
<ul>
<li>TS E1.99 (Camb), MS heb. 8-11 (Oxf),  and TS F6.3, joining fragments from Cambridge and Oxford <em>and</em>:</li>
<li>TS AS 85.270 (Camb), TS F6.2, and MS R2339, fol. 1 (JTS), joining fragments from Cambridge and New York</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/housekeeping/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Progress, real but in small steps.</title>
		<link>http://www.digitalmishnah.org/uncategorized/progress-real-but-in-small-steps/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=progress-real-but-in-small-steps</link>
		<comments>http://www.digitalmishnah.org/uncategorized/progress-real-but-in-small-steps/#comments</comments>
		<pubDate>Thu, 15 Mar 2012 12:41:33 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Fragments]]></category>
		<category><![CDATA[Manuscripts]]></category>
		<category><![CDATA[Mishnah]]></category>
		<category><![CDATA[Progress]]></category>

		<guid isPermaLink="false">http://www.digitalmishnah.org/?p=73</guid>
		<description><![CDATA[[Originally published on March 11, 2012 at http://blog.umd.edu/digitalmishnah] I had been holding out for my next post for a new Digital Mishnah website, courtesy of MITH, and a new collation demo hosted on it, but, that will be for my next post, deo volente. Since my last confession, I have: Submitted a paper that details methods and progress to date. It&#8217;s for a Festschrift, and I&#8217;ve been asked not to state the venue openly, but can share a draft. Thought a lot about (and only partly understand) multivariate statistics. Completed the first round of markup for all the Genizah fragments<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/progress-real-but-in-small-steps/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>[Originally published on March 11, 2012 at http://blog.umd.edu/digitalmishnah]</p>
<p>I had been holding out for my next post for a new Digital Mishnah website, courtesy of MITH, and a new collation demo hosted on it, but, that will be for my next post, <em>deo volente</em>.</p>
<p>Since my last confession, I have:</p>
<ul>
<li>Submitted a paper that details methods and progress to date. It&#8217;s for a Festschrift, and I&#8217;ve been asked not to state the venue openly, but can share a draft.</li>
<li>Thought a lot about (and only partly understand) multivariate statistics.</li>
<li>Completed the first round of markup for all the Genizah fragments for my sample chapter. A second round of markup linking the fragments to the reference text needs to be done (next bullet). Formatted versions of these texts will be viewable</li>
<li>Started rethinking how to handle the encoding of highly fragmentary texts. In particular, I&#8217;ve found four pieces of a single sheet of text in two different locations in the Taylor-Schechter collection (TS AS 78.69 + TS AS 78.162 + TS AS 78.235 + TS NS 329.286; the sheet adjoins another single sheet from a third box, TS E2.71). For the present, we are encoding each fragment as a document, and recording the extent of the lacunae at the edges of the fragment as fitting within the smallest properly oriented rectangle that encloses the fragment. What is still needed is a pointer scheme that points into the reference text.</li>
<li>Identified the next fragments to work on to expand the work to Tractate <em>Neziqin</em> (aka the <em>Bavot</em>), and started to recruit people to work on it.</li>
</ul>
<p>Next up: completing the fragmentary texts, encoding the remaining Mishnah texts in the Babylonian Talmud mss., and learning some Java.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/progress-real-but-in-small-steps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thinking about the end product</title>
		<link>http://www.digitalmishnah.org/uncategorized/thinking-about-the-end-product/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=thinking-about-the-end-product</link>
		<comments>http://www.digitalmishnah.org/uncategorized/thinking-about-the-end-product/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 20:44:48 +0000</pubDate>
		<dc:creator>Hayim Lapin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[collatex]]></category>
		<category><![CDATA[Mishnah]]></category>
		<category><![CDATA[phylogenetic software]]></category>
		<category><![CDATA[statistical clustering]]></category>

		<guid isPermaLink="false">http://blog.umd.edu/digitalmishnah/?p=28</guid>
		<description><![CDATA[Since my last post, I have been working on a grant application. This has afforded the opportunity of some stock taking. I&#8217;ve also had some very helpful conversations with scholars in the field: Juan Garcés and Matt Munson in Hebrew Biblical Studies, Tim Finney in New Testament and Desmond Schmidt in textual computing and classics. 1. Collation. Based on very simple normalization and tokenization and a few samples, CollateX will remain error prone, unless the algorithm changes significantly. Examples: (1) In a Mishnah section with repeated words, slight differences in spelling resulted in pushing a whole clause off to the<div class="readmore"><a href="http://www.digitalmishnah.org/uncategorized/thinking-about-the-end-product/">Read More +</a></div>]]></description>
			<content:encoded><![CDATA[<p>Since my last post, I have been working on a grant application. This has afforded an opportunity for some stocktaking. I&#8217;ve also had some very helpful conversations with scholars in the field: Juan Garcés and Matt Munson in Hebrew Biblical Studies, Tim Finney in New Testament, and Desmond Schmidt in textual computing and classics.</p>
<p>1. <em>Collation</em>. With only very simple normalization and tokenization, and judging from a few samples, CollateX will remain error-prone unless the algorithm changes significantly. Examples: (1) In a Mishnah section with repeated words, slight differences in spelling pushed a whole clause off to the second match. (2) In another passage, CollateX failed to detect a missing clause in the text and aligned non-matching tokens. My estimate is that the error rate is currently above 10% (for one passage it was about 15%). Better normalization will improve this result. This raises the question of whether the normalization (or, which may amount to the same thing, having CollateX ignore certain characters in comparison) can be carried out automatically, and what this would look like, or whether, as Desmond Schmidt assures me, the whole enterprise is wrongheaded.</p>
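<p>To make the normalization question concrete, here is a minimal sketch in Python. It is not part of CollateX and is not the project's method: the function name <code>normalize</code> and the rule of ignoring every waw and yod are assumptions made purely for illustration, and a real rule set would have to be much more careful. The idea is only that tokens are compared on a derived key while the witness spelling is left untouched.</p>

```python
import unicodedata

# Assumed rule set (illustrative only): treat waw and yod as optional
# vowel letters and ignore them when comparing tokens.
VOWEL_LETTERS = {"\u05D5", "\u05D9"}  # waw, yod


def normalize(token, drop_vowel_letters=True):
    """Return a comparison key for a token; the witness spelling itself is unchanged."""
    # Decompose so that vowel points and accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    # Drop the combining marks, keeping base letters only.
    letters = [ch for ch in decomposed if not unicodedata.combining(ch)]
    if drop_vowel_letters:
        letters = [ch for ch in letters if ch not in VOWEL_LETTERS]
    return "".join(letters)


# Plene and defective spellings of one word
# (shin-lamed-waw-mem and shin-lamed-mem), written as escapes.
plene = "\u05E9\u05DC\u05D5\u05DD"
defective = "\u05E9\u05DC\u05DD"
same_key = normalize(plene) == normalize(defective)  # True
```

<p>A rule this blunt will also merge words that really are different, which is one reason the choice of rules probably cannot be left entirely to the machine.</p>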
<p>2. <em>Statistical measures,</em> now done by hand, but ideally automated. I have now invested in a license for SPSS. This and my old friend Excel have allowed me to run some preliminary analyses. First, run collations on every Mishnah section in my sample chapter using a few representative witnesses. Transfer the output to Excel and manually fix the alignment (remember, high error rate). Then start flagging variations. I have opted for a method akin to what Schmidt and Tim Finney have used: in effect, create a master document with all possible readings, and use a binary encoding (1, 0) to record whether each reading appears in a given witness. (Since the text is already tokenized, I used individual tokens, aka words, not characters, for estimating distance.) Then use SPSS to generate a distance matrix, multi-dimensional scaling (MDS), and clustering. I have also experimented with sites providing a graphic interface to bioinformatics software (FastME and Phylip) to produce phylogenetic trees.</p>
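<p>The binary encoding and the distance step can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure: the distance used here is simply the share of readings on which two witnesses disagree (a normalized Hamming distance), which is an assumption on my part, and the witness labels and 1/0 values are invented.</p>

```python
from itertools import combinations


def distance(a, b):
    """Share of readings on which two witnesses disagree (normalized Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("witness vectors must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(witnesses):
    """Build a symmetric matrix of pairwise distances as nested dictionaries."""
    names = sorted(witnesses)
    matrix = {name: {name: 0.0} for name in names}
    for first, second in combinations(names, 2):
        d = distance(witnesses[first], witnesses[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


# Invented data: one row per witness, one column per reading in the
# master document; 1 means the witness has the reading, 0 means it does not.
witnesses = {
    "A": [1, 0, 1, 1],
    "B": [1, 0, 0, 1],
    "C": [0, 1, 0, 0],
}
matrix = distance_matrix(witnesses)
# A and B disagree on one reading in four, A and C on all four.
```

<p>A matrix of this kind is the sort of input that MDS, clustering, and distance-based tree programs start from.</p>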
<p>The results were interesting enough that I wanted to see what would happen with more careful identification of variants (I&#8217;m doing these by hand, after all) and more witnesses. I used the sections with the fullest representation among witnesses (Chapter 2, Mishnah 1-2), choosing a total of 10 witnesses. The results I got were consistent with those from the larger text sample with fewer witnesses, but neither represented the accepted wisdom on the relationship between manuscripts. I therefore divided the cases into no variation, substantive (different word, different gender, change in grammatical form), and orthographic (initial waw, <em>matres lectionis</em>, spacing between preposition and word). As an example, the Greek word <em>emporia</em> generated no fewer than six variant spellings, but all represented a recognizable version of the word: orthographic, not substantive variation.</p>
<div id="attachment_63" class="wp-caption alignleft" style="width: 160px"><a href="http://mith-dev.umd.edu/mishnah/wp-content/uploads/OutputOrth13.jpg"><img class="size-thumbnail wp-image-63" src="http://mith-dev.umd.edu/mishnah/wp-content/uploads/OutputOrth13-150x150.jpg" alt="" width="150" height="150" /></a><p class="wp-caption-text">MDS for Orthographic Differences, 10 Witnesses</p></div>
<div id="attachment_33" class="wp-caption alignright" style="width: 160px"><a href="http://mith-dev.umd.edu/mishnah/wp-content/uploads/OutputSubst1.jpg"><img class="size-thumbnail wp-image-33" src="http://mith-dev.umd.edu/mishnah/wp-content/uploads/OutputSubst1-150x150.jpg" alt="MDS for Substantive Differences" width="150" height="150" /></a><p class="wp-caption-text">MDS for Substantive Differences, 10 Witnesses</p></div>
<p class="size-medium wp-image-34 ">Now, there were some interesting results: the manuscripts thought to be of the &#8220;Palestinian type&#8221; clustered closely on substantive differences, considerably less so (and differently) on orthographic differences.</p>
<p>The lesson: Orthographic and substantive variations do not coincide, probably due to scribal decision-making (and inconsistency). Substantive differences seem to be better for grouping text families. (This may be easier to identify automatically as well: normalizing orthography to improve collation erases orthographic difference (by definition), while retaining non-orthographic difference.) But linguistic and orthographic differences are of research significance too. We may need a way for the user to flag readings to be compared.</p>
<div id="attachment_37" class="wp-caption alignright" style="width: 100px"><a href="http://mith-dev.umd.edu/mishnah/wp-content/uploads/SusbtDistanceforPhylogRootedTree.jpg"><img class="size-thumbnail wp-image-37 " src="http://mith-dev.umd.edu/mishnah/wp-content/uploads/SusbtDistanceforPhylogRootedTree-150x150.jpg" alt="" width="90" height="90" /></a><p class="wp-caption-text">Rooted Tree (Phylip) for Substantive Differences, 10 Witnesses</p></div>
<div id="attachment_54" class="wp-caption alignleft" style="width: 160px"><a href="http://mith-dev.umd.edu/mishnah/wp-content/uploads/SusbtDistanceforPhylogUnRootedTree.jpg"><img class="size-thumbnail wp-image-54" src="http://mith-dev.umd.edu/mishnah/wp-content/uploads/SusbtDistanceforPhylogUnRootedTree-150x150.jpg" alt="" width="150" height="150" /></a><p class="wp-caption-text">Unrooted Phylogenetic Tree, 10 Witnesses</p></div>
<p>As for visualization, we are not yet ready for phylogenetic stemmata, certainly not of the rooted type. The underlying assumptions of a steady evolutionary clock and of no contamination make the results interesting from a heuristic point of view, but unreliable in fact. We might think of an unrooted tree as a way of imagining the MDS space with links showing connections. The phylogenetic links in my examples are identical in the rooted and unrooted trees. For grouping families, the unrooted tree makes more intuitive sense of the data (closer MSS appear closer), but both trees make the various close relations (the so-called &#8220;Palestinian tradition&#8221;) into the ancestors or early descendants of distinct traditions. That would require more work to establish; more generally, a phylogenetic scheme will require a model better suited to the data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.digitalmishnah.org/uncategorized/thinking-about-the-end-product/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
