Digital Mishnah Demo Page

The demo for the Digital Mishnah project is now live (for the old “Examples of Work” page see here). There are in fact two versions of the demo, each using a different version of CollateX to perform the alignment:

  • Version A (based on the version of CollateX available in October 2011)
  • Version B (based on a newer version, numbered 1.3)

For genizah fragments, the entire fragment is represented. For other, longer manuscripts only Chapter 2 of Mishnah Bava Metsia is available.

What you will see

  1. Landing page. Select “Browse” to view texts; “Collate” to compare texts.
  2. “Browse.” At present only two formats are available: page format, which approximates the layout of the original witness or fragment, and TEI/XML, which shows the underlying encoding.
  3. “Collate.” Select the specific Mishnah from the drop-down menu, then select the witnesses and set their order of presentation using numerals. The witness with the lowest number will appear in the top line of the alignment table and will be the base text in the “Text and Apparatus” presentation. Output consists of:
  • Alignment table (“Partitur”; synopsis)
  • Text (first text selected) and apparatus (order of witnesses based on selection order)
  • Parallel column synopsis (order of witnesses based on selection order)

What’s going on

The demo is based on Cocoon pipelines that take Mishnah texts encoded in TEI XML and manipulate them using XSLT. The “Browse” mode flattens the hierarchy of the original XML, which is organized by tractates and chapters, and rebuilds the texts by manuscript page. Later versions will show one page at a time and will allow moving forward or backward by one page or chapter. And, of course, money and rights permitting, they will show each page opposite an image of the manuscript.
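To make the regrouping idea concrete, here is a minimal sketch in Python rather than the XSLT the demo actually uses. It assumes a simplified, namespace-free encoding in which words are `w` elements and page breaks are empty `pb` milestones carrying an `n` attribute; the sample document and the function name `group_by_page` are invented for illustration and do not reproduce the project's schema or stylesheets.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input: chapters are containers, page breaks are
# empty <pb/> milestones that can fall anywhere, even mid-chapter.
SAMPLE = """<text>
  <div type="chapter" n="1"><pb n="1r"/><w>a</w><w>b</w></div>
  <div type="chapter" n="2"><w>c</w><pb n="1v"/><w>d</w></div>
</text>"""


def group_by_page(root):
    """Walk the tree in document order and regroup words by the last <pb/> seen."""
    pages = {}
    current = None
    for element in root.iter():  # iter() yields elements in document order
        if element.tag == "pb":
            current = element.get("n")
            pages[current] = []
        elif element.tag == "w" and current is not None:
            pages[current].append(element.text)
    return pages


pages = group_by_page(ET.fromstring(SAMPLE))
for page, words in pages.items():
    print(page, words)
```

In this sketch the chapter boundaries simply drop out, because the words are collected under whichever page break precedes them. A real TEI file would also need namespace handling, which is left out here.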

The “Collate” mode extracts the text and tokenizes it (read: breaks it up into word units), while retaining some structural data. The tokens are heavily regularized and stripped down (all waws and yods are removed, and words and prepositions that can be written either attached or unattached get special handling) before being passed to the collator. The collator's output is then re-merged with the original tokens, and the re-merged data is used to build the output.
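The tokenize, regularize, collate, and re-merge steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the demo's code: Python's standard `difflib.SequenceMatcher` stands in for CollateX, tokenization is plain whitespace splitting, the special handling of attached and unattached prepositions is omitted, and the names `regularize`, `tokenize`, and `align` are invented here.

```python
from difflib import SequenceMatcher

WAW, YOD = "\u05D5", "\u05D9"  # Hebrew letters waw and yod


def regularize(token):
    """Strip every waw and yod so plene and defective spellings compare equal."""
    return token.replace(WAW, "").replace(YOD, "")


def tokenize(text):
    """Split on whitespace, keeping each original token next to its regularized form."""
    return [{"orig": word, "norm": regularize(word)} for word in text.split()]


def align(base, other):
    """Align on regularized forms, then report the original tokens (the re-merge step)."""
    matcher = SequenceMatcher(
        a=[t["norm"] for t in base],
        b=[t["norm"] for t in other],
        autojunk=False,
    )
    rows = []
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left = [t["orig"] for t in base[i1:i2]]
        right = [t["orig"] for t in other[j1:j2]]
        # Pad the shorter side with None so every row has one cell per witness.
        for k in range(max(len(left), len(right))):
            rows.append((
                left[k] if k < len(left) else None,
                right[k] if k < len(right) else None,
            ))
    return rows


witness_a = tokenize("שנים אוחזין בטלית")
witness_b = tokenize("שנים אחזין בטלית")
rows = align(witness_a, witness_b)
```

Because the comparison runs on the regularized forms, the spellings אוחזין and אחזין land in the same row, while the table still shows each witness's own spelling. That is the point of keeping the original token alongside its regularized form.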

The collation is done using CollateX, based on a Cocoon implementation by Gregor Middell. All of the programming related to building the application was done by Travis Brown. The TEI schema customization was done by Trevor Muñoz and subsequently modified by Hayim Lapin. The XSLT was written by Hayim with significant input from Travis.

Please use the blog page to comment on the demo, note errors, and suggest desiderata.