26 Apr 2012

This Site

I’ve now updated the “Examples of Work” page to include viewable samples. Thanks to Kirsten Keister for setting up the lightbox format for viewing the samples. The examples include two samples of work that process more than one text (collation, synopsis) and a number of examples of manuscripts.

The Project

I’ve been working on two issues. One is pointing. I now have a complete set of pointers from the reference file (ref.xml) to the witness files for locating spans of damaged text and page and fragment beginnings and ends for fragmentary texts. Of course, because nothing is simple, the direction of all of these will have to be reversed, so that the individual witnesses point into the reference text.
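The reversal described above can be sketched as a simple inversion of a pointer table. This is not the project’s actual code; the anchor IDs, witness names, and file references below are all illustrative placeholders, and the real pointers live in TEI/XML rather than Python dictionaries.

```python
# Hypothetical sketch: reversing the direction of a pointer table,
# so that witnesses point into the reference text instead of the
# reference text pointing out to the witnesses.

def invert_pointers(ref_pointers):
    """Turn ref->witness pointers into witness->ref pointers.

    ref_pointers maps an anchor in the reference text to a list of
    (witness, locus) targets; the inverse maps each (witness, locus)
    back to the anchors in the reference text that point at it.
    """
    inverted = {}
    for ref_anchor, targets in ref_pointers.items():
        for witness, locus in targets:
            inverted.setdefault((witness, locus), []).append(ref_anchor)
    return inverted

# Example: a damage span and a page beginning, keyed by anchors in ref.xml
# (the IDs are invented for illustration)
ref_pointers = {
    "ref.xml#tok-101": [("TS-E1.99", "damage-start-1")],
    "ref.xml#tok-150": [("TS-E1.99", "damage-end-1"),
                        ("TS-F6.3", "page-2-start")],
}

witness_pointers = invert_pointers(ref_pointers)
```

The point of the inversion is that each witness file can then carry its own links into the reference text, rather than the reference file having to know about every witness.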
In addition, I’ve improved the tokenization process, so that I can process “rich” tokens, which retain data about the word in question (e.g., that it is an abbreviation or a deletion, or that it holds a regularized spelling alongside the original), as well as simple tokens, and then re-join a collation based on the simple tokens with the complex tokens.
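The re-joining step can be sketched as follows: align two witnesses on their regularized (simple) forms, then substitute the rich tokens back into the alignment. This is only a sketch, not the project’s actual pipeline; the token fields (`orig`, `reg`, `flags`) and the sample words are invented, and a real collation engine would replace the `difflib` alignment used here.

```python
import difflib

def align_rich(a_tokens, b_tokens):
    """Align two witnesses on their regularized spellings (simple tokens),
    then re-join the alignment with the original rich tokens.

    Each token is a dict with illustrative fields:
      orig  - the spelling as written in the witness
      reg   - the regularized spelling used for collation
      flags - metadata such as 'abbrev' or 'deleted'
    Returns a list of (a_token-or-None, b_token-or-None) pairs.
    """
    a_simple = [t["reg"] for t in a_tokens]
    b_simple = [t["reg"] for t in b_tokens]
    sm = difflib.SequenceMatcher(a=a_simple, b=b_simple, autojunk=False)
    rows = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            rows.extend(zip(a_tokens[i1:i2], b_tokens[j1:j2]))
        else:
            for t in a_tokens[i1:i2]:
                rows.append((t, None))
            for t in b_tokens[j1:j2]:
                rows.append((None, t))
    return rows

# Invented sample data: witness A abbreviates its first word
a = [{"orig": "w'", "reg": "word", "flags": ["abbrev"]},
     {"orig": "one", "reg": "one", "flags": []}]
b = [{"orig": "word", "reg": "word", "flags": []},
     {"orig": "two", "reg": "two", "flags": []}]

table = align_rich(a, b)
```

Because the alignment is computed over the `reg` values but the table holds the full token dicts, the collation sees only simple tokens while the output keeps all of the rich metadata.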

Text Geek Heaven

Along the way, I’ve discovered some joining Genizah fragments. The coolest by far on a technical, jigsaw-puzzle level is the four-way join between TS AS 78.69, TS AS 78.162, TS AS 78.235 and TS NS 329.286 (Cambridge). The four fragments adjoin yet another, TS E2.71. This will be featured as a Fragment of the Month of the Taylor-Schechter Genizah Research Unit. Look for it there!
Cool in that they join material from multiple cities are:

  • TS E1.99 (Camb), MS heb. 8-11 (Oxf), and TS F6.3, joining fragments from Cambridge and Oxford; and
  • TS AS 85.270 (Camb), TS F6.2, and MS R2339, fol. 1 (JTS), joining fragments from Cambridge and New York



4 Responses

  1. daniel

    Very nice Hayim! Of course, the joins are a really nice “collateral” result of your work.
    Do your pointers indicate the approximate size of the lacuna?
    D’sh Hama

    • Hayim Lapin

      If I understand the question, this is something that is undergoing revision. Gaps are noted in two ways. First, the actual lacunae are noted and tagged with an estimated character length. Where there is no margin preserved on at least one side, however, my practice is to mark only the distance from the end or beginning of the line to the edge of a circumscribing rectangle around the fragment. My hope is that this will facilitate the “tiling” of multiple fragments. (The alternative is simply to divide the length of the expected text in half ….)
      Second, and this is still under development, we plan to use XPointer to identify the corresponding start and end positions of a fragment on a “central” document. In theory (we have not implemented this yet), if the collation is reliable enough, we should be able to allow the user to pull in the text of any other witness to fill the lacuna. One could implement this in an “edition” format, pulling the missing text from a selected witness, or in a kind of diagnostic alignment table, allowing the user to see the possible range of variation, etc.
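      The “edition” option described in the reply could look something like the sketch below: given an alignment table, reconstruct one witness and supply its missing slots from another, bracketing the supplied text. The table format, witness sigla, and sample words are all invented for illustration; nothing here reflects the project’s actual implementation.

```python
def fill_lacunae(alignment, target, supply):
    """Reconstruct the target witness from an alignment table,
    supplying missing slots from another witness.

    alignment: list of rows, each a dict mapping witness siglum to a
    token string, or None where that witness is lacunose.
    Supplied tokens are bracketed, in the manner of an edition.
    """
    out = []
    for row in alignment:
        tok = row.get(target)
        if tok is not None:
            out.append(tok)
        elif row.get(supply) is not None:
            out.append("[" + row[supply] + "]")
    return " ".join(out)

# Invented example: witness A has a one-word lacuna that B preserves
alignment = [
    {"A": "in", "B": "in"},
    {"A": None, "B": "the"},
    {"A": "house", "B": "house"},
]

edition_text = fill_lacunae(alignment, target="A", supply="B")
# -> "in [the] house"
```

      The same table would also support the diagnostic view mentioned above, since every witness’s reading (or absence) is available in each row.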

  2. Hayim Lapin

    For those who are keeping track, I have now edited this post twice to correct the fragment groups of Cairo Genizah fragments. Not sure what was going on with my brain when I first posted!

  3. Pingback: Post: Progress, real but in small steps | Maryland Institute for Technology in the Humanities
