The 18th World Congress of Jewish Studies

Optimizing Automatic Reading of Hebrew Manuscripts

The main goal of the research is the automatic reading (HTR/OCR) of handwritten medieval Jewish manuscripts, focusing primarily on the Midrashic literature. Because manuscript handwriting styles depend heavily on time, place and the predilections of individual scribes, we improve on state-of-the-art models by leveraging transfer learning: models pre-trained on a large corpus are fine-tuned on the first few annotated pages of a manuscript in order to help decipher the rest of it.
We also investigate hybrid algorithms that integrate language models and other natural language processing (NLP) techniques with computer-vision OCR methods. Initial machine readings are compared to a databank of known texts (editiones principes as well as texts derived from other printed editions and manuscripts) and aligned with them so as to suggest better readings.
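The comparison and alignment step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: it assumes a tiny in-memory databank, uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever alignment method the team employs, and the function names `best_match` and `suggest_readings` are hypothetical. The sample texts are Genesis 1:1 and 1:3, with one letter deliberately misread (kaf for bet) to imitate a recognition error.

```python
from difflib import SequenceMatcher


def best_match(reading, databank):
    """Return the databank text most similar to the machine reading.

    Similarity is the character-level ratio from difflib; this is an
    illustrative choice, not the method used in the research.
    """
    return max(
        databank,
        key=lambda text: SequenceMatcher(None, reading, text, autojunk=False).ratio(),
    )


def suggest_readings(reading, reference):
    """Align the reading with a reference word by word.

    Returns a list of (machine_reading, suggested_reading) pairs for
    every stretch of words where the two texts disagree by substitution.
    """
    ocr_words = reading.split()
    ref_words = reference.split()
    matcher = SequenceMatcher(None, ocr_words, ref_words, autojunk=False)
    suggestions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            suggestions.append(
                (" ".join(ocr_words[i1:i2]), " ".join(ref_words[j1:j2]))
            )
    return suggestions


# Hypothetical databank of known texts (assumption: two short verses).
DATABANK = [
    "בראשית ברא אלהים את השמים ואת הארץ",
    "ויאמר אלהים יהי אור ויהי אור",
]

# Simulated machine reading with a bet/kaf confusion in the second word.
OCR_READING = "בראשית כרא אלהים את השמים ואת הארץ"

reference = best_match(OCR_READING, DATABANK)
suggestions = suggest_readings(OCR_READING, reference)
for machine, suggested in suggestions:
    print(f"machine reading: {machine}  suggested reading: {suggested}")
```

In this toy case the reading is matched to the first databank text and the single misread word is flagged with its suggested correction. A realistic system would need to handle legitimate textual variants between witnesses, which a plain edit-based alignment cannot distinguish from recognition errors.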
The research is carried out by a multi-university team that includes researchers from Tel Aviv University (TAU), the Hebrew University and the University of Haifa, in cooperation with French and German scientists and scholars.