The 18th World Congress of Jewish Studies

From Vilna Press to Annotated Digital Text: An Automated Digitization Pipeline

We introduce a tool that takes as input a scan or photo of a printed rabbinic text of any length (typically in an old Rashi font) and automatically converts it into highly accurate machine-readable text. The output is vocalized and punctuated, with abbreviations expanded and all citations from earlier literature identified and linked in footnotes. The tool integrates a suite of AI-based algorithms developed at Dicta over several years.
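
As a rough illustration of how such a multi-stage pipeline might be composed, the following Python sketch chains placeholder stages over a shared document object. It is not Dicta's implementation: the stage names, their order, the `Document` class, and the `run_pipeline` function are all hypothetical, and each stage only records that it ran instead of calling a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container passed from stage to stage."""
    text: str = ""
    footnotes: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name.

    A real stage would transform doc.text or append to doc.footnotes.
    """
    def stage(doc: Document) -> Document:
        doc.history.append(name)
        return doc
    return stage


# Assumed stage order, following the order in which the abstract
# lists the features; the actual order is not stated in the source.
STAGE_NAMES = [
    "ocr",
    "vocalize",
    "punctuate",
    "expand_abbreviations",
    "link_citations",
]
STAGES: List[Stage] = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in sequence, feeding the result to the next."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(Document(), STAGES)
```

Keeping every stage behind the same function signature would let individual components be replaced or reordered without touching the rest of the chain.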