The 18th World Congress of Jewish Studies

Applying Text Reuse Detection for Improving Automatic Transcription

Following the success of automatic transcription of Hebrew manuscripts, this lecture presents a model for the automatic cataloging of manuscripts. The model may serve both to identify unknown materials within anthologies and as a means of improving automatic transcription.
We introduce a framework that automatically catalogs anthologies into granular pieces based on probable text reuse.
We then apply this framework as a tool to align the texts and to propose corrections to automatic transcriptions generated by e-Scriptorium.
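
As a rough illustration of the two steps described above (detecting probable reuse, then aligning texts to propose corrections), the following minimal Python sketch may help. It is not the framework presented in the lecture: it assumes character n-gram Jaccard overlap as a stand-in reuse measure and uses word-level alignment from Python's standard `difflib` module. The function names `char_ngrams`, `reuse_score`, and `propose_corrections` are hypothetical, and the sketch assumes a known reference text is already available for the passage.

```python
from difflib import SequenceMatcher


def char_ngrams(text, n=4):
    """Return the set of character n-grams in text, after collapsing whitespace."""
    normalized = " ".join(text.split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def reuse_score(passage, candidate, n=4):
    """Jaccard overlap of character n-grams (assumed stand-in for reuse detection)."""
    a, b = char_ngrams(passage, n), char_ngrams(candidate, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def propose_corrections(transcription, reference):
    """Align the two texts word by word and list the spans where they differ.

    Each proposal is (operation, transcribed words, reference words).
    These are suggestions only; a human editor decides whether to accept them.
    """
    src, ref = transcription.split(), reference.split()
    matcher = SequenceMatcher(a=src, b=ref, autojunk=False)
    proposals = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            proposals.append((tag, " ".join(src[i1:i2]), " ".join(ref[j1:j2])))
    return proposals


if __name__ == "__main__":
    transcribed = "in the begining God created the heaven"
    known_text = "in the beginning God created the heaven and the earth"
    print(round(reuse_score(transcribed, known_text), 2))
    for proposal in propose_corrections(transcribed, known_text):
        print(proposal)
```

In this sketch a high `reuse_score` would flag a passage as a candidate match for a known text, and `propose_corrections` would then surface the differing words for review. A real pipeline would need to handle Hebrew orthographic variation, abbreviations, and genuine textual variants, which should not be "corrected" away.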