The re-emergence of Hebrew as a language of everyday exchange with native speakers around the turn of the 20th century is an unprecedented case of language revival. A century later, little is still known about the linguistic unfolding of this process. This is due both to the scarcity of corpora documenting spoken Hebrew in the critical stages (Reshef 2012) and to the paucity of research on its syntactic properties (Doron 2016).
Digital corpora of the nonextant language of the "revival generation" are lacking as a basis for a historical investigation of Hebrew's re-emergence. Existing corpora suffer from some combination of the following limitations: (i) a lack of genre diversity (generally only high register), (ii) mediocre quality of digital texts obtained through OCR technology, (iii) little metadata, crucially lacking the creation date in some cases, and (iv) limited access to the digital texts and accompanying linguistic annotations.
This paper describes the release of the Jerusalem Corpus of Emergent Modern Hebrew (JEMH), a new corpus made possible by the mutual interests of linguists, the National Library of Israel, and Project Ben-Yehuda, the Israeli open literature initiative. JEMH is an open-access historical corpus that aims to overcome the above limitations of existing corpora and to open new avenues for linguistic research. Case studies will focus on the modal domain and include syntactic and semantic change within the modal existential/possessive construction (yeš li l-; Ben-Hayyim 1953/1992). The role of metadata enhancement in analyzing the effect of contact on language change will be highlighted.