Corpora Development for Standard Yiddish

Valentina Fedchenko
Department of Jewish Culture, St. Petersburg State University

Throughout its history, Yiddish language planning has been shaped by differing, often controversial, ideological tendencies. Some of them, such as purism and dialectal unification, limited linguistic diversity and reduced the range of both grammatical phenomena and lexical usage in the Yiddish material they described. Most existing Yiddish grammars were compiled by competent speakers who relied on introspection and linguistic intuition, so the presentation of the language material depends on the author's explanatory adequacy and thoroughness. Existing grammars do not describe even all the varieties of written literary Yiddish, and they often give conflicting data on grammatical phenomena and their interpretation. Studying Yiddish with corpus methodology offers several advantages: access to large and diverse linguistic material, objectivity of description, the possibility of quantitative and frequency analysis, and the description of language complexity. A modern corpus should be large, representative, and annotated with mark-up. The only corpus that meets these criteria is the Corpus of Modern Yiddish (CMY) [Birzer 2012]. This paper illustrates the advantages of corpus methodology with several morphological examples. The improvement of the CMY is discussed separately, in particular morphological disambiguation, the creation of a filtering system, the enlargement of the corpus, and the increase of its representativeness.

Bibliography

Birzer, S. 2012. The Corpus of Modern Yiddish: a new resource for linguistic research on Yiddish. In B. Hansen and M. Aptroot (eds.), Yiddish Language Structures.








