Corpus Evidence and the Pedagogical Lexicography of Modern Hebrew - The 17th World Congress of Jewish Studies August 6-10, 2017

Since the so-called "corpus revolution" began in pedagogical lexicography in the 1980s, more and more learner's dictionaries have come to use (lexicographic) corpora as their main or exclusive data source. This methodology has since been extending to comprehensive dictionaries for native speakers. The corpora used in the major monolingual and bilingual learner's dictionaries of English, for example, range in size from a few hundred million to several billion words.

Modern Hebrew had no such lexicographic corpus until (the annotated version of) heTenTen, about 900 million words in size, was released in 2015 as part of the world-renowned corpus query system Sketch Engine. This is the first lexicographic corpus of Modern Hebrew that can serve as the main or exclusive data source for future dictionaries of Modern Hebrew, at least for foreign learners (with a limited number of headwords), if not for native speakers.

After a brief survey of heTenTen and Sketch Engine as a source and a tool, respectively, for dictionary writing, the present study demonstrates how corpus evidence can contribute to improving the existing monolingual and bilingual learner's dictionaries of Modern Hebrew in the following four core areas of dictionary writing: 1) frequency ranking of words and the subsequent preparation of headwords, 2) checking word senses and their frequency in the existing dictionaries and detecting new word senses, 3) finding collocations, and 4) finding good dictionary examples.
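
As a rough illustration of the first area, frequency ranking and headword preparation, the following minimal Python sketch counts lemmas and keeps the most frequent ones as candidate headwords. It is not the procedure used in the study or by Sketch Engine: the function name rank_headwords, the cutoff parameter max_headwords, and the transliterated toy sample are all assumptions made for this example, and a real list would be built from a lemmatized corpus export.

```python
from collections import Counter


def rank_headwords(lemmas, max_headwords):
    """Return (rank, lemma, count) tuples for the most frequent lemmas.

    `lemmas` is any iterable of lemma strings; `max_headwords` is a
    hypothetical cutoff for the size of the headword list.
    """
    counts = Counter(lemmas)
    # Sort by descending frequency, then alphabetically so that ties
    # are resolved deterministically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, lemma, count)
        for rank, (lemma, count) in enumerate(ordered[:max_headwords], start=1)
    ]


# Toy, transliterated sample (hypothetical data, not taken from heTenTen).
sample = ["bayit", "sefer", "bayit", "yeled", "sefer", "bayit", "shulhan"]
headwords = rank_headwords(sample, 3)

for rank, lemma, count in headwords:
    print(rank, lemma, count)
```

In practice the cutoff would depend on the intended size of the learner's dictionary, and raw counts would usually be normalized (for example, per million words) before lists from different corpora are compared.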