A New Dictionary for Learners of Thai: Corpus-Based Approach

Assoc. Prof. Dr. Jirapa Vitayapirak, King Mongkut's Institute of Technology Ladkrabang (KMITL)
Ilan J. Kernerman, K Dictionaries Ltd

AIMS
To report a research project on developing a new dictionary for learners of Thai based on the Thai National Corpus (TNC). This paper argues that Thai dictionary compilation should be grounded in corpus design.

OUTLINE
Background of dictionaries and corpora, and of the project
History of Thai lexicography
Thai (learner's) Dictionary Core Project (TDC)

DICTIONARIES AND CORPORA
Since 1980, with the growth of text-processing and corpus technology, major dictionary publishers have used corpora to design better learners' dictionaries (Sinclair 1994):
-Collins COBUILD
-Longman
-Oxford
-Cambridge

Thai Dictionaries
Little previous research has been carried out on the design of Thai dictionaries.

Thai (learner's) dictionary core project (TDC)
Background of the K Dictionaries project: the TDC is part of the new Bilingual Learners Dictionaries Series (BLDS), which currently includes monolingual dictionary cores for nearly 20 major European and Asian languages such as French, Japanese, and Chinese. The aim of this project is to develop a Thai dictionary core and eventually use it for translation into other languages.

EXAMPLE: BLDS French-Japanese
contradictoire adj mujun shita
  des témoignages contradictoires ai-hansuru shoogen
contraindre vt kyoosee suru
  contraindre qqn à faire qqch hito ni ~ suru yoo shiiru
contrainte nf kyoosee

contraire adj
  1 opposé hantai no
    des avis contraires tairitsu suru iken
    être contraire à la loi hoo ni han suru
  2 au contraire hantai ni, sore dokoro ka

A summary of the project:
(1) start from the translations of the Password English-Thai dictionary
(2) edit the Thai-English index that was generated by reversing the English-Thai translations of (1)
(3) compile the new Thai dictionary core, based on the wordlist of the Thai-English index of (2)

Thai (learner's) dictionary core project (TDC)
Password English-Thai

jab [dʒæb] (past tense, past participle jabbed) verb
to poke or prod: He jabbed me in the ribs with his elbow; She jabbed the needle into her finger. {16295: }
noun
a sudden hard poke or prod: He gave me a jab with his finger; a jab of pain. {16296: }

jabber [ˈdʒæbə] verb
to talk idly, rapidly and indistinctly: The women are always jabbering with one another. {16297: ; }

jack [dʒæk] noun
1 an instrument for lifting up a motor car or other heavy weight: You should always keep a jack in the car in case you need to change a wheel. {16298: }
2 the playing-card between the ten and queen, sometimes called the knave: The jack, queen and king are the three face cards. {16299: }
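Step (2) of the project summary, generating a Thai-English index by reversing the English-Thai translations, can be illustrated with a minimal sketch. It is not the project's actual tooling: the function name `reverse_index`, the row layout, the placeholder strings `TH_A` to `TH_D` (standing in for Thai translations), and the extra entry `prod` with sense id 90001 are all assumptions made for illustration. Only the sense ids 16295 to 16297 come from the sample entries above.

```python
from collections import defaultdict


def reverse_index(entries):
    """Map each translation to the (headword, sense_id) pairs it translates.

    `entries` is an iterable of (headword, sense_id, translations) rows,
    a hypothetical layout chosen for this sketch.
    """
    index = defaultdict(list)
    for headword, sense_id, translations in entries:
        for translation in translations:
            translation = translation.strip()
            # Skip empty strings and avoid listing the same sense twice.
            if translation and (headword, sense_id) not in index[translation]:
                index[translation].append((headword, sense_id))
    # Sort by translation so the output can serve as a draft wordlist.
    return dict(sorted(index.items()))


# Placeholder data: TH_* strings stand in for Thai translations.
entries = [
    ("jab", 16295, ["TH_A"]),
    ("jab", 16296, ["TH_B"]),
    ("jabber", 16297, ["TH_C", "TH_D"]),
    ("prod", 90001, ["TH_A"]),  # hypothetical entry sharing a translation
]

index = reverse_index(entries)
for translation, senses in index.items():
    print(translation, senses)
```

A translation shared by several English senses ends up as one index line with several sources, which is the kind of line an editor would then review by hand in step (2).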

SECTION 2
History of Thai lexicography

Early Thai Dictionaries (1800s)
In the 1800s Siam began opening up to foreigners. Many foreigners, such as missionaries, were living in Siam. They had a strong need to communicate with Thais, so bilingual Thai dictionaries were laboriously compiled by hand before the days of the commercial printing press in Siam (McFarland 1944: i).

Thai dictionaries
1.1 Dictionary of the Siamese Language (Caswell, 1846) - a hand-written Thai-English dictionary
1.2 Dictionarium Linguae Thai (Pallegoix, 1854) - Thai Latin French English
1.3 Dictionary of the Siamese Language (Bradley, 1873) - Thai-Thai
1.4 Siamese-English Dictionary (Michell, 1892)
1.5 (1892) - Thai-Thai
1.6 The Royal Institute Dictionary 1927 (revised and first printed in 1950) - Native Thai dictionary
1.7 Thai-English Dictionary (McFarland, 1944) - Stanford
1.8 Thai-English Student's Dictionary (Haas, 1964) - Stanford
Etc.

SECTION 3
Thai (learner's) Dictionary Core Project (TDC)

TDC dictionary
Aim: This dictionary follows a simple principle: make it easy for non-native users of Thai to find what they want and to use it for translation into other languages.
Audience: Adults or students who are learning Thai as a foreign language at an intermediate level.
How many words? It contains 12,000 entries.

Methodology: The development of a Thai dictionary core
-Preparing the list of (12,000) main headwords using TNC
-Compiling the entries by the editorial team

-Revision by the chief editor
-Using an XML editor for lexicographical work
http://www.thai-dict.com/blds/

THE HEADWORD LIST
12,000 lemmas
It includes all the words that are most frequent or particularly important in Thai (written and spoken) and used in day-to-day situations. The Thai National Corpus (TNC) was used as a guideline for headword selection. Central Thai takes precedence. The Thai words are arranged alphabetically from ก to ฮ in accordance with the system of the Royal Institute Thai Dictionary (1982).

Thai National Corpus
TNC is a general corpus of standard Thai, designed by Chulalongkorn University to be comparable to the British National Corpus. The project aims to collect eighty million words (Aroonmanakun, 2009).
Corpus (TNC) at: http://www.arts.chula.ac.th/~ling/tnc2/

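Frequency data as criteria for headword selection
[Table: TNC frequency counts for candidate headwords; the Thai word forms are not preserved. Counts: 817, 781, 653, 544, 416, 255, 228, 222, 174, 112, 32, 24, 21, 9]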
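Using frequency data as a criterion for headword selection can be sketched as follows. This is only an illustration under stated assumptions: the corpus is assumed to be already segmented into word tokens (Thai is written without spaces between words, so segmentation is a separate step), and the function name `select_headwords`, the `min_freq` threshold, and the toy token list are hypothetical rather than taken from the TNC or the project.

```python
from collections import Counter


def select_headwords(tokens, max_headwords, min_freq=1):
    """Return up to `max_headwords` (word, frequency) pairs, most frequent first.

    Ties are broken by code point order so the result is deterministic.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, freq) for word, freq in ranked if freq >= min_freq][:max_headwords]


# Toy, pre-segmented token list (not real corpus data).
tokens = ["บ้าน", "กิน", "บ้าน", "น้ำ", "กิน", "บ้าน"]

selected = select_headwords(tokens, max_headwords=2)
for word, freq in selected:
    print(word, freq)
```

In a real workflow the cut-off would be set to the target list size (12,000 lemmas here), and the ranked list would serve as a guideline that editors adjust, since the deck notes that particularly important words are included alongside the most frequent ones.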

CONCORDANCE
Use of Authentic Examples:
[Concordance lines from the TNC; the Thai text is not preserved.]
helping learners with real Thai

THE ENTRY COMPONENTS:
HEADWORD
PRONUNCIATION
PART OF SPEECH
THAI DEFINITION (for each sense of the headword)
EXAMPLE (Thai example of usage)
+ SPECIAL FEATURES
Idiom (Thai collocational phrases)
Sense indicators, register, subject field, hypernym, synonym, antonym

Structure of an entry:
Headword, Pronunciation, Part of speech, Definition, Example
[Annotated sample entry: a noun whose pronunciation survives only as "k ban yat"; the Thai text is not preserved.]
[Sample entry: a headword pronounced /kok/ with three numbered noun senses; the Thai text is not preserved.]

Verb [Thai definitions not preserved]
http://www.thai-dict.com/blds/

Sense Indicators
Wherever possible, the first sense is a common one, usually the sense that most people would expect.

Subject field
Disambiguates the sense by providing the general subject category of the word entry. Example (each label is a bracketed Thai abbreviation, not preserved here):
-legal
-mathematical
-scientific
-literature

Register
Comments related to register, such as formal, impolite, informal, or spoken, can be provided.
Abbreviations (Thai forms not preserved):
-grammar
-royal word
-impolite word
-colloquial language
-formal language

-dialect
-obsolete or old-fashioned
-figurative
-female
-male

Sense qualifier
Phrases such as the Thai idiom for "black sheep" are labeled as figurative language. A second label marks the translation that gives the literal meaning. For example: [k... dam] Noun (the Thai labels and definition are not preserved).

Hypernym, Synonym, Antonym
The hypernym is a generic term to whose field of application the headword belongs. It helps to disambiguate the sense. For example: one noun entry and one verb entry (Thai text not preserved).
Synonym (=) and antonym (#)

For example: [lo rk] Verb, with a synonym marked by =; [kaaw] Adjective, with an antonym marked by # (Thai text not preserved).

CONCLUSION
In short, Thai dictionaries should be based on corpus data of Thai literature. Frequency data and concordancing can play an important role in the design, evaluation, and revision of Thai dictionaries, so that we can make sure the information recorded in the dictionaries is authoritative and backed up by empirical evidence.

Thank you for your attention!

