Socio-Political Domain vs. Lexicon

Sociopolitical Domain as a Bridge from General Words to Terms of Specific Domains
Natalia V. Loukachevitch, Boris V. Dobrov
Research Computing Center of Moscow State University; NCO Center for Information Research

General Words and Terms in Automatic Text Processing
Texts in electronic collections contain both general words and terms.
They are studied by two different research domains: lexicology and terminology.
Wüster (founder of the Vienna school of terminology): terminologists start from a concept, whereas lexicologists start from the form of a linguistic expression.

Wüster: difference between lexicological and terminological approaches

Terminological research starts from the concept, which has to be precisely delimited.
In terminology, concepts are considered to be independent of their designations.
Terminologists talk about concepts, while linguists talk about word meanings.

Construction of Wordnets and Terminology Research
Development of wordnets involves the construction of hierarchical semantic networks, the search for similar synsets in different languages, and building a top ontology of language-independent concepts.
Approaches to the study of general words and terms are becoming closer.

Theory of Terminology: Properties of the Ideal Term
The term must relate directly to the concept and express it clearly; there should be no synonyms, whether absolute, relative or apparent; the contents of terms should be precise and not overlap in meaning with other terms; the meaning of the term should be independent of context.

Theory of Terminology: a serious difference between a general word and a term
There is a biunivocal relationship between concepts and terms in each special field of knowledge. For a terminology nothing could be better than that: no synonymy, no homonymy and no polysemy.

A huge gap between general words and terms. BUT!

Term Formation and Words of General Language
A general sense of a word and a terminological sense of the same word can be really different: function as a general word, function in mathematics, function in biology.
Cruse: the senses of a lexical form are antagonistic to one another; that is to say, they cannot be brought into play simultaneously without oddness.

A word and a term can be very similar in meaning
arson (Law): the malicious burning of another's house or property, or in some statutes, the burning of one's own house or property, as to collect insurance (Random House Unabridged Dictionary).
A general dictionary uses a very strict definition.

How to distinguish terminological and general senses?
Teacher in court accused of school arson
A teacher charged with setting fire to a West Yorkshire school has appeared in court. Amina Ditta, 23, of Scholemoor Road, Bradford, has faced the city's magistrates court charged with one count of arson. The charge relates to an incident last Wednesday at Atlas Primary School in Manningham, where Ms Ditta was employed. She spoke only to confirm her personal details and was represented by barrister Mr Narinda Sekhon. She was granted conditional bail to return to court on June 12.

Traditional point of view: definitions
Traditional terminologists: definitions of terms are strict in comparison to the glosses of general words.
Contemporary point of view: the degree of vagueness in term definitions is lower, but in many cases vagueness is inevitable.

Taxation in Russian legislation: new construction vs. repair

How many general and terminological senses are so close? - 1
Building: a relatively permanent enclosed construction over a plot of land, having a roof and usually windows and often more than one level, used for any of a wide variety of activities, as living, entertaining, or manufacturing (Webster's Unabridged Dictionary).
Domains: construction industry; public utilities.
It is impossible to separate the senses: practically all denotations are the same.

How many general and terminological senses are so close? - 2
Transportation means, job positions, technical devices, food, agricultural plants and animals, other natural objects, works of art and others are produced by professionals, and we use them in everyday life.
Social, political and economic processes are planned or restricted by professionals, and our life is influenced by them.

General words and terminologies
The intersection is significant: 40-50 percent of the words in general dictionaries belong to the intersection area.
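The 40-50 percent figure is a ratio: the share of general-dictionary senses that also have a close counterpart in some professional terminology. A minimal sketch, assuming senses are represented by hypothetical string keys and that the "very similar sense" judgment has already been made (here, identical keys), computes this share with a set intersection. The word lists are toy data, not contents of any real dictionary or of the thesaurus described here.

```python
# Toy data for illustration only (hypothetical sense keys, not real dictionary contents).
GENERAL_SENSES = {"building", "arson", "tax", "bank", "joy", "sadness", "walk", "sky"}

# Hypothetical professional terminologies, each a set of sense keys.
TERMINOLOGIES = {
    "construction industry": {"building", "repair"},
    "law": {"arson", "tax"},
    "banking": {"bank", "credit"},
}


def intersection_share(general_senses, terminologies):
    """Return the share of general senses that also occur in at least one
    terminology, together with the set of shared senses."""
    if not general_senses:
        return 0.0, set()
    all_terms = set().union(*terminologies.values())
    shared = general_senses & all_terms
    return len(shared) / len(general_senses), shared


share, shared = intersection_share(GENERAL_SENSES, TERMINOLOGIES)
print(f"{share:.0%} of general senses are shared: {sorted(shared)}")
# prints: 50% of general senses are shared: ['arson', 'bank', 'building', 'tax']
```

The keys stand for senses rather than word forms on purpose: matching forms would also count words such as function, whose general and mathematical senses differ, and would overstate the intersection.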

We call this intersection area the sociopolitical domain, or domain of social life: it describes the everyday life of contemporary society.

The sociopolitical domain and domains in WordNet
Many researchers have proposed sets of domains for WordNet and EuroWordNet. The sociopolitical domain is approximately equal to the sum of the proposed domains.
A synset is related to the sociopolitical domain if there is a professional domain (not a science) that has a term with a very similar sense (± vagueness).
Emotions and feelings do not belong to the sociopolitical domain.

Multiword terms from specific domains
A lot of multiword terms from professional domains are understandable to native speakers: multinational country, single member constituency, amicable agreement, global market, criminal omission.
Special criteria for the inclusion of multiword expressions.

Features of the Sociopolitical Domain - 1
Texts of various genres (official documents, international treaties, legislative documents, newspaper articles) are related to the sociopolitical domain.
This allows the development of a unified linguistic resource for the automatic processing of such diverse texts, and a broad basis for the development of domain-specific resources.

Features of the Sociopolitical Domain - 2
The inclusion of multiword terms facilitates disambiguation procedures.
Ambiguity within the domain is much lower than in the whole resource, and distinctions between senses are more definite and more important.
It is possible to use different disambiguation procedures within the sociopolitical domain and outside it.
Procedures for identifying lexical cohesion and lexical chains can also differ for synsets inside and outside the sociopolitical area, because concepts in the sociopolitical domain are thematically more definite (privatization vs. creation).

Experience of Work in the Sociopolitical Domain
Project: University Information System RUSSIA, 800 thousand Russian documents (after 1991)
Russian thesaurus on sociopolitical life
(since 1994): a concept-based network of 30 thousand concepts, 75 thousand words and terms
Automatic text processing (since 1995): text categorization, automatic conceptual indexing, text summarization

University Information System RUSSIA
Total: 800,000 documents / 7.5 GB
Document type | Source | Retrospective coverage | Documents
Legal documents | Official Publication | 1990- | 55,000
State Duma daily records | State Duma | 1994- | 100,000
Statistics | State Statistics Agency; CIS Interstate Statistics Committee | 1998- | 20,000
Mass media | Expert weekly; Nezavis. gazeta; Izvestia | 199(7)- | 180,000
Analytical reports | Central Bank of RF; Rus.-Europ. Center for Economic Policy | 1996- | 10,000
Scientific publications | MSU Publishing; RePEc; Sociology Research | 1999- | 2,000 (+230,000 ref)

[Figure: Socio-Political Domain vs. Lexicon, levels of hierarchy. Sciences; Lexicon: 110,000 text entries, 50,000 concepts; Socio-Political Domain: 75,000 text entries, 30,000 concepts]
[Figure: Specific Domains vs. Socio-Political Domain, levels of hierarchy. Socio-Political Domain; Elections; Geography; Industrial Production]
[Figure: Interrelations between Socio-Political Domains, levels of hierarchy. Socio-Political Domain; Taxation; Law; Accounting; Banking]
[Figure: Sciences vs. Socio-Political Domain. Social Sciences; Natural Sciences; Socio-Political Domain]

Specific applications of the sociopolitical thesaurus
Terms of economics and sociology were included for automatic text categorization of scientific papers (700 categories of the JEL (Journal of Economic Literature) subject headings).
Terms of non-production spheres were added for automatic text categorization of Russian legislation (3,000 categories of the commercial subject headings system).

Conclusions-1

The border between the general language lexicon and the terminologies of specific domains is not sharp and abrupt. It looks more like a broad strip that contains general language senses practically coinciding with concepts of social subdomains, as well as concepts of specific domains that are understandable to native speakers.

Conclusions-2

A detailed description of the concepts, terms and words from this transition area, called the sociopolitical domain, can naturally be added to a wordnet semantic network. It can facilitate the solution of such problems as lexical disambiguation and identification of text structure, enhance the coverage of domain-specific texts by wordnet synsets, and improve the effectiveness of wordnets in various automatic text processing applications.
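Lower ambiguity once multiword terms are recognized and thematically definite concepts are what make the sociopolitical thesaurus usable for the text categorization applications listed above. The slides do not spell out the categorization procedure, so what follows is only a minimal sketch under stated assumptions. A hypothetical miniature thesaurus maps text entries to concepts, and multiword entries are matched before their single-word parts (greedy longest match). Each hypothetical category is described by a set of concepts and scored by the number of its concept occurrences in a document. The real thesaurus (30 thousand concepts) and category systems (700 JEL categories, 3,000 legislative categories) are far larger and use relations between concepts, which this sketch ignores.

```python
import re
from collections import Counter

# Hypothetical miniature thesaurus: text entry (word or multiword term) -> concept.
THESAURUS = {
    "global market": "WORLD MARKET",
    "market": "MARKET",
    "single member constituency": "ELECTORAL DISTRICT",
    "constituency": "ELECTORAL DISTRICT",
    "election": "ELECTIONS",
    "tax": "TAX",
    "value added tax": "VALUE ADDED TAX",
}
MAX_TERM_LEN = max(len(entry.split()) for entry in THESAURUS)

# Hypothetical categories, each described by a set of thesaurus concepts.
CATEGORIES = {
    "Elections": {"ELECTIONS", "ELECTORAL DISTRICT"},
    "Taxation": {"TAX", "VALUE ADDED TAX"},
    "Trade": {"MARKET", "WORLD MARKET"},
}


def index_concepts(text):
    """Map a text to concept counts, preferring the longest matching entry."""
    tokens = re.findall(r"[a-z]+", text.lower())
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in THESAURUS:
                concepts[THESAURUS[candidate]] += 1
                i += n
                break
        else:
            i += 1  # no thesaurus entry starts at this token; skip it
    return concepts


def categorize(text):
    """Score each category by the number of its concept occurrences; drop zero scores."""
    concepts = index_concepts(text)
    scores = {cat: sum(concepts[c] for c in cs) for cat, cs in CATEGORIES.items()}
    return {cat: score for cat, score in scores.items() if score > 0}


print(categorize("The election in a single member constituency revived the value added tax debate."))
# prints: {'Elections': 2, 'Taxation': 1}
```

Because "global market" is matched as a whole, a text mentioning it is indexed with WORLD MARKET rather than the more general MARKET. This is one simple way the inclusion of multiword terms can narrow ambiguity.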

