Building a Primitive-based Lexical Consultation System
Prepared by: Lim Beng Tat
Supervisors: Dr Tang Enya Kong, Dr Guo Cheng Ming
Abstract
This research presents the design of a semantic-primitive-based lexical consultation system and the processes that are performed on a machine-readable dictionary (MRD) and a corpus to automatically produce a machine-tractable dictionary (MTD) and a tractable corpus. Linguistic tools and resources, such as a sense tagger, are created during or after these processes. The research also shows how to apply an unsupervised word sense disambiguation method to samples of unrestricted text from various prospective application areas using the newly constructed MTD. This is important for applications that need lexical semantics, such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing and text processing.
Lexical Consultation System
System design and architecture
Bilingual Knowledge Bank
Supply knowledge (language and world)
E.g. Collins English Dictionary (CED), Longman's Dictionary of Contemporary English (LDOCE) and Webster's 9th Dictionary (W9)
word | sn | pos | definition
be | 10 | n | spend or use time
english | 2 | n | people of england
... | ... | ... | ...
Explicit information (POS)
Implicit information / semantic information
Hypernym/hyponym relations (class/subclass)
Meronym/Holonym relation (part/whole, ...)
Collocational relations (compounds, idioms, ...), etc.
Problem: Extracting semantic information from dictionary?
Identify significant recurring phrase
E.g. “a member of” + NP
hand a member of a ship's crew…[W9]
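A minimal sketch of how such a recurring phrase could be matched, assuming definitions are available as plain strings. The pattern, the function name `extract_member_of` and the relation label `member_of` are illustrative assumptions, not part of the system described here; the elided tail of the W9 definition is left out.

```python
import re

# Hypothetical pattern: the definition opens with "a member of" and the
# remainder is treated as the noun phrase naming the group.
MEMBER_OF = re.compile(r"^a member of (?P<group>.+)$", re.IGNORECASE)

def extract_member_of(headword, definition):
    """Return (headword, 'member_of', group) if the phrase matches, else None."""
    match = MEMBER_OF.match(definition.strip())
    if match is None:
        return None
    return (headword, "member_of", match.group("group"))

relation = extract_member_of("hand", "a member of a ship's crew")
print(relation)
```

A real extractor would need a parser to delimit the NP; the regular expression simply takes everything after the phrase.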
Extraction of semantic hierarchy
Extraction of hyponym.
E.g. dipper a ladle used for dipping... [CED]
ladle a long-handled spoon... [CED]
spoon a metal, wooden, or plastic utensil ... [CED]
Resulting hierarchy: utensil > spoon > ladle > dipper
E.g. tool an implement, such as a hammer... [CED]
implement a piece of equipment ; tool or utensil. [CED]
utensil an implement, tool or container... [CED]
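The two CED examples above can be replayed with a small sketch that follows genus terms upward. The genus terms in the dictionaries below were picked by hand from the quoted definitions (an assumption; automatic genus identification is not shown), and `hypernym_chain` is a hypothetical helper name. The `seen` set stops the walk when definitions loop back, as with tool and implement.

```python
def hypernym_chain(word, genus_of):
    """Follow genus terms upward from `word`, stopping at a loop or a dead end."""
    chain = [word]
    seen = {word}
    while chain[-1] in genus_of:
        parent = genus_of[chain[-1]]
        if parent in seen:
            # Circular definitions: the parent was already visited.
            break
        chain.append(parent)
        seen.add(parent)
    return chain

# Genus terms chosen by hand from the CED definitions quoted above.
utensil_genus = {"dipper": "ladle", "ladle": "spoon", "spoon": "utensil"}
tool_genus = {"tool": "implement", "implement": "tool", "utensil": "implement"}

print(hypernym_chain("dipper", utensil_genus))
print(hypernym_chain("utensil", tool_genus))
```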
Inconsistency in dictionaries
E.g. corkscrew a pointed spiral piece of metal... [W9]
dinner service a complete set of plates and dishes... [LDOCE]
Dictionaries for human usage
Semantic primitive and word sense disambiguation
A semantic primitive refers to a “core” meaning that cannot be further analyzed
E.g. bachelor and red
bachelor means that someone is a man who is not married
What does red mean?
red represents a semantic primitive (a basic meaning), while bachelor does not.
Semantic Primitive (Cont)
2 types of semantic primitive
Prescriptive and descriptive
Prescriptive semantic primitives
A set of pre-defined primitives
E.g. father marry couple
marry :[ human , human ].
father : [ human ]
couple : [ human , thing].
To choose the correct sense of ‘couple’
Prescriptive semantic primitives
Problem: the set always needs to be extended
Descriptive semantic primitives
A set of semantic primitives derived from a natural source of data, such as a dictionary.
Semantic Primitive (Cont)
father5 - a term#5 of address for priest#2 in some church especially roman#7 or orthodox#3 catholic
marry3 - perform#1 a marriage#4 ceremony
couple1 - a pair#5 of people#5 who live#7 together#2
Uniquely identify each of the definitions of entries
Avoid circularity
Word Sense Disambiguation (WSD)
Documents are collections of sentences containing words
Some words have more than one meaning. These meanings are often called word senses.
Assign meanings to words in some context according to some lexical resource.
Producing a Machine-Tractable Dictionary (MTD) from a Machine-Readable Dictionary (MRD) using descriptive semantic primitives and WSD
Producing tractable database/corpus from database/corpus
Encoded with information extracted from MRD
Usable format and highly structured semantic information for NLP tasks
Determining the relatedness or closeness among word senses in a dictionary
Descriptive semantic primitives
word | sn | sp | pos | definition
be | 10 | Y | n | [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
english | 2 | N | n | [the, D, -] [people, N, 1] [of, P, -] [england, N, 1]
... | ... | ... | ... | ...
LCDD = 0.1%
Lexical Consultation System
Semantic Primitive Extractor
Searching for self-reference circles in definitions
Semantic Primitive Extractor
sense_1 [def] [sense_2 sense_5 sense_6]
sense_2 [def] [sense_3 sense_2]
sense_3 [def] [sense_1 sense_2]
sense_4 [def] [sense_5]
sense_5 [def] [sense_2 sense_4]
sense_6 [def] [sense_5 sense_4]
=>sense_1 is a semantic primitive
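One way to read the example above is as a directed graph in which each sense points to the senses used in its definition; a self-reference circle is then a path that leads from a sense back to itself. The sketch below searches for such a path. The function name and the depth-first traversal are assumptions for illustration, not the extractor's actual procedure.

```python
def find_self_reference_cycle(definitions, start):
    """Return a path start -> ... -> start through definition references, or None."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        for ref in definitions.get(node, []):
            if ref == start:
                # The definition chain has come back to the starting sense.
                return path + [start]
            if ref not in visited:
                visited.add(ref)
                stack.append((ref, path + [ref]))
    return None

# The six-sense example from the slide.
definitions = {
    "sense_1": ["sense_2", "sense_5", "sense_6"],
    "sense_2": ["sense_3", "sense_2"],
    "sense_3": ["sense_1", "sense_2"],
    "sense_4": ["sense_5"],
    "sense_5": ["sense_2", "sense_4"],
    "sense_6": ["sense_5", "sense_4"],
}

print(find_self_reference_cycle(definitions, "sense_1"))
```

For sense_1 the search finds the circle sense_1, sense_2, sense_3, sense_1.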
Step 1: Expanding dictionary
Semantic Primitive Extractor (cont)
Before expansion:
abandon 1: a feeling of extreme emotional intensity
abandon 2: leave behind
...
betray 2: abandon
After expansion:
abandon 1: a feeling of extreme emotional intensity
abandon 2: leave behind
...
betray 2: abandon1 abandon2
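The expansion step can be sketched as replacing every definition word with all of its numbered senses. The sketch assumes a sense inventory that maps a word to its sense numbers; `expand_definition` and `sense_inventory` are hypothetical names.

```python
def expand_definition(definition_words, sense_inventory):
    """Replace each definition word with all of its candidate senses."""
    expanded = []
    for word in definition_words:
        senses = sense_inventory.get(word, [])
        if senses:
            expanded.extend(f"{word}{sn}" for sn in senses)
        else:
            # Words without an entry are kept unchanged.
            expanded.append(word)
    return expanded

# 'abandon' has two senses in the example above.
sense_inventory = {"abandon": [1, 2]}
print(expand_definition(["abandon"], sense_inventory))
```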
Step 2: identify semantic primitives using self-reference circle
Extract primitives from pre-released WordNet during SENSEVAL2.
forecast2 : predict1 in advance3
    predict1 : make3 a prediction1 about
    advance3 : a change1 for the better2 progress4
fixed6 : specify1 in advance3
    specify1 : be specific1 about
    advance3 : a change1 for the better2 progress4
LCDD generator (Cont)
LCDD(forecast2, fixed6) = a × 70% + ((b + c + d) / 3) × 30%
Depth-First Method
[Diagram: Layer 1 and Layer 2 definition expansions for forecast2 and fixed6, with the comparisons labelled a, b, c and d. Layer 1 for forecast2: predict1 in advance3. Layer 1 for fixed6: specify1 in advance3.]
a = 1/[(2+2)/2]
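The arithmetic can be sketched as follows. Reading the numerator 1 as the number of senses shared by the two Layer 1 definitions (advance3) and each 2 as the number of sense-tagged words in a definition is an interpretation, not something the slide states. How b, c and d are obtained is not given here, so they are passed in as plain numbers, and the zeros in the last call are placeholders only.

```python
def layer_overlap(senses_a, senses_b):
    """Shared senses divided by the average size of the two definitions (assumed reading)."""
    shared = len(set(senses_a) & set(senses_b))
    average_size = (len(senses_a) + len(senses_b)) / 2
    return shared / average_size if average_size else 0.0

def lcdd(a, b, c, d):
    """Weighted combination from the slide: a * 70% + (b + c + d) / 3 * 30%."""
    return a * 0.70 + (b + c + d) / 3 * 0.30

# Layer 1 senses of forecast2 and fixed6.
a = layer_overlap(["predict1", "advance3"], ["specify1", "advance3"])
print(a)  # 1 / ((2 + 2) / 2)

# Placeholder values for b, c and d, for illustration only.
print(lcdd(a, 0.0, 0.0, 0.0))
```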
Simple Summation Algorithm
For example, assume a sentence containing the words ‘father’, ‘marry’ and ‘couple’, where each word has only two senses.
The best combination of word senses: father1 marry2 couple1
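A minimal sketch of one plausible reading of the summation: score every combination of senses by summing a pairwise relatedness value and keep the highest total. The relatedness numbers below are invented toy values, chosen only so that the example reproduces the combination shown above; they are not LCDD values from the system, and `best_sense_combination` is a hypothetical name.

```python
from itertools import combinations, product

def best_sense_combination(words, senses, relatedness):
    """Try every sense combination and return the one with the largest pairwise sum."""
    best_combo, best_score = None, None
    for combo in product(*(senses[w] for w in words)):
        score = sum(relatedness(x, y) for x, y in combinations(combo, 2))
        if best_score is None or score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score

words = ["father", "marry", "couple"]
senses = {w: [f"{w}1", f"{w}2"] for w in words}

# Invented toy relatedness values (NOT real LCDD figures).
toy_scores = {
    frozenset(["father1", "marry2"]): 0.6,
    frozenset(["marry2", "couple1"]): 0.5,
    frozenset(["father1", "couple1"]): 0.4,
}

def toy_relatedness(x, y):
    return toy_scores.get(frozenset([x, y]), 0.1)

combo, score = best_sense_combination(words, senses, toy_relatedness)
print(combo, score)
```

With two senses per word there are 2 × 2 × 2 = 8 combinations to score, which matches the eight paths P1 to P8 listed for this sentence.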
System Design
[Diagram: Lexical Consultation System. Labels: General Dictionary (MTD) + Domain MRD; Domain MTD for WSD; Domain Database/Corpus; Tractable Domain Database/Corpus]
System Architecture
[Diagram labels: Papillon Dictionaries or FEM; Bilingual Knowledge Bank (BKB)]
Part-of-speech tagging (Auto)
Semantic Primitive (SP) identification
SP WSD (Auto)
SP LCDD generator (Auto)
[Diagram labels: Domain Semantic primitive (MTD); General Dictionary (MTD); Domain MRD]
LCDD generation (Auto)
[Diagram labels: Domain Database/Corpus; Tractable Domain Database/Corpus]
LCDD = 10%
word | sn | sp | pos | definition
be | 10 | Y | n | [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
english | 2 | N | n | [the, D, -] [people, N, ?] [of, -, -] [england, N, ?]
people | 1 | Y | n | [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
... | ... | ... | ... | ...
LCDD = 0.3%
word | sn | sp | pos | definition
be | 10 | Y | n | [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
english | 2 | N | n | [the, D, -] [people, N, 1] [of, P, -] [england, N, 3]
people | 1 | Y | n | [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
... | ... | ... | ... | ...
Tractable Bilingual Knowledge Bank (BKB)
[Diagram: sense-tagged, aligned tree pairs for Malay-English examples, with span indices on every node.
Example 1: "dia kutip bola itu" / "he pick the ball up", with nodes dia(1)[n], kutip(1)[v], bola(1)[n], itu(1)[det] and he(1)[n], pick(1)[v] up(1)[p], the(1)[det], ball(1)[n].
Example 2: "lelaki tua itu kutip bola itu" / "the old man pick the ball up", with nodes lelaki(3)[n], tua(2)[adj], itu(1)[det], kutip(2)[v], bola(1)[n] and man(4)[n], old(3)[adj], the(2)[det], pick(1)[v] up(1)[p], ball(1)[n].]
Please send any comments to [email_address]
Step 2: Compute the frequency of each sense entry in the dictionary according to its appearances in definition text.
Sort the list by frequency
An entry with high frequency => high probability that the entry is a primitive
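The frequency step can be sketched with a counter over all definition texts. The sketch reuses the six-sense toy graph from the extractor example and assumes definitions are already lists of sense identifiers; `sense_frequencies` is a hypothetical name.

```python
from collections import Counter

def sense_frequencies(dictionary):
    """Count how often each sense appears in definition text, most frequent first."""
    counts = Counter()
    for definition in dictionary.values():
        counts.update(definition)
    return counts.most_common()

# The six-sense example: each entry lists the senses used in its definition.
dictionary = {
    "sense_1": ["sense_2", "sense_5", "sense_6"],
    "sense_2": ["sense_3", "sense_2"],
    "sense_3": ["sense_1", "sense_2"],
    "sense_4": ["sense_5"],
    "sense_5": ["sense_2", "sense_4"],
    "sense_6": ["sense_5", "sense_4"],
}

for sense, frequency in sense_frequencies(dictionary):
    print(sense, frequency)
```

Entries at the top of the sorted list are the candidates with the highest chance of being primitives under this heuristic.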
Possibility of selecting wrong semantic primitives based on the self-reference method
Semantic Primitive Extractor (cont)
Sense | Frequency
be10 | 40
english2 | 20
Improving the quality of a number of Natural Language Processing Tasks:
Internet Search Engines
WSD (Cont)
Path value = previous path value + difference between the two consecutive paths
Path | father | marry | couple | Path value
P1 | father1 | marry1 | couple1 | P1
P2 | father1 | marry1 | couple2 | P1 + D1
P3 | father1 | marry2 | couple1 | P2 + D2
P4 | father1 | marry2 | couple2 | P3 + D3
P5 | father2 | marry1 | couple1 | P4 + D4
P6 | father2 | marry1 | couple2 | P5 + D5
P7 | father2 | marry2 | couple1 | P6 + D6
P8 | father2 | marry2 | couple2 | P7 + D7