Limbengtatt25july2002
Presentation Transcript

  • Building a Primitive-based Lexical Consultation System. Prepared by Lim Beng Tat. Supervisors: Dr. Tang Enya Kong, Dr. Guo Cheng Ming
  • Abstract This research presents the design of a semantic-primitive-based lexical consultation system and the processes that are performed on a machine-readable dictionary (MRD) and a corpus to automatically produce a machine-tractable dictionary (MTD) and a tractable corpus. Linguistic tools (such as a sense tagger) and resources are created during or after these processes. The research also shows how the newly constructed MTD can be used for unsupervised word sense disambiguation on samples of unrestricted text from various prospective application areas. This matters for applications that need lexical semantics, such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing and text processing.
  • Outline
    • Introduction
      • Problem
      • Objective
    • Lexical Consultation System
      • System design and architecture
    • Example applications
      • Bilingual Knowledge Bank
  • Introduction
    • Dictionaries
      • Supply knowledge (language and world)
      • E.g. Collins English Dictionary (CED), Longman's Dictionary of Contemporary English (LDOCE) and Webster's 9th Dictionary (W9)
    word     sn   pos  definition
    be       10   n    spend or use time
    english  2    n    people of england
    ...      ...  ...  ...
  • Introduction (Cont)
    • Explicit information (POS)
    • Implicit information / semantic information
      • Hypernym/hyponym relations (class/subclass)
      • Synonymy/Antonymy relations
      • Meronym/Holonym relation (part/whole, ...)
      • Collocational relations (compounds, idioms, ...), etc.
  • Introduction (Cont)
    • Problem: how can semantic information be extracted from a dictionary?
    • 2 methods:
      • Defining pattern
        • Identify a significant recurring phrase
        • E.g. “a member of” - NP
          • hand: a member of a ship's crew… [W9]
      • Extraction of semantic hierarchy
        • Extraction of hyponyms.
        • E.g. dipper: a ladle used for dipping... [CED]
        • ladle: a long-handled spoon... [CED]
        • spoon: a metal, wooden, or plastic utensil ... [CED]
        • Resulting hierarchy (most general first): utensil > spoon > ladle > dipper
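The hierarchy extraction above can be sketched in a few lines. This is only an illustration, not the method used in the research: the definitions are the three CED examples from the slide, while the `GENUS` table (the head noun of each definition) is filled in by hand as an assumption, since the slide does not say how the head noun is found.

```python
# Definitions copied from the slide (CED examples).
DEFINITIONS = {
    "dipper": "a ladle used for dipping...",
    "ladle": "a long-handled spoon...",
    "spoon": "a metal, wooden, or plastic utensil ...",
}
# Assumption: the genus term (head noun) of each definition is given by hand.
GENUS = {"dipper": "ladle", "ladle": "spoon", "spoon": "utensil"}


def hypernym_chain(word, genus):
    """Follow genus links upward; stop at a word with no entry or at a cycle."""
    chain = [word]
    seen = {word}
    while chain[-1] in genus:
        parent = genus[chain[-1]]
        if parent in seen:  # circular definitions would otherwise loop forever
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# Sanity check: every hand-picked genus term occurs in its definition text.
assert all(GENUS[w] in DEFINITIONS[w] for w in GENUS)

print(" > ".join(reversed(hypernym_chain("dipper", GENUS))))
# prints: utensil > spoon > ladle > dipper
```

The `seen` set is there because dictionary definitions can be circular, in which case a naive walk up the chain would never terminate.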
  • Introduction (Cont)
    • Disadvantages:
      • Circularity
        • E.g. tool: an implement, such as a hammer... [CED]
        • implement: a piece of equipment; tool or utensil. [CED]
        • utensil: an implement, tool or container... [CED]
      • Inconsistency in dictionaries
        • E.g. corkscrew: a pointed spiral piece of metal... [W9]
        • dinner service: a complete set of plates and dishes... [LDOCE]
      • Dictionaries are written for human use
    • Other methods:
      • Semantic primitive and word sense disambiguation
  • Semantic Primitive
    • A semantic primitive refers to a “core” meaning that cannot be analyzed further
      • E.g. bachelor and red
      • bachelor means that someone is a man who is not married
      • What does red mean ?
      • red represents semantic primitive (a basic meaning), while bachelor does not.
  • Semantic Primitive (Cont)
    • 2 types of semantic primitive
      • Prescriptive and descriptive
    • Prescriptive semantic primitives
      • Set of pre-defined primitive
      • E.g. father marry couple
          • marry: [human, human]
          • father: [human]
          • couple: [human, thing]
      • To choose the correct sense of ‘couple’
    • Prescriptive semantic primitives
      • Problem: always need to be extended
    • Descriptive semantic primitives
      • Set of semantic primitives derived from a natural source of data, such as a dictionary.
      • E.g.
        • father5 - a term#5 of address for priest#2 in some church especially roman#7 or orthodox#3 catholic
        • marry3 - perform#1 a marriage#4 ceremony
        • couple1 - a pair#5 of people#5 who live#7 together#2
      • Uniquely identifies each definition of an entry
      • Avoids circularity
  • Word Sense Disambiguation(WSD)
    • Documents are collections of sentences containing words
    • Some words have more than one meaning. These meanings are often called word senses.
    • Goal:
      • Assign meanings to words in some context according to some lexical resource.
  • Objective
    • Producing a Machine-Tractable Dictionary (MTD) from a Machine-Readable Dictionary (MRD) using descriptive semantic primitives and WSD
    • Producing a tractable database/corpus from a database/corpus
  • Linguistic Resources
    • Machine-Tractable dictionary
      • Encoded with information extracted from MRD
      • Usable format and highly structured semantic information for NLP tasks
    word     sn   sp  pos  definition
    be       10   Y   n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
    english  2    N   n    [the, D, -] [people, N, 1] [of, P, -] [england, N, 1]
    ...      ...  ... ...  ...
    LCDD = 0.1 %
    • Callout on the sp column: descriptive semantic primitives
    • Callout on LCDD: determining the relatedness or closeness among word senses in a dictionary
  • Lexical Consultation System
    • Semantic Primitive Extractor
    • LCDD Generator
    • WSD
  • Semantic Primitive Extractor
    • Searching for self-reference circles in definitions
      • For example,
      • sense_1 [def] [sense_2 sense_5 sense_6]
      • sense_2 [def] [sense_3 sense_2]
      • sense_3 [def] [sense_1 sense_2]
      • sense_4 [def] [sense_5]
      • sense_5 [def] [sense_2 sense_4]
      • sense_6 [def] [sense_5 sense_4]
    =>sense_1 is a semantic primitive
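The search for a self-reference circle can be illustrated as a graph search. The sketch below is an assumption about how such a check might look, not the extractor itself: it uses breadth-first search to find a shortest path that leaves a sense and returns to it through the senses referenced in definitions. The function name `find_self_reference_circle` is hypothetical; the reference lists are the ones on the slide.

```python
from collections import deque

# Reference lists copied from the slide: each sense maps to the senses
# that occur in its definition.
REFS = {
    "sense_1": ["sense_2", "sense_5", "sense_6"],
    "sense_2": ["sense_3", "sense_2"],
    "sense_3": ["sense_1", "sense_2"],
    "sense_4": ["sense_5"],
    "sense_5": ["sense_2", "sense_4"],
    "sense_6": ["sense_5", "sense_4"],
}


def find_self_reference_circle(refs, start):
    """Return a shortest path start -> ... -> start, or None if none exists."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for nxt in refs.get(path[-1], []):
            if nxt == start:  # the definition chain leads back to the start
                return path + [start]
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_self_reference_circle(REFS, "sense_1"))
# prints: ['sense_1', 'sense_2', 'sense_3', 'sense_1']
```

Other senses in this toy graph also lie on circles (for example sense_4 and sense_5 refer to each other), and the slide does not say how the extractor chooses among them, so this check should be read as a necessary condition only.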
  • Semantic Primitive Extractor (cont)
    • Step 1: Expanding dictionary
      • Before:
        abandon 1   a feeling of extreme emotional intensity
        abandon 2   leave behind
        ...
        betray 2    abandon
      • After:
        abandon 1   a feeling of extreme emotional intensity
        abandon 2   leave behind
        ...
        betray 2    abandon1 abandon2
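A minimal sketch of the dictionary expansion step, assuming that every word in a definition which is itself a headword gets replaced by all of its senses (as in the betray 2 example, where abandon becomes abandon1 abandon2). The data layout and the function name `expand_definition` are illustrative assumptions.

```python
# Toy dictionary keyed by (word, sense number); entries follow the slide.
DICTIONARY = {
    ("abandon", 1): "a feeling of extreme emotional intensity",
    ("abandon", 2): "leave behind",
    ("betray", 2): "abandon",
}


def expand_definition(definition, dictionary):
    """Replace each defined word by all of its senses, e.g. abandon -> abandon1 abandon2."""
    senses = {}
    for word, number in sorted(dictionary):
        senses.setdefault(word, []).append(f"{word}{number}")
    expanded = []
    for token in definition.split():
        # Words without an entry in the dictionary are kept unchanged.
        expanded.extend(senses.get(token, [token]))
    return " ".join(expanded)


print(expand_definition(DICTIONARY[("betray", 2)], DICTIONARY))
# prints: abandon1 abandon2
```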
  • Semantic Primitive Extractor (cont)
    • Step 2: identify semantic primitives using self-reference circles
    • Example:
      • Extract primitives from the pre-released WordNet during SENSEVAL2.
        • Pre-released WordNet1.7 = 192,460 entries
        • Extracted primitives = 9,368 entries (around 5% of the pre-released WordNet1.7 entries)
  • LCDD generator
    • Identify the word senses’ definition layers
      • First layer for forecast2 and fixed6
      • Second layer for forecast2 and fixed6
        • forecast2
        • fixed6
    [Figure: definition layers, reconstructed from the slide]
    forecast2: predict1 in advance3
      predict1: make3 a prediction1 about
      advance3: a change1 for the better2 progress4
    fixed6: specify1 in advance3
      specify1: be specific1 about
      advance3: a change1 for the better2 progress4
  • LCDD generator (Cont)
    • Depth-First Method
    • LCDD(forecast2, fixed6) = a*70% + (b + c + d)/3*30%
      • a, b, c, d: comparisons between the definition layers of the two senses (Layer 1 and Layer 2 for forecast2, Layer 1 and Layer 2 for fixed6)
      • Layer 1 for forecast2: predict1 in advance3
      • Layer 1 for fixed6: specify1 in advance3
      • a = 1/[(2+2)/2] = 0.5
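The LCDD formula can be worked through in code under some assumptions that the slide does not confirm: `a` is read as the number of shared senses between the two first layers divided by the average layer size (which reproduces a = 1/[(2+2)/2]), and `b`, `c`, `d` are taken to be the same ratio for the remaining layer pairings. Only sense-tagged words are counted. The function names are hypothetical.

```python
def overlap_ratio(senses_a, senses_b):
    """Shared senses divided by the average size of the two sets."""
    a, b = set(senses_a), set(senses_b)
    if not a or not b:
        return 0.0
    return len(a & b) / ((len(a) + len(b)) / 2)


def lcdd(a, b, c, d):
    """Weighted combination from the slide: 70% for a, 30% for the mean of b, c, d."""
    return 0.7 * a + 0.3 * (b + c + d) / 3


# Sense-tagged words of each layer, taken from the slide's example.
layer1_forecast2 = ["predict1", "advance3"]
layer1_fixed6 = ["specify1", "advance3"]
layer2_forecast2 = ["make3", "prediction1", "change1", "better2", "progress4"]
layer2_fixed6 = ["specific1", "change1", "better2", "progress4"]

a = overlap_ratio(layer1_forecast2, layer1_fixed6)
# Assumption: b, c, d compare the remaining layer pairings.
b = overlap_ratio(layer1_forecast2, layer2_fixed6)
c = overlap_ratio(layer2_forecast2, layer1_fixed6)
d = overlap_ratio(layer2_forecast2, layer2_fixed6)

print(a)
# prints: 0.5
print(lcdd(a, b, c, d))
```

Giving the first layer 70% of the weight means that a direct overlap between the two definitions counts for more than overlaps found one level deeper.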
  • WSD
      • Simple Summation Algorithm
        • For example, assume a sentence containing the words ‘father’, ‘marry’ and ‘couple’, where each word has only two senses. The possible combinations are:
          • father1 marry1 couple1
          • father1 marry1 couple2
          • father1 marry2 couple1
          • father1 marry2 couple2
          • father2 marry1 couple1
          • father2 marry1 couple2
          • father2 marry2 couple1
          • father2 marry2 couple2
        • Dynamic programming techniques
    [Figure: repetitive calculation. Values as extracted from the slide: repeated partial sums 15.0, 15.0, 35.0, 10.0; combination totals = 40.0, = 40.0, = 60.0, = 45.0, = 35.0, = 24.0, = 40.0, = 40.0]
    • The best combination of word senses: father1 marry2 couple1
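The simple summation idea can be sketched as a brute-force search: enumerate every combination of senses, sum a relatedness score over all sense pairs in the combination, and keep the combination with the highest total. The scores in `RELATEDNESS` are hypothetical values chosen for illustration (in the system they would presumably come from LCDD), and the pairwise-sum scoring is itself an assumption.

```python
from itertools import combinations, product

SENSES = {
    "father": ["father1", "father2"],
    "marry": ["marry1", "marry2"],
    "couple": ["couple1", "couple2"],
}

# Hypothetical relatedness scores between sense pairs (not from the slides).
RELATEDNESS = {
    frozenset(("father1", "marry1")): 1.0, frozenset(("father1", "marry2")): 5.0,
    frozenset(("father2", "marry1")): 2.0, frozenset(("father2", "marry2")): 1.0,
    frozenset(("marry1", "couple1")): 2.0, frozenset(("marry1", "couple2")): 1.0,
    frozenset(("marry2", "couple1")): 6.0, frozenset(("marry2", "couple2")): 2.0,
    frozenset(("father1", "couple1")): 3.0, frozenset(("father1", "couple2")): 1.0,
    frozenset(("father2", "couple1")): 1.0, frozenset(("father2", "couple2")): 2.0,
}


def path_value(path):
    """Sum the relatedness of every pair of senses in one combination."""
    return sum(RELATEDNESS[frozenset(pair)] for pair in combinations(path, 2))


def best_combination(senses):
    """Try all sense combinations and return the one with the highest sum."""
    return max(product(*senses.values()), key=path_value)


print(best_combination(SENSES))
# prints: ('father1', 'marry2', 'couple1')
```

With n words of k senses each, this evaluates k^n combinations, and many pair scores are looked up repeatedly, which is the repetitive calculation that dynamic programming techniques aim to avoid.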
  • System Design
    • [Diagram labels] General Dictionary (MTD) + Domain MRD; Domain Database/Corpus; Lexical Consultation System; Domain MTD for WSD; Tractable Domain Database/Corpus
  • System Architecture
    • [Diagram labels] Papillon Dictionaries or FEM; Bilingual Knowledge Bank (BKB)
    • Part-of-speech tagging (Auto)
    • Semantic Primitive (SP) identification
    • SP WSD (Auto)
    • SP LCDD generator (Auto)
    Domain Semantic primitive (MTD); General Dictionary (MTD); Domain MRD
    • Part-of-speech tagging(Auto)
    • WSD (Auto)
    • LCDD generation (Auto)
    Domain MTD
    • Part-of-speech tagging(Auto)
    • WSD (Auto)
    Domain Database/Corpus; Tractable Domain Database/Corpus
    LCDD=10%
    word     sn   sp  pos  definition
    be       10   Y   n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
    english  2    N   n    [the, D, -] [people, N, ?] [of, -, -] [england, N, ?]
    people   1    Y   n    [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
    ...      ...  ... ...  ...
    LCDD=0.3%
    word     sn   sp  pos  definition
    be       10   Y   n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
    english  2    N   n    [the, D, -] [people, N, 1] [of, P, -] [england, N, 3]
    people   1    Y   n    [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
    ...      ...  ... ...  ...
  • Tractable Bilingual Knowledge Bank (BKB)
    • [Figure: sense-tagged, aligned tree structures for Malay-English sentence pairs; node annotations as extracted from the slide]
    • Example 1: “dia kutip bola itu” / “he pick the ball up”
      • Malay nodes: kutip(1)[v] (3-4/3-4), dia(1)[n] (0-1/0-1), bola(1)[n] (2-3/2-4), itu(1)[det] (3-4/3-4)
      • English nodes: pick(1)[v] up(1)[p] (3-4+7-8/3-4), he(1)[n] (0-1/0-1), ball(1)[n] (3-4/2-4), the(1)[det] (2-3/2-3)
      • Correspondences: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4) (2-3,3-4) (3-4,2-3) (0-1,0-1)
      • Matched subtree pair: he(1)[n] (0-1/0-1) / dia(1)[n] (0-1/0-1)
    • Example 2: “0 lelaki 1 tua 2 itu 3 kutip 4 bola 5 itu 6” / “0 the 1 old 2 man 3 pick 4 the 5 ball 6 up 7”
      • Malay nodes: kutip(2)[v] (3-4/3-4), lelaki(3)[n] (0-1/0-3), tua(2)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3), bola(1)[n] (2-3/2-4), itu(1)[det] (3-4/3-4)
      • English nodes: pick(1)[v] up(1)[p] (3-4+7-8/3-4), man(4)[n] (2-3/0-3), the(2)[det] (0-1/0-1), old(3)[adj] (1-2/1-2), ball(1)[n] (3-4/2-4), the(2)[det] (2-3/2-3)
    • Thank you
    • Please send any comments to [email_address]
  • Semantic Primitive Extractor (cont)
    • Step 2: compute the frequency of each sense entry in the dictionary according to how often it appears in definition text.
      • Sort the list by frequency
        • an entry with high frequency =>
        • high probability that the entry is a primitive
      • Problems:
        • Empty definition
        • Possibility of selecting wrong semantic primitives based on the self-reference method
    Sense     frequency
    be10      40
    english2  20
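The frequency count in Step 2 can be sketched with a counter over sense-tagged definitions. The `word#n` notation follows the earlier descriptive-primitive examples; the toy definitions below (in particular the citizen1 entry) are illustrative assumptions, not dictionary data from the research.

```python
import re
from collections import Counter

# Toy sense-tagged definitions; the citizen1 entry is hypothetical.
DEFINITIONS = {
    "english2": "the people#1 of england#1",
    "people1": "the body#2 of citizen#1 of a state#1 or country#2",
    "citizen1": "a native#1 of a state#1",
}

SENSE_TAG = re.compile(r"(\w+)#(\d+)")


def sense_frequencies(definitions):
    """Count how often each sense (word + sense number) occurs in definition text."""
    counts = Counter()
    for text in definitions.values():
        for word, number in SENSE_TAG.findall(text):
            counts[f"{word}{number}"] += 1
    return counts


# Senses sorted by frequency; frequent senses are candidate primitives.
print(sense_frequencies(DEFINITIONS).most_common(1))
# prints: [('state1', 2)]
```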
  • WSD (Cont)
    • Improving the quality of a number of Natural Language Processing Tasks:
      • Machine Translation
      • Information Extraction
      • Internet Search Engines
  • WSD (Cont)
    • Path value = previous path value + difference between the two consecutive paths
    Path  Word senses              Path value      Difference
    P1    father1 marry1 couple1   P1
    P2    father1 marry1 couple2   P2 = P1 + D1    D1
    P3    father1 marry2 couple1   P3 = P2 + D2    D2
    P4    father1 marry2 couple2   P4 = P3 + D3    D3
    P5    father2 marry1 couple1   P5 = P4 + D4    D4
    P6    father2 marry1 couple2   P6 = P5 + D5    D5
    P7    father2 marry2 couple1   P7 = P6 + D6    D6
    P8    father2 marry2 couple2   P8 = P7 + D7    D7
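The incremental update (each path value obtained from the previous one plus a difference) can be sketched as follows. This is one possible reading, under the assumption that a path value is the sum of pairwise relatedness scores and that the difference only involves pairs touching a sense that changed between consecutive paths. The scores and function names are hypothetical.

```python
from itertools import combinations, product

WORDS = [["father1", "father2"], ["marry1", "marry2"], ["couple1", "couple2"]]

# Hypothetical relatedness scores, keyed by "senseA senseB" (not from the slides).
SCORES = {
    "father1 marry1": 1.0, "father1 marry2": 5.0, "father2 marry1": 2.0,
    "father2 marry2": 1.0, "marry1 couple1": 2.0, "marry1 couple2": 1.0,
    "marry2 couple1": 6.0, "marry2 couple2": 2.0, "father1 couple1": 3.0,
    "father1 couple2": 1.0, "father2 couple1": 1.0, "father2 couple2": 2.0,
}


def rel(x, y):
    """Look up the score of a sense pair (x comes before y in the sentence)."""
    return SCORES[f"{x} {y}"]


def full_value(path):
    """Path value computed from scratch: sum over all sense pairs."""
    return sum(rel(path[i], path[j]) for i, j in combinations(range(len(path)), 2))


def incremental_values(words):
    """Compute P1 from scratch, then each P(k+1) = P(k) + D(k)."""
    paths = list(product(*words))
    values = [full_value(paths[0])]
    for prev, cur in zip(paths, paths[1:]):
        changed = {i for i in range(len(cur)) if prev[i] != cur[i]}
        diff = 0.0
        for i, j in combinations(range(len(cur)), 2):
            if i in changed or j in changed:  # unchanged pairs cancel out
                diff += rel(cur[i], cur[j]) - rel(prev[i], prev[j])
        values.append(values[-1] + diff)
    return paths, values


paths, values = incremental_values(WORDS)
best = paths[values.index(max(values))]
print(best)
# prints: ('father1', 'marry2', 'couple1')
```

When only the last word changes sense, the pair formed by the first two words is reused rather than recomputed, which is where the saving over the brute-force sum comes from.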