Pronominal Anaphora resolution

Pronominal Anaphora
Resolution
in Nepali Language

by

Dev Bahadur Poudel (03314)
Bivod Aale Magar (03307)
Nepal Engineering College
Changunarayan, Bhaktapur

1 07/31/12 Pronominal Anaphora Resolution

Contents

  Brief Introduction and Background
  Approach to Algorithm
  Implementation in Nepali Discourse
  Overview of our system
 Scope of our system
 Conclusion


What is Anaphora?

Reference to an entity that has
been previously introduced in the
discourse.


What is Anaphora Resolution?

Process of determining the
antecedent of an anaphor.


Anaphor resolution in Nepali

राम स्कूल जान्छ । ऊ घर फर्कन्छ ।
(Ram goes to school. He returns home.)

Anaphor: ऊ (he)
Antecedent: राम (Ram)

ऊ = राम

Can a machine resolve the anaphora?

  Human intelligence can easily work out which
referent an anaphor points to.
  Can we build a system that resolves
anaphors to their antecedents?


Corpus

 collection of linguistic data, either written
texts or a transcription of recorded speech,
which can be used as a starting-point of
linguistic description or as a means of
verifying hypotheses about a language.


Unicode

  an industry standard allowing computers to
represent and manipulate text consistently
  consists of more than 100,000 characters, a set of code
charts for visual reference, an encoding methodology
and a set of standard character encodings
  Unlike ASCII, which uses 7 bits for each character,
Unicode has a code space of more than one million code
points (U+0000 to U+10FFFF); the common UTF-16 encoding
stores each character in one or two 16-bit units.
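
As a small illustration of why Unicode matters for Nepali text, the sketch below checks whether a token is written entirely in the Devanagari block (U+0900 to U+097F). The class and method names are hypothetical and are not taken from the authors' system; only the standard `Character.UnicodeBlock` API is used.

```java
/** Illustrative helper (hypothetical name): detects Devanagari tokens. */
final class DevanagariCheck {
    /** Returns true if the token is non-empty and every code point is in the Devanagari block. */
    static boolean isDevanagari(String token) {
        if (token.isEmpty()) {
            return false;
        }
        return token.codePoints()
                .allMatch(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.DEVANAGARI);
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the example does not depend on the source file encoding.
        String ram = "\u0930\u093E\u092E"; // the name "Ram" in Devanagari
        System.out.println(isDevanagari(ram));   // true
        System.out.println(isDevanagari("Ram")); // false
    }
}
```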


Approach to the Algorithm

 Non-Probabilistic
– Lappin and Leass Algorithm (1994)
– A Tree Search Algorithm, Hobbs (1978)
 Probabilistic
– Centering Algorithm
– Mitkov’s weak knowledge algorithm


Approach to the Algorithm

  Lappin and Leass Algorithm (1994)
  Algorithm based on the salience
factors assigned to nouns and pronouns.


Salience factors in Lappin and
Leass's Algorithm.

 Sentence recency 100
 Subject emphasis 80
 Existential emphasis 70
 Accusative (direct object) emphasis 50
 Indirect object and oblique complement
emphasis 40
 Non-adverbial emphasis 50
 Head noun emphasis 80
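
The weights above can be written down as a simple lookup. This is only a sketch: the enum and method names are assumptions made for illustration, and the weights are the ones listed on this slide.

```java
import java.util.EnumSet;
import java.util.Set;

/** Salience factors with the weights listed on the slide (names are illustrative). */
enum SalienceFactor {
    SENTENCE_RECENCY(100),
    SUBJECT(80),
    EXISTENTIAL(70),
    ACCUSATIVE(50),
    INDIRECT_OBJECT_OR_OBLIQUE(40),
    NON_ADVERBIAL(50),
    HEAD_NOUN(80);

    final int weight;

    SalienceFactor(int weight) {
        this.weight = weight;
    }

    /** Sums the weights of all factors that apply to one mention. */
    static int total(Set<SalienceFactor> factors) {
        int sum = 0;
        for (SalienceFactor factor : factors) {
            sum += factor.weight;
        }
        return sum;
    }
}

final class SalienceFactorDemo {
    public static void main(String[] args) {
        // A head noun in subject position, outside an adverbial phrase, in the current sentence.
        Set<SalienceFactor> subjectMention = EnumSet.of(
                SalienceFactor.SENTENCE_RECENCY,
                SalienceFactor.SUBJECT,
                SalienceFactor.NON_ADVERBIAL,
                SalienceFactor.HEAD_NOUN);
        System.out.println(SalienceFactor.total(subjectMention)); // 310
    }
}
```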

Implementation

Can be implemented in different
languages, such as Java or PHP.
Our system uses Java.


Block Diagram of the system

Input → Tokenizer and Tagger → Salience Factor Assigner → Output


Flowchart

START
Input paragraph
Take a sentence
Tokenize
Take token
Check in corpus (no: log error; yes: classify as noun or pronoun)
Classify subject/object
Give salience value
Calculate total weights
Next sentence? (yes: halve the salience values and take the next sentence; no: determine correct referents)
Display results
END
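
The first steps of the flowchart (tokenizing a paragraph and checking each token against the corpus) might look roughly like the sketch below. It is not the authors' code: the class name, the `Tag` enum, the tiny lexicon and the romanized sample words are all assumptions for illustration, and sentences are assumed to end with the danda (U+0964).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TokenizeAndLookup {
    /** Hypothetical tag set; the slides only mention nouns and pronouns. */
    enum Tag { NOUN, PRONOUN }

    /** Splits a paragraph into sentences at the danda, then into whitespace-separated tokens. */
    static List<List<String>> tokenize(String paragraph) {
        List<List<String>> sentences = new ArrayList<>();
        for (String raw : paragraph.split("\u0964")) {
            String sentence = raw.trim();
            if (!sentence.isEmpty()) {
                sentences.add(Arrays.asList(sentence.split("\\s+")));
            }
        }
        return sentences;
    }

    /** Returns the tokens missing from the lexicon (the "log error" branch of the flowchart). */
    static List<String> unknownTokens(List<String> tokens, Map<String, Tag> lexicon) {
        List<String> unknown = new ArrayList<>();
        for (String token : tokens) {
            if (!lexicon.containsKey(token)) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Hand-built lexicon standing in for the hand-annotated corpus (romanized for readability).
        Map<String, Tag> lexicon = new HashMap<>();
        lexicon.put("Ram", Tag.NOUN);
        lexicon.put("ghadi", Tag.NOUN);
        lexicon.put("tyo", Tag.PRONOUN);

        String paragraph = "Ram ghadi kinna chahanchha \u0964 Harile tyo pasalma dekhyo \u0964";
        for (List<String> sentence : tokenize(paragraph)) {
            System.out.println(sentence + " unknown: " + unknownTokens(sentence, lexicon));
        }
    }
}
```

In this sketch every word absent from the lexicon is reported, which mirrors the dictionary dependence the slides describe later.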

User Interface


An Example in Nepali

1 = राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)
2 = हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)
3 = उसले उसलाई देखायो । (He showed it to him.)


1 = राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)

Decrease the salience values by a factor of 2
when reading the next sentence


2 = हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)

  हरि (Hari) gets 310 (rec: 100 + sub: 80 + non-adv: 50 + HN: 80)
  त्यो (it) gets 280 (rec: 100 + obj: 50 + non-adv: 50 + HN: 80)
  त्यो is resolved to घडी (watch) due to the high salience value of घडी
  पसल (shop) gets 230 (rec: 100 + non-adv: 50 + HN: 80)

Updated Discourse Model

Divide the previous salience factors by two


3 = उसले उसलाई देखायो । (He showed it to him.)
  उसले (he) is resolved to हरि (Hari) due to its high salience value. Add salience
factors (rec: 100 + sub: 80 + non-adv: 50 + HN: 80) = 310
  उसलाई (to him) cannot be हरि due to syntactic constraints, so उसलाई is
resolved to राम (Ram). (rec: 100 + ind obj: 40 + non-adv: 50 + HN: 80) = 270

Updated Discourse Model
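
The bookkeeping in this example (sum the factor weights for each mention, halve all scores when moving to the next sentence) can be sketched as follows. The class and method names are hypothetical, entity names are romanized, and adding a resolved pronoun's weights to the halved score of its antecedent is an assumption based on the Lappin and Leass approach rather than something the slides spell out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal salience store for the worked example (illustrative, not the authors' code). */
final class DiscourseModel {
    private final Map<String, Double> salience = new LinkedHashMap<>();

    /** Adds the summed factor weights of one mention to the entity's score. */
    void addMention(String entity, int... weights) {
        double sum = 0;
        for (int weight : weights) {
            sum += weight;
        }
        salience.merge(entity, sum, Double::sum);
    }

    /** Halves every score; called when the next sentence is read. */
    void nextSentence() {
        salience.replaceAll((entity, score) -> score / 2.0);
    }

    /** Current score of an entity, or 0 if it has not been mentioned. */
    double score(String entity) {
        return salience.getOrDefault(entity, 0.0);
    }

    public static void main(String[] args) {
        DiscourseModel model = new DiscourseModel();

        // Sentence 2: weights as given on the slide.
        model.addMention("Hari", 100, 80, 50, 80); // rec + sub + non-adv + HN = 310
        model.addMention("pasal", 100, 50, 80);    // rec + non-adv + HN = 230

        // Moving to sentence 3 halves the existing scores: Hari 155, pasal 115.
        model.nextSentence();

        // Sentence 3: the subject pronoun is resolved to Hari, so its 310 is added to Hari.
        model.addMention("Hari", 100, 80, 50, 80);

        System.out.println(model.score("Hari"));  // 465.0
        System.out.println(model.score("pasal")); // 115.0
    }
}
```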


Result
Paragraph     Total samples used  Total antecedents  Total anaphors  Correctly resolved  Incorrectly resolved  Zero anaphors  Efficiency
2-sentence    15                  37                 22              15                  7                     0              68%
3-sentence    15                  50                 37              28                  9                     0              75%
4-sentence    10                  35                 35              22                  11                    2              62.8%
5-sentence    10                  43                 41              25                  14                    2              60.9%
> 5-sentence  5                   28                 31              17                  11                    3              54%
Total         55                  193                166             107                 52                    7              64%
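
The efficiency column appears to be the number of correctly resolved anaphors divided by the total number of anaphors; this is inferred from the figures, not stated on the slide. A minimal sketch of that calculation, with a hypothetical class name:

```java
import java.util.Locale;

final class ResolutionEfficiency {
    /** Percentage of anaphors resolved correctly (assumed definition of "efficiency"). */
    static double percent(int correctlyResolved, int totalAnaphors) {
        if (totalAnaphors <= 0) {
            throw new IllegalArgumentException("totalAnaphors must be positive");
        }
        return 100.0 * correctlyResolved / totalAnaphors;
    }

    public static void main(String[] args) {
        // Totals row of the table: 107 correct out of 166 anaphors.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent(107, 166))); // 64.5%
    }
}
```

The percentages on the slide match these ratios with the extra digits dropped rather than rounded (for example 64.46 is shown as 64%).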


Scope of the Project
-Natural language processing
-Question answering
-Text summarization
-Information extraction
-Interaction with query interfaces and dialogue interpretation
-Natural Language

Limitations

  The lack of a tagger and parser limits the system on
large corpora, so a hand-annotated corpus had to be
used.
  The sentences are limited to the words defined in
our corpus.
  The system handles third-person pronouns only,
not reflexives.


Further Works
  Morphological analysis can be added.
  The system can be extended to work on a larger
number of sentences.
  This project can be combined with other NLP
projects in the Nepali language for further research.
  Statistical methods can be applied to achieve higher
efficiency.


Conclusion

  Research to see how a basic approach like
Lappin and Leass performs for the Nepali language.
  Applies to non-reflexive third-person pronouns.
  An emerging concept in the Nepali language.
  Understanding discourse is challenging for
computer intelligence.
  Without a tagger and parser, our system is heavily
dictionary-dependent.
  Our work aids future research in the Nepali
language.

Thank You.

