Pronominal Anaphora
                    Resolution
            in Nepali Language

                       by

    Dev Bahadur Poudel(03314)
      Bivod Aale Magar(03307)
     Nepal Engineering College
        Changunarayan, Bhaktapur



1           07/31/12        Pronominal Anaphora Resolution
Contents

     Brief Introduction and Background
     Approach to Algorithm
     Implementation in Nepali Discourse
     Overview of our system
     Scope of our system
     Conclusion




What is Anaphora?

    Reference to an entity that has
     been previously introduced in the
     discourse.



What is Anaphora Resolution?

    Process of determining the
     antecedent of an anaphor.




Anaphor resolution in Nepali


    राम स्कूल जान्छ । ऊ घर फर्कन्छ ।
    (Ram goes to school. He returns home.)

      Antecedent: राम (Ram)
      Anaphor: ऊ (he)

                   ऊ = राम
Can a Machine Resolve the Anaphora?

     Humans can easily work out which
      referent an anaphor points to.
     Can we build a system that resolves
      anaphors to their antecedents?




Corpus

     A collection of linguistic data, either written
      texts or a transcription of recorded speech,
      which can be used as a starting-point of
      linguistic description or as a means of
      verifying hypotheses about a language.




Unicode

       An industry standard that allows computers to
        represent and manipulate text consistently
       Consists of more than 100,000 characters, a set of code
        charts for visual reference, an encoding methodology
        and a set of standard character encodings
       Unlike ASCII, which uses 7 bits for each character,
        Unicode defines code points up to U+10FFFF; its UTF-16
        encoding uses one or two 16-bit units per character.
        Devanagari, the script used for Nepali, occupies
        U+0900 to U+097F.
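
Java strings store text as UTF-16, and each Devanagari letter of a Nepali word maps to one code point in the range U+0900 to U+097F. The following is a minimal sketch of inspecting those code points; the class and method names are illustrative and are not taken from the original system. The word राम is written with Unicode escapes so that the source file encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of the original system.
class CodePointDemo {
    // Returns each code point of the text formatted as U+XXXX.
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    // True when every code point lies in the Devanagari block (0x0900 to 0x097F).
    static boolean isDevanagari(String text) {
        return text.codePoints().allMatch(cp -> cp >= 0x0900 && cp <= 0x097F);
    }

    public static void main(String[] args) {
        String ram = "\u0930\u093E\u092E"; // the name Ram in Devanagari
        System.out.println(codePoints(ram));   // [U+0930, U+093E, U+092E]
        System.out.println(isDevanagari(ram)); // true
    }
}
```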


Approach to the Algorithm

     Non-Probabilistic
      –   Lappin and Leass Algorithm (1994)
      –   Hobbs's Tree Search Algorithm (1978)
     Probabilistic
      –   Centering Algorithm
      –   Mitkov’s weak knowledge algorithm



Approach to the Algorithm

      Lappin and Leass Algorithm (1994)
      An algorithm based on the salience
       factors assigned to nouns and pronouns.




Salience factors in Lappin and
     Leass's Algorithm.

      Factor                                                  Weight
      Sentence recency                                        100
      Subject emphasis                                        80
      Existential emphasis                                    70
      Accusative (direct object) emphasis                     50
      Indirect object and oblique complement emphasis         40
      Non-adverbial emphasis                                  50
      Head noun emphasis                                      80
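
As a rough illustration, the weights above can be held in an enum and summed for the factors that apply to a given mention. The enum and method names below are assumptions made for this sketch, not the original implementation; only the weights come from the slide.

```java
import java.util.EnumSet;
import java.util.Set;

// Salience factors with the weights listed on the slide; the type itself is illustrative.
enum SalienceFactor {
    SENTENCE_RECENCY(100),
    SUBJECT(80),
    EXISTENTIAL(70),
    DIRECT_OBJECT(50),
    INDIRECT_OBJECT_OR_OBLIQUE(40),
    NON_ADVERBIAL(50),
    HEAD_NOUN(80);

    final int weight;

    SalienceFactor(int weight) {
        this.weight = weight;
    }

    // Sum of the weights of all factors that apply to one mention.
    static int total(Set<SalienceFactor> factors) {
        int sum = 0;
        for (SalienceFactor f : factors) {
            sum += f.weight;
        }
        return sum;
    }
}

class SalienceWeightsDemo {
    public static void main(String[] args) {
        // A head noun in subject position that is not inside an adverbial phrase.
        Set<SalienceFactor> subject = EnumSet.of(
                SalienceFactor.SENTENCE_RECENCY,
                SalienceFactor.SUBJECT,
                SalienceFactor.NON_ADVERBIAL,
                SalienceFactor.HEAD_NOUN);
        System.out.println(SalienceFactor.total(subject)); // 310
    }
}
```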
Implementation

     Can be implemented in different
      languages, such as Java or PHP
     Our system uses Java




Block Diagram of the system



     Input -> Tokenizer and Tagger -> Salience Factor Assigner -> Output
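
The tokenizer stage of the diagram might look roughly like the sketch below, which assumes that sentences end with the Devanagari danda (।, U+0964) and that words are separated by whitespace. The class name and this splitting rule are assumptions for illustration; the original tokenizer is not shown in the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative tokenizer: splits a paragraph into sentences, then into word tokens.
class NepaliTokenizer {
    private static final String DANDA = "\u0964"; // Devanagari sentence-final mark

    static List<List<String>> tokenize(String paragraph) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : paragraph.split(DANDA)) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces between or after dandas
            }
            sentences.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // "Ram goes to school. He returns home." in Devanagari, written with escapes.
        String paragraph =
                "\u0930\u093E\u092E \u0938\u094D\u0915\u0942\u0932 \u091C\u093E\u0928\u094D\u091B \u0964 "
              + "\u090A \u0918\u0930 \u092B\u0930\u094D\u0915\u0928\u094D\u091B \u0964";
        List<List<String>> sentences = tokenize(paragraph);
        System.out.println(sentences.size());        // 2
        System.out.println(sentences.get(0).size()); // 3
    }
}
```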




Flowchart

      START
      Input paragraph
      Take a sentence
      Tokenize
      Take a token
      Check in corpus (no: log error; yes: continue)
      Classify as noun or pronoun
      Classify as subject/object
      Assign salience value
      Calculate total weights
      Next sentence? (yes: halve the salience values and take the next sentence; no: continue)
      Determine correct referents
      Display results
      END
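
The salience bookkeeping in this flow, halving the stored values at each new sentence and then choosing the most salient remaining candidate, can be sketched as follows. `DiscourseModel`, its method names, and the entities "A" and "B" are hypothetical; the ruled-out set stands in for whatever syntactic filtering the real system applies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical discourse model: entity name -> accumulated salience.
class DiscourseModel {
    private final Map<String, Double> salience = new HashMap<>();

    // Halve every stored value when moving on to the next sentence.
    void startNextSentence() {
        salience.replaceAll((entity, value) -> value / 2.0);
    }

    // Add the salience earned by a mention in the current sentence.
    void addMention(String entity, double score) {
        salience.merge(entity, score, Double::sum);
    }

    double salienceOf(String entity) {
        return salience.getOrDefault(entity, 0.0);
    }

    // Highest-scoring entity that is not ruled out; null if none remains.
    String resolve(Set<String> ruledOut) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : salience.entrySet()) {
            if (ruledOut.contains(e.getKey())) {
                continue;
            }
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        DiscourseModel model = new DiscourseModel();
        model.addMention("A", 310); // e.g. a subject
        model.addMention("B", 280); // e.g. a direct object
        model.startNextSentence();  // A -> 155.0, B -> 140.0
        System.out.println(model.resolve(Collections.emptySet()));    // A
        System.out.println(model.resolve(Collections.singleton("A"))); // B
    }
}
```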
User Interface




An Example in Nepali

       1. राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)
       2. हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)
       3. उसले उसलाई देखायो । (He showed it to him.)


1. राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)




         Divide the salience values by 2
              when reading the next sentence




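2. हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)


      हरि (Hari) gets 310 (recency: 100 + subject: 80 + non-adverbial: 50 + head noun: 80)
      त्यो (it) gets 280 (recency: 100 + object: 50 + non-adverbial: 50 + head noun: 80)
      त्यो is resolved to घडी (watch) because of the high salience
       value of घडी
      पसल (shop) gets 230 (recency: 100 + non-adverbial: 50 + head noun: 80)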
Updated Discourse Model




      Divide the previous salience factors by two




3. उसले उसलाई देखायो । (He showed it to him.)
      उसले (he) is resolved to हरि (Hari) because of its high salience value. Add the salience
       factors (recency: 100 + subject: 80 + non-adverbial: 50 + head noun: 80) = 310
      उसलाई (him) cannot be हरि because of syntactic constraints, so उसलाई is
        resolved to राम (Ram). (recency: 100 + indirect object: 40 + non-adverbial: 50 + head noun: 80) = 270

                     Updated Discourse Model




Result
     Paragraphs      Samples   Total         Total      Correctly   Incorrectly   Zero       Efficiency
     using           used      antecedents   anaphors   resolved    resolved      anaphors

     2 sentences     15        37            22         15          7             0          68%
     3 sentences     15        50            37         28          9             0          75%
     4 sentences     10        35            35         22          11            2          62.8%
     5 sentences     10        43            41         25          14            2          60.9%
     > 5 sentences   5         28            31         17          11            3          54%
     Total           55        193           166        107         52            7          64%
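
The efficiency column appears to be the number of correctly resolved anaphors divided by the total number of anaphors. That reading is an assumption, but it reproduces every reported figure once the fraction is truncated. A minimal sketch, with an illustrative class name:

```java
// Illustrative check of the efficiency column; not part of the original system.
class EfficiencyDemo {
    // Efficiency as a percentage: correctly resolved anaphors over all anaphors.
    static double efficiency(int correctlyResolved, int totalAnaphors) {
        if (totalAnaphors <= 0) {
            throw new IllegalArgumentException("totalAnaphors must be positive");
        }
        return 100.0 * correctlyResolved / totalAnaphors;
    }

    public static void main(String[] args) {
        // {correctly resolved, total anaphors} for each row of the table, then the total.
        int[][] rows = { {15, 22}, {28, 37}, {22, 35}, {25, 41}, {17, 31}, {107, 166} };
        for (int[] row : rows) {
            System.out.println(String.format("%d / %d = %.1f%%",
                    row[0], row[1], efficiency(row[0], row[1])));
        }
    }
}
```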


Scope of the Project
       -Natural language processing
       -Question answering
       -Text summarization
       -Information extraction
       -Interaction with query interfaces and dialogue interpretation
       -Natural Language
Limitations

        The lack of a tagger and parser for Nepali keeps the system
         from scaling to a large corpus, so we had to rely on a
         hand-annotated corpus.
        Input sentences are limited to the words defined in
         our corpus.
        The system handles only third-person pronouns,
         and not reflexives.




Further Work
        Morphological analysis can be added.
        The system can be extended to work on a larger
         number of sentences.
        This project can be combined with other NLP projects
         for the Nepali language for further research.
        Statistical methods can be applied to achieve higher
         efficiency.



Conclusion

      Research to see how a basic approach like
       Lappin and Leass performs for the Nepali language.
      Applies to non-reflexive third-person pronouns.
      An emerging concept for the Nepali language
      Understanding discourse is challenging for
       computer intelligence
      Without a tagger and parser, our system is heavily
       dictionary dependent
      Our work can aid future research in the Nepali
       language
Thank You.



