ANAPHORA RESOLUTION
       FINDWISE
Anaphoric Pronoun Resolution




Finding links
 • Pronoun to antecedent
Enriching text
 • Input: preprocessed document
 • Output: All found anaphoric pronoun
    references to words/phrases
Areas of use




Document summarization
 • Improving sentence comparisons
 • Enriching results
Entity level sentiment analysis
 • Adding more information to the input data
Ontology enrichment
 • Populating with more data
Question answering
 • Extracting more RDF triples
Preprocessing




 Required                     Additional
  • Sentence splitting         • Dependency parsing
  • Tokenization
  • Part-of-speech tagging
  • Named Entity Recognition
  • Gender Detection
Model representation




Anaphora pairs                         Candidate selection/ranking
 • Pronoun                              • Find pronoun
 • Antecedent                           • Pair with antecedent candidates
   - Entities                           • Filter out improbable pairs (rules)
   - Nouns, cardinals, foreign words    • Rank candidate pairs
                                        • Select the most probable
                                          candidate (if any)
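
The candidate selection steps above can be sketched roughly as follows. This is a minimal illustration, not the presented system: the `Mention` class, the Penn Treebank-style tag sets, the gender-only rule filter, and the distance-based `distance_score` ranker are all assumptions standing in for the real rules and trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical token-level mention produced by preprocessing."""
    text: str
    pos: str                  # PoS tag (Penn Treebank-style tags assumed)
    index: int                # token position in the document
    gender: str = "unknown"   # "male", "female", "neutral" or "unknown"


PRONOUN_TAGS = {"PRP", "PRP$"}
# Nouns, cardinals and foreign words are allowed as antecedent candidates.
CANDIDATE_TAGS = {"NN", "NNS", "NNP", "NNPS", "CD", "FW"}


def compatible(pronoun: Mention, candidate: Mention) -> bool:
    """Rule filter: drop pairs whose known genders disagree."""
    if "unknown" in (pronoun.gender, candidate.gender):
        return True
    return pronoun.gender == candidate.gender


def distance_score(pronoun: Mention, candidate: Mention) -> float:
    """Stand-in ranker: closer candidates get a higher score in (0, 1]."""
    return 1.0 / (1 + pronoun.index - candidate.index)


def resolve(
    mentions: List[Mention],
    score: Callable[[Mention, Mention], float] = distance_score,
    threshold: float = 0.0,
) -> Dict[int, Optional[Mention]]:
    """Map each pronoun's token index to its selected antecedent, or None."""
    links: Dict[int, Optional[Mention]] = {}
    for pronoun in mentions:
        if pronoun.pos not in PRONOUN_TAGS:
            continue  # step 1: find pronouns
        # Steps 2 and 3: pair with preceding candidates, filter improbable pairs.
        candidates = [
            m for m in mentions
            if m.pos in CANDIDATE_TAGS
            and m.index < pronoun.index
            and compatible(pronoun, m)
        ]
        if not candidates:
            links[pronoun.index] = None
            continue
        # Steps 4 and 5: rank, then select the best candidate (if any).
        best = max(candidates, key=lambda m: score(pronoun, m))
        links[pronoun.index] = best if score(pronoun, best) >= threshold else None
    return links


mentions = [
    Mention("Mary", "NNP", 0, "female"),
    Mention("John", "NNP", 2, "male"),
    Mention("She", "PRP", 4, "female"),
]
links = resolve(mentions)
```

In this toy input the gender rule removes "John" before ranking, so "She" is linked to "Mary" even though "John" is closer. Raising `threshold` makes the resolver return `None` for low-scoring pairs instead of forcing a link.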
Feature representation




Distance Features
 • Sentence distance
 • Hobbs distance
Antecedent Features
 • PoS-tag
 • Gender
 • Animacy
 • Number
 • Entity tag
 • ...
Overlap Features/Filters
 • Gender
 • Animacy
 • Number
 • Entity
Pronoun Features
 • Word string
 • Gender
 • Animacy
 • ...
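
As a rough illustration of how such features could be assembled for one pronoun/antecedent pair, consider the sketch below. The dictionary keys, the three-valued `agree` helper, and the feature names are assumptions made for this example. Hobbs distance is left out because it requires a syntactic parse tree.

```python
from typing import Dict, Optional, Union

FeatureValue = Union[int, str]


def agree(a: Optional[str], b: Optional[str]) -> str:
    """Overlap feature: 'match', 'mismatch', or 'unknown' if a value is missing."""
    if a is None or b is None:
        return "unknown"
    return "match" if a == b else "mismatch"


def pair_features(pronoun: dict, antecedent: dict) -> Dict[str, FeatureValue]:
    """Build a flat feature dictionary for one candidate pair."""
    return {
        # Distance features
        "sentence_distance": pronoun["sentence"] - antecedent["sentence"],
        # Overlap features (these can also be used as hard filters)
        "gender_overlap": agree(pronoun.get("gender"), antecedent.get("gender")),
        "animacy_overlap": agree(pronoun.get("animacy"), antecedent.get("animacy")),
        "number_overlap": agree(pronoun.get("number"), antecedent.get("number")),
        # Antecedent features
        "antecedent_pos": antecedent["pos"],
        "antecedent_entity": antecedent.get("entity", "O"),
        # Pronoun features
        "pronoun_word": pronoun["word"].lower(),
    }


pronoun = {"word": "She", "sentence": 1, "gender": "female",
           "animacy": "animate", "number": "singular"}
antecedent = {"word": "Mary", "sentence": 0, "pos": "NNP", "entity": "PERSON",
              "gender": "female", "animacy": "animate", "number": "singular"}
features = pair_features(pronoun, antecedent)
```

Keeping "unknown" as its own value, rather than treating it as a mismatch, lets a classifier learn how much weight to give missing gender or animacy information.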
Machine learning models


Models
 • Conditional Random Fields (CRF)
    - Mallet
 • Logistic Regression
    - Liblinear
Running the models
 • Control confidence threshold
    - Precision/Recall trade-off
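
The effect of a confidence threshold on precision and recall can be illustrated with a small sketch. The scores and correctness flags below are made-up numbers for illustration only, not results from the presented models, and `precision_recall` is a hypothetical helper.

```python
from typing import List, Tuple


def precision_recall(
    scored_links: List[Tuple[float, bool]],
    total_gold: int,
    threshold: float,
) -> Tuple[float, float]:
    """Compute precision and recall when only links at or above threshold are kept.

    scored_links: (model confidence, link is correct) for each proposed link.
    total_gold:   number of anaphoric pronouns that have a gold antecedent.
    """
    kept = [correct for confidence, correct in scored_links if confidence >= threshold]
    n_correct = sum(kept)
    # Convention assumed here: precision is 1.0 when no link is emitted.
    precision = n_correct / len(kept) if kept else 1.0
    recall = n_correct / total_gold if total_gold else 0.0
    return precision, recall


# Invented example scores: (confidence, is the proposed link correct?)
scored_links = [(0.95, True), (0.80, True), (0.60, False), (0.55, True), (0.30, False)]

for threshold in (0.0, 0.5, 0.9):
    p, r = precision_recall(scored_links, total_gold=5, threshold=threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

With these invented scores, a higher threshold keeps fewer but more reliable links, so precision rises while recall falls.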
Training the models
 • OntoNotes CoNLL 2012
 • English
 • 1667 documents
 • Various domains
Further Work/Ideas for Improvement




Full coreference/anaphora resolution
 • Change model representations
     - Clusters
     - Chains
 • Generalize comparisons (not only pronoun - antecedent)
Non-referential/cataphora detection
 • Training separate models
 • Rule-based
Improved Features
 • Improved gender detection
 • Improved animacy detection
 • Additional overlap features
Multi-pass approach
 • First pass(es) rule-based
 • Harder classifications with machine learning models
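
One way the multi-pass idea could look in code is sketched below: cheap rules decide the easy cases first, and only unresolved pronouns are handed to a scoring model. The `unique_gender_match` rule, the dictionary layout, and the `model_score` callback are hypothetical choices for this example, not part of the presented system.

```python
from typing import Callable, List, Optional, Tuple

Candidate = dict
Rule = Callable[[dict, List[Candidate]], Optional[Candidate]]


def unique_gender_match(pronoun: dict, candidates: List[Candidate]) -> Optional[Candidate]:
    """Example rule: decide only if exactly one candidate shares a known gender."""
    gender = pronoun.get("gender")
    if gender is None:
        return None
    matches = [c for c in candidates if c.get("gender") == gender]
    return matches[0] if len(matches) == 1 else None


def multi_pass(
    pronoun: dict,
    candidates: List[Candidate],
    rule_passes: List[Rule],
    model_score: Callable[[dict, Candidate], float],
    threshold: float = 0.5,
) -> Tuple[Optional[Candidate], str]:
    """Return (antecedent or None, which stage made the decision)."""
    # First pass(es): rule-based, in order of decreasing reliability.
    for rule in rule_passes:
        choice = rule(pronoun, candidates)
        if choice is not None:
            return choice, "rule"
    # Harder cases: fall back to the machine learning model.
    if candidates:
        best = max(candidates, key=lambda c: model_score(pronoun, c))
        if model_score(pronoun, best) >= threshold:
            return best, "model"
    return None, "none"


# Toy usage: "score" stands in for a trained model's confidence.
easy = multi_pass(
    {"word": "she", "gender": "female"},
    [{"word": "Mary", "gender": "female"}, {"word": "John", "gender": "male"}],
    [unique_gender_match],
    model_score=lambda p, c: 0.0,
)
hard = multi_pass(
    {"word": "it"},
    [{"word": "car", "score": 0.7}, {"word": "road", "score": 0.4}],
    [unique_gender_match],
    model_score=lambda p, c: c["score"],
)
```

Returning the deciding stage alongside the antecedent makes it easy to measure how many pronouns each pass resolves and how accurate each pass is.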
Demonstration
