Predicting P & R by Term Variability & Word Sense Ambiguity


 Farzaneh Sarafraz

5 March 2010
What do the following things have in common?
EARRING STEMS
ARMY ENCAMPMENTS
STAFF
POSITION
BLOG
ARTICLE
TRADING STATION
Post
Polysemy – Word Sense Ambiguity




                        A
Term Variability


                   A
                   B
                   C
Term Variability Measure


\[
V[\text{class}] = \frac{\#\,\text{different words}}{\#\,\text{all mentions}}
\]




       AABBBBCCAACAAAAAAAAAC
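
A minimal sketch of the measure above, assuming each character of the example string stands for one mention of a word. The function name `term_variability` is illustrative and not from the slides.

```python
def term_variability(mentions):
    """V[class] = (# different words) / (# all mentions)."""
    if not mentions:
        raise ValueError("need at least one mention")
    return len(set(mentions)) / len(mentions)


# Assumption: one character is one mention, as in the slide's example string.
mentions = list("AABBBBCCAACAAAAAAAAAC")
print(term_variability(mentions))
```

A class whose mentions all use the same word scores close to 0; a class where every mention uses a different word scores 1.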
Word Entropy


\[
H[\text{word}] = -\sum_{\text{all classes}} \frac{\#(\text{word}, \text{class})}{\#(\text{word}, *)} \cdot \log_2 \frac{\#(\text{word}, \text{class})}{\#(\text{word}, *)}
\]




     A
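
A sketch of the word entropy formula, assuming the counts are stored as a dictionary from (word, class) pairs to occurrence counts. The dictionary layout, the function name, and the example counts are hypothetical and not taken from the slides.

```python
import math


def word_entropy(word, counts):
    """H[word]: entropy of the distribution of `word` over classes.

    counts maps (word, class) -> number of occurrences (assumed layout).
    """
    per_class = [n for (w, _cls), n in counts.items() if w == word and n > 0]
    total = sum(per_class)  # this is #(word, *)
    if total == 0:
        raise ValueError(f"no occurrences of {word!r}")
    return sum(-(n / total) * math.log2(n / total) for n in per_class)


# Hypothetical counts, for illustration only.
counts = {
    ("expression", "Gene Expression"): 3,
    ("expression", "Transcription"): 1,
    ("phosphorylation", "Phosphorylation"): 4,
}
print(word_entropy("expression", counts))
print(word_entropy("phosphorylation", counts))
```

A word that occurs in only one class has entropy 0; a word spread evenly over two classes has entropy 1 bit.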
Class Entropy
\[
H[\text{class}] = \sum_{\text{word} \in \text{class}} H[\text{word}] \cdot \frac{\#(\text{word}, \text{class})}{\#(*, \text{class})}
\]
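
A sketch of class entropy as the weighted sum of word entropies, with each word weighted by its share of the class's mentions. It assumes the same hypothetical (word, class) count dictionary; the function names and example counts are illustrative only.

```python
import math


def word_entropy(word, counts):
    """H[word] over all classes; counts maps (word, class) -> occurrences."""
    per_class = [n for (w, _cls), n in counts.items() if w == word and n > 0]
    total = sum(per_class)
    return sum(-(n / total) * math.log2(n / total) for n in per_class)


def class_entropy(cls, counts):
    """H[class] = sum over words in class of H[word] * #(word, class) / #(*, class)."""
    in_class = {w: n for (w, c), n in counts.items() if c == cls and n > 0}
    class_total = sum(in_class.values())  # this is #(*, class)
    if class_total == 0:
        raise ValueError(f"no mentions for class {cls!r}")
    return sum(word_entropy(w, counts) * n / class_total
               for w, n in in_class.items())


# Hypothetical counts, for illustration only.
counts = {
    ("expression", "Gene Expression"): 2,
    ("expression", "Transcription"): 2,
    ("transcribed", "Transcription"): 2,
}
print(class_entropy("Transcription", counts))
```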




Class [Un]-characterisability / Confusion

\[
C[\text{class}] = \sum_{\text{word} \in \text{class}} \frac{\#(\text{word}, * - \text{class})}{\#(\text{word}, *)} \cdot \frac{\#(\text{word}, \text{class})}{\#(*, \text{class})}
\]
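
A sketch of the confusion measure, reading #(word, * − class) as the number of occurrences of the word in every class other than the given one. The count dictionary layout, the function name, and the example counts are assumptions for illustration.

```python
def class_confusion(cls, counts):
    """C[class]: share of each word's occurrences that fall outside the class,
    weighted by the word's share of the class's mentions.

    counts maps (word, class) -> number of occurrences (assumed layout).
    """
    in_class = {w: n for (w, c), n in counts.items() if c == cls and n > 0}
    class_total = sum(in_class.values())  # this is #(*, class)
    if class_total == 0:
        raise ValueError(f"no mentions for class {cls!r}")

    # #(word, *): occurrences of each word over all classes.
    word_totals = {}
    for (w, _c), n in counts.items():
        word_totals[w] = word_totals.get(w, 0) + n

    return sum(((word_totals[w] - n) / word_totals[w]) * (n / class_total)
               for w, n in in_class.items())


# Hypothetical counts, for illustration only.
counts = {
    ("expression", "Gene Expression"): 2,
    ("expression", "Transcription"): 2,
    ("transcribed", "Transcription"): 2,
}
print(class_confusion("Transcription", counts))
print(class_confusion("Gene Expression", counts))
```

A class whose words never appear in any other class has confusion 0.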
Guess

  Lexical Variability        ->  R
      AABBBBCCAAAAAAC

  Class Characterisability   ->  P
      A
      B
Precision vs. Class Confusion

[Scatter plot, one point per event class: Protein Catabolism, Phosphorylation, Regulation, Gene Expression, Binding, Positive Regulation, Localization, Negative Regulation, Transcription. Axis ticks run from 0 to 100 and from 0 to 0.3.]

Correlation = -0.72
Recall vs. Lexical Variability

[Scatter plot, one point per event class: Phosphorylation, Protein Catabolism, Gene Expression, Localization, Transcription, Binding, Negative Regulation, Positive Regulation, Regulation. Axis ticks run from 0 to 120 and from 0.02 to 0.2.]

Correlation = -0.68
Results For Other Teams




Average recall vs. variability:          Correlation = -0.82
Average precision vs. class confusion:   Correlation = 0.03
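
The slides do not say which correlation coefficient was used. Assuming Pearson's r, a figure of this kind could be computed as in the sketch below. The input values are made-up placeholders, not the data behind the plots.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Placeholder values only: one variability score and one recall per class.
variability = [0.03, 0.08, 0.12, 0.17]
recall = [90.0, 70.0, 45.0, 20.0]
print(pearson(variability, recall))
```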
Ideas, please?
Information Extraction


  K J G T C S H U A G Y X W.

Ambiguity
