Information Content Measures of Semantic Similarity Perform Better Without Sense-Tagged Text

Ted Pedersen
University of Minnesota, Duluth

Abstract

This poster presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged data.

Semantic Similarity

Is a dog similar to a human?
Is a cow similar to a barn?
Is a time machine similar to a telescope?
Is a computer similar to a surfboard?

Similar things share certain traits: X and Y both (have | are made of | like | think about | live in | studied at | play | hate | are a kind of | eat) Z.

Aristotle, 384 BC – 322 BC
● Great Chain of Being – nature with ranks
● Law of Similarity

Carl Linnaeus, 1707–1778
● Nature as nested hierarchy
● Binomial Nomenclature: Genus + Species
This leads to TAXONOMY – concepts organized in a hierarchy joined by “is-a” relations.

NLP needs to incorporate similarity for WSD, lexical selection in text generation, semantic search, recognizing textual entailment...

Measures of Semantic Similarity

A measure of semantic similarity quantifies the degree to which two concepts are “alike”.
● How much does c1 remind you of c2?

Concepts can be arranged in a hierarchy, where children “are a kind of” their parents – path lengths give a plausible approximation of similarity...
...but path lengths are inconsistent as you move to more general or more specific concepts.

Resnik (1995) assigned a value of specificity to each concept, which corrects for path length:
● Information Content: IC(c) = −log prob(c)
● The similarity of c1 and c2 is the IC of their least common subsumer (most specific ancestor).

Resnik (1995)
res(c1, c2) = IC(LCS(c1, c2))

Jiang & Conrath (1997)
jcn(c1, c2) = 1 / (IC(c1) + IC(c2) − 2 · res(c1, c2))

Lin (1998)
lin(c1, c2) = 2 · res(c1, c2) / (IC(c1) + IC(c2))
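Read as formulas over a table of Information Content values, the three measures are easy to compute directly. The Python sketch below is a minimal illustration of the definitions above; the concept names, probabilities, and choice of least common subsumer are invented for the example and are not drawn from the poster's data.

```python
import math

def res(ic, lcs):
    """Resnik: similarity is the Information Content of the least common subsumer."""
    return ic[lcs]

def jcn(ic, c1, c2, lcs):
    """Jiang & Conrath: inverse of the IC-based distance between the two concepts."""
    distance = ic[c1] + ic[c2] - 2 * ic[lcs]
    return float("inf") if distance == 0 else 1.0 / distance

def lin(ic, c1, c2, lcs):
    """Lin: shared IC (doubled) normalized by the concepts' combined IC."""
    denom = ic[c1] + ic[c2]
    return 0.0 if denom == 0 else 2 * ic[lcs] / denom

# Toy probabilities (invented for illustration); IC(c) = -log prob(c).
prob = {"instrument": 0.10, "optical_instrument": 0.02,
        "telescope": 0.01, "microscope": 0.008}
ic = {c: -math.log(p) for c, p in prob.items()}

# Assume the least common subsumer of telescope and microscope is optical_instrument.
print(res(ic, "optical_instrument"))
print(jcn(ic, "telescope", "microscope", "optical_instrument"))
print(lin(ic, "telescope", "microscope", "optical_instrument"))
```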

Information Content

[Figure: a small portion of WordNet, showing Information Content values for concepts based on sense-tagged text (SemCor), newspaper text (AFE), and on assuming each sense occurs one time (ADD-1).]

If the text is not sense-tagged, then each possible sense of a word (and its ancestors) is incremented with each occurrence.
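The counting procedure behind these Information Content values is simple enough to sketch. The example below is a hedged illustration, not the poster's actual code: the toy taxonomy, sense inventory, and sentence are invented, and the real counts come from WordNet and the corpora listed in the results table.

```python
from collections import defaultdict

# Toy taxonomy and sense inventory (invented for illustration; not actual WordNet).
parent = {
    "telescope#n#1": "optical_instrument#n#1",
    "optical_instrument#n#1": "instrument#n#1",
    "instrument#n#1": None,
    "glass#n#1": "material#n#1",    # glass, the substance
    "material#n#1": None,
    "glass#n#2": "container#n#1",   # glass, the drinking vessel
    "container#n#1": None,
}
senses = {
    "telescope": ["telescope#n#1"],
    "glass": ["glass#n#1", "glass#n#2"],
}

def ancestors(concept):
    """Yield a concept and everything above it on the is-a chain."""
    while concept is not None:
        yield concept
        concept = parent[concept]

counts = defaultdict(int)

# ADD-1 (no text at all): pretend every sense occurred exactly once.
for concept in parent:
    for a in ancestors(concept):
        counts[a] += 1

# Untagged text: each occurrence of a word credits every sense of that word,
# and all ancestors of each sense, since the intended sense is unknown.
for word in "look through the telescope not the glass".split():
    for sense in senses.get(word, []):
        for a in ancestors(sense):
            counts[a] += 1

# IC values would then follow from the relative frequencies of these counts.
```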
Experiments

The hypothesis underlying this poster is that Information Content values can be reasonably calculated without sense-tagged text, and that sense-tagged text is not available in sufficient quantities to be considered reliable for this task. A number of experiments were conducted in which different amounts of raw text were used to estimate Information Content values, which were then used to compute semantic similarity.

The automatically computed semantic similarity values were measured for rank correlation with three manually created gold standard datasets. This poster shows the results for the Miller and Charles set of 30 word pairs.

Overall there was no advantage to using sense-tagged text to compute Information Content values, and the Information Content based measures of semantic similarity improved upon the path-based measures.

Of particular interest was the fact that the add-1 smoothing method (without any text!) provided results almost as good as raw newspaper text. This shows that the real power of Information Content lies not in the frequency counts but in the structure of the taxonomy. This would have come as no surprise to Aristotle and Linnaeus.
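The poster reports rank correlation between the automatic similarity scores and the human judgments; the snippet below shows one way such a correlation can be computed. The scores are placeholders, and the use of SciPy's spearmanr is an assumption about tooling, not a description of the original experimental setup.

```python
from scipy.stats import spearmanr  # rank-correlation routine from SciPy

# Placeholder scores for a handful of word pairs (not the poster's actual data):
# human judgments on one scale, automatic similarity values on another.
human_scores = [3.92, 3.84, 3.05, 1.68, 0.42]
measure_scores = [0.93, 0.89, 0.70, 0.31, 0.08]

rho, p_value = spearmanr(human_scores, measure_scores)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```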
Correlation with Miller & Charles Data

Corpus for IC            Size           Coverage   JCN   LIN   RES
SemCor (sense-tagged)    226,000          .24      .72   .73   .74
SemCor (raw)             670,000          .37      .82   .79   .76
xie199501 (Jan 1995)     1.2 million      .35      .78   .75   .73
xie199501-02             2.3 million      .39      .79   .75   .73
xie199501-12             16 million       .51      .87   .81   .75
xie1995-1998             73 million       .60      .89   .81   .75
XIE 1995-2001            133 million      .64      .88   .81   .76
AFE                      174 million      .66      .88   .80   .77
APW                      560 million      .75      .84   .79   .76
NYT                      963 million      .83      .84   .79   .77
ADD-1                    0                1.00     .85   .77   .76

Similarity Measures w/ Different Sources

                                            RES                     LIN                     JCN
Concept Pairs                      PATH  SemCor   AFE  ADD-1  SemCor   AFE  ADD-1  SemCor   AFE  ADD-1
telescope#n#1 / microscope#n#1     0.33    9.28 10.81   0.93    0.93  0.89   0.91    0.66  0.38   0.66
instrument#n#1 / machine#n#1       0.33    4.37  4.66   0.70    0.70  0.69   0.65    0.26  0.24   0.26
telescope#n#1 / time_machine#n#1   0.17    4.37  4.66   0.41    0.41  0.34   0.36    0.08  0.06   0.08
catapult#n#3 / time_machine#n#1    0.14    4.37  4.66   0       0     0.29   0.31    0     0.04   0.06
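The values above were produced with the WordNet::Similarity Perl package cited under Software below. For readers working in Python, NLTK exposes the same path-based and Information Content based measures; the sketch below is a rough analog (assuming the WordNet and wordnet_ic data packages have been downloaded), not the setup used for the poster.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic
# Requires: nltk.download('wordnet') and nltk.download('wordnet_ic')

# Pre-computed Information Content file shipped with NLTK's wordnet_ic data;
# ic-semcor.dat is derived from the sense-tagged SemCor corpus.
semcor_ic = wordnet_ic.ic('ic-semcor.dat')

telescope = wn.synset('telescope.n.01')
microscope = wn.synset('microscope.n.01')

print(telescope.path_similarity(microscope))             # path-based baseline
print(telescope.res_similarity(microscope, semcor_ic))   # Resnik
print(telescope.jcn_similarity(microscope, semcor_ic))   # Jiang & Conrath
print(telescope.lin_similarity(microscope, semcor_ic))   # Lin
```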

References

Measures of Semantic Similarity
J. Jiang and D. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.
D. Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning, Madison.
P. Resnik. 1995. Using Information Content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal.

Experimental Data
L. Finkelstein et al. 2002. Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1):116-131. http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
G. A. Miller and W. G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1-28.
H. Rubenstein and J. B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8:627-633.

Software
T. Pedersen et al. 2004. WordNet::Similarity – Measuring the relatedness of concepts. In Proceedings of HLT/NAACL 2004, pages 38-41, Boston, MA.

Contact

Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
tpederse@d.umn.edu
http://www.d.umn.edu/~tpederse



