Information Content Measures of Semantic Similarity
Perform Better Without Sense-Tagged Text
University of Minnesota, Duluth
Abstract

This poster presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged data.

Semantic Similarity

Is a dog similar to a human? Is a cow similar to a barn? Is a time machine similar to a telescope? Is a computer similar to a surf board?

Similar things share certain traits: X and Y both (have | are made of | like | think about | live in | studied at | play | hate | are a kind of | eat) Z.

Aristotle, 384 BC – 322 BC
● Great Chain of Being – nature with ranks
● Law of Similarity

Carl Linnaeus, 1707–1778
● Nature as nested hierarchy
● Binomial Nomenclature: Genus + Species

This leads to TAXONOMY – concepts organized in a hierarchy joined by “is-a” relations.

NLP needs to incorporate similarity for WSD, lexical selection in text generation, semantic search, recognizing textual entailment...

Information Content

This is a small portion of WordNet, showing Information Content for concepts based on sense-tagged text (SemCor), newspaper text (AFE), and by assuming each sense occurs one time (ADD-1). If the text is not sense tagged, then each possible sense of a word (and its ancestors) is incremented with each occurrence.

Experiments

The hypothesis underlying this poster is that Information Content values can be reasonably calculated without sense-tagged text, and that sense-tagged text is not available in sufficient quantities to be considered reliable for this task.

A number of experiments were conducted in which different amounts of raw text were used to estimate Information Content values, which were then used to compute semantic similarity. The automatically computed semantic similarity values were measured for rank correlation with three manually created gold standard datasets. This poster shows the results using the Miller and Charles set of 30 word pairs.

Overall there was no advantage to using sense-tagged text to compute Information Content values, and the Information Content based measures of semantic similarity improved upon the path based measures.

Of particular interest was the fact that the add-1 smoothing method (without any text!) provided results almost as good as raw newspaper text.
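This counting scheme (and the ADD-1 alternative, which simply starts every concept at one) can be sketched in a few lines. The toy taxonomy, word-to-sense map, and function names below are illustrative assumptions, not the poster's actual implementation:

```python
import math

# Toy "is-a" taxonomy (child -> parent); illustrative only, not real WordNet.
PARENT = {
    "telescope": "instrument",
    "microscope": "instrument",
    "instrument": "artifact",
    "time_machine": "artifact",
    "artifact": None,  # root
}

# Each surface word may have several possible senses (one each here, for brevity).
SENSES = {
    "telescope": ["telescope"],
    "microscope": ["microscope"],
    "time machine": ["time_machine"],
}

def ancestors(concept):
    """Yield a concept and every ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]

def ic_counts(tokens, add_one=True):
    """Count concept frequencies from untagged text: every possible sense of
    each token, and all of that sense's ancestors, is incremented on every
    occurrence.  add_one=True starts every concept at 1 (the ADD-1 setting)."""
    counts = {c: (1 if add_one else 0) for c in PARENT}
    for tok in tokens:
        for sense in SENSES.get(tok, []):
            for c in ancestors(sense):
                counts[c] += 1
    return counts

def information_content(counts):
    """IC(c) = -log prob(c), with probabilities relative to the root count."""
    root_total = counts["artifact"]
    return {c: -math.log(n / root_total) for c, n in counts.items() if n > 0}

ic = information_content(ic_counts(["telescope", "telescope", "microscope"]))
# The root has IC 0; rare, specific concepts such as time_machine score highest.
```

With add-1 initialization alone (an empty token stream), every concept still receives a nonzero count, which is exactly the poster's no-text ADD-1 condition.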
This shows that the real power of Information Content is not in the frequency counts but in the structure of the taxonomy. This would have come as no surprise to Aristotle and Linnaeus.

Measures of Semantic Similarity

A measure of semantic similarity quantifies the degree to which two concepts are “alike”.
● How much does c1 remind you of c2?
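As a concrete sketch, the three IC-based measures compared on this poster (Resnik, Jiang & Conrath, Lin) can be written directly from their definitions. The IC values and least common subsumer below are hypothetical, and IC lookup is assumed to be precomputed:

```python
def res(ic, lcs):
    """Resnik (1995): similarity is the IC of the least common subsumer."""
    return ic[lcs]

def jcn(ic, c1, c2, lcs):
    """Jiang & Conrath (1997) as a similarity: reciprocal of the JCN
    distance (assumes the distance is nonzero)."""
    return 1.0 / (ic[c1] + ic[c2] - 2.0 * res(ic, lcs))

def lin(ic, c1, c2, lcs):
    """Lin (1998): proportion of the two concepts' information that is shared."""
    return 2.0 * res(ic, lcs) / (ic[c1] + ic[c2])

# Hypothetical IC values, for illustration only (not the poster's numbers).
ic = {"instrument": 4.37, "telescope": 8.5, "microscope": 9.1}
print(lin(ic, "telescope", "microscope", "instrument"))  # a value in [0, 1]
```

Note that res depends only on the subsumer, while lin and jcn also penalize pairs whose members are individually very specific.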
Concepts can be arranged in a hierarchy, where children “are a kind of” their parents – path lengths give a plausible approximation of similarity...

...but path lengths are inconsistent as you move to more general or more specific concepts.

Resnik (1995) assigned a value of specificity to each concept (which corrects for path length):
● Information Content: IC(c1) = –log prob(c1)
● Similarity of c1 and c2 is the IC of their least common subsumer (most specific ancestor)

Resnik (1995)
res(c1, c2) = IC(LCS(c1, c2))

Jiang & Conrath (1997)
jcn(c1, c2) = 1 / (IC(c1) + IC(c2) – 2 * res(c1, c2))

Lin (1998)
lin(c1, c2) = 2 * res(c1, c2) / (IC(c1) + IC(c2))

[Figure, from the Information Content panel: a small portion of WordNet around instrument#n#1, telescope#n#1, and time_machine#n#1, with PATH scores and RES, LIN, and JCN scores for example concept pairs computed from SemCor, AFE, and ADD-1 Information Content values.]

Correlation with Miller & Charles Data

Corpus for IC w/ Different Sources | Size        | Coverage | JCN | LIN | RES
SemCor (sense-tagged)              | 226,000     | .24      | .72 | .73 | .74
SemCor (raw)                       | 670,000     | .37      | .82 | .79 | .76
xie199501                          | 1.2 million | .35      | .78 | .75 | .73
xie199501-02                       | 2.3 million | .39      | .79 | .75 | .73
xie199501-12                       | 16 million  | .51      | .87 | .81 | .75
xie1995-1998                       | 73 million  | .60      | .89 | .81 | .75
XIE 1995-2001                      | 133 million | .64      | .88 | .81 | .76
AFE                                | 174 million | .66      | .88 | .80 | .77
APW                                | 560 million | .75      | .84 | .79 | .76
NYT                                | 963 million | .83      | .84 | .79 | .77
ADD-1                              | 0           | 1.00     | .85 | .77 | .76

Measures of Semantic Similarity

J. Jiang and D. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

D. Lin. 1998. An information-theoretic definition of similarity. Proceedings of the International Conference on Machine Learning, Madison.

P. Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal.

Experimental Data

L. Finkelstein, et al. 2002. Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1):116-131. http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/

G. A. Miller and W.G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes.

H. Rubenstein and J.B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8:627-633.

T. Pedersen, et al. 2004. WordNet::Similarity – Measuring the relatedness of concepts. In Proceedings of HLT/NAACL 2004, pp. 38-41, Boston, MA.

Contact

Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
firstname.lastname@example.org
http://www.d.umn.edu/~tpederse