Feb20 mayo-webinar-21feb2012

Measuring Semantic Similarity and Relatedness in the Biomedical Domain : Methods and Applications - presented Feb 21, 2012 as a webinar to the Mayo Clinic BMI group.

Published in: Health & Medicine, Education

  1. 1. Measuring Semantic Similarity and Relatedness in the Biomedical Domain : Methods and Applications Ted Pedersen, Ph.D. Department of Computer Science University of Minnesota, Duluth [email_address] http://www.d.umn.edu/~tpederse February 21, 2012
  2. 2. Acknowledgments <ul><li>This work on semantic similarity and relatedness has been supported by a National Science Foundation CAREER award (2001 – 2007, #0092784, PI Pedersen) and by the National Library of Medicine, National Institutes of Health (2008 – present, 1R01LM009623-01A2, PI Pakhomov)
  3. 3. The contents of this talk are solely my responsibility and do not necessarily represent the official views of the National Science Foundation or the National Institutes of Health. </li></ul>
  4. 4. Topics <ul><li>Semantic similarity vs. semantic relatedness
  5. 5. How to measure similarity </li><ul><li>With ontologies and corpora </li></ul><li>How to measure relatedness </li><ul><li>With definitions and corpora </li></ul><li>Applications? </li><ul><li>Word Sense Disambiguation
  6. 6. Sentiment Classification </li></ul></ul>
  7. 7. What are we measuring? <ul><li>Concept pairs </li><ul><li>Assign a numeric value that quantifies how similar or related two concepts are </li></ul><li>Not words </li><ul><li>Must know concept underlying a word form
  8. 8. Cold may be temperature or illness </li><ul><li>Concept Mapping
  9. 9. Word Sense Disambiguation </li></ul><li>But, WSD can be done using semantic similarity! </li><ul><li>SenseRelate, later in the talk </li></ul></ul></ul>
  10. 10. Why? <ul><li>Being able to organize concepts by their similarity or relatedness to each other is a fundamental operation in the human mind, and in many problems in Natural Language Processing and Artificial Intelligence
  11. 11. If we know a lot about X, and if we know Y is similar to X, then a lot of what we know about X may apply to Y </li><ul><li>Use X to explain or categorize Y </li></ul></ul>
  12. 12. GOOD NEWS! Free Open Source Software! <ul><li>WordNet::Similarity </li><ul><li>http://wn-similarity.sourceforge.net
  13. 13. General English
  14. 14. Widely used (+550 citations) </li></ul><li>UMLS::Similarity </li><ul><li>http://umls-similarity.sourceforge.net
  15. 15. Unified Medical Language System
  16. 16. Spun off from WordNet::Similarity </li><ul><li>But has added a whole lot! </li></ul></ul></ul>
  17. 17. Similar or Related? <ul><li>Similarity based on is-a relations </li></ul><ul><ul><li>How much is X like Y?
  18. 18. Share ancestor in is-a hierarchy </li><ul><li>LCS : least common subsumer
  19. 19. Closer / deeper the ancestor the more similar </li></ul></ul><li>Tetanus and strep throat are similar </li><ul><li>both are kinds-of bacterial infections </li></ul></ul>
  20. 20. Least Common Subsumer (LCS)
  21. 21. Similar or Related ? <ul><li>Relatedness more general </li><ul><li>How much is X related to Y?
  22. 22. Many ways to be related </li><ul><li>is-a, part-of, treats, affects, symptom-of, ... </li></ul></ul><li>Tetanus and deep cuts are related but they really aren't similar </li><ul><li>(deep cuts can cause tetanus) </li></ul><li>All similar concepts are related, but not all related concepts are similar </li></ul>
  23. 23. Measures of Similarity (WordNet::Similarity & UMLS::Similarity ) <ul><li>Path Based </li></ul><ul><ul><li>Rada et al., 1989 (path)
  24. 24. Caviedes & Cimino, 2004 (cdist)* </li><ul><li>cdist only in UMLS::Similarity </li></ul></ul></ul><ul><li>Path + Depth </li></ul><ul><ul><li>Wu & Palmer, 1994 (wup)
  25. 25. Leacock & Chodorow, 1998 (lch) </li></ul></ul><ul><ul><li>Zhong et al., 2002 (zhong)* </li></ul></ul><ul><ul><li>Nguyen & Al-Mubaid, 2006 (nam)* </li><ul><li>zhong and nam only in UMLS::Similarity </li></ul></ul></ul>
  26. 26. Measures of Similarity (WordNet::Similarity & UMLS::Similarity) <ul><li>Path + Information Content </li><ul><li>Resnik, 1995 (res)
  27. 27. Jiang & Conrath, 1997 (jcn)
  28. 28. Lin, 1998 (lin) </li></ul></ul>
  29. 29. Path Based Measures <ul><li>Distance between concepts (nodes) in tree intuitively appealing
  30. 30. Spatial orientation, good for networks or maps but not is-a hierarchies </li><ul><li>Reasonable approximation sometimes
  31. 31. Assumes all paths have same “weight”
  32. 32. But, more specific (deeper) paths tend to travel less semantic distance </li></ul><li>Shortest path a good start, but needs corrections </li></ul>
  33. 33. Shortest is-a Path <ul><li>path(a,b) = 1 / shortest is-a path(a,b) </li></ul>
  35. 35. We count nodes... <ul><li>Maximum = 1 </li><ul><li>self similarity
  36. 36. path(tetanus,tetanus) = 1 </li></ul><li>Minimum = 1 / (longest path in is-a tree) </li><ul><li>path(typhoid, oral thrush) = 1/7
  37. 37. path(food poisoning, strep throat) = 1/7
  38. 38. etc... </li></ul></ul>
  39. 39. path(strep throat, tetanus) = .25
  40. 40. path (bacterial infections, yeast infections) = .25
  41. 41. ? <ul><li>Are bacterial infections and yeast infections similar to the same degree as are tetanus and strep throat ?
  42. 42. The path measure says “yes, they are.” </li></ul>
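
The path measure can be sketched in a few lines of Python. The hierarchy below is a hypothetical toy tree, not the figure from the slides; its root and intermediate nodes are assumptions chosen so that the two pairs discussed above each sit on a four-node path.

```python
# Toy is-a hierarchy (child -> parent). This tree is an illustrative
# assumption; the root and the nodes marked "hypothetical" do not come
# from the slides.
PARENT = {
    "bacterial infection": "infection",
    "fungal infection": "infection",                   # hypothetical
    "tetanus": "bacterial infection",
    "streptococcal infection": "bacterial infection",  # hypothetical
    "strep throat": "streptococcal infection",
    "yeast infection": "fungal infection",
    "oral thrush": "yeast infection",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (number of nodes on the shortest is-a path between a and b)."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    # The least common subsumer is the first ancestor of a shared with b.
    lcs = next(node for node in up_a if node in up_b)
    # Count nodes, not edges: steps up from a, steps up from b, plus the LCS.
    nodes_on_path = up_a.index(lcs) + up_b.index(lcs) + 1
    return 1.0 / nodes_on_path


print(path_similarity("strep throat", "tetanus"))
print(path_similarity("bacterial infection", "yeast infection"))
```

In this toy tree both calls give 0.25, which is the point of the slide: path length alone cannot tell a specific pair from a general one.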
  43. 43. Path + Depth <ul><li>Path only doesn't account for specificity
  44. 44. Deeper concepts more specific
  45. 45. Paths between deeper concepts travel less semantic distance </li></ul>
  46. 46. Wu and Palmer, 1994 <ul><li>wup(a,b) = (2 * depth(LCS(a,b))) / (depth(a) + depth(b))
  48. 48. depth(x) = shortest is-a path(root,x) </li></ul>
  49. 49. wup(strep throat, tetanus) = (2*2)/(4+3) = .57
  50. 50. wup (bacterial infections, yeast infections) = (2*1)/(2+3) = .4
  51. 51. ? <ul><li>Wu and Palmer say that strep throat and tetanus (.57) are more similar than are bacterial infections and yeast infections (.4)
  52. 52. Path says that strep throat and tetanus (.25) are equally similar as are bacterial infections and yeast infections (.25) </li></ul>
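
A minimal sketch of the Wu and Palmer calculation, assuming a hypothetical toy is-a tree (the root and the nodes marked hypothetical are not from the slides) in which depth counts nodes from the root, so the root has depth 1.

```python
# Toy is-a hierarchy (child -> parent); an illustrative assumption only.
PARENT = {
    "bacterial infection": "infection",
    "fungal infection": "infection",                   # hypothetical
    "tetanus": "bacterial infection",
    "streptococcal infection": "bacterial infection",  # hypothetical
    "strep throat": "streptococcal infection",
    "yeast infection": "fungal infection",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Number of nodes on the is-a path from the root down to concept."""
    return len(ancestors(concept))


def wup(a, b):
    """Wu and Palmer: 2 * depth(LCS) / (depth(a) + depth(b))."""
    up_b = ancestors(b)
    lcs = next(node for node in ancestors(a) if node in up_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(round(wup("strep throat", "tetanus"), 2))
print(round(wup("bacterial infection", "yeast infection"), 2))
```

With this tree the two calls reproduce the (2*2)/(4+3) and (2*1)/(2+3) calculations from the slides, so the deeper pair scores higher even though both pairs have the same path length.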
  53. 53. Information Content <ul><li>ic(concept) = -log p(concept) [Resnik 1995] </li><ul><li>Need to count concepts
  54. 54. Term frequency + inherited frequency
  55. 55. p(concept) = (tf + if) / N </li></ul><li>Depth shows specificity but not frequency
  56. 56. Low frequency concepts often much more specific than high frequency ones </li></ul>
  57. 57. Information Content term frequency (tf)
  58. 58. Information Content inherited frequency (if)
  59. 59. Information Content (IC = -log (f/N)) final count (f = tf + if, N = 365,820)
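
The information content calculation might be sketched as follows. The slides do not state the base of the logarithm, so the natural log used here is an assumption, and the counts in the example calls are hypothetical; only N = 365,820 comes from the slide.

```python
import math

N = 365_820  # total count reported on the slide


def information_content(term_freq, inherited_freq, total=N):
    """IC = -log(f / N), where f = term frequency + inherited frequency.

    Natural log is an assumption; the slides do not give the base.
    """
    count = term_freq + inherited_freq
    if count <= 0:
        raise ValueError("IC is undefined for a concept with zero count")
    return -math.log(count / total)


# Hypothetical counts: a general concept inherits many occurrences from
# its descendants, a specific concept inherits few.
general = information_content(term_freq=500, inherited_freq=40_000)
specific = information_content(term_freq=2_000, inherited_freq=0)
print(general < specific)
```

A concept that accounts for every observed occurrence gets an information content of zero, and rarer concepts get larger values, which is why information content can stand in for specificity.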
  60. 60. Lin, 1998 <ul><li>lin(a,b) = (2 * IC(LCS(a,b))) / (IC(a) + IC(b))
  62. 62. Look familiar?
  64. 64. wup(a,b) = (2 * depth(LCS(a,b))) / (depth(a) + depth(b)) </li></ul>
  66. 66. <ul>lin (strep throat, tetanus) = 2 * 2.26 / (5.21 + 4.11) = 0.485 </ul>
  68. 68. <ul>lin (bacterial infection, yeast infection) = 2 * 0.71 / (2.26+2.81) = 0.280 </ul>
  69. 69. ? <ul><li>Lin says that strep throat and tetanus (.49) are more similar than are bacterial infection and yeast infection (.28)
  70. 70. Wu and Palmer say that strep throat and tetanus (.57) are more similar than are bacterial infection and yeast infection (.4)
  71. 71. Path says that strep throat and tetanus (.25) are equally similar as are bacterial infection and yeast infection (.25) </li></ul>
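
As a quick check on the Lin arithmetic, here is a small sketch that takes the information content values quoted on the slides as inputs. Returning 0.0 when both concepts have zero information content is an assumption made here to avoid dividing by zero.

```python
def lin(ic_lcs, ic_a, ic_b):
    """Lin: 2 * IC(LCS) / (IC(a) + IC(b))."""
    denominator = ic_a + ic_b
    if denominator == 0:
        return 0.0  # assumed convention for two zero-IC concepts
    return 2.0 * ic_lcs / denominator


# IC values as quoted on the slides.
print(round(lin(2.26, 5.21, 4.11), 3))  # strep throat, tetanus
print(round(lin(0.71, 2.26, 2.81), 3))  # bacterial infection, yeast infection
```

The function has the same shape as Wu and Palmer with depth replaced by information content, which is the "look familiar?" observation from the Lin slide.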
  72. 72. How to decide? <ul><li>Consider quality of information </li><ul><li>If you have a consistent and reliable ontology, maybe depth based works best
  73. 73. If you have corpora with reliable sense tags or concept mapping, maybe information content </li></ul><li>Evaluation </li><ul><li>Intrinsic (relative to human performance) </li><ul><li>Data at : http://rxinformatics.umn.edu </li></ul><li>Extrinsic (task based) </li></ul></ul>
  74. 74. What about concepts not connected via is-a relations? <ul><li>Connected via other relations? </li><ul><li>Part-of, treatment-of, causes, etc. </li></ul><li>Not connected at all? </li><ul><li>In different sections (axes) of an ontology (infections and treatments)
  75. 75. In different ontologies entirely (SNOMEDCT and FMA) </li></ul><li>Relatedness! </li><ul><li>Use definition information
  76. 76. No is-a relations so can't be similarity </li></ul></ul>
  77. 77. Measures of relatedness WordNet::Similarity & UMLS::Similarity <ul><li>Definition based </li></ul><ul><ul><li>Lesk, 1986
  78. 78. Adapted lesk (lesk) </li><ul><li>Banerjee & Pedersen, 2003 </li></ul></ul></ul><ul><li>Definition + corpus </li></ul><ul><ul><li>Gloss Vector (vector) </li><ul><li>Patwardhan & Pedersen, 2006 </li></ul></ul></ul>
  79. 79. Measuring relatedness with definitions <ul><li>Related concepts defined using many of the same terms
  80. 80. But, definitions are short, inconsistent
  81. 81. Concepts don't need to be connected via relations or paths to measure them </li><ul><li>Lesk, 1986
  82. 82. Adapted Lesk, Banerjee & Pedersen, 2003 </li></ul></ul>
  83. 83. Two separate ontologies...
  84. 84. Could join them together … ?
  85. 85. Each concept has definition
  86. 86. Find overlaps in definitions...
  87. 87. Overlaps <ul><li>Oral Thrush and Alopecia </li><ul><li>side effect of chemotherapy </li><ul><li>Can't see this in structure of is-a hierarchies
  88. 88. Oral thrush and folliculitis just as similar </li></ul></ul><li>Alopecia and Folliculitis </li><ul><li>hair disorder & hair </li><ul><li>Reflects structure of is-a hierarchies
  89. 89. If you start with text like this maybe you can build is-a hierarchies automatically! </li><ul><li>Another tutorial... </li></ul></ul></ul></ul>
  90. 90. Lesk and Adapted Lesk <ul><li>Lesk, 1986 : measure overlaps in definitions to assign senses to words </li><ul><li>The more overlaps between two senses (concepts), the more related </li></ul><li>Banerjee & Pedersen, 2003, Adapted Lesk </li><ul><li>Augment definition of each concept with definitions of related concepts </li><ul><li>Build a super gloss </li></ul><li>Increase chance of finding overlaps </li></ul><li>lesk in WordNet::Similarity & UMLS::Similarity </li></ul>
  91. 91. The problem with definitions ... <ul><li>Definitions contain variations of terminology that make it impossible to find exact overlaps
  92. 92. Alopecia : … a result of cancer treatment
  93. 93. Thrush : … a side effect of chemotherapy </li><ul><li>Real life example, I modified the alopecia definition to work better with Lesk!!!
  94. 94. NO MATCHES!! </li></ul><li>How can we see that “result” and “side effect” are similar, as are “cancer treatment” and “chemotherapy” ? </li></ul>
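
The matching problem can be reproduced with a simplified overlap counter. This sketch counts shared single content words only, whereas the adapted Lesk measure also rewards longer shared phrases; the stop word list and the second pair of definitions are hypothetical.

```python
# Small illustrative stop word list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "to"}


def content_words(definition):
    """Lowercase the definition and drop punctuation and stop words."""
    words = (w.strip(".,;:!?()").lower() for w in definition.split())
    return {w for w in words if w and w not in STOPWORDS}


def overlap_score(def_a, def_b):
    """Number of content words the two definitions share."""
    return len(content_words(def_a) & content_words(def_b))


# Definition fragments as shown on the slide.
alopecia = "a result of cancer treatment"
thrush = "a side effect of chemotherapy"
print(overlap_score(alopecia, thrush))

# Hypothetical definitions that do share a word.
print(overlap_score("a disorder of the hair", "inflammation of a hair follicle"))
```

The first pair scores zero even though a reader sees that the two fragments mean nearly the same thing, which is the gap the next measure is meant to close.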
  95. 95. Gloss Vector Measure of Semantic Relatedness <ul><li>Rely on co-occurrences of terms </li><ul><li>Terms that occur within some given number of terms of each other </li></ul><li>Allows for a fuzzier notion of matching
  96. 96. Exploits second order co-occurrences </li><ul><li>Friend of a friend relation
  97. 97. Suppose cancer_treatment and chemotherapy don't occur in text with each other. But, suppose that “survival” occurs with each.
  98. 98. cancer_treatment and chemotherapy are second order co-occurrences via “survival” </li></ul></ul>
  99. 99. Gloss Vector Measure of Semantic Relatedness <ul><li>Replace words or terms in definitions with vector of co-occurrences observed in corpus
  100. 100. Defined concept now represented by an averaged vector of co-occurrences
  101. 101. Measure relatedness of concepts via cosine between their respective vectors
  102. 102. Patwardhan and Pedersen, 2006 </li><ul><li>Inspired by Schütze, 1998 </li></ul><li>vector in WordNet::Similarity & UMLS::Similarity </li></ul>
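
A rough sketch of the gloss vector idea, under several assumptions: the four-sentence corpus is hypothetical, co-occurrence is counted within a sentence rather than within a fixed window, and multi-word terms are pre-joined with underscores. It is not the WordNet::Similarity or UMLS::Similarity implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus. cancer_treatment and chemotherapy never appear
# together, but both appear with "survival".
CORPUS = [
    "cancer_treatment improves survival",
    "chemotherapy improves survival",
    "alopecia follows chemotherapy",
    "shampoo cleans hair",
]


def cooccurrence_vectors(sentences):
    """Map each word to counts of the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def gloss_vector(definition, vectors):
    """Average the co-occurrence vectors of the definition's known words."""
    known = [w for w in definition.split() if w in vectors]
    total = Counter()
    for word in known:
        total.update(vectors[word])
    return {k: v / len(known) for k, v in total.items()} if known else {}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(CORPUS)
alopecia = gloss_vector("a result of cancer_treatment", vectors)
thrush = gloss_vector("a side effect of chemotherapy", vectors)
print(cosine(alopecia, thrush) > 0)
```

The two definitions share no content word, yet their vectors point in a similar direction because cancer_treatment and chemotherapy keep the same company in the corpus.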
  103. 103. Experimental Results <ul><li>Vector > Lesk > Info Content > Depth > Path </li><ul><li>Clear trend across various studies </li></ul><li>Dramatic differences when comparing to human reference standards (Vector > Lesk >> Info Content > Depth > Path) </li><ul><li>Banerjee and Pedersen, 2003 (IJCAI)
  104. 104. Pedersen, et al. 2007 (JBI) </li></ul><li>Differences less extreme in extrinsic task-based evaluations </li><ul><li>Human raters mix up similarity & relatedness? </li></ul></ul>
  105. 105. So far we've shown that ... <ul><li>… we can quantify the similarity and relatedness between concepts using a variety of sources of information </li><ul><li>Paths
  106. 106. Depths
  107. 107. Information content
  108. 108. Definitions
  109. 109. Co-occurrence / corpus data </li></ul><li>There is open source software to help you! </li></ul>
  110. 110. Sounds great! What now? <ul><li>SenseRelate Hypothesis : Most words in text will have multiple possible senses and will often be used with the sense most related to those of surrounding words </li><ul><li>He either has a cold or the flu </li><ul><li>Cold not likely to mean temperature </li></ul></ul><li>The underlying sentiment of a text can be discovered by determining which emotion is most related to the words in that text </li><ul><li>I cried a lot after my mother died. </li><ul><li>Happy? </li></ul></ul></ul>
  111. 111. SenseRelate! <ul><li>In coherent text words will be used in similar or related senses, and these will also be related to the overall topic or mood of a text
  112. 112. First applied to WSD in 2002 </li><ul><li>Banerjee and Pedersen, 2002 (WordNet)
  113. 113. Patwardhan et al., 2003 (WordNet)
  114. 114. Pedersen and Kolhatkar 2009 (WordNet)
  115. 115. McInnes et al., 2011 (UMLS) </li></ul><li>Recently applied to emotion classification </li><ul><li>Pedersen, 2012 (i2b2 suicide notes challenge) </li></ul></ul>
  116. 116. (MORE) GOOD NEWS! Free Open Source Software! <ul><li>WordNet::SenseRelate </li><ul><li>AllWords, TargetWord, WordToSet
  117. 117. http://senserelate.sourceforge.net </li></ul><li>UMLS::SenseRelate </li><ul><li>AllWords, TargetWord
  118. 118. http://search.cpan.org/dist/UMLS-SenseRelate/ </li></ul></ul>
  119. 119. SenseRelate for WSD <ul><li>Assign each word the sense which is most similar or related to one or more of its neighbors </li><ul><li>Pairwise
  120. 120. 2 or more neighbors </li></ul><li>Pairwise algorithm results in a trellis much like in HMMs </li><ul><li>More neighbors adds lots of information and a lot of computational complexity </li></ul></ul>
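
The pairwise idea can be illustrated with a toy sense inventory. The sense labels and relatedness scores below are hypothetical, and summing the best score per neighbor is one simple scoring choice made for this sketch, not necessarily the exact SenseRelate algorithm; in practice the scores would come from a measure such as lesk or vector.

```python
# Hypothetical sense inventory.
SENSES = {
    "cold": ["cold#temperature", "cold#illness"],
    "flu": ["flu#illness"],
}

# Hypothetical relatedness scores between pairs of senses.
RELATEDNESS = {
    frozenset(["cold#illness", "flu#illness"]): 0.9,
    frozenset(["cold#temperature", "flu#illness"]): 0.1,
}


def relatedness(sense_a, sense_b):
    """Look up a symmetric relatedness score, defaulting to 0."""
    return RELATEDNESS.get(frozenset([sense_a, sense_b]), 0.0)


def disambiguate(target, neighbors):
    """Pick the sense of target most related to its neighbors' senses."""
    def score(sense):
        # For each neighbor, credit the sense with its best-matching sense.
        return sum(
            max(relatedness(sense, other) for other in SENSES[neighbor])
            for neighbor in neighbors
        )
    return max(SENSES[target], key=score)


# "He either has a cold or the flu"
print(disambiguate("cold", ["flu"]))
```

Each added neighbor multiplies the number of sense pairs that must be scored, which is where the extra computational cost mentioned above comes from.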
  121. 121. SenseRelate - pairwise
  122. 122. SenseRelate – 2 neighbors
  123. 123. SenseRelate for Sentiment Classification <ul><li>Find emotion most related to context </li><ul><li>Similarity less effective since many words can be related to an emotion, but fewer are similar </li><ul><li>Related to happy? : love, food, success, ...
  124. 124. Similar to happy? : joyful, ecstatic, pleased, … </li></ul><li>Pairwise comparisons between emotion and senses of words in context </li></ul><li>Same form as Naive Bayesian model or Latent Variable model </li><ul><li>WordNet::SenseRelate::WordToSet </li></ul></ul>
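
A minimal sketch of picking the emotion most related to a context. The scores are hypothetical, and the sketch compares emotions with words directly instead of with word senses, which is a simplification; it is not the WordNet::SenseRelate::WordToSet interface.

```python
# Hypothetical relatedness scores between an emotion and a context word.
RELATEDNESS = {
    ("sad", "cried"): 0.8,
    ("sad", "died"): 0.7,
    ("sad", "mother"): 0.2,
    ("happy", "cried"): 0.2,
    ("happy", "died"): 0.1,
    ("happy", "mother"): 0.4,
}


def classify_emotion(context_words, emotions):
    """Return the emotion with the highest total relatedness to the context."""
    def total(emotion):
        return sum(RELATEDNESS.get((emotion, w), 0.0) for w in context_words)
    return max(emotions, key=total)


# "I cried a lot after my mother died."
print(classify_emotion(["cried", "mother", "died"], ["happy", "sad"]))
```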
  125. 125. SenseRelate - WordToSet
  126. 126. Experimental Results <ul><li>WSD results vary with part of speech </li><ul><li>Nouns reasonably accurate; verbs, adjectives and adverbs less so
  127. 127. Jiang-Conrath measure often a high performer for nouns (e.g., Patwardhan et al. 2003)
  128. 128. Vector and lesk do well because of coverage (handle mixed pairs while others don't) </li></ul><li>Sentiment classification results in 2011 i2b2 suicide notes challenge were disappointing (Pedersen, 2012) </li><ul><li>Suicide notes not very emotional! </li></ul></ul>
  129. 129. Future Work <ul><li>Integrate Unsupervised Clustering with WordNet::Similarity and UMLS::Similarity </li><ul><li>http://senseclusters.sourceforge.net </li></ul><li>Exploit graphical nature of SenseRelate </li><ul><li>e.g., Minimal Spanning Trees / Viterbi Algorithm to solve larger problem spaces? </li></ul><li>Incorporate SenseRelate into Biomedicus </li><ul><li>Ongoing, Pakhomov and Bill (UMTC)
  130. 130. UIMA/Java based Clinical NLP pipeline </li></ul><li>Attract and support users for all of these tools! </li></ul>
  131. 131. Conclusion <ul><li>Measures of semantic similarity and relatedness are supported by a rich body of theory, and good open source software </li><ul><li>http://wn-similarity.sourceforge.net
  132. 132. http://umls-similarity.sourceforge.net </li></ul><li>These measures can be viewed as building blocks for many NLP and AI applications </li><ul><li>Word sense disambiguation </li><ul><li>http://senserelate.sourceforge.net </li></ul><li>Sentiment classification </li></ul></ul>
  133. 133. UMLS::Similarity Collaborators <ul><li>Serguei Pakhomov : Mayo, UMTC </li><ul><li>PI of NIH similarity and relatedness grant that enabled development of UMLS::Similarity </li></ul><li>Bridget McInnes </li><ul><li>BS UMD, 2002; MS UMD, 2004
  134. 134. PhD UMTC, 2009
  135. 135. Post-doc UMTC, 2009 - 2011
  136. 136. Now at Securboration </li></ul><li>Ying Liu </li><ul><li>Post-doc UMTC 2009 - present </li></ul></ul>
  137. 137. WordNet::Similarity Collaborators <ul><li>Varada Kolhatkar : MS UMD, 2009 </li><ul><li>Now PhD student at U of Toronto </li></ul><li>Jason Michelizzi : MS UMD, 2005 </li><ul><li>Now US Navy pilot </li></ul><li>Siddharth Patwardhan : MS UMD, 2003 </li><ul><li>PhD Utah, 2009
  138. 138. Now at IBM Watson Research </li></ul><li>Satanjeev Banerjee : MS UMD 2002 </li><ul><li>PhD CMU, 2010
  139. 139. Now at Twitter Research </li></ul></ul>
  140. 140. Development Milestones <ul><li>Satanjeev Banerjee MS thesis 2002 </li><ul><li>Lesk algorithm for WSD, realized that lesk measure and WSD could be separated!
  141. 141. Original developer of WordNet::Similarity::lesk </li></ul><li>Siddharth Patwardhan, MS thesis 2003 </li><ul><li>Original developer of WordNet::Similarity
  142. 142. Original developer of WordNet::SenseRelate::TargetWord
  143. 143. Original developer of SNOMED::Similarity </li><ul><li>At Mayo, Summer 2003 with S. Pakhomov </li></ul></ul></ul>
  144. 144. Development Milestones <ul><li>Jason Michelizzi, MS thesis 2005 </li><ul><li>Rewrote WordNet::Similarity
  145. 145. Original developer of WordNet::SenseRelate::AllWords and WordNet::SenseRelate::WordToSet </li></ul><li>Varada Kolhatkar, MS thesis 2009 </li><ul><li>Rewrote WordNet::SenseRelate::AllWords </li></ul></ul>
  146. 146. Development Milestones <ul><li>Bridget McInnes, post-doc UMTC 2009 – 2011 </li><ul><li>Original developer of UMLS::Similarity </li><ul><li>Spun off from WordNet::Similarity and SNOMED::Similarity </li></ul><li>Original developer of UMLS::SenseRelate </li><ul><li>Spun off from WordNet::SenseRelate::AllWords and WordNet::SenseRelate::TargetWord </li></ul></ul><li>Ying Liu, post-doc UMTC 2009 – present </li><ul><li>Original developer of gloss vector measure for UMLS::Similarity </li><ul><li>Spun off (conceptually at least) from WordNet::Similarity::vector </li></ul></ul></ul>
  147. 147. References <ul><li>S. Banerjee and T. Pedersen. An adapted Lesk algorithm for word sense disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pages 136-145, Mexico City, February 2002.
  148. 148. S. Banerjee and T. Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805-810, Acapulco, August 2003.
  149. 149. J. Caviedes and J. Cimino. Towards the development of a conceptual distance metric for the UMLS. Journal of Biomedical Informatics, 37(2):77-85, April 2004.
  150. 150. J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, pages 19-33, Taiwan, 1997. </li></ul>
  151. 151. References <ul><li>C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In C. Fellbaum, editor, WordNet: An electronic lexical database, pages 265-283. MIT Press, 1998.
  152. 152. M.E. Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation, pages 24-26. ACM Press, 1986.
  153. 153. D. Lin. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning, Madison, August 1998.
  154. 154. B. McInnes, T. Pedersen, Y. Liu, G. Melton and S. Pakhomov. Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity. In Proceedings of the Annual Symposium of the American Medical Informatics Association, pages 895-904, Washington, DC, October 2011. </li></ul>
  155. 155. References <ul><li>H.A. Nguyen and H. Al-Mubaid. New ontology-based semantic similarity measure for the biomedical domain. In Proceedings of the IEEE International Conference on Granular Computing, pages 623-628, Atlanta, GA, May 2006.
  156. 156. S. Patwardhan, S. Banerjee, and T. Pedersen. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pages 241-257, Mexico City, February 2003.
  157. 157. S. Patwardhan and T. Pedersen. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts. In Proceedings of the EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together, pages 1-8, Trento, Italy, April 2006.
  158. 158. T. Pedersen. Rule-based and lightly supervised methods to predict emotions in suicide notes. Biomedical Informatics Insights, 2012:5 (Suppl. 1):185-193, January 2012. </li></ul>
  159. 159. References <ul><li>T. Pedersen and V. Kolhatkar. WordNet::SenseRelate::AllWords - a broad coverage word sense tagger that maximizes semantic relatedness. In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies 2009 Conference, pages 17-20, Boulder, CO, June 2009.
  160. 160. T. Pedersen, S. Pakhomov, S. Patwardhan, and C. Chute. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics, 40(3):288-299, June 2007.
  161. 161. R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1):17-30, 1989. </li></ul>
  162. 162. References <ul><li>P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal, August 1995.
  163. 163. H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97-123, 1998.
  164. 164. J. Zhong, H. Zhu, J. Li, and Y. Yu. Conceptual graph matching for semantic search. Proceedings of the 10th International Conference on Conceptual Structures, pages 92-106, 2002. </li></ul>
  165. 165. Semantic Similarity for the Gene Ontology <ul><li>Various measures supported: </li><ul><li>http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml </li></ul></ul>
  166. 166. Path Based Relatedness <ul><li>Ontologies include relations other than is-a
  167. 167. These can be used to find shortest paths between concepts </li><ul><li>e.g., WordNet::Similarity::hso
  168. 168. However, a path made up of different kinds of relations can lead to big semantic jumps
  169. 169. Aspirin treats headaches which are a symptom of the flu which can be prevented by a flu vaccine which is recommended for children </li><ul><li>…. so aspirin and children are related ?? </li></ul></ul></ul>
  170. 170. Path Based Measure of Relatedness <ul><li>WebServices::UMLSKS::Similarity </li><ul><li>HSO implementation for UMLS </li><ul><li>Developed at University of Minnesota, Duluth 2010-2012, ongoing
  171. 171. http://search.cpan.org/dist/WebService-UMLSKS-Similarity/ </li></ul></ul></ul>
