A similarity measure based on semantic and linguistic information

  1. 1. A Similarity Measure Based on Semantic and Linguistic Information<br />Nitish Aggarwal<br />DERI, NUI Galway<br />firstname.lastname@deri.org<br />Wednesday, 15th June 2011<br />DERI, Reading Group<br />
  2. 2. Based On:<br />“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”<br />Authors: Giuseppe Pirro and Jérôme Euzenat<br />Published: International Semantic Web Conference, 2010<br />“SyMSS: A syntax-based measure for short-text semantic similarity”<br />Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias<br />Published: Journal Data & Knowledge Engineering, Volume 70, Issue 4, April 2011<br />
  3. 3. Overview<br />Introduction<br />Classical Approaches<br />Ontology-based Similarity<br />Set of relations<br />Information Content<br />SyMSS (Syntax-based)<br />Deep Parsing<br />Influence of adjectives and adverbs<br />Conclusion<br />
  4. 4. Introduction & Motivation<br />Short-text Similarity<br />Lack of Semantics and Linguistics<br />Applications<br />Semantic Annotation<br />Semantic Search<br />Information Retrieval and Extraction<br />
  5. 5. Classical Approaches<br />String Similarity<br />Levenshtein distance, Dice Coefficient<br />Corpus-based<br />ESA, Google distance, Vector-Space Model<br />Ontology-based<br />Path distance, Information content<br />Syntax Similarity<br />Word order, Part of Speech<br />
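
The two string-similarity measures named on this slide can be sketched in a few lines. This is a minimal illustration, assuming the Dice coefficient is computed over sets of character bigrams (a common choice that the slide does not specify); the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (assumed tokenisation)."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        # Strings too short to have bigrams: fall back to exact match.
        return 1.0 if a == b else 0.0
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))
```

Both measures look only at surface characters, which is why the deck moves on to corpus-based, ontology-based and syntax-based approaches.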
  6. 6. First Paper:<br />“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”<br />Authors: Giuseppe Pirro and Jérôme Euzenat<br />Published: International Semantic Web Conference, 2010<br />
  7. 7. Ontology-based - Overview<br />Features<br />Whole set of semantic relations defined in an ontology<br />Resnik’s Information Content<br />IC(c) = -log p(c)<br />Intrinsic Information Content<br />Avoids the analysis of large corpora<br />Extended Information Content<br />Maps the feature-based model to the information theoretic domain<br />
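
Resnik's definition IC(c) = -log p(c) can be illustrated with a short sketch. It assumes p(c) is estimated as the share of corpus occurrences that fall under concept c; the function name and the counts are hypothetical, not values from the paper.

```python
import math


def information_content(concept_count: int, total_count: int) -> float:
    """IC(c) = -log p(c), with p(c) estimated as concept_count / total_count."""
    if not 0 < concept_count <= total_count:
        raise ValueError("need 0 < concept_count <= total_count")
    p = concept_count / total_count
    return max(0.0, -math.log(p))  # max() only avoids returning -0.0 when p == 1


# Hypothetical counts: a general concept covers most occurrences,
# a specific one covers few, so the specific concept carries more information.
ic_general = information_content(900, 1000)
ic_specific = information_content(5, 1000)
```

The dependence on corpus counts is the cost that the intrinsic variant on this slide is meant to avoid.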
  8. 8. Ontology-based - Why whole set?<br />Relation: Part of<br />Eyes<br />Ears<br />
  9. 9. Ontology-based - Model<br />Tversky’s feature-based similarity model<br />Common features of two concepts ~ similarity<br />Extra features ~ 1/similarity<br />Ratio-based formulation of Tversky’s model<br />
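
The equations for this slide are not in the transcript, so the sketch below uses the standard textbook forms of Tversky's model. The contrast form is theta·F(A ∩ B) − alpha·F(A \ B) − beta·F(B \ A). The ratio form is F(A ∩ B) / (F(A ∩ B) + alpha·F(A \ B) + beta·F(B \ A)). It assumes that the salience function F is plain set cardinality, and the feature sets for "car" and "bicycle" are invented for illustration.

```python
def tversky_contrast(f1: set, f2: set, theta: float = 1.0,
                     alpha: float = 1.0, beta: float = 1.0) -> float:
    """Contrast model: common features raise similarity, extra features lower it."""
    return theta * len(f1 & f2) - alpha * len(f1 - f2) - beta * len(f2 - f1)


def tversky_ratio(f1: set, f2: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Ratio model: a normalised score between 0 and 1."""
    common = len(f1 & f2)
    denom = common + alpha * len(f1 - f2) + beta * len(f2 - f1)
    return common / denom if denom else 0.0


# Hypothetical feature sets (not taken from the slides).
car = {"wheels", "brakes", "engine", "doors"}
bicycle = {"wheels", "brakes", "pedals"}

score = tversky_ratio(car, bicycle, alpha=1.0, beta=1.0)  # 2 / (2 + 2 + 1)
```

With alpha = beta = 1 the ratio form reduces to the Jaccard index, which is a convenient sanity check.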
  10. 10. Ontology-based - Mapping<br /><ul><li>Mapping between feature-based and information theoretic similarity models</li></ul>MSCA: Most Specific Common Abstraction<br />
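
The mapping table itself is not in the transcript. One way to read the mapping is sketched here, under stated assumptions: the common features of two concepts correspond to IC(MSCA), and the features specific to each concept correspond to IC(c) − IC(MSCA). Plugging these into the ratio form of Tversky's model with alpha = beta = 1 gives IC(msca) / (IC(c1) + IC(c2) − IC(msca)). The function name and the IC values are hypothetical.

```python
def ratio_similarity_from_ic(ic_c1: float, ic_c2: float, ic_msca: float,
                             alpha: float = 1.0, beta: float = 1.0) -> float:
    """Ratio-model similarity with feature sets replaced by information content."""
    common = ic_msca             # assumed stand-in for the shared features
    only_c1 = ic_c1 - ic_msca    # assumed stand-in for features unique to c1
    only_c2 = ic_c2 - ic_msca    # assumed stand-in for features unique to c2
    denom = common + alpha * only_c1 + beta * only_c2
    return common / denom if denom > 0 else 0.0


# Hypothetical IC values for two concepts and their MSCA.
sim = ratio_similarity_from_ic(ic_c1=8.0, ic_c2=6.0, ic_msca=4.0)  # 4 / (8 + 6 - 4)
```

Two concepts whose only common abstraction is the root (IC of 0) score 0, and a concept compared with itself scores 1.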
  11. 11. Ontology-based - Example<br />T1: Car<br />T2: Bicycle<br />Example of Concept Feature<br />
  12. 12. Ontology-based - Example<br />T1: Car<br />T2: Bicycle<br />Example of Concept Feature<br />
  13. 13. Ontology-based - Framework<br />Intrinsic information content (iIC)<br />sub(c) is the number of sub-concepts of a given concept c.<br />Extended information content (eIC)<br />EIC(c) is the relatedness coefficient computed over all kinds of relations<br />
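
The iIC equation is missing from the transcript. A common intrinsic formulation that matches the description of sub(c) is iIC(c) = 1 − log(sub(c) + 1) / log(max_con), where max_con is the total number of concepts in the taxonomy. The sketch below assumes that formulation, a tree-shaped taxonomy, and an invented toy hierarchy. It covers the intrinsic variant only.

```python
import math

# Hypothetical toy taxonomy: concept -> direct sub-concepts (not from the slides).
TAXONOMY = {
    "vehicle": ["car", "bicycle"],
    "car": ["taxi", "ambulance"],
    "bicycle": [],
    "taxi": [],
    "ambulance": [],
}


def count_sub_concepts(concept: str, taxonomy: dict) -> int:
    """sub(c): number of direct and indirect sub-concepts (assumes a tree)."""
    return sum(1 + count_sub_concepts(child, taxonomy)
               for child in taxonomy.get(concept, []))


def intrinsic_ic(concept: str, taxonomy: dict) -> float:
    """iIC(c) = 1 - log(sub(c) + 1) / log(max_con), using only the taxonomy."""
    max_con = len(taxonomy)
    if max_con < 2:
        raise ValueError("taxonomy needs at least two concepts")
    return 1.0 - math.log(count_sub_concepts(concept, taxonomy) + 1) / math.log(max_con)
```

Leaves get the maximum value of 1 and the root gets 0, so more specific concepts are more informative without any corpus counts.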
  14. 14. Ontology-based - Evaluation of Similarity<br />DataSet: 65 human-evaluated pairs<br />Correlation values:<br />
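
The correlation table is not in the transcript. As an illustration of how such an evaluation is typically scored, here is a minimal Pearson correlation between a measure's scores and human ratings. The transcript does not say which coefficient was used, so Pearson is an assumption and the function name is hypothetical.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (norm_x * norm_y)
```

A value close to 1 means the measure ranks and spaces the word pairs much as the human judges did.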
  15. 15. Ontology-based - Evaluation of Relatedness<br />DataSet: WordSim-353<br />Correlation value:<br />
  16. 16. Ontology-based - Summary<br />Intrinsic similarity measure<br />Ontology-based similarity<br />Outperforms corpus measures<br />Limitation<br />No short-text<br />Model-based<br />Only concepts in the ontology are considered (e.g. car accident)<br />
  17. 17. Second paper (SyMSS)<br />“SyMSS: A syntax-based measure for short-text semantic similarity”<br />Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias<br />Published: Journal Data & Knowledge Engineering, Volume 70, Issue 4, April 2011<br />
  18. 18. SyMSS - Overview<br />SyMSS = “syntax-based measure for short-text semantic similarity”<br />Syntactic Information<br />Not only word order<br />Deep Parsing<br />Parts of speech<br />Semantic Information<br />WordNet similarity<br />Different ontology-based similarity measures<br />
  19. 19. SyMSS - Semantic Information<br />Path-based measures<br />Shortest path<br />Hirst and St-Onge (HSO)<br />Information Content<br />Resnik measure<br />Jiang and Conrath measure<br />Lin measure<br />Gloss-based measures<br />Gloss Overlap and Gloss Vector<br />
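
The three information-content measures listed on this slide differ only in how they combine the IC of the two concepts with the IC of their most specific common subsumer. The sketch below uses their standard definitions and takes precomputed IC values as input. The function names and the sample values are hypothetical.

```python
def resnik(ic_lcs: float) -> float:
    """Resnik: the IC of the most specific common subsumer."""
    return ic_lcs


def lin(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(lcs) / (IC(c1) + IC(c2)), a score between 0 and 1."""
    denom = ic_c1 + ic_c2
    # Undefined when both concepts have zero IC; 0.0 is a convention chosen here.
    return 2 * ic_lcs / denom if denom else 0.0


def jiang_conrath_distance(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Jiang and Conrath: a distance, IC(c1) + IC(c2) - 2 * IC(lcs)."""
    return ic_c1 + ic_c2 - 2 * ic_lcs


# Hypothetical IC values for two concepts and their common subsumer.
lin_score = lin(8.0, 6.0, 4.0)                        # 8 / 14
jc_distance = jiang_conrath_distance(8.0, 6.0, 4.0)   # 14 - 8
```

Jiang and Conrath is a distance rather than a similarity, so it has to be inverted or rescaled before it can be combined with the other scores.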
  20. 20. SyMSS - Syntactic Information<br />Parse tree<br />Phrases<br />Heads of phrases<br />Head similarity<br />Heads of phrases that have the same syntactic function<br />Penalization factor<br />Non-shared phrases<br />
  21. 21. SyMSS - Model<br />My brother has a dog with four legs<br />My brother has four legs<br />Sim(has, has) = 1<br />Sim(brother, brother) = 1<br />Sim(dog, leg) = 0.1414<br />PF = 0.03<br />
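
The slide gives the ingredients of the score but not how they are combined. This sketch assumes the sentence similarity is the mean of the head similarities minus the penalization factor PF for each non-shared phrase. It also assumes one non-shared phrase ("with four legs") in the example. The function name is hypothetical.

```python
def symss_similarity(head_similarities: list, non_shared_phrases: int,
                     penalization_factor: float) -> float:
    """Mean head similarity minus a penalty per non-shared phrase (assumed form)."""
    if not head_similarities:
        return 0.0
    mean_similarity = sum(head_similarities) / len(head_similarities)
    return mean_similarity - non_shared_phrases * penalization_factor


# Values from the slide: Sim(has, has), Sim(brother, brother), Sim(dog, leg), PF.
score = symss_similarity([1.0, 1.0, 0.1414],
                         non_shared_phrases=1,  # assumption: "with four legs"
                         penalization_factor=0.03)
```

Under these assumptions the example pair scores about 0.684: the low noun similarity between "dog" and "leg" pulls the mean down, and the extra phrase costs a further 0.03.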
  22. 22. SyMSS - Evaluation<br />DataSet: 30 pairs out of 65 human-evaluated pairs<br />Correlation values:<br />
  23. 23. SyMSS - Effect of adverb and adjective<br />Sentence 1: “I have a big dog”<br />Sentence 2: “I have a little dog”<br />8.68% gain in SyMSS with HSO<br />
  24. 24. SyMSS - Summary<br />Syntax-based similarity considers…<br />Nouns and verbs<br />Influence of adjectives and adverbs<br />Limitation<br />Depends on the parsed structure<br />E.g. sentences that are not grammatically correct<br />Depends on word similarity<br />
  25. 25. Conclusion<br />No established method for short text<br />Parsing of phrases is difficult<br />Concept similarity depends on the model<br />Weak model<br />E.g. xebr: Extraordinary Income and xebr: Other Operating Income -><br />Pathlength = 0.2 and Expert = 0.8<br />Need a syntactic similarity for concept tags (word or phrase)<br />