A similarity measure based on semantic and linguistic information

  • A Similarity Measure Based on Semantic and Linguistic Information
    Nitish Aggarwal
    DERI, NUI Galway
    firstname.lastname@deri.org
    Wednesday, 15th June 2011
    DERI, Reading Group
  • Based On:
    “A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”
    Authors: Giuseppe Pirró and Jérôme Euzenat
    Published: International Semantic Web Conference, 2010
    “SyMSS: A syntax-based measure for short-text semantic similarity”
    Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias
    Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011
  • Overview
    Introduction
    Classical Approaches
    Ontology-based Similarity
    Set of relations
    Information Content
    SyMSS (Syntax-based)
    Deep Parsing
    Influence of adjectives and adverbs
    Conclusion
  • Introduction & Motivation
    Short-text Similarity
    Lack of Semantics and Linguistics
    Applications
    Semantic Annotation
    Semantic Search
    Information Retrieval and Extraction
  • Classical Approaches
    String Similarity
    Levenshtein distance, Dice coefficient
    Corpus-based
    ESA, Google distance, Vector-Space Model
    Ontology-based
    Path distance, Information content
    Syntax Similarity
    Word order, Part of speech
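As a small illustration of the string-similarity family listed above, here is a minimal Python sketch of Levenshtein distance and the Dice coefficient. Computing Dice over character bigrams is an assumption made for this example; the slides do not say which unit (characters, bigrams, tokens) is intended.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dice(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (assumed unit)."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        return 1.0                          # two strings without bigrams: treat as identical
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))


print(levenshtein("kitten", "sitting"))
print(dice("night", "nacht"))
```

Both measures look only at surface form, which is why the later slides turn to ontology-based and syntax-based information.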
  • First Paper:
    “A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”
    Authors: Giuseppe Pirró and Jérôme Euzenat
    Published: International Semantic Web Conference, 2010
  • Ontology-based - Overview
    Features
    Whole set of semantic relations defined in an ontology
    Resnik’s Information Content
    IC(c) = -log p(c)
    Intrinsic Information Content
    Avoids the analysis of large corpora
    Extended Information Content
    Maps the feature-based model to the information-theoretic domain
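The definition IC(c) = -log p(c) can be sketched as follows. Estimating p(c) as a relative frequency from corpus counts is the usual reading of Resnik's measure; the function name and the counts are hypothetical and serve only to show the shape of the computation.

```python
import math


def information_content(concept_count: int, total_count: int) -> float:
    """IC(c) = -log p(c), with p(c) estimated as concept_count / total_count.

    concept_count is assumed to include occurrences of all concepts subsumed by c.
    """
    if concept_count <= 0 or total_count <= 0 or concept_count > total_count:
        raise ValueError("counts must satisfy 0 < concept_count <= total_count")
    return -math.log(concept_count / total_count)


# Hypothetical counts: a rare concept carries more information than a frequent one.
print(information_content(10, 1_000_000))
print(information_content(500_000, 1_000_000))
```

A concept that covers the whole corpus (p = 1) has an IC of zero, which matches the intuition that the root of a taxonomy is uninformative.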
  • Ontology-based - Why whole set?
    Relation: Part of
    Eyes
    Ears
  • Ontology-based - model
    Tversky’s feature-based similarity model
    Common features of two concepts ~ similarity
    Extra (non-shared) features ~ 1/similarity
    Ratio-based formulation of Tversky’s model
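A minimal sketch of the ratio-based formulation, assuming the standard form of Tversky's ratio model with set cardinality as the salience function: sim(a, b) = |A ∩ B| / (|A ∩ B| + α|A \ B| + β|B \ A|). The feature sets for car and bicycle are invented for illustration and are not taken from the slides.

```python
def tversky_ratio(features_a: set, features_b: set,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky ratio model using set size as the salience function (an assumption)."""
    common = len(features_a & features_b)   # shared features raise similarity
    only_a = len(features_a - features_b)   # features unique to a lower it
    only_b = len(features_b - features_a)   # features unique to b lower it
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 0.0


# Hypothetical feature sets
car = {"wheels", "seats", "steering", "engine"}
bicycle = {"wheels", "seats", "steering", "pedals"}
print(tversky_ratio(car, bicycle))
```

With α = β = 1 the measure reduces to the Jaccard index; other weights make the comparison asymmetric.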
  • Ontology-based - Mapping
    Mapping between feature-based and information-theoretic similarity models
    MSCA: Most Specific Common Abstraction
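One way to read this mapping, sketched under stated assumptions: common features are represented by IC(msca(c1, c2)), and the features unique to each concept by IC(c1) - IC(msca) and IC(c2) - IC(msca). Substituting these into the ratio model with α = β = 1 gives IC(msca) / (IC(c1) + IC(c2) - IC(msca)). The function name and the IC values below are hypothetical.

```python
def ratio_similarity_from_ic(ic_c1: float, ic_c2: float, ic_msca: float) -> float:
    """Ratio-model similarity expressed with information content.

    Assumed mapping (alpha = beta = 1):
      common features    -> IC(msca)
      features only in c1 -> IC(c1) - IC(msca)
      features only in c2 -> IC(c2) - IC(msca)
    """
    denominator = ic_c1 + ic_c2 - ic_msca
    return ic_msca / denominator if denominator > 0 else 0.0


# Hypothetical IC values for two concepts and their most specific common abstraction
print(ratio_similarity_from_ic(ic_c1=6.0, ic_c2=4.0, ic_msca=2.0))
```

When both concepts coincide with their MSCA the score is 1, and it falls toward 0 as the MSCA becomes more general.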
  • Ontology-based - Example
    T1: Car
    T2: Bicycle
    Example of concept features
  • Ontology-based - Framework
    Intrinsic information content (iIC)
    Defined in terms of sub(c), the number of sub-concepts of the given concept c
    Extended information content (eIC)
    Uses EIC(c), a relatedness coefficient computed over all kinds of relations
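A sketch of an intrinsic IC computed from the taxonomy alone, assuming the formulation of Seco et al., iIC(c) = 1 - log(sub(c) + 1) / log(max_con), where max_con is the total number of concepts in the ontology. That this is the exact formula used on the slide is an assumption; the concept counts are hypothetical.

```python
import math


def intrinsic_ic(num_subconcepts: int, total_concepts: int) -> float:
    """Intrinsic IC from ontology structure only (Seco et al. style, assumed)."""
    if total_concepts < 2 or not 0 <= num_subconcepts < total_concepts:
        raise ValueError("need total_concepts >= 2 and 0 <= num_subconcepts < total_concepts")
    return 1.0 - math.log(num_subconcepts + 1) / math.log(total_concepts)


# Hypothetical ontology of 1000 concepts
print(intrinsic_ic(0, 1000))     # leaf concept: maximally informative
print(intrinsic_ic(50, 1000))    # inner concept with 50 sub-concepts
print(intrinsic_ic(999, 1000))   # root concept: subsumes everything else
```

No corpus is needed: a leaf scores 1, the root scores 0, and every other concept falls in between according to how many concepts it subsumes.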
  • Ontology-based - Evaluation of Similarity
    Dataset: 65 human-evaluated pairs
    Correlation values:
  • Ontology-based - Evaluation of Relatedness
    Dataset: WordSim-353
    Correlation values:
  • Ontology-based - Summary
    Intrinsic similarity measure
    Ontology-based similarity
    Outperforms corpus measures
    Limitations
    No support for short text
    Model-based
    E.g. only concepts in the ontology are considered (cf. “car accident”)
  • Second Paper (SyMSS):
    “SyMSS: A syntax-based measure for short-text semantic similarity”
    Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias
    Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011
  • SyMSS - Overview
    SyMSS = “syntax-based measure for short-text semantic similarity”
    Syntactic Information
    Not only word order
    Deep Parsing
    Parts of speech
    Semantic Information
    WordNet similarity
    Different ontology-based similarity measures
  • SyMSS - Semantic Information
    Path-based measures
    Shortest path
    Hirst and St-Onge (HSO)
    Information Content
    Resnik measure
    Jiang and Conrath measure
    Lin measure
    Gloss-based measures
    Gloss Overlap and Gloss Vector
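The three information-content measures listed above can be sketched from their standard definitions, given the IC of each concept and of their lowest common subsumer (LCS). The IC values in the example are hypothetical; in practice they would come from a corpus or from the taxonomy.

```python
def resnik(ic_lcs: float) -> float:
    """Resnik similarity: the IC of the lowest common subsumer."""
    return ic_lcs


def lin(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)), in [0, 1]."""
    total = ic_c1 + ic_c2
    return 2 * ic_lcs / total if total > 0 else 0.0


def jiang_conrath_distance(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Jiang and Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs); 0 means identical."""
    return ic_c1 + ic_c2 - 2 * ic_lcs


# Hypothetical IC values
print(resnik(2.0))
print(lin(3.0, 5.0, 2.0))
print(jiang_conrath_distance(5.0, 3.0, 2.0))
```

Note that Jiang and Conrath define a distance, so it has to be inverted or otherwise transformed before it can be combined with similarity scores.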
  • SyMSS - Syntactic Information
    Parse tree
    phrases
    Head of phrases
    Head similarity
    Head of phrases which have same syntactic function
    Penalization factor
    Non-shared phrases
  • SyMSS - Model
    My brother has a dog with four legs
    My brother has four legs
    Sim(has, has) = 1
    Sim(brother, brother) = 1
    Sim(dog, leg) = 0.1414
    PF (penalization factor) = 0.03
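To show how the values above might combine into one sentence score, here is a sketch that assumes the score is the mean of the head similarities over phrases with the same syntactic function, minus the penalization factor once per non-shared phrase. Both the aggregation rule and the count of one non-shared phrase (“with four legs”) are assumptions made for this example, not statements from the slide.

```python
def symss_score(head_similarities: list[float],
                non_shared_phrases: int,
                penalization_factor: float) -> float:
    """Mean head similarity minus a penalty per non-shared phrase (assumed aggregation)."""
    if not head_similarities:
        return 0.0
    mean_similarity = sum(head_similarities) / len(head_similarities)
    return mean_similarity - non_shared_phrases * penalization_factor


# Head similarities from the slide: (has, has), (brother, brother), (dog, leg)
score = symss_score([1.0, 1.0, 0.1414], non_shared_phrases=1, penalization_factor=0.03)
print(round(score, 4))
```

The low dog/leg similarity dominates the result, which illustrates why pairing heads by syntactic function matters more than plain word overlap for these two sentences.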
  • SyMSS - Evaluation
    Dataset: 30 of the 65 human-evaluated pairs
    Correlation values:
  • SyMSS - Effect of adverb and adjective
    Sentence 1: “I have a big dog”
    Sentence 2: “I have a little dog”
    8.68% gain in SyMSS with HSO
  • SyMSS - Summary
    Syntax-based similarity considers…
    Nouns and verbs
    Influence of adjectives and adverbs
    Limitations
    Depends on the parsed structure
    E.g. input that is not grammatically correct
    Depends on word similarity
  • Conclusion
    No established method for short text
    Parsing of phrases is difficult
    Concept similarity depends on the model
    Weak model
    E.g. xebr: Extraordinary Income and xebr: Other Operating Income ->
    Path length = 0.2 and Expert = 0.8
    Need a syntactic similarity for concept tags (word or phrase)