Conceptual  Similarity:  
  Why,  Where,  How


         Michalis  Sfakakis
         Laboratory  on  Digital  Libraries  &  Electronic  Publishing,  
         Department  of  Archives  and  Library  Sciences,  
         Ionian  University,  Greece  




First Workshop on Digital Information Management, March 30-31, 2011
                                 Corfu, Greece
Do  we  need  similarity?

  Are  the  following  objects  similar?
    (Similarity,  SIMILARITY)  
       As  character  sequences,  NO!
         How  do  they  differ?
       As  character  sequences,  but  case  
       insensitive,  Yes!  
       As  English  words,  Yes!  
         Same  word!  They  have  the  same  definition,  
         written  differently  
Contents

 Introduction

 Disciplines

 How  we  measure  similarity
   Focus  on  Ontology  Learning  evaluation
What  about  the  similarity  of  the  objects?
   (1,  a)
       The first object is the number one and the second is
       the first letter of the English alphabet. As one is a
       number and the other is a letter, they are different!

       As labels in a numbered list, e.g. a chapter or a
       paragraph number, both represent the first object of
       the list (the first chapter, paragraph, etc.).
       Therefore, they could be considered similar!
Results  for  an  Information  Need




  How  similar  are  the  Results?  Which  one  to  select?
Comparing  Concepts


  Are the following objects similar?
   (Disease, Illness)
     As English words, or as character
     sequences, they are not similar!
        How do they differ?
     As synonymous terms in a Thesaurus, they
     both represent the same concept
     (related by the equivalence relationship)
Comparing  Hierarchies




  [Figure: two concept hierarchies, left and right*]

  How similar is the node car from the left hierarchy to the
  node auto from the right hierarchy?
  How similar is the node van from both hierarchies?


*   [Dellschaft  and  Staab,  2006]
Similarity  is  a  context  dependent  concept

       Merriam-Webster's Learner's Dictionary
       defines similarity as*:
            A  quality  that  makes  one  person  or  thing  like  
            another


       Therefore,  the  context  and  the    
       characteristics  in  common  are  required  in  
       order  to  specify  and  measure  similarity  

*   http://www.learnersdictionary.com/search/similarity
Where  the  concept  of  similarity  is  
encountered


  Machine  learning
     Ontology  Learning  
     Schema  &  Ontology  Matching  and  Mapping  
     Clustering  
     IR  

     pattern  recognition  algorithm

  Vital  part  of  the  Semantic  Web  development
Precision  &  Recall  in  IR,  measuring    
similarity  between  answers  
     Let C be the result set for a query (the retrieved
     documents, i.e. the Computed set)

     We also need to know the correct results for the
     query (all the relevant documents, i.e. the Reference
     set R)
          Precision: the fraction of retrieved documents that
          are relevant to the search
          Recall: the fraction of the documents relevant to
          the query that are successfully retrieved




Wikipedia:    http://en.wikipedia.org/wiki/Precision_and_recall
measure  similarity
     Precision & Recall are two widely used
     metrics for evaluating the correctness of
     a pattern recognition algorithm


     Recall and Precision depend on the
     outcome (oval) of a pattern recognition
     algorithm and its relation to all relevant
     patterns (left) and the non-relevant
     patterns (right).
     The  more  correct  results  (green),  the  
     better.
          Precision:  horizontal  arrow.
          Recall:  diagonal  arrow.  

Wikipedia:    http://en.wikipedia.org/wiki/Precision_and_recall
Precision  &  Recall,  once  more
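  [Figure: Venn diagram of the Reference set R and the Computed set C inside the whole collection D]

  Precision
     P = |R ∩ C| / |C|

  Recall
     R = |R ∩ C| / |R|

  TP = R ∩ C          (True Positive)
  TN = D \ (R ∪ C)    (True Negative)
  FN = R \ C          (False Negative)
  FP = C \ R          (False Positive)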
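
A minimal sketch of the set-based definitions, assuming R is the Reference set and C the Computed set as introduced earlier. The function name, the document identifiers and the zero-denominator convention are illustrative assumptions, not part of the slides.

```python
def precision_recall(reference, computed):
    """Return (precision, recall) of a computed set against a reference set."""
    reference, computed = set(reference), set(computed)
    true_positives = reference & computed  # TP = R ∩ C
    # Assumed convention: an empty denominator yields 0.0.
    precision = len(true_positives) / len(computed) if computed else 0.0
    recall = len(true_positives) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical document identifiers, for illustration only.
reference = {"d1", "d2", "d3", "d4"}  # R: all relevant documents
computed = {"d2", "d3", "d5"}         # C: retrieved documents

p, r = precision_recall(reference, computed)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three retrieved documents are relevant, and two of the four relevant documents were retrieved.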
Overall  evaluation,  
combining  Precision &  Recall

  Given Precision & Recall, the F-measure combines
  them for an overall evaluation

  Balanced F-measure (P & R are evenly weighted)
     F1 = 2*(P*R)/(P+R)

  Weighted F-measure
     Fb = (1+b^2)*(P*R)/(b^2*P+R), b non-zero

     F2 (b=2) weights recall twice as much as precision
     F0.5 (b=0.5) weights precision twice as much as recall
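
The weighted formula can be sketched as a small function. The name `f_measure`, the sample values and the handling of a zero denominator are assumptions made for illustration.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted F-measure: beta > 1 favours recall, beta < 1 favours precision."""
    if beta == 0:
        raise ValueError("beta must be non-zero")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when both P and R are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical values: recall is higher than precision.
p, r = 0.5, 1.0
f1 = f_measure(p, r)             # balanced
f2 = f_measure(p, r, beta=2)     # leans towards recall
f05 = f_measure(p, r, beta=0.5)  # leans towards precision
print(round(f05, 3), round(f1, 3), round(f2, 3))
```

Because recall is the stronger of the two values here, the score rises as b grows.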
Measuring  Similarity,    
Comparing  two  Ontologies



       A  simplified  definition  of  a  core  ontology*:
            The structure O := (C, root, ≤C) is called a core ontology. C
            is a set of concept identifiers and root is a designated root
            concept for the partial order ≤C on C. This partial order is
            called concept hierarchy or taxonomy. The equation
            ∀c ∈ C : c ≤C root holds for this concept hierarchy.

       Levels  of  comparison  
             Lexical,  how  terms  are  used  to  convey  meanings
             Conceptual,  which  conceptual  relations  exist  between  terms  



*   [Dellschaft  and  Staab,  2006]
Gold  Standard based  
Evaluation  of  Ontology  Learning
  [Figure: reference ontology OR (left) and computed ontology OC (right)]

  Given a pre-defined ontology
     The so-called Gold Standard or Reference
  Compare the Learned (Computed) Ontology
  with the Gold Standard
Measuring Similarity,
Lexical Comparison Level - LP, LR

  [Figure: reference ontology OR (left) and computed ontology OC (right)]

  Lexical Precision & Lexical Recall
      LP(OC, OR) = |CC ∩ CR| / |CC|
      LR(OC, OR) = |CC ∩ CR| / |CR|

  Lexical precision and recall reflect how well the
  learned lexical terms CC cover the target domain CR
  For the above example, LP = 4/6 = 0.67 and LR = 4/5 = 0.8
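
The example values can be checked with a short sketch. The two concept sets are an assumption: they are read off the semantic cotopy examples for the root nodes later in this deck, since the original figure is not available here. Function and variable names are illustrative.

```python
# Assumed concept sets, taken from the sc(root, ...) examples in this deck.
C_R = {"root", "bike", "car", "van", "coupé"}          # reference ontology OR
C_C = {"root", "bike", "auto", "BMX", "van", "coupé"}  # computed ontology OC


def lexical_precision(c_computed, c_reference):
    """Share of learned terms that also occur in the reference."""
    return len(c_computed & c_reference) / len(c_computed)


def lexical_recall(c_computed, c_reference):
    """Share of reference terms that were learned."""
    return len(c_computed & c_reference) / len(c_reference)


lp = lexical_precision(C_C, C_R)
lr = lexical_recall(C_C, C_R)
print(round(lp, 2), round(lr, 2))
```

Four terms (root, bike, van, coupé) are shared, which gives 4/6 and 4/5.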
Measuring  Similarity,  
Lexical Comparison Level - aSM
       Average String Matching, using edit distance
            Levenshtein distance, the most common definition
            of edit distance, measures the minimum number
            of token insertions, deletions and substitutions
            required to transform one string into another

            For example*, the Levenshtein distance
            between "kitten" and "sitting" is 3 (there is
            no way to do it with fewer than three edits)
                kitten → sitten (substitution of 's' for 'k')
                sitten → sittin (substitution of 'i' for 'e')
                sittin → sitting (insertion of 'g' at the end)

*   Wikipedia:  http://en.wikipedia.org/wiki/Levenshtein_distance
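
A minimal sketch of the Levenshtein distance using the standard dynamic-programming recurrence, with single characters as the tokens. The function name is an illustrative choice.

```python
def levenshtein(source, target):
    """Minimum number of character insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory grows with the length of the target string rather than with the product of both lengths.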
Measuring  Similarity,  
Lexical Comparison Level - String Matching

     String Matching measure
     (SM), given two lexical
     entries L1, L2
          Weights the number of
          required changes against
          the length of the shorter string
          1 stands for a perfect match,
          0 for a bad match
     Average SM
          Asymmetric, determines the
          extent to which L1 (target)
          is covered by L2 (source)



[Maedche  and  Staab,  2002]
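
The formulas for this slide did not survive extraction. The sketch below assumes the definition usually quoted from Maedche and Staab (2002): SM(L1, L2) = max(0, (min(|L1|, |L2|) - ed(L1, L2)) / min(|L1|, |L2|)), and for two lexicons the average over every target entry of its best SM against the source entries. Function names, the sample lexicons and the treatment of empty inputs are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, a_char in enumerate(a, start=1):
        current = [i]
        for j, b_char in enumerate(b, start=1):
            cost = 0 if a_char == b_char else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def string_matching(l1, l2):
    """SM in [0, 1]: 1 is a perfect match, 0 a bad match."""
    shorter = min(len(l1), len(l2))
    if shorter == 0:
        return 1.0 if l1 == l2 else 0.0  # assumed convention for empty strings
    return max(0.0, (shorter - edit_distance(l1, l2)) / shorter)


def average_string_matching(target, source):
    """Extent to which the target lexicon is covered by the source lexicon."""
    if not target or not source:
        return 0.0  # assumed convention for empty lexicons
    best = [max(string_matching(t, s) for s in source) for t in target]
    return sum(best) / len(best)


# Hypothetical lexicons, for illustration only.
small = ["car", "van"]
large = ["cars", "van", "bike"]
print(round(average_string_matching(small, large), 3))
print(round(average_string_matching(large, small), 3))
```

Swapping the two arguments changes the result, which illustrates the asymmetry noted on the slide.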
Measuring  Similarity,  
Lexical Comparison Level - RelHit



  RelHit actually expresses Lexical Precision
  RelHit compared to average String
  Matching
    Average SM reduces the influence of string
    pseudo-differences (e.g. singular vs. plural)
    Average  SM may  introduce  some  kind  of  noise,  
Measuring  Similarity,  
Conceptual  Comparison  Level

 The conceptual level compares the semantic
 structures of ontologies

 Conceptual structures are constituted by
 Hierarchies or by Relations

 How  to  compare  two  hierarchies?  
 How  do  the  positions  of  concepts  influence  
 similarity  of  Hierarchies?  
 What  measures  to  use?
Measuring  Similarity,  
Conceptual  Comparison  Level
  [Figure: reference ontology OR (left) and computed ontology OC (right)]

 Local measures compare the positions of two concepts
 based on characteristic extracts from the concept
 hierarchies they belong to
 Some characteristic extracts
    Semantic Cotopy (sc)
       sc(c, O) = {ci | ci ∈ C ∧ (ci ≤C c ∨ c ≤C ci)}
    Common Semantic Cotopy (csc)
       csc(c, O1, O2) = {ci | ci ∈ C1 ∩ C2 ∧ (ci <C1 c ∨ c <C1 ci)}
Measuring  Similarity,  
Conceptual Comparison Level - sc

  [Figure: reference ontology OR (left) and computed ontology OC (right)]

  Semantic Cotopy
     sc(c, O) = {ci | ci ∈ C ∧ (ci ≤C c ∨ c ≤C ci)}
  Semantic Cotopy examples
     sc(root, OR) = {root, bike, car, van, coupé}
     sc(root, OC) = {root, bike, auto, BMX, van, coupé}
     sc(bike, OR) = {root, bike}
     sc(bike, OC) = {root, bike, BMX}
     sc(car, OR) = {root, car, van, coupé}
     sc(auto, OC) = {root, auto, van, coupé}
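
A minimal sketch of the semantic cotopy over a hierarchy stored as child-to-parent links. The two hierarchies are an assumption inferred from the example sets on this slide, since the figure itself is not available here. All names are illustrative.

```python
# Assumed child -> parent links, inferred from the example sets on this slide.
O_R = {"bike": "root", "car": "root", "van": "car", "coupé": "car"}
O_C = {"bike": "root", "auto": "root", "BMX": "bike",
       "van": "auto", "coupé": "auto"}


def concepts(parent_of):
    """All concept identifiers of a hierarchy."""
    return set(parent_of) | set(parent_of.values())


def ancestors(concept, parent_of):
    """All strict super-concepts of `concept`."""
    result = set()
    while concept in parent_of:
        concept = parent_of[concept]
        result.add(concept)
    return result


def semantic_cotopy(concept, parent_of):
    """Super- and sub-concepts of `concept`, including the concept itself."""
    if concept not in concepts(parent_of):
        return set()
    below = {c for c in concepts(parent_of) if concept in ancestors(c, parent_of)}
    return {concept} | ancestors(concept, parent_of) | below


print(sorted(semantic_cotopy("bike", O_C)))
print(sorted(semantic_cotopy("car", O_R)))
```

Storing only parent links keeps the sketch short. Sub-concepts are found by scanning all concepts, which is fine for small examples but slow for large taxonomies.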
Measuring  Similarity,  
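Conceptual Comparison Level - csc

  [Figure: reference ontology OR (left) and computed ontology OC (right)]

  Common Semantic Cotopy
     csc(c, O1, O2) = {ci | ci ∈ C1 ∩ C2 ∧ (ci <C1 c ∨ c <C1 ci)}
  Common Semantic Cotopy examples
     C1 ∩ C2 = {root, bike, van, coupé}

     csc(root, OR, OC) = {bike, van, coupé}
     csc(root, OC, OR) = {bike, van, coupé}
     csc(bike, OR, OC) = csc(bike, OC, OR) = {root}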
     csc(car, OR, OC) = {root, van, coupé}, csc(car, OC, OR) = ∅
     csc(auto, OC, OR) = {root, van, coupé}, csc(auto, OR, OC) = ∅
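
A minimal sketch of the common semantic cotopy, under two assumptions: the hierarchies are the ones implied by the example sets in this deck, and the relation is strict, so the concept itself is excluded (consistent with the result {root} for bike). All names are illustrative.

```python
# Assumed child -> parent links, inferred from the example sets in this deck.
O_R = {"bike": "root", "car": "root", "van": "car", "coupé": "car"}
O_C = {"bike": "root", "auto": "root", "BMX": "bike",
       "van": "auto", "coupé": "auto"}


def concepts(parent_of):
    return set(parent_of) | set(parent_of.values())


def ancestors(concept, parent_of):
    result = set()
    while concept in parent_of:
        concept = parent_of[concept]
        result.add(concept)
    return result


def common_semantic_cotopy(concept, o1, o2):
    """Strict super- and sub-concepts of `concept` in o1 that also occur in o2."""
    common = concepts(o1) & concepts(o2)
    below = {c for c in concepts(o1) if concept in ancestors(c, o1)}
    return (ancestors(concept, o1) | below) & common


print(sorted(common_semantic_cotopy("root", O_R, O_C)))
print(sorted(common_semantic_cotopy("car", O_R, O_C)))
```

Concepts that occur in only one ontology, such as BMX, car and auto, never appear in a result, so a purely lexical mismatch no longer distorts the comparison of positions.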
Measuring  Similarity,  Conceptual  
Comparison Level - local measures tp, tr

  [Figure: reference ontology OR (left) and computed ontology OC (right)]

  Local taxonomic precision using characteristic extracts
     tpce(c1, c2, OC, OR) = |ce(c1, OC) ∩ ce(c2, OR)| / |ce(c1, OC)|

  Local taxonomic recall using characteristic extracts
     trce(c1, c2, OC, OR) = |ce(c1, OC) ∩ ce(c2, OR)| / |ce(c2, OR)|
Measuring  Similarity,  Conceptual  
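Comparison Level - local measures tp

  [Figure: reference ontology OR (left) and computed ontology OC (right)]

     Local taxonomic precision examples using sc
        sc(bike, OR) = {root, bike},
        sc(bike, OC) = {root, bike, BMX}

        tpsc(bike, bike, OC, OR) = |{root, bike}| / |{root, bike, BMX}|,
        tpsc(bike, bike, OC, OR) = 2/3 = 0.67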




[Maedche  and  Staab,  2002]
Measuring  Similarity,  Conceptual  
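Comparison Level - local measures tp

  [Figure: reference ontology OR (left) and computed ontology OC (right)]

  Local taxonomic precision examples using sc
     sc(car, OR) = {root, car, van, coupé},
     sc(auto, OC) = {root, auto, van, coupé}

     tpsc(auto, car, OC, OR) = |{root, van, coupé}| / |{root, auto, van, coupé}|,
     tpsc(auto, car, OC, OR) = 3/4 = 0.75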
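
The two local precision examples can be reproduced with a short sketch that uses the semantic cotopy as the characteristic extract. The hierarchies are an assumption inferred from the example sets in this deck. The first concept is looked up in the computed ontology and the second in the reference ontology. All names are illustrative.

```python
# Assumed child -> parent links, inferred from the example sets in this deck.
O_R = {"bike": "root", "car": "root", "van": "car", "coupé": "car"}
O_C = {"bike": "root", "auto": "root", "BMX": "bike",
       "van": "auto", "coupé": "auto"}


def ancestors(concept, parent_of):
    result = set()
    while concept in parent_of:
        concept = parent_of[concept]
        result.add(concept)
    return result


def semantic_cotopy(concept, parent_of):
    """Super- and sub-concepts of `concept`, including the concept itself."""
    all_concepts = set(parent_of) | set(parent_of.values())
    if concept not in all_concepts:
        return set()
    below = {c for c in all_concepts if concept in ancestors(c, parent_of)}
    return {concept} | ancestors(concept, parent_of) | below


def local_tp(c1, c2, o_c, o_r):
    """Local taxonomic precision of c1 (computed) against c2 (reference)."""
    computed, reference = semantic_cotopy(c1, o_c), semantic_cotopy(c2, o_r)
    return len(computed & reference) / len(computed) if computed else 0.0


def local_tr(c1, c2, o_c, o_r):
    """Local taxonomic recall of c1 (computed) against c2 (reference)."""
    computed, reference = semantic_cotopy(c1, o_c), semantic_cotopy(c2, o_r)
    return len(computed & reference) / len(reference) if reference else 0.0


print(round(local_tp("bike", "bike", O_C, O_R), 2))
print(round(local_tp("auto", "car", O_C, O_R), 2))
```

With the plain semantic cotopy, auto and car themselves count as a mismatch, which is why the second value stays below 1 even though the two nodes sit in the same position.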
Measuring  Similarity,  Conceptual  
Comparison  Level comparing  Hierarchies
         OR                            OC




  Global  Taxonomic  Precision  (TP)
Measuring  Similarity,  Conceptual  
Comparison Level - Overall evaluation
  As with the F-measure, but now using Global Taxonomic
  Precision (TP) and Global Taxonomic Recall (TR)

  Balanced Taxonomic F-measure (TP & TR are evenly
  weighted)
     TF1 = 2*(TP*TR)/(TP+TR)

  Weighted TF-measure
     TFb = (1+b^2)*(TP*TR)/(b^2*TP+TR), b non-zero

     TF2 (b=2) weights recall twice as much as precision
     TF0.5 (b=0.5) weights precision twice as much as
     recall
Measuring  Similarity,  Conceptual  
Comparison  Level Taxonomic  Overlap

  taxonomic  overlap  (TO)
References  &  Further  Reading
 Dellschaft, Klaas and Staab, Steffen (2006): On How to
 Perform a Gold Standard Based Evaluation of Ontology
 Learning. In: I. Cruz et al. (Eds.) ISWC 2006. LNCS 4273, pp.
 228-241. Springer, Heidelberg
 Maedche, A., Staab, S. (2002): Measuring similarity between
 ontologies. In: Proceedings of the European Conference on
 Knowledge Acquisition and Management (EKAW-2002).
 Siguenza, Spain
 Staab, S. and Hotho, A.: Semantic Web and Machine Learning
 Tutorial (available at
 http://www.uni-koblenz.de/~staab/Research/Events/ICML05tutorial/icml05tutorial.pdf)

 Bellahsene, Z., Bonifati, A., Rahm, E. (Eds.) (2011): Schema
 Matching and Mapping, Heidelberg, Springer-Verlag, ISBN
 978-3-642-16517-7.
End  of  tutorial!



  Thanks  for  your  attention!

      Michalis  Sfakakis

More Related Content

Similar to Conceptual similarity: why, where and how

Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDataminingTools Inc
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDatamining Tools
 
leanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerleanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerAdriano Melo
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Jonathon Hare
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment andrefsantos
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology MappingPradeep B Pillai
 
Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilaritySaswat Padhi
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture NotesFellowBuddy.com
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...iyo
 
Data Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description LogicsData Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description LogicsAdila Krisnadhi
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalSudarsun Santhiappan
 
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...Raj vardhan
 
Shape matching and object recognition using shape context belongie pami02
Shape matching and object recognition using shape context belongie pami02Shape matching and object recognition using shape context belongie pami02
Shape matching and object recognition using shape context belongie pami02irisshicat
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCSR2011
 

Similar to Conceptual similarity: why, where and how (20)

Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
leanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerleanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL Reasoner
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
 
Www.cs.berkeley.edu kunal
Www.cs.berkeley.edu kunalWww.cs.berkeley.edu kunal
Www.cs.berkeley.edu kunal
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology Mapping
 
Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic Similarity
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture Notes
 
Sentence generation
Sentence generationSentence generation
Sentence generation
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
 
semeval2016
semeval2016semeval2016
semeval2016
 
Data Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description LogicsData Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description Logics
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information Retrieval
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4t...
 
Shape matching and object recognition using shape context belongie pami02
Shape matching and object recognition using shape context belongie pami02Shape matching and object recognition using shape context belongie pami02
Shape matching and object recognition using shape context belongie pami02
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminski
 

More from Giannis Tsakonas

Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόΑρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόGiannis Tsakonas
 
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...Giannis Tsakonas
 
Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...Giannis Tsakonas
 
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικέςΑκαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικέςGiannis Tsakonas
 
We were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshopWe were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshopGiannis Tsakonas
 
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταΒιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταGiannis Tsakonas
 
{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.Giannis Tsakonas
 
Affective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stressAffective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stressGiannis Tsakonas
 
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Giannis Tsakonas
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο ΚύπρουΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο ΚύπρουGiannis Tsakonas
 
FRBR και Linked Data - Σεμινάριο Αθήνας
FRBR και Linked Data -  Σεμινάριο ΑθήναςFRBR και Linked Data -  Σεμινάριο Αθήνας
FRBR και Linked Data - Σεμινάριο ΑθήναςGiannis Tsakonas
 
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...Giannis Tsakonas
 
Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...Giannis Tsakonas
 
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Giannis Tsakonas
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Giannis Tsakonas
 
Path-based MXML Storage and Querying
Path-based MXML Storage and QueryingPath-based MXML Storage and Querying
Path-based MXML Storage and QueryingGiannis Tsakonas
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataGiannis Tsakonas
 
Open Bibliographic Data and E-LIS
Open Bibliographic Data and E-LISOpen Bibliographic Data and E-LIS
Open Bibliographic Data and E-LISGiannis Tsakonas
 
Dileo Presentation (in English)
Dileo Presentation (in English)Dileo Presentation (in English)
Dileo Presentation (in English)Giannis Tsakonas
 
Evaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital RepositoriesEvaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital RepositoriesGiannis Tsakonas
 

More from Giannis Tsakonas (20)

Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόΑρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
 
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
 
Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...
 
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικέςΑκαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
 
We were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshopWe were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshop
 
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταΒιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
 
{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.
 
Affective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stressAffective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stress
 
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο ΚύπρουΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
 
FRBR και Linked Data - Σεμινάριο Αθήνας
FRBR και Linked Data -  Σεμινάριο ΑθήναςFRBR και Linked Data -  Σεμινάριο Αθήνας
FRBR και Linked Data - Σεμινάριο Αθήνας
 
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
 
Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...
 
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
 
Path-based MXML Storage and Querying
Path-based MXML Storage and QueryingPath-based MXML Storage and Querying
Path-based MXML Storage and Querying
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
 
Open Bibliographic Data and E-LIS
Open Bibliographic Data and E-LISOpen Bibliographic Data and E-LIS
Open Bibliographic Data and E-LIS
 
Dileo Presentation (in English)
Dileo Presentation (in English)Dileo Presentation (in English)
Dileo Presentation (in English)
 
Evaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital RepositoriesEvaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital Repositories
 

Conceptual similarity: why, where and how

  • 1. Conceptual Similarity: Why, Where, How. Michalis Sfakakis, Laboratory on Digital Libraries & Electronic Publishing, Department of Archives and Library Sciences, Ionian University, Greece. First Workshop on Digital Information Management, March 30-31, 2011, Corfu, Greece.
  • 2. Do we need similarity? Are the following objects similar? (Similarity, SIMILARITY): As character sequences, no! (How do they differ?) As character sequences compared case-insensitively, yes! As English words, yes! It is the same word, with the same definition, written differently.
  • 3. Contents: Introduction; Disciplines; How we measure similarity (focus on Ontology Learning evaluation).
  • 4. What about the similarity of the objects (1, a)? The first object is the number one and the second is the first letter of the English alphabet. Therefore, as the first is a number and the second is a letter, they are different! Read as numbering labels, e.g. a chapter or paragraph number, both represent the first object of a list (the first chapter, paragraph, etc.). Therefore, they could be considered similar!
  • 5. Results for an Information Need. How similar are the results? Which one to select?
  • 6. Comparing Concepts. Are the objects (Disease, Illness) similar? As English words, or as character sequences, they are not similar! (How do they differ?) As synonymous terms in a thesaurus, they both represent the same concept (they are related by the equivalence relationship).
  • 7. Comparing Hierarchies* [figure: two concept hierarchies]. How similar is the node car from the left hierarchy to the node auto from the right hierarchy? And the nodes van from both hierarchies? * [Dellschaft and Staab, 2006]
  • 8. Similarity is a context-dependent concept. Merriam-Webster defines similarity as*: "A quality that makes one person or thing like another". Therefore, the context and the characteristics in common are required in order to specify and measure similarity. * http://www.learnersdictionary.com/search/similarity
  • 9. Where the concept of similarity is encountered: Machine learning; Ontology Learning; Schema & Ontology Matching and Mapping; Clustering; IR; pattern recognition algorithms. A vital part of Semantic Web development.
  • 10. Precision & Recall in IR, measuring similarity between answers. Let C be the result set for a query (the retrieved documents, i.e. the Computed set). We also need to know the correct results for the query (all the relevant documents, i.e. the Reference set R). Precision is the fraction of retrieved documents that are relevant to the search. Recall is the fraction of the documents relevant to the query that are successfully retrieved. Wikipedia: http://en.wikipedia.org/wiki/Precision_and_recall
  • 11. Measure similarity. Precision & Recall are two widely used metrics for evaluating the correctness of a pattern recognition algorithm. [figure from Wikipedia] Recall and Precision depend on the outcome (oval) of a pattern recognition algorithm and its relation to all relevant patterns (left) and the non-relevant patterns (right). The more correct results (green), the better. Precision: horizontal arrow. Recall: diagonal arrow. Wikipedia: http://en.wikipedia.org/wiki/Precision_and_recall
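  • 12. Precision & Recall, once more. With D the set of all documents, R the Reference set and C the Computed set: True Positive $TP = R \cap C$; False Negative $FN = R \setminus C$; False Positive $FP = C \setminus R$; True Negative $TN = D \setminus (R \cup C)$. Precision $P = |R \cap C| / |C|$; Recall $R = |R \cap C| / |R|$.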
  • 13. Overall evaluation, combining Precision & Recall. Given Precision & Recall, the F-measure combines them into an overall evaluation. Balanced F-measure (P & R are evenly weighted): $F_1 = 2PR / (P + R)$. Weighted F-measure: $F_b = (1 + b^2) PR / (b^2 P + R)$, b non-zero. $F_2$ (b=2) weights recall twice as much as precision; $F_{0.5}$ (b=0.5) weights precision twice as much as recall.
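The set-based definitions of Precision, Recall and the weighted F-measure from the slides above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the document identifiers `d1` to `d5` are hypothetical, and empty sets are mapped to 0.0 by assumption.

```python
def precision_recall(reference: set, computed: set) -> tuple[float, float]:
    """Return (precision, recall) of a computed result set against a reference set."""
    hits = len(reference & computed)  # true positives: relevant AND retrieved
    precision = hits / len(computed) if computed else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def f_measure(p: float, r: float, b: float = 1.0) -> float:
    """Weighted F-measure: b > 1 favours recall, b < 1 favours precision."""
    denom = b * b * p + r
    return (1 + b * b) * p * r / denom if denom else 0.0


# Hypothetical example data (not from the slides)
reference = {"d1", "d2", "d3", "d4"}  # all relevant documents
computed = {"d3", "d4", "d5"}         # documents returned for the query

p, r = precision_recall(reference, computed)
f1 = f_measure(p, r)        # balanced
f2 = f_measure(p, r, b=2)   # recall weighted more heavily
print(p, r, f1, f2)
```

Here two of the three retrieved documents are relevant (P = 2/3) and two of the four relevant documents were retrieved (R = 1/2), so F1 = 4/7 and F2 = 10/19.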
  • 14. Measuring Similarity, Comparing two Ontologies. A simplified definition of a core ontology*: The structure $O := (C, root, \leq_C)$ is called a core ontology. $C$ is a set of concept identifiers and $root$ is a designated root concept for the partial order $\leq_C$ on $C$. This partial order is called concept hierarchy or taxonomy. The equation $\forall c \in C : c \leq_C root$ holds for this concept hierarchy. Levels of comparison: Lexical, how terms are used to convey meanings; Conceptual, which conceptual relations exist between terms. * [Dellschaft and Staab, 2006]
  • 15. Gold Standard based Evaluation of Ontology Learning [figure: reference ontology $O_R$ and computed ontology $O_C$]. Given a pre-defined ontology, the so-called Gold Standard or Reference, compare the Learned (Computed) Ontology with the Gold Standard.
  • 16. Measuring Similarity, Lexical Comparison Level: LP, LR. Lexical Precision & Lexical Recall: $LP(O_C, O_R) = |C_C \cap C_R| / |C_C|$; $LR(O_C, O_R) = |C_C \cap C_R| / |C_R|$. The lexical precision and recall reflect how well the learned lexical terms $C_C$ cover the target domain $C_R$. For the example ontologies $O_R$ and $O_C$: LP = 4/6 = 0.67, LR = 4/5 = 0.8.
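As a quick check of the LP and LR values quoted above, the following sketch recomputes them. The two concept sets are an assumption: the example figure is not part of this transcript, so they are read off the semantic cotopy examples given later in the deck.

```python
# Concept sets of the example ontologies (assumed, reconstructed from the
# cotopy examples on later slides; the figure itself is not in the transcript)
c_r = {"root", "bike", "car", "van", "coupé"}          # reference ontology O_R
c_c = {"root", "bike", "auto", "BMX", "van", "coupé"}  # computed ontology O_C


def lexical_precision(computed: set, reference: set) -> float:
    """Share of learned terms that also occur in the reference."""
    return len(computed & reference) / len(computed)


def lexical_recall(computed: set, reference: set) -> float:
    """Share of reference terms that were learned."""
    return len(computed & reference) / len(reference)


lp = lexical_precision(c_c, c_r)
lr = lexical_recall(c_c, c_r)
print(round(lp, 2), round(lr, 2))  # 0.67 0.8
```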
  • 17. Measuring Similarity, Lexical Comparison Level: aSM. Average String Matching, using edit distance. Levenshtein distance, the most common definition of edit distance, measures the minimum number of token insertions, deletions and substitutions required to transform one string into another. For example*, the Levenshtein distance between "kitten" and "sitting" is 3 (there is no way to do it with fewer than three edits): kitten → sitten (substitution of 's' for 'k'); sitten → sittin (substitution of 'i' for 'e'); sittin → sitting (insertion of 'g' at the end). * Wikipedia: http://en.wikipedia.org/wiki/Levenshtein_distance
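The edit distance described above can be computed with the standard dynamic-programming recurrence. The sketch below is one common way to write it (character-level, unit costs, keeping only two rows of the table); the function name is illustrative.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```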
  • 18. Measuring Similarity, Lexical Comparison Level: String Matching. The String Matching measure (SM), given two lexical entries L1, L2, weights the number of required changes against the shorter string; 1 stands for a perfect match, 0 for a bad match. Average SM is asymmetric: it determines the extent to which L1 (target) is covered by L2 (source). [Maedche and Staab, 2002]
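The formulas for SM and average SM did not survive in this transcript. The sketch below assumes the definitions given in Maedche and Staab (2002): SM is the edit distance weighted against the length of the shorter string and clipped at 0, and average SM takes, for every target entry, its best match in the source lexicon. Function names and the sample lexicons are illustrative.

```python
def levenshtein(s: str, t: str) -> int:
    """Character-level edit distance with unit costs."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def string_matching(a: str, b: str) -> float:
    """SM in [0, 1]: 1 is a perfect match, 0 a bad match (assumed definition)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if a == b else 0.0
    return max(0.0, (shorter - levenshtein(a, b)) / shorter)


def average_sm(target: list, source: list) -> float:
    """Asymmetric: how well the target entries are covered by the source entries."""
    return sum(max(string_matching(t, s) for s in source) for t in target) / len(target)


print(string_matching("power", "tower"))       # 0.8
print(average_sm(["hotel"], ["hotel", "zzz"]))  # 1.0
print(average_sm(["hotel", "zzz"], ["hotel"]))  # 0.5
```

The last two calls illustrate the asymmetry: swapping target and source changes the result.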
  • 19. Measuring Similarity, Lexical Comparison Level: RelHit. RelHit actually expresses Lexical Precision. RelHit compared to average String Matching: Average SM reduces the influence of string pseudo-differences (e.g. singular vs. plural); Average SM may introduce some kind of noise.
  • 20. Measuring Similarity, Conceptual Comparison Level. The conceptual level compares the semantic structure of ontologies. Conceptual structures are constituted by Hierarchies, or by Relations. How to compare two hierarchies? How do the positions of concepts influence the similarity of Hierarchies? What measures to use?
  • 21. Measuring Similarity, Conceptual Comparison Level. Local measures compare the positions of two concepts based on characteristic extracts from the concept hierarchies they belong to. Some characteristic extracts: Semantic Cotopy (sc), $sc(c, O) = \{c_i \mid c_i \in C \wedge (c_i \leq c \vee c \leq c_i)\}$; Common Semantic Cotopy (csc), $csc(c, O_1, O_2) = \{c_i \mid c_i \in C_1 \cap C_2 \wedge (c_i <_1 c \vee c <_1 c_i)\}$, where $<_1$ is the strict concept hierarchy of $O_1$.
  • 22. Measuring Similarity, Conceptual Comparison Level: sc. Semantic Cotopy: $sc(c, O) = \{c_i \mid c_i \in C \wedge (c_i \leq c \vee c \leq c_i)\}$. Semantic Cotopy examples: sc(root, OR) = {root, bike, car, van, coupé}; sc(root, OC) = {root, bike, auto, BMX, van, coupé}; sc(bike, OR) = {root, bike}; sc(bike, OC) = {root, bike, BMX}; sc(car, OR) = {root, car, van, coupé}; sc(auto, OC) = {root, auto, van, coupé}.
  • 23. Measuring Similarity, Conceptual Comparison Level: csc. Common Semantic Cotopy: $csc(c, O_1, O_2) = \{c_i \mid c_i \in C_1 \cap C_2 \wedge (c_i <_1 c \vee c <_1 c_i)\}$. Common Semantic Cotopy examples, with C1 ∩ C2 = {root, bike, van, coupé}: csc(root, OR, OC) = {bike, van, coupé}; csc(root, OC, OR) = {bike, van, coupé}; csc(bike, OR, OC) = csc(bike, OC, OR) = {root}; csc(car, OR, OC) = {root, van, coupé}; csc(car, OC, OR) = ∅; csc(auto, OC, OR) = {root, van, coupé}; csc(auto, OR, OC) = ∅.
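Both characteristic extracts can be computed directly from a child-to-parent map. The sketch below is illustrative only: the two hierarchies are an assumption, reconstructed from the cotopy example sets because the figure is not part of this transcript, and all function names are hypothetical.

```python
# child -> parent maps (assumed reconstruction of the example hierarchies)
O_R = {"bike": "root", "car": "root", "van": "car", "coupé": "car"}
O_C = {"bike": "root", "auto": "root", "BMX": "bike",
       "van": "auto", "coupé": "auto"}


def concepts(parent: dict) -> set:
    """All concept identifiers of a hierarchy, including the root."""
    return set(parent) | set(parent.values())


def ancestors(c: str, parent: dict) -> set:
    """All strict super-concepts of c."""
    out = set()
    while c in parent:
        c = parent[c]
        out.add(c)
    return out


def related(c: str, parent: dict) -> set:
    """All strict super- and sub-concepts of c (empty if c is unknown)."""
    subs = {x for x in concepts(parent) if c in ancestors(x, parent)}
    return ancestors(c, parent) | subs


def semantic_cotopy(c: str, parent: dict) -> set:
    """sc: c itself plus all its super- and sub-concepts."""
    return ({c} | related(c, parent)) if c in concepts(parent) else set()


def common_semantic_cotopy(c: str, o1: dict, o2: dict) -> set:
    """csc: strict super-/sub-concepts of c in o1 that occur in both ontologies."""
    return related(c, o1) & concepts(o1) & concepts(o2)


print(semantic_cotopy("car", O_R))
print(common_semantic_cotopy("car", O_R, O_C))
print(common_semantic_cotopy("car", O_C, O_R))  # set()
```

Note that csc excludes the concept itself and anything missing from either ontology, which is why `car` and `auto` no longer appear in each other's extracts.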
  • 24. Measuring Similarity, Conceptual Comparison Level: local measures tp, tr. Local taxonomic precision using characteristic extracts: $tp_{ce}(c_1, c_2, O_C, O_R) = |ce(c_1, O_C) \cap ce(c_2, O_R)| / |ce(c_1, O_C)|$. Local taxonomic recall using characteristic extracts: $tr_{ce}(c_1, c_2, O_C, O_R) = |ce(c_1, O_C) \cap ce(c_2, O_R)| / |ce(c_2, O_R)|$.
  • 25. Measuring Similarity, Conceptual Comparison Level: local measures tp. Local taxonomic precision examples using sc: sc(bike, OR) = {root, bike}, sc(bike, OC) = {root, bike, BMX}; $tp_{sc}$(bike, bike, OC, OR) = |{root, bike}| / |{root, bike, BMX}| = 2/3 = 0.67. [Maedche and Staab, 2002]
  • 26. Measuring Similarity, Conceptual Comparison Level: local measures tp. Local taxonomic precision examples using sc: sc(car, OR) = {root, car, van, coupé}, sc(auto, OC) = {root, auto, van, coupé}; $tp_{sc}$(auto, car, OC, OR) = |{root, van, coupé}| / |{root, auto, van, coupé}| = 3/4 = 0.75.
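The two worked examples can be reproduced with a small sketch of the local taxonomic measures, using the semantic cotopy as the characteristic extract. As before, the hierarchies are an assumed reconstruction from the example sets, the function names are hypothetical, and an empty extract is mapped to 0.0 by assumption.

```python
# child -> parent maps (assumed reconstruction of the example hierarchies)
O_R = {"bike": "root", "car": "root", "van": "car", "coupé": "car"}
O_C = {"bike": "root", "auto": "root", "BMX": "bike",
       "van": "auto", "coupé": "auto"}


def ancestors(c: str, parent: dict) -> set:
    """All strict super-concepts of c."""
    out = set()
    while c in parent:
        c = parent[c]
        out.add(c)
    return out


def semantic_cotopy(c: str, parent: dict) -> set:
    """sc: c itself plus all its super- and sub-concepts."""
    all_concepts = set(parent) | set(parent.values())
    if c not in all_concepts:
        return set()
    subs = {x for x in all_concepts if c in ancestors(x, parent)}
    return {c} | ancestors(c, parent) | subs


def local_tp(c1, c2, o_c, o_r, ce=semantic_cotopy) -> float:
    """Overlap of the two extracts, relative to the extract from the computed ontology."""
    computed, reference = ce(c1, o_c), ce(c2, o_r)
    return len(computed & reference) / len(computed) if computed else 0.0


def local_tr(c1, c2, o_c, o_r, ce=semantic_cotopy) -> float:
    """Overlap of the two extracts, relative to the extract from the reference ontology."""
    computed, reference = ce(c1, o_c), ce(c2, o_r)
    return len(computed & reference) / len(reference) if reference else 0.0


print(round(local_tp("bike", "bike", O_C, O_R), 2))  # 0.67
print(local_tp("auto", "car", O_C, O_R))             # 0.75
```

Passing the extract function as the `ce` parameter mirrors the subscript in the formulas, so a different characteristic extract can be swapped in without changing the measures.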
  • 27. Measuring Similarity, Conceptual Comparison Level: comparing Hierarchies. Global Taxonomic Precision (TP).
  • 28. Measuring Similarity, Conceptual Comparison Level: Overall evaluation. F-measure, but now using Global Taxonomic Precision (TP) and Global Taxonomic Recall (TR). Balanced Taxonomic F-measure (TP & TR are evenly weighted): $TF_1 = 2 \cdot TP \cdot TR / (TP + TR)$. Weighted TF-measure: $TF_b = (1 + b^2) \cdot TP \cdot TR / (b^2 \cdot TP + TR)$, b non-zero. $TF_2$ (b=2) weights recall twice as much as precision; $TF_{0.5}$ (b=0.5) weights precision twice as much as recall.
  • 29. Measuring Similarity, Conceptual Comparison Level: Taxonomic Overlap (TO).
  • 30. References & Further Reading. Dellschaft, K., Staab, S. (2006): On How to Perform a Gold Standard Based Evaluation of Ontology Learning. In: I. Cruz et al. (Eds.) ISWC 2006. LNCS 4273, pp. 228-241. Springer, Heidelberg. Maedche, A., Staab, S. (2002): Measuring similarity between ontologies. In: Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW-2002). Siguenza, Spain. Staab, S., Hotho, A.: Semantic Web and Machine Learning Tutorial (available at http://www.uni-koblenz.de/~staab/Research/Events/ICML05tutorial/icml05tutorial.pdf). Bellahsene, Z., Bonifati, A., Rahm, E. (Eds.) (2011): Schema Matching and Mapping. Springer-Verlag, Heidelberg, ISBN 978-3-642-16517-7.
  • 31. End of tutorial! Thanks for your attention! Michalis Sfakakis