Similarity on DBpedia

Overview of the notion of similarity and of methods for defining similarity on DBpedia.

  1. Similarity on DBpedia. UIMR. PhD student: Samantha Lam. Supervisor: Conor Hayes.
  3. Similarity. How similar are the following films? [film images not preserved]
     (Unsatisfactory) answer: it depends!
  6. DBpedia Graph. Films are nodes on DBpedia. Some things about DBpedia:
     - Big, rich, dense knowledge base → 3.77m nodes, 400m edges (EN)
     - Lots of prior work (as we shall see...)
     - But very heterogeneous: vocabularies, categories
     - It is a graph
  7. Similarity in general
     - Cognitive science: Tversky (1977), psychology, featural similarity. E.g. for a film: genre, language, director.
     - Modelling of human thought and semantic relations: how do we relate things to each other? (Collins & Quillian, 1969)
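The featural view on slide 7 can be illustrated with Tversky's ratio model, in which shared features count towards similarity and distinctive features count against it. The sketch below is only an illustration: the feature sets and the weights `alpha` and `beta` are made-up assumptions, not data from DBpedia or from the talk.

```python
def tversky_index(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: shared features vs. weighted distinctive features."""
    common = len(a & b)   # features both items have
    only_a = len(a - b)   # features only the first item has
    only_b = len(b - a)   # features only the second item has
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


# Hypothetical feature sets (illustrative only, not taken from DBpedia).
film_x = {"genre:animation", "language:ja", "director:A"}
film_y = {"genre:animation", "language:ja", "director:B"}
film_z = {"genre:action", "language:en", "director:C"}

print(tversky_index(film_x, film_y))  # shares genre and language
print(tversky_index(film_x, film_z))  # shares nothing
```

With `alpha = beta = 1` the ratio model reduces to the Jaccard coefficient, so the choice of weights is itself one of the things the answer "depends" on.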
  8. Semantic. The notion of semantic networks is derived from the hierarchical semantic memory model [Collins & Quillian, 1969].
  10. Semantic Similarity. Different techniques:
     - Word frequency: latent semantic analysis (doesn’t actually use semantic net structure)
     - Rada (1989): average shortest path length
     - Resnik (1999): information content of the least common subsumer (lcs)
     Unfortunately...
     - Word frequency: N/A
     - These often assume a hierarchical/tree structure of the taxonomy/ontology (both Rada and Resnik assume the taxonomy is an is-A hierarchy)
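The two taxonomy-based measures on slide 10 can be sketched on a toy is-A tree. Everything here is an assumption made for illustration: the concept names, the `PARENT` hierarchy and the `COUNT` frequencies are invented, and for a single pair of concepts the path measure is simply the shortest path length through the least common subsumer.

```python
import math

# Hypothetical is-A taxonomy (child -> parent); not a real DBpedia hierarchy.
PARENT = {
    "anime film": "animated film",
    "animated film": "film",
    "sci-fi film": "film",
    "film": "work",
    "novel": "work",
}
# Hypothetical number of instances attached directly to each concept.
COUNT = {"work": 0, "film": 10, "animated film": 20,
         "anime film": 30, "sci-fi film": 25, "novel": 15}


def ancestors(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    ancestors_b = set(ancestors(b))
    for concept in ancestors(a):
        if concept in ancestors_b:
            return concept
    raise ValueError("concepts share no ancestor")


def path_length(a, b):
    """Number of is-A edges on the path between a and b in the tree."""
    common = lcs(a, b)
    return ancestors(a).index(common) + ancestors(b).index(common)


def information_content(concept):
    """-log p(concept), where p counts the concept and everything it subsumes."""
    total = sum(COUNT.values())
    subsumed = sum(n for c, n in COUNT.items() if concept in ancestors(c))
    return -math.log(subsumed / total)


def resnik(a, b):
    """Resnik-style similarity: information content of the lcs."""
    return information_content(lcs(a, b))


print(path_length("anime film", "sci-fi film"))
print(resnik("anime film", "sci-fi film"), resnik("anime film", "novel"))
```

Both functions rely on every concept having a single parent, which is exactly the tree assumption that the slide notes does not hold on DBpedia.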
  11. Semantic Similarity. Remember, DBpedia is not as ‘neat’. (Image source: http://www.visualdataweb.org/relfinder/)
  16. On DBpedia/Wikipedia. Recent applications:
     - Gabrilovich & Markovitch (2007): Explicit Semantic Analysis (ESA), which expresses text as a weighted vector of Wikipedia articles
     - Milne & Witten (2008): the Wikipedia Link-based Measure, similarity of neighbours
     - Passant (2010): Linked Data Semantic Distance ← uses paths!
     - Mirizzi et al. (2012): use DBpedia for movie recommendation with a Vector Space Model
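As an illustration of "similarity of neighbours", the sketch below follows the commonly cited form of the Wikipedia Link-based Measure, which applies a normalised-distance formula to the sets of articles linking to each page. The function name, the toy in-link sets and the article total are assumptions for illustration; the value is a distance, so lower means more related.

```python
import math


def wlm_distance(inlinks_a: set, inlinks_b: set, total_articles: int) -> float:
    """Link-based distance between two articles from their in-link sets.

    Returns 0.0 for identical in-link sets and infinity when no article
    links to both pages.
    """
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return float("inf")
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of each in-link set")
    return (math.log(larger) - math.log(common)) / (
        math.log(total_articles) - math.log(smaller)
    )


# Hypothetical in-link sets: integers stand in for the IDs of linking articles.
links_to_a = {1, 2, 3, 4}
links_to_b = {3, 4, 5}
print(wlm_distance(links_to_a, links_to_b, total_articles=100))
```

Note that this uses plain hyperlinks only; it ignores the typed properties that the following slides treat as important on DBpedia.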
  17. Similarity. Important: properties can be related to each other. [Diagram labels: node type 1, e.g. director; node type 2, e.g. film; link type 1, e.g. influenced; link type 2, e.g. collaborated with]
  18. Network Similarity. Social Network Analysis:
     - Established field: notions of influence, centrality, rank etc.
     - Often applied to small networks
     - Note: ranking is often based on similarity
  21. Network Similarity. Homogeneous network measures:
     - PageRank, Brin & Page (1998): random surfer with teleportation
     - SimRank, Jeh & Widom (2002): iteratively ‘inherits’ the similarity of neighbours
     - σact, Thiel & Berthold (2010): node similarities from spreading activation with a decay factor
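To make the "inherits from neighbours" idea concrete, here is a minimal sketch of the basic SimRank iteration: two nodes are similar if the nodes pointing to them are similar. The toy graph, the decay value and the iteration count are assumptions chosen for illustration, and the naive all-pairs loop would not scale to a graph of DBpedia's size.

```python
from itertools import product


def simrank(in_neighbours: dict, decay: float = 0.8, iterations: int = 10) -> dict:
    """Naive SimRank over a graph given as {node: [nodes linking to it]}."""
    nodes = list(in_neighbours)
    # Start from the identity: each node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, nodes):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbours[a], in_neighbours[b]
            if not in_a or not in_b:
                updated[(a, b)] = 0.0  # no in-links means nothing to inherit
                continue
            total = sum(sim[(x, y)] for x in in_a for y in in_b)
            updated[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = updated
    return sim


# Hypothetical homogeneous graph: "hub" links to both "a" and "b".
toy_graph = {"hub": [], "a": ["hub"], "b": ["hub"], "c": []}
scores = simrank(toy_graph)
print(scores[("a", "b")], scores[("a", "c")])
```

Every link is treated the same way here, which is the homogeneity assumption that the talk flags as a problem for DBpedia.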
  22. Network Similarity. Heterogeneous network measures:
     - PathSim, Sun et al. (2011): count instances of a ‘meta-path’ (a specific link pattern)
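A minimal sketch of the meta-path counting idea, assuming a single symmetric meta-path Film–Category–Film: the number of path instances between two films is the number of categories they share, normalised by each film's paths back to itself. The film and category names are placeholders, not DBpedia data.

```python
def path_count(film_to_categories: dict, x: str, y: str) -> int:
    """Instances of the meta-path Film-Category-Film between x and y."""
    return len(film_to_categories[x] & film_to_categories[y])


def pathsim(film_to_categories: dict, x: str, y: str) -> float:
    """PathSim-style score: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = (path_count(film_to_categories, x, x)
             + path_count(film_to_categories, y, y))
    if denom == 0:
        return 0.0
    return 2 * path_count(film_to_categories, x, y) / denom


# Hypothetical film -> category links (illustrative only).
categories = {
    "film_a": {"cat_1", "cat_2", "cat_3"},
    "film_b": {"cat_2", "cat_3"},
    "film_c": {"cat_4"},
}
print(pathsim(categories, "film_a", "film_b"))
print(pathsim(categories, "film_a", "film_c"))
```

Swapping in a different meta-path, for example Film–Director–Film, would generally give different scores for the same pair of films.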
  24. Network Similarity. Applicability to DBpedia:
     - PageRank, SimRank: N/A, they assume homogeneous links!
     - Spreading activation: possible with constraints
     - PathSim: applicable, but how to learn such meta-paths?
     Another idea: count node-disjoint paths. Why? View each path as one distinct ‘reason’.
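One way to count node-disjoint paths is to reduce the problem to maximum flow: split every node into an "in" and an "out" copy joined by a capacity-1 arc, so that each intermediate node can sit on at most one path. The sketch below does this with breadth-first augmenting paths. It is not necessarily the method used in the talk, and the toy graph is an assumption for illustration.

```python
from collections import defaultdict, deque


def count_node_disjoint_paths(adj: dict, source, target) -> int:
    """Count node-disjoint paths in an undirected graph {node: set(neighbours)}."""
    if source == target:
        raise ValueError("source and target must differ")
    cap = defaultdict(int)   # residual capacity per directed arc
    arcs = defaultdict(set)  # residual adjacency (forward and reverse arcs)

    def add_arc(u, v, c):
        cap[(u, v)] += c
        arcs[u].add(v)
        arcs[v].add(u)       # reverse arc, starts with capacity 0

    for node, neighbours in adj.items():
        # Intermediate nodes may be used once; the endpoints are unrestricted.
        limit = len(adj) if node in (source, target) else 1
        add_arc((node, "in"), (node, "out"), limit)
        for nb in neighbours:
            add_arc((node, "out"), (nb, "in"), 1)

    s, t = (source, "in"), (target, "out")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in arcs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical film/category graph; every edge is listed in both directions.
toy = {
    "film_x": {"cat_1", "cat_2"},
    "film_y": {"cat_1", "cat_3"},
    "film_z": set(),
    "cat_1": {"film_x", "film_y"},
    "cat_2": {"film_x", "cat_3"},
    "cat_3": {"cat_2", "film_y"},
}
print(count_node_disjoint_paths(toy, "film_x", "film_y"))
print(count_node_disjoint_paths(toy, "film_x", "film_z"))
```

In this toy graph the two films are joined by one path through `cat_1` and another through `cat_2` and `cat_3`, so in the talk's terms they have two distinct "reasons" for being similar.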
  25. Similarity.
              Totoro   GITS   Matrix
     Totoro       44      1        0
     GITS          1     35        2
     Matrix        0      2       58
     - Totoro – GITS: Category:Anime films
     - GITS – Matrix: Category:Brain-computer interfacing in fiction
     - GITS – Matrix: Matrix → Category:The Matrix (franchise) → Category:Media franchises ← GITS
  27. Similarity. How similar are the following films? [film images not preserved]
     Answer: it still depends on the path you take.
  30. Summary
     - Similarity: a useful concept in many areas, but hard to define. How are films similar?
     - DBpedia: a richly linked KB, with film information available
     → Problem: how to define similarity on DBpedia?
     - Past methods don’t exploit linkedness
     - Network analysis methods can aid this
     - Test trial with node-disjoint paths: GITS is more similar to Matrix than to Totoro
  32. Ongoing/Future Work. Mining DBpedia as a network:
     - Analyse structured and related data
     - Similarity as a complement to reasoning, retrieval, querying
     - Also useful in NLP, recommender systems, knowledge discovery
     → Examples: work we do in UIMR
  34. Ioana Hulpus (2011/2012): Graph-based topic analysis with the support of Linked Data
  35. Benjamin Heitmann (2011/2012): Spreading activation for cross-domain recommendation
  38. Challenges/Discussion. Challenges:
     - Topology of the DBpedia graph: standard SNA measures for homogeneous networks, e.g. density, degree distribution. How to apply them to DBpedia?
     - What does a path actually mean?
     - Which subgraphs to use? How do metrics vary with different subgraphs, e.g. different ontologies/categories?
     - Scalability (not a problem, but a challenge)
     - Evaluation: how do we confirm something is similar?
     Thanks for listening! Questions/Suggestions?
