Capturing emerging relations between schema ontologies on the Web of Data
Andriy Nikolov, Enrico Motta

  1. Capturing emerging relations between schema ontologies on the Web of Data
     - Andriy Nikolov
     - Enrico Motta
  2. Public linked data
     - “Linking” in the Linked Data cloud:
       - References to instance URIs described in external sources
       - Special case: identity links between equivalent resources
     - [Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
  3. Motivation
     - Schema heterogeneity is an obstacle both for creating and for utilising these links
       - Extracting information on the same topic from different repositories
       - Discovering equivalence links between individuals
     - Motivation for our work: discovering instance-level links
       - How to choose the repositories to which a new one should be connected?
       - Which subsets of repositories contain co-referring instances?
     - [Figure labels: LinkedMDB, DBPedia, Freebase, MusicBrainz; TV programs, movies, pieces of music; “?”]
  4. Schema-level interlinks
     - [Figure labels: Schema-level, “?”, Data-level]
  5. Matching approaches
     - “Top-down”
       - Analyzing schema ontologies and generating alignments (manually or automatically)
       - UMBEL
         - Using CYC as a “backbone”
         - Mapping commonly used schema ontologies
     - “Bottom-up”
       - Inferring schema mappings based on instance-level information
  6. Our approach
     - Constructing a large-scale network of schema mappings
       - Applying a light-weight instance-based matcher
     - Analysing the resulting network
       - What does it tell us about the use of ontologies?
  7. Motivating factors
     - Potential use case scenarios
       - Discovering relevant sources for connection
       - Discovering relevant subsets of comparable instances
     - Tolerance to the quality of mappings
       - A mapping between “strongly overlapping” classes is still useful even if there is no strict equivalence/subsumption
  8. Instance-based matching
     - Use of instance-based matching
       - Some implicit schema-level assumptions cannot be captured using only schema-level evidence
     - Interpretation mismatches
       - dbpedia:Actor = professional actor (film or stage)
       - movie:actor = anybody who participated in a movie
     - Class interpretation “as used” vs “as designed”
       - FOAF: foaf:Person = any person
       - DBLP: foaf:Person = computer scientist
  9. Instance set overlaps
     - Co-typing (DBPedia)
       - dbpedia:Ennio_Morricone is_a dbpedia:Artist
       - dbpedia:Ennio_Morricone is_a yago:ItalianComposers
     - Declared association (LinkedMDB, DBPedia, MusicBrainz)
       - Classes: movie:music_contributor, dbpedia:Artist, mo:MusicArtist
       - Instances: movie:music_contributor/2490 = dbpedia:Ennio_Morricone = music:artist/a16…9fdf
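The two kinds of overlap on slide 9 can be illustrated with a minimal sketch. It assumes typing statements and identity links are already available as plain Python pairs; the function names and the toy data layout are hypothetical and are not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations

# Toy data modelled on the slide's example (layout is an assumption).
types = [
    ("dbpedia:Ennio_Morricone", "dbpedia:Artist"),
    ("dbpedia:Ennio_Morricone", "yago:ItalianComposers"),
    ("movie:music_contributor/2490", "movie:music_contributor"),
]
same_as = [("dbpedia:Ennio_Morricone", "movie:music_contributor/2490")]


def classes_by_instance(types):
    """Group the declared classes of every instance."""
    classes_of = defaultdict(set)
    for instance, cls in types:
        classes_of[instance].add(cls)
    return classes_of


def co_typing_overlaps(types):
    """Count instances that are typed with both classes of a pair."""
    overlap = defaultdict(int)
    for classes in classes_by_instance(types).values():
        for a, b in combinations(sorted(classes), 2):
            overlap[(a, b)] += 1
    return dict(overlap)


def association_overlaps(types, same_as):
    """Count identity links that connect instances of two different classes."""
    classes_of = classes_by_instance(types)
    overlap = defaultdict(int)
    for x, y in same_as:
        for a in classes_of.get(x, ()):
            for b in classes_of.get(y, ()):
                if a != b:
                    overlap[tuple(sorted((a, b)))] += 1
    return dict(overlap)
```

Note that this sketch only follows direct identity links; composing links transitively would need an extra closure step that is not shown here.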
  10. Dataset
      - Billion Triple Challenge 2009
        - about 1.14 billion triples
        - contains
          - core LOD repositories (DBPedia, Freebase, Geonames, MusicBrainz, LinkedMDB, …)
          - smaller semantic datasets retrieved by search servers (Falcon-S, Sindice)
      - ≈3.6M co-typing-based overlapping pairs of classes
      - ≈1M association-based pairs
  11. Inferring mappings
      - Classification task
        - Classes A, B: is there a mapping?
        - Boolean classification
        - Type of mapping assigned based on comparing sizes of instance sets
      - Features
        - $ns_1$, $ns_2$: namespaces of class URIs
        - $|e(A \cap B)|$: size of the overlap
        - $|e(A)|$, $|e(B)|$: sizes of instance sets
        - $|e(A \cap B)| / |e(A)|$, $|e(A \cap B)| / |e(B)|$: ratio of the overlapping subset to the complete instance set
        - direct/indirect: whether classes have instances explicitly declared to be equivalent
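As an illustration of the features on slide 11, the sketch below computes them for one pair of classes. It assumes the instance sets have already been reduced to comparable identifiers (for example after resolving identity links). The slides do not say how the mapping type is derived from the set sizes, so `mapping_type` and its `threshold` are purely hypothetical.

```python
def namespace(class_uri):
    """Prefix before ':' in a prefixed name such as 'dbpedia:Artist'."""
    return class_uri.split(":", 1)[0]


def pair_features(class_a, class_b, e_a, e_b, direct):
    """Feature dictionary for a candidate mapping between classes A and B."""
    overlap = len(e_a & e_b)
    return {
        "ns1": namespace(class_a),
        "ns2": namespace(class_b),
        "overlap": overlap,
        "size_a": len(e_a),
        "size_b": len(e_b),
        "ratio_a": overlap / len(e_a) if e_a else 0.0,
        "ratio_b": overlap / len(e_b) if e_b else 0.0,
        "direct": direct,
    }


def mapping_type(features, threshold=0.9):
    """Hypothetical rule: compare the overlap ratios against one threshold."""
    a_in_b = features["ratio_a"] >= threshold
    b_in_a = features["ratio_b"] >= threshold
    if a_in_b and b_in_a:
        return "equivalent"
    if a_in_b:
        return "A subclass of B"
    if b_in_a:
        return "B subclass of A"
    return "overlap"
```

For instance, `pair_features("dbpedia:Artist", "foaf:Person", {1, 2}, {1, 2, 3, 4}, True)` gives an overlap of 2 with ratios 1.0 and 0.5, which the illustrative rule labels as "A subclass of B".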
  12. Test
      - Training
        - Training set: 6000 overlapping pairs of classes
        - Test: 10-fold cross-validation
      - Applying
        - 2 networks of class mappings
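The evaluation protocol named on slide 12 can be sketched in plain Python. The slides do not name the classifier, so `train` and `predict` are caller-supplied placeholders, and the shuffling seed is an arbitrary assumption; the sketch also assumes there are at least `k` examples.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, labels, train, predict, k=10):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for held_out in k_fold_indices(len(examples), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(examples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        model = train(train_x, train_y)
        correct = sum(predict(model, examples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

With 6000 labelled pairs and `k=10`, each fold holds out 600 pairs and trains on the remaining 5400.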
  13. Observations: class mappings
      - Association-based network: classes involved in the largest number of mappings
        - High-level classes representing concepts covered in many repositories
        - … and describing categories with very fine-grained class decomposition
        - Usually also the most populated ones
      - Examples: geonames:Feature, freebase:people.person, yago:PhysicalEntity, linkedmdb:film, umbel:Person, akt:Person, akt:ArticleReference, …
      - “under-linked” ones?
  14. Observations: class mappings
      - Co-typing-based network: classes involved in the largest number of mappings
        - Popular classes reused in many repositories
        - … or in DBPedia
        - … and describing categories with fine-grained class decomposition
        - Usually also the most populated ones
      - Examples: foaf:Person, umbel:Person, dbpedia:Person, dbpedia:FootballPlayer, wordnet:Person, dbpedia:Album, sioc:WikiArticle, geonames:Feature, …
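Ranking classes by the number of mappings they take part in, as on slides 13 and 14, amounts to counting node degrees in the mapping network. The sketch below assumes mappings are given as pairs of prefixed class names; the function name is hypothetical.

```python
from collections import Counter


def mapping_degree(class_mappings):
    """Number of mappings each class participates in (its degree)."""
    degree = Counter()
    for class_a, class_b in class_mappings:
        degree[class_a] += 1
        degree[class_b] += 1
    return degree
```

`mapping_degree(mappings).most_common(10)` would then list the ten most connected classes.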
  15. Links between ontologies
      - Aggregated network: connections between ontologies
      - Mapping-based links between ontologies
        - At least 1 mapping between corresponding classes must exist
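The aggregation rule on slide 15 (two ontologies are linked if at least one mapping exists between their classes) could be sketched as follows. Treating the prefix of a class name as its ontology is a simplifying assumption, and the function names are hypothetical.

```python
from collections import Counter


def ontology_of(class_uri):
    """Assumption: the prefix before ':' identifies the ontology."""
    return class_uri.split(":", 1)[0]


def ontology_links(class_mappings):
    """Map each unordered pair of distinct ontologies to its mapping count.

    A pair appears in the result only if at least one class mapping
    connects the two ontologies.
    """
    links = Counter()
    for class_a, class_b in class_mappings:
        ont_a, ont_b = ontology_of(class_a), ontology_of(class_b)
        if ont_a != ont_b:
            links[tuple(sorted((ont_a, ont_b)))] += 1
    return links
```

Keeping the count rather than a plain set of pairs leaves room for weighting the links later, although the slide itself only requires existence.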
  16. Association-based network
      - [Figure only]
  17. Association-based network
      - Generic:
        - YAGO
        - Freebase
        - UMBEL
        - OpenCYC
        - DBPedia
  18. Association-based network
      - Domain-specific
      - Generic:
        - YAGO
        - Freebase
        - UMBEL
        - OpenCYC
        - DBPedia
  19. Association-based network
      - Main factor: topic coverage
      - Popularity for linking is not reflected
        - Data-level: DBPedia has more connections than Freebase
        - Schema-level: no substantial difference
        - Effect of exploiting composed links
  20. Co-typing-based network
      - Main factor: popularity for reuse
      - FOAF and WordNet: the most popular
      - DBPedia, YAGO, OpenCYC, UMBEL: reused for DBPedia instances
  21. Outcomes
      - Possible usage scenarios for mappings
        - Selecting suitable sources to connect
          - “LinkedMDB contains more movies than DBPedia – more likely to cover all my instances”
        - Selecting an ontology to reuse to structure new instances
          - Which sources use this ontology? Do I want my data to be integrated with them?
        - Other data-driven tasks
          - E.g., exploratory search
      - Generic challenges
        - How to take into account task requirements in ontology matching?
          - Recall vs precision, fuzzy vs exact
        - How to capture changes in the data?
          - BTC 2009 is almost obsolete by now
  22. Limitations and future work
      - Limitations
        - Light-weight matcher can lead to lower quality mappings
          - OK for our scenario but not others
        - Pre-existing instance-level mappings are not always available
      - Future work
        - Combining with schema-based ontology matching techniques
        - Taking into account properties and complex correspondences
  23. Questions?
      - Thanks for your attention
  24. Disjoint but overlapping
      - Spurious owl:sameAs link
        - dbpedia:Hippocrates (Hippocrates) = bookmashup:9004095748 (Hippocratic Lives and Legends (Studies in Ancient Medicine, Vol 4))
      - Spurious rdf:type assignment
        - dbpedia:Celtic_Frost (band) defined as Person in DBPedia (fixed in the current version of DBPedia)
      - Modelling assumptions
        - dbpedia:Masada describes both the geographical place and the battle