Capturing emerging relations between schema ontologies on the Web of Data
Published in: Technology, Education
Transcript

  • 1. Capturing emerging relations between schema ontologies on the Web of Data
    Andriy Nikolov
    Enrico Motta
  • 2. Public linked data
    “Linking” in the Linked Data cloud:
    References to instance URIs described in external sources
    Special case: identity links between equivalent resources
    Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 3. Motivation
    Schema heterogeneity is an obstacle both for creating and for utilising these links
    Extracting information on the same topic from different repositories
    Discovering equivalence links between individuals
    Motivation for our work: discovering instance-level links
    How to choose the repositories to connect a new one to?
    Which subsets of repositories contain co-referring instances?
    [Diagram: LinkedMDB, DBPedia, Freebase and MusicBrainz, with the topics TV programs, movies and pieces of music]
  • 4. Schema-level interlinks
    [Diagram: schema level (marked “?”) and data level]
  • 5. Matching approaches
    “Top-down”
    Analyzing schema ontologies and generating alignments (manually or automatically)
    UMBEL
    Using CYC as a “backbone”
    Mapping commonly used schema ontologies
    “Bottom-up”
    Inferring schema mappings based on instance-level information
  • 6. Our approach
    Constructing a large-scale network of schema mappings
    Applying a light-weight instance-based matcher
    Analysing the resulting network
    What does it tell us about the use of ontologies?
  • 7. Motivating factors
    Potential use case scenarios
    Discovering relevant sources for connection
    Discovering relevant subsets of comparable instances
    Tolerance to the quality of mappings
    A mapping between “strongly overlapping” classes is still useful even if there is no strict equivalence/subsumption
  • 8. Instance-based matching
    Use of instance-based matching
    Some implicit schema-level assumptions cannot be captured using only schema-level evidence
    Interpretation mismatches
    dbpedia:Actor = professional actor (film or stage)
    movie:actor = anybody who participated in a movie
    Class interpretation “as used” vs “as designed”
    FOAF: foaf:Person = any person
    DBLP: foaf:Person = computer scientist
  • 9. Instance set overlaps
    Co-typing (DBPedia): dbpedia:Ennio_Morricone is_a dbpedia:Artist; dbpedia:Ennio_Morricone is_a yago:ItalianComposers
    Declared association (LinkedMDB, DBPedia, MusicBrainz): movie:music_contributor/2490 = dbpedia:Ennio_Morricone = music:artist/a16…9fdf
    Classes shown for the declared association: movie:music_contributor, dbpedia:Artist, mo:MusicArtist (connected to the instances by is_a)
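The two kinds of overlap on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy triples and the identifier `music:artist/example` (standing in for the truncated MusicBrainz URI) are assumptions made for illustration, and only direct equivalence links are followed, not composed ones.

```python
from collections import defaultdict
from itertools import combinations

# Toy data modelled on the slide. "music:artist/example" is a hypothetical
# placeholder for the truncated MusicBrainz identifier.
TYPE_TRIPLES = [
    ("dbpedia:Ennio_Morricone", "dbpedia:Artist"),
    ("dbpedia:Ennio_Morricone", "yago:ItalianComposers"),
    ("movie:music_contributor/2490", "movie:music_contributor"),
    ("music:artist/example", "mo:MusicArtist"),
]
SAME_AS = [
    ("movie:music_contributor/2490", "dbpedia:Ennio_Morricone"),
    ("dbpedia:Ennio_Morricone", "music:artist/example"),
]


def types_by_instance(type_triples):
    """Group the declared classes of each instance."""
    types = defaultdict(set)
    for instance, cls in type_triples:
        types[instance].add(cls)
    return types


def co_typing_overlaps(type_triples):
    """Count instances that are typed with both classes of a pair."""
    counts = defaultdict(int)
    for classes in types_by_instance(type_triples).values():
        for a, b in combinations(sorted(classes), 2):
            counts[(a, b)] += 1
    return dict(counts)


def association_overlaps(type_triples, same_as):
    """Count declared-equivalent instance pairs per pair of their classes."""
    types = types_by_instance(type_triples)
    counts = defaultdict(int)
    for x, y in same_as:
        for a in types.get(x, ()):
            for b in types.get(y, ()):
                if a != b:
                    counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


co_typed = co_typing_overlaps(TYPE_TRIPLES)
associated = association_overlaps(TYPE_TRIPLES, SAME_AS)
```

Each key of the returned dictionaries is a candidate pair of overlapping classes, and the value is the size of the overlap observed in the toy data.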
  • 10. Dataset
    Billion Triple Challenge 2009
    about 1.14 billion triples
    contains
    core LOD repositories (DBPedia, Freebase, Geonames, Musicbrainz, LinkedMDB,…)
    smaller semantic datasets retrieved by search servers (Falcon-S, Sindice)
    ≈3.6M co-typing-based overlapping pairs of classes
    ≈1M association-based pairs
  • 11. Inferring mappings
    Classification task
    Classes A, B: is there a mapping?
    Boolean classification
    type of mappings assigned based on comparing sizes of instance sets
    Features
    $ns_1$, $ns_2$: namespaces of class URIs
    $|e_{A \cap B}|$: size of the overlap
    $|e_A|$, $|e_B|$: sizes of instance sets
    $\frac{|e_{A \cap B}|}{|e_A|}$, $\frac{|e_{A \cap B}|}{|e_B|}$: ratio of the overlapping subset to the complete instance set
    direct/indirect: whether classes have instances explicitly declared to be equivalent
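As a rough illustration of the feature list above, the following Python sketch computes the features for one pair of classes. It assumes that equivalent instances have already been merged to a shared identifier so that the instance sets can be intersected directly. The function names, the dictionary keys and the 0.9 threshold in `mapping_type` are hypothetical; the slides only say that the mapping type is assigned by comparing instance set sizes, not which rule is used.

```python
def namespace(curie):
    """Return the prefix of a prefixed name such as 'dbpedia:Actor'."""
    return curie.split(":", 1)[0]


def pair_features(class_a, class_b, instances_a, instances_b, direct):
    """Build the feature record for classes A and B from their instance sets."""
    overlap = instances_a & instances_b
    return {
        "ns1": namespace(class_a),
        "ns2": namespace(class_b),
        "overlap": len(overlap),            # |e_{A ∩ B}|
        "size_a": len(instances_a),         # |e_A|
        "size_b": len(instances_b),         # |e_B|
        "ratio_a": len(overlap) / len(instances_a) if instances_a else 0.0,
        "ratio_b": len(overlap) / len(instances_b) if instances_b else 0.0,
        "direct": direct,                   # explicit equivalence links exist
    }


def mapping_type(features, threshold=0.9):
    """Hypothetical rule: label a mapping from the two overlap ratios."""
    a_in_b = features["ratio_a"] >= threshold
    b_in_a = features["ratio_b"] >= threshold
    if a_in_b and b_in_a:
        return "equivalent"
    if a_in_b:
        return "A subclass of B"
    if b_in_a:
        return "B subclass of A"
    return "overlap"


features = pair_features(
    "dbpedia:Actor", "movie:actor",
    {"i1", "i2", "i3"}, {"i2", "i3", "i4", "i5"},
    direct=True,
)
label = mapping_type(features)
```

In the slides the decision whether a mapping exists at all is made by a trained Boolean classifier over these features; the threshold rule here only stands in for the later step of choosing the mapping type.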
  • 12. Test
    Training
    Training set: 6000 overlapping pairs of classes
    Test: 10-fold cross-validation
    Applying
    2 networks of class mappings
  • 13. Observations: class mappings
    Association-based network: classes involved in the largest number of mappings
    High-level classes representing concepts covered in many repositories
    … and describing categories with very fine-grained class decomposition
    Usually also the most populated ones
    geonames:Feature
    freebase:people.person
    yago:PhysicalEntity
    linkedmdb:film
    umbel:Person
    akt:Person
    akt:ArticleReference
    …
    “under-linked” ones?
  • 14. Observations: class mappings
    Co-typing-based network: classes involved in the largest number of mappings
    Popular classes reused in many repositories
    … or in DBPedia
    … and describing categories with fine-grained class decomposition
    Usually also the most populated ones
    foaf:Person
    umbel:Person
    dbpedia:Person
    dbpedia:FootballPlayer
    wordnet:Person
    dbpedia:Album
    sioc:WikiArticle
    geonames:Feature
    …
  • 15. Links between ontologies
    Aggregated network: connections between ontologies
    Mapping-based links between ontologies
    At least 1 mapping between corresponding classes must exist
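The aggregation step described on this slide can be sketched in Python as follows. The function name and the toy class mappings are assumptions for illustration only; an ontology is identified here simply by the prefix of its class names.

```python
from collections import defaultdict


def ontology_links(class_mappings):
    """Aggregate class-level mappings into links between ontologies.

    Two ontologies are linked as soon as at least one mapping exists
    between their classes; the value counts the supporting mappings.
    """
    links = defaultdict(int)
    for class_a, class_b in class_mappings:
        ns_a = class_a.split(":", 1)[0]
        ns_b = class_b.split(":", 1)[0]
        if ns_a != ns_b:  # mappings inside one ontology add no link
            links[tuple(sorted((ns_a, ns_b)))] += 1
    return dict(links)


# Toy class mappings, invented for this example.
mappings = [
    ("dbpedia:Artist", "mo:MusicArtist"),
    ("dbpedia:Person", "foaf:Person"),
    ("dbpedia:Album", "mo:Record"),
    ("dbpedia:Artist", "dbpedia:Person"),
]
network = ontology_links(mappings)
```

Keeping the count of supporting mappings, rather than a plain Boolean link, is a design choice of this sketch; the slide itself only requires that at least one mapping exists.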
  • 16. Association-based network
  • 17. Association-based network
    Generic: YAGO, Freebase, UMBEL, OpenCYC, DBPedia
  • Association-based network
    Domain-specific
    Generic: YAGO, Freebase, UMBEL, OpenCYC, DBPedia
  • Association-based network
    Main factor: topic coverage
    Popularity for linking is not reflected
    Data-level: DBPedia has more connections than Freebase
    Schema-level: no substantial difference
    Effect of exploiting composed links
  • 26. Co-typing-based network
    Main factor:
    Popularity for reuse
    FOAF and WordNet:
    the most popular
    DBPedia, YAGO, OpenCYC, UMBEL
    Reused for DBPedia instances
  • 27. Outcomes
    Possible usage scenarios for mappings
    Selecting suitable sources to connect
    “LinkedMDB contains more movies than DBPedia – more likely to cover all my instances”
    Selecting an ontology to reuse to structure new instances
    Which sources use this ontology? Do I want my data to be integrated with them?
    Other data-driven tasks
    E.g., exploratory search
    Generic challenges
    How to take into account task requirements in ontology matching?
    Recall vs precision, fuzzy vs exact
    How to capture changes in the data?
    BTC 2009 is almost obsolete by now
  • 28. Limitations and future work
    Limitations
    Light-weight matcher can lead to lower quality mappings
    OK for our scenario but not others
    Pre-existing instance-level mappings are not always available
    Future work
    Combining with schema-based ontology matching techniques
    Taking into account properties and complex correspondences
  • 29. Questions?
    Thanks for your attention
  • 30. Disjoint but overlapping
    Spurious owl:sameAs link
    dbpedia:Hippocrates (Hippocrates) = bookmashup:9004095748 (Hippocratic Lives and Legends (Studies in Ancient Medicine, Vol 4))
    Spurious rdf:type assignment
    dbpedia:Celtic_Frost (band) defined as Person in DBPedia (fixed in the current version of DBPedia)
    Modelling assumptions
    dbpedia:Masada describes both the geographical place and the battle
