Discovering Alignments in Ontologies of Linked Data

494 views

Published on

Rahul Parundekar and Craig A. Knoblock and Jose Luis Ambite, Discovering Alignments in Ontologies of Linked Data, Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013

Published in: Education
  • Be the first to comment

Discovering Alignments in Ontologies of Linked Data

  1. 1. Discovering Concept Coverings in Ontologies of Linked Data Sources Best Paper: Research Track – The 11th International Semantic Web Conference, 2012 Rahul Parundekar, Craig A. Knoblock and Jose-Luis Ambite {parundek,knoblock}@usc.edu, ambite@isi.edu University of Southern California
  2. 2. MOTIVATION
  3. 3. Data Integration on the Web Goal: Integrate and Query Data across the Web Two challenges in data integration: • Object-level (aka record linkage): When objects at different sites are the same real-world object • Linked Data movement addresses object-level integration • objects have unique URIs, • equivalent objects linked with owl:sameAs statements • Schema-level (aka schema/ontology mapping): align the semantics of different sources • Define mappings to relate source schemas to common schema • Use schema mappings to query across multiple sources Our work: Exploit linked data to automatically discover expressive schema/ontology mappings
  4. 4. The Web of Linked Data integrates data at the object level Example: Geospatial Domain Los Angeles City of Los Angeles
  5. 5. Equivalent instances in different sources are connected with owl:sameAs Source 1 Source 2 Ontology Level Populated Place City Instance Level owl:sameAs Los Angeles City of Los Angeles
  6. 6. Links are absent at the ontology level Source 1 Source 2 Ontology Level Populated Place Only 15 out of the NO LINKS!! 190 ontologies are connected City Instance Level owl:sameAs Los Angeles City of Los Angeles
  7. 7. Alignments are necessary for interoperability of the sources Source 1 Source 2 Ontology Level Populated Place We need to find NO LINKS!! links at the Ontology Level City Instance Level owl:sameAs Los Angeles City of Los Angeles
  8. 8. How can we find ontology alignments? Source 1 Source 2 Ontology Level = Populated Place City Instance Level owl:sameAs Los Angeles City of Los Angeles
  9. 9. DISCOVERING ALIGNMENTS IN ONTOLOGIES OF LINKED DATA
  10. 10. Use an extensional approach to align concepts Represents set of instances belonging to ClassA Represents set of instances belonging to ClassB ClassA is disjoint from ClassB ClassA is equivalent to ClassB ClassA is subset of ClassB ClassB is subset of ClassA
  11. 11. Align concepts when supported by evidence at the instance level Source 1 Source 2 Ontology Level = Populated Place City Instance Level New York NYC City of Los Angeles Los Angeles City of Dublin Dublin Kuala Lumpur … and more Kolumpo … and more
  12. 12. However, ontologies of many sources are rudimentary DBpedia Ontology Description GeoNames Ontology # of Properties 3.77 million 8 million 9 feature classes, 645 feature codes 359 1 (Well-definedhierarchy) # ofClasses Geographical Database For Example: Places, People, Music, Topics, etc. # of Instances Semantic Web version of Wikipedia (rdf:type=Feature) 1775 29 Rich, Descriptive Ontology Impoverished Ontology Finding Alignments is Non-Trivial
  13. 13. Create NEW concepts by restricting values of properties Set of all instances in GeoNames Set of all instances with featureClass=P Atomic Restriction Classes* Set of all instances in DBpedia Set of all instances with rdf:type=PopulatedPlace * Value Restrictions in OWL-DL
  14. 14. Comparing Linked Instances Across Ontologies | Img(r1) ∩ r2| P | Img(r1) | R | Img(r1) ∩ r2| |r2|
  15. 15. Align Atomic Restriction Classes by comparing their overlap featureClass=P rdf:type=PopulatedPlace = r1 r2 | Img(r1) ∩ r2| | Img(r1) | = > 0.9 1 | Img(r1) ∩ r2| |r2| = > 0.9 1
  16. 16. Create specialized concepts using conjunctive restriction classes For Example: Creating the concept for “Schools in the US” Set of all instances in GeoNames Set of all instances with countryCode=US i.e. features in the US Set of all instances with featureCode=S.SCH &countryCode=US i.e. Schools in the US Set of all instances with featureCode=S.SCH i.e. Schools Conjunctive Restriction Classes
  17. 17. An ordered top-down exploration algorithm to align Atomic & Conjunctive Restriction Classes
  18. 18. Detect rich alignments even when ontologies are rudimentary GeoNames-Dbpedia Relationship Equivalent Subset # Alignments Found with Atomic and Conjunctive Restriction Classes 31 2193 Can we find more meaningful alignments?
  19. 19. CONSIDER THREE OF THE SUBSET RELATIONS PRODUCED…
  20. 20. 1) Schools in GeoNames are Educational Institutions in DBpedia featureCode=S.SCH rdf:type=EducationalInstitution
  21. 21. 2) Colleges in GeoNames are Educational Institutions in DBpedia featureCode=S.SCH rdf:type=EducationalInstitution featureCode=S.SCHC
  22. 22. 3) Universities in GeoNames are Educational Institutions in DBpedia featureCode=S.SCH rdf:type=EducationalInstitution featureCode=S.SCHC featureCode=S.UNIV
  23. 23. Using featureCode property as a hint, create a Union of concepts featureCode=S.SCH rdf:type=EducationalInstitution featureCode=S.SCHC featureCode=S.UNIV featureCode=S.SCHC ∩ ∩ featureCode=S.SCH featureCode=S.UNIV
  24. 24. Detect a Concept Covering by extensional comparison featureCode=S.SCH rdf:type=EducationalInstitution featureCode=S.SCHC = featureCode=S.UNIV featureCode=S.SCHC ∩ ∩ featureCode=S.SCH featureCode=S.UNIV
  25. 25. Compare the overlap of the extension sets to determine equivalence featureCode={S.SCH, S.SCHC, S.UNIV} US rdf:type=EducationalInstitution UL |UL| = 404 Educational Institutions UA=US∩UL = | UA | | US | > = 1, by definition 0.9, by definition | UA | | UL | = =1 396 404 = 0.98 > 0.9
  26. 26. DETECTING OUTLIERS
  27. 27. Example: Am I in Spain … or Italy? • We align dbpedia:country=dbpedia:Spain with geonames:countryCode=ES • 3917 out of 3918 instances in GeoNames agree with this • ONE instance had its country code as Italy. • Because this instance contradicts overwhelming evidence, we can flag it as an outlier
  28. 28. Concept Covering of Educational Institutions: What are the other 8 instances? featureCode={S.SCH, S.SCHC, S.UNIV} | UA | | UL | rdf:type=EducationalInstitution = 396 404 • 1 with featureCode=S.HSP (Hospitals) • There are 31 instances with S.HSP because of which Hospitals are not subsets • • • • • 3 with featureCode=S.BLDG (Buildings) 1 with featureCode=S.EST (Establishment) 1 with featureCode=S.LIBR (Library) 1 with featureCode=S.MUS (Museum) 1 doesn’t have a featureCodeproperty
  29. 29. RESULTS
  30. 30. Example alignments of Atomic Restriction Classes Restriction Class from GeoNames Restriction Class from DBpedia Rel P R | Img(r1) ∩ r2| Alignments with concepts that were not explicit– e.g. Concept ofPlaces featureClass=P rdf:type=PopulatedPlace = 99. 6 90.5 70658 ∩ Alignments with geographical regions like countries, administrative divisions, etc. countryCode=ES country=Spain = 94. 5 99.9 3917 Find the actual relationship between concepts as opposed to the perceived one
  31. 31. Example alignments of Conjunctive Restriction Classes Restriction Class from GeoNames Restriction Class from DBpedia Rel P R | Img(r1) ∩ r2| Find alignments with conjunctiverestriction classes e.g. Concepts of ‘Places in the US’ are equal featureClass=P &countrycode=US rdf:type=PopulatedPlace& country=United_States = 97. 2 96.7 26061 Find alignments with conjunctive restriction classes that have related properties e.g. Places in North Dakota have 701 area code for phone numbers featureClass=P & parentADM1= North_Dakota areaCode=701 = 98. 1 96.5 361 In some cases the meaning of a concept shifts slightly e.g. Populated Places in Senegal are aligned to Towns rather than PopulatedPlaces featureClass=P &countryCode=SN rdf:type=Town & country=Senegal = 92. 6 100 25
  32. 32. Example Alignments of Disjunctive Restriction Classes Larger Restriction Class Union of Smaller Restriction Classes Rel R Ove Outliers rlap Find concept coverings with disjunctive restriction classes– e.g. Educational Institution concept in Dbpedia covers concepts of Schools, Colleges and Universities rdf:type= dbpedia: geonames:featureCode= Educational_Institution {S.SCH, S.SCHC, S.UNIV} = 98. 0 396/ 404 S.BLDG, S.HSP, S.MUS, etc. = 99. 2 1981/ 1996 S.AIRF, S.FRMT, S.SCH, T.HLL, etc. 98. 0 1939/ 1978 dbpedia: Kingdom_o f_the_Neth erlands System can flag outliers that may need to be corrected rdf:type= dbpedia:Airport Geonames:featureCode= {S.AIRB, S.AIRP} System is able to find all terms used for the country Netherlands geonames: countryCode=NL dbpedia:country= {dbpedia:The_Netherlands,db pedia:Flag_of_the_Netherland s.svg, dbpedia:Netherlands} =
  33. 33. Related Work • Other Ontology Alignment efforts in the Web of Linked Data • BLOOMS, BLOOMS+ [Jain et al. ISWC 2010, 2011] • Linked Open Data ontologies aligned with central ontology called ‘Proton’ using structural similarity • Agreement Maker [Cruz et al. 2011] • Similarity Metrics on labels of classes • Statistical schema induction [Volker et al. ISWC 2011] • Mines associativity rules from intermediate ‘transaction data sets’ -> OWL2 Axioms. • Formalization of Ontology Mappings [Atencia et al. ISWC 2012] • A related work that provides a formalization of weighted ontology mappings
  34. 34. Conclusion and Future Work • Conclusion • Our approach is able to find alignments • Automatically, across any two linked sources • Even in the case of a rudimentary ontology • Types of Alignments • Atomic Restriction Classes • Conjunctive Restriction Classes • Disjunctive Restriction Classes i.e. Concept Coverings • And detect Outliers that help identify inconsistencies in the data • Future work • Add support for negation • Build complete descriptions of sources • Use algorithms to negotiate meaning between agents on the fly
  35. 35. References for Additional Detail: Any questions? THANK YOU Rahul Parundekar, Craig A. Knoblock, and Jose Luis Ambite. Linking and building ontologies of linked data. The Semantic Web, ISWC 2010. Rahul Parundekar, Craig A. Knoblock, and Jose Luis Ambite. Discovering concept coverings in ontologies of linked data sources The Semantic Web, ISWC 2012. Rahul Parundekar, Craig A. Knoblock, and Jose Luis Ambite. Discovering alignments in ontologies of linked data IJCAI-2013

×