Mid-Ontology Learning from Linked Data @JIST2011

Transcript

  • 1. Mid-Ontology Learning from Linked Data
    Lihua Zhao and Ryutaro Ichise
    National Institute of Informatics, Research Organization of Information and Systems (Inter-University Research Institute Corporation)
    JIST2011, 12.05.2011, Hangzhou
  • 2. Outline
    Introduction
    Mid-Ontology Learning Approach
    Experimental Evaluation
    Related Work
    Conclusion and Future Work
  • 3. Introduction: Linked Open Data
    295 data sets and 31 billion RDF triples (as of Sep. 2011)
    7 domains: cross-domain, geographic, media, life sciences, government, user-generated content, and publications
    Instances are interlinked with owl:sameAs
  • 4. Introduction: Challenging Problem
    Each data set has its own ontology schema
      DBpedia: http://dbpedia.org/property/population
      Geonames: http://www.geonames.org/ontology#population
    Learning all the ontology schemas is time-consuming
      DBpedia: 320 classes and thousands of properties
    Ontology schemas are heterogeneous, even within one data set
      http://dbpedia.org/property/populationTotal
      http://dbpedia.org/property/population
  • 5. Introduction: Objective
    Data collected based on "http://dbpedia.org/resource/Berlin":
      Predicate                                       Object
      http://dbpedia.org/property/name                Berlin
      http://dbpedia.org/property/population          3439100
      http://dbpedia.org/property/plz                 10001-14199
      http://dbpedia.org/ontology/postalCode          10001-14199
      http://dbpedia.org/ontology/populationTotal     3439100
      ......                                          ......
      http://www.geonames.org/ontology#alternateName  Berlin
      http://www.geonames.org/ontology#alternateName  Berlyn@af
      http://www.geonames.org/ontology#population     3426354
      ......                                          ......
      http://www.w3.org/2004/02/skos/core#prefLabel   Berlin (Germany)
      http://data.nytimes.com/elements/first_use      2004-09-12
      http://data.nytimes.com/elements/latest_use     2010-06-13
  • 6. Introduction: Mid-Ontology
    A simple ontology for various data sets: the Mid-Ontology
    Investigation of linked instances
      owl:sameAs links identical or related instances
      Scales down the data set
    Automatic ontology learning
      Integrates ontologies from data sets in diverse domains
      Automates the ontology construction process
      Adapts to linked open data sets
  • 7. Mid-Ontology Learning Approach
  • 8. Data Collection
    We scale down the data sets by collecting only linked instances, from which we can extract related information.
    Extract data linked with owl:sameAs
      Select a core data set (with inward and outward links)
      Collect all instances that have owl:sameAs links
    Remove noisy instances from the core data set
      Noisy instances: instances without any meaningful triple
    Collect predicates and objects
      Collect <predicate, object> (PO) pairs from the collected instances
      Collect PO pairs from the linked instances (in the other data sets)
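The collection steps on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not the authors' implementation (which the deck says used Java and a Virtuoso SPARQL endpoint). The function name `collect_po_pairs`, the shortened `geo:` and `nyt:` instance names, the `dbpedia:Empty` instance, and the simplified notion of "noisy" (a core instance with nothing but owl:sameAs links) are all assumptions made for the example.

```python
OWL_SAME_AS = "owl:sameAs"


def collect_po_pairs(triples, core_prefix):
    """Return {core instance: set of (predicate, object) pairs}.

    Pairs come from the core instance and from every instance linked to it
    by owl:sameAs in either direction.
    """
    # Index outgoing triples by subject.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    # Follow owl:sameAs links outward from and inward to the core data set.
    linked = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            continue
        if s.startswith(core_prefix):
            linked.setdefault(s, set()).add(o)
        elif o.startswith(core_prefix):
            linked.setdefault(o, set()).add(s)

    collected = {}
    for core, others in linked.items():
        core_pairs = [po for po in by_subject.get(core, []) if po[0] != OWL_SAME_AS]
        if not core_pairs:
            continue  # assumed "noisy": the core instance has only owl:sameAs links
        pairs = set(core_pairs)
        for inst in others:
            pairs.update(po for po in by_subject.get(inst, []) if po[0] != OWL_SAME_AS)
        collected[core] = pairs
    return collected


# Toy triples; only the Berlin values are taken from the slides.
triples = [
    ("dbpedia:Berlin", "owl:sameAs", "geo:2950159"),
    ("nyt:N50987186835223032381", "owl:sameAs", "dbpedia:Berlin"),
    ("dbpedia:Berlin", "db-prop:population", "3439100"),
    ("geo:2950159", "geo-onto:population", "3426354"),
    ("nyt:N50987186835223032381", "skos:prefLabel", "Berlin (Germany)"),
    ("dbpedia:Empty", "owl:sameAs", "geo:1"),  # hypothetical noisy instance
]
result = collect_po_pairs(triples, "dbpedia:")
```

Here `result` holds a single entry for `dbpedia:Berlin` with three PO pairs, while the hypothetical `dbpedia:Empty` is dropped as noisy.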
  • 9. An Example of Collected Data
    dbpedia:Berlin owl:sameAs http://sws.geonames.org/2950159/
    http://data.nytimes.com/N50987186835223032381 owl:sameAs dbpedia:Berlin
    Data collected based on "http://dbpedia.org/resource/Berlin":
      Predicate                                       Object
      http://dbpedia.org/property/name                Berlin
      http://dbpedia.org/property/population          3439100
      http://dbpedia.org/property/plz                 10001-14199
      http://dbpedia.org/ontology/postalCode          10001-14199
      http://dbpedia.org/ontology/populationTotal     3439100
      ......                                          ......
      http://www.geonames.org/ontology#alternateName  Berlin
      http://www.geonames.org/ontology#alternateName  Berlyn@af
      http://www.geonames.org/ontology#population     3426354
      ......                                          ......
      http://www.w3.org/2004/02/skos/core#prefLabel   Berlin (Germany)
      http://data.nytimes.com/elements/first_use      2004-09-12
      http://data.nytimes.com/elements/latest_use     2010-06-13
  • 10. Mid-Ontology Learning Approach
  • 11. Predicate Grouping
    We group related predicates from different ontology schemas, because many similar or related predicates actually refer to the same thing.
    Group predicates by exact matching
    Prune groups by similarity matching
    Refine groups using extracted relations
  • 12. Predicate Grouping
    We group related predicates from different ontology schemas, because many similar or related predicates actually refer to the same thing.
    Group predicates by exact matching
      One predicate may have various objects
      Different predicates may have the same object value
    Prune groups by similarity matching
    Refine groups using extracted relations
  • 13. Group Predicates by Exact Matching
    Create initial groups (G_i) of PO pairs, e.g.
      G_i.predicates = { db-prop:name, geo-onto:alternateName }
      G_i.objects = { Berlin, Berlyn@af }
    The example uses the data collected based on "http://dbpedia.org/resource/Berlin" (the table shown on slide 9).
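One way to read the exact-matching step is that two PO pairs fall into the same initial group when they share a predicate or share an object value. The sketch below implements that reading with a small union-find; the function name `initial_groups` and the exact grouping rule are assumptions, chosen because they reproduce the example group on this slide.

```python
def initial_groups(po_pairs):
    """Group PO pairs that are connected by a shared predicate or a shared object."""
    pairs = list(dict.fromkeys(po_pairs))  # de-duplicate while keeping order
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}
    for i, (pred, obj) in enumerate(pairs):
        # Predicates and objects live in separate key spaces.
        for key in (("p", pred), ("o", obj)):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = {}
    for i, (pred, obj) in enumerate(pairs):
        group = groups.setdefault(find(i), {"predicates": set(), "objects": set()})
        group["predicates"].add(pred)
        group["objects"].add(obj)
    return list(groups.values())


# PO pairs taken from the Berlin table, with prefixes abbreviated.
berlin_pairs = [
    ("db-prop:name", "Berlin"),
    ("db-prop:population", "3439100"),
    ("db-onto:populationTotal", "3439100"),
    ("geo-onto:alternateName", "Berlin"),
    ("geo-onto:alternateName", "Berlyn@af"),
    ("geo-onto:population", "3426354"),
]
groups = initial_groups(berlin_pairs)
```

With these pairs the first group contains db-prop:name and geo-onto:alternateName with the objects Berlin and Berlyn@af, matching the slide. The two DBpedia population predicates form a second group, and geo-onto:population stays on its own because its value differs.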
  • 14. Predicate Grouping
    We group related predicates from different ontology schemas, because many similar or related predicates actually refer to the same thing.
    Group predicates by exact matching
    Prune groups by similarity matching
      Exact matching may miss:
        Terms of predicates or objects written in different languages
        Semantically identical or related predicates
    Refine groups using extracted relations
  • 15. Prune Groups by Similarity Matching
    Ontology similarity matching at the concept level
    String-based similarity measure: StrSim(O(G_i), O(G_j))
      O(G_i): objects in G_i
      Measures: prefix, suffix, Levenshtein distance, and n-gram
    Knowledge-based similarity measure: WNSim(T(G_i), T(G_j))
      T(G_i): pre-processed terms of predicates in G_i
      Natural language processing: tokenizing terms, removing stop words, and stemming
      WordNet-based similarity measures: LCH, RES, HSO, JCN, LESK, PATH, WUP, LIN, and VECTOR
  • 16. Prune Groups by Similarity Matching
    Similarity between initial groups {G_1, G_2, ..., G_k}:
      $$Sim(G_i, G_j) = \frac{StrSim(O(G_i), O(G_j)) + WNSim(T(G_i), T(G_j))}{2}$$
    Prune initial groups G_i
      If Sim(G_i, G_j) is higher than the predefined similarity threshold, we merge G_i and G_j.
      If an initial group G_i has not been merged and has only one PO pair, we remove G_i.
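The two pruning rules can be sketched as below, assuming the pairwise Sim scores have already been computed. The function name `prune_groups`, the transitive merging of groups, and the threshold value 0.3 are assumptions for illustration; the slides do not state the threshold that was used.

```python
def prune_groups(groups, scores, threshold):
    """Merge groups whose pairwise score exceeds the threshold.

    groups: list of lists of (predicate, object) pairs.
    scores: dict mapping (i, j) with i < j to Sim(G_i, G_j); missing pairs count as 0.
    Unmerged groups that hold a single PO pair are removed.
    """
    parent = list(range(len(groups)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merged = [False] * len(groups)
    for (i, j), score in scores.items():
        if score > threshold:
            parent[find(j)] = find(i)
            merged[i] = merged[j] = True

    pruned = {}
    for i, group in enumerate(groups):
        if not merged[i] and len(group) == 1:
            continue  # never merged and only one PO pair: remove
        pruned.setdefault(find(i), []).extend(group)
    return list(pruned.values())


toy_groups = [
    [("db-prop:population", "3439100"), ("db-onto:populationTotal", "3439100")],
    [("geo-onto:population", "3426354")],
    [("nyt:first_use", "2004-09-12")],
]
toy_scores = {(0, 1): 0.36375}  # the Sim value from the worked example in the deck
pruned = prune_groups(toy_groups, toy_scores, threshold=0.3)  # hypothetical threshold
```

With this hypothetical threshold the two population groups merge into one group of three PO pairs, and the lone nyt:first_use group is removed.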
  • 17. An Example of Similarity Calculation
    Groups:
      Group  Predicate                                    Object
      G_i    http://dbpedia.org/property/population       3439100
             http://dbpedia.org/ontology/populationTotal  3439100
      G_j    http://www.geonames.org/ontology#population  3426354
    Example of string-based similarity measures on pairwise objects:
      Pairwise objects      prefix  suffix  Levenshtein distance  n-gram
      "3439100", "3426354"  0.29    0       0                     0.29
    Example of WordNet-based similarity measures on pairwise terms:
      Pairwise terms          LCH  RES  HSO  JCN   LESK  PATH  WUP   LIN  VECTOR
      population, population  1    1    1    1     1     1     1     1    1
      population, total       0.4  0    0    0.06  0.03  0.11  0.33  0    0.06
    Result:
      $$Sim(G_i, G_j) = \frac{0.145 + 0.5825}{2} = 0.36375$$
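The arithmetic behind this example can be checked with a few lines. The sketch assumes that StrSim is the plain average of the four string measures, which is consistent with the 0.145 on the slide, and that the prefix and suffix measures are normalised by the longer string's length, which happens to reproduce 0.29 and 0. The WNSim value 0.5825 is taken from the slide as given, without attempting to reproduce how the nine WordNet measures were combined.

```python
def prefix_sim(a, b):
    """Length of the common prefix divided by the longer string's length (assumed form)."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def suffix_sim(a, b):
    """Same measure applied to the reversed strings."""
    return prefix_sim(a[::-1], b[::-1])


prefix = prefix_sim("3439100", "3426354")  # common prefix "34", so 2 / 7
suffix = suffix_sim("3439100", "3426354")  # last characters differ, so 0

# prefix, suffix, Levenshtein distance, n-gram: values as printed on the slide
str_scores = [0.29, 0.0, 0.0, 0.29]
str_sim = sum(str_scores) / len(str_scores)

wn_sim = 0.5825  # taken from the slide as is
sim = (str_sim + wn_sim) / 2

print(round(prefix, 2), suffix, round(str_sim, 3), round(sim, 5))
```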
  • 18. Predicate Grouping
    We group related predicates from different ontology schemas, because many similar or related predicates actually refer to the same thing.
    Group predicates by exact matching
    Prune groups by similarity matching
    Refine groups using extracted relations
      Divide pruned groups according to rdfs:domain and rdfs:range
      Keep groups with high frequency
  • 19. Mid-Ontology Learning Approach
  • 20. Mid-Ontology Construction
    Select terms for the Mid-Ontology
      Collect all the terms of predicates in each refined group G_i.
      Collect all the pre-processed terms of P(G_i) (the predicates in G_i).
      Choose one term: the most frequent and longest one (e.g. "area" and "areaCode" are totally different).
    Construct relations
      mo-prop:hasMembers links Mid-Ontology classes to the integrated predicates.
    Construct the Mid-Ontology
      Automatically construct the Mid-Ontology using the selected terms and mo-prop:hasMembers.
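The term selection rule can be sketched as follows. The slide does not say how frequency and length are combined, so this example assumes frequency is compared first and length only breaks ties; the function name `select_term` and the sample term lists are illustrative.

```python
from collections import Counter


def select_term(terms):
    """Pick the most frequent term; among equally frequent terms, the longest."""
    counts = Counter(terms)
    return max(counts, key=lambda term: (counts[term], len(term)))


# Hypothetical pre-processed terms for the population group.
population_terms = ["population", "population", "total", "population", "einwohner"]
chosen = select_term(population_terms)

# With equal frequencies the longer term is kept, so "areaCode" is not reduced to "area".
tie_choice = select_term(["area", "areaCode"])
```

Under this assumption `chosen` is "population" and `tie_choice` is "areaCode".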
  • 21. Experimental Evaluation
    We evaluate the Mid-Ontology approach from four different aspects:
      Evaluation of data reduction
      Evaluation of ontology quality
      Evaluation with a SPARQL example
      Analysis of the Mid-Ontology approach
  • 22. Implementation
    Environment
      Linux Ubuntu 10.10, 16 GB memory, 1 TB disk
      Core i7 CPU 880, 3.07 GHz
      Java, NetBeans 6.9
    Virtuoso
      High-performance server for RDF storage
      SPARQL query endpoint
    WordNet::Similarity
      Implemented in Perl
      Knowledge-based similarity measures
  • 23. Experimental Data
    DBpedia: cross-domain, 3.5 million things, 8.9 million URIs
    Geonames: geographical domain, 7 million URIs
    NYTimes: media domain, 10,467 subject news
    We choose DBpedia as the core data set because of its wealth of inward and outward links to other data sets.
  • 24. Evaluation of Data Reduction
    We evaluate the effectiveness of data reduction during the data collection phase by comparing the number of instances.
    Number of distinct instances during the data collection phase:
      Data set  Before reduction  After owl:sameAs retrieval  After noisy data removal
      DBpedia   8,955,728         135,749 (1.52%)             88,506 (0.99%)
      Geonames  7,479,714         128,961 (1.72%)             82,054 (1.10%)
      NYTimes   10,467            9,226 (88.14%)              8,535 (81.54%)
    Evaluation analysis
      The data sets are dramatically scaled down by keeping only linked instances that share related information.
      Noisy instances, which may affect the quality of the Mid-Ontology, are successfully removed, e.g. instances with only db-prop:hasPhotosCollection (a broken link) and an owl:sameAs link.
  • 25. Evaluation of Ontology Quality
    We evaluate the quality of the Mid-Ontology by validating whether the predicates in each class share related information.
    Accuracy of the Mid-Ontology:
      $$ACC(MO) = \frac{1}{n} \sum_{i=1}^{n} \frac{|\text{Correct Predicates in } C_i|}{|C_i|}$$
      n: the number of classes
      |C_i|: the number of predicates in class C_i
    Cardinality:
      $$Cardinality = \frac{|\text{Number of Predicates}|}{|\text{Number of Classes}|}$$
  • 26. Evaluation of Ontology Quality
    Improvement achieved by our approach
      MO_no_p_r: with exact matching only (without the pruning and refining processes)
      MO: with both the pruning and refining processes
      MO         Number of classes  Number of predicates  Cardinality  Accuracy
      MO_no_p_r  11                 300                   27.27        68.78%
      MO         29                 180                   6.21         90.10%
    Evaluation analysis
      Significantly improved the accuracy
      Decreased the cardinality (fewer predicates and more classes)
      Successfully removed unrelated predicates
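The two quality measures are simple enough to check numerically. The sketch below is an illustration of the formulas from the deck, with hypothetical function names; the cardinality inputs are the class and predicate counts reported in the table, while the per-class counts passed to `accuracy` are made-up values, since the slides do not list them.

```python
def accuracy(classes):
    """Mean, over classes, of correct predicates divided by all predicates in the class.

    classes: list of (correct_predicates, total_predicates) tuples, one per class.
    """
    return sum(correct / total for correct, total in classes) / len(classes)


def cardinality(num_predicates, num_classes):
    """Average number of predicates per class."""
    return num_predicates / num_classes


card_no_p_r = round(cardinality(300, 11), 2)  # MO_no_p_r row of the table
card_mo = round(cardinality(180, 29), 2)      # MO row of the table

# Hypothetical two-class ontology: 3 of 4 and 2 of 2 predicates are correct.
toy_accuracy = accuracy([(3, 4), (2, 2)])
```

The rounded cardinalities come out as 27.27 and 6.21, matching the table.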
  • 27. Evaluation with a SPARQL Example
    We evaluate the effectiveness of information retrieval with the Mid-Ontology constructed with our approach.
    Predicates grouped in mo-onto:population:
      <rdf:Description rdf:about="mid-onto:population">
        <mo-prop:hasMembers rdf:resource="http://dbpedia.org/property/population"/>
        <mo-prop:hasMembers rdf:resource="http://dbpedia.org/property/popLatest"/>
        <mo-prop:hasMembers rdf:resource="http://dbpedia.org/property/populationTotal"/>
        <mo-prop:hasMembers rdf:resource="http://dbpedia.org/ontology/populationTotal"/>
        <mo-prop:hasMembers rdf:resource="http://dbpedia.org/property/einwohner"/>
        <mo-prop:hasMembers rdf:resource="http://www.geonames.org/ontology#population"/>
      </rdf:Description>
  • 28. Evaluation with a SPARQL Example
    SPARQL: find places with a population of more than 10 million.
      SELECT DISTINCT ?places
      WHERE {
        mid-onto:population mo-prop:hasMembers ?prop .
        ?places ?prop ?population .
        FILTER (xsd:integer(?population) > 10000000) .
      }
    Results with each single property for population:
      Single property for population               Number of results
      http://dbpedia.org/property/population       177
      http://dbpedia.org/property/popLatest        1
      http://dbpedia.org/property/populationTotal  107
      http://dbpedia.org/ontology/populationTotal  129
      http://dbpedia.org/property/einwohner        1
      http://www.geonames.org/ontology#population  244
    Evaluation analysis
      The query finds 517 places with mid-onto:population.
      Each single predicate returns fewer results under the same condition.
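Why the Mid-Ontology class returns more distinct places than any single predicate can be illustrated without a SPARQL endpoint. The sketch below mimics the query over an in-memory list of triples; the member predicates are the six listed in the deck, but the `ex:` places and their population values are made-up toy data, and `places_over` is a hypothetical helper.

```python
MEMBERS = {
    "mid-onto:population": [
        "http://dbpedia.org/property/population",
        "http://dbpedia.org/property/popLatest",
        "http://dbpedia.org/property/populationTotal",
        "http://dbpedia.org/ontology/populationTotal",
        "http://dbpedia.org/property/einwohner",
        "http://www.geonames.org/ontology#population",
    ]
}

# Toy triples (subject, predicate, object); places and values are invented.
toy_triples = [
    ("ex:CityA", "http://dbpedia.org/property/population", "12000000"),
    ("ex:CityA", "http://www.geonames.org/ontology#population", "11900000"),
    ("ex:CityB", "http://www.geonames.org/ontology#population", "15000000"),
    ("ex:CityC", "http://dbpedia.org/ontology/populationTotal", "500000"),
]


def places_over(triples, predicates, minimum):
    """Distinct subjects with an integer value above `minimum` for any given predicate."""
    wanted = set(predicates)
    return {
        s for s, p, o in triples
        if p in wanted and o.isdigit() and int(o) > minimum
    }


via_class = places_over(toy_triples, MEMBERS["mid-onto:population"], 10_000_000)
via_single = places_over(
    toy_triples, ["http://dbpedia.org/property/population"], 10_000_000
)
```

In this toy data `via_class` contains two places while `via_single` contains one. A place matched through several member predicates is counted once, which is the role DISTINCT plays in the query on the slide.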
  • 29. Analysis of Mid-Ontology Approach
    We analyze whether we can successfully identify how the data sets are connected.
    Sample classes in the Mid-Ontology:
      DBpedia            DBpedia & Geonames  DBpedia & Geonames & NYTimes
      mo-onto:birthdate  mo-onto:population  mo-onto:name
      mo-onto:deathdate  mo-onto:prominence  mo-onto:long
      mo-onto:motto      mo-onto:postal
    Evaluation analysis
      Predicates in DBpedia are heterogeneous.
      Linked instances between DBpedia and Geonames are about places.
      Linked instances among DBpedia, Geonames, and NYTimes are about events, persons, or places.
  • 30. Possible Application
    Find missing owl:sameAs links, e.g. find a missing owl:sameAs link with mo-onto:population:
      http://dbpedia.org/resource/Cyclades db-prop:population "119549"
      http://dbpedia.org/resource/Cyclades db-prop:name "Cyclades"
      http://sws.geonames.org/259819/ geo-onto:population "119549"
      http://sws.geonames.org/259819/ geo-onto:alternateName "Cyclades"
  • 31. Possible Application
    Find missing owl:sameAs links, e.g. find a missing owl:sameAs link with mo-onto:population:
      http://dbpedia.org/resource/Cyclades db-prop:population "119549"
      http://dbpedia.org/resource/Cyclades db-prop:name "Cyclades"
      http://sws.geonames.org/259819/ geo-onto:population "119549"
      http://sws.geonames.org/259819/ geo-onto:alternateName "Cyclades"
    Add the owl:sameAs links:
      http://dbpedia.org/resource/Cyclades owl:sameAs http://sws.geonames.org/259819/
      http://sws.geonames.org/259819/ owl:sameAs http://dbpedia.org/resource/Cyclades
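A possible way to turn this idea into a candidate check is sketched below. The slides only show the Cyclades example, so the matching rule (two instances are candidates when they share at least one Mid-Ontology class and agree on a value in every shared class), the helper names, and the membership of mo-onto:name are assumptions made for illustration.

```python
# Assumed class membership, using the prefixed names from the slides.
CLASS_MEMBERS = {
    "mo-onto:population": {"db-prop:population", "geo-onto:population"},
    "mo-onto:name": {"db-prop:name", "geo-onto:alternateName"},
}


def class_values(po_pairs, members):
    """Map each Mid-Ontology class to the values an instance has under its member predicates."""
    values = {}
    for pred, obj in po_pairs:
        for cls, preds in members.items():
            if pred in preds:
                values.setdefault(cls, set()).add(obj)
    return values


def is_same_as_candidate(pairs_a, pairs_b, members):
    """True if the instances share a class and agree on a value in every shared class."""
    a = class_values(pairs_a, members)
    b = class_values(pairs_b, members)
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[cls] & b[cls] for cls in shared)


dbpedia_cyclades = [("db-prop:population", "119549"), ("db-prop:name", "Cyclades")]
geonames_259819 = [("geo-onto:population", "119549"), ("geo-onto:alternateName", "Cyclades")]

candidate = is_same_as_candidate(dbpedia_cyclades, geonames_259819, CLASS_MEMBERS)
```

For the Cyclades data `candidate` is True, so the pair would be proposed for an owl:sameAs link. A stricter or looser rule, for example tolerating small differences in population, would be a design decision outside what the slides describe.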
  • 32. Related Work
    Construct an intermediate-layer ontology from geospatial, zoology, and genetics data resources. [Parundekar, et al., 2010]
      Limited to a specific domain
    Construct an intermediate-level ontology by enriching an upper ontology (by adding new classes and properties). [Damova, et al., 2010]
      Still too large
    Analysis of basic properties of the SameAs network, the Pay-Level-Domain network, and the Class-Level Similarity network. [Ding, et al., 2010]
      Only frequent types are considered when analyzing how data are connected
  • 33. Conclusion and Future Work
    Conclusion
      Learning all the heterogeneous ontology schemas in the linked open data sets is not feasible.
      An automatic Mid-Ontology learning approach can solve the heterogeneity problem by integrating related predicates.
      The Mid-Ontology has high accuracy and is effective for searching across various data sets.
      A simple Mid-Ontology can be constructed without learning the entire ontology schema.
    Future work
      Billion Triple Challenge (BTC) data set
      Crawl links at two or three depths without a core data set
  • 34. Questions?
    Lihua Zhao, lihua@nii.ac.jp
    Ryutaro Ichise, ichise@nii.ac.jp