Graph-based Ontology Analysis in the Linked Open Data
 

    Presentation Transcript

    • Graph-based Ontology Analysis in the Linked Open Data
      Lihua Zhao, Ryutaro Ichise
      September 5, 2012, I-Semantics 2012, Graz, Austria
    • Outline
      • Introduction
      • Related Work
      • Our Approach
        • Graph Pattern Extraction
        • <Predicate, Object> Collection
        • Related Classes and Predicates Grouping
        • Integration for All Graph Patterns
        • Manual Revision
      • Experiments
        • Experimental Data
        • Graph Patterns of Linked Instances
        • Class-level Analysis
        • Predicate-level Analysis
        • Comparison with Previous Work
      • Conclusion and Future Work
    • Introduction
      • Linked Open Data (LOD)
        • 295 data sets, 31 billion RDF triples (as of Sep. 2011).
        • Interlinked instances (owl:sameAs).
      • [Figure: LOD cloud diagram, as of September 2011]
    • Challenging Problems
      • It is infeasible to understand the ontology schemas of all linked data sets.
      • Ontology heterogeneity problem
        • Heterogeneous ontology classes
          • DBpedia: http://dbpedia.org/ontology/Country
          • Geonames: http://www.geonames.org/ontology#A.PCLI
          • LinkedMDB: http://data.linkedmdb.org/resource/movie/country
        • Heterogeneous ontology predicates
          • http://dbpedia.org/property/populationTotal
          • http://dbpedia.org/property/population
      • Time-consuming and infeasible to inspect large ontologies
      • Misuse of classes and predicates
      • DBpedia: 320 classes and thousands of predicates.
    • Solution for the Problems
      • Automatically or semi-automatically integrate different ontologies by analyzing interlinked instances.
      • Semi-automatic ontology integration
        • Reduce the ontology heterogeneity.
        • Identify important ontology classes and predicates that link instances.
        • Produce a simple integrated ontology that is easy to understand.
        • Simplify queries over various data sets.
    • Related Work
      • Find useful attributes from frequent graph patterns. [Le, et al., 2010]
        • Only for geographic data.
      • Analysis of basic predicates of the SameAs network, Pay-Level-Domain network, and Class-Level Similarity network. [Ding, et al., 2010]
        • Only frequent types are considered when analyzing how data are connected.
      • A debugging method for mapping lightweight ontologies. [Meilicke, et al., 2008]
        • Limited to the expressive lightweight ontologies.
      • Construct an intermediate-layer ontology from geospatial, zoology, and genetics data resources. [Parundekar, et al., 2010]
        • Only for specific domains, and considers only the class level.
      • Construct an integrated mid-ontology from DBpedia, Geonames, and NYTimes. [Zhao, et al., 2011]
        • Needs a hub data set, and considers only the predicate level.
    • Our Approach
    • Step 1: Graph Pattern Extraction
    • Graph Pattern Extraction
      • Extract graph patterns from interlinked instances to discover related ontology classes and predicates.
      • SameAs Graph: $SG = (V, E, I)$, where $V$ is a set of labels of data sets, $E \subseteq V \times V$, and $I$ is a set of URIs of the interlinked instances.
      • Example: $SG_{Austria} = (V, E, I)$, where D, G, N, and M label DBpedia, Geonames, NYTimes, and LinkedMDB.
        • $V = \{D, G, N, M\}$
        • $E = \{(D,G), (D,N), (G,N), (G,M)\}$
        • $I = \{$ db:Austria, geo:2782113, nyt:66221058161318373601, mdb-country:AT $\}$
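A SameAs Graph can be assembled by following owl:sameAs links until no new instance is reached. The Python sketch below is one way to do that and is not taken from the slides: the `LABELS` prefix table, the function names, and the choice to treat each connected component as one graph are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed mapping from URI prefix to data set label.
LABELS = {"db": "D", "geo": "G", "nyt": "N", "mdb-country": "M"}


def label_of(uri):
    """Return the data set label for a prefixed URI such as 'db:Austria'."""
    return LABELS[uri.split(":", 1)[0]]


def same_as_graphs(links):
    """Group (subject, object) owl:sameAs links into (V, E, I) triples,
    one per connected component of linked instances."""
    adjacency = defaultdict(set)
    for s, o in links:
        adjacency[s].add(o)
        adjacency[o].add(s)

    seen, graphs = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk collects every instance reachable from `start`.
        stack, component = [start], set()
        while stack:
            uri = stack.pop()
            if uri in component:
                continue
            component.add(uri)
            stack.extend(adjacency[uri] - component)
        seen |= component
        V = frozenset(label_of(u) for u in component)
        E = frozenset((label_of(s), label_of(o)) for s, o in links if s in component)
        graphs.append((V, E, frozenset(component)))
    return graphs


# The Austria example from the slide, written as directed sameAs links.
AUSTRIA_LINKS = [
    ("db:Austria", "geo:2782113"),
    ("db:Austria", "nyt:66221058161318373601"),
    ("geo:2782113", "nyt:66221058161318373601"),
    ("geo:2782113", "mdb-country:AT"),
]
graphs = same_as_graphs(AUSTRIA_LINKS)
```

SameAs Graphs that share the same `(V, E)` pair would then fall under the same graph pattern.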
    • Step 2: <Predicate, Object> Collection
    • <Predicate, Object> Collection
      • An instance is described by a collection of <subject, predicate, object> triples (instance URI → subject, property → predicate, class → object).
      • The <predicate, object> (PO) pairs are collected as the content of a SameAs Graph.
      • Classify PO pairs into five types
        • Class: rdf:type and skos:inScheme.
        • Date: XMLSchema:date, gYear, gMonthDay, etc.
        • Number: XMLSchema:integer, int, float, double, etc.
        • URI: starts with “http://” and XMLSchema:anyURI.
        • String: XMLSchema:string and others.
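The five-way classification can be read as a short rule cascade. The sketch below is a minimal illustration under stated assumptions: the function name, the `datatype` argument, and the exact datatype sets are hypothetical, and the sets cover only the datatypes named on the slide (the "etc." entries are not filled in).

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Only the predicates and datatypes named on the slide are listed here.
CLASS_PREDICATES = {"rdf:type", "skos:inScheme"}
DATE_TYPES = {XSD + t for t in ("date", "gYear", "gMonthDay")}
NUMBER_TYPES = {XSD + t for t in ("integer", "int", "float", "double")}


def classify_po(predicate, obj, datatype=None):
    """Return one of 'Class', 'Date', 'Number', 'URI', 'String' for a PO pair.

    `obj` is the object's lexical form; `datatype` is its XML Schema
    datatype URI, or None when the literal is untyped.
    """
    if predicate in CLASS_PREDICATES:
        return "Class"
    if datatype in DATE_TYPES:
        return "Date"
    if datatype in NUMBER_TYPES:
        return "Number"
    if datatype == XSD + "anyURI" or obj.startswith("http://"):
        return "URI"
    # xsd:string and everything else fall through to String.
    return "String"
```

For instance, `classify_po("rdf:type", "db-onto:Country")` yields `"Class"`, and an untyped object that starts with `http://` is treated as a URI.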
    • An Example of Collected PO Pairs
      Table: PO pairs and types for $SG_{Austria}$

      Predicate                      Object                       Type
      rdf:type                       owl:Thing                    Class
      rdf:type                       db-onto:Place                Class
      rdf:type                       db-onto:PopulatedPlace       Class
      rdf:type                       db-onto:Country              Class
      rdfs:label                     “Austria”@en                 String
      db-onto:wikiPageExternalLink   http://www.austria.mu/       URI
      db-prop:populationEstimate     8356707                      Number
      ......                         ......                       ......
      geo-onto:name                  Austria                      String
      geo-onto:alternateName         “Austria”@en                 String
      geo-onto:alternateName         “Republic of Austria”@en     String
      geo-onto:featureClass          geo-onto:A                   Class
      geo-onto:featureCode           geo-onto:A.PCLI              Class
      geo-onto:population            8205000                      Number
      ......                         ......                       ......
      rdf:type                       mdb:country                  Class
      mdb:country_name               Austria                      String
      ......                         ......                       ......
      skos:inScheme                  nyt:nytd_geo                 Class
      skos:prefLabel                 “Austria”@en                 String
      nyt-prop:first_use             2004-10-04                   Date
    • Step 3: Related Classes and Predicates Grouping
    • Related Classes Grouping
      • Group related classes from each SameAs Graph by tracking the subsumption relations rdfs:subClassOf and skos:inScheme.
      • <C1 rdfs:subClassOf C2> or <C1 skos:inScheme C2> means that the concept of class C1 is more specific than the concept of class C2.
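The slide does not spell out how the subsumption relations are tracked. One possible reading, sketched below, is to keep only the most specific classes of a SameAs Graph by discarding every class that is a broader concept of another class in the same graph. The function name is hypothetical, and the pair `(geo-onto:A.PCLI, geo-onto:A)` is an assumed relation used only to make the example complete.

```python
def most_specific(classes, narrower_than):
    """Return the classes that are not a (transitive) broader concept
    of any other class in `classes`.

    `narrower_than` holds (C1, C2) pairs meaning C1 is more specific than C2.
    """
    parents = {}
    for child, parent in narrower_than:
        parents.setdefault(child, set()).add(parent)

    def ancestors(cls):
        # Walk upward through all broader concepts of `cls`.
        seen, stack = set(), list(parents.get(cls, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents.get(p, ()))
        return seen

    broader = set()
    for cls in classes:
        broader |= ancestors(cls)
    return {cls for cls in classes if cls not in broader}


# Class-typed objects collected for the Austria SameAs Graph.
austria_classes = {
    "owl:Thing", "db-onto:Place", "db-onto:PopulatedPlace", "db-onto:Country",
    "geo-onto:A", "geo-onto:A.PCLI", "mdb:country", "nyt:nytd_geo",
}
subsumptions = [
    ("db-onto:Country", "db-onto:PopulatedPlace"),
    ("db-onto:PopulatedPlace", "db-onto:Place"),
    ("db-onto:Place", "owl:Thing"),
    ("geo-onto:A.PCLI", "geo-onto:A"),  # assumed for illustration
]
country_group = most_specific(austria_classes, subsumptions)
```

Under these assumptions the remaining classes are one per data set, which is the shape of the class groups reported in the experiments.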
    • Related Predicates Grouping
      • Perform pairwise comparison on <predicate, object> (PO) pairs to find related predicates (properties).
      • Discover related predicates using different methods for the types Date, URI, Number, and String.
        • Date, URI: exact matching.
        • Number, String: exact matching + similarity matching.
      • Exact matching on PO pairs creates the initial sets of PO pairs.
        • If $O_{PO_i} = O_{PO_j}$ or $P_{PO_i} = P_{PO_j}$, then $S_k \leftarrow PO_i, PO_j$.
        • $O_{PO}$: the object of PO.
        • $P_{PO}$: the predicate of PO.
        • $S$: an initial set of PO pairs.
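The exact-matching rule can be sketched with a union-find structure: two PO pairs land in the same initial set when they share a predicate or an object. Applying the rule transitively is an assumption of this sketch, since the slide states only the pairwise condition, and the function name is hypothetical.

```python
def initial_sets(po_pairs):
    """Group (predicate, object) pairs that share a predicate or an object."""
    parent = list(range(len(po_pairs)))

    def find(i):
        # Path-halving find.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # ("P", predicate) or ("O", object) -> index of first pair
    for i, (pred, obj) in enumerate(po_pairs):
        for key in (("P", pred), ("O", obj)):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = {}
    for i, po in enumerate(po_pairs):
        groups.setdefault(find(i), []).append(po)
    return list(groups.values())


pairs = [
    ("geo-onto:name", "Austria"),
    ("mdb:country_name", "Austria"),
    ("geo-onto:population", "8205000"),
    ("db-prop:populationEstimate", "8356707"),
]
sets = initial_sets(pairs)
```

Here the two name predicates share the object `Austria` and end up together, while the two population values differ and stay in separate sets until similarity matching is applied.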
    • Related Predicates Grouping
      • Similarity matching on PO pairs of type Number and String.
      • Similarity between $PO_i$ and $PO_j$:
        $Sim(PO_i, PO_j) = \dfrac{ObjSim(PO_i, PO_j) + PreSim(PO_i, PO_j)}{2}$
      • Merge similar initial sets $S_i$ and $S_j$ if $Sim(PO_i, PO_j) \geq \theta$, where $PO_i \in S_i$ and $PO_j \in S_j$.
    • Related Predicates Grouping
      • Similarity of objects between two PO pairs:
        $ObjSim(PO_i, PO_j) = \begin{cases} 1 - \dfrac{|O_{PO_i} - O_{PO_j}|}{O_{PO_i} + O_{PO_j}} & \text{if } O_{PO} \text{ is Number} \\ StrSim(O_{PO_i}, O_{PO_j}) & \text{if } O_{PO} \text{ is String} \end{cases}$
        • $O_{PO}$: the object of PO.
        • $StrSim(O_{PO_i}, O_{PO_j})$: the average of three string-based similarity values (Jaro-Winkler, Levenshtein distance, and n-gram).
      • Similarity of predicates between $PO_i$ and $PO_j$:
        $PreSim(PO_i, PO_j) = WNSim(T_{PO_i}, T_{PO_j})$
        • $T_{PO}$: the pre-processed terms of the predicates in PO.
        • $WNSim(T_{PO_i}, T_{PO_j})$: the average of the nine applied WordNet-based similarity values.
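The similarity formulas can be sketched as follows. This is an illustration, not the authors' code. `str_sim` uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the average of Jaro-Winkler, Levenshtein, and n-gram similarity. `pre_sim` is a caller-supplied placeholder for the WordNet-based measure. The zero-sum guard in `obj_sim_number` is an added assumption.

```python
from difflib import SequenceMatcher


def obj_sim_number(a, b):
    """1 - |a - b| / (a + b), intended for non-negative numbers."""
    if a + b == 0:
        # Assumed guard: the formula is undefined when the sum is zero.
        return 1.0 if a == b else 0.0
    return 1 - abs(a - b) / (a + b)


def str_sim(a, b):
    """Stand-in string similarity in [0, 1] (not the slide's three-measure average)."""
    return SequenceMatcher(None, a, b).ratio()


def sim(po_i, po_j, pre_sim):
    """Average of object similarity and predicate similarity for two PO pairs.

    `pre_sim(p_i, p_j)` is a placeholder for WNSim and must return a value in [0, 1].
    """
    (p_i, o_i), (p_j, o_j) = po_i, po_j
    if isinstance(o_i, (int, float)) and isinstance(o_j, (int, float)):
        obj = obj_sim_number(o_i, o_j)
    else:
        obj = str_sim(str(o_i), str(o_j))
    return (obj + pre_sim(p_i, p_j)) / 2


# Object similarity of the two population values from the Austria example.
population_sim = obj_sim_number(8356707, 8205000)
```

The two population figures differ by less than 2%, so their object similarity is about 0.99. Two initial sets would then be merged whenever `sim` for some pair drawn from them reaches the threshold $\theta$.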
    • Step 4: Integration for All Graph Patterns
    • Integration for All Graph Patterns
      • Groups of related classes and predicates are independent for each graph pattern. Hence, we integrate them across all the graph patterns to construct an integrated ontology.
      • Select terms for the integrated ontology.
        • ex-onto:ClassTerm: select one concept from a set of classes.
        • ex-prop:propTerm: select one concept from a set of predicates.
      • Construct relations.
        • ex-prop:hasMemberClasses: link sets of classes with ex-onto:ClassTerm.
        • ex-prop:hasMemberDataTypes: link sets of predicates with ex-prop:propTerm.
      • Construct an integrated ontology from:
        • Sets of related classes and predicates.
        • Selected terms: ClassTerm and propTerm.
        • Constructed relations.
    • Step 5: Manual Revision
    • Manual Revision
      • Minor revision process on the automatically constructed ontology.
      • Modify incorrect terms
        • Not all the terms of classes and predicates are properly selected.
      • Add domain information
        • About 40% of the predicate sets lack rdfs:domain information.
      • Modify incorrectly grouped classes and predicates
        • We cannot guarantee 100% accuracy.
    • Experiments
      • Analyze the characteristics of linked instances with the integrated ontology constructed with our approach.
        • Experimental Data
        • Graph Patterns of Linked Instances
        • Class-level Analysis
        • Predicate-level Analysis
    • Experimental Data
      • DBpedia: cross-domain, 3.5 million things, 8.9 million URIs.
      • Geonames: geographical domain, 7 million URIs.
      • NYTimes: media domain, 10,467 subject news.
      • LinkedMDB: media domain, 0.5 million entities.
    • Graph Patterns of Linked Instances
      • 13 graph patterns
        • Frequent graph patterns: GP1, GP2, GP3
        • N,G,D: GP4, GP5, GP7, GP8
        • N,M,D: GP6
        • M,G,D: GP9
        • M,D,N,G: GP10, GP11, GP12, GP13
    • Class-level Analysis
      • Successfully integrated related classes from the extracted graph patterns.
      • Characteristics of graph patterns

        Class Type                            Graph Pattern
        Actor                                 GP2, GP6
        Person (Athlete, Politician, etc.)    GP3
        Organization/Agent                    GP1, GP3, GP8
        Film                                  GP2
        City/Settlement                       GP1, GP4, GP5, GP7, GP8
        Country                               GP9, GP10, GP11, GP12, GP13
        Place (Mountain, River, etc.)         GP1, GP3, GP7

      • Integrated 97 classes into 48 groups
        • Example: ex-onto:Country
          • db-onto:Country
          • geo-onto:A.PCLI
          • mdb:country
          • nyt:nytd_geo
    • Class-level Analysis
      • Discover missing class information
        • Example: db:Shingo_Katori
          • db:Shingo_Katori rdf:type dbpedia-owl:MusicalArtist.
          • mdb-actor:27092 owl:sameAs db:Shingo_Katori.
          • Therefore, db:Shingo_Katori rdf:type db-onto:Actor.
      • Main classes of each data set.
        • NYTimes: person, organization, and place.
        • LinkedMDB: movie, actor, and country.
        • Geonames: A (country, administrative region), P (city, settlement), T (mountain), S (building, school), and H (lake, river).
        • DBpedia: person (artist, politician, athlete), organization (company, educational institute, sports team), work (film), and place (populated place, natural place, architectural structure).
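The missing-class example can be sketched as a lookup through owl:sameAs links and the integrated class groups. Everything named below is an assumption for illustration: the function name and signature, the type `mdb:actor` for the LinkedMDB instance, and the class group `{db-onto:Actor, mdb:actor}`.

```python
def suggest_types(instance, same_as, types, class_groups, target_prefix):
    """Suggest classes with `target_prefix` that `instance` lacks, based on
    the classes of its owl:sameAs partners and the integrated class groups."""
    own = types.get(instance, set())
    suggestions = set()
    for partner in same_as.get(instance, ()):
        for cls in types.get(partner, ()):
            for group in class_groups:
                if cls in group:
                    suggestions |= {
                        c for c in group
                        if c.startswith(target_prefix) and c not in own
                    }
    return suggestions


same_as = {"db:Shingo_Katori": {"mdb-actor:27092"}}
types = {
    "db:Shingo_Katori": {"dbpedia-owl:MusicalArtist"},
    "mdb-actor:27092": {"mdb:actor"},  # assumed type of the linked instance
}
class_groups = [{"db-onto:Actor", "mdb:actor"}]  # assumed integrated group

suggested = suggest_types("db:Shingo_Katori", same_as, types, class_groups, "db-onto:")
```

With these inputs the only suggestion is `db-onto:Actor`, matching the conclusion drawn on the slide.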
    • Predicate-level Analysis
      • Integrated 367 predicates into 38 groups
      • Example: ex-prop:birthDate

        Predicate               Number of Instances
        db-onto:birthDate       287,327
        db-prop:datebirth       1,675
        db-prop:dateofbirth     87,364
        db-prop:dateOfBirth     163,876
        db-prop:born            34,832
        db-prop:birthdate       70,630
        db-prop:birthDate       101,121

      • Recommend standard predicates
        • <db-onto:birthDate, rdfs:domain, db-onto:Person>
        • “db-onto:birthDate” has the highest frequency of usage.
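Recommending a standard predicate by usage frequency amounts to taking the maximum over the instance counts of a predicate group. A minimal sketch using the counts from the table above (the function name is hypothetical):

```python
# Instance counts for the ex-prop:birthDate group, as listed on the slide.
BIRTH_DATE_USAGE = {
    "db-onto:birthDate": 287_327,
    "db-prop:datebirth": 1_675,
    "db-prop:dateofbirth": 87_364,
    "db-prop:dateOfBirth": 163_876,
    "db-prop:born": 34_832,
    "db-prop:birthdate": 70_630,
    "db-prop:birthDate": 101_121,
}


def recommend_standard(usage_counts):
    """Return the predicate used by the largest number of instances."""
    return max(usage_counts, key=usage_counts.get)


recommended = recommend_standard(BIRTH_DATE_USAGE)
```

For this group the result is `db-onto:birthDate`, the predicate the slide recommends.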
    • Comparison with Previous Work
      • Compare our ontology integration approach with the mid-ontology approach [Zhao, et al., JIST2011].

        Mid-Ontology approach                     Our approach
        A hub data set for data collection.       No hub data set.
        String-based similarity measures          Different similarity measures
        for all types of objects.                 for different types of objects.
        105 predicates in 22 groups.              367 predicates into 38 groups.
        No classes.                               97 classes into 48 groups.
    • Conclusion and Future Work
      • Conclusion
        • Integrate heterogeneous ontologies from various data sets.
          • Identify the characteristics of graph patterns using the integrated ontology classes.
          • Recommend standard predicates using the integrated ontology predicates.
        • Reduce the heterogeneity of ontologies.
        • Construct an integrated ontology without learning the entire ontology schema.
      • Future Work
        • Use more data sets in the LOD cloud.
        • Apply a MapReduce method to address the scalability and ontology heterogeneity problems.
    • Questions?
      Lihua Zhao, lihua@nii.ac.jp
      Ryutaro Ichise, ichise@nii.ac.jp