• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Type inference through the analysis of Wikipedia links
 

Type inference through the analysis of Wikipedia links

on

  • 511 views

 

Statistics

Views

Total Views
511
Views on SlideShare
511
Embed Views
0

Actions

Likes
0
Downloads
3
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Type inference through the analysis of Wikipedia links Type inference through the analysis of Wikipedia links Presentation Transcript

    • Type inference through the analysis of Wikipedia links Andrea Giovanni Nuzzolese nuzzoles@cs.unibo.it Aldo Gangemi aldo.gangemi@cnr.it Valentina Presutti valentina.presutti@cnr.it Paolo Ciancarini ciancarini@cs.unibo.it stlab.istc.cnr.it 16 April 2012 - Lyon, France - LDOW 2012
    • Outline• Motivations• Materials• Applied methods• Results• Conclusions stlab.istc.cnr.it 2
    • Motivations ✦ Only a subset of the DBpedia resources is typed with the Resources used in wikilinks relations: DBpedia ontology (DBPO) 15,944,381 ✦ The typing procedure is top- down. ✦ Is the DBPO complete with respect to the DBpedia domain?Resources having a ✦ How good and homogeneous DBPO type: 1,518,697 is the granularity of DBPO types? stlab.istc.cnr.it 3
    • MaterialsWikilink triples with typed subject/object: 16,745,830 DBpedia 3.6 Dataset # of triples Wikilinks wikilink triples 107,892,317 triples: 107,892,317 infobox mapping-based “data” triples 9,357,273 rdfs:label triples 7,972,225 rdf:type triples 6,173,940 DBpedia ontology: 272 classes infobox mapping-based “object” triples 4,251,239 stlab.istc.cnr.it 4
    • What we did• Wikilinks of a DBpedia resource convey knowledge that can be used for classifying it.• Classification methods ✦ Inductive learning: k-Nearest Neighbor algorithm ✦ Abductive classification based on EKPs [1] and homotypes used as background knowledge• The methods were performed on Resources having a Resources used in wikilinks DBPO type: relations: Sample of 1,518,697 15,944,381 untyped resources: 1,000 [1] A. G. Nuzzolese, A. Gangemi,V. Presutti, and P. Ciancarini. Encyclopedic Knowledge Patterns from Wikipedia Links. In L. Aroyo, N. Noy, and C. Welty, editors, Proceedings of the 10th International Semantic Web Conference (ISWC2011), pages 520-536. Springer, 2011. stlab.istc.cnr.it 5
    • Inductive classification• We designed two inductive classification experiments based on the k-NN algorithm ✦ on 272 features, i.e., all the classes in the DBPO ✦ on 27 features, i.e., the top-level classes in the DBPO hierarchy• For each experiment we built a labeled feature space model as training set by using a randomly sampled 20% of typed resources ✦ the algorithms were tested on the remaining 80% of typed resources stlab.istc.cnr.it 5 6
    • Building the training set for K-Nearest Neighbor algorithmdbpo:wikiPageWikiLink dbpedia:Apple_Inc. dbpedia:NeXT dbpedia:Steve_Jobs dbpedia:Forbes dbpedia:Cupertino,_California Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs ... stlab.istc.cnr.it 7
    • Building the training set for K-Nearest Neighbor algorithm dbpo:Organisationdbpo:wikiPageWikiLink dbpedia:Apple_Inc. dbpedia:NeXT rdf:type dbpedia:Steve_Jobs dbpo:Magazine dbpo:City dbpedia:Forbes dbpo:Persondbpedia:Cupertino,_California Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs dbpo:Person ... stlab.istc.cnr.it 7
    • Building the training set for K-Nearest Neighbor algorithm dbpo:Organisationdbpo:wikiPageWikiLink dbpo:Magazine dbpo:City rdf:type kp:linksTo dbpedia:Steve_Jobs Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs 0 0 1 0 1 1 dbpo:Person ... stlab.istc.cnr.it 7
    • Building the training set for K-Nearest Neighbor algorithm dbpo:Organisationdbpo:wikiPageWikiLink dbpo:Magazine dbpo:City rdf:type kp:linksTo dbpedia:Steve_Jobs Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs 0 0 1 0 1 1 dbpo:Person ... ... ... ... ... ... ... ... stlab.istc.cnr.it 7
    • Building the training set for K-Nearest Neighbor algorithm dbpo:Organisationdbpo:wikiPageWikiLink dbpo:Magazine dbpo:City rdf:type kp:linksTo dbpedia:Steve_Jobs ✦ Precision using all DBPO types as features: 31.65% ✦ Precision using the top-level of DBPO as features: 40.27% stlab.istc.cnr.it 7
    • Abductive classification with EKPs• EKPs ✦ A EKP of a certain entity type is a small vocabulary that captures the core types used for describing such entity type as it emerges from the Wikipedia crowds visit aemoo.org for an exploratory tool based on EKPs stlab.istc.cnr.it 8
    • How can we infer the type of“Galileo Galilei”? http://www.aemoo.org stlab.istc.cnr.it 9
    • How can we infer We know its path types the type of“Galileo Galilei”? http://www.aemoo.org stlab.istc.cnr.it 9
    • We compare the path types involvingWe have 231 EKPs “Galileo Galilei” as subject with EKPs in order to identify the most similar, which is the "Scientist" EKP. http://www.aemoo.org stlab.istc.cnr.it 9
    • The inferred type for the resource “Galileo Galiei” is the class “Scientist” http://www.aemoo.org stlab.istc.cnr.it 9
    • Distinctive weakness of some EKPs ✦ The distinctive weakness seems due to wide overlaps among some EKPs ✦ Systematic ambiguity of the 4 largest classes ✦ Precision and recall on all DBPO types both 44.4% ✦ Precision and recall on the top-level of DBPO hierarchy: 36.5% and 79.5% stlab.istc.cnr.it 10
    • Homotype-based abductive classification• Homotypes are wikilinks that have the same type on both the subject and the object of the triple dbpo:Philosopher dbpedia:Immanuel_Kant dbpedia:Plato dbpo:Philosopher rdf:type dbpo:wikiPageWikiLink rdf:type• We have observed how the homotype is usually the most frequent (or in the top 3) wikilink type• Given an untyped entity, we hypothesize that the most frequent type involved in its ingoing/ outgoing wikilinks detects its homotype, hence it indicates its type 11 stlab.istc.cnr.it
    • Homotype-based abductive classification s stlab.istc.cnr.it 12
    • Homotype-based abductive classification s stlab.istc.cnr.it 12
    • Results on classifying already typed resources stlab.istc.cnr.it 13
    • Results on untyped resources • Results on a sample of 1,000 untyped resources are much less satisfactory With EKPs With Homotypes stlab.istc.cnr.it 14
    • Why? [1]• Typed entities: 2:3 typed wikilinks ratio• Untyped entities: 1:3 typed wikilinks ratio• Link structure for untyped entities is not rich enough stlab.istc.cnr.it 15
    • Why? [2]• DBPO does not provide a complete set of classes for correctly typing DBpedia resourcesdbpedia:List_of_FIFA_World_Cup_finals Collection dbpedia:Computer_Science ScientificDiscipline dbpedia:Counterattack Plan dbpedia:Eros(concept) Concept dbpedia:Gentlemen’s_agreement Agreement stlab.istc.cnr.it 16
    • Conclusions• We have investigated different approaches for typing DBpedia resources based on the data set of wikilinks• Results are acceptable in the test set, but extensive untypedness in output links, and poor DBPO coverage severely compromise automatic typing for untyped resources• We have analyzed possible causes deriving from some bias in DBpedia 17 stlab.istc.cnr.it
    • Future work• Yago could be helpful but ✦ there is a lack of mapping between YAGO and DBPO ✦ it has larger coverage and only an overlap with DBPO ✦ the granularity of its categories is finer, and not easily reusable, because the top level is very large stlab.istc.cnr.it 18
    • Thank you Andrea Nuzzolese - STLab, ISTC-CNR &Dipartimento di Scienze dell’Informazione University of Bologna Italy stlab.istc.cnr.it 19
    • Path popularity nRes(dbpo:MusicalArtist) dbpo:MusicalArtist Anthony_Kiedis Michael_Jacksonrdf:type Jackie_Jackson rdf:type rdf:type rdf:type Chad_Smith dbpo:MusicalArtist dbpo:MusicalArtistNumber of resources having type dbpo:MusicalArtist rdf:type John_Lennon Paul_McCartney rdf:type dbpo:MusicalArtist PREFIX dbpo : http://dbpedia.org/ontology/ stlab.istc.cnr.it
    • Path popularity nSubjectRes(Pi,j) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Si dbpo:wikiPageWikiLink Oj dbpo:MusicalArtist dbpo:Band Foo Fighters Beatles Number of distinct resourcesthat participate in a path as subjects John_Lennon Paul_McCartney PREFIX dbpo : http://dbpedia.org/ontology/ stlab.istc.cnr.it
    • Path popularity pathPopularity(Pi,j,Si) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Madonna Prince Charlie_Parker Keith_JarrettFoo Fighters Beatles nSubjectRes(Pi,j)/nRes(Si) John_Lennon Paul_McCartney stlab.istc.cnr.it