Linkset quality
Transcript

  • 1. Assessing Linkset Quality for Complementing Third Party Datasets
    Riccardo Albertoni, Asunción Gómez Pérez (ralbertoni@delicias.dia.fi.upm.es)
    Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid
    A paper pertaining to this work is going to be presented at the 3rd International Workshop on Linked Web Data Management (LWDM 2013), in conjunction with the 16th International Conference on Extending Database Technology (EDBT 2013), March 22, 2013, Genoa, Italy.
  • 2. Motivations
    Linked Data's promise: evolving the Web into a global data space. It should help to overcome the data-silo effect... So many bubbles there, that's so cool!! But... can I exploit that third-party data for my OWN analyses?
  • 3. Motivation
    [Figure: DBLP and SWDF connected by an arrow]
    What does this arrow mean? Hard to say! Can I use DBLP to extend SWDF in order to perform any kind of target task? You must engage in a painful analysis of the two datasets and their interlinks in order to know. There are NO ground concepts about what makes a linkset suitable for a target application (there are some concepts, but for datasets, not linksets).
  • 4. What is Linkset Quality for?
    Linked Data publishers can check whether a linkset they have provided:
      • is good enough or needs to be improved;
      • is still good enough when one of the two target datasets is updated.
    Linked Data consumers can:
      • figure out whether they can rely on a linkset;
      • get a first guess of the next move they can take to improve the linkset;
      • rank possible linkset alternatives.
  • 5. Complementing a Dataset X via a Linkset L
    [Figure: DBLP publications (Pub1 to Pub4, Journal 1, foaf:made) and a dataset Y of affiliations (Affili3 to Affili5, foaf:member) with researchers such as Yolanda Gil; Linkset L: a owl:sameAs a', b owl:sameAs b'; researcher c is not linked; panels X and X^L]
    Complementation might introduce some "missing data". The less missing data (like researcher c) is introduced, the more complete the linkset is.
  • 7. What is a Linkset? (http://vocab.deri.ie/void)
    Every linkset is a special kind of dataset! Every linkset has two target datasets, the subject and object datasets, and should have only one linking property. Here we consider owl:sameAs linksets.
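A minimal Python sketch of the linkset notion above, modelling a linkset as two target datasets plus a single linking property. The field names mirror the VoID terms void:subjectsTarget, void:objectsTarget and void:linkPredicate; the class `Linkset` and the helper `linkset_from_triples` are hypothetical names for illustration, not part of any library or of the authors' prototype.

```python
from dataclasses import dataclass

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


@dataclass(frozen=True)
class Linkset:
    """A linkset: links from a subject dataset to an object dataset via one property."""
    subjects_target: str   # cf. void:subjectsTarget
    objects_target: str    # cf. void:objectsTarget
    link_predicate: str    # cf. void:linkPredicate
    links: frozenset       # (subject IRI, object IRI) pairs


def linkset_from_triples(triples, subjects_target, objects_target):
    """Build a Linkset, rejecting triple sets that use more than one linking property."""
    predicates = {p for _, p, _ in triples}
    if len(predicates) != 1:
        raise ValueError(f"expected exactly one linking property, got {sorted(predicates)}")
    (predicate,) = predicates
    pairs = frozenset((s, o) for s, _, o in triples)
    return Linkset(subjects_target, objects_target, predicate, pairs)


# Hypothetical owl:sameAs links from DBLP entities to SWDF entities
links = {("dblp:a", OWL_SAME_AS, "swdf:a2"), ("dblp:b", OWL_SAME_AS, "swdf:b2")}
l1 = linkset_from_triples(links, "DBLP", "SWDF")
```

Rejecting mixed predicates is one way to enforce the "only one linking property" rule; a real pipeline might instead split such input into one linkset per property.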
  • 8. Defining quality measures
    We adopt the terminology of C. Bizer and R. Cyganiak, Quality-driven information filtering using the WIQA policy framework, J. Web Sem., 7(1):1-10, 2009. For each element needed to define a quality measure, this linkset quality measure provides:
      • Quality indicator: an aspect of a data item or dataset that may give the user an indication of the suitability of the data for some intended use. Provided here: Entity Types; Number of Entities for Type; ...
      • Scoring function: a function evaluating quality indicators to measure the suitability of the data for some intended use. Provided here: Linkset Type Coverage; Linkset Type Completeness; Linkset Entity Coverage for Type.
      • Aggregate metric: a user-specified assessment metric built upon scoring functions; these aggregations produce new assessment values through the average, sum, max, min or threshold functions applied to the set of scoring functions. Provided here: interpretation tables, i.e., interpretations of the scoring functions that help in figuring out the next action to take.
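The aggregation step can be sketched as follows, applying the average, sum, max and min named above to a set of scoring-function values. The threshold semantics (every score must reach the threshold) and the example scores are assumptions for illustration only; `aggregate` and `passes_threshold` are hypothetical helpers.

```python
from statistics import mean

# The four aggregations named on the slide, applied to a set of scoring-function values.
AGGREGATIONS = {"average": mean, "sum": sum, "max": max, "min": min}


def aggregate(scores, how):
    """Combine scoring-function values (a dict name -> score) with one aggregation."""
    return AGGREGATIONS[how](list(scores.values()))


def passes_threshold(scores, threshold):
    """Hypothetical threshold function: True when every score reaches the threshold."""
    return all(value >= threshold for value in scores.values())


# Hypothetical scores for one linkset
scores = {"type_completeness": 0.5, "entity_coverage_foaf_Agent": 0.0001}
```

For these scores the average is about 0.25005 and `passes_threshold(scores, 0.5)` is False, because the entity coverage score falls below the threshold.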
  • 9. Example: indicators applied on DBLP & SWDF
    Types shown across Type(DBLP) and Type(SWDF): foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings, foaf:Document, ro:FullPaper, ro:PosterPaper, ro:ShortPaper.
    #E4Type(foaf:Agent, DBLP) = 1000000; #E4Type(swrc:Proceedings, DBLP) = 1108400; #E4Type(foaf:Document, DBLP) = 1984087.
  • 10. Quality indicators: Types
    Types: Dataset/Linkset → power set of the possible user-defined types.
    Returns the types of the entities exposed in a dataset or a linkset, e.g., owl:Class, owl:Restriction, skos:Concept, skos:ConceptScheme.
  • 11. Quality indicators: # of Entities for a Type
    #E4Type: (Dataset/Linkset, one of the possible user-defined types) → (positive) integers.
    Returns the number of entities exposed in a dataset/linkset for a given type.
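The two indicators can be sketched over a dataset given as (subject, predicate, object) triples, with "rdf:type" as the typing predicate. For a linkset, this sketch assumes its types are the subject-side types of the linked entities; that reading, the helper names and the toy data are assumptions, not the authors' definitions.

```python
RDF_TYPE = "rdf:type"


def types(triples):
    """Indicator Types: the types of the entities exposed in a dataset."""
    return {o for _, p, o in triples if p == RDF_TYPE}


def e4type(triples, t):
    """Indicator #E4Type: number of distinct entities of type t in a dataset."""
    return len({s for s, p, o in triples if p == RDF_TYPE and o == t})


def linkset_types(links, subject_triples):
    """Types exposed by a linkset, assumed to be the subject-side types of its linked entities."""
    linked = {s for s, _ in links}
    return {o for s, p, o in subject_triples if p == RDF_TYPE and s in linked}


def linkset_e4type(links, subject_triples, t):
    """#E4Type for a linkset, under the same subject-side assumption."""
    linked = {s for s, _ in links}
    return len({s for s, p, o in subject_triples
                if p == RDF_TYPE and o == t and s in linked})


# Toy DBLP fragment (hypothetical entities)
dblp = {
    ("a", RDF_TYPE, "foaf:Person"), ("a", RDF_TYPE, "foaf:Agent"),
    ("b", RDF_TYPE, "foaf:Person"), ("b", RDF_TYPE, "foaf:Agent"),
    ("pub1", RDF_TYPE, "swrc:Proceedings"),
}
l1 = {("a", "a2")}  # one hypothetical owl:sameAs link, as (subject, object)
```

As in the L1 example on the next slide, a linkset of researchers yields equal counts for foaf:Agent and foaf:Person, because each linked researcher carries both types.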
  • 12. Example: indicators applied on DBLP & SWDF
    [Figure: Type(DBLP) and Type(SWDF) (foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings, foaf:Document, ro:FullPaper, ro:PosterPaper, ro:ShortPaper) with linksets L1 to L4 between DBLP and SWDF]
    Type(L1) and Type(L2) denote the types exposed by each linkset; #E4Type(foaf:Agent, L1) = 100 and #E4Type(foaf:Person, L1) = 100.
  • 13. Ideas behind the scoring function: Type Coverage
    [Figure: Type(DBLP) and Type(SWDF) (foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings) with linkset L1 between DBLP and SWDF]
    Complementing DBLP with L1, are we adding some new entities to DBLP? DBLP^L1 provides organizations for the researchers that have been interlinked.
  • 14. Ideas behind the scoring function: Type Coverage
    [Figure: Type(DBLP) and Type(SWDF) (foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings) with linksets L1 and L2 between DBLP and SWDF]
    Complementing SWDF with L1, we don't add any new type of entities: SWDF^L1 has exactly the same kinds of entities as SWDF. We can try to extend SWDF in terms of extensional coverage, getting more researcher/proceedings entities in SWDF^L1 ∪ DBLP, but we cannot enlarge the types of entities.
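The intuition behind Type Coverage, whether complementing X via L adds new types of entities, can be sketched by comparing Types(X^L) with Types(X). The model of complementation used here is an assumption: X^L is X plus Y's triples that touch a linked entity (renamed to X's IRIs), plus the rdf:type triples of the Y nodes reached that way. `complement`, `new_types` and the toy data are hypothetical; this is not the formal Linkset Type Coverage definition.

```python
RDF_TYPE = "rdf:type"


def types_of(triples):
    return {o for _, p, o in triples if p == RDF_TYPE}


def complement(x, y, links):
    """Hypothetical model of X^L built from X, Y and (x_entity, y_entity) links."""
    to_x = {y_ent: x_ent for x_ent, y_ent in links}
    result = set(x)
    reached = set()
    for s, p, o in y:
        if s in to_x or o in to_x:
            # Copy Y's triple about a linked entity, renamed to X's IRI.
            result.add((to_x.get(s, s), p, to_x.get(o, o)))
            if p != RDF_TYPE:
                reached.update(n for n in (s, o) if n not in to_x)
    # Also bring along the types of the Y nodes reached through those triples.
    result |= {(s, p, o) for s, p, o in y if p == RDF_TYPE and s in reached}
    return result


def new_types(x, y, links):
    """Types present in X^L but not in X."""
    return types_of(complement(x, y, links)) - types_of(x)


dblp = {("a", RDF_TYPE, "foaf:Person"), ("a", "foaf:made", "pub1"),
        ("pub1", RDF_TYPE, "foaf:Document")}
swdf = {("a2", RDF_TYPE, "foaf:Person"), ("org5", "foaf:member", "a2"),
        ("org5", RDF_TYPE, "foaf:Organization")}
links = {("a", "a2")}
```

Here `new_types(dblp, swdf, links)` is `{"foaf:Organization"}`, matching the observation that DBLP^L1 provides organizations for the interlinked researchers.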
  • 15. Formalization of Linkset Type Coverage
    [Formula: Linkset Type Coverage, defined on a linkset and a target dataset]
  • 16. Scoring function: ideas behind Type Completeness (1)
    Focusing on the types of entities, is the owl:sameAs linkset L1 complete?
    [Figure: Type(DBLP) and Type(SWDF) (foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings) with linkset L1 between DBLP and SWDF]
    It does not make sense to run a procedure (e.g., SILK) trying to discover interlinks between the instances of swrc:Proceedings and foaf:Organization!
  • 17. Scoring function: ideas behind Type Completeness (2)
    Focusing on the types of entities, is the owl:sameAs linkset L1 complete?
    [Figure: Type(DBLP) and Type(SWDF) (foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings) with linkset L1 and an alignment among classes]
    We should try to run a procedure (e.g., SILK) to discover interlinks between the instances of swrc:Proceedings and swr:Proceedings!
  • 18. Formalization of Linkset Type Completeness
    [Formula: LTCom(L, X, Y), where X and Y are target datasets 1 and 2 and L is the linkset. Its terms are the types in the subject dataset that are not considered in the linkset, and the set of types of X that have an equivalent in Y according to an equivalence relation among classes.]
    A linkset is complete with respect to types if LTCom = 1; LTCom < 1 otherwise.
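Reading the slide's annotations together with the worked example on slide 19, the definition can plausibly be reconstructed as below. The symbol $\mathit{Eq}(X,Y)$ for "the types of X that have an equivalent in Y" and $\equiv$ for the equivalence relation among classes are introduced here for illustration and may differ from the paper's notation.

```latex
\mathit{Eq}(X,Y) = \{\, t \in \mathit{Types}(X) \mid \exists\, t' \in \mathit{Types}(Y):\ t \equiv t' \,\}

\mathit{LTCom}(L, X, Y) = 1 - \frac{\lvert \mathit{Eq}(X,Y) \setminus \mathit{Types}(L) \rvert}{\lvert \mathit{Eq}(X,Y) \rvert}
```

With $\mathit{Eq}(\mathit{DBLP},\mathit{SWDF}) = \{\texttt{swrc:Proceedings}, \texttt{foaf:Person}\}$ and a linkset L1 that exposes foaf:Person but not swrc:Proceedings, this gives $1 - 1/2 = 1/2$, as on slide 19.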
  • 19. Ideas behind the scoring function: Type Completeness
    [Figure: Type(DBLP) and Type(SWDF) (foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings) with linksets L1 and L2 between DBLP and SWDF]
    LTCom(L1, DBLP, SWDF) = 1 − |{swrc:Proceedings}| / |{swrc:Proceedings, foaf:Person}| = 1/2
    LTCom(L2, DBLP, SWDF) = 1 − |{}| / |{swr:Proceedings, foaf:Person}| = 1
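The worked example can be reproduced with a short sketch of Linkset Type Completeness, using exact fractions. The type sets for SWDF and L2 are assumptions chosen to match the example (SWDF exposing foaf:Person and swr:Proceedings but not foaf:Agent; L2 exposing both aligned types), and treating an empty set of equivalent types as complete is an assumption the slides do not cover.

```python
from fractions import Fraction


def equivalent_types(types_x, types_y, alignment):
    """Types of X with an equivalent in Y: the same IRI, or an aligned pair (t, u)."""
    return {t for t in types_x
            if t in types_y or any((t, u) in alignment for u in types_y)}


def ltcom(types_l, types_x, types_y, alignment):
    """Linkset Type Completeness, as reconstructed from the slide's worked example."""
    eq = equivalent_types(types_x, types_y, alignment)
    if not eq:
        return Fraction(1)  # assumption: nothing to cover is treated as complete
    return 1 - Fraction(len(eq - types_l), len(eq))


types_dblp = {"foaf:Agent", "foaf:Person", "swrc:Proceedings", "foaf:Document"}
types_swdf = {"foaf:Organization", "foaf:Person", "swr:Proceedings"}  # assumed subset
alignment = {("swrc:Proceedings", "swr:Proceedings")}  # class alignment

types_l1 = {"foaf:Agent", "foaf:Person"}        # from #E4Type on slide 12
types_l2 = {"foaf:Person", "swrc:Proceedings"}  # assumed: covers both aligned types
```

`ltcom(types_l1, types_dblp, types_swdf, alignment)` returns `Fraction(1, 2)` and the L2 call returns 1, matching the two results above.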
  • 20. Linkset Type Completeness
  • 21. Linkset Entity Coverage for Type
    [Figure: types foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings; linksets L1 and L2 between DBLP and SWDF]
    L1 and L2 are indistinguishable from the point of view of types. Which is the more interesting: L1, L2, or L1 ∪ L2? How good is a linkset providing 100 owl:sameAs links?
    [Formula: number of entities of type T in the linkset L, over the number of entities of type T in the dataset X]
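Reading the two labels above as numerator and denominator, Linkset Entity Coverage for Type is #E4Type(T, L) / #E4Type(T, X). The sketch below assumes that ratio reading and plugs in the foaf:Agent counts given earlier (100 in L1, 1000000 in DBLP). The union helper illustrates why L1 ∪ L2 must count distinct entities; the entity sets in its example are hypothetical.

```python
from fractions import Fraction


def entity_coverage_for_type(n_type_in_linkset, n_type_in_dataset):
    """#E4Type(T, L) / #E4Type(T, X), reading the slide's two labels as a ratio."""
    if n_type_in_dataset <= 0:
        raise ValueError("type T has no entities in the dataset X")
    return Fraction(n_type_in_linkset, n_type_in_dataset)


def union_coverage(linked_a, linked_b, n_type_in_dataset):
    """Coverage of L1 ∪ L2 for one type: overlapping linked entities count once."""
    return Fraction(len(linked_a | linked_b), n_type_in_dataset)


# foaf:Agent: 100 entities in L1 (slide 12) out of 1000000 in DBLP (slide 9)
cov_l1 = entity_coverage_for_type(100, 1_000_000)
```

Under this reading, 100 owl:sameAs links cover only 1/10000 of DBLP's foaf:Agent entities, which is why a type-only view cannot tell L1 and L2 apart.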
  • 22. Aggregate metrics reasoning only on types
  • 26. Aggregate metrics reasoning on types and entity coverage
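An interpretation table maps scoring-function results to a suggested next move. The sketch below is hypothetical: its two rules are taken from the ideas on slides 14, 16 and 17 (look for interlinks only between aligned classes the linkset misses; with no new types, only extensional coverage can grow), not from the paper's actual tables, and `next_moves` is an illustrative name.

```python
def next_moves(ltcom_score, uncovered_types, new_types):
    """Hypothetical interpretation rules suggesting a consumer's next move."""
    moves = []
    if ltcom_score < 1:
        # Slides 16-17: search for interlinks only between aligned classes the linkset misses.
        for t in sorted(uncovered_types):
            moves.append(f"run link discovery (e.g. SILK) on {t} and its equivalent class")
    if not new_types:
        # Slide 14: no new types, so complementation can only add entities.
        moves.append("complementation adds no new types; extend entity coverage instead")
    return moves


moves = next_moves(0.5, {"swrc:Proceedings"}, set())
```

For L1 (LTCom = 1/2, swrc:Proceedings uncovered) this suggests running link discovery on swrc:Proceedings, plus a note that no new types are gained.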
  • 27. Related work (extended discussion in the paper)
      • WIQA is an information quality assessment framework. It contributes a policy language, an engine for interpreting such policies, and an explanation of whether a piece of information satisfies a policy. Quality criteria are parameters of the system; it does not aim at proposing new quality measures. [C. Bizer and R. Cyganiak. Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009.]
      • PlanetData reviews quality dimensions, but gives no indicators or criteria for completeness. [P. N. Mendes, C. Bizer, J. H. Young, Z. Miklos, J.-P. Calbimonte, and A. Moraru. Conceptual model and best practices for high-quality metadata publishing. Technical report, PlanetData, Deliverable 2.1, 2012, http://planet-data-wiki.sti2.at/web/File:D2.1.pdf.]
      • LOD2 distinguishes intensional completeness (the schema contains all the necessary attributes), extensional completeness (all required instances are present) and LDS completeness (relevant properties have values). [P. N. Mendes and C. Bizer. Survey report: state of the art in mapping, quality assessment and data fusion. Technical report, LOD2 - Creating Knowledge out of Interlinked Data, Deliverable 4.3.1, 2011, http://static.lod2.eu/Deliverables]
      • SIEVE deploys some of the ideas developed in WIQA and LDS completeness. [P. N. Mendes, H. Mühleisen, and C. Bizer. Sieve: linked data quality assessment and fusion. In D. Srivastava and I. Ari, editors, LWDM EDBT/ICDT Workshops, pp. 116-123. ACM, 2012.]
    These works do not explicitly address quality for linksets.
  • 28. Related work (extended discussion in the paper)
      • LINK-QA takes a different approach: it applies classic network measures such as degree, centrality and clustering coefficient, plus open sameAs chains and description richness, to determine whether a set of links improves the overall dataset quality. [C. Guéret, P. T. Groth, C. Stadler, and J. Lehmann. Assessing linked data mappings using network measures. In E. Simperl, P. Cimiano, A. Polleres, O. Corcho, and V. Presutti, editors, ESWC, volume 7295 of Lecture Notes in Computer Science, pp. 87-102. Springer, 2012.]
    It assesses the quality of interlinking, not of linksets: LINK-QA works on links regardless of whether they belong to the same linkset; it addresses correctness and does not deal with completeness; it ranks sets of links, so it can say that one linkset is better than another, but it does not suggest the next move a consumer should take to improve the linkset.
  • 29. Conclusions
    Contribution: a quality measure for linksets:
      • formalization of indicators, scoring functions and aggregate metrics;
      • the only measure explicitly addressing linkset completeness for dataset complementation;
      • a first proof-of-concept prototype (Java, Jena).
    Ongoing and future work:
      • validation on the LOD: how many "incomplete" linksets can we detect in the LOD?
      • extension to linksets other than owl:sameAs (skos:exactMatch?);
      • dimensions other than completeness.
