Linkset quality

  1. Assessing Linkset Quality for Complementing Third-Party Datasets. Riccardo Albertoni, Asunción Gómez Pérez (ralbertoni@delicias.dia.fi.upm.es). Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid. A paper on this work will be presented at the 3rd International Workshop on Linked Web Data Management (LWDM 2013), held in conjunction with the 16th International Conference on Extending Database Technology (EDBT 2013), March 22, 2013, Genoa, Italy.
  2. Motivations. Linked Data's promise: evolving the Web into a global data space, which should help overcome the data-silo effect. So many bubbles there, that's so cool! But can I exploit that third-party data for my own analyses?
  3. Motivation. What does this arrow between DBLP and SWDF mean? Hard to say! Can I use DBLP to extend SWDF in order to perform any kind of target task? To know, you must engage in a painful analysis of the two datasets and their interlinks. There are no grounded concepts about what makes a linkset suitable for a target application (there are some concepts, but for datasets, not linksets).
  4. What is linkset quality for?
     Linked Data publishers can check whether a linkset they have provided
     • is good enough or needs to be improved;
     • is still good enough when one of its two target datasets is updated.
     Linked Data consumers can
     • figure out whether they can rely on a linkset;
     • get a first guess of the next move they can take to improve the linkset;
     • rank alternative linksets.
  5. Complementing a Dataset X via a Linkset L. [Diagram: DBLP (Y) with publications Pub1 to Pub4, Journal 1, and foaf:made links from researchers a and b; dataset X with affiliations Affili3 to Affili5, whose foaf:member researchers are a', b' and c'; linkset L containing a owl:sameAs a' and b owl:sameAs b'; the complemented dataset X_L; label: Yolanda Gil.] Complementation might introduce some missing data: the fewer missing data (such as researcher c) are introduced, the more complete the linkset.
  6. Complementing a Dataset X via a Linkset L (continued). [Same diagram as slide 5: DBLP (Y), dataset X, linkset L with a owl:sameAs a' and b owl:sameAs b', and the complemented dataset X_L.] Complementation might introduce some missing data: the fewer missing data (such as researcher c) are introduced, the more complete the linkset.
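The complementation idea of slides 5 and 6 can be sketched as a toy model. This is a hedged illustration, not the authors' prototype: the dictionary layout, the affiliation each researcher belongs to, the function name `complement`, and the choice to store links as Y entity to X entity are all assumptions made for the example.

```python
# Toy data for slides 5-6 (hypothetical layout and assignments).
# X: affiliation -> researchers declared as foaf:member.
X_MEMBERS = {
    "Affili3": ["c'"],
    "Affili4": ["b'"],
    "Affili5": ["a'"],
}
# Y (DBLP-like): researcher -> publications they foaf:made.
Y_PUBLICATIONS = {
    "a": ["Pub1", "Pub2"],
    "b": ["Pub3", "Pub4"],
}
# Linkset L: owl:sameAs pairs, Y entity -> X entity.
LINKSET = {"a": "a'", "b": "b'"}


def complement(x_members, y_publications, linkset):
    """Build a view of X_L and list the researchers left with missing data.

    Each researcher of X gets the publications of its owl:sameAs twin in Y;
    researchers with no link stay empty and are reported as missing.
    """
    x_to_y = {x: y for y, x in linkset.items()}  # invert: X entity -> Y entity
    complemented, missing = {}, []
    for researchers in x_members.values():
        for researcher in researchers:
            twin = x_to_y.get(researcher)
            if twin is None:
                missing.append(researcher)  # no link: nothing to import from Y
            complemented[researcher] = list(y_publications.get(twin, []))
    return complemented, missing


x_l, missing = complement(X_MEMBERS, Y_PUBLICATIONS, LINKSET)
print(missing)  # ["c'"]
```

Under this model, a more complete linkset leaves fewer researchers in `missing`.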
  7. What is a linkset? (http://vocab.deri.ie/void) Every linkset is a special kind of dataset. Every linkset has two target datasets, the subject dataset and the object dataset, and should have only one linking property (e.g., owl:sameAs for owl:sameAs linksets).
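The structural rules of slide 7 can be captured in a small in-memory model. This is a sketch under assumptions: the class `Linkset` and its field names are hypothetical, chosen to mirror the VoID properties void:subjectsTarget, void:objectsTarget and void:linkPredicate; it does not read or write real VoID descriptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


@dataclass
class Linkset:
    """In-memory stand-in for a void:Linkset (class and field names are hypothetical).

    subjects_target, objects_target and link_predicate play the roles of the
    VoID properties void:subjectsTarget, void:objectsTarget and void:linkPredicate.
    """
    subjects_target: str
    objects_target: str
    link_predicate: str
    links: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, subject: str, obj: str, predicate: str | None = None) -> None:
        """Record a link, enforcing the single-linking-property rule."""
        predicate = predicate or self.link_predicate
        if predicate != self.link_predicate:
            raise ValueError(f"this linkset only accepts {self.link_predicate}")
        self.links.append((subject, predicate, obj))


l1 = Linkset(subjects_target="DBLP", objects_target="SWDF", link_predicate=OWL_SAME_AS)
l1.add("dblp:a", "swdf:a")
```

Rejecting a second predicate is one way to honor "only one linking property"; an alternative would be to split mixed links into several linksets.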
  8. Defining quality measures. We adopt the terminology of C. Bizer and R. Cyganiak, Quality-driven information filtering using the WIQA policy framework, J. Web Sem., 7(1):1-10, 2009. For each element a quality measure must define, this is what the linkset quality measure provides:
     • Quality indicator: an aspect of a data item or data set that may give the user an indication of the suitability of the data for some intended use. Provided: Entity Types; Number of Entities for Types; …
     • Scoring function: a function evaluating quality indicators to measure the suitability of the data for some intended use. Provided: Linkset Type Coverage; Linkset Type Completeness; Linkset Entity Coverage for Type.
     • Aggregate metric: a user-specified assessment metric built upon scoring functions; these aggregations produce new assessment values through the average, sum, max, min or threshold functions applied to the set of scoring functions. Provided: interpretation tables, i.e., interpretations of the scoring functions that help in figuring out the next action to take.
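The aggregate-metric definition of slide 8 (average, sum, max, min or threshold over scoring functions) can be sketched as follows. The score values and the function name `aggregate` are hypothetical, and the threshold semantics (every score must reach the threshold) is an assumption, since the slide does not spell it out.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical scoring-function values for one linkset (illustrative only).
SCORES = {
    "linkset_type_coverage": 0.75,
    "linkset_type_completeness": 0.5,
    "linkset_entity_coverage_for_type": 0.25,
}

# The aggregation operators named on the slide: average, sum, max, min.
AGGREGATORS = {"average": mean, "sum": sum, "max": max, "min": min}


def aggregate(scores: dict[str, float], how: str, threshold: float | None = None):
    """Combine scoring-function values into one assessment value.

    The threshold variant is an assumption: it passes only when every
    scoring function reaches the threshold.
    """
    values = list(scores.values())
    if how == "threshold":
        if threshold is None:
            raise ValueError("threshold aggregation needs a threshold")
        return all(v >= threshold for v in values)
    return AGGREGATORS[how](values)


print(aggregate(SCORES, "average"))  # 0.5
```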
  9. Example: indicators applied to DBLP and SWDF. [Diagram: Type(DBLP) and Type(SWDF), with the classes foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings, foaf:Document, ro:FullPaper, ro:PosterPaper, ro:ShortPaper.] #E4Type(foaf:Agent, DBLP) = 1000000; #E4Type(swrc:Proceedings, DBLP) = 1108400; #E4Type(foaf:Document, DBLP) = 1984087.
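  10. Quality indicators: Types. $Type: \mathcal{D} \rightarrow \mathcal{P}(\mathcal{T})$, where $\mathcal{D}$ is the set of datasets and linksets and $\mathcal{P}(\mathcal{T})$ is the power set of the possible user-defined types $\mathcal{T}$. It returns the types of entities exposed in a dataset or a linkset, e.g., owl:Class, owl:Restriction, skos:Concept, skos:ConceptScheme.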
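The Type indicator can be sketched over plain (subject, predicate, object) tuples. The DBLP classes below come from slide 9; the SWDF triples, the prefixed-name shortcut and the function names are hypothetical. For a linkset, we assume it carries no rdf:type triples itself, so the types of its subjects and objects are looked up in the target datasets.

```python
RDF_TYPE = "rdf:type"  # prefixed names keep the toy triples short

# Toy triples (hypothetical): (subject, predicate, object).
DBLP = [
    ("dblp:a", RDF_TYPE, "foaf:Agent"),
    ("dblp:pub1", RDF_TYPE, "foaf:Document"),
    ("dblp:conf1", RDF_TYPE, "swrc:Proceedings"),
    ("dblp:pub1", "foaf:maker", "dblp:a"),
]
SWDF = [
    ("swdf:a", RDF_TYPE, "foaf:Person"),
    ("swdf:org1", RDF_TYPE, "foaf:Organization"),
    ("swdf:org1", "foaf:member", "swdf:a"),
]
L1 = [("dblp:a", "owl:sameAs", "swdf:a")]


def dataset_types(triples):
    """Type(D): the classes that entities of D are declared to belong to."""
    return {o for _, p, o in triples if p == RDF_TYPE}


def linkset_types(links, *datasets):
    """Type(L): the classes of the entities mentioned by the links of L.

    Assumption: the linkset holds no rdf:type triples itself, so the types
    of its subjects and objects are looked up in the target datasets.
    """
    linked = {s for s, _, _ in links} | {o for _, _, o in links}
    return {o for d in datasets for s, p, o in d if p == RDF_TYPE and s in linked}


print(sorted(linkset_types(L1, DBLP, SWDF)))  # ['foaf:Agent', 'foaf:Person']
```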
  11. Quality indicators: number of entities for a type. $\#E4Type: \mathcal{D} \times \mathcal{T} \rightarrow \mathbb{Z}^{+}$ takes a dataset or linkset and one of the possible user-defined types, and returns a (positive) integer: the number of entities of that type exposed in the dataset or linkset.
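The #E4Type indicator can be sketched the same way, counting distinct entities so that duplicate triples do not inflate the result. The toy triples and function names are hypothetical; for linksets we again assume that types come from the target datasets.

```python
RDF_TYPE = "rdf:type"

# Toy triples (hypothetical).
DBLP = [
    ("dblp:a", RDF_TYPE, "foaf:Agent"),
    ("dblp:b", RDF_TYPE, "foaf:Agent"),
    ("dblp:pub1", RDF_TYPE, "foaf:Document"),
    ("dblp:a", RDF_TYPE, "foaf:Agent"),  # duplicate triple: counted once
]
SWDF = [("swdf:a", RDF_TYPE, "foaf:Person")]
L1 = [("dblp:a", "owl:sameAs", "swdf:a")]


def e4type_dataset(triples, type_):
    """#E4Type(type_, D): number of distinct entities of type type_ in D."""
    return len({s for s, p, o in triples if p == RDF_TYPE and o == type_})


def e4type_linkset(links, type_, *datasets):
    """#E4Type(type_, L): distinct linked entities of type type_.

    Types come from the target datasets (same assumption as for Type(L)).
    """
    linked = {s for s, _, _ in links} | {o for _, _, o in links}
    return len({s for d in datasets for s, p, o in d
                if p == RDF_TYPE and o == type_ and s in linked})


print(e4type_dataset(DBLP, "foaf:Agent"))            # 2
print(e4type_linkset(L1, "foaf:Agent", DBLP, SWDF))  # 1
```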
  12. Example: indicators applied to DBLP and SWDF. [Diagram: Type(DBLP) and Type(SWDF), with the classes foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings, swr:Proceedings, foaf:Document, ro:FullPaper, ro:PosterPaper, ro:ShortPaper, connected by linksets L1 to L4; Type(L1) and Type(L2) are highlighted.] #E4Type(foaf:Agent, L1) = 100; #E4Type(foaf:Person, L1) = 100.
  13. Ideas behind the scoring function: Type Coverage. [Diagram: Type(DBLP) and Type(SWDF), including foaf:Organization, foaf:Agent, foaf:Person and swrc:Proceedings, with linkset L1.] Complementing DBLP with L1, are we adding some new entities to DBLP? DBLP_L1 provides organizations for the researchers that have been interlinked.
  14. Ideas behind the scoring function: Type Coverage (continued). [Diagram: as in slide 13, plus swr:Proceedings and linkset L2.] Complementing SWDF with L1, we do not add any new type of entities: SWDF_L1 has exactly the same kinds of entities as SWDF. We can try to extend SWDF in terms of extensional coverage, getting more researcher and proceedings entities in SWDF_L1 ∪ DBLP, but we cannot enlarge the types of entities.
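The intuition of slides 13 and 14 (does complementing X with L add new types of entities?) can be approximated by comparing Type(X_L) with Type(X). This is not the Linkset Type Coverage formula of slide 15, which is not reproduced here: the one-hop construction of X_L, the link direction (X entity to Y entity), the toy triples and all function names are assumptions.

```python
RDF_TYPE = "rdf:type"

# Toy triples (hypothetical). Links point from X entities to Y entities.
DBLP = [
    ("dblp:a", RDF_TYPE, "foaf:Agent"),
    ("dblp:pub1", RDF_TYPE, "foaf:Document"),
    ("dblp:pub1", "foaf:maker", "dblp:a"),
]
SWDF = [
    ("swdf:a", RDF_TYPE, "foaf:Person"),
    ("swdf:org1", RDF_TYPE, "foaf:Organization"),
    ("swdf:org1", "foaf:member", "swdf:a"),
]
L1 = [("dblp:a", "owl:sameAs", "swdf:a")]


def types_of(triples):
    """Classes used as rdf:type objects in a list of triples."""
    return {o for _, p, o in triples if p == RDF_TYPE}


def complemented(x, y, links):
    """Hypothetical X_L: X plus the Y triples describing the linked Y
    entities and their direct (one-hop) neighbours in Y."""
    linked_y = {o for _, _, o in links}
    neighbours = {o for s, p, o in y if s in linked_y and p != RDF_TYPE}
    neighbours |= {s for s, _, o in y if o in linked_y}
    keep = linked_y | neighbours
    return x + [t for t in y if t[0] in keep]


def new_types(x, y, links):
    """Types that complementing X with L (towards Y) adds to Type(X)."""
    return types_of(complemented(x, y, links)) - types_of(x)


print(sorted(new_types(DBLP, SWDF, L1)))  # ['foaf:Organization', 'foaf:Person']
```

In this toy run, the linked researcher brings its organization along, echoing slide 13; if Y only held types already in X, `new_types` would return an empty set.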
  15. Formalization of Linkset Type Coverage. [Formula image, defined over the linkset and the target dataset.]
  16. Scoring function: ideas behind Type Completeness (1). Focusing on the types of entities, is the owl:sameAs linkset L1 complete? [Diagram: Type(DBLP) and Type(SWDF), including foaf:Organization, foaf:Agent, foaf:Person and swrc:Proceedings, with linkset L1.] It does not make sense to run a procedure (e.g., SILK) trying to discover interlinks between the instances of swrc:Proceedings and foaf:Organization!
  17. Scoring function: ideas behind Type Completeness (2). Focusing on the types of entities, is the owl:sameAs linkset L1 complete? [Diagram: as in slide 16, plus swr:Proceedings and an alignment among the classes of DBLP and SWDF.] We should try to run a procedure (e.g., SILK) to discover interlinks between the instances of swrc:Proceedings and swr:Proceedings!
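The "next move" suggested by slides 16 and 17 can be sketched as: given an alignment among classes, propose for link discovery only the aligned class pairs that the linkset does not yet cover. The alignment below is hard-coded and hypothetical (pairing foaf:Agent with foaf:Person only to match the example), as is the rule that a pair counts as covered when both classes occur in Type(L). The sketch only selects candidate pairs; it does not call SILK.

```python
# Hypothetical alignment among classes of DBLP and SWDF: pairs of classes
# treated as matching (in practice this would come from an ontology
# alignment; here it is hard-coded).
ALIGNMENT = {
    ("foaf:Agent", "foaf:Person"),
    ("swrc:Proceedings", "swr:Proceedings"),
}


def uncovered_pairs(alignment, linkset_types):
    """Aligned class pairs that the linkset does not cover yet.

    A pair counts as covered when both of its classes occur in Type(L).
    Only uncovered aligned pairs are worth passing to a link-discovery tool
    such as SILK; unaligned pairs (e.g. swrc:Proceedings vs
    foaf:Organization) are never proposed.
    """
    return sorted(pair for pair in alignment if not set(pair) <= linkset_types)


print(uncovered_pairs(ALIGNMENT, {"foaf:Agent", "foaf:Person"}))
# [('swrc:Proceedings', 'swr:Proceedings')]
```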
  18. Formalization of Linkset Type Completeness. Given a linkset L whose subject is target dataset X and whose object is target dataset Y, let $Eq(X, Y)$ be the set of types of X that have an equivalent in Y according to a relation of equivalence among classes. The numerator counts the types in the subject that are not considered in the linkset:
     $LTCom(L, X, Y) = 1 - \frac{|Eq(X, Y) \setminus Type(L)|}{|Eq(X, Y)|}$
     A linkset is complete with respect to types when LTCom = 1; LTCom < 1 otherwise.
  19. Ideas behind the scoring function: Type Completeness. [Diagram: Type(DBLP) and Type(SWDF), including foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings and swr:Proceedings, with linksets L1 and L2.]
     $LTCom(L_1, DBLP, SWDF) = 1 - \frac{|\{swrc{:}Proceedings\}|}{|\{swrc{:}Proceedings, foaf{:}Person\}|} = \frac{1}{2}$
     $LTCom(L_2, DBLP, SWDF) = 1 - \frac{|\emptyset|}{|\{swr{:}Proceedings, foaf{:}Person\}|} = 1$
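The worked example of slide 19 can be checked with a direct transcription of the definition on slide 18, using exact fractions. The function name `ltcom` is hypothetical; returning 1 when no type of X has an equivalent in Y is an assumption the slides do not cover; and for L2 we take Type(L2) to contain both swr:Proceedings and foaf:Person, which is what the empty numerator on the slide implies.

```python
from fractions import Fraction


def ltcom(eq_types, linkset_types):
    """LTCom(L, X, Y) = 1 - |Eq(X, Y) minus Type(L)| / |Eq(X, Y)|.

    eq_types: types of the subject dataset X that have an equivalent in Y.
    linkset_types: Type(L).
    """
    eq = set(eq_types)
    if not eq:
        # Not covered by the slides: with nothing to link we return 1.
        return Fraction(1)
    not_considered = eq - set(linkset_types)
    return 1 - Fraction(len(not_considered), len(eq))


# Slide 19, L1: swrc:Proceedings is not considered by L1.
print(ltcom({"swrc:Proceedings", "foaf:Person"}, {"foaf:Agent", "foaf:Person"}))  # 1/2
# Slide 19, L2: assumed to cover both types.
print(ltcom({"swr:Proceedings", "foaf:Person"}, {"swr:Proceedings", "foaf:Person"}))  # 1
```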
  20. Linkset Type Completeness
  21. Linkset Entity Coverage for Type. [Diagram: foaf:Organization, foaf:Agent, foaf:Person, swrc:Proceedings and swr:Proceedings in DBLP and SWDF, with linksets L1 and L2.] L1 and L2 are indistinguishable from the point of view of types. Which is the most interesting: L1, L2, or L1 ∪ L2? How good is a linkset providing 100 owl:sameAs links? The entity coverage for a type T is the number of entities of type T in the linkset L divided by the number of entities of type T in the dataset X: $\frac{\#E4Type(T, L)}{\#E4Type(T, X)}$.
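Plugging the figures already given on slides 9 and 12 into the ratio of slide 21 answers "how good is a linkset providing 100 owl:sameAs links?" for foaf:Agent. The function name below is hypothetical; rejecting a zero denominator is our own guard.

```python
from fractions import Fraction


def entity_coverage_for_type(e4type_linkset: int, e4type_dataset: int) -> Fraction:
    """Linkset entity coverage for a type T: #E4Type(T, L) / #E4Type(T, X)."""
    if e4type_dataset <= 0:
        raise ValueError("X exposes no entities of type T")
    return Fraction(e4type_linkset, e4type_dataset)


# Figures from slides 9 and 12: #E4Type(foaf:Agent, L1) = 100,
# #E4Type(foaf:Agent, DBLP) = 1000000.
coverage = entity_coverage_for_type(100, 1_000_000)
print(coverage, float(coverage))  # 1/10000 0.0001
```

So L1 covers one DBLP agent in ten thousand, even though it scores like L2 when only types are considered.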
  22. Aggregate metrics reasoning only on types
  23. Aggregate metrics reasoning only on types
  24. Aggregate metrics reasoning only on types
  25. Aggregate metrics reasoning only on types
  26. Aggregate metrics reasoning on types and entity coverage
  27. Related work (extended discussion in the paper).
     • WIQA, an information quality assessment framework (C. Bizer and R. Cyganiak. Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009). It contributes a policy language, an engine for interpreting such policies, and explanations of whether a piece of information satisfies a policy. Quality criteria are parameters of the system; it does not aim at proposing new quality measures.
     • PlanetData (P. N. Mendes, C. Bizer, J. H. Young, Z. Miklos, J.-P. Calbimonte, and A. Moraru. Conceptual model and best practices for high-quality metadata publishing. Technical report, PlanetData, Deliverable 2.1, 2012, http://planet-data-wiki.sti2.at/web/File:D2.1.pdf). It reviews quality dimensions, but gives no indicators or criteria for completeness.
     • LOD2 (P. N. Mendes and C. Bizer. Survey report: state of the art in mapping, quality assessment and data fusion. Technical report, LOD2 - Creating Knowledge out of Interlinked Data, Deliverable 4.3.1, 2011, http://static.lod2.eu/Deliverables). It distinguishes intensional completeness (the schema contains all the necessary attributes), extensional completeness (all required instances are present) and LDS completeness (relevant properties have values).
     • SIEVE (P. N. Mendes, H. Mühleisen, and C. Bizer. Sieve: linked data quality assessment and fusion. In D. Srivastava and I. Ari, editors, LWDM, EDBT/ICDT Workshops, pp. 116-123. ACM, 2012). It deploys some of the ideas developed in WIQA and LDS completeness.
     None of these explicitly addresses quality for linksets.
  28. Related work (extended discussion in the paper).
     • Link-QA (C. Guéret, P. T. Groth, C. Stadler, and J. Lehmann. Assessing linked data mappings using network measures. In E. Simperl, P. Cimiano, A. Polleres, O. Corcho, and V. Presutti, editors, ESWC, volume 7295 of Lecture Notes in Computer Science, pp. 87-102. Springer, 2012). A different approach: it applies classic network measures, such as degree, centrality and clustering coefficient, plus open sameAs chains and description richness, to determine whether a set of links improves the overall dataset quality.
     • It assesses the quality of interlinking, not of linksets: Link-QA works on links regardless of whether they belong to the same linkset.
     • Link-QA addresses correctness and does not deal with completeness.
     • Link-QA ranks sets of links: it can say that one linkset is better than another, but it does not suggest the next move a consumer should take to improve the linkset.
  29. Conclusions.
     Contribution: a quality measure for linksets
     • formalization of indicators, scoring functions and aggregate metrics;
     • the only measure explicitly addressing linkset completeness for dataset complementation;
     • a first proof-of-concept prototype (Java, Jena).
     On-going and future work
     • validation on the LOD: how many "incomplete" linksets can we detect in the LOD?
     • extension to linksets other than owl:sameAs (skos:exactMatch?);
     • dimensions other than completeness.
