Linkset quality (LWDM 2013)

Transcript of "Linkset quality (LWDM 2013)"

  1. Assessing Linkset Quality for Complementing Third Party Datasets
     Riccardo Albertoni (1, 2), Asunción Gómez Pérez (1)
     (1) Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid
     (2) CNR-IMATI, Via De Marini, 6, Torre di Francia, 16149 Genova, Italy
     3rd International Workshop on Linked Web Data Management (LWDM 2013), in conjunction with the 16th International Conference on Extending Database Technology (EDBT 2013), March 22, 2013, Genoa, Italy
  2. Motivations
     Linked Data’s promise: evolving the Web into a global data space. It should help to overcome the data silo effect.
     “So many bubbles there, that’s so cool! But ... can I exploit that third-party data for my own analyses?”
  3. Motivation
     What does this arrow mean? [Figure labels: SWDF, DBLP.]
     There is no grounded concept of what makes a linkset suitable for a target application. There are well-founded works on quality for datasets, but linksets are not yet directly addressed.
  4. What is Linkset Quality for?
     Linked Data publishers can check whether a linkset they have provided
     • is good enough or needs to be improved;
     • is still good enough after one of the two target datasets is updated.
     Linked Data consumers can
     • figure out whether or not they can rely on a linkset;
     • get a first guess at the next move they can take to improve the linkset;
     • rank possible linkset alternatives.
  5. Complementing a Dataset X via a Linkset L
     [Figure: datasets X and Y connected by linkset L, which contains a owl:sameAs a’ and b owl:sameAs b’. Other labels: foaf:made, Pub1 to Pub4, foaf:member, affiliations, Yolanda Gil, DBLP, Journal 1, c’.]
     Complementation might introduce some “missing data”. The less “missing data” (like researcher c) is introduced, the more complete the linkset is.
  6. What is a Linkset? (http://vocab.deri.ie/void)
     • Every linkset is a special kind of dataset.
     • Every linkset has two target datasets: the subject and object datasets.
     • Every linkset should have only one linking property.
     • owl:sameAs linksets
  7. Defining quality measures
     Terminology adopted from C. Bizer and R. Cyganiak. Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009.
     What to define when providing a quality measure, and what this linkset quality work provides for it:
     • Quality Indicator: an aspect of a data item or data set that may give an indication to the user of the suitability of the data for some intended use. Provided: Entity Types, Number of Entities for Types, …
     • Scoring Function: a function evaluating quality indicators to measure the suitability of the data for some intended use. Provided: Linkset Type Coverage, Linkset Type Completeness, Linkset Entity Coverage for Type.
     • Aggregate Metric: a user-specified assessment metric built upon scoring functions. These aggregations produce new assessment values through the average, sum, max, min or threshold functions applied to the set of scoring functions. Provided: interpretation tables, i.e. an interpretation of the scoring functions that helps in figuring out which action to take next.
  8. Defining quality measures (overview repeated from slide 7)
  9. INDICATORS: Examples on DBLP & SWDF
     [Figure: Type(DBLP) and Type(SWDF). Type labels: foaf:Organization, foaf:Person, ro:FullPaper, ro:ShortPaper, ro:PosterPaper, foaf:Document, foaf:Agent, swr:Proceedings, swrc:Proceedings.]
     • #E4Type(foaf:Agent, DBLP) = 1000000
     • #E4Type(foaf:Document, DBLP) = 1984087
     • #E4Type(swrc:Proceedings, DBLP) = 1108400
  10. INDICATORS: Examples on DBLP & SWDF
      [Figure: Type(DBLP), Type(SWDF) and Type(L2) for a linkset L2. Type labels: foaf:Organization, foaf:Person, ro:FullPaper, ro:PosterPaper, foaf:Document, foaf:Agent, swr:Proceedings, swrc:Proceedings.]
      • #E4Type(foaf:Agent, L2) = 100
      • #E4Type(foaf:Person, L2) = 100
  11. Quality indicators: Types
      Input: a dataset or linkset. Output: an element of the power set of the possible user-defined types (e.g. owl:Class, owl:Restriction, skos:Concept, skos:ConceptScheme).
      Returns the types of the entities exposed in a dataset or a linkset.
  12. Quality indicators: # of Entities for a Type
      Input: a dataset or linkset, and one of the possible user-defined types. Output: a (positive) integer.
      Returns the number of entities exposed in a dataset/linkset for a given type. Blank nodes are left out.
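The two indicators above can be sketched over a toy set of triples. This is a minimal illustration, not the authors' Java/Jena prototype: the function names `types_of` and `entities_for_type`, the string-based triple encoding, the `_:` blank-node convention and the `ex:` data are all assumptions made for the example.

```python
RDF_TYPE = "rdf:type"

def types_of(triples):
    """Type(D): the set of types of the entities exposed in D."""
    return {o for (s, p, o) in triples if p == RDF_TYPE}

def entities_for_type(type_, triples):
    """#E4Type(T, D): number of distinct entities of type T in D.

    Blank nodes (subjects starting with "_:" in this toy encoding)
    are left out, as stated on the slide.
    """
    return len({
        s for (s, p, o) in triples
        if p == RDF_TYPE and o == type_ and not s.startswith("_:")
    })

# Hypothetical toy dataset, not taken from DBLP or SWDF.
toy = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", RDF_TYPE, "foaf:Agent"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("_:b0", RDF_TYPE, "foaf:Person"),        # blank node, not counted
    ("ex:paper1", RDF_TYPE, "foaf:Document"),
    ("ex:alice", "foaf:made", "ex:paper1"),   # non-typing triple, ignored
]

person_count = entities_for_type("foaf:Person", toy)
```

For a linkset, the types of the linked entities would presumably be looked up in the target dataset; that lookup is left out of this sketch.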
  13. Defining quality measures (overview repeated from slide 7)
  14. SCORING FUNCTIONS: Linkset Type Coverage (1)
      [Figure: Type(DBLP) and Type(SWDF) with linkset L1. Type labels: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings.]
      Complementing DBLP with L1, are we adding some new entities to DBLP?
      L1 “imports” organizations for the researchers (foaf:Agent) involved in the linkset.
  15. SCORING FUNCTIONS: Linkset Type Coverage (2)
      [Figure: Type(DBLP) and Type(SWDF) with linkset L2. Type labels: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings, swr:Proceedings.]
      Complementing SWDF with L2, we don’t add any new type of entities.
      L2 has exactly the same kinds of entities as SWDF.
  16. Definition of Linkset Type Coverage
      [Formula image; arguments labelled: linkset, target dataset.]
      Considering a dataset X, what percentage of the types of X are also covered by the linkset?
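The formula itself did not survive in the transcript, so the following is only one plausible reading of the verbal definition: the share of the types of X that also occur among the types of the linkset. The function name `linkset_type_coverage` and the example type sets are assumptions for illustration and are not taken from the paper.

```python
from fractions import Fraction

def linkset_type_coverage(linkset_types, dataset_types):
    """Share of the types of dataset X that are also covered by linkset L.

    Assumed reading: |Type(L) ∩ Type(X)| / |Type(X)|.
    """
    if not dataset_types:
        raise ValueError("the target dataset exposes no types")
    covered = linkset_types & dataset_types
    return Fraction(len(covered), len(dataset_types))

# Illustrative type sets (hypothetical, only the type names come from the slides).
type_x = {"foaf:Agent", "foaf:Document", "swrc:Proceedings"}
type_l = {"foaf:Agent", "foaf:Person"}

coverage = linkset_type_coverage(type_l, type_x)  # 1 of 3 types of X is covered
```

Exact fractions are used instead of floats so that the score can be compared against values such as 1 without rounding concerns.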
  17. SCORING FUNCTION: Ideas behind Type Completeness (1)
      [Figure: Type(DBLP) and Type(SWDF) with linkset L1. Type labels: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings.]
      L1 is type complete.
      It does not make sense to run a procedure (e.g., SILK) trying to discover interlinks between the instances of swrc:Proceedings and foaf:Organization.
  18. SCORING FUNCTION: Ideas behind Type Completeness (2)
      [Figure: Type(DBLP) and Type(SWDF) with linkset L1 and an alignment among classes. Type labels: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings, swr:Proceedings.]
      We should try to run a procedure (e.g., SILK) to discover interlinks between the instances of swrc:Proceedings and swr:Proceedings.
      L1 is type incomplete.
  19. Formalization of Linkset Type Completeness
      [Formula image; arguments labelled: linkset, target dataset 1, target dataset 2. Annotations: the types in the subject dataset that are not considered in the linkset; a function that returns the set of types of X that have an equivalent in Y according to a relation of equivalence among classes.]
      A linkset is complete with respect to types ⇔ LTCom = 1; LTCom < 1 otherwise.
  20. Example on Type Completeness
      [Figure: Type(DBLP) and Type(SWDF) with linksets L1 and L2. Type labels: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings, swr:Proceedings.]
      • LTCom(L1, DBLP, SWDF) = 1 - (|{swrc:Proceedings}| / |{swrc:Proceedings, foaf:Person}|) = 1/2
      • LTCom(L2, DBLP, SWDF) = 1 - (|{}| / |{swr:Proceedings, foaf:Person}|) = 1
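The two worked values above are consistent with reading the score as LTCom = 1 - |alignable types not considered in the linkset| / |alignable types|. The sketch below reproduces them under that reading. The function name `linkset_type_completeness`, the contents of the linkset type sets, and the decision to reject an empty set of alignable types are assumptions; the slide does not say how that case is handled.

```python
from fractions import Fraction

def linkset_type_completeness(linkset_types, alignable_types):
    """1 minus the share of alignable types that the linkset does not consider.

    alignable_types: types of the subject dataset that have an equivalent
    in the object dataset, according to some class alignment.
    """
    if not alignable_types:
        raise ValueError("no alignable types between the two datasets")
    missing = alignable_types - linkset_types
    return 1 - Fraction(len(missing), len(alignable_types))

# L1 links persons only, so swrc:Proceedings is alignable but not considered.
ltcom_l1 = linkset_type_completeness(
    linkset_types={"foaf:Person", "foaf:Agent"},
    alignable_types={"swrc:Proceedings", "foaf:Person"},
)

# L2 considers every alignable type (type names as written on the slide).
ltcom_l2 = linkset_type_completeness(
    linkset_types={"swr:Proceedings", "foaf:Person"},
    alignable_types={"swr:Proceedings", "foaf:Person"},
)
```

Here `ltcom_l1` evaluates to 1/2 and `ltcom_l2` to 1, matching the slide.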
  21. Linkset Entity Coverage for Type
      [Figure: Type(DBLP) and Type(SWDF) with linksets L1 and L2. Type labels: foaf:Organization, foaf:Person, foaf:Agent, swrc:Proceedings, swr:Proceedings.]
      L1 and L2 are indistinguishable from the point of view of types. Which is the most interesting: L1, L2, or L1 ∪ L2?
      How good is a linkset providing 100 owl:sameAs?
      [Formula image; annotations: number of entities of type T in the linkset L, number of entities of type T in the dataset X.]
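Reading the two annotations as numerator and denominator, the score can be sketched as the ratio #E4Type(T, L) / #E4Type(T, X). The function name `linkset_entity_coverage` is an assumption, and pairing the L2 and DBLP counts quoted earlier in the talk is only for illustration.

```python
from fractions import Fraction

def linkset_entity_coverage(entities_in_linkset, entities_in_dataset):
    """Share of the entities of type T in dataset X that appear in linkset L.

    Assumed reading: #E4Type(T, L) / #E4Type(T, X).
    """
    if entities_in_dataset <= 0:
        raise ValueError("dataset X exposes no entity of type T")
    return Fraction(entities_in_linkset, entities_in_dataset)

# Counts quoted on the indicator slides:
# #E4Type(foaf:Agent, L2) = 100 and #E4Type(foaf:Agent, DBLP) = 1000000.
agent_coverage = linkset_entity_coverage(100, 1_000_000)
```

Under this reading, a linkset providing 100 owl:sameAs links for a type with one million entities covers 1/10000 of them, which is the kind of answer the question on the slide is after.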
  22. Defining quality measures (overview repeated from slide 7)
  23. Aggregate Metrics: Interpretation upon the presented score functions
      The interpretation is summed up as a “decision tree”.
  24. Related work (extended discussion in the paper)
      Works:
      • WIQA, an information quality assessment framework: C. Bizer and R. Cyganiak. Quality-driven information filtering using the WIQA policy framework. J. Web Sem., 7(1):1-10, 2009.
      • LOD2: P. N. Mendes, C. Bizer, J. H. Young, Z. Miklos, J.-P. Calbimonte, and A. Moraru. Conceptual model and best practices for high-quality metadata publishing. Technical report, PlanetData, Deliverable 2.1, 2012, http://planet-data-wiki.sti2.at/web/File:D2.1.pdf.
      • PlanetData: P. N. Mendes and C. Bizer. Survey report state of the art in mapping, quality assessment and data fusion. Technical report, LOD2 - Creating Knowledge out of Interlinked Data, Deliverable 4.3.1, 2011, http://static.lod2.eu/Deliverables
      • SIEVE: P. N. Mendes, H. Mühleisen, and C. Bizer. Sieve: linked data quality assessment and fusion. In D. Srivastava and I. Ari, editors, LWDM, EDBT/ICDT Workshops, pp. 116-123. ACM, 2012.
      Remarks (in slide order):
      • WIQA contributes a policy language, an engine for interpreting such policies, and an explanation of whether a piece of information satisfies a policy. Quality criteria are parameters of the system; it does not aim at proposing new quality measures.
      • Reviews quality dimensions; no indicators or criteria for completeness.
      • Intensional completeness: the schema contains all the necessary attributes. Extensional completeness: all required instances are present. LDS completeness: relevant properties have a value.
      • SIEVE deploys some of the ideas developed in WIQA and LDS completeness.
      • They don’t explicitly address quality for linksets.
  25. Related work (extended discussion in the paper)
      • Link-QA: C. Gueret, P. T. Groth, C. Stadler, and J. Lehmann. Assessing linked data mappings using network measures. In E. Simperl, P. Cimiano, A. Polleres, O. Corcho, and V. Presutti, editors, ESWC, volume 7295 of Lecture Notes in Computer Science, pp. 87-102. Springer, 2012.
      Remarks:
      • Different approach: they apply classic network measures such as degree, centrality and clustering coefficient, plus open-sameAs chains and description richness, to determine whether a bunch of links improves the overall dataset quality.
      • Quality of interlinking, not of linksets: LINK-QA works on links independently of whether or not they are part of the same linkset.
      • LINK-QA addresses correctness and does not deal with completeness.
      • LINK-QA is for ranking sets of links. It can be used to say that one linkset is better than another, but it does not suggest the next move a consumer should take to improve their linkset.
  26. Conclusions
      Contribution: quality measure on linksets
      • The only measure explicitly addressing linkset completeness for dataset complementation;
      • Formalization of indicators, score functions and aggregation metrics;
      • A first proof-of-concept prototype (Java, Jena).
      On-going and future work
      • Validation on the LOD: how many “incomplete” linksets can we detect in the LOD?
      • Extension to linksets other than owl:sameAs (e.g., skos:exactMatch);
      • Dimensions other than completeness (e.g., timeliness, availability, consistency).
  27. Thanks for your attention! riccardo.albertoni@ge.imati.cnr.it