Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications

Presented at IIWeb2012

Speaker notes
  • Personal background
  • Executive summary vs. overview
  • The complete stack of Semantic Web technologies is based on open standards and protocols. Semantic Web technologies focus on the application layer of the Internet stack.
  • Go back to the research question slides. Go back to the workflow and highlight what's needed.
  • Emphasize blended; reference SIGMOD.

Transcript

  • 1. Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications
    Umair ul Hassan, Sean O'Riain, Edward Curry
    Digital Enterprise Research Institute (www.deri.ie), National University of Ireland, Galway
    Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  • 2. Outline
    - Motivation & problem space
    - Identity resolution on the Linked Open Data (LOD) Web
    - Proposed approach
    - LOD application architecture
    - How it relates to existing works
    - Evaluation
    - Conclusion & future work
  • 3. Overview
    - Identity resolution in the Linked Open Data Web
      - Real-world entities have multiple identifiers in LOD
      - Identity resolution links have associated uncertainty
      - LOD applications require user verification of links
    - Problem
      - Feedback for all links is infeasible for large datasets
      - LOD applications have domain-specific utility of links
    - Proposed approach
      - Leverages matching dependencies to define domain-specific requirements of identity resolution
      - Ranks identity resolution links according to the value of perfect information
  • 4. Linked Open Data (LOD)
    - Expose and interlink datasets on the Web
    - Using URIs to identify "things" in your data
    - Using a graph representation (RDF) to describe URIs
    - Vision: the Web as a huge graph database
    (Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/)
  • 5. Linked Data Example
    (Figure: an entity with multiple identifiers, connected by identity resolution links)
  • 6. Identity Resolution in LOD
    - Identity resolution is required for consolidation of data in applications consuming LOD
    - Three sources of identity resolution links
      - Provided by data publishers (e.g. dbpedia.org)
      - Generated by consumers through tools (e.g. SILK, SEMIRI, RiMOM)
      - Maintained by third-party web services (e.g. sameas.org)
    - Uncertainty associated with links
      - Due to multiple interpretations of identity equivalence
      - Due to characteristics of link generation algorithms (similarity based)
  • 7. Identity Resolution Problem
    - User feedback for uncertain links
      - Verify uncertain identity resolution links with users/experts
      - Improve the quality of entity consolidation
    - Challenges
      - Domain-specific semantic requirements: how to define domain-specific quality requirements for Linked Data applications?
      - Limited user attention: how to rank candidate links by their benefit, so as to maximize the utility of user feedback?
  • 8. Identity Resolution Problem
    - User feedback for uncertain links
      - Verify uncertain identity resolution links with users/experts
      - Improve the quality of entity consolidation
    - Proposed approach
      - Domain-specific semantic requirements: leverage matching dependencies
      - Limited user attention: employ value of perfect information theory
  • 9. LOD Application Architecture
    (Figure: Utility Module, Feedback Module and Consolidation Module; diagram labels: candidate links, questions, rules, feedback, matching dependencies, utility improvement, ranked feedback tasks)
    Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition), 1-136. Morgan & Claypool.
  • 10. Related Work
    - Jeffery et al., "Pay-as-you-go user feedback for dataspace systems," in Proceedings of the 2008 ACM SIGMOD Conference, 2008, pp. 847-860.
      - Utility
        - In terms of the cardinality of query results on the dataspace
        - A general metric, not suitable for application-specific data quality
      - Assumption
        - Availability of global query statistics, which is problematic for Linked Open Data
  • 11. Proposed Approach
    - Domain-specific utility
      - Define utility in terms of user-specified rules, i.e. matching dependencies
      - Rank candidate links for user feedback according to the value of perfect information
    - Assumptions
      - Matching dependencies are either provided by the user or generated through existing tools
      - Utility is based on the satisfaction ratio of dependencies in the dataspace
  • 12. Proposed Approach
    - Matching dependencies
    - Matching rule
    - Example
    - Utility of rule
    - Value of perfect information:
      $g(m_k) = U(D_{m_k}, M \cup \{m_k\})\, p_k + U(D_{m_k}, M \setminus \{m_k\})\,(1 - p_k) - U(D, M)$
  • 13. Evaluation
    - Measure the change in utility of a dataspace, according to the matching rules, after a specific number of feedback iterations
    - Candidate links generated by the Silk framework
  • 14. Evaluation: Datasets
    - IIMB 2009 Dataset: data source Instance Matching Benchmark 2009; data collection IIMB 2009; reference ontology, with ontology #16 containing errors in data value attributes; entity type imdb:Movie
    - UCI-Adult Dataset: data source UCI Machine Learning Repository; data collection US Census dataset; manually created duplicates and errors; entity type foaf:Person
    - Drug Dataset: data source Instance Matching Benchmark 2010; data collection DrugBank and Sider datasets; interlinking between two datasets of the same domain; entity types drugbank:drugs, sider:drugs

                         IIMB 2009   UCI-Adult    Drug
      Total triples            291       64000   14348
      Total entity IDs          44        4000    5696
      Total attributes           9          16       3
      Total values             130       10878    8473
      Candidate links           81          72      94
      Correct links             22          72      66
  • 15. Evaluation
    (Figure: two panels, IIMB 2009 Dataset and UCI-Adult Dataset, plotting Dataspace Utility Improvement (0 to 100%) against Feedback Iteration (0 to 100%) for the VPI_RULES, CONFIDENCE and RANDOM strategies)
  • 16. Conclusion
    - Matching dependencies provide an effective mechanism to:
      - Represent entity matching rules
      - Specify domain-specific semantic requirements
      - Measure the utility of dataspaces
    - Value of perfect information enables an effective ranking strategy for user feedback
    - In all three datasets, 100% utility improvement was reached with less than 40% of user feedback
  • 17. Future Work
    - Expand to other data quality problems
    - Expand to other types of dependencies, such as comparable dependencies and order dependencies
    - Allow multi-user feedback for collaborative data cleaning
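Slide 6 notes that applications consuming LOD need identity resolution links to consolidate data. As an illustration, and assuming each link is an unweighted owl:sameAs-style pair of URIs, the Python sketch below groups identifiers into equivalence classes with a union-find structure. The function name `consolidate` and the example.org URIs are hypothetical, not taken from the slides.

```python
from typing import Dict, Iterable, List, Set, Tuple


def consolidate(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by identity links into equivalence classes (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in links:
        union(a, b)

    # Collect every URI under the root of its class.
    classes: Dict[str, Set[str]] = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


# Hypothetical identity links between identifiers from three sources.
links = [
    ("http://example.org/a/Galway", "http://example.org/b/galway-city"),
    ("http://example.org/b/galway-city", "http://example.org/c/GLW"),
    ("http://example.org/a/Dublin", "http://example.org/c/DUB"),
]
for cls in consolidate(links):
    print(sorted(cls))
```

Because identity links are transitive, a single wrong link merges two whole classes, which is one reason uncertain links are worth verifying before consolidation.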
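Slide 11 bases dataspace utility on the satisfaction ratio of matching dependencies. The slides do not spell out how satisfaction is measured, so this is a minimal sketch under stated assumptions: a dependency is reduced to a list of attributes whose values must be similar (using `difflib.SequenceMatcher` with a hypothetical 0.8 threshold), a pair of entities satisfies it when the premise holds and the pair has been consolidated, and utility is the mean ratio over all dependencies. `MatchingDependency`, `similar`, `satisfaction_ratio`, `utility` and the `ex:` entities are illustrative names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List

Entity = Dict[str, str]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity operator for the MD premise (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


@dataclass
class MatchingDependency:
    """Hypothetical MD encoding: if every attribute in `lhs` is similar for two
    entities, the two entities should be identified (consolidated)."""
    lhs: List[str]

    def premise_holds(self, e1: Entity, e2: Entity) -> bool:
        return all(a in e1 and a in e2 and similar(e1[a], e2[a]) for a in self.lhs)


def satisfaction_ratio(md: MatchingDependency, entities: Dict[str, Entity],
                       same_entity: Callable[[str, str], bool]) -> float:
    """Fraction of entity pairs matching the premise that are also consolidated."""
    premises = satisfied = 0
    for i, j in combinations(sorted(entities), 2):
        if md.premise_holds(entities[i], entities[j]):
            premises += 1
            satisfied += int(same_entity(i, j))
    return satisfied / premises if premises else 1.0


def utility(mds: List[MatchingDependency], entities: Dict[str, Entity],
            same_entity: Callable[[str, str], bool]) -> float:
    """Assumed dataspace utility: mean satisfaction ratio over all dependencies."""
    return sum(satisfaction_ratio(md, entities, same_entity) for md in mds) / len(mds)


entities = {
    "ex:m1": {"title": "The Godfather", "year": "1972"},
    "ex:m2": {"title": "the godfather", "year": "1972"},
    "ex:m3": {"title": "Casablanca", "year": "1942"},
    "ex:m4": {"title": "Casablanca", "year": "1942"},
}
md = MatchingDependency(lhs=["title", "year"])
cluster = {"ex:m1": 0, "ex:m2": 0, "ex:m3": 1, "ex:m4": 2}  # only m1 and m2 consolidated
print(utility([md], entities, lambda a, b: cluster[a] == cluster[b]))
```

With only ex:m1 and ex:m2 consolidated, one of the two premise-matching pairs is satisfied, so the call prints 0.5; assigning ex:m4 to the same cluster as ex:m3 raises the utility to 1.0.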
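Slide 12 ranks candidate links by their value of perfect information (VPI). The sketch below assumes the usual expected-value reading: the utility after the user confirms link m_k, weighted by its probability p_k of being correct, plus the utility after the user rejects it, weighted by 1 − p_k, minus the current utility. For brevity the utility here is a function of the set of accepted links only; `vpi`, `rank_by_vpi`, `toy_utility`, the `ex:` identifiers and the confidence values are all made up for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Link = Tuple[str, str]
Utility = Callable[[FrozenSet[Link]], float]


def vpi(link: Link, p: float, accepted: FrozenSet[Link], utility: Utility) -> float:
    """Expected utility after asking about `link`, minus the current utility.

    With probability p the user confirms the link (it is added to the accepted
    set); otherwise it is rejected (it is removed, if present)."""
    confirmed = utility(accepted | {link})
    rejected = utility(accepted - {link})
    return p * confirmed + (1 - p) * rejected - utility(accepted)


def rank_by_vpi(candidates: Dict[Link, float], accepted: FrozenSet[Link],
                utility: Utility) -> List[Tuple[Link, float]]:
    """Order candidate links by descending VPI."""
    scored = [(link, vpi(link, p, accepted, utility)) for link, p in candidates.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)


# Hypothetical toy utility: fraction of the pairs that the matching rules say
# must be identified and that are covered by an accepted link.
required = {("ex:m1", "ex:m2"), ("ex:m3", "ex:m4")}


def toy_utility(links: FrozenSet[Link]) -> float:
    return len(required & links) / len(required)


candidates = {  # link -> matcher confidence p_k (made-up values)
    ("ex:m1", "ex:m3"): 0.95,
    ("ex:m1", "ex:m2"): 0.90,
    ("ex:m3", "ex:m4"): 0.60,
}
for link, score in rank_by_vpi(candidates, frozenset(), toy_utility):
    print(link, round(score, 3))
```

Here ("ex:m1", "ex:m3") has the highest matcher confidence but a VPI of 0, because confirming it helps satisfy no rule, so it is ranked last; this is the intuition behind comparing a rule-aware ranking with a confidence-based one.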
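Slide 13 measures the change in dataspace utility after a given number of feedback iterations. The following is a minimal simulation under assumptions not stated in the slides: the simulated user always answers correctly (driven by a ground-truth map), rejected links are simply dropped, and improvement is normalized by the utility reached once all correct candidates are confirmed. The orderings, the `truth` map, `improvement_curve` and `toy_utility` are hypothetical; they are not the paper's evaluation data or code.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Link = Tuple[str, str]


def improvement_curve(order: List[Link], truth: Dict[Link, bool],
                      utility: Callable[[FrozenSet[Link]], float]) -> List[float]:
    """Normalized utility improvement after each simulated feedback iteration."""
    accepted: FrozenSet[Link] = frozenset()
    start = utility(accepted)
    # Utility reachable once every correct candidate link has been confirmed.
    span = utility(frozenset(link for link in order if truth[link])) - start
    curve = []
    for link in order:
        if truth[link]:  # the simulated user confirms correct links ...
            accepted = accepted | {link}
        # ... and rejected links are simply never added.
        gain = utility(accepted) - start
        curve.append(gain / span if span else 1.0)
    return curve


required = {("ex:m1", "ex:m2"), ("ex:m3", "ex:m4")}  # hypothetical rule-implied pairs


def toy_utility(links: FrozenSet[Link]) -> float:
    return len(required & links) / len(required)


truth = {("ex:m1", "ex:m3"): False, ("ex:m1", "ex:m2"): True, ("ex:m3", "ex:m4"): True}
by_confidence = [("ex:m1", "ex:m3"), ("ex:m1", "ex:m2"), ("ex:m3", "ex:m4")]
by_vpi = [("ex:m1", "ex:m2"), ("ex:m3", "ex:m4"), ("ex:m1", "ex:m3")]
print("CONFIDENCE", improvement_curve(by_confidence, truth, toy_utility))
print("VPI", improvement_curve(by_vpi, truth, toy_utility))
```

With these toy inputs the confidence ordering spends its first question on a wrong link, whereas the rule-aware ordering reaches full improvement after two of the three questions.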