
Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications

https://www.insight-centre.org/content/leveraging-matching-dependencies-guided-user-feedback-linked-data-applications

Presented at IIWeb2012

ABSTRACT
This paper presents a new approach for managing integration quality and user feedback for entity consolidation within applications consuming Linked Open Data. The quality of a dataspace containing multiple linked datasets is defined in terms of a utility measure based on domain-specific matching dependencies. Furthermore, the user is involved in the consolidation process through feedback solicited on identity resolution links. Each candidate link is ranked according to its benefit to the dataspace, calculated by approximating the resulting improvement in dataspace utility. Evaluation on real-world and synthetic datasets demonstrates the effectiveness of the utility measure: dataspace integration quality improves while requiring fewer user feedback iterations overall.
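
The abstract's two ideas can be illustrated with a small sketch. The first is a utility measure built from matching dependencies. The second is ranking candidate links by their expected benefit, following the value-of-perfect-information idea named on slides 3 and 12: expected utility after feedback minus current utility. This is not the authors' implementation, and it rests on several assumptions:

- Exact attribute equality stands in for the similarity predicates of a matching dependency.
- A rule is counted as satisfied on an entity pair when that pair is linked.
- U(D) is taken to be the mean satisfaction ratio over all rules.
- The entity data, `ex:` URIs and every function name are hypothetical.

```python
from itertools import combinations

# Toy dataspace: entity URI -> attribute values (hypothetical data).
ENTITIES = {
    "ex:a1": {"title": "Alien", "year": 1979},
    "ex:b1": {"title": "Alien", "year": 1979},
    "ex:a2": {"title": "Aliens", "year": 1986},
    "ex:b2": {"title": "Aliens", "year": 1986},
    "ex:a3": {"title": "Heat", "year": 1995},
    "ex:b3": {"title": "Heat (film)", "year": 1995},
}

# Each hypothetical rule lists attributes whose agreement implies "same entity".
# Exact equality stands in for the similarity predicates of a real matching dependency.
RULES = [("title", "year")]


def link(a, b):
    """An identity resolution link as an unordered pair of URIs."""
    return frozenset((a, b))


def satisfaction_ratio(rule, accepted):
    """Share of entity pairs matching the rule's premise that are linked in `accepted`."""
    premise_pairs = [
        link(a, b)
        for a, b in combinations(sorted(ENTITIES), 2)
        if all(ENTITIES[a][attr] == ENTITIES[b][attr] for attr in rule)
    ]
    if not premise_pairs:
        return 1.0  # no pair triggers the rule, so it is trivially satisfied
    return sum(pair in accepted for pair in premise_pairs) / len(premise_pairs)


def utility(accepted):
    """U(D): mean satisfaction ratio over all rules (assumed aggregation).

    Unverified candidate links are ignored; only accepted links count.
    """
    return sum(satisfaction_ratio(rule, accepted) for rule in RULES) / len(RULES)


def vpi(candidate, p, accepted):
    """Expected utility after asking the user about `candidate`, minus current utility.

    p is the probability that the candidate is correct (e.g. matcher confidence).
    """
    u_confirmed = utility(accepted | {candidate})  # user confirms the link
    u_rejected = utility(accepted)                 # user rejects the link
    return p * u_confirmed + (1 - p) * u_rejected - utility(accepted)


def rank_candidates(candidates, accepted):
    """Order candidate links by decreasing VPI, breaking ties by confidence."""
    return sorted(
        candidates,
        key=lambda m: (vpi(m, candidates[m], accepted), candidates[m]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical matcher confidences for three candidate links.
    candidates = {
        link("ex:a3", "ex:b3"): 0.95,
        link("ex:a1", "ex:b1"): 0.90,
        link("ex:a2", "ex:b2"): 0.60,
    }
    for m in rank_candidates(candidates, accepted=set()):
        print(sorted(m), round(vpi(m, candidates[m], set()), 2))
```

With this toy data, the 0.95-confidence link between ex:a3 and ex:b3 is ranked last because no rule requires it, whereas a confidence-only ordering would ask about it first. Because the sketch ignores unverified links, the "reject" term equals the current utility. The VPI therefore reduces to p times the utility gain from confirming the link.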

Speaker notes:
  • Personal background
  • Executive summary vs. overview
  • Executive summary vs. overview
  • The complete stack of Semantic Web technologies is based on open standards and protocols. Semantic Web technologies focus on the application layer of the Internet stack.
  • Go back to the research question slides. Go back to the workflow and highlight what's needed.
  • Emphasize blended. Reference SIGMOD.

    1. Digital Enterprise Research Institute www.deri.ie
        Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications
        Umair ul Hassan, Sean O’Riain, Edward Curry
        Digital Enterprise Research Institute, National University of Ireland, Galway
        Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
    2. Outline
        • Motivation & Problem Space
        • Identity Resolution on the Linked Open Data (LOD) Web
        • Proposed Approach
        • LOD Application Architecture
        • Relation to existing work
        • Evaluation
        • Conclusion & Future Work
    3. Overview
        • Identity resolution in the Linked Open Data Web
          - Real-world entities have multiple identifiers in LOD
          - Identity resolution links have associated uncertainty
          - LOD applications require user verification of links
        • Problem
          - Feedback for all links is infeasible for large datasets
          - LOD applications have domain-specific utility of links
        • Proposed approach
          - Leverages matching dependencies to define domain-specific requirements of identity resolution
          - Ranks identity resolution links according to the value of perfect information
    4. Linked Open Data (LOD)
        • Expose and interlink datasets on the Web
        • Use URIs to identify "things" in your data
        • Use a graph representation (RDF) to describe URIs
        • Vision: the Web as a huge graph database
        [Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
    5. Linked Data Example
        [Figure: one real-world entity with multiple identifiers, connected by identity resolution links]
    6. Identity Resolution in LOD
        • Identity resolution is required to consolidate data in applications consuming LOD
        • Three sources of identity resolution links
          - Provided by data publishers (e.g. dbpedia.org)
          - Generated by consumers through tools (e.g. SILK, SEMIRI, RiMOM)
          - Maintained by third-party web services (e.g. sameas.org)
        • Uncertainty associated with links
          - Due to multiple interpretations of identity equivalence
          - Due to characteristics of link generation algorithms (similarity based)
    7. Identity Resolution Problem
        • User feedback for uncertain links
          - Verify uncertain identity resolution links with users/experts
          - Improve the quality of entity consolidation
        • Challenges
          - Domain-specific semantic requirements: how to define domain-specific quality requirements for Linked Data applications?
          - Limited user attention: how to rank candidate links by their benefit so as to maximize the utility of user feedback?
    8. Identity Resolution Problem
        • User feedback for uncertain links
          - Verify uncertain identity resolution links with users/experts
          - Improve the quality of entity consolidation
        • Proposed approach
          - Domain-specific semantic requirements: leverage matching dependencies
          - Limited user attention: employ value of perfect information theory
    9. LOD Application Architecture
        [Figure: application architecture with a Consolidation Module, a Utility Module and a Feedback Module; diagram labels include candidate links, matching dependencies (rules), utility improvement, ranked feedback tasks, questions and feedback.]
        Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition), 1-136. Morgan & Claypool.
    10. Related Work
        • Jeffery et al., "Pay-as-you-go user feedback for dataspace systems," in Proceedings of the 2008 ACM SIGMOD Conference, 2008, pp. 847-860.
        • Utility:
          - Defined in terms of the cardinality of query results on the dataspace
          - A general metric, not suitable for application-specific data quality
        • Assumption:
          - Availability of global query statistics, which is problematic for Linked Open Data
    11. Proposed Approach
        • Domain-specific utility
          - Define utility in terms of user-specified rules, i.e. matching dependencies
          - Rank candidate links for user feedback according to the value of perfect information
        • Assumptions
          - Matching dependencies are either provided by the user or generated through existing tools
          - Utility is based on the satisfaction ratio of dependencies in the dataspace
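    12. Proposed Approach
        • Matching dependencies
          - Matching rule
          - Example
          - Utility of rule
        • Value of perfect information of a candidate link $m_k$:
          $$ g(m_k) = U(D^{+}_{m_k}, M \setminus \{m_k\})\,p_k + U(D^{-}_{m_k}, M \setminus \{m_k\})\,(1 - p_k) - U(D, M) $$
          where $U(D, M)$ is the utility of dataspace $D$ with candidate links $M$, $p_k$ is the probability that $m_k$ is correct, and $D^{+}_{m_k}$ and $D^{-}_{m_k}$ denote the dataspace after $m_k$ is confirmed or rejected.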
    13. Evaluation
        • Measure the change in dataspace utility, according to the matching rules, after a specific number of feedback iterations
        • Candidate links generated by the Silk framework
    14. Evaluation
        Datasets:
        • IIMB 2009 Dataset. Data source: Instance Matching Benchmark 2009. Data collection: IIMB 2009 reference ontology; ontology #16 with errors in data values. Entity types: imdb:Movie.
        • UCI-Adult Dataset. Data source: UCI Machine Learning Repository. Data collection: US Census dataset; manually created duplicates and errors in same-domain attributes. Entity types: foaf:Person.
        • Drug Dataset. Data source: Instance Matching Benchmark 2010. Data collection: DrugBank and Sider datasets; interlinking between the two datasets. Entity types: drugbank:drugs, sider:drugs.

        Statistic        | IIMB 2009 | UCI-Adult |  Drug
        -----------------+-----------+-----------+------
        Total triples    |       291 |     64000 | 14348
        Total entity IDs |        44 |      4000 |  5696
        Total attributes |         9 |        16 |     3
        Total values     |       130 |     10878 |  8473
        Candidate links  |        81 |        72 |    94
        Correct links    |        22 |        72 |    66
    15. Evaluation
        [Figure: dataspace utility improvement (0% to 100%) against feedback iteration (0% to 100%) for the IIMB 2009 and UCI-Adult datasets, comparing three ranking strategies: VPI_RULES, CONFIDENCE and RANDOM.]
    16. Conclusion
        • Matching dependencies provide an effective mechanism to:
          - Represent entity matching rules
          - Specify domain-specific semantic requirements
          - Measure the utility of dataspaces
        • Value of perfect information enables an effective ranking strategy for user feedback
        • For the three datasets, 100% utility improvement was reached with less than 40% of user feedback
    17. Future Work
        • Expand to other data quality problems
        • Expand on types of dependencies, such as comparable dependencies and order dependencies
        • Allow multi-user feedback for collaborative data cleaning
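
Slide 6 states that identity resolution links are required to consolidate data in applications consuming LOD. One common way to consolidate is to treat accepted identity links (for example `owl:sameAs` statements) as graph edges. URIs are then grouped into connected components with a union-find (disjoint-set) structure. This technique is assumed here rather than taken from the slides. The class and function names and the example.org URIs are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest over URIs, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen URIs start as their own set
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def consolidate(links):
    """Group URIs into clusters, one per real-world entity.

    links: iterable of (uri_a, uri_b) pairs already accepted as identity links.
    """
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    clusters = {}
    for uri in list(uf.parent):
        clusters.setdefault(uf.find(uri), set()).add(uri)
    return list(clusters.values())


if __name__ == "__main__":
    accepted_links = [
        ("http://example.org/a/galway", "http://example.org/b/Galway_City"),
        ("http://example.org/b/Galway_City", "http://example.org/c/place/42"),
        ("http://example.org/a/dublin", "http://example.org/b/Dublin"),
    ]
    for cluster in consolidate(accepted_links):
        print(sorted(cluster))
```

Links are transitive under this scheme, so a single wrong link merges two clusters. This is one reason the slides ask users to verify uncertain links before consolidation.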
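
Slide 13 measures the change in dataspace utility after a given number of feedback iterations, and slide 15 plots it for three ranking strategies. The sketch below measures such a curve under two assumptions. First, a simulated user answers every question from a gold standard. Second, improvement is normalized against the utility reached once every candidate has been verified; the slides do not state the normalization. The function name and the toy utility are hypothetical.

```python
def improvement_curve(ranked_links, gold, utility):
    """Normalized utility improvement after each simulated feedback iteration.

    ranked_links: candidate links in the order they are shown to the user
    gold: links that are actually correct (the simulated user answers from this set)
    utility: callable mapping a set of confirmed links to a number
    """
    confirmed = set()
    start = utility(confirmed)
    best = utility(set(ranked_links) & gold)  # utility once every candidate is verified
    curve = []
    for link in ranked_links:
        if link in gold:  # the simulated user confirms correct links only
            confirmed.add(link)
        gain = utility(confirmed) - start
        curve.append(1.0 if best == start else gain / (best - start))
    return curve


if __name__ == "__main__":
    required = {("a", "b"), ("c", "d")}  # hypothetical links demanded by the rules

    def toy_utility(links):
        return len(links & required) / len(required)

    gold = {("a", "b"), ("c", "d"), ("e", "f")}
    print(improvement_curve([("a", "b"), ("c", "d"), ("e", "f")], gold, toy_utility))
    print(improvement_curve([("e", "f"), ("a", "b"), ("c", "d")], gold, toy_utility))
```

The two calls print [0.5, 1.0, 1.0] and [0.0, 0.5, 1.0]. The rule-aware order reaches full improvement after two of three iterations. An order that first asks about the link no rule needs takes all three.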
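
The candidate and correct link counts in the slide 14 dataset table give the share of candidate links that are actually correct in each dataset. This ratio is not reported on the slides; the snippet below simply derives it from the table's counts.

```python
def link_precision(candidate_links, correct_links):
    """Fraction of candidate links that are correct."""
    return correct_links / candidate_links


# Counts taken from the dataset table on slide 14.
for name, candidates, correct in [
    ("IIMB 2009", 81, 22),
    ("UCI-Adult", 72, 72),
    ("Drug", 94, 66),
]:
    print(f"{name}: {correct}/{candidates} = {link_precision(candidates, correct):.1%}")
```

This prints 27.2% for IIMB 2009, 100.0% for UCI-Adult and 70.2% for Drug.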
