Digital Enterprise Research Institute                                                                         www.deri.ie




                 Leveraging Matching Dependencies for Guided
                   User Feedback in Linked Data Applications
                                                 Umair ul Hassan, Sean O’Riain, Edward Curry
                                                                     Digital Enterprise Research Institute
                                                                     National University of Ireland, Galway




 Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Outline
             Motivation & Problem Space
                    Identity Resolution on the Linked Open Data (LOD) Web
             Proposed Approach
                    LOD Application Architecture
                    How it relates to existing works
             Evaluation
             Conclusion & Future Work
Overview
             Identity Resolution in the Linked Open Data Web
                    Real-world entities have multiple identifiers in LOD
                    Identity resolution links have associated uncertainty
                    LOD Applications require user verification of links
             Problem
                    Feedback for all links is infeasible for large datasets
                    LOD Applications have domain specific utility of links
             Proposed Approach
                    Leverages matching dependencies to define domain specific
                     requirements of identity resolution
                    Ranks identity resolution links according to value of perfect information
Linked Open Data (LOD)
             Expose and interlink datasets on the Web
             Using URIs to identify “things” in your data
             Using a graph representation (RDF) to describe URIs
             Vision: The Web as a huge graph database




                                              Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Linked Data Example
             [Figure: Linked Data example, labeled "Identity resolution links" and "Multiple Identifiers"]
Identity Resolution in LOD
             Identity resolution is required for consolidation of data in
              applications consuming LOD

             Three sources of identity resolution links
                    Provided by data publishers (e.g. dbpedia.org)
                    Generated by consumers through tools (e.g. Silk, SERIMI, RiMOM)
                    Maintained by third party web services (e.g. sameas.org)


             Uncertainty associated with links
                    Due to multiple identity equivalence interpretations
                    Due to characteristics of link generation algorithms (similarity based)
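
The consolidation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each link carries a confidence score, that links at or above a hypothetical `threshold` are trusted, and that identifiers joined by trusted links form one entity. The identifiers (`src1:galway` and so on) are made up for the example.

```python
from collections import defaultdict


def consolidate(links, threshold=0.8):
    """Group identifiers joined by links with confidence >= threshold.

    `links` is a list of (id_a, id_b, confidence) tuples. The threshold is
    an assumption of this sketch, not a value taken from the slides.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, confidence in links:
        root_a, root_b = find(a), find(b)  # registers both identifiers
        if confidence >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for identifier in parent:
        clusters[find(identifier)].add(identifier)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical identity resolution links with confidence scores.
links = [
    ("src1:galway", "src2:galway_city", 0.95),
    ("src2:galway_city", "src3:galway", 0.90),
    ("src1:galway", "src1:county_galway", 0.40),  # uncertain link
]

for cluster in consolidate(links):
    print(cluster)
```

The low-confidence link is the kind of candidate that would be sent to a user for verification rather than applied automatically.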
Identity Resolution Problem
             User feedback for uncertain links
                    Have users/experts verify uncertain identity resolution links
                    Improve quality of entity consolidation


             Challenges
                    Domain specific semantic requirements
                       – How to define domain specific requirements of quality for Linked
                         Data applications?


                    Limited user attention
                       – How to rank candidate links according to their benefit to maximize
                         utility of user feedback?
Identity Resolution Problem
             Proposed Approach
                    Domain specific semantic requirements
                       – Leverage Matching Dependencies


                    Limited user attention
                       – Employ value of perfect information theory
LOD Application Architecture
             [Figure: LOD application architecture with Utility, Feedback, and Consolidation modules. Labels: Candidate Links, Matching Dependencies (Rules), Ranked Feedback Tasks, Questions, Feedback, Utility Improvement.]
Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition), 1-136. Morgan & Claypool.
Related Work
             Jeffery et al., “Pay-as-you-go user feedback for dataspace
              systems,” in Proceedings of the 2008 ACM SIGMOD
              Conference, 2008, pp. 847-860.

             Utility:
                    In terms of cardinality of query results on dataspace
                    General metric not suitable for application specific data quality
             Assumption:
                    Availability of global query statistics
                       – Problematic for Linked Open Data
Proposed Approach
             Domain Specific Utility
                    Define utility in terms of user specified rules i.e. matching dependencies
                    Rank candidate links for user feedback according to value of perfect
                     information


             Assumptions
                    We assume matching dependencies are either provided by user or generated
                     through existing tools
                    Utility is based on satisfaction ratio of dependencies in dataspace
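
A utility based on the satisfaction ratio of a dependency can be sketched as below. The slides do not give the exact definition, so this is one plausible reading, stated as an assumption: a matching dependency says that records similar on some attributes should refer to the same entity, and the satisfaction ratio is the share of such similar pairs that actually agree on the entity identifier. The record layout, the attribute names, and the similarity threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    """String similarity test; the 0.8 threshold is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def satisfaction_ratio(records, lhs_attrs, rhs_attr, threshold=0.8):
    """Share of record pairs similar on lhs_attrs that agree on rhs_attr.

    Returns 1.0 when the dependency applies to no pair at all.
    """
    applicable = satisfied = 0
    for r1, r2 in combinations(records, 2):
        if all(similar(r1[a], r2[a], threshold) for a in lhs_attrs):
            applicable += 1
            if r1[rhs_attr] == r2[rhs_attr]:
                satisfied += 1
    return satisfied / applicable if applicable else 1.0


# Hypothetical movie records; "entity" is the consolidated identifier.
records = [
    {"title": "The Matrix", "year": "1999", "entity": "e1"},
    {"title": "the matrix", "year": "1999", "entity": "e1"},
    {"title": "The Matrix", "year": "1999", "entity": "e2"},
    {"title": "Inception", "year": "2010", "entity": "e3"},
]

# Rule: similar title and year implies the same entity.
print(satisfaction_ratio(records, ["title", "year"], "entity"))
```

Under this reading, confirming the link that merges `e2` into `e1` raises the ratio, which is what makes the measure usable as a utility for ranking feedback.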
Proposed Approach
             Matching Dependencies

                    Matching Rule


                    Example


                    Utility of rule


             Value of Perfect Information
                    $g(m_k) = U(D^{m_k}, M \setminus \{m_k\})\, p_k + U(D^{\neg m_k}, M \setminus \{m_k\})\,(1 - p_k) - U(D, M)$
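
The value of perfect information on this slide weighs the utility after a link is confirmed by its probability, the utility after it is rejected by the complementary probability, and subtracts the current utility. The sketch below ranks candidate links that way. The utility function is a toy stand-in, not the rule-based utility of the slides: it assumes each link has a hypothetical weight, that a confirmed link earns its weight, and that an unverified link is used only when its expected gain is positive (a wrong link costs as much as a right one earns).

```python
def toy_utility(accepted, pending, weights):
    """Toy expected utility (an assumption of this sketch).

    Confirmed links earn their weight. A pending link with probability p
    earns max(0, w * (2p - 1)): the system uses it only if that pays off.
    """
    confirmed = sum(weights[link] for link in accepted)
    expected = sum(
        max(0.0, weights[link] * (2 * p - 1)) for link, p in pending.items()
    )
    return confirmed + expected


def vpi(link, accepted, pending, weights):
    """Expected utility gain from asking the user about `link`."""
    p = pending[link]
    rest = {other: q for other, q in pending.items() if other != link}
    u_confirmed = toy_utility(accepted | {link}, rest, weights)
    u_rejected = toy_utility(accepted, rest, weights)
    u_current = toy_utility(accepted, pending, weights)
    return u_confirmed * p + u_rejected * (1 - p) - u_current


def rank_links(accepted, pending, weights):
    """Order pending links by decreasing value of perfect information."""
    return sorted(
        pending,
        key=lambda link: vpi(link, accepted, pending, weights),
        reverse=True,
    )


# Hypothetical candidate links: probability of being correct, and weight.
pending = {"a": 0.9, "b": 0.5, "c": 0.6}
weights = {"a": 1.0, "b": 1.0, "c": 3.0}

print(rank_links(set(), pending, weights))
```

With this toy utility the score reduces to the weight times min(p, 1 - p), so feedback goes first to links that are both uncertain and consequential, rather than simply to the most or least confident ones.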
Evaluation
             Measure change in utility of a dataspace according to
              matching rules after a specific number of feedback iterations
             Candidate links generated by the Silk framework
Evaluation
             Datasets

                                 IIMB 2009 Dataset                  UCI-Adult Dataset                     Drug Dataset

              Data Source        Instance Matching Benchmark 2009   UCI Machine Learning Repository       Instance Matching Benchmark 2010
              Data Collection    IIMB 2009: reference ontology      US Census dataset: manually created   DrugBank and Sider datasets:
                                 and ontology #16 with errors       duplicates and data value errors      interlinking between two datasets
                                 in attributes                                                            of the same domain
              Entity Types       imdb:Movie                         foaf:Person                           drugbank:drugs, sider:drugs
              Total Triples      291                                64000                                 14348
              Total Entity IDs   44                                 4000                                  5696
              Total Attributes   9                                  16                                    3
              Total Values       130                                10878                                 8473
              Candidate Links    81                                 72                                    94
              Correct Links      22                                 72                                    66
Evaluation
             [Figure: two panels, IIMB 2009 Dataset and UCI-Adult Dataset. Dataspace Utility Improvement (0% to 100%) plotted against Feedback Iteration (0% to 100%) for the VPI_RULES, CONFIDENCE, and RANDOM strategies.]
Conclusion
             Matching dependencies provide an effective mechanism to:
                    Represent entity matching rules
                    Specify domain specific semantic requirements
                    Measure utility of dataspaces


             Value of perfect information enables effective ranking strategy
              for user feedback

             In the three datasets, 100% utility improvement was reached
              with under 40% of user feedback
Future Work
             Expand to other data quality problems

             Expand on types of dependencies such as comparable
              dependencies and order dependencies

             Allow multi-user feedback for collaborative data cleaning

More Related Content

What's hot

Web 2.0
Web 2.0Web 2.0
Web 2.0
gypsy
 
Doculabs E Discovery 051710
Doculabs E Discovery 051710Doculabs E Discovery 051710
Doculabs E Discovery 051710
Lane Severson
 
A Brief Tour of Responsability Driven Design
A Brief Tour of Responsability Driven DesignA Brief Tour of Responsability Driven Design
A Brief Tour of Responsability Driven Design
elliando dias
 
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
Stichting ePortfolio Support
 
Osservatorio mobile social networks final report
Osservatorio mobile social networks final reportOsservatorio mobile social networks final report
Osservatorio mobile social networks final report
Laura Cavallaro
 
Clearvale overview oct2011
Clearvale overview oct2011Clearvale overview oct2011
Clearvale overview oct2011
tommydm
 
Swap2010 agave
Swap2010 agaveSwap2010 agave
Swap2010 agave
juanaya
 
Introduction to google analytics
Introduction to google analyticsIntroduction to google analytics
Introduction to google analytics
Jeff Wisniewski
 

What's hot (20)

Social Software They'll Love to Use
Social Software They'll Love to UseSocial Software They'll Love to Use
Social Software They'll Love to Use
 
Web 2.0
Web 2.0Web 2.0
Web 2.0
 
Doculabs E Discovery 051710
Doculabs E Discovery 051710Doculabs E Discovery 051710
Doculabs E Discovery 051710
 
A Brief Tour of Responsability Driven Design
A Brief Tour of Responsability Driven DesignA Brief Tour of Responsability Driven Design
A Brief Tour of Responsability Driven Design
 
Transitioning web application frameworks towards the Semantic Web (master the...
Transitioning web application frameworks towards the Semantic Web (master the...Transitioning web application frameworks towards the Semantic Web (master the...
Transitioning web application frameworks towards the Semantic Web (master the...
 
Self-service Linked Government Data
Self-service Linked Government DataSelf-service Linked Government Data
Self-service Linked Government Data
 
One-stop shop for software development information
One-stop shop for software development informationOne-stop shop for software development information
One-stop shop for software development information
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
 
Leveraging existing Web Frameworks for a SIOC explorer (Scripting for the Sem...
Leveraging existing Web Frameworks for a SIOC explorer (Scripting for the Sem...Leveraging existing Web Frameworks for a SIOC explorer (Scripting for the Sem...
Leveraging existing Web Frameworks for a SIOC explorer (Scripting for the Sem...
 
Understanding Composite Web Applications with SharePoint 2010
Understanding Composite Web Applications with SharePoint 2010Understanding Composite Web Applications with SharePoint 2010
Understanding Composite Web Applications with SharePoint 2010
 
Presentation of current research: distributed architecture for recommendation...
Presentation of current research: distributed architecture for recommendation...Presentation of current research: distributed architecture for recommendation...
Presentation of current research: distributed architecture for recommendation...
 
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
 
Aberdeen ppt-iam integrated-db-06 20120412
Aberdeen ppt-iam integrated-db-06 20120412Aberdeen ppt-iam integrated-db-06 20120412
Aberdeen ppt-iam integrated-db-06 20120412
 
Osservatorio mobile social networks final report
Osservatorio mobile social networks final reportOsservatorio mobile social networks final report
Osservatorio mobile social networks final report
 
Clearvale overview oct2011
Clearvale overview oct2011Clearvale overview oct2011
Clearvale overview oct2011
 
Swap2010 agave
Swap2010 agaveSwap2010 agave
Swap2010 agave
 
Dynamic Open Semantic Service Networks
Dynamic Open Semantic Service NetworksDynamic Open Semantic Service Networks
Dynamic Open Semantic Service Networks
 
Introduction to google analytics
Introduction to google analyticsIntroduction to google analytics
Introduction to google analytics
 
Advanced Fuzzy Logic Based Image Watermarking Technique for Medical Images
Advanced Fuzzy Logic Based Image Watermarking Technique for Medical ImagesAdvanced Fuzzy Logic Based Image Watermarking Technique for Medical Images
Advanced Fuzzy Logic Based Image Watermarking Technique for Medical Images
 
System Center webinar
System Center webinarSystem Center webinar
System Center webinar
 

Viewers also liked

Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
Andre Freitas
 
A Multi-armed Bandit Approach to Online Spatial Task Assignment
A Multi-armed Bandit Approach to Online Spatial Task AssignmentA Multi-armed Bandit Approach to Online Spatial Task Assignment
A Multi-armed Bandit Approach to Online Spatial Task Assignment
Umair ul Hassan
 

Viewers also liked (7)

A Capability Requirements Approach for Predicting Worker Performance in Crowd...
A Capability Requirements Approach for Predicting Worker Performance in Crowd...A Capability Requirements Approach for Predicting Worker Performance in Crowd...
A Capability Requirements Approach for Predicting Worker Performance in Crowd...
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
 
Effects of Expertise Assessment on the Quality of Task Routing in Human Compu...
Effects of Expertise Assessment on the Quality of Task Routing in Human Compu...Effects of Expertise Assessment on the Quality of Task Routing in Human Compu...
Effects of Expertise Assessment on the Quality of Task Routing in Human Compu...
 
A Collaborative Approach for Metadata Management for Internet of Things
A Collaborative Approach for Metadata Management for Internet of ThingsA Collaborative Approach for Metadata Management for Internet of Things
A Collaborative Approach for Metadata Management for Internet of Things
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
 
A Multi-armed Bandit Approach to Online Spatial Task Assignment
A Multi-armed Bandit Approach to Online Spatial Task AssignmentA Multi-armed Bandit Approach to Online Spatial Task Assignment
A Multi-armed Bandit Approach to Online Spatial Task Assignment
 
Researh toolbox - Data analysis with python
Researh toolbox  - Data analysis with pythonResearh toolbox  - Data analysis with python
Researh toolbox - Data analysis with python
 

Similar to Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications

Middleware 2002
Middleware 2002Middleware 2002
Middleware 2002
eaiti
 
A Multidimensional Semantic Space for Data Model Independent Queries over RDF...
A Multidimensional Semantic Space for Data Model Independent Queries over RDF...A Multidimensional Semantic Space for Data Model Independent Queries over RDF...
A Multidimensional Semantic Space for Data Model Independent Queries over RDF...
Andre Freitas
 
Identity access and privacy in the new hybrid enterprise slides
Identity access and privacy in the new hybrid enterprise slidesIdentity access and privacy in the new hybrid enterprise slides
Identity access and privacy in the new hybrid enterprise slides
CA API Management
 

Similar to Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications (20)

E2.0 - Next Generation Portal and Content Management
E2.0 - Next Generation Portal and Content ManagementE2.0 - Next Generation Portal and Content Management
E2.0 - Next Generation Portal and Content Management
 
Layer 7 Mobile Security Workshop with CA Technologies and Forrester Research ...
Layer 7 Mobile Security Workshop with CA Technologies and Forrester Research ...Layer 7 Mobile Security Workshop with CA Technologies and Forrester Research ...
Layer 7 Mobile Security Workshop with CA Technologies and Forrester Research ...
 
A distributional structured semantic space for querying rdf graph data
A distributional structured semantic space for querying rdf graph dataA distributional structured semantic space for querying rdf graph data
A distributional structured semantic space for querying rdf graph data
 
Agent Technology
Agent Technology Agent Technology
Agent Technology
 
Agent Technology Presentation
Agent Technology PresentationAgent Technology Presentation
Agent Technology Presentation
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
 
Northridge Webinar Share Point 2010 Public Web
Northridge Webinar Share Point 2010 Public WebNorthridge Webinar Share Point 2010 Public Web
Northridge Webinar Share Point 2010 Public Web
 
Service Oriented Architecture (SOA) [1/5] : Introduction to SOA
Service Oriented Architecture (SOA) [1/5] : Introduction to SOAService Oriented Architecture (SOA) [1/5] : Introduction to SOA
Service Oriented Architecture (SOA) [1/5] : Introduction to SOA
 
Enabling agility with continuous integration testing
Enabling agility with continuous integration testingEnabling agility with continuous integration testing
Enabling agility with continuous integration testing
 
Middleware 2002
Middleware 2002Middleware 2002
Middleware 2002
 
Innovations in Data Grid Technology with Oracle Coherence
Innovations in Data Grid Technology with Oracle CoherenceInnovations in Data Grid Technology with Oracle Coherence
Innovations in Data Grid Technology with Oracle Coherence
 
A Multidimensional Semantic Space for Data Model Independent Queries over RDF...
A Multidimensional Semantic Space for Data Model Independent Queries over RDF...A Multidimensional Semantic Space for Data Model Independent Queries over RDF...
A Multidimensional Semantic Space for Data Model Independent Queries over RDF...
 
BDI 9/16/09 B2B Social Communications Case Studies Conference - Deloitte
BDI 9/16/09 B2B Social Communications Case Studies Conference - DeloitteBDI 9/16/09 B2B Social Communications Case Studies Conference - Deloitte
BDI 9/16/09 B2B Social Communications Case Studies Conference - Deloitte
 
Conférence Open Data par où commencer ? "How to achieve interoperability?" E....
Conférence Open Data par où commencer ? "How to achieve interoperability?" E....Conférence Open Data par où commencer ? "How to achieve interoperability?" E....
Conférence Open Data par où commencer ? "How to achieve interoperability?" E....
 
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebMulti-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
 
Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010
Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010
Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010
 
Identity access and privacy in the new hybrid enterprise slides
Identity access and privacy in the new hybrid enterprise slidesIdentity access and privacy in the new hybrid enterprise slides
Identity access and privacy in the new hybrid enterprise slides
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
Intranet 2.0 - Integrating Enterprise 2.0 into your corporate intranet
Intranet 2.0 - Integrating Enterprise 2.0 into your corporate intranetIntranet 2.0 - Integrating Enterprise 2.0 into your corporate intranet
Intranet 2.0 - Integrating Enterprise 2.0 into your corporate intranet
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semantic
 

Recently uploaded

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 

Recently uploaded (20)

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications

  • 1. Digital Enterprise Research Institute www.deri.ie Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications Umair ul Hassan, Sean O’Riain, Edward Curry Digital Enterprise Research Institute National University of Ireland, Galway Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  • 2. Outline
      - Motivation & Problem Space
          - Identity Resolution on the Linked Open Data (LOD) Web
      - Proposed Approach
          - LOD Application Architecture
          - How it relates to existing works
      - Evaluation
      - Conclusion & Future Work
  • 3. Overview
      - Identity Resolution in the Linked Open Data Web
          - Real-world entities have multiple identifiers in LOD
          - Identity resolution links have associated uncertainty
          - LOD applications require user verification of links
      - Problem
          - Feedback for all links is infeasible for large datasets
          - LOD applications have domain-specific utility of links
      - Proposed Approach
          - Leverages matching dependencies to define domain-specific requirements of identity resolution
          - Ranks identity resolution links according to value of perfect information
  • 4. Linked Open Data (LOD)
      - Expose and interlink datasets on the Web
      - Using URIs to identify “things” in your data
      - Using a graph representation (RDF) to describe URIs
      - Vision: the Web as a huge graph database
      - [Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
  • 5. Linked Data Example
      - [Figure labels: identity resolution links; multiple identifiers]
  • 6. Identity Resolution in LOD
      - Identity resolution is required for consolidation of data in applications consuming LOD
      - Three sources of identity resolution links
          - Provided by data publishers (e.g. dbpedia.org)
          - Generated by the consumer through tools (e.g. Silk, SERIMI, RiMOM)
          - Maintained by third-party web services (e.g. sameas.org)
      - Uncertainty associated with links
          - Due to multiple interpretations of identity equivalence
          - Due to characteristics of (similarity-based) link generation algorithms
  • 7. Identity Resolution Problem
      - User feedback for uncertain links
          - Verify uncertain identity resolution links with users/experts
          - Improve quality of entity consolidation
      - Challenges
          - Domain-specific semantic requirements: how to define domain-specific quality requirements for Linked Data applications?
          - Limited user attention: how to rank candidate links according to their benefit, to maximize the utility of user feedback?
  • 8. Identity Resolution Problem
      - User feedback for uncertain links
          - Verify uncertain identity resolution links with users/experts
          - Improve quality of entity consolidation
      - Proposed Approach
          - Domain-specific semantic requirements: leverage matching dependencies
          - Limited user attention: employ value of perfect information theory
  • 9. LOD Application Architecture
      - [Architecture diagram. Components: Utility Module, Feedback Module, Consolidation Module. Labels: Candidate Links, Questions, Rules, Feedback, Matching Dependencies, Utility Improvement, Ranked Feedback Tasks]
      - Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition), 1-136. Morgan & Claypool.
  • 10. Related Work
      - Jeffery et al., “Pay-as-you-go user feedback for dataspace systems,” in Proceedings of the 2008 ACM SIGMOD Conference, 2008, pp. 847-860.
      - Utility:
          - Defined in terms of the cardinality of query results on the dataspace
          - A general metric, not suitable for application-specific data quality
      - Assumption:
          - Availability of global query statistics, which is problematic for Linked Open Data
  • 11. Proposed Approach
      - Domain-Specific Utility
          - Define utility in terms of user-specified rules, i.e. matching dependencies
          - Rank candidate links for user feedback according to value of perfect information
      - Assumptions
          - Matching dependencies are either provided by the user or generated through existing tools
          - Utility is based on the satisfaction ratio of dependencies in the dataspace
  • 12. Proposed Approach
      - Matching Dependencies
      - Matching Rule
      - Example
      - Utility of rule
      - Value of Perfect Information
          $g(m_k) = U(D_{m_k}, M \setminus \{m_k\})\, p_k + U(D_{\neg m_k}, M \setminus \{m_k\})\,(1 - p_k) - U(D, M)$
  • 13. Evaluation
      - Measure the change in utility of a dataspace according to matching rules after a specific number of feedback iterations
      - Candidate links generated by the Silk framework
  • 14. Evaluation
      - Datasets
          Property         | IIMB 2009 Dataset                                 | UCI-Adult Dataset                      | Drug Dataset
          Data Source      | Instance Matching Benchmark 2009                  | UCI Machine Learning Repository        | Instance Matching Benchmark 2010
          Data Collection  | IIMB 2009                                         | US Census dataset                      | DrugBank and Sider datasets
                           | Reference ontology                                | Manually created duplicates and errors | Interlinking between two datasets of same domain
                           | Ontology #16 with errors in data value attributes |                                        |
          Entity Types     | imdb:Movie                                        | foaf:Person                            | drugbank:drugs, sider:drugs
          Total Triples    | 291                                               | 64000                                  | 14348
          Total Entity IDs | 44                                                | 4000                                   | 5696
          Total Attributes | 9                                                 | 16                                     | 3
          Total Values     | 130                                               | 10878                                  | 8473
          Candidate Links  | 81                                                | 72                                     | 94
          Correct Links    | 22                                                | 72                                     | 66
  • 15. Evaluation
      - [Figure: two line charts, "IIMB 2009 Dataset" and "UCI-Adult Dataset". x-axis: Feedback Iteration (0% to 100%); y-axis: Dataspace Utility Improvement (0% to 100%); series: VPI_RULES, CONFIDENCE, RANDOM]
  • 16. Conclusion
      - Matching dependencies provide an effective mechanism to:
          - Represent entity matching rules
          - Specify domain-specific semantic requirements
          - Measure utility of dataspaces
      - Value of perfect information enables an effective ranking strategy for user feedback
      - In the three datasets, 100% utility improvement was reached with under 40% of user feedback
  • 17. Future Work
      - Expand to other data quality problems
      - Expand on types of dependencies, such as comparable dependencies and order dependencies
      - Allow multi-user feedback for collaborative data cleaning
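
The ranking idea from slides 11 and 12 can be illustrated with a small sketch. It is not the authors' implementation: the record identifiers, the `Rule` and `Link` classes, the case-insensitive `similar` test, and the simplification that rejecting a link leaves the dataspace unchanged are all assumptions made for illustration. Utility is taken as the fraction of applicable rule instances that are satisfied, and each candidate link is scored as the expected utility after feedback (weighted by the link confidence) minus the current utility.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Link:
    left: str
    right: str
    confidence: float  # assumed estimate that the link is correct


@dataclass(frozen=True)
class Rule:
    similar_on: str   # if two records are similar on this attribute...
    identify_on: str  # ...their values for this attribute should be identified


def similar(a, b):
    # Toy similarity (assumption): case-insensitive string equality.
    return a is not None and b is not None and a.strip().lower() == b.strip().lower()


def utility(records, rules, accepted):
    """Satisfaction ratio: satisfied rule instances / applicable rule instances."""
    applicable = satisfied = 0
    for rule in rules:
        for a, b in combinations(sorted(records), 2):
            if not similar(records[a].get(rule.similar_on),
                           records[b].get(rule.similar_on)):
                continue
            applicable += 1
            va = records[a].get(rule.identify_on)
            vb = records[b].get(rule.identify_on)
            # Satisfied if values already agree or the pair has been consolidated.
            if (va is not None and va == vb) or frozenset((a, b)) in accepted:
                satisfied += 1
    return satisfied / applicable if applicable else 1.0


def vpi(link, records, rules, accepted=frozenset()):
    """Expected utility after feedback on the link, minus current utility."""
    pair = frozenset((link.left, link.right))
    base = utility(records, rules, accepted)
    if_confirmed = utility(records, rules, accepted | {pair})
    if_rejected = utility(records, rules, accepted - {pair})
    p = link.confidence
    return if_confirmed * p + if_rejected * (1 - p) - base


def rank(links, records, rules, accepted=frozenset()):
    # Highest expected benefit first.
    return sorted(links, key=lambda l: vpi(l, records, rules, accepted), reverse=True)


# Hypothetical records from two sources describing the same cities.
records = {
    "src1:galway": {"label": "Galway", "country": "Ireland"},
    "src2:galway": {"label": "galway", "country": "IE"},
    "src1:dublin": {"label": "Dublin", "country": "Ireland"},
    "src2:dublin": {"label": "Dublin City", "country": "IE"},
}
rules = [Rule(similar_on="label", identify_on="country")]
links = [
    Link("src1:dublin", "src2:dublin", 0.9),
    Link("src1:galway", "src2:galway", 0.6),
]

ranked = rank(links, records, rules)
for link in ranked:
    print(link.left, link.right, vpi(link, records, rules))
```

In this toy data the lower-confidence Galway link is ranked first, because confirming it satisfies the only applicable rule, while the higher-confidence Dublin link does not affect any rule. A ranking by confidence alone would order the two links the other way round.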

Editor's Notes

  1. Personal background
  2. Executive summary vs. overview
  3. Executive summary vs. overview
  4. The complete stack of semantic web technologies is based on open standards and protocols. The semantic web technologies focus on the application layer of the internet stack.
  5. Go back to the research question slides. Go back to the workflow and highlight what's needed.
  6. Emphasize blended. Reference SIGMOD.