Pay-as-you-go Reconciliation in Schema Matching Networks
Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1),
Avigdor Gal (3), and Matthias Weidlich (4)
(1) École Polytechnique Fédérale de Lausanne
(2) Université de Rennes 1
(3) Technion – Israel Institute of Technology
(4) Imperial College London
ICDE | 2014
Schema Matching - Where? 
Schema matching is the process of establishing correspondences between the 
attributes of schemas, for the purpose of data integration 
Large enterprises 
Cloud 
WWW 
Collaborative Systems 
P2P Networks
Private PhD Thesis Defense | 12.2013 3 
Schema Matching Network 
A network of schemas that are matched against each other 
Traditional approach: 
Mediated schema 
Our approach: 
Schema Matching Network 
[Figure: schemas S1, S2, S3 shown under each of the two approaches]
Require consensus on schema 
Updated Frequently
Pay-as-you-go Reconciliation
• Reconciliation is the process of asking a human user to give feedback on correspondences.
• Need for reconciliation: automatic techniques use heuristics → results are inherently uncertain
[Figure: schemas s1 (EoverI), s2 (BBC), s3 (DVDizzy) with attributes a1: releaseDate, a2: screeningDate, a3: availabilityDate, a4: productionDate, connected by candidate correspondences c1 to c5]
Attribute names are quite similar → automatic matching tools often fail to identify the correct correspondences.
Pay-as-you-go reconciliation:
• Uncertainty reduction: incrementally improve matching quality with minimal user effort
• Instantiation (selective matching): instantiate a single trusted set of correspondences
System Overview 
General approach:
1. Develop a probabilistic matching network (pSMN) → can measure the overall uncertainty of the network
2. Reduce network uncertainty: guide user feedback with minimal effort
3. Instantiate a selective matching: maintain a good set of attribute correspondences to make the system available at any time
Outline
• Probabilistic Schema Matching Network (pSMN):
  - Model
  - Computation
• Uncertainty Reduction
• Instantiation of the selective matching
• Experimental results
• Conclusion and future work
pSMN - Modeling
• A probabilistic schema matching network is modeled as a tuple $N = \langle S, G_S, \Gamma, C, P \rangle$
  - $S$: set of schemas $s$
  - $G_S$: interaction graph; represents the connections in the network.
  - $C$: set of attribute correspondences
  - $\Gamma$: set of integrity constraints
• An integrity constraint formalizes a natural property that the correspondences should satisfy
  - 1-1 constraint
  - Cycle constraint (transitivity)
  - Etc.
• $P = \{p_c\}$: a set of probabilities. Each probability $p_c$ is associated with a correspondence $c \in C$.
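One way the tuple could be held in code is sketched below. This is only an illustration: the class names, field names, and the helper `violates_one_to_one` are hypothetical and do not come from the slides, and the 1-1 constraint is read here as "an attribute maps to at most one attribute of any other schema". The schemas `A`, `B`, `C` and their attributes are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correspondence:
    # Each side is (schema name, attribute name); hypothetical representation.
    left: tuple[str, str]
    right: tuple[str, str]


@dataclass
class MatchingNetwork:
    schemas: set[str]                          # S
    interaction_graph: set[frozenset[str]]     # G_S: which schema pairs are matched
    constraints: list[str]                     # Gamma, named only (e.g. "one-to-one")
    correspondences: list[Correspondence]      # C
    prob: dict[Correspondence, float] = field(default_factory=dict)  # P


def violates_one_to_one(a: Correspondence, b: Correspondence) -> bool:
    """True if a and b map one attribute to two different attributes of the same schema."""
    for x, y in ((a.left, a.right), (a.right, a.left)):
        for u, v in ((b.left, b.right), (b.right, b.left)):
            if x == u and y != v and y[0] == v[0]:
                return True
    return False


# Made-up example: A.date is matched to two attributes of B, and to one of C.
c1 = Correspondence(("A", "date"), ("B", "released"))
c2 = Correspondence(("A", "date"), ("B", "produced"))
c3 = Correspondence(("A", "date"), ("C", "available"))
net = MatchingNetwork(
    schemas={"A", "B", "C"},
    interaction_graph={frozenset({"A", "B"}), frozenset({"A", "C"})},
    constraints=["one-to-one"],
    correspondences=[c1, c2, c3],
    prob={c1: 0.5, c2: 0.5, c3: 1.0},
)
```

Here `c1` and `c2` conflict under the 1-1 reading, while `c1` and `c3` do not because they target different schemas.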
pSMN - Computing
• Probability of a correspondence
  - Semantics: indicates how likely the correspondence is to be correct
  - Source: integrity constraints and user input. Idea: a correspondence that is involved in many violations has a high chance of being problematic.
  - Computation:
    - Step 1: construct all possible matching instances $\Omega = \{I_1, \dots, I_n\}$. A matching instance is a maximal set of correspondences satisfying all integrity constraints and user input.
    - Step 2: compute the probability as the fraction of matching instances that contain $c$:
      $p_c = \frac{|\{I \in \Omega : c \in I\}|}{|\Omega|}$
  - Challenge: probability computation has a high complexity → we use non-uniform sampling and a view-maintenance technique to approximate the probability efficiently.
• Network uncertainty: quantify the uncertainty of the pSMN based on entropy:
  $H(C) = -\sum_{c \in C} \left[ p_c \log p_c + (1 - p_c) \log(1 - p_c) \right]$
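The two steps and the entropy measure above can be sketched by brute force on a tiny network. This is not the sampling and view-maintenance approach the slides mention; it enumerates every subset and only works for a handful of correspondences. The sketch assumes that all constraint violations can be expressed as conflicting pairs of correspondences (a simplification, since cycle constraints can involve more than two) and that the logarithm is base 2. The conflict set in the example is hypothetical.

```python
from itertools import combinations
from math import log2


def matching_instances(C, conflicts):
    """Step 1: all maximal subsets of C that contain no conflicting pair."""
    def consistent(subset):
        return not any(frozenset(pair) in conflicts for pair in combinations(subset, 2))

    ok = [frozenset(s) for r in range(len(C) + 1)
          for s in combinations(C, r) if consistent(s)]
    # Keep only sets that are not a proper subset of another consistent set.
    return [s for s in ok if not any(s < t for t in ok)]


def probabilities(C, instances):
    """Step 2: p_c = (instances containing c) / (all instances)."""
    return {c: sum(c in I for I in instances) / len(instances) for c in C}


def network_entropy(p):
    """H(C): sum of binary entropies of the correspondence probabilities."""
    def h(x):
        return 0.0 if x in (0.0, 1.0) else -(x * log2(x) + (1 - x) * log2(1 - x))
    return sum(h(x) for x in p.values())


C = ["c1", "c2", "c3", "c4", "c5"]
# Hypothetical violations: c2 and c3 each clash with c4 and c5.
conflicts = {frozenset(p) for p in [("c2", "c4"), ("c2", "c5"), ("c3", "c4"), ("c3", "c5")]}
instances = matching_instances(C, conflicts)
p = probabilities(C, instances)
H = network_entropy(p)
```

With these conflicts there are two matching instances, `{c1, c2, c3}` and `{c1, c4, c5}`, so `c1` has probability 1 and contributes nothing to the entropy, while the other four each have probability 0.5.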
Reduce Network Uncertainty
• Goal: guide the user to give feedback with minimal effort
• Problem (UNCERTAINTY MINIMIZATION WITH LIMITED EFFORT BUDGET). Given a probabilistic matching network $\langle S, G_S, \Gamma, C, P \rangle$ and a budget of user effort $k$, find a set of correspondences $C' \subseteq C$ with $|C'| \le k$, such that the network uncertainty $H(C, P)$ after feedback on $C'$ is minimal.
Approach – Use heuristic ordering
• Idea: present the correspondences with the highest information gain to the user first.
• Information gain: the reduction in network uncertainty obtained by validating $c$:
  $IG(c) = H(C) - H(C \mid c)$
  $H(C \mid c)$: expected network uncertainty when knowing the true value of $c$
Example: two possible solutions, {c1, c2, c3} and {c1, c4, c5}.
• Ask c1 first → the network is unchanged → no uncertainty reduction.
• Ask c2 first → only 1 solution left → the network becomes certain.
[Figure: example network over schemas SA, SB, SC with correspondences c1 to c5, before and after user feedback]
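The example on this slide can be reproduced with a small sketch. It assumes that user feedback on `c` simply filters the set of matching instances (those containing `c` if approved, the rest if disapproved) and that the logarithm is base 2; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
from math import log2


def entropy(C, instances):
    """Network uncertainty H(C) from the fraction of instances containing each c."""
    total = 0.0
    for c in C:
        p = sum(c in I for I in instances) / len(instances)
        if 0.0 < p < 1.0:
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total


def information_gain(c, C, instances):
    """IG(c) = H(C) - H(C | c), with H(C | c) averaged over approve/disapprove."""
    with_c = [I for I in instances if c in I]
    without_c = [I for I in instances if c not in I]
    p = len(with_c) / len(instances)
    expected = 0.0
    if with_c:
        expected += p * entropy(C, with_c)
    if without_c:
        expected += (1 - p) * entropy(C, without_c)
    return entropy(C, instances) - expected


C = ["c1", "c2", "c3", "c4", "c5"]
instances = [{"c1", "c2", "c3"}, {"c1", "c4", "c5"}]
# Ask the user about the correspondences in order of decreasing information gain.
ranking = sorted(C, key=lambda c: information_gain(c, C, instances), reverse=True)
```

Under these assumptions `c1` has zero information gain, because it appears in both solutions, and ends up last in the ranking. Asking about any of the other four leaves a single solution and removes all uncertainty.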
Instantiate a selective matching
• Goal: maintain a single trusted set of correspondences
• Goodness measures for a set of correspondences $I \subseteq C$:
  - Repair distance: the information lost by eliminating some correspondences to guarantee the integrity constraints
    $\Delta(I) = |C \setminus I|$
  - Likelihood: represents the collective correctness of the correspondences:
    $u(I) = \prod_{c \in I} p_c$
• Instantiation problem: given a schema matching network, identify a set of correspondences $I \subseteq C$ with minimal repair distance (w.r.t. $C$) and maximal likelihood.
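Both goodness measures are simple to compute once the probabilities are known. The sketch below uses made-up probabilities. Comparing candidates by repair distance first and likelihood second is an assumption made here for illustration; the slides name both objectives without stating how they are traded off.

```python
from math import prod


def repair_distance(C, I):
    """Delta(I): number of correspondences of C that are left out of I."""
    return len(set(C) - set(I))


def likelihood(I, p):
    """u(I): product of the probabilities of the correspondences in I."""
    return prod(p[c] for c in I)


C = ["c1", "c2", "c3", "c4", "c5"]
p = {"c1": 1.0, "c2": 0.5, "c3": 0.5, "c4": 0.25, "c5": 0.5}  # hypothetical values
I_a = ["c1", "c2", "c3"]
I_b = ["c1", "c4", "c5"]

# Assumed tie-breaking: smaller repair distance first, then larger likelihood.
best = min([I_a, I_b], key=lambda I: (repair_distance(C, I), -likelihood(I, p)))
```

Both candidates drop two correspondences, so the likelihood decides between them: `I_a` scores 0.25 and `I_b` scores 0.125.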
Approach
• The instantiation problem is NP-complete → use a heuristic approach
• Algorithm:
  - Step 1: Initialization – pick a sampled matching instance with minimal repair distance
  - Step 2: Optimization – randomized local search
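[Figure: matching instances (all satisfy all constraints) plotted by repair distance and likelihood. Randomized local search moves from the initial instance I0 (sampled, minimal repair distance) to Iopt (minimal repair distance + maximal likelihood). Legend: non-sampled instance, sampled instance, sampled + minimal repair distance]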
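A much-simplified sketch of the second step is given below. It is a plain hill-climbing variant with random moves, not necessarily the authors' procedure. It assumes constraints are given as conflicting pairs, compares candidates by repair distance first and likelihood second, and uses hypothetical names, conflicts, and probabilities throughout.

```python
import random


def objective(I, C, p):
    """Assumed ordering: fewer dropped correspondences first, then higher likelihood."""
    like = 1.0
    for c in I:
        like *= p[c]
    return (len(set(C) - set(I)), -like)


def repair(I, c, C, p, conflicts):
    """Force c in, drop what conflicts with it, then greedily refill to a maximal set."""
    cand = {x for x in I if frozenset((x, c)) not in conflicts} | {c}
    for x in sorted(C, key=lambda x: -p[x]):
        if x not in cand and all(frozenset((x, y)) not in conflicts for y in cand):
            cand.add(x)
    return frozenset(cand)


def local_search(I0, C, p, conflicts, steps=100, seed=0):
    rng = random.Random(seed)
    best = frozenset(I0)
    for _ in range(steps):
        outside = sorted(set(C) - best)
        if not outside:
            break
        # Random move: try to bring one excluded correspondence into the instance.
        cand = repair(best, rng.choice(outside), C, p, conflicts)
        if objective(cand, C, p) < objective(best, C, p):
            best = cand
    return best


C = ["c1", "c2", "c3", "c4", "c5"]
conflicts = {frozenset(q) for q in [("c2", "c4"), ("c2", "c5"), ("c3", "c4"), ("c3", "c5")]}
p = {"c1": 1.0, "c2": 0.9, "c3": 0.8, "c4": 0.4, "c5": 0.3}  # hypothetical values
I_opt = local_search({"c1", "c4", "c5"}, C, p, conflicts)
```

Starting from `{c1, c4, c5}`, the first accepted move swaps in `c2` or `c3`, and the refill step completes the instance to `{c1, c2, c3}`, which has the same repair distance but a higher likelihood.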
Experiment – Dataset and Setting
• Datasets:
  - Business Partner: schemas from enterprise systems
  - Purchase Order: purchase order e-business schemas
  - University Application Form: schemas from Web interfaces of American university application forms
  - WebForm: schemas from Web forms of different domains
  - Thalia: schemas describing university courses
• Metrics:
  - Precision: measures quality improvement at each user interaction step $i$, with $G$ being the exact match.
    $P_i = |D_i \cap G| / |D_i|$
  - User effort: the percentage of feedback steps relative to the size of the matcher output.
    $E_i = i / |C|$
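The two metrics can be computed as follows. The sketch reads $D_i$ as the set of correspondences kept after step $i$, which the slide does not define explicitly. Returning 0.0 for an empty $D_i$ is a choice made here, and the sample sets are made up.

```python
def precision(D_i, G):
    """P_i = |D_i ∩ G| / |D_i|; defined as 0.0 here when D_i is empty (assumption)."""
    D_i, G = set(D_i), set(G)
    return len(D_i & G) / len(D_i) if D_i else 0.0


def user_effort(i, C):
    """E_i = i / |C|: feedback steps relative to the size of the matcher output."""
    return i / len(C)


C = ["c1", "c2", "c3", "c4", "c5"]   # matcher output (made up)
G = {"c1", "c2", "c3"}               # exact match (made up)
D_2 = {"c1", "c2", "c4", "c5"}       # correspondences kept after 2 feedback steps

P_2 = precision(D_2, G)
E_2 = user_effort(2, C)
```

In this example two of the four kept correspondences are in the exact match, so the precision is 0.5 after spending 40% of the possible user effort.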
Efficiency of guiding strategy on uncertainty reduction
• Goal: compare the guiding and non-guiding strategies on uncertainty reduction
• Evaluation procedure:
  - Increase user effort
  - Upon each user input, measure the network uncertainty and precision
• Interesting finding: the heuristic ordering strategy saves up to 48% of user effort compared to random ordering.
Efficiency of guiding strategy on instantiation
• Goal: compare the guiding and non-guiding strategies on instantiation
• Evaluation procedure:
  - Increase user effort
  - Measure the precision and recall of the instantiated matching
• Interesting finding: the heuristic ordering strategy outperforms the baseline with an average difference of 15% (precision) and 14% (recall).
Conclusions
• We introduce the concepts of schema matching networks and probabilistic matching networks.
• We define a model for pay-as-you-go reconciliation on top of matching networks.
• We propose a guiding technique to reduce network uncertainty and a heuristic approach to instantiate a selective matching.
• In experiments with real-world schemas, our guiding strategy outperforms the baseline:
  - Saving user effort by up to 48%
  - Increasing precision (15%) and recall (14%)
Future Work
• Generalizing pay-as-you-go reconciliation for crowdsourced models:
  - Business process matching
  - Ontology alignment
THANK YOU 
Q&A

More Related Content

What's hot

Human-centric Interpretability for Digital Pathology
Human-centric Interpretability for Digital PathologyHuman-centric Interpretability for Digital Pathology
Human-centric Interpretability for Digital Pathology
Mara Graziani
 
Introduction to Model-Based Machine Learning
Introduction to Model-Based Machine LearningIntroduction to Model-Based Machine Learning
Introduction to Model-Based Machine Learning
Daniel Emaasit
 
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHESTEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
sipij
 
ProbabilisticModeling20080411
ProbabilisticModeling20080411ProbabilisticModeling20080411
ProbabilisticModeling20080411Clay Stanek
 
The Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and PlanningThe Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and Planning
Yoonho Lee
 
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
csandit
 
copy for Gary Chin.
copy for Gary Chin.copy for Gary Chin.
copy for Gary Chin.Teng Xiaolu
 
Evaluation of online learning
Evaluation of online learningEvaluation of online learning
Evaluation of online learning61820_62133
 
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
ijscai
 
Machine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dkuMachine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dku
Seokhyun Yoon
 
Task Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive LearningTask Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive Learning
MLAI2
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
HJ van Veen
 
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
sipij
 
Kernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesKernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian Processes
Sungjoon Choi
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET Journal
 
Ajila (1)
Ajila (1)Ajila (1)
Ajila (1)
akanksha kunwar
 
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
NAVER Engineering
 
Paper id 71201913
Paper id 71201913Paper id 71201913
Paper id 71201913
IJRAT
 
Machine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative ModelsMachine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative Modelsbutest
 

What's hot (19)

Human-centric Interpretability for Digital Pathology
Human-centric Interpretability for Digital PathologyHuman-centric Interpretability for Digital Pathology
Human-centric Interpretability for Digital Pathology
 
Introduction to Model-Based Machine Learning
Introduction to Model-Based Machine LearningIntroduction to Model-Based Machine Learning
Introduction to Model-Based Machine Learning
 
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHESTEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES
 
ProbabilisticModeling20080411
ProbabilisticModeling20080411ProbabilisticModeling20080411
ProbabilisticModeling20080411
 
The Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and PlanningThe Predictron: End-to-end Learning and Planning
The Predictron: End-to-end Learning and Planning
 
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
 
copy for Gary Chin.
copy for Gary Chin.copy for Gary Chin.
copy for Gary Chin.
 
Evaluation of online learning
Evaluation of online learningEvaluation of online learning
Evaluation of online learning
 
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
 
Machine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dkuMachine learning and_neural_network_lecture_slide_ece_dku
Machine learning and_neural_network_lecture_slide_ece_dku
 
Task Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive LearningTask Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive Learning
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...
 
Kernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesKernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian Processes
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
 
Ajila (1)
Ajila (1)Ajila (1)
Ajila (1)
 
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
 
Paper id 71201913
Paper id 71201913Paper id 71201913
Paper id 71201913
 
Machine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative ModelsMachine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative Models
 

Similar to Pay-as-you-go Reconciliation in Schema Matching Networks

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
AllenWu
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference information
Evgeny Frolov
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
Azad public school
 
A scalable collaborative filtering framework based on co-clustering
A scalable collaborative filtering framework based on co-clusteringA scalable collaborative filtering framework based on co-clustering
A scalable collaborative filtering framework based on co-clusteringlau
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender Systems
Aladejubelo Oluwashina
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
Ben Mabey
 
Study and development of methods and tools for testing, validation and verif...
 Study and development of methods and tools for testing, validation and verif... Study and development of methods and tools for testing, validation and verif...
Study and development of methods and tools for testing, validation and verif...
Emilio Serrano
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
IAEME Publication
 
Towards Enabling Probabilistic Databases for Participatory Sensing
Towards Enabling Probabilistic Databases for Participatory SensingTowards Enabling Probabilistic Databases for Participatory Sensing
Towards Enabling Probabilistic Databases for Participatory Sensing
PlanetData Network of Excellence
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
IRJET Journal
 
ann1.pptx
ann1.pptxann1.pptx
ann1.pptx
vipinkmenon1
 
A Review Study OF Movie Recommendation Using Machine Learning
A Review Study OF Movie Recommendation Using Machine LearningA Review Study OF Movie Recommendation Using Machine Learning
A Review Study OF Movie Recommendation Using Machine Learning
IRJET Journal
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation System
IRJET Journal
 
Ai
AiAi
Ai
AiAi
Artificial Intelligence Certification
Artificial Intelligence CertificationArtificial Intelligence Certification
Artificial Intelligence Certification
kartikaryan4
 
Ai
AiAi
Ai
AiAi

Similar to Pay-as-you-go Reconciliation in Schema Matching Networks (20)

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference information
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
A scalable collaborative filtering framework based on co-clustering
A scalable collaborative filtering framework based on co-clusteringA scalable collaborative filtering framework based on co-clustering
A scalable collaborative filtering framework based on co-clustering
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender Systems
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
 
Study and development of methods and tools for testing, validation and verif...
 Study and development of methods and tools for testing, validation and verif... Study and development of methods and tools for testing, validation and verif...
Study and development of methods and tools for testing, validation and verif...
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
 
rerngvit_phd_seminar
rerngvit_phd_seminarrerngvit_phd_seminar
rerngvit_phd_seminar
 
Towards Enabling Probabilistic Databases for Participatory Sensing
Towards Enabling Probabilistic Databases for Participatory SensingTowards Enabling Probabilistic Databases for Participatory Sensing
Towards Enabling Probabilistic Databases for Participatory Sensing
 
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
 
ann1.pptx
ann1.pptxann1.pptx
ann1.pptx
 
A Review Study OF Movie Recommendation Using Machine Learning
A Review Study OF Movie Recommendation Using Machine LearningA Review Study OF Movie Recommendation Using Machine Learning
A Review Study OF Movie Recommendation Using Machine Learning
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation System
 
Ai
AiAi
Ai
 
Ai
AiAi
Ai
 
Artificial Intelligence Certification
Artificial Intelligence CertificationArtificial Intelligence Certification
Artificial Intelligence Certification
 
Ai
AiAi
Ai
 
Ai
AiAi
Ai
 

More from PlanetData Network of Excellence

A Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about TrentinoA Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about TrentinoPlanetData Network of Excellence
 
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching NetworksOn Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
PlanetData Network of Excellence
 
Privacy-Preserving Schema Reuse
Privacy-Preserving Schema ReusePrivacy-Preserving Schema Reuse
Privacy-Preserving Schema Reuse
PlanetData Network of Excellence
 
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamDemo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
PlanetData Network of Excellence
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
PlanetData Network of Excellence
 
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
PlanetData Network of Excellence
 
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatchLinking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
PlanetData Network of Excellence
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMS
PlanetData Network of Excellence
 
CLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data ArchitectureCLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data Architecture
PlanetData Network of Excellence
 
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduceScalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
PlanetData Network of Excellence
 
Data and Knowledge Evolution
Data and Knowledge Evolution  Data and Knowledge Evolution
Data and Knowledge Evolution
PlanetData Network of Excellence
 
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
PlanetData Network of Excellence
 
Access Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract ModelsAccess Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract Models
PlanetData Network of Excellence
 
Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?
PlanetData Network of Excellence
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
PlanetData Network of Excellence
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
PlanetData Network of Excellence
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
PlanetData Network of Excellence
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
PlanetData Network of Excellence
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of Endpoints
PlanetData Network of Excellence
 

More from PlanetData Network of Excellence (20)

Dl2014 slides
Dl2014 slidesDl2014 slides
Dl2014 slides
 
A Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about TrentinoA Contextualized Knowledge Repository for Open Data about Trentino
A Contextualized Knowledge Repository for Open Data about Trentino
 
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching NetworksOn Leveraging Crowdsourcing Techniques for Schema Matching Networks
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
 
Privacy-Preserving Schema Reuse
Privacy-Preserving Schema ReusePrivacy-Preserving Schema Reuse
Privacy-Preserving Schema Reuse
 
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamDemo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Pay-as-you-go Reconciliation in Schema Matching Networks

  • 1. Pay-as-you-go Reconciliation in Schema Matching Networks. Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1), Avigdor Gal (3), and Matthias Weidlich (4). (1) École Polytechnique Fédérale de Lausanne; (2) Université de Rennes 1; (3) Technion – Israel Institute of Technology; (4) Imperial College London.
  • 2. ICDE | 2014. Schema Matching - Where? Schema matching is the process of establishing correspondences between the attributes of schemas for the purpose of data integration. Settings: large enterprises, cloud, WWW, collaborative systems, P2P networks.
  • 3. Private PhD Thesis Defense | 12.2013. Schema Matching Network: a network of schemas that are matched against each other. Traditional approach: mediated schema. Our approach: schema matching network. (Figure: schemas S1, S2, S3 under each approach, annotated "requires consensus on schema" and "updated frequently".)
  • 4. Pay-as-you-go Reconciliation. Reconciliation is the process of asking a human user to give feedback on correspondences. Need for reconciliation: automatic techniques use heuristics, so their results are inherently uncertain. Example: schemas s1: EoverI, s2: BBC, s3: DVDizzy, with attributes a1: releaseDate, a2: screeningDate, a3: availabilityDate, a4: productionDate and candidate correspondences c1 to c5. The attribute names are quite similar, so automatic matching tools often fail to identify the correct correspondences. Pay-as-you-go reconciliation has two parts: uncertainty reduction (incrementally improve matching quality with minimal user effort) and instantiation of a selective matching (instantiate a single trusted set of correspondences).
  • 5. System Overview. General approach: 1. Develop a probabilistic schema matching network (pSMN), which makes it possible to measure the overall uncertainty of the network. 2. Reduce network uncertainty: guide user feedback so that it requires minimal effort. 3. Instantiate a selective matching: maintain a good set of attribute correspondences so that the system is usable at any time.
  • 6. Outline: Probabilistic Schema Matching Network (pSMN): model and computation; uncertainty reduction; instantiation of the selective matching; experimental results; conclusion and future work.
  • 7. pSMN - Modeling. A probabilistic schema matching network is modeled as a tuple $N = \langle S, G_S, \Gamma, C, P \rangle$. $S$: set of schemas $s$. $G_S$: interaction graph, which represents the connections in the network. $C$: set of attribute correspondences. $\Gamma$: set of integrity constraints. An integrity constraint is the formulation of a natural property, e.g. the 1-1 constraint or the cycle constraint (transitivity). $P = \{p_c\}$: set of probabilities; each probability $p_c$ is associated with a correspondence $c \in C$.
  • 8. pSMN - Computing. Probability of a correspondence. Semantics: the probability indicates how likely the correspondence is to be correct. Source: integrity constraints and user input. Idea: a correspondence that is involved in many violations has a high chance of being problematic. Computation: Step 1: construct all possible matching instances $\Omega = \{I_1, \dots, I_n\}$, where a matching instance is a maximal set of correspondences satisfying all integrity constraints and user input. Step 2: compute $p_c$ as the number of matching instances that contain $c$ divided by the number of all possible matching instances, i.e. $p_c = \frac{|\{I \in \Omega : c \in I\}|}{|\Omega|}$. Challenge: probability computation has a high complexity, so we use non-uniform sampling and a view-maintenance technique to approximate the probabilities efficiently. Network uncertainty: quantify the uncertainty of the pSMN based on entropy: $H(C) = -\sum_{c \in C} \left[ p_c \log p_c + (1 - p_c) \log(1 - p_c) \right]$.
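A minimal sketch of the two computations on slide 8, assuming the matching instances are already available as an explicit list (the talk approximates them by sampling instead) and assuming base-2 logarithms. The function names and the toy instances are hypothetical, chosen only for illustration.

```python
from math import log2


def correspondence_probabilities(instances, correspondences):
    """p_c = (# matching instances containing c) / (# matching instances)."""
    total = len(instances)
    return {c: sum(c in inst for inst in instances) / total
            for c in correspondences}


def binary_entropy(p):
    """Entropy of one correspondence; defined as 0 when p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def network_uncertainty(probs):
    """H(C): sum of the per-correspondence binary entropies."""
    return sum(binary_entropy(p) for p in probs.values())


# Toy input (assumption): five candidate correspondences, two matching instances.
correspondences = ["c1", "c2", "c3", "c4", "c5"]
instances = [frozenset({"c1", "c2", "c3"}), frozenset({"c1", "c4", "c5"})]

probs = correspondence_probabilities(instances, correspondences)
uncertainty = network_uncertainty(probs)
print(probs, uncertainty)
```

In this toy input, c1 appears in every instance and so contributes no uncertainty, while each of the other four correspondences appears in half of the instances and contributes one bit.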
  • 9. Outline: Probabilistic Schema Matching Network (pSMN): model and computation; uncertainty reduction; instantiation of the selective matching; experimental results; conclusion and future work.
  • 10. Reduce Network Uncertainty. Goal: guide the user to give feedback with minimal effort. Problem (UNCERTAINTY MINIMIZATION WITH LIMITED EFFORT BUDGET): given a probabilistic matching network $\langle S, G_S, \Gamma, C, P \rangle$ and a budget of user effort $k$, find a set of correspondences $C' \subseteq C$ with $|C'| \le k$ such that the network uncertainty $H(C, P)$ after the user has validated $C'$ is minimal.
  • 11. Approach - Use heuristic ordering. Idea: present the user with the correspondences of highest information gain first. Information gain: the reduction in uncertainty between before and after validation, $IG(c) = H(C) - H(C \mid c)$, where $H(C \mid c)$ is the expected network uncertainty when the true value of $c$ is known. Example: there are two possible solutions, {c1, c2, c3} and {c1, c4, c5}. Asking about c1 first leaves the network unchanged, so there is no uncertainty reduction. Asking about c2 first leaves only one solution, so the network becomes certain. (Figure: three network diagrams over schemas SA, SB, SC showing the correspondences c1 to c5.)
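The example on slide 11 can be reproduced with a short sketch. It assumes that all matching instances are equally likely, that learning the value of c amounts to filtering the instance list, and that the expected uncertainty weights the two outcomes by $p_c$ and $1 - p_c$. These assumptions, the base-2 logarithm, and all function names are illustrative rather than taken from the talk.

```python
from math import log2


def entropy(instances, correspondences):
    """Network uncertainty H(C) over equally likely matching instances."""
    if not instances:
        return 0.0
    total = 0.0
    for c in correspondences:
        p = sum(c in inst for inst in instances) / len(instances)
        if 0.0 < p < 1.0:
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total


def information_gain(c, instances, correspondences):
    """IG(c) = H(C) - H(C | c), with H(C | c) averaged over both answers."""
    with_c = [inst for inst in instances if c in inst]
    without_c = [inst for inst in instances if c not in inst]
    p = len(with_c) / len(instances)
    expected = (p * entropy(with_c, correspondences)
                + (1 - p) * entropy(without_c, correspondences))
    return entropy(instances, correspondences) - expected


# The two possible solutions from the slide.
correspondences = ["c1", "c2", "c3", "c4", "c5"]
instances = [frozenset({"c1", "c2", "c3"}), frozenset({"c1", "c4", "c5"})]

gains = {c: information_gain(c, instances, correspondences)
         for c in correspondences}
best = max(gains, key=gains.get)  # first correspondence with maximal gain
print(gains, best)
```

Under these assumptions, c1 has zero gain because it belongs to both solutions, while validating any of c2 to c5 removes all remaining uncertainty.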
  • 12. Instantiate a selective matching. Goal: maintain a single trusted set of correspondences. Goodness measures for a set of correspondences $I \subseteq C$: Repair distance: the information lost by eliminating some correspondences to guarantee the integrity constraints, $\Delta(I) = |C \setminus I|$. Likelihood: the collective correctness of the correspondences, $u(I) = \prod_{c \in I} p_c$. Instantiation problem: given a schema matching network, identify a set of correspondences $I \subseteq C$ with minimal repair distance (w.r.t. $C$) and maximal likelihood.
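The two goodness measures on slide 12 translate directly into code. This is a small sketch with hypothetical function names and made-up probabilities, not the authors' implementation.

```python
from math import prod


def repair_distance(instance, correspondences):
    """Delta(I) = |C \\ I|: number of candidate correspondences dropped."""
    return len(set(correspondences) - set(instance))


def likelihood(instance, probs):
    """u(I) = product of p_c over all correspondences c in I."""
    return prod(probs[c] for c in instance)


# Illustrative probabilities (assumption, not data from the talk).
probs = {"c1": 1.0, "c2": 0.5, "c3": 0.5, "c4": 0.5, "c5": 0.5}
instance = {"c1", "c2", "c3"}

distance = repair_distance(instance, probs.keys())
score = likelihood(instance, probs)
print(distance, score)
```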
  • 13. Approach. The instantiation problem is NP-complete, so we use a heuristic approach. Algorithm: Step 1 (initialization): pick a sampled matching instance with minimal repair distance. Step 2 (optimization): randomized local search. (Figure: matching instances, which satisfy all constraints, plotted by repair distance and likelihood; the legend distinguishes non-sampled instances, sampled instances, and sampled instances with minimal repair distance; randomized local search moves from the initial instance I0 to Iopt, which has minimal repair distance and maximal likelihood.)
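The slide names the two steps but not their details, so the following is only a sketch under stated assumptions: integrity constraints are simplified to a set of pairwise conflicts, a local move adds one correspondence and drops whatever conflicts with it, and candidates are compared lexicographically by repair distance and then likelihood. All names, the move rule, and the toy data are hypothetical.

```python
import random
from math import prod


def score(instance, correspondences, probs):
    """Higher is better: smaller repair distance first, then larger likelihood."""
    distance = len(correspondences) - len(instance)
    return (-distance, prod(probs[c] for c in instance))


def instantiate(sampled, correspondences, probs, conflicts, steps=200, seed=0):
    rng = random.Random(seed)
    # Step 1: start from the best sampled instance (minimal repair distance).
    current = max(sampled, key=lambda inst: score(inst, correspondences, probs))
    # Step 2: randomized local search (assumed move: add one, drop its conflicts).
    for _ in range(steps):
        outside = sorted(set(correspondences) - current)
        if not outside:
            break
        c = rng.choice(outside)
        clashing = {d for d in current if frozenset({c, d}) in conflicts}
        candidate = frozenset((current - clashing) | {c})
        if score(candidate, correspondences, probs) > score(current, correspondences, probs):
            current = candidate
    return current


# Toy data (assumption): c2 conflicts with c4, and c3 conflicts with c5.
correspondences = ["c1", "c2", "c3", "c4", "c5"]
probs = {"c1": 0.9, "c2": 0.8, "c3": 0.7, "c4": 0.3, "c5": 0.2}
conflicts = {frozenset({"c2", "c4"}), frozenset({"c3", "c5"})}
sampled = [frozenset({"c1", "c4", "c5"}), frozenset({"c1", "c2", "c5"})]

result = instantiate(sampled, correspondences, probs, conflicts)
print(sorted(result))
```

In this toy run both sampled instances have the same repair distance, so the search starts from the more likely one and then swaps c5 for c3, which raises the likelihood without changing the repair distance.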
  • 14. Outline: Probabilistic Schema Matching Network (pSMN): model and computation; uncertainty reduction; instantiation of the selective matching; experimental results; conclusion and future work.
  • 15. Experiment - Dataset and Setting. Datasets: Business Partner: schemas from enterprise systems. Purchase Order: purchase order e-business schemas. University Application Form: schemas from Web interfaces of American university application forms. WebForm: schemas from Web forms of different domains. Thalia: schemas describing university courses. Metrics: Precision: measures quality improvement at each user interaction step $i$, with $G$ being the exact match: $P_i = |D_i \cap G| / |D_i|$. User effort: the percentage of feedback steps relative to the size of the matcher output: $E_i = i / |C|$.
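The two metrics on slide 15 can be sketched as follows. The slide does not define $D_i$; the code assumes it is the set of correspondences held at step $i$. The function names and the example sets are hypothetical.

```python
def precision(derived, exact_match):
    """P_i = |D_i ∩ G| / |D_i|; returns 0.0 for an empty D_i (assumption)."""
    if not derived:
        return 0.0
    return len(derived & exact_match) / len(derived)


def user_effort(step, matcher_output_size):
    """E_i = i / |C|: feedback steps relative to the matcher output size."""
    return step / matcher_output_size


# Illustrative sets (not experimental data from the talk).
exact_match = {"c1", "c2", "c3"}    # G
derived = {"c1", "c2", "c4"}        # D_i, assumed state after step i
candidates = {"c1", "c2", "c3", "c4", "c5"}  # C, the matcher output

p_i = precision(derived, exact_match)
e_i = user_effort(2, len(candidates))
print(p_i, e_i)
```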
  • 16. Efficiency of guiding strategy on uncertainty reduction. Goal: compare the guiding and non-guiding strategies on uncertainty reduction. Evaluation procedure: increase the user effort step by step and, upon each user input, measure the network uncertainty and precision. Interesting finding: the heuristic ordering strategy saves up to 48% of user effort compared to random ordering.
  • 17. Efficiency of guiding strategy on instantiation. Goal: compare the guiding and non-guiding strategies on instantiation. Evaluation procedure: increase the user effort step by step and measure the precision and recall of the instantiated matching. Interesting finding: the heuristic ordering strategy outperforms the baseline with an average difference of 15% (precision) and 14% (recall).
  • 18. Conclusions. We introduce the concepts of schema matching networks and probabilistic matching networks. We define a model for pay-as-you-go reconciliation on top of matching networks. We propose a guiding technique to reduce network uncertainty and a heuristic approach to instantiate a selective matching. In experiments with real-world schemas, our guiding strategy outperforms the baseline: it saves up to 48% of user effort and increases precision (15%) and recall (14%).
  • 19. Future Work. Generalizing pay-as-you-go reconciliation for crowdsourced models: business process matching, ontology alignment.
  • 20. Thank you. Q&A.