HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal
Motivation (1)
Due to the semi-structured nature of RDF,
incomplete values cannot be easily detected.
Motivation (2)
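PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?drug WHERE {
  ?drug rdf:type dbo:Drug .
  ?drug dbo:atcPrefix "C01" .
  ?drug dbp:routesOfAdministration ?route .
}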
Retrieve drugs that are annotated with the prefix “C01” (Cardiac Therapy) in the Anatomical
Therapeutic Chemical (ATC) classification system and which have known routes of administration.
47 drugs
(v. 2016)
Motivation (2)
98 drugs
(v. 2016)
(There are 48 drugs without
routes of administration)
Motivation (3)
Examples of drugs (with ATC prefix “C01”) with no routes of administration in DBpedia
All images licensed under Fair use via Wikipedia.
• dbr:Acadesine: intravenous administration, for treating leukemia (source: PubChem); also used in doping in sports (source: PubMed).
• dbr:Acetyldigitoxin: no route found.
• dbr:Dimetofrine: no route found.
• dbr:Flecainide: oral administration (source: DrugBank).
(v. 2016)
Problem Definition
Given an RDF dataset D and a SPARQL query Q against D, let D* be the
virtual dataset that contains all the data that should be in D.
P1) Identifying portions P of Q that yield missing values.
P2) Resolving missing values.
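$[\![P]\!]_{D} \subset [\![P]\!]_{D^*}$, i.e., there are solution mappings $\mu$ with $\mu \notin [\![P]\!]_{D}$ (they do not belong to the solution of $P$ over $D$) but $\mu \in [\![P]\!]_{D^*}$ (they should belong to the solution of $P$).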
Our Approach:
HARE (Hybrid SPARQL Query Engine)
SELECT DISTINCT ?drug WHERE {
  ?drug rdf:type dbo:Drug .
  ?drug dbo:atcPrefix "C01" .
  ?drug dbp:routesOfAdministration ?route .
}
HARE Overview
[Figure: HARE architecture with the Query Engine, the RDF Completeness Model, the Microtask Manager, the Crowd Knowledge bases (CKB+, CKB–, CKB~), the dataset D, and the threshold τ. Example mappings: {?drug → dbr:Ibuprofen}, {?drug → dbr:Flecainide}, {?drug → dbr:Acadesine}.]
HARE
• A hybrid machine/human SPARQL query engine that is able to increase
the size of query answers.
• Based on a novel RDF completeness model, HARE implements query
optimization and execution techniques:
P1) Identifying portions of queries that yield missing values.
• HARE resorts to microtask crowdsourcing:
P2) Resolving missing values.
RDF Completeness Model (1)
• Relies on the Local Closed World Assumption (LCWA).
• Estimates the local completeness of resources with respect to other
resources in an RDF graph that belong to the same classes.
[Figure: Local completeness. dbr:Procainamide, dbr:Flecainide, and dbr:Bretylium are linked to dbo:Drug via rdf:type; dbr:Procainamide has three dbp:routesOfAdministration values, dbr:Bretylium has two, and dbr:Flecainide has none.]
RDF Completeness Model (2)
① Multiplicity of an RDF Resource
Number of objects that a resource has for a certain predicate.
MOD(dbr:Procainamide, dbp:routesOfAdministration) = 3
dbr:Procainamide dbp:routesOfAdministration dbr:Intravenous .
dbr:Procainamide dbp:routesOfAdministration dbr:Intramuscular_injection .
dbr:Procainamide dbp:routesOfAdministration dbr:Oral_administration .
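A minimal Python sketch of the multiplicity, assuming the graph is held as an in-memory list of (subject, predicate, object) tuples of prefixed names. The function name `mod` and this data layout are illustrative assumptions, not HARE's implementation. Objects are deduplicated, matching the "distinct objects" wording used for AMOD.

```python
# Toy fragment of the graph on this slide: (subject, predicate, object) triples.
TRIPLES = [
    ("dbr:Procainamide", "dbp:routesOfAdministration", "dbr:Intravenous"),
    ("dbr:Procainamide", "dbp:routesOfAdministration", "dbr:Intramuscular_injection"),
    ("dbr:Procainamide", "dbp:routesOfAdministration", "dbr:Oral_administration"),
]


def mod(triples, subject, predicate):
    """Multiplicity: number of distinct objects `subject` has for `predicate`."""
    return len({o for s, p, o in triples if s == subject and p == predicate})


print(mod(TRIPLES, "dbr:Procainamide", "dbp:routesOfAdministration"))  # 3
print(mod(TRIPLES, "dbr:Flecainide", "dbp:routesOfAdministration"))    # 0
```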
RDF Completeness Model (3)
② Aggregated Multiplicity of a Class
Given a predicate, the median number of distinct objects that the resources
belonging to a class have for that predicate.
AMOD(dbo:Drug, dbp:routesOfAdministration) = 3
MOD(dbr:Procainamide, dbp:routesOfAdministration) = 3
MOD(dbr:Bretylium, dbp:routesOfAdministration) = 2
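The aggregation can be sketched with Python's `statistics.median`, assuming the caller has already computed MOD for every member of the class. Which members to include (for example, whether resources with MOD = 0 count) is not stated on the slide, so this sketch leaves that choice to the caller. Note that the slide shows only two dbo:Drug members; the median of {3, 2} alone is 2.5, so the value AMOD = 3 is taken over the whole class, not just these two.

```python
from statistics import median


def amod(mod_values):
    """Aggregated multiplicity: median of the MOD values of a class's members.

    `mod_values` holds MOD(s, p) for the resources s of the class that the
    caller chose to include (an assumption; the slide does not specify this).
    """
    return median(mod_values)


print(amod([3, 2]))     # 2.5
print(amod([3, 2, 0]))  # 2
```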
RDF Completeness Model (4)
③ Local Completeness of an RDF Resource
Given a predicate, the local completeness of an RDF resource is its multiplicity for that
predicate (①) relative to the aggregated multiplicity of the classes it belongs to (②).
CompD(dbr:Procainamide | dbp:routesOfAdministration) = 3/3 = 1
CompD(dbr:Bretylium | dbp:routesOfAdministration) = 2/3 ≈ 0.67
CompD(dbr:Flecainide | dbp:routesOfAdministration) = 0/3 = 0
(Numerators computed in ①, denominators computed in ②.)
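The three values above follow from dividing MOD by AMOD. A small Python sketch using exact fractions; capping the ratio at 1 and treating AMOD = 0 as "complete" are assumptions of this sketch, not rules stated on the slide.

```python
from fractions import Fraction


def local_completeness(mod_value, amod_value):
    """Comp_D(s | p) sketched as MOD(s, p) / AMOD(C, p).

    Assumptions: the ratio is capped at 1 (a resource with more values than
    its class median counts as complete), and AMOD = 0 yields 1.
    """
    if amod_value == 0:
        return Fraction(1)
    return min(Fraction(1), Fraction(mod_value) / Fraction(amod_value))


for name, m in [("dbr:Procainamide", 3), ("dbr:Bretylium", 2), ("dbr:Flecainide", 0)]:
    print(name, local_completeness(m, 3))
# dbr:Procainamide 1
# dbr:Bretylium 2/3
# dbr:Flecainide 0
```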
Crowd Knowledge Bases (1)
• The knowledge collected from the crowd is captured in three KBs: CKB+, CKB–, and CKB~.
• CKB+, CKB–, and CKB~ are fuzzy RDF datasets composed of 4-tuples
(subject, predicate, object, membership_degree), where (subject, predicate, object) is an RDF triple.
Crowd Knowledge Bases (2)
Types of Crowd Knowledge Bases
CKB+: “Flecainide is administered orally.”
(dbr:Flecainide, dbp:routesOfAdministration, dbr:Oral_administration, 0.9)
CKB–: “Flecainide does not have a (known) route of administration.”
(dbr:Flecainide, dbp:routesOfAdministration, _:o1, 0.05)
CKB~: “I am not sure if Acadesine has a route of administration.”
(dbr:Acadesine, dbp:routesOfAdministration, _:o2, 0.78)
Measures: Contradiction (C), Unknownness (U).
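One way to hold these fuzzy 4-tuples in memory is a `NamedTuple` per entry and one list per knowledge base. This is a sketch under that assumption; the helper `degrees` is hypothetical and simply looks up the membership degrees stored for a (subject, predicate) pair.

```python
from typing import List, NamedTuple


class FuzzyTriple(NamedTuple):
    """One CKB entry: an RDF triple plus its membership degree in [0, 1]."""
    subject: str
    predicate: str
    object: str
    membership_degree: float


# The example entries from the slide; blank nodes (_:o1, _:o2) appear where
# the crowd gave no concrete object.
CKB_PLUS = [FuzzyTriple("dbr:Flecainide", "dbp:routesOfAdministration", "dbr:Oral_administration", 0.9)]
CKB_MINUS = [FuzzyTriple("dbr:Flecainide", "dbp:routesOfAdministration", "_:o1", 0.05)]
CKB_UNKNOWN = [FuzzyTriple("dbr:Acadesine", "dbp:routesOfAdministration", "_:o2", 0.78)]


def degrees(kb: List[FuzzyTriple], subject: str, predicate: str) -> List[float]:
    """Membership degrees stored in `kb` for a (subject, predicate) pair."""
    return [t.membership_degree for t in kb if t.subject == subject and t.predicate == predicate]


print(degrees(CKB_PLUS, "dbr:Flecainide", "dbp:routesOfAdministration"))     # [0.9]
print(degrees(CKB_UNKNOWN, "dbr:Flecainide", "dbp:routesOfAdministration"))  # []
```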
Query Engine (1)
• The engine computes the probability of crowdsourcing a triple pattern t in
query Q, denoted PCROWD(t).
• If PCROWD(t) is greater than a user-defined threshold τ, then the query engine
crowdsources the triple pattern t.
• α is a score weight between 0.0 and 1.0.
$P_{CROWD}(t) = \alpha\,(1 - Comp(t)) + (1 - \alpha)\,\max\{\max\{m^{+}, m^{-}\}, \min\{C(t), 1 - U(t)\}\}$
Term labels on the slide: estimated incompleteness $(1 - Comp(t))$, crowd unreliability, and crowd confidence.
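A direct Python transcription of the formula above together with the threshold test. The slide does not define $m^{+}$ and $m^{-}$; this sketch assumes they are the membership degrees of t in CKB+ and CKB–, and it takes C(t) and U(t) as precomputed inputs. The function names are hypothetical.

```python
def p_crowd(comp, m_plus, m_minus, contradiction, unknownness, alpha):
    """P_CROWD(t), mirroring the formula as transcribed from the slide.

    comp:           Comp(t), estimated completeness of triple pattern t.
    m_plus/m_minus: assumed membership degrees of t in CKB+ and CKB-.
    contradiction:  C(t); unknownness: U(t).
    alpha:          score weight in [0.0, 1.0].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0.0, 1.0]")
    crowd_term = max(max(m_plus, m_minus), min(contradiction, 1 - unknownness))
    return alpha * (1 - comp) + (1 - alpha) * crowd_term


def should_crowdsource(p, tau):
    """t is crowdsourced when P_CROWD(t) is greater than the threshold tau."""
    return p > tau


# Flecainide: Comp = 0 and no crowd knowledge collected yet.
p = p_crowd(comp=0.0, m_plus=0.0, m_minus=0.0, contradiction=0.0, unknownness=0.0, alpha=0.5)
print(p, should_crowdsource(p, tau=0.4))  # 0.5 True
```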
Query Engine (2)
• The engine combines mappings obtained from the dataset D and fuzzy
mappings from the crowd stored in CKB+.
• We define a fuzzy set semantics for SPARQL.
From CKB+: ({?drug → dbr:Isoprenaline, ?route → dbr:Intravenous}, 0.94)
From D: {?drug → dbr:Isoprenaline, ?route → dbr:Inhalation}
Theorem: The complexity of computing the mapping set of a SPARQL query under fuzzy set semantics is
the same as under set semantics.
Corollary: The HARE query engine does not increase the time complexity of computing the mapping set of
a SPARQL query.
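The combination of dataset and crowd mappings can be sketched as a fuzzy union over hashable solution mappings. Two choices here are assumptions of this sketch, not stated on the slide: mappings from D receive degree 1.0, and a mapping produced by both sources keeps the higher degree (the common max-based fuzzy union). Names such as `fuzzy_union` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# A solution mapping is a set of (variable, value) bindings; frozenset makes it hashable.
SolutionMapping = FrozenSet[Tuple[str, str]]


def mapping(bindings: Dict[str, str]) -> SolutionMapping:
    """Build a hashable solution mapping from a {variable: value} dict."""
    return frozenset(bindings.items())


def fuzzy_union(crisp: Iterable[SolutionMapping],
                fuzzy: Iterable[Tuple[SolutionMapping, float]]) -> Dict[SolutionMapping, float]:
    """Combine mappings from D (degree 1.0) with fuzzy mappings from CKB+ (max degree wins)."""
    result: Dict[SolutionMapping, float] = {}
    for mu in crisp:
        result[mu] = 1.0
    for mu, degree in fuzzy:
        result[mu] = max(result.get(mu, 0.0), degree)
    return result


from_d = [mapping({"?drug": "dbr:Isoprenaline", "?route": "dbr:Inhalation"})]
from_ckb_plus = [(mapping({"?drug": "dbr:Isoprenaline", "?route": "dbr:Intravenous"}), 0.94)]

answers = fuzzy_union(from_d, from_ckb_plus)
for mu, degree in answers.items():
    print(sorted(mu), degree)
```

Because the union is a single pass over each input, it stays linear in the number of mappings, in line with the complexity result stated above.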
Microtask Manager (1)
• Receives triple patterns to crowdsource.
• Creates human tasks.
• Submits tasks to the crowdsourcing platform.
Example triple pattern to crowdsource: (dbr:Flecainide, dbp:routesOfAdministration, ?route)
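Task creation can be sketched as grouping the received triple patterns into batches. The default of four patterns per task mirrors the crowdsourcing configuration used in the experiments; the function `create_tasks` is hypothetical, and submission to a crowdsourcing platform is deliberately left out since no platform API is described here.

```python
from typing import List, Tuple

TriplePattern = Tuple[str, str, str]


def create_tasks(patterns: List[TriplePattern], per_task: int = 4) -> List[List[TriplePattern]]:
    """Group triple patterns to crowdsource into tasks of at most `per_task` patterns."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [patterns[i:i + per_task] for i in range(0, len(patterns), per_task)]


drugs = ["dbr:Acadesine", "dbr:Acetyldigitoxin", "dbr:Dimetofrine", "dbr:Flecainide"]
patterns = [(d, "dbp:routesOfAdministration", "?route") for d in drugs]
print([len(task) for task in create_tasks(patterns)])              # [4]
print([len(task) for task in create_tasks(patterns, per_task=3)])  # [3, 1]
```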
Microtask Manager (2)
RDF graphs used to build the microtask for dbr:Flecainide:
dbr:Flecainide rdfs:label "Flecainide"@en .
dbr:Flecainide rdfs:comment "Flecainide acetate (/flɛˈkeɪnaɪd/ US dict: fle·kā′·nīd) is a class Ic antiarrhythmic agent (...)" .
dbr:Flecainide foaf:depiction wiki-commons:Special:FilePath/Flecainide_structure.svg .
dbr:Flecainide foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Flecainide> .
dbp:routesOfAdministration rdfs:label "routes of administration"@en .
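The labels in these graphs let a task show readable text instead of URIs. A minimal sketch, assuming labels have already been fetched into a dict; the question template in `render_question` is hypothetical and is not HARE's actual interface wording.

```python
from typing import Dict, Tuple

# rdfs:label values taken from the RDF graphs above.
LABELS: Dict[str, str] = {
    "dbr:Flecainide": "Flecainide",
    "dbp:routesOfAdministration": "routes of administration",
}


def render_question(pattern: Tuple[str, str, str], labels: Dict[str, str]) -> str:
    """Turn a triple pattern (s, p, ?var) into a question, using labels where available.

    Falls back to the raw URI when no label is known.
    """
    s, p, _ = pattern
    return f"What are the {labels.get(p, p)} of {labels.get(s, s)}?"


print(render_question(("dbr:Flecainide", "dbp:routesOfAdministration", "?route"), LABELS))
# What are the routes of administration of Flecainide?
```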
Experimental Study
Experimental Settings
• Benchmark: 50 queries against DBpedia (English version, 2014).
• Ten queries in five different knowledge domains:
History, Life Sciences, Movies, Music, and Sports.
• Implementation details:
• Dataset (queries executed directly against the dataset).
• HARE (our proposed approach).
• HARE-BL (generates microtask interfaces by replacing URIs with labels).
• Crowdsourcing configuration:
• The crowd is reached via CrowdFlower.
• Four different triple patterns per task, 0.07 US$ per task (Sep. 2015).
• At least 3 answers were collected per task.
Overview of the Results
Total triple patterns crowdsourced: 1,004
Total answers collected from the crowd: 3,163
75%-98% of the crowd answers
were produced in 12 minutes
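The totals can be cross-checked against the crowdsourcing configuration with simple arithmetic. If the 3,163 answers are counted per triple pattern (an assumption), that is about 3.15 answers per pattern, consistent with collecting at least 3 answers. Since each task holds four patterns, at least 251 tasks were needed, assuming each pattern appears in only one task.

```python
import math

TRIPLE_PATTERNS = 1004  # total triple patterns crowdsourced
ANSWERS = 3163          # total answers collected from the crowd
PATTERNS_PER_TASK = 4   # from the crowdsourcing configuration

answers_per_pattern = ANSWERS / TRIPLE_PATTERNS
# Minimum number of tasks, assuming each pattern is placed in exactly one task.
min_tasks = math.ceil(TRIPLE_PATTERNS / PATTERNS_PER_TASK)

print(f"{answers_per_pattern:.2f} answers per triple pattern")  # 3.15 answers per triple pattern
print(min_tasks)  # 251
```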
Effectiveness of the RDF Completeness Model
[Figure: # Crowdsourced Triple Patterns per Domain. Number of crowdsourced triple patterns (y-axis, 0–1,500) vs. threshold τ (x-axis, 0.00–1.00) for Sports, History, Life Sciences, Music, and Movies.]
The RDF completeness model considerably reduces the
number of triple patterns to crowdsource (τ ≥ 0.5).
Completeness of Query Answers
[Figure: Recall of tested approaches w.r.t. D* per SPARQL query. Recall (y-axis, 0.00–1.00) of Dataset, HARE-BL, and HARE for queries Q1–Q10 in each domain: Sports, Music, Life Sciences, Movies, History.]
Recall varies across queries and knowledge domains.
Completing answers in certain domains is more challenging.
HARE outperforms the other approaches across all knowledge domains.
Our RDF completeness model captures the skewed distributions of values.
Quality of Crowd Answers: Precision
The crowd exhibits heterogeneous performance within domains.
This supports the importance of HARE's triple-based approach.
The precision of the crowd answers is in general higher when
crowdsourcing semantically enriched tasks.
Conclusions & Outlook
Conclusions
• HARE: a hybrid query engine against RDF datasets.
• Supports microtasks to enhance query answers on-the-fly.
• Experimental results confirmed that:
  • Size of query answers: 3.13–12 times.
  • Precision: 0.62–0.97.
  • Crowd quality: higher precision with semantically enriched tasks.
Future work
• Study further approaches to capture crowd reliability.
• Consider other quality dimensions of the knowledge collected from the
crowd.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 

HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing

  • 1. HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal
  • 3. Motivation (1): Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.
  • 4. Motivation (2): Retrieve drugs that are annotated with the prefix "C01" (Cardiac Therapy) in the Anatomical Therapeutic Chemical (ATC) classification system and that have known routes of administration:
      SELECT DISTINCT ?drug WHERE {
        ?drug rdf:type dbo:Drug .
        ?drug dbo:atcPrefix "C01" .
        ?drug dbp:routesOfAdministration ?route .
      }
    Result: 47 drugs (v. 2016).
  • 5. Motivation (2): Same query and description as slide 4. Result shown: 98 drugs (v. 2016). (There are 48 drugs without routes of administration.)
  • 6. Motivation (3): Examples of drugs with ATC prefix "C01" that have no routes of administration in the dataset (v. 2016): dbr:Acadesine (intravenous administration, for treating leukemia, source: PubChem; also used in doping in sports, source: PubMed); dbr:Acetyldigitoxin (no route found); dbr:Dimetofrine (no route found); dbr:Flecainide (oral administration, source: DrugBank). All images licensed under Fair use via Wikipedia.
  • 7. Problem Definition: Given an RDF dataset D and a SPARQL query Q against D, let D* be the virtual dataset that contains all the data that should be in D. P1) Identify the portions P of Q that yield missing values, i.e., where $[\![P]\!]_D \subset [\![P]\!]_{D^*}$: there are mappings $\mu$ with $\mu \notin [\![P]\!]_D$ (they do not belong to the computed solution of P) but $\mu \in [\![P]\!]_{D^*}$ (they should belong to the solution of P). P2) Resolve the missing values.
  • 8. Our Approach: HARE (Hybrid SPARQL Query Engine)
  • 9. HARE Overview: [Figure: HARE processing the example query from slide 4. Components: Query Engine, RDF Completeness Model, Microtask Manager, Crowd Knowledge (CKB+, CKB-, CKB~), dataset D and threshold τ. Mappings shown: {?drug → dbr:Ibuprofen} and {?drug → dbr:Flecainide}; in a second set, {?drug → dbr:Acadesine}, {?drug → dbr:Ibuprofen} and {?drug → dbr:Flecainide}.]
  • 10. HARE: • A hybrid machine/human SPARQL query engine that is able to increase the size of query answers. • Based on a novel RDF completeness model, HARE implements query optimization and execution techniques for P1) identifying portions of queries that yield missing values. • HARE resorts to microtask crowdsourcing for P2) resolving missing values.
  • 11. RDF Completeness Model (1): • Relies on the Local Closed World Assumption (LCWA). • Estimates the local completeness of resources with respect to other resources in an RDF graph that belong to the same classes. [Figure: dbr:Procainamide, dbr:Flecainide and dbr:Bretylium, all of rdf:type dbo:Drug, compared by their dbp:routesOfAdministration values to assess local completeness.]
  • 12. RDF Completeness Model (2): ① Multiplicity of an RDF resource: the number of objects that a resource has for a given predicate. Example: dbr:Procainamide has three dbp:routesOfAdministration values (dbr:Intravenous, dbr:Intramuscular_injection, dbr:Oral_administration), so MOD(dbr:Procainamide, dbp:routesOfAdministration) = 3.
  • 13. RDF Completeness Model (3): ② Aggregated multiplicity of a class: given a predicate, the median number of distinct objects per resource, taken over the resources that belong to the class. Example: AMOD(dbo:Drug, dbp:routesOfAdministration) = 3, where, e.g., MOD(dbr:Procainamide, dbp:routesOfAdministration) = 3 and MOD(dbr:Bretylium, dbp:routesOfAdministration) = 2.
  • 14. RDF Completeness Model (4): ③ Local completeness of an RDF resource: given a predicate, the completeness of an RDF resource is determined by its multiplicity (①) relative to the aggregated multiplicity (②) of the classes it belongs to. Examples: CompD(dbr:Procainamide | dbp:routesOfAdministration) = 3/3; CompD(dbr:Bretylium | dbp:routesOfAdministration) = 2/3; CompD(dbr:Flecainide | dbp:routesOfAdministration) = 0/3. Numerators are computed in ①, denominators in ②.
  • 15. Crowd Knowledge Bases (1): • The knowledge collected from the crowd is captured in three KBs: CKB+, CKB– and CKB~. • These are fuzzy RDF datasets composed of 4-tuples (subject, predicate, object, membership_degree), i.e., an RDF triple extended with a membership degree.
  • 16. Crowd Knowledge Bases (2): Types of crowd knowledge bases. CKB+: "Flecainide is administered orally." → (dbr:Flecainide, dbp:routesOfAdministration, dbr:Oral_administration, 0.9). CKB-: "Flecainide does not have a (known) route of administration." → (dbr:Flecainide, dbp:routesOfAdministration, _:o1, 0.05). CKB~: "I am not sure if Acadesine has a route of administration." → (dbr:Acadesine, dbp:routesOfAdministration, _:o2, 0.78). Also shown: Contradiction (C) and Unknownness (U).
  • 17. Query Engine (1): • The engine computes the probability of crowdsourcing a triple pattern t in query Q, denoted PCROWD(t). • If PCROWD(t) is greater than a user-defined threshold τ, the query engine crowdsources the triple pattern t. • α is a score weight between 0.0 and 1.0. PCROWD(t) = α (1 – Comp(t)) + (1 – α) max{max{m+, m–}, min{C(t), 1 – U(t)}}. Slide annotations on the terms: estimated incompleteness, crowd unreliability, crowd confidence.
  • 18. Query Engine (2): • The engine combines mappings obtained from the dataset D with fuzzy mappings from the crowd stored in CKB+. • We define a fuzzy set semantics for SPARQL. Example: D yields {?drug → dbr:Isoprenaline, ?route → dbr:Inhalation}; CKB+ yields ({?drug → dbr:Isoprenaline, ?route → dbr:Intravenous}, 0.94). Theorem: The complexity of computing the mapping set of a SPARQL query under fuzzy set semantics is the same as under set semantics. Corollary: The HARE query engine does not increase the time complexity of computing the mapping set of a SPARQL query.
  • 19. Microtask Manager (1): • Receives triple patterns to crowdsource, e.g., (dbr:Flecainide, dbp:routesOfAdministration, ?route). • Creates human tasks. • Submits tasks to the crowdsourcing platform.
  • 20. Microtask Manager (2): RDF graphs for dbr:Flecainide and dbp:routesOfAdministration. dbr:Flecainide has rdfs:label "Flecainide"@en; rdfs:comment "Flecainide acetate (/flɛˈkeɪnaɪd/ US dict: fle·kā′·nīd) is a class Ic antiarrhythmic agent (...)"; foaf:depiction wiki-commons:Special:FilePath/Flecainide_structure.svg; foaf:isPrimaryTopicOf http://en.wikipedia.org/wiki/Flecainide. dbp:routesOfAdministration has rdfs:label "routes of administration"@en.
  • 22. Experimental Settings: • Benchmark: 50 queries against the dataset (English version, 2014). • Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports. • Implementation details: • Dataset (queries executed directly against the dataset). • HARE (our proposed approach). • HARE-BL (generates microtask interfaces that replace URIs with labels). • Crowdsourcing configuration: • The crowd is reached via CrowdFlower. • Four different triple patterns per task, US$0.07 per task (Sep. 2015). • At least 3 answers were collected per task.
  • 23. Overview of the Results: Same experimental settings as slide 22. Total triple patterns crowdsourced: 1,004. Total answers collected from the crowd: 3,163. 75%–98% of the crowd answers were produced within 12 minutes.
  • 24. Effectiveness of the RDF Completeness Model: [Figure: number of crowdsourced triple patterns per domain (Sports, History, Life Sciences, Music, Movies) as a function of τ from 0.00 to 1.00.] The RDF completeness model considerably reduces the number of triple patterns to crowdsource (τ ≥ 0.5).
  • 25. Completeness of Query Answers: [Figure: recall of Dataset, HARE-BL and HARE w.r.t. D* for queries Q1–Q10 in each domain (Sports, Music, Life Sciences, Movies, History).] Recall varies across queries and knowledge domains; completing answers in certain domains is more challenging.
  • 26. Completeness of Query Answers (continued): [Figure: the recall plot of slide 25, with check marks on individual queries.] HARE outperforms the other approaches across all knowledge domains. Our RDF completeness model captures the skewed distributions of values. Recall varies across queries and knowledge domains; completing answers in certain domains is more challenging.
  • 27. Quality of Crowd Answers: Precision: The crowd exhibits heterogeneous performance within domains. This supports the importance of HARE's triple-based approach.
  • 28. Quality of Crowd Answers: Precision: The precision of crowd answers is generally higher when crowdsourcing semantically enriched tasks.
  • 30. Conclusions: • HARE: a hybrid query engine for RDF datasets. • Supports microtasks to enhance query answers on the fly. • Experimental results confirmed: size of query answers, 3.13–12 times; crowd quality, precision 0.62–0.97; semantically enriched tasks yield higher crowd precision. Future work: • Study further approaches to capture crowd reliability. • Consider other quality dimensions of the knowledge collected from the crowd.
  • 31. HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing. Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal. [Closing slide: the example query from slide 4, the Crowd Knowledge (CKB+, CKB-, CKB~) and dataset D, with the headline results on size of query answer, precision and crowd quality.]
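The three measures on the RDF Completeness Model slides (multiplicity, aggregated multiplicity and local completeness) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not HARE's implementation: the toy graph, `ex:HypotheticalDrug` and the Bretylium objects are invented; resources without any value are left out of the median; each resource is assumed to have a single class; and completeness is capped at 1.0.

```python
from statistics import median

RDF_TYPE = "rdf:type"
ROUTE = "dbp:routesOfAdministration"

# Toy graph of (subject, predicate, object) triples. The Procainamide routes and
# the MOD values come from the slides; the Bretylium objects and
# ex:HypotheticalDrug are invented so that AMOD(dbo:Drug, ROUTE) = 3.
G = [
    ("dbr:Procainamide", RDF_TYPE, "dbo:Drug"),
    ("dbr:Procainamide", ROUTE, "dbr:Intravenous"),
    ("dbr:Procainamide", ROUTE, "dbr:Intramuscular_injection"),
    ("dbr:Procainamide", ROUTE, "dbr:Oral_administration"),
    ("dbr:Bretylium", RDF_TYPE, "dbo:Drug"),
    ("dbr:Bretylium", ROUTE, "dbr:Intravenous"),
    ("dbr:Bretylium", ROUTE, "dbr:Intramuscular_injection"),
    ("ex:HypotheticalDrug", RDF_TYPE, "dbo:Drug"),
    ("ex:HypotheticalDrug", ROUTE, "ex:RouteA"),
    ("ex:HypotheticalDrug", ROUTE, "ex:RouteB"),
    ("ex:HypotheticalDrug", ROUTE, "ex:RouteC"),
    ("dbr:Flecainide", RDF_TYPE, "dbo:Drug"),  # no route in the toy graph
]


def mod(graph, s, p):
    """Multiplicity (1): number of distinct objects that s has for predicate p."""
    return len({o for (s2, p2, o) in graph if s2 == s and p2 == p})


def amod(graph, cls, p):
    """Aggregated multiplicity (2): median MOD over the members of cls.

    Assumption: members with no value for p are left out of the median.
    """
    members = {s for (s, p2, o) in graph if p2 == RDF_TYPE and o == cls}
    values = [m for m in (mod(graph, s, p) for s in members) if m > 0]
    return median(values) if values else 0


def comp(graph, s, cls, p):
    """Local completeness (3): MOD(s, p) / AMOD(cls, p).

    Assumptions: one class per resource, the ratio is capped at 1.0, and a
    class whose members have no values for p is treated as complete.
    """
    expected = amod(graph, cls, p)
    if expected == 0:
        return 1.0
    return min(1.0, mod(graph, s, p) / expected)


for drug in ("dbr:Procainamide", "dbr:Bretylium", "dbr:Flecainide"):
    print(drug, round(comp(G, drug, "dbo:Drug", ROUTE), 2))
# prints: dbr:Procainamide 1.0 / dbr:Bretylium 0.67 / dbr:Flecainide 0.0
```

The resulting ratios 3/3, 2/3 and 0/3 match the local completeness values on the slides; a low value marks the triple pattern as a candidate for crowdsourcing.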
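The Crowd Knowledge Bases slides describe CKB+, CKB– and CKB~ as fuzzy RDF datasets of 4-tuples. One possible in-memory representation, holding the three example tuples, is sketched below. The class name `FuzzyTriple` and the helper `degree()` are hypothetical, and reading m+ and m– as the highest membership degree of a matching tuple in CKB+ and CKB– is an assumption, since the slides do not define them.

```python
from dataclasses import dataclass

ROUTE = "dbp:routesOfAdministration"


@dataclass(frozen=True)
class FuzzyTriple:
    """A 4-tuple (subject, predicate, object, membership_degree)."""
    subject: str
    predicate: str
    obj: str           # URI, or a blank node such as "_:o1"
    membership: float  # membership degree in [0, 1]


# The three crowd knowledge bases, populated with the examples from the slide.
CKB_POS = [FuzzyTriple("dbr:Flecainide", ROUTE, "dbr:Oral_administration", 0.9)]
CKB_NEG = [FuzzyTriple("dbr:Flecainide", ROUTE, "_:o1", 0.05)]
CKB_UNK = [FuzzyTriple("dbr:Acadesine", ROUTE, "_:o2", 0.78)]


def degree(ckb, subject, predicate):
    """Highest membership degree among tuples matching (subject, predicate, ?o).

    Returns 0.0 when the knowledge base holds nothing about the pattern.
    Taking the maximum is an assumption made for this sketch.
    """
    return max(
        (t.membership for t in ckb
         if t.subject == subject and t.predicate == predicate),
        default=0.0,
    )


m_pos = degree(CKB_POS, "dbr:Flecainide", ROUTE)  # 0.9
m_neg = degree(CKB_NEG, "dbr:Flecainide", ROUTE)  # 0.05
```

Because the object of the pattern is a variable, any object matches; a pattern with a bound object would also need to compare `obj`.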
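The crowdsourcing decision on the Query Engine (1) slide can be transcribed directly. The function below implements the PCROWD(t) formula exactly as it is printed on the slide; the inputs m+, m–, C(t) and U(t) are taken as given numbers in [0, 1] because the slides do not show how they are computed, and the α and τ values in the example are illustrative only.

```python
def p_crowd(comp, m_pos, m_neg, contradiction, unknownness, alpha):
    """P_CROWD(t), transcribed from the "Query Engine (1)" slide:

    alpha * (1 - Comp(t))
      + (1 - alpha) * max{max{m+, m-}, min{C(t), 1 - U(t)}}

    All inputs are assumed to lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    crowd_term = max(max(m_pos, m_neg), min(contradiction, 1.0 - unknownness))
    return alpha * (1.0 - comp) + (1.0 - alpha) * crowd_term


def should_crowdsource(p, tau):
    """The engine crowdsources t when P_CROWD(t) is greater than tau."""
    return p > tau


# Hypothetical inputs: Flecainide has Comp = 0 and no crowd knowledge yet;
# Procainamide has Comp = 1. alpha = 0.5 and tau = 0.4 are illustrative.
p_fle = p_crowd(0.0, 0.0, 0.0, 0.0, 0.0, alpha=0.5)
p_pro = p_crowd(1.0, 0.0, 0.0, 0.0, 0.0, alpha=0.5)
print(should_crowdsource(p_fle, 0.4), should_crowdsource(p_pro, 0.4))  # True False
```

With α close to 1.0 the decision is driven mostly by the completeness model, which is consistent with the observation that the model alone already prunes many triple patterns at τ ≥ 0.5.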
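The Query Engine (2) slide combines crisp mappings from D with fuzzy mappings from CKB+. A minimal sketch of that combination, using the Isoprenaline example, follows. Giving mappings from D degree 1.0 and merging duplicates with `max` (the standard fuzzy union) are assumptions of this sketch, not the definition of HARE's fuzzy set semantics; `as_mapping()` and `fuzzy_union()` are hypothetical helpers.

```python
def as_mapping(bindings):
    """Represent a solution mapping as a hashable frozenset of (variable, value) pairs."""
    return frozenset(bindings.items())


def fuzzy_union(crisp, fuzzy):
    """Combine crisp mappings from D with (mapping, degree) pairs from CKB+.

    Assumptions: mappings from D get degree 1.0, and a mapping produced by both
    sources keeps the higher degree (max-based fuzzy union).
    """
    result = {m: 1.0 for m in crisp}
    for m, deg in fuzzy:
        result[m] = max(result.get(m, 0.0), deg)
    return result


from_d = [as_mapping({"?drug": "dbr:Isoprenaline", "?route": "dbr:Inhalation"})]
from_ckb = [(as_mapping({"?drug": "dbr:Isoprenaline", "?route": "dbr:Intravenous"}), 0.94)]

for mapping, deg in fuzzy_union(from_d, from_ckb).items():
    print(dict(mapping), deg)
```

The merge touches each mapping once, so it adds only linear work on top of ordinary set-based evaluation.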
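The Microtask Manager slides describe turning a triple pattern into a human task, and HARE-BL replaces URIs with labels. A hedged sketch of that step, built on the Flecainide graph from the slide, is shown below; the single-valued dictionary layout, the `build_task()` helper, its field names and the question template are all hypothetical.

```python
# Hypothetical single-valued view of the RDF graphs on the "Microtask Manager (2)"
# slide: (resource, predicate) -> value.
GRAPH = {
    ("dbr:Flecainide", "rdfs:label"): "Flecainide",
    ("dbr:Flecainide", "rdfs:comment"):
        "Flecainide acetate (...) is a class Ic antiarrhythmic agent (...)",
    ("dbr:Flecainide", "foaf:depiction"):
        "wiki-commons:Special:FilePath/Flecainide_structure.svg",
    ("dbr:Flecainide", "foaf:isPrimaryTopicOf"):
        "http://en.wikipedia.org/wiki/Flecainide",
    ("dbp:routesOfAdministration", "rdfs:label"): "routes of administration",
}


def build_task(subject, predicate, variable, graph):
    """Turn the triple pattern (subject, predicate, ?variable) into a microtask.

    Labels replace URIs when available; the question template and the task
    fields are illustrative, not HARE's actual interface.
    """
    if not variable.startswith("?"):
        raise ValueError("the object of the pattern must be a variable")

    def label(resource):
        # Fall back to the URI itself when no rdfs:label is known.
        return graph.get((resource, "rdfs:label"), resource)

    return {
        "question": f"What are the {label(predicate)} of {label(subject)}?",
        "description": graph.get((subject, "rdfs:comment")),
        "image": graph.get((subject, "foaf:depiction")),
        "more_info": graph.get((subject, "foaf:isPrimaryTopicOf")),
    }


task = build_task("dbr:Flecainide", "dbp:routesOfAdministration", "?route", GRAPH)
print(task["question"])  # What are the routes of administration of Flecainide?
```

Calling `build_task` with an empty graph shows the URI fallback, which is roughly what a task interface without semantic enrichment would display.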
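The Experimental Settings slide packs four different triple patterns into each task. A minimal batching helper, assuming consecutive patterns are grouped in input order (the slides do not say how patterns are assigned to tasks), could look like this; the patterns themselves are hypothetical.

```python
def batch(patterns, size=4):
    """Group triple patterns into tasks of at most `size` patterns.

    size=4 mirrors the "four different triple patterns per task" setting;
    grouping consecutive patterns in input order is an assumption.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [patterns[i:i + size] for i in range(0, len(patterns), size)]


# Ten hypothetical triple patterns of the form (subject, predicate, ?route).
patterns = [(f"ex:Drug{i}", "dbp:routesOfAdministration", "?route") for i in range(10)]
tasks = batch(patterns)
print([len(t) for t in tasks])  # [4, 4, 2]
```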
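The Problem Definition and the recall plots compare the answers over D with those over the virtual dataset D*. The helpers below compute the missing mappings, recall w.r.t. D*, and the precision of crowd answers against a gold standard. This is a sketch only: the answer sets echo the HARE Overview slide, D* is assumed here rather than taken from the benchmark, and a gold standard for precision is presumed to exist.

```python
def missing_mappings(answers_d, answers_dstar):
    """Mappings in [[P]]_{D*} that are absent from [[P]]_D (problem P1)."""
    return answers_dstar - answers_d


def recall(answers, answers_dstar):
    """Share of the answers over D* that an approach returns."""
    if not answers_dstar:
        return 1.0
    return len(answers & answers_dstar) / len(answers_dstar)


def precision(crowd_answers, gold):
    """Share of crowd answers that agree with a gold standard (assumed available)."""
    if not crowd_answers:
        return 0.0
    return len(crowd_answers & gold) / len(crowd_answers)


# Illustrative answer sets for ?drug; D* is assumed, not taken from the benchmark.
dataset = {"dbr:Ibuprofen", "dbr:Flecainide"}
hare = {"dbr:Ibuprofen", "dbr:Flecainide", "dbr:Acadesine"}
d_star = {"dbr:Ibuprofen", "dbr:Flecainide", "dbr:Acadesine"}

print(missing_mappings(dataset, d_star))  # {'dbr:Acadesine'}
print(round(recall(dataset, d_star), 2), recall(hare, d_star))  # 0.67 1.0
```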