Microtask Crowdsourcing Applications for Linked Data

1. Microtask Crowdsourcing Applications for Linked Data

2. Architecture of Linked Data Applications
   [Architecture diagram: Presentation Tier, Logic Tier, and Data Tier. The Data Tier holds the Integrated Dataset and comprises a Data Access Component, a Republication Component, and a Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing), fed through SPARQL wrappers, physical wrappers, LD wrappers, and R2R transformations from relational data, RDF/XML, Web data accessed via APIs, SPARQL endpoints, and Linked Data.]

3. Data Tier
   [Diagram: Data Tier with the Data Access Component and the Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing).]
   • The Data Integration Component consolidates the data retrieved from heterogeneous sources.
   • This component may operate at:
     – Schema level: performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms (CH 2).
     – Instance level: performs entity linking, e.g., entity resolution via owl:sameAs links (CH 3).

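To make the two levels concrete, here is a minimal sketch using rdflib; the source vocabulary, example resource, and mapping target are illustrative assumptions rather than anything prescribed by the slides.

```python
# A minimal sketch of the two integration levels described above, using rdflib.
# The source vocabulary (foaf:name), the example resource, and the mapping
# target (rdfs:label) are illustrative assumptions, not part of the slides.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import FOAF, OWL, RDFS

DBPEDIA = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.parse(data="""
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/source/FreddieMercury> foaf:name "Freddie Mercury" .
""", format="turtle")

# Schema level: map the source vocabulary (foaf:name) onto the unified
# target schema (here rdfs:label) for every matching triple.
for subject, _, name in g.triples((None, FOAF.name, None)):
    g.add((subject, RDFS.label, name))

# Instance level: entity resolution, recorded as an owl:sameAs link
# between the local resource and its DBpedia counterpart.
g.add((URIRef("http://example.org/source/FreddieMercury"),
       OWL.sameAs, DBPEDIA.Freddie_Mercury))

print(g.serialize(format="turtle"))
```
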
4. Data Tier (2)
   [Diagram: Data Integration Component with Vocabulary Mapping, Interlinking, and Cleansing.]
   The Data Integration Component can be enhanced by including microtask crowdsourcing approaches:
   • Cleansing / data assessment: Assessment of DBpedia triples
   • Vocabulary mapping: CrowdMAP
   • Interlinking: ZenCrowd

5. Other Crowdsourcing-based Solutions for Linked Data Tasks
   • Query understanding: CrowdQ
   • Ontology population: OntoGame
   • Linked Data curation: Urbanopoly
   • …

6. DBPEDIA QUALITY ASSESSMENT

7. Assessing DBpedia Triples
   [Diagram: triples {s p o .} from the dataset are classified either as correct, or as incorrect together with the quality issue found.]
   1. Selecting LD quality issues generated by erroneous extraction mechanisms and that can be detected by the crowd
   2. Selecting the appropriate crowdsourcing approaches
   3. Designing and generating the interfaces to present the data to the crowd

8. Selecting LD Quality Issues to Crowdsource
   Three categories of quality problems occur pervasively in DBpedia [Zaveri2013] and can be crowdsourced:
   • Incorrect object
     Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3" .
   • Incorrect data type
     Example: dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en .
   • Incorrect link to external Web pages
     Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .

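A hedged illustration of how triples could be routed to these three checks; the predicate sets and heuristics below are assumptions made for this sketch, not the rules used in the DBpedia study.

```python
# Illustrative routing of a DBpedia triple to one of the three crowdsourced
# quality checks above. The predicate sets and heuristics are assumptions for
# this sketch, not the extraction rules used in the actual study.
import re

DATE_PREDICATES = {"dbprop:dateOfBirth", "dbpedia-owl:birthDate"}
NAME_PREDICATES = {"foaf:name", "rdfs:label"}

def looks_like_date(literal):
    value = literal.split("^^")[0].strip('"')
    return re.fullmatch(r"\d{4}(-\d{2}){0,2}", value) is not None

def route_triple(subject, predicate, obj):
    """Return the microtask type a suspicious triple should be verified with."""
    if predicate == "dbpedia-owl:wikiPageExternalLink":
        return "verify external link (iframe preview)"
    if predicate in DATE_PREDICATES and not looks_like_date(obj):
        return "verify object value"              # e.g. dbprop:dateOfBirth "3"
    if predicate in NAME_PREDICATES and obj.endswith("@en") and not obj.isascii():
        return "verify data type / language tag"  # e.g. "鳥島"@en
    return "no microtask generated"

print(route_triple("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"'))
print(route_triple("dbpedia:Torishima_Izu_Islands", "foaf:name", '"鳥島"@en'))
print(route_triple("dbpedia:John-Two-Hawks",
                   "dbpedia-owl:wikiPageExternalLink", "<http://cedarlakedvd.com/>"))
```
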
9. Selecting Appropriate Crowdsourcing Approaches
   [Diagram adapted from [Bernstein2010]:
   • Find stage – contest with LD experts (difficult task, final prize), run with TripleCheckMate [Kontoskostas2013]
   • Verify stage – microtasks with workers (easy task, micropayments), run on MTurk]

10. Presenting the Data to the Crowd
   Microtask interfaces: MTurk tasks
   • Incorrect object:
     – Selection of foaf:name or rdfs:label to extract human-readable descriptions
     – Real object values extracted automatically from Wikipedia infoboxes
   • Incorrect data type:
     – Link to the Wikipedia article via foaf:isPrimaryTopicOf
   • Incorrect outlink:
     – Preview of external pages by embedding an HTML iframe

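A minimal sketch of turning one suspect triple into an MTurk-style question page following the interface ideas above (human-readable label, link back to the Wikipedia article, explicit answer options); the HTML template and helper name are assumptions, not the templates used in the experiments.

```python
# Minimal sketch of building an "incorrect object?" microtask page for one
# triple: a human-readable subject label (foaf:name / rdfs:label), the object
# value extracted from the infobox, and a link to the Wikipedia source article
# (via foaf:isPrimaryTopicOf). The HTML layout is an illustrative assumption.
from html import escape

def build_task_html(subject_label, predicate_label, object_value, wikipedia_url):
    return f"""
    <div class="microtask">
      <p>Is the following value correct for
         <b>{escape(subject_label)}</b> / <i>{escape(predicate_label)}</i>?</p>
      <p>Value extracted from the Wikipedia infobox: <b>{escape(object_value)}</b></p>
      <p>Check the source article:
         <a href="{escape(wikipedia_url)}" target="_blank">Wikipedia page</a></p>
      <label><input type="radio" name="answer" value="correct"> Correct</label>
      <label><input type="radio" name="answer" value="incorrect"> Incorrect</label>
      <label><input type="radio" name="answer" value="cannot_tell"> I cannot tell</label>
    </div>"""

print(build_task_html("Dave Dobbyn", "date of birth", "3",
                      "https://en.wikipedia.org/wiki/Dave_Dobbyn"))
```
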
11. Results
                             Object values   Data types   Interlinks
   Linked Data experts       0.7151          0.8270       0.1525
   MTurk (majority voting)   0.8977          0.4752       0.9412

   • Both forms of crowdsourcing can be applied to detect certain LD quality issues
   • The effort of LD experts should be reserved for tasks demanding domain-specific skills
   • The MTurk crowd is exceptionally good at comparing data entries

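The MTurk figures above are obtained after aggregating redundant worker answers by majority voting; a minimal sketch of that aggregation step follows (the answers are invented).

```python
# Majority voting over redundant worker answers per triple, as used to
# aggregate the MTurk assignments above. The example answers are invented.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer and its share of the votes."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

assignments = {
    ("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"'):
        ["incorrect", "incorrect", "correct", "incorrect", "incorrect"],
    ("dbpedia:Torishima_Izu_Islands", "foaf:name", '"鳥島"@en'):
        ["correct", "incorrect", "incorrect", "correct", "incorrect"],
}

for triple, answers in assignments.items():
    answer, support = majority_vote(answers)
    print(triple[0], "->", answer, f"(support {support:.0%})")
```
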
12. ZENCROWD

13. ZenCrowd: Entity Linking by the Crowd
   • Combine both algorithmic and manual linking
   • Automate manual linking via crowdsourcing
   • Dynamically assess human workers with a probabilistic reasoning framework
   [Diagram: crowd vs. machines (algorithms)]

14. [Entity linking example]
   HTML input:
   <p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>
   Linked entities: http://dbpedia.org/resource/Facebook and http://dbpedia.org/resource/Instagram (owl:sameAs fbase:Instagram); the slide also shows the resources Google and Android.
   RDFa enrichment:
   <p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>

15. ZenCrowd Architecture
   [Architecture diagram with components: input HTML pages; entity extractors; algorithmic matchers; LOD index over the LOD Open Data Cloud; micro matching tasks; micro-task manager; crowdsourcing platform; workers' decisions; decision engine with probabilistic network; output HTML+RDFa pages.]
   Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).

16. Entity Factor Graphs
   • Graph components
     – Workers, links, clicks (observed variables)
     – Prior probabilities (worker priors pw, link priors pl)
     – Link factors lf
     – Constraints (sameAs constraints, dataset unicity constraints)
   • Probabilistic inference
     – Select all links with posterior probability > τ
   [Example factor graph on the slide: 2 workers, 6 clicks, 3 candidate links.]

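The full ZenCrowd model runs inference over the factor graph above; the sketch below is a deliberately simplified stand-in (independent workers, no link factors, no sameAs or unicity constraints) that only shows how worker priors, clicks, and link priors combine into a posterior that is thresholded at τ. All numbers are made up.

```python
# A simplified stand-in for ZenCrowd's factor-graph inference: independent
# workers, a prior per candidate link, and no link factors or constraints.
# Worker reliabilities, link priors, and votes below are invented.

def link_posterior(prior, votes, reliability):
    """P(link is correct | worker votes), assuming independent workers.

    votes: {worker_id: True/False}; reliability: {worker_id: P(vote matches truth)}.
    """
    p_correct, p_incorrect = prior, 1.0 - prior
    for worker, vote in votes.items():
        r = reliability[worker]
        p_correct *= r if vote else (1.0 - r)
        p_incorrect *= (1.0 - r) if vote else r
    return p_correct / (p_correct + p_incorrect)

reliability = {"w1": 0.9, "w2": 0.6}           # worker priors pw(.)
candidate_links = {                             # link priors pl(.) and clicks c_ij
    "dbpedia:Instagram": (0.8, {"w1": True,  "w2": True}),
    "dbpedia:Facebook":  (0.5, {"w1": True,  "w2": False}),
    "fbase:Android":     (0.3, {"w1": False, "w2": True}),
}

TAU = 0.5                                       # selection threshold τ
for link, (prior, votes) in candidate_links.items():
    p = link_posterior(prior, votes, reliability)
    print(f"{link}: posterior={p:.3f} -> {'accept' if p > TAU else 'reject'}")
```
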
17. Lessons Learnt
   • Crowdsourcing + probabilistic reasoning works!
   • But:
     – Different worker communities perform differently
     – Many low-quality workers
     – Completion time may vary (based on reward)
   • Need to find the right workers for your task (see the WWW 2013 paper)

18. ZenCrowd Summary
   • ZenCrowd: probabilistic reasoning over automatic and crowdsourcing methods for entity linking
   • Standard crowdsourcing improves 6% over automatic approaches
   • 4%-35% improvement over standard crowdsourcing
   • 14% average improvement over automatic approaches
   • http://exascale.info/zencrowd/
   • Follow-up work (VLDB Journal):
     – Also used for instance matching across datasets
     – 3-way blocking with the crowd

19. CROWDQ – CROWD-POWERED QUERY UNDERSTANDING

20. Motivation
   • Web search engines can answer simple factual queries directly on the result page
   • Users with complex information needs are often unsatisfied
   • Purely automatic techniques are not enough
   • We want to solve this with crowdsourcing!

21. CrowdQ
   • CrowdQ is the first system that uses crowdsourcing to:
     – understand the intended meaning of a keyword query
     – build a structured query template
     – answer the query over Linked Open Data
   Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).

22. [Image-only slide; no text to transcribe.]

23. CrowdQ Architecture
   • Off-line: query template generation with the help of the crowd
   • On-line: query template matching using NLP and search over open data
   [Architecture diagram with components: keyword query, complex query classifier, off-line complex query decomposition (POS + NER tagging, crowd manager, crowdsourcing platform, template generation, query template index, query log), on-line complex query processing (matching against existing query templates and answer types, structured LOD search over the LOD Open Data Cloud, vertical selection / unstructured search, answer composition, result joiner, SERP).]

24. Hybrid Human-Machine Pipeline
   Q = "birthdate of actors of forrest gump"
   • Query annotation (machine): "birthdate" and "actors" are tagged as nouns, "forrest gump" as a named entity (see the tagging sketch below)
   • Entity verification (crowd): "Is forrest gump this entity in the query?"
   • Entity relations (crowd): "Which is the relation between: actors and forrest gump?" → schema element: starring
   • Relation verification (crowd): "Is the relation between Indiana Jones – Harrison Ford and Back to the Future – Michael J. Fox of the same type as Forrest Gump – actors?" → <dbpedia-owl:starring>

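A sketch of the automatic annotation step that precedes the crowd verification, assuming spaCy and its small English model; CrowdQ itself may rely on different POS/NER tooling.

```python
# Sketch of the automatic POS + NER annotation that feeds the crowd
# verification steps above. Assumes spaCy and its small English model
# (`python -m spacy download en_core_web_sm`); CrowdQ may use other NLP tools.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("birthdate of actors of forrest gump")

print([(token.text, token.pos_) for token in doc])   # nouns: birthdate, actors
print([(ent.text, ent.label_) for ent in doc.ents])  # named-entity candidate(s)

# Spans the tagger marks as entities (or that it misses) become verification
# microtasks such as: "Is 'forrest gump' this entity in the query?"
```
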
25. Structured Query Generation
   Q = "birthdate of actors of forrest gump"
   SELECT ?y ?x
   WHERE { ?y <dbpedia-owl:birthdate> ?x .
           ?z <dbpedia-owl:starring> ?y .
           ?z <rdfs:label> 'Forrest Gump' }
   Results from BTC09: [result table shown as a screenshot on the slide]

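For reference, a sketch of executing such a generated query against the public DBpedia endpoint using SPARQLWrapper; the property IRIs follow the current dbo: namespace rather than the slide's prefixes, and endpoint availability is assumed.

```python
# Sketch of running the generated query against the public DBpedia endpoint.
# Assumes the SPARQLWrapper package; property IRIs use the current DBpedia
# ontology namespace (dbo:), which differs slightly from the slide's prefixes.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?actor ?birthDate
    WHERE {
      ?film rdfs:label "Forrest Gump"@en ;
            dbo:starring ?actor .
      ?actor dbo:birthDate ?birthDate .
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["actor"]["value"], row["birthDate"]["value"])
```
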
26. CROWDMAP & OTHERS

27. CrowdMAP
   • Experiments using MTurk, CrowdFlower, and established benchmarks
   • Enhancing the results of automatic techniques
   • Fast, accurate, cost-effective [Sarasua, Simperl, Noy, ISWC2012]

                           Precision   Recall
   CartP 301-304           0.53        1.0
   100R50P Edas-Iasted     0.8         0.42
   100R50P Ekaw-Iasted     1.0         0.7
   100R50P Cmt-Ekaw        1.0         0.75
   100R50P ConfOf-Ekaw     0.93        0.65
   Imp 301-304             0.73        1.0

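Precision and recall figures like those above are computed against a reference alignment; a minimal sketch of that evaluation step with invented correspondences (not CrowdMAP's data or code) follows.

```python
# How precision/recall of a set of crowd-verified correspondences can be
# computed against a reference alignment. The correspondences are invented
# examples, not CrowdMAP data.
def precision_recall(proposed, reference):
    true_positives = len(proposed & reference)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

reference = {("cmt:Author", "ekaw:Paper_Author"),
             ("cmt:Review", "ekaw:Review"),
             ("cmt:Conference", "ekaw:Conference")}
crowd_accepted = {("cmt:Author", "ekaw:Paper_Author"),
                  ("cmt:Review", "ekaw:Review"),
                  ("cmt:Document", "ekaw:Web_Site")}   # a false positive

p, r = precision_recall(crowd_accepted, reference)
print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.67 recall=0.67
```
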
28. Taste It! Try It!
   • Restaurant review Android app developed in the Insemtives project
   • Uses DBpedia concepts to generate structured reviews
   • Uses mechanism design/gamification to configure incentives
   • User study: 2,274 reviews by 180 reviewers referring to 900 restaurants, using 5,667 DBpedia concepts
   [Bar chart per venue type (café, fast food, pub, restaurant): number of reviews, number of semantic annotations (type of cuisine), number of semantic annotations (dishes).]
   https://play.google.com/store/apps/details?id=insemtives.android&hl=en

29. LODrefine
   http://research.zemanta.com/crowds-to-the-rescue/

30. Ontology Population

31. Linked Data Curation

32. Problems and Challenges
   • What is feasible and how can tasks be optimally translated into microtasks?
     – Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modelling); open-ended questions
   • What to show to users
     – Natural-language descriptions of Linked Data/SPARQL
     – How much context
     – What form of rendering
     – What about links?
   • How to combine with automatic tools
     – Which results to validate: low precision (no fun for gamers...) vs. low recall (vs. all possible questions)
   • How to embed it into an existing application
     – Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
   • What to do with the resulting data?
     – Integration into existing practices
     – Vocabularies!

33. Full-day tutorial at ISWC 2013, Sydney, Australia
   • Web site: https://sites.google.com/site/microtasktutorial/
   • Slides and exercises: https://github.com/maribelacosta/crowdsourcingtutorial

34. For exercises, quizzes, and further material visit our website: http://www.euclid-project.eu (Course, eBook)
   Other channels: @euclid_project, euclidproject
