Microtask Crowdsourcing Applications for Linked Data
Usage Rights

CC Attribution License

  • What quality issues are humans able to detect?
  • embarrassingly parallelizable

Presentation Transcript

  • Microtask Crowdsourcing Applications for Linked Data
  • Architecture of Linked Data Applications
    [Diagram: Presentation Tier; Logic Tier; Data Tier. Data Tier components: Integrated Dataset; Data Access Component; Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing); Republication Component. Access modules: SPARQL Wr., Physical Wrapper, R2R Transf., LD Wrapper. Sources: RDF/XML, Web Data accessed via APIs, SPARQL Endpoints, Relational Data, Linked Data.]
    EUCLID – Microtask crowdsourcing applications for Linked Data
  • Data Tier: Data Integration Component
    • Consolidates the data retrieved from heterogeneous sources.
    • This component may operate at:
      – Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms. (CH 2)
      – Instance level: Performs entity linking, e.g., entity resolution via owl:sameAs links. (CH 3)
  • Data Tier (2): Data Integration Component
    The data integration component can be enhanced by including microtask crowdsourcing approaches:
    • Cleansing or data assessment: Assessment of DBpedia triples
    • Vocabulary mapping: CrowdMAP
    • Interlinking: ZenCrowd
  • Other Crowdsourcing-based Solutions for Linked Data Tasks
    • Query understanding: CrowdQ
    • Ontology population: OntoGame
    • Linked Data curation: Urbanopoly
    • …
  • DBPEDIA QUALITY ASSESSMENT
  • Assessing DBpedia Triples
    [Diagram: each triple {s p o .} in the dataset is classified as Correct, or as Incorrect + Quality issue]
    1. Selecting LD quality issues that are generated by erroneous extraction mechanisms and can be detected by the crowd
    2. Selecting the appropriate crowdsourcing approaches
    3. Designing and generating the interfaces to present the data to the crowd
  • Selecting LD Quality Issues to Crowdsource
    Three categories of quality problems occur pervasively in DBpedia [Zaveri2013] and can be crowdsourced:
    • Incorrect object
      – Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3".
    • Incorrect data type
      – Example: dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en.
    • Incorrect link to "external Web pages"
      – Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/>
  • Selecting Appropriate Crowdsourcing Approaches
    Find-Verify workflow, adapted from [Bernstein2010]:
    • Find: contest; LD experts; difficult task; final prize; TripleCheckMate [Kontoskostas2013]
    • Verify: microtasks; workers; easy task; micropayments; MTurk
  • Presenting the Data to the Crowd
    Microtask interfaces: MTurk tasks
    • Incorrect object
      – Selection of foaf:name or rdfs:label to extract human-readable descriptions
      – Real object values extracted automatically from Wikipedia infoboxes
    • Incorrect data type
      – Link to the Wikipedia article via foaf:isPrimaryTopicOf
    • Incorrect outlink
      – Preview of external pages by implementing HTML iframe
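The label selection mentioned for the incorrect-object interface can be sketched in a few lines of Python. This is only an illustration: the function name `human_readable`, the dictionary layout, the preference for foaf:name over rdfs:label, and the fallback to the URI's local name are all assumptions, not details given in the slides.

```python
def human_readable(resource_uri, properties):
    """Pick a display string for a resource shown in a microtask.

    properties maps a prefixed property name to a list of literal values.
    The order foaf:name, then rdfs:label is an assumption for this sketch.
    """
    for prop in ("foaf:name", "rdfs:label"):
        values = properties.get(prop)
        if values:
            return values[0]
    # Assumed fallback: derive a label from the last path segment of the URI.
    local_name = resource_uri.rstrip("/").rsplit("/", 1)[-1]
    return local_name.replace("_", " ")


# Hypothetical input data for illustration only.
with_label = human_readable(
    "http://dbpedia.org/resource/Dave_Dobbyn",
    {"rdfs:label": ["Dave Dobbyn"]},
)
without_label = human_readable("http://dbpedia.org/resource/Dave_Dobbyn", {})
```

Both calls yield a string a worker can read without knowing what a URI is, which is the point of the interface design described above.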
  • Results
                              Object values   Data types   Interlinks
    Linked Data experts       0.7151          0.8270       0.1525
    MTurk (majority voting)   0.8977          0.4752       0.9412
    • Both forms of crowdsourcing can be applied to detect certain LD quality issues
    • The effort of LD experts should be directed to tasks that demand domain-specific skills
    • MTurk workers are exceptionally good at comparing data entries
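The MTurk figures are reported under majority voting. A minimal sketch of that aggregation step is shown below; the answer labels, the sample judgements, and the choice to return `None` on a tie are assumptions made for illustration, since the slides do not describe how ties were handled.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None if there is no clear winner."""
    ranked = Counter(answers).most_common()
    if not ranked:
        return None
    # A tie between the two most frequent answers yields no decision (assumption).
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical worker judgements per triple.
judgements = {
    "triple-1": ["incorrect", "incorrect", "correct"],
    "triple-2": ["correct", "correct", "correct"],
    "triple-3": ["correct", "incorrect"],
}
decisions = {triple: majority_vote(votes) for triple, votes in judgements.items()}
```

Using an odd number of workers per task avoids most ties for binary questions.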
  • ZENCROWD
  • ZenCrowd: Entity Linking by the Crowd
    • Combine both algorithmic and manual linking
    • Automate manual linking via crowdsourcing
    • Dynamically assess human workers with a probabilistic reasoning framework
    [Diagram labels: Crowd, Machines, Algorithms]
  • RDFa enrichment
    Entities: http://dbpedia.org/resource/Facebook; http://dbpedia.org/resource/Instagram owl:sameAs fbase:Instagram
    [Diagram labels: Google, Android]
    HTML:
    <p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>
    HTML + RDFa:
    <p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>
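The enrichment step on this slide, going from a plain paragraph to one annotated with RDFa, can be approximated as follows. This is a sketch under assumptions: the function name `annotate` is hypothetical, only the first occurrence of the mention is wrapped, and the entity URI is taken as already chosen (by an algorithm or by the crowd). It is not ZenCrowd's implementation.

```python
import html


def annotate(paragraph, mention, entity_uri):
    """Wrap a paragraph in RDFa that links the first mention to entity_uri."""
    text = html.escape(paragraph)
    needle = html.escape(mention)
    start = text.find(needle)
    if start < 0:
        # Mention not found: return the paragraph without annotation.
        return f"<p>{text}</p>"
    cite = f'<cite property="rdfs:label">{needle}</cite>'
    body = text[:start] + cite + text[start + len(needle):]
    about = html.escape(entity_uri, quote=True)
    return f'<p><span about="{about}">{body}</span></p>'


# Shortened example sentence for illustration.
enriched = annotate(
    "Facebook is not waiting.",
    "Facebook",
    "http://dbpedia.org/resource/Facebook",
)
```

The output has the same shape as the annotated markup on the slide: a `span` carrying `about` and a `cite` carrying `property="rdfs:label"`.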
  • ZenCrowd Architecture
    [Diagram: Input: HTML Pages. ZenCrowd components: Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity), Micro Matching Tasks, Micro-Task Manager, Decision Engine, Probabilistic Network. External: Crowdsourcing Platform (Workers Decisions), LOD Open Data Cloud. Output: HTML+RDFa Pages.]
    Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).
  • Entity Factor Graphs
    • Graph components
      – Workers, links, clicks
      – Prior probabilities
      – Link factors
      – Constraints
    • Probabilistic inference
      – Select all links with posterior prob > τ
    [Figure: factor graph with 2 workers (w1, w2), 6 clicks (c11, c12, c13, c21, c22, c23) and 3 candidate links (l1, l2, l3). Labels: observed variables; worker priors pw1( ), pw2( ); link priors pl1( ), pl2( ), pl3( ); link factors lf1( ), lf2( ), lf3( ); SameAs constraints sa1-2( ); dataset unicity constraints u2-3( ).]
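To make the inference step concrete, here is a deliberately simplified sketch of selecting links whose posterior exceeds τ. It treats each candidate link independently and combines a link prior with worker reliabilities in a naive Bayes fashion; it ignores the SameAs and unicity constraints and is not the factor graph used by ZenCrowd. All names and numbers are assumptions for illustration.

```python
def link_posterior(prior, votes, reliability):
    """Posterior probability that a candidate link is valid.

    prior: prior probability that the link is valid.
    votes: maps worker id to True (clicked "valid") or False.
    reliability: maps worker id to the probability of answering correctly,
                 assumed to lie strictly between 0 and 1.
    """
    p_valid = prior
    p_invalid = 1.0 - prior
    for worker, says_valid in votes.items():
        r = reliability[worker]
        p_valid *= r if says_valid else 1.0 - r
        p_invalid *= (1.0 - r) if says_valid else r
    return p_valid / (p_valid + p_invalid)


def select_links(priors, clicks, reliability, tau):
    """Keep every link whose posterior is strictly greater than tau."""
    return [
        link
        for link in priors
        if link_posterior(priors[link], clicks[link], reliability) > tau
    ]


# Hypothetical setup mirroring the figure: 2 workers, 6 clicks, 3 candidate links.
reliability = {"w1": 0.8, "w2": 0.6}
priors = {"l1": 0.5, "l2": 0.5, "l3": 0.5}
clicks = {
    "l1": {"w1": True, "w2": True},
    "l2": {"w1": True, "w2": False},
    "l3": {"w1": False, "w2": False},
}
selected = select_links(priors, clicks, reliability, tau=0.8)
```

With these numbers only l1 passes the threshold: two agreeing workers push its posterior to 6/7, while the split vote on l2 stays below 0.8.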
  • Lessons Learnt
    • Crowdsourcing + probabilistic reasoning works!
    • But
      – Different worker communities perform differently
      – Many low-quality workers
      – Completion time may vary (based on reward)
    • Need to find the right workers for your task (see WWW13 paper)
  • ZenCrowd Summary
    • ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking
    • Standard crowdsourcing improves 6% over automatic
    • 4% - 35% improvement over standard crowdsourcing
    • 14% average improvement over automatic approaches
    • Follow-up work (VLDBJ):
      – Also used for instance matching across datasets
      – 3-way blocking with the crowd
    http://exascale.info/zencrowd/
  • CROWDQ – CROWD-POWERED QUERY UNDERSTANDING
  • Motivation
    • Web search engines can answer simple factual queries directly on the result page
    • Users with complex information needs are often unsatisfied
    • Purely automatic techniques are not enough
    • We want to solve it with crowdsourcing!
  • CrowdQ
    • CrowdQ is the first system that uses crowdsourcing to
      – Understand the intended meaning
      – Build a structured query template
      – Answer the query over Linked Open Data
    Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
  • CrowdQ Architecture
    • Off-line: query template generation with the help of the crowd
    • On-line: query template matching using NLP and search over open data
    [Diagram labels: On-line Complex Query Processing; Off-line Complex Query Decomposition; User; Keyword Query; Complex query classifier (Y/N); POS + NER tagging; Match with existing query templates; Structured Query; Vertical selection, Unstructured Search, ...; Crowd Manager; Queries; Templ + Answer Types; Template Generation (t1, t2, t3); Query Template Index; Query Log; Answer Composition; Structured LOD Search; Result Joiner; SERP; Crowdsourcing Platform; LOD Open Data Cloud]
  • Hybrid Human-Machine Pipeline
    Q = birthdate of actors of forrest gump
    • Query annotation: Noun, Noun, Named entity
    • Verification: Is forrest gump this entity in the query?
    • Entity relations: Which is the relation between actors and forrest gump?
    • Schema element: starring <dbpedia-owl:starring>
    • Verification: Is the relation between Indiana Jones – Harrison Ford and Back to the Future – Michael J. Fox of the same type as Forrest Gump – actors?
  • Structured query generation
    Q = birthdate of actors of forrest gump
    SELECT ?y ?x WHERE {
      ?y dbpedia-owl:birthdate ?x .
      ?z dbpedia-owl:starring ?y .
      ?z rdfs:label 'Forrest Gump'
    }
    Results from BTC09: [results not included in the transcript]
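Filling a query template of this shape can be sketched as a small Python function. The function name `build_query` and its parameters are hypothetical, and the PREFIX declarations are added here so that the generated query is self-contained; the slides do not show how CrowdQ instantiates its templates.

```python
PREFIXES = (
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def build_query(attribute, relation, entity_label):
    """Instantiate the template: attribute of things related to a labelled entity."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    label = entity_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        PREFIXES
        + "SELECT ?y ?x WHERE {\n"
        + f"  ?y dbpedia-owl:{attribute} ?x .\n"
        + f"  ?z dbpedia-owl:{relation} ?y .\n"
        + f'  ?z rdfs:label "{label}" .\n'
        + "}"
    )


# Property names are taken from the slide; whether they match a given
# dataset's vocabulary is not checked here.
query = build_query("birthdate", "starring", "Forrest Gump")
```

Once the crowd has confirmed the entity and the relation, a different keyword query of the same shape only needs different arguments.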
  • CROWDMAP & OTHERS
  • CrowdMAP
    • Experiments using MTurk, CrowdFlower and established benchmarks
    • Enhancing the results of automatic techniques
    • Fast, accurate, cost-effective
    [Sarasua, Simperl, Noy, ISWC2012]
                 CartP 301-304   100R50P Edas-Iasted   100R50P Ekaw-Iasted   100R50P Cmt-Ekaw   100R50P ConfOf-Ekaw   Imp 301-304
    PRECISION    0.53            0.8                   1.0                   1.0                0.93                  0.73
    RECALL       1.0             0.42                  0.7                   0.75               0.65                  1.0
  • Taste IT! Try IT!
    • Restaurant review Android app developed in the Insemtives project
    • Uses DBpedia concepts to generate structured reviews
    • Uses mechanism design/gamification to configure incentives
    • User study
      – 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
    [Chart: number of reviews, number of semantic annotations (type of cuisine) and number of semantic annotations (dishes) for CAFE, FASTFOOD, PUB and RESTAURANT]
    https://play.google.com/store/apps/details?id=insemtives.android&hl=en
    11/11/2013
  • LODrefine
    http://research.zemanta.com/crowds-to-the-rescue/
  • Ontology Population
  • Linked Data Curation
  • Problems and Challenges
    • What is feasible and how can tasks be optimally translated into microtasks?
      – Examples: data quality assessment for technical and contextual features; subjective vs objective tasks (also in modeling); open-ended questions
    • What to show to users
      – Natural language descriptions of Linked Data/SPARQL
      – How much context
      – What form of rendering
      – How about links?
    • How to combine with automatic tools
      – Which results to validate
        • Low precision (no fun for gamers...)
        • Low recall (vs all possible questions)
    • How to embed it into an existing application
      – Tasks are fine-grained, perceived as additional burden to the actual functionality
    • What to do with the resulting data?
      – Integration into existing practices
      – Vocabularies!
  • Web site: https://sites.google.com/site/microtasktutorial/
    Slides and exercises: https://github.com/maribelacosta/crowdsourcingtutorial
    Full-day tutorial, ISWC2013, Sydney, Australia, 11/11/2013
  • For exercises, quiz and further material visit our website: http://www.euclid-project.eu
    Course, eBook
    Other channels: @euclid_project, euclidproject