Crowdsourcing tasks in Linked Data management

Talk delivered at the Consuming Linked Data Workshop, ISWC 2011

Transcript of "Crowdsourcing tasks in Linked Data management"

  1. Crowdsourcing tasks in Linked Data management
     Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)
     (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
     (2) Ontotext AD, Bulgaria
     Institute of Applied Informatics and Formal Description Methods (AIFB)
     KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
     www.kit.edu
  2. Motivation
     • Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results.
     • But reaching a critical mass of useful contributions from all relevant stakeholders is still more an art than an engineering exercise.
     Slide footer: 23.10.2011, Seminar "The Role of Ontologies in Linked Data" (Kickoff), Institute of Applied Informatics and Formal Description Methods (AIFB)
  3. Microtask platforms
     Define task → Break task into smaller units → Evaluate the results
  4. Approach
     • Formal, declarative description of the data and tasks, using SPARQL patterns as a basis for the automatic design of HITs (Human Intelligence Tasks)
     • Integral part of Linked Data tools and applications
     • At design time: the application developer specifies which data portions workers can process and via which types of HITs
     • At run time:
         • The system materializes the data
         • Workers process it
         • Data and application are updated to reflect crowdsourcing results
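The design-time / run-time split on this slide can be sketched roughly as follows. This is only an illustration: `HitSpec`, `materialize`, and `apply_results` are hypothetical names, the bindings are plain dictionaries standing in for SPARQL query solutions, and the slides do not show how the actual system is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitSpec:
    """Design time: which data workers may process, and via which HIT type."""
    hit_type: str      # hypothetical label, e.g. "metadata_correction"
    input_vars: tuple  # variables of the input pattern shown to the worker
    output_var: str    # variable of the output pattern the worker fills in


def materialize(spec, bindings):
    """Run time, step 1: turn each solution of the input pattern into one HIT."""
    return [
        {
            "type": spec.hit_type,
            "shown": {v: row[v] for v in spec.input_vars},
            "asked": spec.output_var,
        }
        for row in bindings
    ]


def apply_results(dataset, subject_var, predicate, hits, answers):
    """Run time, step 3: write worker answers back as (s, p, o) triples."""
    for hit, answer in zip(hits, answers):
        if answer is not None:  # a worker may skip a HIT
            dataset.add((hit["shown"][subject_var], predicate, answer))
    return dataset


# Made-up placeholder values, for illustration only.
spec = HitSpec("metadata_correction", ("station", "label", "badicao"), "goodicao")
rows = [{"station": "ex:s1", "label": "Example station", "badicao": "????"}]
hits = materialize(spec, rows)
updated = apply_results(set(), "station", "dbp:icao", hits, ["ABCD"])
```

Step 2 (workers processing the HITs) happens on the microtask platform and is represented here only by the list of answers passed to `apply_results`.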
  5. Examples of Linked Data tasks amenable to crowdsourcing
     • Identity resolution
     • Metadata completion and checking/correction
     • Classification
     • Ordering
         • Quantitative
         • Qualitative
     • Translation
  6. Running Example
  7. Identity resolution
     Identity resolution “involves the creation of sameAs links, either by comparison of metadata or by investigation of links on the human Web.”
     Input:
       {?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong .
        ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along}
     Output:
       {OPTIONAL {?airport owl:sameAs ?station}}
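The input pattern above pairs every station with every airport, which would produce a very large number of HITs. One plausible way to narrow this down before asking workers (an assumption for illustration, not something the slides describe) is to keep only pairs whose coordinates are close. The function names and the 5 km threshold below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def candidate_pairs(stations, airports, max_km=5.0):
    """Return (station, airport) pairs close enough to be worth showing to a worker.

    `stations` and `airports` map an identifier to a (lat, long) tuple;
    `max_km` is an assumed threshold, not a value from the talk.
    """
    pairs = []
    for sid, (slat, slong) in sorted(stations.items()):
        for aid, (alat, along) in sorted(airports.items()):
            if haversine_km(slat, slong, alat, along) <= max_km:
                pairs.append((sid, aid))
    return pairs
```

Each surviving pair would become one HIT asking whether the airport and the station denote the same thing, with the worker's answer deciding whether the optional `owl:sameAs` triple is added.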
  8. Metadata completion & correction
     “Certain properties, necessary for a given query, may not be uniformly populated. Manually conducted research might be necessary to transfer this information from the human-readable Web”
     Input:
       {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long; dbp:icao ?badicao}
     Output:
       {?station dbp:icao ?goodicao}
  9. Classification
     “Linked Data emphasis[es…] relationships between resources [over classification]. [D]ue to the promoted use of generic vocabularies, it is not always possible to infer classification from […] properties”
     Input:
       {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}
     Output:
       {?station a ?type. ?type rdfs:subClassOf metar:Station}
  10. Ordering
     “Having means to rank Linked Data content along specific dimensions is typically deemed useful for quantitative querying and browsing […both] “specific” ordering [(e.g. timestamps) … and] orderings […] via qualitative “less straightforward” built-ins [(e.g. pref/alt labels)]”
     Input:
       {?station foaf:depiction ?x, ?y}
     Output:
       {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
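Since the output pattern allows either `(?x ?y)` or `(?y ?x)`, several workers may disagree on the same pair. A simple way to settle this, sketched here as an assumption rather than as the approach used in the talk, is a majority vote over the submitted orderings; `majority_order` is a hypothetical helper.

```python
from collections import Counter


def majority_order(x, y, answers):
    """Pick the ordering of x and y that most workers submitted.

    `answers` is a list of (first, second) tuples, one per worker.
    Returns the winning tuple, or None when the vote is tied.
    """
    votes = Counter(answers)
    xy, yx = votes[(x, y)], votes[(y, x)]
    if xy == yx:
        return None  # no decision; the HIT could be re-issued
    return (x, y) if xy > yx else (y, x)
```

For example, `majority_order("img1", "img2", [("img1", "img2"), ("img2", "img1"), ("img1", "img2")])` selects `("img1", "img2")`, which would then be materialised as the `rdf:List`.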
  11. Translation
     “[An important] aspect of the labeling of resources for humans is multi-linguality […] actual provision of labels in non-English languages is currently rather low”
     Input:
       {?station rdfs:label ?enlabel. FILTER (langMatches(LANG(?enlabel), "en"))}
     Output:
       {?station rdfs:label ?bglabel. FILTER (langMatches(LANG(?bglabel), "bg"))}
  12. Open query answering
     Query a FOAF file using the vCard vocabulary:
       hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
                foaf:nick "Harry" ;
                foaf:familyName "Potter" .
       SELECT ?name ?email
       WHERE { ?p vcard:email ?email ; vcard:fn ?name }
     In order to answer the query as intended:
     • Vocabulary mapping and entity resolution (foaf to vcard)
     • Metadata completion (full name is Harry Potter)
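The two steps named on this slide can be illustrated with a small sketch over plain tuples instead of a real triple store. The mapping `foaf:mbox` to `vcard:email` is an assumed result of the vocabulary-mapping step, and `translate` and `answer` are hypothetical helpers; the point is only to show why the query returns nothing until the full name has been completed.

```python
# Assumed outcome of the vocabulary-mapping step (foaf to vcard).
FOAF_TO_VCARD = {"foaf:mbox": "vcard:email"}

triples = {
    ("hp:Harry", "foaf:mbox", "mailto:scarface@hogwarts.ac.uk"),
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
}


def translate(triples, mapping):
    """Rewrite predicates that have a known counterpart in the target vocabulary."""
    return {(s, mapping.get(p, p), o) for s, p, o in triples}


def answer(triples, wanted):
    """Return (rows, gaps): complete result rows, and (subject, predicate) pairs still missing."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    rows, gaps = [], []
    for s in sorted(by_subject):
        props = by_subject[s]
        missing = [p for p in wanted if p not in props]
        if missing:
            gaps.extend((s, p) for p in missing)  # candidates for completion HITs
        else:
            rows.append(tuple(props[p] for p in wanted))
    return rows, gaps


mapped = translate(triples, FOAF_TO_VCARD)
rows, gaps = answer(mapped, ["vcard:fn", "vcard:email"])

# After a worker supplies the full name, the query can be answered.
completed = mapped | {("hp:Harry", "vcard:fn", "Harry Potter")}
rows_after, gaps_after = answer(completed, ["vcard:fn", "vcard:email"])
```

Before completion, `gaps` reports that `vcard:fn` is missing for `hp:Harry`, which is exactly the metadata-completion task the slide refers to.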
  13. Limitations of microtask crowdsourcing
     • Decomposability
     • Verifiability
     • Expertise
     • Compositions to deal with tasks with underspecified workflow and/or multiple correct answers
  14. Challenges
     • Decomposition of user-visible queries: SPARQL
         • Easy: low-quality (meta)data can be subject to automated checking (even if not fixing)
         • Medium: missing data (and translation) can be automatically identified (but it is not necessarily clear to which dataset it should belong)
         • Difficult: interlinking (at least sameAs) is somewhat implicit (using entailment), as is knowing where the user expects it
     • Query optimisation obfuscates what is used and should involve costs for human tasks
         • Pig might be somewhat easier in the latter regard
     • Caching
         • Naively, we can materialise HIT results into datasets
         • How to deal with partial coverage and dynamic datasets?
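The naive caching idea and its two open problems can be made concrete with a small sketch. `HitCache` is a hypothetical class, not part of any system described in the talk: `split` exposes partial coverage by separating answered from unanswered items, and `invalidate` is one simple (assumed) reaction to a dataset that changes.

```python
class HitCache:
    """Naive store of HIT results, keyed by HIT type and the resource asked about."""

    def __init__(self):
        self._answers = {}

    def put(self, hit_type, key, answer):
        """Materialise one worker-provided answer."""
        self._answers[(hit_type, key)] = answer

    def invalidate(self, key):
        """Drop every cached answer about `key`, e.g. after the source data changed."""
        for cached_key in [k for k in self._answers if k[1] == key]:
            del self._answers[cached_key]

    def split(self, hit_type, keys):
        """Partial coverage: return (cached answers, keys that still need a HIT)."""
        cached, missing = {}, []
        for key in keys:
            if (hit_type, key) in self._answers:
                cached[key] = self._answers[(hit_type, key)]
            else:
                missing.append(key)
        return cached, missing
```

A query processor could answer from `cached` immediately and issue new HITs only for `missing`; how long to wait for those, and when cached answers go stale, remain the open questions raised on the slide.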
  15. Further challenges
     • Appropriate level of granularity for HIT design, for specific SPARQL constructs and typical functionality of Linked Data management components
     • Optimal user interfaces for graph-like content
         • (Contextual) rendering of LOD entities and tasks
     • Pricing and workers’ assignment
         • Can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs?
     • Dealing with spam / gaming
  16. Questions