Crowdsourcing tasks in Linked Data management

Talk delivered at the Consuming Linked Data Workshop, ISWC 2011

Transcript

  • 1. Crowdsourcing tasks in Linked Data management
      Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)
      (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
      (2) Ontotext AD, Bulgaria
      Institute of Applied Informatics and Formal Description Methods (AIFB)
      KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu
  • 2. Motivation
      • Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results.
      • But reaching a critical mass of useful contributions from all relevant stakeholders is still more an art than an engineering exercise.
      23.10.2011, Seminar "The Role of Ontologies in Linked Data", Kickoff, Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 3. Microtask platforms
      1. Define task
      2. Break task into smaller units
      3. Evaluate the results
  • 4. Approach
      • Formal, declarative description of the data and tasks, using SPARQL patterns as a basis for the automatic design of HITs (Human Intelligence Tasks)
      • Integral part of Linked Data tools and applications
      • At design time: the application developer specifies which data portions workers can process and via which types of HITs
      • At run time:
          • The system materializes the data
          • Workers process it
          • Data and application are updated to reflect crowdsourcing results
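
The design-time / run-time split on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' system: `HitSpec`, `materialization_query`, and `update_from_answer` are hypothetical names, and the generated strings omit the PREFIX declarations a real SPARQL endpoint would need.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HitSpec:
    """Design-time description of one HIT type (hypothetical structure)."""
    name: str
    input_pattern: str   # SPARQL graph pattern selecting what workers see
    output_pattern: str  # SPARQL graph pattern the workers' answers fill in


def materialization_query(spec: HitSpec) -> str:
    """Run time, step 1: query that materializes the data for the HITs."""
    return f"SELECT * WHERE {spec.input_pattern}"


def update_from_answer(spec: HitSpec, bindings: dict) -> str:
    """Run time, step 3: turn one worker answer into an update.

    Every ?variable in the output pattern is replaced by the already
    serialized RDF term found under that name in `bindings`.
    """
    filled = re.sub(r"\?(\w+)", lambda m: bindings[m.group(1)],
                    spec.output_pattern)
    return f"INSERT DATA {filled}"
```

For example, a spec built from the metadata-correction patterns shown later in the deck, with `{?station dbp:icao ?goodicao}` as output pattern, would yield an `INSERT DATA` statement once a worker supplies bindings for `?station` and `?goodicao`.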
  • 5. Examples of Linked Data tasks amenable to crowdsourcing
      • Identity resolution
      • Metadata completion and checking/correction
      • Classification
      • Ordering
          • Quantitative
          • Qualitative
      • Translation
  • 6. Running Example
  • 7. Identity resolution
      Identity resolution “involves the creation of sameAs links, either by comparison of metadata or by investigation of links on the human Web.”
      Input:
          {?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong .
           ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along}
      Output:
          {OPTIONAL {?airport owl:sameAs ?station}}
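
The input pattern above pairs every station with every airport, so some pre-filtering is likely needed before pairs are shown to workers. One possible sketch, assuming (this is not stated on the slide) that candidates are limited to pairs whose coordinates lie within a few kilometres of each other; `candidate_pairs`, the `max_km` threshold, and the record layout are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points (mean Earth radius)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def candidate_pairs(stations, airports, max_km=5.0):
    """Return (station id, airport id) pairs close enough to be worth a HIT.

    Each record is a dict with keys "id", "lat", "long" (assumed layout,
    mirroring the ?slat/?slong and ?alat/?along bindings).
    """
    return [
        (s["id"], a["id"])
        for s in stations
        for a in airports
        if haversine_km(s["lat"], s["long"], a["lat"], a["long"]) <= max_km
    ]
```

Workers would then only be asked to confirm or reject `owl:sameAs` for the surviving pairs, using the labels for comparison.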
  • 8. Metadata completion & correction
      “Certain properties, necessary for a given query, may not be uniformly populated. Manually conducted research might be necessary to transfer this information from the human-readable Web”
      Input:
          {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long; dbp:icao ?badicao}
      Output:
          {?station dbp:icao ?goodicao}
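
Which `?badicao` values are actually bad can be partly decided by machine before any worker is involved. A minimal sketch, assuming the only automated check is the syntactic one (ICAO location indicators are four letters); `looks_like_icao` and `needs_review` are hypothetical helper names, and a syntactically valid code can of course still be the wrong one.

```python
import re

ICAO_SHAPE = re.compile(r"[A-Z]{4}")


def looks_like_icao(code: str) -> bool:
    """Syntactic check only: exactly four upper-case ASCII letters."""
    return ICAO_SHAPE.fullmatch(code) is not None


def needs_review(bindings):
    """Keep the (station, code) pairs whose code fails the syntactic check.

    These are the rows that would be turned into correction HITs.
    """
    return [(station, code) for station, code in bindings
            if not looks_like_icao(code)]
```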
  • 9. Classification
      “Linked Data emphasis[es…] relationships between resources [over classification]. [D]ue to the promoted use of generic vocabularies, it is not always possible to infer classification from […] properties”
      Input:
          {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}
      Output:
          {?station a ?type. ?type rdfs:subClassOf metar:Station}
  • 10. Ordering
      “Having means to rank Linked Data content along specific dimensions is typically deemed useful for quantitative querying and browsing […both] “specific” ordering [(e.g. timestamps) … and] orderings […] via qualitative “less straightforward” built-ins [(e.g. pref/alt labels)]”
      Input:
          {?station foaf:depiction ?x, ?y}
      Output:
          {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
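
The output pattern asks for one of the two possible orderings of `?x` and `?y`. If several workers answer the same pair, their answers have to be reconciled; a simple majority vote is one option. The sketch below assumes that aggregation scheme, which the slide does not prescribe, and `majority_order` is a hypothetical name.

```python
from collections import Counter


def majority_order(x, y, votes):
    """Pick the ordering of (x, y) that most workers chose.

    `votes` is an iterable of (first, second) tuples, one per worker.
    Returns the winning tuple, or None on a tie (e.g. to trigger more HITs).
    """
    counts = Counter(votes)
    xy, yx = counts[(x, y)], counts[(y, x)]
    if xy == yx:
        return None
    return (x, y) if xy > yx else (y, x)
```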
  • 11. Translation
      “[An important] aspect of the labeling of resources for humans is multi-linguality […] actual provision of labels in non-English languages is currently rather low”
      Input:
          {?station rdfs:label ?enlabel. FILTER (LANGMATCHES(LANG(?enlabel), "en"))}
      Output:
          {?station rdfs:label ?bglabel. FILTER (LANGMATCHES(LANG(?bglabel), "bg"))}
  • 12. Open query answering
      Query a FOAF file using the vCard vocabulary:
          hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
                   foaf:nick "Harry" ;
                   foaf:familyName "Potter" .
          SELECT ?name ?email WHERE { ?p vcard:email ?email ; vcard:fn ?name }
      In order to answer the query as intended:
      • Vocabulary mapping and entity resolution (foaf to vcard)
      • Metadata completion (full name is Harry Potter)
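
The two steps named on this slide can be traced on the Harry Potter data with a toy model. Everything below is an illustrative assumption: triples are plain Python tuples, the `foaf:mbox` to `vcard:email` mapping and the full name stand in for answers that would come back from HITs, and `answer_query` hard-codes the one SELECT query from the slide (one value per property and subject).

```python
DATA = [
    ("hp:Harry", "foaf:mbox", "mailto:scarface@hogwarts.ac.uk"),
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
]


def apply_mapping(triples, mapping):
    """Vocabulary mapping: rename predicates that have a known counterpart."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


def complete(triples, full_names):
    """Metadata completion: add vcard:fn values supplied by workers."""
    return triples + [(s, "vcard:fn", name) for s, name in full_names.items()]


def answer_query(triples):
    """Evaluate { ?p vcard:email ?email ; vcard:fn ?name } as (name, email) rows."""
    emails = {s: o for s, p, o in triples if p == "vcard:email"}
    names = {s: o for s, p, o in triples if p == "vcard:fn"}
    return [(names[s], emails[s]) for s in emails if s in names]


# Assumed crowd answers, for illustration only.
mapped = apply_mapping(DATA, {"foaf:mbox": "vcard:email"})
completed = complete(mapped, {"hp:Harry": "Harry Potter"})
```

On the raw FOAF data the query returns no rows; after both steps it returns the single intended row.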
  • 13. Limitations of microtask crowdsourcing
      • Decomposability
      • Verifiability
      • Expertise
      • Compositions to deal with tasks with underspecified workflow and/or multiple correct answers
  • 14. Challenges
      • Decomposition of user-visible queries: SPARQL
          • Easy: low-quality (meta)data can be subject to automated checking (even if not fixing)
          • Medium: missing data (and translation) can be automatically identified (but it is not necessarily clear to which dataset it should belong)
          • Difficult: interlinking (at least sameAs) is somewhat implicit (using entailment), and it is unclear where the user expects it
      • Query optimisation obfuscates what is used and should involve costs for human tasks
          • Pig might be somewhat easier in the latter regard
      • Caching
          • Naively, we can materialise HIT results into datasets
          • How to deal with partial coverage and dynamic datasets?
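
The naive caching strategy mentioned on this slide can be sketched as a lookup table in front of the crowd. This is a minimal in-memory illustration under assumed names (`HitCache`, `ask_crowd`); it does nothing about the open questions of partial coverage and dynamic datasets beyond a blunt `invalidate`.

```python
class HitCache:
    """Remember HIT results so the same question is paid for only once."""

    def __init__(self, ask_crowd):
        # ask_crowd(hit_type, bindings) is a hypothetical callable that
        # publishes a HIT and returns the aggregated answer.
        self._ask_crowd = ask_crowd
        self._results = {}

    def resolve(self, hit_type, bindings):
        """Return the cached answer, asking the crowd only on a miss."""
        key = (hit_type, tuple(sorted(bindings.items())))
        if key not in self._results:
            self._results[key] = self._ask_crowd(hit_type, bindings)
        return self._results[key]

    def invalidate(self):
        """Drop all results, e.g. after the underlying dataset changed."""
        self._results.clear()
```

In a real deployment the results would be written back into a dataset rather than kept in memory, which is where the coverage and freshness questions arise.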
  • 15. Further Challenges
      • Appropriate level of granularity for HIT design for specific SPARQL constructs and typical functionality of Linked Data management components
      • Optimal user interfaces for graph-like content
          • (Contextual) rendering of LOD entities and tasks
      • Pricing and workers’ assignment
          • Can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs?
      • Dealing with spam / gaming
  • 16. Questions