Aaai2012

Transcript

  • 1. Crowdsourcing tasks in open query answering
    Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)
    (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
    (2) Ontotext AD, Bulgaria
    Institute of Applied Informatics and Formal Description Methods (AIFB)
    KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
    www.kit.edu
  • 2. Background: what is Linked Data?
    Linked Data: a set of best practices for publishing and connecting structured data on the Web.
        • URIs to identify entities and concepts in the world
        • HTTP to access and retrieve resources and descriptions of these resources
        • RDF as a generic graph-based data model to structure and link data
    Taken together, Linked Data is said to form a ‘cloud’ of shared references and vocabularies.
    http://linkeddata.org/faq
    07.06.2012 Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
  • 3. Background: why is Linked Data important?
        • Data.gov & public sector information: more transparency and accountability in governance
        • BBC & media: added value of content through interlinking
        • Google, Yahoo, Bing & schema.org: enhanced search
  • 4. Crowdsourcing Linked Data management
    Tasks requiring human contributions:
        • Interlinking
        • Conceptual modeling
        • Labeling and translation
        • Classification
        • Ordering
    Crowdsourcing already in use
  • 5. Example: open query answering
    Query FOAF data using the vCard vocabulary:
        hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
                 foaf:nick "Harry" ;
                 foaf:familyName "Potter" .
        SELECT ?name ?email WHERE { ?p vcard:email ?email ; vcard:fn ?name }
    In order to answer the query as intended:
        • Vocabulary mapping and entity resolution (FOAF to vCard)
        • Metadata completion (full name is “Harry Potter”)
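The two steps named on this slide can be illustrated with a minimal Python sketch. It is not the authors' system: the mapping table `FOAF_TO_VCARD`, the helper names, and the rule that nickname plus family name gives the full name are all assumptions made for illustration. In the crowdsourcing setting, these are exactly the decisions a human worker would supply or confirm.

```python
# Hypothetical FOAF-to-vCard mapping; assumed here, not taken from the slides.
FOAF_TO_VCARD = {"foaf:mbox": "vcard:email"}

# The FOAF description from the slide as (subject, predicate, object) triples.
triples = [
    ("hp:Harry", "foaf:mbox", "<mailto:scarface@hogwarts.ac.uk>"),
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
]

def by_subject(triples):
    """Group triples into {subject: {predicate: object}} (single-valued for brevity)."""
    props = {}
    for s, p, o in triples:
        props.setdefault(s, {})[p] = o
    return props

def map_vocabulary(triples, mapping):
    """Vocabulary mapping: add a vCard triple for every mapped FOAF predicate."""
    return triples + [(s, mapping[p], o) for s, p, o in triples if p in mapping]

def complete_full_name(triples):
    """Metadata completion: derive vcard:fn from nickname and family name.
    Treating the nickname as the given name is an assumption."""
    added = [
        (s, "vcard:fn", po["foaf:nick"] + " " + po["foaf:familyName"])
        for s, po in by_subject(triples).items()
        if "foaf:nick" in po and "foaf:familyName" in po
    ]
    return triples + added

def answer_query(triples):
    """Evaluate: SELECT ?name ?email WHERE { ?p vcard:email ?email ; vcard:fn ?name }"""
    return [
        (po["vcard:fn"], po["vcard:email"])
        for po in by_subject(triples).values()
        if "vcard:fn" in po and "vcard:email" in po
    ]

enriched = complete_full_name(map_vocabulary(triples, FOAF_TO_VCARD))
results = answer_query(enriched)
```

On the raw FOAF triples the query has no solutions, because no triple uses a vCard property. After both steps it returns one row for hp:Harry.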
  • 6. Crowdsourcing-enabled query answering
        • Integral part of a query engine
        • At design time, the application developer specifies which data portions workers can process and via which types of HITs (Human Intelligence Tasks)
        • At run time:
            • The system materializes the data
            • Workers process it
            • Data and application are updated to reflect crowdsourcing results
        • Formal, declarative description of the data and tasks, using SPARQL patterns as a basis for the automatic design of HITs
        • Reducing the number of tasks through automatic reasoning
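The design-time and run-time split described on this slide might look roughly like the following Python sketch. The `HitSpec` class, its field names, and the sample bindings are hypothetical and are not taken from the slides. Query solutions are modelled as plain dictionaries rather than real SPARQL results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HitSpec:
    """Design-time declaration (illustrative): which variables workers are
    shown and which variable they are asked to fill in."""
    hit_type: str
    input_vars: tuple
    output_var: str

def materialize_hits(spec, bindings):
    """Run time, step 1: create one HIT per solution that lacks the output value."""
    hits = []
    for row in bindings:
        if row.get(spec.output_var) is None:
            hits.append({
                "type": spec.hit_type,
                "shown": {v: row[v] for v in spec.input_vars},
                "row": row,  # reference kept so answers can be written back
            })
    return hits

def apply_answers(hits, answers, spec):
    """Run time, step 3: update the data with the workers' answers."""
    for hit, answer in zip(hits, answers):
        hit["row"][spec.output_var] = answer

# Made-up solutions: the first one has no type yet, the second one does.
spec = HitSpec("classification", ("label",), "type")
bindings = [
    {"label": "Station A", "type": None},
    {"label": "Station B", "type": "metar:Station"},
]
hits = materialize_hits(spec, bindings)
apply_answers(hits, ["ex:AirportStation"], spec)  # stand-in for a worker's answer
```

Skipping solutions that already carry a value is only the simplest way of avoiding unnecessary tasks. The automatic reasoning mentioned on the slide would go further than this.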
  • 7. Example: Identity resolution
    Identity resolution involves the creation of links, either by comparison of metadata or by investigation of links on the human Web.
    Input:
        {?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong .
         ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along}
    Output:
        {OPTIONAL {?airport owl:sameAs ?station}}
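As a sketch of how the input pattern above could be turned into worker tasks, the Python code below pairs stations with airports and asks one yes/no question per pair. The distance pre-filter, the 10 km threshold, the question wording, and all sample records are assumptions added for illustration. The slides do not say how candidate pairs are selected.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def candidate_pairs(stations, airports, max_km=10.0):
    """Keep only station/airport pairs close enough to plausibly be the same place
    (assumed heuristic; the threshold is arbitrary)."""
    return [
        (s, a)
        for s in stations
        for a in airports
        if haversine_km(s["lat"], s["long"], a["lat"], a["long"]) <= max_km
    ]

def render_question(station, airport):
    """Text of one identity-resolution HIT (hypothetical wording)."""
    return f'Do "{station["label"]}" and "{airport["label"]}" refer to the same place?'

# Made-up records standing in for solutions of the input pattern.
stations = [{"uri": "ex:station1", "label": "Station A", "lat": 50.03, "long": 8.56}]
airports = [
    {"uri": "ex:airport1", "label": "Airport A", "lat": 50.04, "long": 8.57},
    {"uri": "ex:airport2", "label": "Airport B", "lat": 48.35, "long": 11.78},
]
pairs = candidate_pairs(stations, airports)
questions = [render_question(s, a) for s, a in pairs]
```

A "yes" answer would correspond to asserting the `owl:sameAs` triple from the output pattern. The `OPTIONAL` in that pattern reflects that a pair may get no link.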
  • 8. Example: Classification
    The classification of entities into classes cannot always be inferred automatically from the schema.
    Input:
        {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}
    Output:
        {?station a ?type. ?type rdfs:subClassOf metar:Station}
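One way to read the output pattern is as a multiple-choice task whose options are the subclasses of `metar:Station`. The Python sketch below assumes that reading and combines several workers' votes by simple majority. The class names under `ex:` and the agreement rule are hypothetical, and the slides do not prescribe an aggregation method.

```python
from collections import Counter

def subclasses_of(parent, subclass_pairs):
    """Answer options: every class declared rdfs:subClassOf the parent."""
    return sorted(child for child, par in subclass_pairs if par == parent)

def majority_type(votes, options, min_agreement=2):
    """Return the winning class, or None if there is a tie or too little agreement.
    Votes outside the offered options are ignored."""
    valid = [v for v in votes if v in options]
    if not valid:
        return None
    ranked = Counter(valid).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent answers
    winner, count = ranked[0]
    return winner if count >= min_agreement else None

# Made-up schema fragment: (child, parent) pairs for rdfs:subClassOf.
subclass_pairs = [
    ("ex:AirportStation", "metar:Station"),
    ("ex:MarineStation", "metar:Station"),
    ("ex:Runway", "ex:Infrastructure"),
]
options = subclasses_of("metar:Station", subclass_pairs)
chosen = majority_type(
    ["ex:AirportStation", "ex:AirportStation", "ex:MarineStation"], options
)
```

Returning `None` leaves the entity unclassified, so the task can be posted again instead of recording a weakly supported type.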
  • 9. Challenges
        • Decomposition of queries
            • Query optimisation obfuscates what is used and should involve costs for human tasks
        • Query execution and caching
            • Naively we can materialise HIT results into datasets
            • How to deal with partial coverage and dynamic datasets
        • Appropriate level of granularity for HITs design for specific SPARQL constructs and typical functionality of Linked Data management components
        • Optimal user interfaces of graph-like content
            • (Contextual) Rendering of LOD entities and tasks
        • Pricing and workers’ assignment
            • Can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs?
        • Dealing with spam / gaming
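The naive materialisation of HIT results mentioned under query execution and caching could be sketched in Python as below. The `HitResultCache` class and the `ask_crowd` callback are hypothetical. Invalidation is included only as a hint at the dynamic-dataset problem, which the slides raise as an open question.

```python
class HitResultCache:
    """Stores each HIT result once, so repeated queries do not post the task again."""

    def __init__(self):
        self._results = {}
        self.posted = 0  # number of HITs actually sent to workers

    @staticmethod
    def _key(hit_type, inputs):
        # Inputs are assumed to be a flat dict of hashable values.
        return (hit_type, tuple(sorted(inputs.items())))

    def resolve(self, hit_type, inputs, ask_crowd):
        """Return the cached result, posting a HIT only on a cache miss."""
        key = self._key(hit_type, inputs)
        if key not in self._results:
            self.posted += 1
            self._results[key] = ask_crowd(hit_type, inputs)
        return self._results[key]

    def invalidate(self, hit_type, inputs):
        """Drop a result, e.g. after the underlying data changed."""
        self._results.pop(self._key(hit_type, inputs), None)

def fake_crowd(hit_type, inputs):
    """Stand-in for a real crowdsourcing platform call."""
    return "yes"

cache = HitResultCache()
task = {"station": "ex:station1", "airport": "ex:airport1"}
first = cache.resolve("identity", task, fake_crowd)
second = cache.resolve("identity", task, fake_crowd)  # served from the cache
```

Because every cache miss costs money and worker time, the counter `posted` is the quantity a cost-aware query optimiser would want to minimise.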
  • 10. QUESTIONS