Crowdsourcing-enabled Linked Data management architecture

Crowdsourcing with and for Linked Data at the CrowdSearch workshop @WWW2012


  1. 1. A semantically enabled architecture for crowdsourced Linked Data management
     Elena Simperl (1), Maribel Acosta (1), Barry Norton (2)
     (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
     (2) Ontotext AD, Bulgaria
     Institute of Applied Informatics and Formal Description Methods (AIFB)
     KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu
  2. 2. Background: What is Linked Data?
     Linked Data: a set of best practices to publish and connect structured data on the Web.
     - URIs to identify entities and concepts in the world
     - HTTP to access and retrieve resources and descriptions of these resources
     - RDF as a generic graph-based data model to structure and link data
     Taken together, Linked Data is said to form a 'cloud' of shared references and vocabularies. Query language: SPARQL.
     http://linkeddata.org/faq
     07.06.2012, CrowdSearch 2012: "A semantically enabled architecture for crowdsourced Linked Data management", Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
  3. 3. Background: Why Linked Data?
     - Data.gov & public sector information: more transparency and accountability in governance
     - BBC & media: added value of content through interlinking
     - Google, Yahoo, Bing & schema.org: enhanced search
  4. 4. Outline
     1. Motivation
     2. Our Approach
     3. Extensions to VoID and SPARQL
     4. Crowdsourced query processing tasks
     5. Advantages
     6. Challenges
  5. 5. 1. Motivation
     User query: "Give me the German names of all commercial airports in Baden-Württemberg, ordered by their most informative description."
     Reformulated: "Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human-readable description of the airport given in the comment."
     This query cannot be answered optimally by automatic means alone:
     - Incorrect or missing classification of entities (e.g. classified as airports rather than commercial airports).
     - Missing information in data sets (e.g. German labels).
     - Subjective operations cannot be performed optimally by a machine (e.g. comparisons of pictures or natural-language comments).
  6. 6. 1. Motivation
     "Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human-readable description of the airport given in the comment."
     In order to answer the query as intended:
     - Classification of airports as commercial airports.
     - Identity resolution of places (Baden-Württemberg).
     - Translation of the labels of the airports.
     - Ordering of the comments by a subjective comparison.
  7. 7. 1. Motivation
     "Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human-readable description of the airport given in the comment."
     SPARQL query:
         SELECT ?label WHERE {
           ?x a metar:CommercialHubAirport;      # (1) Classification
              rdfs:label ?label;
              rdfs:comment ?comment .
           ?x geonames:parentFeature ?z .
           ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> .      # (2) Identity resolution
           FILTER (LANG(?label) = "de")      # (3) Missing information
         } ORDER BY CROWD(?comment, "Better description of %x")      # (4) Ordering
  8. 8. 1. Motivation: Our Aim
     A SPARQL query engine able to process queries using a seamless combination of automatic query processing and crowdsourcing.
     [Figure: a mediator takes a query and returns results. It combines a SPARQL query engine (query parsing, query optimization, query execution) with crowdsourced query processing (task design, UI generation), on top of four wrappers.]
  9. 9. 2. Our Approach: Parser
     - Decomposes the input query.
     - Selects the data sets that should be accessed to produce answers.
     - Rewrites the query into the internal structures.
  10. 10. 2. Our Approach: Optimizer
      - Uses DB statistics and crowdsourcing statistics: estimated time to completion and other information about the performance (quality, cost) of the crowd.
      - Implements traditional database optimization techniques.
      - Determines which parts of the query should be solved by human input, based on the VoID and SPARQL extensions.
      - Generates logical and physical plans.
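The optimizer step that decides which parts of a query need human input can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `partition_patterns`, the tuple representation of triple patterns, and the choice of which classes and properties count as crowdsourceable are all assumptions made for the example.

```python
# Illustrative sketch only: function and variable names are assumptions,
# not part of the architecture described in the slides.

def partition_patterns(patterns, crowd_classes, crowd_properties):
    """Split triple patterns into (human, automatic) lists."""
    human, automatic = [], []
    for subject, predicate, obj in patterns:
        # A type pattern needs the crowd if its class is a crowd class.
        is_crowd_class = predicate == "a" and obj in crowd_classes
        # Any pattern over a crowd property needs the crowd as well.
        is_crowd_property = predicate in crowd_properties
        if is_crowd_class or is_crowd_property:
            human.append((subject, predicate, obj))
        else:
            automatic.append((subject, predicate, obj))
    return human, automatic


# Triple patterns of the motivating query (slide 7).
QUERY = [
    ("?x", "a", "metar:CommercialHubAirport"),
    ("?x", "rdfs:label", "?label"),
    ("?x", "rdfs:comment", "?comment"),
    ("?x", "geonames:parentFeature", "?z"),
    ("?z", "owl:sameAs", "<http://dbpedia.org/resource/Baden-Wuerttemberg>"),
]

# Assumed data set description: which classes and properties are crowdsourceable.
CROWD_CLASSES = {"metar:Airport", "metar:CommercialHubAirport"}
CROWD_PROPERTIES = {"rdfs:label", "owl:sameAs"}

HUMAN, AUTOMATIC = partition_patterns(QUERY, CROWD_CLASSES, CROWD_PROPERTIES)
```

With these assumed inputs, the type, label, and `owl:sameAs` patterns land in `HUMAN`, while the comment and `geonames:parentFeature` patterns stay in `AUTOMATIC`. A real optimizer would additionally weigh the crowd statistics (time, quality, cost) mentioned on the slide.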
  11. 11. 2. Our Approach: Executor
      - Implements physical operators.
      - Invokes the crowdsourcing component:
        - Creates tasks.
        - Generates the UI.
      - Infers facts automatically.
      - Executes the query against Linked Data: computational tasks.
      - Incorporates results from the human input.
  12. 12. 3. Extensions to VoID and SPARQL
      The RDF-based schema to describe data sets is VoID (Vocabulary of Interlinked Datasets).
      Common VoID classes and properties: void:Dataset, void:inDataset, void:Linkset, void:linkPredicate, void:target.
      - Automatic interlinking of data sets
      VoID extensions:
      - CrowdClass
      - CrowdProperty
  13. 13. 3. Extensions to VoID and SPARQL
      Automatic interlinking of data sets
      Example, specification of data sets:
          :METAR rdf:type void:Dataset .
          :Geonames rdf:type void:Dataset .
          :METAR2Geonames rdf:type void:Linkset ;
              void:linkPredicate owl:sameAs ;
              void:target :METAR ;
              void:target :Geonames .
      [Figure: METAR linked to Geonames via owl:sameAs]
  14. 14. 3. Extensions to VoID and SPARQL
      CrowdClass
      - Specifies which entities of a data set could be crowdsourced.
      - All subclasses of a crowdClass are also (implicitly) defined as crowdsourced entities.
      Example:
          metar:Airport void:inDataset :METAR .
          metar:CommercialHubAirport void:inDataset :METAR;
              rdfs:subClassOf metar:Airport .
          metar:Airport rdf:type void:crowdClass .
          metar:CommercialHubAirport rdf:type void:crowdClass.
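The rule that subclasses of a crowd class are implicitly crowd classes too can be illustrated with a small fixed-point computation. This is a sketch under assumptions: the function name `crowd_classes` and the pair-based representation of the subclass hierarchy are made up for the example and are not taken from the slides.

```python
# Hypothetical sketch: names and data layout are illustrative, not the authors' code.

def crowd_classes(declared, subclass_of):
    """Return the declared crowd classes plus all direct and indirect subclasses.

    declared: set of class names explicitly typed as crowd classes.
    subclass_of: iterable of (subclass, superclass) pairs.
    """
    result = set(declared)
    changed = True
    while changed:  # repeat until no new class is added (fixed point)
        changed = False
        for sub, sup in subclass_of:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


# The hierarchy from the example above: only metar:Airport is declared explicitly.
SUBCLASS_OF = [("metar:CommercialHubAirport", "metar:Airport")]
CROWD = crowd_classes({"metar:Airport"}, SUBCLASS_OF)
```

Here `CROWD` ends up containing both `metar:Airport` and `metar:CommercialHubAirport`, which matches the last triple of the example being implied rather than asserted.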
  15. 15. 3. Extensions to VoID and SPARQL
      RDF data can be queried using the language SPARQL.
      Common SPARQL operators: join, union, optional, filter, order by.
      Properties related to general ontology languages such as OWL are treated as extensions of SPARQL operators, and are modeled in our architecture as tasks.
  16. 16. 4. Tasks
      Formal, declarative description of the data and tasks using SPARQL patterns as a basis for the automatic design of HITs.
      - Identity resolution
      - Missing information
      - Ontological classification
      - Ordering (new operator)
  17. 17. 4.1. Ontological Classification
      It is not always possible to automatically infer classification from the properties.
      Example: Retrieve the names (labels) of METAR stations that correspond to commercial airports.
          SELECT ?label WHERE {
            ?station a metar:CommercialHubAirport;
                     rdfs:label ?label .
          }
      Input: {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}
      Output: {?station a ?type. ?type rdfs:subClassOf metar:Station}
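One way to read the input and output patterns above is as a recipe for a HIT: the input bindings are what the worker sees, and the output pattern constrains the answers the worker may give. The sketch below is an assumption about how that could look; the function `classification_hit`, the dictionary layout, the sample binding, and the subclass list (including the made-up `ex:OtherStation`) are hypothetical and only serve the illustration.

```python
# Hypothetical sketch of HIT generation for a classification task.
# The data below is invented for illustration and does not come from METAR.

def classification_hit(binding, superclass, subclass_of):
    """Build a HIT description from one solution of the input pattern.

    binding: dict with the input variables (station, label, lat, long).
    superclass: class whose direct subclasses are offered as answers.
    subclass_of: iterable of (subclass, superclass) pairs.
    """
    # Output pattern: ?type rdfs:subClassOf <superclass> (direct subclasses only).
    options = sorted(sub for sub, sup in subclass_of if sup == superclass)
    return {
        "subject": binding["station"],
        "question": f"Which type best describes '{binding['label']}'?",
        "shown": {key: binding[key] for key in ("label", "lat", "long")},
        "options": options,
    }


EXAMPLE_BINDING = {
    "station": "metar:EXAMPLE",
    "label": "Example Station",
    "lat": 0.0,
    "long": 0.0,
}
EXAMPLE_HIERARCHY = [
    ("metar:Airport", "metar:Station"),
    ("ex:OtherStation", "metar:Station"),
]

HIT = classification_hit(EXAMPLE_BINDING, "metar:Station", EXAMPLE_HIERARCHY)
```

The worker's selected option would then be turned into a triple of the form `?station a ?type`, as given by the output pattern.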
  18. 18. 4.2. Ordering
      Orderings defined via less straightforward built-ins; for instance, the ordering of pictorial representations of entities.
      SPARQL extension: ORDER BY CROWD
      Example: Retrieve all airports and their pictures, with the pictures ordered by how representative each image is of the given airport.
          SELECT ?airport ?picture WHERE {
            ?airport a metar:Airport;
                     foaf:depiction ?picture .
          } ORDER BY CROWD(?picture, "Most representative image for %airport")
      Input: {?airport foaf:depiction ?x, ?y}
      Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
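The input and output patterns describe a pairwise comparison: the worker sees two pictures and returns them in preferred order. A minimal sketch of how such answers could drive a sort is shown below. The function names, the picture names, and the simulated crowd (a fixed score table standing in for human workers) are assumptions for illustration only.

```python
from functools import cmp_to_key

# Hypothetical sketch: the "crowd" is simulated by a fixed score table.

def crowd_order(items, ask_crowd):
    """Sort items using pairwise answers of the form (preferred, other)."""
    def compare(x, y):
        first, _second = ask_crowd(x, y)  # mirrors the output (?x ?y) or (?y ?x)
        return -1 if first == x else 1
    return sorted(items, key=cmp_to_key(compare))


# Stand-in for human judgement: higher score means more representative.
SIMULATED_SCORES = {"picture_a": 3, "picture_b": 1, "picture_c": 2}

def simulated_crowd(x, y):
    """Return the pair with the preferred picture first."""
    if SIMULATED_SCORES[x] >= SIMULATED_SCORES[y]:
        return (x, y)
    return (y, x)


ORDERED = crowd_order(["picture_b", "picture_a", "picture_c"], simulated_crowd)
```

With the simulated scores, `ORDERED` is `["picture_a", "picture_c", "picture_b"]`. Real crowd answers may be inconsistent or non-transitive, so a production design would need aggregation over several workers; that is outside this sketch.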
  19. 19. 4.3. Computational tasks expressed as SPARQL queries
      Transitive relations inferred automatically, without requiring human intervention.
      Implementation of restrictions in SPIN.
      Identity Resolution:
          CONSTRUCT {
            ?a owl:sameAs ?c .
          } WHERE {
            ?a owl:sameAs ?b .
            ?b owl:sameAs ?c .
          }
      Classification:
          CONSTRUCT {
            ?a a ?b.
            ?b rdfs:subClassOf ?c.
          } WHERE {
            ?a rdfs:subClassOf ?c.
            ?b rdfs:subClassOf ?b1.
            ?b1 rdfs:subClassOf ?c.
          }
      Ordering:
          CONSTRUCT {
            {(?a ?b) a rdf:List .}
          } WHERE {
            (?a ?x) a rdf:List .
            (?x ?b) a rdf:List .
          }
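The identity resolution and ordering rules above share one shape: from (a, b) and (b, c), derive (a, c). A minimal sketch of applying such a rule until nothing new is derived follows. The function name `transitive_closure` and the sample pairs are assumptions for illustration; the sketch covers transitivity only and does not model other properties of `owl:sameAs` such as symmetry.

```python
# Hypothetical sketch: repeated application of a transitivity rule
# until a fixed point is reached.

def transitive_closure(pairs):
    """Return all pairs derivable by chaining (a, b) and (b, c) into (a, c)."""
    closure = set(pairs)
    while True:
        derived = {
            (a, c)
            for (a, b) in closure
            for (b2, c) in closure
            if b == b2
        } - closure
        if not derived:  # nothing new: fixed point reached
            return closure
        closure |= derived


# Invented identifiers standing in for crowd-confirmed links or orderings.
CONFIRMED = {("A", "B"), ("B", "C"), ("C", "D")}
INFERRED = transitive_closure(CONFIRMED)
```

Three confirmed pairs yield six pairs in total, so the three additional ones never have to be sent to the crowd.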
  20. 20. 5. Advantages
      - A declarative description of the data makes it possible to decompose the query.
      - Automatic generation of the UIs.
      - Generation of human tasks on the fly and adjustment of the task design.
      - Automatic consistency check of results by reasoning against a validating ontology.
  21. 21. 6. Challenges
      - Appropriate level of granularity of HIT design for specific SPARQL constructs.
      - Caching:
        - Naively, we can materialise HIT results into data sets.
        - How to deal with partial coverage and dynamic data sets?
      - Optimal user interfaces for graph-like content.
      - Pricing and workers' assignment.
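The naive caching idea, materialising HIT results so the same question is not posted twice, can be sketched as a lookup table in front of the crowd. Everything here is an assumption made for illustration: the class `HitCache`, its method names, and the fake crowd function are not from the slides, and the sketch deliberately ignores the open issues of partial coverage and dynamic data sets.

```python
# Hypothetical sketch of naive HIT result caching.

class HitCache:
    """Remember crowd answers per (task, inputs) and reuse them."""

    def __init__(self, ask_crowd):
        self._ask_crowd = ask_crowd
        self._answers = {}
        self.crowd_calls = 0  # how many HITs were actually posted

    def answer(self, task, inputs):
        key = (task, inputs)
        if key not in self._answers:
            self.crowd_calls += 1
            self._answers[key] = self._ask_crowd(task, inputs)
        return self._answers[key]


def fake_crowd(task, inputs):
    """Stand-in for posting a HIT; returns a canned answer."""
    return f"answer to {task} for {inputs}"


CACHE = HitCache(fake_crowd)
FIRST = CACHE.answer("classification", ("metar:EXAMPLE",))
SECOND = CACHE.answer("classification", ("metar:EXAMPLE",))
```

The second call is served from the cache, so `CACHE.crowd_calls` stays at 1. Invalidation when the underlying data changes is exactly the open challenge named on the slide and is not addressed here.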
  22. 22. Questions
