Crowdsourcing-enabled Linked Data management architecture
1. A semantically enabled architecture for
crowdsourced Linked Data management
Elena Simperl,1 Maribel Acosta,1 Barry Norton2
1Institute AIFB, Karlsruhe Institute of Technology, Germany
2Ontotext AD, Bulgaria
Institute of Applied Informatics and Formal Description Methods (AIFB)
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association www.kit.edu
2. Background: What is Linked Data?
Linked Data: set of best practices
to publish and connect structured
data on the Web.
URIs to identify entities and
concepts in the world
HTTP to access and retrieve
resources and descriptions of
these resources
RDF as generic graph-based data
model to structure and link data
Taken together Linked Data is
said to form a ‘cloud’ of shared
references and vocabularies.
Query language: SPARQL.
http://linkeddata.org/faq
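As a rough illustration of the graph-based data model, the sketch below stores a few triples and matches a simple pattern against them, which is the basic operation a SPARQL engine builds on. The `ex:` identifiers and the `match` helper are hypothetical and not taken from any real data set or library.

```python
# Illustrative only: an RDF graph as a set of (subject, predicate, object) triples.
# All ex: identifiers are made up for this example.
GRAPH = {
    ("ex:Stuttgart_Airport", "rdf:type", "ex:Airport"),
    ("ex:Stuttgart_Airport", "rdfs:label", '"Flughafen Stuttgart"@de'),
    ("ex:Stuttgart_Airport", "ex:locatedIn", "ex:Baden-Wuerttemberg"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard (a variable)."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly what the pattern "?x rdfs:label ?label" would select.
labels = [o for _, _, o in match(GRAPH, p="rdfs:label")]
```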
07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Linked Data management
Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
3. Background: Why Linked Data?
Data.gov & public sector information: more transparency and accountability in governance
BBC & media: added value of content through interlinking
Google, Yahoo, Bing & schema.org: enhanced search
4. Outline
1 • Motivation
2 • Our Approach
3 • Extensions to VoID and SPARQL
4 • Crowdsourced query processing tasks
5 • Advantages
6 • Challenges
5. 1. Motivation
User Query: Give me the German names of all commercial
airports in Baden-Württemberg, ordered by their most
informative description.
„Retrieve the labels in German of commercial airports located
in Baden-Württemberg, ordered by the better human-readable
description of the airport given in the comment“.
This query cannot be optimally answered automatically:
Incorrect/missing classification of entities (e.g. classification as
airports instead of commercial airports).
Missing information in data sets (e.g. German labels).
It is not possible to optimally perform subjective operations (e.g.
comparisons of pictures or NL comments).
6. 1. Motivation
„Retrieve the labels in German of commercial airports
located in Baden-Württemberg, ordered by the better human-
readable description of the airport given in the comment“.
In order to answer the query as intended:
Classification of airports as commercial airports.
Identity resolution of places (Baden-Württemberg).
Translation of the labels of the airports.
Ordering of the comments by a subjective comparison.
7. 1. Motivation
„Retrieve the labels in German of commercial airports
located in Baden-Württemberg, ordered by the better human-
readable description of the airport given in the comment“.
SPARQL Query:
SELECT ?label WHERE {
  ?x a metar:CommercialHubAirport ;                                   # 1 Classification
     rdfs:label ?label ;
     rdfs:comment ?comment .
  ?x geonames:parentFeature ?z .
  ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> .    # 2 Identity Resolution
  FILTER (LANG(?label) = "de")                                        # 3 Missing Information
} ORDER BY CROWD(?comment, "Better description of %x")                # 4 Ordering
8. 1. Motivation: Our Aim
A SPARQL query engine that processes queries through a seamless
combination of automatic query processing and crowdsourcing.
[Figure: architecture. Labels: Query, Results; Mediator; SPARQL query engine (query parsing, query optimization, query execution); Crowdsourced query processing (task design, UI generation); four Wrappers.]
9. 2. Our Approach
Parser
Decomposes the input query.
Selects the data sets that should be accessed to produce answers.
Rewrites the query into the internal structures.
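One way to picture the decomposition step is a parser that separates the crowd-specific ordering clause from the rest of the query. The sketch below is a minimal illustration under the assumption that the extension always has the shape ORDER BY CROWD(?var, "prompt"); the regular expression and the `split_crowd_ordering` helper are hypothetical, not the authors' parser.

```python
import re

# Assumed surface form of the extension: ORDER BY CROWD(?var, "prompt").
CROWD_ORDER = re.compile(
    r'ORDER\s+BY\s+CROWD\(\s*\?(\w+)\s*,\s*"([^"]*)"\s*\)', re.IGNORECASE
)


def split_crowd_ordering(query):
    """Split a query into its standard SPARQL part and an optional crowd ordering."""
    m = CROWD_ORDER.search(query)
    if m is None:
        return query.strip(), None
    standard = (query[:m.start()] + query[m.end():]).strip()
    return standard, {"variable": m.group(1), "prompt": m.group(2)}


QUERY = '''SELECT ?label WHERE {
  ?x a metar:CommercialHubAirport ;
     rdfs:label ?label ;
     rdfs:comment ?comment .
} ORDER BY CROWD(?comment, "Better description of %x")'''

standard_part, crowd_part = split_crowd_ordering(QUERY)
```

The standard part could then be handed to an ordinary SPARQL engine, while the crowd part would drive task design.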
10. 2. Our Approach
Optimizer
DB statistics and crowdsourcing statistics: estimated time to completion, and other information about the performance (quality, cost) of the crowd.
Traditional database optimization techniques are implemented.
Determines which parts of the query should be solved by human input: VoID and SPARQL extensions.
Generates logical and physical plans.
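The decision about which parts of a query need human input could look roughly like the following sketch, which partitions triple patterns using crowd annotations. The contents of `CROWD_CLASSES` and `CROWD_PROPERTIES` are assumptions made for illustration (in particular, treating owl:sameAs as a crowd property), and `partition` is a hypothetical helper rather than the actual optimizer.

```python
# Assumed to be read from VoID descriptions of the data sets.
CROWD_CLASSES = {"metar:CommercialHubAirport"}
CROWD_PROPERTIES = {"owl:sameAs"}  # assumption for this example


def needs_crowd(pattern):
    """A pattern needs human input if it types a resource with a crowd class
    or uses a crowd property."""
    _, p, o = pattern
    if p in ("a", "rdf:type") and o in CROWD_CLASSES:
        return True
    return p in CROWD_PROPERTIES


def partition(patterns):
    """Split triple patterns into (automatic, crowdsourced), preserving order."""
    automatic = [t for t in patterns if not needs_crowd(t)]
    crowd = [t for t in patterns if needs_crowd(t)]
    return automatic, crowd


PATTERNS = [
    ("?x", "a", "metar:CommercialHubAirport"),
    ("?x", "rdfs:label", "?label"),
    ("?x", "geonames:parentFeature", "?z"),
    ("?z", "owl:sameAs", "<http://dbpedia.org/resource/Baden-Wuerttemberg>"),
]
automatic, crowd = partition(PATTERNS)
```

A real optimizer would additionally weigh the crowd statistics (time, quality, cost) before committing to a plan; that part is left out here.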
11. 2. Our Approach
Executor
Implements physical operators.
Invokes crowdsourcing component:
- Creates tasks.
- Generates UI.
Infers facts automatically.
Executes query against Linked Data: computational tasks.
Incorporates results from the human input.
12. 3. Extensions to VoID and SPARQL
VoID (Vocabulary of Interlinked Datasets) is the RDF-based schema
for describing data sets.
Common VoID terms: void:Dataset, void:inDataset, void:Linkset,
void:linkPredicate, void:target.
Automatic interlinking of data sets.
VoID extensions: CrowdClass, CrowdProperty.
13. 3. Extensions to VoID and SPARQL
Automatic interlinking of data sets
Example - Specification of Data Sets:
:METAR rdf:type void:Dataset .
:Geonames rdf:type void:Dataset .
:METAR2Geonames rdf:type void:Linkset ;
    void:linkPredicate owl:sameAs ;
    void:target :METAR ;
    void:target :Geonames .
[Figure: the METAR and Geonames data sets connected by an owl:sameAs link.]
14. 3. Extensions to VoID and SPARQL
CrowdClass
- Specifies which entities of a data set could be crowdsourced.
- All subclasses of the crowdClass are also defined (implicitly)
as crowdsourced entities.
Example:
metar:Airport void:inDataset :METAR .
metar:CommercialHubAirport void:inDataset :METAR ;
    rdfs:subClassOf metar:Airport .
metar:Airport rdf:type void:crowdClass .
metar:CommercialHubAirport rdf:type void:crowdClass .
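The implicit rule that subclasses of a crowd class are crowdsourced as well can be sketched as a closure over rdfs:subClassOf edges. This is only an illustration: the `ex:` classes are hypothetical, and `crowd_classes` is an assumed helper name.

```python
# Direct rdfs:subClassOf edges, as child -> set of parents.
SUBCLASS_OF = {
    "metar:CommercialHubAirport": {"metar:Airport"},
    "ex:CargoHubAirport": {"metar:CommercialHubAirport"},  # hypothetical class
    "ex:WeatherBuoy": {"ex:Sensor"},                       # hypothetical, unrelated
}


def crowd_classes(declared, subclass_of):
    """Return the declared crowd classes plus all direct and indirect subclasses."""
    result = set(declared)
    changed = True
    while changed:  # repeat until no further subclass is picked up
        changed = False
        for child, parents in subclass_of.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


expanded = crowd_classes({"metar:Airport"}, SUBCLASS_OF)
```

With this closure, declaring only metar:Airport as a crowd class would already cover metar:CommercialHubAirport.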
15. 3. Extensions to VoID and SPARQL
RDF data can be queried using the language SPARQL.
Common SPARQL operators: join, union, optional,
filter, order by.
Properties related to general ontology languages such as
OWL are treated as extensions of SPARQL operators,
and are modeled in our architecture as tasks.
16. 4. Tasks
Formal, declarative description of the data and
tasks using SPARQL patterns as a basis for the
automatic design of HITs (Human Intelligence Tasks).
Identity resolution
Missing information
Ontological classification
Ordering (new operator)
17. 4.1. Ontological Classification
It is not always possible to automatically infer classification
from the properties.
Example: Retrieve the names (labels) of METAR stations that
correspond to commercial airports.
SELECT ?label WHERE {
  ?station a metar:CommercialHubAirport ;
           rdfs:label ?label .
}
Input:  { ?station a metar:Station ;
                   rdfs:label ?label ;
                   wgs84:lat ?lat ;
                   wgs84:long ?long }
Output: { ?station a ?type .
          ?type rdfs:subClassOf metar:Station }
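To make the Input and Output patterns more concrete, the sketch below turns one solution of the Input pattern into a multiple-choice task and maps a worker's choice back to a triple of the Output shape. The station identifier, coordinates, option list and both helper functions are hypothetical and chosen only for illustration.

```python
def classification_hit(binding, candidate_types):
    """Build a multiple-choice task from one solution of the Input pattern."""
    question = (
        f"Which type best describes '{binding['label']}' "
        f"(lat {binding['lat']}, long {binding['long']})?"
    )
    return {
        "subject": binding["station"],
        "question": question,
        "options": list(candidate_types),
    }


def answer_to_triple(hit, chosen):
    """Turn a worker's choice into a triple matching the Output pattern."""
    if chosen not in hit["options"]:
        raise ValueError(f"unknown option: {chosen}")
    return (hit["subject"], "rdf:type", chosen)


# Made-up binding; a real one would come from evaluating the Input pattern.
binding = {"station": "ex:station/1", "label": "Stuttgart", "lat": 48.69, "long": 9.22}
hit = classification_hit(binding, ["metar:CommercialHubAirport", "metar:Airport"])
triple = answer_to_triple(hit, "metar:CommercialHubAirport")
```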
18. 4.2. Ordering
Orderings defined via less straightforward built-ins; for
instance, the ordering of pictorial representations of entities.
SPARQL extension: ORDER BY CROWD
Example: Retrieve all airports and their pictures, with the pictures ordered
by how representative each image is of the given airport.
SELECT ?airport ?picture WHERE {
  ?airport a metar:Airport ;
           foaf:depiction ?picture .
} ORDER BY CROWD(?picture,
    "Most representative image for %airport")
Input:  { ?airport foaf:depiction ?x, ?y }
Output: { { (?x ?y) a rdf:List } UNION { (?y ?x) a rdf:List } }
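The pairwise Input/Output pattern suggests that an ordering can be assembled from individual comparisons. The sketch below sorts items with a comparison function backed by a simulated crowd; `crowd_order`, `simulated_crowd` and the `SCORE` table are assumptions standing in for real worker judgments, which may be noisy or even non-transitive.

```python
from functools import cmp_to_key


def crowd_order(items, ask_crowd):
    """Sort items by pairwise judgments; ask_crowd(x, y) returns the preferred item.
    Each pair is asked at most once and the answer is reused."""
    cache = {}

    def compare(x, y):
        if (x, y) not in cache:
            preferred = ask_crowd(x, y)
            cache[(x, y)] = -1 if preferred == x else 1
            cache[(y, x)] = -cache[(x, y)]
        return cache[(x, y)]

    return sorted(items, key=cmp_to_key(compare))


# Simulated crowd: a hidden score decides every comparison consistently.
SCORE = {"pic_a": 1, "pic_b": 3, "pic_c": 2}


def simulated_crowd(x, y):
    return x if SCORE[x] >= SCORE[y] else y


ranked = crowd_order(["pic_a", "pic_b", "pic_c"], simulated_crowd)
```

The preferred item is placed first, so `ranked` runs from most to least representative under the simulated judgments.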
19. 4.3. Computational tasks expressed as
SPARQL queries
Transitive relations inferred automatically, without
requiring human intervention.
Implementation of restrictions in SPIN.
Identity Resolution:
CONSTRUCT {
  ?a owl:sameAs ?c .
} WHERE {
  ?a owl:sameAs ?b .
  ?b owl:sameAs ?c .
}
Classification:
CONSTRUCT {
  ?a a ?b.
  ?b rdfs:subClassOf ?c.
} WHERE {
  ?a rdfs:subClassOf ?c.
  ?b rdfs:subClassOf ?b1.
  ?b1 rdfs:subClassOf ?c.
}
Ordering:
CONSTRUCT {
  {(?a ?b) a rdf:List .}
} WHERE {
  (?a ?x) a rdf:List .
  (?x ?b) a rdf:List .
}
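The identity resolution rule can be read as a transitivity rule applied until nothing new is derived. The following is a minimal sketch of that fixpoint over in-memory triples; it is not SPIN, the `ex:` resources are hypothetical, and it covers only transitivity, as in the rule itself.

```python
def apply_same_as_rule(triples):
    """Apply '?a owl:sameAs ?b . ?b owl:sameAs ?c => ?a owl:sameAs ?c'
    repeatedly until no new triple is inferred."""
    same = {(s, o) for s, p, o in triples if p == "owl:sameAs"}
    while True:
        inferred = {(a, c) for a, b in same for b2, c in same if b == b2}
        if inferred <= same:  # fixpoint reached
            break
        same |= inferred
    return {(s, "owl:sameAs", o) for s, o in same}


FACTS = {
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "owl:sameAs", "ex:c"),
    ("ex:c", "owl:sameAs", "ex:d"),
}
closed = apply_same_as_rule(FACTS)
```

Links derived this way need no human task; only the initial owl:sameAs statements would have to come from the data or the crowd.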
20. 5. Advantages
A declarative description of the data makes it possible to
decompose the query.
Automatic generation of the UIs.
On-the-fly generation of human tasks and adjustment of
the task design.
Automatic consistency check of results by reasoning
against a validating ontology.
21. 6. Challenges
Appropriate level of granularity in HIT design for specific
SPARQL constructs.
Caching
- Naively, we can materialise HIT results into data sets.
- How to deal with partial coverage and dynamic data sets?
Optimal user interfaces for graph-like content.
Pricing and worker assignment.
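The naive caching option, materialising HIT results as triples, might be sketched as follows. `HitCache` and its methods are hypothetical; the sketch deliberately ignores the open issues named above (partial coverage and data sets that change over time).

```python
class HitCache:
    """Keeps crowd answers as triples so that the same task is not posted twice."""

    def __init__(self):
        self.materialised = set()  # triples obtained from the crowd
        self.posted = 0            # number of tasks actually sent out

    def classify(self, subject, ask_crowd):
        """Return a cached type for subject, or ask the crowd and store the answer."""
        for s, p, o in self.materialised:
            if s == subject and p == "rdf:type":
                return o
        self.posted += 1
        answer = ask_crowd(subject)
        self.materialised.add((subject, "rdf:type", answer))
        return answer


cache = HitCache()
# Stand-in for a real crowd call; always returns the same made-up answer.
first = cache.classify("ex:station/1", lambda s: "metar:CommercialHubAirport")
second = cache.classify("ex:station/1", lambda s: "metar:Airport")
```

The second call is answered from the materialised triple, so only one task is posted.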
22. QUESTIONS