The PlanetData project was presented by Elena Simperl and Barry Norton of the Karlsruhe Institute of Technology at the 1st International Symposium on Data-driven Process Discovery and Analysis on June 30, 2011, in Campione d'Italia, Italy.
Open Source project failure often stems from not setting clear objectives or having a shared vision from the start. That said, there are many success stories, including two well-known statistical examples: Demetra and the Eurostat SDMX tools (SDMX-RI). In all these examples, however, there was at first a founding organisation or entity that created the right environment for a successful path into a new paradigm. In the context of this presentation, that entity is the Statistical Information System Collaboration Community (SIS-CC, http://siscc.oecd.org).
Presented at the International Marketing and Output DataBase Conference, Gozd Martuljek, September 18 - 22, 2016.
Open Data management is still neither trivial nor sustainable. The COMSODE results bring automation to the publication and management of Open Data in public institutions and companies. The presentation includes the Open Data Ready standard proposal, three use cases, and an invitation to 2016 Horizon 2020 projects.
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data, by Ahmad Assaf
Publicly available datasets contain knowledge from various domains: encyclopedic, government, geographic, entertainment, and so on. The increasing diversity of these datasets makes it difficult to annotate them with a fixed number of pre-defined tags. Moreover, manually entered tags are subjective and may not capture a dataset's essence and breadth. We propose a mechanism to automatically attach metadata to data objects by leveraging knowledge bases such as DBpedia and Freebase, which facilitates data search and acquisition for business users.
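The tagging mechanism described above can be sketched in miniature: a minimal, hypothetical example in which surface forms found in a data object's description are looked up in a toy in-memory knowledge base (a real system would query DBpedia, Freebase, or a service such as DBpedia Spotlight; all names below are invented for illustration).

```python
# Toy "knowledge base": surface forms mapped to (entity, category) pairs.
# A production system would resolve these against DBpedia/Freebase instead.
KB = {
    "berlin": ("dbpedia:Berlin", "Place"),
    "unemployment": ("dbpedia:Unemployment", "Economics"),
    "mozart": ("dbpedia:Wolfgang_Amadeus_Mozart", "MusicalArtist"),
}

def annotate(text):
    """Attach entity/category tags to a data object's free-text description."""
    tags = []
    for token in text.lower().replace(",", " ").split():
        if token in KB:
            tags.append(KB[token])
    return tags

print(annotate("Unemployment statistics for Berlin"))
# [('dbpedia:Unemployment', 'Economics'), ('dbpedia:Berlin', 'Place')]
```

Because the tags come from a shared knowledge base rather than free-form user input, they stay objective and consistent across annotators, which is the property the proposed mechanism relies on.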
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. In order to benefit from this mine of data, one needs access to descriptive information about each dataset (its metadata). This metadata enables dataset discovery, understanding, integration and maintenance. Data portals, which are datasets' access points, offer metadata represented in different and heterogeneous models. We first propose a harmonized dataset model, based on a systematic literature survey, that provides complete metadata coverage and enables data discovery, exploration and reuse by business users. Second, rich metadata is currently available only in a few data portals, where it is usually provided manually and is therefore often incomplete and inconsistent in quality. We propose a scalable automatic approach for extracting, validating, correcting and generating descriptive linked dataset profiles. This approach applies several techniques to check the validity of the metadata provided and to generate descriptive and statistical information for a particular dataset or for an entire data portal.
Traditional data quality is a thoroughly researched field with several benchmarks and frameworks to grasp its dimensions. Ensuring data quality in Linked Open Data is much more complex: it consists of structured information supported by models, ontologies and vocabularies, and contains queryable endpoints and links. We propose an objective assessment framework for Linked Data quality based on quality metrics that can be automatically measured. We further present an extensible quality measurement tool implementing this framework that helps, on the one hand, data owners to rate the quality of their datasets and get hints on possible improvements, and, on the other hand, data consumers to choose their data sources from a ranked set.
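To make the idea of an "automatically measurable" quality metric concrete, here is a minimal sketch of one such metric, labelling completeness (the fraction of subjects carrying an rdfs:label). The triple encoding and function names are ours, not the framework's actual API.

```python
# Triples are (subject, predicate, object) tuples; predicates use CURIE strings.
RDFS_LABEL = "rdfs:label"

def labelling_completeness(triples):
    """Fraction of distinct subjects that have at least one rdfs:label."""
    subjects = {s for s, p, o in triples}
    labelled = {s for s, p, o in triples if p == RDFS_LABEL}
    return len(labelled) / len(subjects) if subjects else 1.0

triples = [
    ("ex:a", "rdfs:label", '"A"'),
    ("ex:a", "ex:linkedTo", "ex:b"),
    ("ex:b", "ex:linkedTo", "ex:a"),
]
print(labelling_completeness(triples))  # 1 of 2 subjects labelled -> 0.5
```

A framework of this kind would compute many such metrics per dataset and aggregate them into the ranked scores mentioned above.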
Slides of the presentation by Hugh Williams of OpenLink Software in the course of the LOD2 webinar "Virtuoso Universal Server" on 20 December 2011. For more information please see: http://lod2.eu/BlogPost/webinar-series
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series presents release 3.0 of the LOD2 stack, which contains updates to:
*) Virtuoso 7 [Openlink]: the original row store of the Virtuoso 6 universal server has been replaced by a column store, significantly increasing the performance of SPARQL queries; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Manager Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through its many extensions. It also supports extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia Spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary number of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of the Data Cube linked data representation of statistical data. It supports the more advanced Data Cube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly in the LOD2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
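To illustrate the kind of CSV-to-RDF transformation the last item describes, here is a hedged, self-contained sketch; the base URI, predicate scheme and function name are invented for illustration and are not ontowiki-csvimport's actual behaviour.

```python
import csv
import io

def csv_to_ntriples(csv_text, base="http://example.org/"):
    """Turn each CSV row into N-Triples: one subject per row, one
    predicate per column, literal objects from the cell values."""
    rows = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for i, row in enumerate(rows):
        subject = f"<{base}row/{i}>"
        for column, value in row.items():
            predicate = f"<{base}prop/{column}>"
            triples.append(f'{subject} {predicate} "{value}" .')
    return "\n".join(triples)

print(csv_to_ntriples("country,year,population\nAT,2013,8451860"))
```

A real importer would additionally type the literals, map columns to existing vocabulary terms, and (as noted above) emit qb:Observation structures that CubeViz can visualize.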
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
Presented by Peter Burnhill at e-Journals are forever? Preservation and Continuing Access to e-journal Content. A DPC, EDINA and JISC joint initiative, British Library, London, 26 April 2010.
The Next Generation Open Targets Platform, by Helena Cornu
The next-generation version of the Open Targets Platform — the culmination of two years of work — is now officially live! It replaces our previous version, with a fresh new look, brand new features, and streamlined processes.
It is available at platform.opentargets.org
This presentation goes through the main changes to the Platform, and introduces the new Open Targets Community forum. Join now at community.opentargets.org.
Open Targets is an innovative, large-scale, multi-year, public-private partnership that uses human genetics and genomics data for systematic drug target identification and prioritisation. Find out more at opentargets.org
In the Open Data world we are encouraged to try to publish our data as “5-star” Linked Data because of the semantic richness and ease of integration that the RDF model offers. For many people and organisations this is a new world and some learning and experimenting is required in order to gain the necessary skills and experience to fully exploit this way of working with data. This workshop will re-assert the case for RDF and provide a guided tour of some examples of RDF publication that can act as a guide to those making a first venture into the field.
"Benchmarking of distributed linked data streaming systems", as presented at the Stream Reasoning Workshop 2018, January 16-17, 2018, held by the Department of Informatics (DDIS, University of Zurich) in Zurich, Switzerland
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
In this webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE as well as DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighbouring tools in the LOD2 Stack cover classification and the subsequent quality analysis of Linked Data.
LoCloud Micro Services and the Digitisation Workflow, by LoCloud
LoCloud EVA / Minerva Workshop 2015
Workshop organised by LoCloud as part of the XIIth Annual International Conference for Professionals in Cultural Heritage,
Presentation by Walter Koch
AIT-Angewandte Informationstechnik Forschungs GmbH, Graz - Austria
Jerusalem, Israel
8 November 2015
Presentation delivered in Dutch by Ludo Hendrickx and Joris Beek on 11 December 2013 at the Ministry of the Interior, The Hague, The Netherlands. More information on: https://joinup.ec.europa.eu/community/ods/description
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that can be selected by users.
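The faceted-selection step behind a CubeViz-style browser can be sketched as follows; the observation encoding (flat dicts keyed by dimension CURIEs) is a deliberate simplification of real qb:Observation resources, and the names are illustrative.

```python
# Simplified qb:Observations from a hypothetical population dataset.
observations = [
    {"qb:dataSet": "ex:pop", "sdmx-dim:refArea": "AT",
     "sdmx-dim:refPeriod": "2012", "value": 8.43},
    {"qb:dataSet": "ex:pop", "sdmx-dim:refArea": "AT",
     "sdmx-dim:refPeriod": "2013", "value": 8.45},
    {"qb:dataSet": "ex:pop", "sdmx-dim:refArea": "DE",
     "sdmx-dim:refPeriod": "2013", "value": 80.6},
]

def facet_filter(observations, selection):
    """Keep only observations matching every selected dimension value,
    as a user would by clicking facets in the browsing widget."""
    return [o for o in observations
            if all(o.get(dim) == val for dim, val in selection.items())]

print(facet_filter(observations, {"sdmx-dim:refPeriod": "2013"}))
```

The filtered observations are what the chart component would then render; in CubeViz the filtering happens via SPARQL against the selected endpoint rather than in memory.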
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
This talk was given by FORTH, Greece, at the European Data Forum (EDF) 2012, which took place on June 6-7, 2012, in Copenhagen (Denmark) at the Copenhagen Business School (CBS).
Abstract:
Given the increasing amount of sensitive RDF data available on the Web, it becomes increasingly critical to guarantee secure access to this content. Access control is complicated when RDFS inference rules and other dependencies between access permissions of triples need to be considered; this is necessary, e.g., when we want to associate the access permissions of inferred triples with those of the triples that implied them. In this paper we advocate the use of abstract provenance models, defined by means of abstract tokens and operators, to support fine-grained access control for RDF graphs. The access label of a triple is a complex expression that encodes how that label was produced (i.e., the triples that contributed to its computation). This feature allows us to know exactly the effects of any possible change, thereby avoiding a complete recomputation of the labels when a change occurs. In addition, the same application can choose to enforce different access control policies, or different applications can enforce different policies on the same data, without recomputing the label of a triple. Preliminary experiments have shown the applicability and benefits of our approach.
Hotel in Bogotá, Colombia, a new lodging alternative, located on Calle 70 at Carrera 4, very close to the famous Zona G, the city's gastronomic district, which offers a wide range of bars and restaurants. Just two blocks from Avenida 7 and Avenida de Chile, two of the city's main thoroughfares, and very close to the financial district, the hotel offers the privacy and tranquility of a residential area while keeping restaurants, banks, embassies, supermarkets, theaters and shopping centers within walking distance.
This presentation covers the various levels of sensors (high tech, intelligent and commodity) and the challenges associated with getting these communicating both ways in/out of the IoT. At this level the challenges are about balancing communications IO functionality with device cost. It also addresses the future and how the IoT is reaching farther down into the commodity sensor layer and the concepts for ‘next generation’ low end connected devices and what value can be added.
ProfiBus/Net is our most supported interface method in P+F's high-tech sensor solutions.
BIOGRAPHY
Simon has been with P+F (Factory Automation) for 8.5 years and has worked predominantly on integrated sensor solutions for machinery OEMs and systems integrators, as well as on new sensor technology development.
ICIC 2016: Examining Funding Data to Predict the Future of Research, by Dr. Haxel Consult
Recent years have seen an explosion of data and metrics available to examine new findings in research and the people behind them. As these are based on research outputs (especially publications), the information available can only ever indicate advances in the recent past. Even with the advent of preprints and out-of-publication data sharing, a full understanding of research direction is not available unless the inputs into the research environment are considered. By examining research funding, a forward-looking approach can be taken to actively examine what is going to happen in the future, and to use this as a basis for strategic planning.
Data on which researchers have been funded, by which funding institutions, and to do what research provides the best basis for analysing future research trends. What will be the big topics in the next five years? In which direction are the big players in the field heading? Funding information provides a unique insight into aspirations within the research community, and such questions can only be fully addressed by analysing research inputs rather than outputs.
Combining and linking funding data with other “traditional” data sources enables a more complete overview which can then be used to inform predictive studies and trend analysis, ultimately producing a better understanding of the entire research landscape now and in the future.
Linked Data for the Masses: The approach and the Software, by IMC Technologies
Title: Linked Data for the Masses: The approach and the Software
@ EELLAK (GFOSS) Conference 2010
Athens, Greece
15/05/2010
Creator: George Anadiotis (R&D Director)
Putting the L in front: from Open Data to Linked Open Data, by Martin Kaltenböck
Keynote presentation of Martin Kaltenböck (LOD2 project, Semantic Web Company) at the Government Linked Data Workshop in the course of the OGD Camp 2011 in Warsaw, Poland: Putting the L in front: from Open Data to Linked Open Data
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity and ownership, data as products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products for users to discover insights using FAIR Principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), and integration with cloud services.
As part of the final BETTER Hackathon, project partners prepared four hackathon exercises. Fraunhofer IAIS organised this exercise in conjunction with external partner MKLab ITI-CERTH (EOPEN project). This step-by-step exercise featured the setup of local Docker images on Linux using Docker Compose and (pre-installed) Python, SANSA, Hadoop, Apache Spark and Apache Zeppelin. It featured semantic transformation and the use of the SANSA (Scalable Semantic Analytics Stack, http://sansa-stack.net/) libraries on a sample of tweets ahead of geo-clustering.
Project website (Hackathon information): https://www.ec-better.eu/pages/2nd-hackathon
Github repository: https://github.com/ec-better/hackathon-2020-semanticgeoclustering
Open Data and Standard APIs learning material for the iCOINS (Industry 4.0 competences for SMEs - Awareness raising tools) project. The iCOINS project aimed to develop common EU competences for raising SMEs' awareness of Industry 4.0 through an innovative training course. The primary target groups are VET teachers, trainers and mentors. Additionally, iCOINS serves the needs of SME staff, higher education staff and students, vocational institutions, vocational higher education institutions and teachers, and public administration staff.
This presentation gives details on technologies and approaches towards exploiting Linked Data by building LD applications. In particular, it gives an overview of popular existing applications and introduces the main technologies that support implementation and development. Furthermore, it illustrates how data exposed through common Web APIs can be integrated with Linked Data in order to create mashups.
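The mashup pattern mentioned above can be sketched as a simple join: records from a (hypothetical) JSON Web API are linked with Linked Data triples via a shared identifier. All identifiers, predicates and values below are invented for illustration.

```python
# What a Web API might return for a city resource.
api_records = [
    {"id": "Q64", "temperature": 21.5},
]

# Linked Data triples about the same resource, as (s, p, o) tuples.
triples = [
    ("wd:Q64", "rdfs:label", "Berlin"),
    ("wd:Q64", "wdt:P17", "wd:Q183"),
]

def mashup(api_records, triples):
    """Enrich API records with labels from Linked Data, joined on the
    local part of the subject URI."""
    labels = {s.split(":")[1]: o for s, p, o in triples if p == "rdfs:label"}
    return [dict(rec, label=labels.get(rec["id"])) for rec in api_records]

print(mashup(api_records, triples))
# [{'id': 'Q64', 'temperature': 21.5, 'label': 'Berlin'}]
```

In a real application the join key would be a full URI (or an owl:sameAs link) rather than a bare string suffix, but the integration step is the same idea.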
Authors: Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1),
Avigdor Gal (3), and Matthias Weidlich (4)
1 École Polytechnique Fédérale de Lausanne
2 Université de Rennes 1
3 Technion – Israel Institute of Technology
4 Imperial College London
by Irene Celino, Simone Contessa, Marta Corubolo, Daniele Dell’Aglio, Emanuele Della Valle, Stefano Fumeo and Thorsten Krüger
CEFRIEL – Politecnico di Milano – SIEMENS
by G. Larkou, J. Metochi, G. Chatzimilioudis and D. Zeinalipour-Yazti
Presented at: 1st IEEE International Workshop on Mobile Data Management, Mining and Computing on Social Networks, collocated with IEEE MDM'13
The presentation was delivered by FORTH at the 3rd International Workshop on the role of Semantic Web in Provenance Management 2012 (SWPM2012) in Heraklion, Greece on 28th of May 2012.
Abstract:
Workflow systems can produce very large amounts of provenance information. In this paper we introduce provenance-based inference rules as a means to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). We motivate this kind of (provenance) inference and identify a number of basic inference rules over a conceptual model appropriate for representing provenance. The proposed inference rules concern the interplay between (i) actors and carried-out activities, (ii) activities and devices that were used for such activities, and (iii) the presence of information objects and physical things at events. However, since a knowledge base is not static but changes over time for various reasons, we also study how we can satisfy change requests while supporting and respecting the aforementioned inference rules. Towards this end, we elaborate on the specification of the required change operations.
This paper was presented by Vassilis Papakonstantinou at the 17th ACM Symposium on Access Control Models and Technologies (ACM SACMAT 2012) in Newark, USA, June 20 - 22, 2012.
Abstract:
The Resource Description Framework (RDF) has become the de facto standard for representing information in the Semantic Web. Given the increasing amount of sensitive RDF data available on the Web, it becomes increasingly critical to guarantee secure access to this content. In this paper we advocate the use of an abstract access control model to ensure the selective exposure of RDF information. The model is defined by a set of abstract operators. Tokens are used to label RDF triples with access information. Abstract operators model RDF Schema inference rules and the propagation of labels along the RDF Schema (RDFS) class and property hierarchies. In this way, the access label of a triple is a complex expression that involves the labels of the triples and the operators applied to obtain said label. Different applications can then adopt different concrete access policies that encode an assignment of the abstract tokens and operators to concrete (specific) values. Following this approach, changes in the interpretation of abstract tokens and operators can be easily implemented, resulting in a very flexible mechanism that allows one to easily experiment with different concrete access policies (defined per context or user). To demonstrate the feasibility of the approach, we implemented our ideas on top of the MonetDB and PostgreSQL open source database systems. We conducted an initial set of experiments which showed that the overhead for using abstract expressions is roughly linear in the number of triples considered; performance is also affected by the characteristics of the dataset, such as the size and depth of class and property hierarchies, as well as the considered concrete policy.
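A toy rendering of this abstract model helps fix the idea: explicit triples carry bare tokens, inferred triples carry expressions recording how they were derived, and a concrete policy later interprets the tokens as allow/deny. The encoding and function names below are ours, not the paper's.

```python
def infer_label(premise_labels):
    """Label of an inferred triple: an abstract expression joining the
    labels of the premises that implied it (a single 'AND'-style
    operator stands in for the paper's operator set)."""
    return ("AND",) + tuple(premise_labels)

def evaluate(label, policy):
    """Interpret an abstract label under a concrete policy that maps
    bare tokens to allow (True) / deny (False)."""
    if isinstance(label, tuple) and label[0] == "AND":
        return all(evaluate(part, policy) for part in label[1:])
    return policy[label]  # a bare token on an explicit triple

# E.g. an rdfs:subClassOf inference whose premises are labelled t1 and t2:
label = infer_label(["t1", "t2"])
print(evaluate(label, {"t1": True, "t2": True}))   # True
print(evaluate(label, {"t1": True, "t2": False}))  # False: a premise is denied
```

Because the expression is stored rather than its value, swapping in a different concrete policy only re-evaluates the expressions; no label recomputation over the graph is needed, which is the flexibility the abstract approach is after.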
The talk was delivered by Martin Kersten from CWI, the Netherlands, at the workshop "Global Scientific Data Infrastructures: The Findability Challenge", held in Taormina, Sicily, Italy, on May 10-11, 2012.
This talk was given at the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012), held in Rome, Italy, June 10-14, 2012, by Ilias Tahmazidis (FORTH).
Abstract:
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge, etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling the vast amounts of data for these applications. In this paper, we consider nonmonotonic reasoning, which has traditionally focused on rich knowledge structures. In particular, we consider defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge datasets. Our experimental results demonstrate that defeasible reasoning over billions of facts is performant, and has the potential to scale to trillions of facts.
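The map/reduce structure behind this kind of reasoning can be imitated in-memory. The sketch below is not the paper's Hadoop pipeline: the bird/penguin example, the rule encoding, and priority-based conflict resolution between defeasible rules are illustrative assumptions.

```python
# Minimal in-memory map/reduce sketch of defeasible rule application:
# the map phase fires rules over facts; the reduce phase resolves
# conflicting conclusions by rule priority.
from collections import defaultdict

facts = [("tweety", "bird"), ("tweety", "penguin")]
# (name, antecedent class, consequent, priority); higher priority wins.
rules = [
    ("r1", "bird", "flies", 1),
    ("r2", "penguin", "not_flies", 2),
]

def strip_neg(literal):
    return literal[4:] if literal.startswith("not_") else literal

def map_phase(fact):
    individual, klass = fact
    for name, ante, cons, prio in rules:
        if klass == ante:
            # Key by (individual, atom) ignoring negation, so that
            # conflicting conclusions meet in the same reducer.
            yield (individual, strip_neg(cons)), (cons, prio)

def reduce_phase(key, values):
    # Keep the conclusion supported by the highest-priority rule.
    return key[0], max(values, key=lambda v: v[1])[0]

grouped = defaultdict(list)
for fact in facts:
    for key, value in map_phase(fact):
        grouped[key].append(value)

conclusions = dict(reduce_phase(k, vs) for k, vs in grouped.items())
print(conclusions)  # {'tweety': 'not_flies'}
```

The point of the shape is that conflict resolution is local to a reducer key, which is what lets the computation be sharded across machines when the fact base grows to billions of triples.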
The presentation was delivered during the 1st International Conference on Health Information Science (HIS 2012) on April 9th, 2012 in Beijing, China.
Abstract:
In cytomics, bookkeeping of the data generated during lab experiments is crucial. The current approach in cytomics is to conduct High-Throughput Screening (HTS) experiments so that cells can be tested under many different experimental conditions. Given the large number of different conditions and the readout of the conditions through images, it is clear that the HTS approach requires a proper data management system to reduce the time needed for experiments and the chance of human error. As different types of data exist, the experimental conditions need to be linked to the images produced by the HTS experiments, with their metadata and the results of further analysis. Moreover, HTS experiments never stand by themselves: as more experiments are lined up, the amount of data and the computation needed to analyze it increase rapidly. To that end, cytomics experiments call for automated and systematic solutions that provide convenient and robust features for scientists to manage and analyze their data. In this paper, we propose a platform for managing and analyzing HTS images resulting from cytomics screens, taking the automated HTS workflow as a starting point. This platform seamlessly integrates the whole HTS workflow into a single system. The platform relies on a modern relational database system to store user data and process user requests, while providing a convenient web interface to end users. By implementing this platform, the overall workload of HTS experiments, from experiment design to data analysis, is reduced significantly. Additionally, the platform provides the potential for data integration to accomplish genotype-to-phenotype modeling studies.
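As a rough illustration of the linkage such a platform needs (conditions to images to analysis results), here is a hypothetical relational schema in SQLite. The table and column names are invented for the example, not taken from the described system.

```python
# Hypothetical schema sketch for HTS data management: experimental
# conditions linked to images and their downstream analysis results.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE experiment (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE condition_ (id INTEGER PRIMARY KEY,
                         experiment_id INTEGER REFERENCES experiment(id),
                         well TEXT, treatment TEXT);
CREATE TABLE image (id INTEGER PRIMARY KEY,
                    condition_id INTEGER REFERENCES condition_(id),
                    path TEXT, captured_at TEXT);
CREATE TABLE analysis_result (id INTEGER PRIMARY KEY,
                              image_id INTEGER REFERENCES image(id),
                              metric TEXT, value REAL);
""")
db.execute("INSERT INTO experiment VALUES (1, 'migration screen')")
db.execute("INSERT INTO condition_ VALUES (1, 1, 'A01', 'compound X 10uM')")
db.execute("INSERT INTO image VALUES (1, 1, '/plates/p1/A01.tif', '2012-04-01')")
db.execute("INSERT INTO analysis_result VALUES (1, 1, 'cell_count', 342.0)")

# Trace an analysis result back to its experimental condition in one join.
row = db.execute("""
    SELECT e.title, c.treatment, r.metric, r.value
    FROM analysis_result r
    JOIN image i ON r.image_id = i.id
    JOIN condition_ c ON i.condition_id = c.id
    JOIN experiment e ON c.experiment_id = e.id
""").fetchone()
print(row)
```

The single join path from result back to experiment is exactly the provenance question ("under which condition was this readout produced?") that the paper argues must be answerable without manual bookkeeping.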
The talk was given at the 15th International Conference on Extending Database Technology (EDBT 2012) on March 29, 2012 in Berlin, Germany.
Abstract:
Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join-order search space. In such cases, cost-based query optimization often is not possible. One practical reason for this is that statistics are typically missing in web-scale settings such as the Linked Open Data (LOD) cloud. The more profound reason is that, due to the absence of schematic structure in RDF, join-hit-ratio estimation requires complicated forms of correlated join statistics; and currently there are no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even when they are partially combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need for a cost model. For this, we define the variable graph and show a reduction of the SPARQL query optimization problem to the maximum-weight independent set problem. We implemented our planner on top of the MonetDB open source column store, evaluated its effectiveness against the state-of-the-art RDF-3X engine, and compared plan quality with a relational (SQL) equivalent of the benchmarks.
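The variable-graph construction can be shown with a toy sketch. This is not HSP itself: the ordering heuristic shown (prefer triple patterns with fewer variables, i.e. more constants, as more selective) is only one simple example of the syntactic cues such a planner can exploit, and the basic graph pattern is invented.

```python
# Build the "variable graph" of a SPARQL basic graph pattern: variables
# are nodes, and two variables are connected when they co-occur in a
# triple pattern. Then order patterns by a simple selectivity heuristic.
from itertools import combinations

bgp = [
    ("?p", "rdf:type", "sioc:Post"),
    ("?p", "sioc:content", "?c"),
    ("?p", "sioc:has_creator", "?m"),
]

def variables(pattern):
    return [term for term in pattern if term.startswith("?")]

edges = set()
for pattern in bgp:
    for a, b in combinations(variables(pattern), 2):
        edges.add(tuple(sorted((a, b))))

# Heuristic: fewer variables means more constants, hence (syntactically)
# more selective -- start the join order with those patterns.
plan = sorted(bgp, key=lambda p: len(variables(p)))
```

Here `?p` is the hub of the variable graph, so every join in the plan stays connected through it; a cost model never enters the picture, which is the point the abstract makes about statistics-free planning.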
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
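A typical way to wire JMeter to InfluxDB is JMeter's built-in Backend Listener with its InfluxDB client. The parameter names below match the Backend Listener GUI in recent JMeter 5.x releases; the URL, database name and application tag are placeholders you would adapt to your own stack.

```properties
# JMeter Backend Listener implementation (selected in the listener GUI):
#   org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient
influxdbMetricsSender = org.apache.jmeter.visualizers.backend.influxdb.HttpMetricsSender
influxdbUrl           = http://localhost:8086/write?db=jmeter   # placeholder
application           = my-app     # placeholder tag used in Grafana queries
measurement           = jmeter
summaryOnly           = false
samplersRegex         = .*
percentiles           = 90;95;99
```

Grafana then reads the `jmeter` measurement from InfluxDB and filters dashboards on the `application` tag, which is how the per-test-run views in the webinar are built.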
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes considerable work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. I have also seen, many times, how developers implement front-end features by simply following a framework's standard conventions, assuming this is enough to launch the project successfully, only to see the project fail. How can this be prevented, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
"Impact of front-end architecture on development cost", Viktor Turskyi
PlanetData: Consuming Structured Data at Web Scale
1. PlanetData: Consuming Structured Data at Web Scale
Elena Simperl, Barry Norton, Karlsruhe Institute of Technology
1st International Symposium on Data-driven Process Discovery and Analysis
June 30, 2011, Campione d’Italia, Italy
2. PlanetData‘s Aim and Objectives
• Aim: establish an interdisciplinary, sustainable European community on large-scale data management, spanning Databases, Data Mining and the Semantic Web
◦ Purposeful data exposure
◦ Novel and improved applications
• Objectives
◦ Addressing challenges through integrated research
◦ Data and technology provisioning through the PlanetData Lab
◦ Impact through training, dissemination, standardization and networking
◦ Openness and flexibility through PlanetData Programs
3. Work Plan Highlights
• Methods and techniques to publish, access and manage stream-like data
• Quality assessment of interlinked data sets, including best practices for the representation and usage of spatio-temporal information
• Provenance and access control framework for Linked (Stream) Data
• Data sets and vocabularies, including best practices for publishing and managing self-descriptive data
• Linked Services and Processes as an instrument to develop applications
• Yearly summer school co-located with the Extended Semantic Web Conference
• Semantic Web video journal
• PlanetData Programs
8. Linked Data Cloud
Taken together, Linked Data is said to form a ‘cloud’ of shared references and vocabularies (growing on a weekly basis)
9. Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
These principles bring together semantic technologies and the Web architecture, and apply to other types of data as well: stream-like, multimedia…
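Principle 3 (provide useful information at the URI, using the standards) is usually realised via HTTP content negotiation. The Python sketch below builds, but does not send, such a request; the DBpedia URI is just an example of a dereferenceable Linked Data name.

```python
# Content negotiation for a Linked Data URI: the client asks the server
# for an RDF serialization of the resource via the Accept header.
from urllib.request import Request

uri = "http://dbpedia.org/resource/Karlsruhe"  # example Linked Data URI
req = Request(uri, headers={"Accept": "text/turtle"})

print(req.get_header("Accept"))  # which representation we negotiated for
print(req.get_method())          # a plain HTTP GET, per principle 2
```

Sending the same GET with `Accept: text/html` would instead yield a human-readable page for the same URI, which is the whole point of negotiating representations over one name.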
11. Services Over Linked Data
A problem can be seen in the current Linked Data sphere when it comes to services/APIs/functionalities:
◦ The standards are often not used
◦ The results of service interaction do not contribute to the Linked Data cloud
◦ Developers have to work with heterogeneous (non-RDF) representations
12. RDF Services at the BBC
This is not a problem of scale, efficiency or speed:
RDF-based communication is efficiently realised using memcached
Real-time updates to a large (ferocious) audience
04.08.2010
13. Linked Open Services
Aim to promote services over Linked Data, bringing together:
RESTful services (respecting Web architecture)
◦ Resource-oriented
◦ Manipulated with HTTP verbs: GET, PUT (, PATCH), POST, DELETE
◦ Negotiate representations
Linked Data
◦ Uniform use of URIs
◦ Use of RDF and SPARQL
14. Linked Services: Principles
Concretely, Linked Open Services come with a set of guiding principles:
1. Describe services as LOD prosumers, with input and output descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. Communicate and describe the knowledge contribution resulting from service interaction, including implicit knowledge relating input, output and service provider
Associated with the last principle is an optional fourth:
4. When wrapping non-LOS services, extend the (lifted, if non-RDF) message to make explicit the implicit knowledge, and to use Linked Data vocabularies, using SPARQL CONSTRUCT queries
http://www.linkedopenservices.org/blog/?page_id=2
16. Linked Processes: Principles
In order to compose Linked Services we are not specific about the style, except that RDF must be stored and forwarded.
Principles:
◦ Decide control flow conditions based on SPARQL ASK queries
◦ Base iteration on SPARQL SELECT queries
◦ Define dataflow/mediation based on SPARQL CONSTRUCT queries
In this way compositions, ‘mash-ups’, etc. also use the languages/technologies most familiar to the Linked Data community
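A toy Python rendering of these principles, assuming an in-memory set of triples in place of a real SPARQL engine: an ASK-style boolean test drives control flow, and a CONSTRUCT-style function derives the dataflow triples. The function names and data are invented for the example.

```python
# Toy composition step over an in-memory triple set: ASK gates control
# flow, CONSTRUCT derives new triples for the dataflow.

triples = {
    ("post1", "heo:hasCategory", "heo:anger"),
    ("post1", "sioc:has_creator", "user7"),
}

def ask(s, p, o):
    """ASK-style boolean test; None acts as a wildcard."""
    return any((s in (None, ts)) and (p in (None, tp)) and (o in (None, to))
               for ts, tp, to in triples)

def construct(creator):
    """CONSTRUCT-style mediation: derive a reply triple."""
    return {("reply1", "sioc:addressed_to", creator)}

# Control flow condition: only respond when an angry post exists.
if ask(None, "heo:hasCategory", "heo:anger"):
    creator = next(o for s, p, o in triples if p == "sioc:has_creator")
    triples |= construct(creator)
```

Because both the condition and the mediation are expressed over triples, a real engine would state them directly as SPARQL ASK and CONSTRUCT queries, exactly as the slide's principles prescribe.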
17. LOP Media Monitoring Process
A Social Media Manager is required to monitor (micro)blogging sites and respond to negative comments:
10.08.2011
18. Composition Service 1
A service may monitor the ‘Twittersphere’ for tweets with a given tag
Harvest
Input: {?t a sioc_t:Tag; rdfs:label ?l}
Output: {?p a sioc_t:MicroblogPost;
sioc:topic ?t;
sioc:has_creator ?m;
sioc:content ?c .
OPTIONAL {?p sioc:addressed_to ?a}}
19. Composition Service 2
A sentiment analysis service may annotate (micro)blog posts according to, e.g., the Human Emotion Ontology
AnalyseSentiment
Input: {?p a sioc:Post; sioc:content ?c}
Output: {?e a heo:Emotion;
heo:hasManifestationInMedia ?p;
heo:hasCategory ?c}
20. Composition Service 3
A human service selects among possible combinations of these and optionally raises a response
ManageMicroblog
Input: {?p a sioc_t:MicroblogPost;
sioc:has_creator ?m.
?e heo:hasManifestationInMedia ?p.
{?e heo:hasCategory heo:anger} UNION
{?e heo:hasCategory heo:disgust}}
Output: {OPTIONAL {?r a sioc_t:MicroblogPost;
sioc:addressed_to ?m}}
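The three services of slides 18-20 can be chained as a plain pipeline. The sketch below fakes each service as a Python function over RDF-like triples, with hard-coded example data standing in for real Twitter harvesting and sentiment analysis; none of it is the actual LOS machinery.

```python
# End-to-end sketch of the composition: Harvest -> AnalyseSentiment ->
# ManageMicroblog, each a function from triples to new triples.

def harvest(tag):
    # Stands in for the Twitter-monitoring service of slide 18.
    return {("post1", "sioc:content", "this product is terrible"),
            ("post1", "sioc:has_creator", "user7")}

def analyse_sentiment(triples):
    # Stands in for the Human Emotion Ontology annotator of slide 19.
    out = set()
    for s, p, o in triples:
        if p == "sioc:content" and "terrible" in o:
            out.add(("emo1", "heo:hasManifestationInMedia", s))
            out.add(("emo1", "heo:hasCategory", "heo:anger"))
    return out

def manage_microblog(triples):
    # Human service of slide 20: raise a response for angry posts.
    if ("emo1", "heo:hasCategory", "heo:anger") in triples:
        return {("reply1", "sioc:addressed_to", "user7")}
    return set()

data = harvest("productX")
data |= analyse_sentiment(data)
data |= manage_microblog(data)
```

Each stage only reads and writes triples, so the composition itself carries no bespoke message formats: the "knowledge contribution" of every service stays in the shared RDF space, which is the argument the deck makes for Linked Processes.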
22. http://www.planet-data.eu
Join PlanetData
Associate partners have:
◦ Access to open training infrastructure
◦ Early access to ongoing PD results through participation in PlanetData meetings
◦ Opportunity to shape the results and topics of the PD Programs through contribution of requirements and use cases
PlanetData Programs call in 2012