Linked (Open) Data
VU Web Engineering / TU Wien
May 27th 2013 - Bernhard Haslhofer
About me
• Since 03/2013 Postdoc @ University of Vienna
• Previously
– Lecturer & Postdoc @ Cornell University, NY, USA
– Univ. Ass @ University of Vienna
– …
– WINF TU Wien 2003, INF TU Wien 2006
About me
• Research Interests
– Web information systems
– Globally connected, Web-based data networks
• Structured Web Data (Linked Data, schema.org, (FB) Open Graph Protocol, etc.)
• Knowledge Graphs (e.g., DBpedia, Freebase)
• Annotations / Semantic Tagging
• Quality in Open Data Networks
• …
My teaching philosophy
• A course is a collaborative experience
• Instructor provides
– Structure
– Foundation for learning
• Students
– Engage, contribute, challenge
– Ask questions!
– Think critically!
– Disagree if appropriate!
Aren’t we beyond that?
My plan for today…
• Linked (Open) Data ???
• Linked Data - Intro & Overview
• Linked Data - Technologies
• Recent Trends and Developments
• Questions / Discussion
Open Data
“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
(Open Data Handbook, 2012, Open Knowledge Foundation)
“Open” Data Definition
• Availability and Access
– Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet
– Data must also be available in a convenient and modifiable form
• Reuse and Redistribution
– Data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.
• Universal Participation
– Everyone must be able to use, reuse and redistribute (no discrimination)
– No ‘non-commercial’ restrictions
(http://opendefinition.org/okd/)
Open Data Movement
Source: http://www.flickr.com/photos/jamescridland/613445810/sizes/l/in/photo
Questions
• Why should the open data principles sound familiar to software engineers?
• Any known “open data” examples?
Linked Data
“A method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried”
[Bizer, Heath, Berners-Lee 2009]
Web Architecture
• A set of simple standards
– Uniform global addressing (URI)
– Uniform document encoding (HTML)
– Uniform transportation (HTTP)
• Hyperlinks connecting documents
• Works pretty well for accessing and exchanging documents
Web Services and Web APIs
Source: http://www.blogperfume.com/new-27-circular-social-media-icons-in-3-sizes/
Web Services and Web APIs
• Each Web API has a proprietary interface
• Data sources must be known in advance
• Information entities (papers, authors, subjects, etc.) are often not linked
Linked Data Vision
• Publish and link structured data on the Web
• Create a single globally connected data space based on the Web Architecture
Web of Linked Data
• A set of simple standards
– Uniform global addressing (URI)
– Uniform data model (RDF)
– Uniform transportation (HTTP)
• RDF links connecting entities
• Forms a global data space and facilitates accessing and exchanging data
What is Linked Data?
• A method to build a Web of Data
• Architectural style, set of standards
Linking Open Data Project
• A W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
Web / REST Basics - Recap
• Key Architectural Web Components
– Identification: URI
– Interaction: HTTP
– Standardized Document Formats: HTML, XML, JSON, etc.
Web / REST Basics - Recap
• URIs identify interesting things
– documents on the Web
– relevant aspects of a data set
– phone numbers, Skype usernames, e-mail addresses
• HTTP URIs name and address resources in Web-based systems
Web / REST Basics - Recap
• A resource can have several representations
• Representations can be in any format
– HTML
– XML
– JSON
– …
[Figure: the URI http://example.com/someURI identifies one resource with three representations: Plain Text (text/plain), HTML (text/html), JSON (application/json)]
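A minimal sketch of how a server might pick one of several representations of the same resource from the client's Accept header. The in-memory `REPRESENTATIONS` table and the function names are hypothetical, and the q-value handling is simplified (no wildcard subtypes such as text/*).

```python
# Hypothetical resource: one URI, several representations keyed by media type.
# application/json is the registered media type for JSON.
REPRESENTATIONS = {
    "text/plain": "The Shining (film)",
    "text/html": "<h1>The Shining (film)</h1>",
    "application/json": '{"title": "The Shining (film)"}',
}


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type and q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(accept_header, available=REPRESENTATIONS):
    """Return (media_type, body) of the best match, or None if nothing fits."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type, available[media_type]
        if media_type == "*/*" and available:
            first = next(iter(available))
            return first, available[first]
    return None
```

For example, `negotiate("text/html")` selects the HTML representation, while an unsupported type such as `image/png` yields `None`, which a server would typically answer with 406 Not Acceptable.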
Web / REST Basics - Recap
• We deal with resource representations
– not the resources themselves (pass by value)
– representations can be in any format (defined by media-type)
• Each resource implements a standard uniform interface (HTTP)
– a small set of verbs applied to a large set of nouns
– verbs are universal and not invented on a per-application basis
[Figure: a client exchanges resource representations (e.g., JSON) with a server through the uniform interface; the server maps logical resources to physical resources]
Web / REST Basics - Recap
• HTML, XHTML, ...: display information
• XML, JSON, ...: transport and store data
Web / REST Basics - Recap
• Example Web Service operations:
– Publish image on Flickr
– Order a book at Amazon
– Post a message on your friend’s Facebook wall
– Update user photo on foursquare
[Figure: Application A and Application B communicate over the Web through an API]
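RDF
• A data model for representing data on the Web
• Several statements (triples) form a graph
[Figure: example RDF graph with the following triples]
<http://dbpedia.org/resource/The_Shining_(film)> rdfs:label "The Shining (film)"
<http://dbpedia.org/resource/The_Shining_(film)> rdfs:label "闪灵 (电影)"
<http://dbpedia.org/resource/The_Shining_(film)> rdf:type <http://dbpedia.org/ontology/Film>
<http://dbpedia.org/resource/The_Shining_(film)> dbpprop:starring <http://dbpedia.org/resource/Jack_Nicholson>
<http://dbpedia.org/resource/Jack_Nicholson> rdf:type <http://xmlns.com/foaf/0.1/Person>
<http://dbpedia.org/resource/Jack_Nicholson> dbpedia-owl:birthDate "1937-04-22"
<http://dbpedia.org/resource/Jack_Nicholson> foaf:name "Jack Nicholson"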
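To make the "triples form a graph" idea concrete, here is a minimal sketch that stores the example statements as plain Python tuples and follows a link from the film to the actor. It is an illustration only, not an RDF library: predicates are kept as prefixed names in plain strings, and `match` and `objects` are hypothetical helper names.

```python
SHINING = "http://dbpedia.org/resource/The_Shining_(film)"
NICHOLSON = "http://dbpedia.org/resource/Jack_Nicholson"

# Each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    (SHINING, "rdfs:label", "The Shining (film)"),
    (SHINING, "rdfs:label", "闪灵 (电影)"),
    (SHINING, "rdf:type", "http://dbpedia.org/ontology/Film"),
    (SHINING, "dbpprop:starring", NICHOLSON),
    (NICHOLSON, "rdf:type", "http://xmlns.com/foaf/0.1/Person"),
    (NICHOLSON, "dbpedia-owl:birthDate", "1937-04-22"),
    (NICHOLSON, "foaf:name", "Jack Nicholson"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def objects(triples, s, p):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for _, _, o in match(triples, s=s, p=p)]


# Follow the edge film -> actor, then read the actor's name.
star = objects(TRIPLES, SHINING, "dbpprop:starring")[0]
star_names = objects(TRIPLES, star, "foaf:name")
```

Because the object of one triple can be the subject of another, traversing the graph amounts to chaining such lookups.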
RDF/XML, N3, Turtle, etc.
• Data formats for RDF resource representations
• Used to transfer RDF data between apps
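As an illustration of what such a serialization does, the sketch below writes triples in N-Triples, a simple line-based RDF format that is not named on the slide and is used here as an assumption because it is easy to produce by hand. The tuple layout and function names are hypothetical; language tags and datatypes are left out.

```python
def term(kind, value):
    """Serialize one RDF term: a URI in angle brackets or a quoted literal."""
    if kind == "uri":
        return "<" + value + ">"
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject_uri, predicate_uri, (kind, value)) tuples, one per line."""
    lines = []
    for subject, predicate, obj in triples:
        lines.append(
            "{} {} {} .".format(term("uri", subject), term("uri", predicate), term(*obj))
        )
    return "\n".join(lines) + "\n"


document = to_ntriples([
    (
        "http://dbpedia.org/resource/Jack_Nicholson",
        "http://xmlns.com/foaf/0.1/name",
        ("literal", "Jack Nicholson"),
    ),
])
```

The receiving application parses the lines back into triples, so both sides share the data model even if they store the data differently.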
RDFS
• A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way
[Figure: <http://dbpedia.org/ontology/Film> rdfs:subClassOf <http://dbpedia.org/ontology/Work>]
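What "machine-understandable" buys us can be sketched with the rdfs:subClassOf statement: software can conclude that every Film is also a Work. The code below is a minimal, hypothetical illustration of that single inference rule over tuples, not a full RDFS reasoner; the rdf:type statement about The Shining is added as sample data.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

FILM = "http://dbpedia.org/ontology/Film"
WORK = "http://dbpedia.org/ontology/Work"
SHINING = "http://dbpedia.org/resource/The_Shining_(film)"


def superclasses(cls, triples):
    """Return all classes reachable from cls via rdfs:subClassOf (transitive)."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def infer_types(triples):
    """Return the triples plus the rdf:type statements entailed by subclassing."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            for parent in superclasses(o, triples):
                inferred.add((s, RDF_TYPE, parent))
    return inferred


schema_and_data = [
    (FILM, SUBCLASS_OF, WORK),
    (SHINING, RDF_TYPE, FILM),
]
entailed = infer_types(schema_and_data)
```

After inference, `entailed` also contains the statement that The Shining is a Work, although nobody published that triple explicitly.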
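OWL
• A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies
• Solves RDFS shortcomings but introduces considerable complexity
[Figure: example property definition with the following triples]
<http://dbpedia.org/ontology/starring> rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>
<http://dbpedia.org/ontology/starring> rdfs:label "starring"
<http://dbpedia.org/ontology/starring> rdfs:domain <http://dbpedia.org/ontology/Work>
<http://dbpedia.org/ontology/starring> rdfs:range <http://dbpedia.org/ontology/Person>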
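SKOS
• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)
[Figure: example with the following triples]
<http://dbpedia.org/resource/The_Shining_(film)> dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films>
<http://dbpedia.org/resource/Category:1980s_horror_films> rdf:type <http://www.w3.org/2004/02/skos/core#Concept>
<http://dbpedia.org/resource/Category:1980s_horror_films> skos:broader <http://dbpedia.org/resource/Category:1980s_films>
<http://dbpedia.org/resource/Category:1980s_films> rdf:type <http://www.w3.org/2004/02/skos/core#Concept>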
SPARQL
• A query language and protocol for accessing RDF data on the Web
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x
WHERE {
  ?x dcterms:subject
     <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
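The "protocol" part means a query travels over plain HTTP. A minimal sketch of how the query above could be encoded as a GET request URL follows; the endpoint address http://dbpedia.org/sparql and the helper name `sparql_get_url` are assumptions, and no request is actually sent here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address; any SPARQL protocol endpoint would work the same way.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x
WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}"""


def sparql_get_url(endpoint, query):
    """Encode a query string as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# The server side reverses the encoding to recover the query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

A client would then fetch `url` with an Accept header such as application/sparql-results+json to choose the result format, which is again ordinary content negotiation.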
Database Systems Analogy...
Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query | ? | ?
Schema Definition Language | ? | ?
Data Representation | ? | ?
Identifiers | ? | ?
Database Systems Analogy...
Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query | SQL | SPARQL
Schema Definition Language | SQL DDL | RDFS / OWL
Data Representation | Relational Model / Tables | RDF / Graph
Identifiers | Primary Keys (numeric sequences) | URI
Publishing Linked Data
• Distinguish between non-information and information resources
• Sample non-information resource
– http://dbpedia.org/resource/The_Shining_(film)
• Sample information resources
– http://dbpedia.org/page/The_Shining_(film) - HTML
– http://dbpedia.org/data/The_Shining_(film) - RDF
Publishing Linked Data
GET http://dbpedia.org/resource/The_Shining_(film)
Accept: application/rdf+xml
303 See Other
Location: http://dbpedia.org/data/The_Shining_(film)
GET http://dbpedia.org/data/The_Shining_(film)
Accept: application/rdf+xml
200 OK
...
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF ...
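From the client's point of view, the exchange above could be sketched as follows. The function names are hypothetical; the sketch relies on the fact that Python's standard `urllib` follows the 303 redirect automatically, so the caller only sees the final information resource. Calling `dereference` needs network access and a reachable server.

```python
import urllib.request


def build_rdf_request(resource_uri, media_type="application/rdf+xml"):
    """Create a GET request that asks for an RDF representation."""
    return urllib.request.Request(resource_uri, headers={"Accept": media_type})


def dereference(resource_uri, media_type="application/rdf+xml", timeout=10):
    """Fetch a description of the resource.

    Returns (final_url, content_type, body); final_url is the information
    resource the server redirected to.
    """
    request = build_rdf_request(resource_uri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return (
            response.geturl(),
            response.headers.get("Content-Type"),
            response.read(),
        )


request = build_rdf_request("http://dbpedia.org/resource/The_Shining_(film)")
```

Changing `media_type` to text/html would ask the same URI for the human-readable page instead.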
Publishing Large RDF Datasets
• Run a servlet that implements the 303 publishing approach
– for non-information resources
• parse Accept header field
• Redirect (303 See Other) to corresponding information resource
• Generate RDF serialization dynamically from underlying data storage
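The redirect logic of such a servlet can be sketched independently of any web framework (shown in Python rather than as a Java servlet). The /resource/, /page/ and /data/ path layout mirrors the DBpedia example; the function name, the `TARGETS` table and the fallback to HTML for */* are assumptions, and q-values in the Accept header are ignored.

```python
RESOURCE_PREFIX = "/resource/"

# Media type -> path prefix of the matching information resource.
TARGETS = [
    ("application/rdf+xml", "/data/"),
    ("text/html", "/page/"),
]


def redirect(path, accept_header):
    """Return (status, headers) for a request to a non-information resource."""
    if not path.startswith(RESOURCE_PREFIX):
        return 404, {}
    name = path[len(RESOURCE_PREFIX):]
    # Step 1: parse the Accept header field (media types only, no q-values).
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    # Step 2: redirect to the first information resource the client accepts.
    for wanted in accepted:
        for media_type, prefix in TARGETS:
            if wanted == media_type:
                return 303, {"Location": prefix + name, "Vary": "Accept"}
    # Clients that accept anything (or send no Accept header) get the HTML page.
    if "*/*" in accepted or not accept_header.strip():
        return 303, {"Location": "/page/" + name, "Vary": "Accept"}
    return 406, {}
```

The handler behind /data/ would then generate the RDF serialization from the underlying storage; that step is not shown here.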
Microdata (HTML5)
• A very young HTML5 proposal that extends Microformats and addresses their shortcomings
• Items are created within an itemscope
• Every item is assigned an arbitrary number of properties (itemprop) and relationships (itemref)
• Uses global identifiers for typing and naming items
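A minimal sketch of how a consumer might read itemscope and itemprop from a page, using only Python's standard HTML parser. The markup in `SNIPPET` is hypothetical and deliberately flat: nested items, itemref and properties containing child elements are not handled.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; real schema.org markup usually nests
# a Person item for the director instead of using plain text.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">The Shining</h1>
  Director: <span itemprop="director">Stanley Kubrick</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect the text-only itemprop values of flat microdata items."""

    def __init__(self):
        super().__init__()
        self.items = []    # one dict per itemscope
        self._prop = None  # name of the itemprop currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        if "itemprop" in attrs and self.items:
            self._prop = attrs["itemprop"]
            self._text = []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The first closing tag after an itemprop element ends the property.
        if self._prop is not None:
            value = "".join(self._text).strip()
            self.items[-1]["properties"][self._prop] = value
            self._prop = None


parser = MicrodataParser()
parser.feed(SNIPPET)
```

After parsing, `parser.items` holds one item typed with the global identifier http://schema.org/Movie and its two properties.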
schema.org / Microdata example
<h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Jack Sparrow and Barbossa embark on a quest to find the elusive fountain
of youth, only to discover that Blackbeard and his daughter are after it too.

Director: Rob Marshall
Writers: Ted Elliott, Terry Rossio, and 7 more credits
Stars: Johnny Depp, Penelope Cruz, Ian McShane
8/10 stars from 200 users. Reviews: 50.
schema.org
• Defines
– a number of types (e.g., person), organized in an inheritance hierarchy
– a number of properties (e.g., name)
• Extension mechanisms to extend the schemas
• OWL representation: http://schema.org/docs/schemaorg.owl
• http://schema.rdfs.org/index.html
Google Knowledge Graph
• Enables search for things (people, places) that Google knows about
• Rooted in public sources such as Freebase, Wikipedia, CIA World Factbook, etc.
– augmented to 500M objects, 3.5B facts and relationships
• Next generation search (semantic index)
Readings
• Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.
• Jason Ronallo: HTML5 Microdata and Schema.org http://journal.code4lib.org/articles/6400