Linked Data Generation Process

LD4SC Summer School
7th-12th June, Cercedilla, Spain


1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)

Linked Data Generation Process

Raúl García-Castro, Filip Radulovic, Oscar Corcho, María Poveda, Víctor Rodríguez-Doncel, Asunción Gómez-Pérez, Daniel Vila-Suero

Presenter: Raúl García-Castro


Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description


Data in smart cities

http://br.fiberhomegroup.com/pt/Enterprise/324/2282.aspx


Open data… for what?

• For example, (re)using open transport data
– Provide travel information to people
– Allow better multimodal route planning
– Facilitate public transport management
– …
– Accessibility
  • Which metro accesses are accessible for wheelchair users?
  • At which bus stops is it safer and more convenient for a wheelchair user to wait?
  • Is there any accessible parking space near a bus stop?
  • etc.


Legal framework and open data initiatives

• Aarhus Convention (1998)
– Right to participation and access; 41 countries and the EU
• Open Access Initiative (2001)
– Scientific information on the Web; > 510 organisations
• PSI Directive
– PSI Reuse (2003/98/EC)
• Convention on Access to Official Documents (2009)
– Signed by 12 countries
– Belgium, Finland, Norway, Sweden, Hungary, Estonia, Lithuania, Slovenia, Georgia, Montenegro, Serbia and Macedonia
• Law 37/2007. PSI Reuse
• Law 11/2007. Citizen access to public services and right to the quality of services
• RD 4/2010 National Interoperability Scheme
– Open standards
– Technology neutral
– Open source software
• RD 1495/2011, which implements Law 37/2007
• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)

Adapted from Antonio Rodríguez Pascual (IGN)


The problem: lack of interoperability

[Figure: three publishers, each publishing data that an app developer must extract separately. Speech bubbles: "I use GTFS"; "I use my own CSV structure"; "I provide a web service"; "I want to publish data in an interoperable structure and format"; "Build an app that is available all over the world"]


Scenario: open transport data

Is there any open transport data already?
We are surrounded by them.


Open data and how they are published

1) On notice boards
– For those who have a lot of free time
– Or those who are there at the right moment

Adapted from Antonio Rodríguez Pascual (IGN)



Open data and how they are published

2) In web pages and mobile apps
– For people

[Figure: publication levels: On the Web, open license; Machine-readable; Non-proprietary format]

Adapted from Antonio Rodríguez Pascual (IGN)


Open data and how they are published

3) As web files
– So that they can be loaded by humans into their information systems (XML, HTML, CSV, etc.)
– Hopefully not as a scanned PDF

[Figure: publication levels: On the Web, open license; Machine-readable; Non-proprietary format]

Adapted from Antonio Rodríguez Pascual (IGN)

Open data and how they are published

4) Via web services
– For humans and machines
– They allow generating added-value services
– And they can be integrated into the application business logic

[Figure: publication levels: On the Web, open license; Machine-readable; Non-proprietary format]

Adapted from Antonio Rodríguez Pascual (IGN)


What is open data?

• Open data are data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and sharealike.
• The most important aspects to consider:
– Availability and Access: data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. Data must also be available in a convenient and modifiable form.
– Reuse and Redistribution: data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
– Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ or ‘only in education’ restrictions are not allowed.

Source: Open Data Handbook


Scenario: open transport data

Is there any open transport data already?
Can we do it better?


Going into 4 and 5: Linked Data

1. Make your stuff available on the Web (whatever format) under an open license
2. Make it available as structured data (e.g., Excel instead of an image scan of a table)
3. Use non-proprietary formats (e.g., CSV instead of Excel)
4. Use URIs to identify things, so that people can point at your stuff
5. Link your data to other data to provide context


Use URIs + RDF (RDF standards)

[Figure: data about José (mobility impairment, likes boardgames, Mirasierra) and about the Mega Games shop (Ventisquero de la Condesa, wheelchair accessible, sells Conquer & Smash! for 29,95) published through an API, CSV and HTML; each source is converted into RDF using relations such as hasImpairment, requires, likes, address, hasAccessibility, sells and "is a"]


Link your data: Linked RDF

[Figure: the RDF descriptions obtained from the API, CSV and HTML sources are linked through the resources they share, such as Wheelchair Accessibility, Ventisquero de la Condesa and Boardgame]



Make complex queries

• Where can I buy the Conquer & Smash! game?
• Which are the most accessible routes for Christmas shopping?

MG: "Expansion pack for Conquer & Smash! Take metro line 9 and in 35 minutes we can demo it to you! Or better, take bus 231 because it is sunny and you can glance at the outdoor art exhibition in Plaza de Castilla."


Using Linked Open Transport Data

• Calculate accessible routes
– Combined with geographical data (IGN)
– Which stop should I use if I have mobility problems?
• Commercial routes by bus
– Combined with Madrid's shop census (from Madrid City Council, Ayto. Madrid)
• Geomarketing decisions for entrepreneurs
– Where should I open my shop? Based on the combination of the number of travellers per stop, demographic data, data about other businesses and shops around, etc.
• Personalised offers to travellers
– With real-time data and data about consumption patterns (e.g., credit card transactions)
• …


Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description


Linked Data life cycle

[Figure: Linked Data life cycle with the activities Specification, Modelling, Generation, Publication and Exploitation, plus Linking]


Requirements (smart cities domain)

1. Tabular formats (i.e., SQL, XLS or CSV)
– Other data structures (e.g., XML) are less important in practice, or are unstructured and would require much more work
2. Changing data (dynamic or streaming data), versioning, (automatic) data quality assurance and reliability
3. Data access through web services, proprietary APIs and data files
4. Legal aspects (e.g., licensing, data ownership)
5. Access rights management or mechanisms for extracting public data (there is plenty of confidential data)


Linked Data generation process

[Figure: activities of the process and their outputs]
• Select data source → Data source
• Obtain access to data source → Access, data
• Analyse licensing of the data source → License
• Analyse data source → Schema, data
• Define resource naming strategy → Resource naming strategy
• Develop ontology → Ontology
• Transform data source → RDF data
• Link with other datasets → Linked dataset

F. Radulovic, M. Poveda-Villalón, D. Vila-Suero, V. Rodríguez-Doncel, R. García-Castro and A. Gómez-Pérez. Guidelines for Linked Data generation and publication: An example in building energy consumption. Automation in Construction, Special Issue on Linked Data in Architecture and Construction. Available online April 2015.


Linked Data generation process

[Figure: the generation process with the data preparation activities highlighted]


Select data source

• Select the data source that will be transformed into Linked Data
• Steps:
– To define the requirements for selection
– To select one or several data sources
• The data set may be:
– Owned by your organization…
– … or not (external data sources)


Select data source – Example

• Requirements
– Real-world scenario in the smart city domain
– Available for use
– Available in a machine-processable format (the more structured the data are, the better)
– Can be linked with generic entities (e.g., location)
• Leeds City Council – energy consumption
– http://data.gov.uk/dataset/council-energy-consumption


Obtain access to data source

• Data access means
– Technical means to retrieve the data
– Legal rights to use the data
• If the data is not accessible:
– To identify the person to contact
– To request access
– To obtain access and retrieve the data
• Access alternatives:
– file,
– programming interface,
– database,
– data stream,
– etc.


Obtain access to data source – Example

• Data set already available as a CSV file


Analysing licensing of the data source

• Licenses specify the legal terms under which a data set can be used and exploited
• There are neither legal prescriptions on how to declare licenses nor common standard practices for doing so
• Steps (not automatable):
– To identify the rightsholder and the authoritative publisher
  • Rightsholder vs. authorized distributor
– To find the applicable license
  • Web page, data set metadata, the data themselves
  • Contact the publisher
– To read the license and analyse its legal terms
• Tips
– The analysis should be performed on all copies and formats of the data
– Ensure license compatibility when integrating several data sources


Linked Data resources can be protected

• Ontologies are intellectual works, so they can be protected by copyright
• RDF datasets can be considered databases, which are also legally protected in the EU


Always license your data

Create, consume, aggregate, derive and publish Linked Data in a lawful environment

[Figure: data sources such as data shops, government, individuals, …]


Licensed Linked Data

Non-licensed Linked Data + License = Licensed Linked Data

• Unless there is a license allowing it, the resource cannot be copied, modified or published.
• In practice, non-licensed resources are useless in industrial settings.
• Licensed Linked Data can be used.


Licensed Linked Data in practice

• Linked Open Data: published, open license
• (Published) Linked Data: published, no open license
• Linked Data: not published, no open license


Guidelines for licensing linked data

1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
3. Is a standard license available?
3a. Yes: use the URI of the standard license, e.g., CC0
3b. No: use a rights declaration language, e.g., ODRL

ODRL: Open Digital Rights Language
DCAT: Data Catalog Vocabulary
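Guideline steps 1, 2 and 3a can be sketched in Python: the function below emits a DCAT dataset description whose license is declared with dct:license pointing at the CC0 1.0 URI. The dataset IRI and title in the usage lines are hypothetical placeholders, and a real description would carry more metadata (publisher, distributions, etc.).

```python
# Minimal sketch: a DCAT dataset description with a dct:license statement.
# The dataset IRI and title used below are hypothetical.

CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"  # a standard license URI


def dataset_description(dataset_iri: str, title: str, license_iri: str) -> str:
    """Return a Turtle snippet describing a dcat:Dataset and its license."""
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_iri}> a dcat:Dataset ;",
        f'    dct:title "{title}" ;',          # assumes the title has no quotes
        f"    dct:license <{license_iri}> .",   # step 3a: URI of a standard license
    ])


if __name__ == "__main__":
    print(dataset_description(
        "http://smartcity.linkeddata.es/lcc/dataset/energy",  # hypothetical IRI
        "Leeds City Council energy consumption (RDF)",
        CC0,
    ))
```

When no standard license fits (step 3b), the dct:license object would instead point at an ODRL policy such as the one shown next.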


Licensing Linked Data is simple…

The British National Bibliography (BNB) lists the books and new journal titles published or distributed in the United Kingdom and Ireland since 1950.


… or complex, depending on your needs

Policies can be expressed with ODRL 2.0 to govern access to Linked Data.

Example of access to Linked Data for a price (15 EUR for the dataset or 0.01 EUR per triple):

@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<http://salonica.dia.fi.upm.es/ldr/policy/cdaddba4-fc2e-4ee0-a784-e62f1db259bf>
    a odrl:Set ;
    rdfs:label "License Offering Paid Linked Data" ;
    odrl:permission [ a odrl:Permission ;
        odrl:target <http://example.org/dataset/ds01> ;
        odrl:action odrl:reproduce ;
        odrl:duty [ a odrl:Duty ;
            rdfs:label "Pay" ;
            gr:UnitOfMeasurement dcat:Dataset ;
            gr:amountOfThisGood "1" ;
            odrl:action odrl:pay ;
            odrl:target "15,00 EUR"
        ]
    ] , [ a odrl:Permission ;
        odrl:action odrl:reproduce ;
        odrl:target <http://example.org/dataset/ds01> ;
        odrl:duty [ a odrl:Duty ;
            rdfs:label "Pay" ;
            gr:UnitOfMeasurement rdf:Statement ;
            gr:amountOfThisGood "1" ;
            odrl:action odrl:pay ;
            odrl:target "0,01 EUR"
        ]
    ] .

The target can be an ontology, a dataset, a SPARQL endpoint… or even a SPARQL query or a triple pattern: {mysubject, ?p, ?o}
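The two offers in this policy imply a simple break-even point: at 0.01 EUR per triple, buying triples one by one stops being cheaper than the 15 EUR dataset price at 15 / 0.01 = 1,500 triples. A minimal Python sketch of that comparison, with the prices copied from the example; resolving ties in favour of the whole-dataset offer is an arbitrary choice made here.

```python
from __future__ import annotations

from decimal import Decimal

# Prices copied from the ODRL policy example (EUR).
DATASET_PRICE = Decimal("15.00")  # reproduce the whole dataset
TRIPLE_PRICE = Decimal("0.01")    # reproduce a single triple


def break_even_triples() -> Decimal:
    """Number of triples at which both offers cost the same."""
    return DATASET_PRICE / TRIPLE_PRICE


def cheapest_offer(n_triples: int) -> tuple[str, Decimal]:
    """Return the cheaper permission and its cost for n_triples triples."""
    per_triple_cost = TRIPLE_PRICE * n_triples
    if per_triple_cost < DATASET_PRICE:
        return "per-triple", per_triple_cost
    return "whole dataset", DATASET_PRICE  # ties go to the whole dataset


if __name__ == "__main__":
    print("break-even:", break_even_triples())
    for n in (100, 1500, 20000):
        print(n, *cheapest_offer(n))
```

Decimal is used instead of float so that amounts such as 0.01 EUR are represented exactly.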


And you have support for that

• Conditional access to Linked Data
– http://conditional.linkeddata.es
• Dataset of licenses in RDF
– http://rdflicense.appspot.com
• ODRL Profile for Linked Data
– http://purl.oclc.org/NET/ldr/ns#
– https://www.w3.org/community/odrl/profile/linkeddata/


Analyse licensing – Example


Analyse data source

• Get insight into the data structure and organization
• Steps:
– To analyse the characteristics of the data
  • Data values, data ranges, etc.
– To obtain the schema of the data
  • Concepts and their relationships
• Data can be available as:
– Structured data
– Unstructured data
• If the schema does not exist:
– Use a standard modelling language for describing the data schema (e.g., UML)


Analyse data source – Example

• Metadata not very descriptive:
– Different types of council sites (mostly buildings)
– Electricity, gas and oil consumption
– 1-year intervals: 2010/11, 2011/12, 2012/13
• The analysis required contacting people from LCC open data


Analyse data source – Example

[Screenshot: the data set loaded in a tool running at http://localhost:3333/]


Analyse data source – Example


Analyse data source – Example

• Analyse the characteristics of the data using facets
• Obtain the schema of the data


Data characteristics and schema – Example

Column | Type | Comments / range (rounded) | Problems
uprn | String | | Not unique, empty values
Site Name | String | Unique? Site type + name | 4 repeated sites
Address 2 | String | | Not unique, empty values
Address 3 | String | Village? Civil parish? | Not unique, empty values
Address 4 | String | City? Metropolitan district? | Not unique, empty values; "leeds" vs "Leeds"
PostCode | String | | Not unique, empty values
Electricity 10/11 | Decimal | 0 to 2,700,000 |
Electricity 11/12 | Decimal | 0 to 2,300,000 |
Electricity 12/13 | Decimal | 0 to 2,400,000 |
Gas 10/11 | Decimal | -100,000 to 6,100,000 | Negative values
Gas 11/12 | Decimal | -100,000 to 7,800,000 | Negative values
Gas 12/13 | Decimal | -100,000 to 8,300,000 | Negative values
Oil 12/13 | Decimal | -1,000,000 to 13,000,000 | Negative values
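Checks like the ones in this table (empty values, repeated identifiers, value ranges, negative quantities) can be scripted before any transformation. A minimal sketch using only Python's standard csv module; the column names and sample rows in the usage lines are hypothetical, not taken from the Leeds data set.

```python
from __future__ import annotations

import csv
import io
from decimal import Decimal, InvalidOperation


def profile(csv_text: str) -> dict[str, dict]:
    """Per column: empty cells, uniqueness of non-empty values and, for
    fully numeric columns, the value range and the number of negatives."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report: dict[str, dict] = {}
    for column in reader.fieldnames or []:
        values = [(row.get(column) or "").strip() for row in rows]
        non_empty = [v for v in values if v]
        info = {
            "empty": len(values) - len(non_empty),
            "unique": len(set(non_empty)) == len(non_empty),
        }
        try:
            # Thousands separators such as "2,700,000" are removed first.
            numbers = [Decimal(v.replace(",", "")) for v in non_empty]
        except InvalidOperation:
            numbers = []  # at least one non-numeric value: not a numeric column
        if numbers:
            info["min"] = min(numbers)
            info["max"] = max(numbers)
            info["negatives"] = sum(1 for n in numbers if n < 0)
        report[column] = info
    return report


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = ("uprn,Site Name,Gas 12/13\n"
              "1,Belgrave House,100\n"
              ",Tunstall Road,-5\n"
              "1,Belgrave House,20\n")
    for column, info in profile(sample).items():
        print(column, info)
```

A faceting tool gives the same information interactively; a script like this is mainly useful for re-running the checks whenever a new version of the data is published.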

Linked Data generation process

[Figure: the generation process with the "Define resource naming strategy" activity highlighted]


Hash and slash URIs

• Hash URIs (#)
– http://www.energycompany.com/about#energyCompany
– The fragment part has to be stripped off when the URI is requested from the server (i.e., the resource cannot be retrieved directly)
– Hash URIs can be used to identify non-document resources
• Slash URIs (/)
– http://www.energycompany.com/about/energyCompany
– Imply a 303 redirection to the location of a document that represents the resource (+ content negotiation)
  • E.g., http://www.energycompany.com/about/energyCompany.rdf
– Drawbacks: HTTP round-trip, redirects, web server configuration

Hash or slash?

• Depends on the data and on their expected use
• Small data:
– Hash namespace
– Access all the data as a whole
– HTTP GET would return a single information resource with everything
• Large / frequently-updated / modular data:
– Slash namespace
– Access resources individually or in groups
– Resource descriptions may be divided among many information resources or may be managed via a query service (e.g., SPARQL)
– Progressively greater detail about resources may be retrieved through multiple accesses

Define resource naming strategy

• Steps:
– To choose a URI form (hash or slash)
– To choose a domain for the URIs
– To choose a path for the URIs
– To choose a pattern for ontology classes and properties in the ontology, as well as for individuals
• Tips:
– One URI must identify only one item (e.g., avoid mixing web pages and real-world objects)
– URIs should be persistent and should not change over time (e.g., do not include state information); PURL may support this
– Use a domain that is under your control (or a service such as PURL)
– Separate the ontology model from its instances
– Define meaningful URIs

Resource naming strategy – LCC

• Hash URIs for ontological terms, slash URIs for individuals
• Domain: http://smartcity.linkeddata.es/
• Ontological terms path:
– http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#
• Individuals path:
– http://smartcity.linkeddata.es/lcc/resource/
• Ontological terms pattern:
– http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
– Ex.: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitativeValue
• Individuals pattern:
– http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>
– Ex.: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/WetJohnCharlesCentreforSport
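The two patterns can be turned into small URI-minting helpers. The rule used here to derive <resource_name> (keep only letters and digits) is an assumption that happens to reproduce the individual example of this strategy; the strategy itself does not spell it out.

```python
# Sketch of URI minting for the LCC naming strategy.
BASE = "http://smartcity.linkeddata.es/lcc/"
ONTOLOGY_NS = BASE + "ontology/EnergyConsumption#"  # hash URIs for terms
RESOURCE_NS = BASE + "resource/"                    # slash URIs for individuals


def term_uri(term_name: str) -> str:
    """.../ontology/EnergyConsumption#<term_name>"""
    return ONTOLOGY_NS + term_name


def local_name(text: str) -> str:
    """Assumed rule: keep only letters and digits ("Wet John" -> "WetJohn")."""
    return "".join(ch for ch in text if ch.isalnum())


def individual_uri(resource_type: str, resource_name: str) -> str:
    """.../resource/<resource_type>/<resource_name>"""
    name = local_name(resource_name)
    if not name:
        raise ValueError(f"cannot mint a URI from {resource_name!r}")
    return f"{RESOURCE_NS}{local_name(resource_type)}/{name}"


if __name__ == "__main__":
    print(term_uri("hasQuantitativeValue"))
    print(individual_uri("LeisureCentre", "Wet John Charles Centre for Sport"))
```

Keeping the rule in one function makes it easier to apply the same naming consistently in the ontology, in the mappings and in any later linking step; names that collide after this normalisation (for example, repeated sites) still need to be resolved by hand.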


Linked Data generation process

[Figure: the generation process with the "Develop ontology" activity highlighted]


Ontology development

[Figure: ontology development activities]
1. Requirements definition
2. Terms extraction
3. Ontology conceptualization
  3.1 Initial model drafting
  3.2 Detailed model definition
4. Ontology search
5. Ontology selection
6. Ontology implementation
  6.1 Ontology integration
  6.2 Ontology completion
7. Ontology evaluation
Decision point in the figure: Can you represent all your data?

You did this yesterday.


Ontology development – Example


Linked Data generation process

[Figure: the generation process with the "Transform data source" activity highlighted]


Data transformation

• Steps:
– To select the RDF serialization
  • RDF/XML, Turtle, N-Triples, JSON-LD
– To select a tool. It depends on:
  • The format of the data (database, spreadsheets, etc.)
  • Concrete needs of the transformation process (e.g., dynamicity)
– To transform the data into RDF
  • Usually requires a mapping between the data and the ontology
  • The mapping implements the resource naming strategy
– To evaluate the obtained RDF data:
  • Syntax, Completeness, Accuracy, Conciseness, Modelling, Understandability, Versatility, Usage, Licensing, …
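A minimal sketch of the mapping step in plain Python, assuming a CSV with one column used to build the URI and another used as the label: each row becomes a schema:CivicStructure individual named with the LCC individuals pattern and serialized as Turtle. The column names and the fixed type are hypothetical; in practice a dedicated transformation tool would normally express this mapping declaratively.

```python
import csv
import io

PREFIXES = (
    "@prefix schema: <http://schema.org/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)
RESOURCE_NS = "http://smartcity.linkeddata.es/lcc/resource/"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_turtle(csv_text: str, name_column: str, label_column: str) -> str:
    """Map every row to a schema:CivicStructure individual with an rdfs:label.

    Hypothetical mapping: the URI follows the <resource_type>/<resource_name>
    pattern with a fixed type (CivicStructure) and an alphanumeric name.
    """
    out = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = "".join(ch for ch in row[name_column] if ch.isalnum())
        if not name:
            continue  # no usable identifier: skip instead of minting a bad URI
        out.append(f"<{RESOURCE_NS}CivicStructure/{name}>")
        out.append("    a schema:CivicStructure ;")
        out.append(f"    rdfs:label {turtle_literal(row[label_column].strip())} .")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Site Name,Label\nCouncil Offices Belgrave House,Belgrave House\n"
    print(csv_to_turtle(sample, "Site Name", "Label"))
```

Escaping literals inside the mapping is one of the syntax issues that the evaluation step should otherwise catch.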


Data transformation tools

• Database to RDF: morph-RDB, D2R Server, TopBraid Composer
• Data streams to RDF: morph-streams
• Spreadsheets to RDF: TopBraid Composer, Excel2RDF, RDF123, XLWrap, OpenRefine/LODRefine
• XML to RDF: XML2RDF, TopBraid Composer


Data transformation tools

Overview of OpenRefine


OpenRefine basic operations

• Installing
• Creating a new project
• Data analysis
– Exploring data
– Sorting data
– Faceting data
– Filtering data
• Basic data transformation (cleaning/preparing)
– Columns:
  • Move
  • Rename
  • Remove columns
  • Collapse and expand
  • Common transformations
– Rows:
  • Remove rows
• Export whole project


Adding derived columns

Edit column → Add column based on this column...


Splitting data across columns

Edit column → Split into several columns...


Handling multi-valued cells

Edit cells → Split multi-valued cells...
Edit cells → Join multi-valued cells...
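To make the effect of these two operations concrete, here is an approximate Python model over rows stored as dictionaries: splitting moves each extra value into a new row whose other cells are blank, and joining folds such rows back into the row above. This mimics the observable behaviour but is not OpenRefine's implementation; the function names are hypothetical.

```python
# Approximate model of OpenRefine's split/join of multi-valued cells.

def split_multivalued(rows, column, separator):
    """First value stays in the original row; each further value gets a new
    row in which every other cell is left blank."""
    out = []
    for row in rows:
        parts = [part.strip() for part in row[column].split(separator)]
        out.append({**row, column: parts[0]})
        for part in parts[1:]:
            out.append({key: (part if key == column else "") for key in row})
    return out


def join_multivalued(rows, key_column, column, separator):
    """Inverse operation: rows with a blank key cell are merged upwards."""
    out = []
    for row in rows:
        if row[key_column] == "" and out:
            out[-1][column] += separator + row[column]
        else:
            out.append(dict(row))  # copy so the input rows are not modified
    return out


if __name__ == "__main__":
    rows = [{"Site": "A", "Type": "Library; Museum"}]
    split = split_multivalued(rows, "Type", ";")
    print(split)
    print(join_multivalued(split, "Site", "Type", ";"))
```

Note that the round trip normalises the separator (the space after ";" is trimmed by the split), which is often exactly the cleaning one wants.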


Rows and records

Show as: rows | records

[Figure: one record grouping several rows]
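In records mode, OpenRefine groups a row that has a non-blank key cell (the first column) together with the following rows whose key cell is blank. A sketch of that grouping rule in Python, with the key column passed explicitly as a parameter (an assumption made for illustration):

```python
# Sketch of grouping rows into records by a key column.

def to_records(rows, key_column):
    """A non-blank key cell starts a new record; blank ones continue it."""
    records = []
    for row in rows:
        if row[key_column].strip() or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records


if __name__ == "__main__":
    rows = [
        {"Site": "A", "Type": "Library"},
        {"Site": "", "Type": "Museum"},   # belongs to record "A"
        {"Site": "B", "Type": "Office"},
    ]
    for record in to_records(rows, "Site"):
        print([row["Type"] for row in record])
```

This is why the key column usually has to be moved to the first position before switching to records mode.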


Clustering similar cells

Edit cells → Cluster and edit...
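Cluster and edit offers several methods; its default key-collision method relies on a "fingerprint" of each value (trimmed, lower-cased, punctuation removed, accents folded, tokens sorted and de-duplicated). The sketch below approximates that keying in Python, which is enough to group variants such as "leeds" vs "Leeds" found during the data analysis; it is not OpenRefine's exact code.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate key-collision fingerprint of a cell value."""
    value = value.strip().lower()
    # Fold accented characters to their ASCII base letters (drops other chars).
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s]", "", value)  # remove punctuation
    return " ".join(sorted(set(value.split())))


def clusters(values):
    """Groups of distinct values whose fingerprints collide."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(clusters(["Leeds", "leeds", " LEEDS ", "Wakefield"]))
    print(fingerprint("Centre, John Charles") == fingerprint("John Charles Centre"))
```

Each proposed cluster should still be reviewed before merging, since token sorting can also group values that only look alike.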


Transposing rows and columns

Transpose → Transpose cells across columns into rows...
Transpose → Columnize by key/value columns...
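Columnize by key/value columns turns long data (one row per site and period) into wide data (one column per period). An approximate Python equivalent, assuming every column other than the key and value columns identifies the entity; the column names and values in the usage lines are invented for illustration.

```python
# Approximate model of "Columnize by key/value columns".

def columnize(rows, key_column, value_column):
    """Pivot long key/value rows into one wide row per entity.

    Every column except key_column and value_column identifies the entity;
    each distinct key becomes a column holding the matching value.
    """
    wide = {}
    for row in rows:
        entity = tuple((k, v) for k, v in row.items()
                       if k not in (key_column, value_column))
        wide.setdefault(entity, dict(entity))[row[key_column]] = row[value_column]
    return list(wide.values())


if __name__ == "__main__":
    rows = [
        {"Site": "A", "Period": "Electricity 10/11", "kWh": "5"},
        {"Site": "A", "Period": "Electricity 11/12", "kWh": "7"},
        {"Site": "B", "Period": "Electricity 10/11", "kWh": "3"},
    ]
    for row in columnize(rows, "Period", "kWh"):
        print(row)
```

The opposite operation (Transpose cells across columns into rows) is what turns per-year columns such as "Electricity 10/11" into separate observations, which maps more naturally onto one RDF resource per measurement.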


Other useful utilities

• Regular expressions
– Java regular expressions
• Custom transformations
– General Refine Expression Language (GREL)
– Jython (Python implemented in Java)
– Clojure (functional language that resembles Lisp)


Using the project history

• Project history:
– Access operation history
– Undo operations
– Extract operations (in JSON)
– Apply operations
• Caution:
– Transformations are registered in the history; filters and facets are not


Solving memory problems

https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory


OpenRefine RDF extension - RDF skeleton

• Resource naming strategy
– Ontological terms pattern:
  http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
– Individuals pattern:
  http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>

Add base URI
Add prefixes


Creating individuals

lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure


Previewing results


Adding property values

lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure
lccRes:CouncilOfficesBelgraveHouse rdfs:label "Belgrave House" (xsd:string)


Exporting RDF

Export → RDF as Turtle

@prefix schema: <http://schema.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lcc: <http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CouncilOfficesBelgraveHouse>
    a schema:CivicStructure ;
    rdfs:label "Belgrave House" .

<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CommunityCentreTunstallRoad>
    a schema:CivicStructure ;
    rdfs:label "Tunstall Road" .


Evaluating the exported data

• Manual inspection
• Syntax evaluation (with a syntax validator)
• Consistency with the ontologies (with a reasoner)
• Usage evaluation (e.g., by running SPARQL queries)
– Show all electricity consumptions and the related time periods for all council sites related to culture
– Show all energy consumptions and the related time periods of council sites from the Wakefield district

Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Richer schema (and data)

[Figure: richer ontology for the LCC data, reusing SSN (ssn:Observation, ssn:SensorOutput, ssn:ObservationValue, ssn:Property, ssn:FeatureOfInterest), OWL-Time (time:Interval, time:Instant, time:hasBeginning, time:hasEnd, time:inXSDDateTime), schema.org (schema:Place, schema:City, schema:AdministrativeArea, schema:PostalAddress with schema:streetAddress, schema:addressLocality, schema:addressRegion, schema:postalCode), admingeo:District, om:Unit_of_measure, ero:FinalEnergy and ero:EnergyConsumerFacility; site subclasses SupplyOrStorageSite, OpenAirSite, AccomodationSite, AdministrativeSite, OfficeSite, EducationalSite, SocialSite, CulturalSite, LeisureSite and OtherSite; and the properties lcc:uprn, dc:title, lcc:hasQuantityValue (xsd:decimal) and lcc:hasQuantityUnitOfMeasurement]


Linked Data are just data

[Plots: electrical consumption per building (2010/11, 2011/12, 2012/13); electricity vs gas consumption 12/13; electricity vs oil consumption 12/13]


Benefits of linking data

[Plots: total electric consumption (original data + geolocation); total electric consumption in locations with population > 20,000 (original data + geolocation + population)]


Benefits of reasoning

[Plot: total electric consumption in cultural buildings, where CulturalSite covers its subclasses Museum and Library]
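The gain from reasoning comes from subclass axioms: a site asserted to be a Museum or a Library is also a CulturalSite, so it counts towards the cultural total without being tagged as such. A minimal RDFS-style subclass closure in Python; the axioms mirror the figure, and the consumption numbers in the usage lines are made up.

```python
from __future__ import annotations

from collections import defaultdict

# Subclass axioms mirroring the figure: Museum and Library are CulturalSites.
SUBCLASS_OF = {"Museum": {"CulturalSite"}, "Library": {"CulturalSite"}}


def superclasses(cls: str) -> set[str]:
    """cls itself plus every class it is (transitively) a subclass of."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def totals_by_class(sites):
    """sites: iterable of (asserted_class, consumption) pairs."""
    totals = defaultdict(int)
    for asserted_class, consumption in sites:
        for cls in superclasses(asserted_class):  # inferred types included
            totals[cls] += consumption
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_class([("Museum", 10), ("Library", 5), ("OfficeSite", 7)]))
```

A triple store with RDFS inference gives the same effect at query time: a query for CulturalSite also returns instances asserted only as Museum or Library.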


Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description


What are we going to do?

[Figure: Linked Data life cycle with the activities Specification, Modelling, Generation, Publication and Exploitation, plus Linking]


What are we going to do?

[Figure: the Linked Data generation process: select data source, obtain access to data source, analyse licensing of the data source, analyse data source, define resource naming strategy, develop ontology, transform data source, link with other datasets]


Hands-on task 1

• Goal: to get familiar with the first steps in the Linked Data generation process
• The students will have to take their selected dataset(s) and perform the following tasks:
– Analyse Data Set
  • Both the data (quantities, value ranges, etc.) and the schema
– Analyse Licensing of the Data Source
  • Who is the publisher and the rightsholder?
  • What is the license?
  • Which license will be used for the generated dataset?
– Define Resource Naming Strategy
  • For the ontology and the data (URI form, content negotiation, URI domain, path, patterns, etc.)
– Finish Ontology Development
  • Lightweight ontology (i.e., classes, properties, domains and ranges)


Hands-on task 1 - Deliverables

• A document that includes:
– The analyses performed over the data source
– The licensing of the data source and the potential license
– The resource naming strategy defined
• An OWL file with the ontology developed, according to the resource naming strategy defined


Hands-on task 2

• Goal: to get familiar with the transformation of CSV data into RDF using LODRefine
• The students will have to take their selected dataset(s) and perform the following tasks:
– Import data into LODRefine
– Analyse and fix data
  • Analysis performed in the previous class, but can be updated with new findings
  • Fix the data to remove errors
  • Transform the data to facilitate RDF generation
– Export data to RDF
  • Define an RDF skeleton for the data
  • Export the data to RDF (Turtle syntax)


Hands-on task 2 - Deliverables

For each dataset:
• An RDF file in the Turtle syntax with the data transformed into RDF


1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)

Thank you for your attention!
