I used these slides in the context of a cultural heritage presentation, so the examples are relevant to that community. For example, the choice of CIDOC CRM is an obvious one in that community.
Introduction to linked open data; RDF: the Resource Description Framework; tools to convert data to RDF; tools for linking/reconciliation/resolution; storing and maintaining the data; BBC and Linked Data
Inferring Web Citations using Social Data and SPARQL Rules (Matthew Rowe)
The document discusses using SPARQL rules to infer web citations from social data about individuals. It describes generating seed data by extracting profiles from social networks and linking them. Rules are built from the seed data by adding triples and creating new rules for inverse functional properties. The rules are applied to web resources to infer citations with high precision but low recall, outperforming humans for individuals with low web presence. Future work aims to overcome limitations of the seed data and enable learning from identifications.
Presentation created for the CILIP Cataloguing Interest Group event on Linked Data, 25th November 2013 (http://www.cilip.org.uk/cataloguing-and-indexing-group/events/linked-data-what-cataloguers-need-know-cig-event)
This document summarizes research into discovering lost web pages using techniques from digital preservation and information retrieval. Key points include:
- Web pages are frequently lost due to broken links or content being moved/removed, but copies may still exist in search engine caches or archives.
- Techniques like lexical signatures (representing a page's content in a few keywords) and analyzing page titles, tags and link neighborhoods can help characterize lost pages and find similar replacement content.
- Experiments showed that lexical signatures degrade over time but page titles are more stable, and combining techniques improves performance in locating replacement content. The goal is to develop a browser extension to help users find lost web pages.
This document discusses Linked Data and the best practices for publishing and interlinking data on the web. It covers four main principles:
1) Use URIs as names for things and identify real-world objects with HTTP URIs.
2) Use HTTP URIs so that people can look up those names by dereferencing the URIs.
3) Provide useful RDF information when URIs are dereferenced, using formats like RDF/XML, RDFa, N3, or Turtle.
4) Include links to other URIs to discover more related things and connect isolated data silos. This allows data to be interlinked on the Web.
This document discusses Yahoo Query Language (YQL), which allows users to query and retrieve data from various web services through a simple SQL-like syntax. YQL acts as an API for services that may not otherwise have exposed data through APIs. The document provides examples of YQL queries to retrieve data from services like Google, Twitter, Foursquare and the New York Times. It highlights how YQL simplifies accessing web data by allowing complex operations to be performed with single HTTP requests.
With over 20 billion pages, Google is the largest and most popular online search engine in the world. Tracking down local history and genealogical information, however, requires users to have a basic understanding of search techniques and of how the engine filters its results. Based on David Lynch's book "Google Your Family Tree" and information provided in other online genealogy courses, this presentation shows how one can make the best use of the Internet’s most powerful free online service.
The document provides guidance on effectively searching for information on the internet for history projects. It explains that searching the internet is like finding a needle in a haystack but there are techniques that can help attract the right information. These include using advanced search features, Boolean logic operators, and focusing searches on specific domains like .edu to find more relevant sources. A list of recommended history and archives websites for primary sources is also included.
The document is a presentation on advanced search techniques for genealogy research. It discusses how search engines work, different types of searches including word, phrase, Boolean, and proximity searches. It provides examples of searching genealogy databases and sites like Mocavo. It also gives tips on refining searches and provides an example of effectively searching for "Jane Graham" born in 1811 in Monroe County, West Virginia.
"Whatever I can get..."
From the Social Network Portability WebCamp @ Cork, Ireland.
Talk by Dan Brickley on Social Network Portability, FOAF, and a claims-based approach to thinking about how various technologies fit together.
Libraries and Linked Data: Looking to the Future (2) (ALATechSource)
The document discusses options for new bibliographic frameworks after MARC. It describes three scenarios: 1) a relational/object-oriented RDA database, 2) linked bibliographic and authority records, and 3) flat files without links. It then discusses three approaches to implementing a new framework: 1) going native by using URIs for things, elements and values, 2) extracting data from existing MARC records, and 3) serializing data into key-value pairs, XML, or JSON. Advantages and disadvantages of each approach are outlined.
Data Journalism (City Online Journalism wk8) (Paul Bradshaw)
The document provides an overview of data journalism including what it is, sources for finding data, and tools for analyzing and visualizing data. It discusses scraping data from websites, using tools like Google searches, spreadsheets, and APIs to extract structured data. Ethical considerations around scraping are also mentioned. The document concludes with assigning students to group blogs and individual strategies focusing on different aspects of online journalism.
URI Disambiguation in the Context of Linked Data (butest)
The document discusses URI disambiguation in linked data repositories. It notes that a single entity often has multiple URIs both within and across repositories, leading to inconsistencies. It examines approaches to author disambiguation and discusses results of disambiguating authors in the DBLP dataset, finding many authors were incorrectly merged. It also notes issues of inconsistent owl:sameAs linkage in DBpedia. The document proposes solutions like consistent reference services and OKKAM to help manage coreference and improve consistency across linked data.
It's not rocket surgery - Linked In: ALA 2011 (Ross Singer)
This document provides a brief introduction to linked library data and linked data concepts. It explains the core principles of linked data, including using URIs as names for things and including links between URIs so that additional related data can be discovered. It also discusses common vocabularies and schemas used in linked data like Dublin Core, Bibliontology, and RDA Elements. The document uses a sample book record to demonstrate how linked data can be modeled and interconnected using these vocabularies and external data sources like VIAF, LOC, and Geonames.
This document discusses the deep web and tools for searching it. It defines the deep web as dynamically generated content that is not indexed by search engines, as well as content residing in databases or behind forms. The deep web contains higher quality resources than the surface web. Tools for searching the deep web directly include Complete Planet and Infomine directories. Demonstrations show using terms like "database" to find deep web sources. Issues like accessibility of deep web databases are also covered.
The document discusses challenges facing the semantic web as it tries to keep up with the growth of the regular web, including not having enough agreed upon vocabularies, data, and links between data. It also notes problems with reasoning over large amounts of noisy and inconsistent web data from different sources. Solutions proposed include cleverly injecting semantic web technologies into content management systems to extract and link more data, as well as developing lightweight vocabularies and simplified reasoning techniques.
From: Linked Data: what cataloguers need to know. A CIG event. 25 November 2013, Birmingham. #cigld
http://www.cilip.org.uk/cataloguing-and-indexing-group/events/linked-data-what-cataloguers-need-know-cig-event
Accompanying write-up from Catalogue & Index 174: http://discovery.ucl.ac.uk/1449458/
Presentation of the paper "On Using JSON-LD to Create Evolvable RESTful Services" at the 3rd International Workshop on RESTful Design (WS-REST 2012) at WWW2012 in Lyon, France
This document provides an introduction to the deep web, including its size, evolution, and how it can be accessed. The deep web refers to content that is not indexed by standard search engines and is much larger than the surface web that search engines can index. It includes dynamically generated pages that can only be accessed through a form and private pages that require login credentials. Sites on the deep web can only be accessed using special browsers and protocols like Tor that allow for anonymous surfing through onion addresses. While some deep web sites provide legal content, others are used for illegal activities and information sharing.
RDA: Are We There Yet?
This document discusses the progress of Resource Description and Access (RDA) since its publication in 2010. It notes recommendations from libraries that tested RDA, including rewriting instructions in plain English and improving the RDA Toolkit. The implementation date for RDA is March 31, 2013. Differences after implementing RDA include lack of abbreviations, more transcription of elements, new MARC fields, and richer authority records. Fully implementing RDA may involve changes to search options and semantic web/linked data approaches. Tips are provided for libraries on deciding when to implement, talking to vendors, and planning training.
This document provides an overview of Resource Description and Access (RDA), the new cataloging standard that will replace AACR2. It discusses what RDA is, why it was developed, how it relates to FRBR, and some of the key differences between RDA and AACR2, such as changes to terminology, transcription, and MARC fields. It also explores potential future directions for RDA, such as linked data and semantic web applications. The document concludes by offering suggestions for how libraries can prepare to implement RDA.
From the Feb 19, 2014 NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations
Kevin Ford, Semantic Web Applications in Libraries: The Road to BIBFRAME
The document discusses the differences between the deep web and surface web. The deep web refers to content that is not indexed by typical search engines, as it is stored in dynamic databases rather than static web pages. It contains over 500 times more information than the surface web. Some key differences are that deep web content is accessed through direct database queries rather than URLs, and search results are generated dynamically rather than having fixed URLs. Specialized search engines are needed to access the deep web.
The document outlines plans to document, catalog, preserve, and provide access to the art collection of the late sculptor Didi Schreiner. It notes that the collection includes approximately 60 smaller sculptures and 11 large sculptures located across 3 sites that require conservation. Over 40 boxes of sketchbooks and drawings need to be assessed and preserved. The proposal calls for photographing and digitizing all works, organizing them chronologically, and storing physical materials off-site. It also recommends developing a digital archive and website to share information with museums and provide access for Mr. Schreiner. A general timeline and estimated budget are provided to carry out these objectives over 12-24 months.
This document discusses cataloging practices for different types of multimedia materials at several museums and cultural institutions. It addresses challenges in cataloging video games, audiovisual materials, and films. Different institutions use various metadata schemas and standards like RDA, AACR2, Dublin Core, Darwin Core, and PBCore to catalog their collections. Child-centered, radical, and Dewey decimal classification approaches are also summarized. Metadata practices at the Smithsonian, National Museum of American History, National Museum of Natural History, National Museum of American Art, Cooper-Hewitt Museum, and National Portrait Gallery are outlined.
Museum Archive Group Presentation Art Documentation Pratt Institute School of... (PrattSILS)
The document proposes a project to systematically organize and process the New York Museum of Art's net art collection and related documentation. Key aspects of the project include developing a naming convention and cataloging the collection in xDams using LIDO standards, creating individual records for net artworks and documentation, and producing a cataloging guide and controlled vocabulary. The project aims to prepare the collection for long-term preservation and improve management and access to the materials.
The document proposes establishing an organizational system for artist Calee X's documentation. It would number finished works and progress photos, use LIDO description standards and xDAMS cataloging software. Photos and documents would be hierarchically organized, with works separated from general documentation. The system is meant to be low-maintenance and user-friendly for Calee X to update. It provides options for grants to fund the initial archiving work within a six-month timeline and estimated $25,800 first-year budget. The goal is to ensure Calee X's legacy through long-term preservation of her body of work and materials.
The document outlines the Museum of New Media's plans to implement a collection management system to catalog and provide access to their growing net art collection. They will use the open source software xDams and the metadata standard LIDO to catalog over 50 net artworks and related materials. The project will establish numbering schemes, file naming conventions, and internal vocabularies. It details staff roles and a six month timeline to have the collection cataloged in xDams before a planned net art retrospective exhibition.
Artist Archive Group Presentation Art Documentation Pratt Institute School of... (PrattSILS)
This document outlines a plan to organize an artist's materials. It involves inventorying and digitizing photographic materials over three stages in the next 8-10 months. A full-time studio manager and interns will catalog items using xDams and create digital files and records while consultants ensure professional standards are followed. Materials like document boxes, acid-free folders, a scanner and hard drive are needed. Following established museum cataloging standards and practices will help ensure the long-term preservation and accessibility of the materials.
Karma, a Data Integration Tool
Pedro Szekely, Project Leader/Research Associate Professor, Information Sciences Institute, University of Southern California
The document provides an overview of how the LOCAH project is applying Linked Data concepts to expose archival and bibliographic data from the Archives Hub and Copac as Linked Open Data. It describes the process of (1) modeling the data as RDF triples, (2) transforming existing XML data to RDF, (3) enhancing the data by linking to external vocabularies and datasets, (4) loading the RDF into a triplestore, and (5) creating Linked Data views to expose the data on the web. The goal is to publish structured data that can be interconnected across domains to enable new uses by both humans and machines.
Presentation at ELAG 2011, European Library Automation Group Conference, Prague, Czech Republic. 25th May 2011
http://elag2011.techlib.cz/en/815-lifting-the-lid-on-linked-data/
This presentation provides an accessible introduction to Linked Open Data (LOD) and how LOD is modelled and made available online. The presenters will discuss several LOD projects created by libraries and archives in order to illustrate the benefits of applying LOD principles and practices. They will also demonstrate easy ways to leverage the power of LOD for archival organizations and their digital collections, with concrete examples involving WikiData, Omeka S, and the SNAC (Social Networks and Archival Context) Project.
Society of Georgia Archivists 2018 Annual Meeting
Speakers:
Josh Hogan, Atlanta University Center Robert W. Woodruff Library
Cliff Landis, Atlanta University Center Robert W. Woodruff Library
This is an informal overview of Linked Data and the usage made of it for the project http://res.space (presented on August 11th 2016 during a team meeting)
This document summarizes a presentation about using linked data to improve library discovery. It discusses linking library data to non-library data sources to provide a richer context about materials. It introduces key concepts of linked data like identifying entities, using URIs, and standard vocabularies. The presentation also provides examples of how linked data is being applied in library catalogs by connecting catalog records to sources like VIAF, DBpedia, and Wikidata.
The document introduces the concept of Linked Data and discusses how it can be used to publish structured data on the web by connecting data from different sources. It explains the principles of Linked Data, including using HTTP URIs to identify things, providing useful information when URIs are dereferenced, and including links to other URIs to enable discovery of related data. Examples of existing Linked Data datasets and applications that consume Linked Data are also presented.
IFLA LIDASIG Open Session 2017: Introduction to Linked Data (Lars G. Svensson)
At the IFLA Linked Data Special Interest Group open session in Wroclaw we briefly introduced the mission of the SIG and then went on to a brief introduction to what linked data is and why that topic is important to libraries.
The presentation was held jointly by Astrid Verheusen (general introduction to the SIG) and Lars G. Svensson (introduction to Linked Data)
TPDL2013 tutorial: Linked Data for Digital Libraries, 2013-10-22 (jodischneider)
Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-10-22.
http://tpdl2013.upatras.gr/tut-lddl.php
This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data.
For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial updates the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.
The document discusses a webinar presented by NISO and DCMI on Schema.org and Linked Data. The webinar provides an overview of Schema.org and Linked Data, examines the advantages and challenges of using RDF and Linked Data, looks at Schema.org in more detail, and discusses how Schema.org and Linked Data can be combined. The goals of the webinar are to illustrate the different design choices for identifying entities and describing structured data, integrating vocabularies, and incentives for publishing accurate data, as well as to help guide adoption of Schema.org and Linked Data approaches.
SPARQL 1.1 Tutorial, given in UChile by Axel Polleres (DERI) (net2-project)
This document provides an introduction to SPARQL 1.1. It begins by explaining that SPARQL is a query language for the semantic web that allows users to query RDF data stores similarly to how SQL queries relational databases. It then describes SPARQL 1.0, the initial standard version, and the new features being added in SPARQL 1.1, including aggregate functions, subqueries, property paths and federated querying. The document concludes by discussing SPARQL implementations and the status of the 1.1 specification.
Transmission6 - Publishing Linked Data (Bill Roberts)
This document provides guidance on publishing linked data by describing how to (1) use URIs to identify things, (2) make those URIs accessible via HTTP, (3) provide useful information about those URIs using standards, and (4) include links between URIs. It recommends starting by describing important things and assigning them URIs, and then representing the descriptions in both human-readable and machine-readable formats like RDF. Publishers should also include links between related URIs and provide licensing information.
Presentation given at Barcamp Chiang Mai 4 on the basics of Semantic Web. A simple introduction with examples, aimed for those with a little Web development experience.
Raises questions about the true identity of Tim Berners-Lee.
Overview of how data on the Web of Data can be consumed (first and foremost Linked Data) and implications for the development of usage mining approaches.
References:
Elbedweihy, K., Mazumdar, S., Cano, A. E., Wrigley, S. N., & Ciravegna, F. (2011). Identifying Information Needs by Modelling Collective Query Patterns. COLD, 782.
Elbedweihy, K., Wrigley, S. N., & Ciravegna, F. (2012). Improving Semantic Search Using Query Log Analysis. Interacting with Linked Data (ILD 2012), 61.
Raghuveer, A. (2012). Characterizing machine agent behavior through SPARQL query mining. In Proceedings of the International Workshop on Usage Analysis and the Web of Data, Lyon, France.
Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. arXiv preprint arXiv:1103.5043.
Hartig, O., Bizer, C., & Freytag, J. C. (2009). Executing SPARQL queries over the web of linked data (pp. 293-309). Springer Berlin Heidelberg.
Verborgh, R., Hartig, O., De Meester, B., Haesendonck, G., De Vocht, L., Vander Sande, M., ... & Van de Walle, R. (2014). Querying datasets on the web with high availability. In The Semantic Web–ISWC 2014 (pp. 180-196). Springer International Publishing.
Verborgh, R., Vander Sande, M., Colpaert, P., Coppens, S., Mannens, E., & Van de Walle, R. (2014, April). Web-Scale Querying through Linked Data Fragments. In LDOW.
Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. In Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn2011), CEUR WS.
Luczak-Rösch, M. (2014). Usage-dependent maintenance of structured Web data sets (Doctoral dissertation, Freie Universität Berlin, Germany), http://edocs.fu-berlin.de/diss/receive/FUDISS_thesis_000000096138.
Publishing and Using Linked Open Data - Day 1 (Richard Urban)
This document provides an agenda and schedule for Monday's Linked Open Data class. The day includes introductions, sessions on introducing linked data and exploring use cases, breaks for discussion, and a concluding session on kicking off participant projects. Evening events include an outside lecture and networking social for graduate students.
Lecture at the advanced course on Data Science of the SIKS research school, May 20, 2016, Vught, The Netherlands.
Contents
-Why do we create Linked Open Data? Example questions from the Humanities and Social Sciences
-Introduction into Linked Open Data
-Lessons learned about the creation of Linked Open Data (link discovery, knowledge representation, evaluation).
-Accessing Linked Open Data
A Graph-based Approach to Learn Semantic Descriptions of Data Sources (Pedro Szekely)
The document presents a graph-based approach to learn semantic descriptions of data sources. It describes how known source models are used to construct a graph and generate candidate semantic models for a new source, by mapping learned semantic types and computing minimal trees. Multiple candidate models may be generated from different mappings to the graph. The candidates are then ranked to identify the best semantic description for the new source.
Connecting the Smithsonian American Art Museum to the Linked Data Cloud (Pedro Szekely)
Slides for our paper "Connecting the Smithsonian American Art Museum to the Linked Data Cloud", presented at the 10th Extended Semantic Web Conference (ESWC), Montpellier, May 2013. http://eswc-conferences.org/sites/default/files/papers2013/szekely.pdf
This document discusses SPARQL, a query language for retrieving and manipulating data stored in RDF format. It provides examples of basic SPARQL queries using SELECT, ASK, CONSTRUCT and DESCRIBE. The examples demonstrate querying and filtering data, handling blank nodes, optional and negation patterns, and property paths. The document is a presentation on SPARQL given by Pedro Szekely and includes syntax examples to retrieve data from RDF graphs using SPARQL queries.
This document contains slides from a presentation by Pedro Szekely on RDF and related Semantic Web topics. The slides cover Unicode, URLs, URIs, namespaces, XML, XML Schema, RDF graphs, RDF syntaxes including XML and Turtle formats, and comparisons between XML and RDF. Key topics include using URIs to identify resources on the web, representing information as subject-predicate-object triples in RDF graphs, combining vocabularies using namespaces, and leveraging XML tools while making RDF more human-readable.
Karma: Tools for Publishing Cultural Heritage Data in the Linked Open Data Cloud (Pedro Szekely)
Tools to convert data from databases to Linked Data. This presentation shows how Karma (isi.edu/integration/karma) is used to publish data from cultural heritage databases to the Linked Data cloud. Karma supports conversion of data to RDF according to user-selected ontologies and linking to other datasets such as dbpedia.org.
The document discusses publishing cultural heritage data as Linked Open Data to make it more accessible and useful online. It describes how currently, cultural heritage information on websites is readable by humans but not understandable by computers. The authors propose publishing this data as Linked Open Data using Resource Description Framework (RDF) standards so it can be interconnected across different providers and used to build innovative applications. They outline some potential applications like an enhanced website about artist John Singer Sargent that links to related articles and artworks, as well as a virtual museum connecting different cultural institutions. The overall goals are to contribute data, leverage existing data, and lead the way in this area through their work with the Smithsonian American Art Museum.
Linked Data and Tools
1. Linked Data and Tools
Pedro Szekely
USC/Information Sciences Institute
pszekely@isi.edu, http://isi.edu/~szekely
September 2014
CC-By 2.0
2. Outline
• Introduction to linked open data
• RDF: the Resource Description Framework
• Tools to convert data to RDF
• Tools for linking/reconciliation/resolution
• Storing and maintaining the data
• Applications
Pedro Szekely CC-By 2.0 2
7. Problem
web pages are machine processable,
but not machine understandable
impractical for building applications using the data
Pedro Szekely CC-By 2.0 7
9. What Is Linked Data?
A method of publishing structured data
so that it can be interlinked
and become more useful
Builds upon standard Web technologies
such as HTTP and URIs
to share information
in a way that can be read automatically by computers
from Wikipedia
Pedro Szekely CC-By 2.0 9
10. “Linked” Open Data
[Diagram: Crystal Bridges Museum of American Art, Dallas Museum of Art, Indianapolis Museum of Art, National Portrait Gallery, The Metropolitan Museum of Art, Smithsonian American Art Museum]
Pedro Szekely CC-By 2.0 10
11. “Linked” Open Data
[Same diagram: Crystal Bridges Museum of American Art, Dallas Museum of Art, Indianapolis Museum of Art, National Portrait Gallery, The Metropolitan Museum of Art, Smithsonian American Art Museum]
… data is public! … in a common format! ✔
… but we only have islands of data! ✖
Pedro Szekely CC-By 2.0 11
13. Linked Data Principles
• Use URIs as names for things
• Use HTTP URIs so that people
can look up those names
• When someone looks up a URI,
provide useful information,
using the standards (RDF,
SPARQL)
• Include links to other URIs so that they can discover more things
http://youtu.be/OM6XIICm_qo
http://www.w3.org/DesignIssues/LinkedData.html
Pedro Szekely CC-By 2.0 13
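To make the four principles above concrete, here is a minimal Python sketch using the rdflib library: it dereferences an HTTP URI, parses the RDF returned, and walks the triples whose links could be followed next. The DBpedia URI is an illustrative assumption, not from the slides.

```python
# Minimal sketch of principles 2-4 with rdflib (pip install rdflib).
# The DBpedia URI is an illustrative example only.
from rdflib import Graph, URIRef

uri = URIRef("http://dbpedia.org/resource/Mona_Lisa")

g = Graph()
g.parse(uri)  # HTTP GET with content negotiation; parses the RDF returned

# Principle 4: the returned triples contain other URIs we could dereference next
for subj, pred, obj in g.triples((uri, None, None)):
    print(pred, obj)
```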
14. Pedro Szekely
Principle 1
Use URIs as names for things
Principle 2
Use HTTP URIs so that people can look up those names
CC-By 2.0 14
24. Pedro Szekely
http://szekelys.com/diego
Principle 3
When someone looks up a URI, provide
useful information, using the standards
(RDF*, SPARQL) CC-By 2.0 24
25. Pedro Szekely
Principle 4
Include links to other URIs so that they
can discover more things
CC-By 2.0 25
32. Resource Description Framework
Intended for representing metadata about Web resources,
such as the title, author, and modification date
of a Web document
… also be used to represent information about
things that can be identified on the Web,
even when they cannot be directly retrieved on the Web
Pedro Szekely CC-By 2.0 32
33. Represent Resources Using URIs
That guy has first name “Pedro”
http://szekelys.com/family#pedro
“Pedro”
http://xmlns.com/foaf/0.1/firstName
Pedro Szekely CC-By 2.0 33
34. Represent Information as Triples
Subject: http://szekelys.com/family#pedro (the resource being described)
Predicate: http://xmlns.com/foaf/0.1/firstName (a property of the resource)
Object: “Pedro” (the value of the property)
Pedro Szekely CC-By 2.0 34
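As a sketch of how this triple looks in code, the same subject, predicate, and object can be built with rdflib, mirroring the URIs on the slide:

```python
# The slide's triple in rdflib: URIs for subject and predicate,
# a literal string for the object.
from rdflib import Graph, URIRef, Literal

g = Graph()
g.add((
    URIRef("http://szekelys.com/family#pedro"),     # subject: the resource described
    URIRef("http://xmlns.com/foaf/0.1/firstName"),  # predicate: a property of it
    Literal("Pedro"),                               # object: the property's value
))
print(len(g))  # 1 triple
```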
37. RDF Graphs
[Graph diagram:]
http://szekelys.com/family#pedro  rdf:type  foaf:Person (kinds of things)
http://szekelys.com/family#pedro  foaf:firstName  “Pedro” (a literal)
http://szekelys.com/family#pedro  foaf:homepage  http://isi.edu/~szekely
URIs name real-world objects; predicates are properties of things.
Pedro Szekely CC-By 2.0 37
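The small graph on this slide can also be serialized to Turtle; a sketch with rdflib, using its built-in RDF and FOAF namespaces:

```python
# Build the slide's graph and print it in Turtle syntax.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

pedro = URIRef("http://szekelys.com/family#pedro")

g = Graph()
g.bind("foaf", FOAF)
g.add((pedro, RDF.type, FOAF.Person))
g.add((pedro, FOAF.firstName, Literal("Pedro")))
g.add((pedro, FOAF.homepage, URIRef("http://isi.edu/~szekely")))

print(g.serialize(format="turtle"))
```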
41. Steps to Create Linked Open Data
• Select ontologies
… that define classes and properties for our data
• Convert data to RDF
… from the museum database to the ontologies
• Identify links to other Linked Data datasets
… to other museums and Linked Data hubs
Pedro Szekely CC-By 2.0 41
42. • Select ontologies
… that define classes and properties for our data
CIDOC CRM
http://www.cidoc-crm.org/
Pedro Szekely CC-By 2.0 42
43. • Select ontologies
… that define classes and properties for our data
• Convert data to RDF
… from the museum database to the ontologies
Pedro Szekely CC-By 2.0 43
44. RDF Mapping Tools
Tool        | Shortcomings                                | Benefits
custom code | labor intensive, error prone                | flexible
R2RML       | difficult to learn, only for SQL databases  | W3C standard, good documentation, multiple vendors
RDF Refine  | only for tabular data                       | graphical user interface, support for reconciliation, open source
Karma       | semi-automatic                              | graphical user interface, supports tabular data, XML and JSON, multiple export formats, R2RML compatible, open source
Pedro Szekely CC-By 2.0 44
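For contrast with the tools in the table, here is a rough sketch of the "custom code" row: a hand-written Python conversion from a tabular file to RDF. The input file, column names, and namespace are hypothetical; a real museum mapping would target CIDOC CRM classes and properties instead.

```python
# Hand-written tabular-to-RDF conversion: the flexible but labor-intensive
# "custom code" approach. File name, columns, and namespace are hypothetical.
import csv
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/museum/")  # hypothetical ontology namespace

g = Graph()
with open("artists.csv", newline="") as f:    # hypothetical input with id,name columns
    for row in csv.DictReader(f):
        artist = URIRef(EX["artist/" + row["id"]])
        g.add((artist, RDF.type, EX.Artist))
        g.add((artist, RDFS.label, Literal(row["name"])))

g.serialize(destination="artists.ttl", format="turtle")
```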
53. Linking/Reconciliation Tools
Tool        | Shortcomings                                        | Benefits
custom code | very difficult                                      | tuned to the data
SILK, LIMES | experimental, poor support                          | work with RDF, efficient, relatively easy to use
RDF Refine  | requires implementing a new reconciliation service  | integrated with RDF conversion, user interface for curation
Karma       | under development                                   |
Pedro Szekely CC-By 2.0 53
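As a sketch of what the linking step produces: owl:sameAs triples connecting local URIs to an external dataset. Here the match is a naive exact-label comparison; tools like SILK and LIMES apply configurable similarity metrics instead. All URIs below are illustrative assumptions.

```python
# Toy reconciliation: link local artist URIs to DBpedia by exact label match,
# emitting owl:sameAs triples. Real linkers use tunable similarity measures.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

local_artists = {
    "Mary Cassatt": URIRef("http://example.org/museum/artist/123"),
}
external_artists = {
    "Mary Cassatt": URIRef("http://dbpedia.org/resource/Mary_Cassatt"),
}

links = Graph()
for label, uri in local_artists.items():
    if label in external_artists:            # exact-match heuristic only
        links.add((uri, OWL.sameAs, external_artists[label]))

print(links.serialize(format="turtle"))
```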
57. Storage Options
Technology              | Shortcomings                     | Benefits
SPARQL endpoint         | low reliability, esoteric, slow  | sophisticated query language
RDF dump                | no query capability, esoteric    | flexibility: clients can download and use in applications, easy to publish
JSON-LD + ElasticSearch | restricted query language        | very high performance, mainstream technology, easy to publish
Pedro Szekely CC-By 2.0 57
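A minimal sketch of using the first storage option: querying a public SPARQL endpoint with the SPARQLWrapper Python library. The DBpedia endpoint and the query are illustrative examples, not from the slides.

```python
# Query a public SPARQL endpoint and print the result bindings.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")  # example endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?artist WHERE {
        ?artist a <http://dbpedia.org/ontology/Artist> .
    }
    LIMIT 5
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["artist"]["value"])
```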
61. We have expanded the reach of linked data within the BBC to more audience-facing products and presented our ambitions to use linked data as glue for the plethora of content the BBC produces.
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-new-ontologies-website
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
http://www.bbc.co.uk/blogs/internet/posts/Opening-up-the-BBCs-Linked-Data
Pedro Szekely CC-By 2.0 61