Is linked data something for me?

Christophe Guéret, Clément Levallois
eHumanities group meeting, November 22, 2012

1/

Get ready!
Goal of today

Learn about Linked Data

See if that is something interesting for your activities

2/

Hands-on tutorial
Make groups, one per table

Pick a famous person of your choice per group

Grab the material at http://bit.ly/ehg_tutorial or
pick up a USB stick

3/

Big data, but how to get it?
We can't always gather all the information manually

4/

Data scattered across different information systems

5/

Data in different formats

6/

What if we could?
If all data were “readable”, connections between
datasets could be made. We would simply know
more than we do today.

“Linked data” is an attempt to do that

7/

Why is it so hard?
Machines cannot read the text and extract the data

What is the name of that person?

8/

Ouch!
You just faced the same problem as machines:
Can't read the document and extract the data

Linked Data is a solution to this problem

Note: in the following we take the example of data “buried” in
webpages (html documents), but the same logic applies to other
kinds of docs (csv files, databases, your collection of pictures…)

9/

Use case for the hands-on

10/

What we will do...
Take the webpage of a researcher (one page per
group!)

Explain why the data in this page is “buried”

Solve the issue by introducing some linked data
sweetness in the webpage

Show what we gained: now, we can connect the
researchers!
11/

Template 1
The name is in the title
City is ambiguous

12/

Template 2
The name is not visible on the page
City is ambiguous

13/

Template 3
The name is in the description
City is ambiguous

14/

Hands-on: check out the templates
Open the templates in a web browser and look at
their HTML source code

15/

Hands-on: check out the templates
Change “William Smith” to a name of your choice
(one name per group)
Change and pick another name!

16/

First part of the hands-on

17/

In what sense do we mean that the name of this
researcher is buried in this web page?
Software reading this page has no way to tell:
Is there a name on this page?
If so, what is this name?
What does this name represent? What does it relate to?

But wait, my Internet browser can read HTML pages,
why can’t it figure out the name of the researcher?
Because the HTML code says how to display the page,
but says nothing about what the content means!

18/

Two roads from there…

We could design software that understands English
This is the approach of natural language processing,
statistics, etc.

We can add extra code that tells the software directly
what the data means
This is the linked data approach! This extra code in HTML
pages is called “RDFa”

19/

Annotate the data
We use a VOCABULARY for these annotations
foaf:name

20/

Wait! What is that “foaf:name” ?
It is a term from a vocabulary
foaf:name comes from the FOAF vocabulary and is used
to annotate the name of a person

Key concept!
Vocabulary = set of unambiguous, consensual
terms used to annotate pages with data

Vocabularies are
An agreement between data publishers and consumers
Generally focused on particular topics

21/

Annotate the page with the data

22/

Hands-on: annotate with foaf:name
Add the “foaf:name” annotation to the three
templates

Step 1: declare the vocabulary FOAF
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">

Step 2: annotate the data
<span property="foaf:name">William Smith</span>
Template 2 does not display the name, so we use a meta element:
<meta property="foaf:name" content="William Smith"/>

23/

Hands-on: extract annotations
Use the RDFa extractor at http://bit.ly/RDFaParser
to get the annotations from the three templates

Command line tool:
java -jar RDFaParser-0.0.6.jar template1.html

All three return the same result!
24/

Bingo!
We get exactly the same result for the three
templates
foaf:name = William Smith
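
To make the "same result for the three templates" point concrete, here is a minimal Python sketch of what an extractor does with the `property` annotations. It is not the RDFaParser tool from the tutorial, and the three HTML strings are simplified stand-ins for the templates (assumptions, the real files are not reproduced here). It only handles the two patterns used so far: text inside an annotated element, and a `meta` element with a `content` attribute.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from `property` attributes.

    Simplified on purpose: no prefix resolution, no subjects, no nesting.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open_property = None  # property whose text we are currently reading
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # <meta property="..." content="..."/>: the value is in the attribute
            self.pairs.append((prop, attributes["content"]))
        else:
            # <span property="...">value</span>: the value is the text content
            self._open_property = prop
            self._text = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_property is not None:
            self.pairs.append((self._open_property, "".join(self._text).strip()))
            self._open_property = None


def extract_annotations(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical, heavily shortened versions of the three templates.
TEMPLATES = {
    "template1": '<html><head><title>William Smith</title></head>'
                 '<body><h1 property="foaf:name">William Smith</h1></body></html>',
    "template2": '<html><head><meta property="foaf:name" content="William Smith"/></head>'
                 '<body><p>Researcher based in Durham.</p></body></html>',
    "template3": '<html><body><p>I am <span property="foaf:name">William Smith</span>, '
                 'a researcher in Durham.</p></body></html>',
}

if __name__ == "__main__":
    for name, html in TEMPLATES.items():
        print(name, extract_annotations(html))
```

The pages look different to a human, but the annotation is the only thing the extractor looks at, so all three give the same pair.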

25/

What this should look like now
(here showing template 1)

26/

How to choose a vocabulary?
Vocabulary => consensus

Therefore, it is better to
Avoid obscure vocabularies nobody knows
Focus on well organised and maintained vocabularies

Why did we use FOAF?
Specialised for personal profiles and widely accepted
W3C support & recommended for use by EU members
http://joinup.ec.europa.eu/asset/core_person/description

27/

What vocabularies are available?
Many are well established: FOAF, SIOC, Dublin
Core, BIBO, …

Creating a vocabulary is doable, but beware:
New vocabularies won't necessarily gain adoption
You need to maintain the vocabulary
You need to host it on the Web

A vocabulary can borrow terms from other vocabularies.
28/

EU initiative
“Core Vocabularies” from the ISA programme
Combine existing terms and new ones

29/

Google/Bing/Yahoo/Yandex initiative
Vocabulary: Schema.org
Used by search engines to extract pages' data

30/

Facebook initiative
Vocabulary: Open Graph protocol
Used to put the “Like” buttons on pages

31/

How to use a vocabulary?
Look at the documentation, e.g.
http://xmlns.com/foaf/spec/

Map your concepts to terms from the vocabulary
Naam → foaf:name
Voornaam → foaf:firstName
Achternaam → foaf:lastName
Werklocatie → foaf:based_near
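
The mapping step can be sketched in a few lines of Python. The dictionary mirrors the four mappings above, and the FOAF namespace is the one declared earlier in the hands-on. The helper names `expand` and `annotate` and the sample record are made up for this illustration.

```python
# Namespace declared in the templates: xmlns:foaf="http://xmlns.com/foaf/0.1/"
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

# Local (Dutch) field names mapped to vocabulary terms.
FIELD_TO_TERM = {
    "Naam": "foaf:name",
    "Voornaam": "foaf:firstName",
    "Achternaam": "foaf:lastName",
    "Werklocatie": "foaf:based_near",
}


def expand(term):
    """Turn a prefixed term such as 'foaf:name' into its full URI."""
    prefix, local_name = term.split(":", 1)
    return PREFIXES[prefix] + local_name


def annotate(record):
    """Re-key a record with vocabulary URIs; unmapped fields are dropped."""
    return {
        expand(FIELD_TO_TERM[field]): value
        for field, value in record.items()
        if field in FIELD_TO_TERM
    }


if __name__ == "__main__":
    # Hypothetical record; "Telefoon" has no mapping and is left out.
    record = {"Naam": "William Smith", "Werklocatie": "Durham", "Telefoon": "n/a"}
    for uri, value in annotate(record).items():
        print(uri, "=", value)
```

Fields without a mapping are simply dropped here; in practice you would look for a suitable term in another vocabulary first.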

32/

Triples and subjects
Remember, we created this annotation
. foaf:name "William Smith"

But what entity has “William Smith” for a name?
<template1.html> foaf:name "William Smith"
Meaning: this document has the name “William Smith”

This is a “triple” made of a subject, a predicate and an object
Subject = <template1.html>
Predicate = foaf:name
Object = "William Smith"
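
As a small illustration, a triple can be modelled in Python as a three-field record. The class name `Triple` and the field name `obj` are choices made for this sketch, not part of any standard API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (here a plain text value)."""

    subject: str
    predicate: str
    obj: str

    def __str__(self):
        # Same notation as on the slide: <subject> predicate "object"
        return f'<{self.subject}> {self.predicate} "{self.obj}"'


name_triple = Triple("template1.html", "foaf:name", "William Smith")

if __name__ == "__main__":
    print(name_triple)
    print("Subject   =", name_triple.subject)
    print("Predicate =", name_triple.predicate)
    print("Object    =", name_triple.obj)
```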

33/

We did not declare a subject
The annotation says that this is a foaf:name but does not
define a subject → the page name is used by default
foaf:name

34/

Why does this matter?
Subjects can be used as objects to create links
foaf:knows foaf:name

Need a common subject to group annotations

foaf:name
William Smith

foaf:based_near
Durham

35/

Picking a resource
It needs to be stable, web-accessible and re-used

Consensus again, example:
Amsterdam: http://dbpedia.org/resource/Amsterdam
TBL: http://www.w3.org/People/Berners-Lee/card#i

Subjects like <C:/MyDirectory/templateX.html> are not valid
Web resources, so we need to change that

36/

Hands-on: set the subject
Step 1: decide on a resource for the person
http://example.org/william_smith
http://myurl.com/john_doe

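Step 2: add the resource with an “about” attribute in the
same span as the foaf:name
Example:
You had: <span property="foaf:name">William Smith</span>
It becomes: <span about="http://example.org/william_smith" property="foaf:name">William Smith</span>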
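
The effect of the “about” attribute can be sketched with a small Python extractor: without “about” the subject falls back to the page name, with “about” the subject is the resource you chose. This is a simplified illustration, not the tutorial's RDFaParser. It only looks at “about” on the same element as “property”, whereas real RDFa processing has more rules.

```python
from html.parser import HTMLParser


class TripleExtractor(HTMLParser):
    """Build (subject, predicate, object) triples from `property` annotations."""

    def __init__(self, page_name):
        super().__init__()
        self.page_name = page_name  # default subject when no `about` is given
        self.triples = []
        self._pending = None  # (subject, predicate) waiting for its text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" not in attributes:
            return
        subject = attributes.get("about", self.page_name)
        self._pending = (subject, attributes["property"])
        self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            subject, predicate = self._pending
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._pending = None


def extract_triples(html, page_name):
    parser = TripleExtractor(page_name)
    parser.feed(html)
    parser.close()
    return parser.triples


BEFORE = '<span property="foaf:name">William Smith</span>'
AFTER = ('<span about="http://example.org/william_smith" '
         'property="foaf:name">William Smith</span>')

if __name__ == "__main__":
    print(extract_triples(BEFORE, "template1.html"))
    print(extract_triples(AFTER, "template1.html"))
```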


37/

5-star Linked Data
Rules (see http://5stardata.info/ ):
Resources are valid URIs
Machine-readable data is associated with the resource
The data contains links to other resources
Example http://dbpedia.org/resource/Amsterdam

38/

Great! We're done now!
We added this structured piece of data to all the
templates:
<http://example.org/william_smith> foaf:name "William Smith"

This data can be extracted by software

We can build an application that fetches people's
names, but there are still no links between them :-/

39/

A note on the new code
All the annotated templates have their name
suffixed with “_with_name_and_subject”

40/

Second part of the hands-on

Create some links

41/

Creating links
Links are used to connect two resources

Example: William Smith knows Tim Berners-Lee
<http://example.org/william_smith> foaf:knows
<http://www.w3.org/People/Berners-Lee/card#i>

Two usages:
Create (social) networks by connecting resources
Disambiguate text by pointing to the exact resource
42/

Hands-on: getting social
Step 1: ask 3 other groups in this workshop for their subject
(remember, a subject is:


Step 2: use the 3 subjects you got to annotate the links
Example:

I know John Doe, and Noam Chomsky, and also Sally Wyatt
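
One possible way to annotate such a sentence is sketched below. The assumption is that each name is wrapped in a link carrying rel="foaf:knows" and pointing to the other group's subject, which is standard RDFa usage but not necessarily the exact markup used in the workshop material. The URIs for Noam Chomsky and Sally Wyatt are invented placeholders. The Python part shows which link triples a simple extractor would read from it.

```python
from html.parser import HTMLParser


class KnowsLinkExtractor(HTMLParser):
    """Collect (subject, predicate, target) triples from <a rel=... href=...>."""

    def __init__(self, page_name):
        super().__init__()
        self.subject = page_name  # replaced when an `about` attribute is seen
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            # Simplification: the most recent `about` becomes the subject
            self.subject = attributes["about"]
        if tag == "a" and "rel" in attributes and "href" in attributes:
            self.triples.append((self.subject, attributes["rel"], attributes["href"]))


def extract_links(html, page_name):
    parser = KnowsLinkExtractor(page_name)
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical annotated sentence; two of the three URIs are placeholders.
SENTENCE = (
    '<p about="http://example.org/william_smith">I know '
    '<a rel="foaf:knows" href="http://myurl.com/john_doe">John Doe</a>, and '
    '<a rel="foaf:knows" href="http://example.org/noam_chomsky">Noam Chomsky</a>, '
    'and also '
    '<a rel="foaf:knows" href="http://example.org/sally_wyatt">Sally Wyatt</a></p>'
)

if __name__ == "__main__":
    for triple in extract_links(SENTENCE, "template1.html"):
        print(triple)
```

Note that the objects are now resources (URIs) rather than text, which is what makes them links.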

43/

Let's make some links

44/

Remember, there are two Durhams
One in the US, one in the UK, of similar importance
Which one is the “Durham” on the profile?

http://sws.geonames.org/4464368 http://sws.geonames.org/2650628

45/

Finding a resource on Geonames
Search by name, follow the RDF link, strip out the
“/about.rdf” part
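
The last step (stripping the “/about.rdf” part) is easy to get wrong by hand, so here is a minimal Python sketch. The function name `geonames_resource` is made up for this example.

```python
def geonames_resource(rdf_link):
    """Strip the trailing '/about.rdf' from a Geonames RDF link.

    Links that do not end with '/about.rdf' are returned unchanged.
    """
    suffix = "/about.rdf"
    if rdf_link.endswith(suffix):
        return rdf_link[: -len(suffix)]
    return rdf_link


if __name__ == "__main__":
    print(geonames_resource("http://sws.geonames.org/2650628/about.rdf"))
```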

46/

Hands-on: disambiguate Durham
Annotate “Durham” with a link to the exact
resource

Step 1: decide on which Durham to use

Step 2: annotate Durham with the link
Durham
47/

Hands-on: extract annotations
Use the RDFa extractor at http://bit.ly/RDFaParser
to get the annotations from the three templates

Command line tool:

All the three return the same result!
48/

Hands-on: extract a network!
Now use a small program from the Dropbox folder
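
The program itself is not reproduced here. As a rough idea of what extracting a network involves, the Python sketch below groups foaf:knows triples by subject. The function name and the sample triples are illustrative assumptions, not the workshop tool.

```python
from collections import defaultdict


def build_network(triples):
    """Map each subject to the set of resources it foaf:knows."""
    network = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "foaf:knows":
            network[subject].add(obj)
    return dict(network)


# Hypothetical triples as they might come out of several annotated pages.
SAMPLE_TRIPLES = [
    ("http://example.org/william_smith", "foaf:name", "William Smith"),
    ("http://example.org/william_smith", "foaf:knows", "http://myurl.com/john_doe"),
    ("http://example.org/william_smith", "foaf:knows",
     "http://www.w3.org/People/Berners-Lee/card#i"),
    ("http://myurl.com/john_doe", "foaf:knows", "http://example.org/william_smith"),
]

if __name__ == "__main__":
    for person, acquaintances in build_network(SAMPLE_TRIPLES).items():
        print(person, "knows", sorted(acquaintances))
```

Because every group used shared subjects, triples from different pages about the same person end up under the same key.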

49/

That's all for now!

(but there is more to discover: ontologies, reasoning, SPARQL, ...)

50/
