Linked Open Data
SIKS course on Data Science

May 20, 2016 Vught.

Laura Hollink
Why do we create and use Linked Open Data?
Example questions from
the humanities and
social sciences
How did the debate about
the financial crisis in
Greece develop?
Searching the proceedings of the EU Parliament
"Greece" in the plenary meetings of the European Parliament
[Figure: bar chart. x-axis: year (1999-2013); y-axis: number of mentions (0-200).]
Searching through newspaper archives
Mentions of “Griekenland” (Greece) in the Dutch newspaper De Telegraaf.
Search volumes on a search engine
Query = “Greece”
http://www.google.com/trends
We need access to data. Analysing the data
gives us some useful insight.
But to answer the question properly
we would need to combine sources
and run more complex queries.
Why do we create and use Linked Open Data?
Example question 2 

Which political debate in the
post-war period has attracted
most media attention?
“De Indonesische Quaestie” (the Indonesian Question)
To answer this question we need to
go through all newspaper articles
about all political debates.
-> we need access to combined
data sources, we need
structured queries.
Why do we create and use Linked Open Data?
Example question 3
What are the differences
between different media?

Example question 4
Has the coverage changed
over time?
Research goals and research questions
Our goal is to build an infrastructure to answer these kinds of questions.

1. How do we automatically link heterogeneous datasets?

2. How do we interpret links between datasets of different quality and certainty?

3. What can we conclude from usage statistics on these datasets?

4. Can we design interfaces that allow scholars to study the datasets

• including the links between them?

• while assessing the reliability of the findings?
Data Science - Big Data - Linked Open Data
Table of Contents
1. What is Linked Open Data (LOD)
2. Creating LOD
   1. How to discover links
   2. How to represent links on the Web
   3. How to evaluate links
3. Access to LOD (from both the server and the client perspective)
What is Linked Open Data?
A method of publishing structured data on the Web
in such a way that it can be linked and queried
by computers as well as humans.
The Web of Documents
• Documents identified by URIs (html, pdf, images, movies, etc.)
• with structured information for humans (tables, headers) and
• with hyperlinks between them
• The data is not machine readable, meant for humans
• structure is implicit (what do the columns of a table mean?)
• links are not typed (what is the relation between two documents?)
The Web of Data
• Everything identified by URIs (not just documents, but also classes, instances, relations/links)
• The data is machine readable:
  • in formal languages (RDF, RDFS, OWL, SKOS)
  • which enable machines to do reasoning, i.e. infer new statements from inserted statements.
Compared to a database table…
As a table:
Thing      Type  Population  Airport
Amsterdam  City  1364422     Schiphol
…          …     …           …
As a graph:
Amsterdam  is a            City
Amsterdam  has population  “1364422”
Amsterdam  has airport     Schiphol
Differences:
• Statements can be distributed over the web
• Non-unique naming assumption
• Open World assumption
• Everyone can say anything about anything
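The step from a table row to graph statements can be sketched in a few lines of Python. This is only an illustration: the function names and the second data source are made up, and plain strings stand in for URIs.

```python
# One table row, rewritten as subject-predicate-object statements.
row = {"Thing": "Amsterdam", "Type": "City", "Population": 1364422, "Airport": "Schiphol"}


def row_to_triples(row, key="Thing"):
    """Turn every non-key column of a row into one (subject, predicate, object) triple."""
    subject = row[key]
    return {(subject, column, value) for column, value in row.items() if column != key}


# Statements about the same thing may be published in different places on the web.
source_a = row_to_triples(row)
source_b = {("Amsterdam", "Country", "Netherlands")}  # hypothetical second publisher

# Merging them is a set union: no shared table schema is needed.
merged = source_a | source_b


def objects(graph, subject, predicate):
    """Return all known values. An empty result means 'unknown', not 'false' (open world)."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(objects(merged, "Amsterdam", "Airport"))
print(objects(merged, "Amsterdam", "Mayor"))
```

The second lookup returns an empty set: under the Open World assumption that only means nobody has stated a mayor yet.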
Examples of URIs on the Web of Data
• documents:
  • http://vu.nl/index.html
  • http://example.org/cities#Leuven
• real world objects (a book in the library, a person)
  • isbn://5031-4444-333
  • http://eyaloren.org/foaf.rdf#me
• concepts:
  • http://cyc.org/concept/Mammal
  • http://cyc.org/concept/Dog
  • http://www.w3.org/2006/03/wn/wn20/instances/synset-anniversary-noun-1
• relations:
  • http://purl.org/linkedpolitics/vocabulary/speaker
RDF (the basics)
• A W3C recommendation to describe resources on the Web of Data, called “Resource Description Framework”
• See https://www.w3.org/RDF/
• RDF data model: triples!
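RDF example in Turtle syntax:
@base <http://example.org/> .  # assumed base IRI, needed to resolve <bob#me>
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wd: <http://www.wikidata.org/entity/> .
<bob#me>
    a foaf:Person ;
    foaf:knows <alice#me> ;
    schema:birthDate "1990-07-04"^^xsd:date ;
    foaf:topic_interest wd:Q12418 .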
Vocabulary definition and reasoning with RDFS
[Figure: classes A, B and C at the ontology / vocabulary / schema level; resource r at the data level.]
IF
  B rdfs:subClassOf A
  C rdfs:subClassOf B
THEN
  C rdfs:subClassOf A

IF
  B rdfs:subClassOf A
  r rdf:type B
THEN
  r rdf:type A

Example:
<bob#me> rdf:type foaf:Person .
foaf:Person rdfs:subClassOf foaf:Agent .
Inferred:
<bob#me> a foaf:Agent .
Standard meaning
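The two subclass rules can be sketched as a small forward-chaining loop in Python. This is a minimal illustration over tuples of strings, not how a production RDFS reasoner is built; the function name is made up for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Apply the two subclass rules repeatedly until no new statement appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # C subClassOf B, B subClassOf A  =>  C subClassOf A
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # r type B, B subClassOf A  =>  r type A
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


data = {
    ("<bob#me>", RDF_TYPE, "foaf:Person"),
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
}
closure = rdfs_subclass_closure(data)
print(("<bob#me>", RDF_TYPE, "foaf:Agent") in closure)
```

The loop stops as soon as a pass over the data produces nothing new, so chains of any length (C under B under A) are handled as well.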
Vocabulary definition and reasoning with RDFS
IF
p rdfs:range R
A p B
THEN
B rdf:type R
<bob#me> foaf:knows <alice#me> .
foaf:knows rdfs:range foaf:Person .
<alice#me> rdf:type foaf:Person .
Standard meaning
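The range rule can be sketched in the same style. Again this is an illustrative Python function with a made-up name, operating on tuples of strings.

```python
def apply_range_rule(triples):
    """p rdfs:range R and A p B  =>  B rdf:type R."""
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    inferred = set(triples)
    for prop, cls in ranges:
        for s, p, o in triples:
            if p == prop:
                inferred.add((o, "rdf:type", cls))
    return inferred


data = {
    ("<bob#me>", "foaf:knows", "<alice#me>"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
}
result = apply_range_rule(data)
print(result - data)
```

Note that the rule infers rather than validates: whatever appears as the object of foaf:knows is concluded to be a foaf:Person, even if the statement was a mistake.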
SPARQL (the basics)
• A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language”
• See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/
Data: :JamesDean :playedIn :Giant .
Query: :JamesDean :playedIn ?what .
Answer: :Giant
Query: ?who :playedIn :Giant.
Answer: :JamesDean
Query: :JamesDean ?what :Giant.
Answer: :playedIn
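The idea behind these three queries, matching a triple pattern with variables against the data, can be sketched in Python. This covers only a single triple pattern; real SPARQL adds joins, filters, and much more, and the function name is made up for the example.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break
        else:
            results.append(binding)
    return results


data = [(":JamesDean", ":playedIn", ":Giant")]

print(match((":JamesDean", ":playedIn", "?what"), data))
print(match(("?who", ":playedIn", ":Giant"), data))
print(match((":JamesDean", "?what", ":Giant"), data))
```

A variable can stand in any position, including the predicate, which is why the third query returns :playedIn.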
Linked Open Data
A method of publishing on the Web of Data: openly
available, in RDF, with links to other datasets.
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Creating Linked Open Data
in the Talk of Europe project:
Discovering links, knowledge representation
The European Parliament as Linked Open Data
Laura Hollink, Centrum Wiskunde & Informatica, Amsterdam
Astrid van Aggelen, VU University Amsterdam
Martijn Kleppe, Erasmus University Rotterdam
Henri Beunders, Erasmus University Rotterdam
Jill Briggeman, Erasmus University Rotterdam
Max Kemman, University of Luxembourg
Talk of Europe goals
• To publish the entire plenary debates of the European
Parliament as Linked Open Data

• To improve access to the data

• To enable large scale analysis across time spans.

‣To residents of the European Union access to the proceedings
of the European parliament is a formal right.
A. van Aggelen, L. Hollink, M.
Kemman, M. Kleppe & H. Beunders.
The debates of the European
Parliament as Linked Open Data.
Semantic Web Journal. In press, 2016.
1. Data in RDF
14M RDF statements about the 30K
speeches in 23 languages by 3K
speakers in 1K session days that
were held in the EU parliament
between 1999 and 2014
2. Links to external datasets
Example 1: speeches that contain a certain keyword
Query: all speeches that contain the phrase “open data”
…. So let us go for open data, let us
go for utilisation of all the instruments
available to that end! …..
…. but there too governments are
encouraging the use of open data to
increase transparency, accountability
and citizen participation ….
…. We already have many open data
projects in the Member States and
local authorities…..
Example 2: speeches that contain a certain
keyword by date
"Slovenia" in the plenary meetings of the European Parliament
[Figure: bar chart. x-axis: year (1999-2013); y-axis: number of mentions (0-100).]
Example 2: speeches that contain a certain keyword
by date
[Figure: mentions of 'human rights' per year, 1999-2013. x-axis: dates; y-axis: frequency (0-800).]
Example 3: speeches that contain a certain keyword
by country
[Figure: mentions of 'human rights' by country. x-axis: AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK; y-axis: number of mentions (0-7000).]
Example 4: the number of speeches per EU
country
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Count the speeches per country of representation of the speaker.
SELECT ?c (COUNT(?c) AS ?count)
WHERE {
  ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech> .
  ?x <http://purl.org/linkedpolitics/vocabulary#speaker> ?p .
  ?p <http://purl.org/linkedpolitics/vocabulary#countryOfRepresentation> ?c
}
GROUP BY ?c
LIMIT 50
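What the query computes can be mirrored in plain Python: join speeches to speakers, speakers to countries, then group and count. The sample triples and the shortened names below are made up for illustration, and the sketch assumes one country per speaker.

```python
from collections import Counter

SPEECH = "Speech"  # stands in for the Speech class IRI used in the query

triples = [  # made-up sample data
    ("speech1", "rdf:type", SPEECH), ("speech1", "speaker", "mep1"),
    ("speech2", "rdf:type", SPEECH), ("speech2", "speaker", "mep2"),
    ("speech3", "rdf:type", SPEECH), ("speech3", "speaker", "mep1"),
    ("mep1", "countryOfRepresentation", "NL"),
    ("mep2", "countryOfRepresentation", "SI"),
]


def speeches_per_country(triples):
    """Count speeches per country, following speech -> speaker -> country."""
    speeches = {s for s, p, o in triples if p == "rdf:type" and o == SPEECH}
    country_of = {s: o for s, p, o in triples if p == "countryOfRepresentation"}
    counts = Counter()
    for s, p, o in triples:
        if p == "speaker" and s in speeches and o in country_of:
            counts[country_of[o]] += 1
    return counts


counts = speeches_per_country(triples)
print(counts)
```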
Example 5: background info about the MEPs
• MEPs that were not born in Europe.
Members of Parliament
Integrate data from
the EU parliament
with external datasets
Linking Members of Parliament to Wikipedia /
DBpedia
• String matching is the most important feature in the linking process.
• “nearly all [alignment systems] use a string similarity metric” [12]
• Stop-word removal and stemming are not helpful! Nor is using WordNet synonyms. [12]
[12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
http://www.dbpedia.org/resource/Judith_Sargentini
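A minimal sketch of name-based linking with a string similarity metric, using Python's standard difflib. The slides do not say which metric or threshold the project used; the ratio from SequenceMatcher, the 0.9 threshold, and the candidate list are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lowercase and treat underscores as spaces, as in DBpedia resource names."""
    return name.replace("_", " ").lower().strip()


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(name, candidates, threshold=0.9):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Sample local names of DBpedia resources (illustrative list).
candidates = ["Judith_Sargentini", "Martin_Schulz", "Catherine_Stihler"]

print(best_match("Judith Sargentini", candidates))
print(best_match("Someone Else", candidates))
```

Returning None below the threshold keeps the linker from forcing a link when no candidate is close enough.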
How to relate a speech to a speaker and party?
[Figure: lp:eu/plenary/2009-10-21/Speech_140 --lpv:speaker--> lp:EUmember_1023 --lpv:hasParty--> lp:EUParty/SomeParty]
Why is this not a good solution?
1. A person might be a member of more than one party (at different times)
2. Since there is no link between a speech and a party, queries for all speeches spoken by the members of a certain party become very complicated.
How to relate a speech to a speaker and party?
lp:eu/plenary/2009-10-21/Speech_140
    lpv:speaker lp:EUmember_1023 ;
    lpv:spokenAs lp:politicalFunction101 , lp:politicalFunction102 .
lp:EUmember_1023
    lpv:politicalFunction lp:politicalFunction101 , lp:politicalFunction102 .
lp:politicalFunction101
    rdf:type lpv:PoliticalFunction ;
    lpv:institution lp:EUParty/NI ;
    lpv:role lp:Role/member ;
    lpv:beginning "20071114"^^xsd:date ;
    lpv:end "20111126"^^xsd:date .
lp:politicalFunction102
    rdf:type lpv:PoliticalFunction ;
    lpv:institution lp:EUCommittee/Committee_on_Legal_Affairs ;
    lpv:role lp:Role/substitute ;
    lpv:beginning "20090716"^^xsd:date ;
    lpv:end "20111126"^^xsd:date .
Note: this is a common “design pattern” referred to as n-ary relations or relations as classes.
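The n-ary pattern can be sketched in Python by treating each political function as an object of its own, with the person, institution, role, and period as fields. The slides do not state how the lpv:spokenAs links were computed; selecting the functions held on the day of the speech is an assumption made for this sketch, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PoliticalFunction:
    """One membership: who held which role in which institution, and when."""
    person: str
    institution: str
    role: str
    beginning: date
    end: date


functions = [
    PoliticalFunction("lp:EUmember_1023", "lp:EUParty/NI", "lp:Role/member",
                      date(2007, 11, 14), date(2011, 11, 26)),
    PoliticalFunction("lp:EUmember_1023", "lp:EUCommittee/Committee_on_Legal_Affairs",
                      "lp:Role/substitute", date(2009, 7, 16), date(2011, 11, 26)),
]


def spoken_as(speaker, speech_date, functions):
    """Functions the speaker held on the day of the speech."""
    return [f for f in functions
            if f.person == speaker and f.beginning <= speech_date <= f.end]


held = spoken_as("lp:EUmember_1023", date(2009, 10, 21), functions)
print([f.institution for f in held])
```

Because the period lives on the function rather than on the person, a member who changes party simply gets a second function, and each speech points at the functions that were valid when it was given.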
Intermezzo: one-question Quiz
Reasoning on the Web of Data
Question: What can we conclude from this graph?

A. Stihler is a member of exactly 3 parties

B. Stihler is a member of at least 3 parties

C. Stihler is a member of at most 3 parties

D. None of the above

E. All of the above

F. Other, namely ….
<http://purl.org/linkedpolitics/EUmember_4545>
    foaf:name "Catherine Stihler" ;
    :memberOf <http://purl.org/linkedpolitics/EUParty/PES> ,
              <http://dbpedia.org/resource/Party_of_European_Socialists> ,
              <http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats> .
Creating Linked Open Data
in the PoliMedia project:
Discovering links, knowledge representation, evaluation
Linking government data
to news data
Which political debate in
the post-war period has
attracted most media
attention?

What are the differences
between different media?

Has the coverage changed
over time?
Transcriptions of all 9,294
meetings of the Dutch
parliament between
1945-1995, consisting of
1,208,903 speeches.
Roughly 1.8 Million news
bulletins between
1937-1984

(We only use 1945-1995)
Archives of hundreds of
newspapers: a very large number
of issues, or tens of millions of
articles, published between
1618 and 1995.

(We only use 1945-1995)
Transcriptions of all
meetings of the
European Parliament
between 1999 and
2014.
Links in PoliMedia
is about
• 3 Million links
Discovering links between politics and news
Step 2: generate links
[Figure: the linking pipeline]
Debates
  -> Detect topics in speeches -> Topics
  -> Detect Named Entities in speeches -> Named Entities
  -> Name of speaker
  -> Date of debate
Topics, Named Entities, Name of speaker, Date of debate
  -> Create queries -> Queries
  -> Search newspaper archive -> Candidate articles
  -> Rank candidate articles -> Links between speeches and articles
Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate.
Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
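The two intuitions can be sketched as a filter followed by a ranking step. This is not the project's actual linking engine: the Jaccard overlap score, reading "within a week" as seven days before or after the debate, the dictionary layout, and the sample data are all assumptions made for illustration.

```python
from datetime import date, timedelta


def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_candidates(speech, articles, window_days=7):
    """Filter by Intuition 1, then rank by Intuition 2 (highest overlap first)."""
    window = timedelta(days=window_days)
    ranked = []
    for article in articles:
        mentions_speaker = speech["speaker"].lower() in article["text"].lower()
        close_in_time = abs(article["date"] - speech["date"]) <= window
        if mentions_speaker and close_in_time:
            score = jaccard(speech["terms"], article["terms"])
            ranked.append((score, article["id"]))
    return sorted(ranked, reverse=True)


# Made-up sample data.
speech = {"speaker": "Jansen", "date": date(1949, 12, 8),
          "terms": {"Indonesia", "sovereignty", "The Hague"}}
articles = [
    {"id": "a1", "date": date(1949, 12, 9), "text": "Minister Jansen spoke about ...",
     "terms": {"Indonesia", "sovereignty"}},
    {"id": "a2", "date": date(1949, 12, 10), "text": "Jansen on the budget",
     "terms": {"budget"}},
    {"id": "a3", "date": date(1950, 3, 1), "text": "Jansen looks back",
     "terms": {"Indonesia"}},
    {"id": "a4", "date": date(1949, 12, 9), "text": "Weather report",
     "terms": {"Indonesia"}},
]

ranked = rank_candidates(speech, articles)
print(ranked)
```

Article a3 is dropped because it falls outside the time window and a4 because it does not mention the speaker; the remaining two are ordered by term overlap.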
Representation of links
[Figure: :speech123 and :newsArticle456 connected directly by an :isAbout edge.]
• Note: this is another example of the “design pattern” referred to as n-ary relations or relations as classes!
• It allows us to save provenance information about the statements we create.
[Figure: the link as a node :link001 that connects :speech123 and :newsArticle456, with link type :quotes, the concepts :concept1 and :concept2, and the provenance :creationDate 01-02-2013 and :madeBy :PoliMedia_Linking_Engine.]
Evaluation of links
1. Manually rating (a sample of) links

• relatively cheap and easy to interpret

• only precision, no recall
2. Comparison to a reference linkset

• precision and recall

• used in OAEI on the SEALS platform

• more expensive if a reference alignment has to be
created (but: crowd sourcing!)
3. End-to-end evaluation (a.k.a. evaluating an application
that uses the mappings)

• arguably the best method!

• need to have access to an application + users
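For method 2, precision and recall against a reference linkset reduce to set arithmetic. A minimal sketch with made-up link pairs:

```python
def precision_recall(found, reference):
    """Precision: share of found links that are correct.
    Recall: share of reference links that were found."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Made-up (speech, article) links.
found = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"), ("s3", "a5")}
reference = {("s1", "a1"), ("s2", "a3"), ("s3", "a5"), ("s2", "a4"), ("s4", "a6")}

precision, recall = precision_recall(found, reference)
print(precision, recall)
```

Manually rating a sample of links (method 1) only supplies the numerator and denominator for precision, which is why it says nothing about recall.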
Evaluation of links: beyond precision / recall
Generalized precision and Generalized recall
• Instead of a binary classification into correct/incorrect mappings, take into account how wrong a link is:
• where r(a) is the semantic distance between correspondence a and correspondence a’ in the reference alignment, A is the number of correspondences.
Laura Hollink, Mark van Assem, Shenghui Wang, Antoine Isaac, Guus Schreiber. Two Variations on Ontology Alignment Evaluation: Methodological Issues. ESWC 2008.
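A minimal sketch of one common way to relax precision and recall, not necessarily the exact formula of the cited paper. It assumes that every found link gets a closeness score in [0, 1] (1.0 for an exact match with the reference, lower values for near misses) and that these scores replace the 0/1 counts of the binary measures. The function name, the scores, and the sample links are made up.

```python
def generalized_precision_recall(found, reference, closeness):
    """found: produced links; reference: reference links;
    closeness: maps each found link to a score in [0, 1] (assumed scoring)."""
    total = sum(closeness[link] for link in found)
    precision = total / len(found) if found else 0.0
    recall = total / len(reference) if reference else 0.0
    return precision, recall


found = ["x", "y", "z"]
reference = ["x", "y2"]
closeness = {"x": 1.0,   # exact match
             "y": 0.5,   # near miss, e.g. linked to a broader concept
             "z": 0.0}   # unrelated

gen_precision, gen_recall = generalized_precision_recall(found, reference, closeness)
print(gen_precision, gen_recall)
```

With only scores of 0 and 1 this reduces to ordinary precision and recall.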
Evaluation of links in PoliMedia
Setting 1  Setting 2  Setting 3
0.48       0.62       0.8
How good are the links?
• We ask 2 raters to manually score pairs of newspaper articles and speeches.
• A pilot study showed that we needed more than a two-point scale.
• Inter-rater agreement: 0.5 -> acceptable, but not high.
• Precision: 80%
How many links did we miss?
• We ask the raters to manually search the KB archives for related articles.
• Recall: 62%
DEMO - PoliMedia search application
Online database:
“SPARQL endpoint”
• A service to query a knowledge
base using the SPARQL query
language.

“All speeches with more
than 60 associated news
items.”
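From the client side, querying a SPARQL endpoint is an HTTP request with the query as a parameter. The sketch below builds such a request with Python's standard library and parses a response in the SPARQL JSON results format. The endpoint URL is a placeholder, the sample response is made up, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
request = build_request(ENDPOINT, query)
print(request.full_url)

# Made-up response in the SPARQL JSON results format.
sample = ('{"head": {"vars": ["s"]}, "results": {"bindings": '
          '[{"s": {"type": "uri", "value": "http://example.org/speech1"}}]}}')
print(bindings(sample))
```

Against a real endpoint, the request would be sent with urllib.request.urlopen(request) and the response body passed to bindings().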
Access to Linked Open Data: how to serve and
how to consume Linked Open Data
Access to LOD: 1. download a data dump
From server logs we know the query
-some context of the requested URIs
-variable names (?)
Access to LOD 2: follow-your-nose
[Figure: the RDF graph that is uncovered step by step by following links]

lp:eu/plenary/2013-11-20/AgendaItem_6
    dc:title    "Award of the Sakharov Prize (formal sitting)."@en
    dc:hasPart  lp:eu/plenary/2013-11-20/Speech_103
    dc:hasPart  lp:eu/plenary/2013-11-20/Speech_104

lp:eu/plenary/2013-11-20/Speech_103
    lpv:spokenText     "...the fittest need to struggle for the survival of the weak.[...]"@en
    lpv:speaker        lp:Speaker_Malala_Yousafzai
    lpv:hasSubsequent  lp:eu/plenary/2013-11-20/Speech_104

lp:eu/plenary/2013-11-20/Speech_104
    lpv:spokenText  "...Ich glaube, das war ein außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en
    lpv:speaker     lp:Martin_Schulz

lp:Martin_Schulz
    owl:sameAs  http://dbpedia.org/resource/Martin_Schulz

http://dbpedia.org/resource/Martin_Schulz
    dbp:children  "2"
    (linked to dbc:Officiers_of_the_Légion_d'honneur)
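Follow-your-nose access can be sketched as a traversal in which every newly discovered URI is looked up in turn. The sketch below simulates dereferencing with an in-memory dictionary instead of HTTP. The `WEB` dictionary, the `dereference` helper, the `max_hops` parameter and the shortened URIs are illustrative assumptions; the triples are a subset of those in the figure.

```python
from collections import deque

# Simulated Web: each URI "dereferences" to the triples published about it.
# This stands in for real HTTP lookups; URIs are shortened for readability.
WEB = {
    "lp:AgendaItem_6": [
        ("lp:AgendaItem_6", "dc:hasPart", "lp:Speech_103"),
        ("lp:AgendaItem_6", "dc:hasPart", "lp:Speech_104"),
    ],
    "lp:Speech_103": [("lp:Speech_103", "lpv:speaker", "lp:Speaker_Malala_Yousafzai")],
    "lp:Speech_104": [("lp:Speech_104", "lpv:speaker", "lp:Martin_Schulz")],
    "lp:Martin_Schulz": [("lp:Martin_Schulz", "owl:sameAs", "dbr:Martin_Schulz")],
    "dbr:Martin_Schulz": [("dbr:Martin_Schulz", "dbp:children", '"2"')],
}


def dereference(uri):
    """Return the triples found at a URI (empty if nothing is published there)."""
    return WEB.get(uri, [])


def follow_your_nose(start, max_hops):
    """Collect triples by dereferencing URIs up to max_hops links away from start."""
    seen = {start}
    graph = []
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for triple in dereference(uri):
            graph.append(triple)
            obj = triple[2]
            is_literal = obj.startswith('"')
            # Only URIs can be followed, and only while the hop budget lasts.
            if not is_literal and obj not in seen and hops < max_hops:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return graph
```

With a larger `max_hops`, the traversal crosses the `owl:sameAs` link and picks up statements from the second (simulated) dataset, which is the point of the slide: the client discovers data it did not know about beforehand.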
From server logs we know the requested URI:

GET /Martin_Schulz HTTP/1.0
Accept: application/rdf+xml
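A client produces such a request by asking for an RDF serialisation through the Accept header. The sketch below only prepares the request with Python's standard library and does not send it. The full URI is an assumption about how lp:Martin_Schulz expands, and `build_rdf_request` is a hypothetical helper name.

```python
from urllib.request import Request


def build_rdf_request(uri):
    """Prepare (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


# Assumed expansion of lp:Martin_Schulz, for illustration only.
req = build_rdf_request("http://purl.org/linkedpolitics/Martin_Schulz")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; the server then sees only the requested URI and the headers, which is all the log can tell us.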
Count the agenda items in which at least one MEP from
France spoke out.
Access to LOD: 3. SPARQL
SELECT (COUNT(DISTINCT ?ai) AS ?count)
WHERE {
  ?ai rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/AgendaItem> .
  ?ai dcterms:hasPart ?speech .
  ?speech lpv:speaker ?speaker .
  ?speaker lpv:countryOfRepresentation ?country .
  ?country rdfs:label ?label .
  FILTER(?label = "France"@en)
}
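Why the query counts DISTINCT agenda items can be illustrated without a triple store. The sketch below performs the same join over a few invented records; all identifiers and the data are made up for illustration and do not come from the Talk of Europe dataset.

```python
# Toy data, invented for illustration: one dictionary per property in the query.
speech_agenda_item = {"s1": "ai1", "s2": "ai1", "s3": "ai2", "s4": "ai3"}
speech_speaker = {"s1": "mep_a", "s2": "mep_b", "s3": "mep_c", "s4": "mep_a"}
speaker_country = {"mep_a": "France", "mep_b": "France", "mep_c": "Germany"}


def count_agenda_items(country):
    """Return (matching speeches, distinct agenda items) for one country."""
    matches = [
        speech_agenda_item[speech]
        for speech, speaker in speech_speaker.items()
        if speaker_country.get(speaker) == country
    ]
    # Without DISTINCT every matching speech is counted; with it, each
    # agenda item is counted once however many French MEPs spoke in it.
    return len(matches), len(set(matches))


print(count_agenda_items("France"))
```

In this toy data two French MEPs speak in the same agenda item, so the plain count and the distinct count differ.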
From server logs we know the query
-some context of the requested URIs
-variable names (?)
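How a query can be recovered from a log entry can be sketched as follows. The SPARQL protocol sends a query over GET in the `query` parameter, so it appears URL-encoded in the request line. The `/sparql` path, the example query and both helper names are assumptions for illustration, and the regular expression is a rough approximation that would also match a `?name` inside a string literal.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit


def query_from_request_line(request_line):
    """Return the decoded SPARQL query from a logged 'GET <target> HTTP/x' line, or None."""
    parts = request_line.split()
    if len(parts) < 2:
        return None
    params = parse_qs(urlsplit(parts[1]).query)
    values = params.get("query")
    return values[0] if values else None


def variable_names(sparql):
    """Return the sorted variable names (?x or $x) used in a query string."""
    return sorted(set(re.findall(r"[?$]([A-Za-z_][A-Za-z0-9_]*)", sparql)))


# Build a synthetic log request line for an illustrative query.
example_query = "SELECT ?speech WHERE { ?speech lpv:speaker ?speaker }"
logged = "GET /sparql?" + urlencode({"query": example_query}) + " HTTP/1.1"

recovered = query_from_request_line(logged)
print(recovered)
print(variable_names(recovered))
```

Variable names are free text chosen by the query author, which is why they may hint at what the user was looking for.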
Access to LOD: 4. Linked Data Fragments
xxx.xxx.xxx.xxx - - [17/Oct/2014:07:43:02 +0000] "GET /2014/en?subject=&predicate=&object=dbpedia%3AAustin HTTP/1.1" 200 1309 "http://fragments.dbpedia.org/2014/en"
…
From server logs we know the triple patterns that were
requested
-some context of the requested URIs
-variable names (?)
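The requested triple pattern can be read directly from such a log line. A minimal sketch, assuming the `subject`, `predicate` and `object` parameter names shown in the log entry above; `triple_pattern` is a hypothetical helper, and an empty parameter is treated as an unbound position.

```python
from urllib.parse import parse_qs, urlsplit


def triple_pattern(request_line):
    """Return (subject, predicate, object) from a logged request; None marks an unbound position."""
    target = request_line.split()[1]
    # keep_blank_values keeps 'subject=' so that empty positions stay visible.
    params = parse_qs(urlsplit(target).query, keep_blank_values=True)

    def term(name):
        value = params.get(name, [""])[0]
        return value or None

    return term("subject"), term("predicate"), term("object")


line = "GET /2014/en?subject=&predicate=&object=dbpedia%3AAustin HTTP/1.1"
print(triple_pattern(line))
```

Each request carries a single pattern, so reconstructing the full query a client was evaluating means grouping consecutive requests from the same client.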
What do we know about usage of Linked Open
Data?
1. Yearly datasets of server logs released for research purposes, 2011-2016

Luczak-Roesch, Markus, Aljaloud, Saud, Berendt, Bettina and Hollink, Laura (2016)
USEWOD 2016 Research Dataset. doi:10.5258/SOTON/385344

2. Yearly workshops for researchers on Usage Data and the Web of Data, 2011-2016

Laura Hollink, Markus Luczak-Roesch, Bettina Berendt, et al.

http://usewod.org/
USEWOD2011
2016
Linked Open Data query log analysis?
Licensing + Anonymization:
replace all IPs with a
country code and an
identifier
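The anonymization step can be sketched as below. This is not the USEWOD procedure itself: the token format (`NL-1`), the `Anonymizer` class and the `country_of` lookup are assumptions for illustration, and a real country lookup would need a GeoIP database that is not shown here.

```python
import re

# An IPv4 address at the start of a log line (common log format).
IPV4 = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\b")


class Anonymizer:
    """Replace each IP address with '<country code>-<identifier>'."""

    def __init__(self, country_of):
        # country_of: callable mapping an IP string to a country code
        # (assumed to be supplied by the caller, e.g. from a GeoIP lookup).
        self.country_of = country_of
        self.ids = {}

    def token(self, ip):
        # The same IP always gets the same identifier, so sessions stay
        # traceable within the dataset without exposing the address.
        if ip not in self.ids:
            self.ids[ip] = len(self.ids) + 1
        return f"{self.country_of(ip)}-{self.ids[ip]}"

    def anonymize(self, log_line):
        return IPV4.sub(lambda m: self.token(m.group(1)), log_line, count=1)
```

Keeping a stable identifier per address preserves the ability to study sequences of requests by one client, which a plain deletion of the IP column would lose.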
What has been found so far?
• Efficient index generation [1]

• Caching [2]

• Auto-completion [3]

• Hardware scaling at peak times [4]

• Modularisation of data [4]

[1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. USEWOD2011.
[2] Lorey, J., & Naumann, F. Caching and prefetching strategies for SPARQL queries. USEWOD2013.
[3] Kramer, K., Dividino, R. Q., & Gröner, G. SPACE: SPARQL Index for Efficient Autocompletion. ISWC2013 (Posters & Demos).
[4] Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. EvoDyn2011.
[5] Rietveld, L., & Hoekstra, R. Man vs. Machine: Differences in SPARQL Queries. USEWOD2014.
[6] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015.
Issues:
• What is the difference between queries by machines and queries by humans? [5]

• What do repeated queries by tools mean? Are they bots?

• Much of the usage is invisible because the data is downloaded as dumps.
[6]
Reflection: to what extent can we now answer
these questions?
How did the debate about the
financial crisis in Greece
develop?

Which political event has
attracted most media
attention?

What are the differences
between different media?

Has the coverage changed
over time?
Yes, but:

• What is the influence of the selection of newspapers
available at the National Library?

• What was the quality of the digitisation process (OCR)?

• How good is our linking approach (based on
automatically detected entities and topics)?

➡ How to handle these uncertainties is one of our research
questions! We call this Tool Criticism.
Resources:
PoliMedia demo: http://polimedia.nl/
PoliMedia project video: https://youtu.be/u24oRCj7xrQ
Talk of Europe project: http://talkofeurope.eu/
Talk of Europe data: purl.org/linkedpolitics
Talk of Europe project video: https://youtu.be/GxA53gkCe0o
USEWOD workshop: http://usewod.org/
My website: http://homepages.cwi.nl/~hollink/
I’d be happy to answer your questions!

Linked Open Data

  • 1.
    Linked Open Data SIKScourse on Data Science May 20, 2016 Vught. Laura Hollink
  • 2.
    Why do wecreate and use Linked Open Data? Example questions from the humanities and social sciences How did the debate about the financial crisis in Greece develop?
  • 3.
    Searching the proceedingsof the EU Parliament "Greece" in the plenary meetings of the European Parliament Year Nr.ofmentions 050100150200 1999 2000 2001 2001 2002 2003 2004 2005 2006 2006 2007 2008 2009 2010 2010 2011 2012 2013
  • 4.
    Searching through newspaperarchives Mentions of “Griekenland” in the Dutch newspaper the Telegraaf.
  • 5.
    Search volumes ona search engine Query = “Greece” http://www.google.com/trends
  • 6.
    Search volumes ona search engine Query = “Greece” http://www.google.com/trends We need access to data. Analysing them gives us some useful insight. But to answer the question properly we would need to combine sources and do more complex queries.
  • 7.
    Why do wecreate and use Linked Open Data? Example question 2 Which political debate in the post-war period has attracted most media attention?
  • 8.
  • 9.
    “De Indonesische Quaestie" Toanswer this question we need to go through all newspaper articles about all political debates. -> we need access to combined data sources, we need structured queries.
  • 10.
    Why do wecreate and use Linked Open Data?
  • 11.
    Why do wecreate and use Linked Open Data? Example question 3 What are the differences between different media? Example question 4 Has the coverage changed over time?
  • 12.
    Research goals andresearch questions Our goal is to build an infrastructure to answer these kinds of questions. 1. How do we automatically link heterogeneous datasets? 2. How do we interpret links between datasets of different quality and certainty? 3. What can we conclude from usage statistics on these datasets? 4. Can we design interfaces that allow scholars to study the datasets • including the links between them? • while assessing the reliability of the findings?
  • 13.
    Research goals andresearch questions Our goal is to build an infrastructure to answer these kinds of questions. 1. How do we automatically link heterogeneous datasets? 2. How do we interpret links between datasets of different quality and certainty? 3. What can we conclude from usage statistics on these datasets? 4. Can we design interfaces that allow scholars to study the datasets • including the links between them? • while assessing the reliability of the findings? Data Science - Big Data - Linked Open Data
  • 14.
    Table of Contents 1.What is Linked Open Data (LOD) 2. Creating LOD 1. How to discover links 2. How to represent links on the Web 3. How to evaluate links 3. Access to LOD (from both the server and the client perspective)
  • 15.
    What is LinkedOpen Data?
  • 16.
    What is LinkedOpen Data?
  • 17.
    What is LinkedOpen Data? A method of publishing structured data on the Web in such a way that it can be linked and queried by computers as well as humans.
  • 18.
    The Web ofDocuments
  • 19.
    The Web ofDocuments • Documents  identified  by  URIs  (html,  pdf,  images,  movies,  etc.)   • with  structured  information  for  humans  (tables,  headers)  and   • with  hyperlinks  between  them   • The  data  is  not  machine  readable,  meant  for  humans   • structure  is  implicit  (what  do  the  columns  of  a  table  mean?)   • links  are  not  typed  (what  is  the  relation  between  two  documents?)  
  • 20.
  • 21.
    The Web ofData • Everything  identified  by  URIs  (not  just  documents,  but  also  classes,   instances,  relations/links)   • The  data  is  machine  readable:     • in  formal  languages  (RDF,  RDFS,  OWL,  SKOS)     • which  enable  machines  to  do  reasoning,  i.e.  infer  new  statements   from  inserted  statements.  
  • 22.
    Compared to adatabase table… Amsterdam has population “1364422” City Schiphol is a has airport
  • 23.
    Thing Type PopulationAirport Amsterdam City 1364422 Schiphol …. … …. … Compared to a database table… Amsterdam has population “1364422” City Schiphol is a has airport
  • 24.
    Differences: • Statements canbe distributed over the web • Non-unique naming assumption • Open World assumption • Everyone can say anything about anything Thing Type Population Airport Amsterdam City 1364422 Schiphol …. … …. … Compared to a database table… Amsterdam has population “1364422” City Schiphol is a has airport
  • 25.
    Examples of URIson the Web of Data • documents: • http://vu.nl/index.html • http://example.org/cities#Leuven • real world objects (a book in the library, a person) • isbn://5031-4444-333 • http://eyaloren.org/foaf.rdf#me • concepts: • http://cyc.org/concept/Mammal • http://cyc.org/concept/Dog • www.w3.org/2006/03/wn/wn20/instances/synset-anniversary-noun-1 • relations: • http://purl.org/linkedpolitics/vocabulary/speaker
  • 26.
    RDF (the basics) •A W3C recommendation to describe resources on the Web of Data called “Resource description Framework” • See https://www.w3.org/RDF/ • RDF data model: triples!
  • 27.
    RDF (the basics) •A W3C recommendation to describe resources on the Web of Data called “Resource description Framework” • See https://www.w3.org/RDF/ • RDF data model: triples!
  • 28.
    RDF (the basics) •A W3C recommendation to describe resources on the Web of Data called “Resource description Framework” • See https://www.w3.org/RDF/ • RDF data model: triples! RDF example in Turtle syntax: <bob#me> a foaf:Person ; foaf:knows <alice#me> ; schema:birthDate "1990-07-04"^^xsd:date ; foaf:topic_interest wd:Q12418 .
  • 29.
    Vocabulary definition andreasoning with RDFS B C r A data level ontology / vocabulary / schema level
  • 30.
    Vocabulary definition andreasoning with RDFS A B C IF B rdfs:subClassOf A C rdfs:subClassOf B THEN C rdfs:subClassOf A B C r A data level ontology / vocabulary / schema level
  • 31.
    Vocabulary definition andreasoning with RDFS A B A B C IF B rdfs:subClassOf A C rdfs:subClassOf B THEN C rdfs:subClassOf A IF C rdfs:subClassOf B r rdf:type C THEN r rdf:type B B C r A data level ontology / vocabulary / schema level
  • 32.
    Vocabulary definition andreasoning with RDFS A B A B C IF B rdfs:subClassOf A C rdfs:subClassOf B THEN C rdfs:subClassOf A IF B rdfs:subClassOf A r rdf:type B THEN r rdf:type A <bob#me> rdf:type foaf:Person . foaf:Person rdfs:subClassOf foaf:Agent .
  • 33.
    Vocabulary definition andreasoning with RDFS A B A B C IF B rdfs:subClassOf A C rdfs:subClassOf B THEN C rdfs:subClassOf A IF B rdfs:subClassOf A r rdf:type B THEN r rdf:type A <bob#me> rdf:type foaf:Person . foaf:Person rdfs:subClassOf foaf:Agent . <bob#me> a foaf:Agent .
  • 34.
    Vocabulary definition andreasoning with RDFS A B A B C IF B rdfs:subClassOf A C rdfs:subClassOf B THEN C rdfs:subClassOf A IF B rdfs:subClassOf A r rdf:type B THEN r rdf:type A <bob#me> rdf:type foaf:Person . foaf:Person rdfs:subClassOf foaf:Agent . <bob#me> a foaf:Agent . Standard meaning
  • 35.
    Vocabulary definition andreasoning with RDFS IF p rdfs:range R A p B THEN B rdf:type R <bob#me> foaf:knows <alice#me> . foaf:knows rdfs:range foaf:Person .
  • 36.
    Vocabulary definition andreasoning with RDFS IF p rdfs:range R A p B THEN B rdf:type R <bob#me> foaf:knows <alice#me> . foaf:knows rdfs:range foaf:Person . <alice#me> rdf:type foaf:Person .
  • 37.
    Vocabulary definition andreasoning with RDFS IF p rdfs:range R A p B THEN B rdf:type R <bob#me> foaf:knows <alice#me> . foaf:knows rdfs:range foaf:Person . <alice#me> rdf:type foaf:Person . Standard meaning
  • 38.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/
  • 39.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant .
  • 40.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant .
  • 41.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what .
  • 42.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant
  • 43.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant
  • 44.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant Query: ?who :playedIn :Giant.
  • 45.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant Query: ?who :playedIn :Giant. Answer: :JamesDean
  • 46.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant Query: ?who :playedIn :Giant. Answer: :JamesDean
  • 47.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant Query: ?who :playedIn :Giant. Answer: :JamesDean Query: :JamesDean ?what :Giant.
  • 48.
    SPARQL (the basics) •A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language” • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/ sparql11-query/ Data: :JamesDean :playedIn :Giant . Query: :JamesDean :playedIn ?what . Answer: :Giant Query: ?who :playedIn :Giant. Answer: :JamesDean Query: :JamesDean ?what :Giant. Answer: :playedIn
  • 49.
    Linked Open Data Amethod of publishing on the Web of Data: openly available, in RDF, with links to other datasets. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 50.
    Linked Open Data Amethod of publishing on the Web of Data: openly available, in RDF, with links to other datasets. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 51.
    Creating Linked OpenData in the Talk of Europe project: Discovering links, knowledge representation
  • 52.
    Creating Linked OpenData in the Talk of Europe project: Discovering links, knowledge representation
  • 53.
    The European Parliamentas Linked Open Data Laura Hollink Centrum Wiskunde & Informatica, Amsterdam Astrid van Aggelen VU University Amsterdam Martijn Kleppe Erasmus University Rotterdam Henri Beunders Erasmus University Rotterdam Jill Briggeman Erasmus University Rotterdam Max Kemman University of Luxembourg
  • 54.
    Talk of Europegoals • To publish the entire plenary debates of the European Parliament as Linked Open Data • To improve access to the data • To enable large scale analysis across time spans. ‣To residents of the European Union access to the proceedings of the European parliament is a formal right. A. van Aggelen, L. Hollink, M. Kemman, M. Kleppe & H. Beunders. The debates of the European Parliament as Linked Open Data. Semantic Web Journal. In press, 2016.
  • 55.
  • 56.
  • 57.
    1. Data inRDF 14M RDF statements about the 30K speeches in 23 languages by 3K speakers in 1K session days that were held in the EU parliament between 1999 and 2014
  • 58.
    2. Links toexternal datasets •
  • 59.
    2. Links toexternal datasets •
  • 60.
    2. Links toexternal datasets •
  • 61.
    Example 1: speechesthat contain a certain keyword Query: all speeches that contain the phrase “open data” …. So let us go for open data, let us go for utilisation of all the instruments available to that end! ….. …. but there too governments are encouraging the use of open data to increase transparency, accountability and citizen participation …. …. We already have many open data projects in the Member States and local authorities…..
  • 62.
    Example 2: speechesthat contain a certain keyword by date "Slovenia" in the plenary meetings of the European Parliament Year Nr.ofmentions 020406080100 1999 2000 2001 2003 2004 2005 2006 2007 2008 2010 2011 2012 2013
  • 63.
    Example 2: speechesthat contain a certain keyword by date "Slovenia" in the plenary meetings of the European Parliament Year Nr.ofmentions 020406080100 1999 2000 2001 2003 2004 2005 2006 2007 2008 2010 2011 2012 2013
  • 64.
    Example 2: speechesthat contain a certain keyword by date Mentions of 'human rights' dates Frequency 0200400600800 1999 2000 2001 2003 2004 2005 2006 2007 2009 2010 2011 2012 2013
  • 65.
    Example 3: speechesthat contain a certain keyword by country AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK Mentions of 'human rights' by country 01000200030004000500060007000
  • 66.
    Example 4: thenumber of speeches per EU country SELECT ?c (COUNT(?c) as ?count) WHERE { ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech>. ?x <http://purl.org/linkedpolitics/vocabulary#speaker> ?p. ?p <http://purl.org/linkedpolitics/vocabulary#countryOfRepresentation> ?c } GROUP BY ?c LIMIT 50
  • 67.
    Example 5: backgroundinfo about the MEPs • MEPs that were not born in Europe. Members of Parliament
  • 68.
    Example 5: backgroundinfo about the MEPs • MEPs that were not born in Europe. Members of Parliament
  • 69.
    Example 5: backgroundinfo about the MEPs • MEPs that were not born in Europe. Members of Parliament
  • 70.
    Example 5: backgroundinfo about the MEPs • MEPs that were not born in Europe. Members of Parliament
  • 71.
    Example 5: backgroundinfo about the MEPs • MEPs that were not born in Europe. Members of Parliament
  • 72.
    Example 5: backgroundinfo about the MEPs • MEPs that were not born in Europe. Members of Parliament Integrate data from the EU parliament with external datasets
  • 73.
    Linking Members ofParliament to Wikipedia / DBpedia
  • 74.
    Linking Members ofParliament to Wikipedia / DBpedia
  • 75.
    Linking Members ofParliament to Wikipedia / DBpedia
  • 76.
    Linking Members ofParliament to Wikipedia / DBpedia • String matching is the most important feature in the linking process. • “nearly all [alignment systems] use a string similarity metric” [12] • stopping and stemming is not helpful! Nor is using WordNet synonyms. [12] [12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013. http://www.dbpedia.org/resource/Judith_Sargentini
  • 77.
    Linking Members ofParliament to Wikipedia / DBpedia • String matching is the most important feature in the linking process. • “nearly all [alignment systems] use a string similarity metric” [12] • stopping and stemming is not helpful! Nor is using WordNet synonyms. [12] [12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013. http://www.dbpedia.org/resource/Judith_Sargentini
  • 78.
    How to relatea speech to a speaker and party? lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUParty/SomeParty lpv:hasParty
  • 79.
    How to relatea speech to a speaker and party? Why is this not a good solution? lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUParty/SomeParty lpv:hasParty
  • 80.
    How to relatea speech to a speaker and party? Why is this not a good solution? 1. A person might be a member of more than one party (at different times) lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUParty/SomeParty lpv:hasParty
  • 81.
    How to relatea speech to a speaker and party? Why is this not a good solution? 1. A person might be a member of more than one party (at different times) 2. Since there is no link between a speech and a party, queries for all speeches spoken by the members of a certain party become very complicated. lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:speaker lp:EUParty/SomeParty lpv:hasParty
  • 82.
    How to relatea speech to a speaker and party? "20111126"^ xsd:date "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitute lpv:political Function lpv:institution lpv:speaker
  • 83.
    How to relatea speech to a speaker and party? "20111126"^ xsd:date "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitute lpv:political Function lpv:institution lpv:speaker "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:speaker rdf:type
  • 84.
    How to relatea speech to a speaker and party? "20111126"^ xsd:date "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitute lpv:political Function lpv:institution lpv:speaker "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:speaker rdf:type "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:spokenAs lpv:speaker lpv:spokenAs rdf:type
  • 85.
    How to relatea speech to a speaker and party? "20111126"^ xsd:date "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitute lpv:political Function lpv:institution lpv:speaker "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:speaker rdf:type "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:spokenAs lpv:speaker lpv:spokenAs rdf:type Note: this is a common “design pattern” referred to as n-ary relations or relations as classes
  • 86.
    Intermezzo: one-question Quiz Reasoningon the Web of Data Question: What can we conclude from this graph? A. Stihler is a member of exactly 3 parties B. Stihler is a member of at least 3 parties C. Stihler is a member of at most 3 parties D. None of the above E. All of the above F. Other, namely …. http://purl.org/linkedpolitics/EUmember_4545 "Catherine Stihler"foaf:name http://purl.org/linkedpolitics/EUParty/PES http://dbpedia.org/resource/ Party_of_European_Socialists http://dbpedia.org/resource/ Progressive_Alliance_of_Socialists_and_Democrats :memberOf :memberOf :memberOf
  • 87.
    Creating Linked OpenData in the PoliMedia project: Discovering links, knowledge representation, evaluation
  • 88.
    Creating Linked OpenData in the PoliMedia project: Discovering links, knowledge representation, evaluation
  • 89.
  • 91.
    Which political debatein the post-war period has attracted most media attention? What are the differences between different media? Has the coverage changed over time?
  • 92.
    Transcriptions of all9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches. Roughly 1.8 Million news bulletins between 1937-1984 (We only use 1945-1995) Archives of hundreds of newspaper with tons of newspaper issues or 10’s of Millions of articles between 1618-1995. (We only use 1945-1995) Transcriptions of all meetings of the European Parliament between 1999 and 2014.
  • 94.
    Links in PoliMedia isabout • 3 Million links
  • 95.
    Discovering links betweenpolitics and news Detect topics in speeches Create queries Search newspaper archive Topics Named Entities Name of speaker Detect Named Entities in speeches Candidate articles Queries Rank candidate articles Links between speeches and articles Debates Date of debate
  • 96.
    Step 2: generatelinks Detect topics in speeches Create queries Search newspaper archive Topics Named Entities Name of speaker Detect Named Entities in speeches Candidate articles Queries Rank candidate articles Links between speeches and articles Debates Date of debate Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate
  • 97.
    Step 2: generatelinks Detect topics in speeches Create queries Search newspaper archive Topics Named Entities Name of speaker Detect Named Entities in speeches Candidate articles Queries Rank candidate articles Links between speeches and articles Debates Date of debate Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate Intuition 2: the more the article and the speech overlap in terms of topics and named entities, the more they are related.
  • 98.
  • 99.
    Representation of links
    • Simple statement: :speech123 :isAbout :newsArticle456
    • As an n-ary relation: the link becomes a resource of its own, :link001, which connects :speech123 and :newsArticle456 and carries the link type (:quotes), :madeBy (:PoliMedia_Linking_Engine), :creationDate (01-02-2013), and the concepts :concept1 and :concept2.
    • Note: this is another example of the “design pattern” referred to as n-ary relations or relations as classes!
    • It allows us to save provenance information about the statements we create.
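A small sketch of the n-ary pattern, written as plain subject-predicate-object tuples so that it runs without an RDF library. The slide names only `:madeBy` and `:creationDate`; the property names `:linkSource`, `:linkTarget` and `:linkType` are hypothetical placeholders for the unlabelled arrows in the diagram.

```python
def link_as_triples(link_id, speech, article, link_type, made_by, created):
    """Return the triples for one link modelled as a resource of its own."""
    return [
        (link_id, ":linkSource", speech),    # hypothetical property name
        (link_id, ":linkTarget", article),   # hypothetical property name
        (link_id, ":linkType", link_type),   # hypothetical property name
        (link_id, ":madeBy", made_by),       # provenance: who created the link
        (link_id, ":creationDate", f'"{created}"'),  # provenance: when
    ]


triples = link_as_triples(
    ":link001", ":speech123", ":newsArticle456",
    ":quotes", ":PoliMedia_Linking_Engine", "01-02-2013",
)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Because the link has its own identifier, further statements about it (a confidence score, for instance) can be added without touching the speech or the article.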
  • 101.
  • 102.
    Evaluation of links
    1. Manually rating (a sample of) links
       • relatively cheap and easy to interpret
       • only precision, no recall
    2. Comparison to a reference linkset
       • precision and recall
       • used in OAEI on the SEALS platform
       • more expensive if a reference alignment has to be created (but: crowdsourcing!)
    3. End-to-end evaluation (a.k.a. evaluating an application that uses the mappings)
       • arguably the best method!
       • need to have access to an application + users
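For method 2, precision and recall follow directly from comparing the set of generated links with the reference linkset. A minimal sketch, assuming each link is represented as a (speech, article) pair; the identifiers in any usage are made up.

```python
def precision_recall(found, reference):
    """Compare generated links with a reference linkset.

    Both arguments are iterables of hashable links, e.g. (speech, article) pairs.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall
```

Manual rating of a sample (method 1) only fills in the `found` side, which is why it yields precision but no recall.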
  • 105.
    Evaluation of links: beyond precision / recall
    Generalized precision and generalized recall
    • Instead of a binary classification into correct / incorrect mappings, take into account how wrong a link is.
    • r(a) is the semantic distance between correspondence a and correspondence a’ in the reference alignment; A is the number of correspondences.
    [Figure: nodes A, B and C and a relation r, spanning the data level and the ontology / vocabulary / schema level]
    Laura Hollink, Mark van Assem, Shenghui Wang, Antoine Isaac, Guus Schreiber. Two Variations on Ontology Alignment Evaluation: Methodological Issues. ESWC 2008.
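The formula itself is not preserved in this transcript. The sketch below shows one form that is consistent with the definitions given: it assumes that r(a) is scaled to the range 0 to 1, with 1 for an exact match with the reference and lower values for near misses, and that generalized precision is the sum of r(a) over the found correspondences divided by their number A. Both assumptions should be checked against the cited paper.

```python
def generalized_precision(scores):
    """Average the per-correspondence scores r(a) over all found correspondences.

    `scores` holds one value in [0, 1] per correspondence found by the system
    (assumed scaling: 1.0 = exact match, 0.0 = unrelated).
    """
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# Two exact matches, one near miss, one wrong link.
print(generalized_precision([1.0, 0.5, 0.0, 1.0]))
```

Under these assumptions, generalized recall would use the same numerator but divide by the number of correspondences in the reference alignment. With binary scores the measure reduces to ordinary precision.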
  • 107.
    Evaluation of links in PoliMedia
    How good are the links?
    • We ask 2 raters to manually score pairs of newspaper articles and speeches.
    • A pilot study showed that we needed more than a 2-point scale.
    • Inter-rater agreement: 0.5 -> acceptable, but not high.
    • Precision: 80%
    How many links did we miss?
    • We ask the raters to manually search the KB archives for related articles.
    • Recall: 62%
    [Chart: Setting 1: 0.48, Setting 2: 0.62, Setting 3: 0.8]
  • 110.
    DEMO - PoliMedia search application
  • 116.
    Online database: “SPARQL endpoint”
    • A service to query a knowledge base using the SPARQL query language.
    Example query: “All speeches with more than 60 associated news items.”
  • 117.
    Access to Linked Open Data: how to serve and how to consume Linked Open Data
  • 119.
    Access to LOD: 1. download a data dump
    From server logs we know the query
    - some context of the requested URIs
    - variable names (?)
  • 121.
    Access to LOD: 2. follow-your-nose
    lp:eu/plenary/2013-11-20/AgendaItem_6
      dc:title "Award of the Sakharov Prize (formal sitting)."@en
      dc:hasPart lp:eu/plenary/2013-11-20/Speech_103
      dc:hasPart lp:eu/plenary/2013-11-20/Speech_104
    lp:eu/plenary/2013-11-20/Speech_103
      lpv:spokenText "...the fittest need to struggle for the survival of the weak.[...]"@en
      lpv:speaker lp:Speaker_Malala_Yousafzai
      lpv:hasSubsequent lp:eu/plenary/2013-11-20/Speech_104
    lp:eu/plenary/2013-11-20/Speech_104
      lpv:spokenText "...Ich glaube, das war ein außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en
      lpv:speaker lp:Martin_Schulz
    lp:Martin_Schulz
      owl:sameAs http://dbpedia.org/resource/Martin_Schulz
    http://dbpedia.org/resource/Martin_Schulz
      dbp:children "2"
      (category) dbc:Officiers_of_the_Légion_d'honneur
    From server logs we know the requested URI:
      GET /Martin_Schulz HTTP/1.0
      Accept: application/rdf+xml
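Following your nose means dereferencing a URI found in the data and asking the server for RDF through the HTTP Accept header, which is the request visible in the server log above. A minimal sketch with the Python standard library; the function names `build_lookup` and `fetch_description` are made up for this example, and the actual fetch needs network access and a server that honours content negotiation.

```python
from urllib.request import Request, urlopen


def build_lookup(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request that asks for an RDF description of the URI."""
    return Request(uri, headers={"Accept": media_type})


def fetch_description(uri):
    """Dereference the URI and return the raw response body (needs network)."""
    with urlopen(build_lookup(uri), timeout=10) as response:
        return response.read()


request = build_lookup("http://dbpedia.org/resource/Martin_Schulz")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Parsing the returned RDF and repeating the lookup for each new URI it mentions is what turns single requests into a crawl.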
  • 129.
    Access to LOD: 3. SPARQL
    Count the agenda items in which at least one MEP from France spoke out.
    SELECT (COUNT(DISTINCT ?ai) AS ?count)
    WHERE {
      ?ai rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/AgendaItem> .
      ?ai dcterms:hasPart ?speech .
      ?speech lpv:speaker ?speaker .
      ?speaker lpv:countryOfRepresentation ?country .
      ?country rdfs:label ?label .
      FILTER(?label = "France"@en)
    }
  • 131.
    From server logs we know the query
    - some context of the requested URIs
    - variable names (?)
  • 135.
    Access to LOD: 4. Linked Data Fragments
    xxx.xxx.xxx.xxx - - [17/Oct/2014:07:43:02 +0000] "GET /2014/en?subject=&predicate=&object=dbpedia%3AAustin HTTP/1.1" 200 1309 "http://fragments.dbpedia.org/2014/en" …
    From server logs we know the triple patterns that were requested
    - some context of the requested URIs
    - variable names (?)
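The triple pattern can be read straight from the request target in a log line like the one above. A minimal sketch using only the standard library; the function name `triple_pattern` is made up, and treating an empty parameter as an unbound position (returned as `None`) is an assumption about how the log should be interpreted.

```python
from urllib.parse import parse_qs, urlsplit


def triple_pattern(request_target):
    """Extract (subject, predicate, object) from a fragment request target.

    Empty or missing parameters are returned as None (assumed to be unbound).
    """
    query = urlsplit(request_target).query
    params = parse_qs(query, keep_blank_values=True)
    return tuple(
        params.get(key, [""])[0] or None
        for key in ("subject", "predicate", "object")
    )


pattern = triple_pattern("/2014/en?subject=&predicate=&object=dbpedia%3AAustin")
print(pattern)
```

Unlike a SPARQL log, such a log shows only individual patterns, so the query a client was actually evaluating has to be inferred from sequences of requests.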
  • 137.
    What do we know about usage of Linked Open Data?
  • 139.
    Linked Open Data query log analysis? (USEWOD2011 to 2016)
    1. Yearly datasets of server logs released for research purposes, 2011-2016
       Luczak-Roesch, Markus, Aljaloud, Saud, Berendt, Bettina and Hollink, Laura (2016) USEWOD 2016 Research Dataset. doi:10.5258/SOTON/385344
    2. Yearly workshops for researchers on Usage Data and the Web of Data, 2011-2016
       Laura Hollink, Markus Luczak-Roesch, Bettina Berendt, et al. http://usewod.org/
    Licensing + anonymization: replace all IPs with a country code and an identifier
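The anonymization step can be sketched as follows. This is an illustration only, not the procedure used for the USEWOD datasets: the output format `CC-n`, the sequential numbering of identifiers, and the `country_of` lookup (which in practice would be backed by a geolocation database) are all assumptions.

```python
import re

# A leading IPv4 address, as found at the start of common access-log lines.
IP_PATTERN = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\b")


def make_anonymizer(country_of):
    """Return a function that replaces the leading IP of a log line.

    `country_of` is a caller-supplied lookup from IP string to country code.
    The same IP always maps to the same identifier within one anonymizer.
    """
    ids = {}

    def anonymize(log_line):
        match = IP_PATTERN.match(log_line)
        if not match:
            return log_line  # no leading IP: leave the line untouched
        ip = match.group(1)
        ident = ids.setdefault(ip, len(ids) + 1)
        return f"{country_of(ip)}-{ident}{log_line[match.end():]}"

    return anonymize
```

Keeping a stable identifier per IP preserves the ability to group requests by client, which analyses such as distinguishing humans from machines depend on.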
  • 141.
    What has been found so far?
    • Efficient index generation [1]
    • Caching [2]
    • Auto-completion [3]
    • Hardware scaling at peak times [4]
    • Modularisation of data [4]
    Issues:
    • What is the difference between queries by machines and humans? [5]
    • What is the meaning of repeated queries by tools? Bots?
    • A lot of the usage is invisible due to data dump download [6]
    [1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. USEWOD2011
    [2] Lorey, J., & Naumann, F. Caching and prefetching strategies for SPARQL queries. USEWOD2013
    [3] Kramer, K., Dividino, R. Q., & Gröner, G. SPACE: SPARQL Index for Efficient Autocompletion. ISWC2013 (Posters & Demos)
    [4] Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. EvoDyn2011
    [5] Rietveld, L., & Hoekstra, R. Man vs. Machine: Differences in SPARQL Queries. USEWOD2014
    [6] Huelss, J., & Paulheim, H. What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015
  • 142.
    Reflection: to what extent can we now answer these questions?
    • How did the debate about the financial crisis in Greece develop?
    • Which political event has attracted most media attention?
    • What are the differences between different media?
    • Has the coverage changed over time?
    Yes, but:
    • What is the influence of the selection of newspapers available at the National Library?
    • What was the quality of the digitisation process (OCR)?
    • How good is our linking approach (based on automatically detected entities and topics)?
    ➡ How to handle these uncertainties is one of our research questions! We call this Tool Criticism.
  • 144.
    Resources:
    • PoliMedia demo: http://polimedia.nl/
    • PoliMedia project video: https://youtu.be/u24oRCj7xrQ
    • Talk of Europe project: http://talkofeurope.eu/
    • Talk of Europe data: purl.org/linkedpolitics
    • Talk of Europe project video: https://youtu.be/GxA53gkCe0o
    • USEWOD workshop: http://usewod.org/
    • My website: http://homepages.cwi.nl/~hollink/
    I’d be happy to answer your questions!