Guest Lecture: Linked Open Data for the Humanities and Social Sciences

Laura Hollink, Researcher (tenure track) at Centrum Wiskunde & Informatica
Linked Open Data
for the Humanities and Social Sciences
Use cases: linking government data to news data
in the PoliMedia and Talk of Europe projects
Laura Hollink

Centrum Wiskunde & Informatica (CWI)

KU Leuven

Guest lecture 

November 10, 2016
Linked Open Data in the SSH?
Example question:
How did the debate about
the financial crisis in
Greece develop?
Searching the proceedings of the European
Parliament
"Greece" in the plenary meetings of the European Parliament
[Figure: bar chart; x-axis: year (1999-2013), y-axis: number of mentions (0-200)]
Searching through newspaper archives
Mentions of “Griekenland” in the Dutch newspaper De Telegraaf
Search volumes of a search engine
Frequency of the query “Greece” on Google
http://www.google.com/trends
We need: 

✦open access to data
✦to combine sources
✦more complex queries
Linked Open Data in the SSH?
Example question: 

Which political debate in the
post-war period has attracted
most media attention?
“De Indonesische Quaestie”
To answer this question we need to
go through all newspaper articles
about all political debates…
We need: 

✦open access to data
✦to combine sources
✦more complex queries
Linked Open Data in the SSH?
Example question:
What are the differences
between different media?

Example question:
Has the coverage changed
over time?
A very brief introduction…
Linked Open Data
A method of publishing structured data on the Web
in such a way that it can be linked and queried
by computers as well as people.
✦open access to data
✦to combine sources
✦more complex queries
Structured data
Comparable to the data one may find in a database table:
Thing      Type  Population  Airport
Amsterdam  City  1364422     Schiphol
…          …     …           …
Represented as RDF triples:
ex:Amsterdam a ex:City .
ex:Amsterdam dbo:populationUrban "1364422"^^xsd:integer .
ex:Amsterdam dbp:cityServed ex:Schiphol .
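The step from a table row to triples can be sketched in a few lines of Python. This is only an illustration: the prefixes follow the slide, but the mapping from column names to properties and the function name `row_to_triples` are assumptions made for this example.

```python
# Illustrative sketch: turn one table row into subject-predicate-object triples.
# The column-to-property mapping is an assumption made for this example.
COLUMN_TO_PROPERTY = {
    "Type": "rdf:type",
    "Population": "dbo:populationUrban",
    "Airport": "dbp:cityServed",
}


def row_to_triples(row):
    """Return one (subject, predicate, object) tuple per non-key column."""
    subject = "ex:" + row["Thing"]
    triples = []
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row[column]
        # Numbers stay literal values; other cells become resources in ex:.
        obj = value if isinstance(value, int) else "ex:" + value
        triples.append((subject, prop, obj))
    return triples


row = {"Thing": "Amsterdam", "Type": "City",
       "Population": 1364422, "Airport": "Schiphol"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Each cell of the row becomes one statement about the thing in the first column, which is why adding a column to the table corresponds to adding one more property.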
On the Web
Everything is identified by URIs (documents, concepts, instances, links)
http://example.org/cities#Amsterdam
http://example.org/City
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://dbpedia.org/ontology/population
Triples can be distributed over the Web
<http://example.org/cities#Amsterdam> a ex:City .
<http://example.org/cities#Amsterdam> dbo:populationUrban "1364422" .
<http://example.org/cities#Amsterdam> dbp:cityServed ex:Schiphol .
Forming a graph
Amsterdam -- is a --> City
Amsterdam -- has population --> "1364422"
Amsterdam -- has airport --> Schiphol
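Because every source uses the same URI for Amsterdam, triples published in different places can simply be put together. The sketch below assumes two hypothetical sources and an assumed full URI for Schiphol; `build_graph` is a name introduced for this example only.

```python
from collections import defaultdict

# Two hypothetical sources that publish triples about the same URI.
AMSTERDAM = "http://example.org/cities#Amsterdam"
source_a = {
    (AMSTERDAM, "rdf:type", "http://example.org/City"),
    (AMSTERDAM, "dbo:populationUrban", "1364422"),
}
source_b = {
    # The full URI for Schiphol is an assumption made for this example.
    (AMSTERDAM, "dbp:cityServed", "http://example.org/Schiphol"),
}


def build_graph(*sources):
    """Merge triple sets and index the outgoing edges by subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in sorted(set().union(*sources)):
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(source_a, source_b)
for predicate, obj in graph[AMSTERDAM]:
    print(predicate, "->", obj)
```

No join key has to be negotiated between the two sources: the shared URI is the join key.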
The Web of Data vs. the Web of Documents
Note the differences Web of Data <-> database:

• Non-unique naming assumption

• Open World assumption

• Everyone can say anything about anything
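A small sketch may help to see what the first two differences mean in practice. All data and function names below are made up for illustration; `None` is used here to stand for "unknown".

```python
# Made-up data: two statements about Amsterdam.
triples = {
    ("ex:Amsterdam", "rdf:type", "ex:City"),
    ("ex:Amsterdam", "dbp:cityServed", "ex:Schiphol"),
}


def closed_world(triple):
    """Database reading: what is not stored is false."""
    return triple in triples


def open_world(triple):
    """Web of Data reading: what is not stated is unknown, not false."""
    return True if triple in triples else None


def count_bounds(uris):
    """Without unique names, n distinct URIs denote between 1 and n things."""
    n = len(set(uris))
    return (1 if n else 0, n)


question = ("ex:Amsterdam", "dbp:cityServed", "ex:AnotherAirport")
print(closed_world(question))   # the database says "no"
print(open_world(question))     # the Web of Data says "we do not know"
print(count_bounds(["ex:A", "ex:B", "ex:A"]))
```

The bounds returned by `count_bounds` only concern the things these URIs denote; under the Open World assumption there may always be further statements elsewhere.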
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Querying Linked Open Data
• A W3C recommendation for querying RDF graphs called “SPARQL Protocol
And RDF Query Language”

• See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/
Data:
:JamesDean :playedIn :Giant .
Query                          Result
:JamesDean ?what :Giant .      :playedIn
?who :playedIn :Giant .        :JamesDean
:JamesDean :playedIn ?what .   :Giant
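The James Dean patterns can be reproduced with a very small matcher. This is a sketch of the idea behind a single triple pattern, not of a SPARQL engine; the function `match` is a name chosen for this example, and a variable that occurs twice in one pattern is not handled.

```python
def match(pattern, data):
    """Return the variable bindings for a single triple pattern.

    Terms that start with '?' are variables; all other terms must be equal
    to the term at the same position in the data triple.
    """
    results = []
    for triple in data:
        bindings = {}
        for pattern_term, data_term in zip(pattern, triple):
            if pattern_term.startswith("?"):
                bindings[pattern_term] = data_term
            elif pattern_term != data_term:
                break  # a constant does not match: skip this triple
        else:
            results.append(bindings)
    return results


data = [(":JamesDean", ":playedIn", ":Giant")]
print(match((":JamesDean", "?what", ":Giant"), data))
print(match(("?who", ":playedIn", ":Giant"), data))
print(match((":JamesDean", ":playedIn", "?what"), data))
```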
Two example projects of Linked Open Data in SSH:
data modelling and linking in the PoliMedia and
Talk of Europe projects
Linking government data to news data
Transcriptions of all 9,294
meetings of the Dutch
parliament between 1945 and
1995, consisting of
1,208,903 speeches.
Roughly 1.8 million news
bulletins from
1937-1984

(We only use 1945-1995)
Archives of hundreds of
newspapers, with tens of
millions of articles
published between 1618 and 1995.

(We only use 1945-1995)
Links in PoliMedia
is about
• 3 Million links
Step 1: Translate the Dutch parliamentary debates
to the standard structured web format RDF
[Figure: RDF graph of an example debate.
Resources: nl.proc.sgd.d.194519460000002 (Debate); nl.proc.sgd.d.194519460000002.1 and nl.proc.sgd.d.194519460000002.2 (PartOfDebate); nl.proc.sgd.d.194519460000002.1.1 (DebateContext); nl.proc.sgd.d.194519460000002.1.2 and nl.proc.sgd.d.194519460000002.1.3 (Speech); Speaker_00064 (role member_of_parliament); Party_kvp (Party, acronym KVP, full name Katholieke Volkspartij); http://resolver.politicalmashup.nl/nl.m.00064 (Politician, Joannes Antonius James Barge).
Properties: rdf:type, dc:id, dc:source, dc:publisher, dc:language, dc:date, hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText, hasSpokenText, hasSpeaker, hasParty, hasRole, sem:hasActor, coveredIn, hasAcronym, hasFullName, foaf:firstName, foaf:lastName, rdfs:label.
Values shown: title "Handelingen Verenigde Vergadering...", language Dutch, date 1945-11-20, context text "De voorzitter opent de vergadering…", spoken text "Mijnheer de Voorzitter, de Commissie van …".
Linked Web resources: http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002, http://statengeneraaldigitaal.nl/, http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf, and (via coveredIn) http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr]
Step 2: Discovering links between politics and
news
[Figure: linking pipeline. Debates -> detect topics in speeches and detect named entities in speeches -> create queries (from topics, named entities, name of speaker, date of debate) -> search newspaper archive -> candidate articles -> rank candidate articles -> links between speeches and articles]
Intuition 1: The name of the speaker should
appear in the article and the article should
be published within a week of the debate
Intuition 2: the more the article and the
speech overlap in terms of topics and
named entities, the more they are related.
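The two intuitions can be sketched as a filter followed by a ranking. This is not the PoliMedia implementation: the Jaccard overlap is a stand-in for whatever ranking the project used, "within a week" is read here as at most seven days before or after the debate, and all field names and example records are invented.

```python
from datetime import date, timedelta


def is_candidate(speech, article, window_days=7):
    """Intuition 1: speaker name in the text, published within a week."""
    delta = abs(article["published"] - speech["debate_date"])
    name_found = speech["speaker"].lower() in article["text"].lower()
    return delta <= timedelta(days=window_days) and name_found


def overlap_score(speech, article):
    """Intuition 2: overlap of topics and named entities (Jaccard, assumed)."""
    a, b = set(speech["terms"]), set(article["terms"])
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(speech, articles):
    """Keep the candidate articles and order them by decreasing overlap."""
    candidates = [art for art in articles if is_candidate(speech, art)]
    return sorted(candidates, key=lambda art: overlap_score(speech, art),
                  reverse=True)


# Invented example records.
speech = {"speaker": "Barge", "debate_date": date(1945, 11, 20),
          "terms": {"budget", "Indonesia", "Barge"}}
articles = [
    {"id": "a2", "published": date(1945, 11, 21),
     "text": "Barge was seen in parliament.", "terms": {"Barge", "football"}},
    {"id": "a1", "published": date(1945, 11, 22),
     "text": "Mr Barge spoke about the budget.", "terms": {"budget", "Barge"}},
    {"id": "a3", "published": date(1946, 1, 5),
     "text": "Barge again.", "terms": {"budget", "Barge"}},
    {"id": "a4", "published": date(1945, 11, 21),
     "text": "A report on the budget.", "terms": {"budget"}},
]
ranked = rank_candidates(speech, articles)
print([art["id"] for art in ranked])
```

Article a3 falls outside the time window and a4 does not mention the speaker, so only a1 and a2 are ranked.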
Representation of links
Simple link: architecten skos:exactMatch architects
The same link as a resource with provenance:
Link 001
  concept1: architecten
  concept2: architects
  link type: skos:exactMatch
  link method: manual
  author: L. Hollink
• This is an example of the “design
pattern” referred to as n-ary
relations or relations as classes.

• It allows us to save provenance
information about the statements
we create.
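A minimal sketch of this pattern, using plain tuples: the link becomes a resource of its own, and the original three-part statement can still be recovered from it. The property names (`concept1`, `linkType`, and so on) mirror the labels on the slide and are not taken from a published vocabulary.

```python
def reify_link(link_id, concept1, concept2, link_type, method, author):
    """Describe a link as its own resource so provenance can be attached."""
    return [
        (link_id, "concept1", concept1),
        (link_id, "concept2", concept2),
        (link_id, "linkType", link_type),
        (link_id, "linkMethod", method),
        (link_id, "author", author),
    ]


def as_plain_triple(link_triples):
    """Recover the simple subject-predicate-object form of the link."""
    props = {predicate: obj for _, predicate, obj in link_triples}
    return (props["concept1"], props["linkType"], props["concept2"])


link = reify_link("Link001", "architecten", "architects",
                  "skos:exactMatch", "manual", "L. Hollink")
print(as_plain_triple(link))
```

The cost of the pattern is visible in the size of `link`: one statement has become five, which is the price paid for being able to say who made the link and how.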
Evaluation of Links
Recall that we aim to use the links to answer a research
question.

Can we still do that if there are errors in the links?

How many errors are acceptable?

We need to know the quality!
How would you determine the quality of the links?
1. Manually rating (a sample of) mappings

• relatively cheap and easy to interpret

• only precision, no recall
2. Comparison to manually found links

• precision and recall

• more expensive! (but: crowdsourcing?)
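Precision and recall, as used in the two evaluation options, can be written down in a few lines. The link pairs below are invented; `found` stands for the automatically created links and `gold` for the links found manually.

```python
def precision(found, gold):
    """Share of the found links that are correct."""
    return len(found & gold) / len(found) if found else 0.0


def recall(found, gold):
    """Share of the correct links that were found."""
    return len(found & gold) / len(gold) if gold else 0.0


# Invented (speech, article) pairs.
found = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"), ("s2", "a4")}
gold = {("s1", "a1"), ("s2", "a3"), ("s2", "a5")}

print(precision(found, gold))
print(recall(found, gold))
```

Rating a sample of found links only needs `found`, which is why option 1 yields precision but no recall; recall needs the manually collected `gold` set of option 2.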
Evaluation of links in PoliMedia
How good are the links?
• We ask 2 raters to manually score pairs of
newspaper articles and speeches.

• a pilot study showed that we needed more
than a 2 point scale.

• inter-rater agreement: 0.5 -> acceptable,
but not high.

• Score: 80%
Score                               Setting 1  Setting 2  Setting 3
I don’t know                        0.14       0.15       0.08
0 - unrelated                       0.38       0.23       0.12
1 - related                         0.29       0.36       0.36
2 - explicit mention of the debate  0.19       0.26       0.44
1+2                                 0.48       0.62       0.80
How many links did we miss?
• We ask the raters to
manually search the
archives of the National
Library for related articles.

• Score: 62%
Results
• An open data set of Dutch parliamentary debates,

• with almost 3 million links between 450,000 speeches and 1.5 million newspaper
articles and radio bulletins at the National Library,

• accessible through a Web demonstrator and through a SPARQL endpoint
Demo
Online database:
SPARQL endpoint
• A service to query a knowledge
base using the SPARQL query
language.

“All speeches with more
than 60 associated news
items.”
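The example question amounts to grouping links by speech, counting them, and keeping the groups above a threshold (in SPARQL 1.1 this is a GROUP BY with a HAVING clause). The sketch below does the same over an in-memory list of links; the identifiers are invented and the function name is an assumption made for this example.

```python
from collections import Counter


def speeches_with_many_items(links, minimum=60):
    """Return the speeches linked to more than `minimum` news items.

    `links` is an iterable of (speech, news_item) pairs; duplicate pairs
    are counted once.
    """
    counts = Counter(speech for speech, _ in set(links))
    return sorted(speech for speech, n in counts.items() if n > minimum)


# Invented data: speech_A has 61 linked items, speech_B exactly 60.
links = [("speech_A", f"item_{i}") for i in range(61)]
links += [("speech_B", f"item_{i}") for i in range(60)]

print(speeches_with_many_items(links))
```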
The European Parliament as Linked Open Data
Laura Hollink, Centrum Wiskunde & Informatica, Amsterdam
Astrid van Aggelen, VU University Amsterdam
Martijn Kleppe, Erasmus University Rotterdam
Henri Beunders, Erasmus University Rotterdam
Jill Briggeman, Erasmus University Rotterdam
Max Kemman, University of Luxembourg
Talk of Europe goals
• To publish the entire plenary debates of the European
Parliament as Linked Open Data

• To improve access to the data

• To enable large scale analysis across time spans.

‣ For residents of the European Union, access to the proceedings
of the European Parliament is a formal right.
Step 1: Translate the
European parliamentary
debates to Linked
Open Data
14M RDF statements about the 30K
speeches in 23 languages by 3K
speakers in 1K session days that
were held in the EU parliament
between 1999 and 2014
Modelling debates as events, not documents
[Figure: RDF graph of one plenary session.
lp:eu/plenary/Session/2013-11 (rdf:type lpv:eu/plenary/Session; lpv:month "11"^^xsd:gMonth; lpv:year "2013"^^xsd:gYear)
  dc:hasPart lp:eu/plenary/SessionDay/2013-11-20 (rdf:type lpv:eu/plenary/SessionDay; dc:date "2013-11-20"^^xsd:date)
    dc:hasPart lp:eu/plenary/2013-11-20/AgendaItem_6 (rdf:type lpv:eu/plenary/AgendaItem; lpv:number 6; dc:date)
      dc:hasPart lp:eu/plenary/2013-11-20/Speech_103 (rdf:type lpv:eu/plenary/Speech; lpv:number 103; dc:date)
Each dc:hasPart has the inverse dc:isPartOf. lpv:hasSubsequent links AgendaItem_6 to AgendaItem_7 and Speech_103 to Speech_104.]
PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
PREFIX lp: <http://purl.org/linkedpolitics/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
How to relate a speech to the party of the speaker?
lp:eu/plenary/2009-10-21/Speech_140 -- lpv:speaker --> lp:EUmember_1023
lp:EUmember_1023 -- lpv:hasParty --> lp:EUParty/SomeParty
Why is this not a good solution?
1. A person might be a member of more than one party (at different times)
2. Since there is no link between a speech and a party, queries for all speeches
spoken by the members of a certain party become very complicated.
How to relate a speech to the party of the
speaker?
[Figure: RDF graph with political functions as resources.
lp:eu/plenary/2009-10-21/Speech_140 -- lpv:speaker --> lp:EUmember_1023
lp:EUmember_1023 -- lpv:politicalFunction --> lp:political-Function101
lp:EUmember_1023 -- lpv:politicalFunction --> lp:political-Function102
lp:political-Function101 (rdf:type lpv:PoliticalFunction): lpv:institution lp:EUParty/NI; lpv:role lp:Role/member; lpv:beginning "20071114"^^xsd:date; lpv:end "20111126"^^xsd:date
lp:political-Function102 (rdf:type lpv:PoliticalFunction): lpv:institution lp:EUCommittee/Committee_on_Legal_Affairs; lpv:role lp:Role/substitute; lpv:beginning "20090716"^^xsd:date; lpv:end "20111126"^^xsd:date
lp:eu/plenary/2009-10-21/Speech_140 -- lpv:spokenAs --> lp:political-Function101
lp:eu/plenary/2009-10-21/Speech_140 -- lpv:spokenAs --> lp:political-Function102]
Note: this is another example of the
design pattern called n-ary relations or
relations as classes.
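With political functions modelled as resources that have a beginning and an end, the party of a speaker at the time of a speech can be looked up by date. The sketch below is an assumption about how such a lookup might work, not the project's code: the dictionaries stand in for the RDF resources, and the pairing of dates with functions follows a reading of the diagram.

```python
from datetime import date

# Stand-ins for the lpv:PoliticalFunction resources in the diagram.
functions = [
    {"id": "lp:political-Function101", "member": "lp:EUmember_1023",
     "institution": "lp:EUParty/NI", "role": "lp:Role/member",
     "beginning": date(2007, 11, 14), "end": date(2011, 11, 26)},
    {"id": "lp:political-Function102", "member": "lp:EUmember_1023",
     "institution": "lp:EUCommittee/Committee_on_Legal_Affairs",
     "role": "lp:Role/substitute",
     "beginning": date(2009, 7, 16), "end": date(2011, 11, 26)},
]


def spoken_as(speaker, speech_date, functions):
    """Functions the speaker held on the day of the speech."""
    return [f for f in functions
            if f["member"] == speaker
            and f["beginning"] <= speech_date <= f["end"]]


def party_of_speech(speaker, speech_date, functions):
    """Institutions among those functions that are parties."""
    return [f["institution"] for f in spoken_as(speaker, speech_date, functions)
            if f["institution"].startswith("lp:EUParty/")]


speech_date = date(2009, 10, 21)  # date of Speech_140
print(party_of_speech("lp:EUmember_1023", speech_date, functions))
```

Storing the result as direct lpv:spokenAs links on the speech means a query for all speeches of a party no longer has to compare dates.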
Step 2: create links to external data sources
(links made by the EC)
Linking Members of Parliament to Wikipedia /
DBpedia
how?
• String matching is the most important feature in the linking process.

• “nearly all [alignment systems] use a string similarity metric” [12]

• Stop-word removal and stemming are not helpful, nor is using WordNet synonyms. [12]
[12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
http://www.dbpedia.org/page/Judith_Sargentini
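A minimal sketch of name linking by string similarity, using the `difflib` module from the Python standard library. The similarity measure, the threshold of 0.9 and the candidate labels are assumptions chosen for illustration; the slides do not say which metric the project used.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, candidate_labels, threshold=0.9):
    """Return the most similar label, or None if none reaches the threshold."""
    if not candidate_labels:
        return None
    score, label = max((similarity(name, label), label)
                       for label in candidate_labels)
    return label if score >= threshold else None


# Invented candidate labels, as one might get from a DBpedia lookup.
labels = ["Judith Sargentini", "J. Sargentini", "Judith Sargent"]
print(best_match("Judith Sargentini", labels))
print(best_match("Judith Sargentini", ["Completely Different"]))
```

Returning `None` below the threshold keeps doubtful candidates out of the link set, which trades recall for precision.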
Example query 1: speeches that contain a certain
keyword
Query: all speeches that contain the phrase “open data”
…. So let us go for open data, let us
go for utilisation of all the instruments
available to that end! …..
…. but there too governments are
encouraging the use of open data to
increase transparency, accountability
and citizen participation ….
…. We already have many open data
projects in the Member States and
local authorities…..
Example 2: speeches that contain a certain
keyword by date
"Slovenia" in the plenary meetings of the European Parliament
[Figure: bar chart; x-axis: year (1999-2013), y-axis: number of mentions (0-100)]
Example 2: speeches that contain a certain keyword
by date
Mentions of 'human rights'
[Figure: histogram; x-axis: dates (1999-2013), y-axis: frequency (0-800)]
Example 3: speeches that contain a certain keyword
by country
Mentions of 'human rights' by country
[Figure: bar chart; x-axis: country (AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK), y-axis: 0-7000]
Example 4: the number of speeches per EU
country
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
SELECT ?c (COUNT(?x) AS ?count)
WHERE {
  ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech> .
  ?x lpv:speaker ?p .
  ?p lpv:countryOfRepresentation ?c .
} GROUP BY ?c LIMIT 50
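To see what the query computes, the same join and grouping can be sketched over a handful of in-memory triples. The speeches, speakers and countries below are invented, and the sketch assumes one speaker per speech and one country per speaker.

```python
from collections import Counter


def speeches_per_country(triples):
    """Count speeches per country of representation of the speaker."""
    speaker_of = {s: o for s, p, o in triples if p == "lpv:speaker"}
    country_of = {s: o for s, p, o in triples
                  if p == "lpv:countryOfRepresentation"}
    speeches = [s for s, p, o in triples
                if p == "rdf:type" and o == "lpv:eu/plenary/Speech"]
    # Join speech -> speaker -> country, skipping incomplete chains.
    return Counter(country_of[speaker_of[s]] for s in speeches
                   if speaker_of.get(s) in country_of)


# Invented example data.
triples = [
    ("speech_1", "rdf:type", "lpv:eu/plenary/Speech"),
    ("speech_1", "lpv:speaker", "mep_1"),
    ("speech_2", "rdf:type", "lpv:eu/plenary/Speech"),
    ("speech_2", "lpv:speaker", "mep_1"),
    ("speech_3", "rdf:type", "lpv:eu/plenary/Speech"),
    ("speech_3", "lpv:speaker", "mep_2"),
    ("mep_1", "lpv:countryOfRepresentation", "NL"),
    ("mep_2", "lpv:countryOfRepresentation", "SI"),
]
counts = speeches_per_country(triples)
print(counts)
```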
Example 5: include data from an external source
Query: MEPs that were born outside Europe.
Members of Parliament
(DBpedia contains info on
birthplace, birth date, schools,
careers, residence, family, etc. )
Intermezzo: one-question Quiz
Reasoning on the Web of Data
Question: What can we conclude from this graph?

A. Stihler is a member of exactly 3 parties

B. Stihler is a member of at least 3 parties

C. Stihler is a member of at most 3 parties

D. None of the above

E. All of the above

F. Other, namely ….
http://purl.org/linkedpolitics/EUmember_4545 foaf:name "Catherine Stihler"
http://purl.org/linkedpolitics/EUmember_4545 :memberOf http://purl.org/linkedpolitics/EUParty/PES
http://purl.org/linkedpolitics/EUmember_4545 :memberOf http://dbpedia.org/resource/Party_of_European_Socialists
http://purl.org/linkedpolitics/EUmember_4545 :memberOf http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats
Results
• An open data set of EU parliamentary debates,

• with links to other sources on the Web of Data

• accessible through a SPARQL endpoint
Reflection: to what extent can we now answer
these questions?
How did the debate about the
financial crisis in Greece
develop?

Which political event has
attracted most media
attention?

What are the differences
between different media?

Has the coverage changed
over time?
We can, but:

• what is the influence of the selection of newspapers
available at the National Library?

• what was the quality of the digitisation process (OCR)?

• How good is our linking approach (based on
automatically detected entities and topics)?

• How much can we trust the quality of external sources?

➡ How to handle these uncertainties is one of our research
questions. We call this Tool Criticism
Research directions at CWI
Transparent, reproducible analysis of large volumes of connected,
heterogeneous, multimodal data.
1. How do we automatically link heterogeneous datasets?

2. How do we interpret links between datasets of different quality and certainty?

3. How do we handle the fact that knowledge evolves?

4. How do we design interfaces that allow scholars to study the datasets

• including the links between them?

• while assessing the reliability of the findings?
Data Science - Big Data - Web of Data
PoliMedia demo: http://polimedia.nl/
PoliMedia project video: https://youtu.be/u24oRCj7xrQ
Talk of Europe project: http://talkofeurope.eu/
Talk of Europe data: purl.org/linkedpolitics
Talk of Europe project video: https://youtu.be/GxA53gkCe0o
My website: http://homepages.cwi.nl/~hollink/
A. van Aggelen, L. Hollink, M. Kemman, M. Kleppe & H. Beunders. The
debates of the European Parliament as Linked Open Data. Semantic Web
Journal. In press, 2016.
M. Kleppe, L. Hollink, J. Oomen, M. Kemman, D. Juric, J. Blom, H.
Beunders. PoliMedia - Improving the Analyses of Radio & Newspaper
coverage of Political Debates. First prize winner of the LinkedUp Veni
Competition, presented at the Open Knowledge Conference (OKCon),
Geneva, September 2013.
I’d be happy to answer any questions!
1 of 90

Recommended

Linked Open Data by
Linked Open DataLinked Open Data
Linked Open DataLaura Hollink
254 views144 slides
Signposting Overview (Version November 2017) by
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Herbert Van de Sompel
11.3K views36 slides
The web is rotting and what to do about it by
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about itHerbert Van de Sompel
325 views86 slides
Ld4 dh tutorial by
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorialEnrico Daga
1.9K views163 slides
Linked opendata parisemantique.fr - 24062011 by
Linked opendata   parisemantique.fr - 24062011Linked opendata   parisemantique.fr - 24062011
Linked opendata parisemantique.fr - 24062011Loïc Dias Da Silva
916 views57 slides
Interoperability for web based scholarship by
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarshipHerbert Van de Sompel
4.1K views36 slides

More Related Content

What's hot

Semantic Technologies: Representing Semantic Data by
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataMatthew Rowe
7K views74 slides
Slides by
SlidesSlides
Slidesrazzmenot
80 views48 slides
Signposting for Repositories by
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
798 views34 slides
Introduction to Linked Data by
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked DataJuan Sequeda
10.4K views60 slides
Christian Jakenfelds by
Christian JakenfeldsChristian Jakenfelds
Christian JakenfeldsConnected Data World
424 views34 slides
FASIDS introduction by
FASIDS introductionFASIDS introduction
FASIDS introductionReza Hosseini Teshnizi
168 views115 slides

What's hot(13)

Semantic Technologies: Representing Semantic Data by Matthew Rowe
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic Data
Matthew Rowe7K views
Signposting for Repositories by Martin Klein
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
Martin Klein798 views
Introduction to Linked Data by Juan Sequeda
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked Data
Juan Sequeda10.4K views
Facebook data mining - case study by Josef Šlerka
Facebook data mining - case studyFacebook data mining - case study
Facebook data mining - case study
Josef Šlerka1K views
Discovering Scholarly Orphans Using ORCID by Martin Klein
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
Martin Klein1.8K views
Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le... by Olaf Janssen
Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...
Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...
Olaf Janssen1.1K views
WW2 underground newspapers on Wikipedia using DBPedia , 12-2-2016, The Hague by Olaf Janssen
WW2 underground newspapers on Wikipedia using DBPedia , 12-2-2016, The HagueWW2 underground newspapers on Wikipedia using DBPedia , 12-2-2016, The Hague
WW2 underground newspapers on Wikipedia using DBPedia , 12-2-2016, The Hague
Olaf Janssen1.3K views
What_do_Knowledge_Graph_Embeddings_Learn.pdf by Heiko Paulheim
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
Heiko Paulheim169 views

Viewers also liked

Connecting political data to media data by
Connecting political data to media dataConnecting political data to media data
Connecting political data to media dataLaura Hollink
1K views36 slides
Linked Data: principles and examples by
Linked Data: principles and examples Linked Data: principles and examples
Linked Data: principles and examples Victor de Boer
3.6K views81 slides
Linked data for Libraries, Archives, Museums by
Linked data for Libraries, Archives, MuseumsLinked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, Museumsljsmart
6.8K views34 slides
Linked data and Semantic Web Applications for Libraries by
Linked data and Semantic Web Applications for LibrariesLinked data and Semantic Web Applications for Libraries
Linked data and Semantic Web Applications for LibrariesVikas Bhushan
1.3K views51 slides
Web of Data - Introduction (english) by
Web of Data - Introduction (english)Web of Data - Introduction (english)
Web of Data - Introduction (english)Thomas Francart
941 views87 slides
WTF is the Semantic Web and Linked Data by
Guest Lecture: Linked Open Data for the Humanities and Social Sciences

  • 1. Linked Open Data for the Humanities and Social Sciences. Use cases: linking government data to news data in the PoliMedia and Talk of Europe projects. Laura Hollink, Centrum Wiskunde & Informatica (CWI). KU Leuven guest lecture, November 10, 2016.
  • 2. Linked Open Data in the SSH? Example question: How did the debate about the financial crisis in Greece develop?
  • 3. Searching the proceedings of the European Parliament. [Chart: number of mentions of "Greece" in the plenary meetings of the European Parliament, per year, 1999-2013]
  • 4. Searching through newspaper archives Mentions of “Griekenland” in the Dutch newspaper De Telegraaf
  • 6. Search volumes of a search engine Frequency of the query “Greece” on Google http://www.google.com/trends We need: ✦open access to data ✦to combine sources ✦more complex queries
  • 7. Linked Open Data in the SSH? Example question: Which political debate in the post-war period has attracted most media attention?
  • 10. “De Indonesische Quaestie” (the Indonesian Question). To answer this question we need to go through all newspaper articles about all political debates… We need: ✦ open access to data ✦ to combine sources ✦ more complex queries
  • 11. Linked Open Data in the SSH? Example question: What are the differences between different media? Example question: Has the coverage changed over time?
  • 12. Linked Open Data: a very brief introduction… A method of publishing structured data on the Web in such a way that it can be linked and queried by computers as well as people. ✦ open access to data ✦ to combine sources ✦ more complex queries
  • 14. Structured data. Comparable to the data one may find in a database table:

        Thing      | Type | Population | Airport
        Amsterdam  | City | 1364422    | Schiphol
        …          | …    | …          | …

      Represented as RDF triples:

        ex:Amsterdam a ex:City .
        ex:Amsterdam dbo:populationUrban "1364422"^^xsd:integer .
        ex:Amsterdam dbp:cityServed ex:Schiphol .
  • 17. On the Web. Everything is identified by URIs (documents, concepts, instances, links):
        http://example.org/cities#Amsterdam
        http://example.org/City
        http://www.w3.org/1999/02/22-rdf-syntax-ns#type
        http://dbpedia.org/ontology/population
      Triples can be distributed over the Web:
        <http://example.org/cities#Amsterdam> a ex:City .
        <http://example.org/cities#Amsterdam> dbo:populationUrban "1364422" .
        <http://example.org/cities#Amsterdam> dbp:cityServed ex:Schiphol .
      Forming a graph: Amsterdam "is a" City; Amsterdam "has population" “1364422”; Amsterdam "has airport" Schiphol.
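The idea that triples published in different places line up into one graph can be sketched in a few lines of Python. This is only an illustration: the two "sources" are made up, the `dbp:` prefix is expanded to the usual DBpedia property namespace, and `ex:Schiphol` is assumed to live under the same example.org namespace as Amsterdam.

```python
from collections import defaultdict

# Prefixes expanded by hand; the example.org URIs are placeholders from the slide.
EX = "http://example.org/cities#"
DBO = "http://dbpedia.org/ontology/"
DBP = "http://dbpedia.org/property/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Two imaginary sources on the Web, each publishing some triples about Amsterdam.
source_a = {
    (EX + "Amsterdam", RDF_TYPE, "http://example.org/City"),
    (EX + "Amsterdam", DBO + "populationUrban", "1364422"),
}
source_b = {
    (EX + "Amsterdam", DBP + "cityServed", EX + "Schiphol"),
}


def merge(*sources):
    """Union the triple sets; shared URIs make the triples connect into a graph."""
    graph = defaultdict(list)
    for triples in sources:
        for s, p, o in sorted(triples):
            graph[s].append((p, o))
    return graph


graph = merge(source_a, source_b)
for predicate, obj in graph[EX + "Amsterdam"]:
    print(predicate, "->", obj)
```

Because both sources use the same URI for Amsterdam, no join key or schema mapping is needed: the merge is a plain set union.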
  • 20. The Web of Data vs. the Web of Documents. Note the differences between the Web of Data and a database:
      • Non-unique naming assumption
      • Open World assumption
      • Everyone can say anything about anything
  • 21. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 22. Querying Linked Open Data
      • SPARQL (“SPARQL Protocol And RDF Query Language”) is a W3C recommendation for querying RDF graphs.
      • See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/
      Data:
        :JamesDean :playedIn :Giant .
      Query and result:
        :JamesDean ?what :Giant .      result: :playedIn
        ?who :playedIn :Giant .        result: :JamesDean
        :JamesDean :playedIn ?what .   result: :Giant
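The James Dean example boils down to matching a triple pattern with variables against the data. A minimal Python sketch of that matching step follows; it is not a SPARQL engine, and the `match` helper is a name invented for this illustration.

```python
def match(pattern, triples):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


data = [(":JamesDean", ":playedIn", ":Giant")]

print(match((":JamesDean", "?what", ":Giant"), data))     # [{'?what': ':playedIn'}]
print(match(("?who", ":playedIn", ":Giant"), data))       # [{'?who': ':JamesDean'}]
print(match((":JamesDean", ":playedIn", "?what"), data))  # [{'?what': ':Giant'}]
```

A real SPARQL query combines several such patterns and joins their bindings on shared variables.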
  • 23. Two example projects of Linked Open Data in SSH: data modelling and linking in the PoliMedia and Talk of Europe projects
  • 24. Linking government data to news data
  • 26. The data:
      • Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.
      • Roughly 1.8 million news bulletins from 1937-1984. (We only use 1945-1995.)
      • Archives of hundreds of newspapers, with a very large number of issues and tens of millions of articles from 1618-1995. (We only use 1945-1995.)
  • 28. Links in PoliMedia: about 3 million links
  • 29. Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF
      [Diagram: RDF graph of one debate. Labels recovered from the figure:]
      Classes: Debate, PartOfDebate, DebateContext, Speech, Party, Politician
      Resources: nl.proc.sgd.d.194519460000002; nl.proc.sgd.d.194519460000002.1; nl.proc.sgd.d.194519460000002.2; nl.proc.sgd.d.194519460000002.1.1; nl.proc.sgd.d.194519460000002.1.2; nl.proc.sgd.d.194519460000002.1.3; Speaker_00064; Party_kvp; member_of_parliament
      Properties: rdf:type, dc:id, dc:source, dc:publisher, dc:language, dc:date, hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText, hasSpokenText, sem:hasActor, hasSpeaker, hasParty, hasRole, coveredIn, hasAcronym, hasFullName, foaf:firstName, foaf:lastName, rdfs:label
      Debate metadata: nl.proc.sgd.d.19720000002; "Handelingen Verenigde Vergadering..."; Dutch; 1945-11-20; http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002; http://statengeneraaldigitaal.nl/; http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf
      Texts: "De voorzitter opent de vergadering…" (hasText); "Mijnheer de Voorzitter, de Commissie van …" (hasSpokenText)
      News item: http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr (coveredIn)
      Party: KVP (hasAcronym), Katholieke Volkspartij (hasFullName)
      Politician: Joannes Antonius James Barge (foaf:firstName, foaf:lastName, rdfs:label); http://resolver.politicalmashup.nl/nl.m.00064 (dc:source)
  • 32. Step 2: Discovering links between politics and news
      Pipeline: debates → detect topics in speeches and detect named entities in speeches → create queries (from topics, named entities, name of speaker, date of debate) → queries → search newspaper archive → candidate articles → rank candidate articles → links between speeches and articles.
      Intuition 1: the name of the speaker should appear in the article, and the article should be published within a week of the debate.
      Intuition 2: the more the article and the speech overlap in terms of topics and named entities, the more they are related.
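The two intuitions can be sketched as a filter followed by a ranking. This is a rough illustration, not the PoliMedia implementation: the Jaccard overlap, the symmetric seven-day window, the class names and the sample articles are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Speech:
    speaker: str
    debate_date: date
    terms: frozenset  # topics and named entities detected in the speech


@dataclass
class Article:
    title: str
    published: date
    text: str
    terms: frozenset  # topics and named entities detected in the article


def candidates(speech, articles, window_days=7):
    """Intuition 1: speaker name appears in the article, published within a week."""
    window = timedelta(days=window_days)
    return [
        a for a in articles
        if speech.speaker.lower() in a.text.lower()
        and abs(a.published - speech.debate_date) <= window
    ]


def overlap(speech, article):
    """Intuition 2, here as Jaccard overlap of topics and named entities."""
    union = speech.terms | article.terms
    return len(speech.terms & article.terms) / len(union) if union else 0.0


def rank(speech, articles):
    """Keep the candidates and order them from most to least overlapping."""
    return sorted(candidates(speech, articles),
                  key=lambda a: overlap(speech, a), reverse=True)


# Invented toy data, only to show the behaviour.
speech = Speech("Barge", date(1945, 11, 20), frozenset({"Indonesië", "begroting"}))
articles = [
    Article("A", date(1945, 11, 21), "De heer Barge sprak over Indonesië.",
            frozenset({"Indonesië"})),
    Article("B", date(1945, 11, 22), "Barge over de begroting en Indonesië.",
            frozenset({"Indonesië", "begroting"})),
    Article("C", date(1946, 1, 5), "Barge over de begroting.",
            frozenset({"begroting"})),
    Article("D", date(1945, 11, 21), "Ander nieuws over de begroting.",
            frozenset({"begroting"})),
]
print([a.title for a in rank(speech, articles)])
```

Article C is dropped because it falls outside the window, and D because the speaker is not named; B ranks above A because it shares more terms with the speech.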
  • 35. Representation of links
      Simple link: architecten skos:exactMatch architects
      Link as a resource (Link 001):
        concept1: architecten
        concept2: architects
        link type: skos:exactMatch
        link method: manual ("handmatig")
        author: L. Hollink
      • This is an example of the “design pattern” referred to as n-ary relations, or relations as classes.
      • It allows us to save provenance information about the statements we create.
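The difference between the plain triple and the link-as-resource pattern can be shown with tuples. In this sketch the `ex:` property names (`ex:concept1`, `ex:linkMethod`, and so on) are hypothetical stand-ins for the labels on the slide, not the vocabulary actually used in the project.

```python
def direct_link(concept1, concept2, link_type="skos:exactMatch"):
    """The plain triple: it cannot carry who made the link or how."""
    return (concept1, link_type, concept2)


def link_as_resource(link_id, concept1, concept2, link_type, method, author):
    """The same link as a node of its own, with provenance attached."""
    return [
        (link_id, "rdf:type", "ex:Link"),
        (link_id, "ex:concept1", concept1),
        (link_id, "ex:concept2", concept2),
        (link_id, "ex:linkType", link_type),
        (link_id, "ex:linkMethod", method),
        (link_id, "ex:author", author),
    ]


def to_direct(link_triples):
    """Recover the plain triple from the n-ary description."""
    props = {p: o for _, p, o in link_triples}
    return (props["ex:concept1"], props["ex:linkType"], props["ex:concept2"])


link = link_as_resource("ex:Link001", "architecten", "architects",
                        "skos:exactMatch", "manual", "L. Hollink")
for triple in link:
    print(triple)
```

The cost of the pattern is one extra node and several triples per link; the gain is that method and author can be queried, for example to evaluate manual and automatic links separately.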
  • 39. Evaluation of Links
      Recall that we aim to use the links to answer a research question. Can we still do that if there are errors in the links? How many errors are acceptable? We need to know the quality!
      How would you determine the quality of the links?
      1. Manually rating (a sample of) mappings
         • relatively cheap and easy to interpret
         • only precision, no recall
      2. Comparison to manually found links
         • precision and recall
         • more expensive! (but: crowdsourcing?)
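For the second option, comparison to manually found links, precision and recall follow directly from two sets of links. A small sketch with invented link identifiers:

```python
def precision_recall(found, gold):
    """Precision: share of found links that are correct.
    Recall: share of the manually found (gold) links that were discovered."""
    found, gold = set(found), set(gold)
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Invented example: (speech, article) pairs.
found = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"), ("s3", "a4"), ("s3", "a5")}
gold = {("s1", "a1"), ("s2", "a3"), ("s3", "a4"), ("s4", "a6")}

precision, recall = precision_recall(found, gold)
print(precision, recall)  # 0.6 0.75
```

Rating only a sample of the generated links (option 1) gives the numerator and denominator for precision, but says nothing about links that were never generated, which is why it cannot measure recall.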
  • 42. Evaluation of links in PoliMedia
      How good are the links?
      • We ask 2 raters to manually score pairs of newspaper articles and speeches.
      • A pilot study showed that we needed more than a 2-point scale.
      • Inter-rater agreement: 0.5 -> acceptable, but not high.
      • Score: 80%
      How many links did we miss?
      • We ask the raters to manually search the archives of the National Library for related articles.
      • Score: 62%

        Score                               | Setting 1 | Setting 2 | Setting 3
        I don’t know                        | 0.14      | 0.15      | 0.08
        0 - unrelated                       | 0.38      | 0.23      | 0.12
        1 - related                         | 0.29      | 0.36      | 0.36
        2 - explicit mention of the debate  | 0.19      | 0.26      | 0.44
        1+2                                 | 0.48      | 0.62      | 0.8
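The "1+2" row of the score table is the share of rated pairs judged related or explicitly mentioning the debate. The arithmetic can be checked from the other rows; the dictionary keys below are shorthand chosen for this sketch.

```python
# Fractions per rating category, copied from the slide's table.
scores = {
    "Setting 1": {"unknown": 0.14, "unrelated": 0.38, "related": 0.29, "explicit": 0.19},
    "Setting 2": {"unknown": 0.15, "unrelated": 0.23, "related": 0.36, "explicit": 0.26},
    "Setting 3": {"unknown": 0.08, "unrelated": 0.12, "related": 0.36, "explicit": 0.44},
}


def related_share(row):
    """Scores 1 and 2 together: the link is at least related to the debate."""
    return round(row["related"] + row["explicit"], 2)


for setting, row in scores.items():
    total = round(sum(row.values()), 2)
    print(setting, "1+2 =", related_share(row), "| all categories sum to", total)
```

Setting 3 reaches 0.8, which matches the 80% score quoted for the quality of the links.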
  • 43. Results
      • An open data set of Dutch parliamentary debates,
      • with almost 3 million links between 450,000 speeches and 1.5 million newspaper articles and radio bulletins at the National Library,
      • accessible through a Web demonstrator and through a SPARQL endpoint.
  • 44. Demo
  • 51. Online database: SPARQL endpoint • A service to query a knowledge base using the SPARQL query language. “All speeches with more than 60 associated news items.”
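The example question is a count-and-filter over the links from speeches to news items. The sketch below shows one way it might be phrased in SPARQL 1.1, followed by the same logic in plain Python over triples. The `pm:` prefix and its IRI are hypothetical, and `coveredIn` is assumed to be the property that connects a speech to a news item.

```python
from collections import Counter

# A possible SPARQL phrasing; the pm: namespace is a made-up placeholder.
SPARQL_SKETCH = """
PREFIX pm: <http://example.org/polimedia#>
SELECT ?speech (COUNT(?item) AS ?n)
WHERE { ?speech pm:coveredIn ?item . }
GROUP BY ?speech
HAVING (COUNT(?item) > 60)
"""


def speeches_with_many_items(triples, threshold=60, predicate="coveredIn"):
    """Count news items per speech and keep speeches above the threshold."""
    counts = Counter(s for s, p, _ in triples if p == predicate)
    return {speech: n for speech, n in counts.items() if n > threshold}


# Invented toy data: speech_a has 61 linked items, speech_b has 60.
triples = [("speech_a", "coveredIn", f"item_{i}") for i in range(61)]
triples += [("speech_b", "coveredIn", f"item_{i}") for i in range(60)]

print(speeches_with_many_items(triples))  # {'speech_a': 61}
```

The query string is not executed here; running it would require the endpoint's real vocabulary and address.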
  • 56. The European Parliament as Linked Open Data Laura Hollink Centrum Wiskunde & Informatica, Amsterdam Astrid van Aggelen VU University Amsterdam Martijn Kleppe Erasmus University Rotterdam Henri Beunders Erasmus University Rotterdam Jill Briggeman Erasmus University Rotterdam Max Kemman University of Luxembourg
  • 57. Talk of Europe goals • To publish the entire plenary debates of the European Parliament as Linked Open Data • To improve access to the data • To enable large scale analysis across time spans. ‣To residents of the European Union access to the proceedings of the European parliament is a formal right.
  • 60. Step 1: Translate the European parliamentary debates to Linked Open Data. 14M RDF statements about the 30K speeches in 23 languages by 3K speakers in 1K session days that were held in the EU parliament between 1999 and 2014.
  • 61. Modelling debates as events, not documents
      lp:eu/plenary/Session/2013-11
          rdf:type lpv:eu/plenary/Session ; lpv:month "11"^^xsd:gMonth ; lpv:year "2013"^^xsd:gYear
        dc:hasPart / dc:isPartOf
      lp:eu/plenary/SessionDay/2013-11-20
          rdf:type lpv:eu/plenary/SessionDay ; dc:date "2013-11-20"^^xsd:date
        dc:hasPart / dc:isPartOf
      lp:eu/plenary/2013-11-20/AgendaItem_6
          rdf:type lpv:eu/plenary/AgendaItem ; lpv:number "6"^^xsd:integer ; dc:date "2013-11-20"^^xsd:date ; lpv:hasSubsequent lp:eu/plenary/2013-11-20/AgendaItem_7
        dc:hasPart / dc:isPartOf
      lp:eu/plenary/2013-11-20/Speech_103
          rdf:type lpv:eu/plenary/Speech ; lpv:number "103"^^xsd:integer ; dc:date "2013-11-20"^^xsd:date ; lpv:hasSubsequent lp:eu/plenary/2013-11-20/Speech_104

      PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
      PREFIX lp: <http://purl.org/linkedpolitics/>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
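The event hierarchy (session, session day, agenda item, speech) can be generated mechanically from an agenda. The following sketch builds the part-of and subsequent-item triples using the URI patterns and property names visible on the slide; the function names and the input format are assumptions made for illustration.

```python
def part_of(whole, part):
    """Both directions of the containment relation, as on the slide."""
    return [(whole, "dc:hasPart", part), (part, "dc:isPartOf", whole)]


def plenary_triples(day, agenda):
    """day: 'YYYY-MM-DD'; agenda: {agenda item number: [speech numbers]}."""
    session = f"lp:eu/plenary/Session/{day[:7]}"
    session_day = f"lp:eu/plenary/SessionDay/{day}"
    triples = [
        (session, "rdf:type", "lpv:eu/plenary/Session"),
        (session_day, "rdf:type", "lpv:eu/plenary/SessionDay"),
        (session_day, "dc:date", day),
    ]
    triples += part_of(session, session_day)

    items, speeches = [], []
    for item_no, speech_nos in sorted(agenda.items()):
        item = f"lp:eu/plenary/{day}/AgendaItem_{item_no}"
        items.append(item)
        triples += [(item, "rdf:type", "lpv:eu/plenary/AgendaItem"),
                    (item, "lpv:number", item_no)]
        triples += part_of(session_day, item)
        for speech_no in sorted(speech_nos):
            speech = f"lp:eu/plenary/{day}/Speech_{speech_no}"
            speeches.append(speech)
            triples += [(speech, "rdf:type", "lpv:eu/plenary/Speech"),
                        (speech, "lpv:number", speech_no)]
            triples += part_of(item, speech)

    # Chain consecutive agenda items and consecutive speeches.
    for sequence in (items, speeches):
        triples += [(a, "lpv:hasSubsequent", b)
                    for a, b in zip(sequence, sequence[1:])]
    return triples


triples = plenary_triples("2013-11-20", {6: [103], 7: [104]})
for t in triples:
    print(t)
```

Modelling the debate as nested events rather than as one document is what makes it possible to attach dates, order and speakers at each level.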
  • 65. How to relate a speech to the party of the speaker?
      A first model:
        lp:eu/plenary/2009-10-21/Speech_140  lpv:speaker  lp:EUmember_1023
        lp:EUmember_1023  lpv:hasParty  lp:EUParty/SomeParty
      Why is this not a good solution?
      1. A person might be a member of more than one party (at different times).
      2. Since there is no link between a speech and a party, queries for all speeches spoken by the members of a certain party become very complicated.
  • 69. How to relate a speech to the party of the speaker? "20111126"^ xsd:date "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitute lpv:political Function lpv:institution lpv:speaker "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:speaker rdf:type "20111126"^ xsd:date lp:political- Function101 lpv:end "20111126"^ xsd:date lpv:beginning "20071114" ^xsd:date lpv:PoliticalFunction "20090716"^ xsd:date lp:political- Function102 lpv:beginning lpv:end lp:EUmember_1023 lp:political Function lp:eu/plenary/2009-10-21/Speech_140> lpv:role lp:EUCommittee/ Committee_on_Legal_Affairs lp:Role/substitutelp:Role/member lp:EUParty/NI lpv:role lpv:political Function lpv:institutionlpv:institution rdf:type lpv:spokenAs lpv:speaker lpv:spokenAs rdf:type Note: this is another example of the design pattern called n-ary relations or relations as classes.
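The traversal that this slide asks about can be sketched in a few lines of Python over an in-memory list of triples. This is only an illustration under stated assumptions: the node names `ex:function_a` and `ex:function_b` are hypothetical, and which dates, roles and institutions belong to which function node is guessed for the sake of the example, not read off the diagram.

```python
from datetime import date

SPEECH = "lp:eu/plenary/2009-10-21/Speech_140"

# Toy triples loosely modelled on the diagram. The ex: node names and the
# assignment of dates, roles and institutions to nodes are assumptions.
TRIPLES = [
    (SPEECH, "lpv:speaker", "lp:EUmember_1023"),
    ("lp:EUmember_1023", "lpv:politicalFunction", "ex:function_a"),
    ("lp:EUmember_1023", "lpv:politicalFunction", "ex:function_b"),
    ("ex:function_a", "lpv:institution", "lp:EUCommittee/Committee_on_Legal_Affairs"),
    ("ex:function_a", "lpv:role", "lp:Role/substitute"),
    ("ex:function_a", "lpv:beginning", date(2009, 7, 16)),
    ("ex:function_a", "lpv:end", date(2011, 11, 26)),
    ("ex:function_b", "lpv:institution", "lp:EUParty/NI"),
    ("ex:function_b", "lpv:role", "lp:Role/member"),
    ("ex:function_b", "lpv:beginning", date(2007, 11, 14)),
    ("ex:function_b", "lpv:end", date(2011, 11, 26)),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def parties_of_speaker(speech, on_day):
    """Follow speech -> speaker -> political function -> institution,
    keeping only party memberships that were active on the given day."""
    parties = []
    for speaker in objects(speech, "lpv:speaker"):
        for function in objects(speaker, "lpv:politicalFunction"):
            started = all(b <= on_day for b in objects(function, "lpv:beginning"))
            not_ended = all(on_day <= e for e in objects(function, "lpv:end"))
            if not (started and not_ended):
                continue
            for institution in objects(function, "lpv:institution"):
                if institution.startswith("lp:EUParty/"):
                    parties.append(institution)
    return parties


print(parties_of_speaker(SPEECH, date(2009, 10, 21)))
```

Because the membership is modelled as a node of its own (the n-ary relation pattern), the begin and end dates can be attached to it, which is what makes the date filter possible.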
  • 70. Step 2: create links to external data sources
  • 72. Step 2: create links to external data sources (links made by the EC)
  • 73. Linking Members of Parliament to Wikipedia / DBpedia: how?
  • 74. Linking Members of Parliament to Wikipedia / DBpedia
  • 75. Linking Members of Parliament to Wikipedia / DBpedia • String matching is the most important feature in the linking process. • "Nearly all [alignment systems] use a string similarity metric" [12]. • Stop-word removal and stemming are not helpful, and neither is using WordNet synonyms [12]. Example: http://www.dbpedia.org/page/Judith_Sargentini [12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
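As a rough illustration of string-based linking, the sketch below compares a member's name with candidate DBpedia resource labels. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity metric; the slides do not say which metric the project used. The candidate list and the 0.9 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case and replace underscores with spaces.
    Deliberately no stemming or stop-word removal."""
    return " ".join(label.replace("_", " ").lower().split())


def similarity(a, b):
    """Similarity between two labels, from 0.0 (disjoint) to 1.0 (identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(name, candidates, threshold=0.9):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


# Hypothetical candidate labels, e.g. the last path segment of DBpedia URIs.
candidates = ["Judith_Sargentini", "Judith_Butler", "Catherine_Stihler"]

print(best_match("Judith Sargentini", candidates))
print(best_match("Jane Doe", candidates))
```

A high threshold keeps precision up at the cost of missing spelling variants; in practice the cut-off would have to be tuned against a manually checked sample.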
  • 76. Example query 1: speeches that contain a certain keyword Query: all speeches that contain the phrase "open data". Results: "... So let us go for open data, let us go for utilisation of all the instruments available to that end! ..." "... but there too governments are encouraging the use of open data to increase transparency, accountability and citizen participation ..." "... We already have many open data projects in the Member States and local authorities ..."
  • 77. Example 2: speeches that contain a certain keyword by date [Plot: "Slovenia" in the plenary meetings of the European Parliament; x-axis: year (1999 to 2013), y-axis: number of mentions (0 to 100).]
  • 79. Example 2: speeches that contain a certain keyword by date [Plot: mentions of 'human rights'; x-axis: dates (1999 to 2013), y-axis: frequency (0 to 800).]
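The aggregation behind such a plot can be sketched as a count of speeches per year that contain the keyword. The speeches below are invented placeholders, not data from the Talk of Europe set, and plain case-insensitive substring matching is an assumption about how "contains a keyword" is decided.

```python
from collections import Counter

# Invented (date, text) pairs standing in for real speeches.
speeches = [
    ("1999-09-15", "We must defend human rights everywhere."),
    ("1999-11-02", "The budget debate continues."),
    ("2004-05-04", "Human rights and enlargement go together."),
    ("2004-06-10", "On human rights, the Council was silent."),
]


def speeches_per_year(speeches, keyword):
    """Count, per year, the speeches whose text contains the keyword
    (case-insensitive substring match)."""
    needle = keyword.lower()
    counts = Counter()
    for iso_date, text in speeches:
        if needle in text.lower():
            counts[iso_date[:4]] += 1  # first four characters of an ISO date are the year
    return dict(sorted(counts.items()))


print(speeches_per_year(speeches, "human rights"))
```

Grouping on another attribute of the speaker, such as the country of representation, follows the same pattern with a different key.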
  • 80. Example 3: speeches that contain a certain keyword by country [Bar chart: mentions of 'human rights' by country; categories: AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK; value axis: 0 to 7000.]
  • 81. Example 4: the number of speeches per EU country
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT ?c (COUNT(?c) AS ?count)
      WHERE {
        ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech> .
        ?x <http://purl.org/linkedpolitics/vocabulary#speaker> ?p .
        ?p <http://purl.org/linkedpolitics/vocabulary#countryOfRepresentation> ?c
      }
      GROUP BY ?c
      LIMIT 50
  • 82. Example 5: include data from an external source Query: MEPs that were born outside Europe. Members of Parliament are linked to DBpedia, which contains information on birthplace, birth date, schools, careers, residence, family, etc.
  • 84. Intermezzo: one-question quiz. Reasoning on the Web of Data. Question: What can we conclude from this graph? A. Stihler is a member of exactly 3 parties B. Stihler is a member of at least 3 parties C. Stihler is a member of at most 3 parties D. None of the above E. All of the above F. Other, namely ... [Graph: <http://purl.org/linkedpolitics/EUmember_4545> foaf:name "Catherine Stihler"; :memberOf <http://purl.org/linkedpolitics/EUParty/PES>; :memberOf <http://dbpedia.org/resource/Party_of_European_Socialists>; :memberOf <http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats>.]
  • 85. Results • An open data set of EU parliamentary debates, • with links to other sources on the Web of Data, • accessible through a SPARQL endpoint.
  • 86. Reflection: to what extent can we now answer these questions? How did the debate about the financial crisis in Greece develop? Which political event has attracted most media attention? What are the differences between different media? Has the coverage changed over time?
  • 87. Reflection: to what extent can we now answer these questions? How did the debate about the financial crisis in Greece develop? Which political event has attracted most media attention? What are the differences between different media? Has the coverage changed over time? We can, but: • What is the influence of the selection of newspapers available at the National Library? • What was the quality of the digitisation process (OCR)? • How good is our linking approach (based on automatically detected entities and topics)? • How much can we trust the quality of external sources? ➡ How to handle these uncertainties is one of our research questions. We call this Tool Criticism.
  • 88. Research directions at CWI Transparent, reproducible analysis of large volumes of connected, heterogeneous, multimodal data. 1. How do we automatically link heterogeneous datasets? 2. How do we interpret links between datasets of different quality and certainty? 3. How do we handle the fact that knowledge evolves? 4. How do we design interfaces that allow scholars to study the datasets • including the links between them? • while assessing the reliability of the findings?
  • 89. Research directions at CWI Transparent, reproducible analysis of large volumes of connected, heterogeneous, multimodal data. 1. How do we automatically link heterogeneous datasets? 2. How do we interpret links between datasets of different quality and certainty? 3. How do we handle the fact that knowledge evolves? 4. How do we design interfaces that allow scholars to study the datasets • including the links between them? • while assessing the reliability of the findings? Data Science - Big Data - Web of Data
  • 90. PoliMedia demo: http://polimedia.nl/ PoliMedia project video: https://youtu.be/u24oRCj7xrQ Talk of Europe project: http://talkofeurope.eu/ Talk of Europe data: purl.org/linkedpolitics Talk of Europe project video: https://youtu.be/GxA53gkCe0o My website: http://homepages.cwi.nl/~hollink/ A. van Aggelen, L. Hollink, M. Kemman, M. Kleppe & H. Beunders. The debates of the European Parliament as Linked Open Data. Semantic Web Journal. In press, 2016. M. Kleppe, L. Hollink, J. Oomen, M. Kemman, D. Juric, J. Blom, H. Beunders. PoliMedia - Improving the Analyses of Radio & Newspaper coverage of Political Debates. First prize winner of the LinkedUp Veni Competition, presented at the Open Knowledge Conference (OKCon), Geneva, September 2013. I'd be happy to answer any questions!