Connecting political data to media data

Laura Hollink
VU University Amsterdam
Web & Media group
ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’
February 18, 2014

Laura Hollink

Damir Juric
Geert-Jan Houben

Funded by CLARIN-NL

Martijn Kleppe
Max Kemman
Henri Beunders

Johan Oomen
Jaap Blom

Questions we want to answer
• Which events have attracted
a lot of media attention?
• What are the differences
between different media?
E.g. in different newspapers,
or newspapers vs. radio
bulletins?
• Has the coverage changed
over time?
• How are the events visualized
(photos, layout of newspaper,
etc.)?

Transcriptions of all 9,294
meetings of the Dutch
parliament between
1945-1995, consisting of
1,208,903 speeches.

Archives of hundreds of
newspapers: a very large number
of issues, with tens of
millions of articles,
from 1618-1995.
(We only use 1945-1995.)

Roughly 1.8 million news
bulletins from
1937-1984.
(We only use 1945-1995.)

Step 1: Translate the Dutch parliamentary debates
to the standard structured web format RDF
(XML by the War in Parliament project)

[Figure: example RDF graph for one debate.
Classes: Debate, PartOfDebate, DebateContext, Speech, Politician, Party.
Properties: rdf:type, dc:title, dc:date, dc:language, dc:publisher, dc:id, dc:source, hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText, hasSpokenText, hasSpeaker, sem:hasActor, hasRole, hasParty, hasFullName, hasAcronym, foaf:firstName, foaf:lastName, rdfs:label, coveredIn.
Example debate: nl.proc.sgd.d.194519460000002, "Handelingen Verenigde Vergadering..." (Proceedings of the Joint Session...), 1945-11-20, Dutch, with parts .1, .1.1, .1.2, .1.3 and .2.
Example texts: "De voorzitter opent de vergadering…" (The chairman opens the meeting…); "Mijnheer de Voorzitter, de Commissie van …" (Mr. Chairman, the Committee of …).
Example speaker: Speaker_00064, Joannes Antonius James Barge, member_of_parliament, Party_kvp (Katholieke Volkspartij, KVP).
Other identifiers shown: nl.proc.sgd.d.19720000002.
Source links shown: http://statengeneraaldigitaal.nl/, http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002, http://resolver.politicalmashup.nl/nl.m.00064, http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf, http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr]

Modeling the debates as events
• An event has a date, a
location, actors, and
possibly sub-events.
• We build on the Simple
Event Model (SEM).

• links to the original sources
• reusing existing
vocabularies

• the part-of structure and chronological order of the debates
• the different roles and parties that a speaker can have in his/her career
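
The modeling choices listed here can be illustrated with a minimal sketch in Python that holds a few triples as plain tuples. The identifiers and property names come from the slide's example graph, but which subject each property attaches to is inferred from the figure and should be read as an assumption. The names `TRIPLES`, `objects`, and `parts_of` are hypothetical helpers, not part of the project's code.

```python
# A few triples from the example graph, as (subject, predicate, object).
# Assumption: the subject/predicate pairings are inferred from the slide's figure.
DEBATE = "nl.proc.sgd.d.194519460000002"

TRIPLES = [
    (DEBATE, "rdf:type", "Debate"),
    (DEBATE, "dc:date", "1945-11-20"),
    (DEBATE, "dc:language", "Dutch"),
    (DEBATE, "hasPart", DEBATE + ".1"),
    (DEBATE, "hasPart", DEBATE + ".2"),
    (DEBATE + ".1", "rdf:type", "PartOfDebate"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1.2", "rdf:type", "Speech"),
    (DEBATE + ".1.2", "hasSpeaker", "Speaker_00064"),
    (DEBATE + ".1.2", "hasSubsequentSpeech", DEBATE + ".1.3"),
    ("Speaker_00064", "hasParty", "Party_kvp"),
    ("Party_kvp", "hasAcronym", "KVP"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def parts_of(node):
    """Return all direct and indirect parts of a node, depth-first via hasPart."""
    found = []
    for part in objects(node, "hasPart"):
        found.append(part)
        found.extend(parts_of(part))
    return found


print(parts_of(DEBATE))
```

Following `hasPart` recursively recovers the part-of structure, while `hasSubsequentSpeech` would give the chronological order in the same way.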

Step 2: Linking speeches in the debates to the
newspaper articles that cover them
We created a linking method that addresses two challenges:
1. How to link documents that are so different in nature?
2. Can we use the structure of the debates: people, chronological
order of speeches, introductions to each new topic, etc.?
[Figure: linking pipeline. The debates provide the name of the speaker and the date of the debate, which are used to search the newspaper archive for candidate articles. Topics and named entities detected in the speeches are used to create queries, which are used to rank the candidate articles. The output is links between speeches and articles.]


Intuition 1: The name of the speaker should
appear in the article and the article should
be published within a week of the debate
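
Intuition 1 can be sketched as a simple filter. This is a minimal illustration, not the project's implementation: the `Article` type, the function name, and the sample articles are made up, and "within a week" is read here as 0 to 7 days after the debate, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    url: str
    published: date
    text: str


def candidate_articles(speaker_last_name, debate_date, articles, window_days=7):
    """Keep articles that mention the speaker and appear within the window.

    Assumption: "within a week" means 0 to 7 days after the debate.
    """
    latest = debate_date + timedelta(days=window_days)
    name = speaker_last_name.lower()
    return [
        a for a in articles
        if debate_date <= a.published <= latest and name in a.text.lower()
    ]


# Made-up articles, for illustration only.
articles = [
    Article("https://example.org/a1", date(1945, 11, 21), "Mr. Barge addressed parliament."),
    Article("https://example.org/a2", date(1945, 12, 15), "Barge was mentioned again."),
    Article("https://example.org/a3", date(1945, 11, 22), "Weather report for Amsterdam."),
]

candidates = candidate_articles("Barge", date(1945, 11, 20), articles)
print([a.url for a in candidates])
```

Only the first article passes both checks: the second falls outside the window and the third does not mention the speaker.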
Intuition 2: The more the article and the
speech overlap in terms of topics and
named entities, the more they are related.
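
One simple way to express Intuition 2 is to score each candidate article by the fraction of the speech's topic terms and named entities that occur in its text. The slides do not specify the actual scoring function, so the sketch below is an assumed stand-in; the function names and sample data are hypothetical.

```python
def overlap_score(article_text, topics, named_entities):
    """Fraction of the speech's topic terms and named entities found in the article."""
    terms = {t.lower() for t in topics} | {e.lower() for e in named_entities}
    if not terms:
        return 0.0
    text = article_text.lower()
    return sum(1 for term in terms if term in text) / len(terms)


def rank_candidates(candidates, topics, named_entities):
    """Sort (url, text) candidates by descending overlap score."""
    scored = [
        (overlap_score(text, topics, named_entities), url)
        for url, text in candidates
    ]
    return sorted(scored, reverse=True)


# Made-up speech features and candidate articles, for illustration only.
topics = ["housing", "budget"]
named_entities = ["Barge", "Rotterdam"]
candidates = [
    ("https://example.org/b", "Barge visited Utrecht."),
    ("https://example.org/a", "Barge spoke about the housing budget for Rotterdam."),
]

ranking = rank_candidates(candidates, topics, named_entities)
print(ranking)
```

Substring matching keeps the sketch short; a real system would more likely match on tokens or use a search engine's relevance score.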

Evaluation: what do we use to rank the candidate
articles?
• Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters,
inter-rater agreement κ = 0.5
• Compare text of candidate articles to:
  • Setting 1: Named Entities in speech
  • Setting 2: Named Entities + Topics in speech
  • Setting 3: Named Entities + Topics in speech and larger part-of-debate

Score                                Setting 1   Setting 2   Setting 3
I don’t know                         0.14        0.15        0.08
0 - unrelated                        0.38        0.23        0.12
1 - related                          0.29        0.36        0.36
2 - explicit mention of the debate   0.19        0.26        0.44
1+2                                  0.48        0.62        0.80
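
The "1+2" row is the sum of the "related" and "explicit mention" rows for each setting. The short check below recomputes it from the reported fractions; the variable names are illustrative.

```python
# Fractions of rated pairs per score, as reported in the evaluation table.
related = {"Setting 1": 0.29, "Setting 2": 0.36, "Setting 3": 0.36}
explicit = {"Setting 1": 0.19, "Setting 2": 0.26, "Setting 3": 0.44}

# Share of pairs rated either "related" or "explicit mention of the debate".
combined = {s: round(related[s] + explicit[s], 2) for s in related}
print(combined)
```

This gives 0.48, 0.62 and 0.80 for Settings 1, 2 and 3, so adding topics and the larger part-of-debate context each increases the share of relevant links.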

Results
• An open data set of Dutch parliamentary debates,
• with almost 3 million links between 450,000 speeches and the URLs of 1.5 million
newspaper articles and radio bulletins at the National Library,
• accessible through a Web demonstrator and through a SPARQL endpoint.

SPARQL endpoint
• A service to query a knowledge
base using the SPARQL query
language.

“All speeches with more
than 60 associated news
items.”
SELECT ?speech ?no_news_items
WHERE {
  {
    # Count the news items linked to each speech.
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
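
A query like this is typically sent to an endpoint as an HTTP GET request with the query text in a `query` parameter, as defined by the SPARQL Protocol. The sketch below only builds the request URL and does not contact any server. The endpoint address is a placeholder, since the slides do not give the real one.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint URL; the slides do not give the real address.
ENDPOINT = "https://example.org/sparql"

QUERY = """
SELECT ?speech ?no_news_items
WHERE {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
"""


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client, usually with an `Accept` header asking for a result format such as `application/sparql-results+json`.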

Reflection: to what extent can we answer these
questions?
• Which events have attracted
a lot of media attention?
• What are the differences
between different media?
E.g. in different newspapers,
or newspapers vs. radio
bulletins?
• Has the coverage changed
over time?
• How are the events visualized
(photos, layout of newspaper,
etc.)?

Future work
• More types of links
• From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”,
“talksAbout”
• More types of media

• More types of (political) events.

Project ‘Talk of Europe / Traveling Clarin Campus’
2014-2015
Funded by CLARIN-ERIC

From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer,
Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)

Plans of ‘ToE/TTC’
1. Publish proceedings of the EU parliamentary debates in RDF
• hosted by DANS
2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we
invite international partners to work with the data.
3. In collaboration with international partners:
• enrich with annotations, e.g. topics, structured data about people, parties,
etc.
• link to national datasets, e.g. media or national parliaments
