Connecting political data to media data
Laura Hollink
VU University Amsterdam
Web & Media group
ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’
February 18, 2014
Laura Hollink
Damir Juric
Geert-Jan Houben
Martijn Kleppe
Max Kemman
Henri Beunders
Johan Oomen
Jaap Blom
Funded by Clarin-NL
Questions we want to answer
• Which events have attracted
a lot of media attention?
• What are the differences
between different media?
E.g. in different newspapers,
or newspapers vs. radio
bulletins?
• Has the coverage changed
over time?
• How are the events visualized
(photos, layout of newspaper,
etc.)?
Transcriptions of all 9,294
meetings of the Dutch
parliament between
1945-1995, consisting of
1,208,903 speeches.
Archives of hundreds of
newspapers: a large number
of issues containing tens
of millions of articles,
published between 1618 and 1995.
(We only use 1945-1995)
Roughly 1.8 million news
bulletins between
1937 and 1984
(We only use 1945-1995)
PoliMedia methods
Step 1: Translate the Dutch parliamentary debates
to the standard structured web format RDF
[Figure: RDF graph of one debate. Source XML by the War in Parliament project. Recoverable content:]
• Debate nl.proc.sgd.d.194519460000002 (rdf:type Debate), described with dc:id, dc:source (twice), dc:publisher, dc:language and dc:date. Values shown: nl.proc.sgd.d.19720000002, "Handelingen Verenigde Vergadering...", Dutch, 1945-11-20, and the URLs
  http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002
  http://statengeneraaldigitaal.nl/
  http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf
• Parts linked with hasPart: nl.proc.sgd.d.194519460000002.1 (PartOfDebate), with nl.proc.sgd.d.194519460000002.1.1 (DebateContext), nl.proc.sgd.d.194519460000002.1.2 (Speech) and nl.proc.sgd.d.194519460000002.1.3. Order is given by hasSubsequentSpeech and by hasSubsequentPartOfDebate (to nl.proc.sgd.d.194519460000002.2).
• Text literals: hasText "De voorzitter opent de vergadering…" and hasSpokenText "Mijnheer de Voorzitter, de Commissie van …".
• Speaker and party: sem:hasActor, hasSpeaker, hasParty, hasRole. Nodes shown: Speaker_00064 (rdf:type Politician; foaf:firstName, foaf:lastName, rdfs:label; Joannes Antonius James Barge; dc:source http://resolver.politicalmashup.nl/nl.m.00064), Party_kvp (rdf:type Party; hasAcronym KVP; hasFullName Katholieke Volkspartij), and the role member_of_parliament.
• Link to media: coveredIn http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr
Modeling the debates as events
• An event has a date, a
location, actors, and
possibly sub-events.
• We build on the Simple
Event Model (SEM).
• Links to the original sources
• Reusing existing
vocabularies
[Figure: the Debate node nl.proc.sgd.d.194519460000002 with its dc:id, dc:source, dc:publisher, dc:language, dc:date and dc:title metadata. The links to the original sources are
http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002
http://statengeneraaldigitaal.nl/
http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf]
• The part-of structure and
chronological order of the
debates.
[Figure: the hasPart hierarchy from the Debate (dc:title "Handelingen Verenigde Vergadering...") to nl.proc.sgd.d.194519460000002.1 (PartOfDebate) and its parts .1.1 (DebateContext, hasText "De voorzitter opent de vergadering…"), .1.2 (Speech, hasSpokenText "Mijnheer de Voorzitter, de Commissie van …") and .1.3, ordered by hasSubsequentSpeech and by hasSubsequentPartOfDebate (to nl.proc.sgd.d.194519460000002.2).]
• The different roles and parties
that a speaker can have in his/her
career.
[Figure: the Speech nl.proc.sgd.d.194519460000002.1.2 with sem:hasActor, hasSpeaker (Speaker_00064, rdf:type Politician, Joannes Antonius James Barge), hasParty (Party_kvp, rdf:type Party, hasAcronym KVP, hasFullName Katholieke Volkspartij) and hasRole (member_of_parliament), plus a coveredIn link to http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr]
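The part-of structure and chronological order described for Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples rather than a real RDF store, and the function names and the ".1" suffix for the single part of the debate are hypothetical. The property and class names (hasPart, hasSubsequentSpeech, Debate, PartOfDebate, Speech) are the ones shown on the slides.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def debate_triples(debate_id: str, speech_ids: List[str]) -> List[Triple]:
    """Emit part-of and chronological-order triples for one debate.

    Assumption: the debate has a single part, identified as '<debate_id>.1'.
    """
    part_id = f"{debate_id}.1"
    triples: List[Triple] = [
        (debate_id, "rdf:type", "Debate"),
        (debate_id, "hasPart", part_id),
        (part_id, "rdf:type", "PartOfDebate"),
    ]
    for speech_id in speech_ids:
        triples.append((part_id, "hasPart", speech_id))
        triples.append((speech_id, "rdf:type", "Speech"))
    # Chronological order: each speech points at the speech that follows it.
    for current, following in zip(speech_ids, speech_ids[1:]):
        triples.append((current, "hasSubsequentSpeech", following))
    return triples


def speeches_in_order(triples: List[Triple], first_speech: str) -> List[str]:
    """Follow hasSubsequentSpeech links starting from the first speech."""
    following: Dict[str, str] = {
        s: o for s, p, o in triples if p == "hasSubsequentSpeech"
    }
    ordered = [first_speech]
    while ordered[-1] in following:
        ordered.append(following[ordered[-1]])
    return ordered
```

Storing the order as explicit links, rather than relying on identifiers, lets a query walk a debate speech by speech.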
Step 2: Linking speeches in the debate to the
newspaper articles that cover them
We created a linking method to deal with our two challenges:
1. How to link documents that are so different in nature?
2. Can we use the structure of the debates: people, chronological
order of speeches, introductions to each new topic, etc.?
[Figure: the linking pipeline]
Debates → Detect topics in speeches → Topics
Debates → Detect Named Entities in speeches → Named Entities
Name of speaker, Date of debate → Create queries → Queries
Queries → Search newspaper archive → Candidate articles
Candidate articles, Topics, Named Entities → Rank candidate articles → Links between speeches and articles
Intuition 1: The name of the speaker should
appear in the article and the article should
be published within a week of the debate
Intuition 2: the more the article and the
speech overlap in terms of topics and
named entities, the more they are related.
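The two intuitions can be sketched as a filter followed by a ranking step. This is a minimal illustration, not the project's implementation: the class and function names are hypothetical, "within a week" is assumed to mean zero to seven days after the debate, and overlap is approximated by simple substring matching of the speech's topics and named entities against the article text.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Set, Tuple


@dataclass
class Speech:
    speaker_name: str
    debate_date: date
    terms: Set[str]  # topics and named entities detected in the speech


@dataclass
class Article:
    url: str
    published: date
    text: str


def is_candidate(speech: Speech, article: Article, window_days: int = 7) -> bool:
    """Intuition 1: the speaker is named and the article appears within a week."""
    delta = article.published - speech.debate_date
    in_window = timedelta(0) <= delta <= timedelta(days=window_days)
    return in_window and speech.speaker_name.lower() in article.text.lower()


def overlap_score(speech: Speech, article: Article) -> float:
    """Intuition 2: fraction of the speech's terms that occur in the article."""
    if not speech.terms:
        return 0.0
    text = article.text.lower()
    hits = sum(1 for term in speech.terms if term.lower() in text)
    return hits / len(speech.terms)


def rank_candidates(speech: Speech, articles: List[Article]) -> List[Tuple[float, Article]]:
    """Keep candidate articles only, then sort them by descending overlap."""
    candidates = [a for a in articles if is_candidate(speech, a)]
    scored = [(overlap_score(speech, a), a) for a in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Separating the cheap filter from the scoring step mirrors the pipeline: the query narrows the archive to a small candidate set, and only those candidates are compared in detail.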
Evaluation: what do we use to rank the candidate
articles?
• Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5
• Compare text of candidate articles to:
• Setting 1: Named Entities in speech
• Setting 2: Named Entities + Topics in speech
• Setting 3: Named Entities + Topics in speech and larger part-of-debate

Score                               Setting 1  Setting 2  Setting 3
I don’t know                        0.14       0.15       0.08
0 - unrelated                       0.38       0.23       0.12
1 - related                         0.29       0.36       0.36
2 - explicit mention of the debate  0.19       0.26       0.44
1+2                                 0.48       0.62       0.80
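As a quick arithmetic check on the table, the "1+2" row is the sum of the "related" and "explicit mention" rows, and each setting's four score fractions add up to 1. The short Python sketch below recomputes both from the figures in the table. The dictionary keys are shorthand labels introduced here for illustration.

```python
# Fractions of rated pairs per score, copied from the evaluation table.
scores = {
    "Setting 1": {"unknown": 0.14, "unrelated": 0.38, "related": 0.29, "explicit": 0.19},
    "Setting 2": {"unknown": 0.15, "unrelated": 0.23, "related": 0.36, "explicit": 0.26},
    "Setting 3": {"unknown": 0.08, "unrelated": 0.12, "related": 0.36, "explicit": 0.44},
}

# "1+2": pairs rated as related or as explicitly mentioning the debate.
combined = {
    name: round(dist["related"] + dist["explicit"], 2)
    for name, dist in scores.items()
}

# Each column of the table should account for all rated pairs.
totals = {name: round(sum(dist.values()), 2) for name, dist in scores.items()}
```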
Results
• An open data set of Dutch parliamentary debates,
• with almost 3 million links between 450,000 speeches and the URLs of 1.5
million newspaper articles and radio bulletins at the National Library,
• accessible through a Web demonstrator and through a SPARQL endpoint.
Demo
SPARQL endpoint
• A service to query a knowledge
base using the SPARQL query
language.
“All speeches with more
than 60 associated news
items.”
SELECT ?speech ?no_news_items {
  { SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech }
  FILTER (?no_news_items > 60)
}
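A query like this is typically sent to an endpoint as an HTTP GET request with the query text in a `query` parameter, as defined by the SPARQL Protocol. The Python sketch below only builds such a request and does not send it. The endpoint URL is a placeholder, since the slides do not give the real address, and the function name is hypothetical. The query uses a consistent `?no_news_items` variable name throughout.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: the slides do not state the address of the SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?speech ?no_news_items {
  { SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE { ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news . }
    GROUP BY ?speech }
  FILTER (?no_news_items > 60)
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would return the result bindings, assuming the endpoint is reachable and supports JSON results.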
Reflection: to what extent can we answer these
questions?
• Which events have attracted
a lot of media attention?
• What are the differences
between different media?
E.g. in different newspapers,
or newspapers vs. radio
bulletins?
• Has the coverage changed
over time?
• How are the events visualized
(photos, layout of newspaper,
etc.)?
Future work
• More types of links
• From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”,
“talksAbout”
• More types of media
• More types of (political) events.
Project ‘Talk of Europe / Traveling Clarin Campus’
2014-2015
Funded by CLARIN-ERIC
From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer,
Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)
Plans of ‘ToE/TTC’
1. Publish proceedings of the EU parliamentary debates in RDF
• hosted by DANS
2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we
invite international partners to work with the data.
3. In collaboration with international partners:
• enrich with annotations, e.g. topics, structured data about people, parties,
etc.
• link to national datasets, e.g. media or national parliaments
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
 
the fundamental unit of life CBSE class 9.pptx
the fundamental unit of life CBSE class 9.pptxthe fundamental unit of life CBSE class 9.pptx
the fundamental unit of life CBSE class 9.pptx
 
Lattice Defects in ionic solid compound.pptx
Lattice Defects in ionic solid compound.pptxLattice Defects in ionic solid compound.pptx
Lattice Defects in ionic solid compound.pptx
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
Analysis of Polygenic Traits (GPB-602)
Analysis of Polygenic Traits (GPB-602)Analysis of Polygenic Traits (GPB-602)
Analysis of Polygenic Traits (GPB-602)
 
seed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdfseed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdf
 
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptxSynopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
Firoozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed ProfessorFiroozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed Professor
 
Flow chart.pdf LIFE SCIENCES CSIR UGC NET CONTENT
Flow chart.pdf  LIFE SCIENCES CSIR UGC NET CONTENTFlow chart.pdf  LIFE SCIENCES CSIR UGC NET CONTENT
Flow chart.pdf LIFE SCIENCES CSIR UGC NET CONTENT
 
Mites,Slug,Snail_Infesting agricultural crops.pdf
Mites,Slug,Snail_Infesting agricultural crops.pdfMites,Slug,Snail_Infesting agricultural crops.pdf
Mites,Slug,Snail_Infesting agricultural crops.pdf
 

Connecting political data to media data

  • 1. Connecting political data to media data Laura Hollink VU University Amsterdam Web & Media group ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’ February 18, 2014
  • 2. Laura Hollink, Damir Juric, Geert-Jan Houben, Martijn Kleppe, Max Kemman, Henri Beunders, Johan Oomen, Jaap Blom. Funded by Clarin-NL
  • 5. Questions we want to answer • Which events have attracted a lot of media attention? • What are the differences between media, e.g. between different newspapers, or between newspapers and radio bulletins? • Has the coverage changed over time? • How are the events visualized (photos, layout of newspaper, etc.)?
  • 7. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches.
  • 8. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches. Archives of hundreds of newspapers, with tens of millions of articles published between 1618-1995. (We only use 1945-1995.)
  • 9. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches. Roughly 1.8 million news bulletins between 1937-1984. (We only use 1945-1995.) Archives of hundreds of newspapers, with tens of millions of articles published between 1618-1995. (We only use 1945-1995.)
  • 11. Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF. [Diagram: RDF graph of debate nl.proc.sgd.d.194519460000002, covering its metadata, its parts and speeches, and the speaker and party of one speech; slides 12-14 show the same graph piece by piece.] XML by War in Parliament Project
  • 12. Modeling the debates as events • An event has a date, a location, actors, and possibly sub-events. • We build on the Simple Event Model (SEM). • Links to the original sources • Reusing existing vocabularies [Diagram: node nl.proc.sgd.d.194519460000002 with rdf:type Debate and the properties dc:id, dc:source (twice), dc:publisher, dc:language, dc:date and dc:title. Values shown: http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002, http://statengeneraaldigitaal.nl/, http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf, nl.proc.sgd.d.19720000002, "Handelingen Verenigde Vergadering...", Dutch, 1945-11-20.]
  • 13. • The part-of structure and chronological order of the debates. [Diagram: debate nl.proc.sgd.d.194519460000002 (dc:title "Handelingen Verenigde Vergadering...") hasPart nl.proc.sgd.d.194519460000002.1 (rdf:type PartOfDebate), which hasSubsequentPartOfDebate nl.proc.sgd.d.194519460000002.2. The part in turn hasPart .1.1 (rdf:type DebateContext, hasText "De voorzitter opent de vergadering…"), .1.2 (rdf:type Speech, hasSpokenText "Mijnheer de Voorzitter, de Commissie van …") and .1.3, which is linked from .1.2 by hasSubsequentSpeech.]
  • 14. • The different roles and parties that a speaker can have in his/her career. [Diagram: speech nl.proc.sgd.d.194519460000002.1.2 (rdf:type Speech, hasSpokenText "Mijnheer de Voorzitter, de Commissie van …") is coveredIn http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr and has a sem:hasActor node with hasSpeaker Speaker_00064, hasParty Party_kvp and hasRole member_of_parliament. Party_kvp: rdf:type Party, hasAcronym KVP, hasFullName Katholieke Volkspartij. Speaker_00064: rdf:type Politician, with foaf:firstName, foaf:lastName and rdfs:label properties (values shown: "Joannes Antonius James Barge", "Barge").]
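
The event structure on slides 12-14 can be sketched in a few lines of Python. This is only an illustration, not the project's actual conversion code: the property names (hasPart, hasSubsequentSpeech, hasSpeaker, and so on) are taken from the slides, while the class names, the "#actor" node identifier, and the second speaker "Speaker_X" are hypothetical, and namespace prefixes are omitted because the slides do not show them.

```python
from dataclasses import dataclass, field


@dataclass
class Speech:
    speech_id: str
    speaker: str
    party: str
    role: str
    spoken_text: str


@dataclass
class PartOfDebate:
    part_id: str
    speeches: list = field(default_factory=list)


def debate_triples(debate_id, date, parts):
    """Return (subject, predicate, object) tuples for one debate."""
    triples = [(debate_id, "rdf:type", "Debate"), (debate_id, "dc:date", date)]
    for i, part in enumerate(parts):
        triples.append((debate_id, "hasPart", part.part_id))
        triples.append((part.part_id, "rdf:type", "PartOfDebate"))
        if i + 1 < len(parts):
            # Chronological order between parts of the debate.
            nxt = parts[i + 1].part_id
            triples.append((part.part_id, "hasSubsequentPartOfDebate", nxt))
        for j, s in enumerate(part.speeches):
            # Hypothetical identifier for the per-speech actor node, which
            # ties a speaker to the party and role held at that moment.
            actor = s.speech_id + "#actor"
            triples += [
                (part.part_id, "hasPart", s.speech_id),
                (s.speech_id, "rdf:type", "Speech"),
                (s.speech_id, "hasSpokenText", s.spoken_text),
                (s.speech_id, "sem:hasActor", actor),
                (actor, "hasSpeaker", s.speaker),
                (actor, "hasParty", s.party),
                (actor, "hasRole", s.role),
            ]
            if j + 1 < len(part.speeches):
                # Chronological order between speeches in the same part.
                nxt = part.speeches[j + 1].speech_id
                triples.append((s.speech_id, "hasSubsequentSpeech", nxt))
    return triples


base = "nl.proc.sgd.d.194519460000002"
part = PartOfDebate(base + ".1", [
    Speech(base + ".1.2", "Speaker_00064", "Party_kvp",
           "member_of_parliament", "Mijnheer de Voorzitter, ..."),
    # Placeholder speaker; the slides do not say who gave speech .1.3.
    Speech(base + ".1.3", "Speaker_X", "Party_kvp",
           "member_of_parliament", "..."),
])
result = debate_triples(base, "1945-11-20", [part])
```

Putting party and role on a per-speech actor node, rather than on the politician, is what lets the same person appear with different parties or roles at different points in time.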
  • 15. Step 2: Linking speeches in the debate to the newspaper articles that cover them. We created a linking method to deal with our two challenges: 1. How to link documents that are so different in nature? 2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.? [Pipeline diagram: Debates → detect topics in speeches and detect named entities in speeches → create queries (from topics, named entities, name of speaker, date of debate) → search newspaper archive → candidate articles → rank candidate articles → links between speeches and articles.]
  • 16. Step 2: Linking speeches in the debate to the newspaper articles that cover them. [Pipeline diagram as on slide 15.] Intuition 1: The name of the speaker should appear in the article, and the article should be published within a week of the debate.
  • 17. Step 2: Linking speeches in the debate to the newspaper articles that cover them. [Pipeline diagram as on slide 15.] Intuition 1: The name of the speaker should appear in the article, and the article should be published within a week of the debate. Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
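
The two intuitions can be sketched as a filter followed by a ranking. This is a minimal sketch under stated assumptions, not the PoliMedia implementation: the slides do not specify the scoring function, so a plain count of overlapping terms stands in for it; "within a week" is read here as zero to seven days after the debate; and all class names, function names, and example data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Speech:
    speaker_name: str
    debate_date: date
    topics: set
    named_entities: set


@dataclass
class Article:
    url: str
    published: date
    text: str


def is_candidate(speech, article, window_days=7):
    """Intuition 1: speaker name in the text, published within the window."""
    delay = article.published - speech.debate_date
    in_window = timedelta(0) <= delay <= timedelta(days=window_days)
    return in_window and speech.speaker_name.lower() in article.text.lower()


def overlap_score(speech, article):
    """Intuition 2 (assumed scoring): count topics and entities in the text."""
    text = article.text.lower()
    terms = speech.topics | speech.named_entities
    return sum(1 for term in terms if term.lower() in text)


def link(speech, articles):
    """Return candidate articles, best match first."""
    candidates = [a for a in articles if is_candidate(speech, a)]
    return sorted(candidates, key=lambda a: overlap_score(speech, a),
                  reverse=True)


# Made-up example data, for illustration only.
speech = Speech("Barge", date(1945, 11, 20),
                topics={"begroting"}, named_entities={"Commissie"})
articles = [
    Article("urn:a", date(1945, 11, 21), "Barge sprak over de begroting."),
    Article("urn:b", date(1945, 11, 22),
            "Barge en de Commissie bespraken de begroting."),
    Article("urn:c", date(1945, 12, 20), "Barge over de begroting."),
    Article("urn:d", date(1945, 11, 21), "Niets over het debat."),
]
ranked = [a.url for a in link(speech, articles)]
```

Here "urn:c" is dropped for falling outside the date window and "urn:d" for not naming the speaker; the remaining two are ordered by term overlap.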
  • 18. Evaluation: what do we use to rank the candidate articles? • Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5 • Compare text of candidate articles to: • Setting 1: Named Entities in speech • Setting 2: Named Entities + Topics in speech • Setting 3: Named Entities + Topics in speech and larger part-of-debate
      Score                                 Setting 1   Setting 2   Setting 3
      I don’t know                          0.14        0.15        0.08
      0 - unrelated                         0.38        0.23        0.12
      1 - related                           0.29        0.36        0.36
      2 - explicit mention of the debate    0.19        0.26        0.44
      1+2                                   0.48        0.62        0.80
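
The "1+2" row is the sum of the "related" and "explicit mention" rows, which can be checked directly from the reported figures. The dictionary layout and key names below are only an illustrative choice; the numbers are those on the slide.

```python
# Score distributions per setting, as reported on slide 18.
scores = {
    "Setting 1": {"unknown": 0.14, "unrelated": 0.38,
                  "related": 0.29, "explicit": 0.19},
    "Setting 2": {"unknown": 0.15, "unrelated": 0.23,
                  "related": 0.36, "explicit": 0.26},
    "Setting 3": {"unknown": 0.08, "unrelated": 0.12,
                  "related": 0.36, "explicit": 0.44},
}


def related_share(dist):
    """Share of pairs rated 1 (related) or 2 (explicit mention)."""
    return round(dist["related"] + dist["explicit"], 2)


shares = {name: related_share(dist) for name, dist in scores.items()}
totals = {name: round(sum(dist.values()), 2) for name, dist in scores.items()}
```

Each column sums to 1.00, and the share of related pairs rises from 0.48 to 0.62 to 0.80 as topics and the larger part-of-debate context are added.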
  • 19. Results • An open data set of Dutch parliamentary debates, • with almost 3 million links between 450,000 speeches and URLs of 1.5 million newspaper articles and radio bulletins at the National Library, • accessible through a Web demonstrator and through a SPARQL endpoint.
  • 20. Demo
  • 27. SPARQL endpoint • A service to query a knowledge base using the SPARQL query language. Example: “All speeches with more than 60 associated news items.”
      SELECT ?speech ?no_news_items WHERE {
        { SELECT ?speech (COUNT(?news) AS ?no_news_items)
          WHERE { ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news . }
          GROUP BY ?speech }
        FILTER (?no_news_items > 60)
      }
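
A query like this can be sent to an endpoint over plain HTTP, following the SPARQL 1.1 Protocol (a GET request with a `query` parameter, asking for JSON results). The sketch below uses only the Python standard library. The endpoint URL is a placeholder, since the slides do not give the real address, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder address; the actual endpoint URL is not given in the slides.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?speech ?no_news_items WHERE {
  { SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE { ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news . }
    GROUP BY ?speech }
  FILTER (?no_news_items > 60)
}
"""


def build_request(endpoint, query):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(payload)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]


def run(endpoint, query):
    """Send the query and return the parsed rows (needs network access)."""
    with urlopen(build_request(endpoint, query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run(ENDPOINT, QUERY)` against a real endpoint would return one dictionary per speech, with the keys `speech` and `no_news_items`.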
  • 32. Reflection: to what extent can we answer these questions? • Which events have attracted a lot of media attention? • What are the differences between media, e.g. between different newspapers, or between newspapers and radio bulletins? • Has the coverage changed over time? • How are the events visualized (photos, layout of newspaper, etc.)?
  • 33. Future work • More types of links • From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”, “talksAbout” • More types of media • More types of (political) events
  • 34. Project ‘Talk of Europe / Traveling Clarin Campus’ 2014-2015 Funded by CLARIN-ERIC From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer, Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)
  • 35. Plans of ‘ToE/TTC’ 1. Publish proceedings of the EU parliamentary debates in RDF • hosted by DANS 2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data. 3. In collaboration with international partners: • enrich with annotations, e.g. topics, structured data about people, parties, etc. • link to national datasets, e.g. media or national parliaments