Connecting political data to media data
Laura Hollink
VU University Amsterdam
Web & Media group
ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’
February 18, 2014
Laura Hollink

Damir Juric
Geert-Jan Houben

Funded by Clarin-NL

Martijn Kleppe
Max Kemman
Henri Beunders

Johan Oomen
Jaap Blom
Questions we want to answer
• Which events have attracted a lot of media attention?
• What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins?
• Has the coverage changed over time?
• How are the events visualized (photos, layout of newspaper, etc.)?
Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.

Roughly 1.8 million news bulletins between 1937 and 1984. (We only use 1945-1995)

Archives of hundreds of newspapers with a very large number of newspaper issues, or tens of millions of articles, between 1618 and 1995. (We only use 1945-1995)
PoliMedia methods
Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF
Source: XML by the War in Parliament project.
[Figure: RDF graph of debate nl.proc.sgd.d.194519460000002 (1945-11-20), with nodes of type Debate, PartOfDebate, DebateContext, Speech, Politician and Party, connected by hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasSpeaker, sem:hasActor, hasRole, hasParty and coveredIn, and described with dc:, foaf: and rdfs: properties.]
Modeling the debates as events
• An event has a date, a
location, actors, and
possibly sub-events.
• We build on the Simple
Event Model (SEM).

• links to the original sources
• reusing existing
vocabularies

[Figure: the Debate node nl.proc.sgd.d.194519460000002 with rdf:type Debate, dc:title "Handelingen Verenigde Vergadering...", dc:date 1945-11-20 and dc:language Dutch. Its dc:publisher, dc:id and dc:source properties carry the values nl.proc.sgd.d.19720000002, http://statengeneraaldigitaal.nl/, http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002 and http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf]
• the part-of structure and chronological order of the debates.
[Figure: the debate nl.proc.sgd.d.194519460000002 (dc:title "Handelingen Verenigde Vergadering...") and its parts nl.proc.sgd.d.194519460000002.1, .2, .1.1, .1.2 and .1.3. Node types: PartOfDebate, DebateContext, Speech. Edges: hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech. Text literals attached with hasText and hasSpokenText: "De voorzitter opent de vergadering…" and "Mijnheer de Voorzitter, de Commissie van …"]
• the different roles and parties that a speaker can have in his/her career.
[Figure: the Speech nl.proc.sgd.d.194519460000002.1.2 (hasSpokenText "Mijnheer de Voorzitter, de Commissie van …") with sem:hasActor, hasSpeaker and coveredIn edges. Speaker_00064 (rdfs:label Barge) with hasRole member_of_parliament and hasParty. Politician with foaf:firstName Joannes Antonius James and foaf:lastName Barge. Party Party_kvp with hasFullName Katholieke Volkspartij and hasAcronym KVP. Covering article: http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr]
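The event model above can be illustrated with a small sketch in plain Python, where each statement is a (subject, predicate, object) triple. The identifiers and property names are taken from the slides, but which node each edge connects is an assumption made for illustration, and the helper functions `objects` and `speeches_in_order` are hypothetical names, not part of the PoliMedia code.

```python
# Simplified triples; identifiers come from the slides, the exact wiring
# between nodes is an assumption for illustration only.
DEBATE = "nl.proc.sgd.d.194519460000002"

triples = [
    (DEBATE, "rdf:type", "Debate"),
    (DEBATE, "dc:date", "1945-11-20"),
    (DEBATE, "hasPart", DEBATE + ".1"),
    (DEBATE + ".1", "rdf:type", "PartOfDebate"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1.2", "rdf:type", "Speech"),
    (DEBATE + ".1.2", "hasSpeaker", "Speaker_00064"),
    (DEBATE + ".1.2", "hasSubsequentSpeech", DEBATE + ".1.3"),
    (DEBATE + ".1.3", "rdf:type", "Speech"),
    ("Speaker_00064", "hasParty", "Party_kvp"),
    ("Party_kvp", "hasAcronym", "KVP"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def speeches_in_order(first_speech):
    """Follow hasSubsequentSpeech links to recover the chronological order."""
    order, seen = [], set()
    current = first_speech
    while current is not None and current not in seen:
        order.append(current)
        seen.add(current)
        following = objects(current, "hasSubsequentSpeech")
        current = following[0] if following else None
    return order


if __name__ == "__main__":
    for speech in speeches_in_order(DEBATE + ".1.2"):
        print(speech, objects(speech, "hasSpeaker"))
```

Storing the chronological order as explicit links between speeches, rather than relying on identifier sorting, is what makes it possible to query for "the next speech" directly.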
Step 2: Linking speeches in the debate to the newspaper articles that cover them
We created a linking method to deal with our two challenges:
1. How to link documents that are so different in nature?
2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate.
Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
[Figure: linking pipeline. Boxes: Debates; Name of speaker; Date of debate; Search newspaper archive; Candidate articles; Detect topics in speeches; Topics; Detect Named Entities in speeches; Named Entities; Create queries; Queries; Rank candidate articles; Links between speeches and articles.]
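The two intuitions can be sketched as a two-stage procedure: first select candidate articles by speaker name and publication date, then rank them by overlap with the speech's topics and named entities. This is a minimal sketch under stated assumptions, not the project's implementation: the `Article` class, the function names, the sample data, the reading of "within a week" as the seven days from the debate date onwards, and the simple term-overlap score are all hypothetical choices, since the slides do not specify the actual ranking measure.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    url: str
    published: date
    text: str


def candidate_articles(articles, speaker_name, debate_date, window_days=7):
    """Intuition 1: keep articles that mention the speaker and were
    published within `window_days` days from the debate date (assumed window)."""
    latest = debate_date + timedelta(days=window_days)
    name = speaker_name.lower()
    return [
        a for a in articles
        if debate_date <= a.published <= latest and name in a.text.lower()
    ]


def overlap_score(article, query_terms):
    """Intuition 2: fraction of topic / named-entity terms found in the article."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    text = article.text.lower()
    return sum(1 for t in terms if t in text) / len(terms)


def rank_candidates(candidates, query_terms):
    """Order candidates from most to least overlapping with the speech."""
    return sorted(candidates, key=lambda a: overlap_score(a, query_terms), reverse=True)


# Made-up example data, for illustration only.
articles = [
    Article("urn:a1", date(1945, 11, 21), "Barge sprak in de Kamer over de begroting van Indonesie"),
    Article("urn:a2", date(1945, 11, 22), "Barge was aanwezig"),
    Article("urn:a3", date(1945, 12, 30), "Barge over de begroting"),
    Article("urn:a4", date(1945, 11, 21), "Weerbericht"),
]

if __name__ == "__main__":
    found = candidate_articles(articles, "Barge", date(1945, 11, 20))
    for a in rank_candidates(found, ["begroting", "Indonesie"]):
        print(a.url, overlap_score(a, ["begroting", "Indonesie"]))
```

Keeping the cheap name-and-date filter separate from the ranking step means the more expensive text comparison only runs on a small candidate set.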
Evaluation: what do we use to rank the candidate articles?
• Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5
• Compare text of candidate articles to:
• Setting 1: Named Entities in speech
• Setting 2: Named Entities + Topics in speech
• Setting 3: Named Entities + Topics in speech and larger part-of-debate

Score                                Setting 1   Setting 2   Setting 3
I don’t know                         0.14        0.15        0.08
0 - unrelated                        0.38        0.23        0.12
1 - related                          0.29        0.36        0.36
2 - explicit mention of the debate   0.19        0.26        0.44
1+2                                  0.48        0.62        0.80
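The "1+2" row combines the fractions of pairs rated "1 - related" and "2 - explicit mention of the debate". As a quick arithmetic check, using the per-setting fractions from the table, the combined fraction for each setting can be computed as follows (the variable names are illustrative):

```python
# Fractions of rated pairs per score, as reported in the evaluation table.
related = {"Setting 1": 0.29, "Setting 2": 0.36, "Setting 3": 0.36}
explicit = {"Setting 1": 0.19, "Setting 2": 0.26, "Setting 3": 0.44}

# Combined fraction of pairs judged related or explicitly mentioning the debate.
combined = {s: round(related[s] + explicit[s], 2) for s in related}

if __name__ == "__main__":
    for setting, value in combined.items():
        print(f"{setting}: {value:.2f}")
```

The combined fraction rises from 0.48 in Setting 1 to 0.62 in Setting 2 and 0.80 in Setting 3.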
Results
• An open data set of Dutch parliamentary debates,
• with almost 3 million links between 450,000 speeches and URLs of 1.5 million newspaper articles and radio bulletins at the National Library.
• accessible through a Web demonstrator and through a SPARQL endpoint.
Demo
SPARQL endpoint
• A service to query a knowledge
base using the SPARQL query
language.

“All speeches with more
than 60 associated news
items.”
SELECT ?speech ?no_news_items
WHERE {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
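A query like this is typically sent to an endpoint over HTTP, following the SPARQL 1.1 Protocol. The sketch below builds such a request with the Python standard library and parses a JSON result document. The endpoint URL is a placeholder (the slides do not give the actual address), and the function names `build_request` and `parse_bindings` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; the real PoliMedia endpoint URL is not given in the slides.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?speech ?no_news_items
WHERE {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
"""


def build_request(endpoint, query):
    """Build a GET request with the query in the 'query' parameter,
    asking for results in the SPARQL JSON results format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(response_text):
    """Extract (speech URI, number of news items) pairs from a JSON result."""
    data = json.loads(response_text)
    return [
        (b["speech"]["value"], int(b["no_news_items"]["value"]))
        for b in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    request = build_request(ENDPOINT, QUERY)
    print(request.full_url[:60] + "...")
```

Against a live endpoint, the request could be passed to `urllib.request.urlopen` and the decoded response body handed to `parse_bindings`.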
Reflection: to what extent can we answer these questions?
• Which events have attracted a lot of media attention?
• What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins?
• Has the coverage changed over time?
• How are the events visualized (photos, layout of newspaper, etc.)?
Future work
• More types of links
• From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”, “talksAbout”
• More types of media

• More types of (political) events.
Project ‘Talk of Europe / Traveling Clarin Campus’
2014-2015
Funded by CLARIN-ERIC

From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer,
Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)
Plans of ‘ToE/TTC’
1. Publish proceedings of the EU parliamentary debates in RDF
• hosted by DANS
2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data.
3. In collaboration with international partners:
• enrich with annotations, e.g. topics, structured data about people, parties, etc.
• link to national datasets, e.g. media or national parliaments
Connecting political data to media data

More Related Content

Similar to Connecting political data to media data

Talk of Europe: Linked data of the European Parliament
Talk of Europe:  Linked data of the European ParliamentTalk of Europe:  Linked data of the European Parliament
Talk of Europe: Linked data of the European Parliament
Laura Hollink
 
Using open datasets for research purposes
Using open datasets for research purposesUsing open datasets for research purposes
Using open datasets for research purposes
Martijn Kleppe
 
How to have societal impact...as an individual researcher?
How to have societal impact...as an individual researcher?How to have societal impact...as an individual researcher?
How to have societal impact...as an individual researcher?
Integrated Carbon Observation System (ICOS)
 
ICWE2013 - Discovering links between political debates and media
ICWE2013 - Discovering links between political debates and mediaICWE2013 - Discovering links between political debates and media
ICWE2013 - Discovering links between political debates and media
gjhouben
 
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
PrattSILS
 
Spanish revolution 23 4-2014 en
Spanish revolution 23 4-2014 enSpanish revolution 23 4-2014 en
Spanish revolution 23 4-2014 en
BO TRUE ACTIVITIES SL
 
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Tuukka Ylä-Anttila
 
Using Twitter as a Postgraduate Researcher
Using Twitter as a Postgraduate ResearcherUsing Twitter as a Postgraduate Researcher
Using Twitter as a Postgraduate Researcher
Simon Bishop
 
Insights From Social Media
Insights From Social MediaInsights From Social Media
Insights From Social Media
Dr Wasim Ahmed
 
TIDSR
TIDSRTIDSR
TIDSR
Eric Meyer
 
News intro and language
News intro and languageNews intro and language
News intro and language
Great Baddow High School Media
 
Finding newspapers and news online
Finding newspapers and news onlineFinding newspapers and news online
Finding newspapers and news online
kevinwilsongold
 
Tracking Social Media Participation: New Approaches to Studying User-Gener...
Tracking  Social  Media  Participation: New Approaches to Studying User-Gener...Tracking  Social  Media  Participation: New Approaches to Studying User-Gener...
Tracking Social Media Participation: New Approaches to Studying User-Gener...
Axel Bruns
 
Research and Social Media
Research and Social MediaResearch and Social Media
Research and Social Media
Krijn Poppe
 
Twitter provides a selfie of envolving language
Twitter provides a selfie of envolving languageTwitter provides a selfie of envolving language
Twitter provides a selfie of envolving language
TERMCAT
 
Conference Law Via The Internet
Conference Law Via The InternetConference Law Via The Internet
Conference Law Via The Internet
Alessandro Gallo
 
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Axel Bruns
 
A History of Social Media Listening - Simon McDermott - Attentio
A History of Social Media Listening - Simon McDermott - AttentioA History of Social Media Listening - Simon McDermott - Attentio
A History of Social Media Listening - Simon McDermott - Attentio
Influence People
 
Social Media and Architecture Journal Archives
Social Media and Architecture Journal ArchivesSocial Media and Architecture Journal Archives
Social Media and Architecture Journal Archives
Noreen Whysel
 
Social Media and Architecture Journal Archives
Social Media and Architecture Journal ArchivesSocial Media and Architecture Journal Archives
Social Media and Architecture Journal Archives
Rachel Isaac-Menard
 

Similar to Connecting political data to media data (20)

Talk of Europe: Linked data of the European Parliament
Talk of Europe:  Linked data of the European ParliamentTalk of Europe:  Linked data of the European Parliament
Talk of Europe: Linked data of the European Parliament
 
Using open datasets for research purposes
Using open datasets for research purposesUsing open datasets for research purposes
Using open datasets for research purposes
 
How to have societal impact...as an individual researcher?
How to have societal impact...as an individual researcher?How to have societal impact...as an individual researcher?
How to have societal impact...as an individual researcher?
 
ICWE2013 - Discovering links between political debates and media
ICWE2013 - Discovering links between political debates and mediaICWE2013 - Discovering links between political debates and media
ICWE2013 - Discovering links between political debates and media
 
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
 
Spanish revolution 23 4-2014 en
Spanish revolution 23 4-2014 enSpanish revolution 23 4-2014 en
Spanish revolution 23 4-2014 en
 
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
 
Using Twitter as a Postgraduate Researcher
Using Twitter as a Postgraduate ResearcherUsing Twitter as a Postgraduate Researcher
Using Twitter as a Postgraduate Researcher
 
Insights From Social Media
Insights From Social MediaInsights From Social Media
Insights From Social Media
 
TIDSR
TIDSRTIDSR
TIDSR
 
News intro and language
News intro and languageNews intro and language
News intro and language
 
Finding newspapers and news online
Finding newspapers and news onlineFinding newspapers and news online
Finding newspapers and news online
 
Tracking Social Media Participation: New Approaches to Studying User-Gener...
Tracking  Social  Media  Participation: New Approaches to Studying User-Gener...Tracking  Social  Media  Participation: New Approaches to Studying User-Gener...
Tracking Social Media Participation: New Approaches to Studying User-Gener...
 
Research and Social Media
Research and Social MediaResearch and Social Media
Research and Social Media
 
Twitter provides a selfie of envolving language
Twitter provides a selfie of envolving languageTwitter provides a selfie of envolving language
Twitter provides a selfie of envolving language
 
Conference Law Via The Internet
Conference Law Via The InternetConference Law Via The Internet
Conference Law Via The Internet
 
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
Twitter, Public Communication and the Media Ecology: The Case of the Queensla...
 
A History of Social Media Listening - Simon McDermott - Attentio
A History of Social Media Listening - Simon McDermott - AttentioA History of Social Media Listening - Simon McDermott - Attentio
A History of Social Media Listening - Simon McDermott - Attentio
 
Social Media and Architecture Journal Archives
Social Media and Architecture Journal ArchivesSocial Media and Architecture Journal Archives
Social Media and Architecture Journal Archives
 
Social Media and Architecture Journal Archives
Social Media and Architecture Journal ArchivesSocial Media and Architecture Journal Archives
Social Media and Architecture Journal Archives
 

More from Laura Hollink

Creating and Analysing Linked Open Data for the EU Parliament
Creating and Analysing Linked Open Data for the EU ParliamentCreating and Analysing Linked Open Data for the EU Parliament
Creating and Analysing Linked Open Data for the EU Parliament
Laura Hollink
 
Enriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept driftEnriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept drift
Laura Hollink
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
Laura Hollink
 
Images in Online News: demo scenario
Images in Online News: demo scenarioImages in Online News: demo scenario
Images in Online News: demo scenario
Laura Hollink
 
Presentation at the final meeting of the MuNCH project
Presentation at the final meeting of the MuNCH projectPresentation at the final meeting of the MuNCH project
Presentation at the final meeting of the MuNCH project
Laura Hollink
 
Talk of Europe @ DHBenelux2015
Talk of Europe @ DHBenelux2015Talk of Europe @ DHBenelux2015
Talk of Europe @ DHBenelux2015
Laura Hollink
 
WWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic AnalysisWWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic Analysis
Laura Hollink
 

More from Laura Hollink (7)

Creating and Analysing Linked Open Data for the EU Parliament
Creating and Analysing Linked Open Data for the EU ParliamentCreating and Analysing Linked Open Data for the EU Parliament
Creating and Analysing Linked Open Data for the EU Parliament
 
Enriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept driftEnriching Linked Open Data with distributional semantics to study concept drift
Enriching Linked Open Data with distributional semantics to study concept drift
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Images in Online News: demo scenario
Images in Online News: demo scenarioImages in Online News: demo scenario
Images in Online News: demo scenario
 
Presentation at the final meeting of the MuNCH project
Presentation at the final meeting of the MuNCH projectPresentation at the final meeting of the MuNCH project
Presentation at the final meeting of the MuNCH project
 
Talk of Europe @ DHBenelux2015
Talk of Europe @ DHBenelux2015Talk of Europe @ DHBenelux2015
Talk of Europe @ DHBenelux2015
 
WWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic AnalysisWWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic Analysis
 

Recently uploaded

Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
Priyanka Aash
 
UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...
UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...
UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...
FIDO Alliance
 
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Snarky Security
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
Steven Carlson
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
AmandaCheung15
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
Stephanie Beckett
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
SAI KAILASH R
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
Razin Mustafiz
 
Keynote : Presentation on SASE Technology
Keynote : Presentation on SASE TechnologyKeynote : Presentation on SASE Technology
Keynote : Presentation on SASE Technology
Priyanka Aash
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
ankush9927
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
SubhamMandal40
 
Intel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdfIntel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdf
Tech Guru
 
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
Priyanka Aash
 
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
OnBoard
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Zilliz
 
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
Michael Price
 
History and Introduction for Generative AI ( GenAI )
History and Introduction for Generative AI ( GenAI )History and Introduction for Generative AI ( GenAI )
History and Introduction for Generative AI ( GenAI )
Badri_Bady
 
Smart Mobility Market:Revolutionizing Transportation.pdf
Smart Mobility Market:Revolutionizing Transportation.pdfSmart Mobility Market:Revolutionizing Transportation.pdf
Smart Mobility Market:Revolutionizing Transportation.pdf
Market.us
 
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Zilliz
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
DianaGray10
 

Recently uploaded (20)

Keynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive SecurityKeynote : AI & Future Of Offensive Security
Keynote : AI & Future Of Offensive Security
 
UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...
UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...
UX Webinar Series: Essentials for Adopting Passkeys as the Foundation of your...
 
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
 
Keynote : Presentation on SASE Technology
Keynote : Presentation on SASE TechnologyKeynote : Presentation on SASE Technology
Keynote : Presentation on SASE Technology
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
 
Intel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdfIntel Unveils Core Ultra 200V Lunar chip .pdf
Intel Unveils Core Ultra 200V Lunar chip .pdf
 
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
 
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
Mastering Board Best Practices: Essential Skills for Effective Non-profit Lea...
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
 
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
 
History and Introduction for Generative AI ( GenAI )
History and Introduction for Generative AI ( GenAI )History and Introduction for Generative AI ( GenAI )
History and Introduction for Generative AI ( GenAI )
 
Smart Mobility Market:Revolutionizing Transportation.pdf
Smart Mobility Market:Revolutionizing Transportation.pdfSmart Mobility Market:Revolutionizing Transportation.pdf
Smart Mobility Market:Revolutionizing Transportation.pdf
 
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
 

Connecting political data to media data

  • 1. Connecting political data to media data Laura Hollink VU University Amsterdam Web & Media group ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’ February 18, 2014
  • 2. Laura Hollink Damir Juric Geert-Jan Houben Funded by Clarin-NL Martijn Kleppe Max Kemman Henri Beunders Johan Oomen Jaap Blom
  • 5. Questions we want to answer • Which events have attracted a lot of media attention? • What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins? • Has the coverage changed over time? • How are the events visualized (photos, layout of newspaper, etc.).
  • 7. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches.
  • 8. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches. Archives of hundreds of newspaper with tons of newspaper issues or 10’s of Millions of articles between 1618-1995. (We only use 1945-1995)
  • 9. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945-1995, consisting of 1,208,903 speeches. Roughly 1.8 Million news bulletins between 1937-1984 (We only use 1945-1995) Archives of hundreds of newspaper with tons of newspaper issues or 10’s of Millions of articles between 1618-1995. (We only use 1945-1995)
  • 11. Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF XML by War in Parliament Project Handelingen Verenigde Vergadering... Debate PartOfDebate DebateContext rdf:type rdf:type rdf:type 1945-11-20 dc:date Dutch dc:language nl.proc.sgd.d. 194519460000002 hasPart nl.proc.sgd.d. 194519460000002.1 hasPart nl.proc.sgd.d. 194519460000002.1.1 hasText "De voorzitter opent de vergadering…" dc:publisher dc:id http://statengeneraaldigitaal.nl/ dc:source nl.proc.sgd.d.19720000002 hasSubsequentPartOfDebate hasPart dc:source http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002 "Mijnheer de Voorzitter, de Commissie van …" member_of _parliament Speech nl.proc.sgd.d. 194519460000002.2 hasSpokenText hasRole rdf:type rdf:type http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf Joannes Antonius James Politician foaf:firstName Barge foaf:lastName nl.proc.sgd.d. 194519460000002.1.2 sem:hasActor hasSpeaker Speaker_0006 4 rdfs:label Barge dc:source coveredIn http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr hasSubsequentSpeech http://resolver.politicalmashup.nl/nl.m.00064 hasParty nl.proc.sgd.d. 194519460000002.1.3 Party Katholieke Volkspartij rdf:type hasFullName Party_kvp hasAcronym KVP
  • 12. Modeling the debates as events
      • An event has a date, a location, actors, and possibly sub-events.
      • We build on the Simple Event Model (SEM).
      • The model links to the original sources.
      • It reuses existing vocabularies.
    [Figure: RDF description of the example debate nl.proc.sgd.d.194519460000002: rdf:type Debate, dc:title "Handelingen Verenigde Vergadering...", dc:date 1945-11-20, dc:language Dutch. The dc:publisher, dc:id and dc:source properties carry the values http://statengeneraaldigitaal.nl/, nl.proc.sgd.d.19720000002, http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002 and http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf.]
  • 13. The part-of structure and chronological order of the debates.
    [Figure: RDF graph of debate nl.proc.sgd.d.194519460000002 (dc:title "Handelingen Verenigde Vergadering..."). Nodes: the parts nl.proc.sgd.d.194519460000002.1 and nl.proc.sgd.d.194519460000002.2, and the sub-parts nl.proc.sgd.d.194519460000002.1.1, .1.2 and .1.3, typed PartOfDebate, DebateContext or Speech. Properties: rdf:type, hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText, hasSpokenText. Example literals: "De voorzitter opent de vergadering…" ("The chairman opens the meeting…") and "Mijnheer de Voorzitter, de Commissie van …" ("Mr. Chairman, the Committee of …").]
  • 14. The different roles and parties that a speaker can have in his/her career.
    [Figure: RDF graph around speech nl.proc.sgd.d.194519460000002.1.2 (rdf:type Speech, hasSpokenText "Mijnheer de Voorzitter, de Commissie van …"). Nodes: Speaker_00064 (rdfs:label Barge), a Politician (foaf:firstName Joannes Antonius James, foaf:lastName Barge), the role member_of_parliament, and Party_kvp (rdf:type Party, hasFullName Katholieke Volkspartij, hasAcronym KVP). Properties: sem:hasActor, hasSpeaker, hasRole, hasParty, coveredIn. The coveredIn link points to http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr.]
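The part-of structure and the speech order of slides 12 to 14 can be illustrated with a minimal Python sketch. The identifiers and property names come from the slides, but the exact pairing of nodes and properties is an assumption (only the figure labels survive in this transcript), and the tuple store and the helpers `objects` and `speeches_in_order` are hypothetical, not the project's actual tooling.

```python
# Subject-predicate-object triples. Names are from the slides; which node
# connects to which is assumed here for illustration only.
DEBATE = "nl.proc.sgd.d.194519460000002"
PART = DEBATE + ".1"

TRIPLES = [
    (DEBATE, "rdf:type", "Debate"),
    (DEBATE, "dc:date", "1945-11-20"),
    (DEBATE, "hasPart", PART),
    (PART, "rdf:type", "PartOfDebate"),
    # Listed out of order on purpose: order comes from hasSubsequentSpeech.
    (PART, "hasPart", DEBATE + ".1.3"),
    (PART, "hasPart", DEBATE + ".1.1"),
    (PART, "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1.1", "rdf:type", "DebateContext"),
    (DEBATE + ".1.2", "rdf:type", "Speech"),
    (DEBATE + ".1.3", "rdf:type", "Speech"),
    (DEBATE + ".1.2", "hasSubsequentSpeech", DEBATE + ".1.3"),
]


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def speeches_in_order(part):
    """Walk the hasSubsequentSpeech chain of the speeches inside one part."""
    speeches = [n for n in objects(part, "hasPart")
                if "Speech" in objects(n, "rdf:type")]
    followers = {o for s in speeches
                 for o in objects(s, "hasSubsequentSpeech")}
    # The first speech is the one that no other speech points to.
    current = [s for s in speeches if s not in followers]
    ordered = []
    while current:
        node = current[0]
        ordered.append(node)
        current = objects(node, "hasSubsequentSpeech")
    return ordered


print(speeches_in_order(PART))
```

Storing the order as explicit links rather than relying on identifier suffixes mirrors the slide's point that chronological order is part of the model itself.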
  • 15. Step 2: Linking speeches in the debate to the newspaper articles that cover them
    We created a linking method to deal with our two challenges:
      1. How to link documents that are so different in nature?
      2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
    [Figure: linking pipeline. From the debates, the name of the speaker and the date of the debate are used to search the newspaper archive for candidate articles. Topics and named entities detected in the speeches are used to create queries. The candidate articles are ranked, giving links between speeches and articles.]
  • 17. Step 2: Linking speeches in the debate to the newspaper articles that cover them
      • Intuition 1: the name of the speaker should appear in the article, and the article should be published within a week of the debate.
      • Intuition 2: the more the article and the speech overlap in terms of topics and named entities, the more they are related.
    [Figure: linking pipeline, from the debates via speaker name, debate date, topics and named entities to queries, candidate articles, ranking, and links between speeches and articles.]
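The two intuitions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the PoliMedia implementation: the `Speech` and `Article` classes, the substring matching, the reading of "within a week" as the seven days from the debate date onward, and the overlap score are all hypothetical stand-ins for the archive search and ranking steps named on the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Speech:
    speaker: str
    day: date
    terms: frozenset  # topics and named entities detected in the speech


@dataclass
class Article:
    url: str
    published: date
    text: str


def candidates(speech, archive, window_days=7):
    """Intuition 1: speaker name appears, published within a week (assumed: after)."""
    latest = speech.day + timedelta(days=window_days)
    return [a for a in archive
            if speech.speaker.lower() in a.text.lower()
            and speech.day <= a.published <= latest]


def overlap_score(speech, article):
    """Intuition 2: share of the speech's terms that occur in the article text."""
    if not speech.terms:
        return 0.0
    text = article.text.lower()
    hits = sum(1 for term in speech.terms if term.lower() in text)
    return hits / len(speech.terms)


def rank(speech, archive):
    """Score every candidate article and sort from most to least related."""
    scored = [(overlap_score(speech, a), a.url)
              for a in candidates(speech, archive)]
    return sorted(scored, reverse=True)


# Made-up example data; the URNs are placeholders, not real archive identifiers.
speech = Speech("Barge", date(1945, 11, 20), frozenset({"commissie", "begroting"}))
archive = [
    Article("urn:example:a1", date(1945, 11, 21),
            "De heer Barge sprak over de commissie en de begroting."),
    Article("urn:example:a2", date(1945, 11, 22), "Barge over de commissie."),
    Article("urn:example:a3", date(1946, 1, 5), "Barge, commissie, begroting."),
    Article("urn:example:a4", date(1945, 11, 21), "Weerbericht."),
]
print(rank(speech, archive))
```

Filtering first and ranking second keeps the expensive comparison limited to the small set of articles that already satisfy the name and date constraints.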
  • 18. Evaluation: what do we use to rank the candidate articles?
      • Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5
      • Compare text of candidate articles to:
          • Setting 1: Named Entities in speech
          • Setting 2: Named Entities + Topics in speech
          • Setting 3: Named Entities + Topics in speech and larger part-of-debate

    Score                               Setting 1  Setting 2  Setting 3
    I don’t know                        0.14       0.15       0.08
    0 - unrelated                       0.38       0.23       0.12
    1 - related                         0.29       0.36       0.36
    2 - explicit mention of the debate  0.19       0.26       0.44
    1 + 2                               0.48       0.62       0.80
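The combined "1+2" row is the sum of the "related" and "explicit mention" rows for each setting, and every column should add up to 1. A quick check in Python, using only the proportions from the slide (the variable names are illustrative), gives 0.48, 0.62 and 0.80 for Settings 1, 2 and 3:

```python
# Proportions per score, copied from the slide, in the order Setting 1, 2, 3.
scores = {
    "I don't know": (0.14, 0.15, 0.08),
    "0 - unrelated": (0.38, 0.23, 0.12),
    "1 - related": (0.29, 0.36, 0.36),
    "2 - explicit mention": (0.19, 0.26, 0.44),
}

# "1 + 2": share of pairs judged related or explicitly mentioning the debate.
related_or_explicit = [
    round(related + explicit, 2)
    for related, explicit in zip(scores["1 - related"],
                                 scores["2 - explicit mention"])
]

# Sanity check: the four score categories of each setting should sum to 1.
column_totals = [round(sum(column), 2) for column in zip(*scores.values())]

print(related_or_explicit)
print(column_totals)
```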
  • 19. Results
      • An open data set of Dutch parliamentary debates,
      • with almost 3 million links between 450,000 speeches and the URLs of 1.5 million newspaper articles and radio bulletins at the National Library,
      • accessible through a Web demonstrator and through a SPARQL endpoint.
  • 20. Demo
  • 27. SPARQL endpoint
      • A service to query a knowledge base using the SPARQL query language.
      • Example: “All speeches with more than 60 associated news items.”
        SELECT ?speech ?no_news_items
        WHERE {
          { SELECT ?speech (COUNT(?news) AS ?no_news_items)
            WHERE { ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news . }
            GROUP BY ?speech }
          FILTER (?no_news_items > 60)
        }
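What the query computes can be mimicked locally in Python: group the (speech, news item) links by speech, count them, and keep the speeches whose count exceeds 60. This is only an illustration of the grouping and filtering logic on made-up pairs; the function name and the sample identifiers are hypothetical, and no endpoint is contacted.

```python
from collections import Counter


def speeches_with_many_news_items(links, threshold=60):
    """Mimic GROUP BY ?speech, COUNT(?news) and FILTER (count > threshold)."""
    counts = Counter(speech for speech, _news in links)
    return {speech: n for speech, n in counts.items() if n > threshold}


# Made-up links: speech "a" has 61 news items, speech "b" has exactly 60.
links = ([("speech:a", f"news:{i}") for i in range(61)]
         + [("speech:b", f"news:{i}") for i in range(60)])

print(speeches_with_many_news_items(links))
```

Note that the comparison is strict, as in the query's `FILTER`, so a speech with exactly 60 news items is not returned.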
  • 32. Reflection: to what extent can we answer these questions?
      • Which events have attracted a lot of media attention?
      • What are the differences between media, e.g. between newspapers, or between newspapers and radio bulletins?
      • Has the coverage changed over time?
      • How are the events visualized (photos, newspaper layout, etc.)?
  • 33. Future work
      • More types of links: from just “coveredIn” to also “quotedIn”, “backgroundOf”, and “talksAbout”
      • More types of media
      • More types of (political) events
  • 34. Project ‘Talk of Europe / Traveling Clarin Campus’, 2014-2015. Funded by CLARIN-ERIC.
    From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer, Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present for the group picture.)
  • 35. Plans of ‘ToE/TTC’
      1. Publish proceedings of the EU parliamentary debates in RDF
          • hosted by DANS
      2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data.
      3. In collaboration with international partners:
          • enrich with annotations, e.g. topics, structured data about people, parties, etc.
          • link to national datasets, e.g. media or national parliaments