Connecting political data to media data
Presentation given at ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’ on February 18, 2014

Presentation Transcript

  • Connecting political data to media data. Laura Hollink, VU University Amsterdam, Web & Media group. ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’, February 18, 2014.
  • Laura Hollink, Damir Juric, Geert-Jan Houben, Martijn Kleppe, Max Kemman, Henri Beunders, Johan Oomen, Jaap Blom. Funded by Clarin-NL.
  • Questions we want to answer
      • Which events have attracted a lot of media attention?
      • What are the differences between different media, e.g. between different newspapers, or between newspapers and radio bulletins?
      • Has the coverage changed over time?
      • How are the events visualized (photos, layout of the newspaper, etc.)?
  • Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches. Archives of hundreds of newspapers, amounting to tens of millions of articles between 1618 and 1995 (we only use 1945-1995). Roughly 1.8 million news bulletins between 1937 and 1984 (we only use 1945-1995).
  • PoliMedia methods
  • Step 1: Translate the Dutch parliamentary debates (XML by the War in Parliament project) to the standard structured web format RDF.
    [Figure: RDF graph of one debate (nl.proc.sgd.d.194519460000002, dated 1945-11-20), showing Debate, PartOfDebate, DebateContext and Speech nodes, the speaker Barge (Politician, member_of_parliament), the party Katholieke Volkspartij (KVP), and links to the original sources.]
  • Modeling the debates as events
      • An event has a date, a location, actors, and possibly sub-events.
      • We build on the Simple Event Model (SEM).
      • Links to the original sources.
      • Reusing existing vocabularies.
    [Figure: the Debate node nl.proc.sgd.d.194519460000002 with its rdf:type (Debate), dc:title ("Handelingen Verenigde Vergadering..."), dc:date (1945-11-20), dc:language (Dutch), dc:publisher, dc:id and dc:source properties, and links to http://statengeneraaldigitaal.nl/, http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002 and http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf]
  • The part-of structure and chronological order of the debates.
    [Figure: Debate, PartOfDebate, DebateContext and Speech nodes (nl.proc.sgd.d.194519460000002 and its parts .1, .1.1, .1.2, .1.3 and .2) connected by hasPart, hasSubsequentPartOfDebate and hasSubsequentSpeech; hasText and hasSpokenText hold the text, e.g. "De voorzitter opent de vergadering…" and "Mijnheer de Voorzitter, de Commissie van …".]
  • The different roles and parties that a speaker can have in his/her career.
    [Figure: the Speech node nl.proc.sgd.d.194519460000002.1.2 with hasSpokenText ("Mijnheer de Voorzitter, de Commissie van …"), sem:hasActor, hasSpeaker and coveredIn (http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr); the speaker (rdfs:label Barge) with hasRole member_of_parliament and hasParty Party_kvp (hasFullName Katholieke Volkspartij, hasAcronym KVP); and the Politician Joannes Antonius James Barge (foaf:firstName, foaf:lastName).]
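
The part-of structure, the chronological links and the speaker/party relations described above can be pictured as a small set of subject-predicate-object triples. The following is a minimal sketch in plain Python, not the project's actual RDF: the identifiers and property names are copied from the slides where legible, but which node is connected to which is an assumption made for illustration, and the helper functions are hypothetical.

```python
# Illustrative triples (subject, predicate, object). Property names come from
# the slides; the exact wiring between nodes is an assumption.
DEBATE = "nl.proc.sgd.d.194519460000002"

triples = [
    (DEBATE, "rdf:type", "Debate"),
    (DEBATE, "dc:date", "1945-11-20"),
    (DEBATE, "hasPart", DEBATE + ".1"),
    (DEBATE + ".1", "rdf:type", "PartOfDebate"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.3"),
    (DEBATE + ".1.2", "rdf:type", "Speech"),
    (DEBATE + ".1.3", "rdf:type", "Speech"),
    (DEBATE + ".1.2", "hasSubsequentSpeech", DEBATE + ".1.3"),
    (DEBATE + ".1.2", "hasSpeaker", "Speaker_00064"),
    ("Speaker_00064", "rdfs:label", "Barge"),
    ("Speaker_00064", "hasParty", "Party_kvp"),
    ("Party_kvp", "hasAcronym", "KVP"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def speeches_in_order(first_speech):
    """Follow hasSubsequentSpeech links to recover the chronological order."""
    order = [first_speech]
    seen = {first_speech}
    while True:
        following = objects(order[-1], "hasSubsequentSpeech")
        if not following or following[0] in seen:
            break  # end of the chain (or a cycle, which we refuse to follow)
        order.append(following[0])
        seen.add(following[0])
    return order


def party_acronyms(speech):
    """Acronyms of the parties of the speaker(s) of a speech."""
    return [
        acronym
        for speaker in objects(speech, "hasSpeaker")
        for party in objects(speaker, "hasParty")
        for acronym in objects(party, "hasAcronym")
    ]


print(speeches_in_order(DEBATE + ".1.2"))
print(party_acronyms(DEBATE + ".1.2"))
```

Keeping the party on the speaker node rather than on the person is what lets one politician appear with different roles and parties over a career, since each period can get its own speaker node.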
  • Step 2: Linking speeches in the debate to the newspaper articles that cover them
    We created a linking method to deal with our two challenges:
      1. How to link documents that are so different in nature?
      2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
    [Figure: linking pipeline with the elements Debates, Name of speaker, Date of debate, Detect topics in speeches, Topics, Detect Named Entities in speeches, Named Entities, Create queries, Queries, Search newspaper archive, Candidate articles, Rank candidate articles, and Links between speeches and articles.]
  • Step 2: Linking speeches in the debate to the newspaper articles that cover them
      • Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate.
      • Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
    [Figure: the linking pipeline, from Debates via Name of speaker, Date of debate, Topics and Named Entities to Queries, Candidate articles, and ranked Links between speeches and articles.]
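
The two intuitions can be sketched as a filter followed by a ranking. This is a minimal illustration, not the PoliMedia implementation: the `Speech` and `Article` classes, the function names, the plain substring matching, and the reading of "within a week" as "on the day of the debate or up to seven days after it" are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Set, Tuple


@dataclass
class Speech:
    speaker: str        # name of the speaker
    debate_date: date   # date of the debate
    terms: Set[str]     # named entities and topics detected in the speech


@dataclass
class Article:
    url: str
    published: date
    text: str


def candidates(speech: Speech, articles: List[Article], window_days: int = 7) -> List[Article]:
    """Intuition 1: the speaker is named and the article appears within a week."""
    latest = speech.debate_date + timedelta(days=window_days)
    return [
        a for a in articles
        if speech.debate_date <= a.published <= latest
        and speech.speaker.lower() in a.text.lower()
    ]


def overlap_score(speech: Speech, article: Article) -> float:
    """Intuition 2: fraction of the speech's terms that occur in the article."""
    if not speech.terms:
        return 0.0
    text = article.text.lower()
    hits = sum(1 for term in speech.terms if term.lower() in text)
    return hits / len(speech.terms)


def rank(speech: Speech, articles: List[Article]) -> List[Tuple[float, Article]]:
    """Score every candidate article and sort from most to least related."""
    scored = [(overlap_score(speech, a), a) for a in candidates(speech, articles)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


speech = Speech("Barge", date(1945, 11, 20), {"Commissie", "begroting"})
articles = [
    Article("a2", date(1945, 11, 22), "Barge over de begroting"),
    Article("a1", date(1945, 11, 21), "Barge sprak namens de Commissie over de begroting"),
    Article("a3", date(1945, 12, 15), "Barge en de Commissie over de begroting"),
    Article("a4", date(1945, 11, 21), "De Commissie besprak de begroting"),
]
for score, article in rank(speech, articles):
    print(article.url, score)
```

In this toy data, `a3` is dropped because it falls outside the one-week window and `a4` because it does not name the speaker; the remaining two are ordered by term overlap.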
  • Evaluation: what do we use to rank the candidate articles?
      • Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, inter-rater agreement K = 0.5.
      • Compare the text of candidate articles to:
          • Setting 1: Named Entities in speech
          • Setting 2: Named Entities + Topics in speech
          • Setting 3: Named Entities + Topics in speech and larger part-of-debate

    Score                                Setting 1   Setting 2   Setting 3
    I don’t know                         0.14        0.15        0.08
    0 - unrelated                        0.38        0.23        0.12
    1 - related                          0.29        0.36        0.36
    2 - explicit mention of the debate   0.19        0.26        0.44
    1+2                                  0.48        0.62        0.80
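
The combined "1+2" figure for a setting is the share of pairs rated either "related" or "explicit mention of the debate", that is, the sum of those two rows. A short check of that arithmetic, using the proportions from the slide (the dictionary layout and function name are only illustrative):

```python
# Proportion of rated pairs per score, for each ranking setting (from the slide).
scores = {
    "Setting 1": {"unknown": 0.14, "unrelated": 0.38, "related": 0.29, "explicit": 0.19},
    "Setting 2": {"unknown": 0.15, "unrelated": 0.23, "related": 0.36, "explicit": 0.26},
    "Setting 3": {"unknown": 0.08, "unrelated": 0.12, "related": 0.36, "explicit": 0.44},
}


def related_or_explicit(distribution):
    """Share of pairs scored 1 (related) or 2 (explicit mention of the debate)."""
    return round(distribution["related"] + distribution["explicit"], 2)


for setting, distribution in scores.items():
    print(f"{setting}: {related_or_explicit(distribution):.2f}")
```

This yields 0.48, 0.62 and 0.80 for Settings 1, 2 and 3 respectively, so the share of useful links grows as topics and the larger part-of-debate context are added.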
  • Results
      • An open data set of Dutch parliamentary debates,
      • with almost 3 million links between 450,000 speeches and the URLs of 1.5 million newspaper articles and radio bulletins at the National Library,
      • accessible through a Web demonstrator and through a SPARQL endpoint.
  • Demo
  • SPARQL endpoint
      • A service to query a knowledge base using the SPARQL query language.
      • Example: “All speeches with more than 60 associated news items.”

        SELECT ?speech ?no_news_items
        WHERE {
          {
            SELECT ?speech (COUNT(?news) AS ?no_news_items)
            WHERE {
              ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
            }
            GROUP BY ?speech
          }
          FILTER (?no_news_items > 60)
        }
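
To make the logic of the example query concrete, here is a rough in-memory equivalent in Python: group the `coveredAt` links by speech, count them, and keep the speeches whose count exceeds the threshold. It only mirrors what the query computes; it does not contact the endpoint, and the function name and toy data are hypothetical.

```python
from collections import Counter

COVERED_AT = "http://purl.org/linkedpolitics/nl/polivoc#coveredAt"


def speeches_with_many_news_items(triples, threshold=60):
    """Mirror of the query: GROUP BY speech, COUNT news items, FILTER count > threshold."""
    counts = Counter(s for s, p, o in triples if p == COVERED_AT)
    return {speech: n for speech, n in counts.items() if n > threshold}


# Toy data: speech_a is linked to 61 news items, speech_b to exactly 60.
toy_triples = (
    [("speech_a", COVERED_AT, f"news_{i}") for i in range(61)]
    + [("speech_b", COVERED_AT, f"news_{i}") for i in range(60)]
    + [("speech_a", "rdf:type", "Speech")]
)
print(speeches_with_many_news_items(toy_triples))
```

Only `speech_a` is returned, because the filter is strictly "more than 60".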
  • Reflection: to what extent can we answer these questions?
      • Which events have attracted a lot of media attention?
      • What are the differences between different media, e.g. between different newspapers, or between newspapers and radio bulletins?
      • Has the coverage changed over time?
      • How are the events visualized (photos, layout of the newspaper, etc.)?
  • Future work
      • More types of links: from just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf” and “talksAbout”.
      • More types of media.
      • More types of (political) events.
  • Project ‘Talk of Europe / Traveling Clarin Campus’ 2014-2015 Funded by CLARIN-ERIC From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer, Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)
  • Plans of ‘ToE/TTC’
      1. Publish the proceedings of the EU parliamentary debates in RDF, hosted by DANS.
      2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data.
      3. In collaboration with international partners:
          • enrich with annotations, e.g. topics, structured data about people, parties, etc.
          • link to national datasets, e.g. media or national parliaments