
PoliMedia presentation NOTaS meeting


Published in: Technology

  1. Interlinking multimedia for the analysis of media coverage of political debates
     Max Kemman & Henri Beunders
     NOTaS meeting
  2. Main goal
       • Aimed at Humanities researchers
       • Using CLARIN standard
     25-6-2012 PoliMedia - NOTaS meeting
  3. Main research question
     What choices do different media make in the coverage of people and topics while reporting on debates in the Dutch parliament, from the first televised evening news in 1956 until 1995?
  4. Historical research use case
       • How did the European Monetary Union (EMU) come to be in the 1990s?
       • What events led to the creation of the EMU?
       • How was this all represented by the media at that time?
  5. Current approach
     Too much work, with limited material and different systems.
     [Diagram not preserved in the transcript.]
  6. PoliMedia approach
       • PoliMedia Portal
           - Browse: debate and date
           - Search: debate and person
       • Staten-Generaal Digitaal (KB), 1818-1995
       • Newspapers (KB), 1950-1995
       • Television (Sound and Vision), 1956-1995
       • Radio (KB), 1950-1984
  7. Why PoliMedia?
     Better insight into the relations between media items
  8. Data sets
       • Primary data set:
           • The Dutch parliamentary debates (Handelingen der Staten-Generaal, the Dutch Hansard)
           • Available at the KB in raw format
           • Made CLARIN compliant in the War In Parliament project: chronological structure of consecutive speakers in a debate
       • Secondary data set:
           1. NISV Academia set (OAI protocol)
           2. KB newspapers (SRU protocol)
           3. KB radio bulletins (SRU protocol)
  9. Current status of technical work
       1. Extract structure of debates
       2. Find named entities in debate texts: people, organizations, locations
       3. Find links between debates and media
  10. Step 1: Debate dataset structure
      Diagram labels: Debate metadata, Topic, Speaker, Speech, Segment
  11. Debate metadata schema (diagram, flattened in the transcript)
       • Nodes: debate, speech, speech segment
       • Properties: poli:hasPubDate, poli:hasDesc, poli:hasSpeech, poli:hasNextSpeech, poli:hasSpeechSegment, poli:hasNextSegment, sem:hasActor, poli:mentions (people, locations, organizations), poli:coveredIn (media), poli:MediaType (Dbpedia: transcript)
       • Example values: 2011-12-14, "Stemmingen over…", "Natuur en milieu"
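The schema on slide 11 can be illustrated with a small data model that emits subject-predicate-object triples. This is only a sketch: the property names are taken from the slide, but which node each property attaches to is an assumption (the diagram did not survive in the transcript), and all class names, function names and URIs below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Segment:
    uri: str
    mentions: List[str] = field(default_factory=list)  # entity identifiers


@dataclass
class Speech:
    uri: str
    actor: str  # identifier of the speaker
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Debate:
    uri: str
    pub_date: str
    description: str
    speeches: List[Speech] = field(default_factory=list)


def debate_to_triples(debate: Debate) -> List[Triple]:
    """Flatten a debate into triples (property placement is assumed)."""
    triples: List[Triple] = [
        (debate.uri, "poli:hasPubDate", debate.pub_date),
        (debate.uri, "poli:hasDesc", debate.description),
    ]
    for i, speech in enumerate(debate.speeches):
        triples.append((debate.uri, "poli:hasSpeech", speech.uri))
        triples.append((speech.uri, "sem:hasActor", speech.actor))
        # Chain consecutive speeches to keep the chronological order.
        if i + 1 < len(debate.speeches):
            triples.append(
                (speech.uri, "poli:hasNextSpeech", debate.speeches[i + 1].uri)
            )
        for j, segment in enumerate(speech.segments):
            triples.append((speech.uri, "poli:hasSpeechSegment", segment.uri))
            if j + 1 < len(speech.segments):
                triples.append(
                    (segment.uri, "poli:hasNextSegment", speech.segments[j + 1].uri)
                )
            for entity in segment.mentions:
                triples.append((segment.uri, "poli:mentions", entity))
    return triples


if __name__ == "__main__":
    example = Debate(
        uri="debate:1",          # hypothetical identifier
        pub_date="2011-12-14",   # example value from the slide
        description="Stemmingen over…",
        speeches=[Speech("speech:1", "actor:A", [Segment("seg:1")])],
    )
    for triple in debate_to_triples(example):
        print(triple)
```

Chaining speeches and segments with "next" links mirrors the chronological structure of consecutive speakers mentioned for the primary data set.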
  12. Step 2: Named Entity Recognition in debates
       • Fietstas: web services for processing textual content
       • Lists of named entities (NEs) that appear in specific documents or sets of documents
       • Works well with the Dutch language (unlike other popular services like Dbpedia spotlight)
  13. Named Entity Recognition
      Diagram: debate1.xml, debate2.xml and debate3.xml are each processed into a debate file with an accompanying NER file (ner1.xml, ner2.xml, ner3.xml).
  14. Named Entity Recognition
       • Persons
       • Organizations
       • Locations
       • Miscellaneous
  15. Step 3: Find links to newspapers and radio bulletins
      We use the dates, topics, named entities and speakers of the debates to query the media archives.
      Media document harvesting:
       • SRU protocol (Search/Retrieve via URL)
       • JSRU is a Java implementation of the SRU protocol at the KB
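As an illustration of the SRU harvesting step, the sketch below builds a searchRetrieve request URL. The parameter names (operation, version, query, startRecord, maximumRecords) come from the SRU specification; the base URL is a placeholder and not the KB's actual endpoint, and the version and default record count are assumptions.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str,
                  max_records: int = 10, start_record: int = 1,
                  version: str = "1.2") -> str:
    """Build an SRU searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": version,            # assumed; depends on the server
        "query": cql_query,
        "startRecord": start_record,   # SRU record positions start at 1
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint, for illustration only.
    print(build_sru_url("https://sru.example.org/sru", '"Princen" AND "Van Mierlo"'))
```

Paging through a large result set would repeat the request with startRecord advanced by maximumRecords each time.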
  16. Automatic Query Construction
       • Inputs: persons, locations and organizations mentioned inside topics of the debate, plus speakers
       • Debate structure: Debate Metadata; Topic 1 (Speaker 1 / Content, Speaker 2 / Content, Speaker 3 / Content, up to Speaker n); Topic 2 (Speaker 1 / Content)
       • TopicList = PersonsInTopic + LocationsInTopic + Org.InTopic
       • Query = TopicList + ActorFromSegment + TimeFrame
       • Example query: give all the newspaper issues in the collection DDD_krantnr where the date value is between 01-01-1940 and 31-12-1945
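The query construction on slide 16 can be sketched as follows. The slide does not say how the terms are combined, so joining them with AND is an assumption (it matches the "Princen" AND "Van Mierlo" example shown later in the deck). The cap on topic terms reflects the later remark about keeping the number of topics small. All names here (MediaQuery, build_query, max_topic_terms) are hypothetical, and how the time frame is passed to each archive is left open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MediaQuery:
    terms: str       # boolean search expression
    date_from: date  # start of the time frame
    date_to: date    # end of the time frame


def quote(term: str) -> str:
    """Wrap a term in double quotes, dropping any embedded quotes."""
    return '"' + term.replace('"', "") + '"'


def build_query(persons: Iterable[str], locations: Iterable[str],
                organizations: Iterable[str], date_from: date, date_to: date,
                speaker: Optional[str] = None,
                max_topic_terms: int = 2) -> MediaQuery:
    # TopicList = PersonsInTopic + LocationsInTopic + Org.InTopic,
    # de-duplicated while preserving order.
    topic_list: List[str] = []
    for term in [*persons, *locations, *organizations]:
        if term != speaker and term not in topic_list:
            topic_list.append(term)
    # Keep only a few topic terms to avoid over-specialized queries.
    selected = ([speaker] if speaker else []) + topic_list[:max_topic_terms]
    terms = " AND ".join(quote(t) for t in selected)  # AND is an assumption
    return MediaQuery(terms, date_from, date_to)


if __name__ == "__main__":
    query = build_query(["Princen", "Van Mierlo"], [], [],
                        date(1994, 12, 21), date(1995, 1, 21))
    print(query.terms, query.date_from, query.date_to)
```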
  17. Newspaper metadata (article instance)
       • poli:hasPubDate: 1951-11-08
       • poli:hasTitle: SCHUTJASSEN
       • poli:PublishedIn: De Heerenveensche koerier
       • poli:MediaType: Dbpedia: Newspaper article
       • poli:Mentions
  18. Radio bulletin metadata (article instance)
       • poli:hasPubDate: 1946/05/06
       • poli:hasTitle: ANP Nieuwsbericht - 06-05-1946 - 10
       • Dbpedia: Radio bulletin
  19. The date of a debate and a media article
       • We use the dates, topics, named entities and speakers of the debates to query the media archives.
       • A news item is always published on the same day as the debate or later.
       • How much time should we allow between debate and media item?
       • Current choice: 1 month.
      Example: 26 results for “Princen” AND “Van Mierlo” with a timeframe of one month:
       • 26 articles in the period between 21/12 and 21/01
       • 7 on the day of the debate, only 1 article 1 month later
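The one-month window can be computed as in the sketch below. The function names are hypothetical, and clamping to the last day of a shorter month (for example 31 January to 28 February) is an assumption about how "1 month" should be interpreted.

```python
import calendar
from datetime import date
from typing import Tuple


def add_months(d: date, months: int = 1) -> date:
    """Return the same day-of-month `months` later, clamped to month length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def media_window(debate_date: date, months: int = 1) -> Tuple[date, date]:
    """Window from the debate date up to `months` later, inclusive."""
    return debate_date, add_months(debate_date, months)


def in_window(item_date: date, debate_date: date, months: int = 1) -> bool:
    """A media item qualifies if it appears on or after the debate date
    and no later than the end of the window."""
    start, end = media_window(debate_date, months)
    return start <= item_date <= end


if __name__ == "__main__":
    print(media_window(date(1994, 12, 21)))
```

For the debate of 21 December 1994 this gives a window ending on 21 January 1995, the period used in the example on the slide.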
  20. Debate → Newspaper example
       • Dates between: 21.12.1994 (debate date) and 21.01.1995
       • Queries:
           o Small numbers of topics (to avoid overspecialization)
           o Shorter timespan (fast media cycle)
  21. Overview
      Query = PersonsInTopic + LocationsInTopic + Org.InTopic + ActorFromSegment + TimeFrame
  22. PoliMedia+
       • Elections in September
       • 300 influential political Twitter accounts
  23. What can you do with this?
       • PoliMedia allows better insight into the relation between politics and media
       • What can Speech- and Language-technologists do with it?
  24. Contact
      kemman@eshcc.eur.nl
      Acknowledgements
       • Rest of the team: Laura Hollink (VU), Geert-Jan Houben, Damir Juric (TU Delft), Johan Oomen, Jaap Blom (NISV), Martijn Kleppe (EUR); KB
       • War in Parliament
       • CLARIN: Arjan van Hessen