Groningen nl pgroep

  1. PoliticalMashup: Connecting promises and actions of politicians and how society reacts to them. Maarten Marx, Universiteit van Amsterdam. Groningen, α-informatica, 2011-03-11
  2. Content
     • Overview of the PoliticalMashup project
     • Zooming in on one cultural heritage dataset
     • A few example applications
     • Research ideas for NLP scientists
  3. Who am I?
     • Political scientist turned computer scientist
     • My field:
       • Theory of XML database systems
       • Semi-structured information retrieval
     • Cooperation with:
       • Tweede Kamer
       • Koninklijke Bibliotheek
       • Historians at NIOD, DNPP
  4. PoliticalMashup project
     • Large-scale data integration project
     • Two-year NWO-funded infrastructure project (2010-2012)
     • Partners: U. Amsterdam, Groningen and Tilburg
     • Ongoing with irregular funding since 2008
  5. Goal of PoliticalMashup
     • Make huge amounts of textual data available for large-scale, automatic, quantitative data and content analysis by scientists from the humanities and social sciences.
  6. Mashup of what and how?
     • 4 data sources:
       • Promises and actions of politicians
       • Reactions to those in the media and from the general public
     • Connect the data on:
       • Political entities
       • Time
       • Topics
  7. Data sources
     • Promises
       • Election manifestos, mostly scans (DNPP)
       • Party websites and blogs (Archipol)
       • Politicians' Twitter accounts
     • Actions
       • Parliamentary proceedings, mostly scans (KB)
     • Reactions
       • News media
       • User-generated content: fora, blogs, comments on news, Twitter
  8. Used techniques
     • Text analytics, XML DB and IR technology
     • Named entity recognition and normalization
     • Data mining, machine learning, hand-crafted rules
     • Natural language processing, language models
     Goal: make implicit structure and information explicit.
  9. Zoom in on one data corpus
  10. Longitudinal data
     • Weekly measurements for over 150 years
     • Very stable measurement procedure and data model
  11. Data about human behaviour
  12. Often rather boring
  13. But sometimes full of drama and excitement
  14. Loads of measurement points: 24,000 days, 450,000 topics, 7.5 million speeches
  15. Digitally available
  16. De Handelingen der Staten Generaal (Dutch Hansards)
  17. About this collection
     • Very little metadata is available
     • Very rich “metadata” sits hidden inside the raw data
     • Rich data model:
       • Meeting (1 day)
         • Topic
           • Stage direction
           • Scene
             • Stage direction
             • Speech
               • Paragraph
  18. Same data, different views
     • Raw data in PDF
     • XML styled with a stylesheet
     • Machine-readable XML format
  19. Some applications of this
  20. Content and structure search
     • Combine IR-style keyword search with restrictions on structure
     • E.g., return speeches by Wilders about Islam
  21. Exhaustive data collection
     • Example query for NIOD historians
     • Search for paragraphs about fascisme OR nazisme OR dictatuur OR (nazi AND dictatuur) OR . . .
     • Return a TSV file with, for each hit: date, speakername, speakerid, speaker-party, . . .
     • NIOD query
  22. Link the proceedings to entities
     • Who is speaking?
     • Who says what to whom?
     Applications:
     • Summary of one speaker
     • On old OCRed data: linking and resolving entities
  23. Application: Interruption graph (Attackogram)
     • MP A interrupts B ⇔ A speaks during the block of B.
  24. NLP research topics
  25. 0) Topics
     • Common European thesaurus: http://eurovoc.europa.eu
     • Topic detection
     • Topic classification (at sentence, paragraph and speech level)
  26. 1) Populist language in parliament
     • PhD thesis by Jan Jagers (2006)
  27. 2) Automatically detecting promises (‘toezegging’) by ministers in Parliament
     • https://zoek.officielebekendmakingen.nl/kst-103196.pdf (page 56)
     • The Eerste Kamer has a useful online database: http://www.eerstekamer.nl/toezeggingen_2
  28. Example
     The Chair: I note that we have almost reached the end of this meeting. We still have time to briefly go through the promises (toezeggingen). I ask everyone to pay attention to whether anything has been overlooked. I will do this quickly, and afterwards we will briefly discuss the next steps. The promises. After the summer, the bill will be before the House. A letter will be sent to inform the House how the loss of expertise will be prevented.
     Minister Van Bijsterveldt-Vliegenthart: I did not promise that. Certainly not. No, I will not do that, because I did not promise it.
  29. 3) Opinion detection
     • Detect opinions expressed about entities and topics (the speaker is known)
     • Detect reported speech
  30. 4) Detect type of speech
     • Interruption, attack, answer, speech (“betoog”), ‘stage-direction’, ...
     • http://data.politicalmashup.nl/debates/nl/h-ek-19961997-37-58.1-tijdslijn.html
  31. 5) Detect “bullshit”
     • Tautologies . . .
     • “Regels zijn regels” (rules are rules), “Op is op” (when it's gone, it's gone)
     • p → p
     • “Het is wat het is” (it is what it is)
  32. 6) Spelling normalization
     • Dutch has undergone many spelling reforms
     • This lowers recall
     • Search in the new spelling, return results in the old spellings
  33. Lots of data available: happy to share
     • Now: 15 years of Dutch parliamentary proceedings in rich XML
     • Now: 200 more years in poorer XML, slowly getting richer
     • Parliamentary proceedings from the EU (15 years), UK (75 years), Spain (40 years), Scandinavian countries, . . .
     • Election manifestos (provincial elections 2007 and 2011)
     • All tweets, blogs, Flickr and YouTube content of all Dutch national politicians from the past 1.5 years
  34. Thanks. maartenmarx@uva.nl
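
The nested data model of slide 17 can be sketched as plain Python dataclasses. This is a minimal illustration, assuming that a meeting (one day) holds topics, a topic holds stage directions and scenes, and a scene holds stage directions and speeches made of paragraphs; all class, field and function names here are hypothetical and are not the project's actual XML schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class StageDirection:
    text: str


@dataclass
class Speech:
    speaker: str
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class Scene:
    # a scene mixes stage directions and speeches, in document order
    items: List[Union[StageDirection, Speech]] = field(default_factory=list)


@dataclass
class Topic:
    title: str
    # a topic mixes stage directions and scenes, in document order
    items: List[Union[StageDirection, Scene]] = field(default_factory=list)


@dataclass
class Meeting:
    date: str  # one meeting covers one day
    topics: List[Topic] = field(default_factory=list)


def speeches(meeting: Meeting) -> Iterator[Speech]:
    """Yield every Speech in document order (Meeting -> Topic -> Scene -> Speech)."""
    for topic in meeting.topics:
        for item in topic.items:
            if isinstance(item, Scene):
                for sub in item.items:
                    if isinstance(sub, Speech):
                        yield sub
```

Making the hierarchy explicit like this is what allows the structural restrictions used in the applications on slides 20 to 23.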
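
Slide 20's combination of keyword search with restrictions on structure can be illustrated with Python's standard `xml.etree.ElementTree`. A minimal sketch, assuming a hypothetical XML layout in which every `<speech>` element carries a `speaker` attribute; the real PoliticalMashup format and its XML database query engine will differ, and the keyword test here is a plain case-insensitive substring match.

```python
import xml.etree.ElementTree as ET
from typing import List


def search_speeches(xml_text: str, speaker: str, keyword: str) -> List[ET.Element]:
    """Return <speech> elements by `speaker` whose text mentions `keyword`.

    Structure restriction: only <speech> elements whose (hypothetical)
    `speaker` attribute equals `speaker`.
    Content restriction: case-insensitive substring match on all text
    nested inside the speech.
    """
    root = ET.fromstring(xml_text)
    kw = keyword.lower()
    hits = []
    for speech in root.iter("speech"):
        if speech.get("speaker") != speaker:
            continue
        text = " ".join(speech.itertext()).lower()
        if kw in text:
            hits.append(speech)
    return hits
```

Because the match is a substring test, a keyword such as "islam" also matches longer compounds; a real IR engine would tokenize and rank instead.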
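
The Boolean query and TSV export of slide 21 can be sketched as follows. The query is written in disjunctive normal form (a list of AND-groups joined by OR), so `fascisme OR nazisme OR dictatuur OR (nazi AND dictatuur)` becomes `[["fascisme"], ["nazisme"], ["dictatuur"], ["nazi", "dictatuur"]]`. Paragraphs are assumed to be dictionaries with a `text` field plus the metadata columns named on the slide; the function names and the input layout are hypothetical.

```python
import csv
import io
import re
from typing import Dict, Iterable, List

COLUMNS = ["date", "speakername", "speakerid", "speaker-party"]


def matches(text: str, dnf: List[List[str]]) -> bool:
    """True if every term of at least one AND-group occurs as a token in text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return any(all(term in tokens for term in group) for group in dnf)


def to_tsv(paragraphs: Iterable[Dict[str, str]], dnf: List[List[str]]) -> str:
    """Write one tab-separated row of metadata per matching paragraph."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for para in paragraphs:
        if matches(para["text"], dnf):
            writer.writerow([para[col] for col in COLUMNS])
    return out.getvalue()
```

Matching on whole tokens rather than substrings keeps the result set predictable for historians who need an exhaustive, reproducible collection.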
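
The definition on slide 23 (MP A interrupts B ⇔ A speaks during the block of B) translates directly into a count of directed edges. A minimal sketch, assuming the debate is available as a sequence of `(block_owner, speaker)` pairs, one per utterance; this input format and the option to ignore the chair are assumptions, not part of the original definition.

```python
from collections import Counter
from typing import AbstractSet, Iterable, Tuple


def interruption_graph(
    utterances: Iterable[Tuple[str, str]],
    ignore: AbstractSet[str] = frozenset(),
) -> Counter:
    """Count directed edges (A, B): A speaks during the block owned by B.

    `utterances` holds one (block_owner, speaker) pair per utterance.
    Speakers listed in `ignore` (e.g. the chair) never produce edges.
    """
    edges: Counter = Counter()
    for owner, speaker in utterances:
        if speaker != owner and speaker not in ignore:
            edges[(speaker, owner)] += 1
    return edges
```

The resulting `Counter` can be read directly as a weighted graph; `edges.most_common(3)` lists the three most frequent interrupter/target pairs.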
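
A naive baseline for slide 27, grounded only in the example on slide 28: in the original Dutch transcript the chair introduces the list with the sentence “De toezeggingen.” (“The promises.”), and each following sentence in that turn states one promise. The cue string, the `(speaker, text)` input format and the function `listed_promises` are assumptions for illustration. The minister's reply in the same example shows why promises listed by the chair still need to be checked against what was actually said.

```python
import re
from typing import List, Tuple

# Hypothetical cue: the sentence the chair uses in the example before listing promises.
CUE = "De toezeggingen."


def listed_promises(turns: List[Tuple[str, str]]) -> List[str]:
    """Return the sentences that follow CUE within the same turn.

    `turns` is a list of (speaker, text) pairs in debate order.
    """
    promises: List[str] = []
    for _speaker, text in turns:
        idx = text.find(CUE)
        if idx == -1:
            continue
        rest = text[idx + len(CUE):].strip()
        # naive sentence split: break after '.', '!' or '?' followed by whitespace
        promises.extend(s for s in re.split(r"(?<=[.!?])\s+", rest) if s)
    return promises
```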
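
The surface forms on slide 31 (“X is X”, “X zijn X”, “het is wat het is”) can be caught with two regular expressions. This is a hypothetical sketch that only covers these Dutch copula patterns; it does not perform any logical analysis of p → p, and the pattern names are illustrative.

```python
import re
from typing import List

# "X is X" / "X zijn X", e.g. "Regels zijn regels", "Op is op".
# With IGNORECASE, the backreference \1 also matches case-insensitively.
COPULA = re.compile(r"\b(\w+)\s+(?:is|zijn)\s+\1\b", re.IGNORECASE)
# "X is wat X is", e.g. "het is wat het is"
IS_WHAT = re.compile(r"\b(\w+)\s+(?:is|zijn)\s+wat\s+\1\s+(?:is|zijn)\b", re.IGNORECASE)


def find_tautologies(text: str) -> List[str]:
    """Return all substrings matching one of the tautology patterns."""
    return [m.group(0) for pattern in (COPULA, IS_WHAT) for m in pattern.finditer(text)]
```

Such patterns give high precision on a handful of idioms but miss paraphrases, which is what makes the slide's task a research topic rather than a regex exercise.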
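
Slide 32's strategy (search in the new spelling, return results in the old spellings) can be approximated by query expansion. The rewrite rules below are hypothetical and heavily simplified, loosely modelled on older, pre-reform Dutch spelling (“mensch” for “mens”, “Nederlandsche” for “Nederlandse”, “zoo” for “zo”); a real system would need a proper rule set or a historical lexicon.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rules: modern suffix -> older suffix, tried in order.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("se", "sche"),  # nederlandse -> nederlandsche
    ("s", "sch"),    # mens -> mensch
]
# Hypothetical whole-word rules for short function words.
WORD_RULES: Dict[str, str] = {"zo": "zoo"}


def old_spellings(term: str) -> Set[str]:
    """Return the (lowercased) term plus candidate older spellings.

    Over-generation is acceptable here: variants that never occur in the
    corpus simply produce no hits, while real variants raise recall.
    """
    t = term.lower()
    variants = {t}
    if t in WORD_RULES:
        variants.add(WORD_RULES[t])
    for modern, old in SUFFIX_RULES:
        if t.endswith(modern):
            variants.add(t[: -len(modern)] + old)
    return variants


def expand_query(terms: List[str]) -> List[Set[str]]:
    """Each inner set is an OR-group of spellings; the groups are combined with AND."""
    return [old_spellings(t) for t in terms]
```

Expanding the query instead of rewriting the corpus keeps the historical text untouched, so hits can be shown in their original spelling, as the slide proposes.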
