Groningen nl pgroep

PoliticalMashup 1

PoliticalMashup
Connecting the promises and actions of politicians with
how society reacts to them

Maarten Marx

Universiteit van Amsterdam

Groningen, α-informatica, 2011-03-11

Content

• Overview PoliticalMashup project

• Zooming in on one cultural heritage dataset

• A few example applications

• Research ideas for NLP scientists

Who am I?

• Political scientist turned computer scientist

• My field:
  • Theory of XML database systems
  • Semi-structured information retrieval

• Cooperation with
  • Tweede Kamer
  • Koninklijke Bibliotheek
  • Historians at NIOD, DNPP

PoliticalMashup project

• Large scale data integration project

• Two-year NWO-funded infrastructure project, 2010-2012

• Partners: U. Amsterdam, Groningen and Tilburg

• Ongoing with irregular funding since 2008

Goal of PoliticalMashup

• Making huge amounts of textual data available for

• large scale automatic quantitative data and content analysis

• done by scientists from the humanities and social sciences.

Mashup of what and how?

• 4 data sources
  • Promises and actions of politicians
  • Reactions to those in the media and among the general public

• Connect data on
  • Political entities
  • Time
  • Topics

Data sources

Promises
• Election manifestos, mostly scans, DNPP
• Party websites and blogs, Archipol
• Twitter of politicians

Actions
• Parliamentary proceedings, mostly scans, KB

Reactions
• News media
• User-generated content: forums, blogs, comments on news, Twitter

Used techniques

• Text analytics and XML DB and IR technology

• Named entity recognition and normalization

• Data mining, Machine Learning, hand-crafted rules

• Natural Language Processing, Language Models

Make implicit structure and information explicit.

Zoom in on one data corpus

Longitudinal data

• Weekly measurements for over 150 years

• very stable measurement procedure and data model

Data about human behaviour

Often rather boring

But sometimes full of drama and excitement

Loads of measurement points

24,000 days, 450,000 topics, 7.5 million speeches

Digitally available

Handelingen der Staten-Generaal (the Dutch Hansard)

About this collection

• Very little metadata is available

• Very rich “metadata” is hidden inside the raw data

• Rich data model
• Meeting (1 Day)
• Topic
• Stage direction
• Scene
• Stage direction
• Speech
• Paragraph
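
The data model above can be pictured as nested XML. The sketch below is only an illustration: the element names, the attribute, and the exact nesting (stage directions inside both topics and scenes) are assumptions read off the list, not the project's actual schema, and the content is placeholder text.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and toy content; the real schema may differ.
SAMPLE = """
<meeting date="1900-01-01">
  <topic>
    <stage-direction>Placeholder stage direction.</stage-direction>
    <scene>
      <stage-direction>Placeholder stage direction.</stage-direction>
      <speech speaker="Speaker A">
        <p>Placeholder paragraph.</p>
      </speech>
    </scene>
  </topic>
</meeting>
""".strip()


def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and all its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


if __name__ == "__main__":
    # Print the hierarchy as an indented outline.
    for depth, tag in outline(ET.fromstring(SAMPLE)):
        print("  " * depth + tag)
```

Printing the outline makes the hierarchy visible: one meeting per day, topics inside it, and paragraphs at the deepest level.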

Same data: different views

• Raw data in PDF

• XML styled with stylesheet

• Machine readable XML format

Some applications of this

Content and structure search

• Combine IR style keyword search with restrictions on structure.

• E.g., return speeches by Wilders about Islam
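
A minimal sketch of such a combined query, assuming speeches are stored as XML elements with a speaker attribute. The element names, the attribute name, and the toy speeches are hypothetical; a real system would use an XML database and a ranked retrieval model rather than substring matching.

```python
import xml.etree.ElementTree as ET

# Toy document with hypothetical element and attribute names.
DOC = ET.fromstring(
    "<topic>"
    "<speech speaker='A'><p>The budget must be balanced.</p></speech>"
    "<speech speaker='B'><p>The budget is fine as it is.</p></speech>"
    "<speech speaker='A'><p>Education needs more teachers.</p></speech>"
    "</topic>"
)


def search(root, keyword, speaker=None):
    """Return the text of speeches containing keyword, optionally by one speaker."""
    hits = []
    for speech in root.iter("speech"):
        # Structural restriction: only speeches by the requested speaker.
        if speaker is not None and speech.get("speaker") != speaker:
            continue
        text = " ".join(p.text or "" for p in speech.iter("p"))
        # Content restriction: case-insensitive keyword match.
        if keyword.lower() in text.lower():
            hits.append(text)
    return hits


if __name__ == "__main__":
    print(search(DOC, "budget", speaker="A"))
```

The structural part (who speaks) and the content part (what is said) are checked independently, which is what distinguishes this from plain keyword search.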

Exhaustive data collection

• Example query for NIOD historians

• Search for paragraphs about fascisme OR nazisme OR dictatuur
OR (nazi AND dictatuur) OR . . .

• Return a TSV file with, for each hit: date speakername speakerid
speaker-party . . .

• NIOD query
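
The sketch below shows one way such a query could be run over paragraphs and exported. It covers only the part of the Boolean query that is spelled out above (the elided terms are left out), uses the four column names listed above, and runs on placeholder records; the record layout is an assumption.

```python
import csv
import io
import re

COLUMNS = ["date", "speakername", "speakerid", "speaker-party"]


def matches(text):
    """Evaluate the spelled-out part of the Boolean query on one paragraph."""
    words = set(re.findall(r"\w+", text.lower()))
    return (
        "fascisme" in words
        or "nazisme" in words
        or "dictatuur" in words
        # Kept to mirror the query as written, although "dictatuur" alone
        # already covers this clause.
        or ("nazi" in words and "dictatuur" in words)
    )


def hits_to_tsv(paragraphs):
    """Return a TSV string with one header row and one row per matching paragraph."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for paragraph in paragraphs:
        if matches(paragraph["text"]):
            writer.writerow([paragraph[column] for column in COLUMNS])
    return buffer.getvalue()


# Placeholder records, not real proceedings.
PARAGRAPHS = [
    {"date": "1900-01-01", "speakername": "Speaker A", "speakerid": "id-1",
     "speaker-party": "Party X", "text": "Het fascisme is een gevaar."},
    {"date": "1900-01-02", "speakername": "Speaker B", "speakerid": "id-2",
     "speaker-party": "Party Y", "text": "De begroting is op orde."},
]

if __name__ == "__main__":
    print(hits_to_tsv(PARAGRAPHS), end="")
```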

Link the proceedings to entities

• Who is speaking?

• Who says what to whom?

Applications

• Summary of one speaker

• On old OCRed data: Linking and resolving entities

Application: Interruption graph (Attackogram)

• MP A interrupts B ⇐⇒ A speaks during the block of B.
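
Under this definition the graph can be computed by a single pass over the blocks. The sketch assumes each block is given as its owner plus the ordered list of speakers inside it; that input format is hypothetical, and the special role of the chair is ignored here.

```python
from collections import Counter


def interruption_edges(blocks):
    """Count edges (interrupter, interrupted) over a list of blocks.

    Each block is a pair (owner, speakers), where speakers lists who
    spoke inside the block, in order. Every speech by someone other
    than the owner counts as one interruption of the owner.
    """
    edges = Counter()
    for owner, speakers in blocks:
        for speaker in speakers:
            if speaker != owner:
                edges[(speaker, owner)] += 1
    return edges


# Toy input: A speaks twice during B's block, C once during A's block.
BLOCKS = [
    ("B", ["B", "A", "B", "A", "B"]),
    ("A", ["A", "C", "A"]),
]

if __name__ == "__main__":
    for (source, target), weight in sorted(interruption_edges(BLOCKS).items()):
        print(f"{source} -> {target}: {weight}")
```

The edge weights give a directed, weighted graph that can be drawn directly as the interruption graph.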

NLP research topics

0) Topics

• Common European thesaurus http://eurovoc.europa.eu

• detection

• classification (sentence, paragraph, speech level)

1) Populist language in parliament

• PhD Thesis Jan Jagers (2006).

2) Automatically detecting promises (’toezegging’)
by ministers in Parliament

• https://zoek.officielebekendmakingen.nl/kst-103196.pdf
(page 56)

• Eerste Kamer has a nice database online
http://www.eerstekamer.nl/toezeggingen_2

Example

The chair: I note that we have almost reached the end of this
meeting. We still have time to run through the promises
(toezeggingen). I ask everyone to check that nothing has been
overlooked. I will do this quickly, and after that we will briefly
discuss the follow-up. The promises.
After the summer the bill will be before the House.
A letter will follow to inform the House how the loss of expertise
will be prevented.
Minister Van Bijsterveldt-Vliegenthart: I did not promise that.
Definitely not. No, I will not do that, because I did not promise
it.

3) Opinion detection

• Detect opinions expressed about entities and topics. (Speaker is
known)

• Detect reported speech.

4) Detect type of speech

• Interruption, attack, answer, speech (“betoog”), ’stage-direction’,
...

• http://data.politicalmashup.nl/debates/nl/h-ek-19961997-37-58.1-tijdslijn.html

5) Detect “bullshit”

• Tautologies . . .

• “Regels zijn regels” (rules are rules), “Op is op” (gone is gone)

• p→p

• “Het is wat het is” (it is what it is)
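
As a first baseline, the surface forms of p→p quoted above can be caught with two regular expressions. This is only a sketch: the patterns are assumptions that cover just these two shapes ("X is/zijn X" and "X is wat X is"), and real tautology detection would need more than string matching.

```python
import re

# "X is X" / "X zijn X": the same word on both sides of the copula.
X_IS_X = re.compile(r"\b(\w+)\s+(?:is|zijn)\s+\1\b", re.IGNORECASE)
# "X is wat X is": the "it is what it is" shape.
X_IS_WAT_X_IS = re.compile(r"\b(\w+)\s+is\s+wat\s+\1\s+is\b", re.IGNORECASE)


def find_tautologies(text):
    """Return all substrings of text that match one of the two patterns."""
    found = []
    for pattern in (X_IS_X, X_IS_WAT_X_IS):
        found.extend(match.group(0) for match in pattern.finditer(text))
    return found


if __name__ == "__main__":
    print(find_tautologies("Regels zijn regels, op is op."))
    print(find_tautologies("Het is wat het is."))
    print(find_tautologies("De Kamer is akkoord."))
```

The backreference `\1` together with `re.IGNORECASE` lets "Regels" match "regels", so capitalisation at the start of a sentence does not hide a hit.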

6) Spelling normalization

• Dutch has had many spelling reforms.

• Older texts therefore use different spellings, which lowers recall
for queries in present-day spelling.

• Search in new spelling, return results in old spellings.
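
One simple way to approach this is query expansion: each modern query term is expanded with its known older variants before matching. The sketch below assumes a small hand-made variant table (the three entries are illustrative examples of pre-reform spellings) and toy documents; a usable system would need a much larger lexicon or learned rewrite rules.

```python
import re

# Illustrative, hand-made table: modern spelling -> older spellings.
VARIANTS = {
    "mens": ["mensch"],
    "zo": ["zoo"],
    "vis": ["visch"],
}


def expand_query(terms):
    """Return the query terms together with their known older spellings."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded.update(VARIANTS.get(term, []))
    return expanded


def search(documents, terms):
    """Return documents containing any query term in new or old spelling."""
    wanted = expand_query(terms)
    hits = []
    for document in documents:
        words = set(re.findall(r"\w+", document.lower()))
        if words & wanted:
            hits.append(document)
    return hits


# Toy documents, not real proceedings.
DOCUMENTS = ["De mensch is vrij.", "De mens is vrij.", "De begroting is op orde."]

if __name__ == "__main__":
    print(search(DOCUMENTS, ["mens"]))
```

Without the expansion step, the query "mens" would miss the first document, which is exactly the recall loss described above.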

Lots of data available: happy to share

• Now: 15 years of Dutch Parliamentary Proceedings in rich XML

• Now: 200 years more in poorer XML, slowly getting richer.

• Parliamentary proceedings from EU (15y), UK (75y), Spain (40y),
Scandinavian countries, . . .

• Election manifestos (provincial elections 2007 and 2011)

• All tweets, blogs, Flickr and YouTube content of all Dutch national
politicians from the past 1.5 years.

Thanks

maartenmarx@uva.nl
