1. Bart de Goede Maarten Marx
Political slant in public broadcasting
9 June 2011
Politicologenetmaal, Amsterdam
woensdag 8 juni 2011 (week )
2. Research aim
• Frivolous research for a bachelor thesis
• Research aim: Apply methodology Gentzkow & Shapiro
(2010) to Dutch situation, perhaps improve using NLP
• Future applications:
• Analysis of Dutch media landscape (NewsMonitor)
• Agenda-setting and framing research (Timmermans, Breeman)
• Parliament and media: lag or lead? (Vliegenthart)
3. Disclaimer
• We are information scientists, not political scientists
• We might have made awful conceptual mistakes
• We will have missed almost all important references
4. Disclaimer
• Our aim is to show a powerful technique
• We concentrated on getting the data ‘in shape’, rather than
interpretation of results
5. Talk outline
1. Research plan and methodology
2. Description of our research
3. Results
4. What’s next?
6. Gentzkow & Shapiro
• Econometric research: compare the language of news
outlets to political language
• ‘An economically significant demand for news slanted
towards one’s own political ideology’
Gentzkow, M. and Shapiro, J. M. (2010). What drives media slant? Evidence
from U.S. daily newspapers. Econometrica, 78(1):35–71.
7. Gentzkow & Shapiro
Operationalization
• Find characteristic words and phrases of Democrats and
Republicans in the Congressional Record (Republican ‘death tax’
versus Democratic ‘estate tax’)
• Count relative frequencies of these words in newspapers
• Score newspapers on ‘political slant’ by comparing
frequencies of Democratic and Republican words
• ... (even more, but not relevant to us)
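The counting and scoring steps above can be illustrated with a minimal sketch. The phrase sets, the toy newspaper text and the simple difference score are assumptions made for illustration only; Gentzkow & Shapiro's actual estimator is more elaborate than this.

```python
from collections import Counter


def relative_frequency(tokens, phrases):
    """Share of all tokens that belong to the given phrase set."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[p] for p in phrases) / total


def slant_score(tokens, democratic, republican):
    """Illustrative score: positive leans Republican, negative leans Democratic."""
    return relative_frequency(tokens, republican) - relative_frequency(tokens, democratic)


# Hypothetical data; phrases are pre-joined with underscores for simplicity.
democratic = {"estate_tax"}
republican = {"death_tax"}
newspaper = ("the death_tax debate returned as the estate_tax vote "
             "failed and death_tax critics spoke").split()

score = slant_score(newspaper, democratic, republican)
print(round(score, 4))
```

With 13 tokens, two Republican phrases and one Democratic phrase, the toy score is 2/13 - 1/13 = 1/13.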
8. Our research
Reproduce, with some alterations
• Dutch versus English: compound words, unigrams instead
of bigrams
• Television data instead of newspapers
• Far more political parties
• Other, more powerful technique for finding characteristic
words
9. Our research
An outline
1. Collecting TV data
2. Selecting appropriate broadcasts
3. Defining political groups
4. Obtaining data for each group
5. Obtaining characteristic words
6. Comparing word use in political groups and TV broadcasts
10. TV Data
• Subtitles for the hearing impaired (http://tt888.nl)
• Complete data from January 2008 till February 2011
• Problem: hardly any useful metadata (for 63% of the broadcasts
only the date and time of broadcast are known)
11. TV Data
Solution
• TV guide with title
• Used http://tv2day.nl to combine broadcast time with
(unambiguous) programme title

                          Before            After
Programme                 16,995            32,491
Unique titles             4,560 -> 2,238    2,702
Single events             1,598             1,174
Broadcast frequency >2    1,104             1,064
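Combining a broadcast time with a TV guide could look roughly like the sketch below. The record layout (`channel`, `start`, `end`, `title`) and the sample schedule are hypothetical; the slides do not describe the actual tt888.nl or tv2day.nl data formats. A title is only assigned when exactly one guide entry covers the broadcast time.

```python
from datetime import datetime


def match_title(broadcast, guide):
    """Return the guide title covering the broadcast start, or None if ambiguous or missing."""
    titles = {
        entry["title"]
        for entry in guide
        if entry["channel"] == broadcast["channel"]
        and entry["start"] <= broadcast["start"] < entry["end"]
    }
    # Only accept an unambiguous match: exactly one candidate title.
    return titles.pop() if len(titles) == 1 else None


# Made-up schedule for illustration.
guide = [
    {"channel": "channel-1", "start": datetime(2010, 3, 1, 20, 0),
     "end": datetime(2010, 3, 1, 20, 30), "title": "Programme A"},
    {"channel": "channel-1", "start": datetime(2010, 3, 1, 20, 30),
     "end": datetime(2010, 3, 1, 21, 0), "title": "Programme B"},
]

broadcast = {"channel": "channel-1", "start": datetime(2010, 3, 1, 20, 0)}
title = match_title(broadcast, guide)
print(title)
```

Broadcasts without a covering guide entry, or with conflicting entries, stay untitled in this sketch.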
12. Selected broadcasts
Word counts (shown for eight programmes):
Nova                     362,844 words
Pauw & Witteman          895,935 words
DWDD                     1,626,929 words
EenVandaag               1,556,642 words
NOS Journaal             12,609,620 words
Goedemorgen Nederland    760,658 words
Netwerk                  879,635 words
NOS Jeugdjournaal        1,383,728 words

All selected programmes:
Buitenhof, DWDD, EenVandaag, Goedemorgen Nederland, Het Elfde Uur,
Holland Doc, Knevel en Van den Brink, Netwerk, Nieuwsuur,
NOS Jeugdjournaal, NOS Journaal, Nova, Ochtendspits, Pauw & Witteman,
PowNews, SchoolTV Weekjournaal, Sinterklaasjournaal, Tegenlicht,
Uitgesproken, Vragenuurtje, Zembla
13. Political groups
• Parliamentary period with the greatest overlap with the TV data set:
Balkenende IV
• Experiments with e.g. Wordfish have shown that text
comparisons mostly measure government - opposition,
not left - right (Hirst et al., 2010)
Hirst, G., Riabinin, Y., Graham, J., and Boizot-Roche, M. (2010). Text to
Ideology or Text to Party Status?
14. Political groups
• Therefore, we choose:
• Government (CDA, PvdA and ChristenUnie)
• Left wing opposition (GroenLinks, SP)
• Right wing opposition (PVV, VVD)
15. Obtaining proceedings data
Trivial, using the PoliticalMashup database
$collection//HAN1995//
root[date restriction]//
speech[@party matches(party names)]/p/text()
Explain query:
HAN1995: all proceedings since 1995
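The selection that the query expresses (documents within a date range, speeches by the chosen parties, text of their paragraphs) can be sketched in Python on a toy document. The XML layout below, including the `date` and `party` attributes, is a simplification assumed for illustration and is not the actual PoliticalMashup schema; the date range is arbitrary.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical, simplified proceedings collection.
SAMPLE = """
<collection>
  <root date="2008-05-20">
    <speech party="SP"><p>First paragraph.</p><p>Second paragraph.</p></speech>
    <speech party="VVD"><p>Another speaker.</p></speech>
  </root>
  <root date="2005-01-11">
    <speech party="SP"><p>Outside the date range.</p></speech>
  </root>
</collection>
""".strip()


def paragraphs(xml_text, parties, start, end):
    """Yield paragraph text of speeches by `parties` in documents dated within [start, end]."""
    collection = ET.fromstring(xml_text)
    for doc in collection.iter("root"):
        day = date.fromisoformat(doc.get("date"))
        if not (start <= day <= end):
            continue
        for speech in doc.iter("speech"):
            if speech.get("party") in parties:
                for p in speech.findall("p"):
                    yield p.text or ""


left = list(paragraphs(SAMPLE, {"SP", "GroenLinks"}, date(2007, 1, 1), date(2010, 12, 31)))
print(left)
```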
16. Characteristic words
Parsimonious language model
• Transform word frequency counts into probability
distributions of words (maximum likelihood estimation)
• Compare distributions of subsets to distribution of all words
• Choose words from subset whose frequency is much
higher than expected
• Adjust probabilities:
$$e_t = tf(t, D) \cdot \frac{\lambda P(t|D)}{(1 - \lambda) P(t|C) + \lambda P(t|D)}$$
• Iterate to convergence:
$$P(t|D) = \frac{e_t}{\sum_t e_t}$$
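The two update steps can be sketched as a small EM loop. This is a minimal illustration, not the authors' implementation: the value of λ, the iteration cap, the stopping tolerance and the toy corpus are all assumptions, and the subset D is assumed to be part of the full corpus C.

```python
from collections import Counter


def parsimonious_lm(doc_tokens, corpus_tokens, lam=0.1, iterations=50, tol=1e-9):
    """Estimate P(t|D) by repeatedly applying the two update steps."""
    tf = Counter(doc_tokens)
    corpus_tf = Counter(corpus_tokens)
    corpus_total = sum(corpus_tf.values())
    doc_total = sum(tf.values())
    p_c = {t: corpus_tf[t] / corpus_total for t in tf}   # background model P(t|C)
    p_d = {t: tf[t] / doc_total for t in tf}             # maximum likelihood start
    for _ in range(iterations):
        # Adjust probabilities: e_t = tf(t, D) * lam*P(t|D) / ((1-lam)*P(t|C) + lam*P(t|D))
        e = {}
        for t in tf:
            den = (1 - lam) * p_c[t] + lam * p_d[t]
            e[t] = tf[t] * lam * p_d[t] / den if den > 0 else 0.0
        # Renormalise: P(t|D) = e_t / sum_t e_t
        total = sum(e.values())
        new_p = {t: e[t] / total for t in tf}
        change = max(abs(new_p[t] - p_d[t]) for t in tf)
        p_d = new_p
        if change < tol:
            break
    return p_d


# Toy data: 'voorzitter' occurs in both groups, 'leraar' only on the left.
left = ["voorzitter"] * 5 + ["leraar"] * 5
right = ["voorzitter"] * 5 + ["politie"] * 5
model = parsimonious_lm(left, left + right)
print(sorted(model.items(), key=lambda kv: -kv[1]))
```

In this toy case the shared word loses probability mass with each iteration, while the word that is frequent only in the subset gains it.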
17. Characteristic words
Why take the trouble?
• Filter out (corpus specific) ‘stopwords’ (e.g. ‘voorzitter’, i.e. ‘chair’)
• Remove noise (‘kopvoddentaks’, i.e. ‘head rag tax’, out; ‘sharia’ in)
18. In action
Top 5 characteristic words
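left (SP, GroenLinks)                     right (PVV, VVD)
leraar (teacher)                          politie (police)
student (student)                         crimineel (criminal)
kinderombudsman (children's ombudsman)    straf (punishment)
docent (lecturer)                         illegaal (illegal)
bonus (bonus)                             boete (fine)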
19. In action
Source: http://politiekinzicht.com
20. In action
Source: http://politiekinzicht.com
25. Comparison
1. Find most characteristic words for each political group
2. For each political group, estimate the probability that an
arbitrary word in a TV programme is one of its characteristic
words
$$\hat{P}(q|TV) = \frac{\sum_{t \in q} tf_{t,TV}}{|TV|}$$
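A minimal sketch of this estimate, reading q as a group's set of characteristic words, tf as the word's count in the programme and |TV| as the programme's total word count. That reading, the whitespace tokenisation and the toy sentence are assumptions for illustration.

```python
from collections import Counter


def prob_characteristic(tv_tokens, characteristic_words):
    """Estimated probability that an arbitrary TV word is one of the characteristic words."""
    if not tv_tokens:
        return 0.0
    tf = Counter(tv_tokens)
    # Sum the term frequencies of the characteristic words, divide by |TV|.
    return sum(tf[t] for t in set(characteristic_words)) / len(tv_tokens)


# Toy programme text, using words from the top 5 lists as the query sets.
tv = "de politie gaf de leraar een boete".split()
p_left = prob_characteristic(tv, ["leraar", "student"])
p_right = prob_characteristic(tv, ["politie", "boete"])
print(p_left, p_right)
```

Here one of the seven words is a left-wing characteristic word and two are right-wing ones, giving 1/7 and 2/7.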
26. Results
[Figure: DWDD. Estimated probability of words appearing (y-axis, 0 to 0.7)
against n parsimonious derived words (x-axis, 50 to 3000, condensed values).
Series: gov, left, right.]
27. Results
[Figure: PowNews. Estimated probability of words appearing (y-axis, 0 to 0.7)
against n parsimonious derived words (x-axis, 50 to 3000, condensed values).
Series: gov, left, right.]
28. Results
[Figure: News (Journaal, Ochtendspits, etc.). Estimated probability of words
appearing (y-axis, 0 to 0.04) against n parsimonious derived words (x-axis,
50 to 250). Series: cda, christenunie, d66, groenlinks, pvda, pvdd, pvv, sgp,
sp, verdonk, vvd.]
29. Results
[Figure: Talkshows. Cumulative probability of words appearing (y-axis, 0 to
0.03) against n parsimonious derived words (x-axis, 50 to 250). Series: cda,
christenunie, d66, groenlinks, pvda, pvdd, pvv, sgp, sp, verdonk, vvd.]
30. ‘Conclusions’
• Right never ‘wins’
• Possible explanations:
• TV = ‘left church’ (‘linkse kerk’)
• TV does not pick up right-wing slanted words
• Or: does language use on TV simply not differ from regular Dutch?
31. What’s next?
• First, turn all this into a bachelor thesis (deadline in two weeks)
• Future:
• Team up with researcher(s) in political science and media
analysis
Candidates?
• Try out more sophisticated NLP techniques
• ...
• Publish article
32. Questions?
Slides available at http://www.politicalmashup.nl