15. political discourseinthenewskb

Political Discourse in the News (and other studies)
Antske Fokkens, VU University Amsterdam
Political Discourse in the News is joint work with: Ellis Aizenberg, Wouter van Atteveldt, Carlotta Cassimassima, Franz-Xaver Geiger, Laura Hollink, Annick van der Peet, Chantal van Son

Overview
• Introduction
• Interdisciplinary research & research questions
• Text analysis
• From basic to complex: possibilities and challenges
• Methodological issues
• Conclusion

Introduction
• Interdisciplinary research:
  • Social Science: manual annotation, research questions
  • Humanities: research questions
  • Computer Science: modeling, visualization
  • Computational Linguistics: text analysis

Introduction
• Research questions:
  • Has personalization increased in political news?
  • What trends do we see in reported political conflicts?
  • How does news reporting relate to the parliamentary debates?
  • What perspectives are expressed by news (explicitly and implicitly)?

Approaches
• Manual annotations:
  • Expert (communication science researchers and Master students)
  • Crowd (crowdsourcing)
• Automatic annotation:
  • Basic as well as advanced NLP approaches

Text analysis
• AmCAT (Wouter van Atteveldt):
  • Open source infrastructure that facilitates large-scale analysis and manual content analysis of text
• BiographyNet/NewsReader pipeline (Piek Vossen’s CLTL group):
  • NLP modules for event (and event relation) extraction & named entity recognition and disambiguation
• OpeNER tools (Piek Vossen’s CLTL group):
  • Sentiment analysis and opinion mining

Basic methods
• Counting:
  • occurrences of names in text
  • identifying words from word lists (e.g. sentiment words)
• Topic modeling (e.g. LDA)
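
The counting approaches listed above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the name list, the word list, and the example sentence are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical inputs for illustration only; not data from the study.
NAMES = {"jansen", "bakker"}
NEGATIVE_WORDS = {"crisis", "conflict", "erg"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_matches(text, vocabulary):
    """Count how often each vocabulary item occurs in the text."""
    return Counter(tok for tok in tokenize(text) if tok in vocabulary)


# Invented example sentence (Dutch), used only to exercise the functions.
article = "Jansen noemt het conflict erg vervelend. Bakker reageert op Jansen."

name_counts = count_matches(article, NAMES)
negative_counts = count_matches(article, NEGATIVE_WORDS)

print(name_counts)
print(negative_counts)
```

Exact token matching like this is fast and transparent, but it treats every listed word as equally informative, whatever its meaning in context.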

Basic methods
• Can easily be run on large datasets
• Can address research questions (e.g. Aizenberg (2014) shows an increase in personalization)
• Limited to overall trends and tendencies
• For some tasks, high risk of unreliable results:
  • e.g. the Dutch word "erg" is listed with ‘negative sentiment’, although it is often used as a neutral intensifier ("very")

More advanced analyses
• Can provide more detailed insight into the content of the text
• Scalability becomes an issue (several complex language models)
  • To illustrate:
    • approximately 5 minutes per article (regular university cluster)
    • 11 days for 1.3 million articles on the Hadoop cluster at SURFsara
• Accuracy can be low for difficult tasks and because errors ‘pile up’ across processing steps
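
To put the scalability figures in perspective, the arithmetic can be worked out as below. The three constants come from the slide. The implied degree of parallelism is only a rough estimate: it assumes that processing one article takes the same time on a Hadoop node as on the regular university cluster.

```python
MINUTES_PER_ARTICLE = 5      # approximate figure from the slide
N_ARTICLES = 1_300_000       # corpus size from the slide
CLUSTER_DAYS = 11            # reported wall-clock time on the Hadoop cluster

MINUTES_PER_DAY = 60 * 24

# Time needed if the articles were processed one after another.
sequential_minutes = MINUTES_PER_ARTICLE * N_ARTICLES
sequential_days = sequential_minutes / MINUTES_PER_DAY

# Rough estimate of how many articles must have run in parallel
# (assumption: same per-article processing time on both systems).
implied_parallelism = sequential_days / CLUSTER_DAYS

print(f"sequential: {sequential_days:.0f} days (~{sequential_days / 365:.1f} years)")
print(f"implied parallelism: ~{implied_parallelism:.0f}x")
```

Processed sequentially, the corpus would take roughly 4,500 days, which is why the advanced pipeline is only practical on a cluster.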

Methodological issues
• Data interpretation
• Biases
• Example: OCR

Data interpretation
• Basic methods:
  • results from counts are clear, but what do they say?
• More advanced methods:
  • attempt to provide semantic interpretations, but what is the accuracy of the tools?
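
One way to reason about the accuracy of a multi-step pipeline is sketched below. It rests on a simplifying assumption: each step errs independently, and a final result is only correct when every step before it was correct. The step names and accuracy values are hypothetical, not measurements of the tools mentioned in this talk.

```python
from math import prod


def pipeline_accuracy(step_accuracies):
    """Upper-level accuracy when every step must be right (independence assumed)."""
    return prod(step_accuracies)


# Hypothetical per-step accuracies, for illustration only.
steps = {
    "tokenization": 0.99,
    "pos_tagging": 0.95,
    "named_entity_recognition": 0.85,
    "event_extraction": 0.70,
}

overall = pipeline_accuracy(steps.values())
print(f"overall accuracy under these assumptions: {overall:.2f}")
```

Even when each step looks acceptable on its own, the combined figure can be much lower, so reporting only per-module accuracy can overstate the quality of the final output.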

Biases
• One way to deal with errors is to assume that they are just noise in a large pile of data
• This assumption only holds if errors are equally distributed across the classes/information that matter for the research question
• For instance, when counting sentiment-related terms:
  • are the lists for negative and positive terms of comparable quality?
  • does one of the lists contain more ambiguous terms than the other?

Bias example: OCR
• Data from the KB (National Library of the Netherlands) still have some OCR issues
• Older data tend to have more issues
• Imagine we investigate whether emotional expressions in text have increased over time: does worse OCR lead to a lower rate of identification in older text?
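
The OCR question above can be explored with a small simulation. This is a sketch under stated assumptions, not an analysis of KB data: the word lists, the character error rates, and the noise model (each character is independently replaced by "#") are all hypothetical. The same tokens are used for every condition, so any drop in the detected share is caused by the simulated OCR noise alone.

```python
import random

# Hypothetical word lists, for illustration only.
EMOTION_WORDS = {"angst", "woede", "vreugde"}
NEUTRAL_WORDS = ["kamer", "minister", "debat", "krant", "wet", "stad", "jaar"]


def make_tokens(n_tokens, emotion_share, rng):
    """Generate tokens in which roughly emotion_share are emotion words."""
    emotion = sorted(EMOTION_WORDS)
    return [
        rng.choice(emotion) if rng.random() < emotion_share else rng.choice(NEUTRAL_WORDS)
        for _ in range(n_tokens)
    ]


def add_ocr_noise(token, char_error_rate, rng):
    """Replace each character by '#' with probability char_error_rate."""
    return "".join("#" if rng.random() < char_error_rate else ch for ch in token)


def detected_share(tokens, char_error_rate, rng):
    """Share of tokens still matched by exact lookup after adding noise."""
    noisy = [add_ocr_noise(tok, char_error_rate, rng) for tok in tokens]
    return sum(tok in EMOTION_WORDS for tok in noisy) / len(noisy)


rng = random.Random(0)  # fixed seed so the run is repeatable
tokens = make_tokens(20_000, emotion_share=0.05, rng=rng)

clean = detected_share(tokens, 0.0, rng)     # no OCR errors
recent = detected_share(tokens, 0.01, rng)   # assumed error rate, recent text
old = detected_share(tokens, 0.10, rng)      # assumed error rate, older text

print(f"clean: {clean:.3f}  recent: {recent:.3f}  old: {old:.3f}")
```

In this setup the true share of emotion words is identical across conditions, yet the noisier "older" text yields a lower detected share. A trend analysis that ignores OCR quality could read that gap as an increase in emotional language over time.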

Dealing with biases
• We cannot exclude the risk of biases completely
• We can:
  • try to make sure researchers using the output are aware of the details of the method (raise awareness of possible biases)
  • carry out both intrinsic and extrinsic evaluation, i.e. explicitly investigate the influence of a bias on overall results

Conclusion
• Several research directions where technology (including NLP, linked data, visualizations) is used to support research in Humanities and Social Sciences
• NLP approaches vary from basic methods to complex pipelines carrying out several steps
• Basic approaches can easily be applied to large datasets and are transparent, but do not say much
• More advanced approaches provide detailed information, but cannot easily be applied to large datasets and are less transparent
• Insight into how data was processed and both intrinsic and extrinsic evaluation are needed to raise awareness about (or even avoid?) biases

References
• AmCAT: http://vanatteveldt.com/amcat/
• BiographyNet/NewsReader pipeline:
  • Rodrigo Agerri et al. 2015. Event Detection version 2.2. NewsReader Deliverable 4.2.2. http://www.newsreader-project.eu/files/2012/12/NWR-D4-2-2.pdf
• Methodological issues:
  • Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan Legêne and Guus Schreiber. 2014. BiographyNet: Methodological issues when NLP supports historical research. Proceedings of LREC 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/1103_Paper.pdf
  • Niels Ockeloen, Antske Fokkens, Serge ter Braake, Piek Vossen, Victor de Boer, Guus Schreiber and Susan Legêne. 2013. BiographyNet: Managing Provenance at multiple levels and from different perspectives. Proceedings of Linked Science 2013. http://linkedscience.org/wp-content/uploads/2013/04/paper7.pdf
