1. Political Discourse in the News
(and other studies)
Antske Fokkens, VU University Amsterdam
Political discourse in the News is joint work with: Ellis Aizenberg, Wouter van Atteveldt, Carlotta
Cassimassima, Franz-Xaver Geiger, Laura Hollink, Annick van der Peet, Chantal van Son
2. Overview
• Introduction
• Interdisciplinary research & research questions
• Text analysis
• From basic to complex: possibilities and challenges
• Methodological issues
• Conclusion
3. Introduction
• Interdisciplinary research:
• Social Science: manual annotation, research
questions
• Humanities: research questions
• Computer Science: modeling, visualization
• Computational Linguistics: text analysis
4. Introduction
• Research questions:
• Has personalization increased in political news?
• What trends do we see in reported political
conflicts?
• How does news reporting relate to the
parliamentary debates?
• What perspectives are expressed by news
(explicitly and implicitly)?
5. Approaches
• Manual annotations:
• Expert (communication science researchers
and Master students)
• Crowd (crowdsourcing)
• Automatic annotation:
• Basic as well as advanced NLP approaches
6. Text analysis
• AmCAT (Wouter van Atteveldt):
• Open source infrastructure that facilitates large-scale
analysis and manual content analysis of text
• BiographyNet/NewsReader pipeline (Piek Vossen’s CLTL group):
• NLP modules for event (and event relation) extraction &
named entity recognition and disambiguation
• OpeNER tools (Piek Vossen’s CLTL group):
• Sentiment analysis and opinion mining
7. Basic methods
• Counting:
• occurrences of names in text
• identifying words from word lists (e.g.
sentiment words)
• Topic modeling (e.g. LDA)
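The counting methods above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the name list, the word list and the example sentence are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical inputs for illustration only; not taken from the study.
NAMES = {"jansen", "bakker"}
NEGATIVE_WORDS = {"crisis", "conflict"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def count_terms(text, terms):
    """Count how often each term from `terms` occurs in `text`."""
    counts = Counter(tokenize(text))
    return {term: counts[term] for term in sorted(terms) if counts[term] > 0}

article = "Jansen and Bakker clash: Jansen calls the conflict a crisis."
name_counts = count_terms(article, NAMES)
negative_counts = count_terms(article, NEGATIVE_WORDS)
```

Matching on single lowercase tokens is the simplest possible choice; multi-word names and inflected forms would need extra handling.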
8. Basic methods
• Can easily be run on large datasets
• Can address research questions (e.g. Aizenberg
(2014) shows an increase in personalization)
• Limited to overall trends and tendencies
• For some tasks, high risk of unreliable results:
• e.g. Dutch erg is listed with ‘negative sentiment’, although it
is often used as the intensifier ‘very’
10. More advanced analyses
• Can provide more detailed insight into the content of the text
• Scalability becomes an issue (the pipeline runs several complex
language models)
• To illustrate:
• +/- 5 minutes per article (regular university cluster)
• 11 days for 1.3 million articles on Hadoop cluster at
SURFsara
• Accuracy can be low, both because some tasks are difficult and
because errors ‘pile up’ across pipeline steps
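The run times above can be put in perspective with some quick arithmetic. The sketch below assumes, purely for illustration, that an article costs the same 5 minutes on a Hadoop node as on the university cluster, which the slides do not state.

```python
MINUTES_PER_ARTICLE = 5      # "+/- 5 minutes per article" (from the slide)
N_ARTICLES = 1_300_000       # corpus size (from the slide)
CLUSTER_DAYS = 11            # reported run time on the Hadoop cluster

MINUTES_PER_DAY = 24 * 60

# Time needed if the articles were processed one after another.
sequential_days = MINUTES_PER_ARTICLE * N_ARTICLES / MINUTES_PER_DAY
sequential_years = sequential_days / 365

# Degree of parallelism implied by the 11-day run, under the assumption above.
implied_parallelism = sequential_days / CLUSTER_DAYS

print(f"{sequential_days:.0f} days sequentially, about {sequential_years:.1f} years")
print(f"implied parallelism: roughly {implied_parallelism:.0f} articles at a time")
```

Under that assumption, sequential processing would take roughly 4,500 days (over 12 years), so the 11-day run corresponds to about 410 articles being processed in parallel.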
12. Data interpretation
• Basic methods:
• results from counts are clear, but what do they
say?
• More advanced methods:
• attempt to provide semantic interpretations,
but what is the accuracy of the tools?
13. Biases
• One way to deal with errors is to assume that they are just noise in
a large pile of data
• This assumption holds only if errors are equally distributed across
the classes/information that matter for the research question
• For instance, counting sentiment related terms:
• are the lists for negative and positive terms of comparable
quality?
• does one of the lists contain more ambiguous terms than the
other?
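Why the distribution of errors matters can be shown with a small calculation. All numbers below are hypothetical: a corpus with equally many positive and negative expressions is scored once with two equally imperfect lists, and once with a negative list that contains an ambiguous word producing spurious matches.

```python
def observed_count(true_count, recall, false_hits):
    """Expected number of lexicon matches: true terms found plus spurious matches."""
    return true_count * recall + false_hits

# Hypothetical corpus with exactly as many positive as negative expressions.
true_pos = true_neg = 1000

# Case 1: both lists miss 20% of the terms, so the comparison is unaffected.
ratio_equal = observed_count(true_neg, 0.8, 0) / observed_count(true_pos, 0.8, 0)

# Case 2: the negative list also contains an ambiguous word that fires
# 300 times where no negative sentiment is expressed.
ratio_biased = observed_count(true_neg, 0.8, 300) / observed_count(true_pos, 0.8, 0)

print(ratio_equal, ratio_biased)
```

In the first case the negative/positive ratio stays at 1; in the second it rises to about 1.4 although the underlying text is balanced, so the errors can no longer be treated as harmless noise.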
14. Bias example: OCR
• Data from the KB still have some issues with OCR
• There tend to be more issues with older data
• Imagine we investigate whether emotional
expressions in text increased over time:
Does worse OCR lead to a lower percentage of
emotional expressions being identified in older text?
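A back-of-the-envelope model suggests how large this effect could be. The sketch assumes that OCR errors hit characters independently and that a word is only matched when every character is correct; the error rates are hypothetical and were not measured on KB data.

```python
def detection_rate(word_length, char_error_rate):
    """Probability that a word survives OCR intact, assuming independent character errors."""
    return (1 - char_error_rate) ** word_length

# Hypothetical character error rates for older and newer material.
OLD_CER, NEW_CER = 0.10, 0.01
WORD_LENGTH = 8

old_rate = detection_rate(WORD_LENGTH, OLD_CER)
new_rate = detection_rate(WORD_LENGTH, NEW_CER)

# Apparent growth in matches even if the true frequency never changed.
apparent_increase = new_rate / old_rate

print(f"old: {old_rate:.2f}, new: {new_rate:.2f}, ratio: {apparent_increase:.2f}")
```

With these numbers an eight-letter word is found about 43% of the time in the older text and about 92% of the time in the newer text, which alone would look like a doubling of emotional expressions.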
15. Dealing with biases
• We cannot exclude the risk of biases completely
• We can:
• try to make sure researchers using output are
aware of the details of the method (raise
awareness of possible biases)
• carry out both intrinsic and extrinsic evaluation,
i.e. explicitly investigate the influence of a bias
on overall results
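The difference between the two kinds of evaluation can be sketched as follows. The data are hypothetical sentence IDs, and the extrinsic check shown (does the tool output lead to the same trend as the manual annotations?) is just one possible way to examine the influence of a bias on overall results.

```python
def precision_recall(gold, predicted):
    """Intrinsic evaluation: compare tool output with manual (gold) annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical sentence IDs flagged as emotional, per period (illustration only).
gold = {"old": {1, 2, 3, 4}, "new": {11, 12, 13, 14}}
predicted = {"old": {1, 2}, "new": {11, 12, 13, 14, 15}}

intrinsic = {period: precision_recall(gold[period], predicted[period]) for period in gold}

# Extrinsic evaluation: does the research-level conclusion (the trend) survive?
gold_change = len(gold["new"]) - len(gold["old"])
predicted_change = len(predicted["new"]) - len(predicted["old"])
same_conclusion = (gold_change > 0) == (predicted_change > 0)

print(intrinsic, same_conclusion)
```

Here the tool scores reasonably in both periods, yet it suggests an increase where the manual annotations show none, which is the kind of problem only the extrinsic check reveals.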
16. Conclusion
• Several research directions where technology (including NLP, linked
data, visualizations) is used to support research in Humanities and
Social Sciences
• NLP approaches vary from basic to complex pipelines carrying out
several steps
• Basic approaches can easily be applied to large datasets and are
transparent, but do not say much
• More advanced approaches provide detailed information, but cannot
easily be applied to large sets and are less transparent
• Insight into how data was processed and both intrinsic and extrinsic
evaluation are needed to raise awareness about (or even avoid?) biases