Automated
Identification of Media
Bias by Word Choice
and Labeling in News
Articles
Anastasia Zhukova
Data & Knowledge
Engineering Group
29.09.2020
Short CV
2
2008 – 2014
Information technology, M.Eng.
Moscow Aviation Institute
2015 – 2019
Computer and Information Science, M. Sc.
University of Konstanz
2018
Graduate Student Researcher
Natural Language Processing group
National Institute of Informatics
2019 – present
Doctoral Researcher, Ph.D. Candidate
Data & Knowledge Engineering group
University of Wuppertal
Motivation
3
https://www.theaustralian.com.au/world/alexander-lukashenko-locked-and-loaded-to-fight-belarus-rats/news-story/99175bd3cfc13f71e2176310cee98288
https://www.economist.com/europe/2020/06/20/waving-slippers-at-the-cockroach-president-of-belarus
Agenda
4
• Background
• Methodology
• Results
• Conclusion
https://tgram.ru/channels/otsuka_bld
Media Bias Model
5
Ideological
View
Target
Audience
Owners Advertisers
Business Interest
Funding
...
Political Interest
Reputation
...
Gathering
Writing
Editing
News
Reality
News
Event
Perception
Consumers
News Production and Consumption Process
Presentation Style
• Placement
• Size Allocation
• Picture Selection
• Picture Explanation
Writing Style
• Labeling
• Word choice
Fact Selection
• Event Selection
• Source Selection
• Commission
• Omission
Political
View
Consumer Context
• Background Knowledge
• Attitude
• Social Status
• Country
Spin
Government
Reasons
Process
Forms
an arrogant person
Word Choice (WC)
Labeling (L)
a genius
F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018
a smart person
WCL problem
6
Word choice & labeling…
• strongly impacts the public perception of news topics
• disturbs the decision-making process
• leads to the propagation of false information
Hurricane Katrina, 2005
F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
Social science background
7
Content analysis
“What to think about”
Frame analysis
“How to think about it”
Event-related articles
Putin
president
savior
tyrant
humble man
thief
Cross-document coreference resolution
president
savior
Putin
tyrant
humble man
thief
Sentiment analysis
president
savior
tyrant
humble man
thief
Putin
war
sanctions Crimea
army
Candidate extraction
Social
sciences
Computer
science
Identified actors, actions, events, concepts, etc. Concept polarization
Content analysis
Cross-document coreference resolution
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020
Research question
8
Given
- No training set
- A set of event-related articles
- Extracted candidate phrases of groups of persons
Goal
- Find phrases referring to the same concepts
- Use only phrases themselves, i.e., no context information
- Exploratory unsupervised task
illegal aliens
undocumented immigrants
Directly referring mentions
White House officials
American authorities
Indirectly referring mentions
How can an automated approach identify
instances of bias by word choice and labeling
in the concepts (in)directly referring to groups of people
in a set of English news articles reporting on the same event?
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
Multi-step merging approach (MSMA) 1.0
9
Corefs. & NPs ↓ number of mentions
…
…
Extraction of a
specific
attribute
…
recursion
Pairwise
comparison &
merging
…
“Winner takes it all” strategy
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
Merge using similar heads
10
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
illegal aliens who were brought as children
nearly 800,000 illegal aliens
illegal aliens
young illegal aliens
head sets
{illegals} {immigrants} {aliens}
similar in the vector space
Entity 1 Entity 2 Entity 3
the word “aliens” alone is related to UFOs; it will be merged later as “illegal aliens” at the third step
Merge entities
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
head sets
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
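The head-based merge on this slide can be sketched roughly as follows. This is a minimal illustration, not the MSMA implementation: the head of each phrase is supplied by hand (in practice it would come from a parser), `TOY_VECTORS` holds made-up 2-d vectors standing in for real word embeddings, and the `0.8` threshold is an assumed value.

```python
import math
from collections import defaultdict

# Hypothetical toy vectors; real embeddings would replace these.
TOY_VECTORS = {
    "illegals": (0.9, 0.1),
    "immigrants": (0.8, 0.3),
    "aliens": (0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def group_by_head(phrases_with_heads):
    """Collect phrases into entities keyed by their head word."""
    entities = defaultdict(list)
    for phrase, head in phrases_with_heads:
        entities[head].append(phrase)
    return dict(entities)


def merge_similar_heads(entities, vectors, threshold=0.8):
    """Greedily merge entities whose heads are close in the vector space."""
    merged = []  # list of (set of heads, list of phrases)
    for head, phrases in entities.items():
        for heads, members in merged:
            if any(cosine(vectors[head], vectors[h]) >= threshold for h in heads):
                heads.add(head)
                members.extend(phrases)
                break
        else:
            merged.append(({head}, list(phrases)))
    return merged


phrases = [
    ("young illegals", "illegals"),
    ("the illegals", "illegals"),
    ("illegal immigrants", "immigrants"),
    ("undocumented immigrants", "immigrants"),
    ("illegal aliens", "aliens"),
]
merged = merge_similar_heads(group_by_head(phrases), TOY_VECTORS)
```

With these toy vectors, the {illegals} and {immigrants} head sets end up in one entity while {aliens} stays separate, mirroring the slide's note that “aliens” is only merged at a later step.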
Merge using representative phrases
11
A1: young immigrants,
A2: illegal immigrants,
A3: young illegals
young immigrants,
undocumented immigrants,
illegal immigrants,
young illegals,
endangered immigrants,
additional illegals
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
endangered immigrants
additional illegals
this group of young people
nearly 800,000 people
a people
people who are American in every way except through birth
foreign people
bad people
people affected by the move
the estimated 800,000 people
these people
young people
Labeling
phrases
Entity 1 Entity 2
Merged entities
Representative
labeling
phrases
B1: young people,
B2: foreign people
young people,
foreign people,
bad people,
estimated people
Sim. matrix (rows A1, A2, A3; columns B1, B2): entries 1, 1, 1, 0, 0, 0
3 / (2×3) = 0.5 ≥ 0.3 → similar in the vector space
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
endangered immigrants
additional illegals
this group of young people
nearly 800,000 people
a people
people who are American in every way except through birth
foreign people
bad people
people affected by the move
the estimated 800,000 people
these people
young people
Labeling
phrases
Representative
labeling
phrases
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
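The merge decision on this slide, 3 / (2×3) ≥ 0.3, can be written as a short check. This is a sketch under assumptions: the matrix is already binarized (1 = the two representative phrases are similar), and the placement of the ones within the matrix is invented, since only their count (3 of 6) is recoverable from the slide. The function names are hypothetical.

```python
def similar_share(binary_matrix):
    """Share of phrase pairs marked as similar in a binarized similarity matrix."""
    cells = [value for row in binary_matrix for value in row]
    return sum(cells) / len(cells)


def should_merge(binary_matrix, threshold=0.3):
    """Merge two entities when the share of similar pairs reaches the threshold."""
    return similar_share(binary_matrix) >= threshold


# Rows: A1..A3 (young immigrants, illegal immigrants, young illegals)
# Columns: B1, B2 (young people, foreign people)
# Assumed layout: three similar pairs out of six.
matrix = [
    [1, 0],
    [1, 0],
    [1, 0],
]
share = similar_share(matrix)
decision = should_merge(matrix)
```

Here `share` is 3 / 6 = 0.5, which is above the 0.3 threshold, so the two entities would be merged.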
MSMA 1.0 evaluation
12
[Figure: F1 of concept types (Actor, Country, Misc, Group) on a 0% to 100% axis at Init. and Steps 1 to 4; annotations: core modifiers, core meaning]
Evaluation of the simplified version of NewsWCL50 annotation.
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
MSMA 1.0 Drawbacks
13
• Overparametrization
• Lack of stability
– small variation in wording affected results
• Few head modifiers used
– only adjectives
• Frequently falsely merged concepts
– American people – young immigrant people
– Chinese officials – American officials
• Low recall & low precision
– smaller related entities remain unmerged
– unrelated entities are merged
• “Winner takes it all” strategy is not optimal
Goals of MSMA 2.0
• Self-controlled merging
• Default set of parameters for all datasets
• Stable performance in case of added phrases
• Use all head’s modifiers
• Keep concepts fine-grained
• Improve merging related smaller entities
Same challenge: unsupervised learning,
no training set
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
MSMA 2.0: Preprocessing
14
1. Concept’s sub-type prioritization
• Sub-types: non-NE persons, NE (ORG) persons, non-NE group, non-NE person, ORG person
• Mention roles: core mentions, generalizing mentions, specializing mentions
• Examples: Republican establishment; GOP leaders, Republicans; a red attorney general, a Republican
2. More head modifiers
• adjectival, noun, compound modifiers
3. NE-grid: operation restriction or similarity amplification
• Grid rows and columns: GOP, Republicans, Republican, United_States, U.S., American, Americans, Spanish, Mexico
4. Weighting of the NE components
• Examples: Americans; U.S.; citizens; U.S. + citizens; 2*U.S. + citizens; young; young + 2*U.S. + citizens; young + citizens; immigrants; young + immigrants
5. Multiple similarity levels
- Head-similarity matrix SH
- Phrase-similarity matrix SP
- Core-phrase-similarity matrix SCP
- Ratio-matrix RM
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
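The weighting of NE components, as in “young + 2*U.S. + citizens”, can be illustrated as a weighted sum of word vectors. This is only a sketch: the one-hot toy vectors, the function name `weighted_phrase_vector`, and the choice of summing (rather than averaging) are assumptions; the slide only shows that the named-entity token receives weight 2.

```python
def weighted_phrase_vector(tokens, vectors, ne_tokens, ne_weight=2.0):
    """Sum token vectors, giving named-entity tokens a larger weight."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        weight = ne_weight if token in ne_tokens else 1.0
        for i, x in enumerate(vectors[token]):
            total[i] += weight * x
    return total


# Hypothetical one-hot vectors so the weights are easy to read off.
toy_vectors = {
    "young": (1.0, 0.0, 0.0),
    "U.S.": (0.0, 1.0, 0.0),
    "citizens": (0.0, 0.0, 1.0),
}

phrase_vec = weighted_phrase_vector(
    ["young", "U.S.", "citizens"], toy_vectors, ne_tokens={"U.S."}
)
```

With one-hot vectors, `phrase_vec` shows the weights directly: 1 for “young”, 2 for “U.S.”, and 1 for “citizens”.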
MSMA 2.0: Pipeline
15
1. Identification of cluster cores
• Core mentions: 𝑐𝑚 ∈ 𝐶𝑀
• 𝑆𝑃𝐶(𝑐𝑚𝑖, 𝑐𝑚𝑗) ≥ 0.4 and 𝑆𝐻(𝑐𝑚𝑖, 𝑐𝑚𝑗) ≥ 0.4
• 𝑅𝑀(𝑐𝑚𝑖, 𝑐𝑚𝑗) ≥ log5000|𝑀|
[Figure: graph over 𝑐𝑚0, 𝑐𝑚1, 𝑐𝑚3, 𝑐𝑚5, 𝑐𝑚6 with edge weights .7 and .8]
2. Forming cluster bodies
• ∃𝑐𝑐 ∈ 𝐶𝐶𝑖: 𝑆𝑃(𝑚, 𝑐𝑐) ≥ 0.5 and normalized similarity to 𝐶𝐶𝑖 is larger than to 𝐶𝐶𝑗
• 𝐶𝐵𝑖 ∩ 𝐶𝐵𝑗 → conflicts
[Figure: mention 𝑚 with min sim 𝑆𝑃(𝑚, 𝑐𝑐) = 0.5 for 𝑐𝑐 ∈ 𝐶𝐶𝑖; cluster cores 𝐶𝐶𝑖, 𝐶𝐶𝑗 and cluster bodies 𝐶𝐵𝑖, 𝐶𝐵𝑗]
3. Adding border points
• Border points? Noise?
• min 𝑆𝑃(𝑚, ∀𝑐𝑏 ∈ 𝐶𝐵) ≤ 0.4 ≥ 2 and normalized similarity to 𝐶𝐵𝑖 is larger than to 𝐶𝐵𝑗
[Figure: mention 𝑚 at threshold 0.4; ∀𝑐𝑏 ∈ 𝐶𝐵𝑖: 𝑆𝑃(𝑚, 𝑐𝑏) = 3; ∀𝑐𝑏 ∈ 𝐶𝐵𝑗: 𝑆𝑃(𝑚, 𝑐𝑏) = 2; clusters 𝐶𝑖, 𝐶𝑗]
4. Forming non-core clusters
5. Merging final clusters
• Use all modifiers
• On concept level
• TF-IDF-weighted concept-similarity matrix
[Figure: graph over 𝑐0, 𝑐1, 𝑐3, 𝑐5, 𝑐6 with edge weights .7 and .8]
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
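One possible reading of the cluster-core step is sketched below: core mentions are linked when both the core-phrase similarity and the head similarity reach 0.4, and each connected group of linked mentions forms a cluster core. Treating cores as connected components is an assumption, as are the function names and the toy similarity values; the ratio-matrix condition from the slide is left out because its exact role cannot be recovered from the transcript.

```python
def core_edges(mentions, spc, sh, threshold=0.4):
    """Link two core mentions when both similarities reach the threshold."""
    edges = []
    for i, a in enumerate(mentions):
        for b in mentions[i + 1:]:
            if spc.get((a, b), 0.0) >= threshold and sh.get((a, b), 0.0) >= threshold:
                edges.append((a, b))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


mentions = ["cm0", "cm1", "cm3", "cm5", "cm6"]
# Hypothetical similarity values, keyed by mention pair.
spc = {("cm0", "cm1"): 0.7, ("cm1", "cm3"): 0.8, ("cm5", "cm6"): 0.8}
sh = {("cm0", "cm1"): 0.5, ("cm1", "cm3"): 0.3, ("cm5", "cm6"): 0.6}

cores = connected_components(mentions, core_edges(mentions, spc, sh))
```

In this toy input, `cm1` and `cm3` are not linked because their head similarity (0.3) is below the threshold, so `cm3` remains on its own.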
Evaluation and results
16
Democrats,
Democratic leaders
Illinois Democrat
American public,
American families,
U.S. citizens,
Poor unskilled American workers
Voice of Americans
Demonstrators,
DACA protesters,
Opposition
Administration officials,
USCIS employees,
Executive authority,
DHS officials,
Chief of White House,
Acting secretary
Mexican,
Spanish,
Mexican officials
GOP senators,
Republicans,
Republican leaders,
A group of red state attorneys
European ally,
The Europeans,
European leaders,
Western European Diplomats
Israeli officials,
Israeli Ambassador,
The Israelis
Russian agents,
Russian nationals,
The Russians
caravan participants,
asylum-seeking immigrant caravan,
members of the caravan,
more than a few hundred asylum seekers,
150 migrants, many of whom were children,
asylum-seekers,
the people that are waiting outside,
these large “caravans” of people,
unauthorized immigrants,
refugees,
people traveling without documents,
a caravan of hundreds of Central Americans,
a group of about 100 people,
Central American migrants and supporters
one of the chief critics of DACA,
opponents of the policy,
some immigration critics,
immigration hard-liners,
groups who support stricter immigration controls
Indirect mentions: ORG Indirect mentions: GPEs Direct mentions
F1 scores:
Method               Direct  Indirect
CoreNLP              27.9    31.4
Hier.Clust.          37.2    29.1
EECDCR               41.6    42.6
MSMA 1.0             44.7    40.9
MSMA 2.0 ELMo        42.1    40.1
MSMA 2.0 fastText    48.3    43.6
MSMA 2.0 word2vec    48.5    44.3
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts”, Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
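The F1 scores reported on this slide can be compared with a few lines of arithmetic. The numbers below are copied from the slide; the dictionary layout and the helper names `best` and `gain` are only illustrative.

```python
# (direct F1, indirect F1) per method, as reported on the slide.
F1 = {
    "CoreNLP": (27.9, 31.4),
    "Hier.Clust.": (37.2, 29.1),
    "EECDCR": (41.6, 42.6),
    "MSMA 1.0": (44.7, 40.9),
    "MSMA 2.0 ELMo": (42.1, 40.1),
    "MSMA 2.0 fastText": (48.3, 43.6),
    "MSMA 2.0 word2vec": (48.5, 44.3),
}
DIRECT, INDIRECT = 0, 1


def best(column):
    """Name of the method with the highest F1 in the given column."""
    return max(F1, key=lambda name: F1[name][column])


def gain(system, baseline, column):
    """F1 difference in percentage points, rounded to one decimal."""
    return round(F1[system][column] - F1[baseline][column], 1)


best_direct = best(DIRECT)
best_indirect = best(INDIRECT)
gain_direct = gain("MSMA 2.0 word2vec", "EECDCR", DIRECT)
gain_indirect = gain("MSMA 2.0 word2vec", "EECDCR", INDIRECT)
```

On these figures, MSMA 2.0 with word2vec scores highest in both columns, ahead of EECDCR by 6.9 points on direct mentions and 1.7 points on indirect mentions.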
Conclusion
17
• Bias by WCL has a strong influence on readers
• Revealing bias is a step towards mitigating it
• MSMA 1.0 & 2.0 successfully resolve biased mentions
Help social sciences with
frame analyses
Help news readers become
aware of bias in media
Newsalyze
news readers,
researchers
Help make the world a better place
Objectivity
Frame 2
Frame 1
https://github.com/fhamborg/newsalyze-backend
Soon to be publicly available
Questions
18
Contact:
Anastasia Zhukova
Zhukova@uni-wuppertal.de
@ana_m_zhukova
http://dke.uni-wuppertal.de/zhukova
Thank you for your attention!
Questions?

More Related Content

Similar to Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles

Tikko dublin ac14 [compatibiliteitsmodus]
Tikko dublin ac14 [compatibiliteitsmodus]Tikko dublin ac14 [compatibiliteitsmodus]
Tikko dublin ac14 [compatibiliteitsmodus]
European Journalism Training Association
 
UROP Poster
UROP Poster UROP Poster
UROP Poster
Christopher Ferrill
 
Ramon van den Akker. Fairness of machine learning models an overview and prac...
Ramon van den Akker. Fairness of machine learning models an overview and prac...Ramon van den Akker. Fairness of machine learning models an overview and prac...
Ramon van den Akker. Fairness of machine learning models an overview and prac...
Lviv Startup Club
 
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
Liliana Bounegru
 
AI-generated news and misinformation during elections
AI-generated news and misinformation during electionsAI-generated news and misinformation during elections
AI-generated news and misinformation during elections
Paige Morrow
 
Ethical Dilemmas in AI/ML-based systems
Ethical Dilemmas in AI/ML-based systemsEthical Dilemmas in AI/ML-based systems
Ethical Dilemmas in AI/ML-based systems
Dr. Kim (Kyllesbech Larsen)
 
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
ALexandruDaia1
 
Each question should be done on a separate word document, with refer
Each question should be done on a separate word document, with referEach question should be done on a separate word document, with refer
Each question should be done on a separate word document, with refer
wildmandelorse
 
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
g8briel
 
What Actor-Network Theory (ANT) and digital methods can do for data journalis...
What Actor-Network Theory (ANT) and digital methods can do for data journalis...What Actor-Network Theory (ANT) and digital methods can do for data journalis...
What Actor-Network Theory (ANT) and digital methods can do for data journalis...
Liliana Bounegru
 
Human Rights Council Study Guide
Human Rights Council Study GuideHuman Rights Council Study Guide
Human Rights Council Study Guide
dudasings
 
Attitudes Of Second Year Computer Science Undergraduates Toward Plagiarism
Attitudes Of Second Year Computer Science Undergraduates Toward PlagiarismAttitudes Of Second Year Computer Science Undergraduates Toward Plagiarism
Attitudes Of Second Year Computer Science Undergraduates Toward Plagiarism
Sean Flores
 
Mark Van't Hooft, Kent State University
Mark Van't Hooft, Kent State UniversityMark Van't Hooft, Kent State University
Mark Van't Hooft, Kent State University
HandheldLearning
 
Ethical Issues in Machine Learning Algorithms. (Part 3)
Ethical Issues in Machine Learning Algorithms. (Part 3)Ethical Issues in Machine Learning Algorithms. (Part 3)
Ethical Issues in Machine Learning Algorithms. (Part 3)
Vladimir Kanchev
 
In your responses, review at least one of the articles provided by y.docx
In your responses, review at least one of the articles provided by y.docxIn your responses, review at least one of the articles provided by y.docx
In your responses, review at least one of the articles provided by y.docx
annettsparrow
 
Matt sadler infomagination
Matt sadler infomaginationMatt sadler infomagination
Matt sadler infomagination
mattsadler
 
Era of Sociology News Rumors News Detection using Machine Learning
Era of Sociology News Rumors News Detection using Machine LearningEra of Sociology News Rumors News Detection using Machine Learning
Era of Sociology News Rumors News Detection using Machine Learning
ijtsrd
 
What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
Anastasia Zhukova
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
Liliana Bounegru
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
Jonathan Gray
 

Similar to Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles (20)

Tikko dublin ac14 [compatibiliteitsmodus]
Tikko dublin ac14 [compatibiliteitsmodus]Tikko dublin ac14 [compatibiliteitsmodus]
Tikko dublin ac14 [compatibiliteitsmodus]
 
UROP Poster
UROP Poster UROP Poster
UROP Poster
 
Ramon van den Akker. Fairness of machine learning models an overview and prac...
Ramon van den Akker. Fairness of machine learning models an overview and prac...Ramon van den Akker. Fairness of machine learning models an overview and prac...
Ramon van den Akker. Fairness of machine learning models an overview and prac...
 
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...
 
AI-generated news and misinformation during elections
AI-generated news and misinformation during electionsAI-generated news and misinformation during elections
AI-generated news and misinformation during elections
 
Ethical Dilemmas in AI/ML-based systems
Ethical Dilemmas in AI/ML-based systemsEthical Dilemmas in AI/ML-based systems
Ethical Dilemmas in AI/ML-based systems
 
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
 
Each question should be done on a separate word document, with refer
Each question should be done on a separate word document, with referEach question should be done on a separate word document, with refer
Each question should be done on a separate word document, with refer
 
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
 
What Actor-Network Theory (ANT) and digital methods can do for data journalis...
What Actor-Network Theory (ANT) and digital methods can do for data journalis...What Actor-Network Theory (ANT) and digital methods can do for data journalis...
What Actor-Network Theory (ANT) and digital methods can do for data journalis...
 
Human Rights Council Study Guide
Human Rights Council Study GuideHuman Rights Council Study Guide
Human Rights Council Study Guide
 
Attitudes Of Second Year Computer Science Undergraduates Toward Plagiarism
Attitudes Of Second Year Computer Science Undergraduates Toward PlagiarismAttitudes Of Second Year Computer Science Undergraduates Toward Plagiarism
Attitudes Of Second Year Computer Science Undergraduates Toward Plagiarism
 
Mark Van't Hooft, Kent State University
Mark Van't Hooft, Kent State UniversityMark Van't Hooft, Kent State University
Mark Van't Hooft, Kent State University
 
Ethical Issues in Machine Learning Algorithms. (Part 3)
Ethical Issues in Machine Learning Algorithms. (Part 3)Ethical Issues in Machine Learning Algorithms. (Part 3)
Ethical Issues in Machine Learning Algorithms. (Part 3)
 
In your responses, review at least one of the articles provided by y.docx
In your responses, review at least one of the articles provided by y.docxIn your responses, review at least one of the articles provided by y.docx
In your responses, review at least one of the articles provided by y.docx
 
Matt sadler infomagination
Matt sadler infomaginationMatt sadler infomagination
Matt sadler infomagination
 
Era of Sociology News Rumors News Detection using Machine Learning
Era of Sociology News Rumors News Detection using Machine LearningEra of Sociology News Rumors News Detection using Machine Learning
Era of Sociology News Rumors News Detection using Machine Learning
 
What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
 
Using Data for Science Journalism
Using Data for Science JournalismUsing Data for Science Journalism
Using Data for Science Journalism
 

More from Anastasia Zhukova

Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Anastasia Zhukova
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
Anastasia Zhukova
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Anastasia Zhukova
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
Anastasia Zhukova
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Anastasia Zhukova
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Anastasia Zhukova
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Anastasia Zhukova
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Anastasia Zhukova
 
XCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildXCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the Wild
Anastasia Zhukova
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
Anastasia Zhukova
 

More from Anastasia Zhukova (10)

Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
 
XCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildXCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the Wild
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 



Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles

  • 1. Automated Identification of Media Bias by Word Choice and Labeling in News Articles Anastasia Zhukova Data & Knowledge Engineering Group 29.09.2020
  • 2. Short CV 2 2008 – 2014 Information technology, M.Eng. Moscow Aviation Institute 2015 – 2019 Computer and Information Science, M. Sc. University of Konstanz 2018 Graduate Student Researcher Natural Language Processing group National Institute of Informatics 2019 – present Doctoral Researcher, Ph.D. Candidate Data & Knowledge Engineering group University of Wuppertal
  • 4. Agenda 4 • Background • Methodology • Results • Conclusion https://tgram.ru/channels/otsuka_bld
  • 5. Media Bias Model 5 Ideological View Target Audience Owners Advertisers Business Interest Funding ... Political Interest Reputation ... Gathering Writing Editing News Reality News Event Perception Consumers News Production and Consumption Process Presentation Style • Placement • Size Allocation • Picture Selection • Picture Explanation Writing Style • Labeling • Word choice Fact Selection • Event Selection • Source Selection • Commission • Omission Political View Consumer Context • Background Knowledge • Attitude • Social Status • Country Spin Government Reasons Process Forms an arrogant person Word Choice (WC) Labeling (L) a genius F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018 a smart person
  • 6. WCL problem 6 Word choice & labeling… • strongly impacts the public perception of news topics • disturbs decision making process • leads to false information propagation Hurricane Katrina, 2005 F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019 A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
  • 7. Social science background 7 Content analysis “What to think about” Frame analysis “How to think about it” Event-related articles Putin president savior tyrant humble man thief Cross-document coreference resolution president savior Putin tyrant humble man thief Sentiment analysis president savior tyrant humble man thief Putin war sanctions Crimea army Candidate extraction Social sciences Computer science Identified actors, actions, events, concepts, etc. Concept polarization Content analysis Cross-document coreference resolution F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019 A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019 A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020
  • 8. Research question 8 Given - No training set - A set of event-related articles - Extracted candidate phrases of groups of persons Goal - Find phrases referring to the same concepts - Use only phrases themselves, i.e., no context information - Exploratory unsupervised task illegal aliens undocumented immigrants Directly referring mentions White House officials American authorities Indirectly referring mentions How can an automated approach identify instances of bias by word choice and labeling in the concepts (in)directly referring to groups of people in a set of English news articles reporting on the same event? A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019 A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
  • 9. Multi-step merging approach (MSMA) 1.0 9 Corefs. & NPs ↓ number of mentions … … Extraction of a specific attribute … recursion Pairwise comparison & merging … “Winner takes it all” strategy F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019 A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
  • 10. Merge using similar heads 10 young illegals the illegals illegals who arrived as children DACA illegals roughly 800,000 young undocumented immigrants young immigrants illegal immigrants undocumented immigrants illegal aliens who were brought as children nearly 800,000 illegal aliens illegal aliens young illegal aliens headsets {illegals} {immigrants} {aliens} similar in the vector space Entity 1 Entity 2 Entity 3 the word alone is related to the UFO; it will be merged later as “illegal alien” at the third step Merge entities young illegals the illegals illegals who arrived as children DACA illegals roughly 800,000 young undocumented immigrants young immigrants illegal immigrants undocumented immigrants headsets F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019 A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
  • 11. Merge using representative phrases 11 A1: young immigrants, A2: illegal immigrants, A3: young illegals young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals young illegals the illegals illegals who arrived as children DACA illegals roughly 800,000 young undocumented immigrants young immigrants illegal immigrants undocumented immigrants endangered immigrants additional illegals this group of young people nearly 800,000 people a people people who are American in every way except through birth foreign people bad people people affected by the move the estimated 800,000 people these people young people Labeling phrases Entity 1 Entity 2 Merged entities Representative labeling phrases B1: young people, B2: foreign people young people, foreign people, bad people, estimated people Sim. matrix (columns A1 A2 A3; rows B1 B2): 1 1 1 / 0 0 0; 3 / (2×3) ≥ 0.3 → similar in the vector space young illegals the illegals illegals who arrived as children DACA illegals roughly 800,000 young undocumented immigrants young immigrants illegal immigrants undocumented immigrants endangered immigrants additional illegals this group of young people nearly 800,000 people a people people who are American in every way except through birth foreign people bad people people affected by the move the estimated 800,000 people these people young people Labeling phrases Representative labeling phrases F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019 A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
  • 12. MSMA 1.0 evaluation 12 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 60.0% 70.0% 80.0% 90.0% 100.0% Init. Step 1 Step 2 Step 3 Step 4 F1 of concept types Actor Country Misc Group Core modifiers Core meaning Evaluation of the simplified version of NewsWCL50 annotation. F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019 A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
  • 13. MSMA 1.0 Drawbacks 13 Problems of MSMA 1.0: • Overparametrization • Lack of stability – small variation in wording affected results • Few head modifiers used – only adjectives • Frequently falsely merged concepts – American people – young immigrant people – Chinese officials – American officials • Low recall & low precision – smaller related entities remain unmerged – unrelated entities are merged • “Winner takes it all” strategy is not optimal Goals of MSMA 2.0: • Self-controlled merging • Default set of parameters for all datasets • Stable performance in case of added phrases • Use all head’s modifiers • Keep concepts fine-grained • Improve merging related smaller entities Same challenge: unsupervised learning, no training set A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020 A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
  • 14. MSMA 2.0: Preprocessing 14 non-NE persons NE (ORG) persons core mentions non-NE group non-NE person ORG person generalizing mentions specializing mentions Republican establishment GOP leaders, Republicans a red attorney general, a Republican Americans U.S. citizens U.S. + citizens 2*U.S. + citizens young young + 2*U.S. + citizens 1. Concept’s sub-type prioritization 4. Weighting of the NE components 3. NE-grid: operation restriction or similarity amplification immigrants young + immigrants GOP Republicans Republican United_States U.S. American Americans Spanish Mexico GOP Republicans Republican United_States U.S. American Americans Spanish Mexico 5. Multiple similarity levels - Head-similarity matrix SH - Phrase-similarity matrix SP - Core-phrase-similarity matrix SCP - Ratio-matrix RM 2. More head modifiers adjectival, noun, compound modifiers A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020 A. Zhukova, F. Hamborg, K. Donnay, B. Gipp,” Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020 young + citizens
  • 15. MSMA 2.0: Pipeline 15 2. Forming cluster bodies min 𝑠𝑖𝑚 𝑆𝑃 𝑚,𝑐𝑐 = 0.5 𝑐𝑐 ∈ 𝐶𝐶𝑖 𝑚 𝑪𝑩𝒊 ∩ 𝑪𝑩𝒋 →conflicts 𝑪𝑩𝒊 𝑪𝑩𝒋 𝑪𝑪𝒊 𝑪𝑪𝒋 1. Identification of cluster cores border points? noise? 0.4 𝒎 𝑐𝑏 ∀𝒄𝒃 ∋ 𝑪𝑩𝒊 𝑆𝑃𝑚,𝑐𝑏 = 3 ∀𝒄𝒃 ∋ 𝑪𝑩𝒋 𝑆𝑃𝑚,𝑐𝑏 = 2 𝑪𝒊 𝑪𝒋 3. Adding border points 4. Forming non-core clusters 5. Merging final clusters 𝑪𝒊 𝑪𝒋 𝑐𝑚 ∈ 𝐶𝑀 𝑆𝑃𝐶𝑐𝑚𝑖,𝑐𝑚𝑗 ≥ 0.4 and 𝑆𝐻𝑐𝑚𝑖,𝑐𝑚𝑗 ≥ 0.4 𝑅𝑀𝑐𝑚𝑖,𝑐𝑚𝑗 ≥ log5000|𝑀| .7 .8 .8 .8 .8 .7 𝑐𝑚 0 𝑐𝑚 1 𝑐𝑚 3 𝑐𝑚 5 𝑐𝑚 6 𝑐𝑚0 𝑐𝑚1 𝑐𝑚3 𝑐𝑚5 𝑐𝑚6 ∃𝑐𝑐 ∈ 𝐶𝐶𝑖: 𝑆𝑃 𝑚,𝑐𝑐 ≥ 0.5 and normalized similarity to 𝐶𝐶𝑖 is larger than to 𝐶𝐶𝑗 min 𝑆𝑃𝑚,∀𝑐𝑏∋𝐶𝐵 ≤ 0.4 ≥ 2 and normalized similarity to 𝐶𝐵𝑖 is larger than to 𝐶𝐵𝑗 .7 .8 .8 .8 .8 .7 𝑐0 𝑐1 𝑐3 𝑐5 𝑐6 • Use all modifiers • On concept level • TF-IDF-weighted concept- similarity matrix 𝑐 0 𝑐 1 𝑐 3 𝑐 5 𝑐 6 A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020 A. Zhukova, F. Hamborg, K. Donnay, B. Gipp,” Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
  • 16. Evaluation and results 16 Democrats, Democratic leaders Illinois Democrat American public, American families, U.S. citizens, Poor unskilled American workers Voice of Americans Demonstrators, DACA protesters, Opposition Administration officials, USCIS employees, Executive authority, DHS officials, Chief of White House, Acting secretary Mexican, Spanish, Mexican officials GOP senators, Republicans, Republican leaders, A group of red state attorneys European ally, The Europeans, European leaders, Western European Diplomats Israeli officials, Israeli Ambassador, The Israelis Russian agents, Russian nationals, The Russians caravan participants, asylum-seeking immigrant caravan, members of the caravan, more than a few hundred asylum seekers, 150 migrants, many of whom were children, asylum-seekers, the people that are waiting outside, these large “caravans” of people, unauthorized immigrants, refugees, people traveling without documents, a caravan of hundreds of Central Americans, a group of about 100 people, Central American migrants and supporters one of the chief critics of DACA, opponents of the policy, some immigration critics, immigration hard-liners, groups who support stricter immigration controls Indirect mentions: ORG Indirect mentions: GPEs Direct mentions F1 (Direct / Indirect): CoreNLP 27.9 / 31.4; Hier.Clust. 37.2 / 29.1; EECDCR 41.6 / 42.6; MSMA 1.0 44.7 / 40.9; MSMA 2.0 ELMo 42.1 / 40.1; MSMA 2.0 fastText 48.3 / 43.6; MSMA 2.0 word2vec 48.5 / 44.3. A. Zhukova, F. Hamborg, K. Donnay, B. Gipp “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts”, Manuscript submitted for publication, 2020 A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
  • 17. Conclusion 17 • Bias by WCL has a strong influence on readers • Revealing bias is a step towards mitigating it • MSMA 1.0 & 2.0 successfully resolve biased mentions Help social sciences with frame analyses Help news readers become aware of bias in media Newsalyze news readers, researchers Help make the world a better place Objectivity Frame 2 Frame 1 https://github.com/fhamborg/newsalyze-backend Soon to be publicly available
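
The merge test on slide 11 ("Merge using representative phrases") can be read as: build a binary similarity matrix between the representative phrases of two entities, divide the number of similar pairs by the matrix size, and merge when that ratio reaches 0.3. The following is a minimal Python sketch of that reading, not the authors' implementation. The function name `representative_overlap`, the `is_similar` callback, and the toy pair table are hypothetical; the slide decides similarity "in the vector space", which is replaced here by a hard-coded lookup. Which row of the slide's matrix holds the ones is also an assumption (here B1, "young people").

```python
from typing import Callable, Sequence, Tuple


def representative_overlap(
    reps_a: Sequence[str],
    reps_b: Sequence[str],
    is_similar: Callable[[str, str], bool],
    threshold: float = 0.3,  # threshold value taken from slide 11
) -> Tuple[float, bool]:
    """Return (share of similar phrase pairs, merge decision)."""
    if not reps_a or not reps_b:
        raise ValueError("both entities need at least one representative phrase")
    # Count the 1-entries of the binary similarity matrix (rows: B, columns: A).
    hits = sum(1 for b in reps_b for a in reps_a if is_similar(a, b))
    score = hits / (len(reps_a) * len(reps_b))
    return score, score >= threshold


# Hypothetical stand-in for a word-embedding similarity check.
# Assumption: B1 ("young people") matches all of A1..A3, B2 matches none.
TOY_SIMILAR_PAIRS = {
    ("young immigrants", "young people"),
    ("illegal immigrants", "young people"),
    ("young illegals", "young people"),
}


def toy_is_similar(a: str, b: str) -> bool:
    return (a, b) in TOY_SIMILAR_PAIRS


entity_a = ["young immigrants", "illegal immigrants", "young illegals"]  # A1..A3
entity_b = ["young people", "foreign people"]                            # B1, B2

score, should_merge = representative_overlap(entity_a, entity_b, toy_is_similar)
print(score, should_merge)
```

With the slide's example, three of the six phrase pairs count as similar, so the ratio 3 / (2×3) exceeds the 0.3 threshold and the two entities are merged.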