Automated Identification of Media Bias by Word Choice and Labeling in News Articles
Anastasia Zhukova
Data & Knowledge
Engineering Group
29.09.2020
Short CV
2
2008 – 2014
Information technology, M.Eng.
Moscow Aviation Institute
2015 – 2019
Computer and Information Science, M. Sc.
University of Konstanz
2018
Graduate Student Researcher
Natural Language Processing group
National Institute of Informatics
2019 – present
Doctoral Researcher, Ph.D. Candidate
Data & Knowledge Engineering group
University of Wuppertal
Motivation
3
https://www.theaustralian.com.au/world/alexander-lukashenko-locked-and-loaded-to-fight-belarus-rats/news-story/99175bd3cfc13f71e2176310cee98288
https://www.economist.com/europe/2020/06/20/waving-slippers-at-the-cockroach-president-of-belarus
Agenda
4
• Background
• Methodology
• Results
• Conclusion
https://tgram.ru/channels/otsuka_bld
Media Bias Model
5
Ideological
View
Target
Audience
Owners Advertisers
Business Interest
Funding
...
Political Interest
Reputation
...
Gathering
Writing
Editing
News
Reality
News
Event
Perception
Consumers
News Production and Consumption Process
Presentation Style
• Placement
• Size Allocation
• Picture Selection
• Picture Explanation
Writing Style
• Labeling
• Word choice
Fact Selection
• Event Selection
• Source Selection
• Commission
• Omission
Political
View
Consumer Context
• Background Knowledge
• Attitude
• Social Status
• Country
Spin
Government
Reasons
Process
Forms
an arrogant person
Word Choice (WC)
Labeling (L)
a genius
F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018
a smart person
WCL problem
6
Word choice & labeling…
• strongly impacts the
public perception of news topics
• disturbs decision making process
• leads to false information propagation
Hurricane Katrina, 2005
F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
Social science background
7
Content analysis
“What to think about”
Frame analysis
“How to think about it”
Event-related articles
Putin
president
savior
tyrant
humble man
thief
Cross-document coreference resolution
president
savior
Putin
tyrant
humble man
thief
Sentiment analysis
president
savior
tyrant
humble man
thief
Putin
war
sanctions Crimea
army
Candidate extraction
Social
sciences
Computer
science
Identified actors, actions, events, concepts, etc. Concept polarization
Content analysis
Cross-document coreference resolution
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020
Research question
8
Given
- No training set
- A set of event-related articles
- Extracted candidate phrases of groups of persons
Goal
- Find phrases referring to the same concepts
- Use only phrases themselves, i.e., no context information
- Exploratory unsupervised task
illegal aliens
undocumented immigrants
Directly referring mentions
White House officials
American authorities
Indirectly referring mentions
How can an automated approach identify
instances of bias by word choice and labeling
in the concepts (in)directly referring to groups of people
in a set of English news articles reporting on the same event?
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
Multi-step merging approach (MSMA) 1.0
9
Corefs. & NPs ↓ number of mentions
…
…
Extraction of a
specific
attribute
…
recursion
Pairwise
comparison &
merging
…
“Winner takes it all” strategy
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
Merge using similar heads
10
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
illegal aliens who were brought as children
nearly 800,000 illegal aliens
illegal aliens
young illegal aliens
head sets
{illegals} {immigrants} {aliens}
similar in the vector space
Entity 1 Entity 2 Entity 3
on its own, the head word “aliens” is related to UFOs;
these mentions are merged later, as “illegal aliens”, at the third step
Merge entities
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
head sets
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
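The head-based merge on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional vectors, the 0.9 cosine threshold, and the token-lookup heuristic in `head_of` are all assumptions made for the example (a real pipeline would take heads from a dependency parse and vectors from a trained embedding model).

```python
import math
from itertools import combinations

# Toy 2-d vectors standing in for real word embeddings (assumed values).
TOY_VECTORS = {
    "illegals": (0.9, 0.1),
    "immigrants": (0.8, 0.3),
    "aliens": (0.1, 0.9),  # alone, "aliens" sits closer to the UFO sense
}

PHRASES = [
    "young illegals", "DACA illegals",
    "young immigrants", "undocumented immigrants",
    "illegal aliens", "young illegal aliens",
]


def cosine(u, v):
    """Cosine similarity of two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def head_of(phrase, heads):
    """Hypothetical heuristic: the first token that is a known head word."""
    for token in phrase.lower().split():
        if token in heads:
            return token
    return None


def merge_by_heads(phrases, vectors, threshold=0.9):
    """Group phrases by head, then merge groups whose heads are similar."""
    groups = {}
    for phrase in phrases:
        head = head_of(phrase, vectors)
        if head is not None:  # phrases without a known head are skipped
            groups.setdefault(head, []).append(phrase)

    # Tiny union-find over head words.
    parent = {head: head for head in groups}

    def find(head):
        while parent[head] != head:
            head = parent[head]
        return head

    for a, b in combinations(sorted(groups), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    merged = {}
    for head, members in groups.items():
        merged.setdefault(find(head), []).extend(members)
    return sorted(sorted(members) for members in merged.values())


entities = merge_by_heads(PHRASES, TOY_VECTORS)
for entity in entities:
    print(entity)
```

With these toy vectors the {illegals} and {immigrants} head sets fall into one entity, while the {aliens} head set stays separate until a later step.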
Merge using representative phrases
11
A1: young immigrants,
A2: illegal immigrants,
A3: young illegals
young immigrants,
undocumented immigrants,
illegal immigrants,
young illegals,
endangered immigrants,
additional illegals
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
endangered immigrants
additional illegals
this group of young people
nearly 800,000 people
a people
people who are American in every way except through birth
foreign people
bad people
people affected by the move
the estimated 800,000 people
these people
young people
Labeling
phrases
Entity 1 Entity 2
Merged entities
Representative
labeling
phrases
B1: young people,
B2: foreign people
young people,
foreign people,
bad people,
estimated people
Sim. matrix (rows A1, A2, A3; columns B1, B2): three entries are 1 and three are 0
3 / (2×3) = 0.5 ≥ 0.3 → similar in the vector space
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
endangered immigrants
additional illegals
this group of young people
nearly 800,000 people
a people
people who are American in every way except through birth
foreign people
bad people
people affected by the move
the estimated 800,000 people
these people
young people
Labeling
phrases
Representative
labeling
phrases
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
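The merge decision on this slide reduces to simple arithmetic over a binary similarity matrix. The sketch below assumes the matrix has already been binarized (1 = the two representative phrases are similar). Only the counts (three 1s in a 3×2 matrix) and the 0.3 threshold come from the slide; the position of the 1s and the function name `merge_decision` are assumptions.

```python
def merge_decision(sim_matrix, ratio_threshold=0.3):
    """Return (ratio of similar pairs, whether the two entities are merged).

    sim_matrix: rows are representative phrases of entity 1 (A1..A3),
    columns are representative phrases of entity 2 (B1, B2), entries 0 or 1.
    """
    n_rows = len(sim_matrix)
    n_cols = len(sim_matrix[0])
    similar_pairs = sum(sum(row) for row in sim_matrix)
    ratio = similar_pairs / (n_rows * n_cols)
    return ratio, ratio >= ratio_threshold


# Assumed arrangement of the three 1s; the slide only fixes the counts.
example = [
    [1, 0],  # A1: young immigrants
    [1, 0],  # A2: illegal immigrants
    [1, 0],  # A3: young illegals
]

ratio, merge = merge_decision(example)
print(ratio, merge)
```

Here the ratio is 3 / (2×3) = 0.5, which exceeds 0.3, so the two entities are merged.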
MSMA 1.0 evaluation
12
[Figure: F1 of concept types (Actor, Country, Misc, Group) at Init. and after Steps 1–4; y-axis from 0% to 100%]
Core modifiers
Core meaning
Evaluation of the simplified version of NewsWCL50 annotation.
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
MSMA 1.0 Drawbacks
13
Problems of MSMA 1.0
• Overparametrization
• Lack of stability
– small variation in wording affected results
• Few head modifiers used
– only adjectives
• Frequently falsely merged concepts
– American people – young immigrant people
– Chinese officials – American officials
• Low recall & low precision
– smaller related entities remain unmerged
– unrelated entities are merged
• “Winner takes it all” strategy is not optimal
Goals of MSMA 2.0
• Self-controlled merging
• Default set of parameters for all datasets
• Stable performance in case of added phrases
• Use all head’s modifiers
• Keep concepts fine-grained
• Improve merging related smaller entities
Same challenge: unsupervised learning,
no training set
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
MSMA 2.0: Preprocessing
14
non-NE persons
NE (ORG) persons
core mentions
non-NE group
non-NE person
ORG person
generalizing mentions
specializing mentions
Republican establishment
GOP leaders,
Republicans
a red attorney general,
a Republican
Americans
U.S.
citizens
U.S. + citizens
2*U.S. + citizens
young
young + 2*U.S. + citizens
1. Concept’s sub-type prioritization
4. Weighting of the NE components
3. NE-grid: operation restriction or similarity amplification
immigrants
young + immigrants
GOP Republicans Republican United_States U.S. American Americans Spanish Mexico
GOP
Republicans
Republican
United_States
U.S.
American
Americans
Spanish
Mexico
5. Multiple similarity levels
- Head-similarity matrix SH
- Phrase-similarity matrix SP
- Core-phrase-similarity matrix SCP
- Ratio-matrix RM
2. More head modifiers
adjectival, noun, compound modifiers
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
young + citizens
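The weighting of named-entity (NE) components, written on the slide as "young + 2*U.S. + citizens", can be illustrated as a weighted sum of word vectors. This is a sketch under stated assumptions: the three-dimensional toy vectors and the function name `weighted_phrase_vector` are hypothetical, and the factor 2 is the only value taken from the slide.

```python
def weighted_phrase_vector(tokens, vectors, ne_tokens, ne_weight=2.0):
    """Sum the word vectors of a phrase, counting NE tokens ne_weight times."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        weight = ne_weight if token in ne_tokens else 1.0
        for i, value in enumerate(vectors[token]):
            total[i] += weight * value
    return total


# Toy one-hot vectors (assumed) so the effect of the weight is easy to read.
TOY = {
    "young": (1.0, 0.0, 0.0),
    "U.S.": (0.0, 1.0, 0.0),
    "citizens": (0.0, 0.0, 1.0),
}

# "young + 2*U.S. + citizens"
print(weighted_phrase_vector(["young", "U.S.", "citizens"], TOY, {"U.S."}))
# "young + citizens" (no NE component)
print(weighted_phrase_vector(["young", "citizens"], TOY, {"U.S."}))
```

Doubling the NE component pulls the phrase vector towards the named entity, which keeps "young U.S. citizens" closer to other U.S.-related mentions than to "young immigrants".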
MSMA 2.0: Pipeline
15
1. Identification of cluster cores
Core mentions cm_i, cm_j ∈ CM are linked when
SPC_{cm_i,cm_j} ≥ 0.4 and SH_{cm_i,cm_j} ≥ 0.4
RM_{cm_i,cm_j} ≥ log_{5000} |M|
[Figure: core mentions cm_0, cm_1, cm_3, cm_5, cm_6 linked by similarities of .7 and .8]
2. Forming cluster bodies
A mention m is added when ∃ cc ∈ CC_i: SP_{m,cc} ≥ 0.5 and
the normalized similarity to CC_i is larger than to CC_j
[Figure: cluster cores CC_i, CC_j with cluster bodies CB_i, CB_j; minimum similarity SP_{m,cc} = 0.5 for cc ∈ CC_i; CB_i ∩ CB_j → conflicts]
3. Adding border points
border points? noise?
min SP_{m, ∀cb ∈ CB} ≤ 0.4 ≥ 2 and
the normalized similarity to CB_i is larger than to CB_j
[Figure: mention m and cluster-body mentions cb at threshold 0.4; ∀ cb ∈ CB_i: SP_{m,cb} = 3; ∀ cb ∈ CB_j: SP_{m,cb} = 2; resulting clusters C_i, C_j]
4. Forming non-core clusters
5. Merging final clusters
• Use all modifiers
• On concept level
• TF-IDF-weighted concept-similarity matrix
[Figure: clusters c_0, c_1, c_3, c_5, c_6 linked by similarities of .7 and .8; merged clusters C_i, C_j]
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons”, Manuscript submitted for publication, 2020
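The rule for forming cluster bodies (a mention m joins the core CC_i if some cc ∈ CC_i has SP ≥ 0.5 and m is, after normalization, more similar to CC_i than to CC_j) can be sketched as below. The slide does not define the normalization, so this example assumes the mean similarity over the core's mentions; the similarity values and the helper name `assign_to_core` are likewise made up for illustration.

```python
def assign_to_core(mention, cores, sp, min_sim=0.5):
    """Return the name of the cluster core the mention joins, or None.

    cores: dict mapping a core name to its list of core mentions.
    sp: dict mapping frozenset({a, b}) to the phrase similarity of a and b.
    Assumption: "normalized similarity" = mean similarity to the core.
    """
    best_name, best_score = None, 0.0
    for name, core_mentions in cores.items():
        sims = [sp.get(frozenset((mention, cc)), 0.0) for cc in core_mentions]
        if max(sims) < min_sim:
            continue  # no core mention is similar enough
        score = sum(sims) / len(core_mentions)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


CORES = {
    "CC_i": ["Republicans", "GOP leaders"],
    "CC_j": ["Democrats", "Democratic leaders"],
}

# Assumed phrase similarities (entries of SP) for one candidate mention.
SP = {
    frozenset(("Republican establishment", "Republicans")): 0.8,
    frozenset(("Republican establishment", "GOP leaders")): 0.6,
    frozenset(("Republican establishment", "Democrats")): 0.5,
    frozenset(("Republican establishment", "Democratic leaders")): 0.2,
}

print(assign_to_core("Republican establishment", CORES, SP))
print(assign_to_core("the Israelis", CORES, SP))
```

In this toy setup the mention passes the 0.5 threshold for both cores, and the normalized comparison resolves the conflict in favour of CC_i. A mention with no sufficiently similar core mention is left unassigned for the later steps.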
Evaluation and results
16
Democrats,
Democratic leaders
Illinois Democrat
American public,
American families,
U.S. citizens,
Poor unskilled American workers
Voice of Americans
Demonstrators,
DACA protesters,
Opposition
Administration officials,
USCIS employees,
Executive authority,
DHS officials,
Chief of White House,
Acting secretary
Mexican,
Spanish,
Mexican officials
GOP senators,
Republicans,
Republican leaders,
A group of red state attorneys
European ally,
The Europeans,
European leaders,
Western European Diplomats
Israeli officials,
Israeli Ambassador,
The Israelis
Russian agents,
Russian nationals,
The Russians
caravan participants,
asylum-seeking immigrant caravan,
members of the caravan,
more than a few hundred asylum seekers,
150 migrants, many of whom were children,
asylum-seekers,
the people that are waiting outside,
these large “caravans” of people,
unauthorized immigrants,
refugees,
people traveling without documents,
a caravan of hundreds of Central Americans,
a group of about 100 people,
Central American migrants and supporters
one of the chief critics of DACA,
opponents of the policy,
some immigration critics,
immigration hard-liners,
groups who support stricter immigration controls
Indirect mentions: ORG Indirect mentions: GPEs Direct mentions
F1                    Direct   Indirect
CoreNLP               27.9     31.4
Hier.Clust.           37.2     29.1
EECDCR                41.6     42.6
MSMA 1.0              44.7     40.9
MSMA 2.0 (ELMo)       42.1     40.1
MSMA 2.0 (fastText)   48.3     43.6
MSMA 2.0 (word2vec)   48.5     44.3
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts”, Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
Conclusion
17
• Bias by WCL has a strong influence on readers
• Revealing bias is a step towards mitigating it
• MSMA 1.0 & 2.0 successfully resolve biased mentions
Help social sciences with
frame analyses
Help news readers become
aware of bias in media
Newsalyze
news readers,
researchers
Help make the world a better place
Objectivity
Frame 2
Frame 1
https://github.com/fhamborg/newsalyze-backend
Soon to be publicly available
Questions
18
Contact:
Anastasia Zhukova
Zhukova@uni-wuppertal.de
@ana_m_zhukova
http://dke.uni-wuppertal.de/zhukova
Thank you for your attention!
Questions?

Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles

  • 1. Automated Identification of Media Bias by Word Choice and Labeling in News Articles Anastasia Zhukova Data & Knowledge Engineering Group 29.09.2020
  • 2. Short CV 2 2008 – 2014 Information technology, M.Eng. Moscow Aviation Institute 2015 – 2019 Computer and Information Science, M. Sc. University of Konstanz 2018 Graduate Student Researcher Natural Language Processing group National Institute of Informatics 2019 – present Doctoral Researcher, Ph.D. Candidate Data & Knowledge Engineering group University of Wuppertal
  • 4. Agenda 4 • Background • Methodology • Results • Conclusion https://tgram.ru/channels/otsuka_bld
  • 5. Media Bias Model 5 Ideological View Target Audience Owners Advertisers Business Interest Funding ... Political Interest Reputation ... Gathering Writing Editing News Reality News Event Perception Consumers News Production and Consumption Process Presentation Style • Placement • Size Allocation • Picture Selection • Picture Explanation Writing Style • Labeling • Word choice Fact Selection • Event Selection • Source Selection • Commission • Omission Political View Consumer Context • Background Knowledge • Attitude • Social Status • Country Spin Government Reasons Process Forms an arrogant person Word Choice (WC) Labeling (L) a genius F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018 a smart person
  • 6. WCL problem
      Word choice & labeling…
      - strongly impacts the public perception of news topics
      - disturbs the decision-making process
      - leads to the propagation of false information
      Hurricane Katrina, 2005
      References: F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019; A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
  • 7. Social science background
      - Social sciences: Content analysis (“What to think about”), Frame analysis (“How to think about it”)
      - Computer science: Candidate extraction, Cross-document coreference resolution, Sentiment analysis
      - Input: event-related articles
      - Example mentions: Putin, president, savior, tyrant, humble man, thief; war, sanctions, Crimea, army
      - Content analysis / Cross-document coreference resolution: identified actors, actions, events, concepts, etc.
      - Concept polarization
      References: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019; A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019; A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020
  • 8. Research question
      How can an automated approach identify instances of bias by word choice and labeling in the concepts (in)directly referring to groups of people in a set of English news articles reporting on the same event?
      Given
      - No training set
      - A set of event-related articles
      - Extracted candidate phrases of groups of persons
      Goal
      - Find phrases referring to the same concepts
      - Use only the phrases themselves, i.e., no context information
      - Exploratory unsupervised task
      Examples
      - Directly referring mentions: illegal aliens, undocumented immigrants
      - Indirectly referring mentions: White House officials, American authorities
      References: A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019; A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
  • 9. Multi-step merging approach (MSMA) 1.0
      - Input: Corefs. & NPs (↓ number of mentions)
      - Per step: extraction of a specific attribute, then pairwise comparison & merging
      - Recursion over the steps
      - “Winner takes it all” strategy
      References: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019; A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
  • 10. Merge using similar heads
      - Entity 1, head set {illegals}: young illegals; the illegals; illegals who arrived as children; DACA illegals
      - Entity 2, head set {immigrants}: roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
      - Entity 3, head set {aliens}: illegal aliens who were brought as children; nearly 800,000 illegal aliens; illegal aliens; young illegal aliens
      - The head sets {illegals} and {immigrants} are similar in the vector space, so Entities 1 and 2 are merged into one entity.
      - The word “aliens” alone is related to UFOs; Entity 3 will be merged later, as “illegal alien”, at the third step.
      References: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019; A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
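The head-based merge on slide 10 can be sketched roughly as follows. This is a minimal illustration and not the MSMA implementation: the 2-d toy vectors, the 0.8 threshold, and the names `cosine` and `merge_by_heads` are all hypothetical. A real system would take head vectors from a trained word-embedding model.

```python
import math

# Hypothetical toy "embeddings" for the three head words on the slide.
TOY_VECTORS = {
    "illegals": (0.9, 0.1),
    "immigrants": (0.8, 0.3),
    "aliens": (0.1, 0.9),
}

# Entities keyed by their head word, as listed on the slide.
ENTITIES = {
    "illegals": ["young illegals", "the illegals", "DACA illegals"],
    "immigrants": ["young immigrants", "illegal immigrants", "undocumented immigrants"],
    "aliens": ["illegal aliens", "young illegal aliens"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def merge_by_heads(entities, vectors, threshold=0.8):
    """Greedily merge entities whose head words are similar in the vector space.

    The threshold is an assumed value chosen for the toy vectors.
    """
    merged = []  # list of (set of heads, list of phrases)
    for head, phrases in entities.items():
        for heads, group in merged:
            if any(cosine(vectors[head], vectors[h]) >= threshold for h in heads):
                heads.add(head)
                group.extend(phrases)
                break
        else:
            # No similar head found: start a new merged entity.
            merged.append(({head}, list(phrases)))
    return merged


for heads, phrases in merge_by_heads(ENTITIES, TOY_VECTORS):
    print(sorted(heads), phrases)
```

With these toy vectors, “illegals” and “immigrants” end up in one entity, while “aliens” stays separate, mirroring the slide's example.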
  • 11. Merge using representative phrases
      - Entity 1 (merged entities): young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants; endangered immigrants; additional illegals
          Labeling phrases: young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals
          Representative labeling phrases: A1: young immigrants, A2: illegal immigrants, A3: young illegals
      - Entity 2: this group of young people; nearly 800,000 people; a people; people who are American in every way except through birth; foreign people; bad people; people affected by the move; the estimated 800,000 people; these people; young people
          Labeling phrases: young people, foreign people, bad people, estimated people
          Representative labeling phrases: B1: young people, B2: foreign people
      - Similarity matrix:
                A1  A2  A3
            B1   1   1   1
            B2   0   0   0
        3 / (2 × 3) ≥ 0.3 → similar in the vector space
      References: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019; A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
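The similarity check on slide 11 can be illustrated with a short sketch. It assumes the slide's figures read as a binary matrix over representative phrases (rows B1, B2; columns A1 to A3) whose share of similar pairs, 3 / (2 × 3), is compared against 0.3. The function names are hypothetical, and how each cell is binarized is not shown here.

```python
def similarity_ratio(sim_matrix):
    """Share of representative-phrase pairs marked similar (1) in a binary matrix."""
    n_pairs = len(sim_matrix) * len(sim_matrix[0])
    return sum(sum(row) for row in sim_matrix) / n_pairs


def entities_similar(sim_matrix, threshold=0.3):
    """Two entities count as similar when enough phrase pairs are similar."""
    return similarity_ratio(sim_matrix) >= threshold


# Rows: B1 (young people), B2 (foreign people); columns: A1, A2, A3.
SLIDE_MATRIX = [
    [1, 1, 1],
    [0, 0, 0],
]

print(similarity_ratio(SLIDE_MATRIX))   # 3 / (2 * 3)
print(entities_similar(SLIDE_MATRIX))
```

Under this reading the ratio is 0.5, which exceeds the 0.3 threshold, so the two entities would be merged.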
  • 12. MSMA 1.0 evaluation
      [Chart: F1 of concept types (Actor, Country, Misc, Group) at Init. and after Step 1 to Step 4; y-axis 0% to 100%; further labels: Core meaning, Core modifiers]
      Evaluation of the simplified version of NewsWCL50 annotation.
      References: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019; A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
  • 13. MSMA 1.0 Drawbacks
      Problems of MSMA 1.0
      - Overparametrization
      - Lack of stability: small variations in wording affected the results
      - Few head modifiers used: only adjectives
      - Frequently falsely merged concepts: “American people” with “young immigrant people”; “Chinese officials” with “American officials”
      - Low recall & low precision: smaller related entities remain unmerged; unrelated entities are merged
      - “Winner takes it all” strategy is not optimal
      Goals of MSMA 2.0
      - Self-controlled merging
      - Default set of parameters for all datasets
      - Stable performance when phrases are added
      - Use all modifiers of the head
      - Keep concepts fine-grained
      - Improve merging of related smaller entities
      Same challenge: unsupervised learning, no training set
      References: A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020; A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
  • 14. MSMA 2.0: Preprocessing
      1. Concept’s sub-type prioritization: core mentions (non-NE persons, NE (ORG) persons), generalizing mentions, specializing mentions; sub-types non-NE group, non-NE person, ORG person. Examples: Republican establishment; GOP leaders, Republicans; a red attorney general, a Republican
      2. More head modifiers: adjectival, noun, compound modifiers. Example: immigrants → young + immigrants
      3. NE-grid: operation restriction or similarity amplification. Grid entries: GOP, Republicans, Republican, United_States, U.S., American, Americans, Spanish, Mexico
      4. Weighting of the NE components. Example: Americans, U.S. citizens: U.S. + citizens → 2*U.S. + citizens; young + citizens → young + 2*U.S. + citizens
      5. Multiple similarity levels
          - Head-similarity matrix SH
          - Phrase-similarity matrix SP
          - Core-phrase-similarity matrix SCP
          - Ratio-matrix RM
      References: A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020; A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
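The NE-component weighting on slide 14 (“2*U.S. + citizens”) can be read as a weighted sum of word vectors in which named-entity tokens count double. The sketch below illustrates that reading only. The toy vectors, the function name `phrase_vector`, and the way NE tokens are passed in are assumptions, not part of the slides.

```python
# Hypothetical 2-d toy vectors; a real system would use trained embeddings.
TOY_VECTORS = {
    "U.S.": (1.0, 0.0),
    "citizens": (0.0, 1.0),
    "young": (0.5, 0.5),
}


def phrase_vector(tokens, vectors, ne_tokens, ne_weight=2.0):
    """Sum the word vectors of a phrase, weighting named-entity tokens by ne_weight."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        weight = ne_weight if token in ne_tokens else 1.0
        for i, value in enumerate(vectors[token]):
            total[i] += weight * value
    return tuple(total)


# "U.S. citizens" -> 2*U.S. + citizens
print(phrase_vector(["U.S.", "citizens"], TOY_VECTORS, ne_tokens={"U.S."}))
# "young U.S. citizens" -> young + 2*U.S. + citizens
print(phrase_vector(["young", "U.S.", "citizens"], TOY_VECTORS, ne_tokens={"U.S."}))
```

Doubling the NE component pulls the phrase vector towards the named entity, so phrases sharing the same entity stay closer to each other than to phrases that only share a generic head such as “citizens”.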
  • 15. MSMA 2.0: Pipeline
      1. Identification of cluster cores: core mentions cm ∈ CM with SPC_{cm_i,cm_j} ≥ 0.4 and SH_{cm_i,cm_j} ≥ 0.4; RM_{cm_i,cm_j} ≥ log_{5000}|M|. [Figure: similarity matrix over cm_0, cm_1, cm_3, cm_5, cm_6 with values .7 and .8]
      2. Forming cluster bodies: ∃cc ∈ CC_i: SP_{m,cc} ≥ 0.5 and normalized similarity to CC_i is larger than to CC_j. [Figure: mention m, cluster cores CC_i and CC_j, cluster bodies CB_i and CB_j; min sim SP_{m,cc} = 0.5, cc ∈ CC_i; CB_i ∩ CB_j → conflicts]
      3. Adding border points: min SP_{m,∀cb∋CB} ≤ 0.4 ≥ 2 and normalized similarity to CB_i is larger than to CB_j. [Figure: border points? noise?; threshold 0.4; ∀cb ∋ CB_i: SP_{m,cb} = 3; ∀cb ∋ CB_j: SP_{m,cb} = 2; clusters C_i and C_j]
      4. Forming non-core clusters
      5. Merging final clusters C_i and C_j
          - Use all modifiers
          - On concept level
          - TF-IDF-weighted concept-similarity matrix [Figure: matrix over c_0, c_1, c_3, c_5, c_6 with values .7 and .8]
      References: A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020; A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
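Two pieces of the slide-15 pipeline lend themselves to a small sketch: the size-dependent threshold log_5000|M| and the rule that a mention joins a cluster core only if its SP similarity to some core member is at least 0.5, with conflicts going to the core with the larger normalized similarity. The slides do not define “normalized similarity”; the sketch assumes it means the mean SP similarity to all members of a core. All names and the toy similarity values are hypothetical.

```python
import math


def ratio_threshold(n_mentions, base=5000):
    """Threshold log_base(|M|) that grows with the number of mentions."""
    return math.log(n_mentions, base)


def assign_to_core(mention, cores, sp, min_sim=0.5):
    """Pick the cluster core for a mention, or None if no core qualifies.

    cores: dict mapping core name -> list of core mentions
    sp:    dict mapping (mention, core mention) -> phrase similarity
    Assumption: "normalized similarity" = mean similarity to the core's members.
    """
    best_core, best_score = None, float("-inf")
    for name, members in cores.items():
        sims = [sp.get((mention, cc), 0.0) for cc in members]
        if not sims or max(sims) < min_sim:
            continue  # no core member is similar enough
        score = sum(sims) / len(sims)
        if score > best_score:
            best_core, best_score = name, score
    return best_core


CORES = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
SP = {
    ("m", "a1"): 0.6, ("m", "a2"): 0.2,
    ("m", "b1"): 0.55, ("m", "b2"): 0.5,
    ("n", "a1"): 0.3, ("n", "b1"): 0.1,
}

print(assign_to_core("m", CORES, SP))  # qualifies for both cores; higher mean wins
print(assign_to_core("n", CORES, SP))  # below 0.5 everywhere
print(ratio_threshold(5000))
```

In the toy data, mention `m` passes the 0.5 test for both cores, so the mean similarity decides the conflict, while `n` is left unassigned for the later border-point and non-core steps.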
  • 16. Evaluation and results
      Examples of resolved concepts (slide categories: Indirect mentions: ORG; Indirect mentions: GPEs; Direct mentions):
      - Democrats, Democratic leaders, Illinois Democrat
      - American public, American families, U.S. citizens, Poor unskilled American workers, Voice of Americans
      - Demonstrators, DACA protesters, Opposition
      - Administration officials, USCIS employees, Executive authority, DHS officials, Chief of White House, Acting secretary
      - Mexican, Spanish, Mexican officials
      - GOP senators, Republicans, Republican leaders, A group of red state attorneys
      - European ally, The Europeans, European leaders, Western European Diplomats
      - Israeli officials, Israeli Ambassador, The Israelis
      - Russian agents, Russian nationals, The Russians
      - caravan participants, asylum-seeking immigrant caravan, members of the caravan, more than a few hundred asylum seekers, 150 migrants, many of whom were children, asylum-seekers, the people that are waiting outside, these large “caravans” of people, unauthorized immigrants, refugees, people traveling without documents, a caravan of hundreds of Central Americans, a group of about 100 people, Central American migrants and supporters
      - one of the chief critics of DACA, opponents of the policy, some immigration critics, immigration hard-liners, groups who support stricter immigration controls
      F1 scores:
          Method              Direct  Indirect
          CoreNLP             27.9    31.4
          Hier.Clust.         37.2    29.1
          EECDCR              41.6    42.6
          MSMA 1.0            44.7    40.9
          MSMA 2.0 ELMo       42.1    40.1
          MSMA 2.0 fastText   48.3    43.6
          MSMA 2.0 word2vec   48.5    44.3
      References: A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts,” manuscript submitted for publication, 2020; A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020
  • 17. Conclusion
      - Bias by WCL has a strong influence on readers
      - Revealing bias is a step towards mitigating it
      - MSMA 1.0 & 2.0 successfully resolve biased mentions
      - Help social sciences with frame analyses
      - Help news readers become aware of bias in media
      - Help make the world a better place
      - Newsalyze (news readers, researchers): https://github.com/fhamborg/newsalyze-backend, soon to be publicly available
      [Figure labels: Objectivity, Frame 1, Frame 2]