2. Short CV
2008 – 2014
Information technology, M.Eng.
Moscow Aviation Institute
2015 – 2019
Computer and Information Science, M.Sc.
University of Konstanz
2018
Graduate Student Researcher
Natural Language Processing group
National Institute of Informatics
2019 – present
Doctoral Researcher, Ph.D. Candidate
Data & Knowledge Engineering group
University of Wuppertal
5. Media Bias Model
Ideological View
Target Audience
Owners Advertisers
Business Interest
Funding
...
Political Interest
Reputation
...
Gathering
Writing
Editing
News
Reality
News
Event
Perception
Consumers
News Production and Consumption Process
Presentation Style
• Placement
• Size Allocation
• Picture Selection
• Picture Explanation
Writing Style
• Labeling
• Word choice
Fact Selection
• Event Selection
• Source Selection
• Commission
• Omission
Political View
Consumer Context
• Background Knowledge
• Attitude
• Social Status
• Country
Spin
Government
Reasons
Process
Forms
an arrogant person
Word Choice (WC)
Labeling (L)
a genius
F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018
a smart person
6. WCL problem
Word choice & labeling…
• strongly impacts the public perception of news topics
• disturbs the decision-making process
• leads to the propagation of false information
Hurricane Katrina, 2005
F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
7. Social science background
Content analysis
“What to think about”
Frame analysis
“How to think about it”
Event-related articles
Putin
president
savior
tyrant
humble man
thief
Cross-document coreference resolution
president
savior
Putin
tyrant
humble man
thief
Sentiment analysis
president
savior
tyrant
humble man
thief
Putin
war
sanctions Crimea
army
Candidate extraction
Social sciences
Computer science
Identified actors, actions, events, concepts, etc. Concept polarization
Content analysis
Cross-document coreference resolution
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020
8. Research question
Given
- No training set
- A set of event-related articles
- Extracted candidate phrases of groups of persons
Goal
- Find phrases referring to the same concepts
- Use only phrases themselves, i.e., no context information
- Exploratory unsupervised task
illegal aliens
undocumented immigrants
Directly referring mentions
White House officials
American authorities
Indirectly referring mentions
How can an automated approach identify
instances of bias by word choice and labeling
in the concepts (in)directly referring to groups of people
in a set of English news articles reporting on the same event?
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
9. Multi-step merging approach (MSMA) 1.0
Corefs. & NPs ↓ number of mentions
…
…
Extraction of a specific attribute
…
recursion
Pairwise comparison & merging
…
“Winner takes it all” strategy
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
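The recursive pairwise comparison and merging shown on this slide can be sketched roughly as follows. This is a minimal illustration, not the MSMA 1.0 implementation: the representation of entities as sets of phrases, the `share_head` toy predicate, and the reading of “winner takes it all” as the largest entity absorbing every entity similar to it are assumptions made for the example.

```python
def merge_winner_takes_all(entities, similar):
    """Greedy merging: larger entities absorb every similar entity.

    entities: list of sets of phrases (hypothetical representation).
    similar:  caller-supplied predicate on two entities (assumption).
    """
    # Process the largest entities first so that they act as "winners".
    pending = sorted(entities, key=len, reverse=True)
    merged = []
    while pending:
        winner = set(pending.pop(0))
        rest = []
        for other in pending:
            if similar(winner, other):
                winner |= other  # the winner absorbs the similar entity
            else:
                rest.append(other)
        pending = rest
        merged.append(winner)
    return merged


def share_head(a, b):
    # Toy similarity: both entities contain a phrase ending in the same word.
    return bool({p.split()[-1] for p in a} & {p.split()[-1] for p in b})


entities = [
    {"young illegals", "the illegals"},
    {"DACA illegals"},
    {"young immigrants"},
]
result = merge_winner_takes_all(entities, share_head)
```

In this toy run the two “illegals” entities end up in one entity, while “young immigrants” stays separate until a later step compares it on another attribute.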
10. Merge using similar heads
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
illegal aliens who were brought as children
nearly 800,000 illegal aliens
illegal aliens
young illegal aliens
head sets
{illegals} {immigrants} {aliens}
similar in the vector space
Entity 1 Entity 2 Entity 3
on its own, the head “aliens” is closer to UFOs in the vector space;
it is merged later, as “illegal aliens”, in the third step
Merge entities
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
head sets
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
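A small sketch of the head-based merge on this slide: phrases are grouped by their head word, and groups whose heads are close in the vector space are merged. The 2-d vectors, the 0.8 threshold, and the explicit (phrase, head) input format are assumptions for illustration only; the slides do not state which embeddings or threshold were used at this step.

```python
import math
from collections import defaultdict

# Hypothetical 2-d "embeddings" for the head words (illustrative values only).
HEAD_VECTORS = {
    "illegals": (1.0, 0.1),
    "immigrants": (0.9, 0.3),
    "aliens": (0.1, 1.0),
}
THRESHOLD = 0.8  # assumed similarity threshold, not taken from the slides


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def merge_by_heads(mentions):
    """mentions: list of (phrase, head) pairs; returns a list of phrase lists."""
    by_head = defaultdict(list)
    for phrase, head in mentions:
        by_head[head].append(phrase)

    groups = []  # each group: (set of heads, list of phrases)
    for head, phrases in by_head.items():
        for heads, members in groups:
            if any(cosine(HEAD_VECTORS[head], HEAD_VECTORS[h]) >= THRESHOLD
                   for h in heads):
                heads.add(head)
                members.extend(phrases)
                break
        else:
            # No existing group has a similar head: start a new entity.
            groups.append(({head}, list(phrases)))
    return [members for _, members in groups]


mentions = [
    ("young illegals", "illegals"),
    ("undocumented immigrants", "immigrants"),
    ("illegal aliens", "aliens"),
    ("DACA illegals", "illegals"),
]
merged = merge_by_heads(mentions)
```

With these toy vectors, {illegals} and {immigrants} fall into one entity while {aliens} stays separate, mirroring the slide, where “aliens” is only merged at a later step.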
11. Merge using representative phrases
A1: young immigrants,
A2: illegal immigrants,
A3: young illegals
young immigrants,
undocumented immigrants,
illegal immigrants,
young illegals,
endangered immigrants,
additional illegals
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
endangered immigrants
additional illegals
this group of young people
nearly 800,000 people
a people
people who are American in every way except through birth
foreign people
bad people
people affected by the move
the estimated 800,000 people
these people
young people
Labeling phrases
Entity 1 Entity 2
Merged entities
Representative labeling phrases
B1: young people,
B2: foreign people
young people,
foreign people,
bad people,
estimated people
Sim. matrix: rows A1, A2, A3; columns B1, B2; three entries are 1 and three are 0
3 / (2×3) = 0.5 ≥ 0.3 → similar in the vector space
young illegals
the illegals
illegals who arrived as children
DACA illegals
roughly 800,000 young undocumented immigrants
young immigrants
illegal immigrants
undocumented immigrants
endangered immigrants
additional illegals
this group of young people
nearly 800,000 people
a people
people who are American in every way except through birth
foreign people
bad people
people affected by the move
the estimated 800,000 people
these people
young people
Labeling phrases
Representative labeling phrases
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
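The similarity check between two entities on this slide can be worked through as a short sketch: count the similar pairs of representative labeling phrases and divide by the size of the matrix. The placement of the 1s below is hypothetical (the slide only shows that three of the six entries are 1), and the function name is introduced for illustration.

```python
def entity_similarity(sim_matrix, threshold=0.3):
    """Return (score, decision) for a binary phrase-similarity matrix.

    Rows are the representative phrases of entity A, columns those of entity B.
    The score is the share of phrase pairs marked as similar.
    """
    rows = len(sim_matrix)
    cols = len(sim_matrix[0])
    score = sum(sum(row) for row in sim_matrix) / (rows * cols)
    return score, score >= threshold


# Hypothetical arrangement: three similar pairs out of 3 x 2 = 6.
sim_matrix = [
    [1, 0],  # A1 vs. B1, B2
    [1, 0],  # A2 vs. B1, B2
    [1, 0],  # A3 vs. B1, B2
]
score, similar = entity_similarity(sim_matrix)
```

Here the score is 3 / (2×3) = 0.5, which exceeds the 0.3 threshold from the slide, so the two entities would be merged.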
12. MSMA 1.0 evaluation
[Figure: F1 of concept types (Actor, Country, Misc, Group) after Init. and Steps 1–4; y-axis from 0% to 100%; annotations: core modifiers, core meaning]
Evaluation on the simplified version of the NewsWCL50 annotation.
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019
13. MSMA 1.0 Drawbacks
Problems of MSMA 1.0
• Overparametrization
• Lack of stability
– small variations in wording affected results
• Few head modifiers used
– only adjectives
• Frequently falsely merged concepts
– American people – young immigrant people
– Chinese officials – American officials
• Low recall & low precision
– smaller related entities remain unmerged
– unrelated entities are merged
• “Winner takes it all” strategy is not optimal
Goals of MSMA 2.0
• Self-controlled merging
• Default set of parameters for all datasets
• Stable performance when phrases are added
• Use all modifiers of the head
• Keep concepts fine-grained
• Improve merging of related smaller entities
Same challenge: unsupervised learning,
no training set
A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
14. MSMA 2.0: Preprocessing
non-NE persons
NE (ORG) persons
core mentions
non-NE group
non-NE person
ORG person
generalizing mentions
specializing mentions
Republican establishment
GOP leaders,
Republicans
a red attorney general,
a Republican
Americans
U.S.
citizens
U.S. + citizens
2*U.S. + citizens
young
young + 2*U.S. + citizens
1. Concept’s sub-type prioritization
4. Weighting of the NE components
3. NE-grid: operation restriction or similarity amplification
immigrants
young + immigrants
[Figure: NE-grid with rows and columns GOP, Republicans, Republican, United_States, U.S., American, Americans, Spanish, Mexico]
5. Multiple similarity levels
- Head-similarity matrix SH
- Phrase-similarity matrix SP
- Core-phrase-similarity matrix SCP
- Ratio-matrix RM
2. More head modifiers
adjectival, noun, compound modifiers
A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
young + citizens
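The weighting of NE components (for example “young + 2*U.S. + citizens”) can be read as a weighted sum of word vectors in which named-entity tokens count twice. The sketch below assumes exactly that reading; the 3-d vectors, the `phrase_vector` helper, and the way named entities are passed in are hypothetical and only serve the illustration.

```python
NE_WEIGHT = 2.0  # the slide doubles the NE component ("2*U.S.")


def phrase_vector(tokens, vectors, named_entities):
    """Weighted sum of word vectors; NE tokens get NE_WEIGHT, others 1.0."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for tok in tokens:
        weight = NE_WEIGHT if tok in named_entities else 1.0
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
    return total


# Toy one-hot vectors so the effect of the weighting is easy to see.
vectors = {
    "young": (1.0, 0.0, 0.0),
    "U.S.": (0.0, 1.0, 0.0),
    "citizens": (0.0, 0.0, 1.0),
}
vec = phrase_vector(["young", "U.S.", "citizens"], vectors, {"U.S."})
```

With one-hot toy vectors, the result is [1.0, 2.0, 1.0]: the NE dimension dominates, so phrases that share the same named entity move closer together than phrases that only share a modifier such as “young”.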
15. MSMA 2.0: Pipeline
1. Identification of cluster cores
cm ∈ CM
SCP(cm_i, cm_j) ≥ 0.4 and SH(cm_i, cm_j) ≥ 0.4
RM(cm_i, cm_j) ≥ log_5000 |M|
[Figure: core mentions cm_0, cm_1, cm_3, cm_5, cm_6 linked by similarities of .7 and .8]
2. Forming cluster bodies
∃ cc ∈ CC_i: SP(m, cc) ≥ 0.5 and
normalized similarity to CC_i is larger than to CC_j
CB_i ∩ CB_j → conflicts
[Figure: mention m and cluster cores CC_i, CC_j with cluster bodies CB_i, CB_j; min sim SP(m, cc) = 0.5, cc ∈ CC_i]
3. Adding border points
border points? noise?
min SP(m, ∀cb ∈ CB) ≤ 0.4 ≥ 2 and
normalized similarity to CB_i is larger than to CB_j
[Figure: mention m and body points cb, with 0.4; ∀cb ∈ CB_i: SP(m, cb) = 3; ∀cb ∈ CB_j: SP(m, cb) = 2; clusters C_i, C_j]
4. Forming non-core clusters
5. Merging final clusters
• Use all modifiers
• On concept level
• TF-IDF-weighted concept-similarity matrix
[Figure: clusters C_i, C_j; concepts c_0, c_1, c_3, c_5, c_6 linked by similarities of .7 and .8]
A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020
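Some of the threshold checks on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the MSMA 2.0 code: the function names are hypothetical, “normalized similarity” is assumed here to be the mean similarity to a core's members, and how the ratio-matrix threshold combines with the other conditions is left open because the slide does not spell it out.

```python
import math


def rm_threshold(num_mentions, base=5000):
    """Ratio-matrix threshold log_5000 |M| as written on the slide."""
    return math.log(num_mentions, base)


def is_core_pair(core_phrase_sim, head_sim, min_sim=0.4):
    """Core-mention pair condition: both similarities reach 0.4."""
    return core_phrase_sim >= min_sim and head_sim >= min_sim


def assign_to_core(sp_to_cores, min_sim=0.5):
    """Pick the cluster core for a mention m when forming cluster bodies.

    sp_to_cores: {core_id: [SP(m, cc) for each cc in that core]}.
    A core is a candidate if some cc has SP(m, cc) >= 0.5; among candidates
    the one with the larger normalized (here: mean, an assumption) similarity wins.
    """
    candidates = {}
    for core_id, sims in sp_to_cores.items():
        if any(s >= min_sim for s in sims):
            candidates[core_id] = sum(sims) / len(sims)
    if not candidates:
        return None  # m stays unassigned at this step
    return max(candidates, key=candidates.get)


chosen = assign_to_core({"CC_i": [0.6, 0.5], "CC_j": [0.55, 0.2]})
```

In this toy input both cores contain a member with similarity of at least 0.5, and the mention goes to CC_i because its mean similarity is higher. Note that the log_5000 |M| threshold grows with the number of mentions and reaches 1 at |M| = 5000.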
16. Evaluation and results
Democrats,
Democratic leaders
Illinois Democrat
American public,
American families,
U.S. citizens,
Poor unskilled American workers
Voice of Americans
Demonstrators,
DACA protesters,
Opposition
Administration officials,
USCIS employees,
Executive authority,
DHS officials,
Chief of White House,
Acting secretary
Mexican,
Spanish,
Mexican officials
GOP senators,
Republicans,
Republican leaders,
A group of red state attorneys
European ally,
The Europeans,
European leaders,
Western European Diplomats
Israeli officials,
Israeli Ambassador,
The Israelis
Russian agents,
Russian nationals,
The Russians
caravan participants,
asylum-seeking immigrant caravan,
members of the caravan,
more than a few hundred asylum seekers,
150 migrants, many of whom were children,
asylum-seekers,
the people that are waiting outside,
these large “caravans” of people,
unauthorized immigrants,
refugees,
people traveling without documents,
a caravan of hundreds of Central Americans,
a group of about 100 people,
Central American migrants and supporters
one of the chief critics of DACA,
opponents of the policy,
some immigration critics,
immigration hard-liners,
groups who support stricter immigration controls
Indirect mentions: ORG | Indirect mentions: GPEs | Direct mentions
F1                   Direct   Indirect
CoreNLP              27.9     31.4
Hier.Clust.          37.2     29.1
EECDCR               41.6     42.6
MSMA 1.0             44.7     40.9
MSMA 2.0 ELMo        42.1     40.1
MSMA 2.0 fastText    48.3     43.6
MSMA 2.0 word2vec    48.5     44.3
A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts,” manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020
17. Conclusion
• Bias by WCL has a strong influence on readers
• Revealing bias is a step towards mitigating it
• MSMA 1.0 & 2.0 successfully resolve biased mentions
Help social sciences with frame analyses
Help news readers become aware of bias in media
Newsalyze
news readers,
researchers
Help make the world a better place
Objectivity
Frame 2
Frame 1
https://github.com/fhamborg/newsalyze-backend
Soon to be publicly available