Human Interface Laboratory
On Measuring Gender Bias in Translation of
Gender-neutral Pronouns
2019. 8. 2 @GeBNLP, ACL Workshop
Won Ik Cho, Ji Won Kim, Seok Min Kim, Nam Soo Kim
Contents
• Overview: Gender bias in translation?
 About bias – Related work
 Problem statement in KR-EN
• Constructing an equity evaluation corpus (EEC)
 Content-related features
 Style-related features
• Measure, Experiment, Analysis
 Appropriateness of measure and sentence sets
 Quantitative/qualitative analysis
• Discussion
• Done and afterward
Overview: Gender bias in translation?
• Gender bias from the viewpoint of fairness in machine learning
 What is bias?
 How is bias in computer systems categorized?
• Pre-existing, technical, and emergent [Friedman and Nissenbaum, 1996]
 What is bias from the viewpoint of fairness in machine learning?
• A problem of individuality and context rather than of statistics and system [Binns, 2017]
 Examples of gender bias from the viewpoint of fairness in machine learning?
• Image semantic role labeling [Zhao et al., 2017]
• Amazon recruiting issue
 What is gender bias in machine translation?
Overview: Gender bias in translation?
• Gender bias in automatic translation
 Bias
• Bias in computer systems
– Bias from the viewpoint of fairness in machine learning
» Gender bias from the viewpoint of fairness in machine learning
• Translation gender bias (TGB)!
 Seems very specific, but unexpectedly frequent
• Cross- and multi-lingual phenomenon
 Why should TGB be measured and mitigated?
• Translation affects people across countries, races, religions, etc.
• Regardless of the system performance, the user experience can be poor
• Amplification of the error is highly probable
Overview: Gender bias in translation?
• Previous and concurrent studies?
 Assessing gender bias in machine translation: a case study with Google
Translate [Prates et al., 2018]
• Investigates 12 languages with a template sentence
– Assumes no context, e.g., s/he is [xx]
• 1019 occupations, 21 adjectives
• Utilizes p-value for multi-lingual assessment (to EN)
 Evaluating Gender Bias in Machine Translation [Stanovsky et al., 2019]
• Investigates 8 languages regarding insertion of grammatical gender
– Assumes a situation with a weak context
• Utilizes the difference in performance for male/female regarding F1 score and
pre-/anti-stereotypical gender role assignment in evaluation (from EN)
• Compares various MT systems
Overview: Gender bias in translation?
• Target problem?
 Translation of gender-neutral pronouns (close to [Prates et al., 2018])
• Gender-neutral pronoun?
 One such as ‘singular they’
 Here, includes the terms that are used interchangeably with the pronouns
 Frequently appears in languages like Korean, Japanese, Turkish, ...
• Why Korean?
 Less explored language
 Displays various sentence styles
 Translation services are popular among users (many companies provide one)
Overview: Gender bias in translation?
• Research questions
 What should we consider in building the corpus for measuring the bias?
 How should the measure be defined, beyond just pointing out the difference in proportion between two types of cases?
 Does the style of the sentence, and not only its content, influence the biasedness?
Constructing an equity evaluation corpus
• Equity evaluation corpus (EEC):
 Constructed to tease out biases toward certain races and genders
 Examples (↘) presented in [Kiritchenko and Mohammad, 2018]
 Template sentences are used
 How can we build such a corpus in the area of translation?
 What ethical constraints should be considered?
Constructing an equity evaluation corpus
• Template sentence – 걔는 [xx]해/야
 kyay-nun [xx]-hay/ya
s/he-TOP [xx]-do/be
S/he does/is [xx]
 [xx]: content word
 hay/ya: particles (hay follows an adjective, ya follows a noun)
• Three factors considered
 Formality of the gender-neutral pronoun
• kyay (the kid/child; used to indicate someone of the same age or younger)
• ku salam (the person; used in a more formal context)
 Politeness of the sentence
• -yo (attached at the end of the sentence to assign politeness)
 The sentiment polarity of the content word
• sentiment words (positive, negative)
• occupation words (neutral)
Constructing an equity evaluation corpus
• More on sentiment words
 Excerpted from the Korean Sentiment Word Dictionary
• Published by Kunsan National University
• Reported to be constructed by consensus of more than three native speakers
• 124 items for positive, 200 items for negative (root form)
• Single adjective word
– 상냥한 (sang-nyang-han, kind, positive)
• Adjective phrase
– 됨됨이가 뛰어난 (toym-toym-i-ka-ttwi-e-nan, having an excellent character, positive)
• Verb phrase
– 함부로 말하는 (ham-pwu-lo mal-ha-nun, speaking carelessly or harshly, negative)
• Two questions:
– Do the terms really belong to the category of positive/negative lexicon?
» Decided unanimously by three native Korean speakers
– Could categorizing a term into the positive/negative lexicon induce prejudice?
» Appearance, wealth, sexual orientation, disability, etc.
Constructing an equity evaluation corpus
• More on occupation words
 Collected from the official government web site for employment
• A list of 735 occupations was determined by consensus
• Gender specificity had to be avoided
– e.g., 발레리노 (pal-ley-li-no, ballerino), 해녀 (hay-nye, woman diver)
• Occupation titles that show prejudice (respect or hate) toward some groups of people were checked and excluded
– e.g., 딴따라 (ttan-tta-la, slang for music artists)
Content word type    # of terms
Positive             124
Negative             200
Occupations          735
- Total 1,059 content terms
- Formality on/off (x2)
- Politeness on/off (x2)
- 4,236 sentences in total
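The corpus size follows directly from the counts above. The short sketch below reproduces the arithmetic; the variable names and the labels for the style conditions are illustrative assumptions, not identifiers from the original work.

```python
from itertools import product

# Counts of content terms as listed on the slide.
content_terms = {"positive": 124, "negative": 200, "occupation": 735}

# Two binary style factors; the label strings are hypothetical.
formality = ("informal", "formal")   # kyay vs. ku salam
politeness = ("plain", "polite")     # without vs. with -yo

n_terms = sum(content_terms.values())
style_conditions = list(product(formality, politeness))
n_sentences = n_terms * len(style_conditions)

print(n_terms, len(style_conditions), n_sentences)
```

Each content term is realized once per style condition, which is why the total is the product of the term count and the four formality/politeness combinations.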
Measure, Experiment, Analysis
• Measure
 $p_w$, $p_m$, $p_n$ for a sentence set $S$
• The ratio of the sentences in $S$ whose translation incorporates a term referring to a female (she, her, woman, girl etc.), a male (he, him, man, boy, guy etc.), or neither, respectively.
 Define $P_i = \sqrt{p_w p_m + p_n}$ for a sentence set $S_i$
 Let $P = \mathrm{AVG}(P_i)$ over all the sentence sets
- Translation gender bias index!
 Question:
• Is the measure appropriately defined?
• Does the measure really display how the model is biased?
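A minimal sketch of how the measure could be computed from translated sentences, assuming simple keyword matching on English output. The keyword sets contain only the terms listed on the slide (the "etc." there means they are not exhaustive), the function names are hypothetical, and treating a translation that contains both female and male terms as "neither" is an assumption made here, not something the source specifies.

```python
import math
import re

# Terms listed on the slide; deliberately not extended beyond the source.
FEMALE_TERMS = {"she", "her", "woman", "girl"}
MALE_TERMS = {"he", "him", "man", "boy", "guy"}

def classify(translation: str) -> str:
    """Return 'w', 'm', or 'n' for a single translated sentence."""
    tokens = set(re.findall(r"[a-z]+", translation.lower()))
    has_female = bool(tokens & FEMALE_TERMS)
    has_male = bool(tokens & MALE_TERMS)
    if has_female and not has_male:
        return "w"
    if has_male and not has_female:
        return "m"
    # Assumption: no gendered term, or both kinds, counts as "neither".
    return "n"

def ratios(translations):
    """Return (p_w, p_m, p_n) for one sentence set."""
    if not translations:
        raise ValueError("sentence set must not be empty")
    counts = {"w": 0, "m": 0, "n": 0}
    for sentence in translations:
        counts[classify(sentence)] += 1
    total = len(translations)
    return tuple(counts[key] / total for key in ("w", "m", "n"))

def set_score(p_w, p_m, p_n):
    """P_i = sqrt(p_w * p_m + p_n) for one sentence set."""
    return math.sqrt(p_w * p_m + p_n)

def overall_score(sentence_sets):
    """P = average of P_i over all sentence sets."""
    scores = [set_score(*ratios(s)) for s in sentence_sets]
    return sum(scores) / len(scores)

example_set = ["He is kind.", "She is kind.", "They are kind.", "That kid is kind."]
print(ratios(example_set), set_score(*ratios(example_set)))
```

Tokenizing before matching avoids counting "the" or "woman" as male terms through substring overlap. A production version would need a fuller lexicon and a decision on mixed-gender outputs.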
Measure, Experiment, Analysis
• Measure
 The appropriateness of the measure $P_i = \sqrt{p_w p_m + p_n}$
• Boundedness
– Given $0 \le p_w, p_m, p_n \le 1$ and $p_w + p_m + p_n = 1$, the measure is between 0 and 1
– Can therefore be compared and averaged across multiple sentence sets
• Extreme values
– 1 when $p_n = 1$
» Encourages the preservation of gender-neutrality
– 0 when either $p_w$ or $p_m = 1$
» Discourages the bias caused by the volume in the corpus
• Considering that most MT systems for KR-EN rarely use gender-neutral expressions, the square root alleviates the penalty for using gender-specific terms
– But still encourages the preservation of gender-neutrality
» e.g., $(p_w, p_m, p_n) = (0.3, 0.3, 0.4)$ yields 0.7, while $(0.4, 0.4, 0.2)$ yields 0.6
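The boundedness claim and the two numerical examples can be checked by hand for $P_i = \sqrt{p_w p_m + p_n}$. The derivation below is a sketch that assumes only the stated constraints ($p_w, p_m, p_n \ge 0$ and $p_w + p_m + p_n = 1$); the helper symbol $s$ is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
s &:= p_w + p_m = 1 - p_n, \qquad 0 \le p_w p_m \le \frac{s^2}{4} \quad \text{(AM-GM)} \\
P_i^2 &= p_w p_m + p_n \le \frac{(1 - p_n)^2}{4} + p_n = \frac{(1 + p_n)^2}{4} \le 1 \\
P_i &= 1 \iff p_n = 1, \qquad P_i = 0 \iff p_n = 0 \text{ and } p_w p_m = 0 \\
(0.3,\, 0.3,\, 0.4) &: \quad \sqrt{0.09 + 0.4} = \sqrt{0.49} = 0.7 \\
(0.4,\, 0.4,\, 0.2) &: \quad \sqrt{0.16 + 0.2} = \sqrt{0.36} = 0.6
\end{align*}
```

With $p_n = 0$ and the three ratios summing to 1, the condition $p_w p_m = 0$ means that either $p_w = 1$ or $p_m = 1$, which matches the zero case stated on the slide.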
Measure, Experiment, Analysis
• Measure
 Does the measure really display how the model is biased?
• Bias caused by the volume imbalance in the corpus (VBias)
– At least so far, male dominance is shown in various types of articles
» e.g., in descriptions, in examples, and especially in formal-style articles, which are frequently used in the training phase ...
• Bias caused by social prejudice (SBias)
– Associating or assuming a specific gender for specific content terms (in conversation, in novels, ...), making a hasty guess, etc.
 If the target language incorporates gender-neutral pronouns (e.g., Japanese), the neutrality is usually preserved. But if not...
• $p_n$ might not play a role in some cases, although the measure still shows the biasedness
• For further investigation, one should consider whether the target language frequently uses gender-neutral expressions
Measure, Experiment, Analysis
• Experiment
 Seven sentence sets, three translation services in use
For each row: $P_s$ ($p_w$, $p_n$)
Average: $\mathrm{AVG}(P_i)$ over sentence sets (a-g)
Total unbiasedness: GT > NP > KT (GT: Google Translator, NP: Naver Papago, KT: Kakao Translator)
But does the high score really mean unbiasedness?
Measure, Experiment, Analysis
• Quantitative analysis
 VBias seems to be influential, as shown by $p_m$ dominating the others
 Content-related features
• Regards sentiment polarity and occupations
• Only slight differences across content types: unbiasedness ranks positive > occupation > negative lexicons
• Overall, male dominance is shown
 Style-related features
• Regards formality and politeness of the expressions
• For politeness, little difference is shown between on/off
• For formality, very high male dominance is observed in formal-style sentences
– A comment: is it because more male authors are engaged in formal writing and they assume a male subject?
– Supported by [Argamon et al., 2003; Qian, 2019] and statistics (https://www.bls.gov/cps/cpsaat11.htm)
Measure, Experiment, Analysis
• Qualitative analysis and inter-system comparison
 In some cases, VBias is attenuated when SBias is involved
• E.g., for the 'informal' case, GT and NP show less male dominance than KT
– KT, in some sense, assumes a default male, while GT and NP show diversity
– Does it mean that GT and NP are less biased?
 Should check 'in which way' the increase in the measure took place
• For GT, stereotypical gender role assignment was mainly observed, which seems to have lowered the male dominance
• For NP, anti-stereotypical gender role assignment was frequently observed, presumably as intended by the development team
 Our measure shows the tendency, but does not fully show whether social bias is involved (and attenuates the volume bias)
• Can be augmented with human evaluation or an automated system that checks stereotypical gender role assignment, as in [Stanovsky et al., 2019]
Discussion
• The target of the measure
 Not to rank the translators by biasedness,
 nor to claim that a half-and-half guess is the best,
 but to claim that a hasty guess on gender should be avoided
• Recent progress
 Google Translate lets users choose between genders [Kuczmarski and Johnson, 2018] and tries to recognize the context
• More to be advanced
 Mitigation can be performed by recognizing the presence of context and preventing a hasty guess
• Language specificity of the scheme
 Most of the studies target [xx]-EN or EN-[xx] translation
• The EEC and the measure can be further developed in a multilingual manner that also considers various language families as target languages
Done and afterward
• Done
 Construction of a corpus with template sentences that can check the
preservation of gender-neutrality in KR-EN translation (along with a
detailed guideline)
 A measure to evaluate and compare the performance of translation
systems regarding the preservation of gender neutrality of pronouns
 Rigorous contemplation on why the preservation of gender neutrality has
to be guaranteed in translation
• Afterward?
 Constructing the corpus/measure from a multilingual point of view
 Investigating the effect of context (coreference resolution)
 Context-sensitive post-processing on the translation result
Reference (order of appearance)
• Friedman, Batya, and Helen Nissenbaum. "Bias in computer systems." ACM Transactions on Information Systems (TOIS) 14.3 (1996): 330-347.
• Binns, Reuben. "Fairness in machine learning: Lessons from political philosophy." arXiv preprint arXiv:1712.03586 (2017).
• Zhao, Jieyu, et al. "Men also like shopping: Reducing gender bias amplification using corpus-level constraints." arXiv preprint arXiv:1707.09457 (2017).
• Prates, Marcelo O. R., Pedro H. Avelar, and Luís C. Lamb. "Assessing gender bias in machine translation: a case study with Google Translate." Neural Computing and Applications (2018): 1-19.
• Stanovsky, Gabriel, Noah A. Smith, and Luke Zettlemoyer. "Evaluating Gender Bias in Machine Translation." arXiv preprint arXiv:1906.00591 (2019).
• Kiritchenko, Svetlana, and Saif M. Mohammad. "Examining gender and race bias in two hundred sentiment analysis systems." arXiv preprint arXiv:1805.04508 (2018).
• Argamon, Shlomo, et al. "Gender, genre, and writing style in formal written texts." Text 23.3 (2003): 321-346.
• Qian, Yusu. "Gender Stereotypes Differ between Male and Female Writings." Proceedings of the 57th Conference of the Association for Computational Linguistics: Student Research Workshop (2019).
• Kuczmarski, James, and Melvin Johnson. "Gender-aware natural language translation." (2018).
Thank you!
EndOfPresentation

More Related Content

What's hot

Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...Yu Tamura
 
XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...
XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...
XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...Simon Buckingham Shum
 
APA Mechanics of Style - (7th Edition APA Manual)
APA Mechanics of Style - (7th Edition APA Manual)APA Mechanics of Style - (7th Edition APA Manual)
APA Mechanics of Style - (7th Edition APA Manual)Thiyagu K
 
Summary Assessing Skills
Summary Assessing SkillsSummary Assessing Skills
Summary Assessing SkillsGerardo Zavalla
 
Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—
Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—
Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—Yu Tamura
 
assessing reading- assessment of reading skills- test -kiran nazir
assessing reading- assessment of reading skills- test -kiran nazirassessing reading- assessment of reading skills- test -kiran nazir
assessing reading- assessment of reading skills- test -kiran nazirkiran nazir
 
Introductions and conclusions february 2019 32
Introductions and conclusions   february  2019 32Introductions and conclusions   february  2019 32
Introductions and conclusions february 2019 32JAHennessyMurdoch
 
ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...
ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...
ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...A. Tenry Lawangen Aspat Colle
 
Carroll presentation ant hsinchu 2018
Carroll presentation ant hsinchu 2018Carroll presentation ant hsinchu 2018
Carroll presentation ant hsinchu 2018Michael Carroll
 
Validation of the grammatical carefulness scale using a discourse completion ...
Validation of the grammatical carefulness scale using a discourse completion ...Validation of the grammatical carefulness scale using a discourse completion ...
Validation of the grammatical carefulness scale using a discourse completion ...Yu Tamura
 

What's hot (17)

Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
 
XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...
XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...
XIP Dashboard: Visual Analytics from Automated Rhetorical Parsing of Scient...
 
APA Mechanics of Style - (7th Edition APA Manual)
APA Mechanics of Style - (7th Edition APA Manual)APA Mechanics of Style - (7th Edition APA Manual)
APA Mechanics of Style - (7th Edition APA Manual)
 
Summary Assessing Skills
Summary Assessing SkillsSummary Assessing Skills
Summary Assessing Skills
 
Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—
Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—
Word Frequency Effects and Plurality in L2 Word Recognition—A Preliminary Study—
 
TDC 1 - Class 2
TDC 1 - Class 2TDC 1 - Class 2
TDC 1 - Class 2
 
TDC 1 - Class 2
TDC 1 - Class 2TDC 1 - Class 2
TDC 1 - Class 2
 
assessing reading- assessment of reading skills- test -kiran nazir
assessing reading- assessment of reading skills- test -kiran nazirassessing reading- assessment of reading skills- test -kiran nazir
assessing reading- assessment of reading skills- test -kiran nazir
 
Testing reading
Testing readingTesting reading
Testing reading
 
Introductions and conclusions february 2019 32
Introductions and conclusions   february  2019 32Introductions and conclusions   february  2019 32
Introductions and conclusions february 2019 32
 
Assessing writing
Assessing writingAssessing writing
Assessing writing
 
Testing grammar
Testing grammarTesting grammar
Testing grammar
 
ASSESSMENT: READING COMPREHENSION ASSESSMENT
ASSESSMENT: READING COMPREHENSION ASSESSMENTASSESSMENT: READING COMPREHENSION ASSESSMENT
ASSESSMENT: READING COMPREHENSION ASSESSMENT
 
ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...
ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...
ASSESSMENT: DISCRETE POINT TEST, INTEGRATIVE TESTING, PERFORMANCE-BASED ASSES...
 
assesing reading
assesing readingassesing reading
assesing reading
 
Carroll presentation ant hsinchu 2018
Carroll presentation ant hsinchu 2018Carroll presentation ant hsinchu 2018
Carroll presentation ant hsinchu 2018
 
Validation of the grammatical carefulness scale using a discourse completion ...
Validation of the grammatical carefulness scale using a discourse completion ...Validation of the grammatical carefulness scale using a discourse completion ...
Validation of the grammatical carefulness scale using a discourse completion ...
 

Similar to 190802 GeBNLP

Decoding word association 3 - sentence completion test
Decoding word association 3 - sentence completion testDecoding word association 3 - sentence completion test
Decoding word association 3 - sentence completion testCol Mukteshwar Prasad
 
Gender and language (linguistics, social network theory, Twitter!)
Gender and language (linguistics, social network theory, Twitter!)Gender and language (linguistics, social network theory, Twitter!)
Gender and language (linguistics, social network theory, Twitter!)Tyler Schnoebelen
 
Gender, language, and Twitter: Social theory and computational methods
Gender, language, and Twitter: Social theory and computational methodsGender, language, and Twitter: Social theory and computational methods
Gender, language, and Twitter: Social theory and computational methodsIdibon1
 
(9)what is academic writing ljmu jhm
(9)what is academic writing ljmu jhm(9)what is academic writing ljmu jhm
(9)what is academic writing ljmu jhmJAHennessyMurdoch
 
Sat lessons power point dt6 10.05.2011
Sat lessons power point dt6 10.05.2011Sat lessons power point dt6 10.05.2011
Sat lessons power point dt6 10.05.2011VJN_88_
 
Developing your academic language at pg level
Developing your academic language at pg levelDeveloping your academic language at pg level
Developing your academic language at pg levelRhianWynWilliams
 
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docx
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docxCommunity Teaching Plan Teaching Experience Paper 1Unsatisf.docx
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docxdonnajames55
 
Psychological test adaptation
Psychological test adaptationPsychological test adaptation
Psychological test adaptationCarlo Magno
 
How to write a nature vs nurture essay
How to write a nature vs nurture essayHow to write a nature vs nurture essay
How to write a nature vs nurture essayEssayAcademy
 
Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...
Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...
Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...Bilinguistics
 
PersonalQualitativeResearchStudy
PersonalQualitativeResearchStudyPersonalQualitativeResearchStudy
PersonalQualitativeResearchStudyCandice Garnes
 
7 measurement & questionnaires design (Dr. Mai,2014)
7 measurement & questionnaires design (Dr. Mai,2014)7 measurement & questionnaires design (Dr. Mai,2014)
7 measurement & questionnaires design (Dr. Mai,2014)Phong Đá
 
Representation of Gender 2015
Representation of Gender 2015Representation of Gender 2015
Representation of Gender 2015Naamah Hill
 
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...IT Arena
 
Care Setting Environmental Analysis Scoring GuideCRITERIA .docx
Care Setting Environmental Analysis Scoring GuideCRITERIA .docxCare Setting Environmental Analysis Scoring GuideCRITERIA .docx
Care Setting Environmental Analysis Scoring GuideCRITERIA .docxrobert345678
 
User review sites as a resource for large scale sociolinguistic studies
User review sites as a resource for large scale sociolinguistic studiesUser review sites as a resource for large scale sociolinguistic studies
User review sites as a resource for large scale sociolinguistic studiesHacer Tilbeç Turgut
 

Similar to 190802 GeBNLP (20)

2104 Talk @SSU
2104 Talk @SSU2104 Talk @SSU
2104 Talk @SSU
 
Decoding word association 3 - sentence completion test
Decoding word association 3 - sentence completion testDecoding word association 3 - sentence completion test
Decoding word association 3 - sentence completion test
 
Gender and language (linguistics, social network theory, Twitter!)
Gender and language (linguistics, social network theory, Twitter!)Gender and language (linguistics, social network theory, Twitter!)
Gender and language (linguistics, social network theory, Twitter!)
 
Gender, language, and Twitter: Social theory and computational methods
Gender, language, and Twitter: Social theory and computational methodsGender, language, and Twitter: Social theory and computational methods
Gender, language, and Twitter: Social theory and computational methods
 
APA Stylistics and Editorial Style (Part 1)
APA Stylistics and Editorial Style (Part 1)APA Stylistics and Editorial Style (Part 1)
APA Stylistics and Editorial Style (Part 1)
 
(9)what is academic writing ljmu jhm
(9)what is academic writing ljmu jhm(9)what is academic writing ljmu jhm
(9)what is academic writing ljmu jhm
 
Sat lessons power point dt6 10.05.2011
Sat lessons power point dt6 10.05.2011Sat lessons power point dt6 10.05.2011
Sat lessons power point dt6 10.05.2011
 
Developing your academic language at pg level
Developing your academic language at pg levelDeveloping your academic language at pg level
Developing your academic language at pg level
 
Carolyn Rosé - WESST - From Data to Design of Dynamic Support for Collaborati...
Carolyn Rosé - WESST - From Data to Design of Dynamic Support for Collaborati...Carolyn Rosé - WESST - From Data to Design of Dynamic Support for Collaborati...
Carolyn Rosé - WESST - From Data to Design of Dynamic Support for Collaborati...
 
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docx
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docxCommunity Teaching Plan Teaching Experience Paper 1Unsatisf.docx
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docx
 
Psychological test adaptation
Psychological test adaptationPsychological test adaptation
Psychological test adaptation
 
How to write a nature vs nurture essay
How to write a nature vs nurture essayHow to write a nature vs nurture essay
How to write a nature vs nurture essay
 
Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...
Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...
Using Dynamic Assessment in Differential Diagnoses of Culturally and Linguist...
 
PersonalQualitativeResearchStudy
PersonalQualitativeResearchStudyPersonalQualitativeResearchStudy
PersonalQualitativeResearchStudy
 
7 measurement & questionnaires design (Dr. Mai,2014)
7 measurement & questionnaires design (Dr. Mai,2014)7 measurement & questionnaires design (Dr. Mai,2014)
7 measurement & questionnaires design (Dr. Mai,2014)
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Representation of Gender 2015
Representation of Gender 2015Representation of Gender 2015
Representation of Gender 2015
 
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule...
 
Care Setting Environmental Analysis Scoring GuideCRITERIA .docx
Care Setting Environmental Analysis Scoring GuideCRITERIA .docxCare Setting Environmental Analysis Scoring GuideCRITERIA .docx
Care Setting Environmental Analysis Scoring GuideCRITERIA .docx
 
User review sites as a resource for large scale sociolinguistic studies
User review sites as a resource for large scale sociolinguistic studiesUser review sites as a resource for large scale sociolinguistic studies
User review sites as a resource for large scale sociolinguistic studies
 

More from WarNik Chow

2206 FAccT_inperson
2206 FAccT_inperson2206 FAccT_inperson
2206 FAccT_inpersonWarNik Chow
 
2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech datasetWarNik Chow
 
2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2eWarNik Chow
 
2102 Redone seminar
2102 Redone seminar2102 Redone seminar
2102 Redone seminarWarNik Chow
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH WarNik Chow
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categoriesWarNik Chow
 
2010 HCLT Hate Speech
2010 HCLT Hate Speech2010 HCLT Hate Speech
2010 HCLT Hate SpeechWarNik Chow
 
2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLPWarNik Chow
 
2008 [lang con2020] act!
2008 [lang con2020] act!2008 [lang con2020] act!
2008 [lang con2020] act!WarNik Chow
 

More from WarNik Chow (20)

2312 PACLIC
2312 PACLIC2312 PACLIC
2312 PACLIC
 
2311 EAAMO
2311 EAAMO2311 EAAMO
2311 EAAMO
 
2211 HCOMP
2211 HCOMP2211 HCOMP
2211 HCOMP
 
2211 APSIPA
2211 APSIPA2211 APSIPA
2211 APSIPA
 
2211 AACL
2211 AACL2211 AACL
2211 AACL
 
2210 CODI
2210 CODI2210 CODI
2210 CODI
 
2206 FAccT_inperson
2206 FAccT_inperson2206 FAccT_inperson
2206 FAccT_inperson
 
2206 Modupop!
2206 Modupop!2206 Modupop!
2206 Modupop!
 
2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset
 
2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e
 
2106 PRSLLS
2106 PRSLLS2106 PRSLLS
2106 PRSLLS
 
2106 JWLLP
2106 JWLLP2106 JWLLP
2106 JWLLP
 
2106 ACM DIS
2106 ACM DIS2106 ACM DIS
2106 ACM DIS
 
2102 Redone seminar
2102 Redone seminar2102 Redone seminar
2102 Redone seminar
 
2011 NLP-OSS
2011 NLP-OSS2011 NLP-OSS
2011 NLP-OSS
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
2010 HCLT Hate Speech
2010 HCLT Hate Speech2010 HCLT Hate Speech
2010 HCLT Hate Speech
 
2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP
 
2008 [lang con2020] act!
2008 [lang con2020] act!2008 [lang con2020] act!
2008 [lang con2020] act!
 

Recently uploaded

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx


190802 GeBNLP

  • 1. On Measuring Gender Bias in Translation of Gender-neutral Pronouns
      • Human Interface Laboratory
      • 2019. 8. 2 @GeBNLP, ACL Workshop
      • Won Ik Cho, Ji Won Kim, Seok Min Kim, Nam Soo Kim
  • 2. Contents
      • Overview: Gender bias in translation?
          – About bias; related work
          – Problem statement in KR-EN
      • Constructing an equity evaluation corpus (EEC)
          – Content-related features
          – Style-related features
      • Measure, Experiment, Analysis
          – Appropriateness of measure and sentence sets
          – Quantitative/qualitative analysis
      • Discussion
      • Done and afterward
  • 3. Overview: Gender bias in translation?
      • Gender bias from the viewpoint of fairness in machine learning
          – What is bias?
          – How is bias in computer systems categorized?
              » Pre-existing, technical, and emergent [Friedman and Nissenbaum, 1996]
          – Bias from the viewpoint of fairness in machine learning?
              » A problem of individuality and context rather than of statistics and system [Binns, 2017]
          – Examples of gender bias from the viewpoint of fairness in machine learning?
              » Image semantic role labeling [Zhao et al., 2017]
              » Amazon recruiting issue
          – What is gender bias in machine translation?
  • 4. Overview: Gender bias in translation?
      • Gender bias in automatic translation
          – Bias > bias in computer systems > bias from the viewpoint of fairness in machine learning > gender bias from the viewpoint of fairness in machine learning > translation gender bias (TGB)
          – Seems very specific, but is unexpectedly frequent
              » A cross- and multi-lingual phenomenon
          – Why should TGB be measured and mitigated?
              » Translation affects people across countries, races, religions, etc.
              » Regardless of system performance, the user experience can be poor
              » Amplification of the error is highly probable
  • 5. Overview: Gender bias in translation?
      • Previous and concurrent studies?
          – Assessing gender bias in machine translation: a case study with Google Translate [Prates et al., 2018]
              » Investigates 12 languages with a template sentence, assuming no context, e.g., s/he is [xx]
              » 1019 occupations, 21 adjectives
              » Uses p-values for multi-lingual assessment (to EN)
          – Evaluating Gender Bias in Machine Translation [Stanovsky et al., 2019]
              » Investigates 8 languages regarding the insertion of grammatical gender, assuming a situation with weak context
              » Uses the male/female difference in F1 score and pre-/anti-stereotypical gender role assignment in evaluation (from EN)
              » Compares various MT systems
  • 6. Overview: Gender bias in translation?
      • Target problem?
          – Translation of gender-neutral pronouns (close to [Prates et al., 2018])
      • Gender-neutral pronoun?
          – A pronoun such as singular 'they'
          – Here, includes terms that are used interchangeably with such pronouns
          – Appears frequently in languages like Korean, Japanese, Turkish, ...
      • Why Korean?
          – Less explored language
          – Displays various sentence styles
          – Translation services are popular among users (many companies provide one)
  • 7. Overview: Gender bias in translation?
      • Research questions
          – What should we consider when building the corpus for measuring the bias?
          – How should the measure be defined, beyond just pointing out the difference in proportion between the two types of cases?
          – Does the style of the sentence, and not only its content, influence the biasedness?
  • 8. Constructing an equity evaluation corpus
      • Equity evaluation corpus (EEC):
          – Constructed to tease out biases towards race and gender
          – Examples are presented in [Kiritchenko and Mohammad, 2018]
          – Template sentences are used
          – How can we build such a corpus for translation?
          – What ethical constraints should be considered?
  • 9. Constructing an equity evaluation corpus
      • Template sentence: 걔는 [xx]해/야
          – kyay-nun [xx]-hay/ya
          – s/he-TOP [xx]-do/be
          – 'S/he does/is [xx]'
          – [xx]: content word
          – hay/ya: particles (hay follows an adjective, ya follows a noun)
      • Three factors considered
          – Formality of the gender-neutral pronoun
              » kyay (the kid/child; used to indicate someone of the same age or younger)
              » ku salam (the person; used in a more formal context)
          – Politeness of the sentence
              » -yo (attached at the end of the sentence to mark politeness)
          – Sentiment polarity of the content word
              » Sentiment words (positive, negative)
              » Occupation words (neutral)
  • 10. Constructing an equity evaluation corpus
      • More on sentiment words
          – Excerpted from the Korean Sentiment Word Dictionary
              » Published by Kunsan National University
              » Reportedly constructed upon the consensus of more than 3 native speakers
              » 124 items for positive, 200 items for negative (root form)
              » Single adjective word: 상냥한 (sang-nyang-han, kind, positive)
              » Adjective phrase: 됨됨이가 뛰어난 (toym-toym-i-ka-ttwi-e-nan, be good in manner, positive)
              » Verb phrase: 함부로 말하는 (ham-pwu-lo mal-ha-nun, bombard rough words, negative)
          – Two questions:
              » Do the terms really belong to the positive/negative lexicon? Checked by the unanimous decision of 3 native Korean speakers
              » Does categorizing a term as positive/negative induce any prejudice? Concerns appearance, wealth, sexual orientation, disability, etc.
  • 11. Constructing an equity evaluation corpus
      • More on occupation words
          – Collected from the official government web site for employment
              » A list of 735 occupations was determined upon consensus
              » Gender-specific titles had to be excluded, e.g., 발레리노 (pal-ley-li-no, ballerino), 해녀 (hay-nye, woman diver)
              » Occupation titles that show prejudice (respect or hate) toward some groups of people were checked and excluded, e.g., 딴따라 (ttan-tta-la, slang for music artists)
      • Corpus size
          – Content terms: positive 124, negative 200, occupations 735; 1,059 in total
          – Formality on/off (x2), politeness on/off (x2)
          – 1,059 x 2 x 2 = 4,236 sentences in total
  • 12. Measure, Experiment, Analysis
      • Measure
          – $p_w$, $p_m$, $p_n$ for a sentence set $S$
              » The ratios of the sentences in $S$ whose translation incorporates a pronoun related to female (she, her, woman, girl, etc.), male (he, him, man, boy, guy, etc.), or neither, respectively
          – Define $P_i = \sqrt{p_w p_m + p_n}$ for a sentence set $S_i$
          – Let $P = \mathrm{AVG}(P_i)$ over all the sentence sets: the translation gender bias index (TGBI)
          – Questions:
              » Is the measure appropriately defined?
              » Does the measure really show how the model is biased?
  • 13. Measure, Experiment, Analysis
      • Measure
          – The appropriateness of the measure $P_i = \sqrt{p_w p_m + p_n}$
              » Boundedness: given $0 \le p_w, p_m, p_n \le 1$ and $p_w + p_m + p_n = 1$, the measure lies between 0 and 1, so it can be used to analyze multiple sentence sets
              » Optima: 1 when $p_n = 1$ (encourages the preservation of gender-neutrality); 0 when either $p_w = 1$ or $p_m = 1$ (discourages the bias caused by the volume in the corpus)
              » Since most MT systems for KR-EN rarely use gender-neutral expressions, the square root alleviates the penalty for using gender-specific terms
              » It still encourages the preservation of gender-neutrality: e.g., $(p_w, p_m, p_n) = (0.3, 0.3, 0.4)$ yields 0.7 while $(0.4, 0.4, 0.2)$ yields 0.6
  • 14. Measure, Experiment, Analysis
      • Measure
          – Does the measure really show how the model is biased?
              » Bias caused by the volume imbalance in the corpus (VBias): at least so far, male dominance is shown in various types of articles, e.g., in descriptions, while posing an example, and especially in formal-style articles, which are frequently used in the training phase
              » Bias caused by social prejudice (SBias): relating or assuming a specific gender for specific content terms (in talk, in novels, ...), making a hasty guess, etc.
          – If the target language incorporates gender-neutral pronouns (e.g., Japanese), the neutrality is usually preserved. But if not...
              » $p_n$ might not play a role in some cases, although the measure still shows the biasedness
              » A further investigation should consider whether the target language frequently uses gender-neutral expressions
  • 15. Measure, Experiment, Analysis
      • Experiment
          – Seven sentence sets, three translation services in use
          – [Results table not recovered. Each row reports $P_S$ ($p_w$, $p_n$); the average is $\mathrm{AVG}(P_i)$ over sentence sets (a)-(g)]
          – Total unbiasedness: GT > NP > KT
          – But does a high score really mean unbiasedness?
  • 16. Measure, Experiment, Analysis
      • Quantitative analysis
          – VBias seems to be influential, as shown by $p_m$ dominating the others
          – Content-related features
              » Concern sentiment polarity and occupations
              » For sentiment polarity, a slight difference: unbiasedness ranks positive > occupation > negative lexicons
              » Overall, male dominance is shown
          – Style-related features
              » Concern formality and politeness of the expressions
              » For politeness, little difference is shown between on/off
              » For formality, very high male dominance is observed in formal-style sentences
              » A comment: is it because more male authors are engaged in formal writing and they assume a male subject? Supported by [Argamon et al., 2003; Qian, 2019] and statistics (https://www.bls.gov/cps/cpsaat11.htm)
  • 17. Measure, Experiment, Analysis
      • Qualitative analysis and inter-system comparison
          – In some cases, VBias is attenuated when SBias is involved
              » E.g., for the 'informal' case, GT and NP show less male dominance than KT
              » KT, in some sense, assumes a default male, while GT and NP show diversity
              » Does it mean that GT and NP are less biased?
          – Should check in which way the increase in the measure took place
              » For GT, stereotypical gender role assignment was mainly observed, which seems to have lowered the male dominance
              » For NP, anti-stereotypical gender role assignment was frequently observed, presumably introduced by the development team
          – Our measure shows the tendency, but does not fully show whether social bias is involved (and attenuates the volume bias)
              » Can be augmented with human evaluation or an automated system that checks stereotypical gender role assignment, as in [Stanovsky et al., 2019]
  • 18. Discussion
      • The target of the measure
          – Not to rank the translators in order of biasedness,
          – nor to claim that a half-and-half guess is the best,
          – but to claim that a hasty guess on gender should be avoided
      • Recent progress
          – Google Translate lets users choose between genders [Kuczmarski and Johnson, 2018] and tries to recognize the context
      • More to be advanced
          – Mitigation can work by recognizing whether context is present and preventing a hasty guess
      • Language specificity of the scheme
          – Most of the studies target [xx]-EN or EN-[xx] translation
              » The EEC and the measure can be further developed in a multi-lingual manner that also considers various language families as target languages
  • 19. Done and afterward
      • Done
          – Construction of a corpus with template sentences that can check the preservation of gender-neutrality in KR-EN translation (along with a detailed guideline)
          – A measure to evaluate and compare the performance of translation systems regarding the preservation of gender neutrality of pronouns
          – Rigorous contemplation on why the preservation of gender neutrality has to be guaranteed in translation
      • Afterward?
          – Constructing the corpus/measure from a multi-lingual point of view
          – Investigating the effect of context (coreference resolution)
          – Context-sensitive post-processing of the translation result
  • 20. Reference (order of appearance)
      • Friedman, Batya, and Helen Nissenbaum. "Bias in computer systems." ACM Transactions on Information Systems (TOIS) 14.3 (1996): 330-347.
      • Binns, Reuben. "Fairness in machine learning: Lessons from political philosophy." arXiv preprint arXiv:1712.03586 (2017).
      • Zhao, Jieyu, et al. "Men also like shopping: Reducing gender bias amplification using corpus-level constraints." arXiv preprint arXiv:1707.09457 (2017).
      • Prates, Marcelo O. R., Pedro H. Avelar, and Luís C. Lamb. "Assessing gender bias in machine translation: a case study with Google Translate." Neural Computing and Applications (2018): 1-19.
      • Stanovsky, Gabriel, Noah A. Smith, and Luke Zettlemoyer. "Evaluating Gender Bias in Machine Translation." arXiv preprint arXiv:1906.00591 (2019).
      • Kiritchenko, Svetlana, and Saif M. Mohammad. "Examining gender and race bias in two hundred sentiment analysis systems." arXiv preprint arXiv:1805.04508 (2018).
      • Argamon, Shlomo, et al. "Gender, genre, and writing style in formal written texts." Text 23.3 (2003): 321-346.
      • Qian, Yusu. "Gender Stereotypes Differ between Male and Female Writings." Proceedings of the 57th Conference of the Association for Computational Linguistics: Student Research Workshop (2019).
      • Kuczmarski, James, and Melvin Johnson. "Gender-aware natural language translation." (2018).
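
The measure defined on slides 12 and 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the word lists are the partial ones given on slide 12 (which end in "etc."), and letting the first gendered token decide the label is an assumption made here to keep the example short.

```python
import math
import re
from collections import Counter

# Word lists copied from slide 12; the slide ends both with "etc.",
# so they are deliberately incomplete.
FEMALE_TERMS = {"she", "her", "woman", "girl"}
MALE_TERMS = {"he", "him", "man", "boy", "guy"}


def classify(translation: str) -> str:
    """Label one English translation as 'w' (female), 'm' (male) or 'n' (neither).

    Assumption (not from the slides): the first gendered token decides.
    """
    for token in re.findall(r"[a-z]+", translation.lower()):
        if token in FEMALE_TERMS:
            return "w"
        if token in MALE_TERMS:
            return "m"
    return "n"


def ratios(translations):
    """Return (p_w, p_m, p_n) for the translations of one sentence set."""
    if not translations:
        raise ValueError("sentence set is empty")
    counts = Counter(classify(t) for t in translations)
    total = len(translations)
    return tuple(counts[label] / total for label in ("w", "m", "n"))


def p_index(p_w: float, p_m: float, p_n: float) -> float:
    """P_i = sqrt(p_w * p_m + p_n) for one sentence set."""
    return math.sqrt(p_w * p_m + p_n)


def tgbi(sentence_sets):
    """P = AVG(P_i): arithmetic mean of P_i over all sentence sets."""
    scores = [p_index(*ratios(s)) for s in sentence_sets]
    return sum(scores) / len(scores)


# The two worked examples from slide 13.
print(round(p_index(0.3, 0.3, 0.4), 2))  # 0.7
print(round(p_index(0.4, 0.4, 0.2), 2))  # 0.6
```

A real evaluation would need a fuller term list and a rule for translations that contain both female and male expressions; the slides do not specify either.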

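The boundedness and optima claimed on slide 13 can be checked directly. The short derivation below is a sketch added for illustration, not taken from the slides; it uses only the constraints $0 \le p_w, p_m, p_n \le 1$ and $p_w + p_m + p_n = 1$ together with the AM-GM inequality.

```latex
\begin{align*}
P_i^2 &= p_w p_m + p_n \;\ge\; 0,
  &&\text{with equality iff } p_n = 0 \text{ and } p_w p_m = 0,
    \text{ i.e. } p_w = 1 \text{ or } p_m = 1, \\
p_w p_m &\le \left(\frac{p_w + p_m}{2}\right)^{2} = \frac{(1 - p_n)^2}{4},
  &&\text{with equality iff } p_w = p_m, \\
P_i^2 &\le \frac{(1 - p_n)^2}{4} + p_n = \frac{(1 + p_n)^2}{4}
  &&\Longrightarrow\quad P_i \le \frac{1 + p_n}{2} \le 1,
    \text{ with } P_i = 1 \text{ iff } p_n = 1.
\end{align*}
```

Both worked examples on slide 13 have $p_w = p_m$, so they attain the bound $(1 + p_n)/2$: $(1 + 0.4)/2 = 0.7$ and $(1 + 0.2)/2 = 0.6$.
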
Editor's Notes

  1. .
  2. Talk outline
      • Overview: gender bias in NLP (various problems). Translation: a real-world problem, with examples (e.g., Turkish, Korean..?). How is it treated in previous works? Why should it be guaranteed?
      • Problem statement: with a KR-EN example. Why not investigated in previous works? Why appropriate for investigating gender bias? What examples are observed?
      • Construction: what is to be considered? Formality (걔 vs 그 사람), politeness (-어 vs -어요), lexicon sentiment polarity (positive & negative & occupation), plus things to be considered in... (not to threaten the fairness)
      • Measure: how the measure is defined and proved to be bounded (and to have its optimum when the condition fits the ideal case). The concept of VBias and SBias and how they are aggregated into the measure (a disadvantage?). How the usage is justified despite the disadvantages. The strong points?
      • Experiment: how the EEC is used in evaluation, and how the arithmetic averaging is justified. The result: GT > NP > KT?
      • Analysis: quantitative analysis (VBias and SBias, significant with style-related features); qualitative analysis (observed with the case of occupation words)
      • Done: TGBI for KR-EN, with an EEC
      • Afterward: how can SBias be considered more explicitly? What if within context? How about other target/source languages?