#WiDS2019
Diversity in Data Science Space and Models
Vidhya Chandrasekaran
Women in Data Science | 9 March 2019 | IIM Bangalore
Some Stats to Start With
• 55% of large companies had adopted AI as of Dec 2018
• 300,000 AI engineers as of Dec 2017
• $60 billion in AI revenue by 2025
• 85% of customer interactions to be handled by AI in 2020 (Gartner)
Learning from us
Manual HR
• Scroll through hundreds of profiles manually from different job sources: large ETA, human bias, errors in assessment
• Screen and assess candidates' skills against the JD: large ETA, human bias, errors in assessment
• Schedule the interview process with the recruiting manager: human bias, scheduling issues
Yahoo!!!
AI in HR
• Rates and grades profiles by relevance to the JD
• Schedules interviews for top candidates with the recruiting manager
• Always-on HR
• Shorter turnaround time for recruitment
• Reduces human bias in recruitment
• Better assessment of candidates
• Reduces human bias in promotions
• Fills the technology skills gap in HR
• Recommends appropriate training for the candidate
Yahoo!!!
Sexist Recruitment Tool?
• An e-commerce giant reportedly shut down its internal AI recruitment tool for gender bias
• The tool penalized women candidates because it was trained on data from a workforce consisting mostly of male employees
• It gave low grades to resumes containing words like “women”
• A break in a candidate's career counted toward a low score, which affects women the most
• Words like “executed” and “captured”, found mostly in male candidates' resumes, were scored higher
• Instead of eliminating human bias, an AI tool can introduce unintentional bias
Challenges in Manual Law Enforcement
• Inadequate workforce
• Human bias
• High false positives
• Very reactive approach
• Workforce deployment challenges
• Not enough technical expertise to identify criminals
Law Enforcement with the Power of Data
• Proactive prevention of crime
• Better workforce management
• Faster identification of criminals
• Optimizing security strategies
• Eliminating intentional bias
• Identifying terrorism networks from social media
AI Policing Gone Wrong
Stop-and-frisk program in New York uses AI
• 83% of those stopped were Black or Hispanic and 10% were white
• The city's population is 52% Black and Hispanic and 33% white
ProPublica report on racial bias in AI (images courtesy: ProPublica)
• Subsequent offenses: Dylan, 3 drug possessions; Bernard, none
• Subsequent offenses: Vernon, 1 grand theft; Brisha, none
Bias in Online Ads (Carnegie Mellon Research)
An experiment was conducted by Carnegie Mellon University with the AdFisher tool.
• Of the executive-job ad impressions, 17% went to women and 83% to men
• The difference in ad treatment was detected with 93% accuracy, indicating the targeting model was biased
Skin color: Failure to recognize a human
Zip code: Denying same-day delivery
Mind Your Words
Man is to doctor as woman is to nurse?
“Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems” - Svetlana Kiritchenko and Saif Mohammad
• 75% to 86% of the sentiment analysis systems scored sentences of one gender higher than the other
• They tended to associate emotion with women and fear with men
“Man is to Computer Programmer as Woman is to Homemaker” - Tolga Bolukbasi et al.
• Found the following associations in the she-he analogies (a sketch of how such analogies are queried follows the list):
• Queen-king
• Sister-brother
• Daughter-son
• Mother-father
• Convent-monastery
• Waitress-waiter
• Nurse-physician
• Sewing-carpentry
• housewife-shopkeeper
• softball-baseball
• cosmetics-pharmaceuticals
• giggle-chuckle
• interior designer-architect
• charming-affable
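These pairs come from simple vector arithmetic on pretrained word embeddings. Below is a rough sketch of how such a she-he analogy query can be run; it assumes the gensim package and one of its downloadable pretrained embeddings (a large download), and which neighbours come back depends entirely on the embedding used.

```python
# "man is to doctor as woman is to ?" -> nearest words to doctor - man + woman
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # pretrained embedding (large download)
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))
```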
The Infinite Loop – Accelerated Bias
Biased society → data from the biased society → model outputs → action taken → action recorded → back into the biased society
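A purely illustrative toy simulation of this loop (all numbers hypothetical): two districts have the same true crime rate, but the historical record is biased toward one of them; the model keeps sending the patrol where the record is highest, and only patrolled crime gets recorded, so the gap in the data keeps widening.

```python
import numpy as np

true_rate = np.array([10.0, 10.0])         # identical underlying crime rates
recorded = np.array([60.0, 40.0])          # historical record is already biased
for step in range(5):
    target = int(np.argmax(recorded))      # model output -> action taken
    recorded[target] += true_rate[target]  # only what is patrolled gets recorded
    print(step, recorded)                  # district 0 pulls further ahead
```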
Challenges in Eliminating Bias in AI
Vidhya Chandrasekaran
Women in Data Science | 9 March 2019 | IIM Bangalore
Fair AI
An AI can deliver a fair decision irrespective of
▪ Gender
▪ Age
▪ Race
▪ Educational background
▪ Economic background
▪ Physical appearance
▪ Power and authority …
What is fairness?
▪ Individual Fairness
o Similar people should be treated similarly
o The definition of similarity is the key
▪ Group Fairness
o Fair decision making across groups split by race, gender, age, etc.
o Positive and negative prediction rates should be equal across groups (see the sketch below)
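A hedged NumPy sketch of the two quantities usually compared for group fairness: the per-group positive-prediction rate (demographic parity) and the per-group false-positive rate (part of equalized odds). The function name and toy data are illustrative, not from the talk.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group positive-prediction rate and false-positive rate."""
    out = {}
    for g in np.unique(group):
        m = group == g
        out[g] = {
            "positive_rate": y_pred[m].mean(),                         # P(yhat=1 | group=g)
            "false_positive_rate": y_pred[m][y_true[m] == 0].mean(),   # P(yhat=1 | y=0, group=g)
        }
    return out

# Toy example showing the call shape
y_true = np.array([0, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_rates(y_true, y_pred, group))
```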
Why Bias
• Accountability is difficult
• No common law
• The problem is socio-technical, but data scientists think in terms of loss functions
• Debugging and eliminating bias is very complex
• It is difficult to eliminate hidden bias from proxy variables that carry the most information
• Years of biased data
Can We Solve It?
Vidhya Chandrasekaran
Women in Data Science | 9 March 2019 | IIM Bangalore
Algorithmic Accountability
• Audits and standards should catch up with technology
• Legal standards and practices should be established
• Governance of the algorithms
• Transparency and audits of code and logic
• Collaboration between social scientists and data scientists
• Compliance and governance on variable usage
• Platforms to test fairness
Training Data
• Algorithms are fair to all only if they are trained on all
IBM Research's “Diversity in Faces” release: “For face recognition to perform as desired – to be both accurate and fair – training data must provide sufficient balance and coverage”
• Up-sampling, down-sampling, and synthetic data generation (see the sketch below)
• Transfer learning where data is inadequate
• Universal, unbiased word embeddings
Inclusive training sets
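A minimal sketch of the up-sampling idea, assuming the features, labels, and group memberships are NumPy arrays (the function name and arguments are illustrative): rows from an under-represented group are resampled with replacement until that group reaches a target count.

```python
import numpy as np

def upsample_group(X, y, group, target_group, n_target, seed=0):
    """Duplicate random rows of `target_group` until it has `n_target` examples."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(group == target_group)
    extra = rng.choice(idx, size=max(0, n_target - len(idx)), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep], group[keep]
```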
Decoupled Classifier – Group Fairness
“Decoupled Classifier for Group Fairness” -
Cynthia Dwork
A single classifier can reach different accuracy levels for different sensitive groups
• Achieving both fairness and accuracy with one shared model is very difficult
• Build a separate classifier for each group (see the sketch below)
• Use one joint loss function to optimize
• The output is chosen by minimizing the joint loss
• False positive rates are kept the same across groups
Debiasing with Math
man - woman ≈ doctor - nurse: the same direction that separates "man" from "woman" in a word embedding also separates occupation words such as "doctor" and "nurse"; debiasing removes that component from gender-neutral words
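A short NumPy sketch of the "neutralize" step from Bolukbasi et al.'s hard-debiasing method: project a word vector onto the gender direction and subtract that component. The helper name and the way the gender direction is obtained are illustrative.

```python
import numpy as np

def neutralize(vec, gender_direction):
    """Remove the component of a word vector that lies along the gender direction."""
    g = gender_direction / np.linalg.norm(gender_direction)
    return vec - np.dot(vec, g) * g

# The gender direction is typically estimated from definitional pairs, e.g.:
#   gender_direction = vectors["she"] - vectors["he"]
#   vectors["doctor"] = neutralize(vectors["doctor"], gender_direction)
```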
Regularization to Treat Bias
• Penalize the model when individual variables are given too much importance (L1/L2 regularization)
• Drop out nodes randomly so the network cannot rely too heavily on any single node
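A brief PyTorch sketch combining both ideas; the layer sizes and hyperparameters are arbitrary placeholders. The weight_decay argument adds an L2 penalty on the weights, and the Dropout layer randomly zeroes units during training.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # randomly drop units so none is relied on too much
    nn.Linear(64, 1),
)
# weight_decay applies the L2 penalty that discourages over-weighting any input
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```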
Randomness to the Rescue
In “Accountable Algorithms”, Joshua A. Kroll and co-authors argue that accountability for any process must grapple with the randomness designed into it
• Many machine learning algorithms incorporate randomness
• Randomize the data used for training
• Randomize the features used by the model to avoid weighing any one feature too heavily (see the sketch below)
• Assign a different class value to the positive and negative classes predicted for a random sample
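A short scikit-learn illustration: a random forest already bakes in the first two of these ideas, since each tree trains on a bootstrap sample of the rows and considers only a random subset of features at each split, so no single feature or slice of the data can dominate the model. The hyperparameter values are placeholders.

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # random feature subset considered at each split
    bootstrap=True,        # each tree trains on a random sample of rows
    random_state=42,
)
```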
And…
Diverse Data Science Teams
Thank you.
