Algorithmic Bias and Fairness
Automated Decision Making: Pros
 Handles large volumes of data (Google search, airline
reservations, online markets, ..)
 Avoids certain kinds of bias
 Parole judges being more lenient after a meal
 Making hiring decisions based on the name of the person
 Subjectivity in evaluations of papers, music, teaching, etc.
 Human judgment in NYC's stop-and-frisk policy
 4.4 M people were stopped between 2004 and 2012
 88% of the stops led to no further action
 83% of the people stopped were Black or Hispanic, though they make up only about half of the city's population.
Complex and Opaque Decisions
 Hard to understand and make sense of
 Values, biases and potential discrimination built in
 The code is opaque and often a trade secret
 Facebook’s newsfeed algorithm, recidivism algorithms, genetic testing
Gatekeeping Function
 Decide what gets attention, what is
published, and what is censored
 Google's search results for geopolitical queries might depend on location, e.g., different maps of Pakistan or India.
 Learning algorithms that make hiring
decisions.
 Pattern: Low commute time correlates with low turnover
 Policy: Don't hire from far-off places with bad public transportation
 Impact: People from poor, far-off neighborhoods may not be hired
Subjective Decision Making
 Algorithms to understand and translate language, drive cars, pilot
planes, and diagnose diseases.
 Often there is no single right answer; such decisions involve judgment and values.
 Detecting and removing terrorist content on the social networks.
 The definitions of key terms such as 'terrorist' and 'extreme content' are controversial
 The scale makes manual intervention difficult
 Algorithmic decisions may not be as good as people's
Machine Learning
 Programs might be using protected attributes such as race and
gender to make predictions
 Even if the protected attributes are not used, they could be using
other “proxy” attributes which will have the same effect, e.g., zip
code.
 Recommendations based on earlier actions might create bubbles, e.g., detecting trends on Twitter.
 Example: Predictive policing
 Predicting the neighborhoods most likely to be involved in
future crime based on crime statistics
 Rational but may be indistinguishable from racial profiling
 More police in a neighborhood lead to more arrests there.
 This can create a positive feedback loop and become a self-fulfilling prophecy (a toy simulation sketch follows below).
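The feedback loop described above can be illustrated with a toy simulation. This is a minimal sketch with entirely made-up numbers: two neighborhoods have identical underlying crime rates, but patrols are reallocated each year toward wherever more arrests were recorded, so the initial imbalance never corrects and instead grows.

```python
# Toy sketch (all numbers hypothetical): two neighborhoods with identical true crime
# rates, but patrols are reallocated each year toward wherever more arrests were
# recorded. Arrests can only be recorded where patrols are, so the initial imbalance
# never corrects itself and instead grows.
true_crime_rate = 0.10            # identical underlying rate in both neighborhoods
population = 10_000               # people per neighborhood
patrols = {"A": 55.0, "B": 45.0}  # slightly uneven starting allocation (sums to 100)

for year in range(8):
    # arrests observed = crimes committed * chance a patrol is there to record them
    arrests = {n: true_crime_rate * population * (patrols[n] / 100) for n in patrols}
    print(f"year {year}: patrols A/B = {patrols['A']:.0f}/{patrols['B']:.0f}, "
          f"arrests A/B = {arrests['A']:.0f}/{arrests['B']:.0f}")
    # "data-driven" reallocation: move 10% of the quieter area's patrols to the hot spot
    hot, cold = ("A", "B") if arrests["A"] >= arrests["B"] else ("B", "A")
    shift = 0.10 * patrols[cold]
    patrols[hot] += shift
    patrols[cold] -= shift
```

Because arrests are only recorded where patrols are present, the data keep "confirming" the allocation that produced them, even though the two neighborhoods are identical by construction.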
Data Privacy
 Who owns your browser data?
 Can your insurance company get access to your grocery list or
peek into your fridge?
 Can hospitals get access to consumer data to predict who is
going to get sick?
 Can your employer access your grades?
Transparency and Notification
 If the algorithm is opaque, there is little basis for understanding or trusting the program, e.g., in medical or hiring decisions
 Google's search algorithm was judged not demonstrably anti-competitive in the US
 The European Commission has successfully pursued an antitrust investigation
 Many points of trust: the algorithm, its input, the training data, the control surfaces, the assumptions and models the algorithm uses, etc.
 Complete transparency makes the system vulnerable to gaming and hacking, and does not by itself guarantee scrutiny
 Consumers might demand the right to be notified when their information is used, or to have their personal information excluded
Algorithmic Accountability
 How search engines censor violent/sexual search terms
 What influences Facebook’s newsfeed program or Google’s
advertisements
 Need causal explanations that link our digital experiences to the data they are based upon
Government Regulation
 The destabilizing effect of high-speed trading systems led to demands for transparency of these algorithms and for the ability to modify them
 Should search algorithms be forced to follow some "search neutrality" rules?
 This would require public officials to have access to the program and to be able to modify it in the public interest
 There is no single right answer to the queries Google handles, which makes such regulation difficult
Case Study: Recidivism Assessment
 COMPAS is a program that assesses the recidivism risk of prisoners – their propensity to commit a crime within 3 years of release.
 ProPublica analyzed data on 10,000 prisoners in a Florida county
 A confusion table of the form shown below is built for Blacks and for Whites, and the threshold θ is chosen for each group separately.
 False Positive Rate: FPR = FP / (FP + TN)
 Positive Predictive Value: PPV = TP / (FP + TP)
 ProPublica: FPR(Blacks) ≈ 2 × FPR(Whites)
 Northpointe: PPV(Blacks) = PPV(Whites)
(A small computational sketch of these metrics follows the table below.)
Recidivism | Score ≤ θ | Score > θ
False      | TN        | FP
True       | FN        | TP
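These two metrics are straightforward to compute from a confusion table. The sketch below uses hypothetical counts (not the actual COMPAS/ProPublica numbers), chosen only to mirror the qualitative finding on this slide: two groups can have equal PPV while their FPRs differ sharply.

```python
# Minimal sketch of the metrics above, computed from a confusion table.
# The counts are hypothetical illustrations, not the actual COMPAS/ProPublica data.
def rates(tp, fp, tn, fn):
    fpr = fp / (fp + tn)   # False Positive Rate: non-recidivists wrongly scored high risk
    ppv = tp / (tp + fp)   # Positive Predictive Value: high-risk scores that do reoffend
    return fpr, ppv

# one hypothetical table per group (threshold θ chosen separately for each)
groups = {
    "Group 1": dict(tp=300, fp=200, tn=300, fn=200),
    "Group 2": dict(tp=150, fp=100, tn=600, fn=150),
}
for name, counts in groups.items():
    fpr, ppv = rates(**counts)
    print(f"{name}: FPR = {fpr:.2f}, PPV = {ppv:.2f}")
# Both groups have PPV = 0.60, yet their FPRs are 0.40 and 0.14.
```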
Conflicting Demands on Fairness
[Figure: bars for White and Black groups with Prediction = Positive, split by Recidivism = True; red = false positives (FP), blue = true positives (TP)]
 Assumptions:
 Prevalence (the rate of recidivism) is higher for one group (say Blacks)
 Positive Predictive Value PPV = TP / (FP + TP) is the same for both groups
 False Positive Rate FPR = FP / (FP + TN) is higher for Blacks
Fairness of Recidivism Scores
Recidivism | Low Score | High Score
False      | TN        | FP
True       | FN        | TP
 False Positive Rate FPR = FP / (FP + TN)
 False Negative Rate FNR = FN / (FN + TP)
 Prevalence p = (FN + TP) / (FN + FP + TN + TP)
 Identity: FPR = [p / (1 − p)] × [(1 − PPV) / PPV] × (1 − FNR)
 Conclusion: If the prevalence p is different for two classes and
PPVs are the same then FNR or FPR or both must be different.
 The differences in FPR and FNR lead to disparate impacts – harsher outcomes for Blacks than for Whites in both recidivism groups. (A short derivation and numerical check follow below.)
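The identity follows from writing FP = FPR·(1 − p)·N and TP = (1 − FNR)·p·N for a group of size N and substituting into PPV = TP / (TP + FP). A quick numerical check, reusing the hypothetical numbers from the earlier sketch: holding PPV equal while prevalence differs forces the false positive rates apart.

```python
# Numerical check (hypothetical numbers) of the identity
#     FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
# With PPV held equal across two groups whose prevalence p differs,
# the identity forces the FPRs (and/or FNRs) to differ.
def fpr_from(p, ppv, fnr):
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

ppv = 0.60                                      # equal PPV for both groups
groups = {"Group 1": dict(p=0.5, fnr=0.4),
          "Group 2": dict(p=0.3, fnr=0.5)}
for name, g in groups.items():
    print(f"{name}: p = {g['p']:.2f}, FNR = {g['fnr']:.2f}, "
          f"FPR = {fpr_from(g['p'], ppv, g['fnr']):.2f}")
# Group 1 ends up with FPR = 0.40 and Group 2 with FPR = 0.14:
# equal PPV and unequal prevalence cannot coexist with equal error rates.
```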
Summary
 It is mathematically impossible to achieve both equal PPV and
equal FPR across different groups.
 The differences in FPR and FNR persist in subgroups of
defendants.
 However, evidence suggests that data-driven risk assessment
tools (in medicine) are more accurate than human judgment.
 Human-driven decisions are themselves prone to exhibiting racial bias, e.g., parole, sentencing, stop-and-frisk, arrests, etc.
Case Study: Online Market Places
 How do we ensure that the sellers are honest about the quality of their
goods?
 Study: In the early 2000s, eBay merchants misrepresented the quality of their sports trading cards
 Problem largely solved by the feedback and reputation systems
 New development: demand for more information
 Study (2012): Subjects rated the trustworthiness of potential borrowers from their photographs.
 People who looked trustworthy were more likely to get loans
 They were also more likely to repay their loans.
 More information leads to more freedom
 People can now choose whom to do business with based on looks
 A growing body of evidence suggests this leads to discrimination
Discrimination in Online Markets
 Airbnb study: 20 guest profiles were sent to 6,400 hosts
 The profiles were identical except that 10 had names common among White people and 10 had names common among Black people
 Result: Requests from guests with black-sounding names were 16% less successful
 Discrimination was pervasive: most of the hosts who rejected these requests had never hosted a Black guest.
 Other areas of discrimination: credit, labor markets, housing.
 Discrimination also occurs in algorithmic decisions.
 Searches for black-sounding names on Google were more likely to bring up ads about arrest records.
 Why?
 Learning from the past search data.
Principles and Recommendations
 Don't ignore potential discrimination
 Collect good data including race and gender stats
 Do regular reports and occasional audits
 Public disclosure of discrimination-related data
 Keep an experimental mindset to evaluate different design options
 Airbnb withholding host pictures from its ads
Design Decisions
 Control the information, its timing and salience
 When can you see the picture of your Uber driver?
 Increase automation and charge for control
 Make instant book the default on Airbnb and charge a fee if the host wants to approve the guest first
 Prioritize discrimination issues
 Remind the host about anti-discrimination policies at the time of
the transaction
 Make algorithms discrimination-aware
 Set explicit objectives, e.g., that Black and White customers should be rejected at the same rate (a small parity-check sketch follows below)
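A minimal sketch of what "rejected at the same rate" could look like as a measurable objective; the data, group labels, and field names are hypothetical. In practice the same gap could be monitored in audits or added as a constraint or penalty when tuning a model.

```python
# Minimal sketch (hypothetical data and field names) of measuring the objective
# "Black and White customers should be rejected at the same rate".
from collections import defaultdict

# each record: (group of the customer, whether their request was rejected)
decisions = [
    ("White", False), ("White", True),  ("White", False), ("White", False),
    ("Black", True),  ("Black", False), ("Black", True),  ("Black", False),
]

totals, rejections = defaultdict(int), defaultdict(int)
for group, rejected in decisions:
    totals[group] += 1
    rejections[group] += rejected          # True counts as 1, False as 0

rates = {g: rejections[g] / totals[g] for g in totals}
gap = max(rates.values()) - min(rates.values())
print("rejection rates:", rates)
print(f"parity gap: {gap:.2f}")            # the stated objective is to drive this toward 0
```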
Virtual Screens
 In the mid-1960s, fewer than 10% of the musicians in the big five orchestras were women
 Orchestras moved from face-to-face to behind-the-screen auditions
 The success rate of female musicians increased by 160%
 The online market allows virtual screens between buyers and sellers,
between employers and employees.
Case Study: Gerrymandering
 Background
 In the US, states are divided into
congressional districts every 10 years
 Each state is divided into precincts of
equal population
 The precincts are clustered into
congressional districts
 Whoever wins the majority of precincts
in the district wins that district
 Gerrymandering (named after Elbridge Gerry) refers to manipulating district boundaries to influence the outcome of an election.
 Packing: Concentrate most of the opposing side's voters into a small number of districts
 Cracking: Split the opposing side's voters across several districts where they are a minority
[Figure: the original 1812 political cartoon of the gerrymandered map of Essex County, Massachusetts]
Impact of gerrymandering
 Racial gerrymandering that intentionally reduces minority
representation was ruled illegal in 1960.
 In 1980, the Voting Rights Act was amended to require states to redraw maps that had a racially discriminatory impact.
 Partisan gerrymandering has not been ruled illegal
 When Republicans drew the maps (17 states), they won about 53 percent of the vote and 72 percent of the seats.
 When Democrats drew the maps (6 states), they won about 56 percent of the vote and 71 percent of the seats.
 Proportional representation: each party wins roughly the same percentage of seats as the percentage of votes it receives
 Wasted votes: votes cast for the losing side, plus the winner's votes beyond the minimum needed to win
 Efficiency gap: the difference between the two sides' wasted votes, divided by the total number of votes cast; it is intended to measure partisan bias (a worked example follows below).
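The efficiency gap described above is easy to compute from per-district vote totals. The sketch below uses made-up numbers for a two-party race; in this example party A wins a clear majority of the votes but a minority of the seats, and the sign of the gap shows which side the map favors.

```python
# Minimal sketch (made-up district totals) of the efficiency gap for a two-party race.
# Wasted votes: every vote for the loser, plus the winner's votes beyond the bare majority.
def efficiency_gap(districts):
    wasted = {"A": 0, "B": 0}
    total_votes = 0
    for a, b in districts:                      # (votes for A, votes for B) in each district
        total_votes += a + b
        needed = (a + b) // 2 + 1               # minimum votes needed to win the district
        winner, loser = ("A", "B") if a > b else ("B", "A")
        wasted[winner] += max(a, b) - needed    # surplus votes for the winner
        wasted[loser] += min(a, b)              # all votes for the loser are wasted
    return (wasted["A"] - wasted["B"]) / total_votes, wasted

districts = [(700, 300), (650, 350), (450, 550), (400, 600), (480, 520)]
gap, wasted = efficiency_gap(districts)
print(f"wasted votes: {wasted}, efficiency gap: {gap:+.2%}")
# Party A wins about 54% of the votes here but only 2 of 5 seats; the positive gap
# (A wastes more votes) indicates the map favors party B.
```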
Wisconsin’s redistricting in 2011
 Wisconsin's Republican-led redistricting was struck down by a three-judge panel. The case was heard by the Supreme Court on October 3, and a decision is pending.
 The arguments of the plaintiffs:
 A big efficiency gap indicates bias, especially if it is persistent; Wisconsin's gap is among the biggest ever recorded.
 It violates voters’ right to equal treatment
 It discriminates against their views (first
amendment argument)
 Arguments of the defendants:
 Efficiency gaps can arise naturally, e.g., when Democrats pack themselves into cities
 Courts should stay out of it. States can appoint
independent commissions if they are concerned
 Justice Kennedy’s vote is probably going to be
decisive.
Discussion
Suppose you are heading an independent commission to
recommend a fair redistricting approach.
 How do you define fair redistricting? Why?
 How would you go about implementing your recommendation?
 What role do computer algorithms play?