Fairness-aware Learning
   through Regularization Approach

    Toshihiro Kamishima†, Shotaro Akaho†, and Jun Sakuma‡
  † National Institute of Advanced Industrial Science and Technology (AIST)
     ‡ University of Tsukuba & Japan Science and Technology Agency
                           http://www.kamishima.net/
IEEE Int’l Workshop on Privacy Aspects of Data Mining (PADM 2011)
  Dec. 11, 2011 @ Vancouver, Canada, co-located with ICDM2011



                                                                              1
Introduction

          Due to the spread of data mining technologies...
Data mining is being increasingly applied to serious decisions
ex. credit scoring, insurance rating, employment applications
The accumulation of massive data makes it possible to reveal personal information




  Fairness-aware / Discrimination-aware Mining
          taking the following kinds of sensitive information
                  into account in its decisions or results:
   information considered from the viewpoint of social fairness
   ex. gender, religion, race, ethnicity, handicap, political conviction
   information restricted by law or contracts
   ex. insider information, customers’ private information
                                                                           2
Outline

Backgrounds
 an example to explain why fairness-aware mining is needed

Causes of Unfairness
 prejudice, underestimation, negative legacy

Methods and Experiments
 our prejudice removal technique, experimental results

Related Work
 finding unfair association rules, situation testing, fairness-aware data
 publishing

Conclusion

                                                                           3
Backgrounds




              4
Why Fairness-aware Mining?
                                                            [Calders+ 10]

  US Census Data : the task is to predict whether a person's income is high or low

                              Male       Female
    High-income              3,256          590
    Low-income               7,604        4,831

         Females are a minority in the high-income class
the number of high-income Male records is 5.5 times that of high-income Female records
while 30% of Male records are high-income, only 11% of Female records are

  Occam’s Razor : mining techniques prefer simple hypotheses


              Minor patterns are frequently ignored
          and thus minorities tend to be treated unfairly

                                                                            5
Calders-Verwer Discrimination Score
                                                                [Calders+ 10]

         Calders-Verwer discrimination score (CV score)
  Pr[ Y=High-income | S=Male ] - Pr[ Y=High-income | S=Female ]
           Y: objective variable,    S: sensitive feature
        The conditional probability of the favorable decision given the
   unprotected group (Male) minus that given the protected group (Female)
The observed values in the sample data are used as the values of Y
     The baseline CV score is 0.19
When the objective variable Y is predicted by a naive-Bayes classifier
trained on data containing all sensitive and non-sensitive features,
      the CV score increases to 0.34, indicating unfair treatment
Even if sensitive features are excluded when training the classifier,
     the score only improves to 0.28, still unfairer than the baseline
Ignoring sensitive features does not exclude their indirect influence
            (the red-lining effect)
                                                                                6
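As a concrete check of the numbers above, the baseline CV score can be reproduced directly from the contingency table on the earlier slide; the following is a minimal Python sketch (the variable names are ours, not from the original code):

```python
# Baseline Calders-Verwer score from the US Census counts shown earlier:
# Pr[ Y=High-income | S=Male ] - Pr[ Y=High-income | S=Female ]
high = {"Male": 3256, "Female": 590}
low  = {"Male": 7604, "Female": 4831}

def cv_score(high, low):
    p_high_male   = high["Male"]   / (high["Male"]   + low["Male"])
    p_high_female = high["Female"] / (high["Female"] + low["Female"])
    return p_high_male - p_high_female

print(round(cv_score(high, low), 2))  # 0.19, the baseline CV score of the sample data
```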
Social Backgrounds

Equality Laws
 Many international laws prohibit discrimination in socially sensitive
 decision-making tasks [Pedreschi+ 09]

Circulation of Private Information
 Apparently irrelevant information can help to reveal private information
 ex. users’ demographics can be predicted from query logs [Jones 09]

Contracts with Customers
 Customers’ information must be used only for purposes within the
 scope of privacy policies


We need sophisticated techniques for data analysis whose outputs
are neutral to specific information

                                                                            7
Causes of Unfairness




                       8
Three Causes of Unfairness
     There are at least three causes of unfairness in data analysis


Prejudice
 the statistical dependency of sensitive features on an objective
 variable or non-sensitive features

Underestimation
 predictors that have not fully converged because they are trained on a
 finite number of samples

Negative Legacy
 training data used for building predictors are unfairly sampled or
 labeled


                                                                        9
Prejudice: Prediction Model
    variables
objective variable Y : a binary class representing the result of a social
decision, e.g., whether or not to allow credit
sensitive feature S : a discrete variable representing socially sensitive
information, such as gender or religion
non-sensitive feature X : continuous variables corresponding to all
features other than the sensitive feature

prediction model
           Classification model representing Pr[ Y | X, S ]
                           M[ Y | X, S ]
   The joint distribution is derived by multiplying the model by a sample distribution
                 Pr[ Y, X, S ] = M[ Y | X, S ] Pr[ X, S ]
      independence between these variables is considered
                     over this joint distribution
                                                                          10
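As a concrete illustration, M[ Y | X, S ] can be instantiated as a logistic regression with one weight vector per sensitive value; this is only a sketch of one plausible parameterization (the names are illustrative), not necessarily the exact model of the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# M[ Y=1 | x, s; Theta ] with Theta = {w[s], b[s]}: a logistic regression
# whose weight vector and bias are selected by the sensitive value s.
def model(x, s, w, b):
    return sigmoid(np.dot(w[s], x) + b[s])

# Example: two sensitive values (0/1), three non-sensitive features.
w = {0: np.array([0.2, -0.1, 0.4]), 1: np.array([0.1, 0.3, -0.2])}
b = {0: 0.0, 1: 0.1}
print(model(np.array([1.0, 0.5, -1.0]), 0, w, b))
```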
Prejudice
Prejudice : the statistical dependency of sensitive features on an
objective variable or non-sensitive features

   Direct Prejudice           Y ⊥̸ S | X
 a clearly unfair state in which a prediction model directly depends on a
 sensitive feature
 implying the conditional dependence between Y and S given X

   Indirect Prejudice         Y ⊥̸ S | φ
 a sensitive feature depends on an objective variable
 bringing the red-lining effect (unfair treatment caused by information
 that is non-sensitive but depends on sensitive information)

   Latent Prejudice           X ⊥̸ S | φ
 a sensitive feature depends on a non-sensitive feature
 removing this prejudice completely excludes sensitive information

                                                                        11
Relation to PPDM
                      indirect prejudice
  the dependency between an objective variable Y and a sensitive feature S



         from the information theoretic perspective...
         mutual information between Y and S is non-zero



          from the viewpoint of privacy-preservation...
leakage of sensitive information when an objective variable is known

                 different conditions from PPDM
introducing randomness is occasionally inappropriate for severe
decisions, such as job applications
disclosure of identity is generally not problematic
                                                                       12
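The information-theoretic reading of indirect prejudice can be checked empirically with a plug-in estimate of the mutual information between Y and S; a minimal sketch (our own helper, not part of the paper):

```python
import numpy as np

def mutual_information(y, s):
    """Plug-in estimate of I(Y; S) in nats from two discrete arrays."""
    y, s = np.asarray(y), np.asarray(s)
    mi = 0.0
    for yv in np.unique(y):
        for sv in np.unique(s):
            p_joint = np.mean((y == yv) & (s == sv))
            p_y, p_s = np.mean(y == yv), np.mean(s == sv)
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (p_y * p_s))
    return mi

# I(Y; S) = 0 iff Y and S are statistically independent;
# a non-zero value indicates indirect prejudice.
```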
Underestimation
Underestimation : predictors that have not fully converged because they
are trained on a finite number of samples

  If the number of training samples is finite, the learned classifier may
 lead to more unfair decisions than those observed in the training samples.

               Though such decisions are not intentional,
            they might arouse suspicions of unfair treatment


 The notion of asymptotic convergence is mathematically rational
 Unfavorable decisions for minorities due to a shortage of training
 samples might not be socially accepted

Techniques for anytime algorithms or class-imbalance problems
might help to alleviate this underestimation
                                                                          13
Negative Legacy
Negative Legacy : training data used for building predictors are
unfairly sampled or labeled

Unfair Sampling
If people in a protected group have been refused without
investigation, those people are less frequently sampled
This problem can be considered a kind of sample selection bias, but it is
difficult to detect the existence of such sampling bias


Unfair Labeling
If people in a protected group who should have been favorably accepted have
been unfavorably rejected, the labels of training samples become unfair
Transfer learning might help to address this problem if additional
information, such as a small number of fairly labeled samples, is
available
                                                                           14
Methods and Experiments




                          15
Logistic Regression
       Logistic Regression : discriminative classification model

A method to remove indirect prejudice from decisions made by logistic
regression

Given training samples {(y, x, s)} of the objective variable, the
non-sensitive features, and the sensitive feature, the following objective
function is minimized with respect to the model parameters Θ:

    - ln Pr[ {(y, x, s)}; Θ ]  +  η R( {(y, x, s)}, Θ )  +  (λ/2) ||Θ||₂²

 η : regularization parameter for prejudice removal;
     the larger its value, the more fairness is enforced
 R : regularizer for prejudice removal;
     the smaller its value, the more strongly the independence
     between S and Y is constrained
 λ : regularization parameter of the L2 regularizer, which avoids over-fitting
                                                                            16
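A minimal sketch of how the minimized objective could be assembled in code, assuming the per-sensitive-value logistic regression sketched earlier and treating the prejudice remover R as a separately supplied function (all names are illustrative):

```python
import numpy as np

def neg_log_likelihood(data, w, b):
    # data: iterable of (y, x, s) triples with y in {0, 1}
    nll = 0.0
    for y, x, s in data:
        p = 1.0 / (1.0 + np.exp(-(np.dot(w[s], x) + b[s])))
        nll -= y * np.log(p) + (1 - y) * np.log(1 - p)
    return nll

def objective(data, w, b, prejudice_remover, eta, lam):
    # - ln Pr[{(y,x,s)}; Theta] + eta * R({(y,x,s)}, Theta) + (lam/2) * ||Theta||^2
    l2 = sum(np.dot(w[s], w[s]) for s in w)
    return (neg_log_likelihood(data, w, b)
            + eta * prejudice_remover(data, w, b)
            + 0.5 * lam * l2)
```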
Prejudice Remover
                      Prejudice Remover Regularizer
               the mutual information between S and Y,
                added so as to make Y independent of S

    R = Σ_{Y,X,S} M[ Y | X, S ] Pr[ X, S ] ln ( Pr[ Y, S ] / ( Pr[ S ] Pr[ Y ] ) )

which is approximated by

    Σ_{(x,s)} Σ_{y ∈ {0,1}} M[ y | x, s; Θ ] ln ( Pr[ y | x̄_s, s; Θ ] / Σ_{s'} Pr[ y | x̄_{s'}, s'; Θ ] )

the true distribution is replaced with the sample distribution, and the
mutual information is approximated at the per-group means x̄_s of X
instead of marginalizing over X

             We are currently improving the computation method
                                                                                 17
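Following the slide's approximation literally, the regularizer can be sketched as below: predictions at the per-group means x̄_s of X stand in for Pr[ y | s ], and the denominator sums over the sensitive values. This is a sketch under those assumptions, not the authors' reference implementation:

```python
import numpy as np

def prejudice_remover(data, w, b):
    """Approximate mutual information between S and Y, as on the slide."""
    def prob(y, x, s):
        p1 = 1.0 / (1.0 + np.exp(-(np.dot(w[s], x) + b[s])))
        return p1 if y == 1 else 1.0 - p1

    groups = sorted({s for _, _, s in data})
    # per-group means of the non-sensitive features, x_bar[s]
    x_bar = {g: np.mean([x for _, x, s in data if s == g], axis=0) for g in groups}

    r = 0.0
    for _, x, s in data:
        for y in (0, 1):
            num = prob(y, x_bar[s], s)
            den = sum(prob(y, x_bar[g], g) for g in groups)
            r += prob(y, x, s) * np.log(num / den)
    return r
```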
Calders-Verwer Two Naive Bayes
                                                             [Calders+ 10]


  [graphical models omitted]
 Naive Bayes : S and X are conditionally independent given Y
 Calders-Verwer Two Naive Bayes (CV2NB) : the non-sensitive features X
 are mutually conditionally independent given Y and S

Unfair decisions are modeled by introducing the dependency of X on S, as
well as on Y
A model representing the joint distribution of Y and S is built so as to
enforce fairness in decisions
                                                                             18
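For comparison, here is a rough sketch in the spirit of CV2NB: a naive Bayes model in which binary features X are conditionally independent given the pair (Y, S), followed by a post-processing loop that shifts probability mass in the joint prior of Y and S until the CV score of the training-set predictions is no longer positive. The exact update rule of [Calders+ 10] differs, and all names here are illustrative:

```python
import numpy as np

def cv_score(y, s):
    # Calders-Verwer score: Pr[y=1 | s=1] - Pr[y=1 | s=0]
    return y[s == 1].mean() - y[s == 0].mean()

def fit_2nb(X, y, s, alpha=1.0):
    """X: binary feature matrix; y, s: binary integer arrays."""
    prior = np.zeros((2, 2))              # P(Y=i, S=j)
    cond = np.zeros((2, 2, X.shape[1]))   # P(X_k=1 | Y=i, S=j)
    for i in (0, 1):
        for j in (0, 1):
            m = (y == i) & (s == j)
            prior[i, j] = (m.sum() + alpha) / (len(y) + 4 * alpha)
            cond[i, j] = (X[m].sum(axis=0) + alpha) / (m.sum() + 2 * alpha)
    return prior, cond

def predict(X, s, prior, cond):
    out = np.empty(len(X), dtype=int)
    for n in range(len(X)):
        logp = [np.log(prior[i, s[n]])
                + np.sum(X[n] * np.log(cond[i, s[n]])
                         + (1 - X[n]) * np.log(1 - cond[i, s[n]]))
                for i in (0, 1)]
        out[n] = int(logp[1] > logp[0])
    return out

def enforce_fairness(X, y, s, prior, cond, delta=0.01, max_iter=500):
    """Shift mass in P(Y, S) until the training-set CV score is not positive."""
    for _ in range(max_iter):
        if cv_score(predict(X, s, prior, cond), s) <= 0:
            break
        prior[1, 0] += delta; prior[0, 0] -= delta   # more favorable outcomes for s=0
        prior[1, 1] -= delta; prior[0, 1] += delta   # fewer favorable outcomes for s=1
        prior = np.clip(prior, 1e-6, None)
        prior /= prior.sum()
    return prior
```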
Experimental Results
  [plots omitted: the left panel shows accuracy and the right panel shows
   fairness (mutual information between S and Y; smaller is fairer), both
   as functions of η over the range 0.01 to 10, for LR (without S),
   Prejudice Remover, Naive Bayes (without S), and CV2NB]

            Both CV2NB and PR made fairer decisions than their corresponding
            baseline methods
            For larger η, the accuracy of PR tends to decline, but no clear
            trend is found in terms of fairness
            The instability of PR is probably due to the influence of the
            approximation or the non-convexity of the objective function
                                                                                                               19
Related Work




               20
Finding Unfair Association Rules
                                                           [Pedreschi+ 08, Ruggieri+ 10]

ex: association rules extracted from the German Credit Data Set
(a) city=NYC ⇒ class=bad (conf = 0.25)
	 0.25 of NYC residents are denied their credit application
(b) city=NYC & race=African ⇒ class=bad (conf = 0.75)
	 0.75 of NYC residents whose race is African are denied their credit application

  extended lift (elift)          elift = conf( A ∧ B ⇒ C ) / conf( A ⇒ C )

     the ratio of the confidence of a rule with an additional condition
                     to the confidence of the base rule

 a-protection : a rule is considered unfair if there exist association
 rules whose elift is larger than a
 ex: (b) isn't a-protected if a = 2, because elift = conf(b) / conf(a) = 3

They proposed an algorithm to enumerate rules that are not a-protected
                                                                                           21
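A minimal sketch of the elift computation and the a-protection check, using the rule confidences from the example above:

```python
def elift(conf_extended_rule, conf_base_rule):
    # ratio of the confidence of the rule with an additional condition
    # to the confidence of its base rule
    return conf_extended_rule / conf_base_rule

def is_a_protected(conf_extended_rule, conf_base_rule, a):
    return elift(conf_extended_rule, conf_base_rule) <= a

print(elift(0.75, 0.25))               # 3.0 for rule (b) over rule (a)
print(is_a_protected(0.75, 0.25, 2))   # False: (b) is not a-protected when a = 2
```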
Situation Testing
                                                               [Luong+ 11]


Situation Testing : when all conditions other than the sensitive one are
the same, people in a protected group are considered unfairly treated if
they receive an unfavorable decision



  [figure omitted: k-nearest neighbors of data points of people in a protected group]
 They proposed a method for finding unfair treatments by checking the
 statistics of decisions in the k-nearest neighbors of data points in a
 protected group
 The condition of situation testing is
     Pr[ Y | X, S=a ] = Pr[ Y | X, S=b ]   ∀ X
 This implies the independence between S and Y

                                                                             22
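A minimal sketch of the k-nearest-neighbor check behind situation testing: for each member of the protected group, compare the rate of favorable decisions among its nearest protected neighbors with that among its nearest unprotected neighbors. This is illustrative, not [Luong+ 11]'s exact procedure:

```python
import numpy as np

def knn_decision_rate(X, y, idx, candidates, k):
    """Mean decision among the k candidates nearest to X[idx]."""
    dist = np.linalg.norm(X[candidates] - X[idx], axis=1)
    nearest = candidates[np.argsort(dist)[:k]]
    return y[nearest].mean()

def situation_testing(X, y, s, k=10, t=0.2):
    """Flag members of the protected group (s == 1) whose k nearest protected
    neighbors receive the favorable decision (y == 1) at a rate at least t
    lower than their k nearest unprotected neighbors."""
    X, y, s = np.asarray(X, dtype=float), np.asarray(y), np.asarray(s)
    protected = np.where(s == 1)[0]
    unprotected = np.where(s == 0)[0]
    flagged = []
    for i in protected:
        others = protected[protected != i]
        diff = (knn_decision_rate(X, y, i, unprotected, k)
                - knn_decision_rate(X, y, i, others, k))
        if diff >= t:
            flagged.append(i)
    return flagged
```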
Fairness-aware Data Publishing
                                                                 [Dwork+ 11]

  [framework diagram omitted]
 data owner : holds the original data and transforms it into archetypes
 vendor (data user) : sends a loss function representing its utilities
 archetype : a data representation that maximizes the vendor’s utility
 under constraints that guarantee fairness in analysis; fair decisions
 are then made from the archetypes


 They show the conditions that these archetypes should satisfy
 These conditions imply that the probability of receiving a favorable
 decision is irrelevant to belonging to a protected group


                                                                               23

Conclusion

Contributions
 three causes of unfairness: prejudice, underestimation, and negative legacy
 a prejudice remover regularizer, which enforces a classifier’s independence
 from sensitive information
 experimental results of logistic regression with our prejudice remover

Future Work
 the computation of the prejudice remover has to be improved

Socially Responsible Mining
 Methods of data exploitation that do not damage people’s lives, such as
 fairness-aware mining, PPDM, or adversarial learning, together comprise the
 notion of socially responsible mining, which should become an important
 concept in the near future
                                                                           24

Editor's Notes

  1. I’m Toshihiro Kamishima, and this is joint work with Shotaro Akaho and Jun Sakuma. Today, we would like to talk about fairness-aware data mining.
  2. Due to the spread of data mining technologies, data mining is being increasingly applied to serious decisions, for example credit scoring, insurance rating, employment applications, and so on. Furthermore, the accumulation of massive data makes it possible to reveal personal information. To cope with this situation, several researchers have begun to develop fairness-aware mining techniques, which take care of sensitive information in their decisions. First, information considered from the viewpoint of social fairness, for example gender, religion, race, ethnicity, handicap, or political conviction. Second, information restricted by law or contracts, for example insider information or customers’ private information.
  3. This is an outline of our talk. First, we show an example to explain why fairness-aware mining is needed. Second, we discuss three causes of unfairness. Third, we show our prejudice removal technique. Finally, we review this new research area.
  4. Let’s turn to the background of fairness-aware mining.
  5. This is Calders and Verwer’s example to show why fairness-aware mining is needed. Consider a task to predict whether income is high or low. In this US census data, females are a minority in the high-income class. Because mining techniques prefer simple hypotheses, minor patterns are frequently ignored, and thus minorities tend to be treated unfairly.
  6. Calders and Verwer proposed a score to quantify the degree of unfairness. It is defined as the difference between the conditional probabilities of high-income cases given Male and given Female; the larger the score, the unfairer the decision. The baseline CV score is 0.19. If objective variables are predicted by a naive Bayes classifier, the CV score increases to 0.34, indicating unfair treatment. Even if sensitive features are excluded, the score improves to 0.28, but is still unfairer than the baseline. Ignoring sensitive features does not exclude their indirect influence; this is called the red-lining effect. More affirmative actions are required.
  7. Furthermore, fairness-aware mining is required due to social backgrounds, such as equality laws, the circulation of private information, and contracts with customers. Because these prohibit the use of specific information, we need sophisticated techniques for data analysis whose outputs are neutral to that information.
  8. We next discuss the causes of unfairness.
  9. We consider that there are at least three causes of unfairness in data analysis: prejudice, underestimation, and negative legacy. We discuss these causes in turn.
  10. Before talking about prejudice, we’d like to give some notation. There are three types of variables: an objective variable Y, sensitive features S, and non-sensitive features X. We introduce a classification model representing the conditional probability of Y given X and S. Using this model, the joint distribution is predicted by multiplying by a sample distribution. We consider the independence between these variables over this joint distribution.
  11. The first cause of unfairness is prejudice, which is defined as the statistical dependency of sensitive features on an objective variable or non-sensitive features. We further classify prejudice into three types: direct, indirect, and latent. Direct prejudice is a clearly unfair state in which a prediction model directly depends on a sensitive feature; this implies the conditional dependence between Y and S given X. Indirect prejudice is a state in which a sensitive feature depends on an objective variable; this brings a red-lining effect. Latent prejudice is the state in which a sensitive feature depends on a non-sensitive feature.
  12. Here, we’d like to point out the relation between indirect prejudice and PPDM. Indirect prejudice refers to the dependency between Y and S. From the information-theoretic perspective, this means that the mutual information between Y and S is non-zero. From the viewpoint of privacy preservation, this is interpreted as the leakage of sensitive information when an objective variable is known. On the other hand, the conditions differ from PPDM: introducing randomness is occasionally inappropriate for severe decisions (for example, if my job application is rejected at random, I will complain about the decision and immediately consult lawyers), and disclosure of identity is generally not problematic.
  13. Now, we turn to the second cause of unfairness. Underestimation is caused by incompletely converged predictors due to training on a finite number of samples. If the number of training samples is finite, the learned classifier may lead to more unfair decisions than those observed in the training samples. Though such decisions are not intentional, they might arouse suspicions of unfair treatment. The notion of asymptotic convergence is mathematically rational. However, unfavorable decisions for minorities due to a shortage of training samples might not be socially accepted.
  14. The third cause of unfairness is negative legacy, which refers to training data used for building predictors being unfairly sampled or labeled. Unfair sampling occurs, for example, when people in a protected group have been refused without investigation, so those people are less frequently sampled. Unfair labeling occurs, for example, when people in a protected group who should have been favorably accepted have been unfavorably rejected, so the labels of training samples become unfair.
  15. Among these causes of unfairness, we focus on the removal of indirect prejudice. We next talk about our methods and experimental results.
  16. We propose a method to remove indirect prejudice from decisions made by logistic regression. For this purpose, we add this term to the original objective function of logistic regression. The regularizer for prejudice removal is designed so that the smaller its value, the more strongly the independence between S and Y is constrained. Eta is a regularization parameter; the larger its value, the more fairness is enforced.
  17. To remove indirect prejudice, we developed a prejudice remover regularizer. This regularizer is the mutual information between S and Y, so as to make Y independent of S. It is calculated by this formula; a rather brute-force approximation is adopted.
  18. We compared our logistic regression with the Calders-Verwer two-naive-Bayes classifier. Unfair decisions are modeled by introducing the dependency of X on S as well as on Y. A model representing the joint distribution of Y and S is built so as to enforce fairness in decisions.
  19. This is a summary of our experimental results. Our method balances accuracy and fairness by tuning the η parameter. The two-naive-Bayes classifier is very powerful: we won in accuracy, but lost in fairness. Both CV2NB and PR made fairer decisions than their baseline methods. For larger η, the accuracy of PR tends to decline, but no clear trends are found in terms of fairness. The instability of PR is probably due to the influence of the brute-force approximation or the non-convexity of the objective function. We plan to investigate this point.
  20. Fairness-aware mining is a relatively new problem, so we briefly review related work.
  21. To our knowledge, Pedreschi, Ruggieri, and Turini first addressed the problem of fairness in data mining. They proposed a measure to quantify the unfairness of association rules. elift is defined as the ratio of the confidence of an association rule with an additional condition to the confidence of a base rule. They also proposed the notion of a-protection: rules are considered unfair if there exist association rules whose extended lift is larger than a. They proposed an algorithm to enumerate rules that are not a-protected.
  22. This group proposed another notion, situation testing. When all conditions other than a sensitive condition are the same, people in a protected group are considered unfairly treated if they receive an unfavorable decision. They proposed a method for finding unfair treatments by checking the statistics of decisions in the k-nearest neighbors of data points in a protected group. Situation testing is related to our indirect prejudice.
  23. Dwork and colleagues proposed a framework for fairness-aware data publishing. The vendor sends a loss function, which represents the vendor’s utilities. Using this function, the data owner transforms the original data into archetypes. An archetype is a data representation that maximizes the vendor’s utility under constraints that guarantee fair results. They show the conditions that these archetypes should satisfy, and these conditions imply that the probability of receiving a favorable decision is irrelevant to belonging to a protected group. This further implies the independence between S and Y, and is related to our indirect prejudice.
  24. Our contributions are as follows. As future work, the computation of the prejudice remover has to be improved. Methods of data exploitation that do not damage people’s lives, such as fairness-aware mining, PPDM, or adversarial learning, together comprise the notion of socially responsible mining, which should become an important concept in the near future. That’s all we have to say. Thank you for your attention.