SlideShare a Scribd company logo
Considerations on
              Fairness-aware Data Mining

   Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma†
        *National Institute of Advanced Industrial Science and Technology (AIST), Japan
          †University of Tsukuba, Japan; and Japan Science and Technology Agency



  IEEE ICDM International Workshop on Discrimination and Privacy-Aware Data Mining
                             @ Brussels, Belgium, Dec. 10, 2012




START                                                                                     1
Overview

             Fairness-aware Data Mining
      data analysis taking into account potential issues of
      fairness, discrimination, neutrality, or independence




                Introduction of the following topics
       to give an overview of fairness-aware data mining
Applications of fairness-aware data mining for the other purposes
besides avoiding discrimination
An integrated view of existing fairness measures based on the
statistical independence
Connections of fairness-aware data mining with other techniques
for data analysis

                                                                    2
Applications of
Fairness-Aware Data Mining




                             3
Discrimination-aware Data Mining
                                                                              [Pedreschi+ 08]

                   Discrimination-aware Data Mining
               Detection or elimination of the influence
         of socially sensitive information to serious decisions
    information considered from the viewpoint of social fairness
    ex. gender, religion, race, ethnicity, handicap, political conviction
    information restricted by law or contracts
    ex. insider information, customers’ private information




             This framework for detecting or eliminating
    the influence of a specific information to the analysis results
  can be used for other purposes besides avoiding discrimination
   enhancing the neutrality of the prediction
   excluding uninteresting information from analysis results
✴This is the reason why we use the term “fairness-aware” instead of “discrimination-aware”

                                                                                                4
Filter Bubble
                                                        [TED Talk by Eli Pariser]

                       The Filter Bubble Problem
Pariser posed a concern that personalization technologies narrow and
bias the topics of information provided to people
                     http://www.thefilterbubble.com/

              Friend Recommendation List in Facebook




To fit for Pariser’s preference, conservative people are eliminated from
     his recommendation list, while this fact is not notified to him
                                                                                    5
Information Neutral Recommender System
                                                           [Kamishima+ 12]


                     The Filter Bubble Problem



            Information Neutral Recommender System
   enhancing the neutrality from a viewpoint specified by a user
            and other viewpoints are not considered


       Fairness-aware data mining techniques are used for
       removing the influence of the specified information


ex. A system enhances the neutrality in terms of whether conservative
   or progressive, but it is allowed to make biased recommendations
   in terms of other viewpoints, for example, the birthplace or age of
   friends
                                                                             6
Ignoring Uninteresting Information
                                                               [Gondek+ 04]

non-redundant clustering : find clusters that are as independent
from a given uninteresting partition as possible
          a conditional information bottleneck method,
      which is a variant of an information bottleneck method
clustering facial images

                           Simple clustering methods find two clusters:
                           one contains only faces, and the other
                           contains faces with shoulders

                           Data analysts consider this clustering is
                           useless and uninteresting

                           A non-redundant clustering method derives
                           more useful male and female clusters,
                           which are independent of the above clusters
                                                                              7
An Integrated View of
Existing Fairness Measures




                             8
Notations of Variables
                                    a result of serious decision
Y   objective variable              ex., whether or not to allow credit
                                    socially sensitive information
S   sensitive feature               ex., gender or religion

X   non-sensitive feature vector
                          all features other than a sensitive feature
                          non-sensitive, but may correlate with S

    X   (E)    explainable non-sensitive feature vector


    X   (U )   unexplainable non-sensitive feature vector
    Non-sensitive features can be further classified explainable and
    unexplainable (we will introduce later)

                                                                          9
Prejudice
                                                             [Kamishima+ 12]

Prejudice : the statistical dependences of an objective variable or
non-sensitive features on a sensitive feature

   Direct Prejudice           Y 6? S | X
                                 ?
 a clearly unfair state that a prediction model directly depends on a
 sensitive feature
 implying the conditional dependence between Y and S given X

  Indirect Prejudice           Y 6? S | ;
                                  ?
 the dependence of an objective variable Y on a sensitive feature S
 bringing a red-lining effect

Indirect Prejudice with Explainable Features          Y 6? S | X(E)
                                                         ?
 the conditional dependence between Y and S given explainable non-
 sensitive features, if a part of non-sensitive features are explainable
                                                                               10
Prejudice


The degree of prejudices can be evaluated:
 computing statistics for the independence (among observations)
 ex. mutual information, χ2-statistics
 applying tests of independence (among sampled population)
 χ2-tests


Relations between this prejudice and other existing measures for
unfairness
 elift (extended lift)
 CV score (Calders-Verwer’s discrimination score)
 CV score with explainable features



                                                                   11
elift (extended lift)
                                                     [Pedreschi+ 08, Ruggieri+ 10]


                             conf(X = x, S= ) Y = )
   elift (extended lift) =
                                conf(X = x ) Y = )
    the ratio of the confidence of a rule with additional condition
                    to the confidence of a base rule

The condition elift = 1 means that no unfair treatments, and it implies
        Pr[Y = , S=           |X=x] = Pr[Y =          |X=x]
        If this condition is the case for any x 2 Dom(X)
    (the condition of no direct discrimination in their definition)
                  and S and Y are binary variables:
                      Pr[Y, S|X] = Pr[Y |X]
    This is equivalent to the condition of no direct prejudice
        Similarly, their condition of no-indirect-discrimination
       is equivalent to that of no-indirect-prejudice condition
                                                                                     12
CV Score
      (Calders-Verwer’s Discrimination Score)
                                                           [Calders+ 10]

        Calders-Verwer discrimination score (CV score)
            Pr[Y = + |S=+]          Pr[Y = + |S= ]
       The conditional probability of the advantageous decision
for non-protected members subtracted by that for protected members




                The condition CV score = 0 means
            that neither unfair or affirmative treatments


                        Pr[Y |S] = Pr[Y ]
   This is equivalent to the condition of no indirect prejudice

                                                                           13
Explainability
                                                               [Zliobaite+ 11]

non-discriminatory cases, even if distributions of an objective variable
depends on a sensitive feature
ex : admission to a university
                        more females apply
                        to medicine
 explainable feature    more males apply to          sensitive feature
       program          computer                         gender
 medicine / computer                                   male / female

                                  objective variable
 medicine : low acceptance         acceptance ratio
 computer : high acceptance       accept / not accept
Because females tend to apply to a more competitive program,
females are more frequently rejected
Such difference is explainable and is considered as non-discriminatory
                                                                                 14
Explainability
To remove unfair treatments under such a condition, the probability of
                                       †
receiving an advantageous decision, 	Pr [Y = + |X (E) ], is chosen so as
to satisfy the condition:
         †
     Pr [Y = + |X (E) ] =
     Pr† [Y = + |X (E) , S=+] = Pr† [Y = + |X (E) , S= ]


                           This condition implies
                    Pr[Y |X (E) , S] = Pr[Y |X (E) ]
                       This is equivalent to
 the condition of no indirect prejudice with explainable features
✴Notions that is similar to this explainability are proposed as a legally-grounded
 feature [Luong+ 11] and are considered in the rule generalization procedure
 [Haijan+ 12].

                                                                                     15
Connections with Other Techniques
        for Data Analysis




                                    16
Privacy-Preserving Data Mining
                     indirect prejudice
 the dependence between an objective Y and a sensitive feature S



           from the information theoretic perspective,
          mutual information between Y and S is non-zero



           from the viewpoint of privacy-preservation,
leakage of sensitive information when an objective variable is known

                    Different points from PPDM
introducing randomness is occasionally inappropriate for severe
decisions, such as job application
disclosure of identity isn’t problematic in FADM, generally
                                                                       17
Causal Inference
                                                                        [Pearl 2009]

                          Causal inference
  a general technique for dealing with probabilistic causal relations

Both FADM and causal inference can cope with the influence of
sensitive information to an objective, but their premises are quite
different
While causal inference demands more information about causality
among variables, it can handle the facts that were not occurred

example of the connection between FADM and causal inference
A person of S=+ actually receives Y=+
If S was changed to -, how was the probability of receiving Y=-?
Under the condition called exogenous, the range of this probability is
max[0, Pr[Y = + |S=+]   Pr[Y = + |S= ]]  PNS  min[Pr[Y =+, X=+], Pr[Y = , X= ]]

             This lower bound coincides with a CV score
                                                                                       18
Cost-Sensitive Learning
                                                                  [Elkan 2001]


Cost-Sensitive Learning: learning classifiers so as to optimize
classification costs, instead of maximizing prediction accuracies


FADM can be regarded as a kind of cost-sensitive learning that
pays the costs for taking fairness into consideration



Cost matrix C(i | j): cost if a true class j is predicted as class i
Total cost to minimize is formally defined as (if class Y=+ or -):
                             P
                L(x, i) =        j   Pr[j | x]C(i | j)
An object x is classified into the class i whose cost is minimized



                                                                                 19
Cost-Sensitive Learning
                                                                 [Elkan 2001]

  Theorem 1 in [Elkan 2001]
  If negative examples in a data set is over-sampled by the factor of
                               C(+| )
                               C( |+)
  and a classifier is learned from this samples, a classifier to
  optimize specified costs is obtained


  In a FADM case, an over-sampling technique is used for avoiding
  unfair treatments

  A corresponding cost matrix can be computed by this theorem,
  which connects a cost matrix and the class ratio in training data

✴Note that this over-sampling technique is simple and effective for
 avoiding unfair decisions, but its weak point that it completely
 ignores non-sensitive features
                                                                                20
Other Connected Techniques
Legitimacy
 Data mining models can be deployed in the real world
Independent Component Analysis
  Transformation while maintaining the independence between features
Delegate Data
 To perform statistical tests, specific information is removed from data
 sets
Dummy Query
 Dummy queries are inputted for protecting users’ demographics into
 search engines or recommender systems
Visual Anonymization
 To protect identities of persons in images, faces or other information
 is blurred

                                                                          21
Conclusion

Contributions
We introduced the following topics to give an overview of fairness-
aware data mining
 We showed other FADM applications besides avoiding discrimination:
 enhancing neutrality and ignoring uninteresting information.
 We discussed the relations between our notion of prejudice and other
 existing measures of unfairness.
 We showed the connections of FADM with privacy-preserving data
 mining, causal inference, cost-sensitive learning, and so on.
Socially Responsible Mining
 Methods of data exploitation that do not damage people’s lives, such
 as fairness-aware data mining, PPDM, or adversarial learning,
 together comprise the notion of socially responsible mining, which
 it should become an important concept in the near future.

                                                                        22
Program Codes and Data Sets
               Fairness-Aware Data Mining
            http://www.kamishima.net/fadm

         Information Neutral Recommender System
            http://www.kamishima.net/inrs



                  Acknowledgements
This work is supported by MEXT/JSPS KAKENHI Grant Number
16700157, 21500154, 22500142, 23240043, and 24500194, and JST
PRESTO 09152492



                                                                23

More Related Content

What's hot

Recommendation Independence
Recommendation IndependenceRecommendation Independence
Recommendation Independence
Toshihiro Kamishima
 
Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922
stone55
 
Malhotra07
Malhotra07Malhotra07
Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...
Andrea Dal Pozzolo
 
Malhotra04
Malhotra04Malhotra04
ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)
Marcia Nealy
 
Classification Using Decision tree
Classification Using Decision treeClassification Using Decision tree
Classification Using Decision tree
Mohd. Noor Abdul Hamid
 
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
essp2
 
Module 6: Ensemble Algorithms
Module 6:  Ensemble AlgorithmsModule 6:  Ensemble Algorithms
Module 6: Ensemble Algorithms
Sara Hooker
 
Module 2: Machine Learning Deep Dive
Module 2:  Machine Learning Deep DiveModule 2:  Machine Learning Deep Dive
Module 2: Machine Learning Deep Dive
Sara Hooker
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
Sara Hooker
 
Chahine Hypothesis Testing,
Chahine Hypothesis Testing,Chahine Hypothesis Testing,
Chahine Hypothesis Testing,
Saad Chahine
 
"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI
Erika Agostinelli
 
An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring
gerogepatton
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
NBER
 
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERSPROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
Andresz26
 
Threads 2013
Threads 2013Threads 2013
Threads 2013
Ben Bolker
 
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER)
ijceronline
 
Malhotra08
Malhotra08Malhotra08

What's hot (19)

Recommendation Independence
Recommendation IndependenceRecommendation Independence
Recommendation Independence
 
Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922
 
Malhotra07
Malhotra07Malhotra07
Malhotra07
 
Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...
 
Malhotra04
Malhotra04Malhotra04
Malhotra04
 
ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)
 
Classification Using Decision tree
Classification Using Decision treeClassification Using Decision tree
Classification Using Decision tree
 
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
 
Module 6: Ensemble Algorithms
Module 6:  Ensemble AlgorithmsModule 6:  Ensemble Algorithms
Module 6: Ensemble Algorithms
 
Module 2: Machine Learning Deep Dive
Module 2:  Machine Learning Deep DiveModule 2:  Machine Learning Deep Dive
Module 2: Machine Learning Deep Dive
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
 
Chahine Hypothesis Testing,
Chahine Hypothesis Testing,Chahine Hypothesis Testing,
Chahine Hypothesis Testing,
 
"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI
 
An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
 
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERSPROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
 
Threads 2013
Threads 2013Threads 2013
Threads 2013
 
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER)
 
Malhotra08
Malhotra08Malhotra08
Malhotra08
 

Viewers also liked

文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
Shohei Okada
 
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
Shohei Okada
 
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
Shohei Okada
 
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
Shohei Okada
 
文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences
Shohei Okada
 
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
Shohei Okada
 
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
Shohei Okada
 
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
Shohei Okada
 
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
Shohei Okada
 
Absolute and Relative Clustering
Absolute and Relative ClusteringAbsolute and Relative Clustering
Absolute and Relative Clustering
Toshihiro Kamishima
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in Recommendation
Toshihiro Kamishima
 
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについてShohei Okada
 
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
Shohei Okada
 
OpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングOpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシング
Toshihiro Kamishima
 
WSDM2016勉強会 資料
WSDM2016勉強会 資料WSDM2016勉強会 資料
WSDM2016勉強会 資料
Toshihiro Kamishima
 
The Infamous Hello World Program
The Infamous Hello World ProgramThe Infamous Hello World Program
The Infamous Hello World Program
Shohei Okada
 
PRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデルPRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデルShohei Okada
 
Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理
Toshihiro Kamishima
 
ICML2015読み会 資料
ICML2015読み会 資料ICML2015読み会 資料
ICML2015読み会 資料
Toshihiro Kamishima
 
Python入門
Python入門Python入門
Python入門
Shohei Okada
 

Viewers also liked (20)

文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
 
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
 
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
 
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
 
文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences
 
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
 
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
 
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
 
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
 
Absolute and Relative Clustering
Absolute and Relative ClusteringAbsolute and Relative Clustering
Absolute and Relative Clustering
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in Recommendation
 
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
 
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
 
OpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングOpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシング
 
WSDM2016勉強会 資料
WSDM2016勉強会 資料WSDM2016勉強会 資料
WSDM2016勉強会 資料
 
The Infamous Hello World Program
The Infamous Hello World ProgramThe Infamous Hello World Program
The Infamous Hello World Program
 
PRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデルPRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデル
 
Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理
 
ICML2015読み会 資料
ICML2015読み会 資料ICML2015読み会 資料
ICML2015読み会 資料
 
Python入門
Python入門Python入門
Python入門
 

Similar to Consideration on Fairness-aware Data Mining

C0364012016
C0364012016C0364012016
C0364012016
theijes
 
Study of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data MiningStudy of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data Mining
IRJET Journal
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
Carlos Castillo (ChaTo)
 
UN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingUN Global Pulse Privacy Framing
UN Global Pulse Privacy Framing
Micah Altman
 
Classification with No Direct Discrimination
Classification with No Direct DiscriminationClassification with No Direct Discrimination
Classification with No Direct Discrimination
Editor IJCATR
 
Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods
IOSR Journals
 
Disability Rights, Concepts & Practices (Classroom).pptx
Disability Rights, Concepts & Practices (Classroom).pptxDisability Rights, Concepts & Practices (Classroom).pptx
Disability Rights, Concepts & Practices (Classroom).pptx
Christopher Cross, M.A., C.M.A., D.S.P. (ret.)
 
WiMLDS Paris O Vereschak Trust AI-assisted decisions
WiMLDS Paris O Vereschak Trust AI-assisted decisionsWiMLDS Paris O Vereschak Trust AI-assisted decisions
WiMLDS Paris O Vereschak Trust AI-assisted decisions
Paris Women in Machine Learning and Data Science
 
J046065457
J046065457J046065457
J046065457
IJERA Editor
 
Data Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context MissionsData Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context Missions
ijdms
 

Similar to Consideration on Fairness-aware Data Mining (10)

C0364012016
C0364012016C0364012016
C0364012016
 
Study of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data MiningStudy of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data Mining
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
 
UN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingUN Global Pulse Privacy Framing
UN Global Pulse Privacy Framing
 
Classification with No Direct Discrimination
Classification with No Direct DiscriminationClassification with No Direct Discrimination
Classification with No Direct Discrimination
 
Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods
 
Disability Rights, Concepts & Practices (Classroom).pptx
Disability Rights, Concepts & Practices (Classroom).pptxDisability Rights, Concepts & Practices (Classroom).pptx
Disability Rights, Concepts & Practices (Classroom).pptx
 
WiMLDS Paris O Vereschak Trust AI-assisted decisions
WiMLDS Paris O Vereschak Trust AI-assisted decisionsWiMLDS Paris O Vereschak Trust AI-assisted decisions
WiMLDS Paris O Vereschak Trust AI-assisted decisions
 
J046065457
J046065457J046065457
J046065457
 
Data Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context MissionsData Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context Missions
 

More from Toshihiro Kamishima

RecSys2018論文読み会 資料
RecSys2018論文読み会 資料RecSys2018論文読み会 資料
RecSys2018論文読み会 資料
Toshihiro Kamishima
 
WSDM2018読み会 資料
WSDM2018読み会 資料WSDM2018読み会 資料
WSDM2018読み会 資料
Toshihiro Kamishima
 
機械学習研究でのPythonの利用
機械学習研究でのPythonの利用機械学習研究でのPythonの利用
機械学習研究でのPythonの利用
Toshihiro Kamishima
 
KDD2016勉強会 資料
KDD2016勉強会 資料KDD2016勉強会 資料
KDD2016勉強会 資料
Toshihiro Kamishima
 
科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要
Toshihiro Kamishima
 
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしないPyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
Toshihiro Kamishima
 

More from Toshihiro Kamishima (6)

RecSys2018論文読み会 資料
RecSys2018論文読み会 資料RecSys2018論文読み会 資料
RecSys2018論文読み会 資料
 
WSDM2018読み会 資料
WSDM2018読み会 資料WSDM2018読み会 資料
WSDM2018読み会 資料
 
機械学習研究でのPythonの利用
機械学習研究でのPythonの利用機械学習研究でのPythonの利用
機械学習研究でのPythonの利用
 
KDD2016勉強会 資料
KDD2016勉強会 資料KDD2016勉強会 資料
KDD2016勉強会 資料
 
科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要
 
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしないPyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
 

Recently uploaded

The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 

Recently uploaded (20)

The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 

Consideration on Fairness-aware Data Mining

  • 1. Considerations on Fairness-aware Data Mining Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma† *National Institute of Advanced Industrial Science and Technology (AIST), Japan †University of Tsukuba, Japan; and Japan Science and Technology Agency IEEE ICDM International Workshop on Discrimination and Privacy-Aware Data Mining @ Brussels, Belgium, Dec. 10, 2012 START 1
  • 2. Overview Fairness-aware Data Mining data analysis taking into account potential issues of fairness, discrimination, neutrality, or independence Introduction of the following topics to give an overview of fairness-aware data mining Applications of fairness-aware data mining for the other purposes besides avoiding discrimination An integrated view of existing fairness measures based on the statistical independence Connections of fairness-aware data mining with other techniques for data analysis 2
  • 4. Discrimination-aware Data Mining [Pedreschi+ 08] Discrimination-aware Data Mining Detection or elimination of the influence of socially sensitive information to serious decisions information considered from the viewpoint of social fairness ex. gender, religion, race, ethnicity, handicap, political conviction information restricted by law or contracts ex. insider information, customers’ private information This framework for detecting or eliminating the influence of a specific information to the analysis results can be used for other purposes besides avoiding discrimination enhancing the neutrality of the prediction excluding uninteresting information from analysis results ✴This is the reason why we use the term “fairness-aware” instead of “discrimination-aware” 4
  • 5. Filter Bubble [TED Talk by Eli Pariser] The Filter Bubble Problem Pariser posed a concern that personalization technologies narrow and bias the topics of information provided to people http://www.thefilterbubble.com/ Friend Recommendation List in Facebook To fit for Pariser’s preference, conservative people are eliminated from his recommendation list, while this fact is not notified to him 5
  • 6. Information Neutral Recommender System [Kamishima+ 12] The Filter Bubble Problem Information Neutral Recommender System enhancing the neutrality from a viewpoint specified by a user and other viewpoints are not considered Fairness-aware data mining techniques are used for removing the influence of the specified information ex. A system enhances the neutrality in terms of whether conservative or progressive, but it is allowed to make biased recommendations in terms of other viewpoints, for example, the birthplace or age of friends 6
  • 7. Ignoring Uninteresting Information [Gondek+ 04] non-redundant clustering : find clusters that are as independent from a given uninteresting partition as possible a conditional information bottleneck method, which is a variant of an information bottleneck method clustering facial images Simple clustering methods find two clusters: one contains only faces, and the other contains faces with shoulders Data analysts consider this clustering is useless and uninteresting A non-redundant clustering method derives more useful male and female clusters, which are independent of the above clusters 7
  • 8. An Integrated View of Existing Fairness Measures 8
  • 9. Notations of Variables a result of serious decision Y objective variable ex., whether or not to allow credit socially sensitive information S sensitive feature ex., gender or religion X non-sensitive feature vector all features other than a sensitive feature non-sensitive, but may correlate with S X (E) explainable non-sensitive feature vector X (U ) unexplainable non-sensitive feature vector Non-sensitive features can be further classified explainable and unexplainable (we will introduce later) 9
  • 10. Prejudice [Kamishima+ 12] Prejudice : the statistical dependences of an objective variable or non-sensitive features on a sensitive feature Direct Prejudice Y 6? S | X ? a clearly unfair state that a prediction model directly depends on a sensitive feature implying the conditional dependence between Y and S given X Indirect Prejudice Y 6? S | ; ? the dependence of an objective variable Y on a sensitive feature S bringing a red-lining effect Indirect Prejudice with Explainable Features Y 6? S | X(E) ? the conditional dependence between Y and S given explainable non- sensitive features, if a part of non-sensitive features are explainable 10
  • 11. Prejudice The degree of prejudices can be evaluated: computing statistics for the independence (among observations) ex. mutual information, χ2-statistics applying tests of independence (among sampled population) χ2-tests Relations between this prejudice and other existing measures for unfairness elift (extended lift) CV score (Calders-Verwer’s discrimination score) CV score with explainable features 11
  • 12. elift (extended lift) [Pedreschi+ 08, Ruggieri+ 10] conf(X = x, S= ) Y = ) elift (extended lift) = conf(X = x ) Y = ) the ratio of the confidence of a rule with additional condition to the confidence of a base rule The condition elift = 1 means that no unfair treatments, and it implies Pr[Y = , S= |X=x] = Pr[Y = |X=x] If this condition is the case for any x 2 Dom(X) (the condition of no direct discrimination in their definition) and S and Y are binary variables: Pr[Y, S|X] = Pr[Y |X] This is equivalent to the condition of no direct prejudice Similarly, their condition of no-indirect-discrimination is equivalent to that of no-indirect-prejudice condition 12
  • 13. CV Score (Calders-Verwer’s Discrimination Score) [Calders+ 10] Calders-Verwer discrimination score (CV score) Pr[Y = + |S=+] Pr[Y = + |S= ] The conditional probability of the advantageous decision for non-protected members subtracted by that for protected members The condition CV score = 0 means that neither unfair or affirmative treatments Pr[Y |S] = Pr[Y ] This is equivalent to the condition of no indirect prejudice 13
  • 14. Explainability [Zliobaite+ 11] non-discriminatory cases, even if distributions of an objective variable depends on a sensitive feature ex : admission to a university more females apply to medicine explainable feature more males apply to sensitive feature program computer gender medicine / computer male / female objective variable medicine : low acceptance acceptance ratio computer : high acceptance accept / not accept Because females tend to apply to a more competitive program, females are more frequently rejected Such difference is explainable and is considered as non-discriminatory 14
  • 15. Explainability To remove unfair treatments under such a condition, the probability of † receiving an advantageous decision, Pr [Y = + |X (E) ], is chosen so as to satisfy the condition: † Pr [Y = + |X (E) ] = Pr† [Y = + |X (E) , S=+] = Pr† [Y = + |X (E) , S= ] This condition implies Pr[Y |X (E) , S] = Pr[Y |X (E) ] This is equivalent to the condition of no indirect prejudice with explainable features ✴Notions that is similar to this explainability are proposed as a legally-grounded feature [Luong+ 11] and are considered in the rule generalization procedure [Haijan+ 12]. 15
  • 16. Connections with Other Techniques for Data Analysis 16
  • 17. Privacy-Preserving Data Mining indirect prejudice the dependence between an objective Y and a sensitive feature S from the information theoretic perspective, mutual information between Y and S is non-zero from the viewpoint of privacy-preservation, leakage of sensitive information when an objective variable is known Different points from PPDM introducing randomness is occasionally inappropriate for severe decisions, such as job application disclosure of identity isn’t problematic in FADM, generally 17
  • 18. Causal Inference [Pearl 2009] Causal inference a general technique for dealing with probabilistic causal relations Both FADM and causal inference can cope with the influence of sensitive information to an objective, but their premises are quite different While causal inference demands more information about causality among variables, it can handle the facts that were not occurred example of the connection between FADM and causal inference A person of S=+ actually receives Y=+ If S was changed to -, how was the probability of receiving Y=-? Under the condition called exogenous, the range of this probability is max[0, Pr[Y = + |S=+] Pr[Y = + |S= ]]  PNS  min[Pr[Y =+, X=+], Pr[Y = , X= ]] This lower bound coincides with a CV score 18
  • 19. Cost-Sensitive Learning [Elkan 2001] Cost-Sensitive Learning: learning classifiers so as to optimize classification costs, instead of maximizing prediction accuracies FADM can be regarded as a kind of cost-sensitive learning that pays the costs for taking fairness into consideration Cost matrix C(i | j): cost if a true class j is predicted as class i Total cost to minimize is formally defined as (if class Y=+ or -): P L(x, i) = j Pr[j | x]C(i | j) An object x is classified into the class i whose cost is minimized 19
  • 20. Cost-Sensitive Learning [Elkan 2001] Theorem 1 in [Elkan 2001] If negative examples in a data set is over-sampled by the factor of C(+| ) C( |+) and a classifier is learned from this samples, a classifier to optimize specified costs is obtained In a FADM case, an over-sampling technique is used for avoiding unfair treatments A corresponding cost matrix can be computed by this theorem, which connects a cost matrix and the class ratio in training data ✴Note that this over-sampling technique is simple and effective for avoiding unfair decisions, but its weak point that it completely ignores non-sensitive features 20
  • 21. Other Connected Techniques Legitimacy Data mining models can be deployed in the real world Independent Component Analysis Transformation while maintaining the independence between features Delegate Data To perform statistical tests, specific information is removed from data sets Dummy Query Dummy queries are inputted for protecting users’ demographics into search engines or recommender systems Visual Anonymization To protect identities of persons in images, faces or other information is blurred 21
  • 22. Conclusion Contributions We introduced the following topics to give an overview of fairness- aware data mining We showed other FADM applications besides avoiding discrimination: enhancing neutrality and ignoring uninteresting information. We discussed the relations between our notion of prejudice and other existing measures of unfairness. We showed the connections of FADM with privacy-preserving data mining, causal inference, cost-sensitive learning, and so on. Socially Responsible Mining Methods of data exploitation that do not damage people’s lives, such as fairness-aware data mining, PPDM, or adversarial learning, together comprise the notion of socially responsible mining, which it should become an important concept in the near future. 22
  • 23. Program Codes and Data Sets Fairness-Aware Data Mining http://www.kamishima.net/fadm Information Neutral Recommender System http://www.kamishima.net/inrs Acknowledgements This work is supported by MEXT/JSPS KAKENHI Grant Number 16700157, 21500154, 22500142, 23240043, and 24500194, and JST PRESTO 09152492 23