Mitigating unwanted biases with adversarial learning

2019.07.09

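- When the training data contains bias against demographic groups, a model trained on that data inherits the bias.
- To mitigate the bias, a predictor that predicts the target Y and an adversary that tries to model the protected attribute Z are trained simultaneously.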
- Measurements for Fairness:
> Demographic Parity
> Equality of Odds
> Equality of Opportunity
[Figure: X (data) -> predictor -> Y (prediction) -> adversary -> Z (protected attribute)]
Adversarial Debiasing

1. Demographic parity
2. Equality of Odds
3. Equality of Opportunity
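
For reference, the three criteria can be written as conditional-probability statements. This is a sketch using the usual notation (prediction $\hat{Y}$, true label $Y$, protected attribute $Z$); treating $Y = 1$ as the favorable class in the last line is an assumption, not something the slides specify.

```latex
\begin{align*}
\text{Demographic parity:} \quad
  & P(\hat{Y} = \hat{y} \mid Z = z) = P(\hat{Y} = \hat{y})
  && \text{for all } \hat{y}, z \\
\text{Equality of odds:} \quad
  & P(\hat{Y} = \hat{y} \mid Z = z, Y = y) = P(\hat{Y} = \hat{y} \mid Y = y)
  && \text{for all } \hat{y}, z, y \\
\text{Equality of opportunity:} \quad
  & P(\hat{Y} = \hat{y} \mid Z = z, Y = 1) = P(\hat{Y} = \hat{y} \mid Y = 1)
  && \text{for all } \hat{y}, z
\end{align*}
```

Equality of opportunity is the same condition as equality of odds, restricted to a single value of the true label.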

> Update the weights W to minimize $L_P(\hat{y}, y)$ (using stochastic gradient descent)
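
Plain SGD on the predictor loss L_P alone does nothing about bias. In the adversarial debiasing paper (Zhang et al., 2018), the predictor's update direction also involves the adversary loss L_A: the component of the L_P gradient that would help the adversary is projected out, and a term alpha * grad L_A is subtracted. The sketch below shows only that update direction on plain Python lists; the function names, `alpha`, and `lr` are illustrative choices, not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def debiased_update_direction(grad_p, grad_a, alpha):
    """Return grad_P - proj_{grad_A}(grad_P) - alpha * grad_A.

    grad_p: gradient of the predictor loss L_P with respect to W
    grad_a: gradient of the adversary loss L_A with respect to W
    alpha:  weight of the adversarial term (hypothetical hyperparameter name)
    """
    norm_sq = dot(grad_a, grad_a)
    if norm_sq == 0.0:
        # No adversary signal: fall back to the plain predictor gradient.
        return list(grad_p)
    scale = dot(grad_p, grad_a) / norm_sq  # projection coefficient
    return [gp - scale * ga - alpha * ga for gp, ga in zip(grad_p, grad_a)]


def sgd_step(weights, grad_p, grad_a, alpha=1.0, lr=0.1):
    """One gradient-descent step on W along the debiased direction."""
    direction = debiased_update_direction(grad_p, grad_a, alpha)
    return [w - lr * d for w, d in zip(weights, direction)]


# When grad_P is orthogonal to grad_A, nothing is projected out.
print(debiased_update_direction([0.0, 1.0], [1.0, 0.0], alpha=0.5))
```

The adversary's own weights would be updated separately to minimize L_A; that step is ordinary SGD and is omitted here.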
Word Embedding
> Word embeddings capture word similarity, but the problem is that these similarities also encode bias
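
One way to see how bias can sit inside similarity scores is to compare words against a "he minus she" direction. The sketch below uses hand-made 3-dimensional toy vectors; the numbers are hypothetical and chosen only for illustration, not taken from any trained embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (NOT from a real model).
emb = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "doctor": [0.4, 0.9, 0.1],
    "nurse": [-0.5, 0.8, 0.1],
}

# Direction separating the two gendered pronouns.
gender_dir = [a - b for a, b in zip(emb["he"], emb["she"])]


def gender_score(word):
    """Positive: closer to 'he'; negative: closer to 'she'."""
    return cosine(emb[word], gender_dir)


for word in ("doctor", "nurse"):
    print(word, round(gender_score(word), 3))
```

In this toy setup the occupation words lean toward opposite ends of the pronoun direction, which is the kind of pattern a debiasing method tries to remove.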

AI Fairness 360 (AIF360)
> An open-source toolkit released by IBM to correct bias that can arise when AI technology is put to use.
https://github.com/IBM/AIF360
- Describes algorithms and evaluation metrics for mitigating bias in datasets and models
- Tutorials and demo notebooks are publicly available

https://arxiv.org/pdf/1810.01943.pdf

Algorithm / Method
- Pre-processing: addresses bias in the data itself
> Re-weighing (Kamiran & Calders, 2012)
> Optimized pre-processing (Calmon et al., 2017)
> Learning fair representations (LFR) (Zemel et al., 2013)
> Disparate impact remover (Feldman et al., 2015)
- In-processing: addresses bias in the trained model caused by the weights given to particular features
> Adversarial debiasing (Zhang et al., 2018)
> Prejudice remover (Kamishima et al., 2012)
- Post-processing: addresses bias in the test dataset itself
> Equalized odds post-processing (Hardt et al., 2016)
> Calibrated eq. odds postprocessing (Pleiss et al., 2017)
> Reject option classification (Kamiran et al., 2012)

Adversarial Debiasing (in-processing)
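: Learns a classifier that maximizes prediction accuracy while reducing the adversary's ability to determine the protected attribute from the predictions.
In other words, the predictions cannot carry the group-discriminating information (privileged group vs. unprivileged group) that the adversary could exploit, so the resulting classifier is fair.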
Adult / Census Income Dataset
In-processing _ Adversarial Debiasing

- A dataset for predicting whether income exceeds $50K
- The goal is to enforce "Equality of Odds" for this model.
- Protected Attribute: Sex
- Privileged Group: Male / Unprivileged Group: Female

- Epoch: 50
- Batch_size: 128
- Plain model: without debias / model: with debias

Statistical Parity Difference
= Pr(Ŷ = 1 | unprivileged group) – Pr(Ŷ = 1 | privileged group)
Considered fair for values between -0.1 and 0.1.
Equal Opportunity Difference
= TPR(unprivileged group) – TPR(privileged group), where TPR is the true positive rate
value < 0: favors the privileged group / value > 0: favors the unprivileged group
Average Odds Difference
= [(FPR(unprivileged) – FPR(privileged)) + (TPR(unprivileged) – TPR(privileged))] / 2, where FPR is the false positive rate
value < 0: favors the privileged group / value > 0: favors the unprivileged group
Disparate Impact
= Pr(Ŷ = 1 | unprivileged group) / Pr(Ŷ = 1 | privileged group)
value < 1: favors the privileged group / value > 1: favors the unprivileged group
Considered fair for values between 0.8 and 1.2.
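
The four group metrics can be computed directly from binary labels, binary predictions, and a group indicator. The following is a minimal sketch in plain Python, assuming 1 marks both the favorable label and the privileged group; the function names and the tiny example data are made up for illustration and are not the AIF360 API.

```python
def rate(values):
    """Mean of a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def group_rates(y_true, y_pred, group, g):
    """Selection rate, TPR and FPR restricted to members of group g."""
    idx = [i for i, z in enumerate(group) if z == g]
    selection = rate([y_pred[i] for i in idx])
    tpr = rate([y_pred[i] for i in idx if y_true[i] == 1])
    fpr = rate([y_pred[i] for i in idx if y_true[i] == 0])
    return selection, tpr, fpr


def fairness_metrics(y_true, y_pred, group, privileged=1, unprivileged=0):
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, group, privileged)
    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, group, unprivileged)
    return {
        "statistical_parity_difference": sel_u - sel_p,
        "disparate_impact": sel_u / sel_p if sel_p else float("nan"),
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Made-up example: first four rows privileged, last four unprivileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]

for name, value in fairness_metrics(y_true, y_pred, group).items():
    print(f"{name}: {value:.3f}")
```

In this example every difference is negative and the ratio is below 1, so all four metrics indicate an advantage for the privileged group.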
Theil Index
= generalized entropy index
Measures inequality in outcomes across individual data points.
Perfect fairness = 0 (lower values are fairer / higher values are more problematic)
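
As a rough illustration, the Theil index can be computed as the generalized entropy index with alpha = 1 over per-individual "benefits". The sketch below assumes the benefit definition b_i = ŷ_i − y_i + 1 (the convention used for this metric in AIF360, as far as I can tell); the function name is illustrative.

```python
import math


def theil_index(y_true, y_pred):
    """Theil index over benefits b_i = y_pred_i - y_true_i + 1.

    Computes (1/n) * sum((b_i / mean) * ln(b_i / mean)),
    with the term for b_i == 0 taken as 0.
    """
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mean = sum(benefits) / len(benefits)
    total = 0.0
    for b in benefits:
        if b > 0:  # the limit of x * ln(x) as x -> 0 is 0
            ratio = b / mean
            total += ratio * math.log(ratio)
    return total / len(benefits)


# All predictions correct: every individual gets the same benefit.
print(theil_index([1, 0, 1], [1, 0, 1]))

# One false negative and one false positive: benefits are unequal.
print(theil_index([1, 0, 1, 0], [1, 0, 0, 1]))
```

When every prediction is correct, all benefits are equal and the index is 0, matching the "perfect fairness = 0" reading above.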

Privileged Group: Male / Unprivileged Group: Female
Accuracy: 82%
Accuracy: 70%
http://aif360.mybluemix.net/

Pre-processing _ Reweighing
The German credit dataset
> Protected attribute: AGE
> algorithm: Reweighing (pre-processing)
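(Transforms the dataset to reduce bias with respect to the protected attribute.)
In the training dataset, the privileged group receives 17% more positive outcomes than the unprivileged group.
This bias in the outcomes needs to be mitigated.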

After applying the Re-weighing (pre-processing) transformation,
the bias measured earlier is reduced to 0.
Re-weighing: assigns a weight to each (group, label) combination to ensure fairness before classification.
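
The weighting idea can be sketched in a few lines. Following Kamiran & Calders (2012), each (group, label) cell gets the weight P(group) * P(label) / P(group, label), so that group and label look statistically independent once the weights are applied. The code below is a plain-Python illustration on made-up data, not the AIF360 implementation; the function names are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight per (group, label) cell: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return {
        (g, l): (n_group[g] * n_label[l]) / (n * n_joint[(g, l)])
        for (g, l) in n_joint
    }


def weighted_positive_rate(groups, labels, weights, g):
    """Weighted share of positive labels inside group g."""
    num = sum(weights[(z, y)] for z, y in zip(groups, labels) if z == g and y == 1)
    den = sum(weights[(z, y)] for z, y in zip(groups, labels) if z == g)
    return num / den


# Made-up data: 1 = privileged group / favorable label.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_priv = weighted_positive_rate(groups, labels, weights, 1)
rate_unpriv = weighted_positive_rate(groups, labels, weights, 0)
print(weights)
print(rate_unpriv - rate_priv)
```

Before weighting, the privileged group has a positive rate of 0.75 against 0.25 for the unprivileged group; with the weights applied, both weighted rates coincide, so the difference in mean outcomes drops to 0.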

Post-processing _ Calibrated eq. odds postprocessing
The Adult / Census Income dataset
> Protected attribute: Sex
> algorithm: Calibrated_eq_odds postprocessing (post-processing)
Logistic regression

Post-processing
