Calibrating Probability with Undersampling
for Unbalanced Classification
Andrea Dal Pozzolo, Olivier Caelen,
Reid A. Johnson, and Gianluca Bontempi
8/12/2015
IEEE CIDM 2015
Cape Town, South Africa
INTRODUCTION
In several binary classification problems, the two classes
are not equally represented in the dataset.
In fraud detection, for example, fraudulent transactions are
rare compared to genuine ones (less than 1% [2]).
Many classification algorithms perform poorly with an
unbalanced class distribution [5].
A standard solution to unbalanced classification is to
rebalance the classes before training a classifier.
UNDERSAMPLING
Undersampling is a well-known technique used to balance
a dataset.
It consists of down-sizing the majority class by removing
observations at random until the dataset is balanced (a
minimal sketch follows below).
Several studies [11, 4] have reported that it improves
classification performance.
Most often, the consequences of undersampling on the
posterior probability of a classifier are ignored.
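To make the mechanics concrete, here is a minimal R sketch of random undersampling; the function name and column conventions are ours for illustration, not the paper's code:

```r
# Minimal sketch of random undersampling: keep every minority-class row
# and a same-sized random subset of majority-class rows, so the
# resulting dataset is balanced. (Illustrative helper, not the authors' code.)
undersample <- function(data, class_col = "y", minority = "+") {
  min_idx  <- which(data[[class_col]] == minority)
  maj_idx  <- which(data[[class_col]] != minority)
  keep_maj <- sample(maj_idx, size = length(min_idx))  # sampled rows have s = 1
  data[c(min_idx, keep_maj), ]
}
```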
OBJECTIVE OF THIS STUDY
In this work we:
formalize how undersampling works.
show that undersampling is responsible for a shift in the
posterior probability of a classifier.
study how this shift is linked to class separability.
investigate how this shift produces biased probability
estimates.
show how to obtain and use unbiased (calibrated)
probability for classification.
THE PROBLEM
Let us consider a binary classification task f : R^n → {+, −}.
X ∈ R^n is the input and Y ∈ {+, −} the output domain.
+ is the minority and − the majority class.
Given a classifier K and a training set T_N, we are interested
in estimating, for a new sample (x, y), the posterior
probability p(y = +|x).
EFFECT OF UNDERSAMPLING
Suppose that a classifier K is trained on a set T_N which is
unbalanced.
Let s be a random variable associated to each sample
(x, y) ∈ T_N, with s = 1 if the point is sampled and s = 0
otherwise.
Assume that s is independent of the input x given the class
y (class-dependent selection):

    p(s|y, x) = p(s|y) ⇔ p(x|y, s) = p(x|y)

Figure: Undersampling removes majority-class examples at random,
turning the unbalanced dataset into a balanced one. In red, the
samples removed from the unbalanced dataset (s = 0).
POSTERIOR PROBABILITIES
Let ps = p(+|x, s = 1) and p = p(+|x). We can write ps as a
function of p [1]:

    ps = p / (p + β(1 − p))    (1)

where β = p(s = 1|−). Using (1), we can obtain an expression for
p as a function of ps:

    p = β ps / (β ps − ps + 1)    (2)
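These two expressions translate directly into code. A small R sketch (helper names are ours) makes the warping and its inverse concrete:

```r
# Eq. (1): posterior after undersampling as a function of the original p,
# where beta = p(s = 1 | -) is the probability of keeping a negative example.
ps_from_p <- function(p, beta) p / (p + beta * (1 - p))

# Eq. (2): invert the warping to recover the unbiased posterior p.
p_from_ps <- function(ps, beta) beta * ps / (beta * ps - ps + 1)

# Round trip: recovering p from ps is exact up to floating point.
p <- 0.3; beta <- 0.1
ps_from_p(p, beta)                                  # ~0.81: undersampling inflates p
all.equal(p_from_ps(ps_from_p(p, beta), beta), p)   # TRUE
```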
WARPING AND CLASS SEPARABILITY
Let ω+ and ω− denote p(x|+) and p(x|−), and let π+ (πs+) be the
positive class prior before (after) undersampling. Using Bayes' theorem:

    p = ω+ π+ / (ω+ − δ π−)    (3)

where δ = ω+ − ω−. Similarly, since ω+ does not change with
undersampling:

    ps = ω+ πs+ / (ω+ − δ πs−)    (4)

Now we can write ps − p as:

    ps − p = ω+ πs+ / (ω+ − δ πs−) − ω+ π+ / (ω+ − δ π−)    (5)
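To see how the shift depends on class separability, here is a small numeric sketch of Eq. (5) (our helper; it uses π− = 1 − π+ throughout):

```r
# ps - p from Eqs. (3)-(5), with pi- = 1 - pi+ (sketch, not the paper's code).
shift <- function(omega_pos, delta, pi_pos, pi_pos_s) {
  p  <- omega_pos * pi_pos   / (omega_pos - delta * (1 - pi_pos))
  ps <- omega_pos * pi_pos_s / (omega_pos - delta * (1 - pi_pos_s))
  ps - p
}
shift(0.1, 0.10, pi_pos = 0.1, pi_pos_s = 0.5)  # 0: delta = omega+ means omega- = 0 (separable)
shift(0.1, 0.05, pi_pos = 0.1, pi_pos_s = 0.5)  # ~0.48: class overlap inflates ps relative to p
```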
WARPING AND CLASS SEPARABILITY
Figure: ps − p as a function of δ = ω+ − ω−, for values of
ω+ ∈ {0.01, 0.1}, with πs+ = 0.5 and π+ = 0.1.
WARPING AND CLASS SEPARABILITY II
Figure: (a) ps as a function of β; (b) class distribution.
Class distribution and posterior probability as a function of β
for two univariate binary classification tasks with normal class-conditional
densities X− ∼ N(0, σ) and X+ ∼ N(µ, σ) (µ = 3 on the left,
µ = 15 on the right; σ = 3 in both). Note that p
corresponds to β = 1 and ps to β < 1.
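The figure's setup is easy to reproduce. Below is an R sketch of the exact posteriors for the overlapping task; the unbalanced prior π+ = 0.1 is our illustrative choice, not stated on the slide:

```r
# Exact posteriors for X- ~ N(0, 3) and X+ ~ N(3, 3) via Bayes' theorem.
mu <- 3; sigma <- 3
posterior <- function(x, pi_pos) {
  lik_pos <- dnorm(x, mean = mu, sd = sigma)
  lik_neg <- dnorm(x, mean = 0,  sd = sigma)
  lik_pos * pi_pos / (lik_pos * pi_pos + lik_neg * (1 - pi_pos))
}
x  <- seq(-10, 20, by = 0.1)
p  <- posterior(x, 0.1)   # beta = 1: original unbalanced prior (assumed 0.1)
ps <- posterior(x, 0.5)   # beta < 1: balanced priors after full undersampling
```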
ADJUSTING POSTERIOR PROBABILITIES
We propose to correct ps with p′, which is obtained
using (2):

    p′ = β ps / (β ps − ps + 1)    (6)

Eq. (6) is a special case of the framework proposed by Saerens
et al. [8] and Elkan [3] (see Appendix in the paper).

Figure: Posterior probabilities ps, p′ and p for β = N+/N− in the dataset
with overlapping classes (µ = 3).
CLASSIFICATION THRESHOLD
Let r+ and r− be the risks of predicting an instance as positive
and as negative:

    r+ = (1 − p) · l1,0 + p · l1,1
    r− = (1 − p) · l0,0 + p · l0,1

where li,j is the cost of predicting class i when the true class is j, and
p = p(y = +|x). A sample is predicted as positive if
r+ ≤ r− [10]:

    ŷ = + if r+ ≤ r−, and ŷ = − if r+ > r−    (7)

Equivalently, predict as positive when p > τ, where:

    τ = (l1,0 − l0,0) / (l1,0 − l0,0 + l0,1 − l1,1)    (8)
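A one-line R sketch of the threshold in Eq. (8) (the helper name is ours):

```r
# Eq. (8): cost-based classification threshold; l[i,j] is the cost of
# predicting class i when the true class is j (1 = positive, 0 = negative).
tau_from_costs <- function(l10, l00, l01, l11) {
  (l10 - l00) / (l10 - l00 + l01 - l11)
}
tau_from_costs(l10 = 1, l00 = 0, l01 = 1, l11 = 0)  # 0.5 under symmetric costs
```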
CORRECTING THE CLASSIFICATION THRESHOLD
When the costs of a FN (l0,1) and a FP (l1,0) are unknown, we can
use the priors. Setting l1,0 = π+ and l0,1 = π−, from (8) we get:

    τ = l1,0 / (l1,0 + l0,1) = π+ / (π+ + π−) = π+    (9)

since π+ + π− = 1. Then we should use π+ as the threshold with p:

    p −→ τ = π+

Similarly:

    ps −→ τs = πs+

From Elkan [3]:

    (τ′ / (1 − τ′)) · ((1 − τs) / τs) = β    (10)

Therefore, we obtain:

    p′ −→ τ′ = π+
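A quick numeric check of Eq. (10) under full balancing (τs = πs+ = 0.5, β = N+/N−); the counts below are illustrative:

```r
# With tau_s = 0.5, Eq. (10) reduces to tau'/(1 - tau') = beta,
# i.e. tau' = beta / (1 + beta), which equals the original prior pi+.
N_pos <- 100; N_neg <- 900
beta  <- N_pos / N_neg            # p(s = 1 | -) that balances the classes
tau_s <- 0.5                      # threshold for ps on the balanced data
tau_p <- beta / (1 + beta)        # threshold for the corrected p'
tau_p                             # 0.1 = N+/(N+ + N-) = pi+
all.equal((tau_p / (1 - tau_p)) * ((1 - tau_s) / tau_s), beta)  # TRUE
```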
EXPERIMENTAL SETTINGS
We denote by ˆps, ˆp′ and ˆp the estimates of ps, p′ and p.
Goal: understand which probability returns the best
ranking (AUC), calibration (Brier Score, BS) and classification
accuracy (G-mean); the latter two metrics are sketched after this list.
We use 10-fold cross validation (CV) to test our models,
and we repeat the CV 10 times.
We test several classification algorithms: Random
Forest [7], SVM [6], and Logit Boost [9].
We consider real-world unbalanced datasets from the UCI
repository used in [1].
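Minimal R sketches of the two non-AUC metrics referenced above (helper names are ours):

```r
# Brier Score: mean squared error of the predicted probabilities (lower is better).
brier_score <- function(prob, y) mean((prob - y)^2)   # y coded in {0, 1}

# G-mean: geometric mean of sensitivity and specificity (higher is better).
g_mean <- function(pred, y) {                         # pred, y coded in {0, 1}
  tpr <- sum(pred == 1 & y == 1) / sum(y == 1)        # true positive rate
  tnr <- sum(pred == 0 & y == 0) / sum(y == 0)        # true negative rate
  sqrt(tpr * tnr)
}
```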
LEARNING FRAMEWORK
[Diagram: the unbalanced dataset is split into 10 folds. On each
training split, an unbalanced model yields ˆp (threshold τ), while
undersampling produces a balanced model yielding ˆps (threshold τs),
from which ˆp′ (threshold τ′) is derived.]

Figure: Learning framework for comparing models with and
without undersampling using Cross Validation (CV). We use one fold
of the CV as the testing set and the others for training, and iterate
so that every fold is used once for testing.
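A condensed R sketch of one fold of this framework with Random Forest [7]; the column name y, the 0/1 class coding, and the undersample helper from the earlier sketch are our assumptions:

```r
library(randomForest)

# One CV fold: produce the three probability estimates compared above.
run_fold <- function(train, test, beta) {
  # p-hat: unbalanced model, trained on the training fold as-is
  m_unb  <- randomForest(y ~ ., data = train)
  p_hat  <- predict(m_unb, test, type = "prob")[, "1"]

  # ps-hat: balanced model, trained after undersampling the majority class
  m_bal  <- randomForest(y ~ ., data = undersample(train, "y", minority = "1"))
  ps_hat <- predict(m_bal, test, type = "prob")[, "1"]

  # p'-hat: ps-hat corrected with Eq. (6)
  pp_hat <- beta * ps_hat / (beta * ps_hat - ps_hat + 1)

  list(p = p_hat, ps = ps_hat, pp = pp_hat)  # thresholds: tau, tau_s, tau'
}
```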
DATASETS
Table: Datasets from the UCI repository used in [1].
Datasets N N+ N− N+/N
ecoli 336 35 301 0.10
glass 214 17 197 0.08
letter-a 20000 789 19211 0.04
letter-vowel 20000 3878 16122 0.19
ism 11180 260 10920 0.02
letter 20000 789 19211 0.04
oil 937 41 896 0.04
page 5473 560 4913 0.10
pendigits 10992 1142 9850 0.10
PhosS 11411 613 10798 0.05
satimage 6430 625 5805 0.10
segment 2310 330 1980 0.14
boundary 3505 123 3382 0.04
estate 5322 636 4686 0.12
cam 18916 942 17974 0.05
compustat 13657 520 13137 0.04
covtype 38500 2747 35753 0.07
RESULTS
Table: Sum of ranks and p-values of the paired t-test between the
ranks of ˆp and ˆp′ and between ˆp and ˆps, for different metrics. In bold,
the probabilities with the best rank sum (higher for AUC and
G-mean, lower for BS).

Metric Algo Rˆp Rˆps Rˆp′ ρ(Rˆp, Rˆps) ρ(Rˆp, Rˆp′)
AUC LB 22,516 23,572 23,572 0.322 0.322
AUC RF 24,422 22,619 22,619 0.168 0.168
AUC SVM 19,595 19,902.5 19,902.5 0.873 0.873
G-mean LB 23,281 23,189.5 23,189.5 0.944 0.944
G-mean RF 22,986 23,337 23,337 0.770 0.770
G-mean SVM 19,550 19,925 19,925 0.794 0.794
BS LB 19,809.5 29,448.5 20,402 0.000 0.510
BS RF 18,336 28,747 22,577 0.000 0.062
BS SVM 17,139 23,161 19,100 0.001 0.156
RESULTS II
The rank sum is the same for ˆps and ˆp′ since (6) is
monotone.
Undersampling does not always improve the ranking
(AUC) or classification accuracy (G-mean) of an algorithm.
ˆp is the probability estimate with the best calibration
(lowest rank sum for BS).
ˆp′ always has better calibration than ˆps, so we should use
ˆp′ instead of ˆps.
CREDIT CARDS DATASET
Real-world credit card dataset with transactions from September 2013;
frauds account for 0.172% of all transactions.
Figure: AUC (top) and Brier Score (bottom) as a function of β for
p, p′ and ps, using LB, RF and SVM on the Credit-card dataset.
CONCLUSION
As a result of undersampling, ˆps is shifted away from ˆp.
This shift is stronger for overlapping distributions and grows
as β gets smaller.
Using (6), we can remove the shift in ˆps and obtain ˆp′,
which has better calibration.
ˆp′ provides the same ranking quality as ˆps.
Results on the UCI and credit card datasets show that using
ˆp′ with τ′ we are able to improve calibration without losing
predictive accuracy.
Credit card dataset: http://www.ulb.ac.be/di/map/adalpozz/data/creditcard.Rdata
Website: www.ulb.ac.be/di/map/adalpozz
Email: adalpozz@ulb.ac.be
Thank you for your attention.
Research is supported by the Doctiris scholarship
funded by Innoviris, Brussels, Belgium.
BIBLIOGRAPHY
[1] A. Dal Pozzolo, O. Caelen, and G. Bontempi.
When is undersampling effective in unbalanced classification tasks?
In Machine Learning and Knowledge Discovery in Databases. Springer, 2015.
[2] A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and G. Bontempi.
Learned lessons in credit card fraud detection from a practitioner perspective.
Expert Systems with Applications, 41(10):4915–4928, 2014.
[3] C. Elkan.
The foundations of cost-sensitive learning.
In International Joint Conference on Artificial Intelligence, volume 17, pages 973–978, 2001.
[4] A. Estabrooks, T. Jo, and N. Japkowicz.
A multiple resampling method for learning from imbalanced data sets.
Computational Intelligence, 20(1):18–36, 2004.
[5] N. Japkowicz and S. Stephen.
The class imbalance problem: A systematic study.
Intelligent Data Analysis, 6(5):429–449, 2002.
[6] A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis.
kernlab – an S4 package for kernel methods in R.
Journal of Statistical Software, 11(9), 2004.
[7] A. Liaw and M. Wiener.
Classification and regression by randomForest.
R News, 2(3):18–22, 2002.
[8] M. Saerens, P. Latinne, and C. Decaestecker.
Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure.
Neural computation, 14(1):21–41, 2002.
[9] J. Tuszynski.
caTools: moving window statistics, GIF, Base64, ROC AUC, etc.
R package version 1.16, 2013.
[10] V. N. Vapnik.
Statistical Learning Theory, volume 1.
Wiley, New York, 1998.
[11] G. M. Weiss and F. Provost.
The effect of class distribution on classifier learning: an empirical study.
Technical report, Rutgers University, 2001.
