SlideShare a Scribd company logo
1 of 12
Download to read offline
Dealing with imbalanced data in RTB
Yuya Kanemoto
Table of contents
1. Introduction
2. Methods
2.1 Re-sampling
2.2 Cost-sensitive learning
3. Tools in practice
4. Reference
1
Introduction
A classifier can predict the class labels of new data after the
training. Proportion of class labels for the training can be
imbalanced in real-world data sets, and imbalanced data makes the
training difficult for a classifier. This is the case for Real-Time
Bidding (RTB) framework in online advertisement, and there are
several ways to deal with the problem to improve the performance
of the classifier.
2
Methods: Re-sampling
Re-sampling can deal with the imbalanced data by balancing the
proportion of class labels
• Under-sampling the majority class
• Over-sampling the minority class
• Combining over- and under-sampling
• Create ensemble balanced sets
3
Methods: Calibration after re-sampling
There are several ways to calibrate the output probability from a
classifier after the re-sampling
• Isotonic regression
minimize
∑
i wi (yi − ˆyi )2
subject to ˆymin = ˆy1 ≤ ˆy2... ≤ ˆyn = ˆymax
• Calibration factor for negative under-sampling
q = p
p+(1−p)/w
• q: calibrated probability
• p: prediction in under-sampling space
• w: under-sampling rate
4
Methods: Calibration after re-sampling
• Probability calibration should be done on new data not used
for model fitting
• Logistic regression returns well calibrated predictions by
default as it directly optimizes log-loss
5
Cost-sensitive learning
Actual positive Actual negative
Predict positive C(0,0) C(0,1)
Predict negative C(1,0) C(1,1)
• Cost-sensitive learning takes the misclassification costs into
consideration
• R(i|x) =
∑
j P(j|x)C(i, j)
• expected cost R(i|x) of classifying an instance x into class i
• Classifier will classify an instance x into positive class if and
only if:
P(0|x)C(1, 0) ≤ P(1|x)C(0, 1) assumig C(0, 0) = C(1, 1) = 0
6
Cost-sensitive learning types: Thresholding
• Thresholding method modifies the threshold (0.5 by defalut)
to label the class considering the costs
p∗ = C(1,0)
C(1,0)+C(0,1)
• threshold p∗ for the classifier to classify an instance x into
positive if P(1|x) ≥ p∗
7
Cost-sensitive learning types: Sampling
• Re-sampling method described above can be considered as a
part of cost-sensitive learning
• Positive and negative examples are sampled by the ratio of:
p(1)FN : p(0)FP
• p(1) and p(0) are the prior probability of the positive and
negative examples in the original training set
8
Cost-sensitive learning types: Weighting
• Weighting method assigns a normalized weight to each
instance according to the misclassification costs
• This can be considered as a part of Sampling method as
example with high weights (for rare class with high costs) can
be viewed as example duplication - thus sampling
• Weighting method can utilize all data unlike Sampling method
9
Tools in practice: Xgboost
• Balance the positive and negative weights via
scale-pos-weight if you care only about the ranking order of
your prediction
• typically by inserting sum(negative/major samples)
sum(positive/rare samples)
• Use AUC for evaluation. Utility [Chapelle O 2015] can also be
considered as a metric in RTB
• If you care about predicting the right probability, you cannot
re-balance the data
• setting parameter max-delta-step to a finite number (like 1)
will help convergence
10
Reference
• Offline Evaluation of Response Prediction in Online
Advertising Auctions Categories and Subject Descriptors,
Chapelle O 2015
• XGBoost, Chen T et al. 2016
• Practical Lessons from Predicting Clicks on Ads at Facebook,
He X et al. 2014
• Cost-sensitive learning and the class imbalance problem, Ling
C et al. 2008
• Cost-sensitive Learning for Utility Optimization in Online
Advertising Auctions, Vasile F et al. 2016
11

More Related Content

Similar to Dealing with imbalanced data in RTB

Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data萍華 楊
 
13ClassifierPerformance.pdf
13ClassifierPerformance.pdf13ClassifierPerformance.pdf
13ClassifierPerformance.pdfssuserdce5c21
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit ivmalathieswaran29
 
datamining-lect11.pptx
datamining-lect11.pptxdatamining-lect11.pptx
datamining-lect11.pptxRithikRaj25
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionMargaret Wang
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavAgile Testing Alliance
 
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Maninda Edirisooriya
 
DMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsDMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsPier Luca Lanzi
 
Split Criterions for Variable Selection Using Decision Trees
Split Criterions for Variable Selection Using Decision TreesSplit Criterions for Variable Selection Using Decision Trees
Split Criterions for Variable Selection Using Decision TreesNTNU
 
Using Tree algorithms on machine learning
Using Tree algorithms on machine learningUsing Tree algorithms on machine learning
Using Tree algorithms on machine learningRajasekhar364622
 
Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluationkhairulhuda242
 
17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx
17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx
17_00-Dima-Panchenko-cnn-tips-and-tricks.pptxMahmoudAbuGhali
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationPier Luca Lanzi
 

Similar to Dealing with imbalanced data in RTB (20)

evaluation and credibility-Part 2
evaluation and credibility-Part 2evaluation and credibility-Part 2
evaluation and credibility-Part 2
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data
 
13ClassifierPerformance.pdf
13ClassifierPerformance.pdf13ClassifierPerformance.pdf
13ClassifierPerformance.pdf
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit iv
 
datamining-lect11.pptx
datamining-lect11.pptxdatamining-lect11.pptx
datamining-lect11.pptx
 
evaluation and credibility-Part 1
evaluation and credibility-Part 1evaluation and credibility-Part 1
evaluation and credibility-Part 1
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...
 
DMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification ModelsDMTM 2015 - 14 Evaluation of Classification Models
DMTM 2015 - 14 Evaluation of Classification Models
 
Split Criterions for Variable Selection Using Decision Trees
Split Criterions for Variable Selection Using Decision TreesSplit Criterions for Variable Selection Using Decision Trees
Split Criterions for Variable Selection Using Decision Trees
 
Using Tree algorithms on machine learning
Using Tree algorithms on machine learningUsing Tree algorithms on machine learning
Using Tree algorithms on machine learning
 
Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluation
 
17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx
17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx
17_00-Dima-Panchenko-cnn-tips-and-tricks.pptx
 
Lecture1.ppt
Lecture1.pptLecture1.ppt
Lecture1.ppt
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 

Recently uploaded

Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 

Recently uploaded (20)

Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 

Dealing with imbalanced data in RTB

  • 1. Dealing with imbalanced data in RTB Yuya Kanemoto
  • 2. Table of contents 1. Introduction 2. Methods 2.1 Re-sampling 2.2 Cost-sensitive learning 3. Tools in practice 4. Reference 1
  • 3. Introduction A classifier can predict the class labels of new data after the training. Proportion of class labels for the training can be imbalanced in real-world data sets, and imbalanced data makes the training difficult for a classifier. This is the case for Real-Time Bidding (RTB) framework in online advertisement, and there are several ways to deal with the problem to improve the performance of the classifier. 2
  • 4. Methods: Re-sampling Re-sampling can deal with the imbalanced data by balancing the proportion of class labels • Under-sampling the majority class • Over-sampling the minority class • Combining over- and under-sampling • Create ensemble balanced sets 3
  • 5. Methods: Calibration after re-sampling There are several ways to calibrate the output probability from a classifier after the re-sampling • Isotonic regression minimize ∑ i wi (yi − ˆyi )2 subject to ˆymin = ˆy1 ≤ ˆy2... ≤ ˆyn = ˆymax • Calibration factor for negative under-sampling q = p p+(1−p)/w • q: calibrated probability • p: prediction in under-sampling space • w: under-sampling rate 4
  • 6. Methods: Calibration after re-sampling • Probability calibration should be done on new data not used for model fitting • Logistic regression returns well calibrated predictions by default as it directly optimizes log-loss 5
  • 7. Cost-sensitive learning Actual positive Actual negative Predict positive C(0,0) C(0,1) Predict negative C(1,0) C(1,1) • Cost-sensitive learning takes the misclassification costs into consideration • R(i|x) = ∑ j P(j|x)C(i, j) • expected cost R(i|x) of classifying an instance x into class i • Classifier will classify an instance x into positive class if and only if: P(0|x)C(1, 0) ≤ P(1|x)C(0, 1) assumig C(0, 0) = C(1, 1) = 0 6
  • 8. Cost-sensitive learning types: Thresholding • Thresholding method modifies the threshold (0.5 by defalut) to label the class considering the costs p∗ = C(1,0) C(1,0)+C(0,1) • threshold p∗ for the classifier to classify an instance x into positive if P(1|x) ≥ p∗ 7
  • 9. Cost-sensitive learning types: Sampling • Re-sampling method described above can be considered as a part of cost-sensitive learning • Positive and negative examples are sampled by the ratio of: p(1)FN : p(0)FP • p(1) and p(0) are the prior probability of the positive and negative examples in the original training set 8
  • 10. Cost-sensitive learning types: Weighting • Weighting method assigns a normalized weight to each instance according to the misclassification costs • This can be considered as a part of Sampling method as example with high weights (for rare class with high costs) can be viewed as example duplication - thus sampling • Weighting method can utilize all data unlike Sampling method 9
  • 11. Tools in practice: Xgboost • Balance the positive and negative weights via scale-pos-weight if you care only about the ranking order of your prediction • typically by inserting sum(negative/major samples) sum(positive/rare samples) • Use AUC for evaluation. Utility [Chapelle O 2015] can also be considered as a metric in RTB • If you care about predicting the right probability, you cannot re-balance the data • setting parameter max-delta-step to a finite number (like 1) will help convergence 10
  • 12. Reference • Offline Evaluation of Response Prediction in Online Advertising Auctions Categories and Subject Descriptors, Chapelle O 2015 • XGBoost, Chen T et al. 2016 • Practical Lessons from Predicting Clicks on Ads at Facebook, He X et al. 2014 • Cost-sensitive learning and the class imbalance problem, Ling C et al. 2008 • Cost-sensitive Learning for Utility Optimization in Online Advertising Auctions, Vasile F et al. 2016 11