Ensembles of example-dependent cost-sensitive
decision trees
April 28, 2015
Alejandro Correa Bahnsen
with
Djamila Aouada, SnT
Björn Ottersten, SnT
Motivation
• Classification: predicting the class of
a set of examples given their
features.
• Standard classification methods aim
at minimizing the errors
• Such a traditional framework
assumes that all misclassification
errors carry the same cost
• This is not the case in many real-world applications: Credit card
fraud detection, churn modeling, credit scoring, direct marketing.
• Cost-sensitive classification
Background, previous contributions
• Cost-sensitive Ensembles
Introduction, random inducers, combination methods, proposed algorithms
• Datasets
Credit card fraud detection, churn modeling, credit scoring, direct marketing
• Experiments
Experimental setup, results
• Conclusions
Contributions
Agenda
Predict the class of a set of examples S given their features,
where each element of S is composed of a feature vector and a class label.
It is usually evaluated using a traditional misclassification measure such as
accuracy, F1 score or AUC, among others.
However, these measures assume that different misclassification errors
carry the same cost.
Background - Binary classification
We define a cost measure based on the cost matrix [Elkan 2001],
from which we calculate the cost of applying a classifier to a given set.
Background - Cost-sensitive evaluation
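A minimal sketch of the cost measure described on this slide: for every example, the (true class, predicted class) pair selects one entry of that example's cost matrix, and the entries are summed over the set. The function name `total_cost`, the column order of `cost_mat` (false positive, false negative, true positive, true negative) and the three transactions are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Total example-dependent cost.

    cost_mat has one row per example; the column order
    [C_FP, C_FN, C_TP, C_TN] is an assumption of this sketch.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    c_fp, c_fn, c_tp, c_tn = np.asarray(cost_mat, dtype=float).T
    # Positives pay C_TP when caught and C_FN when missed;
    # negatives pay C_FP when flagged and C_TN otherwise.
    per_example = (y_true * (y_pred * c_tp + (1 - y_pred) * c_fn)
                   + (1 - y_true) * (y_pred * c_fp + (1 - y_pred) * c_tn))
    return float(per_example.sum())

# Three hypothetical transactions: a true positive, a false positive
# and a false negative whose amount (300) is lost.
y_true = [1, 0, 1]
y_pred = [1, 1, 0]
cost_mat = [[2, 100, 2, 0],
            [2,  50, 2, 0],
            [2, 300, 2, 0]]
print(total_cost(y_true, y_pred, cost_mat))  # 2 + 2 + 300
```

Because the false-negative cost differs per example, two classifiers with the same error count can have very different totals.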
However, the total cost may not be easy to interpret. Therefore, we propose
a savings measure that compares the cost of the classifier with the cost of
using no algorithm at all, where the latter is the cost of predicting the
costless class.
Background - Cost-sensitive evaluation
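The savings measure can be sketched as follows, assuming that "using no algorithm" means predicting every example as the cheaper of the two classes, and that savings are the relative reduction with respect to that baseline. The names `cost`, `savings`, the `[C_FP, C_FN, C_TP, C_TN]` column order and the sample data are illustrative assumptions.

```python
import numpy as np

def cost(y_true, y_pred, cost_mat):
    """Total cost; cost_mat columns are assumed to be [C_FP, C_FN, C_TP, C_TN]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    c_fp, c_fn, c_tp, c_tn = np.asarray(cost_mat, dtype=float).T
    return float(np.sum(y_true * (y_pred * c_tp + (1 - y_pred) * c_fn)
                        + (1 - y_true) * (y_pred * c_fp + (1 - y_pred) * c_tn)))

def savings(y_true, y_pred, cost_mat):
    """Relative cost reduction versus always predicting the costless class.

    Assumes the baseline cost is strictly positive.
    """
    n = len(y_true)
    cost_base = min(cost(y_true, np.zeros(n), cost_mat),   # predict all negative
                    cost(y_true, np.ones(n), cost_mat))    # predict all positive
    return (cost_base - cost(y_true, y_pred, cost_mat)) / cost_base

# Hypothetical data: one positive among five examples, fixed cost of 30
# for every alert, and the amount at risk as the false-negative cost.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1]
cost_mat = [[30, 10, 30, 0],
            [30, 20, 30, 0],
            [30, 15, 30, 0],
            [30, 40, 30, 0],
            [30, 100, 30, 0]]
print(savings(y_true, y_pred, cost_mat))
```

A value of 1 would mean that all costs are avoided, 0 that the classifier is no better than the baseline, and negative values that it is worse.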
Research in example-dependent cost-sensitive classification has been
narrow, mostly because of the lack of publicly available datasets [Aodha
and Brostow 2013].
Standard approaches consist in re-weighting the training examples based
on their costs:
• Cost-proportionate rejection sampling [Zadrozny et al. 2003]
• Cost-proportionate oversampling [Elkan 2001]
Background - State-of-the-art methods
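As an illustration of the re-weighting idea, the sketch below implements cost-proportionate rejection sampling under the usual reading of it: each example is kept with probability equal to its misclassification cost divided by the largest cost in the set. Only the rejection-sampling variant is shown. The function name, the `seed` parameter and the sample costs are assumptions for illustration.

```python
import numpy as np

def rejection_sample(costs, seed=0):
    """Indices kept by cost-proportionate rejection sampling.

    costs: per-example misclassification cost (non-negative, not all zero).
    """
    costs = np.asarray(costs, dtype=float)
    rng = np.random.default_rng(seed)
    accept_prob = costs / costs.max()          # most expensive example has probability 1
    keep = rng.random(costs.shape[0]) < accept_prob
    return np.flatnonzero(keep)

# Hypothetical costs: the zero-cost example is never kept,
# the most expensive one is always kept.
costs = [0.0, 5.0, 10.0, 2.0]
kept = rejection_sample(costs)
print(kept)
```

A cost-insensitive learner trained on the kept examples then sees expensive examples more often than cheap ones.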
• Bayes minimum risk
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost Sensitive Credit Card Fraud Detection
Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications.
Miami, USA: IEEE, Dec. 2013, pp. 333–338.
• Probability calibration for Bayes minimum risk (BMR)
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving Credit Card Fraud Detection with
Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining,
Philadelphia, USA, 2014, pp. 677 – 685.
• Cost-sensitive logistic regression (CSLR)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Logistic Regression for
Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA:
IEEE, 2014, pp. 263–269.
• Cost-sensitive decision trees (CSDT)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Decision Trees,” Expert
Systems with Applications, in press, 2015.
Previous contributions
The main idea behind the ensemble methodology is to combine several
individual base classifiers in order to have a classifier that outperforms
every one of them
John Godfrey Saxe’s “The Blind Men and the Elephant”
Introduction - Ensemble learning
[Figure: Model 1 to Model 6; some unknown distribution]
A typical ensemble is made by combining T different base classifiers. Each
base classifier is trained by applying algorithm M to a random subset of the
training set
Introduction - Ensemble learning
Random inducers
[Figure: subsamples drawn from the training set by each random inducer: bagging, pasting, random forest, random patches]
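The four random inducers differ in which examples and features each base classifier sees. The sketch below follows the usual definitions as we understand them: bagging draws examples with replacement, pasting draws a subset without replacement, random forest uses a bootstrap sample plus per-split feature sampling inside the tree (not shown here), and random patches samples both examples and features. The function `subsample` and its `frac` parameter are hypothetical names for illustration.

```python
import numpy as np

def subsample(n_examples, n_features, method, rng, frac=0.5):
    """Return (row indices, column indices) for one base classifier."""
    rows_all = np.arange(n_examples)
    cols_all = np.arange(n_features)
    if method in ("bagging", "random_forest"):
        # Bootstrap sample of the examples, all features kept.
        # Random forest additionally samples features at each split
        # inside the tree, which this sketch does not model.
        rows = rng.choice(rows_all, size=n_examples, replace=True)
        cols = cols_all
    elif method == "pasting":
        # Subset of the examples without replacement, all features kept.
        rows = rng.choice(rows_all, size=max(1, int(frac * n_examples)), replace=False)
        cols = cols_all
    elif method == "random_patches":
        # Bootstrap sample of the examples and a random subset of the features.
        rows = rng.choice(rows_all, size=n_examples, replace=True)
        cols = rng.choice(cols_all, size=max(1, int(frac * n_features)), replace=False)
    else:
        raise ValueError(f"unknown method: {method}")
    return rows, np.sort(cols)

rng = np.random.default_rng(0)
for method in ("bagging", "pasting", "random_forest", "random_patches"):
    rows, cols = subsample(8, 4, method, rng)
    print(method, rows, cols)
```

The examples not drawn for a given base classifier form its out-of-bag set, which can later be used to estimate that classifier's weight.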
After the base classifiers are constructed they are typically combined using
one of the following methods:
• Majority voting
• Proposed cost-sensitive weighted voting
Proposed combination methods
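The two voting rules can be sketched as follows for binary predictions. Majority voting gives every base classifier the same say, while the weighted variant scales each vote by a per-classifier weight (for example, its savings on out-of-bag examples). Resolving ties in favour of the positive class, and requiring non-negative weights with a positive sum, are assumptions of this sketch, as are the function names and data.

```python
import numpy as np

def majority_vote(preds):
    """preds: array of shape (T, n_examples) with 0/1 predictions."""
    preds = np.asarray(preds, dtype=float)
    return (preds.mean(axis=0) >= 0.5).astype(int)

def weighted_vote(preds, alphas):
    """Same as majority_vote, but classifier j votes with weight alphas[j]."""
    preds = np.asarray(preds, dtype=float)
    w = np.asarray(alphas, dtype=float)
    w = w / w.sum()                      # normalise the weights to sum to 1
    return (w @ preds >= 0.5).astype(int)

# Three hypothetical base classifiers, three examples.
preds = [[1, 0, 1],
         [0, 0, 1],
         [0, 1, 1]]
print(majority_vote(preds))                    # two of three say 0 for the first example
print(weighted_vote(preds, [0.6, 0.1, 0.1]))   # the first classifier dominates
```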
• Proposed cost-sensitive stacking
Using the cost-sensitive logistic regression (CSLR) model [Correa Bahnsen et al., 2014],
the weights β are estimated by minimizing the cost function J(S, M, β)
Proposed combination methods
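A rough sketch of the stacking idea, assuming that the combined score is a sigmoid of the β-weighted base predictions and that J is the average expected cost of that score, in the spirit of the cited CSLR model. The function names, the absence of an intercept term and the `[C_FP, C_FN, C_TP, C_TN]` column order are assumptions of this sketch; the minimization over β itself is not shown.

```python
import numpy as np

def stack_proba(preds, beta):
    """Sigmoid of the beta-weighted base predictions; preds has shape (T, n)."""
    z = np.asarray(beta, dtype=float) @ np.asarray(preds, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def stacking_cost(beta, preds, y_true, cost_mat):
    """Average expected cost J for the weights beta."""
    h = stack_proba(preds, beta)
    y = np.asarray(y_true, dtype=float)
    c_fp, c_fn, c_tp, c_tn = np.asarray(cost_mat, dtype=float).T
    per_example = (y * (h * c_tp + (1 - h) * c_fn)
                   + (1 - y) * (h * c_fp + (1 - h) * c_tn))
    return float(per_example.mean())

# Two hypothetical base classifiers and two examples.
preds = [[1, 0],
         [1, 1]]
y_true = [1, 0]
cost_mat = [[2, 100, 2, 0],
            [2,  50, 2, 0]]
print(stacking_cost([0.0, 0.0], preds, y_true, cost_mat))   # all scores equal 0.5
print(stacking_cost([4.0, -2.0], preds, y_true, cost_mat))  # trusts the first classifier
```

Any general-purpose optimizer could then search for the β that minimizes `stacking_cost`; which optimizer the authors used is not stated on the slide.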
The subsampling can be done by bagging, pasting, random forest or
random patches
Proposed algorithms
Base classifiers
For j in 1..T:
  1. Subsample from the training set
     S_j ← Subsample(S)
  2. Train a CSDT on S_j
     M_j ← M(S_j)
  3. Estimate the weight on the out-of-bag examples
     α_j ← savings(M_j(S_j^oob))
Combination
Select combination method:
  1. Majority voting
     H ← f_mv(S, M)
  2. CS-weighted voting
     H ← f_wv(S, M, α)
  3. CS-stacking
     β ← argmin_β J(S, M, β)
     H ← f_s(S, M, β)
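The base-classifier loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the subsampling is bagging-style, `fit` is a placeholder for any learner (a CSDT in the slides), `fit_stub` is a hypothetical stand-in that ignores its training data, and the cost-matrix column order `[C_FP, C_FN, C_TP, C_TN]` is assumed.

```python
import numpy as np

def savings(y_true, y_pred, cost_mat):
    """Relative cost reduction versus predicting the costless class (0.0 if that cost is 0)."""
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    def cost(pred):
        return np.sum(y_true * (pred * c_tp + (1 - pred) * c_fn)
                      + (1 - y_true) * (pred * c_fp + (1 - pred) * c_tn))
    base = min(cost(np.zeros(len(y_true))), cost(np.ones(len(y_true))))
    return 0.0 if base == 0 else float((base - cost(y_pred)) / base)

def train_base_classifiers(X, y, cost_mat, fit, T=10, seed=0):
    """fit(X, y, cost_mat) must return a callable: model(X) -> 0/1 predictions."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models, alphas = [], []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)             # 1. subsample S_j (bootstrap)
        oob = np.setdiff1d(np.arange(n), idx)        # examples left out of S_j
        model = fit(X[idx], y[idx], cost_mat[idx])   # 2. train M_j on S_j
        # 3. weight alpha_j = savings of M_j on its out-of-bag examples
        alpha = savings(y[oob], model(X[oob]), cost_mat[oob]) if oob.size else 0.0
        models.append(model)
        alphas.append(alpha)
    return models, np.array(alphas)

def fit_stub(X, y, cost_mat):
    """Hypothetical learner: always returns the same threshold rule."""
    return lambda Z: (Z[:, 0] >= 15).astype(int)

X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X[:, 0] >= 15).astype(int)
cost_mat = np.tile([5.0, 100.0, 5.0, 0.0], (20, 1))
models, alphas = train_base_classifiers(X, y, cost_mat, fit_stub, T=5)
print(alphas)
```

The returned `models` and `alphas` are the inputs that the combination step (majority voting, weighted voting or stacking) would consume.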
Cost matrix
Database
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost Sensitive Credit Card Fraud Detection
Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications.
Miami, USA: IEEE, Dec. 2013, pp. 333–338.
Credit card fraud detection
# Examples    % Positives    Cost (Euros)
1,638,772     0.21%          860,448
Cost matrix
Database
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “A novel cost-sensitive
framework for customer churn predictive modeling,” Decision Analytics, under review, 2015.
Churn modeling
# Examples    % Positives    Cost (Euros)
9,410         4.83%          580,884
Cost matrix
Database
A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Logistic Regression for
Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA:
IEEE, 2014, pp. 263–269.
Credit scoring
Database          # Examples    % Positives    Cost (Euros)
Kaggle Credit     112,915       6.74%          83,740,181
PAKDD09 Credit    38,969        19.88%         3,117,960
Cost matrix
Database
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving Credit Card Fraud Detection with
Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining,
Philadelphia, USA, 2014, pp. 677 – 685.
Direct marketing
# Examples    % Positives    Cost (Euros)
37,931        12.62%         59,507
• Cost-insensitive (CI):
  • Decision trees (DT)
  • Logistic regression (LR)
  • Random forest (RF)
  • Under-sampling (u)
• Cost-proportionate sampling (CPS):
  • Cost-proportionate rejection-sampling (r)
  • Cost-proportionate over-sampling (o)
• Bayes minimum risk (BMR)
• Cost-sensitive training (CST):
  • Cost-sensitive logistic regression (CSLR)
  • Cost-sensitive decision trees (CSDT)
Experimental setup - Methods
• Ensemble cost-sensitive decision trees (ECSDT):
  Random inducers:
  • Bagging (CSB)
  • Pasting (CSP)
  • Random forest (CSRF)
  • Random patches (CSRP)
  Combination:
  • Majority voting (mv)
  • Cost-sensitive weighted voting (wv)
  • Cost-sensitive stacking (s)
Experimental setup - Methods
• Each experiment was carried out 50 times
• A grid search was used to set the parameters of the algorithms
• Results are measured by savings
• The Friedman ranking is then calculated for each method
Experimental setup
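The ranking step can be illustrated with a short sketch: within each database the methods are ranked by savings (1 = best), and the ranks are then averaged over the databases. The function name, the no-ties assumption and the numbers in the example are hypothetical; they are not results from the experiments.

```python
import numpy as np

def average_ranks(savings_table):
    """Average rank per method (1 = best).

    savings_table: shape (n_databases, n_methods), higher savings is better.
    Assumes no ties within a database.
    """
    s = np.asarray(savings_table, dtype=float)
    order = np.argsort(-s, axis=1)                       # best method first in each row
    ranks = np.empty_like(order)
    rows = np.arange(s.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, s.shape[1] + 1)    # rank of each method per database
    return ranks.mean(axis=0)

# Hypothetical savings of three methods on two databases.
table = [[0.7, 0.5, 0.1],
         [0.2, 0.6, 0.4]]
print(average_ranks(table))
```

Averaging ranks rather than raw savings keeps a single database with unusually large savings from dominating the comparison.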
Results
Results of the Friedman rank of the savings (1=best, 28=worst)
Family    Algorithm    Rank
ECSDT     CSRP-wv-t    2.6
ECSDT     CSRP-s-t     3.4
ECSDT     CSRP-mv-t    4
ECSDT     CSB-wv-t     5.6
ECSDT     CSP-wv-t     7.4
ECSDT     CSB-mv-t     8.2
ECSDT     CSRF-wv-t    9.4
BMR       RF-t-BMR     9.4
ECSDT     CSP-s-t      9.6
ECSDT     CSP-mv-t     10.2
ECSDT     CSB-s-t      10.2
BMR       LR-t-BMR     11.2
CPS       RF-r         11.6
CST       CSDT-t       12.6
CST       CSLR-t       14.4
ECSDT     CSRF-mv-t    15.2
ECSDT     CSRF-s-t     16
CI        RF-u         17.2
CPS       LR-r         19
BMR       DT-t-BMR     19
CPS       LR-o         21
CPS       DT-r         22.6
CI        LR-u         22.8
CPS       RF-o         22.8
CI        DT-u         24.4
CPS       DT-o         25
CI        DT-t         26
CI        RF-t         26.2
Results
Results of the Friedman rank of the savings organized by family
Results
Percentage of the highest savings
Database     Algorithm    Savings
Fraud        CSRP-wv-t    0.73
Churn        CSRP-s-t     0.17
Credit1      CSRP-mv-t    0.52
Credit2      LR-t-BMR     0.31
Marketing    LR-t-BMR     0.5
Results within the ECSDT family
[Figures: results by combination method; results by random inducer]
• New framework for ensembles of example-dependent cost-sensitive
decision trees
• Using five databases from four real-world applications (credit card fraud
detection, churn modeling, credit scoring and direct marketing), we show
that the proposed algorithm significantly outperforms the state-of-the-art
cost-insensitive and example-dependent cost-sensitive algorithms
• We highlight the importance of using the real example-dependent financial
costs associated with real-world applications
Conclusions
Costcla - Software
CostCla is a Python module for cost-sensitive machine learning, built on
top of Scikit-Learn and SciPy and distributed under the 3-Clause BSD license.
In particular, it provides:
• A set of example-dependent cost-sensitive algorithms
• Different real-world example-dependent cost-sensitive datasets.
Installation
pip install costcla
Documentation: https://pythonhosted.org/costcla/
Development: https://github.com/albahnsen/CostSensitiveClassification
Thank You!!
Alejandro Correa Bahnsen

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 

Ensembles of example-dependent cost-sensitive decision trees: slides

  • 1. Ensembles of example-dependent cost-sensitive decision trees April 28, 2015 Alejandro Correa Bahnsen with Djamila Aouada, SnT Björn Ottersten, SnT
  • 2. Motivation • Classification: predicting the class of a set of examples given their features. • Standard classification methods aim at minimizing the errors. • Such a traditional framework assumes that all misclassification errors carry the same cost. • This is not the case in many real-world applications: credit card fraud detection, churn modeling, credit scoring, direct marketing.
  • 3. Agenda • Cost-sensitive classification: background, previous contributions • Cost-sensitive ensembles: introduction, random inducers, combination methods, proposed algorithms • Datasets: credit card fraud detection, churn modeling, credit scoring, direct marketing • Experiments: experimental setup, results • Conclusions: contributions
  • 4. Background - Binary classification: predict the class of a set of examples S given their features [equation], where each element of S is composed of [equation]. A classifier is usually evaluated using a traditional misclassification measure such as accuracy, F1-score or AUC, among others. However, these measures assume that different misclassification errors carry the same cost.
  • 5. Background - Cost-sensitive evaluation: We define a cost measure based on the cost matrix [Elkan 2001] [cost matrix], from which we calculate the cost of applying a classifier to a given set [equation].
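The formula on this slide is not preserved in the transcript. As a minimal sketch of the idea, assuming each example carries its own four cost-matrix entries, the total cost can be computed by picking, for every example, the entry selected by its true and predicted class and summing. The function name `total_cost`, the column order `(C_FP, C_FN, C_TP, C_TN)` and the toy numbers are illustrative assumptions, not taken from the slides.

```python
def total_cost(y_true, y_pred, cost_mat):
    """Sum of example-dependent costs.

    cost_mat[i] = (C_FP, C_FN, C_TP, C_TN) for example i
    (the column order is an assumption made for this sketch).
    """
    total = 0.0
    for y, c, (c_fp, c_fn, c_tp, c_tn) in zip(y_true, y_pred, cost_mat):
        if y == 1:
            total += c_tp if c == 1 else c_fn  # true positive or false negative
        else:
            total += c_fp if c == 1 else c_tn  # false positive or true negative
    return total


# Toy fraud-style data: a missed fraud costs the transaction amount,
# any raised alert costs a fixed fee of 2.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]
cost_mat = [
    (2.0, 100.0, 2.0, 0.0),
    (2.0, 50.0, 2.0, 0.0),
    (2.0, 300.0, 2.0, 0.0),
    (2.0, 20.0, 2.0, 0.0),
]
print(total_cost(y_true, y_pred, cost_mat))  # 2 + 0 + 300 + 2 = 304.0
```

Two classifiers with the same number of errors can therefore have very different costs, depending on which examples they get wrong.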
  • 6. Background - Cost-sensitive evaluation: However, the total cost may not be easy to interpret. Therefore, we propose a savings measure that compares the cost of the classifier with the cost of using no algorithm at all [equation], where the reference cost is the cost of predicting the costless class.
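A hedged sketch of the savings measure follows. It assumes that "using no algorithm at all" means predicting a single class for every example, and that the costless class is whichever of the two constant predictions is cheaper; the exact formula is not preserved in this transcript. The names `total_cost` and `savings`, the cost-matrix column order and the toy data are assumptions.

```python
def total_cost(y_true, y_pred, cost_mat):
    # cost_mat[i] = (C_FP, C_FN, C_TP, C_TN); the column order is an assumption
    total = 0.0
    for y, c, (c_fp, c_fn, c_tp, c_tn) in zip(y_true, y_pred, cost_mat):
        if y == 1:
            total += c_tp if c == 1 else c_fn
        else:
            total += c_fp if c == 1 else c_tn
    return total


def savings(y_true, y_pred, cost_mat):
    """Fraction of the no-model cost that the classifier saves."""
    n = len(y_true)
    cost_all_neg = total_cost(y_true, [0] * n, cost_mat)
    cost_all_pos = total_cost(y_true, [1] * n, cost_mat)
    cost_base = min(cost_all_neg, cost_all_pos)  # cost of the costless class
    if cost_base == 0:
        return 0.0  # nothing to save; avoids division by zero
    return (cost_base - total_cost(y_true, y_pred, cost_mat)) / cost_base


# Toy data: eight transactions, two frauds worth 100 and 200, alert fee 50.
amounts = [100, 10, 10, 10, 200, 10, 10, 10]
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0]  # catches only the larger fraud
cost_mat = [(50.0, float(a), 50.0, 0.0) for a in amounts]
print(savings(y_true, y_pred, cost_mat))  # (300 - 150) / 300 = 0.5
```

A value of 1 would mean all avoidable cost is removed, 0 means no better than the constant prediction, and negative values mean the classifier costs more than using none.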
  • 7. Background - State-of-the-art methods: Research in example-dependent cost-sensitive classification has been narrow, mostly because of the lack of publicly available datasets [Aodha and Brostow 2013]. Standard approaches consist of re-weighting the training examples based on their costs: • Cost-proportionate rejection sampling [Zadrozny et al. 2003] • Cost-proportionate oversampling [Elkan 2001]
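To make the first of these approaches concrete, here is a minimal sketch of cost-proportionate rejection sampling: each training example is kept with probability equal to its cost divided by a constant at least as large as the maximum cost, so expensive examples are over-represented in the resampled set. The function name, the use of the maximum cost as that constant, and the toy data are assumptions made for illustration.

```python
import random


def rejection_sample(examples, costs, rng=None):
    """Keep example i with probability costs[i] / max(costs).

    costs[i] is the (non-negative) misclassification cost of example i.
    """
    rng = rng or random.Random()
    z = max(costs)  # normalising constant; any value >= max(costs) would work
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


examples = ["a", "b", "c"]
costs = [10.0, 0.0, 5.0]
sample = rejection_sample(examples, costs, random.Random(0))
print(sample)  # "a" is always kept, "b" never, "c" about half of the time
```

A cost-insensitive learner trained on the resampled set then implicitly accounts for the costs.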
  • 8. Previous contributions • Bayes minimum risk: A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications. Miami, USA: IEEE, Dec. 2013, pp. 333–338. • Probability calibration for Bayes minimum risk (BMR): A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining, Philadelphia, USA, 2014, pp. 677–685. • Cost-sensitive logistic regression (CSLR): A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA: IEEE, 2014, pp. 263–269. • Cost-sensitive decision trees (CSDT): A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Decision Trees,” Expert Systems with Applications, in press, 2015.
  • 10. Introduction - Ensemble learning: The main idea behind the ensemble methodology is to combine several individual base classifiers in order to obtain a classifier that outperforms every one of them. Illustration: John Godfrey Saxe's “The Blind Men and the Elephant”. [Figure: Model 1 to Model 6; some unknown distribution]
  • 11. Introduction - Ensemble learning: A typical ensemble is made by combining T different base classifiers. Each base classifier is trained by applying algorithm M to a random subset of the training set [equation].
  • 13. Proposed combination methods: After the base classifiers are constructed, they are typically combined using one of the following methods: • Majority voting [equation] • Proposed cost-sensitive weighted voting [equation]
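The two voting rules can be sketched for a single example as follows. This is an illustration under assumptions: the formulas are not preserved in the transcript, the weights stand for per-classifier scores such as out-of-bag savings, and breaking ties in favour of class 0 is a choice made here, not something stated in the slides.

```python
def weighted_vote(votes, weights):
    """votes[j] is the 0/1 prediction of base classifier j for one example."""
    pos = sum(w for v, w in zip(votes, weights) if v == 1)
    neg = sum(w for v, w in zip(votes, weights) if v == 0)
    return 1 if pos > neg else 0  # ties go to class 0 (assumption)


def majority_vote(votes):
    # Majority voting is weighted voting with equal weights.
    return weighted_vote(votes, [1.0] * len(votes))


votes = [1, 0, 0]
weights = [0.7, 0.1, 0.2]  # hypothetical weights, e.g. normalised savings
print(majority_vote(votes))           # 0: two of three classifiers vote 0
print(weighted_vote(votes, weights))  # 1: the single strong classifier wins
```

The example shows how a cost-aware weight lets one reliable base classifier override a majority of weak ones.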
  • 14. Proposed combination methods • Proposed cost-sensitive stacking, using the cost-sensitive logistic regression model [Correa Bahnsen et al. 2014]: [equation]. The weights are then estimated using [equation].
  • 15. Proposed algorithms. The subsampling can be done by bagging, pasting, random forest or random patches.
      Base classifiers. For j in 1..T:
        1. Subsample from the training set: S_j ← Subsample(S)
        2. Train a CSDT on S_j: M_j ← M(S_j)
        3. Estimate the weight: α_j ← savings(M_j(S_j^oob))
      Combination. Select a combination method:
        1. Majority voting: H ← f_mv(S, M)
        2. CS-weighted voting: H ← f_wv(S, M, α)
        3. CS-stacking: β ← argmin J(S, M, β); H ← f_s(S, M, β)
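The loop on this slide can be sketched end to end in plain Python. This is a rough illustration under several assumptions, not the authors' implementation: bagging is used as the subsampling method, a one-feature threshold stump (`train_stump`) stands in for the CSDT, negative out-of-bag savings are clipped to zero, and the cost-matrix column order `(C_FP, C_FN, C_TP, C_TN)` is chosen for the example. All function names are hypothetical.

```python
import random


def total_cost(y_true, y_pred, cost_mat):
    # cost_mat[i] = (C_FP, C_FN, C_TP, C_TN); the column order is an assumption
    total = 0.0
    for y, c, (c_fp, c_fn, c_tp, c_tn) in zip(y_true, y_pred, cost_mat):
        if y == 1:
            total += c_tp if c == 1 else c_fn
        else:
            total += c_fp if c == 1 else c_tn
    return total


def savings(y_true, y_pred, cost_mat):
    n = len(y_true)
    base = min(total_cost(y_true, [0] * n, cost_mat),
               total_cost(y_true, [1] * n, cost_mat))
    if base == 0:
        return 0.0  # nothing to save; avoids division by zero
    return (base - total_cost(y_true, y_pred, cost_mat)) / base


def train_stump(X, y, cost_mat):
    # Hypothetical stand-in for a CSDT: cheapest single threshold on feature 0
    best_t, best_cost = None, float("inf")
    for t in sorted({x[0] for x in X}):
        cost = total_cost(y, [1 if x[0] >= t else 0 for x in X], cost_mat)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return lambda x: 1 if x[0] >= best_t else 0


def fit_ensemble(X, y, cost_mat, train, T=10, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models, alphas = [], []
    for _ in range(T):
        idx = [rng.randrange(n) for _ in range(n)]   # 1. bootstrap subsample
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        model = train([X[i] for i in idx], [y[i] for i in idx],
                      [cost_mat[i] for i in idx])    # 2. train base classifier
        alpha = 0.0
        if oob:                                      # 3. weight = oob savings
            pred = [model(X[i]) for i in oob]
            alpha = savings([y[i] for i in oob], pred,
                            [cost_mat[i] for i in oob])
        models.append(model)
        alphas.append(max(alpha, 0.0))  # clipping at zero is an assumption
    return models, alphas


def predict_weighted(models, alphas, x):
    pos = sum(a for m, a in zip(models, alphas) if m(x) == 1)
    neg = sum(a for m, a in zip(models, alphas) if m(x) == 0)
    return 1 if pos > neg else 0


# Toy data: one feature, positives are the values 10..19.
X = [[v] for v in range(20)]
y = [1 if v >= 10 else 0 for v in range(20)]
cost_mat = [(5.0, 100.0, 5.0, 0.0)] * 20
models, alphas = fit_ensemble(X, y, cost_mat, train_stump, T=25)
print(predict_weighted(models, alphas, [19]),
      predict_weighted(models, alphas, [0]))
```

Swapping the bootstrap line for sampling without replacement, or for sampling both rows and feature columns, would give pasting or random-patches style variants of the same loop.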
  • 17. Credit card fraud detection • [Cost matrix] • Database: 1,638,772 examples; 0.21% positives; cost 860,448 Euros • A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications. Miami, USA: IEEE, Dec. 2013, pp. 333–338.
  • 18. Churn modeling • [Cost matrix] • Database: 9,410 examples; 4.83% positives; cost 580,884 Euros • A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “A novel cost-sensitive framework for customer churn predictive modeling,” Decision Analytics, under review, 2015.
  • 19. Credit scoring • [Cost matrix] • Databases: Kaggle Credit: 112,915 examples; 6.74% positives; cost 83,740,181 Euros. PAKDD09 Credit: 38,969 examples; 19.88% positives; cost 3,117,960 Euros • A. Correa Bahnsen, D. Aouada, and B. Ottersten, “Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA: IEEE, 2014, pp. 263–269.
  • 20. Direct marketing • [Cost matrix] • Database: 37,931 examples; 12.62% positives; cost 59,507 Euros • A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining, Philadelphia, USA, 2014, pp. 677–685.
  • 22. Experimental setup - Methods • Cost-insensitive (CI): decision trees (DT), logistic regression (LR), random forest (RF), under-sampling (u) • Cost-proportionate sampling (CPS): cost-proportionate rejection sampling (r), cost-proportionate over-sampling (o) • Bayes minimum risk (BMR) • Cost-sensitive training (CST): cost-sensitive logistic regression (CSLR), cost-sensitive decision trees (CSDT)
  • 23. Experimental setup - Methods • Ensemble cost-sensitive decision trees (ECSDT) • Random inducers: bagging (CSB), pasting (CSP), random forest (CSRF), random patches (CSRP) • Combination: majority voting (mv), cost-sensitive weighted voting (wv), cost-sensitive stacking (s)
  • 24. Experimental setup • Each experiment was carried out 50 times • A grid search was performed over the parameters of the algorithms • Results are measured by savings • The Friedman ranking is then calculated for each method
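The ranking step can be illustrated with a small sketch: within each database the methods are ranked by savings (1 = best, ties share the average rank), and the ranks are then averaged across databases. The function name `average_ranks` and the savings values below are invented for the example and are not results from the slides.

```python
def average_ranks(scores):
    """scores[database][method] = savings; returns the mean rank per method."""
    methods = list(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for per_method in scores.values():
        ordered = sorted(methods, key=lambda m: per_method[m], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            # extend j over all methods tied with the one at position i
            while (j + 1 < len(ordered)
                   and per_method[ordered[j + 1]] == per_method[ordered[i]]):
                j += 1
            rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
            for m in ordered[i:j + 1]:
                totals[m] += rank
            i = j + 1
    return {m: totals[m] / len(scores) for m in methods}


# Made-up savings for three methods on two databases.
scores = {
    "db1": {"A": 0.7, "B": 0.5, "C": 0.5},
    "db2": {"A": 0.2, "B": 0.4, "C": 0.1},
}
ranks = average_ranks(scores)
print(ranks)  # A: (1 + 2) / 2, B: (2.5 + 1) / 2, C: (2.5 + 3) / 2
```

Ranking per database before averaging keeps a single database with very large savings from dominating the comparison.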
  • 25. Results: Friedman rank of the savings (1 = best, 28 = worst)
      Family  Algorithm   Rank
      ECSDT   CSRP-wv-t    2.6
      ECSDT   CSRP-s-t     3.4
      ECSDT   CSRP-mv-t    4
      ECSDT   CSB-wv-t     5.6
      ECSDT   CSP-wv-t     7.4
      ECSDT   CSB-mv-t     8.2
      ECSDT   CSRF-wv-t    9.4
      BMR     RF-t-BMR     9.4
      ECSDT   CSP-s-t      9.6
      ECSDT   CSP-mv-t    10.2
      ECSDT   CSB-s-t     10.2
      BMR     LR-t-BMR    11.2
      CPS     RF-r        11.6
      CST     CSDT-t      12.6
      CST     CSLR-t      14.4
      ECSDT   CSRF-mv-t   15.2
      ECSDT   CSRF-s-t    16
      CI      RF-u        17.2
      CPS     LR-r        19
      BMR     DT-t-BMR    19
      CPS     LR-o        21
      CPS     DT-r        22.6
      CI      LR-u        22.8
      CPS     RF-o        22.8
      CI      DT-u        24.4
      CPS     DT-o        25
      CI      DT-t        26
      CI      RF-t        26.2
  • 26. Results: Friedman rank of the savings, organized by family [figure]
  • 27. Results: Percentage of the highest savings
      Database   Algorithm   Savings
      Fraud      CSRP-wv-t   0.73
      Churn      CSRP-s-t    0.17
      Credit1    CSRP-mv-t   0.52
      Credit2    LR-t-BMR    0.31
      Marketing  LR-t-BMR    0.5
  • 28. Results within the ECSDT family [figures: by combination method; by random inducer]
  • 29. Conclusions • New framework for ensembles of example-dependent cost-sensitive decision trees • Using five databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we show that the proposed algorithm significantly outperforms the state-of-the-art cost-insensitive and example-dependent cost-sensitive algorithms • Highlight the importance of using the real example-dependent financial costs associated with the real-world applications
  • 30. Costcla - Software: CostCla is a Python module for cost-sensitive machine learning built on top of Scikit-Learn and SciPy, and distributed under the 3-Clause BSD license. In particular, it provides: • A set of example-dependent cost-sensitive algorithms • Different real-world example-dependent cost-sensitive datasets. Installation: pip install costcla. Documentation: https://pythonhosted.org/costcla/ Development: https://github.com/albahnsen/CostSensitiveClassification