EXAMPLE-DEPENDENT COST-SENSITIVE
CLASSIFICATION
applications in financial risk modeling
and marketing analytics
September 15, 2015
Alejandro Correa Bahnsen
with
Djamila Aouada, SnT
BjΓΆrn Ottersten, SnT
Motivation
2
β€’ Classification: predicting the class of
a set of examples given their
features.
β€’ Standard classification methods aim at minimizing the number of misclassification errors
β€’ Such a traditional framework
assumes that all misclassification
errors carry the same cost
β€’ This is not the case in many real-world applications: Credit card
fraud detection, churn modeling, credit scoring, direct marketing.
(Figure: a set of examples plotted in a two-dimensional feature space, x1 vs. x2)
Motivation
3
β€’ Credit card fraud detection: failing to detect a fraudulent transaction may have an economic impact ranging from a few to thousands of Euros, depending on the particular transaction and card holder.
β€’ Credit scoring: accepting a loan from a bad customer does not always imply the same economic loss, since customers have different credit lines and, therefore, different profits.
β€’ Churn modeling: misidentifying a profitable or an unprofitable churner leads to significantly different economic results.
β€’ Direct marketing: wrongly predicting that a customer will not accept an offer when in fact he will has a different financial impact for each customer, as not all customers generate the same profit.
β€’ Motivation
β€’ Cost-sensitive classification
Background
β€’ Real-world cost-sensitive applications
Credit card fraud detection, churn modeling, credit scoring, direct marketing
β€’ Proposed cost-sensitive algorithms
Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees,
ensembles of cost-sensitive decision trees
β€’ Experiments
Experimental setup, results
β€’ Conclusions
Contributions, future work
Agenda
4
Predict the class of a set of examples given their features
$f: \mathcal{S} \rightarrow \{0, 1\}$
where each element of $\mathcal{S}$ is composed of $X_i = [x_i^1, x_i^2, \ldots, x_i^k]$
A classifier is usually evaluated using traditional misclassification measures such as accuracy, F1-score or AUC, among others.
However, these measures assume that different misclassification errors carry the same cost
Background - Binary classification
5
We define a cost measure based on the cost matrix [Elkan 2001]
From which we calculate the π‘ͺ𝒐𝒔𝒕 of applying a classifier to a given set
πΆπ‘œπ‘ π‘‘ 𝑓 𝑆 =
𝑖=1
𝑁
𝑦𝑖 𝑐𝑖 𝐢 𝑇𝑃 𝑖
+ 1 βˆ’ 𝑐𝑖 𝐢 𝐹𝑁 𝑖
+ 1 βˆ’ 𝑦𝑖 𝑐𝑖 𝐢 𝐹𝑃 𝑖
+ 1 βˆ’ 𝑐𝑖 𝐢 𝑇𝑁 𝑖
Background - Cost-sensitive evaluation
6
                             | Actual Positive (y_i = 1) | Actual Negative (y_i = 0)
Predicted Positive (c_i = 1) | C_TP_i                    | C_FP_i
Predicted Negative (c_i = 0) | C_FN_i                    | C_TN_i
However, the total cost may not be easy to interpret. Therefore, we propose a Savings measure, defined as the cost of the classifier compared with the cost of using no algorithm at all
$Savings(f(\mathcal{S})) = \dfrac{Cost_l(\mathcal{S}) - Cost(f(\mathcal{S}))}{Cost_l(\mathcal{S})}$
where $Cost_l(\mathcal{S})$ is the cost of predicting the costless class
$Cost_l(\mathcal{S}) = \min \big\{ Cost(f_0(\mathcal{S})), Cost(f_1(\mathcal{S})) \big\}$
Background - Cost-sensitive evaluation
7
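To make these measures concrete, here is a minimal NumPy sketch of the Cost and Savings formulas above; the per-example cost-matrix layout (one row of [C_FP, C_FN, C_TP, C_TN] per example) is an illustrative assumption, not a prescribed format.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Example-dependent cost of a set of predictions.

    cost_mat: array of shape (n, 4) with per-example costs, assumed
              column order [C_FP, C_FN, C_TP, C_TN].
    """
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    return np.sum(y_true * (y_pred * c_tp + (1 - y_pred) * c_fn) +
                  (1 - y_true) * (y_pred * c_fp + (1 - y_pred) * c_tn))

def savings(y_true, y_pred, cost_mat):
    """Savings of a classifier versus predicting the costless class."""
    n = len(y_true)
    cost_l = min(total_cost(y_true, np.zeros(n), cost_mat),   # f_0: all negative
                 total_cost(y_true, np.ones(n), cost_mat))    # f_1: all positive
    return (cost_l - total_cost(y_true, y_pred, cost_mat)) / cost_l
```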
Research in example-dependent cost-sensitive classification has been limited, mostly because of the lack of publicly available datasets [Aodha and Brostow 2013].
Standard approaches consist of re-weighting the training examples based on their costs (a minimal sketch of rejection sampling follows this slide):
β€’ Cost-proportionate rejection sampling [Zadrozny et al. 2003]
β€’ Cost-proportionate oversampling [Elkan 2001]
Background - State-of-the-art methods
8
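As an illustration of the re-weighting idea, below is a minimal sketch of cost-proportionate rejection sampling in the spirit of [Zadrozny et al. 2003]: each training example is kept with probability proportional to its misclassification cost, so costly examples are over-represented in the accepted sample. Passing the per-example cost as a single vector is a simplifying assumption of this sketch.

```python
import numpy as np

def rejection_sample(X, y, cost, seed=42):
    """Cost-proportionate rejection sampling (sketch).

    Keeps example i with probability cost[i] / max(cost), so the accepted
    training set is approximately re-weighted by the example costs.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(len(y)) < (cost / cost.max())
    return X[keep], y[keep]
```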
β€’ Motivation
β€’ Cost-sensitive classification
Background
β€’ Real-world cost-sensitive applications
Credit card fraud detection, churn modeling, credit scoring, direct marketing
β€’ Proposed cost-sensitive algorithms
Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees,
ensembles of cost-sensitive decision trees
β€’ Experiments
Experimental setup, results
β€’ Conclusions
Contributions, future work
Agenda
9
Estimate the probability of a transaction being fraudulent, based on the analysis of customer patterns and recent fraudulent behavior
Issues when constructing a fraud detection system [Bolton et al., 2002]:
β€’ Skewness of the data
β€’ Cost-sensitivity
β€’ Short time response of the system
β€’ Dimensionality of the search space
β€’ Feature preprocessing
Credit card fraud detection
10
Credit card fraud detection is a cost-sensitive problem, as the cost due to a false positive is different from the cost of a false negative.
β€’ False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost.
β€’ False negatives: when a fraud is not detected, the amount of that transaction is lost.
Moreover, it is not enough to assume a constant cost difference between
false positives and false negatives, as the amount of the transactions varies
quite significantly.
Credit card fraud detection
11
Cost matrix
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, β€œCost Sensitive Credit Card Fraud Detection
Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications.
Miami, USA: IEEE, Dec. 2013, pp. 333–338.
Credit card fraud detection
12
                             | Actual Positive (y_i = 1) | Actual Negative (y_i = 0)
Predicted Positive (c_i = 1) | C_TP_i = C_a              | C_FP_i = C_a
Predicted Negative (c_i = 0) | C_FN_i = Amt_i            | C_TN_i = 0
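A minimal sketch of how this cost matrix can be assembled per transaction; the administrative cost value and the column order [C_FP, C_FN, C_TP, C_TN] are illustrative assumptions.

```python
import numpy as np

def fraud_cost_matrix(amounts, c_admin=2.5):
    """Per-transaction cost matrix: C_FP = C_TP = administrative cost C_a,
    C_FN = transaction amount, C_TN = 0 (assumed column order
    [C_FP, C_FN, C_TP, C_TN])."""
    n = len(amounts)
    c_a = np.full(n, c_admin)
    return np.column_stack([c_a, np.asarray(amounts, dtype=float), c_a, np.zeros(n)])
```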
Raw features
Credit card fraud detection
13
Attribute name   | Description
Transaction ID   | Transaction identification number
Time             | Date and time of the transaction
Account number   | Identification number of the customer
Card number      | Identification of the credit card
Transaction type | e.g., Internet, ATM, POS, ...
Entry mode       | e.g., chip and PIN, magnetic stripe, ...
Amount           | Amount of the transaction in Euros
Merchant code    | Identification of the merchant type
Merchant group   | Merchant group identification
Country          | Country of the transaction
Country 2        | Country of residence
Type of card     | e.g., Visa debit, Mastercard, American Express, ...
Gender           | Gender of the card holder
Age              | Card holder age
Bank             | Issuer bank of the card
Transaction aggregation strategy [Whitrow, 2008]
Credit card fraud detection
14
Raw Features
TrxId | Time      | Type | Country | Amt
1     | 1/1 18:20 | POS  | Lux     | 250
2     | 1/1 20:35 | POS  | Lux     | 400
3     | 1/1 22:30 | ATM  | Lux     | 250
4     | 2/1 00:50 | POS  | Ger     | 50
5     | 2/1 19:18 | POS  | Ger     | 100
6     | 2/1 23:45 | POS  | Ger     | 150
7     | 3/1 06:00 | POS  | Lux     | 10

Aggregated Features
No Trx last 24h | Amt last 24h | No Trx last 24h same type and country | Amt last 24h same type and country
0 | 0   | 0 | 0
1 | 250 | 1 | 250
2 | 650 | 0 | 0
3 | 900 | 0 | 0
3 | 700 | 1 | 50
2 | 150 | 2 | 150
3 | 400 | 0 | 0
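A minimal pandas sketch of this aggregation strategy for a single card (in practice the same computation is done per account number): for every transaction, count and sum the transactions in the preceding 24 hours, overall and restricted to the same type and country. Column names follow the toy table above; the explicit loop is for clarity, not efficiency.

```python
import pandas as pd

def aggregate_last_24h(df):
    """Per-transaction aggregated features over the previous 24 hours (sketch).

    df: DataFrame with columns Time (datetime), Type, Country, Amt,
        holding the transactions of one card.
    """
    df = df.sort_values("Time").reset_index(drop=True)
    feats = []
    for _, trx in df.iterrows():
        window = df[(df["Time"] < trx["Time"]) &
                    (df["Time"] >= trx["Time"] - pd.Timedelta(hours=24))]
        same = window[(window["Type"] == trx["Type"]) &
                      (window["Country"] == trx["Country"])]
        feats.append({"no_trx_24h": len(window),
                      "amt_24h": window["Amt"].sum(),
                      "no_trx_24h_same": len(same),
                      "amt_24h_same": same["Amt"].sum()})
    return pd.DataFrame(feats)
```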
Proposed periodic features
When is a customer expected to make a new transaction?
Considering a von Mises distribution with a period of 24 hours, such that
$P(time) \sim vonMises(\mu, \sigma) = \dfrac{e^{\sigma \cos(time - \mu)}}{2 \pi I_0(\sigma)}$
where $\mu$ is the mean, $\sigma$ is the standard deviation, and $I_0$ is the modified Bessel function of order 0
Credit card fraud detection
15
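Below is a minimal sketch of this idea using scipy's von Mises distribution: the hours of a card holder's past transactions are mapped to angles, a von Mises distribution is fitted, and the density at the time of a new transaction quantifies how expected that time is. The mapping, parameter handling and the absence of a confidence-interval test are simplifying assumptions of this sketch, not the exact procedure of the paper.

```python
import numpy as np
from scipy.stats import vonmises

def hour_to_angle(hours):
    """Map hour-of-day in [0, 24) to an angle in [-pi, pi)."""
    return (np.asarray(hours, dtype=float) / 24.0) * 2 * np.pi - np.pi

def periodic_feature(past_hours, new_hour):
    """Density of the new transaction time under the fitted von Mises (sketch)."""
    theta = hour_to_angle(past_hours)
    kappa, loc, _ = vonmises.fit(theta, fscale=1)   # standard circular fit
    return vonmises.pdf(hour_to_angle(new_hour), kappa, loc=loc)

# Example: a customer who usually buys in the evening, new transaction at 3 AM
print(periodic_feature([18.5, 19.0, 20.25, 21.0, 19.75], 3.0))
```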
Proposed periodic features
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, β€œFeature Engineering Strategies for
Credit Card Fraud Detection” submitted to Expert Systems with Applications.
Credit card fraud detection
16
Classify which potential customers are likely to default on a contracted financial obligation, based on the customer's past financial experience.
It is a cost-sensitive problem as the cost associated with approving a bad
customer, i.e., false negative, is quite different from the cost associated
with declining a good customer, i.e., false positive. Furthermore, the costs
are not constant among customers. This is because loans have different
credit line amounts, terms, and even interest rates.
Credit scoring
17
Cost matrix
A. Correa Bahnsen, D. Aouada, and B. Ottersten, β€œExample-Dependent Cost-Sensitive Logistic Regression for
Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA:
IEEE, 2014, pp. 263–269.
Credit scoring
18
                             | Actual Positive (y_i = 1) | Actual Negative (y_i = 0)
Predicted Positive (c_i = 1) | C_TP_i = 0                | C_FP_i = r_i + C_FP^a
Predicted Negative (c_i = 0) | C_FN_i = Cl_i Β· L_gd      | C_TN_i = 0
Predict the probability of a customer defecting, using historical, behavioral and socioeconomic information.
This tool is of great benefit to subscription-based companies, allowing them to maximize the results of retention campaigns.
Churn modeling
19
Churn management campaign [Verbraken, 2013]
Churn modeling
20
(Figure: churn campaign flow β€” an inflow of new customers joins the customer base; the active customers are classified by the churn model into predicted churners (TP: actual churners, FP: actual non-churners) and predicted non-churners (FN: actual churners, TN: actual non-churners); churners who are not retained, with probability 1 βˆ’ Ξ³, leave as effective churners in the outflow)
Proposed financial evaluation of a churn campaign
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, β€œA novel cost-sensitive
framework for customer churn predictive modeling,” Decision Analytics, vol. 2:5, 2015.
Churn modeling
21
(Figure: the same campaign flow annotated with the financial outcome of each path β€” 0, CLV, CLV + C_a and C_o + C_a β€” where CLV is the customer lifetime value, C_o the cost of the retention offer and C_a the cost of contacting the customer)
Cost matrix
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, β€œA novel cost-sensitive
framework for customer churn predictive modeling,” Decision Analytics, vol. 2:5, 2015.
Churn modeling
22
                             | Actual Positive (y_i = 1)                        | Actual Negative (y_i = 0)
Predicted Positive (c_i = 1) | C_TP_i = Ξ³_i Β· C_o_i + (1 βˆ’ Ξ³_i) Β· (CLV_i + C_a) | C_FP_i = C_o_i + C_a
Predicted Negative (c_i = 0) | C_FN_i = CLV_i                                   | C_TN_i = 0
Classify those customers who are more likely to have a certain response to a marketing campaign.
This problem is example-dependent cost-sensitive, as false positives carry the cost of contacting the client, and false negatives carry the cost of the income lost by failing to make the offer to the right customer.
Direct marketing
23
Cost matrix
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, β€œImproving Credit Card Fraud Detection with
Calibrated Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining,
Philadelphia, USA, 2014, pp. 677 – 685.
Direct marketing
24
                             | Actual Positive (y_i = 1) | Actual Negative (y_i = 0)
Predicted Positive (c_i = 1) | C_TP_i = C_a              | C_FP_i = C_a
Predicted Negative (c_i = 0) | C_FN_i = Int_i            | C_TN_i = 0
β€’ Motivation
β€’ Cost-sensitive classification
Background
β€’ Real-world cost-sensitive applications
Credit card fraud detection, churn modeling, credit scoring, direct marketing
β€’ Proposed cost-sensitive algorithms
Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees,
ensembles of cost-sensitive decision trees
β€’ Experiments
Experimental setup, results
β€’ Conclusions
Contributions, future work
Agenda
25
β€’ Bayes minimum risk (BMR)
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, β€œCost Sensitive Credit Card Fraud Detection
Using Bayes Minimum Risk,” in 2013 12th International Conference on Machine Learning and Applications.
Miami, USA: IEEE, Dec. 2013, pp. 333–338.
A. Correa Bahnsen, D. Aouada, and B. Ottersten, β€œImproving Credit Card Fraud Detection with Calibrated
Probabilities,” in Proceedings of the fourteenth SIAM International Conference on Data Mining, Philadelphia,
USA, 2014, pp. 677 – 685.
β€’ Cost-sensitive logistic regression (CSLR)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, β€œExample-Dependent Cost-Sensitive Logistic Regression for
Credit Scoring,” in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA:
IEEE, 2014, pp. 263–269.
β€’ Cost-sensitive decision trees (CSDT)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, β€œExample-Dependent Cost-Sensitive Decision Trees,” Expert
Systems with Applications, vol. 42:19, 2015.
β€’ Ensembles of cost-sensitive decision trees (ECSDT)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, β€œEnsemble of Example-Dependent Cost-Sensitive Decision
Trees,” IEEE Transactions on Knowledge and Data Engineering, vol. under review, 2015.
Proposed cost-sensitive algorithms
26
Decision model based on quantifying tradeoffs between various decisions
using probabilities and the costs that accompany such decisions
Risk of classification
$R(c_i = 0 | x_i) = C_{TN_i} \cdot (1 - p_i) + C_{FN_i} \cdot p_i$
$R(c_i = 1 | x_i) = C_{FP_i} \cdot (1 - p_i) + C_{TP_i} \cdot p_i$
Using the different risks the prediction is made based on the following condition:
$c_i = \begin{cases} 0 & \text{if } R(c_i = 0 | x_i) \le R(c_i = 1 | x_i) \\ 1 & \text{otherwise} \end{cases}$
Bayes Minimum Risk
27
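A minimal sketch of the Bayes minimum risk rule on top of any probability estimate: for each example the two risks are compared using the estimated positive-class probability p_i and the example-dependent costs (assumed column order [C_FP, C_FN, C_TP, C_TN]). In practice the probabilities should be calibrated first, as discussed in the calibrated-probabilities paper cited in this deck.

```python
import numpy as np

def bayes_minimum_risk(p, cost_mat):
    """Predict the class with the lowest expected cost for each example.

    p: estimated probability of the positive class, shape (n,).
    cost_mat: per-example costs, columns [C_FP, C_FN, C_TP, C_TN].
    """
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    risk_neg = c_tn * (1 - p) + c_fn * p   # risk of predicting c_i = 0
    risk_pos = c_fp * (1 - p) + c_tp * p   # risk of predicting c_i = 1
    return (risk_pos < risk_neg).astype(int)
```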
Cost-Sensitive Logistic Regression
28
β€’ Logistic Regression Model
$p_i = P(y_i = 1 | x_i) = h_\theta(x_i) = g\Big( \sum_{j=1}^{k} \theta^j x_i^j \Big)$
β€’ Cost Function
$J_i(\theta) = -y_i \log\big(h_\theta(x_i)\big) - (1 - y_i) \log\big(1 - h_\theta(x_i)\big)$
β€’ Cost Analysis
$J_i(\theta) \approx \begin{cases} 0 & \text{if } y_i \approx h_\theta(x_i) \\ \infty & \text{if } y_i \approx 1 - h_\theta(x_i) \end{cases}$
i.e., the logistic loss behaves as if $C_{TP_i} = C_{TN_i} \approx 0$ and $C_{FP_i} = C_{FN_i} \approx \infty$
Cost-Sensitive Logistic Regression
29
β€’ Actual Costs
$J_i^c(\theta) = \begin{cases} C_{TP_i} & \text{if } y_i = 1 \text{ and } h_\theta(x_i) \approx 1 \\ C_{TN_i} & \text{if } y_i = 0 \text{ and } h_\theta(x_i) \approx 0 \\ C_{FP_i} & \text{if } y_i = 0 \text{ and } h_\theta(x_i) \approx 1 \\ C_{FN_i} & \text{if } y_i = 1 \text{ and } h_\theta(x_i) \approx 0 \end{cases}$
β€’ Proposed Cost-Sensitive Function
$J^c(\theta) = \frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \big( h_\theta(x_i) \, C_{TP_i} + (1 - h_\theta(x_i)) \, C_{FN_i} \big) + (1 - y_i) \big( h_\theta(x_i) \, C_{FP_i} + (1 - h_\theta(x_i)) \, C_{TN_i} \big) \Big]$
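A minimal sketch of fitting this cost-sensitive logistic regression by minimizing the proposed objective directly with a generic scipy optimizer; the choice of optimizer and the absence of an intercept and regularization are simplifying assumptions of this sketch, not the procedure of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cs_logistic_cost(theta, X, y, cost_mat):
    """Proposed cost-sensitive objective J^c(theta); cost_mat columns
    are assumed to be [C_FP, C_FN, C_TP, C_TN]."""
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    h = sigmoid(X @ theta)
    return np.mean(y * (h * c_tp + (1 - h) * c_fn) +
                   (1 - y) * (h * c_fp + (1 - h) * c_tn))

def fit_cs_logistic_regression(X, y, cost_mat):
    theta0 = np.zeros(X.shape[1])
    res = minimize(cs_logistic_cost, theta0, args=(X, y, cost_mat), method="BFGS")
    return res.x
```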
β€’ A decision tree is a classification model that iteratively creates binary decision rules $(x^j, l_m^j)$ that maximize a certain criterion (gain, entropy, ...), where $(x^j, l_m^j)$ refers to making a rule using feature $j$ on value $m$.
β€’ Maximizing the accuracy is not the same as minimizing the cost.
β€’ To solve this, some studies have proposed methods that aim to introduce cost-sensitivity into the algorithm [Lomax 2013]. However, research has focused on class-dependent methods.
β€’ We propose:
β€’ An example-dependent cost-based impurity measure
β€’ An example-dependent cost-based pruning criteria
Cost-Sensitive Decision trees
30
Cost-Sensitive Decision trees
31
Proposed cost-based impurity measure
$S^l = \{ x_i \,|\, x_i \in S \wedge x_i^j \le l_m^j \} \qquad S^r = \{ x_i \,|\, x_i \in S \wedge x_i^j > l_m^j \}$
β€’ The impurity of each leaf is calculated using:
$I_c(S) = \min \big\{ Cost(f_0(S)), Cost(f_1(S)) \big\}$
$f(S) = \begin{cases} 0 & \text{if } Cost(f_0(S)) \le Cost(f_1(S)) \\ 1 & \text{otherwise} \end{cases}$
β€’ Afterwards, the gain of applying a given rule to the set $S$ is:
$Gain_c(x^j, l_m^j) = I_c(\pi_1) - \big( I_c(\pi_1^l) + I_c(\pi_1^r) \big)$
(Diagram: a node $S$ is split by the rule $(x^j, l_m^j)$ into a left and a right child node)
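A minimal sketch of the cost-based impurity and gain defined above, reusing the total_cost helper from the earlier Cost/Savings sketch; following the formula on this slide, the gain is the parent impurity minus the unweighted sum of the children's impurities.

```python
import numpy as np

def cost_impurity(y, cost_mat):
    """I_c(S): cost of the cheaper of the two constant predictions on S."""
    n = len(y)
    return min(total_cost(y, np.zeros(n), cost_mat),   # f_0: predict all negative
               total_cost(y, np.ones(n), cost_mat))    # f_1: predict all positive

def cost_gain(y, cost_mat, left_mask):
    """Gain of splitting S into S^l (left_mask) and S^r (the rest)."""
    parent = cost_impurity(y, cost_mat)
    left = cost_impurity(y[left_mask], cost_mat[left_mask])
    right = cost_impurity(y[~left_mask], cost_mat[~left_mask])
    return parent - (left + right)
```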
Cost-Sensitive Decision trees
32
Decision trees construction
β€’ The rule that maximizes the gain is selected
𝑏𝑒𝑠𝑑 π‘₯, 𝑏𝑒𝑠𝑑𝑙 = π‘Žπ‘Ÿπ‘” max
𝑗,π‘š
πΊπ‘Žπ‘–π‘› π‘₯ 𝑗, 𝑙 π‘š
𝑗
S
S S
S S S S
S S S S
β€’ The process is repeated until a stopping criteria is met:
Cost-Sensitive Decision trees
33
Proposed cost-sensitive pruning criteria
β€’ Calculation of the Tree savings and pruned Tree savings
$PC_c = \dfrac{Cost(f(S, Tree)) - Cost(f(S, EB(Tree, branch)))}{|Tree| - |EB(Tree, branch)|}$
where $EB(Tree, branch)$ denotes the tree with that branch pruned
β€’ After calculating the pruning criteria for all possible prunings, the maximum improvement is selected and the tree is pruned.
β€’ The process is then repeated until there is no further improvement.
(Diagram: branches of the tree are successively replaced by leaves as long as the savings improve)
A typical ensemble is made by combining T different base classifiers, each trained by applying algorithm M to a random subset of the training set
Ensembles of Cost-Sensitive Decision trees
34
$M_j \leftarrow M(\mathcal{S}_j) \quad \forall j \in \{1, \ldots, T\}$
The core principle of ensemble learning is to induce random perturbations into the learning procedure in order to produce several different base classifiers from a single training set, and then to combine these base classifiers to make the final prediction.
Ensembles of Cost-Sensitive Decision trees
35
Ensembles of Cost-Sensitive Decision trees
36
(Figure: how the training set is subsampled to build each base classifier under Bagging, Pasting, Random forest and Random patches)
After the base classifiers are constructed they are typically combined using
one of the following methods:
β€’ Majority voting
β€’ Majority voting
$H(S) = f_{mv}(S, M) = \arg\max_{c \in \{0,1\}} \sum_{j=1}^{T} \mathbf{1}_c(M_j(S))$
β€’ Proposed cost-sensitive weighted voting
$H(S) = f_{wv}(S, M, \alpha) = \arg\max_{c \in \{0,1\}} \sum_{j=1}^{T} \alpha_j \, \mathbf{1}_c(M_j(S))$
with either the standard accuracy-based weights
$\alpha_j = \dfrac{1 - \epsilon\big(M_j(S_j^{oob})\big)}{\sum_{j_1=1}^{T} \Big(1 - \epsilon\big(M_{j_1}(S_{j_1}^{oob})\big)\Big)}$
or the proposed savings-based weights
$\alpha_j = \dfrac{Savings\big(M_j(S_j^{oob})\big)}{\sum_{j_1=1}^{T} Savings\big(M_{j_1}(S_{j_1}^{oob})\big)}$
Ensembles of Cost-Sensitive Decision trees
37
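A minimal sketch of the savings-weighted voting: each base classifier receives a weight proportional to its savings on its out-of-bag examples, and the final prediction is the weighted vote. Reusing the savings helper from the earlier sketch and clipping negative savings to zero are illustrative assumptions.

```python
import numpy as np

def savings_weights(models, oob_sets):
    """alpha_j proportional to the savings of model j on its out-of-bag set.

    oob_sets: list of (X_oob, y_oob, cost_mat_oob) tuples, one per model.
    """
    s = np.array([max(savings(y, m.predict(X), cm), 0.0)
                  for m, (X, y, cm) in zip(models, oob_sets)])
    return s / s.sum()

def weighted_vote(models, alphas, X):
    """Cost-sensitive weighted voting over the base classifiers."""
    votes = np.array([m.predict(X) for m in models])    # shape (T, n), values in {0, 1}
    score_pos = (alphas[:, None] * votes).sum(axis=0)   # total weight voting for class 1
    return (score_pos >= 0.5).astype(int)               # alphas sum to 1
```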
β€’ Proposed cost-sensitive stacking
$H(S) = f_s(S, M, \beta) = \dfrac{1}{1 + e^{-\sum_{j=1}^{T} \beta_j M_j(S)}}$
Using the cost-sensitive logistic regression [Correa et al., 2014] model:
$J(S, M, \beta) = \sum_{i=1}^{N} \Big[ y_i \big( f_s(S, M, \beta) (C_{TP_i} - C_{FN_i}) + C_{FN_i} \big) + (1 - y_i) \big( f_s(S, M, \beta) (C_{FP_i} - C_{TN_i}) + C_{TN_i} \big) \Big]$
Then the weights are estimated using
$\beta = \arg\min_{\beta} J(S, M, \beta)$
Ensembles of Cost-Sensitive Decision trees
38
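A minimal sketch of the cost-sensitive stacking step: the base classifiers' outputs are combined through a logistic function whose weights Ξ² minimize the cost-sensitive objective above. It reuses cs_logistic_cost from the cost-sensitive logistic regression sketch, applied to the matrix of base-classifier predictions; the choice of optimizer is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

def fit_cs_stacking(models, X, y, cost_mat):
    """Estimate beta by minimizing the cost-sensitive objective on the
    base classifiers' predictions (sketch)."""
    Z = np.column_stack([m.predict(X) for m in models])   # shape (n, T)
    beta0 = np.zeros(Z.shape[1])
    res = minimize(cs_logistic_cost, beta0, args=(Z, y, cost_mat), method="BFGS")
    return res.x

def predict_cs_stacking(models, beta, X):
    Z = np.column_stack([m.predict(X) for m in models])
    return (1.0 / (1.0 + np.exp(-Z @ beta)) >= 0.5).astype(int)
```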
β€’ Motivation
β€’ Cost-sensitive classification
Background
β€’ Real-world cost-sensitive applications
Credit card fraud detection, churn modeling, credit scoring, direct marketing
β€’ Proposed cost-sensitive algorithms
Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees,
ensembles of cost-sensitive decision trees
β€’ Experiments
Experimental setup, results
β€’ Conclusions
Contributions, future work
Agenda
39
Experimental setup - Datasets
40
Database         | # Examples | % Positives | Cost (Euros)
Fraud            | 1,638,772  | 0.21%       | 860,448
Churn            | 9,410      | 4.83%       | 580,884
Kaggle Credit    | 112,915    | 6.74%       | 8,740,181
PAKDD09 Credit   | 38,969     | 19.88%      | 3,117,960
Direct Marketing | 37,931     | 12.62%      | 59,507
β€’ Cost-insensitive (CI):
β€’ Decision trees (DT)
β€’ Logistic regression (LR)
β€’ Random forest (RF)
β€’ Under-sampling (u)
β€’ Cost-proportionate sampling (CPS):
β€’ Cost-proportionate rejection-sampling (r)
β€’ Cost-proportionate over-sampling (o)
β€’ Bayes minimum risk (BMR)
β€’ Cost-sensitive training (CST):
β€’ Cost-sensitive logistic regression (CSLR)
β€’ Cost-sensitive decision trees (CSDT)
Experimental setup - Methods
41
β€’ Ensemble cost-sensitive decision trees (ECSDT):
Random inducers:
β€’ Bagging (CSB)
β€’ Pasting (CSP)
β€’ Random forest (CSRF)
β€’ Random patches (CSRP)
Combination:
β€’ Majority voting (mv)
β€’ Cost-sensitive weighted voting (wv)
β€’ Cost-sensitive stacking (s)
Experimental setup - Methods
42
β€’ Each experiment was carried out 50 times
β€’ The parameters of each algorithm were tuned using a grid search
β€’ Results are measured by savings and F1-score
β€’ The Friedman ranking is then calculated for each method
Experimental setup
43
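For reference, a minimal sketch of how an average (Friedman-style) ranking across datasets can be computed from a table of savings; the table layout is an illustrative assumption.

```python
import numpy as np
from scipy.stats import rankdata

def friedman_ranks(savings_table):
    """Average rank of each method across datasets (1 = best).

    savings_table: array of shape (n_datasets, n_methods) holding the savings
    of every method on every dataset; higher savings means a lower (better) rank.
    """
    ranks = np.vstack([rankdata(-row) for row in savings_table])
    return ranks.mean(axis=0)
```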
Results
44
Percentage of the highest savings
Database  | % Pos. | Algorithm | Savings | Savings (Euros)
Fraud     | 0.21   | CSRP-wv-t | 0.73    | 628,127
Churn     | 4.83   | CSRP-s-t  | 0.17    | 98,750
Credit1   | 6.74   | CSRP-mv-t | 0.52    | 4,544,894
Credit2   | 19.88  | LR-t-BMR  | 0.31    | 966,568
Marketing | 12.62  | LR-t-BMR  | 0.51    | 30,349
Results
45
Results of the Friedman rank of the savings (1 = best, 28 = worst)
Family | Algorithm | Rank
ECSDT  | CSRP-wv-t | 2.6
ECSDT  | CSRP-s-t  | 3.4
ECSDT  | CSRP-mv-t | 4
ECSDT  | CSB-wv-t  | 5.6
ECSDT  | CSP-wv-t  | 7.4
ECSDT  | CSB-mv-t  | 8.2
ECSDT  | CSRF-wv-t | 9.4
BMR    | RF-t-BMR  | 9.4
ECSDT  | CSP-s-t   | 9.6
ECSDT  | CSP-mv-t  | 10.2
ECSDT  | CSB-s-t   | 10.2
BMR    | LR-t-BMR  | 11.2
CPS    | RF-r      | 11.6
CST    | CSDT-t    | 12.6
CST    | CSLR-t    | 14.4
ECSDT  | CSRF-mv-t | 15.2
ECSDT  | CSRF-s-t  | 16
CI     | RF-u      | 17.2
CPS    | LR-r      | 19
BMR    | DT-t-BMR  | 19
CPS    | LR-o      | 21
CPS    | DT-r      | 22.6
CI     | LR-u      | 22.8
CPS    | RF-o      | 22.8
CI     | DT-u      | 24.4
CPS    | DT-o      | 25
CI     | DT-t      | 26
CI     | RF-t      | 26.2
Results
46
Results of the Friedman rank of the savings organized by family
Results within the ECSDT family
47
By random inducer | By combination method
Results
48
Comparison of the Friedman ranking of the savings and F1Score sorted by
F1Score ranking
β€’ Motivation
β€’ Cost-sensitive classification
Background
β€’ Real-world cost-sensitive applications
Credit card fraud detection, churn modeling, credit scoring, direct marketing
β€’ Proposed cost-sensitive algorithms
Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees,
ensembles of cost-sensitive decision trees
β€’ Experiments
Experimental setup, results
β€’ Conclusions
Contributions, future work
Agenda
49
β€’ New framework of example-dependent cost-sensitive classification
β€’ Using five databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we show that the proposed algorithms significantly outperform the state-of-the-art cost-insensitive and example-dependent cost-sensitive algorithms
β€’ Highlight the importance of using the real example-dependent financial costs associated with the real-world applications
Conclusions
50
β€’ Multi-class example-dependent cost-sensitive classification
β€’ Cost-sensitive calibration
β€’ Stacking cost-sensitive decision trees
β€’ Example-dependent cost-sensitive boosting
β€’ Online example-dependent cost-sensitive classification
Future research directions
51
Contributions - Papers
Date         | Name                                                                     | Conference / Journal                                                | Status
July 2013    | Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk     | IEEE International Conference on Machine Learning and Applications | Published
October 2013 | Improving Credit Card Fraud Detection with Calibrated Probabilities     | SIAM International Conference on Data Mining                       | Published
June 2014    | Credit Scoring using Cost-Sensitive Logistic Regression                 | IEEE International Conference on Machine Learning and Applications | Published
October 2014 | Example-Dependent Cost-Sensitive Decision Trees                         | Expert Systems with Applications                                   | Published
January 2015 | A novel cost-sensitive framework for customer churn predictive modeling | Decision Analytics                                                  | Published
March 2015   | Ensemble of Example-Dependent Cost-Sensitive Decision Trees             | IEEE Transactions on Knowledge and Data Engineering                | Under review
March 2015   | Feature Engineering Strategies for Credit Card Fraud Detection          | Expert Systems with Applications                                   | Under review
June 2015    | Detecting Credit Card Fraud using Periodic Features                     | IEEE International Conference on Machine Learning and Applications | In press
52
Contributions - Costcla
costcla is a Python module for cost-sensitive classification built on top of Scikit-Learn and SciPy, and distributed under the 3-Clause BSD license.
In particular, it provides:
β€’ A set of example-dependent cost-
sensitive algorithms
β€’ Different real-world example-
dependent cost-sensitive datasets.
Installation
pip install costcla
Documentation:
https://pythonhosted.org/costcla/
53
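For completeness, a minimal usage sketch adapted from my recollection of the costcla documentation example: a random forest is trained, Bayes minimum risk is applied to its probabilities, and the result is scored with the savings measure. The exact class and function names (load_creditscoring1, BayesMinimumRiskClassifier, savings_score) and their signatures should be checked against the documentation linked above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from costcla.datasets import load_creditscoring1          # names as in the costcla docs
from costcla.models import BayesMinimumRiskClassifier
from costcla.metrics import savings_score

data = load_creditscoring1()
X_tr, X_te, y_tr, y_te, cost_tr, cost_te = train_test_split(
    data.data, data.target, data.cost_mat, test_size=0.3)

rf = RandomForestClassifier().fit(X_tr, y_tr)
prob_te = rf.predict_proba(X_te)

bmr = BayesMinimumRiskClassifier()
bmr.fit(y_te, prob_te)                      # calibrates the probabilities
pred_te = bmr.predict(prob_te, cost_te)     # cost-sensitive decisions

print(savings_score(y_te, pred_te, cost_te))
```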
Thank You!!
Alejandro Correa Bahnsen
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
albahnsen.com
al.bahnsen@gmail.com
http://www.linkedin.com/in/albahnsen
https://github.com/albahnsen/CostSensitiveClassification
More Related Content

What's hot

Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis IntroductionPrasiddhaSarma
Β 
Data Science Use cases in Banking
Data Science Use cases in BankingData Science Use cases in Banking
Data Science Use cases in BankingArul Bharathi
Β 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
Β 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
Β 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networksananth
Β 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceAmit Sharma
Β 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataGael Varoquaux
Β 
PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)Learnbay Datascience
Β 
Titanic survivor prediction ppt (5)
Titanic survivor prediction ppt (5)Titanic survivor prediction ppt (5)
Titanic survivor prediction ppt (5)GLA University
Β 
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedUplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedRising Media Ltd.
Β 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
Β 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar reportmayurik19
Β 
House price prediction
House price predictionHouse price prediction
House price predictionAdityaKumar1505
Β 
Predictive modelling
Predictive modellingPredictive modelling
Predictive modellingRajib Kumar De
Β 
Deep residual learning for image recognition
Deep residual learning for image recognitionDeep residual learning for image recognition
Deep residual learning for image recognitionYoonho Shin
Β 
Adaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAdaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAndrea Dal Pozzolo
Β 
CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION K Srinivas Rao
Β 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningParas Kohli
Β 
Customer Churn prediction in ECommerce Sector.pdf
Customer Churn prediction in ECommerce Sector.pdfCustomer Churn prediction in ECommerce Sector.pdf
Customer Churn prediction in ECommerce Sector.pdfvirajkhot5
Β 

What's hot (20)

Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis Introduction
Β 
Data Science Use cases in Banking
Data Science Use cases in BankingData Science Use cases in Banking
Data Science Use cases in Banking
Β 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Β 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
Β 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
Β 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
Β 
Credit defaulter analysis
Credit defaulter analysisCredit defaulter analysis
Credit defaulter analysis
Β 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
Β 
PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)
Β 
Titanic survivor prediction ppt (5)
Titanic survivor prediction ppt (5)Titanic survivor prediction ppt (5)
Titanic survivor prediction ppt (5)
Β 
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedUplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Β 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
Β 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
Β 
House price prediction
House price predictionHouse price prediction
House price prediction
Β 
Predictive modelling
Predictive modellingPredictive modelling
Predictive modelling
Β 
Deep residual learning for image recognition
Deep residual learning for image recognitionDeep residual learning for image recognition
Deep residual learning for image recognition
Β 
Adaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAdaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud Detection
Β 
CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION
Β 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
Β 
Customer Churn prediction in ECommerce Sector.pdf
Customer Churn prediction in ECommerce Sector.pdfCustomer Churn prediction in ECommerce Sector.pdf
Customer Churn prediction in ECommerce Sector.pdf
Β 

Viewers also liked

Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data sl...
Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data   sl...Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data   sl...
Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data sl...Alejandro Correa Bahnsen, PhD
Β 
Classifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural NetworksClassifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural NetworksAlejandro Correa Bahnsen, PhD
Β 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practiceAlejandro Correa Bahnsen, PhD
Β 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Alejandro Correa Bahnsen, PhD
Β 
Maximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningMaximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningAlejandro Correa Bahnsen, PhD
Β 
2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycleAlejandro Correa Bahnsen, PhD
Β 
Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsAlejandro Correa Bahnsen, PhD
Β 
Analytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacionAnalytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacionAlejandro Correa Bahnsen, PhD
Β 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionAlejandro Correa Bahnsen, PhD
Β 
Ensembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesEnsembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesAlejandro Correa Bahnsen, PhD
Β 

Viewers also liked (13)

Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data sl...
Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data   sl...Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data   sl...
Fraud analytics detecciΓ³n y prevenciΓ³n de fraudes en la era del big data sl...
Β 
Classifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural NetworksClassifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural Networks
Β 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice
Β 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Β 
1609 Fraud Data Science
1609 Fraud Data Science1609 Fraud Data Science
1609 Fraud Data Science
Β 
Maximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningMaximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learning
Β 
2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle
Β 
Modern Data Science
Modern Data ScienceModern Data Science
Modern Data Science
Β 
Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive Analytics
Β 
Analytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacionAnalytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacion
Β 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Β 
Demystifying machine learning using lime
Demystifying machine learning using limeDemystifying machine learning using lime
Demystifying machine learning using lime
Β 
Ensembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesEnsembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slides
Β 

Similar to PhD Defense - Example-Dependent Cost-Sensitive Classification

How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities
How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities
How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities WNS Global Services
Β 
Retail Banking 6 Steps to Improving the Collections Experience.pptx
Retail Banking 6 Steps to Improving the Collections Experience.pptxRetail Banking 6 Steps to Improving the Collections Experience.pptx
Retail Banking 6 Steps to Improving the Collections Experience.pptxMaveric Systems
Β 
Retail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdfRetail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdfMaveric Systems
Β 
Retail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdfRetail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdfMaveric Systems
Β 
Consumer credit-risk3440
Consumer credit-risk3440Consumer credit-risk3440
Consumer credit-risk3440stone55
Β 
Analytics in banking services
Analytics in banking servicesAnalytics in banking services
Analytics in banking servicesMariyageorge
Β 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonaliSonali Gupta
Β 
Transaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCardTransaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCardWestley Koenen
Β 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestHirak Sen Roy
Β 
AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banksPankaj Baid
Β 
Credit Scoring Capstone Project- Pallavi Mohanty.pptx
Credit Scoring Capstone Project- Pallavi Mohanty.pptxCredit Scoring Capstone Project- Pallavi Mohanty.pptx
Credit Scoring Capstone Project- Pallavi Mohanty.pptxBoston Institute of Analytics
Β 
Blog-Embryonic Opportunities For Predictive Analytics In Australia
Blog-Embryonic Opportunities For Predictive Analytics In AustraliaBlog-Embryonic Opportunities For Predictive Analytics In Australia
Blog-Embryonic Opportunities For Predictive Analytics In AustraliaArup Das
Β 
Uses of analytics in the field of Banking
Uses of analytics in the field of BankingUses of analytics in the field of Banking
Uses of analytics in the field of BankingNiveditasri N
Β 
Credit Card Business Plan
Credit Card Business PlanCredit Card Business Plan
Credit Card Business PlanRaghavendra L Rao
Β 
Barclays - Case Study Competition | ISB | National Finalist
Barclays - Case Study Competition | ISB | National FinalistBarclays - Case Study Competition | ISB | National Finalist
Barclays - Case Study Competition | ISB | National FinalistNaveen Kumar
Β 
IRJET- Prediction of Credit Risks in Lending Bank Loans
IRJET- Prediction of Credit Risks in Lending Bank LoansIRJET- Prediction of Credit Risks in Lending Bank Loans
IRJET- Prediction of Credit Risks in Lending Bank LoansIRJET Journal
Β 

Similar to PhD Defense - Example-Dependent Cost-Sensitive Classification (20)

Credit iconip
Credit iconipCredit iconip
Credit iconip
Β 
How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities
How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities
How a Predictive Analytics-based Framework Helps Reduce Bad Debts in Utilities
Β 
Retail Banking 6 Steps to Improving the Collections Experience.pptx
Retail Banking 6 Steps to Improving the Collections Experience.pptxRetail Banking 6 Steps to Improving the Collections Experience.pptx
Retail Banking 6 Steps to Improving the Collections Experience.pptx
Β 
Retail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdfRetail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdf
Β 
Retail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdfRetail Banking 6 Steps to Improving the Collections Experience.pdf
Retail Banking 6 Steps to Improving the Collections Experience.pdf
Β 
Consumer credit-risk3440
Consumer credit-risk3440Consumer credit-risk3440
Consumer credit-risk3440
Β 
Analytics in banking services
Analytics in banking servicesAnalytics in banking services
Analytics in banking services
Β 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
Β 
Transaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCardTransaction_Scoring - WVK MasterCard
Transaction_Scoring - WVK MasterCard
Β 
Credit iconip
Credit iconipCredit iconip
Credit iconip
Β 
Credit iconip
Credit iconipCredit iconip
Credit iconip
Β 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
Β 
AI powered decision making in banks
AI powered decision making in banksAI powered decision making in banks
AI powered decision making in banks
Β 
K-MODEL PPT.pptx
K-MODEL PPT.pptxK-MODEL PPT.pptx
K-MODEL PPT.pptx
Β 
Credit Scoring Capstone Project- Pallavi Mohanty.pptx
Credit Scoring Capstone Project- Pallavi Mohanty.pptxCredit Scoring Capstone Project- Pallavi Mohanty.pptx
Credit Scoring Capstone Project- Pallavi Mohanty.pptx
Β 
Blog-Embryonic Opportunities For Predictive Analytics In Australia
Blog-Embryonic Opportunities For Predictive Analytics In AustraliaBlog-Embryonic Opportunities For Predictive Analytics In Australia
Blog-Embryonic Opportunities For Predictive Analytics In Australia
Β 
Uses of analytics in the field of Banking
Uses of analytics in the field of BankingUses of analytics in the field of Banking
Uses of analytics in the field of Banking
Β 
Credit Card Business Plan
Credit Card Business PlanCredit Card Business Plan
Credit Card Business Plan
Β 
Barclays - Case Study Competition | ISB | National Finalist
Barclays - Case Study Competition | ISB | National FinalistBarclays - Case Study Competition | ISB | National Finalist
Barclays - Case Study Competition | ISB | National Finalist
Β 
IRJET- Prediction of Credit Risks in Lending Bank Loans
IRJET- Prediction of Credit Risks in Lending Bank LoansIRJET- Prediction of Credit Risks in Lending Bank Loans
IRJET- Prediction of Credit Risks in Lending Bank Loans
Β 

More from Alejandro Correa Bahnsen, PhD

More from Alejandro Correa Bahnsen, PhD (6)

black hat deephish
black hat deephishblack hat deephish
black hat deephish
Β 
DeepPhish: Simulating malicious AI
DeepPhish: Simulating malicious AIDeepPhish: Simulating malicious AI
DeepPhish: Simulating malicious AI
Β 
AI vs. AI: Can Predictive Models Stop the Tide of Hacker AI?
AI vs. AI: Can Predictive Models Stop the Tide of Hacker AI?AI vs. AI: Can Predictive Models Stop the Tide of Hacker AI?
AI vs. AI: Can Predictive Models Stop the Tide of Hacker AI?
Β 
How I Learned to Stop Worrying and Love Building Data Products
How I Learned to Stop Worrying and Love Building Data ProductsHow I Learned to Stop Worrying and Love Building Data Products
How I Learned to Stop Worrying and Love Building Data Products
Β 
Fraud Detection by Stacking Cost-Sensitive Decision Trees
Fraud Detection by Stacking Cost-Sensitive Decision TreesFraud Detection by Stacking Cost-Sensitive Decision Trees
Fraud Detection by Stacking Cost-Sensitive Decision Trees
Β 
2012 predictive clusters
2012 predictive clusters2012 predictive clusters
2012 predictive clusters
Β 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
Β 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
Β 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
Β 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
Β 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
Β 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]πŸ“Š Markus Baersch
Β 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
Β 
εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€F sss
Β 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
Β 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
Β 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
Β 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
Β 
εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€fhwihughh
Β 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
Β 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
Β 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
Β 
Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)
Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)
Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)jennyeacort
Β 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
Β 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
Β 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Β 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Β 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
Β 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Β 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Β 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
Β 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Β 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Β 
εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―δΈ­δ½›η½—ι‡ŒθΎΎε€§ε­¦ζ―•δΈšθ―,UCFζˆη»©ε•εŽŸη‰ˆδΈ€ζ―”δΈ€
Β 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
Β 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
Β 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
Β 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Β 
εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€
εŠžη†ε­¦δ½θ―ηΊ½ηΊ¦ε€§ε­¦ζ―•δΈšθ―(NYUζ―•δΈšθ―δΉ¦οΌ‰εŽŸη‰ˆδΈ€ζ―”δΈ€
Β 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Β 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
Β 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
Β 
Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)
Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)
Call Us βž₯97111√47426🀳Call Girls in Aerocity (Delhi NCR)
Β 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
Β 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
Β 

PhD Defense - Example-Dependent Cost-Sensitive Classification

  • 10. Estimate the probability of a transaction being fraudulent, based on the customer's patterns and recent fraudulent behavior. Issues when constructing a fraud detection system [Bolton et al., 2002]: • Skewness of the data • Cost-sensitivity • Short response time of the system • Dimensionality of the search space • Feature preprocessing. Credit card fraud detection 10
  • 11. Credit card fraud detection is a cost-sensitive problem, since the cost of a false positive is different from the cost of a false negative. • False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost. • False negatives: when a fraud goes undetected, the amount of that transaction is lost. Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, as the amount of the transactions varies quite significantly. Credit card fraud detection 11
  • 12. Cost matrix:
      Actual Positive (y_i = 1)   Actual Negative (y_i = 0)
      Predicted Positive (c_i = 1):   C_TP_i = C_a      C_FP_i = C_a
      Predicted Negative (c_i = 0):   C_FN_i = Amt_i    C_TN_i = 0
      A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk," in 2013 12th International Conference on Machine Learning and Applications. Miami, USA: IEEE, Dec. 2013, pp. 333–338. Credit card fraud detection 12
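To make the example-dependent cost and the savings measure concrete, the following is a minimal NumPy sketch (not part of the original slides); the per-example array layout cost_mat = [C_FP, C_FN, C_TP, C_TN] and the toy numbers are assumptions chosen for illustration.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Example-dependent cost of a set of predictions.

    cost_mat has one row per example with columns [C_FP, C_FN, C_TP, C_TN]
    (illustrative layout, not the thesis notation).
    """
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    return np.sum(y_true * (y_pred * c_tp + (1 - y_pred) * c_fn) +
                  (1 - y_true) * (y_pred * c_fp + (1 - y_pred) * c_tn))

def savings(y_true, y_pred, cost_mat):
    """Savings versus the costless class (predicting all zeros or all ones)."""
    zeros = np.zeros_like(y_true)
    ones = np.ones_like(y_true)
    cost_l = min(total_cost(y_true, zeros, cost_mat),
                 total_cost(y_true, ones, cost_mat))
    return (cost_l - total_cost(y_true, y_pred, cost_mat)) / cost_l

# Toy fraud example: C_a = 2.5 (administrative cost), C_FN = transaction amount.
amounts = np.array([250.0, 400.0, 50.0, 10.0])
y_true = np.array([1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0])
cost_mat = np.column_stack([np.full(4, 2.5), amounts,
                            np.full(4, 2.5), np.zeros(4)])
print(total_cost(y_true, y_pred, cost_mat), savings(y_true, y_pred, cost_mat))
```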
  • 13. Raw features. Credit card fraud detection 13
      Transaction ID: transaction identification number
      Time: date and time of the transaction
      Account number: identification number of the customer
      Card number: identification of the credit card
      Transaction type: e.g. Internet, ATM, POS, ...
      Entry mode: e.g. chip and pin, magnetic stripe, ...
      Amount: amount of the transaction in Euros
      Merchant code: identification of the merchant type
      Merchant group: merchant group identification
      Country: country of the transaction
      Country 2: country of residence
      Type of card: e.g. Visa debit, Mastercard, American Express, ...
      Gender: gender of the card holder
      Age: card holder age
      Bank: issuer bank of the card
  • 14. Transaction aggregation strategy [Whitrow, 2008]. Credit card fraud detection 14
      Raw features (TrxId, Time, Type, Country, Amt) and the aggregated features computed over the previous transactions of the same customer:
      TrxId  Time       Type  Country  Amt  | No Trx last 24h  Amt last 24h  No Trx last 24h same type and country  Amt last 24h same type and country
      1      1/1 18:20  POS   Lux      250  | 0                0             0                                      0
      2      1/1 20:35  POS   Lux      400  | 1                250           1                                      250
      3      1/1 22:30  ATM   Lux      250  | 2                650           0                                      0
      4      2/1 00:50  POS   Ger      50   | 3                900           0                                      0
      5      2/1 19:18  POS   Ger      100  | 3                700           1                                      50
      6      2/1 23:45  POS   Ger      150  | 2                150           2                                      150
      7      3/1 06:00  POS   Lux      10   | 3                400           0                                      0
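A rough pandas sketch (not from the slides) of how such aggregated features could be derived; the column names, the plain O(n^2) loop, and the single-customer toy data are illustrative assumptions, and a real implementation would additionally group by account number.

```python
import pandas as pd

def aggregate_features(trx: pd.DataFrame) -> pd.DataFrame:
    """For each transaction, count and sum the previous transactions in the
    preceding 24 hours, overall and restricted to the same type and country."""
    trx = trx.sort_values("time").reset_index(drop=True)
    rows = []
    for _, t in trx.iterrows():
        window = trx[(trx["time"] < t["time"]) &
                     (trx["time"] >= t["time"] - pd.Timedelta(hours=24))]
        same = window[(window["type"] == t["type"]) &
                      (window["country"] == t["country"])]
        rows.append({"no_trx_24h": len(window),
                     "amt_24h": window["amt"].sum(),
                     "no_trx_24h_same": len(same),
                     "amt_24h_same": same["amt"].sum()})
    return pd.concat([trx, pd.DataFrame(rows)], axis=1)

trx = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-01 18:20", "2015-01-01 20:35",
                            "2015-01-01 22:30", "2015-01-02 00:50"]),
    "type": ["POS", "POS", "ATM", "POS"],
    "country": ["Lux", "Lux", "Lux", "Ger"],
    "amt": [250, 400, 250, 50]})
print(aggregate_features(trx))
```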
  • 15. Proposed periodic features: when is a customer expected to make a new transaction? The time of the transaction is modeled with a von Mises distribution with a period of 24 hours,
      P(time) ~ vonMises(μ, σ) = exp(σ cos(time − μ)) / (2π I_0(σ)),
      where μ is the mean, σ is the standard deviation, and I_0 is the modified Bessel function of order 0. Credit card fraud detection 15
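As a rough illustration of this idea (not the exact feature set of the thesis), the sketch below maps transaction times onto the circle, fits a von Mises model with scipy.stats.vonmises, and evaluates how usual a new transaction time is for the customer; the toy data and the use of the density as the feature are assumptions.

```python
import numpy as np
from scipy import stats

def hour_to_angle(hour_of_day):
    """Map the hour of the day (0-24) to an angle in [-pi, pi)."""
    theta = 2 * np.pi * np.asarray(hour_of_day, dtype=float) / 24.0
    return (theta + np.pi) % (2 * np.pi) - np.pi

# Historical transaction times (hours) of one customer (toy data).
past_hours = np.array([18.3, 20.6, 22.5, 19.3, 23.75, 21.1])
angles = hour_to_angle(past_hours)

# Circular mean of the observed times, used as the location mu.
mu = np.angle(np.mean(np.exp(1j * angles)))

# Concentration parameter fitted by scipy (location fixed to the circular
# mean, scale fixed to 1, the standard parameterization).
kappa, _, _ = stats.vonmises.fit(angles, floc=mu, fscale=1)

# Density of a new transaction at 06:00 vs 21:00: a low value flags an
# unusual time for this customer, which can be used as a fraud feature.
for hour in (6.0, 21.0):
    density = stats.vonmises.pdf(hour_to_angle(hour), kappa, loc=mu)
    print(f"{hour:05.2f}h -> von Mises density {density:.3f}")
```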
  • 16. Proposed periodic features. A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Feature Engineering Strategies for Credit Card Fraud Detection," submitted to Expert Systems with Applications. Credit card fraud detection 16
  • 17. Classify which potential customers are likely to default on a contracted financial obligation, based on the customer's past financial experience. It is a cost-sensitive problem, as the cost associated with approving a bad customer, i.e., a false negative, is quite different from the cost associated with declining a good customer, i.e., a false positive. Furthermore, the costs are not constant among customers, because loans have different credit line amounts, terms, and even interest rates. Credit scoring 17
  • 18. Cost matrix, where Cl_i is the credit line of customer i and L_gd the loss given default:
      Actual Positive (y_i = 1)   Actual Negative (y_i = 0)
      Predicted Positive (c_i = 1):   C_TP_i = 0                C_FP_i = r_i + C_FP^a
      Predicted Negative (c_i = 0):   C_FN_i = Cl_i · L_gd      C_TN_i = 0
      A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring," in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA: IEEE, 2014, pp. 263–269. Credit scoring 18
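A short NumPy sketch (illustrative only, with made-up portfolio numbers) of how this credit-scoring cost matrix could be assembled per customer, reusing the [C_FP, C_FN, C_TP, C_TN] layout of the earlier sketch.

```python
import numpy as np

# Toy portfolio values (hypothetical): credit line, forgone interest, constants.
cl = np.array([10_000.0, 5_000.0, 20_000.0])   # credit line Cl_i
r = np.array([1_200.0, 600.0, 2_500.0])        # r_i component of the FP cost
l_gd = 0.75                                     # loss given default L_gd
c_fp_a = 50.0                                   # constant C_FP^a component

# One row per customer, columns [C_FP, C_FN, C_TP, C_TN].
cost_mat = np.column_stack([
    r + c_fp_a,          # C_FP_i = r_i + C_FP^a
    cl * l_gd,           # C_FN_i = Cl_i * L_gd
    np.zeros_like(cl),   # C_TP_i = 0
    np.zeros_like(cl),   # C_TN_i = 0
])
print(cost_mat)
```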
  • 19. Predict the probability of a customer defecting, using historical, behavioral, and socioeconomic information. This tool is of great benefit to subscription-based companies, allowing them to maximize the results of retention campaigns. Churn modeling 19
  • 20. Churn management campaign [Verbraken, 2013] (flow diagram): new customers flow into the customer base; the churn model splits active customers into predicted churners (TP: actual churners, FP: actual non-churners) and predicted non-churners (FN: actual churners, TN: actual non-churners); targeted churners accept the retention offer with probability γ, and effective churners flow out. Churn modeling 20
  • 21. Proposed financial evaluation of a churn campaign (flow diagram): each branch of the campaign is annotated with its financial outcome, 0 for non-targeted customers who stay, CLV for churners who were not targeted, CLV + C_a for targeted churners who do not accept the offer, and C_o + C_a for targeted customers who accept the offer or were not going to churn. A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "A novel cost-sensitive framework for customer churn predictive modeling," Decision Analytics, vol. 2:5, 2015. Churn modeling 21
  • 22. Cost matrix, where γ_i is the probability that customer i accepts the retention offer, C_o_i the cost of the offer, C_a the cost of contacting the customer, and CLV_i the customer lifetime value:
      Actual Positive (y_i = 1)                               Actual Negative (y_i = 0)
      Predicted Positive (c_i = 1):   C_TP_i = γ_i C_o_i + (1 − γ_i)(CLV_i + C_a)      C_FP_i = C_o_i + C_a
      Predicted Negative (c_i = 0):   C_FN_i = CLV_i                                    C_TN_i = 0
      A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "A novel cost-sensitive framework for customer churn predictive modeling," Decision Analytics, vol. 2:5, 2015. Churn modeling 22
  • 23. Classify those customers who are more likely to have a certain response to a marketing campaign. This problem is example-dependent cost-sensitive, as false positives carry the cost of contacting the client, and false negatives carry the cost of the income lost by failing to make the offer to the right customer. Direct marketing 23
  • 24. Cost matrix, where C_a is the cost of contacting the client and Int_i the expected income from client i:
      Actual Positive (y_i = 1)   Actual Negative (y_i = 0)
      Predicted Positive (c_i = 1):   C_TP_i = C_a       C_FP_i = C_a
      Predicted Negative (c_i = 0):   C_FN_i = Int_i     C_TN_i = 0
      A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Improving Credit Card Fraud Detection with Calibrated Probabilities," in Proceedings of the fourteenth SIAM International Conference on Data Mining, Philadelphia, USA, 2014, pp. 677–685. Direct marketing 24
  • 25. • Motivation • Cost-sensitive classification Background • Real-world cost-sensitive applications Credit card fraud detection, churn modeling, credit scoring, direct marketing • Proposed cost-sensitive algorithms Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees, ensembles of cost-sensitive decision trees • Experiments Experimental setup, results • Conclusions Contributions, future work Agenda 25
  • 26. • Bayes minimum risk (BMR) A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk," in 2013 12th International Conference on Machine Learning and Applications. Miami, USA: IEEE, Dec. 2013, pp. 333–338. A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Improving Credit Card Fraud Detection with Calibrated Probabilities," in Proceedings of the fourteenth SIAM International Conference on Data Mining, Philadelphia, USA, 2014, pp. 677–685. • Cost-sensitive logistic regression (CSLR) A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring," in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA: IEEE, 2014, pp. 263–269. • Cost-sensitive decision trees (CSDT) A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Decision Trees," Expert Systems with Applications, vol. 42:19, 2015. • Ensembles of cost-sensitive decision trees (ECSDT) A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Ensemble of Example-Dependent Cost-Sensitive Decision Trees," IEEE Transactions on Knowledge and Data Engineering, under review, 2015. Proposed cost-sensitive algorithms 26
  • 27. Decision model based on quantifying the tradeoffs between the possible decisions, using the estimated probabilities and the costs that accompany such decisions. Risk of each classification:
      R(c_i = 0 | x_i) = C_TN_i (1 − p_i) + C_FN_i · p_i
      R(c_i = 1 | x_i) = C_FP_i (1 − p_i) + C_TP_i · p_i
      Using the risks, the prediction is made according to: c_i = 0 if R(c_i = 0 | x_i) ≤ R(c_i = 1 | x_i), and c_i = 1 otherwise. Bayes Minimum Risk 27
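A minimal NumPy sketch (not from the slides) of the Bayes minimum risk decision rule above, assuming calibrated probability estimates p and the per-example [C_FP, C_FN, C_TP, C_TN] cost-matrix layout used in the earlier sketch; the cost values are illustrative.

```python
import numpy as np

def bayes_minimum_risk(p, cost_mat):
    """Predict 1 when the risk of the positive decision is lower than the
    risk of the negative decision, per example."""
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    risk_neg = c_tn * (1 - p) + c_fn * p   # R(c_i = 0 | x_i)
    risk_pos = c_fp * (1 - p) + c_tp * p   # R(c_i = 1 | x_i)
    return (risk_pos < risk_neg).astype(int)

# Fraud-style costs: C_FP = C_TP = 2.5 (administrative), C_FN = amount.
p = np.array([0.05, 0.05, 0.40])
cost_mat = np.array([[2.5,   20.0, 2.5, 0.0],
                     [2.5, 5000.0, 2.5, 0.0],
                     [2.5,   80.0, 2.5, 0.0]])
print(bayes_minimum_risk(p, cost_mat))  # a 5% risk on 5000 EUR is worth flagging
```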
  • 28. Cost-Sensitive Logistic Regression 28
      • Logistic regression model: p_i = P(y_i = 1 | x_i) = h_θ(x_i) = g( Σ_{j=1}^{k} θ_j x_i^j )
      • Cost function: J_i(θ) = −y_i log( h_θ(x_i) ) − (1 − y_i) log( 1 − h_θ(x_i) )
      • Cost analysis: J_i(θ) ≈ 0 if y_i ≈ h_θ(x_i), and J_i(θ) → ∞ if y_i ≈ 1 − h_θ(x_i), which implicitly assumes C_TP_i = C_TN_i ≈ 0 and C_FP_i = C_FN_i ≈ ∞.
  • 29. Cost-Sensitive Logistic Regression 29
      • Actual costs: J^c_i(θ) = C_TP_i if y_i = 1 and h_θ(x_i) ≈ 1; C_TN_i if y_i = 0 and h_θ(x_i) ≈ 0; C_FP_i if y_i = 0 and h_θ(x_i) ≈ 1; C_FN_i if y_i = 1 and h_θ(x_i) ≈ 0.
      • Proposed cost-sensitive function:
      J^c(θ) = (1/N) Σ_{i=1}^{N} [ y_i ( h_θ(x_i) C_TP_i + (1 − h_θ(x_i)) C_FN_i ) + (1 − y_i) ( h_θ(x_i) C_FP_i + (1 − h_θ(x_i)) C_TN_i ) ]
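A compact sketch (an illustration under stated assumptions, not the thesis code) of minimizing this cost-sensitive loss with a generic optimizer; the synthetic data, the [C_FP, C_FN, C_TP, C_TN] layout, and the use of BFGS without an analytic gradient are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cs_logistic_loss(theta, X, y, cost_mat):
    """Proposed cost-sensitive loss: expected example-dependent cost
    under predicted probabilities h = sigmoid(X @ theta)."""
    h = sigmoid(X @ theta)
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    per_example = (y * (h * c_tp + (1 - h) * c_fn) +
                   (1 - y) * (h * c_fp + (1 - h) * c_tn))
    return per_example.mean()

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 features
y = (X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(float)
amounts = rng.uniform(10, 1000, size=200)                        # example-dependent C_FN
cost_mat = np.column_stack([np.full(200, 2.5), amounts,
                            np.full(200, 2.5), np.zeros(200)])

res = minimize(cs_logistic_loss, x0=np.zeros(X.shape[1]),
               args=(X, y, cost_mat), method="BFGS")
print("theta:", res.x)
```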
  • 30. • A decision tree is a classification model that iteratively creates binary decision rules (x^j, l_m^j) that maximize a certain criterion (gain, entropy, ...), where (x^j, l_m^j) refers to making a rule using feature j on threshold value l_m. • Maximizing the accuracy is not the same as minimizing the cost. • To address this, several studies have proposed methods that introduce cost-sensitivity into the algorithms [Lomax 2013]; however, that research has focused on class-dependent methods. • We propose: • an example-dependent cost-based impurity measure • an example-dependent cost-based pruning criterion. Cost-Sensitive Decision trees 30
  • 31. Cost-Sensitive Decision trees 31. Proposed cost-based impurity measure. A splitting rule (x^j, l_m^j) partitions a set S into
      S_l = { x_i ∈ S | x_i^j ≤ l_m^j } and S_r = { x_i ∈ S | x_i^j > l_m^j }.
      • The impurity of each leaf is calculated as I_c(S) = min( Cost(f_0(S)), Cost(f_1(S)) ), with the leaf prediction f(S) = 0 if Cost(f_0(S)) ≤ Cost(f_1(S)), and 1 otherwise.
      • Afterwards, the gain of applying a given rule to the set S is Gain_c(x^j, l_m^j) = I_c(π_1) − ( I_c(π_1^l) + I_c(π_1^r) ), where π_1 is the node being split and π_1^l, π_1^r are the resulting left and right children.
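A short sketch (illustrative only) of the cost-based impurity and gain described above, reusing the [C_FP, C_FN, C_TP, C_TN] cost-matrix layout from the earlier sketches; the toy data is invented.

```python
import numpy as np

def cost_of_constant_prediction(y, cost_mat, label):
    """Cost of predicting the same class `label` for every example in the node."""
    pred = np.full_like(y, label)
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    return np.sum(y * (pred * c_tp + (1 - pred) * c_fn) +
                  (1 - y) * (pred * c_fp + (1 - pred) * c_tn))

def cost_impurity(y, cost_mat):
    """I_c(S): cost of the cheapest constant prediction on the node."""
    return min(cost_of_constant_prediction(y, cost_mat, 0),
               cost_of_constant_prediction(y, cost_mat, 1))

def cost_gain(X, y, cost_mat, feature, threshold):
    """Gain_c of the split (feature <= threshold), as on the slide."""
    left = X[:, feature] <= threshold
    return (cost_impurity(y, cost_mat)
            - (cost_impurity(y[left], cost_mat[left])
               + cost_impurity(y[~left], cost_mat[~left])))

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
cost_mat = np.column_stack([np.full(4, 2.5), [10, 10, 500, 40],
                            np.full(4, 2.5), np.zeros(4)])
print(cost_gain(X, y, cost_mat, feature=0, threshold=2.5))
```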
  • 32. Cost-Sensitive Decision trees 32. Decision tree construction: • the rule that maximizes the gain is selected, (best_x, best_l) = argmax_{j,m} Gain_c(x^j, l_m^j); • the process is then repeated on each resulting subset until a stopping criterion is met.
  • 33. Cost-Sensitive Decision trees 33. Proposed cost-sensitive pruning criterion. • For each candidate branch, the savings of the tree versus the pruned tree are calculated as
      PC_c = ( Cost(f(S, Tree)) − Cost(f(S, EB(Tree, branch))) ) / ( |Tree| − |EB(Tree, branch)| ),
      where EB(Tree, branch) denotes the tree with the given branch pruned. • After calculating the pruning criterion for all possible trees, the branch with the maximum improvement is selected and the tree is pruned. • The process is repeated until there is no further improvement.
  • 34. A typical ensemble is made by combining T different base classifiers. Each base classifier is trained by applying algorithm M to a random subset S_j of the training set: M_j ← M(S_j), ∀ j ∈ {1, ..., T}. Ensembles of Cost-Sensitive Decision trees 34
  • 35. The core principle of ensemble learning is to induce random perturbations into the learning procedure in order to produce several different base classifiers from a single training set, and then to combine the base classifiers in order to make the final prediction. Ensembles of Cost-Sensitive Decision trees 35
  • 36. Ensembles of Cost-Sensitive Decision trees 36. Illustration of how random subsets of examples and features are drawn from the training set under the four random inducers: Bagging, Pasting, Random forest, and Random patches.
  • 37. After the base classifiers are constructed, they are typically combined using one of the following methods:
      • Majority voting: H(S) = f_mv(S, M) = argmax_{c ∈ {0,1}} Σ_{j=1}^{T} 1_c( M_j(S) )
      • Proposed cost-sensitive weighted voting: H(S) = f_wv(S, M, α) = argmax_{c ∈ {0,1}} Σ_{j=1}^{T} α_j 1_c( M_j(S) ),
      with the weights given either by the out-of-bag error, α_j = (1 − ε(M_j, S_j^oob)) / Σ_{j1=1}^{T} (1 − ε(M_j1, S_j1^oob)), or by the out-of-bag savings, α_j = Savings(M_j, S_j^oob) / Σ_{j1=1}^{T} Savings(M_j1, S_j1^oob).
      Ensembles of Cost-Sensitive Decision trees 37
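A small sketch (illustrative, not the thesis implementation) of cost-sensitive weighted voting: each base classifier votes with a weight proportional to its out-of-bag savings; the votes and savings values are made up.

```python
import numpy as np

def weighted_vote(base_predictions, alphas):
    """Cost-sensitive weighted voting over binary base predictions.

    base_predictions: array of shape (T, n_examples) with 0/1 votes.
    alphas: array of shape (T,), e.g. out-of-bag savings per base classifier.
    """
    base_predictions = np.asarray(base_predictions)
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()              # normalize the weights
    weight_for_one = alphas @ base_predictions  # total weight voting for class 1
    return (weight_for_one > 0.5).astype(int)

# Three base classifiers, weights proportional to their (toy) savings.
votes = np.array([[1, 0, 1, 1],
                  [0, 0, 1, 0],
                  [1, 1, 1, 0]])
savings_oob = np.array([0.60, 0.10, 0.45])
print(weighted_vote(votes, savings_oob))
```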
  • 38. • Proposed cost-sensitive stacking: H(S) = f_s(S, M, β) = 1 / ( 1 + e^{−Σ_{j=1}^{T} β_j M_j(S)} ), using the cost-sensitive logistic regression [Correa et al., 2014] model
      J(S, M, β) = Σ_{i=1}^{N} [ y_i ( f_s(x_i, M, β)(C_TP_i − C_FN_i) + C_FN_i ) + (1 − y_i) ( f_s(x_i, M, β)(C_FP_i − C_TN_i) + C_TN_i ) ].
      The weights are then estimated as β = argmin_β J(S, M, β). Ensembles of Cost-Sensitive Decision trees 38
  • 39. • Motivation • Cost-sensitive classification Background • Real-world cost-sensitive applications Credit card fraud detection, churn modeling, credit scoring, direct marketing • Proposed cost-sensitive algorithms Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees, ensembles of cost-sensitive decision trees • Experiments Experimental setup, results • Conclusions Contributions, future work Agenda 39
  • 40. Experimental setup - Datasets 40
      Database           # Examples   % Positives   Cost (Euros)
      Fraud              1,638,772    0.21%         860,448
      Churn              9,410        4.83%         580,884
      Kaggle Credit      112,915      6.74%         8,740,181
      PAKDD09 Credit     38,969       19.88%        3,117,960
      Direct Marketing   37,931       12.62%        59,507
  • 41. • Cost-insensitive (CI): • Decision trees (DT) • Logistic regression (LR) • Random forest (RF) • Under-sampling (u) • Cost-proportionate sampling (CPS): • Cost-proportionate rejection-sampling (r) • Cost-proportionate over-sampling (o) • Bayes minimum risk (BMR) • Cost-sensitive training (CST): • Cost-sensitive logistic regression (CSLR) • Cost-sensitive decision trees (CSDT) Experimental setup - Methods 41
  • 42. • Ensemble cost-sensitive decision trees (ECSDT): Random inducers: • Bagging (CSB) • Pasting (CSP) • Random forest (CSRF) • Random patches (CSRP) Combination: • Majority voting (mv) • Cost-sensitive weighted voting (wv) • Cost-sensitive stacking (s) Experimental setup - Methods 42
  • 43. • Each experiment was carried out 50 times • The parameters of each algorithm were selected by grid search • Results are measured by savings and F1-Score • The Friedman ranking is then calculated for each method Experimental setup 43
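As an illustration of the ranking step (with made-up savings values, not the thesis results), average Friedman ranks can be computed by ranking the methods within each dataset and averaging across datasets; scipy's rankdata handles the per-dataset ranking.

```python
import numpy as np
from scipy.stats import rankdata

# Rows: datasets, columns: methods; entries are (made-up) savings, higher is better.
savings = np.array([[0.73, 0.60, 0.31],
                    [0.17, 0.15, 0.12],
                    [0.52, 0.47, 0.30]])
methods = ["method A", "method B", "method C"]

# Rank within each dataset (1 = best), then average across datasets.
ranks = np.vstack([rankdata(-row) for row in savings])
avg_rank = ranks.mean(axis=0)
for m, r in sorted(zip(methods, avg_rank), key=lambda t: t[1]):
    print(f"{m}: average rank {r:.1f}")
```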
  • 44. Results 44. Highest savings per database:
      Database    % Positives   Algorithm    Savings   Savings (Euros)
      Fraud       0.21          CSRP-wv-t    0.73      628,127
      Churn       4.83          CSRP-s-t     0.17      98,750
      Credit1     6.74          CSRP-mv-t    0.52      4,544,894
      Credit2     19.88         LR-t-BMR     0.31      966,568
      Marketing   12.62         LR-t-BMR     0.51      30,349
  • 45. Results 45. Friedman rank of the savings (1 = best, 28 = worst):
      Family   Algorithm    Rank   |   Family   Algorithm    Rank
      ECSDT    CSRP-wv-t    2.6    |   CST      CSLR-t       14.4
      ECSDT    CSRP-s-t     3.4    |   ECSDT    CSRF-mv-t    15.2
      ECSDT    CSRP-mv-t    4.0    |   ECSDT    CSRF-s-t     16.0
      ECSDT    CSB-wv-t     5.6    |   CI       RF-u         17.2
      ECSDT    CSP-wv-t     7.4    |   CPS      LR-r         19.0
      ECSDT    CSB-mv-t     8.2    |   BMR      DT-t-BMR     19.0
      ECSDT    CSRF-wv-t    9.4    |   CPS      LR-o         21.0
      BMR      RF-t-BMR     9.4    |   CPS      DT-r         22.6
      ECSDT    CSP-s-t      9.6    |   CI       LR-u         22.8
      ECSDT    CSP-mv-t     10.2   |   CPS      RF-o         22.8
      ECSDT    CSB-s-t      10.2   |   CI       DT-u         24.4
      BMR      LR-t-BMR     11.2   |   CPS      DT-o         25.0
      CPS      RF-r         11.6   |   CI       DT-t         26.0
      CST      CSDT-t       12.6   |   CI       RF-t         26.2
  • 46. Results 46 Results of the Friedman rank of the savings organized by family
  • 47. Results within the ECSDT family 47: by combination method and by random inducer.
  • 48. Results 48 Comparison of the Friedman ranking of the savings and F1Score sorted by F1Score ranking
  • 49. • Motivation • Cost-sensitive classification Background • Real-world cost-sensitive applications Credit card fraud detection, churn modeling, credit scoring, direct marketing • Proposed cost-sensitive algorithms Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees, ensembles of cost-sensitive decision trees • Experiments Experimental setup, results • Conclusions Contributions, future work Agenda 49
  • 50. • A new framework for example-dependent cost-sensitive classification • Using five databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we show that the proposed algorithms significantly outperform the state-of-the-art cost-insensitive and example-dependent cost-sensitive algorithms • We highlight the importance of using the real, example-dependent financial costs associated with each application Conclusions 50
  • 51. • Multi-class example-dependent cost-sensitive classification • Cost-sensitive calibration • Stacking cost-sensitive decision trees • Example-dependent cost-sensitive boosting • Online example-dependent cost-sensitive classification Future research directions 51
  • 52. Contributions - Papers 52
      July 2013: Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk. IEEE International Conference on Machine Learning and Applications. Published.
      October 2013: Improving Credit Card Fraud Detection with Calibrated Probabilities. SIAM International Conference on Data Mining. Published.
      June 2014: Credit Scoring using Cost-Sensitive Logistic Regression. IEEE International Conference on Machine Learning and Applications. Published.
      October 2014: Example-Dependent Cost-Sensitive Decision Trees. Expert Systems with Applications. Published.
      January 2015: A novel cost-sensitive framework for customer churn predictive modeling. Decision Analytics. Published.
      March 2015: Ensemble of Example-Dependent Cost-Sensitive Decision Trees. IEEE Transactions on Knowledge and Data Engineering. Under review.
      March 2015: Feature Engineering Strategies for Credit Card Fraud Detection. Expert Systems with Applications. Under review.
      June 2015: Detecting Credit Card Fraud using Periodic Features. IEEE International Conference on Machine Learning and Applications. In press.
  • 53. Contributions - Costcla 53. costcla is a Python module for cost-sensitive classification built on top of scikit-learn and SciPy, and distributed under the 3-Clause BSD license. In particular, it provides: • a set of example-dependent cost-sensitive algorithms • several real-world example-dependent cost-sensitive datasets. Installation: pip install costcla. Documentation: https://pythonhosted.org/costcla/
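A brief usage sketch along the lines of the costcla documentation; the names used here (load_creditscoring1, CostSensitiveDecisionTreeClassifier, savings_score, the .data/.target/.cost_mat attributes) follow the documented API as I recall it, but exact signatures may differ between versions, so treat this as indicative rather than definitive.

```python
# Indicative costcla usage, assuming the documented API; names and
# signatures may vary by version.
from costcla.datasets import load_creditscoring1
from costcla.models import CostSensitiveDecisionTreeClassifier
from costcla.metrics import savings_score
from sklearn.model_selection import train_test_split

data = load_creditscoring1()
X_tr, X_te, y_tr, y_te, c_tr, c_te = train_test_split(
    data.data, data.target, data.cost_mat, test_size=0.33, random_state=0)

# The per-example cost matrices are passed to fit, so the tree is grown
# to minimize the example-dependent cost rather than the error rate.
model = CostSensitiveDecisionTreeClassifier()
y_pred = model.fit(X_tr, y_tr, c_tr).predict(X_te)
print("Savings:", savings_score(y_te, y_pred, c_te))
```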
  • 55. Contact information Alejandro Correa Bahnsen University of Luxembourg albahnsen.com al.bahnsen@gmail.com http://www.linkedin.com/in/albahnsen https://github.com/albahnsen/CostSensitiveClassification