Fraud Detection with Ensemble Learning Techniques
1. CREDIT CARD FRAUD DETECTION: EVALUATION AND DATA ANALYSIS WITH ENSEMBLE LEARNING TECHNIQUES
Francesca Pappalardo
Business Innovation and Informatics – Computational
Intelligence and Data Analytics
3. “The aim of my analysis is to identify the best model of the Ensemble Learning family capable of correctly predicting fraudulent transactions.”
Introduction – Pappalardo Francesca, Business Innovation & Informatics
6. Dataset Explanation
CONTINUOUS VARIABLES
○ V1…V28: likely the result of a PCA dimensionality reduction applied to protect user identities and sensitive features
○ Amount: transaction amount
○ Time: number of seconds elapsed between the i-th transaction and the first transaction in the dataset
CATEGORICAL VARIABLE
Class
○ 1: Fraudulent Transaction
○ 0: Non-Fraudulent Transaction
The dataset analyzed in this study was downloaded from the Kaggle platform. It contains transactions recorded over two days in September 2013 and presents 31 features.
Exploratory Data Analysis – Pappalardo Francesca, Business Innovation & Informatics
7. ○ Total Transactions: 284,807
○ Number of Fraudulent Transactions: 492
○ Percentage of Fraudulent Transactions: 0.173%
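The imbalance percentage follows directly from the two counts above; a one-line check (the figures are the ones reported on the slide):

```python
# Class imbalance of the dataset, using the counts reported on the slide.
total_transactions = 284_807
fraudulent = 492

pct_fraud = fraudulent / total_transactions * 100
print(f"{pct_fraud:.3f}%")  # → 0.173%
```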
8. Data Visualization in High Dimensionality
Techniques: PCA, t-SNE
[Figure: t-SNE projections of the original dataset and of the dataset after PCA]
Kullback–Leibler divergence: 89.7086633
9. Analysis
What time of the day are fraudulent transactions taking place?
10. Analysis
What was the maximum amount of money involved in a fraudulent transaction?
The maximum amount for a fraudulent transaction is $2,125.87.
12. Data Preparation Steps
1. Data Standardization
To ensure that all the features involved in the analysis have a mean of 0 and a standard deviation of 1.
2. Splitting Data
60% train set, 20% test set, 20% validation set.
3. Random Under Sampling
To reduce the number of transactions of the majority class and keep training time and cost manageable.
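The first two preparation steps can be sketched with scikit-learn; the data here is a synthetic stand-in for the real 284,807 × 31 dataset, and the exact split/random-seed choices are assumptions:

```python
# Sketch of steps 1–2: standardize (fit on the train split only) and
# split 60% / 20% / 20% via two successive train_test_split calls.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # synthetic feature matrix
y = rng.integers(0, 2, size=1000)         # synthetic fraud labels

# 60% train, then split the remaining 40% in half: 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)

# Standardization: mean 0, standard deviation 1 on the training data.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val, X_test = scaler.transform(X_val), scaler.transform(X_test)

print(len(X_train), len(X_val), len(X_test))  # → 600 200 200
```

Fitting the scaler on the train split alone avoids leaking validation/test statistics into training.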
Data Preparation – Pappalardo Francesca, Business Innovation & Informatics
13. Data Preparation
              Train set (after        Train set (after Random
              standardization)        Under Sampling)
Fraud                308                       308
Non-Fraud        170,575                     1,540
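The 308 vs. 1,540 counts in the table suggest the majority class was under-sampled to a 1:5 fraud-to-non-fraud ratio; that ratio is an inference, and the minimal NumPy sketch below is mine, not the deck's code:

```python
# Minimal random under-sampling: keep every minority (fraud, y == 1) row
# and a random sample of `ratio` times as many majority rows.
import numpy as np

def random_under_sample(X, y, ratio=5, seed=42):
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    keep_majority = rng.choice(majority, size=ratio * len(minority), replace=False)
    idx = rng.permutation(np.concatenate([minority, keep_majority]))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
y = np.zeros(10_000, dtype=int)
y[:100] = 1                               # 100 synthetic "frauds"
X = rng.normal(size=(10_000, 3))

X_res, y_res = random_under_sample(X, y)
print(int(y_res.sum()), int(len(y_res) - y_res.sum()))  # → 100 500
```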
15. Ensemble Learning
Ensemble Learning is a Machine Learning paradigm in which multiple models, called weak learners, are trained to solve the same problem and combined to obtain better results.
Bagging
Bagging builds multiple models using bootstrap sampling; it is useful to decrease the model's variance.
Boosting
Boosting builds models by iteratively fitting base learners to the model's errors; it is useful to decrease the model's bias.
Stacking
Stacking creates a hierarchy of models using the outputs of the previous layers; it is useful to increase the predictive power of the classifier.
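The three families can each be instantiated in a few lines with scikit-learn; the toy data and hyperparameters below are illustrative, not the ones used in the analysis:

```python
# One sketch per ensemble family on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

# Bagging: bootstrap-sampled trees, variance reduction.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=42)
# Boosting: trees fit sequentially on re-weighted errors, bias reduction.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
# Stacking: a meta-model combines the outputs of the previous layer.
stacking = StackingClassifier(
    estimators=[("bag", bagging), ("boost", boosting)],
    final_estimator=LogisticRegression())

for name, model in [("bagging", bagging), ("boosting", boosting),
                    ("stacking", stacking)]:
    print(name, round(model.fit(X, y).score(X, y), 2))
```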
Modeling – Pappalardo Francesca, Business Innovation & Informatics
17. Steps to train each model
1. Find the best number of estimators;
2. Use the OOB Score to select features;
3. Evaluate each model with all features and with the features suggested by the OOB Score.
[Figures: OOB Score – Random Forest; OOB Score – AdaBoost; OOB Score – XGBoost]
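Step 2 can be sketched as follows: rank features by importance, then compare the out-of-bag (OOB) score of a Random Forest refit on each top-k subset. The data is synthetic and the candidate subset sizes are assumptions; the deck's 12-feature subset came from this kind of search:

```python
# Use the OOB score of a Random Forest to compare feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            random_state=42).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]  # most important first

# Re-fit on the top-k features and track the OOB score for each k.
scores = {}
for k in (5, 10, 20):
    sub = RandomForestClassifier(n_estimators=100, oob_score=True,
                                 random_state=42)
    scores[k] = sub.fit(X[:, order[:k]], y).oob_score_
print(scores)
```

The OOB score reuses the bootstrap's left-out rows as a built-in validation set, so no extra hold-out data is needed for this comparison.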
18. Winning classifiers
○ Random Forest with the 12 features suggested by the OOB Score
○ AdaBoost (all features)
○ XGBoost (all features)
21. Model Evaluation
The model evaluation phase is an important step in the project. For this analysis, the models were evaluated using metrics such as:
○ Recall: what proportion of actual positives was identified correctly?
○ Precision: what proportion of positive identifications was correct?
○ ROC-AUC: how well the model is able to distinguish between classes.
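All three metrics are one call each in scikit-learn; the labels and scores below are toy values chosen only to make the arithmetic easy to follow:

```python
# Recall, precision, and ROC-AUC on toy predictions.
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]                     # hard labels
y_score = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]    # fraud probabilities

print(recall_score(y_true, y_pred))     # 3 of 4 actual frauds caught → 0.75
print(precision_score(y_true, y_pred))  # 3 of 4 fraud calls correct  → 0.75
print(roc_auc_score(y_true, y_score))   # ranking quality             → 0.9375
```

Note that ROC-AUC is computed from the continuous scores, not the hard labels, since it sweeps over all thresholds.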
Evaluation – Pappalardo Francesca, Business Innovation & Informatics
22. Model Evaluation
The graph shows on the x-axis the False Positive Rate, which indicates the proportion of non-fraudulent transactions that are identified as fraudulent. On the y-axis it shows the True Positive Rate, or Recall. AUC provides a measure of performance across all possible classification thresholds.
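The two axes of that plot can be computed directly from a confusion matrix, matching the definitions above (toy labels again):

```python
# FPR (x-axis) and TPR/Recall (y-axis) from a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fp / (fp + tn))  # FPR: non-fraud flagged as fraud → 0.25
print(tp / (tp + fn))  # TPR / Recall                    → 0.75
```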
23. Model Evaluation
Testing each classifier on the validation set, the best result in terms of recall was obtained by the Ensemble Classifier, with a value of 0.82.
25. Evaluation Cost
In a fraud detection system, if a fraudulent transaction is not identified, a monetary loss is incurred. To evaluate the models in terms of cost, I established these values based on the data.
Business Understanding – Pappalardo Francesca, Business Innovation & Informatics
26. Evaluation Cost
In binary classification problems, the performance of a model is improved by minimizing the misclassification rate.
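A cost evaluation of this kind weights the two error types differently: a missed fraud (false negative) loses the transaction amount, while a false alarm (false positive) only costs a review. The deck does not list its cost values, so the unit costs below are purely illustrative placeholders:

```python
# Hypothetical cost function: false negatives are far more expensive
# than false positives (both unit costs are illustrative).
def total_cost(y_true, y_pred, cost_fn=100.0, cost_fp=5.0):
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp

print(total_cost([0, 0, 1, 1], [0, 1, 0, 1]))  # 1 FN + 1 FP → 105.0
```

Under such asymmetric costs, the model with the lowest misclassification rate is not necessarily the cheapest one, which is why the cost view complements the metric view.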
28. A Neural Network Approach
Through feature extraction, I obtained the 10 most important features with a permutation test on each classifier. Based on these features, I created two Neural Networks using Keras: one with the 10 specific important features and one with all features.
The model used for the Neural Networks was Sequential. The number of neurons was selected from a range of 1 to 60 (60 corresponds to twice the number of features), evaluating the recall value returned by each. The optimal number of neurons obtained was 21.
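The neuron-count search can be sketched as a loop over candidate hidden-layer sizes, keeping the size with the best validation recall. To stay self-contained, the sketch uses scikit-learn's MLPClassifier as a stand-in for the Keras Sequential model, synthetic data, and a small grid instead of the full 1–60 range:

```python
# Search over hidden-layer sizes, scoring each candidate by validation recall.
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=42)

def val_recall(n_neurons):
    """Fit a one-hidden-layer network and return its validation recall."""
    clf = MLPClassifier(hidden_layer_sizes=(n_neurons,), max_iter=500,
                        random_state=42).fit(X_tr, y_tr)
    return recall_score(y_val, clf.predict(X_val))

candidates = (5, 10, 21, 40)   # illustrative subset of the 1–60 range
best = max(candidates, key=val_recall)
print("best neuron count:", best)
```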
Future Development – Pappalardo Francesca, Business Innovation & Informatics
29. Neural Network
[Figures: NN with all features; NN with the 10 specific features]
30. Future Development
In the future, to continue this analysis:
• Customized Neural Networks can be created to improve the values of the metrics obtained.
• Performance costs can be improved using a thresholding technique.
31. Thanks for your attention!
Any questions?
You can find the code at:
https://github.com/kikkatigre/Credit-Card-Fraud-Detection
Editor's Notes
ROC is a probability curve and AUC represents the degree or measure of separability: it tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with and without a disease.