Fraud Detection with Ensemble Learning Techniques
1. CREDIT CARD FRAUD DETECTION: EVALUATION AND DATA ANALYSIS WITH ENSEMBLE LEARNING TECHNIQUES
Francesca Pappalardo
Business Innovation and Informatics – Computational
Intelligence and Data Analytics
3. “The aim of my analysis is to identify the best model of the Ensemble Learning family capable of correctly predicting fraudulent transactions.”
Introduction – Pappalardo Francesca, Business Innovation & Informatics
6. Dataset Explanation
CONTINUOUS VARIABLES
○ V1…V28: likely the result of a PCA dimensionality reduction applied to protect user identities and sensitive features
○ Amount: transaction amount
○ Time: number of seconds elapsed between the i-th transaction and the first transaction in the dataset
CATEGORICAL VARIABLE
Class
○ 1: Fraudulent Transaction
○ 0: Non-Fraudulent Transaction
The dataset analyzed in this study was downloaded from the Kaggle platform. It contains transactions recorded over two days in September 2013 and presents 31 features.
Exploratory Data Analysis – Pappalardo Francesca, Business Innovation & Informatics
7. ○ Total Transactions: 284,807
○ Number of Fraudulent Transactions: 492
○ Percentage of Fraudulent Transactions: 0.173%
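The imbalance percentage follows directly from the two counts above; a one-line check (the figures are the ones reported on the slide):

```python
# Class imbalance of the dataset, using the counts reported on the slide.
total_transactions = 284_807
fraudulent = 492

pct_fraud = fraudulent / total_transactions * 100
print(f"{pct_fraud:.3f}%")  # → 0.173%
```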
8. Data Visualization in High Dimensionality
Techniques: PCA, t-SNE
[Figure: t-SNE projections of the original dataset and of the dataset after PCA]
Kullback–Leibler divergence: 89.7086633
9. Analysis
What time of the day are fraudulent transactions taking place?
10. Analysis
What was the maximum amount of money involved in a fraudulent transaction?
The maximum amount for a fraudulent transaction is $2,125.87.
12. Data Preparation Steps
1. Data Standardization
To ensure that all the features involved in the analysis have a mean of 0 and a standard deviation of 1.
2. Splitting Data
60% train set, 20% test set, 20% validation set.
3. Random Under Sampling
To reduce the number of transactions of the majority class and keep training time and cost manageable.
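The first two preparation steps can be sketched with scikit-learn; the data here is a synthetic stand-in for the real 284,807 × 31 dataset, and the exact split/random-seed choices are assumptions:

```python
# Sketch of steps 1–2: standardize (fit on the train split only) and
# split 60% / 20% / 20% via two successive train_test_split calls.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # synthetic feature matrix
y = rng.integers(0, 2, size=1000)         # synthetic fraud labels

# 60% train, then split the remaining 40% in half: 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)

# Standardization: mean 0, standard deviation 1 on the training data.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val, X_test = scaler.transform(X_val), scaler.transform(X_test)

print(len(X_train), len(X_val), len(X_test))  # → 600 200 200
```

Fitting the scaler on the train split alone avoids leaking validation/test statistics into training.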
Data Preparation – Pappalardo Francesca, Business Innovation & Informatics
13. Data Preparation
              Train set (after        Train set (after Random
              standardization)        Under Sampling)
Fraud                308                       308
Non-Fraud        170,575                     1,540
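The 308 vs. 1,540 counts in the table suggest the majority class was under-sampled to a 1:5 fraud-to-non-fraud ratio; that ratio is an inference, and the minimal NumPy sketch below is mine, not the deck's code:

```python
# Minimal random under-sampling: keep every minority (fraud, y == 1) row
# and a random sample of `ratio` times as many majority rows.
import numpy as np

def random_under_sample(X, y, ratio=5, seed=42):
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    keep_majority = rng.choice(majority, size=ratio * len(minority), replace=False)
    idx = rng.permutation(np.concatenate([minority, keep_majority]))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
y = np.zeros(10_000, dtype=int)
y[:100] = 1                               # 100 synthetic "frauds"
X = rng.normal(size=(10_000, 3))

X_res, y_res = random_under_sample(X, y)
print(int(y_res.sum()), int(len(y_res) - y_res.sum()))  # → 100 500
```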
15. Ensemble Learning
Ensemble Learning is a Machine Learning paradigm in which multiple models, called weak learners, are trained to solve the same problem and combined to obtain better results.
Bagging
Bagging builds multiple models using bootstrap sampling; it is useful to decrease the model's variance.
Boosting
Boosting builds models by iteratively fitting base learners to the model's errors; it is useful to decrease the model's bias.
Stacking
Stacking creates a hierarchy of models using the outputs of the previous layers; it is useful to increase the predictive power of the classifier.
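The three families can each be instantiated in a few lines with scikit-learn; the toy data and hyperparameters below are illustrative, not the ones used in the analysis:

```python
# One sketch per ensemble family on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

# Bagging: bootstrap-sampled trees, variance reduction.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=42)
# Boosting: trees fit sequentially on re-weighted errors, bias reduction.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
# Stacking: a meta-model combines the outputs of the previous layer.
stacking = StackingClassifier(
    estimators=[("bag", bagging), ("boost", boosting)],
    final_estimator=LogisticRegression())

for name, model in [("bagging", bagging), ("boosting", boosting),
                    ("stacking", stacking)]:
    print(name, round(model.fit(X, y).score(X, y), 2))
```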
Modeling – Pappalardo Francesca, Business Innovation & Informatics
17. Steps to train each model
1. Find the best number of estimators;
2. Use the OOB Score to select features;
3. Evaluate each model with all features and with the features suggested by the OOB Score.
[Figures: OOB Score – Random Forest; OOB Score – AdaBoost; OOB Score – XGBoost]
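Step 2 can be sketched as follows: rank features by importance, then compare the out-of-bag (OOB) score of a Random Forest refit on each top-k subset. The data is synthetic and the candidate subset sizes are assumptions; the deck's 12-feature subset came from this kind of search:

```python
# Use the OOB score of a Random Forest to compare feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            random_state=42).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]  # most important first

# Re-fit on the top-k features and track the OOB score for each k.
scores = {}
for k in (5, 10, 20):
    sub = RandomForestClassifier(n_estimators=100, oob_score=True,
                                 random_state=42)
    scores[k] = sub.fit(X[:, order[:k]], y).oob_score_
print(scores)
```

The OOB score reuses the bootstrap's left-out rows as a built-in validation set, so no extra hold-out data is needed for this comparison.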
18. Winning classifiers
○ Random Forest with the 12 features suggested by the OOB Score
○ AdaBoost (all features)
○ XGBoost (all features)
21. Model Evaluation
The model evaluation phase is an important step in the project. For this analysis, the models were evaluated using metrics such as:
○ Recall: what proportion of actual positives was identified correctly?
○ Precision: what proportion of positive identifications was correct?
○ ROC-AUC: how well the model is able to distinguish between classes.
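All three metrics are one call each in scikit-learn; the labels and scores below are toy values chosen only to make the arithmetic easy to follow:

```python
# Recall, precision, and ROC-AUC on toy predictions.
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]                     # hard labels
y_score = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]    # fraud probabilities

print(recall_score(y_true, y_pred))     # 3 of 4 actual frauds caught → 0.75
print(precision_score(y_true, y_pred))  # 3 of 4 fraud calls correct  → 0.75
print(roc_auc_score(y_true, y_score))   # ranking quality             → 0.9375
```

Note that ROC-AUC is computed from the continuous scores, not the hard labels, since it sweeps over all thresholds.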
Evaluation – Pappalardo Francesca, Business Innovation & Informatics
22. Model Evaluation
The graph shows on the x-axis the False Positive Rate, which indicates the proportion of non-fraudulent transactions that are identified as fraudulent. On the y-axis it shows the True Positive Rate, or Recall. AUC provides a measure of performance across all possible classification thresholds.
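The two axes of that plot can be computed directly from a confusion matrix, matching the definitions above (toy labels again):

```python
# FPR (x-axis) and TPR/Recall (y-axis) from a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fp / (fp + tn))  # FPR: non-fraud flagged as fraud → 0.25
print(tp / (tp + fn))  # TPR / Recall                    → 0.75
```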
23. Model Evaluation
Testing each classifier on the validation set, the best result in terms of recall was obtained by the Ensemble Classifier, with a value of 0.82.
25. Evaluation Cost
In a fraud detection system, if a fraudulent transaction is not identified, a monetary loss is incurred. To evaluate the models in terms of cost, I established these values based on the data.
Business Understanding – Pappalardo Francesca, Business Innovation & Informatics
26. Evaluation Cost
In binary classification problems, the performance of a model is improved by minimizing the misclassification rate.
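A cost evaluation of this kind weights the two error types differently: a missed fraud (false negative) loses the transaction amount, while a false alarm (false positive) only costs a review. The deck does not list its cost values, so the unit costs below are purely illustrative placeholders:

```python
# Hypothetical cost function: false negatives are far more expensive
# than false positives (both unit costs are illustrative).
def total_cost(y_true, y_pred, cost_fn=100.0, cost_fp=5.0):
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp

print(total_cost([0, 0, 1, 1], [0, 1, 0, 1]))  # 1 FN + 1 FP → 105.0
```

Under such asymmetric costs, the model with the lowest misclassification rate is not necessarily the cheapest one, which is why the cost view complements the metric view.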
28. A Neural Network Approach
Through feature extraction, I obtained the 10 most important features with a permutation test on each classifier. Based on these features, I created two Neural Networks using Keras: one with the 10 specific important features and one with all features.
The model used for the Neural Networks was Sequential. The number of neurons was selected from a range of 1 to 60 (60 corresponds to twice the number of features), evaluating the recall value returned by each. The optimal number of neurons obtained was 21.
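The neuron-count search can be sketched as a loop over candidate hidden-layer sizes, keeping the size with the best validation recall. To stay self-contained, the sketch uses scikit-learn's MLPClassifier as a stand-in for the Keras Sequential model, synthetic data, and a small grid instead of the full 1–60 range:

```python
# Search over hidden-layer sizes, scoring each candidate by validation recall.
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=42)

def val_recall(n_neurons):
    """Fit a one-hidden-layer network and return its validation recall."""
    clf = MLPClassifier(hidden_layer_sizes=(n_neurons,), max_iter=500,
                        random_state=42).fit(X_tr, y_tr)
    return recall_score(y_val, clf.predict(X_val))

candidates = (5, 10, 21, 40)   # illustrative subset of the 1–60 range
best = max(candidates, key=val_recall)
print("best neuron count:", best)
```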
Future Development – Pappalardo Francesca, Business Innovation & Informatics
29. Neural Network
[Figures: NN with all features; NN with the 10 specific features]
30. Future Development
In the future, to continue this analysis:
• Customized Neural Networks can be created to improve the values of the metrics obtained.
• Performance costs can be improved using a thresholding technique.
31. Thanks for your attention!
Any questions?
You can find the code at:
https://github.com/kikkatigre/Credit-Card-Fraud-Detection
Editor's Notes
ROC is a probability curve and AUC represents the degree or measure of separability: it tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with and without a disease.