CREDIT CARD FRAUD DETECTION: EVALUATION AND DATA ANALYSIS WITH ENSEMBLE LEARNING TECHNIQUES
Francesca Pappalardo
Business Innovation and Informatics – Computational Intelligence and Data Analytics
Problems about fraud
○ Customer loss
○ Money loss
○ Mobile devices
○ Unbalanced dataset
○ Unknown identities
Introduction Pappalardo Francesca –Business Innovation & Informatics
“The aim of my analysis is to identify the best model of the Ensemble Learning family capable of correctly predicting fraudulent transactions.”
CRISP-DM
Exploratory Data Analysis
Dataset Explanation
The dataset analyzed in this study was downloaded from the Kaggle platform. It contains transactions recorded over two days in September 2013 and has 31 features, counting the Class label.
CONTINUOUS VARIABLES
○ V1…V28: possibly the result of a PCA dimensionality reduction, applied to protect user identities and sensitive features
○ Amount: transaction amount
○ Time: number of seconds elapsed between the i-th transaction and the first transaction in the dataset
CATEGORICAL VARIABLE
○ Class: 1 = fraudulent transaction, 0 = non-fraudulent transaction
Total transactions: 284,807
Number of fraudulent transactions: 492
Percentage of fraudulent transactions: 0.173%
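The percentage follows directly from the two counts reported on the slide. A quick check in Python, using only those two numbers:

```python
total_transactions = 284_807
fraudulent_transactions = 492

# Share of transactions labelled as fraud (Class = 1).
fraud_rate = fraudulent_transactions / total_transactions
print(f"{fraud_rate:.3%}")  # prints 0.173%
```

Roughly one transaction in 579 is fraudulent, which is why the dataset is described as unbalanced.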
Data Visualization in High Dimensionality
Techniques: PCA and t-SNE
Figures: original dataset; dataset after PCA
Kullback-Leibler divergence: 89.7086633
Analysis
At what time of day do fraudulent transactions take place?
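Because Time is stored as seconds elapsed since the first transaction, answering this question requires mapping it to an hour of the day. The sketch below is one possible way to do that; it assumes the first transaction happened at midnight (the slides do not say so), and the sample rows are invented for illustration.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600

def hour_of_day(seconds_elapsed: float) -> int:
    """Map seconds since the first transaction to an hour in 0..23.

    Assumes the first transaction happened at midnight (not stated in the source).
    """
    return int(seconds_elapsed // SECONDS_PER_HOUR) % 24

# Hypothetical (Time, Class) rows, for illustration only.
rows = [(406.0, 1), (7519.0, 1), (93853.0, 1), (50000.0, 0)]

# Count fraudulent transactions per hour of the day.
fraud_by_hour = Counter(hour_of_day(t) for t, label in rows if label == 1)
print(sorted(fraud_by_hour.items()))  # prints [(0, 1), (2, 2)]
```

The modulo 24 folds the second day of data onto the same 24 hourly buckets as the first.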
Analysis
What was the maximum amount of a fraudulent transaction?
The maximum amount for a fraudulent transaction is $2,125.87.
Data Preparation
Data Preparation Steps
1. Data Standardization: rescale every variable so that it has a mean of 0 and a standard deviation of 1.
2. Splitting Data: 60% train set, 20% validation set, 20% test set.
3. Random Under Sampling: reduce the number of majority-class transactions so that training is cheaper in time and cost.
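The first two steps can be sketched with the Python standard library alone. This is a minimal illustration rather than the author's code: the function names, the seed, and the sample amounts are assumptions, and the order (standardize, then split) simply follows the list above.

```python
import random
import statistics

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def split_60_20_20(rows, seed=0):
    """Shuffle rows and split them into train, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 60 // 100
    n_val = len(rows) * 20 // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

amounts = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical transaction amounts
scaled = standardize(amounts)

train, val, test = split_60_20_20(range(100))
print(len(train), len(val), len(test))  # prints 60 20 20
```

With so few fraud cases, a stratified split that keeps the fraud ratio equal in all three sets would be a natural refinement; the slides do not say whether one was used.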
Data Preparation
Class       Train set (after standardization)   Train set (after Random Under Sampling)
Fraud       308                                 308
Non Fraud   170,575                             1,540
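The counts imply that five non-fraud transactions were kept for every fraud (1,540 = 5 × 308). A minimal sketch of random under-sampling at that ratio follows; the function name, seed, and synthetic labels are assumptions made for illustration.

```python
import random

def random_under_sample(rows, labels, ratio=5, seed=0):
    """Keep every fraud row and a random subset of non-fraud rows.

    ratio is the number of non-fraud rows kept per fraud row; 5 matches
    the 1,540 : 308 counts reported for the train set.
    """
    fraud = [i for i, y in enumerate(labels) if y == 1]
    non_fraud = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(non_fraud), ratio * len(fraud))
    kept = random.Random(seed).sample(non_fraud, n_keep)
    keep = sorted(fraud + kept)
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Synthetic labels for illustration: 10 frauds among 1,000 transactions.
labels = [1] * 10 + [0] * 990
rows = list(range(len(labels)))

sampled_rows, sampled_labels = random_under_sample(rows, labels)
print(sampled_labels.count(1), sampled_labels.count(0))  # prints 10 50
```

Only the train set is under-sampled here, so the validation and test sets keep the original class imbalance.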
Modeling
Ensemble Learning
Ensemble Learning is a Machine Learning paradigm in which multiple models, often called weak learners, are trained to solve the same problem and combined to get better results.
Bagging
Bagging builds multiple models on bootstrap samples of the training data; it is useful for decreasing the model's variance.
Boosting
Boosting builds models by iteratively fitting base learners to the error of the current model; it is useful for decreasing the model's bias.
Stacking
Stacking creates a hierarchy of models in which each layer uses the outputs of the previous layer; it is useful for increasing the predictive power of the classifier.
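To make the bagging idea concrete, here is a small standard-library sketch: each weak learner is a one-feature threshold rule ("stump") fitted on a bootstrap sample, and the ensemble predicts by majority vote. The stump learner and the toy data are illustrative assumptions, not the models used in this analysis.

```python
import random

def fit_stump(xs, ys):
    """Return the threshold t that minimizes errors for the rule x > t -> 1."""
    best_t, best_errors = xs[0], len(xs) + 1
    for t in xs:
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

def bagging_predict(thresholds, x):
    """Majority vote over the stumps."""
    votes = sum(x > t for t in thresholds)
    return int(2 * votes > len(thresholds))

xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.0]  # toy 1-D feature
ys = [0, 0, 0, 0, 1, 1, 1, 1]                  # toy labels
model = bagging_fit(xs, ys)
print(bagging_predict(model, 0.2), bagging_predict(model, 1.9))
```

Individual stumps disagree on the exact threshold because each sees a different bootstrap sample; the vote averages that disagreement out, which is the variance reduction described above.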
Ensemble Classifiers
○ Random Forest
○ AdaBoost
○ XGBoost
Steps to train each model
1. Find the best number of estimators.
2. Compute the OOB score and select features.
3. Evaluate each model on all features and on the features suggested by the OOB score.
Figures: OOB Score – Random Forest; OOB Score – AdaBoost; OOB Score – XGBoost
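For Random Forest, the first two steps might look like the scikit-learn sketch below. It is an assumption-laden illustration: the data is synthetic, the candidate estimator counts are arbitrary, and ranking features by the forest's impurity-based importances is a stand-in, since the slides do not say how the OOB score was turned into a feature selection.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the under-sampled train set (30 features, 5:1 classes).
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           weights=[5 / 6], random_state=0)

# Step 1: pick the number of estimators with the best out-of-bag (OOB) score.
oob_scores = {}
for n_estimators in (50, 100, 200):
    forest = RandomForestClassifier(n_estimators=n_estimators, oob_score=True,
                                    bootstrap=True, random_state=0)
    forest.fit(X, y)
    oob_scores[n_estimators] = forest.oob_score_
best_n = max(oob_scores, key=oob_scores.get)

# Step 2 (assumed method): rank features by importance and keep the top 12.
forest = RandomForestClassifier(n_estimators=best_n, oob_score=True,
                                random_state=0).fit(X, y)
ranked = sorted(range(X.shape[1]),
                key=lambda j: forest.feature_importances_[j], reverse=True)
top_12 = ranked[:12]
print(best_n, round(forest.oob_score_, 3), top_12)
```

The OOB score evaluates each tree on the training rows left out of its bootstrap sample, so it gives a validation-like estimate without touching the validation set.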
Winning classifiers
○ Random Forest with the 12 features suggested by the OOB score
○ AdaBoost (all features)
○ XGBoost (all features)
Ensemble Classifier
○ Ensemble Classifier with all classifiers
○ Ensemble Classifier with winning classifiers
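The slides do not state how the individual classifiers are combined. One common option is soft voting, which averages the fraud probabilities predicted by each model; the sketch below assumes that scheme, and the probabilities are invented.

```python
def soft_vote(probabilities_per_model, threshold=0.5):
    """Average each model's fraud probability and threshold the mean."""
    n_models = len(probabilities_per_model)
    averaged = [sum(ps) / n_models for ps in zip(*probabilities_per_model)]
    return [int(p >= threshold) for p in averaged]

# Hypothetical fraud probabilities from three classifiers for four transactions.
random_forest = [0.90, 0.20, 0.60, 0.10]
adaboost = [0.80, 0.40, 0.45, 0.05]
xgboost = [0.95, 0.30, 0.30, 0.20]

print(soft_vote([random_forest, adaboost, xgboost]))  # prints [1, 0, 0, 0]
```

For the third transaction the models disagree (0.60, 0.45, 0.30); the mean of 0.45 falls below the threshold, so the ensemble labels it as not fraudulent.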
Evaluation
Model Evaluation
Model evaluation is an important step in the project. For this analysis, the models were evaluated on the following metrics:
○ Recall: What proportion of actual positives was identified correctly?
○ Precision: What proportion of positive identifications was correct?
○ ROC-AUC: How well can the model distinguish between the classes?
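Recall and precision can be computed directly from the counts of true positives, false positives, and false negatives. A short sketch with toy labels (not results from this analysis):

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (fraud = 1) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))  # prints (0.75, 0.75)
```

In fraud detection recall usually matters most, because a missed fraud (false negative) is typically more expensive than a false alarm.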
Model Evaluation
The graph shows the False Positive Rate on the x-axis, which is the proportion of non-fraudulent transactions that are incorrectly identified as fraudulent, and the True Positive Rate (Recall) on the y-axis.
AUC provides an aggregate measure of performance across all possible classification thresholds.
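The curve and its area can be reproduced by sweeping the decision threshold over the predicted scores. The sketch below uses toy labels and scores and the trapezoidal rule; the function names are illustrative.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold taken from the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and y == 1 for s, y in zip(scores, y_true))
        fp = sum(s >= threshold and y == 0 for s, y in zip(scores, y_true))
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(y_true, scores)))  # prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.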
Model Evaluation
When each classifier was tested on the validation set, the best recall, 0.82, was obtained by the Ensemble Classifier.
Business Understanding
Evaluation Cost
In a fraud detection system, every fraudulent transaction that is not identified results in a loss of money.
To evaluate the models in terms of cost, I established these values based on the data.
Evaluation Cost
In binary classification problems, the performance of a model is improved by reducing its misclassification rate to a minimum.
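The cost values used in the analysis are not reproduced in this text, so the sketch below relies on assumed ones: a missed fraud costs the transaction amount and each false alarm costs a fixed fee. It only illustrates how misclassifications translate into a monetary figure.

```python
def total_cost(fn_amounts, n_false_positives, fp_cost=1.0):
    """Cost of a classifier's mistakes on a labelled set.

    A missed fraud (false negative) costs the transaction amount; a false
    alarm (false positive) costs a fixed fee. Both are assumptions.
    """
    return sum(fn_amounts) + n_false_positives * fp_cost

# Hypothetical example: two missed frauds and 30 false alarms.
print(total_cost([120.0, 45.5], n_false_positives=30, fp_cost=2.0))  # prints 225.5
```

Under such a cost model, two classifiers with the same misclassification rate can have very different costs, depending on which kind of error they make.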
Future Development
A Neural Network way
Through feature extraction, I obtained the 10 most important features with a Permutation Test on each classifier.
Based on these features, I created two Neural Networks using Keras: one with the 10 important features and one with all features.
The Neural Networks were built with the Keras Sequential model.
The number of neurons was selected from the range 1 to 60 (60 corresponds to twice the number of features) by evaluating the recall obtained with each value.
The optimal number of neurons obtained was 21.
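The neuron search described above amounts to scoring every candidate size and keeping the one with the highest recall. The sketch below shows only that selection loop. The scoring function is a hypothetical stand-in; in the real workflow it would build and train a Keras Sequential model with the given number of neurons and return its recall on the validation set.

```python
def select_hidden_units(recall_for, candidates=range(1, 61)):
    """Return the candidate neuron count with the highest validation recall."""
    return max(candidates, key=recall_for)

def fake_recall(n_units):
    """Stand-in for training a network with n_units neurons and measuring recall.

    Hypothetical curve that peaks at 21 so the example is reproducible.
    """
    return 0.8 - abs(n_units - 21) / 100

print(select_hidden_units(fake_recall))  # prints 21
```

The upper bound of 60 follows the rule stated above: twice the 30 input features.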
Neural Network
Figures: NN with all features; NN with 10 specific features
Future Development
In the future, to continue this analysis:
• Custom Neural Networks can be created to improve the values of the metrics obtained.
• Cost performance can be improved using the thresholding technique.
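Thresholding means replacing the default 0.5 decision threshold with the one that minimizes the expected cost. A minimal sketch, assuming a missed fraud costs ten times as much as a false alarm (the costs, labels, and scores are all invented for illustration):

```python
def best_threshold(y_true, scores, fn_cost=10.0, fp_cost=1.0):
    """Return the (threshold, cost) pair with the lowest total cost."""
    best = None
    for threshold in sorted(set(scores)):
        fn = sum(y == 1 and s < threshold for y, s in zip(y_true, scores))
        fp = sum(y == 0 and s >= threshold for y, s in zip(y_true, scores))
        cost = fn * fn_cost + fp * fp_cost
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Toy labels and fraud probabilities, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
print(best_threshold(y_true, scores))  # prints (0.35, 1.0)
```

On this toy data a threshold of 0.5 would miss the fraud scored 0.35 and cost 11.0, while lowering the threshold to 0.35 catches it at the price of one false alarm.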
Thanks for the attention!
Any questions?
You can find the code at:
https://github.com/kikkatigre/Credit-Card-Fraud-Detection


Editor's Notes

  1. ROC is a probability curve and AUC represents the degree of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with and without a disease.