Fraud Data Science
Alejandro Correa Bahnsen, PhD
Lead Data Scientist
About me
• PhD in Machine Learning at Luxembourg University
• Lead Data Scientist at Easy Solutions
• Worked for 8+ years as a data scientist at GE Money, Scotiabank and SIX Financial Services
• Bachelor and Master in Industrial Engineering
• Organizer of the Big Data & Data Science Bogota Meetup
Data Science
Big data (Data Science) is like teenage sex:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it...
The pillars of data science are computing, statistics, mathematics and quantitative disciplines, combined to analyze data for better decision making.
Data Science is the use of methods and tools of Machine Learning and Artificial Intelligence with the objective of making data-driven decisions.
Fraud detection and prevention
Estimate the probability of a transaction being fraud based on
analyzing customer patterns and recent fraudulent behavior
Issues when constructing a fraud detection system:
• Skewness of the data
• Cost-sensitivity
• Short time response of the system
• Dimensionality of the search space
• Feature preprocessing
• Model selection
Credit card fraud detection
Network
Fraud??
• Large European card processing company
• 2012 & 2013 card-present transactions
• 20MM transactions
• 40,000 frauds
• 0.467% fraud rate
• ~2MM EUR lost due to fraud on the test dataset
[Figure: monthly timeline, Jan to Dec, split into Train and Test sets]
Data
• “Purpose is to use facts and rules, taken from the knowledge
of many human experts, to help make decisions.”
• Example of rules
• More than 4 ATM transactions in one hour?
• More than 2 transactions in 5 minutes?
• Magnetic stripe transaction then internet transaction?
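The example rules above can be sketched as plain Python checks. This is only an illustration: the record fields (`time`, `type`, `entry_mode`), the rule names, and the decision to count the new transaction toward each threshold are assumptions, not the rules of any production system.

```python
from datetime import datetime, timedelta


def count_recent(history, now, window, trx_type=None):
    """Count past transactions inside [now - window, now), optionally of one type."""
    return sum(
        1
        for t in history
        if now - window <= t["time"] < now
        and (trx_type is None or t["type"] == trx_type)
    )


def expert_rules(history, new_trx):
    """Return the names of the (hypothetical) rules triggered by new_trx.

    `history` is assumed to be sorted by time, oldest first.
    """
    now = new_trx["time"]
    fired = []

    # Rule 1: more than 4 ATM transactions in one hour (new one included).
    atm_count = count_recent(history, now, timedelta(hours=1), "ATM")
    if new_trx["type"] == "ATM":
        atm_count += 1
    if atm_count > 4:
        fired.append("atm_velocity")

    # Rule 2: more than 2 transactions in 5 minutes (new one included).
    if count_recent(history, now, timedelta(minutes=5)) + 1 > 2:
        fired.append("burst")

    # Rule 3: magnetic stripe transaction followed by an internet transaction.
    if (
        history
        and history[-1]["entry_mode"] == "magnetic stripe"
        and new_trx["type"] == "Internet"
    ):
        fired.append("stripe_then_internet")

    return fired


# Hypothetical history: four ATM withdrawals, ten minutes apart.
history = [
    {"time": datetime(2013, 1, 1, 10, m), "type": "ATM", "entry_mode": "chip and pin"}
    for m in (0, 10, 20, 30)
]
new_trx = {"time": datetime(2013, 1, 1, 10, 40), "type": "ATM", "entry_mode": "chip and pin"}
fired = expert_rules(history, new_trx)
print(fired)  # ['atm_velocity']
```

Each rule is a fixed threshold, which is why such systems are easy to explain but hard to tune as fraud patterns change.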
If-Then rules (Expert rules)
Misclassification: 1.04%, Recall: 31%, Precision: 17%, F1-Score: 22%
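As a quick check on how these metrics relate, the F1-Score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported expert-rule results: precision 17%, recall 31%.
f1 = f1_score(0.17, 0.31)
print(round(f1, 2))  # 0.22
```

The low misclassification rate says little on its own: with a fraud rate of 0.467%, a system that flags nothing at all would misclassify only about that share of transactions, while catching no fraud.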
If-Then rules (Expert rules)
Credit card fraud detection is a cost-sensitive problem, because the cost of a false positive differs from the cost of a false negative.
• False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost.
• False negatives: when a fraud goes undetected, the amount of that transaction is lost.
Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, because transaction amounts vary significantly.
Financial evaluation
Cost matrix
\[
Cost(f(S)) = \sum_{i=1}^{N} \Big( y_i \big( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big)
\]
                              Actual Positive (y_i = 1)   Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)  C_{TP_i} = C_a              C_{FP_i} = C_a
Predicted Negative (c_i = 0)  C_{FN_i} = Amt_i            C_{TN_i} = 0
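A minimal sketch of the cost measure with this cost matrix plugged in is shown here. It reads C_a as the administrative cost of reviewing a flagged transaction, which is an interpretation of the slides, and the sample labels, predictions, and amounts are made up for illustration.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Example-dependent cost of a set of predictions.

    y_true[i]  : 1 if transaction i is fraud, else 0      (y_i)
    y_pred[i]  : 1 if transaction i is flagged, else 0    (c_i)
    amounts[i] : amount of transaction i                  (Amt_i)
    admin_cost : cost of handling a flagged transaction   (C_a)
    """
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        c_tp = admin_cost  # fraud caught: administrative cost only
        c_fp = admin_cost  # false alarm: administrative cost
        c_fn = amt         # missed fraud: the transaction amount is lost
        c_tn = 0.0         # legitimate and not flagged: no cost
        cost += y * (c * c_tp + (1 - c) * c_fn) + (1 - y) * (c * c_fp + (1 - c) * c_tn)
    return cost


# Hypothetical data: two frauds (one caught, one missed), one false alarm.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
amounts = [100.0, 250.0, 40.0, 10.0]
print(total_cost(y_true, y_pred, amounts, admin_cost=5.0))  # 260.0
```

Because the false-negative cost is the transaction amount, missing one large fraud can outweigh many false alarms, which a plain error rate cannot express.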
Financial evaluation
Cost: 1.24 €, Total Losses: 1.94 €
Misclassification: 1.04%, Recall: 31%, Precision: 17%, F1-Score: 22%
If-Then rules (Expert rules)
Fraud Data Science
Fraud Data Science is the use of statistical and mathematical techniques (Machine Learning) to discover patterns in data in order to make predictions
Fraud Data Science
Raw features
Attribute name     Description
Transaction ID     Transaction identification number
Time               Date and time of the transaction
Account number     Identification number of the customer
Card number        Identification of the credit card
Transaction type   e.g. Internet, ATM, POS, ...
Entry mode         e.g. Chip and PIN, magnetic stripe, ...
Amount             Amount of the transaction in euros
Merchant code      Identification of the merchant type
Merchant group     Merchant group identification
Country            Country of the transaction
Country 2          Country of residence
Type of card       e.g. Visa debit, Mastercard, American Express, ...
Gender             Gender of the card holder
Age                Card holder age
Bank               Issuer bank of the card
Features
Transaction aggregation strategy
Raw Features
TrxId Time Type Country Amt
1 1/1 18:20 POS Lux 250
2 1/1 20:35 POS Lux 400
3 1/1 22:30 ATM Lux 250
4 2/1 00:50 POS Ger 50
5 2/1 19:18 POS Ger 100
6 2/1 23:45 POS Ger 150
7 3/1 06:00 POS Lux 10
Aggregated Features
TrxId  No Trx last 24h  Amt last 24h  No Trx last 24h same type and country  Amt last 24h same type and country
1      0                0             0                                      0
2      1                250           1                                      250
3      2                650           0                                      0
4      3                900           0                                      0
5      3                700           1                                      50
6      2                150           2                                      150
7      3                400           0                                      0
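The aggregation strategy can be sketched in a few lines of plain Python. The sketch assumes a strict 24-hour look-back that excludes the current transaction, reads the dates as day/month, and uses 2013 as a placeholder year; all three are assumptions about how the example was built.

```python
from datetime import datetime, timedelta

# Raw features from the example: (TrxId, Time, Type, Country, Amt).
# The year is a placeholder; the slides only give day/month and time.
RAW = [
    (1, datetime(2013, 1, 1, 18, 20), "POS", "Lux", 250),
    (2, datetime(2013, 1, 1, 20, 35), "POS", "Lux", 400),
    (3, datetime(2013, 1, 1, 22, 30), "ATM", "Lux", 250),
    (4, datetime(2013, 1, 2, 0, 50), "POS", "Ger", 50),
    (5, datetime(2013, 1, 2, 19, 18), "POS", "Ger", 100),
    (6, datetime(2013, 1, 2, 23, 45), "POS", "Ger", 150),
    (7, datetime(2013, 1, 3, 6, 0), "POS", "Lux", 10),
]


def aggregate(trxs, window=timedelta(hours=24)):
    """For each transaction, summarize the customer's previous `window` of activity.

    Returns one tuple per transaction:
    (count, amount, count same type and country, amount same type and country).
    """
    features = []
    for _, time, trx_type, country, _ in trxs:
        # Earlier transactions that fall inside the look-back window.
        prev = [t for t in trxs if time - window <= t[1] < time]
        # Subset that matches the current transaction's type and country.
        same = [t for t in prev if t[2] == trx_type and t[3] == country]
        features.append(
            (len(prev), sum(t[4] for t in prev), len(same), sum(t[4] for t in same))
        )
    return features


for trx, feats in zip(RAW, aggregate(RAW)):
    print(trx[0], feats)
```

Under this window definition the first six rows reproduce the aggregated values in the table; the result for the seventh row depends on how the window in the original example was defined.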
Features
When is a customer expected to
make a new transaction?
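Considering a von Mises distribution with a period of 24 hours such that
\[
P(time) \sim vonmises(\mu, \sigma) = \frac{e^{\sigma \cos(time - \mu)}}{2 \pi I_0(\sigma)}
\]
where μ is the periodic mean, σ is the concentration parameter (larger values mean less spread around μ, so it acts as an inverse measure of dispersion rather than a standard deviation), and I_0 is the modified Bessel function of the first kind of order 0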
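The density can be evaluated with the standard library alone. In this sketch the hour of day is mapped to an angle on a 24-hour circle before applying the formula, and the Bessel function is computed from its power series; the function names and the example parameters are illustrative assumptions.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def hour_to_angle(hour):
    """Map an hour of day in [0, 24) to an angle in [0, 2*pi)."""
    return 2.0 * math.pi * (hour % 24.0) / 24.0


def von_mises_pdf(hour, mean_hour, concentration):
    """Von Mises density (per radian) at `hour`.

    `concentration` is the parameter written as sigma in the formula:
    larger values give a sharper peak around `mean_hour`.
    """
    theta = hour_to_angle(hour)
    mu = hour_to_angle(mean_hour)
    return math.exp(concentration * math.cos(theta - mu)) / (
        2.0 * math.pi * bessel_i0(concentration)
    )


# A customer centered on 01:00: 23:00 and 03:00 are equally likely,
# because distance is measured around the clock, not along a line.
print(von_mises_pdf(23.0, 1.0, 2.0), von_mises_pdf(3.0, 1.0, 2.0))
```

Treating time as periodic is the point of the feature: an arithmetic mean of 23:00 and 01:00 would land at noon, while the circular view keeps both close to midnight.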
Periodic features
Periodic features
*New Periodic features
• Analyzing the time of a transaction using a 24-hour clock
• Model a non-linear von Mises kernel
*New Periodic features
19h risk = 10
9h risk = 95
• Estimate the risk by comparing a new transaction with the kernel distribution
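One way to picture the kernel distribution is as an average of von Mises kernels, one centered on each past transaction time. The sketch below assumes that form, a hypothetical customer history, and an arbitrary concentration value. How the slides turn the density into a 0 to 100 risk score is not stated, so the sketch stops at the density.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_kernel_density(hour, past_hours, concentration):
    """Average of von Mises kernels centered on each past transaction hour."""
    theta = 2.0 * math.pi * (hour % 24.0) / 24.0
    norm = 2.0 * math.pi * bessel_i0(concentration)
    total = 0.0
    for past in past_hours:
        mu = 2.0 * math.pi * (past % 24.0) / 24.0
        total += math.exp(concentration * math.cos(theta - mu)) / norm
    return total / len(past_hours)


# Hypothetical customer who usually transacts in the evening.
past_hours = [18.0, 18.5, 19.0, 19.5, 20.0, 21.0]
usual = von_mises_kernel_density(19.0, past_hours, concentration=4.0)
unusual = von_mises_kernel_density(9.0, past_hours, concentration=4.0)
print(usual > unusual)  # True
```

A low density at the time of a new transaction means the time is unusual for this customer, which is the signal a risk score would be built on.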
Modeling Basics
[Figure: amount of the transaction vs. number of transactions last day; legend: Normal Transaction, Fraud]
[Figure: amount of the transaction, number of transactions last day, and number of ATM transactions last week; legend: Normal Transaction, Fraud]
Fraud Analytics
Algorithms
Fuzzy Rules
Neural Nets
Naive Bayes
Random Forests
RF with Cost-Proportionate Rejection Sampling
Cost-Sensitive Random Patches
Decision Trees
[Figure: % Savings and % Frauds (axis 0% to 100%) by algorithm: Expert Rules, Fuzzy Rules, Neural Nets, Naïve Bayes, Random Forests, RF - CP Random Sampling, CS Random Patches]
Model Performance vs. Interpretability
Black Box Decryption
Local Interpretable Model-agnostic Explanations
The LIME algorithm approximates the underlying model with an interpretable one by:
• Learning on perturbations of the original instance
• Finding the nearest neighborhood around the target instance
• Training a sparse linear model in the
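The idea behind these steps can be sketched without any library. This is a simplified local surrogate in the spirit of LIME, not the `lime` package: it perturbs one instance, weights the samples by proximity, and fits a weighted linear model, but it skips the sparsity step. The black-box model, feature scales, and kernel width are all hypothetical.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model: score in (0, 1) from (amount, trx last day)."""
    amount, n_trx = x
    return 1.0 / (1.0 + math.exp(-(0.01 * amount + 0.8 * n_trx - 6.0)))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain(model, x0, scales, n_samples=2000, width=1.0, seed=0):
    """Fit a proximity-weighted linear model around x0.

    Returns (intercept, weights); weights are per unit of `scales`.
    """
    rng = random.Random(seed)
    k = len(x0)
    xtwx = [[0.0] * (k + 1) for _ in range(k + 1)]
    xtwy = [0.0] * (k + 1)
    for _ in range(n_samples):
        # Perturb the instance; z is the perturbation in standardized units.
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        x = [x0[j] + z[j] * scales[j] for j in range(k)]
        # Samples close to the original instance get a larger weight.
        w = math.exp(-sum(v * v for v in z) / width ** 2)
        row = [1.0] + z
        y = model(x)
        # Accumulate the weighted normal equations X'WX and X'Wy.
        for i in range(k + 1):
            xtwy[i] += w * row[i] * y
            for j in range(k + 1):
                xtwx[i][j] += w * row[i] * row[j]
    coefs = solve(xtwx, xtwy)
    return coefs[0], coefs[1:]


x0 = [300.0, 4.0]  # hypothetical transaction: amount 300, 4 transactions last day
intercept, weights = explain(black_box, x0, scales=[50.0, 1.0])
print(intercept, weights)
```

The fitted weights describe how the score moves near this one transaction only; a different transaction can yield a different explanation from the same model.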
Interpreting Model Predictions
Transaction 1
Anomaly Score = 82
Example of using LIME to understand predictions of an anomaly detection algorithm (Isolation Forest), trained with over 2 million parameters.
Interpreting Model Predictions
Transaction 3
Anomaly Score = 99
Transaction 2
Anomaly Score = 0
Takeaways!!
• Fraud Data Science (ML) models are significantly better than expert rules
• Models should be evaluated taking into account the real financial costs of the application
• Algorithms should be developed to incorporate those financial costs
• Don't be afraid of complex ML models
Questions?
Alejandro Correa Bahnsen, PhD
Lead Data Scientist
acorrea@Easysol.net

 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 

1609 Fraud Data Science

  • 1. Fraud Data Science. Alejandro Correa Bahnsen, PhD, Lead Data Scientist
  • 2. About me: • PhD in Machine Learning at Luxembourg University • Lead Data Scientist at Easy Solutions • Worked for 8+ years as a data scientist at GE Money, Scotiabank and SIX Financial Services • Bachelor and Master in Industrial Engineering • Organizer of the Big Data & Data Science Bogota Meetup
  • 6. Big data (Data Science) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...
  • 7. Those are the pillars of data science: computing, statistics, mathematics and quantitative disciplines combined to analyze data for better decision making
  • 8. Data Science is the use of methods and tools of Machine Learning and Artificial Intelligence with the objective of making data-driven decisions
  • 10. Credit card fraud detection: Estimate the probability of a transaction being fraudulent by analyzing customer patterns and recent fraudulent behavior. Issues when constructing a fraud detection system: • Skewness of the data • Cost-sensitivity • Short response time of the system • Dimensionality of the search space • Feature preprocessing • Model selection
  • 12. Data: • Large European card processing company • 2012 & 2013 card-present transactions • 20MM transactions • 40,000 frauds • 0.467% fraud rate • ~2MM EUR lost due to fraud on test dataset [Figure: timeline of months Jan to Dec, divided into Train and Test]
  • 13. If-Then rules (Expert rules): • “Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions.” • Example of rules: • More than 4 ATM transactions in one hour? • More than 2 transactions in 5 minutes? • Magnetic stripe transaction then internet transaction?
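The three example rules can be sketched as plain Python checks. This is a minimal illustration, not the rule engine used in the talk: the function name `rule_flags`, the dictionary keys, and the flag names are hypothetical, and the third rule is assumed to compare a new transaction only with the one immediately before it, since the slide gives no time window.

```python
from datetime import datetime, timedelta

def rule_flags(history, new_trx):
    """Return the names of the expert rules that new_trx triggers.

    history holds earlier transactions of the same card; every transaction
    is a dict with the (hypothetical) keys "time", "type" and "entry_mode".
    """
    now = new_trx["time"]
    everything = history + [new_trx]

    def within(window):
        # Transactions, including the new one, that fall inside the window.
        return [t for t in everything if timedelta(0) <= now - t["time"] <= window]

    flags = []
    # More than 4 ATM transactions in one hour?
    if sum(t["type"] == "ATM" for t in within(timedelta(hours=1))) > 4:
        flags.append("atm_burst")
    # More than 2 transactions in 5 minutes?
    if len(within(timedelta(minutes=5))) > 2:
        flags.append("rapid_fire")
    # Magnetic stripe transaction then internet transaction?
    if history and new_trx["type"] == "Internet":
        previous = max(history, key=lambda t: t["time"])
        if previous["entry_mode"] == "magnetic stripe":
            flags.append("stripe_then_internet")
    return flags

base = datetime(2013, 1, 1, 12, 0)  # illustrative timestamp
history = [
    {"time": base, "type": "POS", "entry_mode": "chip and pin"},
    {"time": base + timedelta(minutes=2), "type": "POS", "entry_mode": "magnetic stripe"},
]
new_trx = {"time": base + timedelta(minutes=4), "type": "Internet", "entry_mode": "internet"}
flags = rule_flags(history, new_trx)
print(flags)  # ['rapid_fire', 'stripe_then_internet']
```

Each rule is a fixed threshold, which makes the rules easy to audit but blind to the transaction amount.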
  • 14. If-Then rules (Expert rules): Misclassification 1.04%, Recall 31%, Precision 17%, F1-Score 22%
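As a quick consistency check on these figures, the F1-Score is the harmonic mean of precision and recall. The helper name `f1_score` below is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall of the expert rules, as reported on the slide.
f1 = f1_score(precision=0.17, recall=0.31)
print(round(f1, 2))  # 0.22
```

The result rounds to the 22% shown for the expert rules.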
  • 15. Financial evaluation: Credit card fraud detection is a cost-sensitive problem, because the cost of a false positive differs from the cost of a false negative. • False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost. • False negatives: when a fraud goes undetected, the amount of that transaction is lost. Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, because transaction amounts vary significantly.
  • 16. Financial evaluation: cost matrix. $Cost(f(S)) = \sum_{i=1}^{N} \left[ y_i \left( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \right) + (1 - y_i) \left( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \right) \right]$
      | Actual Positive ($y_i = 1$) | Actual Negative ($y_i = 0$)
      Predicted Positive ($c_i = 1$) | $C_{TP_i} = C_a$ | $C_{FP_i} = C_a$
      Predicted Negative ($c_i = 0$) | $C_{FN_i} = Amt_i$ | $C_{TN_i} = 0$
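The cost formula can be evaluated term by term as follows. This is a small sketch under the cost matrix on this slide, reading $C_a$ as the administrative cost from the previous slide; the function name `total_cost` and the sample labels, amounts and administrative cost are made up for illustration.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the example-dependent cost over all transactions.

    y_true[i] is the actual class y_i, y_pred[i] the predicted class c_i,
    amounts[i] the transaction amount Amt_i, admin_cost the constant C_a.
    """
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        c_tp, c_fp = admin_cost, admin_cost  # any alert costs C_a
        c_fn, c_tn = amt, 0.0                # a missed fraud loses the amount
        cost += y * (c * c_tp + (1 - c) * c_fn) + (1 - y) * (c * c_fp + (1 - c) * c_tn)
    return cost

# Illustrative data: one TP, one FN, one FP, one TN.
cost = total_cost(
    y_true=[1, 1, 0, 0],
    y_pred=[1, 0, 1, 0],
    amounts=[100.0, 250.0, 40.0, 10.0],
    admin_cost=5.0,
)
print(cost)  # 260.0 = 5 (TP) + 250 (FN) + 5 (FP) + 0 (TN)
```

A single missed fraud of 250 dominates the total here, which is why a constant false-negative cost would misrepresent the loss.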
  • 17. If-Then rules (Expert rules): Cost 1.24 €, Total Losses 1.94 €; Misclassification 1.04%, Recall 31%, Precision 17%, F1-Score 22%
  • 19. Fraud Data Science is the use of statistical and mathematical techniques (Machine Learning) to discover patterns in data in order to make predictions
  • 20. Features: raw features (attribute name: description)
      • Transaction ID: Transaction identification number
      • Time: Date and time of the transaction
      • Account number: Identification number of the customer
      • Card number: Identification of the credit card
      • Transaction type: e.g. Internet, ATM, POS, ...
      • Entry mode: e.g. Chip and pin, magnetic stripe, ...
      • Amount: Amount of the transaction in Euros
      • Merchant code: Identification of the merchant type
      • Merchant group: Merchant group identification
      • Country: Country of the transaction
      • Country 2: Country of residence
      • Type of card: e.g. Visa debit, Mastercard, American Express, ...
      • Gender: Gender of the card holder
      • Age: Card holder age
      • Bank: Issuer bank of the card
  • 21. Features: transaction aggregation strategy. Raw features (first five columns) and aggregated features (last four columns):
      TrxId | Time | Type | Country | Amt | No Trx last 24h | Amt last 24h | No Trx last 24h same type and country | Amt last 24h same type and country
      1 | 1/1 18:20 | POS | Lux | 250 | 0 | 0 | 0 | 0
      2 | 1/1 20:35 | POS | Lux | 400 | 1 | 250 | 1 | 250
      3 | 1/1 22:30 | ATM | Lux | 250 | 2 | 650 | 0 | 0
      4 | 2/1 00:50 | POS | Ger | 50 | 3 | 900 | 0 | 0
      5 | 2/1 19:18 | POS | Ger | 100 | 3 | 700 | 1 | 50
      6 | 2/1 23:45 | POS | Ger | 150 | 2 | 150 | 2 | 150
      7 | 3/1 06:00 | POS | Lux | 10 | 3 | 400 | 0 | 0
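The aggregation can be sketched with a rolling 24-hour look-back over each customer's earlier transactions. This is a minimal sketch, not the author's implementation: the dates are read as day/month, the year 2013 is an arbitrary choice, and the function name `aggregate` is hypothetical.

```python
from datetime import datetime, timedelta

# The seven raw transactions from the slide: (TrxId, Time, Type, Country, Amt).
trx = [
    (1, datetime(2013, 1, 1, 18, 20), "POS", "Lux", 250),
    (2, datetime(2013, 1, 1, 20, 35), "POS", "Lux", 400),
    (3, datetime(2013, 1, 1, 22, 30), "ATM", "Lux", 250),
    (4, datetime(2013, 1, 2, 0, 50), "POS", "Ger", 50),
    (5, datetime(2013, 1, 2, 19, 18), "POS", "Ger", 100),
    (6, datetime(2013, 1, 2, 23, 45), "POS", "Ger", 150),
    (7, datetime(2013, 1, 3, 6, 0), "POS", "Lux", 10),
]

def aggregate(trx, window=timedelta(hours=24)):
    """For each transaction (sorted by time), summarize the earlier ones in the window."""
    rows = []
    for i, (_, t, typ, country, _) in enumerate(trx):
        prev = [p for p in trx[:i] if t - p[1] <= window]
        same = [p for p in prev if p[2] == typ and p[3] == country]
        rows.append((len(prev), sum(p[4] for p in prev),
                     len(same), sum(p[4] for p in same)))
    return rows

rows = aggregate(trx)
for trx_id, row in zip(range(1, 8), rows):
    print(trx_id, row)
```

With a strict 24-hour window the first six rows reproduce the aggregated values on the slide. The seventh comes out as (2, 250, 0, 0), because transaction 4 happened about 29 hours before transaction 7, whereas the slide lists 3 and 400 for that row.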
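  • 22. Periodic features: When is a customer expected to make a new transaction? Consider a von Mises distribution with a period of 24 hours such that $P(time) \sim vonmises(\mu, \sigma) = \frac{e^{\sigma \cos(time - \mu)}}{2 \pi I_0(\sigma)}$, where $\mu$ is the periodic mean, $\sigma$ controls the spread (in this form it acts as the concentration parameter, so a larger $\sigma$ gives a tighter peak), and $I_0$ is the modified Bessel function of order 0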
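The density above can be evaluated with the standard library alone. In this sketch the hour of day is mapped to an angle on a 24-hour circle, `kappa` stands in for the slide's $\sigma$, and $I_0$ is computed from its power series; the function names and the example values are assumptions made for illustration.

```python
import math

def bessel_i0(x, terms=30):
    """Modified Bessel function of order 0, from its power series."""
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def von_mises_pdf(hour, mu_hour, kappa):
    """Von Mises density (per radian) of a time of day on a 24-hour circle."""
    to_angle = lambda h: 2 * math.pi * h / 24
    delta = to_angle(hour) - to_angle(mu_hour)
    return math.exp(kappa * math.cos(delta)) / (2 * math.pi * bessel_i0(kappa))

# Illustrative customer who usually transacts around 19:00.
evening = von_mises_pdf(19.0, mu_hour=19.0, kappa=2.0)
morning = von_mises_pdf(7.0, mu_hour=19.0, kappa=2.0)
print(evening > morning)  # True
```

Because the cosine wraps around the circle, 23:30 and 00:30 are treated as one hour apart, which an arithmetic mean of clock times would not capture.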
  • 24. New periodic features: • Analyzing the time of a transaction using a 24-hour clock • Modeling a non-linear von Mises kernel
  • 25. New periodic features: • Estimate the risk by comparing a new transaction with the kernel distribution [Figure labels: 19h risk = 10; 9h risk = 95]
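One way to compare a new transaction time with a kernel distribution is sketched below. The slides do not say how the risk value is scaled, so the 0 to 100 score here (100 times one minus the kernel value relative to its peak) is purely an assumption, as are the names `kernel_value` and `risk`, the concentration `kappa`, and the sample transaction hours.

```python
import math

def kernel_value(hour, past_hours, kappa=4.0):
    """Unnormalized von Mises kernel estimate at `hour` from past transaction hours."""
    to_angle = lambda h: 2 * math.pi * h / 24
    return sum(
        math.exp(kappa * math.cos(to_angle(hour) - to_angle(p))) for p in past_hours
    ) / len(past_hours)

def risk(hour, past_hours, kappa=4.0):
    """Hypothetical 0-100 score: low near the customer's usual hours, high far from them."""
    # Peak of the kernel estimate on a one-minute grid over the day.
    peak = max(kernel_value(m / 60, past_hours, kappa) for m in range(24 * 60))
    return max(0.0, 100 * (1 - kernel_value(hour, past_hours, kappa) / peak))

past_hours = [18.0, 19.0, 19.5, 20.0, 21.0]  # illustrative customer history
print(risk(19.5, past_hours) < risk(9.0, past_hours))  # True
```

The normalizing constant of the von Mises kernel cancels in the ratio, so no Bessel function is needed for this relative score.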
  • 27. [Figure: Normal Transaction vs. Fraud; axes: Amount of the transaction, Number of transactions last day]
  • 28. [Figure: Normal Transaction vs. Fraud; axes: Amount of the transaction, Number of transactions last day]
  • 29. [Figure: Normal Transaction vs. Fraud; axes: Amount of the transaction, Number of transactions last day, Number of ATM transactions last week]
  • 30. Fraud Analytics Algorithms: Fuzzy Rules, Neural Nets, Naive Bayes, Random Forests, RF with Cost-Proportionate Rejection Sampling, Cost-Sensitive Random Patches, Decision Trees
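Cost-proportionate rejection sampling, one of the techniques listed, can be sketched as follows: each training example is kept with probability equal to its misclassification cost divided by the largest cost, and an ordinary classifier (a random forest in the slide's case) is then trained on the resulting sample. The function name and the toy costs are assumptions; which cost to attach to each example is a modeling choice the slide does not spell out.

```python
import random

def cost_proportionate_sample(examples, costs, rng):
    """Keep each example with probability cost / max(costs).

    Assumes non-negative costs with at least one positive value.
    """
    z = max(costs)
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
examples = ["legit_small", "legit_typical", "fraud_large"]
costs = [0.0, 5.0, 500.0]  # illustrative misclassification costs
sample = cost_proportionate_sample(examples, costs, rng)
print(sample)
```

The most expensive example is always kept and a zero-cost example never is, so the classifier trained on the sample sees costly mistakes more often.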
  • 32. Model Performance vs. Interpretability
  • 34. Local Interpretable Model-agnostic Explanations: The LIME algorithm approximates the underlying model with an interpretable one by: • Learning on perturbations of the original instance • Finding the nearest neighborhood around the target instance • Training a sparse linear model in the
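The idea behind these steps can be sketched without the `lime` package. The code below perturbs an instance with Gaussian noise, weights each perturbation by its proximity to the instance, and fits a weighted linear model whose coefficients serve as the local explanation. It is a simplification under stated assumptions: plain weighted least squares replaces the sparse linear model, and every name and parameter (`local_linear_explanation`, `scale`, `width`) is hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def local_linear_explanation(model, instance, n_samples=500, scale=1.0, width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`; return feature weights."""
    rng = random.Random(seed)
    d = len(instance)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]  # weighted X^T X
    xty = [0.0] * (d + 1)                          # weighted X^T y
    for _ in range(n_samples):
        z = [v + rng.gauss(0, scale) for v in instance]       # perturbation
        dist2 = sum((a - b) ** 2 for a, b in zip(z, instance))
        w = math.exp(-dist2 / width ** 2)                      # proximity weight
        row = [1.0] + z                                        # intercept + features
        y = model(z)
        for i in range(d + 1):
            xty[i] += w * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += w * row[i] * row[j]
    return solve(xtx, xty)[1:]  # drop the intercept

# Toy black box that happens to be linear, so the surrogate should recover its slopes.
black_box = lambda z: 3 * z[0] - 2 * z[1] + 1
weights = local_linear_explanation(black_box, [2.0, 1.0])
print([round(w, 3) for w in weights])
```

For a non-linear model the returned weights describe the model's behavior only near the chosen instance, which is the sense in which the explanation is local.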
  • 35. Interpreting Model Predictions: Example of using LIME to understand predictions of an anomaly detection algorithm (Isolation Forest), trained with over 2 million parameters. Transaction 1: Anomaly Score = 82
  • 36. Interpreting Model Predictions: Transaction 3: Anomaly Score = 99; Transaction 2: Anomaly Score = 0
  • 37. Takeaways: • Fraud Data Science (ML) models are significantly better than expert rules • Models should be evaluated taking into account real financial costs of the application • Algorithms should be developed to incorporate those financial costs • Don't be afraid of complex ML models
  • 38. Questions? Alejandro Correa Bahnsen, PhD, Lead Data Scientist, acorrea@Easysol.net

Editor's Notes

  1. The famous French general didn’t even live in the information age, and yet he attributed most of his military success to having the right information. When you’re battling for a competitive advantage in business, analytics data can be equally important to your success.