Online fraud costs the global economy more than $400 billion, with more than 800 million personal records stolen in 2013 alone. Increasingly, fraud has diversified to different digital channels, including mobile and online payments, creating new challenges as innovative fraud patterns emerge. Hence it is still a challenge to find effective methods to mitigate fraud. Existing solutions include simple if-then rules and classical machine learning algorithms.
From an academic perspective, credit card fraud detection is a standard classification problem in which historical transaction data is used to predict future frauds. However, practical aspects make the problem more complex. First, existing evaluation measures lack a realistic representation of monetary gains and losses, which is necessary for effective fraud detection. Moreover, only a tiny fraction of an enormous number of transactions are frauds, which implies a huge class imbalance. Additionally, a real fraud detection system is required to respond in milliseconds, a criterion that must be taken into account in the modeling process for the system to be implemented successfully. To address these problems, this presentation compares two recently proposed algorithms: Bayes minimum risk and the example-dependent cost-sensitive decision tree. These methods are compared with state-of-the-art algorithms and show significant improvements as measured by financial savings.
2. About me
• PhD in Machine Learning at Luxembourg University
• Lead Data Scientist at Easy Solutions
• Worked for 8+ years as a data scientist at GE Money, Scotiabank
and SIX Financial Services
• Bachelor and Master in Industrial Engineering
• Organizer of the Big Data & Data Science Bogota Meetup
6. Big data (Data Science) is like teenage sex:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it...
7.
Those are the pillars of data science: computing, statistics,
mathematics and quantitative disciplines combined to
analyze data for better decision making
8. Data Science is the use
of methods and tools of
Machine Learning and
Artificial Intelligence
with the objective of
making data-driven
decisions
10. Estimate the probability of a transaction being fraudulent based on
analyzing customer patterns and recent fraudulent behavior
Issues when constructing a fraud detection system:
• Skewness of the data
• Cost-sensitivity
• Short time response of the system
• Dimensionality of the search space
• Feature preprocessing
• Model selection
Credit card fraud detection
12. • Large European card processing
company
• 2012 & 2013 card-present
transactions
• 20MM Transactions
• 40,000 Frauds
• 0.467% Fraud rate
• ~ 2MM EUR lost due to fraud on
test dataset
[Figure: monthly timeline of the data, Jan to Dec, divided into Train and Test periods]
13. • “Purpose is to use facts and rules, taken from the knowledge
of many human experts, to help make decisions.”
• Example of rules
• More than 4 ATM transactions in one hour?
• More than 2 transactions in 5 minutes?
• Magnetic stripe transaction then internet transaction?
If-Then rules (Expert rules)
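The three example rules above can be sketched as plain Python checks. This is only an illustration: the field names (`time`, `type`, `entry_mode`), the string values, and the choice to count the current transaction toward each limit are assumptions, not details given on the slide.

```python
from datetime import datetime, timedelta


def triggered_rules(history, current):
    """Return the expert rules fired by `current`, given earlier
    transactions of the same card in `history` (oldest first).
    Field names and values are hypothetical."""

    def recent(window, trx_type=None):
        # Earlier transactions inside the time window, optionally of one type.
        start = current["time"] - window
        return [
            t for t in history
            if start < t["time"] <= current["time"]
            and (trx_type is None or t["type"] == trx_type)
        ]

    rules = []
    # The "+ 1" counts the current transaction itself (an assumption).
    if current["type"] == "ATM" and len(recent(timedelta(hours=1), "ATM")) + 1 > 4:
        rules.append("more than 4 ATM transactions in one hour")
    if len(recent(timedelta(minutes=5))) + 1 > 2:
        rules.append("more than 2 transactions in 5 minutes")
    if history and history[-1]["entry_mode"] == "magnetic stripe" \
            and current["type"] == "Internet":
        rules.append("magnetic stripe transaction then internet transaction")
    return rules


# Four ATM withdrawals ten minutes apart, followed by a fifth one.
history = [
    {"time": datetime(2013, 1, 1, 10, m), "type": "ATM", "entry_mode": "chip and pin"}
    for m in (0, 10, 20, 30)
]
current = {"time": datetime(2013, 1, 1, 10, 40), "type": "ATM", "entry_mode": "chip and pin"}
print(triggered_rules(history, current))
```

Rules like these are easy to read and audit, but every threshold is fixed by hand and treats all transactions alike regardless of amount.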
15. Credit card fraud detection is a cost-sensitive problem, because the cost of a
false positive is different from the cost of a false negative.
• False positives: when a transaction is predicted as fraudulent but is in
fact legitimate, the financial institution incurs an administrative cost.
• False negatives: when a fraud is not detected, the amount of that transaction
is lost.
Moreover, it is not enough to assume a constant cost difference between
false positives and false negatives, as the transaction amounts vary
quite significantly.
Financial evaluation
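A minimal sketch of this cost-based evaluation follows, under stated assumptions. Only the two costs named on the slide are used (an administrative cost per false positive, the transaction amount per false negative), and correct predictions are assumed to cost nothing. The savings measure is assumed to compare against the cost of flagging no transaction at all. The last function illustrates the Bayes minimum risk idea named in the abstract as "pick the action with the lower expected cost" under the same assumed costs. The administrative cost of 5 and all data values are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Example-dependent cost: each error is priced individually."""
    cost = 0.0
    for is_fraud, flagged, amount in zip(y_true, y_pred, amounts):
        if flagged and not is_fraud:
            cost += admin_cost   # false positive: administrative cost
        elif is_fraud and not flagged:
            cost += amount       # false negative: the amount is lost
    return cost


def savings(y_true, y_pred, amounts, admin_cost):
    """Fraction of the 'flag nothing' cost that the predictions avoid
    (baseline choice is an assumption of this sketch)."""
    baseline = total_cost(y_true, [0] * len(y_true), amounts, admin_cost)
    if baseline == 0:
        raise ValueError("no fraud losses in the data; savings undefined")
    return 1.0 - total_cost(y_true, y_pred, amounts, admin_cost) / baseline


def flag_by_minimum_risk(p_fraud, amount, admin_cost):
    """Flag when the expected cost of flagging is below that of not flagging."""
    risk_if_flagged = (1.0 - p_fraud) * admin_cost   # might be a false positive
    risk_if_ignored = p_fraud * amount               # might be a false negative
    return risk_if_flagged < risk_if_ignored


y_true = [0, 0, 1, 1, 0]            # 1 = fraud
y_pred = [0, 1, 1, 0, 0]            # 1 = flagged
amounts = [20, 50, 1000, 30, 10]    # hypothetical amounts in EUR
print(total_cost(y_true, y_pred, amounts, admin_cost=5))
print(round(savings(y_true, y_pred, amounts, admin_cost=5), 4))
```

Two models with the same number of errors can differ widely on this measure, because missing a 1000 EUR fraud is priced very differently from missing a 30 EUR one.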
19. Fraud Data Science is the use of
statistical and mathematical techniques
(Machine Learning) to discover patterns
in data in order to make predictions
Fraud Data Science
20. Raw features
Attribute name     Description
Transaction ID     Transaction identification number
Time               Date and time of the transaction
Account number     Identification number of the customer
Card number        Identification of the credit card
Transaction type   e.g. Internet, ATM, POS, ...
Entry mode         e.g. Chip and PIN, magnetic stripe, ...
Amount             Amount of the transaction in euros
Merchant code      Identification of the merchant type
Merchant group     Merchant group identification
Country            Country of the transaction
Country 2          Country of residence
Type of card       e.g. Visa debit, Mastercard, American Express, ...
Gender             Gender of the card holder
Age                Card holder age
Bank               Issuer bank of the card
Features
21. Transaction aggregation strategy
Raw Features
TrxId  Time       Type  Country  Amt
1      1/1 18:20  POS   Lux      250
2      1/1 20:35  POS   Lux      400
3      1/1 22:30  ATM   Lux      250
4      2/1 00:50  POS   Ger      50
5      2/1 19:18  POS   Ger      100
6      2/1 23:45  POS   Ger      150
7      3/1 06:00  POS   Lux      10
Aggregated Features (one row per transaction, in the same order)
TrxId  No Trx    Amt       No Trx last 24h,       Amt last 24h,
       last 24h  last 24h  same type and country  same type and country
1      0         0         0                      0
2      1         250       1                      250
3      2         650       0                      0
4      3         900       0                      0
5      3         700       1                      50
6      2         150       2                      150
7      3         400       0                      0
Features
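The aggregation strategy can be sketched in a few lines of Python. The sketch assumes that all seven example transactions belong to one customer, that the dates are day/month, that the year (not shown on the slide) is 2013, and that the window covers the 24 hours strictly before each transaction. Under these assumptions it reproduces the slide's values for transactions 1 to 6. For transaction 7 it yields 2 transactions and 250 in the last 24 hours, where the slide lists 3 and 400, so the slide may rely on a different window convention.

```python
from datetime import datetime, timedelta

# Raw example transactions from the slide; the year is an assumption.
transactions = [
    {"id": 1, "time": datetime(2013, 1, 1, 18, 20), "type": "POS", "country": "Lux", "amount": 250},
    {"id": 2, "time": datetime(2013, 1, 1, 20, 35), "type": "POS", "country": "Lux", "amount": 400},
    {"id": 3, "time": datetime(2013, 1, 1, 22, 30), "type": "ATM", "country": "Lux", "amount": 250},
    {"id": 4, "time": datetime(2013, 1, 2, 0, 50), "type": "POS", "country": "Ger", "amount": 50},
    {"id": 5, "time": datetime(2013, 1, 2, 19, 18), "type": "POS", "country": "Ger", "amount": 100},
    {"id": 6, "time": datetime(2013, 1, 2, 23, 45), "type": "POS", "country": "Ger", "amount": 150},
    {"id": 7, "time": datetime(2013, 1, 3, 6, 0), "type": "POS", "country": "Lux", "amount": 10},
]


def aggregate(transactions, window=timedelta(hours=24)):
    """For each transaction, summarize the customer's earlier transactions
    inside the window: (count, amount, count same type and country,
    amount same type and country)."""
    rows = []
    for current in transactions:
        start = current["time"] - window
        # Earlier transactions in the window; the current one is excluded.
        recent = [t for t in transactions if start <= t["time"] < current["time"]]
        same = [
            t for t in recent
            if t["type"] == current["type"] and t["country"] == current["country"]
        ]
        rows.append((
            len(recent), sum(t["amount"] for t in recent),
            len(same), sum(t["amount"] for t in same),
        ))
    return rows


for trx, row in zip(transactions, aggregate(transactions)):
    print(trx["id"], row)
```

The same function can be called with other windows (for example `timedelta(hours=1)` or `timedelta(days=7)`) to build several aggregated features per transaction.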
22. When is a customer expected to
make a new transaction?
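Considering a von Mises
distribution with a period of 24
hours such that
$P(time) \sim \mathrm{vonmises}(\mu, \sigma) = \dfrac{e^{\sigma \cos(time - \mu)}}{2\pi I_0(\sigma)}$
where $\mu$ is the mean, $\sigma$ is the spread parameter
(as written it acts as a concentration: larger values give
a tighter peak around $\mu$), and $I_0$ is the modified
Bessel function of the first kind of order 0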
Periodic features
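As a rough illustration of the periodic feature, the sketch below maps the hour of day onto a circle, estimates the mean transaction time as a circular mean, and evaluates the von Mises density from the formula above. The concentration value of 2.0 and the example hours are hypothetical, and the function names are made up for this sketch. The Bessel function is computed from its power series so that only the standard library is needed.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def hour_to_angle(hour):
    """Map an hour of day in [0, 24) to an angle in radians."""
    return 2.0 * math.pi * hour / 24.0


def circular_mean_hour(hours):
    """Mean time of day that respects the wrap-around at midnight."""
    s = sum(math.sin(hour_to_angle(h)) for h in hours)
    c = sum(math.cos(hour_to_angle(h)) for h in hours)
    angle = math.atan2(s, c) % (2.0 * math.pi)
    return angle * 24.0 / (2.0 * math.pi)


def von_mises_pdf(hour, mean_hour, concentration):
    """Von Mises density (per radian) of a transaction at `hour`."""
    theta = hour_to_angle(hour) - hour_to_angle(mean_hour)
    return math.exp(concentration * math.cos(theta)) / (
        2.0 * math.pi * bessel_i0(concentration)
    )


# A customer who usually transacts late in the evening (hypothetical hours).
usual_hours = [22.0, 23.5, 0.5, 1.0, 23.0]
mean_hour = circular_mean_hour(usual_hours)
for hour in (23.0, 3.0, 12.0):
    print(hour, round(von_mises_pdf(hour, mean_hour, concentration=2.0), 4))
```

The circular mean matters here: for transactions at 23:00 and 01:00 it falls at midnight, whereas the arithmetic mean of 23 and 1 would be noon.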
34.
Local Interpretable Model-agnostic Explanations
The LIME algorithm approximates
the underlying model with an
interpretable one by:
• Learning on perturbations of the
original instance
• Finding the nearest neighborhood
around the target instance
• Training a sparse linear model in
the
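The perturb, weight and fit idea listed above can be sketched without any library. This is a simplified stand-in, not the LIME package itself: it perturbs the raw feature values with Gaussian noise, weights each perturbed sample by its closeness to the target instance, and fits a weighted linear model with a small ridge penalty in place of the sparse linear model. The function names, the kernel, and all parameter values are assumptions made for this sketch.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explain_locally(predict, instance, n_samples=500, scale=1.0,
                    kernel_width=1.0, ridge=1e-6, seed=0):
    """Return one local linear coefficient per feature of `instance`."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # 1. Perturb the original instance.
        z = [x + rng.gauss(0.0, scale) for x in instance]
        # 2. Weight the sample by its proximity to the instance.
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        weights.append(math.exp(-dist2 / kernel_width ** 2))
        rows.append([1.0] + z)          # leading 1.0 is the intercept
        targets.append(predict(z))      # query the black-box model
    # 3. Weighted ridge fit: (X^T W X + ridge * I) beta = X^T W y
    k = len(instance) + 1
    xtwx = [
        [sum(w * r[i] * r[j] for w, r in zip(weights, rows)) + (ridge if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets)) for i in range(k)]
    return solve_linear_system(xtwx, xtwy)[1:]   # drop the intercept


def toy_score(z):
    """Hypothetical black-box score, standing in for a trained model."""
    return 100.0 / (1.0 + math.exp(-(0.8 * z[0] - 0.3 * z[1])))


print(explain_locally(toy_score, [1.0, 2.0, 0.5]))
```

The sign and size of each coefficient indicate how that feature pushes the score up or down near the chosen transaction. The third feature is unused by the toy score, so its coefficient should come out close to zero.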
35.
Interpreting Model Predictions
Transaction 1
Anomaly Score = 82
Example of using LIME to
understand predictions of
an anomaly detection
algorithm (Isolation Forest),
trained with over 2 million
parameters.
37. • Fraud Data Science (ML) models are
significantly better than expert rules
• Models should be evaluated taking into
account real financial costs of the application
• Algorithms should be developed to
incorporate those financial costs
• Don't be afraid of complex ML models
Takeaways!!