
From an academic perspective, credit card fraud detection is a standard classification problem, in which historical transaction data is used to predict future frauds. In practice, however, the problem is more complex. Existing evaluation measures lack a realistic representation of monetary gains and losses, which is necessary for effective fraud detection. Moreover, only a tiny fraction of an enormous number of transactions are frauds, which implies a severe class imbalance. Additionally, a real fraud detection system must respond within milliseconds, and this constraint has to be taken into account during modeling for the system to be successfully deployed. To address these problems, this presentation compares two recently proposed algorithms: Bayes minimum risk and the example-dependent cost-sensitive decision tree. These methods are compared with state-of-the-art algorithms and show significant improvements, measured in financial savings.

Published in:
Data & Analytics


- 1. Fraud Data Science. Alejandro Correa Bahnsen, PhD, Lead Data Scientist
- 2. About me: • PhD in Machine Learning at the University of Luxembourg • Lead Data Scientist at Easy Solutions • 8+ years of experience as a data scientist at GE Money, Scotiabank and SIX Financial Services • Bachelor's and Master's degrees in Industrial Engineering • Organizer of the Big Data & Data Science Bogota Meetup
- 3. Data Science
- 4. (Image-only slide.)
- 5. (Image-only slide.)
- 6. Big data (Data Science) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...
- 7. The pillars of data science: computing, statistics, mathematics and other quantitative disciplines, combined to analyze data for better decision making
- 8. Data Science is the use of methods and tools from Machine Learning and Artificial Intelligence with the objective of making data-driven decisions
- 9. Fraud detection and prevention
- 10. Credit card fraud detection: Estimate the probability that a transaction is fraudulent by analyzing customer patterns and recent fraudulent behavior. Issues when constructing a fraud detection system: • Skewness of the data • Cost-sensitivity • Short response time of the system • Dimensionality of the search space • Feature preprocessing • Model selection
- 11. Network Fraud??
- 12. Data: • Large European card-processing company • 2012 & 2013 card-present transactions • 20MM transactions • 40,000 frauds • 0.467% fraud rate • ~2MM EUR lost due to fraud on the test dataset (Figure: timeline from January to December split into training and test periods.)
- 13. If-Then rules (Expert rules): • "Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions." • Examples of rules: • More than 4 ATM transactions in one hour? • More than 2 transactions in 5 minutes? • A magnetic stripe transaction followed by an internet transaction?
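The three example rules can be expressed as simple predicates over a card's recent history. The sketch below is illustrative only: the `Txn` record, its field values (`"ATM"`, `"INTERNET"`, `"MAGSTRIPE"`) and the choice to count the new transaction inside each window are assumptions, and the third rule is read as "the immediately preceding transaction used the magnetic stripe and the new one is an internet transaction".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Txn:
    # Hypothetical transaction record holding only the fields the rules need.
    time: datetime
    channel: str      # e.g. "ATM", "POS", "INTERNET"
    entry_mode: str   # e.g. "CHIP", "MAGSTRIPE", "ONLINE"


def _count(txns: List[Txn], end: datetime, window: timedelta,
           channel: Optional[str] = None) -> int:
    """Count transactions with end - window < time <= end, optionally for one channel."""
    return sum(
        1
        for t in txns
        if end - window < t.time <= end and (channel is None or t.channel == channel)
    )


def expert_rules(history: List[Txn], new: Txn) -> List[str]:
    """Return the names of the rules fired by `new`, given the card's earlier transactions."""
    txns = history + [new]  # windows include the transaction being scored
    fired = []
    # "More than 4 ATM transactions in one hour?"
    if _count(txns, new.time, timedelta(hours=1), "ATM") > 4:
        fired.append("atm_burst")
    # "More than 2 transactions in 5 minutes?"
    if _count(txns, new.time, timedelta(minutes=5)) > 2:
        fired.append("velocity")
    # "Magnetic stripe transaction then internet transaction?"
    # Assumption: only the immediately preceding transaction is compared.
    if history:
        previous = max(history, key=lambda t: t.time)
        if previous.entry_mode == "MAGSTRIPE" and new.channel == "INTERNET":
            fired.append("magstripe_then_internet")
    return fired
```

Rules like these are cheap to evaluate, which suits the millisecond response requirement, but each threshold is fixed by hand rather than learned from data.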
- 14. If-Then rules (Expert rules): misclassification 1.04% • recall 31% • precision 17% • F1-score 22%
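The four figures above are standard confusion-matrix metrics, and the F1-score is the harmonic mean of precision and recall. A short sketch for computing them; the counts used in the checks are made-up illustrative numbers, not the slide's data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Misclassification rate, recall, precision and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "misclassification": (fp + fn) / (tp + fp + fn + tn),
        "recall": recall,
        "precision": precision,
        "f1": f1_score(precision, recall),
    }
```

As a consistency check, `f1_score(0.17, 0.31)` evaluates to about 0.2196, which rounds to the 22% reported for the expert rules. Note that with a 1.04% misclassification rate the rules still look "accurate" overall, which is why accuracy-style measures say little under heavy class imbalance.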
- 15. Financial evaluation: Credit card fraud detection is a cost-sensitive problem, as the cost of a false positive differs from the cost of a false negative. • False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost. • False negatives: when a fraud goes undetected, the amount of that transaction is lost. Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, because transaction amounts vary significantly.
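- 16. Financial evaluation: cost matrix. The total cost of applying a classifier $f$ to a set $S$ of $N$ transactions is

  $$\text{Cost}(f(S)) = \sum_{i=1}^{N} \Big[ y_i \big( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big]$$

  with the example-dependent cost matrix

  | | Actual positive ($y_i = 1$) | Actual negative ($y_i = 0$) |
  |---|---|---|
  | Predicted positive ($c_i = 1$) | $C_{TP_i} = C_a$ | $C_{FP_i} = C_a$ |
  | Predicted negative ($c_i = 0$) | $C_{FN_i} = Amt_i$ | $C_{TN_i} = 0$ |

  where $C_a$ is the administrative cost and $Amt_i$ is the amount of transaction $i$.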
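The cost function translates directly into vectorized NumPy. A minimal sketch, assuming `y` and `c` are 0/1 arrays of true labels and predictions, `amount` holds each transaction's amount, and `admin_cost` is the constant administrative cost $C_a$ (the slides do not give its value; the 2.5 used in the checks is arbitrary).

```python
import numpy as np


def total_cost(y, c, amount, admin_cost):
    """Sum over i of y_i (c_i C_TP_i + (1 - c_i) C_FN_i) + (1 - y_i) (c_i C_FP_i + (1 - c_i) C_TN_i)."""
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    amount = np.asarray(amount, dtype=float)
    c_tp = np.full_like(amount, admin_cost)  # fraud flagged: administrative cost
    c_fp = np.full_like(amount, admin_cost)  # false alarm: administrative cost
    c_fn = amount                            # missed fraud: the amount is lost
    c_tn = np.zeros_like(amount)             # legitimate and not flagged: no cost
    cost = y * (c * c_tp + (1 - c) * c_fn) + (1 - y) * (c * c_fp + (1 - c) * c_tn)
    return float(cost.sum())
```

With this matrix, flagging a transaction whose fraud probability is $p$ has expected cost $C_a$, while letting it through has expected cost $p \cdot Amt_i$, so a cost-minimizing decision flags it only when $p \cdot Amt_i > C_a$.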
- 17. If-Then rules (Expert rules): cost 1.24 € • total losses 1.94 € • misclassification 1.04% • recall 31% • precision 17% • F1-score 22%
- 18. Fraud Data Science
- 19. Fraud Data Science: the use of statistical and mathematical techniques (Machine Learning) to discover patterns in data in order to make predictions
- 20. Features: raw features

  | Attribute name | Description |
  |---|---|
  | Transaction ID | Transaction identification number |
  | Time | Date and time of the transaction |
  | Account number | Identification number of the customer |
  | Card number | Identification of the credit card |
  | Transaction type | e.g. Internet, ATM, POS, ... |
  | Entry mode | e.g. chip and PIN, magnetic stripe, ... |
  | Amount | Amount of the transaction in euros |
  | Merchant code | Identification of the merchant type |
  | Merchant group | Merchant group identification |
  | Country | Country of the transaction |
  | Country 2 | Country of residence |
  | Type of card | e.g. Visa debit, Mastercard, American Express, ... |
  | Gender | Gender of the card holder |
  | Age | Card holder age |
  | Bank | Issuer bank of the card |
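- 21. Features: transaction aggregation strategy (the first five columns are raw features; the last four are aggregated features)

  | TrxId | Time | Type | Country | Amt | No. trx last 24h | Amt last 24h | No. trx last 24h, same type and country | Amt last 24h, same type and country |
  |---|---|---|---|---|---|---|---|---|
  | 1 | 1/1 18:20 | POS | Lux | 250 | 0 | 0 | 0 | 0 |
  | 2 | 1/1 20:35 | POS | Lux | 400 | 1 | 250 | 1 | 250 |
  | 3 | 1/1 22:30 | ATM | Lux | 250 | 2 | 650 | 0 | 0 |
  | 4 | 2/1 00:50 | POS | Ger | 50 | 3 | 900 | 0 | 0 |
  | 5 | 2/1 19:18 | POS | Ger | 100 | 3 | 700 | 1 | 50 |
  | 6 | 2/1 23:45 | POS | Ger | 150 | 2 | 150 | 2 | 150 |
  | 7 | 3/1 06:00 | POS | Lux | 10 | 3 | 400 | 0 | 0 |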
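The aggregation strategy amounts to a trailing-window count and sum per transaction, overall and restricted to the same type and country. A minimal pure-Python sketch over the seven example transactions; dates are read as day/month, and the year 2013 and the helper names are hypothetical placeholders.

```python
from datetime import datetime, timedelta

# The slide's example transactions: (id, time, type, country, amount).
# Assumption: "1/1" is day/month; the year 2013 is an arbitrary placeholder.
TXNS = [
    (1, datetime(2013, 1, 1, 18, 20), "POS", "Lux", 250),
    (2, datetime(2013, 1, 1, 20, 35), "POS", "Lux", 400),
    (3, datetime(2013, 1, 1, 22, 30), "ATM", "Lux", 250),
    (4, datetime(2013, 1, 2, 0, 50), "POS", "Ger", 50),
    (5, datetime(2013, 1, 2, 19, 18), "POS", "Ger", 100),
    (6, datetime(2013, 1, 2, 23, 45), "POS", "Ger", 150),
    (7, datetime(2013, 1, 3, 6, 0), "POS", "Lux", 10),
]


def aggregate(txns, window=timedelta(hours=24)):
    """For each transaction, count and sum the earlier transactions within `window`,
    overall and restricted to the same type and country."""
    rows = []
    for tid, time, ttype, country, _amt in txns:
        past = [t for t in txns if time - window <= t[1] < time]
        same = [t for t in past if t[2] == ttype and t[3] == country]
        rows.append((tid, len(past), sum(t[4] for t in past),
                     len(same), sum(t[4] for t in same)))
    return rows
```

This reproduces the first six rows of the table. For transaction 7, a strict trailing 24-hour window contains only transactions 5 and 6, so the sketch returns 2 transactions and 250 for the overall columns, while the slide lists 3 and 400.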
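- 22. Periodic features: When is a customer expected to make a new transaction? Consider a von Mises distribution with a period of 24 hours, such that

  $$P(\text{time}) \sim \text{vonmises}(\mu, \sigma) = \frac{e^{\sigma \cos(\text{time} - \mu)}}{2\pi I_0(\sigma)}$$

  where $\mu$ is the mean time, $\sigma$ controls the spread (in this form it acts as a concentration parameter: the larger $\sigma$, the more tightly transaction times cluster around $\mu$), and $I_0$ is the modified Bessel function of the first kind of order 0.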
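The periodic density can be evaluated directly once hours on the 24-hour clock are mapped to angles. A minimal NumPy sketch; `np.i0` is NumPy's modified Bessel function of the first kind of order 0, the helper names are illustrative, and `sigma` enters exactly as in the formula, multiplying the cosine. The density is per radian of the clock angle.

```python
import numpy as np


def hour_to_angle(hour):
    """Map a time of day in hours (24-hour clock) to an angle in [0, 2*pi)."""
    return 2 * np.pi * (np.asarray(hour, dtype=float) % 24.0) / 24.0


def von_mises_pdf(hour, mu_hour, sigma):
    """exp(sigma * cos(time - mu)) / (2 * pi * I0(sigma)), with time and mu as clock angles."""
    theta = hour_to_angle(hour)
    mu = hour_to_angle(mu_hour)
    return np.exp(sigma * np.cos(theta - mu)) / (2 * np.pi * np.i0(sigma))
```

Because the angle wraps around, 01:00 and 21:00 are equally far from a mean of 23:00, which a linear feature such as "hour of day" cannot express.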
- 23. Periodic features (figure only)
- 24. *New periodic features: • Analyze the time of a transaction using a 24-hour clock • Model it with a non-linear von Mises kernel
- 25. *New periodic features: • Estimate the risk by comparing a new transaction with the kernel distribution (figure examples: a transaction at 19h has risk = 10; one at 9h has risk = 95)
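One way to turn the von Mises kernel into a per-customer score is sketched below: estimate the mean with the circular mean of the customer's past transaction hours, estimate the concentration with a widely used piecewise approximation to the inverse of $I_1(\kappa)/I_0(\kappa)$, and score a new time by how far its density falls below the peak. The 0 to 100 risk scale, `1 - f(t)/f(mu)`, is a hypothetical choice; the slides do not say how their risk values are computed, and this sketch does not reproduce them.

```python
import math


def fit_von_mises(hours):
    """Circular mean (in hours) and approximate concentration of past transaction times."""
    angles = [2 * math.pi * (h % 24) / 24 for h in hours]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mu = math.atan2(s, c)
    # Mean resultant length; clamped below 1 so identical times do not divide by zero.
    r = min(math.hypot(c, s), 1 - 1e-9)
    # Piecewise approximation of kappa such that I1(kappa) / I0(kappa) = r.
    if r < 0.53:
        kappa = 2 * r + r ** 3 + 5 * r ** 5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
    mu_hour = (mu * 24 / (2 * math.pi)) % 24
    return mu_hour, kappa


def periodic_risk(hour, mu_hour, kappa):
    """Hypothetical 0-100 risk: 100 * (1 - f(t) / f(mu)) for the fitted von Mises density f."""
    delta = 2 * math.pi * (hour - mu_hour) / 24
    return 100 * (1 - math.exp(kappa * (math.cos(delta) - 1)))
```

The ratio `f(t)/f(mu)` cancels the Bessel normalization, so no special functions are needed. For a customer who usually transacts around 20h, a morning transaction scores close to 100.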
- 26. Modeling Basics
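- 27. (Figure: scatter plot of normal transactions and frauds; axes: amount of the transaction vs. number of transactions last day.)
- 28. (Figure: the same scatter plot of amount of the transaction vs. number of transactions last day, normal transactions vs. frauds.)
- 29. (Figure: scatter plot of normal transactions and frauds with a third axis, number of ATM transactions last week, added to amount of the transaction and number of transactions last day.)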
- 30. Fraud analytics algorithms: • Fuzzy rules • Neural nets • Naive Bayes • Random forests • RF with cost-proportionate rejection sampling • Cost-sensitive random patches • Decision trees
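Cost-proportionate rejection sampling (Zadrozny, Langford and Abe) turns a cost-sensitive problem into an ordinary one: each training example is kept with probability proportional to its misclassification cost, and a standard learner such as a random forest is then trained on the kept examples. A minimal sketch, assuming each example's weight is the extra cost of misclassifying it under the slide-16 cost matrix ($Amt_i - C_a$ for a fraud, $C_a$ for a legitimate transaction); the function name and fixed seed are illustrative.

```python
import numpy as np


def cost_proportionate_rejection_sample(X, y, amount, admin_cost, seed=0):
    """Keep example i with probability w_i / max(w), where w_i is the extra cost of
    misclassifying it: Amt_i - C_a for a fraud, C_a for a legitimate transaction."""
    X = np.asarray(X)
    y = np.asarray(y)
    amount = np.asarray(amount, dtype=float)
    weight = np.where(y == 1, amount - admin_cost, admin_cost)
    # A fraud whose amount is below C_a is never worth flagging, so its weight is 0.
    weight = np.clip(weight, 0.0, None)
    if weight.max() <= 0:
        raise ValueError("all example weights are zero")
    rng = np.random.default_rng(seed)
    keep = rng.random(len(y)) < weight / weight.max()
    return X[keep], y[keep]
```

In practice the sampling is usually repeated with several seeds, a classifier is trained on each sample, and their votes are averaged, so that no high-cost example is lost by chance.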
- 31. (Figure: chart of % savings and % frauds, on a 0% to 100% scale, for Expert Rules, Fuzzy Rules, Neural Nets, Naïve Bayes, Random Forests, RF - CP Random Sampling and CS Random Patches.)
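The chart compares models by savings rather than accuracy. The sketch below assumes the savings definition used in the author's published work on example-dependent cost-sensitive classification: the model's cost is compared with the cheaper of two trivial policies (flag nothing, or flag everything). Costs follow the slide-16 matrix; the numbers in the checks are illustrative.

```python
import numpy as np


def cost(y, c, amount, admin_cost):
    """Total cost under the slide-16 matrix: C_a per alert, Amt_i per missed fraud."""
    y, c, amount = (np.asarray(v, dtype=float) for v in (y, c, amount))
    return float(np.sum(c * admin_cost + (1 - c) * y * amount))


def savings(y, c, amount, admin_cost):
    """(Cost_base - Cost(model)) / Cost_base, where Cost_base is the cheaper trivial policy."""
    y = np.asarray(y)
    base = min(
        cost(y, np.zeros(len(y)), amount, admin_cost),  # flag nothing
        cost(y, np.ones(len(y)), amount, admin_cost),   # flag everything
    )
    return (base - cost(y, c, amount, admin_cost)) / base
```

A savings of 0 means no improvement over the best trivial policy, 1 would mean zero cost, and a negative value means the model is worse than doing nothing clever at all.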
- 32. Model Performance vs. Interpretability
- 33. Black Box Decryption
- 34. Local Interpretable Model-agnostic Explanations (LIME): the LIME algorithm approximates the underlying model with an interpretable one by: • Learning on perturbations of the original instance • Finding the nearest neighborhood around the target instance • Training a sparse linear model in the
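The LIME steps can be sketched in a few lines of NumPy: perturb the instance, weight each perturbation by its proximity to the original, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified sketch, not the `lime` package: it uses Gaussian perturbations, an exponential kernel and plain weighted least squares instead of a sparse model, and `black_box`, `scale`, `kernel_width` and the sample count are illustrative assumptions.

```python
import numpy as np


def explain_locally(black_box, x, n_samples=5000, scale=1.0, kernel_width=1.0, seed=0):
    """Return (intercept, coefficients) of a proximity-weighted linear surrogate around x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # 1. Perturb the original instance with Gaussian noise.
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    preds = np.array([black_box(z) for z in Z])
    # 2. Weight perturbations by proximity to x (exponential kernel on Euclidean distance).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3. Weighted least squares: scale each row by sqrt(w) and solve.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, preds * sw[:, 0], rcond=None)
    return coef[0], coef[1:]
```

For a linear black box the surrogate recovers its coefficients exactly; for `f(z) = z[0] ** 2` around `x = (1, 0)` the first coefficient comes out close to the true local slope of 2, which is the kind of per-feature contribution used to explain an individual anomaly score.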
- 35. Interpreting model predictions: example of using LIME to understand the predictions of an anomaly detection algorithm (Isolation Forest), trained with over 2 million parameters. (Figure: Transaction 1, anomaly score = 82.)
- 36. Interpreting model predictions (Figure: Transaction 2, anomaly score = 0; Transaction 3, anomaly score = 99.)
- 37. Takeaways!! • Fraud Data Science (ML) models are significantly better than expert rules • Models should be evaluated taking into account the real financial costs of the application • Algorithms should be developed to incorporate those financial costs • Don't be afraid of complex ML models
- 38. Questions? Alejandro Correa Bahnsen, PhD, Lead Data Scientist, acorrea@Easysol.net