
Example-Dependent Cost-Sensitive Credit Card Fraud Detection

Credit card fraud is a growing problem that affects card holders around the world. Fraud detection has been an interesting topic in machine learning. Nevertheless, current state-of-the-art credit card fraud detection algorithms fail to include the real costs of credit card fraud as a measure to evaluate algorithms. In this paper, a new comparison measure that realistically represents the monetary gains and losses due to fraud detection is proposed. Moreover, using the proposed cost measure, a cost-sensitive method based on Bayes minimum risk is presented. This method is compared with state-of-the-art algorithms and shows improvements of up to 23% measured by cost. The results of this paper are based on real-life transactional data provided by a large European card processing company.

Published in: Technology

  1. Example-Dependent Cost-Sensitive Credit Card Fraud Detection
     March 21st, 2014
     Alejandro Correa Bahnsen, with Djamila Aouada (SnT) and Björn Ottersten (SnT)
  2. Introduction
     [Figure: Europe fraud evolution, 2007 to 2012E. Internet transactions (millions of euros), axis from € 500 to € 800.]
  3. Introduction
     [Figure: US fraud evolution, 2001 to 2012. Online revenue lost due to fraud (billions of dollars), axis from $0 to $5.0.]
  4. Introduction
     • Increasing fraud levels around the world
     • Different technologies and legal requirements make fraud harder to control
     • Lack of collaboration between academia and practitioners, leading to solutions that fail to incorporate practical issues of credit card fraud detection:
       • Financial comparison measures
       • Huge class imbalance
       • Response time measured in milliseconds
  5. Agenda
     • Introduction
     • Database
     • Evaluation
     • Bayes Minimum Risk
     • Experiments
     • Probability Calibration
     • Other applications
     • Conclusions & Future Work
  6. Simplified transaction flow
     [Figure: transaction flow diagram with the labels "Network" and "Fraud??".]
  7. Data
     • Large European card processing company
     • Jan 2012 to Jun 2013 card-present transactions
     • 1,638,772 transactions
     • 3,444 frauds
     • 0.21% fraud rate
     • 205,542 EUR lost due to fraud on the test dataset
     [Figure: monthly timeline from Jan12 to Jun13, split into Train and Test periods.]
  8. Data
     • Raw attributes

       Trx ID  Client ID  Date          Amount  Location  Type      Merchant Group  Fraud
       1       1          2/1/12 6:00   580     Ger       Internet  Airlines        No
       2       1          2/1/12 6:15   120     Eng       Present   Car Rent        No
       3       2          2/1/12 8:20   12      Bel       Present   Hotel           Yes
       4       1          3/1/12 4:15   60      Esp       ATM       ATM             No
       5       2          3/1/12 9:18   8       Fra       Present   Retail          No
       6       1          3/1/12 9:55   1210    Ita       Internet  Airlines        Yes

     • Other attributes: age, country of residence, postal code, type of card
  9. Data
     • Derived attributes

       Trx ID  Client ID  Date          Amount  Location  Type      Merchant Group  Fraud  No. of Trx, same client, last 6 hours  Sum, same client, last 7 days
       1       1          2/1/12 6:00   580     Ger       Internet  Airlines        No     0                                      0
       2       1          2/1/12 6:15   120     Eng       Present   Car Renting     No     1                                      580
       3       2          2/1/12 8:20   12      Bel       Present   Hotel           Yes    0                                      0
       4       1          3/1/12 4:15   60      Esp       ATM       ATM             No     0                                      700
       5       2          3/1/12 9:18   8       Fra       Present   Retail          No     0                                      12
       6       1          3/1/12 9:55   1210    Ita       Internet  Airlines        Yes    1                                      760

     • Each derived attribute is a combination of the following criteria:

       By           Group              Last      Function
       Client       None               hour      Count
       Credit Card  Transaction Type   day       Sum(Amount)
                    Merchant           week      Avg(Amount)
                    Merchant Category  month
                    Merchant Country   3 months
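The two derived attributes shown on slide 9 can be reproduced with a short sketch. It is illustrative only: it assumes the dates are day/month/year, that only strictly earlier transactions of the same client are counted, and that the window boundary is inclusive. The helper names `parse` and `derive` are hypothetical and not from the slides.

```python
from datetime import datetime, timedelta

# (trx_id, client_id, date, amount) copied from the raw-attribute table
transactions = [
    (1, 1, "2/1/12 6:00", 580),
    (2, 1, "2/1/12 6:15", 120),
    (3, 2, "2/1/12 8:20", 12),
    (4, 1, "3/1/12 4:15", 60),
    (5, 2, "3/1/12 9:18", 8),
    (6, 1, "3/1/12 9:55", 1210),
]


def parse(ts):
    # Assumption: dates are written day/month/year
    return datetime.strptime(ts, "%d/%m/%y %H:%M")


def derive(rows):
    """Return (trx_id, count in last 6 hours, amount sum in last 7 days) per row."""
    out = []
    for trx_id, client, ts, _ in rows:
        now = parse(ts)
        # Earlier transactions of the same client only
        earlier = [(parse(t), a) for _, c, t, a in rows
                   if c == client and parse(t) < now]
        count_6h = sum(1 for t, _ in earlier if now - t <= timedelta(hours=6))
        sum_7d = sum(a for t, a in earlier if now - t <= timedelta(days=7))
        out.append((trx_id, count_6h, sum_7d))
    return out


features = derive(transactions)
for row in features:
    print(row)
```

Under these assumptions the output matches the two derived columns on the slide. The other By/Group/Last/Function combinations would follow the same pattern with a different filter, window or aggregate.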
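  10. Data
      Date of transaction: 04/03/2012 - 03:14, 07/03/2012 - 00:47, 07/03/2012 - 02:57, 08/03/2012 - 02:08, 14/03/2012 - 22:15, 25/03/2012 - 05:03, 26/03/2012 - 21:51, 28/03/2012 - 03:41
      [Figure: transaction times plotted on a 24-hour clock (24h, 6h, 12h, 18h).]
      • $\text{Arithmetic Mean} = \frac{1}{n} \sum t$
      • $\text{Periodic Mean} = \operatorname{atan2}\left( \sum \sin(t), \sum \cos(t) \right)$
      • $\text{Periodic Std} = \sqrt{ \ln\left( \frac{1}{ \left( \frac{1}{n} \sum \sin t \right)^2 + \left( \frac{1}{n} \sum \cos t \right)^2 } \right) }$
      • $t \sim \text{vonMises}\left( k \approx \frac{1}{std} \right)$
      • $P(-z_t < t < z_t) = 0.95$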
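A minimal sketch of why the periodic mean is used for time of day, based on the eight transaction times on slide 10. It assumes each time is mapped to an angle of 2π·hour/24 and that the periodic standard deviation takes the square root of the logarithm (the usual circular standard deviation). The function names are hypothetical.

```python
import math

times = ["03:14", "00:47", "02:57", "02:08", "22:15", "05:03", "21:51", "03:41"]


def to_hours(hhmm):
    h, m = hhmm.split(":")
    return int(h) + int(m) / 60


def periodic_stats(hours):
    """Return (periodic mean, periodic std), both expressed in hours."""
    angles = [2 * math.pi * h / 24 for h in hours]
    n = len(angles)
    s = sum(math.sin(a) for a in angles) / n
    c = sum(math.cos(a) for a in angles) / n
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    # Guard against tiny negative values caused by floating-point rounding
    std_angle = math.sqrt(max(0.0, math.log(1.0 / (s * s + c * c))))
    to_h = 24 / (2 * math.pi)
    return mean_angle * to_h, std_angle * to_h


hours = [to_hours(t) for t in times]
arithmetic_mean = sum(hours) / len(hours)
periodic_mean, periodic_std = periodic_stats(hours)

print(f"arithmetic mean: {arithmetic_mean:.2f} h")
print(f"periodic mean:   {periodic_mean:.2f} h")
print(f"periodic std:    {periodic_std:.2f} h")
```

The arithmetic mean lands in the morning, far from any actual transaction, because 22:15 and 21:51 are treated as being distant from 00:47. The periodic mean stays in the small hours, where the transactions cluster.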
  11. Data
      Date of transaction: 04/03/2012 - 03:14, 07/03/2012 - 00:47, 07/03/2012 - 02:57, 08/03/2012 - 02:08, 14/03/2012 - 22:15, 25/03/2012 - 05:03, 26/03/2012 - 21:51, 28/03/2012 - 03:41
      [Figure: transaction times plotted on a 24-hour clock (24h, 6h, 12h, 18h).]
      Additional transactions: 02/04/2012 - 02:02, 03/04/2012 - 12:10
      New features:
      • Inside CI(0.95) last 30 days
      • Inside CI(0.95) last 7 days
      • Inside CI(0.5) last 30 days
      • Inside CI(0.5) last 7 days
  12. Evaluation
      • Confusion matrix

                                   True class ($y_i$)
                                   Fraud ($y_i=1$)   Legitimate ($y_i=0$)
        Predicted class ($p_i$)
          Fraud ($p_i=1$)          TP                FP
          Legitimate ($p_i=0$)     FN                TN

      • $\text{Misclassification} = 1 - \frac{TP+TN}{TP+TN+FP+FN}$
      • $\text{Recall} = \frac{TP}{TP+FN}$
      • $\text{Precision} = \frac{TP}{TP+FP}$
      • $\text{F-Score} = 2 \, \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
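The four measures on slide 12 can be computed directly from the confusion-matrix counts. The sketch below uses made-up labels purely for illustration. Returning 0.0 when a denominator is zero is a convention assumed here, not something the slides specify.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with 1 = fraud and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    return {
        "misclassification": 1 - (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f_score": 2 * precision * recall / denom if denom else 0.0,
    }


# Illustrative labels only
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0]
result = metrics(y_true, y_pred)
print(result)
```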
  13. Evaluation - Financial measure
      • Motivation:
        • Equal misclassification results
        • Frauds carry different costs

        Trx ID  Amount  Fraud  Prediction (Fraud?)  Prediction (Fraud?)  Prediction (Fraud?)
        1       580     No     No                   No                   No
        2       120     No     No                   No                   No
        3       12      Yes    Yes                  No                   No
        4       60      No     No                   No                   No
        5       8       No     Yes                  Yes                  No
        6       1210    Yes    No                   Yes                  No

        Algorithms 1, 2 and 3 each misclassify 2 of 6 transactions, yet the resulting costs differ widely: 1222, 1212 and 14.
  14. Evaluation - Financial measure
      • Cost matrix

                                   True class ($y_i$)
                                   Fraud ($y_i=1$)       Legitimate ($y_i=0$)
        Predicted class ($c_i$)
          Fraud ($c_i=1$)          $C_{TP_i} = C_a$      $C_{FP_i} = C_a$
          Legitimate ($c_i=0$)     $C_{FN_i} = Amt_i$    $C_{TN_i} = 0$

        where $C_a$ is the administrative cost and $Amt_i$ is the amount of transaction $i$.

      • Evaluation measure

        $Cost = \sum_{i=1}^{m} \left[ y_i \left( c_i C_a + (1 - c_i) Amt_i \right) + (1 - y_i) \, c_i C_a \right]$
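As a worked check, the evaluation measure can be applied to the six-transaction motivating example. The sketch assumes an administrative cost of C_a = 1. That value is not stated in the slides; it is chosen here because it reproduces the three cost figures quoted in the example. The dictionary keys are descriptive labels, not the slides' algorithm numbers.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum of y*(c*Ca + (1-c)*Amt) + (1-y)*c*Ca over all transactions."""
    cost = 0
    for y, c, amt in zip(y_true, y_pred, amounts):
        cost += y * (c * admin_cost + (1 - c) * amt) + (1 - y) * c * admin_cost
    return cost


amounts = [580, 120, 12, 60, 8, 1210]
y_true = [0, 0, 1, 0, 0, 1]  # transactions 3 and 6 are frauds

predictions = {
    "flags trx 3 and 5": [0, 0, 1, 0, 1, 0],
    "flags trx 5 and 6": [0, 0, 0, 0, 1, 1],
    "flags nothing": [0, 0, 0, 0, 0, 0],
}

ADMIN_COST = 1  # assumption, see the note above
costs = {name: total_cost(y_true, pred, amounts, ADMIN_COST)
         for name, pred in predictions.items()}
for name, cost in costs.items():
    print(f"{name}: {cost}")
```

All three prediction vectors make two errors, but missing the 1210 fraud dominates the cost. Catching it, even at the price of missing the 12 fraud, reduces the cost to 14.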
  15. Agenda
      • Introduction
      • Database
      • Evaluation
      • Bayes Minimum Risk
      • Experiments
      • Probability Calibration
      • Other applications
      • Conclusions & Future Work
  16. Bayes Minimum Risk
      • Decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions
      • Risk of classification
  17. Bayes Minimum Risk
      • If ... then ...
      • Example-dependent threshold: $t^{BMR}_i = \frac{C_a}{Amt_i}$
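The risk expressions themselves did not survive in this transcript, so the following is a hedged sketch rather than the authors' exact formulation. It assumes the usual expected-cost definition of risk. Under the slide 14 cost matrix, flagging a transaction costs C_a whatever its true class, while letting it through has expected cost p_i · Amt_i, where p_i is the estimated fraud probability. Flagging when C_a ≤ p_i · Amt_i is then the same as flagging when p_i ≥ C_a / Amt_i, which agrees with the threshold on slide 17. Resolving ties in favour of flagging is an assumption, and the function names are hypothetical.

```python
def bmr_threshold(amount, admin_cost):
    """Example-dependent threshold C_a / Amt_i (infinite for non-positive amounts)."""
    return admin_cost / amount if amount > 0 else float("inf")


def bmr_predict(prob_fraud, amount, admin_cost):
    """Return 1 (flag as fraud) when flagging has the lower expected cost."""
    risk_flag = admin_cost           # cost of predicting fraud, either true class
    risk_pass = prob_fraud * amount  # expected loss of predicting legitimate
    return 1 if risk_flag <= risk_pass else 0


# The same 5% fraud probability leads to different decisions by amount
# (administrative cost of 1 is an illustrative assumption)
for amount in (580, 8):
    decision = bmr_predict(0.05, amount, admin_cost=1)
    threshold = bmr_threshold(amount, admin_cost=1)
    print(f"amount={amount}: threshold={threshold:.4f}, flag={decision}")
```

A large transaction is flagged even at a low fraud probability, whereas a small one is let through because the potential loss is smaller than the cost of investigating it.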
  18. Experiments
      • Estimation of the fraud probabilities using
        • Decision Trees
        • Logistic Regression
        • Random Forest
      • Datasets

        Database    Transactions  Frauds  Losses (EUR)
        Total       1,638,772     0.21%   860,448
        Train       815,368       0.21%   416,369
        Validation  412,137       0.22%   238,537
        Test        411,267       0.21%   205,542
  19. Experiments
      [Figure: cost (euros, 0 to 250,000) and F1-Score (0.00 to 0.25) for No Model, RF, DT and LR.]
  20. Experiments
      [Figure: cost (euros, 0 to 250,000) and F1-Score (0.00 to 0.25) for No Model and for RF, DT and LR, each with and without BMR.]
      • Cost is reduced when using BMR
      • The F1-Score is also reduced
  21. Probability Calibration
      • When using the output of a binary classifier as a basis for decision making, there is a need for a probability that not only separates well between positive and negative examples, but also assesses the real probability of the event
  22. Probability Calibration
      • Reliability Diagram
      [Figure: reliability diagram plotting P(pf) against P(pf|x), both from 0 to 1, for base, LR, RF and DT.]
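A reliability diagram of this kind can be built by binning the predicted probabilities and comparing each bin's mean prediction with the fraction of positives observed in it. The sketch below is one common construction with ten equal-width bins. The binning scheme and the function name are assumptions, since the slides do not say how the diagram was produced.

```python
def reliability_curve(y_true, probs, n_bins=10):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [(sums[b] / counts[b], hits[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b]]


# Tiny illustrative input, not data from the slides
curve = reliability_curve([0, 0, 1, 1], [0.1, 0.15, 0.8, 0.85])
for mean_pred, observed, count in curve:
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

For a well-calibrated model the two values in each bin are close, so the points lie near the diagonal.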
  23. Probability Calibration
      • ROC Convex Hull calibration

        Class (y)  Prob (p)
        0          0.0
        1          0.1
        0          0.2
        0          0.3
        1          0.4
        0          0.5
        1          0.6
        1          0.7
        0          0.8
        1          0.9
        1          1.0

      [Figure: ROC curve.]
  24. Probability Calibration
      • ROC Convex Hull calibration

        Class (y)  Prob (p)  Cal Prob
        0          0.0       0
        1          0.1       0.333
        0          0.2       0.333
        0          0.3       0.333
        1          0.4       0.5
        0          0.5       0.5
        1          0.6       0.666
        1          0.7       0.666
        0          0.8       0.666
        1          0.9       1
        1          1.0       1

      [Figure: ROC convex hull curve.]
      The calibrated probabilities are extracted by first grouping the probabilities according to the points on the ROCCH curve, and then setting the calibrated probability of each group T to slope(T).
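The calibrated column can be reproduced with the pool-adjacent-violators (PAV) algorithm, which is known to give the same result as taking slopes of the ROC convex hull. Note that this sketch swaps in PAV rather than building the hull explicitly. The function name is hypothetical, and the input is the class column of the eleven examples sorted by increasing probability.

```python
def pav_calibrate(labels_sorted_by_score):
    """Pool adjacent violators: non-decreasing block means of the 0/1 labels."""
    blocks = []  # each block is [sum of labels, number of examples]
    for y in labels_sorted_by_score:
        blocks.append([float(y), 1])
        # Merge while the previous block's mean exceeds the last block's mean
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    calibrated = []
    for s, n in blocks:
        calibrated.extend([s / n] * n)
    return calibrated


# Classes ordered by raw probability 0.0, 0.1, ..., 1.0
classes = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
calibrated = pav_calibrate(classes)
print([round(p, 3) for p in calibrated])
```

Each block corresponds to one segment of the convex hull, and the block mean (positives over examples in the group) is the calibrated probability assigned to every example in that group. The values 0.666 in the table appear as 0.667 here because of rounding rather than truncation.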
  25. Probability Calibration
      • Reliability Diagram
      [Figure: two reliability diagrams plotting P(pf) against P(pf|x), both from 0 to 1. The first shows base, RF, DT and LR; the second shows base, Cal RF, Cal DT and Cal LR.]
  26. Experiments
      • Extra 1.5% decrease in cost by using calibrated probabilities
      [Figure: cost (euros, 0 to 250,000) and F1-Score (0.00 to 0.25) for No Model and for RF, DT and LR, each alone, with BMR, and with calibrated BMR (CAL BMR).]
  27. Agenda
      • Introduction
      • Database
      • Evaluation
      • Bayes Minimum Risk
      • Experiments
      • Probability Calibration
      • Other applications
      • Conclusions & Future Work
  28. Other Applications
      • Direct Marketing: Banking LTD offers
        http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
      • Credit Scoring: 2011 Kaggle competition Give Me Some Credit
        http://www.kaggle.com/c/GiveMeSomeCredit/
      • Credit Scoring: 2009 Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD) competition
        http://sede.neurotech.com.br:443/PAKDD2009/
  29. Other Applications
      • Direct Marketing: Banking LTD offers
      • Cost matrix

                                   True class ($y_i$)
                                   Accept ($y_i=1$)      Decline ($y_i=0$)
        Predicted class ($c_i$)
          Accept ($c_i=1$)         $C_{TP_i} = C_a$      $C_{FP_i} = C_a$
          Decline ($c_i=0$)        $C_{FN_i} = Int_i$    $C_{TN_i} = 0$

        where $Int_i$ is the expected interest gain of customer $i$.

      • Datasets

        Database    Examples  Acceptance  Int
        Total       47,562    12.56%      394,211
        Train       19,119    12.64%      156,676
        Validation  11,809    12.78%      97,498
        Test        11,815    12.23%      97,594
  30. Other Applications
      • Direct Marketing: Banking LTD offers
      • Extra 13.4% decrease in cost by using calibrated probabilities
      [Figure: cost (euros, 0 to 16,000) and F1-Score (0.00 to 0.40) for No Model and for RF, DT and LR, each alone, with BMR, and with calibrated BMR (CAL BMR). One bar carries the label 95,594.]
  31. Other Applications
      • Credit Scoring
      • Cost matrix

                                   True class ($y_i$)
                                   Accept ($y_i=1$)              Decline ($y_i=0$)
        Predicted class ($c_i$)
          Accept ($c_i=1$)         $C_{TP_i} = 0$                $C_{FP_i} = r_i + C^a_{FP}$
          Decline ($c_i=0$)        $C_{FN_i} = Cl_i \cdot lgd$   $C_{TN_i} = 0$

        where $lgd$ is the loss given default, $Cl_i$ is the credit line of client $i$, $r_i$ is the expected profit of client $i$, and $C^a_{FP}$ is the expected cost of lending the money to an alternative borrower.

      • Datasets

        Kaggle Credit Dataset
        Database    Examples  Default
        Total       112,915   6.74%
        Train       45,358    6.83%
        Validation  33,850    6.67%
        Test        33,707    6.71%

        PAKDD Credit Dataset
        Database    Examples  Default
        Total       38,969    19.88%
        Train       15,614    19.98%
        Validation  11,711    20.02%
        Test        11,644    19.63%
  32. Other Applications
      • Credit Scoring
      • Extra 0.9% decrease in cost by using calibrated probabilities
      [Figure: two charts, one for the Kaggle Credit Dataset and one for the PAKDD Credit Dataset, each showing cost (millions of euros) and F1-Score (0.00 to 0.40) for RF, BMR and CAL BMR.]
  33. Conclusion
      • Selecting models based on traditional statistics does not give the best results in terms of cost
      • Models should be evaluated taking into account the real financial costs of the application
      • Algorithms should be developed to incorporate those real financial costs
      • Calibration of probabilities yields a further decrease in cost
  34. Future work
      • Example-Dependent Cost-Sensitive Decision Trees
      • Example-Dependent Cost-Sensitive Calibration Method
      • Applications:
        • Corporate credit risk
        • Involuntary & voluntary churn in the TV subscription market
  35. References
      • Correa Bahnsen, A., Stojanovic, A., Aouada, D., & Ottersten, B. (2013). Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk. In International Conference on Machine Learning and Applications. Miami, USA: IEEE.
      • Correa Bahnsen, A., Stojanovic, A., Aouada, D., & Ottersten, B. (2014). Improving Credit Card Fraud Detection with Calibrated Probabilities. In SIAM International Conference on Data Mining. Philadelphia, USA: SIAM.
      • Correa Bahnsen, A., Aouada, D., & Ottersten, B. (2014). Example-Dependent Cost-Sensitive Credit Scoring using Bayes Minimum Risk. Submitted to ECAI 2014.
  36. Contact information
      Alejandro Correa Bahnsen
      University of Luxembourg, Luxembourg
      al.bahnsen@gmail.com
      http://www.linkedin.com/in/albahnsen
      http://www.slideshare.net/albahnsen
