2013 Credit Card Fraud Detection: Why Theory Doesn't Adjust to Practice

Presentation at the SAS Analytics Conference 2013, London, UK.

Presenter:
Alejandro Correa Bahnsen

  1. Credit Card Fraud Detection: Why Theory Doesn't Adjust to Practice
     Alejandro Correa Bahnsen, Luxembourg University
     Andrés Gonzalez Montoya, Scotia Bank
     Copyright © 2013, SAS Institute Inc. All rights reserved. #analytics2013
  2. Introduction
     [Chart: Europe fraud evolution, internet transactions (millions of euros), 2007 to 2012E]
  3. Introduction
     [Chart: US fraud evolution, online revenue lost due to fraud (billions of dollars), 2001 to 2012]
  4. Introduction
     • Increasing fraud levels around the world
     • Different technologies and legal requirements make it harder to control
     • There is a need for advanced fraud detection systems
  5. Agenda
     • Introduction
       • Transaction flow
       • Database
     • Evaluation of algorithms
       • If-Then rules (Expert Rules)
       • Financial measure
     • Predictive modeling
       • Logistic Regression
       • Cost Sensitive Logistic Regression
  6. Simplified transaction flow
     [Diagram: a transaction passes through the network, where it is checked for fraud]
  7. Data
     • Large European card processing company
     • 2012 card-present transactions
     • 750,000 transactions
     • 3,500 frauds
     • 0.467% fraud rate
     • 148,562 EUR lost due to fraud on the test dataset
     [Timeline: January to October used for training, November and December for testing]
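As a quick check of how imbalanced the data is, the fraud rate follows directly from the two counts on this slide (plain Python, using only the slide's figures):

```python
# Dataset figures from the Data slide
n_transactions = 750_000
n_frauds = 3_500

fraud_rate = n_frauds / n_transactions
legit_per_fraud = (n_transactions - n_frauds) / n_frauds

print(f"Fraud rate: {fraud_rate:.3%}")  # 0.467%
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")  # 213
```

Roughly one fraud per 214 transactions is the imbalance every model in the rest of the deck has to cope with.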
  8. Data
     • Raw attributes
       TRX ID  Client ID  Date          Amount  Location  Type      Merchant group  Fraud
       1       1          2/1/12 6:00   580     Ger       Internet  Airlines        No
       2       1          2/1/12 6:15   120     Eng       Present   Car Rent        No
       3       2          2/1/12 8:20   12      Bel       Present   Hotel           Yes
       4       1          3/1/12 4:15   60      Esp       ATM       ATM             No
       5       2          3/1/12 9:18   8       Fra       Present   Retail          No
       6       1          3/1/12 9:55   1210    Ita       Internet  Airlines        Yes
     • Other attributes: age, country of residence, postal code, type of card
  9. Data
     • Derived attributes (Trx 6h = number of transactions, same client, last 6 hours; Sum 7d = sum of amounts, same client, last 7 days)
       TRX ID  Client ID  Date          Amount  Location  Type      Merchant group  Fraud  Trx 6h  Sum 7d
       1       1          2/1/12 6:00   580     Ger       Internet  Airlines        No     0       0
       2       1          2/1/12 6:15   120     Eng       Present   Car Renting     No     1       580
       3       2          2/1/12 8:20   12      Bel       Present   Hotel           Yes    0       0
       4       1          3/1/12 4:15   60      Esp       ATM       ATM             No     0       700
       5       2          3/1/12 9:18   8       Fra       Present   Retail          No     0       12
       6       1          3/1/12 9:55   1210    Ita       Internet  Airlines        Yes    1       760
     • Derived attributes are combinations of the following criteria:
       By                 Group             Last       Function
       Client             None              hour       Count
       Credit card        Transaction type  day        Sum(Amount)
       Merchant                             week       Avg(Amount)
       Merchant category                    month
       Merchant country                     3 months
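A minimal sketch of how the two derived columns above could be computed, assuming dates are day/month/year (the only reading under which the slide's values come out as shown) and that a transaction is excluded from its own aggregates. The names `window_features` and `parse` are illustrative, not from the presentation, and the quadratic scan is only meant for a table this small.

```python
from datetime import datetime, timedelta

# Sample rows from the slide: (trx_id, client_id, timestamp, amount)
TRANSACTIONS = [
    (1, 1, "2/1/12 6:00", 580),
    (2, 1, "2/1/12 6:15", 120),
    (3, 2, "2/1/12 8:20", 12),
    (4, 1, "3/1/12 4:15", 60),
    (5, 2, "3/1/12 9:18", 8),
    (6, 1, "3/1/12 9:55", 1210),
]


def parse(ts):
    # Assumed format: day/month/two-digit year, then hour:minute
    return datetime.strptime(ts, "%d/%m/%y %H:%M")


def window_features(transactions, count_window, sum_window):
    """For each transaction, aggregate the same client's *earlier* transactions.

    Returns (trx_id, count within count_window, amount sum within sum_window).
    """
    rows = [(tid, cid, parse(ts), amt) for tid, cid, ts, amt in transactions]
    features = []
    for tid, cid, t, _ in rows:
        history = [(t0, a0) for _, c0, t0, a0 in rows if c0 == cid and t0 < t]
        count = sum(1 for t0, _ in history if t - t0 <= count_window)
        total = sum(a0 for t0, a0 in history if t - t0 <= sum_window)
        features.append((tid, count, total))
    return features


for row in window_features(TRANSACTIONS, timedelta(hours=6), timedelta(days=7)):
    print(row)
```

The printed counts and sums reproduce the Trx 6h and Sum 7d columns. Other combinations from the criteria table (by merchant, average amount, three months, and so on) only change the grouping key, the window and the aggregate.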
  10. Evaluation
     • Confusion matrix
                                  True class (y_i)
                                  Fraud (y_i = 1)   Legitimate (y_i = 0)
       Predicted class (p_i)
         Fraud (p_i = 1)          TP                FP
         Legitimate (p_i = 0)     FN                TN
     • $\text{Misclassification} = 1 - \frac{TP + TN}{TP + TN + FP + FN}$
     • $\text{Recall} = \frac{TP}{TP + FN}$
     • $\text{Precision} = \frac{TP}{TP + FP}$
     • $F_1\text{-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
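The four measures translate directly into code. This sketch (function names are illustrative) also shows why misclassification is a weak target on this dataset: with a 0.467% fraud rate, approving every transaction already gives a misclassification of only 0.467%, with zero recall.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; 0 when both are 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Misclassification, recall, precision and F1-score from confusion-matrix counts."""
    misclassification = 1 - (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return misclassification, recall, precision, f1_score(precision, recall)


# The If-Then rules' reported precision (17%) and recall (31%) give the reported F1
print(f"{f1_score(0.17, 0.31):.0%}")  # 22%

# Approving every transaction in the 750,000-transaction dataset (3,500 frauds)
print(classification_metrics(tp=0, fp=0, fn=3_500, tn=746_500))
```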
  11. Agenda (repeat of slide 5)
  12. Fraud algorithms
     • If-Then rules
     • Predictive modeling
       • Logistic Regression
       • Decision Trees
       • Random Forest
       • Cost Sensitive Logistic Regression
     [Diagram: a transaction passes through the network, where it is checked for fraud]
  13. If-Then rules (Expert rules)
     • "Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions."
     • Example rules:
       • More than 4 ATM transactions in one hour?
       • More than 2 transactions in 5 minutes?
       • Magnetic stripe transaction, then internet transaction?
  14. If-Then rules (Expert rules)
     • More than 4 ATM transactions in one hour?
     • More than 2 transactions in 5 minutes?
     • Magnetic stripe transaction, then internet transaction?
     [Diagram: the rules are applied at the network's fraud check]
     If one or more rules are activated, then decline the transaction.
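The rules on this slide can be sketched as predicates over a card's recent history, declining when any of them fires. This is a hypothetical illustration: the `Trx` record, the channel labels and the reading of "magnetic stripe, then internet" as two consecutive transactions on the same card are assumptions, not details from the presentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trx:
    # Hypothetical minimal transaction record
    time: datetime
    channel: str  # assumed labels: "atm", "magstripe", "internet", ...


def _window(history, trx, span):
    # The current transaction plus earlier ones no older than `span`
    return [t for t in history + [trx] if trx.time - t.time <= span]


def more_than_4_atm_in_1_hour(history, trx):
    return sum(t.channel == "atm" for t in _window(history, trx, timedelta(hours=1))) > 4


def more_than_2_in_5_minutes(history, trx):
    return len(_window(history, trx, timedelta(minutes=5))) > 2


def magstripe_then_internet(history, trx):
    # Assumption: "then" means the card's immediately preceding transaction
    return bool(history) and history[-1].channel == "magstripe" and trx.channel == "internet"


RULES = [more_than_4_atm_in_1_hour, more_than_2_in_5_minutes, magstripe_then_internet]


def decide(history, trx):
    """Decline if one or more rules is activated; `history` is oldest-first."""
    fired = [rule.__name__ for rule in RULES if rule(history, trx)]
    return ("decline" if fired else "approve"), fired


t0 = datetime(2012, 1, 2, 9, 0)
print(decide([Trx(t0, "magstripe")], Trx(t0 + timedelta(minutes=30), "internet")))
```

Returning the names of the rules that fired keeps each decision easy to explain, which matches the interpretability advantage listed for rules; a fraud pattern that none of the predicates describes is simply approved.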
  15. If-Then rules (Expert rules)
     • Problems with rules
       • New fraud patterns are not detected
       • Only simple rules can be created
     • Advantages of rules
       • Easy to implement
       • Very easy to interpret
  16. If-Then rules (Expert rules): results
       Misclassification  Recall  Precision  F1-Score
       1.04%              31%     17%        22%
  17. Financial evaluation
     • Motivation
       • False positives carry a different cost than false negatives
       • Frauds range from a few to thousands of euros (dollars, pounds, etc.)
     There is a need for a real comparison measure.
  18. Financial evaluation
     • Cost matrix
                                  True class (y_i)
                                  Fraud (y_i = 1)   Legitimate (y_i = 0)
       Predicted class (p_i)
         Fraud (p_i = 1)          C_a               C_a
         Legitimate (p_i = 0)     Amt_i             0
       where:
         C_a     administrative costs
         Amt_i   amount of transaction i
     • Evaluation measure
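Reading the cost matrix row by row gives a per-transaction cost of $C_a$ whenever a transaction is flagged, $Amt_i$ when a fraud is approved, and 0 otherwise, so the total is $\text{Cost} = \sum_i \left[ y_i (1 - p_i) Amt_i + p_i C_a \right]$. This is derived from the matrix and may differ in notation from the original slide. With no model ($p_i = 0$ for all $i$) the cost is just the sum of the fraud amounts, which is why 148,562 EUR appears both as the test-set fraud losses and as the no-model cost. The value of `ADMIN_COST` below is hypothetical; the slides do not give one.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Financial cost implied by the cost matrix.

    Flagged (p = 1): admin_cost whatever the true class.
    Missed fraud (y = 1, p = 0): the transaction amount.
    Approved legitimate transaction (y = 0, p = 0): nothing.
    """
    return sum(
        y * (1 - p) * amt + p * admin_cost
        for y, p, amt in zip(y_true, y_pred, amounts)
    )


# Sample transactions from the raw-attributes slide (frauds are trx 3 and 6)
amounts = [580, 120, 12, 60, 8, 1210]
y_true = [0, 0, 1, 0, 0, 1]
ADMIN_COST = 3  # hypothetical C_a

print(total_cost(y_true, [0] * 6, amounts, ADMIN_COST))              # no model: 1222
print(total_cost(y_true, [0, 0, 0, 0, 0, 1], amounts, ADMIN_COST))  # flag trx 6 only: 15
```

Unlike misclassification, this measure rewards catching the 1,210 EUR fraud far more than the 12 EUR one.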
  19. If-Then rules: results
       Misclassification  Recall  Precision  F1-Score  Cost     Cost (no model)
       1.04%              31%     17%        22%       €95,520  €148,562
     148,562 EUR are the losses due to fraud in the test database (2 months).
  20. Agenda (repeat of slide 5)
  21. Predictive modeling
     Predictive modeling is the use of statistical and mathematical techniques to discover patterns in data in order to make predictions.
  22. Predictive modeling
     [Scatter plot: amount of transaction vs. number of transactions last day; normal transactions and frauds]
  23. Predictive modeling
     [Scatter plot: amount of transaction vs. number of transactions last day; normal transactions and frauds]
  24. Predictive modeling
     [Scatter plot adding a third axis, amount spent on the internet last month; normal transactions and frauds]
  25. Logistic Regression
     • Model
     • Cost function
     • Cost matrix
                                  True class (y_i)
                                  Fraud (y_i = 1)   Legitimate (y_i = 0)
       Predicted class (p_i)
         Fraud (p_i = 1)          0                 1
         Legitimate (p_i = 0)     1                 0
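Assuming the standard formulation behind the model and cost function on this slide, $h_\theta(x_i) = 1 / (1 + e^{-\theta^\top x_i})$ is fitted by minimizing the log-loss $J(\theta) = -\frac{1}{N}\sum_i \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]$. With the 0/1 cost matrix every error costs the same, so training approximately minimizes misclassification. The sketch below fits it by batch gradient descent in plain Python on hypothetical toy data; it illustrates the technique and is not the implementation used in the study.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, x):
    # x includes a leading 1.0 for the intercept term
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def log_loss(theta, X, y):
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict_proba(theta, xi)
        total += yi * math.log(h + eps) + (1 - yi) * math.log(1 - h + eps)
    return -total / len(X)


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the log-loss; the gradient is mean((h - y) * x)."""
    theta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = predict_proba(theta, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: intercept plus "number of transactions last day"
X = [[1.0, f] for f in (0, 1, 1, 2, 5, 6, 7)]
y = [0, 0, 0, 0, 1, 1, 1]
theta = fit_logistic(X, y)
print([round(predict_proba(theta, x), 2) for x in X])
```

Nothing in this objective knows the transaction amounts, which is the gap the cost-sensitive variant later in the deck addresses.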
  26. Logistic Regression: results
       Misclassification  Recall  Precision  F1-Score  Cost      Cost (no model)
       0.52%              0%      2%         0%        €148,196  €148,562
     148,562 EUR are the losses due to fraud in the test database (2 months).
  27. Logistic Regression
     Sub-sampling procedure: select all the frauds and a random sample of the legitimate transactions.
     [Chart: training set size by fraud percentage]
       Fraud percentage   0.467% (all)  1%       5%      10%     20%     50%
       Training set size  620,000       310,000  62,000  31,000  15,500  5,200
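The sub-sampling procedure keeps every fraud and draws just enough legitimate transactions to reach a target fraud share: with $n_f$ frauds and target share $r$, the number of legitimate transactions is $n_f (1 - r) / r$. A sketch using only the standard library; the function name, seed handling and toy labels are illustrative.

```python
import random


def undersample(y, target_fraud_rate, seed=0):
    """Indices of all frauds (y == 1) plus a random sample of legitimate
    transactions (y == 0) sized so frauds make up target_fraud_rate of the result."""
    rng = random.Random(seed)
    fraud_idx = [i for i, label in enumerate(y) if label == 1]
    legit_idx = [i for i, label in enumerate(y) if label == 0]
    # n_f / (n_f + n_l) = r  =>  n_l = n_f * (1 - r) / r
    n_legit = round(len(fraud_idx) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit_idx))  # cannot draw more than are available
    return sorted(fraud_idx + rng.sample(legit_idx, n_legit))


# Hypothetical labels: 10 frauds among 1,000 transactions (1% fraud rate)
y = [1] * 10 + [0] * 990
for rate in (0.05, 0.10, 0.20, 0.50):
    idx = undersample(y, rate)
    print(rate, len(idx), sum(y[i] for i in idx) / len(idx))
```

At a 1% target each fraud is paired with 99 legitimate transactions, and at 50% with just one, which is why the training set shrinks so quickly as the target share rises.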
  28. Logistic Regression: results
     Selecting the algorithm by cost
     [Chart: cost, recall, precision, misclassification and F1-score by training set]
       Training set  Cost
       No model      €148,562
       All           €148,196
       1%            €142,510
       5%            €112,103
       10%           €79,838
       20%           €65,870
       50%           €46,530
  29. Logistic Regression
     • The best model selected using the traditional F1-Score does not give the best results in terms of cost
     • The model selected by cost is trained on less than 1% of the database, so a lot of information is excluded
     • The algorithm is trained to (approximately) minimize misclassification, but is then evaluated on cost
     • Why not train the algorithm to minimize the cost instead?
  30. Cost Sensitive Logistic Regression
     • Cost matrix
                                  True class (y_i)
                                  Fraud (y_i = 1)   Legitimate (y_i = 0)
       Predicted class (p_i)
         Fraud (p_i = 1)          C_a               C_a
         Legitimate (p_i = 0)     Amt_i             0
     • Cost function
     • Objective: find θ that minimizes the cost function (genetic algorithms)
  31. Cost Sensitive Logistic Regression
     • Cost function
     • Gradient
     • Hessian
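One way to turn the cost matrix into a trainable objective, assumed here for illustration, is to replace the hard prediction $p_i$ with the probability $h_i = h_\theta(x_i)$: $J^c(\theta) = \frac{1}{N}\sum_i \left[ y_i \left( h_i C_a + (1 - h_i) Amt_i \right) + (1 - y_i) h_i C_a \right]$, whose gradient is $\frac{1}{N}\sum_i (C_a - y_i Amt_i)\, h_i (1 - h_i)\, x_i$. The slides mention genetic algorithms for the minimization; this sketch swaps in plain gradient descent instead, and the toy data and `ADMIN_COST` value are hypothetical. The objective is generally not convex in θ, so gradient descent may stop at a local minimum.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proba(theta, x):
    # x includes a leading 1.0 for the intercept
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def expected_cost(theta, X, y, amounts, admin_cost):
    """Mean expected cost under the cost matrix, with probabilities in place of hard predictions."""
    total = 0.0
    for xi, yi, amt in zip(X, y, amounts):
        h = proba(theta, xi)
        # fraud: flagged w.p. h (costs C_a), missed w.p. 1 - h (costs Amt_i);
        # legitimate: flagged w.p. h (costs C_a), approved otherwise (costs 0)
        total += yi * (h * admin_cost + (1 - h) * amt) + (1 - yi) * h * admin_cost
    return total / len(X)


def fit_cost_sensitive(X, y, amounts, admin_cost, lr=0.01, n_iter=5000):
    """Plain gradient descent on expected_cost (a stand-in for the genetic algorithm)."""
    theta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(theta)
        for xi, yi, amt in zip(X, y, amounts):
            h = proba(theta, xi)
            # d cost_i / d h = C_a - y_i * Amt_i, and d h / d z = h * (1 - h)
            factor = (admin_cost - yi * amt) * h * (1 - h)
            for j, v in enumerate(xi):
                grad[j] += factor * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: intercept plus one scaled behavioural feature
X = [[1.0, 3.0], [1.0, 2.5], [1.0, 0.0], [1.0, 0.5], [1.0, 0.2]]
y = [1, 1, 0, 0, 0]
amounts = [200.0, 150.0, 50.0, 20.0, 30.0]
ADMIN_COST = 5.0  # hypothetical C_a

theta = fit_cost_sensitive(X, y, amounts, ADMIN_COST)
print(expected_cost([0.0, 0.0], X, y, amounts, ADMIN_COST), expected_cost(theta, X, y, amounts, ADMIN_COST))
```

Each fraud's pull on θ scales with its amount, so large frauds dominate the fit, while a legitimate transaction only ever costs $C_a$.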
  32. Cost Sensitive Logistic Regression
     [Chart: cumulative distribution of transaction amounts (0% to 100%) for legitimate and fraudulent transactions; annotated amounts €49, €370, €124, €196]
  33. Cost Sensitive Logistic Regression: results
     [Chart: cost, recall, precision and F1-score by training set]
       Training set  Cost
       No model      €148,562
       All           €31,174
       1%            €37,785
       5%            €66,245
       10%           €67,264
       20%           €73,772
       50%           €85,724
  34. Cost Sensitive Logistic Regression: results
     [Chart: cost, recall, precision and F1-score by model]
       Model                               Cost
       No model                            €148,562
       If-Then rules                       €95,520
       Logistic Regression                 €46,530
       Cost Sensitive Logistic Regression  €31,174
       Decision Trees                      €35,466
       Random Forests                      €34,203
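Expressing each cost on this slide as the share of the no-model losses it avoids makes the comparison easier to read. The snippet only rearranges the reported figures:

```python
# Test-set costs (EUR) as reported on the comparison slide
COSTS = {
    "No model": 148_562,
    "If-Then rules": 95_520,
    "Logistic Regression": 46_530,
    "Cost Sensitive Logistic Regression": 31_174,
    "Decision Trees": 35_466,
    "Random Forests": 34_203,
}

baseline = COSTS["No model"]
# Share of the no-model fraud losses that each approach avoids
reductions = {model: 1 - cost / baseline for model, cost in COSTS.items()}

for model, cost in sorted(COSTS.items(), key=lambda kv: kv[1]):
    print(f"{model:<36}€{cost:>8,}  {reductions[model]:6.1%}")
```

Ranked this way, cost-sensitive logistic regression avoids about 79% of the no-model losses, against about 69% for standard logistic regression and about 36% for the expert rules.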
  35. Conclusion
     • Selecting models based on traditional statistics does not give the best results in terms of cost
     • Models should be evaluated taking into account the real financial costs of the application
     • Algorithms should be developed to incorporate those financial costs
  36. Contact information
     Alejandro Correa Bahnsen
     University of Luxembourg
     Luxembourg
     al.bahnsen@gmail.com
     http://www.linkedin.com/in/albahnsen
     http://www.slideshare.net/albahnsen
  37. Thank You!!
     Alejandro Correa Bahnsen
     Andres Gonzalez Montoya
