Real-time fraud detection in credit card transactions

Presentation given at the Data Science Warsaw Meetup, 2017-02-28

Published in: Data & Analytics

  1. Real-time fraud detection in credit card transactions. Mariusz Rafało, Warsaw, February 28th, 2017
  2. About me. Professional: co-founder and partner at Sorigo. Academic: lecturer at Warsaw School of Economics. Contact details: mariusz.rafalo@sorigo.pl, http://www.linkedin.com/in/mrafalo
  3. Agenda: 1. What is the best approach to prevent a specific type of fraud? 2. How to configure Big Data tools to detect fraud? 3. Is Big Data architecture flexible enough for fraud detection?
  4. BATCH VS STREAM APPROACH
  5. Key issue: reaction time. Timeline of transactions: fraud occurred (t1)
  6. Key issue: reaction time. Timeline of transactions: symptoms (t0), fraud occurred (t1); interval label: reaction time
  7. Key issue: reaction time. Timeline of transactions: symptoms (t0), fraud occurred (t1), fraud detected (t2); interval labels: reaction time, fraud latency
  8. Batch vs stream: summary. Figure comparing stream and batch on three dimensions: reaction time (short/long), analysis window (wide/narrow), number of transactions (high/low)
  9. ARCHITECTURE
  10. High level concept. Diagram labels: Data warehouse, Scoring engine, Reports, Actions, Models, Rules, Actions
  11. DATA ANALYSIS
  12. Data at first glance: 284 807 unique observations (1-3 transactions per second); 30 variables + timestamp + target; unbalanced dataset: 0.17% frauds (492 vs 284 315)
  13. SMOTE: Synthetic Minority Over-sampling Technique. Figure labels: xk; n1, n2, n3, n4, n5, n6; x1, x2, x3, … xk … xn
  14. SMOTE: Synthetic Minority Over-sampling Technique. Figure labels: xk; n1, n2, n3, n4, n5, n6; s1, s2, s3, s4, s5, s6; x1, x2, x3, … xk … xn
  15. SMOTE: Synthetic Minority Over-sampling Technique. Figure labels: xk; n1, n2, n3, n4, n5, n6; s1, s2, s3, s4, s5, s6; s1, s2, s3 … sm
  16. Figure with legend entries fraud = 0 and fraud = 1 (shown twice)
  17. Figure with legend entries fraud = 0 and fraud = 1 (shown twice)
  18. Figure with legend entries fraud = 0 and fraud = 1 (shown twice)
  19. Summary
      Supervised model (decision tree): Accuracy = 0.9212, Sensitivity = 0.8878, Specificity = 0.9545
                      Reference 1   Reference 0
        Predict 1         839            43
        Predict 0         106           902
      Unsupervised model (k-means): Accuracy = 0.8921, Sensitivity = 0.8624, Specificity = 0.9217
                      Reference 1   Reference 0
        Predict 1         815            74
        Predict 0         130           871
  20. TECHNOLOGIES
  21. Tools used. Diagram label: transactions
  22. (no text)
  23. SUMMARY
  24. Conclusions: For most cases, batch processing is good enough. Flexibility decreases when the system is built on multiple technologies. Consider independent module design with
  25. Thanks! Mariusz Rafało, mariusz.rafalo@sorigo.pl
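
The SMOTE slides (13 to 15) show a minority observation xk, its neighbours n1 … n6, and synthetic points s1 … s6 placed between them. A minimal sketch of that interpolation idea follows. It is not the code used in the talk: the function name `smote_sketch`, the default of six neighbours (chosen only to mirror the figure), the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def smote_sketch(minority, n_synthetic, k=6, seed=0):
    """Create synthetic minority rows by interpolating towards nearest neighbours.

    Hypothetical helper for illustration; k=6 only mirrors the slide figure.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # a point cannot have more neighbours than other points

    # Pairwise Euclidean distances; a point is never its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest

    # Pick a base point x_k and one of its neighbours n_j per synthetic row.
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # s = x_k + gap * (n_j - x_k), with gap drawn uniformly from [0, 1).
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])


# Toy data standing in for the fraud class (not the dataset from the talk).
toy_minority = np.random.default_rng(1).normal(size=(20, 3))
synthetic = smote_sketch(toy_minority, n_synthetic=50)
print(synthetic.shape)
```

Because every synthetic row lies on a segment between two real minority rows, the new points stay inside the region already covered by the minority class. For real use, a maintained implementation such as the one in the imbalanced-learn package is usually a better choice than a hand-rolled version.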

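
The accuracy, sensitivity, and specificity on the summary slide (slide 19) can be recomputed from the two confusion matrices shown there. The short sketch below does that arithmetic, assuming the usual reading of the matrices (rows are predictions, columns are reference labels, class 1 is fraud). The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity, and specificity for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of reference frauds that were caught
        "specificity": tn / (tn + fp),    # share of reference non-frauds left alone
    }


# Counts copied from slide 19: Predict 1 row is (TP, FP), Predict 0 row is (FN, TN).
decision_tree = confusion_metrics(tp=839, fp=43, fn=106, tn=902)
k_means = confusion_metrics(tp=815, fp=74, fn=130, tn=871)

for name, metrics in [("decision tree", decision_tree), ("k-means", k_means)]:
    print(name, {key: round(value, 4) for key, value in metrics.items()})
# decision tree: accuracy 0.9212, sensitivity 0.8878, specificity 0.9545
# k-means:       accuracy 0.8921, sensitivity 0.8624, specificity 0.9217
```

Both matrices contain 945 reference cases per class, so the evaluation set is balanced, unlike the raw data with 0.17% frauds.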