
Fraud Analytics

https://www.youtube.com/watch?v=eXtWRkfMisM

During 2012, credit card fraud reached 11.3 billion US dollars, an increase of almost 15% compared with 2011. This shows the problem that fraud represents not only for financial institutions but also for society. Traditionally, fraud prevention consisted of physically protecting the infrastructure; however, with ever more payment methods and channels, financial information has become increasingly susceptible to theft. The next option for preventing and controlling fraud is to determine whether a transaction is being made by the customer, according to their historical behavior patterns. This is the approach of Fraud Analytics.

This presentation shows how Fraud Analytics can be used to determine the probability that a transaction is or is not made by the customer, using the customers' purchase information, their interactions with the financial institution, and social network analysis. In addition, the results of the commonly used decision rules and of advanced machine learning models are discussed and compared.

Published in: Data & Analytics

Fraud Analytics

  1. Easy Solutions
  2. About us: industry recognition. A leading global provider of electronic fraud prevention for financial institutions and enterprise customers: 280+ customers in 26 countries; 75 million users protected; 22+ billion online connections monitored in the last 12 months.
  3. Some of our customers
  4. Our approach: Total Fraud Protection®
  5. Fraud Analytics. Alejandro Correa Bahnsen, PhD, Data Scientist
  6. About me • PhD in Machine Learning at the University of Luxembourg • Data Scientist at Easy Solutions • Worked for 8+ years as a data scientist at GE Money, Scotiabank and SIX Financial Services • Bachelor's and Master's in Industrial Engineering • Organizer of Data Science Luxembourg and, more recently, of Big Data Science Bogota
  7. Does fraud affect me? ~1 billion USD; ~171 million USD; ~3 billion USD
  8. Europe fraud evolution, card-not-present (Internet) transactions, 2007 to 2012 [chart; vertical axis in € from 0 to 800]
  9. US fraud evolution, card-not-present (Internet) transactions, 2001 to 2012 [chart; vertical axis in $ from 0 to 4,000]
  10. Card-present vs. card-not-present fraud rates, 2006 to 2011: card not present 1.10%, 1.30%, 1.10%, 0.90%, 0.88%, 0.87%; card present 0.09%, 0.08%, 0.08%, 0.06%, 0.05%, 0.05%. US online banking, billions of transactions, 2009 to 2013: 23.3, 26.8, 30.0, 33.3, 35.0. US mobile banking, billions of transactions, 2009 to 2013: 1.2, 3.0, 5.6, 9.4, 14.0.
  11. There is a need for better fraud detection strategies
  13. Big data?
  14. “War is ninety percent information.” (Napoleon Bonaparte)
  17. Big data (data science) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...
  19. Big data analytics
  20. Big data analytics is the use of machine learning and artificial intelligence methods and tools with the objective of making data-driven decisions
  21. Fraud detection and prevention
  22. Credit card fraud detection: estimate the probability of a transaction being fraudulent by analyzing customer patterns and recent fraudulent behavior. Issues when constructing a fraud detection system: • Skewness of the data • Cost sensitivity • Short response time of the system • Dimensionality of the search space • Feature preprocessing • Model selection
  23. Network fraud?
  24. Data • Large European card-processing company • Card-present transactions from 2012 and 2013 • 20MM transactions • 40,000 frauds • 0.467% fraud rate • ~2MM EUR lost due to fraud on the test dataset • [Timeline from January to December, split into training and test periods]
  25. If-Then rules (expert rules) • “Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions.” • Examples of rules • More than 4 ATM transactions in one hour? • More than 2 transactions in 5 minutes? • A magnetic stripe transaction followed by an internet transaction?
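The three example rules can be expressed as simple checks over a card's recent history. The sketch below is one possible interpretation, not the rules engine used in the presentation: the `Trx` record, its field names and its category values are hypothetical, the time windows count the current transaction, and "magnetic stripe then internet" is read as an internet transaction immediately preceded by a magnetic-stripe one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Trx:
    time: datetime
    trx_type: str    # hypothetical values: "ATM", "POS", "Internet"
    entry_mode: str  # hypothetical values: "magstripe", "chip", "online"


def recent(history: List[Trx], now: datetime, window: timedelta,
           trx_type: Optional[str] = None) -> List[Trx]:
    """Earlier transactions with time in (now - window, now], optionally of one type."""
    return [t for t in history
            if now - window < t.time <= now
            and (trx_type is None or t.trx_type == trx_type)]


def fired_rules(history: List[Trx], trx: Trx) -> List[str]:
    """Names of the example expert rules triggered by `trx`.

    `history` holds the same card's earlier transactions in time order, excluding `trx`.
    """
    fired = []
    # Rule 1: more than 4 ATM transactions in one hour, counting the current one.
    n_atm = len(recent(history, trx.time, timedelta(hours=1), "ATM")) + int(trx.trx_type == "ATM")
    if n_atm > 4:
        fired.append("more_than_4_atm_in_1h")
    # Rule 2: more than 2 transactions in 5 minutes, counting the current one.
    if len(recent(history, trx.time, timedelta(minutes=5))) + 1 > 2:
        fired.append("more_than_2_trx_in_5min")
    # Rule 3: a magnetic stripe transaction immediately followed by an internet one.
    if history and history[-1].entry_mode == "magstripe" and trx.trx_type == "Internet":
        fired.append("magstripe_then_internet")
    return fired


t0 = datetime(2013, 1, 1, 12, 0)
atm_history = [Trx(t0 + timedelta(minutes=10 * k), "ATM", "magstripe") for k in range(4)]
print(fired_rules(atm_history, Trx(t0 + timedelta(minutes=35), "ATM", "magstripe")))
# ['more_than_4_atm_in_1h']
```

Rules like these are cheap and transparent, but each threshold is fixed by hand, which is one reason they are compared against learned models later in the deck.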
  26. If-Then rules (expert rules), results: misclassification 1.04%, recall 31%, precision 17%, F1-score 22%
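All four figures on this slide come from the confusion matrix (true/false positives and negatives). The sketch below computes them from counts, assuming "misclassification" means the share of all transactions classified incorrectly; the function names are illustrative. As a sanity check, the harmonic mean of the rounded precision (17%) and recall (31%) is about 22%, which matches the reported F1-score.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Misclassification rate, recall, precision and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"misclassification": (fp + fn) / total, "recall": recall,
            "precision": precision, "f1": f1}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the rounded slide figures (precision 17%, recall 31%).
print(round(f1_score(0.17, 0.31), 2))  # 0.22
```

With a 0.467% fraud rate, a low misclassification rate says little on its own: flagging nothing would already misclassify under 0.5% of transactions, which is why recall, precision and F1 are reported alongside it.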
  27. Financial evaluation: credit card fraud detection is a cost-sensitive problem, because the cost of a false positive differs from the cost of a false negative. • False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost. • False negatives: when a fraud goes undetected, the amount of that transaction is lost. Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, as transaction amounts vary considerably.
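  28. Financial evaluation: cost matrix. The total cost of a classifier $f$ on a set $S$ of $N$ transactions is
      $$\mathrm{Cost}(f(S)) = \sum_{i=1}^{N} \Big[ y_i \big( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big]$$
      where $y_i$ is the actual class and $c_i$ the predicted class of transaction $i$, and the costs come from the cost matrix:
      • Predicted positive ($c_i = 1$): $C_{TP_i} = C_a$ when actually positive ($y_i = 1$); $C_{FP_i} = C_a$ when actually negative ($y_i = 0$)
      • Predicted negative ($c_i = 0$): $C_{FN_i} = Amt_i$ when actually positive ($y_i = 1$); $C_{TN_i} = 0$ when actually negative ($y_i = 0$)
      Here $C_a$ is the administrative cost and $Amt_i$ is the amount of transaction $i$.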
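The cost formula can be evaluated directly. Below is a minimal sketch, assuming binary lists of actual labels `y` and predictions `c`, the transaction amounts, and a single administrative cost $C_a$ shared by all flagged transactions; the value 5 in the example is hypothetical, since the slides do not state $C_a$.

```python
from typing import Sequence


def total_cost(y: Sequence[int], c: Sequence[int], amounts: Sequence[float],
               admin_cost: float) -> float:
    """Cost(f(S)) from the cost matrix: C_TP = C_FP = C_a, C_FN = Amt_i, C_TN = 0.

    y[i] is 1 if transaction i is actually fraud; c[i] is 1 if it was flagged.
    """
    cost = 0.0
    for y_i, c_i, amt_i in zip(y, c, amounts):
        c_tp = c_fp = admin_cost  # flagging costs C_a whether or not it is fraud
        c_fn, c_tn = amt_i, 0.0   # a missed fraud loses its amount; a correct pass costs nothing
        cost += y_i * (c_i * c_tp + (1 - c_i) * c_fn) + (1 - y_i) * (c_i * c_fp + (1 - c_i) * c_tn)
    return cost


# One TP, one FN (amount 250), one FP and one TN, with a hypothetical C_a of 5.
print(total_cost([1, 1, 0, 0], [1, 0, 1, 0], [100.0, 250.0, 40.0, 60.0], admin_cost=5.0))  # 260.0
```

Because the false-negative cost is the transaction amount, two models with identical recall can have very different costs depending on which frauds they miss.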
  29. If-Then rules (expert rules), financial results: cost 1.24 €, total losses 1.94 €; misclassification 1.04%, recall 31%, precision 17%, F1-score 22%
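The slide reports cost and total losses side by side, and a later comparison uses "% savings" without defining it. Under the cost matrix, flagging nothing (every $c_i = 0$) costs exactly the sum of the fraudulent amounts, i.e. the total losses. One plausible reading, assumed here and not confirmed by the slides, is that savings measure the share of that baseline a strategy avoids, $(\mathrm{Cost}_{base} - \mathrm{Cost}) / \mathrm{Cost}_{base}$. With the expert-rule figures this gives roughly 36%.

```python
from typing import Sequence


def cost_flag_nothing(y: Sequence[int], amounts: Sequence[float]) -> float:
    """Baseline cost when no transaction is flagged: every fraudulent amount is lost."""
    return sum(amt for y_i, amt in zip(y, amounts) if y_i == 1)


def savings(cost: float, baseline_cost: float) -> float:
    """Share of the baseline cost avoided by a strategy (assumed definition)."""
    return (baseline_cost - cost) / baseline_cost


# Expert rules on the test set: cost 1.24, total losses 1.94 (same units as the slide).
print(round(savings(1.24, 1.94), 3))  # 0.361
```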
  30. Fraud Analytics
  31. Features: raw features • Transaction ID: transaction identification number • Time: date and time of the transaction • Account number: identification number of the customer • Card number: identification of the credit card • Transaction type: e.g., Internet, ATM, POS, ... • Entry mode: e.g., chip and PIN, magnetic stripe, ... • Amount: amount of the transaction in euros • Merchant code: identification of the merchant type • Merchant group: merchant group identification • Country: country of the transaction • Country 2: country of residence • Type of card: e.g., Visa debit, Mastercard, American Express, ... • Gender: gender of the card holder • Age: card holder age • Bank: issuer bank of the card
  32. Features: transaction aggregation strategy. Raw features (TrxId to Amt) and the aggregated features derived from them:
      TrxId | Time      | Type | Country | Amt | No. trx last 24h | Amt last 24h | No. trx last 24h, same type and country | Amt last 24h, same type and country
      1     | 1/1 18:20 | POS  | Lux     | 250 | 0 | 0   | 0 | 0
      2     | 1/1 20:35 | POS  | Lux     | 400 | 1 | 250 | 1 | 250
      3     | 1/1 22:30 | ATM  | Lux     | 250 | 2 | 650 | 0 | 0
      4     | 2/1 00:50 | POS  | Ger     | 50  | 3 | 900 | 0 | 0
      5     | 2/1 19:18 | POS  | Ger     | 100 | 3 | 700 | 1 | 50
      6     | 2/1 23:45 | POS  | Ger     | 150 | 2 | 150 | 2 | 150
      7     | 3/1 06:00 | POS  | Lux     | 10  | 3 | 400 | 0 | 0
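The aggregation strategy amounts to a look-back window over each card's earlier transactions. The sketch below is quadratic and only meant to make the definition concrete; the `Transaction` record and its field names are hypothetical, the window is taken as [t - 24h, t), and the year 2013 in the timestamps is an assumption (the slide gives only day/month).

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Transaction(NamedTuple):
    trx_id: int
    time: datetime
    trx_type: str
    country: str
    amount: int


def aggregate(trxs: List[Transaction],
              window: timedelta = timedelta(hours=24)) -> List[Tuple[int, int, int, int]]:
    """Per transaction: (no. trx in window, amount in window,
    no. trx in window with same type and country, amount of those).

    Only strictly earlier transactions of the same card inside the window count.
    """
    features = []
    for cur in trxs:
        prev = [p for p in trxs if cur.time - window <= p.time < cur.time]
        same = [p for p in prev if (p.trx_type, p.country) == (cur.trx_type, cur.country)]
        features.append((len(prev), sum(p.amount for p in prev),
                         len(same), sum(p.amount for p in same)))
    return features


def ts(day: int, hour: int, minute: int) -> datetime:
    return datetime(2013, 1, day, hour, minute)  # year assumed; slide shows day/month only


card = [
    Transaction(1, ts(1, 18, 20), "POS", "Lux", 250),
    Transaction(2, ts(1, 20, 35), "POS", "Lux", 400),
    Transaction(3, ts(1, 22, 30), "ATM", "Lux", 250),
    Transaction(4, ts(2, 0, 50), "POS", "Ger", 50),
    Transaction(5, ts(2, 19, 18), "POS", "Ger", 100),
    Transaction(6, ts(2, 23, 45), "POS", "Ger", 150),
    Transaction(7, ts(3, 6, 0), "POS", "Lux", 10),
]
for trx, feats in zip(card, aggregate(card)):
    print(trx.trx_id, feats)
```

For transactions 1 to 6 this reproduces the slide's aggregated values. For transaction 7, a strict 24-hour window contains only transactions 5 and 6, so the sketch gives 2 and 250 rather than the 3 and 400 shown; the slide may use a different window definition. On millions of transactions one would sort each card's history by time and slide the window with two pointers instead of rescanning.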
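  33. Periodic features: when is a customer expected to make a new transaction? Consider a von Mises distribution with a period of 24 hours such that
      $$P(\mathit{time}) \sim \mathrm{vonmises}(\mu, \sigma) = \frac{e^{\sigma \cos(\mathit{time} - \mu)}}{2\pi I_0(\sigma)}$$
      where $\mathit{time}$ is the time of day mapped onto the 24-hour circle as an angle, $\mu$ is the mean, $\sigma$ is the concentration parameter (it behaves like an inverse variance rather than a standard deviation: the larger $\sigma$, the more tightly transactions cluster around $\mu$), and $I_0$ is the modified Bessel function of the first kind of order 0.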
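A minimal sketch of fitting this periodic feature to one customer's transaction times, assuming times are given in hours of the day. The mean is the circular mean, so 23:00 and 01:00 average to midnight rather than noon. The concentration ($\sigma$ in the formula above) is estimated with the Best-Fisher piecewise approximation, a standard choice swapped in here because the slides do not say how it was estimated; $I_0$ is summed from its power series to avoid external dependencies. All function names are illustrative.

```python
import math
from typing import Sequence, Tuple

PERIOD_HOURS = 24.0


def to_angle(hour: float) -> float:
    """Map a time of day in hours onto the 24-hour circle, in radians."""
    return 2 * math.pi * (hour % PERIOD_HOURS) / PERIOD_HOURS


def bessel_i0(x: float) -> float:
    """Modified Bessel function of the first kind, order 0, summed from its power series."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2) ** 2 / (k * k)
        total += term
    return total


def fit_von_mises(hours: Sequence[float]) -> Tuple[float, float]:
    """Estimate (mu in hours, concentration) from a customer's transaction times.

    The concentration uses the Best-Fisher approximation from the mean resultant
    length R; it is undefined when all times are identical (R == 1).
    """
    angles = [to_angle(h) for h in hours]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    r = math.hypot(c, s)
    if r < 0.53:
        kappa = 2 * r + r ** 3 + 5 * r ** 5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
    mu_hours = math.atan2(s, c) * PERIOD_HOURS / (2 * math.pi) % PERIOD_HOURS
    return mu_hours, kappa


def von_mises_pdf(hour: float, mu_hours: float, kappa: float) -> float:
    """Density (per radian) of the fitted distribution at a given time of day."""
    angle_diff = to_angle(hour) - to_angle(mu_hours)
    return math.exp(kappa * math.cos(angle_diff)) / (2 * math.pi * bessel_i0(kappa))


# Hypothetical customer who usually pays in the evening, sometimes just after midnight.
mu, kappa = fit_von_mises([19.0, 20.5, 21.0, 22.0, 23.5, 0.5])
print(round(mu, 2), round(kappa, 2))
print(von_mises_pdf(21.0, mu, kappa) > von_mises_pdf(9.0, mu, kappa))  # True
```

A new transaction can then be scored by how likely its time of day is under the customer's fitted distribution, which gives the model a feature such as "this purchase happens at an unusual hour for this customer".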
  34. Periodic features [figure]
  35. Fraud Analytics is the use of statistical and mathematical techniques (machine learning) to discover patterns in data in order to make predictions
  36. [Figure: amount of the transaction vs. number of transactions in the last day, normal transactions vs. frauds]
  37. [Figure: amount of the transaction vs. number of transactions in the last day, normal transactions vs. frauds]
  38. [Figure: amount of the transaction vs. number of transactions in the last day vs. number of ATM transactions in the last week, normal transactions vs. frauds]
  39. Fraud Analytics algorithms • Fuzzy rules • Neural nets • Naive Bayes • Random forests • Cost-sensitive random patches • Decision trees
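The deck lists cost-sensitive random patches without detailing it. A simpler, well-known way to make any probabilistic model cost-sensitive is a Bayes minimum risk decision: flag a transaction when the expected cost of letting it pass exceeds the expected cost of flagging it. With the cost matrix from the financial evaluation ($C_{TP_i} = C_{FP_i} = C_a$, $C_{FN_i} = Amt_i$, $C_{TN_i} = 0$) this reduces to flagging whenever $p_i \cdot Amt_i \ge C_a$. The sketch below is not the cost-sensitive random patches algorithm; it assumes calibrated fraud probabilities from some model (e.g. a random forest), and the value of $C_a$ is hypothetical.

```python
from typing import List, Sequence


def bayes_minimum_risk(probs: Sequence[float], amounts: Sequence[float],
                       admin_cost: float) -> List[int]:
    """Flag (1) or pass (0) each transaction by comparing expected costs under the
    cost matrix: C_TP = C_FP = C_a, C_FN = Amt_i, C_TN = 0.

    probs[i] is a model's estimated probability that transaction i is fraud.
    """
    decisions = []
    for p, amt in zip(probs, amounts):
        expected_flag = p * admin_cost + (1 - p) * admin_cost  # always C_a
        expected_pass = p * amt + (1 - p) * 0.0                # expected loss if not flagged
        decisions.append(1 if expected_flag <= expected_pass else 0)
    return decisions


# Hypothetical C_a = 10: a low-probability but large transaction is flagged,
# while a likely-fraudulent but tiny one is not.
print(bayes_minimum_risk([0.1, 0.1, 0.6], [20.0, 500.0, 5.0], admin_cost=10.0))  # [0, 1, 0]
```

The threshold on the fraud probability thus varies per transaction ($C_a / Amt_i$) instead of being a single fixed cut-off, which is exactly the effect the financial evaluation argues for.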
  40. [Chart: % savings and % frauds for Expert Rules, Fuzzy Rules, Neural Nets, Naïve Bayes, Random Forests and CS Random Patches; vertical axis from 0% to 100%]
  41. Conclusions • Fraud Analytics (ML) models are significantly better than expert rules • Models should be evaluated taking into account the real financial costs of the application • Algorithms should be developed to incorporate those financial costs
  42. Questions? Alejandro Correa Bahnsen, PhD, Data Scientist, acorrea@Easysol.net
