
Deep Learning for Fraud Detection


  1. © 2014 MapR Technologies
  2. Contact Information – Ted Dunning, Chief Applications Architect at MapR Technologies; Committer & PMC member for Apache Drill, ZooKeeper, and others; VP of Incubator at the Apache Software Foundation. Email: tdunning@apache.org, tdunning@maprtech.com. Twitter: @ted_dunning
  4. Goals for Today • Explore the state of the art for deep learning and fraud detection • Separate at least some of the wheat from the chaff • Provide some realistic guidance for getting results • Play with cool stuff!
  5. Agenda • Motivation • What are neural networks and deep learning? • It can be simpler than you think • But, no free lunch / you get what you pay for / other clever aphorism • Some experiments • Where to go from here
  6. Motivation for Advanced Modeling in Fraud • Neural networks have completely dominated credit-card fraud detection since the late '80s – Random forests and tree ensembles are often used in other kinds of fraud and churn models • The reason is that rule-based systems simply don't work – Well, they do work at first – Fraudsters change tactics, you add rules, interaction mayhem ensues • And learning algorithms really do work – Fraudsters change tactics, you add features and retrain
  9. So learning is good. But good learning is hard. And finding good features is really hard.
  13. Some Sample Features • Charge size relative to previous averages for the card • Charge size relative to previous average for the merchant • Known merchant or not • Doubled transaction • Address Verification System (AVS) or Card Verification Value (CVV2) mismatch • Unusual region for the card • Unusual time-of-day relative to history • Magstripe use when a chip is available • (hundreds more)
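The first feature in the list can be made concrete with a minimal sketch; `charge_ratio_features` and the toy transactions are illustrative names, not code from the talk:

```python
from collections import defaultdict

def charge_ratio_features(transactions):
    """For each (card, amount) transaction, emit the charge size
    relative to the running average for that card (1.0 for the
    first charge, when there is no history yet)."""
    history = defaultdict(list)
    features = []
    for card, amount in transactions:
        past = history[card]
        ratio = amount / (sum(past) / len(past)) if past else 1.0
        features.append(ratio)
        past.append(amount)
    return features

txns = [("A", 20.0), ("A", 20.0), ("A", 400.0)]
print(charge_ratio_features(txns))  # [1.0, 1.0, 20.0] -- the third charge is 20x the card's average
```

The same pattern, keyed by merchant instead of card, gives the second feature.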
  14. Sequence-Based Features • Plausible pattern matching (rent a car, pay for gas at the airport) • Probe transactions (gas in the wrong place, pizza, then a big charge) • Previous transaction at a compromised merchant • Card velocity
  15. Key Problems • Good guys need data … that means fraudsters get the first at-bat • Good guys are careful and test systems before releasing • Bad guys have many low-risk transactions and can change methods quickly • In some areas, fraudsters adapt their techniques within hours
  16. Making up features is easy. Finding features that add real lift is very hard.
  17. What are neural networks and deep learning? • Start simple … imagine we have 20 features, each 0 or 1 – Let's yell "Fraud!" if any of the features is a 1 – Houston, we have a model • But this model isn't any better than a rule • It also doesn't have any interesting Greek letters
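That starting-point model really is a one-liner; a sketch (the function name is made up):

```python
def any_flag_model(features):
    """'Yell fraud if any feature is 1' -- an OR rule, which is the
    same as a perceptron with all weights 1 and a threshold of 0.5."""
    return 1 if any(features) else 0

print(any_flag_model([0, 0, 1, 0]))  # 1
print(any_flag_model([0] * 20))      # 0
```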
  18. Real-world Intrudes • We assumed all features are equally good – What if some are kind of poor or weak? • Can we weight different features more or less? – Can we learn these weights from data?
  20. Learning Works • Yes, we can learn these models • How we measure error is important • We must have good features • Even good features may need transformation – Take logs of times and monetary values – Subtract means, scale, bin values
  21. Not Good Enough • We need combinations of models • Simple linear combinations aren't subtle enough • Enter multi-level models – Can we learn a model that uses combinations of inputs – Where each of those combinations is itself a model that we learn?
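The transformations listed on that slide (take logs, subtract means, scale) might look like this minimal sketch; `standardize_log` is an illustrative name, not from the talk:

```python
import math

def standardize_log(values):
    """Take logs of positive monetary values, then subtract the
    mean and divide by the standard deviation (z-scoring)."""
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    std = math.sqrt(var) or 1.0   # guard against a constant column
    return [(x - mean) / std for x in logs]

amounts = [10.0, 100.0, 1000.0]
print([round(x, 3) for x in standardize_log(amounts)])  # [-1.225, 0.0, 1.225]
```

After this transform, a $10 and a $1000 charge sit symmetrically around the mean, which is what a weight-learning algorithm wants to see.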
  22. Yes, Virginia, There IS a Santa Claus – Each circle is a sum and a (soft) threshold; arrows are multiplication by a learned weight
  23. Errors on Output Can Propagate – Each circle sends error to each arrow; arrows weight the back-propagating errors [diagram: inputs, hidden layer]
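The sum-and-soft-threshold circles and weight-carrying arrows of the two slides above can be sketched as a toy network with hand-rolled back-propagation. Everything here (weights, learning rate, network size) is made up for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_h, w_o):
    """One hidden layer: each circle sums its weighted inputs and
    applies a soft threshold (sigmoid)."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_h]
    y = sigmoid(sum(wo * hi for wo, hi in zip(w_o, h)))
    return h, y

def train_step(x, target, w_h, w_o, lr=0.5):
    """One back-propagation step: the output error travels back
    along each arrow, weighted by that arrow's weight."""
    h, y = forward(x, w_h, w_o)
    delta_o = (y - target) * y * (1 - y)            # output-layer error
    for j, hj in enumerate(h):
        delta_h = delta_o * w_o[j] * hj * (1 - hj)  # error arriving at circle j
        w_o[j] -= lr * delta_o * hj
        for i, xi in enumerate(x):
            w_h[j][i] -= lr * delta_h * xi

w_h = [[0.5, -0.5], [0.3, 0.8]]   # input -> hidden weights
w_o = [0.7, -0.2]                 # hidden -> output weights
x, target = [1.0, 0.0], 1.0
_, before = forward(x, w_h, w_o)
for _ in range(100):
    train_step(x, target, w_h, w_o)
_, after = forward(x, w_h, w_o)
print(before < after <= 1.0)  # True: training moves the output toward the target
```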
  24. Success! Triumph! World domination!
  25. World domination! With some reservations, because features are hard
  26. Turtles All the Way Down – We Wish • This learning works well for just a few layers • This is still a big deal … – with cool features, we can build real systems • With many layers, the learning no longer converges • Well … until recently
  27. Model Learning in an Ideal World • If we could just learn the features – Maybe unsupervised, maybe supervised – And at the same time learn the model • Presumably we could build models quicker • And more easily • And we wouldn't have to dirty our minds with pedestrian domain knowledge
  28. Example 1 – A (not very) Deep Auto-encoder • Let's take an example where we can learn features • The data is EKG traces • We want to find anomalies – No supervised training
  29. Spot the Anomaly – Anomaly?
  30. Maybe not!
  31. Where's Waldo? This is the real anomaly
  32. Normal Isn't Just Normal • What we want is a model of what is normal • What doesn't fit the model is the anomaly • For simple signals, the model can be simple: x ~ m(t) + N(0, ε) • The real world is rarely so accommodating
  33. We Do Windows [animated build: a window slides across the signal]
  42. Windows on the World • The set of windowed signals is a nice model of our original signal • Clustering can find the prototypes – Fancier techniques are available using sparse coding • The result is a dictionary of shapes • New signals can be encoded by shifting, scaling and adding shapes from the dictionary
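A minimal sketch of the windowing-plus-clustering idea, using plain-Python k-means on overlapping windows of a synthetic periodic signal. All names and parameters here are illustrative; this is not the talk's EKG code (which lives in the GitHub repo cited later):

```python
import math, random

def windows(signal, width, step):
    """Slice a signal into overlapping windows."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; the centroids become the dictionary of shapes."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            buckets[nearest].append(p)
        for j, b in enumerate(buckets):
            if b:  # recenter each cluster on the mean of its members
                centroids[j] = [sum(col) / len(b) for col in zip(*b)]
    return centroids

# A periodic "heartbeat-like" signal: its windows cluster into a few shapes.
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
shapes = kmeans(windows(signal, width=20, step=5), k=4)
print(len(shapes), len(shapes[0]))  # 4 20 -- four prototype shapes, 20 samples wide
```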
  43. Most Common Shapes (for EKG)
  44. Reconstructed Signal [figure: original vs. reconstructed signal] Reconstruction error < 1 bit / sample
  45. An Anomaly – The original technique for finding a 1-d anomaly works against the reconstruction error
  46. Close-up of the Anomaly – Not what you want your heart to do. And not what the model expects it to do.
  47. A Different Kind of Anomaly
  48. Some k-means Caveats • But Eamonn Keogh says that k-means can't work on time series • That is silly … and kind of correct; k-means does have limits – Other kinds of auto-encoders are much more powerful • More fun and code demos at https://github.com/tdunning/k-means-auto-encoder and http://www.cs.ucr.edu/~eamonn/meaningless.pdf
  49. The Limits of Clustering as an Auto-encoder • Clustering is like trying to tile your sample distribution • It can be used to approximate a signal • Filling a d-dimensional region with k clusters gives error e ≈ k^(−1/d) • If d is large, this is no good
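A quick numeric illustration of that scaling (assuming the k^(−1/d) reading of the slide's formula): even with k = 1000 clusters, the relative cell size barely shrinks once d is large.

```python
# Quantization cell size scales like k**(-1/d): to halve the error you
# need 2**d times as many clusters, so clustering alone cannot cover
# high-dimensional spaces.
for d in (1, 2, 10, 100):
    err = 1000 ** (-1.0 / d)   # relative cell size with k = 1000 clusters
    print(d, round(err, 3))    # 1 0.001 / 2 0.032 / 10 0.501 / 100 0.933
```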
  50. [Figure: time-series training data (first 2,000 samples); test data vs. reconstruction error]
  51. [Figure: reconstruction error (MAV) vs. number of centroids, for training and held-out data]
  52. Moral for Auto-encoders • The simplest auto-encoders can be good models • For more complex spaces/signals, more elaborate models may be required – Winner-take-(absolutely)-all may be problematic – In particular, models that allow sparse linear combination may be better • Consider deep learning, recurrent networks, denoising
  53. How Does Clustering Do Reconstruction? [diagram: input x1 … xn → hidden layer (clusters) → reconstruction x′1 … x′n] • For normalized cluster centroids, dot-product and distance are equivalent • Winner takes all with k-means • The dot-product scales the winning centroid to reconstruct
  56. AKA – Neural Network [same diagram: input → hidden layer (clusters) → reconstruction]
  57. What If … We Had More Layers? [diagram: stacked layers A → B → A′]
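The reconstruction mechanism sketched above (dot-product selection, then scaling) can be written out directly; `reconstruct` and the two-centroid dictionary are made-up illustrations:

```python
def reconstruct(x, centroids):
    """Winner-take-all 'hidden layer': pick the centroid with the
    largest dot-product (equivalent to picking the nearest one, for
    normalized centroids), then scale it by that dot-product."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    best = max(centroids, key=lambda c: dot(x, c))
    scale = dot(x, best)
    return [scale * ci for ci in best]

# Two unit-length centroids standing in for a learned dictionary.
dictionary = [[1.0, 0.0], [0.0, 1.0]]
print(reconstruct([3.0, 0.5], dictionary))  # [3.0, 0.0]
```

The 0.5 that the winning centroid cannot express is exactly the reconstruction error that the anomaly detector watches.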
  61. Other Thoughts • What if we allow more than one cluster to be active? – k-sparse learning! • Well, almost
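k-sparse learning generalizes winner-take-all by letting the top k activations through instead of just one; a sketch of the encoding step (hypothetical function, not the talk's code):

```python
def k_sparse(activations, k):
    """Keep only the k largest activations and zero the rest --
    k = 1 recovers the winner-take-all behavior of k-means."""
    keep = sorted(range(len(activations)),
                  key=lambda i: activations[i], reverse=True)[:k]
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]

print(k_sparse([0.2, 0.9, 0.1, 0.7], k=2))  # [0.0, 0.9, 0.0, 0.7]
```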
  62. The Point of Deep Learning • It isn't just many hidden layers in a neural network • The goal is to eliminate feature engineering by learning the features as well as the classifier
  63. Experiment 3 – Card Velocity • Most features so far are inherent in the data • Few are true sequence features • Card velocity is a pure combination – The starting point can be anywhere – The issue is where the next point is relative to the starting point
  64. Card Velocity – Non-fraud steps are reasonable in terms of distance and time; fraudulent use of a card by multiple attackers results in big, fast jumps
  65. Synthetic Data Example • Generate a random point • Take four small steps • If fraud, the second step can be large • The result is five positions, each in 3-d on the surface of a sphere – Data shape is N × (5 × 3) • Add secondary features containing step size … N × 4
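That generation recipe might be sketched as follows. The step sizes (0.05 and 1.0) are made-up stand-ins, since the talk does not give its actual parameters:

```python
import math, random

def random_unit(rnd):
    """Random point on the unit sphere."""
    v = [rnd.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def walk(rnd, fraud, small=0.05, big=1.0):
    """Start anywhere, take four steps; in a fraud case the second
    step is large. Returns five 3-d positions (the N x (5 x 3) data)
    plus the four step sizes (the N x 4 secondary features)."""
    points = [random_unit(rnd)]
    for i in range(4):
        size = big if (fraud and i == 1) else small
        step = [size * rnd.gauss(0, 1) for _ in range(3)]
        moved = [p + s for p, s in zip(points[-1], step)]
        n = math.sqrt(sum(x * x for x in moved))
        points.append([x / n for x in moved])   # project back onto the sphere
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return points, steps

rnd = random.Random(42)
fraud_points, fraud_steps = walk(rnd, fraud=True)
ok_points, ok_steps = walk(rnd, fraud=False)
print(round(max(fraud_steps), 2), round(max(ok_steps), 2))
```

The fraud walk shows one big jump in its step sizes, which is exactly the feature the next slides discuss.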
  66. The Truth is Out There • With the right feature (step size), it is trivial to spot the fraud • Here we show the step size between positions • Fraud cases take a big jump that others don't • But they can be anywhere
  67. But Dimensionality Bites Hard • With the step-size feature, learning succeeds instantly with the simplest models and gets nearly perfect accuracy • Without the step-size feature, learning with TensorFlow gets modest accuracy after substantial learning cost (work in progress; could do better with lots more tuning) • The problem is that there are too many combinations of 15 variables; we need a very specific combination of three pair-wise differences combined non-linearly into a distance
  68. [Figure: AUC and precision vs. training-data size, 10⁴ to 10⁶ samples]
  69. We have a bona fide revolution. But old tricks still pay.
  70. Greenfield Problem Landscape
  71. Mature Problem Landscape
  72. Summary • There is too much to say in 40 minutes; let's talk some more at the MapR booth • Deep learning, especially with systems like TensorFlow, has huge promise • Deep learning trades feature engineering for learning-architecture engineering • There are powerful middle grounds
  74. Short Books by Ted Dunning & Ellen Friedman • Published by O'Reilly in 2014–2016 • For sale from Amazon or O'Reilly • Free e-books currently available courtesy of MapR: http://bit.ly/ebook-real-world-hadoop http://bit.ly/mapr-tsdb-ebook http://bit.ly/ebook-anomaly http://bit.ly/recommendation-ebook
  75. Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O'Reilly) • Free copies at book signing today • http://bit.ly/mapr-ebook-streams
  76. Thank You!
  77. Q&A – Engage with us! @mapr • maprtech • tdunning@maprtech.com • MapR • maprtech • mapr-technologies
