David Talby, SVP Engineering, Atigeo at MLconf ATL - 9/18/15

Fraud detection is a classic adversarial analytics challenge: as soon as an automated system learns to stop one scheme, fraudsters move on to another line of attack. Each scheme requires looking for different signals (i.e., features), each is relatively rare (one in millions in finance or e-commerce, for example), and a single case may take months to investigate (in healthcare or tax, for example), which makes quality training data scarce.
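To make the rarity point concrete, here is a small arithmetic sketch. It is not from the talk: the counts are illustrative (one fraudulent message per million) and the helper `accuracy_and_recall` is a hypothetical name. It shows why plain accuracy says little at this base rate.

```python
from fractions import Fraction

def accuracy_and_recall(total, fraud, true_pos, false_pos):
    """Return (accuracy, recall) as exact fractions for a binary fraud detector."""
    true_neg = total - fraud - false_pos
    accuracy = Fraction(true_pos + true_neg, total)
    recall = Fraction(true_pos, fraud) if fraud else Fraction(0)
    return accuracy, recall

# Illustrative numbers only: one fraudulent message per million.
TOTAL, FRAUD = 1_000_000, 1

# A "model" that never flags anything: near-perfect accuracy, zero recall.
never_flag = accuracy_and_recall(TOTAL, FRAUD, true_pos=0, false_pos=0)

# A model that catches the fraud but raises 100 false alarms:
# lower accuracy, yet it is the only one of the two that finds the fraud.
noisy = accuracy_and_recall(TOTAL, FRAUD, true_pos=1, false_pos=100)

print("never flag:", float(never_flag[0]), float(never_flag[1]))
print("noisy     :", float(noisy[0]), float(noisy[1]))
```

Under these assumed counts, the detector that does nothing scores higher on accuracy than the one that actually catches the case, so recall and false-alarm volume are the more useful quantities to track.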

This talk will cover, via live demo and code walk-through, the key lessons we’ve learned while building such real-world software systems over the past few years. We’ll incrementally build a hybrid machine-learned model for fraud detection, combining features from natural language processing, topic modeling, time series analysis, link analysis, heuristic rules, and anomaly detection. We’ll be looking for fraud signals in public email datasets, using Python and popular open-source data science libraries, with Apache Spark as the compute engine for scalable parallel processing.
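As a rough illustration of what "hybrid" can mean here, the sketch below merges features from several kinds of analysis into one feature dictionary per message. It is not the speaker's code. Every function name, the keyword list, and the sample message are hypothetical, and each extractor is a deliberately simple stand-in (keyword ratio for NLP, a z-score for time series, a first-contact flag for link analysis, one hand-written rule).

```python
from collections import Counter
from statistics import mean, pstdev

SUSPICIOUS_TERMS = {"urgent", "wire", "offshore"}  # hypothetical keyword list

def text_features(body):
    # Stand-in for NLP: share of words that appear in the keyword list.
    words = [w.strip(".,:;!?") for w in body.lower().split()]
    hits = sum(1 for w in words if w in SUSPICIOUS_TERMS)
    return {"suspicious_term_ratio": hits / len(words) if words else 0.0}

def timing_features(send_hour, usual_hours):
    # Stand-in for time series analysis: z-score of the send hour against
    # the sender's history (ignores the midnight wrap-around of hours).
    if len(usual_hours) < 2:
        return {"hour_zscore": 0.0}
    spread = pstdev(usual_hours)
    z = abs(send_hour - mean(usual_hours)) / spread if spread else 0.0
    return {"hour_zscore": z}

def link_features(sender, recipient, past_pairs):
    # Stand-in for link analysis: has this pair ever exchanged mail before?
    return {"new_contact": 1.0 if past_pairs[(sender, recipient)] == 0 else 0.0}

def rule_features(body):
    # Stand-in for a heuristic rule.
    return {"mentions_password": 1.0 if "password" in body.lower() else 0.0}

def hybrid_features(message, usual_hours, past_pairs):
    """Merge all feature groups into one dictionary for a downstream classifier."""
    features = {}
    features.update(text_features(message["body"]))
    features.update(timing_features(message["hour"], usual_hours))
    features.update(link_features(message["sender"], message["recipient"], past_pairs))
    features.update(rule_features(message["body"]))
    return features

# Made-up example message and history.
message = {
    "sender": "a@example.com",
    "recipient": "z@example.com",
    "hour": 3,
    "body": "Urgent: wire the funds offshore today",
}
usual_hours = [9, 10, 11, 10, 9]
past_pairs = Counter({("a@example.com", "b@example.com"): 12})

features = hybrid_features(message, usual_hours, past_pairs)
print(features)
```

Keeping each feature group in its own function means a new signal can be added for a new scheme without touching the others. In a real system the merged dictionary would be vectorized and passed to a trained classifier.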

  1. Online fraud detection: A reference architecture for adversarial learning. David Talby (@davidtalby), SVP Engineering, Atigeo
  2. Why Semi-Supervised Learning & Feedback? 50+ schemes (and counting); 99.9999% ‘good’ messages; 6+ months per case
  3. Why Hybrid Analytics? Ignore more rules; Unusual timing of events; Unusual personal network; Teamwork & scale; Think & talk differently
  4. Why Hybrid Analytics? Rule Inference; Time Series Analysis; Link Analysis; Ensemble Learning; Natural Language
  5. Stream processing: Kafka; Email Stream; Account transactions Stream; Email NLP Features; People graph; Transactions time series
  6. User Analysis Iteration: Email NLP Features; User graph; Transactions time series; Graph Features; Time Series Features; NLP Features; Agent Feedback; Train/Test; Classifier
  7. Thank you! Source code available on request: David.Talby@atigeo.com, @davidtalby
  8. © 2015 Atigeo, Corporation. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.