Machine Learning logistics

The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects, so continuous integration takes on new meaning.

  1. Machine Learning Model Management (© 2017 MapR Technologies)
  2. Contact Information: Ted Dunning, PhD, Chief Application Architect, MapR Technologies; Committer, PMC member, board member, ASF; O’Reilly author. Email: tdunning@mapr.com, tdunning@apache.org. Twitter: @Ted_Dunning
  3. Machine Learning Everywhere. Image courtesy Mtell, used with permission. Images © Ellen Friedman.
  4. Traditional View
  5. Traditional View: This isn’t the whole story
  6. 90% of the effort in successful machine learning isn’t in the training or model dev… It’s the logistics
  7. Why?
      • Just getting the training data is hard
        – Which data? How to make it accessible? Multiple sources!
        – New kinds of observations force restarts
        – Requires a ton of domain knowledge
      • The myth of the unitary model
        – You can’t train just one
        – You will have dozens of models, likely hundreds or more
        – Handoff to new versions is tricky
        – You have to see models at run time to be sure which is better
  8. What Machine Learning Tool Is Best?
      • Most successful groups keep several “favorite” machine learning tools at hand
        – No single tool is best in every situation
      • The most important tool is a platform that supports logistics well
        – You don’t have to do everything at the application level
        – Lots of what matters can be handled at the platform level
      • A good design for the logistics can make a big difference
  9. Some Gotchas
      • Ops-oriented people will not “get it” regarding modeling subtleties
      • Data scientists will not “get it” regarding operational realities
      • Therefore, modelers have to deliver self-contained models
      • And ops has to provide pre-wired structure
  10. Rendezvous Architecture (diagram: request, Input, Model 1, Model 2, Model 3, Scores, Rendezvous, Results, response)
  11. Rendezvous to the Rescue: Better ML Logistics
      • Stream-1st architecture is a powerful approach with surprisingly widespread advantages
        – Innovative technologies are emerging for streaming data
      • A microservices approach provides flexibility
        – Streaming supports microservices (if done right)
      • Containers remove surprises
        – Predictable environment for running models
  12. Rendezvous: Mainly for Decisioning Engines
      • Decisioning models
        – Looking for a “right answer”
        – Simpler than reinforcement learning
      • Examples include:
        – Fraud detection
        – Predictive analytics / market prediction
        – Churn prediction (as in telecommunications)
        – Yield optimization
        – Deep learning in the form of speech or image recognition, in some cases
  13. Why Stream? (image: Munich surfing wave, © 2017 Ellen Friedman)
  14. Stream-1st Architecture: Basis for Microservices
      • Stream instead of database as the shared “truth”
      (diagram: POS 1..n, Fraud detector, Last card use, Updater, Card analytics, Other card activity. Image © 2016 Ted Dunning & Ellen Friedman, from Chap. 6 of the O’Reilly book Streaming Architecture, used with permission)
  15. Streaming Isolates Services (diagram: Data source, stream, Consumer)
  16. With MapR, Geo-Distributed Data Appears Local (diagram: Data source, stream, stream, Consumer)
  17. With MapR, Geo-Distributed Data Appears Local (diagram: Data source, stream, stream, Consumer, spanning a Global Data Center and a Regional Data Center)
  18. Features of Good Streaming
      • It is Persistent
        – Messages stick around for other consumers
        – Consumers don’t affect producers
        – A consumer doesn’t have to be online when a message arrives
      • It is Performant
        – You don’t have to worry if a stream can keep up
      • It is Pervasive
        – It is there whenever you need it, no need to deploy anything
        – How much work is it to create a new file? Why should a new stream be harder?
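The persistence properties in slide 18 can be illustrated with a toy, in-memory, append-only log. This is a minimal sketch, not MapR Streams or Kafka: the `Stream` and `Consumer` classes are hypothetical stand-ins. Each consumer keeps its own read offset, so reading never removes messages, a slow consumer never blocks the producer, and a consumer that starts late still sees everything.

```python
from typing import Any


class Stream:
    """Toy append-only log (hypothetical stand-in for a real streaming system)."""

    def __init__(self) -> None:
        self._messages: list[Any] = []

    def send(self, message: Any) -> int:
        # Producers only append; they never wait on, or even know about, consumers.
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset: int, max_count: int = 100) -> list[Any]:
        # Reading is non-destructive: messages stay around for other consumers.
        return self._messages[offset:offset + max_count]


class Consumer:
    """Each consumer tracks its own position, independent of every other consumer."""

    def __init__(self, stream: Stream, start_offset: int = 0) -> None:
        self.stream = stream
        self.offset = start_offset

    def poll(self, max_count: int = 100) -> list[Any]:
        batch = self.stream.read(self.offset, max_count)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    events = Stream()
    early = Consumer(events)
    for amount in (12, 250, 7):
        events.send({"amount": amount})
    print(early.poll())  # all three messages
    late = Consumer(events)  # was "offline" when the messages arrived
    print(late.poll())   # still sees all three messages
    print(early.poll())  # nothing new for the early consumer
```

Keeping the offset in the consumer rather than in the stream is what lets new services (a decoy, a canary, a fresh model) attach later without any change to the producer.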
  19. Stream transport supports microservices
  20. But we talked about decision engines?!?
  21. What We Ultimately Want (diagram: request, Model, response)
  22. But This Isn’t the Answer (diagram: request, Load balancer, Model 1, Model 2, Model 3, response)
  23. First Try with Streams (diagram: request, Input, Model 1, Model 2, Model 3, “?”, response)
  24. First Rendezvous (diagram: request, Input, Model 1, Model 2, Model 3, Scores, Rendezvous, Results, response)
  25. Some Key Points
      • Note that all models see identical inputs
      • All models run in the production setting
      • All models send scores to the same stream
      • The rendezvous server decides which scores to ignore
      • Roll forward, roll back, and correlated comparison are all now trivial
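The rendezvous idea in slides 24 and 25 can be sketched as a small class. This is a simplified, single-threaded illustration built on assumptions of our own: scores arrive tagged with a request id and a model name, a hypothetical `preference` list says which models' answers to trust and in what order, and `decide` is called when a request's deadline expires. A real rendezvous server would consume the scores stream and enforce deadlines with timers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rendezvous:
    # Hypothetical configuration: model names, most trusted first.
    preference: list[str]
    # request_id -> {model_name: score}, filled from the shared scores stream.
    pending: dict[str, dict[str, float]] = field(default_factory=dict)

    def on_score(self, request_id: str, model: str, score: float) -> None:
        # Every model scores every request; all scores land here.
        self.pending.setdefault(request_id, {})[model] = score

    def can_answer_early(self, request_id: str) -> bool:
        # If the most preferred model has already answered, there is no need to wait.
        answered = self.pending.get(request_id, {})
        return bool(self.preference) and self.preference[0] in answered

    def decide(self, request_id: str) -> Optional[tuple[str, float]]:
        # Called at the deadline: use the best-ranked model that answered
        # and ignore the rest.
        scores = self.pending.pop(request_id, {})
        for model in self.preference:
            if model in scores:
                return model, scores[model]
        return None  # no trusted model answered in time


if __name__ == "__main__":
    rv = Rendezvous(preference=["m2", "m1"])
    rv.on_score("req-1", "m1", 0.30)
    rv.on_score("req-1", "m3", 0.91)
    print(rv.decide("req-1"))  # m2 never answered and m3 is ignored, so m1 wins
    rv.preference = ["m3", "m2", "m1"]  # hot hand-off: stop ignoring m3
```

In this sketch, rolling forward or back is just an edit to `preference`. Because ignored models keep scoring the same inputs, their scores stay available for side-by-side comparison.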
  26. Reality Check, Injecting External State (diagram: The world, request, Raw, Add external data, Database, Input, Model 1, Model 2, Model 3)
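Slide 26 adds external state (for example, a database lookup) once, between the raw stream and the input stream, rather than inside each model. Here is a minimal sketch, assuming records are dicts with a hypothetical `card_id` key and the database is anything callable as a lookup. The lookup result is copied into the message, so every model sees the same value and a later database update cannot leak back into what was recorded.

```python
import copy
from typing import Any, Callable, Iterable


def add_external_data(raw: Iterable[dict], lookup: Callable[[str], Any]) -> list[dict]:
    """Enrich raw requests with external state, freezing it at enrichment time."""
    enriched = []
    for record in raw:
        message = dict(record)
        # Deep-copy so later changes in the database do not alter this message.
        message["external"] = copy.deepcopy(lookup(record["card_id"]))
        enriched.append(message)
    return enriched


if __name__ == "__main__":
    database = {"c1": {"last_use": "2017-09-01"}}  # hypothetical external state
    inputs = add_external_data([{"card_id": "c1", "amount": 42}], database.get)
    database["c1"]["last_use"] = "2017-09-02"  # the world moves on
    print(inputs[0]["external"]["last_use"])  # still 2017-09-01
```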
  27. Recording Raw Data (as it really was) (diagram: Input, Decoy, Model 2, Model 3, Scores, Archive)
  28. Quality & Reproducibility of Input Data Is Important!
      • Recording raw-ish data is really a big deal
        – Data as seen by a model is worth gold
        – Data reconstructed later often has time-machine leaks
        – Databases were made for updates; streams are safer
      • Raw data is useful for non-ML cases as well (think flexibility)
      • The decoy model records training data as seen by models under development & evaluation
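A decoy, as described in slide 28, is deliberately trivial: it subscribes to the same input stream as the real models but only archives what it receives. A minimal sketch, assuming messages are JSON-serializable dicts and the archive is any writable text file. Writing one JSON document per line is our choice, not something the slides specify.

```python
import io
import json
from typing import Iterable, TextIO


def run_decoy(input_messages: Iterable[dict], archive: TextIO) -> int:
    """Record every input exactly as the models saw it; never score anything."""
    count = 0
    for message in input_messages:
        archive.write(json.dumps(message, sort_keys=True) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    archive = io.StringIO()  # stands in for a durable archive file or stream
    run_decoy([{"card_id": "c1", "amount": 42}, {"card_id": "c2", "amount": 7}], archive)
    print(archive.getvalue())
```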
  29. Canary for Comparison (diagram: Input, Real model, Canary, Decoy, ∆, Result, Archive)
  30. What Does the Canary Do?
      • The canary is a real model, but is very rarely updated
      • The canary results are almost never used for decisioning
      • The virtue of the canary is stability
      • Comparing to the canary results gives insight into new models
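One simple way to compare a new model against the canary is the disagreement rate on the identical inputs that the rendezvous architecture guarantees. This is sketched under assumptions of our own (binary decisions from a score threshold of 0.5). Two models can only disagree on an input where at least one of them is wrong. So a canary that is 90% correct and a candidate that is at least as good should disagree on no more than 20% of inputs, and a much larger rate suggests something other than improvement.

```python
from typing import Sequence


def disagreement_rate(canary: Sequence[float], candidate: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Fraction of shared inputs on which the two models reach different decisions."""
    if len(canary) != len(candidate):
        raise ValueError("scores must be paired on the same inputs")
    if not canary:
        raise ValueError("no scores to compare")
    differ = sum((c >= threshold) != (n >= threshold) for c, n in zip(canary, candidate))
    return differ / len(canary)


def disagreement_bound(canary_error: float, candidate_error: float) -> float:
    # Disagreement on an input requires at least one model to be wrong there.
    return min(1.0, canary_error + candidate_error)


if __name__ == "__main__":
    canary = [0.9, 0.2, 0.7, 0.1]
    candidate = [0.8, 0.6, 0.4, 0.05]
    rate = disagreement_rate(canary, candidate)
    if rate > disagreement_bound(0.10, 0.10):
        print(f"disagreement {rate:.0%} exceeds what a better model allows; investigate")
```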
  31. Isolated Development With Stream Replication (diagram, Production side: The world, request, Raw, Add external data, Input, Model 1, Model 2, Model 3, Internal 1, Internal 2, Internal 3; Development side: Raw, New external data, Input, Model 4, Internal 4)
  32. (diagram: Raw, Input, Features / profiles, models m1, m2, m3, Decoy, Archive, Scores)
  33. (same diagram, adding Rendezvous and Results)
  34. (same diagram, adding Metrics)
  35. Models in production live in the real world: Conditions may (will) change
  36. Not Such Bad Ideas
      • Keep models running “in the wings”
        – Don’t wait until conditions change to start building the next model
        – Keep new short-history models ready to roll, some graybeards as well
      • Hot hand-off
        – With rendezvous: just stop ignoring the new best model
      • Deploy a canary server
        – Keep an old model active as a reference
        – If it was 90% correct, the difference with any better model should be small
        – Score distribution should be roughly constant
  37. Correlated Comparison of Score Quantiles
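Slide 37's correlated comparison appears only as a chart. Here is one plausible reading, sketched under our own assumptions. Because every model scores the same inputs, you can compare score quantiles between the canary and a candidate. You can also take quantiles of the per-input score differences, which removes input-to-input variation from the comparison. The nearest-rank quantile and the probability levels below are illustrative choices, not the author's method.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Nearest-rank quantile for 0 < q <= 1."""
    if not values or not 0 < q <= 1:
        raise ValueError("need data and 0 < q <= 1")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def compare_scores(reference: Sequence[float], candidate: Sequence[float],
                   probs: Sequence[float] = (0.1, 0.5, 0.9)) -> dict[str, dict[float, float]]:
    if len(reference) != len(candidate):
        raise ValueError("scores must be paired on the same inputs")
    # Marginal view: how the candidate's score distribution differs from the reference.
    shift = {p: quantile(candidate, p) - quantile(reference, p) for p in probs}
    # Correlated view: quantiles of the per-input differences.
    diffs = [c - r for r, c in zip(reference, candidate)]
    paired = {p: quantile(diffs, p) for p in probs}
    return {"quantile_shift": shift, "paired_difference": paired}


if __name__ == "__main__":
    canary = [1, 2, 3, 4]
    candidate = [4, 3, 2, 1]  # same distribution, opposite ranking
    print(compare_scores(canary, candidate))
```

In the example, the marginal quantiles match exactly, so "score distribution roughly constant" would pass. The paired differences, however, spread from -3 to 3, revealing that the two models rank the same inputs very differently.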
  38. Sample Model Cascade (diagram: model A, then model B, each labeling inputs Fraud or Clean)
      • Assume that finding more frauds is all we care to do
  39. Some Data
  40. Consisting of Type 1
  41. And Type 2
  42. Sample Model Cascade (same diagram: A is good with type 1, B is good with type 2)
  43. Baseline Conditions
      • Model A
        – 80% recall on type 1, 0% recall on type 2 (40% net, with equal numbers of each type)
      • Model B
        – 0% recall on type 1, 80% recall on type 2 (40% net)
      • Combined
        – No overlap in responses
        – 80% recall on type 1 (due to model A)
        – 80% recall on type 2 (due to model B)
        – 80% recall overall
  44. “New and Improved”
      • Suppose model A is “improved”
        – Before: 80% recall on type 1, 0% recall on type 2 (40% net)
        – After: 40% recall on type 1, 100% recall on type 2 (70% net)
      • Combined after change
        – Huge overlap in responses
        – 40% recall on type 1 (due to model A)
        – 100% recall on type 2 (due to model A)
        – Model B has no effect
        – 70% recall overall
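The arithmetic in slides 43 and 44 can be checked with explicit sets. This sketch assumes 100 frauds of each type (the slides' net figures imply equal numbers) and uses hypothetical transaction ids. In the cascade, a transaction is reported as fraud if A flags it, or if A passes it on as clean and B flags it, so the combined catch is the union of the two models' flags.

```python
def recall(flagged: set[int], frauds: set[int]) -> float:
    return len(flagged & frauds) / len(frauds)


def cascade(a_flags: set[int], b_flags: set[int]) -> set[int]:
    # b_flags is what B would flag if shown everything; in the cascade it only
    # sees what A called clean, so it can add only items A did not flag.
    return a_flags | (b_flags - a_flags)


TYPE1 = set(range(0, 100))    # hypothetical ids of type 1 frauds
TYPE2 = set(range(100, 200))  # hypothetical ids of type 2 frauds
ALL_FRAUD = TYPE1 | TYPE2

a_before = set(range(0, 80))          # 80% of type 1, none of type 2
a_after = set(range(0, 40)) | TYPE2   # 40% of type 1, all of type 2
b = set(range(100, 180))              # 80% of type 2, none of type 1

for label, a in (("before", a_before), ("after", a_after)):
    combined = cascade(a, b)
    print(f"{label}: A alone {recall(a, ALL_FRAUD):.0%}, "
          f"cascade {recall(combined, ALL_FRAUD):.0%}")
```

This prints 40% for A alone and 80% for the cascade before the change, then 70% and 70% after: A's standalone "improvement" lowers the recall of the system as a whole.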
  45. Coupling Paradox
  46. Is There Any Hope?
      • This kind of problem is HARD
        – Do your competitor’s marketing model and your own couple?
      • Where possible, use ensembles instead of cascades
        – Not as simple as it sounds
      • Where possible, deploy composite models as units
        – Not as simple as it sounds
      • Always measure everything!
  47. How to Do Better
      • Data + the right question + domain knowledge matter!
      • Prioritize: put serious effort into infrastructure
        – DataOps requires more than just data science
      • Persist: use streams to keep data around
      • Measure: everything, and record it
      • Meta-analyze: understand and see what is happening
      • Containerize: make deployment repeatable and easy
      • Oh… don’t forget to do some machine learning, too
  48. Additional Resources
      • O’Reilly report by Ted Dunning & Ellen Friedman © March 2017. Read free courtesy of MapR: https://mapr.com/geo-distribution-big-data-and-analytics/
      • O’Reilly book by Ted Dunning & Ellen Friedman © March 2016. Read free courtesy of MapR: https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
  49. Additional Resources
      • O’Reilly book by Ted Dunning & Ellen Friedman © June 2014. Read free courtesy of MapR: https://mapr.com/practical-machine-learning-new-look-anomaly-detection/
      • O’Reilly book by Ellen Friedman & Ted Dunning © February 2014. Read free courtesy of MapR: https://mapr.com/practical-machine-learning/
  50. Additional Resources
      • By Ellen Friedman, 8 Aug 2017, on the MapR blog: https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
      • By Ted Dunning, 13 Sept 2017, in InfoWorld: https://www.infoworld.com/article/3223688/machine-learning/machine-learning-skills-for-software-engineers.html
  51. New book: Machine Learning Logistics: Model Management in the Real World, O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017
      • Pre-register for a free PDF copy of the book when it becomes available on 26 September, courtesy of MapR: http://info.mapr.com/2017_Content_Machine-Learning-Logistics_eBook_Prereg_RegistrationPage.html
      • Going to Strata Data NYC? The book will be released 26 Sept 2017: visit the MapR booth for free book signings or to talk about logistics
  52. Please support women in tech: help build girls’ dreams of what they can accomplish (© Ellen Friedman 2015) #womenintech #datawomen
  53. Q&A. Engage with us: @mapr, tdunning@mapr.com, @Ted_Dunning
