
Recommendations for Building Machine Learning Software

Talk from Data by the Bay 2016 on some recommendations for building machine learning software and putting machine learning into applications.



  1. Recommendations for Building Machine Learning Software. Justin Basilico, Page Algorithms Engineering. May 19, 2016. @JustinBasilico
  2. Introduction
  3. Change of focus: 2006 → 2016
  4. Netflix Scale:
     • > 81M members
     • > 190 countries
     • > 1000 device types
     • > 3B hours/month
     • > 36% of peak US downstream traffic
  5. Goal: Help members find content to watch and enjoy, to maximize member satisfaction and retention.
  6. Everything is a Recommendation. (Screenshot: the homepage, annotated with its rows and their ranking.) Over 80% of what people watch comes from our recommendations. Recommendations are driven by machine learning.
  7. Machine Learning Approach: Problem, Data, Model, Algorithm, Metrics.
  8. Models & Algorithms:
     • Regression (linear, logistic, elastic net)
     • SVD and other matrix factorizations
     • Factorization Machines
     • Restricted Boltzmann Machines
     • Deep Neural Networks
     • Markov Models and Graph Algorithms
     • Clustering
     • Latent Dirichlet Allocation
     • Gradient Boosted Decision Trees / Random Forests
     • Gaussian Processes
     • …
  9. Design Considerations.
     Recommendations: personal, accurate, diverse, novel, fresh.
     Software: scalable, responsive, resilient, efficient, flexible.
  10. Software Stack (details at http://techblog.netflix.com)
  11. Recommendations
  12. Recommendation 1: Be flexible about where and when computation happens.
  13. System Architecture:
     • Offline: process data; batch learning
     • Nearline: process events; model evaluation; online learning; asynchronous
     • Online: process requests; real-time
     (Architecture diagram: UI client events such as play, rate, and browse flow through the user event queue and event distribution (Netflix.Manhattan, Netflix.Hermes) into offline data and nearline computation; offline model training and nearline machine learning publish models that the online algorithm service uses to answer queries with recommendations. More details on the Netflix Techblog.)
  14. Where to place components? Example: Matrix Factorization, X ≈ UV^t.
     • Offline: collect a sample of play data; run a batch learning algorithm like SGD to produce the factorization; publish the video factors V
     • Nearline: solve for the user factors u_i (A u_i = b); compute the user-video dot products s_ij = u_i · v_j; store the scores in a cache (a scoring sketch follows below)
     • Online: presentation-context filtering (keep s_ij > t); serve recommendations
     (Same architecture diagram as the previous slide.)
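To make the nearline step concrete, here is a minimal sketch of computing and thresholding the user-video scores once the video factors are published. This is illustrative code, not Netflix's: the class and method names are hypothetical, and solving A u_i = b for the user factors is assumed to have already happened.

```java
// Minimal sketch of the nearline scoring step: given a user's factor
// vector u_i, compute s_ij = u_i . v_j for every published video factor
// v_j and keep the scores above the threshold t for caching.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NearlineScorer {
    /** Dot product between a user factor vector and a video factor vector. */
    static double score(double[] userFactors, double[] videoFactors) {
        double s = 0.0;
        for (int k = 0; k < userFactors.length; k++) {
            s += userFactors[k] * videoFactors[k];
        }
        return s;
    }

    /** Score every video and keep those with s_ij > t. */
    static List<String> topVideos(double[] userFactors,
                                  Map<String, double[]> videoFactors,
                                  double t) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, double[]> e : videoFactors.entrySet()) {
            if (score(userFactors, e.getValue()) > t) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```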
  15. Recommendation 2: Design application software for experimentation.
  16. Example development process: an idea plus data go through offline modeling (R, Python, MATLAB, …) in the experimentation environment, iterating to a final model; the model is then implemented in the production system (Java, C++, …) and the actual output is A/B tested in the production environment. The gap between the two environments introduces data discrepancies, code discrepancies, missing post-processing logic, and performance issues.
  17. Solution: Share and lean towards production.
     • Developing machine learning is iterative
     • Need a short pipeline to rapidly try ideas
     • Want to see the output of the complete system
     • So make the application easy to experiment with
     • Share components between online, nearline, and offline
     • Use the real code whenever possible
     • Have well-defined interfaces and formats to allow you to go off the beaten path
  18. Shared Engine: avoid dual implementations. Experiment code and production code both run against the same shared engine of models, features, algorithms, and more (see the sketch below).
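As a minimal sketch of what sharing looks like in practice (hypothetical interfaces, not Netflix's engine), both the experiment harness and the production service can depend on the same ranking code, so an offline experiment exercises exactly what will run online:

```java
// Hypothetical shared component: one Scorer interface and one ranking
// routine used verbatim by both the experiment and production code paths.
import java.util.ArrayList;
import java.util.List;

interface Scorer<T> {
    double score(T item);
}

final class RankingEngine {
    // The production service and the offline experiment harness both call
    // this same method; only the surrounding I/O differs.
    static <T> List<T> rank(List<T> items, Scorer<T> scorer) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort((a, b) -> Double.compare(scorer.score(b), scorer.score(a)));
        return sorted;
    }
}
```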
  19. Recommendation 3: Make algorithms extensible and modular.
  20. Make algorithms and models extensible and modular.
     • Algorithms often need to be tailored for a specific application
     • Treating an algorithm as a black box is limiting; better to make algorithms extensible and modular to allow for customization
     • Separate models from algorithms: many algorithms can learn the same model (e.g. a linear binary classifier), and many algorithms can be trained on the same types of data
     • Support composing algorithms
     (Diagram: Data + Parameters feeding an Algorithm that produces a Model, vs. the algorithm and model fused into one black box. A sketch of the separation follows.)
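A minimal sketch of that separation, with hypothetical interfaces: the model is only the learned artifact, and any number of algorithms can produce the same model type.

```java
import java.util.List;

// The model is just the learned artifact...
interface Model<I, O> {
    O predict(I input);
}

// ...and the algorithm is a separate component that produces one.
interface LearningAlgorithm<I, O, M extends Model<I, O>> {
    M train(List<I> inputs, List<O> labels);
}

// Example model type: a linear binary classifier. Perceptron, logistic
// regression via SGD, and a linear SVM could each be a LearningAlgorithm
// that learns this same model.
final class LinearBinaryClassifier implements Model<double[], Boolean> {
    final double[] weights;
    final double bias;
    LinearBinaryClassifier(double[] weights, double bias) {
        this.weights = weights;
        this.bias = bias;
    }
    public Boolean predict(double[] x) {
        double s = bias;
        for (int k = 0; k < x.length; k++) {
            s += weights[k] * x[k];
        }
        return s > 0;
    }
}
```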
  21. Provide building blocks. Don't start from scratch:
     • Linear algebra: vectors, matrices, …
     • Statistics: distributions, tests, …
     • Models, features, metrics, ensembles, …
     • Loss, distance, kernel, … functions
     • Optimization, inference, …
     • Layers, activation functions, …
     • Initializers, stopping criteria, …
     • Domain-specific components
     Build abstractions on familiar concepts. Make the software put them together.
  22. Example: Tailoring Random Forests using the Cognitive Foundry (http://github.com/algorithmfoundry/Foundry): use a custom tree split, customize the training to run for an hour, report a custom metric each iteration, and inspect the resulting ensemble. (A generic sketch of these extension points follows.)
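The Foundry's actual API is in the repository above. Purely as an illustration of the extension points the slide names (none of these types are Foundry classes), a trainer with a pluggable split, a wall-clock budget, and a per-iteration callback might look like this:

```java
// Hypothetical extension points for a random forest trainer; this is NOT
// the Cognitive Foundry API, just a sketch of extensible, modular design.
interface SplitCriterion {
    double quality(int[] leftLabelCounts, int[] rightLabelCounts);
}

interface IterationListener {
    void onIteration(int treeIndex, double customMetric);
}

final class ForestTrainer {
    void train(SplitCriterion split, long maxTrainMillis, int maxTrees,
               IterationListener listener) {
        long deadline = System.currentTimeMillis() + maxTrainMillis;
        for (int i = 0; i < maxTrees; i++) {
            if (System.currentTimeMillis() >= deadline) {
                break; // e.g. stop after an hour of training
            }
            double metric = trainOneTree(split); // grow tree i with the custom split
            listener.onIteration(i, metric);     // report a custom metric
        }
    }

    private double trainOneTree(SplitCriterion split) {
        // Tree growing elided; it would call split.quality(...) at each node.
        return 0.0;
    }
}
```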
  23. Recommendation 4: Describe your input and output transformations with your model.
  24. Putting learning in an application. The machine-learned model maps R^d → R^k, but the application sits on both sides of it: feature encoding on the way in and output decoding on the way out. Do those transformations belong in application code or model code?
  25. Example: Simple ranking system.
     High-level API: List<Video> rank(User u, List<Video> videos)
     Example model description file (a Ranker composed of a Scorer, Features with their transformations, and a Linear function; a decoding sketch follows):
       {
         "type": "ScoringRanker",
         "scorer": {
           "type": "FeatureScorer",
           "features": [
             {"type": "Popularity", "days": 10},
             {"type": "PredictedRating"}
           ],
           "function": {
             "type": "Linear",
             "bias": -0.5,
             "weights": {
               "popularity": 0.2,
               "predictedRating": 1.2,
               "predictedRating*popularity": 3.5
             }
           }
         }
       }
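To show how such a description can drive composition, here is a hypothetical sketch of the objects it might decode into. The JSON parsing itself is elided and none of this is Netflix's actual code; a cross term like predictedRating*popularity would come from a synthesized feature transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class User {}
class Video {}

// One object per "type" in the description file.
interface Feature {
    String name();
    double value(User u, Video v);
}

interface Scorer {
    double score(User u, Video v);
}

final class FeatureScorer implements Scorer {
    private final List<Feature> features;      // e.g. Popularity(days=10), PredictedRating
    private final Map<String, Double> weights; // the Linear function's weights
    private final double bias;
    FeatureScorer(List<Feature> features, Map<String, Double> weights, double bias) {
        this.features = features;
        this.weights = weights;
        this.bias = bias;
    }
    public double score(User u, Video v) {
        double s = bias;
        for (Feature f : features) {
            s += weights.getOrDefault(f.name(), 0.0) * f.value(u, v);
        }
        return s;
    }
}

final class ScoringRanker {
    private final Scorer scorer;
    ScoringRanker(Scorer scorer) { this.scorer = scorer; }
    // Matches the high-level API on the slide.
    List<Video> rank(User u, List<Video> videos) {
        List<Video> ranked = new ArrayList<>(videos);
        ranked.sort((a, b) -> Double.compare(scorer.score(u, b), scorer.score(u, a)));
        return ranked;
    }
}
```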
  26. Recommendation 5: Max out a single machine before distributing your algorithms.
  27. Problem: Your great new algorithm doesn't scale.
     • You want to run your algorithm on larger data
     • Temptation to go distributed: Spark, Hadoop, etc. seem to make it easy
     • But building distributed versions of non-trivial ML algorithms is hard, and often means changing the algorithm or making lots of approximations
     • So try to squeeze as much as you can out of a single machine first
     • You have a lot more communication bandwidth via memory than via the network
     • You will be surprised how far one machine can go
     • Example: Amazon announced today an X1 instance type with 2 TB of memory and 128 virtual CPUs
  28. How?
     • Profile your code and think about memory and cache layout; small changes can have a big impact. Example: transposing a matrix can drop a computation from 100 ms to 3 ms (see the sketch below)
     • Go multicore: algorithms like HogWild for SGD-type optimization can make this very easy
     • Use specialized resources like GPUs (or TPUs?)
     • Only go distributed once you've optimized along these dimensions (often you won't need to)
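The transpose example is about cache behavior, not arithmetic. A minimal sketch of the effect follows; the exact timings depend on the machine, and the 100 ms to 3 ms figure on the slide is one observed case, not a guarantee.

```java
// Summing a matrix row-by-row walks memory sequentially and is cache
// friendly; summing column-by-column strides across rows and thrashes
// the cache, even though both loops do identical arithmetic.
public class CacheLayoutDemo {
    public static void main(String[] args) {
        int n = 4096;
        double[][] m = new double[n][n];

        long t0 = System.nanoTime();
        double rowSum = 0;
        for (int i = 0; i < n; i++)        // row-major order: sequential access
            for (int j = 0; j < n; j++)
                rowSum += m[i][j];
        long t1 = System.nanoTime();

        double colSum = 0;
        for (int j = 0; j < n; j++)        // column order: strided access
            for (int i = 0; i < n; i++)
                colSum += m[i][j];
        long t2 = System.nanoTime();

        System.out.printf("row order: %d ms, column order: %d ms%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```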
  29. Example: Training Neural Networks.
     • Level 1: machines in different AWS regions
     • Level 2: machines in the same AWS region. Simple: grid search. Better: Bayesian optimization using Gaussian Processes. Mesos, Spark, etc. for coordination (a grid-search sketch follows)
     • Level 3: highly optimized, parallel CUDA code on GPUs
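For the Level 2 coordination, here is a minimal sketch of the "simple" option, an exhaustive grid search over two hyperparameters, where each configuration would be farmed out to its own machine. The trainAndEvaluate function is a hypothetical stand-in for launching a training job and returning a validation metric; Bayesian optimization would instead choose the next point based on the results so far.

```java
public class GridSearch {
    public static void main(String[] args) {
        double[] learningRates = {0.001, 0.01, 0.1};
        int[] hiddenUnits = {128, 256, 512};

        double best = Double.NEGATIVE_INFINITY;
        double bestLr = 0;
        int bestUnits = 0;
        for (double lr : learningRates) {
            for (int units : hiddenUnits) {
                // In practice each of these runs goes to its own machine
                // (e.g. coordinated via Mesos or Spark) and trains on a GPU.
                double metric = trainAndEvaluate(lr, units);
                if (metric > best) {
                    best = metric;
                    bestLr = lr;
                    bestUnits = units;
                }
            }
        }
        System.out.printf("best: lr=%.3f, units=%d, metric=%.4f%n",
            bestLr, bestUnits, best);
    }

    // Stand-in for a real training job returning a validation metric.
    static double trainAndEvaluate(double lr, int units) {
        return -Math.abs(Math.log10(lr) + 2) - Math.abs(units - 256) / 256.0;
    }
}
```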
  30. Recommendation 6: Don't just rely on metrics for testing.
  31. Machine Learning and Testing.
     • Temptation: use validation metrics to test software
     • When things work and metrics go up, this seems great
     • When metrics don't improve, was it the code, the data, the metric, the idea, …?
  32. Reality of Testing.
     • Machine learning code involves intricate math and logic: rounding issues, corner cases, …
     • Is that a + or a -? (The math, or the paper, could be wrong.)
     • Solution: unit test. Testing of metric code is especially important (see the sketch below)
     • Test the whole system: unit testing alone is not enough
     • At a minimum, compare output across versions for unexpected changes
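As a minimal sketch of unit-testing metric code (JUnit 5), pin the math to hand-computed values and corner cases. The rmse method here is a hypothetical stand-in for whatever metric your system actually reports.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class RmseTest {
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d; // an easy place for a sign or +/- mistake to hide
        }
        return Math.sqrt(sum / predicted.length);
    }

    @Test
    void perfectPredictionsHaveZeroError() {
        assertEquals(0.0, rmse(new double[]{1, 2, 3}, new double[]{1, 2, 3}), 1e-12);
    }

    @Test
    void matchesHandComputedValue() {
        // errors are 1 and -1: sqrt((1 + 1) / 2) = 1
        assertEquals(1.0, rmse(new double[]{2, 1}, new double[]{1, 2}), 1e-12);
    }

    @Test
    void errorSignDoesNotMatter() {
        assertEquals(rmse(new double[]{3}, new double[]{1}),
                     rmse(new double[]{1}, new double[]{3}), 1e-12);
    }
}
```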
  33. Conclusions
  34. Two ways to solve computational problems.
     Software Development: know the solution → write code → compile code → test code → deploy code.
     Machine Learning: know the relevant data → develop an algorithmic approach → train a model on the data using the algorithm → validate the model with metrics → deploy the model. (The machine learning steps may themselves involve software development.)
  35. Take-aways for building machine learning software:
     • Building machine learning is an iterative process, so make experimentation easy
     • Take a holistic view of the application where you are placing learning
     • Design your algorithms to be modular
     • Optimize how your code runs on a single machine before going distributed
     • Testing can be hard but is worthwhile
  36. Thank You. Justin Basilico, jbasilico@netflix.com, @JustinBasilico. We're hiring.
