
Practical Considerations for Continual Learning

Presented at the Continual AI Meetup: “Real-world Applications of Continual Learning”, April 28, 2020.



  1. Practical Considerations for Continual Learning. Tom Diethe (tdiethe@amazon.com). Continual AI Meetup: “Real-world Applications of Continual Learning”, April 28, 2020.
  2. Continual Learning at Amazon.
  3. Alexa AI. What is Alexa? A cloud-based voice service that can help you with tasks, entertainment, general information, shopping, and more. The more you talk to Alexa, the more Alexa adapts to your speech patterns, vocabulary, and personal preferences.
  4. Alexa AI (continued). How do we ensure that we create robust and efficient AI systems, that the privacy of customer data is safeguarded, and that customers are treated fairly by ML algorithms?
  5. Failure Modes. Unintentional failures: the ML system produces a formally correct but completely unsafe outcome (outliers/anomalies, dataset shift, limited memory). Intentional failures: the failure is caused by an active adversary attempting to subvert the system to attain her goals, such as to misclassify the result, infer private training data, or steal the underlying algorithm.
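
As an illustration of guarding against the unintentional failure modes above, the following is a minimal, hypothetical sketch (not from the talk) that flags incoming feature vectors lying far from the training distribution using a Mahalanobis distance; the threshold and regularisation constant are arbitrary illustrative choices.

```python
# Hypothetical outlier/anomaly gate in front of a deployed model.
# All names and thresholds here are illustrative, not from the presentation.
import numpy as np

def fit_reference_stats(X_train: np.ndarray):
    """Estimate the mean and (regularised) inverse covariance of the training data."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False) + 1e-6 * np.eye(X_train.shape[1])
    return mu, np.linalg.inv(cov)

def is_outlier(x: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray, threshold: float = 5.0) -> bool:
    """Flag inputs that lie unusually far from the training distribution."""
    d = x - mu
    mahalanobis = float(np.sqrt(d @ cov_inv @ d))
    return mahalanobis > threshold
```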
  6. Strict stationarity (the assumption violated by dataset shift): $F_X(x_{t_1}, \ldots, x_{t_n}) = F_X(x_{t_1+\tau}, \ldots, x_{t_n+\tau})$ for all $\tau$, all $t_1, \ldots, t_n$, and all $n \in \mathbb{N}$.
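
A minimal sketch of how a violation of this assumption might be detected in practice for a single feature, using a two-sample Kolmogorov-Smirnov test between a reference window and a recent window; the window sizes and significance level are assumptions, not details from the talk.

```python
# Illustrative drift check for one feature: compare an old window against a
# recent window with a two-sample KS test. Window sizes and alpha are arbitrary.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    result = ks_2samp(reference, recent)
    return result.pvalue < alpha

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5_000)
recent = rng.normal(0.5, 1.0, size=5_000)   # mean has shifted: no longer stationary
print(drift_detected(reference, recent))    # True
```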
  7. Amazon SageMaker.
  8. Robustness & Transparency via Continual Learning. Data arrive continually and are (possibly) non-IID; tasks may change over time (e.g. trends/fashions in shopping) and new tasks may emerge (e.g. new product categories, new marketplaces). Robustness: how can we adapt to new data whilst retaining existing knowledge? Transparency: how can we build systems that can signal when they are going wrong? Standard approaches: train individual models on each task and then train a combination, or maintain a single model and use regularization to fix influential parameters (see the sketch below).
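
The second standard approach above (a single model with regularization that fixes influential parameters) can be sketched as an EWC-style quadratic penalty. This is an illustrative sketch, not the specific method used at Amazon; the parameter names and the penalty weight are assumptions.

```python
# Illustrative EWC-style penalty: keep parameters that were important for
# previous tasks close to their old values while training on a new task.
import torch

def regularised_loss(model, task_loss, anchor_params, importance, lam=100.0):
    """task_loss + (lam / 2) * sum_i importance_i * (theta_i - anchor_i)^2.

    anchor_params / importance: dicts keyed by parameter name, holding the
    parameter values after the previous task and an estimate of how influential
    each parameter was (e.g. a diagonal Fisher information approximation).
    """
    penalty = torch.zeros((), device=task_loss.device)
    for name, param in model.named_parameters():
        if name in anchor_params:
            penalty = penalty + (importance[name] * (param - anchor_params[name]) ** 2).sum()
    return task_loss + 0.5 * lam * penalty
```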
  16. Bayesian Continual Learning [Nguyen et al., 2018]. Given, e.g., the data for task $t$ as $\mathcal{D}_t = \{(x_t^{(n)}, y_t^{(n)})\}_{n=1}^{N_t}$ and parameters $\theta$ (e.g. BLR, BNN, GP, ...): $p(\theta \mid \mathcal{D}_{1:T}) \propto p(\theta)\, p(\mathcal{D}_{1:T} \mid \theta) = p(\theta) \prod_{t=1}^{T} \prod_{n=1}^{N_t} p(y_t^{(n)} \mid \theta, x_t^{(n)}) \propto p(\theta \mid \mathcal{D}_{1:T-1})\, p(\mathcal{D}_T \mid \theta)$. A natural recursive algorithm!
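
To make the recursion concrete, here is a minimal sketch for the BLR case (Bayesian linear regression with a Gaussian prior and known noise variance), where the posterior after each task serves exactly as the prior for the next; the synthetic data and noise level are illustrative choices.

```python
# Conjugate Bayesian linear regression updated task by task:
# p(theta | D_{1:t}) ∝ p(theta | D_{1:t-1}) p(D_t | theta).
import numpy as np

def blr_update(m_prior, S_prior, X, y, noise_var=0.1):
    """One task's update: Gaussian prior N(m_prior, S_prior) -> Gaussian posterior."""
    S_prior_inv = np.linalg.inv(S_prior)
    S_post = np.linalg.inv(S_prior_inv + X.T @ X / noise_var)
    m_post = S_post @ (S_prior_inv @ m_prior + X.T @ y / noise_var)
    return m_post, S_post

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
m, S = np.zeros(2), np.eye(2)                 # prior before any task
for task in range(3):                         # tasks arrive sequentially
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1 ** 0.5, size=50)
    m, S = blr_update(m, S, X, y)             # posterior becomes the next prior
print(m)                                      # close to [1.0, -2.0]
```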
  18. Engineering a Continual Learning System. Automating data retention policies: a sketcher/compressor for when the data rate is too high, a joiner for when labels arrive late, and shared infrastructure for optimal use of space, like an OS cache (see the reservoir-sampling sketch below). Automating monitoring and quality control: data monitoring (dataset-shift detection, anomaly detection) and prediction monitoring (monitoring the performance of models). Automating the ML life-cycle: a trainer and HPO that store provenance and warm-start training, and a model policy engine that ensures re-training is performed at the right cadence.
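
One possible building block for the data-retention layer described above is reservoir sampling, which keeps a bounded, uniformly drawn sample of an unbounded stream. The sketch below (Vitter's Algorithm R) is illustrative only; the capacity and the stream are stand-ins rather than details from the talk.

```python
# Fixed-size uniform sample over an unbounded stream (Algorithm R), usable as a
# simple training-data reservoir when the data rate is too high to keep everything.
import random

class Reservoir:
    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored item with probability capacity / seen, which keeps
            # every item seen so far equally likely to remain in the sample.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

reservoir = Reservoir(capacity=1_000)
for record in range(1_000_000):               # stand-in for a high-rate stream
    reservoir.add(record)
```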
  19. “Zero-Touch” Machine Learning (architecture diagram). Components shown: a model policy engine issuing retrain and rollback decisions; input streams; a trainer with HPO producing model streams; data statistics feeding data monitoring (anomaly detection, distribution-shift measurement); prediction statistics feeding prediction monitoring (accuracy, shift); a predictor serving predictions; business logic supplying business metrics, costs, and desired accuracy; a joiner; a sketcher/sampler; and shared infrastructure (model DB, training data reservoir, validation data reservoir, system state DB, diagnostic logs).
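
Purely as an illustration of the kind of rule a model policy engine might apply on top of these monitoring signals, the sketch below combines prediction-monitoring and data-monitoring outputs into retrain/rollback decisions; the field names, thresholds, and decision logic are assumptions, not the system described in the talk.

```python
# Hypothetical decision rule for a model policy engine. Everything here
# (signal names, thresholds, logic) is illustrative.
from dataclasses import dataclass

@dataclass
class MonitoringSignals:
    previous_accuracy: float       # prediction monitoring, previous model
    current_accuracy: float        # prediction monitoring, currently deployed model
    drift_score: float             # data monitoring, distribution-shift measure

def decide(signals: MonitoringSignals, desired_accuracy: float = 0.90) -> str:
    if signals.current_accuracy < signals.previous_accuracy - 0.05:
        return "rollback"          # the new model is clearly worse than the old one
    if signals.current_accuracy < desired_accuracy or signals.drift_score > 0.2:
        return "retrain"           # quality dropped or the input distribution moved
    return "no-op"

print(decide(MonitoringSignals(previous_accuracy=0.92, current_accuracy=0.85, drift_score=0.05)))
# -> "rollback"  (0.85 < 0.92 - 0.05)
```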
  20. Summary: Continual Learning. Bayesian methods are a natural fit for continual learning; however, it is tricky to make them work well with deep learning methods. Many interesting methodological improvements are happening, but most are still not production-ready. An engineering viewpoint is also required.
  21. Questions? tdiethe@amazon.com
