Practical Considerations for Continual Learning
Tom Diethe
tdiethe@amazon.com
Continual AI Meetup:
“Real-world Applications of Continual Learning”
April 28 2020
Continual Learning at Amazon
Alexa AI
What is Alexa?
A cloud-based voice service that can help you with tasks, entertainment, general information, shopping, and more.
The more you talk to Alexa, the more Alexa adapts to your speech patterns, vocabulary, and personal preferences.
How do we ensure that ...
we create robust and efficient AI systems?
the privacy of customer data is safeguarded?
customers are treated fairly by ML algorithms?
Failure Modes
Unintentional failures: the ML system produces a formally correct but completely unsafe outcome, e.g. due to:
Outliers/anomalies
Dataset shift
Limited memory
Intentional failures: caused by an active adversary attempting to subvert the system, e.g. to:
misclassify the result
infer private training data
steal the underlying algorithm
Strict stationarity (the assumption that dataset shift violates): the joint distribution is invariant under time shifts,

$$F_X(x_{t_1}, \ldots, x_{t_n}) = F_X(x_{t_1+\tau}, \ldots, x_{t_n+\tau}) \quad \text{for all } \tau,\, t_1, \ldots, t_n \text{ and all } n \in \mathbb{N}.$$

Dataset shift arises when this assumption fails.
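As a concrete illustration (not part of the original deck), a minimal sketch of one standard way to flag a violation of this assumption in practice: compare a recent window of data against a reference window with a two-sample Kolmogorov–Smirnov test. The window sizes and significance level are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def shift_detected(reference, recent, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test: flag a likely violation of
    stationarity when the two windows differ in distribution."""
    statistic, p_value = ks_2samp(reference, recent)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)   # window of historical data
recent = rng.normal(0.5, 1.0, size=5000)      # recent window with a mean shift
print(shift_detected(reference, recent))      # True: distributions differ
```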
Amazon SageMaker
Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in shopping)
New tasks may emerge (e.g. new product categories, new marketplaces)
Robustness: How can we adapt to new data whilst retaining existing knowledge?
Transparency: How can we build systems that signal when they are going wrong?
Standard approaches:
Train individual models on each task, then train a combination
Maintain a single model and use regularization to fix influential parameters (a minimal sketch follows below)
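A minimal sketch (not from the deck) of the second approach, in the spirit of elastic weight consolidation: a quadratic penalty keeps parameters judged influential for earlier tasks close to their previously learned values. The `old_params`/`importance` dictionaries and the weighting `lam` are assumptions for illustration.

```python
import torch

def consolidation_penalty(model, old_params, importance, lam=1.0):
    """Quadratic penalty (in the spirit of EWC) that discourages moving
    parameters deemed important for previous tasks."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Hypothetical usage while training on a new task:
#   loss = task_loss(model, batch) + consolidation_penalty(model, old_params, importance)
```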
Bayesian Continual Learning [Nguyen 2018]
Given, e.g., data for task $t$ as $\mathcal{D}_t = \{(x_t^{(n)}, y_t^{(n)})\}_{n=1}^{N_t}$ and parameters $\theta$ (e.g. BLR, BNN, GP, ...):

$$p(\theta \mid \mathcal{D}_{1:T}) \propto p(\theta)\, p(\mathcal{D}_{1:T} \mid \theta) = p(\theta) \prod_{t=1}^{T} \prod_{n=1}^{N_t} p\!\left(y_t^{(n)} \mid \theta, x_t^{(n)}\right) \propto p(\theta \mid \mathcal{D}_{1:T-1})\, p(\mathcal{D}_T \mid \theta).$$

Natural recursive algorithm: the posterior after tasks $1, \ldots, T-1$ acts as the prior when learning task $T$ (see the sketch below).
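A minimal sketch (not from the deck) of this recursion for conjugate Bayesian linear regression with known noise variance: the Gaussian posterior over weights from one task's data is reused verbatim as the prior for the next. The dimensions, noise level, and task loop are illustrative assumptions.

```python
import numpy as np

def bayes_linreg_update(mean, cov, X, y, noise_var=0.1):
    """One conjugate update: Gaussian prior N(mean, cov) -> Gaussian posterior."""
    prec = np.linalg.inv(cov) + X.T @ X / noise_var            # posterior precision
    cov_new = np.linalg.inv(prec)
    mean_new = cov_new @ (np.linalg.inv(cov) @ mean + X.T @ y / noise_var)
    return mean_new, cov_new

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
mean, cov = np.zeros(2), np.eye(2)                             # prior p(theta)
for _ in range(3):                                             # tasks arrive sequentially
    X = rng.normal(size=(50, 2))
    y = X @ w_true + rng.normal(scale=np.sqrt(0.1), size=50)
    mean, cov = bayes_linreg_update(mean, cov, X, y)           # posterior -> next prior
print(mean)   # approaches w_true as data accumulate across tasks
```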
Engineering a Continual Learning System
Automating Data Retention Policies:
Sketcher/Compressor: subsample or summarise when the data rate is too high (see the sketch after this list)
Joiner: match up labels that arrive late
Shared infrastructure: optimal use of space, like an OS cache
Automating Monitoring and Quality Control:
Data monitoring: dataset-shift detection, anomaly detection
Prediction monitoring: monitor the performance of deployed models
Automating the ML Life-Cycle:
Trainer and HPO: store provenance, warm-start training
Model policy engine: ensure re-training is performed at the right cadence
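A minimal sketch (not from the deck) of one standard way a sketcher/sampler can keep a bounded, uniform subsample of an unbounded stream for a training-data reservoir: reservoir sampling (Algorithm R). The reservoir size is an illustrative assumption.

```python
import random

class Reservoir:
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    def __init__(self, k, seed=0):
        self.k, self.n = k, 0
        self.items = []
        self.rng = random.Random(seed)

    def add(self, item):
        self.n += 1
        if len(self.items) < self.k:
            self.items.append(item)          # fill the reservoir first
        else:
            j = self.rng.randrange(self.n)   # keep each item with probability k/n
            if j < self.k:
                self.items[j] = item

res = Reservoir(k=100)
for x in range(1_000_000):   # e.g. events arriving from a data stream
    res.add(x)
print(len(res.items))        # memory stays bounded at 100 items
```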
“Zero-Touch” Machine Learning
[Architecture diagram. Components: a Model Policy Engine issuing Retrain and Rollback decisions; incoming Streams feeding a Sketcher/Sampler and a Joiner; a Trainer with HPO; a Predictor emitting Predictions and Prediction Statistics; Data Monitoring (anomaly detection, distribution-shift measurement) over Data Statistics; Prediction Monitoring (accuracy, shift); Business Logic supplying business metrics, costs, and desired accuracy; a System State DB and Diagnostic Logs; and Shared Infrastructure holding the Model DB and the Training and Validation Data Reservoirs.]
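To make the policy-engine role concrete, a hypothetical sketch (not from the deck) of a retrain/rollback decision: retrain when monitored shift or accuracy crosses configured thresholds, and roll back when a freshly trained model underperforms the incumbent. All names and thresholds here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    desired_accuracy: float = 0.95   # supplied by business logic
    shift_threshold: float = 0.2     # supplied by data monitoring
    rollback_margin: float = 0.02    # tolerance before rolling back a new model

def decide(policy, accuracy, shift_score, candidate_accuracy=None):
    """Return 'rollback', 'retrain', or 'no-op' from monitored signals."""
    if candidate_accuracy is not None and candidate_accuracy < accuracy - policy.rollback_margin:
        return "rollback"            # freshly trained model underperforms the incumbent
    if accuracy < policy.desired_accuracy or shift_score > policy.shift_threshold:
        return "retrain"
    return "no-op"

print(decide(Policy(), accuracy=0.91, shift_score=0.05))  # 'retrain'
```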
Summary: Continual Learning
Bayesian methods are a natural fit for continual learning
However, it is tricky to make them work well with deep learning methods
Many interesting methodological improvements are happening, but most are not yet production-ready
An engineering viewpoint is also required
Questions?
tdiethe@amazon.com
