Unlike computers, physical engineering systems (such as data center cooling or wireless network control) do not get faster with time. This is arguably one of the main reasons why recent beautiful advances in deep reinforcement learning (RL) stay mostly in the realm of simulated worlds and do not immediately translate to practical success in the real world. To make the best use of the small data sets these systems generate, we develop data-driven neural simulators to model the systems and apply model-based control to optimize them. In this talk I will present the first step of this research agenda, a new versatile system modelling tool called deep autoregressive mixture density net (DARMDN, pronounced "darm-dee-en"). We argue that the performance of model-based reinforcement learning is partly limited by the approximation capacity of the currently used conditional density models and show how DARMDN alleviates these limitations. The model, combined with a random shooting controller, establishes a new state of the art on the popular Acrobot benchmark. Our most interesting and counter-intuitive finding is that the “sincos” Acrobot system, which requires no multimodal posterior predictives, can be solved with a deterministic model, but only if that model is trained as a probabilistic model. A deterministic model trained to minimize MSE suffers from prediction error accumulation.
DARMDN: Deep autoregressive mixture density nets for dynamical system modelling
1. HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com
— Balazs Kegl, Gabriel Hurtado, Albert Thomas
for Noah's Ark Research Lab, Paris
2. HUAWEI TECHNOLOGIES CO., LTD. Page 2
Develop neural simulators
trained on short system logs
Objective
B. Kegl / Huawei Research France
3. HUAWEI TECHNOLOGIES CO., LTD. Page 3
Why?
Automate engineering systems
› Data center cooling
› Wireless parameter tuning
› Wifi setup
Predictive maintenance
› Copper and optical end-user devices
› Wireless network devices
› Data center servers
We believe these are only the
tip of the iceberg
6. HUAWEI TECHNOLOGIES CO., LTD. Page 6
Physical systems do not get faster with time
System access is tightly controlled by engineers whose responsibility is to
keep the systems running
Why is it hard?
[Diagram labels: BU, Engineer, System; action $\boldsymbol{a}_t$, observation $\boldsymbol{o}_t$, reward $r_t$]
Micro-data!!! reinforcement learning
7. HUAWEI TECHNOLOGIES CO., LTD. Page 7
Generative time-series predictors (= neural system models)
› Sample efficient: can be learned on a couple of thousand time steps
› Introspective and well-calibrated: honest about their own uncertainty
Control and exploration using system models
› Basic model predictive control (random shooting)
› Active sampling and exploration
› Learn the control agent
› Multi-agent control and transfer learning
Landing
› Wireless parameter tuning
› Data center cooling
› Diagnostics and debugging tools usable by engineers
Research program
8. HUAWEI TECHNOLOGIES CO., LTD. Page 8
Predict (random) future from history of system observables and control
actions:
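$$\boldsymbol{o}_{t+1} \sim p\big(\underbrace{\boldsymbol{o}_{t+1}}_{\boldsymbol{y}} \mid \underbrace{(\boldsymbol{o}_1, a_1), \ldots, (\boldsymbol{o}_t, a_t)}_{\boldsymbol{x}}\big)$$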
› We want to simulate
multiple futures from the model
Objective of neural system models
9. HUAWEI TECHNOLOGIES CO., LTD. Page 9
Generative regression: predict $\boldsymbol{y} \sim p(\boldsymbol{y} \mid \boldsymbol{x})$ instead of $\boldsymbol{y} = f(\boldsymbol{x})$
› Predictors that are honest about their uncertainty: introspective models
Requirements
› Both 𝒙 and 𝒚 are multidimensional
› Training should scale well with the dimension of 𝒙 and 𝒚 and the size of the training data
› Easy to compute likelihood
› Easy to sample (simulate)
› Able to model y-interdependence
› Able to model different types of variables
› Frequent semi-automatic retraining and retuning: robustness and debuggability
Objective
10. HUAWEI TECHNOLOGIES CO., LTD. Page 10
Can AI learn physics (of a system) from data?
[Figure labels: $\theta_1$, $\theta_2$]
11. HUAWEI TECHNOLOGIES CO., LTD. Page 11
Yes it can!
Which one is the physical model and which one is AI?
You can vote in the chat window: AI is left or right?
13. HUAWEI TECHNOLOGIES CO., LTD. Page 13
1. Explicit summary of history $\boldsymbol{s}_t = \boldsymbol{f}_{\mathrm{FE}}\big((\boldsymbol{o}_1, a_1), \ldots, (\boldsymbol{o}_t, a_t)\big)$
› Simplifies the time series problem into "classical" prediction
› System engineers can input prior knowledge
› Can be fine-tuned using end to end training or extended to RNNs
2. Autoregression $p(\boldsymbol{y} \mid \boldsymbol{x}) = p_1(y^1 \mid \boldsymbol{x}) \prod_{j=2}^{d} p_j(y^j \mid y^1, \ldots, y^{j-1}, \boldsymbol{x})$
› Fighting curse of dimensionality:
» We reduce the 𝑑-dimensional model into 𝑑 one-dimensional models
› We can tune the models separately:
» unlike e.g. images, system logs may have varying column types
› Modelling y-interdependence: $p(y^1 \mid \boldsymbol{x})$ and $p(y^2 \mid \boldsymbol{x})$ can be strongly dependent in physical systems
3. Mixture model $p(y \mid \boldsymbol{x}) = \sum_{\ell=1}^{L} w_\ell(\boldsymbol{x})\, \mathcal{P}_\ell\big(y; \theta_\ell(\boldsymbol{x})\big)$
› Simple: easy to compute likelihood, easy to simulate from
› Versatile: can use prior knowledge (component type), can approximate any density
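The autoregressive and mixture decompositions above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the components are taken to be one-dimensional Gaussians, and `params_dim1` and `params_dim2` are hypothetical hand-written functions standing in for the neural net that outputs $w_\ell(\boldsymbol{x})$ and $\theta_\ell(\boldsymbol{x})$. The sketch shows why both the likelihood and sampling stay simple: each is a loop over the output dimensions.

```python
import numpy as np

def gaussian_mixture_logpdf(y, weights, means, sigmas):
    """Log density of a scalar y under a 1-D Gaussian mixture."""
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi)
        - np.log(sigmas)
        - 0.5 * ((y - means) / sigmas) ** 2
    )
    # log-sum-exp over the components for numerical stability
    m = np.max(log_comp)
    return m + np.log(np.sum(np.exp(log_comp - m)))

def autoregressive_logpdf(y, x, param_fns):
    """log p(y | x) as a sum over dimensions j of log p_j(y^j | y^1..y^{j-1}, x)."""
    total = 0.0
    for j, param_fn in enumerate(param_fns):
        weights, means, sigmas = param_fn(x, y[:j])
        total += gaussian_mixture_logpdf(y[j], weights, means, sigmas)
    return total

def autoregressive_sample(x, param_fns, rng):
    """Draw y one dimension at a time, feeding sampled values forward."""
    y = np.zeros(len(param_fns))
    for j, param_fn in enumerate(param_fns):
        weights, means, sigmas = param_fn(x, y[:j])
        component = rng.choice(len(weights), p=weights)
        y[j] = rng.normal(means[component], sigmas[component])
    return y

# Hypothetical parameter functions (a real model would use a neural net here).
def params_dim1(x, y_prev):
    # bimodal first output: two equally weighted modes around x[0] - 1 and x[0] + 1
    return np.array([0.5, 0.5]), np.array([-1.0, 1.0]) + x[0], np.array([0.1, 0.1])

def params_dim2(x, y_prev):
    # second output depends strongly on the first one (y-interdependence)
    return np.array([1.0]), np.array([2.0 * y_prev[0]]), np.array([0.05])

rng = np.random.default_rng(0)
x = np.array([0.0])
y = autoregressive_sample(x, [params_dim1, params_dim2], rng)
log_p = autoregressive_logpdf(y, x, [params_dim1, params_dim2])
```

In this toy setup the second output always lands close to twice the first one, whichever mode the first output falls into, which is the kind of dependence a product of independent per-dimension densities could not represent.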
Why the decompositions?
14. HUAWEI TECHNOLOGIES CO., LTD. Page 14
Any regressor + fixed sigma: $p(y \mid \boldsymbol{x}) = \mathcal{N}\big(f(\boldsymbol{x}; \boldsymbol{\theta}), \sigma\big)$
› Linear regression
› Classical neural nets
We learn the parameters (𝑤(𝒙) and 𝜃(𝒙)) with a deep neural net:
deep autoregressive mixture density nets = DARMDN ("darm-dee-en")
› DARMDN(1) with a single Gaussian component: heteroscedastic $p(y \mid \boldsymbol{x}) = \mathcal{N}\big(\mu(\boldsymbol{x}), \sigma(\boldsymbol{x})\big)$
› DARMDN(10)
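To see how the fixed-sigma regressors relate to the single-component DARMDN(1), it can help to write out the per-sample negative log-likelihood of one scalar output. The expression below assumes that $\sigma$ denotes a standard deviation (the slides do not say whether it is a standard deviation or a variance).

```latex
-\log \mathcal{N}\big(y;\, \mu(\boldsymbol{x}), \sigma(\boldsymbol{x})\big)
  = \frac{\big(y - \mu(\boldsymbol{x})\big)^2}{2\,\sigma(\boldsymbol{x})^2}
  + \log \sigma(\boldsymbol{x})
  + \tfrac{1}{2}\log 2\pi
```

With a constant $\sigma$, only the first term depends on the regressor, so maximum likelihood reduces to least squares. With a learned $\sigma(\boldsymbol{x})$, each squared residual is weighted by $1/\sigma(\boldsymbol{x})^2$, while the $\log \sigma(\boldsymbol{x})$ term penalizes inflated uncertainty.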
Non-autoregressive models
› Gaussian process
› DMDN(10): classical mixture density nets with multivariate Gaussian components [Bishop 1994]
› Both assume y-independence
How do we learn the model?
15. HUAWEI TECHNOLOGIES CO., LTD. Page 15
What is y-interdependence and why is it important?
[Figure: axes $\sin\theta$ and $\cos\theta$; panels GP, DMDN(5), DARMDN(1)]
17. HUAWEI TECHNOLOGIES CO., LTD. Page 18
Approximation capacity in system modelling
› We want to be able to represent the real system dynamics efficiently
› We also want to have a realistic representation of uncertainty ("plausible futures") to support exploration
"Raw angles" acrobot
› Normally angles are transformed using sine and cosine to make the system dynamics smooth
› What if we are agnostic? We do not know if a system variable is an angle
› Abrupt jumps are OK, but if we have (epistemic) uncertainty, posteriors need to be multimodal
Is multi-modal posterior predictive important?
18. HUAWEI TECHNOLOGIES CO., LTD. Page 19
Is multi-modal posterior predictive important?
"raw angles" acrobot
19. HUAWEI TECHNOLOGIES CO., LTD. Page 20
› Baseline density ℒb is a multivariate unconditional spherical Gaussian
› Measures how much more likely the data is under the learned model than under the baseline
› Baseline = 1; the higher the better, with no upper limit
Evaluation
Likelihood ratio to simple baseline
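$$\mathrm{LR}\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T};\, p\big) = \frac{e^{\mathcal{L}\left(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T};\, p\right)}}{e^{\mathcal{L}_b\left(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}\right)}}$$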
Log Likelihood
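$$\mathcal{L}\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T};\, p\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \Big( \log p_1\big(o_{t+1}^{1} \mid \boldsymbol{s}_t\big) + \sum_{j=2}^{4} \log p_j\big(o_{t+1}^{j} \mid \boldsymbol{s}_t, o_{t+1}^{1}, \ldots, o_{t+1}^{j-1}\big) \Big)$$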
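The likelihood ratio can be computed from the model's mean log-likelihood once a baseline is fitted. The sketch below is one possible reading of "multivariate unconditional spherical Gaussian": a per-dimension mean and a single variance shared by all dimensions, both estimated from the same targets. The function names and these fitting choices are assumptions, not details given on the slides.

```python
import numpy as np

def spherical_gaussian_mean_loglik(targets):
    """Mean log-likelihood of an unconditional spherical Gaussian baseline.

    Assumption: per-dimension mean and one variance shared by all
    dimensions, both estimated from `targets` itself.
    """
    targets = np.asarray(targets, dtype=float)
    n, d = targets.shape
    mean = targets.mean(axis=0)
    var = np.mean((targets - mean) ** 2)             # single shared variance
    sq_dist = np.sum((targets - mean) ** 2, axis=1)  # squared distance per sample
    loglik = -0.5 * d * np.log(2.0 * np.pi * var) - 0.5 * sq_dist / var
    return loglik.mean()

def likelihood_ratio(model_mean_loglik, targets):
    """LR = exp(L) / exp(L_b), computed as exp(L - L_b) for stability."""
    baseline = spherical_gaussian_mean_loglik(targets)
    return np.exp(model_mean_loglik - baseline)
```

A model that is no better than the baseline scores exactly 1, and every additional nat of mean log-likelihood multiplies the ratio by $e$.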
20. HUAWEI TECHNOLOGIES CO., LTD. Page 21
Results on skewed acrobot data
Acrobot "sincos", data generated with linear policy; time series, 5K training points

Algorithm | Likelihood ratio to spherical Gaussian | Precision (R²) after 10 steps | Calibratedness (Kolmogorov-Smirnov) after 10 steps
Linear regression + constant sigma | 2 | 4% | 0.127
Gaussian process | 56 | 83% | 0.133
NN regression + constant sigma | 32 | 55% | 0.194
DMDN with 10 components | 95 | 90% | 0.128
DARMDN with 10 components | 119 | 87% | 0.095
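A Kolmogorov-Smirnov calibration score like the one in the last column can be obtained along the following lines. This is a sketch under assumptions: it evaluates one-step Gaussian predictions rather than the 10-step predictions of the table, and it measures the distance between the predicted quantiles of the observed values (their probability integral transform) and the uniform distribution. The exact procedure behind the reported numbers is not given on the slides.

```python
import math
import numpy as np

def gaussian_cdf(y, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma, at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def ks_to_uniform(pit_values):
    """Kolmogorov-Smirnov distance between PIT values and Uniform(0, 1)."""
    z = np.sort(np.asarray(pit_values, dtype=float))
    n = len(z)
    upper = np.arange(1, n + 1) / n - z  # empirical CDF just after each point
    lower = z - np.arange(0, n) / n      # empirical CDF just before each point
    return float(max(upper.max(), lower.max()))

# Toy check: data drawn from N(0, 1), scored under two predictive models.
rng = np.random.default_rng(0)
observations = rng.normal(0.0, 1.0, size=2000)
ks_calibrated = ks_to_uniform([gaussian_cdf(y, 0.0, 1.0) for y in observations])
ks_overconfident = ks_to_uniform([gaussian_cdf(y, 0.0, 0.5) for y in observations])
```

A well-calibrated model yields a small distance, while the overconfident model (sigma too small) pushes the predicted quantiles towards 0 and 1 and gets a clearly larger one.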
DARMDN is both precise and well-calibrated
OK, but does it matter for model-based RL?
21. HUAWEI TECHNOLOGIES CO., LTD. Page 22
1. Collect samples from a random policy
2. Train model on collected samples
3. Learn control policy on the model
4. Apply control policy on real system and collect the data, go back to 2.
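The four steps above can be sketched end to end on a toy problem. Everything specific here is a stand-in chosen for brevity and not taken from the talk: a hypothetical one-dimensional linear system, a least-squares linear model in place of DARMDN, and a one-step lookahead over random candidate actions in place of a full planner.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_system_step(obs, action):
    """Hypothetical 1-D system standing in for the real one."""
    return 0.9 * obs + action + 0.01 * rng.normal()

def reward(obs):
    """Hypothetical goal: keep the observable close to 1."""
    return -abs(obs - 1.0)

def fit_model(data):
    """Least-squares linear model o' = w0 * o + w1 * a (stand-in for DARMDN)."""
    inputs = np.array([[o, a] for o, a, _ in data])
    targets = np.array([o_next for _, _, o_next in data])
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights

def model_policy(weights, obs, n_candidates=50):
    """Pick the random candidate action with the best predicted next reward."""
    actions = rng.uniform(-1.0, 1.0, size=n_candidates)
    predicted = weights[0] * obs + weights[1] * actions
    return actions[np.argmax(-np.abs(predicted - 1.0))]

def run_episode(policy, n_steps=50):
    """Run a policy on the real system, logging (obs, action, next_obs)."""
    obs, data, total = 0.0, [], 0.0
    for _ in range(n_steps):
        action = policy(obs)
        next_obs = real_system_step(obs, action)
        data.append((obs, action, next_obs))
        total += reward(next_obs)
        obs = next_obs
    return data, total

# 1. Collect samples from a random policy
data, random_return = run_episode(lambda obs: rng.uniform(-1.0, 1.0))
returns = []
for episode in range(3):
    weights = fit_model(data)                             # 2. train the model
    policy = lambda obs, w=weights: model_policy(w, obs)  # 3. policy on the model
    new_data, episode_return = run_episode(policy)        # 4. act on the real system
    data.extend(new_data)                                 #    and go back to 2
    returns.append(episode_return)
```

Even in this toy version the key property of the loop shows up: all learning and planning happens on the logged data, and the real system is only touched when an episode is executed.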
Model-based RL loop
We retrain the model after each episode of 200 steps
Control policy is classical random shooting (RS) [Richards 2005]
› Simulate trajectories of 𝑁 = 10 steps using random actions
› Select the optimal trajectory (with the highest reward after 𝑁 steps)
› Execute the first action of the optimal trajectory
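The three random shooting steps above can be written as a short planner. This is a minimal sketch, not the authors' implementation: `sample_next_obs`, `reward_fn`, `action_set` and the number of simulated trajectories are hypothetical names and values, and a trajectory is scored by the reward of its final state, which is one literal reading of "highest reward after N steps" (summing rewards along the trajectory is a common alternative).

```python
import numpy as np

def random_shooting_action(obs, sample_next_obs, reward_fn, action_set,
                           rng, n_trajectories=100, horizon=10):
    """Return the first action of the best simulated random trajectory."""
    best_score, best_first_action = -np.inf, None
    for _ in range(n_trajectories):
        # random action sequence of length `horizon`
        actions = rng.choice(action_set, size=horizon)
        sim_obs = obs
        for action in actions:
            # one sampled future step from the (probabilistic) system model
            sim_obs = sample_next_obs(sim_obs, action, rng)
        score = reward_fn(sim_obs)  # reward of the state reached after `horizon` steps
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

# Toy usage with a hypothetical noisy integrator as the system model.
def toy_model(obs, action, rng):
    return obs + action + 0.01 * rng.normal()

rng = np.random.default_rng(0)
action = random_shooting_action(
    obs=0.0,
    sample_next_obs=toy_model,
    reward_fn=lambda o: -abs(o - 3.0),
    action_set=np.array([-1, 0, 1]),
    rng=rng,
)
```

Only the first action is executed on the real system; at the next time step the planner is called again from the newly observed state.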
22. HUAWEI TECHNOLOGIES CO., LTD. Page 23
Acrobot "raw angles"
DARMDN with random shooting is the new SOTA
› Almost as good as planning using the real system dynamics
› Converges 2 to 4 times faster than previous SOTA
[Figure annotations: ×4, ×2]
24. HUAWEI TECHNOLOGIES CO., LTD. Page 25
Learnt policy after ~10k samples
25. HUAWEI TECHNOLOGIES CO., LTD. Page 26
Deterministic predictors
Do we really need to represent uncertainty?
$p(y \mid \boldsymbol{x}) = \mathrm{Dirac}\big(f(\boldsymbol{x}; \boldsymbol{\theta})\big)$
What models?
› NNdet: classical neural net
› DARMDN(10)det: the mean of the posterior predictive, that is,
a deterministic model learned probabilistically
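For a single output, the deterministic prediction extracted from a trained mixture can be written explicitly. The formula below assumes Gaussian components with means $\mu_\ell(\boldsymbol{x})$, and $f_{\mathrm{det}}$ is a name introduced here for illustration. How the mean is propagated through the autoregressive chain is not specified on the slides.

```latex
f_{\mathrm{det}}(\boldsymbol{x})
  = \mathbb{E}\left[\, y \mid \boldsymbol{x} \,\right]
  = \sum_{\ell=1}^{L} w_\ell(\boldsymbol{x})\, \mu_\ell(\boldsymbol{x})
```

The weights and means are still fitted by maximizing the likelihood of the full mixture; only at prediction time is the distribution collapsed to its mean.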
26. HUAWEI TECHNOLOGIES CO., LTD. Page 27
Acrobot "raw angle": no surprise
deterministic models are suboptimal
28. HUAWEI TECHNOLOGIES CO., LTD. Page 29
Acrobot "sincos": what?
Deterministic model is optimal but only if learned probabilistically
32. HUAWEI TECHNOLOGIES CO., LTD. Page 33
Model-based control, bandits, and reinforcement learning
› Learn to control the system in a sample efficient way:
» "real world will not become faster in a few years, contrary to computers"
[Chatzilygeroudis et al., 2019]
› State of the art suffers from the lack of efficient system modelling tools
› Modelling uncertainties is crucial for safety
Bayesian optimization
› Require good and efficient models to quantify uncertainty due to unknown
Transfer learning, meta-learning, and robust reinforcement learning
› Precise probabilistic system models make it possible to transfer models between systems of the same kind
Anomaly detection
› Anomaly = system state is beyond "likely" behavior
Broader applications of DARMDN
33. HUAWEI TECHNOLOGIES CO., LTD. Page 34
Deep autoregressive mixture density nets (DARMDN) + random shooting is the new
SOTA on Acrobot
Autoregression is useful for modelling y-interdependence
Multimodal posterior predictive is necessary on "raw angles" representation
Deterministic DARMDN is as good as stochastic models on the "sincos"
representation and beats an NN model trained with a deterministic (RMSE) loss
› Something happens over the long horizon: no error accumulation for the probabilistically trained model
› Perhaps heteroscedastic epistemic uncertainty models may "let outliers go"?
Conclusions