HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com
Introduction to model-based reinforcement learning
Towards self-driving engineering systems
Balazs Kegl, Noah's Ark Research Lab, Paris
Joint work with Albert Thomas, Gabriel Hurtado, and Othman Gaizi
Who am I?
• AI research veteran (25 years)
› recently crossed over from academic research to industry
• In my last 5 years at CNRS, I became interested in the human aspects of AI tech transfer
› Within the scientific world: getting machine learning pipelines into the sciences (astrophysics, medical sciences, climate sciences, economics, etc.)
› It turned out that the management and organizational issues are very similar in industry
› The ultimate question: what should we work on?
• Leading a team of 15 at Huawei Noah's Ark Lab in Paris
› Research scientists, research engineers, PhD students
› Partly doing AI research, partly solving business-unit (BU) problems
https://www.linkedin.com/in/balazs.kegl
https://twitter.com/balazs.kegl
https://balazskegl.medium.com/
Noah's Ark Paris team
• Composition
› 8 permanent researchers: Balazs Kegl (lead), Merwan Barlier, Chunchun Yang, Igor Colin, Ludovic Dos Santos, Albert Thomas, Aladin Virmaux, Cedric Malherbe
› 3 research engineers: Illyyne Saffar, Gabriel Hurtado, Martin Tabikh
› 3 PhD students: George Dasoulas, Geovani Rizk, Paul Daoudi
• Expertise
› machine learning, optimization, reinforcement learning, deep learning, distributed and multi-agent algorithms, robust ML, graph theory, AutoML, transfer learning
• Growth
› 7 in Jan. 2018, 11 in Nov. 2019, 13 in 2020, 18 in 2021
Part I: The why
"The concept of interpretation is all here: there is no experience of truth that is not interpretative. I do not know anything that does not interest me. If it does interest me, it is evident that I do not look at it in a noninterested way."
Gianni Vattimo, After the Death of God (talking about Heidegger)
• My dream: move AI from a propositional (function learning) paradigm towards a procedural (goal-oriented) paradigm that incorporates data collection
• My day job: self-driving engineering systems
• Also: in basically all successful industrial ML pipelines, supervised learning is embedded in a frequent retraining/retuning loop
The big questions
• How does AI generate value?
• What problems should we solve?
› Most AI research is about improving solutions to well-defined problems
• How do we make sure that the solutions are useful within organizational and management constraints?
› Derive the problems from the imagined workflow in which the solution will be used
› This is non-technical expertise: we also need organizational experts
› https://towardsdatascience.com/how-to-build-a-data-science-pipeline-f24341848045
B. Kegl / Huawei Research France
Meta
• Not a typical tutorial
› No breadth
› Rather a historical walk through our research process (~2 years)
› No theory (math, bounds), only intuitions (based on solid theoretical ground)
› Rather a mix of engineering and experimental scientific methodology, to optimize and to learn:
» Identify the problem to solve
» Look around for solutions
» Design solutions
» Design well-controlled experiments to understand the properties of the solutions
• Q&A and discussion format is the zeitgeist
› There is no stupid question: if you don't understand something, chances are that half of the class doesn't either
AI: Highly visible recent breakthroughs
Why are these advances not already in engineering systems?
A typical engineering control system
[Figure: control loop in which the engineer sends actions 𝒂𝑡 to the system and receives observations 𝒐𝑡 and rewards 𝒓𝑡]
The engineer observes system states and performance indicators and tunes some parameters from time to time to optimize the performance indicators.
Engineering systems: ~tens of trillions of dollars per year
Our use cases
• Autopilots for engineering systems
› Data center cooling
› Wireless parameter tuning
› Wi-Fi setup
• Making them safer, better, more reliable, and more energy efficient
• We believe these are only the tip of the iceberg
Automated control, where it exists, is based on a deep understanding of the physics of the system.
Sometimes it goes wrong
But mostly it works (it just doesn't learn)
What is AI (in this context)?
Learn the system behavior based on historical data and use it for better control
A typical conversation between a data scientist (DS) and a BU systems engineer (SE)
• SE: “I would like you to land AI to control my engineering system.”
• DS: “OK, can I access your system with an algorithm which takes control of the system, possibly breaking it sometimes in order to learn?”
• SE: “Over my dead body.”
• DS: “OK, do you have a simulator which I can use to learn a control policy?”
• SE: “We are working on it. But in any case, it will never be good enough to be trusted.”
• DS: “Can you execute a new control policy from time to time, after thorough checking and with human safeguards, and log the system variables and KPIs?”
• SE: “Maybe.”
What has just happened?
• The systems engineer thinks in classical tech-transfer project-management terms
› Systems engineer specifies a problem
› Researcher solves it and delivers technology
• The data science process requires R&D iteration
› Systems engineer specifies a problem
› Data scientist describes what data/simulator/system she needs
› Together they design tools to provide/annotate data and interfaces to AI algorithms
› Data scientist designs algorithms, pipelines, experiments, metrics
› They iterate
Controlled engineering system: organizational constraints
• Offline (batch): learning from system traces (logs)
• Micro-data: physical systems; high-quality logging is not a priority
• Safety: we cannot "lose" while learning
"The real world will not become faster in a few years, contrary to computers"
Part II: The what
Iterated offline/batch RL
• Realistic:
› Fits the organizational scenario we can hope to implement
› Technically doable
› Not well studied in research (despite the trillion-dollar market)
Model-based offline RL
• Why?
› Considered the best approach in the micro-data regime
› We do not waste predictive power (unlike, e.g., on images)
› System models (simulators) are useful on their own
› Self-supervision in RL
Model-free offline RL
• Why?
› Better asymptotic performance (a goal to aim at with MBRL)
› Better researched, good baselines
› "Dyna-style" MBRL planners are essentially model-free algorithms run on the system model
Contextual bandits / Bayesopt (zero order)
• Why?
› Rewards at every step, short delay
Research themes (3-4 year plan)
• Models for dynamic systems
› Which models to choose, and based on what criteria?
› Separating epistemic and aleatory uncertainties: can we verify it? How?
› Heteroscedasticity at training time proved to be crucial. Why?
› Causality/action sensitivity: building models that lead to better treatment-effect estimation
› Summarizing history (context): prior knowledge, attention
› Distribution shift, transfer learning
› Data check, online or offline; "fear" reaction (unknown behavior)
• Model-free reinforcement learning
› Which model-free or planning agents to choose on system models?
» Robustness to covariate shift
» Criteria to choose
› Best model-free offline RL algorithms, especially in terms of sample complexity
› Which are the best contextual bandit/Bayesopt algorithms?
› How to explore in the "slow" iterated offline setup?
• Safety
› How to formulate and enforce safety?
› When learning and when deploying the learned agent
› How to set the desired safety level flexibly?
› How to add safety to the exploration policy?
• Multi-agent control
› Multiple non-interacting systems sharing their experience
› Transferring the learned model and agent from one system to another
› Interaction between the systems and the control agents
› Optimizing multi-system rewards in a fair way
• Policy evaluation and AutoML
› Toolbox, easy to use by a novice data scientist or systems engineer
› Policy evaluation to select and tune models
› Towards automating the process that learns the autopilot
Team members working on the themes: Albert, Balazs, Othman, Gabriel; Igor, Ludo, Merwan, Albert, Alexandre, Geovani; Ludo, Merwan, Paul; Merwan, Ludo, Igor
https://balazskegl.medium.com/building-autopilots-for-engineering-systems-using-ai-86a4f312c1f2
Subject of this course: a subset of the research themes above
https://balazskegl.medium.com/building-autopilots-for-engineering-systems-using-ai-86a4f312c1f2
Model-based offline RL
Typical use case
• Observables 𝒐
› ~10-100 dimensional, both internal (depending on actions) and external
› Mixed continuous, discrete, categorical; bounded or not
• Actions 𝒂
› ~1-100 dimensional
› Mixed continuous, discrete, categorical
• Rewards (called KPIs) 𝒓
› 1-10 dimensional, usually 𝒓 = 𝑓(𝒐), continuous, short delay
› Multi-dimensional constraints (safety) and targets
• History
› Chunks of length 1,000 to 100,000
› Missing sensors and time steps
Micro-data model-based RL needs reliable and scalable system models
System model = multi-output probabilistic (generative) time series forecaster
Research program
• Generative time-series predictors
› Sample efficient: can be learned on a couple of thousand time steps
› Introspective and well calibrated: honest about their own uncertainty
› Self-tuning and/or robust, from 100 to 100,000 training points
• Control and exploration using system models
› Basic model predictive control (random shooting)
› Active sampling and exploration
› Learn the control agent
• Landing
› Diagnostics and debugging tools usable by engineers
https://towardsdatascience.com/cabe95990664
System model = multi-output time series forecaster
• Predict the (random) future from the history of system observables and control actions:
$\boldsymbol{o}_{t+1} \sim p\big(\underbrace{\boldsymbol{o}_{t+1}}_{\boldsymbol{y}} \,\big|\, \underbrace{(\boldsymbol{o}_1, a_1), \ldots, (\boldsymbol{o}_t, a_t)}_{\boldsymbol{x}}\big)$
› We want to simulate multiple futures from the model
[Figure: the past trajectory up to the present, followed by several simulated futures and the ground-truth future]
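To make "simulating multiple futures" concrete, here is a minimal Python sketch. It assumes a one-step generative model exposed as a hypothetical function `sample_next(o, a, rng)` that draws 𝒐𝑡+1 given only the last observation and action (a Markov simplification of the full-history model above); `noisy_linear_model` is a made-up stand-in, not a learned system model.

```python
import numpy as np


def noisy_linear_model(o, a, rng, noise=0.1):
    """Hypothetical stand-in for a learned model: o_{t+1} = 0.9 * o_t + a_t + Gaussian noise."""
    return 0.9 * o + a + noise * rng.normal(size=o.shape)


def simulate_futures(o_t, planned_actions, sample_next, n_futures, rng):
    """Roll a one-step generative model forward to obtain n_futures sampled trajectories.

    o_t: current observation, shape (d,); planned_actions: the h actions to apply.
    Returns an array of shape (n_futures, h, d).
    """
    h, d = len(planned_actions), len(o_t)
    futures = np.empty((n_futures, h, d))
    for i in range(n_futures):
        o = np.asarray(o_t, dtype=float)
        for k, a in enumerate(planned_actions):
            # each step conditions on the previously *sampled* observation,
            # so uncertainty accumulates along the simulated trajectory
            o = sample_next(o, a, rng)
            futures[i, k] = o
    return futures


rng = np.random.default_rng(0)
futures = simulate_futures(np.zeros(2), [1.0] * 5, noisy_linear_model, n_futures=20, rng=rng)
print(futures.shape)  # (20, 5, 2)
```

The spread of `futures[:, -1]` across the 20 draws is exactly the information a deterministic forecaster cannot provide.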
Objective
• Generative regression: predict 𝒚 ~ 𝑝(𝒚 | 𝒙) instead of 𝒚 = 𝑓(𝒙)
› Predictors that are honest about their uncertainty: introspective models
• Requirements
› Both 𝒙 and 𝒚 are multidimensional
› Training should scale well with the dimensions of 𝒙 and 𝒚 and with the size of the training data
› Easy to compute the likelihood
› Easy to sample (simulate) from
› Able to model y-interdependence
› Able to model different types of variables
› Frequent semi-automatic retraining and retuning: robustness and debuggability
$\boldsymbol{o}_{t+1} \sim p\big(\underbrace{\boldsymbol{o}_{t+1}}_{\boldsymbol{y}} \,\big|\, \underbrace{(\boldsymbol{o}_1, a_1), \ldots, (\boldsymbol{o}_t, a_t)}_{\boldsymbol{x}}\big)$
Scientific questions I
• What model?
› Deterministic predictor + fixed-sigma Gaussian
› (Conditional) Gaussian (mixture)
› Autoregressive NNs and forests
› VAE
› GAN
› Flow models
Scientific questions II
• What are the important properties?
› Deterministic (classical predictors): 𝒚 ~ Dirac(𝒚 | 𝒙), i.e., 𝒚 = 𝒇(𝒙)
› Probabilistic: 𝒚 ~ 𝑝(𝒚 | 𝒙)
» Homoscedastic (the variance does not depend on the input): 𝒚 ~ 𝓝(𝒚 | 𝒇(𝒙), 𝝈)
» Heteroscedastic (the variance does depend on the input)
– Unimodal: 𝒚 ~ 𝓝(𝒚 | 𝒇(𝒙), 𝝈(𝒙))
– Multimodal: $\boldsymbol{y} \sim \sum_{\ell=1}^{L} w_\ell(\boldsymbol{x})\, \mathcal{P}_\ell\big(\boldsymbol{y}; \theta_\ell(\boldsymbol{x})\big)$
» y-interdependent (able to model the (inter)dependence of the components of 𝒚 given 𝒙)
What is y-interdependence and why might it be important?
[Figure: predictions of GP, DMDN(5), and DARMDN(1) plotted in the (sin 𝜃, cos 𝜃) plane]
Why generative models?
What is the probability of the world ending if I press this button?
Generative time series forecasting
• Why generative? Besides point forecasts, predictors should also predict their uncertainty.
• Uncertainties are important for decision making: should I plan an outdoor event?
› Instead of saying “tomorrow’s max temperature is 26 degrees, it will be sunny”, say “tomorrow’s max temperature is 26 ± 3 degrees, with a 10% chance of rain”.
• We also need to simulate from the forecasting models, for model-based control and optimization. When the forecast is consumed by a control or optimization module, uncertainty can be propagated through the deterministic optimizer or planner by executing it on several random simulated traces (“futures”). This is especially important when safety is at stake, since we need to model the probabilities of tail (extreme) events.
• Epistemic vs aleatory uncertainty
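A minimal sketch of propagating uncertainty through a deterministic routine, assuming the simulated futures are already stored in an array of shape (number of futures, horizon). `decide` stands for any deterministic optimizer or planner (hypothetical here); the tail probability is a plain Monte Carlo estimate with its binomial standard error, which also shows why rare events need many simulated futures.

```python
import numpy as np


def propagate(futures, decide):
    """Run a deterministic decision/optimization routine once per simulated future.

    The spread of the returned outputs is the forecast uncertainty pushed through `decide`.
    """
    return np.array([decide(f) for f in futures])


def tail_probability(futures, limit):
    """Monte Carlo estimate of P(a trace exceeds `limit` at some step), with its standard error."""
    exceed = (futures > limit).any(axis=tuple(range(1, futures.ndim)))
    p = exceed.mean()
    se = np.sqrt(p * (1.0 - p) / len(futures))  # binomial standard error of the estimate
    return p, se


futures = np.array([[0.0, 1.0], [0.0, 3.0], [2.0, 0.0], [0.0, 0.0]])  # 4 traces, 2 steps each
peaks = propagate(futures, decide=np.max)
p, se = tail_probability(futures, limit=1.5)
```

With only four futures the standard error equals half the estimate; estimating a 1% tail event to a useful precision needs thousands of simulated traces.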
Is a multimodal posterior predictive important?
• Approximation capacity in system modelling
› We want to be able to represent the real system dynamics efficiently
› We also want a realistic representation of uncertainty ("plausible futures") to support exploration
• "Raw angles" Acrobot
› Normally, angles are transformed using sine and cosine to make the system dynamics smooth
› What if we are agnostic, i.e., we do not know whether a system variable is an angle?
› Abrupt jumps are OK, but if we have (epistemic) uncertainty, posteriors need to be multimodal
Scientific questions III
• What to do with a good system model?
› Plug it into a planning algorithm: no learning (beyond learning the system model)
› Learn an agent on the model and send it back to the real system ("Dyna-style")
» Exploration (iterative batch!): a bad model and a bad agent can get stuck while seeming to have converged
» Planning: we may want to use the agent only to guide the planning algorithm, not directly on the real system
– When choosing the actions in the rollouts
– Bootstrapping with the learned value at the last step (instead of just summing up the rewards)
Part III/a: The how
The experimental setup
Engineering or experimental scientific approach?
› Both are based on experiments
› George Stephenson: makes sure the locomotive works, then optimizes it
› Carnot: understands the principles of thermodynamics, theorizes, designs experiments to (in)validate hypotheses
› We need to publish: the religion of SOTA
› We also want to study the properties of the best approach
› Strategy: go straight ahead to optimize, then come back and rigorously check what really matters (ablation)
› Let's start by optimizing the model with a simple planning algorithm, then move on to smart agents
› Business cases are out of reach for exhaustive experimentation; we first need to learn to master our algorithms on toy benchmarks
Which system(s) or env(s)?
• The broad approach
› Good overview, huge work, and very useful!
› Helped us choose a single env to start with
› Lacks in-depth understanding of individual envs and hyperparameter optimization (what do we learn other than which method works on which env?)
Which system(s) or env(s)?
• Our deep approach
› Choose a single env, understand and optimize it, reach SOTA beyond doubt
› We chose Acrobot
» Relatively simple but non-trivial: we could learn good system models on a couple of thousand training points
» Good model + simple planning is SOTA
» Previous SOTA happened to be very suboptimal
› Generalizability is in question: do our findings extend to other envs?
The benchmark system: Acrobot
System observables: 𝒐 = (𝜃2, 𝜃̇2, 𝜃1, 𝜃̇1)
Actions: torque at the second joint, 𝑎 ∈ {left, none, right}
Reward: height of the tip of the lower segment
› 0: hanging position
› 2: ceiling
› 4: top position
Raw angles system: 𝒐 = (𝜃2, 𝜃̇2, 𝜃1, 𝜃̇1), with jumps at ±π
Sincos system: 𝒐 = (sin 𝜃2, cos 𝜃2, 𝜃̇2, sin 𝜃1, cos 𝜃1, 𝜃̇1), with y-interdependence
[Figure: the two-link Acrobot with joint angles 𝜽𝟏 and 𝜽𝟐]
Can we learn a precise system model from data?
𝜽𝟏
𝜽𝟐
B. Kegl / Huawei Research France
𝒑(𝒐𝑡+1|(𝒐1, 𝑎1), … , (𝒐𝑡, 𝑎𝑡)) = 𝒑 𝒐𝑡+1 𝒐𝑡, 𝑎𝑡
Yes we can!
Which one is the physical model and which one is AI?
You can vote in the chat window: is AI on the left or on the right?
https://youtu.be/FHFz2ERB4eA
Let's jump ahead: what do we do if we have a model?
Remember that our goal is small sample complexity: use system access steps as efficiently as possible.
Model-based RL loop (iterative batch)
1. Collect samples from a random policy
2. Train the model on the collected samples
3. Learn (or just apply) a control policy on the model
4. Apply the control policy on the real system, collect the data, and go back to 2.
• We retrain the model after each episode of 200 steps
• The control policy is classical random shooting (RS) [Richards 2005]
› Simulate 𝑛 trajectories of ℎ steps using random actions
› Select the optimal trajectory (with the highest reward after ℎ steps)
› Execute the first action of the optimal trajectory
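Random shooting as described above can be sketched in a few lines. This is a minimal version under stated assumptions: `model_step(o, a)` is a hypothetical one-step model returning the next observation and the reward, which can be either the mean prediction (deterministic simulation) or a draw from the learned density; actions are discrete and encoded as 0..n_actions-1 (e.g., left/none/right); and a rollout is scored by its cumulative reward by default, with the final-step reward as an alternative.

```python
import numpy as np


def random_shooting(o0, model_step, n_actions, n=100, h=10, aggregate="sum", rng=None):
    """Random-shooting MPC: return the first action of the best of n random h-step rollouts.

    model_step(o, a) -> (next_o, reward): hypothetical one-step system model.
    aggregate: "sum" scores a rollout by its cumulative reward, "last" by its final-step reward.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.integers(n_actions, size=(n, h))  # n sequences of h random discrete actions
    scores = np.empty(n)
    for i in range(n):
        o, rewards = o0, []
        for k in range(h):
            o, r = model_step(o, actions[i, k])  # simulate on the model, not the real system
            rewards.append(r)
        scores[i] = sum(rewards) if aggregate == "sum" else rewards[-1]
    best = int(np.argmax(scores))
    # only the first action is executed; the planner is re-run at the next system step
    return int(actions[best, 0])
```

Usage: in the loop above, this function is called at every one of the 200 steps of an episode, with `model_step` wrapping the model retrained after the previous episode.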
Acrobot is a non-trivial system
• https://youtu.be/fgwQGTXgI1M
› Random policy, mean reward = 0.1 (it can reach up to 0.5, half the length of the lower link)
• https://youtu.be/X-qTJP5U78Q
› Suboptimal policy stuck below the horizon, mean reward = 1.56
• https://youtu.be/Rwrf7-46aUE
› A good policy that, until recently, we thought was impossible to beat in a 200-step episode, mean reward = 2.01
• https://youtu.be/XxiTVqxSS1o
› Currently optimal policy that stabilizes the Acrobot within the 200-step episode, mean reward = 2.56
Part III/b: The how
The metrics
Metrics
• We want high reward fast: "dynamic" metrics
› Unlike supervised learning, RL has no easily interpretable metrics
» Total reward depends on the env, the scale, and the number of steps
› Reliability: error bars (across episodes and seeds)
› (R)MAR: (relative) mean asymptotic reward, i.e., after convergence
› MRCP(70): mean reward convergence pace
• We want to train, tune, and compare models on "static" metrics
› That matter for dynamic performance
› Time series regression metrics: MSE and R2
› Generative metrics: likelihood, (calibratedness), and (outlier ratio)
› Long-horizon metrics: R2(h)
Dynamic metrics
[Figure: reward curves on a normalized scale (0 = mean reward of the random policy, 1 = mean reward of random shooting with h = 10, n = 100), with convergent and transient regimes marked; annotations include RMAR = 0.54 ± 0.03, RMAR = 1.23 ± 0.01, RMAR = 0.7, MRCP(70) = 1200 system access steps, and MRCP(70) = ∞]
• RMAR: Relative Mean Asymptotic Reward
• MRCP(70): Mean Reward Convergence Pace
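RMAR rescales a mean reward between the two reference policies that define 0 and 1. A minimal sketch, assuming the asymptotic regime is represented by the last `n_last` episodes (a hypothetical convergence window; MRCP is not sketched here).

```python
import numpy as np


def relative_mean_asymptotic_reward(episode_rewards, r_random, r_reference, n_last):
    """RMAR: mean reward over the last `n_last` episodes (assumed converged), rescaled so that
    0 = mean reward of the random policy and 1 = mean reward of the reference planner
    (random shooting with h = 10, n = 100 on the slide)."""
    mar = np.mean(episode_rewards[-n_last:])  # mean asymptotic reward
    return (mar - r_random) / (r_reference - r_random)
```

Values above 1 (such as the 1.23 in the figure) mean the method beats the reference planner asymptotically; error bars come from repeating this across seeds.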
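Static metrics: likelihood ratio to a simple baseline
Log-likelihood:
$$\mathcal{L}\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \log p(\boldsymbol{o}_{t+1} \mid \boldsymbol{o}_t, a_t)$$
Likelihood ratio:
$$LR\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p\big) = \frac{e^{\mathcal{L}(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p)}}{e^{\mathcal{L}_b(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T})}}$$
› ℒb is the log-likelihood under a multivariate unconditional spherical Gaussian
› Measures how much more likely the data is under the learned model than under the baseline
› Baseline = 1; the higher the better, no upper limit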
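A minimal sketch of the likelihood ratio, assuming the model's per-step log-likelihoods log 𝒑(𝒐𝑡+1 | 𝒐𝑡, 𝑎𝑡) have already been computed, and that the spherical Gaussian baseline is fitted to the evaluated trace itself (one mean vector and one shared variance); where the baseline is actually fitted is an assumption.

```python
import numpy as np


def spherical_gaussian_logpdf(x, mu, var):
    """log N(x; mu, var * I), evaluated row by row."""
    d = x.shape[-1]
    sq = np.sum((x - mu) ** 2, axis=-1)
    return -0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var


def baseline_log_likelihood(obs):
    """L_b: average log-likelihood of o_2..o_T under an unconditional spherical Gaussian."""
    mu = obs.mean(axis=0)
    var = obs.var(axis=0).mean()  # one shared variance -> spherical covariance
    return spherical_gaussian_logpdf(obs[1:], mu, var).mean()


def likelihood_ratio(model_step_logliks, obs):
    """LR = exp(L - L_b), where model_step_logliks[t] = log p(o_{t+1} | o_t, a_t), t = 1..T-1."""
    return np.exp(np.mean(model_step_logliks) - baseline_log_likelihood(obs))
```

Because both log-likelihoods are per-step averages, LR is a geometric-mean, per-step ratio, so it stays comparable across traces of different lengths.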
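Static metrics: R2 (variance explained)
Mean prediction, baseline variance, and MSE:
$$f_j(\boldsymbol{o}_t, a_t) = \mathbb{E}_{p_j}\big[o^j_{t+1} \mid \boldsymbol{o}_t, a_t\big], \qquad \sigma_j^2 = \mathrm{Var}\big(\{o^j_t\}_{t=1}^{T}\big)$$
$$\mathrm{MSE}_j\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \big(o^j_{t+1} - f_j(\boldsymbol{o}_t, a_t)\big)^2$$
$$R2\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p\big) = \frac{1}{d_{\boldsymbol{o}}} \sum_{j=1}^{d_{\boldsymbol{o}}} \left(1 - \frac{\mathrm{MSE}_j\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p\big)}{\sigma_j^2}\right)$$
› Baseline = 0; the higher the better, 1 is perfect
› Works on both deterministic and generative regressors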
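The R2 formulas above translate directly into a short NumPy function. The sketch assumes the model's mean predictions 𝑓𝑗(𝒐𝑡, 𝑎𝑡) are given as an array aligned with the targets 𝒐𝑡+1; for a generative model they could be the analytic means or Monte Carlo averages.

```python
import numpy as np


def r2_multi(obs, mean_pred):
    """Average per-dimension R2 of one-step mean predictions.

    obs: (T, d) observed trace; mean_pred: (T-1, d) with mean_pred[t] = f(o_t, a_t) predicting o_{t+1}.
    """
    targets = obs[1:]
    mse = np.mean((targets - mean_pred) ** 2, axis=0)  # MSE_j over t = 1..T-1
    var = np.var(obs, axis=0)                          # sigma_j^2 over the whole trace
    return np.mean(1.0 - mse / var)
```

Dividing by the per-dimension variance makes observables with very different scales contribute equally to the average.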
Static metrics
• Long-horizon metrics
› Models predict 𝒐𝑡+1 directly, but they can be cascaded: 𝒐𝑡+2 = 𝑓(𝑓(𝒐𝑡, 𝑎𝑡), 𝑎𝑡+1)
› The likelihood would need a convolution, but R2(h) can be computed using Monte Carlo
› We found that R2(10) correlates best with dynamic performance
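A Monte Carlo sketch of R2(h), assuming a hypothetical sampler `sample_next(o, a, rng)` and Markov dynamics: from each start 𝒐𝑡 the model is cascaded h times with the logged actions, the sampled 𝒐𝑡+ℎ values are averaged into a mean prediction, and the result is scored with the same per-dimension R2 as the one-step metric.

```python
import numpy as np


def r2_h(obs, actions, sample_next, h, n_samples=100, seed=0):
    """Monte Carlo estimate of the h-step R2.

    obs: (T, d) observations; actions: length-T sequence of logged actions.
    sample_next(o, a, rng) -> one draw of o_{t+1} given o_t, a_t (hypothetical model interface).
    """
    rng = np.random.default_rng(seed)
    T, d = obs.shape
    preds, targets = [], []
    for t in range(T - h):
        finals = np.empty((n_samples, d))
        for i in range(n_samples):
            o = obs[t]
            for k in range(h):
                o = sample_next(o, actions[t + k], rng)  # cascade the one-step model
            finals[i] = o
        preds.append(finals.mean(axis=0))  # Monte Carlo mean of o_{t+h}
        targets.append(obs[t + h])
    mse = np.mean((np.array(targets) - np.array(preds)) ** 2, axis=0)
    return np.mean(1.0 - mse / np.var(obs, axis=0))
```

Only the Monte Carlo mean is needed here, which is why R2(h) is cheap while the h-step likelihood is not.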
Part III/c: The how
The system models
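Formal model illustrated on Acrobot
System observables: 𝒐 = (𝜃2, 𝜃̇2, 𝜃1, 𝜃̇1)
Actions: torque at the second joint, 𝑎 ∈ {left, none, right}
Objective: learn 𝒑(𝒐𝑡+1 | 𝒐𝑡, 𝑎𝑡)
Decomposition 1 (autoregression):
$$p(\boldsymbol{o}_{t+1} \mid \boldsymbol{o}_t, a_t) = p_1\big(\theta^2_{t+1} \mid \boldsymbol{o}_t, a_t\big) \times p_2\big(\dot\theta^2_{t+1} \mid \boldsymbol{o}_t, a_t, \theta^2_{t+1}\big) \times p_3\big(\theta^1_{t+1} \mid \boldsymbol{o}_t, a_t, \theta^2_{t+1}, \dot\theta^2_{t+1}\big) \times p_4\big(\dot\theta^1_{t+1} \mid \boldsymbol{o}_t, a_t, \theta^2_{t+1}, \dot\theta^2_{t+1}, \theta^1_{t+1}\big)$$
Decomposition 2 (mixture model):
$$p(y \mid \boldsymbol{x}) = \sum_{\ell=1}^{L} w_\ell(\boldsymbol{x})\, \mathcal{P}_\ell\big(y; \theta_\ell(\boldsymbol{x})\big)$$
› 𝒫: component type (e.g., Gaussian)
› 𝑤: component weight
› 𝜃: component parameters (e.g., μ, 𝜎)
[Figure: the two-link Acrobot with joint angles 𝜽𝟏 and 𝜽𝟐]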
Why the decompositions?
• Autoregression: $p(\boldsymbol{y} \mid \boldsymbol{x}) = p_1(y_1 \mid \boldsymbol{x}) \prod_{j=2}^{d} p_j(y_j \mid y_1, \ldots, y_{j-1}, \boldsymbol{x})$
› Fighting the curse of dimensionality:
» We reduce the 𝑑-dimensional model to 𝑑 one-dimensional models
› We can tune the models separately:
» unlike, e.g., images, system logs may have varying column types
› Modelling y-interdependence: in physical systems, 𝑦1 and 𝑦2 can be strongly dependent given 𝒙
• Mixture model: $p(y \mid \boldsymbol{x}) = \sum_{\ell=1}^{L} w_\ell(\boldsymbol{x})\, \mathcal{P}_\ell\big(y; \theta_\ell(\boldsymbol{x})\big)$
› Simple: the likelihood is easy to compute, and it is easy to simulate from
› Versatile: can use prior knowledge (component type), can approximate any density
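Both advantages of the mixture are visible in a 1-D Gaussian mixture: the likelihood is a log-sum-exp over components, and sampling picks a component and then draws from it. This sketch assumes the parameters 𝑤ℓ(𝒙), 𝜇ℓ(𝒙), 𝜎ℓ(𝒙) have already been produced for a given 𝒙 (in a mixture density net they would be network outputs); the function names are illustrative.

```python
import numpy as np


def gmm_log_likelihood(y, weights, means, sigmas):
    """log p(y | x) for a 1-D Gaussian mixture with parameters already computed for this x."""
    weights, means, sigmas = map(np.asarray, (weights, means, sigmas))
    log_comp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi * sigmas ** 2)
                - 0.5 * ((y - means) / sigmas) ** 2)
    m = np.max(log_comp)
    return m + np.log(np.sum(np.exp(log_comp - m)))  # numerically stable log-sum-exp


def gmm_sample(weights, means, sigmas, n, rng):
    """Draw n samples: pick a component l with probability w_l, then y ~ N(mu_l, sigma_l)."""
    weights, means, sigmas = map(np.asarray, (weights, means, sigmas))
    comps = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comps], sigmas[comps])


# Two well-separated modes, e.g. an angle that may have wrapped around to either side of +/-pi.
samples = gmm_sample([0.5, 0.5], [-3.0, 3.0], [0.1, 0.1], 1000, np.random.default_rng(0))
```

With L = 1 this reduces to a heteroscedastic Gaussian; the bimodal example is the kind of posterior the raw-angle Acrobot calls for.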
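Evaluation
Log-likelihood (autoregressive decomposition, 𝑑𝒐 = 4 for Acrobot):
$$\mathcal{L}\big(\{(\boldsymbol{o}_t, a_t)\}_{t=1}^{T}; p\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \left[\log p_1\big(o^1_{t+1} \mid \boldsymbol{o}_t, a_t\big) + \sum_{j=2}^{4} \log p_j\big(o^j_{t+1} \mid \boldsymbol{o}_t, a_t, o^1_{t+1}, \ldots, o^{j-1}_{t+1}\big)\right]$$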
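The autoregressive log-likelihood above, and ancestral sampling from the same factorization, can be sketched with a toy model. The one-dimensional conditionals here are hypothetical linear-Gaussian models, swapped in for the mixture density nets of DARMDN to keep the example short; the input is 𝒙 = (𝒐𝑡, 𝑎𝑡).

```python
import numpy as np


class LinearGaussianAR:
    """Hypothetical toy autoregressive density model.

    p(y | x) = prod_j p_j(y_j | x, y_1..y_{j-1}), each p_j a 1-D Gaussian whose mean is linear
    in (x, y_<j). In DARMDN each p_j would instead be a mixture density net.
    """

    def __init__(self, x_dim, y_dim, rng, sigma=0.5):
        self.y_dim = y_dim
        # one weight vector per output dimension j, acting on [x, y_1..y_{j-1}]
        self.W = [rng.normal(scale=0.3, size=x_dim + j) for j in range(y_dim)]
        self.b = rng.normal(size=y_dim)
        self.s = np.full(y_dim, sigma)

    def _mean(self, j, x, y_prev):
        return self.W[j] @ np.concatenate([x, y_prev]) + self.b[j]

    def log_prob(self, y, x):
        """Chain rule: log p(y | x) = sum_j log p_j(y_j | x, y_<j)."""
        total = 0.0
        for j in range(self.y_dim):
            mu = self._mean(j, x, y[:j])
            total += -0.5 * np.log(2 * np.pi * self.s[j] ** 2) - 0.5 * ((y[j] - mu) / self.s[j]) ** 2
        return total

    def sample(self, x, rng):
        """Ancestral sampling: draw y_1, then y_2 given y_1, and so on."""
        y = np.empty(self.y_dim)
        for j in range(self.y_dim):
            y[j] = rng.normal(self._mean(j, x, y[:j]), self.s[j])
        return y


def trace_log_likelihood(model, obs, actions):
    """Average of log p(o_{t+1} | o_t, a_t) over t = 1..T-1, with x = [o_t, a_t]."""
    return np.mean([model.log_prob(obs[t + 1], np.append(obs[t], actions[t]))
                    for t in range(len(obs) - 1)])
```

Each conditional sees the already generated components, which is how the factorization captures y-interdependence while only ever fitting one-dimensional densities.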
How do we learn the model?
• Any regressor + fixed sigma: 𝑝(𝑦 | 𝒙) = 𝓝(𝒇(𝒙; 𝜽), 𝝈)
› Linear regression (ARLinσ)
› Classical neural nets (DARNNσ)
• We learn the parameters (𝑤(𝒙) and 𝜃(𝒙)) with a deep neural net: deep autoregressive mixture density nets = DARMDN ("darm-dee-en")
› DARMDN(1), with a single Gaussian component, is heteroscedastic: 𝑝(𝑦 | 𝒙) = 𝓝(𝝁(𝒙), 𝝈(𝒙))
› DARMDN(10) allows for multimodality
› PETS [Chua et al. 2018]: ensembled DARMDN(1)
• Non-autoregressive models
› Gaussian process
› DMDN(10): classical mixture density nets with multivariate Gaussian components [Bishop 1994]
› Both assume y-independence
› VAE, flow (RealNVP), GAN
• Deterministic models
› When we shoot in random shooting (using the model to simulate futures), we can choose between simulating from the mean and drawing from the conditional density
› DARNNdet, DARMDN(1)det, DARMDN(10)det, DMDN(10)det, PETSdet
What did we find?
Part III/d: The how
The smart agents
Scientific questions III
› We know that we can achieve the optimal policy with a longer horizon and more simulations
› 1. Can we simply learn an agent on the model and deploy it on the real system?
› The two tricks of AlphaGo: is it possible with fewer simulations and a shorter horizon if
» 2. the planning (search) is not random but guided by a smart agent?
» 3. the estimated reward is not the reward at the final step but the value estimate of the smart agent?
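Questions 2 and 3 can be illustrated by modifying random shooting. In this hypothetical sketch, `policy_probs(o)` stands for a learned agent's action distribution (used to draw rollout actions instead of uniform random ones) and `value(o)` for its value estimate (used to score a rollout instead of the final-step reward); `model_step(o, a)` is a one-step model returning the next observation and the reward. It is an illustration of the two ideas, not the authors' agent.

```python
import numpy as np


def guided_shooting(o0, model_step, policy_probs, value, n_actions, n=100, h=10, rng=None):
    """Agent-guided shooting: return the first action of the best of n guided h-step rollouts."""
    rng = np.random.default_rng() if rng is None else rng
    best_score, best_first = -np.inf, None
    for _ in range(n):
        o, first = o0, None
        for k in range(h):
            # trick 2: rollout actions are drawn from the agent, not uniformly at random
            a = int(rng.choice(n_actions, p=policy_probs(o)))
            if k == 0:
                first = a
            o, _reward = model_step(o, a)
        # trick 3: score the rollout by the agent's value estimate of the final state
        score = value(o)
        if score > best_score:
            best_score, best_first = score, first
    return best_first
```

With a uniform `policy_probs` and `value` equal to the final-step reward, this falls back to random shooting with final-step scoring.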
Can we learn a smart agent on the model and deploy it in the real system? NO
Can we assist the planning with a smart agent? YES
Conclusions
• Mixture density nets are optimal and versatile, especially the autoregressive type
• A multimodal generative model may be needed, depending on the env
• A deterministic model is slightly better if multimodality is not needed
• Heteroscedasticity is useful even when we use the deterministic mean at simulation time!
• y-interdependence does not seem to matter
• Smart agents + planning + exploration beats both smart agents alone and random-shooting planning
Thank you
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Management of Complexity in System Design of Large IT Solutions
Management of Complexity in System Design of Large IT SolutionsManagement of Complexity in System Design of Large IT Solutions
Management of Complexity in System Design of Large IT Solutions
 
Best Practices - Software Engineering
Best Practices - Software EngineeringBest Practices - Software Engineering
Best Practices - Software Engineering
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your System
 
Software Development and Quality
Software Development and QualitySoftware Development and Quality
Software Development and Quality
 
OpenEdge Character UI - Where to go?
OpenEdge Character UI - Where to go?OpenEdge Character UI - Where to go?
OpenEdge Character UI - Where to go?
 
A New Model for Testing
A New Model for TestingA New Model for Testing
A New Model for Testing
 
[2016/2017] RESEARCH in software engineering
[2016/2017] RESEARCH in software engineering[2016/2017] RESEARCH in software engineering
[2016/2017] RESEARCH in software engineering
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
Software Analytics = Sharing Information
Software Analytics = Sharing InformationSoftware Analytics = Sharing Information
Software Analytics = Sharing Information
 
Data-X-v3.1
Data-X-v3.1Data-X-v3.1
Data-X-v3.1
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
 
AE Foyer: Information Management in the Digital Enterprise
AE Foyer: Information Management in the Digital EnterpriseAE Foyer: Information Management in the Digital Enterprise
AE Foyer: Information Management in the Digital Enterprise
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
 

More from Balázs Kégl

Data-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsData-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsBalázs Kégl
 
Machine learning in scientific workflows
Machine learning in scientific workflowsMachine learning in scientific workflows
Machine learning in scientific workflowsBalázs Kégl
 
A historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksA historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksBalázs Kégl
 
Build your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBuild your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBalázs Kégl
 
RAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submissionRAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submissionBalázs Kégl
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesBalázs Kégl
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challengesBalázs Kégl
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)Balázs Kégl
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsBalázs Kégl
 
The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceBalázs Kégl
 

More from Balázs Kégl (10)

Data-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsData-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural nets
 
Machine learning in scientific workflows
Machine learning in scientific workflowsMachine learning in scientific workflows
Machine learning in scientific workflows
 
A historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricksA historical introduction to deep learning: hardware, data, and tricks
A historical introduction to deep learning: hardware, data, and tricks
 
Build your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBuild your own data challenge, or just organize team work
Build your own data challenge, or just organize team work
 
RAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submissionRAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submission
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiatives
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challenges
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
 
The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data Science
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Model-based reinforcement learning and self-driving engineering systems

  • 1. HUAWEI TECHNOLOGIES CO., LTD. www.huawei.com
    Introduction to model-based reinforcement learning: towards self-driving engineering systems
    Balazs Kegl, Noah's Ark Research Lab, Paris
    Joint work with Albert Thomas, Gabriel Hurtado, and Othman Gaizi
  • 2. Who am I?
    - AI research veteran (25 years)
      › recently crossed over from academic research to industry
    - In my last 5 years at CNRS I became interested in the human aspects of AI tech transfer
      › Within the scientific world: getting machine learning pipelines into the sciences (astrophysics, medical sciences, climate sciences, economics, etc.)
      › It turned out that the management and organizational issues are very similar in industry
      › The ultimate question: what should we work on?
    - Leading a team of 15 at Huawei Noah's Ark Lab in Paris
      › Research scientists, research engineers, PhD students
      › Partly doing AI research, partly solving BU (business unit) problems
    https://www.linkedin.com/in/balazs.kegl
    https://twitter.com/balazs.kegl
    https://balazskegl.medium.com/
  • 3. Noah's Ark Paris team
    - Composition
      › 8 permanent researchers: Balazs Kegl (lead), Merwan Barlier, Chunchun Yang, Igor Colin, Ludovic Dos Santos, Albert Thomas, Aladin Virmaux, Cedric Malherbe
      › 3 research engineers: Illyyne Saffar, Gabriel Hurtado, Martin Tabikh
      › 3 PhD students: George Dasoulas, Geovani Rizk, Paul Daoudi
    - Expertise
      › machine learning, optimization, reinforcement learning, deep learning, distributed and multi-agent algorithms, robust ML, graph theory, AutoML, transfer learning
    - Growth
      › from 7 members (Jan. 2018) to 11 (Nov. 2019), 13 (2020), and 18 (2021)
  • 4. Part I: The why
  • 5. "The concept of interpretation is all here: there is no experience of truth that is not interpretative. I do not know anything that does not interest me. If it does interest me, it is evident that I do not look at it in a noninterested way."
    (Gianni Vattimo, After the Death of God, talking about Heidegger)
    - My dream: move AI from a propositional (function-learning) paradigm towards a procedural (goal-oriented) paradigm that incorporates data collection
    - My day job: self-driving engineering systems
    - Also: in basically all successful industrial ML pipelines, supervised learning is embedded in a frequent retraining/tuning loop
  • 6. The big questions
    - How does AI generate value?
    - What problems should we solve?
      › Most AI research improves solutions to well-defined problems
    - How do we make sure that the solutions are useful within the organizational and management constraints?
      › Derive the problems from the imagined workflow in which the solution will be used
      › This requires non-technical expertise: we also need organizational experts
      › https://towardsdatascience.com/how-to-build-a-data-science-pipeline-f24341848045
    B. Kegl / Huawei Research France
  • 7. Meta
    - Not a usual tutorial
      › No breadth
      › Rather, a historical walk through our research process (~2 years)
      › No theory (math, bounds), only intuitions (based on solid theoretical ground)
      › A mix of engineering and experimental scientific methodology, used both to optimize and to learn:
        » Identify the problem to solve
        » Look around for solutions
        » Design solutions
        » Design well-controlled experiments to understand the properties of the solutions
    - A Q&A, discussion format is the zeitgeist
      › There are no stupid questions: if you don't understand something, chances are that half of the class doesn't either
  • 8. AI: Highly visible recent breakthroughs
  • 9. Why are these advances not already in engineering systems?
  • 10. A typical engineering control system
    [Figure: control loop in which the engineer sends actions $\mathbf{a}_t$ to the system and receives observations $\mathbf{o}_t$ and rewards $\mathbf{r}_t$]
    The engineer observes system states and performance indicators and tunes some parameters from time to time to optimize the performance indicators.
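    To make the loop concrete, here is a minimal Python sketch of the observe/act interface: at each step the controller chooses an action and the system returns an observation and a reward. ToySystem, engineer_policy, the dynamics, and the "stay close to 25 degrees" KPI are all hypothetical, chosen only to show the kind of log (trace) that the engineer's manual tuning produces.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Hypothetical stand-in for an engineering system (not from the slides)."""
    temperature: float = 25.0
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def step(self, action: float) -> tuple[float, float]:
        load = self.rng.uniform(0.0, 1.0)          # external disturbance
        self.temperature += load - 0.5 * action    # internal dynamics
        observation = self.temperature             # o_t
        reward = -abs(self.temperature - 25.0)     # r_t: KPI, stay close to 25
        return observation, reward


def engineer_policy(observation: float, action: float, t: int, every: int = 10) -> float:
    """Hypothetical rule of thumb: only touch the knob every `every` steps."""
    if t % every != 0:
        return action
    if observation > 25.0:
        return action + 0.5
    return max(0.0, action - 0.5)


system = ToySystem()
action = 1.0
observation, reward = system.step(action)
trace = []  # (t, a_t, o_t, r_t): the log a learning agent would later use
for t in range(1, 101):
    action = engineer_policy(observation, action, t)
    observation, reward = system.step(action)
    trace.append((t, action, observation, reward))
```

    The point of the sketch is the interface, not the controller: whatever replaces engineer_policy (a human, a rule, or a learned agent) consumes observations and produces actions, and the logged trace is the only data available offline.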
  • 11. Engineering systems = ~$10s of trillions per year
  • 12. Our use cases
    - Autopilots for engineering systems
      › Data center cooling
      › Wireless parameter tuning
      › Wi-Fi setup
    - Making them
      › Safer, better, more reliable, more energy efficient
    - We believe these are only the tip of the iceberg
  • 13. Automated control, if it exists, is based on a deep understanding of the physics of the system.
  • 14. Sometimes it goes wrong
  • 15. But mostly it works (it just doesn't learn)
  • 16. What is AI (in this context)?
    Learn the system behavior from historical data and use it for better control
  • 17. A typical conversation between a data scientist (DS) and a BU systems engineer (SE)
    - SE: “I would like you to land AI to control my engineering system.”
    - DS: “OK, can I access your system with an algorithm that takes control of the system, possibly breaking it sometimes in order to learn?”
    - SE: “Over my dead body.”
  • 18. A typical conversation between a data scientist (DS) and a BU systems engineer (SE)
    - DS: “OK, do you have a simulator that I can use to learn a control policy?”
    - SE: “We are working on it. But in any case, it will never be good enough to be trusted.”
  • 19. A typical conversation between a data scientist (DS) and a BU systems engineer (SE)
    - DS: “Can you execute a new control policy from time to time, after thorough checking and with human safeguards, and log the system variables and KPIs?”
    - SE: “Maybe.”
  • 20. What has just happened?
    - The systems engineer thinks in classical tech-transfer project management terms
      › The systems engineer specifies a problem
      › The researcher solves it and delivers the technology
    - The data science process requires R&D iteration
      › The systems engineer specifies a problem
      › The data scientist describes what data/simulator/system she needs
      › Together they design tools to provide/annotate data and interfaces to AI algorithms
      › The data scientist designs algorithms, pipelines, experiments, and metrics
      › They iterate
  • 21. Controlled engineering system: organizational constraints
    - Offline (batch): system traces (logs)
    - Micro-data: physical systems; high-quality logging is not a priority
    - Safety: we cannot "lose" while learning
  • 22. "real world will not become faster in a few years, contrary to computers"
  • 23. Part II: The what
  • 24. Iterated offline/batch RL
    - Realistic:
      › Fits the organizational scenario we can hope to implement
      › Technically doable
      › Not well studied in research (cf. the trillion-dollar market)
  • 25. Model-based offline RL
    - Why?
      › Considered the best approach for the micro-data regime
      › We do not waste predictive power (unlike, e.g., on images)
      › System models (simulators) are useful on their own
      › Self-supervision in RL
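    The iterated offline, model-based loop can be sketched on a toy problem. The sketch assumes a hypothetical one-dimensional linear system with reward -o^2 and a linear policy a = -k o. Each iteration deploys the current policy (with some exploration noise), refits a least-squares system model on all logs collected so far, and picks the next gain by simulating candidate policies in the learned model only. Real system models, policies, and safety checks would be far richer; this only shows the shape of the loop.

```python
import numpy as np

rng = np.random.default_rng(0)
A_TRUE, B_TRUE, NOISE = 0.9, 0.5, 0.1  # hypothetical dynamics, hidden from the learner


def run_on_system(gain, steps=200, explore=0.3):
    """Deploy a = -gain * o plus exploration noise; return logged (o, a, o_next)."""
    o, log = 1.0, []
    for _ in range(steps):
        a = -gain * o + explore * rng.normal()
        o_next = A_TRUE * o + B_TRUE * a + NOISE * rng.normal()
        log.append((o, a, o_next))
        o = o_next
    return log


def fit_model(logs):
    """Least-squares fit of o_next = a_hat * o + b_hat * a + noise on all logs."""
    data = np.array(logs)
    X, y = data[:, :2], data[:, 2]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma_hat = float(np.std(y - X @ coef))  # residual noise level of the model
    return float(coef[0]), float(coef[1]), sigma_hat


def evaluate_in_model(gain, a_hat, b_hat, sigma_hat, steps=50, n_rollouts=20):
    """Average reward -o^2 of the policy, estimated by rollouts in the learned model."""
    total = 0.0
    for _ in range(n_rollouts):
        o = 1.0
        for _ in range(steps):
            o = a_hat * o + b_hat * (-gain * o) + sigma_hat * rng.normal()
            total -= o ** 2
    return total / (steps * n_rollouts)


logs, gain = [], 0.0  # start from a "do nothing" policy
candidate_gains = np.linspace(0.0, 3.0, 31)
for _ in range(3):  # each iteration is one slow deployment cycle
    logs += run_on_system(gain)                        # 1. deploy and log
    a_hat, b_hat, sigma_hat = fit_model(logs)          # 2. learn the model offline
    scores = [evaluate_in_model(g, a_hat, b_hat, sigma_hat) for g in candidate_gains]
    gain = float(candidate_gains[int(np.argmax(scores))])  # 3. choose the next policy
```

    With these toy numbers the selected gain should land near 0.9 / 0.5 = 1.8, where the learned closed loop o_{t+1} ≈ (a_hat - b_hat k) o_t cancels the dynamics. Note that the system itself is touched only once per iteration; all policy search happens inside the model.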
  • 26. Model-free offline RL
    - Why?
      › Better asymptotic performance (a goal to aim at with MBRL)
      › Better researched, good baselines
      › MBRL planners (called "Dyna-style") are essentially model-free algorithms
  • 27. Contextual bandits / Bayesopt (zero order)
    - Why?
      › Rewards at every step, short delay
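    In the contextual-bandit setting each step is a one-shot decision whose reward arrives immediately, so no system model or multi-step planning is needed. Below is a minimal epsilon-greedy sketch with a per-arm linear (ridge-regression) reward model, assuming synthetic Gaussian contexts and hypothetical true reward weights; it is a generic textbook baseline, not necessarily an algorithm used by the team.

```python
import numpy as np

rng = np.random.default_rng(1)
N_ARMS, DIM, EPSILON, STEPS = 3, 4, 0.1, 2000
true_theta = rng.normal(size=(N_ARMS, DIM))  # hypothetical unknown reward weights

# Per-arm ridge-regression statistics: A = I + sum x x^T, b = sum r x
A = np.stack([np.eye(DIM) for _ in range(N_ARMS)])
b = np.zeros((N_ARMS, DIM))
regret = 0.0
for t in range(STEPS):
    x = rng.normal(size=DIM)                            # context observed at step t
    theta_hat = np.linalg.solve(A, b[..., None])[..., 0]  # ridge estimate per arm
    if rng.random() < EPSILON:
        arm = int(rng.integers(N_ARMS))                 # explore
    else:
        arm = int(np.argmax(theta_hat @ x))             # exploit current estimates
    reward = true_theta[arm] @ x + 0.1 * rng.normal()   # reward arrives immediately
    A[arm] += np.outer(x, x)
    b[arm] += reward * x
    regret += np.max(true_theta @ x) - true_theta[arm] @ x
```

    The constant exploration rate is the simplest choice; UCB- or Thompson-sampling-style exploration would typically reduce the regret that epsilon-greedy keeps paying on its random pulls.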
  • 28. Research themes (3-4 year plan)
    - Models for dynamic systems
      › Which models to choose, and based on what criteria?
      › Separating epistemic and aleatory uncertainties: can we verify the separation? How?
      › Heteroscedasticity at training time proved to be crucial. Why?
      › Causality/action sensitivity: building models that lead to better treatment-effect estimation
      › Summarizing history (context): prior knowledge, attention
      › Distribution shift, transfer learning
      › Data checks, online or offline; a "fear" reaction (unknown behavior)
    - Model-free reinforcement learning
      › Which model-free or planning agents to choose on system models?
        » Robustness to covariate shift
        » Criteria to choose
      › The best model-free offline RL algorithms, especially in terms of sample complexity
      › Which are the best contextual bandit/Bayesopt algorithms?
      › How to explore in the "slow" iterated offline setup?
    - Safety
      › How to formulate and enforce safety?
      › When learning and when deploying the learned agent
      › How to set the desired safety level flexibly?
      › How to add safety to the exploration policy?
    - Multi-agent control
      › Multiple non-interacting systems sharing their experience
      › Transferring the learned model and agent from one system to another
      › Interaction between the systems and the control agents
      › Optimizing multi-system rewards in a fair way
    - Policy evaluation and AutoML
      › A toolbox that is easy to use for a novice data scientist or systems engineer
      › Policy evaluation to select and tune models
      › Towards automating the process that learns the autopilot
    https://balazskegl.medium.com/building-autopilots-for-engineering-systems-using-ai-86a4f312c1f2
    Theme teams (as listed on the slide): Albert, Balazs, Othman, Gabriel; Igor, Ludo, Merwan, Albert, Alexandre, Geovani; Ludo, Merwan, Paul; Merwan, Ludo, Igor
  • 29. Subject of this course
    - Models for dynamic systems
      › Which models to choose, and based on what criteria?
      › Separating epistemic and aleatory uncertainties: can we verify the separation? How?
      › Heteroscedasticity at training time proved to be crucial. Why?
      › Causality/action sensitivity: building models that lead to better treatment-effect estimation
      › Summarizing history (context): prior knowledge, attention
      › Distribution shift, transfer learning
      › Data checks, online or offline; a "fear" reaction (unknown behavior)
    - Model-free reinforcement learning
      › Which model-free or planning agents to choose on system models?
        » Robustness to covariate shift
        » Criteria to choose
      › The best model-free offline RL algorithms, especially in terms of sample complexity
      › Which are the best contextual bandit/Bayesopt algorithms?
      › How to explore in the "slow" iterated offline setup?
    - Safety
      › How to formulate and enforce safety?
      › When learning and when deploying the learned agent
      › How to set the desired safety level flexibly?
      › How to add safety to the exploration policy?
    - Multi-agent control
      › Multiple non-interacting systems sharing their experience
      › Transferring the learned model and agent from one system to another
      › Interaction between the systems and the control agents
      › Optimizing multi-system rewards in a fair way
    - Policy evaluation and AutoML
      › A toolbox that is easy to use for a novice data scientist or systems engineer
      › Policy evaluation to select and tune models
      › Towards automating the process that learns the autopilot
    https://balazskegl.medium.com/building-autopilots-for-engineering-systems-using-ai-86a4f312c1f2
  • 30. Model-based offline RL
  • 31. Typical use case
    - Observables $\mathbf{o}$
      › ~10-100 dimensional, both internal (depending on actions) and external
      › Mixed continuous, discrete, categorical; bounded or not
    - Actions $\mathbf{a}$
      › ~1-100 dimensional
      › Mixed continuous, discrete, categorical
    - Rewards (called KPIs) $\mathbf{r}$
      › 1-10 dimensional, usually $\mathbf{r} = f(\mathbf{o})$, continuous, short delay
      › Multi-dimensional constraints (safety) and targets
    - History
      › Chunks of length 1,000 to 100,000
      › Missing sensors and time steps
  • 32. "real world will not become faster in a few years, contrary to computers"
  • 33. Micro-data model-based RL needs reliable and scalable system models
  • 34. System model = multi-output probabilistic (generative) time series forecaster
  • 35. Research program
    - Generative time-series predictors
      › Sample efficient: can be learned on a couple of thousand time steps
      › Introspective and well calibrated: honest about their own uncertainty
      › Self-tuning and/or robust, from 100 to 100,000 training points
    - Control and exploration using system models
      › Basic model predictive control (random shooting)
      › Active sampling and exploration
      › Learning the control agent
    - Landing
      › Diagnostics and debugging tools usable by engineers
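    Random shooting, the basic model predictive control mentioned above, fits in a few lines: sample many random action sequences, roll each one out in the (learned) system model, execute only the first action of the best sequence, and re-plan at the next step. The toy model, the reward, the horizon, and the action bounds below are assumptions for illustration.

```python
import numpy as np


def random_shooting(model, reward_fn, obs, rng, horizon=10, n_sequences=500,
                    low=-1.0, high=1.0):
    """Return the first action of the best of n_sequences random action sequences.

    model(o, a, rng) samples next observations for a batch (o: (n, d), a: (n,));
    reward_fn(o) returns one reward per row, e.g. r = f(o).
    """
    actions = rng.uniform(low, high, size=(n_sequences, horizon))
    o = np.repeat(obs[None, :], n_sequences, axis=0)
    returns = np.zeros(n_sequences)
    for h in range(horizon):
        o = model(o, actions[:, h], rng)
        returns += reward_fn(o)
    return float(actions[np.argmax(returns), 0])


def toy_model(o, a, rng):
    """Hypothetical learned system model: noisy linear dynamics."""
    return 0.9 * o + 0.5 * a[:, None] + 0.05 * rng.normal(size=o.shape)


def reward_fn(o):
    return -np.sum(o ** 2, axis=1)  # KPI: keep the observation near zero


rng = np.random.default_rng(0)
obs = np.array([1.0])
first_action = random_shooting(toy_model, reward_fn, obs, rng)
for t in range(20):  # receding horizon: re-plan at every step
    a = random_shooting(toy_model, reward_fn, obs, rng)
    obs = toy_model(obs[None, :], np.array([a]), rng)[0]  # toy model doubles as the "real" system
```

    Random shooting needs nothing from the model except the ability to sample, which is one reason generative system models pair naturally with it; the price is that the number of sequences needed grows quickly with the action dimension and the horizon.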
  • 36. https://towardsdatascience.com/cabe95990664
  • 37. System model = multi-output time series forecaster
    - Predict the (random) future from the history of system observables and control actions:
      $\mathbf{o}_{t+1} \sim p(\mathbf{y} \mid \mathbf{x})$, with $\mathbf{y} = \mathbf{o}_{t+1}$ and $\mathbf{x} = \big((\mathbf{o}_1, \mathbf{a}_1), \ldots, (\mathbf{o}_t, \mathbf{a}_t)\big)$
      › We want to simulate multiple futures from the model
    [Figure: past, present, simulated futures, and the ground-truth future]
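    Simulating multiple futures amounts to sampling the next observation, appending it to the history, and sampling again with the next planned action. A minimal sketch, assuming a hypothetical scalar Gaussian model that only looks at the last observation/action pair (a real system model would summarize the whole history):

```python
import numpy as np


def toy_model(observations, actions, rng):
    """Hypothetical p(o_{t+1} | (o_1, a_1), ..., (o_t, a_t)); uses only (o_t, a_t)."""
    o_t, a_t = observations[-1], actions[-1]
    mean = 0.9 * o_t + 0.5 * a_t
    std = 0.05 + 0.1 * abs(o_t)  # heteroscedastic: noisier far from zero
    return mean + std * rng.normal()


def simulate_futures(model, observations, actions, future_actions, n_futures, rng):
    """Sample n_futures trajectories of length len(future_actions).

    observations: o_1..o_t, actions: a_1..a_{t-1}, future_actions: a_t..a_{t+H-1}.
    """
    futures = np.empty((n_futures, len(future_actions)))
    for i in range(n_futures):
        obs, acts = list(observations), list(actions)
        for k, a in enumerate(future_actions):
            acts.append(a)                  # acts now holds a_1..a_{t+k}
            o_next = model(obs, acts, rng)  # o_{t+k+1} ~ p(. | history)
            obs.append(o_next)              # feed the sample back in
            futures[i, k] = o_next
    return futures


rng = np.random.default_rng(0)
futures = simulate_futures(toy_model, [1.0], [], [0.0] * 20, n_futures=100, rng=rng)
low, high = np.quantile(futures[:, -1], [0.05, 0.95])  # spread of o_{t+20}
```

    Feeding samples (rather than the mean prediction) back into the model is what lets the spread of the futures grow with the horizon.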
  • 38. System model = multi-output time series forecaster
  • 39. Objective
    - Generative regression: predict $\mathbf{y} \sim p(\mathbf{y} \mid \mathbf{x})$ instead of $\mathbf{y} = f(\mathbf{x})$
      › Predictors that are honest about their uncertainty: introspective models
    - Requirements
      › Both $\mathbf{x}$ and $\mathbf{y}$ are multidimensional
      › Training should scale well with the dimension of $\mathbf{x}$ and $\mathbf{y}$ and with the size of the training data
      › Easy to compute the likelihood
      › Easy to sample (simulate)
      › Able to model y-interdependence
      › Able to model different types of variables
      › Frequent semi-automatic retraining and retuning: robustness and debuggability
    $\mathbf{o}_{t+1} \sim p(\mathbf{y} \mid \mathbf{x})$, with $\mathbf{y} = \mathbf{o}_{t+1}$ and $\mathbf{x} = \big((\mathbf{o}_1, \mathbf{a}_1), \ldots, (\mathbf{o}_t, \mathbf{a}_t)\big)$
  • 40. Scientific questions I
    - Which model?
      › Deterministic predictor + fixed-sigma Gaussian
      › (Conditional) Gaussian (mixture)
      › Autoregressive NNs and forests
      › VAE
      › GAN
      › Flow models
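  • 41. Scientific questions II
    • What are the important properties?
      › Deterministic (classical predictors): $\mathbf{y} \sim \mathrm{Dirac}(\mathbf{y} \mid \mathbf{x})$, i.e. $\mathbf{y} = \mathbf{f}(\mathbf{x})$
      › Probabilistic: $\mathbf{y} \sim p(\mathbf{y} \mid \mathbf{x})$
        » Homoscedastic (the variance does not depend on the input): $\mathbf{y} \sim \mathcal{N}\big(\mathbf{y} \mid \mathbf{f}(\mathbf{x}), \boldsymbol{\sigma}\big)$
        » Heteroscedastic (sigma depends on the input)
          – Unimodal: $\mathbf{y} \sim \mathcal{N}\big(\mathbf{y} \mid \mathbf{f}(\mathbf{x}), \boldsymbol{\sigma}(\mathbf{x})\big)$
          – Multimodal: $\mathbf{y} \sim \sum_{\ell=1}^{L} w_\ell(\mathbf{x})\, \mathcal{P}_\ell\big(\mathbf{y}; \theta_\ell(\mathbf{x})\big)$
        » y-interdependent (able to model the (inter)dependence of the components of $\mathbf{y}$ given $\mathbf{x}$)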
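The multimodal case above can be made concrete with a one-dimensional Gaussian mixture; a single component with a constant σ recovers the homoscedastic case, and a σ that depends on x gives the heteroscedastic one. The following is a minimal NumPy sketch: `hypothetical_params` is an illustrative stand-in for a learned network mapping x to (w, μ, σ), not the authors' model.

```python
import numpy as np


def mixture_logpdf(y, weights, mus, sigmas):
    """log sum_l w_l N(y; mu_l, sigma_l) for a 1-D Gaussian mixture (log-sum-exp for stability)."""
    w, mu, s = (np.asarray(a, dtype=float) for a in (weights, mus, sigmas))
    comp = np.log(w) - 0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((y - mu) / s) ** 2
    m = comp.max()
    return m + np.log(np.exp(comp - m).sum())


def mixture_sample(weights, mus, sigmas, size, rng):
    """Draw a component index with probability w_l, then sample from that Gaussian."""
    idx = rng.choice(len(weights), size=size, p=weights)
    return rng.normal(np.asarray(mus, dtype=float)[idx], np.asarray(sigmas, dtype=float)[idx])


def hypothetical_params(x):
    """Toy x -> (w, mu, sigma): two equally likely modes at +x and -x."""
    return [0.5, 0.5], [x, -x], [0.1, 0.1]


rng = np.random.default_rng(0)
samples = mixture_sample(*hypothetical_params(3.0), size=2000, rng=rng)
# The sample mean lands near 0, where the mixture has almost no mass:
# a single point forecast would summarize these futures poorly.
```

With two well-separated modes, as in an angle that can land just above or just below the ±π jump, the mean prediction falls in a region the system never visits. Only the multimodal density represents both plausible futures.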
  • 42. What is y-interdependence and why might it be important?
    [Figure: panels GP, DMDN(5), DARMDN(1); axes sin θ, cos θ]
  • 43. What is y-interdependence and why might it be important?
  • 44. Why generative models?
  • 45. What is the probability of the world ending if I press this button?
  • 46. Generative time series forecasting
    • Why generative? Besides point forecasts, predictors should also predict their uncertainty.
    • Uncertainties are important for decision making: should I plan an outdoor event?
      › Instead of "tomorrow's max temperature will be 26 degrees and it will be sunny", say "tomorrow's max temperature will be 26 ± 3 degrees, with a 10% chance of rain".
  • 47. Generative time series forecasting
    • Why generative? Besides point forecasts, predictors should also predict their uncertainty.
      › We need to simulate from the forecasting models for model-based control and optimization. When the forecast is consumed by a control or optimization module, uncertainty can be propagated through the deterministic optimizer or planner by running it on several randomly simulated traces ("futures"). This is especially important when safety is at stake, since we need to model the probabilities of tail (extreme) events.
    • Epistemic vs aleatory uncertainty
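Propagating uncertainty through a deterministic planner can be sketched as follows: draw many futures from a generative forecaster, run the same deterministic decision rule on each one, and read off event probabilities from the fraction of futures that trigger it. Both the random-walk forecaster and the threshold rule below are hypothetical toys chosen for illustration.

```python
import numpy as np


def simulate_futures(x0, n_traces, horizon, rng):
    """Hypothetical generative forecaster: Gaussian random walk with drift 0.1, step std 0.5."""
    steps = rng.normal(0.1, 0.5, size=(n_traces, horizon))
    return x0 + np.cumsum(steps, axis=1)


def deterministic_planner(trace, threshold=5.0):
    """Toy deterministic decision rule applied to one future: intervene iff it crosses threshold."""
    return bool((trace > threshold).any())


rng = np.random.default_rng(0)
futures = simulate_futures(0.0, n_traces=1000, horizon=24, rng=rng)
p_intervene = np.mean([deterministic_planner(f) for f in futures])

# The point forecast (the mean trajectory) never crosses the threshold,
# so planning on it alone would report zero risk.
mean_trace = 0.0 + 0.1 * np.arange(1, 25)
point_decision = deterministic_planner(mean_trace)
```

The planner itself is unchanged; the uncertainty enters only through the simulated traces, which is what makes this approach compatible with existing deterministic optimizers.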
  • 48. Is a multimodal posterior predictive important?
    • Approximation capacity in system modelling
      › We want to be able to represent the real system dynamics efficiently
      › We also want a realistic representation of uncertainty ("plausible futures") to support exploration
    • "Raw angles" Acrobot
      › Normally, angles are transformed using sine and cosine to make the system dynamics smooth
      › What if we are agnostic, i.e. we do not know whether a system variable is an angle?
      › Abrupt jumps are OK, but if we have (epistemic) uncertainty, the posteriors need to be multimodal
  • 49. Scientific questions III
    • What to do with a good system model?
      › Plug it into a planning algorithm: no learning (beyond learning the system model)
      › Learn an agent on the model and send it back to the real system ("Dyna-style")
        » Exploration (iterative batch!): a bad model and a bad agent can get stuck while seeming to have converged
        » Planning: we may just want to use the agent to guide the planning algorithm, not to act directly on the real system
          – When choosing the actions in the rollouts
          – Bootstrapping the learned value at the last step (instead of just summing up the rewards)
  • 50. Part III/a. The how: the experimental setup
  • 51.
  • 52. Engineering or experimental scientific approach?
      › Both are based on experiments
      › George Stephenson: make sure the locomotive works, then optimize
      › Carnot: understand the principles of thermodynamics, theorize, design experiments to (in)validate hypotheses
      › We need to publish: the religion of SOTA
      › We also want to study the properties of the best approach
      › Strategy: go straight ahead and optimize, then come back and rigorously check what really matters (ablation)
      › Let's start by optimizing the model with a simple planning algorithm, then move on to smart agents
      › Business cases are out of reach for exhaustive experimentation; we first need to learn to master our algorithms on toy benchmarks
  • 53. Which system(s) or env(s)?
    • The broad approach
      › Good overview, huge work, and very useful!
      › Helped us choose a single env to start with
      › Lacks in-depth understanding of individual envs and hyperparameter optimization (what do we learn other than which method works on which env?)
  • 54. Which system(s) or env(s)?
    • Our deep approach
      › Choose a single env, understand and optimize it, reach SOTA beyond doubt
      › We chose Acrobot
        » Relatively simple but non-trivial: we could learn good system models on a couple of thousand training points
        » Good model + simple planning is SOTA
        » The previous SOTA happened to be very suboptimal
      › Generalizability is in question: do our findings extend to other envs?
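  • 55. The benchmark system: Acrobot
    System observables: $\mathbf{o} = (\theta_2, \dot\theta_2, \theta_1, \dot\theta_1)$
    Actions: torque at the second joint, $a \in \{\text{left}, \text{none}, \text{right}\}$
    Reward: height of the tip of the lower segment (0: hanging position, 2: ceiling, 4: top position)
    Raw angles system: $\mathbf{o} = (\theta_2, \dot\theta_2, \theta_1, \dot\theta_1)$, with jumps at $\pm\pi$
    Sincos system: $\mathbf{o} = (\sin\theta_2, \cos\theta_2, \dot\theta_2, \sin\theta_1, \cos\theta_1, \dot\theta_1)$, which introduces y-interdependence (the sine and cosine of the same angle are tied together)
    [Figure: Acrobot diagram with joint angles θ1 and θ2]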
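The difference between the raw-angle and sincos observations can be illustrated in NumPy. The reward function below assumes the Gymnasium Acrobot convention (unit link lengths, θ1 measured from the downward vertical, θ2 relative to the first link), which the slide does not state; under that assumption it reproduces 0 for the hanging position and 4 for the top position.

```python
import numpy as np


def wrap_angle(theta):
    """Map raw angles to [-pi, pi); this wrapping causes the jumps at +/-pi."""
    return (np.asarray(theta, dtype=float) + np.pi) % (2 * np.pi) - np.pi


def raw_to_sincos(obs):
    """(theta2, dtheta2, theta1, dtheta1) -> (sin th2, cos th2, dth2, sin th1, cos th1, dth1)."""
    th2, dth2, th1, dth1 = np.asarray(obs, dtype=float)
    return np.array([np.sin(th2), np.cos(th2), dth2, np.sin(th1), np.cos(th1), dth1])


def tip_height_reward(obs):
    """Tip height above the hanging position, assuming unit link lengths and
    theta1 measured from the downward vertical, theta2 relative to the first link."""
    th2, _, th1, _ = np.asarray(obs, dtype=float)
    return 2.0 - np.cos(th1) - np.cos(th1 + th2)


# Two physically adjacent states on either side of pi:
a, b = wrap_angle(np.pi - 0.01), wrap_angle(np.pi + 0.01)
# a is about 3.13 and b about -3.13: far apart as raw angles,
# but nearly identical once mapped to (sin, cos).
```

A model trained on raw angles must therefore predict either a value near +π or one near -π for the same physical transition, which is exactly where a multimodal posterior predictive becomes necessary.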
  • 56. Can we learn a precise system model from data?
    $\mathbf{p}\big(\mathbf{o}_{t+1} \mid (\mathbf{o}_1, a_1), \ldots, (\mathbf{o}_t, a_t)\big) = \mathbf{p}(\mathbf{o}_{t+1} \mid \mathbf{o}_t, a_t)$
    [Figure: Acrobot diagram with joint angles θ1 and θ2]
  • 57. Yes, we can! Which one is the physical model and which one is the AI? You can vote in the chat window: is the AI on the left or on the right? https://youtu.be/FHFz2ERB4eA
  • 58. Let's jump ahead: what do we do once we have a model? Remember that our goal is low sample complexity: using system access steps as efficiently as possible.
  • 59. Model-based RL loop (iterative batch)
    1. Collect samples from a random policy
    2. Train the model on the collected samples
    3. Learn (or just apply) a control policy on the model
    4. Apply the control policy on the real system and collect the data; go back to 2.
    • We retrain the model after each episode of 200 steps
    • The control policy is classical random shooting (RS) [Richards 2005]
      › Simulate $n$ trajectories of $h$ steps using random actions
      › Select the optimal trajectory (the one with the highest reward after $h$ steps)
      › Execute the first action of the optimal trajectory
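The random shooting step can be sketched in a few lines of NumPy. The `model` and `reward_fn` interfaces and the 0/1/2 encoding of left/none/right are hypothetical. Following the slide, each trajectory is scored by its reward after h steps; summing rewards along the trajectory is a common alternative. A generative `model` that samples from p(o_{t+1} | o_t, a_t) and a deterministic one that returns its mean both fit the same interface.

```python
import numpy as np


def random_shooting(model, reward_fn, obs, n=100, h=10, n_actions=3, rng=None):
    """Return the first action of the best of n random action sequences of length h.

    model(states, actions, rng) -> next states   (hypothetical one-step simulator, batched)
    reward_fn(states) -> (n,) rewards            (hypothetical reward of a state)
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.integers(n_actions, size=(n, h))                # n random action sequences
    states = np.repeat(np.asarray(obs, dtype=float)[None, :], n, axis=0)
    for k in range(h):
        states = model(states, actions[:, k], rng)                # simulate all n in parallel
    scores = reward_fn(states)                                    # score = reward after h steps
    best = int(np.argmax(scores))
    return int(actions[best, 0])                                  # execute only the first action


# Usage with a toy 1-D system: action 0/1/2 moves the state by -0.1/0/+0.1.
def toy_model(states, actions, rng):
    return states + 0.1 * (actions - 1)[:, None]


first = random_shooting(toy_model, lambda s: s[:, 0], np.zeros(1), n=100, h=1,
                        rng=np.random.default_rng(0))
```

In the loop above, this function would be called once per real system step, with the model retrained after each 200-step episode.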
  • 60. Acrobot is a non-trivial system
    • https://youtu.be/fgwQGTXgI1M
      › Random policy, mean reward = 0.1 (can go up to 0.5, half the length of the lower link)
    • https://youtu.be/X-qTJP5U78Q
      › Suboptimal policy stuck below the horizon, mean reward = 1.56
    • https://youtu.be/Rwrf7-46aUE
      › A good policy that, until recently, we thought was impossible to beat within a 200-step episode, mean reward = 2.01
    • https://youtu.be/XxiTVqxSS1o
      › Currently optimal policy that stabilizes the Acrobot within the 200-step episode, mean reward = 2.56
  • 61. Acrobot is a non-trivial system
  • 62. Part III/b. The how: the metrics
  • 63. Metrics
    • We want high reward fast: "dynamic" metrics
      › Unlike supervised learning, RL has no simply decipherable metrics
        » Total reward depends on the env, the scale, and the number of steps
      › Reliability: error bars (across episodes and seeds)
      › (R)MAR: (relative) mean average reward after convergence
      › MRCP(70): mean reward convergence pace
    • We want to train, tune, and compare models on "static" metrics
      › Metrics that matter for dynamic performance
      › Time series regression metrics: MSE and R2
      › Generative metrics: likelihood, (calibratedness), and (outlier ratio)
      › Long-horizon metrics: R2(h)
  • 64. Dynamic metrics
    [Figure: reward curves rescaled so that 0 = mean reward of the random policy and 1 = mean reward of random shooting (h = 10, n = 100); curves labelled convergent and transient; annotations: RMAR = 0.54 ± 0.03, RMAR = 1.23 ± 0.01, RMAR = 0.7, MRCP(70) = 1200 (system access steps), MRCP(70) = ∞]
    RMAR: Relative Mean Asymptotic Reward
    MRCP(70): Mean Reward Convergence Pace
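The rescaling in the plot (0 = random policy, 1 = random shooting with h = 10, n = 100) amounts to an affine normalization of the mean asymptotic reward. The sketch below assumes that RMAR is computed this way from one converged mean reward per seed, and that the ± error bar is a standard error; the slide states neither, so treat both as assumptions. The reference values are inputs, and the numbers in the usage line are hypothetical.

```python
import numpy as np


def relative_mean_asymptotic_reward(mean_rewards, random_ref, rs_ref):
    """Rescale converged mean rewards so that random_ref -> 0 and rs_ref -> 1.

    mean_rewards: mean reward after convergence, e.g. one value per seed
    Returns (mean, standard error) of the relative score (the error bar type is an assumption).
    """
    rel = (np.asarray(mean_rewards, dtype=float) - random_ref) / (rs_ref - random_ref)
    sem = rel.std(ddof=1) / np.sqrt(rel.size) if rel.size > 1 else 0.0
    return float(rel.mean()), float(sem)


# Hypothetical numbers, for illustration only.
rmar, err = relative_mean_asymptotic_reward([2.0, 2.2, 2.1], random_ref=0.1, rs_ref=2.0)
```

Normalizing against two fixed reference policies makes scores comparable across envs whose raw rewards differ in scale and episode length, which is the problem with total reward noted on the previous slide.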
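  • 65. Static metrics
    Log-likelihood:
      $\mathcal{L}\big(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p}\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \log \mathbf{p}(\mathbf{o}_{t+1} \mid \mathbf{o}_t, a_t)$
    Likelihood ratio to a simple baseline:
      $LR\big(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p}\big) = \dfrac{e^{\mathcal{L}(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p})}}{e^{\mathcal{L}_b(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T})}}$
      › $\mathcal{L}_b$ is the log-likelihood under a multivariate unconditional spherical Gaussian
      › Measures how much more likely the data is under the learned model than under the baseline
      › Baseline = 1, the higher the better, no upper limit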
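The likelihood ratio can be sketched as follows, assuming that the spherical Gaussian baseline is fitted by maximum likelihood to the same predicted observations $\mathbf{o}_2, \ldots, \mathbf{o}_T$ (the slide does not say how the baseline is fitted). Working with the difference of mean log-likelihoods avoids overflow in the two exponentials.

```python
import numpy as np


def spherical_gaussian_baseline_loglik(obs):
    """Mean log-likelihood of o_2..o_T under an unconditional spherical Gaussian fitted to them."""
    targets = np.asarray(obs, dtype=float)[1:]     # the T-1 predicted observations o_{t+1}
    mu = targets.mean(axis=0)                      # per-dimension mean
    var = ((targets - mu) ** 2).mean()             # one shared variance (spherical), MLE
    d = targets.shape[1]
    sq = ((targets - mu) ** 2).sum(axis=1)         # squared distance to the mean, per step
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi * var) + sq / var)))


def likelihood_ratio(model_mean_loglik, obs):
    """LR = exp(L_model) / exp(L_baseline), computed as exp of a difference for stability."""
    return float(np.exp(model_mean_loglik - spherical_gaussian_baseline_loglik(obs)))
```

`model_mean_loglik` is the model's $\mathcal{L}$ as defined above, i.e. the log-likelihood averaged over the T - 1 transitions. A model that is no better than the baseline gets LR = 1.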
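  • 66. Static metrics
    R2 (variance explained):
      $R^2\big(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p}\big) = \frac{1}{d_{\mathbf{o}}} \sum_{j=1}^{d_{\mathbf{o}}} \left(1 - \frac{\mathrm{MSE}_j\big(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p}\big)}{\sigma_j^2}\right)$
    Mean prediction, baseline variance, MSE:
      $f_j(\mathbf{o}_t, a_t) = \mathbb{E}_{p_j}\big[o^j_{t+1} \mid \mathbf{o}_t, a_t\big]$
      $\sigma_j^2 = \mathrm{Var}\big(\{o^j_t\}_{t=1}^{T}\big)$
      $\mathrm{MSE}_j\big(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p}\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \big(o^j_{t+1} - f_j(\mathbf{o}_t, a_t)\big)^2$
      › Baseline = 0, the higher the better, 1 is perfect
      › Works on both deterministic and generative regressors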
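The R2 formula can be sketched directly in NumPy, assuming that the mean predictions $f_j(\mathbf{o}_t, a_t)$ have already been computed. For a deterministic regressor they are simply its outputs; for a generative one they are the conditional means, obtained analytically or by averaging samples.

```python
import numpy as np


def r2_variance_explained(obs, mean_pred):
    """R2 averaged over the d observation dimensions.

    obs: (T, d) observed trajectory
    mean_pred: (T-1, d), where mean_pred[t] is the mean prediction f(o_t, a_t) of o_{t+1}
    """
    obs = np.asarray(obs, dtype=float)
    mse = ((obs[1:] - np.asarray(mean_pred, dtype=float)) ** 2).mean(axis=0)  # MSE_j
    var = obs.var(axis=0)                                                     # sigma_j^2
    return float(np.mean(1.0 - mse / var))
```

Predicting each dimension's overall mean gives an R2 close to 0 (the baseline), and a perfect predictor gives exactly 1.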
  • 67. Static metrics
    • Long-horizon metrics
      › Models predict $\mathbf{o}_{t+1}$ directly, but they can be cascaded: $\mathbf{o}_{t+2} = f\big(f(\mathbf{o}_t, a_t), a_{t+1}\big)$
      › The likelihood would require a convolution, but R2(h) can be computed using Monte Carlo
      › We found that R2(10) correlates best with dynamic performance
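R2(h) by Monte Carlo can be sketched as follows. From each logged state $\mathbf{o}_t$, the logged actions $a_t, \ldots, a_{t+h-1}$ are replayed through the one-step model `n_samples` times, the simulated $\mathbf{o}_{t+h}$ are averaged, and that average is scored with the same R2 formula. The `sample_step` interface, the replay of logged actions, and the sample count are our assumptions; the slides do not specify them.

```python
import numpy as np


def r2_h(sample_step, obs, actions, h, n_samples=100, rng=None):
    """Monte Carlo R2(h): cascade a generative one-step model h steps from each o_t.

    sample_step(states, actions, rng) -> sampled next states  (hypothetical model interface)
    obs: (T, d) observed trajectory; actions: (T,) actions actually taken.
    """
    rng = np.random.default_rng() if rng is None else rng
    obs = np.asarray(obs, dtype=float)
    T, d = obs.shape
    starts = np.arange(T - h)                        # all t such that o_{t+h} is observed
    preds = np.empty((len(starts), d))
    for i, t in enumerate(starts):
        states = np.repeat(obs[t][None, :], n_samples, axis=0)
        for k in range(h):                           # replay the logged actions a_t..a_{t+h-1}
            states = sample_step(states, np.full(n_samples, actions[t + k]), rng)
        preds[i] = states.mean(axis=0)               # Monte Carlo estimate of E[o_{t+h}]
    mse = ((obs[starts + h] - preds) ** 2).mean(axis=0)
    return float(np.mean(1.0 - mse / obs.var(axis=0)))
```

With h = 1 and a deterministic model this reduces to the one-step R2 above. Cascading with sampled (not mean) next states is what lets multimodal uncertainty compound realistically over the horizon.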
  • 68. Part III/c. The how: the system models
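  • 69. Formal model illustrated on Acrobot
    System observables: $\mathbf{o} = (\theta_2, \dot\theta_2, \theta_1, \dot\theta_1)$
    Actions: torque at the second joint, $a \in \{\text{left}, \text{none}, \text{right}\}$
    Objective: learn $\mathbf{p}(\mathbf{o}_{t+1} \mid \mathbf{o}_t, a_t)$
    Decomposition 1 (autoregression):
      $\mathbf{p}(\mathbf{o}_{t+1} \mid \mathbf{o}_t, a_t) = p_1\big(\theta^2_{t+1} \mid \mathbf{o}_t, a_t\big) \times p_2\big(\dot\theta^2_{t+1} \mid \mathbf{o}_t, a_t, \theta^2_{t+1}\big) \times p_3\big(\theta^1_{t+1} \mid \mathbf{o}_t, a_t, \theta^2_{t+1}, \dot\theta^2_{t+1}\big) \times p_4\big(\dot\theta^1_{t+1} \mid \mathbf{o}_t, a_t, \theta^2_{t+1}, \dot\theta^2_{t+1}, \theta^1_{t+1}\big)$
    Decomposition 2 (mixture model):
      $p(y \mid \mathbf{x}) = \sum_{\ell=1}^{L} w_\ell(\mathbf{x})\, \mathcal{P}_\ell\big(y; \theta_\ell(\mathbf{x})\big)$
      › $\mathcal{P}$: component type (e.g. Gaussian)
      › $w$: component weight
      › $\theta$: component parameters (e.g. $\mu$, $\sigma$)
    [Figure: Acrobot diagram with joint angles θ1 and θ2]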
  • 70. Why the decompositions?
    • Autoregression: $p(\mathbf{y} \mid \mathbf{x}) = p_1(y_1 \mid \mathbf{x}) \prod_{j=2}^{d} p_j(y_j \mid y_1, \ldots, y_{j-1}, \mathbf{x})$
      › Fighting the curse of dimensionality:
        » We reduce the $d$-dimensional model to $d$ one-dimensional models
      › We can tune the models separately:
        » unlike, e.g., images, system logs may have varying column types
      › Modelling y-interdependence: in physical systems, $y_1$ and $y_2$ can be strongly dependent given $\mathbf{x}$
    • Mixture model: $p(y \mid \mathbf{x}) = \sum_{\ell=1}^{L} w_\ell(\mathbf{x})\, \mathcal{P}_\ell\big(y; \theta_\ell(\mathbf{x})\big)$
      › Simple: easy to compute the likelihood, easy to simulate from
      › Versatile: can use prior knowledge (component type), can approximate any density
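  • 71. Evaluation
    Log-likelihood with the autoregressive decomposition:
      $\mathcal{L}\big(\{(\mathbf{o}_t, a_t)\}_{t=1}^{T}; \mathbf{p}\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \Big[\log p_1\big(o^1_{t+1} \mid \mathbf{o}_t, a_t\big) + \sum_{j=2}^{4} \log p_j\big(o^j_{t+1} \mid \mathbf{o}_t, a_t, o^1_{t+1}, \ldots, o^{j-1}_{t+1}\big)\Big]$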
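The bracketed term above (the log-likelihood of one transition) and the matching sampler can be sketched with one-dimensional Gaussian conditionals. `cond_params` is a hypothetical stand-in for the per-dimension models; in DARMDN it would be a neural net outputting mixture parameters, and a single Gaussian is used here only to keep the sketch short. Averaging `ar_loglik` over the T - 1 transitions gives $\mathcal{L}$.

```python
import numpy as np


def gauss_logpdf(y, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def ar_loglik(o_next, context, cond_params):
    """log p(o_next | context) = sum_j log p_j(o_next[j] | context, o_next[:j]).

    cond_params(j, context, prefix) -> (mu, sigma): hypothetical per-dimension 1-D model.
    """
    return sum(gauss_logpdf(o_next[j], *cond_params(j, context, o_next[:j]))
               for j in range(len(o_next)))


def ar_sample(context, cond_params, d, rng):
    """Sample o_next one dimension at a time, feeding earlier draws to later conditionals."""
    y = np.empty(d)
    for j in range(d):
        mu, sigma = cond_params(j, context, y[:j])
        y[j] = rng.normal(mu, sigma)
    return y


# Toy conditionals: dimension 1 closely tracks dimension 0 (y-interdependence).
def toy_cond(j, context, prefix):
    return (context.sum(), 1.0) if j == 0 else (prefix[0], 0.1)


ctx = np.array([0.5, -0.2])
draws = np.array([ar_sample(ctx, toy_cond, 2, np.random.default_rng(s)) for s in range(500)])
```

Because each one-dimensional conditional sees the earlier components, the sampled components come out strongly correlated even though every individual model is univariate.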
  • 72. How do we learn the model?
    • Any regressor + fixed sigma: $p(y \mid \mathbf{x}) = \mathcal{N}\big(f(\mathbf{x}; \theta), \sigma\big)$
      › Linear regression (ARLinσ)
      › Classical neural nets (DARNNσ)
    • We learn the parameters ($w(\mathbf{x})$ and $\theta(\mathbf{x})$) with a deep neural net: deep autoregressive mixture density nets = DARMDN ("darm-dee-en")
      › DARMDN(1), with a single Gaussian component, is heteroscedastic: $p(y \mid \mathbf{x}) = \mathcal{N}\big(\mu(\mathbf{x}), \sigma(\mathbf{x})\big)$
      › DARMDN(10) allows for multimodality
      › PETS [Chua et al. 2018]: ensembled DARMDN(1)
    • Non-autoregressive models
      › Gaussian process
      › DMDN(10): classical mixture density nets with multivariate Gaussian components [Bishop 1994]
      › Both assume y-independence
      › VAE, flow (RealNVP), GAN
  • 73. How do we learn the model?
    • Deterministic models
      › When shooting in random shooting (using the model to simulate futures), we can choose between simulating from the mean and drawing from the conditional density
      › DARNNdet, DARMDN(1)det, DARMDN(10)det, DMDN(10)det, PETSdet
  • 74. What did we find?
  • 75. What did we find?
  • 76. What did we find?
  • 77. Part III/d. The how: the smart agents
  • 78. Scientific questions III
      › We know that we can achieve the optimal policy with a longer horizon and more simulations
      › 1. Can we simply learn an agent on the model and deploy it on the real system?
      › The two tricks of AlphaGo: is it possible with fewer simulations and a shorter horizon if
        » 2. the planning (search) is not random but guided by a smart agent?
        » 3. the estimated reward is not the reward at the final step but the value estimate of the smart agent?
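Tricks 2 and 3 can be grafted onto random shooting as sketched below. Rollout actions are drawn from a learned policy instead of uniformly, and each trajectory is scored by its (optionally discounted) rewards plus the agent's value estimate at the last step. Scoring by rewards plus a bootstrapped value is one common variant that we assume here; it is not necessarily the authors' exact scheme. The `policy_probs` and `value_fn` agents, like `model` and `reward_fn`, are hypothetical interfaces.

```python
import numpy as np


def guided_shooting(model, reward_fn, policy_probs, value_fn, obs,
                    n=20, h=3, gamma=1.0, rng=None):
    """Shooting planner guided by a learned agent, with value bootstrapping at step h.

    model(states, actions, rng) -> next states       (hypothetical system model)
    reward_fn(states) -> (n,) rewards                (hypothetical reward)
    policy_probs(states) -> (n, n_actions) probs     (hypothetical learned agent)
    value_fn(states) -> (n,) value estimates         (hypothetical learned agent)
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.repeat(np.asarray(obs, dtype=float)[None, :], n, axis=0)
    returns = np.zeros(n)
    first_actions = None
    for k in range(h):
        probs = policy_probs(states)
        cdf = np.cumsum(probs, axis=1)
        u = rng.random((n, 1))
        # Inverse-CDF sampling of one action per trajectory (clipped against round-off).
        actions = np.minimum((u > cdf).sum(axis=1), probs.shape[1] - 1)
        if k == 0:
            first_actions = actions
        states = model(states, actions, rng)
        returns += gamma**k * reward_fn(states)
    returns += gamma**h * value_fn(states)   # bootstrap: value estimate instead of stopping at h
    return int(first_actions[np.argmax(returns)])


# Toy usage: 1-D system where action 0/1/2 moves the state by -0.1/0/+0.1.
def toy_model(states, actions, rng):
    return states + 0.1 * (actions - 1)[:, None]
```

A good policy concentrates the few available rollouts on promising action sequences, and a good value estimate lets a short horizon stand in for a long one. Both target the simulation budget that plain random shooting needs.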
  • 79. Can we learn a smart agent on the model and deploy it in the real system? NO
  • 80. Can we assist the planning with a smart agent? YES
  • 81. Conclusions
    • Mixture density nets are optimal and versatile, especially the autoregressive type
    • A multimodal generative model may be needed, depending on the env
    • A deterministic model is slightly better if multimodality is not needed
    • Heteroscedasticity is useful even when we use the deterministic mean at simulation time!
    • y-interdependence does not seem to matter
    • Smart agents + planning + exploration beats both smart agents alone and random-shooting planning
  • 82. Thank you
    www.huawei.com
    Copyright © 2015 Huawei Technologies Co., Ltd. All Rights Reserved.