HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com
DARMDN: Deep autoregressive mixture density nets for
dynamical system modelling
— Balazs Kegl, Gabriel Hurtado, Albert Thomas
for Noah's Ark Research Lab, Paris
HUAWEI TECHNOLOGIES CO., LTD. Page 2
Develop neural simulators
trained on short system logs
Objective
B. Kegl / Huawei Research France
HUAWEI TECHNOLOGIES CO., LTD. Page 3
Why?
 Automate engineering systems
› Data center cooling
› Wireless parameter tuning
› Wifi setup
 Predictive maintenance
› Copper and optical end-user devices
› Wireless network devices
› Data center servers
 We believe these are only the
tip of the iceberg
HUAWEI TECHNOLOGIES CO., LTD. Page 4
AI: Highly visible breakthroughs
HUAWEI TECHNOLOGIES CO., LTD. Page 5
Why aren't these algorithms
already
in engineering systems?
HUAWEI TECHNOLOGIES CO., LTD. Page 6
 Physical systems do not get faster with time
 System access is tightly controlled by engineers whose responsibility is to
keep the systems running
Why is it hard?
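[Figure: control loop between the BU engineer and the system; action 𝒂_t goes to the system, observation 𝒐_t and reward 𝒓_t come back]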
Micro-data!!! reinforcement learning
HUAWEI TECHNOLOGIES CO., LTD. Page 7
 Generative time-series predictors (= neural system models)
› Sample efficient: can be learned on a few thousand time steps
› Introspective and well-calibrated: honest about their own uncertainty
 Control and exploration using system models
› Basic model predictive control (random shooting)
› Active sampling and exploration
› Learn the control agent
› Multi-agent control and transfer learning
 Landing
› Wireless parameter tuning
› Data center cooling
› Diagnostics and debugging tools usable by engineers
Research program
HUAWEI TECHNOLOGIES CO., LTD. Page 8
 Predict (random) future from history of system observables and control
actions:
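$\mathbf{o}_{t+1} \sim p(\underbrace{\mathbf{o}_{t+1}}_{\mathbf{y}} \mid \underbrace{(\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t)}_{\mathbf{x}})$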
› We want to simulate
multiple futures from the model
Objective of neural system models
HUAWEI TECHNOLOGIES CO., LTD. Page 9
 Generative regression: predict $\mathbf{y} \sim p(\mathbf{y} \mid \mathbf{x})$ instead of $\mathbf{y} = f(\mathbf{x})$
› Predictors that are honest about their uncertainty: introspective models
 Requirements
› Both 𝒙 and 𝒚 are multidimensional
› Training should scale well with the dimension of 𝒙 and 𝒚 and the size of the training data
› Easy to compute likelihood
› Easy to sample (simulate)
› Able to model y-interdependence
› Able to model different types of variables
› Frequent semi-automatic retraining and retuning: robustness and debuggability
Objective
HUAWEI TECHNOLOGIES CO., LTD. Page 10
Can AI learn physics (of a system) from data?
[Figure: two-joint system with joint angles 𝜽1 and 𝜽2]
HUAWEI TECHNOLOGIES CO., LTD. Page 11
Yes it can!
Which one is the physical model and which one is AI?
You can vote in the chat window: AI is left or right?
HUAWEI TECHNOLOGIES CO., LTD. Page 12
Formal model illustrated on acrobot
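System observables: $\mathbf{o} = (\theta^2, \dot\theta^2, \theta^1, \dot\theta^1)$ (joint angles and angular velocities; superscripts index the joints)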
Actions: torque at second joint, $a \in \{\text{left}, \text{none}, \text{right}\}$
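Objective: learn $p(\mathbf{o}_{t+1} \mid (\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t))$
Decomposition 1 (summarizing history):
$p(\mathbf{o}_{t+1} \mid (\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t)) = p(\mathbf{o}_{t+1} \mid \mathbf{f}_{\mathrm{FE}}((\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t)))$
$\mathbf{f}_{\mathrm{FE}}$ is a time series feature extractor:
$\mathbf{s}_t = \mathbf{f}_{\mathrm{FE}}((\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t))$
$p(\mathbf{o}_{t+1} \mid (\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t)) = p(\mathbf{o}_{t+1} \mid \mathbf{s}_t)$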
Decomposition 2 (autoregression):
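$p(\mathbf{o}_{t+1} \mid \mathbf{s}_t) = p_1(\theta^2_{t+1} \mid \mathbf{s}_t)\, p_2(\dot\theta^2_{t+1} \mid \mathbf{s}_t, \theta^2_{t+1})\, p_3(\theta^1_{t+1} \mid \mathbf{s}_t, \theta^2_{t+1}, \dot\theta^2_{t+1})\, p_4(\dot\theta^1_{t+1} \mid \mathbf{s}_t, \theta^2_{t+1}, \dot\theta^2_{t+1}, \theta^1_{t+1})$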
Decomposition 3 (mixture model):
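$p(y \mid \mathbf{x}) = \sum_{\ell=1}^{L} w_\ell(\mathbf{x})\, \mathcal{P}_\ell(y; \theta_\ell(\mathbf{x}))$
$\mathcal{P}_\ell$: component type (e.g. Gaussian)
$w_\ell$: component weight
$\theta_\ell$: component parameters (e.g. $\mu$, $\sigma$)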
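A minimal sketch of Decomposition 3 for one output dimension, assuming Gaussian components. The weights, means, and sigmas below are hypothetical numbers standing in for what the network would output for a given 𝒙; the function names are illustrative, not taken from the talk.

```python
import math
import random


def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, mus, sigmas):
    """p(y | x) = sum_l w_l(x) * N(y; mu_l(x), sigma_l(x))."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


def mixture_sample(weights, mus, sigmas, rng):
    """Simulate: pick a component by its weight, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(mus[k], sigmas[k])


# Hypothetical network outputs for a single input x (assumed values)
weights = [0.3, 0.7]
mus = [-1.0, 2.0]
sigmas = [0.5, 1.0]

rng = random.Random(0)
density_at_zero = mixture_pdf(0.0, weights, mus, sigmas)
samples = [mixture_sample(weights, mus, sigmas, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
```

Both operations the requirements list asks for (likelihood and simulation) cost one pass over the L components.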
[Figure: acrobot with joint angles 𝜽1 and 𝜽2]
HUAWEI TECHNOLOGIES CO., LTD. Page 13
 1. Explicit summary of history $\mathbf{s}_t = \mathbf{f}_{\mathrm{FE}}((\mathbf{o}_1, a_1), \dots, (\mathbf{o}_t, a_t))$
› Simplifies the time series problem into "classical" prediction
› System engineers can input prior knowledge
› Can be fine-tuned using end to end training or extended to RNNs
 2. Autoregression $p(\mathbf{y} \mid \mathbf{x}) = p_1(y^1 \mid \mathbf{x}) \prod_{j=2}^{d} p_j(y^j \mid y^1, \dots, y^{j-1}, \mathbf{x})$
› Fighting curse of dimensionality:
» We reduce the 𝑑-dimensional model into 𝑑 one-dimensional models
› We can tune the models separately:
» unlike e.g. images, system logs may have varying column types
› Modelling y-interdependence: $y^1$ and $y^2$ can be strongly dependent given $\mathbf{x}$ in physical systems
 3. Mixture model $p(y \mid \mathbf{x}) = \sum_{\ell=1}^{L} w_\ell(\mathbf{x})\, \mathcal{P}_\ell(y; \theta_\ell(\mathbf{x}))$
› Simple: easy to compute likelihood, easy to simulate from
› Versatile: can use prior knowledge (component type), can approximate any density
Why the decompositions?
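A small sketch of the autoregressive factorization with d = 2. The two conditionals are hand-written toy functions, assumed here in place of the one-dimensional networks, and each uses a single Gaussian component to keep the example short.

```python
import math
import random


def gaussian_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical stand-ins for the d one-dimensional models: each maps
# (x, y^1..y^{j-1}) to the (mu, sigma) of a single Gaussian.
def cond_1(x, y_prev):
    return x, 1.0


def cond_2(x, y_prev):
    return 2.0 * y_prev[0], 0.1  # y^2 depends strongly on y^1


CONDITIONALS = [cond_1, cond_2]


def sample_autoregressive(x, rng):
    """Sample y^1, then y^2 given y^1, and so on."""
    y = []
    for cond in CONDITIONALS:
        mu, sigma = cond(x, y)
        y.append(rng.gauss(mu, sigma))
    return y


def log_likelihood(x, y):
    """log p(y | x) = sum_j log p_j(y^j | y^1..y^{j-1}, x)."""
    total = 0.0
    for j, cond in enumerate(CONDITIONALS):
        mu, sigma = cond(x, y[:j])
        total += gaussian_logpdf(y[j], mu, sigma)
    return total


rng = random.Random(0)
draws = [sample_autoregressive(0.0, rng) for _ in range(2000)]
```

In this toy setup the sampled pairs are almost perfectly correlated, which a model with independent output dimensions could not reproduce.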
HUAWEI TECHNOLOGIES CO., LTD. Page 14
 Any regressor + fixed sigma: $p(y \mid \mathbf{x}) = \mathcal{N}(f(\mathbf{x}; \boldsymbol{\theta}), \sigma)$
› Linear regression
› Classical neural nets
 We learn the parameters (𝑤(𝒙) and 𝜃(𝒙)) with a deep neural net:
deep autoregressive mixture density nets = DARMDN ("darm-dee-en")
› DARMDN(1) with a single Gaussian component: heteroscedastic $p(y \mid \mathbf{x}) = \mathcal{N}(\mu(\mathbf{x}), \sigma(\mathbf{x}))$
› DARMDN(10)
 Non-autoregressive models
› Gaussian process
› DMDN(10): classical mixture density nets with multivariate Gaussian components [Bishop 1994]
› Both assume y-independence
How do we learn the model?
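The slides do not spell out the training loss. A common choice for mixture density nets is the negative log-likelihood, and the sketch below assumes it. It also assumes the network emits unnormalized logits for the weights and log-sigmas for the scales; these parameterizations and names are illustrative.

```python
import math


def softmax(logits):
    """Turn raw network outputs into component weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mixture_nll(y, logits, mus, log_sigmas):
    """-log sum_l w_l N(y; mu_l, sigma_l), computed with log-sum-exp for stability."""
    weights = softmax(logits)
    log_terms = []
    for w, mu, ls in zip(weights, mus, log_sigmas):
        z = (y - mu) / math.exp(ls)
        log_terms.append(math.log(w) - 0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi))
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


# One component reduces to the heteroscedastic Gaussian case, DARMDN(1)
nll_single = mixture_nll(0.0, [0.0], [0.0], [0.0])
# Two identical components give the same density, hence the same loss
nll_double = mixture_nll(0.0, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Averaging this loss over training points and minimizing it with any gradient-based optimizer would fit the weights and component parameters jointly.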
HUAWEI TECHNOLOGIES CO., LTD. Page 15
What is y-interdependence and why is it important?
[Figure: axes sin 𝜃 and cos 𝜃; panels GP, DMDN(5), DARMDN(1)]
HUAWEI TECHNOLOGIES CO., LTD. Page 16
What is y-interdependence and why is it important?
HUAWEI TECHNOLOGIES CO., LTD. Page 18
 Approximation capacity in system modelling
› We want to be able to represent the real system dynamics efficiently
› We also want to have realistic representation of uncertainty ("plausible futures") to support
exploration
 "Raw angles" acrobot
› Normally angles are transformed using sine and cosine to make the system dynamics smooth
› What if we are agnostic? We do not know if a system variable is an angle
› Abrupt jumps are OK, but if we have (epistemic) uncertainty, posteriors need to be multimodal
Is multi-modal posterior predictive important?
HUAWEI TECHNOLOGIES CO., LTD. Page 19
Is multi-modal posterior predictive important?
"raw angles" acrobot
HUAWEI TECHNOLOGIES CO., LTD. Page 20
› Baseline density (log-likelihood ℒb) is a multivariate unconditional spherical Gaussian
› Measures how much more likely the data is under the learned model than under the baseline
› Baseline = 1; higher is better; no upper limit
Evaluation
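Likelihood ratio to simple baseline
$\mathrm{LR}\big((\mathbf{o}_t, a_t)_{t=1}^{T}; p\big) = \frac{e^{\mathcal{L}((\mathbf{o}_t, a_t)_{t=1}^{T};\, p)}}{e^{\mathcal{L}_b((\mathbf{o}_t, a_t)_{t=1}^{T})}}$
Log Likelihood
$\mathcal{L}\big((\mathbf{o}_t, a_t)_{t=1}^{T}; p\big) = \frac{1}{T-1} \sum_{t=1}^{T-1} \Big( \log p_1(o^1_{t+1} \mid \mathbf{s}_t) + \sum_{j=2}^{4} \log p_j(o^j_{t+1} \mid \mathbf{s}_t, o^1_{t+1}, \dots, o^{j-1}_{t+1}) \Big)$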
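A sketch of how the likelihood ratio could be computed from per-step log-likelihoods. It assumes the spherical baseline is fit by maximum likelihood with a per-dimension mean and one shared variance; the slide only says "unconditional spherical Gaussian", so that fitting detail is an assumption.

```python
import math


def spherical_gaussian_loglik(targets):
    """Mean per-step log-likelihood of a spherical Gaussian fit to the targets."""
    n, d = len(targets), len(targets[0])
    means = [sum(row[j] for row in targets) / n for j in range(d)]
    var = sum((row[j] - means[j]) ** 2 for row in targets for j in range(d)) / (n * d)
    # At the maximum-likelihood fit, the average squared standardized residual is 1
    return -0.5 * d * (math.log(2.0 * math.pi * var) + 1.0)


def likelihood_ratio(model_logliks, baseline_loglik):
    """LR = exp(mean model log-likelihood) / exp(baseline log-likelihood)."""
    mean_model = sum(model_logliks) / len(model_logliks)
    return math.exp(mean_model - baseline_loglik)


# Toy data: two steps of a 2-dimensional observable (made-up numbers)
targets = [[0.0, 0.0], [2.0, 2.0]]
baseline = spherical_gaussian_loglik(targets)

# A hypothetical model whose per-step log-likelihood is log(2) above the baseline
model_logliks = [baseline + math.log(2.0)] * 5
lr = likelihood_ratio(model_logliks, baseline)
```

A model that is no better than the baseline gives LR = 1, matching the "Baseline = 1" convention on the slide.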
HUAWEI TECHNOLOGIES CO., LTD. Page 21
Results on skewed acrobot data
Acrobot "sincos", data generated with linear policy; time series, 5K training points
Algorithm | Likelihood ratio to spherical Gaussian | Precision (R2) after 10 steps | Calibratedness (Kolmogorov-Smirnov) after 10 steps
Linear regression + constant sigma | 2 | 4% | 0.127
Gaussian process | 56 | 83% | 0.133
NN regression + constant sigma | 32 | 55% | 0.194
DMDN with 10 components | 95 | 90% | 0.128
DARMDN with 10 components | 119 | 87% | 0.095
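The slides do not define the calibratedness score beyond naming Kolmogorov-Smirnov. One plausible reading, assumed here, is the KS distance between the uniform distribution and the model's predictive CDF evaluated at the observed values (lower is better). The sketch uses a Gaussian predictive CDF for simplicity.

```python
import math


def gaussian_cdf(y, mu, sigma):
    """Predictive CDF of N(mu, sigma) evaluated at the observed y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def ks_to_uniform(cdf_values):
    """Kolmogorov-Smirnov distance between the empirical CDF of the values and U(0, 1)."""
    u = sorted(cdf_values)
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))


# Evenly spread CDF values: as calibrated as four points can be
ks_good = ks_to_uniform([0.125, 0.375, 0.625, 0.875])
# All observations at the predicted median: predictive sigma is far too wide
ks_bad = ks_to_uniform([0.5, 0.5, 0.5, 0.5])
```

For a well-calibrated model the CDF values of held-out observations are close to uniform, so the statistic shrinks toward 0 as the data grows.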
 DARMDN is both precise and well-calibrated
 OK, but does it matter for model-based RL?
HUAWEI TECHNOLOGIES CO., LTD. Page 22
1. Collect samples from a random policy
2. Train model on collected samples
3. Learn control policy on the model
4. Apply control policy on real system and collect the data, go back to 2.
Model-based RL loop
 We retrain the model after each episode of 200 steps
 Control policy is classical random shooting (RS) [Richards 2005]
› Simulate trajectories of 𝑁 = 10 steps using random actions
› Select the optimal trajectory (with the highest reward after 𝑁 steps)
› Execute the first action of the optimal trajectory
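The random shooting steps above can be sketched as follows. The one-dimensional `model_step` and `reward` are toy stand-ins, not the acrobot or the learned DARMDN; the number of simulated trajectories (`n_traj`) is not given on the slide and is an assumed value; trajectories are scored by cumulative reward over the horizon, which is one reading of "highest reward after N steps".

```python
import random

ACTIONS = [-1, 0, 1]  # e.g. left, none, right


def model_step(obs, action, rng):
    """Hypothetical learned model: samples the next observation."""
    return obs + 0.1 * action + rng.gauss(0.0, 0.01)


def reward(obs):
    """Hypothetical reward: stay close to a target at 1.0."""
    return -abs(obs - 1.0)


def random_shooting(obs, rng, n_traj=100, horizon=10):
    """Simulate random action sequences and return the first action of the best one."""
    best_return, best_first_action = float("-inf"), None
    for _ in range(n_traj):
        actions = [rng.choice(ACTIONS) for _ in range(horizon)]
        sim_obs, total = obs, 0.0
        for a in actions:
            sim_obs = model_step(sim_obs, a, rng)
            total += reward(sim_obs)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action


rng = random.Random(0)
obs = 0.0
for _ in range(30):
    action = random_shooting(obs, rng)
    obs = model_step(obs, action, rng)  # stand-in for acting on the real system
```

In the full loop, the transitions collected while acting would be appended to the training set and the model retrained after each episode.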
HUAWEI TECHNOLOGIES CO., LTD. Page 23
Acrobot "raw angles"
 DARMDN with random shooting is the new SOTA
› Almost as good as planning using the real system dynamics
› Converges 2 to 4 times faster than previous SOTA
HUAWEI TECHNOLOGIES CO., LTD. Page 24
Acrobot "sincos"
HUAWEI TECHNOLOGIES CO., LTD. Page 25
Learnt policy after ~10k samples
HUAWEI TECHNOLOGIES CO., LTD. Page 26
Deterministic predictors
 Do we really need to represent uncertainty?
 $p(y \mid \mathbf{x}) = \mathrm{Dirac}(f(\mathbf{x}; \boldsymbol{\theta}))$
 What models?
› NNdet: classical neural net
› DARMDN(10)det: mean of the predictive posterior:
a deterministic model learned probabilistically
HUAWEI TECHNOLOGIES CO., LTD. Page 27
Acrobot "raw angle": no surprise
deterministic models are suboptimal
HUAWEI TECHNOLOGIES CO., LTD. Page 28
Acrobot "raw angle": no surprise
deterministic models are suboptimal
HUAWEI TECHNOLOGIES CO., LTD. Page 29
Acrobot "sincos": what?
Deterministic model is optimal but only if learned probabilistically
HUAWEI TECHNOLOGIES CO., LTD. Page 30
Acrobot "sincos": what?
Deterministic model is optimal but only if learned probabilistically
HUAWEI TECHNOLOGIES CO., LTD. Page 31
Is it heteroscedasticity or multimodality?
HUAWEI TECHNOLOGIES CO., LTD. Page 32
It is heteroscedasticity
HUAWEI TECHNOLOGIES CO., LTD. Page 33
 Model-based control, bandits, and reinforcement learning
› Learn to control the system in a sample efficient way:
» "real world will not become faster in a few years, contrary to computers"
[Chatzilygeroudis et al., 2019]
› State of the art suffers from the lack of efficient system modelling tools
› Modelling uncertainties is crucial for safety
 Bayesian optimization
› Require good and efficient models to quantify uncertainty due to unknown
 Transfer learning, meta-learning, and robust reinforcement learning
› Precise probabilistic system models make it possible to transfer models between systems of the same kind
 Anomaly detection
› Anomaly = system state is beyond "likely" behavior
Broader applications of DARMDN
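One way the anomaly detection idea could look in practice, assuming an observation is flagged when its log-likelihood under the model falls below a low quantile of log-likelihoods seen on reference data. The quantile, the N(0, 1) predictive, and the function names are illustrative assumptions.

```python
import math


def gaussian_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def fit_threshold(reference_logliks, quantile=0.05):
    """Log-likelihood value below which roughly `quantile` of the reference data falls."""
    ordered = sorted(reference_logliks)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomaly(loglik, threshold):
    """Anomaly = the observed state is beyond 'likely' behavior under the model."""
    return loglik < threshold


# Hypothetical model prediction for the next observation: N(0, 1)
reference = [gaussian_logpdf(0.1 * k, 0.0, 1.0) for k in range(-20, 21)]
threshold = fit_threshold(reference)

normal_flag = is_anomaly(gaussian_logpdf(0.3, 0.0, 1.0), threshold)
outlier_flag = is_anomaly(gaussian_logpdf(6.0, 0.0, 1.0), threshold)
```

With an autoregressive mixture model, the per-step log-likelihood would be the sum of the one-dimensional conditional log-densities instead of a single Gaussian term.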
HUAWEI TECHNOLOGIES CO., LTD. Page 34
 Deep autoregressive mixture density nets (DARMDN) + random shooting is the new
SOTA on Acrobot
 Autoregression is useful for modelling y-interdependence
 Multimodal posterior predictive is necessary on "raw angles" representation
 Deterministic DARMDN is as good as stochastic models on "sincos"
representation, beats NN model trained for deterministic (RMSE) loss
› Something happens in the long horizon, no error accumulation
› Perhaps heteroscedastic epistemic uncertainty models may "let outliers go"?
Conclusions
Thank you

DARMDN: Deep autoregressive mixture density nets for dynamical system modelling

  • 1. HUAWEI TECHNOLOGIES CO., LTD. www.huawei.com DARMDN: Deep autoregressive mixture density nets for dynamical system modelling — Balazs Kegl, Gabriel Hurtado, Albert Thomas for Noah's Ark Research Lab, Paris
  • 2. HUAWEI TECHNOLOGIES CO., LTD. Page 2 Develop neural simulators trained on short system logs Objective B. Kegl / Huawei Research France
  • 3. HUAWEI TECHNOLOGIES CO., LTD. Page 3 Why?  Automate engineering systems › Data center cooling › Wireless parameter tuning › Wifi setup  Predictive maintenance › Copper and optical end-user devices › Wireless network devices › Data center servers  We believe these are only the tip of the iceberg B. Kegl / Huawei Research France
  • 4. HUAWEI TECHNOLOGIES CO., LTD. Page 4 AI: Highly visible breakthroughs
  • 5. HUAWEI TECHNOLOGIES CO., LTD. Page 5 Why aren't these algorithms already in engineering systems?
  • 6. HUAWEI TECHNOLOGIES CO., LTD. Page 6  Physical systems do not get faster with time  System access is tightly controlled by engineers whose responsibility is to keep the systems running Why is it hard? BU Engineer System 𝒂 𝒕 𝒐 𝒕, 𝒓𝒕 Micro-data!!! reinforcement learning
  • 7. HUAWEI TECHNOLOGIES CO., LTD. Page 7  Generative time-series predictors (= neural system models) › Sample efficient: can be learned on a couple of thousands of time steps › Introspective and well-calibrated: honest about their own uncertainty  Control and exploration using system models › Basic model predictive control (random shooting) › Active sampling and exploration › Learn the control agent › Multi-agent control and transfer learning  Landing › Wireless parameter tuning › Data center cooling › Diagnostics and debugging tools usable by engineers Research program B. Kegl / Huawei Research France        
  • 8. HUAWEI TECHNOLOGIES CO., LTD. Page 8  Predict (random) future from history of system observables and control actions: 𝒐 𝑡+1 ~ 𝒑 𝒚 𝒐 𝑡+1 𝒙 𝒐1, 𝑎1 , … 𝒐 𝑡, 𝑎 𝑡 › We want to simulate multiple futures from the model Objective of neural system models B. Kegl / Huawei Research France
  • 9. HUAWEI TECHNOLOGIES CO., LTD. Page 9  Generative regression: predict 𝒚 ~ 𝑝 𝒚 𝒙) instead of 𝒚 = 𝑓 𝒙 › Predictors that are honest about their uncertainty: introspective models  Requirements › Both 𝒙 and 𝒚 are multidimensional › Training should scale well with the dimension of 𝒙 and 𝒚 and the size of the training data › Easy to compute likelihood › Easy to sample (simulate) › Able to model y-interdependence › Able to model different types of variables › Frequent semi-automatic retraining and retuning: robustness and debuggability Objective B. Kegl / Huawei Research France
  • 10. HUAWEI TECHNOLOGIES CO., LTD. Page 10 Can AI learn physics (of a system) from data? 𝜽 𝟏 𝜽 𝟐 B. Kegl / Huawei Research France
  • 11. HUAWEI TECHNOLOGIES CO., LTD. Page 11 Yes it can! Which one is the physical model and which one is AI? You can vote in the chat window: AI is left or right? B. Kegl / Huawei Research France
  • 12. HUAWEI TECHNOLOGIES CO., LTD. Page 12 Formal model illustrated on acrobot System observables: 𝒐 = (𝜃2 , 𝜃2 , 𝜃1 , 𝜃1 ) Actions: torque at second joint, 𝑎 = {left, none, right} Objective: learn 𝒑(𝒐 𝑡+1|(𝒐1, 𝑎1), … , (𝒐 𝑡, 𝑎 𝑡)) Decomposition 1 (summarizing history): 𝒑(𝒐 𝑡+1|(𝒐1, 𝑎1), … , (𝒐 𝑡, 𝑎 𝑡)) = 𝒑 𝒐 𝑡+1 𝒇FE 𝒐1, 𝑎1 , … , 𝒐 𝑡, 𝑎 𝑡 𝒇FE is a time series feature extractor: 𝒔𝑡 = 𝒇FE 𝒐1, 𝑎1 , … , 𝒐 𝑡, 𝑎 𝑡 𝒑(𝒐 𝑡+1|(𝒐1, 𝑎1), … , (𝒐 𝑡, 𝑎 𝑡)) = 𝒑(𝒐 𝑡+1|𝒔𝑡) Decomposition 2 (autoregression): 𝒑 𝒐 𝑡+1 𝒔𝑡 = 𝑝1 𝜃𝑡+1 2 𝒔𝑡 𝑝2 𝜃𝑡+1 2 𝒔 𝑡, 𝜃𝑡+1 2 𝑝3 𝜃𝑡+1 1 𝒔 𝑡, 𝜃𝑡+1 2 , 𝜃𝑡+1 2 𝑝4 𝜃𝑡+1 1 𝒔 𝑡, 𝜃𝑡+1 2 , 𝜃𝑡+1 2 , 𝜃𝑡+1 1 Decomposition 3 (mixture model): 𝑝 𝑦 𝒙) = ℓ=1 𝐿 𝑤ℓ (𝒙)𝒫ℓ 𝑦; 𝜃ℓ (𝒙) 𝒫: component type (e.g. Gaussian) 𝑤: component weight 𝜃: component parameters (e.g. μ, 𝜎) B. Kegl / Huawei Research France 𝜽 𝟏 𝜽 𝟐
  • 13. HUAWEI TECHNOLOGIES CO., LTD. Page 13  1. Explicit summary of history 𝒔𝑡 = 𝒇FE 𝒐1, 𝑎1 , … , 𝒐 𝑡, 𝑎 𝑡 › Simplifies the time series problem into "classical" prediction › System engineers can input prior knowledge › Can be fine-tuned using end to end training or extended to RNNs  2. Autoregression 𝑝 𝒚 𝒙) = 𝑝1 𝑦1 𝒙) 𝑗=2 𝑑 𝑝𝑗 𝑦 𝑗 𝑦1, … , 𝑦 𝑗−1, 𝒙) › Fighting curse of dimensionality: » We reduce the 𝑑-dimensional model into 𝑑 one-dimensional models › We can tune the models separately: » unlike e.g. images, system logs may have varying column types › Modelling y-interdependence: 𝑝 𝑦1 𝒙) and 𝑝 𝑦2 𝒙) can be strongly dependent in physical systems  3. Mixture model 𝑝 𝑦 𝒙) = ℓ=1 𝐿 𝑤ℓ(𝒙)𝒫ℓ 𝑦; 𝜃ℓ(𝒙) › Simple: easy to compute likelihood, easy to simulate from › Versatile: can use prior knowledge (component type), can approximate any density Why the decompositions? B. Kegl / Huawei Research France
  • 14. HUAWEI TECHNOLOGIES CO., LTD. Page 14  Any regressor + fixed sigma: 𝑝 𝑦 𝒙) = 𝑵(𝒇 𝒙; 𝛉 , 𝝈) › Linear regression › Classical neural nets  We learn the parameters (𝑤(𝒙) and 𝜃(𝒙)) with a deep neural net: deep autoregressive mixture density nets = DARMDN ("darm-dee-en") › DARMDN(1) with a single Gaussian component: heteroscedastic 𝑝 𝑦 𝒙) = 𝑵 𝝁 𝒙 , 𝝈 𝒙 › DARMDN(10)  Non-autoregressive models › Gaussian process › DMDN(10): classical mixture density nets with multivariate Gaussian components [Bishop 1994] › Both assume y-independence How do we learn the model? B. Kegl / Huawei Research France
  • 15. HUAWEI TECHNOLOGIES CO., LTD. Page 15 What is y-interdependence and why is it important? B. Kegl / Huawei Research France sin𝜃 cos𝜃 GP DMDN(5) DARMDN(1)
  • 16. HUAWEI TECHNOLOGIES CO., LTD. Page 16 What is y-interdependence and why is it important? B. Kegl / Huawei Research France
  • 17. HUAWEI TECHNOLOGIES CO., LTD. Page 18  Approximation capacity in system modelling › We want to be able to represent the real system dynamics efficiently › We also want to have realistic representation of uncertainty ("plausible futures") to support exploration  "Raw angles" acrobot › Normally angles are transformed using sine and cosine to make the system dynamics smooth › What if we are agnostic? We do not know if a system variable is an angle › Abrupt jumps are OK, but if we have (epistemic) uncertainty, posteriors need to be multimodal B. Kegl / Huawei Research France Is multi-modal posterior predictive important?
  • 18. HUAWEI TECHNOLOGIES CO., LTD. Page 19 Is multi-modal posterior predictive important? "raw angles" acrobot B. Kegl / Huawei Research France
  • 19. HUAWEI TECHNOLOGIES CO., LTD. Page 20 › Baseline density ℒb is a multivariate unconditional spherical Gaussian › Measures how much the data is more likely under the learned model than under the baseline likelihood › Baseline = 1, higher the better, no limit Evaluation Likelihood ratio to simple baseline 𝐿𝑅 𝒐 𝑡, 𝑎 𝑡 𝑡=1 𝑇 ; 𝒑 = 𝒆ℒ 𝒐 𝑡,𝑎 𝑡 𝑡=1 𝑇 ;𝒑 𝒆ℒb 𝒐 𝑡,𝑎 𝑡 𝑡=1 𝑇 Log Likelihood ℒ 𝒐 𝑡, 𝑎 𝑡 𝑡=1 𝑇 ; 𝒑 = 1 𝑇 − 1 𝑡=1 𝑇−1 log 𝑝1 𝑜𝑡+1 1 𝒔 𝑡 + 𝑗=2 4 log 𝑝𝑗 𝑜𝑡+1 𝑗 𝒔 𝑡, 𝑜𝑡+1 1 , … , 𝑜𝑡+1 𝑗−1 B. Kegl / Huawei Research France
  • 20. HUAWEI TECHNOLOGIES CO., LTD. Page 21 Results on skewed acrobot data Algorithm Acrobot "sincos", data generated with linear policy time series, 5K training points Likelihood ratio to spherical Gaussian Precision (R2) after 10 steps Calibratedness (Kolmogorov-Smirnov) after 10 steps Linear regression + constant sigma 2 4% 0.127 Gaussian process 56 83% 0.133 NN regression + constant sigma 32 55% 0.194 DMDN with 10 components 95 90% 0.128 DARMDN with 10 components 119 87% 0.095 B. Kegl / Huawei Research France  DARMDN is both precise and well-calibrated  OK, but does it matter for model-based RL?
  • 21. HUAWEI TECHNOLOGIES CO., LTD. Page 22 1. Collect samples from a random policy 2. Train model on collected samples 3. Learn control policy on the model 4. Apply control policy on real system and collect the data, go back to 2. Model-based RL loop B. Kegl / Huawei Research France  We retrain the model after each episode of 200 steps  Control policy is classical random shooting (RS) [Richards 2005] › Simulate trajectories of 𝑁 = 10 steps using random actions › Select the optimal trajectory (with the highest reward after 𝑁 steps) › Execute the first action of the optimal trajectory
  • 22. Acrobot "raw angles"
    ▪ DARMDN with random shooting is the new SOTA
      › Almost as good as planning with the real system dynamics
      › Converges 2 to 4 times faster than the previous SOTA
  • 23. Acrobot "sincos"
  • 24. Learnt policy after ~10k samples
  • 25. Deterministic predictors
    ▪ Do we really need to represent uncertainty?
    ▪ $p(y \mid \boldsymbol{x}) = \mathrm{Dirac}\big(\boldsymbol{f}(\boldsymbol{x}; \boldsymbol{\theta})\big)$
    ▪ What models?
      › NNdet: classical neural net
      › DARMDN(10)det: mean of the predictive posterior, that is, a deterministic model learned probabilistically
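To make the difference between a stochastic and a "det" predictor concrete, here is a small sketch for a one-dimensional Gaussian mixture. It assumes the predictive distribution is given by hypothetical arrays of component weights, means, and standard deviations; the function names are illustrative only.

```python
import numpy as np

def mixture_sample(weights, means, sigmas, n, rng):
    """Stochastic predictor: draw n samples from a 1-D Gaussian mixture."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Pick a component per sample, then sample from that component.
    k = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return rng.normal(means[k], sigmas[k])

def mixture_mean(weights, means):
    """Deterministic predictor: the mean of the mixture."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    return float(np.dot(weights, means) / weights.sum())
```

For a bimodal prediction with equal weights and means at -3.1 and 3.1 (an angle close to the wrap-around point), the samples stay near the two modes while the mean is 0, far from both.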
  • 26. Acrobot "raw angles": no surprise, deterministic models are suboptimal
  • 27. Acrobot "raw angles": no surprise, deterministic models are suboptimal
  • 28. Acrobot "sincos": what? The deterministic model is optimal, but only if learned probabilistically
  • 29. Acrobot "sincos": what? The deterministic model is optimal, but only if learned probabilistically
  • 30. Is it heteroscedasticity or multimodality?
  • 31. It is heteroscedasticity
  • 32. Broader applications of DARMDN
    ▪ Model-based control, bandits, and reinforcement learning
      › Learn to control the system in a sample-efficient way: "real world will not become faster in a few years, contrary to computers" [Chatzilygeroudis et al., 2019]
      › The state of the art suffers from a lack of efficient system modelling tools
      › Modelling uncertainties is crucial for safety
    ▪ Bayesian optimization
      › Requires good and efficient models to quantify uncertainty due to unknown
    ▪ Transfer learning, meta-learning, and robust reinforcement learning
      › Precise probabilistic system models make it possible to transfer models between systems of the same kind
    ▪ Anomaly detection
      › Anomaly = system state is beyond "likely" behavior
  • 33. Conclusions
    ▪ Deep autoregressive mixture density nets (DARMDN) + random shooting is the new SOTA on Acrobot
    ▪ Autoregression is useful for modelling y-interdependence
    ▪ A multimodal posterior predictive is necessary on the "raw angles" representation
    ▪ Deterministic DARMDN is as good as stochastic models on the "sincos" representation, and beats the NN model trained with a deterministic (RMSE) loss
      › Something happens over the long horizon: no error accumulation
      › Perhaps heteroscedastic epistemic uncertainty models may "let outliers go"?
  • 34. Thank you www.huawei.com Copyright©2015 Huawei Technologies Co., Ltd. All Rights Reserved.