
Deeplearning ai june-sharable (1)

State-of-the-art time-series prediction with continuous-time recurrent neural networks.

Neural networks with continuous-time hidden state representations have become unprecedentedly popular within the machine learning community. This is due to their strong approximation capability in modeling time series, their adaptive computation modality, and their memory and parameter efficiency. In this talk Ramin will discuss how this family of neural networks works and why it achieves attractive degrees of generalizability across different application domains.

OUR SPEAKER
Ramin Hasani, PhD, Machine Learning Scientist at TU Wien and expert in robotics, previously a research scholar at MIT CSAIL, presents the technical aspects of continuous-time neural networks.



  1. On the Expressive Power of Time-continuous Neural Networks. Ramin Hasani, June 12th, 2020. Liquid Time-constant Networks, R. Hasani, M. Lechner, A. Amini, D. Rus, R. Grosu, submitted to NeurIPS 2020, https://arxiv.org/abs/2006.04439
  2. What is a time-continuous neural network? [Chen et al. NeurIPS 2018; He et al. CVPR 2016] Standard deep networks are described by their number of layers, width, activations, inputs, and model parameters; a dynamical-systems view instead describes a hidden state evolving in time. What is depth here?
  3. Time-continuous neural networks: the Neural ODE [Chen et al. NeurIPS 2018] and the continuous-time (CT) RNN [Funahashi et al. 1993; Chen et al. NeurIPS 2018] (their defining equations are reproduced after the slide list).
  4. Time-continuous neural networks: how to implement them? With numerical ODE solvers: dx(t)/dt ≈ (x(t + δt) − x(t)) / δt ≈ f(x(t), t, θ), giving the forward pass x(t + δt) = x(t) + δt · f(x(t), t, θ). The choice of integration step determines the forward-pass complexity (a minimal Euler-step sketch follows the slide list).
  5. Time-continuous neural networks: how to train them? Neural ODEs use the adjoint sensitivity method [Pontryagin et al. 1962; Chen et al. NeurIPS 2018], with memory complexity O(1) per layer of f (an adjoint-training sketch follows the slide list).
  6. Time-continuous neural networks: how to train them? Backpropagation through time (BPTT) [Werbos 1990; Hasani et al. 2020], with memory complexity O(L * T) per layer of f. Perform one forward pass x(t + δt) = x(t) + δt · f(x(t), t, θ); compute gradients through the ODE solver from the partials dL/dx(t + δt), dx(t + δt)/dx(t), dx(t + δt)/df, df/dx(t), df/dt, and df/dθ; then update the parameters Θ_new ← Θ_old + γ dΘ (a BPTT sketch follows the slide list).
  7. Time-continuous neural networks: better to stay with BPTT [Hasani et al. 2020; Gholami et al. arXiv 2019].
  8. Time-continuous neural networks: how to use them? Formulate them as recurrent neural networks (RNNs) [Rubanova et al. NeurIPS 2019].
  9. Time-continuous neural networks: what are they useful for? They are best used as recurrent neural networks (RNNs) for irregularly sampled time series and sparse data (e.g., missing values, non-uniform intervals) [Rubanova et al. NeurIPS 2019].
  10. Time-continuous neural networks: what are they useful for? Best practice is to use them as RNNs for irregularly sampled time series [Rubanova et al. NeurIPS 2019] (an ODE-RNN sketch follows the slide list).
  11. Time-continuous neural networks: liquid time-constant networks (LTCs) [Hasani et al. 2020]. Instead of the Neural ODE and CT-RNN state equations, LTCs use the formulation reproduced after the slide list.
  12. What is the motivation behind choosing LTC models?
  13. Expressivity: defining a better measure. Trajectory length as a measure of the expressivity of deep networks [Raghu et al. ICML 2017]: an input trajectory is passed through n hidden layers, each of width k, with weights ∼ 𝒩(0, σ_w²/k) and ReLU activations; the hidden representations are projected to a latent 2-D trajectory space and the arc length of the resulting trajectory is measured [Hasani et al. 2020].
  14. Expressivity: trajectory length as a measure of expressivity [Hasani et al. 2020]. Let's implement the trajectory space for time-continuous models (LTC, CT-RNN, Neural ODE): pass an input trajectory through each model with weights ∼ 𝒩(0, σ_w²/k) and ReLU, tanh, or logistic-sigmoid activations, then project the hidden states to a latent 2-D trajectory space with PCA (principal component analysis) (a trajectory-length sketch follows the slide list).
  15. Expressivity: trajectory length as a measure of expressivity [Hasani et al. 2020].
  16. Expressivity: trajectory length as a measure of expressivity [Hasani et al. 2020]. [Figure: trajectory length of LTC, Neural ODE (N-ODE), and CT-RNN versus (A) choice of ODE solver (RK2(3), RK4(5), ABM1(13), TR-BDF2), (B) network width k, (C) weight variance σ_w², and (D) network depth (layers L1-L6), with PCA variance-explained panels for each model; settings include 100 samples, RK45 solver, ReLU/tanh/sigmoid activations, σ_w² = 2, σ_b² = 1.]
  17. Expressivity: trajectory length as a measure of expressivity [Hasani et al. 2020]. [Figure: trajectory length of LTC, N-ODE, and CT-RNN versus network depth (layers L1-L4), network width k, weight variance σ_w², and input step size, with hard-tanh and sigmoid activations; 100 samples, RK45 solver, σ_w² = 2, σ_b² = 1.]
  18. Expressivity: a trajectory-length lower bound for LTCs [Hasani et al. 2020] (the bound itself appears only in the slide image).
  19. Performance: LTCs in modeling physical dynamics [Hasani et al. 2020].
  20. Performance: LTCs in modeling irregularly sampled data [Hasani et al. 2020].
  21. Performance: LTCs in modeling more time-series data [Hasani et al. 2020].
  22. Summary: ✓ Time-continuous models are useful for modeling irregularly sampled data. ✓ Training Neural ODEs by the adjoint method trades accuracy for low memory cost. ✓ Neural ODEs are not limited to the dx/dt = f(x, I, t, θ) representation. ✓ More expressive dynamics can be achieved with LTCs. ✓ Time-continuous models can be combined with conv nets to perform robotics tasks. ✓ Downside: all ODE-RNNs provably suffer from vanishing/exploding gradients; solution: Learning Long-Term Dependencies in Irregularly-Sampled Time Series, M. Lechner and R. Hasani, submitted to NeurIPS 2020, https://arxiv.org/abs/2006.04418
  23. Thank you! Feel free to reach out: rhasani@mit.edu
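The defining equations referenced on slide 3 survive only as images in this export. As a point of reference, here is a minimal LaTeX rendering of the standard forms from the cited papers; the exact notation on the original slide may differ.

```latex
% Neural ODE [Chen et al. 2018]: the hidden state follows a learned vector field f.
\frac{d\mathbf{x}(t)}{dt} = f\bigl(\mathbf{x}(t), t, \theta\bigr)

% Continuous-time RNN [Funahashi et al. 1993]: a leaky integrator with time
% constant tau, driven by a nonlinearity of the state and the input I(t).
\frac{d\mathbf{x}(t)}{dt} = -\frac{\mathbf{x}(t)}{\tau}
    + f\bigl(\mathbf{x}(t), \mathbf{I}(t), t, \theta\bigr)
```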
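Slide 4's forward pass is a single explicit-Euler step. Below is a minimal NumPy sketch of that update; the one-layer tanh vector field, its parameters, and the dimensions are illustrative assumptions rather than the talk's actual implementation.

```python
import numpy as np

# Minimal sketch of slide 4's explicit-Euler forward pass:
#   x(t + dt) = x(t) + dt * f(x(t), t, theta)

def f(x, t, theta):
    """Toy neural vector field: a single tanh layer (time-invariant here)."""
    W, b = theta
    return np.tanh(W @ x + b)

def euler_forward(x0, t0, t1, dt, theta):
    """Integrate dx/dt = f(x, t, theta) from t0 to t1 with fixed step dt."""
    x, t = x0, t0
    while t < t1:
        x = x + dt * f(x, t, theta)  # the slide's forward-pass update
        t += dt
    return x

# Usage: a 4-dimensional hidden state integrated over one time unit.
rng = np.random.default_rng(0)
theta = (rng.normal(size=(4, 4)), rng.normal(size=4))
x1 = euler_forward(x0=np.zeros(4), t0=0.0, t1=1.0, dt=0.1, theta=theta)
print(x1)
```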
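For slide 5, a minimal sketch of adjoint-sensitivity training using the torchdiffeq library (the reference implementation accompanying Chen et al. 2018): gradients come from solving a backward ODE, so memory stays constant in the number of solver steps. The toy vector field, target, and loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # pip install torchdiffeq

class VectorField(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, t, x):          # signature required by torchdiffeq
        return torch.tanh(self.lin(x))

func = VectorField(dim=4)
opt = torch.optim.Adam(func.parameters(), lr=1e-2)

x0 = torch.zeros(1, 4)                # initial hidden state
ts = torch.linspace(0.0, 1.0, 10)     # evaluation time points
target = torch.ones(1, 4)             # toy regression target

opt.zero_grad()
xs = odeint(func, x0, ts)             # forward solve; xs: (len(ts), 1, 4)
loss = ((xs[-1] - target) ** 2).mean()
loss.backward()                       # backward ODE pass (adjoint method)
opt.step()
```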
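For slide 6, a minimal sketch of BPTT through an unrolled explicit-Euler solver with PyTorch autograd. Every intermediate state stays in the graph, which is where the O(L * T) memory cost comes from; the vector field and loss are again illustrative assumptions.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, t):
        return torch.tanh(self.lin(x))

f = VectorField(dim=4)
opt = torch.optim.Adam(f.parameters(), lr=1e-2)

x = torch.zeros(1, 4)
dt, T = 0.1, 10
opt.zero_grad()
for step in range(T):                  # unrolled forward pass
    x = x + dt * f(x, step * dt)       # x(t+dt) = x(t) + dt * f(x(t), t, theta)

loss = ((x - torch.ones(1, 4)) ** 2).mean()
loss.backward()                        # gradients flow back through all T steps
opt.step()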
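For slides 9-10, a minimal sketch of the ODE-RNN idea from Rubanova et al. 2019 for irregularly sampled series: the hidden state evolves under an ODE between observations and is updated by an RNN cell at each observation. The Euler integration, GRU update, and dimensions here are illustrative assumptions, not the cited paper's exact implementation.

```python
import torch
import torch.nn as nn

class ODERNNCell(nn.Module):
    def __init__(self, in_dim, hid_dim, dt=0.05):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh())
        self.cell = nn.GRUCell(in_dim, hid_dim)
        self.dt = dt

    def forward(self, x_obs, t_obs, h):
        """x_obs: (N, in_dim) observations, t_obs: (N,) observation times."""
        t = 0.0
        for x, t_next in zip(x_obs, t_obs):
            while t < float(t_next):            # evolve h over the time gap
                h = h + self.dt * self.f(h)
                t += self.dt
            h = self.cell(x.unsqueeze(0), h)    # update h at the observation
        return h

# Usage: three observations at non-uniform times.
cell = ODERNNCell(in_dim=2, hid_dim=8)
h = cell(torch.randn(3, 2), torch.tensor([0.3, 0.35, 1.7]), torch.zeros(1, 8))
```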
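Slide 11's state equation is likewise only in the slide image. For reference, the LTC formulation as written in the accompanying paper (arXiv:2006.04439):

```latex
% Liquid time-constant network [Hasani et al. 2020]: the nonlinearity f both
% drives the state and modulates its decay, so the effective time constant
% tau_sys = tau / (1 + tau * f(.)) varies with the input ("liquid").
\frac{d\mathbf{x}(t)}{dt} =
    -\Bigl[\frac{1}{\tau} + f\bigl(\mathbf{x}(t),\mathbf{I}(t),t,\theta\bigr)\Bigr]
      \odot \mathbf{x}(t)
    + f\bigl(\mathbf{x}(t),\mathbf{I}(t),t,\theta\bigr) \odot A
```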
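For slides 13-14, a minimal sketch of the trajectory-length measure: project a hidden-state trajectory onto its first two principal components and sum the consecutive Euclidean distances. The random single-layer network used to produce the states is an illustrative stand-in for the time-continuous models compared in the talk.

```python
import numpy as np

def trajectory_length(states):
    """states: (T, d) hidden states along an input trajectory."""
    centered = states - states.mean(axis=0)
    # PCA via SVD; keep the first two principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    latent = centered @ vt[:2].T                  # (T, 2) latent trajectory
    return np.sum(np.linalg.norm(np.diff(latent, axis=0), axis=1))

# Usage: a circular 2-D input passed through one random tanh layer,
# with weights drawn from N(0, sigma_w^2 / k) as on the slide.
rng = np.random.default_rng(0)
k, sigma_w = 100, np.sqrt(2.0)
W = rng.normal(0.0, sigma_w / np.sqrt(k), size=(k, 2))
angles = np.linspace(0, 2 * np.pi, 200)
inputs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(trajectory_length(np.tanh(inputs @ W.T)))
```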
