This is a tutorial on Hamiltonian Neural Networks, based on the work of Greydanus et al. (and the independently proposed work of Bertalan et al.). I go through the classical mechanics necessary to understand them and discuss their connection to Neural Ordinary Differential Equations. I finish with a PyTorch example that predicts the path of a falling ball.
1. An Introduction to Hamiltonian Neural Networks
Presented by Miles Cranmer, Princeton University
@MilesCranmer
(advised by Shirley Ho/David Spergel)
This is based on none of my own research.
The work is by:
Sam Greydanus, Misko Dzamba, and Jason Yosinski
(+ Tom Bertalan, Felix Dietrich, Igor Mezić, and Ioannis G. Kevrekidis, whose paper was posted at a similar time)
3. Forces
• Objects and fields induce forces on other objects
• The vector sum of forces gives the net force
• Divide by the mass of the body to get its acceleration
• Common forces:
• Normal force (desk holding something)
• Friction
• Tension (string)
• Gravity
[1]
4. Lagrangian Mechanics
• For a coordinate system $q$,
• (Focus on object coordinates for today)
• Write down the kinetic energy $T$ (e.g. $\frac{1}{2} m \dot{q}^2$ for a particle)
• and the potential energy $V(q)$
• The Lagrangian is a function of the coordinates and (usually) their first-order derivatives: $L(q, \dot{q}) = T - V$
• The action is: $S = \int L(q, \dot{q}) \, dt$
• Apply the principle of stationary action: $\delta S = 0$
5. Lagrangian Mechanics 2
• By extremizing the action, we get the Euler-Lagrange equations: $\frac{d}{dt} \frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$
• Example: falling ball: $L = \frac{1}{2} m \dot{y}^2 - m g y$, giving $m \ddot{y} = -m g$
• Numerically integrate these to get the dynamics of the system
6. Hamiltonian Mechanics
• Canonical momenta for a system: $p = \frac{\partial L}{\partial \dot{q}}$
• The Legendre transformation of L is the Hamiltonian: $H(q, p) = p \dot{q} - L$
• This is usually the energy, conserved in a dynamical system.
• What path preserves H?
• Move perpendicular to its gradient!
• This direction is called the symplectic gradient
8. Hamiltonian Mechanics 2
• H-preserving path = symplectic gradient: $\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}$
• Also known as Hamilton's equations!
• Can use these first-order, explicit ODEs to integrate physical dynamics (a worked example follows this slide)
• Problems with L:
• Second order, implicit ODEs
• L isn’t meaningful by itself
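To make this concrete, here is a short worked example for the falling ball (my own addition, not a slide from the original deck):

```latex
% Falling ball: coordinate y, momentum p = m \dot{y}
H(y, p) = \frac{p^2}{2m} + m g y

% Hamilton's equations recover the familiar dynamics:
\dot{y} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad
\dot{p} = -\frac{\partial H}{\partial y} = -m g
```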
9. Things to worry about with L, H
• Dissipation/friction
• Need to add a force term to the Euler-Lagrange equation
• Can also use a multiplicative factor
• Energy pools/boundaries
• Constraints
• E.g., normal forces
• Solution: use better coordinates (sometimes tricky)
• Or, use a constraint function that equals 0
• (Lagrange multiplier method)
• *After reading the presentation – if you manage to think of a way to add these techniques to a Hamiltonian NN, come talk to me!
10. Integrators
• Presented with an explicit differential equation, we can use several methods to numerically integrate it.
• Recall that: $\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}$
• This is an Euler integrator (see the sketch below): $q_{n+1} = q_n + \Delta t \, \dot{q}_n, \quad p_{n+1} = p_n + \Delta t \, \dot{p}_n$
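A minimal sketch of this Euler update applied to the falling-ball Hamiltonian above (my own illustrative code; step size and initial conditions are arbitrary choices):

```python
import numpy as np

def hamiltonian(y, p, m=1.0, g=9.81):
    """Energy of a ball of mass m at height y with momentum p."""
    return p**2 / (2 * m) + m * g * y

def euler_step(y, p, dt, m=1.0, g=9.81):
    """One explicit Euler step of Hamilton's equations:
    dy/dt = dH/dp = p/m,  dp/dt = -dH/dy = -m*g."""
    dy_dt = p / m
    dp_dt = -m * g
    return y + dt * dy_dt, p + dt * dp_dt

# Drop a ball from rest at y = 10 m and integrate for 1 second.
y, p, dt = 10.0, 0.0, 0.01
for _ in range(100):
    y, p = euler_step(y, p, dt)
print(y, hamiltonian(y, p))   # y ≈ 5.1 m; the energy drifts slightly with Euler
```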
11. Accurate Integrators
• Advanced integrators take several intermediate steps to improve accuracy
• Runge-Kutta integrators target accuracy
• Can be very accurate, but may not preserve known invariants!
• Symplectic integrators target energy conservation
• Can preserve energy very well, but with lower accuracy!
• (All integrators are bad for long-term accuracy)
[3]
14. • Symplectic 4th order (Yoshida)
• These conserve energy almost exactly (the error stays bounded over long times)!
• Do drift (update x) and kick (update p) steps separately (see the sketch below)
• (c, d) are ugly constants, some negative, which add to 1
[4]
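A minimal sketch of the drift-kick idea, shown here as a second-order leapfrog rather than the full 4th-order Yoshida scheme (Yoshida composes several such steps using the (c, d) constants mentioned above); my own illustrative code for the falling ball:

```python
def leapfrog_step(y, p, dt, m=1.0, g=9.81):
    """Kick-drift-kick leapfrog: symplectic and 2nd-order accurate.
    The force is -dH/dy = -m*g for the falling ball."""
    p = p + 0.5 * dt * (-m * g)   # half kick (update momentum)
    y = y + dt * p / m            # full drift (update position)
    p = p + 0.5 * dt * (-m * g)   # half kick
    return y, p

y, p, dt = 10.0, 0.0, 0.01
for _ in range(100):
    y, p = leapfrog_step(y, p, dt)
# For this constant force the leapfrog result is exact: y ≈ 5.095, p ≈ -9.81
```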
15. Pivot to Machine Learning
• Recall (or not?): Machine Learning is parameter estimation where
the parameters lack explicit physical meaning!
• Many types of ML:
• Supervised (common):
• Regression
• Classification
• Unsupervised
• E.g., clustering, density estimation
• Semi-supervised – a mix
• Linear Regression – this counts as ML!
[5]
16. Neural Networks
• Repeat after me:
Neural Networks are piecewise Linear Regression!
• Mathematically (we'll only talk Multi-Layer Perceptrons): $h_{k+1} = \mathrm{ReLU}(W_k h_k + b_k)$, with $h_0 = x$ and a final linear layer
• (You do a linear regression -> zero the negatives -> repeat; see the sketch below)
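A minimal sketch of such an MLP in PyTorch (layer sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# Two-hidden-layer MLP: linear regression -> zero the negatives -> repeat.
mlp = nn.Sequential(
    nn.Linear(2, 64),   # linear regression on the inputs
    nn.ReLU(),          # zero the negatives
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),   # final linear read-out
)

x = torch.randn(5, 2)   # batch of 5 two-dimensional inputs
print(mlp(x).shape)     # torch.Size([5, 1])
```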
17. Neural Networks 2
• Repeat after me:
Neural Networks are piecewise Linear Regression!
• 0-hidden-layer Neural Network: linear regression!
• 1-hidden-layer NN with ReLU: piecewise linear regression
• Whichever combination of “neurons” is on defines a different “region” for the linear regression
• Up to 2^(layers × hidden size) different linear regression solutions
• Continuously connected
• Don’t expect good extrapolation! Only nearby interpolation
• The Neural Net parameters determine both the slopes and the regions.
19. Why?
• ReLU on = linear regression
• ReLU off = 0
• Remaining nodes simplify to linear regression!
[6]
20. Neural Network Aside
• Other activation functions, like tanh and softplus, smooth out this piecewise linearity
• Neural Networks are universal function approximators. In the limit of infinitely wide layers, even with just two hidden ones, they can express any mapping.
• They happen to be efficient at doing this too!
• All Neural Network techniques are about getting them to cheat
less. They are very good at cheating.
• Data Augmentation (hugely important)
• Regularization
• Structure (Convolutional NN, Graph Net, etc)
21. Differentiability
• The derivative is well-defined. It’s just a product of sparse matrices!
• Interested in:
• Derivative w.r.t. the weights, used for optimization (SGD or Adam)
• Derivative w.r.t. the inputs, which we will need later for Hamilton’s equations
• Auto-diff frameworks like TensorFlow and PyTorch make both easy (see the sketch below).
• Demo: https://playground.tensorflow.org
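A minimal sketch of getting input gradients with PyTorch autograd, which is exactly what the Hamiltonian Neural Network below relies on (the toy H here is my own illustration, not from the slides):

```python
import torch

# Toy scalar "Hamiltonian": H(q, p) = p^2/2 + q^2/2 (harmonic oscillator, m = k = 1)
def H(q, p):
    return 0.5 * p**2 + 0.5 * q**2

q = torch.tensor(1.0, requires_grad=True)
p = torch.tensor(0.5, requires_grad=True)

# Gradients of the scalar output with respect to the inputs:
dH_dq, dH_dp = torch.autograd.grad(H(q, p), (q, p))
print(dH_dq, dH_dp)   # tensor(1.) tensor(0.5000)
```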
22. Neural Nets for Physical Dynamics
• Here we will focus on physical systems over time.
• Many other things like sequences can be reframed as dynamics
problems.
• We are interested in problems where we have:
• positions and velocities $(x_i(t), v_i(t))$
• for i particles over time
• In addition to other fixed properties...
• How do we use Neural Nets to simulate systems?
23. Example - Pendulum
• How to learn to estimate the future position and velocity of a
pendulum?
• Neural Net: a mapping $f: \mathbb{R}^{n+l} \to \mathbb{R}^n$
• n is the number of particles × dynamical parameters per particle
• l is the number of fixed parameters
• Pendulum:
• n = 2 (theta, theta velocity)
• l = 2 (gravity, length of pendulum)
• Want to predict only the change in parameters - an easier regression problem
• So, here we are learning a function that approximates a velocity update and a force law (see the sketch below)
[7]
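A minimal sketch of this baseline in PyTorch, predicting the change in (theta, theta velocity) from the current state plus the fixed parameters (the names and layer sizes are my own illustrative choices):

```python
import torch
import torch.nn as nn

# Input: [theta, theta_dot, g, length]  (n = 2 dynamical + l = 2 fixed)
# Output: [d_theta, d_theta_dot] over one timestep (predicting only the change)
delta_net = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2),
)

state = torch.tensor([[0.3, 0.0, 9.81, 1.0]])   # small-angle pendulum at rest
next_state = state[:, :2] + delta_net(state)    # Euler-style update with a learned step
print(next_state.shape)                         # torch.Size([1, 2])
```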
24. Real World Applications (of NNs for
simulation)
• Neural Networks learn "effective" forces in simulations
• They only look at the most relevant degrees of freedom!
• Can be more accurate at reduced computational cost
• Some examples:
• Shirley Ho's U-Net can do cosmological simulations much faster and more
accurately than standard simulators
• Peter Battaglia's Interaction Network used in many applications
• Drug discovery/molecular+protein modelling – getting very popular
• E.g., Cecilia Clementi, Frank Noe, Mark Waller, many others
• DeepMind's AlphaFold Protein Folding algorithm - destroys baseline algorithms at
finding structure from genetic code
• See IPAM's recent workshop for good list!
• Some say intelligent reasoning is based on learning to simulate potential
outcomes => path to general intelligence?
25. Hamiltonian Neural Networks
• Learn a mapping from coordinates and momenta to a single number: $H_\theta(q, p)$
• The derivatives of this describe your dynamics by Hamilton's equations: $\dot{q} = \frac{\partial H_\theta}{\partial p}, \quad \dot{p} = -\frac{\partial H_\theta}{\partial q}$
• Comparing the true and predicted dynamical updates gives a minimization objective (a PyTorch sketch follows below): $\mathcal{L} = \left\lVert \frac{\partial H_\theta}{\partial p} - \dot{q} \right\rVert^2 + \left\lVert \frac{\partial H_\theta}{\partial q} + \dot{p} \right\rVert^2$
(Sam’s blog)
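A minimal sketch of this objective in PyTorch, in the spirit of Greydanus et al. (not their exact code; the network size and names are my own choices):

```python
import torch
import torch.nn as nn

# Scalar Hamiltonian network: (q, p) -> single number
hnn = nn.Sequential(
    nn.Linear(2, 200), nn.Softplus(),   # smooth activation so dH is smooth
    nn.Linear(200, 200), nn.Softplus(),
    nn.Linear(200, 1),
)

def hnn_loss(qp, qp_dot):
    """qp: (batch, 2) coordinates/momenta; qp_dot: (batch, 2) true time derivatives."""
    qp = qp.requires_grad_(True)
    H = hnn(qp).sum()
    dH = torch.autograd.grad(H, qp, create_graph=True)[0]   # columns: (dH/dq, dH/dp)
    q_dot_pred = dH[:, 1]     # dq/dt =  dH/dp
    p_dot_pred = -dH[:, 0]    # dp/dt = -dH/dq
    return ((q_dot_pred - qp_dot[:, 0])**2 + (p_dot_pred - qp_dot[:, 1])**2).mean()

# One gradient step on random placeholder data, just to show the training call:
opt = torch.optim.Adam(hnn.parameters(), lr=1e-3)
loss = hnn_loss(torch.randn(32, 2), torch.randn(32, 2))
loss.backward()
opt.step()
```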
27. Why?
• It works better; it’s more interpretable. Not only
do we have a simulator, we know the energy!
(Sam’s blog)
28. Why does it work?
• It uses symplectic gradients: by prescribing that we can only move
along the level set of H, it learns the proper H.
• (Figure panels: “Start” and “Final” of training)
(Sam’s blog)
30. Integrators
• So far we have only talked about Euler integrators. But since Hamilton's equations are just ODEs, we can use any integrator: RK4 and symplectic included.
• If H has learned the true energy, we can exactly preserve it with symplectic integrators.
• In practice, RK4 is still more accurate. Maybe some combination is best? This model is less than 6 months old! We don't know what is best yet.
• Can train + evaluate with RK4 or symplectic methods!
• Do multiple queries and multiple derivatives of your network's H
• This works very well in practice (see the rollout sketch below).
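A minimal sketch of rolling out a trained Hamiltonian network with an off-the-shelf Runge-Kutta integrator from SciPy (this assumes the `hnn` model from the earlier sketch is in scope; purely illustrative):

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

def learned_dynamics(t, qp_np):
    """Wrap the trained network as an ODE right-hand side for SciPy."""
    qp = torch.tensor(qp_np, dtype=torch.float32, requires_grad=True).reshape(1, 2)
    dH = torch.autograd.grad(hnn(qp).sum(), qp)[0][0]
    return np.array([dH[1].item(), -dH[0].item()])   # (dq/dt, dp/dt)

# Integrate the learned Hamiltonian field for 10 time units from (q, p) = (1, 0).
sol = solve_ivp(learned_dynamics, (0.0, 10.0), y0=[1.0, 0.0], method="RK45", rtol=1e-6)
print(sol.y.shape)   # (2, num_timesteps)
```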
31. I don’t know the canonical coordinates!
• Pair two Neural Networks:
• g, an autoencoder to latent variables
• H, a Hamiltonian that pretends those latent variables are (q, p).
• Training this setup in combination will learn the canonical coords + the Hamiltonian! (see the sketch below)
(Sam’s blog)
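A minimal sketch of the pairing, in the spirit of the pixel/autoencoder experiment from Greydanus et al. (the architecture details here are my own simplifications, not their exact setup):

```python
import torch
import torch.nn as nn

latent_dim = 2   # interpreted as (q, p)

encoder = nn.Sequential(nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 200), nn.ReLU(), nn.Linear(200, 784))
H_net = nn.Sequential(nn.Linear(latent_dim, 200), nn.Softplus(), nn.Linear(200, 1))

def combined_loss(x_t, x_next):
    """x_t, x_next: (batch, 784) observations at consecutive timesteps."""
    z_t = encoder(x_t)
    z_next = encoder(x_next)

    recon = ((decoder(z_t) - x_t) ** 2).mean()                 # autoencoder term

    dH = torch.autograd.grad(H_net(z_t).sum(), z_t, create_graph=True)[0]
    z_dot_pred = torch.stack([dH[:, 1], -dH[:, 0]], dim=1)     # Hamilton's equations in latent space
    hnn_term = ((z_dot_pred - (z_next - z_t)) ** 2).mean()     # finite-difference target (unit timestep)

    return recon + hnn_term

# Example call on random placeholder "frames":
loss = combined_loss(torch.randn(8, 784), torch.randn(8, 784))
loss.backward()
```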
32. Tips
• Activations:
• Recall: Neural Networks are piecewise linear regression.
• Taking derivatives of a ReLU network means we are literally learning a lookup table – not good!
• Use Softplus or Tanh so that H has a smoother derivative (see the sketch below)
• Use more hidden nodes than for regular NNs, as H needs to be very
smooth
• Stability:
• According to some (Stephan Hoyer), better to learn multiple timesteps at
once.
• Use RK4 integrators
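A quick toy illustration of why the activation matters (my own example): with ReLU, the input gradient of the network is piecewise constant, so using it as a force amounts to a lookup table; Softplus makes it smooth.

```python
import torch
import torch.nn as nn

def input_grad(net, x):
    """Gradient of the network's scalar output with respect to its input."""
    x = x.clone().requires_grad_(True)
    return torch.autograd.grad(net(x).sum(), x)[0]

relu_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
soft_net = nn.Sequential(nn.Linear(1, 64), nn.Softplus(), nn.Linear(64, 1))

x = torch.linspace(-1, 1, 5).reshape(-1, 1)
print(input_grad(relu_net, x).flatten())   # samples of a piecewise-constant function
print(input_grad(soft_net, x).flatten())   # varies smoothly with x
```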
33. Bonus: Neural ODEs
• Famous 2018 paper: Neural Ordinary Differential Equations.
• Hamiltonian Neural Networks -ARE- a Neural ODE.
• Paper connects ResNets with Euler integrators
• Paper: “Why not just learn a derivative and integrate it?”
• Smoother output!
(Chen et al)
34. PyTorch Tutorial – Falling Ball
• Short: https://bit.ly/2JiTEJE
• (Copy to new notebook in your drive)
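For reference, a minimal self-contained sketch of training an HNN on a falling ball, roughly in the spirit of the linked notebook (I have not reproduced the notebook's actual code; the data generation, sizes, and hyperparameters below are all my own illustrative assumptions):

```python
import torch
import torch.nn as nn

m, g = 1.0, 9.81

# Training data: states (y, p) sampled over a range, with their true time derivatives.
y = torch.rand(1024, 1) * 10.0
p = torch.randn(1024, 1) * 3.0
qp = torch.cat([y, p], dim=1)
qp_dot = torch.cat([p / m, -m * g * torch.ones_like(p)], dim=1)   # (dy/dt, dp/dt)

hnn = nn.Sequential(nn.Linear(2, 200), nn.Softplus(),
                    nn.Linear(200, 200), nn.Softplus(),
                    nn.Linear(200, 1))
opt = torch.optim.Adam(hnn.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    x = qp.clone().requires_grad_(True)
    dH = torch.autograd.grad(hnn(x).sum(), x, create_graph=True)[0]
    loss = ((dH[:, 1] - qp_dot[:, 0])**2 + (-dH[:, 0] - qp_dot[:, 1])**2).mean()
    loss.backward()
    opt.step()

# The learned H(y, p) should now match p^2/(2m) + m*g*y up to an additive constant.
```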