This is a tutorial on Hamiltonian Neural Networks, based on the work of Greydanus et al. (and the independently proposed work of Bertalan et al.). I go through the classical mechanics necessary to understand them and discuss their connection to Neural Ordinary Differential Equations. I finish with a PyTorch example that predicts the path of a falling ball.
1. An Introduction to Hamiltonian Neural Networks
Presented by Miles Cranmer, Princeton University
@MilesCranmer
(advised by Shirley Ho/David Spergel)
This is based on none of my own research.
The work is by:
Sam Greydanus, Misko Dzamba, and Jason Yosinski
(+ Tom Bertalan, Felix Dietrich, Igor Mezić, and Ioannis G. Kevrekidis, whose paper was posted at a similar time)
3. Forces
• Objects and fields induce forces on other objects
• The vector sum of forces gives the net force
• Divide by the mass of the body to get its acceleration
• Common forces:
• Normal force (desk holding something)
• Friction
• Tension (string)
• Gravity
[1]
4. Lagrangian Mechanics
• For a coordinate system $q$,
• (Focus on object coordinates for today)
• Write down the kinetic energy $T$ (e.g. $\frac{1}{2} m \dot{q}^2$ for a particle)
• and the potential energy $V(q)$
• The Lagrangian is a function of the coordinates and (usually) their first-order derivatives: $L(q, \dot{q}) = T - V$
• The action is: $S = \int L(q, \dot{q}) \, dt$
• Apply the principle of stationary action: $\delta S = 0$
5. Lagrangian Mechanics 2
• By extremizing the action, we get the Euler-Lagrange equations: $\frac{d}{dt} \frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$
• Example: falling ball: $L = \frac{1}{2} m \dot{y}^2 - m g y$, giving $m \ddot{y} = -m g$
• Numerically integrate these to get the dynamics of the system
6. Hamiltonian Mechanics
• Canonical momenta for a system: $p = \frac{\partial L}{\partial \dot{q}}$
• The Legendre transformation of L is the Hamiltonian: $H(q, p) = p \dot{q} - L$
• This is usually the energy, conserved in a dynamical system.
• What path preserves H?
• Move perpendicular to its gradient!
• This direction is called the symplectic gradient
8. Hamiltonian Mechanics 2
• H-preserving path = symplectic gradient: $\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}$
• Also known as Hamilton's equations!
• Can use these first-order, explicit ODEs to integrate physical dynamics (a worked example follows this slide)
• Problems with L:
• Second order, implicit ODEs
• L isn’t meaningful by itself
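To make this concrete, here is a short worked example for the falling ball (my own addition, not a slide from the original deck):

```latex
% Falling ball: coordinate y, momentum p = m \dot{y}
H(y, p) = \frac{p^2}{2m} + m g y

% Hamilton's equations recover the familiar dynamics:
\dot{y} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad
\dot{p} = -\frac{\partial H}{\partial y} = -m g
```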
9. Things to worry about with L, H
• Dissipation/friction
• Need to add a force term to the Euler-Lagrange equation
• Can also use a multiplicative factor
• Energy pools/boundaries
• Constraints
• E.g., normal forces
• Solution: use better coordinates (sometimes tricky)
• Or, use a constraint function that equals 0
• (Lagrange multiplier method)
• *After reading the presentation – if you manage to think of a way to add these techniques to a Hamiltonian NN, come talk to me!
10. Integrators
• Presented with an explicit differential equation, we can use several methods to numerically integrate it.
• Recall that: $\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}$
• This is an Euler integrator (see the sketch below): $q_{n+1} = q_n + \Delta t \, \dot{q}_n, \quad p_{n+1} = p_n + \Delta t \, \dot{p}_n$
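A minimal sketch of this Euler update applied to the falling-ball Hamiltonian above (my own illustrative code; step size and initial conditions are arbitrary choices):

```python
import numpy as np

def hamiltonian(y, p, m=1.0, g=9.81):
    """Energy of a ball of mass m at height y with momentum p."""
    return p**2 / (2 * m) + m * g * y

def euler_step(y, p, dt, m=1.0, g=9.81):
    """One explicit Euler step of Hamilton's equations:
    dy/dt = dH/dp = p/m,  dp/dt = -dH/dy = -m*g."""
    dy_dt = p / m
    dp_dt = -m * g
    return y + dt * dy_dt, p + dt * dp_dt

# Drop a ball from rest at y = 10 m and integrate for 1 second.
y, p, dt = 10.0, 0.0, 0.01
for _ in range(100):
    y, p = euler_step(y, p, dt)
print(y, hamiltonian(y, p))   # y ≈ 5.1 m; the energy drifts slightly with Euler
```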
11. Accurate Integrators
• Advanced integrators take several intermediate steps to improve accuracy
• Runge-Kutta integrators target accuracy
• Can be very accurate, but may not preserve known invariants!
• Symplectic integrators target energy conservation
• Can preserve energy very well, but with lower accuracy!
• (All integrators are bad for long-term accuracy)
[3]
14. • Symplectic 4th order (Yoshida)
• These conserve energy almost exactly (the error stays bounded over long times)!
• Do drift (update x) and kick (update p) steps separately (see the sketch below)
• (c, d) are ugly constants, some negative, which add to 1
[4]
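A minimal sketch of the drift-kick idea, shown here as a second-order leapfrog rather than the full 4th-order Yoshida scheme (Yoshida composes several such steps using the (c, d) constants mentioned above); my own illustrative code for the falling ball:

```python
def leapfrog_step(y, p, dt, m=1.0, g=9.81):
    """Kick-drift-kick leapfrog: symplectic and 2nd-order accurate.
    The force is -dH/dy = -m*g for the falling ball."""
    p = p + 0.5 * dt * (-m * g)   # half kick (update momentum)
    y = y + dt * p / m            # full drift (update position)
    p = p + 0.5 * dt * (-m * g)   # half kick
    return y, p

y, p, dt = 10.0, 0.0, 0.01
for _ in range(100):
    y, p = leapfrog_step(y, p, dt)
# For this constant force the leapfrog result is exact: y ≈ 5.095, p ≈ -9.81
```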
15. Pivot to Machine Learning
• Recall (or not?): Machine Learning is parameter estimation where
the parameters lack explicit physical meaning!
• Many types of ML:
• Supervised (common):
• Regression
• Classification
• Unsupervised
• E.g., clustering, density estimation
• Semi-supervised – a mix
• Linear Regression – this counts as ML!
[5]
16. Neural Networks
• Repeat after me:
Neural Networks are piecewise Linear Regression!
• Mathematically (we'll only talk Multi-Layer Perceptrons): $h_{k+1} = \mathrm{ReLU}(W_k h_k + b_k)$, with $h_0 = x$ and a final linear layer
• (You do a linear regression -> zero the negatives -> repeat; see the sketch below)
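A minimal sketch of such an MLP in PyTorch (layer sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# Two-hidden-layer MLP: linear regression -> zero the negatives -> repeat.
mlp = nn.Sequential(
    nn.Linear(2, 64),   # linear regression on the inputs
    nn.ReLU(),          # zero the negatives
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),   # final linear read-out
)

x = torch.randn(5, 2)   # batch of 5 two-dimensional inputs
print(mlp(x).shape)     # torch.Size([5, 1])
```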
17. Neural Networks 2
• Repeat after me:
Neural Networks are piecewise Linear Regression!
• 0-hidden-layer Neural Network: linear regression!
• 1-hidden-layer NN with ReLU: piecewise linear regression
• Whichever combination of “neurons” is on defines a different “region” for the linear regression
• Up to 2^(layers × hidden size) different linear regression solutions
• Continuously connected
• Don’t expect good extrapolation! Only nearby interpolation
• The Neural Net parameters determine both the slopes and the regions.
19. Why?
• ReLU on = linear regression
• ReLU off = 0
• Remaining nodes simplify to linear regression!
[6]
20. Neural Network Aside
• Other activation functions, like tanh and softplus, smooth out this piecewise linearity
• Neural Networks are universal function approximators. In the limit of infinitely wide layers, even with just two hidden ones, they can express any mapping.
• They happen to be efficient at doing this too!
• All Neural Network techniques are about getting them to cheat
less. They are very good at cheating.
• Data Augmentation (hugely important)
• Regularization
• Structure (Convolutional NN, Graph Net, etc)
21. Differentiability
• The derivative is well-defined. It’s just a product of sparse matrices!
• Interested in:
• Derivative w.r.t. the weights, used for optimization (SGD or Adam)
• Derivative w.r.t. the inputs, which we will need later for Hamilton’s equations
• Auto-diff frameworks like TensorFlow and PyTorch make both easy (see the sketch below).
• Demo: https://playground.tensorflow.org
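A minimal sketch of getting input gradients with PyTorch autograd, which is exactly what the Hamiltonian Neural Network below relies on (the toy H here is my own illustration, not from the slides):

```python
import torch

# Toy scalar "Hamiltonian": H(q, p) = p^2/2 + q^2/2 (harmonic oscillator, m = k = 1)
def H(q, p):
    return 0.5 * p**2 + 0.5 * q**2

q = torch.tensor(1.0, requires_grad=True)
p = torch.tensor(0.5, requires_grad=True)

# Gradients of the scalar output with respect to the inputs:
dH_dq, dH_dp = torch.autograd.grad(H(q, p), (q, p))
print(dH_dq, dH_dp)   # tensor(1.) tensor(0.5000)
```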
22. Neural Nets for Physical Dynamics
• Here we will focus on physical systems over time.
• Many other things like sequences can be reframed as dynamics
problems.
• We are interested in problems where we have:
• positions and velocities $(x_i(t), v_i(t))$
• for i particles over time
• In addition to other fixed properties...
• How do we use Neural Nets to simulate systems?
23. Example - Pendulum
• How to learn to estimate the future position and velocity of a
pendulum?
• Neural Net: a mapping $f: \mathbb{R}^{n+l} \to \mathbb{R}^n$
• n is the number of particles × dynamical parameters per particle
• l is the number of fixed parameters
• Pendulum:
• n = 2 (theta, theta velocity)
• l = 2 (gravity, length of pendulum)
• Want to predict only the change in parameters - an easier regression problem
• So, here we are learning a function that approximates a velocity update and a force law (see the sketch below)
[7]
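A minimal sketch of this baseline in PyTorch, predicting the change in (theta, theta velocity) from the current state plus the fixed parameters (the names and layer sizes are my own illustrative choices):

```python
import torch
import torch.nn as nn

# Input: [theta, theta_dot, g, length]  (n = 2 dynamical + l = 2 fixed)
# Output: [d_theta, d_theta_dot] over one timestep (predicting only the change)
delta_net = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2),
)

state = torch.tensor([[0.3, 0.0, 9.81, 1.0]])   # small-angle pendulum at rest
next_state = state[:, :2] + delta_net(state)    # Euler-style update with a learned step
print(next_state.shape)                         # torch.Size([1, 2])
```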
24. Real World Applications (of NNs for
simulation)
• Neural Networks learn "effective" forces in simulations
• They only look at the most relevant degrees of freedom!
• Can be more accurate at reduced computational cost
• Some examples:
• Shirley Ho's U-Net can do cosmological simulations much faster and more
accurately than standard simulators
• Peter Battaglia's Interaction Network used in many applications
• Drug discovery/molecular+protein modelling – getting very popular
• E.g., Cecilia Clementi, Frank Noe, Mark Waller, many others
• DeepMind's AlphaFold Protein Folding algorithm - destroys baseline algorithms at
finding structure from genetic code
• See IPAM's recent workshop for good list!
• Some say intelligent reasoning is based on learning to simulate potential
outcomes => path to general intelligence?
25. Hamiltonian Neural Networks
• Learn a mapping from coordinates and momenta to a single number: $H_\theta(q, p)$
• The derivatives of this describe your dynamics by Hamilton's equations: $\dot{q} = \frac{\partial H_\theta}{\partial p}, \quad \dot{p} = -\frac{\partial H_\theta}{\partial q}$
• Comparing the true and predicted dynamical updates gives a minimization objective (a PyTorch sketch follows below): $\mathcal{L} = \left\lVert \frac{\partial H_\theta}{\partial p} - \dot{q} \right\rVert^2 + \left\lVert \frac{\partial H_\theta}{\partial q} + \dot{p} \right\rVert^2$
(Sam’s blog)
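A minimal sketch of this objective in PyTorch, in the spirit of Greydanus et al. (not their exact code; the network size and names are my own choices):

```python
import torch
import torch.nn as nn

# Scalar Hamiltonian network: (q, p) -> single number
hnn = nn.Sequential(
    nn.Linear(2, 200), nn.Softplus(),   # smooth activation so dH is smooth
    nn.Linear(200, 200), nn.Softplus(),
    nn.Linear(200, 1),
)

def hnn_loss(qp, qp_dot):
    """qp: (batch, 2) coordinates/momenta; qp_dot: (batch, 2) true time derivatives."""
    qp = qp.requires_grad_(True)
    H = hnn(qp).sum()
    dH = torch.autograd.grad(H, qp, create_graph=True)[0]   # columns: (dH/dq, dH/dp)
    q_dot_pred = dH[:, 1]     # dq/dt =  dH/dp
    p_dot_pred = -dH[:, 0]    # dp/dt = -dH/dq
    return ((q_dot_pred - qp_dot[:, 0])**2 + (p_dot_pred - qp_dot[:, 1])**2).mean()

# One gradient step on random placeholder data, just to show the training call:
opt = torch.optim.Adam(hnn.parameters(), lr=1e-3)
loss = hnn_loss(torch.randn(32, 2), torch.randn(32, 2))
loss.backward()
opt.step()
```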
27. Why?
• It works better; it’s more interpretable. Not only
do we have a simulator, we know the energy!
(Sam’s blog)
28. Why does it work?
• It uses symplectic gradients: by prescribing that we can only move
along the level set of H, it learns the proper H.
• (Figure panels: “Start” and “Final” of training)
(Sam’s blog)
30. Integrators
• So far we have only talked about Euler integrators. But since Hamilton's equations are just ODEs, we can use any integrator: RK4 and symplectic included.
• If H has learned the true energy, we can exactly preserve it with symplectic integrators.
• In practice, RK4 is still more accurate. Maybe some combination is best? This model is less than 6 months old! We don't know what is best yet.
• Can train + evaluate with RK4 or symplectic methods!
• Do multiple queries and multiple derivatives of your network's H
• This works very well in practice (see the rollout sketch below).
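A minimal sketch of rolling out a trained Hamiltonian network with an off-the-shelf Runge-Kutta integrator from SciPy (this assumes the `hnn` model from the earlier sketch is in scope; purely illustrative):

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

def learned_dynamics(t, qp_np):
    """Wrap the trained network as an ODE right-hand side for SciPy."""
    qp = torch.tensor(qp_np, dtype=torch.float32, requires_grad=True).reshape(1, 2)
    dH = torch.autograd.grad(hnn(qp).sum(), qp)[0][0]
    return np.array([dH[1].item(), -dH[0].item()])   # (dq/dt, dp/dt)

# Integrate the learned Hamiltonian field for 10 time units from (q, p) = (1, 0).
sol = solve_ivp(learned_dynamics, (0.0, 10.0), y0=[1.0, 0.0], method="RK45", rtol=1e-6)
print(sol.y.shape)   # (2, num_timesteps)
```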
31. I don’t know the canonical coordinates!
• Pair two Neural Networks:
• g, an autoencoder to latent variables
• H, a Hamiltonian that pretends those latent variables are (q, p).
• Training this setup in combination will learn the canonical coords + the Hamiltonian! (see the sketch below)
(Sam’s blog)
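A minimal sketch of the pairing, in the spirit of the pixel/autoencoder experiment from Greydanus et al. (the architecture details here are my own simplifications, not their exact setup):

```python
import torch
import torch.nn as nn

latent_dim = 2   # interpreted as (q, p)

encoder = nn.Sequential(nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 200), nn.ReLU(), nn.Linear(200, 784))
H_net = nn.Sequential(nn.Linear(latent_dim, 200), nn.Softplus(), nn.Linear(200, 1))

def combined_loss(x_t, x_next):
    """x_t, x_next: (batch, 784) observations at consecutive timesteps."""
    z_t = encoder(x_t)
    z_next = encoder(x_next)

    recon = ((decoder(z_t) - x_t) ** 2).mean()                 # autoencoder term

    dH = torch.autograd.grad(H_net(z_t).sum(), z_t, create_graph=True)[0]
    z_dot_pred = torch.stack([dH[:, 1], -dH[:, 0]], dim=1)     # Hamilton's equations in latent space
    hnn_term = ((z_dot_pred - (z_next - z_t)) ** 2).mean()     # finite-difference target (unit timestep)

    return recon + hnn_term

# Example call on random placeholder "frames":
loss = combined_loss(torch.randn(8, 784), torch.randn(8, 784))
loss.backward()
```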
32. Tips
• Activations:
• Recall: Neural Networks are piecewise linear regression.
• Taking derivatives of a ReLU network means we are literally learning a lookup table – not good!
• Use Softplus or Tanh so that H has a smoother derivative (see the sketch below)
• Use more hidden nodes than for regular NNs, as H needs to be very
smooth
• Stability:
• According to some (Stephan Hoyer), better to learn multiple timesteps at
once.
• Use RK4 integrators
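A quick toy illustration of why the activation matters (my own example): with ReLU, the input gradient of the network is piecewise constant, so using it as a force amounts to a lookup table; Softplus makes it smooth.

```python
import torch
import torch.nn as nn

def input_grad(net, x):
    """Gradient of the network's scalar output with respect to its input."""
    x = x.clone().requires_grad_(True)
    return torch.autograd.grad(net(x).sum(), x)[0]

relu_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
soft_net = nn.Sequential(nn.Linear(1, 64), nn.Softplus(), nn.Linear(64, 1))

x = torch.linspace(-1, 1, 5).reshape(-1, 1)
print(input_grad(relu_net, x).flatten())   # samples of a piecewise-constant function
print(input_grad(soft_net, x).flatten())   # varies smoothly with x
```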
33. Bonus: Neural ODEs
• Famous 2018 paper: Neural Ordinary Differential Equations.
• Hamiltonian Neural Networks -ARE- a Neural ODE.
• Paper connects ResNets with Euler integrators
• Paper: “Why not just learn a derivative and integrate it?”
• Smoother output!
(Chen et al)
34. PyTorch Tutorial – Falling Ball
• Short: https://bit.ly/2JiTEJE
• (Copy to new notebook in your drive)
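For reference, a minimal self-contained sketch of training an HNN on a falling ball, roughly in the spirit of the linked notebook (I have not reproduced the notebook's actual code; the data generation, sizes, and hyperparameters below are all my own illustrative assumptions):

```python
import torch
import torch.nn as nn

m, g = 1.0, 9.81

# Training data: states (y, p) sampled over a range, with their true time derivatives.
y = torch.rand(1024, 1) * 10.0
p = torch.randn(1024, 1) * 3.0
qp = torch.cat([y, p], dim=1)
qp_dot = torch.cat([p / m, -m * g * torch.ones_like(p)], dim=1)   # (dy/dt, dp/dt)

hnn = nn.Sequential(nn.Linear(2, 200), nn.Softplus(),
                    nn.Linear(200, 200), nn.Softplus(),
                    nn.Linear(200, 1))
opt = torch.optim.Adam(hnn.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    x = qp.clone().requires_grad_(True)
    dH = torch.autograd.grad(hnn(x).sum(), x, create_graph=True)[0]
    loss = ((dH[:, 1] - qp_dot[:, 0])**2 + (-dH[:, 0] - qp_dot[:, 1])**2).mean()
    loss.backward()
    opt.step()

# The learned H(y, p) should now match p^2/(2m) + m*g*y up to an additive constant.
```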