The models used in molecular simulations have become complex to the point of "existing" only in software source code, which is hard to read, rarely reviewed and often not even published. This talk explores how and why we should pull the models out of the source code and put them back into scientific discourse.
Numerical models for complex molecular systems
1. Numerical models for complex molecular systems
Konrad HINSEN
Centre de Biophysique Moléculaire, Orléans, France
and
Synchrotron SOLEIL, Saint Aubin, France
6 September 2018
Konrad HINSEN (CBM/SOLEIL) Molecular models 6 September 2018 1 / 23
2. Topics covered
1 Models in physics
2 Models in molecular simulation
3 My research interests in this field
3. Models in physics
Theories and models
Dominant model type: differential equations
Most models are “plugins” for “frameworks” called theories
Exception: ad-hoc models for emergent phenomena in complex
systems
Some widely used theories and their models
Classical mechanics: Lagrangian function
Quantum mechanics: Hamilton operator
Thermodynamics: thermodynamic potential function
Statistical mechanics: partition function
4. Physical models and computation
Comparison with observation requires numbers, and thus computation
Models/equations are specifications, not algorithms
Finding solution algorithms is not trivial
Traditional path: analytical solution → numerical evaluation
Recent alternative: direct numerical solution
Both paths rely heavily on approximations.
Constructing and evaluating approximations to models is a big part of the
everyday work of a physicist.
5. A simple example: simulating celestial mechanics
Given: past positions of
the planets of the solar
system
Goal: predict the future
positions of these planets
K. Hinsen, Comp. Sci. Eng. 17(4), 2015
6. Phase 1: physics
1 Approximation: There is nothing but the solar system.
We neglect the influence of the rest of the universe.
2 Approximation: The Sun and the planets are point masses.
We neglect the influence of their sizes and shapes.
3 Approximation: Newton’s laws of motion and gravity
We neglect relativistic and quantum effects.
7. The physics model
1. Law of motion (the theory):
$\frac{d}{dt}\,\mathbf{r}_i(t) = \mathbf{v}_i(t)$   v: velocity
$\frac{d}{dt}\,\mathbf{v}_i(t) = \mathbf{a}_i(t)$   a: acceleration
$\mathbf{F}_i(t) = m_i\,\mathbf{a}_i(t)$   F: force, m: mass
2. Law of universal gravitation:
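$\mathbf{F}_i = \sum_{j=1,\, j \neq i}^{N} \mathbf{F}_{ij}$
$\mathbf{F}_{ij} = -G\,\frac{m_i m_j}{|\mathbf{r}_i - \mathbf{r}_j|^2}\,\frac{\mathbf{r}_i - \mathbf{r}_j}{|\mathbf{r}_i - \mathbf{r}_j|}$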
3. Two observations $\mathbf{r}_i(t_1)$ and $\mathbf{r}_i(t_2)$
This model defines $\mathbf{r}_i(t)$ for all t, past and future.
8. Phase 2a (idealist): computable analysis
Goal: construct an algorithm that, given t and an error bound $\epsilon$, computes $\mathbf{r}_i^{(\epsilon)}(t)$ such that
$\left|\mathbf{r}_i(t) - \mathbf{r}_i^{(\epsilon)}(t)\right| < \epsilon$
possible in principle (existence proof)
hasn’t been done for Newton’s equations (as far as I know)
impractical in terms of CPU time and memory requirements
Marian B. Pour-El and J. Ian Richards
Computability in Analysis and Physics
Springer, Berlin, 1989
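The idea of an algorithm that takes an error bound as input can be illustrated on a far simpler problem than Newton's equations. The following sketch is not from the talk: it approximates the square root of 2 using exact rational arithmetic and bisection, so that the returned value is guaranteed to lie within the requested bound. The function name `sqrt2_within` is a hypothetical choice made for this illustration.

```python
from fractions import Fraction

def sqrt2_within(eps: Fraction) -> Fraction:
    """Return a rational q with 0 <= sqrt(2) - q < eps, using exact arithmetic.

    Toy illustration of "given an error bound, compute a result within it".
    """
    if eps <= 0:
        raise ValueError("the error bound must be positive")
    lo, hi = Fraction(1), Fraction(2)   # invariant: lo**2 <= 2 < hi**2
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

bound = Fraction(1, 10**6)
q = sqrt2_within(bound)
print(float(q))
```

Because all arithmetic is exact, the guarantee holds for any positive bound, at the price of numerators and denominators that grow as the bound shrinks. This mirrors the slide's point about CPU time and memory.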
9. Phase 2b (realist): numerical analysis
1 Approximation: differentials → finite differences
Accept discretization error in return for solvable equations.
2 Approximation: real numbers → floating-point numbers
Accept round-off error in return for efficiency.
Choices to be made:
finite-difference scheme
integration step size
floating-point precision
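How the choice of integration step size controls the discretization error can be checked on a problem with a known solution. The sketch below is an illustration under stated assumptions, not part of the talk: it uses a harmonic oscillator x'' = -x (whose exact solution is cos(t)) instead of the solar system, the Störmer-Verlet finite-difference scheme, and Python's double-precision floats. The function name `verlet_end_error` is hypothetical.

```python
import math

def verlet_end_error(n_steps, t_end=10.0):
    """Integrate x'' = -x from x(0) = 1, x'(0) = 0 with the Stoermer-Verlet
    recurrence and return |x_numeric(t_end) - cos(t_end)|."""
    dt = t_end / n_steps
    x_prev = 1.0            # exact x(0)
    x_curr = math.cos(dt)   # exact x(dt), needed to start the two-step scheme
    for _ in range(n_steps - 1):
        # x(n+1) = 2 x(n) - x(n-1) + dt^2 * a(n), with acceleration a = -x
        x_next = 2.0 * x_curr - x_prev - dt * dt * x_curr
        x_prev, x_curr = x_curr, x_next
    return abs(x_curr - math.cos(t_end))

coarse = verlet_end_error(100)   # step size 0.1
fine = verlet_end_error(200)     # step size 0.05
print(coarse, fine, coarse / fine)
```

Halving the step size should reduce the error by roughly a factor of four for this second-order scheme, which is the kind of trade-off between accuracy and cost that the choices listed above determine.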
10. The numerical model
Störmer-Verlet integrator:
$\mathbf{r}_i^{(n+1)} = 2\,\mathbf{r}_i^{(n)} - \mathbf{r}_i^{(n-1)} + \frac{\Delta^2}{m_i}\,\mathbf{F}_i^{(n)}$
Gravitation:
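$\mathbf{F}_i^{(n)} = \sum_{j=1,\, j \neq i}^{N} \mathbf{F}_{ij}^{(n)}$
$\mathbf{F}_{ij}^{(n)} = -G\,\frac{m_i m_j}{|\mathbf{r}_i^{(n)} - \mathbf{r}_j^{(n)}|^2}\,\frac{\mathbf{r}_i^{(n)} - \mathbf{r}_j^{(n)}}{|\mathbf{r}_i^{(n)} - \mathbf{r}_j^{(n)}|}$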
$\mathbf{r}_i$: vector of three floats for each i
∆: integration step size (float)
floating-point precision: IEEE-754 single/double, arbitrary via MPFR
Floating-point requires an explicit choice of the order of operations,
but then we have specified the results to the last bit!
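The numerical model above can be written down almost literally as code. The following is a minimal sketch, assuming plain Python lists of three floats for positions, double precision, and a default G = 1.0 (an arbitrary unit choice). The function names `gravitational_forces` and `verlet_step` are hypothetical and do not come from the talk.

```python
import math

def gravitational_forces(positions, masses, G=1.0):
    """F_i = sum over j != i of -G m_i m_j (r_i - r_j) / |r_i - r_j|^3."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):      # fixed loop order = fixed order of additions
            if i == j:
                continue
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            dist = math.sqrt(sum(c * c for c in d))
            scale = -G * masses[i] * masses[j] / dist**3
            for k in range(3):
                forces[i][k] += scale * d[k]
    return forces

def verlet_step(r_prev, r_curr, masses, dt, G=1.0):
    """r(n+1) = 2 r(n) - r(n-1) + dt^2 / m_i * F_i(n)."""
    forces = gravitational_forces(r_curr, masses, G)
    return [
        [2.0 * r_curr[i][k] - r_prev[i][k] + dt * dt / masses[i] * forces[i][k]
         for k in range(3)]
        for i in range(len(r_curr))
    ]

# Two unit masses at distance 2, initially at rest.
start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
unit_masses = [1.0, 1.0]
f = gravitational_forces(start, unit_masses)
r_next = verlet_step(start, start, unit_masses, dt=0.1)
print(f[0], r_next[0])
```

The nested loops fix one particular order of floating-point additions, which is what makes the result reproducible to the last bit on an IEEE-754 platform. A different loop order would be a different numerical model.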
11. Phase 3: software
1 Approximation: algorithmic changes during code optimization
Accept modified results in return for speed.
2 Approximation: the compiler re-orders floating-point operations
Accept modified results in return for speed.
Verification/validation practically impossible:
approximations not documented for software users
users cannot opt out
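Why re-ordering floating-point operations modifies results can be seen in a three-term sum. This small sketch (an illustration added here, with a hypothetical function name `sum_in_order`) adds the same three numbers in two different orders using IEEE-754 double precision, as Python floats do on common platforms.

```python
def sum_in_order(values):
    """Add the values strictly from left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

forward = sum_in_order([0.1, 0.2, 0.3])    # ((0.0 + 0.1) + 0.2) + 0.3
backward = sum_in_order([0.3, 0.2, 0.1])   # ((0.0 + 0.3) + 0.2) + 0.1
print(forward == backward)
print(forward - backward)
```

The two sums differ in the last bit. A compiler or an optimized library that re-orders such additions therefore silently changes the numerical model, even though the mathematical expression is the same.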
12. The invasion of complexity
Scientific computation in the 1960s
Long but simple computations.
Check by hand for small N.
Small N → big N, re-run.
Slowly but surely...
Ever more complex objects of study.
Ever more complex models.
Ever more complex computational protocols.
Ever more complex software.
It’s becoming impossible to keep track of all approximations. Scientists
don’t know which model they are actually using.
13. Complexity
Same equations, but a lot more points and parameters
More severe approximations required for efficiency
Software source code becomes very difficult to read...
... but we have no other precise notation for the models.
14. Complexity in Molecular Dynamics simulations
Principle
Follow the motions of the atomic nuclei.
Essential input: a model for the interactions between atoms
1964: liquid argon
A single atom type: argon
Lennard-Jones interactions: two parameters
1994: lysozyme (a small protein)
1,960 atoms of 26 distinct types (AMBER94 force field)
74,759 energy parameters
Parameter assignment requires non-trivial graph traversal algorithms
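To make the contrast concrete, the complete interaction model of the 1964 case fits in a few lines. The sketch below assumes the standard 12-6 form of the Lennard-Jones potential, which the slide does not spell out, and uses reduced units (epsilon = sigma = 1) instead of argon values. The function name `lennard_jones` is a hypothetical choice.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Pair energy 4*epsilon*((sigma/r)**12 - (sigma/r)**6) at distance r.

    epsilon (well depth) and sigma (zero-crossing distance) are the
    two parameters of the model.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

r_min = 2.0 ** (1.0 / 6.0)   # distance of the energy minimum for sigma = 1
print(lennard_jones(1.0), lennard_jones(r_min))
```

A protein force field has no comparably compact statement: its tens of thousands of parameters are assigned by algorithms that exist only as code.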
15. Uncertainty through obscurity: a recent case
A. Smart, Physics Today, 22 August 2018
My view: not a coding error, but a badly chosen approximation that is
documented nowhere except in unpublished source code.
16. Research agenda for taking better care of models
Goals
Make models readable by scientists (source code → paper)
Make all approximations explicit and exposed to peer review
K. Hinsen, F1000 Research 3, 101 (2014)
Approaches
Digital scientific notations
Model-Driven Engineering?
K. Hinsen, The Self-Journal of Science (2016)
K. Hinsen, PeerJ CompSci 4, e158 (2018)
17. Digital scientific notations
Notations for models.
Formal languages.
Specification, not implementation.
Human-readable, embedded in plain text.
18. Leibniz: a digital scientific notation
An algebraic specification language inspired by Maude
Based on equational logic and term rewriting
Main novelty: embedded into plain text (“literate specification”)
Application domains:
Development focus: physics, chemistry
More generally: models based on continuous mathematics
19. Unusual features
Doing research ≠ constructing software tools
No namespaces, no scopes, but explicit renaming.
Minimal built-in functionality: numbers, strings, and booleans.
No “standard library”.
Adapt published libraries rather than re-use without modification.
Discourage the creation of black-box code libraries.
Understandability takes priority over modularity and reusability
Think of it as executable mathematics, not software
20. Example: source code
#lang leibniz
@import["mechanics" "mechanics.xml"]
@import["quantities" "quantities.xml"]
@title{Motion of a mass on a spring}
@author{Konrad Hinsen}
@context["mass-on-a-spring"
#:use "mechanics/dynamics"
#:use "quantities/angular-frequency"]{
We consider a point-like object of mass @op{m : M} attached to a
spring whose mass we assume to be negligible. The other end of the
spring is attached to a wall. When the particle is at position
@op{x : T→L}, the force @op{F : T→F} acting on it is proportional
to the displacement @op{d : T→L} of @term{x} relative to the
spring's equilibrium length @op{l : L}:
@inset{
@equation[def-d]{d = x - l} @linebreak[]
@equation[force]{F = -(k × d)}
}
where @op{k : force-constant} characterizes the elastic properties
of the spring.
Newton's equation of motion for the position @term{x} of the mass
takes the form
@inset{
@equation[newton-x]{m × 𝒟(𝒟(x)) = -(k × (x - l))}
}
This is a second-order ordinary differential equation, which can be
rewritten in terms of the displacement @term{d}, yielding
21. Example: rendered view
Motion of a mass on a spring
We consider a point-like object of mass m:M attached to a spring whose mass we
assume to be negligible. The other end of the spring is attached to a wall. When the
particle is at position x:T→L, the force F:T→F acting on it is proportional to the
displacement d:T→L of x relative to the spring’s equilibrium length l:L:
def-d: d = x - l
force: F = -(k × d)
where k:force-constant characterizes the elastic properties of the spring.
Newton’s equation of motion for the position x of the mass takes the form
newton-x: m × 𝒟(𝒟(x)) = -(k × (x - l))
This is a second-order ordinary differential equation, which can be rewritten in terms
of the displacement d, yielding
newton-d: 𝒟(𝒟(d)) = -((k ÷ m) × d).
Introducing ω:angular-frequency defined by ω = √(k ÷ m), the solution can be written
as
solution: d[t] = A × cos((ω × t) + δ)
∀ t : T,
where cos(angle):ℝ is the cosine function. The amplitude A:L and the phase δ:angle
can take arbitrary values.
Additional arithmetic definitions for this context:
by Konrad Hinsen
Context mass-on-a-spring
uses mechanics/dynamics
uses quantities/angular-frequency
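The stated solution of the mass-on-a-spring model can be checked numerically. The sketch below is written in Python, not in Leibniz, and is only an illustration: it approximates the second time derivative of d[t] = A × cos((ω × t) + δ) by a central finite difference and evaluates the residual of the equation 𝒟(𝒟(d)) = -((k ÷ m) × d). The parameter values and the function names `displacement` and `residual` are arbitrary assumptions.

```python
import math

def displacement(t, A, omega, delta):
    """Proposed solution d(t) = A * cos(omega * t + delta)."""
    return A * math.cos(omega * t + delta)

def residual(t, k, m, A, delta, h=1e-4):
    """Finite-difference estimate of d''(t) + (k/m) * d(t); should be near 0."""
    omega = math.sqrt(k / m)

    def d(s):
        return displacement(s, A, omega, delta)

    second_derivative = (d(t + h) - 2.0 * d(t) + d(t - h)) / (h * h)
    return second_derivative + (k / m) * d(t)

# Arbitrary test values: k/m = 4, so omega = 2.
res = residual(0.7, k=2.0, m=0.5, A=1.5, delta=0.3)
print(res)
```

The residual is not exactly zero because the finite difference and floating-point arithmetic are themselves approximations, which is the distinction between a model and its numerical evaluation drawn earlier in the talk.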
23. Play with it yourself
The code is on GitHub.
Warning: research code!
Look at the growing collection of examples.
Written in Racket, which provides excellent support for this kind of project:
the Scribble language for writing documentation, which Leibniz extends.
the DrRacket programming environment, which is Leibniz’ authoring
environment.