SGC 2015 - Mathematical Sciences Extension Studies

St. George’s College 2015 - Mathematical
Sciences Exploration Studies
Daniel Xavier Ogburn [1]
School of Physics,
Field Theory and Quantum Gravity,
University of Western Australia
June 2, 2015
[1] Electronic address: daniel.ogburn@research.uwa.edu.au

Contents
1 Introduction
  1.1 Tutor List
  1.2 Exploration Studies and Extension Problems
    1.2.1 People you should know about
    1.2.2 Theorems and Theories you should know about
  1.3 Layout and Conventions
2 Dimensional Analysis and Fundamental Laws
  2.1 Dimensional Analysis
    2.1.1 Preamble: March 9, 2015
    2.1.2 Examples
    2.1.3 Problems
    2.1.4 Moral of the story
  2.2 Dimensionless Constants and Fundamental Laws
    2.2.1 Physical Systems, Fundamental Laws
    2.2.2 Examples and Problems
    2.2.3 Buckingham Pi-Theorem
    2.2.4 Gravity, The Hierarchy Problem and Extra-Dimensional Braneworlds
3 Geometry of Antiquity and The Universe
  3.1 Introduction: Conic Sections
  3.2 Parabolas and Geometric Optics
    3.2.1 Overview
    3.2.2 The Parabola
    3.2.3 Scale Invariance and Transcendality
    3.2.4 Symmetries and Canonical Form
    3.2.5 Optical Properties and Spherical Aberration
  3.3 Ellipses and Planetary / Atomic Orbits
    3.3.1 Overview
    3.3.2 The Ellipse
    3.3.3 Parametric Form
  3.4 The Two Body Problem and Planetary Orbits (Easter Sketch)
    3.4.1 History and Cultural Impact
    3.4.2 Inverse Square Law and Central Potentials
    3.4.3 Symmetries and Jacobi Coordinates
    3.4.4 Kepler’s Laws
    3.4.5 Superintegrability and Constants of Motion
  3.5 Hyperbolae, Comets and Atomic Scattering
  3.6 General Relativistic Corrections
4 Physics in Non-Inertial Frames
  4.1 The Lie Group of Rotations: Design a Death Star
    4.1.1 Notation
    4.1.2 BFF: Linear Maps and Matrices
    4.1.3 SO(3): The Lie Group of Rotations
  4.2 Rigid Bodies and Moments of Inertia
    4.2.1 Rotations: about an arbitrary axis
    4.2.2 Principal Axes and Spectral Decomposition
    4.2.3 Parallel and Perpendicular Axis Theorems
    4.2.4 Precession and Torque: Equinox, Spinning top
  4.3 Accelerating Frames: The Tides
    4.3.1 Isometries of Euclidean Space: Galileo
    4.3.2 Linearly Accelerating Frames
    4.3.3 The Tides
  4.4 Centrifugal and Coriolis Forces
    4.4.1 Rotational Motion and Angular Velocity
    4.4.2 Differential Operators in Rotating Frames: Newton’s Second Law
    4.4.3 Centrifugal Force
    4.4.4 Coriolis Force
  4.5 Foucault’s Pendulum
5 Nature’s Ways: The Calculus of Variations
  5.1 Lagrangian Mechanics
    5.1.1 Background
    5.1.2 The Principle of Stationary Action
    5.1.3 The Euler-Lagrange Equations of Motion
    5.1.4 N-Dimensional Euler-Lagrange Equations
    5.1.5 Examples
    5.1.6 Multiple Independent Parameters
    5.1.7 More Examples
    5.1.8 Closing Remarks

Chapter 1
Introduction
1.1 Tutor List
For the year 2015, here is a list of tutors for mathematics, physics and related
areas:
• Myself (Daniel Ogburn)
• Ben Luo
• Theresa Feddersen.
1.2 Exploration Studies and Extension Problems
As a young (or old) individual, one should strive for ‘professionalism’ in one’s pursuits. This means doing things ‘properly’ and diligently, without cutting corners. One aspect of professionalism in the mathematical sciences is to develop a forte for ‘problem solving’. Developing your capacity for problem solving is not something that can really be philosophised or ‘rote-learned’. Certain principles may help you, but at the end of the day the best way to develop this faculty is to attempt many different problems. Your mission, should you choose to accept it, is to attempt the problems in these extension studies. Because they differ in style from the problems you will usually get at a university, they will help you to develop in new ways.
Beyond improving your ‘professionalism’ and problem-solving capacity, these studies should provide a fun and interesting side journey. We will try to explore areas of mathematics and physics which are often glossed over. Hopefully you will find that these areas are in fact rich and interesting, with much to be explored. To a large extent, you will develop tools and insights that give you an edge in your university work.
Lastly, note that the idea here is to have fun! Engage your peers and your tutors
as you work your way through these explorations. As the prince of mathematics
once said:
“It is not knowledge, but the act of learning, not possession but the act of getting
there, which grants the greatest enjoyment.”– Carl Friedrich Gauss.
1.2.1 People you should know about
• Euclid
• Archimedes
• Muhammad al-Khwarizmi
• Carl Friedrich Gauss: Gauss referred to mathematics as ‘The Queen of all Sciences’ and is almost universally considered to be the ‘Prince of Mathematics’. Apart from being a prodigy, Gauss is responsible for much of the expansion and development of 19th-century mathematics and physics. Consequently, you will see his name behind many fundamental theorems from all areas of mathematics.
• Bernhard Riemann: Gauss’ best student. Along with Gauss, Riemann is largely responsible for non-Euclidean geometry – the basis for all modern theoretical physics. He is also responsible for the ‘Riemann Hypothesis’, which has long been considered to be the greatest unsolved conjecture in mathematics.
• Leonhard Euler
• Joseph Lagrange
• Joseph Fourier
• Emmy Noether
• Henri Poincaré
• David Hilbert
• Élie Cartan
• John von Neumann
• Srinivasa Ramanujan
• Alexander Grothendieck
• Grigori Perelman
• Terence Tao
• Isaac Newton and Gottfried Wilhelm Leibniz
• Galileo Galilei
• Michael Faraday
• James Clerk Maxwell
• Sir William Rowan Hamilton
• Ludwig Boltzmann
• Max Planck
• Paul Dirac
• Albert Einstein
• Richard Feynman
• Lev Landau

• Edward Witten
• Nima Arkani-Hamed
• Jin-Mann Wong (Jenny Wong).
1.2.2 Theorems and Theories you should know about
Here are a few of the ‘major’ results that you should know about by the end of your mathematics / physics degree. Some of them you should understand in detail – i.e. derivations, proofs and conceptual working knowledge. Others you should at least have heard of or come across, and understand the essence of the result if not the specifics. Please note that the list is far from exhaustive and there are probably many important results that I have omitted at this time.
• Euler’s Formula
• The Fundamental Theorem of Algebra
• Weierstrass Factorization Theorem
• The Generalized Stokes’ Theorem
• The Cauchy Residue Theorem
• The Gaussian Distribution and Hypothesis Testing
• Principle of Least Squares Regression and the $L^2$, $\ell^2$ Hilbert Spaces
• The Multinomial Theorem
• Uniform, Binomial, Poisson and Chi-squared distributions
• Non-parametric statistics
• Fourier’s Theorem and its generalizations
• The Spectral Theorem
• The Shannon-Nyquist Theorem
• Parseval’s Theorem
• Shannon Entropy
• The Riemann Hypothesis and Prime Number Theorem
• The Navier-Stokes Equation and associated Millennium Problem
• Thurston’s Geometrization Theorem (Poincaré Conjecture)
• Information Complexity and the P versus NP Problem
• Fermat’s Last Theorem
• Maxwell’s Equations in 3-dimensional and 4-dimensional form
• The Black-Scholes Equation
• Markov Chains and Monte Carlo Methods
• Game Theory and the Nash Equilibrium
• Measure Theory and the Lebesgue Integral
• Dedekind’s Construction of the Real Numbers

• Proof of the Irrationality of $\sqrt{2}$
• Cantor’s diagonalization argument and classification of infinities
• Gödel’s Incompleteness Theorems
• The Axiom of Choice
• Lagrange’s Theorem (Group Theory)
• The First and Second Isomorphism Theorems
• Splitting Lemma
• The Jordan-Hölder Theorem
• Hamilton’s Quaternion Formula
• Brouwer Fixed Point Theorems
• The Hairy Ball Theorem
• Banach-Tarski Paradox
• The Lyapunov Exponent and Lorenz Attractor
• Noether’s Theorem and Killing’s Equation
• De Rham Cohomology, Closed Forms and Conservation Laws
• Topological Invariants and the Atiyah-Singer Index Theorem
• Newton’s Second Law
• Newton’s Law of Gravity and Derivation of the Kepler Orbits
• The Lorentz Symmetry Group and Special Relativity
• Planck’s Radiation Law
• Boltzmann’s Equipartition Theorem and Proof
• The Ising Model and its solution
• The Bose-Einstein Distribution and Fermi Distribution
• The EPR Paradox
• Boltzmann’s Entropy Formula
• LASERS and Stimulated Emission
• Spherical Harmonics and Quantum Model of the Hydrogen Atom
• General Relativity, Einstein Field Equations and Einstein-Hilbert Action
• The Standard Model of Particle Physics
• Lie Groups and Lie Algebras
• Hilbert Spaces, Metric Spaces and Topological Spaces
• The Heisenberg Uncertainty Principle
• The Schrödinger and Dirac Equations for Quantum Mechanics
• The Klein-Gordon Equation and Feynman Path Integral
• The Hawking-Bekenstein and Unruh Black Hole Thermodynamic Formulas

• The Hawking-Penrose Singularity Theorems and Penrose Black Hole Inequalities
• The Magnetohydrodynamic Equations
• Hamiltonian and Lagrangian Variational Principles
• The Hamilton-Jacobi Equation and Liouville’s Theorem
• Sturm-Liouville Theory and Harmonic Analysis
• Gamma Function, Hypergeometric Function and Orthogonal Polynomial
Families
• Eigenvalue/Eigenvector Decomposition and Jordan Normal Form
• The Matrix Exponential
• The Standard Model of Cosmology
• Random Matrices and Random Walks
• The Cosmological Constant Problem and Dark Energy
• Approaches to Quantum Gravity
• Ogburn-Waters-Sciffer method for generating ellipsoidal harmonic functions.
1.3 Layout and Conventions
For these exploration sessions, I will typically include exercises and problems in increasing order of difficulty. Furthermore, I make the (somewhat grey) distinction between ‘exercises’ and ‘problems’ as follows.
Definition 1 (Exercise) ‘Exercises’ are essentially numerical or algorithmic calculations to check whether you understand how to manipulate the mechanics of the mathematics involved.
‘Problems’ are generally more difficult as they require more conceptual understanding – i.e. you have to be able to interpret the problem and formulate it in such a way that it is reduced to an ‘exercise’.
Definition 2 (Problem) A ‘problem’ is something which can be reduced to an exercise, with the appropriate creativity and conceptual faculty.
One essential skill for any true scientist is the ability to reduce real-world scenarios, models and problems to mathematical ones. In this manner, the most powerful sciences (the quantitative ones) are those in which scientists turn problems into exercises.

Chapter 2
Dimensional Analysis and
Fundamental Laws
2.1 Dimensional Analysis
2.1.1 Preamble: March 9, 2015
Dimensional analysis is a deceptively simple, but fundamentally powerful tool in the mathematical sciences – one that is often overlooked! Furthermore, dimensional analysis is not just a study tool – it is a research-level tool that can allow one to probe the unknown and construct or discover something new and tangible.
For student purposes, dimensional analysis serves as a fast error-checking algorithm for your calculations. It is also useful for extracting ‘physically meaningful’ information out of your system. In this sense, being ‘dimensionally-aware’ throughout all of your calculations can help you to develop a deeper understanding of the quantities and objects being manipulated.
Given a large set of parameters describing a system, one can often form a smaller number of dimensionless parameters which completely characterize that system – hence removing any redundant information. The precise statement of the last idea is known as the ‘Buckingham Pi Theorem’ [1]. This has vast practical applications to mathematical modelling, fluid mechanics, thermodynamics, electrodynamics, cosmology and much more. For now, we begin with a few examples, then work through some questions [2].
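As a rough illustration of the counting behind this reduction (a sketch only, not a full statement of the theorem): if each parameter's dimensions are written as a row of exponents, the number of independent dimensionless combinations is the number of parameters minus the rank of that matrix. The pendulum parameters and the helper name `rank` below are illustrative assumptions and do not come from these notes.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small matrix, by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical example: a simple pendulum. Each row lists the exponents
# of (M, L, T) in the dimensions of one parameter.
parameters = {
    "period":  [0, 0, 1],    # T
    "length":  [0, 1, 0],    # L
    "mass":    [1, 0, 0],    # M
    "gravity": [0, 1, -2],   # length over time-squared
}

n_groups = len(parameters) - rank(list(parameters.values()))
print(n_groups)  # 1
```

Under these assumptions, four parameters built from three independent units leave a single dimensionless combination.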
The main idea of the following examples and problems is two-fold: first inspect
an equation and work out the dimensions (or units) of each variable and constant,
given some starting information. We then check whether or not the equation is
dimensionally consistent. Any equation from any area of science and mathemat-
ics must be dimensionally consistent – if it isn’t, then it’s wrong. In this sense,
you don’t need to understand the science or theory behind an equation to deduce
when it is incorrect on dimensional grounds!
What was said for ‘dimensional consistency’ is also true of ‘structural consis-
tency’ – a concept which we shall briefly review below.
[1] For those of you who have done (or will do) linear algebra, this is just a practical consequence of the ‘rank-nullity’ theorem.
[2] Thanks to Scott Meyer and Matthew Fernandez for feedback.
Theorem 1 (Dimensional Consistency) Given any equation
$$\mathrm{LHS} = \mathrm{RHS} \tag{2.1}$$
in any set of variables, the equation is wrong if the dimensions of ‘LHS’ and ‘RHS’ are not equivalent.
An easy way to prove this theorem is to note that given any equation, we have an associated ‘auxiliary dimensional equation’. That is, an equation consisting only of the dimensions of LHS and RHS, which we shall denote with the ‘square-bracket notation’ [LHS] and [RHS], respectively:
$$[\mathrm{LHS}] = [\mathrm{RHS}]. \tag{2.2}$$
Hence for an equation to be correct, it must be both numerically correct and dimensionally correct. For more complicated equations, such as tensor or spinor equations, we must also consider ‘structural consistency’.
Notation: Throughout this tutorial we will use ‘logarithmic notation’ for dimensional analysis, in which the dimensions of a product are written as a sum (so that length-cubed, $L^3$, is written $3L$). This is often used by cosmologists and particle physicists, but can be easily converted to the more common ‘exponential notation’.
2.1.2 Examples
Example 1 (Structural Consistency) One simple example of structural consistency is matrix multiplication – we can only multiply two matrices A and B if the number of columns of A is equal to the number of rows of B.
Any equation that is not structurally consistent is fundamentally nonsensical and therefore wrong. It is therefore wise always to be mindful of the structures involved in an equation.
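The matrix rule in Example 1 can be sketched as a check on shapes alone, with no numbers involved. This is a minimal illustration; the function name `product_shape` is a hypothetical helper, not something defined in these notes.

```python
def product_shape(shape_a, shape_b):
    """Shape of the product A B, or an error if it is structurally inconsistent."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply {shape_a} by {shape_b}: "
            f"{cols_a} columns versus {rows_b} rows"
        )
    return (rows_a, cols_b)

print(product_shape((2, 3), (3, 4)))  # (2, 4)

try:
    product_shape((2, 3), (2, 3))
except ValueError as err:
    print("structurally inconsistent:", err)
```

Just as with dimensions, the check needs only the structure of the objects, not their entries.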
Now recall lengths, areas and volumes. The fundamental unit that characterizes
these quantities is length: L.
Example 2 (Spatial Dimensions) Given a rectangular box with sides of length $a$, $b$, $c$, the volume is $V_B = a \times b \times c$. Since each of the sides has the dimensions of length, $[a] = [b] = [c] = L$, the volume has dimensions
$$\begin{aligned}
[V_B] &= [a \times b \times c] \\
&= [a] + [b] + [c] \\
&= L + L + L \\
&= 3L,
\end{aligned}$$
which we interpret as length-cubed: $L^3$. The notation $[\;]$ is used to denote the dimensions of whatever quantity is inside the brackets. Notice also that when we were looking for the dimensions of a product of variables $[a \times b \times c]$, we added the dimensions of each variable: $[a \times b \times c] = [a] + [b] + [c] = L + L + L = 3L$. Finally, we ended up with $[V_B] = 3L$, which means that the volume $V_B$ has 3 factors of the unit length $L$ – hence the volume has dimensions of length-cubed, $L^3$, in ‘exponential notation’. Of course, we already knew this!
Exercise 1 Use the rectangular box example to calculate the dimensions of the area of a rectangle with sides of length $a$ and $b$, given the area formula
$$A_R = ab. \tag{2.3}$$

Problem 1 (Hyperbox) Sometimes playing in a 3-dimensional world is boring – which is why 10-dimensional superstring theory was invented [3]. Consider now an $N$-dimensional hyperbox, which we shall refer to simply as an ‘$N$-box’. So for example, a cube can be considered to be a 3-box. In determining the amount of material required to mass-produce $N$-boxes, Microsoft comes up with the following equation for the ‘hyper-volume’ of an $N$-box with sides of equal length $a$:
$$\mathrm{Vol}(N\text{-box}) = a^{N-1}. \tag{2.4}$$
Explain mathematically why this equation is incorrect, or (possibly) correct, on dimensional grounds.
Similarly to the multiplication rule, if we are inverting quantities we invert their units – hence: $[\frac{1}{a}] = -[a]$, $[\frac{1}{a^2}] = -[a^2] = -2[a]$, etc. Combining this with the multiplication rule, we get the division rule: $[\frac{a}{b}] = [a] - [b]$. For example, if $C$ is the concentration of whey protein in milk, it has units $ML^{-3}$ of mass over volume – hence dimensionally: $[C] = M - 3L$.
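The multiplication, inversion and division rules can be sketched in a few lines by storing a dimension as a table of exponents, one per fundamental unit. This is only an illustrative sketch; the helper names `times`, `inverse` and `over` are assumptions made for this example.

```python
def times(x, y):
    """[x * y] = [x] + [y]: add exponents unit by unit, dropping any that cancel."""
    total = {u: x.get(u, 0) + y.get(u, 0) for u in set(x) | set(y)}
    return {u: p for u, p in total.items() if p != 0}

def inverse(x):
    """[1 / x] = -[x]."""
    return {u: -p for u, p in x.items()}

def over(x, y):
    """[x / y] = [x] - [y]."""
    return times(x, inverse(y))

L = {"L": 1}   # length
M = {"M": 1}   # mass

volume = times(times(L, L), L)     # [a * b * c] = 3L
concentration = over(M, volume)    # [C] = M - 3L

print(volume)                                # {'L': 3}
print(concentration == {"M": 1, "L": -3})    # True
```

A dimensionless quantity is then simply the empty table, for example `over(L, L)`.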
Now that we have done some simple exercises, let’s see how dimensional analysis can be used for error checking. Let’s say someone tells us that the volume $V_S$ of a sphere of radius $R$ is given by
$$V_S = \frac{4}{3}\pi R^2. \tag{2.5}$$
Obviously, this is wrong – but if you’ve forgotten the correct formula, there’s an easy way to see why it is wrong using dimensional analysis. First of all $[R] = L$, since radius has dimensions of length. Furthermore, $[\frac{4}{3}\pi] = 0$ since this is just a pure number (so it is dimensionless). Therefore,
$$\begin{aligned}
[V_S] &= \left[\tfrac{4}{3}\pi R^2\right] \\
&= \left[\tfrac{4}{3}\pi\right] + [R \times R] \\
&= \left[\tfrac{4}{3}\pi\right] + [R] + [R] \\
&= 0 + L + L \\
&= 2L.
\end{aligned}$$
But wait a minute, volume has units of length cubed, hence we need $[V_S] = 3L$. We then conclude by dimensional arguments that the formula $V_S = \frac{4}{3}\pi R^2$ is incorrect!
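The same error check can be sketched mechanically: add up the dimensions of each factor and compare the total against what a volume must have. The names below (`add_dims`, `PURE_NUMBER` and so on) are illustrative assumptions.

```python
def add_dims(*dims):
    """Dimension of a product: add exponents, dropping units that cancel to zero."""
    total = {}
    for d in dims:
        for unit, power in d.items():
            total[unit] = total.get(unit, 0) + power
    return {u: p for u, p in total.items() if p != 0}

PURE_NUMBER = {}        # [4/3 * pi] = 0
LENGTH = {"L": 1}       # [R] = L
VOLUME = {"L": 3}       # what any volume formula must have

claimed = add_dims(PURE_NUMBER, LENGTH, LENGTH)           # [4/3 pi R^2]
correct = add_dims(PURE_NUMBER, LENGTH, LENGTH, LENGTH)   # [4/3 pi R^3]

print(claimed == VOLUME)   # False
print(correct == VOLUME)   # True
```

Note that passing this check does not make a formula right: it says nothing about the numerical factor in front.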
Although the last example was easy, the same principles can be applied to much more complicated formulas in the mathematical sciences – indeed, they are used in research and in practice when doing estimates, checking articles or performing large derivations and calculations. Let’s do one more example.
Example 3 Newton’s Second Law of Motion:
$$\text{Force} = \text{Mass} \times \text{Acceleration}, \tag{2.6}$$
or more generally,
$$F = \frac{dP}{dt}, \tag{2.7}$$
where $P$ is the momentum of a particle of mass $m$. Newton’s Second Law of motion is the fundamental postulate governing classical physics between the late 17th and early 20th centuries. It remains vastly important today, as the law defines what the force is for an object of mass $m$ moving with an acceleration $a$. The three fundamental units here are mass $M$, time $T$ and length $L$. Displacement $x$ has dimensions of length $L$, hence velocity $v$ – which is the rate of change of displacement [4] – has units of length over time:
$$\begin{aligned}
[v] &= \left[\frac{dx}{dt}\right] \\
&= [dx] - [dt] \\
&= [x] - [t] \\
&= L - T,
\end{aligned} \tag{2.8}$$
hence $v$ has units $\frac{L}{T}$. Similarly, acceleration $a$ is the rate of change of velocity, hence
$$\begin{aligned}
[a] &= \left[\frac{dv}{dt}\right] \\
&= [dv] - [dt] \\
&= (L - T) - T \\
&= L - 2T,
\end{aligned} \tag{2.9}$$
which means $a$ has units of length over time-squared: $\frac{L}{T^2}$. Finally, mass $m$ trivially has units of mass: $[m] = M$ (note that here we use the capital $M$ to denote the fundamental unit of mass, whereas the lower-case $m$ is the mass variable that we insert into Newton’s 2nd Law). Therefore, force $F$ has the following dimensions
$$\begin{aligned}
[F] &= [m \times a] \\
&= [m] + [a] \\
&= M + L - 2T,
\end{aligned} \tag{2.10}$$
whence $F$ has units of (mass × length)/(time-squared): $\frac{ML}{T^2}$.
[3] Disclaimer: String Theory may have been invented for other reasons.
[4] For those of you unfamiliar with the definition of velocity and acceleration in terms of calculus, you can think of $\frac{dx}{dt}$ as the change in displacement $x$ over an ‘infinitesimally small amount’ of time $dt$. Then $dx$ carries dimensions of length and $dt$ has dimensions of time: $[dx] = L$, $[dt] = T$. Note that in general, for an arbitrary quantity $y$, the ‘infinitesimal quantity’ $dy$ carries the dimensions $[dy] = [y]$.
Exercise 2 Use dimensional analysis to conclude which formulas are incorrect on dimensional grounds – i.e. which of the following formulas are dimensionally inconsistent. Show your working.
1. A triangle has a base $b$ and a vertical height $h$, each with dimensions of length $L$. Check whether the following formula for its area is dimensionally consistent:
$$A = \frac{1}{2} b^2 h. \tag{2.11}$$
2. A circle has a radius $r$ with dimensions of length $L$. Its area is given by
$$A = \frac{1}{2}\pi r^2. \tag{2.12}$$
Is this dimensionally consistent? A stronger question to ask is whether this formula is correct – if not, why not?
3. A Euclidean ellipse, produced by slicing a cone, can be defined as: “the set of points such that the sum of the distances to two fixed points (the foci) is constant.” In a parallel universe, Rene Descartes decides to write the equation for an ellipse with its major axis coincident with the $x$-axis of the Cartesian plane as:
$$\frac{x^2}{a^2} + \frac{y}{b^2} = 1, \tag{2.13}$$
where the right-hand side is dimensionless.
Here $a$ is the semi-major axis length and $b$ is the semi-minor axis length. Noting that one parametrisation is $x = a\cos(\theta)$, $y = b\sin(\theta)$, we can see that $x$ and $y$ have units of length. Therefore, prove that this equation is wrong on dimensional grounds. Now suggest the correct equation.
There is one more rule of dimensional analysis, which involves analysing equations that include a sum of terms. In particular, given a quantity $A = B + C + D$, to compute the dimensions $[A]$ of $A$, we don’t just add the dimensions of $B$, $C$ and $D$:
$$[A] = [B] + [C] + [D], \tag{2.14}$$
but rather, we have the consistency requirement that:
$$[A] = [B] = [C] = [D]. \tag{2.15}$$
This is because $B$, $C$ and $D$ should all separately have the same units. As such, this observation is very useful for determining the dimensions of multiple unknown quantities in an equation that involves a sum of different terms. For example, the area of a toddler’s house drawing is given by $A_{\text{House}} = A_{\text{Triangle}} + A_{\text{Square}} = \frac{1}{2}bh + a^2$, where $b$ is the base length of the triangle, $h$ is its vertical height and $a$ is the length of the sides of the square. Therefore, $[A_{\text{House}}] = [A_{\text{Triangle}}] = [A_{\text{Square}}] = 2L$, hence $[\frac{1}{2}bh] = [a^2]$, which implies $[b] + [h] = 2[a] = 2L$.
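The rule for sums can be sketched as a check that every term carries the same dimensions, using the house drawing as the test case. The helper names `times` and `dimension_of_sum` are illustrative assumptions.

```python
def times(*dims):
    """Dimension of a product: add exponents, dropping units that cancel to zero."""
    total = {}
    for d in dims:
        for unit, power in d.items():
            total[unit] = total.get(unit, 0) + power
    return {u: p for u, p in total.items() if p != 0}

def dimension_of_sum(*terms):
    """All terms of a sum must share one dimension; return it, or raise an error."""
    first = terms[0]
    for term in terms[1:]:
        if term != first:
            raise ValueError(f"terms disagree: {first} versus {term}")
    return first

L = {"L": 1}
triangle = times(L, L)   # [1/2 b h] = [b] + [h]; the factor 1/2 is dimensionless
square = times(L, L)     # [a^2] = 2[a]

print(dimension_of_sum(triangle, square))   # {'L': 2}

try:
    dimension_of_sum(times(L, L), L)        # an area plus a length
except ValueError as err:
    print("inconsistent sum:", err)
```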
One last concept: a dimensionless constant, $C$, is defined to be a quantity which has no dimensions – hence $[C] = 0$. These are fundamentally important in the description of a physical system since they do not depend on the units you choose. Thus, in some manner they represent a ‘universal’ quantity or property – indeed, the dimensionless constants of a system describe a universality class [5]. A universality class is essentially a set of theories which have the same ‘critical behaviour’ – i.e. near a critical point (e.g. a phase transition), each theory in the same universality class will possess quantities which obey the same scaling laws.
2.1.3 Problems
To answer the following questions, try not to worry too much about terminology
or new and abstract concepts. We are only interested in dimensions – so if you
stay focused and don’t get distracted by the extra information, you can finish
them quickly with no prerequisite knowledge!
Problem 2 A hypercube living in $d$ dimensions has $d$ sides, each with length $a$ and dimensions of length $L$. Its hyper-volume has units of $L^d$ and is given by the formula
$$V = a^d. \tag{2.16}$$
Verify that this is dimensionally consistent – i.e. show that $[V] = L + \dots + L = d \times L$. What dimensions would its surface area have? Hint: this would be the same as the dimensions of the area of one of its ‘hyper-faces’.
[5] A more precise meaning of this statement can be found in the theory of ‘Renormalization Groups’.
Exercise 3 (Make Math, Not War) The U.S. Navy invests a significant amount of money into acoustic scattering studies for submarine detection (SONAR). As part of this research, the Dahlgren Naval Academy uses ‘prolate spheroidal harmonics’ (vibrational modes of a ‘stretched sphere’) to do fast, accurate scattering calculations. In this process, a submarine can be approximated by the shape of a ‘prolate spheroid’ or ‘rugby ball’. A prolate spheroid is essentially the surface generated by rotating an ellipse about its major axis. Given a prolate spheroid with a semi-major axis of length $a$ and a semi-minor axis of length $b$, its volume is
$$V = \frac{4\pi}{3} a b^2. \tag{2.17}$$
Is this formula dimensionally consistent? What about the following formula for the surface area (it should have units of length-squared):
$$S = 2\pi b^2 \left(1 + \frac{a}{be}\sin^{-1}(e)\right)? \tag{2.18}$$
Note, $\sin^{-1}$ is the ‘inverse sine’ or ‘arcsine’ function. It necessarily preserves dimensionality, hence $[\sin^{-1}(e)] = [e]$. The variable $e$ is the ‘eccentricity’ of the spheroid. It is a dimensionless quantity, $[e] = 0$, which measures how ‘stretched’ the spheroid is – i.e. how much it deviates from a sphere. It is given by the (dimensionally-consistent!) formula:
$$e^2 = 1 - \frac{b^2}{a^2}. \tag{2.19}$$
A perfect sphere corresponds to $e = 0$, whereas an infinitely stretched sphere corresponds to $e \to 1$.
Problem 3 (Twiggy) In a parallel universe, Andrew Forrest has a dungeon with $B_F$ flawless black opals inside it. From a financial point of view, these have dimensions of money $\$$ – i.e. $[B_F] = \$$. A machine recently designed by Ian McArthur, head of physics at UWA, uses quantum fluctuations of the spacetime vacuum to produce black opals at a rate of $R_{UWA}$ black opals per minute. Sensing the loss of his monopoly on the black opal market, Andrew Forrest employs a competing physicist at Curtin University to create a quantum vacuum stabilizer. This reduces the number of black opals that Ian can produce per minute by $R_C$ black opals per minute, where $|R_C| \le R_{UWA}$. Working on a broad concept problem, a team of first year students at St. George’s College come up with the following model to predict the value $V$ of shares in Forrest BlackOps Inc. on the stock market as a function of time $t$ (time has dimensions $T$):
$$V = \beta \frac{D}{B_F} - \lambda (R_{UWA} + R_C)\, \tau D\, e^{-\lambda\left(1 - \frac{t}{\tau}\right)} \tag{2.20}$$
where the constant $\tau$ (having dimensions of time $T$) denotes [6] the time at which the European Union is predicted to collapse. Furthermore, $D$ is a function that measures the market demand for black opals (with no dimensions) and $\beta$ is an economic constant predicted by game theory with units of money-squared: $\$^2$. Finally, $\lambda$ is a dimensionless parameter (so $[\lambda] = 0$) that depends on the number of avocados served at the college since the establishment of St. George’s Avocadoes Anonymous up to the given time $t$.
[6] This is the Greek letter tau – not the Roman letter t.
Is this model dimensionally consistent – i.e. does $[V] = \$$?
What about the following formula, proposed by students from St. Catherine’s College (who didn’t practice dimensional analysis)?
$$V = \frac{D}{B_F} - D^2 e^{-t} \tag{2.21}$$
On dimensional grounds, list two reasons why this model is incorrect.
Problem 4 (Spacetime Surfing) Don’t worry about the physics, just keep track
of dimensions and rules!
On March 17, 2014 the Harvard-Smithsonian Center for Astrophysics held a press conference announcing the discovery of gravitational waves. Gravitational waves are ripples through spacetime created by large gravitational disturbances in the cosmos – for example, exploding stars and coalescing black holes. These are predicted by Einstein’s theory of General Relativity – a theory in which gravity is a simple consequence of the geometry (shape) of spacetime.
In this theory, choosing natural units for the speed of light: c = 1, time and
spatial length become dimensionally equivalent: T = L. Therefore, dimension-
ally we have: [time] = [distance] and [c] = [distance/time] = L − T = 0. A
geometry which models gravitational waves is described by the following metric
(an abstract object which tells you how gravity and measures of time and length
vary at each point in spacetime):
$$g = \eta + h \tag{2.22}$$
where $\eta$ is a flat-space metric (describing an empty universe):
$$\eta := -dt + dx \otimes dx + dy \otimes dy + dz \otimes dz \tag{2.23}$$
and $h$ is a symmetric tensor, given in de Donder gauge by
$$h := \epsilon \cos(k \cdot r) A + \frac{1}{2} \times \mathrm{trace}(h) \times \eta. \tag{2.24}$$
Here $\epsilon$ is a small ($\ll 1$) dimensionless parameter, $[\epsilon] = 0$, and $A$ is a symmetric tensor field with dimensions of length-squared: $[A] = 2L$. Note that the trace operation turns tensors into scalars, so it removes the dimensionality of a tensor: $[\mathrm{trace}(h)] = 0$. Furthermore, consider $\cdot$ as another form of multiplication. Since the wave vector $k$ and position vector $r$ have inverse units, we have $[k] = -L$, $[r] = +L$ – hence $[k \cdot r] = 0$. For the purposes of dimensional analysis, we can treat the tensor product $\otimes$ as ordinary multiplication also. The differential quantities have the following dimensions: $[dt] = [dx] = [dy] = [dz] = L$, hence $[dx \otimes dx] = 2[dt] = 2L$ for example. Since $x, y, z, t$ represent coordinates in spacetime, we also have $[x] = [y] = [z] = [t] = L$.
Show that the metric g demonstrates a dimensionally-inconsistent solution to the
Einstein field equations. Where is the error? Suggest what could be done to this
metric to ‘fix’ it and give a dimensionally-consistent solution.
Remark: If you were certain that the equation for h was correct, it would be
unnecessary to tell you the dimensions of A – you could work it out since you
already know [cos(k · r)] = 0 (the function cos(something) is necessarily di-
mensionless). Therefore, pretending [A] is unknown, prove that [A] = 2L given
all the other information.

After completing the last few problems, one should realize that much time can be saved by ignoring most of the information and concentrating only on the dimensions of the variables and constants in the given formulas. This is true in general! To do dimensional analysis, one need not necessarily understand the science or mathematics behind an equation – only the dimensions of the quantities involved. It is therefore an easy way to show when something is wrong without knowing what you are talking about [7].
Problem 5 (Super-Fluffy Super-Symmetric Tensors) Despite successful ‘solar-system tests’ of Einstein’s theory of gravity, it has severe shortcomings. One fundamental issue with Einstein’s theory is that it is not consistent with quantum theory (which we know is extremely accurate on short-distance scales) – leading to problems such as the ‘cosmological constant problem’. Another problem is that it predicts singularities where the laws of physics break down. To rectify Einstein’s theory, many physicists have attempted to unify gravity with quantum mechanics over the last century. As it turns out, creating a theory of quantum gravity presents immense mathematical and experimental obstacles.
One approach to understanding quantum gravity, is to consider supersymmetric
theories containing particles of ‘higher spin’. Such theories are conjectured to
reduce back to superstring theory when some symmetry is broken. If so, such
theories (when constructed) will lead to a greater understanding of the global
structure of string theory – for example, dualities. To construct a supersymmetric
theory with massive higher spin particles, one must find a geometric object called
the ‘Super-cotton tensor’ – this describes the conformal 8 (‘shapes and angles’)
structure of spacetime. To help find this tensor, $W_{\alpha(2s)}$, we know that it has
the following dimensional form:
$$[W] = [(D\bar{D})^{2s+1} H], \tag{2.25}$$
where $H_{\alpha(2s)}$ is the gravitational superfield (when s = 1) and $D_\alpha$ and $\bar{D}_\beta$ are
‘spinor-derivative’ operators.
Roughly speaking, along the lines of Roger Penrose, one can think of a (mass-
less) spinor as the ‘square root’ of some vector field. Therefore, the square of a
spinor must have the same dimensionality as a partial derivative:
$$[D^2] = [\bar{D}^2] = \left[\frac{\partial}{\partial x}\right]. \tag{2.26}$$
Furthermore, analysis of non-minimal and type (1,1) supergravity actions leads
us to conclude that:
$$[H] = -\frac{1}{2} M, \tag{2.27}$$
where M is the unit mass. Note that in natural units with the speed of light
c = 1 (and the reduced Planck constant ħ = 1), the unit of mass and unit of length are inverses of each other:
$$M = -L, \tag{2.28}$$
or $M = L^{-1}$ in exponential notation. Finally, note that a differential 1-form dx
can be thought of as a differential length element, hence has units of length:
[dx] = L. (2.29)
7 Dimensional analysis would have saved the present author about 100 hours of supergravity
calculations – time which was largely lost due to two dimensionally-inconsistent equations in a
published journal article.
8 Conformal symmetries are ones that preserve relative angles, but not lengths. For example,
a scaling transformation is a conformal transformation.

Since ∂/∂x can be thought of as a rate of change of length, it must have the inverse
units of dx.
I: With this information, deduce both the mass dimensions and length dimen-
sions of the Super-Cotton tensor W.
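II: Using the flat superspace anti-commutation relation
$$\{D_\alpha, \bar{D}_\beta\} := D_\alpha \bar{D}_\beta + \bar{D}_\beta D_\alpha = \partial_{\alpha\beta}, \tag{2.30}$$
where $\partial_{\alpha\beta} \propto \sum_{a=1}^{3} \sigma^a_{\alpha\beta} \partial_a$ is the spinor form of the vector derivative $\partial_a := \frac{\partial}{\partial x^a}$,
derive the relation:
$$[D] = \frac{1}{2} \left[\frac{\partial}{\partial x^a}\right] \tag{2.31}$$
which was assumed in part I.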
Note that you may assume $[D] = [\bar{D}]$ and that the Clifford algebra generator
$\sigma^a$ is dimensionless. Also, note that the superscript a in $x^a$ is simply a coordinate
label. For a 3-dimensional manifold, a = 1, 2, 3 so we have local coordinates
$x^1, x^2, x^3$ to keep track of points in space.
2.1.4 Moral of the story
Dimensional analysis can tell you when an equation is wrong, but it doesn’t nec-
essarily imply that an equation is correct – even though its dimensions might be
consistent. As a student, you should make use of dimensional analysis when-
ever you can – try it on all formulas you get which have dimensionful quantities.
This will help you to gain a strong intuition of whether or not statements and
equations are sensible and consistent. This helps you to be a fast calculator and
it will also help you to pick up errors in your lecture notes ...

2.2 Dimensionless Constants and Fundamental Laws
2.2.1 Physical Systems, Fundamental Laws
One of the key concepts in dimensional analysis is that of dimensionless param-
eters. Dimensionless parameters are important, because they allow you to char-
acterise both physical and theoretical mathematical systems in a scale-invariant
way. Note that mastering the following concepts and exercises requires a good
understanding of the material in Session 1. For the more mathematically in-
clined, one of the examples and exercises illustrates how to mathematically prove
the π theorem by using the rank-nullity theorem from linear algebra – this is a
good exercise for understanding matrix equations and the correspondence be-
tween matrices and simultaneous equations.
In terms of applications, we will use dimensional analysis to reach a deeper
understanding of simple harmonic motion, viscous fluids, electromagnetism and
Einstein’s theory of gravity.
Definition 3 (Physical Systems) A physical system in the mathematical sciences
typically consists of:
1. A set of physical parameters.
2. A set of governing equations (fundamental laws) which describe the be-
haviour and evolution of the system.
3. A set of fundamental ‘units’ which describe the dimensionality of the sys-
tem.
Definition 4 (Fundamental Law) A fundamental law is a principle, mathe-
matical statement or an axiom used to describe a system, which cannot be de-
rived from any other principles, equations or axioms.
In this sense, we can view the ‘fundamental laws’ of the natural sciences as a more
heuristic notion of ‘mathematical axioms’ (formal assumptions agreed upon by
utility and sensibility).
Arguably, the notion of a ‘fundamental law’ is relative to the context of one’s
analysis. For example, many classical laws, previously considered to be fun-
damental, are a macroscopic consequence of quantum dynamics or statistical
mechanics. However, if we are only looking for classical effects in our analysis,
we may often ignore the quantum mechanical details and treat our classical laws
as ‘fundamental’. The goal of the natural sciences, including mathematics, is to
reduce the number of ‘fundamental laws’ of nature to a minimum. In this sense,
one is able to capture nature in the ‘simplest way possible’ 9. In this manner, perhaps
the largest and longest standing goal of theoretical physics is to construct
and experimentally test a full theory of quantum gravity.
Fundamental laws often go hand-in-hand with one or more special ‘dimension-
less constants’ that capture information and deep insights about the mathematics
and physics of systems governed by that law. We shall now investigate such
examples.
9 Compare with ‘Occam’s Razor’.

2.2.2 Examples and Problems
Example 4 Let’s take a simple, but profound 10 example – the simple harmonic
oscillator. One example of a simple harmonic oscillator is a mass placed on
a frictionless tabletop attached to a spring. This spring is either stretched or
compressed, then released so that the mass proceeds to undergo simple harmonic
motion. This physical system is therefore described by
1. A set of 4 physical parameters: the spring constant κ and the initial posi-
tion x0 and initial velocity v0 of the mass m.
2. An equation of motion called ‘Hooke’s Law’ 11, which says that when you
stretch or compress the spring, the force acting to restore the spring to its
natural length is given by:
$$F = -\kappa x \tag{2.32}$$
where x is the displacement of the mass attached to the spring. Combining
this with Newton’s 2nd Law, F = ma, we get the equation of motion for
the spring:
$$m \frac{d^2 x}{dt^2} = -\kappa x, \tag{2.33}$$
where $a = \frac{d^2 x}{dt^2}$ is the acceleration of the mass.
3. A set of 3 physical units: mass M, time T, length L (usually kilograms,
seconds, metres).
Ignoring microscopic and non-linear effects, we may naively view ‘Hooke’s law’
as a fundamental law (or definition) concerning systems in simple harmonic mo-
tion. Therefore, it must have some special ‘dimensionless constant’ attached to
it.
From the 4 parameters and 3 physical units in the simple harmonic oscillator
system, the Pi theorem claims that we can form one dimensionless constant. To
do this, one needs to know the dimensions of the parameters involved. Clearly
initial displacement has dimensions of length and initial velocity has dimensions
of length /time: [x0] = L, [v0] = L − T. To work out the dimensions of the
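spring constant κ, we inspect the equation of motion. Since acceleration has
dimensions of length over time-squared, we have $\left[\frac{d^2 x}{dt^2}\right] = L - 2T$. Therefore, we
have
$$\begin{aligned}
\left[m \frac{d^2 x}{dt^2}\right] &= [-\kappa x] \implies \\
[m] + \left[\frac{d^2 x}{dt^2}\right] &= [\kappa] + [x] \\
M + L - 2T &= [\kappa] + L \implies \\
[\kappa] &= M - 2T.
\end{aligned} \tag{2.34}$$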
Note that the mathematical symbol ‘ =⇒ ’ means ‘implies’. Now that we have
the dimensions of all parameters in this system, we can form a dimensionless
product. In particular, we need one inverse mass factor and two factors of time
to cancel the dimensions in [κ] = M − 2T. We can get an inverse unit of
mass from [1/m] = −M and two units of time by combining [x0] = L and
10 Despite its simplicity, the (quantum) harmonic oscillator is the cornerstone for modern quantum
field theory and particle physics. In this picture, a quantum field is an infinite continuum
of simple harmonic oscillators, whose motion is captured by Fourier theory, Lie algebras and
Special Relativity.
11 After the famous pirate, Captain Robert Hooke.

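[v0] = L − T. In particular, $\left[\left(\frac{x_0}{v_0}\right)^2\right] = 2[x_0] - 2[v_0] = 2L - 2(L - T) = 2T$.
Hence, we get the dimensionless constant:
$$\begin{aligned}
G &:= \frac{\kappa}{m} \left(\frac{x_0}{v_0}\right)^2 \implies \\
[G] &= \left[\frac{\kappa}{m} \left(\frac{x_0}{v_0}\right)^2\right] \\
&= [\kappa] - [m] + 2([x_0] - [v_0]) \\
&= M - 2T - M + 2T = 0.
\end{aligned} \tag{2.35}$$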
Since the constant G has no formal name, we will claim it and call it the ‘Geor-
gian Constant’ after St. George – the patron saint of dimensional analysis.
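The additive bookkeeping of Example 4 can also be handed to a computer. The following is a minimal Python sketch and not part of the original tutorial: the class name Dim and all variable names are hypothetical choices made for illustration. A dimension is stored as a dictionary of exponents, so that the rule [ab] = [a] + [b] becomes ordinary addition.

```python
class Dim:
    """A dimension in additive notation, e.g. Dim(L=1, T=-1) stands for L - T."""

    def __init__(self, **powers):
        # Drop zero exponents so that equal dimensions compare equal.
        self.p = {u: e for u, e in powers.items() if e != 0}

    def __add__(self, other):          # [ab] = [a] + [b]
        keys = set(self.p) | set(other.p)
        return Dim(**{u: self.p.get(u, 0) + other.p.get(u, 0) for u in keys})

    def __neg__(self):                 # [1/a] = -[a]
        return Dim(**{u: -e for u, e in self.p.items()})

    def __sub__(self, other):          # [a/b] = [a] - [b]
        return self + (-other)

    def __rmul__(self, n):             # [a^n] = n[a]
        return Dim(**{u: n * e for u, e in self.p.items()})

    def __eq__(self, other):
        return self.p == other.p


M, L, T = Dim(M=1), Dim(L=1), Dim(T=1)

m, x0, v0 = M, L, L - T               # mass, initial position, initial velocity
accel = L - 2 * T                     # [d^2x/dt^2]

# From [m] + [d^2x/dt^2] = [kappa] + [x], as in (2.34):
kappa = m + accel - x0

# The constant of (2.35): G = (kappa/m) (x0/v0)^2
G = kappa - m + 2 * (x0 - v0)

print(kappa == M - 2 * T)             # True
print(G == Dim())                     # True: G is dimensionless
```

Storing exponents rather than units is the same ‘logarithmic’ trick used throughout these notes, which is why products and powers turn into sums and integer multiples.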
The last example illustrated a few important concepts. First of all, we showed
that mathematically all the information about a physical system is given by a
set of parameters, a set of physical units (corresponding to the ‘dimensions’) and
at least one governing equation. Second, we showed how we can calculate the
units of an otherwise unknown constant by using dimensional analysis – this is
how we found the dimensions of the spring constant κ.
Finally, we showed that in this particular case, having 4 parameters and 3 physical
units, we were able to form one dimensionless constant: G. Although we could
have taken any multiple or power of this constant and still arrived at a dimensionless
quantity, there is essentially only one independent product that we can
form out of the parameters in the simple harmonic oscillator. This is because G,
1/G, G² or 2G, for example, all contain the same ‘information’.
The last observation is one example of the ‘fundamental theorem of dimensional
analysis’, also known as the ‘π theorem’.
Theorem 2 (Buckingham Pi Theorem) Given a system specified by n inde-
pendent parameters and k different physical units, there are exactly n − k in-
dependent dimensionless constants which can be formed by taking products of
the parameters.
Thus in the last example, we saw that the simple harmonic oscillator was described
by 4 parameters and 3 physical units – hence as claimed, there was indeed
only 4 − 3 = 1 independent dimensionless constant that we could have formed.
Hence, any other dimensionless constant in this system must be some multiple
or some power of G. Before doing the exercises, here is one more example from
fluid mechanics.
Example 5 In fluid mechanics, the notion of the ‘thickness’ of a fluid is formal-
ized by defining its ‘viscosity’. In particular, the dynamic or shear viscosity of a
fluid measures its ability to resist ‘shearing’– an effect where successive layers
of the fluid move in the same direction but with different speeds. For example,
relative to water, glass 12 and honey have a very high shear viscosity, whereas
superfluid Helium has zero viscosity 13.
Given a fluid trapped between two parallel plates–the bottom plate being station-
ary and the top plate moving with velocity v parallel to the stationary plate, the
magnitude of the force required to keep the top plate moving at constant velocity
is given by:
$$F = \eta A \frac{v}{y} \tag{2.36}$$
12 The apparent sagging of old church windows is not due to the fact that glass can be modelled
as a viscous liquid, but rather due to the glass-making techniques of past centuries.
13 For helium-4, the transition to the ‘superfluid’ phase occurs below about 2.17 Kelvin – i.e. close to absolute zero
temperature.

Here v is the speed (magnitude of the velocity) of the top plate, A is its surface
area and y is the separation distance between the plates. The parameter η is
defined to be the shear viscosity of the fluid. We can calculate its units using
dimensional analysis. First, from Newton’s 2nd law we know that the force has
the dimensions: [F] = M + L − 2T. Furthermore, the area A has dimensions
of length-squared [A] = 2L, the speed v has dimensions [v] = L − T and the
separation y has dimensions [y] = L. Hence
[F] =[η] + [A] + [v] − [y] =⇒
[η] =[F] − [A] − [v] + [y]
=(M + L − 2T) − 2L − (L − T) + L
=M − L − T (2.37)
whence η has units of M/(LT). Now, the kinematic viscosity ν 14 of the fluid is defined
as the ratio of the dynamic viscosity η and the density ρ (mass per volume) of
the fluid:
$$\nu = \frac{\eta}{\rho}. \tag{2.38}$$
Since density has units of mass per length-cubed, we have [ρ] = M − 3L and
thus
$$[\nu] = \left[\frac{\eta}{\rho}\right] = [\eta] - [\rho] = M - L - T - (M - 3L) = 2L - T. \tag{2.39}$$
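The two viscosity calculations above amount to adding and subtracting exponent vectors. As a rough sketch (the helper name combine and the constant names are hypothetical, introduced only for this illustration), each dimension can be written as a tuple of exponents in the order (M, L, T):

```python
# Each dimension is a tuple of exponents (M, L, T); e.g. (1, 1, -2) means M + L - 2T.
FORCE = (1, 1, -2)      # [F] = M + L - 2T
AREA = (0, 2, 0)        # [A] = 2L
SPEED = (0, 1, -1)      # [v] = L - T
LENGTH = (0, 1, 0)      # [y] = L
DENSITY = (1, -3, 0)    # [rho] = M - 3L


def combine(*terms):
    """Sum of coeff * vector over (coeff, vector) pairs, i.e. additive notation."""
    return tuple(sum(c * v[i] for c, v in terms) for i in range(3))


# (2.37): [eta] = [F] - [A] - [v] + [y]
eta = combine((1, FORCE), (-1, AREA), (-1, SPEED), (1, LENGTH))

# (2.39): [nu] = [eta] - [rho]
nu = combine((1, eta), (-1, DENSITY))

print(eta)  # (1, -1, -1), i.e. M - L - T
print(nu)   # (0, 2, -1), i.e. 2L - T
```

Writing dimensions as exponent vectors is also the natural starting point for the matrix formulation of the Pi theorem.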
In some set of scenarios, we can think of this fluid as parameterized by four
parameters: density ρ, shear viscosity η, the fluid speed v (assuming the fluid
only travels in the horizontal direction) and the ‘characteristic length scale’ l
for the fluid system (e.g. for a fluid flowing in a pipe, this length scale would
be the diameter of the pipe). The kinematic viscosity ν = η/ρ is not an
independent parameter. Since we
have three different physical units – mass, length and time, the Pi theorem tells us
we can form one independent dimensionless constant. This special, widely-used
constant is called the ‘Reynolds number’ of the fluid and is defined by:
$$R = \frac{\rho v l}{\eta} = \frac{l v}{\nu}. \tag{2.40}$$
In essence, the Reynolds number expresses the ratio of inertial forces to the
viscous forces. In this manner, it describes the relative importance of these two
types of forces in different scenarios. Since it is dimensionless, the Reynolds
number is scale invariant – meaning it characterises the way a fluid will flow on
all length scales (within the valid regime of your theory).
Exercise 4 We defined the Reynolds number R in two ways – one in terms of its
dynamic viscosity η and the other in terms of its kinematic viscosity ν. Show that
the Reynolds number is dimensionless using both of its definitions.
The Reynolds number also controls the amount of ‘turbulence’ present in a fluid
system – with high Reynolds numbers corresponding to turbulent flow. There-
fore, by evolving the dimensionless Reynolds constant from low values to high
values, we will see a laminar flow turn into one with instabilities, vortices and
chaos.
14 This is the Greek letter ‘nu’ – not the Roman letter ‘v’.

2.2.3 Buckingham Pi-Theorem
Formally, the rank-nullity theorem states that given an m × n matrix A (m rows, n
columns), which maps n-dimensional vectors in $\mathbb{R}^n$ to m-dimensional vectors
in $\mathbb{R}^m$, the rank and nullity of the matrix A satisfy:
$$\mathrm{rank}(A) + \mathrm{nullity}(A) = n \tag{2.41}$$
where the rank of A is defined as the number of linearly independent row vectors
(or column vectors) of A and the nullity of A is defined as the dimension of the
kernel of A – i.e. the number of linearly independent n-dimensional vectors
which get mapped to 0 by A. Note that in the cases of interest here m ≤ n (otherwise the system is
over-determined).
In the context of dimensional analysis and the π Theorem, we can think of a mathematical
or physical system with n parameters and k different types of fundamental
units (dimensions) as a system of n linear equations (one for each parameter)
in k variables (the units). In particular, we make use of the additive or
‘logarithmic’ notation which we have been using for dimensional analysis.
Problem 6 (π Day, π theorem) On 14/03/2015, Tibra Ali decides to have a Pi
battle at the Perimeter Institute for Theoretical Physics. On the same day, Angela
Burvill and Joshua Bailey decide to have a Pi eating contest – where each
student has to eat one frozen meat pie for each correct digit of Pi that the other
recites. However, realizing that becoming a mathematician requires thousands of
hours of diligence, William decides to prove the ‘Buckingham Pi theorem’ using
the ‘rank-nullity’ theorem from linear algebra – which he remembers from last
semester! To up the stakes, Ben Luo decides to dangle Rowan Seton from the
top of the college tower till William proves his theorem.
Assuming Ben has finite strength, save Rowan by proving the Pi theorem with
William.
Hint for proof: For the above problem, note that we can view a system with
parameters χ1, ..., χn, fundamental units u1, ..., uk and the following dimensions
for the parameters:
$$\begin{aligned}
[\chi_1] &= \lambda_{11} u_1 + \dots + \lambda_{1k} u_k \\
[\chi_2] &= \lambda_{21} u_1 + \dots + \lambda_{2k} u_k \\
&\;\;\vdots \\
[\chi_n] &= \lambda_{n1} u_1 + \dots + \lambda_{nk} u_k,
\end{aligned} \tag{2.42}$$
as a system of n linear equations in k variables. One can now apply the rank-nullity
theorem to this system to show that the number of dimensionless constants which
one can form from the corresponding physical system should be equal to the
nullity of the k × n ‘dimensional matrix’ whose i-th column holds the coefficients λij
of the parameter χi, where i = 1, ..., n and j = 1, ..., k.
Example 6 As a simple example of the algebra required, say we have a system
with three parameters x, y, z and two fundamental physical units U1, U2. We can
represent the dimensions of our parameters as a matrix by letting each column
correspond to different parameters and letting each row correspond to different
fundamental units. Therefore, we let the first column correspond to the parame-
ter x, the second column to y and the third column to z.
The first row corresponds to the unit U1 and the second row to the unit U2. Then the
entry in the first row and column corresponds to the number of dimensions x
has in the unit U1. So if, for example, x has the units $U_1^a U_2^b$ then it has dimensions:
$[x] = [U_1^a] + [U_2^b] = aU_1 + bU_2$. Similarly, let y have units $U_1^c U_2^d$ and z have
units $U_1^e U_2^f$: hence $[y] = cU_1 + dU_2$ and $[z] = eU_1 + fU_2$.
Forming the ‘dimensional matrix’ D for this physical system, we have:
$$D = \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \tag{2.43}$$
To see that this makes sense, we can simply act 15 the transpose of the dimensional
matrix $D^T$ on the vector $U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}$ containing the physical units to recover
all three of our dimensional equations [x] = aU1 + bU2, [y] = cU1 + dU2
etc. To find dimensionless constants, we have to solve the ‘nullspace equation’:
$$\begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
for all possible vectors $(\alpha, \beta, \gamma)^T$. In particular, dimensionless
constants will be a product of powers of the different physical parameters:
$x^\alpha y^\beta z^\gamma$, where the exponents α, β, γ are components of a vector $(\alpha, \beta, \gamma)^T$ which
solves the nullspace equation.
The number of linearly independent vectors $(\alpha, \beta, \gamma)^T$ which solve the null-space
matrix equation coincides with the ‘nullity’ of the dimensional matrix D – it
is precisely equal to the number of dimensionless constants we can form. In
particular, since we have n = 3 independent physical parameters x, y, z corresponding
to three columns of our dimensional matrix D and k = 2 fundamental
units U1, U2 corresponding to the two (linearly-independent 16) rows of D, the
rank-nullity theorem tells us that the nullity of D is given by
$$\mathrm{nullity}(D) = n - k = 3 - 2 = 1. \tag{2.44}$$
Since the nullity of D is precisely equal to the number of dimensionless con-
stants we can form for this physical system, this shows that the π Theorem for
dimensional analysis, is just a special instance of the rank-nullity theorem for
linear algebra.
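To make the nullspace picture concrete, here is a small Python sketch that finds the dimensionless product for the simple harmonic oscillator of Example 4. It is an illustration under stated assumptions rather than part of the original notes: the function name nullspace is hypothetical, Gauss-Jordan elimination with exact fractions is simply one convenient way to solve the nullspace equation, and the columns of the dimensional matrix are ordered (m, x0, v0, κ) with rows (M, L, T).

```python
from fractions import Fraction


def nullspace(matrix):
    """Basis of the null space of `matrix` (a list of rows), via Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []                      # pivot column of each reduced row, in order
    r = 0                            # index of the next row to receive a pivot
    for c in range(n_cols):
        if r == len(rows):
            break
        # Look for a row at or below r with a non-zero entry in column c.
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue                 # no pivot here: column c is a free column
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        p = rows[r][c]
        rows[r] = [x / p for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # One basis vector per free column: set that exponent to 1 and solve for the pivots.
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][free]
        basis.append(vec)
    return basis


# Columns: m, x0, v0, kappa.  Rows: exponents of M, L, T.
oscillator = [
    [1, 0, 0, 1],     # M
    [0, 1, 1, 0],     # L
    [0, 0, -1, -2],   # T
]

basis = nullspace(oscillator)
print(len(basis))                    # 1, i.e. n - k = 4 - 3
print([int(x) for x in basis[0]])    # [-1, 2, -2, 1]
```

The single basis vector gives the exponents of (m, x0, v0, κ), that is the product $m^{-1} x_0^{2} v_0^{-2} \kappa$, which is the Georgian Constant G of equation (2.35). Any other solution is a multiple of this vector, corresponding to a power of G.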
2.2.4 Gravity, The Hierarchy Problem and Extra-Dimensional
Braneworlds
The following is an extended set of exercises which test all the skills the tutorials
have elucidated so far in dimensional analysis. It will also introduce you
to some concepts which may be new and bizarre, whilst linking them back to
everyday reality. The overall goal will be to derive a dimensionless constant
that characterises classical gravity on all length scales (no knowledge of relativ-
ity is required)! By comparing this constant to another dimensionless constant
from electromagnetism, we will see why gravity is so much weaker than the
other three forces in nature – then investigate a solution to this peculiarity using
brane-world models of the universe.
15 By matrix multiplication.
16 These rows are necessarily linearly independent, since we assume our fundamental physical
units to be independent – by definition.

As far as we understand, all interactions in nature take place through four fundamental
forces. At present, we have a rather ‘successful’ theoretical and experimental
quantum description of three of these forces – that is, we have constructed
quantum field theories to describe the ‘quanta’ (particles) which mediate these
forces. Gravity, despite our everyday experience of it, remains somewhat mysterious
and theoretically elusive in several ways – in particular, because it is highly
resistant to all attempts to turn it into a quantum theory like the other forces. If
there is really any hope of long-distance interstellar space travel and other extreme
‘sci-fi’ technologies, a theory of quantum gravity will be the cornerstone.
As a reminder, the four forces dictating our universe are the
• Electromagnetic Force: Which governs electromagnetic radiation (such as
light) as well as interactions between charged particles. In the quantum
description (Quantum Electrodynamics), this force is carried by massless
particles known as ‘photons’.
• Weak Nuclear Force: In the quantum description, this force is mediated by
massive particles known as the Z and W± bosons. It is involved in quark
transformations as well as some interactions between charged particles.
• Strong Nuclear Force: In the quantum description (Quantum Chromodynamics),
this force is mediated by ‘gluons’ and is responsible for the interactions
between quarks, which are the particles making up hadrons such
as the proton and neutron. In this manner, it is responsible for processes
such as fusion, which is the source of energy for our sun.
• Gravitational Force: In the attempted quantum descriptions, this force is
mediated by a massless particle known as the ‘graviton’. It is responsible
for the interactions of all particles with mass, but also determines the tra-
jectories of massless particles (e.g. gravitational bending of light) since it
warps the spacetime continuum.
At higher energies, these four forces start to unify into one single force – for
example, the electromagnetic and weak nuclear forces unify to make the elec-
troweak force. Attempts to unify the electroweak and strong nuclear forces
have been partially successful and fall under ‘The Standard Model’ of particle
physics. On the other hand, attempts to unify gravity with the other forces have
been largely unsuccessful, with the only real promising candidate being String
Theory.
One of the biggest mysteries about the gravitational force is why it is so weak
compared to the other forces in nature. In some sense this is ‘unnatural’, hence
suggests that on some deeper level, gravity is fundamentally different from the
other forces. As the goal of this tute, we will use dimensional analysis to characterise
the gravitational and electromagnetic forces with some special dimensionless
constants – then compare their strengths to prove this claim. Finally, we will
end on some very recent 17 advancements in theoretical physics which propose
an explanation of why gravity is the weakest of the four forces.
Exercise 5 (Newton, Einstein and Braneworlds: The Gravitational Coupling Constant)
Of the many things that Isaac Newton is famous for, one of them is coming up
with multiple mathematical proofs of the fact that the planets orbit the sun in
elliptical paths – and that this elliptical motion is a direct consequence of an
inverse square law. Thus, by planar geometry and calculus he came up with
the following gravitational force law to explain the astronomical observations of
17 The last 5-10 years.

Johannes Kepler and Tycho Brahe:
$$\mathbf{F} = -G_N \frac{m_1 m_2}{r^2} \hat{\mathbf{r}} \tag{2.45}$$
where GN is Newton’s gravitational constant, m1 and m2 are the masses of two
objects separated by a distance r and ˆr is a ‘unit vector’ (vector with magnitude
1) pointing from one object to the other. This tells us the gravitational force that
one massive object exerts on another massive object.
QI: Using Newton’s 2nd Law, F = ma, deduce the dimensions or units of GN.
Note that you are working with mass, length and time (M, L, T) as your fundamental
units, hence [m1] = [m2] = M. Furthermore, by definition the unit
vector 18 $\hat{\mathbf{r}} = \frac{\mathbf{r}_2 - \mathbf{r}_1}{|\mathbf{r}_2 - \mathbf{r}_1|}$ is dimensionless: $[\hat{\mathbf{r}}] = 0$. Note that in general, the dimensions
or units of a vector quantity are always the same as the units of the
magnitude (and components) of that vector – hence $[\mathbf{r}] = [r]$ for example.
Now that we have the dimensions of GN , we are ready to consider Einstein’s
theory of gravitation. Einstein’s theory differs from Newton’s theory in many
ways – fundamentally it explains gravity as a consequence of spacetime curving
around any object with mass, with the ‘amount’ of curvature being greater
for greater masses (e.g. the Sun). On an astrophysical level, it is important
as it helps to explain the big bang, solar fusion and the existence of black
holes – objects which are necessary for the stability of some galaxies such as
the Milky Way. In terms of everyday living, general relativity is essential for the
operation of GPS satellites – without the gravitational corrections to the timing
(gravitational time-dilation) offered by Einstein’s theory, the GPS system would
not be accurate enough to work.
In Einstein’s theory, spacetime is modelled by the following objects 19:
• An energy-momentum tensor T which contains information about ‘sources’
of curvature – matter and energy. Its components have dimensions of
an energy-density: $[T_{ab}] = \left[\frac{\text{Energy}}{\text{Volume}}\right] = M - L - 2T$. Since the tensor
itself is a second-rank covariant tensor, we have: $[T] = [T_{ab}\, dx^a \otimes dx^b] = [T_{ab}] + [dx^a \otimes dx^b] = M - L - 2T + 2L = M + L - 2T$.
Note that the dimensionality of energy can be deduced from the relation:
Work = Force × Distance and hence [Energy] = [Work] = [Force] +
[Distance] = M + L − 2T + L = M + 2L − 2T.
• A metric tensor g describing how gravity distorts measures of length and
time. This has units of length-squared: [g] = 2L.
• The Riemann Curvature tensor, Riem, describes how the curvature of
spacetime varies in different regions. It also measures how gravity distorts
parallel-transport. It is given roughly 20 as the anti-symmetrized second
tensor ‘gradient’ of the metric: Riem ∼ ∇ ⊗ ∇ ⊗ g, where ∇ is a type
of derivative operator and ⊗ is a type of multiplication for tensors.
18 Here r1 and r2 are the position vectors describing the location of the masses m1 and m2
with respect to some origin.
19 Note that most physicists do not understand differential geometry, hence when they speak
of tensors they usually are talking about components of tensors. This won’t matter here, but for
reference, if you ever want to compare: covariant tensors have two extra factors of length compared
to their components and contravariant tensors have two factors less than their components
– which basically means adding ±2L to the dimensions.
20 Don’t ever show this to a differential geometer. If you want the real definition, see me.

• The Ricci tensor, Ric, is given by taking the trace of the Riemann tensor:
Ric = Trace(Riem). It describes how gravity distorts volumes and is
also related to how different geometries evolve under the heat equation.
• The Ricci Scalar R – this quantity is a function which measures how gravity
locally distorts volumes. Einstein’s theory can be derived by saying
that nature minimizes this quantity – an approach due to a mathematician
named David Hilbert 21. It is given by taking the trace of the Riemann
tensor twice: R = Trace(Trace(Riem)) = Trace(Ric).
QII: Using the above information, derive the dimensions of Newton’s gravitational
constant GN again, this time using Einstein’s law of gravity:
$$\mathrm{Ric} - \frac{1}{2} R g = \frac{8\pi G_N}{c^4} T. \tag{2.46}$$
You will need the following facts: the derivative operator ∇ reduces the length
dimension of a tensor by one factor, whereas the tensor product ⊗ raises it by one
factor (in this case). Hence [Riem] = 2[∇] + 2[⊗] + [g] = −2L + 2L + 2L = 2L.
Furthermore, the trace of a (covariant) tensor reduces its length dimension by
two factors, hence for example: [Trace(Riem)] = [Riem] − 2L.
Tip: To ease calculations, you may use so-called ‘natural units’ where the speed
of light c = 1. In these units length and time have the same dimensionality, hence
[c] = [Distance] − [Time] = 0 and T = L. You will then get the dimensions of
GN in natural units which you can compare to your value of GN using Newton’s
Law, after you set T = L.
Finally, we are in a position to understand a very special dimensionless con-
stant – the ‘gravitational coupling constant’, αG. Since it is dimensionless, this
constant characterises the strength of the gravitational force on all length scales
(within the regime of validity of Einstein’s theory). It can be defined in terms of
any pair of stable elementary particles – in practice, we use the electron.
In particular, we have:
$$\alpha_G = \frac{G_N m_e^2}{\hbar c} \approx 1.7518 \times 10^{-45} \tag{2.47}$$
where c is the speed of light, GN is Newton’s gravitational constant and me is
the mass of an electron. The quantity ħ = h/2π is the reduced Planck constant
which characterises the scale at which matter exhibits quantum behaviour such
as wave-particle duality 22.
QIII: Show that the gravitational coupling constant αG is indeed dimensionless.
Note that [me] = M. To work out the dimensions of ħ = h/2π, you will need the
Planck-Einstein relation which relates the energy of a photon (particle of light)
to its frequency:
$$E = hf. \tag{2.48}$$
Then [h] = [E] − [f]. Since the frequency of light is the number of oscillations
of the electromagnetic wave per unit time, we have [f] = −T. You can get the
dimensions [E] of energy E from the calculation shown above for the energy-momentum
tensor.
21 In retrospect, David Hilbert deserves almost the same level of credit as Einstein for the
theory of general relativity.
22 If ħ was really large – say ħ ≈ 1 in SI units, for example – then we would observe wave-particle duality
on a macroscopic scale and the universe would be a scary, crazy place. Bullets would diffract
through doorways and Leanora’s fists could quantum tunnel through walls.

Now, for the last part of this problem, we introduce one more fundamental physical
unit: the unit of electric charge, Q 23. Similar to the gravitational coupling
constant, there is a dimensionless constant which characterises the strength of the
electromagnetic interaction (which is responsible for almost all of chemistry) –
the ‘fine structure constant’ αEM. The value of this constant is (accurately) predicted
and measured using the theory of Quantum Electrodynamics, which is
a type of quantum field theory largely due to Richard Feynman and Freeman
Dyson. It is given by
$$\alpha_{EM} = \frac{1}{4\pi\varepsilon_0} \frac{e^2}{\hbar c} \tag{2.49}$$
where ε0 is the electric permittivity of the vacuum. It has units [ε0] = [Farads/Meter] =
[Seconds⁴ Amps² Meters⁻³ kg⁻¹] = 4T + 2Q − 2T − 3L − M, using [Amps] = Q − T. Hence
[ε0] = 2T + 2Q − 3L − M. The parameter e is the charge of an electron,
with dimensions [e] = Q.
Using ‘natural units’ – a popular convention in particle physics, we set all of our
previous parameters to equal 1. Thus, 4πGN = c = ħ = ε0 = 1, where ε0 is the
electric permittivity of the vacuum. In these units, the fine-structure constant is
given by
$$\alpha_{EM} = \frac{e^2}{4\pi} \approx 7.297 \times 10^{-3}. \tag{2.50}$$
QIV: Choosing natural units: 4πGN = c = ħ = ε0 = 1, is the same as forcing
these parameters to be dimensionless. Show that this is equivalent to setting all
the fundamental units to be the same T = L = M = Q. Hint: you should get
four equations for the dimensions of these parameters.
Note that you can calculate the values of the fine-structure and gravitational cou-
pling constants yourself by Googling their values in SI units (or any other con-
sistent set of units you choose). Taking their ratio, we see that (in natural units):
$$\frac{\alpha_{EM}}{\alpha_G} = \left(\frac{e}{m_e}\right)^2 \approx \frac{7.297 \times 10^{-3}}{1.752 \times 10^{-45}} \approx 4.16 \times 10^{42}. \tag{2.51}$$
This says that the electromagnetic force is about 42 orders of magnitude 24 stronger
than the gravitational force. In a similar fashion, the weak-nuclear force is about
32 orders of magnitude ($10^{32}$ times) stronger than gravity. The challenge to
explain why gravity is so weak compared to the other forces is known as ‘the
hierarchy problem’.
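The numbers quoted in equations (2.47), (2.50) and (2.51) can be reproduced from SI values, as suggested above. The short Python sketch below does this. The numerical constants are an assumption of the sketch: they are the CODATA 2018 recommended values as best recalled here, not values taken from these notes, so check them against a current table before relying on them.

```python
import math

# SI values (assumed: CODATA 2018 recommended values, not quoted in the notes).
G_N = 6.67430e-11         # Newton's constant, m^3 kg^-1 s^-2
hbar = 1.054571817e-34    # reduced Planck constant, J s
c = 299792458.0           # speed of light, m/s
m_e = 9.1093837015e-31    # electron mass, kg
e = 1.602176634e-19       # elementary charge, C
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m

# (2.47): gravitational coupling constant, defined with the electron mass
alpha_G = G_N * m_e**2 / (hbar * c)

# (2.49): fine structure constant
alpha_EM = e**2 / (4 * math.pi * eps0 * hbar * c)

# (2.51): ratio of the two dimensionless couplings
ratio = alpha_EM / alpha_G

print(f"alpha_G  = {alpha_G:.4e}")
print(f"alpha_EM = {alpha_EM:.4e}")
print(f"ratio    = {ratio:.3e}")
```

Because both constants are dimensionless, the same ratio comes out in any consistent system of units. Only the individual inputs change.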
One class of attempts to solve the heirarchy problem, involves the visible uni-
verse being confined to a 4-dimensional ‘brane’, which is basically a 4-dimensional
slice living in a larger spacetime. Such models are called ‘braneworld models’.
In this view, the electromagnetic, weak and strong nuclear forces take place on
the 4-dimensional brane – but gravitational interactions (mediated by ‘graviton’
particles) take place in 4-dimensions and in the ‘large extra dimensions’. This
then gives a natural explanation for the gravitational coupling constant being so
small. In some variations [25], the introduction of large extra-dimensions also
solves the ‘Dark Energy’ or ‘Cosmological Constant’ problem – where Dark En-
ergy naturally arises as the ‘surface tension’ of the 4-dimensional brane. Using
braneworld models, we can derive (!) Newton’s gravitational constant directly
from the size (‘hyper-volume’) of the extra dimensions in our universe.
[23] The SI unit for charge is the Coulomb.
[24] Note, 42 is also the meaning of life.
[25] Those investigated in the present author’s masters thesis.

A very special class of braneworld models, known as theories with
‘Supersymmetric Large Extra Dimensions’, envisions spacetime as 6-dimensional
(4-dimensional brane + 2 large extra dimensions) with some supersymmetry
added – this enables bosons and fermions to transform into each other [26]. In
these models, the extra dimensions take the form of some compact hypersurface.
Newton’s gravitational constant $G_N$ is then derived from the relation [27]:
\[ G_N = \frac{3\kappa^2}{16\pi S} \qquad (2.52) \]
where S is the surface area of the extra dimensions and κ is Einstein’s constant,
with dimensions $[\kappa] = [G_N]$.
QV: The above formula for $G_N$ is correct, even though it may look dimensionally
incorrect. What units would S need to have for dimensional consistency? In that
case, what quantity does the surface-area S actually represent? Hint: Recall the
‘unit vector’ in Newton’s law of gravity.
The last problem illustrates a common theme in engineering, physics and math-
ematics – normalization. Normalized quantities are typically dimensionless! As
such, they are very useful and friendly to work with.
[26] Supersymmetry removes the problem of Tachyons in String Theory and also stabilizes the
mass of the Higgs boson.
[27] First derived in this generality by the present author in 2013.

Chapter 3
Geometry of Antiquity and The
Universe
3.1 Introduction: Conic Sections
Our scientific perception of the world today is due largely to the great geometers
of antiquity. Pythagoras’ theorem, for example, essentially defines the ‘straight-line’
(Euclidean) distance between two points in space – giving us Euclidean
preconceptions of the world. In this manner, one of the most influential
developments that the Greeks left us with is the theory of conic sections. Developed
to a large extent by Apollonius and Archimedes, conic sections have provided a
core staple of the framework for the scientific renaissance instigated by Galileo
and Kepler – leading ultimately to Newton’s theory of gravity, the planetary orbits
and a heliocentric view of the universe.
Definition 5 A traditional conic section is the curve of intersection obtained by
slicing a cone with a plane. Geometrically, a general conic C is a set of points
whose distances to a fixed point (focus) F and a fixed line (directrix) l are in
a constant ratio (the eccentricity) $\epsilon$. Algebraically:
\[ p \in C \iff \frac{d(p, F)}{d(p, l)} = \epsilon, \qquad (3.1) \]
where d is any metric (measure of distance).
Note that in the special case of a (Euclidean) circle, the focus is at the center
of the circle and the directrix is at infinity – hence the eccentricity $\epsilon = 0$ for a
circle.
The following problem should be re-attempted at the end of each (conic) section
of this chapter.
Problem 7 (GoPro or Go Home) Frustrated by his attempts to retake Constantinople
from the neo-Ottoman empire, the cyborg Emperor Constantine decides to go
home. Getting into his skytaxi, which travels on fixed skylanes which permit only
perpendicular turns, Constantine realises that he is travelling in an $\ell_1$ metric space
– the ‘taxicab geometry’. Here the distance between two points $P_1 = (x_1, y_1)$
and $P_2 = (x_2, y_2)$ in $\mathbb{R}^2$ is defined by the so-called taxi-cab metric:
\[ d_1(P_1, P_2) := |x_1 - x_2| + |y_1 - y_2|. \qquad (3.2) \]
1. Using the geometric definition of a parabola, sketch the graph of a few
parabolas with different focal lengths in the taxi-cab metric.
2. Using the geometric definition of a circle, sketch the graph of a unit circle
in the taxi-cab metric.
3. Using the geometric definition of an ellipse, sketch the graph of a few
ellipses – with varying eccentricity, in the taxi-cab metric.
4. Using the geometric definition of a hyperbola, sketch the graph of a few
hyperbolae in the taxi-cab metric.
5. Compare the above ‘taxi-cab’ conic sections to graphs of the correspond-
ing Euclidean conic sections.
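Before sketching, it can help to evaluate the focus-directrix ratio of Definition 5 for a few trial points under each metric. The following is a minimal sketch, not part of the original problem: the function names are hypothetical, and the directrix is assumed to be a horizontal line, for which the nearest point of the line lies directly above or below p in both metrics.

```python
import math

def d_taxicab(p, q):
    """Taxi-cab (l1) distance between two points of the plane, equation (3.2)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_euclid(p, q):
    """Ordinary Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def conic_ratio(p, focus, directrix_y, metric):
    """Ratio d(p, F) / d(p, l) for the horizontal directrix y = directrix_y.

    Assumes p does not lie on the directrix. For a horizontal line the closest
    point is the foot (p_x, directrix_y) in both metrics used here.
    """
    foot = (p[0], directrix_y)
    return metric(p, focus) / metric(p, foot)

# Trial point on the Euclidean parabola with focus (0, 1) and directrix y = -1.
p = (1.0, 0.25)
print(conic_ratio(p, (0.0, 1.0), -1.0, d_euclid))
print(conic_ratio(p, (0.0, 1.0), -1.0, d_taxicab))
```

A point belongs to the conic of eccentricity $\epsilon$ exactly when this ratio equals $\epsilon$, so scanning a grid of points with either metric gives a quick way to check a hand-drawn sketch.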
3.2 Parabolas and Geometric Optics
3.2.1 Overview
The first, most significant application of parabolas was in Galileo’s revolutionary
projectile motion experiments. Subsequently, they served as ‘Victoria’s secret
models’ for Isaac Newton’s ‘Principia Mathematica’ – in particular, in his analysis
of conic sections and Kepler’s laws of planetary motion.
We shall begin by giving a general geometric definition of the parabola,
then deriving the canonical (natural) equation for a Euclidean parabola in cartesian
coordinates. Once this is established, we will investigate and derive a few
remarkable properties of parabolas – in particular, motivated by the ‘science of
light’ (optics). Finally, we will study some fun, practical applications of parabolas
in regards to the natural world – geometric optics, projectile motion and
parabolic orbits.
3.2.2 The Parabola
Definition 6 (Geometric Definition) A parabola is the set of points which are
equidistant from a focus (fixed point) and directrix (fixed line).
It follows from the definition that a parabola is a conic section with eccentricity
$\epsilon = 1$. In particular, it can be obtained by slicing a cone parallel to a plane
tangent to the cone.
See Whiteboard Diagram
By now, most of you will be familiar with algebraic forms of the parabola. For
example, as a rational normal curve with exponent 2 in algebraic geometry, or as the
cartesian equation $y = x^2$ from Euclidean geometry. You will now derive the
canonical Euclidean equation $y = x^2$ from the geometric definition.
Exercise 6 (Pachelbel’s Parabola and The Canon Equation) Having been ambushed
by the ineffable weeping angels, Dr. Who is forced back to the Renaissance
Era. In a severe misunderstanding, he accidentally replaces Pachelbel’s
Canon with a derivation of the canonical Euclidean parabola equation – which he
needs to return to his own timeline. Trying to make sense of the parabola,
Pachelbel decides to invent the Cartesian coordinate system so he can graph
this technology of the ‘future’.
Help Pachelbel by deriving the canonical parabola equation, while sketching
every step clearly.

1. Draw horizontal (x) and vertical (y) coordinate axes. On the vertical axis
– the symmetry axis of the parabola, mark the origin, (0, 0) – this is the
vertex of the parabola. Upwards from the origin, on the symmetry axis,
mark the point F = (0, f) – this is the ‘focal point’ (focus) of the parabola
and f is the ‘focal length’. Below the x-axis, draw the line y = −f – this
is the ‘directrix’ of the parabola.
Note that the value y = −f for the directrix can be derived from the
definition of the parabola, having chosen the origin O = (0, 0) to lie on the
parabola. In particular, the length OF is equal to the distance from O to
a point perpendicularly below O on the directrix.
2. Pick any point P = (x, y) on the parabola – preferably one in the positive (x, y)
quadrant. Now draw a line FP between this point and the focus F. Draw
another line PD from P to a point D perpendicularly below on the directrix.
By the definition of the parabola, the lines FP and PD should have
equal length. Therefore, using Pythagoras’ theorem to compute the length
of FP, derive the following relation:
\[ (y + f)^2 = x^2 + (y - f)^2. \qquad (3.3) \]
3. Using the above equation, show that:
\[ y = \frac{x^2}{4f}. \qquad (3.4) \]
This is the cartesian equation for a Euclidean parabola with focal length
f, axis of symmetry along the y-axis and vertex (0, 0). Setting $f = \frac{1}{4}$, we
get the ‘canonical parabola equation’:
\[ y = x^2. \qquad (3.5) \]
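Once the derivation is done by hand, the result can be sanity-checked numerically. The following is a minimal sketch (the function name is hypothetical, and f > 0 is assumed): it samples points on $y = x^2/(4f)$ and measures how far they are from being equidistant from the focus (0, f) and the directrix y = −f.

```python
import math

def focus_directrix_gap(x, f):
    """|PF| - |PD| for P = (x, x^2/(4f)), focus F = (0, f), directrix y = -f.

    Assumes f > 0, so that y >= 0 and the distance to the directrix is y + f.
    """
    y = x * x / (4 * f)
    dist_focus = math.hypot(x, y - f)   # Pythagoras' theorem for the length FP
    dist_directrix = y + f              # vertical drop from P to the directrix
    return dist_focus - dist_directrix

for f in (0.25, 1.0, 3.0):
    worst = max(abs(focus_directrix_gap(k / 10, f)) for k in range(-50, 51))
    print(f"f = {f}: largest |PF - PD| on the sample = {worst:.2e}")
```

The gaps should vanish up to floating-point rounding, which is the numerical counterpart of equation (3.3).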
Problem 8 (The Doctor’s Cannon) Having finished his derivation, Pachelbel
returns to Dr. Who to verify his mathematical construction. At this point, Dr.
Who has added skrillex to Pachelbel’s cannon. Furious, Pachelbel demands that
Dr. Who remove all new additions to the cannon. Reluctantly, Dr. Who decides
that he shall acquiesce provided that Pachelbel removes all redundant steps from
his mathematical derivation and justifies its generality.
Help save history from skrillex by helping Pachelbel in his derivation. In particular,
some steps in the above derivation provided superfluous, ‘a-priori’ information.
Can you identify which ones?
Furthermore, we chose the origin to be the vertex and the y-axis to be the
symmetry axis – this made calculations easier. What ‘obvious’ properties of Euclidean
space allow us to do this, without losing any generality in our derivation?
Problem 9 (Constantine’s Plasma Cannon) On his way home, cyborg Emperor
Constantine’s taxicab is ambushed by the weeping angels who are hunting Dr.
Who throughout spacetime. As a result, the emperor is teleported back to Pachel-
bel’s study in the Renaissance era. Seeing this opportunity, Dr. Who and Pachel-
bel beg for the emperor’s help – in particular, his plasma cannon should give the
angels something real to weep about. To this end, Constantine decides he will
help Dr. Who and save the universe from skrillex music ... iff Dr. Who helps him
to sketch and derive the parabola equation in the taxi-cab metric.
Help Constantine help Pachelbel help Dr. Who, by writing down the cartesian
equation for a parabola of focal length f with distances defined by the taxi-cab
metric $d_1$ instead of the Euclidean metric.

Now sketch this parabola.
3.2.3 Scale Invariance and Transcendality
Recall that in earlier exploration sessions we studied several physical systems
and laws of the universe which exhibited very special constants – in particular,
‘dimensionless constants’ which characterised such laws or systems on all
length scales. The Reynolds number for fluids and the fine-structure constant for
quantum electrodynamics were two such constants. Now we present a mathematical
constant which characterises the ‘shape’ of all parabolas in a universal,
scale-invariant way. Since this constant is dimensionless, it is invariant under
conformal transformations [1].
First, we must define the ‘Latus rectum’ of the parabola. In particular, the latus
rectum of a parabola is the chord perpendicular to the symmetry axis (i.e. parallel
to the directrix) which passes through the focus F and intersects the parabola on
each side of the symmetry axis.
Exercise 7 (Parabolic Proctology) Using the geometric definition of a parabola,
prove that the latus rectum has a length of 4f, where f is the focal length of the
parabola.
Hint: For a parabola of the form $y = \frac{x^2}{4f}$, note the y-coordinate of the points at
which the latus rectum intersects the parabola.
Hint: Since you’re using the geometric definition of a parabola, you will have to
make use of the directrix – which is conveniently located at y = −f if you chose
the above parabola.
Definition 7 (Universal Parabolic Constant) The universal parabolic constant
P is defined as the ratio (for any parabola) of the arc length S of the parabolic
segment formed by the latus rectum to the focal parameter 2f (half the latus rectum
length):
\[ P = \frac{S}{2f}. \qquad (3.6) \]
Exercise 8 (Who would like to write a Fugue?) Whilst waiting for the cyborg
emperor to take care of the angels, Dr. Who picks up a renaissance ancestor of the
guitar and plays ‘While my guitar gently weeps’. Unsatisfied, he decides to
write a Fugue. Fugues, interpreted in the right sense, possess (almost) conformal
symmetry. One particular conformal symmetry is the ‘dilation/contraction’
operation – which shrinks or expands vectors (and hence objects).
If we let f have units of length, use dimensional analysis to prove that the universal
parabolic constant is dimensionless.
Now, for a more serious derivation, we shall calculate the exact value of P and
prove a remarkable number-theoretic property – that it is transcendental.
Problem 10 (Transcendence (Hard)) 1. Simplifying the problem: Because
of translational and rotational symmetry, it suffices to consider a parabola
of the following form: $y = \frac{x^2}{4f}$, with the y-axis as its symmetry axis and
origin (0, 0) as the vertex.
[1] Roughly, transformations that preserve relative angles but not lengths.

2. Calculating parabolic arc-length To calculate the arc-length of the parabola
cut off by the latus rectum, we express the parabola as a parametric curve
γ with curve parameter x:
\[ \gamma(x) = \left(x, \frac{x^2}{4f}\right), \qquad (3.7) \]
hence γ maps the parameter x to the corresponding point $(x, y) = \left(x, \frac{x^2}{4f}\right)$
on the parabola.
Since the tangent vector to this curve represents infinitesimal rates of change
along the curve (with respect to parameter x), it is given by the velocity
vector:
\[ \frac{d}{dx}\gamma(x) = \frac{d}{dx}\left(x, \frac{x^2}{4f}\right) = \left(1, \frac{x}{2f}\right). \qquad (3.8) \]
In particular, an infinitesimal length element along the curve is represented
by the vector (differential 1-form):
\[ d\gamma = \left(1, \frac{x}{2f}\right) dx, \qquad (3.9) \]
which has magnitude:
\[ ds = \left\| \left(1, \frac{x}{2f}\right) \right\| dx = \sqrt{1 + \frac{x^2}{4f^2}}\, dx. \qquad (3.10) \]
Hence, if we integrate this length element from x = −2f to x = +2f (the
end points of the latus rectum), we get the parabolic arc length we desire:
\[ S = \int_{-2f}^{2f} \sqrt{1 + \frac{x^2}{4f^2}}\, dx. \qquad (3.11) \]
Now, the universal parabolic constant was defined to be $P = \frac{S}{2f}$, hence:
\[ P = \frac{1}{2f} \int_{-2f}^{2f} \sqrt{1 + \frac{x^2}{4f^2}}\, dx. \qquad (3.12) \]
3. Integration Step I Use a change of variables to prove that we can simplify
the arc-length integral to the following canonical form:
\[ P = \int_{-1}^{1} \sqrt{1 + t^2}\, dt. \qquad (3.13) \]
This form is ‘canonical’ in the sense that the focal length f doesn’t appear
anywhere in the integral.
4. Integration Step II Use trigonometric substitution (or otherwise) to show
that:
\[ P = \operatorname{arcsinh}(1) + \sqrt{2}. \qquad (3.14) \]
Hint: Recall the hyperbolic trigonometric identities:
\[ \cosh^2(\theta) - \sinh^2(\theta) = 1 \implies 1 + \sinh^2(\theta) = \cosh^2(\theta), \qquad (3.15) \]
\[ \cosh(2\theta) = \cosh^2(\theta) + \sinh^2(\theta) = 2\cosh^2(\theta) - 1. \qquad (3.16) \]

5. Algebraic Simplification Using the definition of hyperbolic sine:
\[ \sinh(\theta) = \frac{e^{\theta} - e^{-\theta}}{2}, \qquad (3.17) \]
along with the quadratic formula:
\[ az^2 + bz + c = 0 \iff z = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \qquad (3.18) \]
prove that
\[ \operatorname{arcsinh}(1) = \ln(1 + \sqrt{2}). \qquad (3.19) \]
Hint: let $z = e^{\theta}$, then solve sinh(θ) = 1 for θ using the exponential
expression for sinh given above.
6. Transcendality Recall that a real number α is transcendental if it is not the
root of any non-zero polynomial with rational coefficients. Real numbers
which are roots of such polynomials are ‘algebraic’
numbers. Hence if a number is transcendental it cannot be algebraic and
vice-versa. It follows that the sum of a transcendental number and an
algebraic number is necessarily transcendental.
To see that the universal parabolic constant $P = \ln(1 + \sqrt{2}) + \sqrt{2}$ is
transcendental, it suffices to prove that $\ln(1 + \sqrt{2})$ is transcendental. This
is because $\sqrt{2}$ is irrational, but not transcendental: in particular, we can
form a quadratic equation with rational coefficients, $x^2 - 2 = 0$, of which
$\sqrt{2}$ is a root.
To see that $\ln(1 + \sqrt{2})$ is transcendental, we do a proof by contradiction.
In particular, the Lindemann–Weierstrass theorem implies that if λ is algebraic
(not transcendental) and non-zero, then $e^{\lambda}$ is necessarily transcendental. Hence, if
$\ln(1 + \sqrt{2})$ were algebraic, $e^{\ln(1+\sqrt{2})} = 1 + \sqrt{2}$ would be transcendental –
however, it is clearly not, since it is a root of a quadratic with rational
coefficients. Therefore, $\ln(1 + \sqrt{2})$ is transcendental and hence the universal
parabolic constant:
\[ P = \ln(1 + \sqrt{2}) + \sqrt{2} \approx 2.295587, \qquad (3.20) \]
is a transcendental number.
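The closed form can be cross-checked by integrating the arc length (3.11) numerically for several focal lengths. The following is a minimal sketch using composite Simpson's rule; the function name and the number of sub-intervals are choices made for this illustration only.

```python
import math

def arc_length_latus_rectum(f, n=2000):
    """Composite Simpson's rule for S = int_{-2f}^{2f} sqrt(1 + x^2/(4 f^2)) dx.

    n is the number of sub-intervals and must be even.
    """
    a, b = -2 * f, 2 * f
    h = (b - a) / n

    def integrand(x):
        return math.sqrt(1 + x * x / (4 * f * f))

    total = integrand(a) + integrand(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2   # Simpson weights 4, 2, 4, ..., 4
        total += weight * integrand(a + i * h)
    return total * h / 3

closed_form = math.log(1 + math.sqrt(2)) + math.sqrt(2)

for f in (0.25, 1.0, 7.5):
    P = arc_length_latus_rectum(f) / (2 * f)
    print(f"f = {f}: P = {P:.9f}, closed form = {closed_form:.9f}")
```

The ratio S/(2f) should come out the same for every focal length, which illustrates the scale invariance claimed for P.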
Problem 11 (Tying loose ends) Prove the assertion that $1 + \sqrt{2}$ is an algebraic
number. In particular, find a polynomial with rational coefficients such that one
of its roots is equal to $1 + \sqrt{2}$.
Hint: Recalling elementary polynomial theory, roots of the form $\alpha + \sqrt{\beta}$ –
where α, β are integers – come in pairs: $\lambda = \alpha \pm \sqrt{\beta}$. Therefore, you should be
looking for a quadratic.
Problem 12 (Pi Day) If you didn’t celebrate Pi day, use the Lindemann–Weierstrass
theorem to prove that π is transcendental. In particular, recall Euler’s formula:
\[ e^{i\pi} + 1 = 0. \qquad (3.21) \]
Hint: Try assuming that iπ is algebraic.
Look at what we have achieved so far – we have proved that parabolas are characterised
by a transcendental dimensionless constant. Although almost all real numbers are
transcendental, numbers which have actually been proven transcendental are
rare – e and π being the most famous examples.

3.2.4 Symmetries and Canonical Form
The natural symmetries of Euclidean space are symmetries which preserve the
Euclidean metric – that is, transformations of $\mathbb{R}^n$ which leave lengths and relative
angles (i.e. angles between vectors) unchanged. In elementary terms, these
are symmetries which leave the ‘dot-product’ unchanged. Because of this,
parabolas are invariant under rotations and translations – their governing equations
in a given coordinate system might change, but the parabola itself will be
unaffected. For example, translations simply correspond to a shift in the focus
F of the parabola, whilst rotations correspond to a rotation of the directrix D of
a parabola. Therefore, we can define a parabola more abstractly in the following
way.
Definition 8 (Ogburn’s Definition) Given a metric space (M, d) with set M
and metric d, a parabola is the ordered pair (F, D) where F ∈ M and D is a
straight line in (M, d), satisfying the following properties:
1. Focal Parameter: The minimum distance between F and D is 2f.
2. Parabolic Property: When (F, D) acts on any subset S of M, the result is
the collection of points U ⊂ S which is equidistant from F and D:
d(U, D) = d(U, F). (3.22)
In this manner, it becomes clear that if d is induced by an inner product – such as the
Euclidean metric (dot-product) – then a parabola (F, D) will be preserved by
isometries (rotations and translations for Euclidean space), since they preserve
the ‘parabolic property’ and ‘focal parameter’.
In Euclidean space, we can take any parabola and apply a sequence of transformations
to it so that it becomes a canonical parabola $y = \frac{x^2}{4f}$. In particular, we
will need at most 2 translations to move the focus to F = (0, f), followed by at
most 1 rotation to rotate the directrix to coincide with the line y = −f. Proving
this for parabolas which have only been translated and/or rotated by multiples of
90 degrees is relatively simple – which we shall do now.
Problem 13 (Transformations and Canonical Form) 1. Translations Given
a parabola of the form:
\[ a y^2 + b x^2 + c y + d x + e = 0, \qquad (3.23) \]
where a, b, c, d, e are real constants and exactly one of a or b is zero, complete the
square to get a parabola of the form:
\[ (y - y_0) = \frac{(x - x_0)^2}{4f}, \quad \text{or} \quad (x - x_0) = \frac{(y - y_0)^2}{4f}. \qquad (3.24) \]
In particular, find the vertices $(x_0, y_0)$ and focal lengths f for these parabolas
in terms of a, b, c, d and e.
2. Vertices Using the previous equations:
\[ (y - y_0) = \frac{(x - x_0)^2}{4f}, \quad \text{or} \quad (x - x_0) = \frac{(y - y_0)^2}{4f}, \qquad (3.25) \]
prove that $(x_0, y_0)$ is indeed the vertex of each of these parabolas.
Hint: It suffices to show that $(x_0, y_0)$ is a minimum or maximum critical
point (turning point) of each of the curves. Use calculus.
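A hand calculation for the case a = 0 can be checked against a few lines of code. The following is a minimal sketch, not the author's solution: the function name is hypothetical, it covers only the case a = 0 with b and c non-zero, and it returns a signed focal length (negative f meaning that the parabola opens downwards).

```python
def vertical_parabola_canonical(b, c, d, e):
    """Vertex (x0, y0) and signed focal length f of b*x^2 + c*y + d*x + e = 0.

    Only the case a = 0 of equation (3.23) is handled; b and c must be non-zero.
    The result satisfies (y - y0) = (x - x0)^2 / (4 f).
    """
    if b == 0 or c == 0:
        raise ValueError("need b != 0 and c != 0 for a parabola with vertical axis")
    x0 = -d / (2 * b)                 # completing the square in x
    y0 = (d * d / (4 * b) - e) / c    # constant left over after completing the square
    f = -c / (4 * b)                  # from 1/(4f) = -b/c
    return x0, y0, f

# Example: x^2 - 2x - 4y + 5 = 0, i.e. (y - 1) = (x - 1)^2 / 4.
print(vertical_parabola_canonical(1, -4, -2, 5))
```

The case b = 0 follows by exchanging the roles of x and y.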

Problem 14 (DIY) For parabolas which have been rotated through some arbitrary
angle θ, we note that parabolas can be put into 1-1 correspondence with
quadratic forms. Using the quadratic form corresponding to a given parabola,
we can then apply change of basis transformations (rotation matrices) to rotate
the parabola back into the standard orientation, with the y-axis as the symmetry
axis. Investigate this when you get the chance!
3.2.5 Optical Properties and Spherical Aberration
Due to their reflective properties, parabolas act as the ideal shape for many mirrors
and lenses. In reality, parabolic lenses are difficult to construct, so ‘spherical
lenses’ are used instead. To this extent, one takes the radius of curvature of such
a lens to be large relative to the length of the lens – then one can approximate the
portion of the circle traced out by the lens as a parabola. Such an approximation is
the basis for a large amount of classical optics – for example, lens making.
Perhaps the most ‘physically’ important mathematical property of the parabola,
is its ‘parabolic reflection property’. To this extent, in the following, we shall
treat parabolas as ‘reflective surfaces’ and take it for granted that light travels in
straight lines (geodesics to be precise). Furthermore, we shall assume the law of
reflection: that is, that the angle between the normal to a surface and the incident
light ray is equal to the angle between the reflected light ray and the normal to
the surface. Mathematically:
\[ \theta_{\text{incidence}} = \theta_{\text{reflection}}. \qquad (3.26) \]
For the purpose of reflection, we look at the tangent plane to a surface at a point
– the point where the incident light ray strikes the surface. This allows us to
apply the law of reflection to arbitrary differentiable surfaces.
Theorem 3 (Parabolic Reflection) Light rays incident on a reflective parabola,
parallel to the axis of symmetry are reflected back through the focus. Conversely,
light rays incident on the parabola which travel through the focus, are reflected
from the parabola along a line parallel to the symmetry axis.
Problem 15 (Reflective Moments) To prove this theorem we must do the fol-
lowing:
1. Simplify Since parabolas are characterised by a universal constant, it suffices
to prove the reflection property for a simple parabola of the form
$y = x^2$ – i.e. $f = \frac{1}{4}$.
2. Diagrams Draw the focus F, vertex O and point $P = (x_0, y_0)$ on the
parabola which the light ray hits. Now draw a line PD from P to the
point D perpendicularly below P, lying on the directrix. Draw the line
FP – this has the same length as PD, via the geometric definition of a
parabola.
3. Bisector = Tangent Draw a point M as the mid-point of the line connecting
F and D. Then, using the law of reflection and some congruent
triangles, you should be able to show that MP bisects the angle FPD –
in particular, MP is perpendicular to FD. Now locate the x coordinate
of the point M – you should be able to prove (again using the geometric
definition of the parabola) that $x = \frac{1}{2}x_0$ – i.e. halfway between the
x coordinates of O and D.
Now use calculus to calculate the slope of the tangent to the parabola at
the point of light intersection, P. Prove that the slope of the bisector MP
is equal to the slope of the tangent at P – hence identifying the bisector as
the tangent to the parabola at P.
4. Fin At this point, the theorem has been proved. Do the necessary trigonometry
and ray diagrams to see why this is so (unless it’s already obvious
to you). If you’re still stuck, ask your tutor to draw the diagrams for you!
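The theorem can also be tested numerically before (or after) proving it. The following is a minimal sketch under the stated law of reflection: a ray travelling straight down, parallel to the symmetry axis, is reflected about the normal at P and followed until it crosses the y-axis. The function name and the vector form r = d − 2(d·n)n of the reflection law are choices made for this illustration.

```python
import math

def reflected_ray_axis_crossing(x0, f=0.25):
    """Height at which a vertical ray reflected at P = (x0, x0^2/(4f)) meets the y-axis.

    Assumes x0 != 0 (a ray along the axis itself is reflected straight back).
    """
    y0 = x0 * x0 / (4 * f)
    slope = x0 / (2 * f)                      # dy/dx of y = x^2/(4f) at P
    norm = math.hypot(slope, 1.0)
    nx, ny = -slope / norm, 1.0 / norm        # unit normal to the parabola at P
    dx, dy = 0.0, -1.0                        # incoming ray, parallel to the axis
    dot = dx * nx + dy * ny
    rx, ry = dx - 2 * dot * nx, dy - 2 * dot * ny   # law of reflection
    t = -x0 / rx                              # parameter at which the ray reaches x = 0
    return y0 + t * ry

for x0 in (0.1, 0.5, 2.0, -3.0):
    print(x0, reflected_ray_axis_crossing(x0))
```

For every starting point the crossing height should coincide with the focal length f, independent of x0.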
So far, we have demonstrated that (reflective) parabolas have the unique prop-
erty of reflecting light rays which are parallel to their symmetry axis, through the
focus of the parabola and vice-versa. Therefore, for many practical applications
– where a single focal point is required, parabolic lenses are the ideal lens. In
reality however, it is hard to make perfectly parabolic lenses so spherical or ‘cir-
cular’ lenses are used instead. The idealised performance of such lenses depends
on the ratio of the tangential length L of the lens to the radius of curvature R of
the lens. In particular, for the lens to ‘behave like a parabola’, its length L must be
much smaller than the radius of curvature and hence the focal length f (noting
that R = 2f). The deviation or ‘error’ arising from this parabolic approxima-
tion is the essence of ‘spherical aberration’ – that is, the blurring and loss of
resolution of images formed by the lens.
Problem 16 (Parabolic Approximation and Spherical Aberration in Lenses)
To quantify the previous statements, we shall now investigate spherical aberration
mathematically. Let $y_p$ define a segment of a parabola – i.e. an ideal lens,
with focal length f and tangential length L. Now let $y_s$ define the lower segment
of a semi-circle whose center lies a distance R = 2f directly above the vertex of
the parabola – this represents a circular lens. Therefore, we have
\[ \begin{aligned} y_p &= \frac{x^2}{4f}, \\ y_s &= R - R\sqrt{1 - \left(\frac{x}{R}\right)^2}, \qquad -L \le x \le L, \\ \Delta y &:= y_s - y_p, \end{aligned} \qquad (3.27) \]
where Δy is the difference between the y coordinate of the lower semi-circle,
$y_s$, and the parabola, $y_p$. Approximating a circular lens – a lower semi-circle –
by a parabola whose vertex (0, 0) coincides with the bottom of the semi-circle and
whose focal length $f = \frac{1}{2}R$ is half the radius of curvature R of the circle, we get an
error Δy which grows the further away we are from the vertex of the parabola.
1. Taylor Expanding the Semi-Circle Using a Taylor expansion about zero,
in the variable $z := \frac{x}{R}$, show that we can write the semi-circle equation as:
\[ y_s = -R \sum_{k=1}^{\infty} (-1)^k \binom{1/2}{k} \left(\frac{x}{R}\right)^{2k} = R\left[ \frac{1}{2}\left(\frac{x}{R}\right)^2 + \frac{1}{8}\left(\frac{x}{R}\right)^4 + \frac{1}{16}\left(\frac{x}{R}\right)^6 + \dots \right]. \qquad (3.28) \]
Hint: You can use the binomial theorem instead. In particular, this says that
for any real constant α and variable z with |z| < 1:
\[ (1 + z)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} z^n, \qquad (3.29) \]
where the binomial coefficients are defined by:
\[ \binom{\alpha}{n} = \frac{\alpha!}{n!\,(\alpha - n)!}. \qquad (3.30) \]
When α is non-integer, the binomial coefficients are generalized by the
‘Gamma function’ Γ – or equivalently, for real-valued α, the ‘Pochhammer
symbol’ $(\alpha)_n := \alpha(\alpha - 1)\cdots(\alpha - n + 1)$. In particular, we have:
\[ \binom{\alpha}{n} := \frac{\Gamma(\alpha + 1)}{\Gamma(n + 1)\,\Gamma(\alpha - n + 1)} = \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}. \qquad (3.31) \]
Note that for integer n, Γ(n + 1) = n!.
2. A Parabola: To be or not to be Using the series expansion of the circle
equation, $y_s$, show that the error in the parabolic approximation for a
spherical (circular) lens is given by:
\[ \Delta y = R - R\sqrt{1 - \left(\frac{x}{R}\right)^2} - \frac{x^2}{2R} = -R \sum_{k=2}^{\infty} (-1)^k \binom{1/2}{k} \left(\frac{x}{R}\right)^{2k} = R\left[ \frac{1}{8}\left(\frac{x}{R}\right)^4 + \frac{1}{16}\left(\frac{x}{R}\right)^6 + \dots \right]. \qquad (3.32) \]
This shows that the ‘spherical aberration’ that occurs in the parabolic
approximation of a circular lens, measured in units of R, is of the order
$O\left(\left(\frac{x}{R}\right)^4\right)$ – where x is the
distance from the vertex of the lens in the direction parallel to the directrix
– i.e. perpendicular to the symmetry axis of the parabola. In particular,
the maximum error we have is:
\[ \mathrm{Max}[\Delta y] = -R \sum_{k=2}^{\infty} (-1)^k \binom{1/2}{k} \left(\frac{L}{R}\right)^{2k} = R \cdot O\left(\left(\frac{L}{R}\right)^4\right), \qquad (3.33) \]
where $x_{max} = L$ is the length of the lens measured by a line tangent to
the vertex of the lens. In our case, this is the length of the line tangent to
the parabola at (0, 0) when we are trying to approximate the parabola near
its vertex by a circular arc. Thus, one way to keep the spherical aberration
small is to make the radius of curvature R of the lens large with respect to
the length L of the lens.
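The size of the leading term can be checked against the exact gap between the circle and the parabola. The following is a minimal sketch (function names are hypothetical): it evaluates Δy directly from the definitions in (3.27) with f = R/2 and compares it with the first term of the expansion written with its dimensionful prefactor, $x^4/(8R^3)$.

```python
import math

def delta_y(x, R):
    """Exact gap y_s - y_p between the circular arc and the parabola y_p = x^2/(2R)."""
    y_s = R - math.sqrt(R * R - x * x)
    return y_s - x * x / (2 * R)

def leading_term(x, R):
    """First term of the series for the gap: R * (1/8) * (x/R)^4 = x^4 / (8 R^3)."""
    return x**4 / (8 * R**3)

R = 1.0
for x in (0.4, 0.2, 0.1, 0.05):
    exact = delta_y(x, R)
    approx = leading_term(x, R)
    print(f"x/R = {x / R:.2f}: exact = {exact:.4e}, leading = {approx:.4e}, "
          f"ratio = {exact / approx:.5f}")
```

The ratio should approach 1 as x/R shrinks, with a deviation that itself scales like $(x/R)^2$, consistent with the next term of the series.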
We shall now apply the last result to obtain a differential error estimate which
quantifies how the spherical aberration of a lens (‘non-parabolicness’) generates
a ‘fuzziness’ or spread in the focus. In particular, instead of the focus being a
single (ideal) point, it now becomes a small line segment – physically leading to
a blurriness of images formed by the lens.
Problem 17 (Losing Focus!) Thanks to Emperor Constantine’s ‘plasma intervention
cannon’, the weeping angels are now a thing of the past. However, the
‘past’ is relative! This means that the weeping angels still lurk in one of many
universes. Not to fear, ‘The Doctor’ [2] decides it is time to return to the future
– leaving Pachelbel’s (musical) cannon unspoiled by The Doctor’s attempted
Skrillex additions.
To travel to the future, The Doctor needs to fire up his ‘Alcubierre’ warp drive
– this will allow him to generate a faster-than-light warp-bubble which he can
travel through spacetime with. However, during the angel attack, one of his
synchronising lasers was damaged. In order to fix the laser, he must grind a
new optical lens – such that its focal point, F = (0, f), shifts by a maximum of
1 micrometer: $\Delta f = 1\,\mu\mathrm{m} = 10^{-6}\,\mathrm{m}$ under the effect of spherical aberration.
Assuming he needs a lens of length $L = 1\,\mathrm{mm} = 10^{-3}\,\mathrm{m}$, we can help The
Doctor as follows.
[2] Thanks to ‘The’ Dr. Ashleigh Punch for noting that ‘Dr. Who’ should always be referred to
as ‘The Doctor’. Good luck to her when she finally meets, marries him and has time-travelling
babies.
1. Mathematical Constructions By re-writing the standard parabola equation,
we can express the focal length f as a function of the (x, y) coordinates:
\[ f = \frac{x^2}{4y}. \qquad (3.34) \]
Now, we note that a ‘linear approximation’ to the error in the focal length
is given by the ‘total differential’, df. In particular, by viewing f = f(x, y)
as a function of two variables x and y, show that its total differential (exterior
derivative) is given by:
\[ df = \frac{x}{2y}\, dx - \frac{x^2}{4y^2}\, dy. \qquad (3.35) \]
Hint: Recall that the total differential of a function f(x, y) of two variables
is given by:
\[ df(x, y) := \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy. \qquad (3.36) \]
2. Physical Estimates Now, replacing the differentials df, dx and dy by
their finite counterparts Δf, Δx, Δy – i.e. the ‘error’ in f, x and y – we
get:
\[ \Delta f = \frac{x}{2y}\, \Delta x - \frac{x^2}{4y^2}\, \Delta y. \qquad (3.37) \]
To get the maximum error in the focal length however, we need to consider
the magnitude of error contributions from Δx and Δy, hence we re-define
Δf as:
\[ \Delta f := \left|\frac{x}{2y}\right| |\Delta x| + \left|\frac{x^2}{4y^2}\right| |\Delta y|. \qquad (3.38) \]
Therefore, the error Δf is maximized when Δx and Δy are maximized
(for a fixed coordinate (x, y)).
Physically, we set Δx = 0 since there is no ‘error’ in the x-coordinate of
our lens – the tangent to the circle aligns with the tangent at the parabola’s vertex.
Now, we recall from the last problem that the maximum error in our y
coordinate is given when x takes its maximum value x = L – i.e. the
‘spherical aberration’ is maximized at the edges of the lens (away from
the vertex):
\[ \mathrm{Max}[\Delta y] = -R \sum_{k=2}^{\infty} (-1)^k \binom{1/2}{k} \left(\frac{L}{R}\right)^{2k} = R \cdot O\left(\left(\frac{L}{R}\right)^4\right), \qquad (3.39) \]
where R = 2f is the ‘radius of curvature’ of the lens (the radius of the
circle).
Using x = L, $y = \frac{L^2}{4f}$, Δx = 0 and the maximum value for Δy, so that
$\frac{x^2}{4y^2} = \frac{4f^2}{L^2} = \frac{R^2}{L^2}$, show that:
\[ \mathrm{Max}[\Delta f] = \frac{R^2}{L^2}\, \mathrm{Max}[|\Delta y|] = \frac{R^3}{L^2} \left| \sum_{k=2}^{\infty} (-1)^k \binom{1/2}{k} \left(\frac{L}{R}\right)^{2k} \right| = O\left(\frac{L^2}{R}\right). \qquad (3.40) \]
3. Experimental Solution Ignoring higher-order contributions to the error in
f, we have:
\[ \Delta f \approx \frac{R^3}{L^2} \cdot \frac{1}{8} \left(\frac{L}{R}\right)^4 = \frac{L^2}{8R}. \qquad (3.41) \]
Show this by taking the first term in the binomial expansion above.
With this leading-order estimate for Δf, we want $\Delta f \le 10^{-6}\,\mathrm{m}$ to achieve
the accuracy desired for The Doctor’s laser. For the given lens length
L = 0.001 m, calculate the minimum radius of curvature $R_{min}$ for the lens
required to achieve the accuracy: $\Delta f \le 10^{-6}\,\mathrm{m}$.
4. Checking Validity By taking the next term in the binomial expansion, we
can compute the next order contribution to the error in f:
\[ \frac{R^3}{L^2} \cdot \frac{1}{16} \left(\frac{L}{R}\right)^6 = \frac{L^4}{16 R^3}. \qquad (3.42) \]
Rather than adding this to the error Δf, we can instead use the value of
$R_{min}$ we calculated (which gave $\Delta f = 10^{-6}\,\mathrm{m}$) to estimate the relative
magnitude of the leading error term, $\frac{L^2}{8R}$, and the next correction term, $\frac{L^4}{16R^3}$.
Compute the ratio of these error terms and argue whether or not it was
justified to ignore the next correction term when calculating an approximation
for $R_{min}$. For example, if the ratio is less than 0.01 (or 1%), we
can justify ignoring the correction term.
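A hand estimate of the minimum radius can be compared with a direct numerical evaluation that does not truncate any series. The following is a minimal sketch under explicit assumptions: the focal shift is taken to be the difference between f = R/2 and the value of $x^2/(4y)$ evaluated at the edge of the circular arc, and the function names, the bisection bracket and the iteration count are choices made for this illustration only.

```python
import math

def focal_shift(R, L):
    """f - f_edge, with f = R/2 and f_edge = L^2 / (4 y_s(L)) on the circular arc.

    Since y_s(L) = R - sqrt(R^2 - L^2), we have L^2 / y_s(L) = R + sqrt(R^2 - L^2),
    which avoids dividing by a tiny difference. Requires R >= L.
    """
    f_edge = (R + math.sqrt(R * R - L * L)) / 4
    return R / 2 - f_edge

def minimum_radius(L, tol, hi=1.0e3):
    """Bisection for the smallest R in [L, hi] with focal_shift(R, L) <= tol.

    Relies on the focal shift decreasing as R grows.
    """
    lo = L
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if focal_shift(mid, L) > tol:
            lo = mid
        else:
            hi = mid
    return hi

L = 1.0e-3      # lens length in metres
tol = 1.0e-6    # allowed focal shift in metres
R_min = minimum_radius(L, tol)
print(f"R_min = {R_min:.6f} m, focal shift there = {focal_shift(R_min, L):.3e} m")
```

If the leading-order estimate is a good approximation, the radius found this way should differ from the hand-computed one only by a small relative amount.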

3.3 Ellipses and Planetary / Atomic Orbits
3.3.1 Overview
In this session we shall review one of the great conic sections from antiquity –
the ellipse! Ellipses have played an important role in the scientific and cultural
history of human society – perhaps most controversially [3] as proof (through
Johannes Kepler’s laws of planetary motion) that the earth and other planets orbit
the Sun along elliptical trajectories. Here, we shall study the famous ‘two body’
problem – that is, the orbits of two massive objects interacting with each other
gravitationally. After constructing the associated differential equations and constants
of motion, we shall solve the two body problem to derive Kepler’s laws
of planetary motion – in particular, obtaining elliptical trajectories as bound orbits.
Apart from planetary or semi-classical atomic orbits, ellipses play a huge role
in modern mathematics and physics. In particular, this includes ellipsoidal harmonic
analysis (e.g. MRI scans), elliptic integrals and even, in the generalized
sense, elliptic curves used in the proof of Fermat’s Last Theorem. Hence, apart
from scholarly and cultured reasons to study ellipses, it is prudent for mathematicians
and scientists to have some working knowledge of their geometry.
3.3.2 The Ellipse
We can think of an ellipse as a ‘stretched’ circle in the sense that a circle is a
special case of an ellipse – an ellipse that has zero eccentricity. More generally,
we can define an ellipse as the planar curve which is generated by slicing a cone
with a plane non-parallel to the cone’s symmetry axis. Hence, if we slice the
cone perpendicular to its symmetry axis, we will get a circle. If we slice the cone
at an angle $0 < \theta < \frac{\pi}{2}$ to the symmetry axis (and greater than the cone’s half-angle, so that the cut closes up), we will get a general ellipse.
See Tutor for Diagrams
A more operationally useful definition is the following geometric definition.
Definition 9 (Geometric Ellipse) Given a metric⁴ $d$ and set $M$, we choose two
fixed points $F_1$ and $F_2$ in $M$ (the foci). An ellipse is then defined to be the subset $E$ of
points of $M$ such that the sum of the distances from any point $p \in E$ on the ellipse
to each of the foci is a constant:
\[ d(p, F_1) + d(p, F_2) = 2a. \tag{3.43} \]
The constant 2a, is the length of the major axis of the ellipse – which is the
straight line segment connecting the foci F1 and F2 to opposing edges of the
ellipse. The line-segment perpendicular to the major axis and intersecting the
center of the ellipse, is the ‘minor axis’ – conventionally, we label its length as
2b. The distance d(F1, F2) between the foci is defined as d(F1, F2) = 2f, where
f is said to be the focal length of the ellipse.
³ Recall that Galileo was persecuted by the church for advocating a heliocentric model of the solar system.
⁴ Recall that a metric is a means of measuring (or defining) distances.

Exercise 9 (Not so eccentric) Using the above geometric definition of an ellipse, prove that when the foci $F_1$ and $F_2$ are located at the same point, i.e. $F_1 = F_2 = F$, the ellipse is simply a circle.
Hint: Recall that a circle $B(r; C)$ with center $C$ and radius $r$ is defined to be the set
of all points at a distance $r$ from the central point $C$:
\[ d(p, C) = r \quad \forall p \in B(r; C). \tag{3.44} \]
The previous definitions, although powerful, are somewhat abstract. Some of
you will be more familiar with the ‘cartesian form’ for a Euclidean ellipse
– an ellipse with the Pythagorean measure of distance⁵. In the next problem, we
shall derive this ‘standard’ ellipse equation.
Problem 18 (A Canonical Western) Having destroyed the weeping angels, back
in his home spacetime neighbourhood the cyborg Emperor Constantine decides
it is time to relax. In particular, he feels like watching an old ‘western’ style
movie – to his surprise, it turns out that his excursions with The Doctor have
removed Clint Eastwood from history! Shocked, the Emperor decides to travel
back in time to the wild west, taking his plasma cannon with him.
Stepping into Ye Old Town, Southern Mississippi, he comes across a rowdy coun-
try girl, ‘Big A. Geller’. Noticing his large cannon and thinking herself Numero
Uno as the county sheriff, she challenges the cyborg to a game of cards. Not
wanting to be beaten by her classic parlour tricks, the cyborg ups the challenge
and calls a duel. The rules of duelling in Ye Old Town are such that each con-
testant must stand at a fixed location. The crowd then must stand such that the
sum of a spectator’s distances from the two contestants is equal to a fixed value
– ‘the duelling constant 2a’, which is chosen prior to the duel by the duelling
master.
You – the duelling master and member of Ye Old Town, are asked to provide the
equation and draw the curve on which the crowd must stand during the duel.
1. Symmetry is your friend Because of the translational and rotational in-
variance of Euclidean space, it suffices to consider an ellipse whose center
is at the origin (0, 0) of some cartesian coordinate system (translational
symmetry). Furthermore, we may choose the foci F1 and F2 to lie on the
x axis, coinciding with the major axis of the ellipse (rotational symmetry).
Such an ellipse is now in ‘canonical form’. Therefore, we have:
F1 = (−f, 0) and F2 = (f, 0), (3.45)
where f is the focal length of the ellipse.
Convince yourself of the arguments above. For example, consider that an
ellipse is characterised by its focal length and the length of its major axis
– or equivalently, its eccentricity and focal length or its major and minor
axes lengths. Now consider what happens to these parameters when you
rotate or translate the ellipse.
2. Algebra Let the point P = (x, y) be an arbitrary point on the ellipse. The
geometric definition of an ellipse says that each point must have a sum of
distances to the foci, which is constant – equal to the major axis length.
⁵ Recall, this means that $d((x_1, y_1), (x_2, y_2)) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ – the length of
the hypotenuse of the triangle with sides of length $|x_1 - x_2|$ and $|y_1 - y_2|$.

Hence, we have:
\[ \begin{aligned} d(P, F_1) + d(P, F_2) &= \|P - F_1\| + \|P - F_2\| \\ &= \sqrt{(x + f)^2 + (y - 0)^2} + \sqrt{(x - f)^2 + (y - 0)^2} \\ &= \sqrt{(x + f)^2 + y^2} + \sqrt{(x - f)^2 + y^2}. \end{aligned} \tag{3.46} \]
Using the fact that $d(P, F_1) + d(P, F_2) = 2a$ for all points $P$ on the ellipse,
simplify the resulting equation:
\[ \sqrt{(x + f)^2 + y^2} + \sqrt{(x - f)^2 + y^2} = 2a \tag{3.47} \]
to the canonical form:
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \tag{3.48} \]
where $b = \sqrt{a^2 - f^2} = a\sqrt{1 - \epsilon^2}$ is the semi-minor axis length and
$\epsilon = \frac{f}{a}$ is the eccentricity of the ellipse.
Hint: To get rid of the square roots, take one square root to the other side
of the equation, square both sides and then simplify. Using the simplified
equation, get rid of the remaining square root by moving all other terms to
the other side of the equation and squaring both sides again.
3. Interpretation The eccentricity $\epsilon = \frac{f}{a}$ of the ellipse – the ratio of its focal
length to semi-major axis length – controls how ‘stretched’ the ellipse is in
the x and y directions. To get an understanding of this parameter, it helps
to draw a few different ellipses corresponding to different eccentricities.
i) Draw an ellipse with eccentricity $\epsilon = 0$. What curve is this?
ii) Now draw ellipses with eccentricity $\epsilon = 0.25$ and $\epsilon = 0.8$. What do
you notice?
iii) Try and sketch an ellipse with $\epsilon = 0.99$. What happens as $\epsilon \to 1^-$? If
you recall the previous section, what special curve is obtained in the limit
$\epsilon = 1$?
Hints: To sketch these curves, you must first fix some value for the focal
length $f$ – or equivalently, the semi-major axis length $a$. For simplicity,
set $a = 1$ to obtain the resulting sketches.
Devil in the Details Using the relations given between the semi-major axis
length a and semi-minor axis length b, show that for all ellipses:
a ≥ b. (3.49)
Note that when we derived our ellipse equation, we assumed that the major axis
and x axis coincided. What would happen to the equation if the major axis was
instead along the y axis?
More generally, what can you say about the denominators of $x^2$ and $y^2$ appearing
in your ellipse equation and the location of the major axis of the ellipse?
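Before attempting the algebra, it can be reassuring to confirm numerically that the canonical equation (3.48) and the geometric definition (3.43) describe the same curve. The sketch below assumes illustrative values $a = 1$ and eccentricity $0.8$ (neither is prescribed by the problem), samples points satisfying (3.48), and measures how far the sum of focal distances strays from $2a$.

```python
import math


def focal_distance_sum(x, y, f):
    """d(P, F1) + d(P, F2) for foci at (-f, 0) and (f, 0), Euclidean metric."""
    return math.hypot(x + f, y) + math.hypot(x - f, y)


a, eccentricity = 1.0, 0.8       # illustrative values, not from the notes
f = eccentricity * a             # focal length
b = math.sqrt(a**2 - f**2)       # semi-minor axis length

# Sample points on the upper half of x^2/a^2 + y^2/b^2 = 1.
xs = [-a + 2 * a * k / 20 for k in range(21)]
sums = [
    focal_distance_sum(x, b * math.sqrt(max(0.0, 1 - x**2 / a**2)), f)
    for x in xs
]

# Largest deviation of the distance sum from the 'duelling constant' 2a.
max_deviation = max(abs(s - 2 * a) for s in sums)
print(max_deviation)
```

A deviation at the level of floating-point rounding is what the derivation predicts; anything larger would point to an algebra slip.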
Exercise 10 (Dat Metric) Referring back to the previous section on parabolas,
derive the cartesian equation for a ‘taxicab ellipse’. That is, derive an equation for an ellipse with major axis length $2a$ using the ‘taxicab’ ($\ell^1$) measure of
distance.
For simplicity, you may assume the foci are located at (−f, 0) and (f, 0).

3.3.3 Parametric Form
Now that we have studied Euclidean ellipses in cartesian form, it is prudent to
study the ellipse in ‘parametric form’. In particular, we shall study the ellipse in
‘polar coordinates’ parameterised by the angle θ between positive x axis and the
position vector r = (x, y). This will assist us in solving the two body problem,
which involves solving a second-order non-homogeneous differential equation in
polar coordinates, for the planetary orbits given as trajectories (solutions to the
DE) in polar coordinates. Alternatively, the Laplace-Runge-Lenz vector may be
used to solve the two-body problem (much simpler!); however, this still requires
identifying the ellipse in polar coordinates.
Recall that polar coordinates $(r, \theta)$ are related to cartesian coordinates $(x, y)$ by
the following equations:
\[ x = r\cos(\theta), \qquad y = r\sin(\theta), \tag{3.50} \]
where $r \in [0, \infty)$ and $\theta \in [0, 2\pi)$ (measured counter-clockwise).
To derive the polar form of the ellipse, parameterised by the polar angle $\theta$, we
shall first construct a form of the ellipse with an arbitrary parameter $t$ – then use
some geometry to relate this parametrisation to the polar one. This is
achieved in the following problem.
Problem 19 (Art thou 580nm?) The ellipse has been formed, the dust settles
and the crowd begins to go quiet. Suddenly, the duel is interrupted by the appearance of a wild, green titanoboa⁶ from a pre-historic era! It seems like the
cyborg forgot to turn off his time machine... Well well well, rowdy sheriff, Big
A. Geller. At this moment, the cyborg and sheriff agree to put their differences
aside to take down the gigantic serpent.
Upon taking position behind Ye Olde Tavern, the townspeople notice that Big
A. Geller is starting to panic. Rousing her to action, they implore – “R’ you
yella?!” Being wildlings of the wild west, the crowd enjoys the spectacle and
forms a moving ellipse with the titanoboa at one focus and the sheriff and cyborg
at the other focus. In this moment, the sheriff notices a large wolf running the
perimeter of the ellipse. In order to estimate the time before the wolf attacks
the serpent (providing a perfect distraction to fire his 500nm laser), the cyborg
needs to know both the arc-length of the ellipse segment and the angular velocity of
the wolf – a calculation most easily performed in polar coordinates.
Help the people of Ye Olde Town by deriving the polar coordinate representation
of an ellipse.
1. Arbitrary Parameter t Using an arbitrary parameter $t \in [0, \infty)$ we can
parametrise a standard ellipse (major axis along the x-axis) as the following curve:
\[ \gamma(t) := (x(t), y(t)) = (a\cos(t), b\sin(t)), \tag{3.51} \]
where $a$ and $b$ are the semi-major and semi-minor axis lengths of the ellipse.
Note that this parameter t is not the same as the polar angle – i.e. the angle
between the position vector and the x-axis!
Prove that the above parametrisation defines the standard ellipse.
⁶ An extinct species of snake from the Paleocene epoch (≈ 60 million years ago). These were
the largest snakes to ever exist – up to 12.8 m long and ∼ 1,100 kg heavy!

Hint: Show that the equation:
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{3.52} \]
is satisfied.
2. Polar Angle Parameter θ Now, switching to polar coordinates $(r, \theta)$, use
trigonometry and/or algebra to show that
\[ \tan(\theta) = \frac{y}{x} = \frac{b\sin(t)}{a\cos(t)}. \tag{3.53} \]
Hence, we have: $\tan^2(\theta) = \frac{b^2}{a^2}\tan^2(t)$.
Using the previous relations and the fact that the radial coordinate is given by
\[ r = \sqrt{x^2 + y^2} = \sqrt{a^2\cos^2(t) + b^2\sin^2(t)}, \tag{3.54} \]
prove that we have:
\[ r = \frac{ab}{\sqrt{a^2\sin^2(\theta) + b^2\cos^2(\theta)}}. \tag{3.55} \]
Hence we can view $r = r(\theta)$ as the parametric equation for an ellipse
parametrised by the polar angle $\theta$.
Using some trigonometry and the relationship between the eccentricity $\epsilon$ and
the semi-axes lengths $a, b$, simplify our polar equation to the ‘canonical’
form:
\[ r(\theta) = \frac{b}{\sqrt{1 - \epsilon^2\cos^2(\theta)}}. \tag{3.56} \]
3. Understanding Plot the points $\gamma(0)$, $\gamma(\frac{\pi}{2})$, $\gamma(\pi)$ and $\gamma(\frac{3\pi}{2})$. What is special
about these points?
Now, find the coordinates of the foci $F_1$ and $F_2$ in polar coordinates. Note
that the cartesian coordinates for these points are $(-f, 0)$ and $(f, 0)$.
4. Translation of center (challenge) Note that in the above derivation, we
have constructed an ellipse whose center is at the origin (0, 0). As it
turns out, for celestial mechanics and other problems, a much more useful
parametrisation is when we let one focus of the ellipse coincide with the
origin (0, 0). This corresponds to translating the ellipse along the major
axis by a distance $\pm f$ (depending on which focus is now at the origin).
By using a slight modification of our construction, show that the canonical equation for an ellipse in polar coordinates $(r, \theta)$ whose center is at
$(r, \theta) = (f, 0)$ or $(f, \pi)$ is given parametrically by:
\[ r = \frac{c}{1 \pm \epsilon\cos(\theta)}, \tag{3.57} \]
where
\[ c = a(1 - \epsilon^2). \tag{3.58} \]
Hence, using the relations between $c$, $\epsilon$ and $a, b$, show that:
\[ a = \frac{c}{1 - \epsilon^2}, \qquad b = \frac{c}{\sqrt{1 - \epsilon^2}}. \tag{3.59} \]

5. Rotation of major axis (challenge) Now, our final challenge. Prove that
when we rotate our ellipse so that one focus lies at the origin and the other now
lies on a line whose (fixed) polar angle is $\theta = \theta_0$, our canonical equation
takes the following form:
\[ r = \frac{c}{1 - \epsilon\cos(\theta - \theta_0)}. \tag{3.60} \]
In particular, this is the equation for an ellipse with one focus at the origin
$(r, \theta) = (0, 0)$, center at $(r, \theta) = (f, \theta_0)$ and second focus at $(r, \theta) = (2f, \theta_0)$.
6. Easy Stretches Using the formula for an ellipse with one focus at the
origin $(r, \theta) = (0, 0)$, center at $(r, \theta) = (f, \theta_0)$ and second focus at $(r, \theta) = (2f, \theta_0)$, show that the radial coordinate of the ellipse has the following
extremal values:
\[ r_{min} = \frac{c}{1 + \epsilon}, \qquad r_{max} = \frac{c}{1 - \epsilon}. \tag{3.61} \]
In terms of celestial mechanics, with one massive body (such as the Sun)
at one focus and another body (such as the earth) moving along the ellipse,
$r_{min}$ and $r_{max}$ represent the distance between the bodies at perihelion
and aphelion, respectively. For equation (3.60), these occur at $\theta - \theta_0 = \pi$ and $\theta = \theta_0$. When we are dealing with
lunar bodies orbiting a planet, these minima and maxima are the ‘perigee’
and ‘apogee’.
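The polar forms derived in this problem lend themselves to a quick numerical sanity check. The sketch below assumes illustrative values $a = 1$ and eccentricity $0.5$ (hypothetical choices, as are all function names) and takes $\theta_0 = 0$. It tests that the centre-at-origin form (3.56) satisfies the canonical cartesian equation, that the focus-at-origin form (3.60) satisfies the two-focus definition with foci at the origin and at $(2f, 0)$, and that the extreme radii agree with (3.61).

```python
import math

a, eps = 1.0, 0.5                    # illustrative values, not from the notes
b = a * math.sqrt(1 - eps**2)        # semi-minor axis length
f = eps * a                          # focal length
c = a * (1 - eps**2)                 # the constant c of (3.58)


def r_centre(theta):
    """Centre-at-origin polar form (3.56)."""
    return b / math.sqrt(1 - eps**2 * math.cos(theta) ** 2)


def r_focus(theta, theta0=0.0):
    """Focus-at-origin polar form (3.60)."""
    return c / (1 - eps * math.cos(theta - theta0))


def focal_sum(theta):
    """Distance to the origin plus distance to the second focus at (2f, 0)."""
    r = r_focus(theta)
    x, y = r * math.cos(theta), r * math.sin(theta)
    return r + math.hypot(x - 2 * f, y)


thetas = [2 * math.pi * k / 360 for k in range(360)]

# Residual of x^2/a^2 + y^2/b^2 = 1 along the curve (3.56).
centre_residual = max(
    abs((r_centre(t) * math.cos(t)) ** 2 / a**2
        + (r_centre(t) * math.sin(t)) ** 2 / b**2 - 1)
    for t in thetas
)

# Residual of d(P, F1) + d(P, F2) = 2a along the curve (3.60).
focus_residual = max(abs(focal_sum(t) - 2 * a) for t in thetas)

r_min = min(r_focus(t) for t in thetas)
r_max = max(r_focus(t) for t in thetas)
print(centre_residual, focus_residual, r_min, r_max)
```

Changing `theta0` in `r_focus` rotates the curve without changing `r_min` or `r_max`, which is one way to see that the extremal values depend only on $c$ and the eccentricity.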
Wolf Velocity Let us now consider that the polar angle $\theta$ of the wolf circumnavigating the ellipse is a function of time $t$. We can express this by writing
$\theta = \theta(t)$.
Using the chain rule and the formula for an ellipse with one focus at the origin
(we can choose $\theta_0 = 0$), compute the radial velocity $\frac{dr}{dt}$ of the wolf in terms of
the angular velocity $\frac{d\theta}{dt}$:
\[ \frac{dr}{dt} = \left(\frac{dr}{d\theta}\right)\frac{d\theta}{dt}. \tag{3.62} \]
Therefore, we can get the actual velocity of the wolf – that is, the time-rate of
change of its position vector:
\[ \frac{d}{dt}\mathbf{r} = \frac{dx}{dt}\mathbf{e}_1 + \frac{dy}{dt}\mathbf{e}_2. \tag{3.63} \]
One can do this either by switching back to cartesian coordinates, or by noting that
the polar coordinate basis vectors will vary with the parameter $t$ – hence we need
to differentiate these too! The latter, we shall illustrate next week when solving
the two body problem.
Arc Length and time taken Note that it is a significant challenge to compute
the arc-length of a segment of an ellipse. Although there are many expressions
(elliptic integrals or ‘Jacobi functions’), these are all very non-trivial! However, since the cyborg emperor in our story can measure the angular velocity $\frac{d\theta}{dt}$,
we can compute the time it takes for the wolf to reach a point on the ellipse where a
ray from the polar-coordinate origin intersects the serpent. All we need are the
initial polar angle of the wolf, the ray on which the serpent lies and the angular
velocity as a function of time.
Using the above argument, derive a simple expression for the time it takes the
wolf to travel from $\theta_{start}$ to $\theta_{titanoboa}$.
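The chain-rule step (3.62) can be illustrated with a short sketch. It assumes the focus-at-origin ellipse with $\theta_0 = 0$ and, purely for simplicity, a constant angular velocity `omega`; the notes allow the angular velocity to vary with time, in which case the travel time follows from integrating it instead. All names and numerical values are illustrative.

```python
import math

a, eps = 1.0, 0.5              # illustrative ellipse, not from the notes
c = a * (1 - eps**2)


def r(theta):
    """Focus-at-origin ellipse with theta0 = 0."""
    return c / (1 - eps * math.cos(theta))


def dr_dtheta(theta):
    """Derivative of r with respect to theta, from the quotient rule."""
    return -c * eps * math.sin(theta) / (1 - eps * math.cos(theta)) ** 2


def radial_velocity(theta, omega):
    """Chain rule (3.62): dr/dt = (dr/dtheta) * (dtheta/dt)."""
    return dr_dtheta(theta) * omega


def travel_time(theta_start, theta_end, omega):
    """Time to sweep between two polar angles at CONSTANT angular velocity."""
    return (theta_end - theta_start) / omega


# Compare the analytic derivative with a central finite difference.
theta, h = 1.0, 1e-6
finite_difference = (r(theta + h) - r(theta - h)) / (2 * h)
fd_error = abs(finite_difference - dr_dtheta(theta))

print(radial_velocity(theta, omega=0.5), travel_time(0.0, math.pi, omega=0.5), fd_error)
```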
That’s all for this week!

3.4 The Two Body Problem and Planetary Orbits
(Easter Sketch)
3.4.1 History and Cultural Impact
The two body problem, is arguably one of the most profound and influential
problems in Western science since antiquity. As such, we have the following
key players in this story:
• Apollonius of Perga – 262BC to 190 BC.
• Aristarchus of Samos – 3rd Century BC. Heliocentrism.
• Copernicus – 1473 to 1543. Heliocentric theory.
• Tycho Brahe – 1546 to 1601. Data collection.
• Johannes Kepler – 1571 to 1630. Kepler laws of planetary motion from
Brahe’s measurements.
• Galileo Galilei – 1564 to 1642. Gravitational acceleration. Constant mo-
tion, preliminary notions of inertia.
• Isaac Newton – 1642 to 1726. Principia Mathematica, solution to two-body problem.
In the days of the early astronomy of the ancient Greeks, it was widely believed
that the earth was the center of the universe – with the celestial bodies of the
heavens revolving about it. To this extent, Ptolemy created a way to track the
motion of the heavens that fitted this geocentric viewpoint. In contest however,
was Aristarchus of Samos, who proposed a heliocentric model – one in which the
Sun was the centre of the observable universe. This viewpoint – which we now
know to be accurate for our solar system, was strongly opposed and did not
resurface till the early Renaissance era.
In the transition between the medieval and renaissance periods, Copernicus pro-
posed a heliocentric model of the solar system. This was radical in the sense
that it ran against the conventional philosophies and paradigms of the time – in
particular, the Vatican church. After Copernicus’ death, Tycho Brahe performed
many naked-eye astronomical observations (Galileo’s improvements to the telescope
came only after Brahe’s death). Using the extensive astronomical data he gathered,
Brahe developed several laws of planetary motion to match his observations.
These were subsequently revised by Kepler, who demonstrated that the planets
orbited the sun along trajectories described by geometries of the ancient Greeks
– in particular, the ‘conic sections’ developed by Apollonius of Perga. Although
Kepler’s laws of planetary motion were in strong agreement with Brahe’s data,
the physical mechanism for producing the elliptical orbits of the planets (or hy-
perbolic orbits of comets) had not yet been demonstrated.
The problem of finding a physical mechanism (potential or force law) to pro-
duce the Kepler orbits, when considering two interacting bodies, became known
as the ‘two body problem’⁷. This problem caught the attention of none other than
the father of modern mathematics – Sir Isaac Newton. Inspired by this problem,
Newton reinvestigated the geometry of antiquity – proving many theorems and
lemmas regarding conic sections. Newton used these results to solve the two
⁷ Note that today, we use this terminology for the converse problem – “given two bodies
interacting under some potential or force law, what trajectories will their motion follow?”

body problem. In particular, Newton showed that the Kepler orbits arise when
two massive bodies interact via an attractive ‘inverse square’ force law – New-
ton’s ‘universal law of gravitation’. However, Newton did not stop here. He then
invented calculus and demonstrated his new mathematics by using it to again
prove that his force law gave rise to the Kepler orbits – along with many other
results.
Newton’s work on the two body problem – and his demonstration of calculus
in its solution, formed the basis for his Magnum Opus – the ‘Principia Mathematica’,
which is without doubt, one of the most important texts in human his-
tory. At this point, with combined astronomical data and a powerful mathemati-
cal demonstration of a heliocentric solar system governed by an experimentally
testable force law (Newtonian gravity), the two body problem had conclusively
been solved. Well, not quite...
Since Newton, many other influential mathematicians and physicists have stud-
ied the two body problem, providing alternative proofs and solutions. The two
most notable methods are that of variational calculus – Lagrangian and Hamil-
tonian mechanics, as well as more abstract methods pertaining to symmetries
and constants of motion – the ‘Laplace-Runge-Lenz’ vector. Such methods have
subsequently formed the basis for modern physics – in particular, quantum me-
chanics, general relativity and (relativistic) quantum field theories.
3.4.2 Inverse Square Law and Central Potentials
Here we shall derive the Kepler orbits as the solutions to the motion of two massive bodies acting on each other via the gravitational force, provided by Newton’s Universal Law of Gravitation⁸. In particular, we have two massive bodies
with masses $m_1$ and $m_2$ acting on each other as follows:
\[ \begin{aligned} \mathbf{F}_{12}(\mathbf{r}_1, \mathbf{r}_2) &:= \frac{G m_1 m_2}{\|\mathbf{r}_1 - \mathbf{r}_2\|^2}\,\hat{\mathbf{r}}_{12} \\ \mathbf{F}_{21} &= -\mathbf{F}_{12}. \qquad \text{(Newton's 3rd Law)} \end{aligned} \tag{3.64} \]
Here $\mathbf{F}_{ij}$ is a vector function of two vectors in $\mathbb{R}^3$; that is, $\mathbf{F}_{ij}$ maps the displacement vectors $\mathbf{r}_i$ and $\mathbf{r}_j$ of two masses of mass $m_i$ and $m_j$, respectively, to
a force vector $\mathbf{F}_{ij}(\mathbf{r}_i, \mathbf{r}_j)$ whose direction is from mass $m_i$ to $m_j$ – i.e. parallel
to $(\mathbf{r}_i - \mathbf{r}_j)$. Therefore, using Newton’s 2nd Law we get the following coupled
second-order differential equations:
\[ \begin{aligned} \mathbf{F}_{12} &= m_1\ddot{\mathbf{r}}_1 \\ \mathbf{F}_{21} &= m_2\ddot{\mathbf{r}}_2, \end{aligned} \tag{3.65} \]
which we want to solve for the trajectories (integral curves) traced out by the
displacement vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ of the massive bodies.
Exercise 11 (Art 101) Sketch the above force laws with a vector diagram for
the masses, displacement vectors and force vectors.
As a bonus, sketch what would happen to the force vectors if we replaced the
masses with electric charges and the force law with Coulomb’s law. Consider
the following cases: two positive charges, two negative charges and oppositely
charged particles.
⁸ Historically, the two body problem was posed in reverse – that is, given the Kepler orbits,
find a potential (or force law) that gives rise to these trajectories.

Problem 20 (Central Potential) Conservative force fields have the property that
the work done in moving an object under a conservative force is independent of
the path taken. Furthermore, such forces can be derived as the exterior derivative (or gradient) of some scalar potential:
\[ \mathbf{F} = -\nabla U. \tag{3.66} \]
Equivalently⁹,
\[ \mathbf{F}^{\flat} = -dU. \tag{3.67} \]
In this problem, consider a so-called ‘central potential’:
\[ U(\mathbf{r}_1, \mathbf{r}_2) = -\frac{G m_1 m_2}{\|\mathbf{r}_1 - \mathbf{r}_2\|}. \tag{3.68} \]
Such a potential has the property that it is ‘spherically symmetric’ – meaning it
is invariant under rotations. Furthermore, it is invariant under translations. To
see this explicitly, note that $U$ does not depend on the directions of $\mathbf{r}_1$ or $\mathbf{r}_2$, but
only on the magnitude of their difference, $\|\mathbf{r}_1 - \mathbf{r}_2\|$ (which is invariant under
said isometries¹⁰). Hence, we can write:
\[ U(\mathbf{r}_1, \mathbf{r}_2) = U(\|\mathbf{r}_1 - \mathbf{r}_2\|). \tag{3.69} \]
Such a potential is said to be a ‘central potential’. This is because if we let one
mass coincide with the origin, then $U$ only depends on the distance $r = \|\mathbf{r}_1 - \mathbf{r}_2\|$
from the origin. In particular, the $U = $ constant surfaces are spheres.
Prove that Newton’s Universal Law of Gravitation tells us that gravity is a conservative force field. In particular, show that:
\[ \mathbf{F}_{12} = -\nabla U(\|\mathbf{r}_1 - \mathbf{r}_2\|). \tag{3.70} \]
Path independence then follows from noting that the gradient vector field is dual
to the exterior derivative – from which one can apply the generalized Stokes’
theorem (or ‘Fundamental theorem of calculus’):
\[ \int_{p_2}^{p_1} dF = \int_{\partial \mathrm{Path}(p_1, p_2)} F = F(p_1) - F(p_2). \tag{3.71} \]
Although this is sufficient, prove path independence by showing that:
\[ \mathrm{Curl}(\mathbf{F}_{12}) := \nabla \times \mathbf{F}_{12} = 0. \tag{3.72} \]
Path independence then follows explicitly from the classical Stokes theorem.
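A numerical experiment can make the two claims of this problem concrete before proving them. The sketch below uses illustrative units and masses (assumptions, as are all function names), holds body 2 fixed, and treats the force on body 1 as the attractive inverse-square force pointing from body 1 towards body 2. It compares that force with a finite-difference gradient of the potential (3.68) and estimates the curl by finite differences.

```python
import math

G, m1, m2 = 1.0, 2.0, 3.0        # illustrative units and masses


def U(r1, r2):
    """Central potential (3.68)."""
    return -G * m1 * m2 / math.dist(r1, r2)


def force_on_1(r1, r2):
    """Attractive inverse-square force on body 1, pointing towards body 2."""
    sep = math.dist(r1, r2)
    return [-G * m1 * m2 * (p - q) / sep**3 for p, q in zip(r1, r2)]


def shifted(point, axis, h):
    """Copy of point with coordinate `axis` increased by h."""
    moved = list(point)
    moved[axis] += h
    return moved


def numerical_gradient(func, point, h=1e-5):
    """Central-difference gradient of a scalar function of position."""
    return [
        (func(shifted(point, i, h)) - func(shifted(point, i, -h))) / (2 * h)
        for i in range(3)
    ]


def numerical_curl(field, point, h=1e-5):
    """Central-difference curl of a vector field of position."""
    def d(i, j):
        # Partial derivative of component i with respect to coordinate j.
        return (field(shifted(point, j, h))[i] - field(shifted(point, j, -h))[i]) / (2 * h)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]


r1, r2 = [1.0, 2.0, -0.5], [-0.3, 0.4, 1.1]
grad_U = numerical_gradient(lambda p: U(p, r2), r1)
F = force_on_1(r1, r2)

gradient_error = max(abs(f + g) for f, g in zip(F, grad_U))   # tests F = -grad U
curl_size = max(abs(comp) for comp in numerical_curl(lambda p: force_on_1(p, r2), r1))
print(gradient_error, curl_size)
```

Both numbers should be tiny compared with the size of the force itself; the analytic proof explains why they vanish exactly.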
3.4.3 Symmetries and Jacobi Coordinates
Clearly, solving the differential equations for the displacement vectors of each
mass is a rather primitive and inefficient brute-force approach. However, as
we know with most problems involving the momenta of more than one mas-
sive body, they are drastically simplified by switching to the ‘center of mass’
(CM) coordinates. In particular, we make the following simplifying approxima-
tions:
⁹ The musical ‘flat’ superscript denotes the differential 1-form (‘covector’ or ‘dual vector’)
corresponding to the force vector $\mathbf{F}$. This correspondence is provided by the Euclidean metric.
For practical purposes, you can consider $dU$ as the ‘total differential’ from first-year calculus.
¹⁰ Recall that isometries are symmetries that leave the metric unchanged – i.e. they preserve lengths and
relative angles between vectors.

1. The bodies are spherical and can therefore be dynamically treated as point-
particles of an equivalent mass, located at their centres of mass.
2. There are no external net forces – that is, the bodies are only interacting
via the inverse-square force law (gravity or electromagnetism).
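We define the following vector and scalar variables:
\[ \begin{aligned} M &= m_1 + m_2 \\ \mathbf{r} &:= \mathbf{r}_1 - \mathbf{r}_2 \\ \mathbf{R} &:= \frac{m_1\mathbf{r}_1 + m_2\mathbf{r}_2}{M} \\ \mathbf{p}_j &:= m_j\dot{\mathbf{r}}_j \qquad (j = 1, 2; \text{ no sum}) \\ \mathbf{P} &:= M\dot{\mathbf{R}} = \mathbf{p}_1 + \mathbf{p}_2. \end{aligned} \tag{3.73} \]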
These are the Jacobi variables for the center of mass frame. In particular, r is the
‘relative displacement’ (of the two bodies), R is the center-of-mass displacement
and P is the center-of-mass momentum.
Exercise 12 Sketch a vector diagram to illustrate the relation between the CM
(center-of-mass) coordinates $(\mathbf{r}, \mathbf{R})$ and the original coordinates $(\mathbf{r}_1, \mathbf{r}_2)$. For
this sketch, investigate the difference when one body is much less massive than
the other ($m_1 \ll m_2$), (approximately) equally massive ($m_1 \approx m_2$) or much more
massive ($m_1 \gg m_2$).
This may illustrate, for example, the earth-Sun and Earth – 3D-Printed Earth
scenarios.
Exercise 13 By employing Newton’s 3rd Law, prove that the center of mass experiences no acceleration (in the absence of external forces):
\[ \ddot{\mathbf{R}} = 0. \tag{3.74} \]
Show that in the presence of external forces, Newton’s Second Law implies that:
\[ \mathbf{F}^{ext} = \dot{\mathbf{P}} = M\ddot{\mathbf{R}}. \tag{3.75} \]
Therefore, the CM moves as if it were just a single particle of mass $M$ subjected
to the net external force on the system. This justifies our ability to represent
our extended bodies as point particles, in the sense that their trajectories can be
represented by their CM trajectories.
In the absence of external forces, (3.74) shows that the velocity $\mathbf{V} = \dot{\mathbf{R}}$ of the center of mass is a constant vector.
Therefore, it follows that the total momentum $\mathbf{P} = M\mathbf{V}$ is also constant (i.e.
momentum conservation).
Since the phase space for a 2-body system (point particles) is 12 dimensional (3
momenta and 3 position coordinates for each particle/body), it follows that the
trajectory $\mathbf{R}(t)$ of the center of mass can be uniquely determined from knowledge of the initial displacement and velocity vectors of the masses.
We now note that a consequence of our initial assumptions is that the two-body
problem can be reduced to an equivalent 1-dimensional motion. This relies on the fact
that the total angular momentum $\mathbf{L}$ of our system is constant (conserved);
in reality, due to the oblateness of the earth and other inhomogeneities, the total
angular momentum varies slightly. In particular, $\mathbf{L}$ precesses.
To see this, we first establish the following results.

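Problem 21 Recall that the angular momentum of a particle of mass $m$, displacement vector $\mathbf{r}$ and linear momentum $\mathbf{p}$, with respect to some origin, is given
by
\[ \mathbf{l} = \mathbf{r} \times \mathbf{p}. \tag{3.76} \]
Using this definition, the total angular momentum of our system is given by:
\[ \mathbf{L} = \mathbf{r}_1 \times \mathbf{p}_1 + \mathbf{r}_2 \times \mathbf{p}_2 = \mathbf{r}_1 \times m_1\mathbf{v}_1 + \mathbf{r}_2 \times m_2\mathbf{v}_2. \tag{3.77} \]
Now define $\mathbf{r}'_j$ to be the displacement of mass $m_j$ relative to the center of mass, so that $\mathbf{r}_j = \mathbf{R} + \mathbf{r}'_j$.
Note that $\mathbf{r}_j$ without the prime is just the displacement vector of $m_j$ relative to
our original origin, $(0, 0, 0)$. Given these definitions, show that:
\[ \mathbf{L} = \mathbf{R} \times M\dot{\mathbf{R}} + \mathbf{R} \times \sum_j m_j\dot{\mathbf{r}}'_j + \Big(\sum_j m_j\mathbf{r}'_j\Big) \times \dot{\mathbf{R}} + \sum_j \mathbf{r}'_j \times m_j\dot{\mathbf{r}}'_j. \tag{3.78} \]
Simplify this by showing that:
\[ \sum_j m_j\mathbf{r}'_j = 0. \tag{3.79} \]
Hint: Identify this as ($M$ times) the position of the CM relative to the CM; its time derivative $\sum_j m_j\dot{\mathbf{r}}'_j$ therefore vanishes as well. Hence show that:
\[ \mathbf{L} = \mathbf{R} \times \mathbf{P} + \sum_j \mathbf{r}'_j \times m_j\dot{\mathbf{r}}'_j. \tag{3.80} \]
It follows that:
\[ \mathbf{L} = \mathbf{L}_{\text{CM motion}} + \mathbf{L}_{\text{motion relative to CM}} := \mathbf{L}_{orbital} + \mathbf{L}_{spin}. \tag{3.81} \]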
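Now, apart from the precession of the equinoxes¹¹, to a good approximation it
holds that for the earth-Sun gravitational interaction:
\[ \dot{\mathbf{L}}_{spin} = \dot{\mathbf{L}} - \dot{\mathbf{L}}_{orbital} = \sum_j \mathbf{r}'_j \times \mathbf{F}^{ext}_j = \boldsymbol{\Gamma}^{ext}_{\text{about CM}} \approx 0, \tag{3.82} \]
where
\[ \dot{\mathbf{L}}_{orbital} = \dot{\mathbf{R}} \times \mathbf{P} + \mathbf{R} \times \dot{\mathbf{P}} = 0 + \mathbf{R} \times \mathbf{F}^{ext}. \tag{3.83} \]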
A useful quantity to work with is the reduced mass:
\[ \mu := \frac{m_1 m_2}{M} = \frac{m_1 m_2}{m_1 + m_2}. \tag{3.84} \]
To see that the motion of the two bodies with respect to each other lies in a
2D plane, it suffices to show that the angular momentum vector is constant. In
particular, this is because the angular momentum vector is perpendicular to both
the relative displacement vector $\mathbf{r}$ and the velocity $\dot{\mathbf{r}}$ (and hence the momentum) – since the trajectory of
the motion is described by these vectors it follows that they are confined to a
¹¹ In astronomy, distant stars provide a roughly ‘fixed’ reference frame with respect to which we may make
measurements. In particular, ‘the precession of the equinoxes’ refers to the 50
arcseconds per year rotation of the earth’s axis relative to the ‘fixed’ stars.

plane orthogonal to the angular momentum vector, L. To see that the angular
momentum vector is constant, note that it is given by:
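\[ \begin{aligned} \mathbf{L} &= \mathbf{L}_1 + \mathbf{L}_2 \\ &= \mathbf{r}_1 \times \mathbf{p}_1 + \mathbf{r}_2 \times \mathbf{p}_2 \\ &= m_1\mathbf{r}_1 \times \dot{\mathbf{r}}_1 + m_2\mathbf{r}_2 \times \dot{\mathbf{r}}_2, \end{aligned} \tag{3.85} \]
with linear momenta defined as usual by $\mathbf{p}_j = m_j\dot{\mathbf{r}}_j$ and $j = 1, 2$. Now, the net
torque on our system is given by the rate of change of the angular momentum of
the system:
\[ \begin{aligned} \boldsymbol{\Gamma} &= \frac{d}{dt}\mathbf{L} \\ &= \frac{d}{dt}\left(\mathbf{r}_1 \times \mathbf{p}_1 + \mathbf{r}_2 \times \mathbf{p}_2\right) \\ &= 0 + m_1\mathbf{r}_1 \times \ddot{\mathbf{r}}_1 + m_2\mathbf{r}_2 \times \ddot{\mathbf{r}}_2 + 0. \end{aligned} \tag{3.86} \]
In the center-of-mass frame, we take the center of mass to be the origin – hence
in this frame, $\mathbf{R} = 0$. Therefore:
\[ 0 = \frac{m_1}{M}\mathbf{r}_1 + \frac{m_2}{M}\mathbf{r}_2. \tag{3.87} \]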
Exercise 14 (Mass Reduction Strategies) In her effort to reduce her mass, Sarah
decides to use the following ‘reduced mass’ formula:
\[ \mu := \frac{m_1 m_2}{M} = \frac{m_1 m_2}{m_1 + m_2} \tag{3.88} \]
to simplify her exercise routine. One of her exercises (set by Dogburn), is to
prove that we can simplify the angular momentum expression to:
\[ \mathbf{L} = \mathbf{r} \times \mu\dot{\mathbf{r}}, \tag{3.89} \]
where $\mathbf{r} := (\mathbf{r}_1 - \mathbf{r}_2)$ is the relative displacement vector, as before.
Help Sarah by deriving the above expression for the total angular momentum in
terms of the reduced mass $\mu$ and relative displacement $\mathbf{r}$.
Hint: Work in the center-of-mass frame and use the relation
\[ 0 = \frac{m_1}{M}\mathbf{r}_1 + \frac{m_2}{M}\mathbf{r}_2 \tag{3.90} \]
along with the earlier expression for $\mathbf{L}$ in terms of $\mathbf{r}_1$ and $\mathbf{r}_2$.
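Using the reduced expression for the total angular momentum in the CM frame,
we see that the torque on the system vanishes in the absence of external forces:
\[ \begin{aligned} \frac{d}{dt}\mathbf{L} &= \dot{\mathbf{r}} \times \mu\dot{\mathbf{r}} + \mathbf{r} \times \mu\ddot{\mathbf{r}} \\ &= 0 + \mathbf{r} \times \mathbf{F}_{grav} \\ &= 0. \end{aligned} \tag{3.91} \]
It follows that the angular momentum $\mathbf{L}$ is constant and hence the motion of the
bodies occurs in a plane with $\mathbf{L}$ as its normal vector.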
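The reduced-mass expression for the angular momentum can be spot-checked numerically before it is derived. The sketch below picks arbitrary illustrative masses, a relative displacement and a relative velocity (all hypothetical), places the bodies in the CM frame using $m_1\mathbf{r}_1 + m_2\mathbf{r}_2 = 0$ together with $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$, and compares the two-body sum (3.85) with $\mathbf{r} \times \mu\dot{\mathbf{r}}$.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as lists."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def scale(k, u):
    return [k * component for component in u]


def add(u, v):
    return [p + q for p, q in zip(u, v)]


m1, m2 = 3.0, 1.0                 # illustrative masses
M = m1 + m2
mu = m1 * m2 / M                  # reduced mass

r = [1.0, 2.0, 0.5]               # relative displacement r1 - r2 (arbitrary)
v = [-0.3, 0.7, 0.2]              # relative velocity (arbitrary)

# CM frame: m1*r1 + m2*r2 = 0 and r = r1 - r2 fix both positions (and velocities).
r1, r2 = scale(m2 / M, r), scale(-m1 / M, r)
v1, v2 = scale(m2 / M, v), scale(-m1 / M, v)

L_total = add(cross(r1, scale(m1, v1)), cross(r2, scale(m2, v2)))
L_reduced = cross(r, scale(mu, v))

mismatch = max(abs(p - q) for p, q in zip(L_total, L_reduced))
print(L_total, L_reduced, mismatch)
```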

Exercise 15 (CERN Confirms Existence of The Force) On April 1, 2015, the
European Organization for Nuclear Research (CERN), confirmed the existence
of ‘the force’. To verify their claim¹², prove that Newton’s universal gravitational
force law takes the following form in reduced-mass coordinates:
\[ \mathbf{F}_{grav} = -\frac{GM\mu}{r^2}\hat{\mathbf{r}}, \tag{3.92} \]
where $r = \|\mathbf{r}\|$ is the magnitude of the relative displacement vector and $\hat{\mathbf{r}} := \frac{1}{r}\mathbf{r}$
is a unit vector in the direction of $\mathbf{r}$. Since $\mathbf{r}$ points from mass 2 towards mass 1 and gravity is attractive,
the force written above is that experienced by mass 1.
Now, re-express Newton’s Second Law for mass 1,
\[ \mathbf{F} = \dot{\mathbf{p}}_1 = m_1\ddot{\mathbf{r}}_1, \tag{3.93} \]
in terms of the relative displacement vector, $\mathbf{r}$.
Hint: Work in the CM frame as before when we proved the angular momentum
vector was constant.
Our overall equation of motion for our center of mass $\mathbf{R}$ is given by the differential equation:
\[ M\ddot{\mathbf{R}} = 0. \tag{3.94} \]
This just tells us that the center of mass moves with constant linear velocity –
i.e. we are working in a translating, non-accelerating frame. Recall that to solve
the two body problem, we were required to derive the trajectories $(\mathbf{r}_1, \mathbf{r}_2)$, or
equivalently $(\mathbf{R}, \mathbf{r})$, with respect to some parameter (e.g. time). Since in the CM frame the above
differential equation gives $\mathbf{R}(t) = \mathbf{R}_0$ for all time, it remains to solve
for the trajectory, $\mathbf{r}(t)$, of the relative displacement vector. To this end, we
combine Newton’s Law of Universal Gravitation with Newton’s Second Law, to
obtain the following vector differential equation (a system of scalar differential
equations):
\[ -\frac{GM\mu}{r^2}\hat{\mathbf{r}} = \mu\ddot{\mathbf{r}}, \tag{3.95} \]
which simplifies to:
\[ \ddot{\mathbf{r}} = -\frac{GM}{r^2}\hat{\mathbf{r}}. \tag{3.96} \]
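Equation (3.96) can also be explored numerically before solving it analytically. The sketch below is one possible approach, not the method of these notes: it integrates the planar equation of motion with the velocity-Verlet scheme in illustrative units where $GM = 1$, starting from an arbitrarily chosen bound initial state. It then monitors the angular momentum per unit mass and the energy per unit mass along the trajectory.

```python
import math

GM = 1.0   # illustrative units with G*M = 1 (an assumption, not from the notes)


def acceleration(x, y):
    """Right-hand side of (3.96): magnitude GM / r^2, directed towards the origin."""
    r = math.hypot(x, y)
    return -GM * x / r**3, -GM * y / r**3


def integrate(x, y, vx, vy, dt, steps):
    """Velocity-Verlet integration; returns the list of visited (x, y, vx, vy) states."""
    states = [(x, y, vx, vy)]
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift
        y += dt * vy
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        states.append((x, y, vx, vy))
    return states


def angular_momentum(state):
    """z-component of r x v (angular momentum per unit mass)."""
    x, y, vx, vy = state
    return x * vy - y * vx


def energy(state):
    """Kinetic plus potential energy per unit mass."""
    x, y, vx, vy = state
    return 0.5 * (vx**2 + vy**2) - GM / math.hypot(x, y)


states = integrate(1.0, 0.0, 0.0, 0.8, dt=1e-3, steps=20000)
h0, e0 = angular_momentum(states[0]), energy(states[0])
h_drift = max(abs(angular_momentum(s) - h0) for s in states)
e_drift = max(abs(energy(s) - e0) for s in states)
radii = [math.hypot(s[0], s[1]) for s in states]
print(h_drift, e_drift, min(radii), max(radii))
```

The radius oscillates between a fixed minimum and maximum, as expected for a closed bound orbit. Each kick changes the velocity along $\mathbf{r}$ and each drift changes the position along $\dot{\mathbf{r}}$, so this particular scheme leaves $\mathbf{r} \times \dot{\mathbf{r}}$ unchanged up to rounding.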
Since we have proved that the motion occurs in a 2D plane, we are working
in a 2-dimensional vectors – which is necessarily by spanned by two linearly-
independent vectors. Now, we know that the above vectors appearing in the
differential equation are equal iff their components (in some basis) are equal –
we can use this extract a system of scalar (component) differential equations.
Although we could work in Cartesian coordinates, the symmetry of our problem
suggests a far more natural coordinate system – polar (‘circular’) coordinates.
However, not only do we need polar coordinates – we need polar coordinate
basis vectors. Therefore, we perform the following change of basis and change
of variables:
ˆr :=∂r = cos(θ)∂x + sin(θ)∂y
ˆθ :=∂θ = − sin(θ)∂x + cos(θ)∂y
x =r cos(θ)
y =r sin(θ), (3.97)
12
Disclaimer: April Fools

where ˆr, ˆθ (or ∂r and ∂θ) are unit vectors in the radial and angular (tangential)
directions and ∂x, ∂y are unit vectors in the x and y coordinate directions, i.e.
the standard Cartesian basis vectors (see footnote 13).
Problem 22 (Bipolar) One therapy for bipolar problems, is to derive the change-
of-basis relations between a standard (Cartesian) basis and a polar coordinate
basis. To help your friend, complete the following problems.
1. Trigonometric Derivation: Derive the change of basis {∂x, ∂y} → {∂r, ∂θ},
from Cartesian to polar (circular) coordinates, by drawing the x, y and r, θ
coordinate lines on a plane and using trigonometry. Note that at any point
in the plane $\mathbb{R}^2$, we have two Cartesian basis vectors; these are
invariant in the sense that we can transport them around without changing
them. However, the polar basis is only defined away from the origin (since
θ is singular at the origin); more importantly, the polar basis vectors change
for different θ (but do not vary with respect to r).
2. Chain-Rule and Differential Operator Derivation: We can alternatively
use the fact that the basis vectors for Cartesian and polar coordinates, are
in fact the tangent vectors to the cartesian and polar coordinate curves.
Differential geometry tells us that tangent vectors correspond to differen-
tial operators in the ‘obvious’ way. To see this, show that the following
operators (partial derivatives) coincide with the polar basis vectors you
constructed geometrically:
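\[
\begin{aligned}
\partial_r &= \left(\frac{\partial x}{\partial r}\right)\partial_x + \left(\frac{\partial y}{\partial r}\right)\partial_y \\
\partial_\theta &= \left(\frac{\partial x}{\partial \theta}\right)\partial_x + \left(\frac{\partial y}{\partial \theta}\right)\partial_y,
\end{aligned}
\tag{3.98}
\]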
using the expressions for x and y in polar coordinates.
3. Group-Theoretic Derivation: Some of you will recall the one-to-one
correspondence between linear operators and matrices. In this particular case,
we note that the polar basis is obtained by rotating the Cartesian basis vectors
through an angle θ anti-clockwise. In particular, recall that the 2-by-2
matrix which rotates vectors in $\mathbb{R}^2$ in this manner is given (w.r.t.
the standard basis) by:
\[
R(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}
\tag{3.99}
\]
Hence, show that the following rotation of the Cartesian basis vectors
\[
R(\theta)\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
R(\theta)\begin{pmatrix} 0 \\ 1 \end{pmatrix}
\tag{3.101}
\]
results in the polar basis vectors derived with earlier methods.
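The group-theoretic derivation can be spot-checked numerically. The following is a minimal sketch, not part of the original notes: the function names `rotation_2d`, `apply` and `polar_basis` are hypothetical, and the Cartesian basis vectors are represented as the column vectors (1, 0) and (0, 1).

```python
import math

def rotation_2d(theta):
    """2x2 anticlockwise rotation matrix R(theta), stored as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def apply(matrix, vector):
    """Matrix-vector product for small dense matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]

def polar_basis(theta):
    """Return (r_hat, theta_hat) in Cartesian components by rotating e_x, e_y."""
    R = rotation_2d(theta)
    return apply(R, [1.0, 0.0]), apply(R, [0.0, 1.0])

theta = math.radians(30)
r_hat, theta_hat = polar_basis(theta)
print(r_hat)      # components (cos θ, sin θ)
print(theta_hat)  # components (-sin θ, cos θ)
```

Comparing the printed components with the first two lines of (3.97) gives a quick sanity check on a hand derivation; it does not replace the derivation itself.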
Before we can finish expanding our equation of motion (a vector differential
equation) in the polar basis, we must make the following critical observation:
“Although the Cartesian basis vectors do not vary with respect to time t, the
polar basis vectors do.” This is because the polar coordinate basis vectors are
functions of position (in fact only of θ), which in turn varies in time.
13
Note that we use partial derivative notation to suggest the correspondence between tangent
vectors and differential operators. In particular, this makes it easy to memorize and derive the
change of basis formulae.

Using the chain-rule and the expressions for the polar basis vectors (in terms of
the Cartesian basis) derived previously, one can derive the following expressions
for relative displacement, velocity and acceleration:
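\[
\begin{aligned}
\mathbf{r} &= r(\cos(\theta)\,\partial_x + \sin(\theta)\,\partial_y) = r\,\partial_r = r\,\hat{r} \\
\dot{\mathbf{r}} &= \dot{r}\,\partial_r + r\dot{\theta}\,\partial_\theta = \dot{r}\,\hat{r} + r\dot{\theta}\,\hat{\theta} \\
\ddot{\mathbf{r}} &= (\ddot{r} - r\dot{\theta}^2)\,\hat{r} + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\hat{\theta}.
\end{aligned}
\tag{3.102}
\]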
Exercise 16 Verify the above statement by filling in the details of this derivation.
Hint: Before differentiating each vector, express it in the Cartesian basis ∂x, ∂y
(but keep the polar coordinates r, θ), then differentiate. Use the expressions for
the polar coordinate basis vectors to then simplify the resulting vector (expressed
in the Cartesian basis) as a linear combination of polar basis vectors.
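A numerical cross-check of the acceleration formula in (3.102) can catch sign errors in a hand derivation. The sketch below is illustrative only: the trajectory r(t), θ(t) is a hypothetical test function chosen because its derivatives are known exactly, and the Cartesian acceleration is approximated by central differences rather than derived.

```python
import math

# Hypothetical smooth test trajectory with known time derivatives.
def r_of(t):       return 1.0 + 0.3 * t**2
def r_dot(t):      return 0.6 * t
def r_ddot(t):     return 0.6
def theta_of(t):   return 0.5 * t + 0.1 * t**2
def theta_dot(t):  return 0.5 + 0.2 * t
def theta_ddot(t): return 0.2

def position(t):
    """Cartesian components of r = r (cos θ ∂x + sin θ ∂y)."""
    return (r_of(t) * math.cos(theta_of(t)), r_of(t) * math.sin(theta_of(t)))

def acceleration_cartesian(t, h=1e-3):
    """Second time derivative of the position by central differences."""
    before, here, after = position(t - h), position(t), position(t + h)
    return tuple((b - 2.0 * c + a) / h**2 for b, c, a in zip(before, here, after))

def acceleration_polar_formula(t):
    """(radial, tangential) components predicted by (3.102)."""
    radial = r_ddot(t) - r_of(t) * theta_dot(t) ** 2
    tangential = r_of(t) * theta_ddot(t) + 2.0 * r_dot(t) * theta_dot(t)
    return radial, tangential

def acceleration_polar_numeric(t):
    """Project the finite-difference acceleration onto r_hat and theta_hat."""
    ax, ay = acceleration_cartesian(t)
    c, s = math.cos(theta_of(t)), math.sin(theta_of(t))
    return ax * c + ay * s, -ax * s + ay * c

print(acceleration_polar_formula(1.0))
print(acceleration_polar_numeric(1.0))
```

The two printed pairs should agree up to the finite-difference error, which is small for the step size assumed here.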
Therefore, our relative-displacement equation of motion becomes:
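\[
(\ddot{r} - r\dot{\theta}^2)\,\hat{r} + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\hat{\theta}
= \left(-\frac{GM}{r^2}\right)\hat{r} + 0\,\hat{\theta}.
\tag{3.103}
\]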
We therefore get the two ordinary differential equations:
\[
\begin{aligned}
\ddot{r} - r\dot{\theta}^2 &= -\frac{GM}{r^2} \\
r\ddot{\theta} + 2\dot{r}\dot{\theta} &= 0.
\end{aligned}
\tag{3.104}
\]
Now note that the magnitude of angular momentum is given by:
$L := |\mathbf{L}| = |\mathbf{r} \times \mu\dot{\mathbf{r}}| = \mu r^2 \dot{\theta}$, hence:
\[ \ddot{\theta} = -\frac{2 L \dot{r}}{r^3 \mu}. \tag{3.105} \]
Since angular momentum is constant, we can decouple the θ variable from the
radial differential equation. In particular, we shall use the substitution
\[ \dot{\theta} = \frac{L}{r^2 \mu}, \tag{3.106} \]
to remove the θ variable from the r-component of the vector differential equation.
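The claim that $r^2\dot{\theta}$ stays constant along a solution of (3.104) can be observed numerically. The sketch below is an illustration under stated assumptions: units are chosen so that GM = 1, the initial conditions are hypothetical, and the classical fourth-order Runge-Kutta scheme is a choice made here, not a method taken from the notes.

```python
def derivatives(state, gm=1.0):
    """Right-hand side of (3.104) written as a first-order system."""
    r, r_dot, theta, theta_dot = state
    return (r_dot,
            r * theta_dot**2 - gm / r**2,    # radial equation
            theta_dot,
            -2.0 * r_dot * theta_dot / r)    # angular equation

def rk4_step(state, dt, gm=1.0):
    """Advance the state by one classical Runge-Kutta step."""
    def nudge(k, factor):
        return tuple(s + factor * k_i for s, k_i in zip(state, k))
    k1 = derivatives(state, gm)
    k2 = derivatives(nudge(k1, dt / 2.0), gm)
    k3 = derivatives(nudge(k2, dt / 2.0), gm)
    k4 = derivatives(nudge(k3, dt), gm)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, steps, dt, gm=1.0):
    """Apply rk4_step repeatedly and return the final state."""
    for _ in range(steps):
        state = rk4_step(state, dt, gm)
    return state

def angular_momentum_per_mass(state):
    """L / µ = r² θ̇."""
    r, _, _, theta_dot = state
    return r * r * theta_dot

start = (1.0, 0.0, 0.0, 1.2)   # (r, ṙ, θ, θ̇), hypothetical bound orbit
end = integrate(start, 5000, 1e-3)
print(angular_momentum_per_mass(start), angular_momentum_per_mass(end))
```

The two printed values should agree to many digits, which is the numerical counterpart of the second equation in (3.104).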
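Now, rather than solving for r and θ as functions of time, we can get the orbit
(the solution of the vector differential equation) by parametrising r in terms
of the polar angle, θ. To do this, we must make use of the chain-rule:
\[
\begin{aligned}
\dot{r} &:= \frac{dr}{dt} = \frac{dr}{d\theta}\frac{d\theta}{dt} =: r'\dot{\theta} \\
\ddot{r} &= \frac{d\dot{r}}{dt} = r''\dot{\theta}^2 + r'\ddot{\theta},
\end{aligned}
\tag{3.107}
\]
where primes denote derivatives with respect to θ.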
Exercise 17 Substitute the expressions for $\dot{r}$, $\ddot{r}$ in terms of
$r' := dr/d\theta$ and $r'' := d^2r/d\theta^2$ into the differential equations
we have been working with. In particular, use the θ component of the vector
differential equation (involving $\ddot{\theta}$) to re-write the $\ddot{\theta}$
which appears, in terms of $\dot{r}$ and r.

Show that after these substitutions, we obtain the following second-order
non-homogeneous differential equation of a single variable, r:
\[
\frac{L^2}{r^4\mu^2}\left[\frac{d^2 r}{d\theta^2} - \frac{2}{r}\left(\frac{dr}{d\theta}\right)^2 - r\right]
= -\frac{GM}{r^2}.
\tag{3.108}
\]
Hint: Do not eliminate $\dot{\theta}$ until the very final step, where you can
substitute the expression:
\[ \dot{\theta} = \frac{L}{r^2 \mu}. \tag{3.109} \]
Remark: Note that we have reduced a system of coupled, second order differential
equations into a single differential equation of a single variable. This
was achievable because we used a constant of motion, the angular momentum
(proportional to $\dot{\theta}$), to eliminate some degrees of freedom for the
motion of our objects. In particular, we reduced the dimensionality of the phase
space for the orbit.
Finally, we make the judicious change of variables (to reciprocal radius), r → s
with
\[ s := \frac{1}{r}. \tag{3.110} \]
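To this end, we make use of the chain rule again:
\[
\begin{aligned}
\frac{dr}{d\theta} &= -\frac{1}{s^2}\frac{ds}{d\theta} \\
\frac{d^2 r}{d\theta^2} &= \frac{2}{s^3}\left(\frac{ds}{d\theta}\right)^2 - \frac{1}{s^2}\frac{d^2 s}{d\theta^2}.
\end{aligned}
\tag{3.111}
\]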
Exercise 18 Verify the above identities by using the chain rule yourself. Substitute
the above identities into our simplified differential equation. After some algebra,
you should be able to obtain the final form of our original differential equation:
\[ \frac{d^2 s}{d\theta^2} + s = \frac{GM\mu^2}{L^2}. \tag{3.112} \]
Note that the term on the right-hand side of this equation is a constant since the
(reduced) mass and angular momentum are conserved.
Those of you who have studied ordinary differential equations should realize
that this is a linear, second-order ordinary differential equation with a
non-homogeneous term: $GM\mu^2/L^2$. Such ODEs can be solved by obtaining their
characteristic equation, finding the homogeneous solution, then adding a particular
solution to the homogeneous solution to obtain the general solution.
Exercise 19 Write the homogeneous second order ODE corresponding to our
non-homogeneous ODE. Solve the characteristic equation to obtain a homogeneous
solution of the form:
\[ s_h(\theta) = A\sin(\theta) + B\cos(\theta), \tag{3.113} \]
where A, B are constants determined by initial conditions.
Show, using trigonometric identities, that the homogeneous solution can be
re-written in the form:
\[ s_h(\theta) = \lambda\cos(\theta - \theta_0), \tag{3.114} \]

where λ, θ0 are some constants related to A, B (i.e. find the relations).
Now show that we have the following particular solution:
\[ s_p = \frac{GM\mu^2}{L^2}. \tag{3.115} \]
Therefore, our general solution is given by:
\[ s(\theta) = s_p(\theta) + s_h(\theta) = \lambda\cos(\theta - \theta_0) + \frac{GM\mu^2}{L^2}. \tag{3.116} \]
Re-write this solution so that it takes the following form:
\[ s(\theta) = C\,[1 + \epsilon\cos(\theta - \theta_0)]. \tag{3.117} \]
In particular, find the constants C and ε in terms of λ and $GM\mu^2/L^2$.
Since our general trigonometric solution is given by
\[ s = \frac{GM\mu^2}{L^2}\,(1 + \epsilon\cos(\theta - \theta_0)), \tag{3.118} \]
we can choose our coordinates so that θ = 0 when s is maximal (r is minimal),
so that θ0 = 0; this corresponds to rotating our (r, θ) polar coordinate system.
Changing s back to our radial coordinate, r = 1/s, we thus have:
\[ r(\theta) = \frac{L^2}{GM\mu^2}\,\frac{1}{1 + \epsilon\cos(\theta)}, \tag{3.119} \]
where ε is the eccentricity of the orbit (conic section). For 0 < ε < 1 we get
elliptical orbits, with two foci. For the special case ε = 0 (which corresponds
to λ = 0, i.e. to special initial conditions), we get circular orbits. For the
Sun-Earth orbit, we have ε ≈ 0, so to some approximation we can treat the
orbit as circular.
The bound orbits for the Kepler problem are given by conic sections
with ε < 1. For ε > 1, we get hyperbolae and for ε = 1 we get parabolic
‘orbits’; these orbits are not ‘bound’ and hence not periodic. They may describe
incoming comets or asteroids as they are gravitationally slingshotted by the sun
out of the solar system.
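The orbit equation (3.119) and the classification by eccentricity can be explored with a few lines of code. This is a minimal sketch rather than anything from the notes: the parameter `p` is an assumed shorthand for the constant $L^2/(GM\mu^2)$, the function names are hypothetical, and `ode_residual` checks (3.112) by central differences instead of analytically.

```python
import math

def orbit_radius(theta, p, eccentricity):
    """r(θ) = p / (1 + ε cos θ), with p standing for L² / (G M µ²)."""
    return p / (1.0 + eccentricity * math.cos(theta))

def classify_orbit(eccentricity):
    """Name the conic section for a given eccentricity ε ≥ 0."""
    if eccentricity < 0:
        raise ValueError("eccentricity must be non-negative")
    if eccentricity == 0:
        return "circle"
    if eccentricity < 1:
        return "ellipse"
    if eccentricity == 1:
        return "parabola"
    return "hyperbola"

def ode_residual(theta, p, eccentricity, h=1e-3):
    """s'' + s - 1/p for s = 1/r, with s'' taken by central differences."""
    def s(angle):
        return 1.0 / orbit_radius(angle, p, eccentricity)
    s_second = (s(theta - h) - 2.0 * s(theta) + s(theta + h)) / h**2
    return s_second + s(theta) - 1.0 / p

p, eps = 1.0, 0.5
print(classify_orbit(eps))             # ellipse
print(orbit_radius(0.0, p, eps))       # closest approach, p / (1 + ε)
print(orbit_radius(math.pi, p, eps))   # farthest point, p / (1 - ε)
print(ode_residual(1.0, p, eps))       # close to zero
```

For ε ≥ 1 the denominator vanishes at some angles, so `orbit_radius` is only meaningful on the range of θ where 1 + ε cos θ > 0; that restriction is the numerical face of an unbound orbit.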
Implications – Kepler’s laws of Planetary Motion
3.4.4 Kepler’s Laws
Corollaries (K laws)
Derive period of Earth’s orbit
3.4.5 Superintegrability and Constants of Motion
Conservation of Energy
Solution 2: Laplace-Runge-Lenz vector ...
Casimirs and rotation algebra
Hydrogen two-body problem and solution with Pauli vector.

3.5 Hyperbolae, Comets and Atomic Scattering
Hyperbolae
Hyperbolic Trigonometry
Hyperbolic Metric Spaces (special relativity: hyperbolic reverse Cauchy-Schwarz
inequality, light cones and foliation) and manifolds (elementary concepts)
3.6 General Relativistic Corrections
No actual ‘GR’ needed to derive/analyse orbit equation. Just state the
next-order terms added to the Newtonian potential for an observer in the
Schwarzschild spacetime. From this potential, construct the ‘Newtonian force’ or
Lagrangian to obtain the resulting differential equations.
Show how the DEs can be solved rather simply by a slight modification to the
solution for Newtonian gravity.

Chapter 4
Physics in Non-Inertial Frames
In this series of exploration studies, we will investigate the mathematics of
accelerating reference frames and the isometries of Euclidean space: the rotation
and translation groups. Such mathematics was extensively developed throughout
the 17th, 18th and 19th centuries and is elegantly unified in a higher realm:
the theory of Lie Groups / Lie Algebras, Clifford Algebras and Differential
Geometry. Nonetheless, we shall study these topics in their most basic form, one
accessible to first year tertiary students.
We will apply our mathematical structures to study the dynamics of rigid bodies
moving in Euclidean space. In particular, we aim to explore and develop a
working knowledge of angular momentum, moments of inertia and rotational
symmetry operations. Once these foundations are reviewed, we may move onto
the dynamics of objects in linearly and angularly (rotating) accelerating frames.
Such dynamics is then applied to understand the ‘Coriolis effect’, a natural
phenomenon. Our journey ends with a brief study and solution of the ingenious
‘Foucault’s Pendulum’, one of the first devices built to demonstrate the rotation
of the earth about its axis.
4.1 The Lie Group of Rotations: Design a Death
Star
In this section, we investigate how one can apply the theory of the Lie groups
and Lie algebras to the construction and design of an orbital death star 1
– in
particular, an orbital space station equipped with high intensity Bose-Einstein
condensate based gamma-ray LASERS, naval anti-missile lasers, electromag-
netic rail guns and nuclear warheads.
When it comes to military technology, the most advanced science often takes
place in the form of weapons targeting, tracking and detection systems – a re-
cent example is the huge investment in stealth technology and C.I.A drone re-
connaissance by the United States military. This is because target detection and
acquisition is paramount – after all, you can’t eliminate something if you can’t
detect it and aim at it. Even master Sun Tzu understood the importance of this
element of warfare 2
. To this extent, we will see how the rotational Lie groups
1
For those of you who haven’t seen Star Wars, a death star is a large spherical-ish spaceship,
the size of a small moon, equipped with a beam weapon which can destroy entire planets.
2
For those of you who need to read more – Sun Tzu’s “Art of War”. The Giles translation is
recommended.
and Lie algebras, realized in matrix form, can be used to orient an orbital space
station along with the gun turrets it is equipped with. We conclude by looking
at the quaternionic representation of the rotation group – which leads us to the
first solid historical example of an abstract algebra (a ‘generalization’ of com-
plex numbers), constructed by the famous Irish polymath – Sir William Rowan
Hamilton.
This tutorial will make use of matrices and matrix algebra, abstract algebras
and group theory, vectors, rotations and various physical concepts. As such it
should be mastered by engineering, physics, computer science and math students
alike. Hopefully, it will unify and consolidate various areas of your studies – and
maybe convince you to get a job in weapons design/satellite programming.
4.1.1 Notation
For this tutorial, we will be sticking to Einstein notation: whenever an index
is repeated in some quantity, we sum that quantity over all possible values of
that index (omitting the summation symbol Σ). So for example, we denote a
3-dimensional real vector v in terms of a standard basis e1, e2, e3 as:
\[ v = v^i e_i, \tag{4.1} \]
where the contracted index i ranges across i = 1, 2, 3:
\[ v^i e_i := v^1 e_1 + v^2 e_2 + v^3 e_3. \tag{4.2} \]
As before, we keep one index raised and one index lowered for a pair of repeated
indices (see footnote 3). Furthermore, components of vectors are raised; hence
$v^j$ refers to the j-th component of the vector v (not the j-th power), for
example. You may be familiar with representing a vector by its components,
$v = (v^1, v^2, ..., v^n)$; this notation is fine, yet elementary, as it hides
the choice of basis (which is assumed to be the standard basis) by only
displaying the components of the vector.
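The remark that a bare list of components hides the choice of basis can be made concrete. The sketch below is illustrative: `expand` is a hypothetical helper that carries out the sum $v^i e_i$ explicitly, and `skewed_basis` is an arbitrary non-standard basis invented for the example.

```python
def expand(components, basis):
    """Return v = v^i e_i, summing over the repeated index i."""
    dimension = len(basis[0])
    return [sum(v_i * e_i[k] for v_i, e_i in zip(components, basis))
            for k in range(dimension)]

standard_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
skewed_basis = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]  # hypothetical

components = [2.0, -1.0, 3.0]
print(expand(components, standard_basis))  # [2.0, -1.0, 3.0]
print(expand(components, skewed_basis))    # [2.0, 1.0, 6.0]
```

The same three components describe two different vectors, which is why the notation $v = v^i e_i$ keeps the basis visible.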
4.1.2 BFF: Linear Maps and Matrices
As one progresses in the mathematical sciences, one frequents the land of matrix
operations, for proofs, problems and simplifying calculations. Perhaps the main
reason for their popularity is that, once a basis is chosen, there is a
one-to-one correspondence between matrices and linear maps on
(finite-dimensional) vector spaces. In particular, a linear map ˆL on a
vector space V (e.g. 3-dimensional Euclidean space $\mathbb{R}^3$) is defined as follows.
Definition 10 A linear map ˆL : V → V which maps the vector space V to itself,
is one which has the following property:
• Linearity:
ˆL(au + bw) = aˆL(u) + bˆL(w) ∀u, w ∈ V, ∀a, b ∈ F (4.3)
where F is some number field (e.g. the real numbers R or the complex
numbers C).
3
A convention which matters in non-Euclidean spaces, since it helps to distinguish covariant
tensors (e.g. covectors such as the total differential) from contravariant ones (e.g. your usual
vectors).

How does this correspond to matrices? Notice that we can represent a vector
$v = v^i e_i := v^1 e_1 + ... + v^n e_n$ in an n-dimensional vector space
(e.g. $\mathbb{R}^n$) as a column vector:
\[
v = \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix}
\tag{4.4}
\]
Then we can readily compute the action of some matrix on this vector via matrix
multiplication. In particular, the action of an n × n matrix A on an n-dimensional
vector v will produce another n-dimensional vector, u = Av, which we call
the transformation of the vector v by the matrix A. For example:
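\[
Av =
\begin{pmatrix}
A^1_1 & A^1_2 & \cdots & A^1_n \\
A^2_1 & A^2_2 & \cdots & A^2_n \\
\vdots & \vdots & \ddots & \vdots \\
A^n_1 & A^n_2 & \cdots & A^n_n
\end{pmatrix}
\begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix}
=
\begin{pmatrix}
A^1_1 v^1 + A^1_2 v^2 + \dots + A^1_n v^n \\
A^2_1 v^1 + A^2_2 v^2 + \dots + A^2_n v^n \\
\vdots \\
A^n_1 v^1 + A^n_2 v^2 + \dots + A^n_n v^n
\end{pmatrix}
\tag{4.5}
\]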
Alternatively, in Einstein notation, the action of the matrix A on the vector v is
given by:
\[ u = A^i_j v^j e_i \tag{4.6} \]
where the component $A^i_j$ of the matrix A is the entry in the ith row and jth
column of A (see footnote 4). The contracted indices i and j run over 1 to n (the
dimension of the vector space in which v lives).
Now, if one recalls, the action of matrices on vectors is linear: given any
scalars λ, γ and any n-dimensional vectors v and u, then for any n × n matrix
A we have:
\[ A(\lambda v + \gamma u) = \lambda A v + \gamma A u, \tag{4.7} \]
hence matrices obey the linearity property required by linear maps. In this sense,
we can think of the components of a matrix as the components of a linear map
in some chosen basis – conversely, by computing the action of a linear map ˆL
on a set of basis vectors {ej}, we can determine its components in that basis –
which we can view as entries in some matrix. To make this explicit with some
examples, we shall see how rotation maps can be realized in matrix form.
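The two directions of this correspondence can be sketched in code. The following is an illustration under stated assumptions: `apply` implements $u^i = A^i_j v^j$ with the raised index labelling rows, `matrix_of` recovers a matrix from the action of a map on the standard basis, and `shear_and_scale` is a hypothetical linear map invented for the example.

```python
def apply(matrix, vector):
    """u^i = A^i_j v^j: the row index i is free, the column index j is summed."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, vector)) for row in matrix]

def matrix_of(linear_map, dimension):
    """Build the matrix whose j-th column is linear_map(e_j)."""
    columns = []
    for j in range(dimension):
        e_j = [1.0 if k == j else 0.0 for k in range(dimension)]
        columns.append(linear_map(e_j))
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(dimension)] for i in range(dimension)]

def shear_and_scale(v):
    """A hypothetical linear map on R^2 used only for illustration."""
    return [v[0] + 2.0 * v[1], 3.0 * v[1]]

A = matrix_of(shear_and_scale, 2)
print(A)                            # [[1.0, 2.0], [0.0, 3.0]]
print(apply(A, [4.0, 5.0]))         # [14.0, 15.0]
print(shear_and_scale([4.0, 5.0]))  # [14.0, 15.0]
```

The agreement of the last two lines for one vector is only a spot check; the general statement follows from linearity, since every vector is a combination of the basis vectors.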
Exercise 20 (Apocalypse Now) Being quite bored of mathematics, physics, sword-
play, music and games, Thomas McKenney chooses to partake in a new pastime
– world domination. He decides the best way to undertake this, is to build his
own star wars-inspired Orbital Death Star. The St. George’s College Board
decides to fund Thomas in this pursuit – agreeing that world domination ﬁts
into the cultural expansion program as well as securing funding for building
maintenance. To this extent, Thomas realizes he must complete the St. George’s
College Mathematical Sciences tutorials in order to prepare his laser targeting
algorithms. To aid Thomas in this noble enterprise, think of a way to
mathematically express the statement: “by computing the action of a linear map ˆL
on a set of basis vectors {ej}, we can determine its components in that basis.”
Hint: Compare the action of a linear map ˆL on a vector v with the action of
some matrix A on v – in particular, compare the coefﬁcients of standard basis
vectors {ej} in the resulting transformed vectors: ˆL(v) and Av. Now look at
the special case when v is simply equal to one of the standard basis vectors ej.
4
Rows have a raised index and columns have a lowered index – taking the transpose of the
matrix reverses this.

4.1.3 SO(3): The Lie Group of Rotations
In 3-dimensional Euclidean space, there are three independent axes of
rotation in any given coordinate system. Rotations of vectors are linear maps –
to see this, complete the following exercise.
Exercise 21 (Microsoft Death Star) Linear operations are nice – firstly because
they are relatively simple and second because they can be represented by matri-
ces, meaning that they are easy to program and implement into computer algo-
rithms. Therefore, to build a feasible laser targeting system, one would hope that
programming the rotation of the laser turret amounts to linear operations. Tak-
ing an interest in weapons targeting systems, Emma Krantz decides to program
such a system for her programming competition – to assess the feasibility, she
has to prove that rotations are linear operations.
Let ˆR represent some 3-dimensional rotation operation and v be some 3-dimensional
vector. Argue geometrically that the action of ˆR on the vector v is linear – i.e.
show that ˆR satisfies the linearity property required by a linear map.
Hint: Given a 3-dimensional vector v, we can always scale it by some number
λ ∈ R. If |λ| > 1 we dilate the length of the vector and if |λ| < 1 we contract
it. Furthermore, if λ > 0 we preserve the orientation of the vector and if λ < 0
we reverse it. Argue that scaling first, v → λv, and then rotating the resulting
vector λv is the same as first rotating v and then scaling it by λ; this shows
that rotation is a degree 1 homogeneous operation.
Hint XP: Further show that adding two vectors v + u and then rotating the
sum of the two vectors, is the same as rotating each of the vectors separately (by
the same rotation) and then adding the individual rotated vectors. This shows
that rotations are additive operations – if you combine this property with the
degree 1 homogeneous property, this gives the linearity property which proves
that rotations are linear maps.
If one sets up a 3-dimensional Cartesian coordinate system, with coordinates
x, y, z (or $x^1, x^2, x^3$) and standard basis vectors e1, e2, e3 corresponding
to unit vectors in the x, y and z directions, respectively, then one has three
independent rotation operators R1, R2 and R3 which rotate vectors about each of
the corresponding axes (x, y and z). These are linear maps and hence can be
represented as 3 × 3 matrices. We can also view them as functions of the angle
which they rotate by. Explicitly, these matrices are:
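\[
R_1(\theta) =
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\theta & -\sin\theta \\
0 & \sin\theta & \cos\theta
\end{pmatrix}
\tag{4.8}
\]
\[
R_2(\beta) =
\begin{pmatrix}
\cos\beta & 0 & \sin\beta \\
0 & 1 & 0 \\
-\sin\beta & 0 & \cos\beta
\end{pmatrix}
\tag{4.9}
\]
\[
R_3(\gamma) =
\begin{pmatrix}
\cos\gamma & -\sin\gamma & 0 \\
\sin\gamma & \cos\gamma & 0 \\
0 & 0 & 1
\end{pmatrix}
\tag{4.10}
\]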

Geometrically, R1(θ) rotates any vector v anti-clockwise (see footnote 5) by an
angle θ about the x-axis; this means it rotates v in a plane perpendicular to the
x-axis. Similarly, R2(β) rotates by an angle β anticlockwise about the y-axis and
R3(γ) rotates anticlockwise by an angle γ about the z-axis.
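The matrices (4.8) to (4.10) translate directly into code, which makes the geometric statements easy to test on particular vectors. This is a minimal sketch: the names `rot_x`, `rot_y`, `rot_z` and `apply` are hypothetical, and matrices are stored as nested lists of rows.

```python
import math

def rot_x(angle):
    """R1: anticlockwise rotation about the x-axis, as in (4.8)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(angle):
    """R2: anticlockwise rotation about the y-axis, as in (4.9)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    """R3: anticlockwise rotation about the z-axis, as in (4.10)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

e1 = [1.0, 0.0, 0.0]
quarter_turn = math.pi / 2.0
print(apply(rot_z(quarter_turn), e1))  # approximately e2: x-axis turned onto y-axis
print(apply(rot_x(quarter_turn), e1))  # e1 itself: the rotation axis is unchanged
```

Because `math.cos(math.pi / 2)` is not exactly zero in floating point, the first result agrees with e2 only up to rounding error.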
Exercise 22 (Eigenvectors of Rotation) Clearly if you have a vector that lies
along the x-axis and you rotate it about the x-axis, nothing happens to the vector.
This is because any vector lying along the x-axis is an eigenvector of the
x-rotation matrix R1(θ) with eigenvalue 1. More generally, if we rotate any vector
$v = v^1 e_1 + v^2 e_2 + v^3 e_3$ about the j-th coordinate axis, then its j-th
component will not change.
Q: Using matrix multiplication and representing each vector as a column vector,
show that:
Rj(θ)ej = ej, (no summation) (4.11)
which means that the standard basis vector ej is an eigenvector of the rotation
operator Rj with eigenvalue 1.
Now, using the previous result and the fact that rotations are linear operators,
prove that (see footnote 6):
\[ R_j(\theta)\,v = \sum_{k \neq j} \left(v^k R_j(\theta)\,e_k\right) + v^j e_j, \tag{4.12} \]
where the summed index k ≠ j means you sum over all values (1, 2, 3) of k not
equal to j. Hence, rotations about a given axis preserve the component of any
vector along that axis.
Problem 23 (The Proof is Trivial) If you rotate a vector about an axis through
an angle of zero degrees, the vector should remain unchanged.
Verify that all three rotation operators Rj(θ) become the 3 × 3 identity matrix
(the matrix with 1’s down the main diagonal entries and zeros everywhere else)
when you set the angle θ = 0.
As it turns out, the set of rotation matrices forms a mathematical structure known
as a ‘Lie Group’. As such they are used for lying/truth algorithms. Actually
that’s a lie – they are actually a type of ‘continuous’ or rather ‘smooth’ group (as
opposed to a discrete group) named after the mathematician Sophus Lie, who
developed and pioneered them. Lie groups are of fundamental importance to
modern physics and mathematics – in fact, they are the core element underlying
major developments in particle physics 7
, high energy physics and gauge theory.
We define a Lie group as follows.
Definition 11 A Lie group G is a differentiable manifold which is also a group
whose group operations are smooth (infinitely differentiable). This means that G
equipped with the operation ∗ satisfies the group properties
• Closure/Binary Operation: If A, B ∈ G then A ∗ B ∈ G.
• Associativity: For any A, B, C ∈ G, A ∗ (B ∗ C) = (A ∗ B) ∗ C.
• Identity Element: ∃I ∈ G such that I ∗ A = A ∗ I = A, ∀A ∈ G.
5
Almost always in mathematics, anti-clockwise is considered to be a positive orientation and
clockwise is considered to be negative.
6
This is not using the Einstein summation convention, so $v^j e_j$ is for a fixed
value of j, not a sum over all possible values of the index j.
7
The gauge symmetries of the Standard Model of particle physics form a Lie group; this tells us
the symmetries that nature obeys for the electromagnetic, weak and strong nuclear forces.

• Inverses: For any A ∈ G, ∃B ∈ G such that A ∗ B = B ∗ A = I. If ∗ is a
multiplicative operation, we denote B = $A^{-1}$, the inverse of A. If ∗ is
additive (or commutative), we denote B by −A.
where ∗ is a binary operation (see footnote 8), e.g. matrix multiplication, which is smooth.
Exercise 23 (YOLO) Unaware of the on-going ‘Project Death Star’ of St. George’s
College, University Hall decides to hold a party to show how awesome they are.
After shouting YOLO, a drunken University Hall student jumps into a pit of horny
honey badgers and dies a humiliating death. Despite making it into the presti-
gious Darwin Awards, this is tragic because that student lived a life without ever
proving that the real numbers R form a group under addition, and that the
non-zero real numbers $\mathbb{R} \setminus \{0\}$ form a group under multiplication.
Using your wisdom and foresight to avoid a similar fate, prove that the real
numbers form a group under the addition operation + with 0 being the additive
identity element. Similarly, prove that the set of non-zero real numbers forms a
group under the multiplication operation × with 1 being the multiplicative
identity element. Together with commutativity and the distributive law, these
statements imply that the real numbers form a special mathematical structure
called a ‘field’.
Rotations form the Lie Group SO(3), which is the 3-dimensional ‘Special
Orthogonal Group’. This group is characterized as the set of 3 × 3 matrices {A}
which have the following properties (see footnote 9):
• det(A) = 1
• $A^T A = I$
for any rotation matrix A. Since the determinant of a linear map tells you how the
map distorts volumes, the first condition (the ‘Special’ part) says that rotations
preserve volumes and orientation; this is a consequence of the more general
observation that rotations are isometries of Euclidean space, meaning that they
preserve lengths of vectors and relative angles between vectors (rotating any pair
of vectors simultaneously leaves the angle between them unchanged). Furthermore,
since the second condition (the ‘Orthogonal’ part) can be written as:
\[ A^T = A^{-1} \tag{4.13} \]
where $A^{-1}$ is the inverse of the rotation matrix A, the second condition says
that rotations are orthogonal (see footnote 10) transformations, meaning they
preserve orthogonality of vectors (equivalently, the column vectors of a rotation
matrix are mutually orthogonal unit vectors). Hence, the second property comes
from the fact that isometries preserve angles between objects.
Note that the group operation for SO(3) is matrix multiplication – which is a
smooth operation since it essentially amounts to the multiplication and addition
of numbers.
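The two defining conditions of SO(3) are easy to test for a given 3 × 3 matrix. The sketch below is illustrative only: the helper names are hypothetical, the tolerance is an arbitrary choice to absorb rounding error, and a numerical check on one matrix is not a proof of the general statement asked for in the exercises.

```python
import math

def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]

def matmul(a, b):
    """Matrix product of two square matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_special_orthogonal(m, tolerance=1e-12):
    """Check det(A) = 1 and AᵀA = I up to the given tolerance."""
    product = matmul(transpose(m), m)
    orthogonal = all(abs(product[i][j] - (1.0 if i == j else 0.0)) < tolerance
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(m) - 1.0) < tolerance

c, s = math.cos(0.4), math.sin(0.4)
rotation_about_y = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

print(is_special_orthogonal(rotation_about_y))  # True
print(is_special_orthogonal(reflection))        # False: orthogonal, but det = -1
```

The reflection example shows why both conditions are needed: it preserves lengths and angles, yet it reverses orientation and so is not a rotation.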
Exercise 24 (Killing Time) Whilst waiting on the construction of the death star
by the St. George’s College engineering, science and mathematics students (as
well as legal approvals from Georgian law graduates), Thomas feels the urge
to kill – kill time that is. As a member of the St. George’s College Orbital
8
A binary operation ∗ on a set V is one that combines two elements a, b of V to give another
element of V: a ∗ b = c ∈ V. Examples of binary operations include addition of numbers or
vectors, multiplication of numbers and cross products of vectors.
9
Recall that det means the matrix determinant of A and $A^T$ denotes the matrix transpose of
A.
10
Recall that orthogonal is the mathematical term for ‘perpendicular’.

Death Star, help Thomas kill time by explicitly showing that the rotation matri-
ces Rj(θ) satisfy the two properties which characterize the special orthogonal
group, SO(3).
Hint: It helps to show that for any rotation matrix Rj(θ), one has
$(R_j(\theta))^T = R_j(-\theta) = (R_j(\theta))^{-1}$, which can be argued
geometrically and/or algebraically using the fact that cosine is an even function
(see footnote 11), cos(θ) = cos(−θ), and that sine is odd: sin(−θ) = − sin(θ).
Exercise 25 (Group Project: Project Death Star) In an attempt to understand
rotations better for the programming of a weapons targeting system on the Geor-
gian Death Star, the members of the SGC Mathematical Sciences Tutorial sit
down and try to prove that the set of rotation matrices, SO(3), form a group.
Since this includes you, complete this proof. This means verifying that SO(3)
satisfies the four properties required to be a group, with matrix multiplication
being the group operation.
Hint: Recall how the 3 × 3 identity matrix I3 acts on a 3-dimensional vector
v – that is, I3v = v. Furthermore, to show that every element of SO(3) has
inverse, consider Ru(θ) – an arbitrary rotation operator which rotates objects
anticlockwise through an angle θ about an axis defined by the vector u, then
consider how one would undo rotations performed by Ru(θ).
Because the Lie Group SO(3) is transitive, we can write any general rotation
as a product of finitely-many rotation matrices. For us, this means that we can
write any rotation as a sequence of rotations about the x, y and z axes:
R(α, γ, β) = R3(γ) R2(β) R1(α). (4.14)
Note that since matrix multiplication is not commutative, the order in which we
multiply (hence the order in which we rotate) matters. In particular, when the
rotation R(α, γ, β) given by (4.14) acts on a vector v, it rotates it first by an
angle α about the x-axis, then by an angle β about the y-axis and finally by an
angle γ about the z-axis. In general, we could write down a matrix Ru(θ) which
rotates objects anticlockwise about some axis defined by the vector u through
an angle θ; indeed, such a matrix is given by the (easy-to-prove) ‘Rodrigues’
rotation formula’, which we will investigate later.
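The order dependence described above can be seen on a single basis vector. The sketch below is a hedged illustration: `compose` is a hypothetical helper implementing the product in (4.14), and the quarter-turn angles are chosen only because the results are easy to read off.

```python
import math

def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Matrix product; (a b) v means: apply b first, then a."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]

def apply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def compose(alpha, beta, gamma):
    """R = R3(γ) R2(β) R1(α): x-rotation first, then y, then z."""
    return matmul(rot_z(gamma), matmul(rot_y(beta), rot_x(alpha)))

quarter = math.pi / 2.0
e1 = [1.0, 0.0, 0.0]
z_then_x = matmul(rot_x(quarter), rot_z(quarter))  # rotate about z first
x_then_z = matmul(rot_z(quarter), rot_x(quarter))  # rotate about x first
print(apply(z_then_x, e1))  # approximately e3
print(apply(x_then_z, e1))  # approximately e2
```

Note that the rightmost matrix in a product acts first, which is why the x-rotation sits at the right-hand end of (4.14).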
Exercise 26 (Spring Cleaning) After finally getting building and environmen-
tal approvals, as well as successfully subduing Greens Party protesters, St. George’s
College sends Project Death Star into its testing phase. Having a particular dis-
taste for Justin Bieber, Thomas decides that he wants to aim and fire the gamma-
ray LASER on the death star at Justin Bieber’s hometown – during Christmas
when Justin Bieber is home with his family. For shielding reasons, in its inac-
tive state, the Death Star’s cannon is oriented along the x-axis in the following
figure.
This is because the cannon portion of the death star has weaker armour. In order
to fire the death star at Justin Bieber, Thomas must rotate the death star to point
at Ontario, Canada. After the death star is oriented in this way, Emma Krantz’s
targeting algorithm will takeover and refine the aim to Justin Bieber’s house.
The coordinate system we use is centred with the death star at its origin. In order
to shoot JB, the death star must be oriented in the direction of the purple ray in
the above diagram. This can be achieved by feeding the correct rotation matrix
11
Recall that even functions f(x) are symmetric about x = 0 and odd functions are anti-
symmetric.

Figure 4.1: Aiming an Orbital Death Star with sequential rotations.
into the death star targeting systems. There are multiple ways to construct such
a matrix – however, for our purposes, it is easiest to construct it by sequential
rotations about the three different coordinate axes.
Q: Write down the rotation matrices corresponding to the rotations indicated
by each of the angles α, β, γ shown in the diagram. Note that these are not
necessarily in the order x − y − z! Once you’re confident that you have the
correct rotation matrices, multiply these matrices in the correct order to give
a rotation matrix which will rotate the death star cannon from the x-axis to
Justin Bieber’s home province.
Hint: It helps to keep track of which coordinate stays constant under a cer-
tain rotation – recalling the rotation eigenvectors, it then follows that you are
performing a rotation about that coordinate axis. For example, the γ rotation
corresponds to an anticlockwise rotation about the y coordinate axis through an
angle γ.
After pointing the death star at Ontario, the Krantz algorithm takes over and
performs a super-accurate shot – killing Justin Bieber with minimal collateral
damage. Fearing that the warlike nation of Canada will retaliate with direct
line-of-sight missile attacks, Thomas decides it is best to return the death star to
its original orientation along the x-axis – the side that faces Canada will thus
have more armour as well as an anti-missile system featuring an array of LAWS
Naval lasers stolen from the U.S. Military.
Q: Write down a sequence of rotations to rotate the death star to its original ori-
entation. Now write down a single rotation matrix to perform this total rotation.
Hint: Recall the fact that rotation operations form a Lie group – in particular,
this means that every rotation has an inverse. Recalling the properties of the
rotation group SO(3), in particular the orthogonality property $A^T = A^{-1}$,
there is a super-easy way to invert the death star rotation and return it to its
original orientation. Alternatively, recall that you showed $R(-\theta) = (R(\theta))^{-1}$,
either algebraically or geometrically. Use this to find the rotation matrix which
returns the death star to its original orientation.
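The hint can be spot-checked numerically. The sketch below is only an illustration under assumptions: the angles `alpha` and `gamma` are made-up values, and the two-step sequence (a y-rotation followed by a z-rotation) is a stand-in for whatever sequence the diagram actually prescribes. It compares the transpose of the composed rotation with the reversed sequence of negated rotations.

```python
import math

def rot_y(t):
    """Anticlockwise rotation about the y-axis through angle t (radians)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(t):
    """Anticlockwise rotation about the z-axis through angle t (radians)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def max_abs_diff(A, B):
    """Largest entry-wise difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical angles, chosen only for illustration.
gamma, alpha = 0.4, 1.1

# Forward rotation: the y-rotation acts first, then the z-rotation.
R = matmul(rot_z(alpha), rot_y(gamma))

# Inverse via orthogonality: R^{-1} = R^T.
R_back = transpose(R)

# Inverse via R(-theta) = R(theta)^{-1}: undo the z-rotation first, then the y-rotation.
R_back_seq = matmul(rot_y(-gamma), rot_z(-alpha))
```

Both routes give the same matrix, and multiplying it with `R` returns the identity up to rounding error.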

4.2 Rigid Bodies and Moments of Inertia
A rigid body is a collection of particles (discrete or continuous) which are ‘fixed’
relative to each other. The dynamics of many objects, such as a cricket ball
flying out of Shane Warne’s hand, can be modelled as the motion of a rigid
body. In this regard, we can decompose the motion of a rigid body into a linear
motion of its center of mass, accompanied by some rotational
motion of the rigid body about an axis through the center of mass.
To see why we need to consider rotation about more than one axis, consider the
following. In general, the direction of the angular momentum vector of a rotating
object does not necessarily coincide with the axis of rotation. The angular
momentum and rotation axis coincide when the rotation axis is a principal axis. As
we shall see, for any rigid body rotating about some specified point (e.g. the
center of mass), there are three principal axes through that point – these form an orthogonal
system (and hence a basis for 3-dimensional space).
In this tutorial[12], we shall see how to describe the general rotational dynamics
of rigid bodies with general concepts such as the ‘inertia tensor’ and eigenvector
decompositions for determining the principal rotation axes for a rigid body. As
we shall see, for rigid bodies with various geometric symmetries, we obtain the
well-known (first year physics / engineering) formulae for simple rigid bodies –
e.g. cubes, spheres, cylinders. After deriving some familiar results, we shall use
the general theory to analyse the motion of a spinning top – although a child’s
toy, a deceptively non-trivial system!
4.2.1 Rotations: about an arbitrary axis
Unlike the circular motion of point particles (whose rotational motion can be
described by some scalar angular velocity), the rotational dynamics of extended
rigid bodies requires us to treat angular velocity as a 3-dimensional vector quan-
tity:
ω = (ωx, ωy, ωz), (4.15)
where the components ωj denote the angular velocity about the j-th coordinate
axis. Note that it is the angular velocity vector, ω, which defines the axis of
rotation – that is, ω lies along the axis of rotation. For general rotational dynamics, the
angular momentum vector, L, need not lie along the axis of rotation – hence L
and ω are not necessarily parallel[13]. To construct the angular momentum vector
from the angular velocity, we note two scenarios in rigid body rotations.
1. Rotation with a fixed point: If a rigid body has a fixed point then it
can only rotate about this fixed point, making it a natural choice for the
origin of your coordinate system (vector space). Simple examples include
a pendulum swinging from a fixed pivot or a spinning top whose tip is
confined to some hole in the surface on which it rotates.
2. Free Rotation: A freely rotating rigid body is one which does not have
a fixed point. In this case, the center of mass (CM) makes for the natural
choice of origin for a coordinate system (the center-of-mass reference
frame). One may then decompose the motion of the object as the motion
of the CM along with rotations relative to the CM.
[12] This tutorial is based mostly on chapter 10 of John Taylor’s ‘Classical Mechanics’, a very
accessible book for second (or first) year students who are willing to solve many problems.
[13] For special cases, such as circular motion of a point particle, L and ω are parallel, allowing
one to treat the angular velocity as a scalar quantity.
The total angular momentum of a rigid body, composed of particles (sub-systems)
of mass mj with displacement vectors rj relative to some origin O, with linear
(tangential) velocities vj, is given as the sum of the angular momenta of each
particle (sub-system):
$$ \mathbf{L} = \sum_j m_j\, \mathbf{r}_j \times \mathbf{v}_j = \sum_j m_j\, \mathbf{r}_j \times (\boldsymbol{\omega} \times \mathbf{r}_j), \tag{4.16} $$
where ω is the angular velocity vector for the rigid body[14], and we have used
$\mathbf{v}_j = \boldsymbol{\omega} \times \mathbf{r}_j$ for rotation about the origin. One can expand the
triple cross product (triple vector product), using the result from the following
exercise.
Exercise 27 (BAC-CAB Rule) For any 3-dimensional vectors, A, B, C, prove
that:
A × (B × C) = B(A · C) − C(A · B). (4.17)
Hint: Expand out both sides of the BAC − CAB equation and compare. Keep
in mind that the vector product takes two vector inputs and produces a vector
output. The scalar product (dot product), on the other hand, takes two vector
inputs and produces a scalar output: the length of the projection of one
vector onto the other, multiplied by the length of the other vector.
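Before attempting the proof, it can be reassuring to spot-check the identity on concrete vectors. The following is a minimal sketch (a numerical check, not a proof); the sample vectors are arbitrary illustrative choices, and integer components are used so that both sides can be compared exactly.

```python
def cross(a, b):
    """Vector (cross) product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar (dot) product of two 3-component tuples."""
    return sum(x * y for x, y in zip(a, b))

def bac_cab_sides(A, B, C):
    """Return both sides of A x (B x C) = B(A.C) - C(A.B) as tuples."""
    lhs = cross(A, cross(B, C))
    rhs = tuple(B[i] * dot(A, C) - C[i] * dot(A, B) for i in range(3))
    return lhs, rhs

# Arbitrary sample vectors (illustrative values only).
lhs, rhs = bac_cab_sides((1, 2, 3), (-4, 5, 6), (7, -8, 9))
```

Trying a few more triples, including ones with repeated or parallel vectors, is a quick way to catch sign slips in a hand expansion.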
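Note that an alternative to the above formulae is to treat r and ω as 1-forms and
use the exterior product and Hodge dual operations:
$$ \mathbf{r} \times (\boldsymbol{\omega} \times \mathbf{r}) = \star\bigl(\mathbf{r} \wedge \star(\boldsymbol{\omega} \wedge \mathbf{r})\bigr), \tag{4.18} $$
which is equivalent to using Levi-Civita symbol identities.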
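Returning to the angular momentum construction and using the vector-triple
product identity, we have:
$$
\begin{aligned}
\mathbf{r} \times (\boldsymbol{\omega} \times \mathbf{r}) &= \boldsymbol{\omega}(\mathbf{r} \cdot \mathbf{r}) - \mathbf{r}(\mathbf{r} \cdot \boldsymbol{\omega}) \\
&= r^2 \boldsymbol{\omega} - (\mathbf{r} \cdot \boldsymbol{\omega})\,\mathbf{r} \\
&= \left[(y^2 + z^2)\,\omega_x - xy\,\omega_y - zx\,\omega_z\right]\partial_x + \left[-xy\,\omega_x + (x^2 + z^2)\,\omega_y - zy\,\omega_z\right]\partial_y \\
&\quad + \left[-zx\,\omega_x - yz\,\omega_y + (x^2 + y^2)\,\omega_z\right]\partial_z,
\end{aligned}
$$
using a coordinate system and basis with
$$ \mathbf{r} = x\,\partial_x + y\,\partial_y + z\,\partial_z, \qquad \boldsymbol{\omega} = \omega_x\,\partial_x + \omega_y\,\partial_y + \omega_z\,\partial_z. \tag{4.19} $$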
Exercise 28 (How to read an angular momentum vector) If necessary, verify
the intermediate steps of the previous calculation. Hence, or otherwise, show
that the total angular momentum takes the form:
$$
\begin{aligned}
\mathbf{L} &= \sum_j m_j\, \mathbf{r}_j \times \mathbf{v}_j \\
&= \sum_j m_j \left[(y_j^2 + z_j^2)\,\omega_x - x_j y_j\,\omega_y - z_j x_j\,\omega_z\right]\partial_x + \sum_j m_j \left[-x_j y_j\,\omega_x + (x_j^2 + z_j^2)\,\omega_y - z_j y_j\,\omega_z\right]\partial_y \\
&\quad + \sum_j m_j \left[-z_j x_j\,\omega_x - y_j z_j\,\omega_y + (x_j^2 + y_j^2)\,\omega_z\right]\partial_z.
\end{aligned}
$$
Note that we have added the subscript j to denote the position coordinates
(xj, yj, zj) of the j-th mass in the rigid body system.
[14] Note that the body has one unique angular velocity vector since it is a rigid body. This
is in contrast to a loose collection of particles with (possibly) distinct angular velocity vectors.
Using the above expansion of the angular momentum vector, we can write down
the components of the Inertia Tensor as follows. Since the inertia tensor is a
linear operator, we can represent it in our basis {∂x, ∂y, ∂z} as a 3-by-3 matrix I,
whose entries are read-off the angular momentum vector by matching the above
expansion with the following important matrix equation:
L := Iω, (4.20)
or equivalently, the three component equations for Ln with n = 1, 2, 3 (representing
the directions x, y, z):
$$ L_n = \sum_{k=1}^{3} I_{nk}\,\omega_k. \tag{4.21} $$
That is, using the vector-triple expansion of the definition of the total angular
momentum vector, we can get the components Ink of the inertia matrix by looking
at the coefficient of the angular velocity ωk about the k-th coordinate axis,
in the n-th component of the angular momentum vector. So for example, Ixx
would be the coefficient of ωx in the x-component of the angular momentum
vector L:
$$ I_{xx} = \sum_j m_j (y_j^2 + z_j^2). \tag{4.22} $$
Ixy would be the coefficient of ωy in the x-component of the angular momentum
vector:
$$ I_{xy} = -\sum_j m_j x_j y_j. \tag{4.23} $$
Exercise 29 (Bedtime Reading) Continue the above to read-off the other com-
ponents of the inertia matrix. In particular, determine all components of the
inertia matrix and list them.
Hint: Don’t fall asleep.
The inertia matrix then takes the form:
$$ I = \begin{pmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{pmatrix} \tag{4.24} $$
and our matrix equation relating the angular momentum vector to the inertia
matrix and the angular velocity vector, gives us the following components of the
angular momentum vector:
Lx =Ixxωx + Ixyωy + Ixzωz
Ly =Iyxωx + Iyyωy + Iyzωz
Lz =Izxωx + Izyωy + Izzωz. (4.26)
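To see the matrix equation L = Iω at work for a discrete body, the sketch below builds the inertia matrix of a few point masses and compares Iω with the direct sum of m r × (ω × r). It is an illustration under assumptions: the masses, positions and angular velocity are invented sample values, and the helper names (`inertia_tensor`, `apply_matrix`, `angular_momentum_direct`) are hypothetical. The matrix entries are filled in using the compact rule "r·r on the diagonal minus the product of coordinates", which reproduces Ixx and Ixy as read off above.

```python
def cross(a, b):
    """Vector (cross) product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inertia_tensor(masses, positions):
    """3x3 inertia matrix of point masses about the origin.

    Entry (n, k) is sum_j m_j * (|r_j|^2 * delta_nk - r_j[n] * r_j[k]),
    so e.g. I_xx = sum m (y^2 + z^2) and I_xy = -sum m x y.
    """
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(c * c for c in r)
        for n in range(3):
            for k in range(3):
                I[n][k] += m * ((r2 if n == k else 0.0) - r[n] * r[k])
    return I

def apply_matrix(I, w):
    """Matrix-vector product I w."""
    return tuple(sum(I[n][k] * w[k] for k in range(3)) for n in range(3))

def angular_momentum_direct(masses, positions, w):
    """L = sum_j m_j r_j x (w x r_j), evaluated particle by particle."""
    L = [0.0, 0.0, 0.0]
    for m, r in zip(masses, positions):
        term = cross(r, cross(w, r))
        for n in range(3):
            L[n] += m * term[n]
    return tuple(L)

# Invented sample data, for illustration only.
masses = [1.0, 2.0, 0.5]
positions = [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (1.0, -1.0, 2.0)]
omega = (0.3, -1.2, 2.0)

I = inertia_tensor(masses, positions)
L_matrix = apply_matrix(I, omega)
L_direct = angular_momentum_direct(masses, positions, omega)
```

The two results agree to rounding error, and for this sample L is not parallel to ω.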
As it turns out, the inertia tensor for a rigid body is a symmetric tensor. This
means it can be represented by a symmetric matrix I – that is, we have:
$$ I = I^T, \tag{4.27} $$
where T denotes the ‘transpose’ operation. Visually, this means that the 3-by-3
matrix I has reflection symmetry about its main diagonal. Note that this is
an extremely important property of the inertia tensor as the ‘spectral theorem
for self-adjoint operators’[15] guarantees that the inertia tensor is diagonalizable.
This means we can choose a basis (the eigen-basis) in which the inertia matrix is
diagonal – thus guaranteeing the existence of principal axes.
Finally, as our last technical point, we note that since the inertia tensor is symmetric,
it corresponds to a ‘quadratic form’ – in particular, one that acts on the
3-dimensional vector space $\mathbb{R}^3$. This means that in principle, we could graphically
represent the inertia tensor of different rigid bodies by a ‘quadric surface’,
generated by the equation:
$$ \mathbf{r}^T I\, \mathbf{r} = 1 \iff (I\mathbf{r}) \cdot \mathbf{r} = \mathbf{r} \cdot (I\mathbf{r}) = 1. \tag{4.28} $$
In the principal axes basis, I will be diagonal – allowing us to represent the
inertia tensor as an ellipsoid whose semi-axis lengths, $1/\sqrt{\lambda_j}$, are fixed by the
principal moments $\lambda_j$ (the diagonal components of the inertia tensor in that basis).
Problem 24 (Symmetry: To be, or not to be?) In the alternate timeline of Cy-
borg Emperor Constantine, the cyborg comes across William Shakespeare – a
bard-mathematician. In his new play, ‘Hamlet’, he depicts the dramatic life of
a physicist who cannot decide on the best construction for his theory of rota-
tional dynamics. In particular, Hamlet has the option of using exterior calculus
and defining the inertia-tensor as an anti-symmetric rank 2 tensor (differential
2-form), or using simple matrix/vector algebra and defining the inertia matrix in
a symmetric way.
Help Hamlet by proving our definitions lead to a symmetric inertia tensor.
Hint: It suffices to show that Ixy = Iyx, Ixz = Izx, Iyz = Izy.
Now, can you think of any physical reasons that the inertia tensor should be
symmetric (with our definitions)?
Because the inertia tensor is in general symmetric, it follows that there are only
3(3 + 1)/2 = 6 independent components at most. Note that this depends on the
basis we choose for our vector space – if we choose the principal basis, then the
inertia matrix is diagonal and thus has 3 independent components.
The results derived thus far pertain to a rigid body composed of discrete sub-
systems of mass mj (hence the sums over j). In the case of continuously dis-
tributed rigid bodies – such as a cube, ball or Theresa Feddersen, we must replace
these summations with integrations. In particular, the continuum limit is given
by:
$$ m_j \to \delta m_j, \qquad \sum_j m_j \to \int_D dm, \tag{4.29} $$
where dm is an infinitesimal mass element and D is a subset of $\mathbb{R}^3$ representing
the rigid body. The infinitesimal mass element, dm, can be related to the density
and measure (volume, area or length) of a (3, 2 or 1-dimensional) rigid body. For
example, for a 3-dimensional rigid body with mass-density profile ρ(x, y, z),
we would have: dm = ρ(x, y, z) dx dy dz, which is the mass density at a point
multiplied by the volume of an infinitesimal box at that point. If the density ρ is
a constant, then we simply have: dm = ρ × d(Measure), where ‘Measure’ is
equal to the length, area or volume (depending on the object). This may seem
confusing at first, so it is best to illustrate these points with an example.
[15] Note that a real symmetric matrix corresponds to a self-adjoint linear operator over $\mathbb{R}^n$ or $\mathbb{C}^n$
with respect to the Euclidean (formally, $\ell^2$) inner product.
Example 7 (Constantine’s X-Cube) Having defended the residents of ‘Ye Olde
Town’ from the Titanoboa with Big A. Geller, Constantine is rewarded with a
mysterious cube. The cube has special functions[16], which can only be activated
if it is thrown into the air and undergoes certain rotational sequences. In an attempt
to unlock the powers of the cube (and gain X-cube achievement points),
Constantine decides to master the rotational dynamics by investigating a cube’s
moments of inertia. To help the cyborg Emperor, we now proceed with the necessary
mathematics.
First, we are given that the cube has a uniform density profile, with total mass M
and side length a. To make life easy, we choose a coordinate system such that
our coordinate axes coincide with the edges of the cube. We now construct the
inertia tensor for two classes of rotational motion for the cube.
1. Rotation about a corner (rotation with a fixed point): Here we consider
the scenario where our cube is rotating about a corner. A natural
choice is to let the origin of our coordinate system coincide
with a corner of the cube. The cube is then defined as the following domain
in $\mathbb{R}^3$:
$$ \text{Cube} = [0, a] \times [0, a] \times [0, a]. \tag{4.30} $$
Taking one corner to be fixed at the origin, we can consider rotations of
the cube about different axes. For rotations about an edge (with one corner
fixed), WLOG we can take the cube to be rotating about the x-axis,
whence our angular velocity vector will have the simple form:
$$ \boldsymbol{\omega} = \omega_x\,\partial_x, \tag{4.31} $$
equivalently: ω = (ωx, 0, 0) in component notation. Since there is only
one non-zero component of the angular velocity, we can omit the coordinate
subscript and write $\omega_x = \omega := \|\boldsymbol{\omega}\|$.
[16] These include a ‘red ring of death’, which annihilates everything (except the user) within a
300 meter radius.
Taking the continuum limit of our earlier summation expressions for the
components of inertia, we are able to derive the inertia matrix for our cube
(a rigid body which has continuously distributed mass). To this effect, we
have the following diagonal element of our inertia matrix:
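$$
\begin{aligned}
I_{xx} &= \lim_{n \to \infty} \lim_{m_j \to \delta m} \sum_{j}^{n} m_j (y_j^2 + z_j^2) \\
&= \int_{\text{cube}} dm\,(y^2 + z^2) \\
&= \int_{\text{cube}} \rho\,dV\,(y^2 + z^2) \\
&= \int_{\text{cube}} \frac{M}{V}\,dV\,(y^2 + z^2) \\
&= \int_{\text{cube}} \frac{M}{V}\,dx\,dy\,dz\,(y^2 + z^2) \\
&= \int_0^a dx \int_0^a dy \int_0^a dz\,\frac{M}{a^3}(y^2 + z^2) \\
&= \frac{M}{a^3} \cdot \frac{2}{3}a^5 \\
&= \frac{2}{3}Ma^2.
\end{aligned} \tag{4.32}
$$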
Similarly, in the continuum limit, we have:
$$ I_{yy} = \int_{\text{cube}} dm\,(z^2 + x^2), \qquad I_{zz} = \int_{\text{cube}} dm\,(x^2 + y^2). \tag{4.33} $$
Given the symmetry of the cube, it is clear that Iyy = Izz = Ixx.
Finally, to determine the off-diagonal elements of the inertia tensor we
take the continuum limit of our earlier results again:
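$$
\begin{aligned}
I_{xy} &= -\lim_{n \to \infty} \lim_{m_j \to \delta m} \sum_{j}^{n} m_j x_j y_j \\
&= -\int_{\text{cube}} dm\,(xy) \\
&= -\int_{\text{cube}} \rho\,dV\,(xy) \\
&= -\rho \int_0^a dx \int_0^a dy \int_0^a dz\,(xy) \\
&= -\rho \int_0^a x\,dx \int_0^a y\,dy \int_0^a dz \\
&= -\rho \left(\frac{a^2}{2}\right)\left(\frac{a^2}{2}\right) a \\
&= -\frac{1}{4}Ma^2.
\end{aligned} \tag{4.34}
$$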

To determine the other off-diagonal elements, the continuum limit of our
earlier expressions gives:
$$ I_{xz} = -\int_{\text{cube}} dm\,(xz), \qquad I_{yz} = -\int_{\text{cube}} dm\,(yz), \tag{4.35} $$
where Ixy = Iyx, Ixz = Izx, Iyz = Izy by symmetry of the inertia tensor.
Furthermore, the geometric symmetry of the cube tells us that Ixy = Ixz =
Iyz – hence we have determined all components of the inertia matrix. Explicitly,
our inertia matrix in this {∂x, ∂y, ∂z} basis is given by:
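$$ I = \begin{pmatrix} \frac{2}{3}Ma^2 & -\frac{1}{4}Ma^2 & -\frac{1}{4}Ma^2 \\ -\frac{1}{4}Ma^2 & \frac{2}{3}Ma^2 & -\frac{1}{4}Ma^2 \\ -\frac{1}{4}Ma^2 & -\frac{1}{4}Ma^2 & \frac{2}{3}Ma^2 \end{pmatrix}, \tag{4.36} $$
which we can simplify by drawing out a common factor:
$$ I = \frac{Ma^2}{12} \begin{pmatrix} 8 & -3 & -3 \\ -3 & 8 & -3 \\ -3 & -3 & 8 \end{pmatrix}. \tag{4.38} $$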
Earlier, we derived a general expression for the angular momentum, L, of
a rotating rigid body – in particular, the angular momentum was shown to
be generated by the action of the inertia tensor (a linear operator) on the
angular velocity vector. In other words, we have:
$$ \mathbf{L} = I\boldsymbol{\omega}, \tag{4.40} $$
where I is our inertia matrix. Doing this matrix multiplication explicitly
with our inertia tensor and the angular velocity vector ω = (ω, 0, 0), we see that:
$$ \mathbf{L} = \frac{Ma^2\omega}{12}(8, -3, -3). \tag{4.41} $$
Hence, we have demonstrated an explicit scenario where the angular momentum
vector L does not point in the same direction as the angular velocity
vector, ω.
Similarly, for rotation of the cube about its main diagonal (with one corner
fixed), a unit vector in the direction of rotation is $\frac{1}{\sqrt{3}}(1, 1, 1)$. It follows
that the angular velocity vector is parallel to this, giving $\boldsymbol{\omega} = \frac{\omega}{\sqrt{3}}(1, 1, 1)$.
Thus, the angular momentum for this rotation is given by
$$ \mathbf{L} = I\boldsymbol{\omega} = \frac{Ma^2}{6}\boldsymbol{\omega}. \tag{4.42} $$
In this scenario, the angular momentum points in the same direction as the
angular velocity vector.

2. Rotation about the cube’s center (free rotations about the center-of-mass):
If the cube is rotating about its center, the natural choice of coordinate
system is to place the origin O at the center of the cube. Therefore,
the cube is defined as the following domain in $\mathbb{R}^3$:
$$ \text{Cube} = [-a/2, a/2] \times [-a/2, a/2] \times [-a/2, a/2]. \tag{4.43} $$
To account for this, we must change our limits of integration accordingly.
Using the same definitions as before, it is an easy exercise to show that for
this rotational motion, we have:
$$ I_{xx} = \frac{1}{6}Ma^2, \tag{4.44} $$
with Iyy = Izz = Ixx via geometrical symmetry. Furthermore, noting that
our domain of integration for each variable is now symmetric about the
origin, running from −a/2 to +a/2, it is easy to see that the off-diagonal
components of the inertia tensor vanish in this basis. In particular, we are
integrating odd functions of one variable over symmetric domains.
Therefore, our inertia tensor is diagonal in this basis! This means that
the coordinate axes for this class of rotations are principal axes of our
cube! Mathematically, we are operating in the eigen-basis of our inertia
matrix. Thus,
$$ I = \frac{1}{6}Ma^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{4.45} $$
Now, for rotations about the diagonal of the cube, the direction of the
angular velocity vector is parallel to the vector (1, 1, 1). In particular,
$\boldsymbol{\omega} = \frac{\omega}{\sqrt{3}}(1, 1, 1)$, where the factor $\frac{1}{\sqrt{3}}$ is used to make our direction vector a unit
vector. Therefore, the angular momentum vector is given by:
$$ \mathbf{L} = I\boldsymbol{\omega} = \frac{Ma^2}{6}\boldsymbol{\omega}. \tag{4.47} $$
Hence, for this class of rotations, the angular momentum vector is always
parallel to the angular velocity vector.
Note that it can be shown (as an exercise, or argued by symmetry) that
the angular momentum for rotation about any axis through the center of
the cube has the same form, L = (Ma²/6)ω, as the angular momentum for rotation about
the main diagonal through the corner of the cube. This is because the
main diagonal through the corner also passes through the center of the cube,
so the two descriptions of rotation about this axis must coincide.
As we saw, out of a potential 6 possible independent components for its inertia
tensor, the cube exhibited only one or two independent components, depending
on the type of rotation (and basis) we chose. The cube was relatively simple
due to being a geometric object with a high degree of symmetry – the symmetry
allowed us to reduce the number of independent components. In the next guided
problem, we shall study the rotational characteristics of another object with a
relatively large degree of symmetry – the cone.
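The cube results can be cross-checked numerically by replacing the integrals with sums over small cells, which is exactly the continuum limit run in reverse. The sketch below is a rough midpoint-rule estimate, not part of the original derivation; the function name `cube_inertia`, the grid size `n` and the choice M = a = 1 are assumptions made for illustration.

```python
def cube_inertia(M, a, lower, n=20):
    """Midpoint-rule estimate of the inertia matrix of a uniform cube.

    M: total mass, a: side length, lower: coordinate of the cube's lower
    faces relative to the origin (0 for the corner pivot, -a/2 for the
    center pivot), n: number of cells along each edge.
    """
    h = a / n                      # edge length of one small cell
    dm = M / n ** 3                # mass of one small cell
    pts = [lower + (i + 0.5) * h for i in range(n)]   # cell mid-points
    I = [[0.0] * 3 for _ in range(3)]
    for x in pts:
        for y in pts:
            for z in pts:
                r = (x, y, z)
                r2 = x * x + y * y + z * z
                for j in range(3):
                    for k in range(3):
                        I[j][k] += dm * ((r2 if j == k else 0.0) - r[j] * r[k])
    return I

# Illustrative values: unit mass and unit side length.
M, a = 1.0, 1.0
I_corner = cube_inertia(M, a, lower=0.0)       # pivot at a corner
I_center = cube_inertia(M, a, lower=-a / 2)    # pivot at the center
```

With these values the corner estimate should come out close to (1/12) × [[8, −3, −3], [−3, 8, −3], [−3, −3, 8]] and the center estimate close to (1/6) × identity; increasing `n` tightens the agreement at the cost of run time.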

Problem 25 (The Cursed Cone) For generations, since the killing of Pelops,
the family of Atreus has carried the burden of a curse. In the alternate universe
of the cyborg Constantine, the Greek hero Orestes, son of Agamemnon, King
of the Greeks, must end the curse upon his family by taking a solid gold cone to
the Areopagus. At this Athenian court, Orestes and his sister Electra will stand
judgement. To purify the cone and its associated curses (matricide being one of
them), Apollo, god of the bow, must derive the rotational characteristics of the
cone.
Consider a uniform, solid cone of mass M, height h and base radius R. To
consider rotational dynamics of the cone rotating about its vertex (tip)[17], we let
the origin O of some coordinate system coincide with the tip of the cone, with the
z-axis along the cone’s axis of symmetry. The cone can
be described as the following domain in $\mathbb{R}^3$:
$$ \text{Cone} = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 \le (Rz/h)^2,\ 0 \le z \le h\}. \tag{4.48} $$
Equivalently, in a cylindrical coordinate system (r, φ, z) we can describe the
cone as the following subset:
$$ \text{Cone} = \{(r, \phi, z) : 0 \le r \le Rz/h,\ 0 \le \phi < 2\pi,\ 0 \le z \le h\}. \tag{4.49} $$
In this manner, we see that φ is a free variable ranging from 0 to 2π – thus in this
coordinate system, the $S^1$ (rotational) symmetry of the cone is explicit. Recalling
that the angular velocity vector ω defines the axis of rotation, we can describe
the fixed-point rotational dynamics of the cone by determining its inertia tensor
for some arbitrary ω.
1. Let ρ = M/V be the mass density of the cone. To calculate the volume
V of the cone, we can derive an expression simply by doing the volume
integral:
$$ V_{\text{cone}}(R, h) = \int_{\phi=0}^{2\pi} \int_{z=0}^{h} \int_{r=0}^{Rz/h} dV, \tag{4.50} $$
where dV = r dr dφ dz is the volume form (‘infinitesimal
volume element’) in cylindrical coordinates. Note that when integrating we must use the geometric
constraint r/R = z/h (via similar triangles), which holds on the slanted surface
of the cone and fixes the upper limit of the r integration.
Show that $V_{\text{cone}}(R, h) = \frac{1}{3}\pi R^2 h$. It follows that:
$$ \rho = \frac{3M}{\pi R^2 h}. \tag{4.51} $$
Therefore, the infinitesimal mass element is given by $dm = \rho\,dV = \left(\frac{3M}{\pi R^2 h}\right) dV$.
2. Diagonal components: moments of inertia Use the previously derived
expressions to calculate the components of the inertia matrix of the cone.
In particular, calculate the following diagonal elements (moments of inertia)
of the inertia matrix:
$$ I_{zz} := \int_{\text{Cone}} dm\,(x^2 + y^2) = \int_{\text{Cone}} dV\,[\rho(x^2 + y^2)] = \ldots \tag{4.52} $$
Due to the rotational symmetry of the cone, one should be able to see that
the Ixx and Iyy moments of inertia should be equal (you can do the explicit
calculations to show this).
$$ I_{yy} = \int_{\text{Cone}} dV\,[\rho(x^2 + z^2)] = \ldots \tag{4.53} $$
$$ I_{xx} = \int_{\text{Cone}} dV\,[\rho(y^2 + z^2)] = \ldots \tag{4.54} $$
You should find that
$$ I_{zz} = \frac{3}{10}MR^2, \qquad I_{xx} = I_{yy} = \frac{3}{20}M\left(R^2 + 4h^2\right). \tag{4.55} $$
Hint: For the Izz integral, it is prudent to change to cylindrical coordinates.
Recall that in a Cartesian coordinate system, we have dV = dx dy dz and
that in a cylindrical coordinate system, we have dV = r dr dφ dz with r = |J|
being the Jacobian determinant factor.
[17] Note that here we are considering the class of rotations with a fixed point – the cone’s vertex.
3. Off-Diagonal components: products of inertia Now, by either using
direct calculation or exploiting the rotational symmetry of the cone, compute
the off-diagonal elements of the inertia matrix. That is, compute the
‘products of inertia’:
$$ I_{xz} = -\int_{\text{cone}} dm\,(xz), \qquad I_{xy} = -\int_{\text{cone}} dm\,(xy), \qquad I_{yz} = -\int_{\text{cone}} dm\,(yz). \tag{4.56} $$
Hint: Due to the rotational symmetry of the cone, it should be clear that
the products of inertia are zero (but you should prove this!). That is,
$$ I_{xy} = I_{xz} = I_{yz} = 0. \tag{4.57} $$
Note, it may help to remember the change-of-coordinate relations, x =
r cos(φ), y = r sin(φ).
4. Angular Momentum Note that the inertia matrix for this class of rotations
of the cone (rotations with the vertex fixed) is diagonal:
$$ I = \frac{3}{20}M \begin{pmatrix} R^2 + 4h^2 & 0 & 0 \\ 0 & R^2 + 4h^2 & 0 \\ 0 & 0 & 2R^2 \end{pmatrix}. \tag{4.58} $$
This means that if the angular velocity ω = (ωx, ωy, ωz) is directed along
a coordinate axis (x, y or z) then so is the angular momentum L:
$$ \mathbf{L} = I\boldsymbol{\omega} = (\lambda_1\omega_x, \lambda_2\omega_y, \lambda_3\omega_z), \tag{4.60} $$
where λj (j = 1, 2, 3) are the diagonal elements (actually, eigenvalues)
of the inertia matrix. For example, if the cone is rotating about the x
coordinate axis (i.e. ω = (ωx, 0, 0)), with the vertex fixed at the origin, we
will have:
$$ \mathbf{L} = I\boldsymbol{\omega} = \frac{3}{20}M(R^2 + 4h^2)\,\omega_x\,\partial_x, \tag{4.61} $$
where ∂x is a unit vector in the x direction.
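Once the integrals have been attempted by hand, the closed forms for the cone can be sanity-checked with a crude numerical integration in cylindrical coordinates. This is a sketch under assumptions: the function name `cone_moments`, the grid size `n` and the sample values of M, R and h are all invented for illustration, and the midpoint rule is only accurate to a fraction of a percent at this resolution.

```python
import math

def cone_moments(M, R, h, n=40):
    """Midpoint-rule estimates of (V, I_zz, I_xx) for a uniform solid cone.

    The vertex sits at the origin and the symmetry axis is the z-axis, so at
    height z the radial coordinate runs from 0 to R*z/h.
    """
    V = Izz = Ixx = 0.0            # purely geometric integrals; density applied at the end
    dz = h / n
    dphi = 2.0 * math.pi / n
    for i in range(n):
        z = (i + 0.5) * dz
        dr = (R * z / h) / n       # radial step depends on the height z
        for j in range(n):
            r = (j + 0.5) * dr
            for k in range(n):
                phi = (k + 0.5) * dphi
                x, y = r * math.cos(phi), r * math.sin(phi)
                dV = r * dr * dphi * dz      # cylindrical volume element
                V += dV
                Izz += (x * x + y * y) * dV
                Ixx += (y * y + z * z) * dV
    rho = M / V                    # uniform density from the estimated volume
    return V, rho * Izz, rho * Ixx

# Illustrative values only.
M, R, h = 2.0, 1.0, 3.0
V_num, Izz_num, Ixx_num = cone_moments(M, R, h)

# Closed forms quoted in the problem, for comparison.
V_exact = math.pi * R ** 2 * h / 3.0
Izz_exact = 0.3 * M * R ** 2
Ixx_exact = 0.15 * M * (R ** 2 + 4.0 * h ** 2)
```

Comparing the numerical and closed-form values for a couple of different (R, h) pairs is a quick way to catch a wrong integration limit or a missing Jacobian factor.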
Problem 26 (Electra’s Ellipsoid) Prove that an ellipsoid of uniform mass density
ρ, total mass M and semi-axis lengths a, b, c (along the x, y, z axes respectively)
has a diagonal inertia
tensor with principal moments of inertia (eigenvalues):
$$ \lambda_1 = \frac{1}{5}M(b^2 + c^2), \qquad \lambda_2 = \frac{1}{5}M(c^2 + a^2), \qquad \lambda_3 = \frac{1}{5}M(a^2 + b^2). \tag{4.62} $$
Hint: Due to the reflection symmetries of the ellipsoid, the off-diagonal components
of the inertia tensor are necessarily zero.
Hint: When doing the integrations, make use of symmetry and switch to spherical
coordinates to do the final integrations. The Jacobian factor for spherical
coordinates is $r^2 \sin(\theta)$, where θ is the polar angle (the angle between the radial
direction and the z-axis).
Hint: The volume of an ellipsoid can be derived rather easily by integration:
$V = \int_{\text{Ellipsoid}} dV$, in spherical coordinates. You should find that $V = \frac{4}{3}\pi abc$.
Now, compute the inertia matrix for a sphere of radius R. In other words, set
a = b = c = R. What do you notice? What does this suggest about the
rotational dynamics of a sphere?
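One possible route through the ellipsoid integrals is sketched below; it is not necessarily the intended one. It assumes semi-axis lengths a, b, c along x, y, z, rescales the ellipsoid to the unit ball B (the symbols u, v, w, s and B are introduced here purely for illustration), and works out one moment of inertia. The other two follow by relabelling the axes.

```latex
% Assumption: semi-axes a, b, c along x, y, z; B is the unit ball u^2 + v^2 + w^2 <= 1.
\begin{aligned}
x &= a u, \quad y = b v, \quad z = c w, \qquad dV = abc\, du\, dv\, dw, \\
\int_B u^2\, du\, dv\, dw &= \frac{1}{3}\int_B (u^2 + v^2 + w^2)\, du\, dv\, dw
  = \frac{1}{3}\int_0^1 4\pi s^4\, ds = \frac{4\pi}{15}, \\
I_{zz} &= \rho \int_{\text{Ellipsoid}} (x^2 + y^2)\, dV
  = \rho\, abc\, (a^2 + b^2)\, \frac{4\pi}{15}
  = \frac{1}{5} M (a^2 + b^2),
\end{aligned}
% using \rho = M / V = 3M / (4 \pi a b c) in the last step.
```

Setting a = b = c = R reduces every principal moment to (2/5)MR², the familiar result for a solid sphere.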
4.2.2 Principal Axes and Spectral Decomposition
The guaranteed existence of three mutually perpendicular principal axes for any
rigid body is a consequence of the fact that the inertia tensor is symmetric –
i.e. it is represented by a symmetric matrix: $I = I^T$. To see this, we now state
and illustrate a proof of the following, vastly important (and powerful) theorem.
Theorem 4 (Spectral Theorem (for symmetric linear operators)) Given any
symmetric linear operator (matrix) $A : \mathbb{R}^n \to \mathbb{R}^n$ acting on n-dimensional
Euclidean space, $\mathbb{R}^n$, we can extract n independent and mutually orthogonal
eigenvectors with n corresponding real eigenvalues. That is, we can diagonalize A:
$$ A = U \Lambda U^T, \tag{4.63} $$
where Λ = diag(λ1, λ2, ..., λn) is a diagonal matrix consisting of the eigenvalues
λj of A. The matrix U is an orthogonal matrix, composed of the (mutually orthogonal,
unit-length) eigenvectors of A as its columns. By flipping the sign of one eigenvector
if necessary, U can be taken to be a special orthogonal matrix (a rotation matrix,
U ∈ SO(3), when n = 3).
One can make sense of this theorem by observing the following sketch proof.

Proof 1 We can use an inductive proof by starting with n = 1 – i.e. a
1 × 1 matrix A = [a]. Clearly this has one eigenvalue a and eigenvector
(1). Let $I_n$ denote the n × n identity matrix. Now consider the polynomial
$P(\lambda) = \det(A - \lambda I_n)$. By the fundamental theorem of algebra, the eigenvalue
equation P(λ) = 0 has n roots (counted with multiplicity) over the field of complex numbers, $\mathbb{C}$.
It follows that since the matrix $(A - \lambda I_n)$ is non-invertible (zero determinant)
for any eigenvalue λ, the following equation holds:
$$ A u = \lambda u, \tag{4.64} $$
for some non-zero vector $u \in \mathbb{C}^n$. We can always divide u by its length $\|u\|$, so that
$u^\dagger u = 1$. It follows that:
$$ \lambda = u^\dagger A u. \tag{4.65} $$
Since A is real and symmetric, the complex conjugate of $u^\dagger A u$ is $u^\dagger A^T u = u^\dagger A u$;
hence λ is real, and u may be chosen real. Doing this for all eigenvalues λj and
eigenvectors uj, we note that eigenvectors belonging to distinct eigenvalues are
automatically orthogonal, and we can use the Gram-Schmidt procedure within the eigenspace
of any repeated eigenvalue to create an orthonormal
set of n linearly independent eigenvectors (the inductive step, restricting A to the
orthogonal complement of an eigenvector, guarantees that there are enough of them). Putting these into the columns
of a matrix: U = [u1 ... un], we can then use the matrix U to diagonalize A –
that is,
$$ A = U \Lambda U^T, \tag{4.66} $$
where U is an orthogonal matrix ($U^T = U^{-1}$) and Λ = diag(λ1, λ2, ..., λn) is a
diagonal matrix whose entries are the eigenvalues of A.
Since the inertia tensor is a symmetric linear operator – represented by a sym-
metric 3-by-3 matrix, the spectral theorem implies the following physical re-
sult.
Corollary 1 (Existence of Principal Axes) For any rigid body R and some point
O in space, there are three mutually orthogonal (perpendicular) principal axes
through O. In such a basis, the inertia tensor I = diag(λ1, λ2, λ3) is diagonal.
When the angular velocity ω (rotational axis) points along any of these axes the
angular momentum L is parallel to it.
Therefore, the principal axes of a rigid body are the eigenvectors of its inertia
tensor. Furthermore, the ‘principal moments of inertia’ are the moments of
inertia about each of these axes – i.e. the eigenvalues of the inertia tensor.
We now illustrate the process of ‘principal axes decomposition’ (spectral
decomposition) for rotational dynamics. To this end, let us return to the cube!
Problem 27 (Clytaemnestra’s Cunning Cube) Having purified the line of Atreus
of the family curse, Orestes and Electra establish a court of justice under Apollo
in Athens. All seems well, until a cunning cube rolls into the Athenian court.
With it, the cube brings the Furies – called upon by the spirit of Orestes’ mother
in vengeance.
To protect Orestes and transform the Furies into Eumenides (Benevolent Ones),
Athena must perform a spectral decomposition of the inertia tensor for
Clytaemnestra’s Cube, rotating about its corner.
We computed the inertia tensor for a cube (of edge length a) rotating about its
corner, in our first example:
$$ I = \frac{Ma^2}{12} \begin{pmatrix} 8 & -3 & -3 \\ -3 & 8 & -3 \\ -3 & -3 & 8 \end{pmatrix}. \tag{4.67} $$
1. Eigenvalues (Principal Moments).
By solving the characteristic equation, $\det(I - \lambda \mathbf{1}_3) = 0$, show that we
get the following eigenvalues:
$$ \lambda_1 = 2\mu, \qquad \lambda_2 = \lambda_3 = 11\mu, \tag{4.69} $$
where $\mu := \frac{Ma^2}{12}$.
2. Eigenvectors (Principal axes). By solving the eigenvector equations:
$$ (I - \lambda_j \mathbf{1}_3)\,\boldsymbol{\omega}^j = 0, \tag{4.70} $$
for eigenvectors $\boldsymbol{\omega}^j$ with corresponding eigenvalues λj (with j = 1, 2, 3),
show that:
$$ \mathbf{e}_1 = \frac{1}{\sqrt{3}}(1, 1, 1) \tag{4.71} $$
is a unit eigenvector corresponding to eigenvalue λ1 = 2µ. Therefore, one
principal axis of the cube is the main diagonal of the cube (between O and
(a, a, a)). The corresponding moment of inertia is equal to λ1 = 2µ.
Solving the second eigenvector equation, with λ2 = λ3 = 11µ, we see that
we get the following constraint (with j = 2, 3):
$$ \omega^j_x + \omega^j_y + \omega^j_z = 0. \tag{4.72} $$
This is precisely equivalent to the orthogonality condition:
$$ \mathbf{e}_1 \cdot \mathbf{e}_j = 0, \tag{4.73} $$
for j = 2, 3. In other words, the first principal axis is perpendicular to
the second and third principal axes. The latter are not uniquely determined,
since we can choose any two mutually orthogonal vectors in the
2-dimensional subspace (plane through the origin) of $\mathbb{R}^3$ orthogonal to e1.
3. Eigenbasis (Principal axes basis).
Using the principal axes as our basis vectors, the inertia matrix with respect
to this basis has the diagonal form:
$$ I = \frac{Ma^2}{12} \begin{pmatrix} 2 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 11 \end{pmatrix}, \tag{4.74} $$
as guaranteed by the spectral decomposition (principal axes) theorem.
For any rotations about the principal axes, the angular momentum L of the
cube is necessarily parallel to the angular velocity ω.
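The decomposition above can be checked with a few lines of arithmetic. The sketch below works in units of µ = Ma²/12 and uses one particular orthogonal triple of candidate axes; the second and third vectors, (1, −1, 0) and (1, 1, −2), are just one convenient choice out of the infinitely many allowed in the plane orthogonal to (1, 1, 1), and the helper names are hypothetical.

```python
import math

# Inertia matrix of the cube about a corner, in units of mu = M a^2 / 12.
A = [[8, -3, -3], [-3, 8, -3], [-3, -3, 8]]

def matvec(M, v):
    """Matrix-vector product for a 3x3 nested list and a 3-tuple."""
    return tuple(sum(M[i][k] * v[k] for k in range(3)) for i in range(3))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

# Candidate principal axes: the main diagonal plus two orthogonal vectors in
# the plane x + y + z = 0 (an arbitrary choice within that plane).
candidates = [(1, 1, 1), (1, -1, 0), (1, 1, -2)]
axes = [normalise(v) for v in candidates]

# Entries of U^T A U, where the columns of U are the unit axes.
D = [[dot(axes[i], matvec(A, axes[j])) for j in range(3)] for i in range(3)]
```

Here `D` comes out diagonal, with entries 2, 11, 11 up to rounding, matching the eigenvalues 2µ and 11µ. Replacing the last two candidates by any other orthogonal pair in the same plane leaves `D` unchanged.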

More generally, for bodies of uniform density, a geometric axis of symmetry
will serve as a principal axis for the body. The two remaining principal axes will
be in a plane perpendicular to the first principal axis. In the case that the body
has rotational symmetry about an axis through its center, then the remaining two
principal axes can have any direction perpendicular to the first principal axis.
For rigid bodies with minimal symmetry, it may happen that the principal axes
are uniquely determined – i.e. there is no freedom to choose the remaining two
principal axes once the first is established. For bodies with maximal symmetry,
such as a sphere, it turns out that any axis is a principal axis.
4.2.3 Parallel and Perpendicular Axis Theorems
In the following extended problem, we will see explicitly how our higher-level
theory of rotational dynamics returns the traditional results from simple rota-
tional dynamics considered in first-year physics and engineering.
Problem 28 (Special Properties) Having converted the Furies, Athena decides
to entertain the redeemed Orestes by asking him to prove the following properties
of the inertia tensor. Help Orestes by solving these problems.
1. Elegance
Recalling the definitions of the components of the inertia tensor for a rigid
body R, prove that we can compile these into the single elegant form:
$$ I_{jk} = \int_{\mathcal{R}} dm\,\left[\mathbf{r} \cdot \mathbf{r}\,\delta_{jk} - r_j r_k\right], \tag{4.75} $$
where r is the displacement vector relative to some origin O, and j, k =
1, 2, 3. Our differential mass element dm = ρdV is given in terms of the
mass density profile ρ and differential volume element dV. Furthermore,
the Kronecker delta is defined such that δjk = 1 when j = k and δjk = 0
when j ≠ k.
Hint: If you choose an (x, y, z) Cartesian coordinate system, one has
r = (x, y, z).
2. Additivity
Just like moments of inertia, the inertia tensor obeys an additive property
in the following sense. Given two rigid bodies A and B (e.g. a tetrahedron
(pyramid) stacked on a cube), the combined[18]
rigid body, A ⊕ B, has the following inertia tensor (with all tensors taken about the same origin):
$$ I_{A \oplus B} = I_A + I_B. \tag{4.75} $$
Using this property, write down the inertia tensor for a cube stacked on
top of an inverted cone rotating about its vertex.
Likewise, we can ‘subtract’ inertia tensors in this way – i.e. if a rigid body
A \ B is given by removing a rigid body B from a rigid body A, we have:
$$ I_{A \setminus B} = I_A - I_B. \tag{4.75} $$
Using this property, write down the inertia tensor for a spherical shell, of
outer radius R2 and inner radius R1, in terms of the inertia tensor for a
ball (solid sphere) of radius R2 and the inertia tensor for a ball of the same
density and radius R1.
[18] Here the ‘direct sum’ notation is used to denote the union of two non-overlapping subsets of $\mathbb{R}^3$.
3. Triangle Inequalities and Representative Ellipsoid Prove that for rota-
tions of an arbitrary rigid body R about an arbitrary pivot point, O, the
principal moments of inertia of the corresponding inertia tensor (i.e. diag-
onal components in the principal axes basis) obey the triangle inequality:
Ixx + Iyy ≤ Izz, Izz + Iyy ≤ Ixx, Izz + Ixx ≤ Iyy. (4.75)
Hint: Work from the deﬁnition of the components of the inertia tensor.
Also, here we let the (x, y, z) coordinate axes coincide with the principal
axes – hence Ixx = λ1, Iyy = λ2, Izz = λ3.
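It follows that the inertia tensor for the rigid body R is equivalent to the inertia tensor for a uniform ellipsoid of mass M (with principal axes lengths 2a, 2b, 2c), whose semi-axes a, b, c are fixed by the principal moments of inertia:
\frac{2}{5} M a^2 = \lambda_2 + \lambda_3 - \lambda_1 \geq 0,
\frac{2}{5} M b^2 = \lambda_3 + \lambda_1 - \lambda_2 \geq 0,
\frac{2}{5} M c^2 = \lambda_1 + \lambda_2 - \lambda_3 \geq 0.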
4. Generalized Parallel Axis Theorem
The ‘parallel axis’ theorem in your typical first-year physics or engineering textbook will usually say something along the lines of: “The moment of inertia I for an object of mass M rotating about an axis parallel to an axis through the center of mass is given by:
I = I_{CM} + M d^2, \qquad (4.73)
where ICM is the moment of inertia about the axis through the center of mass and d is the perpendicular distance between the axis through the center of mass and the parallel axis of rotation.” In this simple statement, I and ICM are scalars. We now generalize this to the inertia tensor:
I = I_{CM} + M[(\mathbf{r} \cdot \mathbf{r}) g - \mathbf{r} \otimes \mathbf{r}], \qquad (4.73)
where g is the Euclidean metric (identity matrix in Cartesian coordinates) and r is the position vector of the new pivot point relative to the center of mass (origin), O.
To make sense of the above, we shall go back to the component form of the inertia tensor that you are familiar with – i.e. matrices. In particular, let I^{CM} denote the moment of inertia tensor of a rigid body of mass M about its center of mass. Let I^{CM+Δr} denote the inertia tensor about a point, P = rCM + Δr, displaced from the CM by a vector Δr = (Δx, Δy, Δz). Prove that we then have:
I^{CM+\Delta r}_{xx} = I^{CM}_{xx} + M((\Delta y)^2 + (\Delta z)^2), \quad I^{CM+\Delta r}_{yz} = I^{CM}_{yz} - M(\Delta y)(\Delta z). \qquad (4.73)
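As a numerical sanity check (not a proof), the component formulas above can be tested on a toy rigid body. The following is a minimal sketch in Python, assuming the body is a small collection of point masses so that the integral over dm becomes a sum; the masses, positions and displacement are hypothetical values chosen only for illustration.

```python
def inertia_tensor(masses, origin):
    """Return I_jk = sum of m * (r.r * delta_jk - r_j * r_k),
    with r measured from `origin` (discrete version of the integral over dm)."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, position in masses:
        r = [position[i] - origin[i] for i in range(3)]
        r_squared = sum(c * c for c in r)
        for j in range(3):
            for k in range(3):
                delta_jk = 1.0 if j == k else 0.0
                tensor[j][k] += m * (r_squared * delta_jk - r[j] * r[k])
    return tensor


def centre_of_mass(masses):
    total = sum(m for m, _ in masses)
    return [sum(m * p[i] for m, p in masses) / total for i in range(3)]


# Hypothetical rigid body: (mass, (x, y, z)) pairs in arbitrary units.
body = [
    (1.0, (0.0, 0.0, 0.0)),
    (2.0, (1.0, 0.0, 0.5)),
    (1.5, (0.0, 2.0, -1.0)),
    (0.5, (-1.0, 1.0, 1.0)),
]
M = sum(m for m, _ in body)
cm = centre_of_mass(body)
delta = (0.3, -0.7, 1.1)  # hypothetical displacement (dx, dy, dz) from the CM
pivot = [cm[i] + delta[i] for i in range(3)]

I_cm = inertia_tensor(body, cm)
I_pivot = inertia_tensor(body, pivot)

# Prediction of the generalized parallel axis theorem, component by component.
d_squared = sum(c * c for c in delta)
I_predicted = [
    [I_cm[j][k] + M * ((d_squared if j == k else 0.0) - delta[j] * delta[k])
     for k in range(3)]
    for j in range(3)
]

max_error = max(abs(I_pivot[j][k] - I_predicted[j][k])
                for j in range(3) for k in range(3))
print("largest discrepancy:", max_error)

# The triangle inequalities of Problem 3 hold for the diagonal components
# in any Cartesian basis, so we can check them here as well.
Ixx, Iyy, Izz = (I_cm[i][i] for i in range(3))
print("triangle inequalities hold:",
      Ixx + Iyy >= Izz and Iyy + Izz >= Ixx and Izz + Ixx >= Iyy)
```

The discrepancy should be at the level of floating-point rounding. The cross terms cancel only because the displacement is measured from the center of mass, which is the key step in the proof.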
4.2.4 Precession and Torque: Equinox, Spinning top
We now consider the motion of a spinning top (of total mass M) consisting
of a solid rod of length R, passing through an attached uniform circular disk.
We consider the class of rotations about the tip of the spinning top (rotations with a

fixed point). Let the tip coincide with the origin O of some Cartesian coordinate
system and let R be a displacement vector from O to the center of mass, CM.
At the CM, the force of gravity acts vertically downward: Fgrav = Mg with
g = −g∂z.
DIAGRAM
If we let the top make an angle θ with respect to the z-axis, then due to the spinning top’s axial symmetry, the axis of the rod is a principal axis. If we let e3 be the unit vector in the direction of the rod, then the remaining principal axes are defined by vectors (e1, e2) perpendicular to e3. In such a basis, the inertia tensor is diagonal,
I = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3). \qquad (4.74)
In the absence of gravity, we analyse the motion of the top spinning about its symmetry axis (with basepoint fixed at O). The corresponding angular velocity is given by ω = ωe3, leading to an angular momentum parallel to it:
\mathbf{L} = \lambda_3 \boldsymbol{\omega} = \lambda_3 \omega \mathbf{e}_3. \qquad (4.75)
In the case of zero gravity (no torque), the angular momentum is constant and so
the axis of rotation remains fixed. In the presence of gravity, there is a gravita-
tional torque that is generated:
Γ = R × Mg. (4.76)
This has magnitude RMg sin(θ) and a direction which is perpendicular to both
the vertical z axis and the axis (R) of the top. If we take the (‘reasonable’)
approximation that the torque due to gravity is small (by selecting R, M, g small
relative to other parameters), then we show that the angular momentum direction
(rotational axis) is approximately constant – with a precession about the z-axis.
To see this, note that:
\boldsymbol{\Gamma} = \frac{d}{dt} \mathbf{L}. \qquad (4.77)
The non-zero time-variation of L implies a time-varying angular velocity vector, ω. Since the torque (and hence the time-variation) is small and we have ω(t = 0) = (0, 0, ω3), it follows that ω1 and ω2 remain small (which can be made precise with a more detailed analysis). Therefore, we can assume that L = λ3ω = λ3ωe3 is satisfied throughout the motion, and the only time-varying quantity in this expression is the principal axis, e3. To see this, note that in this regime the torque Γ is orthogonal to L, since Γ is perpendicular to e3; a torque orthogonal to L changes the direction of L but not its magnitude, so:
\lambda_3 \omega \dot{\mathbf{e}}_3 = \mathbf{R} \times M\mathbf{g}. \qquad (4.78)
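Using R = Re3 and g = −g∂z, we have:
\frac{d}{dt} \mathbf{e}_3 = \boldsymbol{\Omega} \times \mathbf{e}_3, \qquad (4.79)
where:
\boldsymbol{\Omega} = \frac{MgR}{\lambda_3 \omega} \partial_z. \qquad (4.80)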
Recall now that the tangential velocity v of an object (or center of mass) undergoing rotational motion with angular velocity (rotation axis) Ω and displacement vector r is given by:
\mathbf{v} = \boldsymbol{\Omega} \times \mathbf{r}. \qquad (4.81)

Therefore, the motion of the top is a rotational motion about the rotational axis R = Re3, with a superimposed precession about the z-axis. In particular, the vector e3 rotates about the z-axis with an angular frequency:
\Omega = \frac{RMg}{\lambda_3 \omega}. \qquad (4.82)
This makes sense since the torque vector Γ is directed into the page – the direc-
tion in which the angular momentum vector changes.
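To get a feel for the size of the precession frequency, here is a minimal Python sketch that evaluates Ω = RMg/(λ3ω) for a toy top. All numbers are hypothetical, and we assume (purely for illustration) that the rod is light enough for λ3 to be approximated by the moment of inertia of the uniform disk about its own axis.

```python
import math


def precession_rate(mass, g, cm_distance, lambda3, spin):
    """Slow-precession angular frequency Omega = R M g / (lambda_3 omega)."""
    return mass * g * cm_distance / (lambda3 * spin)


# Hypothetical toy top (SI units).
mass = 0.1           # total mass M in kg
g = 9.81             # gravitational acceleration in m/s^2
cm_distance = 0.04   # distance R from the tip to the CM in m
disk_radius = 0.03   # radius of the disk in m

# Assumption: lambda_3 is dominated by the disk, (1/2) M a^2 about its axis.
lambda3 = 0.5 * mass * disk_radius ** 2
spin = 2.0 * math.pi * 30.0  # spin rate omega: 30 revolutions per second

Omega = precession_rate(mass, g, cm_distance, lambda3, spin)
print("precession rate (rad/s):", Omega)
print("precession period (s):", 2.0 * math.pi / Omega)
print("ratio Omega / omega:", Omega / spin)
```

The small ratio Ω/ω is what the ‘small torque’ approximation requires. Note also that doubling the spin rate halves the precession rate, so a fast top precesses slowly.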
Although the motion discussed here describes a spinning top, the same effect (weak torque and precession) applies to the dynamics of the solar system. For example, because of the earth’s bulge at the equator (oblate shape), the sun and moon exert small torques on the earth. These torques cause the earth’s rotational axis (for the 24-hour day-night rotation cycle) to precess. At the moment, the earth’s spin axis is inclined at θ = 23° from the normal to the earth’s orbit around the sun. Due to the torques acting on the earth, the axis of spin traces out a cone of half-angle 23 degrees around the normal to the orbital plane. This precession motion is known as the precession of the equinoxes. The period for this motion is:
T = 26,000 years. (4.83)
This means that in 13,000 years the north pole star will be 46 degrees away from true North.
Note that we have only reached the tip of our spinning top (footnote 19). To understand the full dynamics of a spinning top, one would have to introduce the concept of Euler angles (footnote 20). In the full treatment, one can prove the validity of the approximations following the ‘small torque’ setup. The general dynamics also includes an additional motion, ‘nutation’, which is essentially a tilting motion of the top’s rotational axis towards the vertical axis. One can illustrate nutation combined with precession by inscribing the trajectory of the ‘top end’ of the spinning top onto the surface of a sphere (see John Taylor’s Classical Mechanics).
19 Iceberg.
20 Alternatively, the quaternions, H, or the rotation group, SO(3).

4.3 Accelerating Frames: The Tides
In some manner, it is fair to say that modern ‘physics’ began with the impetus provided by Galileo. The Galilean ‘principle of relativity’ can be summarised as: “The laws of motion are the same in all inertial frames.” (footnote 21) This is an extremely important statement, from which most of Newtonian mechanics can be derived. In particular, it establishes the fundamental ‘symmetry group’ for the laws of nature to be that of the ‘Euclidean symmetry group: ISO(3)’ – rotations of space coupled with translations of space and time (footnote 22). This is in contrast to the Poincaré symmetry group, ISO(1,3), underlying Einstein’s principle of ‘special relativity’ – i.e. the speed of light is the same constant in all inertial frames and ‘inertial frames’ are defined up to 4D Lorentz transformations (instead of 3D rotations).
In this exploration we will study the symmetries underlying classical, non-relativistic physics. In this manner, we will demonstrate how to ‘extend’ Newton’s laws to accelerating reference frames. This will allow us to finish with a mathematical and physical explanation of the two high tides and two low tides which are observed daily on the earth. In particular, we will obtain a somewhat accurate estimate of the average height difference between high tides and low tides for oceanic bodies.
4.3.1 Isometries of Euclidean Space: Galileo
Euclidean isometries – Galilean Relativity
Give an explicit representation of ISO(3)
Rotation plus translation as a block matrix: \begin{pmatrix} R & \mathbf{v} \\ 0 & 1 \end{pmatrix}
Prove that Newton’s 3 Laws are invariant under ISO(3)
4.3.2 Linearly Accelerating Frames
Let S0 be an inertial frame and let S be a frame with acceleration A relative to S0 – for example, a train with some velocity V and acceleration A = \dot{V} relative to some station O. If a passenger in the train throws a tennis ball (mass m) with velocity \dot{r}_0 relative to S0, then using Newton’s second law in this inertial frame gives:
\mathbf{F} = m\ddot{\mathbf{r}}_0, \qquad (4.84)
where r0 is the ball’s displacement relative to S0 and F is the net force on the ball. The ball’s velocity relative to S0 can be expanded as the sum of its velocity \dot{r} relative to S (the train) and the velocity V of frame S relative to S0:
\dot{\mathbf{r}}_0 = \dot{\mathbf{r}} + \mathbf{V}. \qquad (4.85)
Differentiating, we see that:
\ddot{\mathbf{r}} = \ddot{\mathbf{r}}_0 - \mathbf{A}, \qquad (4.86)
hence we have:
m\ddot{\mathbf{r}} = \mathbf{F} - m\mathbf{A}. \qquad (4.87)
21 Galileo stated this in his “Dialogue Concerning the Two Chief World Systems”.
22 ‘An extra symmetry.

If we identify F_inertial = −mA as the inertial force, we see clearly how Newton’s second law is augmented in non-inertial frames. For example, during take-off in an aircraft, one feels a force that pushes one back into one’s seat; likewise when a bus brakes and one is standing. These fictitious forces (termed ‘inertial forces’) are simply introduced to extend Newton’s laws to non-inertial reference frames.
Exercise 30 (Accelerating Pendulum)
4.3.3 The Tides
Using the non-inertial extension of Newton’s second law derived earlier,
m\ddot{\mathbf{r}} = \mathbf{F} - m\mathbf{A}, \qquad (4.88)
we can obtain a physical explanation of the high and low tides observed on Earth. The tides are the result of bulges in the earth’s oceans caused by the gravitational attraction of the moon and sun. As the earth rotates, objects on the earth’s surface move past these bulges and are subject to a rising and falling of the sea-level. In our analysis, we shall first only consider the effect of a single external body on the earth’s oceans – in particular, the moon.
DIAGRAMS
An inaccurate explanation of the tides is one in which the oceans bulge towards the moon on just the side of the earth facing the moon. In such a scenario, we would only get one high tide per day. The correct explanation is that the moon’s gravitational attraction on the earth imparts a small acceleration A towards the moon: a centripetal acceleration of the earth as the earth and moon orbit a common center of mass (very close to the earth’s center). This centripetal acceleration, shared by any object moving along with the earth, corresponds to the pull of the moon that the object would feel at the earth’s center. Any object on the moon side of the earth is pulled by the moon with a force slightly greater than it would be at the earth’s center, hence the ocean surface bulges towards the moon. Objects on the far side from the moon are pulled by the moon with a force that is slightly weaker than that at the center of the earth. Relative to the earth, this acts as a slight repulsion, which causes the ocean to bulge on the side away from the moon and accounts for the second high tide of each day.
DIAGRAM
The forces on any mass m near the earth’s surface include:
• The gravitational pull of the earth: mg.
• The gravitational pull of the moon: -G M_m m \hat{\mathbf{d}} / d^2, where M_m is the mass of the moon and d is the position of the object relative to the moon.
• The net non-gravitational force: Fng. This could be the buoyancy force on a drop of water in the ocean.
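The acceleration of the earth frame S (with origin O at the earth’s center) is given by:
\mathbf{A} = -\frac{G M_m}{d_0^2} \hat{\mathbf{d}}_0, \qquad (4.89)
where d0 is the position of the earth’s center relative to the moon.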
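Thus, using the non-inertial form of Newton’s second law, we have:
m\ddot{\mathbf{r}} = \mathbf{F} - m\mathbf{A} = \left(m\mathbf{g} - \frac{G M_m m}{d^2}\hat{\mathbf{d}} + \mathbf{F}_{ng}\right) + \frac{G M_m m}{d_0^2}\hat{\mathbf{d}}_0. \qquad (4.90)

From this, we have:
m\ddot{\mathbf{r}} = m\mathbf{g} + \mathbf{F}_{tid} + \mathbf{F}_{ng}, \qquad (4.91)
where the tidal force is:
\mathbf{F}_{tid} = -G M_m m \left(\frac{\hat{\mathbf{d}}}{d^2} - \frac{\hat{\mathbf{d}}_0}{d_0^2}\right). \qquad (4.92)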
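To see the two bulges emerge from the tidal force formula, the following Python sketch evaluates F_tid/m along the earth-moon line, at the points on the earth’s surface nearest to and farthest from the moon. The setup is a one-dimensional simplification that we assume for illustration: the x-axis points from the earth’s center towards the moon, and the physical constants are rounded textbook values.

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_MOON = 7.35e22   # mass of the moon, kg
D0 = 3.84e8        # mean earth-moon distance d_0, m
R_EARTH = 6.37e6   # radius of the earth, m


def tidal_acceleration(x):
    """F_tid / m along the earth-moon line.

    x is the signed position of the object relative to the earth's centre,
    positive towards the moon (assumes |x| < D0). With this choice both unit
    vectors d-hat and d0-hat point away from the moon, i.e. along -x, so the
    tidal formula reduces to G M_m (1/d^2 - 1/d_0^2) along +x.
    """
    d = D0 - x  # distance from the moon to the object
    return G * M_MOON * (1.0 / d ** 2 - 1.0 / D0 ** 2)


near_side = tidal_acceleration(+R_EARTH)
far_side = tidal_acceleration(-R_EARTH)

# Leading-order estimate 2 G M_m R / d_0^3, valid because R << d_0.
leading_order = 2.0 * G * M_MOON * R_EARTH / D0 ** 3

print("near side (m/s^2):", near_side)   # positive: towards the moon
print("far side  (m/s^2):", far_side)    # negative: away from the moon
print("leading-order estimate (m/s^2):", leading_order)
```

The opposite signs are the whole story: relative to the earth’s center, water on both sides is pulled away from the center, giving two bulges. The magnitudes are of order 10^-6 m/s^2, tiny compared with g.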
Finish derivation of equations
Diagrams
Gravitational Potential energies and derivation of tide height difference
Corrections / higher-order effects

4.4 Centrifugal and Coriolis Forces
In this section (footnote 23), we shall see how to extend Newtonian mechanics to rotating reference frames. Some examples: the surface of the earth (which rotates about an axis through the earth’s center of mass), a squirrel running on the arm of a spinning ice-skater (Monica Leslie), or a space station rotating about some axis to generate artificial gravity. As we shall find, classical dynamics in rotating frames gives rise to ‘fictitious forces’ when these dynamics are analysed (using Newton’s laws) from the perspective of an observer attached to the rotating frame, rather than an observer attached to an inertial reference frame. These ‘fictitious forces’ are the well-known ‘Coriolis’ and ‘centrifugal’ forces. Despite being ‘fictitious’, their effects are very real. One can feel them, for example, when moving on a turn-table (merry-go-round) or driving around a sharp corner.
In the larger scheme of things, the Coriolis and centrifugal forces affect the tra-
jectories of projectiles in the atmosphere as well as the formation and dynamics
of different weather systems.
4.4.1 Rotational Motion and Angular Velocity
Recalling our previous investigation of moments of inertia and rotational dynamics, we characterised the rotational dynamics of a rigid body with an ‘inertia tensor’ and an angular velocity vector ω, defining the axis of rotation and the rate of rotation (rotation speed) ω = |ω|.
We know that for the rotational motion of a rigid body with some fixed point O of rotation, Euler’s rotation theorem tells us that a general rotation of the rigid body relative to O can be described as a rotation of the body about some axis through O. In other words, to describe the rotational dynamics of a rigid body, we need four pieces of information: (O, ω). Explicitly,
• A fixed point O of the rotational motion. For freely-moving rigid bodies (with no fixed point), we can instead use the center of mass, CM, and analyse dynamics in the center-of-mass frame (footnote 24). In this frame, the center of mass is a ‘fixed point’.
• A vector ω defining the axis of rotation. Since ω ∈ R^3, ω has 3 independent components in general, hence corresponds to 3 ‘pieces of information’.
Originally, Euler gave geometric proofs of his rotation theorem. One such proof shows that any rotation can be constructed from two reflection transformations (linear transformations R with determinant −1 and R^T = R^{-1}) – i.e. elements of the 3-dimensional orthogonal group, O(3). Equivalently, Euler’s rotation theorem can be proved via Rodrigues’ rotation formula or by showing that the set of rotations (linear transformations with R^T = R^{-1} and determinant +1) in 3 dimensions forms a Lie group, SO(3). We give a simple proof now, making use of basic linear algebra.
Proof 2 (Euler’s Theorem via eigenvectors) Given any rotation R acting on R^3, there exists some vector n invariant under R.
23 Some of this material is based on Chapter 9 of John R. Taylor’s “Classical Mechanics”.
24 Recall the tutorials on the ‘two-body problem’ – here we showed that the motion of any rigid body could be decomposed as the translational motion of its CM and rotations about this CM.

To see the proof of this statement, note that for a vector n to be invariant under
the rotation R, we must have:
Rn = n, (4.93)
which means that n is an eigenvector of R with eigenvalue λ = 1. Therefore,
we must show that such an eigenvector exists.
Since R^T = R^{-1}, it is easy to see that det(R) = ±1. In fact, since R is a rotation, it is an orientation-preserving isometry of Euclidean space – hence det(R) = +1. Furthermore, since R is a linear operator acting on R^3, we have det(−R) = (−1)^3 det(R) = −1 and det(R^{-1}) = det(R^T) = det(R) = 1 (transpose preserves the determinant: det(A) = det(A^T)). Combining these results, we have:
\det(R - I) = \det((R - I)^T)
= \det(R^T - I)
= \det(R^{-1} - I)
= \det(-R^{-1}(R - I)) = \det(-R^{-1}) \det(R - I)
= -\det(R - I), \qquad (4.94)
hence det(R − I) = 0. This tells us that λ = 1 is an eigenvalue of the rotation R. If R is a trivial rotation, then R = I and the result follows. If R is non-trivial, then we have (R − I) ≠ 0, yet det(R − I) = 0, hence (R − I) must have a non-trivial kernel – hence ∃ n ∈ R^3, n ≠ 0, such that:
(R - I)\mathbf{n} = 0. \qquad (4.95)
It follows that n is an eigenvector of R with eigenvalue λ = 1, hence a vector invariant under R. This means R represents a rotation about the axis n.
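The statement just proved can be checked numerically. The sketch below (plain Python, no external libraries) builds a rotation as a product of two elementary rotations with hypothetical angles, confirms that det(R − I) vanishes, and recovers an invariant axis n. The axis is read off from the antisymmetric part of R, a standard shortcut that we assume here for brevity; it works whenever the rotation angle is not 0 or π.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


# Hypothetical rotation: first about x by 1.2 rad, then about z by 0.7 rad.
R = matmul(rot_z(0.7), rot_x(1.2))

R_minus_I = [[R[i][j] - (1.0 if i == j else 0.0) for j in range(3)]
             for i in range(3)]
det_R = det3(R)
det_R_minus_I = det3(R_minus_I)

# Axis from the antisymmetric part of R (proportional to sin(theta) * n).
raw_axis = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
norm = math.sqrt(sum(c * c for c in raw_axis))
n = [c / norm for c in raw_axis]

# Check that n is invariant: R n = n.
Rn = [sum(R[i][j] * n[j] for j in range(3)) for i in range(3)]
residual = max(abs(Rn[i] - n[i]) for i in range(3))

print("det(R)     =", det_R)
print("det(R - I) =", det_R_minus_I)
print("axis n     =", n)
print("max |Rn - n| =", residual)
```

Although R was assembled from rotations about two different axes, the output exhibits a single axis n left untouched by R, exactly as Euler’s theorem promises.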
To see Euler’s theorem realized explicitly, we now present the Euler-Rodrigues rotation formula (footnote 25). In particular, any rotation of the displacement vector r = x∂x + y∂y + z∂z will take the following form:
\mathbf{r}' = \mathbf{r} + 2a(\boldsymbol{\omega} \times \mathbf{r}) + 2\boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}), \qquad (4.96)
where ω = (b, c, d) and (a, b, c, d) are some set of parameters lying on the 3-sphere (a higher-dimensional sphere of unit radius):
a^2 + b^2 + c^2 + d^2 = 1. \qquad (4.97)
In particular, to perform a rotation of r counterclockwise through an angle θ about an axis defined by the unit vector k = (kx, ky, kz), we have:
a = \cos(\theta/2), \quad \boldsymbol{\omega} = \sin(\theta/2)\,\mathbf{k}. \qquad (4.98)
Exercise 31 (Rodrigues’ Rotation) In the timeline of the cyborg Constantine, Michael Grebla decides to run Rodrigo’s ‘Concierto de Aranjuez’ for a Sunday concert at St. George’s College. In order to hold this concert, Grebla enlists the help of the cyborg who must travel back in time to fetch Joaquin Rodrigo. In the midst of exam-period confusion, Constantine instead brings back Olinde Rodrigues. As penance for this unnecessary distortion of the spacetime continuum, the cyborg is asked to derive Rodrigues’ rotation formula from Euler’s construction:
\mathbf{r}' = \cos(\theta)\mathbf{r} + \sin(\theta)(\mathbf{k} \times \mathbf{r}) + (1 - \cos(\theta))(\mathbf{k} \cdot \mathbf{r})\mathbf{k}. \qquad (4.99)
25 This can be proven with Euler’s ‘Four-Square identity’ and the Euler rotation parameters.

Help restore balance to the spacetime continuum by completing this task. The following hints will help.
Hint: Recall that the vector-triple formula (BAC-CAB rule) relates the cross-products of three vectors a, b, c to pair-wise dot-products:
\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = \mathbf{b}(\mathbf{a} \cdot \mathbf{c}) - \mathbf{c}(\mathbf{a} \cdot \mathbf{b}). \qquad (4.100)
Hint: Recall your trigonometric identities: sin(2z) = 2 sin(z) cos(z) and sin^2(z) = (1/2)(1 − cos(2z)).
Note, Rodrigues’ formula can be derived from scratch by using vector projections (parallel and orthogonal projections). The more general form gives a closed-form explicit construction of an arbitrary rotation operator – this makes use of matrix representations of the Lie algebra so(3) of rotations. To this extent, if we let K = [k]× be a matrix representing the linear operator k×, then the rotation matrix representing counterclockwise rotations through an angle θ about an axis k (a unit vector) is given by:
R_{\mathbf{k}}(\theta) = I_3 + \sin(\theta)K + (1 - \cos(\theta))K^2. \qquad (4.101)
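The three forms of the rotation given above (the Euler-Rodrigues parameter form, Rodrigues’ vector formula and the matrix formula) should all produce the same rotated vector. A minimal Python sketch of this consistency check follows; the function names, axis, angle and test vector are our own hypothetical choices.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rodrigues_vector(r, k, theta):
    """r' = cos(t) r + sin(t) (k x r) + (1 - cos(t)) (k.r) k, with |k| = 1."""
    c, s = math.cos(theta), math.sin(theta)
    kxr, kr = cross(k, r), dot(k, r)
    return tuple(c * r[i] + s * kxr[i] + (1.0 - c) * kr * k[i] for i in range(3))


def euler_rodrigues(r, k, theta):
    """r' = r + 2a (w x r) + 2 w x (w x r), a = cos(t/2), w = sin(t/2) k."""
    a = math.cos(theta / 2.0)
    w = tuple(math.sin(theta / 2.0) * c for c in k)
    wxr = cross(w, r)
    wxwxr = cross(w, wxr)
    return tuple(r[i] + 2.0 * a * wxr[i] + 2.0 * wxwxr[i] for i in range(3))


def hat(k):
    """Matrix K = [k]_x representing the linear operator k x ."""
    return [[0.0, -k[2], k[1]],
            [k[2], 0.0, -k[0]],
            [-k[1], k[0], 0.0]]


def rotation_matrix(k, theta):
    """R = I + sin(t) K + (1 - cos(t)) K^2."""
    K = hat(k)
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]


def apply(matrix, r):
    return tuple(sum(matrix[i][j] * r[j] for j in range(3)) for i in range(3))


# Hypothetical test data: unit axis k, angle theta and vector r.
k = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
theta = 0.9
r = (0.3, -1.2, 2.0)

via_vector = rodrigues_vector(r, k, theta)
via_parameters = euler_rodrigues(r, k, theta)
via_matrix = apply(rotation_matrix(k, theta), r)
print(via_vector)
print(via_parameters)
print(via_matrix)

# Right-hand rule check: a quarter turn about z takes the x axis to the y axis.
quarter_turn = rodrigues_vector((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2.0)
print(quarter_turn)
```

The quarter-turn check confirms the counterclockwise (right-handed) sense of rotation. Agreement of the three outputs is, of course, only evidence and not a substitute for the derivation asked for in the exercise.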
Turning back to the physical world, we now consider the rotational motion of a
point P relative to some fixed origin O with angular velocity (rotation axis) ω.
In particular, we let r = OP be the displacement vector of P and let θ be the
co-latitude of r – i.e. the angle between the rotation axis (vertical) and r.
DIAGRAM
Physically, we could be describing the motion of an object on the earth’s surface, with ω defining an axis through the north pole of the earth and O the center of mass of the earth. It follows that the point P undergoes circular motion about ω with angular speed ω = |ω| and radius ρ = r sin(θ) (with r = |r|). The instantaneous velocity of the particle at any point in its trajectory is tangential to its path. In particular, it is given by:
\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}. \qquad (4.102)
However, v = dr/dt; thus it follows that the rate of change of a vector r fixed in a rotating body, as viewed from a non-rotating frame, is given by:
\frac{d}{dt}\mathbf{r} = \boldsymbol{\omega} \times \mathbf{r}. \qquad (4.103)
Hence the time-derivative differential operator d/dt acting on vectors fixed in a rotating body takes the form:
\frac{d}{dt} = \boldsymbol{\omega} \times \qquad (4.104)
in a non-rotating frame. The derivative operator d/dt is a linear operator. Similarly, the operator on the right-hand side, ω×, is also a linear operator (acting on R^3) which can be represented by a 3-by-3 matrix [ω]×:
\boldsymbol{\omega} \times \to [\boldsymbol{\omega}]_\times = \omega^j E_j = \begin{pmatrix} 0 & -\omega^3 & \omega^2 \\ \omega^3 & 0 & -\omega^1 \\ -\omega^2 & \omega^1 & 0 \end{pmatrix}, \qquad (4.105)
an element of the Lie algebra so(3) of rotations. Here ω^j are the components of the vector ω in some basis ej (j = 1, 2, 3), with Ej being the generators for rotations about each basis vector, respectively.

Exercise 32 (Additive Property of Angular Velocity) Translational velocities
are additive in the following way. Given a reference frame 1 and reference frame
2, moving relative to 1 with velocity v21, it follows that a third object with veloc-
ity v32 relative to frame 2, has the following velocity relative to frame 1:
v31 = v32 + v21. (4.107)
Consider now, two rotating frames 1 and 2, with frame 2 having an angular
velocity ω21 relative to frame 1. Now consider a third rigid object, rotating
with an angular velocity ω32 relative to frame 2, about some origin O. Let r be
the displacement vector from O to some point fixed on the third object. Using
the definition of angular velocity: v = ω × r as well as the additive property
of translation velocities, prove that angular velocities also possess the additive
property. That is, show that:
ω31 = ω32 + ω21. (4.108)
Hint: It may help to observe that the vector r is arbitrary.
4.4.2 Differential Operators in Rotating Frames: Newton’s
Second Law
For the remainder of this chapter, we take the convention that capitalized vectors
denote properties of reference frames. For example, we reserve V and A for the
velocity and acceleration of some (non-inertial) reference frame S with respect
to some inertial frame S0. Likewise, we use Ω to denote the angular velocity of
a rotating frame S with respect to inertial frame S0.
We now consider a frame attached to our rotating Earth. The Earth rotates once about its axis every 24 hours (footnote 26). Hence, for a reference frame fixed to the earth, we have a rotation rate of:
\Omega = \frac{2\pi}{24 \times 3600} \, \mathrm{rad/s} \approx 7.3 \times 10^{-5} \, \mathrm{rad/s}. \qquad (4.109)
We now let O be the origin of some inertial frame S0, with coordinate axes x0, y0, z0. Furthermore, we consider some rotating frame S (with coordinate axes x, y, z) attached to S0 at the common origin O, rotating with an angular velocity Ω relative to S0. Let {e1, e2, e3} be an orthonormal basis for the rotating frame S – i.e. unit vectors pointing along the coordinate axes x, y, z of the rotating frame. Time derivatives in the frame S0 will take a different form to time derivatives in the frame S, as we shall now demonstrate.
DIAGRAM
Consider an arbitrary vector, r. We shall denote its rate of change relative to the inertial frame S0 by:
\left(\frac{d\mathbf{r}}{dt}\right)_{S_0} \qquad (4.110)
and let
\left(\frac{d\mathbf{r}}{dt}\right)_{S} \qquad (4.111)
26 As Taylor notes, the rotation period of the earth is one sidereal day – a rotation with respect to the ‘fixed’ (distant) stars. This is shorter than the solar day by a factor of 365/366, which is negligible for the calculations we are going to perform.

denote its rate of change relative to the rotating frame, S. In the basis of the rotating frame, we can expand r as follows:
\mathbf{r} = r^1\mathbf{e}_1 + r^2\mathbf{e}_2 + r^3\mathbf{e}_3 = r^j\mathbf{e}_j, \qquad (4.112)
making use of the Einstein summation convention (footnote 27) in the second equality. Since the vectors ej are fixed in the rotating frame (attached to the coordinate axes x, y, z), the rate of change of r in this frame is given by:
\left(\frac{d\mathbf{r}}{dt}\right)_S = \left(\frac{dr^j}{dt}\right)\mathbf{e}_j. \qquad (4.113)
Given that the scalar functions r^j (components of r) are invariant under isometries (the same in either frame), we don’t need to worry about specifying which frame these scalar derivatives are taken in. In fact, we should probably denote such ‘scalar derivatives’ by ∂t or ∂/∂t.
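On the flip side, the vectors ej are co-rotating with the frame S – hence relative to the inertial frame S0, they are not fixed. Thus, in the frame S0, the rate of change of r is given by:
\left(\frac{d\mathbf{r}}{dt}\right)_{S_0} = \left(\frac{dr^j}{dt}\right)\mathbf{e}_j + r^j\left(\frac{d\mathbf{e}_j}{dt}\right)_{S_0}. \qquad (4.114)
Since S is rotating with angular velocity Ω relative to S0, it follows that the velocities of the (co-rotating) basis vectors ej relative to S0 are given by our tangential velocity formula:
\left(\frac{d\mathbf{e}_j}{dt}\right)_{S_0} = \boldsymbol{\Omega} \times \mathbf{e}_j. \qquad (4.115)
Therefore, it follows that the rate of change of the vector r in the inertial frame S0 is given by:
\left(\frac{d\mathbf{r}}{dt}\right)_{S_0} = \left(\frac{dr^j}{dt}\right)\mathbf{e}_j + r^j\,\boldsymbol{\Omega} \times \mathbf{e}_j. \qquad (4.116)
Now note that the first term appearing on the right-hand side of this equation is simply the rate of change of r in the frame S, while the second term equals Ω × r by linearity of the cross product, hence we have:
\left(\frac{d\mathbf{r}}{dt}\right)_{S_0} = \left(\frac{d\mathbf{r}}{dt}\right)_S + r^j\,\boldsymbol{\Omega} \times \mathbf{e}_j = \left(\frac{d\mathbf{r}}{dt}\right)_S + \boldsymbol{\Omega} \times \mathbf{r}. \qquad (4.117)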
We can summarize our results in an elegant fashion, by taking the ‘operator’ viewpoint. In particular, ‘vector time derivative’ operators in the rotating frame S take the form
\left(\frac{d}{dt}\right)_S = \frac{\partial}{\partial t}. \qquad (4.118)
On the other hand, vector time derivative operators take the form
\left(\frac{d}{dt}\right)_{S_0} = \frac{\partial}{\partial t} + \boldsymbol{\Omega} \times = \left(\frac{d}{dt}\right)_S + \boldsymbol{\Omega} \times \qquad (4.119)
in the inertial frame, S0. The appearance of the additional linear operator, Ω×, in the inertial frame, is extremely important. In particular, it is this term that generates the apparent (fictitious) centrifugal and Coriolis forces!
Mathematically speaking, we could probably formalize our results by considering dynamical systems in the perspective of differential geometry – the ‘vector
27 Repeated indices are summed over the dimension of the vector space. Here we are working in 3 dimensions, hence j = 1, 2, 3. Thus r^j e_j := r^1 e_1 + r^2 e_2 + r^3 e_3.

derivatives’ would then probably correspond to some vector differential operator
(e.g. a Lie derivative or covariant derivative). The effect of the rotating frame is
then to add the action of the rotation Lie algebra, so(3), on our vector space –
explicitly through the Ω× term (recall that this operator can be represented by a
matrix in so(3)).
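The operator relation between the two frames can be tested numerically. In the sketch below (plain Python), we assume a frame S rotating about the z-axis with a hypothetical constant Ω, pick some arbitrary smooth components r^j(t) in the rotating basis, and compare a finite-difference derivative taken in S0 against the right-hand side (d/dt)_S r + Ω × r. All specific functions and numbers are illustrative assumptions.

```python
import math

OMEGA = (0.0, 0.0, 0.8)  # hypothetical angular velocity of S relative to S0, rad/s


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_z(v, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])


def components_in_S(t):
    """Arbitrary smooth components r^j(t) in the rotating basis e_j."""
    return (math.cos(t), t * t, 0.5 * t)


def component_rates(t):
    """Scalar derivatives dr^j/dt of the components above."""
    return (-math.sin(t), 2.0 * t, 0.5)


def r_inertial(t):
    """r = r^j e_j expressed in S0; e_j(t) are the S0 axes rotated by Omega*t."""
    return rotate_about_z(components_in_S(t), OMEGA[2] * t)


t, h = 1.3, 1e-5

# Left-hand side: (dr/dt)_S0 by central finite differences.
ahead, behind = r_inertial(t + h), r_inertial(t - h)
lhs = tuple((ahead[i] - behind[i]) / (2.0 * h) for i in range(3))

# Right-hand side: (dr/dt)_S + Omega x r, both expressed in S0 coordinates.
rate_in_S = rotate_about_z(component_rates(t), OMEGA[2] * t)  # (dr^j/dt) e_j
omega_cross_r = cross(OMEGA, r_inertial(t))
rhs = tuple(rate_in_S[i] + omega_cross_r[i] for i in range(3))

error = max(abs(lhs[i] - rhs[i]) for i in range(3))
print("(dr/dt)_S0        :", lhs)
print("(dr/dt)_S + O x r :", rhs)
print("max difference    :", error)
```

The two sides agree up to the finite-difference error, illustrating that the extra Ω× term is nothing more than the motion of the basis vectors as seen from S0.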
Using our results, we can now investigate how Newton’s second law can be ‘extended’ to non-inertial reference frames. For the rest of this section, we shall use ‘dot’ notation to indicate time-derivatives with respect to the rotating frame, S. In other words:
\dot{\mathbf{r}} := \left(\frac{d\mathbf{r}}{dt}\right)_S, \qquad (4.120)
for any vector r.
Problem 29 (The Newtonian Differential Operator) Recall that Newton’s Second Law is a definition of ‘force’. In particular, an object with a trajectory r(t) (path traced out by the displacement vector over time) and mass m experiences a force F according to the relation:
m\left(\frac{d^2\mathbf{r}}{dt^2}\right) = \mathbf{F}. \qquad (4.121)
This relation holds provided that the time derivatives are taken in an inertial frame. In particular, we should write:
m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S_0} = \mathbf{F}. \qquad (4.122)
Using our earlier relations, we wish to express the Newtonian differential operator:
m\left(\frac{d^2}{dt^2}\right)_{S_0} = m\left(\frac{d}{dt}\right)_{S_0}\left(\frac{d}{dt}\right)_{S_0}, \qquad (4.123)
in terms of the time-derivative operators in the rotating (non-inertial) frame S.
Note (!), for the physics we wish to consider (dynamics of objects co-rotating with the Earth), we may take the angular velocity Ω of the frame S to be constant. Hence,
\left(\frac{d\boldsymbol{\Omega}}{dt}\right)_S = 0. \qquad (4.124)
From this information, show that:
m\left(\frac{d^2}{dt^2}\right)_{S_0} = m\left(\frac{d^2}{dt^2}\right)_S + 2m\boldsymbol{\Omega} \times \left(\frac{d}{dt}\right)_S + m\boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times). \qquad (4.125)
In other words, show that:
m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S_0} = m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_S + 2m\boldsymbol{\Omega} \times \left(\frac{d\mathbf{r}}{dt}\right)_S + m\boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{r})
= m\ddot{\mathbf{r}} + 2m\boldsymbol{\Omega} \times \dot{\mathbf{r}} + m\boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{r}). \qquad (4.126)
Hint: First expand the operator m(d/dt)_{S0} using our previous results, then apply the operator (d/dt)_{S0} to your expansion.
Hint: The cross product is not an associative operation: in general a × (b × c) ≠ (a × b) × c. (Matrix multiplication of the so(3) matrices representing Ω× is associative, but the cross product corresponds to their commutator, which is not.) Therefore, the brackets in the triple cross-product term are important.

In the above problem, we see that the second-order vector time derivative opera-
tor in an inertial frame is clearly different from the second-order time derivative
operator in the rotating frame (partial derivative, scalar derivative). In particular,
we pick up two ‘additional’ terms. Since the second-order vector time derivative
operator is used to construct Newton’s second law, it is clear that when using
Newton’s second law to describe the motion of an object in a rotating frame –
i.e. the trajectory traced out by the vector r(t), that we have to add ‘extra’ force
terms.
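Reviewing our main result
\mathbf{F} := m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S_0} = m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_S + 2m\boldsymbol{\Omega} \times \dot{\mathbf{r}} + m\boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{r}), \qquad (4.127)
we see that Newtonian ‘force’ in the rotating frame is given by:
m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_S = \mathbf{F} + 2m\dot{\mathbf{r}} \times \boldsymbol{\Omega} + m(\boldsymbol{\Omega} \times \mathbf{r}) \times \boldsymbol{\Omega}. \qquad (4.128)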
Exercise 33 Verify the previous statement by re-arranging our force equation
and using the antisymmetric property of the vector cross product operator.
The additional terms appearing on the right-hand side of Newton’s second law in our rotating frame are the centrifugal force
\mathbf{F}_{ctf} = m(\boldsymbol{\Omega} \times \mathbf{r}) \times \boldsymbol{\Omega}, \qquad (4.129)
and the Coriolis force
\mathbf{F}_{cor} = 2m\dot{\mathbf{r}} \times \boldsymbol{\Omega}. \qquad (4.130)
Note that these are ‘apparent’ forces rather than ‘real forces’, in the sense that they have no physical mechanism to generate them. Regardless, we experience these additional forces whenever we are in a non-inertial frame. Gravity, on the other hand, is a physical force and does not depend on the choice of reference frame (it is the curvature of spacetime around massive bodies). In summary, Newton’s second law in a rotating frame can be written as
m\ddot{\mathbf{r}} = \mathbf{F}_{external} + \mathbf{F}_{ctf} + \mathbf{F}_{cor}. \qquad (4.131)
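One way to convince yourself that the fictitious forces are just bookkeeping is to take a particle with no external force at all. In an inertial frame it moves in a straight line; in the rotating frame the same motion must follow from the centrifugal and Coriolis terms alone. The Python sketch below integrates the rotating-frame equation of motion with a standard fourth-order Runge-Kutta step and compares it with the straight line viewed from the rotating frame. The rotation rate, initial data and step size are hypothetical.

```python
import math

OMEGA = (0.0, 0.0, 1.5)  # hypothetical constant angular velocity of S, rad/s


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def axpy(a, x, y):
    """Return a*x + y for 3-vectors."""
    return tuple(a * xi + yi for xi, yi in zip(x, y))


def acceleration(r, v):
    """Force-free motion in S: r'' = 2 r' x Omega + (Omega x r) x Omega."""
    coriolis = cross(v, OMEGA)
    centrifugal = cross(cross(OMEGA, r), OMEGA)
    return tuple(2.0 * c + f for c, f in zip(coriolis, centrifugal))


def rk4_step(r, v, dt):
    k1r, k1v = v, acceleration(r, v)
    k2r = axpy(dt / 2, k1v, v)
    k2v = acceleration(axpy(dt / 2, k1r, r), k2r)
    k3r = axpy(dt / 2, k2v, v)
    k3v = acceleration(axpy(dt / 2, k2r, r), k3r)
    k4r = axpy(dt, k3v, v)
    k4v = acceleration(axpy(dt, k3r, r), k4r)
    r_new = tuple(r[i] + dt / 6 * (k1r[i] + 2 * k2r[i] + 2 * k3r[i] + k4r[i])
                  for i in range(3))
    v_new = tuple(v[i] + dt / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i])
                  for i in range(3))
    return r_new, v_new


def rotate_about_z(vec, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1], vec[2])


# Inertial-frame straight line r0(t) = p + u t (frames coincide at t = 0).
p = (1.0, 0.0, 0.0)
u = (0.0, 0.5, 0.2)

# Initial conditions in S: same position, velocity u - Omega x p.
r = p
v = axpy(-1.0, cross(OMEGA, p), u)

steps, total_time = 2000, 2.0
dt = total_time / steps
for _ in range(steps):
    r, v = rk4_step(r, v, dt)

# Exact answer: the straight line expressed in the rotating coordinates.
r0_final = axpy(total_time, u, p)
expected = rotate_about_z(r0_final, -OMEGA[2] * total_time)

error = max(abs(r[i] - expected[i]) for i in range(3))
print("integrated in S :", r)
print("straight line   :", expected)
print("max difference  :", error)
```

The curved path seen in S is entirely accounted for by the two fictitious terms; no real force acts on the particle.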
We now proceed to investigate the fictitious forces separately.
4.4.3 Centrifugal Force
The Coriolis force on an object is proportional to the object’s velocity, \dot{r}, relative to the rotating frame. Therefore, it is negligible for objects that are moving sufficiently slowly, or for motions occurring over short time-scales. As an order of magnitude comparison of the centrifugal and Coriolis forces, we have:
F_{cor} \sim mv\Omega, \quad F_{ctf} \sim mr\Omega^2, \qquad (4.132)
where v is the (translational) speed of the object relative to the rotating frame of the Earth – i.e. the speed measured by an observer on the Earth’s surface. Therefore, we have:
\frac{F_{cor}}{F_{ctf}} \sim \frac{v}{R\Omega} \sim \frac{v}{V}, \qquad (4.133)

where R is the radius of the earth and V = RΩ is the tangential speed for a point on the Earth’s equator. For objects near the Earth’s surface, the distance r from the Earth’s center differs from R by an amount much smaller than R, so it is valid to make the approximation r ∼ R. Using the rotation rate of the Earth, Ω ∼ 7.3 × 10^{-5} rad/s, along with the Earth’s radius, R ∼ 6400 km, it follows that V ∼ 1674 km/h or equivalently V ∼ 465 m/s. Therefore, for objects travelling with speed v ≪ 1674 km/h it may be reasonable to ignore the Coriolis force experienced by the object.
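The arithmetic behind these estimates is easy to reproduce. The short Python sketch below recomputes Ω and V from the values quoted above and evaluates the order-of-magnitude ratio v/V for a few illustrative speeds; the example speeds are our own rough, hypothetical choices.

```python
import math

OMEGA = 2.0 * math.pi / (24 * 3600)  # rotation rate of the Earth, rad/s
R = 6.4e6                            # radius of the Earth, m

V = R * OMEGA                        # tangential speed at the equator, m/s
print("Omega (rad/s):", OMEGA)
print("V (m/s):", V)
print("V (km/h):", V * 3.6)


def coriolis_to_centrifugal(v):
    """Order-of-magnitude ratio F_cor / F_ctf ~ v / V for speed v in m/s."""
    return v / V


# Hypothetical example speeds (m/s), chosen only to illustrate the scaling.
examples = {"walking": 1.5, "car": 30.0, "airliner": 250.0, "rifle bullet": 900.0}
for name, speed in examples.items():
    print(name, coriolis_to_centrifugal(speed))
```

On these rough numbers the ratio is tiny for everyday speeds and of order one only for very fast objects, in line with the statement that the Coriolis force can be ignored when v ≪ V.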
DIAGRAM
Free-Fall Acceleration and the true direction of g.
4.4.4 Coriolis Force
Coriolis Force
Turntables – thought experiment
Weather systems – circulation direction.
Projectile deﬂection. Snipers in Northern and Southern Hemisphere.
Free-fall and the Coriolis effect.
Object dropped down a mine-shaft to Earth’s center.
4.5 Foucault’s Pendulum
Foucault’s Pendulum and measurement of rotation rate of Earth.

Chapter 5
Nature’s Ways: The Calculus of
Variations
A section of notes for topics that individuals have requested. Disclaimer: this is
written from memory and pen-paper calculations.
5.1 Lagrangian Mechanics
5.1.1 Background
After a while, one begins to realise that using Newton’s laws to solve problems
in classical mechanics can get very tedious and annoying. Thankfully, apart
from making good cheese, wine and conquering most of Europe, the French
were (and still are) also very good at producing world-class mathematicians.
One such mathematician was Joseph Lagrange, who amongst a trillion other
accomplishments, came up with a revolutionary reformulation of classical mechanics
in conjunction with several other mathematicians (footnote 1) and physicists. This
approach is now known as ‘Lagrangian mechanics’ and is an extremely power-
ful and vast generalisation of Newtonian mechanics. Today, almost the entirety
of modern physics is based on the principles set down by Lagrange and Hamil-
ton. It also has vast applications to optimization problems and many areas of
engineering.
5.1.2 The Principle of Stationary Action
The fundamental concept behind Lagrangian mechanics is the ‘principle of stationary action’. It is more commonly referred to as the principle of ‘least action’, which is technically incorrect (footnote 2). It basically says that nature is lazy, and will always (classically) take the path of stationary action – which means it makes the following functional stationary:
S = \int L \, dt \qquad (5.1)
1 Most notably, the Irish mathematician Sir William Rowan Hamilton.
2 Recall that when you are trying to find the critical points of a function, you first find its derivative and then set it to zero. This doesn’t just give you points at which the function is minimized – you also get inflection points and maxima.
Here the quantity S, called the ‘action’, is a functional – an object which acts on functions. The function L is called the ‘Lagrangian’ of your theory – it contains all necessary information about your physical system. Different theories and different systems will have different Lagrangians. Finally, the integral ∫ used here is the integral with respect to time t, which parametrises the system; for a given motion it is evaluated between fixed initial and final times.
For systems in classical mechanics, the Lagrangian sometimes (but not always!)
takes the following form:
L = T − U (5.2)
where T is the total kinetic energy of the system and U is its potential energy.
If the system is conservative (i.e. no losses due to friction etc.) and the potential energy U is time-independent, then the Lagrangian will take this special form. Note the minus sign in T − U is important – if this were a plus sign, then the Lagrangian would be the total energy (or Hamiltonian in this restricted set of cases).
If the system is non-conservative, then one usually has to add extra terms to the action to account for losses / dissipation (or net gain) of energy.
If the system is constrained – e.g. a bead confined to roll on some surface – then one needs to either use the method of Lagrange multipliers or to express the system in terms of unconstrained variables.
5.1.3 The Euler-Lagrange Equations of Motion
The Euler-Lagrange equations of motion are the equations you have to solve
to determine the dynamical time evolution of your system in the Lagrangian
formalism. In some subset of cases, these are simply equivalent to the equations
of motion you get using Newton’s Second Law: F = ma. Here I will specify
a simple system, then show how to derive the Euler-Lagrange equations for this
system using the principle of stationary action. Later, I will specify a more
general system then re-derive the Euler-Lagrange equations. Finally, I will give
an example of the power of the Lagrangian formalism – in particular, a proof of
the fact that a straight line is the shortest distance between two points in ordinary
Euclidean geometry.
In the Lagrangian formalism, a system is specified by a set of generalized coordinates $q^1(t), \dots, q^n(t)$ (parametrised by time $t$) and a set of generalized velocities, which are the derivatives of the coordinates with respect to time $t$: $\dot{q}^1, \dots, \dot{q}^n$. In non-relativistic mechanics, we view the time $t$ as the independent variable and the coordinates $q^i$ and velocities $\dot{q}^i$ as dependent variables, parametrised by $t$. The configuration space is then taken to be the set of all possible values $(q^1, \dots, q^n, \dot{q}^1, \dots, \dot{q}^n)$ of the generalized coordinates and the corresponding velocities. Note that generalized coordinates represent points in some space $M$, and the generalized velocities are (tangent) vectors attached to these points (recall velocity is a vector quantity). Hence the configuration space of a physical system naturally takes the form of a ‘tangent bundle’³, denoted $TM$.
Abstraction aside, we now consider the Lagrangian for a simple system (e.g. a point-particle moving with constant acceleration) with a generalized coordinate $q$ and a generalized velocity $\dot{q} = \frac{dq}{dt}$. The Lagrangian $L = L(q, \dot{q})$ for this system is a function of $q$ and $\dot{q}$, defined on the configuration space⁴ $TM$.
The action S[L] corresponding to this Lagrangian L is given by:
$$S[L] = \int L(q, \dot{q}) \, dt. \tag{5.3}$$
³ A collection of points and the tangent spaces attached to those points. If the coordinate space $M$ is n-dimensional, then the tangent bundle $TM$ is 2n-dimensional.
We can compute the variation of this action δS[L] by using integration by parts
and computing the variation of the Lagrangian: δL. Note that to compute
the variation of the Lagrangian, δL, we simply use the same rules as we do
when computing a total differential (or ‘exterior derivative’). In particular, we
have
$$\delta L(q, \dot{q}) = \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot{q}}\,\delta \dot{q} \tag{5.4}$$
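Note that we have assumed that the Lagrangian L does not explicitly depend on time t. It only depends on t implicitly through $q(t)$ and $\dot{q}(t)$. If it did explicitly depend on t, e.g. for a system with a time-varying potential energy U(t), then the extra term $\frac{\partial L}{\partial t}$ would appear in the total time derivative $\frac{dL}{dt}$; the variation (5.4) itself would keep the same form, since we vary the path $q(t)$ and not the time $t$.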
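Therefore, we have:
$$\begin{aligned}
\delta S[L] &= \delta \int L \, dt \\
&= \int \delta L \, dt \\
&= \int \left( \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot{q}}\,\delta \dot{q} \right) dt
\end{aligned} \tag{5.5}$$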
Note that the variation ‘operator’ δ commutes with derivative operators. Hence for example, $\frac{d}{dt}\delta q = \delta \frac{d}{dt} q = \delta \dot{q}$. Our goal is to compute the ‘functional derivative’ of the functional S with respect to the generalized coordinate q. The functional derivative allows us to differentiate functionals with respect to functions – apart from a few technicalities, it behaves much like the ordinary derivative. This means we want the quantity $\frac{\delta S}{\delta q}$, so we need the term $\delta q$ to the right of both terms in the integrand of (5.5). However, the second term contains $\delta \dot{q} := \delta \frac{d}{dt} q$. In order to ‘move’ the total derivative $\frac{d}{dt}$ away from the q, we use the integration by parts technique⁵:
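$$\begin{aligned}
\int \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}}\,\delta q \right) dt = \int d\left( \frac{\partial L}{\partial \dot{q}}\,\delta q \right) &\implies \\
\int \left\{ \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) \delta q \right\} dt + \int \left( \frac{\partial L}{\partial \dot{q}}\,\delta \frac{d}{dt} q \right) dt &= \left[ \frac{\partial L}{\partial \dot{q}}\,\delta q \right]_{t=t_i}^{t=t_f}
\end{aligned} \tag{5.6}$$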
where $t_i$ and $t_f$ denote the range of integration over time – we almost always use $t_i = -\infty$ and $t_f = +\infty$ for a classical action. Now, we make the (physically-motivated) assumption that the quantity on the right-hand side vanishes: $\left[ \frac{\partial L}{\partial \dot{q}}\,\delta q \right]_{t=t_i}^{t=t_f} = 0$. This holds, for instance, whenever the variation $\delta q$ vanishes at the end points, and it is true for most physical Lagrangians $L$⁶.
⁴ In general, L could also be a function of higher derivatives of q, for example $L = L(q, \dot{q}, \ddot{q}, \dots)$, however for most practical cases we just consider $L = L(q, \dot{q})$.
⁵ Or rather, the fundamental theorem of calculus (for 1-dimensional problems) / a special case of the generalized Stokes theorem for higher dimensions.
Therefore, taking this assumption, we get:
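$$\begin{aligned}
\int \left\{ \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) \delta q \right\} dt + \int \left( \frac{\partial L}{\partial \dot{q}}\,\delta \frac{d}{dt} q \right) dt &= 0 \implies \\
\int \left( \frac{\partial L}{\partial \dot{q}}\,\delta \frac{d}{dt} q \right) dt &= - \int \left\{ \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) \delta q \right\} dt.
\end{aligned} \tag{5.7}$$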
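This allows us to write the variation (5.5) of the action as:
$$\begin{aligned}
\delta S[L] &= \int dt \left( \frac{\partial L}{\partial q}\,\delta q \right) - \int dt \left( \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) \delta q \right) \\
&= \int dt \left\{ \frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) \right\} \delta q.
\end{aligned} \tag{5.8}$$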
Note that here we’ve made a common (mathematically-motivated⁷) change of notation: $\int (\text{Stuff}) \, dt =: \int dt \, (\text{Stuff})$. Finally, we bring the $\delta q$ in the integrand (5.8) to the left-hand side and formally define the functional derivative of the action S to be:
$$\frac{\delta S[L]}{\delta q} = \frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right). \tag{5.9}$$
In this language, the principle of stationary action states that the variation must vanish: δS = 0, which is equivalent to saying the functional derivative is zero: $\frac{\delta S[L]}{\delta q} = 0$. Therefore
$$\frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) = 0, \tag{5.10}$$
which are precisely the Euler-Lagrange equations of motion for this dynamical system! Thus we have explicitly demonstrated that the Euler-Lagrange equations are a direct consequence of the principle of stationary action – furthermore, we listed the assumptions made throughout the derivation. In particular, we assumed zero boundary contributions to the action, that the Lagrangian L was not explicitly dependent on time (so $\frac{\partial L}{\partial t} = 0$), and that it only depended on the generalized coordinates and velocities: $L = L(q, \dot{q})$. If we relaxed some of these assumptions, we could get extra terms in the Euler-Lagrange equations.
Note, there is another way to view this derivation using Taylor expansions. This
method is a bit more suggestive and intuitive in regards to why we call these
techniques ‘variational principles’ or ‘variational calculus’. The premise is that
we perturb the action by perturbing the function it acts on: S[L + δL] ≈ S[L] +
δS, then define the variation as the difference between the perturbed action and
the original action: δS = S[L + δL] − S[L].
Functions $q(t)$ which satisfy the stationary action condition, δS = 0, are called stationary paths (or ‘extremals’) of the action. They are the critical points of the action functional. In some cases they correspond to minima or maxima of the action, and in others to saddle points. For this reason, they are fundamental to variational calculus. For example, if the action represented the length of a curve or the surface area of a soap bubble, we could use variational calculus to find a curve with minimal length or the shape of a soap bubble surface with minimal area under some given constraints.
⁶ One rare case where one gets so-called ‘boundary contributions’ to the action integral, is in general relativity – in particular, the Gibbons-Hawking-York boundary term, which accounts for the case when spacetime is a manifold with a boundary.
⁷ In this manner, we can think of the integral sign and the variables we integrate with respect to (dt) as an abstract operator or ‘functional’ called a ‘measure’. Thus $\int dt$ is an operator which acts on functions to give some number – which is the value of the integral of the function.

Example 8 As an example, take the motion of a point-particle with mass m and position coordinate x, moving in one dimension. We view x as a function of time t: x = x(t). Then x is our generalized coordinate with corresponding generalized velocity $\dot{x}$. If the particle is moving due to some conservative force acting on it, then it has some associated potential energy U. Assuming U is independent of time t, we then have U = U(x) in general (e.g. the particle could be moving vertically and experiencing a gravitational force with potential U = U(x)). The Lagrangian is then given by:
$$L = \text{Kinetic Energy} - \text{Potential Energy} = \frac{1}{2} m \dot{x}^2 - U(x). \tag{5.11}$$
The Euler-Lagrange equations then tell us that:
$$\frac{\partial L}{\partial x} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{x}} \right) = 0, \tag{5.12}$$
hence we see that
$$-\frac{\partial U(x)}{\partial x} - \frac{d}{dt}(m \dot{x}) = 0. \tag{5.13}$$
Since U is only a function of one variable, we write the partial derivative as a total derivative instead, hence:
$$-\frac{dU(x)}{dx} = m \ddot{x} \tag{5.14}$$
since the mass m is constant. Recalling that a conservative force $\mathbf{F}$ can be defined as the gradient of some potential, $\mathbf{F} = -\nabla U$, we then identify $-\frac{dU(x)}{dx}$ as the component $F_x$ of the force acting on this particle in the x-direction. Hence we have:
$$F_x = m \ddot{x} \tag{5.15}$$
which is precisely Newton’s second law. Note that this is based on the assumption that the Lagrangian L was only dependent on $x$ and $\dot{x}$. In general, one may have a time-varying acceleration (e.g. a radiating charge or stealth fighter jet) – in such a case, we would modify the Euler-Lagrange equations and therefore modify our statement of Newton’s second law.
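To make Example 8 a bit more concrete, here is a minimal numerical sketch (not part of the derivation above) that discretises the action for a particle in uniform gravity and checks that it is stationary on the Newtonian path. The choices U = mgx with x measured upwards, the trapezoid-style discretisation, the sine-shaped perturbation and all function names are assumptions made purely for illustration.

```python
import math

def discrete_action(path, dt, m=1.0, g=9.81):
    """Discretised action S = sum (T - U) dt for a particle in uniform gravity.

    The height x is measured upwards, so U = m*g*x and L = (1/2)*m*v**2 - m*g*x.
    """
    total = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        velocity = (x1 - x0) / dt      # finite-difference velocity on this step
        height = 0.5 * (x0 + x1)       # midpoint height on this step
        total += (0.5 * m * velocity**2 - m * g * height) * dt
    return total

def action_change(eps, n_steps=1000, t_final=1.0, m=1.0, g=9.81):
    """Return S[x + eps*eta] - S[x] for the Newtonian path x and a bump eta."""
    dt = t_final / n_steps
    times = [k * dt for k in range(n_steps + 1)]
    # Newtonian solution of m*x'' = -m*g with x(0) = x(t_final) = 0.
    newton = [0.5 * g * t * (t_final - t) for t in times]
    # Perturbation that vanishes at both end points (no boundary contribution).
    bump = [math.sin(math.pi * t / t_final) for t in times]
    varied = [x + eps * b for x, b in zip(newton, bump)]
    return discrete_action(varied, dt, m, g) - discrete_action(newton, dt, m, g)

d_plus = action_change(+1e-2)
d_minus = action_change(-1e-2)
print(d_plus, d_minus)
```

If the first-order variation δS vanishes, then flipping the sign of the perturbation should not change S[x + δx] − S[x], which is what the two printed numbers test. For this particular system the change is also positive, so here the stationary path happens to be a minimum.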
5.1.4 N-Dimensional Euler-Lagrange Equations
To see how this formalism generalizes to higher-dimensional systems, we proceed as follows. Let $q^i$ denote the i-th generalized coordinate for a system with n generalized coordinates, $q^1, \dots, q^n$. The n corresponding generalized velocities are then given by $\dot{q}^i$, where i = 1, ..., n. Collecting the variables $q^1, \dots, q^n$ and $\dot{q}^1, \dots, \dot{q}^n$ into vectors $\mathbf{q}$ and $\dot{\mathbf{q}}$, respectively, we can view the Lagrangian as a function of 2n variables, parametrised by time t:
$$L = L(\mathbf{q}, \dot{\mathbf{q}}; t). \tag{5.16}$$
The action functional generated by this Lagrangian is given by:
$$S[L] = \int L \, dt. \tag{5.17}$$

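To vary the action, we Taylor expand $L(q^1, \dots, q^n, \dot{q}^1, \dots, \dot{q}^n)$ to first order in all its variables. In particular, we have:
$$\begin{aligned}
S[L + \delta L] &= \int L(\mathbf{q} + \delta \mathbf{q}, \dot{\mathbf{q}} + \delta \dot{\mathbf{q}}) \, dt \\
&= \int \left[ L(\mathbf{q}, \dot{\mathbf{q}}) + \frac{\partial L}{\partial q^1}\delta q^1 + \dots + \frac{\partial L}{\partial q^n}\delta q^n + \frac{\partial L}{\partial \dot{q}^1}\delta \dot{q}^1 + \dots + \frac{\partial L}{\partial \dot{q}^n}\delta \dot{q}^n \right] dt \\
&= \int L(\mathbf{q}, \dot{\mathbf{q}}) \, dt + \int \left[ \frac{\partial L}{\partial q^1}\delta q^1 + \dots + \frac{\partial L}{\partial q^n}\delta q^n + \frac{\partial L}{\partial \dot{q}^1}\delta \dot{q}^1 + \dots + \frac{\partial L}{\partial \dot{q}^n}\delta \dot{q}^n \right] dt \\
&= S[L(\mathbf{q}, \dot{\mathbf{q}})] + \int \left[ \frac{\partial L}{\partial q^1}\delta q^1 + \dots + \frac{\partial L}{\partial q^n}\delta q^n + \frac{\partial L}{\partial \dot{q}^1}\delta \dot{q}^1 + \dots + \frac{\partial L}{\partial \dot{q}^n}\delta \dot{q}^n \right] dt,
\end{aligned} \tag{5.18}$$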
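hence
$$\begin{aligned}
\delta S &:= S[L + \delta L] - S[L] \\
&= \int \left[ \frac{\partial L}{\partial q^1}\delta q^1 + \dots + \frac{\partial L}{\partial q^n}\delta q^n + \frac{\partial L}{\partial \dot{q}^1}\delta \dot{q}^1 + \dots + \frac{\partial L}{\partial \dot{q}^n}\delta \dot{q}^n \right] dt \\
&= \int \left[ \frac{\partial L}{\partial q^1}\delta q^1 + \dots + \frac{\partial L}{\partial q^n}\delta q^n - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^1} \right)\delta q^1 - \dots - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^n} \right)\delta q^n \right] dt \\
&= \int \left\{ \left[ \frac{\partial L}{\partial q^1} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^1} \right) \right]\delta q^1 + \dots + \left[ \frac{\partial L}{\partial q^n} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^n} \right) \right]\delta q^n \right\} dt
\end{aligned} \tag{5.19}$$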
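where we have used integration by parts to move the total derivative $\frac{d}{dt}$ from the perturbations, $\delta \dot{q}^i$, to the corresponding coefficients, $\frac{\partial L}{\partial \dot{q}^i}$. Again, one makes the assumption of vanishing boundary contributions: $\int d\left( \frac{\partial L}{\partial \dot{q}^i}\,\delta q^i \right) = \left[ \frac{\partial L}{\partial \dot{q}^i}\,\delta q^i \right]_{-\infty}^{\infty} = 0$.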
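The principle of stationary action says that a physical system classically evolves such that the action is stationary: $\frac{\delta S}{\delta q} = 0$. For this to happen, the coefficients of the variations $\delta q^i$ of the coordinates must vanish in the integral (5.19), since the variations are arbitrary and independent of one another. This means that we obtain a system of n differential equations, which are the n-dimensional Euler-Lagrange equations:
$$\begin{aligned}
\frac{\partial L}{\partial q^1} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^1} \right) &= 0 \\
\frac{\partial L}{\partial q^2} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^2} \right) &= 0 \\
&\ \,\vdots \\
\frac{\partial L}{\partial q^n} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^n} \right) &= 0.
\end{aligned} \tag{5.20}$$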
In this manner, one can now derive Newton’s Second Law in n dimensions by generalizing the 1-dimensional case outlined earlier. In particular, this is done by considering a potential $U = U(x^1, \dots, x^n)$ which depends on the n position coordinates $x^1, \dots, x^n$. The velocities are given by $\frac{dx^i}{dt}$. Putting these into vector quantities, the kinetic energy of a point particle of mass m with velocity $\dot{\mathbf{x}}$ is given by:
$$K = \frac{1}{2} m \|\dot{\mathbf{x}}\|^2. \tag{5.21}$$
Since the potential energy U is time-independent, we can write the Lagrangian for this system as:
$$L = K - U = \frac{1}{2} m \|\dot{\mathbf{x}}\|^2 - U(\mathbf{x}). \tag{5.22}$$

The Euler-Lagrange equations can be found using the system (5.20) earlier. In particular, identifying the generalized coordinates with the position coordinates, $q^i = x^i$, we have
$$\frac{\partial}{\partial \dot{q}^i} \|\dot{\mathbf{x}}\|^2 = \frac{\partial}{\partial \dot{q}^i}\left[ (\dot{q}^1)^2 + \dots + (\dot{q}^n)^2 \right] = 2 \dot{q}^i, \tag{5.23}$$
so the Euler-Lagrange equation for the i-th coordinate of the point particle is given by:
$$m \frac{d}{dt} \dot{q}^i + \frac{\partial U}{\partial q^i} = 0. \tag{5.24}$$
Re-arranging, this is simply the i-th component of the n-dimensional version of Newton’s Second Law of motion:
$$m \ddot{q}^i = -\frac{\partial U}{\partial q^i}. \tag{5.25}$$
Collecting the n equations into one vector equation, this is made explicit:
$$\mathbf{F} := m \ddot{\mathbf{q}} = -\nabla U, \tag{5.26}$$
where $\nabla U$ is the gradient (vector) of the potential energy function U. This statement is, in fact, quite general – that is, a conservative force $\mathbf{F}$ arising from a potential U is necessarily given by $\mathbf{F} = -\nabla U$. So for example, given a gravitational potential $U = -\frac{GM}{r}$ (the potential energy per unit mass of a test particle), we see that the (conservative) gravitational force per unit mass is given by:
$$\mathbf{F} = -\nabla\left( -\frac{GM}{r} \right) = -\frac{GM}{r^2}\,\hat{\mathbf{r}}, \tag{5.27}$$
where G is Newton’s gravitational constant and $\hat{\mathbf{r}}$ is a unit-vector pointing in the radial direction away from a massive object of mass M. The minus sign then accounts for the fact that the gravitational force is directed towards the massive object.
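As a quick sanity check of the statement F = −∇U, the following sketch approximates the gradient of the gravitational potential U = −GM/r by central finite differences and compares it against the closed form −(GM/r²) r̂. The value GM = 1, the sample point and the function names are arbitrary choices made for illustration only.

```python
import math

G_TIMES_M = 1.0  # the product G*M in arbitrary units (illustrative value)

def potential(x, y, z):
    """Gravitational potential U = -GM/r of a point mass at the origin."""
    return -G_TIMES_M / math.sqrt(x * x + y * y + z * z)

def force(x, y, z, h=1e-6):
    """Approximate F = -grad(U) with central finite differences."""
    fx = -(potential(x + h, y, z) - potential(x - h, y, z)) / (2 * h)
    fy = -(potential(x, y + h, z) - potential(x, y - h, z)) / (2 * h)
    fz = -(potential(x, y, z + h) - potential(x, y, z - h)) / (2 * h)
    return fx, fy, fz

point = (1.0, 2.0, 2.0)                      # r = 3 for this point
r = math.sqrt(sum(c * c for c in point))
numeric = force(*point)
# Closed form: magnitude GM/r^2, directed along -r_hat (towards the origin).
exact = tuple(-G_TIMES_M / r**2 * c / r for c in point)
print(numeric)
print(exact)
```

The two printed vectors should agree to many decimal places, and both point back towards the origin, as expected for an attractive force.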
5.1.5 Examples
Example 9 (Simple Pendulum) Consider a vertical pendulum of mass m and length l. We set up a coordinate system with horizontal (pointing right) coordinate x and vertical (downward) coordinate y, where θ is the angle between the vertical y-axis and the arm of the pendulum. We set the origin to be at the beginning of the pendulum arm, from which the mass hangs at the opposite end. Since this system is undergoing rotational motion (the mass at the end of the pendulum is moving in a circular arc) with a fixed radius l (the length of the pendulum arm), the mass at the end of the pendulum has a tangential velocity of: $v = l\omega = l\dot{\theta}$. Therefore, the total kinetic energy is given by:
$$K = \frac{1}{2} m |\mathbf{v}|^2 = \frac{1}{2} m l^2 \dot{\theta}^2. \tag{5.28}$$
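The potential energy is given by the weight mg multiplied by the height of the mass. Since y is measured downward from the origin, the height of the mass is $-y = -l\cos(\theta)$, hence:
$$U = -mgy = -mgl\cos(\theta). \tag{5.29}$$
The Lagrangian is therefore given by
$$L(\theta, \dot{\theta}) = K - U = \frac{1}{2} m l^2 \dot{\theta}^2 + mgl\cos(\theta), \tag{5.30}$$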

where $\theta$ and $\dot{\theta}$ are the generalized coordinate and corresponding generalized velocity, respectively. The Euler-Lagrange equation is given by
$$\frac{\partial L}{\partial \theta} - \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = 0, \tag{5.31}$$
which simplifies to
$$\ddot{\theta} + \frac{g}{l}\sin(\theta) = 0. \tag{5.32}$$
This differential equation can be solved analytically for θ using elliptic functions. Alternatively, one can make the small angle approximation to linearise this non-linear differential equation: sin(θ) ≈ θ, for small displacements $\theta \ll 1$ (radians).
Note that using the Lagrangian approach, one only needs to compute the poten-
tial energy and kinetic energy for the pendulum system. This is a rather trivial
task (as shown) which avoids the messiness of having to consider forces and
‘tension’, which is required by the Newtonian approach.
Another advantage of the Lagrangian formalism is that one may easily change coordinates without having to worry about introducing ‘fictitious forces’ (e.g. centrifugal, Coriolis) – the principle of ‘generalised coordinates’ essentially bids one to express the Lagrangian in terms of the most ‘natural’ coordinate system for the problem at hand. Here we made use of the rotational nature of the problem to switch from the Cartesian x, y coordinates to the polar coordinates r, θ (although we didn’t use r, since the radial coordinate was fixed at r = l).
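To get a feel for the pendulum equation (5.32) and for when the small angle approximation is reasonable, here is a rough numerical sketch. It integrates (5.32) with a hand-written fourth-order Runge-Kutta step, which is my choice of method rather than anything prescribed above, and checks where the pendulum is after one small-angle period 2π√(l/g). The values g = 9.81, l = 1, the step count and the function names are illustrative assumptions.

```python
import math

def pendulum_rhs(theta, omega, g=9.81, length=1.0):
    """First-order form of (5.32): theta' = omega, omega' = -(g/l) sin(theta)."""
    return omega, -(g / length) * math.sin(theta)

def rk4_step(theta, omega, dt, g=9.81, length=1.0):
    """Advance (theta, omega) by one classical Runge-Kutta step of size dt."""
    k1 = pendulum_rhs(theta, omega, g, length)
    k2 = pendulum_rhs(theta + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1], g, length)
    k3 = pendulum_rhs(theta + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1], g, length)
    k4 = pendulum_rhs(theta + dt * k3[0], omega + dt * k3[1], g, length)
    theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    omega += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, omega

def swing(theta0, duration, n_steps=20000, g=9.81, length=1.0):
    """Release the pendulum from rest at theta0 and integrate for `duration`."""
    theta, omega = theta0, 0.0
    dt = duration / n_steps
    for _ in range(n_steps):
        theta, omega = rk4_step(theta, omega, dt, g, length)
    return theta, omega

small_angle_period = 2 * math.pi * math.sqrt(1.0 / 9.81)
theta_small, _ = swing(0.05, small_angle_period)   # small release angle
theta_large, _ = swing(2.0, small_angle_period)    # large release angle
print(theta_small, theta_large)
```

For the small release angle the pendulum is essentially back where it started after one small-angle period, whereas for the large release angle it is nowhere near its starting point, because the true period grows with amplitude.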
Example 10 (Harmonic Oscillator) Consider a 3-dimensional harmonic oscillator. Such a system may be envisioned as a mass attached to a spring, whose other end is fixed at some origin. If we let the origin of a 3-dimensional Cartesian coordinate system – x, y, z – coincide with the initial (non-stretched) position of the mass, then stretching the spring in any direction will induce a radial oscillatory motion. Let k denote the spring constant and m denote the mass at the end of the spring. The force on the mass is given by Hooke’s law:
$$\mathbf{F} = -k\mathbf{r} \tag{5.33}$$
where $\mathbf{r}$ is the (radial) position vector: $\mathbf{r} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3 \sim (x, y, z)$. The potential energy of the spring is equal to the work required to stretch the spring from its rest position, i.e. the work done against the spring force:
$$U = -\int_0^r \mathbf{F} \cdot d\mathbf{l} = -\int_0^r (-kr') \, dr' = \frac{1}{2} k r^2. \tag{5.34}$$
The kinetic energy of the mass is given by
$$K = \frac{1}{2} m |\mathbf{v}|^2 = \frac{1}{2} m \dot{r}^2, \tag{5.35}$$
where $\dot{r}^2 = \dot{x}^2 + \dot{y}^2 + \dot{z}^2$. We could use Cartesian coordinates, however radial coordinates are the ‘natural choice’ for this problem (since it is effectively a 1-dimensional problem – the motion only occurs in the radial direction, which is one-dimensional). Therefore we choose $r$ and $\dot{r} = \frac{d}{dt} r$ to be our generalized coordinate and generalized velocity, respectively. With the Lagrangian $L = K - U = \frac{1}{2} m \dot{r}^2 - \frac{1}{2} k r^2$, the Euler-Lagrange equation is given by
$$\frac{\partial L}{\partial r} - \frac{d}{dt}\frac{\partial L}{\partial \dot{r}} = 0, \tag{5.36}$$

which reduces to
$$\ddot{r} + \frac{k}{m} r = 0. \tag{5.37}$$
This second-order linear differential equation is solved by the usual means. In particular, the characteristic equation is given by:
$$\lambda^2 + \frac{k}{m} = 0, \tag{5.38}$$
whence the eigenvalues are $\lambda = \pm i\sqrt{k/m}$. Let $\omega := \sqrt{k/m}$ denote the fundamental frequency. Then the general solution is given by:
$$r(t) = c_1 e^{i\omega t} + c_2 e^{-i\omega t}, \tag{5.39}$$
where $c_1$ and $c_2$ are constants determined by the initial conditions. This can alternatively be expressed in real form,
$$r(t) = a_1 \cos(\omega t) + a_2 \sin(\omega t) \tag{5.40}$$
where $a_1$ and $a_2$ are constants determined by the initial conditions. In particular, $r(0) = a_1$ and $\dot{r}(0) = a_2 \omega$. Hence $a_1$ is the initial displacement and $a_2$ is the initial velocity divided by the fundamental frequency.
Note if you’ve forgotten how to get from complex form to real form, recall that
$$\cos(x) = \frac{e^{ix} + e^{-ix}}{2}, \qquad \sin(x) = \frac{e^{ix} - e^{-ix}}{2i} \tag{5.41}$$
where $i^2 := -1$. Comparing coefficients we see that the constants are explicitly related by:
$$c_1 = \frac{a_1}{2} + \frac{a_2}{2i}, \qquad c_2 = \frac{a_1}{2} - \frac{a_2}{2i}. \tag{5.42}$$
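If you would rather check the relation (5.42) numerically than by comparing coefficients, the short sketch below evaluates the complex form $c_1 e^{i\omega t} + c_2 e^{-i\omega t}$ and the real form (5.40) at a few times. The values of k, m, a1 and a2, along with the function names, are arbitrary illustrative choices.

```python
import cmath
import math

def real_form(t, a1, a2, omega):
    """Real solution (5.40): r(t) = a1 cos(wt) + a2 sin(wt)."""
    return a1 * math.cos(omega * t) + a2 * math.sin(omega * t)

def complex_form(t, a1, a2, omega):
    """Complex solution r(t) = c1 e^{iwt} + c2 e^{-iwt}, with c1, c2 from (5.42)."""
    c1 = a1 / 2 + a2 / 2j
    c2 = a1 / 2 - a2 / 2j
    return c1 * cmath.exp(1j * omega * t) + c2 * cmath.exp(-1j * omega * t)

k, m = 4.0, 1.0                 # spring constant and mass (illustrative values)
omega = math.sqrt(k / m)        # fundamental frequency
a1, a2 = 0.3, -1.1              # initial displacement, initial velocity / omega

times = [0.0, 0.25, 1.0, 2.5, 7.0]
differences = [abs(complex_form(t, a1, a2, omega) - real_form(t, a1, a2, omega))
               for t in times]
print(max(differences))
```

The printed number should be at the level of floating-point round-off, which also confirms that the complex form is real-valued for this choice of $c_1$ and $c_2$.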
5.1.6 Multiple Independent Parameters
For the purpose of the (modern and topical) branch of mathematical physics known as ‘minimal surface’ theory, along with relativity and quantum field theory, it is important to extend the Lagrangian formalism to include physical systems – or more specifically, generalized coordinates – which depend on more than one independent parameter. Until now, we have considered systems which were parametrised by one independent variable – time t. We now consider systems which are parametrised by k independent variables, which we shall denote $t_1, \dots, t_k$ for familiarity.
For simplicity, we shall just consider systems with one generalized coordinate (parametrised by multiple variables) for now. The extension to an arbitrary number of generalised coordinates is done in the obvious way, analogous to our previous extension when we had just one independent parameter t.
Let $t_1, \dots, t_k$ denote our k independent parameters and let $q := q(t_1, \dots, t_k)$ denote our generalized coordinate, dependent on these parameters. The corresponding generalized velocities (with respect to each parameter) are then given by: $\frac{\partial q}{\partial t_1}, \dots, \frac{\partial q}{\partial t_k}$. Given some function $L := L(q, \frac{\partial q}{\partial t_1}, \dots, \frac{\partial q}{\partial t_k}; t_1, \dots, t_k)$ explicitly dependent on the generalized coordinate $q$ and the generalized velocities $\frac{\partial q}{\partial t_i}$, and implicitly dependent on the independent parameters $t_1, \dots, t_k$, we now wish to formulate a variational problem. In particular, we consider the following action functional (a k-dimensional integral performed over $t_1, \dots, t_k$):
$$S[L] = \int L \, dt_1 dt_2 \dots dt_k \tag{5.43}$$
and ask the question – which functions $q$ make this action stationary? To solve the variational problem, we proceed as before to vary the action by Taylor expansion of L in all its variables. In order to do this, some new notation will be handy. Let $v^q_i$ denote the i-th generalized velocity corresponding to the generalized coordinate q – in particular, we have: $v^q_1 := \frac{\partial q}{\partial t_1}, \dots, v^q_k := \frac{\partial q}{\partial t_k}$. The variation of the Lagrangian is then given using the same rules as the total differential:
$$\delta L = \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + \dots + \frac{\partial L}{\partial v^q_k}\delta v^q_k. \tag{5.44}$$
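Therefore, the variation of the action is given by:
$$\begin{aligned}
\delta S &= \int \delta L \, dt_1 \dots dt_k \\
&= \int \left[ \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + \dots + \frac{\partial L}{\partial v^q_k}\delta v^q_k \right] dt_1 \dots dt_k \\
&= \int \left[ \frac{\partial L}{\partial q}\delta q - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right)\delta q - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right)\delta q \right] dt_1 \dots dt_k \\
&= \int \left[ \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right) \right] \delta q \, dt_1 \dots dt_k,
\end{aligned} \tag{5.45}$$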
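where we have used integration by parts (or Stokes’ theorem) for multiple variables, to move the derivatives $\frac{\partial}{\partial t_i}$ from the velocity variations $\delta \frac{\partial q}{\partial t_i}$ to the corresponding coefficients $\frac{\partial L}{\partial v^q_i}$ – which introduces the minus signs. Therefore, we have the functional derivative of the action with respect to the generalized coordinate, given by:
$$\frac{\delta S}{\delta q} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right). \tag{5.46}$$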
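The principle of stationary action tells us that nature classically selects this functional derivative to be zero, which gives us the Euler-Lagrange equations for a system with one generalized coordinate q, parametrised by k independent variables $t_1, \dots, t_k$:
$$0 = \left. \frac{\delta S}{\delta q} \right|_{\text{Nature}} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right). \tag{5.47}$$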
5.1.7 More Examples
We can use variational calculus to derive the (rather famous) minimal surface
equation. In particular, we consider the following example.
Example 11 (Minimal Surface Equation) We consider all two-dimensional surfaces parametrised by two independent variables, $z := z(x, y)$, then ask the question – which surface of this general form has the minimal surface area? To answer this question, we can use the Euler-Lagrange equation (5.47) derived earlier. Say that the surface $z := z(x, y)$, parametrised by the two independent variables $t_1 = x$ and $t_2 = y$, has a domain D. Then (recall) its surface area is given by the double-integral:
$$A = \iint_D \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2} \, dx \, dy. \tag{5.48}$$

We can view this as a variational problem by observing that z is a generalised coordinate parametrised by two independent variables x and y. The corresponding generalised velocities are given by (various notations) $v^z_1 = z_x := \frac{\partial z}{\partial x}$ and $v^z_2 = z_y := \frac{\partial z}{\partial y}$ – we shall stick with the latter notation. Now, the total surface area A can be viewed as an action functional: A = A[L], whilst our integrand (infinitesimal / area differential) can be viewed as the corresponding Lagrangian: $L(z, z_x, z_y) = \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2} = \sqrt{1 + z_x^2 + z_y^2}$.
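Since we seek to minimize A, we need to first find surfaces (parametric functions) z(x, y) which make the functional A stationary. We then need to check that these stationary ‘points’ (functions) correspond to minima, rather than inflection points or maxima. The first task can be achieved by solving the Euler-Lagrange equations (5.47), which take the form:
$$\begin{aligned}
\frac{\partial L}{\partial z} - \frac{\partial}{\partial x}\frac{\partial L}{\partial z_x} - \frac{\partial}{\partial y}\frac{\partial L}{\partial z_y} = 0 &\implies \\
\frac{\partial}{\partial x}\left( \frac{z_x}{\sqrt{1 + z_x^2 + z_y^2}} \right) + \frac{\partial}{\partial y}\left( \frac{z_y}{\sqrt{1 + z_x^2 + z_y^2}} \right) &= 0.
\end{aligned} \tag{5.49}$$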
Although the last equation, known as the ‘minimal surface equation’, was derived by Lagrange in 1762, non-trivial (non-planar) solutions were not found until 1776 by the French Mathematical Engineer, Jean Meusnier. The trivial (planar) solution is given by:
$$Z(x, y) = Ax + By + C \tag{5.50}$$
where A, B, C are constants. Here $Z_x = \frac{\partial Z}{\partial x} = A$, $Z_y = \frac{\partial Z}{\partial y} = B$ and $L = \sqrt{1 + A^2 + B^2}$, etc.
Switching to cylindrical coordinates (ρ, θ, z), with x = ρ cos(θ), y = ρ sin(θ) and z = z, we have another solution to the minimal surface problem. This is given by the Catenoid – a surface of revolution parametrised by a single independent variable, z:
$$\rho = \lambda \cosh\left( \frac{z}{\lambda} \right) \tag{5.51}$$
where λ is a constant. Note that ρ is independent of the second independent variable θ, since the surface is rotationally symmetric (it was produced by rotating a catenary curve about the z-axis). To show this is a solution, we can either re-derive the minimal surface equation, starting from the infinitesimal area element $dA = \sqrt{1 + \left( \frac{\partial \rho}{\partial z} \right)^2 + \frac{1}{\rho^2}\left( \frac{\partial \rho}{\partial \theta} \right)^2} \, \rho \, d\theta \, dz$, or try some messy crap with the chain rule and the Cartesian coordinate equation. It’s far easier to start from the action principle again, with the Lagrangian $L(\rho, \frac{\partial \rho}{\partial z}, \frac{\partial \rho}{\partial \theta})$. Since our Catenoid is independent of θ (symmetry in θ), we have $\frac{\partial \rho}{\partial \theta} = 0$. Therefore, our Lagrangian is the coefficient function (coefficient of dθ ∧ dz) of our area 2-form element:
$$L = L(\rho, \rho_z) = \rho \sqrt{1 + \left( \frac{\partial \rho}{\partial z} \right)^2 + 0}. \tag{5.52}$$
Letting $\rho_z := \frac{\partial \rho}{\partial z}$, our Euler-Lagrange equation is given by:
$$\frac{\partial L}{\partial \rho} - \frac{d}{dz}\frac{\partial L}{\partial \rho_z} = 0, \tag{5.53}$$
which simplifies to:
$$\sqrt{1 + \rho_z^2} - \frac{d}{dz}\left( \frac{\rho_z \rho}{\sqrt{1 + \rho_z^2}} \right) = 0. \tag{5.54}$$

With some application of the chain and product rules, along with the hyperbolic trigonometric identities
$$\begin{aligned}
1 + \sinh^2(x) &= \cosh^2(x) \\
\frac{d}{dx}\cosh(\lambda x) &= \lambda \sinh(\lambda x), \qquad \frac{d}{dx}\sinh(\lambda x) = \lambda \cosh(\lambda x) \\
\frac{d}{dx}\tanh(x) &= \operatorname{sech}^2(x)
\end{aligned} \tag{5.55}$$
one can show that the Catenoid surface, given by $\rho(z) = \lambda \cosh(\frac{z}{\lambda})$, solves the Euler-Lagrange equation (5.54). Hence the Catenoid corresponds to a ‘critical-surface’ (cf. ‘critical point’) of the surface area functional A and makes this functional (action) stationary. To see that it cannot be a maximal surface, simply note that the area of any surface can always be increased by adding small ripples to it, so the area functional has no maxima. Hence the Catenoid is either a saddle point or a minimum of the area functional. It is in fact a minimal surface.
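If you don’t feel like pushing the hyperbolic identities through by hand, the claim can at least be spot-checked numerically. The sketch below evaluates the left-hand side of (5.54), using a central finite difference for the d/dz term, for the Catenoid profile and for a straight-line profile that should not be a solution. The value λ = 1.5, the comparison profile, the sample heights and the function names are all illustrative assumptions.

```python
import math

def el_residual(rho, rho_z, z, h=1e-4):
    """Left-hand side of (5.54) at height z, for a profile rho(z) with slope rho_z(z)."""
    def flux(s):
        # The quantity rho_z * rho / sqrt(1 + rho_z**2) that is differentiated in (5.54).
        return rho_z(s) * rho(s) / math.sqrt(1.0 + rho_z(s) ** 2)
    d_flux = (flux(z + h) - flux(z - h)) / (2 * h)   # central difference for d/dz
    return math.sqrt(1.0 + rho_z(z) ** 2) - d_flux

lam = 1.5  # arbitrary value of the constant lambda

def catenoid(z):
    return lam * math.cosh(z / lam)

def catenoid_z(z):
    return math.sinh(z / lam)

def line(z):
    return 2.0 + z          # comparison profile (a cone), not a solution

def line_z(z):
    return 1.0

heights = [-1.0, -0.3, 0.0, 0.4, 1.2]
catenoid_residuals = [el_residual(catenoid, catenoid_z, z) for z in heights]
line_residuals = [el_residual(line, line_z, z) for z in heights]
print(catenoid_residuals)
print(line_residuals)
```

The residuals for the Catenoid should vanish up to the finite-difference error, while those of the straight-line profile stay well away from zero.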
5.1.8 Closing Remarks
The Lagrangian formalism is, for the most part, a second-order formalism. This means that the equations of motion resulting from the Euler-Lagrange equations are usually second order differential equations. For many different reasons, it is sometimes advantageous or necessary to switch to a first-order formalism – ‘Hamiltonian mechanics’. To do this, one defines the Hamiltonian as the Legendre transform of the Lagrangian:
$$H(\mathbf{q}, \mathbf{p}; t) = \mathbf{p} \cdot \dot{\mathbf{q}} - L(\mathbf{q}, \dot{\mathbf{q}}; t) \tag{5.56}$$
where $\mathbf{p}$ is the conjugate momentum vector (related to the generalized velocities). The components of $\mathbf{p}$ are defined as the partial derivatives of the Lagrangian with respect to the generalized velocities:
$$p_i := \frac{\partial L}{\partial \dot{q}^i}. \tag{5.57}$$
In this formalism, the natural variables are now the generalized coordinates $\mathbf{q}$ and the conjugate momenta $\mathbf{p}$. From a practical point of view, the ultimate result is that Hamilton’s equations are coupled first-order differential equations – which are often easier to solve than the Euler-Lagrange equations.
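As a small illustration of the Legendre transform (5.56) and the definition of the conjugate momentum (5.57), here is a sketch for a single particle in uniform gravity, with L = ½mv² − mgq. This Lagrangian, the parameter values, the finite-difference estimate of ∂L/∂v and the function names are my own illustrative choices.

```python
def lagrangian(q, v, m=2.0, g=9.81):
    """L = T - U for a particle of mass m at height q in uniform gravity."""
    return 0.5 * m * v**2 - m * g * q

def conjugate_momentum(q, v, h=1e-6):
    """p = dL/dv as in (5.57), estimated with a central finite difference."""
    return (lagrangian(q, v + h) - lagrangian(q, v - h)) / (2 * h)

def hamiltonian(q, v):
    """Legendre transform (5.56): H = p*v - L, evaluated at the given velocity."""
    return conjugate_momentum(q, v) * v - lagrangian(q, v)

q, v = 0.7, -1.3
p = conjugate_momentum(q, v)
total_energy = 0.5 * 2.0 * v**2 + 2.0 * 9.81 * q    # T + U with m = 2
print(p, hamiltonian(q, v), total_energy)
```

For this Lagrangian the conjugate momentum comes out as the familiar mv, and the Hamiltonian agrees with the total energy T + U, as claimed back in Section 5.1.2 for this restricted class of systems.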
Although they are essentially equivalent, there are many theoretical motivations
for the Hamiltonian formalism – most notably, that it allows a dynamical system
to be represented in ‘phase space’. Evolution of the system is then described
by trajectories (q(t), p(t)) in phase-space. With such a structure, the system
can be analysed using symplectic geometry and Liouville theory – the key point
being that the Hamiltonian H(q, p) defines a ‘flow’ on phase space (a map on
the cotangent bundle). This flow gives rise to a conserved, non-vanishing object
called the ‘symplectic form’ – the basis for many deep mathematical theorems
regarding dynamics.
