St. George’s College 2014 - Mathematical Sciences
Tutorials (Broad Concept Problems)
Daniel Xavier Ogburn ∗
School of Physics,
Field Theory and Quantum Gravity,
University of Western Australia
December 22, 2014
∗
Electronic address: daniel.ogburn@research.uwa.edu.au
Contents
1 Introduction
2 Tutor List
3 Broad Concept Problems
4 Tutorial 1 - Dimensional Analysis and the Buckingham Pi Theorem (part I)
4.1 Prologue: March 15, 2014
4.2 Examples and Problems
4.2.1 Moral of the story
5 Tutorial 2 - Dimensional Analysis and the Buckingham Pi Theorem (part II)
5.1 Background
5.2 Examples and Problems
6 Tutorial 3 - Return of Dimensional Analysis: Gravity, The Hierarchy Problem and extra-dimensional Braneworlds
6.1 Introduction
6.2 Background
6.3 Extended Problem
7 Tutorial 4: 50 Shades of Error, Shade I – Multivariable calculus and The Total Differential
7.1 Russian Playpen: Functions of more than one variable
7.2 Russian Daycare: Partial Differentiation
7.3 Russian Kindergarten: The Exterior Derivative (Total Differential)
7.3.1 Exercises
8 Tutorial 5: Absolute Error and Game of Thrones
8.1 Absolute Error
8.2 Examples and Problems
9 Tutorial 6: Medicine – An Error a Day Keeps the Tutor Away
9.1 Relative and Percentage Error
9.2 Error Etiquette
9.3 Sleepy Snorlax’s Medical (mis)Adventures
10 Tutorial 7: Romanian High School, Part I – Einstein Convention and Vector Algebra
10.1 Conventions: Einstein Notation and Vector/Matrix Operations
10.1.1 Scalar and Vector Products – Dot Product
10.1.2 Scalar and Vector Products – The Permutation Symbol
10.1.3 Scalar and Vector Products – The Cross Product
11 Tutorial 8: Design a Death Star – applications of Lie Groups/Algebras
11.1 Notation
11.2 BFF: Linear Maps and Matrices
11.3 SO(3): The Lie Group of Rotations
11.4 so(3): Quaternions, Lie Algebras and Cross Products
12 Tutorial 9+10: The Fault in Our Stars – Project Death Star (II)
12.1 Infinitesimal Rotations and Lie Algebras
13 Tutorial 11: Fiery the angels fell – Project Death Star (III)
13.1 Prelude
13.2 The Circle Group
13.3 The Quaternions
13.4 Quaternions, Rotations and the 3-Sphere
14 Interlude: Academic and Intellectual Maturity
14.1 Keeping a CV
14.2 Important Learnings and Observations
14.3 Motivation
15 Tutorial 12: Metric Spaces and Relativity I
15.1 Metric Spaces
15.1.1 Euclidean Metric Spaces
15.1.2 Fun Metric Spaces
15.2 Non-Euclidean Metric Spaces and Relativity
16 Tutorial 13/14: Relativity and Hyperbolic Distance
16.1 The Two Faces of Trigonometry
16.1.1 The Circular Face
16.1.2 The Hyperbolic Face
16.2 Lorentz Metric and Relativity
16.2.1 Minkowski Spacetime
16.2.2 Lorentz Metric and Light-Cone Structure
16.2.3 Projections and Familiar Formulas
17 Tutorial 15: Differential Equations and Operators
17.1 Differential Operators and Simple DEs
17.2 Physical Examples
17.3 Operators, Eigenfunctions and Spectra
18 Tutorial 16: Differential Equations and Integrating Factors
18.1 Review – Theory of separation of variables
18.2 Integration Factors
19 Tutorial 17: Second Order Linear Differential Equations
19.1 Homogenous Second Order ODEs
19.2 Theory of Linear ODEs
19.3 Explicit Algorithm and Illustrations
20 Tutorial 18: Calculus of Vectors and Differential Forms I
20.1 Vector Valued Functions
20.2 Exterior Calculus
21 Tutorial 19: Calculus of Vectors and Differential Forms II
21.1 Gradients and Exterior Derivatives
21.1.1 Gradients
21.1.2 Exterior Derivatives
21.2 Divergence
21.3 Hodge Dual, Closed and Exact Forms
22 Tutorial 20: Calculus of Vectors and Differential Forms III
22.1 Sleight of Hand
22.2 Curl of a Vector Field
23 Tutorial 21: Coordinate Systems and Scale Factors
23.1 Orientation and Measure
23.2 Smooth Curves
24 Tutorial 22: Line Integrals and Exterior Calculus
24.1 Exterior Product and Derivatives
24.2 Orienting Volume Forms
24.3 Duality and Orthogonality
25 Tutorial 23: Serendipity and Integration
25.1 Introduction
25.2 Differentigration
25.3 Quantum Field Theory Aside
25.4 Generalizations
26 2015 Academic Program Suggestions
26.1 Tutoring
26.2 Mathematical Sciences Tutorial Plan
27 Miscellaneous
27.1 Lagrangian Mechanics
27.1.1 Background
27.1.2 The Principle of Stationary Action
27.2 The Euler-Lagrange Equations of Motion
27.3 N-Dimensional Euler-Lagrange Equations
27.4 Examples
27.5 Multiple Independent Parameters
27.6 More Examples
27.7 Closing Remarks
1 Introduction
At present, the suggested layout for tutorials is 20 minutes of
‘broad concept problems’ and 40 minutes of subject-specific help with questions
from your coursework. If there is a large turnout of students and more time is
necessary, the tutorials will be extended to 30 minutes of ‘broad concept problems’
and 60 minutes of subject-specific help. In addition, students may approach tutors
outside of tutorial times for help with specific coursework problems or concepts,
but they will have to arrange this themselves with the individual tutors.
2 Tutor List
For the year 2014, here is a list of tutors and the respective subjects they are dedi-
cated to. Note that each tutor will probably be able to help you with other mathe-
matics or physics related enquiries.
• Murdock Grewar – PHYS1021
• Tessa McGrath – MATH1711
• Ben Luo – PHYS1001
• Jake Miller – MATH1001
In addition, students with any mathematics, physics or statistics enquiries are
welcome to approach me for assistance.
3 Broad Concept Problems
These problems are designed to help expose you to important material and concepts
outside the scope of a standard curriculum. They are also designed to help you
think about applications of your mathematical powers to the world at large. In this
manner, the hope is that students will develop a higher level of critical thinking,
logical reasoning and mathematical intuition for investigating different scenarios
and solving new problems.
Note that I will generally aim to cover problems that you wouldn’t usually see in
your lectures – or focus on topics which are (by student and professional experi-
ence) useful and important, but otherwise overlooked or just briefly glossed-over
in typical university courses. Since the tutorials are targeted at people from both
pure and applied mathematics, or physical and non-physical sciences, the assumed
physical science knowledge will be kept to a minimum. In cases where physics or
engineering examples are used, the prerequisite concepts will be introduced – but
only to emphasize the take-home message.
To get the most out of these tutorials, you should attempt all the broad-concept
problems. Some weeks, we will continue a theme from the previous week.
If you can’t finish the problems in the tutorial, or decide to finish them afterwards
in your own time, feel free to ask questions during the week. The tutors
appreciate that students are busy with their coursework and assessed homework
problems, so the broad concept problems should be fairly quick to solve. As an
incentive, doing these ‘extra-curricular’ questions should give you an edge over
your rival Tommy Moore, Uni Hall, St. Cats and Trinity students.
4 Tutorial 1 - Dimensional Analysis and the Buckingham
Pi Theorem (part I)
4.1 Prologue: March 15, 2014
Dimensional analysis is a deceptively simple, but fundamentally powerful tool in
the mathematical sciences – one that is often overlooked! There will be a day
when the importance of dimensional analysis is forgotten and lost in the education
system, but today is not that day.
Ultimately, dimensional analysis serves as a fast error-checking algorithm for your
calculations. It is also useful for extracting ‘physically meaningful’ information
out of your system. In particular, given a large set of parameters describing a
system, one can often form a smaller number of dimensionless parameters which
completely characterize that system, hence removing any redundant information.
The precise statement of the last idea is known as the ‘Buckingham Pi Theorem’¹,
which we shall investigate next week. Don’t worry about the formality of the
name: it has vast (but simple) practical applications to fluid mechanics, thermodynamics,
electrodynamics, cosmology and much more. For now, we begin with a
few examples, then work through some questions².
¹ For those of you who have done (or will do) linear algebra, this is just a practical consequence
of the ‘rank-nullity’ theorem.
² Thanks to Scott Meyer and Matthew Fernandez for feedback.
The main idea of the following examples and problems is two-fold: first inspect
an equation and work out the dimensions (or units) of each variable and constant,
given some starting information. We then check whether or not the equation is
dimensionally consistent. Any equation from any area of science and mathematics
must be dimensionally consistent – if it isn’t, then it’s wrong. In this sense, you
don’t need to understand the science or theory behind an equation to deduce when
it is incorrect on dimensional grounds.
4.2 Examples and Problems
Recall lengths, areas and volumes. The fundamental unit that characterizes these
quantities is length: L. Given a rectangular box with sides of length a, b, c, the
volume is \(V_B = a \times b \times c\). Since each of the sides has the dimensions of length,
\([a] = [b] = [c] = L\), the volume has dimensions
\[
\begin{aligned}
[V_B] &= [a \times b \times c] \\
&= [a] + [b] + [c] \\
&= L + L + L \\
&= 3L,
\end{aligned}
\]
which we interpret as length-cubed: \(L^3\). The notation [ ] is used to denote the
dimensions of whatever quantity is inside the brackets. Notice also that when we
were looking for the dimensions of a product of variables, we added the
dimensions of each variable: \([a \times b \times c] = [a] + [b] + [c] = L + L + L = 3L\). Finally,
we ended up with \([V_B] = 3L\), which means that the volume \(V_B\) has 3 factors of the
unit length L; hence \(V_B\) has dimensions of length-cubed, \(L^3\). Of course,
we already knew this!
Similarly to the multiplication rule, if we invert a quantity we invert its
units, which in this notation means negating its dimensions: \([\frac{1}{a}] = -[a]\),
\([\frac{1}{a^2}] = -[a^2] = -2[a]\), etc. Combining this with the
multiplication rule, we get the division rule: \([\frac{a}{b}] = [a] - [b]\). For example, if C
is the concentration of protein in milk, it has units \(ML^{-3}\) of mass over volume;
hence dimensionally \([C] = M - 3L\).
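For those who like to let a computer do the bookkeeping, here is a minimal Python sketch of the product, inversion and division rules above. It is only an illustration: storing a dimension as a dictionary of exponents, and the helper names dim_mul, dim_inv and dim_div, are our own assumptions rather than anything standard.

```python
# A dimension in additive notation is stored as {base unit: exponent}.
# Example: M - 3L (i.e. M L^-3) is {"M": 1, "L": -3}.

def dim_mul(*dims):
    """Product rule: [a x b x ...] = [a] + [b] + ..."""
    total = {}
    for d in dims:
        for unit, power in d.items():
            total[unit] = total.get(unit, 0) + power
    # Drop units whose exponents have cancelled to zero.
    return {u: p for u, p in total.items() if p != 0}

def dim_inv(d):
    """Inversion rule: [1/a] = -[a]."""
    return {u: -p for u, p in d.items()}

def dim_div(a, b):
    """Division rule: [a/b] = [a] - [b]."""
    return dim_mul(a, dim_inv(b))

LENGTH = {"L": 1}
MASS = {"M": 1}

volume = dim_mul(LENGTH, LENGTH, LENGTH)  # [a x b x c] = 3L
concentration = dim_div(MASS, volume)     # [C] = M - 3L

print(volume)         # {'L': 3}
print(concentration)  # {'M': 1, 'L': -3}
```

An empty dictionary plays the role of ‘dimensionless’, so dividing a volume by a volume returns {}.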
Exercise 1 Use the rectangular box example to calculate the dimensions of the
area of a rectangle with sides of length ‘a’ and ‘b’, given the area formula
\[ A_R = ab. \tag{1} \]
Now that we have done some simple problems, let’s see how dimensional analysis
can be used for error checking. Let’s say someone tells us that the volume \(V_S\) of a
sphere of radius R is given by
\[ V_S = \frac{4}{3} \pi R^2. \tag{2} \]
Obviously, this is wrong – but if you’ve forgotten the correct formula, there’s an
easy way to see why it is wrong using dimensional analysis. First of all, \([R] = L\),
since radius has dimensions of length. Furthermore, \([\frac{4}{3}\pi] = 0\) since this is just a
pure number (so it is dimensionless). Therefore,
\[
\begin{aligned}
[V_S] &= [\tfrac{4}{3} \pi R^2] \\
&= [\tfrac{4}{3} \pi] + [R \times R] \\
&= [\tfrac{4}{3} \pi] + [R] + [R] \\
&= 0 + L + L \\
&= 2L.
\end{aligned}
\]
But wait a minute: volume has units of length cubed, so we need \([V_S] = 3L\). We
then conclude by dimensional arguments that the formula \(V_S = \frac{4}{3} \pi R^2\) is incorrect!
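The same conclusion can be cross-checked numerically. The Python sketch below does not track dimensions symbolically. Instead it rescales the radius and measures how the formula responds, since a genuine volume must grow like the cube of the scale factor. This is a swapped-in technique for illustration only; the helper name scaling_exponent and the sample values are our own assumptions.

```python
import math

def scaling_exponent(formula, x=1.7, s=10.0):
    """Estimate n such that formula(s * x) == s**n * formula(x)."""
    return math.log(formula(s * x) / formula(x)) / math.log(s)

def claimed_volume(R):
    """The (wrong) formula (2): 4/3 * pi * R^2."""
    return 4 / 3 * math.pi * R**2

def sphere_volume(R):
    """The standard formula: 4/3 * pi * R^3."""
    return 4 / 3 * math.pi * R**3

print(round(scaling_exponent(claimed_volume)))  # 2, so it scales like an area
print(round(scaling_exponent(sphere_volume)))   # 3, so it scales like a volume
```

Passing this test only shows that a formula scales correctly. Like dimensional analysis itself, it cannot tell you whether the numerical prefactor is right.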
Although the last example was easy, the same principles can be applied to much
more complicated formulas in the mathematical sciences. Indeed, they are used in
research and in practice when doing estimates, checking articles or performing
large derivations and calculations. Let’s do one more example.
Example 1 Newton’s Second Law of Motion: Force = Mass × Acceleration, or
F = ma, is the fundamental postulate governing classical physics between the
late 17th and early 20th centuries. It remains vastly important today, as the law defines
what the force is for an object of mass ‘m’ moving with an acceleration ‘a’. The
three fundamental units here are mass M, time T and length L. Displacement ‘x’
has dimensions of length L, hence velocity ‘v’, which is the rate of change of
displacement³, has units of length over time:
\[
\begin{aligned}
[v] &= [\tfrac{dx}{dt}] \\
&= [dx] - [dt] \\
&= [x] - [t] \\
&= L - T,
\end{aligned}
\tag{3}
\]
hence v has units \(\frac{L}{T}\). Similarly, acceleration a is the rate of change of velocity,
hence
\[
\begin{aligned}
[a] &= [\tfrac{dv}{dt}] \\
&= [dv] - [dt] \\
&= (L - T) - T \\
&= L - 2T,
\end{aligned}
\tag{4}
\]
which means ‘a’ has units of length over time-squared: \(\frac{L}{T^2}\). Finally, mass m trivially
has units of mass: \([m] = M\) (note that here we use the capital M to denote
the fundamental unit of mass, whereas the lower-case m is the mass variable that we
insert into Newton’s 2nd Law). Therefore, force F has the following dimensions
\[
\begin{aligned}
[F] &= [m a] \\
&= [m] + [a] \\
&= M + L - 2T,
\end{aligned}
\tag{5}
\]
whence F has units of (mass × length)/(time-squared): \(\frac{ML}{T^2}\).
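The chain displacement, velocity, acceleration, force in Example 1 can be replayed in code. The following Python sketch uses a small class of our own invention (the name Dim and its keyword arguments are assumptions made for illustration), in which multiplying quantities adds exponents and dividing subtracts them, exactly as in equations (3) to (5).

```python
class Dim:
    """A dimension stored as exponents of the base units M, L, T."""

    def __init__(self, M=0, L=0, T=0):
        self.exps = (M, L, T)

    def __mul__(self, other):
        # Product rule: exponents add.
        return Dim(*(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other):
        # Division rule: exponents subtract.
        return Dim(*(a - b for a, b in zip(self.exps, other.exps)))

    def __eq__(self, other):
        return self.exps == other.exps

    def __repr__(self):
        return "Dim(M={}, L={}, T={})".format(*self.exps)


MASS, LENGTH, TIME = Dim(M=1), Dim(L=1), Dim(T=1)

velocity = LENGTH / TIME        # [v] = L - T
acceleration = velocity / TIME  # [a] = L - 2T
force = MASS * acceleration     # [F] = M + L - 2T

print(force)  # Dim(M=1, L=1, T=-2)
```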
Exercise 2 Use dimensional analysis to conclude which formulas are incorrect on
dimensional grounds – i.e. which of the following formulas are dimensionally
inconsistent. Show your working.
1. A triangle has a base b and a vertical height h, each with dimensions of
length L. Check whether the following formula for its area is dimensionally
consistent:
\[ A = \frac{1}{2} b^2 h. \tag{6} \]
2. A circle has a radius r with dimensions of length L. Its area is given by
\[ A = \frac{1}{2} \pi r^2. \tag{7} \]
Is this dimensionally consistent? A stronger question to ask is whether this
formula is correct – if not, why not?
³ For those of you unfamiliar with the definition of velocity and acceleration in terms of calculus,
you can think of \(\frac{dx}{dt}\) as the change in displacement x over an ‘infinitesimally small amount’ of time
dt. Then dx carries dimensions of length and dt has dimensions of time: \([dx] = L\), \([dt] = T\). Note
that in general, for an arbitrary quantity y, the ‘infinitesimal quantity’ dy carries the dimensions
\([dy] = [y]\).
There is one more rule of dimensional analysis which involves analysing equations
which include a sum of terms. In particular, given a quantity A = B + C + D, to
compute the dimensions [A] of A, we don’t just add the dimensions of B, C and
D:
[A] = [B] + [C] + [D], (8)
but rather, we have the consistency requirement that:
[A] = [B] = [C] = [D]. (9)
This is because B, C and D should all separately have the same units. As such,
this observation is very useful for determining the dimension of multiple unknown
quantities in an equation that involves a sum of different terms. For example, the
area of a toddler’s house drawing is given by
\(A_{\text{House}} = A_{\text{Triangle}} + A_{\text{Square}} = \frac{1}{2} b h + a^2\),
where b is the base length of the triangle, h is its vertical height and
a is the length of the sides of the square. Therefore,
\([A_{\text{House}}] = [A_{\text{Triangle}}] = [A_{\text{Square}}] = 2L\),
hence \([\frac{1}{2} b h] = [a^2]\), which implies \([b] + [h] = 2[a] = 2L\).
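The sum rule is easy to get wrong, so here is a short Python sketch of it applied to the house drawing. The function names dims_of_sum and dims_of_product are hypothetical helpers of our own. The point is simply that a sum refuses to exist unless every term carries the same dimensions, whereas a product happily adds them.

```python
def dims_of_product(*factors):
    """Product rule: add the exponents of every factor."""
    total = {}
    for f in factors:
        for unit, power in f.items():
            total[unit] = total.get(unit, 0) + power
    return {u: p for u, p in total.items() if p != 0}

def dims_of_sum(*terms):
    """Sum rule: all terms must share one dimension, which the sum inherits."""
    first = terms[0]
    for term in terms[1:]:
        if term != first:
            raise ValueError(f"cannot add dimensions {first} and {term}")
    return first

LENGTH = {"L": 1}
NUMBER = {}  # a pure number such as 1/2 is dimensionless

triangle = dims_of_product(NUMBER, LENGTH, LENGTH)  # [1/2 b h]
square = dims_of_product(LENGTH, LENGTH)            # [a^2]
house = dims_of_sum(triangle, square)

print(house)  # {'L': 2}

# Adding an area to a length is rejected on dimensional grounds.
try:
    dims_of_sum(square, LENGTH)
except ValueError as err:
    print("rejected:", err)
```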
One last concept: a dimensionless constant, C, is defined to be a quantity which
has no dimensions, hence \([C] = 0\). These are fundamentally important in the
description of a physical system since they do not depend on the units you choose.
Thus, in some manner they represent a ‘universal’ quantity or property – indeed,
the dimensionless constants of a system describe a universality class⁴.
To answer the following questions, try not to worry too much about terminology
or new and abstract concepts. We are only interested in dimensions – so if you
stay focused and don’t get distracted by the extra information, you can finish them
quickly with no prerequisite knowledge!
Exercise 3 1. A hypercube living in d dimensions has d sides, each with length
a and dimensions of length L. Its hyper-volume has units of \(L^d\) and is given
by the formula
\[ V = a^d. \tag{10} \]
Verify that this is dimensionally consistent – i.e. show that \([V] = L + \dots + L = d \times L\).
What dimensions would its surface area have? Hint: this would
be the same as the dimensions of the area of one of its ‘faces’.
⁴ A more precise meaning of this statement can be found in the theory of ‘Renormalization
Groups’.
2. The U.S. Navy invests a significant amount of money into acoustic scatter-
ing studies for submarine detection (SONAR). As part of this research, the
Dahlgren Naval Academy uses ‘prolate spheroidal harmonics’ (vibrational
modes of a ‘stretched sphere’) to do fast, accurate scattering calculations. In
this process, a submarine can be approximated to be the shape of a ‘prolate
spheroid’ or ‘rugby ball’. A prolate spheroid is essentially the surface gen-
erated by rotating an ellipse about its major axis. Given a prolate spheroid
with a semi-major axis of length a and semi-minor axis of length b, its volume
is
\[ V = \frac{4\pi}{3} a b^2. \tag{11} \]
Is this formula dimensionally consistent? What about the following formula
for the surface area (it should have units of length-squared):
\[ S = 2\pi b^2 \left(1 + \frac{a}{b e} \sin^{-1}(e)\right)? \tag{12} \]
Note, \(\sin^{-1}\) is the ‘inverse sine’ or ‘arcsine’ function. It necessarily preserves
dimensionality, hence \([\sin^{-1}(e)] = [e]\). The variable e is the ‘eccentricity’
of the spheroid. It is a dimensionless quantity, \([e] = 0\), which
measures how ‘stretched’ the spheroid is – i.e. how much it deviates from a
sphere. It is given by the (dimensionally-consistent!) formula:
\[ e^2 = 1 - \frac{b^2}{a^2}. \tag{13} \]
A perfect sphere corresponds to e = 0, whereas an infinitely stretched
sphere corresponds to e → 1.
3. In a parallel universe, Andrew Forrest has a dungeon with \(B_F\) flawless black
opals inside it. From a financial point of view, these have dimensions of
money $ – i.e. \([B_F] = \$\). A machine recently designed by Ian McArthur,
head of physics at UWA, uses quantum fluctuations of the spacetime vacuum
to produce black opals at a rate of \(R_{UWA}\) black opals per minute. Sensing
the loss of his monopoly on the black opal market, Andrew Forrest employs
a competing physicist at Curtin University to create a quantum vacuum stabilizer.
This reduces the number of black opals that Ian can produce per
minute by \(R_C\) black opals per minute, where \(|R_C| \le R_{UWA}\). Working on
a broad concept problem, a team of first year students at St. George’s College
come up with the following model to predict the value V of shares in
Forrest BlackOps inc. on the stock market as a function of time t (time has
dimensions T):
\[ V = \beta \frac{D}{B_F} - \lambda (R_{UWA} + R_C) \tau D e^{-\lambda (1 - \frac{t}{\tau})} \tag{14} \]
where the constant τ (having dimensions of time T) denotes⁵ the time at
which the European Union is predicted to collapse. Furthermore, D is a function
that measures the market demand for black opals (with no dimensions) and
β is an economic constant predicted by game theory with units of money-squared:
\(\$^2\). Finally, λ is a dimensionless parameter (so \([\lambda] = 0\)) that depends
on the number of avocados served at the college since the establishment
of St. George’s Avocadoes Anonymous up to the given time t.
Is this model dimensionally consistent – i.e. does \([V] = \$\)?
What about the following formula, proposed by students from St. Catherine’s
College (who didn’t practice dimensional analysis)?
\[ V = \frac{D}{B_F} - D^2 e^{-t} \tag{15} \]
On dimensional grounds, list two reasons why this model is incorrect.
4. Bonus Question (Don’t worry about the physics, just keep track of dimen-
sions and rules)
The Harvard-Smithsonian Center for Astrophysics is about to hold a press
conference tomorrow (March 17, 2014), indicating the discovery of gravitational
waves. Gravitational waves are ripples through spacetime created
by large gravitational disturbances in the cosmos – for example, exploding
stars and coalescing black holes. These are predicted by Einstein’s theory
of General Relativity – a theory in which gravity is a simple consequence
of the geometry (shape) of spacetime. In this theory, choosing natural units
for the speed of light, c = 1, time and spatial length become dimensionally
equivalent: T = L. Therefore, dimensionally we have [time] = [distance]
and [c] = [distance/time] = L − T = 0. A geometry which models gravitational
waves is described by the following metric (an abstract object which
tells you how gravity and measures of time and length vary at each point in
spacetime):
\[ g = \eta + h \tag{16} \]
⁵ This is the Greek letter tau – not the Roman letter t.
where η is a flat-space metric (describing an empty universe):
\[ \eta := -dt + dx \otimes dx + dy \otimes dy + dz \otimes dz \tag{17} \]
and h is a symmetric tensor, given in de Donder gauge by
\[ h := \epsilon \cos(k \cdot r) A + \frac{1}{2} \times \mathrm{trace}(h) \times \eta. \tag{18} \]
Here ε is a small (<< 1) dimensionless parameter, \([\epsilon] = 0\), and A is a symmetric
tensor field with dimensions of length-squared: [A] = 2L. Note that
the trace operation turns tensors into scalars, so it removes the dimensionality
of a tensor: [trace(h)] = 0. Furthermore, consider · as another form of
multiplication. Since the wave vector k and position vector r have inverse
units, we have [k] = −L, [r] = +L, hence [k · r] = 0. For the purposes
of dimensional analysis, we can treat the tensor product ⊗ as ordinary multiplication
also. The differential quantities have the following dimensions:
[dt] = [dx] = [dy] = [dz] = L, hence [dx ⊗ dx] = 2[dx] = 2L for
example. Since x, y, z, t represent coordinates in spacetime, we also have
[x] = [y] = [z] = [t] = L.
Show that the metric g demonstrates a dimensionally-inconsistent solution
to the Einstein field equations. Where is the error? Suggest what could be
done to this metric to ‘fix’ it and give a dimensionally-consistent solution.
Remark: If you were certain that the equation for h was correct, it would be
unnecessary to tell you the dimensions of A – you could work it out since you
already know [cos(k · r)] = 0 (the function cos(something) is necessarily
dimensionless). Therefore, pretending [A] is unknown, prove that [A] = 2L
given all the other information.
After completing the last two problems, one should realize that much time can be
saved by ignoring most of the information and concentrating only on the dimensions
of the variables and constants in the given formulas. This is true in general! Therefore,
to do dimensional analysis, one need not necessarily understand the science
or mathematics behind an equation, but simply the dimensions of the quantities
involved. It is thus an easy way to show when something is wrong without
knowing what you are talking about.⁶
⁶ Dimensional analysis would have saved the present author about 100 hours of supergravity
calculations – time which was largely lost due to two dimensionally-inconsistent equations in a
published journal article.
4.2.1 Moral of the story
Dimensional analysis can tell you when an equation is wrong, but it doesn’t nec-
essarily imply that an equation is correct – even though its dimensions might be
consistent. As a student, you should make use of dimensional analysis whenever
you can – try it on all formulas you get which have dimensionful quantities. This
will help you to gain a strong intuition of whether or not statements and equations
are sensible and consistent. This helps you to be a fast calculator and it will also
help you to pick up errors in your lecture notes ...
5 Tutorial 2 - Dimensional Analysis and the Buckingham
Pi Theorem (part II)
5.1 Background
One of the key concepts in dimensional analysis is that of dimensionless parame-
ters. Dimensionless parameters are important, because they allow you to charac-
terise both physical and theoretical mathematical systems in a scale-invariant way.
Note that mastering the following concepts and exercises requires a good under-
standing of the material in Tutorial 1. For the more mathematically inclined, one
of the examples and exercises illustrates how to mathematically prove the π theo-
rem by using the rank-nullity theorem from linear algebra – this is a good exercise
for understanding matrix equations and the correspondence between matrices and
simultaneous equations! For the applied minds, we use dimensional analysis to in-
vestigate and form dimensionless constants to characterise the harmonic oscillator,
viscous fluids, electromagnetism and Einstein’s theory of gravity.
BIG DISCLAIMER: Notation
Note that for the most part, we have used ‘additive notation’ to denote the dimensions
of some quantity – e.g. [Force] = M + L − 2T. However, in engineering
and sometimes in physics⁷, you will often see multiplicative notation being used,
meaning F has dimensions \(\frac{ML}{T^2}\). For these tutorials, we have referred to the latter as
the ‘units’ of F, rather than its dimensions. Technically speaking, both are correct,
although units typically refer to some standard of measure, such as kilograms
or kg for the standard SI unit of mass. Here we’ve just taken M, L, T to refer to
both dimensions and their respective standard units. After some practice, it should
be easy to interchange between conventions. The reason we use additive notation
is that it’s faster to calculate dimensions this way and it is less prone to mistakes
(since you are adding and subtracting instead of multiplying and dividing). Furthermore,
additive notation makes it easier to prove things like the π theorem for
dimensional analysis.
⁷ In particle physics and quantum field theory, additive notation is common for computations as it
is the smarter way to do things.
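To practise switching between the two conventions, here is a small Python sketch that prints the same set of exponents in both notations. The function names additive and multiplicative, and the plain-text output format (for example ML/T^2), are our own choices for illustration.

```python
def additive(d):
    """Render exponents in additive notation, e.g. M + L - 2T."""
    parts = []
    for unit, p in d.items():
        if p == 0:
            continue
        sign = "-" if p < 0 else "+"
        coeff = "" if abs(p) == 1 else str(abs(p))
        parts.append((sign, coeff + unit))
    if not parts:
        return "0"  # dimensionless: [C] = 0
    first_sign, first_term = parts[0]
    text = ("-" if first_sign == "-" else "") + first_term
    for sign, term in parts[1:]:
        text += f" {sign} {term}"
    return text

def multiplicative(d):
    """Render exponents in multiplicative notation, e.g. ML/T^2."""
    def fmt(unit, p):
        return unit if p == 1 else f"{unit}^{p}"
    numerator = "".join(fmt(u, p) for u, p in d.items() if p > 0) or "1"
    denominator = "".join(fmt(u, -p) for u, p in d.items() if p < 0)
    return f"{numerator}/{denominator}" if denominator else numerator

force = {"M": 1, "L": 1, "T": -2}
print(additive(force))        # M + L - 2T
print(multiplicative(force))  # ML/T^2
```

Note how the additive ‘0’ of a dimensionless quantity corresponds to the multiplicative ‘1’: additive notation is just the logarithm of the multiplicative one.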
A physical system in the mathematical sciences typically consists of:
1. A set of physical parameters.
2. A set of governing equations which describe the behaviour or evolution of
the system.
3. A set of fundamental ‘units’ which describe the dimensionality of the sys-
tem.
5.2 Examples and Problems
Example 2 Let’s take a simple but profound⁸ example: the simple harmonic
oscillator. One example of a simple harmonic oscillator is a mass placed on a frictionless
tabletop attached to a spring. The spring is either stretched or compressed,
then released so that the mass proceeds to undergo simple harmonic motion. This
physical system is therefore described by
1. A set of 4 physical parameters: the spring constant κ, the mass m, and the initial
position \(x_0\) and initial velocity \(v_0\) of the mass.
2. An equation of motion called ‘Hooke’s Law’⁹, which says that when you
stretch or compress the spring, the force acting to restore the spring to its
natural length is given by:
\[ F = -\kappa x \tag{19} \]
where x is the displacement of the mass attached to the spring. Combining
this with Newton’s 2nd Law, F = ma, we get the equation of motion for the
spring:
\[ m \frac{d^2 x}{dt^2} = -\kappa x, \tag{20} \]
where \(a = \frac{d^2 x}{dt^2}\) is the acceleration of the mass.
3. A set of 3 physical units: mass M, time T, length L (usually kilograms,
seconds, metres).
⁸ Despite its simplicity, the (quantum) harmonic oscillator is the cornerstone for modern quantum
field theory and particle physics. In this picture, a quantum field is an infinite continuum of simple
harmonic oscillators, whose motion is captured by Fourier theory, Lie algebras and Special Relativity.
⁹ After the famous pirate, Captain Hooke.
Now, from these 4 parameters and 3 physical units, I claim that we can form one
dimensionless constant. To do this, one needs to know the dimensions of the parameters
involved. Clearly initial displacement has dimensions of length and initial
velocity has dimensions of length/time: \([x_0] = L\), \([v_0] = L - T\). To work out the
dimensions of the spring constant κ, we inspect the equation of motion.
Since acceleration has dimensions of length over time-squared, we have
\([\frac{d^2 x}{dt^2}] = L - 2T\). Therefore, we have
\[
\begin{aligned}
[m \tfrac{d^2 x}{dt^2}] &= [-\kappa x] \implies \\
[m] + [\tfrac{d^2 x}{dt^2}] &= [\kappa] + [x] \\
M + L - 2T &= [\kappa] + L \implies \\
[\kappa] &= M - 2T.
\end{aligned}
\tag{21}
\]
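The rearrangement leading to (21) is just one subtraction, which the following Python sketch spells out. The helper combine and the variable names are hypothetical, chosen only to mirror the symbols in the derivation.

```python
def combine(a, b, sign=+1):
    """Return [a] + sign * [b] in additive notation (sign=-1 subtracts)."""
    out = dict(a)
    for unit, power in b.items():
        out[unit] = out.get(unit, 0) + sign * power
    # Drop units whose exponents have cancelled to zero.
    return {u: p for u, p in out.items() if p != 0}

mass = {"M": 1}                   # [m] = M
acceleration = {"L": 1, "T": -2}  # [d^2x/dt^2] = L - 2T
displacement = {"L": 1}           # [x] = L

# Left-hand side of the equation of motion: [m] + [d^2x/dt^2].
lhs = combine(mass, acceleration)

# Both sides must match, so [kappa] = lhs - [x].
kappa = combine(lhs, displacement, sign=-1)

print(kappa)  # {'M': 1, 'T': -2}, i.e. M - 2T
```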
Note that the mathematical symbol ‘⟹’ means ‘implies’. Now that we have the
dimensions of all parameters in this system, we can form a dimensionless product.
In particular, we need one inverse mass factor and two factors of time to cancel the
dimensions in \([\kappa] = M - 2T\). We can get an inverse unit of mass from \([\frac{1}{m}] = -M\)
and two factors of time by combining \([x_0] = L\) and \([v_0] = L - T\). In particular,
\([(\frac{x_0}{v_0})^2] = 2[x_0] - 2[v_0] = 2L - 2(L - T) = 2T\). Hence, we get the dimensionless
constant:
\[
\begin{aligned}
G &:= \frac{\kappa}{m} \left(\frac{x_0}{v_0}\right)^2 \implies \\
[G] &= \left[\frac{\kappa}{m} \left(\frac{x_0}{v_0}\right)^2\right] \\
&= [\kappa] - [m] + 2([x_0] - [v_0]) \\
&= M - 2T - M + 2T = 0.
\end{aligned}
\tag{22}
\]
Since the constant G has no formal name, we will claim it and call it the ‘Georgian
Constant’ after St. George – the patron saint of dimensional analysis.
The last example illustrated a few important concepts. First of all, we showed
that mathematically all the information about a physical system is given by a set
of parameters, a set of physical units or dimensions and at least one governing
equation. Second, we showed how we can work out the units of an otherwise unknown
constant by using dimensional analysis – this is how we found the dimensions of
the spring constant κ.
Finally, we showed that in this particular case, having 4 parameters and 3 physical
units, we were able to form one dimensionless constant: G. Although we could
have taken any multiple or power of this constant and still arrived at a dimensionless
quantity, there is essentially only one independent product that we can form out of
the parameters in the simple harmonic oscillator. This is because G, \(\frac{1}{G}\), \(G^2\) or 2G,
for example, all contain the same ‘information’.
The last observation is one example of the ‘fundamental theorem of dimensional
analysis’, also known as the ‘π theorem’.
Theorem 1 (Buckingham Pi Theorem) Given a system specified by n indepen-
dent parameters and k different physical units, there are exactly n−k independent
dimensionless constants which can be formed by taking products of the parameters.
Thus in the last example, we saw that the simple harmonic oscillator was described by 4 parameters and 3 physical units – hence as claimed, there was indeed only 4 − 3 = 1 independent dimensionless constant that we could have formed. Hence, any other dimensionless constant in this system must be some multiple or some power of G.
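The bookkeeping in the oscillator example can also be checked mechanically. The following is a minimal sketch, assuming we store each dimension as a triple of (M, L, T) exponents; the names `DIMS` and `dimension_of_product` are hypothetical helpers introduced only for this illustration.

```python
# Dimensions are stored as exponent triples (M, L, T), matching the additive
# bracket notation used in the text, e.g. [v0] = L - T  ->  (0, 1, -1).
DIMS = {
    "m": (1, 0, 0),       # mass
    "kappa": (1, 0, -2),  # spring constant, from equation (21)
    "x0": (0, 1, 0),      # initial displacement
    "v0": (0, 1, -1),     # initial velocity
}


def dimension_of_product(powers):
    """Return the (M, L, T) exponents of the product of param ** power."""
    total = [0, 0, 0]
    for name, power in powers.items():
        for i, exponent in enumerate(DIMS[name]):
            total[i] += power * exponent
    return tuple(total)


# G = (kappa / m) * (x0 / v0)^2, as in equation (22)
georgian = dimension_of_product({"kappa": 1, "m": -1, "x0": 2, "v0": -2})
print(georgian)  # (0, 0, 0), i.e. dimensionless
```

Any other choice of powers that returns (0, 0, 0) should turn out to be a power of G, in line with the Pi theorem.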
Before doing the exercises, here is one more example from fluid mechanics.
Example 3 In fluid mechanics, the notion of the ‘thickness’ of a fluid is formalized
by defining its ‘viscosity’. In particular, the dynamic or shear viscosity of a fluid
measures its ability to resist ‘shearing’– an effect where successive layers of the
fluid move in the same direction but with different speeds. For example, relative
to water, glass 10 and honey have a very high shear viscosity, whereas superfluid
Helium has zero viscosity 11.
Given a fluid trapped between two parallel plates–the bottom plate being station-
ary and the top plate moving with velocity v parallel to the stationary plate, the
magnitude of the force required to keep the top plate moving at constant velocity
is given by:
\[
F = \eta A \frac{v}{y} \tag{23}
\]
Here v is the speed (magnitude of the velocity) of the top plate, A is its surface area and y is the separation distance between the plates. The parameter η is defined
to be the shear viscosity of the fluid. We can calculate its units using dimensional
analysis. First, from Newton’s 2nd law we know that the force has the dimensions:
[F] = M + L − 2T. Furthermore, the area A has dimensions of length-squared [A] = 2L, the speed v has dimensions [v] = L − T and the separation y has dimensions [y] = L. Hence
\[
\begin{aligned}
[F] = [\eta] + [A] + [v] - [y] \implies [\eta] &= [F] - [A] - [v] + [y] \\
&= (M + L - 2T) - 2L - (L - T) + L \\
&= M - L - T
\end{aligned}
\tag{24}
\]
whence η has units of $\frac{M}{LT}$. Now, the kinematic viscosity ν 12 of the fluid is defined as the ratio of the dynamic viscosity η and the density ρ (mass per volume) of the fluid:
\[
\nu = \frac{\eta}{\rho}. \tag{25}
\]
Since density has units of mass per length-cubed, we have [ρ] = M − 3L and thus
\[
[\nu] = \left[\frac{\eta}{\rho}\right] = [\eta] - [\rho] = M - L - T - (M - 3L) = 2L - T. \tag{26}
\]
10
Old church windows that are thicker at the bottom are not evidence that glass flows like a viscous liquid; the effect is due to the glass-making techniques of past centuries.
11
The transition to the ‘superfluid’ phase occurs below about 2.17 Kelvin for helium-4 – i.e. close to absolute zero temperature.
In some scenarios, we can think of this fluid as characterised by four parameters: density ρ, shear viscosity η, the fluid speed v (assuming the fluid only travels in the horizontal direction) and a ‘characteristic length scale’ l for the fluid system (e.g. for a fluid flowing in a pipe, this length scale would be the diameter of the pipe). The kinematic viscosity ν = η/ρ is fixed by η and ρ, so it does not count as an additional independent parameter. Since we have three different physical units – mass, length and time – the Pi theorem tells us we can form one independent dimensionless constant. This special, widely-used constant is called the ‘Reynolds number’ of the fluid and is defined by:
\[
R = \frac{\rho v l}{\eta} = \frac{l v}{\nu}. \tag{27}
\]
In essence, the Reynolds number expresses the ratio of inertial forces to the viscous
forces. In this manner, it describes the relative importance of these two types of forces
in different scenarios. Since it is dimensionless, the Reynolds number is scale
invariant – meaning it characterises the way a fluid will flow on all length scales
(within the valid regime of your theory).
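As a rough numerical illustration of the two equivalent forms of the Reynolds number, here is a minimal sketch. The fluid values are assumptions chosen for illustration only (approximately water near room temperature flowing at 1 m/s through a 5 cm pipe), and the function names are hypothetical.

```python
def reynolds_dynamic(rho, v, l, eta):
    """Reynolds number from density, speed, length scale and shear viscosity."""
    return rho * v * l / eta


def reynolds_kinematic(v, l, nu):
    """Reynolds number from speed, length scale and kinematic viscosity."""
    return l * v / nu


# Assumed illustrative values in SI units (roughly water near 20 C).
rho = 998.0    # density, kg / m^3
eta = 1.0e-3   # shear viscosity, kg / (m s)
v = 1.0        # flow speed, m / s
l = 0.05       # pipe diameter, m

nu = eta / rho  # kinematic viscosity, m^2 / s

R1 = reynolds_dynamic(rho, v, l, eta)
R2 = reynolds_kinematic(v, l, nu)
print(R1, R2)  # both are close to 5 x 10^4
```

Because the units cancel in both forms, the same number would come out in any other consistent system of units.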
Exercise 4 We defined the Reynolds number R in two ways – one in terms of its
dynamic viscosity η and the other in terms of its kinematic viscosity ν. Show that
the Reynolds number is dimensionless using both of its definitions.
12
This is the Greek letter ‘nu’ – not the Roman letter ‘v’.
Example 4 (Mathematical Challenge: Proving the π Theorem) Here is a walk-
through of a proof of the Pi Theorem, using the ‘rank-nullity’ theorem from linear
algebra. For those of you who haven’t encountered matrices before, you can still
make sense of the following in terms of systems of linear equations – but that will
be trickier ... so either save it for later, or talk to your tutor.
Formally, the rank-nullity theorem states that given an m × n matrix (m rows, n columns) A, which maps n-dimensional vectors to m-dimensional vectors, the rank and nullity of the matrix A satisfy:
rank(A) + nullity(A) = n (28)
where the rank of A is defined as the number of linearly independent rows of A and the nullity of A is defined as the dimension of the kernel of A – i.e. the number of linearly independent n-dimensional vectors which get mapped to 0 by A. Note that in our application we will have m ≤ n (otherwise the system would be over-determined).
Now, in the context of dimensional analysis and the π Theorem, we can think of a mathematical or physical system with n parameters and k different types of fundamental units (dimensions) as a system of k linear equations in n unknowns, as follows. Say for example, we have three parameters x, y, z and two fundamental physical units U1, U2. Then we can represent the dimensions of our parameters as a matrix by letting each column correspond to a different parameter and letting each row correspond to a different fundamental unit. So in this example, we let the first column correspond to the parameter x, the second column to y and the third column to z. Then the first row corresponds to the unit U1 and the second row to the unit U2. The entry in the first row and first column is then the number of dimensions that x has in the unit U1. So if, for example, x has the units $U_1^a U_2^b$ then it has dimensions: $[x] = [U_1^a] + [U_2^b] = aU_1 + bU_2$. Similarly, let y have units $U_1^c U_2^d$ and z have units $U_1^e U_2^f$: hence $[y] = cU_1 + dU_2$ and $[z] = eU_1 + fU_2$. We can form the ‘dimensional matrix’ D for this physical system, which is represented as:
\[
D = \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \tag{29}
\]
To see that this makes sense, we can simply act 13 the transpose of the dimensional matrix $D^T$ on the vector $U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}$ containing the physical units to recover all three of our dimensional equations $[x] = aU_1 + bU_2$, $[y] = cU_1 + dU_2$ etc. To find dimensionless constants, we have to solve the ‘nullspace equation’:
13
By matrix multiplication.
\[
\begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}
\]
for all possible vectors $(\alpha, \beta, \gamma)^T$. In particular, dimensionless constants will be a product of powers of the different physical parameters: $x^\alpha y^\beta z^\gamma$, where the exponents α, β, γ are components of a vector $(\alpha, \beta, \gamma)^T$ which solves the nullspace equation.
The number of linearly independent vectors $(\alpha, \beta, \gamma)^T$ which solve the nullspace matrix equation coincides with the ‘nullity’ of the dimensional matrix D – it is precisely equal to the number of dimensionless constants we can form. In particular,
since we have n = 3 independent physical parameters x, y, z corresponding to
three columns of our dimensional matrix D and k = 2 fundamental units U1, U2
corresponding to the two (linearly-independent 14) rows of D, the rank-nullity the-
orem tells us that the nullity of D is given by
nullity(D) = n − k = 3 − 2 = 1. (30)
Since the nullity of D is precisely equal to the number of dimensionless constants
we can form for this physical system, this shows that the π Theorem for dimen-
sional analysis, is just a special instance of the rank-nullity theorem for linear al-
gebra.
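To make the nullspace argument concrete, here is a minimal sketch that computes a basis of the null space of a dimensional matrix by Gaussian elimination with exact fractions. As an assumed test case it uses the simple harmonic oscillator from earlier in this tutorial (columns m, κ, x0, v0; rows M, L, T) rather than the abstract matrix (29); the function name `nullspace` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction


def nullspace(rows):
    """Return a basis of the null space of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivot column for each reduced row, in order
    r = 0        # index of the next row to receive a pivot
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free column
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_index, p in enumerate(pivots):
            vec[p] = -m[row_index][free]
        basis.append(vec)
    return basis


# Dimensional matrix of the oscillator: columns m, kappa, x0, v0.
D = [
    [1, 1, 0, 0],    # exponents of M
    [0, 0, 1, 1],    # exponents of L
    [0, -2, 0, -1],  # exponents of T
]

basis = nullspace(D)
print(len(basis))  # 1, i.e. n - k = 4 - 3
print(basis[0])    # exponents of (m, kappa, x0, v0) in the dimensionless product
```

The single basis vector is proportional to (1, −1, −2, 2), which corresponds to 1/G: the Georgian constant again, up to a power.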
Exercise 5 (Challenge: Finish proving the π Theorem) In the previous example,
we set-up the proof of the π theorem for the general case ... but really only proved
it for the case of 3 parameters and 2 fundamental units. By extending the argu-
ment to n parameters x1, ...., xn and k units U1, ..., Uk, prove the π theorem for the
general case of arbitrary n and k.
Hint: Sketching this proof simply amounts to keeping track of your indices and labels. As a suggestion, try denoting the units of x1 by $U_1^{a_{11}} \cdots U_k^{a_{1k}}$ and the units of x2 by $U_1^{a_{21}} \cdots U_k^{a_{2k}}$ etc.
If you have completed and understood these exercises, you are well on your way
to becoming an expert in dimensional analysis. Soon you’ll be better than your
lecturers (possibly).
14
These rows are necessarily linearly independent, since we assume our fundamental physical
units to be independent – by definition.
6 Tutorial 3 - Return of Dimensional Analysis: Gravity,
The Hierarchy Problem and extra-dimensional Braneworlds
6.1 Introduction
The following is an extended exercise which tests all the skills the tutorials have elucidated so far in dimensional analysis. It will also introduce you to some concepts which may be new and bizarre, whilst linking them back to everyday reality. The
overall goal will be to derive a dimensionless constant that characterises classical
gravity on all length scales (no knowledge of relativity is required)! By comparing
this constant to another dimensionless constant from electromagnetism, we will see
why gravity is so much weaker than the other three forces in nature – then investi-
gate a solution to this peculiarity using brane-world models of the universe.
6.2 Background
As far as we understand, all interactions in nature take place through four funda-
mental forces. At present, we have a rather ‘successful’ theoretical and experimen-
tal quantum description of three of these forces – that is, we have constructed quan-
tum field theories to describe the ‘quanta’ (particles) which mediate these forces.
Gravity, despite our everyday experience of it, remains somewhat mysterious and
theoretically elusive in several ways – in particular, because it is highly resistant to
all attempts to turn it into a quantum theory like the other forces. As a reminder,
the four forces dictating our universe are the
• Electromagnetic Force: Which governs electromagnetic radiation (such as
light) as well as interactions between charged particles. In the quantum de-
scription (Quantum Electrodynamics), this force is carried by massless par-
ticles known as ‘photons’.
• Weak Nuclear Force: In the quantum description, this force is mediated by
massive particles known as the Z and W± bosons. It is involved in quark
transformations as well as some interactions between charged particles.
• Strong Nuclear Force: In the quantum description (Quantum Chromodynamics), this force is mediated by ‘gluons’ and is responsible for the interactions between quarks, which are the particles making up hadrons such as the
proton and neutron. In this manner, it is responsible for processes such as
fusion, which is the source of energy for our sun.
• Gravitational Force: In the attempted quantum descriptions, this force is
mediated by a massless particle known as the ‘graviton’. It is responsible for
the interactions of all particles with mass, but also determines the trajectories
of massless particles (e.g. gravitational bending of light) since it warps the
spacetime continuum.
At higher energies, these four forces start to unify into one single force – for ex-
ample, the electromagnetic and weak nuclear forces unify to make the electroweak
force. Attempts to unify the electroweak and strong nuclear forces have been par-
tially successful and fall under ‘The Standard Model’ of particle physics. On the
other hand, attempts to unify gravity with the other forces have been largely un-
successful, with the only real promising candidate being String Theory.
One of the biggest mysteries about the gravitational force, is why it is so weak com-
pared to the other forces in nature. In some sense this is ‘unnatural’, hence suggests
that on some deeper level, gravity is fundamentally different from the other forces.
As the goal of this tute, we will use dimensional analysis to characterise the grav-
itational and electromagnetic forces with some special dimensionless constants –
then compare their strengths to prove this claim. Finally, we will end on some very
recent 15 advancements in theoretical physics which propose an explanation of why
gravity is the weakest of the four forces.
6.3 Extended Problem
Exercise 6 (Newton, Einstein and Braneworlds: The Gravitational Coupling Constant)
Of the many things that Isaac Newton is famous for, one of them is coming up with
multiple mathematical proofs of the fact that the planets orbit the sun in elliptical
paths – and that this elliptical motion is a direct consequence of an inverse square
law. Thus, by planar geometry and calculus he came up with the following gravitational force law to explain the astronomical observations of Johannes Kepler and Tycho Brahe:
\[
\mathbf{F} = -G_N \frac{m_1 m_2}{r^2} \hat{\mathbf{r}} \tag{31}
\]
where GN is Newton’s gravitational constant, m1 and m2 are the masses of two
objects separated by a distance r and ˆr is a ‘unit vector’ (vector with magnitude 1)
pointing from one object to the other. This tells us the gravitational force that one
massive object exerts on another massive object.
15
The last 5-10 years.
QI: Using Newton’s 2nd Law, F = ma, deduce the dimensions or units of GN. Note that you are working with mass, length and time (M, L, T) as your fundamental units, hence [m1] = [m2] = M. Furthermore, by definition the unit vector 16 $\hat{\mathbf{r}} = \frac{\mathbf{r}_2 - \mathbf{r}_1}{|\mathbf{r}_2 - \mathbf{r}_1|}$ is dimensionless: $[\hat{\mathbf{r}}] = 0$. Note that in general, the dimensions or units of a vector quantity are always the same as the units of the magnitude (and components) of that vector – hence $[\mathbf{r}] = [r]$ for example.
Now that we have the dimensions of GN , we are ready to consider Einstein’s theory
of gravitation. Einstein’s theory differs from Newton’s theory in many ways – fundamentally it explains gravity as a consequence of spacetime curving around any object with mass, with the ‘amount’ of curvature being greater for greater masses (e.g. the Sun). On an astrophysical level, it is important as it helps to explain the big bang, solar fusion and the existence of black holes – objects which are necessary for the stability of some galaxies such as the Milky Way. In terms of everyday
living, general relativity is essential for the operation of GPS satellites – without
the gravitational corrections to the timing (gravitational time-dilation) offered by
Einstein’s theory, the GPS system would not be accurate enough to work.
In Einstein’s theory, spacetime is modelled by the following objects 17
• An energy-momentum tensor T which contains information about ‘sources’ of curvature – matter and energy. Its components have dimensions of an energy-density: $[T_{ab}] = [\frac{\text{Energy}}{\text{Volume}}] = M - L - 2T$. Since the tensor itself is a second-rank covariant tensor, we have: $[T] = [T_{ab}\, dx^a \otimes dx^b] = [T_{ab}] + [dx^a \otimes dx^b] = M - L - 2T + 2L = M + L - 2T$.
Note that the dimensionality of energy can be deduced from the relation: Work = Force × Distance and hence [Energy] = [Work] = [Force] + [Distance] = M + L − 2T + L = M + 2L − 2T.
• A metric tensor g describing how gravity distorts measures of length and
time. This has units of length-squared: [g] = 2L.
• The Riemann Curvature tensor, Riem, describes how the curvature of spacetime varies in different regions. It also measures how gravity distorts parallel-transport. It is given roughly 18 as the anti-symmetrized second tensor ‘gradient’ of the metric: Riem ∼ ∇ ⊗ ∇ ⊗ g, where ∇ is a type of derivative operator and ⊗ is a type of multiplication for tensors.
16
Here r1 and r2 are the position vectors describing the location of the masses m1 and m2 with respect to some origin.
17
Note that most physicists do not understand differential geometry, hence when they speak of tensors they usually are talking about components of tensors. This won’t matter here, but for reference, if you ever want to compare: covariant tensors have two extra factors of length compared to their components and contravariant tensors have two factors less than their components – which basically means adding ±2L to the dimensions.
• The Ricci tensor, Ric, is given by taking the trace of the Riemann tensor:
Ric = Trace(Riem). It describes how gravity distorts volumes and is also
related to how different geometries evolve under the heat equation.
• The Ricci Scalar R – this quantity is a function which measures how gravity locally distorts volumes. Einstein’s theory can be derived by saying that nature minimizes this quantity – an approach due to a mathematician named David Hilbert 19. It is given by taking the trace of the Riemann tensor twice: R = Trace(Trace(Riem)) = Trace(Ric).
• The speed of light, c. This universal speed limit quantifies how fast massless particles can move and also how fast gravitational disturbances (gravitational waves) can propagate. It has dimensions of speed: [c] = L − T.
QII: Using the above information, derive the dimensions of Newton’s gravitational constant GN again, this time using Einstein’s law of gravity:
\[
\mathrm{Ric} - \frac{1}{2} R g = \frac{8\pi G_N}{c^4} T. \tag{32}
\]
You will need the following facts: the derivative operator ∇ reduces the length dimension of a tensor by one factor, whereas the tensor product ⊗ raises it by one factor (in this case). Hence [Riem] = 2[∇] + 2[⊗] + [g] = −2L + 2L + 2L = 2L. Furthermore, the trace of a (covariant) tensor reduces its length dimension by two factors, hence for example: [Trace(Riem)] = [Riem] − 2L.
Tip: To ease calculations, you may use so-called ‘natural units’ where the speed
of light c = 1. In these units length and time have the same dimensionality, hence
[c] = [Distance] − [Time] = 0 and T = L. You will then get the dimensions of
GN in natural units which you can compare to your value of GN using Newton’s
Law, after you set T = L.
Finally, we are in a position to understand a very special dimensionless constant –
the ‘gravitational coupling constant’, αG. Since it is dimensionless, this constant
characterises the strength of the gravitational force on all length scales (within the
regime of validity of Einstein’s theory). It can be defined in terms of any pair of
stable elementary particles – in practice, we use the electron.
18
Don’t ever show this to a differential geometer. If you want the real definition, see me.
19
In retrospect, David Hilbert deserves almost the same level of credit as Einstein for the theory
of general relativity.
In particular, we have:
\[
\alpha_G = \frac{G_N m_e^2}{\hbar c} \approx 1.7518 \times 10^{-45} \tag{33}
\]
where c is the speed of light, GN is Newton’s gravitational constant and me is the mass of an electron. The quantity $\hbar = \frac{h}{2\pi}$ is the reduced Planck constant which characterises the scale at which matter exhibits quantum behaviour such as wave-particle duality 20.
QIII: Show that the gravitational coupling constant αG is indeed dimensionless. Note that [me] = M. To work out the dimensions of $\hbar = \frac{h}{2\pi}$, you will need the Planck-Einstein relation which relates the energy of a photon (particle of light) to its frequency:
\[
E = hf. \tag{34}
\]
Then [h] = [E] − [f]. Since the frequency of light is the number of oscillations of the electromagnetic wave per unit time, we have [f] = −T. You can get the dimensions [E] of the energy E from the calculation shown above for the energy-momentum tensor.
Now, for the last part of this problem, we introduce one more fundamental physical unit: the unit of electric charge, Q 21. Similar to the gravitational coupling constant, there is a dimensionless constant which characterises the strength of the electromagnetic interaction (which is responsible for almost all of chemistry) – the ‘fine structure constant’ αEM. The value of this constant is (accurately) predicted and measured using the theory of Quantum Electrodynamics, which is a type of quantum field theory largely due to Richard Feynman and Freeman Dyson. It is given by
\[
\alpha_{EM} = \frac{1}{4\pi\epsilon_0} \frac{e^2}{\hbar c} \tag{35}
\]
where $\epsilon_0$ is the electric permittivity of the vacuum. It has units $[\epsilon_0]$ = [Farads/Meter] = [Seconds$^4$ Amps$^2$ Meters$^{-3}$ kg$^{-1}$] = 4T + (2Q − 2T) − 3L − M, since one Amp is one unit of charge per unit time. Hence $[\epsilon_0]$ = 2T + 2Q − 3L − M. The parameter e is the charge of an electron, with dimensions [e] = Q.
Using ‘natural units’ – a popular convention in particle physics – we set all of our previous parameters to equal 1. Thus, $4\pi G_N = c = \hbar = \epsilon_0 = 1$. In these units, the fine-structure constant is given by
\[
\alpha_{EM} = \frac{e^2}{4\pi} \approx 7.297 \times 10^{-3}. \tag{36}
\]
20
If ħ were really large – say ħ ≈ 1 for example – then we would observe wave-particle duality on a macroscopic scale and the universe would be a scary, crazy place. Bullets would diffract through doorways and Leanora’s fists could quantum tunnel through walls.
21
The SI unit for charge is the Coulomb.
QIV: Choosing natural units, $4\pi G_N = c = \hbar = \epsilon_0 = 1$, is the same as forcing these parameters to be dimensionless. Show that this is equivalent to setting all the fundamental units to be the same: T = L = M = Q. Hint: you should get four equations for the dimensions of these parameters.
Note that you can calculate the values of the fine-structure and gravitational cou-
pling constants yourself by Googling their values in SI units (or any other consis-
tent set of units you choose). Taking their ratio, we see that (in natural units):
\[
\frac{\alpha_{EM}}{\alpha_G} = \left(\frac{e}{m_e}\right)^2 \approx \frac{7.297 \times 10^{-3}}{1.752 \times 10^{-45}} \approx 4.16 \times 10^{42}. \tag{37}
\]
This says that the electromagnetic force is about 42 orders of magnitude 22 stronger than the gravitational force. In a similar fashion, the weak-nuclear force is about 32 orders of magnitude (a factor of $10^{32}$) stronger than gravity. The challenge to explain why gravity is so weak compared to the other forces is known as ‘the hierarchy problem’.
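The two coupling constants and their ratio can be reproduced from SI values, as suggested above. The following is a minimal sketch; the numerical constants are rounded textbook values supplied here as assumptions, so only the first three or so significant figures should be trusted.

```python
import math

# Rounded SI values (assumed for illustration, not taken from the tutorial).
G_N = 6.674e-11    # Newton's gravitational constant, m^3 kg^-1 s^-2
m_e = 9.109e-31    # electron mass, kg
hbar = 1.0546e-34  # reduced Planck constant, J s
c = 2.998e8        # speed of light, m / s
e = 1.602e-19      # electron charge, C
eps0 = 8.854e-12   # vacuum permittivity, F / m

# Equation (33): gravitational coupling constant for the electron.
alpha_G = G_N * m_e**2 / (hbar * c)

# Equation (35): fine structure constant.
alpha_EM = e**2 / (4 * math.pi * eps0 * hbar * c)

ratio = alpha_EM / alpha_G
print(alpha_G, alpha_EM, ratio)
```

Since both constants are dimensionless, repeating the calculation in any other consistent system of units should give the same numbers.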
One class of attempts to solve the hierarchy problem involves the visible universe
being confined to a 4-dimensional ‘brane’, which is basically a 4-dimensional slice
living in a larger spacetime. Such models are called ‘braneworld models’. In this
view, the electromagnetic, weak and strong nuclear forces take place on the 4-
dimensional brane – but gravitational interactions (mediated by particles known as
‘gravitons’) take place in 4-dimensions and in the ‘large extra dimensions’. This
then gives a natural explanation to the gravitational coupling constant being so
small. In some variations 23, the introduction of large extra-dimensions also solves
the ‘Dark Energy’ or ‘Cosmological Constant’ problem – where Dark Energy nat-
urally arises as the ‘surface tension’ of the 4-dimensional brane. Using braneworld
models, we can derive (!) Newton’s gravitational constant directly from the size
(‘hyper-volume’) of the extra dimensions in our universe.
A very special class of braneworld models, known as theories with ‘Supersymmetric Large Extra Dimensions’, envisions spacetime as 6-dimensional (4-dimensional brane + 2 large extra dimensions) with some extra symmetry added (super-symmetry) that enables bosons and fermions to transform into each other 24. In these models, the extra-dimensions take the form of some compact hypersurface. Newton’s gravitational constant GN is then theoretically explained using the formula 25:
\[
G_N = \frac{3\kappa^2}{16\pi S} \tag{38}
\]
where S is the surface-area of the extra dimensions and κ is Einstein’s constant, with dimensions [κ] = [GN].
22
Note, 42 is also the meaning of life.
23
Those investigated in the present author’s masters thesis.
QV:The above formula for GN is correct, even though it may look dimensionally
incorrect. What units would S need to have for dimensional consistency? In that
case, what quantity does the surface-area S actually represent? Hint: Recall the
‘unit vector’ in Newton’s law of gravity.
The last problem illustrates a common theme in engineering, physics and math-
ematics – normalization. Normalized quantities are typically dimensionless! As
such, they are very useful and friendly to work with.
24
Supersymmetry removes the problem of Tachyons in String Theory and also stabilizes the mass
of the Higgs boson.
25
First derived in this generality by the present author in 2013.
7 Tutorial 4: 50 Shades of Error, Shade I – Multivariable
calculus and The Total Differential
In this tutorial, we revise some elementary concepts from multivariable calculus
– partial differentiation and the ‘total differential’ or ‘exterior derivative’. If you
haven’t formally studied these topics then don’t worry – as long as you are comfort-
able differentiating functions of a single variable, the rest will follow easily.
After revising these mathematical tools, we will see how they are used in error
analysis. In particular, the total differential provides an elegant way to compute
the absolute error for any derived quantity, in terms of your experimental preci-
sion error. This is extremely useful for the applied sciences and engineering. For
those of you who are only interested in pure mathematics, then note that the tech-
niques used here are precisely the same techniques that are used when you study
linear approximations 26 and Taylor series expansions for functions of more than one variable.
Note that this tutorial is the first of a sequence of tutorials that will be dedicated to
error analysis, least-squares regression (e.g. line of best fit) and other techniques
that you will use frequently in statistics and the applied sciences to determine the
value of a derived quantity, along with an estimate of its corresponding error. As
such they will successively build on each other.
7.1 Russian Playpen: Functions of more than one variable
Given a function f of one variable, which maps real numbers 27 ℝ to real numbers ℝ, we can formally 28 express it as:
\[
f : \mathbb{R} \to \mathbb{R}, \quad x \mapsto f(x),
\]
which says that f sends the number x to another number f(x). For example, if we have the function $f(x) = x^2$ whose graph is a parabola, then we write:
\[
f : \mathbb{R} \to \mathbb{R}, \quad x \mapsto x^2.
\]
26
Linear or ‘tangent plane’ approximations are just a special case of a Taylor series expansion.
27
The ‘blackboard font’ R, denoted as ℝ, symbolizes the set of ‘real numbers’. This includes all integers, rational numbers, irrational and transcendental numbers (such as π) etc.
28
Technically, you restrict the set f is mapping from to its domain and the set it is mapping into, to
its range.
So for example, in this case we have $f(1) = 1^2 = 1$ and $f(7) = 7^2 = 49$ etc.
We now generalize this as follows. A function f of more than one variable maps several copies of the set of real numbers to one or more copies of the set of real numbers. For example, a real-valued function f of two variables, x and y, can be formally expressed as
\[
f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \quad (x, y) \mapsto f(x, y).
\]
Here the notation ℝ × ℝ means the set of all ordered pairs of real numbers (x, y). So for example, if we have the circular function given by $f(x, y) = x^2 + y^2$, then we have $f(1, -1) = 1^2 + (-1)^2 = 1 + 1 = 2$ and $f(\pi, \sqrt{3}) = \pi^2 + (\sqrt{3})^2 = \pi^2 + 3$ etc.
Note, there is nothing strange about functions of several variables. You see them
every day. For example, we can view the volume of a rectangular box with sides of
length x, y and z as a function of three variables:
V (x, y, z) = xyz. (39)
Or, as another example, the concentration C of a substance dissolved in water will
depend on the amount (‘mass’ or any other measure) m of the substance dissolved
and the amount (volume) of water (or any other liquid) v it is being dissolved into.
Thus we can consider the blood alcohol concentration C of a student at PROSH as
a function of two variables: C = C(m, v), where m is the amount of alcohol and
v is the amount of blood in that person.
7.2 Russian Daycare: Partial Differentiation
To some, partial differentiation may sound hard. However, it is actually extremely
simple – hence why it is taught to children at daycare in Russia. All you need to do,
is differentiate your function with respect to some chosen variable, while treating
all the other variables as constants.
The notation for partial derivatives is given by the ‘del’ symbol, ∂. So for example, if we are taking the usual total derivative with respect to x, we have the Leibniz 29 notation $\frac{d}{dx}$. If we are taking a partial derivative with respect to x, we use the notation $\frac{\partial}{\partial x}$ instead. The best way to illustrate is with a few examples.
29
Leibniz was the German version of Newton, or Newton was the English version of Leibniz. Leibniz developed calculus at the same time as Newton, as well as several other fields of mathematics – such as binary numbers.
Example 5 (Return of the Box) Our rectangular box has now followed us into
Tutorial 4. Having been stalked by this sentient box, we decide to partially differentiate its volume. Denoting the length of each of its sides by the variables x, y
and z respectively, its volume is given by the following function of three variables:
V (x, y, z) = xyz. Partially differentiating it with respect to x, we find:
\[
\frac{\partial}{\partial x} V(x, y, z) = \frac{\partial}{\partial x}(xyz) = \left(\frac{\partial x}{\partial x}\right) yz = yz. \tag{40}
\]
What we did here was to treat y and z as constants, while differentiating with
respect to x. Since the derivative of x with respect to x is just 1, we arrived at the
above result. We show similarly that:
\[
\frac{\partial}{\partial y} V(x, y, z) = xz, \qquad \frac{\partial}{\partial z} V(x, y, z) = xy. \tag{41}
\]
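The rule "vary one variable, hold the others fixed" can be checked numerically. The following is a minimal sketch using a central finite difference; the helper name `partial` and the sample point are assumptions made for illustration, and the result is only an approximation to the exact derivative.

```python
def partial(f, index, h=1e-6):
    """Approximate the partial derivative of f with respect to its
    argument number `index` (0-based) by a central difference."""
    def derivative(*args):
        up = list(args)
        down = list(args)
        up[index] += h    # nudge only the chosen variable upwards
        down[index] -= h  # and downwards; all others stay fixed
        return (f(*up) - f(*down)) / (2 * h)
    return derivative


def volume(x, y, z):
    """Volume of the rectangular box, V(x, y, z) = xyz."""
    return x * y * z


x, y, z = 2.0, 3.0, 5.0
dV_dx = partial(volume, 0)(x, y, z)  # should be close to y * z
dV_dy = partial(volume, 1)(x, y, z)  # should be close to x * z
dV_dz = partial(volume, 2)(x, y, z)  # should be close to x * y
print(dV_dx, dV_dy, dV_dz)
```

At the sample point the three values should come out close to 15, 10 and 6, matching equations (40) and (41).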
Now, if we differentiate twice with respect to x, or twice with respect to y, we get:
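\[
\begin{aligned}
\frac{\partial^2}{\partial x^2} V(x, y, z) &:= \frac{\partial}{\partial x}\frac{\partial}{\partial x} V(x, y, z) = \frac{\partial}{\partial x}(yz) = 0 \\
\frac{\partial^2}{\partial y^2}(xyz) &:= \frac{\partial}{\partial y}\frac{\partial}{\partial y}(xyz) = \frac{\partial}{\partial y}(xz) = 0,
\end{aligned}
\tag{42}
\]
where the notation $\frac{\partial^2}{\partial x^2}$ denotes the ‘second partial derivative’ with respect to x.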
Note that differentiating the volume with respect to x the second time, gives zero
since the first derivative of V (x, y, z) with respect to x no longer depends on x –
i.e. the product (yz) is a constant with respect to x, hence its partial derivative with
respect to x vanishes.
We can also take mixed derivatives. For example, differentiating V (x, y, z) first
with respect to x and then with respect to y, gives:
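\[
\frac{\partial}{\partial y}\frac{\partial}{\partial x} V(x, y, z) = \frac{\partial}{\partial y}(yz) = z. \tag{43}
\]
Now, if we take the derivatives in reverse order – y first, then x – we get
\[
\frac{\partial}{\partial x}\frac{\partial}{\partial y} V(x, y, z) = \frac{\partial}{\partial x}(xz) = z, \tag{44}
\]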
which is exactly the same as taking the derivatives in the original order. This illustrates an important and extremely consequential property of functions of more than one variable – in general, for ‘nice’ 30 functions (most functions you will ever deal with), the order in which you take two partial derivatives doesn’t matter. That is, for nice functions f, we have
\[
\frac{\partial}{\partial y}\frac{\partial}{\partial x} f(x, y, \ldots) = \frac{\partial}{\partial x}\frac{\partial}{\partial y} f(x, y, \ldots). \tag{45}
\]
This observation is formalized as ‘Clairaut’s Theorem’ (or ‘Schwarz’s Theorem’) 31.
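The symmetry of mixed partial derivatives can be illustrated numerically by nesting finite differences in both orders. This is a minimal sketch, not a proof: the helper `partial`, the test function $f(x, y) = x^2 y^3$ and the sample point are all assumptions chosen for illustration.

```python
def partial(f, index, h=1e-3):
    """Approximate the partial derivative of f with respect to its
    argument number `index` (0-based) by a central difference."""
    def derivative(*args):
        up = list(args)
        down = list(args)
        up[index] += h
        down[index] -= h
        return (f(*up) - f(*down)) / (2 * h)
    return derivative


def f(x, y):
    """A 'nice' test function; its exact mixed derivative is 6 * x * y**2."""
    return x**2 * y**3


x, y = 2.0, 3.0

# Differentiate with respect to x first, then y ...
f_xy = partial(partial(f, 0), 1)(x, y)
# ... and with respect to y first, then x.
f_yx = partial(partial(f, 1), 0)(x, y)

exact = 6 * x * y**2
print(f_xy, f_yx, exact)
```

Both nested approximations should agree with each other and lie close to the exact value 108, up to the small error of the finite-difference scheme.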
Exercise 7 (Sir Steven – The Suspicious Spheroid) A solid oblate spheroid (‘squashed
sphere’)32, by the name of Sir Steven, suspiciously follows our friend – the rect-
angular box, into Tutorial 4. Sir Steven was produced by rotating a filled-ellipse
about its minor (shorter) axis. At his present age, Sir Steven has a minor axis
length of 2b and a major axis length of 2a. Since being knighted, Steven has taken
to a gluttonous lifestyle (hence a ≫ b). During Lent, Sir Steven decides to read Allen Mandelbaum’s translation of Dante Alighieri’s Divine Comedy – and in the midst of an epiphany, he decides to calculate his own volume, which is given as a function V(a, b) of two variables a and b (the semi-major and semi-minor axes lengths):
\[
V_{\text{Steven}} = \frac{4\pi}{3} a^2 b. \tag{46}
\]
30
This means functions with continuous second partial derivatives. More generally, the ability to commute the order of partial derivatives holds at any given point provided that the function has continuous second partial derivatives in some open neighbourhood about that point.
31
After the French and German mathematicians, Alexis Clairaut and Hermann Schwarz, respec-
tively.
32
This problem is dedicated to Nicholas Jones, University of Bristol and his love of spheroids.
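Q: Compute the following partial derivatives of V(a, b):
\[
\begin{aligned}
\frac{\partial}{\partial a} V(a, b) &= \\
\frac{\partial}{\partial b} V(a, b) &= \\
\frac{\partial^2}{\partial a^2} V(a, b) &= \\
\frac{\partial^2}{\partial b^2} V(a, b) &= \\
\frac{\partial}{\partial b}\frac{\partial}{\partial a} V(a, b) &= \\
\frac{\partial}{\partial a}\frac{\partial}{\partial b} V(a, b) &= \\
\frac{\partial^3}{\partial a^3} V(a, b) &=
\end{aligned}
\tag{47}
\]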
Challenge Q (Russian Grade 1): With the help of his intelligent friend, Pappus [33]
the Prolate Spheroid, Sir Steven manages to compute his surface area as a function
S(a, b) of two variables a and b:
$$ S_{\text{Steven}} = 2\pi a^2 \left\{ 1 + \frac{1 - e^2}{e} \tanh^{-1}(e) \right\} \tag{48} $$
where the eccentricity e of the generating ellipse is defined via $e^2 = 1 - \frac{b^2}{a^2}$. Using
this surface area formula, compute the following partial derivatives:
$$ \frac{\partial}{\partial a} S(a, b) = \qquad \frac{\partial}{\partial b} S(a, b) = \tag{49} $$
Hint: You will need to use the product (Leibniz) rule for differentiation along
with the chain rule and the following identity for the derivative of arc-hyperbolic tan [34]
(hyperbolic tan inverse):
$$ \frac{d}{dx} \tanh^{-1}(x) = \frac{1}{1 - x^2}. \tag{50} $$
[33] Pappus claimed that Hippasus, a student of the Ancient Greek Pythagorean school of geometry,
was drowned for proving (or sharing) the ‘secret’ irrationality of $\sqrt{2}$.
[34] Sometimes the notation artanh(x) is used instead of $\tanh^{-1}(x)$ to denote the inverse hyperbolic
tangent function.
This derivative is well-defined for all real values of x such that |x| < 1. Thus,
you should replace x with e, then use the chain rule to get the partial derivatives of
$\tanh^{-1}(e)$ with respect to a or b, since e is a function of a and b.
Extra-Challenging Q (Russian Grade 1.1): If you think you have what it takes
to pass Grade 1 in Soviet Russia, compute the following partial derivatives then
check them using Wolfram Alpha or Mathematica / Computer Algebra Software of
Choice:
$$
\begin{aligned}
\frac{\partial^2}{\partial a^2} S(a, b) &= \\
\frac{\partial}{\partial b}\frac{\partial}{\partial a} S(a, b) &= \\
\frac{\partial^{2014}}{\partial a^{2014}} S(a, b) &=
\end{aligned} \tag{51}
$$
Hint: If you can derive an expression for $\frac{\partial^n}{\partial a^n} S(a, b)$ where ‘n’ is an arbitrary
positive integer (n = 1, 2, 3, ...) the last equation should be easy. [35]
7.3 Russian Kindergarten: The Exterior Derivative (Total Differential)
The ‘total differential’ or ‘exterior derivative’ of a function f, is denoted by df
– the resulting object is known as an ‘exact differential 1-form’ or ‘co-vector’. We
will see why it has the latter name shortly. To illustrate how to compute df, we
give a few examples then state the general case.
Given a function f = f(x) of a single variable x, its total differential is given
by:
$$ df = \frac{df}{dx} dx. \tag{52} $$
The quantity $\frac{df}{dx}$ is a function (the derivative of f with respect to x), however the
quantity dx can be thought of in several ways. Formally, dx is a ‘differential 1-form’
or basis ‘co-vector’ analogous to the standard basis vectors you may have
seen [36]: $e_1$, $\hat{x}$ or $e_x$. Informally, it can be thought of as an infinitesimal quantity or
length in the x-coordinate. You will also recall that when you integrate a function
f = f(x) with respect to the variable x, you write it as:
$$ \int f(x)\,dx. \tag{53} $$
[35] Disclaimer: It’s probably not easy, relatively speaking.
[36] These are some of the more common notations.
If we replace f with its derivative $\frac{df(x)}{dx}$, then we have
$$ \int \frac{df(x)}{dx}\,dx = f(x) + c \tag{54} $$
where c is some constant of integration; this is just a consequence of the fundamental
theorem of calculus. Note, however, that we said $df = \frac{df(x)}{dx} dx$, so we
can actually view this statement as:
$$ \int \frac{df(x)}{dx}\,dx = \int df = f + c. \tag{55} $$
In this manner, we can think of $\int$ as a formal inverse [37] of the ‘exterior derivative’
or ‘total differential’ operator d.
For a function f = f(x, y) of two variables x and y, computing its total differential
requires partial derivatives. In particular, we have
$$ df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy. \tag{56} $$
The object df is still a differential 1-form, but now it has two components: $\frac{\partial f}{\partial x}$ is
the component in the dx direction and $\frac{\partial f}{\partial y}$ is the component in the dy direction.
Alternatively we say $\frac{\partial f}{\partial x}$ is the coefficient of dx and $\frac{\partial f}{\partial y}$ is the coefficient of dy.
Hence we see that the total differential df of the function f behaves similarly to a
2-dimensional vector (when f is a function of two variables), which motivates the
name ‘co-vector’ to describe df.
We generalise this now, in the most natural way. For a function f of n variables
$x_1, x_2, \dots, x_n$, its total differential is given by:
$$ df = \frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2 + \dots + \frac{\partial f}{\partial x_n} dx_n. \tag{57} $$
This says that we partially differentiate f with respect to each of its variables,
then multiply that derivative by the basis 1-form corresponding to the coordinate we
are differentiating with respect to. Adding all of these together gives the total
differential, shown in equation (57). This may seem a little abstract, so it’s
best illustrated with a few examples – which we will return to next week when we
proceed with error analysis!
Note that the exterior derivative operator d obeys the following general properties
when acting on functions:
1. Linearity: d(c1 f + c2 g) = c1 df + c2 dg for any two constants c1, c2 and any
two (differentiable) functions f, g.
2. Product (Leibniz) Rule: d(fg) = g df + f dg, for any two (differentiable)
functions f, g.
[37] This is a very simple case of the so-called ‘generalized Stokes’ Theorem’ from differential
geometry.
Example 6 (Rocky the Rectangular Box) Unable to stay down, Rocky the rectangular
box has returned to help with exterior derivatives. Rocky’s volume V is
given as a function of three variables: V = V(x, y, z), where x, y, z are the lengths
of its sides. Since V(x, y, z) = xyz, the total differential of the volume is given
by:
$$ dV(x, y, z) := \frac{\partial V}{\partial x} dx + \frac{\partial V}{\partial y} dy + \frac{\partial V}{\partial z} dz = yz\,dx + xz\,dy + xy\,dz. \tag{58} $$
Observation: Notice that the coefficient of dx is equal to yz, which is the surface area
of the face of the box in the plane perpendicular to the x-direction. Similarly, the
coefficient xz of dy is the area of the face of the box in the plane perpendicular
to the y-direction, etc. Depending on the symmetry of an object, its surface area and
volume are usually related in some manner by the operations of differentiation and
integration.
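To see what the total differential (58) means in practice, one can compare it with the actual change in volume when the sides of the box change slightly. The sketch below does this for a hypothetical box; the side lengths and the small displacements are made-up illustration values, not data from the text.

```python
def volume(x, y, z):
    """Volume of a rectangular box with side lengths x, y, z."""
    return x * y * z

def dV(x, y, z, dx, dy, dz):
    """Total differential of V = xyz, equation (58), evaluated on the
    displacements dx, dy, dz."""
    return y * z * dx + x * z * dy + x * y * dz

# Hypothetical box and small changes in its side lengths.
x, y, z = 2.0, 3.0, 4.0
dx, dy, dz = 0.01, -0.02, 0.005

linear_estimate = dV(x, y, z, dx, dy, dz)
actual_change = volume(x + dx, y + dy, z + dz) - volume(x, y, z)
print(linear_estimate, actual_change)
```

The two numbers agree to first order in the displacements; the small remaining gap comes from the products of displacements (dx dy, dx dz, ...) that the total differential drops.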
For example, Snorlax the Sleepy Sphere has a volume V = V(r) which is a
function of its radius. In particular, the exact differential of its volume is given
by:
$$ dV(r) = d\left(\frac{4\pi}{3} r^3\right) = \frac{4\pi}{3} \left(\frac{d}{dr} r^3\right) dr = 4\pi r^2\,dr. \tag{59} $$
The coefficient of dr is the surface area of the sphere, perpendicular to the dr
direction (recall that the surface of a sphere is perpendicular to its radius). In
particular, the quantity
$$ \frac{dV}{dr} = 4\pi r^2 \tag{60} $$
is the surface area.
7.3.1 Exercises
The following exercises are split into some purely mathematical exercises – geom-
etry, along with some applied exercises (thermodynamics) for physicists, engineers
and chemists. Bonus neural connections for those who complete both sets!
Exercise 8 (Geometry of Solids)
1. Given a circular cylinder of radius r and height h, we can view its volume V
and surface area S as functions of two variables:
$$ V(r, h) = \pi r^2 h, \qquad S(r, h) = 2\pi(r^2 + rh). \tag{61} $$
Compute the exterior derivatives dV and dS.
2. An elliptical cylinder is a cylinder with elliptical cross-sections – you can
think of it as ellipses stacked on top of each other ... [38] Given an elliptical
cylinder with height h, cross-sectional ellipses with semi-major axis length
a and semi-minor axis length b, its volume V and surface area S can be
viewed as functions of three variables
$$ V(a, b, h) = \pi a b h, \qquad S(a, b, h) = 2\pi a b + p h, \tag{62} $$
where p is the perimeter of the elliptical cross-sections (so that ph is the area of the
curved side). To express p exactly, one requires an infinite series:
$$ p = 2\pi a \left( 1 - \sum_{n=1}^{\infty} \frac{(2n)!^2}{(2^n n!)^4} \frac{e^{2n}}{2n - 1} \right) \tag{63} $$
where $e = \frac{\sqrt{a^2 - b^2}}{a}$ is the eccentricity of the ellipse. Using the Ramanujan [39]
approximation, $p \approx \pi\left[3(a + b) - \sqrt{(3a + b)(a + 3b)}\right]$, compute the exterior
derivatives dV and dS.
3. For those of you who have studied infinite series, compute dS using the exact
expression for the perimeter of an ellipse stated above.
[38] Puns – bringing English lit and mathematics together since 1600.
[39] A famous Indian child prodigy and mathematical genius who made great rediscoveries and
contributions to number theory, estimations and analysis in isolation.
Given these examples, in addition to the previous exercises, complete the following
problems.
Exercise 9 (Exact Differentials: Thermodynamics/Thermochemistry) Thermodynamics
is a broad theory, originally explaining the phenomenon that we know as ‘heat’.
More generally, it governs a vast range of macroscopic phenomena in nature –
from reaction rates in thermochemical processes to the surface area of black holes.
The most famous abstraction of thermodynamics, due to Stephen Hawking, Bill Unruh
and Jacob Bekenstein, is that the surface area of a black hole is proportional to
its entropy and its temperature is inversely proportional to its mass [40]. One of the
fundamental concepts in thermodynamics is the minimization of different types of
so-called ‘state functions’ or ‘thermodynamic potentials’, representing different
types of energies.
• Internal Energy: $U := U(S, V, N_i) = \int dU$, where
$$ dU = T\,dS - p\,dV + \sum_i \mu_i\,dN_i. \tag{64} $$
• Helmholtz Free Energy: F(T, V, Ni) = U − TS.
• Enthalpy: H(S, p, Ni) = U + pV.
• Gibbs Free Energy: G(T, p, Ni) = U + pV − TS.
Here we stated the natural variables for each function U, F, H and G in the brackets
(..). These variables are entropy S, temperature T, volume V, pressure p and
number (amount) $N_i$ of the i-th reactant species (i.e. substance, chemical etc). The
chemical potentials $\mu_i$ are all fixed constants. Note, for those of you who haven’t seen
the sigma [41] notation for summation, $\sum_i$ simply means the sum over all species
labelled by the index i.
By keeping track of which variables each function is strictly dependent on and
noting the expression for dU, prove that we get the following exact differentials:
$$
\begin{aligned}
dH(S, p, N_i) &= T\,dS + V\,dp + \sum_i \mu_i\,dN_i \\
dF(T, V, N_i) &= -S\,dT - p\,dV + \sum_i \mu_i\,dN_i \\
dG(T, p, N_i) &= -S\,dT + V\,dp + \sum_i \mu_i\,dN_i.
\end{aligned} \tag{65}
$$
Exercise 10 (Mathematical Proof: Cyclic Reciprocity Rule and Thermodynamics)
The goal of this exercise is to understand the following proof, memorize the main
steps (tricks) and then reproduce it from memory [42].
[40] Physicists that the present author has had the privilege of talking to in person :P.
[41] Σ is the symbol for the Greek capital letter, ‘sigma’.
[42] This problem is dedicated to Aston Williams, Engineer of Chemicals.
Say we are looking at the level sets of a function of three variables – for instance,
one of the thermodynamical potentials from the last exercise. In particular, suppose
we have a function f = f(x, y, z) of the three variables x, y, z. If we have the
additional constraint that:
$$ f(x, y, z) = 0 \tag{66} $$
(e.g. zero Helmholtz free energy), then the implicit function theorem from multivariable
calculus tells us that we can write any one of the variables x, y, z in terms
of the two other variables. WLOG [43] let’s take the variable z to be a function of the
two variables x and y: z = z(x, y).
The exterior derivative (total differential) of z is then given by
$$ dz = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x dy, \tag{67} $$
where as usual, the partial derivative $\frac{\partial z}{\partial x}$ is taken while y is held constant and the
partial derivative $\frac{\partial z}{\partial y}$ is taken while x is held constant. This is made explicit by
the notation $\left(\frac{\partial z}{\partial x}\right)_y$, where the brackets and subscript denote the variables we are
keeping constant while differentiating [44].
Taking dz = 0 (holding z constant), we can use the implicit function theorem again
to view y as a function of x (when dz = 0): y = y(x), hence we have
$$ dy = \left(\frac{\partial y}{\partial x}\right)_z dx. \tag{68} $$
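Substituting this relation into the equation dz = 0, we see that:
$$ 0 = dz = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z dx. \tag{69} $$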
Since this equality is actually a co-vector (differential 1-form) equality, we use the
fact that a co-vector is identically zero if and only if its components are zero – i.e.
the coefficients of dx in this case. Hence
$$ 0 = \left(\frac{\partial z}{\partial x}\right)_y + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z \implies \tag{70} $$
[43] This is a common mathematical acronym for ‘Without Loss of Generality’.
[44] The reason for this pedantry now is that usually we differentiate with respect to variables which
are independent of each other. However, in the following step of the proof, y may also be related to z
except in the special circumstance that dz = 0 – thus we must explicitly denote that z is being held
fixed, hence this notation.
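$$
\left(\frac{\partial z}{\partial x}\right)_y = -\left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z
\implies
\left(\frac{\partial z}{\partial x}\right)_y \frac{1}{\left(\frac{\partial z}{\partial y}\right)_x} \frac{1}{\left(\frac{\partial y}{\partial x}\right)_z} = -1
\implies
\left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial y}\right)_z = -1. \tag{71}
$$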
This last relation (71) is known as ‘Euler’s cyclic rule’ [45] or the ‘triple product
relation’. It is a quintessential identity used in thermodynamics since it allows
one to express one set of physical quantities in terms of other physical
quantities through the functional relations established by (71). One easy way to
remember it is to look at the variables in the numerator: y, z, x, denominator: z, x, y
and the subscripts on the brackets: x, y, z – they are all some cyclic permutation in
the order: x → y → z → x.
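The cyclic rule can be checked numerically on a concrete constraint. The sketch below uses the ideal gas law pV = RT for one mole of gas, which is an assumed illustration and not an example from the text: it plays the role of f(x, y, z) = 0 with (x, y, z) = (T, p, V). Each partial derivative is estimated by a central finite difference while the appropriate variable is held fixed. The state point and step sizes are arbitrary choices.

```python
R = 8.314  # gas constant in J/(mol K); one mole of ideal gas is assumed

def p_of(V, T):
    return R * T / V

def V_of(T, p):
    return R * T / p

def T_of(p, V):
    return p * V / R

def diff(g, x, h):
    """Central finite-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

# Hypothetical state point: T in kelvin, V in cubic metres.
T0, V0 = 300.0, 0.025
p0 = p_of(V0, T0)

dp_dV_T = diff(lambda V: p_of(V, T0), V0, 1e-7)  # (dp/dV) at constant T
dV_dT_p = diff(lambda T: V_of(T, p0), T0, 1e-3)  # (dV/dT) at constant p
dT_dp_V = diff(lambda p: T_of(p, V0), p0, 1e-1)  # (dT/dp) at constant V

product = dp_dV_T * dV_dT_p * dT_dp_V
print(product)
```

The product comes out at −1 up to finite-difference error, rather than the +1 one might naively expect from ‘cancelling’ the differentials.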
[45] After the prolific Swiss mathematician, Leonhard Euler – a name that appears everywhere in
mathematics.
8 Tutorial 5: Absolute Error and Game of Thrones
In this tutorial we will investigate the task of computing the ‘absolute error’ in a
given quantity, as a function of the precision in your measuring devices and mea-
suring ability. The problems and examples will have a Game of Thrones theme, to
celebrate (*spoiler alert*) the death of King Joffrey.
8.1 Absolute Error
Most quantities that we measure in science and engineering are ‘derived quanti-
ties’. This means that we measure them indirectly – in particular, we measure
some set of basic properties of a system or environment, then use some mathe-
matical model or formula to relate these properties to the quantity we are trying to
measure.
For example, if we want to measure the surface area of a basketball, one would
probably measure the circumference [46] with a tape measure or string. Then, using
the relation:
$$ C = 2\pi R = \pi D \tag{72} $$
where C is the circumference, R is the radius and D = 2R is the diameter, one
can then compute the radius of the ball. Once the radius is known, the surface area
S can be calculated:
$$ S = 4\pi R^2. \tag{73} $$
In this manner we have only performed a length measurement, yet we have obtained
a measurement of a ‘derived property’ of the ball – its surface area. If
you think carefully about the tools we use to measure things, one quickly comes
to the conclusion that almost all measurements we perform are those of derived
quantities. The question then arises – how do we obtain an estimate of the error in
our final measurement? To do so, one would have to relate the error in a derived
quantity to the error in the basic quantities which we directly measure.
[46] That is, the circumference of a great circle – a circle which passes through the centre and divides
the ball into two equal hemispheres.
One general procedure for obtaining a ‘total error’ or ‘absolute error’ estimate
involves three ingredients:
• A knowledge of the precision of your measuring ability (inherently restricted
by the precision of your instruments).
• A mathematical function relating your derived quantity to the quantities you
directly measure.
• The ‘total differential / exterior-derivative / exact derivative’ formula (Tutorial 4).
Mathematically, we proceed as follows.
Definition 1 (Absolute Error) Let $x_1, \dots, x_n$ be a set of n quantities which are
to be measured (with their respective units). Now, let $f(x_1, \dots, x_n)$ be a function
of n variables, representing some derived quantity which is to be measured.
If $\Delta x_1, \dots, \Delta x_n$ are the errors associated to the measurements of $x_1, \dots, x_n$ (e.g.
instrument precision) then the corresponding ‘absolute error’ in $f(x_1, \dots, x_n)$ is
given by the linear estimate:
$$ \Delta f(x_1, \dots, x_n) = \left|\frac{\partial f}{\partial x_1}\Delta x_1\right| + \left|\frac{\partial f}{\partial x_2}\Delta x_2\right| + \dots + \left|\frac{\partial f}{\partial x_n}\Delta x_n\right|, \tag{74} $$
which is evaluated at the measured values $x_1, \dots, x_n$.
Note that the formula for ∆f is similar to the total differential, df, where the difference
is that we have replaced the covectors (1-forms) dx1, ..., dxn with the measurement
errors ∆x1, ..., ∆xn. The absolute value of each term is also taken – this
is because when looking to estimate the ‘Maximum Probable Error’, the errors
should add up rather than cancel. When quoting the value of f as a (derived) measurement,
we say that the quantity f has the value:
$$ \text{Measured Value of } f = f(x_1, \dots, x_n) \pm \Delta f. \tag{75} $$
Therefore, (with some probability) we say that the true value of f lies in the interval
[f − ∆f, f + ∆f].
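Definition 1 translates almost line for line into code. The following is a minimal sketch under stated assumptions: the helper names `partial_derivative` and `absolute_error` are invented here, the partial derivatives are estimated by central finite differences rather than computed by hand, and the rectangle example with its numbers is purely hypothetical.

```python
def partial_derivative(f, values, i, h=1e-6):
    """Central finite-difference estimate of the partial derivative of f
    with respect to its i-th argument, evaluated at `values`."""
    up = list(values)
    down = list(values)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

def absolute_error(f, values, errors):
    """Linear absolute-error estimate, equation (74):
    the sum over i of |df/dx_i * delta_x_i|."""
    return sum(
        abs(partial_derivative(f, values, i) * errors[i])
        for i in range(len(values))
    )

# Hypothetical example: area of a rectangle with sides 3.0 cm and 2.0 cm,
# each measured to within 0.05 cm.
def area(width, height):
    return width * height

measured = [3.0, 2.0]
precision = [0.05, 0.05]
A = area(*measured)
dA = absolute_error(area, measured, precision)
print(f"A = {A:.2f} +/- {dA:.2f} cm^2")
```

By hand, ∆A = |height × ∆width| + |width × ∆height| = 2 × 0.05 + 3 × 0.05 = 0.25 cm², which is what the numerical estimate should reproduce.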
Note that in the case of perfect measurement technique, one would attribute the
errors ∆x1, ..., ∆xn to the instrumental precision. So for example, if you are measuring
the height h of Tyrion Lannister with a tape measure, the error ∆h would be
equal to half the width of the gradings in the tape measure. Finally, one should note
that this ‘absolute error’ formula only takes deterministic errors into account (i.e.
precision etc.) – it does not factor in wrong measurement technique or external
errors which one has not accounted for.
Before attempting the problems and examples, consider the following philosophical
note. Because of Quantum Mechanics – in particular, the Heisenberg Uncertainty
Principle and the inherent non-deterministic nature of the universe, it is
inherently impossible to measure anything with 100% accuracy or certainty. This is
not due to imperfect craftsmanship (imperfect measuring devices) or human imperfection
– it is because the process of observation and measurement itself requires
interacting with the entity which we are trying to measure. This interaction alters
the state of the entity we are trying to measure and is necessarily constrained by
the Heisenberg uncertainty principle.
8.2 Examples and Problems
Example 7 (Thinking Ahead with Ned) In a sadistic rage, the false King Joffrey
decides to measure the surface area of Ned Stark’s head after decapitation. Being
a boy of elementary means, he approximates the Lord of Winterfell’s head as a
sphere. Using a string and ruler, he measures the circumference of Ned’s head
to be C = 24 inches. He does this by marking the string, then measuring the string
with the ruler. The gradings on the ruler are spaced 1/4 of an inch apart, hence the
precision of the ruler is 1/8 in. Assuming his technique is correct, this means that
the error associated to the circumference measurement is ∆C = 1/8 in. Therefore,
Joffrey deduces the surface area of Ned’s head to be:
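$$ S = 4\pi R^2 = 4\pi \left(\frac{C}{2\pi}\right)^2 = \frac{1}{\pi} C^2 = \frac{1}{\pi} (24\,\text{in})^2 = \frac{576}{\pi}\,\text{in}^2. \tag{76} $$
Viewing S = S(C) as a function of the measurement C, the absolute error in S is
given by:
$$ \Delta S = \left|\frac{\partial S}{\partial C} \Delta C\right| = \left|\frac{2}{\pi} C \Delta C\right| = \left|\frac{2}{\pi} \times 24 \times \frac{1}{8}\right| \text{in}^2 = \frac{6}{\pi}\,\text{in}^2. \tag{77} $$
Hence, with the equivalent sphere approximation, the surface area of Ned Stark’s
head is:
$$ S \pm \Delta S = \left(\frac{576}{\pi} \pm \frac{6}{\pi}\right) \text{in}^2 \approx (183 \pm 1.91)\,\text{in}^2. \tag{78} $$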
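The arithmetic of Example 7 is quick to reproduce. The short sketch below uses only the measured values quoted in the example; note that 24² = 576, so S = 576/π in². The variable names are chosen here for illustration.

```python
import math

C = 24.0        # measured circumference in inches
dC = 1.0 / 8.0  # precision error in inches (half of the 1/4 in grading)

S = C**2 / math.pi                # S = 4*pi*R^2 with R = C / (2*pi)
dS = abs(2.0 * C / math.pi * dC)  # |dS/dC * dC|, with dS/dC = 2C/pi

print(f"S = {S:.1f} +/- {dS:.2f} in^2")  # prints "S = 183.3 +/- 1.91 in^2"
```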
Exercise 11 (Thinking Ahead: Part II) Using his previous measurement of the
circumference of Ned’s head, compute the volume V of Ned’s head along with the
absolute error ∆V. Recall that the volume of a ball of radius R is given by:
$$ V = \frac{4}{3}\pi R^3. \tag{79} $$
Hint: First write V in terms of the circumference C, using the relation C = 2πR.
Dry Humour: To account for dehydration-related shrinkage of Eddard’s head, add
5% ± 1% of the measured volume of Ned’s head. Note, you add the ±1% of V to
the previously calculated error ∆V – that is: $\Delta V_{\text{New}} = \Delta V_{\text{Old}} + 0.01V$.
Exercise 12 (Thinking Ahead: Part III, Return of The King) After receiving tuition
help from St. George’s College tutors, King Joffrey decides to further his
skills by measuring the volume of Ned Stark’s head – this time, using more sophisticated
estimates. In particular, he approximates the Lord of Winterfell’s head to
be that of a prolate spheroid [47], with its major axis aligned with the symmetry axis
(vertical axis) of Ned’s head. Again, using a string and ruler (this time in metric
units), Joffrey proceeds to measure the circular circumference of Ned’s head (prolate
spheroids have circular cross sections along their minor axis) as well as the
elliptical circumference of Ned’s head (elliptical cross-sections along the major
axis).
Joffrey makes the following measurements:
$$ C_{\text{Circular}} = 55\,\text{cm} = 2\pi R, \qquad C_{\text{Elliptical}} = 62\,\text{cm} = 4aE(e) \tag{80} $$
where a is the semi-major axis length of the ellipse, b is the semi-minor axis
length, $e = \sqrt{1 - (b/a)^2}$ is the eccentricity of the ellipse and E(e) is a complete
elliptic integral of the second kind (computed numerically or as an infinite series
expansion in e). Note that the semi-minor axis length b of a prolate spheroid is
equal to the radius of the circular cross-section along the minor axis of the spheroid:
$$ b = R, \tag{81} $$
since the spheroid is generated by revolving the ellipse about its major axis (the axis
perpendicular to the minor axis).
Given that Joffrey’s newfound metric ruler has 1 mm = 0.001 m spacings, the
precision error in his measurements is now given by: $\Delta C_C = \Delta C_E = 0.5\,\text{mm} = 0.5 \times 10^{-3}\,\text{m}$.
Using the volume formula for a prolate spheroid:
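$$ V(a, b) = \frac{4}{3}\pi a b^2, \tag{82} $$
(the semi-minor axis b appears squared because the circular cross-section has radius b),
compute the volume of Ned’s head, along with the associated absolute error ∆V.
This requires Russian Grade 1 skills.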
[47] A prolate spheroid was chosen over an oblate spheroid after using Microsoft Paint to compare
the width and height of Sean Bean’s (the actor playing Ned Stark) head.
Hint: To proceed, you should write the volume V in terms of the circular and
elliptical circumferences: $V = V(C_C, C_E)$. This requires writing the semi-minor
and semi-major axis lengths in terms of the circumferences. We already know that
b = R, hence $b = \frac{C_C}{2\pi}$. To get the semi-major length a in terms of $C_E$, one needs
an approximation for the elliptic integral E(e). Recalling from Russian Grade 1
in Tutorial 4, we have the Ramanujan approximation:
$$ C_E \approx \pi\left[3(a + b) - \sqrt{10ab + 3(a^2 + b^2)}\right]. \tag{83} $$
By bringing the 3(a + b) term to the left-hand side and squaring both sides, we can
obtain a quadratic equation for a in terms of b and $C_E$. The positive root of this
equation is given by:
$$ a = \frac{3C_E - 4b\pi + \sqrt{3C_E^2 + 12bC_E\pi - 20b^2\pi^2}}{6\pi}. \tag{84} $$
Substituting these expressions for a and b into V, one can then compute V and its
partial derivatives, required for computing ∆V.
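Before building V on top of formula (84), it is worth confirming that (84) really inverts the Ramanujan approximation (83). The sketch below performs a round trip: it starts from hypothetical axis lengths (not Joffrey's measurements), computes the approximate perimeter from (83), then recovers a from (84). The function names are assumptions made for this illustration.

```python
import math

def ramanujan_perimeter(a, b):
    """Ramanujan's approximation (83) for the perimeter of an ellipse."""
    return math.pi * (3.0 * (a + b) - math.sqrt(10.0 * a * b + 3.0 * (a**2 + b**2)))

def semi_major_from_perimeter(C_E, b):
    """Positive root (84): semi-major axis a from the elliptical
    circumference C_E and the semi-minor axis b."""
    disc = 3.0 * C_E**2 + 12.0 * b * C_E * math.pi - 20.0 * b**2 * math.pi**2
    return (3.0 * C_E - 4.0 * b * math.pi + math.sqrt(disc)) / (6.0 * math.pi)

# Round trip with hypothetical axis lengths (in cm).
a_true, b_true = 11.0, 8.5
C_E = ramanujan_perimeter(a_true, b_true)
a_recovered = semi_major_from_perimeter(C_E, b_true)
print(a_true, a_recovered)
```

A second quick check is the circular case a = b = r, where (83) reduces to 2πr and (84) should return r.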
Exercise 13 (La forma de la espada – “The Shape of the Sword”) The goal of
this problem is to be able to reproduce all the steps and arguments to derive the
volume estimate, then compute the volume measurement and absolute error at the
end. Disclaimer: there may be errors in this error analysis!
To add further insult to the Stark family, Tywin Lannister – Hand of the King
and head of the Lannister family – decides to melt down Eddard Stark’s greatsword,
“Ice”. Being a pragmatic man, Tywin decides to calculate the volume of this sword
in order to work out how much Valyrian steel he will have to forge two new swords
for his sons.
Not being as clever as Archimedes, Tywin doesn’t think to use water displacement
to measure this volume. Instead he proceeds as follows. We can approximate the
blade (Valyrian steel part) of the sword to be that of a shallow rhomboidal prism,
with maximum width at the hilt of the sword, decreasing in thickness down to the
pointed tip. A rhomboidal (diamond-shaped) prism means that the width-wise
cross-sections of the sword are shaped like rhombuses with very narrow (acute)
angles α in the plane parallel to the cutting edges and very large (obtuse) angles
β in the plane perpendicular to the cutting edge. Despite the decreasing
thickness, the angles in the rhombus cross-sections will remain the same [48].
Say that the rhomboidal cross-sections are measured to linearly decrease in area,
down from the hilt to the tip, reaching zero area at the pointed end of the blade. By
knowing the length of the blade and the cross-sectional area at the hilt and at the tip
(zero), we can construct a linear function, A(x) (where x is the distance down the
blade, measured from the hilt), from which we can interpolate the cross-sectional
area of the blade anywhere between the hilt and the tip. The volume will then be the
‘sum’ of these cross-sectional areas stacked on top of each other – i.e. the
integral:
$$ V = \int_{x_{\text{hilt}}}^{x_{\text{tip}}} A(x)\,dx. \tag{85} $$
[48] So we could in fact view the blade as a continuous conformal map of a rhombus.
Because of his war with the Stark family, Tywin has run out of protractors and is
thus left only with Joffrey’s string and ruler to carry out his measurements – to
which he proclaims, “FML!” Tywin now summons the help of his educated son,
Tyrion Lannister. In a stroke of cleverness, to work out the angles of the rhombus
cross-section at the blade hilt, Tyrion measures the circumference of the blade.
Because of sword symmetry (the rhombus consisting of two mirrored isosceles
triangles), this circumference C is equal to four times the length of each side of the
cross-sectional rhombus at the hilt:
$$ C_{\text{rhombus}}(x_{\text{hilt}}) = 4L_{\text{hilt}}. \tag{86} $$
If the blade were completely flat, it would have a width of $2L_{\text{flat}}$ at the hilt, instead
of the string-measured value of $2L_{\text{hilt}}$. Thus, by holding the string tangential to the
corner of the rhombus (which runs down the center of the blade), Tyrion measures
the ‘flat width’ of the blade: $2L_{\text{flat}}$. He then computes the ‘flat circumference’ at
the hilt:
$$ C_{\text{flat}} = 4L_{\text{flat}} \tag{87} $$
and concludes that the deviation:
$$ C_{\text{rhombus}} - C_{\text{flat}} = 4(L_{\text{hilt}} - L_{\text{flat}}) \tag{88} $$
must be due entirely to the rhomboidal geometry [49] of the cross-sections. Using
planar geometry that he learned while in his mother’s womb, Tyrion realises that
$2L_{\text{flat}}$ is equal to the central diameter of the rhombus. He then forms a right-angled
triangle in the rhombus, with hypotenuse $L_{\text{hilt}}$, acute angle $\frac{\alpha}{2}$ and adjacent side $L_{\text{flat}}$.
[49] The key concept here is that of a ‘defect angle’. The rhomboidal geometry introduces a non-zero
angular defect away from the zero angle describing flat cross-sections (straight lines).
Figure 1: Cross-sectional rhombus of idealized broadsword.
Therefore, simple trigonometry gives:
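$$ \cos\left(\frac{\alpha}{2}\right) = \frac{L_{\text{flat}}}{L_{\text{hilt}}}, \qquad \sin\left(\frac{\beta}{2}\right) = \frac{L_{\text{flat}}}{L_{\text{hilt}}}, \tag{89} $$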
which allows Tyrion to deduce the interior angles α and β of the cross-sectional
rhombus. By symmetry, the area of the cross-sectional rhombus at the hilt is simply
four times the area of this triangle (using Pythagoras’ theorem since we want all
quantities in terms of the measured quantities $L_{\text{hilt}}$, $L_{\text{flat}}$):
$$ A_{\text{rhombus}}(x_{\text{hilt}}) = 4 \times \frac{1}{2} \times \text{base} \times \text{height} = 4 \times \frac{1}{2} L_{\text{flat}} \sqrt{L_{\text{hilt}}^2 - L_{\text{flat}}^2} = 2L_{\text{flat}} \sqrt{L_{\text{hilt}}^2 - L_{\text{flat}}^2}. \tag{90} $$
To work out A(x) for any x ∈ [xhilt, xtip], Tyrion lays the sword flat. Overhead, the
sword looks like an isosceles triangle, with base 2Lflat and height Lblade. Splitting
these into two right-angled triangles, we get the following diagram:
Figure 2: Top view of broadsword laid flat.
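In particular, Tyrion finds that $\tan(\gamma) = \frac{L_{\text{flat}}}{L_{\text{blade}}}$. Setting up a coordinate system
with $x_{\text{hilt}} := 0$ at the hilt and $x = x_{\text{tip}} = L_{\text{blade}}$ at the end of the blade, the height
y of the triangle at any point along the blade can then be computed as a function
of the position x along the blade. Trigonometry shows that:
$$ y = (L_{\text{blade}} - x)\tan(\gamma) = (L_{\text{blade}} - x)\frac{L_{\text{flat}}}{L_{\text{blade}}} = L_{\text{flat}} - \frac{L_{\text{flat}}}{L_{\text{blade}}} x. \tag{91} $$
To work out the area A(x) of the cross-sectional rhombus at any point x along
the blade, one uses the previous formula, $2L_{\text{flat}}\sqrt{L_{\text{hilt}}^2 - L_{\text{flat}}^2}$, but makes the
replacements $L_{\text{flat}} \to y$ and $L_{\text{hilt}} \to \frac{y}{\cos(\alpha/2)} = \frac{L_{\text{hilt}}}{L_{\text{flat}}} y$, since $\cos\left(\frac{\alpha}{2}\right) = \frac{L_{\text{flat}}}{L_{\text{hilt}}}$
(recall the first diagram). Hence we have:
$$
\begin{aligned}
A_{\text{rhombus}}(x) &= 2y\sqrt{\left(\frac{L_{\text{hilt}}}{L_{\text{flat}}} y\right)^2 - y^2} = 2y\sqrt{y^2\left(\left(\frac{L_{\text{hilt}}}{L_{\text{flat}}}\right)^2 - 1\right)} = 2y^2\sqrt{\left(\frac{L_{\text{hilt}}}{L_{\text{flat}}}\right)^2 - 1} \\
&= 2\left(L_{\text{flat}} - \frac{L_{\text{flat}}}{L_{\text{blade}}} x\right)^2 \sqrt{\left(\frac{L_{\text{hilt}}}{L_{\text{flat}}}\right)^2 - 1}.
\end{aligned} \tag{92}
$$
Having learned calculus from the ‘Principia Mathematica’, Tyrion concludes that
the volume is therefore given by the following function of three measured variables
[50], $L_{\text{flat}}$, $L_{\text{hilt}}$ and $L_{\text{blade}}$:
$$
\begin{aligned}
V(L_{\text{flat}}, L_{\text{hilt}}, L_{\text{blade}}) &= \int_{x=0}^{x=L_{\text{blade}}} A(x)\,dx = 2\sqrt{\left(\frac{L_{\text{hilt}}}{L_{\text{flat}}}\right)^2 - 1} \int_{x=0}^{x=L_{\text{blade}}} \left(L_{\text{flat}} - \frac{L_{\text{flat}}}{L_{\text{blade}}} x\right)^2 dx \\
&= 2\sqrt{\left(\frac{L_{\text{hilt}}}{L_{\text{flat}}}\right)^2 - 1} \cdot \frac{1}{3} L_{\text{blade}} L_{\text{flat}}^2 \\
&= \frac{2}{3} L_{\text{blade}} L_{\text{flat}} \sqrt{L_{\text{hilt}}^2 - L_{\text{flat}}^2}.
\end{aligned} \tag{93}
$$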
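Given Tywin’s measurements of the broadsword along with the corresponding precision
error
$$ L_{\text{blade}} = 42\,\text{in} \pm \frac{1}{8}\,\text{in}, \quad C_{\text{flat}} = 4L_{\text{flat}} = 4\,\text{in} \pm \frac{1}{8}\,\text{in}, \quad C_{\text{rhombus}} = 4L_{\text{hilt}} = \left(3 + \frac{7}{8}\right)\text{in} \pm \frac{1}{8}\,\text{in}, \tag{94} $$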
[50] As a consistency check, one should note that we expect the volume to be a function of exactly
three variables. This is because the cross-sectional area of the sword is parametrised by two variables
(being a non-square rhombus), whilst the length of the sword is parametrised by another independent
variable. If the cross-sectional rhombus was turned into a square, the sword would be reduced to
a spike and the volume would be a function of two measured variables – the blade length and the
length of one of the sides of the cross-sectional square.
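one can deduce the following measurements and (reduced) errors for the L variables [51]
(each circumference error of 1/8 in is divided by 4):
$$ L_{\text{blade}} = \left[42 \pm \frac{1}{8}\right] \text{in}, \quad L_{\text{flat}} = \left[1 \pm \frac{1}{32}\right] \text{in}, \quad L_{\text{hilt}} = \left[\frac{1}{4}\left(3 + \frac{7}{8}\right) \pm \frac{1}{32}\right] \text{in}. \tag{95} $$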
Problem I From these measurements, compute the volume (in units of inches
cubed), V, of Ned’s broadsword along with the corresponding absolute error, ∆V.
Convert these measurements into metric units, using the conversion: 1 inch =
2.54 cm.
In a thoughtful moment, Tyrion decides to calculate the financial worth of the
sword in terms of pure Valyrian steel. Given that Valyrian steel is worth 100
times its weight in gold, calculate the total worth W of the broadsword in terms
of kilograms of gold. To do this, use the fact that the density of Valyrian steel [52] is
ρ = 7.85 g/cm³ = 0.284 lb/in³. Remember to use consistent units – either stick
with imperial units or convert everything to metric units.
Problem II Given that mass M = Volume × Density = Vρ, compute the
error in the amount of gold Tyrion will make by selling the steel smelted from
the broadsword blade. Assume that the density ρ given is accurate to the number
of decimal places quoted – i.e. the precision error in density is given by:
∆ρ = 0.005 g/cm³. Hence deduce the minimum and maximum amount of gold
(W = 100M) Tyrion will make, based on Tywin’s measurements – i.e. compute
W − ∆W and W + ∆W.
• The dimensions in the last question were computed using slightly larger-
than-average dimensions for Claymores and Two-handed swords from the
medieval ages.
• The last exercise illustrates an important technique in making measurements:
by measuring the circumference of the sword rather than just the edge of
the cross-sectional rhombus, the precision error in determining Lflat was
reduced by a factor of 4. In general when making measurements, it is better
to make measurements of quantities which are much larger than the precision
limitation set by your instrument – from these measurements, you can then
deduce measurements for quantities you need with lower precision error. So
for example, in determining the area of a circle with string, it is better to
measure its circumference rather than its radius (since the former is larger) –
this way, one may reduce the precision error in determining the radius by a
factor of 2π.
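The circle remark above follows directly from the absolute error formula: with R = C/(2π), one gets ∆R = |dR/dC| ∆C = ∆C/(2π). A minimal sketch, assuming a hypothetical ruler precision of 0.05 cm:

```python
import math

ruler_precision = 0.05  # hypothetical precision error in cm (1 mm gradings)

# Measuring the radius directly: the error is the ruler precision itself.
dR_direct = ruler_precision

# Measuring the circumference C = 2*pi*R and deducing R = C / (2*pi):
# the absolute error formula gives dR = |dR/dC| * dC = dC / (2*pi).
dR_from_circumference = ruler_precision / (2.0 * math.pi)

ratio = dR_direct / dR_from_circumference
print(ratio)  # the improvement factor, equal to 2*pi
```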
[51] See the remark after this exercise.
[52] The density of Carbon 1060 Steel used to make “Ice” replicas for crazy Game of Thrones fans.
Exercise 14 (Littlefinger’s bane – Lord Tyrion, Master of Coin) Having been given
the responsibility of managing the Kingdom’s finances, Tyrion Lannister finds that
he has inherited some ‘financial discrepancies’ from Littlefinger – that is, he has
found some mathematical ‘short-comings’ 53 in Littlefinger’s bookwork. In partic-
ular, apart from pocketing coin from time to time, Tyrion finds that his predecessor
Littlefinger has been using the wrong interest rate formula to calculate the king-
dom’s debt. Furthermore, Tyrion finds that Littlefinger has been ‘inflating’ the
recorded expenses, so as to inflate his own pockets. Being clever, Littlefinger ran-
domized the expenses which he had inflated and also kept all ‘inflations’ to within
2% of the true expense.
Littlefinger used discrete compound interest, compounded quarterly, to compute
the interest S(t) − S0 that the kingdom owes to a certain bank t years after taking
an initial loan S0. Given an annual interest rate of 7% – i.e. r = 0.07, the amount
owed to the bank is given by
S(t) = S0(1 +
r
m
)mt
(96)
where m = 4 is the number of times the interest was assumed to be compounded
per year. However, driven by avarice, the bank in fact changed the terms of the loan
so that interest was compounded continuously – i.e. m → ∞. With this correction,
Tyrion finds the actual amount owed after t years:
S(t) = \lim_{m \to \infty} S_0 \left(1 + \frac{r}{m}\right)^{mt} = S_0 e^{rt}    (97)
where e is Euler’s number, the base of the exponential function. First, Tyrion must correct the size of the debt
blackhole that Littlefinger’s endless borrowing has brought the kingdom into. To
do this, Tyrion must estimate the true total expenses E of the kingdom along with
an ‘absolute error estimate’ ∆E to account for the amount of money Littlefinger
has stolen. Once this is done, Tyrion must calculate the amount of interest that
the Kingdom will owe in the next financial year, along with an error estimate to
account for how much of this may be due to Littlefinger.
Problem I: First make sure that you understand why discrete compound interest is
given by a geometric sequence and why continuous compounded interest is given
by the exponential function. Now, given an initial loan of S_0 = 4,000,000 gc (gc
= ‘gold coins’) taken t = 12.5 years ago along with a second (separate) loan of
\tilde{S}_0 = 6,000,000 gc taken t = 2 years ago, compute the total amount of money,
S + \tilde{S}, that the Kingdom currently owes to the bank.
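As a rough numerical illustration of how the discrete formula (96) approaches the continuous formula (97) as m grows, the following Python sketch may help. The principal, rate and duration are hypothetical values chosen only for illustration (they are not the loans of Problem I), and the function names are assumptions of this sketch.

```python
import math


def discrete_compound(principal, rate, periods_per_year, years):
    """Amount owed under discrete compounding, following equation (96)."""
    return principal * (1 + rate / periods_per_year) ** (periods_per_year * years)


def continuous_compound(principal, rate, years):
    """Amount owed under continuous compounding, following equation (97)."""
    return principal * math.exp(rate * years)


# Hypothetical loan used purely for illustration.
principal, rate, years = 1000.0, 0.07, 10.0

# Increasing m moves the discrete result towards the continuous one.
for m in (1, 4, 12, 365, 10**6):
    print(m, discrete_compound(principal, rate, m, years))
print("continuous", continuous_compound(principal, rate, years))
```

For a positive interest rate the quarterly result sits below the continuous one, which is why the bank's change of terms increases the debt.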
53
Pun intended.
After some statistical analysis, Tyrion concludes that Littlefinger has used a bi-
nomial distribution to inflate the expenses. In particular, Littlefinger has selected
a probability of p = 0.25 to choose whether or not to inflate an expense at any
given time. Counting N = 2014 expenses – and based on the asymptotic nature
of probabilities (Law of large numbers), it is a reasonable estimate to assume that
one quarter of all expenses are inflated. Thus, as a simplification, Tyrion decides
to assign the following error to each recorded expense, Ek:
∆Ek = −0.25 × (0.02 × Ek) = −0.005Ek, (98)
where the negative sign accounts for the fact that Littlefinger would only steal
money rather than donating it. Because these errors are cumulative, the error in the total
expense E = \sum_{k=1}^{2014} E_k has magnitude
∆E = 0.005E. (99)
Problem II: In total, the Kingdom’s projected expenses are given by the sum of
its total debt (S + \tilde{S}) as well as its internal expenses, I = 1,000,000 gc. As an
approximation, treat Littlefinger’s inflation of the expenses as an inflation of the
initial loans taken from the bank – i.e.
\Delta S_0 = 0.005 S_0, \Delta \tilde{S}_0 = 0.005 \tilde{S}_0. (100)
Using the arguments earlier, one can approximate the error in the internal expenses
to be ∆I = 0.005I. On second inspection, Tyrion realises that with their loose
contract, the bank may legally retro-actively alter the interest rate r on their loan
by ±1%. This induces an error of ∆r = ±0.01 in the compound interest formula.
To compute the error in the debts S(t = 12.5) and \tilde{S}(t = 2), one views S and \tilde{S}
as functions of the interest rate r and the initial loans – S_0 and \tilde{S}_0 – then uses the
absolute error formula.
With this information, compute:
E_{Total} = S(12.5, r, S_0) + \tilde{S}(2, r, \tilde{S}_0) + I (101)
as well as the absolute error, \Delta E_{Total}.
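The absolute error formula referred to above can be sketched numerically. The helper below is an assumption of this sketch rather than part of the tutorial: it estimates each partial derivative by a central difference and sums the absolute contributions. It is applied to a hypothetical debt S(r, S_0) = S_0 e^{rt} whose numbers are made up for illustration and are not those of Problem II.

```python
import math


def absolute_error(f, values, errors, step=1e-6):
    """Linearised absolute error: the sum over i of |df/dx_i * dx_i|.

    Partial derivatives are estimated by central differences, so the
    result is an approximation of the analytic formula.
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, errors)):
        h = step * max(abs(x), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = x + h
        lower[i] = x - h
        partial = (f(*upper) - f(*lower)) / (2 * h)
        total += abs(partial * dx)
    return total


# Hypothetical debt with the elapsed time t held fixed.
t = 5.0


def debt(r, s0):
    return s0 * math.exp(r * t)


r, s0 = 0.07, 1000.0
dr, ds0 = 0.01, 0.005 * s0

numeric = absolute_error(debt, [r, s0], [dr, ds0])
# Analytic counterpart: |t * S * dr| + |exp(r * t) * dS0|
analytic = abs(t * debt(r, s0) * dr) + abs(math.exp(r * t) * ds0)
print(numeric, analytic)
```

The two printed numbers should agree closely, which is a useful sanity check before differentiating by hand.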
Problem III: The quality of time-keeping devices owned by the kingdom and the
bank is not very good. In particular, clocks are known to have an accuracy of about
1 minute per day – i.e. ∆t_day = 60 seconds. Convert this into a yearly error in t,
giving ∆t. To account for the quasi-periodic orbit of their planet, add ±1.5 days
multiplied by the number of years elapsed to the estimate of ∆t – i.e. add \frac{1.5}{365} t to
∆t, assuming each year has an average of 365 days on Tyrion’s planet. Using this
final estimate of error in the time elapsed and viewing the debts S = S(t, r, S_0)
and \tilde{S} = \tilde{S}(t, r, \tilde{S}_0) as functions of three measured variables (with time included),
compute a refined estimate for ∆S and ∆\tilde{S} that takes into account ∆t.
Problem IV If there is a chance that Littlefinger has caused a total of more than
333, 333gc (gold coins) of excess expenses (including interest), Tywin Lannister
will proceed to organize Littlefinger’s capture, de-sexing and subsequent torture.
Based on your calculations, will Littlefinger’s name become his new pathos?
Adopting the cunning of Tyrion Lannister, if Littlefinger has managed to fall below
Tywin’s ‘critical expense threshold’, can you modify the above argument to maximize
the possible error in the computed expenses of the kingdom? Note that you
must do this in a mathematically plausible and logical way, so as to persuade Lord
Tywin to torture Littlefinger. Alternatively, if Littlefinger has already exceeded the
threshold, can you use binomial statistics to maximize ∆E, hence maximizing the
severity of his torture?
Hint: Think about using the cumulative distribution function (the sum of the probabilities of all
scenarios where theft occurs less than or equal to a certain number of times) derived
from the binomial distribution. If all else fails, either ‘inflate’ the probability
with which Littlefinger stole at any given expense, or inflate the amount by which
Littlefinger was inflating the expenses.
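The cumulative probability mentioned in the hint can be evaluated numerically. The sketch below is one possible approach, not a prescribed method: it works with logarithms because the binomial coefficients for N = 2014 are far too large to multiply directly in floating point. The function names are assumptions of this sketch, and it requires 0 < p < 1.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), evaluated via logarithms to avoid overflow."""
    log_coefficient = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coefficient + k * math.log(p) + (n - k) * math.log(1 - p))


def binomial_cdf(k, n, p):
    """P(X <= k): the sum of the probabilities of 0, 1, ..., k inflated expenses."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))


n, p = 2014, 0.25
for k in (450, 503, 550):
    print(k, binomial_cdf(k, n, p))
```

Reading off the k at which the cumulative probability gets close to 1 gives a defensible upper bound on the number of inflated expenses, which can then replace the average of N/4 used earlier.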
9 Tutorial 6: Medicine – An Error a Day Keeps the Tutor
Away
In the last tutorial which you completed 54 for the good of your future selves 55, you
studied ‘absolute errors’ using ‘linearisation’ or an informal re-interpretation of the
‘exterior derivative’ operation. In this tutorial, we will finalise our study on basic
error analysis with a few more concepts such as ‘relative error’, ‘percentage error’,
‘least scale error’ and ‘Maximum probable error’. The tutorial will conclude with
some illustration of how to extract and interpret error estimates from ‘least squares
regression’ or ‘line of best fit’ – the most commonly used statistical analysis tool
in experimental science and engineering.
9.1 Relative and Percentage Error
In the last tutorial, we defined the absolute error ∆f in the measurement of some
dependent variable f(x1, ..., xn), in terms of a set of measurements of experimen-
tally measured variables x1, ..., xn and their corresponding errors ∆x1, ..., ∆xn:
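\Delta f(x_1, ..., x_n) = \left|\frac{\partial f}{\partial x_1} \Delta x_1\right| + \left|\frac{\partial f}{\partial x_2} \Delta x_2\right| + ... + \left|\frac{\partial f}{\partial x_n} \Delta x_n\right|.    (102)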
The relative error in f is then defined very simply as the ratio of the absolute error
in f to the experimentally determined value of f. We now formalize this.
Definition 2 The relative error in some measurement f(x_1, ..., x_n) of f is defined
as
\frac{\Delta f}{f} = \frac{1}{f(x_1, ..., x_n)} \left( \left|\frac{\partial f}{\partial x_1} \Delta x_1\right| + \left|\frac{\partial f}{\partial x_2} \Delta x_2\right| + ... + \left|\frac{\partial f}{\partial x_n} \Delta x_n\right| \right),    (103)
where x_1, ..., x_n are some set of measured variables which determine f(x_1, ..., x_n).
Furthermore, the percentage error in the measurement f is defined to be the relative
error expressed as a percentage:
Percentage Error in f = \frac{\Delta f}{f} \times 100\%.    (104)
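In code, the two definitions amount to a single division and a scaling by 100. The following is a minimal sketch with hypothetical measurement values. Dividing by the magnitude of the measured value is an assumption made here so that the result stays non-negative when f is negative.

```python
def relative_error(value, absolute_error):
    """Relative error: the absolute error divided by the magnitude of the measured value."""
    return absolute_error / abs(value)


def percentage_error(value, absolute_error):
    """Relative error expressed as a percentage."""
    return 100.0 * relative_error(value, absolute_error)


# Hypothetical measurement: a length of 250.0 mm with an absolute error of 0.5 mm.
length_mm, delta_mm = 250.0, 0.5
print(relative_error(length_mm, delta_mm))
print(percentage_error(length_mm, delta_mm))
```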
54
Hint: Angela, Emma, Amelia, Zoe!
55
Hint: Future Angela, Future Emma, Future Zoe and Future Amelia.
9.2 Error Etiquette
We now define a fundamental standard in error analysis – the ‘Maximum Probable
Error’ (MPE). So far, we have always been using absolute values when computing
error – this is to maximize our calculation of the possible errors that may have accu-
mulated in our measurement process (which we could account for). Therefore, in
the absence of unaccounted errors, the Maximum Probable Error is the maximum
error that may have occurred if the worst-case scenario happened in our measure-
ments – i.e. all the errors added up instead of cancelling. In general, when an error
is quoted in scientific and engineering literature, it corresponds to the ‘Maximum
Probable Error’.
In principle, one could keep track of the signs in the error (e.g. if we knew that
a quantity may be larger than measured, but not smaller) and add them up so as
to give a total error which is less than the ‘maximum error’ – but this is almost
never used. Furthermore, the ‘absolute error’ formula corresponds to error at the
‘linearised level’. For each problem, we could try and derive a more accurate non-
linear calculation of the errors, but for the most part, one sticks to the absolute error
∆f.
In the absence or ignorance of systematic errors, the errors ∆x1, ..., ∆xn in the
experimentally measured quantities x1, ..., xn are usually taken to be the precision
error due to the ‘least scale’ reading on your measuring device. In other words,
a measuring device – such as a ruler or the ATLAS detector in the Large Hadron
Collider – will typically have a smallest scale reading, which is set by the resolution
of the instrument. Higher quality and more expensive measuring instruments will
usually have a higher (better) resolution – meaning a smaller least scale reading.
On a standard 30cm ruler, the least scale reading is usually 1mm, which is determined
by the smallest separation in the marked spacings. As a rule of thumb, and owing to
limitations on one’s ability to interpolate, the Least Scale Error ∆x in some measured
quantity x is usually taken to be equal to half the least scale reading. So for
our 30cm ruler, with a least scale of 1mm, we would take our least scale error for
any length measurements with this ruler to be ∆x = 0.5 mm = 5 × 10^{-4} m.
9.3 Sleepy Snorlax’s Medical (mis)Adventures
To make this concrete, let’s consult Snorlax the Sleepy Sphere. Note that formally
speaking, the term ‘Sphere’ refers mathematically to the 2-dimensional boundary
of a 3-dimensional ball – i.e. its outer surface, excluding the interior. Here and
previously, we use the term ‘sphere’ to interchangeably refer to the ‘2-Sphere’
(the surface) and the ‘3-Ball’ (boundary surface + interior).
Example 8 (Snorlax, The Sleepy Sphere) Upon recognizing that she has a sleeping
disorder, Soporific Snorlax decides to roll over to the Royal Perth medical centre.
Here Snorlax meets Dr. Ashleigh Punch – a Georgian. Having decided to
take a keen interest in medical physics, Dr. Punch decides to take Snorlax’s measurements
– using high-energy x-rays and vernier calipers, Dr. Punch measures
Snorlax’s diameter 2r (where r is the radius) to be 2r = 20.494024 metres (to
within a precision of 1 µm) with a least scale error of ∆(2r) = 5 × 10^{-7} m = 0.5 µm 56.
Hence we have r = 10.247012 m and ∆r = 2.5 × 10^{-7} m.
Dr. Punch then calculates Snorlax’s volume to be
V = \frac{4}{3} \pi r^3 \approx 4506.93 m^3    (105)
with an absolute error of
\Delta V = \frac{4}{3} \pi \Delta(r^3) = 4\pi |r^2 \Delta r| = 3.29871 \times 10^{-4} m^3    (106)
and relative error of
\frac{\Delta V}{V} = \frac{4\pi |r^2 \Delta r|}{\frac{4}{3} \pi r^3} = \frac{3.29871 \times 10^{-4} m^3}{4506.93 m^3} = 7.31921 \times 10^{-8},    (107)
corresponding to a percentage error of
7.31921 \times 10^{-8} \times 100\% = 7.31921 \times 10^{-6} \%.    (108)
As far as medical measurements go, this is a high-precision measurement. After
consulting the Oxford Handbook of Clinical Medicine (9th Edition), Dr. Punch
concludes that Sleepy Snorlax is clinically obese and needs to get rid of excess
adipose tissue. She prescribes Snorlax one week of “Living Below the Line”, followed
by power-lifting sessions at the gym and night-time cycling.
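The arithmetic of Example 8 can be checked with a few lines of Python. This is only a sketch that re-derives the quantities in equations (105) to (108) from the measured diameter and its least scale error. The variable names are assumptions of this sketch.

```python
import math

diameter = 20.494024     # measured diameter 2r, in metres
delta_diameter = 5e-7    # least scale error in 2r, in metres

r = diameter / 2
delta_r = delta_diameter / 2

volume = 4.0 / 3.0 * math.pi * r**3
delta_volume = abs(4.0 * math.pi * r**2 * delta_r)   # |dV/dr| * delta_r
relative = delta_volume / volume                     # algebraically equal to 3 * delta_r / r
percentage = 100.0 * relative

print(r, delta_r)
print(volume, delta_volume)
print(relative, percentage)
```

Note that the relative error of the volume is three times the relative error of the radius, a direct consequence of V being proportional to r^3.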
Exercise 15 (The Hippocratic Oath) After seeing Dr. Punch, Snorlax loses a lot
of weight. Too much weight. In fact, it turns out the dose of high-energy X-rays
that Dr. Punch used to image Snorlax, was 100, 000 times above clinical guidelines
56
1 micrometer is defined as one millionth of a meter: 1 µm = 10^{-6} m.
(oops!). Snorlax suspects she may in fact have cancer, so decides to consult Dr.
Kaylin Hooper – another Georgian medical student. Dr. Hooper suggests that one
way to test for cancer, is to measure Snorlax’s average density and compare this
density to that of a healthy sphere – since sentient spheres don’t have muscles or
bone or any internal structure ... the standard deviation in sphere densities amongst
the sphere population is extremely small. Having learned that type 1 spherical
cancer tumors have a higher density than that of healthy sphere tissue and type 2
tumors have a lower density than that of normal tissue, Dr. Hooper proposes that if
Snorlax’s density is significantly higher or lower than the sentient sphere average
density (to within 5 standard deviations and experimental error), then Snorlax has
sphere cancer.
Using Nuclear Magnetic Resonance Imaging (MRI = NMR) and a reconstruction
algorithm based on Ellipsoidal Harmonics
(http://www.sciencedirect.com/science/article/pii/S0010465513002610), Dr. Hooper measures
Snorlax’s volumetric density to be:
\rho_{exp} = 10^3 kg/m^3,    (109)
with a combined least-scale and numerical precision error (inherent in the algorithm) of
\Delta \rho_{exp} = 0.001 g/m^3.    (110)
Pro Tip: Remember to keep track of the units you use and be consistent (e.g.
choose kilograms and metres).
Q0: A standard healthy sentient sphere has a density of ρavg = 969kg/m3 with a
population standard deviation of σρ = 6kg/m3. Using the particle physics stan-
dard of ‘5-sigma’ for statistical significance, determine whether or not Snorlax has
sphere cancer. If so, what type of sphere cancer(s) does Snorlax likely have?
In other words, does the possible range of Snorlax’s experimentally measured den-
sity, [ρexp − ∆ρexp, ρexp + ∆ρexp] lie entirely within the density of the standard
sphere population ρavg = 969kg/m3, to within 5 standard deviations, 5σρ –i.e.
[ρavg − 5σρ, ρavg + 5σρ]?
Furthermore, compute the relative error and percentage error in the experimentally
determined value of Snorlax’s density.
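The containment test described in Q0 can be written as a comparison of two intervals. The sketch below uses hypothetical numbers, not those of the exercise, and the helper names are assumptions of this sketch. All quantities must first be converted to the same units.

```python
def measurement_interval(value, error):
    """Closed interval [value - error, value + error]."""
    return (value - error, value + error)


def lies_within(inner, outer):
    """True if the interval `inner` lies entirely inside the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hypothetical values in consistent units (not the numbers of Q0).
measured = measurement_interval(101.0, 0.5)          # measurement with its absolute error
population = measurement_interval(100.0, 5 * 0.4)    # population mean with a 5-sigma band

print(lies_within(measured, population))
```

If the measured interval falls outside the band, comparing its endpoints with the band's endpoints shows on which side the measurement lies.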
Q1: As an upcoming student in medicine, Matthew Fernandez realises that setting
the statistical significance level to five standard deviations is crazy for a medical
diagnosis. After some research, Matthew decides that setting a significance level of
three standard deviations to diagnose sphere cancer is far more sensible. Under
this new criterion, to within experimental error does Snorlax have sphere cancer? If
so, what type of sphere cancer(s) does Snorlax likely have?
A rare but debilitating condition for sentient spheres is Volumetosis – a disease
in which the volume and surface area of a sphere are inconsistent. This means
that the radius associated to some calculated volume of the sphere, R_v = \left(\frac{3V}{4\pi}\right)^{1/3},
disagrees with the radius associated to the surface area of the sphere, R_s = \sqrt{\frac{S}{4\pi}}.
Volumetosis has two known causes and is thus classified into two types:
• Type 1 Volumetosis: Symmetry Breaking. This occurs when the sphere begins
to turn into an ellipsoid – typically because cancerous mutations induce
a change in the expression of the sphere genes that control its eccentricity.
• Type 2 Volumetosis: Quantum Russian-doll operators. This occurs when
a classical sphere gets infected by quantum operators, which turn it into
a quantum superposition of infinitely many concentric spheres of different
radii. Each time a measurement is made, the quantum collection collapses to
a single sphere of definite radius. Hence independent measurements of sur-
face area and volume will in general, yield different radii – with probabilities
centred around some classical average.
To determine the volume of Snorlax directly, a machine called the ‘Banach-Tarski
Annihilator’ is used. Such a technique is only ever sanctioned by the medical
community in severe circumstances – which requires both the hospital and patient
signing off on ‘The Axiom of Choice’ form. This technique creates an exact
topological (genetic) copy of Snorlax, with the same volume, then proceeds to
bombard it with sphere anti-particles till the entire copy is annihilated. The number
of anti-particles used in the process is then counted and their equivalent volume is
calculated.
Q2: In particular, this technique works by calculating the number of anti-particles
N and assigning a volume u = 10^{-22} m^3 per particle. Hence,
V = N × u. (111)
Given N = 8.12247 × 10^{22} and an atom-counting resolution of ∆N = 10^6 particles
(taking into account higher-loop corrections from quantum field scattering
processes), compute the total volume V along with the error in volume ∆V, as
measured by the Banach-Tarski Annihilator. Furthermore, compute the relative
error and percentage error in V.
Recompute ∆V as well as the relative and percentage errors, now taking into account
an error ∆u = 10^{-25} m^3 in the volume per anti-particle. This additional
error is due to the non-local (spread out) nature of the anti-particle wavefunction
(or quantum probability density).
Q3: Given the measurements of Snorlax’s new volume V, compute Snorlax’s
volume-determined radius R_v along with the absolute error ∆R_v, relative error
\frac{\Delta R_v}{R_v} and percentage error in R_v. Hint: Use
R_v = \left(\frac{3V}{4\pi}\right)^{1/3}.    (112)
A new measurement of Snorlax’s surface area is made, using the technique of
‘particle deposition’. This effectively deposits a layer of radio-shielding particles
onto the sphere, till the sphere is completely covered – at which point it is radio-
opaque (radio waves cannot pass through it). By rotating the sphere in an array of
directed radio emitters and measuring the intensity of radio waves passing through
the sphere, one determines when a complete layer of radio-shielding particles has
been laid onto the sphere. The sphere is then stripped using an electric field and
the particles are collected onto a flat single-molecule layer – whose surface area is
then measured, again using radio waves. In total, this measurement process has an
effective precision of 0.001 m^2 as well as an estimated inaccuracy of 5%, due to
non-linear and quantum electrodynamical effects. Hence, the experimental error in
surface area measurement is:
\Delta S = 10^{-3} m^2 ± 0.05 × 10^{-3} m^2.    (113)
Thus, taking the maximum probable error, we set
\Delta S = 1.05 × 10^{-3} m^2.    (114)
Furthermore, by this method, the surface area of Snorlax is determined to be
S = 18.0956 m^2.    (115)
Q4: Using the measurements S and ∆S, compute the surface-area-determined radius
R_s of Snorlax along with the absolute error, ∆R_s. Hint:
R_s = \sqrt{\frac{S}{4\pi}}.    (116)
Furthermore, compute the relative error and percentage error in R_s.
Q5: By comparing the possible experimental values of the volume- and surface-area-determined
radii of Snorlax, R_v and R_s, determine – to within experimental error –
whether or not Snorlax has volumetosis. If Snorlax has volumetosis, how severe is
it?
Hint: This amounts to checking whether or not the measurement ranges [R_v − ∆R_v, R_v + ∆R_v]
and [R_s − ∆R_s, R_s + ∆R_s] overlap. The severity is determined
by the ‘range of disagreement’ – i.e. the maximum possible discrepancy
(non-overlap).
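The overlap test in the hint can be sketched as follows. The radii and errors are hypothetical placeholders rather than Snorlax's values. Two readings of the ‘range of disagreement’ are offered as assumptions of this sketch, since the hint can be interpreted either way: the gap between the two intervals, and the largest distance between any two points drawn from them.

```python
def interval(value, error):
    """Closed interval [value - error, value + error]."""
    return (value - error, value + error)


def overlap(a, b):
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def gap(a, b):
    """Smallest separation between the intervals (zero if they overlap)."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


def max_discrepancy(a, b):
    """Largest possible distance between a point of a and a point of b."""
    return max(a[1] - b[0], b[1] - a[0])


# Hypothetical radii and absolute errors, in metres.
range_v = interval(2.00, 0.01)
range_s = interval(2.05, 0.02)

print(overlap(range_v, range_s))
print(gap(range_v, range_s), max_discrepancy(range_v, range_s))
```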
Q6 Type I volumetosis and Type II volumetosis can be distinguished as follows. In
particular, for some currently unknown reason, Type I volumetosis typically leads
to a sphere turning into a slightly oblate spheroid, meaning that its surface area
increases relative to its volume. This is because for a given volume, a sphere is an
object which has minimum surface area. Hence, in Type I volumetosis, the sur-
face area determined radius Rs of Snorlax would be measured to be consistently
greater than Snorlax’s volume-determined radius Rv. Since this form of volume-
tosis is topological, it can be treated by injecting Snorlax with ‘homeomorphism’
regulators which then continuously transform Snorlax’s gene expression back to
that of zero eccentricity.
In the case of Type II volumetosis, because the quantum superpositions are symmetrically
weighted about some classical radius, on average (i.e. after a large number
of measurements), the volume and surface area determined radii agree. However,
because of the oblate spheroid mystery, it suffices to measure whether or not
R_v is greater than R_s. In particular, if R_s is not greater than R_v, Snorlax has Type II volumetosis,
which cannot be cured by Dr. Punch or Dr. Hooper – for this, the Royal Perth
Hospital must bring in an external contractor, known as Dr. Who. Such an affair is
extremely expensive.
From this, decide whether or not Snorlax requires the medical attention of Dr.
Who.
10 Tutorial 7: Romanian High School, Part I – Einstein
Convention and Vector Algebra
The last several tutorials have been rather applied – with the inclusion of some
abstract concepts. However, in order to progress and develop more powerful ma-
chinery, one has to delve into the abstract realm. This is the pattern of mathemat-
ics throughout history as the interplay between the creative abstraction and gen-
eralisation of certain ideas or observations that occur in nature – sometimes new
mathematics is motivated by what we see in nature, other times new insights and
perceptions of nature are generated by new ideas in mathematics. The debate over
whether mathematics is ‘created’ or ‘discovered’ however, is a rather contentious
and heated one – so we’ll avoid it for now.
In this following set of tutorials, we will explore a range of powerful abstract ideas
which play a central role in modern mathematics, physics and engineering. These
typically lead to something of practical advantage – either new and more efficient
calculation techniques or simply another perspective on things. Lie algebras ap-
pear everywhere – from everyday rotations to the algebras that govern quantum
mechanics and high-energy particle physics. Similarly, Clifford algebras play an
increasingly significant role in modern mathematical developments – leading to an
elegant and efficient alternative formulation of vector calculus.
Before we can explore these slightly more advanced topics, we must first prac-
tice the ‘Einstein convention’. This will get you used to seeing objects in their
component form and how to re-express operations that you are familiar with. It
is very general and often saves time by avoiding nasty sigma signs (\sum), as well as
visually keeping track of the dimensionality and rank of objects. Luckily, this is
taught in Romanian high schools 57 in the context of tensor calculus and coordinate-dependent
differential geometry – so you should also be able to do it.
Note that the following tutorial combines the Einstein convention with ‘Ricci cal-
culus’58, which despite being a widespread convention for doing vector and tensor
calculus, is not necessarily the ultimate manner in which to calculate things. For
vector and tensor calculus, Cartan’s exterior calculus and modern coordinate-free
differential geometry often present the most illuminating, efficient and elegant pre-
sentation and calculation techniques – however, the pre-requisites are high and
57
As noted by a friend of the present author, which he met at the Perimeter Institute for Theoretical
Physics, Canada.
58
A coordinate-dependent calculus for tensors, developed by an Italian mathematician, Gregorio
Ricci-Curbastro.
hence will not feature in these tutorials (except in very elementary examples of the
exterior algebra).
10.1 Conventions: Einstein Notation and Vector/Matrix Operations
By now, you will have seen vectors represented in different ways – geometrically as
pointed arrows indicating magnitude and direction, algebraically as a set of compo-
nents in some standard basis or in matrix form as a row / column. So for example,
given a vector v in three dimensions we can write it as:
\vec{v} = v^1 \vec{e}_1 + v^2 \vec{e}_2 + v^3 \vec{e}_3,    (117)
where e1, e2, e3 are the ‘standard basis vectors’59, – i.e. vectors of unit length
pointing in the x, y and z coordinate directions. Alternatively, this vector can be
represented in terms of its components, with respect to some standard basis and
choice of origin:
\vec{v} = (v^1, v^2, v^3) = \begin{pmatrix} v^1 \\ v^2 \\ v^3 \end{pmatrix}.    (118)
Tip: Don’t confuse the raised index v^j for the j-th component of the vector \vec{v} with
‘v to the power of j’. If we want to raise some component of a vector to a power, we
denote this by brackets: for example, (v^j)^3 is the j-th component of \vec{v}, cubed.
Using the ‘sigma notation’ for summation, one can equivalently express the vector
\vec{v} = v^1 \vec{e}_1 + v^2 \vec{e}_2 + v^3 \vec{e}_3 as:
\vec{v} = \sum_{j=1}^{3} v^j \vec{e}_j.    (119)
However, writing in the summation symbol \sum_{j=1}^{3} can, after a while, get a bit
tedious and unnecessary. Luckily however, the physicist Albert Einstein came up
with a very useful (and efficient) convention, that has since become commonplace
in modern physics and some areas of modern mathematics – such as differential
geometry and higher-level linear algebra. In particular, rather than writing \sum_{j=1}^{3}, we
just keep track of the dimensions we are in – since we are in three dimensions, we
59
Sometimes denoted ex, ey, ez or ˆx, ˆy, ˆz.
know that the index j in the term v^j \vec{e}_j has to run over j = 1, 2, 3. So in particular,
by keeping this in mind, we could simply write:
\vec{v} = v^j \vec{e}_j    (120)
and define the appearance of the repeated index j to mean a summation over all
possible values of that index:
v^j \vec{e}_j := \sum_{j=1}^{3} v^j \vec{e}_j = v^1 \vec{e}_1 + v^2 \vec{e}_2 + v^3 \vec{e}_3.    (121)
Henceforth, we shall stick with the notation e_j to denote the ‘j-th’ standard basis vector
and omit the overhead arrow – hence e_j := \vec{e}_j.
DISCLAIMER: Your lecturers might get scared when they see vectors without
arrows on top of them - if they do, tell them that it’s okay and not to be afraid since
professional mathematicians don’t need arrows to know when an object is a vector
or not (most of the time it is obvious by context). We shall however, for clarity,
keep arrows for general vectors in what follows.
We can now formalise these observations as follows.
Definition 3 (Einstein Summation Convention) Given an n-dimensional vector
space (e.g. the familiar 3-dimensional Euclidean space R^3), one denotes the components
of a vector \vec{v} in some standard basis e_1, ..., e_n by (v^i), where the label i
runs over all possible values as determined by the dimension of the vector space –
i = 1, 2, ..., n−1, n. One indicates summation by the repetition of the same index
twice, hence:
\vec{v} = v^i e_i := v^1 e_1 + v^2 e_2 + ... + v^{n-1} e_{n-1} + v^n e_n.    (122)
As such, repeated indices are called ‘dummy indices’ since we can re-label them
to whatever we want. Indices which are not repeated are called ‘free indices’,
since these label fixed components of some object. In general, the number of free
indices will indicate the rank of an object – scalars are rank 0 objects, vectors are
rank 1 objects and matrices represent the components of rank-2 objects (rank-2
tensors). In general you can have objects (tensors) of arbitrary rank – for example,
the Riemann Curvature Tensor mentioned in Tutorial 1 was a rank-4 object since
its components were labelled by four non-repeated indices: R_{\mu\nu\rho\sigma}.
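To see the convention in action, the sum over the repeated index in v^i e_i can be spelled out as an explicit loop. The following Python sketch uses hypothetical components and plain lists. Note that Python indices start at 0 whereas the tutorial's indices start at 1.

```python
n = 3

# Standard basis vectors e_1, ..., e_n stored as rows.
basis = [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]

# Hypothetical components v^1, v^2, v^3.
components = [2.0, -1.0, 5.0]


def expand(components, basis):
    """Return the vector v = v^i e_i, summing over the repeated index i."""
    dimension = len(basis[0])
    return [
        sum(components[i] * basis[i][a] for i in range(len(components)))
        for a in range(dimension)
    ]


v = expand(components, basis)
print(v)
```

The loop variable i plays the role of the dummy index: renaming it changes nothing about the result.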
As an illustration of ‘dummy indices’ (summed indices), we have:
v^i e_i = v^k e_k = v^\alpha e_\alpha    (123)
etcetera. Note that in almost all circumstances, an index should never have to be re-
peated more than twice in an expression to perform a summation – generally there’s
always a way to write your expressions in a way such that you have at most two-
repeated indices, whence in the Einstein summation convention an index should
never appear more than twice – if it does, then the expression is wrong.
As an example of free indices, we have the ‘i-j’th component of a matrix M, labelled
by M_{ij} – this corresponds to the entry in the ith row and the jth column. If
M is a matrix acting on an n-dimensional vector space (i.e. an n × n matrix), both i
and j range independently from 1 to n – hence in general, an n × n matrix can have
up to n^2 independent components: M_{11}, M_{12}, ..., M_{1n}, M_{21}, ..., M_{2n}, ..., M_{n1}, ..., M_{nn}.
Problem 1 (Dot Product) Express the dot-product of two vectors u and v using
the Einstein summation convention.
Example 9 (Matrix Multiplication) In this manner, Einstein notation provides a
simple way to express the product of two matrices A and B. In particular, if we
take A to be an M × N matrix (M rows by N columns) and B to be an N × P
matrix, then their product C is an M × P matrix (M rows by P columns):
C = AB. (124)
We can label the components of C by Cij where i refers to the row and j refers to
the column – hence they range over the values i = 1, 2, ..., M and j = 1, 2, ..., P.
Thus, in Einstein notation, we can express the product of matrices A and B by:
C_{ij} = A_{ik} B^k{}_j := \sum_{k=1}^{N} A_{ik} B^k{}_j.    (125)
Note that for the dummy index k, we have raised the k index on B – for our
purposes, there is not much need to distinguish between raised or lowered indices,
though in general there is! For aesthetic purposes however (hence to speed up
calculation), it is good practice to keep one index raised and one index lowered for
each pair of dummy indices.
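Equation (125) translates directly into nested loops: i and j are free indices that label the entries of C, and k is the dummy index that is summed over. The sketch below uses plain Python lists and two hypothetical matrices. The function name is an assumption of this sketch.

```python
def matmul(A, B):
    """Return C with C_ij = A_ik B_kj, summing over the repeated index k."""
    rows, inner, cols = len(A), len(B), len(B[0])
    if any(len(row) != inner for row in A):
        raise ValueError("A must have as many columns as B has rows")
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical matrices: A is 2 x 3 and B is 3 x 2, so C = AB is 2 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

print(matmul(A, B))
```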
Problem 2 (The Matrix) Some of you may be familiar with the previous rule for
matrix multiplication (125) only implicitly – meaning you know how to multiply
two matrices by visually writing them out and then operating on them. If this is the
case, write down any two compatible matrices A and B, then check that Einstein
notation correctly reproduces the components (AB)ij of their product AB.
Hint: Remember that the matrix product AB is only well-defined if A has the same
number of columns as the number of rows of B. So, for ease, try this with a 2 × 3
matrix A and a 3 × 2 matrix B – this should give you a 2 × 2 matrix AB.
Note that physicists will tend to interchangeably refer to an object by its components
– hence they may view the matrix C as the two-index (rank 2) object C_{ij} or
a vector \vec{v} as the one-index (rank 1) object v^j. This is technically incorrect and it’s
important to remember the difference between the two. In particular, the vector \vec{v} is
an invariant geometric object – this means that it doesn’t depend on any coordinate
system or choice of basis vectors. The components v^i however, implicitly refer
to some a-priori chosen coordinate system or basis. Since we have been using the
standard basis, v^i refers to the ith component of the vector \vec{v} – meaning the component
of \vec{v} pointing in the direction of the ith basis vector, e_i. Hence in physics, referring
to a vector by its components v^i implies that some basis has been chosen
– sometimes the basis is stated explicitly, otherwise it’s normally assumed to refer
to some ‘standard basis’.
Problem 3 (Return to the Matrix) In The Matrix, all objects are represented by
matrices. To simulate reality, all calculations of graphics, rendering and physics
effects are performed using matrix representations of various algebras and vector
/ tensor operations 60. In this manner, all actions performed by inhabitants of The
Matrix are done via matrix multiplication – in general actions do not commute
(meaning you can’t change the order of some actions), since matrix multiplication
doesn’t commute.
To defeat Agent Smith, denoted by the matrix S, Neo Anderson has to act on Agent
Smith sequentially by the bullet matrix B and the Kung-Fu matrix K, which is then
followed by the ‘cheesy line’ matrix C. Express the resulting state of Agent Smith,
represented by the matrix:
CKBS (126)
using Einstein notation – i.e. letting F = CKBS, what are the components Fij of
F in terms of C, K, B and S?
Hint: It may help to find the products sequentially, hence finding BS then KBS
and finally computing CKBS. Note that you will need three sets of different
dummy indices for the product CKBS (three sets of repeated / summed indices)
as well.
Q: What restrictions are there on the dimensions (number of rows and columns)
of S, B, K and C? Furthermore, what dimension does the product CKBS have
60
To some extent, this is how computer games work – vectorising calculations using matrices and
other types of arrays dramatically speeds up computations (in most cases).
in terms of the numbers of rows and columns of C and S? To answer
this, remember the compatibility condition (a restriction on dimensionality stated
earlier) required to multiply two matrices.
10.1.1 Scalar and Vector Products – Dot Product
A vector space naturally comes with a law of multiplying two vectors – the exterior
product (which we will discuss later). In the special case of three dimensions, R^3,
one can extract the cross product of two vectors from the exterior product. The
result is another vector, which is why the cross product is sometimes referred to as the
‘vector product’. On the other hand, given some sort of metric or inner product 61
structure, one may also define the ‘dot product’ of two vectors. The result is a
scalar – which is why it is also referred to as the ‘scalar product’ (although the latter
is more general). We shall recap what one can do with the dot and cross products,
in the context of Einstein notation.
Recall that the dot product of two vectors u and v is related to the angle between
these vectors. This is done explicitly by the following formula:
\vec{u} \cdot \vec{v} = \|\vec{u}\| \|\vec{v}\| \cos(\theta)    (127)
where \|\vec{u}\| and \|\vec{v}\| are the norms (magnitudes or lengths) of \vec{u} and \vec{v}, respectively,
and θ is the angle between \vec{u} and \vec{v}. This relation is made possible due to a special
relation which is only true for positive-definite inner-product spaces – the Cauchy-Schwarz
inequality:
|\vec{v} \cdot \vec{u}| \leq \|\vec{u}\| \|\vec{v}\|    (128)
which is true for any two vectors \vec{u} and \vec{v}.
If two vectors are orthogonal (perpendicular), then their inner product is zero:
u · v = u v cos(
π
2
) = 0. (129)
Conversely, two vectors are parallel: θ = 0, their inner product maximises:
u · v = u v cos(0) = u v (130)
and if they are anti-parallel (pointing in opposite directions): θ = −π, their inner
product minimises:
u · v = u v cos(π) = − u v . (131)
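As a numerical illustration of formulas (127) and (128), the following Python sketch recovers the angle between two vectors from their dot product and checks the Cauchy-Schwarz inequality. The helper names `dot`, `norm` and `angle_between`, and the sample vectors, are illustrative choices rather than part of the tutorial.

```python
import math

def dot(u, v):
    """Euclidean dot product: sum of u[i] * v[i] over the repeated index i."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Length of u, i.e. the square root of u . u."""
    return math.sqrt(dot(u, u))

def angle_between(u, v):
    """Angle theta (radians) from u . v = |u| |v| cos(theta); u, v non-zero."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Sample vectors (assumed for illustration only).
u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]
w = [0.0, 2.0, 0.0]

theta_uv = angle_between(u, v)   # 45 degrees, up to round-off
theta_uw = angle_between(u, w)   # orthogonal vectors: 90 degrees
cauchy_schwarz_holds = abs(dot(u, v)) <= norm(u) * norm(v)
```

Clamping the cosine before calling `math.acos` is a defensive choice: round-off can push the ratio slightly outside [-1, 1] for nearly parallel vectors.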
61
An inner product is the formal name for the dot product – in fact, the dot product is just a special
case of an inner product. It is the Euclidean inner product, since it makes use of the Euclidean notion
of distance – which comes from Pythagoras’ theorem.
To express orthogonality in Einstein notation, we define a special object called the
‘Kronecker delta’:
$$ \delta_{ij} = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases} $$
Hence, for example, δ_11 = 1 and δ_10 = 0. Therefore, in this notation, the components
of the n × n identity matrix I, which consists of 1s down the main diagonal
and zeros everywhere else, are given by:
$$ I_{ij} = \delta_{ij}. \tag{132} $$
Now, the standard (Cartesian) basis vectors e_j that we have been using are in fact
orthonormal – this means that they are mutually orthogonal (perpendicular) and
that they are normalised to have unit length. We can express these conditions in
Einstein notation using the Kronecker delta:
$$ e_i \cdot e_j = \delta_{ij}, \qquad \|e_i\| := \sqrt{e_i \cdot e_i} \ \text{(not summed)} = 1, \tag{133} $$
where i, j = 1, 2, ..., n for vectors in n dimensions. Note that if a repeated
index in Einstein notation is explicitly not summed over, we simply
write ‘(not summed)’ next to the quantity that contains the repeated index. When
a quantity is contracted with the Kronecker delta, it forces the contracted index to take
the same value as the other index in the Kronecker delta, hence:
$$ v^i \delta_{ij} = v_j. \tag{134} $$
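The contraction rule (134) can be spot-checked numerically. The following Python sketch, with the hypothetical helper names `kronecker_delta` and `contract_with_delta`, sums over the repeated index explicitly; it is a sanity check under these assumptions, not a substitute for the hand calculation.

```python
def kronecker_delta(i, j):
    """Return 1 if the two indices agree and 0 otherwise."""
    return 1 if i == j else 0

def contract_with_delta(v, j, n=3):
    """Sum v^i * delta_ij over i = 1..n (1-based indices, as in the text)."""
    return sum(v[i - 1] * kronecker_delta(i, j) for i in range(1, n + 1))

# Sample components (assumed for illustration only).
v = [5.0, -2.0, 7.0]

# Contracting with delta_ij simply picks out the j-th component.
contracted = [contract_with_delta(v, j) for j in (1, 2, 3)]

# The components of the 3 x 3 identity matrix are delta_ij, as in (132).
identity = [[kronecker_delta(i, j) for j in (1, 2, 3)] for i in (1, 2, 3)]
```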
Problem 4 (Baby Steps) In n = 3 dimensions, show explicitly that v^i δ_i2 = v_2 by
summing over i = 1, 2, 3 and using the properties of the Kronecker delta.
Problem 5 (Bigger Baby Steps) Using the fact that e_i · e_j = δ_ij, show explicitly
that the dot product of two vectors, v = v^i e_i and u = u^j e_j, is given by
$$ v \cdot u = v^i u_i. \tag{135} $$
Now, use this expression to write down an expression for the length of a vector v
in Einstein notation.
One final application of the inner product (dot product) here is in the context of
projections. Projections here refer to projecting vectors onto other vectors or subspaces.
Such an operation is very important in mathematics and physics, especially
in the more advanced and abstract settings. One example that you should be familiar
with from high school is analysing the dynamics of objects sliding down inclined
planes – typically you look for the component of the gravitational force directed
down the plane, which is simply the projection of the gravitational force vector in
the direction of a vector pointing down the inclined plane.
Intuitively, the name ‘projection’ is motivated if you think of two vectors v and
a starting at the same point, with vector a lying on the ground and vector v pointing
upwards at some angle. The projection of v onto the vector a is then the shadow
that the vector v casts onto a – in general, this projection can be shorter, longer
or the same length as the vector a. A natural tool for mathematically formulating
projections is the inner product – for example, the dot product. In particular, the
dot product of two vectors gives you the magnitude of each vector in the direction
of the other vector multiplied by the length of the other vector. Formally, we define
projections as follows.
Definition 4 (Vector Projections) The vector projection of vector v onto a vector
a is given by
$$ \mathrm{Proj}_v a := \left( \frac{v \cdot a}{\|a\|} \right) \frac{1}{\|a\|} \, a, \tag{136} $$
which is equivalent to
$$ \mathrm{Proj}_v a = \left( \|v\| \cos(\theta) \right) \frac{1}{\|a\|} \, a, \tag{137} $$
where θ is the angle between v and a. The number (v · a)/‖a‖ = ‖v‖ cos(θ), which
multiplies the unit vector a/‖a‖, is called the ‘scalar projection’ of v onto a or the
‘component of v along a’. Hence one can view the vector projection of v onto a
as the unit vector a/‖a‖ in the direction of a multiplied by the component of v along
a – i.e. a vector pointing in the direction of a which has the length of the scalar
projection of v onto a.
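Definition 4 translates almost line by line into code. The Python sketch below computes the scalar projection (v · a)/‖a‖ and the vector projection of formula (136); the function names and the sample vectors are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean length of u."""
    return math.sqrt(dot(u, u))

def scalar_projection(v, a):
    """Component of v along a: (v . a) / |a|, for non-zero a."""
    return dot(v, a) / norm(a)

def vector_projection(v, a):
    """Scalar projection multiplied by the unit vector a / |a|, as in (136)."""
    s = scalar_projection(v, a)
    length = norm(a)
    return [s * ai / length for ai in a]

# Sample vectors (assumed for illustration only).
v = [3.0, 4.0, 0.0]
a = [2.0, 0.0, 0.0]

comp = scalar_projection(v, a)   # 3.0
proj = vector_projection(v, a)   # [3.0, 0.0, 0.0]

# What is left of v after removing its shadow on a is orthogonal to a.
remainder = [vi - pi for vi, pi in zip(v, proj)]
```

Note that the length of `a` drops out of the result: only its direction matters, which is why the projection here has length 3 even though `a` has length 2.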
Problem 6 (Bulgarian Baby Steps) Using Einstein notation, write down the previous
formulas for the projection of a vector v onto a vector a.
Hint: This means finding the components (Proj_v a)^k of Proj_v a and then multiplying
them by the standard basis vectors e_k:
$$ \mathrm{Proj}_v a = (\mathrm{Proj}_v a)^k e_k. \tag{138} $$
Problem 7 (Russian Baby Steps) Using the Kronecker delta relations e_i · e_j =
δ_ij between the standard basis vectors (of unit length) stated earlier, compute the
vector and scalar projections of the vector v = v^i e_i in n = 3 dimensions onto the
following vectors:
• e_1
• e_2
• e_3
• r̂ = (e_1 + e_2 + e_3)/√3
• R̂ = sin(θ) cos(φ) e_1 + sin(θ) sin(φ) e_2 + cos(θ) e_3
• φ̂ = − sin(φ) e_1 + cos(φ) e_2
• θ̂ = cos(θ) cos(φ) e_1 + cos(θ) sin(φ) e_2 − sin(θ) e_3
You have just found the components of v in a Cartesian basis {e_1, e_2, e_3} as well
as a spherical-coordinate basis {R̂, φ̂, θ̂}.
10.1.2 Scalar and Vector Products – The Permutation Symbol
As stated earlier, one can define different types of multiplication between vectors.
An ‘inner product’ or ‘scalar product’ – such as the dot product, multiplies two
vectors to give a number. However, another fundamentally useful type of multi-
plication between vectors is given by the ‘cross product’ or ‘vector product’. This
operation takes two vectors v and u, then produces a vector v × u whose direction
is perpendicular to both v and u, with a magnitude equal to the area of the
parallelogram formed by the vectors v and u. To define the cross-product in Einstein
notation, we must first introduce a special object which appears everywhere in
vector calculus and tensor calculus (Ricci calculus) / differential geometry.
Definition 5 (Levi-Civita Symbol) The ‘Levi-Civita’ or ‘permutation’ symbol
ε_ijk in three-dimensional Euclidean space (so i, j, k can each take any
value from 1 to 3) is a totally anti-symmetric object (but not a tensor!), which has
the following properties:
$$ \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i, j, k) \text{ is } (1, 2, 3),\ (2, 3, 1) \text{ or } (3, 1, 2), \\ -1 & \text{if } (i, j, k) \text{ is } (3, 2, 1),\ (1, 3, 2) \text{ or } (2, 1, 3), \\ 0 & \text{if } i = j \text{ or } j = k \text{ or } k = i. \end{cases} $$
We could alternatively define the permutation symbol to have the following properties:
1. Standard Orientation: ε_123 := +1
2. Total Antisymmetry: ε_ijk = −ε_jik = −ε_ikj,
from which the previous properties would follow.
Problem 8 (Simple Proof) Prove that the properties:
1. Standard Orientation: ε_123 := +1
2. Total Antisymmetry: ε_ijk = −ε_jik = −ε_ikj,
imply the following properties:
$$ \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i, j, k) \text{ is } (1, 2, 3),\ (2, 3, 1) \text{ or } (3, 1, 2), \\ -1 & \text{if } (i, j, k) \text{ is } (3, 2, 1),\ (1, 3, 2) \text{ or } (2, 1, 3), \\ 0 & \text{if } i = j \text{ or } j = k \text{ or } k = i. \end{cases} $$
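As such, the permutation symbol obeys the following multiplication property, which
comes from its relation to the matrix determinant:
$$ \varepsilon_{ijk} \varepsilon_{lmn} = \begin{vmatrix} \delta_{il} & \delta_{im} & \delta_{in} \\ \delta_{jl} & \delta_{jm} & \delta_{jn} \\ \delta_{kl} & \delta_{km} & \delta_{kn} \end{vmatrix} \tag{139} $$
$$ = \delta_{il} \left( \delta_{jm} \delta_{kn} - \delta_{jn} \delta_{km} \right) - \delta_{im} \left( \delta_{jl} \delta_{kn} - \delta_{jn} \delta_{kl} \right) + \delta_{in} \left( \delta_{jl} \delta_{km} - \delta_{jm} \delta_{kl} \right) \tag{140} $$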
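The multiplication property (139) can be verified by brute force over all 3^6 index combinations. The Python sketch below does this; the closed-form expression used in `levi_civita` is one convenient way to encode the symbol for indices in {1, 2, 3} and is an implementation choice, not something taken from the tutorial. A numerical check of this kind complements a proof but does not replace one.

```python
from itertools import product

def kronecker_delta(i, j):
    """Return 1 if the two indices agree and 0 otherwise."""
    return 1 if i == j else 0

def levi_civita(i, j, k):
    """Permutation symbol for indices in {1, 2, 3}.

    (i - j)(j - k)(k - i) equals +2 for even permutations of (1, 2, 3),
    -2 for odd permutations and 0 when any index is repeated.
    """
    return (i - j) * (j - k) * (k - i) // 2

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def delta_determinant(i, j, k, l, m, n):
    """Right-hand side of (139): determinant of Kronecker deltas."""
    rows = [[kronecker_delta(r, c) for c in (l, m, n)] for r in (i, j, k)]
    return det3(rows)

indices = (1, 2, 3)

# Check (139) for every choice of the six free indices.
identity_holds = all(
    levi_civita(i, j, k) * levi_civita(l, m, n)
    == delta_determinant(i, j, k, l, m, n)
    for i, j, k, l, m, n in product(indices, repeat=6)
)

# Full contraction: sum of eps_ijk * eps_ijk over i, j, k.
full_contraction = sum(
    levi_civita(i, j, k) ** 2 for i, j, k in product(indices, repeat=3)
)
```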
Problem 9 (Not-so simple proof) Using the multiplication property, prove the following
contraction properties:
$$ \begin{aligned} \varepsilon_{ijk} \varepsilon^{imn} &= \delta_j^m \delta_k^n - \delta_j^n \delta_k^m \\ \varepsilon_{jmn} \varepsilon^{imn} &= 2 \delta^i_j \\ \varepsilon_{ijk} \varepsilon^{ijk} &= 6. \end{aligned} \tag{141} $$
Hint: Remember that one is summing each pair of repeated indices from 1 to 3.
One should also note the useful observation that δ^j_j = 3 – to see this, recall that in
sigma-notation, this is the same as saying
$$ \sum_{j=1}^{3} \delta^j_j = \delta^1_1 + \delta^2_2 + \delta^3_3 = 1 + 1 + 1 = 3. $$
10.1.3 Scalar and Vector Products – The Cross Product
In Einstein notation, the components of the cross-product of two vectors v = v^j e_j
and u = u^j e_j in three dimensions are given by the following formula:
$$ (u \times v)_i = \varepsilon_{ijk} u^j v^k, \tag{142} $$
whence the resulting vector is given by multiplying these components by the standard
basis vectors and summing them:
$$ (u \times v) = \varepsilon^{ijk} u_j v_k e_i. \tag{143} $$
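Formula (142) can be implemented directly as a double sum over the repeated indices j and k. The Python sketch below does so with 1-based indices to match the text; the closed form used for `levi_civita` and the sample vectors are assumptions for illustration.

```python
def levi_civita(i, j, k):
    """Permutation symbol for indices in {1, 2, 3} (closed-form encoding)."""
    return (i - j) * (j - k) * (k - i) // 2

def cross(u, v):
    """Components (u x v)_i = eps_ijk u^j v^k, summed over j, k = 1..3."""
    idx = (1, 2, 3)
    return [
        sum(levi_civita(i, j, k) * u[j - 1] * v[k - 1] for j in idx for k in idx)
        for i in idx
    ]

def dot(u, v):
    """Euclidean dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Sample vectors (assumed for illustration only).
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]

w = cross(u, v)                  # [-3.0, 6.0, -3.0]
swapped = cross(v, u)            # the negative of w
perpendicular_to_u = dot(u, w)   # 0.0
perpendicular_to_v = dot(v, w)   # 0.0
```

Of the nine terms in each double sum only two are non-zero, which is why the familiar component formula for the cross product has two terms per component.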
Problem 10 (Simple Proof) Using the usual formula for the cross-product that
you are used to (for example – via matrix determinants), prove that this is equiva-
lent to the formula given above in Einstein notation.
As such, we can see immediately why the cross-product is an antisymmetric oper-
ation:
(u × v) = −(v × u). (144)
Problem 11 (Easy Proof) Prove the antisymmetry property of the cross product
using the anti-symmetry properties of the Levi-Civita symbol ε_ijk and the formula
for the cross-product in Einstein notation.
Note that we claimed earlier that the cross product of two vectors produces a vector
which is perpendicular to both of the vectors you are crossing. This can be seen
very easily using Einstein notation and the anti-symmetry properties of the Levi-Civita
symbol:
$$ \begin{aligned}
u \cdot (u \times v) &= u^i (u \times v)_i = u^i \varepsilon_{ijk} u^j v^k \\
&= \varepsilon_{ijk} u^i u^j v^k \\
&= \varepsilon_{ijk} u^j u^i v^k && \text{swapping } u^i \text{ and } u^j \\
&= -\varepsilon_{jik} u^j u^i v^k && \text{interchanging } i \text{ and } j \text{ in } \varepsilon_{ijk} \\
&= -\varepsilon_{ijk} u^i u^j v^k && \text{relabelling the dummy indices } i \text{ and } j \\
&= -u \cdot (u \times v).
\end{aligned} \tag{145} $$
Hence, since u · (u × v) = −u · (u × v), we conclude that u · (u × v) = 0,
which means that u is perpendicular to (u × v) using the properties of the dot
product.
The cross product also has a geometric interpretation, which comes from the following
formula for the magnitude of the cross product:
$$ \|u \times v\| = \|u\| \, \|v\| \sin(\theta) \tag{146} $$
where θ is the angle between u and v.
Problem 12 (Easy Proof) Using the previous formula (146), prove that u×v = 0
whenever u and v are parallel.
Furthermore, argue geometrically on the basis of the previous formula (146), why
u · (u × v) = 0.
You may recognize that the quantity ‖u × v‖ given by the formula (146) is simply
the area of a parallelogram with sides u and v.
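The area interpretation can be checked numerically by computing ‖u × v‖ in two ways: from the components of the cross product, and from formula (146) with the angle obtained from the dot product. The Python sketch below uses sample vectors chosen for illustration (a parallelogram with base 2 and height 3); the helper names are assumptions.

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean length of u."""
    return math.sqrt(dot(u, u))

def cross(u, v):
    """Cross product of two 3-dimensional vectors, written out by component."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]

# Sample vectors (assumed): base of length 2 along x, height 3 in y.
u = [2.0, 0.0, 0.0]
v = [1.0, 3.0, 0.0]

theta = math.acos(dot(u, v) / (norm(u) * norm(v)))

area_from_cross = norm(cross(u, v))                     # 6.0
area_from_angle = norm(u) * norm(v) * math.sin(theta)   # 6.0 up to round-off
```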
Problem 13 Given vectors v and u with units of length: [v] = [u] = L, use
dimensional analysis and formula (146) to show that their cross product has units
of area, L².
It may seem strange that the cross-product produces a vector with different units to
each of the vectors you are crossing – however, this is natural when you consider
the applications of the cross-product to physics and engineering. More importantly,
it relates to the fact that the cross product is in fact a ‘pseudovector’ rather than a
true vector – a concept which is only properly understood in the context of a more
general product called the ‘exterior product’ and an operation called the ‘Hodge
dual’. You can, however, think of a pseudovector as one which behaves like a
vector when rotated, but picks up an extra sign change relative to a true vector
when reflected.
Problem 14 (Simple Calculations) Using the area formula for the cross product,
compute areas of the parallelograms formed by the following sets of vectors in
three dimensions:
• e1 and e2
• e1 and e3
• e2 and e3.
Finally, we finish on a rather important set of identities.
Problem 15 (The Rotation Algebra: so(3)) Using the properties of the cross-product
listed in this tutorial (or otherwise), prove the following critical identities for the
standard Cartesian basis vectors in three dimensions:
• e1 × e2 = e3
• e2 × e3 = e1
• e3 × e1 = e2.
Hint: It suffices to show that (e_j × e_k) = ε_ijk e_i.
11 Tutorial 8: Design a Death Star – applications of Lie
Groups/Algebras
In this tutorial, we investigate how one can apply the theory of Lie groups and
Lie algebras to the construction and design of an orbital death star 62 – in particular,
an orbital space station equipped with high intensity Bose-Einstein condensate
based gamma-ray LASERS, naval anti-missile lasers, electromagnetic rail guns
and nuclear warheads.
When it comes to military technology, the most advanced science often takes place
in the form of weapons targeting, tracking and detection systems – a recent exam-
ple is the huge investment in stealth technology and C.I.A drone reconnaissance
by the United States military. This is because target detection and acquisition is
paramount – after all, you can’t eliminate something if you can’t detect it and aim
at it. Even master Sun Tzu understood the importance of this element of warfare 63.
To this extent, we will see how the rotational Lie groups and Lie algebras, realized
in matrix form, can be used to orient an orbital space station along with the gun
turrets it is equipped with. We conclude by looking at the quaternionic represen-
tation of the rotation group – which leads us to the first solid historical example
of an abstract algebra (a ‘generalization’ of complex numbers), constructed by the
famous Irish polymath – Sir William Rowan Hamilton.
This tutorial will make use of matrices and matrix algebra, abstract algebras and
group theory, vectors, rotations and various physical concepts. As such it should
be mastered by engineering, physics, computer science and math students alike.
Hopefully, it will unify and consolidate various areas of your studies – and maybe
convince you to get a job in weapons design/satellite programming.
11.1 Notation
For this tutorial, we will be sticking to Einstein notation (see Tutorial 7) – this
means that whenever we see two indices repeated in some quantity, we are
summing this quantity over all possible values of those indices (omitting the summation
symbol Σ). So for example, we denote a 3-dimensional real vector v in
terms of a standard basis e_1, e_2, e_3 as:
$$ v = v^i e_i, \tag{147} $$
where the contracted index i ranges across i = 1, 2, 3:
$$ v^i e_i := v^1 e_1 + v^2 e_2 + v^3 e_3. \tag{148} $$
62
For those of you who haven’t seen Star Wars, a death star is a large spherical-ish spaceship, the
size of a small moon, equipped with a beam weapon which can destroy entire planets.
63
For those of you who need to read more – Sun Tzu’s “Art of War”. The Giles translation is
recommended.
As before, we keep one index raised and one index lowered for a pair of repeated
indices 64. Furthermore, components of vectors are raised – hence v^j refers to
the j-th component of the vector v (not the j-th power), for example. For
those of you who didn’t attempt Tutorial 7, you are probably most familiar with
representing a vector by its components – v = (v^1, v^2, ..., v^n) – this notation is
fine, yet elementary, as it hides the choice of basis (which is assumed to be the
standard basis) by only displaying the components of the vector.
11.2 BFF: Linear Maps and Matrices
As one progresses in the mathematical sciences, one frequents the land of matrix
operations – for proofs, problems and simplifying calculations. Perhaps the main
reason for their popularity is that, once a basis is chosen, there is a one-to-one
correspondence between matrices and linear maps on finite-dimensional vector spaces.
In particular, a linear map L̂ on a vector
space V (e.g. 3-dimensional Euclidean space R^3) is defined as follows.
Definition 6 A linear map L̂ : V → V, which maps the vector space V to itself, is
one which has the following property:
• Linearity:
$$ \hat{L}(au + bw) = a\hat{L}(u) + b\hat{L}(w) \quad \forall u, w \in V, \ \forall a, b \in F \tag{149} $$
where F is some number field (e.g. the real numbers R or the complex numbers C).
How does this correspond to matrices? Notice that if we represent a vector v =
v^i e_i := v^1 e_1 + ... + v^n e_n in an n-dimensional vector space (e.g. R^n) as a column
vector:
$$ v = \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} \tag{150} $$
then we can readily compute the action of some matrix on this vector via matrix
multiplication. In particular, the action of an n × n matrix A on an n-dimensional
vector v will produce another n-dimensional vector, u = Av – which we call the
transformation of the vector v by the matrix A. For example:
$$ Av = \begin{pmatrix} A^1_1 & A^1_2 & \cdots & A^1_n \\ A^2_1 & A^2_2 & \cdots & A^2_n \\ \vdots & \vdots & \cdots & \vdots \\ A^n_1 & A^n_2 & \cdots & A^n_n \end{pmatrix} \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} = \begin{pmatrix} A^1_1 v^1 + A^1_2 v^2 + \dots + A^1_n v^n \\ A^2_1 v^1 + A^2_2 v^2 + \dots + A^2_n v^n \\ \vdots \\ A^n_1 v^1 + A^n_2 v^2 + \dots + A^n_n v^n \end{pmatrix} \tag{151} $$
64
A convention which matters in non-Euclidean spaces, since it helps to distinguish covariant
tensors (e.g. covectors such as the total differential) from contravariant ones (e.g. your usual vectors).
Alternatively, in Einstein notation, the action of the matrix A on the vector v is
given by:
$$ u = A^i_j v^j e_i \tag{152} $$
where the component A^i_j of the matrix A corresponds to the entry in the i-th row
and j-th column of A 65. The contracted indices i and j run over 1 to n (the
dimension of the vector space in which v lives).
Now, if one recalls, the action of matrices on vectors is linear – that is, given any
scalars λ, γ and any n-dimensional vectors v and u, then for any n × n matrix A
we have:
$$ A(\lambda v + \gamma u) = \lambda A v + \gamma A u, \tag{153} $$
hence matrices obey the linearity property required by linear maps. In this sense,
we can think of the components of a matrix as the components of a linear map in
some chosen basis – conversely, by computing the action of a linear map L̂ on a
set of basis vectors {e_j}, we can determine its components in that basis – which
we can view as entries in some matrix. To make this explicit with some examples,
we shall see how rotation maps can be realized in matrix form.
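The contraction u^i = A^i_j v^j of formula (152) and the linearity property (153) can both be sketched in a few lines of Python. Here `A[i][j]` stores the entry in row i and column j (0-based in the code, 1-based in the text); the function name `apply_matrix`, the sample matrix and the sample vectors are assumptions made for illustration.

```python
def apply_matrix(A, v):
    """Return u with u^i = A^i_j v^j, summing over the repeated index j."""
    n = len(v)
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

# Sample matrix, vectors and scalars (assumed for illustration only).
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]
v = [1.0, 1.0, 2.0]
w = [0.0, 1.0, -1.0]
lam, gam = 2.0, -3.0

u = apply_matrix(A, v)   # [3.0, 1.0, 5.0]

# Linearity: transform the combination, or combine the transforms.
combined_first = apply_matrix(A, [lam * vi + gam * wi for vi, wi in zip(v, w)])
transformed_first = [
    lam * a + gam * b for a, b in zip(apply_matrix(A, v), apply_matrix(A, w))
]
```

The two lists `combined_first` and `transformed_first` agree, which is property (153) checked for one particular choice of vectors and scalars.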
65
Rows have a raised index and columns have a lowered index – taking the transpose of the matrix
reverses this.
Exercise 16 (Apocalypse Now) Being quite bored of mathematics, physics, swordplay,
music and games, Thomas McKenney chooses to partake in a new pastime –
world domination. He decides the best way to undertake this is to build his own
Star Wars-inspired Orbital Death Star. The St. George’s College Board decides to
fund Thomas in this pursuit – agreeing that world domination fits into the cultural
expansion program as well as securing funding for building maintenance. To this
extent, Thomas realizes he must complete the St. George’s College Mathematical
Sciences tutorials in order to prepare his laser targeting algorithms. To aid Thomas
in this noble enterprise, think of a way to mathematically express the statement –
“by computing the action of a linear map L̂ on a set of basis vectors {e_j}, we can
determine its components in that basis.”
Hint: Compare the action of a linear map L̂ on a vector v with the action of some
matrix A on v – in particular, compare the coefficients of standard basis vectors
{e_j} in the resulting transformed vectors: L̂(v) and Av. Now look at the special
case when v is simply equal to one of the standard basis vectors e_j.
11.3 SO(3): The Lie Group of Rotations
In 3-dimensional Euclidean space, there are three independent axes of rotation
in any given coordinate system. Rotations of vectors are linear maps – to see
this, complete the following exercise.
Exercise 17 (Microsoft Death Star) Linear operations are nice – firstly because
they are relatively simple and secondly because they can be represented by matrices,
meaning that they are easy to program and implement in computer algorithms.
Therefore, to build a feasible laser targeting system, one would hope that programming
the rotation of the laser turret amounts to linear operations. Taking
an interest in weapons targeting systems, Emma Krantz decides to program such
a system for her programming competition – to assess the feasibility, she has to
prove that rotations are linear operations.
Let R̂ represent some 3-dimensional rotation operation and v be some 3-dimensional
vector. Argue geometrically that the action of R̂ on the vector v is linear – i.e. show
that R̂ satisfies the linearity property required by a linear map.
Hint: Given a 3-dimensional vector v, we can always scale it by some number
λ ∈ R. If |λ| > 1 we dilate the length of the vector and if |λ| < 1 we contract it.
Furthermore, if λ > 0 we preserve the orientation of the vector and if λ < 0 we
reverse it. Argue that scaling first v → λv and then rotating the resulting vector
λv is the same as first rotating v and then scaling it by λ – this shows that rotation
is a degree 1 homogeneous operation.
Hint XP: Further show that adding two vectors v + u and then rotating the sum
of the two vectors is the same as rotating each of the vectors separately (by the
same rotation) and then adding the individual rotated vectors. This shows that
rotations are additive operations – if you combine this property with the degree 1
homogeneous property, this gives the linearity property which proves that rotations
are linear maps.
If one sets up a 3-dimensional Cartesian coordinate system, with coordinates x, y, z
(or x1, x2, x3) and standard basis vectors e_1, e_2, e_3 corresponding to unit vectors
in the x, y and z directions, respectively, then one has three independent rotation
operators R_1, R_2 and R_3 which rotate vectors about each of the corresponding
axes (x, y and z). These are linear maps and hence can be represented as 3 × 3
matrices. We can also view them as functions of the angle which they rotate by.
Explicitly, these matrices are:
$$ R_1(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{154} $$
$$ R_2(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \tag{155} $$
$$ R_3(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{156} $$
Geometrically, R1(θ) rotates any vector v anti-clockwise 66 by an angle θ about
the x-axis – this means it rotates v in a plane perpendicular to the x-axis. Simi-
larly, R2(β) rotates by an angle β anticlockwise about the y-axis and R3(γ) rotates
anticlockwise by an angle γ about the z-axis.
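The three matrices (154)-(156) are easy to write down in code. The Python sketch below builds them as nested lists and rotates the basis vector e_1 by a quarter turn about the z-axis; the function names `rotation_x`, `rotation_y`, `rotation_z` and `apply_matrix` are hypothetical labels for R_1, R_2, R_3 and the matrix action.

```python
import math

def rotation_x(theta):
    """R_1(theta): anticlockwise rotation by theta about the x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_y(beta):
    """R_2(beta): anticlockwise rotation by beta about the y-axis."""
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_z(gamma):
    """R_3(gamma): anticlockwise rotation by gamma about the z-axis."""
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_matrix(A, v):
    """Matrix acting on a column vector: u^i = A^i_j v^j."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

e1 = [1.0, 0.0, 0.0]

# A quarter turn about the z-axis carries e1 onto e2 (up to round-off).
quarter_turn = apply_matrix(rotation_z(math.pi / 2), e1)

# Rotating e1 about the x-axis leaves it where it is.
unchanged = apply_matrix(rotation_x(0.7), e1)
```

Angles are in radians, and `math.cos(math.pi / 2)` is only approximately zero in floating point, so comparisons should use a small tolerance.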
Exercise 18 (Eigenvectors of Rotation) Clearly if you have a vector that lies along
the x-axis and you rotate it about the x-axis, nothing happens to the vector. This is
because any vector lying along the x-axis is an eigenvector of the x-rotation matrix
R1(θ). More generally, if we rotate any vector v = v1e1 + v2e2 + v3e3, about the
j-th coordinate axis, then its j-th component will not change.
Q: Using matrix multiplication and representing each vector as a column vector,
show that:
Rj(θ)ej = ej, (no summation) (157)
which means that the standard basis vector ej is an eigenvector of the rotation
operator Rj with eigenvalue 1.
66
Almost always in mathematics, anti-clockwise is considered to be a positive orientation and
clockwise is considered to be negative.
Now, using the previous result and the fact that rotations are linear operators,
prove that 67:
$$ R_j(\theta) v = \sum_{k \neq j} \left( v^k R_j(\theta) e_k \right) + v^j e_j, \tag{158} $$
where the summed index k ≠ j means you sum over all values (1, 2, 3) of k not
equal to j. Hence, rotations about a given axis preserve the component of any
vector along that axis.
Problem 16 (The Proof is Trivial) If you rotate a vector about an axis through an
angle of zero degrees, the vector should remain unchanged.
Verify that all three rotation operators Rj(θ) become the 3 × 3 identity matrix (the
matrix with 1’s down the main diagonal entries and zeros everywhere else) when
you set the angle θ = 0.
As it turns out, the set of rotation matrices forms a mathematical structure known
as a ‘Lie Group’. As such they are used for lying/truth algorithms. Actually that’s
a lie – they are actually a type of ‘continuous’ or rather ‘smooth’ group (as opposed
to a discrete group) named after the mathematician Sophus Lie, who developed and
pioneered them. Lie groups are of fundamental importance to modern physics and
mathematics – in fact, they are the core element underlying major developments in
particle physics 68, high energy physics and gauge theory. We define a Lie group
as follows.
Definition 7 A Lie group G is a differentiable manifold which is also a group
whose group operations are smooth (infinitely differentiable). This means that G
equipped with the operation ∗ satisfies the group properties
• Closure/Binary Operation: If A, B ∈ G then A ∗ B ∈ G.
• Associativity: For any A, B, C ∈ G, A ∗ (B ∗ C) = (A ∗ B) ∗ C.
• Identity Element: ∃I ∈ G such that I ∗ A = A ∗ I = A, ∀A ∈ G.
• Inverses: For any A ∈ G, ∃B ∈ G such that A ∗ B = B ∗ A = I. If ∗ is a multiplicative
operation, we denote B = A^−1, the inverse of A. If ∗ is additive (or
commutative), we denote B by −A.
where ∗ is a binary operation 69 (e.g. matrix multiplication) which is smooth.
67
This is not using the Einstein summation convention – so v^j e_j is for a fixed value of j, not a
sum over all possible values of the index j.
68
The Standard Model of particle physics is in fact built on a Lie group – this tells us the symmetries that
nature obeys for the electromagnetic, weak and strong nuclear forces.
Exercise 19 (YOLO) Unaware of the on-going ‘Project Death Star’ of St. George’s
College, University Hall decides to hold a party to show how awesome they are.
After shouting YOLO, a drunken University Hall student jumps into a pit of horny
honey badgers and dies a humiliating death. Despite making it into the prestigious
Darwin Awards, this is tragic because that student lived a life without ever proving
that the real numbers R form a group under addition – and that the non-zero real
numbers R \ {0} form a group under multiplication.
Using your wisdom and foresight to avoid a similar fate, prove that the real numbers
form a group under the addition operation + with 0 being the additive identity
element. Similarly, prove that the set of non-zero real numbers forms a group
under the multiplication operation × with 1 being the multiplicative identity element.
Together with commutativity and the distributive law, these statements imply that
the real numbers form a special mathematical structure called a ‘field’.
Rotations form the Lie Group SO(3), which is the 3-dimensional ‘Special Orthogonal
Group’. This group is characterized as the set of 3 × 3 matrices {A} which
have the following properties 70
• det(A) = 1
• A^T A = I
for any rotation matrix A, where I is the 3 × 3 identity matrix. Since the determinant
of a linear map tells you how the map distorts volumes, the first condition (the
‘Special’ part) says that rotations preserve volumes – this is a consequence of the
more general observation that rotations are isometries of Euclidean space, meaning
that they preserve lengths of vectors and relative angles between vectors (rotating
any pair of vectors simultaneously leaves the angle between them unchanged).
Furthermore, since the second condition (the ‘Orthogonal’ part) can be written as:
$$ A^T = A^{-1} \tag{159} $$
where A^−1 is the inverse of the rotation matrix A, the second condition says that
rotations are orthogonal 71 transformations – meaning they preserve orthogonality
of vectors (or that the column vectors in a rotation matrix are mutually orthogonal).
Hence, the second property comes from the fact that isometries preserve angles
between objects.
69
A binary operation ∗ on a set V is one that combines two elements a, b of V to give another
element of V : a ∗ b = c ∈ V. Examples of binary operations include addition of numbers or vectors,
multiplication of numbers and cross products of vectors.
70
Recall that det(A) means the matrix determinant of A and A^T denotes the matrix transpose of A.
71
Recall that orthogonal is the mathematical term for ‘perpendicular’.
Note that the group operation for SO(3) is matrix multiplication – which is a
smooth operation since it essentially amounts to the multiplication and addition
of numbers.
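The two defining properties of SO(3) can be spot-checked numerically for a particular rotation. The Python sketch below tests det(A) = 1 and A^T A = I for a rotation about the y-axis by an arbitrary sample angle; the helper names are assumptions, and a check at one angle is of course not a proof for all angles.

```python
import math

def rotation_y(beta):
    """R_2(beta): anticlockwise rotation by beta about the y-axis."""
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def transpose(A):
    """Swap rows and columns."""
    return [list(column) for column in zip(*A)]

def matmul(A, B):
    """Product of two 3 x 3 matrices: (AB)^i_j = A^i_k B^k_j."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def max_abs_difference(A, B):
    """Largest entry-wise difference between two 3 x 3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

R = rotation_y(0.3)  # sample angle in radians (assumed)

det_error = abs(det3(R) - 1.0)                                   # 'Special'
orthogonality_error = max_abs_difference(matmul(transpose(R), R),
                                         IDENTITY)               # 'Orthogonal'
```

Both errors should be at the level of floating-point round-off, since cos² + sin² = 1 is only satisfied approximately by computed values.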
Exercise 20 (Killing Time) Whilst waiting on the construction of the death star
by the St. George’s College engineering, science and mathematics students (as
well as legal approvals from Georgian law graduates), Thomas feels the urge to
kill – kill time that is. As a member of the St. George’s College Orbital Death Star,
help Thomas kill time by explicitly showing that the rotation matrices Rj(θ) satisfy
the two properties which characterize the special orthogonal group, SO(3).
Hint: It helps to show that for any rotation matrix R_j(θ), one has (R_j(θ))^T =
R_j(−θ) = (R_j(θ))^−1, which can be argued geometrically and/or algebraically
using the fact that cosine is an even function 72, cos(θ) = cos(−θ), and that sine is
odd: sin(−θ) = − sin(θ).
Exercise 21 (Group Project: Project Death Star) In an attempt to understand
rotations better for the programming of a weapons targeting system on the Geor-
gian Death Star, the members of the SGC Mathematical Sciences Tutorial sit down
and try to prove that the set of rotation matrices, SO(3), form a group. Since this
includes you, complete this proof. This means verifying that SO(3) satisfies the
four properties required to be a group, with matrix multiplication being the group
operation.
Hint: Recall how the 3 × 3 identity matrix I_3 acts on a 3-dimensional vector v –
that is, I_3 v = v. Furthermore, to show that every element of SO(3) has an inverse,
consider R_u(θ) – an arbitrary rotation operator which rotates objects anticlockwise
through an angle θ about an axis defined by the vector u, then consider how
one would undo rotations performed by R_u(θ).
72
Recall that even functions f(x) are symmetric about x = 0 and odd functions are anti-symmetric.
Because the Lie Group SO(3) is transitive, we can write any general rotation as
a product of finitely-many rotation matrices. For us, this means that we can write
any rotation as a sequence of rotations about the x, y and z axes:
$$ R(\alpha, \gamma, \beta) = R_3(\gamma) R_2(\beta) R_1(\alpha). \tag{160} $$
Note that since matrix multiplication is not commutative, the order in which we multiply
(hence the order in which we rotate) matters. In particular, when the rotation
R(α, γ, β) given by (160) acts on a vector v, it rotates it first by an angle α about
the x-axis, then by an angle β about the y-axis and finally by an angle γ about
the z-axis. In general, we could write down a matrix R_u(θ) which rotates objects
anticlockwise about some axis defined by the vector u through an angle θ – indeed,
such a matrix is given by the (easy-to-prove) ‘Rodrigues’ rotation formula’, which
we will investigate later.
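Both points made here, that the product (160) acts on a vector right-to-left and that the order of rotations matters, can be illustrated numerically. In the Python sketch below the helper names and the sample angles are assumptions chosen for illustration.

```python
import math

def rotation_x(t):
    """R_1(t): anticlockwise rotation about the x-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_y(t):
    """R_2(t): anticlockwise rotation about the y-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_z(t):
    """R_3(t): anticlockwise rotation about the z-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 3 x 3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_matrix(A, v):
    """Matrix acting on a column vector."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

alpha, beta, gamma = 0.4, -1.1, 2.0   # arbitrary sample angles in radians
v = [1.0, 2.0, 3.0]

# The product R_3 R_2 R_1 applied at once ...
R = matmul(rotation_z(gamma), matmul(rotation_y(beta), rotation_x(alpha)))
all_at_once = apply_matrix(R, v)

# ... agrees with rotating about x first, then y, then z.
step_by_step = apply_matrix(
    rotation_z(gamma),
    apply_matrix(rotation_y(beta), apply_matrix(rotation_x(alpha), v)),
)

# Order matters: two quarter turns applied to e1 in opposite orders.
quarter = math.pi / 2
e1 = [1.0, 0.0, 0.0]
y_then_x = apply_matrix(matmul(rotation_x(quarter), rotation_y(quarter)), e1)
x_then_y = apply_matrix(matmul(rotation_y(quarter), rotation_x(quarter)), e1)
```

Up to round-off, `y_then_x` is e_2 while `x_then_y` is −e_3, so swapping the order of the two quarter turns sends e_1 to different places.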
Exercise 22 (Spring Cleaning) After finally getting building and environmental
approvals, as well as successfully subduing Greens Party protesters, St. George’s
College sends Project Death Star into its testing phase. Having a particular dis-
taste for Justin Bieber, Thomas decides that he wants to aim and fire the gamma-
ray LASER on the death star at Justin Bieber’s hometown – during Christmas when
Justin Bieber is home with his family. For shielding reasons, in its inactive state,
the Death Star’s cannon is oriented along the x-axis in the following figure.
Figure 3: Aiming an Orbital Death Star with sequential rotations.
This is because the cannon portion of the death star has weaker armour. In order
to fire the death star at Justin Bieber, Thomas must rotate the death star to point
at Ontario, Canada. After the death star is oriented in this way, Emma Krantz’s
targeting algorithm will take over and refine the aim to Justin Bieber’s house.
The coordinate system we use is centred with the death star at its origin. In order
to shoot JB, the death star must be oriented in the direction of the purple ray in the
above diagram. This can be achieved by feeding the correct rotation matrix into the
death star targeting systems. There are multiple ways to construct such a matrix –
however, for our purposes, it is easiest to construct it by sequential rotations about
the three different coordinate axes.
Q: Write down the rotation matrices corresponding to the rotations indicated by
each of the angles – α, β, γ – shown in the diagram. Note that these are not necessarily
in the order x-y-z! Once you’re confident that you have the correct
rotation matrices, multiply these matrices in the correct order to give a rotation
matrix which will rotate the death star cannon from the x-axis to Justin Bieber’s
home province.
Hint: It helps to keep track of which coordinate stays constant under a certain rotation
– recalling the rotation eigenvectors, it then follows that you are performing
a rotation about that coordinate axis. For example, the γ rotation corresponds to an
anticlockwise rotation about the y coordinate axis through an angle γ.
After pointing the death star at Ontario, the Krantz algorithm takes over and per-
forms a super-accurate shot – killing Justin Bieber with minimal collateral dam-
age. Fearing that the warlike nation of Canada will retaliate with direct line-of-
sight missile attacks, Thomas decides it is best to return the death star to its original
orientation along the x-axis – the side that faces Canada will thus have more ar-
mour as well as an anti-missile system featuring an array of LAWS Naval lasers
stolen from the U.S. Military.
Q: Write down a sequence of rotations to rotate the death star to its original ori-
entation. Now write down a single rotation matrix to perform this total rotation.
Hint: Recall the fact that rotation operations form a Lie group – in particular,
this means that every rotation has an inverse. Recalling the properties of the
rotation group SO(3), in particular the orthogonality property $A^T = A^{-1}$, there
is a super-easy way to invert the death star rotation and return it to its original
orientation. Alternatively, recall that you showed $R(-\theta) = (R(\theta))^{-1}$ – either
algebraically or geometrically. Use this to find the rotation matrix which returns the
death star to its original orientation.
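The two routes in the hint can be compared numerically. The following is a minimal sketch, not the solution to the aiming problem: the angles and the order of rotations (first about y, then about z) are hypothetical stand-ins, since the real ones must be read off the figure. It checks that the transpose of a composed rotation equals the reversed sequence of rotations with negated angles, and that either one undoes the aim.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# Hypothetical aiming angles and order; the real ones come from the figure.
gamma, alpha = 0.4, 1.1

# Rotate about y first, then about z: the matrix applied first sits on the right.
aim = matmul(rot_z(alpha), rot_y(gamma))

# Undo the aim in two ways: the transpose, and reversed rotations with negated angles.
undo_by_transpose = transpose(aim)
undo_by_reversal = matmul(rot_y(-gamma), rot_z(-alpha))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(max_abs_diff(undo_by_transpose, undo_by_reversal))          # rounding error only
print(max_abs_diff(matmul(undo_by_transpose, aim), identity))     # rounding error only
```

Note that reversing the sequence also reverses the order of the factors: the last rotation applied is the first one undone.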
11.4 so(3): Quaternions, Lie Algebras and Cross Products
Due to the extent of this tutorial, we will defer our investigation of Lie algebras and
the fate of the St. George’s College Orbital Death Star to the next tutorial. This will
involve a space battle with ships made by our rival colleges, so make sure you keep
your Georgian spirit alight by attending the next tutorial! Anyone found AWOL
will be marked as traitors and executed by the death star accordingly.
12 Tutorial 9+10: The Fault in Our Stars – Project Death
Star (II)
In the last tutorial, recall that we investigated the following concepts:
• Linear Maps and Matrices: Every linear transformation acting on a finite-
dimensional vector space can be represented as a matrix in some chosen
basis (usually the standard basis). To see this explicitly, we saw how a linear
map, f, acted on a set of basis vectors {e_i} to give the components of some
matrix, $A^i{}_j$, representing the linear map f in that basis.
• Rotations: We argued geometrically that rotations are linear transformations
– hence they have a matrix representation. We then showed that these matrices
formed a special structure called a ‘Lie group’, which we denoted by
SO(3) – the 3-dimensional Special Orthogonal Group. Having character-
ized rotations as a Lie group, we then used several properties of this group
structure to construct a rotation matrix which rotated the St. George’s Col-
lege Death Star from an idle position to one pointing at Justin Bieber’s house
– subsequently firing and killing Justin Bieber.
In this tutorial, we will extend what we learned about linear maps and SO(3) – the
Lie Group of Rotations, to the idea of a ‘Lie Algebra’. In particular, we will see
how vectors in three dimensions, equipped with the cross-product operation form
a Lie Algebra. We will then introduce the idea of the ‘matrix exponential’ and see
how to derive the rotation matrices using their corresponding Lie algebra. In the
next and final Project Death Star tutorial, we shall unite these ideas by introducing
quaternions and Clifford algebras, then seeing how they can be used to represent
rotations in the most computationally efficient and stable way. For now however,
we will continue the adventure of ‘Project Death Star’ whilst employing the power
of mathematics along the way to vanquish our rival colleges.
12.1 Infinitesimal Rotations and Lie Algebras
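Recall that in the standard basis {e_1, e_2, e_3} for 3-dimensional Euclidean space $\mathbb{R}^3$,
anticlockwise rotations about the x, y and z axes are represented by the matrices
R_1, R_2 and R_3, respectively. These matrices (as a function of the rotation angle)
were stated to be:
\[ R_1(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{161} \]
\[ R_2(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \tag{162} \]
\[ R_3(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{163} \]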
We then argued, using the eigenvectors of rotation, why these matrices correctly
represent rotations about their respective axes. We can, however, derive these
matrices in several ways. One such way is via Lie algebras. To motivate this
connection, one may ask, as Sophus Lie did, how the rotation group behaves
on the infinitesimal scale. That is, how do we represent rotations through a very
small (infinitely small) but non-zero angle?
For simplicity, we shall first consider rotations through an infinitesimal angle δθ,
about the x, y and z axes. As such, there are several equivalent ways of constructing
such rotations:
• Taylor expanding the rotation matrices about zero – i.e. computing $\hat{R}(0 + \delta\theta)$.
This means performing a Taylor expansion of each of the functions
(sines and cosines) in the rotation matrices about zero:
\[ \sin(0 + \delta\theta) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (\delta\theta)^{2n+1} = \delta\theta + O((\delta\theta)^3) \tag{164} \]
\[ \cos(0 + \delta\theta) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} (\delta\theta)^{2n} = 1 + O((\delta\theta)^2). \tag{165} \]
Note that the ‘big O’ notation $+O((\delta\theta)^k)$ means ‘plus terms of order k
or greater’ in δθ – i.e. terms which involve a factor of $(\delta\theta)^k$, $(\delta\theta)^{k+1}$,
$(\delta\theta)^{k+2}$, etcetera. Hence if we set the angle $\delta\theta \ll 1$, all higher order
terms $(\delta\theta)^2$, $(\delta\theta)^3$, ... have diminishing contributions to our representations
of the sine and cosine functions. In particular, if the angle δθ is ‘infinitesimal’,
any term which has δθ squared or any higher power has negligible
contribution 73 – thus we discard these terms and keep only terms which are
linear in δθ – i.e. $(\delta\theta)^1$ and constant terms.
• Computing the linear approximation to the rotation matrices. This is precisely
the same as doing the first order Taylor expansions of the sin and cos
functions about θ = 0. Another way to view this is to recall the way we used
the total differential to compute the ‘absolute error’ in our earlier tutorials
– what we defined to be the error (omitting the absolute value signs) ∆f in
some quantity f in fact corresponds to a first order Taylor expansion or
linearisation (‘the tangent plane approximation’) of our function f about some
initial value:
\[ f(\theta + \Delta\theta) \approx f(\theta) + \Delta f(\theta) = f(\theta) + \frac{df}{d\theta}\Big|_{\theta} \Delta\theta. \tag{166} \]
In this case, we are replacing a finite shift in angle with an infinitesimal one:
∆θ → δθ.
Regardless of which way you view it, the result is the same.
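To get a feel for how quickly the discarded terms die off, here is a small numerical sketch (the function name and the sample angles are illustrative choices, not part of the tutorial). It measures the error made by replacing sin(δθ) with δθ and cos(δθ) with 1.

```python
import math

def linearisation_errors(dtheta):
    """Errors of the first-order approximations sin(x) ~ x and cos(x) ~ 1."""
    sin_err = abs(math.sin(dtheta) - dtheta)   # leading dropped term: (dtheta)^3 / 6
    cos_err = abs(math.cos(dtheta) - 1.0)      # leading dropped term: (dtheta)^2 / 2
    return sin_err, cos_err

for dtheta in (0.1, 0.01, 0.001):
    sin_err, cos_err = linearisation_errors(dtheta)
    print(f"{dtheta:7.3f}  sin error {sin_err:.2e}  cos error {cos_err:.2e}")
```

Shrinking the angle by a factor of 10 shrinks the cosine error by roughly 100 and the sine error by roughly 1000, which is the behaviour the big O terms in (164) and (165) describe.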
Problem 17 (Warm-up) Having run out of elaborate excuses to skip the Sunday
SGC Mathematical Sciences tutorials, Angela decides to enrol in the Australian
SAS 74. Having passed the fitness tests (which include a large lung capacity), she
meets an ironic twist. As it turns out, Brigadier Daniel McDaniel 75 is actually
Daniel Ogburn in disguise (as hinted by his suspicious last name). To pass Angela,
he therefore decides that a true test of her aptitude for solving new problems quickly
(required in combat), is to get her to derive the following infinitesimal rotation
73
In a branch of mathematics known as ‘smooth infinitesimal analysis’, one rigorously (axiomatically)
works with nilpotent infinitesimals satisfying $(\delta\theta)^k = 0$ for k ≥ 2, and thus the
approximate equals symbol becomes a formal equality.
74
Special Air Service regiment of the Australian Army – an elite commando unit.
75
Current commander of the Australian Defence Force’s Special Operations Command.
matrices:
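\[ R_1(\delta\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -\delta\theta \\ 0 & \delta\theta & 1 \end{pmatrix} \tag{167} \]
\[ R_2(\delta\theta) = \begin{pmatrix} 1 & 0 & \delta\theta \\ 0 & 1 & 0 \\ -\delta\theta & 0 & 1 \end{pmatrix} \tag{168} \]
\[ R_3(\delta\theta) = \begin{pmatrix} 1 & -\delta\theta & 0 \\ \delta\theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{169} \]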
See if you can complete Angela’s problem and get into the Australian SAS under
Daniel’s criterion.
Hint: Start with the original finite rotation matrices, then use the Taylor expansions /
linearisation discussed previously to arrive at the infinitesimal rotations.
Now, notice that all the infinitesimal rotation matrices have 1s down the main
diagonal – hence they can be written as the 3×3 identity matrix $I_3 = \mathrm{diag}(1, 1, 1)$
plus some matrix involving the infinitesimal angles δθ (which we can express as
δθ multiplying some matrix):
\[ \begin{aligned} R_1(\delta\theta) &= I_3 + \delta\theta E_1 \\ R_2(\delta\theta) &= I_3 + \delta\theta E_2 \\ R_3(\delta\theta) &= I_3 + \delta\theta E_3, \end{aligned} \tag{170} \]
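where the matrices $E_j$ are defined by 76
\[ E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \tag{171} \]
\[ E_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} \tag{172} \]
\[ E_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{173} \]
\[ I_3 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{174} \]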
Problem 18 (Orbital Warm-up) Having shut down the St. George’s College Orbital
Death Star to update the Death Star software, Thomas decides that he should
verify the above decomposition of the infinitesimal rotation matrices given by the
set of equations (170). Using simple matrix algebra, verify that these expressions
are indeed true – hence giving an efficient approximation (which doesn’t involve
sines or cosines) to small rotations (fine aiming) of the Death Star gamma-ray
laser.
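As a rough indication of what ‘efficient approximation’ costs in accuracy, the sketch below compares the exact x-rotation matrix with the first-order form I3 + δθ E1 for a few sample angles (the helper names and the chosen angles are illustrative assumptions). It also prints how far the determinant of the approximate matrix drifts from 1.

```python
import math

E1 = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

def rot_x(theta):
    """Exact anticlockwise rotation about the x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def small_rot_x(dtheta):
    """First-order approximation I3 + dtheta * E1: no sines or cosines."""
    return [[(1.0 if i == j else 0.0) + dtheta * E1[i][j] for j in range(3)]
            for i in range(3)]

def max_error(dtheta):
    """Largest entry-wise difference between the exact and approximate matrices."""
    exact, approx = rot_x(dtheta), small_rot_x(dtheta)
    return max(abs(exact[i][j] - approx[i][j]) for i in range(3) for j in range(3))

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

for dtheta in (1e-1, 1e-2, 1e-3):
    print(dtheta, max_error(dtheta), det3(small_rot_x(dtheta)) - 1.0)
```

The entry-wise error shrinks like (δθ)²/2, and the determinant of the approximate matrix is 1 + (δθ)² rather than exactly 1, so the first-order matrix is only approximately a rotation.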
If we have a vector v, then apply an infinitesimal rotation 77 $\hat{R}(\delta\theta)$ to it, we get the
rotated vector $\hat{R}(\delta\theta)v$ – which can be computed in some chosen basis using matrix
multiplication and representing v as a column vector (recall the last tutorial).
Hence the amount δv that the vector v has shifted under the infinitesimal rotation
δθ is given by the difference:
\[ \delta v = \hat{R}(\delta\theta)v - v = (\hat{R}(\delta\theta) - I_3)v, \tag{175} \]
since I3v = v. Therefore, the infinitesimal rate of change of the vector v under
76
I included an explicit expression for the identity matrix I3 for those of you who don’t know what
this is.
77
Henceforth we shall use $\hat{R}(\theta)$ to denote some arbitrary rotation about some axis, through an
angle θ anticlockwise.
the rotation $\hat{R}$ with respect to the rotation angle δθ, is given by:
\[ \frac{\delta v}{\delta\theta} = \frac{(\hat{R}(\delta\theta) - I_3)}{\delta\theta} v. \tag{176} \]
This looks suspiciously like a derivative – it is in fact a formal derivative if we
take the limit as the rotation angle δθ → 0. To see this, we consider a vector
u(θ) which is a function of the rotation angle θ performed by the rotation $\hat{R}(\theta)$. In
particular, we let this vector function coincide with the constant vector v when it
is not rotated – i.e. u(θ = 0) = v. For a general rotation angle θ, we therefore
have $u(\theta) = \hat{R}(\theta)v$. The infinitesimal rate of change of this vector function with
respect to θ is therefore expressed as the derivative:
\[ \frac{d}{d\theta} u(\theta) = \left[\frac{d}{d\theta}(\hat{R}(\theta) - I_3)\right] v = \left[\frac{d}{d\theta}\hat{R}(\theta)\right] v. \tag{177} \]
We take the derivative of a matrix which has functions as entries by taking the
derivative of each function – since the identity matrix $I_3$ is constant, its derivative
is just the zero matrix: $\frac{d}{d\theta} I_3 = 0$. Since we were considering infinitesimal rotations
originally (‘small angles close to zero’), we evaluate this derivative of the vector
function u(θ) at the origin θ = 0 (which means taking the derivative then setting
θ = 0):
\[ \frac{d}{d\theta} u(\theta)\Big|_{\theta=0} = \left[\frac{d}{d\theta}\hat{R}(\theta)\right]\Big|_{\theta=0} v. \tag{178} \]
Notice that as θ varies, the vector u(θ) traces out a curve 78 as it rotates – since the
vector $\frac{d}{d\theta}u(\theta)|_{\theta=\alpha}$ represents the rate of change of u at θ = α, it is therefore tangent
to the curve at θ = α. Equivalently, we can consider the derivative of the rotation
matrix $[\frac{d}{d\theta}\hat{R}(\theta)]|_{\theta=\alpha}$ to be ‘tangent to the rotation operator’ $\hat{R}(\theta)$ at θ = α in
the abstract sense. When θ = 0, any rotation operator is simply represented by
the identity matrix $I_3$ – which is the identity element of the Lie Group SO(3) of
rotations 79. Hence infinitesimal rotations correspond to rotation matrices which
are ‘close’ 80 to the identity matrix $I_3$.
As it turns out, if we restrict our attention to the behaviour of the rotation group
SO(3) about the origin – i.e. infinitesimal rotations and rotation matrices ‘close’
to the identity matrix $I_3$, then in particular, the matrices:
\[ \frac{d}{d\theta}\hat{R}(\theta)\Big|_{\theta=0} \tag{179} \]
78
Such curves are called ‘integral curves’ and the vector field $\frac{d}{d\theta}u(\theta)$ corresponds to the ‘direction
fields’ you may know from the theory of differential equations.
79
Recall the last tutorial
80
The notion of matrices being ‘close’ can be formalized by defining a metric or ‘norm’ (notion
of distance) for matrices – for example, the Frobenius or Hilbert-Schmidt norms.
which are ‘tangent to the identity matrix I3’, form a special type of mathematical
structure called a ‘Lie algebra’. Lie algebras are very special, because in some
sense they are a ‘linearisation’ or ‘first order approximation’ to a Lie group close
to the identity element of that group. More formally, if we recall that a Lie Group
is a smooth manifold (a generalization of a ‘smooth surface’ to arbitrary dimen-
sions), then its Lie algebra is defined to be the tangent space at the identity – which
you can think of as the ‘tangent plane approximation’ to the Lie group at the iden-
tity element 81. Many properties of the Lie Group are encoded in its Lie algebra
and because the Lie group is a non-linear object in general, it is much easier to
investigate the Lie algebra (which is a linear structure – a vector space) to deduce
properties of the Lie group.
Problem 19 (Devil in the Detail) On a bout of procrastination, Rowan Seton de-
cides to attend a SGC Mathematical Sciences Tutorial on the St. George’s College
Orbital Death Star. To stop Rowan from putting cats on Emma’s computer, Daniel
orders Rowan to complete the following calculations – fitting because Rowan likes
to go on tangents. In particular, show that for the rotation matrices $R_j(\theta)$ (where
j = 1, 2, 3) the corresponding tangent matrices at the identity element are
given by:
\[ \begin{aligned} \frac{d}{d\theta} R_1(\theta)\Big|_{\theta=0} &= E_1 \\ \frac{d}{d\theta} R_2(\theta)\Big|_{\theta=0} &= E_2 \\ \frac{d}{d\theta} R_3(\theta)\Big|_{\theta=0} &= E_3 \end{aligned} \tag{180} \]
which motivates the decomposition we performed earlier for infinitesimal rotations.
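A numerical sanity check of (180) is possible before attempting the calculation by hand. The sketch below is only a check, not a proof: it approximates the derivative at θ = 0 by a central finite difference, and the step size h is an arbitrary small number chosen for illustration.

```python
import math

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def derivative_at_identity(rot, h=1e-6):
    """Central difference (R(h) - R(-h)) / (2h), taken entry by entry."""
    plus, minus = rot(h), rot(-h)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(3)]
            for i in range(3)]

for name, rot in (("E1", rot_x), ("E2", rot_y), ("E3", rot_z)):
    tangent = derivative_at_identity(rot)
    # Rounding to 6 places hides the finite-difference error; + 0.0 avoids "-0.0".
    print(name, [[round(x, 6) + 0.0 for x in row] for row in tangent])
```

The printed matrices should match the entries of E1, E2 and E3 in equations (171) to (173).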
The purpose of the last exercise is to show that the matrices {Ej} correspond
to the linearisation of the rotation group SO(3) about the identity matrix – that
is, that they are tangent matrices living in the Lie algebra82 so(3) of the rotation
group.
Having some intuition now of how to construct a concrete Lie algebra (and a Death
Star), we now proceed with a formal definition of a general Lie algebra.
81
Recall that for a curve, the tangent line to that curve at any point x is a linear approximation to
the curve about that point. The tangent plane approximation generalizes this notion to differentiable
surfaces/geometric objects of arbitrary dimension.
82
For a Lie group G, one usually denotes its Lie algebra by $\mathfrak{g}$ – which is lower-case g in the
‘fraktur’ font.
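Definition 8 A Lie algebra $\mathfrak{g}$ is a vector space 83 (over a field F) equipped with a
binary operation 84
\[ [\,\cdot\,,\,\cdot\,] : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}, \qquad (u, v) \mapsto [u, v] \tag{181} \]
called the ‘Lie bracket’, which satisfies the following properties:
1. Left Linearity:
\[ [\alpha u + \beta v, w] = \alpha[u, w] + \beta[v, w] \quad \forall u, v, w \in \mathfrak{g},\ \forall \alpha, \beta \in F \tag{182} \]
2. Anti-symmetry:
\[ [u, v] = -[v, u] \quad \forall u, v \in \mathfrak{g} \tag{183} \]
3. Jacobi Identity:
\[ [u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0 \quad \forall u, v, w \in \mathfrak{g}. \tag{184} \]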
Thus, we can think of a Lie algebra as a vector space whose vector multiplication
operation is the Lie bracket. However, earlier we said that the tangent matrices
{Ej} were elements of the Lie algebra – implying that they are vectors. This is not
a mistake – when we refer to a Lie algebra g as a vector space, it means a vector
space in an abstract sense (not column vectors!). A quick review of the vector
space axioms 85 (defining properties) should reveal that the set of n × n real or
complex-valued matrices forms a vector space – a basis for this vector space has
$n^2$ basis vectors; one such basis consists of the matrices $\mu_{ij}$ whose entries are all
zero except for the entry in the i-th row and j-th column (which we can set to
be 1). In this manner, what we referred to as ‘tangent matrices’ are indeed tangent
vectors in this abstract sense.
Problem 20 (The Girl Who Cried Wolf) In one of many universes in the multi-
verse, a St. George’s fresher by the name of Sophia Lie continually makes excuses
not to attend the SGC Mathematical Sciences tutorials. This is because she has
questioniaphobia – a fear of asking questions. One day, an optically- and radar-cloaked
space shuttle docks with the St. George’s College Dragon (the newly
83
Recall that a vector space over a field F is a set of vectors which obey the usual rules of vector
addition and scalar multiplication – for our purposes we usually take the field to be the real or
complex numbers, R and C.
84
Recall we defined binary operations in Tutorial 8.
85
Ask your tutor.
elected name for the Death Star). A team of Saint Catherine’s raiders board the
shuttle and capture Sophia while she is in her room – destroying the Dragon’s fine-
targeting systems on the way. Sophia sends an sms to her fellow Georgians in the
MS tutorials, but they refuse to believe her. To convince them that she is serious,
she decides to complete Tutorial 8 and 9.
To help Sophia, prove that the Left-Linearity and Anti-Symmetry properties of a
Lie algebra together imply Bilinearity:
\[ [\alpha u + \beta v, w] = \alpha[u, w] + \beta[v, w], \qquad [w, \alpha u + \beta v] = \alpha[w, u] + \beta[w, v] \quad \forall u, v, w \in \mathfrak{g},\ \forall \alpha, \beta \in F. \tag{185} \]
Furthermore, show that if we replace the Left-Linear property with the Bilinear
property and replace the anti-symmetry property with the alternating property:
\[ [u, u] = 0 \quad \forall u \in \mathfrak{g}, \tag{186} \]
then these together imply the anti-symmetry property 86.
In our case, the Lie algebra so(3) of the rotational Lie group SO(3) is a vector
space whose (abstract) vectors are 3 × 3 matrices satisfying certain conditions. To
find these conditions, we recall that the Special Orthogonal Group SO(3) was
defined to be the set of 3 × 3 matrices which satisfied the criteria:
• Volume and Orientation Preserving: $\det(R) = 1$
• Orthogonality: $R^T R = I_3 \iff R^T = R^{-1}$.
If we now look at what happens to these conditions when $R(\delta\theta) = I_3 + \delta\theta E$ is
a matrix representing an infinitesimal rotation δθ about some axis and E is some
tangent matrix at the identity, then the orthogonality condition gives:
\[ (I_3 + \delta\theta E)^T = (I_3 + \delta\theta E)^{-1} = I_3 - \delta\theta E \implies E^T = -E \quad \text{(cancelling terms on both sides)} \tag{187} \]
which we can write as 87
\[ E^T + E = 0 \quad \forall E \in so(3). \tag{188} \]
86
Note that this latter redefinition allows one to extend the notion of a Lie algebra to vector spaces
over number fields with a characteristic of 2.
87
Note we used the fact that the inverse rotation $(R(\delta\theta))^{-1} = R(-\delta\theta)$ is given by rotating in the
reverse direction.
This condition means that all tangent matrices E – i.e. all matrices (abstract vectors)
in the rotational Lie algebra so(3) are anti-symmetric (entries mirrored across the
main diagonal have opposite signs, so the diagonal entries vanish). As a consequence
all matrices in the Lie algebra are traceless – which is the infinitesimal form of the
$\det(R) = 1$ condition:
\[ \mathrm{tr}[E] = 0 \quad \forall E \in so(3). \tag{189} \]
Exercise 23 (Trial By Combat) In an on-going rivalry over who is taller, Leanora
and Daniel decide to duel on the bridge of the St. George Dragon. After 5 seconds
of attempted kicks and punches, Lea clumsily slips over and gives herself a con-
cussion – requiring Aston to rush her to the nearest hospital on the International
Space Station. As a winner of the duel, Daniel officially renames ‘Lie Algebras’ to
‘Lea Algebras’ and ‘Lie groups’ to ‘Ogburn groups’, since Lie algebras represent
the infinitesimal (vanishingly small) approximation to a Lie group.
As part of this process of re-writing all textbooks on Lie group theory, prove that
the anti-symmetry condition:
\[ A^T + A = 0 \quad \forall A \in so(3) \tag{190} \]
implies the traceless condition: tr[A] = 0.
Hint: Recall that transposing a matrix doesn’t change its trace: $\mathrm{tr}[E] = \mathrm{tr}[E^T]$.
Now show that the traceless condition tr[E] = 0 implies the rotation matrix
$R(\theta) = e^{\theta E}$ satisfies the volume/orientation preserving condition: $\det[R(\theta)] = 1$.
Hint: Note that the exponential here is the ‘matrix exponential’ of the matrix E
(multiplied by the scalar θ) – which we will investigate later. For now it suffices to
use the following general exponential relation between the trace and determinant
of any square matrix A:
\[ \det[e^A] = e^{\mathrm{tr}[A]}. \tag{191} \]
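Relation (191) can be spot-checked numerically before it is used. The sketch below is an illustration under stated assumptions: the matrix exponential is approximated by a truncated power series (its formal definition comes later in the tutorial), and the two test matrices are arbitrary choices, one generic and one anti-symmetric.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Truncated power series: sum of A^k / k! for k = 0 .. terms - 1."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current summand, starts at A^0 = I
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]   # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[0.2, -0.5, 0.1], [0.3, 0.1, 0.4], [-0.2, 0.0, -0.1]]   # arbitrary matrix
E = [[0.0, -0.7, 0.2], [0.7, 0.0, -0.4], [-0.2, 0.4, 0.0]]   # arbitrary anti-symmetric matrix

print(det3(mat_exp(A)), math.exp(trace(A)))   # the two numbers should agree
print(det3(mat_exp(E)))                       # traceless, so this should be close to 1
```

For the anti-symmetric matrix the trace is zero, so the determinant of its exponential comes out as 1 up to rounding, which is the volume/orientation preserving condition.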
Now that we have covered a fair amount of ground, it is time we move towards a
climactic result in our adventure. To do this, we define the Lie bracket on a matrix
Lie algebra to be given by the matrix commutator:
\[ [A, B] := AB - BA \quad \forall\ n \times n \text{ matrices } A, B. \tag{192} \]
Recall that matrix multiplication is not commutative, so in general $AB \neq BA$ –
the commutator [A, B] is thus a measure of ‘how much’ the matrices A and B fail
to commute.
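The commutator is easy to experiment with. The following sketch applies definition (192) to the tangent matrices E1, E2, E3 from equations (171) to (173); it is a numerical illustration only, and the helper names are assumptions made for this example.

```python
E1 = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
E2 = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
E3 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(A, B):
    """[A, B] = AB - BA, computed entry by entry."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

def negate(A):
    return [[-x for x in row] for row in A]

# E1 and E2 do not commute: their commutator is another tangent matrix.
print(commutator(E1, E2) == E3)
print(commutator(E2, E3) == E1)
print(commutator(E3, E1) == E2)

# Swapping the arguments flips the sign.
print(commutator(E1, E2) == negate(commutator(E2, E1)))
```

Integer entries are used so that the comparisons with `==` are exact rather than subject to rounding.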
After completing the following exercise we will see the link between the Lie algebra
of rotations so(3), vector cross-products and Lie groups – which is perhaps
geometrically hinted at by the right-hand rule and the orthogonality of cross products.
Exercise 24 (The Twelve Labours of Joshua) In an unfortunate turn of events,
Joshua Bailey is blamed for the destruction of the computer systems controlling
the automated fine-targeting of the anti-missile/anti-shuttle Laser Weapon Systems
(LAWs) 88. Not realizing this sabotage was led by the Saint Catherine’s student –
Bronwen Herholdt89, posing as a competitor in the Inter-college Piano Competi-
tion, the warden of St. George’s College sentences Joshua to twelve labours in the
land of Lie groups and Lie algebras.
Recall that the matrices {Ej} were shown to be matrices which were tangent to
the rotation matrices {Rj(θ)} at the identity I3 of the Lie group of rotations. As
a friend of Joshua, to show that these matrices form a Lie algebra – the rotation
algebra so(3), you should help Joshua complete the following tasks:
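1. The n-th power $A^n$ of some square matrix A is given by multiplying A
by itself n times. For the three tangent matrices $E_j$ (j = 1, 2, 3), compute:
$(E_j)^1$, $(E_j)^2$, $(E_j)^3$, $(E_j)^4$. In particular show that:
\[ \begin{aligned} (E_j)^2 &= \text{matrix which becomes } -I_2 \text{ after deleting the } j\text{'th row and column} \\ (E_j)^3 &= -E_j \\ (E_j)^4 &= -(E_j)^2, \text{ so that } (E_j)^5 = E_j \end{aligned} \tag{193} \]
For example,
\[ (E_1)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \tag{194} \]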
2. Using the previous results, show that for n ≥ 1:
\[ \begin{aligned} (E_j)^{2n} &= \text{matrix which becomes } (-1)^n I_2 \text{ after deleting the } j\text{'th row and column} \\ (E_j)^{2n+1} &= (-1)^n E_j. \end{aligned} \tag{196} \]
88
The U.S. Navy anti-missile/anti-UAV/anti-torpedo system: https://www.youtube.com/watch?v=gMfYUyrKRng.
89
Our enemy, but my friend.
3. By writing down a general 3 × 3 anti-symmetric matrix ($A^T = -A$), you
should see that it is parametrised by three unknowns (real numbers): A =
A(α, β, γ). In particular, show that any anti-symmetric matrix A can be
written as a linear combination of the $\{E_j\}$ matrices:
\[ A(\alpha, \beta, \gamma) = \alpha E_1 + \beta E_2 + \gamma E_3, \quad \alpha, \beta, \gamma \in \mathbb{R}. \tag{197} \]
This says that the set so(3) of 3×3 anti-symmetric matrices is a 3-dimensional
(abstract) vector space with $\{E_1, E_2, E_3\}$ acting as a set of (abstract) basis
vectors – hence why we denoted them using ‘E’ initially.
4. Using the matrix commutator, $[A, B] = AB - BA$, as the Lie bracket, prove
that the abstract vector space so(3) is indeed a 3-dimensional Lie algebra
– the special orthogonal algebra. To do this, simply verify that the matrix
commutator [_, _] is a binary operation on so(3) and that it satisfies the three
properties required by a Lie algebra.
Hint: First show that the matrix commutator [_, _] is anti-symmetric and left-linear
in general, then show that it obeys the Jacobi identity in general. It then
suffices 90 to show that [_, _] is a binary operation – i.e. that the commutator
[A, B] of any 3 × 3 anti-symmetric matrices A, B is also anti-symmetric, by
showing that (using Einstein summation notation):
\[ [E_i, E_j] = \epsilon_{ijk} E_k \tag{198} \]
where $\epsilon_{ijk}$ is the Levi-Civita symbol defined in tutorial 7.
5. Those of you familiar with vector cross-products will notice the similarity
between the Lie algebra relation $[E_i, E_j] = \epsilon_{ijk} E_k$ and the cross-product
relation for the standard basis vectors $\{e_j\}$ in 3-dimensions:
\[ e_i \times e_j = \epsilon_{ijk} e_k. \tag{199} \]
This is because 3-dimensional Euclidean space $\mathbb{R}^3$ equipped with the vector
cross-product is indeed a Lie algebra! In fact, it is the same Lie
algebra as so(3), simply presented in another way – we therefore say these
Lie algebras are ‘isomorphic’.
By defining the Lie bracket on $\mathbb{R}^3$ to be the cross-product:
\[ [v, u] := v \times u, \quad v, u \in \mathbb{R}^3 \tag{200} \]
90
If the commutator of any basis vectors produces an anti-symmetric matrix, bilinearity then im-
plies that the commutator is a binary operation.
show that $(\mathbb{R}^3, \times)$ is indeed a Lie algebra.
Hint: You can essentially copy the proof you used for so(3) or find the
(obvious) isomorphism (one-to-one correspondence) between $\mathbb{R}^3$ and so(3)
– which has been hinted at in many ways.
6. In proving that cross-products in $\mathbb{R}^3$ form a Lie algebra, we have the bilinear
property in particular. This then shows that the following operator:
\[ [r, \_] = r\times \tag{201} \]
is a linear operator, acting on vectors in $\mathbb{R}^3$ to give the cross product:
\[ [r, \_](v) := [r, v] = r \times v. \tag{202} \]
Recalling the correspondence between linear operators and matrices, it follows
that the operator r× has a matrix representation – this representation is
given by the Lie algebra isomorphism between $\mathbb{R}^3$ and so(3):
\[ e_j \leftrightarrow E_j, \qquad u \times v \leftrightarrow [u^i E_i, v^j E_j]. \tag{203} \]
In particular, we represent v× by the following matrix (using Einstein summation 91,
and writing $v_1, v_2, v_3$ for the components of v):
\[ v\times \to [v]_\times = v^j E_j = \begin{pmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{pmatrix} \tag{204} \]
Show explicitly by matrix multiplication that $[v]_\times u = v \times u$. Hint: Represent
u as a column vector in the standard basis.
7. Earlier you computed the general odd and even powers, $(E_j)^{2n+1}$ and $(E_j)^{2n}$,
of the tangent matrices. If you were observant, you will have realized that
$E^{4n+1} = E$ is the same periodic relation that the imaginary unit obeys: $i^{4n+1} = i$.
This is part of a deeper connection between Lie algebras and Lie groups
given by the ‘exponential map’. For compact connected Lie groups like the
rotation group SO(3), one can recover the entire group from its Lie algebra –
i.e. all information about the rotation group can be obtained from knowledge
of its infinitesimal behaviour about its identity.
91
Recall in 3-dimensions that $v = v^j e_j = v^1 e_1 + v^2 e_2 + v^3 e_3$.
Formally, we define the matrix exponential of an arbitrary square matrix A as:
\[ e^A := \sum_{n=0}^{\infty} \frac{1}{n!} A^n, \tag{206} \]
provided the series converges. Note we define the zeroth power of a square
matrix to be the identity matrix: $A^0 = I$. The matrix exponential obeys the
usual properties of the exponential function except that in general $e^A e^B \neq e^{A+B}$,
since matrix multiplication does not commute 92.
In general, the exponential map is given by the exponential map of Riemannian
geometry – which makes use of the fact that a Lie group is a smooth
manifold. Matrix Lie groups, such as the rotation group SO(3), are just a
special case in which the general exponential map can be expressed as the
matrix exponential.
Q: Show that $e^{\theta E_j}$ is a solution to the matrix differential equation
\[ \frac{d}{d\theta} R_j(\theta)\Big|_{\theta=0} = E_j. \tag{207} \]
Hint: Use the series expansion of $e^{\theta E_j}$ and the fact that $(\theta A)^n = \theta^n A^n$ for
any scalar θ and any square matrix A.
8. As promised, we now use Lie algebras to establish a fundamental link between
cross-products and rotations. In particular, using the definition of the
matrix exponential, show that the rotation matrices $R_j(\theta)$ are given by
exponentiating the tangent matrices which act as a basis for the Lie algebra
so(3):
\[ e^{\theta E_j} = R_j(\theta), \tag{208} \]
for j = 1, 2, 3.
9. Using previous observations, we can express a rotation about an axis defined
by some unit vector $\hat{v}$ using our Lie algebra isomorphism and the matrix
exponential. In particular, $[\hat{v}]_\times = v^j E_j$ and:
\[ R_{\hat{v}}(\theta) = e^{\theta v^j E_j}. \tag{209} \]
This is extremely inefficient, but if you have infinite time, check that the
above expression coincides with that given by ‘Rodrigues’ Rotation Formula’.
Otherwise, try expanding both expressions to, say, first order in
92
The correct relation is given by the Baker-Campbell-Hausdorff formula.
θ, then check that both Rodrigues’ formula and the SGC formula coincide
to first order. Note that if the rotation angle θ is small, but not infinitesimal,
you can still obtain approximations of the rotation matrix $R_{\hat{v}}(\theta)$ of arbitrary
accuracy by taking more terms in the exponential series expansion.
10. It is possible to derive the standard rotation matrices, Rj(θ), corresponding
to rotations about the x, y and z axes by recalling the correspondence be-
tween linear maps and matrices. In particular, using trigonometry and stan-
dard geometry, you can derive a formula which rotates a vector v = (x, y, z)
about the z axis by keeping the z component constant.
Recall from tutorial 8 that the matrix components of a linear map acting on
some vector space were given by its action on the basis vectors for that vector
space (using Einstein summation):
\[ \hat{R}v = (\hat{R})^i{}_j v^j e_i. \tag{210} \]
Thus, in particular for the standard basis vectors:
\[ \hat{R}e_i = (\hat{R})^j{}_i e_j. \tag{211} \]
So for example, for a rotation anticlockwise about the x-axis acting on a unit
vector in the x-direction, we have:
\[ \begin{aligned} R_1(\theta)e_1 &= (R_1(\theta))^j{}_1 e_j \\ &= (R_1(\theta))^1{}_1 e_1 + (R_1(\theta))^2{}_1 e_2 + (R_1(\theta))^3{}_1 e_3 \\ &= e_1 \end{aligned} \tag{212} \]
since $e_1$ is an eigenvector of x-axis rotations. Hence without knowing $(R_1(\theta))^1{}_1$,
$(R_1(\theta))^2{}_1$ and $(R_1(\theta))^3{}_1$ in advance, we can determine these components of the
x-rotation matrix by comparing the coefficients of the rotated vector $e_1$. From
this we deduce that $(R_1(\theta))^1{}_1 = 1$ and $(R_1(\theta))^j{}_1 = 0$ for j = 2, 3. Similarly,
by geometrically finding $R_1(\theta)e_2$ and $R_1(\theta)e_3$ via trigonometry, you
can work out the rest of the components of the x-rotation matrix $R_1(\theta)$.
In the fashion just demonstrated, derive the x, y, z rotation matrices $\{R_j(\theta)\}$.
11. With Joshua nearing the end of his Labours, the Warden decides to give him the
peaceful and easy task of verifying Rodrigues’ Rotation Formula:
\[ v(\theta) = v_0 \cos\theta + (k \times v_0) \sin\theta + k(k \cdot v_0)(1 - \cos\theta) \tag{213} \]
which describes a vector $v_0$ rotated through an angle θ about an axis defined
by the unit vector k. In particular, check that when you set the rotation
axis k equal to one of the standard basis vectors for $\mathbb{R}^3$, $k = e_j$, the
resulting rotated vector v(θ) is the same as the vector $R_j(\theta)v_0$ you would get
by applying the $R_j(\theta)$ rotation matrix.
12. Deciding that the last task was easy (despite being tedious), the Warden
sets a final labour for Joshua – to use the Lie-algebra isomorphism between
$(\mathbb{R}^3, \times)$ and so(3) to express Rodrigues’ rotation formula in matrix form.
This means constructing an explicit matrix $R_k(\theta)$ such that $R_k(\theta)v_0 = v(\theta)$.
Hint: You can express the right-hand-side of Rodrigues’ formula as some
matrix / linear operator acting on the vector $v_0$, by writing the cross product
operators as matrices.
Hint: You will also need to express the dot product in matrix form –
as a row vector (on the left) multiplying a column vector (on the right).
Hint: You will need to use the vector-triple product formula for cross products
to collect the cosine terms.
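Several of the labours can be checked numerically once the algebra is done. The sketch below is not a substitute for the proofs: it approximates the matrix exponential by a truncated series, builds the cross-product matrix as the linear combination of the tangent matrices, and uses an arbitrary test angle, axis and vector (all three are illustrative assumptions).

```python
import math

E = [
    [[0, 0, 0], [0, 0, -1], [0, 1, 0]],   # E1
    [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],   # E2
    [[0, -1, 0], [1, 0, 0], [0, 0, 0]],   # E3
]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_exp(A, terms=40):
    """Truncated series I + A + A^2/2! + ... with `terms` summands in total."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, A)]   # A^n / n!
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

def hat(v, scale=1.0):
    """scale * (v1*E1 + v2*E2 + v3*E3): the matrix of the operator scale * (v x _)."""
    return [[scale * sum(v[j] * E[j][r][c] for j in range(3)) for c in range(3)]
            for r in range(3)]

def apply(M, u):
    return tuple(sum(M[i][j] * u[j] for j in range(3)) for i in range(3))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rodrigues(k, v0, theta):
    """Equation (213) for a unit axis k."""
    c, s = math.cos(theta), math.sin(theta)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v0))
    k_cross_v = cross(k, v0)
    return tuple(v0[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

theta = 0.9                      # arbitrary test angle
k = (1 / 3, 2 / 3, 2 / 3)        # arbitrary unit axis: 1/9 + 4/9 + 4/9 = 1
v0 = (0.3, -1.2, 2.0)            # arbitrary test vector

# Task 6: the cross-product matrix reproduces the cross product.
print(apply(hat(k), v0), cross(k, v0))

# Task 8: exponentiating theta * E3 should reproduce the z-rotation matrix.
print(mat_exp(hat((0, 0, 1), theta)))

# Tasks 9 and 11: the exponential and Rodrigues' formula should rotate v0 identically.
print(apply(mat_exp(hat(k, theta)), v0))
print(rodrigues(k, v0, theta))
```

Agreement to rounding error for one choice of angle, axis and vector is evidence that the algebra is right, but only the general calculation shows it holds for every rotation.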
Exercise 25 (The Fault in Our Stars) Following the sabotage of the targeting com-
puters for the LAWs defence system, the Orbital Death Star is an orbiting duck.
Moments after the Georgians receive Sophia Lie’s solutions and warning message,
Trinity College and University Hall fire two Space Honey Badgers at the Georgian
Dragon. Unable to use the Krantz algorithm to aim the laser turrets, Angela – who
makes a guest appearance at the critical moment, decides to use the infinitesimal
rotation matrices to track the slow-moving honey badgers. Such a strategy avoids
having to evaluate the sine and cosine functions without a computer – an approx-
imation made possible by the fact that the lasers only have to perform small, slow
rotations to track the fearsome honey badgers 93.
To be continued ...
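Angela's strategy can be sketched in a few lines. The example below is a toy model under loudly stated assumptions: the turret rotates about the z-axis only, the total tracking angle of 0.2 radians is made up, and each small step applies the first-order matrix I3 + δθ E3 using nothing but multiplication and addition. The exact rotation is computed only to measure the error.

```python
import math

def small_step(v, dtheta):
    """Apply I3 + dtheta * E3 to v = (x, y, z): no sines or cosines needed."""
    x, y, z = v
    return (x - dtheta * y, y + dtheta * x, z)

def track(v, total_angle, steps):
    """Reach total_angle through `steps` equal infinitesimal-style rotations."""
    dtheta = total_angle / steps
    for _ in range(steps):
        v = small_step(v, dtheta)
    return v

def exact_rot_z(v, angle):
    """Exact rotation about the z-axis, used only as the reference answer."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def tracking_error(steps, total_angle=0.2, start=(1.0, 0.0, 0.0)):
    """Distance between the stepped approximation and the exact rotation."""
    return math.dist(track(start, total_angle, steps), exact_rot_z(start, total_angle))

for steps in (1, 10, 100):
    print(steps, tracking_error(steps))
```

Splitting the same total angle into more, smaller steps reduces the error roughly in proportion to the number of steps, which is why the approach suits slow-moving targets that only require small rotations at a time.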
93
This is sufficient since the speed of light (hence of a laser beam) is approximately $3 \times 10^8$ m/s,
negating the need to consider time-of-flight at close distances.
13 Tutorial 11: Fiery the angels fell – Project Death Star
(III)
13.1 Prelude
Having not completed the last tutorial, the students of the St. George’s College
Mathematical Sciences tutorial were unable to manually shoot down the space
honey badgers fired from the Trinity College and University Hall sponsored dread-
noughts. As such, the honey badgers, Beelzebub and Mammon unleashed the en-
tirety of Pandaemonium on the St. George’s College Orbital Star (The Dragon).
Luckily however, by some hidden cunning, the regular attendees – Matthew Fernandez,
Joshua Bailey and William Cheng 94, managed to escape the fiery collapse
of the Dragon. With some resourcefulness – and the help of a distraction
provided by Georgie, the college puppy, they secured the death star blueprints ...
bringing them back to the SGC Mathematical Sciences tutors.
In the last tutorial, the ‘big picture’ themes and main ideas that you should have
understood were as follows.
1. With every Lie Group, there comes attached a corresponding Lie algebra.
Geometrically, we found this to be some abstract tangent plane approxima-
tion to the Lie group, encoding information about the group in an infinites-
imally small neighbourhood about its identity element. For ‘compact con-
nected’ Lie groups, such as the rotation group, one is able to reconstruct the
entire group simply by knowing its Lie algebra – the reconstruction being
performed by the exponential map.
2. A Lie algebra is an abstract vector space, characterized by an algebraic bi-
nary operation known as the ‘Lie bracket’. This operation was bilinear, anti-
symmetric and obeyed the Jacobi identity.
3. The algebra of vector cross-products in 3-dimensions and the Lie algebra of
rotations were two explicit examples of a Lie algebra, which had a concrete
matrix representation. They were in fact the ‘same’ Lie algebra in the sense
that they were isomorphic (the same Lie algebraic structure represented in
different ways).
4. The matrix exponential map allowed us to reconstruct the rotation matrices
from the tangent matrices (representing cross-product operators) we derived.
94
Disclaimer: These students requested to feature in the story, on the premise of their consistent
commitment.
Such a map is fundamental to understanding stability analysis, dynamical
systems, linear differential equations, Riemannian geometry and various ab-
stractions in higher-level mathematics.
In this tutorial, we will investigate the following structures as well as their applica-
tions:
1. The Circle Group $S^1$, complex exponential and rotations in the complex
plane (2-dimensions).
2. Hamilton’s Quaternions, the 3-dimensional sphere $S^3$ and rotations in
3-dimensional space.
13.2 The Circle Group
By now, most of you should have some familiarity with the algebra of complex
numbers. Recall that a complex number can be represented in Cartesian form,
$z = x + iy$, where $x, y \in \mathbb{R}$. As such, we can identify vectors in the complex plane
$\mathbb{C}$ with the 2-dimensional real plane $\mathbb{R}^2$, via the map:
$$z \mapsto (\mathrm{Re}(z), \mathrm{Im}(z)) \tag{214}$$
which sends $z = x + iy$ to the ordered pair $(x, y)$. This identification (isomorphism)
allows for a very efficient way to rotate 2-dimensional real vectors, via
Euler’s formula
$$e^{i\theta} = \cos(\theta) + i\sin(\theta) \tag{215}$$
and the polar (radius-angle) representation of complex numbers:
$$z = re^{i\theta}, \quad r = |z| = \sqrt{x^2 + y^2}, \quad \tan(\theta) = \frac{y}{x}. \tag{216}$$
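In particular, given any 2-dimensional real vector $v = xe_1 + ye_2$, we can represent
it as the complex number $z = \sqrt{x^2 + y^2}\, e^{i\arctan(y/x)}$ (valid as written for $x > 0$;
in the other quadrants use the argument $\mathrm{Arg}(z)$ in place of the arctangent). If we
denote $\theta = \arctan(y/x)$, then to rotate $v$ by some angle $\alpha$ anti-clockwise in the
complex plane, we simply use the algebraic structure of the complex numbers:
$$z' = e^{i\alpha} z = |z| e^{i\alpha} e^{i\theta} = |z| e^{i(\alpha + \theta)}. \tag{217}$$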
To recover the rotated real vector $v'$ from the complex number $z'$, we simply use
Euler’s formula:
$$z' = |z| e^{i(\alpha + \theta)} = |z|(\cos(\alpha + \theta) + i\sin(\alpha + \theta)), \tag{218}$$
followed by the inverse isomorphism:
$$v' = |z'|\cos(\mathrm{Arg}(z'))e_1 + |z'|\sin(\mathrm{Arg}(z'))e_2 = \sqrt{x^2 + y^2}\cos(\alpha + \theta)e_1 + \sqrt{x^2 + y^2}\sin(\alpha + \theta)e_2. \tag{219}$$
In this manner, we avoid the use of two-dimensional rotation matrices and change-of-basis
formulas. You might think – well, so what, rotations in two dimensions are so easy that
you could compute them while running out to the front lawn during a 5:30am fire drill.
Well, that may be true – however, the natural question to ask is: can we use some clever
isomorphism with a higher-dimensional generalization of the complex numbers to easily
compute rotations in 3-dimensions? Of course we can.
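As a minimal sketch of the 2-dimensional procedure above (the function name rotate_2d is ours, chosen only for illustration), Python's built-in complex numbers carry out the isomorphism, the multiplication by $e^{i\alpha}$ and the inverse isomorphism in three lines:

```python
import cmath
import math

def rotate_2d(v, alpha):
    """Rotate the real 2-vector v = (x, y) anticlockwise by alpha radians."""
    z = complex(v[0], v[1])              # isomorphism R^2 -> C
    z_rot = cmath.exp(1j * alpha) * z    # multiply by e^{i alpha}
    return (z_rot.real, z_rot.imag)      # inverse isomorphism C -> R^2

# Rotating e1 = (1, 0) by a quarter turn should give (approximately) e2 = (0, 1).
rotated = rotate_2d((1.0, 0.0), math.pi / 2)
print(rotated)
```

Note that no arctangent is needed here: multiplying by $e^{i\alpha}$ adds the angle automatically, whichever quadrant the vector lies in.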
Problem 21 (DIY: Isosceles Triangle) Lamenting the loss of The Dragon, Georgie
and the less-committed tutorial attendants, Matt, Joshua and William decide to
overcome their grief (and pay their respects) by completing more mathematics
problems. By choosing your favourite vector in $\mathbb{R}^2$, use the above isomorphism
between $\mathbb{R}^2$ and $\mathbb{C}$ to construct an isosceles triangle. This means making two
copies of your vector and using Euler’s formula to rotate them in opposite directions,
by your favourite acute angle $\alpha$. You may then need to horizontally or vertically
translate your vectors away from the origin to form an acute triangle, with either
the horizontal or vertical axis as the base.
Hint: Don’t choose the zero vector 0. Your favourite vector should in fact be (1, 0)
or $e_1$. Don’t choose the zero angle. Your favourite acute angle should be $\pi/4$ or $\pi/3$.
You can also form the third side by creating a vector with vector-subtraction.
At this point, you may be wondering what the ominous ‘circle group’ is. Strictly
speaking, a circle is a 1-dimensional smooth manifold, denoted by $S^1$ (meaning
the 1-dimensional circle). A circle only appears to be two-dimensional because we
embed it in a 2-dimensional or 3-dimensional space – however, for a circle of some
fixed radius $r$, you can always parametrise it by one angular variable $\theta$. You could
parametrise it in Cartesian coordinates $(x, y)$ with two variables – however, this
requires the constraint $r = \sqrt{x^2 + y^2}$, meaning you can always write one variable
in terms of the other (because the radius is fixed) – hence really leaving only one
independent variable.
If we now restrict our attention to complex numbers (or 2-dimensional real vectors)
of some fixed radius – say $r = 1$ for simplicity – then we obtain a subset $S^1$ of the
complex plane $\mathbb{C}$, corresponding to the unit circle centred at the origin. As it turns
out, this subset forms an algebraic structure called ‘the circle group’ – satisfying
the axioms of an abelian group.
Exercise 26 By either consulting tutorial 8 or one of your tutors for the axioms of
an abelian group, show that the subset of complex numbers with unit length forms an
abelian group.
Hint: Recall that if $z, w \in \mathbb{C}$ are complex numbers with unit length, then we can
represent them in polar form by: $z = e^{i\theta}$ and $w = e^{i\varphi}$, where $\theta$ and $\varphi$ are the
principal arguments of $z$ and $w$, respectively. Therefore, you should parametrize
$S^1$ as the set: $\{e^{i\theta} : \theta \in \mathbb{R}\}$.
Strictly speaking, we should restrict to $\theta \in [0, 2\pi)$ and use modular arithmetic (the
formal term for what you usually do anyway): $\theta + 2\pi \equiv \theta \pmod{2\pi}$.
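A numerical spot-check is no substitute for the proof asked for in the exercise, but it can help to see what each axiom says in practice. The sketch below picks two arbitrary angles (our choice, nothing special about them) and tests closure, the angle-addition rule, identity, inverse and commutativity for the corresponding unit complex numbers:

```python
import cmath
import math

def circle_element(theta):
    """Return the unit complex number e^{i theta}."""
    return cmath.exp(1j * theta)

theta, phi = 0.7, 2.9            # two arbitrary angles, chosen for illustration
z, w = circle_element(theta), circle_element(phi)

checks = {
    "closure (|zw| = 1)": math.isclose(abs(z * w), 1.0),
    "angles add": cmath.isclose(z * w, circle_element(theta + phi)),
    "identity (e^{i0} = 1)": cmath.isclose(z * circle_element(0.0), z),
    "inverse (e^{-i theta})": cmath.isclose(z * circle_element(-theta), 1.0),
    "commutative": cmath.isclose(z * w, w * z),
}
for name, passed in checks.items():
    print(name, passed)
```

Associativity is inherited from the multiplication of complex numbers, so it is not checked separately here.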
Strengthening the previous result, the circle group is in fact a Lie group! To show
this, you could demonstrate that the multiplication map is smooth (simply corresponding
to the addition of angles) and that the unit circle $S^1$ is a smooth manifold
(for example, by forming charts from stereographic projections). Since it is compact
(closed and bounded) and connected (meaning any two points on the circle are
connected by some path on the circle), it follows that we can reconstruct the circle
group from its Lie algebra via the ‘exponential map’.
Exercise 27 (Circular Reasoning) Wanting to design a bigger, better Orbital Death
Star, the three amigos decide to program a new targeting algorithm with the alge-
bra of Quaternions. However, with the closure of the university (due to Greens riots
regarding their endorsement of environmentally-unsustainable science projects)
and the permanent collapse of the ’BigAir’ server, the three amigos are set on
a quest to find and construct the Quaternion algebra. As such, they decide the cir-
cle group is a good place to start – maybe the rotation group can be reconstructed
as a product of three circle groups?
First take any element $z = e^{i\theta}$ of the circle group, then look at its infinitesimal
form by setting the rotation angle $\theta \to \delta\theta$, where $\delta\theta$ is infinitesimally small. To
this end, you can use the first order Taylor series expansion of $e^{i\theta}$ to analyze the
structure of $S^1$ about the identity ($\theta = 0$):
$$z = e^{i\theta} \approx 1 + i\theta. \tag{220}$$
By replicating the derivation of the Lie algebra for the 3-dimensional rotation
group, show that the tangent direction at the identity is $i$, so that the Lie algebra
elements (of $u(1)$) of the circle group are the real multiples of $i$:
$$\left.\frac{dz}{d\theta}\right|_{\theta = 0} = i, \qquad u(1) = \{i\theta : \theta \in \mathbb{R}\}. \tag{221}$$
Therefore, we can represent any element of the circle algebra $u(1)$ by $i\theta$ – some
angle multiplied by $i$. Since multiplication of two circle group elements corresponds
to addition of angles (via the properties of the complex exponential), the
Lie bracket on the circle algebra $u(1)$ is given by:
$$[a, b] = ab - ba. \tag{222}$$
Show that the elements of $u(1)$ do indeed obey the properties of a Lie algebra with
this Lie bracket.
Hint: Since the multiplication of complex (or real) numbers commutes, this exer-
cise is trivial as [a, b] = 0 ∀a, b ∈ C – i.e. the Lie bracket is zero, hence trivially
satisfies all required properties.
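To get a feel for how good the first-order approximation $e^{i\theta} \approx 1 + i\theta$ is, the following sketch compares it against the exact exponential for a few small angles (the sample angles and the two algebra elements a, b are arbitrary illustrative choices). The error should shrink roughly like $\theta^2/2$, and the bracket of any two elements of $u(1)$ should vanish:

```python
import cmath

# Error of the first-order Taylor approximation e^{i theta} ~ 1 + i theta.
errors = {
    theta: abs(cmath.exp(1j * theta) - (1 + 1j * theta))
    for theta in (0.1, 0.01, 0.001)
}
for theta, error in errors.items():
    print(f"theta = {theta}: error = {error:.3e}, theta^2/2 = {theta**2 / 2:.3e}")

# Two elements of u(1), i.e. real multiples of i, and their Lie bracket.
a, b = 0.3j, 1.2j
bracket = a * b - b * a   # [a, b] = ab - ba
print("bracket:", bracket)
```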
Formally, the circle group is often referred to as the 1-dimensional ‘Unitary group’,
$U(1)$, characterized by the unitary condition:
$$zz^\dagger = 1 \quad \forall z \in U(1), \tag{223}$$
where $\dagger$ is the ‘conjugate transpose’ – for complex numbers, this is simply the complex
conjugate. By the relation $\overline{e^{i\theta}} = e^{-i\theta}$ it is clear that all elements of the circle
group satisfy the unitarity condition: $e^{i\theta}e^{-i\theta} = e^0 = 1$. Moreover, the circle group
acts as a building block for all higher-dimensional compact, connected, abelian Lie
groups – that is, all such Lie groups are simply a direct product of circle groups:
$T^n = S^1 \times \dots \times S^1$, corresponding to an n-dimensional torus (the 2-dimensional
torus is the surface of your familiar doughnut).
Unfortunately, despite being central to the construction of abelian Lie groups, the
circle group does not serve as a building block for quaternions – our desired algebraic
structure to represent rotations in 3-dimensions. You can see this easily by noting
that rotations in 3-dimensions don’t commute (equivalently, the matrix multiplication
of rotation matrices isn’t commutative) – hence there is no chance that a
commutative group such as $S^1$ will serve as an appropriate building block.
Finally, as a closing remark on the circle group, one should observe that $L^2(S^1)$
– the space of ‘square integrable’ functions 95 defined on the unit circle – is simply
a space of periodic functions (we can always normalize the circumference $2\pi$ to
any period we want). As such, the representation theory of the Lie group $S^1$ gives
rise to Fourier series and Fourier analysis – quintessential to modern mathematics,
engineering, computer science, physics, chemistry, biology and music (acoustic
theory).
95
Lebesgue measurable functions: $f \in L^2(S^1) \implies \int_{S^1} |f|^2 < \infty$.
13.3 The Quaternions
The story of Quaternions starts 96 with the Irish polymath Sir William Rowan
Hamilton, who is, by and large, one of the most influential people in the history
of the mathematical sciences (at least on par with Euler). Hamilton was by
all accounts a genius at an early age – when he wasn’t busy advancing the human
frontiers of knowledge in mathematics and physics, he was topping languages at his
university in his spare time. One problem that took his fancy was finding a way to extend
complex numbers (which we showed corresponded to 2-dimensional real vectors)
to higher spatial dimensions. Although he could not find a 3-dimensional generalization,
when working with four dimensions he created quaternions. According
to Hamilton, on October 16, 1843 he was out walking along the Royal Canal in Dublin
with his wife when the solution in the form of the equation
$$i^2 = j^2 = k^2 = ijk = -1 \tag{224}$$
suddenly occurred to him; Hamilton then promptly carved this equation using his
penknife into the side of the nearby Broom Bridge. These are the defining relations
for the Quaternionic algebra – and certainly quite a discovery considering there are
exactly four normed division algebras (the real and complex numbers being two of
them)! We shall now continue our investigation.
In the same fashion that we put complex numbers into correspondence with two-dimensional
vectors in $\mathbb{R}^2$, we can put Quaternions into correspondence with four-dimensional
vectors in $\mathbb{R}^4$. This is done by representing an arbitrary quaternion $Q$
in the following way:
$$Q = a + ib + jc + kd, \tag{225}$$
where $a, b, c, d \in \mathbb{R}$ are real numbers and $i, j, k$ are the quaternionic generalization
of the imaginary unit for complex numbers, obeying the fundamental relation:
$$i^2 = j^2 = k^2 = ijk = -1. \tag{226}$$
The addition of Quaternions is performed in the obvious way, like the addition of
complex numbers – you can treat it as 4-dimensional vector addition, where the
coefficients of i, j, k add separately and the scalar part adds separately:
$$Q_1 + Q_2 = (a_1 + ib_1 + jc_1 + kd_1) + (a_2 + ib_2 + jc_2 + kd_2) = (a_1 + a_2) + i(b_1 + b_2) + j(c_1 + c_2) + k(d_1 + d_2). \tag{227}$$
96
Technically speaking, Benjamin Olinde Rodrigues came up with the defining relation for quaternions
around the same time as Hamilton – but Hamilton is credited historically, perhaps because he did a
deeper investigation of their algebraic structure, whilst applying them to physics with great success.
Similarly, the other rules of addition of complex numbers (such as associativity and
commutativity) still hold – likewise with scalar multiplication and its distribution
over addition. What changes however, is that unlike complex numbers, the multiplication
of quaternions is not commutative! That is, $Q_1 Q_2 \neq Q_2 Q_1$ in
general (cf. matrix multiplication). This follows directly from Hamilton’s fundamental
relation.
Exercise 28 (Quaternions: A Quadrivial Quandary) Having rediscovered Hamil-
ton’s Quaternions whilst walking over a bridge in Dublin, Ben Luo gives the fol-
lowing problem to the surviving members of the St. George’s College Death Star.
In particular, using the defining relations:
$$i^2 = j^2 = k^2 = ijk = -1, \tag{228}$$
prove the following identities:
ij = k, ji = −k, (229)
jk = i, kj = −i, (230)
ki = j, ik = −j. (231)
Hint: Try multiplying the equation $ijk = -1$ (or subsequent products), from the
left or right by $i$, $j$ or $k$ while using the relations $i^2 = -1$, $j^2 = -1$, $k^2 = -1$.
Note that you must keep track of the order in which you multiply – just like you
would for matrices or vector cross-products!
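Once the identities have been proved by hand, they can be sanity-checked on a computer. The sketch below stores a quaternion $a + ib + jc + kd$ as the tuple (a, b, c, d) and implements the product by expanding it with the relations (228)–(231); the representation and the helper name qmul are our own choices for illustration.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d) = a + ib + jc + kd."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # scalar part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # coefficient of i
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # coefficient of j
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # coefficient of k
    )

ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

products = {
    "ii": qmul(I, I), "jj": qmul(J, J), "kk": qmul(K, K),
    "ijk": qmul(qmul(I, J), K),
    "ij": qmul(I, J), "ji": qmul(J, I),
    "jk": qmul(J, K), "kj": qmul(K, J),
    "ki": qmul(K, I), "ik": qmul(I, K),
}
for name, value in products.items():
    print(name, "=", value)
```

The pairs ij and ji differ by a sign, which is exactly the non-commutativity noted before the exercise.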
Given an arbitrary quaternion Q = a + ib + jc + kd, we call a the ‘scalar part’ and
ib+jc+kd the ‘vector part’. This is because if we set b = c = d = 0, the resulting
quaternion is simply a real number – which behaves like a scalar. Similarly, if we
set a = 0, the resulting quaternion Q = 0 + ib + jc + kd is ‘imaginary’ (provided
at least one of the other coefficients is non-zero) – it will exhibit vector behaviour,
in a fashion we will investigate later. First however, we must understand how the
operations of conjugation and inversion behave for quaternions – and in particular,
how to define the ‘length’ (norm) of a quaternion.
The conjugation of quaternions is a direct extension of the conjugation of complex
numbers. In particular, given a quaternion $Q = a + ib + jc + kd$, its conjugate 97
is defined by:
$$\bar{Q} = a - ib - jc - kd. \tag{232}$$
That is, the conjugates of the quaternionic imaginary units are $\bar{i} = -i$, $\bar{j} = -j$ and
$\bar{k} = -k$. Since $\bar{\bar{Q}} = Q$, conjugation of quaternions is said to be an ‘involution’ –
meaning an operation that undoes itself (squares to give the identity).
97
Some of you may prefer the notation $Q^*$ to denote conjugates – either one is fine as long as you
specify it.
Exercise 29 (Conjugating Zachary) Caught in a moral ethics debate with Zach
Menschelli over the construction of a new St. George’s College orbital death star,
Matthew Fernandez decides to represent Zach’s argument as a quaternion.
His logic is that by conjugating the quaternion – and therefore Zach’s argument – he
will confuse Zach and win him over in the moral ethics debate. Thus, we consider
the following ...
The conjugation of complex numbers cannot be expressed by multiplication or addition
– it is a unique operation in that sense and corresponds to the geometric fact
that conjugation equates to a reflection about the real axis (an operation
in the two-dimensional orthogonal group, $O(2)$, with determinant equal to −1).
However, quaternions, being friendly creatures, permit an algebraic representation
of conjugation:
$$\bar{Q} = -\frac{1}{2}(Q + iQi + jQj + kQk). \tag{233}$$
Using the previous identities derived, prove (via quaternion multiplication) that
this conjugation identity coincides with the first definition:
$$\bar{Q} = a - ib - jc - kd. \tag{234}$$
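As a numerical companion to the proof, the sketch below evaluates $-\frac{1}{2}(Q + iQi + jQj + kQk)$ for one sample quaternion and compares it with the componentwise conjugate $a - ib - jc - kd$. Quaternions are stored as tuples (a, b, c, d); the helper names and the sample values are illustrative assumptions, not part of the exercise.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    """Componentwise conjugate: a - ib - jc - kd."""
    a, b, c, d = q
    return (a, -b, -c, -d)

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
Q = (2, 3, -5, 7)   # sample quaternion with integer components (exact arithmetic)

# Q + iQi + jQj + kQk, added component by component.
terms = [Q] + [qmul(qmul(unit, Q), unit) for unit in (I, J, K)]
total = tuple(sum(components) for components in zip(*terms))
algebraic_conjugate = tuple(-x / 2 for x in total)

print(algebraic_conjugate, qconj(Q))
```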
Having defined conjugation for quaternions, we are now in a position to define
a sensible notion of ‘length’ (norm). As inspiration, one may recall that we can
compute the length (modulus) $|z|$ of a complex number $z$, by the following formula:
$$|z| = \sqrt{\bar{z}z}, \tag{235}$$
which simply follows from the polar representation of complex numbers. Similarly,
we can define the length of a quaternion $Q = a + ib + jc + kd$, to be:
$$\|Q\| = \sqrt{Q\bar{Q}} = \sqrt{\bar{Q}Q} = \sqrt{a^2 + b^2 + c^2 + d^2}. \tag{236}$$
The last equality coincides with the 4-dimensional Euclidean norm – or equivalently,
the four-dimensional version of Pythagoras’ theorem for the distance between
the origin (0, 0, 0, 0) and a point (a, b, c, d).
Exercise 30 (The Social Norm) With college pride at an all time low – due to the
loss of The Dragon, the students of St. George’s College start to take on a more
serious disposition. As part of this, they shift their gaze to the stars and the realm
of mathematics that lies beyond. Therefore, at the next college party, to avoid being
perceived as uncouth, you are asked to prove that:
$$\sqrt{Q\bar{Q}} = \sqrt{a^2 + b^2 + c^2 + d^2}, \tag{237}$$
using the algebraic properties of quaternions. Furthermore, show that if we multiply
a quaternion $Q$ by some real number $\lambda$, then $\|\lambda Q\| = |\lambda| \|Q\|$ – i.e. the norm
scales linearly.
Finally, show that the norm is multiplicative. This means showing that for any two
quaternions $Q_1, Q_2$, we have:
$$\|Q_1 Q_2\| = \|Q_1\| \|Q_2\|. \tag{238}$$
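The three norm properties can be checked numerically before (or after) proving them. In the sketch below, quaternions are tuples (a, b, c, d), and the two sample quaternions and the scalar are arbitrary illustrative choices:

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm(q):
    """Euclidean norm sqrt(a^2 + b^2 + c^2 + d^2)."""
    return math.sqrt(sum(x * x for x in q))

Q1, Q2, lam = (1, 2, 3, 4), (2, -1, 0, 3), -2.5

q_times_conjugate = qmul(Q1, qconj(Q1))        # should be real: (a^2+b^2+c^2+d^2, 0, 0, 0)
scaled_norm = qnorm(tuple(lam * x for x in Q1))
product_norm = qnorm(qmul(Q1, Q2))

print(q_times_conjugate, scaled_norm, product_norm)
```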
Given this sensible ‘norm’ (length) for quaternions, we can define a notion of dis-
tance (formally, a ‘metric’) on the space of quaternions. In particular, the dis-
tance between two quaternions Q1 and Q2 is defined to be the norm of their differ-
ence:
$$\rho(Q_1, Q_2) = \|Q_2 - Q_1\|, \tag{239}$$
where the map ρ is the 4-dimensional Euclidean (Pythagorean) metric. This coin-
cides with the usual notion of distance which you are familiar with in 1, 2 and 3
dimensional vector spaces.
Exercise 31 (Transcending Pythagoras) Joshua meets the ancient
Greek mathematician Pythagoras (an eternal one) while on an excursion
between worlds. Inspired by this meeting, he thinks of a correspondence between
quaternions and Euclidean geometry, which is as follows. Representing
the Quaternions $Q_1, Q_2$ as points (or vectors) $(a_1, b_1, c_1, d_1)$ and $(a_2, b_2, c_2, d_2)$ in
four dimensional real space $\mathbb{R}^4$, show that the distance formula
$$\rho(Q_1, Q_2) = \|Q_2 - Q_1\|, \tag{240}$$
simply gives the four-dimensional equivalent of Pythagoras’ theorem.
Hint: This means showing that $\|Q_2 - Q_1\| = \sqrt{(a_2 - a_1)^2 + \dots + (d_2 - d_1)^2}$.
Having defined the length of a quaternion, one can now make sense of what it
means to multiplicatively ‘invert’ a quaternion – i.e. to compute its reciprocal. In
particular, it was stated earlier that Quaternions were one of only four normed division
algebras – this means that there must exist some way of dividing quaternions,
which requires inverting them. Similar to complex numbers of unit length – which
we showed formed the ‘circle group’ (corresponding to two-dimensional rotations) –
one can define a quaternion of unit length as follows. Given any non-zero Quaternion $Q$, the
corresponding unit Quaternion is given by:
$$\hat{Q} = \frac{1}{\|Q\|} Q. \tag{241}$$
This is not surprising, as it is how we usually construct unit vectors.
To construct an inversion formula, we draw on the analogy provided by complex
numbers. In particular, given any complex number $z$, we know that $z\bar{z} = |z|^2$,
which is a real scalar. If we were to divide both sides by $|z|^2$, we would have:
$$\frac{z\bar{z}}{|z|^2} = 1. \tag{242}$$
Rearranging, we have:
$$z \left(\frac{1}{|z|^2}\bar{z}\right) = 1, \tag{243}$$
hence we can see explicitly that for any non-zero complex number $z$ ($|z| \neq 0$), its
inverse is given by:
$$z^{-1} = \frac{1}{|z|^2}\bar{z}. \tag{244}$$
One would expect that if you multiply a quaternion and its conjugate, that you get
a real scalar. Dividing the resulting number by that scalar would give 1, meaning
that inverting a non-zero quaternion (i.e. a quaternion whose components are not all
zero) should amount to the same procedure as we just demonstrated for complex
numbers. This is indeed true, hence given any non-zero quaternion Q, we can
construct its inverse $Q^{-1}$ using the following formula:
$$Q^{-1} = \frac{1}{\|Q\|^2}\bar{Q}. \tag{245}$$
Exercise 32 (Broadening Units) Prove that the above formula for the Quaternion inverse
works – this is very trivial. In particular, show that $Q \frac{1}{\|Q\|^2} \bar{Q} = 1$ using
any of the previous formulas derived. Now, write down your favourite quaternion
(one with all components $a, b, c, d \neq 0$) and compute the corresponding conjugate
quaternion $\bar{Q}$, unit Quaternion $\hat{Q}$ and the inverse quaternion $Q^{-1}$. Finally, perform
the left and right multiplications explicitly: $QQ^{-1}$ and $Q^{-1}Q$ and show they
are both equal to 1.
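For checking your hand computation, here is a small sketch of the unit quaternion and the inverse formula $Q^{-1} = \bar{Q}/\|Q\|^2$, again with quaternions stored as tuples (a, b, c, d). The helper names and the sample quaternion are illustrative assumptions.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qunit(q):
    """Unit quaternion Q / ||Q|| (undefined for the zero quaternion)."""
    norm = math.sqrt(sum(x * x for x in q))
    return tuple(x / norm for x in q)

def qinv(q):
    """Inverse conj(Q) / ||Q||^2 (undefined for the zero quaternion)."""
    norm_squared = sum(x * x for x in q)
    if norm_squared == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    a, b, c, d = q
    return (a / norm_squared, -b / norm_squared, -c / norm_squared, -d / norm_squared)

Q = (1.0, 2.0, 3.0, 4.0)    # sample quaternion with all components non-zero
left = qmul(qinv(Q), Q)     # Q^{-1} Q
right = qmul(Q, qinv(Q))    # Q Q^{-1}
print(qunit(Q), left, right)
```

Both products should come out as (1, 0, 0, 0) up to floating-point rounding, even though quaternion multiplication is not commutative in general.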
13.4 Quaternions, Rotations and the 3-Sphere
Recall we suggested that it might be possible to construct some algebra representing
3-dimensional rotations, by taking the product of three copies of the circle
group $S^1$. This would form a 3-dimensional torus $T^3 = S^1 \times S^1 \times S^1$ parametrized
by three separate angles. Note that your usual torus (yummy doughnuts) is
2-dimensional: $T^2 = S^1 \times S^1$, since it is parametrized by two angles. Physically,
you can see this explicitly by cutting a torus horizontally or vertically – clearly it
consists of two different circles whose symmetry axes are perpendicular to each
other (unless it is an oblique torus).
This intuition was not bad, although it was wrong. The correct geometric object
representing 3-dimensional spatial rotations is in fact the 3-dimensional hypersphere,
$S^3$ – referred to as the ‘3-Sphere’. Note that the spheres you are most
familiar with are the 2-dimensional surfaces $S^2$ – they are parametrized by two angles,
which you probably refer to as latitude and longitude (or θ and φ) 98. As stated
in a previous tutorial, the 2-dimensional sphere is the boundary of a 3-dimensional
ball – which is a solid (boundary surface + interior). Thus, you can think of a
3-dimensional sphere as the boundary surface of a 4-dimensional ball – if you ever
want to be tripped out, look up a simulation of a topological 3-Sphere on YouTube.
We shall now see how quaternions of unit length are in fact a direct representation
of the symmetries of the 3-dimensional sphere $S^3$. As such, they provide an algebraic
bridge between the Lie group of rotations $SO(3)$, the 3-dimensional sphere and the
special unitary group $SU(2)$ (which we haven’t covered) – the latter being a symmetry
group in quantum mechanics (regarding spin and angular momentum) and
also the symmetry group representing interactions occurring via the weak nuclear
force.
To see the correspondence between quaternions and rotations, recall Euler’s rotation
theorem which says that “Any rotation or sequence of rotations of a rigid
body or reference frame about a fixed point, is equivalent to a single rotation by
a given angle θ about some fixed axis that runs through the fixed point.” As we
saw in tutorials 8–10, this was rather obvious when we constructed ‘Rodrigues’
Rotation Formula’ – the data this required was some 3-dimensional unit vector
$u = u_1e_1 + u_2e_2 + u_3e_3$ representing the rotation axis, along with a scalar θ
representing the rotation angle.
Given a unit vector $(u_x, u_y, u_z)$ and an angle θ, a quaternionic generalization of Euler’s
formula is as follows:
$$Q = e^{\frac{1}{2}\theta(u_x i + u_y j + u_z k)} = \cos\tfrac{1}{2}\theta + (u_x i + u_y j + u_z k)\sin\tfrac{1}{2}\theta, \tag{246}$$
where i, j, k are quaternionic imaginary units. Here $u_x i + u_y j + u_z k$ is a unit imaginary
quaternion, so $Q$ itself has unit length. We now see why we called the coefficients
of i, j, k the ‘vector part’ of the quaternion Q – they do indeed correspond
to some 3-dimensional vector $(u_x, u_y, u_z)$, which in this case describes a rotation
98
Mathematicians and Physicists often use opposite conventions...
axis. This quaternion can be thought of as a function of four-variables: the rotation
axis u (3 variables) and the rotation angle θ (1 variable) and will act as a linear
operator that rotates any 3-dimensional vector v by some angle θ, anti-clockwise
about the axis u.
Explicitly, we first represent any 3-dimensional real vector $v = v_1e_1 + v_2e_2 + v_3e_3$
by a purely imaginary quaternion: $\mathbf{v} = iv_1 + jv_2 + kv_3$. The rotated quaternion
$\mathbf{v}'$ (rotated by an angle θ anticlockwise about an axis u) is given by a group action
known as ‘conjugation’ by the quaternion Q:
$$\mathbf{v}' = Q\mathbf{v}Q^{-1}, \tag{247}$$
using quaternionic multiplication (note that order is very important – if you conjugate
the wrong way, i.e. compute $Q^{-1}\mathbf{v}Q$, you will get the reverse rotation). To recover the
rotated vector, you simply apply the reverse isomorphism and replace the quaternionic
imaginary units in the rotated quaternion with the standard basis vectors for $\mathbb{R}^3$.
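The whole procedure (build $Q$ from an axis and an angle, embed the vector as an imaginary quaternion, conjugate, read off the vector part) fits in a few lines. The sketch below assumes the convention $Q = \cos\frac{\theta}{2} + (u_x i + u_y j + u_z k)\sin\frac{\theta}{2}$ with the sandwich taken as $Q\mathbf{v}Q^{-1}$, which gives an anticlockwise (right-handed) rotation about $u$; swapping the order of $Q$ and $Q^{-1}$ gives the reverse rotation. The function names are illustrative.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def rotation_quaternion(axis, theta):
    """Unit quaternion cos(theta/2) + u sin(theta/2); the axis is normalized first."""
    length = math.sqrt(sum(x * x for x in axis))
    s = math.sin(theta / 2) / length
    return (math.cos(theta / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate(v, axis, theta):
    """Rotate the 3-vector v by theta radians anticlockwise about axis via Q v Q^{-1}."""
    q = rotation_quaternion(axis, theta)
    q_inv = (q[0], -q[1], -q[2], -q[3])       # for a unit quaternion, inverse = conjugate
    v_quat = (0.0, v[0], v[1], v[2])          # embed v as an imaginary quaternion
    _, x, y, z = qmul(qmul(q, v_quat), q_inv)
    return (x, y, z)

# A quarter turn about the z-axis should send e1 to (approximately) e2.
rotated = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print(rotated)
```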
Exercise 33 (Rotary Club) Having learned to rotate vectors using quaternions,
the three amigos were subsequently given lifetime membership to the Perth Rotary
Club. As part of this membership, they must explain how one would perform two
successive rotations of a 3-dimensional vector using unit quaternions Q1 and Q2,
representing rotations about axes u1, u2 and angles θ1, θ2. For you to obtain
membership, write down an algebraic expression to do this.
Now, in your expression – how do you guarantee that the rotation Q1 is performed
before Q2, and not the other way around?
Generalize your results to a sequence of n rotations, where n = 0, 1, 2, 3....
Finally, using the rotation matrices from the Lie group of rotations as an inspiration,
suggest an easy way to invert a unit quaternion representing
some rotation. Hint: think of the conjugation operation as well as replacing
the rotation angle with its negative.
Exercise 34 (Drones) Having got wind of the renewed Project Death Star, the U.S.
Military decides to send an Amazon drone to St. George’s College to deliver a book
entitled: Freedom. Inside this book is a 1 megaton nuclear warhead. Having not
ordered this book, Ian Hardy decides to set up an old 20th century cannon on top
of the college tower – but replaces the internal structure with a LAWS naval laser.
The cannon is at an orientation of 45 degrees from the horizontal, facing the
river (the x-axis), with the y-axis aligning with Tommy Moore College and the z-axis
being vertically overhead. After setting the cannon to be the origin of some
3-dimensional real vector space – with z = 0 coinciding with the top of the tower,
the Freedom Drone approaches the college at a coordinate of (10, 7, 5) metres.
Write down a single quaternion Q that will rotate the cannon so that its line of
sight is directed head-on at the Freedom drone. Now use this quaternion to rotate
the cannon and check that it indeed works.
Hint: You may use three sequential rotations to obtain a single quaternion, or you
can try and find the appropriate rotation axis u and angle θ to write down the
quaternion in one go.
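One possible way to attack the ‘one go’ route of the hint is sketched below. It rests on assumptions that are ours, not the exercise's: the barrel initially points along $(\cos 45°, 0, \sin 45°)$, the drone is treated as a point at (10, 7, 5) seen from the origin, the rotation axis is taken as the normalized cross product of the two directions, the angle comes from their dot product, and the rotation is applied as $Q\mathbf{v}Q^{-1}$.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

barrel = (math.cos(math.pi / 4), 0.0, math.sin(math.pi / 4))  # assumed initial line of sight
target_dir = normalize((10.0, 7.0, 5.0))                      # direction from cannon to drone

axis = normalize(cross(barrel, target_dir))   # rotation axis perpendicular to both directions
theta = math.acos(dot(barrel, target_dir))    # angle between the two unit vectors
s = math.sin(theta / 2)
Q = (math.cos(theta / 2), axis[0] * s, axis[1] * s, axis[2] * s)
Q_inv = (Q[0], -Q[1], -Q[2], -Q[3])           # unit quaternion: inverse = conjugate

aimed = qmul(qmul(Q, (0.0,) + barrel), Q_inv)[1:]   # rotated line of sight
print(aimed, target_dir)
```

The final print lets you check that the rotated line of sight agrees with the direction to the drone, which is the verification the exercise asks for.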
Finally, in the same manner that we were able to use the Lie algebra of rotations to
perform vector cross products in 3-dimensions, we can also perform the scalar
product (dot product) and vector cross product of 3-dimensional vectors using the
isomorphism between 3-dimensional vectors and imaginary quaternions. In particular,
given any two real 3-dimensional vectors v and u, we represent them as
quaternions $\mathbf{v}$ and $\mathbf{u}$, by replacing the standard basis vectors $\{e_1, e_2, e_3\}$ with the
quaternionic imaginary units $\{i, j, k\}$ – which is what we did previously. Now, we
can write their dot product and vector cross product in quaternion form:
$$v \cdot u = \frac{1}{2}(\mathbf{v}\bar{\mathbf{u}} + \mathbf{u}\bar{\mathbf{v}}) = \frac{1}{2}(\bar{\mathbf{v}}\mathbf{u} + \bar{\mathbf{u}}\mathbf{v}), \qquad \mathrm{cross}(v, u) = \frac{1}{2}(\mathbf{v}\mathbf{u} - \bar{\mathbf{u}}\bar{\mathbf{v}}). \tag{248}$$
Replacing the quaternionic imaginary units with the standard basis vectors for $\mathbb{R}^3$
in the resulting quaternion for the cross product (i.e. applying the inverse isomorphism)
gives us the vector cross product of the two real vectors v and u; the dot product is
simply the resulting real scalar.
Exercise 35 (Mid-Semester Break) Deciding that university exams were too easy,
the three amigos continued the renewed project Death Star (v2.0), during the mid-
semester break. As part of their daily intellectual exercise, help them verify that
the above quaternion identities do indeed reproduce the dot and cross products
computed by less-extravagant means.
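A quick numerical cross-check of the identities $v \cdot u = \frac{1}{2}(\mathbf{v}\bar{\mathbf{u}} + \mathbf{u}\bar{\mathbf{v}})$ and $\mathrm{cross}(v, u) = \frac{1}{2}(\mathbf{v}\mathbf{u} - \bar{\mathbf{u}}\bar{\mathbf{v}})$ is sketched below, using two arbitrary integer vectors of our choosing embedded as imaginary quaternions (tuples with zero scalar part):

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qconj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

v3, u3 = (1, 2, 3), (-4, 5, 6)    # sample real 3-vectors
v, u = (0,) + v3, (0,) + u3       # the same vectors as imaginary quaternions

# Dot product: (1/2)(v conj(u) + u conj(v)), a purely real quaternion.
dot_quaternion = tuple(
    (x + y) / 2 for x, y in zip(qmul(v, qconj(u)), qmul(u, qconj(v)))
)
# Cross product: (1/2)(v u - conj(u) conj(v)), a purely imaginary quaternion.
cross_quaternion = tuple(
    (x - y) / 2 for x, y in zip(qmul(v, u), qmul(qconj(u), qconj(v)))
)

# The same products computed by less-extravagant means.
dot_direct = sum(a * b for a, b in zip(v3, u3))
cross_direct = (
    v3[1] * u3[2] - v3[2] * u3[1],
    v3[2] * u3[0] - v3[0] * u3[2],
    v3[0] * u3[1] - v3[1] * u3[0],
)
print(dot_quaternion, dot_direct)
print(cross_quaternion, cross_direct)
```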
This concludes our investigation of quaternions. There are many more surprising
and interesting properties as well as their appearance in different areas of the math-
ematical sciences – perhaps one day, we will return to them. For now, Adieu and
happy exams/holidays.
Note: I expect everyone to complete unfinished tutorials before next semester! Michael
Champion is back, so there is no hiding. :)
14 Interlude: Academic and Intellectual Maturity
You may or may not be aware, but this semester the college has decided to ‘adopt’
some briefings on ‘academic maturity’. This is in essence, to meet a strong de-
mand/need that was implicit in Semester 1, but perhaps not explicit till a combined
review of the academic status of students at this college. The purpose of this set
of points as a subset of some initial pondering I made at the end of Semester 1, is
to lay bare a set of essential facts (reviewed after several discussions) which I had
erroneously taken for granted to be common knowledge.
Ultimately, the aim of the academic component of St. George’s college, is to foster
the development of wise, mature and competitively adept intellectuals who will
make the most of their innate and external potential. As your tutor, who wants
to see the best progress in your character development (not just academic), I am
here to provide some guidance and support. Sometimes this includes stating the
hard facts and hard truths. One would hope that you adopt these observations into
your future judgements and considerations of how you orient yourselves in the real
world – many of the lessons learned at university are just a reflection of the human
condition at large.
14.1 Keeping a CV
For the specific purpose of illustrating what opportunities are available to mathe-
matics and physics students, along with the general sort of ‘steps’ one must climb to
attain those opportunities, I will upload my academic (Mathematics/Physics) Cur-
riculum Vitae. Of course, everyone will have different journeys to take, but some-
times it helps to borrow ideas – or at least see what’s available on the menu.
Disclaimer: The CV which I will upload is a very ‘generic’ academic style one
– with extra information at the end. Note that one would almost never submit a
full-length CV when applying for scholarships, external workshops and schools
or professional appointments. In particular, one should always write a ‘purpose-
specific’ tailored CV for applications. The generic full-length CV you keep is more
of a personal record or ‘master file’ from which you can copy and paste / edit to
appropriate length for applications.
Note that it’s extremely important to create and update a CV during your univer-
sity training years. This is not just to help you apply for scholarships, travel or job
opportunities – it’s also a very precise and encouraging way to keep track of your
development! Personally, I would suggest using the typesetting program ‘LaTeX’
to create and update your CV (once you learn the minimal programming required
it’s more efficient than Word). This gives it a neater and more professional look.
LaTeX is also the standard typesetting system used for most professional communication
(journal articles, lecture notes, books etc) in the academic world – making Microsoft
Office quite an outdated fossil which all academics should upgrade from
(for example, LaTeX makes referencing extremely easy).
14.2 Important Learnings and Observations
Unfortunately, with respect to selection criteria, there is an abysmal correlation be-
tween high school performance (ATAR scores) and university performance (course-
work results). There are many reasons for this discrepancy.
• Fallacy: High ATAR = Intelligence = High University Performance.
• Fallacy: High ATAR = Strong Initiative and Self-Discipline.
• Fallacy: High ATAR = Self-Motivated and Passionate.
• Fallacy: High ATAR = Good Self Time-Management.
• Partial Truth: High ATAR = Well-developed study skills.
When people perceive that they are very successful – especially in a ‘be-all-end-all’
type affair as the Year 12 ATAR scores are hyped up to be, they can become naively
complacent. They assume that success with their ATAR entitles them to success
at university, at life or that it bestows them with the eternal label: “Intelligent
person”. Sometimes, in the few cases where this success was determined mostly by
the individual student – and not the support of their teachers, parents, tutors or
schooling system – complacency is somewhat justified. For the most part, it is
(from my observations) pathological hubris.
This is not to say that students should not be rewarded or congratulated for success
in high school – but rather, that they should know early on, that ATAR scores are
a very superficial means of assessing ‘potential for success’. In this way, students
can be made aware from a very early stage that they must be willing to learn and
re-adapt themselves to new challenges and new scenarios that they are presented
with. University is one such challenge. This should be part of the student’s gradual
process of self-awareness, accountability and increased responsibilities, but also
part of the university and college responsibility – if they wish to maintain students
with ‘above average’ mindsets. To perform well at university – meaning maintaining
an average of ≥ 90% (or ≥ 80% if you want to use the High-Distinction
standard at UWA) in coursework – a typical high-performing high-school student
is suddenly presented with some cunningly ‘hidden’ challenges. These may or
may not be obvious at the outset, depending on the perception and wisdom of the
student. Nonetheless, they are summarised as follows:
• Success requires self-discipline and strategic time-management.
For some people, this may require giving up things such as computer games
or recreational skydiving. It also requires making lots of ‘lists’, keeping a
diary and calendar – prioritising major tasks and actually getting through
one’s lists. One must know one’s limits however – ‘down time’ is necessary
to maintain optimal performance. However, there is a point where ‘down-
time’ turns into avoidance and procrastination.
• Success requires self-motivation and self-regulation.
It’s no longer the job of your teachers, parents or school to tell you what to
do, or to inspire you to do well. Students who come from schools or families
where all the hard behind-the-scenes work was ‘done for them’, are often
slow to learn this lesson. It is in part because high school teaches students
that their ATAR = their intelligence, without necessarily acknowledging
the monumental efforts of some teachers. The converse may also be true –
astounding individuals can emerge from abhorrent schools, entirely due to
their own initiative.
• Success requires honest self-inventory.
One bad habit that emerges from our present generation, is the habit of
‘blame’ and misdirection. Sometimes when people under-perform, they look
for excuses or people to blame – they may point to the lecturer, their tutors,
the university, personal relationships or a whole plethora of external factors.
The great contemporary intellect, Noam Chomsky, writes of this somewhere
– in essence the habit of blame is partly embedded within modern media,
which makes use of capitalistic ‘feel good’ psychology. It also partly comes
from high school, where teachers are held responsible and accountable for
almost everything. At university, the student must take on the attitude that
their learning, understanding and success is ultimately their own responsi-
bility – their lecturers and courses are merely there to facilitate their journey,
but the student must take their own steps.
Often one finds that people who exhibit substandard performance, whilst
being complacent, over-trivialise a mastery of skill or some accomplishment.
This is most likely just another instance of the ‘Dunning-Kruger effect’
(see below).
At the end of the day, it’s better neither to undersell nor to oversell yourself.
Sure, you want to sell yourself well when applying for jobs, scholarships or
various opportunities – or you may wish to undersell yourself so as not to
alienate yourself from people who are easily intimidated and envious. However,
in your own personal and private inventory of your abilities, you should
strive to be very precise and accurate – this means having a good operational
understanding of your weaknesses and strengths. As a general principle, you
should always be working to improve on your weaknesses and capitalize on
your strengths – this alone will lead to a rise in grades and performance at
university (and life in general).
• Dunning-Kruger effect. “If you’re incompetent, you can’t know you’re
incompetent. [...] the skills you need to produce a right answer are exactly the
skills you need to recognize what a right answer is.” – David Dunning.
“Unskilled individuals suffer from illusory superiority, mistakenly rating
their ability much higher than is accurate. This bias is attributed to a
metacognitive inability of the unskilled to recognize their ineptitude.” [Kruger & Dunning,
Journal of Personality and Social Psychology, Vol 77(6), Dec 1999,
1121-1134]
“Those persons to whom a skill or set of skills come easily may find themselves
with weak self-confidence, as they may falsely assume that others
have an equivalent understanding (Impostor syndrome).”
In summary, the Dunning-Kruger effect is a rife disease of the mind which
can only be cured with ‘academic maturity’. Since it occurs in the population
at large, it’s not unusual in any respect – for example, recalling Alexander
Pope from 1709:
A little learning is a dangerous thing; drink deep, or taste not the Pierian
spring: there shallow draughts intoxicate the brain, and drinking largely
sobers us again.
Similarly, one can find much earlier statements pertaining to humility and
ability, for example, from Confucius – “Real knowledge is to know the
extent of one’s ignorance.”
If you’re going to claim that you’re good or excellent at something (e.g. be-
ing intelligent), take a step back and ask yourself – in what context and by
what measures are these claims met? If someone else claims this, ask your-
self the same thing. On the other hand, if you claim to be bad at something,
take a step outside of yourself and scope out the progress you’ve made – are
you really ‘bad’, or are you just in the process of learning and training?
Certainly the latter provides a more useful and operationally efficient perspective 99!
Of course, ineptness does not mean being incapable of learning or unable
to improve – indeed, a change of attitude can see inept people reaching a
level of mastery in the relevant skill (here we speak of mathematics, problem
solving, physics and logical thinking). Basically, the Dunning-Kruger effect and
false pride are age-old problems, the greatest irony being that sometimes
inept people perceive those who are very adept as being arrogant or
egotistical. C’est la vie.
14.3 Motivation
Despite the above critical analysis, one must conclude with the observation that
amongst this group of students and tutors, we have some very talented individuals.
For the most part, it is clear that the students and tutors who have attended tutorials
so far have great potential in their future careers. One small, but perhaps obvious,
secret is that apart from a few cases of extraordinary innate genius or upbringing
(e.g. Carl Friedrich Gauss), the majority of intellects who have had a dramatic and
profound influence on science, art and the human condition in general were people
who worked very hard behind the scenes. It’s not true that everyone can be as
revolutionary or influential as Einstein, Dirac or David Hilbert, but it is true that with
a small predisposition towards the mathematical sciences, accompanied by thousands
of hours of mathematical entertainment and thought experiments, one
can have a good chance of making important, interesting and lasting contributions
to human knowledge and our understanding of the world around us. If you can
convince yourself that it is something you enjoy, then the hours required to get to
such a level aren’t really work at all – it’s just play.
Alternatively, you can make an immense amount of money with the strategic,
organized and problem-solving mindset that the mathematical sciences equip you
with – for example, the mathematician James H. Simons or the ex-theoretical
physicists who crashed the Stock Market with their micro-trading AIs. Whatever
your goal is, one fact remains – the better you perform, the more opportunities you
create for yourself. Success is attractive, under-performance isn’t.
“We at the height are ready to decline. There is a tide in the affairs of (wo)men
Which, taken at the flood, leads on to fortune; Omitted, all the voyage of their life
Is bound in shallows and in miseries. On such a full sea are we now afloat, And we
must take the current when it serves, Or lose our ventures.”
99
Disclaimer: A grain of salt – you may also just be really bad.
15 Tutorial 12: Metric Spaces and Relativity I
One way that we see progress in science is to take an everyday concept or prin-
ciple and make it ‘abstract’. This means distilling the concept and extracting its
fundamental elements so that the concept can be ‘re-applied’ to more general set-
tings. Sometimes this is purely a creative affair, but almost always it leads to new
applications, new ideas and entirely new fields of research.
In this tutorial, we take the everyday concept of ‘distance’ and mathematically for-
malize it to give a mathematical object known as a ‘metric’. We then couple this
object with a set (e.g. a 3-dimensional space), to give a structure called a ‘met-
ric space’. Such structures have fundamental applications to pure mathematics,
optimization, computer science and physics.
Exercise 36 With the person closest to you, think of 3 examples of different notions
of distance in everyday living. Furthermore, considering two distinct (possibly ab-
stract) objects or locations, state two notions of distance that give different values
for the distance between those objects.
15.1 Metric Spaces
To make the concept of ‘distance’ mathematically concrete, we formalize it with
the notion of a ‘metric space’.
Definition 9 A metric space is a pair (S, d) consisting of a set S and a metric
d on S, which is a map d : S × S → R characterised by the following properties
for any x, y, z ∈ S:
1. positivity
d(x, y) ≥ 0, (249)
2. non-degeneracy
d(x, y) = 0 iff x = y, (250)
3. symmetry
d(x, y) = d(y, x), (251)
4. triangle inequality
d(x, z) ≤ d(x, y) + d(y, z). (252)
Conceptually, the metric d gives us a measure of ‘distance’ on the set S. Note
that the above properties are somewhat ‘intuitive’ – the symmetry property (3) just
says that the distance from point A to point B should be the same as the distance
from point B to point A. One could relax this property to obtain a more general notion
of distance, for example, on a ‘directed graph’.
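The four properties can also be tested numerically on a handful of sample points. The following is only a minimal sketch: the helper name violated_axioms, the tolerance and the sample points are illustrative assumptions rather than anything from these notes, and passing the checks on a finite sample is of course not a proof that a map is a metric. The first candidate is the absolute-value distance on the real numbers; the second is a squared difference, which should fail the triangle inequality.

```python
from itertools import product


def violated_axioms(d, points, tol=1e-12):
    """Return the names of the metric axioms that fail for d on the sample points."""
    failures = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            failures.add("positivity")
        if (abs(d(x, y)) <= tol) != (x == y):
            failures.add("non-degeneracy")
        if abs(d(x, y) - d(y, x)) > tol:
            failures.add("symmetry")
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            failures.add("triangle inequality")
    return failures


# Hypothetical sample of points on the real number line.
sample = [-2.0, -0.5, 0.0, 1.0, 3.5]

# The absolute-value distance passes every check on this sample.
print(violated_axioms(lambda x, y: abs(x - y), sample))

# The squared difference is symmetric and non-degenerate, but it
# breaks the triangle inequality (e.g. x = -2, y = 0, z = 3.5).
print(violated_axioms(lambda x, y: (x - y) ** 2, sample))
```

A check like this is mainly useful for hunting counter-examples: a single failing triple is enough to show that a proposed ‘distance’ is not a metric.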
Problem 22 (Warm-up (Challenging)) It is possible to derive the first property
(positivity) of a metric space from the other properties. In this manner, one may
view properties 2-4 as fundamental axioms and property 1 as a consequence –
meaning one could technically discard it from the definition of a metric space.
Assuming properties 2-4 hold, prove that property 1 follows.
15.1.1 Euclidean Metric Spaces
We now consider a simple, but important example of a 1-dimensional metric space
(R, d), where d is the map given by the ‘absolute value’ operation.
Example 10 (Absolutely Easy) Given the set R of real numbers (which we can
represent as the real number line), a ‘Euclidean’ metric d is given by the absolute
value of the difference. In particular, for any two numbers x, y ∈ R, we define:
d(x, y) = |x − y|, (253)
to be the distance between the points x and y. To see that d is indeed a metric on R,
we must check that (R, d) satisfies the four axioms of a metric space. In particular,
let x, y, z ∈ R be any points on the real number line. The first three properties
follow directly from the properties of the absolute value – the triangle inequality
(fourth property) requires a bit more work.
1. Positivity: d(x, y) = |x − y|≥ 0 follows from the properties of the absolute
value.
2. Non-degeneracy: d(x, y) = |x − y| = 0 holds if and only if x = y.
3. Symmetry: d(x, y) = |x − y|= |y − x|= d(y, x).
4. Triangle inequality: |x − z| ≤ |x − y| + |y − z|. To see that this property
holds, consider the following proof:
−|a| ≤ a ≤ |a| and −|b| ≤ b ≤ |b|
=⇒ −(|a| + |b|) ≤ a + b ≤ |a| + |b|
=⇒ |a + b| ≤ |a| + |b| (254)
for any a, b ∈ R. In particular, if we let a = x − y and b = y − z we see that
|x − y + y − z|= |x − z|≤ |x − y|+|y − z|, (255)
hence proving that the map d(x, y) = |x−y| satisfies the triangle inequality.
Alternatively, here is another proof of the triangle inequality:
|a + b|² = a² + b² + 2ab
= |a|² + |b|² + 2ab
≤ |a|² + |b|² + 2|a||b| = (|a| + |b|)². (256)
Since both sides of the above inequality are non-negative, we can take the square
root 100 of both sides, giving
|a + b| ≤ |a| + |b| ∀a, b ∈ R. (257)
Hence it follows that (R, d = | · |) is a 1-dimensional metric space. We call this
1-dimensional ‘Euclidean Space’.
Problem 23 (Challenge) With a simple trick, one can use the 1-dimensional tri-
angle inequality to prove the ‘reverse triangle inequality’:
||a|−|b||≤ |a − b|. (258)
Do this if you finish the rest of the tutorial.
Although the 1-dimensional Euclidean metric space we previously considered was
very simple, it serves as a ‘building block’ for more complicated metric spaces –
for example, your familiar 2-dimensional Euclidean space.
100
Because the real square root is a monotonically increasing function, it does not affect the
direction of the inequality.
Example 11 In two-dimensional Euclidean space (flat-land) R2, we can measure
the ‘straight-line’ distance between two points using Pythagoras’ theorem. First
we set up a 2D Cartesian coordinate system with (0, 0) as the origin. Now, taking
points P1 = (x1, y1) and P2 = (x2, y2), one can draw a right-angle triangle
with the vector v = (x2 − x1, y2 − y1) pointing from P1 to P2 as its hypotenuse. In
terms of these coordinates, we can then measure the ‘Euclidean distance’ between
P1 and P2 to be the length of the hypotenuse of the previous triangle as given by
Pythagoras’ theorem 101
d(P1, P2) = ||v|| = √((x2 − x1)² + (y2 − y1)²). (259)
It should be an easy exercise to verify that the 2-dimensional Euclidean metric
d, given by the Pythagoras rule is indeed a metric on 2-dimensional real space R2.
Note however, that proving the 2-dimensional triangle inequality may require some
cunning.
Exercise 37 (Easy (but not so easy)) Prove that the 2-dimensional Euclidean map
d given by (259) is indeed a metric on R2. In other words, prove that (R2, d)
satisfies the four axioms of a metric space.
Hint: Use Cartesian coordinates to label your points P1 = (x1, y1), P2 = (x2, y2),
P3 = (x3, y3). To show that the 2-dimensional triangle inequality holds, one may
consider the ‘geometrical’ proof by Euclid in Book 1 of Euclid’s Elements. Note
that this proof should suffice for Euclidean spaces of arbitrary dimension, since
any three points must lie in some 2-dimensional plane – a linear subspace of your
total space, whence Euclid’s proof can be applied.
Alternatively, to prove that
d(P1, P3) ≤ d(P1, P2) + d(P2, P3) (260)
for any points {P1, P2, P3} in R2, one can make use of the ‘1-dimensional triangle
inequality’ proven earlier or prove an inequality known as the ‘Cauchy-Schwarz’
inequality (easy-but-not-so-easy).
For 3-dimensional real space, R3, we have the additional knowledge that it can
be naturally equipped with a vector-space structure. In particular, recall that any
point (x, y, z) can be represented by a vector pointing from the origin (0, 0, 0)
to (x, y, z) and that the addition and subtraction of ‘points’ simply corresponds
to vector addition and subtraction. In this manner, a notion of distance between
points A and B can be given by the Euclidean length of the vector between those
points – that is, the ‘norm’ 102 of the vector pointing from A to B (or vice versa),
given by vector subtraction. This notion of distance is simply a 3-dimensional
version of Pythagoras’ theorem, applied to the differences between the coordinates
of A and B. Therefore, for A = (a1, a2, a3), B = (b1, b2, b3) ∈ R3, we see that
d(A, B) = ||AB|| = ||B − A||, (261)
gives us our standard metric on R3, making (R3, d) a 3-dimensional Euclidean
metric space. Intuitively, one can easily generalize this observation to define a
Euclidean metric on n-dimensional real space Rn for any dimension 103 n ≥ 0.
101
Recall, for a right-angle triangle with sides of length a, b and c, where c is the hypotenuse, one
has a² + b² = c².
102
Recall that the ‘norm’ of a vector v = (x, y, z) in the standard basis is simply given by
||v|| = √(x² + y² + z²).
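The generalization to n dimensions is short enough to write out as code. The sketch below is illustrative only (the function name euclidean_distance and the sample points are assumptions, not part of these notes): it sums the squared coordinate differences and takes the square root, which is just Pythagoras’ theorem applied coordinate by coordinate.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (Pythagorean) distance between two points of R^n."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))


# n = 1 reduces to the absolute value |x - y|.
print(euclidean_distance((2.0,), (-1.0,)))
# n = 2: the hypotenuse of a right-angle triangle with sides 3 and 4.
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))
# n = 3: the distance between A = (1, 2, 3) and B = (4, 6, 3).
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))
```

Swapping the two arguments changes the sign of every coordinate difference but not its square, which is the symmetry axiom in action.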
Exercise 38 (Cauchy-Schwarz Inequality (Challenge)) Given any vector space
V (for example Rn) equipped with a positive-definite inner-product ⟨·, ·⟩ (e.g. the
dot-product) and some corresponding induced ‘norm’ ||·|| (some notion of length,
e.g. ||v|| = √|v · v|), it follows that
|⟨v, u⟩| ≤ ||v|| · ||u||, (262)
where u, v are vectors in V.
For the case of the dot-product, ⟨u, v⟩ := u · v, prove that the Cauchy-Schwarz
inequality holds on Rn.
Hint: Take n = 3 for simplicity. Also, note that technically you cannot use the
identity u · v = ||u||||v||cos(θ) (where θ is the ‘angle’ between u and v) since
this explicitly relies on the Cauchy-Schwarz inequality being true! Indeed, such an
identity is a characterising quality of Euclidean geometry.
Hint: You may want to consider expressing ||u||2||v||2 as a sum of dot-products and
cross-products – this was used by Lagrange.
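Before attempting the proof, it can be reassuring to test the inequality on random vectors. The sketch below is a numerical sanity check only, not a proof; the helper names (dot, cross, lagrange_gap), the random seed and the tolerances are all illustrative assumptions. It takes n = 3 and checks the squared form of the Cauchy-Schwarz inequality, together with the relation between dot-products and cross-products alluded to in the hint.

```python
import random


def dot(u, v):
    """Dot-product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Cross-product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def lagrange_gap(u, v):
    """The quantity ||u||^2 ||v||^2 - (u . v)^2 for vectors in R^3."""
    return dot(u, u) * dot(v, v) - dot(u, v) ** 2


random.seed(0)
for _ in range(1000):
    u = tuple(random.uniform(-10, 10) for _ in range(3))
    v = tuple(random.uniform(-10, 10) for _ in range(3))
    w = cross(u, v)
    # The gap equals ||u x v||^2, up to floating-point rounding.
    assert abs(lagrange_gap(u, v) - dot(w, w)) <= 1e-6 * (1 + dot(u, u) * dot(v, v))
    # Cauchy-Schwarz in squared form: (u . v)^2 <= ||u||^2 ||v||^2.
    assert dot(u, v) ** 2 <= dot(u, u) * dot(v, v) * (1 + 1e-12)
```

Since the gap is a squared length, it can never be negative, which is the observation the proof needs to make rigorous.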
15.1.2 Fun Metric Spaces
In this section, we consider some ‘less obvious’ examples of metric spaces.
Formally, these are referred to as ‘Fun spaces’, after the German mathematician,
Friedrich P. Fun104. Apart from being fun, some of these metric spaces have powerful
applications to the real world.
103
Note that a zero-dimensional space, R0, would simply correspond to a single point and hence
would trivially satisfy the axioms of a metric space.
104
This statement is false.
Exercise 39 (The Taxicab Metric) The Taxicab metric, or ‘rectilinear distance’
was first formally considered by Minkowski105 and can be defined on any real vec-
tor space, Rn, to make a metric space formally denoted by l1(Rn) (little ‘l1’). It
is nicknamed the ‘taxicab’ or ‘Manhattan’ metric because the distance between
points on a 2-dimensional space, R2 is computed using an L-shaped pattern as a
taxi driver in Manhattan would travel (rather than ‘as the crow flies’, which would
give the ‘Euclidean’ distance).
Q1 Prove that the map d : R2 × R2 → R, defined by
d(P1, P2) = |x2 − x1|+|y2 − y1| (263)
where Pj = (xj, yj) are points in R2, defines a metric on R2.
Q2 In general usage, a circle refers to a 1-dimensional manifold which we usually
view as an object embedded in a 2-dimensional plane (a graph). However, a more
general definition is to consider a circle as a subset of points in R2 which are
equidistant by some fixed amount r, from some chosen point which we call the
‘centre’ of the circle. Mathematically, if we take the center to be the origin (0, 0),
then a circle S1(r = radius, 0 = center) is the locus of points x ∈ R2 such that
d(x, 0) = r. (264)
Using this definition, work out what a circle of ‘radius 1’ centred at the origin
would look like with the taxicab metric. Now, draw another circle with the same
radius and origin on top of your original circle, but this time using the standard
Euclidean metric. What do you notice?
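If you want to check your sketch, a few lines of code can test whether a candidate point lies on the taxicab circle. This is a minimal sketch under assumed names (taxicab_distance, euclidean_distance) with hand-picked sample points; it only evaluates the two distances from the origin and leaves the drawing to you.

```python
import math


def taxicab_distance(p1, p2):
    """The taxicab distance (263) between two points of R^2."""
    return abs(p2[0] - p1[0]) + abs(p2[1] - p1[1])


def euclidean_distance(p1, p2):
    """The standard Euclidean distance between two points of R^2."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


origin = (0.0, 0.0)
# Each of these points is at taxicab distance 1 from the origin.
for point in [(1.0, 0.0), (0.5, 0.5), (-0.25, 0.75), (0.0, -1.0)]:
    print(point, taxicab_distance(origin, point), euclidean_distance(origin, point))
```

Comparing the two printed columns shows which points of the taxicab circle also lie on the Euclidean circle of radius 1, and which lie strictly inside it.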
Exercise 40 (The lp Family) In an attempt to digitally preserve old music, Matt
Fernandez buys one million LP records. While listening to these records in Tower
C2 and bragging about his view of the river, Matt decides to generalize the Taxicab
metric in the following way:
dlp(X, Y) := (|y1 − x1|^p + |y2 − x2|^p + ... + |yn − xn|^p)^(1/p) (265)
where X = (x1, ..., xn) and Y = (y1, ..., yn) are points in n-dimensional real
space, Rn. Note the constant p that appears in the exponents as a fixed power. The
subscript lp in d is simply a label to keep track of which metric we are referring
to.
Q1: Clearly, when p = 1 this gives us the taxicab metric on Rn (set n = 2 if you
need to convince yourself). Now verify this.
105
A mentor to Einstein and German mathematician responsible for the ‘4-dimensional’ model of
Special Relativity.
Q2: When p = 2, what familiar metric do you get? Hint: Try setting n = 2 or
n = 3 for simplicity.
Q3: Recalling the earlier definition of a circle S(r, 0) of radius r and center 0 as
the set of points X ∈ R2 such that
dlp (X, 0) = r, (266)
draw what a circle of radius r = 1 and center (0, 0) would look like in R2 when
• p = 1 and p = 2
• 0 < p < 1, for example p = 1/2.
• p = 100 or some other large number.
In particular, what do you notice happens when p → 0? If you start at p = 1 then
vary p to 0, you should get a Euclidean rhombus (diamond) whose edges ‘collapse’
inward toward the origin while its vertices remain fixed at (±1, 0) and (0, ±1).
Now what happens when p → ∞? Your circle should look like a Euclidean square!
Hint: You are essentially solving the equation d((x, y), 0) = 1, using the above
expression for your metric d for the stated value of p. The set of points that form a
solution to this equation should give you a graph of your ‘circle’.
Hint: It helps to consider the four different quadrants of the Cartesian plane
separately, in order to get rid of the absolute value signs. This means x > 0, y > 0,
then x < 0, y > 0, etc. It also helps to choose a few points, e.g. (1, 0), which
satisfy the equation d((x, y), 0) = 1, plot them and see if you can guess the overall
pattern.
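Q4: Challenge Prove that, for p ≥ 1, the map dlp on Rn is indeed a metric, as claimed.
(For 0 < p < 1 the triangle inequality fails, so the ‘circles’ are still worth drawing
but dlp is no longer a metric.)
Hint: You will need the Cauchy-Schwarz inequality (or, for general p, its generalization
known as Hölder’s inequality), one way or another, to prove
that the triangle inequality is satisfied.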
Q5: Challenge If you did your math correctly, you should see that when p →
∞ our circle becomes a (Euclidean) square!! The special case when p = ∞
is mathematically very important – but also fun. In particular, the metric d∞ is
referred to as the ‘Chebyshev’ distance 106, the ‘supremum distance’ – or colloquially
as the ‘chessboard metric’. Using some sorcery of inequalities and your knowledge
of the algebra of limits, prove that:
dlp(X, Y) = Max {|y1 − x1|, |y2 − x2|, ..., |yn − xn|} (267)
for p = ∞ (understood as the limit p → ∞). This simply says that the distance
between two points is the largest absolute difference between their corresponding
coordinates. An easy way to think of this is that on a chessboard, the minimum
number of moves that a King requires to move from one particular square to another
particular square is equal to the dl∞ distance between the squares.
Hint: For simplicity, you can set n = 2 dimensions then try a proof for general n
afterwards.
106
After the 19th Century Russian mathematician, Pafnuty Chebyshev.
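The whole lp family fits in one small function, which makes it easy to experiment with different values of p before drawing anything. The sketch below is an illustration under assumptions: the name lp_distance, the use of math.inf to stand for p = ∞ and the sample points are not from these notes. It evaluates expression (265) directly and also shows a concrete triple of points for which the triangle inequality fails when p = 1/2.

```python
import math


def lp_distance(x, y, p):
    """Expression (265); p = math.inf gives the largest coordinate difference."""
    diffs = [abs(yi - xi) for xi, yi in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(diff ** p for diff in diffs) ** (1.0 / p)


X, Y = (0.0, 0.0), (3.0, 4.0)
print(lp_distance(X, Y, 1))         # taxicab distance
print(lp_distance(X, Y, 2))         # Euclidean distance
print(lp_distance(X, Y, 100))       # large p: close to the largest difference
print(lp_distance(X, Y, math.inf))  # Chebyshev distance

# With p = 1/2, going from A to C directly is 'longer' than going via B,
# so the triangle inequality fails.
A, B, C = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(lp_distance(A, C, 0.5), lp_distance(A, B, 0.5) + lp_distance(B, C, 0.5))
```

Trying a sequence of increasing values of p on the same pair of points gives a feel for the limit in (267) before you prove it.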
Exercise 41 (The Phlegethon River metric) 107
The Phlegethon river features in Greek literature as one of the five rivers of the
underworld. More importantly, in Dante’s Inferno, it features as a river of boiling blood
in which greater murderers – such as Attila the Hun – are tortured for eternity.
Considering the portion of the seventh circle of hell containing the Phlegethon
river, we can view this as a 2-dimensional space. Since we are residents of St. George’s
College – an Anglican college – the centaur Nessus agrees to give us free rides
along and across the Phlegethon river, whenever we please. Setting up a coordinate
system with the x-axis along the Phlegethon river and the y-axis in the perpendicular
direction, we now distinguish places in the seventh circle according to the
distance we have to walk to travel between them. In this respect, the
‘shortest’ path between places will typically make use of free boat rides from Nessus
along the Phlegethon river – we can use this notion of ‘shortest’ path to define
a metric!
Mathematically, we can define the ‘Phlegethon River metric’ d on R2 to be:
dRiver(P1, P2) = Min {dl2(P1, P2), |y2 − y1|} (268)
where P1 = (x1, y1), P2 = (x2, y2) are points in R2 and dl2(P1, P2) is the standard
Euclidean or l2 distance between P1 and P2.
Q1: Choosing three distinct points, compute the ‘Phlegethon distance’ between
them. Draw a diagram to illustrate the route you would take (shortest path) between
points. What do you notice?
Hint: To see something interesting, choose points for which y2 ≠ y1 and take at
least one point to be on the opposite side of the Phlegethon river (x-axis).
Q2: Intermediate Recalling the earlier definition of a circle S1(r, 0) of radius r
centred at the origin, draw what a circle of radius 7 and center (0, 0) would look
like with the Phlegethon river metric.
107
From recollection, Professor Brailey Sims referred to this as the ‘Jungle River Metric’.
Now, shift this circle vertically by 2 units, i.e. draw another circle of radius 7 with
center (0, 2).
Now sketch a circle of radius 1 centred at (2, 2).
Hint: If you get something crazy, it’s probably correct. In particular, your first
‘circle’ should108 just be two straight lines at y = ±3.5 extending to x = ±∞.
Your second circle should look like a ‘Euclidean circle’ which ‘opens’ up and
stretches out to x = ±∞ at y = −1.5.109 Your third circle should look like the graph
of an ordinary Euclidean circle.
Q3: Easy(ish) Prove that the Phlegethon River metric is indeed a metric. Hint:
Since we already proved that l2 and l1 were metrics, it shouldn’t take much to show
that the Phlegethon metric on R2 satisfies the metric space axioms.
Q4: Challenge The Phlegethon metric we proposed made use of the Euclidean met-
ric, dl2 . Replacing this part of the definition with the general lp family of metrics,
prove or disprove that
dRiver(P1, P2) = Min{dlp (P1, P2), |y2 − y1|} (269)
gives rise to a whole family of different metrics for 0 ≤ p ≤ ∞. We shall refer
to such metrics as the ‘infernal p-metrics’.
In particular, sketch what a circle of radius 1 and center (0, 0) would look like for
p = 0.5, p = 1, p = 3, p = ∞. This should be an easy extension of the p = 2 case
and your sketches for the circles under the dlp metrics.
Exercise 42 (Cryptographer’s Metric) An essential part of cryptography and in-
formation theory is ‘error detecting’ and ‘error correction’. In particular, informa-
tion is often delivered as a string of symbols or digits. In the modern world, most
information is delivered in ‘binary’ – meaning a base two system consisting of ze-
ros and ones. Like your usual base 10 numbers, a binary string of digits represents
a number. This number may correspond to some letter of the English or Cyrillic al-
phabet (for example) – and often, it comes with some ‘cipher’, meaning a rule that
puts binary strings into a unique 1-1 correspondence with letters in your alphabet.
108
Acknowledgement to Matthew, Theresa and William for pointing out a flaw in my memory.
109
The idea of an ‘exploded, bleeding circle’ seems befitting for the Phlegethon.
Sometimes when information is transmitted, errors can occur – for example, digits
in the sent binary string get ‘flipped’ (meaning zeros get replaced by ones and
vice versa). An obvious way to measure how much two strings of symbols
differ – e.g. binary strings – is to count the number of positions at which the
corresponding symbols of the two strings are different. In other words, one counts the
minimum number of alterations (‘bit flips’ for binary strings) required to transform
one string into the other. Mathematically, this gives us a metric known as the
‘Hamming distance’ between strings:
d(String1, String2) = # of Positions where String 1 and 2 differ. (270)
For binary strings, e.g. S1 = (1, 0, 1, 1, 0) and S2 = (0, 0, 0, 0, 0), we can compute
the Hamming distance easily by considering S1 and S2 as 5-dimensional vectors of
integers modulo 2 – i.e. S1 = (a, b, c, d, e) where a, ..., e ∈ Z2 etc.110 Subtracting
the strings as vectors to form an ‘error’ vector, ε = (String1 − String2) Mod 2,
a 1 appears in positions where String 1 and 2 were different and a 0 appears where
they are the same. Therefore, we can write the Hamming distance between two
binary strings, String 1 and String 2, as
d(String1, String2) = Sum of all entries of ε, (271)
where ε = (String1 − String2) Modulo 2.111
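Both descriptions of the Hamming distance translate directly into code. The sketch below is illustrative (the function names hamming_distance and hamming_distance_mod2 are assumptions): the first function counts differing positions as in (270) and works for strings over any alphabet, while the second builds the ‘error’ vector modulo 2 and sums its entries as in (271), so it only makes sense for binary strings.

```python
def hamming_distance(s1, s2):
    """Number of positions at which two equal-length strings differ, as in (270)."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


def hamming_distance_mod2(s1, s2):
    """Same distance for binary strings, via the error vector (s1 - s2) mod 2, as in (271)."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    error = [(a - b) % 2 for a, b in zip(s1, s2)]
    return sum(error)


S1 = (1, 0, 1, 1, 0)
S2 = (0, 0, 0, 0, 0)
print(hamming_distance(S1, S2), hamming_distance_mod2(S1, S2))

# The position-counting version also works for non-binary alphabets.
print(hamming_distance("karolin", "kathrin"))
```

In Python the remainder (a − b) % 2 is never negative, so the order of subtraction does not matter.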
Q1: Prove that the Hamming distance gives a metric on the set of ‘strings’ or ‘code-
words’ of a given length L (note that a string of length L means an L-dimensional
vector with symbols as its components).
Q2: When transmitting information, the ‘Hamming distance’ between the trans-
mitted string and the received string is equal to the ‘minimum number of errors’
that could have occurred in the communication process. Explain why it is specif-
ically equal to the ‘minimum’ number of possible errors, as opposed to say, the
‘maximum number’.
Q3(Challenge): For binary strings, an n-bit error means that the Hamming distance
between the transmitted and received string is equal to n, where n is some
integer. One can ‘error-protect’ their codewords / strings by appending ‘extra bits’
to their transmitted string, in some clever fashion. This assumes that all codewords
/ strings differ by some ‘minimum’ distance dmin to begin with – note that all
distinct strings necessarily differ by at least 1 digit, but by choosing large enough
strings to carry smaller amounts of information (block codes), one can increase
error-correction capacity. In essence, by adding more bits one can protect against
larger errors.
110
Recall that ‘modulo 2’ means that all positive and negative even integers become equivalent to
0 and all positive and negative odd integers become equivalent to 1.
111
Note that it doesn’t matter in which order we subtract the strings, since doing the subtraction
modulo 2 gives the same result – one could even add the strings.
Can you think of a decoding algorithm to protect strings against 1-bit errors? Here
you will need to assume strings have some length n = a + b, where a is the number
of information digits and b is the number of error-correction digits.
Q4(Extra Challenge): Having completed or searched for the solution to the last
problem, what is the minimum number of extra bits required to protect 8-bit code-
words/strings from 1-bit errors?
What is the minimum number of extra bits required to protect against 2-bit errors?
Can you devise ‘the most efficient’ error correction algorithm to protect n-bit
words against k-bit errors, where k < n? If you can, prove that this is the most
efficient.
Upon completing this exercise, you’ve effectively reproduced the research of Richard
Hamming in 1950, regarding the creation of linear error-correcting codes.
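To make the idea of ‘extra bits’ concrete, here is perhaps the simplest possible scheme: repeat every bit three times and decode by majority vote. This is a sketch for illustration only and is certainly not the most efficient answer to the questions above; the function names (encode, decode, flip) and the sample message are assumptions. With a information bits it uses b = 2a extra bits, and any two distinct codewords differ in at least 3 positions.

```python
def encode(bits):
    """Repeat every information bit three times (a information bits, 2a extra bits)."""
    return [bit for bit in bits for _ in range(3)]


def decode(received):
    """Recover each information bit by majority vote over its block of three."""
    if len(received) % 3 != 0:
        raise ValueError("received length must be a multiple of 3")
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]


def flip(bits, position):
    """Return a copy of bits with the bit at the given position flipped."""
    corrupted = list(bits)
    corrupted[position] ^= 1
    return corrupted


message = [1, 0, 1, 1]
codeword = encode(message)

# Every possible 1-bit error in the transmitted codeword is corrected.
for position in range(len(codeword)):
    assert decode(flip(codeword, position)) == message
```

Two flips inside the same block of three would defeat the majority vote, which is one way to see why larger errors need more (or cleverer) extra bits.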
Exercise 43 (The Discrete Metric) Given a set S, one can endow it with the
‘discrete metric’ d. This is defined by:
d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y, (272)
for any members x, y of the set S. Note that every set, however exotic, can be
equipped with this metric. Although it seems bizarre, the discrete metric is
immensely useful for providing ‘counter-examples’ in topology as well as playing an
essential role in some proofs.
Q: Prove that the discrete metric is indeed a metric.
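Because the discrete metric only asks whether two members are equal, it can be tried out on a set of completely unrelated objects. The sketch below is a brute-force check on one small, arbitrarily chosen set (the name discrete_metric and the members of S are illustrative assumptions); it is meant as a sanity check and does not replace the general proof asked for above.

```python
from itertools import product


def discrete_metric(x, y):
    """The discrete metric (272): 0 for equal members, 1 otherwise."""
    return 0 if x == y else 1


# A hypothetical set of unrelated objects.
S = ["apple", 42, (1, 2), "pi"]

for x, y, z in product(S, repeat=3):
    assert discrete_metric(x, y) >= 0                           # positivity
    assert (discrete_metric(x, y) == 0) == (x == y)             # non-degeneracy
    assert discrete_metric(x, y) == discrete_metric(y, x)       # symmetry
    assert discrete_metric(x, z) <= (discrete_metric(x, y)
                                     + discrete_metric(y, z))   # triangle inequality
```

The only way the triangle inequality could fail is with a left-hand side of 1 and a right-hand side of 0, which is a useful starting point for the proof.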
15.2 Non-Euclidean Metric Spaces and Relativity
When we used Pythagoras’ theorem to give us notions of distance in n-dimensional
real space, Rn, we constructed ‘Euclidean metric spaces’ which give rise to
‘Euclidean geometry’ – summarized by Euclid of Alexandria in his ‘Elements’ at
around 300BC. One thing that characterises such geometry is the parallel postulate
– that through any point not on a given straight line, there is exactly one straight
line which never intersects the given line.
Another characterising property is that the sum of angles in a triangle is always
180 degrees.
Towards the latter part of the 1700s, due largely to the work of the German
mathematician C.F. Gauss, ‘non-Euclidean’ notions of geometry began to emerge.
In particular, geometries where ‘straight lines’ did intersect. Such advances
propelled the field of ‘differential geometry’ and the work of Gauss’ student, Bernhard
Riemann – which underpins General Relativity, vast areas of applied mathematics,
information theory, engineering and most of modern physics.
A simple example of a non-Euclidean geometry is the planet we live on! If we con-
sider the Earth a sphere (or oblate spheroid), ‘straight lines’(geodesics) correspond
to ‘great circles’ on the sphere – these are circles which slice along the diameter
of the sphere. All great circles necessarily intersect each other – as such, there
exists no ‘straight line’ on a sphere which has a parallel. Similarly, angles in
a spherical triangle sum to more than 180 degrees but less than 540 degrees.
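To see the angle excess concretely, one can compute the interior angles of a spherical triangle numerically. The sketch below assumes a unit sphere and vertices given as unit vectors; the helper names and the two sample triangles are hypothetical choices for illustration.

```python
import math


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _tangent(at, toward):
    """Unit tangent vector at `at`, pointing along the great circle toward `toward`."""
    k = _dot(at, toward)
    t = tuple(q - k * p for p, q in zip(at, toward))
    n = math.sqrt(_dot(t, t))
    return tuple(c / n for c in t)


def angle_at(a, b, c):
    """Interior angle in degrees at vertex a of the spherical triangle abc."""
    u, v = _tangent(a, b), _tangent(a, c)
    cos_angle = max(-1.0, min(1.0, _dot(u, v)))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def angle_sum(a, b, c):
    return angle_at(a, b, c) + angle_at(b, c, a) + angle_at(c, a, b)


# One octant of the sphere: three mutually perpendicular great-circle arcs.
octant = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# A small triangle near (1, 0, 0), which is nearly flat.
s = 0.1
small = ((1.0, 0.0, 0.0),
         (math.cos(s), math.sin(s), 0.0),
         (math.cos(s), 0.0, math.sin(s)))

print(angle_sum(*octant))  # three right angles
print(angle_sum(*small))   # only slightly more than 180 degrees
```

The smaller the triangle, the closer its angle sum gets to the Euclidean value of 180 degrees, which is why the surface of the Earth looks flat at everyday scales.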
Taking this notion further, together with Albert Einstein, the German mathemati-
cian Hermann Minkowski developed a model of the physical world as a ‘4-dimensional
spacetime’ with a special metric known as the ‘Lorentz metric’. In this manner, the
relativistic laws of physics can be naturally obtained by considering spacetime as a
4-dimensional hyperbolic space, rather than the Newtonian 3 + 1 dimensional112
Euclidean space with which we are so familiar.
In the next tutorial, we will use our knowledge of metric spaces to investigate
Minkowski’s ideas.
112
In Newtonian physics, one considers space and time as fundamentally ‘separate’ and indepen-
dent entities – meaning one 3-dimensional space and events measured by some universal clock (one-
dimensional time). In relativity, the 3+1 or ‘space + time’ split occurs when an observer decomposes
events into his 3-dimensional rest space and the 1-dimensional subspace orthogonal to it – his ‘time’
axis.
16 Tutorial 13/14: Relativity and Hyperbolic Distance
The outcome of the last study group session, should have been a mathematical
understanding of the following key ideas:
• We can mathematically formalize the notion of ‘distance’ between points in
a set by endowing that set with a ‘metric’ – a function which is characterised
by a set of intuitive axioms.
• Metric spaces have vast applications to the real world – recall the Euclidean,
Taxicab and Hamming metrics. More generally, many optimization prob-
lems such as optimizing fuel consumption and lowest risk paths through
naval minefields for warships, can be viewed as ‘minimizing’ some abstract
‘distance’.
• Our perceptions of ‘geometry’ – such as the ‘shape’ of objects – are fundamentally
tied to the metric space we choose to work in. Recall that ‘circles’ for
example, can appear to look entirely different to the usual ‘Euclidean circle’,
simply by a new choice of metric.
In this tutorial, we will revise elementary notions of ‘circular’113 and hyperbolic
trigonometry. We then see how one can use this knowledge to represent different
notions in Einstein’s theory of Special Relativity – hence showing how properties
of nature emerge from the notion of a metric space.
16.1 The Two Faces of Trigonometry
16.1.1 The Circular Face
Today, we shall proclaim Janus to be the patron of trigonometry. Recall the geo-
metric definitions of the sine, cosine and tangent functions arise by considering a
right-angle triangle inscribed into a circle whose radius is the hypotenuse of said
triangle. Now let θ be the angle between the hypotenuse and another radial line of
the triangle as in the figure below. By setting the radius of the circle r = 1, one
then sees that the fundamental trigonometric identity:
\[ \sin^2(\theta) + \cos^2(\theta) = 1, \tag{273} \]
113
That is, the standard trigonometry of sines, cosines etc.
Figure 4: Illustrating trigonometry as the ‘geometry of the circle’.
is simply a consequence (or expression) of Pythagoras’ Theorem for right-angled
triangles. For this reason, one may consider trigonometry to be the ‘geometry of
the circle’.
N.B.: If you consider the complementary sides of the triangle to be the x, y variables
in the Cartesian plane as in the diagram, then the equation for the graph of a circle
is also just a restatement of Pythagoras’ theorem – that is, the set of all points
(x, y) ∈ R^2 which satisfy the equation:
\[ x^2 + y^2 = r^2. \tag{274} \]
Exercise 44 (Easy Warm-up) In the above formula, we chose the radius r = 1.
Now, let r be arbitrary and use your standard trigonometry techniques to re-
express Pythagoras’ theorem:
\[ a^2 + b^2 = c^2, \tag{275} \]
in terms of sine and cosine, where c is the hypotenuse of the triangle and a, b are
the two other sides.
In particular, one should see a cancellation of r^2 from both sides, thus proving the
fundamental trigonometric identity.
Exercise 45 (Easy) Recall that the tangent function is geometrically defined in
terms of the ratio of the sides opposite and adjacent to the given angle θ:
\[ \tan(\theta) = \frac{\text{opposite}}{\text{adjacent}} = \frac{\sin(\theta)}{\cos(\theta)}. \tag{276} \]
Similarly, recall that the reciprocal trigonometric functions are defined as:
\[ \cot(\theta) = \frac{1}{\tan(\theta)}, \qquad \csc(\theta) = \frac{1}{\sin(\theta)}, \qquad \sec(\theta) = \frac{1}{\cos(\theta)}. \tag{277} \]
From this, show that the identities:
\[ 1 + \tan^2(\theta) = \sec^2(\theta), \qquad 1 + \cot^2(\theta) = \csc^2(\theta), \tag{278} \]
simply follow from the fundamental trigonometric identity.
Now recall Euler’s formula 114
\[ e^{i\theta} = \cos(\theta) + i \sin(\theta), \tag{279} \]
which relates the complex exponential function to trigonometry and the geometry
of the complex plane C. This is somewhat intuitive, if we represent a complex
number z = x + iy (where x, y ∈ R) in polar form:
\[ z = r e^{i\theta} \tag{280} \]
where $r = \sqrt{x^2 + y^2}$ and θ is the angle between the vector z and the real axis
(measured anticlockwise from y = 0). Then as θ varies from 0 to 2π, the vector z
traces out a circle of radius r – in which we can inscribe a right-angle triangle with
sides of length r cos(θ) and r sin(θ).
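The polar form can be checked numerically with Python's standard `cmath` module. This is just an illustrative sketch; the sample point z = 3 + 4i is an arbitrary choice.

```python
import cmath
import math

z = 3 + 4j                     # arbitrary sample point x + iy
r, theta = cmath.polar(z)      # r = sqrt(x^2 + y^2), theta = angle from the real axis

# Rebuild z from its polar form r * e^{i theta}.
z_rebuilt = r * cmath.exp(1j * theta)

# Sides of the right-angled triangle inscribed in the circle of radius r.
adjacent = r * math.cos(theta)  # recovers x
opposite = r * math.sin(theta)  # recovers y

print(r, theta)
print(z_rebuilt, adjacent, opposite)
```

Here the radius comes out as 5, since (3, 4, 5) is a Pythagorean triple, and the two sides of the inscribed triangle recover the real and imaginary parts of z up to rounding.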
Exercise 46 (Moderate) Using Euler’s formula, prove that we can express sine
and cosine as follows:
\[ \cos(\theta) = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin(\theta) = \frac{e^{i\theta} - e^{-i\theta}}{2i}. \tag{281} \]
Hint: It’s easy to forget the factor of i in the denominator of the sine formula.
114
Recall the imaginary unit i is defined such that i^2 = −1.
Exercise 47 (Easy) A lesser known mathematician around the time of Isaac New-
ton, was Abraham de Moivre. Apart from calculating the day of his own death
(based on number of hours slept), De Moivre is known for the following trigono-
metric formula:
\[ (\cos(\theta) + i \sin(\theta))^n = \cos(n\theta) + i \sin(n\theta), \tag{282} \]
which holds for any θ ∈ R and any n ∈ Z.
Prove this formula.
Hint: Use Euler’s formula.
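As a sanity check before attempting the proof, one can test de Moivre's formula numerically for a handful of angles and integer powers. The sketch below is not a proof, and the function name and sample values are hypothetical.

```python
import math


def de_moivre_gap(theta, n):
    """Return |(cos θ + i sin θ)^n − (cos nθ + i sin nθ)| for an integer n."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs)


# Arbitrary sample angles; negative and zero powers are included since n ∈ Z.
sample_angles = [0.0, 0.3, 1.0, math.pi / 3, 2.5]
largest_gap = max(de_moivre_gap(theta, n)
                  for theta in sample_angles
                  for n in range(-5, 6))
print(largest_gap)  # expected to be at the level of floating-point rounding
```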
16.1.2 The Hyperbolic Face
Recall that the equation for the graph of a circle of radius r, centred at the origin
(0, 0) of a Cartesian coordinate system, is given by:
\[ x^2 + y^2 = r^2 \tag{283} \]
where (x, y) ∈ R^2. From this circle and the Pythagorean theorem, we earlier
obtained the fundamental circular trigonometric identity. Therefore, it seems plau-
sible that by considering the geometry of other conic sections, one should obtain
analogous sets of identities. For the case of an ellipse, this is just a distortion or
‘rescaling’ of the circle, by different factors along each axis – ultimately leading
back to the circular trigonometric identities.
If we now consider a unit equilateral (rectangular) hyperbola centred at the origin,
with its foci and vertices lying along the x-axis (so that its branches open up along
this axis), an equation for its graph is given by:
\[ x^2 - y^2 = 1. \tag{284} \]
This is a consequence of its geometric definition as “The locus of points such that
the difference between the distances to each focus is constant” and Pythagoras’
theorem 115.
Tutorial 12 Observation: If we equipped our 2-dimensional space R^2 with a hyperbolic
metric, defined by
\[ \eta(V_1, V_2) := x_1 x_2 - y_1 y_2 \tag{285} \]
115
Alternatively, one may consider this equation as a hyperbolic version of the Pythagorean theo-
rem.
where $V_j = (x_j, y_j)$, then (284) would simply be the equation for the ‘unit circle’
in this ‘hyperbolic space’!
For the Euclidean unit circle, we had x = cos(θ) and y = sin(θ) giving rise to the
fundamental identity. In this hyperbolic case, we instead have:
x = cosh(θ), y = sinh(θ) (286)
where cosh and sinh are the ‘hyperbolic’ cosine and sine, giving us the fundamen-
tal hyperbolic trigonometric identity:
\[ \cosh^2(\theta) - \sinh^2(\theta) = 1. \tag{287} \]
From this identity, one can derive the consequential hyperbolic identities in an analogous
fashion to the circular trigonometric identities.
Exercise 48 (Easy-Optional) As above, repeat the earlier exercise deriving trigono-
metric identities – but this time, replace each function with its hyperbolic version
(notationally, this just amounts to adding an ‘h’ at the end of the function name).
Hint: Beware of new − signs appearing.
Continuing the analogy, recall that from Euler’s formula one gets the trigonometric
identities:
\[ \cos(z) = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin(z) = \frac{e^{iz} - e^{-iz}}{2i}, \tag{288} \]
where z was some real number (an angle). More generally, Euler’s formula holds
for complex numbers, z ∈ C – in particular, one can prove this since the complex
Taylor series converges on the entire complex plane. Now, notice the following
trick. We perform the transformation:
\[ z \to iz, \tag{289} \]
which corresponds to a counter-clockwise rotation by π/2 radians in the complex
plane116.
Fun Aside: This same transformation, S → iS, is used in Quantum Field Theory
to convert quantum partition functions into thermodynamical ones (from statistical
mechanics). In that case, S is the action117 for some quantum theory and this
116
Recall that multiplying a complex number (represented by a 2-dimensional vector) by the imaginary
unit i rotates it by 90 degrees anti-clockwise. This can be seen in polar form, since multiplying
by $e^{i\theta}$ rotates a complex number counter-clockwise by θ. In particular, $i = e^{i\pi/2}$.
117
A mathematical object (‘integral’) which essentially contains the entire theory.
‘complex rotation’ is known as a ‘Wick rotation’. It features in the derivation of
the Hawking-Bekenstein temperature of a black hole.
In our case, we set z = x to be some real number. After the transformation x → ix
and some algebra, we then get the hyperbolic trigonometric functions:
\[ \cosh(x) = \frac{e^{x} + e^{-x}}{2}, \qquad \sinh(x) = \frac{e^{x} - e^{-x}}{2}. \tag{290} \]
Exercise 49 (Easy-Moderate) Using the Euler identities for sin, cos and their
hyperbolic counterparts (in terms of the exponentials), prove that:
• cos(iθ) = cosh(θ).
• sin(iθ) = i sinh(θ).
Challenge: Using the last two identities, can you think of an easy ‘trick’ to extract
the hyperbolic trigonometric identities from the corresponding circular trigono-
metric identities?
Following the results of the last exercise, one should observe:
\[ \cos^2(i\theta) = \cosh^2(\theta), \qquad \sin^2(i\theta) = -\sinh^2(\theta). \tag{291} \]
Therefore, in all circular trigonometric identities involving squares of trigonomet-
ric functions, making the transformation θ → iθ should induce the following trans-
formations:
\[ \begin{aligned} \cos^2(\theta) &\to \cosh^2(\theta) \\ \sin^2(\theta) &\to -\sinh^2(\theta). \end{aligned} \tag{292} \]
In particular, observe what happens to the fundamental circular trigonometric iden-
tity:
\[ \cos^2(\theta) + \sin^2(\theta) \to \cosh^2(\theta) - \sinh^2(\theta). \tag{293} \]
In this manner, one should be able to quickly deduce the hyperbolic identities from
the circular ones.
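The substitution θ → iθ can be tried out numerically, since Python's `cmath` module evaluates cosine and sine at complex arguments. The following sketch uses an arbitrary sample angle and only illustrates the trick; it does not replace the algebraic derivation.

```python
import cmath
import math

theta = 0.7  # arbitrary real sample value

# Evaluate the circular functions at the imaginary argument i*theta.
cos_i = cmath.cos(1j * theta)
sin_i = cmath.sin(1j * theta)

print(cos_i, math.cosh(theta))       # cos(iθ) should match cosh(θ)
print(sin_i, 1j * math.sinh(theta))  # sin(iθ) should match i sinh(θ)

# The circular identity evaluated at iθ ...
circular_at_i_theta = cos_i ** 2 + sin_i ** 2
# ... is the hyperbolic identity in disguise.
hyperbolic = math.cosh(theta) ** 2 - math.sinh(theta) ** 2
print(circular_at_i_theta, hyperbolic)  # both should be 1 up to rounding
```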
Exercise 50 (Easy) Using the above trick, complete the following hyperbolic iden-
tities:
\[ \begin{aligned} 1 - \coth^2(\theta) &= \\ 1 - \tanh^2(\theta) &= \,. \end{aligned} \tag{294} \]
Hint: Remember that tan^2(θ), cot^2(θ) and csc^2(θ) will all involve a factor of
sin^2(θ), so these terms will pick up a minus sign when one switches to the hyperbolic
counterparts.
Exercise 51 (Perambling Down Memory Lane) Recalling the angle-addition formulas
for sine and cosine, derive the hyperbolic counterparts. In particular, express
the following in terms of products of sinh and cosh:
sinh(x + y) =
cosh(x + y) =. (295)
Hence,
sinh(2x) =
cosh(2y) =. (296)
Hint: Use the identities cos(iθ) = cosh(θ) and sin(iθ) = i sinh(θ), which you
proved earlier.
16.2 Lorentz Metric and Relativity
From antiquity to Newton and onwards, most people have perceived the world
as ‘3 + 1-dimensional’ in the sense that ‘space’ (3-dimensional) and ‘time’
(1-dimensional) were viewed as independent entities. In particular, to the uneducated
there still persists a false notion of some ‘universal time’ or ‘universal clock’ which
‘ticks’ at the same rate for all observers. Such assumptions lead us to note that
nature obeys a set of symmetries – rotations and translations. This means that
Newtonian physics is governed by a 3 + 3 = 6 dimensional symmetry group118
(called the ‘Galilean’ or ‘Euclidean’ group) – 3 dimensions for rotations (one for
each axis about which an object can rotate) and 3 dimensions for translations. This
is simply wrong.
As most of you would be familiar with, the theory of relativity arose at the beginning
of the 20th Century, to explain various phenomena such as the null result of the
Michelson-Morley attempts to measure an ‘aether wind’. Although it is attributed
to Albert Einstein, one should note that the mathematician Henri Poincare and
physicist Hendrik Lorentz had already observed a set of symmetries under which
118
A Lie group, recalling from tutorials 8-11.
Maxwell’s equations for electromagnetism were invariant 119. This set of symmetries
is known as the ‘Lorentz Group’ 120, whose elements are the ‘Lorentz
transformations’. Such a group is 6-dimensional as it consists of three spherical
rotations (spatial rotations) and three hyperbolic rotations (Lorentz boosts). If one
enhances this symmetry group by adding in translations (three in space and one
in time), it becomes the 10-dimensional ‘Poincare group’. As Einstein deduced,
not only does electromagnetism obey these symmetries, but in fact the physics for
all observers121 obeys these symmetries. Such an observation is equivalent to Ein-
stein’s statement that “the laws of physics are the same in all inertial reference
frames”. To understand the equivalence of the Lorentz symmetries to Einstein’s
postulates, we must establish the notion of ‘Minkowski spacetime’ – that is, the
universe and its entire history as a 4-dimensional metric space with a ‘hyperbolic’
metric.
16.2.1 Minkowski Spacetime
Despite relativity being somewhat ‘colloquial’ knowledge these days, for most peo-
ple it is fundamental to their intuition to view time and space as separate entities.
In part, this is because we measure ‘distance’ with rulers and ‘time’ with clocks.
This is carried along with the notion of some universal time-keeping device – for
example, we all synchronize our clocks to some state or international clock.
Now consider the radar – an invention from World War II 122 that uses ‘radio waves’.
Radar can be used to measure distances via bouncing radio waves off objects and
measuring their ‘time of flight’. This is possible because radio waves are electro-
magnetic waves – and as Maxwell showed, they must therefore travel at some fixed
speed 123 c (the speed of light, shown to be constant via the Michelson-Morley experiment).
In this manner, one could think of distances between objects in terms of
the number of ‘seconds’ it takes to transmit and bounce/receive a radio wave off an
object. As such, ‘spatial distances’ simply become another measure of ‘time’ and
the union of 3-dimensional ‘space’ and 1-dimensional ‘time’ into a 4-dimensional
spacetime no longer seems as foreign.
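The radar idea amounts to a one-line conversion between ‘seconds’ and ‘metres’. A minimal sketch, assuming propagation in a vacuum and a pulse that travels out to the object and straight back (the function names and the sample round-trip time are hypothetical):

```python
C = 299_792_458.0  # speed of light in a vacuum, in metres per second


def radar_distance(round_trip_seconds):
    """Distance in metres to an object, from the round-trip time of a radio pulse."""
    # The pulse covers the distance twice: out to the object and back again.
    return C * round_trip_seconds / 2.0


def distance_in_light_seconds(metres):
    """Express a spatial distance as the time light takes to cross it."""
    return metres / C


echo_time = 2e-4  # hypothetical round-trip time of 0.2 milliseconds
print(radar_distance(echo_time))                             # roughly 30 km
print(distance_in_light_seconds(radar_distance(echo_time)))  # half the echo time
```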
119
It is not hard to envision that given sufficient time, Poincare, Lorentz and collaborators would
have reproduced the tenets of Special Relativity.
120
A Lie symmetry group, for those of you who completed tutorials 8-10.
121
In a flat spacetime – hence ignoring curvature arising from gravity.
122
A secret technology developed by the British which played a huge role in their defence against the
Germans during the bombing of London.
123
Technically speaking, in a vacuum. Refractive processes alter the effective speed in the atmo-
sphere.
We now propose spacetime to be the set of points in the 4-dimensional real vector
space, R4. Note that this is not necessarily a Euclidean space since we haven’t
specified a metric yet! Points in spacetime are called events, which may include
the college winning the piano competition and ‘battle of the bands’, or may include
the creation of a Higgs boson in the Large Hadron Collider at CERN.
An observer is defined to be some particle travelling along in spacetime, though
you can consider yourself an observer. Thus an observer traces out a curve through
spacetime, which we call a worldline – each point on that worldline is an event in
the history of that observer. The time measured by each observer – i.e. by a clock
they carry, is equal to the arc-length (or simply, ‘length’) of their worldline (or some
segment of it when measuring time between events on their worldline).
Given an observer, one may define an ‘origin’ in spacetime relative to the worldline
of that observer – this allows us to turn spacetime into a vector space, which is a
structure you are familiar with. However, the principle of relativity tells us that
no observer is ‘unique’! This means that the choice of origin (a ‘special point’)
is not unique – prior to the observer, there is no ‘unique point’ in spacetime. As
such, the correct intrinsic mathematical structure for spacetime is an ‘affine space’ – which is
essentially a vector space with its origin ‘scrubbed out’ (it is the role of an observer
to specify the origin).
Unlike observers with mass, photons (or ‘light rays’) are massless – they travel on
special curves called ‘null lines’. This is because the length of such curves is zero
(‘null’).
Exercise 52 (Gedanken) In a mad effort to avoid appearing in college tutorial
questions, Ben Luo decides to accelerate himself close to the speed of light (rela-
tive to St. George’s College) with an ‘acceletron’ built by Tessa McGrath. Unfor-
tunately, Tessa intentionally miscalibrated the acceletron – thus turning Ben into a
photon.
Recall that the ‘proper time elapsed’ between events (points) on an observer’s
worldline is the ‘arc-length’ between those points. Thus, if photons travel on ‘null
lines’, what is the proper-time elapsed since Ben Luo turned into Ben Photon?
How does this differ from the time that Tessa measures with her watch?
Can you see any problems with this geometric definition of proper time? If so, what
restrictions would you suggest?
Some of you may be familiar with the Frenet-Serret formulas pertaining to the
‘differential geometry of curves’, from first year vector calculus. In particular, you
may recall that the motion of a particle in 3-dimensional space can be modelled
as a curve traced out by some vector r(t) = (x(t), y(t), z(t)) parametrised
by some parameter t – for example, time. In this manner, the ‘velocity’ of the
particle is given as a vector $v(t) = \frac{d}{dt} r(t)$ which is tangent to the curve at the point
(x(t), y(t), z(t)).
If we now consider an infinitesimal line segment of length $\|dr\| = \|v(t)\| \, dt$ along
the curve and integrate this between two points on the curve, one gets the ‘length’
(or rather, ‘arc length’) of the curve between those points. In appropriate units,
this is simply the distance that the particle (whose trajectory is given by the curve) has
travelled:
\[ L = \int_{r_1}^{r_2} \|dr\| = \int_{t_1}^{t_2} \|v(t)\| \, dt. \tag{297} \]
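The arc-length integral is easy to approximate numerically by summing speed times dt over many small steps. The sketch below assumes a hypothetical helix r(t) = (cos t, sin t, t) as the particle's trajectory and uses a simple midpoint rule; neither choice comes from the tutorial itself.

```python
import math


def velocity(t):
    """Velocity of the hypothetical helix r(t) = (cos t, sin t, t)."""
    return (-math.sin(t), math.cos(t), 1.0)


def arc_length(v, t1, t2, steps=10_000):
    """Approximate the integral of the speed |v(t)| from t1 to t2 (midpoint rule)."""
    dt = (t2 - t1) / steps
    total = 0.0
    for k in range(steps):
        t_mid = t1 + (k + 0.5) * dt
        speed = math.sqrt(sum(c * c for c in v(t_mid)))
        total += speed * dt
    return total


# One full turn of the helix; its speed is constant and equal to sqrt(2).
length = arc_length(velocity, 0.0, 2 * math.pi)
print(length, 2 * math.pi * math.sqrt(2))  # numerical value versus exact value
```

For this curve the speed is constant, so the length is just speed times elapsed parameter; for a general curve the same loop applies, only with a speed that varies from step to step.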
Exercise 53 (The Road Less Travelled) To help Ben in his dilemma, Claire Wadding-
ton develops a cure for Ben’s photonitis, returning him to his original form. En-
lightened by his journey, Ben takes recourse in analysing his travels with the
Frenet-Serret formalism. In this manner, he models his journey as a worldline in
4-dimensional space given by R(τ) = (cτ, x(τ), y(τ), z(τ)) where τ is his ‘proper
time’. Tangent to his worldline is his 4-velocity vector
\[ U = \frac{dR(\tau)}{d\tau} = \left( c, \frac{dx(\tau)}{d\tau}, \frac{dy(\tau)}{d\tau}, \frac{dz(\tau)}{d\tau} \right). \tag{298} \]
Using the 3-dimensional curve discussed earlier as an analogy, write down an
expression for the ‘arc length’ of Ben’s worldline between the points at which Ben
came to St. George’s College and the point at which he was cured of being a
photon.
Draw a diagram of Ben’s worldline, with points labelled, to illustrate the line in-
tegral you have written. What do you notice about the period in which Ben was a
photon?
16.2.2 Lorentz Metric and Light-Cone Structure
Thus far, any mention of ‘length’ in our 4-dimensional spacetime comes with an
implicit reference to some spacetime ‘metric’. Recall from tutorial 12 that a metric
d on some set S was defined to be a map on S with the following properties:
1. positivity – for any x, y ∈ S,
d(x, y) ≥ 0, (299)
2. non-degeneracy
d(x, y) = 0 iff x = y, (300)
3. symmetry
d(x, y) = d(y, x), (301)
4. triangle inequality
d(x, z) ≤ d(x, y) + d(y, z). (302)
The first exercise from tutorial 12 was to show that ‘positivity’ was a consequence
of properties 2-4. In particular, the direction of the triangle inequality determines
the positivity of the metric. If we now relax the notion of ‘positivity’ and con-
sider the idea of ‘negative’ distances, this would require ‘reversing’ the triangle
inequality. In other words, the fact that everyday ‘Euclidean distance’ is positive
is a strict consequence of the triangle inequality – which itself is a consequence of
the Cauchy-Schwarz inequality. Thus, to get ‘negative distances’ on a real vector
space R^n, one needs to reverse the Cauchy-Schwarz inequality, which is the heart
of Euclidean vector geometry. This is precisely what we need to do in relativity.
Upon the (arbitrary) choice of some origin O = (0, 0, 0, 0) in spacetime, we es-
tablish a 4-dimensional Cartesian coordinate system (ct, x, y, z) with respect to
this origin. We can then think of points in spacetime as marked by position vec-
tors relative to the origin124, given the choice of some standard basis vectors125
$\{\partial_t, \partial_x, \partial_y, \partial_z\}$. Thus in this basis, the point (ct, x, y, z) can be written as the
4-vector:
\[ R = ct \, \partial_t + x \, \partial_x + y \, \partial_y + z \, \partial_z. \tag{303} \]
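Marking two points $P_1 = (ct_1, x_1, y_1, z_1)$ and $P_2 = (ct_2, x_2, y_2, z_2)$ by the two
4-vectors
\[ \begin{aligned} R_1 &= ct_1 \, \partial_t + x_1 \, \partial_x + y_1 \, \partial_y + z_1 \, \partial_z \\ R_2 &= ct_2 \, \partial_t + x_2 \, \partial_x + y_2 \, \partial_y + z_2 \, \partial_z \end{aligned} \tag{304} \]
we can then form a vector pointing from $P_1$ to $P_2$ via ‘vector subtraction’:
\[ \overrightarrow{P_1 P_2} = R_2 - R_1 = c \Delta t \, \partial_t + \Delta x \, \partial_x + \Delta y \, \partial_y + \Delta z \, \partial_z \tag{305} \]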
124
More precisely, by the ‘affine subtraction map’.
125
Although the notation correctly suggests these to be partial derivative operators, we can view
these as ‘unit vectors’ in each of the coordinate directions – perhaps you prefer the notation ∂x = ex
for example.
where ∆t = t2 − t1, ∆x = x2 − x1, etc.
Exercise 54 (Art Class) Draw a spacetime diagram with the origin at some point
on an observer’s worldline. Now draw two more points P1 and P2 marked by vec-
tors R1 and R2 (relative to the origin) to illustrate the vector subtraction process
described above.
You should get a triangle.
In this manner, we define the ‘Lorentz distance’ ∆S between two points P1 and P2
on a 4-dimensional spacetime to be the ‘length’ of the 4-vector P1P2 as given by
the Minkowski metric:
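\[ \begin{aligned} \Delta S &:= \eta(\overrightarrow{P_1 P_2}, \overrightarrow{P_1 P_2}) = \eta(R_2 - R_1, R_2 - R_1) \\ &= -(c \Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2. \end{aligned} \tag{306} \]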
Comparing to Euclidean geometry, you will notice that this almost looks like a
‘4-dimensional’ version of the Euclidean dot-product of a vector with itself:
\[ \overrightarrow{P_1 P_2} \cdot \overrightarrow{P_1 P_2} = (c \Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2. \tag{307} \]
The difference is that in the Lorentz distance, we have the appearance of a minus
sign in front of the ‘time’ term – we call this a ‘Lorentzian signature’ (−, +, +, +).
For those of you who have studied relativity, you will recall that ∆S is simply your
‘spacetime interval’ which is invariant under Lorentz transformations.
The object η is the Minkowski metric126, which we can define as a type of ‘Lorentzian
dot-product’ between two arbitrary 4-vectors in spacetime:
\[ \eta(V_1, V_2) = V_1 \cdot_{\text{Lorentz}} V_2 = -a_1 a_2 + b_1 b_2 + c_1 c_2 + d_1 d_2, \tag{308} \]
where $V_j = a_j \partial_t + b_j \partial_x + c_j \partial_y + d_j \partial_z$ and $a_j, b_j, c_j, d_j$ are real constants, for
j = 1, 2.
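The Lorentzian dot-product differs from the Euclidean one only by the sign of the first term, which is easy to see in code. A minimal sketch, with vectors stored as tuples of their components along ∂t, ∂x, ∂y, ∂z; the function names and sample vectors are hypothetical and deliberately different from those in the exercises.

```python
def minkowski_dot(v1, v2):
    """Lorentzian dot-product with signature (-, +, +, +)."""
    a1, b1, c1, d1 = v1
    a2, b2, c2, d2 = v2
    return -a1 * a2 + b1 * b2 + c1 * c2 + d1 * d2


def euclidean_dot(v1, v2):
    """Ordinary 4-dimensional Euclidean dot-product, for comparison."""
    return sum(p * q for p, q in zip(v1, v2))


# Hypothetical sample vectors (components along ∂t, ∂x, ∂y, ∂z).
v1 = (3, 1, 0, 2)
v2 = (1, 4, 5, 0)

print(minkowski_dot(v1, v2))   # -3 + 4 + 0 + 0 = 1
print(euclidean_dot(v1, v2))   #  3 + 4 + 0 + 0 = 7
print(minkowski_dot(v1, v1))   # -9 + 1 + 0 + 4 = -4, a negative 'length squared'
```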
Exercise 55 (Easy) Compute the ‘Lorentzian dot-product’ between the following
sets of vectors:
• V1 = 2∂t, V2 = 6∂t + 9∂x + 6∂y + 9∂z.
• V1 = 1∂t, V2 = 1∂x + 1∂y + 1∂z.
• V1 = 1∂x, V2 = 1∂y + 1∂z.
• V1 = 1∂t + 1∂x, V2 = V1.
126
The symbol η is the Greek letter ‘eta’.
Now compare these to the Euclidean dot-product between the same vectors. What
major differences do you notice?
Recalling the definition of the Euclidean length of a vector V = a∂t + x∂x + y∂y +
z∂z:
\[ \|V\|_{\text{Euclid}} = \sqrt{V \cdot_{\text{Euclid}} V} = \sqrt{a^2 + x^2 + y^2 + z^2}, \tag{309} \]
what do you notice about the Euclidean length of the vector V1 = 1∂t +1∂x versus
its Lorentzian length?
In Lorentzian geometry, there arise non-zero vectors which have zero length!
Such vectors are called ‘null vectors’ – they are tangent vectors to null geodesics
(‘null lines’ or ‘light rays’). As such, these vectors are the 4-velocity vectors for
‘photons’. If we extract the 3-velocity vector v as the spatial part of the 4-velocity
vector V for a photon, one should notice that the length of the 3-vector v is simply
the ‘speed’ of the photon: Speed = c = $\|v\|$, as measured by some observer.
Because the Lorentz metric allows the ‘hyperbolic dot product’ between non-zero
vectors to be positive, zero and negative, spacetime is endowed with a natural ‘light
cone structure’. To see this, consider the following exercise.
Exercise 56 (Lawyer of the Universe) During an evening walk down the Swan
river from St. George’s College, Christabel Moffat decides to reflect on her life
choices. In this moment of contemplation, she has an epiphany – if she considers
her straight-line walk from the college down the river to be a walk along the x-axis
whilst measuring some time elapsed t since leaving the college, she can plot her
walk on a Minkowski spacetime diagram.
Q1: Draw a set of two coordinate axes on your page, labelling the vertical axis
as time t and the horizontal axis as displacement x from the origin (St. George’s
College). Consider the start of Christabel’s walk in her frame. Upon leaving the
college, it is noticed that Christabel announces her adventurous nature by flashing
two torches simultaneously – one in the direction of the river (+x direction) and
one in the direction of the College (−x direction). Plot this event on your spacetime
diagram.
Hint: You should get two ‘45-degree’ (in our Euclidean view) lines emanating from
the origin in opposite directions. These lines represent the journey of the light-
rays emitted from the torch, in spacetime. These light rays define the surface of a
1-dimensional cone.
Q2: After drawing her spacetime diagram, Christabel wonders what on Earth
caused her to take up art. In this manner, she starts to think about ‘causality’ –
that is, events which are ‘causally connected’ to each other in spacetime. To help
Christabel find the cause of her artistic inspiration, consider that the ‘speed of
light’ is a fundamental ‘speed limit’ in our universe. For one thing to cause another,
it cannot transmit information faster than light (spooky ‘action at a distance’) –
therefore, everything that is causally connected must lie within some restricted
region of Christabel’s spacetime diagram. Identify this region.
Hint: You can consider adding another axis to your spacetime diagram – say the y
axis, which lies in the direction of Mounts Bay Road (perpendicular to the college-river
axis). Now, imagine instead of shining two torches upon leaving the college,
that Christabel instead sets off an LED hula hoop – meaning that light rays get
shone outwards in a circle in the x − y plane. If you consider the worldlines of all
the light rays in this fashion, they should form a 2-dimensional surface – the light
cone!
Q3: Once you have identified the region in spacetime ‘causally connected’ to
Christabel, what can you say about points that lie outside this region? In particu-
lar, is it possible that any event outside this region could have influenced Christabel
to draw spacetime diagrams?
Q4: In Christabel’s frame (of mind), everything is moving relative to her – she
is stationary (at rest). Except when drawing spacetime diagrams, she considers
space to be 3-dimensional and time to be separate. This is a lie, though it has some
element of truth. In particular, the 3-dimensional space Christabel sees is her ‘rest
space’ or ‘space of displacements’ – meaning everything that is ‘orthogonal’ to her
worldline via the Lorentz metric. Ignoring the z-coordinate for Christabel’s vertical
direction, sketch the coordinates t, x, y and draw on her spacetime diagram the set
of all points which are ‘equidistant’ to her by some fixed amount – say 1.
This means sketching the surface:
\[ \eta((ct, x, y, 0), (ct, x, y, 0)) = -(ct)^2 + x^2 + y^2 = 1, \tag{310} \]
using the coordinate notation (ct, x, y, 0) to denote the 4-vector V = ct∂t +x∂x +
y∂y + 0∂z. Note, you can set c = 1 for simplicity and work in so-called ‘natural
units’.
Hint: Recalling our tutorial on metric spaces, you should recognize this set of
points as falling under the general definition of a ‘sphere’, except that we are using the
Lorentz metric! Therefore, what you are actually sketching is a ‘Lorentzian sphere’
or ‘Hyperbolic sphere’. Such a surface has a special name – a ‘hyperboloid of one
sheet’.
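If you would like points to plot, the surface −t² + x² + y² = 1 (in natural units, c = 1) can be generated with the hyperbolic functions from earlier in this tutorial. The parametrisation below is one possible choice, offered as a sketch; the function names and sample parameter values are hypothetical.

```python
import math


def hyperboloid_point(u, phi):
    """A point (t, x, y) built from a 'hyperbolic angle' u and a circular angle phi."""
    t = math.sinh(u)
    x = math.cosh(u) * math.cos(phi)
    y = math.cosh(u) * math.sin(phi)
    return t, x, y


def lorentz_norm_squared(t, x, y):
    """Lorentzian 'length squared' -t^2 + x^2 + y^2, with c = 1."""
    return -t ** 2 + x ** 2 + y ** 2


# Every sampled point should satisfy -t^2 + x^2 + y^2 = 1 up to rounding,
# because cosh^2(u) - sinh^2(u) = 1 and cos^2(phi) + sin^2(phi) = 1.
for u in (-2.0, -0.5, 0.0, 1.0, 2.0):
    for phi in (0.0, 1.0, 2.5, 4.0):
        t, x, y = hyperboloid_point(u, phi)
        print(round(lorentz_norm_squared(t, x, y), 9))
```

At u = 0 the points lie on the ordinary unit circle in the x-y plane; as |u| grows, the circle widens, which gives the surface its characteristic waist.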
In the last exercise, you should have noted the following observations:
• For each observer, there is a natural ‘light cone’ that surrounds them in space-
time. Everything inside this light cone is causally connected to that observer,
meaning that they are related in a ‘cause-and-effect’ fashion. With respect to
their origin, all events in their ‘past’ are causally connected in their backward
light-cone – all future events are in their forward light-cone.
• Every event outside the light cone is acausal, meaning that they are not
causally connected to the observer.
• Everything that lies ‘on’ the light cone, must travel at the speed of light –
this means photons, carriers of the electromagnetic force. Possibly gravitons
too, but they won’t exist in Minkowski space 127!
• The unit sphere in Minkowski spacetime is a hyperboloid. This motivates
one to consider spacetime as an exercise in ‘hyperbolic geometry’, as op-
posed to ‘Euclidean’ (Newtonian) geometry. Thus, our exercises on hyper-
bolic trigonometry!
In this manner, we say that spacetime has a natural ‘light cone structure’ – each
observer partitions spacetime with respect to their world-line, into events that lie
inside, on or outside a light cone situated at each point on their worldline. If we now
consider a light-cone located at every point on their worldline, and then consider
spacetime to be filled with worldlines (particles/observers), one then realizes that
you can ‘tessellate’ spacetime with light-cones 128. This notion can be formalized
mathematically to say that ‘light cones provide a foliation of spacetime’.
Formally, the splitting of spacetime into different regions can be done as follows.
Given a 4-vector, V = (ct, x, y, z), we say that
• V is time-like if η(V, V) < 0.
• V is space-like if η(V, V) > 0.
• V is null or light-like if η(V, V ) = 0.
Clearly, photons (light) have null vectors as vectors tangent to their worldline. Sim-
ilarly, all vectors which lie inside Christabel’s light-cone in the previous exercise,
would be time-like vectors. All vectors outside would be space-like.
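The classification is mechanical enough to write as a short function. The sketch below assumes the (−, +, +, +) signature used in this tutorial, so that time-like means η(V, V) < 0 and space-like means η(V, V) > 0; the function names, the tolerance and the sample vectors are hypothetical.

```python
def interval(v):
    """η(V, V) for V = (ct, x, y, z) with signature (-, +, +, +)."""
    ct, x, y, z = v
    return -ct ** 2 + x ** 2 + y ** 2 + z ** 2


def classify(v, tol=1e-12):
    """Label a 4-vector as 'time-like', 'space-like' or 'null'."""
    s = interval(v)
    # An absolute tolerance is a simplification; for very large components
    # a relative tolerance would be the safer choice.
    if abs(s) <= tol:
        return "null"
    return "time-like" if s < 0 else "space-like"


# Hypothetical sample vectors in units where c = 1.
print(classify((2, 1, 0, 0)))  # -4 + 1 = -3
print(classify((1, 2, 0, 0)))  # -1 + 4 = 3
print(classify((5, 3, 4, 0)))  # -25 + 9 + 16 = 0
```

Note that the third vector is non-zero yet has zero Lorentzian length, precisely the behaviour that has no counterpart in Euclidean geometry.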
Exercise 57 (Tachyon Express) In an attempt to get more time in the music prac-
tice rooms, Gabrielle Ruttico steals Tessa’s acceletron and tries to turn herself
127
Minkowski spacetime is flat. Gravitons are the predicted quanta of the gravitational field, hence
if gravitons exist there must be gravity present – implying some curvature of spacetime.
128
Or rather, stack light cones to fill spacetime without any gaps or overlaps on the same worldline.
into a string-theory tachyon129, thereby travelling faster than light and hopefully
backwards in time.
Unfortunately, as an act of OHS, Dogburn already disabled the acceletron due
to Ben’s previous enlightenment (photonic) journey. Disappointed with her time-
travel progress, Gabrielle settles for more constructive stress-reduction strategies.
Measured in her Gabby-centric coordinate system (the origin being the point at
spacetime at which she discards the acceletron), the following events take place in
spacetime:
• Tea is brewed in front of her, at a distance of 0.05m in her future rest frame
(the Elsie room) – 600 seconds after discarding the acceletron. To an order
of magnitude, one can approximate the spatial displacement of the Elsie
room as (x, y, z) = (0, 10m, 10m) from the music room. Thus, we can
represent the tea brewing event with a vector: VTea = 600c∂t + 0∂x +
10∂y + 10∂z = (600c, 0, 10, 10).
• Rowan’s room is stung by Adi, 1200 seconds after discarding the acceletron.
To an order of magnitude, this event occurs at a displacement of (x, y, z) =
(0, 100m, 100m) from the music room. Thus, in spacetime, this event is
given by the vector V = (1200c, 0, 100, 100) relative to Gabrielle’s origin.
• Rory the Cyborg sees battleships ablaze on the shores of Orion. This de-
scribes the clash of Orions against the Antaran race – leading to many cy-
cles of peace and prosperity with the defeat of the Antaran empire. Orion’s
belt is about 1000 light years from the sun – let’s say, along the x-axis in
Gabby’s frame for simplicity. This event is described by a spacetime vector
(ct, x, y, z) = (0, 1000 × 365 × 24 × 60 × 60 × c, 0, 0).
• Ben Luo spontaneously turns into a photon (long-term side-effects of pho-
tonitis). This is described by a vector: V = (c, c, 0, 0).
Using the previous definitions, deduce which of the above events are ‘spacelike’,
‘timelike’ and ‘null’ (light-like) with respect to Gabrielle’s origin (i.e. the light-
cone structure of her world-line).
16.2.3 Projections and Familiar Formulas
As a reward for completing the earlier part of this tutorial, we now see how hyperbolic
trigonometry can be used to replace your usual ‘time-dilation’ and ‘length-contraction’
formulas.
129
Tachyons are the mathematical reason for the instability (infinities and divergences) of traditional
string theory – supersymmetry eliminates these tachyons and protects the theory from instabilities.
Recall that given two vectors v and u in a Euclidean vector space, the Euclidean
‘dot-product’ contains information about their lengths and the ‘angle’ θ between
them:
$$u \cdot_{\mathrm{Euclid}} v = \|u\| \, \|v\| \cos(\theta). \tag{311}$$
This special geometric interpretation of the ‘inner product’ (dot-product) is due to
the ‘Cauchy-Schwarz’ inequality, which holds for all positive-definite inner prod-
uct spaces (vector spaces with positive-definite inner-products):
$$|u \cdot v| \le \|u\| \, \|v\|. \tag{312}$$
To see this, we simply expand the absolute value (by its definition) and re-arrange
the inequality:
$$-1 \le \frac{u \cdot v}{\|u\| \, \|v\|} \le 1. \tag{313}$$
Now, what monotone function takes values between ±1? Cosine of course – on the
domain [0, π] it is monotone decreasing. Therefore, we can take the inverse cosine
of each part of the inequality (reversing it, since cosine is decreasing):
$$0 \le \arccos\left(\frac{u \cdot v}{\|u\| \, \|v\|}\right) \le \pi, \tag{314}$$
which allows one to interpret arccos(u·v / (‖u‖‖v‖)) as some geometric angle θ.
Inside a spacetime observer’s light cone, the vector space they generate relative
to their origin is ‘negative definite’ as we partially saw earlier – in particular,
the inner-product of any two future-pointing time-like vectors, U and V , is neg-
ative:
η(U, V ) ≤ 0. (315)
Therefore, inside the light cone, we get the ‘reverse Cauchy-Schwarz inequal-
ity’:
$$|U \cdot_{\mathrm{Lorentz}} V| \ge \|V\| \, \|U\|. \tag{316}$$
Re-arranging and noting that |U ·Lorentz V |= −U ·Lorentz V for any future-pointing
time-like vectors U, V , we get:
$$-\frac{U \cdot_{\mathrm{Lorentz}} V}{\|U\| \, \|V\|} \ge 1. \tag{317}$$
What trigonometric function satisfies the property that it is always ≥ 1? Hyper-
bolic cosine! Thus, we can define the hyperbolic angle or rapidity between any
two forward-pointing time-like vectors, U and V in spacetime:
$$\theta = \cosh^{-1}\left(-\frac{U \cdot_{\mathrm{Lorentz}} V}{\|U\| \, \|V\|}\right). \tag{318}$$
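As a quick numerical illustration of this definition, here is a small Python sketch. It assumes the signature (−, +, +, +) and 4-vectors written as (ct, x, y, z); the function names are hypothetical, and the clamp to 1 is only there to absorb rounding error for nearly parallel vectors.

```python
import math


def lorentz_dot(u, v):
    """Lorentz inner product with signature (-, +, +, +)."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2] + u[3] * v[3]


def lorentz_norm(u):
    """Positive 'length' sqrt(|U . U|) of a 4-vector."""
    return math.sqrt(abs(lorentz_dot(u, u)))


def rapidity(u, v):
    """Hyperbolic angle between two future-pointing time-like vectors."""
    for w in (u, v):
        if lorentz_dot(w, w) >= 0 or w[0] <= 0:
            raise ValueError("expected future-pointing time-like vectors")
    ratio = -lorentz_dot(u, v) / (lorentz_norm(u) * lorentz_norm(v))
    # Rounding can push the ratio slightly below 1 when u and v are parallel.
    return math.acosh(max(ratio, 1.0))


if __name__ == "__main__":
    observer = (1.0, 0.0, 0.0, 0.0)                   # at rest
    moving = (math.cosh(0.5), math.sinh(0.5), 0.0, 0.0)
    print(rapidity(observer, moving))                 # should be close to 0.5
```

The second vector is built from a chosen hyperbolic angle of 0.5, so recovering roughly that value is a sanity check of the formula rather than a proof of it.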
N.B: The ‘norm’ used thus far, ‖U‖, is not the Euclidean norm! It is the Lorentz
norm – for it to define a positive length, we define it as:
$$\|U\|_{\mathrm{Lorentz}} = \sqrt{|U \cdot_{\mathrm{Lorentz}} U|}. \tag{319}$$
If U is time-like, its Lorentz dot-product with itself is negative (from previous
definitions), in which case:
$$\|U\|_{\mathrm{Lorentz}} = \sqrt{|U \cdot_{\mathrm{Lorentz}} U|} = \sqrt{-U \cdot_{\mathrm{Lorentz}} U}. \tag{320}$$
Note that the idea of the hyperbolic angle allows us to re-write the usual time-dilation
and length-contraction formulas in terms of hyperbolic trigonometry. In
particular, if an observer measures some event described by a 4-vector V relative
to their origin, the time at which that event occurs will be the projection of V onto
(parallel to) their world-line! However, recall that in Euclidean vector spaces we
used the dot-product to give us the parallel projection of one vector on another.
Similarly, recall that we could also get perpendicular projections in a similar manner.
Formally, given an event V relative to some observer, one can decompose the event
into components parallel and perpendicular to the observer’s worldline:
$$V = V^{\parallel} + V^{\perp}. \tag{321}$$
As it turns out, the space of vectors which is perpendicular to the worldline is
a 3-dimensional vector space – their ‘rest space’ or physical space by everyday
perception. Hence, if we describe an observer by the vector U and an external
event (or observer) by a vector V , the component of the projection of the external
event onto the observer U is given by:
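$$U \cdot_{\mathrm{Lorentz}} V = -\|U\| \, \|V\| \cosh(\theta), \tag{322}$$
where θ is the hyperbolic angle between U and V . Hence,
$$\|V^{\parallel}\| = \|V\| \cosh(\theta), \quad \|V^{\perp}\| = \|V\| \sinh(\theta). \tag{323}$$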
Problem 24 (Challenging) Prove the above projection formulas, using the fundamental
hyperbolic trig identity and the fact that V∥ and V⊥ are ‘hyperbolic
orthogonal’ (orthogonal with respect to the Lorentz metric).
Physically, one may interpret the length ‖V∥‖ of the parallel-projection V∥ of an
event V onto an observer’s worldline as the time elapsed (relative to the observer’s
origin) as measured by that observer. Since V is a vector between the origin O and
some point P (the event), this suggests that ‖V‖ is the proper-time between events
O and P and that ‖V∥‖ is the ‘dilated-time’ as measured by our initial observer.
Thus, one would interpret:
$$\cosh(\theta) = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} \tag{324}$$
as the time-dilation or ‘Lorentz’ factor, which you may know as γ!
Exercise 58 (Relative Velocities and Hyper-trig) Using your knowledge of hy-
perbolic trigonometry, show that the above interpretation of cosh(θ) as the ‘Lorentz’
factor γ is indeed sensible. In particular, first show that:
v = tanh(θ). (325)
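Now, since tanh(θ) = sinh(θ)/cosh(θ) and since (taking V to have unit Lorentz norm in (323)):
$$\|V^{\parallel}\| = \cosh(\theta), \quad \|V^{\perp}\| = \sinh(\theta), \tag{326}$$
what does this suggest? It suggests that:
$$\frac{\|V^{\perp}\|}{\|V^{\parallel}\|} = \frac{\sinh(\theta)}{\cosh(\theta)} = \; ? \tag{327}$$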
Of course, this is just ‘spatial distance (length)’ divided by time – a physical 3-
dimensional velocity as measured by our proverbial observer!
Exercise 59 (Length Contraction and relativistic velocity addition) Show that the
length-contraction factor is indeed given by sinh(θ).
Now use the hyperbolic trig identity: tanh(θ + α) = ... to derive the relativistic
velocity addition formula by denoting tanh(θ) = v and tanh(α) = u for some
3-velocities with magnitudes u and v.
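Before attempting the derivations in the last two exercises, it can help to see the claims hold numerically. The sketch below is only a sanity check under stated assumptions: speeds are measured in units of c (so v stands for v/c), the motion is collinear, and the helper names `gamma` and `add_velocities` are made up for this illustration.

```python
import math


def gamma(beta):
    """Usual Lorentz factor for a speed beta = v/c with |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def add_velocities(u, v):
    """Compose two collinear speeds (in units of c) by adding their rapidities."""
    return math.tanh(math.atanh(u) + math.atanh(v))


if __name__ == "__main__":
    beta = 0.6
    theta = math.atanh(beta)               # rapidity, using v = tanh(theta)
    print(math.cosh(theta), gamma(beta))   # the two numbers should agree
    print(add_velocities(0.9, 0.9))        # stays below 1, unlike 0.9 + 0.9
```

Adding rapidities and converting back with tanh is the whole trick; compare the printed value against the velocity addition formula once you have derived it.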
Orthogonal complements!
17 Tutorial 15: Differential Equations and Operators
In this tutorial, we will investigate the topic of ‘differential equations’. Differential
equations are perhaps one of the most widely used mathematical tools in science.
Together with their ‘discrete’ counterparts – ‘difference equations’ (recursion relations) –
differential equations describe the vast majority of (almost all) explainable
processes in the natural world.
Some popular examples of differential equations in physics include – Newton’s
2nd Law of motion (F = ma), Maxwell’s equations for electromagnetism (de-
scribing light and all forms of electromagnetic radiation), Einstein’s gravitational
field equations for general relativity, the ‘heat equation’ for thermodynamics, the
Navier-Stokes equation governing fluid mechanics, the Simple Harmonic Oscilla-
tor equation (describing all oscillatory motion) and the ‘wave equation’. In a wider
setting, we also have the Verhulst equations for ‘population-growth’, predator-prey
models, ‘damped oscillation equations’ for LRC electrical circuits, chemical re-
action rate equations and the Black-Scholes stochastic differential equations for
modelling the stock market.
In essence, the list of examples of applications of differential equations is endless.
Certainly, for this reason, if you want to ever get a job in mathematical modelling,
then it’s a good idea to get some mastery of differential equations! In this tutorial,
we revise the basics with a non-standard presentation – in particular, we will study
differential equations in the context of ‘differential operators’. This will allow
you to provide a link between what you learn in ‘linear algebra’ and/or ‘quantum
mechanics’ to differential equations, under a branch of mathematics known as ‘op-
erator theory’ and ‘functional analysis’.
17.1 Differential Operators and Simple DEs
In mathematics, an ‘operator’ is a general term for an object which acts or ‘oper-
ates’ on another object, to produce a new (transformed) object. For example, you
may recall from class or an earlier college tutorial that matrices were simply co-
ordinate representations of (finite-dimensional) ‘linear operators’. In your courses,
you will mostly study easy differential equations – in particular, ‘linear ones’. As
such, you will encounter ‘linear differential operators’. These operators are all
‘non-compact’ and in some sense, ‘infinite-dimensional’ – which gives them very
interesting properties.
Recall that a function f of one variable is defined as a mapping between sets130
$$f : S_{\mathrm{Domain}} \to S_{\mathrm{Codomain}}, \quad x \mapsto f(x). \tag{328}$$
In general, the differential equations you will study will involve functions of a real
variable, hence S_Domain = R. Depending on the application, the codomain may be the
complex numbers C – e.g. if you are studying AC circuits or electromagnetism –
or it may be the real numbers R.
In essence, a differential equation is simply an equation involving derivatives of
some function. Viewed another way, a differential equation is essentially a differ-
ential operator acting on some function to transform it. As a small technical note,
one may be taught (perhaps misled by notation) that a differential equation involves
one or more ‘independent variables’ which you differentiate with respect to, as well
as one or more dependent variables or ‘response variables’ which you are differentiating –
for example, dy/dx = 0. Technically speaking, here y is a function of x, so we write:
y = y(x) to formally denote this. It’s important to keep this in the back of your
mind, even if the common notation omits this.
Example 12 (The World’s Simplest Differential Equation)
f(x) = 0. (329)
Technically speaking, this is a differential equation with the zeroth derivative of f
with respect to x. As such it is a ‘zeroth-order’ differential equation. The world’s
next simplest DE would be:
f(x) = c, (330)
where c is some constant. Then of course, one could set f(x) equal to any
specified function of x which you desire. These are all trivial differential equations.
Example 13 (The World’s Next Simplest Differential Equation) We now wish to
solve the following equation
$$\frac{df}{dx} = 0, \tag{331}$$
for the function f = f(x).131 The solution of course, is trivial since the differential
equation asks – what functions f have derivative zero with respect to x? Constant
functions of course: f(x) = c, for some constant c.
130
Note that a function maps from its domain (some predetermined set of values on which it is
defined) to its co-domain (some pre-determined ‘target set’). Its ‘range’ is the image of its domain –
that is, every member of its co-domain which is equal to f(x) for some point x in the domain. Sometimes,
‘co-domain’ and ‘range’ are used interchangeably.
131
You may also see this equation written in the notation: f′ = 0, with primes being short-hand
notation for denoting differentiation.
Note that we can also view this differential equation as follows:
Df(x) = 0, (332)
where D = d/dx is a ‘first-order differential operator’. In this manner, we view the
differential equation as some operator (the operator d/dx, which acts on objects by
taking their derivative with respect to x) acting on f to give zero.
In the previous example, one may see that asking for solutions to a differential
equation:
Df = 0 (333)
is the same as asking what the kernel of the relevant differential operator D is – i.e.
the set of functions which get sent to 0 when the operator D acts on them. In this
manner, to say that f(x) = c is a solution to df/dx = 0 is the same as saying that the
function f(x) = c lies in the kernel of the differential operator d/dx.
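The idea of an operator that eats a function and returns a new function translates directly into code. The following Python sketch is only an approximation for illustration: it replaces d/dx by a central difference with a hypothetical step size h, and the name `D` mirrors the notation in the text rather than any library.

```python
def D(f, h=1e-5):
    """Return a new function approximating df/dx by a central difference."""
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)


def constant(x):
    return 3.0


def square(x):
    return x * x


if __name__ == "__main__":
    # A constant function is sent to zero, so it lies in the kernel of D.
    print(D(constant)(1.7))
    # x**2 is not in the kernel: D sends it to (approximately) 2x.
    print(D(square)(2.0))
```

Note that `D(constant)` is itself a function, which is exactly the sense in which D ‘operates’ on f to produce a transformed object.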
Trivial examples aside, we now proceed with some more interesting (and useful)
examples of differential equations along with their solution strategies and algo-
rithms on the way.
Example 14 (A First-Order DE) Consider the following differential equation:
$$Df = m, \tag{334}$$
where f = f(x) is some function of x, D = d/dx and m is some constant. We
can re-write this as
$$\frac{df}{dx} = m, \tag{335}$$
and seek all solutions to this differential equation. Since the right-hand side is no
longer zero, the solutions do not form the kernel of D; rather, as we will see, each
solution is one particular solution plus an element of the kernel of D. There
are two ways to proceed from this point. One way is to make an intuitive guess
and test if it is correct. In this case, asking for functions with a constant derivative
is the same as asking for functions of a constant gradient m, so we know that all
solutions have to be ‘straight lines’ – i.e. linear functions of the form:
$$f(x) = mx + b, \tag{336}$$
where b is an arbitrary constant (the y-intercept of the graph y = f(x) = mx +
b). Alternatively, we can proceed in a more ‘systematic way’ using the method of
‘separation of variables’, which is based on the concept of ‘exact differentials’.
In this manner, you can take it for granted that you can use Leibniz notation in a
literal way which is still rigorous (with the right technology) – so we multiply both
sides of the equation (335) by dx to get rid of it from the denominator:
$$df = m\,dx. \tag{337}$$
We now have an exact differential df of f on the left-hand side. The fundamental
theorem of calculus tells us how to integrate this precisely:
$$\begin{aligned}
\int_{f(x_0)}^{f(x)} df &= \int_{x_0}^{x} m \, d\tilde{x} \\
\implies f(x) - f(x_0) &= m(x - x_0) \\
\implies f(x) &= m(x - x_0) + f(x_0) \\
&= mx + (f(x_0) - m x_0),
\end{aligned} \tag{338}$$
where x and x0 are the limits we are integrating between, x0 being some ‘initial
point’ chosen a-priori. Without further information, we can simply re-label the
constant (f(x0) − mx0) = b. Alternatively, we could have used the ‘indefinite
integral’ approach and arrived at the same conclusion: f(x) = mx + b. Hence,
the solution set of the differential equation df/dx = m is the set of all straight lines
of gradient m on R – that is, the set of functions {f(x) = mx + b}, where b is an
arbitrary parameter. Each such function is the particular solution mx plus a constant b,
and the constant functions are precisely the kernel of d/dx.
In the previous example, we saw that two approaches worked – one was the semi-
heuristic approach which relied on intuition and guessing the correct answer. The
second, was a somewhat algorithmic approach – ‘separation of variables’. We will
now use separation of variables in one more toy example, before proceeding to a
physical, less trivial example – Newton’s second law of motion.
Example 15 (Second, Third and Infinite Order Differential Equations) Consider
now the kernel of the second-order differential operator D = d²/dx². That is, consider
all solutions of the differential equation:
$$\frac{d^2}{dx^2} f = 0, \tag{339}$$
where f = f(x). This is a second-order differential equation because it involves
two derivatives with respect to x. We shall now switch to using primes to denote
derivatives, when convenient – e.g. d²f/dx² := f′′, df/dx := f′ etc. Again, it is easy
to see that the solution to the differential equation f′′ = 0 is the set of all straight lines,
since straight lines have ‘no curvature’ (or equivalently, no acceleration). Systematically,
we can use separation of variables as before:
$$\begin{aligned}
\frac{d^2}{dx^2} f = 0 &\iff \frac{d}{dx}\left(\frac{df}{dx}\right) = 0 \iff \frac{d}{dx} f' = 0 \\
&\implies \int_{f'(x_0)}^{f'(x)} df' = \int_{x_0}^{x} 0 \cdot d\tilde{x} = 0 \\
&\implies f'(x) - f'(x_0) = 0, \quad \text{let } m = f'(x_0) \\
&\implies f' := \frac{df}{dx} = m \\
&\implies df = m\,dx.
\end{aligned} \tag{340}$$
This is equation (337) of the previous example, so one more integration gives f(x) = mx + b.
Again, we can avoid specifying limits of integration by using ‘indefinite integrals’
and keeping track of integration constants – however, for now it’s best to leave
them in since students often have a bad habit of omitting them. Also, later on when
one does ‘initial value problems’, keeping the limits of integration is equivalent to
using ‘initial data’ to determine your integration constants at the end ...
For a more useful example of integration, we now turn to Newton’s definition (his
second law of motion) of the force experienced by a point particle:
Force = mass of particle × acceleration of particle. (341)
In one-dimensional motion, recall that for a particle with displacement x from the
origin, its velocity v and acceleration a are defined as derivatives with respect to
time:
$$v = \frac{dx}{dt}, \quad a = \frac{dv}{dt} = \frac{d^2}{dt^2} x. \tag{342}$$
In this manner, we can re-write Newton’s second law in one-dimension as:
$$F = m \frac{d^2x}{dt^2}. \tag{343}$$
This is simply a definition. To get useful, predictive physics out of Newton’s law,
we need a force law – this means, some functional form for F! One simple exam-
ple, is to consider an object falling under gravity from the Newtonian view. For an
object of mass m that is small with respect to the Earth’s mass, close to the surface
of the Earth we can approximate the force it experiences due to gravity as:
Fgrav = mg, (344)
where g is the average acceleration due to gravity near the Earth’s surface – say,
9.8 m/s² downwards. Ignoring air-resistance and all other effects, this ‘free fall’
motion is one-dimensional (downwards). If we let x = x(t) be the vertical displacement
of the object (taking downwards as the positive direction), Newton’s second law of
motion gives rise to the following differential equation:
$$F_{\mathrm{grav}} = m \frac{d^2x}{dt^2} \implies mg = m \frac{d^2x}{dt^2}. \tag{345}$$
To cancel the mass m from both sides of the above differential equation is a subtle
point – on the left-hand side, we have the ‘gravitational mass’ and on the right, we
have the ‘inertial mass’. Indeed, this was something Newton considered. Flipping
this around, we can say that the g on the left side is ‘gravitational acceleration’
and that the d²x/dt² on the right-hand side is ‘inertial acceleration’. That these are
‘equivalent’ is indeed a statement of Einstein’s ‘equivalence principle’ of General
Relativity (in some weak form)!
Physics aside, we end up with the second-order differential equation:
$$\frac{d^2x}{dt^2} = g. \tag{346}$$
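Clearly x must be some quadratic function (degree two polynomial) of t. To prove
this explicitly, let ẋ = dx/dt, then use separation of variables:
$$\begin{aligned}
\frac{d^2x}{dt^2} = \frac{d\dot{x}}{dt} &= g \\
\implies d(\dot{x}) &= g\,dt \\
\implies \dot{x} = gt + u &\iff dx = (gt + u)\,dt \\
\implies \int dx &= \int (gt + u)\,dt \\
\implies x &= \frac{1}{2} g t^2 + ut + c,
\end{aligned} \tag{347}$$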
where u and c are constants of integration132. Now, you may recall from high-school
physics that this is a more-familiar consequence of Newton’s second law – that
is, the motion of an object experiencing a constant force such as gravity! Hence we
have x(t) = ½at² + ut + c, where a = g is the acceleration of the object (due to
gravity) and u is its ‘initial velocity’ – that is, u = v(0) = dx/dt|t=0. Furthermore,
c = x(0) is its ‘initial displacement’ – its displacement at the initial time t = 0.
Note that we can equivalently say that the solution set of the differential equation
d²x/dt² = g is the set of all ‘parabolic functions’ of the form:
$$\frac{1}{2} g t^2 + ut + c, \tag{348}$$
where u, c are arbitrary constants and g is fixed by the force law. (If g is allowed to be
arbitrary as well, these functions make up the kernel of the third-order operator d³/dt³.)
132
Recall earlier comments, we can either carefully add constants of integration while doing indefinite
integrals, or explicitly specify the limits of integration – both are equivalent.
With the constants of integration undetermined, we call x(t) = ½gt² + ut + c
the ‘general solution’ of the previous differential equation. Because the previous
example was a ‘second order’ differential equation, we had to ‘integrate twice’ –
meaning we needed two pieces of ‘initial data’ to specify a unique solution. The
data we need is the initial velocity u = ẋ(0) of our object and its initial displacement
x(0). In general, for an ‘n-th’ order differential equation of a function of one
real variable, you need n pieces of ‘initial data’ to get a unique solution.
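To see the phrase ‘integrate twice’ in action, here is a minimal Python sketch that integrates d²x/dt² = g step by step and compares the result with the general solution ½gt² + ut + c. The function names, the step count and the sample initial data are assumptions made for this illustration; the trapezoid rule is used for the second integration because it happens to be exact when the velocity is linear in t.

```python
def free_fall(t_end, g=9.8, u=0.0, c=0.0, steps=1000):
    """Integrate d2x/dt2 = g twice, starting from x(0) = c and dx/dt(0) = u."""
    dt = t_end / steps
    x, v = c, u
    for _ in range(steps):
        v_new = v + g * dt               # first integration:  d(v) = g dt
        x += 0.5 * (v + v_new) * dt      # second integration: dx = v dt
        v = v_new
    return x, v


def general_solution(t, g=9.8, u=0.0, c=0.0):
    """Closed form x(t) = g t^2 / 2 + u t + c."""
    return 0.5 * g * t * t + u * t + c


if __name__ == "__main__":
    x_num, v_num = free_fall(2.0, u=3.0, c=1.0)
    print(x_num, general_solution(2.0, u=3.0, c=1.0))
    print(v_num)
```

The two pieces of initial data, u and c, enter only as the starting values of the loop, which is the numerical counterpart of fixing the two integration constants.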
17.2 Physical Examples
Thus far, we have considered ‘1-dimensional systems’ in the sense that the ‘response’
variable we solved for in our differential equations was 1-dimensional (a function).
Since the concept of ‘dimension’ applies to more general things than just ‘space’,
you will find that nature is governed by many differential equations with different
dimensionality. As such, you can think of differential equations as modelling
the ‘time-evolution’ of a (smooth133) system – that is, how some physical system
evolves in time.
We now consider the following scenario. After borrowing the old cannon
from Kings Park, St. George’s College refurbishes the cannon and places it on top
of the tower. Having gone mad with power, the warden – Ian Hardy, decides to
‘cleanse’ college row by firing the cannon at the other colleges. The motion of the
cannon-balls is to some approximation, governed by Newton’s second law:
$$\vec{F} = m \frac{d^2}{dt^2} \vec{r}, \tag{349}$$
where $\vec{r} = \vec{r}(t)$ is the displacement vector for the cannon-ball, a vector
function of time. For now, we can establish a coordinate frame on top of the tower –
133
Technically, your system must be ‘smooth’ in the sense that it is differentiable – that is, its
state-space is a smooth manifold. Some processes are not continuous, therefore not differentiable!
Nonetheless, many discrete processes, such as quantum random walks (Ben Luo), can be approxi-
mated by some smooth process governed by a differential equation.
considering the vertical direction of the motion to be in the +z direction, and the
horizontal motion along college row to be in the +x direction. To turn Newton’s
second law into a differential equation for the ‘time-evolution’ of the cannon-ball
(i.e. its trajectory), we need to know what forces are acting on the ball.
For simplicity, let’s say that the forces acting on the cannon-ball are:
• Gravity acting downwards:
$$\vec{F}_{\mathrm{grav}} = m\vec{g} \tag{350}$$
where $\vec{g} = (0, 0, -g)$ is the acceleration due to gravity and g = 9.8 m/s²
is its magnitude.
• A ‘drag force’ opposing the motion of the ball, due to friction between the
cannon-ball and molecules of air. For reasonably low-velocity objects like
cannon-balls, we can model aerodynamic drag linearly – that is, linearly
proportional to the ball’s velocity:
$$\vec{F}_{\mathrm{drag}} = -b\vec{v} \tag{351}$$
where $\vec{v} = (v_x, v_y, v_z)$ is the cannon-ball’s velocity vector (note that v_y =
0, by assumption / our chosen orientation of coordinate axes). Technically
speaking, this is the ‘Stokes’ drag – modelling the air as a fluid (in the most
general sense) and ignoring turbulence.
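Thus, the total force acting on the ball is
$\vec{F} = \vec{F}_{\mathrm{grav}} + \vec{F}_{\mathrm{drag}} = (-b v_x, 0, -mg - b v_z)$.
We now get a system of second-order differential equations:
$$\vec{F} = m \frac{d^2}{dt^2}\vec{r} \iff (-b v_x, -b v_y, -mg - b v_z) = \left(m\frac{d^2x}{dt^2}, m\frac{d^2y}{dt^2}, m\frac{d^2z}{dt^2}\right)$$
$$\implies \frac{d^2x}{dt^2} = -\frac{b}{m} v_x, \tag{352}$$
$$\frac{d^2y}{dt^2} = -\frac{b}{m} v_y,$$
$$\frac{d^2z}{dt^2} = -\frac{b}{m} v_z - g. \tag{353}$$
Note that v_x := dx/dt, v_z := dz/dt etc., so the velocities are functions of time.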
Problem 25 (Galilean Relativity) Recalling his study of the Galilean symmetry
group, Ian Hardy notes that when he calculates the trajectory of his cannon-ball
he can rotate coordinates to make its motion 2-dimensional. This is because
‘Newtonian mechanics’ is invariant under the ‘Galilean Lie group’ – or equivalently,
Newton’s laws of physics are the same in all Galilean inertial reference
frames134. This means we can fire it so its initial velocity in the y-direction is
zero – hence we can ignore the y coordinate and y differential equation, since we
will simply have y(t) = y0 where y0 is the initial y coordinate of the cannon-ball.
Nonetheless, he needs to solve the x and z differential equations to get the motion.
Luckily, these equations are uncoupled! This means that they are independent, so
we can solve them separately.
I: Solve the projectile motion differential equation (352) for x(t), using separation
of variables. You will need to use the fact that v_x = dx/dt to do the second
integration. When separating variables, you will need to use the fact that:
$$\int \frac{dv}{v} = \ln(v) + c, \tag{354}$$
where c is some constant of integration determined by your ‘initial value’ data (i.e.
initial velocities and initial time). Alternatively, you can define your integration
limits explicitly in terms of your initial data: t0 = 0, vx(0), vz(0) and x0, z0.
II: Solve the differential equation (353) for the z component of the cannon-ball
trajectory. Hint: recall that ∫ (f′/f) dx = ln(f) + c, where f is some function, f′ is its
derivative and c is a constant of integration.
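You should check your answers with a tutor – or ask them for help. Note that your
solutions should be of the form:
$$\begin{aligned}
x(t) &= x(0) + \frac{m}{b} v_x(0)\left(1 - e^{-\frac{b}{m}t}\right) \\
z(t) &= z(0) - \frac{mg}{b} t + \frac{m}{b}\left(v_z(0) + \frac{mg}{b}\right)\left(1 - e^{-\frac{b}{m}t}\right).
\end{aligned} \tag{355}$$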
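One way to check a closed-form answer without a tutor at hand is to integrate equations (352) and (353) numerically and compare. The sketch below does this with a classical fourth-order Runge-Kutta step written by hand. The closed form it tests is x(t) = x(0) + (m/b) v_x(0)(1 − e^{−bt/m}) together with the z(t) expression quoted above; the mass, drag coefficient and launch velocities in the demo are made-up values, and the function names are hypothetical.

```python
import math


def closed_form(t, x0, z0, vx0, vz0, m, b, g=9.8):
    """Trajectory with linear drag; z points up, so gravity contributes -g."""
    k = b / m
    decay = 1.0 - math.exp(-k * t)
    x = x0 + (vx0 / k) * decay
    z = z0 - (g / k) * t + (vz0 + g / k) * decay / k
    return x, z


def simulate(t_end, x0, z0, vx0, vz0, m, b, g=9.8, steps=2000):
    """Integrate equations (352) and (353) with Runge-Kutta steps."""
    k = b / m

    def rhs(state):
        x, z, vx, vz = state
        return (vx, vz, -k * vx, -k * vz - g)

    def nudge(state, slope, h):
        return tuple(s + h * ds for s, ds in zip(state, slope))

    state = (x0, z0, vx0, vz0)
    dt = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(nudge(state, k1, dt / 2))
        k3 = rhs(nudge(state, k2, dt / 2))
        k4 = rhs(nudge(state, k3, dt))
        state = tuple(
            s + dt / 6 * (a + 2 * p + 2 * q + r)
            for s, a, p, q, r in zip(state, k1, k2, k3, k4)
        )
    return state[0], state[1]


if __name__ == "__main__":
    args = (2.0, 0.0, 15.0, 30.0, 10.0, 5.0, 0.5)   # t, x0, z0, vx0, vz0, m, b
    print(closed_form(*args))
    print(simulate(*args))
```

If the two printed pairs disagree beyond small rounding differences, either the closed form or the differential equations have been copied down with a wrong sign, which is a common slip in this problem.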
Problem 26 (Physical Meaning of DEs) In the previous problem, find an expres-
sion for the velocity when the ‘drag force’ cancels out the gravitational force – i.e.
when:
Fgrav + Fdrag = 0. (356)
This is the point at which the net force acting on the cannon ball is zero – meaning,
it travels thereafter with constant velocity. Such a velocity is referred to as the
object’s terminal velocity. Physically, your answer should depend on the constant b
134
Einstein’s theory of special relativity says that nature (a 4-dimensional affine space instead of
a 3-dimensional vector space) is invariant under transformations of the ‘Lorentz symmetry group’,
which is different to the Galilean group.
since this is the aerodynamic constant or ‘drag coefficient’, related to the geometry
of the object and how we model air as a fluid.
Q2: Find the ‘time’ of flight of the cannon-ball. In particular, given z(0) = h
is the height of the St. George’s College tower (say 15 metres) and given that
z(tfinal) ≈ 0, when the cannon-ball hits St. Catherine’s college – solve the z
equation of motion for the time elapsed: ∆t = tfinal − t0 = tfinal (setting t0 = 0
for simplicity). Note, if you can’t solve it analytically – first try letting b → 0, then
solve for the simplified case, where there is ‘no air-resistance’.
Q3: Set b = 0 and re-solve the differential equation arising from Newton’s second
Law. This should give you the trajectory without air-resistance. In this case, you can
express the z coordinate in terms of x and should get something of the form: z ∼ x²,
which is the equation for a parabola! This tells us that in the absence of drag forces
(e.g. in a vacuum), projectile motion under gravity follows parabolic trajectories.
Now, using your knowledge of limits and the exponential function, carefully take
the limit b → 0 for the trajectory (x(t), 0, z(t)) of the cannon-ball in the case with
drag forces present. This should coincide with your result for the solution to the
differential equation without air-resistance.
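If the time-of-flight equation resists an analytic attack, a numerical root-finder is a reasonable fallback. The following Python sketch uses bisection on z(t) = 0, assuming the closed form for z(t) quoted in the previous problem and a tower height of 15 metres as suggested above; the mass, the drag coefficients and the function names are hypothetical. It also lets you watch the b → 0 limit approach the drag-free answer.

```python
import math


def height(t, h, vz0, m, b, g=9.8):
    """z(t) for a ball launched from height h; b = 0 gives the drag-free case."""
    if b == 0.0:
        return h + vz0 * t - 0.5 * g * t * t
    k = b / m
    # -expm1(-k t) equals 1 - exp(-k t) but stays accurate when k t is tiny.
    return h - (g / k) * t + (vz0 + g / k) * (-math.expm1(-k * t)) / k


def time_of_flight(h, vz0, m, b, g=9.8, tol=1e-12):
    """Bisection for the positive time at which height(t) = 0 (needs h > 0)."""
    lo, hi = 0.0, 1.0
    while height(hi, h, vz0, m, b, g) > 0.0:   # expand until below ground
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if height(mid, h, vz0, m, b, g) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for b in (0.5, 0.05, 0.001, 0.0):
        print(b, time_of_flight(15.0, 0.0, 5.0, b))
    print("sqrt(2h/g) =", math.sqrt(2 * 15.0 / 9.8))
```

The printed times should decrease towards sqrt(2h/g) as b shrinks, which is the numerical shadow of the limit you are asked to take carefully by hand.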
Thus we have considered a subset of a class of differential equations called ‘or-
dinary differential equations’ (ODEs). They are the simplest types of differential
equations, which is why we can find nice ‘analytic’ solutions. In general, differen-
tial equations can be extremely hard to solve – sometimes, only numerical solutions
are available (effectively speaking, since analytic solutions involving infinite sums
of special functions can be slower for a computer to evaluate than a solution gener-
ated by numerical means). For engineering purposes, the overwhelming majority
of physical processes are modelled using ‘numerical analysis’ to solve complicated
differential equations. Nonetheless, it is important and instructive to get a handle
on how differential equations with known analytic solutions behave.
17.3 Operators, Eigenfunctions and Spectra
In the previous example of projectile motion with air resistance, we can view the
problem as a statement in operator theory:
$$D\vec{r}(t) = (0, 0, -mg) \tag{357}$$
where D = m d²/dt² + b d/dt is the Newtonian operator minus the air-resistance
operator (they act on $\vec{r}$ to give Newton’s law $\vec{F} = m \frac{d^2}{dt^2}\vec{r}$
and the drag force $\vec{F}_{\mathrm{drag}} = -b \frac{d}{dt}\vec{r}$). In the absence of
gravity, the kernel of the operator D is simply the set of solutions to the projectile
motion differential equation on the International Space Station (where, in the station’s
freely-falling frame, gravity is effectively absent). You may wonder what the purpose is for
the operator viewpoint – indeed, for such simple differential equations, it serves
only an aesthetic purpose to connect the theory of differential equations to linear
and abstract algebra. However, the operator formalism is immensely useful when
studying properties of more complicated equations – for example the ‘heat equa-
tion’. This is a very active area of research as you may see by typing ‘Heat Kernel’
into Google.
For those of you who studied quantum mechanics, you should be familiar with the
momentum operator:
$$\hat{p}_x = -i\hbar \frac{d}{dx}, \tag{358}$$
where i is the imaginary unit and ħ is the reduced Planck constant (which sets the ‘length’
scale of quantum behaviour). You may now ask, what are the eigenfunctions of the
momentum operator? That is, what functions ψ(x) solve the eigenvalue equation
(a differential equation):
$$\hat{p}_x \psi(x) = k\psi(x), \tag{359}$$
where k is the eigenvalue of the eigenfunction ψ. Recalling eigenvectors and eigen-
values from linear algebra, you will notice that functions form a vector space in the
abstract sense – that is, they satisfy all the vector space axioms (addition, linearity
etc). Hence, eigenfunctions and eigenvectors are the same concept – except that
eigenfunctions typically exist in infinite dimensional vector spaces.
Now, to solve our problem of finding the eigenfunctions of the momentum operator,
we have to solve the differential equation:
$$-i\hbar \frac{d\psi}{dx} = k\psi(x). \tag{360}$$
Exercise 60 (Snakes on a Plane Wave) After being turned into a photon, Ben Luo
decides to take revenge on the tutorial students who drew inaccurate spacetime di-
agrams of his journey of enlightenment. In this manner, he decides to turn Matt
Fernandez into a plane wave – that is, a solution to Schrödinger’s equation in free
space. To do this, he first has to set Matt loose into a region of the universe where
gravity is negligible and there are no external interactions interfering with him.
As his final revenge, Ben sets a bunch of quantum snakes loose to hunt down Matt.
Having turned into a plane wave, Matt has a definite momentum but indefinite position.
Quantum mechanically speaking, he exists across all space simultaneously –
until an observer (or snake) performs a position measurement on him. Thus, he is
safe for now.
To help Matt, solve the above differential equation and show that you do indeed get
‘plane wave’ solutions.
In the last exercise, you should see that ψ(x) = A e^{ikx/ħ} is the general solution
to the differential equation. Here A is some constant, which is determined by
‘normalization’ (total probability summing to 1) which we can ignore for now135.
The complex exponential function e^{ikx/ħ} is a ‘plane wave’ – like electromagnetic
waves in free space. To see that explicitly, you can use Euler’s formula:
e^{ikx/ħ} = cos(kx/ħ) + i sin(kx/ħ).
In the context of operator theory, we say that e^{ikx/ħ} is an eigenfunction of the
momentum operator, −iħ d/dx, with eigenvalue k. Quantum mechanically, the process
of acting an ‘operator’ (observable) on a wavefunction (representing a particle, person
or cat in a box etc) is precisely the process of measurement. The eigenvalue
we get is the resulting ‘measurement outcome’ – in this case, it is a value for the
momentum k in the x-direction of Matt Plane Wave Fernandez. How does this coincide
with what you see in linear algebra? Well, eigenfunctions and eigenvectors
are really just part of the same general concept – to make sense of everything in an
efficient, powerful way, you will need to study the theory of ‘Hilbert Spaces’. This
also illustrates one motivation and application for operator theory – the entirety
of quantum mechanics is based on it! In more general terms, studying the properties
of differential operators can tell you a lot about the properties of the solutions
to the differential equations they generate ... even if you can’t find them!
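The eigenvalue statement can be tested numerically in a few lines. The sketch below is an illustration under stated assumptions: it works in units where ħ = 1, approximates d/dx by a central difference with a hypothetical step h, and uses made-up names `plane_wave` and `momentum`; the sample values of k and x are arbitrary.

```python
import cmath

HBAR = 1.0  # assumption: units in which the reduced Planck constant equals 1


def plane_wave(x, k, amplitude=1.0):
    """psi(x) = A exp(i k x / hbar)."""
    return amplitude * cmath.exp(1j * k * x / HBAR)


def momentum(psi, x, h=1e-5):
    """Apply p = -i hbar d/dx to psi at the point x via a central difference."""
    dpsi = (psi(x + h) - psi(x - h)) / (2.0 * h)
    return -1j * HBAR * dpsi


if __name__ == "__main__":
    k, x = 2.5, 0.3

    def psi(y):
        return plane_wave(y, k)

    print(momentum(psi, x))   # operator acting on psi
    print(k * psi(x))         # eigenvalue times psi: should nearly match
```

Trying the same check with, say, a Gaussian instead of a plane wave shows the two printed numbers no longer being proportional with a fixed k, which is what it means for a function not to be a momentum eigenfunction.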
135
Since space is mathematically infinite, we must restrict to some finite space / region ... or we
will get a divergent integral.
18 Tutorial 16: Differential Equations and Integrating Factors
In the last tutorial, we looked at the preliminary notion of ‘differential operators’
in the context of linear ‘Ordinary Differential Equations’ (ODEs). In the examples
and problems covered, we were able to solve the differential equations arising from
various processes by the method of ‘separation of variables’. Although powerful,
the method of separation of variables only works if a differential equation is ‘sep-
arable’ – most differential equations aren’t, although many important differential
equations are.
As it turns out, whether or not a differential equation is separable is intimately tied
to the coordinate system in which it arises. In particular, the study of the
separability of Elliptic Partial Differential Equations (covering a vast class of physical
phenomena), such as the ‘Laplace equation’, is a contemporary area of research
in differential geometry 136. Luckily for us, many differential equations which do
not ‘appear’ to be separable can be put into a ‘separable form’ by using a simple
tool – an ‘integrating factor’.
18.1 Review – Theory of separation of variables
Definition 10 (Separability) A first order differential equation in y is separable
if it can be written in the form:
$$\frac{dy}{dx} = h(y)\,g(x), \tag{361}$$
for some functions h and g.
Previously, we looked at n-th order ordinary differential equations137 of the form:
$$\frac{d^n}{dx^n} f(y) = g(x)\,h(y), \tag{362}$$
where f, g, h are suitably defined functions of x or y and n ≥ 1 (in particular, we
looked at simple cases with f(y) = y). Such equations were easy to solve because
we could directly ‘integrate’ them. In particular, by letting $v(y) = \frac{d^{n-1}}{dx^{n-1}} f(y)$, the DE
(362) becomes:
$$\frac{dv}{dx} = g(x)\,h(y), \tag{363}$$
136
In this regard, recent studies of the properties and existence of the ‘Benenti tensor’ mark a critical
advancement in this area.
137
Specifically, for n = 1 and n = 2.
hence separating variables gives:
$$\frac{dv(y)}{h(y)} = g(x)\,dx, \tag{364}$$
allowing us to explicitly integrate the left-hand side with respect to y and the
right-hand side with respect to x. By applying ‘initial conditions’ (physical data), we
then get a unique solution for $v(y) = \frac{d^{n-1}}{dx^{n-1}} f(y)$.
Applying this process n times, we finally arrive at an implicit solution for y in
terms of the independent variable, x:
G(y) = H(x), (365)
where G and H are functions determined by integration. We can re-write this
as:
F(x, y) = G(y) − H(x) = 0. (366)
Hence, solutions to our original differential equation are level curves of the
function F of two variables. If F is continuously differentiable on some open set U, then
by the implicit function theorem, it follows (roughly138) that on some open subset
of U, we can explicitly write y as a function of x:
y = Q(x), (367)
where Q is some appropriate function, providing an explicit solution to our original
differential equation.
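As a concrete illustration of the steps above, here is a short worked example. The equation dy/dx = xy and the initial value y(0) = y_0 > 0 are chosen purely for illustration and do not appear elsewhere in these notes; in the language of Definition 10 they correspond to h(y) = y and g(x) = x.

```latex
\begin{aligned}
\frac{dy}{dx} &= x\,y
  && \text{(separable, with } h(y) = y,\ g(x) = x\text{)} \\
\int \frac{dy}{y} &= \int x\,dx
  && \text{(separate the variables and integrate)} \\
\ln y &= \tfrac{1}{2}x^2 + c
  && \text{(implicit solution } G(y) = H(x)\text{)} \\
y &= y_0\, e^{x^2/2}
  && \text{(explicit solution, using } y(0) = y_0 \Rightarrow c = \ln y_0\text{)}
\end{aligned}
```

Here G(y) = ln y and H(x) = x²/2 + c, and the last line is the explicit function y = Q(x) whose existence the implicit function theorem guarantees.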
If you review this process, you will immediately see that the key ingredient behind
‘separation of variables’ for ODEs is the existence of a ‘total differential’ or ‘exact
differential 1-form’ (recall tutorial 4). In particular, we had
$$\frac{dv(y)}{h(y)} = g(x)\,dx, \tag{368}$$
and it was stated that one could explicitly integrate the left-hand side of the
equation:
$$\int \frac{dv(y)}{h(y)} = \int \frac{v'(y)}{h(y)}\,dy = \dots \tag{369}$$
138
For a more ‘explicit’ (accurate) statement of the implicit function theorem, see your calculus
textbook.
In general, this is only possible if the function $\frac{v'(y)}{h(y)}$ is of a special form – in
particular, if $v'(y) = \lambda h'(y)$ (for some constant λ), then we can use the identity:
$$\int \frac{h'(y)}{h(y)}\,dy = \ln[h(y)] + c, \tag{370}$$
where c is some constant of integration. This is because we have an exact
differential 139
$$\frac{h'(y)}{h(y)}\,dy = d(\ln[h(y)]), \tag{371}$$
allowing us to apply the ‘fundamental theorem of Calculus’.
If the left-hand side, $\frac{dv(y)}{h(y)}$, is not of this form, either we cannot integrate it or we
must use some special ‘tricks’ to put it into this form.
Problem 27 (A question of separability) Consider the following differential
equation for y as a function of t:
$$\frac{dy}{dt} + \frac{5}{t}\,y = t - 2 + \frac{2}{t}. \tag{372}$$
Now try to solve this differential equation for y explicitly in terms of t, using the
method of separation of variables.
Hint: If you can’t complete this problem in 5−10 minutes, move to the next section.
18.2 Integration Factors
For now, we shall consider ordinary first-order differential equations. The
integrating factor method can be used recursively for higher-order differential equations
... if you are lucky. For example, recall the projectile motion problems in
tutorial 15 – these were second order differential equations, but effectively amounted
to two sequential first order differential equations for the projectile trajectory (first we
solved for velocity, then displacement).
In general, the integrating factor method is useful for solving ODEs of the
following form:
$$\frac{dy}{dx} + P(x)\,y = Q(x), \tag{373}$$
139
Recall that for a function f of one variable – say y, its ‘exterior derivative’ or exact differential
is given by: $df = \frac{df}{dy}\,dy$. The term dy is a differential 1-form – an object which is ‘dual’ to the unit
vector $e_y$ in the y-direction.
where P and Q are functions of the independent variable, x. To put this differential
equation into ‘separable form’ (defined earlier), we may consider multiplying it by
some function I(x) to get an exact differential on both sides.
Assuming this is possible, we have:
$$I(x)\left(\frac{dy}{dx} + P(x)\,y\right) = I(x)\,Q(x), \tag{374}$$
where the left-hand side, $I(x)\left(\frac{dy}{dx} + P(x)\,y\right)$, is required to be a total derivative. Alternatively, multiplying by dx on both
sides, we get
$$I(x)\left(dy + P(x)\,y\,dx\right) = I(x)\,Q(x)\,dx, \tag{375}$$
where $I(x)\left(dy + P(x)\,y\,dx\right)$ is an exact differential.140 Hence, we must have:
$$\begin{aligned}
I(x)\left(\frac{dy}{dx} + P(x)\,y\right) &= I(x)\frac{dy}{dx} + y\frac{dI}{dx} \\
\iff I(x)\left(dy + P(x)\,y\,dx\right) &= d(I(x)\,y) \\
&= y\,dI(x) + I\,dy \\
&= y\frac{dI}{dx}\,dx + I\,dy.
\end{aligned} \tag{376}$$
Comparing coefficients of dx and dy (linearly independent dual vectors) on the
left- and right-hand sides, we must have:
$$\frac{dI}{dx} = I(x)\,P(x), \qquad I(x) = I(x). \tag{377}$$
Hence we have a separable first-order differential equation for the integrating
factor, I(x):
$$\begin{aligned}
\frac{dI}{dx} &= I(x)\,P(x) \\
\implies \frac{dI}{I} &= P(x)\,dx \\
\implies \int \frac{dI}{I} &= \int P(x)\,dx \\
\implies \ln[I(x)] &= C + \int P(x)\,dx \\
\implies I(x) &= e^{\int P(x)\,dx}, \quad \text{set } C = 0.
\end{aligned} \tag{378}$$
140
Recall that an exact differential, or ‘exterior derivative’, of a function f(x, y) of two variables is
given by: $df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy$.
Note that here, the constant C of integration with respect to I(x) is superfluous:
a non-zero C only rescales I(x) by the constant factor $e^C$, which cancels from both
sides of (374). Hence we discard it by setting it to zero, and arrive at a functional expression
for our integration factor. Doing this process in reverse, we can then solve our
original differential equation!
To summarize, we can reduce any differential equation of the form:
$$\frac{dy}{dx} + P(x)\,y = Q(x), \tag{379}$$
to a separable one:
$$\begin{aligned}
I(x)\left(\frac{dy}{dx} + P(x)\,y\right) &= Q(x)\,I(x) \\
\iff d(I(x)\,y) &= Q(x)\,I(x)\,dx \\
\iff I(x)\,y &= \lambda + \int Q(x)\,I(x)\,dx,
\end{aligned} \tag{380}$$
where λ is the constant of integration appearing from the left-hand side. We did
this by multiplying both sides by an integration factor I(x), whose form is given
by:
$$I(x) = e^{\int P(x)\,dx}. \tag{381}$$
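The recipe summarized in (379)–(381) can also be carried out numerically. The sketch below is only an illustration under stated assumptions: the function name `solve_linear_first_order`, the trapezoid-rule quadrature and the test equation dy/dx + y = x with y(0) = 1 are all hypothetical choices, not part of the tutorial. The integrating factor is normalized so that I(x0) = 1, which fixes λ = y(x0).

```python
import math


def solve_linear_first_order(P, Q, x0, y0, x_end, steps=20000):
    """Approximate y(x_end) for dy/dx + P(x) y = Q(x), y(x0) = y0.

    Uses I(x) = exp(integral of P from x0 to x) and
    I(x) y(x) = lambda + integral of Q*I from x0 to x, with lambda = y0.
    Both integrals are accumulated with the trapezoid rule.
    """
    h = (x_end - x0) / steps
    int_P = 0.0    # running integral of P
    int_QI = 0.0   # running integral of Q * I
    x = x0
    I_prev = 1.0   # I(x0) = exp(0) = 1
    for _ in range(steps):
        x_next = x + h
        int_P += 0.5 * h * (P(x) + P(x_next))
        I_next = math.exp(int_P)
        int_QI += 0.5 * h * (Q(x) * I_prev + Q(x_next) * I_next)
        x, I_prev = x_next, I_next
    return (y0 + int_QI) / I_prev


# Hypothetical test equation: dy/dx + y = x, y(0) = 1.
# By hand: I(x) = e^x, so y(x) = x - 1 + 2 e^{-x}.
numeric = solve_linear_first_order(lambda x: 1.0, lambda x: x, 0.0, 1.0, 2.0)
exact = 2.0 - 1.0 + 2.0 * math.exp(-2.0)
print(numeric, exact)
```

The two printed values should agree to several decimal places; the remaining difference is the quadrature error of the trapezoid rule.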
Exercise 61 (Reading the Question) In an alternate universe, William is still
working on the first problem – not having read the hint pertaining to needing an
alternate solution strategy.
To help alternate William out, try solving the earlier differential equation for y as
a function of t
$$\frac{dy}{dt} + \frac{5}{t}\,y = t - 2 + \frac{2}{t}, \tag{382}$$
by using the integration factor method.
Hint: Note that since the independent variable here is t, we have $P(t) = \frac{5}{t}$,
$Q(t) = t - 2 + \frac{2}{t}$ and hence:
$$\begin{aligned}
I(t) &= e^{\int \frac{5}{t}\,dt} \\
&= e^{5\ln(t)} \\
&= e^{\ln(t^5)} \\
&= t^5,
\end{aligned} \tag{383}$$
(ignoring the constant of integration).
Problem 28 (Party Patrol) In a twist of events, during a “Filius Fogg” themed
college party, Alice becomes a bit too rowdy. The Sheriff on duty – Matthew Goss,
decides that it’s time to Taser Alice with the official RA Taser. When dry, the human
skin has a resistance of about 100,000 Ohms.
To model this tasering process, we can consider Alice to be a resistor of 100,000 Ω
(Ohms) and Matt’s taser to be a discharging capacitor, with a capacitance of C =
100 µF (microfarads). This makes the system an ‘RC Circuit’. Using Ohm’s Law,
we have
$$V = IR, \tag{384}$$
between any two points on the circuit, where the voltage V and current I are
functions of time and the resistance R between the two points is constant. Since
capacitance C is related to the voltage and charge Q stored in the taser we have:
$$Q = CV. \tag{385}$$
Finally, since current is defined as the ‘rate of flow of charge’ through any chosen
point in the circuit, we have $I = \frac{dQ}{dt}$. Thus, differentiating the capacitance equation
with respect to t (noting that the capacitance C is constant), we get:
$$\frac{dQ}{dt} = I = C\frac{dV}{dt}. \tag{386}$$
Combining this with Ohm’s law and conservation of charge (Kirchhoff’s current
law), we get the following differential equation for the voltage V:
$$\frac{dV}{dt} + \frac{1}{RC}\,V = 0. \tag{387}$$
Q0[Easy]: Reproduce the intermediate steps required to derive the above
differential equation. Now, if you like – instead of eliminating the current I, try eliminating
the voltage V and arrive at a differential equation for the current I as a function of
time t.
Q1: Is this differential equation separable? If so, re-arrange it to the canonical
form defined at the start of the tutorial.
Q2: Solve this differential equation for the voltage V as a function of time t, where
t = 0 is the time of initial discharge (tasering). To get a unique solution, set the
initial voltage $V_0 := V(0)$ to be 150,000 volts.
Hint: Your solution should involve some sort of ‘exponential decay’. Can you think of why
this makes sense physically?
Q3: What is the voltage V , after a time t = 3 seconds of tasing? Would this voltage
‘stun’ the target, or is it lethal?
Q4: There is one special constant that characterises an RC circuit. In fact, any
sort of ‘exponential decay law’ (analogous to the ‘half-life’ of a radioactive
substance), such as the ‘skin-depth’ of an electromagnetic wave penetrating some
surface, has an equivalent constant. This constant is called the ‘RC time constant’,
defined by:
$$\tau = RC. \tag{388}$$
This is equivalent to the time it takes to discharge the capacitor to $\frac{1}{e} = e^{-1} \approx 36.8\%$
of its initial charge.
Prove that τ = RC is indeed the time at which $V = \frac{1}{e}V_0$. Now compute the time
constant τ for this RC circuit.
Q6: Having sobered up from her first tasering, Alice continues to party – however,
an unfortunate turn of events leads to someone spilling drink on her, thus making
her extremely aggressive. Being a responsible Sheriff, Matt decides to taser Alice
again.
Uh oh! When setting the voltage, Matt forgot to take into account that now wet,
Alice’s equivalent resistance is reduced to R = 1000 Ω. Recompute the quantities
in Q3, Q4 and Q5 with this new value for resistance.
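Once you have attempted the problem by hand, a numerical check can be useful. The sketch below is a minimal illustration, not a model answer: it assumes the exponential-decay form V(t) = V0 e^{-t/(RC)} suggested by the hint, and compares it against a crude forward-Euler integration of equation (387). The function names are hypothetical; the parameter values are the dry-skin values quoted in the problem.

```python
import math


def rc_voltage(t, v0, resistance, capacitance):
    """Closed-form decay V(t) = V0 * exp(-t / (R C)), assumed from the hint."""
    tau = resistance * capacitance
    return v0 * math.exp(-t / tau)


def rc_voltage_euler(t_end, v0, resistance, capacitance, steps=100000):
    """Integrate dV/dt = -V / (R C) with the forward Euler method."""
    dt = t_end / steps
    v = v0
    for _ in range(steps):
        v += dt * (-v / (resistance * capacitance))
    return v


R_DRY = 100_000.0   # ohms, dry skin (from the problem statement)
C_TASER = 100e-6    # farads (100 microfarads, from the problem statement)
V0 = 150_000.0      # volts (from Q2)

tau = R_DRY * C_TASER  # RC time constant, in seconds
fraction_exact = rc_voltage(tau, V0, R_DRY, C_TASER) / V0
fraction_euler = rc_voltage_euler(tau, V0, R_DRY, C_TASER) / V0
print(tau, fraction_exact, fraction_euler, 1 / math.e)
```

Both fractions should be close to 1/e, illustrating the defining property of the time constant. Replacing `R_DRY` with the wet-skin resistance lets you repeat the comparison for the final part of the problem.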
For most of you, the last problem should have been relatively easy since no
integration factor was required. The following problem can be solved either using an
integration factor, or (with a trivial trick) immediately by separation of variables.
Problem 29 (Fresher Inductions) Having already tased Alice, Matthew Goss grows
hungry for power. As such, in the upcoming fresher inductions, he decides to
connect new college students to an inductor – thus forming an ‘RL-Circuit’
(resistor-inductor circuit).
An inductor is essentially a coil of wire (e.g. copper wound on a torus) which acts
to resist changes in the electric current that flows through it. Drawing on the concept
of ‘inertia’ from classical mechanics, one can very loosely consider it as something
analogous to ‘mass’ for an electrical circuit. The inductance L of a circuit element
is defined by the magnetic flux φ through the circuit, generated by a flow of charge
(current) I:
$$L = \frac{d\phi}{dI}. \tag{389}$$
Faraday’s law of induction states that the voltage induced by any change in
magnetic flux through the circuit is given by:
$$V = \frac{d\phi}{dt} \implies V = L\frac{dI}{dt}. \tag{390}$$
Combining this with Ohm’s Law, V = IR, and the conservation of (electrical
potential) energy (Kirchhoff’s voltage law), we get the following differential equation for
the current I flowing through the circuit, as a function of time t:
$$V_{in} = IR + L\frac{dI}{dt}. \tag{391}$$
Here $V_{in}$ is the ‘input voltage’, which is constant in time.
Q0[Easy]: If you study physics or engineering, derive the above differential
equation based on the principles outlined. Hint: All the hard work has already been
done.
Q1: Rearrange this differential equation into standard form, then either solve it directly
using ‘separation of variables’, or find an ‘integration factor’ and then solve it.
Q2: The voltage $V_R$ across the resistive element of this circuit (the college fresher),
is given by conservation of energy:
$$V_R = V_{in} - V_L, \tag{392}$$
where $V_L$ is the voltage across the inductor.
Combining this with the above differential equation and using a ‘step-voltage’
input (meaning $V_{in}(t) = 0$ for t < 0 and $V_{in} = V_0$ for t ≥ 0), one should get:
$$V_L(t) = V_0 e^{-\frac{R}{L}t}, \qquad V_R(t) = V_0\left(1 - e^{-\frac{R}{L}t}\right). \tag{393}$$
Similar to the RC circuit, we can define a special constant – the ‘time constant’
for an RL circuit.
In particular, we define the time constant τ for an RL circuit to be the time it takes
for the voltage across the inductor L to drop to a factor of $\frac{1}{e}$ of its initial value.
Equivalently, this is the time taken for the voltage across the resistor (fresher) R to rise
to within $\frac{1}{e}$ of its final value.
Using this information, prove that the time constant is given by:
$$\tau = \frac{L}{R}. \tag{394}$$
19 Tutorial 17: Second Order Linear Differential Equations
In the last two tutorials we reviewed first order ordinary differential equations
(ODEs) and how they arose as models of various physical phenomena. In
particular, we looked at differential equations of the form:
$$\frac{dy}{dx} + P(x)\,y = 0, \tag{395}$$
where P is some function of the independent variable x. Such an equation was
solved using ‘separation of variables’ and integrating both sides. We also studied
ODEs of the form:
$$\frac{dy}{dx} + P(x)\,y = Q(x), \tag{396}$$
where P and Q are functions of x. For such equations, we had to multiply both sides
by an ‘integrating factor’ – a function of the form
$$I(x) = e^{\int P(x)\,dx}, \tag{397}$$
in order to express the equation in ‘separable form’, whence we could apply the
separation of variables method.
However, in general, physical processes may be modelled by differential equa-
tions containing higher order derivatives. In special cases like the projectile motion
problem – a second order differential equation, it may be possible to recursively
apply the ‘separation of variables’ method and integrate multiple times to obtain
the general solution. Fortunately, not all second order differential equations are
that easy (otherwise they would be boring) – hence we need a general solution
strategy.
For those of you who have already studied second order differential equations, this will
be good revision with a twist of applications and some extra insight into the
underlying mathematical theory. In particular, we connect the mathematics of linear
spaces (vector spaces) and differential operators to solution strategies for second
order ODEs.
19.1 Homogeneous Second Order ODEs
A linear homogeneous second order ordinary differential equation in the
dependent (response) variable f, is given by:
$$a\frac{d^2 f}{dx^2} + b\frac{df}{dx} + cf = 0, \tag{398}$$
where a, b, c ∈ R are real coefficients and f is a function of independent variable
x.
To say that the second order ODE (398) is ‘linear’, means that the solution space of
the differential equation is a two-dimensional vector space (an abstract vector space
where functions are vectors). Recall that a vector space (‘linear space’) V is
characterized by the property that any linear combination of two vectors v1, v2 ∈ V is
equal to another vector u = λ1v1 + λ2v2 (where λj are constants) which also lies
inside V.
In the context of our differential equation, this means that any linear combination
of solutions, f1 and f2, to (398) must also be a solution to the differential equa-
tion:
λ1f1 + λ2f2, (399)
where λ1,2 are real or complex coefficients. This means that the solution space to
our second order ODE is closed under addition and scalar multiplication – making
it a vector space. Note that it trivially contains the ‘additive identity’ or ‘zero
element’, given by the function: f(x) = 0.
Exercise 62 (Doubting Thomas) The expression ‘don’t be a doubting Thomas’
comes from Thomas the apostle, who refused to believe that Jesus had returned from
the dead and appeared to the other apostles. He demanded to see evidence with
his own eyes – thus remaining skeptical till he could feel and see the wounds the
martyr had received during crucifixion. In a dramatic re-adaptation of ancient
events, Thomas McKenney refuses to believe that the second order ODE (398) is
linear till he sees a proof with his own eyes.
Q1: To help Thomas, prove that given any two solutions f1(x) and f2(x) to the
ODE (398), the function
$$g(x) = \lambda_1 f_1(x) + \lambda_2 f_2(x) \tag{400}$$
is also a solution, where λ1,2 are real or complex coefficients.
Q2: To finish showing that the solution space to (398) is a linear space (vector
space), we must show that it contains the zero element. Show that the ‘zero function’:
$$0(x) = 0 \quad \forall x \in D \tag{401}$$
where D is the domain on which the ODE is defined, is a solution to the ODE. This
is called the ‘trivial solution’.
It is now instructive to see an explicit example of a second order ODE and its
solutions.
Exercise 63 Q1: Verify that $f(x) = \lambda_1 e^{\frac{(-1+\sqrt{13})}{6}x} + \lambda_2 e^{\frac{(-1-\sqrt{13})}{6}x}$ is a solution to
the following differential equation:
$$3\frac{d^2 f}{dx^2} + \frac{df}{dx} - f = 0, \tag{402}$$
where λ1,2 are constant coefficients.
Q2(Messy): By defining $f_0 = f(0)$ to be f evaluated at x = 0 and $f'_0 := \frac{df}{dx}\big|_{x=0}$ to
be the derivative of f (with respect to x) evaluated at x = 0, express the constants
λ1 and λ2 in terms of $f_0$ and $f'_0$.
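A numerical sanity check can complement the pen-and-paper verification in Q1. The sketch below is an illustration only: it approximates the derivatives by central finite differences and evaluates the left-hand side of (402) for the proposed solution. The coefficients are called `c1` and `c2` in the code (they play the role of λ1 and λ2 in the exercise), and their values are arbitrary choices.

```python
import math

SQRT13 = math.sqrt(13.0)
RATE_PLUS = (-1.0 + SQRT13) / 6.0    # exponent rate of the first term
RATE_MINUS = (-1.0 - SQRT13) / 6.0   # exponent rate of the second term


def f(x, c1, c2):
    """Proposed solution c1 * exp(RATE_PLUS x) + c2 * exp(RATE_MINUS x)."""
    return c1 * math.exp(RATE_PLUS * x) + c2 * math.exp(RATE_MINUS * x)


def residual(x, c1, c2, h=1e-4):
    """Left-hand side 3 f'' + f' - f of the ODE, via central differences."""
    first = (f(x + h, c1, c2) - f(x - h, c1, c2)) / (2 * h)
    second = (f(x + h, c1, c2) - 2 * f(x, c1, c2) + f(x - h, c1, c2)) / h**2
    return 3 * second + first - f(x, c1, c2)


# Arbitrary (hypothetical) coefficients and sample points.
for x in (0.0, 0.5, 1.0):
    print(x, residual(x, 2.0, -3.0))
```

Each printed residual should be zero up to finite-difference and rounding error, whatever coefficients are chosen, which is the superposition property in action.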
As the last exercise illustrates, it is relatively easy to verify that a solution satisfies
some differential equation once you have it – but in general, you will have to derive
the solution yourself. To do this, we take a brief detour into the land of matrices and
linear algebra to obtain a solution algorithm for second order linear ODEs.
19.2 Theory of Linear ODEs
In general, the corresponding demonstration can be performed for n-th order linear
ODEs with constant coefficients. However, for simplicity, we shall stick to the case
n = 2 – second order ODEs.
If we are given a second order linear differential equation
$$a\frac{d^2 f}{dx^2} + b\frac{df}{dx} + cf = 0, \tag{403}$$
you may have noticed (recalling tutorial 16 and 17) that we can re-write this
as:
$$Df = 0, \tag{404}$$
where
$$D := a\frac{d^2}{dx^2} + b\frac{d}{dx} + c \tag{405}$$
is a linear (differential) operator. However, from linear algebra (or tutorial 8), you
may recall that linear operators and matrices are in one-to-one correspondence –
at least for finite dimensional vector spaces. This immediately suggests a
relationship between linear algebra and differential equations. Now consider the following
trick – very similar to the one we used to solve our second order projectile motion
differential equation in tutorial 16:
$$\begin{aligned}
f_1 &:= f, \\
f_2 &:= \frac{df}{dx},
\end{aligned} \tag{406}$$
where f is the dependent variable in our second order ODE (403). If we
differentiate the above system of equations for f1 and f2, we get:
$$\begin{aligned}
\frac{d}{dx} f_1 &= \frac{df}{dx} = f_2, \\
\frac{d}{dx} f_2 &= \frac{d^2 f}{dx^2} = -\frac{b}{a}\frac{df}{dx} - \frac{c}{a} f = -\frac{c}{a} f_1 - \frac{b}{a} f_2,
\end{aligned} \tag{407}$$
where we have simply used the definitions of f1,2 and expressed f'' in terms of f' and f by
re-arranging our ODE (403). This almost looks like a system of linear equations –
indeed it is! If we define the 2-by-2 matrix M
$$M = \begin{pmatrix} 0 & 1 \\ -\frac{c}{a} & -\frac{b}{a} \end{pmatrix} \tag{408}$$
we see that the array (407) can be written as a matrix equation:
$$\frac{d}{dx} F = MF \tag{409}$$
where $F = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}$ is a column vector containing the functions f1 and f2. Note that
the differential operator $\frac{d}{dx}$ simply acts on a general matrix by acting on each of its
components – so for example,
$$\frac{d}{dx} F := \begin{pmatrix} \frac{df_1}{dx} \\ \frac{df_2}{dx} \end{pmatrix}. \tag{410}$$
Therefore, we have reduced our linear second order differential equation (403) to a
system of coupled first-order differential equations (407). Intuitively, the form
suggests ‘separation of variables’ – something like:
$$\frac{dF}{F} = M\,dx, \tag{411}$$
then integrating both sides. Strictly speaking, this is not ‘formally’ correct
(although it can be formalized) – nonetheless, it gives the solution (provided a ≠ 0):
$$F(x) = e^{Mx}\,C. \tag{412}$$
Here $C = [c_1\ c_2]^T$ is a 2-dimensional column vector of constants (the transpose of
the row vector $[c_1\ c_2]$); setting x = 0 shows that C = F(0). The quantity $e^{Mx}$ is the ‘matrix
exponential’ of the matrix Mx (M multiplied by x) – it is well-defined provided
M has a finite operator norm. For now, you don’t have to worry about what this
means – we encountered this object earlier in tutorial 8, where we used Taylor
series to express the exponential of a matrix. Here it suffices to note that we can
compute the exponential of a matrix easily by diagonalizing it – that is, by finding
its eigenvalues (spectrum) and its eigenvectors.
If M is of a ‘nice form’, it will have two linearly independent eigenvectors
corresponding to two (possibly distinct) eigenvalues. In the case where there is only
one linearly independent eigenvector, we have to put M into its ‘Jordan normal form’, which
is not too difficult, but well beyond the technicality we want to delve into 141.
Nonetheless, assuming we can diagonalize M, then there exists some matrix U
consisting of the eigenvectors of M as its columns, such that:
$$M = U \Lambda U^{-1}, \tag{413}$$
where $U^{-1}$ denotes the inverse of U (in the special case where U is unitary, this is just
the conjugate transpose $U^\dagger$; if U is also real-valued, it is just the transpose). The matrix Λ is the 2-by-2 diagonal matrix:
$$\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \tag{414}$$
consisting of the eigenvalues λ1,2 of M. Hence, the matrix exponential $e^{Mx}$ is
given by:
$$e^{Mx} = U \begin{pmatrix} e^{\lambda_1 x} & 0 \\ 0 & e^{\lambda_2 x} \end{pmatrix} U^{-1}. \tag{415}$$
In the case the eigenvalues are distinct (non-equal), one will find that the general
solution to our second order linear differential equation (403), is given by:
$$f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}, \tag{416}$$
where λj are the eigenvalues of the matrix M and cj are constants determined
by ‘boundary values’ (f and its derivative at x = 0) or ‘initial conditions’ (if the
independent variable x represents ‘time’). Note, we can consider the functions $e^{\lambda_1 x}$
and $e^{\lambda_2 x}$ to be ‘basis vectors’ for the 2-dimensional solution space of our ODE –
hence any general vector in that space (i.e. any solution) is necessarily some linear
combination of $e^{\lambda_1 x}$ and $e^{\lambda_2 x}$!
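To see that the matrix-exponential picture and the sum-of-exponentials formula agree, one can compare them numerically. The following is a rough sketch under stated assumptions: the matrix exponential is computed by a truncated Taylor series (the approach recalled from tutorial 8), the eigenvalues are assumed real and distinct, and all function names as well as the sample coefficients a = 3, b = 1, c = -1 are illustrative choices rather than part of the notes.

```python
import math


def mat_mul(A, B):
    """Product of two 2-by-2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(A, terms=40):
    """Truncated Taylor series: I + A + A^2/2! + ... (2-by-2 only)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_mul(term, A)
        term = [[entry / n for entry in row] for row in term]  # A^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def solve_via_matrix_exponential(a, b, c, f0, df0, x):
    """First component of exp(M x) applied to the column vector (f0, df0)."""
    M = [[0.0, 1.0], [-c / a, -b / a]]
    E = mat_exp([[entry * x for entry in row] for row in M])
    return E[0][0] * f0 + E[0][1] * df0


def solve_via_eigenvalues(a, b, c, f0, df0, x):
    """c1 exp(lam1 x) + c2 exp(lam2 x); assumes b^2 - 4ac > 0."""
    disc = b * b - 4 * a * c
    lam1 = (-b + math.sqrt(disc)) / (2 * a)
    lam2 = (-b - math.sqrt(disc)) / (2 * a)
    # Match f(0) = f0 and f'(0) = df0.
    c2 = (df0 - lam1 * f0) / (lam2 - lam1)
    c1 = f0 - c2
    return c1 * math.exp(lam1 * x) + c2 * math.exp(lam2 * x)


value_matrix = solve_via_matrix_exponential(3.0, 1.0, -1.0, 1.0, 0.0, 2.0)
value_eigen = solve_via_eigenvalues(3.0, 1.0, -1.0, 1.0, 0.0, 2.0)
print(value_matrix, value_eigen)
```

The two printed numbers should coincide to many decimal places. A truncated Taylor series is adequate here only because the matrix entries are small; for large arguments a more careful algorithm would be needed.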
141
Alternatively, one can make use of the ‘Putzer algorithm’ to compute the matrix exponential. In
some cases, it is easy to compute directly via Taylor series and brute force – with some intuition.
Exercise 64 (Spectrum of a Linear Second Order Differential Operator) Show
that the eigenvalues of the matrix
$$M = \begin{pmatrix} 0 & 1 \\ -\frac{c}{a} & -\frac{b}{a} \end{pmatrix} \tag{417}$$
which arose in our construction of a general solution to the second order ODE
(403), are given by the quadratic equation:
$$a\lambda^2 + b\lambda + c = 0. \tag{418}$$
Such an equation is called the characteristic equation or auxiliary equation for
the second order ODE (403).
Hint: Recall that the spectrum of a square n-by-n matrix M is obtained by solving
the eigenvalue equation:
MF = λF ⇐⇒ (M − λI)F = 0, (419)
where I is the n-by-n identity matrix. Such a system has solutions precisely when
the determinant of (M − λI) vanishes:
det(M − λI) = 0. (420)
This gives an n-th order polynomial in λ. For you, n = 2.
19.3 Explicit Algorithm and Illustrations
For some of you, the above derivation of a general solution to second order linear
homogeneous ODEs for the case of ‘distinct eigenvalues’, may seem a bit abstract.
To illustrate an easy-to-remember ‘algorithm’ for solving any such ODE, we will
explore a physical example of a second order ODE – the damped harmonic
oscillator. Such an example is fundamental to physics, engineering and many problems
in mathematical modelling, since a variety of physical processes are governed by
identical mathematics.
Recall that given an arbitrary linear homogeneous second order ODE
$$a\frac{d^2 f}{dx^2} + b\frac{df}{dx} + cf = 0, \tag{421}$$
its corresponding characteristic equation (eigenvalue equation) is given by:
$$a\lambda^2 + b\lambda + c = 0. \tag{422}$$
Solutions to the characteristic equation are given by the quadratic formula:
$$\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{423}$$
As such, solutions depend fundamentally on the sign of the discriminant:
$$\Delta := b^2 - 4ac. \tag{424}$$
This gives us the following cases:
• Case 1a: Real and Distinct If the discriminant is positive:
$$\Delta = b^2 - 4ac > 0, \tag{425}$$
then there are two real distinct eigenvalues, λ1,2 and the general solution to
our ODE (421) is given by:
$$f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}, \tag{426}$$
where cj are constants to be determined by initial / boundary values.
• Case 1b: Complex Conjugate Pairs If our discriminant is negative:
$$\Delta = b^2 - 4ac < 0, \tag{427}$$
then we will have two complex roots, λ1 and $\lambda_2 = \bar{\lambda}_1$, which are complex
conjugates of each other:
$$\lambda_1 = -\frac{b}{2a} + i\frac{\sqrt{|b^2 - 4ac|}}{2a}, \qquad \lambda_2 = -\frac{b}{2a} - i\frac{\sqrt{|b^2 - 4ac|}}{2a}. \tag{428}$$
Note that whenever complex eigenvalues appear for a linear ODE of any order
(with constant real coefficients), they must always appear in complex
conjugate pairs. This is a consequence of an elementary theorem on polynomials
with real coefficients. In this case, the general solution is still given by:
$$f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}, \tag{429}$$
except that cj may now be complex coefficients. Using some algebraic
manipulations (Euler’s formula), we can express the general solution in a
standard trigonometric form with real coefficients k1,2:
$$f(x) = e^{-\frac{b}{2a}x}\left(k_1\cos(\omega x) + k_2\sin(\omega x)\right) \tag{430}$$
where $\omega := \frac{\sqrt{|b^2 - 4ac|}}{2a}$.
• Case 2: Real Repeated Roots When the discriminant vanishes:
$$\Delta = b^2 - 4ac = 0, \tag{431}$$
we get a real repeated root: $\lambda_1 = \lambda_2 = -\frac{b}{2a}$. In this case, we may be
tempted to write the general solution as:
$$f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} = (c_1 + c_2)e^{\lambda_1 x}, \tag{432}$$
but quickly realize that we only have one independent function: $e^{\lambda_1 x}$. Since we
have a second order linear differential equation, we know that its solution
space must be a two-dimensional vector space – thus, in order to create a
basis for it, we need another function which is linearly independent from
$e^{\lambda_1 x}$, but also a solution of our ODE. Such a function is given by multiplying
$e^{\lambda_1 x}$ by x, hence we find that the set of functions:
$$\{e^{\lambda_1 x},\ xe^{\lambda_1 x}\}, \tag{433}$$
spans the solution space142. In particular, the general solution to our ODE is
given by:
$$f(x) = c_1 e^{\lambda_1 x} + c_2 x e^{\lambda_1 x}, \tag{434}$$
where cj are constants determined by initial/boundary values.
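The three cases above translate directly into a small case-by-case ‘algorithm’. The sketch below is one possible way to write it down, not a definitive implementation: the function name `solve_second_order` is hypothetical, the constants are fixed from assumed initial values f(0) = f0 and f'(0) = df0, and the test `disc == 0` relies on exact floating-point equality, which is only reliable for simple coefficients like those in the usage lines.

```python
import math


def solve_second_order(a, b, c, f0, df0):
    """Return a function f(x) solving a f'' + b f' + c f = 0
    with f(0) = f0 and f'(0) = df0 (real coefficients, a != 0)."""
    disc = b * b - 4 * a * c
    if disc > 0:
        # Case 1a: two real distinct roots.
        lam1 = (-b + math.sqrt(disc)) / (2 * a)
        lam2 = (-b - math.sqrt(disc)) / (2 * a)
        c2 = (df0 - lam1 * f0) / (lam2 - lam1)
        c1 = f0 - c2
        return lambda x: c1 * math.exp(lam1 * x) + c2 * math.exp(lam2 * x)
    if disc < 0:
        # Case 1b: complex conjugate roots, trigonometric form.
        alpha = -b / (2 * a)
        omega = math.sqrt(-disc) / (2 * a)
        k1 = f0
        k2 = (df0 - alpha * f0) / omega
        return lambda x: math.exp(alpha * x) * (
            k1 * math.cos(omega * x) + k2 * math.sin(omega * x))
    # Case 2: real repeated root, basis {exp(lam x), x exp(lam x)}.
    lam = -b / (2 * a)
    c1 = f0
    c2 = df0 - lam * f0
    return lambda x: (c1 + c2 * x) * math.exp(lam * x)


# Usage with simple, hand-checkable equations:
sine_like = solve_second_order(1.0, 0.0, 1.0, 0.0, 1.0)    # f'' + f = 0
cosh_like = solve_second_order(1.0, 0.0, -1.0, 1.0, 0.0)   # f'' - f = 0
repeated = solve_second_order(1.0, -2.0, 1.0, 0.0, 1.0)    # f'' - 2f' + f = 0
print(sine_like(math.pi / 2), cosh_like(1.0), repeated(1.0))
```

With these initial values the three solutions are sin(x), cosh(x) and x e^x respectively, so the printed values can be compared against a calculator.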
Exercise 65 (Elementary, my dear Watson) In between legendary crime solving
stints, Sherlock Holmes gets bored. So instead of solving cases of crime, he solves
different cases of second order linear differential equations.
Q1: Using Euler’s formula
$$e^{i\omega x} = \cos(\omega x) + i\sin(\omega x), \tag{435}$$
and the identity $e^{(-\frac{b}{2a} \pm i\omega)x} = e^{-\frac{b}{2a}x}\,e^{\pm i\omega x}$, help Sherlock prove that in Case 1b of
complex conjugate roots, we can re-write the solution (429) in the trigonometric
form (430).
Hint: This means starting from (429) and using Euler’s formula to get it into the form:
$$\text{Stuff}_1 \times e^{-\frac{b}{2a}x}\cos(\omega x) + \text{Stuff}_2 \times e^{-\frac{b}{2a}x}\sin(\omega x), \tag{436}$$
where Stuff is some combination of c1 and c2. You then identify Stuff1 and
Stuff2 as the constants k1 and k2. If you have done your algebra correctly, the
constants kj are necessarily real (given a, b, c are real coefficients).
142
You may have realized by now that the solution space to the second order ODE (421) is simply the
kernel of the differential operator $D = a\frac{d^2}{dx^2} + b\frac{d}{dx} + c$.
Q2: If $e^{\lambda_1 x}$ is a solution to our ODE (421) in Case 2 of real repeated roots, prove
that $xe^{\lambda_1 x}$ is indeed also a solution of (421). To do this, you need to substitute
$xe^{\lambda_1 x}$ into the left-hand side of (421) and show that it vanishes.
We are now ready to consider a physical example of a second order linear
differential equation. Imagine a spring placed on a table top, with a mass m at the end
of the spring and some wall placed at its other end. If it is stretched in a straight
line, by some initial displacement $x_0$ from its equilibrium position x = 0, it will
undergo oscillatory (harmonic) motion. However, due to the friction of the table
top (and to a lesser extent – the air), this motion will be ‘damped’. The force on
the mass due to the spring, at displacement x(t) from its equilibrium
position x = 0 at time t, is given by Hooke’s law:
$$F_{spring} = -kx, \tag{437}$$
where k is the ‘spring constant’ (related to the elasticity of the spring). The force of
friction can be modelled as a ‘linear drag’ at low velocities – meaning it is directly
proportional to the velocity $v(t) = \frac{dx}{dt}$ of the mass at the end of the spring:
$$F_{damp} = -b\frac{dx}{dt}, \tag{438}$$
where b is the ‘damping constant’ (related to the friction, air-resistance and internal
energy loss in the spring). Newton’s second law of motion tells us that the net sum
of forces acting on the mass m at the end of the spring is equal to the mass m times
its acceleration: $a = \frac{d^2x}{dt^2}$. Thus, we get:
$$\begin{aligned}
F_{total} &= F_{spring} + F_{damp} \\
\implies m\frac{d^2x}{dt^2} &= -b\frac{dx}{dt} - kx,
\end{aligned} \tag{439}$$
hence arriving at the following linear second order ODE with real coefficients:
$$m\frac{d^2x}{dt^2} + b\frac{dx}{dt} + kx = 0. \tag{440}$$
This is a differential equation in the displacement x of the spring as a function of
the time t.
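Before solving (440) analytically, it can be instructive to integrate it numerically. The sketch below is an illustration under stated assumptions: it rewrites (440) as the first-order system x' = v, v' = -(b/m)v - (k/m)x (the same trick as in (406)–(407)), steps it forward with the classical fourth-order Runge-Kutta method, and compares the result with the trigonometric form (430) for an under-damped case. The function names and the parameter values (m = 1, b = 0.5, k = 4) are hypothetical.

```python
import math


def simulate_dho(m, b, k, x0, v0, t_end, steps=10000):
    """Integrate m x'' + b x' + k x = 0 with classical RK4.

    Returns the pair (x, v) at time t_end.
    """
    def deriv(x, v):
        # First-order system: x' = v, v' = -(b/m) v - (k/m) x
        return v, -(b / m) * v - (k / m) * x

    dt = t_end / steps
    x, v = x0, v0
    for _ in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = deriv(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = deriv(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x, v


def underdamped_exact(m, b, k, x0, v0, t):
    """Trigonometric-form solution; assumes b^2 - 4 m k < 0."""
    alpha = -b / (2 * m)
    omega = math.sqrt(4 * m * k - b * b) / (2 * m)
    coeff_sin = (v0 - alpha * x0) / omega  # fixed by the initial velocity
    return math.exp(alpha * t) * (
        x0 * math.cos(omega * t) + coeff_sin * math.sin(omega * t))


x_numeric, _ = simulate_dho(1.0, 0.5, 4.0, 1.0, 0.0, 5.0)
x_exact = underdamped_exact(1.0, 0.5, 4.0, 1.0, 0.0, 5.0)
print(x_numeric, x_exact)
```

The two values should agree closely. Changing b so that b² - 4mk becomes positive makes `underdamped_exact` inapplicable, while the numerical integrator still works, which is one practical reason to keep both tools at hand.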
Exercise 66 (A Divine Comedian) Rather than braving the dark forest and nine
circles of hell by himself, Dante called upon the assistance of the ancient Roman
poet – Publius Vergilius to see him through. In one of the lost cantos, Dante finds
himself in a 10th circle of hell, where he has to solve the damped harmonic oscilla-
tor differential equation – however, his guide Virgil, not being educated in 18-19th
century mathematics, is unable to help him.
To help Dante reach his beloved Beatrice, you will need to solve the damped har-
monic oscillator problem – which I will walk you through.
Step 1: Given the DHO equation (440), write down the corresponding
characteristic equation (eigenvalue equation). Solve this equation using the quadratic
formula and write an expression for the discriminant.
Step 2: Your discriminant should take the form $\Delta = b^2 - 4mk$. Depending on the
mass m, damping constant b and spring constant k, the discriminant may be
positive, negative or zero – leading to vastly different motions of Dante’s spring. Define
the angular frequency of the motion to be $\omega = \frac{\sqrt{|b^2 - 4mk|}}{2m}$ (noting the absolute
value to make it real). Now, write down the general solution for the displacement
x(t) of the spring at time t, for each of the three different cases considered earlier
(positive, negative and zero discriminant).
Step 3: The constants cj or kj that you get in your solutions will be determined by
the ‘initial conditions’ of Dante’s spring system. In particular, if we define $x_0 := x(0)$
to be the initial displacement and $\dot{x}_0 := \dot{x}(0)$ to be the initial velocity (dots
indicating derivative with respect to time t), write your undetermined constants cj
and kj in terms of $x_0$ and $\dot{x}_0$.
Step 4 – Characterising the system: The potential energy stored in Dante’s
oscillator at displacement x, is given by:
$$U = \frac{1}{2}kx^2. \tag{441}$$
When b = 0, there is no damping and the total energy is conserved – in particular, it is
converted between kinetic energy $\frac{1}{2}m\left(\frac{dx}{dt}\right)^2$ and potential energy. In that case, the
total energy is given by its initial kinetic energy and potential energy:
$$E_{total} = \frac{1}{2}kx_0^2 + \frac{1}{2}m(\dot{x}_0)^2. \tag{442}$$
The effect of damping is that the total energy decreases over time – some of it is lost
due to friction, converting mechanical energy into thermodynamic energy (heat).
Q1: In the case of negative discriminant, you get oscillatory motion (with
exponential damping). It has angular frequency $\omega = \frac{\sqrt{|b^2 - 4mk|}}{2m}$. Given this, work out
the period of one oscillation.
Q2: The damped harmonic oscillator is a resonating system. As such, we can
define a ‘quality factor’ which characterises ‘how good a resonator’ it is. An ideal
oscillator would lose no energy per cycle of oscillation. We define the quality
factor, in the case of negative discriminant, as:
$$Q = 2\pi\,\frac{\text{Energy Stored}}{\text{Energy Lost Per Cycle}}. \tag{443}$$
Defining the ‘damping ratio’ $\xi = \frac{b}{2\sqrt{mk}}$, find an expression for the quality factor
Q of Dante’s spring in terms of ξ.
Step 5 [SAVE DANTE]: Having determined the solution to a damped harmonic
oscillator, Dante decides to call upon the help of Odysseus (In Latin, Ulysses)
and Aeneas to help him design a ‘spring catapult’. This will eject Dante from the
10th circle into Paradise, where he can be reunited with his lost love, Beatrice.
Assuming Dante has mass m = 70 kg and that Paradise is 1000 km in the vertical
direction from hell’s 10th circle, work out some possible set of values for the spring
constant k, damping coefficient b, initial displacement $x_0$ and initial
velocity $\dot{x}_0$ which will allow Dante to reach Paradise.
To do this, assume that Dante is ejected from the spring catapult when it reaches
its maximum displacement (limited by the stretchiness of the spring). At this
instant, Dante will leave the spring with an upward force of $m \frac{d^2x}{dt^2}$ and some velocity
$\frac{dx}{dt}$. Assuming that the force of gravity acts downward on him once he leaves the
spring, F = −mg (with g = 10 m/s²), Dante will need sufficient exit velocity to reach
Paradise.
Hint: Oscillatory motion (negative discriminant) occurs when the motion is under-
damped. In this case the spring oscillates about its equilibrium position, with an
exponential envelope damping the motion so after sufficient (infinite) time the os-
cillations cease – the spring then remains at equilibrium. Over-damped motion
occurs when the discriminant is positive and will typically lead to the spring re-
turning to equilibrium without oscillation.
Hint: This problem is not really well-defined and so might not be (easily) solvable.
So you can re-interpret it to give something sensible and solvable.
The last exercise should illustrate that even for a system as deceptively ‘simple’ as
the damped harmonic oscillator, there are a lot of fine details regarding the
time-evolution and behaviour of the system. Such behaviour depends intricately on the
constants that appear in the second order ODE governing its motion.
To finish, we shall look at a more explicit demonstration of second order ODEs:
the LRC circuit. Recall from tutorial 16 the examples of an RC and an LR circuit.
These circuits involved a resistive element R, a capacitor (charge storage device) C
and an inductor (coil of wire) L. What happens when you shove all three elements
together in a series circuit?
To answer this question, you may use Kirchhoff’s Laws (conservation of energy
and charge) along with the defining expressions for inductance (Faraday’s law),
resistance (Ohm’s law) and capacitance to obtain a dynamical equation for the
time evolution of the LRC circuit.
Exercise 67 The current I(t) (time rate of flow of charge: $I = \frac{dq}{dt}$) flowing through
any element of a series LRC circuit at time t, is governed by the following second
order linear ODE:
$$\left[ \frac{d^2}{dt^2} + \frac{R}{L} \frac{d}{dt} + \frac{1}{LC} \right] I(t) = 0, \tag{444}$$
where R is the resistance, L is the inductance and C is the capacitance.
Q0: If you have studied circuits, derive the above differential equation.
Q1: The LRC circuit is simply a ‘re-hash’ of the damped harmonic oscillator.
Looking at the coefficient of $\frac{dI}{dt}$ we see that the equivalent ‘damping constant’ is
given by $\frac{R}{L}$, correctly suggesting that the resistor acts to ‘oppose’ the flow of
charge (current). Furthermore, if we multiply through by L, the coefficient $\frac{1}{C}$ of I(t)
suggests treating the capacitor as some ‘spring constant’ type term. This makes
sense in that a capacitor ‘stores energy’ (charge) in the manner that a spring stores
potential energy.
Solve the differential equation for the three different cases, depending on the sign
of the discriminant:
$$\Delta = \left(\frac{R}{L}\right)^2 - \frac{4}{LC}. \tag{445}$$
Now pick some values of $I(0)$, $\dot{I}(0)$, R, L and C and determine a unique solution;
this should correspond to only one case. Graph the current I(t) as a function of
time t for the solution you get.
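As a rough illustration of Q1, the sketch below classifies a series LRC circuit by the sign of the discriminant (445) and builds I(t) from chosen initial data. The function name `lrc_current` and the sample values (R = 2, L = 1, C = 0.1 in SI units, I(0) = 1, İ(0) = 0) are assumptions made purely for illustration; they do not come from the tutorial. Rather than drawing a graph, it prints a few samples of I(t) that can be passed to any plotting tool.

```python
import math

def lrc_current(R, L, C, I0, dI0):
    """Return (case, I) where I(t) solves I'' + (R/L) I' + I/(L C) = 0
    with initial data I(0) = I0 and I'(0) = dI0."""
    alpha = R / (2.0 * L)                     # half the damping coefficient R/L
    disc = (R / L) ** 2 - 4.0 / (L * C)       # discriminant, eq. (445)

    if disc > 0:                              # two distinct real roots
        s = math.sqrt(disc) / 2.0
        r1, r2 = -alpha + s, -alpha - s
        c1 = (dI0 - r2 * I0) / (r1 - r2)
        c2 = I0 - c1
        return "over-damped", lambda t: c1 * math.exp(r1 * t) + c2 * math.exp(r2 * t)

    if disc == 0:                             # repeated root (exact equality is fragile with floats)
        c1 = I0
        c2 = dI0 + alpha * I0
        return "critically damped", lambda t: (c1 + c2 * t) * math.exp(-alpha * t)

    omega = math.sqrt(-disc) / 2.0            # oscillation frequency of the damped circuit
    k1 = I0
    k2 = (dI0 + alpha * I0) / omega
    return "under-damped", lambda t: math.exp(-alpha * t) * (
        k1 * math.cos(omega * t) + k2 * math.sin(omega * t)
    )

# Illustrative (assumed) values only.
case, current = lrc_current(R=2.0, L=1.0, C=0.1, I0=1.0, dI0=0.0)
print(case)
for n in range(6):
    t = 0.5 * n
    print(f"t = {t:.1f} s, I = {current(t):+.4f} A")
```

Changing R while keeping L and C fixed is a quick way to see all three cases: with L = C = 1, the values R = 4, R = 2 and R = 1 give positive, zero and negative discriminant respectively.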
Q2: Define $\omega_0 = \frac{1}{\sqrt{LC}}$ to be the ‘natural frequency’ of our LRC system. This is
the frequency our system would oscillate at without ‘damping’ (without resistance),
i.e. as a perfect resonator. Now define the ‘neper frequency’ α to be:
$$\alpha = \frac{R}{2L}. \tag{446}$$
Re-write the LRC differential equation in terms of ω0 and α.
Q3: The ‘damping ratio’ ξ for an LRC circuit characterises the ‘energy loss’ with
respect to the resonating properties of the circuit. It is defined by:
$$\xi = \frac{R}{2}\sqrt{\frac{C}{L}}. \tag{447}$$
Re-write this in terms of the natural frequency ω0 and the neper frequency α.
20 Tutorial 18: Calculus of Vectors and Differential Forms I
In this tutorial, we will review the concept of (smooth) ‘vector-valued functions’
and the differentiable landscapes in which they arise. To understand various geo-
metric structures and physical phenomena from a contemporary perspective, one
will find that it is necessary to call upon the calculus of vector-valued functions.
For now, we will restrict ourselves to the ‘differentiation’ side of things, in particular
investigating differential operators such as the ‘curl’, ‘gradient’, ‘divergence’ and
‘Laplacian’.
The vector calculus used by engineers, chemists and elementary physicists today
is largely due to developments in 19th century mathematics and physics, applied
and popularized largely by the work of James C. Maxwell, J. W. Gibbs and Oliver
Heaviside. Nonetheless, one can provide an efficient abstraction and generalization
of this ‘calculus of vectors’ to the calculus of ‘differential forms’, defined on the
‘exterior algebra’ of a vector space. Collectively, this is called ‘exterior calculus’,
or ‘Cartan calculus’ due to the work of the great geometer, Élie Cartan. Such a
framework is the natural framework to study modern differential geometry and, by
association, mathematical hydrodynamics (fluid mechanics), advanced mechanics,
relativistic electromagnetism and general relativity.
20.1 Vector Valued Functions
Recall that a real-valued function f of n variables takes a point x = (x1, x2, ..., xn)
in Rn and maps it to some real number f(x) = f(x1, ..., xn) ∈ R. In ‘physical’
terminology, such an object is a ‘scalar-valued function’. In modelling natural
phenomena, one quickly finds that scalar quantities (such as temperature, speed,
distance) are insufficient to describe nature. In particular, many physical quantities
are ‘vector-valued’ – for example, displacement, velocity, force, electromagnetic
fields and fluid flow. This immediately demands a formal notion of a ‘vector-
valued’ function.
Definition 11 A vector valued function F on $\mathbb{R}^n$ takes a point p = (x1, ..., xn) ∈ $\mathbb{R}^n$
and maps it to a vector F(p) ∈ $\mathbb{R}^n$. Mathematically (see footnote 143),
$$F : \mathbb{R}^n \to \mathbb{R}^n, \quad p \mapsto F(p). \tag{448}$$
We can represent a vector-valued function several ways. First, given some basis
{ej} for Rn, we can represent a vector-valued function simply by its component
functions Fj:
F(x) = (F1(x), F2(x), ..., Fn(x)) = (F1(x1, ..., xn), ..., Fn(x1, ..., xn)). (449)
Note that each component function, Fj(x1, ..., xn) is a function of n variables.
More explicitly, we can represent a vector-valued function in a geometrically
invariant form:
$$F(x) = F^1(x)\, e_1 + F^2(x)\, e_2 + ... + F^n(x)\, e_n, \tag{450}$$
expanding it in terms of the standard basis vectors ej multiplied by the component
functions Fj(x). [Recall the convention of ‘raising’ the indices of the component
functions and lowering those of the standard basis vectors – the Einstein summation
convention explained in tutorial 7].
Example 16 In physics, ‘force’ is formally a vector quantity. For example, given a
particle of mass m, the force it experiences depends on the nature of its trajectory
through space, via Newton’s second law:
F = ma. (451)
If we let x = (x1, ..., xn) be the displacement vector of the particle, then Newton’s
second law tells us that the force F is a vector-valued function of the displacement
x as follows:
$$F(x) = m \frac{d^2 x}{dt^2} = m(\ddot{x}_1, \ddot{x}_2, ..., \ddot{x}_n). \tag{452}$$
Footnote 143: Strictly speaking, F maps it to the ‘tangent space’ $T_p(\mathbb{R}^n)$ at p (the space of all tangent vectors
at p). However, due to affine parallelism, this is canonically isomorphic to the vector space $\mathbb{R}^n$, so
we can ignore the distinction.
Problem 30 (The Garden of Forking Paths) Caught in a surreal twist of Jorge
Luis Borges’ ‘garden of forking paths’, a student of the mathematical sciences
finds themselves in the middle of a garden. To their left and their right lie two
opposing paths, forking off into darkness. The student is forewarned that upon
selecting one path, the other path will seal itself for eternity. The two paths
ultimately lead to two very different futures (and pasts).
Thinking that they can cheat nature, the student begins down one path – then
quickly reverses towards the opposing path. Unbeknown to the student, is the pres-
ence of a sentient surveillance drone – monitoring the choices of the student. In
this instance, the paths shift and transform – leading the student into the delusion
that they have changed paths without detection.
From the point of view of the drone, which maintains constant altitude, the Garden
can be represented as a 3-dimensional vector space with the center of the garden as
the origin. The drone orbits the garden in a circle of constant radius r, described by
the radius vector r = (x, y, 0) – with the Cartesian coordinates x = x(t), y = y(t)
being functions of time. The altitude z = 0 is the starting (constant) altitude of the
drone.
Q1: Switching to cylindrical polar coordinates, x = r cos(θ), y = r sin(θ) and
z = z, one has the inverse transformations: $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan\left(\frac{y}{x}\right)$.
Since x, y were functions of t it follows that θ is a function of t. Since r is constant
(for circular motion), we have $\frac{dr}{dt} = 0$.
Compute the velocity of the surveillance drone. In other words, compute the vector-
valued function:
$$\mathbf{v} = \frac{d\mathbf{r}}{dt}, \tag{453}$$
and simplify the expression in terms of x, y and $\omega = \dot{\theta} = \frac{d\theta}{dt}$. Hint: Use the chain
rule.
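Once you have a candidate answer for Q1, a numerical sanity check can be reassuring. The sketch below assumes uniform circular motion, θ(t) = ωt, with arbitrary sample values r = 2 and ω = 0.5 (these are illustrative assumptions, not part of the problem), and compares a finite-difference estimate of dr/dt with the candidate formula v = ω(−y, x, 0) that the chain rule suggests.

```python
import math

def position(t, r=2.0, omega=0.5):
    """Drone position on a circle of radius r, assuming theta(t) = omega * t."""
    theta = omega * t
    return (r * math.cos(theta), r * math.sin(theta), 0.0)

def velocity_numeric(t, r=2.0, omega=0.5, h=1e-6):
    """Central-difference approximation to the derivative of the position vector."""
    p_plus = position(t + h, r, omega)
    p_minus = position(t - h, r, omega)
    return tuple((a - b) / (2.0 * h) for a, b in zip(p_plus, p_minus))

def velocity_formula(t, r=2.0, omega=0.5):
    """Candidate chain-rule result: v = omega * (-y, x, 0)."""
    x, y, _ = position(t, r, omega)
    return (-omega * y, omega * x, 0.0)

t = 1.3
print(velocity_numeric(t))
print(velocity_formula(t))
```

The two printed vectors should agree to several decimal places; a mismatch would point to a sign or chain-rule slip in the hand calculation.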
Q2: Given circular motion in some plane, the angular momentum of that motion
will lie in a direction orthogonal (perpendicular) to that plane. In particular, the
angular momentum for the drone’s motion is defined as the vector-valued function:
L = r × P, (454)
where $\mathbf{P} = m\dot{\mathbf{r}}$ is the linear momentum of the drone. The mass m is constant.
Compute the angular momentum of the drone. Simplify your expression so that
your final result is strictly in terms of m, ω and r, or m, v and r.
Hint: v = rω and $x^2 + y^2 = r^2$.
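The cross product in (454) can also be evaluated numerically as a check on Q2. The sketch below again assumes uniform circular motion θ(t) = ωt with arbitrary sample values m = 3, r = 2, ω = 0.5 (all assumptions for illustration) and prints L = r × P at a few times.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def angular_momentum(t, m=3.0, r=2.0, omega=0.5):
    """L = r x P for a drone on a circle of radius r with theta(t) = omega * t."""
    theta = omega * t
    pos = (r * math.cos(theta), r * math.sin(theta), 0.0)
    vel = (-omega * pos[1], omega * pos[0], 0.0)   # velocity tangent to the circle
    mom = tuple(m * c for c in vel)                # linear momentum P = m v
    return cross(pos, mom)

for t in (0.0, 1.0, 2.5):
    print(angular_momentum(t))
```

Only the z-component is non-zero and it does not change with t, which matches the statement that the angular momentum is orthogonal to the plane of motion.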
Q3: Recall that the ‘more general’ form of Newton’s second Law, which applies
to physical systems beyond classical mechanics, is that given some object with
momentum P, the force it experiences is defined by:
$$\mathbf{F} = \frac{d\mathbf{P}}{dt}. \tag{455}$$
Using this definition of force, show that the ‘Torque’ experienced by the surveil-
lance drone, defined by the vector-valued function:
$$\boldsymbol{\tau} = \frac{d\mathbf{L}}{dt}, \tag{456}$$
simplifies to:
τ = r × F. (457)
Hint: You can use the ‘product rule’ (Leibniz rule) for differentiation with the
cross-product (i.e. scalar derivatives distribute over the cross product). Remember
also that $\frac{d\mathbf{r}}{dt} = \mathbf{v} = \frac{1}{m}\mathbf{P}$ and that the cross-product is anti-symmetric; hence,
u × u = 0 for any vector u.
20.2 Exterior Calculus
By now, in one way or another, most of you will have seen ‘differential forms’ – im-
plicitly, or explicitly. At the very least, you will have come across differential forms
when separating variables to solve first-order ordinary differential equations. You
will have also come across them when integrating. For example, the fundamental
theorem of calculus states that given any differentiable function f, one has:
$$\int df = f + c, \tag{458}$$
where c is a constant of integration. The integrand, df, is an ‘exact differential 1-
form’ and the function f is a ‘0-form’. To get some conceptual intuition for differ-
ential forms, we now look at the concept of a ‘dual basis’ for a vector space.
Definition 12 Given a basis $\{e_j\}$ for an n-dimensional vector space $\mathbb{R}^n$, one can
define a dual basis $\{\theta^j\}$ by its action on the original basis. In particular, one
whose elements obey the relation:
$$\theta^j(e_k) := \delta^j_k, \tag{459}$$
where j and k range across 1, 2, ..., n. Furthermore, these elements $\{\theta^j\}$ form a
basis for a vector space: the dual vector space to $\mathbb{R}^n$, denoted by $(\mathbb{R}^n)^*$.
For our purposes (since they are isomorphic), we can identify the dual vector space
with the original vector space: $(\mathbb{R}^n)^* \cong \mathbb{R}^n$. All linear combinations of the
dual vectors $\theta^j$ are also dual vectors. Furthermore, their action distributes over
addition and scalar multiplication of vectors:
$$\theta^j(a e_k + b e_l) = a\,\theta^j(e_k) + b\,\theta^j(e_l). \tag{460}$$
Note that by convention, the indices on the vector basis are lowered and the indices
on the dual basis are raised. The object $\delta^j_k$ is the ‘Kronecker delta’, defined in
tutorial 7 as:
$$\delta^j_k = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k. \end{cases} \tag{461}$$
Exercise 68 Take the standard basis $\{e_1, e_2, e_3\}$ for $\mathbb{R}^3$ and let $\{\theta^j\}$ (where j =
1, 2, 3) be a basis dual to it. Using the previous definition of a dual basis, compute
the following quantities:
$\theta^1(e_1) =$
$\theta^2(e_1) =$
$\theta^3(e_1) =$
$(a\theta^1 + b\theta^2)(e_2) =$
$(a\theta^1 + b\theta^2)(e_1 + e_2) =$
$(a\theta^1 + b\theta^2 + c\theta^3)(k e_3) =$ . (462)
Check these with your tutors.
Problem 31 (Explicit Representation) Still running along their chosen path, our
student stumbles upon an opening in the garden – which reveals a clear, moon-
lit pool filled by a small natural waterfall. Drinking from the pool to refresh
themselves, the student leaps back in horror to find that their reflection no longer
matches them. At this instance, a Satyr emerges into the clearing and confronts
the student: “The image you see, is your dual self. This is your transformation, an
explicit future selected by the path you chose.”
Q1: As an approximation to the existential crisis faced by our student, we can
model 3-dimensional objects using vectors. To obtain a dual model, we simply
replace these vectors with their dual vectors. Show that if we represent vectors in
$\mathbb{R}^3$ as column vectors, then we can explicitly represent their dual vectors as row
vectors, i.e. their transpose (see footnote 144).
In particular, by letting $e_1 = (1\ 0\ 0)^T$, $e_2 = (0\ 1\ 0)^T$ and $e_3 = (0\ 0\ 1)^T$,
compute the quantities in the previous exercise by using this matrix representation
along with matrix multiplication (row vectors × column vectors).
Remark: Recall that you can write the ‘dot-product’ of two vectors u, v by writing
v as a column vector and multiplying it by u written as a row vector (this is actually
the dual vector of u). Hence u · v = uT v. In this manner, you can view the action
of a dual vector on a vector as the dot-product of two ‘standard’ vectors.
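The row-times-column picture in the remark can be sketched in a few lines. Everything here is illustrative: the helper names `act` and `combine` and the sample coefficients a = 2, b = 5 are assumptions, and vectors are stored as plain tuples rather than as matrices.

```python
# Standard basis of R^3 as column vectors (stored as tuples).
e = {1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
# Dual basis: the transposes, i.e. row vectors with the same entries.
theta = {j: e[j] for j in e}

def act(row, column):
    """Row vector times column vector: the action of a dual vector on a vector."""
    return sum(r * c for r, c in zip(row, column))

def combine(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i], computed entry by entry."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(3))

# theta^j(e_k) reproduces the Kronecker delta, one row of the table per j.
for j in (1, 2, 3):
    print([act(theta[j], e[k]) for k in (1, 2, 3)])

# With the sample values a = 2, b = 5: (a theta^1 + b theta^2)(e_1 + e_2) = a + b.
a, b = 2, 5
dual = combine((a, b), (theta[1], theta[2]))
vec = combine((1, 1), (e[1], e[2]))
print(act(dual, vec))  # 7
```

Because the action is just a sum of products of entries, linearity in both the dual vector and the vector is immediate in this representation.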
As it turns out, for the real space $\mathbb{R}^3$, we can represent the standard unit vectors
e1, e2 and e3 in the x, y and z directions, respectively, as ‘tangent vectors’ or ‘partial
derivative operators’ in those directions:
$$e_1 := \partial_1 = \frac{\partial}{\partial x}, \quad e_2 := \partial_2 = \frac{\partial}{\partial y}, \quad e_3 := \partial_3 = \frac{\partial}{\partial z}. \tag{463}$$
This is a formal correspondence (see footnote 145); however, for now it suffices to view it in the
following intuitive way.
Proof 1 (Sketch of Vector-Operator Correspondence) Consider a particle moving
with constant unit velocity ($\|v\| = 1$) along the x-coordinate axis: its trajectory
γ(t) = (x(t), y(t), z(t)) (parametrised by time t) is a straight line. From Newtonian
mechanics, we know that its velocity vector at any point on the trajectory is
tangent to the trajectory and points in the direction of the motion. Hence at the
point γ(t), the velocity vector of the particle is given in geometric form as:
$$v(\gamma(t)) = \frac{d\gamma(t)}{dt} = \frac{\partial \gamma}{\partial x}\frac{dx}{dt} = 1 \cdot \frac{\partial \gamma(t)}{\partial x}. \tag{464}$$
However, on the left-hand side, we know: $v(\gamma(t)) = 1\, e_1|_{\gamma(t)}$, a unit velocity
vector in the x-direction, at the point γ(t). Hence we can view the vector function
v as a ‘differential operator’ acting on the curve γ(t):
$$v(\gamma(t)) = \frac{\partial}{\partial x}(\gamma(t)) \tag{465}$$
to take its partial derivative with respect to x. Since this is true for any point γ(t)
along the trajectory γ, this allows us to make the identification:
$$v = e_1 = \frac{\partial}{\partial x}. \tag{466}$$
By considering similar motions in the y and z directions, we can make the
identifications $e_2 = \frac{\partial}{\partial y}$ and $e_3 = \frac{\partial}{\partial z}$, completing the correspondence.
Footnote 144: Incidentally, the matrix transpose operation is an explicit realization of the ‘dual map’ or ‘dual
transformation’, turning vectors into their duals and vice versa.
Footnote 145: See your tutor if you want a precise explanation.
At this point, you might be wondering why this abstraction and formality is
necessary. It is necessary to establish the correspondence between partial
derivative operators (tangent vectors) and differential 1-forms as ‘dual vectors’.
For the present, we will restrict ourselves to 3-dimensional vector spaces and
functions of 3 variables. Note however, that the following is easily generalized to
n-dimensional vector spaces, for 1 ≤ n < ∞.
First, recall the definition of the ‘total differential’ or ‘exact differential’ of a func-
tion.
Definition 13 (Exterior Derivative of a function) Given a differentiable function
f = f(x, y, z) of 3 variables x, y, z, its exterior derivative is given by:
$$df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz. \tag{467}$$
This is an exact differential 1-form. The operator d acting on f is called the ‘ex-
terior derivative operator’ – in this case (acting on functions), it simply coincides
with the ‘total differential’.
Previously (tutorial 4-6) we viewed the objects dx, dy, dz as ‘infinitesimal’ length
elements in the x, y, z directions – mentioning that they were ‘vectors in the ab-
stract sense’. What we really meant to say, is that dx, dy, dz are exact differential
1-forms. They arise as the natural ‘dual basis’ for R3.
Example 17 Earlier we showed that we can represent the standard basis for $\mathbb{R}^3$
as ‘partial derivative operators’, that is, $e_j = \partial_j := \frac{\partial}{\partial x^j}$ where $x^1 = x$, $x^2 = y$,
$x^3 = z$. This allows us to identify $\{\theta^1 = dx, \theta^2 = dy, \theta^3 = dz\}$ as the dual
basis, in the following way (definition):
$$\theta^j(e_k) = dx^j(\partial_k) := \frac{\partial x^j}{\partial x^k}. \tag{468}$$
Since the coordinates x1, x2, x3 are all independent, it follows that:
$$\frac{\partial x^j}{\partial x^k} = \delta^j_k, \tag{469}$$
whence the collection {dxj} satisfies the defining property of a dual basis.
Exercise 69 (Voices in the wind) After meeting the Satyr and viewing their future self,
our student is now confronted by a gale, carrying with it voices from their past
life. Amidst this chaotic cacophony, the student hears scattered teachings of
dimensional analysis.
Q1: To banish the gale, the student must work out the relationship between the
dimensions of the basis vectors ej = ∂j and the dual vectors θj = dxj. Do this.
Hint: To compute the dimensional relation required, you can use the definition of
a dual basis, $\theta^j(e_k) = \delta^j_k$, and note that the Kronecker delta is a dimensionless
quantity.
Q2: You are now told that the differential 1-forms (dual vectors) dxj represent
‘infinitesimal length elements’. For this to make sense, it follows that [dxj] = L,
where L is some unit of length. From this and your result in Q1, compute the
dimensions of the tangent vector (partial derivative operator) ej = ∂j.
The name differential ‘1-form’ is suggestive of the fact that there exist, in general,
‘differential k-forms’, where k is some non-negative integer. Such a suggestion is
true, where k has an upper limit of k = n, n being the dimension of your vector
space (for us, n = 3). The reason for this limit will become apparent shortly. For
now, it is necessary to introduce a special ‘product’ between differential forms: the
‘exterior product’. Under this product, differential forms form an ‘algebra’ known
as the ‘exterior algebra’. As it turns out, any finite dimensional vector space (e.g.
$\mathbb{R}^3$) automatically comes equipped with an exterior algebra.
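Definition 14 (Exterior Product) Given two differential 1-forms, ω and θ, their
exterior product ∧ is defined as:
$$\omega \wedge \theta, \tag{470}$$
which is a differential 2-form. Furthermore, ∧ is characterised by the following
properties:
1. Antisymmetry:
$$\omega \wedge \theta = -\theta \wedge \omega. \tag{471}$$
2. Bilinearity:
$$(\lambda_1 \omega + \lambda_2 \phi) \wedge \theta = \lambda_1\, \omega \wedge \theta + \lambda_2\, \phi \wedge \theta, \tag{472}$$
and
$$\omega \wedge (\lambda_2 \phi + \lambda_3 \theta) = \lambda_2\, \omega \wedge \phi + \lambda_3\, \omega \wedge \theta, \tag{473}$$
where λj are real constants and θ, φ, ω are differential forms.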
Exercise 70 Using the above definitions, compute the following exterior products:
1. dx ∧ dx =
2. (dx + dy) ∧ dy =
3. (xdx + ydy + zdz) ∧ (xdy + ydy + zdy) =
4. (dy ∧ dz) − (dz ∧ dy) = .
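To check answers to exercises like these, antisymmetry and bilinearity can be turned into a small calculator. The sketch below is a simplification under stated assumptions: it only handles 1-forms on R^3 with constant coefficients, stores a dx + b dy + c dz as the tuple (a, b, c), and the function name `wedge` and the output keys are made up for this illustration.

```python
def wedge(omega, theta):
    """Wedge product of two constant-coefficient 1-forms on R^3.

    A 1-form a dx + b dy + c dz is stored as the tuple (a, b, c). The result
    is a dict of coefficients on the basis 2-forms dx^dy, dy^dz and dz^dx,
    obtained by expanding bilinearly and using dx^dx = 0, dy^dx = -dx^dy, etc.
    """
    a1, a2, a3 = omega
    b1, b2, b3 = theta
    return {
        "dx^dy": a1 * b2 - a2 * b1,
        "dy^dz": a2 * b3 - a3 * b2,
        "dz^dx": a3 * b1 - a1 * b3,
    }

dx, dy, dz = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(wedge(dx, dy))          # coefficient +1 on dx^dy
print(wedge(dy, dx))          # coefficient -1 on dx^dy (antisymmetry)
print(wedge((2, 0, 3), dy))   # (2 dx + 3 dz) ^ dy = 2 dx^dy - 3 dy^dz
```

The three output coefficients have the same form as the components of a cross product, a link between wedge products and cross products that the text returns to later.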
As noted, the exterior product or ‘wedge product’ of two differential 1-forms
produces an object known as a ‘differential 2-form’. In general, one can define a
differential k-form, where the integer k denotes the ‘degree’ of the differential
form. Thus, it is natural to extend the definition of the exterior product
to differential forms of arbitrary degree. To do this, we need to add the
‘associative property’.
Definition 15 Given differential forms ω, φ, θ of arbitrary degree, the exterior
product is associative:
ω ∧ (φ ∧ θ) = (ω ∧ φ) ∧ θ = ω ∧ φ ∧ θ. (474)
Hence we can omit the brackets.
Back in Leibniz’s day, when Bach was reinventing music and Newton was compet-
ing for priority in the invention of ‘calculus’, quantities such as ‘dx’ were viewed
as ‘infinitesimal length elements’. One attempt to formalize this notion is found
in an area of mathematics known as ‘non-standard’ analysis – something akin to
the perturbation theory used by physicists. In modern geometry however, quantities
such as dx are formalized by the ‘calculus of differential forms’, pioneered by
Élie Cartan.
The notion of ‘dx’ as an infinitesimal length element makes some sense, considering
it has the correct dimensionality. In this regard, one may view differential
2-forms such as dx ∧ dy as corresponding to an ‘infinitesimal area element’, in
particular an infinitesimal parallelogram (square) with sides dx and dy.
Differential 3-forms such as dx ∧ dy ∧ dz then correspond to ‘infinitesimal volume
elements’: an infinitesimal parallelepiped (box) with edges dx, dy, dz. The
following exercises should help formalize this notion.
Exercise 71 Using the anti-symmetry, linearity and associative properties of the
exterior product, compute/simplify the following exterior products:
• (dx ∧ dy) ∧ dx =
• (4dx ∧ 9dy ∧ dz) ∧ 3dx =
• (dx ∧ dy ∧ dz) − (dz ∧ dx ∧ dy) =
• (dx ∧ dy ∧ dz) + (dz ∧ dx ∧ dy) + (dy ∧ dz ∧ dx) =
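For products of more than two basis 1-forms, the bookkeeping reduces to sorting the factors and tracking a sign. The sketch below is one possible way to organise this, under assumptions chosen for the illustration: a term c · dx^{i1} ∧ ... ∧ dx^{ik} is stored as the pair (c, (i1, ..., ik)) with 1, 2, 3 standing for dx, dy, dz, and the helper names `normalize` and `wedge_terms` are hypothetical.

```python
def normalize(coeff, indices):
    """Rewrite coeff * dx^{i1} ^ ... ^ dx^{ik} with the indices in increasing order.

    A repeated index gives zero (antisymmetry); otherwise each swap of adjacent
    factors flips the sign, so the overall sign is (-1)^(number of inversions).
    """
    if len(set(indices)) < len(indices):
        return 0, ()
    inversions = sum(
        1
        for a in range(len(indices))
        for b in range(a + 1, len(indices))
        if indices[a] > indices[b]
    )
    sign = -1 if inversions % 2 else 1
    return sign * coeff, tuple(sorted(indices))

def wedge_terms(term1, term2):
    """Wedge two single terms by concatenating their factors (associativity)."""
    (c1, i1), (c2, i2) = term1, term2
    return normalize(c1 * c2, i1 + i2)

print(wedge_terms((1, (2,)), (1, (1,))))      # dy ^ dx
print(wedge_terms((1, (3, 2)), (1, (1,))))    # (dz ^ dy) ^ dx
print(wedge_terms((2, (1, 3)), (5, (1,))))    # (2 dx ^ dz) ^ 5 dx
```

Sums of terms can be handled by applying `wedge_terms` pairwise and adding coefficients that share the same sorted index tuple.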
Problem 32 (Lines, Planes and Orientation) Considering that dx and dy are basis
vectors for the dual vector space of $\mathbb{R}^3$, which is equivalent to $\mathbb{R}^3$, one may view
the differential 2-form (dx ∧ dy) as an object representing the x-y plane. Lines
in the x, y, z directions can then be represented by the differential 1-forms dx, dy, dz.
Q1: When taking the exterior product between (dx∧dy) and any other differential
form, only the components orthogonal to dx and dy (the x and y directions) survive.
Show that this is true, by computing:
ω ∧ (dx ∧ dy) (475)
where ω = adx + bdy + cdz is an arbitrary differential 1-form (trivially, only the
z-component should survive).
Q2: Let $\frac{1}{\sqrt{2}}(dx + dy)$ and $\frac{1}{\sqrt{2}}(dx - dy)$ represent the lines: y = −x and y = x.
Since these lines are orthogonal, the wedge product of the corresponding differential
forms should survive. Compute
$$\frac{1}{\sqrt{2}}(dx + dy) \wedge \frac{1}{\sqrt{2}}(dx - dy). \tag{476}$$
Remark: The result you get should be proportional to dx ∧ dy, which represents
the x-y plane. This says that the differential forms $\frac{1}{\sqrt{2}}(dx + dy)$ and $\frac{1}{\sqrt{2}}(dx - dy)$
act as a basis for the x-y plane. Note that this makes sense, since if we change these
to vectors via the dual map: dx → e1 and dy → e2, we simply get the standard
basis vectors rotated by 45 degrees clockwise.
Q3: For a 3-dimensional vector space, what is the largest degree that a non-zero
differential form can have? To answer this question, consider the differential
3-form dx ∧ dy ∧ dz and try to compute its exterior product with any other differential
form (of degree k ≥ 1).
Q4: What happens when you change the order of the differential 1-forms
appearing in dx ∧ dy, dy ∧ dz, dz ∧ dx and dx ∧ dy ∧ dz?
In particular, to reverse the orientation of the y direction we can replace y with
−y. Compare dx ∧ d(−y) to −(dx ∧ dy) and (dy ∧ dx) – what do you notice?
What relation does this suggest between signs and orientation of coordinates in a
differential form?
You should notice that the overall sign changes. This is because each differential
form inherits the orientation imposed on the underlying vector space. In particular,
if we choose our orientation to be ‘right-handed’ (e3 = e1 × e2) then we define:
dV = dx ∧ dy ∧ dz, (477)
to be the ‘orienting volume form’.
Q5: Prove that any differential 3-form on R3 must be a multiple of the orienting
volume form, dV = dx ∧ dy ∧ dz.
Those of you who completed this tutorial may find some of the final concepts to be
vague or abstract – not to worry! Next tutorial, we will make a lot of notions more
‘explicit’ – in particular, by linking exterior products to vector cross products and
showing how they generalize the cross product to arbitrary dimensions. Further-
more, we shall illustrate a strict duality between lines and planes via the ‘Hodge
dual map’ on differential forms. Finally, we shall define the exterior derivative d
on differential forms of arbitrary degree and see how we can use this to link the
calculus of differential forms to the calculus of vector-valued functions.
21 Tutorial 19: Calculus of Vectors and Differential Forms II
In the last tutorial we reviewed the concept of ‘vector-valued’ functions and a ‘vec-
tor field’. We also established preliminary notions of ‘exterior algebra’ – that is,
the exterior product (wedge product) and differential (exterior) forms.
This week, we will continue presenting ideas side-by-side from the 19th Century
perspective (vector calculus) and the 20th century perspective (exterior calculus).
Although the latter may seem more advanced or abstract, it will become as in-
tuitive as the calculus of vector spaces. Overall, these notions are necessary for
understanding higher mathematics and quantifying the beauty of nature.
21.1 Gradients and Exterior Derivatives
Recall that the derivative $\frac{df}{dx}\big|_{x=x_0}$ of a function f of one variable gives us the
slope of the tangent line to the graph y = f(x) at the point x = x0 at which we
evaluate the derivative. We can generalize this geometric relation between ‘slopes’
of graphs and derivatives to functions of several variables.
21.1.1 Gradients
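Definition 16 (Gradient) Given a function $f : \mathbb{R}^n \to \mathbb{R}$, $(x_1, ..., x_n) \mapsto f(x_1, ..., x_n)$
of n variables, its gradient vector field ∇f is given by:
$$\nabla f = \frac{\partial f}{\partial x_1}\,\partial_1 + \frac{\partial f}{\partial x_2}\,\partial_2 + ... + \frac{\partial f}{\partial x_n}\,\partial_n, \tag{478}$$
where ∂j are the standard (Cartesian) basis vectors (see footnote 146) for $\mathbb{R}^n$. In component form,
we can denote ∇f by $\left(\frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n}\right)$.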
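Note that the symbol ∇ is the vector differential operator, nabla (derived from ‘nevel’,
the Hebrew word for ‘harp’). We can represent ∇ in Cartesian coordinates by:
$$\nabla = e_1 \frac{\partial}{\partial x_1} + e_2 \frac{\partial}{\partial x_2} + ... + e_n \frac{\partial}{\partial x_n}, \tag{479}$$
where the basis vectors are written as $e_j$ (rather than ∂j) to distinguish them from the derivatives acting on f.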
Footnote 146: You may recall the notation $e_j$ or $\hat{x}_j$ for the j-th standard basis vector. Here we use the
notation $\partial_j := \frac{\partial}{\partial x_j}$, drawing upon the correspondence established in Tutorial 18 between standard
basis vectors and partial derivative operators.
Hence, from this perspective, the gradient ∇f is simply given as the above operator
∇ acting on f. Furthermore, ∇ is a linear differential operator (since it can be
represented as a sum of linear differential operators); hence it obeys the Leibniz
product rule:
$$\nabla(fg) = f\,\nabla g + g\,\nabla f, \tag{480}$$
and linearity property:
$$\nabla(c_1 f + c_2 g) = c_1 \nabla f + c_2 \nabla g, \tag{481}$$
for arbitrary differentiable functions f, g and constants c1, c2.
To get an intuition of how the ‘gradient vector field’ behaves, we first illustrate its
first fundamental property – the gradient of a function is a vector field which points
in the direction of the maximum (positive) rate of change of the function and whose
magnitude is equal to the ‘slope’ of the graph of the function in that direction.
Example 18 (A Sunburnt Country) Nostalgic over happier times, Dr. Claire Waddington
decides to read a poem by Dorothea Mackellar while partying in a desert music
festival with Angus Turner: “I love a sunburnt country, A land of sweeping plains,
Of ragged mountain ranges, Of droughts and flooding rains...” Realizing that she
is surrounded entirely by barren sweeping plains, she recalls her time in Northern
England. To this end, she decides that she can approximate the hills between
Durham and Edinburgh by circular or elliptical paraboloids.
If we define a function $f(x, y) = h - (x^2 + y^2)$ of two variables, then the graph of a
circular paraboloid in 3 dimensions is given by z = f(x, y). Letting z represent
the altitude of a point on the hill (whose peak height is h), and (x, y) represent
coordinates on a 2-dimensional map (restricted so that $x^2 + y^2 \le h$), we can draw
the ‘level sets’ of f or ‘contour lines’ of the hill by drawing the concentric circles
z = f(x, y) for different values of z [Do this as an exercise].
The gradient vector field of f, is given by:
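$$\nabla f = \frac{\partial f}{\partial x}\,\partial_x + \frac{\partial f}{\partial y}\,\partial_y = -2x\,\partial_x - 2y\,\partial_y. \tag{482}$$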
This points in the direction −(x, y), which is the direction opposite to the radius vector
$\mathbf{r} = x\partial_x + y\partial_y$; as such, it is orthogonal to the level sets (circles) of f. Note that
r = 0 corresponds to the summit of the hill (where z = h), which is where the gradient vector
field points towards (increasing altitude). Since the gradient field is perpendicular
to the contour lines, it points in the direction of the maximum rate of increase of
z = f(x, y), i.e. the steepest ascent up the hill.
The magnitude of ∇f is given by its norm: $\|\nabla f\| = 2\sqrt{x^2 + y^2} = 2r$. It is equal
to the magnitude of the slope $\frac{dz}{dr} = -2r$ of the hill profile $z = h - r^2$ along the path of steepest ascent.
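The claims of this example can be checked numerically. The sketch below assumes a peak height h = 4 and the sample point (1, 0.5), both chosen arbitrarily for illustration, and approximates the partial derivatives by central differences; the helper name `grad_numeric` is likewise an assumption.

```python
import math

H = 4.0  # assumed peak height of the hill

def f(x, y):
    """Altitude of the circular paraboloid z = h - (x^2 + y^2)."""
    return H - (x * x + y * y)

def grad_numeric(func, x, y, step=1e-6):
    """Central-difference approximation to (df/dx, df/dy)."""
    dfdx = (func(x + step, y) - func(x - step, y)) / (2.0 * step)
    dfdy = (func(x, y + step) - func(x, y - step)) / (2.0 * step)
    return dfdx, dfdy

x, y = 1.0, 0.5
gx, gy = grad_numeric(f, x, y)
r = math.hypot(x, y)

print((gx, gy), (-2.0 * x, -2.0 * y))   # numerical gradient vs. -2x, -2y
print(math.hypot(gx, gy), 2.0 * r)      # norm of the gradient vs. 2r
```

Both pairs should agree up to rounding, and the gradient components have the opposite sign to (x, y), so the arrow points back towards the summit.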
In the last example, we simply stated that ∇f pointed in the direction of maximum
rate of change without proving it. Other than using properties of contour maps
(which follow from said mathematics), we can prove our statement using the concept
of the ‘directional derivative’. Given a function f of n variables, we can define its
directional derivative in the direction of the vector v by the dot-product of v with the
gradient vector field of f:
$$D_v f := (\nabla f) \cdot v. \tag{483}$$
This gives us the ‘rate of change of f’ in the direction of v. If we want the rate of
change per unit distance, we must normalize v to make it a unit vector:
$$\hat{D}_v f := (\nabla f) \cdot \frac{1}{\|v\|}\, v. \tag{484}$$
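A brute-force experiment makes the normalized directional derivative concrete. The sketch below reuses the paraboloid f(x, y) = h − (x² + y²) from Example 18, evaluates the normalized directional derivative for 360 evenly spaced directions at the sample point (1, 0.5), and reports the direction with the largest rate. The sample point, the one-degree resolution and the function names are assumptions made for illustration; this is a numerical experiment, not a proof.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = h - (x^2 + y^2); the constant h drops out."""
    return (-2.0 * x, -2.0 * y)

def directional_derivative(grad, v):
    """Rate of change per unit distance: (grad . v) / |v|, as in (484)."""
    norm_v = math.hypot(v[0], v[1])
    return (grad[0] * v[0] + grad[1] * v[1]) / norm_v

def steepest_direction(x, y, steps=360):
    """Scan evenly spaced directions and return (angle, rate) of the largest rate."""
    g = grad_f(x, y)
    best_angle, best_rate = None, -math.inf
    for n in range(steps):
        phi = 2.0 * math.pi * n / steps
        rate = directional_derivative(g, (math.cos(phi), math.sin(phi)))
        if rate > best_rate:
            best_angle, best_rate = phi, rate
    return best_angle, best_rate

phi, rate = steepest_direction(1.0, 0.5)
g = grad_f(1.0, 0.5)
print(math.degrees(phi), math.degrees(math.atan2(g[1], g[0])) % 360.0)
print(rate, math.hypot(g[0], g[1]))
```

The best scanned angle lands within one grid step of the angle of the gradient itself, and the best rate approaches the norm of the gradient from below.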
Problem 33 (Cartographer’s Catastrophe) One known property of contour maps
in geography, is that the path of steepest ascent is in a direction perpendicular to
the contour lines. However, this is precisely the direction of the gradient vector
field – hence the gradient vector field points in the direction of the maximum rate
of increase of a function (for geographical maps, we have ‘altitude’ as a function
of two variables (x, y)). To convince yourself, note that moving parallel to a con-
tour line leaves you at a constant altitude – only when you deviate from the contour
direction does your altitude begin to change.
One day, Carly Fazioli decides to become a Cartographer (map maker), thereby
changing her name to Carly Cartographer. When purchasing one of her maps, an
ex Georgian student of the mathematical sciences study group asks Carly to prove
that paths perpendicular to contours are those of steepest ascent. Not having stud-
ied mathematics, Carly is in a catastrophe. Help Carly by proving this statement!
Hints:
• First note what value of θ maximizes cos(θ) for 0 ≤ θ ≤ π.
• If we let z = f(x, y) represent some landscape (with z being altitude), write
down the rate of change of z in the direction of an arbitrary 2-dimensional
vector v in the x-y plane. Do this in terms of $\|v\|$ and $\|\nabla f\|$ via the formula:
$a \cdot b = \|a\| \|b\| \cos(\theta_{ab})$.
• Show that the directional derivative $\hat{D}_v f$ of f is maximized when v is parallel
to the gradient vector field ∇f, i.e. it points in the same direction.
Aside: Those of you who paid attention, will quickly notice that we only proved
the maximizing property of the gradient vector field – we didn’t actually prove that
it is perpendicular to contour lines! To see this in generality, try to understand the
following argument.
Toy problems aside, gradient vector fields also play a fundamental role in opti-
mization theory and physics. In particular, you may recall that the ‘work done’
under a ‘conservative force’ field is ‘independent of the path taken’. This is a di-
rect consequence of the general definition of work as the ‘line-integral’ of the force
experienced along a path, as well as the ‘Kelvin-Stokes’ theorem and the fact that
curl-free fields can be written (with some restriction) as a gradient vector field. One
day you will understand the power of these statements, but in the meantime we
will focus on the idea of ‘conservative forces’ and ‘potentials’.
Recall that Isaac Newton derived Kepler’s astronomical laws of orbital motion by
postulating his own gravitational force law. In particular, given two massive bodies
of mass m and M, the gravitational force that M exerts on m is given by:
\[ \mathbf{F} = \frac{GmM}{r^2}\,\hat{\mathbf{r}}, \tag{485} \]
where ˆr is a unit vector pointing from m to M, r is the distance between the bodies
and G is Newton’s gravitational constant. Hence, since F = ma, the acceleration
a experienced by m due to this force, is given by:
\[ \mathbf{a} = \frac{GM}{r^2}\,\hat{\mathbf{r}}. \tag{486} \]
To see that the gravitational force is conservative, note that it can be derived from
a ‘gravitational potential’. In particular, the work done in moving the mass m
from a point at infinity to a radial distance r from M is given by:
\[ W = \int_{r'=\infty}^{r'=r} \mathbf{F}\cdot d\mathbf{l} = \int_{\infty}^{r} \frac{GmM}{r'^2}\,dr' = -\frac{GmM}{r}. \tag{487} \]
We then define the gravitational potential of the mass M by the function
$U(r) = \frac{1}{m}W = -\frac{GM}{r}$. Knowing this potential alone, we can reconstruct the gravitational
force field generated by the mass M.
Exercise 72 (May the force be with you) On a voyage through deep space to find
the ancient sith empire, the mixed-powers Darth Revan decides to pass the time by
deriving Newton’s gravitational force law – a first order approximation to Ein-
stein’s theory of gravity. To help Darth Revan, compute the gradient vector field
generated by the gravitational potential of a spherical starbase with mass M.
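In other words, given
\[ U = -\frac{GM}{r}, \tag{488} \]
compute ∇U and prove that it is equal to
\[ \nabla U = \frac{GM}{r^2}\,\hat{\mathbf{r}}, \tag{489} \]
where $r = \sqrt{x^2 + y^2 + z^2}$ is the magnitude of the radial vector r = x∂x + y∂y +
z∂z and $\hat{\mathbf{r}} = \frac{1}{r}\mathbf{r}$ is the corresponding unit vector, which points away from M.
Hence, the gravitational force field generated by a massive body of mass M, exerted
on a body of mass m, is given by:
\[ \mathbf{F} = -m\nabla U. \tag{490} \]
The minus sign makes the force attractive: here the unit vector points away from M,
whereas in (485) it pointed from m towards M.
Hint: It helps to show that $\partial_x r^{-1} = -x r^{-3}$, and similarly for $\partial_y r^{-1}$ and $\partial_z r^{-1}$. Then
factorize the gradient of $r^{-1}$ as $-r^{-2}\,r^{-1}(x, y, z) = -r^{-2}\,\hat{\mathbf{r}}$ and add appropriate
constants.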
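Once you have the closed form, a quick numerical cross-check is possible. The sketch below, assuming units in which GM = 1 and an arbitrarily chosen test point, compares a central-difference gradient of U = −GM/r with GM/r² times the outward radial unit vector. The helper names are illustrative only.

```python
import math

GM = 1.0  # G*M in arbitrary units; an illustrative value, not from the tutorial

def potential(x, y, z):
    """Gravitational potential U = -GM / r."""
    return -GM / math.sqrt(x * x + y * y + z * z)

def grad(func, p, h=1e-5):
    """Central-difference gradient of a scalar function at the point p = (x, y, z)."""
    out = []
    for i in range(3):
        hi = list(p)
        lo = list(p)
        hi[i] += h
        lo[i] -= h
        out.append((func(*hi) - func(*lo)) / (2 * h))
    return tuple(out)

def expected_gradient(p):
    """Closed form: GM / r^2 times the outward radial unit vector, i.e. GM * p / r^3."""
    r = math.sqrt(sum(c * c for c in p))
    return tuple(GM * c / r ** 3 for c in p)

point = (1.0, 2.0, 2.0)  # chosen so that r = 3
print("numerical:", grad(potential, point))
print("formula:  ", expected_gradient(point))
```

Agreement at a handful of points is of course not a proof, but it is a cheap way to catch sign or power errors in your derivation.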
To see that the gravitational force is conservative, note that the gravitational po-
tential U(r) (and hence work W = mU) generated by the mass M only depends
on the radial distance from the gravitational source M – therefore, the work done
to move another mass in the field of M only depends on the end-points of its path
(initial and final radial distances), but not the path taken.
As a bonus, note that the exact same mathematics can be applied to electrostatics
(except that you can have negative charges but not negative mass). In particular, by
replacing the masses m and M with charges q and Q in the above exercise, as well
as replacing F = mg with F = qE and Newton’s constant G with Coulomb’s
constant $C = \frac{1}{4\pi\epsilon_0}$, one obtains results for electrostatics of the same form as those
for gravity, up to an overall sign (like charges repel, whereas masses attract). In this
case, the electric potential generated by a charge Q is given by:
\[ U(r) = \frac{CQ}{r} = \frac{Q}{4\pi\epsilon_0 r}. \tag{491} \]
Minus its gradient, −∇U, is the electric field generated by the charge Q:
\[ \mathbf{E} = -\nabla U = \frac{Q}{4\pi\epsilon_0 r^2}\,\hat{\mathbf{r}}, \tag{492} \]
where $\hat{\mathbf{r}}$ is a (radial) unit vector pointing in the direction away from Q. Note, if Q is a
negative charge, E will point towards Q and if Q is positive, the electric field will
point away.
21.1.2 Exterior Derivatives
Recall in the last tutorial and previous tutorials that we defined the exterior deriva-
tive (total differential) of a function f of n variables, in local coordinates (x1, ..., xn),
to be:
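\[ df = \frac{\partial f}{\partial x^1}dx^1 + \frac{\partial f}{\partial x^2}dx^2 + \ldots + \frac{\partial f}{\partial x^n}dx^n, \tag{493} \]
where the objects dx^j were formally defined to be the ‘dual basis vectors’ – i.e.
a basis for the vector-space of differential 1-forms. These were related to the
standard basis vectors {∂j} by their action on them:
\[ dx^j(\partial_k) = \delta^j_k, \tag{494} \]
where $\delta^j_k$ was the Kronecker delta. Now notice the similarity between the exterior
derivative and the gradient operators:
\[ \begin{aligned}
df &= \frac{\partial f}{\partial x^1}dx^1 + \frac{\partial f}{\partial x^2}dx^2 + \ldots + \frac{\partial f}{\partial x^n}dx^n \\
\nabla f &= \frac{\partial f}{\partial x^1}\partial_1 + \frac{\partial f}{\partial x^2}\partial_2 + \ldots + \frac{\partial f}{\partial x^n}\partial_n.
\end{aligned} \tag{495} \]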
Clearly df and ∇f have the same components. In particular df is the dual vector of
the gradient vector field ∇f. This duality is formally provided by the Euclidean metric
g, whose components can be defined by its action on the standard basis:
\[ g(\partial_j, \partial_k) = \delta_{jk}. \tag{496} \]
Since g is a bilinear map (linear in each ‘slot’), we can fill one slot with a vector to
obtain a linear map:
\[ g(\partial_j, \,\cdot\,), \tag{497} \]
which acts on vectors. For a Cartesian coordinate system in Euclidean space, one
simply has:
\[ g(\partial_j, \,\cdot\,) = dx^j, \tag{498} \]
making the duality between vectors and differential 1-forms trivial. Now recall that
an ‘exact differential 1-form’ ω is one that can be expressed in the form ω = df,
for some function f. The function f is determined only up to the addition of a function
whose exterior derivative is zero – i.e. a constant. To see this, note that if we transform
f → f + c, where c is constant, then one has:
\[ \omega = df \to d(f + c) = df + dc = df + 0. \tag{499} \]
In physics terms, the transformation f → f + c is an example of a ‘gauge transformation’
– electromagnetism is one gauge theory which has this kind of gauge symmetry.
Back to mathematics, we see that gradient vector fields ∇f and exact differential
1-forms df are in one-to-one correspondence. Clearly if we were to integrate df
between two points P1 and P2, by the fundamental theorem of calculus the result
must only depend on the value of f at these end points:
\[ \int_{P_1}^{P_2} df = f(P_2) - f(P_1). \tag{500} \]
It does not depend on the path taken between the two endpoints. Similarly, since
the gradient vector field ∇f is dual to df, any line-integral of ∇f must be
path-independent. Now note that any conservative force can be expressed as the
gradient of some suitable scalar potential: F = −∇U. The work done under some
force-field F in moving an object from one point to another is given by the
line-integral of F between those points – which in general depends on the path taken.
Since F = −∇U for conservative force fields, it follows that the work done must
be path independent!
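Path independence is easy to observe numerically. The sketch below assumes an illustrative potential f(x, y) = x²y + sin(y) and two hand-picked paths sharing the same endpoints (none of these come from the tutorial); it estimates the line integral of ∇f along each path with a midpoint rule and compares both with f(P2) − f(P1).

```python
import math

def f(x, y):
    # Illustrative potential; any smooth function would do.
    return x * x * y + math.sin(y)

def grad_f(x, y):
    """Analytic gradient of f."""
    return (2 * x * y, x * x + math.cos(y))

def line_integral(field, path, steps=20000):
    """Midpoint-rule estimate of the line integral of field along path(t), t in [0, 1]."""
    total = 0.0
    for k in range(steps):
        x0, y0 = path(k / steps)
        x1, y1 = path((k + 1) / steps)
        fx, fy = field(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

def straight(t):
    """Straight line from P1 = (0, 0) to P2 = (1, 2)."""
    return (t, 2 * t)

def detour(t):
    """A curved path with the same endpoints P1 and P2."""
    return (t + math.sin(math.pi * t), 2 * t * t)

print("straight path:", line_integral(grad_f, straight))
print("curved path:  ", line_integral(grad_f, detour))
print("f(P2) - f(P1):", f(1, 2) - f(0, 0))
```

All three numbers should agree up to discretisation error. Replacing grad_f by a field that is not a gradient (for example (−y, x)) makes the two path integrals differ.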
Clearly, this duality between the exterior derivative and gradient operator provides
a neat way to prove a deep, fundamental result in physics. Moreover, it provides an
easy way to generalize classical results from flat 3-dimensional spaces to arbitrary
smooth manifolds in arbitrary dimensions. In a similar manner, we can use the ex-
terior derivative to prove the claim made earlier that the gradient vector field points
in a direction perpendicular to the contour curves (level sets) of a function.
21.2 Divergence
Previously, the gradient operator allowed us to turn a function into a vector field. In
modern speak, it turned a rank-0 tensor into a rank-1 tensor field. A natural ques-
tion therefore, is whether there is a differential operator which turns rank-1 tensor
fields (vector fields) into rank-0 tensors (functions)? Of course there is.
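Definition 17 Given a differentiable vector field v = v^1∂1 + ... + v^n∂n, its divergence
in the standard Cartesian basis {∂j} for R^n can be expressed as:
\[ \nabla\cdot\mathbf{v} = \frac{\partial v^1}{\partial x^1} + \frac{\partial v^2}{\partial x^2} + \ldots + \frac{\partial v^n}{\partial x^n}. \tag{501} \]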
Note that this definition is rather restrictive, since it refers explicitly to a Cartesian
basis. Nonetheless, until you begin problems in curvilinear coordinates (spherical
and cylindrical polar coordinates for example), it will suffice for most calcula-
tions.
Some insight into the geometrical significance of the ‘divergence’ operator can be
given in 3 dimensions, by its relation to ‘flux integrals’. Note that given a two-
dimensional surface S and some vector field v, the flux of v through S is roughly
the intensity of the component of v normal (perpendicular) to S – i.e. the rate of flow
of some property through S, per unit area. If you think of v as the velocity vector field for
some fluid and S as some surface immersed in the fluid, then the flux of v through
S represents the intensity of the fluid flow through S. The divergence is then the
‘volume density’ of this flux.
Definition 18 (3-dimensional Divergence) Given a vector field F and some point
p, let S be some closed surface enclosing p and bounding a volume V (notation: we
write S = ∂V, where ∂ denotes ‘boundary of’). The divergence of F at p is then
defined as the limit of the net flow (flux) of F across S = ∂V divided by the volume
V enclosed by S, as V collapses to the point p:
\[ \mathrm{div}[\mathbf{F}](p) = \lim_{V \to p} \frac{1}{V} \oint_{S=\partial V} \mathbf{F}\cdot d\mathbf{S}. \tag{502} \]
Note that the quantity $\oint_{S=\partial V}$ is the surface integral over the boundary surface
of V and dS = n dS is the outward-pointing normal vector of (perpendicular
to) S, whose magnitude dS is the infinitesimal surface area at any point on S.
Furthermore, note that this definition of divergence does not depend on the explicit
surface chosen – if it did, it would be useless!
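To make the limiting procedure concrete, here is a minimal sketch that takes S to be a small cube centred at p, estimates the outward flux with a midpoint rule on each face, and divides by the cube's volume. The vector field, the point and the choice of a cube are all assumptions made for illustration; for this field the Cartesian formula gives div F = cos(x) + 2y + x.

```python
import math

def F(x, y, z):
    # Illustrative vector field (not from the tutorial): div F = cos(x) + 2y + x.
    return (math.sin(x), y * y, x * z)

def flux_per_volume(field, p, a, n=20):
    """Outward flux of field through a cube of side a centred at p, divided by a**3.

    Each face is split into n*n cells and the field is sampled at the cell centres.
    """
    half = a / 2
    cell = a / n
    total = 0.0
    for axis in range(3):
        u, v = [i for i in range(3) if i != axis]
        for sign in (1, -1):  # the two opposite faces normal to this axis
            for i in range(n):
                for j in range(n):
                    q = list(p)
                    q[axis] += sign * half
                    q[u] += -half + (i + 0.5) * cell
                    q[v] += -half + (j + 0.5) * cell
                    total += sign * field(*q)[axis] * cell * cell
    return total / a ** 3

p = (1.0, 2.0, 3.0)
exact = math.cos(1.0) + 2 * 2.0 + 1.0
for a in (1.0, 0.1, 0.01):
    print("side", a, "flux/volume", flux_per_volume(F, p, a), "Cartesian formula", exact)
```

As the side length shrinks, the ratio should approach the value given by the Cartesian formula. A sphere or any other closed surface around p would give the same limit.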
As such, the divergence measures the ‘source’ or ‘sink’ behaviour of a vector field
at any point. To see this explicitly, consider the following example.
Example 19 (Fracking Well) Fracking is a process by which water can be pumped
deep below the Earth into fine rock formations to build pressure, crack rocks and
release natural gas for extraction. At the top of some well site, the diffusion of
the natural gas can be modelled by a vector field F = x∂x + y∂y + 100z∂z – i.e.
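a somewhat upward ‘conical flow’. The units of the components are mass of
natural gas per unit time per unit area – i.e. [Mass]/([Time][Area]). The divergence of
this vector field is given by:
\[ \begin{aligned}
\mathrm{Div}[\mathbf{F}] &= \nabla\cdot\mathbf{F} \\
&= \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z} \\
&= 1 + 1 + 100 \\
&= 102,
\end{aligned} \tag{503} \]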
ignoring units. Since this divergence is positive, the well opening can be viewed
mathematically as a ‘source’ for the natural gas vector field. You can plot this
vector field to see that it does indeed ‘look’ like a source.
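The source/sink classification can also be checked numerically. The following sketch estimates the divergence of the two fields in this example by central differences; the function names and the sample point are illustrative choices, while the fields themselves are the ones given in the text.

```python
def divergence(field, p, h=1e-5):
    """Central-difference estimate of dF1/dx + dF2/dy + dF3/dz at the point p."""
    total = 0.0
    for i in range(3):
        hi = list(p)
        lo = list(p)
        hi[i] += h
        lo[i] -= h
        total += (field(*hi)[i] - field(*lo)[i]) / (2 * h)
    return total

def gas_field(x, y, z):
    """Fracking-well field F = x d_x + y d_y + 100 z d_z from Example 19."""
    return (x, y, 100 * z)

def carbon_field(x, y, z):
    """Carbon sequestration field F = -x d_x - y d_y - 10 z d_z."""
    return (-x, -y, -10 * z)

sample = (0.3, -1.2, 2.0)  # arbitrary point; both fields have constant divergence
print("gas field:   ", divergence(gas_field, sample))     # positive: a source
print("carbon field:", divergence(carbon_field, sample))  # negative: a sink
```

Because both fields are linear in the coordinates, the result does not depend on the sample point.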
If one were to consider another operation – for example, Carbon sequestration,
then you may write a carbon vector field as: F = −x∂x − y∂y − 10z∂z at the
top of some pump leading underground, for example. Computing the divergence
should give you a negative quantity – hence corresponding to the top of the pump
being a Carbon sink.
Example 20 (Electric Charge) Due to Michael Faraday, we often do calculations
in electromagnetism with an artificial quantity – the ‘electric field’ E, defined in
terms of the force experienced by a positive test charge due to another given charge
configuration. In particular, an electric charge Q generates an electric field defined
by $\mathbf{E} = \frac{1}{q}\mathbf{F}$, where F is the force experienced (a measurable quantity) by a positive
test charge q in the field of Q.
Regardless of the nature of the charge Q, at far enough distances, we can approx-
imate the electric field generated by Q as that of a point charge via Coulomb’s
law:
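\[ \mathbf{E} = \frac{Q}{4\pi\epsilon_0 r^2}\,\hat{\mathbf{r}}, \tag{504} \]
where $\hat{\mathbf{r}} = \frac{x}{r}\partial_x + \frac{y}{r}\partial_y + \frac{z}{r}\partial_z$ is a unit vector pointing outward from Q (placing Q
at the origin (0, 0, 0)).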
Without a second thought, Kate Lindley is told that ‘positive charges’ act as ‘sources’
for electric field lines and negative charges act as ‘sinks’. Graphically, this makes
sense if we plot E. However, if we try to formalize this notion, one might think that
in the neighbourhood of positive charges an electric field has positive divergence
– and around negative charges its divergence is negative. Is this really the case?
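Considering a point-charge at the origin (0, 0, 0) and computing the divergence of its
field at any point r ≠ 0, we get:
\[ \begin{aligned}
\mathrm{Div}[\mathbf{E}] &= \nabla\cdot\mathbf{E} \\
&= \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \\
&= \frac{Q}{4\pi\epsilon_0}\left[\left(\frac{1}{r^3} - \frac{3x}{r^4}\,x r^{-1}\right) + (\ldots) + (\ldots)\right] \\
&= \frac{Q}{4\pi\epsilon_0}\left[\frac{3}{r^3} - \frac{3(x^2 + y^2 + z^2)}{r^5}\right] \\
&= 0,
\end{aligned} \tag{505} \]
since $E_x = \frac{Q}{4\pi\epsilon_0 r^2}\frac{x}{r} = \frac{Q}{4\pi\epsilon_0}\,x r^{-3}$, etc. Oh no! What is going wrong here? The
real problem is that the electric field E in its given form is not ‘defined’ at the origin
– technically speaking, this is a hole in the solution space to Maxwell’s equations for
electromagnetism.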
If we use Gauss’ law to compute the charge enclosed by a sphere around the origin,
then we must take into account that a point charge at the origin has an infinite
charge density at the origin and has zero charge density everywhere else. To take
this into account, we need the 3-dimensional Dirac delta distribution δ3(r) – which
is (informally speaking) defined to be infinity at r = 0 and zero everywhere else.
In this manner, we can write:
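\[ \nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0} = \frac{Q}{\epsilon_0}\,\delta^3(\mathbf{r}). \tag{506} \]
Since $\int_V \delta^3(\mathbf{r})\,dV = 1$ (the volume integral for a volume V enclosing the origin –
by properties of the Dirac delta), we recover Q as the enclosed charge by Gauss’ law.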
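The resolution can be seen numerically through flux integrals. The sketch below assumes units with Q = 1 and ε0 = 1 (an illustrative choice, not SI) and integrates the Coulomb field over the faces of two cubes: one enclosing the charge at the origin and one that does not. The cube positions are arbitrary.

```python
import math

Q = 1.0     # point charge, illustrative units
EPS0 = 1.0  # permittivity set to 1 for simplicity (an assumption, not the SI value)

def coulomb(x, y, z):
    """Point-charge field E = Q / (4 pi eps0 r^2) r_hat, with the charge at the origin."""
    r3 = (x * x + y * y + z * z) ** 1.5
    k = Q / (4 * math.pi * EPS0)
    return (k * x / r3, k * y / r3, k * z / r3)

def cube_flux(field, centre, side, n=100):
    """Outward flux of field through an axis-aligned cube (midpoint rule, n*n cells per face)."""
    half = side / 2
    cell = side / n
    total = 0.0
    for axis in range(3):
        u, v = [i for i in range(3) if i != axis]
        for sign in (1, -1):  # the two opposite faces normal to this axis
            for i in range(n):
                for j in range(n):
                    q = list(centre)
                    q[axis] += sign * half
                    q[u] += -half + (i + 0.5) * cell
                    q[v] += -half + (j + 0.5) * cell
                    total += sign * field(*q)[axis] * cell * cell
    return total

print("cube enclosing the charge:", cube_flux(coulomb, (0.1, 0.2, -0.1), 2.0))
print("cube away from the charge:", cube_flux(coulomb, (3.0, 0.0, 0.0), 2.0))
print("Q / eps0:                 ", Q / EPS0)
```

The first flux should come out close to Q/ε0 however the enclosing cube is placed, while the second should be close to zero. This is consistent with a divergence that vanishes everywhere except at the origin, where all of the ‘source’ is concentrated.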
Note that the concepts in the last example apply directly to Newtonian gravity, by
consideration of the gravitational field and Gauss’ law applied to gravity. A more
satisfying explanation for this phenomenon is found readily in ‘De Rham Cohomology’,
relating the concept of ‘closed differential forms’ and ‘non-exact differential forms’
to topology and a generalized notion of ‘charges’ and conservation
laws.
Problem 34 In fluid mechanics, a fluid is classified as ‘incompressible’ if the di-
vergence of its velocity vector field is zero:
Div[F] = 0. (507)
Similarly, a fluid can be called ‘vortex-free’ if the ‘curl’ (another vector differential
operator) of its velocity vector field is zero. Fluids with vortices have non-zero
circulation at various points
– i.e. the line integral of the velocity vector field about a closed loop containing
a vortex is non-zero. Loosely speaking, if you integrate around two vortices of
equal magnitude rotating in opposite directions, the circulation is zero – hence
establishing the beginnings of a notion of duality between vortices and conserved
charges from electromagnetism.
Ignoring these concepts for now, consider a fluid with a single counter-clockwise
vortex modelled by the velocity vector field:
\[ \mathbf{v} = -y\,\partial_x + x\,\partial_y. \tag{508} \]
Such a vector field arises as the curl of the vector field $-\tfrac{1}{2}(x^2 + y^2)\,\partial_z$. One
result of vector calculus is that the divergence of a vector field arising as the curl
of another vector field is zero. Hence a true vortex has zero divergence – in some
regime (linear perhaps), they are conserved quantities.
Q1: Show that for v = −y∂x + x∂y, one does indeed have Div[v] = 0.
Q2: Given an arbitrary vector field F = Fx∂x + Fy∂y + Fz∂z expressed in a
Cartesian coordinate basis, its curl is a vector field defined by the expression:
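\[ \nabla\times\mathbf{F} = \left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)\partial_x + \left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)\partial_y + \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)\partial_z. \]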
Using this expression as well as the definition of the divergence, prove that Div[∇ ×
F] = ∇ · (∇ × F) = 0. This establishes our previous statements in some generality.
Hint: You will need Clairaut’s theorem – that is, partial derivatives of an
(‘appropriately defined’) function commute.
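As a sanity check alongside the proof, the identity can be tested numerically for a particular field. The sketch below uses central differences for every partial derivative and an illustrative field F = (y sin z, x²z, eˣy) that does not come from the tutorial; the helper names are hypothetical.

```python
import math

def F(x, y, z):
    # Illustrative smooth field; not taken from the tutorial.
    return (y * math.sin(z), x * x * z, math.exp(x) * y)

def partial(func, i, p, h=1e-3):
    """Central-difference partial derivative of a scalar function along axis i at p."""
    hi = list(p)
    lo = list(p)
    hi[i] += h
    lo[i] -= h
    return (func(*hi) - func(*lo)) / (2 * h)

def curl(field):
    """Return a function that evaluates the Cartesian curl of field numerically."""
    def comp(k):
        return lambda x, y, z: field(x, y, z)[k]

    def curl_at(x, y, z):
        p = (x, y, z)
        return (
            partial(comp(2), 1, p) - partial(comp(1), 2, p),  # dFz/dy - dFy/dz
            partial(comp(0), 2, p) - partial(comp(2), 0, p),  # dFx/dz - dFz/dx
            partial(comp(1), 0, p) - partial(comp(0), 1, p),  # dFy/dx - dFx/dy
        )
    return curl_at

def divergence(field, p):
    """Numerical divergence: sum over i of the derivative of component i along axis i."""
    return sum(partial(lambda x, y, z, i=i: field(x, y, z)[i], i, p) for i in range(3))

point = (0.4, -0.7, 1.1)
print("curl F at the point:", curl(F)(*point))
print("div(curl F):        ", divergence(curl(F), point))  # zero up to rounding error
```

The curl itself is clearly non-zero, yet its divergence vanishes to within rounding error. The discrete differences commute for the same reason the partial derivatives do in Clairaut's theorem.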
21.3 Hodge Dual, Closed and Exact Forms
For next time ... perhaps.
22 Tutorial 20: Calculus of Vectors and Differential Forms III
In the last tutorial, we developed the concept of the ‘gradient vector field’ ∇f
generated by a function f and demonstrated its duality to the exterior derivative df
of the function f. We then went through several exercises and problems illustrating
that the gradient vector field is orthogonal to the level sets of a function and that
it points in the direction of the maximum rate of increase of that function. We
then related gradients to ‘potentials’ and conservative forces – in particular, the
Newtonian gravitational force and electric field / electrostatic force law generated
by a point charge.
Furthermore, we investigated the concept of the ‘divergence’ of a vector field –
a type of derivative operator which turns vector fields into functions. This was
defined as the net flux of a vector field through some imaginary closed
surface bounding a point, divided by the volume enclosed by the surface, in the limit
as the surface collapsed to that point. Hence the divergence was an operator that
measured the ‘source/sink’ characteristics of a vector field at any given point.
22.1 Sleight of Hand
In the last tutorial, we proved that the gradient vector field ∇f pointed in the
direction of the maximum rate of increase of f – and that its magnitude |∇f| was
equal to the magnitude of the maximum rate of change of f. With some differential
geometry, it then follows that the gradient vector field is orthogonal to the level sets
(contours) of f – however, we didn’t explicitly show this. A less advanced proof
(using vector calculus), is illustrated as follows.
First, we state two key ingredients (without proof):
• An n-dimensional smooth surface can be described by a family of n
orthogonal curves. Equivalently, the surface can be described by n linearly
independent tangent vector fields – these can be constructed to be mutually
orthogonal via the Gram-Schmidt process.
• The ‘Implicit Function Theorem’ for a function of n variables. By now, you
should have covered this in class – if not, it can be found in any (decent)
calculus textbook. Roughly speaking, this guarantees that if the differential
df (or gradient ∇f) of a function f is non-zero on some open set, then the
level sets (contours) of f (the sets of points where f(r) = constant) exist there as
smooth hypersurfaces.
Proof 2 (Orthogonality of Gradient and level sets) Given a function f of n variables,
(x^1, ..., x^n) ∈ R^n, one defines a ‘level set’ of f (generalization of the notion
of ‘level curves’ to ‘level hypersurfaces’) as the set of points such that:
\[ f(x^1, \ldots, x^n) = c, \tag{509} \]
for some chosen constant c. Therefore, the family of level sets of f is a one-parameter
family generated by c (as c varies) – the union of this family is thus the ‘contour
graph’ or ‘level graph’ of f. If the gradient ∇f of f (or exterior derivative df)
is non-zero at some point y = (y^1, ..., y^n) (with f(y) = c), then the implicit
function theorem implies that, near y, the pre-image of c is a submanifold of R^n – i.e.
the set {x ∈ R^n : f(x) = c} is a smooth hypersurface in R^n. This means it
is generated by n − 1 independent curves, with well-defined tangent vector fields
(velocity vectors) to each curve.
With the existence of a level set established, we parametrize one of the n − 1 curves
in the level set by the vector function r_c(t) = (x^1(t), ..., x^n(t)), where c labels the
level set (contour) and t is our parameter. Evaluating our original function f on
this curve and taking its exterior derivative with respect to t, we find
\[ \begin{aligned}
df &= \frac{df}{dt}\,dt \\
&= \left[\frac{\partial f}{\partial x^1}\frac{dx^1}{dt} + \ldots + \frac{\partial f}{\partial x^n}\frac{dx^n}{dt}\right]dt.
\end{aligned} \tag{510} \]
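However, we know that f(r_c(t)) = c (by construction of the level set), hence it
follows that $\left.\frac{df}{dt}\right|_{\mathbf{r}_c(t)} = 0$. Hence, we have:
\[ \left[\frac{\partial f}{\partial x^1}\frac{dx^1}{dt} + \ldots + \frac{\partial f}{\partial x^n}\frac{dx^n}{dt}\right] = 0. \tag{511} \]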
Now notice that the quantity on the left-hand side of this equation is simply the dot
product of the two following vectors:
\[ \begin{aligned}
\nabla f &= \frac{\partial f}{\partial x^1}\partial_1 + \ldots + \frac{\partial f}{\partial x^n}\partial_n \\
\dot{\mathbf{r}}_c &= \frac{dx^1}{dt}\partial_1 + \ldots + \frac{dx^n}{dt}\partial_n,
\end{aligned} \tag{512} \]
evaluated along the curve given by r_c(t) in a level set of f labelled by c. The vector
∇f is the gradient of f and $\dot{\mathbf{r}}_c$ is the tangent (velocity) vector to the curve r_c(t).
Since $\nabla f\cdot\dot{\mathbf{r}}_c = 0$ as shown above, it follows that the gradient is orthogonal to the
curve r_c(t). Repeating this argument for all other curves generating the level set
(hypersurface) f(x) = c, it follows that the gradient vector field is orthogonal to all
the level curves of f in this level set – hence it must be orthogonal to the level set
(hypersurface).
Applying the above result to the case n = 2 for contour maps – i.e. graphs of the
form z = f(x, y), we see that there is only one level curve (n − 1 = 1) generating
each level set of f. Hence the gradient vector field ∇f possesses two properties – it
is orthogonal to the contour lines of f and it points in the direction of the maximum
rate of increase of f (its magnitude |∇f| specifying this rate).
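The n = 2 case is easy to test numerically. The sketch below assumes an illustrative function f(x, y) = x² + 4y², whose contours f = c are ellipses that can be parametrised explicitly. It estimates the tangent vector to a contour by central differences and takes its dot product with the gradient. The function and helper names are not from the tutorial.

```python
import math

def grad_f(x, y):
    """Analytic gradient of the illustrative function f(x, y) = x^2 + 4 y^2."""
    return (2 * x, 8 * y)

def level_curve(c, t):
    """Point r_c(t) on the contour f = c (an ellipse), for c > 0."""
    return (math.sqrt(c) * math.cos(t), 0.5 * math.sqrt(c) * math.sin(t))

def tangent(c, t, h=1e-6):
    """Central-difference estimate of the velocity vector d r_c / dt."""
    x0, y0 = level_curve(c, t - h)
    x1, y1 = level_curve(c, t + h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

def gradient_dot_tangent(c, t):
    """Dot product of grad f with the tangent vector at the point r_c(t)."""
    gx, gy = grad_f(*level_curve(c, t))
    tx, ty = tangent(c, t)
    return gx * tx + gy * ty

for t in (0.0, 0.7, 2.0, 4.5):
    print("t =", t, " grad f . tangent =", gradient_dot_tangent(3.0, t))
```

Every printed dot product should vanish up to rounding error, at every point of every contour, in line with the proof above.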
22.2 Curl of a Vector Field
We already know one derivative operator that acts on vector fields – the divergence.
This turns a vector field into a function. However, one may also ask for an operation
that preserves the ‘tensor rank’ of a vector field – i.e. a differential operator that
turns vector fields into vector fields. One such operator is ‘curl’, denoted by ∇×.
Such an operator measures ‘vorticity’ (rotation) in a vector field – i.e. clockwise
or counterclockwise rotational flow behaviour.
We shall proceed by first giving a ‘geometric’ definition of the curl, then provide a
formula to calculate the curl of a vector field in Cartesian coordinates. N.B. – The
curl is only defined in three dimensions, just like the cross product (see footnote 147).
Definition 19 (Curl of a Vector Field) Given a vector field F, its curl ∇ × F is
defined implicitly as follows. Given a unit vector n̂ normal (orthogonal) to some
(imaginary) surface S, the component of ∇ × F in the direction of n̂ is defined as
the ‘circulation per unit area’ of F around the curve C = ∂S bounding S, as the
surface S collapses to a point:
\[ (\nabla\times\mathbf{F})\cdot\hat{\mathbf{n}} := \lim_{|S|\to 0}\frac{1}{|S|}\oint_{C=\partial S}\mathbf{F}\cdot d\mathbf{r}. \tag{513} \]
Here, the ‘circulation’ is the line integral $\oint_{C=\partial S}\mathbf{F}\cdot d\mathbf{r}$ (with positive anti-clockwise
orientation) of our vector field F around the curve C = ∂S bounding the surface
S.
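The geometric definition can be tried out directly. The sketch below assumes a planar field F = (−y³, x³) (an illustrative choice) and takes S to be a small disc in the x-y plane, so that n̂ is the z-direction. It computes the anticlockwise circulation around the bounding circle, divides by the disc area, and lets the radius shrink. For this field the z-component of the curl is 3x² + 3y².

```python
import math

def F(x, y):
    # Illustrative planar field; the z-component of its curl is 3x^2 + 3y^2.
    return (-y ** 3, x ** 3)

def circulation_per_area(field, centre, radius, steps=2000):
    """Anticlockwise line integral of field around a circle, divided by the disc area."""
    cx, cy = centre
    dt = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        fx, fy = field(x, y)
        # Along the circle, dr = radius * (-sin t, cos t) dt.
        total += (-fx * math.sin(t) + fy * math.cos(t)) * radius * dt
    return total / (math.pi * radius ** 2)

centre = (1.0, 0.5)
curl_z = 3 * centre[0] ** 2 + 3 * centre[1] ** 2
for radius in (1.0, 0.1, 0.01):
    print("radius", radius, "circulation/area", circulation_per_area(F, centre, radius))
print("Cartesian curl, z-component:", curl_z)
```

The ratio should approach the Cartesian value as the disc collapses. For a finite disc it equals the average of the curl over the disc, which is why the larger radii overshoot here.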
Intuitively, the ‘circulation’ (line integral defined above) of a vector field
measures how much a vector field ‘rotates’ (circulates) at any given point. Since the
components of the curl are the area density of some infinitesimal circulation (in
each direction), the curl of a vector field must be zero if it exhibits no rotational
behaviour (no circulation). Here are some pictures pillaged from the internet to
illustrate this.
Footnote 147: Such an operator is naturally extended to arbitrary dimensions by a
combination of the ‘exterior derivative’ and ‘Hodge dual’ operators.
Figure 5: Curl of a vector field F.
Figure 6: Projections (components) of the curl defined geometrically as infinitesi-
mal area densities of line integrals around the boundaries of imaginary (contrived)
surfaces.
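To get a clearer idea, we shall now give an algebraic definition of the curl as well
as some examples and problems. Given a vector field F = F1∂1 + F2∂2 + F3∂3 (see
footnote 148), expressed in the Cartesian basis {∂1, ∂2, ∂3} = {∂x, ∂y, ∂z} for R^3,
its curl is given by the following formula:
\[ \begin{aligned}
\nabla\times\mathbf{F} &= \epsilon^{ijk}(\partial_i F_j)\,\partial_k \\
&= \epsilon^{ijk}\left(\frac{\partial F_j}{\partial x^i}\right)\partial_k \\
&= \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\partial_x - \left(\frac{\partial F_3}{\partial x} - \frac{\partial F_1}{\partial z}\right)\partial_y + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\partial_z.
\end{aligned} \tag{514} \]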
Note that in the first line, we used the Einstein summation convention as well as the
‘Levi-Civita’ or ‘Permutation’ symbol – defined in earlier tutorials. These give an
elegant way to remember the curl. Alternatively, to remember the explicit formula
in the last line, you can think of the curl informally as the determinant of a 3-by-3
matrix with the standard basis vectors [∂x ∂y ∂z] in the first row, the partial
derivative operators [∂/∂x ∂/∂y ∂/∂z] in the second row and the components of F
in the third row: [F1 F2 F3]. The derivative operators in the second row act on the
components in the third row of the matrix (taking partial derivatives of F), whilst
the basis vectors in the first row multiply everything – this ensures that the result is
a vector field (not a function):
\[ \nabla\times\mathbf{F} = \begin{vmatrix}
\partial_x & \partial_y & \partial_z \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
F_1 & F_2 & F_3
\end{vmatrix}. \]
We now proceed to some examples and problems.
Example 21 (Curl of a Gradient) In an ideal gymnasium, nobody in their right
mind curls in the squat racks. Similar to the ideal gym squat rack, a gradient vector
field has ZERO CURL. This is an extremely important and fundamental property
of gradient vector fields – one which is responsible for the statement that the ‘work
done to move a particle in a conservative force field, is independent of the path
taken’. In other words, it relates to path independence of line integrals of gradient
vector fields (with some minor technicalities).
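To prove this statement, consider a function f of three Cartesian variables (x, y, z).
Its gradient vector field is given by:
\[ \nabla f = \left(\frac{\partial f}{\partial x}\right)\partial_1 + \left(\frac{\partial f}{\partial y}\right)\partial_2 + \left(\frac{\partial f}{\partial z}\right)\partial_3. \tag{515} \]
Footnote 148: Recall that $\partial_j := \frac{\partial}{\partial x^j}$ is both a differential operator (partial derivative)
and the standard basis vector in the direction of the Cartesian coordinate x^j.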
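The curl of this vector field is then given using the Cartesian formula, stated earlier:
\[ \begin{aligned}
\nabla\times(\nabla f) &= (\partial_2(\partial_3 f) - \partial_3(\partial_2 f))\,\partial_1 - (\partial_1(\partial_3 f) - \partial_3(\partial_1 f))\,\partial_2 + (\partial_1(\partial_2 f) - \partial_2(\partial_1 f))\,\partial_3 \\
&= \left(\frac{\partial}{\partial y}\frac{\partial f}{\partial z} - \frac{\partial}{\partial z}\frac{\partial f}{\partial y}\right)\partial_1 - \left(\frac{\partial}{\partial x}\frac{\partial f}{\partial z} - \frac{\partial}{\partial z}\frac{\partial f}{\partial x}\right)\partial_2 + \left(\frac{\partial}{\partial x}\frac{\partial f}{\partial y} - \frac{\partial}{\partial y}\frac{\partial f}{\partial x}\right)\partial_3 \\
&= 0.
\end{aligned} \tag{516} \]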
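Note that we are still using the notation $\partial_j = \frac{\partial}{\partial x^j}$. To get from the second line to
the third line in the above derivation, we had to make use of Clairaut’s theorem – i.e.
the fact that partial derivatives (in Cartesian coordinates) commute:
\[ \frac{\partial}{\partial x^j}\frac{\partial}{\partial x^k}f = \frac{\partial}{\partial x^k}\frac{\partial}{\partial x^j}f, \tag{517} \]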
for arbitrary j, k = 1, 2, 3. For this commuting property to hold, it suffices that the
second order partial derivatives of f exist and are continuous.
Although we chose Cartesian coordinates for R^3, we could have chosen any set of
coordinates (with an appropriate modification to the curl formula) and arrived at
the same general result:
\[ \nabla\times(\nabla f) = 0, \tag{518} \]
which holds ∀f satisfying the conditions of Clairaut’s theorem.
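As a concrete instance of this result, here is a worked check for one particular function, $f = x^2 y + y z^3$. The function is our own illustrative choice, not one taken from the tutorial. Each component of the curl is computed from the Cartesian formula:

```latex
\begin{aligned}
\nabla f &= 2xy\,\partial_x + (x^2 + z^3)\,\partial_y + 3yz^2\,\partial_z, \\
(\nabla\times\nabla f)_x &= \partial_y(3yz^2) - \partial_z(x^2 + z^3) = 3z^2 - 3z^2 = 0, \\
(\nabla\times\nabla f)_y &= \partial_z(2xy) - \partial_x(3yz^2) = 0 - 0 = 0, \\
(\nabla\times\nabla f)_z &= \partial_x(x^2 + z^3) - \partial_y(2xy) = 2x - 2x = 0.
\end{aligned}
```

In each component, the two terms that cancel are the same mixed second derivative of f taken in opposite orders, which is exactly where Clairaut's theorem enters.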
Alternatively, a very elegant and far more general proof can be found using exterior
calculus via the exterior derivative d and Hodge dual ⋆:
\[ d(df) = d^2 f = 0, \tag{519} \]
since d² = 0. Hopefully we can cover this in a future tutorial.
Problem 35 (Faraday’s Law of Induction) The Scottish Mathematical Physicist,
James Clerk Maxwell, is perhaps most immortalized through the ‘Maxwell
equations’ for electromagnetism. Technically, these laws were derived by other
scientists / mathematicians such as Faraday, Ampere and Gauss – however, their
vector-calculus form is due to the work of Maxwell. In this form, they are
implicitly relativistically invariant – a symmetry that helped spur the discovery and
development of special relativity.
One of Maxwell’s equations is a statement of Faraday’s law of induction. This
says that a time-varying magnetic field B is accompanied by an electric field E with
non-zero curl:
\[ \nabla\times\mathbf{E} = -\frac{\partial}{\partial t}\mathbf{B}. \tag{520} \]
Now recall that the ‘Coulomb field’ – i.e. an electrostatic field generated by a point
charge – can be written as the gradient vector field of some potential (tutorial 19):
\[ \mathbf{E} = -\nabla\left(\frac{Q}{4\pi\epsilon_0 r}\right), \tag{521} \]
where ε0 is the electric permittivity of free space.
Q1: Using an earlier result from this tutorial, prove that, as a consequence of
Faraday’s law, a point charge cannot generate a time-varying magnetic field. In
other words, show that any magnetic field arising from a point charge is necessarily
static. Hint: Static here means that $\frac{\partial}{\partial t}\mathbf{B} = 0$.
Q2: Instead of the electric field generated by a point charge, we now consider the
following static electric field:
\[ \mathbf{E} = -y\,\partial_x + x\,\partial_y. \tag{522} \]
Draw a graph of this electric field in the x-y plane, then compute its curl:
\[ \nabla\times\mathbf{E} = \qquad \tag{523} \]
Now, using Faraday’s Law, solve the resulting vector differential equation for the
magnetic field B.
Hint: The differential equation is trivial. All you need to do is integrate over time
t. Since the electric field is static, such an integration is simple.
Q3: Repeat the previous question, this time adding a harmonic time dependence to
the electric field:
\[ \mathbf{E} = -y e^{i\omega t}\,\partial_x + x e^{i\omega t}\,\partial_y, \tag{524} \]
where ω is the angular frequency of the electric field.
23 Tutorial 21: Coordinate Systems and Scale Factors
By now, all of you will have come across more than one type of ‘coordinate sys-
tem’. For example, in two dimensions you will have used rectangular (Cartesian)
coordinates, (x, y), as well as ‘polar coordinates’, (r, θ). Depending on the sym-
metries of your problem, each coordinate system would have had its advantages
and disadvantages.
In general, there are an infinite number of coordinate systems you could use to set-
up a problem. However, for spaces such as R2 and R3 equipped with a Euclidean
metric, there is a special (finite) class of coordinate systems known as ‘separable’
coordinate systems. Such a term arises from the fact that the Laplace operator
is separable in these systems – meaning that the Laplace equation is a differen-
tial equation that can be solved by ‘separation of variables’. More generally, the
Hamilton-Jacobi equations are separable in such coordinate systems.
In this tutorial, we will explore a few examples of different 2-dimensional and 3-
dimensional coordinate systems. In particular, we will develop several concepts
and ideas in a ‘geometrical viewpoint’. This should help you gain some physical
intuition behind objects such as the ‘Jacobian determinant’ as well as change-of-
variables.
23.1 Orientation and Measure
Recall that if we are integrating a function f = f(x) of one real variable x ∈ R,
we write:
\[ I = \int_L f(x)\,dx. \tag{525} \]
The L here denotes the subset of the real line R which we integrate over. For most
integrals, this is just some interval L = [a, b] (see footnote 149). One way to view the
process of integration is in terms of an operator (a ‘measure’) acting on a function. In
particular, we can view the previous integral I as the operator
\[ \int_L dx \tag{526} \]
acting on the function f. This assigns some value to f – its measure (Riemann
integral) on the set L (e.g. over an interval L = [a, b]). Such an abstraction
turns out to be very powerful and useful – for example, in the study of probability
measures. Collectively, it is part of a beautiful area of mathematics known as
‘measure theory’.
Footnote 149: Note that strictly speaking, it doesn’t matter whether or not you include
the endpoints – if the integral exists / converges, then you can take a limit which
features the endpoints.
Fundamental to the construction of a measure $\int_L dx$ on the real line R is the
existence of the object ‘dx’. By now, you should know that this is not just some hazily
defined ‘infinitesimal’ quantity along the x-axis – it is in fact a well-defined
‘differential 1-form’. Geometrically, not only does it represent an infinitesimal line
element in the x direction, it also represents an ‘orientation’ on the x-axis. This
orientation is in the positive x direction. Technically, we could assign an opposite
orientation by defining dl = −dx in our definition of the measure, $\int_L dl$.
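The ‘integral as an operator’ viewpoint, together with the role of orientation, can be mimicked in a few lines of code. This is only a sketch under simplifying assumptions: the set L is an interval, the operator is approximated by a midpoint Riemann sum, and the name `measure` is hypothetical.

```python
def measure(a, b, n=10000):
    """Return the operator f -> integral of f from a to b (midpoint rule, n cells).

    Passing a > b traverses the same interval with the opposite orientation,
    which corresponds to using dl = -dx.
    """
    def apply(f):
        dx = (b - a) / n  # signed cell width: its sign carries the orientation
        return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx
    return apply

def square(x):
    return x * x

forward = measure(0.0, 1.0)   # the operator "integrate over [0, 1] with dx"
backward = measure(1.0, 0.0)  # same interval, opposite orientation

print(forward(square))   # approximately  1/3
print(backward(square))  # approximately -1/3
```

The operator is built once from the set L and its orientation, and can then be applied to any function, which is the sense in which the text separates the measure from the function it acts on.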
For one-dimensional integrals, the notion of orientation may seem trivial. How-
ever, generalizing to integrals over surfaces, volumes and general n-dimensional
oriented manifolds, there are always two possible choices of ‘orientation’ defined
– this is encoded in an object known as the ‘orienting n-form’.
23.2 Smooth Curves
When we perform an integral ∫ f(x) dx, we are integrating the function f along
the x-axis representing the real line R. However, in general, one can integrate
functions along an arbitrary curve. Such integrals form a class known as line inte-
grals.
A curve is a 1-dimensional manifold, meaning it can be parametrised by one vari-
able. In a Euclidean space such as R2 or R3, we can represent this curve by the
functional equations (in the standard Cartesian basis):
γ(t) = x(t)e1 + y(t)e2, t ∈ L (527)
and
γ(t) = x(t)e1 + y(t)e2 + z(t)e3, t ∈ L (528)
where ej are the standard basis vectors in the x, y and z directions and L is some
subset of R – e.g. L = [0, 1] or L = R itself. In this manner, we can view γ(t)
as the position vector for some motion. This means, that for each value of t, we
assign a vector γ(t) which starts at the origin 0 and points to some location on the
curve (see footnote 150).
Footnote 150: Such a construction is only possible for affine spaces – such as Euclidean
vector spaces. In general, one must be more subtle in defining and representing a curve.
At each point γ(t0) on the (smooth) curve γ, there is a unique vector tangent to
the curve at that point. This vector, also known as the ‘velocity vector’, is given
by:
\[ \dot{\gamma}(t_0) = \left.\frac{d\gamma}{dt}\right|_{t=t_0} = \lim_{\Delta t\to 0}\frac{\gamma(t_0 + \Delta t) - \gamma(t_0)}{\Delta t}. \tag{529} \]
Exercise 73 (Constructing a Tangent Vector Field) At this point, the dissatisfied
tutorial member asks for their money back – not having gained any geometrical
intuition. To this end, they are given the following exercise to construct a tangent
vector field.
1. Take Cartesian coordinates x, y, z for R3 and draw a curve γ(t) = (x(t), y(t), z(t))
in R3.
2. Label two separate parameter values t0 and t1 along the curve and draw the
corresponding position vectors γ(t0) and γ(t1). Note that
both these vectors start at the origin and point to (x(t0), y(t0), z(t0)) and
(x(t1), y(t1), z(t1)), respectively.
3. Draw the displacement vector, ∆γ = γ(t1) − γ(t0), using your vector sub-
traction rules.
4. Recall that the linearisation of a function f about some point t is given by
its first order Taylor expansion:
$$ f(t + \Delta t) \approx f(t) + \frac{df}{dt}\,\Delta t. \tag{530} $$
Similarly, we can define the linearisation of the curve γ about a point t as
follows:
γ(t + ∆t) := x(t + ∆t)e1 + y(t + ∆t)e2 + z(t + ∆t)e3. (531)
Using the previous result (530), expand each of the coordinates x(t), y(t), z(t)
about the point t on the curve. Hence obtain an expression for γ(t + ∆t)
and collect the coefficients for each of the standard basis vectors ej.
5. Using your previous result, simplify the expression on the right-hand side of
the following:
∆γ = γ(t + ∆t) − γ(t). (532)
Your result should involve a factor of ∆t multiplying everything on the right-
hand side.
6. Dividing both sides by ∆t, you should get a vector on the right-hand side
which does not involve ∆t:
$$ \frac{\Delta\gamma}{\Delta t} = \frac{\gamma(t + \Delta t) - \gamma(t)}{\Delta t}. \tag{533} $$
Now taking the limit ∆t → 0, we can define:
$$ \dot{\gamma}(t) = \lim_{\Delta t \to 0} \frac{\Delta\gamma}{\Delta t}. \tag{534} $$
If you’ve done this exercise correctly, you should see that:
$$ \dot{\gamma}(t) = \frac{d\gamma}{dt}(t) = \frac{dx}{dt}\,e_1 + \frac{dy}{dt}\,e_2 + \frac{dz}{dt}\,e_3. \tag{535} $$
7. In your diagram, let ∆t = t1 − t0. If you imagine the limit t1 → t0 from above (equivalently,
∆t → 0) you will see that the displacement vector ∆γ = γ(t1) − γ(t0)
becomes parallel (tangent) to the curve at t0. Hence the
‘velocity vector’ dγ/dt at t0 and the tangent vector to the curve at t0 are the same
geometrical object.
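The limit in steps 6 and 7 can also be checked numerically. The following is a minimal Python sketch, assuming a helix γ(t) = (cos t, sin t, t) as the example curve; the curve, the sample point t0 = 0.7 and the function names are illustrative choices, not part of the exercise. It compares the difference quotient ∆γ/∆t with the componentwise derivative of (535) as ∆t shrinks.

```python
import math


def gamma(t):
    """Example curve (an assumption for illustration): a helix in R^3."""
    return (math.cos(t), math.sin(t), t)


def gamma_dot_exact(t):
    """Componentwise derivative (dx/dt, dy/dt, dz/dt) of the helix, as in (535)."""
    return (-math.sin(t), math.cos(t), 1.0)


def difference_quotient(curve, t, dt):
    """Displacement vector divided by dt: (curve(t + dt) - curve(t)) / dt."""
    p0, p1 = curve(t), curve(t + dt)
    return tuple((b - a) / dt for a, b in zip(p0, p1))


if __name__ == "__main__":
    t0 = 0.7
    exact = gamma_dot_exact(t0)
    for dt in (1e-1, 1e-3, 1e-5):
        approx = difference_quotient(gamma, t0, dt)
        error = max(abs(a - e) for a, e in zip(approx, exact))
        print(f"dt = {dt:g}: max component error = {error:.2e}")
```

The printed error should shrink roughly in proportion to ∆t, which is the numerical counterpart of the displacement vector lining up with the tangent.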
If we now wish to consider integrals along an arbitrary curve γ, we need an ‘orienting
1-form’ along the curve. Recalling the integral of a function f of one variable
along the x-axis, we had dx as our orienting 1-form. For a curve γ, the orienting
1-form will geometrically correspond to an infinitesimal displacement along
the curve. From the previous exercise, we already know that infinitesimal changes
along a curve γ are represented by the tangent (velocity) vector field dγ/dt. In
particular, the magnitude (norm) of the tangent vector field:
$$ \|\dot{\gamma}(t)\| = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \dots}, \tag{536} $$
is simply the rate at which length along the curve changes with t (the ‘speed’), in the direction
of increasing t. Hence, an infinitesimal displacement dl along the curve is given
by:
$$ dl = \|d\gamma\| = \left\|\frac{d\gamma}{dt}(t)\right\| dt. \tag{537} $$
This is simply a consequence of the chain rule – or in geometric terms, a conse-
quence of the graph you drew in the previous exercise. The object dl given by
(537), is the orienting 1-form for the curve γ. It is also known as the ‘line-element’
along the curve – as such, it is a generalization of the orienting 1-form dx along
the x-axis, considered earlier. Note that the structure of the equation (537) takes
the form:
dl = (Some Factor) × (Infinitesimal Change in Some Parameter). (538)
Here our curve γ was parametrised by t, with the corresponding 1-form dt. The
factor appearing in front of dt was given by the magnitude ‖dγ/dt (t)‖ of the tangent
(velocity) vector to the curve γ at the point t. Such a factor is called a ‘scale factor’,
usually denoted hγ. A scale factor serves to turn the 1-form dt into an infinitesimal
length dl. As such, one may assign the scale factor units of length and treat dt as
dimensionless[151]. Furthermore, dl encodes an orientation for the curve γ: positive
in the direction of increasing t. If we defined dl as the negative of ‖dγ/dt (t)‖ dt, we
would get the reverse orientation.
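To see the line element at work, one can add up dl = ‖dγ/dt‖ dt along a curve to obtain its length. The sketch below is a hedged illustration only: it assumes a circle of radius 2 as the curve and uses a simple midpoint sum (both are choices made here, not taken from the text). Running the parameter backwards flips the sign, which reflects the orientation encoded by dl.

```python
import math


def speed(gamma_dot, t):
    """Scale factor h_gamma(t): the norm of the tangent vector d(gamma)/dt."""
    return math.sqrt(sum(c * c for c in gamma_dot(t)))


def arc_length(gamma_dot, t_start, t_end, steps=10_000):
    """Midpoint-rule sum of the line element dl = |d(gamma)/dt| dt."""
    dt = (t_end - t_start) / steps
    total = sum(speed(gamma_dot, t_start + (k + 0.5) * dt) for k in range(steps))
    return total * dt


def circle_dot(t, radius=2.0):
    """Tangent vector of gamma(t) = (R cos t, R sin t); R = 2 is an assumed example."""
    return (-radius * math.sin(t), radius * math.cos(t))


if __name__ == "__main__":
    forward = arc_length(circle_dot, 0.0, 2 * math.pi)
    backward = arc_length(circle_dot, 2 * math.pi, 0.0)
    print("forward :", forward)   # circumference 2*pi*R
    print("backward:", backward)  # same magnitude, opposite sign
```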
Having constructed a way to get infinitesimal length elements along arbitrary curves
(along with scale factors), one may generalize this to obtain infinitesimal area elements
for smooth, orientable surfaces. Later we will look at the notion of an ‘orienting
2-form’ – an object which encodes both the orientation of a surface and
an infinitesimal notion of ‘area’ on that surface. This allows us to perform
integration on surfaces and hence define measures on them (such as the ‘surface
area’).
A surface is a 2-dimensional manifold. This means that it can be parametrised
by two variables, (s, t). For most problems, you will look at surfaces which are
embedded in R3 – as such, their graphs will be specified by a set of three coordinates
parametrized by two variables: (x(s, t), y(s, t), z(s, t)). For now, note that when
we change coordinates (x, y) from Cartesian coordinates to another set of coordi-
nates s, t, the infinitesimal area element dx ∧ dy (orienting 2-form) has to change
also:
dx ∧ dy → |J|ds ∧ dt. (539)
Here, the quantity |J| is the Jacobian determinant. For orthogonal coordinate sys-
tems, this is simply equal to the product of the scale factors in each direction ds
and dt:
|J|= hsht. (540)
It arises because in Cartesian coordinates, dxdy represents the area of an infinitesimal
rectangle. However, in a new coordinate system (s, t), the element dsdt may
not represent an infinitesimal area (for example, it may have the wrong units).
Hence to turn dsdt into an area, we need to multiply it by an appropriate scaling
function (the Jacobian determinant). Another way to look at this is to note that an
infinitesimal area in the new coordinate system is given by the formula:
dls ∧ dlt = hsds ∧ htdt = hshtds ∧ dt, (541)
where dls and dlt are defined as above. We will investigate this in more detail in a
future tutorial, but for now it suffices to remember the relation (540) between the
Jacobian determinant and the coordinate scale factors.
[151] Alternatively, if dγ/dt is a physical velocity, one would assign units of length/time to its magnitude
and assign units of time to dt.
To get some operational understanding of orienting forms and how they arise in
different coordinate systems, consider the following example.
Example 22 (Parabolic Coordinates) In two-dimensions, instead of using Carte-
sian coordinates (x, y), one may choose to use parabolic coordinates. Parabolic
coordinates are useful for problems with some sort of parabolic symmetry – for
example, investigating the ‘Stark effect’ (splitting of the spectral lines of an atom
in a strong electric field).
Parabolic coordinates[152] (σ, τ) are a two-dimensional orthogonal coordinate system,
in which the coordinate curves are parabolas. Such coordinates are defined
implicitly as follows:
$$ x = \sigma\tau, \qquad y = \frac{1}{2}\left(\tau^2 - \sigma^2\right). \tag{542} $$
Eliminating τ, we see that curves of constant σ correspond to confocal parabolas
(parabolas with the same focus) opening upward in the positive y direction:
$$ y = \frac{1}{2\sigma^2}\,x^2 - \frac{1}{2}\,\sigma^2. \tag{543} $$
Similarly, eliminating σ, curves of constant τ correspond to confocal parabolas
opening downward in the negative y direction:
$$ y = -\frac{1}{2\tau^2}\,x^2 + \frac{1}{2}\,\tau^2. \tag{544} $$
We now wish to derive the scale factors, hτ , hσ, corresponding to infinitesimal dis-
placements dτ and dσ in the τ and σ directions, respectively. To do this, consider
a curve along the τ coordinate – meaning that we keep σ constant. We can write
this curve as
$$ \gamma(t) = (x(t), y(t)) = \left(\sigma\,\tau(t),\ \frac{1}{2}\left(\tau(t)^2 - \sigma^2\right)\right), \tag{545} $$
[152] Note that τ is the Greek letter ‘tau’, not the variable t.
where τ = τ(t) is a function of the parameter t. Recalling our expression dl =
hγ dt = ‖dγ(t)/dt‖ dt for an infinitesimal displacement along a curve γ, we can work
out the scale factor hτ for the τ coordinate by getting the magnitude of a vector
tangent to the τ coordinate curves. Mathematically, we have:
$$ \frac{d\gamma}{dt} = \left(\sigma\frac{d\tau}{dt},\ \tau\frac{d\tau}{dt}\right), \tag{546} $$
using the product and chain rules, together with the fact that σ is constant along the τ coordinate curves.
Hence, the magnitude of this vector is given by:
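$$ \left\|\frac{d\gamma}{dt}\right\| = \sqrt{(\sigma^2 + \tau^2)\left(\frac{d\tau}{dt}\right)^2} = \sqrt{\sigma^2 + \tau^2}\,\frac{d\tau}{dt}. \tag{547} $$
Here we have taken dτ/dt > 0, so that τ increases with t.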
Hence, using the chain rule, we see that the infinitesimal length dlτ in the τ direc-
tion is given by:
$$ dl_\tau = \left\|\frac{d\gamma}{dt}\right\| dt = \sqrt{\sigma^2 + \tau^2}\,\frac{d\tau}{dt}\,dt = \sqrt{\sigma^2 + \tau^2}\,d\tau. \tag{548} $$
The coefficient of dτ is identified to be the scale factor hτ corresponding to τ.
Hence we have:
$$ h_\tau = \sqrt{\sigma^2 + \tau^2}. \tag{549} $$
By considering the σ coordinate curves (curves of constant τ), we can derive the
scale factor hσ in the same way. The result is:
$$ h_\sigma = \sqrt{\sigma^2 + \tau^2} = h_\tau. \tag{550} $$
With these results in mind, the orienting area 2-form in parabolic coordinates is
given by:
$$ dA = |J|\,d\sigma \wedge d\tau = h_\sigma h_\tau\,d\sigma \wedge d\tau = (\sigma^2 + \tau^2)\,d\sigma \wedge d\tau. \tag{551} $$
The Jacobian determinant |J| = hσhτ = σ² + τ² represents how the notion of
area is warped in a parabolic coordinate system.[153]
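The results of this example can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions: the tangent vectors are the partial derivatives of (x, y) = (στ, (τ² − σ²)/2) written out by hand, and the function names and the sample point (σ, τ) = (1.5, 0.5) are chosen here purely for illustration. It checks that both scale factors equal √(σ² + τ²), that the two coordinate directions are orthogonal, and that the 2×2 Jacobian determinant equals hσhτ.

```python
import math


def tangent_sigma(sigma, tau):
    """(dx/dsigma, dy/dsigma) at fixed tau, for x = sigma*tau, y = (tau^2 - sigma^2)/2."""
    return (tau, -sigma)


def tangent_tau(sigma, tau):
    """(dx/dtau, dy/dtau) at fixed sigma, matching (546) up to the factor dtau/dt."""
    return (sigma, tau)


def norm(v):
    """Euclidean norm of a 2-vector."""
    return math.hypot(v[0], v[1])


def dot(u, v):
    """Euclidean inner product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]


def jacobian_det(sigma, tau):
    """Determinant of the matrix whose columns are the two tangent vectors."""
    xs, ys = tangent_sigma(sigma, tau)
    xt, yt = tangent_tau(sigma, tau)
    return xs * yt - xt * ys


if __name__ == "__main__":
    s, t = 1.5, 0.5
    print("h_sigma      :", norm(tangent_sigma(s, t)))
    print("h_tau        :", norm(tangent_tau(s, t)))
    print("sqrt(s^2+t^2):", math.sqrt(s * s + t * t))
    print("orthogonality:", dot(tangent_sigma(s, t), tangent_tau(s, t)))
    print("|J|          :", jacobian_det(s, t), "vs", s * s + t * t)
```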
Using the previous example as a template, consider now your familiar and well-
loved 2-dimensional polar coordinates (r, θ).
[153] Note that areas themselves are geometrical quantities. Therefore, they do not depend on the
choice of coordinate system. How we measure and compute areas, however, does change.
Exercise 74 (Arctic Renaissance) During a performance of the guitar orchestra
piece ‘Arctic Renaissance’, a lost bipolar polar bear wanders into the St. George’s
College Mathematical Sciences Tutorials. It turns out that the polar bear is lost
because it did not take into account the scale factors in a polar coordinate system
– thus grossly miscalculating its journey.
I: To help the bipolar bear find his way home, consider the change of variables
(x, y) → (r, θ) defined by:
x = r cos(θ), y = r sin(θ). (552)
Now derive the scale factors hr and hθ for each of the coordinate curves, r and θ.
II: Draw a picture illustrating the relation between dlθ = hθdθ and dθ. You should
see that dlθ is simply the formula for the length of an infinitesimal circular arc.
III: Compute the orienting area 2-form for polar coordinates: dA = |J|dr ∧ dθ.
Illustrate this with a diagram showing the area of an infinitesimal circular wedge
(actually, an incomplete annulus).
IV: For those of you who have studied Jacobian maps (matrices), compute the
Jacobian matrix for the change of variables (x, y) → (r, θ). Now show that its
determinant is indeed given by |J|= r.
Hint: Ask your tutor for help!
Exercise 75 (Temporal Epilepsy) Having heard about the successful return of
the bipolar polar bear to his homeland, an epileptic ellipse named ‘Eclectic’ walks
into the SGC Mathematical Sciences Tutorial. It turns out that the ellipse’s epilepsy
comes from having the incorrect scale factors for elliptical coordinates programmed
into its DNA (a result of natural radiation-induced mutations). To help Eclectic,
consider an elliptical coordinate system (µ, ν) defined by:
x = a cosh(µ) cos(ν), y = a sinh(µ) sin(ν). (553)
I: Show that the µ coordinate lines (curves of constant ν) form hyperbolae. You can
do this by eliminating µ from the above equations, as well as using the identity:
$$ \cosh^2(\mu) - \sinh^2(\mu) = 1. \tag{554} $$
Similarly, show that the ν coordinate lines (curves of constant µ) form ellipses.
Hint: For the µ coordinate lines, you should arrive at the equation:
$$ \frac{x^2}{a^2\cos^2\nu} - \frac{y^2}{a^2\sin^2\nu} = \cosh^2\mu - \sinh^2\mu = 1. \tag{555} $$
Note that these ellipses and hyperbolae are confocal – i.e. they have common foci
located at x = −a and x = a on the x-axis.
II: Using the same approach as the example in Parabolic coordinates, derive the
orienting 1-forms dlν and dlµ. From these, work out the scale factors hν and hµ
for the elliptic coordinate system.
Hint: You should get:
$$ h_\mu = h_\nu = a\sqrt{\sinh^2\mu + \sin^2\nu} = a\sqrt{\cosh^2\mu - \cos^2\nu}. \tag{556} $$
III: Using the scale factors derived, compute the orienting area 2-form for ellipti-
cal coordinates:
dA = |J|dµ ∧ dν = hµhνdµ ∧ dν. (557)
If you can, draw a diagram illustrating the infinitesimal area element (in a manner
similar to what you did for the polar coordinates).
IV: For those of you who have studied Jacobian matrices, show that the deter-
minant of the Jacobian matrix for the transformation: (x, y) → (µ, ν) is given
by:
$$ |J| = a^2\left(\sinh^2\mu + \sin^2\nu\right). \tag{558} $$
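Before attempting the derivations in this exercise, it can help to confirm numerically that the hinted formulas are at least self-consistent. The following Python sketch makes some assumptions of its own: the values a = 2, µ = 0.8, ν = 0.6 and all function names are arbitrary illustrative choices. It checks that a point given by (553) lies on the hyperbola (555) and on the corresponding ellipse, and that the two forms of the scale factor in (556) agree.

```python
import math


def elliptic_to_cartesian(mu, nu, a=1.0):
    """Map (mu, nu) to (x, y) via x = a cosh(mu) cos(nu), y = a sinh(mu) sin(nu)."""
    return (a * math.cosh(mu) * math.cos(nu), a * math.sinh(mu) * math.sin(nu))


def hyperbola_residual(mu, nu, a=1.0):
    """x^2/(a cos nu)^2 - y^2/(a sin nu)^2 - 1, which should vanish (constant nu)."""
    x, y = elliptic_to_cartesian(mu, nu, a)
    return x**2 / (a * math.cos(nu)) ** 2 - y**2 / (a * math.sin(nu)) ** 2 - 1.0


def ellipse_residual(mu, nu, a=1.0):
    """x^2/(a cosh mu)^2 + y^2/(a sinh mu)^2 - 1, which should vanish (constant mu)."""
    x, y = elliptic_to_cartesian(mu, nu, a)
    return x**2 / (a * math.cosh(mu)) ** 2 + y**2 / (a * math.sinh(mu)) ** 2 - 1.0


def scale_factor_forms(mu, nu, a=1.0):
    """The two equivalent expressions for h_mu = h_nu quoted in the hint (556)."""
    first = a * math.sqrt(math.sinh(mu) ** 2 + math.sin(nu) ** 2)
    second = a * math.sqrt(math.cosh(mu) ** 2 - math.cos(nu) ** 2)
    return first, second


if __name__ == "__main__":
    mu, nu, a = 0.8, 0.6, 2.0
    print("hyperbola residual:", hyperbola_residual(mu, nu, a))
    print("ellipse residual  :", ellipse_residual(mu, nu, a))
    print("scale factor forms:", scale_factor_forms(mu, nu, a))
```

Note that the check needs ν away from multiples of π/2 and µ ≠ 0, since the residuals divide by cos ν, sin ν and sinh µ.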
In the next tutorial, we will see how these results can be used to perform line
integrals (integrals along curves), surface integrals (integrals along surfaces) and
integrals over arbitrary submanifolds of Rn (e.g. volume integrals in R3). In this
manner, we will formalize the notion of the ‘orienting area 2-form’ and generalize
it to give an ‘orienting volume n-form’.
24 Tutorial 22: Line Integrals and Exterior Calculus
In the last tutorial, we reviewed the concept of tangent vector fields in their relation
to the coordinate curves generating a coordinate system. In particular, we saw that simple
geometric considerations could help us to understand how ‘scale factors’ arise
in morphing infinitesimal length elements from one coordinate system to a new
coordinate system. Such scaling functions appeared as the norms (magnitude) of
the tangent vector fields associated to the coordinate curves in the new coordinate
system.
In particular, given a change of variables between Cartesian coordinates and some
new orthogonal coordinate system, (x, y, z) → (u1, u2, u3), our infinitesimal length
elements change from (dx, dy, dz) to (h1du1, h2du2, h3du3) where the functions
hj (j = 1, 2, 3) are the ‘scale factors’ for the transformation. Such scaling fac-
tors turn infinitesimal changes in the coordinates (du1, du2, du3) into infinitesimal
length changes (dl1 = h1du1, ..., dl3 = h3du3). As we saw, they can be computed as
the norms of the tangent vector fields to the coordinate curves – viewing u1, u2, u3
each as a curve parametrized by a single variable:
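$$ h_j = \|\dot{u}_j\| = \sqrt{\left(\frac{\partial x}{\partial u_j}\right)^2 + \left(\frac{\partial y}{\partial u_j}\right)^2 + \left(\frac{\partial z}{\partial u_j}\right)^2}. \tag{559} $$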
Here we have represented the coordinate uj by the curve: γuj = (x(uj), y(uj), z(uj)),
which is parametrized by uj.
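Formula (559) translates directly into a small numerical routine. The sketch below is an illustration rather than a definitive implementation: it approximates the partial derivatives by central differences (a choice made here) and, as an assumed test case, uses cylindrical coordinates x = ρ cos φ, y = ρ sin φ, z = z, for which the scale factors are 1, ρ and 1. The helper names are hypothetical.

```python
import math


def cylindrical_to_cartesian(rho, phi, z):
    """Assumed example transformation: cylindrical (rho, phi, z) to Cartesian (x, y, z)."""
    return (rho * math.cos(phi), rho * math.sin(phi), z)


def scale_factors(transform, coords, step=1e-6):
    """Approximate h_j = sqrt(sum_i (d x_i / d u_j)^2) using central differences."""
    factors = []
    for j in range(len(coords)):
        plus, minus = list(coords), list(coords)
        plus[j] += step
        minus[j] -= step
        p, m = transform(*plus), transform(*minus)
        # Tangent vector to the u_j coordinate curve, component by component.
        tangent = [(a - b) / (2 * step) for a, b in zip(p, m)]
        factors.append(math.sqrt(sum(c * c for c in tangent)))
    return factors


if __name__ == "__main__":
    rho, phi, z = 2.0, 0.3, 1.0
    print(scale_factors(cylindrical_to_cartesian, (rho, phi, z)))
    print("expected approximately:", [1.0, rho, 1.0])
```

The same `scale_factors` function can be pointed at any other transformation written as a Python function of the new coordinates.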
In this tutorial, we will see how to make use of line-elements and tangent vector
fields to perform integrations along smooth curves (so-called ‘line-integrals’). We
will then set up some basic results from exterior calculus to give a natural extension
of these ideas to surface integrals and integrals over arbitrary manifolds (given
some atlas / set of coordinate charts).
24.1 Exterior Product and Derivatives
Recall that given the standard basis (∂x, ∂y, ∂z) for R3 (unit vectors tangent to the
x, y and z coordinate curves), one has the corresponding dual basis (dx, dy, dz)
consisting of dual vectors (differential 1-forms). Since (dx, dy, dz) is an orthonormal
basis (with respect to the Euclidean metric), we can write any differential 1-form
ω as:
ω = adx + bdy + cdz, (560)
where a, b, c are some unique set of coefficients (the components of ω in the basis
(dx, dy, dz)). By their definition, the dual vectors obey the axioms of a vector
space or ‘linear space’. For example, linearity:
λ1(adx + bdy + cdz) = λ1adx + λ1bdy + λ1cdz (561)
and
(a1dx+b1dy+c1dz)+(a2dx+b2dy+c2dz) = (a1+a2)dx+(b1+b2)dy+(c1+c2)dz,
(562)
where λ1, aj, bj and cj are all scalars (constants).
Now recall that we defined the exterior (wedge) product ∧ as a binary operation on
the space of differential 1-forms. It satisfies the antisymmetry property:
ω ∧ η = −η ∧ ω, (563)
as well as the (bi)linearity property:
hω ∧ (fη + gβ) = fhω ∧ η + ghω ∧ β, (564)
where ω, η, β are differential 1-forms and f, g, h are functions. Note that the jux-
taposition fh denotes the multiplication of two functions – which is defined point-
wise: (fh)(r) = f(r)h(r) for r = (x, y, z) ∈ R3.
Given two differential 1-forms, ω and β, their exterior product, ω ∧ β, is a differ-
ential 2-form. Under usual ‘vector addition’, the space of differential 2-forms on
R3 is also a linear space – meaning, that the addition of differential 2-forms is a
linear operation. A basis for this linear space is given by the following differential
2-forms:
dx ∧ dy, dy ∧ dz, dz ∧ dx. (565)
Example 23 (A Sound Basis) To show that the previous basis is indeed a basis for
the linear space of differential 2-forms in 3-dimensional Euclidean space, we note
that the exterior product of two differential 1-forms is necessarily a differential
2-form (possibly zero). Therefore, if we take two arbitrary differential 1-forms,
compute their exterior product and simplify the result, we should be left with:
Some Differential 2-form = Coefficients × Some Basis Differential 2-form.
(566)
So in particular, using linearity, we note that the basis for differential 2-forms is
necessarily generated by taking exterior products of the basis differential 1-forms.
Looking at all possibilities, we have:
dx ∧ dx = 0, dx ∧ dy, dx ∧ dz = −dz ∧ dx, (567)
dy ∧ dx = −dx ∧ dy, dy ∧ dy = 0, dy ∧ dz, (568)
dz ∧ dx, dz ∧ dy = −dy ∧ dz, dz ∧ dz = 0. (569)
Here we have made use of the anti-symmetry property of the exterior product, as
well as its consequence: ω ∧ ω = 0 for any differential 1-form ω. Thus, up to a ±
sign (which we can discard), we are left with three unique possibilities:
dx ∧ dy, dy ∧ dz, dz ∧ dx, (570)
as a basis for the linear space (denoted henceforth) Λ2(R3), of differential 2-forms
on R3. This means that given some differential 2-form ω, we can write it as:
ω = ω1dx ∧ dy + ω2dy ∧ dz + ω3dz ∧ dx, (571)
where ωj are the components (functions – possibly constant) of ω in the standard
basis for Λ2(R3).
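The bookkeeping in (567)–(569) can be mirrored in a short Python sketch. The index convention 0, 1, 2 for dx, dy, dz and the function names are assumptions made for this illustration. Each of the nine products of basis 1-forms is reduced to 0 or to ± one of the cyclically ordered 2-forms of (570), using only antisymmetry.

```python
from itertools import product

NAMES = ("dx", "dy", "dz")
# Cyclic ordering convention of (570): dx^dy, dy^dz, dz^dx.
CYCLIC_PAIRS = {(0, 1), (1, 2), (2, 0)}


def reduce_basis_wedge(i, j):
    """Return (sign, pair) with NAMES[i] ^ NAMES[j] = sign * (cyclically ordered pair).

    A repeated index gives (0, None), since omega ^ omega = 0 for a 1-form.
    """
    if i == j:
        return 0, None
    if (i, j) in CYCLIC_PAIRS:
        return 1, (i, j)
    return -1, (j, i)  # antisymmetry: swapping the factors costs a sign


def surviving_basis():
    """Distinct non-zero 2-forms (as index pairs) left after the reduction."""
    found = set()
    for i, j in product(range(3), repeat=2):
        sign, pair = reduce_basis_wedge(i, j)
        if sign != 0:
            found.add(pair)
    return sorted(found)


if __name__ == "__main__":
    for i, j in product(range(3), repeat=2):
        sign, pair = reduce_basis_wedge(i, j)
        lhs = f"{NAMES[i]} ^ {NAMES[j]}"
        if sign == 0:
            print(f"{lhs} = 0")
        else:
            prefix = "" if sign > 0 else "-"
            print(f"{lhs} = {prefix}{NAMES[pair[0]]} ^ {NAMES[pair[1]]}")
    print("basis index pairs:", surviving_basis())
```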
Problem 36 (Supreme Commander) Once upon a midnight dreary, while he pon-
dered weak and weary over the loss of his exterior bases in Supreme Commander,
the Senior student-to-be, Zac Menschelli, nodded nearly napping. Suddenly there
came a tapping, as of someone gently rapping, rapping on his chamber door – it
turned out to be Georgie, the college raven. Having attended the mathematical
sciences study group, Georgie tells Zac he needs to construct new bases at a
new set of coordinates.
Choose spherical coordinates (r, θ, φ), defined implicitly via:
x = r cos(θ) sin(φ), y = r sin(θ) sin(φ), z = r cos(φ). (572)
Here r is the radial coordinate (0 ≤ r < ∞), θ is the longitudinal angle (0 ≤ θ ≤
2π) and φ is the polar angle measured from the z-axis (0 ≤ φ ≤ π).
I: Help Zac construct a basis for Λ1(R3), the space of differential 1-forms on R3,
in spherical coordinates. Note that there are two ways you can do this.
1. Write down the ‘obvious’ basis.
2. Derive the basis, first by starting with the fact that dx, dy, dz is a basis –
then using the total differential formula (exterior derivative of a function):
$$ df(r, \theta, \phi) = \frac{\partial f}{\partial r}\,dr + \frac{\partial f}{\partial \theta}\,d\theta + \frac{\partial f}{\partial \phi}\,d\phi, \tag{573} $$
to evaluate dx, dy and dz explicitly in terms of r, θ, φ and dr, dθ, dφ.
II: Using a similar argument to the previous example, derive a basis for Λ2(R3),
the space of differential 2-forms on R3, in spherical coordinates. Alternatively,
start with the basis {dx∧dy, dy∧dz, dz∧dx} and do a ‘change of variables’ – i.e.
substitute in your expressions for dx,dy and dz in terms of spherical coordinates.
Challenge III: Generalizing what you have done so far in what you think is the
‘most sensible’ way, write down a basis for the space of differential 3-forms on
R3. In particular, what does this look like in Cartesian coordinates? What about
spherical coordinates?
IV: Try and write a basis for the space of differential 4-forms on R3. If you have
trouble doing this, try to construct a differential 4-form which is non-zero. What is
the obstruction?
24.2 Orienting Volume Forms
Finally, it remains to introduce the last set of differential forms that exist over
R3. In particular, if we take the exterior product of three differential 1-forms or a
differential 2-form and a differential 1-form, we obtain a differential 3-form. Note
that differential forms of higher degree on R3 are all necessarily zero. This is
a consequence of the anti-symmetry property – meaning that each basis 1-form,
dx, dy and dz can only appear once in a set of consecutive exterior products. To
see this, note that if a differential form ω appears more than once in a chain of
exterior products, one can always permute the chain (possibly picking up a ± sign)
so that it has the form: “... ∧ ω ∧ ω ∧ ...”, with ω ∧ ω = 0 collapsing the whole
chain.
Similar to the construction of the basis for Λ2(R3) we can construct a basis for
Λ3(R3) by seeing what survives when we take all possible combinations of exterior
products of the bases for Λ2(R3) and Λ1(R3). If you did this correctly in the
previous problem, you will have found that the only surviving differential 3-form
is:
dx ∧ dy ∧ dz, (574)
(up to a ± sign or some permutation of x, y, z). Note that so far, we have always
chosen the cyclic convention: x → y → z → x when ordering our differential 1-
forms – this is intentional as it corresponds to the choice of a ‘right-handed orien-
tation’ on our vector space, R3. Such an orientation is the ‘standard orientation’
for 3-dimensional Euclidean space. In particular, for R3 we have that:
∗1 = dV = dx ∧ dy ∧ dz, (575)
is the ‘orienting volume 3-form’. Through the order of the differential 1-forms dx, dy, dz,
it defines an orientation on R3. Furthermore, geometrically, one may think of this as
representing an infinitesimal cube (or parallelepiped) with sides of length dx, dy
and dz – giving it dimensions of ‘volume’ (Length^3).
In general, for an n-dimensional inner-product space (vector space equipped with
an inner-product or metric tensor) such as Rn, one can equip it with an orienting
volume n-form. This gives the space an orientation, as well as a way to measure
‘volumes’ – the orienting volume form appears when performing volume integrals on
the space. It also acts as a basis for forms of the highest degree on that space – any
differential n-form must be some scalar multiple of it.
Now that we have established the idea that differential p-forms (where p is some
integer) behave like ‘abstract vectors’ in a linear space (under ‘addition’), we can
explore the relation between exterior products, differential forms and the vector
calculus we already know and love.
Exercise 76 (Choice of Orientation) When questioned about her Orientation during
‘true colours’ week, a spherical coordinate system is caught off guard. Realizing
that she has lost her orienting volume 3-form, she decides to construct a new
one.
I: If you are (left)right-handed, construct a (left)right-handed orienting volume
form in spherical coordinates. [If you are genuinely ambidextrous, construct two
oppositely oriented coordinate systems with each hand, simultaneously (writing
with two pens at once)]. You can do this by computing dx, dy, dz in terms of
spherical coordinates (r, θ, φ) and the spherical basis 1-forms: (dr, dθ, dφ), then
substituting your results into the expression dV = dx∧dy∧dz (for the right handed
system). Alternatively, for the left-handed system, use dVLefty = −dx ∧ dy ∧ dz.
II: Now, use your results from Tutorial 21 to construct the orienting volume 3-form
via the expression:
dV = hrdr ∧ hθdθ ∧ hφdφ = |J|dr ∧ dθ ∧ dφ. (576)
This requires knowing / deriving the scale factors hj, or the determinant J of the
Jacobian matrix. If you did your math correctly, you should see that this is precisely
the same as the volume-form you constructed in part I.
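A numerical cross-check of this exercise is sketched below in Python. It rests on an assumption that should be stated clearly: the convention x = r cos θ sin φ, y = r sin θ sin φ, z = r cos φ, with φ measured from the z-axis, and the partial derivatives written out by hand from it. The function names are illustrative. The sketch compares the 3×3 Jacobian determinant, taken in the column order (r, θ, φ), with the product of the scale factors.

```python
import math


def jacobian_columns(r, theta, phi):
    """Partial derivatives of (x, y, z) with respect to r, theta and phi.

    Assumes x = r cos(theta) sin(phi), y = r sin(theta) sin(phi), z = r cos(phi).
    """
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    d_r = (ct * sp, st * sp, cp)
    d_theta = (-r * st * sp, r * ct * sp, 0.0)
    d_phi = (r * ct * cp, r * st * cp, -r * sp)
    return d_r, d_theta, d_phi


def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c (scalar triple product)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    r, theta, phi = 2.0, 0.4, 1.1
    cols = jacobian_columns(r, theta, phi)
    h_r, h_theta, h_phi = (norm(c) for c in cols)
    print("scale factors     :", h_r, h_theta, h_phi)
    print("product of factors:", h_r * h_theta * h_phi)
    print("det in (r,theta,phi) order:", det3(*cols))
    print("r^2 sin(phi)      :", r * r * math.sin(phi))
```

Under this convention and ordering the determinant comes out as −r² sin φ, so dr ∧ dθ ∧ dφ carries the opposite orientation to dx ∧ dy ∧ dz, while its absolute value agrees with hr hθ hφ. Swapping the order of θ and φ restores the positive sign.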
24.3 Duality and Orthogonality
By now, you should have noticed that the exterior product between differential
forms, behaves very similarly to the cross-product (vector product) between vec-
tors. In particular, they are both ‘antisymmetric’ and ‘bilinear’ operations. How-
ever, there are some fundamental differences. The first obvious difference is that
the exterior product obeys the associative law:
(ω ∧ β) ∧ η = ω ∧ (β ∧ η), (577)
whereas the cross-product is not associative – in general,
(u × v) × w ≠ u × (v × w). (578)
Exercise 77 (Counterstrike) During an evening sesh of counterstrike, Steven Meek
decides to take out his rage over his 1ms lack of reaction time (leading to his
death), by constructing a counter-example to the (incorrect) statement that ‘the
cross-product is associative’.
Not wanting to be outdone by Big Dog, can you also construct a counter-example?
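If you would like to test your own candidate counter-example, a few lines of Python suffice. This is only a checking aid: the function names are hypothetical, and the triple used in the demonstration (e1, e1, e2) is just one assumed choice among many. The function returns the difference between the two bracketings, which is the zero vector exactly when the chosen triple happens to associate.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def associativity_gap(u, v, w):
    """Return (u x v) x w minus u x (v x w), component by component."""
    left = cross(cross(u, v), w)
    right = cross(u, cross(v, w))
    return tuple(l - r for l, r in zip(left, right))


if __name__ == "__main__":
    e1, e2 = (1, 0, 0), (0, 1, 0)
    # (e1 x e1) x e2 vanishes, whereas e1 x (e1 x e2) does not.
    print(associativity_gap(e1, e1, e2))
```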
The second major difference is that the cross-product maps two vectors to another
vector – an output which is the same ‘type’ of object as the input. The exterior
product of two differential 1-forms however, takes two differential 1-forms and
turns them into a differential 2-form – hence the output is a different ‘type’ of
object to the input. Yet, as the following exercise should illustrate, the coefficients /
components that appear in both the cross-product and exterior product are
‘similar’[154] ...
Exercise 78 I: Given the vectors v = a1∂x + b1∂y + c1∂z and u = a2∂x + b2∂y +
c2∂z, compute their cross product and simplify the result:
v × u = (....)∂x + (...)∂y + (.....)∂z. (579)
[154] Spoiler Warning: Identical.
II: Now, turn these vectors into dual vectors (differential 1-forms) by replacing the
standard basis vectors with the basis differential 1-forms: v = a1dx+b1dy+c1dz,
u = a2dx + b2dy + c2dz. Compute the exterior product between these 1-forms:
v ∧ u = (....)dx ∧ dy + (....)dy ∧ dz + (.....)dz ∧ dx. (580)
III: Compare the coefficients of the standard basis vectors appearing in your cross-
product to the coefficients of the standard basis 2-forms in your exterior product.
What do you notice?
In the last exercise, you should find that the components (coefficients) appearing
in the cross and exterior products are exactly the same. Why is this the case? To
make sense of this, note that the cross product v × u produces a new vector which
is orthogonal to both v and u – its magnitude is equal to the area spanned by the
parallelogram formed by v and u. However, those of you who remember your
rules of dimensional analysis will quickly point out that if your vectors are not
dimensionless (e.g. physical vectors that have length), then their cross product
produces a vector which has different units to the input vectors!
Now comes a strong peculiarity. If we ‘reflect’ the input vectors v, u about some
plane, then take their cross-product, the resulting vector is different to the one we
obtain by first taking the cross-product, then reflecting the result v × u! There-
fore, the vector produced by a cross-product is not preserved by reflections – even
though it is preserved by rotations. This means it is not a ‘true’ vector in the geometric
sense – it is in fact a pseudovector.
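Both observations, the matching components and the odd behaviour under reflection, can be checked with concrete numbers. The Python sketch below is a hedged illustration: the vectors (1, 2, 3) and (4, 5, 6), the reflection z → −z and the function names are all choices made here. The `wedge` function returns the components of v ∧ u on the basis (dx ∧ dy, dy ∧ dz, dz ∧ dx), obtained by expanding the product with antisymmetry.

```python
def cross(v, u):
    """Components of v x u on the basis (d/dx, d/dy, d/dz)."""
    a1, b1, c1 = v
    a2, b2, c2 = u
    return (b1 * c2 - c1 * b2, c1 * a2 - a1 * c2, a1 * b2 - b1 * a2)


def wedge(v, u):
    """Components of v ^ u on the basis (dx^dy, dy^dz, dz^dx)."""
    a1, b1, c1 = v
    a2, b2, c2 = u
    return (a1 * b2 - b1 * a2, b1 * c2 - c1 * b2, c1 * a2 - a1 * c2)


def reflect_z(v):
    """Reflection about the x-y plane: (x, y, z) -> (x, y, -z)."""
    return (v[0], v[1], -v[2])


if __name__ == "__main__":
    v, u = (1, 2, 3), (4, 5, 6)
    print("cross:", cross(v, u))  # coefficients of d/dx, d/dy, d/dz
    print("wedge:", wedge(v, u))  # coefficients of dx^dy, dy^dz, dz^dx
    # Reflect first, then take the cross product ...
    print("cross of reflections:", cross(reflect_z(v), reflect_z(u)))
    # ... versus taking the cross product first, then reflecting.
    print("reflection of cross :", reflect_z(cross(v, u)))
```

The same three numbers appear in both products, with the coefficient of ∂x matching that of dy ∧ dz, ∂y matching dz ∧ dx and ∂z matching dx ∧ dy. The last two printed vectors differ by an overall sign, which is the pseudovector behaviour described above.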
Making the same considerations with the exterior product, one should quickly see
that the exterior product of two differential 1-forms does not suffer the same tech-
nical peculiarities as the cross-product of two vectors.
Problem 37 (Gedanken) By drawing a diagram and considering the reflections
described above, illustrate the fact that the cross-product of two vectors is a pseudo-
vector and not a true vector.
Now show geometrically that the exterior product does not suffer from this. Note that to
this end, you can represent a differential 2-form, for example dx ∧ dy, as either
the x-y plane or an infinitesimal parallelogram with sides dx and dy.
The discrepancy between cross and exterior products is resolved with the mighty Hodge
dual operator, which we denote by ∗ – also known as the ‘Hodge star’.
Conceptually speaking, the problem with the cross-product comes from the fact
that lines and planes are ‘dual’ in 3-dimensions. In particular, note that we can
represent a plane by two linearly-independent vectors tangent to that plane (a basis
for the 2-dimensional vector space described by that plane if it contains the origin
0), or we can equivalently describe it by a vector (or line) normal to the plane –
i.e. a vector orthogonal to the two vectors tangent to the plane. Hence, in essence,
the information which encodes that an object is a ‘plane’, may come as a pair of
linearly-independent tangent vectors, or as a single normal vector! Since a normal
vector can be used to generate a normal line, this shows that lines and planes are in
some sense ‘dual’ to each other in 3-dimensions. Such a duality is formalized with
the ‘Hodge dual’ operator ∗.
In general, for differential forms on Rn, the Hodge dual turns a differential p-form
ω into a differential (n − p)-form, ∗ω. Thus in particular, for n = 3, one sees that
the standard Cartesian basis differential 2-forms (representing planes) are dual to
the basis differential 1-forms (representing lines).
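For Cartesian basis forms, the p ↔ (n − p) pairing can be made concrete with a permutation sign. The following Python sketch assumes the Euclidean metric, the orientation dx ∧ dy ∧ dz and the index convention 0, 1, 2 for x, y, z; the function names are hypothetical. It returns the dual of a basis form as a sign together with the complementary indices in ascending order, and can be used to check hand computations.

```python
def permutation_sign(perm):
    """Sign of a permutation of distinct integers, found by counting inversions."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def hodge_dual_basis(indices, n=3):
    """Hodge dual of the basis form dx_{i1} ^ ... ^ dx_{ip} on Euclidean R^n.

    Returns (sign, complement): the dual is sign times the wedge of the basis
    1-forms whose indices are listed (ascending) in `complement`. The sign is
    chosen so that form ^ dual equals the volume form dx_0 ^ ... ^ dx_{n-1}.
    """
    indices = tuple(indices)
    complement = tuple(k for k in range(n) if k not in indices)
    return permutation_sign(indices + complement), complement


if __name__ == "__main__":
    names = ("dx", "dy", "dz")
    for form in [(), (0,), (1,), (2,), (0, 1), (1, 2), (2, 0), (0, 1, 2)]:
        sign, comp = hodge_dual_basis(form)
        lhs = " ^ ".join(names[k] for k in form) or "1"
        rhs = " ^ ".join(names[k] for k in comp) or "1"
        print(f"*({lhs}) = {'-' if sign < 0 else ''}{rhs}")
```

Note that a result such as −dx ∧ dz is the same 2-form as dz ∧ dx in the cyclic convention used earlier.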
Exercise 79 (Lines and Planes in 3-dimensions (BFFs)) The Hodge dual ∗ω of a
differential form ω is defined implicitly via the relation:
ω ∧ ∗ω = ∗1, (581)
where ∗1 = dV = dx ∧ dy ∧ dz is the orienting volume 3-form on R3.
To the best of your ability, use this definition to compute the following Hodge duals:
1. ∗dx
2. ∗dy
3. ∗dz
4. ∗(dx ∧ dy)
5. ∗(dy ∧ dz)
6. ∗(dz ∧ dx)
7. ∗(dx ∧ dy ∧ dz).
8. ∗1.
What do you notice about the relation between differential 1-forms and 2-forms?
What about differential 3-forms and 0-forms (constants / functions such as ‘1’)?
If you have reached this point, try completing the same exercise except with Spher-
ical Coordinates.
Now try this for Polar Cylindrical Coordinates.
Now try this exercise for Ellipsoidal Coordinates.
Now try this for Paraboloidal Coordinates.
In the next tutorial, we will see how to exploit these ‘abstractions’ of vector cal-
culus to our advantage in the computation of contour, surface and volume inte-
grals.
25 Tutorial 23: Serendipity and Integration
Problem 38 (Conceptual Puzzle) Consider a borderless (infinite) pool table[155].
Placing the white ball in its starting location, is it possible to shoot the other balls
such that the black ball ends up exactly where the white ball started? Note the
following restrictions and assumptions.
• The pool table is frictionless, so momentum is conserved.
• You are restricted to 2-dimensional motion (no bouncing over other balls).
• You can ignore spin and any sources of energy loss – hence all momentum
changes are linear.
If your answer is ‘yes’, then you need to provide a geometric arrangement that
solves this problem. If your answer is ‘no’, then you need to provide physical
principles explaining why it is not possible.
25.1 Introduction
Oftentimes in the process of scientific research, one may, in the quest to solve
one particular problem, discover or invent something new. This discovery may or
may not be related to the initial problem one was trying to solve, but it is important
and interesting in its own right. Such a process is known as ‘serendipity’ and is
common to all pursuits of exploration and knowledge.
In this tutorial, we will investigate some easy-to-understand consequences of re-
search that has arisen in attempts to better the mathematical structure of Quantum
Field Theory. In particular, we will look at some new analytical integration techniques
that have been developed by Achim Kempf[156], David M Jackson and Alejandro
H Morales in their attempts to mathematically formalise ‘path integrals’ in
quantum field theory[157].
[155] Billiards table.
[156] The ideas for this tutorial arose from personal conversation with Achim, to whom we are grateful.
[157] The original paper can be found here: http://iopscience.iop.org/1751-8121/47/41/415204/pdf/1751-8121_47_41_415204.pdf.
25.2 Differentigration
Differentiation is, in general, an easy algorithmic process. Expressions may get
tough and unwieldy, but at the end of the day, one can usually follow a set of
straightforward rules to arrive at an answer. As such, it is relatively easy to program a
computer to differentiate[158]. Integration, on the other hand, is much less straightforward.
Indeed, there are many numerical methods devoted to integration for this
reason!
Therefore, it may come as a surprise to know that you can turn the process of
integration into one of differentiation. This sounds great! Of course, such surprises
always come with some caveats and limitations. As it turns out, this ‘trick’ only
works for ‘analytic functions’ – that is, functions which have a convergent power
series representation.
Theorem 2 Given a function f : R → R which has a convergent series expansion,
the following representation for its integral holds:
$$ \int_0^x f(x')\,dx' = f(\partial_y)\left(\frac{e^{xy} - 1}{y}\right)\bigg|_{y=0}. \tag{582} $$
The expression f(∂y) is the function f(x) with the variable x replaced by the partial
derivative operator, ∂y := ∂/∂y. For non-polynomial functions, we can evaluate
the right-hand side by expanding f(x) as a power series (e.g. Taylor series) and
replacing the argument x with ∂y. The resulting series of differential operators acts
on everything to the right, (e^{xy} − 1)/y, which we then evaluate at y = 0 (after
differentiating).
This may seem an odd way to integrate, but perhaps not so odd when you look
at it as a consequence of a more general identity arising from consideration of
Fourier and Laplace transforms. To get some intuition, we proceed with an exam-
ple.
158
An exception is when dealing with ‘special functions’, which may or may not have a series
representation.
Example 24 (The Immeasurable Man) Having invented a time-travel machine
to avoid capture by the Roman army besieging Syracuse, the great geometer
Archimedes travels to St. George’s College for help. In particular, Archimedes
finds that he cannot integrate with the Roman culture, nor can he integrate functions
on the real line. However, he does understand preliminary concepts of differentiation,
meaning one can teach him to evaluate the following integral 159:
\[ \int_0^x z\,dz. \tag{583} \]
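Using Achim’s tricks, we have the following identity:
\[ \int_0^x z\,dz = f(\partial_y)\left(\frac{e^{xy}-1}{y}\right)\Big|_{y=0}, \tag{584} \]
where f(z) = z. Thus, $f(\partial_y) = \partial_y = \frac{\partial}{\partial y}$. Collecting these statements we have
\[ \begin{aligned}
\int_0^x z\,dz &= f(\partial_y)\left(\frac{e^{xy}-1}{y}\right)\Big|_{y=0} \\
&= \frac{e^{xy}(xy-1)+1}{y^2}\Big|_{y=0} \\
&= \lim_{y\to 0}\frac{e^{xy}(xy-1)+1}{y^2} \\
&= \lim_{y\to 0}\frac{x e^{xy}(xy-1)+x e^{xy}}{2y} \quad \text{(L'Hopital's Rule)} \\
&= \lim_{y\to 0}\frac{x^2 e^{xy}}{2} \\
&= \frac{x^2}{2}.
\end{aligned} \tag{585} \]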
Indeed, this is the same answer that one would obtain via Riemann integration.
You may wonder what benefit such a technique offers, given the effort required to
evaluate an integral as simple as $\int_0^x z\,dz$. The answer is two-fold. The first reason is
that an extension of this trick may be used to evaluate ‘improper integrals’ and/or
‘contour integrals’. Contour integrals require significant finesse in order to choose
the correct integration paths (contours) so as to make use of the pole structure of
f(z) and results from residue calculus (Cauchy’s theorem).
159
Archimedes could do this geometrically anyway.
Exercise 80 (Bored Beyond Measure) Unfortunately, Archimedes found the last
example boring. This is because he already knew how to compute the area of a
triangle. To make things more interesting, we now show him how to compute the area
below a parabola (or equivalently, bounded by a parabola) using our integration
trick. To this end, compute the following integral:
\[ \int_0^x f(z)\,dz = f(\partial_y)\left(\frac{e^{xy}-1}{y}\right)\Big|_{y=0}, \tag{586} \]
where $f(z) = z^2$. This is the area under the parabola bounded by the horizontal
axis, z = 0 and z = x.
Hints: Note that you will have to differentiate twice (using the product rule). After
differentiating, you will have to use L’Hopital’s (Bernoulli’s) rule to evaluate the
limit y → 0. In fact, you will have to use L’Hopital’s rule three times. Alternatively,
you can play around with series expansions and limit identities if you don’t
like L’Hopital’s rule.
At this point, the integrals performed so far may seem relatively trivial – after all,
we have only integrated polynomials. Consider now, the following example, which
makes use of Taylor series!
Example 25 (Tailored Functions) Having travelled to Singapore to get a cheap,
good quality tailored suit, Archimedes now travels back to St. George’s College to
get a Taylor series for the exponential function. He asks William Cheng to provide
such a series. With some probability p, 0 ≤ p < 1, William provides the following
(correct) Taylor series for the exponential centred around x = 0:
\[ e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}. \tag{587} \]
Archimedes wishes to use this to derive an expression for:
\[ \int_0^x f(z)\,dz \tag{588} \]
where $f(z) = e^{az}$ and a is some constant. Using our trick, we have:
\[ \begin{aligned}
\int_0^x f(z)\,dz &= f(\partial_y)\left(\frac{e^{xy}-1}{y}\right)\Big|_{y=0} \\
&= e^{a\partial_y}\left(\frac{e^{xy}-1}{y}\right)\Big|_{y=0},
\end{aligned} \tag{589} \]
which requires expanding the differential operator, $e^{a\partial_y}$, as a Taylor series then
acting with it on everything to the right of it. Rather than getting into a lot of mess,
recall the following definition for the Taylor series of a function g(y) centred at
y = a:
\[ g(y) = \sum_{n=0}^{\infty}\left(\frac{d^n}{dy^n}g(y)\right)\Big|_{y=a}\frac{(y-a)^n}{n!}, \tag{590} \]
where $g^{(n)}(a) = \left(\frac{d^n}{dy^n}g(y)\right)\big|_{y=a}$ are called the ‘Taylor coefficients’ of the Taylor
series of g(y) centred at a. Therefore, the action of the differential operator $e^{a\partial_y}$ on
some function g, evaluated at y, is given by:
\[ \begin{aligned}
e^{a\partial_y}g(y)\Big|_{y} &= \sum_{n=0}^{\infty}\frac{a^n}{n!}\frac{\partial^n}{\partial y^n}g(y)\Big|_{y} \\
&= \sum_{n=0}^{\infty}\left(\frac{\partial^n g(y)}{\partial y^n}\right)\Big|_{y}\frac{a^n}{n!}.
\end{aligned} \tag{591} \]
Comparing this to the Taylor series of g(y + a) centred at y, we have:
\[ g(y+a) = \sum_{n=0}^{\infty}\left(\frac{\partial^n g(y)}{\partial y^n}\right)\Big|_{y}\frac{(y+a-y)^n}{n!}, \tag{592} \]
which is identical! Therefore, in general, we see that:
\[ e^{a\partial_y}g(y)\Big|_{y} = g(y+a). \tag{593} \]
Therefore, we see that the differential operator $e^{a\partial_y}$ acts on an arbitrary differentiable
function g(y) to translate its argument by a.
Aside: Recalling back to our earlier study of Lie groups and Lie algebras, this
is because the operator ∂y is a basis element in the Lie algebra of translations.
Therefore, taking its exponential gives a corresponding Lie group element, ea∂y ,
which is a member of the Lie group of translations (a symmetry group). Note that
these Lie groups and algebras act on the ‘ring of smooth functions’ on R, as
opposed to a vector space.
For practical purposes however, it suffices to remember that
\[ e^{a\partial_y}g(y) = g(y+a). \tag{594} \]
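The translation identity can be checked exactly for polynomials, where the operator series terminates after finitely many terms. The sketch below is an illustration under that assumption (g is stored as a list of coefficients); the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def derivative(poly):
    """Derivative of a polynomial stored as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(poly)][1:]

def evaluate(poly, y):
    """Value of the polynomial at the point y."""
    return sum(Fraction(c) * Fraction(y) ** k for k, c in enumerate(poly))

def shift_by_exponential(poly, a, y):
    """Evaluate exp(a d/dy) g(y) = sum over n of a**n / n! * g^(n)(y)."""
    total = Fraction(0)
    current = list(poly)
    n = 0
    while current:  # derivatives of a polynomial eventually vanish
        total += Fraction(a) ** n / factorial(n) * evaluate(current, y)
        current = derivative(current)
        n += 1
    return total

if __name__ == "__main__":
    g = [1, -2, 0, 5]  # g(y) = 1 - 2y + 5y^3
    print(shift_by_exponential(g, 3, 2), evaluate(g, 2 + 3))
```

The two printed values coincide, as (594) predicts: summing the operator series at y gives the same number as evaluating g at y + a.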
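Applying this to our integral, we see that
\[ \begin{aligned}
\int_0^x e^{az}\,dz &= e^{a\partial_y}\left(\frac{e^{xy}-1}{y}\right)\Big|_{y=0} \\
&= \left(\frac{e^{x(y+a)}-1}{y+a}\right)\Big|_{y=0} \\
&= \frac{e^{ax}-1}{a},
\end{aligned} \tag{595} \]
as expected!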
Problem 39 (Sinus Problems) Having not adapted to the pollen in Western Aus-
tralia, Archimedes develops intense hayfever and sinus problems over spring. After
some inspiration, Archimedes develops a cure for his hayfever through mathemat-
ical biology. However, in this process, he needs to evaluate the integral:
\[ \int_0^x \cos(az)\,dz, \tag{596} \]
using our differentiation-integration trick. Help Archimedes modify his gene expression
by solving this problem.
Hint: Use the fact that $\cos(az) = \frac{e^{iaz}+e^{-iaz}}{2}$ in conjunction with our previous
identity, $e^{a\partial_y}g(y) = g(y+a)$, to evaluate the action of the differential operator
$\cos(a\partial_y)$ on $\frac{e^{xy}-1}{y}$ at y = 0.
Exercise 81 (Differentiation by Parts) Integration ‘by parts’, is a trick that es-
sentially relies on two things:
• The product (Leibniz) rule for differentiation.
• The fundamental theorem of calculus (or Generalized Stokes’ Theorem).
In particular, for two functions f, g, we have the mnemonic:
\[ \int f\,dg = fg\,\big| - \int g\,df, \tag{597} \]
where | specifies the integration bounds. This comes from:
\[ \int d(fg) = fg\,\big|, \tag{598} \]
then expanding the left-hand side via the product rule (note that d is the exterior
derivative).
Some integrals such as:
\[ \int_0^x z^2 e^{az}\,dz, \tag{599} \]
can therefore be solved recursively, via integration by parts.
I: Solve the aforementioned integral using integration by parts. II: Solve this
integral using Achim’s differentiation-integration trick, which we have studied so far.
25.3 Quantum Field Theory Aside
The second reason that we may consider our integration ‘trick’ is on theoretical
grounds. In particular, recall Young’s ‘double slit’ experiment. If we shoot an elec-
tron at a plate with a single slit, followed by a fluorescent screen, we will observe a
single dot on the screen where the electron hits. In this set-up, the electron behaves
like a classical particle. Repeating this experiment for electrons fired with the same
kinetic energy, we will build up a uniform pattern on the fluorescent screen.
Now, if we replace our single slit with a narrowly separated double slit and repeat
our experiment, instead of building up a uniform distribution on the screen, we will
observe an interference pattern. In particular, there will be areas of minimum intensity
and areas of maximum intensity – something we would expect if the electron
was a ‘wave’. There is nothing spooky about this result; in fact, it is simply a concrete
illustration of the ‘matter-wave’ duality of particles as explained by quantum
mechanics.
It appears then, that if we choose to measure the electron as a particle, we will
observe particle behaviour. If we choose to observe it as a wave, then we see
wavelike behaviour – in essence, this choice is made by having a ‘double-slit’
or ‘single-slit’. Using the interference pattern, we can construct a ‘probability
density’ for the electron striking the screen and associate its path through each slit,
with some probability amplitude.
If we now consider what happens when we have two double slits, in succession,
then we have four choices of paths for the electron to hit the screen. We multiply
successive probabilities to get the probability of each path taken.
Now consider, as Feynman did, what happens if we have infinitely many successive double slits.
More so, not just a countable infinity, but rather, one double slit at each point in
space – an uncountable infinity of double slits. In this manner, we see that there
are an uncountable number of paths for the electron to travel – and an uncountable
number of probabilities to multiply. To deal with this sort of ‘continuous infinity’,
we have a familiar tool – integration!
As it turns out, the above thought experiment leads to a ‘path integral formulation’
of quantum mechanics. In this formulation, we don’t view electrons as waves –
rather, we treat them as particles and obtain probabilities by doing an uncountable
infinity of integrations. In particular, we perform an integral at every point along
the path the particle travels in spacetime, then consider every possible path the
particle can take in spacetime. Although such an approach gives results that agree
with traditional quantum mechanics, it turns out that this ‘path integral’ is not a
well-defined mathematical quantity.
This problem of not well-defined path integrals persists in the next stage of theo-
retical physics – quantum field theory. Despite many attempts by generations of
mathematicians and physicists, the path integral is still not a well-defined or well-
understood mathematical quantity. On the other hand, it yields results which can
be experimentally measured to extreme accuracy. Therefore, by turning integration
into differentiation (using tricks such as that outlined earlier), one arrives at some
interesting possibilities for constructing a well-defined path integral.
25.4 Generalizations
The results covered so far can be easily generalized to arbitrary finite intervals. In
particular, note that:
\[ \int_a^b f(x)\,dx = \int_0^{b-a} f(x+a)\,dx. \tag{600} \]
In this manner, we can change our lower integration limit from 0 to any finite real
number. Alternatively, we have the following identity (also contained in Achim’s
paper):
\[ \int_a^b f(x)\,dx = f(-i\partial_x)\,\frac{e^{ibx}-e^{iax}}{ix}\Big|_{x=0}, \tag{601} \]
where $f(-i\partial_x)$ is the function f(x) with its argument x replaced by the differential
operator $-i\partial_x = -i\frac{\partial}{\partial x}$.
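As a plausibility check of (601), one can work it out for a monomial $f(x) = x^n$. The short derivation below is a sketch that assumes only the power series of the exponential; it is not taken from the paper.

```latex
\[
\frac{e^{ibx}-e^{iax}}{ix}
  = \sum_{m=0}^{\infty} \frac{i^{m}\,\bigl(b^{m+1}-a^{m+1}\bigr)}{(m+1)!}\,x^{m},
\]
\[
(-i\partial_x)^{n}\,\frac{e^{ibx}-e^{iax}}{ix}\Big|_{x=0}
  = (-i)^{n}\,n!\,\frac{i^{n}\,\bigl(b^{n+1}-a^{n+1}\bigr)}{(n+1)!}
  = \frac{b^{n+1}-a^{n+1}}{n+1}
  = \int_a^b x^{n}\,dx .
\]
```

Since $(-i)^n i^n = 1$, every power of x is integrated correctly, and by linearity the identity then holds for any polynomial f.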
For Fourier transforms (integrated over the whole real line), the following identity
is useful:
\[ \int_{-\infty}^{\infty} g(x)\,dx = 2\pi\, g(-i\partial_x)\,\delta(x)\Big|_{x=0}, \tag{602} \]
where δ(x) is the Dirac delta distribution, centred at 0.
For Laplace transforms (integrated from zero to infinity), the following consequence
is useful:
\[ \int_0^{\infty} f(x)\,dx = 2\pi\, f(-i\partial_x)\,H(-i\partial_x)\,\delta(x)\Big|_{x=0}, \tag{603} \]
where H is the Heaviside distribution. As a function, H(x) = 0 for x < 0 and
H(x) = 1 for x ≥ 0.
Try experimenting with these generalizations of our initial identity to evaluate in-
tegrals that you already know how to perform using standard calculus rules. This
should help you build some confidence and intuition with these techniques. It will
also stop you from getting bored over summer!
Once you are confident that you have a handle on these integration tricks, you may
apply them to harder integrals which you may not know how to solve otherwise.
In particular, for those of you who use Fourier and Laplace transforms, or per-
form contour integrals in complex analysis, you should find that these tricks may
simplify some of your problems.
26 2015 Academic Program Suggestions
26.1 Tutoring
In response to David Platt’s request for thoughts on the structuring of the academic
program provided by St. George’s college, I have the following comments.
• The nature of Remedial Help
For people requiring coursework help, David has suggested that students
should be encouraged / expected to offer academic assistance to other stu-
dents, as part of ‘college spirit’. He suggested this means getting rid of paid
tuition for coursework help. Such views represent an ideal academic college
system, and indeed, they could be implemented if these expectations
were made at the very start of the year. There are, however, some obvious
obstacles and shortcomings which will need to be addressed.
• The first issue is that the necessary expertise for academic assistance is limited
to the students in each relevant discipline. Of these students, it may
be fair to say that most of them would be willing to help another student as
part of the college camaraderie. However, each student in a position capable
of providing quality assistance, will be limited by their own commitments
and available time. Hence the assistance they provide, may or may not nec-
essarily match the time needed by the student requesting it. This is espe-
cially true in the case of students who are struggling and need several hours
of dedicated 1-on-1 assistance. The student needing assistance then has two
in-college options. One is to seek assistance / time from another student and
the other is to wait until the original student is free again.
If there are enough skilled students able and willing to provide the hours
/ week required by students requiring assistance, then with appropriate communication
(e.g. a college facebook study-group for each academic discipline),
this idealized academic support system may work at college. In practice,
a functional academic support system for remedial help would have
to be a mix of both worlds. If the college does not provide remedial help
at a dedicated professional level, then there is less incentive for students to
come to college since they could just as equally make friends in class and
ask assistance from them.
In essence, there should be at least one dedicated tutor for each discipline
who is sufficiently skilled. This solves the scenario in which a student re-
quires help, but is unable to obtain in-college assistance from other students
in their required time-frame. This also protects the system against the cir-
cumstances where capable students are unwilling to assist certain students
for personal/social reasons.
• To protect such a system from the obvious problem of students bypassing
the student community help to get help from a dedicated tutor, it should be
outlined in each study group (as well as at the start of year and throughout
semester) that students seeking remedial help must first seek help from the
relevant student body at the college. This could work by posting on the
study group page. In the situation that there is evidence that the student has
sought help from other students, but was unsuccessful – either due to lack of
availability or expertise, then the student should be able to request help from
a dedicated tutor. Perhaps there is also an explicit and reasonable expectation
for the dedicated tutors to provide some academic support ‘off-the books’, in
the same manner that other students have that expectation upon themselves.
The key to prevent the hybrid system from falling back to the old system
(where there are too many remedial tutors) is to encourage the psychology
of community help at college. I believe that creating discipline-specific study
/ social groups (separate from formal, extra-curricular tutorials), which stu-
dents of each discipline must join, will go a long way to help create and
foster the academic environment envisioned by David Platt and Michael
Champion. If successful, a hybrid system has the potential both to support
a collective academic team environment and to possess some of the
professionalism and fall-back structures that a traditional college should provide
– for example, to cater for critical periods in semester (such as the exam
periods when everyone is very busy).
We need the following implementation:
1. A facebook group for each subject discipline that students should / must
enrol in at the start of the year (separate to tutorial groups). Along with this
is the explicit expectation that they help other students academically when
they can (new students won’t mind this if it’s introduced from the beginning).
2. Smaller quantity of tutors, but more quality. Getting a HD in a subject is
probably not sufficient (in general) for someone to be a dedicated tutor in
that discipline. The tutor should exhibit some consistent high performance
in the discipline and some level of mastery.
3. Senior tutors, such as Claire and Raymond, should have more input / say in
the direction of the academic program at college.
26.2 Mathematical Sciences Tutorial Plan
1. Mathematics of the GPS System: 3-4 tutorials. Requires elementary notions
of non-Euclidean geometry, triangulation, consequences of general relativ-
ity, and error analysis (differential error). Combine this with tute on metric
spaces.
2. Mathematics of Space Travel: 4-5 tutorials. Include two tutes on conic sec-
tions and the Kepler orbits. Include a tute on hyperbolic trigonometry / ge-
ometry, special relativity. Include tute on Alcubierre metric, warp drive and
optimisation.
3. Integration Techniques and Applications: 3-4 tutorials. Take the Kempf integration
tutorial as the last, then add two or three new tutes. Include one tute
on the Gamma function, then another tute on other special functions.
4. Fourier Analysis and Spectral theory: 3-4 tutorials. This will include theory,
some basic programming and experimental investigation of instruments / the
wolf-note. We may also compare music tracks and voices in this manner.
5. Dimensional Analysis tutes from 2013: 3 tutorials.
6. Error Analysis tutes from 2013: 3-4 tutorials.
7. Re-hash of the Lie Groups / Lie Algebras: 3-4 tutorials. Second semester /
end of year.
8. Differential Equations tutorials: 3-4 tutorials from 2013 (add new one on
non-linear DEs).
Additionally, need the following ground-rules / changes:
1. Leon’s ‘mathematics is a mountain’, input-output type motivational talk.
2. Bring healthy snacks to tutes + chocolate to encourage people.
3. Use white-board / experimental apparatus at start of tute to motivate them.
4. Feedback / grade, informal qualification.
5. Rebrand: Mathematical Sciences Exploration Group.
27 Miscellaneous
A section of notes for topics that individuals have requested. Disclaimer: this is
written from memory and pen-paper calculations.
27.1 Lagrangian Mechanics
27.1.1 Background
After a while, one begins to realise that using Newton’s laws to solve problems
in classical mechanics can get very tedious and annoying. Thankfully, apart from
making good cheese, wine and conquering most of Europe, the French were (and
still are) also very good at producing world-class mathematicians. One such math-
ematician was Joseph Lagrange, who amongst a trillion other accomplishments,
came up with a revolutionary reformulation of classical mechanics in conjunction
with several other mathematicians 160 and physicists. This approach is now known
as ‘Lagrangian mechanics’ and is an extremely powerful and vast generalisation of
Newtonian mechanics. Today, almost the entirety of modern physics is based on
the principles set down by Lagrange and Hamilton. It also has vast applications to
optimization problems and many areas of engineering.
27.1.2 The Principle of Stationary Action
The fundamental concept behind Lagrangian mechanics is the ‘principle of sta-
tionary action’. It is more commonly referred to as the principle of ‘least action’,
which is technically incorrect 161. It basically says that nature is lazy, and will
always (classically) take the path of stationary action – which means it makes the
following functional stationary:
\[ S = \int L\,dt \tag{604} \]
Here the quantity S, called the ‘action’, is a functional – an object which acts on
functions. The function L is called the ‘Lagrangian’ of your theory – it contains all
necessary information about your physical system. Different theories and different
systems will have different Lagrangians. Finally, the integral used here is the
indefinite-integral with respect to time t, which parametrises the system.
160
Most notably, the Irish mathematician Sir William Rowan Hamilton.
161
Recall that when you are trying to find the critical points of a function, you first find its derivative
and then set it to zero. This doesn’t just give you points at which the function is minimized – you
also get inflection points and maxima.
For systems in classical mechanics, the Lagrangian sometimes (but not always!)
takes the following form:
L = T − U (605)
where T is the total kinetic energy of the system and U is its potential energy. If the
system is conservative (i.e. no losses due to friction etc.) and the potential energy U
is time-independent, then the Lagrangian will take this special form. Note the minus
sign in T − U is important – if this were a plus sign, then the Lagrangian would be
the total energy (or Hamiltonian in this restricted set of cases).
If the system is non-conservative, then one usually has to add extra terms to the action
to account for losses / dissipation (or net gain) of energy.
If the system is constrained – e.g. a bead confined to roll on some surface, then
one needs to either use the method of Lagrange multipliers or to express the system
in terms of unconstrained variables.
27.2 The Euler-Lagrange Equations of Motion
The Euler-Lagrange equations of Motion are the equations you have to solve to de-
termine the dynamical time evolution of your system in the Lagrangian formalism.
In some subset of cases, these are simply equivalent to the equations of motion you
get using Newton’s Second Law: F = ma. Here I will specify a simple system,
then show how to derive the Euler-Lagrange equations for this system using the
principle of stationary action. Later, I will specify a more general system then re-
derive the Euler-Lagrange equations. Finally, I will give an example of the power
of the Lagrangian formalism – in particular, a proof of the fact that a straight line
is the shortest distance between two points in ordinary Euclidean geometry.
In the Lagrange formalism, a system is specified by a set of generalized coordinates:
$q_1(t), ..., q_n(t)$ (parametrised by time t) and a set of generalized velocities
which are the derivatives of the coordinates with respect to time t: $\dot{q}_1, ..., \dot{q}_n$. In
non-relativistic mechanics, we view the time t as the independent variable and the
coordinates $q_i$ and velocities $\dot{q}_i$ as dependent variables, parametrised by t. The configuration
space is then taken to be the set of all possible values: $(q_1, ..., q_n, \dot{q}_1, ..., \dot{q}_n)$
of the generalized coordinates and the corresponding velocities. Note that generalized
coordinates represent points in some space M, and the generalized velocities
are (tangent) vectors attached to these points (recall velocity is a vector quantity).
Hence the configuration space of a physical system naturally takes the form of a
‘tangent bundle’ 162, denoted TM.
Abstraction aside, we now consider the Lagrangian for a simple system (e.g. a
point-particle moving with constant acceleration) with a generalized coordinate q
and a generalized velocity $\dot{q} = \frac{dq}{dt}$. The Lagrangian $L = L(q, \dot{q})$ for this system is
a function of q and $\dot{q}$, defined on the configuration space 163 TM.
The action S[L] corresponding to this Lagrangian L, is given by:
\[ S[L] = \int L(q, \dot{q})\,dt. \tag{606} \]
We can compute the variation of this action δS[L] by using integration by parts and
computing the variation of the Lagrangian: δL. Note that to compute the variation
of the Lagrangian, δL, we simply use the same rules as we do when computing a
total differential (or ‘exterior derivative’). In particular, we have
\[ \delta L(q, \dot{q}) = \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot{q}}\delta \dot{q} \tag{607} \]
Note that we have assumed that the Lagrangian L does not explicitly depend on
time t. It only depends on t implicitly through q(t) and $\dot{q}(t)$. If it did explicitly
depend on t, e.g. for a system with a time-varying potential energy U(t), then we
would just include an extra term, $\frac{\partial L}{\partial t}$, in the variation of L.
Therefore, we have:
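\[ \begin{aligned}
\delta S[L] &= \delta \int L\,dt \\
&= \int \delta L\,dt \\
&= \int \left(\frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot{q}}\delta \dot{q}\right)dt
\end{aligned} \tag{608} \]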
162
A collection of points and the tangent spaces attached to those points. If the coordinate space
M is n-dimensional, then the tangent bundle TM is 2n-dimensional.
163
In general, L could also be a function of higher derivatives of q, for example $L = L(q, \dot{q}, \ddot{q}, ...)$;
however, for most practical cases we just consider $L = L(q, \dot{q})$.
Note that the variation ‘operator’ δ commutes with derivative operators. Hence for
example, $\frac{d}{dt}\delta q = \delta\frac{d}{dt}q = \delta\dot{q}$. Our goal is to compute the ‘functional derivative’
of the functional S with respect to the generalized coordinate q. The functional
derivative allows us to differentiate functionals with respect to functions – apart
from a few technicalities, it behaves much like the ordinary derivative. This means we
want the quantity $\frac{\delta S}{\delta q}$, so we need the term δq to the right of both terms in the integrand
of (608). However, the second term contains $\delta\dot{q} := \delta\frac{d}{dt}q$. In order to ‘move’
the total derivative $\frac{d}{dt}$ away from the q, we use the integration by parts technique
164:
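\[ \begin{aligned}
\int \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\delta q\right)dt &= \int d\left(\frac{\partial L}{\partial \dot{q}}\delta q\right) \implies \\
\int \left\{\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right)\delta q\right\}dt + \int \left(\frac{\partial L}{\partial \dot{q}}\,\delta\frac{d}{dt}q\right)dt &= \left[\frac{\partial L}{\partial \dot{q}}\delta q\right]\Big|_{t=t_i}^{t=t_f}
\end{aligned} \tag{609} \]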
where $t_i$ and $t_f$ denote the range of integration over time – we almost always use
$t_i = -\infty$ and $t_f = +\infty$ for a classical action. Now, we make the (physically-motivated)
assumption that the quantity on the right-hand side vanishes:
$\left[\frac{\partial L}{\partial \dot{q}}\delta q\right]\big|_{t=t_i}^{t=t_f} = 0$. This is true for most physical Lagrangians L 165.
Therefore, taking this assumption, we get:
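\[ \begin{aligned}
\int \left\{\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right)\delta q\right\}dt + \int \left(\frac{\partial L}{\partial \dot{q}}\,\delta\frac{d}{dt}q\right)dt &= 0 \implies \\
\int \left(\frac{\partial L}{\partial \dot{q}}\,\delta\frac{d}{dt}q\right)dt &= -\int \left\{\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right)\delta q\right\}dt.
\end{aligned} \tag{610} \]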
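This allows us to write the variation (608) of the action as:
\[ \begin{aligned}
\delta S[L] &= \int dt\left(\frac{\partial L}{\partial q}\delta q\right) - \int dt\left(\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right)\delta q\right) \\
&= \int dt\left\{\frac{\partial L}{\partial q} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right)\right\}\delta q.
\end{aligned} \tag{611} \]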
Note that here we’ve made a common (mathematically-motivated 166) change of
notation: $\int (\text{Stuff})\,dt =: \int dt\,(\text{Stuff})$. Finally, we bring the δq in the integrand
(611) to the left-hand side and formally define the functional derivative of the action
S to be:
\[ \frac{\delta S[L]}{\delta q} = \frac{\partial L}{\partial q} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right). \tag{612} \]
164
Or rather, the fundamental theorem of calculus (for 1-dimensional problems) / a special case of
the generalized Stokes theorem for higher dimensions.
165
One rare case where one gets so-called ‘boundary contributions’ to the action integral, is in
general relativity – in particular, the Gibbons-Hawking-York boundary term, which accounts for the
case when spacetime is a manifold with a boundary.
166
In this manner, we can think of the integral sign and the variables we integrate with respect to
(dt) as an abstract operator or ‘functional’ called a ‘measure’. Thus $\int dt$ is an operator which acts
on functions to give some number, which is the value of the integral of that function.
In this language, the principle of stationary action states that the variation must
vanish: δS = 0, which is equivalent to saying the functional derivative is zero:
$\frac{\delta S[L]}{\delta q} = 0$. Therefore
\[ \frac{\partial L}{\partial q} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) = 0, \tag{613} \]
which are precisely the Euler-Lagrange equations of motion for this dynamical
system! Thus we have explicitly demonstrated that the Euler-Lagrange equations
are a direct consequence of the principle of stationary action – furthermore, we listed
the assumptions made throughout the derivation. In particular, we assumed zero
boundary contributions to the action and that the Lagrangian L was not explicitly
dependent on time (so $\frac{\partial L}{\partial t} = 0$) and that it only depended on the generalized coordinates
and velocities: $L = L(q, \dot{q})$. If we relaxed some of these assumptions, we
could get extra terms in the Euler-Lagrange equations.
Note, there is another way to view this derivation using Taylor expansions. This
method is a bit more suggestive and intuitive in regards to why we call these tech-
niques ‘variational principles’ or ‘variational calculus’. The premise is that we
perturb the action by perturbing the function it acts on: S[L + δL] ≈ S[L] + δS,
then define the variation as the difference between the perturbed action and the
original action: δS = S[L + δL] − S[L].
Functions L which satisfy the stationary action condition: δS[L] = 0, are called
Lagrangians. They are stationary points of the action functional. In some cases
they correspond to minima or maxima of the action. For this reason, they are
fundamental to variational calculus. For example, if the action represented the
length of a curve or the surface area of a soap bubble, we could use variational
calculus to find a curve with minimal length or the shape of a soap bubble surface
with minimal area under some given constraints.
Example 26 As an example, take the motion of a point-particle with mass m and
position coordinate x, moving in one-dimension. We view x as a function of time
t: x = x(t). Then x is our generalized coordinate with corresponding generalized
velocity $\dot{x}$. If the particle is moving due to some conservative force acting on it,
then it has some associated potential energy U. Assuming U is independent of time
t, we then have U = U(x) in general (e.g. the particle could be moving vertically
and experiencing a gravitational force with potential U = U(x)). The Lagrangian
is then given by:
\[ L = \text{Kinetic Energy} - \text{Potential Energy} = \frac{1}{2}m\dot{x}^2 - U(x). \tag{614} \]
The Euler-Lagrange equations then tell us that:
\[ \frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) = 0, \tag{615} \]
hence we see that
\[ -\frac{\partial U(x)}{\partial x} - \frac{d}{dt}(m\dot{x}) = 0. \tag{616} \]
Since U is only a function of one variable, we write the partial derivative as a total
derivative instead, hence:
\[ -\frac{dU(x)}{dx} = m\ddot{x} \tag{617} \]
since the mass m is constant. Recalling that a conservative force F can be defined
as the gradient of some potential: $\mathbf{F} = -\nabla U$, we then identify $-\frac{dU(x)}{dx}$ as the
component $F_x$ of the force acting on this particle in the x-direction. Hence we
have:
\[ F_x = m\ddot{x} \tag{618} \]
which is precisely Newton’s second law. Note that this is based on the assumption
that the Lagrangian L was only dependent on x and 9x. In general, one may have
a time-varying acceleration (e.g. a radiating charge or stealth fighter jet) – in such
a case, we would modify the Euler Lagrange equations and therefore modify our
statement of Newton’s second law.
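To see the principle of stationary action at work numerically, here is a minimal Python sketch. It assumes the Lagrangian of Example 26 with the specific choice U(x) = m g x (uniform gravity, x measured upwards), discretises the action on a time grid, and compares the Newtonian path with paths perturbed by a bump that vanishes at the end points. The grid size, the bump shape and all function names are assumptions made for illustration.

```python
import math

def make_path(eps, n_steps=200, total_time=1.0, g=9.81):
    """Newtonian path with x(0) = x(T) = 0, plus eps * sin(pi * t / T)."""
    dt = total_time / n_steps
    path = []
    for k in range(n_steps + 1):
        t = k * dt
        classical = 0.5 * g * t * (total_time - t)       # solves m x'' = -m g
        bump = eps * math.sin(math.pi * t / total_time)  # zero at both end points
        path.append(classical + bump)
    return path, dt

def discrete_action(path, dt, m=1.0, g=9.81):
    """Approximate S = integral of (m v^2 / 2 - m g x) dt on the grid."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        velocity = (x1 - x0) / dt
        midpoint = 0.5 * (x0 + x1)
        total += (0.5 * m * velocity ** 2 - m * g * midpoint) * dt
    return total

def action_for(eps):
    """Discrete action of the path perturbed with amplitude eps."""
    path, dt = make_path(eps)
    return discrete_action(path, dt)

if __name__ == "__main__":
    s0 = action_for(0.0)
    for eps in (-0.1, -0.01, 0.01, 0.1):
        print(eps, action_for(eps) - s0)
```

The change in the action is the same for +eps and -eps and scales like eps squared, so there is no first-order variation around the Newtonian path: this is exactly the statement δS = 0. For this particular Lagrangian the stationary point happens to be a minimum.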
27.3 N-Dimensional Euler-Lagrange Equations
To see how this formalism generalizes to higher-dimensional systems, we proceed
as follows. Let $q_i$ denote the i-th generalized coordinate for a system with
n generalized coordinates, $q_1, ..., q_n$. The n corresponding generalized velocities
are then given by $\dot{q}_i$, where i = 1, ..., n. Collecting the variables $q_1, ..., q_n$ and
$\dot{q}_1, ..., \dot{q}_n$ into vectors $\mathbf{q}$ and $\dot{\mathbf{q}}$, respectively, we can view the Lagrangian as a function
of 2n variables, parametrised by time t:
\[ L = L(\mathbf{q}, \dot{\mathbf{q}}; t). \tag{619} \]
The action functional generated by this Lagrangian is given by:
\[ S[L] = \int L\,dt. \tag{620} \]
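To vary the action, we Taylor expand $L(q_1, ..., q_n, \dot{q}_1, ..., \dot{q}_n)$ to first order in all its
variables. In particular, we have:
\[ \begin{aligned}
S[L + \delta L] &= \int L(\mathbf{q} + \delta\mathbf{q}, \dot{\mathbf{q}} + \delta\dot{\mathbf{q}})\,dt \\
&= \int \left[L(\mathbf{q}, \dot{\mathbf{q}}) + \frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta\dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta\dot{q}_n\right]dt \\
&= \int L(\mathbf{q}, \dot{\mathbf{q}})\,dt + \int \left[\frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta\dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta\dot{q}_n\right]dt \\
&= S[L(\mathbf{q}, \dot{\mathbf{q}})] + \int \left[\frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta\dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta\dot{q}_n\right]dt,
\end{aligned} \tag{621} \]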
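hence
\[ \begin{aligned}
\delta S &:= S[L + \delta L] - S[L] \\
&= \int \left[\frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta\dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta\dot{q}_n\right]dt \\
&= \int \left[\frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_1}\right)\delta q_1 - \dots - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_n}\right)\delta q_n\right]dt \\
&= \int \left\{\left[\frac{\partial L}{\partial q_1} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_1}\right)\right]\delta q_1 + \dots + \left[\frac{\partial L}{\partial q_n} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_n}\right)\right]\delta q_n\right\}dt
\end{aligned} \tag{622} \]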
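where we have used integration by parts to move the total derivative $\frac{d}{dt}$ from
the perturbations, $\delta\dot{q}_i$, to the corresponding coefficients, $\frac{\partial L}{\partial \dot{q}_i}$. Again, one makes
the assumption of vanishing boundary contributions:
$\int d\left(\frac{\partial L}{\partial \dot{q}_i}\delta q_i\right) = \left[\frac{\partial L}{\partial \dot{q}_i}\delta q_i\right]\Big|_{-\infty}^{\infty} = 0$.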
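The principle of stationary action says that a physical system classically evolves
such that the action is stationary: $\frac{\delta S}{\delta q} = 0$. For this to happen, the coefficients of
the variations $\delta q_i$ of the coordinates must vanish in the integral (622). This means
that we obtain a system of n differential equations, which are the n-dimensional
Euler-Lagrange equations:
\[ \begin{aligned}
\frac{\partial L}{\partial q_1} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_1}\right) &= 0 \\
\frac{\partial L}{\partial q_2} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_2}\right) &= 0 \\
&\;\;\vdots \\
\frac{\partial L}{\partial q_n} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_n}\right) &= 0.
\end{aligned} \tag{623} \]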
In this manner, one can now derive Newton’s Second Law in $n$ dimensions by
generalizing the 1-dimensional case outlined earlier. In particular, this is done
by considering a potential $U = U(x_1, \dots, x_n)$ which depends on the $n$ position
coordinates $x_1, \dots, x_n$. The velocities are given by $\dot{x}_i = \frac{dx_i}{dt}$. Putting these into vector
quantities, the kinetic energy of a point particle of mass $m$ with velocity $\dot{\mathbf{x}}$ is given
by:
\[
K = \frac{1}{2} m \dot{\mathbf{x}}^2. \tag{624}
\]
Since the potential energy $U$ is time-independent, we can write the Lagrangian for
this system as:
\[
L = K - U = \frac{1}{2} m \dot{\mathbf{x}}^2 - U(\mathbf{x}). \tag{625}
\]
The Euler-Lagrange equations can be found using the system (623) earlier, with the
generalized coordinates taken to be the positions, $q_i = x_i$. In
particular, since we have
\[
\frac{\partial}{\partial \dot{q}_i} \dot{\mathbf{x}}^2 = \frac{\partial}{\partial \dot{q}_i} \big[ (\dot{q}_1)^2 + \dots + (\dot{q}_n)^2 \big] = 2\dot{q}_i, \tag{626}
\]
the Euler-Lagrange equation for the $i$-th coordinate of the point particle is given
by:
\[
m \frac{d}{dt}\dot{q}_i + \frac{\partial U}{\partial q_i} = 0. \tag{627}
\]
Re-arranging, this is simply the $i$-th component of the $n$-dimensional version of
Newton’s Second Law of motion:
\[
m \ddot{q}_i = -\frac{\partial U}{\partial q_i}. \tag{628}
\]
Collecting the $n$ equations into one vector equation, this is made explicit:
\[
\mathbf{F} := m \ddot{\mathbf{q}} = -\nabla U, \tag{629}
\]
where $\nabla U$ is the gradient (vector) of the potential energy function $U$. This statement
is in fact quite general: a conservative force $\mathbf{F}$ arising from a potential
$U$ is necessarily given by $\mathbf{F} = -\nabla U$. So for example, given a gravitational
potential $U = -\frac{GM}{r}$ (per unit test mass), we see that the (conservative) gravitational force is given
by:
\[
\mathbf{F} = -\nabla\Big(-\frac{GM}{r}\Big) = -\frac{GM}{r^2}\hat{\mathbf{r}}, \tag{630}
\]
where $G$ is Newton’s gravitational constant and $\hat{\mathbf{r}}$ is a unit vector pointing in the
radial direction away from a massive object of mass $M$. The minus sign then
accounts for the fact that the gravitational force is directed towards the massive
object.
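As a quick numerical sanity check of (630), one can compare a finite-difference gradient of the potential with the closed-form inverse-square force. The sketch below is only illustrative: the helper name `grad`, the choice of units with $GM = 1$ and the sample point are assumptions made here, not part of the notes.

```python
import math

GM = 1.0  # assumed illustrative units in which G*M = 1


def grad(U, x, h=1e-6):
    """Central-difference approximation to the gradient of U at the point x."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((U(xp) - U(xm)) / (2 * h))
    return g


def U(x):
    """Gravitational potential U = -GM/r (per unit test mass)."""
    r = math.sqrt(sum(c * c for c in x))
    return -GM / r


x = [1.0, 2.0, 2.0]                      # sample point, chosen so that r = 3
r = math.sqrt(sum(c * c for c in x))
F = [-g for g in grad(U, x)]             # F = -grad U, as in (629)
F_expected = [-(GM / r**2) * (c / r) for c in x]  # -(GM/r^2) r_hat, as in (630)
```

Comparing `F` with `F_expected` component by component shows that both point towards the origin and agree up to the finite-difference error.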
27.4 Examples
Example 27 (Simple Pendulum) Consider a vertical pendulum of mass $m$ and
length $l$. We set up a coordinate system with horizontal (pointing right) coordinate
$x$ and vertical (downward) coordinate $y$, where $\theta$ is the angle between the vertical
$y$-axis and the arm of the pendulum. We set the origin to be at the pivot of the
pendulum arm, from which the mass hangs at the opposite end. Since the mass at the
end of the pendulum moves in a circular arc of fixed radius $r = l$ (the length of the
pendulum arm), it has a tangential velocity of $v = r\omega = l\dot{\theta}$.
Therefore, the total kinetic energy is given by:
\[
K = \frac{1}{2} m v^2 = \frac{1}{2} m l^2 \dot{\theta}^2. \tag{631}
\]
The potential energy is the weight $mg$ multiplied by the height of the mass. Since $y$
points downward, the height of the mass relative to the pivot is $-y = -l\cos(\theta)$, so:
\[
U = -mgy = -mgl\cos(\theta). \tag{632}
\]
The Lagrangian is therefore given by
\[
L(\theta, \dot{\theta}) = K - U = \frac{1}{2} m l^2 \dot{\theta}^2 + mgl\cos(\theta), \tag{633}
\]
where $\theta$ and $\dot{\theta}$ are the generalized coordinate and corresponding generalized velocity,
respectively. The Euler-Lagrange equation is given by
\[
\frac{\partial L}{\partial \theta} - \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = 0. \tag{634}
\]
Here $\frac{\partial L}{\partial \theta} = -mgl\sin(\theta)$ and $\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = ml^2\ddot{\theta}$, so (634) simplifies to
\[
\ddot{\theta} + \frac{g}{l}\sin(\theta) = 0. \tag{635}
\]
This differential equation can be solved analytically for $\theta$ using special functions
(Jacobi elliptic functions). Alternatively, one can make the small angle approximation to linearise
this non-linear differential equation: $\sin(\theta) \approx \theta$, for small displacements $\theta \ll 1$
(radians).
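To get a feel for when the small angle approximation is trustworthy, one can integrate (635) numerically and compare with the linearised solution $\theta(t) = \theta_0\cos(\sqrt{g/l}\,t)$ for a pendulum released from rest. The following is a minimal sketch, assuming a classical fourth-order Runge-Kutta integrator; the function names, step size and the values $g = 9.81$, $l = 1$ are illustrative choices made here, not part of the notes.

```python
import math


def simulate_pendulum(theta0, g=9.81, l=1.0, dt=1e-3, t_end=2.0):
    """Integrate theta'' = -(g/l) sin(theta) with RK4, released from rest at theta0."""
    def f(theta, omega):
        # First-order system: (theta', omega') = (omega, -(g/l) sin(theta))
        return omega, -(g / l) * math.sin(theta)

    theta, omega = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1t, k1w = f(theta, omega)
        k2t, k2w = f(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1w)
        k3t, k3w = f(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2w)
        k4t, k4w = f(theta + dt * k3t, omega + dt * k3w)
        theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
    return theta


def small_angle(theta0, t, g=9.81, l=1.0):
    """Solution of the linearised equation theta'' + (g/l) theta = 0, released from rest."""
    return theta0 * math.cos(math.sqrt(g / l) * t)


# Difference between the full and linearised angle after 2 seconds
error_small = abs(simulate_pendulum(0.05) - small_angle(0.05, 2.0))
error_large = abs(simulate_pendulum(2.0) - small_angle(2.0, 2.0))
```

For a release angle of 0.05 radians the two answers are almost indistinguishable, whereas for 2 radians the linearised solution is badly wrong, since the true period grows with amplitude.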
Note that using the Lagrangian approach, one only needs to compute the potential
energy and kinetic energy for the pendulum system. This is a rather trivial task
(as shown) which avoids the messiness of having to consider forces and ‘tension’,
which is required by the Newtonian approach.
Another advantage of the Lagrangian formalism is that one may easily change
coordinates without having to worry about introducing ‘fictitious forces’ (e.g. centrifugal,
Coriolis): the principle of ‘generalised coordinates’ essentially bids one
to express the Lagrangian in terms of the most ‘natural’ coordinate system for the
problem at hand. Here we made use of the rotational nature of the problem to switch
from the Cartesian $x, y$ coordinates to the polar coordinates $r, \theta$ (although we didn’t
use $r$, since the radial coordinate was fixed at $r = l$).
Example 28 (Harmonic Oscillator) Consider a 3-dimensional harmonic oscillator.
Such a system may be envisioned as a mass attached to a spring, whose other
end is fixed. If we let the origin of a 3-dimensional Cartesian coordinate system
$x, y, z$ coincide with the initial (non-stretched) position of the mass, then stretching
the spring in any direction will induce a radial oscillatory motion. Let $k$ denote the
spring constant and $m$ denote the mass at the end of the spring. The force on the
mass is given by Hooke’s law:
\[
\mathbf{F} = -k\mathbf{r}, \tag{636}
\]
where $\mathbf{r}$ is the (radial) position vector: $\mathbf{r} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3 \sim (x, y, z)$. The
potential energy of the spring is equal to the work required to stretch the
spring from its rest position against this force:
\[
U = -\int_0^r \mathbf{F} \cdot d\mathbf{l} = \int_0^r k r' \, dr' = \frac{1}{2} k r^2. \tag{637}
\]
The kinetic energy of the mass is given by
\[
K = \frac{1}{2} m \lVert\mathbf{v}\rVert^2 = \frac{1}{2} m \lVert\dot{\mathbf{r}}\rVert^2, \tag{638}
\]
where $\lVert\dot{\mathbf{r}}\rVert^2 = \dot{x}^2 + \dot{y}^2 + \dot{z}^2$. We could use Cartesian coordinates, however radial
coordinates are the ‘natural choice’ for this problem (since it is effectively a 1-dimensional
problem: the motion only occurs in the radial direction, so that $\lVert\dot{\mathbf{r}}\rVert^2 = \dot{r}^2$).
Therefore we choose $r$ and $\dot{r} = \frac{d}{dt}r$ to be our generalized coordinate
and generalized velocity, respectively. With the Lagrangian
$L = K - U = \frac{1}{2} m \dot{r}^2 - \frac{1}{2} k r^2$, the Euler-Lagrange equation is given by
\[
\frac{\partial L}{\partial r} - \frac{d}{dt}\frac{\partial L}{\partial \dot{r}} = 0, \tag{639}
\]
which reduces to
\[
\ddot{r} + \frac{k}{m} r = 0. \tag{640}
\]
This second-order linear differential equation is solved by the usual means. In
particular, the characteristic equation is given by:
\[
\lambda^2 + \frac{k}{m} = 0, \tag{641}
\]
whence the eigenvalues are $\lambda = \pm i\sqrt{\frac{k}{m}}$. Let $\omega := \sqrt{\frac{k}{m}}$ denote the fundamental
frequency. Then the general solution is given by:
\[
r(t) = c_1 e^{i\omega t} + c_2 e^{-i\omega t}, \tag{642}
\]
where $c_1$ and $c_2$ are constants determined by the initial conditions. This can alternatively
be expressed in real form,
\[
r(t) = a_1\cos(\omega t) + a_2\sin(\omega t), \tag{643}
\]
where $a_1$ and $a_2$ are constants determined by the initial conditions. In particular,
$r(0) = a_1$ and $\dot{r}(0) = a_2\omega$. Hence $a_1$ is the initial displacement and $a_2$ is the
initial velocity divided by the fundamental frequency.
Note: if you’ve forgotten how to get from complex form to real form, recall that
\[
\cos(x) = \frac{e^{ix} + e^{-ix}}{2}, \qquad \sin(x) = \frac{e^{ix} - e^{-ix}}{2i}, \tag{644}
\]
where $i^2 := -1$. Comparing coefficients we see that the constants are explicitly
related by:
\[
c_1 = \frac{a_1}{2} + \frac{a_2}{2i}, \qquad c_2 = \frac{a_1}{2} - \frac{a_2}{2i}. \tag{645}
\]
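The relation (645) between the complex and real constants is easy to get wrong by a sign, so it is worth checking numerically. Below is a small sketch using Python's standard `cmath` module; the function names and the sample values of $a_1$, $a_2$ and $\omega$ are arbitrary illustrative choices.

```python
import cmath
import math


def real_form(t, a1, a2, omega):
    """Real form (643): a1 cos(omega t) + a2 sin(omega t)."""
    return a1 * math.cos(omega * t) + a2 * math.sin(omega * t)


def complex_form(t, a1, a2, omega):
    """Complex form (642), with c1 and c2 built from a1 and a2 via (645)."""
    c1 = a1 / 2 + a2 / (2j)
    c2 = a1 / 2 - a2 / (2j)
    return c1 * cmath.exp(1j * omega * t) + c2 * cmath.exp(-1j * omega * t)


# Largest disagreement between the two forms over a few sample times
a1, a2, omega = 0.5, -1.2, 2.0
max_difference = max(
    abs(complex_form(t, a1, a2, omega) - real_form(t, a1, a2, omega))
    for t in (0.0, 0.3, 1.7, 4.0)
)
```

Since $c_2$ is the complex conjugate of $c_1$ whenever $a_1$ and $a_2$ are real, the imaginary parts cancel and the complex form returns a real displacement, up to rounding.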
27.5 Multiple Independent Parameters
For the purpose of the (modern and topical) branch of mathematical physics known
as ‘minimal surface’ theory, along with relativity and quantum field theory, it is important
to extend the Lagrangian formalism to include physical systems (or more
specifically, generalized coordinates) which depend on more than one independent
parameter. Until now, we have considered systems which were parametrised by one
independent variable, time $t$. We now consider systems which are parametrised
by $k$ independent variables, which we shall denote $t_1, \dots, t_k$ for familiarity.
For simplicity, we shall just consider systems with one generalized coordinate
(parametrised by multiple variables) for now. The extension to an arbitrary number
of generalised coordinates is done in the obvious way, analogous to our previous
extension when we had just one independent parameter $t$.
Let $t_1, \dots, t_k$ denote our $k$ independent parameters and let $q := q(t_1, \dots, t_k)$ denote
our generalized coordinate, dependent on these parameters. The corresponding
generalized velocities (with respect to each parameter) are then given by:
$\frac{\partial q}{\partial t_1}, \dots, \frac{\partial q}{\partial t_k}$.
Given some function $L := L(q, \frac{\partial q}{\partial t_1}, \dots, \frac{\partial q}{\partial t_k}; t_1, \dots, t_k)$ explicitly dependent on the
generalized coordinate $q$ and the generalized velocities $\frac{\partial q}{\partial t_i}$, and implicitly dependent on the
independent parameters $t_1, \dots, t_k$, we now wish to formulate a variational problem.
In particular, we consider the following action functional (a $k$-dimensional integral
performed over $t_1, \dots, t_k$):
\[
S[L] = \int L \, dt_1 dt_2 \dots dt_k \tag{646}
\]
and ask the question: which functions $q$ make this action stationary? To solve the
variational problem, we proceed as before to vary the action by Taylor expansion
of $L$ in all its variables. In order to do this, some new notation will be handy. Let $v^q_i$
denote the $i$-th generalized velocity corresponding to the generalized coordinate
$q$. In particular, we have: $v^q_1 := \frac{\partial q}{\partial t_1}, \dots, v^q_k := \frac{\partial q}{\partial t_k}$. The variation of the Lagrangian
is then given using the same rules as the total differential:
\[
\delta L = \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + \dots + \frac{\partial L}{\partial v^q_k}\delta v^q_k. \tag{647}
\]
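Therefore, the variation of the action is given by:
\[
\begin{aligned}
\delta S &= \int \delta L \, dt_1 \dots dt_k \\
&= \int \Big[ \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + \dots + \frac{\partial L}{\partial v^q_k}\delta v^q_k \Big] dt_1 \dots dt_k \\
&= \int \Big[ \frac{\partial L}{\partial q}\delta q - \frac{\partial}{\partial t_1}\Big(\frac{\partial L}{\partial v^q_1}\Big)\delta q - \dots - \frac{\partial}{\partial t_k}\Big(\frac{\partial L}{\partial v^q_k}\Big)\delta q \Big] dt_1 \dots dt_k \\
&= \int \Big[ \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\Big(\frac{\partial L}{\partial v^q_1}\Big) - \dots - \frac{\partial}{\partial t_k}\Big(\frac{\partial L}{\partial v^q_k}\Big) \Big] \delta q \, dt_1 \dots dt_k,
\end{aligned}
\tag{648}
\]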
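where we have used integration by parts (or Stokes’ Theorem) for multiple variables
to swap the derivatives $\frac{\partial}{\partial t_i}$ from the velocity variations $\delta v^q_i = \delta\frac{\partial q}{\partial t_i}$ to the corresponding
coefficients $\frac{\partial L}{\partial v^q_i}$, which introduces the minus signs. Therefore, we have
the functional derivative of the action with respect to the generalized coordinate,
given by:
\[
\frac{\delta S}{\delta q} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\Big(\frac{\partial L}{\partial v^q_1}\Big) - \dots - \frac{\partial}{\partial t_k}\Big(\frac{\partial L}{\partial v^q_k}\Big). \tag{649}
\]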
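The principle of stationary action tells us that nature classically selects this functional
derivative to be zero, which gives us the Euler-Lagrange equation for a system
with one generalized coordinate $q$, parametrised by $k$ independent variables
$t_1, \dots, t_k$:
\[
0 = \frac{\delta S}{\delta q}\Big|_{\text{Nature}} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\Big(\frac{\partial L}{\partial v^q_1}\Big) - \dots - \frac{\partial}{\partial t_k}\Big(\frac{\partial L}{\partial v^q_k}\Big). \tag{650}
\]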
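As a simple illustration of (650) with $k = 2$, take the parameters to be $t_1 = t$ and $t_2 = x$ and consider the Lagrangian $L = \frac{1}{2}q_t^2 - \frac{1}{2}c^2 q_x^2$. This Lagrangian is an assumed example chosen here for illustration (it does not appear in the notes). Since $\frac{\partial L}{\partial q} = 0$, equation (650) becomes $-q_{tt} + c^2 q_{xx} = 0$, the one-dimensional wave equation. The sketch below evaluates the left-hand side with finite differences for a travelling wave and for a function that is not a solution; all names and values are hypothetical.

```python
import math

C = 2.0  # assumed wave speed, for illustration only


def euler_lagrange_residual(q, t, x, h=1e-3):
    """Left-hand side of (650) for L = 0.5*q_t**2 - 0.5*C**2*q_x**2.

    Here dL/dq = 0, dL/dq_t = q_t and dL/dq_x = -C**2 * q_x, so the
    residual is -q_tt + C**2 * q_xx, approximated by central differences.
    """
    q_tt = (q(t + h, x) - 2 * q(t, x) + q(t - h, x)) / h**2
    q_xx = (q(t, x + h) - 2 * q(t, x) + q(t, x - h)) / h**2
    return -q_tt + C**2 * q_xx


def travelling_wave(t, x):
    """A right-moving wave, expected to make the action stationary."""
    return math.sin(x - C * t)


def not_a_solution(t, x):
    """A function that does not satisfy the Euler-Lagrange equation."""
    return math.sin(x) * t**2


residual_wave = euler_lagrange_residual(travelling_wave, 1.0, 1.0)
residual_other = euler_lagrange_residual(not_a_solution, 1.0, 1.0)
```

The residual for the travelling wave vanishes up to discretisation error, while the second function leaves a residual of order one.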
27.6 More Examples
We can use variational calculus to derive the (rather famous) minimal surface equa-
tion. In particular, we consider the following example.
Example 29 (Minimal Surface Equation) We consider all two-dimensional surfaces
parametrised by two independent variables, $z := z(x, y)$, then ask the question:
which surface of this general form has the minimal surface area? To answer
this question, we can use the Euler-Lagrange equation (650) derived earlier. Say
that the surface $z := z(x, y)$, parametrised by the two independent variables $t_1 = x$
and $t_2 = y$, has a domain $D$. Then (recall) its surface area is given by the double
integral:
\[
A = \iint_D \sqrt{1 + \Big(\frac{\partial z}{\partial x}\Big)^2 + \Big(\frac{\partial z}{\partial y}\Big)^2} \, dx \, dy. \tag{651}
\]
We can view this as a variational problem by observing that $z$ is a generalised coordinate
parametrised by two independent variables $x$ and $y$. The corresponding
generalised velocities are given by (various notations) $v^z_1 = z_x := \frac{\partial z}{\partial x}$ and
$v^z_2 = z_y := \frac{\partial z}{\partial y}$; we shall stick with the latter notation. Now, the total surface
area $A$ can be viewed as an action functional, $A = A[L]$, whilst our integrand
(infinitesimal area differential) can be viewed as the corresponding Lagrangian:
$L(z, z_x, z_y) = \sqrt{1 + (\frac{\partial z}{\partial x})^2 + (\frac{\partial z}{\partial y})^2} = \sqrt{1 + z_x^2 + z_y^2}$.
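Since we seek to minimize $A$, we need to first find surfaces (parametric functions)
$z(x, y)$ which make the functional $A$ stationary. We then need to check that these
stationary ‘points’ (functions) correspond to minima, rather than inflection points
or maxima. The first task can be achieved by solving the Euler-Lagrange equation
(650), which takes the form:
\[
\frac{\partial L}{\partial z} - \frac{\partial}{\partial x}\frac{\partial L}{\partial z_x} - \frac{\partial}{\partial y}\frac{\partial L}{\partial z_y} = 0
\implies
\frac{\partial}{\partial x}\Bigg(\frac{z_x}{\sqrt{1 + z_x^2 + z_y^2}}\Bigg) + \frac{\partial}{\partial y}\Bigg(\frac{z_y}{\sqrt{1 + z_x^2 + z_y^2}}\Bigg) = 0, \tag{652}
\]
where we have used $\frac{\partial L}{\partial z} = 0$ and multiplied through by $-1$.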
Although the last equation, known as the ‘minimal surface equation’, was derived
by Lagrange in 1762, non-trivial (non-planar) solutions were not found till 1776
by the French mathematician and engineer Jean Meusnier. The trivial planar
solution is given by:
\[
Z(x, y) = Ax + By + C, \tag{653}
\]
where $A, B, C$ are constants. Here $Z_x = \frac{\partial Z}{\partial x} = A$, $Z_y = \frac{\partial Z}{\partial y} = B$ and
$L = \sqrt{1 + A^2 + B^2}$, so both terms in (652) are derivatives of constants and vanish.
Switching to cylindrical coordinates $(\rho, \theta, z)$, with $x = \rho\cos(\theta)$, $y = \rho\sin(\theta)$
and $z = z$, we have another solution to the minimal surface problem. This is given
by the Catenoid, a surface of revolution parametrised by a single independent
variable, $z$:
\[
\rho = \lambda\cosh\Big(\frac{z}{\lambda}\Big), \tag{654}
\]
where $\lambda$ is a constant. Note that $\rho$ is independent of the second independent variable
$\theta$, since the surface is rotationally symmetric (it is produced by rotating a
catenary curve about the $z$-axis). To show this is a solution, we can either re-derive
the minimal surface equation, starting from the infinitesimal area element
$dA = \rho\sqrt{1 + (\frac{\partial\rho}{\partial z})^2 + \frac{1}{\rho^2}(\frac{\partial\rho}{\partial\theta})^2} \, d\theta \, dz$, or try some messy crap with the chain rule and
the Cartesian coordinate equation. It’s far easier to start from the action principle
again, with the Lagrangian $L(\rho, \frac{\partial\rho}{\partial z}, \frac{\partial\rho}{\partial\theta})$. Since our Catenoid is independent of
$\theta$ (symmetry in $\theta$), we have $\frac{\partial\rho}{\partial\theta} = 0$. Therefore, our Lagrangian is the coefficient
function (coefficient of $d\theta \wedge dz$) of our area 2-form element:
\[
L = L(\rho, \rho_z) = \rho\sqrt{1 + \Big(\frac{\partial\rho}{\partial z}\Big)^2 + 0}. \tag{655}
\]
Letting $\rho_z := \frac{\partial\rho}{\partial z}$, our Euler-Lagrange equation is given by:
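\[
\frac{\partial L}{\partial \rho} - \frac{d}{dz}\frac{\partial L}{\partial \rho_z} = 0, \tag{656}
\]
which simplifies to:
\[
\sqrt{1 + \rho_z^2} - \frac{d}{dz}\Bigg(\frac{\rho_z\rho}{\sqrt{1 + \rho_z^2}}\Bigg) = 0. \tag{657}
\]
With some application of the chain and product rules, along with the hyperbolic
trigonometric identities
\[
\begin{aligned}
1 + \sinh^2(x) &= \cosh^2(x), \\
\frac{d}{dx}\cosh(\lambda x) &= \lambda\sinh(\lambda x), \qquad \frac{d}{dx}\sinh(\lambda x) = \lambda\cosh(\lambda x), \\
\frac{d}{dx}\tanh(x) &= \operatorname{sech}^2(x),
\end{aligned}
\tag{658}
\]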
one can show that the Catenoid surface, given by $\rho(z) = \lambda\cosh(\frac{z}{\lambda})$, solves the
Euler-Lagrange equation (657). Hence the Catenoid corresponds to a ‘critical surface’
(cf. ‘critical point’) of the surface area functional $A$ and makes this functional
(action) stationary. To see that it is not a maximal surface, note that
the Lagrangian is given by the square root of a strictly positive quantity, so the
corresponding area (action) integral is strictly positive and bounded below by zero,
but it has no upper bound: any surface can be made larger by adding small wrinkles.
This means that the Catenoid surface (or in fact any surface!) cannot be
a maximal surface. Hence the Catenoid is either a saddle-type stationary point or a minimum of
the area action functional. It is in fact a minimal surface.
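If the algebra with the hyperbolic identities feels opaque, the claim that the Catenoid solves (657) can also be checked numerically. The sketch below approximates the derivatives with central differences; the value $\lambda = 1.5$, the step size and the function names are illustrative assumptions, and a cylinder of constant radius is included as a profile that should fail the test.

```python
import math

LAM = 1.5  # assumed value of the constant lambda, for illustration


def catenoid(z):
    """Catenoid profile (654): rho = lambda cosh(z / lambda)."""
    return LAM * math.cosh(z / LAM)


def cylinder(z):
    """A cylinder of constant radius, which is not a solution of (657)."""
    return 1.0


def derivative(f, z, h=1e-3):
    """Central-difference approximation to df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)


def residual(profile, z):
    """Left-hand side of (657) for the profile rho = profile(z)."""
    def momentum(s):
        # dL/d(rho_z) = rho * rho_z / sqrt(1 + rho_z**2)
        rho_z = derivative(profile, s)
        return profile(s) * rho_z / math.sqrt(1 + rho_z**2)

    rho_z = derivative(profile, z)
    return math.sqrt(1 + rho_z**2) - derivative(momentum, z)


catenoid_residuals = [residual(catenoid, z) for z in (-1.0, 0.0, 0.4, 1.0)]
cylinder_residual = residual(cylinder, 0.3)
```

The Catenoid residuals vanish up to discretisation error, whereas the cylinder leaves a residual of 1, which is just the $\sqrt{1 + \rho_z^2}$ term with $\rho_z = 0$.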
27.7 Closing Remarks
The Lagrangian formalism is, for the most part, a second-order formalism. This
means that the equations of motion resulting from the Euler-Lagrange equations
are usually second-order differential equations. For many different reasons, it is
sometimes advantageous or necessary to switch to a first-order formalism:
‘Hamiltonian mechanics’. To do this, one defines the Hamiltonian as the Legendre
transform of the Lagrangian:
\[
H(\mathbf{q}, \mathbf{p}; t) = \mathbf{p} \cdot \dot{\mathbf{q}} - L(\mathbf{q}, \dot{\mathbf{q}}; t), \tag{659}
\]
where $\mathbf{p}$ is the conjugate momentum vector (related to the generalized velocities).
The components of $\mathbf{p}$ are defined as the partial derivatives of the Lagrangian
with respect to the generalized velocities:
\[
p_i := \frac{\partial L}{\partial \dot{q}_i}. \tag{660}
\]
In this formalism, the natural variables are now the generalized coordinates $\mathbf{q}$ and
the conjugate momenta $\mathbf{p}$. From a practical point of view, the ultimate result is
that Hamilton’s equations are coupled first-order differential equations, which are
often easier to solve than the Euler-Lagrange equations.
Although they are essentially equivalent, there are many theoretical motivations
for the Hamiltonian formalism, most notably that it allows a dynamical system
to be represented in ‘phase space’. Evolution of the system is then described by
trajectories $(\mathbf{q}(t), \mathbf{p}(t))$ in phase space. With such a structure, the system can be
analysed using symplectic geometry and Liouville theory, the key point being
that the Hamiltonian $H(\mathbf{q}, \mathbf{p})$ defines a ‘flow’ on phase space (a map on the cotangent
bundle). This flow preserves a closed, non-degenerate 2-form called the
‘symplectic form’, which is the basis for many deep mathematical theorems regarding
dynamics.
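To make the first-order formalism concrete, consider again the harmonic oscillator with $L = \frac{1}{2}m\dot{q}^2 - \frac{1}{2}kq^2$. From (660) we get $p = m\dot{q}$, and (659) gives $H = \frac{p^2}{2m} + \frac{1}{2}kq^2$. Hamilton's equations, $\dot{q} = \frac{\partial H}{\partial p}$ and $\dot{p} = -\frac{\partial H}{\partial q}$, then read $\dot{q} = p/m$ and $\dot{p} = -kq$. The sketch below steps these two first-order equations forward with a semi-implicit Euler update. That integrator is a choice made here for illustration rather than a method from the notes, and the function names and parameter values are likewise hypothetical.

```python
import math


def hamiltonian(q, p, m=1.0, k=4.0):
    """H = p^2/(2m) + k q^2/2, the Legendre transform of the oscillator Lagrangian."""
    return p * p / (2 * m) + 0.5 * k * q * q


def evolve(q, p, t_end, dt=1e-4, m=1.0, k=4.0):
    """Step Hamilton's equations q' = p/m, p' = -k q with semi-implicit Euler."""
    for _ in range(int(round(t_end / dt))):
        p -= k * q * dt       # update the momentum using the current position
        q += (p / m) * dt     # update the position using the new momentum
    return q, p


q0, p0 = 1.0, 0.0
q1, p1 = evolve(q0, p0, t_end=1.0)

omega = math.sqrt(4.0 / 1.0)            # fundamental frequency sqrt(k/m)
q_exact = q0 * math.cos(omega * 1.0)    # exact solution for these initial data
energy_drift = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
```

The numerical trajectory tracks the exact solution $q(t) = q_0\cos(\omega t)$ and the value of $H$ stays very close to its initial value, as expected for a system whose Hamiltonian has no explicit time dependence.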
254

SGC 2014 - Mathematical Sciences Tutorials

  • 1.
    St. George’s College2014 - Mathematical Sciences Tutorials (Broad Concept Problems) Daniel Xavier Ogburn ∗ School of Physics, Field Theory and Quantum Gravity, University of Western Australia December 22, 2014 ∗ Electronic address: daniel.ogburn@research.uwa.edu.au 1
  • 2.
    Contents 1 Introduction 6 2Tutor List 6 3 Broad Concept Problems 6 4 Tutorial 1 - Dimensional Analysis and the Buckingham Pi Theorem (part I) 7 4.1 Prologue: March 15, 2014 . . . . . . . . . . . . . . . . . . . . . 7 4.2 Examples and Problems . . . . . . . . . . . . . . . . . . . . . . . 8 4.2.1 Moral of the story . . . . . . . . . . . . . . . . . . . . . . 15 5 Tutorials 2 - Dimensional Analysis and the Buckingham Pi Theorem (part II) 15 5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.2 Examples and Problems . . . . . . . . . . . . . . . . . . . . . . . 16 6 Tutorial 3 - Return of Dimensional Analysis: Gravity, The Hierarchy Problem and extra-dimensional Braneworlds 22 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 6.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 6.3 Extended Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 23 7 Tutorial 4: 50 Shades of Error, Shade I – Multivariable calculus and The Total Differential 29 7.1 Russian Playpen: Functions of more than one variable . . . . . . . 29 7.2 Russian Daycare: Partial Differentiation . . . . . . . . . . . . . . 30 7.3 Russian Kindergarten: The Exterior Derivative (Total Differential) 34 7.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 36 8 Tutorial 5: Absolute Error and Game of Thrones 41 8.1 Absolute Error . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 8.2 Examples and Problems . . . . . . . . . . . . . . . . . . . . . . . 43 9 Tutorial 6: Medicine – An Error a Day Keeps the Tutor Away 54 9.1 Relative and Percentage Error . . . . . . . . . . . . . . . . . . . 54 9.2 Error Etiquette . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 9.3 Sleepy Snorlax’s Medical (mis)Adventures . . . . . . . . . . . . 55 2
  • 3.
    10 Tutorial 7:Romanian High School, Part I – Einstein Convention and Vector Algebra 61 10.1 Conventions: Einstein Notation and Vector/Matrix Operations . . 62 10.1.1 Scalar and Vector Products – Dot Product . . . . . . . . . 66 10.1.2 Scalar and Vector Products – The Permutation Symbol . . 69 10.1.3 Scalar and Vector Products – The Cross Product . . . . . . 70 11 Tutorial 8: Design a Death Star – applications of Lie Groups/Algebras 73 11.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 11.2 BFF: Linear Maps and Matrices . . . . . . . . . . . . . . . . . . 74 11.3 SO(3): The Lie Group of Rotations . . . . . . . . . . . . . . . . . 76 11.4 so(3): Quaternions, Lie Algebras and Cross Products . . . . . . . 83 12 Tutorial 9+10: The Fault in Our Stars – Project Death Star (II) 84 12.1 Infinitesimal Rotations and Lie Algebras . . . . . . . . . . . . . . 85 13 Tutorial 11: Fiery the angels fell – Project Death Star (III) 100 13.1 Prelude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 13.2 The Circle Group . . . . . . . . . . . . . . . . . . . . . . . . . . 101 13.3 The Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 13.4 Quaternions, Rotations and the 3-Sphere . . . . . . . . . . . . . . 109 14 Interlude: Academic and Intellectual Maturity 113 14.1 Keeping a CV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 14.2 Important Learnings and Observations . . . . . . . . . . . . . . . 114 14.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 15 Tutorial 12: Metric Spaces and Relativity I 119 15.1 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 15.1.1 Euclidean Metric Spaces . . . . . . . . . . . . . . . . . . 120 15.1.2 Fun Metric Spaces . . . . . . . . . . . . . . . . . . . . . 123 15.2 Non-Euclidean Metric Spaces and Relativity . . . . . . . . . . . . 
130 16 Tutorial 13/14: Relativity and Hyperbolic Distance 131 16.1 The Two Faces of Trigonometry . . . . . . . . . . . . . . . . . . 131 16.1.1 The Circular Face . . . . . . . . . . . . . . . . . . . . . . 131 16.1.2 The Hyperbolic Face . . . . . . . . . . . . . . . . . . . . 134 16.2 Lorentz Metric and Relativity . . . . . . . . . . . . . . . . . . . . 137 16.2.1 Minkowski Spacetime . . . . . . . . . . . . . . . . . . . 138 16.2.2 Lorentz Metric and Light-Cone Structure . . . . . . . . . 140 16.2.3 Projections and Familiar Formulas . . . . . . . . . . . . . 146 3
  • 4.
    17 Tutorial 15:Differential Equations and Operators 150 17.1 Differential Operators and Simple DEs . . . . . . . . . . . . . . . 150 17.2 Physical Examples . . . . . . . . . . . . . . . . . . . . . . . . . 156 17.3 Operators, Eigenfunctions and Spectra . . . . . . . . . . . . . . . 159 18 Tutorial 16:Differential Equations and Integrating Factors 162 18.1 Review – Theory of separation of variables . . . . . . . . . . . . 162 18.2 Integration Factors . . . . . . . . . . . . . . . . . . . . . . . . . 164 19 Tutorial 17: Second Order Linear Differential Equations 170 19.1 Homogenous Second Order ODEs . . . . . . . . . . . . . . . . . 170 19.2 Theory of Linear ODEs . . . . . . . . . . . . . . . . . . . . . . . 172 19.3 Explicit Algorithm and Illustrations . . . . . . . . . . . . . . . . 175 20 Tutorial 18: Calculus of Vectors and Differential Forms I 183 20.1 Vector Valued Functions . . . . . . . . . . . . . . . . . . . . . . 183 20.2 Exterior Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . 186 21 Tutorial 19: Calculus of Vectors and Differential Forms II 194 21.1 Gradients and Exterior Derivatives . . . . . . . . . . . . . . . . . 194 21.1.1 Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . 194 21.1.2 Exterior Derivatives . . . . . . . . . . . . . . . . . . . . 199 21.2 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 21.3 Hodge Dual, Closed and Exact Forms . . . . . . . . . . . . . . . 204 22 Tutorial 20: Calculus of Vectors and Differential Forms III 205 22.1 Sleight of Hand . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 22.2 Curl of a Vector Field . . . . . . . . . . . . . . . . . . . . . . . . 207 23 Tutorial 21: Coordinate Systems and Scale Factors 212 23.1 Orientation and Measure . . . . . . . . . . . . . . . . . . . . . . 212 23.2 Smooth Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 24 Tutorial 22: Line Integrals and Exterior Calculus 221 24.1 Exterior Product and Derivatives . . 
. . . . . . . . . . . . . . . . 221 24.2 Orienting Volume Forms . . . . . . . . . . . . . . . . . . . . . . 224 24.3 Duality and Orthogonality . . . . . . . . . . . . . . . . . . . . . 226 25 Tutorial 23: Serendipity and Integration 230 25.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 25.2 Differentigration . . . . . . . . . . . . . . . . . . . . . . . . . . 231 4
  • 5.
    25.3 Quantum FieldTheory Aside . . . . . . . . . . . . . . . . . . . . 236 25.4 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 26 2015 Academic Program Suggestions 239 26.1 Tutoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 26.2 Mathematical Sciences Tutorial Plan . . . . . . . . . . . . . . . . 241 27 Miscellaneous 242 27.1 Lagrangian Mechanics . . . . . . . . . . . . . . . . . . . . . . . 242 27.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . 242 27.1.2 The Principle of Stationary Action . . . . . . . . . . . . . 242 27.2 The Euler-Lagrange Equations of Motion . . . . . . . . . . . . . 243 27.3 N-Dimensional Euler-Lagrange Equations . . . . . . . . . . . . . 247 27.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 27.5 Multiple Independent Parameters . . . . . . . . . . . . . . . . . . 252 27.6 More Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 27.7 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 256 5
  • 6.
    1 Introduction At thispresent moment, the suggested layout for tutorials will be: 20 minutes of ‘broad concept problems’ and 40 minutes of subject-specific help with questions from your coursework. If the there is a large turnout of students and more time is necessary, the tutorials will be extend to 30 minutes of ‘broad concept problems’ and 60 minutes of subject-specific help. In addition, students may approach tutors outside of tutorial times for help with specific coursework problems or concepts – but they will have to arrange this themselves with the individual tutors. 2 Tutor List For the year 2014, here is a list of tutors and the respective subjects they are dedi- cated to. Note that each tutor will probably be able to help you with other mathe- matics or physics related enquiries. • Murdock Grewar – PHYS1021 • Tessa McGrath – MATH1711 • Ben Luo – PHYS1001 • Jake Miller – MATH1001 In addition, students with any mathematics, physics or statistics enquiries are wel- come to seek me for assistance. 3 Broad Concept Problems These problems are designed to help expose you to important material and concepts outside the scope of a standard curriculum. They are also designed to help you think about applications of your mathematical powers to the world at large. In this manner, the hope is that students will develop a higher level of critical thinking, logical reasoning and mathematical intuition for investigating different scenarios and solving new problems. Note that I will generally aim to cover problems that you wouldn’t usually see in your lectures – or focus on topics which are (by student and professional experi- ence) useful and important, but otherwise overlooked or just briefly glossed-over 6
  • 7.
    in typical universitycourses. Since the tutorials are targeted at people from both pure and applied mathematics, or physical and non-physical sciences, the assumed physical science knowledge will be kept to a minimum. In cases where physics or engineering examples are used, the prerequisite concepts will be introduced – but only to emphasize the take-home message. To get the most out of these tutorials, you should attempt all the broad-concept problems. Some weeks, we will continue a certain theme from the previous week. If you can’t finish the problems in the tutorial, or decide to finish them after the tutorial in your own-time – feel free to ask questions during the week. The tutors appreciate that students are busy with their coursework and assessed homework problems, so the broad concept problems should be fairly quick to solve. As an incentive, doing these ’extra-curricular’ questions should give you an edge over your rival Tommy Moore, Uni Hall, St. Cats and Trinity students. 4 Tutorial 1 - Dimensional Analysis and the Buckingham Pi Theorem (part I) 4.1 Prologue: March 15, 2014 Dimensional analysis is a deceptively simple, but fundamentally powerful tool in the mathematical sciences – one that is often overlooked! There will be a day when the importance of dimensional analysis is forgotten and lost in the education system, but today is not that day. Ultimately, dimensional analysis serves as fast error-checking algorithm for your calculations. It is also useful for extracting ‘physically meaningful’ information out of your system. In particular, given a large set of parameters describing a system, one can often form a smaller number of dimensionless parameters which completely characterize that system – hence removing any redundant information. 
The precise statement of the last idea is known as the ‘Buckingham Pi Theorem’1, which we shall investigate next week – don’t worry about the formality of the name, it has vast (but simple) practical applications to fluid mechanics, thermody- namics, electrodynamics, cosmology and much more. For now, we begin with a few examples then work through some questions 2. 1 For those of you who have done (or will do) linear algebra, this is just a practical consequence of the ‘rank-nullity’ theorem. 2 Thanks to Scott Meyer and Matthew Fernandez for feedback 7
  • 8.
    The main ideaof the following examples and problems is two-fold: first inspect an equation and work out the dimensions (or units) of each variable and constant, given some starting information. We then check whether or not the equation is dimensionally consistent. Any equation from any area of science and mathematics must be dimensionally consistent – if it isn’t, then it’s wrong. In this sense, you don’t need to understand the science or theory behind an equation to deduce when it is incorrect on dimensional grounds. 4.2 Examples and Problems Recall lengths, areas and volumes. The fundamental unit that characterizes these quantities is length: L. Given a rectangular box, with sides of length a, b, c the volume is VB = a × b × c. Since each of the sides has the dimensions of length: [a] = [b] = [c] = L, the volume has dimensions [VB] =[a × b × c] =[a] + [b] + [c] =L + L + L =3L , which we interpret as length-cubed: L3. The notation [ ] is used to denote the dimensions of whatever quantity is inside the brackets. Notice also, that when we were looking for the dimensions of a product of variables [a × b × c], we added the dimensions of each variable: [a×b×c] = [a]+[b]+[c] = L+L+L = 3L.Finally, we ended up with [VB] = 3L, which means that the volume V has 3 factors of the unit length L – hence volume V has dimensions of length-cubed L3. Of course, we already knew this! Similarly to the multiplication rule, if we are inverting quantities we invert their units – hence: [1 a] = −[a], [ 1 a2 ] = −[a2] = −2[a], etc. Combining this with the multiplication rule, we get the division rule: [a b ] = [a] − [b]. For example, if C is the concentration of protein in milk, it has units ML−3 of mass over volume – hence dimensionally: [C] = M − 3L. Exercise 1 Use the rectangular box example to calculate the dimensions of the area of a rectangle of sides with length ‘a and ‘b , given the area formula AR = ab. 
(1) Now that we have done some simple problems, lets see how dimensional analysis can be used for error checking. Lets say someone tells us that the volume VS of a 8
  • 9.
    sphere of radiusR is given by VS = 4 3 πR2 . (2) Obviously, this is wrong – but if you’ve forgotten the correct formula, there’s an easy way to see why it is wrong using dimensional analysis. First of all [R] = L, since radius has dimensions of length. Furthermore, [4 3π] = 0 since this is just a pure number (so it is dimensionless). Therefore, [VS] =[ 4 3 πR2 ] =[ 4 3 π] + [R × R] =[ 4 3 π] + [R] + [R] =0 + L + L =2L. But wait a minute, volume has units of length cubed, hence [VS] = 3L. We then conclude by dimensional arguments that the formula VS = 4 3πR2 is incor- rect! Although the last example was easy, the same principles can be applied to much more complicated formulas in the mathematical sciences – indeed, it is used in research and in practice when doing estimates, checking articles or performing large derivations and calculations. Lets do one more example. Example 1 Newton’s Second Law of Motion: Force = Mass × Acceleration, or F = ma, is the fundamental postulate governing classical physics between the late 17th and early 19th centuries. It is vastly important today as the law defines what the force is, for an object of mass ‘m moving with an acceleration ‘a . The three fundamental units here are mass M, time T and length L. Displacement ‘x has dimensions of length L, hence velocity ‘v – which is the rate of change of 9
  • 10.
    displacement 3, hasunits of length over time: [v] =[ dx dt ] =[dx] − [dt] =[x] − [t] =L − T , (3) hence v has units L T . Similarly, acceleration a is the rate of change of velocity, hence [a] =[ dv dt ] =[dv] − [dt] =(L − T) − T =L − 2T, (4) which means ‘a has units of length over time-squared: L T2 . Finally, mass m triv- ially has units of mass: [m] = M (note that here we use the capital M to denote the fundamental unit of mass, where as the lower-case m is mass variable that we insert into Newton’s 2nd Law). Therefore, force F has the following dimensions [F] =[m][a] =[m] + [a] =M + L − 2T, (5) whence F has units of (mass × length)/ (time-squared): ML T2 . Exercise 2 Use dimensional analysis to conclude which formulas are incorrect on dimensional grounds – i.e. which of the following formulas are dimensionally inconsistent. Show your working. 1. A triangle has a base b and a vertical height h, each with dimensions of length L. Check whether the following formula for its area is dimensionally consistent A = 1 2 b2 h. (6) 3 For those of you unfamiliar with the definition of velocity and acceleration in terms of calculus, you can think of dx dt as the change in displacement x over an ‘infinitesimally small amount’ of time dt. Then dx carries dimensions of length and dt has dimensions of time: [dx] = L , [dt] = T. Note that in general, for an arbirtrary quantity y, the ‘infinitesimal quantity’ dy carries the dimensions: [dy] = [y]. 10
  • 11.
    2. A circle has a radius \(r\) with dimensions of length \(L\). Its area is given by

    \[ A = \frac{1}{2}\pi r^2. \qquad (7) \]

    Is this dimensionally consistent? A stronger question to ask is whether this formula is correct – if not, why not?

    There is one more rule of dimensional analysis, which involves analysing equations that include a sum of terms. In particular, given a quantity \(A = B + C + D\), to compute the dimensions \([A]\) of \(A\) we don’t just add the dimensions of \(B\), \(C\) and \(D\):

    \[ [A] \neq [B] + [C] + [D], \qquad (8) \]

    but rather, we have the consistency requirement that

    \[ [A] = [B] = [C] = [D]. \qquad (9) \]

    This is because \(B\), \(C\) and \(D\) should all separately have the same units as \(A\). This observation is very useful for determining the dimensions of multiple unknown quantities in an equation that involves a sum of different terms. For example, the area of a toddler’s house drawing is given by

    \[ A_{\text{House}} = A_{\text{Triangle}} + A_{\text{Square}} = \frac{1}{2}bh + a^2, \]

    where \(b\) is the base length of the triangle, \(h\) is its vertical height and \(a\) is the length of the sides of the square. Therefore, \([A_{\text{House}}] = [A_{\text{Triangle}}] = [A_{\text{Square}}] = 2L\), hence \([\tfrac{1}{2}bh] = [a^2]\), which implies \([b] + [h] = 2[a] = 2L\).

    One last concept: a dimensionless constant, \(C\), is defined to be a quantity which has no dimensions – hence \([C] = 0\). These are fundamentally important in the description of a physical system since they do not depend on the units you choose. Thus, in some manner they represent a ‘universal’ quantity or property – indeed, the dimensionless constants of a system describe a universality class⁴.

    To answer the following questions, try not to worry too much about terminology or new and abstract concepts. We are only interested in dimensions – so if you stay focused and don’t get distracted by the extra information, you can finish them quickly with no prerequisite knowledge!

    Exercise 3

    1. A hypercube living in \(d\) dimensions has \(d\) sides, each with length \(a\) and dimensions of length \(L\). Its hyper-volume has units of \(L^d\) and is given by the formula

    \[ V = a^d. \qquad (10) \]

    ⁴ A more precise meaning of this statement can be found in the theory of ‘Renormalization Groups’.
  • 12.
    Verify that this is dimensionally consistent – i.e. show that \([V] = L + \dots + L = d \times L\). What dimensions would its surface area have? Hint: this would be the same as the dimensions of the area of one of its ‘faces’.

    2. The U.S. Navy invests a significant amount of money into acoustic scattering studies for submarine detection (SONAR). As part of this research, the Dahlgren Naval Academy uses ‘prolate spheroidal harmonics’ (vibrational modes of a ‘stretched sphere’) to do fast, accurate scattering calculations. In this process, a submarine can be approximated to be the shape of a ‘prolate spheroid’ or ‘rugby ball’. A prolate spheroid is essentially the surface generated by rotating an ellipse about its major axis. Given a prolate spheroid with a semi-major axis of length \(a\) and semi-minor axis of length \(b\), its volume is

    \[ V = \frac{4\pi}{3} a b^2. \qquad (11) \]

    Is this formula dimensionally consistent? What about the following formula for the surface area (it should have units of length-squared):

    \[ S = 2\pi b^2 \left(1 + \frac{a}{be}\sin^{-1}(e)\right)? \qquad (12) \]

    Note, \(\sin^{-1}\) is the ‘inverse sine’ or ‘arcsine’ function. It necessarily preserves dimensionality, hence \([\sin^{-1}(e)] = [e]\). The variable \(e\) is the ‘eccentricity’ of the spheroid. It is a dimensionless quantity, \([e] = 0\), which measures how ‘stretched’ the spheroid is – i.e. how much it deviates from a sphere. It is given by the (dimensionally consistent!) formula

    \[ e^2 = 1 - \frac{b^2}{a^2}. \qquad (13) \]

    A perfect sphere corresponds to \(e = 0\), whereas an infinitely stretched sphere corresponds to \(e \to 1\).

    3. In a parallel universe, Andrew Forrest has a dungeon with \(B_F\) flawless black opals inside it. From a financial point of view, these have dimensions of money $ – i.e. \([B_F] = \$\). A machine recently designed by Ian McArthur, head of physics at UWA, uses quantum fluctuations of the spacetime vacuum to produce black opals at a rate of \(R_{UWA}\) black opals per minute. Sensing the loss of his monopoly on the black opal market, Andrew Forrest employs a competing physicist at Curtin University to create a quantum vacuum stabilizer. This reduces the number of black opals that Ian can produce per minute by \(R_C\) black opals per minute, where \(|R_C| \le R_{UWA}\). Working on
  • 13.
    a broad concept problem, a team of first-year students at St. George’s College come up with the following model to predict the value \(V\) of shares in Forrest BlackOps Inc. on the stock market as a function of time \(t\) (time has dimensions \(T\)):

    V = β D BF − λ(RUWA + RC)τDe−λ(1− t τ ) (14)

    where the constant \(\tau\) (having dimensions of time \(T\)) denotes⁵ the time at which the European Union is predicted to collapse. Furthermore, \(D\) is a function that measures the market demand for black opals (with no dimensions) and \(\beta\) is an economic constant predicted by game theory with units of money-squared: $². Finally, \(\lambda\) is a dimensionless parameter (so \([\lambda] = 0\)) that depends on the number of avocados served at the college since the establishment of St. George’s Avocadoes Anonymous up to the given time \(t\). Is this model dimensionally consistent – i.e. does \([V] = \$\)? What about the following formula, proposed by students from St. Catherine’s College (who didn’t practice dimensional analysis)?

    V = D BF − D2 e−t (15)

    On dimensional grounds, list two reasons why this model is incorrect.

    4. Bonus Question (Don’t worry about the physics, just keep track of dimensions and rules)

    The Harvard-Smithsonian Center for Astrophysics is about to hold a press conference tomorrow (March 17, 2014), announcing the discovery of gravitational waves. Gravitational waves are ripples through spacetime created by large gravitational disturbances in the cosmos – for example, exploding stars and coalescing black holes. These are predicted by Einstein’s theory of General Relativity – a theory in which gravity is a simple consequence of the geometry (shape) of spacetime. In this theory, choosing natural units for the speed of light, \(c = 1\), time and spatial length become dimensionally equivalent: \(T = L\). Therefore, dimensionally we have [time] = [distance] and \([c] = [\text{distance}/\text{time}] = L - T = 0\). A geometry which models gravitational waves is described by the following metric (an abstract object which tells you how gravity and measures of time and length vary at each point in spacetime):

    \[ g = \eta + h \qquad (16) \]

    ⁵ This is the Greek letter tau – not the Roman letter t.
  • 14.
    where \(\eta\) is a flat-space metric (describing an empty universe):

    \[ \eta := -dt + dx \otimes dx + dy \otimes dy + dz \otimes dz \qquad (17) \]

    and \(h\) is a symmetric tensor, given in de Donder gauge by

    \[ h := \epsilon \cos(k \cdot r)\,A + \frac{1}{2} \times \mathrm{trace}(h) \times \eta. \qquad (18) \]

    Here \(\epsilon\) is a small (\(\ll 1\)) dimensionless parameter, \([\epsilon] = 0\), and \(A\) is a symmetric tensor field with dimensions of length-squared: \([A] = 2L\). Note that the trace operation turns tensors into scalars, so it removes the dimensionality of a tensor: \([\mathrm{trace}(h)] = 0\). Furthermore, consider \(\cdot\) as another form of multiplication. Since the wave vector \(k\) and position vector \(r\) have inverse units, we have \([k] = -L\), \([r] = +L\) – hence \([k \cdot r] = 0\). For the purposes of dimensional analysis, we can treat the tensor product \(\otimes\) as ordinary multiplication also. The differential quantities have the following dimensions: \([dt] = [dx] = [dy] = [dz] = L\), hence \([dx \otimes dx] = 2[dx] = 2L\), for example. Since \(x, y, z, t\) represent coordinates in spacetime, we also have \([x] = [y] = [z] = [t] = L\).

    Show that the metric \(g\) is a dimensionally inconsistent solution to the Einstein field equations. Where is the error? Suggest what could be done to this metric to ‘fix’ it and give a dimensionally consistent solution.

    Remark: If you were certain that the equation for \(h\) was correct, it would be unnecessary to tell you the dimensions of \(A\) – you could work it out, since you already know \([\cos(k \cdot r)] = 0\) (the function cos(something) is necessarily dimensionless). Therefore, pretending \([A]\) is unknown, prove that \([A] = 2L\) given all the other information.

    After completing the last two problems, one should realize that much time can be saved by ignoring most of the information and concentrating only on the dimensions of the variables and constants in the given formulas. This is true in general! To do dimensional analysis, one need not necessarily understand the science or mathematics behind an equation – only the dimensions of the quantities involved. It is therefore an easy way to show when something is wrong without knowing what you are talking about.⁶

    ⁶ Dimensional analysis would have saved the present author about 100 hours of supergravity calculations – time which was largely lost due to two dimensionally inconsistent equations in a published journal article.
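The additive bookkeeping used throughout this tutorial is mechanical enough to hand to a computer. The following is only a minimal sketch, assuming we store a dimension as a Python dictionary of exponents (so L − 2T becomes `{"L": 1, "T": -2}`); the helper names `product`, `power` and `quotient` are our own inventions for illustration. It re-checks the wrong sphere formula (2) and rebuilds the dimensions of force from Example 1.

```python
def product(*dims):
    """[A * B * ...] = [A] + [B] + ...  (dimensions add under multiplication)."""
    total = {}
    for d in dims:
        for unit, exponent in d.items():
            total[unit] = total.get(unit, 0) + exponent
    # Drop units whose exponents cancelled, so that {} means 'dimensionless'.
    return {unit: e for unit, e in total.items() if e != 0}


def power(d, n):
    """[A**n] = n * [A]."""
    return {unit: e * n for unit, e in d.items() if e * n != 0}


def quotient(a, b):
    """[A / B] = [A] - [B]."""
    return product(a, power(b, -1))


PURE_NUMBER = {}                                   # e.g. [4*pi/3] = 0
MASS, LENGTH, TIME = {"M": 1}, {"L": 1}, {"T": 1}

# The (wrong) sphere formula: [4/3 * pi * R^2] = 0 + 2L, but a volume needs 3L.
claimed_volume = product(PURE_NUMBER, power(LENGTH, 2))
true_volume = power(LENGTH, 3)
formula_is_consistent = (claimed_volume == true_volume)

# Newton's 2nd law: [v] = L - T, [a] = L - 2T, [F] = M + L - 2T.
velocity = quotient(LENGTH, TIME)
acceleration = quotient(velocity, TIME)
force = product(MASS, acceleration)

print("sphere formula consistent?", formula_is_consistent)
print("[F] =", force)
```

The dictionary representation is just one convenient choice: adding a new fundamental unit (such as money $ or charge Q) needs no change to the helpers.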
  • 15.
    4.2.1 Moral of the story

    Dimensional analysis can tell you when an equation is wrong, but consistent dimensions do not by themselves imply that an equation is correct. As a student, you should make use of dimensional analysis whenever you can – try it on all formulas you get which have dimensionful quantities. This will help you to gain a strong intuition of whether or not statements and equations are sensible and consistent. It helps you to be a fast calculator and it will also help you to pick up errors in your lecture notes ...

    5 Tutorials 2 - Dimensional Analysis and the Buckingham Pi Theorem (part II)

    5.1 Background

    One of the key concepts in dimensional analysis is that of dimensionless parameters. Dimensionless parameters are important because they allow you to characterise both physical and theoretical mathematical systems in a scale-invariant way. Note that mastering the following concepts and exercises requires a good understanding of the material in Tutorial 1. For the more mathematically inclined, one of the examples and exercises illustrates how to mathematically prove the π theorem by using the rank-nullity theorem from linear algebra – this is a good exercise for understanding matrix equations and the correspondence between matrices and simultaneous equations! For the applied minds, we use dimensional analysis to investigate and form dimensionless constants to characterise the harmonic oscillator, viscous fluids, electromagnetism and Einstein’s theory of gravity.

    BIG DISCLAIMER: Notation

    Note that for the most part, we have used ‘additive notation’ to denote the dimensions of some quantity – e.g. \([\text{Force}] = M + L - 2T\). However, in engineering and sometimes in physics⁷, you will often see multiplicative notation being used – meaning \(F\) has dimensions \(ML/T^2\). For these tutorials, we have referred to the latter as the ‘units’ of \(F\), rather than its dimensions. Technically speaking, both are correct – although units typically refer to some standard of measure, such as kilograms or kg for the standard SI unit of mass. Here we’ve just taken \(M, L, T\) to refer to both dimensions and their respective standard units. After some practice, it should

    ⁷ In particle physics and quantum field theory, additive notation is common for computations as it is the smarter way to do things.
  • 16.
    be easy to interchange between conventions – the reason we use additive notation is that it’s faster to calculate dimensions this way and it is less prone to mistakes (since you are adding and subtracting instead of multiplying and dividing). Furthermore, additive notation makes it easier to prove things like the π theorem for dimensional analysis.

    A physical system in the mathematical sciences typically consists of:

    1. A set of physical parameters.

    2. A set of governing equations which describe the behaviour or evolution of the system.

    3. A set of fundamental ‘units’ which describe the dimensionality of the system.

    5.2 Examples and Problems

    Example 2 Let’s take a simple, but profound⁸ example – the simple harmonic oscillator. One example of a simple harmonic oscillator is a mass placed on a frictionless tabletop and attached to a spring. The spring is either stretched or compressed, then released so that the mass proceeds to undergo simple harmonic motion. This physical system is therefore described by

    1. A set of 4 physical parameters: the spring constant \(\kappa\), the mass \(m\), and the initial position \(x_0\) and initial velocity \(v_0\) of the mass.

    2. An equation of motion called ‘Hooke’s Law’⁹, which says that when you stretch or compress the spring, the force acting to restore the spring to its natural length is given by

    \[ F = -\kappa x, \qquad (19) \]

    where \(x\) is the displacement of the mass attached to the spring. Combining this with Newton’s 2nd Law, \(F = ma\), we get the equation of motion for the spring:

    \[ m\frac{d^2x}{dt^2} = -\kappa x, \qquad (20) \]

    where \(a = \frac{d^2x}{dt^2}\) is the acceleration of the mass.

    ⁸ Despite its simplicity, the (quantum) harmonic oscillator is the cornerstone for modern quantum field theory and particle physics. In this picture, a quantum field is an infinite continuum of simple harmonic oscillators, whose motion is captured by Fourier theory, Lie algebras and Special Relativity.

    ⁹ After the famous pirate, Captain Hooke.
  • 17.
    3. A set of 3 physical units: mass \(M\), time \(T\), length \(L\) (usually kilograms, seconds, metres).

    Now, from these 4 parameters and 3 physical units, I claim that we can form one dimensionless constant. To do this, one needs to know the dimensions of the parameters involved. Clearly, initial displacement has dimensions of length and initial velocity has dimensions of length/time: \([x_0] = L\), \([v_0] = L - T\). To work out the dimensions of the spring constant \(\kappa\), we inspect the equation of motion. Since acceleration has dimensions of length over time-squared, we have \([\frac{d^2x}{dt^2}] = L - 2T\). Therefore, we have

    \[ \left[m\frac{d^2x}{dt^2}\right] = [-\kappa x] \implies [m] + \left[\frac{d^2x}{dt^2}\right] = [\kappa] + [x] \]
    \[ M + L - 2T = [\kappa] + L \implies [\kappa] = M - 2T. \qquad (21) \]

    Note that the mathematical symbol ‘\(\implies\)’ means ‘implies’. Now that we have the dimensions of all parameters in this system, we can form a dimensionless product. In particular, we need one inverse mass factor and two factors of time to cancel the dimensions in \([\kappa] = M - 2T\). We can get an inverse unit of mass from \([\frac{1}{m}] = -M\) and two factors of time by combining \([x_0] = L\) and \([v_0] = L - T\). In particular, \([(\frac{x_0}{v_0})^2] = 2[x_0] - 2[v_0] = 2L - 2(L - T) = 2T\). Hence, we get the dimensionless constant:

    \[ G := \frac{\kappa}{m}\left(\frac{x_0}{v_0}\right)^2 \implies [G] = \left[\frac{\kappa}{m}\left(\frac{x_0}{v_0}\right)^2\right] = [\kappa] - [m] + 2([x_0] - [v_0]) = M - 2T - M + 2T = 0. \qquad (22) \]

    Since the constant \(G\) has no formal name, we will claim it and call it the ‘Georgian Constant’ after St. George – the patron saint of dimensional analysis.

    The last example illustrated a few important concepts. First of all, we showed that mathematically all the information about a physical system is given by a set of parameters, a set of physical units or dimensions and at least one governing equation. Second, we showed how we can work out the units of an otherwise unknown constant by using dimensional analysis – this is how we found the dimensions of the spring constant \(\kappa\).
  • 18.
    Finally, we showed that in this particular case, having 4 parameters and 3 physical units, we were able to form one dimensionless constant: \(G\). Although we could have taken any multiple or power of this constant and still arrived at a dimensionless quantity, there is essentially only one independent product that we can form out of the parameters in the simple harmonic oscillator. This is because \(G\), \(\frac{1}{G}\), \(G^2\) or \(2G\), for example, all contain the same ‘information’. The last observation is one example of the ‘fundamental theorem of dimensional analysis’, also known as the ‘π theorem’.

    Theorem 1 (Buckingham Pi Theorem) Given a system specified by \(n\) independent parameters and \(k\) different physical units, there are exactly \(n - k\) independent dimensionless constants which can be formed by taking products of the parameters.

    Thus in the last example, we saw that the simple harmonic oscillator was described by 4 parameters and 3 physical units – hence, as claimed, there was indeed only \(4 - 3 = 1\) independent dimensionless constant that we could have formed. Hence, any other dimensionless constant in this system must be some multiple or some power of \(G\). Before doing the exercises, here is one more example from fluid mechanics.

    Example 3 In fluid mechanics, the notion of the ‘thickness’ of a fluid is formalized by defining its ‘viscosity’. In particular, the dynamic or shear viscosity of a fluid measures its ability to resist ‘shearing’ – an effect where successive layers of the fluid move in the same direction but with different speeds. For example, relative to water, glass¹⁰ and honey have a very high shear viscosity, whereas superfluid Helium has zero viscosity¹¹.

    Given a fluid trapped between two parallel plates – the bottom plate being stationary and the top plate moving with velocity \(v\) parallel to the stationary plate – the magnitude of the force required to keep the top plate moving at constant velocity is given by

    \[ F = \eta A \frac{v}{y}. \qquad (23) \]

    Here \(v\) is the speed (magnitude of the velocity) of the top plate, \(A\) is its surface area and \(y\) is the separation distance between the plates. The parameter \(\eta\) is defined to be the shear viscosity of the fluid. We can calculate its units using dimensional analysis. First, from Newton’s 2nd law we know that the force has the dimensions \([F] = M + L - 2T\). Furthermore, the area \(A\) has dimensions of length-squared

    ¹⁰ The myth about old church windows sagging is not due to the fact that glass can be modelled as a viscous liquid, but rather due to the glass-making techniques of past centuries.

    ¹¹ For helium-4, the transition to the ‘superfluid’ phase occurs at about 2.17 Kelvin – i.e. close to absolute zero temperature.
  • 19.
    \([A] = 2L\), the speed \(v\) has dimensions \([v] = L - T\) and the separation \(y\) has dimensions \([y] = L\). Hence

    \[ [F] = [\eta] + [A] + [v] - [y] \implies [\eta] = [F] - [A] - [v] + [y] = (M + L - 2T) - 2L - (L - T) + L = M - L - T, \qquad (24) \]

    whence \(\eta\) has units of \(\frac{M}{LT}\). Now, the kinematic viscosity \(\nu\)¹² of the fluid is defined as the ratio of the dynamic viscosity \(\eta\) and the density \(\rho\) (mass per volume) of the fluid:

    \[ \nu = \frac{\eta}{\rho}. \qquad (25) \]

    Since density has units of mass per length-cubed, we have \([\rho] = M - 3L\) and thus

    \[ [\nu] = \left[\frac{\eta}{\rho}\right] = [\eta] - [\rho] = M - L - T - (M - 3L) = 2L - T. \qquad (26) \]

    In some set of scenarios, we can think of this fluid as characterized by four parameters: density \(\rho\), shear viscosity \(\eta\), the fluid speed \(v\) (assuming the fluid only travels in the horizontal direction) and a characteristic length scale \(l\). Since we have three different physical units – mass, length and time – the Pi theorem tells us we can form one independent dimensionless constant. This special, widely used constant is called the ‘Reynolds number’ of the fluid and is defined by

    \[ R = \frac{\rho v l}{\eta} = \frac{l v}{\nu}, \qquad (27) \]

    where \(l\) is the ‘characteristic length scale’ for the fluid system (e.g. for a fluid flowing in a pipe, this length scale would be the diameter of the pipe). In essence, the Reynolds number expresses the ratio of inertial forces to viscous forces. In this manner, it describes the relative importance of these two types of forces in different scenarios. Since it is dimensionless, the Reynolds number is scale invariant – meaning it characterises the way a fluid will flow on all length scales (within the valid regime of your theory).

    Exercise 4 We defined the Reynolds number \(R\) in two ways – one in terms of the dynamic viscosity \(\eta\) and the other in terms of the kinematic viscosity \(\nu\). Show that the Reynolds number is dimensionless using both of its definitions.

    ¹² This is the Greek letter ‘nu’ – not the Roman letter ‘v’.
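As a cross-check on the viscosity calculations in Example 3, here is a small sketch that repeats equations (24) and (26) with exponent triples (M, L, T). The representation and the helper names `times`, `over` and `show` are assumptions made purely for illustration. The same helpers can be used to check your own answer to Exercise 4.

```python
def times(*dims):
    """[A * B * ...]: add the (M, L, T) exponents component-wise."""
    return tuple(sum(column) for column in zip(*dims))


def over(a, b):
    """[A / B]: subtract the (M, L, T) exponents component-wise."""
    return tuple(x - y for x, y in zip(a, b))


def show(d):
    """Format an exponent triple in additive notation, e.g. (1, -1, -1) -> 'M - L - T'."""
    terms = []
    for exponent, name in zip(d, "MLT"):
        if exponent == 0:
            continue
        sign = "-" if exponent < 0 else "+"
        size = abs(exponent)
        terms.append((sign, name if size == 1 else f"{size}{name}"))
    if not terms:
        return "0"
    first_sign, first_term = terms[0]
    text = ("-" if first_sign == "-" else "") + first_term
    for sign, term in terms[1:]:
        text += f" {sign} {term}"
    return text


# Exponent triples are ordered as (M, L, T).
force = (1, 1, -2)      # [F]   = M + L - 2T
area = (0, 2, 0)        # [A]   = 2L
speed = (0, 1, -1)      # [v]   = L - T
gap = (0, 1, 0)         # [y]   = L
density = (1, -3, 0)    # [rho] = M - 3L

# F = eta * A * v / y  =>  [eta] = [F] - [A] - [v] + [y]
eta = times(over(over(force, area), speed), gap)
# nu = eta / rho       =>  [nu] = [eta] - [rho]
nu = over(eta, density)

print("[eta] =", show(eta))   # M - L - T, as in equation (24)
print("[nu]  =", show(nu))    # 2L - T, as in equation (26)
```

Fixed-length tuples are used here instead of dictionaries only because the three units M, L, T are known in advance; this keeps the arithmetic to one line per rule.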
  • 20.
    Example 4 (Mathematical Challenge: Proving the π Theorem) Here is a walk-through of a proof of the Pi Theorem, using the ‘rank-nullity’ theorem from linear algebra. For those of you who haven’t encountered matrices before, you can still make sense of the following in terms of systems of linear equations – but that will be trickier ... so either save it for later, or talk to your tutor.

    Formally, the rank-nullity theorem states that given an \(m \times n\) matrix (\(m\) rows, \(n\) columns) \(A\), which maps \(n\)-dimensional vectors to \(m\)-dimensional vectors, the rank and nullity of the matrix \(A\) satisfy

    \[ \mathrm{rank}(A) + \mathrm{nullity}(A) = n, \qquad (28) \]

    where the rank of \(A\) is defined as the number of linearly independent rows of \(A\) and the nullity of \(A\) is defined as the dimension of the kernel of \(A\) – i.e. the number of linearly independent \(n\)-dimensional vectors which get mapped to 0 by \(A\). Note that \(m \le n\) necessarily (or the system is over-determined).

    Now, in the context of dimensional analysis and the π Theorem, we can think of a mathematical or physical system with \(n\) parameters and \(k\) different types of fundamental units (dimensions) as a system of \(k\) linear equations in \(n\) unknowns, as follows. Say, for example, we have three parameters \(x, y, z\) and two fundamental physical units \(U_1, U_2\). Then we can represent the dimensions of our parameters as a matrix by letting each column correspond to a different parameter and each row correspond to a different fundamental unit. So in this example, we let the first column correspond to the parameter \(x\), the second column to \(y\) and the third column to \(z\). Then the first row corresponds to the unit \(U_1\) and the second row to the unit \(U_2\). The entry in the first row and first column then corresponds to the number of dimensions \(x\) has in the unit \(U_1\). So if, for example, \(x\) has the units \(U_1^a U_2^b\), then it has dimensions \([x] = [U_1^a] + [U_2^b] = aU_1 + bU_2\). Similarly, let \(y\) have units \(U_1^c U_2^d\) and \(z\) have units \(U_1^e U_2^f\): hence \([y] = cU_1 + dU_2\) and \([z] = eU_1 + fU_2\). We can form the ‘dimensional matrix’ \(D\) for this physical system, which is represented as

    \[ D = \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix}. \qquad (29) \]

    To see that this makes sense, we can simply act¹³ the transpose of the dimensional matrix, \(D^T\), on the vector \(U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}\) containing the physical units to recover all three of our dimensional equations \([x] = aU_1 + bU_2\), \([y] = cU_1 + dU_2\) etc. To find dimensionless constants, we have to solve the ‘nullspace equation’:

    ¹³ By matrix multiplication.
  • 21.
    \[ \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

    for all possible vectors \((\alpha, \beta, \gamma)^T\). In particular, dimensionless constants will be a product of powers of the different physical parameters, \(x^\alpha y^\beta z^\gamma\), where the exponents \(\alpha, \beta, \gamma\) are components of a vector \((\alpha, \beta, \gamma)^T\) which solves the nullspace equation.

    The number of linearly independent vectors \((\alpha, \beta, \gamma)^T\) which solve the nullspace matrix equation coincides with the ‘nullity’ of the dimensional matrix \(D\) – it is precisely equal to the number of dimensionless constants we can form. In particular, since we have \(n = 3\) independent physical parameters \(x, y, z\) corresponding to the three columns of our dimensional matrix \(D\) and \(k = 2\) fundamental units \(U_1, U_2\) corresponding to the two (linearly independent¹⁴) rows of \(D\), the rank-nullity theorem tells us that the nullity of \(D\) is given by

    \[ \mathrm{nullity}(D) = n - k = 3 - 2 = 1. \qquad (30) \]

    Since the nullity of \(D\) is precisely equal to the number of dimensionless constants we can form for this physical system, this shows that the π Theorem for dimensional analysis is just a special instance of the rank-nullity theorem for linear algebra.

    Exercise 5 (Challenge: Finish proving the π Theorem) In the previous example, we set up the proof of the π theorem for the general case ... but really only proved it for the case of 3 parameters and 2 fundamental units. By extending the argument to \(n\) parameters \(x_1, \dots, x_n\) and \(k\) units \(U_1, \dots, U_k\), prove the π theorem for the general case of arbitrary \(n\) and \(k\). Hint: Sketching this proof simply amounts to keeping track of your indices and labels. As a suggestion, try denoting the units of \(x_1\) by \(U_1^{a_{11}} \cdots U_k^{a_{1k}}\) and the units of \(x_2\) by \(U_1^{a_{21}} \cdots U_k^{a_{2k}}\) etc.

    If you have completed and understood these exercises, you are well on your way to becoming an expert in dimensional analysis. Soon you’ll be better than your lecturers (possibly).

    ¹⁴ These rows are necessarily linearly independent, since we assume our fundamental physical units to be independent – by definition.
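To make the link between the π theorem and the nullspace concrete, the sketch below applies the idea to the simple harmonic oscillator of Example 2. The choice of Gauss-Jordan elimination over exact fractions, and the helper name `nullspace`, are assumptions made for illustration only and are not part of the tutorial's proof. The columns of the dimensional matrix correspond to the parameters κ, m, x₀, v₀ and the rows to the units M, L, T.

```python
from fractions import Fraction


def nullspace(matrix):
    """Return a basis for the null space of `matrix` (a list of rows),
    found by Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0  # index of the next row that still needs a pivot
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [entry / pivot_value for entry in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    free_cols = [c for c in range(n_cols) if c not in pivot_cols]
    basis = []
    for free in free_cols:
        vector = [Fraction(0)] * n_cols
        vector[free] = Fraction(1)
        for row_index, pivot_col in enumerate(pivot_cols):
            vector[pivot_col] = -rows[row_index][free]
        basis.append(vector)
    return basis


# Dimensional matrix D for the harmonic oscillator.
# Columns: kappa, m, x0, v0.   Rows: M, L, T.
D = [
    [1, 1, 0, 0],    # powers of M
    [0, 0, 1, 1],    # powers of L
    [-2, 0, 0, -1],  # powers of T
]

basis = nullspace(D)
nullity = len(basis)  # n - k = 4 - 3 = 1 dimensionless constant

# Rescale the single basis vector so that kappa appears to the first power.
exponents = [-2 * entry for entry in basis[0]]
print("nullity:", nullity)
print("exponents of (kappa, m, x0, v0):", [int(e) for e in exponents])
```

The rescaled exponents (1, −1, 2, −2) reproduce the Georgian constant G = (κ/m)(x₀/v₀)². The unscaled basis vector corresponds to G^(−1/2), which carries the same information, in line with the remark that powers of G are not independent constants.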
  • 22.
    6 Tutorial 3 - Return of Dimensional Analysis: Gravity, The Hierarchy Problem and extra-dimensional Braneworlds

    6.1 Introduction

    The following is an extended exercise which tests all the skills the tutorials have elucidated so far in dimensional analysis. It will also introduce you to some concepts which may be new and bizarre, whilst linking them back to everyday reality. The overall goal will be to derive a dimensionless constant that characterises classical gravity on all length scales (no knowledge of relativity is required)! By comparing this constant to another dimensionless constant from electromagnetism, we will see why gravity is so much weaker than the other three forces in nature – then investigate a solution to this peculiarity using brane-world models of the universe.

    6.2 Background

    As far as we understand, all interactions in nature take place through four fundamental forces. At present, we have a rather ‘successful’ theoretical and experimental quantum description of three of these forces – that is, we have constructed quantum field theories to describe the ‘quanta’ (particles) which mediate these forces. Gravity, despite our everyday experience of it, remains somewhat mysterious and theoretically elusive in several ways – in particular, because it is highly resistant to all attempts to turn it into a quantum theory like the other forces. As a reminder, the four forces dictating our universe are the

    • Electromagnetic Force: Which governs electromagnetic radiation (such as light) as well as interactions between charged particles. In the quantum description (Quantum Electrodynamics), this force is carried by massless particles known as ‘photons’.

    • Weak Nuclear Force: In the quantum description, this force is mediated by massive particles known as the Z and W± bosons. It is involved in quark transformations as well as some interactions between charged particles.

    • Strong Nuclear Force: In the quantum description (Quantum Chromodynamics), this force is mediated by ‘gluons’ and is responsible for the interactions between quarks, which are the particles making up hadrons such as the proton and neutron. In this manner, it is responsible for processes such as fusion, which is the source of energy for our sun.
  • 23.
    • Gravitational Force: In the attempted quantum descriptions, this force is mediated by a massless particle known as the ‘graviton’. It is responsible for the interactions of all particles with mass, but also determines the trajectories of massless particles (e.g. gravitational bending of light) since it warps the spacetime continuum.

    At higher energies, these four forces start to unify into one single force – for example, the electromagnetic and weak nuclear forces unify to make the electroweak force. Attempts to unify the electroweak and strong nuclear forces have been partially successful and fall under ‘The Standard Model’ of particle physics. On the other hand, attempts to unify gravity with the other forces have been largely unsuccessful, with the only real promising candidate being String Theory.

    One of the biggest mysteries about the gravitational force is why it is so weak compared to the other forces in nature. In some sense this is ‘unnatural’, which suggests that on some deeper level, gravity is fundamentally different from the other forces. As the goal of this tute, we will use dimensional analysis to characterise the gravitational and electromagnetic forces with some special dimensionless constants – then compare their strengths to prove this claim. Finally, we will end on some very recent¹⁵ advancements in theoretical physics which propose an explanation of why gravity is the weakest of the four forces.

    6.3 Extended Problem

    Exercise 6 (Newton, Einstein and Braneworlds: The Gravitational Coupling Constant) Of the many things that Isaac Newton is famous for, one of them is coming up with multiple mathematical proofs of the fact that the planets orbit the sun in elliptical paths – and that this elliptical motion is a direct consequence of an inverse square law. Thus, by planar geometry and calculus he came up with the following gravitational force law to explain the astronomical observations of Johannes Kepler and Tycho Brahe:

    \[ \mathbf{F} = -G_N \frac{m_1 m_2}{r^2}\,\hat{\mathbf{r}}, \qquad (31) \]

    where \(G_N\) is Newton’s gravitational constant, \(m_1\) and \(m_2\) are the masses of two objects separated by a distance \(r\) and \(\hat{\mathbf{r}}\) is a ‘unit vector’ (vector with magnitude 1) pointing from one object to the other. This tells us the gravitational force that one massive object exerts on another massive object.

    ¹⁵ The last 5-10 years.
  • 24.
    QI: Using Newton’s 2nd Law, \(F = ma\), deduce the dimensions or units of \(G_N\). Note that you are working with mass, length and time (\(M, L, T\)) as your fundamental units, hence \([m_1] = [m_2] = M\). Furthermore, by definition the unit vector¹⁶ \(\hat{\mathbf{r}} = \frac{\mathbf{r}_2 - \mathbf{r}_1}{|\mathbf{r}_2 - \mathbf{r}_1|}\) is dimensionless: \([\hat{\mathbf{r}}] = 0\). Note that in general, the dimensions or units of a vector quantity are always the same as the units of the magnitude (and components) of that vector – hence \([\mathbf{r}] = [r]\), for example.

    Now that we have the dimensions of \(G_N\), we are ready to consider Einstein’s theory of gravitation. Einstein’s theory differs from Newton’s theory in many ways – fundamentally, it explains gravity as a consequence of spacetime curving around any object with mass, with the ‘amount’ of curvature being greater for greater masses (e.g. the Sun). On an astrophysical level, it is important as it helps to explain the big bang, solar fusion and the existence of black holes – objects which are necessary for the stability of some galaxies such as the Milky Way. In terms of everyday living, general relativity is essential for the operation of GPS satellites – without the gravitational corrections to the timing (gravitational time-dilation) offered by Einstein’s theory, the GPS system would not be accurate enough to work.

    In Einstein’s theory, spacetime is modelled by the following objects¹⁷

    • An energy-momentum tensor \(T\) which contains information about ‘sources’ of curvature – matter and energy. Its components have dimensions of an energy-density: \([T_{ab}] = [\frac{\text{Energy}}{\text{Volume}}] = M - L - 2T\). Since the tensor itself is a second-rank covariant tensor, we have \([T] = [T_{ab}\,dx^a \otimes dx^b] = [T_{ab}] + [dx^a \otimes dx^b] = M - L - 2T + 2L = M + L - 2T\). Note that the dimensionality of energy can be deduced from the relation Work = Force × Distance, and hence [Energy] = [Work] = [Force] + [Distance] \(= M + L - 2T + L = M + 2L - 2T\).

    • A metric tensor \(g\) describing how gravity distorts measures of length and time. This has units of length-squared: \([g] = 2L\).

    • The Riemann Curvature tensor, Riem, describes how the curvature of spacetime varies in different regions. It also measures how gravity distorts parallel-

    ¹⁶ Here \(\mathbf{r}_1\) and \(\mathbf{r}_2\) are the position vectors describing the location of the masses \(m_1\) and \(m_2\) with respect to some origin.

    ¹⁷ Note that most physicists do not understand differential geometry, hence when they speak of tensors they usually are talking about components of tensors. This won’t matter here, but for reference, if you ever want to compare: covariant tensors have two extra factors of length compared to their components and contravariant tensors have two factors less than their components – which basically means adding ±2L to the dimensions.
  • 25.
    transport. It isgiven roughly 18 as the anti-symmetrized second tensor ‘gra- dient’ of the metric: Riem ∼ ⊗ ⊗ g, where are a type of derivative operator and ⊗ is a type of multiplication for tensors. • The Ricci tensor, Ric, is given by taking the trace of the Riemann tensor: Ric = Trace(Riem). It describes how gravity distorts volumes and is also related to how different geometries evolve under the heat equation. • The Ricci Scalar R – this quantity is a function which measures how gravity locally distorts volumes. Einstein’s theory can be derived by saying that nature minimizes this quantity – an approach due to a mathematician named David Hilbert 19. It is given by the taking the trace of Riemann tensor twice: R = Trace(Trace(Riem)) = Trace(Ric). • The speed of light, c. This universal speed limit quantifies how fast mass- less particles can move and also how fast gravitational disturbances (gravity waves) can propagate. It has dimensions of speed: [c] = L − T. QII:Using the above information, derive the dimensions of Newton’s gravitational constant GN again, this time using Einstein’s law of gravity: Ric − 1 2 Rg = 8πGN c4 T. (32) You will need the following facts: the derivative operator reduces the length dimension of a tensor by one factor, whereas the tensor product ⊗ raises it by one factor (in this case). Hence [Riem] = 2[ ] + 2[⊗] + [g] = −2L + 2L + 2L = 2L. Furthermore, the trace of a (covariant) tensor reduces its length dimension by two factors, hence for example: Trace[Riem] = [Riem] − 2L. Tip: To ease calculations, you may use so-called ‘natural units’ where the speed of light c = 1. In these units length and time have the same dimensionality, hence [c] = [Distance] − [Time] = 0 and T = L. You will then get the dimensions of GN in natural units which you can compare to your value of GN using Newton’s Law, after you set T = L. 
Finally, we are in a position to understand a very special dimensionless constant: the ‘gravitational coupling constant’, $\alpha_G$. Since it is dimensionless, this constant characterises the strength of the gravitational force on all length scales (within the regime of validity of Einstein’s theory). It can be defined in terms of any pair of stable elementary particles; in practice, we use the electron.
[18] Don’t ever show this to a differential geometer. If you want the real definition, see me.
[19] In retrospect, David Hilbert deserves almost the same level of credit as Einstein for the theory of general relativity.
In particular, we have:
$$\alpha_G = \frac{G_N m_e^2}{\hbar c} \approx 1.7518 \times 10^{-45} \tag{33}$$
where $c$ is the speed of light, $G_N$ is Newton’s gravitational constant and $m_e$ is the mass of an electron. The quantity $\hbar = \frac{h}{2\pi}$ is the reduced Planck constant, which characterises the scale at which matter exhibits quantum behaviour such as wave-particle duality [20].
QIII: Show that the gravitational coupling constant $\alpha_G$ is indeed dimensionless. Note that $[m_e] = M$. To work out the dimensions of $\hbar = \frac{h}{2\pi}$, you will need the Planck-Einstein relation, which relates the energy of a photon (particle of light) to its frequency:
$$E = hf. \tag{34}$$
Then $[h] = [E] - [f]$. Since the frequency of light is the number of oscillations of the electromagnetic wave per unit time, we have $[f] = -T$. You can get the dimensions $[E]$ of energy $E$ from the calculation shown above for the energy-momentum tensor.
Now, for the last part of this problem, we introduce one more fundamental physical unit: the unit of electric charge, $Q$ [21]. Similar to the gravitational coupling constant, there is a dimensionless constant which characterises the strength of the electromagnetic interaction (which is responsible for almost all of chemistry): the ‘fine structure constant’ $\alpha_{EM}$. The value of this constant is (accurately) predicted and measured using the theory of Quantum Electrodynamics, which is a type of quantum field theory largely due to Richard Feynman and Freeman Dyson. It is given by
$$\alpha_{EM} = \frac{1}{4\pi\epsilon_0} \frac{e^2}{\hbar c} \tag{35}$$
where $\epsilon_0$ is the electric permittivity of the vacuum. It has units $[\epsilon_0] = [\text{Farads/Meter}] = [\text{Seconds}^4\,\text{Amps}^2\,\text{Meters}^{-3}\,\text{kg}^{-1}] = 4T + (2Q - 2T) - 3L - M$, since an Amp is a charge per unit time. Hence $[\epsilon_0] = 2T + 2Q - 3L - M$. The parameter $e$ is the charge of an electron, with dimensions $[e] = Q$.
Using ‘natural units’, a popular convention in particle physics, we set all of our previous parameters to equal 1.
Thus, $4\pi G_N = c = \hbar = \epsilon_0 = 1$, where $\epsilon_0$ is the electric permittivity of the vacuum. In these units, the fine-structure constant is given by
$$\alpha_{EM} = \frac{e^2}{4\pi} \approx 7.297 \times 10^{-3}. \tag{36}$$
QIV: Choosing natural units, $4\pi G_N = c = \hbar = \epsilon_0 = 1$, is the same as forcing these parameters to be dimensionless. Show that this is equivalent to setting all the fundamental units to be the same: $T = L = M = Q$. Hint: you should get four equations for the dimensions of these parameters.
Note that you can calculate the values of the fine-structure and gravitational coupling constants yourself by Googling their values in SI units (or any other consistent set of units you choose). Taking their ratio, we see that (in natural units):
$$\frac{\alpha_{EM}}{\alpha_G} = \left(\frac{e}{m_e}\right)^2 \approx \frac{7.297 \times 10^{-3}}{1.752 \times 10^{-45}} \approx 4.16 \times 10^{42}. \tag{37}$$
This says that the electromagnetic force is about 42 orders of magnitude [22] stronger than the gravitational force. In a similar fashion, the weak nuclear force is about 32 orders of magnitude (a factor of $10^{32}$) stronger than gravity. The challenge to explain why gravity is so weak compared to the other forces is known as ‘the hierarchy problem’.
One class of attempts to solve the hierarchy problem involves the visible universe being confined to a 4-dimensional ‘brane’, which is basically a 4-dimensional slice living in a larger spacetime. Such models are called ‘braneworld models’. In this view, the electromagnetic, weak and strong nuclear forces take place on the 4-dimensional brane, but gravitational interactions (mediated by particles known as ‘gravitons’) take place in 4 dimensions and in the ‘large extra dimensions’. This then gives a natural explanation for the gravitational coupling constant being so small.
In some variations [23], the introduction of large extra dimensions also solves the ‘Dark Energy’ or ‘Cosmological Constant’ problem, where Dark Energy naturally arises as the ‘surface tension’ of the 4-dimensional brane.
Using braneworld models, we can derive (!) Newton’s gravitational constant directly from the size (‘hyper-volume’) of the extra dimensions in our universe. A very special class of braneworld models, known as theories with ‘Supersymmetric Large Extra Dimensions’, envisions spacetime as 6-dimensional (4-dimensional brane + 2 large extra dimensions) with some extra symmetry added (supersymmetry) that enables bosons and fermions to transform into each other [24]. In these models, the extra dimensions take the form of some compact hypersurface. Newton’s gravitational constant $G_N$ is then theoretically explained using the formula [25]:
$$G_N = \frac{3\kappa^2}{16\pi S} \tag{38}$$
where $S$ is the surface area of the extra dimensions and $\kappa$ is Einstein’s constant, with dimensions $[\kappa] = [G_N]$.
QV: The above formula for $G_N$ is correct, even though it may look dimensionally incorrect. What units would $S$ need to have for dimensional consistency? In that case, what quantity does the surface area $S$ actually represent? Hint: Recall the ‘unit vector’ in Newton’s law of gravity.
The last problem illustrates a common theme in engineering, physics and mathematics: normalization. Normalized quantities are typically dimensionless! As such, they are very useful and friendly to work with.
[20] If $\hbar$ was really large, say $\hbar \approx 1$ for example, then we would observe wave-particle duality on a macroscopic scale and the universe would be a scary, crazy place. Bullets would diffract through doorways and Leanora’s fists could quantum tunnel through walls.
[21] The SI unit for charge is the coulomb.
[22] Note, 42 is also the meaning of life.
[23] Those investigated in the present author’s masters thesis.
[24] Supersymmetry removes the problem of Tachyons in String Theory and also stabilizes the mass of the Higgs boson.
[25] First derived in this generality by the present author in 2013.
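The values quoted in equations (33), (36) and (37) can be reproduced from SI constants, as the text suggests. The sketch below assumes the CODATA 2018 values typed in by hand (they are not taken from the notes themselves); the variable names are illustrative only.

```python
import math

# SI values (CODATA 2018), entered by hand as an assumption of this sketch
G_N = 6.67430e-11        # Newton's constant, m^3 kg^-1 s^-2
c = 299792458.0          # speed of light, m s^-1
hbar = 1.054571817e-34   # reduced Planck constant, J s
m_e = 9.1093837015e-31   # electron mass, kg
e = 1.602176634e-19      # elementary charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity, F m^-1

# Equation (33): gravitational coupling constant for the electron
alpha_G = G_N * m_e**2 / (hbar * c)

# Equation (35): fine structure constant
alpha_EM = e**2 / (4 * math.pi * eps0 * hbar * c)

# Equation (37): ratio of the two coupling strengths
ratio = alpha_EM / alpha_G

print(f"alpha_G  = {alpha_G:.4e}")
print(f"alpha_EM = {alpha_EM:.4e}")
print(f"ratio    = {ratio:.3e}")
```

Because both constants are dimensionless, any other consistent set of units would give the same three numbers.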
7 Tutorial 4: 50 Shades of Error, Shade I – Multivariable calculus and The Total Differential
In this tutorial, we revise some elementary concepts from multivariable calculus: partial differentiation and the ‘total differential’ or ‘exterior derivative’. If you haven’t formally studied these topics then don’t worry. As long as you are comfortable differentiating functions of a single variable, the rest will follow easily.
After revising these mathematical tools, we will see how they are used in error analysis. In particular, the total differential provides an elegant way to compute the absolute error for any derived quantity, in terms of your experimental precision error. This is extremely useful for the applied sciences and engineering. For those of you who are only interested in pure mathematics, note that the techniques used here are precisely the same techniques that are used when you study linear approximations [26] and Taylor series expansions for functions of more than one variable.
Note that this tutorial is the first of a sequence of tutorials that will be dedicated to error analysis, least-squares regression (e.g. line of best fit) and other techniques that you will use frequently in statistics and the applied sciences to determine the value of a derived quantity, along with an estimate of its corresponding error. As such, they will successively build on each other.
7.1 Russian Playpen: Functions of more than one variable
Given a function $f$ of one variable, which maps real numbers [27] $\mathbb{R}$ to real numbers $\mathbb{R}$, we can formally [28] express it as:
$$f : \mathbb{R} \to \mathbb{R} : x \mapsto f(x),$$
which says that $f$ sends the number $x$ to another number $f(x)$. For example, if we have the function $f(x) = x^2$, whose graph is a parabola, then we write:
$$f : \mathbb{R} \to \mathbb{R} : x \mapsto x^2.$$
[26] Linear or ‘tangent plane’ approximations are just a special case of a Taylor series expansion.
[27] The ‘blackboard font’ r, denoted as $\mathbb{R}$, symbolizes the set of ‘real numbers’. This includes all integers, rational numbers, irrational and transcendental numbers (such as $\pi$) etc.
[28] Technically, you restrict the set $f$ is mapping from to its domain, and the set it is mapping into to its range.
So for example, in this case we have $f(1) = 1^2 = 1$ and $f(7) = 7^2 = 49$ etc.
We now generalize this as follows. A function $f$ of more than one variable maps several copies of the set of real numbers to one or more copies of the set of real numbers. For example, a real-valued function $f$ of two variables, $x$ and $y$, can be formally expressed as
$$f : \mathbb{R} \times \mathbb{R} \to \mathbb{R} : (x, y) \mapsto f(x, y).$$
Here the notation $\mathbb{R} \times \mathbb{R}$ means the set of all ordered pairs of real numbers $(x, y)$. So for example, if we have the circular function given by $f(x, y) = x^2 + y^2$, then we have $f(1, -1) = 1^2 + (-1)^2 = 1 + 1 = 2$ and $f(\pi, \sqrt{3}) = \pi^2 + (\sqrt{3})^2 = \pi^2 + 3$ etc.
Note, there is nothing strange about functions of several variables. You see them every day. For example, we can view the volume of a rectangular box with sides of length $x$, $y$ and $z$ as a function of three variables:
$$V(x, y, z) = xyz. \tag{39}$$
Or, as another example, the concentration $C$ of a substance dissolved in water will depend on the amount (‘mass’ or any other measure) $m$ of the substance dissolved and the amount (volume) of water (or any other liquid) $v$ it is being dissolved into. Thus we can consider the blood alcohol concentration $C$ of a student at PROSH as a function of two variables: $C = C(m, v)$, where $m$ is the amount of alcohol and $v$ is the amount of blood in that person.
7.2 Russian Daycare: Partial Differentiation
To some, partial differentiation may sound hard. However, it is actually extremely simple, which is why it is taught to children at daycare in Russia. All you need to do is differentiate your function with respect to some chosen variable, while treating all the other variables as constants.
The notation for partial derivatives uses the ‘del’ symbol, $\partial$. So for example, if we are taking the usual total derivative with respect to $x$, we have the Leibniz [29] notation $\frac{d}{dx}$. If we are taking a partial derivative with respect to $x$, we use the notation $\frac{\partial}{\partial x}$ instead. The best way to illustrate is with a few examples.
[29] Leibniz was the German version of Newton, or Newton was the English version of Leibniz. Leibniz developed calculus at the same time as Newton, as well as several other fields of mathematics, such as binary numbers.
Example 5 (Return of the Box) Our rectangular box has now followed us into Tutorial 4. Having been stalked by this sentient box, we decide to partially differentiate its volume. Denoting the length of each of its sides by the variables $x$, $y$ and $z$ respectively, its volume is given by the following function of three variables: $V(x, y, z) = xyz$. Partially differentiating it with respect to $x$, we find:
$$\frac{\partial}{\partial x} V(x, y, z) = \frac{\partial}{\partial x}(xyz) = \left(\frac{\partial x}{\partial x}\right) yz = yz. \tag{40}$$
What we did here was to treat $y$ and $z$ as constants, while differentiating with respect to $x$. Since the derivative of $x$ with respect to $x$ is just 1, we arrived at the above result. We show similarly that:
$$\frac{\partial}{\partial y} V(x, y, z) = xz, \qquad \frac{\partial}{\partial z} V(x, y, z) = xy. \tag{41}$$
Now, if we differentiate twice with respect to $x$, or twice with respect to $y$, we get:
$$\frac{\partial^2}{\partial x^2} V(x, y, z) := \frac{\partial}{\partial x}\frac{\partial}{\partial x} V(x, y, z) = \frac{\partial}{\partial x}(yz) = 0, \qquad \frac{\partial^2}{\partial y^2}(xyz) := \frac{\partial}{\partial y}\frac{\partial}{\partial y}(xyz) = \frac{\partial}{\partial y}(xz) = 0, \tag{42}$$
where the notation $\frac{\partial^2}{\partial x^2}$ denotes the ‘second partial derivative’ with respect to $x$. Note that differentiating the volume with respect to $x$ the second time gives zero, since the first derivative of $V(x, y, z)$ with respect to $x$ no longer depends on $x$. That is, the product $yz$ is a constant with respect to $x$, hence its partial derivative with respect to $x$ vanishes.
We can also take mixed derivatives. For example, differentiating $V(x, y, z)$ first with respect to $x$ and then with respect to $y$ gives:
$$\frac{\partial}{\partial y}\frac{\partial}{\partial x} V(x, y, z) = \frac{\partial}{\partial y}(yz) = z. \tag{43}$$
Now, if we take the derivatives in reverse order ($y$ first, then $x$), we get
$$\frac{\partial}{\partial x}\frac{\partial}{\partial y} V(x, y, z) = \frac{\partial}{\partial x}(xz) = z, \tag{44}$$
which is exactly the same as taking the derivatives in the original order. This illustrates an important and extremely consequential property of functions of more than one variable: in general, for ‘nice’ [30] functions (most functions you will ever deal with), the order in which you take two partial derivatives doesn’t matter. That is, for nice functions $f$, we have
$$\frac{\partial}{\partial y}\frac{\partial}{\partial x} f(x, y, \ldots) = \frac{\partial}{\partial x}\frac{\partial}{\partial y} f(x, y, \ldots). \tag{45}$$
This observation is formalized as ‘Clairaut’s Theorem’ (or ‘Schwarz’s Theorem’) [31].
Exercise 7 (Sir Steven – The Suspicious Spheroid) A solid oblate spheroid (‘squashed sphere’) [32], by the name of Sir Steven, suspiciously follows our friend, the rectangular box, into Tutorial 4. Sir Steven was produced by rotating a filled ellipse about its minor (shorter) axis. At his present age, Sir Steven has a minor axis length of $2b$ and a major axis length of $2a$. Since being knighted, Steven has taken to a gluttonous lifestyle (hence $a > b$). During Lent, Sir Steven decides to read Allen Mandelbaum’s translation of Dante Alighieri’s Divine Comedy and, in the midst of an epiphany, he decides to calculate his own volume, which is given as a function $V(a, b)$ of two variables $a$ and $b$ (the semi-major and semi-minor axis lengths):
$$V_{\text{Steven}} = \frac{4\pi}{3} a^2 b. \tag{46}$$
[30] This means functions with continuous second partial derivatives. More generally, the ability to commute the order of partial derivatives holds at any given point provided that the function has continuous second partial derivatives in some open neighbourhood about that point.
[31] After the French and German mathematicians, Alexis Clairaut and Hermann Schwarz, respectively.
[32] This problem is dedicated to Nicholas Jones, University of Bristol and his love of spheroids.
Q: Compute the following partial derivatives of $V(a, b)$:
$$\begin{aligned}
\frac{\partial}{\partial a} V(a, b) &= \\
\frac{\partial}{\partial b} V(a, b) &= \\
\frac{\partial^2}{\partial a^2} V(a, b) &= \\
\frac{\partial^2}{\partial b^2} V(a, b) &= \\
\frac{\partial}{\partial b}\frac{\partial}{\partial a} V(a, b) &= \\
\frac{\partial}{\partial a}\frac{\partial}{\partial b} V(a, b) &= \\
\frac{\partial^3}{\partial a^3} V(a, b) &=
\end{aligned} \tag{47}$$
Challenge Q (Russian Grade 1): With the help of his intelligent friend, Pappus [33] the Prolate Spheroid, Sir Steven manages to compute his surface area as a function $S(a, b)$ of two variables $a$ and $b$:
$$S_{\text{Steven}} = 2\pi a^2 \left\{ 1 + \frac{1 - e^2}{e} \tanh^{-1}(e) \right\} \tag{48}$$
where the eccentricity $e$ of the generating ellipse is defined via $e^2 = 1 - \frac{b^2}{a^2}$. Using this surface area formula, compute the following partial derivatives:
$$\frac{\partial}{\partial a} S(a, b) = \qquad\qquad \frac{\partial}{\partial b} S(a, b) = \tag{49}$$
Hint: You will need to use the product (Leibniz) rule for differentiation along with the chain rule and the following identity for the derivative of arc-hyperbolic tan [34] (hyperbolic tan inverse):
$$\frac{d}{dx} \tanh^{-1}(x) = \frac{1}{1 - x^2}. \tag{50}$$
[33] Pappus claimed that Hippasus, a student of the Ancient Greek Pythagorean school of geometry, was drowned for proving (or sharing) the ‘secret’ irrationality of $\sqrt{2}$.
[34] Sometimes the notation $\mathrm{artanh}(x)$ is used instead of $\tanh^{-1}(x)$ to denote the inverse hyperbolic tangent function.
This derivative is well-defined for all real values of $x$ such that $|x| < 1$. Thus, you should replace $x$ with $e$, then use the chain rule to get partial derivatives of $\tanh^{-1}(e)$ with respect to $a$ or $b$, since $e$ is a function of $a$ and $b$.
Extra-Challenging Q (Russian Grade 1.1): If you think you have what it takes to pass Grade 1 in Soviet Russia, compute the following partial derivatives, then check them using Wolfram Alpha or Mathematica / Computer Algebra Software of Choice:
$$\begin{aligned}
\frac{\partial^2}{\partial a^2} S(a, b) &= \\
\frac{\partial}{\partial b}\frac{\partial}{\partial a} S(a, b) &= \\
\frac{\partial^{2014}}{\partial a^{2014}} S(a, b) &=
\end{aligned} \tag{51}$$
Hint: If you can derive an expression for $\frac{\partial^n}{\partial a^n} S(a, b)$ where $n$ is an arbitrary positive integer ($n = 1, 2, 3, \ldots$), the last equation should be easy. [35]
7.3 Russian Kindergarten: The Exterior Derivative (Total Differential)
The ‘total differential’ or ‘exterior derivative’ of a function $f$ is denoted by $df$. The resulting object is known as an ‘exact differential 1-form’ or ‘co-vector’. We will see why it has the latter name shortly. To illustrate how to compute $df$, we give a few examples, then state the general case.
Given a function $f = f(x)$ of a single variable $x$, its total differential is given by:
$$df = \frac{df}{dx} dx. \tag{52}$$
The quantity $\frac{df}{dx}$ is a function (the derivative of $f$ with respect to $x$); however, the quantity $dx$ can be thought of in several ways. Formally, $dx$ is a ‘differential 1-form’ or basis ‘co-vector’, analogous to the standard basis vectors you may have seen [36]: $e_1$, $\hat{x}$ or $e_x$. Informally, it can be thought of as an infinitesimal quantity or length in the $x$-coordinate.
[35] Disclaimer: It’s probably not easy, relatively speaking.
[36] These are some of the more common notations.
You will also recall that when you integrate a function $f = f(x)$ with respect to the variable $x$, you write it as:
$$\int f(x)\,dx. \tag{53}$$
If we replace $f$ with its derivative $\frac{df(x)}{dx}$, then we have
$$\int \frac{df(x)}{dx}\,dx = f(x) + c \tag{54}$$
where $c$ is some constant of integration; this is just a consequence of the fundamental theorem of calculus. Note, however, that we said $df = \frac{df(x)}{dx} dx$, so we can actually view this statement as:
$$\int \frac{df(x)}{dx}\,dx = \int df = f + c. \tag{55}$$
In this manner, we can think of $\int$ as a formal inverse [37] of the ‘exterior derivative’ or ‘total differential’ operator $d$.
[37] This is a very simple case of the so-called ‘generalized Stokes’ Theorem’ from differential geometry.
For a function $f = f(x, y)$ of two variables $x$ and $y$, computing its total differential requires partial derivatives. In particular, we have
$$df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy. \tag{56}$$
The object $df$ is still a differential 1-form, but now it has two components: $\frac{\partial f}{\partial x}$ is the component in the $dx$ direction and $\frac{\partial f}{\partial y}$ is the component in the $dy$ direction. Alternatively, we say $\frac{\partial f}{\partial x}$ is the coefficient of $dx$ and $\frac{\partial f}{\partial y}$ is the coefficient of $dy$. Hence we see that the total differential $df$ of the function $f$ behaves similarly to a 2-dimensional vector (when $f$ is a function of two variables), which motivates the name ‘co-vector’ to describe $df$.
We generalise this now, in the most natural way. For a function $f$ of $n$ variables $x_1, x_2, \ldots, x_n$, its total differential is given by:
$$df = \frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2 + \ldots + \frac{\partial f}{\partial x_n} dx_n. \tag{57}$$
This says that we partially differentiate $f$ with respect to each of its variables, then multiply that derivative by the basis 1-form corresponding to the coordinate we are differentiating with respect to. Adding all of these together gives the total differential, shown in equation (57). This may seem a little abstract, so it’s best illustrated with a few examples, which we will return to next week when we proceed with error analysis!
Note that the exterior derivative operator $d$ obeys the following general properties when acting on functions:
1. Linearity: $d(c_1 f + c_2 g) = c_1\,df + c_2\,dg$ for any two constants $c_1, c_2$ and any two (differentiable) functions $f, g$.
2. Product (Leibniz) Rule: $d(fg) = g\,df + f\,dg$, for any two (differentiable) functions $f, g$.
Example 6 (Rocky the Rectangular Box) Unable to stay down, Rocky the rectangular box has returned to help with exterior derivatives. Rocky’s volume $V$ is given as a function of three variables: $V = V(x, y, z)$, where $x, y, z$ are the lengths of its sides. Since $V(x, y, z) = xyz$, the total differential of the volume is given by:
$$dV(x, y, z) := \frac{\partial V}{\partial x} dx + \frac{\partial V}{\partial y} dy + \frac{\partial V}{\partial z} dz = yz\,dx + xz\,dy + xy\,dz. \tag{58}$$
Observation: Notice that the coefficient of $dx$ is equal to $yz$, which is the surface area of the face of the box in the plane perpendicular to the $x$-direction. Similarly, the coefficient $xz$ of $dy$ is the area of the face of the box in the plane perpendicular to the $y$-direction etc. Depending on the symmetry of an object, its surface area and volume are usually related in some manner by the operations of differentiation and integration. For example, Snorlax the Sleepy Sphere has a volume $V = V(r)$ which is a function of its radius. In particular, the exact differential of its volume is given by:
$$dV(r) = d\left(\frac{4\pi}{3} r^3\right) = \frac{4\pi}{3}\left(\frac{d}{dr} r^3\right) dr = 4\pi r^2\,dr. \tag{59}$$
The coefficient of $dr$ is the surface area of the sphere, perpendicular to the $dr$ direction (recall that the surface of a sphere is perpendicular to its radius). In particular, the quantity
$$\frac{dV}{dr} = 4\pi r^2 \tag{60}$$
is the surface area.
7.3.1 Exercises
The following exercises are split into some purely mathematical exercises (geometry), along with some applied exercises (thermodynamics) for physicists, engineers and chemists. Bonus neural connections for those who complete both sets!
Exercise 8 (Geometry of Solids)
1. Given a circular cylinder of radius $r$ and height $h$, we can view its volume $V$ and surface area $S$ as functions of two variables:
$$V(r, h) = \pi r^2 h, \qquad S(r, h) = 2\pi(r^2 + rh). \tag{61}$$
Compute the exterior derivatives $dV$ and $dS$.
2. An elliptical cylinder is a cylinder with elliptical cross-sections; you can think of it as ellipses stacked on top of each other ... [38] Given an elliptical cylinder with height $h$ and cross-sectional ellipses with semi-major axis length $a$ and semi-minor axis length $b$, its volume $V$ and surface area $S$ can be viewed as functions of three variables:
$$V(a, b, h) = \pi abh, \qquad S(a, b, h) = 2\pi ab + ph, \tag{62}$$
where $p$ is the perimeter of the elliptical cross-sections (for $a = b = r$ we have $p = 2\pi r$, which recovers equation (61)). To express $p$ exactly, one requires an infinite series:
$$p = 2\pi a \left( 1 - \sum_{n=1}^{\infty} \frac{((2n)!)^2}{(2^n n!)^4} \frac{e^{2n}}{2n - 1} \right) \tag{63}$$
where $e = \frac{\sqrt{a^2 - b^2}}{a}$ is the eccentricity of the ellipse. Using the Ramanujan [39] approximation: $p \approx \pi\left[ 3(a + b) - \sqrt{(3a + b)(a + 3b)} \right]$, compute the exterior derivatives $dV$ and $dS$.
3. For those of you who have studied infinite series, compute $dS$ using the exact expression for the perimeter of an ellipse stated above.
Given these examples, in addition to the previous exercises, complete the following problems.
[38] Puns: bringing English lit and mathematics together since 1600.
[39] A famous Indian child prodigy and mathematical genius who made great rediscoveries and contributions to number theory, estimations and analysis in isolation.
Exercise 9 (Exact Differentials: Thermodynamics/Thermochemistry) Thermodynamics is a broad theory, originally explaining the phenomenon that we know as ‘heat’. More generally, it governs a vast range of macroscopic phenomena in nature, from reaction rates in thermochemical processes to the surface area of black holes. The most famous abstraction of thermodynamics, due to Stephen Hawking, Bill Unruh and Jacob Bekenstein, is that the surface area of a black hole is proportional to
its entropy and its temperature is inversely proportional to its mass [40]. One of the fundamental concepts in thermodynamics is the minimization of different types of so-called ‘state functions’ or ‘thermodynamic potentials’, representing different types of energies.
• Internal Energy: $U := U(S, V, N_i) = \int dU$, where
$$dU = T\,dS - p\,dV + \sum_i \mu_i\,dN_i. \tag{64}$$
• Helmholtz Free Energy: $F(T, V, N_i) = U - TS$.
• Enthalpy: $H(S, p, N_i) = U + pV$.
• Gibbs Free Energy: $G(T, p, N_i) = U + pV - TS$.
Here we stated the natural variables for each function, $U$, $F$, $H$ and $G$, in the brackets (..). These variables are entropy $S$, temperature $T$, volume $V$, pressure $p$ and number (amount) $N_i$ of the $i$-th reactant species (i.e. substance, chemical etc). The chemical potentials $\mu_i$ are all fixed constants. Note, for those of you who haven’t seen the sigma [41] notation for summation, $\sum_i$ simply means the sum over all species labelled by the index $i$.
By keeping track of which variables each function is strictly dependent on and noting the expression for $dU$, prove that we get the following exact differentials:
$$\begin{aligned}
dH(S, p, N_i) &= T\,dS + V\,dp + \sum_i \mu_i\,dN_i \\
dF(T, V, N_i) &= -S\,dT - p\,dV + \sum_i \mu_i\,dN_i \\
dG(T, p, N_i) &= -S\,dT + V\,dp + \sum_i \mu_i\,dN_i.
\end{aligned} \tag{65}$$
Exercise 10 (Mathematical Proof: Cyclic Reciprocity Rule and Thermodynamics) The goal of this exercise is to understand the following proof, memorize the main steps (tricks) and then reproduce it from memory [42].
[40] Physicists that the present author has had the privilege of talking to in person :P.
[41] $\Sigma$ is the symbol for the Greek capital letter, ‘sigma’.
[42] This problem is dedicated to Aston Williams, Engineer of Chemicals.
Say we are looking at the level sets of a function of three variables, for instance one of the thermodynamical potentials from the last exercise. In particular, suppose
we have a function $f = f(x, y, z)$ of the three variables $x, y, z$. If we have the additional constraint that:
$$f(x, y, z) = 0 \tag{66}$$
(e.g. zero Helmholtz free energy), then the implicit function theorem from multivariable calculus tells us that we can write any one of the variables $x, y, z$ in terms of the two other variables. WLOG [43] let’s take the variable $z$ to be a function of the two variables $x$ and $y$: $z = z(x, y)$. The exterior derivative (total differential) of $z$ is then given by
$$dz = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x dy, \tag{67}$$
where as usual, the partial derivative $\frac{\partial z}{\partial x}$ is taken while $y$ is held constant and the partial derivative $\frac{\partial z}{\partial y}$ is taken while $x$ is held constant. This is made explicit by the notation $\left(\frac{\partial z}{\partial x}\right)_y$, where the brackets and subscript denote the variables we are keeping constant while differentiating [44].
[43] This is a common mathematical acronym for ‘Without Loss of Generality’.
[44] The reason for this pedantry now is that usually we differentiate with respect to variables which are independent of each other. However, in the following step of the proof, $y$ may also be related to $z$ except in the special circumstance that $dz = 0$. Thus we must explicitly denote that $z$ is being held fixed, hence this notation.
Taking $dz = 0$ (holding $z$ constant), we can use the implicit function theorem again to view $y$ as a function of $x$ (when $dz = 0$): $y = y(x)$, hence we have
$$dy = \left(\frac{\partial y}{\partial x}\right)_z dx. \tag{68}$$
Substituting this relation into the equation $dz = 0$, we see that:
$$0 = dz = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z dx. \tag{69}$$
Since this equality is actually a co-vector (differential 1-form) equality, we use the fact that a co-vector is identically zero if and only if its components are zero, i.e. the coefficients of $dx$ in this case. Hence
$$0 = \left(\frac{\partial z}{\partial x}\right)_y + \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z \implies \tag{70}$$
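$$\begin{aligned}
\left(\frac{\partial z}{\partial x}\right)_y &= -\left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial y}{\partial x}\right)_z \implies \\
\left(\frac{\partial z}{\partial x}\right)_y \frac{1}{\left(\frac{\partial z}{\partial y}\right)_x} \frac{1}{\left(\frac{\partial y}{\partial x}\right)_z} &= -1 \implies \\
\left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial y}\right)_z &= -1.
\end{aligned} \tag{71}$$
This last relation (71) is known as ‘Euler’s cyclic rule’ [45] or the ‘triple product relation’ etc. It is a quintessential identity used in thermodynamics, since it typically allows one to express one set of physical quantities in terms of other physical quantities through the functional relations established by (71). One easy way to remember it is to look at the variables in the numerator: $y, z, x$, the denominator: $z, x, y$ and the subscripts on the brackets: $x, y, z$. They are all some cyclic permutation in the order $x \to y \to z \to x$.
[45] After the prolific Swiss mathematician, Leonhard Euler, a name that appears everywhere in mathematics.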
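The cyclic rule (71) is easy to test numerically. The sketch below is an illustration only: it assumes the ideal gas law $pV - nRT = 0$ as the constraint $f = 0$ (this example is not taken from the notes), and estimates each partial derivative with a central finite difference. The helper name `partial` and the chosen state point are hypothetical.

```python
R = 8.314462618   # gas constant, J / (mol K)
n = 1.0           # amount of gas in mol (arbitrary choice for this sketch)


# The constraint p V - n R T = 0, solved for each variable in turn
def pressure(V, T):
    return n * R * T / V


def volume(T, p):
    return n * R * T / p


def temperature(p, V):
    return p * V / (n * R)


def partial(func, args, index, rel_step=1e-6):
    """Central-difference estimate of d(func)/d(args[index]),
    holding the other arguments fixed (assumes args[index] != 0)."""
    h = rel_step * abs(args[index])
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


# An arbitrary state: 300 K and 25 litres
T0, V0 = 300.0, 0.025
p0 = pressure(V0, T0)

dp_dV = partial(pressure, (V0, T0), 0)      # (dp/dV) at constant T
dV_dT = partial(volume, (T0, p0), 0)        # (dV/dT) at constant p
dT_dp = partial(temperature, (p0, V0), 0)   # (dT/dp) at constant V

product = dp_dV * dV_dT * dT_dp
print("triple product =", product)
```

Note the sign: naively ‘cancelling’ the differentials would suggest $+1$, whereas the product of the three derivatives comes out at $-1$ up to finite-difference error.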
8 Tutorial 5: Absolute Error and Game of Thrones
In this tutorial we will investigate the task of computing the ‘absolute error’ in a given quantity, as a function of the precision in your measuring devices and measuring ability. The problems and examples will have a Game of Thrones theme, to celebrate (*spoiler alert*) the death of King Joffrey.
8.1 Absolute Error
Most quantities that we measure in science and engineering are ‘derived quantities’. This means that we measure them indirectly. In particular, we measure some set of basic properties of a system or environment, then use some mathematical model or formula to relate these properties to the quantity we are trying to measure. For example, if we want to measure the surface area of a basketball, one would probably measure the circumference [46] with a tape measure or string. Then, using the relation:
$$C = 2\pi R = \pi D \tag{72}$$
where $C$ is the circumference, $R$ is the radius and $D = 2R$ is the diameter, one can then compute the radius of the ball. Once the radius is known, the surface area $S$ can be calculated:
$$S = 4\pi R^2. \tag{73}$$
In this manner we have only performed a length measurement, yet we have obtained a measurement of a ‘derived property’ of the ball: its surface area. If you think carefully about the tools we use to measure things, you quickly come to the conclusion that almost all measurements we perform are those of derived quantities.
The question then arises: how do we obtain an estimate of the error in our final measurement? To do so, one would have to relate the error in a derived quantity to the error in the basic quantities which we directly measure. One general procedure for obtaining a ‘total error’ or ‘absolute error’ estimate involves three ingredients:
• A knowledge of the precision of your measuring ability (inherently restricted by the precision of your instruments).
[46] That is, the circumference of a great circle: a circle which passes through the centre and divides the ball into two equal hemispheres.
• A mathematical function relating your derived quantity to the quantities you directly measure.
• The ‘total differential / exterior derivative / exact derivative’ formula (Tutorial 4).
Mathematically, we proceed as follows.
Definition 1 (Absolute Error) Let $x_1, \ldots, x_n$ be a set of $n$ quantities which are to be measured (with their respective units). Now, let $f(x_1, \ldots, x_n)$ be a function of $n$ variables, representing some derived quantity which is to be measured. If $\Delta x_1, \ldots, \Delta x_n$ are the errors associated to the measurements of $x_1, \ldots, x_n$ (e.g. instrument precision), then the corresponding ‘absolute error’ in $f(x_1, \ldots, x_n)$ is given by the linear estimate:
$$\Delta f(x_1, \ldots, x_n) = \left| \frac{\partial f}{\partial x_1} \Delta x_1 \right| + \left| \frac{\partial f}{\partial x_2} \Delta x_2 \right| + \ldots + \left| \frac{\partial f}{\partial x_n} \Delta x_n \right|, \tag{74}$$
which is evaluated at the measured values $x_1, \ldots, x_n$.
Note that the formula for $\Delta f$ is similar to the total differential, $df$, where the difference is that we have replaced the covectors (1-forms) $dx_1, \ldots, dx_n$ with the measurement errors $\Delta x_1, \ldots, \Delta x_n$. The absolute value of each term is also taken. This is because, when looking to estimate the ‘Maximum Probable Error’, the individual errors should add up rather than cancel. When quoting the value of $f$ as a (derived) measurement, we say that the quantity $f$ has the value:
$$\text{Measured Value of } f = f(x_1, \ldots, x_n) \pm \Delta f. \tag{75}$$
Therefore, (with some probability) we say that the true value of $f$ lies in the interval $[f - \Delta f, f + \Delta f]$.
Note that in the case of perfect measurement technique, one would attribute the errors $\Delta x_1, \ldots, \Delta x_n$ to the instrumental precision. So for example, if you are measuring the height $h$ of Tyrion Lannister with a tape measure, the error $\Delta h$ would be equal to half the width of the graduations on the tape measure.
Finally, one should note that this ‘absolute error’ formula only takes deterministic errors into account (i.e. precision etc.). It does not factor in wrong measurement technique or external errors which one has not accounted for.
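Definition 1 translates almost line by line into code. The following is a minimal sketch, not a definitive implementation: the function name `absolute_error` is hypothetical, the partial derivatives in equation (74) are approximated by central finite differences rather than computed symbolically, and the measured values used in the two usage examples are invented for illustration.

```python
import math


def absolute_error(f, values, errors, rel_step=1e-6):
    """Linear estimate (74): sum over i of |df/dx_i * dx_i|,
    with each partial derivative approximated by a central difference."""
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, errors)):
        h = rel_step * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        derivative = (f(*up) - f(*down)) / (2 * h)
        total += abs(derivative * dx)
    return total


# Basketball of Section 8.1: S = 4 pi R^2 with R = C / (2 pi), so S = C^2 / pi
def surface_area(C):
    return C**2 / math.pi


C, dC = 75.0, 0.05            # hypothetical circumference and precision, in cm
S = surface_area(C)
dS = absolute_error(surface_area, [C], [dC])
print(f"S = {S:.1f} +/- {dS:.1f} cm^2")


# Rectangular box of Tutorial 4: V = xyz, so dV = yz dx + xz dy + xy dz
def box_volume(x, y, z):
    return x * y * z


dV = absolute_error(box_volume, [2.0, 3.0, 4.0], [0.1, 0.1, 0.1])
print(f"dV = {dV:.2f}")
```

For the box, the estimate reduces to $(yz + xz + xy)\,\Delta x$ when all three sides share the same precision, which is the total differential (58) with each $dx_i$ replaced by a measurement error.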
Before attempting the problems and examples, consider the following philosophical note. Because of Quantum Mechanics (in particular, the Heisenberg Uncertainty Principle and the inherent non-deterministic nature of the universe), it is inherently impossible to measure anything with 100% accuracy or certainty. This is
not due to imperfect craftsmanship (imperfect measuring devices) or human imperfection. It is because the process of observation and measurement itself requires interacting with the entity which we are trying to measure. This interaction alters the state of the entity we are trying to measure and is necessarily constrained by the Heisenberg uncertainty principle.
8.2 Examples and Problems
Example 7 (Thinking Ahead with Ned) In a sadistic rage, the false King Joffrey decides to measure the surface area of Ned Stark’s head after decapitation. Being a boy of elementary means, he approximates the Lord of Winterfell’s head as a sphere. Using a string and ruler, he measures the circumference of Ned’s head to be $C = 24$ inches. He does this by marking the string, then measuring the string with the ruler. The graduations on the ruler are spaced $\frac{1}{4}$ of an inch apart, hence the precision of the ruler is $\frac{1}{8}$ in. Assuming his technique is correct, this means that the error associated to the circumference measurement is $\Delta C = \frac{1}{8}$ in. Therefore, Joffrey deduces the surface area of Ned’s head to be:
$$S = 4\pi R^2 = 4\pi \left(\frac{C}{2\pi}\right)^2 = \frac{1}{\pi} C^2 = \frac{1}{\pi} (24\,\text{in})^2 = \frac{576}{\pi}\,\text{in}^2. \tag{76}$$
Viewing $S = S(C)$ as a function of the measurement $C$, the absolute error in $S$ is given by:
$$\Delta S = \left| \frac{\partial S}{\partial C} \Delta C \right| = \left| \frac{2}{\pi} C \Delta C \right| = \left| \frac{2}{\pi} \times 24 \times \frac{1}{8} \right| \text{in}^2 = \frac{6}{\pi}\,\text{in}^2. \tag{77}$$
Hence, with the equivalent sphere approximation, the surface area of Ned Stark’s head is:
$$S \pm \Delta S = \left( \frac{576}{\pi} \pm \frac{6}{\pi} \right) \text{in}^2 \approx (183.35 \pm 1.91)\,\text{in}^2. \tag{78}$$
Exercise 11 (Thinking Ahead: Part II) Using his previous measurement of the circumference of Ned’s head, compute the volume $V$ of Ned’s head along with the absolute error $\Delta V$. Recall that the volume of a ball of radius $R$ is given by:
$$V = \frac{4}{3} \pi R^3. \tag{79}$$
Hint: First write $V$ in terms of the circumference $C$, using the relation $C = 2\pi R$.
Dry Humour: To account for dehydration-related shrinkage of Ned’s head, add 5% ± 1% of the measured volume of Ned’s head. Note that you add the ±1% of $V$ to the previously calculated error $\Delta V$ – that is: $\Delta V_{New} = \Delta V_{Old} + 0.01V$.

Exercise 12 (Thinking Ahead: Part III, Return of The King) After receiving tuition help from St. George’s College tutors, King Joffrey decides to further his skills by measuring the volume of Ned Stark’s head – this time, using more sophisticated estimates. In particular, he approximates the Lord of Winterfell’s head to be that of a prolate spheroid [47], with its major axis aligned with the symmetry axis (vertical axis) of Ned’s head. Again using a string and ruler (this time in metric units), Joffrey proceeds to measure the circular circumference of Ned’s head (prolate spheroids have circular cross-sections along their minor axis) as well as the elliptical circumference of Ned’s head (elliptical cross-sections along the major axis). Joffrey makes the following measurements:

$$C_{Circular} = 55\,\mathrm{cm} = 2\pi R, \qquad C_{Elliptical} = 62\,\mathrm{cm} = 4aE(e), \tag{80}$$

where $a$ is the semi-major axis length of the ellipse, $b$ is the semi-minor axis length, $e = \sqrt{1 - (b/a)^2}$ is the eccentricity of the ellipse and $E(e)$ is a complete elliptic integral of the second kind (computed numerically or as an infinite series expansion in $e$). Note that the semi-minor axis length $b$ of a prolate spheroid is equal to the radius of the circular cross-section along the minor axis of the spheroid:

$$b = R, \tag{81}$$

since the spheroid is generated by revolving the ellipse about the axis perpendicular to the minor axis. Given that Joffrey’s newfound metric ruler has 1 mm = 0.001 m spacings, the precision error in his measurements is now given by: $\Delta C_C = \Delta C_E = 0.5\,\mathrm{mm} = 5 \times 10^{-4}\,\mathrm{m}$. Using the volume formula for a prolate spheroid with semi-major axis $a$ and semi-minor axis $b$:

$$V(a, b) = \frac{4}{3}\pi a b^2, \tag{82}$$

compute the volume of Ned’s head, along with the associated absolute error $\Delta V$. This requires Russian Grade 1 skills.
[47] A prolate spheroid was chosen over an oblate spheroid after using Microsoft Paint to compare the width and height of Sean Bean’s (the actor playing Ned Stark) head.
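The relation $C_{Elliptical} = 4aE(e)$ in (80) can be evaluated numerically. The following is a minimal sketch, assuming the standard integral form $E(e) = \int_0^{\pi/2}\sqrt{1 - e^2\sin^2\theta}\,d\theta$ and a plain midpoint rule; the function names and the number of integration steps are illustrative choices, not part of the tutorial.

```python
import math

def complete_elliptic_e(e, steps=10_000):
    """Approximate E(e) = integral from 0 to pi/2 of sqrt(1 - e^2 sin^2(theta))
    using the midpoint rule (an illustrative choice of quadrature)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.sqrt(1.0 - (e * math.sin(theta)) ** 2)
    return total * h

def ellipse_circumference(a, b):
    """Circumference 4 a E(e) of an ellipse with semi-major axis a >= b > 0."""
    e = math.sqrt(1.0 - (b / a) ** 2)  # eccentricity, as defined below (80)
    return 4.0 * a * complete_elliptic_e(e)

# Sanity check: a circle (a = b) must give 2*pi*a.
print(ellipse_circumference(1.0, 1.0), 2.0 * math.pi)
```

Going the other way (from a measured circumference to $a$) needs either a root-finder wrapped around this function or a closed-form approximation to $E(e)$.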
Hint: To proceed, you should write the volume $V$ in terms of the circular and elliptical circumferences: $V = V(C_C, C_E)$. This requires writing the semi-minor and semi-major axis lengths in terms of the circumferences. We already know that $b = R$, hence $b = \frac{C_C}{2\pi}$. To get the semi-major length $a$ in terms of $C_E$, one needs an approximation for the elliptic integral $E(e)$. Recalling from Russian Grade 1 in Tutorial 4, we have the Ramanujan approximation:

$$C_E \approx \pi\left[3(a + b) - \sqrt{10ab + 3(a^2 + b^2)}\right]. \tag{83}$$

By dividing through by $\pi$, bringing the $3(a + b)$ term to the left-hand side and squaring both sides, we can obtain a quadratic equation for $a$ in terms of $b$ and $C_E$. The positive root of this equation is given by:

$$a = \frac{3C_E - 4b\pi + \sqrt{3C_E^2 + 12bC_E\pi - 20b^2\pi^2}}{6\pi}. \tag{84}$$

Substituting these expressions for $a$ and $b$ into $V$, one can then compute $V$ and its partial derivatives, required for computing $\Delta V$.

Exercise 13 (La forma de la espada – “The Shape of the Sword”) The goal of this problem is to be able to reproduce all the steps and arguments used to derive the volume estimate, then compute the volume measurement and absolute error at the end. Disclaimer: there may be errors in this error analysis!

To add further insult to the Stark family, Tywin Lannister – Hand of the King and head of the Lannister family – decides to melt down Eddard Stark’s greatsword, “Ice”. Being a pragmatic man, Tywin decides to calculate the volume of this sword in order to work out how much Valyrian steel he will have to forge two new swords for his sons. Not being as clever as Archimedes, Tywin doesn’t think to use water displacement to measure this volume. Instead he proceeds as follows.

We can approximate the blade (Valyrian steel part) of the sword to be that of a shallow rhomboidal prism, with maximum width at the hilt of the sword, decreasing in thickness down to the pointed tip.
A rhomboidal (diamond-shaped) prism means that the width-wise cross-sections of the sword are shaped like rhombuses, with very narrow (acute) angles $\alpha$ in the plane parallel to the cutting edges and very large (obtuse) angles $\beta$ in the plane perpendicular to the cutting edge. Despite the decreasing thickness, the angles in the rhombus cross-sections will remain the same [48].

[48] So we could in fact view the blade as a continuous conformal map of a rhombus.

Say that the rhomboidal cross-sections are measured to linearly decrease in area, down from the hilt to the tip, reaching zero area at the pointed end of the blade. By
knowing the length of the blade and the cross-sectional area at the hilt and at the tip (zero), we can construct a linear function, $A(x)$ (where $x$ is the distance down the blade, measured from the hilt), from which we can interpolate the cross-sectional area of the blade anywhere between the hilt and the tip. The volume will then be the ‘sum’ of these cross-sectional areas stacked on top of each other – i.e. the integral:

$$V = \int_{x_{hilt}}^{x_{tip}} A(x)\,dx. \tag{85}$$

Because of his war with the Stark family, Tywin has run out of protractors and is thus left only with Joffrey’s string and ruler to carry out his measurements – to which he proclaims, “FML!”

Tywin now summons the help of his educated son, Tyrion Lannister. In a stroke of cleverness, to work out the angles of the rhombus cross-section at the blade hilt, Tyrion measures the circumference of the blade. Because of sword symmetry (the rhombus consisting of two mirrored isosceles triangles), this circumference $C$ is equal to four times the length of each side of the cross-sectional rhombus at the hilt:

$$C_{rhombus}(x_{hilt}) = 4L_{hilt}. \tag{86}$$

If the blade was completely flat, it would have a width of $2L_{flat}$ at the hilt, instead of the string-measured value of $2L_{hilt}$. Thus, by holding the string tangential to the corner of the rhombus (which runs down the center of the blade), Tyrion measures the ‘flat width’ of the blade: $2L_{flat}$. He then computes the ‘flat circumference’ at the hilt:

$$C_{flat} = 4L_{flat} \tag{87}$$

and concludes that the deviation:

$$C_{rhombus} - C_{flat} = 4(L_{hilt} - L_{flat}) \tag{88}$$

must be due entirely to the rhomboidal geometry [49] of the cross-sections. Using planar geometry that he learned while in his mother’s womb, Tyrion realises that $2L_{flat}$ is equal to the central diameter of the rhombus. He then forms a right-angled triangle in the rhombus, with hypotenuse $L_{hilt}$, acute angle $\frac{\alpha}{2}$ and adjacent side $L_{flat}$.

[49] The key concept here is that of a ‘defect angle’.
The rhomboidal geometry introduces a non-zero angular defect away from the zero angle describing flat cross-sections (straight lines).
Figure 1: Cross-sectional rhombus of idealized broadsword.

Therefore, simple trigonometry gives:

$$\cos\left(\frac{\alpha}{2}\right) = \frac{L_{flat}}{L_{hilt}}, \qquad \sin\left(\frac{\beta}{2}\right) = \frac{L_{flat}}{L_{hilt}}, \tag{89}$$

which allows Tyrion to deduce the interior angles $\alpha$ and $\beta$ of the cross-sectional rhombus. By symmetry, the area of the cross-sectional rhombus at the hilt is simply four times the area of this triangle (using Pythagoras’ theorem, since we want all quantities in terms of the measured quantities $L_{hilt}$, $L_{flat}$):

$$A_{rhombus}(x_{hilt}) = 4 \times \frac{1}{2} \times \mathrm{base} \times \mathrm{height} = 4 \times \frac{1}{2}L_{flat}\sqrt{L_{hilt}^2 - L_{flat}^2} = 2L_{flat}\sqrt{L_{hilt}^2 - L_{flat}^2}. \tag{90}$$

To work out $A(x)$ for any $x \in [x_{hilt}, x_{tip}]$, Tyrion lays the sword flat. Viewed from overhead, the sword looks like an isosceles triangle, with base $2L_{flat}$ and height $L_{blade}$. Splitting this into two right-angled triangles, we get the following diagram:

Figure 2: Top view of broadsword laid flat.

In particular, Tyrion finds that $\tan(\gamma) = \frac{L_{flat}}{L_{blade}}$. Setting up a coordinate system with $x_{hilt} := 0$ at the hilt and $x = x_{tip} = L_{blade}$ at the end of the blade, the height $y$ of the triangle at any point along the blade can then be computed as a function
of the position $x$ along the blade. Trigonometry shows that:

$$y = (L_{blade} - x)\tan(\gamma) = (L_{blade} - x)\frac{L_{flat}}{L_{blade}} = L_{flat}\left(1 - \frac{x}{L_{blade}}\right). \tag{91}$$

To work out the area $A(x)$ of the cross-sectional rhombus at any point $x$ along the blade, one uses the previous formula, $2L_{flat}\sqrt{L_{hilt}^2 - L_{flat}^2}$, but makes the replacements $L_{flat} \to y$ and $L_{hilt} \to \frac{y}{\cos(\alpha/2)} = \frac{L_{hilt}}{L_{flat}}y$, since $\cos\left(\frac{\alpha}{2}\right) = \frac{L_{flat}}{L_{hilt}}$ (recall the first diagram). Hence we have:

$$A_{rhombus}(x) = 2y\sqrt{\left(\frac{L_{hilt}}{L_{flat}}y\right)^2 - y^2} = 2y^2\sqrt{\left(\frac{L_{hilt}}{L_{flat}}\right)^2 - 1} = 2L_{flat}^2\left(1 - \frac{x}{L_{blade}}\right)^2\sqrt{\left(\frac{L_{hilt}}{L_{flat}}\right)^2 - 1}. \tag{92}$$

Having learned calculus from the ‘Principia Mathematica’, Tyrion concludes that the volume is therefore given by the following function of three measured variables [50], $L_{flat}$, $L_{hilt}$ and $L_{blade}$:

$$V(L_{flat}, L_{hilt}, L_{blade}) = \int_{x=0}^{x=L_{blade}} A(x)\,dx = 2L_{flat}^2\sqrt{\left(\frac{L_{hilt}}{L_{flat}}\right)^2 - 1}\int_{x=0}^{x=L_{blade}}\left(1 - \frac{x}{L_{blade}}\right)^2 dx = \frac{2}{3}L_{blade}L_{flat}^2\sqrt{\left(\frac{L_{hilt}}{L_{flat}}\right)^2 - 1} = \frac{2}{3}L_{blade}L_{flat}\sqrt{L_{hilt}^2 - L_{flat}^2}. \tag{93}$$

[50] As a consistency check, one should note that we expect the volume to be a function of exactly three variables. This is because the cross-sectional area of the sword is parametrised by two variables (being a non-square rhombus), whilst the length of the sword is parametrised by another independent variable. If the cross-sectional rhombus was turned into a square, the sword would be reduced to a spike and the volume would be a function of two measured variables – the blade length and the length of one of the sides of the cross-sectional square.

Given Tywin’s measurements of the broadsword along with the corresponding precision error

$$L_{blade} = 42\,\mathrm{in} \pm \frac{1}{8}\,\mathrm{in}, \qquad C_{flat} = 4L_{flat} = 4\,\mathrm{in} \pm \frac{1}{8}\,\mathrm{in}, \qquad C_{rhombus} = 4L_{hilt} = \left(3 + \frac{7}{8}\right)\mathrm{in} \pm \frac{1}{8}\,\mathrm{in}, \tag{94}$$
one can deduce the following measurements and (reduced) errors for the $L$ variables [51]:

$$L_{blade} = \left[42 \pm \frac{1}{8}\right]\mathrm{in}, \qquad L_{flat} = \left[1 \pm \frac{1}{32}\right]\mathrm{in}, \qquad L_{hilt} = \left[\frac{1}{4}\left(3 + \frac{7}{8}\right) \pm \frac{1}{32}\right]\mathrm{in}. \tag{95}$$

Problem I From these measurements, compute the volume $V$ (in units of inches cubed) of Ned’s broadsword along with the corresponding absolute error, $\Delta V$. Convert these measurements into metric units, using the conversion: 1 inch = 2.54 cm.

In a thoughtful moment, Tyrion decides to calculate the financial worth of the sword in terms of pure Valyrian steel. Given that Valyrian steel is worth 100 times its weight in gold, calculate the total worth $W$ of the broadsword in terms of kilograms of gold. To do this, use the fact that the density of Valyrian steel [52] is $\rho = 7.85\,\mathrm{g/cm^3} = 0.284\,\mathrm{lb/in^3}$. Remember to use consistent units – either stick with imperial units or convert everything to metric units.

Problem II Given that mass $M = \mathrm{Volume} \times \mathrm{Density} = V\rho$, compute the error in the amount of gold Tyrion will make by selling the steel smelted from the broadsword blade. Assume that the density $\rho$ given is accurate to the number of decimal places quoted – i.e. the precision error in density is given by: $\Delta\rho = 0.005\,\mathrm{g/cm^3}$. Hence deduce the minimum and maximum amount of gold ($W = 100M$) Tyrion will make, based on Tywin’s measurements – i.e. compute $W - \Delta W$ and $W + \Delta W$.

• The dimensions in the last question were computed using slightly larger-than-average dimensions for Claymores and two-handed swords from the medieval ages.
• The last exercise illustrates an important technique in making measurements: by measuring the circumference of the sword rather than just the edge of the cross-sectional rhombus, the precision error in determining $L_{flat}$ was reduced by a factor of 4.
In general, when making measurements, it is better to measure quantities which are much larger than the precision limitation set by your instrument – from these measurements, you can then deduce the quantities you need with a lower precision error. So for example, in determining the area of a circle with string, it is better to measure its circumference rather than its radius (since the former is larger) – this way, one may reduce the precision error in determining the radius by a factor of $2\pi$.

[51] See the remark after this exercise.
[52] The density of Carbon 1060 Steel used to make “Ice” replicas for crazy Game of Thrones fans.
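The measurement technique in the remark above can be illustrated with a short sketch. It assumes only that the wanted quantity is the measured one divided by a known constant, so the precision error shrinks by the same constant; the helper name `derived_measurement` is hypothetical.

```python
import math

def derived_measurement(value, error, divisor):
    """Return (value/divisor, error/divisor) for a quantity deduced as
    q = measured / divisor, where divisor is an exactly known constant."""
    return value / divisor, error / divisor

# Rhombus side from the string-measured circumference: L = C / 4,
# using C_flat = 4 in with a ruler precision of 1/8 in.
l_flat, dl_flat = derived_measurement(4.0, 1.0 / 8.0, 4.0)

# Circle radius from its circumference: r = C / (2*pi),
# using the C = 24 in measurement from Example 7.
radius, d_radius = derived_measurement(24.0, 1.0 / 8.0, 2.0 * math.pi)

print(f"L_flat = {l_flat} +/- {dl_flat} in")
print(f"r = {radius:.4f} +/- {d_radius:.4f} in")
```

Measuring the radius directly with the same ruler would carry the full 1/8 in error, which is the comparison the remark is making.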
Exercise 14 (Littlefinger’s bane – Lord Tyrion, Master of Coin) Having been given the responsibility of managing the Kingdom’s finances, Tyrion Lannister finds that he has inherited some ‘financial discrepancies’ from Littlefinger – that is, he has found some mathematical ‘short-comings’ [53] in Littlefinger’s bookwork. In particular, apart from pocketing coin from time to time, Tyrion finds that his predecessor Littlefinger has been using the wrong interest rate formula to calculate the kingdom’s debt. Furthermore, Tyrion finds that Littlefinger has been ‘inflating’ the recorded expenses, so as to inflate his own pockets. Being clever, Littlefinger randomized the expenses which he had inflated and also kept all ‘inflations’ to within 2% of the true expense.

Littlefinger used discrete compound interest, compounded quarterly, to compute the interest $S(t) - S_0$ that the kingdom owes to a certain bank $t$ years after taking an initial loan $S_0$. Given an annual interest rate of 7% (i.e. $r = 0.07$), the amount owed to the bank is given by

$$S(t) = S_0\left(1 + \frac{r}{m}\right)^{mt} \tag{96}$$

where $m = 4$ is the number of times the interest was assumed to be compounded per year. However, driven by avarice, the bank in fact changed the terms of the loan so that interest was compounded continuously – i.e. $m \to \infty$. With this correction, Tyrion finds the actual amount owed after $t$ years:

$$S(t) = \lim_{m \to \infty} S_0\left(1 + \frac{r}{m}\right)^{mt} = S_0 e^{rt} \tag{97}$$

where $e$ is Euler’s number, the base of the natural logarithm.

First, Tyrion must correct the size of the debt black hole that Littlefinger’s endless borrowing has brought the kingdom into. To do this, Tyrion must estimate the true total expenses $E$ of the kingdom along with an ‘absolute error estimate’ $\Delta E$ to account for the amount of money Littlefinger has stolen. Once this is done, Tyrion must calculate the amount of interest that the Kingdom will owe in the next financial year, along with an error estimate to account for how much of this may be due to Littlefinger.
Problem I: First make sure that you understand why discrete compound interest is given by a geometric sequence and why continuously compounded interest is given by the exponential function. Now, given an initial loan of $S_0 = 4{,}000{,}000$ gc (gc = ‘gold coins’) taken $t = 12.5$ years ago along with a second (separate) loan of $\tilde{S}_0 = 6{,}000{,}000$ gc taken $t = 2$ years ago, compute the total amount of money, $S + \tilde{S}$, that the Kingdom currently owes to the bank.

[53] Pun intended.
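The relationship between formulas (96) and (97) can be explored numerically. The sketch below is illustrative only: the function names are hypothetical and the loan values are made up (they are deliberately not the values of Problem I).

```python
import math

def discrete_compound(s0, r, m, t):
    """Amount owed after t years with interest compounded m times per year, as in (96)."""
    return s0 * (1.0 + r / m) ** (m * t)

def continuous_compound(s0, r, t):
    """Amount owed after t years with continuously compounded interest, as in (97)."""
    return s0 * math.exp(r * t)

# Hypothetical loan: 100 gold coins at 7% per year for one year.
for m in (1, 4, 12, 365, 10**6):
    print(m, discrete_compound(100.0, 0.07, m, 1.0))
print("continuous", continuous_compound(100.0, 0.07, 1.0))
```

As $m$ grows the discrete amounts increase towards the continuous value, which is why the bank's change of terms always raises the debt.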
After some statistical analysis, Tyrion concludes that Littlefinger has used a binomial distribution to inflate the expenses. In particular, Littlefinger has selected a probability of $p = 0.25$ to choose whether or not to inflate an expense at any given time. Counting $N = 2014$ expenses, and based on the asymptotic nature of probabilities (Law of large numbers), it is a reasonable estimate to assume that one quarter of all expenses are inflated. Thus, as a simplification, Tyrion decides to assign the following error to each recorded expense, $E_k$:

$$\Delta E_k = -0.25 \times (0.02 \times E_k) = -0.005E_k, \tag{98}$$

where the negative sign accounts for the fact that Littlefinger would only steal money rather than donating it. Because these errors are cumulative, the error in the total expense $E = \sum_{k=1}^{2014} E_k$ is given by

$$\Delta E = 0.005E. \tag{99}$$

Problem II In total, the Kingdom’s projected expenses are given by the sum of its total debt $(S + \tilde{S})$ as well as its internal expenses, $I = 1{,}000{,}000$ gc. As an approximation, treat Littlefinger’s inflation of the expenses as an inflation of the initial loans taken from the bank – i.e.

$$\Delta S_0 = 0.005S_0, \qquad \Delta\tilde{S}_0 = 0.005\tilde{S}_0. \tag{100}$$

Using the arguments earlier, one can approximate the error in the internal expenses to be $\Delta I = 0.005I$. On second inspection, Tyrion realises that with their loose contract, the bank may legally retroactively alter the interest rate $r$ on their loan by ±1%. This induces an error of $\Delta r = \pm 0.01$ in the compound interest formula. To compute the error in the debts $S(t = 12.5)$ and $\tilde{S}(t = 2)$, one views $S$ and $\tilde{S}$ as functions of the interest rate $r$ and the initial loans, $S_0$ and $\tilde{S}_0$, then uses the absolute error formula. With this information, compute:

$$E_{Total} = S(12.5, r, S_0) + \tilde{S}(2, r, \tilde{S}_0) + I \tag{101}$$

as well as the absolute error, $\Delta E_{Total}$.

Problem III The quality of time-keeping devices owned by the kingdom and the bank is not very good. In particular, clocks are known to have an accuracy of about 1 minute per day – i.e. $\Delta t_{day} = 60$ seconds.
Convert this into a yearly error in $t$, giving $\Delta t$. To account for the quasi-periodic orbit of their planet, add ±1.5 days
multiplied by the number of years elapsed to the estimate of $\Delta t$ – i.e. add $\frac{1.5}{365}t$ to $\Delta t$, assuming each year has an average of 365 days on Tyrion’s planet. Using this final estimate of error in the time elapsed and viewing the debts $S = S(t, r, S_0)$ and $\tilde{S} = \tilde{S}(t, r, \tilde{S}_0)$ as functions of three measured variables (with time included), compute a refined estimate for $\Delta S$ and $\Delta\tilde{S}$ that takes into account $\Delta t$.

Problem IV If there is a chance that Littlefinger has caused a total of more than 333,333 gc (gold coins) of excess expenses (including interest), Tywin Lannister will proceed to organize Littlefinger’s capture, de-sexing and subsequent torture. Based on your calculations, will Littlefinger’s name become his new pathos?

Adopting the cunning of Tyrion Lannister, if Littlefinger has managed to fall below Tywin’s ‘critical expense threshold’, can you modify the above argument to maximize the possible error in the computed expenses of the kingdom? Note that you must do this in a mathematically plausible and logical way, so as to persuade Lord Tywin to torture Littlefinger. Alternatively, if Littlefinger has already exceeded the threshold, can you use binomial statistics to maximize $\Delta E$, hence maximizing the severity of his torture?

Hint: Think about using the cumulative distribution function (sum of all scenarios where theft occurs less than or equal to a certain number of times) derived from the binomial distribution. If all else fails, either ‘inflate’ the probability with which Littlefinger stole at any given expense, or inflate the amount by which Littlefinger was inflating the expenses.
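For Problems II and III, the absolute error formula applied to $S(t, r, S_0) = S_0e^{rt}$ needs the partial derivatives $\partial S/\partial S_0 = e^{rt}$, $\partial S/\partial r = S_0te^{rt}$ and $\partial S/\partial t = S_0re^{rt}$. The sketch below assumes this linearised formula; the function names and the numbers are hypothetical and are not the exercise's data.

```python
import math

def debt(s0, r, t):
    """Continuously compounded debt S = S0 * exp(r * t)."""
    return s0 * math.exp(r * t)

def debt_error(s0, r, t, ds0, dr, dt=0.0):
    """Linearised absolute error in S:
    |dS/dS0 * ds0| + |dS/dr * dr| + |dS/dt * dt|."""
    growth = math.exp(r * t)
    return (abs(growth * ds0)
            + abs(s0 * t * growth * dr)
            + abs(s0 * r * growth * dt))

# Hypothetical loan: 100 gc at r = 0.1 for 1 year,
# with errors of 1 gc in the loan and 0.01 in the rate.
print(debt(100.0, 0.1, 1.0), debt_error(100.0, 0.1, 1.0, ds0=1.0, dr=0.01))
```

Leaving `dt` at zero corresponds to Problem II; passing a non-zero `dt` adds the time-keeping term of Problem III.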
9 Tutorial 6: Medicine – An Error a Day Keeps the Tutor Away

In the last tutorial which you completed [54] for the good of your future selves [55], you studied ‘absolute errors’ using ‘linearisation’ or an informal re-interpretation of the ‘exterior derivative’ operation. In this tutorial, we will finalise our study of basic error analysis with a few more concepts such as ‘relative error’, ‘percentage error’, ‘least scale error’ and ‘Maximum Probable Error’. The tutorial will conclude with some illustration of how to extract and interpret error estimates from ‘least squares regression’ or ‘line of best fit’ – the most commonly used statistical analysis tool in experimental science and engineering.

9.1 Relative and Percentage Error

In the last tutorial, we defined the absolute error $\Delta f$ in the measurement of some dependent variable $f(x_1, ..., x_n)$, in terms of a set of experimentally measured variables $x_1, ..., x_n$ and their corresponding errors $\Delta x_1, ..., \Delta x_n$:

$$\Delta f(x_1, ..., x_n) = \left|\frac{\partial f}{\partial x_1}\Delta x_1\right| + \left|\frac{\partial f}{\partial x_2}\Delta x_2\right| + ... + \left|\frac{\partial f}{\partial x_n}\Delta x_n\right|. \tag{102}$$

The relative error in $f$ is then defined very simply as the ratio of the absolute error in $f$ to the experimentally determined value of $f$. We now formalize this.

Definition 2 The relative error in some measurement $f(x_1, ..., x_n)$ of $f$ is defined as

$$\frac{\Delta f}{f} = \frac{1}{f(x_1, ..., x_n)}\left(\left|\frac{\partial f}{\partial x_1}\Delta x_1\right| + \left|\frac{\partial f}{\partial x_2}\Delta x_2\right| + ... + \left|\frac{\partial f}{\partial x_n}\Delta x_n\right|\right), \tag{103}$$

where $x_1, ..., x_n$ are some set of measured variables which determine $f(x_1, ..., x_n)$. Furthermore, the percentage error in the measurement $f$ is defined to be the relative error expressed as a percentage:

$$\text{Percentage Error in } f = \frac{\Delta f}{f} \times 100\%. \tag{104}$$

[54] Hint: Angela, Emma, Amelia, Zoe!
[55] Hint: Future Angela, Future Emma, Future Zoe and Future Amelia.
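Formulas (102) to (104) translate directly into a small numerical helper. The following is a sketch under stated assumptions: the partial derivatives are estimated with central finite differences rather than computed by hand, the relative error divides by $|f|$ so that it stays positive, and all function names are hypothetical. Example 7 from Tutorial 5 ($S = C^2/\pi$, $C = 24$ in, $\Delta C = \frac{1}{8}$ in) is used as a check.

```python
import math

def absolute_error(f, values, errors, h=1e-6):
    """Linearised absolute error (102): sum over i of |df/dx_i * dx_i|,
    with each partial derivative estimated by a central difference."""
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, errors)):
        step = h * max(1.0, abs(x))
        up = list(values)
        down = list(values)
        up[i] = x + step
        down[i] = x - step
        partial = (f(*up) - f(*down)) / (2.0 * step)
        total += abs(partial * dx)
    return total

def relative_error(f, values, errors):
    """Relative error (103); dividing by |f| is an assumption made here."""
    return absolute_error(f, values, errors) / abs(f(*values))

def percentage_error(f, values, errors):
    """Percentage error (104)."""
    return 100.0 * relative_error(f, values, errors)

def surface(c):
    """Surface area of a sphere in terms of its circumference."""
    return c ** 2 / math.pi

dS = absolute_error(surface, [24.0], [0.125])
pct = percentage_error(surface, [24.0], [0.125])
print(dS, 6.0 / math.pi)  # numerical estimate next to the exact 6/pi from (77)
print(pct)
```

Finite differences keep the helper generic; for the exercises in this tutorial the derivatives are simple enough that doing them by hand is the better practice.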
9.2 Error Etiquette

We now define a fundamental standard in error analysis – the ‘Maximum Probable Error’ (MPE). So far, we have always been using absolute values when computing error – this is to maximize our calculation of the possible errors that may have accumulated in our measurement process (which we could account for). Therefore, in the absence of unaccounted errors, the Maximum Probable Error is the maximum error that may have occurred if the worst-case scenario happened in our measurements – i.e. all the errors added up instead of cancelling.

In general, when an error is quoted in scientific and engineering literature, it corresponds to the ‘Maximum Probable Error’. In principle, one could keep track of the signs in the error (e.g. if we knew that a quantity may be larger than measured, but not smaller) and add them up so as to give a total error which is less than the ‘maximum error’ – but this is almost never used. Furthermore, the ‘absolute error’ formula corresponds to error at the ‘linearised level’. For each problem, we could try and derive a more accurate non-linear calculation of the errors, but for the most part, one sticks to the absolute error $\Delta f$.

In the absence or ignorance of systematic errors, the errors $\Delta x_1, ..., \Delta x_n$ in the experimentally measured quantities $x_1, ..., x_n$ are usually taken to be the precision error due to the ‘least scale’ reading on your measuring device. In other words, a measuring device – such as a ruler or the ATLAS detector in the Large Hadron Collider – will typically have a smallest scale reading, which is set by the resolution of the instrument. Higher quality and more expensive measuring instruments will usually have a higher (better) resolution – meaning a smaller least scale reading. On a standard 30 cm ruler, the least scale reading is usually 1 mm, which is determined by the smallest separation in the marked spacings.
As a rule of thumb, and because of limitations on one’s ability to interpolate, the Least Scale Error $\Delta x$ in some measured quantity $x$ is usually taken to be equal to half the least scale reading. So for our 30 cm ruler, with a least scale of 1 mm, we would take our least scale error for any length measurements with this ruler to be $\Delta x = 0.5\,\mathrm{mm} = 5 \times 10^{-4}\,\mathrm{m}$.

9.3 Sleepy Snorlax’s Medical (mis)Adventures

To make this concrete, let’s consult Snorlax the Sleepy Sphere. Note that formally speaking, the term ‘Sphere’ refers mathematically to the 2-dimensional boundary of a 3-dimensional ball – i.e. its outer surface, excluding the interior. Here and
previously, we will use the term ‘sphere’ to interchangeably refer to the ‘2-Sphere’ (the surface) and the ‘3-Ball’ (boundary surface + interior).

Example 8 (Snorlax, The Sleepy Sphere) Upon recognizing that she has a sleeping disorder, Soporific Snorlax decides to roll over to the Royal Perth medical centre. Here Snorlax meets Dr. Ashleigh Punch – a Georgian. Having decided to take a keen interest in medical physics, Dr. Punch takes Snorlax’s measurements. Using high-energy x-rays and vernier calipers, Dr. Punch measures Snorlax’s diameter $2r$ (where $r$ is the radius) to be $2r = 20.494024$ metres (to within a precision of 1 µm) with a least scale error of $\Delta(2r) = 5 \times 10^{-7}\,\mathrm{m} = 0.5\,\mathrm{\mu m}$ [56]. Hence we have $r = 10.247012\,\mathrm{m}$ and $\Delta r = 2.5 \times 10^{-7}\,\mathrm{m}$. Dr. Punch then calculates Snorlax’s volume to be

$$V = \frac{4}{3}\pi r^3 \approx 4506.93\,\mathrm{m}^3 \tag{105}$$

with an absolute error of

$$\Delta V = \left|\frac{\partial V}{\partial r}\Delta r\right| = 4\pi r^2\,\Delta r = 3.29871 \times 10^{-4}\,\mathrm{m}^3 \tag{106}$$

and relative error of

$$\frac{\Delta V}{V} = \frac{4\pi r^2\,\Delta r}{\frac{4}{3}\pi r^3} = \frac{3.29871 \times 10^{-4}\,\mathrm{m}^3}{4506.93\,\mathrm{m}^3} \approx 7.3192 \times 10^{-8}, \tag{107}$$

corresponding to a percentage error of

$$7.3192 \times 10^{-8} \times 100\% = 7.3192 \times 10^{-6}\,\%. \tag{108}$$

As far as medical measurements go, this is a high-precision measurement. After consulting the Oxford Handbook of Clinical Medicine (9th Edition), Dr. Punch concludes that Sleepy Snorlax is clinically obese and needs to get rid of excess adipose tissue. She prescribes Snorlax one week of “Living Below the Line”, followed by power-lifting sessions at the gym and night-time cycling.

[56] 1 micrometer is defined as one millionth of a meter: $1\,\mathrm{\mu m} = 10^{-6}\,\mathrm{m}$.

Exercise 15 (The Hippocratic Oath) After seeing Dr. Punch, Snorlax loses a lot of weight. Too much weight. In fact, it turns out the dose of high-energy X-rays that Dr. Punch used to image Snorlax was 100,000 times above clinical guidelines
(oops!). Snorlax suspects she may in fact have cancer, so decides to consult Dr. Kaylin Hooper – another Georgian medical student. Dr. Hooper suggests that one way to test for cancer is to measure Snorlax’s average density and compare this density to that of a healthy sphere. Since sentient spheres don’t have muscles or bone or any internal structure, the standard deviation in sphere densities amongst the sphere population is extremely small. Having learned that type 1 spherical cancer tumors have a higher density than that of healthy sphere tissue and type 2 tumors have a lower density than that of normal tissue, Dr. Hooper proposes that if Snorlax’s density is significantly higher or lower than the sentient sphere average density (to within 5 standard deviations and experimental error), then Snorlax has sphere cancer.

Using Nuclear Magnetic Resonance Imaging (MRI = NMR) and a reconstruction algorithm based on Ellipsoidal Harmonics (http://www.sciencedirect.com/science/article/pii/S0010465513002610), Dr. Hooper measures Snorlax’s volumetric density to be:

$$\rho_{exp} = 10^3\,\mathrm{kg/m^3}, \tag{109}$$

with a combined least-scale and numerical precision error (inherent in the algorithm) of

$$\Delta\rho_{exp} = 0.001\,\mathrm{g/m^3}. \tag{110}$$

Pro Tip: Remember to keep track of the units you use and be consistent (e.g. choose kilograms and metres).

Q0: A standard healthy sentient sphere has a density of $\rho_{avg} = 969\,\mathrm{kg/m^3}$ with a population standard deviation of $\sigma_\rho = 6\,\mathrm{kg/m^3}$. Using the particle physics standard of ‘5-sigma’ for statistical significance, determine whether or not Snorlax has sphere cancer. If so, what type of sphere cancer(s) does Snorlax likely have? In other words, does the possible range of Snorlax’s experimentally measured density, $[\rho_{exp} - \Delta\rho_{exp}, \rho_{exp} + \Delta\rho_{exp}]$, lie entirely within the density range of the standard sphere population, $\rho_{avg} = 969\,\mathrm{kg/m^3}$ to within 5 standard deviations $5\sigma_\rho$ – i.e. $[\rho_{avg} - 5\sigma_\rho, \rho_{avg} + 5\sigma_\rho]$?
Furthermore, compute the relative error and percentage error in the experimentally determined value of Snorlax’s density.

Q1: As an upcoming student in medicine, Matthew Fernandez realises that setting the statistical significance level to five standard deviations is crazy for a medical diagnosis. After some research, Matthew decides that setting a significance level of three standard deviations to diagnose sphere cancer is far more sensible. Under
this new criterion, to within experimental error, does Snorlax have sphere cancer? If so, what type of sphere cancer(s) does Snorlax likely have?

A rare but debilitating condition for sentient spheres is Volumetosis – a disease in which the volume and surface area of a sphere are inconsistent. This means that the radius associated to some calculated volume of the sphere, $R_v = \left(\frac{3V}{4\pi}\right)^{\frac{1}{3}}$, disagrees with the radius associated to the surface area of the sphere, $R_s = \sqrt{\frac{S}{4\pi}}$. Volumetosis has two known causes and is thus classified into two types:

• Type 1 Volumetosis: Symmetry Breaking. This occurs when the sphere begins to turn into an ellipsoid – typically because cancerous mutations induce a change in the expression of the sphere genes that control its eccentricity.
• Type 2 Volumetosis: Quantum Russian-doll operators. This occurs when a classical sphere gets infected by quantum operators, which turn it into a quantum superposition of infinitely many concentric spheres of different radii. Each time a measurement is made, the quantum collection collapses to a single sphere of definite radius. Hence independent measurements of surface area and volume will, in general, yield different radii – with probabilities centred around some classical average.

To directly determine the volume of Snorlax, a machine called the ‘Banach-Tarski Annihilator’ is used. Such a technique is only ever sanctioned by the medical community in severe circumstances – which requires both the hospital and patient signing off on the ‘Axiom of Choice’ form. This technique creates an exact topological (genetic) copy of Snorlax, with the same volume, then proceeds to bombard it with sphere anti-particles till the entire copy is annihilated. The number of anti-particles used in the process is then counted and their equivalent volume is calculated.

Q2: In particular, this technique works by calculating the number of anti-particles $N$ and assigning a volume $u = 10^{-22}\,\mathrm{m}^3$ per particle.
Hence,

$$V = N \times u. \tag{111}$$

Given $N = 8.12247 \times 10^{22}$ and an atom-counting resolution of $\Delta N = 10^6$ particles (taking into account higher-loop corrections from quantum field scattering processes), compute the total volume $V$ along with the error in volume $\Delta V$, as measured by the Banach-Tarski Annihilator. Furthermore, compute the relative error and percentage error in $V$.

Recompute $\Delta V$ as well as the relative and percentage errors, now taking into account an error $\Delta u = 10^{-25}\,\mathrm{m}^3$ in the volume per anti-particle. This additional
error is due to the non-local (spread out) nature of the anti-particle wavefunction (or quantum probability density).

Q3: Given the measurements of Snorlax’s new volume $V$, compute Snorlax’s volume-determined radius $R_v$ along with the absolute error $\Delta R_v$, relative error $\frac{\Delta R_v}{R_v}$ and percentage error in $R_v$. Hint: Use

$$R_v = \left(\frac{3V}{4\pi}\right)^{\frac{1}{3}}. \tag{112}$$

A new measurement of Snorlax’s surface area is made, using the technique of ‘particle deposition’. This effectively deposits a layer of radio-shielding particles onto the sphere, till the sphere is completely covered – at which point it is radio-opaque (radio waves cannot pass through it). By rotating the sphere in an array of directed radio emitters and measuring the intensity of radio waves passing through the sphere, one determines when a complete layer of radio-shielding particles has been laid onto the sphere. The sphere is then stripped using an electric field and the particles are collected onto a flat single-molecule layer – whose surface area is then measured, again using radio waves. In total, this measurement process has an effective precision of $0.001\,\mathrm{m}^2$ as well as an estimated inaccuracy of 5%, due to non-linear and quantum electrodynamical effects. Hence, the experimental error in the surface area measurement is:

$$\Delta S = 10^{-3}\,\mathrm{m}^2 \pm 0.05 \times 10^{-3}\,\mathrm{m}^2. \tag{113}$$

Thus, taking the maximum probable error, we set

$$\Delta S = 1.05 \times 10^{-3}\,\mathrm{m}^2. \tag{114}$$

Furthermore, by this method, the surface area of Snorlax is determined to be

$$S = 18.0956\,\mathrm{m}^2. \tag{115}$$

Q4: Using the measurements $S$ and $\Delta S$, compute the surface-area determined radius $R_s$ of Snorlax along with the absolute error, $\Delta R_s$. Hint:

$$R_s = \sqrt{\frac{S}{4\pi}}. \tag{116}$$

Furthermore, compute the relative error and percentage error in $R_s$.

Q5: By comparing the possible experimental values of the volume and surface-area determined radii of Snorlax, $R_v$ and $R_s$, determine – to within experimental error,
whether or not Snorlax has volumetosis. If Snorlax has volumetosis, how severe is it?

Hint: This amounts to comparing whether or not the measurement ranges, $[R_v - \Delta R_v, R_v + \Delta R_v]$ and $[R_s - \Delta R_s, R_s + \Delta R_s]$, overlap. The severity is determined by the ‘range of disagreement’ – i.e. the maximum possible discrepancy (non-overlap).

Q6: Type I volumetosis and Type II volumetosis can be distinguished as follows. For some currently unknown reason, Type I volumetosis typically leads to a sphere turning into a slightly oblate spheroid, meaning that its surface area increases relative to its volume. This is because, for a given volume, a sphere is the object which has minimum surface area. Hence, in Type I volumetosis, the surface-area determined radius $R_s$ of Snorlax would be measured to be consistently greater than Snorlax’s volume-determined radius $R_v$. Since this form of volumetosis is topological, it can be treated by injecting Snorlax with ‘homeomorphism’ regulators which then continuously transform Snorlax’s gene expression back to that of zero eccentricity.

In the case of Type II volumetosis, because the quantum superpositions are symmetrically weighted about some classical radius, on average (i.e. after a large number of measurements), the volume and surface-area determined radii agree. However, because of the oblate spheroid mystery, it suffices to measure whether or not $R_v$ is greater than $R_s$. In particular, if $R_s < R_v$, Snorlax has Type II volumetosis, which cannot be cured by Dr. Punch or Dr. Hooper – for this, the Royal Perth Hospital must bring in an external contractor, known as Dr. Who. Such an affair is extremely expensive. From this, decide whether or not Snorlax requires the medical attention of Dr. Who.
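The comparison in the hint to Q5 can be written as a short check. This is a sketch under one reading of the hint: two measurements agree if their error ranges overlap, and the ‘range of disagreement’ is taken to be the size of the gap between the ranges when they do not. The function names and the sample numbers are hypothetical.

```python
def ranges_overlap(x, dx, y, dy):
    """True if [x - dx, x + dx] and [y - dy, y + dy] share at least one point."""
    return abs(x - y) <= dx + dy

def range_of_disagreement(x, dx, y, dy):
    """Gap between the two error ranges (zero if they overlap).
    Interpreting 'range of disagreement' as this gap is an assumption."""
    return max(0.0, abs(x - y) - (dx + dy))

# Made-up radii with made-up errors, purely to show the two outcomes.
print(ranges_overlap(1.00, 0.10, 1.15, 0.10), range_of_disagreement(1.00, 0.10, 1.15, 0.10))
print(ranges_overlap(1.00, 0.10, 1.30, 0.10), range_of_disagreement(1.00, 0.10, 1.30, 0.10))
```

The same overlap test applies to Q0 and Q1, with the density range on one side and the population range $[\rho_{avg} - k\sigma_\rho, \rho_{avg} + k\sigma_\rho]$ on the other.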
10 Tutorial 7: Romanian High School, Part I – Einstein Convention and Vector Algebra

The last several tutorials have been rather applied – with the inclusion of some abstract concepts. However, in order to progress and develop more powerful machinery, one has to delve into the abstract realm. This is the pattern of mathematics throughout history: an interplay between creative abstraction and the generalisation of ideas or observations that occur in nature – sometimes new mathematics is motivated by what we see in nature, other times new insights and perceptions of nature are generated by new ideas in mathematics. The debate over whether mathematics is ‘created’ or ‘discovered’ however, is a rather contentious and heated one – so we’ll avoid it for now.

In the following set of tutorials, we will explore a range of powerful abstract ideas which play a central role in modern mathematics, physics and engineering. These typically lead to something of practical advantage – either new and more efficient calculation techniques or simply another perspective on things. Lie algebras appear everywhere – from everyday rotations to the algebras that govern quantum mechanics and high-energy particle physics. Similarly, Clifford algebras play an increasingly significant role in modern mathematical developments – leading to an elegant and efficient alternative formulation of vector calculus.

Before we can explore these slightly more advanced topics, we must first practice the ‘Einstein convention’. This will get you used to seeing objects in their component form and how to re-express operations that you are familiar with. It is very general and often saves time by avoiding nasty sigma signs $\sum$, as well as visually keeping track of the dimensionality and rank of objects. Luckily, this is taught in Romanian high schools [57] in the context of tensor calculus and coordinate-dependent differential geometry – so you should also be able to do it.
Note that the following tutorial combines the Einstein convention with ‘Ricci calculus’ [58], which, despite being a widespread convention for doing vector and tensor calculus, is not necessarily the ultimate manner in which to calculate things. For vector and tensor calculus, Cartan’s exterior calculus and modern coordinate-free differential geometry often offer the most illuminating, efficient and elegant presentation and calculation techniques – however, the pre-requisites are high and

[57] As noted by a friend of the present author, whom he met at the Perimeter Institute for Theoretical Physics, Canada.
[58] A coordinate-dependent calculus for tensors, developed by an Italian mathematician, Gregorio Ricci-Curbastro.
hence will not feature in these tutorials (except in very elementary examples of the exterior algebra).

10.1 Conventions: Einstein Notation and Vector/Matrix Operations

By now, you will have seen vectors represented in different ways – geometrically as pointed arrows indicating magnitude and direction, algebraically as a set of components in some standard basis or in matrix form as a row / column. So for example, given a vector $\vec{v}$ in three dimensions we can write it as:
\[ \vec{v} = v^1 \vec{e}_1 + v^2 \vec{e}_2 + v^3 \vec{e}_3, \tag{117} \]
where $\vec{e}_1, \vec{e}_2, \vec{e}_3$ are the ‘standard basis vectors’ [59] – i.e. vectors of unit length pointing in the x, y and z coordinate directions. Alternatively, this vector can be represented in terms of its components, with respect to some standard basis and choice of origin:
\[ \vec{v} = (v^1, v^2, v^3) = \begin{pmatrix} v^1 \\ v^2 \\ v^3 \end{pmatrix}. \tag{118} \]

Tip: Don’t confuse the raised index in $v^j$, the j-th component of the vector $\vec{v}$, with ‘v to the power of j’. If we want to raise some component of a vector to a power, we denote this by brackets: for example, $(v^j)^3$ is the j-th component of $\vec{v}$, cubed.

Using the ‘sigma notation’ for summation, one can equivalently express the vector $\vec{v} = v^1 \vec{e}_1 + v^2 \vec{e}_2 + v^3 \vec{e}_3$ as:
\[ \vec{v} = \sum_{j=1}^{3} v^j \vec{e}_j. \tag{119} \]
However, after a while, writing the summation symbol $\sum_{j=1}^{3}$ gets a bit tedious and unnecessary. Luckily, the physicist Albert Einstein came up with a very useful (and efficient) convention that has since become commonplace in modern physics and some areas of modern mathematics – such as differential geometry and higher-level linear algebra. In particular, rather than writing $\sum_{j=1}^{3}$, we just keep track of the dimensions we are in – since we are in three dimensions, we

[59] Sometimes denoted $\vec{e}_x, \vec{e}_y, \vec{e}_z$ or $\hat{x}, \hat{y}, \hat{z}$.
know that the index $j$ in the term $v^j \vec{e}_j$ has to run over $j = 1, 2, 3$. So in particular, by keeping this in mind, we could simply write:
\[ \vec{v} = v^j \vec{e}_j \tag{120} \]
and define the appearance of the repeated index $j$ to mean a summation over all possible values of that index:
\[ v^j \vec{e}_j := \sum_{j=1}^{3} v^j \vec{e}_j = v^1 \vec{e}_1 + v^2 \vec{e}_2 + v^3 \vec{e}_3. \tag{121} \]
Henceforth, we shall stick with the notation $e_j$ to denote the ‘j-th’ standard basis vector and omit the overhead arrow – hence $e_j := \vec{e}_j$.

DISCLAIMER: Your lecturers might get scared when they see vectors without arrows on top of them – if they do, tell them that it’s okay and not to be afraid, since professional mathematicians don’t need arrows to know when an object is a vector or not (most of the time it is obvious by context). We shall however, for clarity, keep arrows for general vectors in what follows.

We can now formalise these observations as follows.

Definition 3 (Einstein Summation Convention) Given an n-dimensional vector space (e.g. the familiar 3-dimensional Euclidean space $\mathbb{R}^3$), one denotes the components of a vector $\vec{v}$ in some standard basis $e_1, ..., e_n$ by $(v^i)$, where the label $i$ runs over all possible values as determined by the dimension of the vector space – $i = 1, 2, ..., n-1, n$. One indicates summation by the repetition of the same index twice, hence:
\[ \vec{v} = v^i e_i := v^1 e_1 + v^2 e_2 + ... + v^{n-1} e_{n-1} + v^n e_n. \tag{122} \]
As such, repeated indices are called ‘dummy indices’ since we can re-label them to whatever we want. Indices which are not repeated are called ‘free indices’, since these label fixed components of some object. In general, the number of free indices will indicate the rank of an object – scalars are rank 0 objects, vectors are rank 1 objects and matrices represent the components of rank-2 objects (rank-2 tensors). In general you can have objects (tensors) of arbitrary rank – for example, the Riemann Curvature Tensor mentioned in Tutorial 1 was a rank-4 object since its components were labelled by four non-repeated indices: $R_{\mu\nu\rho\sigma}$.
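To make Definition 3 concrete, here is a minimal Python sketch of the sum $\vec{v} = v^j e_j$, assuming vectors are stored as plain lists of numbers. The function name `contract` and the example bases are hypothetical choices for this illustration. The loop variable plays the role of the dummy index: its name is irrelevant, only the fact that it runs over every value from 1 to n matters.

```python
def contract(components, basis):
    """Return v = v^j e_j, summing over the repeated index j.

    components: list of numbers v^1, ..., v^n
    basis:      list of n basis vectors, each itself a list of n numbers
    """
    n = len(basis[0])
    result = [0.0] * n
    # The loop over (v_j, e_j) pairs is the implicit sum over j.
    for v_j, e_j in zip(components, basis):
        for a in range(n):
            result[a] += v_j * e_j[a]
    return result

# Standard basis e_1, e_2, e_3 of three-dimensional space.
standard_basis = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]]

v = contract([2.0, -1.0, 3.0], standard_basis)

# The same components in a different (non-standard) basis give a different vector.
skewed_basis = [[1.0, 1.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 2.0]]
w = contract([1.0, 2.0, 3.0], skewed_basis)
print(v, w)
```

The second call illustrates why components only mean something once a basis has been chosen.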
As an illustration of ‘dummy indices’ (summed indices), we have:
\[ v^i e_i = v^k e_k = v^\alpha e_\alpha \tag{123} \]
etcetera. Note that in almost all circumstances, an index should never have to be repeated more than twice in an expression to perform a summation – generally there’s always a way to write your expressions such that each index appears at most twice. Hence, in the Einstein summation convention, an index should never appear more than twice – if it does, then the expression is wrong.

As an example of free indices, we have the ‘i-j’th component of a matrix $M$, labelled by $M_{ij}$ – this corresponds to the entry in the ith row and the jth column. If $M$ is a matrix acting on an n-dimensional vector space (i.e. an $n \times n$ matrix), both $i$ and $j$ range independently from 1 to n – hence in general, an $n \times n$ matrix can have up to $n^2$ independent components: $M_{11}, M_{12}, ..., M_{1n}, M_{21}, ..., M_{2n}, ..., M_{n1}, ..., M_{nn}$.

Problem 1 (Dot Product) Express the dot-product of two vectors $\vec{u}$ and $\vec{v}$ using the Einstein summation convention.

Example 9 (Matrix Multiplication) In this manner, Einstein notation provides a simple way to express the product of two matrices $A$ and $B$. In particular, if we take $A$ to be an $M \times N$ matrix (M rows by N columns) and $B$ to be an $N \times P$ matrix, then their product $C$ is an $M \times P$ matrix (M rows by P columns):
\[ C = AB. \tag{124} \]
We can label the components of $C$ by $C_{ij}$ where $i$ refers to the row and $j$ refers to the column – hence they range over the values $i = 1, 2, ..., M$ and $j = 1, 2, ..., P$. Thus, in Einstein notation, we can express the product of matrices $A$ and $B$ by:
\[ C_{ij} = A_{ik} B^{k}{}_{j} := \sum_{k=1}^{N} A_{ik} B^{k}{}_{j}. \tag{125} \]
Note that for the dummy index $k$, we have raised the $k$ index on $B$ – for our purposes, there is not much need to distinguish between raised or lowered indices, though in general there is! For aesthetic purposes however (and to speed up calculation), it is good practice to keep one index raised and one index lowered for each pair of dummy indices.
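Formula (125) translates almost word for word into code. The following Python sketch is an illustration only: it assumes matrices are stored as lists of rows, and the name `matmul` and the sample matrices are chosen for this example. The free indices $i$ and $j$ become the two outer loops, and the dummy index $k$ becomes the inner sum.

```python
def matmul(A, B):
    """Return C with components C_ij = A_ik B_kj (sum over the dummy index k).

    A is an M x N matrix and B is an N x P matrix, both given as lists of rows.
    """
    M, N, P = len(A), len(B), len(B[0])
    if any(len(row) != N for row in A):
        # Compatibility condition: columns of A must equal rows of B.
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(N))  # sum over k
             for j in range(P)]                        # free index j (column)
            for i in range(M)]                         # free index i (row)

# A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 matrix.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
C = matmul(A, B)
print(C)  # [[58, 64], [139, 154]]
```

Note that Python counts from 0, so the code index `k = 0` corresponds to $k = 1$ in the text.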
Problem 2 (The Matrix) Some of you may be familiar with the previous rule for matrix multiplication (125) only implicitly – meaning you know how to multiply two matrices by visually writing them out and then operating on them. If this is the case, write down any two compatible matrices $A$ and $B$, then check that Einstein notation correctly reproduces the components $(AB)_{ij}$ of their product $AB$.
Hint: Remember that the matrix product $AB$ is only well-defined if $A$ has the same number of columns as the number of rows of $B$. So, for ease, try this with a $2 \times 3$ matrix $A$ and a $3 \times 2$ matrix $B$ – this should give you a $2 \times 2$ matrix $AB$.

Note that physicists will tend to interchangeably refer to an object by its components – hence they may view the matrix $C$ as the two-index (rank 2) object $C_{ij}$ or a vector $\vec{v}$ as the one-index (rank 1) object $v^j$. This is technically incorrect and it’s important to remember the difference between the two. In particular, the vector $\vec{v}$ is an invariant geometric object – this means that it doesn’t depend on any coordinate system or choice of basis vectors. The components $v^i$ however, implicitly refer to some a-priori chosen coordinate system or basis. Since we have been using the standard basis, $v^i$ refers to the ith component of the vector $\vec{v}$ – meaning the component of $\vec{v}$ pointing in the direction of the ith basis vector, $e_i$. Hence in physics, referring to a vector by its components $v^i$ implies that some basis has been chosen – sometimes the basis is stated explicitly, otherwise it’s normally assumed to refer to some ‘standard basis’.

Problem 3 (Return to the Matrix) In The Matrix, all objects are represented by matrices. To simulate reality, all calculations of graphics, rendering and physics effects are performed using matrix representations of various algebras and vector / tensor operations [60]. In this manner, all actions performed by inhabitants of The Matrix are done via matrix multiplication – in general actions do not commute (meaning you can’t change the order of some actions), since matrix multiplication doesn’t commute. To defeat Agent Smith, denoted by the matrix $S$, Neo Anderson has to act on Agent Smith sequentially by the bullet matrix $B$ and the Kung-Fu matrix $K$, which is then followed by the ‘cheesy line’ matrix $C$. Express the resulting state of Agent Smith, represented by the matrix:
\[ CKBS \tag{126} \]
using Einstein notation – i.e.
letting $F = CKBS$, what are the components $F_{ij}$ of $F$ in terms of $C$, $K$, $B$ and $S$?

Hint: It may help to find the products sequentially, hence finding $BS$ then $KBS$ and finally computing $CKBS$. Note that you will need three sets of different dummy indices for the product $CKBS$ (three sets of repeated / summed indices) as well.

Q: What restrictions are there on the dimensions (number of rows and columns) of $S$, $B$, $K$ and $C$? Furthermore, what dimension does the product $CKBS$ have

[60] To some extent, this is how computer games work – vectorising calculations using matrices and other types of arrays dramatically speeds up computations (in most cases).
in terms of the number of rows/columns of $C$ and $S$? To answer this, remember the compatibility condition (a restriction on dimensionality stated earlier) required to multiply two matrices.

10.1.1 Scalar and Vector Products – Dot Product

A vector space naturally comes with a law of multiplying two vectors – the exterior product (which we will discuss later). In the special case of three dimensions, $\mathbb{R}^3$, one can extract the cross product of two vectors from the exterior product. The result is another vector, which is why the cross product is sometimes referred to as the ‘vector product’. On the other hand, given some sort of metric or inner product [61] structure, one may also define the ‘dot product’ of two vectors. The result is a scalar – which is why it is also referred to as the ‘scalar product’ (although the latter is more general). We shall recap what one can do with the dot and cross products, in the context of Einstein notation.

Recall that the dot product of two vectors $\vec{u}$ and $\vec{v}$ is related to the angle between these vectors. This is done explicitly by the following formula:
\[ \vec{u} \cdot \vec{v} = \|\vec{u}\| \, \|\vec{v}\| \cos(\theta) \tag{127} \]
where $\|\vec{u}\|$ and $\|\vec{v}\|$ are the norms (magnitudes or lengths) of $\vec{u}$ and $\vec{v}$, respectively, and $\theta$ is the angle between $\vec{u}$ and $\vec{v}$. This relation is made possible by a special relation which is only true for positive-definite inner-product spaces – the Cauchy-Schwarz inequality:
\[ |\vec{v} \cdot \vec{u}| \leq \|\vec{u}\| \, \|\vec{v}\| \tag{128} \]
which is true for any two vectors $\vec{u}$ and $\vec{v}$. If two vectors are orthogonal (perpendicular), then their inner product is zero:
\[ \vec{u} \cdot \vec{v} = \|\vec{u}\| \, \|\vec{v}\| \cos\left(\frac{\pi}{2}\right) = 0. \tag{129} \]
Conversely, if two vectors are parallel ($\theta = 0$), their inner product is maximised:
\[ \vec{u} \cdot \vec{v} = \|\vec{u}\| \, \|\vec{v}\| \cos(0) = \|\vec{u}\| \, \|\vec{v}\| \tag{130} \]
and if they are anti-parallel (pointing in opposite directions, $\theta = \pi$), their inner product is minimised:
\[ \vec{u} \cdot \vec{v} = \|\vec{u}\| \, \|\vec{v}\| \cos(\pi) = -\|\vec{u}\| \, \|\vec{v}\|. \tag{131} \]

[61] An inner product is the formal name for the dot product – in fact, the dot product is just a special case of an inner product.
It is the Euclidean inner product, since it makes use of the Euclidean notion of distance – which comes from Pythagoras’ theorem.
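Relations (127) to (131) can be explored numerically. The sketch below is a hedged illustration, assuming vectors are plain lists of Cartesian components and using the familiar component formula for the dot product. The helper names `dot`, `norm` and `angle_between` are hypothetical. The clamp inside `angle_between` guards against floating-point round-off pushing the cosine slightly outside $[-1, 1]$.

```python
import math

def dot(u, v):
    """Euclidean dot product: sum of products of matching components."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def norm(v):
    """Length of v, defined through the dot product."""
    return math.sqrt(dot(v, v))

def angle_between(u, v):
    """Angle theta in [0, pi] solving u . v = |u| |v| cos(theta)."""
    cosine = dot(u, v) / (norm(u) * norm(v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against round-off
    return math.acos(cosine)

u = [1.0, 2.0, 2.0]
v = [3.0, 0.0, -4.0]

# Cauchy-Schwarz inequality (128): |u . v| <= |u| |v|
cauchy_schwarz_holds = abs(dot(u, v)) <= norm(u) * norm(v)

orthogonal = angle_between([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # pi / 2
parallel = angle_between([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])        # 0
anti_parallel = angle_between([1.0, 0.0, 0.0], [-3.0, 0.0, 0.0])  # pi
print(cauchy_schwarz_holds, orthogonal, parallel, anti_parallel)
```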
To express orthogonality in Einstein notation, we define a special object called the ‘Kronecker delta’:
\[ \delta_{ij} = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases} \]
Hence, for example $\delta_{11} = 1$ and $\delta_{10} = 0$. Therefore, in this notation, the components of the $n \times n$ identity matrix $I$, which consists of 1s down the main diagonal and zeros everywhere else, are given by:
\[ I_{ij} = \delta_{ij}. \tag{132} \]
Now, the standard (Cartesian) basis vectors $e_j$ that we have been using are in fact orthonormal – this means that they are mutually orthogonal (perpendicular) and that they are normalised to have unit length. We can express these conditions in Einstein notation using the Kronecker delta:
\[ e_i \cdot e_j = \delta_{ij}, \qquad \|e_i\| := \sqrt{e_i \cdot e_i} \ \text{(not summed)} = 1, \tag{133} \]
where $i, j = 1, 2, ..., n$ for vectors in n dimensions. Note that if we have a repeated index in Einstein notation which is explicitly not summed over, we simply write ‘(not summed)’ next to the quantity that contains the repeated index.

When a quantity is contracted with the Kronecker delta, it forces the contracted index to take the same value as the other index in the Kronecker delta, hence:
\[ v^i \delta_{ij} = v_j. \tag{134} \]

Problem 4 (Baby Steps) In n = 3 dimensions, show explicitly that $v^i \delta_{i2} = v_2$ by summing over $i = 1, 2, 3$ and using the properties of the Kronecker delta.

Problem 5 (Bigger Baby Steps) Using the fact that $e_i \cdot e_j = \delta_{ij}$, show explicitly that the dot product of two vectors, $\vec{v} = v^i e_i$ and $\vec{u} = u^j e_j$, is given by
\[ \vec{v} \cdot \vec{u} = v^i u_i. \tag{135} \]
Now, use this expression to write down an expression for the length of a vector $\vec{v}$ in Einstein notation.

One final application of the inner product (dot product) here is in the context of projections. Projections here refer to projecting vectors onto other vectors or subspaces. Such an operation is very important in mathematics and physics, especially
in the more advanced and abstract settings. One example that you should be familiar with from high-school is analysing the dynamics of objects sliding down inclined planes – typically you look for the component of the gravitational force directed down the plane, which is simply the projection of the gravitational force vector in the direction of a vector pointing down the inclined plane.

Intuitively, the name ‘projection’ is motivated if you think of two vectors $\vec{v}$ and $\vec{a}$ starting at the same point, with vector $\vec{a}$ lying on the ground and vector $\vec{v}$ pointing upwards at some angle. The projection of $\vec{v}$ onto the vector $\vec{a}$ is then the shadow that the vector $\vec{v}$ casts onto $\vec{a}$ – in general, this projection can be shorter, longer or the same length as the vector $\vec{a}$.

A natural tool for mathematically formulating projections is the inner product – for example, the dot product. In particular, the dot product of two vectors gives you the magnitude of each vector in the direction of the other vector multiplied by the length of the other vector. Formally, we define projections as follows.

Definition 4 (Vector Projections) The vector projection of a vector $\vec{v}$ onto a vector $\vec{a}$ is given by
\[ \mathrm{Proj}_{\vec{v}}\,\vec{a} := \left( \frac{\vec{v} \cdot \vec{a}}{\|\vec{a}\|} \right) \frac{1}{\|\vec{a}\|} \vec{a}, \tag{136} \]
which is equivalent to
\[ \mathrm{Proj}_{\vec{v}}\,\vec{a} = \left( \|\vec{v}\| \cos(\theta) \right) \frac{1}{\|\vec{a}\|} \vec{a}, \tag{137} \]
where $\theta$ is the angle between $\vec{v}$ and $\vec{a}$. The number $\left( \frac{\vec{v} \cdot \vec{a}}{\|\vec{a}\|} \right) = \left( \|\vec{v}\| \cos(\theta) \right)$ which multiplies the unit vector $\frac{1}{\|\vec{a}\|} \vec{a}$ is called the ‘scalar projection’ of $\vec{v}$ onto $\vec{a}$ or the ‘component of $\vec{v}$ along $\vec{a}$’. Hence one can view the vector projection of $\vec{v}$ onto $\vec{a}$ as the unit vector $\frac{1}{\|\vec{a}\|} \vec{a}$ in the direction of $\vec{a}$ multiplied by the component of $\vec{v}$ along $\vec{a}$ – i.e. a vector pointing in the direction of $\vec{a}$ which has the length of the scalar projection of $\vec{v}$ onto $\vec{a}$.

Problem 6 (Bulgarian Baby Steps) Using Einstein notation, write down the previous formulas for the projection of a vector $\vec{v}$ onto a vector $\vec{a}$.

Hint: This means finding the components $(\mathrm{Proj}_{\vec{v}}\,\vec{a})^k$ of $\mathrm{Proj}_{\vec{v}}\,\vec{a}$ and then multiplying them by the standard basis vectors $e_k$:
\[ \mathrm{Proj}_{\vec{v}}\,\vec{a} = (\mathrm{Proj}_{\vec{v}}\,\vec{a})^k e_k. \tag{138} \]

Problem 7 (Russian Baby Steps) Using the Kronecker delta relations $e_i \cdot e_j = \delta_{ij}$ between the standard basis vectors (of unit length) stated earlier, compute the
vector and scalar projections of the vector $\vec{v} = v^i e_i$ in n = 3 dimensions, onto the following vectors:
• $e_1$
• $e_2$
• $e_3$
• $\hat{r} = \frac{e_1 + e_2 + e_3}{\sqrt{3}}$
• $\hat{R} = \sin(\theta)\cos(\phi)\, e_1 + \sin(\theta)\sin(\phi)\, e_2 + \cos(\theta)\, e_3$
• $\hat{\phi} = -\sin(\phi)\, e_1 + \cos(\phi)\, e_2$
• $\hat{\theta} = \cos(\theta)\cos(\phi)\, e_1 + \cos(\theta)\sin(\phi)\, e_2 - \sin(\theta)\, e_3$
You have just found the components of $\vec{v}$ in a Cartesian basis $\{e_1, e_2, e_3\}$ as well as a spherical-coordinate basis $\{\hat{R}, \hat{\phi}, \hat{\theta}\}$.

10.1.2 Scalar and Vector Products – The Permutation Symbol

As stated earlier, one can define different types of multiplication between vectors. An ‘inner product’ or ‘scalar product’ – such as the dot product – multiplies two vectors to give a number. However, another fundamentally useful type of multiplication between vectors is given by the ‘cross product’ or ‘vector product’. This operation takes two vectors $\vec{v}$ and $\vec{u}$, then produces a vector $\vec{v} \times \vec{u}$ whose direction is perpendicular to both $\vec{v}$ and $\vec{u}$, with a magnitude equal to the area of the parallelogram formed by the vectors $\vec{v}$ and $\vec{u}$. To define the cross-product in Einstein notation, we must first introduce a special object which appears everywhere in vector calculus and tensor calculus (Ricci calculus) / differential geometry.

Definition 5 (Levi-Civita Symbol) The ‘Levi-Civita’ or ‘permutation’ symbol $\varepsilon_{ijk}$ in three-dimensional Euclidean space (so $i, j, k$ can take any value from 1 to 3) is a totally anti-symmetric object (but not a tensor!), which has the following properties:
\[ \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i, j, k) \text{ is } (1, 2, 3), (2, 3, 1) \text{ or } (3, 1, 2), \\ -1 & \text{if } (i, j, k) \text{ is } (3, 2, 1), (1, 3, 2) \text{ or } (2, 1, 3), \\ 0 & \text{if } i = j \text{ or } j = k \text{ or } k = i. \end{cases} \]
We could alternatively define the permutation symbol to have the following properties
1. Standard Orientation: $\varepsilon_{123} := +1$
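2. Total Antisymmetry: $\varepsilon_{ijk} = -\varepsilon_{jik} = -\varepsilon_{ikj}$,
from which the previous properties would follow.

Problem 8 (Simple Proof) Prove that the properties:
1. Standard Orientation: $\varepsilon_{123} := +1$
2. Total Antisymmetry: $\varepsilon_{ijk} = -\varepsilon_{jik} = -\varepsilon_{ikj}$,
imply the following properties:
\[ \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i, j, k) \text{ is } (1, 2, 3), (2, 3, 1) \text{ or } (3, 1, 2), \\ -1 & \text{if } (i, j, k) \text{ is } (3, 2, 1), (1, 3, 2) \text{ or } (2, 1, 3), \\ 0 & \text{if } i = j \text{ or } j = k \text{ or } k = i. \end{cases} \]

As such, the permutation symbol obeys the following multiplication property, which comes from its relation to the matrix determinant:
\[ \varepsilon_{ijk}\, \varepsilon_{lmn} = \begin{vmatrix} \delta_{il} & \delta_{im} & \delta_{in} \\ \delta_{jl} & \delta_{jm} & \delta_{jn} \\ \delta_{kl} & \delta_{km} & \delta_{kn} \end{vmatrix} \tag{139} \]
\[ = \delta_{il} \left( \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km} \right) - \delta_{im} \left( \delta_{jl}\delta_{kn} - \delta_{jn}\delta_{kl} \right) + \delta_{in} \left( \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl} \right) \tag{140} \]

Problem 9 (Not-so simple proof) Using the multiplication property, prove the following contraction properties:
\[ \begin{aligned} \varepsilon_{ijk}\, \varepsilon_{imn} &= \delta^{j}_{m} \delta^{k}_{n} - \delta^{j}_{n} \delta^{k}_{m} \\ \varepsilon_{jmn}\, \varepsilon_{imn} &= 2\, \delta^{i}_{j} \\ \varepsilon_{ijk}\, \varepsilon_{ijk} &= 6. \end{aligned} \tag{141} \]
Hint: Remember that one is summing each pair of repeated indices from 1 to 3. One should also note the useful observation that $\delta^{j}_{j} = 3$ – to see this, recall that in sigma-notation, this is the same as saying $\sum_{j=1}^{3} \delta^{j}_{j} = \delta^{1}_{1} + \delta^{2}_{2} + \delta^{3}_{3} = 1 + 1 + 1 = 3$.

10.1.3 Scalar and Vector Products – The Cross Product

In Einstein notation, the components of the cross-product of two vectors $\vec{v} = v^j e_j$ and $\vec{u} = u^j e_j$ in three dimensions are given by the following formula:
\[ (\vec{u} \times \vec{v})_i = \varepsilon_{ijk}\, u^j v^k, \tag{142} \]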
whence the resulting vector is given by multiplying these components by the standard basis vectors and summing them:
\[ (\vec{u} \times \vec{v}) = \varepsilon_{ijk}\, u^j v^k e_i. \tag{143} \]

Problem 10 (Simple Proof) Using the usual formula for the cross-product that you are used to (for example – via matrix determinants), prove that this is equivalent to the formula given above in Einstein notation.

As such, we can see immediately why the cross-product is an antisymmetric operation:
\[ (\vec{u} \times \vec{v}) = -(\vec{v} \times \vec{u}). \tag{144} \]

Problem 11 (Easy Proof) Prove the antisymmetry property of the cross product using the anti-symmetry properties of the Levi-Civita symbol $\varepsilon_{ijk}$ and the formula for the cross-product in Einstein notation.

Note that we claimed earlier that the cross product of two vectors produces a vector which is perpendicular to both of the vectors you are crossing. This can be seen very easily using Einstein notation and the anti-symmetry properties of the Levi-Civita symbol:
\[ \begin{aligned} \vec{u} \cdot (\vec{u} \times \vec{v}) &= u^i (\vec{u} \times \vec{v})_i \\ &= u^i \varepsilon_{ijk}\, u^j v^k \\ &= \varepsilon_{ijk}\, u^i u^j v^k \\ &= \varepsilon_{ijk}\, u^j u^i v^k && \text{swapping } u^i \text{ and } u^j \\ &= -\varepsilon_{jik}\, u^j u^i v^k && \text{interchanging } i \text{ and } j \text{ in } \varepsilon_{ijk} \\ &= -\varepsilon_{ijk}\, u^i u^j v^k && \text{relabelling the dummy indices } i \text{ and } j \\ &= -\vec{u} \cdot (\vec{u} \times \vec{v}). \end{aligned} \tag{145} \]
Hence, since $\vec{u} \cdot (\vec{u} \times \vec{v}) = -\vec{u} \cdot (\vec{u} \times \vec{v})$, we conclude that $\vec{u} \cdot (\vec{u} \times \vec{v}) = 0$, which means that $\vec{u}$ is perpendicular to $(\vec{u} \times \vec{v})$ using the properties of the dot product.

The cross product also has a geometric interpretation, which comes from the following formula for the magnitude of the cross product:
\[ \|\vec{u} \times \vec{v}\| = \|\vec{u}\| \, \|\vec{v}\| \sin(\theta) \tag{146} \]
where $\theta$ is the angle between $\vec{u}$ and $\vec{v}$.
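The component formula (142) and the perpendicularity result (145) can be checked numerically. What follows is a small Python sketch under stated assumptions: the Levi-Civita symbol is implemented as a lookup over the six non-zero index triples from Definition 5, vectors are lists of three Cartesian components, and the names `levi_civita`, `cross` and `dot` are chosen for this illustration. A numerical check on one pair of vectors is of course not a proof.

```python
# Index triples on which the permutation symbol is +1 or -1 (Definition 5).
EVEN_PERMUTATIONS = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}
ODD_PERMUTATIONS = {(3, 2, 1), (1, 3, 2), (2, 1, 3)}

def levi_civita(i, j, k):
    """Permutation symbol eps_ijk for indices running from 1 to 3."""
    if (i, j, k) in EVEN_PERMUTATIONS:
        return 1
    if (i, j, k) in ODD_PERMUTATIONS:
        return -1
    return 0  # at least two indices coincide

def cross(u, v):
    """Components (u x v)_i = eps_ijk u^j v^k, summed over j and k."""
    indices = (1, 2, 3)
    return [sum(levi_civita(i, j, k) * u[j - 1] * v[k - 1]
                for j in indices for k in indices)
            for i in indices]

def dot(u, v):
    """Euclidean dot product u^i v_i."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

u = [1, 2, 3]
v = [4, 5, 6]
w = cross(u, v)
print(w)                       # [-3, 6, -3]
print(dot(u, w), dot(v, w))    # both 0: w is perpendicular to u and v
print(cross(v, u))             # [3, -6, 3], i.e. -(u x v)
```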
Problem 12 (Easy Proof) Using the previous formula (146), prove that $\vec{u} \times \vec{v} = 0$ whenever $\vec{u}$ and $\vec{v}$ are parallel. Furthermore, argue geometrically on the basis of the previous formula (146), why $\vec{u} \cdot (\vec{u} \times \vec{v}) = 0$.

You may recognize that the quantity $\|\vec{u} \times \vec{v}\|$ given by the formula (146) is simply the area of a parallelogram with sides $\vec{u}$ and $\vec{v}$.

Problem 13 Given vectors $\vec{v}$ and $\vec{u}$ with units of length: $[\vec{v}] = [\vec{u}] = L$, use dimensional analysis and formula (146) to show that their cross product has units of area, $L^2$.

It may seem strange that the cross-product produces a vector with different units to each of the vectors you are crossing – however, this is natural when you consider the applications of the cross-product to physics and engineering. More importantly, it relates to the fact that the cross product is in fact a ‘pseudovector’ rather than a true vector – a concept which is only properly understood in the context of a more general product called the ‘exterior product’ and an operation called the ‘Hodge dual’. You can however think of a pseudovector as one which behaves like a vector when rotated, but reverses direction (changes sign) when reflected.

Problem 14 (Simple Calculations) Using the area formula for the cross product, compute the areas of the parallelograms formed by the following sets of vectors in three dimensions:
• $e_1$ and $e_2$
• $e_1$ and $e_3$
• $e_2$ and $e_3$.

Finally, we finish on a rather important set of identities.

Problem 15 (The Rotation Algebra: so(3)) Using the properties of the cross-product listed in this tutorial (or otherwise), prove the following critical identities for the standard Cartesian basis vectors in three dimensions:
• $e_1 \times e_2 = e_3$
• $e_2 \times e_3 = e_1$
• $e_3 \times e_1 = e_2$.
Hint: It suffices to show that $(e_j \times e_k) = \varepsilon_{ijk}\, e_i$.
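The area interpretation of formula (146) can also be sanity-checked on a concrete pair of vectors that are deliberately not basis vectors. The Python sketch below is an assumption-laden illustration: it computes the cross product from the familiar component formula, obtains $\theta$ from the dot product relation (127), and compares the two sides of (146). All function names are hypothetical.

```python
import math

def cross(u, v):
    """Cross product in Cartesian components."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(c * c for c in v))

def parallelogram_area(u, v):
    """Left-hand side of (146): the magnitude of u x v."""
    return norm(cross(u, v))

def area_from_angle(u, v):
    """Right-hand side of (146): |u| |v| sin(theta), with theta from (127)."""
    cosine = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    theta = math.acos(max(-1.0, min(1.0, cosine)))
    return norm(u) * norm(v) * math.sin(theta)

u = [3.0, 0.0, 0.0]
v = [1.0, 2.0, 0.0]
print(parallelogram_area(u, v), area_from_angle(u, v))  # both close to 6 (base 3, height 2)
```

Here the parallelogram has base 3 along the x-axis and height 2, so both expressions should agree with the elementary ‘base times height’ area.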
11 Tutorial 8: Design a Death Star – applications of Lie Groups/Algebras

In this tutorial, we investigate how one can apply the theory of Lie groups and Lie algebras to the construction and design of an orbital death star [62] – in particular, an orbital space station equipped with high-intensity Bose-Einstein-condensate-based gamma-ray LASERS, naval anti-missile lasers, electromagnetic rail guns and nuclear warheads.

When it comes to military technology, the most advanced science often takes place in the form of weapons targeting, tracking and detection systems – a recent example is the huge investment in stealth technology and C.I.A. drone reconnaissance by the United States military. This is because target detection and acquisition is paramount – after all, you can’t eliminate something if you can’t detect it and aim at it. Even master Sun Tzu understood the importance of this element of warfare [63].

To this extent, we will see how the rotational Lie groups and Lie algebras, realized in matrix form, can be used to orient an orbital space station along with the gun turrets it is equipped with. We conclude by looking at the quaternionic representation of the rotation group – which leads us to the first solid historical example of an abstract algebra (a ‘generalization’ of complex numbers), constructed by the famous Irish polymath – Sir William Rowan Hamilton.

This tutorial will make use of matrices and matrix algebra, abstract algebras and group theory, vectors, rotations and various physical concepts. As such it should be mastered by engineering, physics, computer science and math students alike. Hopefully, it will unify and consolidate various areas of your studies – and maybe convince you to get a job in weapons design/satellite programming.
11.1 Notation

For this tutorial, we will be sticking to Einstein notation (see Tutorial 7) – this means that whenever we see two indices repeated in some quantity, we are summing this quantity over all possible values of those indices (omitting the summation symbol $\sum$). So for example, we denote a 3-dimensional real vector $\vec{v}$ in

[62] For those of you who haven’t seen Star Wars, a death star is a large spherical-ish spaceship, the size of a small moon, equipped with a beam weapon which can destroy entire planets.
[63] For those of you who need to read more – Sun Tzu’s “Art of War”. The Giles translation is recommended.
terms of a standard basis $e_1, e_2, e_3$ as:
\[ \vec{v} = v^i e_i, \tag{147} \]
where the contracted index $i$ ranges across $i = 1, 2, 3$:
\[ v^i e_i := v^1 e_1 + v^2 e_2 + v^3 e_3. \tag{148} \]
As before, we keep one index raised and one index lowered for a pair of repeated indices [64]. Furthermore, components of vectors are raised – hence $v^j$ refers to the j-th component of the vector $\vec{v}$ (not the j-th power), for example. For those of you who didn’t attempt Tutorial 7, you are probably most familiar with representing a vector by its components – $\vec{v} = (v^1, v^2, ..., v^n)$ – this notation is fine, yet elementary, as it hides the choice of basis (which is assumed to be the standard basis) by only displaying the components of the vector.

11.2 BFF: Linear Maps and Matrices

As one progresses in the mathematical sciences, one frequents the land of matrix operations – for proofs, problems and simplifying calculations. Perhaps the main reason for their popularity is that there is a one-to-one correspondence between matrices and linear maps on vector spaces. In particular, a linear map $\hat{L}$ on a vector space $V$ (e.g. 3-dimensional Euclidean space $\mathbb{R}^3$) is defined as follows.

Definition 6 A linear map $\hat{L} : V \to V$, which maps the vector space $V$ to itself, is one which has the following property:
• Linearity:
\[ \hat{L}(a\vec{u} + b\vec{w}) = a\hat{L}(\vec{u}) + b\hat{L}(\vec{w}) \quad \forall\, \vec{u}, \vec{w} \in V, \ \forall\, a, b \in F \tag{149} \]
where $F$ is some number field (e.g. the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$).

How does this correspond to matrices? Notice that if we represent a vector $\vec{v} = v^i e_i := v^1 e_1 + ... + v^n e_n$ in an n-dimensional vector space (e.g. $\mathbb{R}^n$) as a column vector:
\[ \vec{v} = \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} \tag{150} \]

[64] A convention which matters in non-Euclidean spaces, since it helps to distinguish covariant tensors (e.g. covectors such as the total differential) from contravariant ones (e.g. your usual vectors).
then we can readily compute the action of some matrix on this vector via matrix multiplication. In particular, the action of an $n \times n$ matrix $A$ on an n-dimensional vector $\vec{v}$ will produce another n-dimensional vector, $\vec{u} = A\vec{v}$ – which we call the transformation of the vector $\vec{v}$ by the matrix $A$. For example:
\[ A\vec{v} = \begin{pmatrix} A^1{}_1 & A^1{}_2 & \cdots & A^1{}_n \\ A^2{}_1 & A^2{}_2 & \cdots & A^2{}_n \\ \vdots & \vdots & \cdots & \vdots \\ A^n{}_1 & A^n{}_2 & \cdots & A^n{}_n \end{pmatrix} \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} = \begin{pmatrix} A^1{}_1 v^1 + A^1{}_2 v^2 + ... + A^1{}_n v^n \\ A^2{}_1 v^1 + A^2{}_2 v^2 + ... + A^2{}_n v^n \\ \vdots \\ A^n{}_1 v^1 + A^n{}_2 v^2 + ... + A^n{}_n v^n \end{pmatrix} \tag{151} \]
Alternatively, in Einstein notation, the action of the matrix $A$ on the vector $\vec{v}$ is given by:
\[ \vec{u} = A^i{}_j v^j e_i \tag{152} \]
where the components $A^i{}_j$ of the matrix $A$ correspond to the entry in the ith row and jth column of $A$ [65]. The contracted indices $i$ and $j$ run over 1 to n (the dimension of the vector space in which $\vec{v}$ lives).

Now, if one recalls, the action of matrices on vectors is linear – that is, given any scalars $\lambda, \gamma$ and any n-dimensional vectors $\vec{v}$ and $\vec{u}$, then for any $n \times n$ matrix $A$ we have:
\[ A(\lambda\vec{v} + \gamma\vec{u}) = \lambda A\vec{v} + \gamma A\vec{u}, \tag{153} \]
hence matrices obey the linearity property required by linear maps. In this sense, we can think of the components of a matrix as the components of a linear map in some chosen basis – conversely, by computing the action of a linear map $\hat{L}$ on a set of basis vectors $\{e_j\}$, we can determine its components in that basis – which we can view as entries in some matrix. To make this explicit with some examples, we shall see how rotation maps can be realized in matrix form.

Exercise 16 (Apocalypse Now) Being quite bored of mathematics, physics, swordplay, music and games, Thomas McKenney chooses to partake in a new pastime – world domination. He decides the best way to undertake this is to build his own Star Wars-inspired Orbital Death Star. The St.
George’s College Board decides to fund Thomas in this pursuit – agreeing that world domination fits into the cultural expansion program as well as securing funding for building maintenance. To this extent, Thomas realizes he must complete the St. George’s College Mathematical Sciences tutorials in order to prepare his laser targeting algorithms. To aid Thomas in this noble enterprise, think of a way to mathematically express the statement –

[65] Rows have a raised index and columns have a lowered index – taking the transpose of the matrix reverses this.
“by computing the action of a linear map $\hat{L}$ on a set of basis vectors $\{e_j\}$, we can determine its components in that basis”.

Hint: Compare the action of a linear map $\hat{L}$ on a vector $\vec{v}$ with the action of some matrix $A$ on $\vec{v}$ – in particular, compare the coefficients of the standard basis vectors $\{e_j\}$ in the resulting transformed vectors: $\hat{L}(\vec{v})$ and $A\vec{v}$. Now look at the special case when $\vec{v}$ is simply equal to one of the standard basis vectors $e_j$.

11.3 SO(3): The Lie Group of Rotations

In 3-dimensional Euclidean space, there are three independent axes of rotation in any given coordinate system. Rotations of vectors are linear maps – to see this, complete the following exercise.

Exercise 17 (Microsoft Death Star) Linear operations are nice – firstly because they are relatively simple and secondly because they can be represented by matrices, meaning that they are easy to program and implement in computer algorithms. Therefore, to build a feasible laser targeting system, one would hope that programming the rotation of the laser turret amounts to linear operations. Taking an interest in weapons targeting systems, Emma Krantz decides to program such a system for her programming competition – to assess the feasibility, she has to prove that rotations are linear operations.

Let $\hat{R}$ represent some 3-dimensional rotation operation and $\vec{v}$ be some 3-dimensional vector. Argue geometrically that the action of $\hat{R}$ on the vector $\vec{v}$ is linear – i.e. show that $\hat{R}$ satisfies the linearity property required by a linear map.

Hint: Given a 3-dimensional vector $\vec{v}$, we can always scale it by some number $\lambda \in \mathbb{R}$. If $|\lambda| > 1$ we dilate the length of the vector and if $|\lambda| < 1$ we contract it. Furthermore, if $\lambda > 0$ we preserve the orientation of the vector and if $\lambda < 0$ we reverse it. Argue that scaling first, $\vec{v} \to \lambda\vec{v}$, and then rotating the resulting vector $\lambda\vec{v}$ is the same as first rotating $\vec{v}$ and then scaling it by $\lambda$ – this shows that rotation is a degree 1 homogeneous operation.
Hint XP: Further show that adding two vectors $v + u$ and then rotating the sum is the same as rotating each of the vectors separately (by the same rotation) and then adding the individual rotated vectors, i.e. $\hat{R}(v + u) = \hat{R}(v) + \hat{R}(u)$. This shows that rotations are additive operations – if you combine this property with the degree 1 homogeneous property, this gives the linearity property which proves that rotations are linear maps.
If one sets up a 3-dimensional Cartesian coordinate system, with coordinates $x, y, z$ (or $x^1, x^2, x^3$) and standard basis vectors $e_1, e_2, e_3$ corresponding to unit vectors in the $x$, $y$ and $z$ directions, respectively, then one has three independent rotation operators $R_1$, $R_2$ and $R_3$ which rotate vectors about each of the corresponding axes ($x$, $y$ and $z$). These are linear maps and hence can be represented as 3 × 3 matrices. We can also view them as functions of the angle which they rotate by. Explicitly, these matrices are:

\[ R_1(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{154} \]

\[ R_2(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \tag{155} \]

\[ R_3(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{156} \]

Geometrically, $R_1(\theta)$ rotates any vector $v$ anti-clockwise [66] by an angle $\theta$ about the x-axis – this means it rotates $v$ in a plane perpendicular to the x-axis. Similarly, $R_2(\beta)$ rotates by an angle $\beta$ anticlockwise about the y-axis and $R_3(\gamma)$ rotates anticlockwise by an angle $\gamma$ about the z-axis.

[66] Almost always in mathematics, anti-clockwise is considered to be a positive orientation and clockwise is considered to be negative.

Exercise 18 (Eigenvectors of Rotation) Clearly if you have a vector that lies along the x-axis and you rotate it about the x-axis, nothing happens to the vector. This is because any vector lying along the x-axis is an eigenvector of the x-rotation matrix $R_1(\theta)$. More generally, if we rotate any vector $v = v^1 e_1 + v^2 e_2 + v^3 e_3$ about the j-th coordinate axis, then its j-th component will not change.

Q: Using matrix multiplication and representing each vector as a column vector, show that:

\[ R_j(\theta) e_j = e_j \quad \text{(no summation)} \tag{157} \]

which means that the standard basis vector $e_j$ is an eigenvector of the rotation operator $R_j$ with eigenvalue 1.
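A numerical sanity check can complement (though not replace) the matrix calculation asked for in Exercise 18. The sketch below is only illustrative: it assumes the matrices (154)–(156) as stated, uses plain Python lists, and the helper names `rot`, `matvec` and `basis` are hypothetical rather than part of the tutorial.

```python
import math

def rot(j, theta):
    """Matrix of R_j(theta) from (154)-(156); j = 1, 2, 3 selects the x, y or z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return {1: [[1, 0, 0], [0, c, -s], [0, s, c]],
            2: [[c, 0, s], [0, 1, 0], [-s, 0, c]],
            3: [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[j]

def matvec(A, v):
    """Multiply a 3x3 matrix by a column vector stored as a list."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def basis(j):
    """Standard basis vector e_j (j = 1, 2, 3) as a list."""
    return [1.0 if i == j - 1 else 0.0 for i in range(3)]

theta = 0.7  # arbitrary test angle in radians
for j in (1, 2, 3):
    # e_j should come back unchanged: an eigenvector with eigenvalue 1
    print(j, matvec(rot(j, theta), basis(j)))

# A quarter turn about the z-axis should carry e_1 onto e_2 (anticlockwise)
print(matvec(rot(3, math.pi / 2), basis(1)))
```

The last line also illustrates the sign convention: with these matrices, a positive angle about the z-axis moves the x-direction towards the y-direction.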
Now, using the previous result and the fact that rotations are linear operators, prove that [67]:

\[ R_j(\theta) v = \sum_{k \neq j} \left( v^k R_j(\theta) e_k \right) + v^j e_j, \tag{158} \]

where the summed index $k \neq j$ means you sum over all values (1, 2, 3) of $k$ not equal to $j$. Hence, rotations about a given axis preserve the component of any vector along that axis.

[67] This is not using the Einstein summation convention – so $v^j e_j$ is for a fixed value of $j$, not a sum over all possible values of the index $j$.

Problem 16 (The Proof is Trivial) If you rotate a vector about an axis through an angle of zero degrees, the vector should remain unchanged. Verify that all three rotation operators $R_j(\theta)$ become the 3 × 3 identity matrix (the matrix with 1’s down the main diagonal and zeros everywhere else) when you set the angle $\theta = 0$.

As it turns out, the set of rotation matrices forms a mathematical structure known as a ‘Lie Group’. As such they are used for lying/truth algorithms. Actually that’s a lie – they are a type of ‘continuous’ or rather ‘smooth’ group (as opposed to a discrete group) named after the mathematician Sophus Lie, who developed and pioneered them. Lie groups are of fundamental importance to modern physics and mathematics – in fact, they are the core element underlying major developments in particle physics [68], high energy physics and gauge theory. We define a Lie group as follows.

Definition 7 A Lie group $G$ is a differentiable manifold which is also a group whose group operations are smooth (infinitely differentiable). This means that $G$ equipped with the operation $\star$ satisfies the group properties:
• Closure/Binary Operation: If $A, B \in G$ then $A \star B \in G$.
• Associativity: For any $A, B, C \in G$, $A \star (B \star C) = (A \star B) \star C$.
• Identity Element: $\exists I \in G$ such that $I \star A = A \star I = A$, $\forall A \in G$.
• Inverses: For any $A \in G$, $\exists B \in G$ such that $A \star B = I$. If $\star$ is a multiplicative operation, we denote $B = A^{-1}$, the inverse of $A$. If $\star$ is additive (or commutative), we denote $B$ by $-A$.
[68] The symmetries of the Standard Model of particle physics are in fact described by a Lie group – this tells us the symmetries that nature obeys for the electromagnetic, weak and strong nuclear forces.
Here $\star$ is a binary operation [69] (e.g. matrix multiplication) which is smooth.

Exercise 19 (YOLO) Unaware of the on-going ‘Project Death Star’ of St. George’s College, University Hall decides to hold a party to show how awesome they are. After shouting YOLO, a drunken University Hall student jumps into a pit of horny honey badgers and dies a humiliating death. Despite making it into the prestigious Darwin Awards, this is tragic because that student lived a life without ever proving that the real numbers $\mathbb{R}$ form a group under addition – and that the non-zero real numbers $\mathbb{R} \setminus \{0\}$ form a group under multiplication. Using your wisdom and foresight to avoid a similar fate, prove that the real numbers form a group under the addition operation + with 0 being the additive identity element. Similarly, prove that the set of non-zero real numbers forms a group under the multiplication operation × with 1 being the multiplicative identity element. Together with commutativity and the distributive law, these statements imply that the real numbers form a special mathematical structure called a ‘field’.

Rotations form the Lie Group SO(3), which is the 3-dimensional ‘Special Orthogonal Group’. This group is characterized as the set of 3 × 3 matrices $\{A\}$ which have the following properties [70]:
• $\det(A) = 1$
• $A^T A = I_3$
for any rotation matrix $A$. Since the determinant of a linear map tells you how the map distorts volumes, the first condition (the ‘Special’ part) says that rotations preserve volumes – this is a consequence of the more general observation that rotations are isometries of Euclidean space, meaning that they preserve lengths of vectors and relative angles between vectors (rotating any pair of vectors simultaneously leaves the angle between them unchanged).
Furthermore, since the second condition (the ‘Orthogonal’ part) can be written as:

\[ A^T = A^{-1} \tag{159} \]

where $A^{-1}$ is the inverse of the rotation matrix $A$, the second condition says that rotations are orthogonal [71] transformations – meaning they preserve orthogonality of vectors (or that the column vectors in a rotation matrix are mutually orthogonal).

[69] A binary operation on a set $V$ is one that combines two elements $a, b$ of $V$ to give another element of $V$: $a \star b = c \in V$. Examples of binary operations include addition of numbers or vectors, multiplication of numbers and cross products of vectors.

[70] Recall that det means the matrix determinant of $A$ and $A^T$ denotes the matrix transpose of $A$.

[71] Recall that orthogonal is the mathematical term for ‘perpendicular’.
Hence, the second property comes from the fact that isometries preserve angles between objects. Note that the group operation for SO(3) is matrix multiplication – which is a smooth operation since it essentially amounts to the multiplication and addition of numbers.

Exercise 20 (Killing Time) Whilst waiting on the construction of the death star by the St. George’s College engineering, science and mathematics students (as well as legal approvals from Georgian law graduates), Thomas feels the urge to kill – kill time that is. As a member of the St. George’s College Orbital Death Star, help Thomas kill time by explicitly showing that the rotation matrices $R_j(\theta)$ satisfy the two properties which characterize the special orthogonal group, SO(3).

Hint: It helps to show that for any rotation matrix $R_j(\theta)$, one has $(R_j(\theta))^T = R_j(-\theta) = (R_j(\theta))^{-1}$, which can be argued geometrically and/or algebraically using the fact that cosine is an even function [72], $\cos(\theta) = \cos(-\theta)$, and that sine is odd: $\sin(-\theta) = -\sin(\theta)$.

Exercise 21 (Group Project: Project Death Star) In an attempt to understand rotations better for the programming of a weapons targeting system on the Georgian Death Star, the members of the SGC Mathematical Sciences Tutorial sit down and try to prove that the set of rotation matrices, SO(3), forms a group. Since this includes you, complete this proof. This means verifying that SO(3) satisfies the four properties required to be a group, with matrix multiplication being the group operation.

Hint: Recall how the 3 × 3 identity matrix $I_3$ acts on a 3-dimensional vector $v$ – that is, $I_3 v = v$. Furthermore, to show that every element of SO(3) has an inverse, consider $R_u(\theta)$ – an arbitrary rotation operator which rotates objects anticlockwise through an angle $\theta$ about an axis defined by the vector $u$ – then consider how one would undo rotations performed by $R_u(\theta)$.
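Before attempting the algebra in Exercise 20, it can be reassuring to see the two SO(3) conditions hold numerically for a sample angle. This is a minimal sketch, assuming the matrices (154)–(156); the function names are illustrative and a numerical check at one angle is of course not a proof.

```python
import math

def rot(j, theta):
    """Matrix of R_j(theta) from (154)-(156)."""
    c, s = math.cos(theta), math.sin(theta)
    return {1: [[1, 0, 0], [0, c, -s], [0, s, c]],
            2: [[c, 0, s], [0, 1, 0], [-s, 0, c]],
            3: [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[j]

def transpose(A):
    return [[A[k][i] for k in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def max_diff(A, B):
    """Largest absolute entry-wise difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
theta = 1.1  # arbitrary test angle in radians
for j in (1, 2, 3):
    R = rot(j, theta)
    print(j,
          det3(R),                                     # 'Special': should be 1
          max_diff(matmul(transpose(R), R), IDENTITY), # 'Orthogonal': should be ~0
          max_diff(transpose(R), rot(j, -theta)))      # R^T = R(-theta): should be 0
```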
[72] Recall that even functions $f(x)$ are symmetric about $x = 0$ and odd functions are anti-symmetric.

Because the Lie Group SO(3) is generated by the rotations about the coordinate axes, we can write any general rotation as a product of finitely many such rotation matrices. For us, this means that we can write any rotation as a sequence of rotations about the x, y and z axes:

\[ R(\alpha, \gamma, \beta) = R_3(\gamma) \, R_2(\beta) \, R_1(\alpha). \tag{160} \]

Note that since matrix multiplication is not commutative, the order in which we multiply (hence the order in which we rotate) matters. In particular, when the rotation
$R(\alpha, \gamma, \beta)$ given by (160) acts on a vector $v$, it rotates it first by an angle $\alpha$ about the x-axis, then by an angle $\beta$ about the y-axis and finally by an angle $\gamma$ about the z-axis. In general, we could write down a matrix $R_u(\theta)$ which rotates objects anticlockwise about some axis defined by the vector $u$ through an angle $\theta$ – indeed, such a matrix is given by the (easy-to-prove) ‘Rodrigues’ rotation formula’, which we will investigate later.

Exercise 22 (Spring Cleaning) After finally getting building and environmental approvals, as well as successfully subduing Greens Party protesters, St. George’s College sends Project Death Star into its testing phase. Having a particular distaste for Justin Bieber, Thomas decides that he wants to aim and fire the gamma-ray LASER on the death star at Justin Bieber’s hometown – during Christmas when Justin Bieber is home with his family. For shielding reasons, in its inactive state, the Death Star’s cannon is oriented along the x-axis in the following figure. This is because the cannon portion of the death star has weaker armour.

[Figure 3: Aiming an Orbital Death Star with sequential rotations.]

In order to fire the death star at Justin Bieber, Thomas must rotate the death star to point at Ontario, Canada. After the death star is oriented in this way, Emma Krantz’s targeting algorithm will take over and refine the aim to Justin Bieber’s house. The coordinate system we use is centred with the death star at its origin. In order to shoot JB, the death star must be oriented in the direction of the purple ray in the above diagram. This can be achieved by feeding the correct rotation matrix into the death star targeting systems. There are multiple ways to construct such a matrix –
however, for our purposes, it is easiest to construct it by sequential rotations about the three different coordinate axes.

Q: Write down the rotation matrices corresponding to the rotations indicated by each of the angles – $\alpha$, $\beta$, $\gamma$ – shown in the diagram. Note that these are not necessarily in the order x − y − z! Once you’re confident that you have the correct rotation matrices, multiply these matrices in the correct order to give a rotation matrix which will rotate the death star cannon from the x-axis to Justin Bieber’s home province.

Hint: It helps to keep track of which coordinate stays constant under a certain rotation – recalling the rotation eigenvectors, it then follows that you are performing a rotation about that coordinate axis. For example, the $\gamma$ rotation corresponds to an anticlockwise rotation about the y coordinate axis through an angle $\gamma$.

After pointing the death star at Ontario, the Krantz algorithm takes over and performs a super-accurate shot – killing Justin Bieber with minimal collateral damage. Fearing that the warlike nation of Canada will retaliate with direct line-of-sight missile attacks, Thomas decides it is best to return the death star to its original orientation along the x-axis – the side that faces Canada will thus have more armour as well as an anti-missile system featuring an array of LAWS Naval lasers stolen from the U.S. Military.

Q: Write down a sequence of rotations to rotate the death star to its original orientation. Now write down a single rotation matrix to perform this total rotation.

Hint: Recall the fact that rotation operations form a Lie group – in particular, this means that every rotation has an inverse. Recalling the properties of the rotation group SO(3), in particular the orthogonality property $A^T = A^{-1}$, there is a super-easy way to invert the death star rotation and return it to its original orientation.
Alternatively, recall that you showed $R(-\theta) = (R(\theta))^{-1}$ – either algebraically or geometrically. Use this to find the rotation matrix which returns the death star to its original orientation.

11.4 so(3): Quaternions, Lie Algebras and Cross Products

Due to the extent of this tutorial, we will defer our investigation of Lie algebras and the fate of the St. George’s College Orbital Death Star to the next tutorial. This will involve a space battle with ships made by our rival colleges, so make sure you keep your Georgian spirit alight by attending the next tutorial! Anyone found AWOL will be marked as a traitor and executed by the death star accordingly.
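Looking back at the sequential-rotation formula (160) and the inversion hints of Exercise 22, the following sketch shows the general mechanics in code. It is not a solution to the exercise: the angles are hypothetical values chosen for illustration (they are not read off Figure 3), the helper names are assumptions, and the rotation order is simply the x-then-y-then-z order of (160).

```python
import math

def rot(j, theta):
    """Matrix of R_j(theta) from (154)-(156)."""
    c, s = math.cos(theta), math.sin(theta)
    return {1: [[1, 0, 0], [0, c, -s], [0, s, c]],
            2: [[c, 0, s], [0, 1, 0], [-s, 0, c]],
            3: [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[j]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[k][i] for k in range(3)] for i in range(3)]

def compose(alpha, gamma, beta):
    """R(alpha, gamma, beta) = R3(gamma) R2(beta) R1(alpha), as in (160)."""
    return matmul(rot(3, gamma), matmul(rot(2, beta), rot(1, alpha)))

# Hypothetical angles in radians, for illustration only
alpha, beta, gamma = 0.3, -0.5, 1.2
R = compose(alpha, gamma, beta)

cannon = [1.0, 0.0, 0.0]                 # idle cannon along the x-axis
aimed = matvec(R, cannon)                # direction after the three rotations
restored = matvec(transpose(R), aimed)   # undo everything using R^T = R^{-1}

# The same inverse, built by undoing each rotation in the reverse order
R_back = matmul(rot(1, -alpha), matmul(rot(2, -beta), rot(3, -gamma)))

# Order matters: R1 R3 and R3 R1 are different matrices
order_gap = max(abs(x - y)
                for ra, rb in zip(matmul(rot(1, alpha), rot(3, gamma)),
                                  matmul(rot(3, gamma), rot(1, alpha)))
                for x, y in zip(ra, rb))
print(aimed, restored, order_gap)
```

The transpose route needs no trigonometric evaluations beyond those already used to build `R`, which is one reason the orthogonality property is convenient in practice.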
12 Tutorial 9+10: The Fault in Our Stars – Project Death Star (II)

In the last tutorial, recall that we investigated the following concepts:
• Linear Maps and Matrices: Every linear transformation acting on a finite-dimensional vector space can be represented as a matrix in some chosen basis (usually the standard basis). To see this explicitly, we saw how a linear map, $f$, acted on a set of basis vectors $\{e_i\}$ to give the components of some matrix, $A^i{}_j$, representing the linear map $f$ in that basis.
• Rotations: We argued geometrically that rotations are linear transformations – hence they have a matrix representation. We then showed that these matrices formed a special structure called a ‘Lie group’, which we denoted by SO(3) – the 3-dimensional Special Orthogonal Group. Having characterized rotations as a Lie group, we then used several properties of this group structure to construct a rotation matrix which rotated the St. George’s College Death Star from an idle position to one pointing at Justin Bieber’s house – subsequently firing and killing Justin Bieber.

In this tutorial, we will extend what we learned about linear maps and SO(3) – the Lie Group of Rotations – to the idea of a ‘Lie Algebra’. In particular, we will see how vectors in three dimensions, equipped with the cross-product operation, form a Lie Algebra. We will then introduce the idea of the ‘matrix exponential’ and see how to derive the rotation matrices using their corresponding Lie algebra. In the next and final Project Death Star tutorial, we shall unite these ideas by introducing quaternions and Clifford algebras, then seeing how they can be used to represent rotations in the most computationally efficient and stable way. For now however, we will continue the adventure of ‘Project Death Star’ whilst employing the power of mathematics along the way to vanquish our rival colleges.
12.1 Infinitesimal Rotations and Lie Algebras

Recall that in the standard basis $\{e_1, e_2, e_3\}$ for 3-dimensional Euclidean space $\mathbb{R}^3$, anticlockwise rotations about the x, y and z axes are represented by the matrices $R_1$, $R_2$ and $R_3$, respectively. These matrices (as a function of the rotation angle) were stated to be:

\[ R_1(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{161} \]

\[ R_2(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \tag{162} \]

\[ R_3(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{163} \]

We then argued, using the eigenvectors of rotation, why these matrices correctly represented rotations about their respective axes. We can however derive these matrices in several ways. One such way is via Lie algebras. To motivate this connection, one may ask, as Sophus Lie did, how the rotation group behaves on the infinitesimal scale. That is, how do we represent rotations through a very small (infinitely small) but non-zero angle? For simplicity, we shall first consider rotations through an infinitesimal angle $\delta\theta$ about the x, y and z axes. There are several equivalent ways of constructing such rotations:
• Taylor expanding the rotation matrices about zero – i.e. computing $\hat{R}(0 + \delta\theta)$. This means performing a Taylor expansion of each of the functions (sines and cosines) in the rotation matrices about zero:

\[ \sin(0 + \delta\theta) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (\delta\theta)^{2n+1} \approx \delta\theta + O((\delta\theta)^3) \tag{164} \]

\[ \cos(0 + \delta\theta) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} (\delta\theta)^{2n} \approx 1 + O((\delta\theta)^2). \tag{165} \]

Note that the ‘big O’ notation $+O((\delta\theta)^k)$ means ‘plus terms of order $k$ or greater’ in $\delta\theta$ – i.e. terms which involve a factor of $(\delta\theta)^k$, $(\delta\theta)^{k+1}$,
$(\delta\theta)^{k+2}$, etcetera. Hence if we set the angle $\delta\theta \ll 1$, all higher order terms $(\delta\theta)^2, (\delta\theta)^3, \ldots$ have diminishing contributions to our representations of the sine and cosine functions. In particular, if the angle $\delta\theta$ is ‘infinitesimal’, any term which has $\delta\theta$ squared or any higher power has negligible contribution [73] – thus we discard these terms and keep only terms which are linear in $\delta\theta$ – i.e. $(\delta\theta)^1$ and constant terms.
• We compute the linear approximation to the rotation matrices. This is precisely the same as doing the first order Taylor expansions of the sin and cos functions about $\theta = 0$. Another way to view this is to recall the way we used the total differential to compute the ‘absolute error’ in our earlier tutorials – what we defined to be the error (omitting the absolute value signs) $\Delta f$ in some quantity $f$ in fact corresponds to a first order Taylor expansion or linearisation (‘the tangent plane approximation’) to our function $f$ about some initial value:

\[ f(\theta + \Delta\theta) \approx f(\theta) + \Delta f(\theta) = f(\theta) + \left. \frac{df}{d\theta} \right|_{\theta} \Delta\theta. \tag{166} \]

In this case, we are replacing a finite shift in angle with an infinitesimal one: $\Delta\theta \to \delta\theta$.

Regardless of which way we view it, the result is the same.

[73] In a branch of mathematics known as ‘smooth infinitesimal analysis’, one rigorously (axiomatically) works with infinitesimals satisfying $(\delta\theta)^k = 0$ for $k \geq 2$, and thus the approximate equals symbol becomes a formal equality.

Problem 17 (Warm-up) Having run out of elaborate excuses to skip the Sunday SGC Mathematical Sciences tutorials, Angela decides to enrol in the Australian SAS [74]. Having passed the fitness tests (which include a large lung capacity), she meets an ironic twist. As it turns out, Brigadier Daniel McDaniel [75] is actually Daniel Ogburn in disguise (as hinted by his suspicious last name).

[74] Special Air Service regiment of the Australian Army – an elite commando unit.

[75] Current commander of the Australian Defence Force’s Special Operations Command.

To pass Angela, he therefore decides that a true test of her aptitude for solving new problems quickly (required in combat) is to get her to derive the following infinitesimal rotation
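matrices:

\[ R_1(\delta\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -\delta\theta \\ 0 & \delta\theta & 1 \end{pmatrix} \tag{167} \]

\[ R_2(\delta\theta) = \begin{pmatrix} 1 & 0 & \delta\theta \\ 0 & 1 & 0 \\ -\delta\theta & 0 & 1 \end{pmatrix} \tag{168} \]

\[ R_3(\delta\theta) = \begin{pmatrix} 1 & -\delta\theta & 0 \\ \delta\theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{169} \]

See if you can complete Angela’s problem and get into the Australian SAS under Daniel’s criterion.

Hint: Start with the original finite rotation matrices, then use the Taylor expansions / linearisation discussed previously to arrive at the infinitesimal rotations.

Now, notice that all the infinitesimal rotation matrices have 1’s down the main diagonal – hence they can be written as the 3 × 3 identity matrix $I_3 = \mathrm{diag}(1, 1, 1)$ plus some matrix involving the infinitesimal angle $\delta\theta$ (which we can express as $\delta\theta$ multiplying some matrix):

\[ \begin{aligned} R_1(\delta\theta) &= I_3 + \delta\theta \, E_1 \\ R_2(\delta\theta) &= I_3 + \delta\theta \, E_2 \\ R_3(\delta\theta) &= I_3 + \delta\theta \, E_3, \end{aligned} \tag{170} \]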
where the matrices $E_j$ are defined by [76]

\[ E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \tag{171} \]

\[ E_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} \tag{172} \]

\[ E_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{173} \]

\[ I_3 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{174} \]

[76] I included an explicit expression for the identity matrix $I_3$ for those of you who don’t know what this is.

Problem 18 (Orbital Warm-up) Having shut down the St. George’s College Orbital Death Star to update the Death Star software, Thomas decides that he should verify the above decomposition of the infinitesimal rotation matrices given by the set of equations (170). Using simple matrix algebra, verify that these expressions are indeed true – hence giving an efficient approximation (which doesn’t involve sines or cosines) to small rotations (fine aiming) of the Death Star gamma-ray laser.

If we have a vector $v$, then apply an infinitesimal rotation [77] $\hat{R}(\delta\theta)$ to it, we get the rotated vector $\hat{R}(\delta\theta)v$ – which can be computed in some chosen basis using matrix multiplication and representing $v$ as a column vector (recall the last tutorial). Hence the amount $\delta v$ that the vector $v$ has shifted under the infinitesimal rotation $\delta\theta$ is given by the difference:

\[ \delta v = \hat{R}(\delta\theta)v - v = (\hat{R}(\delta\theta) - I_3)v, \tag{175} \]

since $I_3 v = v$.

[77] Henceforth we shall use $\hat{R}(\theta)$ to denote some arbitrary rotation about some axis, through an angle $\theta$ anticlockwise.

Therefore, the infinitesimal rate of change of the vector $v$ under
the rotation $\hat{R}$ with respect to the rotation angle $\delta\theta$ is given by:

\[ \frac{\delta v}{\delta\theta} = \frac{(\hat{R}(\delta\theta) - I_3)}{\delta\theta} v. \tag{176} \]

This looks suspiciously like a derivative – it is in fact a formal derivative if we take the limit as the rotation angle $\delta\theta \to 0$. To see this, we consider a vector $u(\theta)$ which is a function of the rotation angle $\theta$ performed by the rotation $\hat{R}(\theta)$. In particular, we let this vector function coincide with the constant vector $v$ when it is not rotated – i.e. $u(\theta = 0) = v$. For a general rotation angle $\theta$, we therefore have $u(\theta) = \hat{R}(\theta)v$. The infinitesimal rate of change of this vector function with respect to $\theta$ is therefore expressed as the derivative:

\[ \frac{d}{d\theta} u(\theta) = \left[ \frac{d}{d\theta} (\hat{R}(\theta) - I_3) \right] v = \left[ \frac{d}{d\theta} \hat{R}(\theta) \right] v. \tag{177} \]

We take the derivative of a matrix which has functions as entries by taking the derivative of each function – since the identity matrix $I_3$ is constant, its derivative is just the zero matrix: $\frac{d}{d\theta} I_3 = 0$. Since we were considering infinitesimal rotations originally (‘small angles close to zero’), we evaluate this derivative of the vector function $u(\theta)$ at the origin $\theta = 0$ (which means taking the derivative then setting $\theta = 0$):

\[ \left. \frac{d}{d\theta} u(\theta) \right|_{\theta=0} = \left. \left[ \frac{d}{d\theta} \hat{R}(\theta) \right] \right|_{\theta=0} v. \tag{178} \]

Notice that as $\theta$ varies, the vector $u(\theta)$ traces out a curve [78] as it rotates – since the vector $\frac{d}{d\theta} u(\theta)|_{\theta=\alpha}$ represents the rate of change of $u$ at $\theta = \alpha$, it is therefore tangent to the curve at $\theta = \alpha$. Equivalently, we can consider the derivative of the rotation matrix $[\frac{d}{d\theta} \hat{R}(\theta)]|_{\theta=\alpha}$ to be ‘tangent to the rotation operator’ $\hat{R}(\theta)$ at $\theta = \alpha$ in the abstract sense. When $\theta = 0$, any rotation operator is simply represented by the identity matrix $I_3$ – which is the identity element of the Lie Group SO(3) of rotations [79]. Hence infinitesimal rotations correspond to rotation matrices which are ‘close’ [80] to the identity matrix $I_3$.

[78] Such curves are called ‘integral curves’ and the vector field $\frac{d}{d\theta} u(\theta)$ corresponds to the ‘direction fields’ you may know from the theory of differential equations.

[79] Recall the last tutorial.

[80] The notion of matrices being ‘close’ can be formalized by defining a metric or ‘norm’ (notion of distance) for matrices – for example, the Frobenius or Hilbert-Schmidt operator norms.

As it turns out, if we restrict our attention to the behaviour of the rotation group SO(3) about the origin – i.e. infinitesimal rotations and rotation matrices ‘close’ to the identity matrix $I_3$ – then in particular, the matrices:

\[ \left. \frac{d}{d\theta} \hat{R}(\theta) \right|_{\theta=0} \tag{179} \]
which are ‘tangent to the identity matrix $I_3$’, form a special type of mathematical structure called a ‘Lie algebra’. Lie algebras are very special, because in some sense they are a ‘linearisation’ or ‘first order approximation’ to a Lie group close to the identity element of that group. More formally, if we recall that a Lie Group is a smooth manifold (a generalization of a ‘smooth surface’ to arbitrary dimensions), then its Lie algebra is defined to be the tangent space at the identity – which you can think of as the ‘tangent plane approximation’ to the Lie group at the identity element [81]. Many properties of the Lie Group are encoded in its Lie algebra and because the Lie group is a non-linear object in general, it is much easier to investigate the Lie algebra (which is a linear structure – a vector space) to deduce properties of the Lie group.

Problem 19 (Devil in the Detail) In a bout of procrastination, Rowan Seton decides to attend a SGC Mathematical Sciences Tutorial on the St. George’s College Orbital Death Star. To stop Rowan from putting cats on Emma’s computer, Daniel orders Rowan to complete the following calculations – fitting because Rowan likes to go on tangents. In particular, show that for the rotation matrices $R_j(\theta)$ (where $j = 1, 2, 3$) the corresponding tangent matrices at the identity element are given by:

\[ \begin{aligned} \left. \frac{d}{d\theta} R_1(\theta) \right|_{\theta=0} &= E_1 \\ \left. \frac{d}{d\theta} R_2(\theta) \right|_{\theta=0} &= E_2 \\ \left. \frac{d}{d\theta} R_3(\theta) \right|_{\theta=0} &= E_3 \end{aligned} \tag{180} \]

which motivates the decomposition we performed earlier for infinitesimal rotations.

The purpose of the last exercise is to show that the matrices $\{E_j\}$ correspond to the linearisation of the rotation group SO(3) about the identity matrix – that is, that they are tangent matrices living in the Lie algebra [82] so(3) of the rotation group. Having some intuition now of how to construct a concrete Lie algebra (and a Death Star), we now proceed with a formal definition of a general Lie algebra.
[81] Recall that for a curve, the tangent line to that curve at any point $x$ is a linear approximation to the curve about that point. The tangent plane approximation generalizes this notion to differentiable surfaces/geometric objects of arbitrary dimension.

[82] For a Lie group $G$, one usually denotes its Lie algebra by $\mathfrak{g}$ – which is lower-case G in the ‘fraktur’ font.
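The link between the tangent matrices in (180) and the decomposition (170) can also be seen numerically. The sketch below, which assumes the matrices (161)–(163) and (171)–(173) as given, estimates the derivative at $\theta = 0$ with a central finite difference (a numerical stand-in for the exact derivative, chosen here purely for illustration) and measures how well $I_3 + \delta\theta E_j$ approximates $R_j(\delta\theta)$. The helper names are hypothetical.

```python
import math

def rot(j, theta):
    """Matrix of R_j(theta) from (161)-(163)."""
    c, s = math.cos(theta), math.sin(theta)
    return {1: [[1, 0, 0], [0, c, -s], [0, s, c]],
            2: [[c, 0, s], [0, 1, 0], [-s, 0, c]],
            3: [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[j]

# Tangent matrices E_1, E_2, E_3 from (171)-(173)
E = {1: [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
     2: [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
     3: [[0, -1, 0], [1, 0, 0], [0, 0, 0]]}

def tangent_at_identity(j, h=1e-6):
    """Central-difference estimate of dR_j/dtheta at theta = 0."""
    Rp, Rm = rot(j, h), rot(j, -h)
    return [[(Rp[r][c] - Rm[r][c]) / (2 * h) for c in range(3)] for r in range(3)]

def linearised(j, dtheta):
    """First-order approximation I_3 + dtheta * E_j from (170)."""
    return [[(1.0 if r == c else 0.0) + dtheta * E[j][r][c] for c in range(3)]
            for r in range(3)]

def max_diff(A, B):
    return max(abs(A[r][c] - B[r][c]) for r in range(3) for c in range(3))

dtheta = 1e-3
for j in (1, 2, 3):
    print(j,
          max_diff(tangent_at_identity(j), E[j]),           # should be ~0
          max_diff(rot(j, dtheta), linearised(j, dtheta)))  # should be ~dtheta^2 / 2
```

The second number shrinks quadratically with `dtheta`, which is the numerical face of discarding the $O((\delta\theta)^2)$ terms.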
Definition 8 A Lie algebra $\mathfrak{g}$ is a vector space [83] (over a field $F$) equipped with a binary operation [84]

\[ [\,\cdot\,, \cdot\,] : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}, \qquad (u, v) \mapsto [u, v] \tag{181} \]

called the ‘Lie bracket’, which satisfies the following properties:

1. Left Linearity:
\[ [\alpha u + \beta v, w] = \alpha [u, w] + \beta [v, w] \quad \forall u, v, w \in \mathfrak{g}, \ \forall \alpha, \beta \in F \tag{182} \]

2. Anti-symmetry:
\[ [u, v] = -[v, u] \quad \forall u, v \in \mathfrak{g} \tag{183} \]

3. Jacobi Identity:
\[ [u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0 \quad \forall u, v, w \in \mathfrak{g}. \tag{184} \]

[83] Recall that a vector space over a field $F$ is a set of vectors which obey the usual rules of vector addition and scalar multiplication – for our purposes we usually take the field to be the real or complex numbers, $\mathbb{R}$ and $\mathbb{C}$.

[84] Recall we defined binary operations in Tutorial 8.

Thus, we can think of a Lie algebra as a vector space whose vector multiplication operation is the Lie bracket. However, earlier we said that the tangent matrices $\{E_j\}$ were elements of the Lie algebra – implying that they are vectors. This is not a mistake – when we refer to a Lie algebra $\mathfrak{g}$ as a vector space, it means a vector space in an abstract sense (not column vectors!). A quick review of the vector space axioms [85] (defining properties) should reveal that the set of $n \times n$ real or complex-valued matrices forms a vector space – the basis for the vector space has $n^2$ basis vectors; one such basis consists of the matrices $\mu_{ij}$ whose entries are all zero except for the entry in the i-th row and j-th column (which we can set to be 1). In this manner, what we referred to as ‘tangent matrices’ are indeed tangent vectors in this abstract sense.

[85] Ask your tutor.

Problem 20 (The Girl Who Cried Wolf) In one of many universes in the multiverse, a St. George’s fresher by the name of Sophia Lie continually makes excuses not to attend the SGC Mathematical Sciences tutorials. This is because she has questioniaphobia – a fear of asking questions. One day, an optically- and radar-cloaked space shuttle docks with the St. George’s College Dragon (the newly
elected name for the Death Star). A team of Saint Catherine’s raiders board the shuttle and capture Sophia while she is in her room – destroying the Dragon’s fine-targeting systems on the way. Sophia sends an SMS to her fellow Georgians in the MS tutorials, but they refuse to believe her. To convince them that she is serious, she decides to complete Tutorial 8 and 9. To help Sophia, prove that the Left-Linearity and Anti-Symmetry properties of a Lie algebra together imply Bilinearity:

\[ [\alpha u + \beta v, w] = \alpha [u, w] + \beta [v, w], \qquad [w, \alpha u + \beta v] = \alpha [w, u] + \beta [w, v] \quad \forall u, v, w \in \mathfrak{g}. \tag{185} \]

Furthermore, show that if we replace the Left-Linear property with the Bilinear property and replace the anti-symmetry property with the alternating property:

\[ [u, u] = 0 \quad \forall u \in \mathfrak{g}, \tag{186} \]

then these together imply the anti-symmetry property [86].

[86] Note that this latter redefinition allows one to extend the notion of a Lie algebra to vector spaces over fields of characteristic 2.

In our case, the Lie algebra so(3) of the rotational Lie group SO(3) is a vector space whose (abstract) vectors are 3 × 3 matrices satisfying certain conditions. To find these conditions, we recall that the Special Orthogonal Group SO(3) was defined to be the set of 3 × 3 matrices which satisfy the criteria:
• Volume and Orientation Preserving: $\det(R) = 1$
• Orthogonality: $R^T R = I_3 \iff R^T = R^{-1}$.

If we now look at what happens to these conditions when $R(\delta\theta) = I_3 + \delta\theta E$ is a matrix representing an infinitesimal rotation $\delta\theta$ about some axis and $E$ is some tangent matrix at the identity, then the orthogonality condition gives:

\[ (I_3 + \delta\theta E)^T = (I_3 + \delta\theta E)^{-1} = (I_3 - \delta\theta E) \implies E^T = -E \quad \text{(cancelling terms on both sides)} \tag{187} \]

which we can write as [87]

\[ E^T + E = 0 \quad \forall E \in so(3). \tag{188} \]

[87] Note we used the fact that the inverse rotation $(R(\delta\theta))^{-1} = R(-\delta\theta)$ is given by rotating in the reverse direction.
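The first-order argument leading to (187) and (188) can be illustrated numerically: for an anti-symmetric $E$, the matrix $I_3 + \delta\theta E$ fails to be orthogonal only at order $(\delta\theta)^2$, whereas for a matrix with $E^T + E \neq 0$ the failure already appears at order $\delta\theta$. The sketch below is an illustration under these assumptions; the function name `orthogonality_defect` and the symmetric comparison matrix are made up for the example.

```python
def transpose(A):
    return [[A[k][i] for k in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def orthogonality_defect(E, dtheta):
    """Largest entry (in absolute value) of (I + dtheta E)^T (I + dtheta E) - I."""
    M = [[(1.0 if r == c else 0.0) + dtheta * E[r][c] for c in range(3)]
         for r in range(3)]
    P = matmul(transpose(M), M)
    return max(abs(P[r][c] - (1.0 if r == c else 0.0))
               for r in range(3) for c in range(3))

E_antisym = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]   # E_3 from (173): E^T = -E
E_sym = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]        # hypothetical counter-example: E^T = +E

for dtheta in (1e-2, 1e-3):
    # First column scales like dtheta^2, second like dtheta
    print(dtheta,
          orthogonality_defect(E_antisym, dtheta),
          orthogonality_defect(E_sym, dtheta))
```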
This condition means that all tangent matrices $E$ – i.e. all matrices (abstract vectors) in the rotational Lie algebra so(3) – are anti-symmetric (entries mirrored across the main diagonal have opposite signs, so the diagonal entries vanish). As a consequence all matrices in the Lie algebra are traceless – which is the infinitesimal form of the $\det(A) = 1$ condition:

\[ \mathrm{tr}[E] = 0 \quad \forall E \in so(3). \tag{189} \]

Exercise 23 (Trial By Combat) In an on-going rivalry over who is taller, Leanora and Daniel decide to duel on the bridge of the St. George Dragon. After 5 seconds of attempted kicks and punches, Lea clumsily slips over and gives herself a concussion – requiring Aston to rush her to the nearest hospital on the International Space Station. As the winner of the duel, Daniel officially renames ‘Lie Algebras’ to ‘Lea Algebras’ and ‘Lie groups’ to ‘Ogburn groups’, since Lie algebras represent the infinitesimal (vanishingly small) approximation to a Lie group. As part of this process of re-writing all textbooks on Lie group theory, prove that the anti-symmetry condition:

\[ A^T + A = 0 \quad \forall A \in so(3) \tag{190} \]

implies the traceless condition: $\mathrm{tr}[A] = 0$.

Hint: Recall that transposing a matrix doesn’t change its trace: $\mathrm{tr}[E] = \mathrm{tr}[E^T]$.

Now show that the traceless condition $\mathrm{tr}[E] = 0$ implies the rotation matrix $R(\theta) = e^{\theta E}$ satisfies the volume/orientation preserving condition: $\det[R(\theta)] = 1$.

Hint: Note that the exponential here is the ‘matrix exponential’ of the matrix $E$ (multiplied by the scalar $\theta$) – which we will investigate later. For now it suffices to use the following general exponential relation between the trace and determinant of any square matrix $A$:

\[ \det[e^A] = e^{\mathrm{tr}[A]}. \tag{191} \]

Now that we have covered a fair amount of ground, it is time we move towards a climactic result in our adventure. To do this, we define the Lie bracket on a matrix Lie algebra to be given by the matrix commutator:

\[ [A, B] := AB - BA \quad \forall \ n \times n \text{ matrices } A, B. \tag{192} \]

Recall that matrix multiplication is not commutative, so in general $AB \neq BA$ – the commutator $[A, B]$ is thus a measure of ‘how much’ the matrices $A$ and $B$ fail to commute.
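The trace-determinant relation (191) and the claim that $e^{\theta E}$ is a rotation matrix can be checked numerically ahead of the formal treatment. The sketch below assumes the standard power-series definition of the matrix exponential, $e^A = \sum_n A^n / n!$, truncated after a fixed number of terms; that definition, the truncation length and the function names are assumptions of this example, since the tutorial defers the matrix exponential to later.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm(A, terms=30):
    """Matrix exponential via the truncated power series sum_{n < terms} A^n / n!."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]            # holds A^n / n!, starting at n = 0
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def trace(A):
    return sum(A[i][i] for i in range(3))

# exp(theta * E_3) should reproduce R_3(theta) from (163), with determinant 1
E3 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
theta = 0.7
R = expm([[theta * x for x in row] for row in E3])
print(R[0][0] - math.cos(theta), R[1][0] - math.sin(theta), det3(R))

# Relation (191) for an arbitrary (not anti-symmetric) test matrix
A = [[0.1, 0.2, 0.0], [-0.3, 0.05, 0.1], [0.0, 0.4, -0.2]]
print(det3(expm(A)), math.exp(trace(A)))
```

A truncated series is adequate here because the test matrices have small entries; it is not meant as a robust general-purpose implementation.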
After completing the following exercise we will see the link between the Lie algebra of rotations $\mathfrak{so}(3)$, vector cross-products and Lie groups – which is perhaps geometrically hinted at by the right-hand rule and the orthogonality of cross products.

Exercise 24 (The Twelve Labours of Joshua) In an unfortunate turn of events, Joshua Bailey is blamed for the destruction of the computer systems controlling the automated fine-targeting of the anti-missile/anti-shuttle Laser Weapon Systems (LAWs)[88]. Not realizing this sabotage was led by the Saint Catherine’s student – Bronwen Herholdt[89], posing as a competitor in the Inter-college Piano Competition – the warden of St. George’s College sentences Joshua to twelve labours in the land of Lie groups and Lie algebras. Recall that the matrices $\{E_j\}$ were shown to be tangent to the rotation matrices $\{R_j(\theta)\}$ at the identity $I_3$ of the Lie group of rotations. As a friend of Joshua, to show that these matrices form a Lie algebra – the rotation algebra $\mathfrak{so}(3)$ – you should help Joshua complete the following tasks:

[88] The U.S. Navy anti-missile/anti-UAV/anti-torpedo system: https://www.youtube.com/watch?v=gMfYUyrKRng.
[89] Our enemy, but my friend.

1. The $n$-th power $A^n$ of some square matrix $A$ is given by multiplying $A$ by itself $n$ times. For the three tangent matrices $E_j$ ($j = 1, 2, 3$), compute: $(E_j)^1, (E_j)^2, (E_j)^3, (E_j)^4$. In particular show that:

$$\begin{aligned}
(E_j)^2 &= \text{matrix which becomes } -I_2 \text{ after deleting the } j\text{'th row and column},\\
(E_j)^3 &= -E_j,\\
(E_j)^4 &= -(E_j)^2.
\end{aligned} \tag{193}$$

For example,

$$(E_1)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{194}$$

2. Using the previous results, show that for $n \geq 1$:

$$\begin{aligned}
(E_j)^{2n} &= \text{matrix which becomes } (-1)^n I_2 \text{ after deleting the } j\text{'th row and column},\\
(E_j)^{2n+1} &= (-1)^n E_j.
\end{aligned} \tag{196}$$
3. By writing down a general $3 \times 3$ anti-symmetric matrix ($A^T = -A$), you should see that it is parametrised by three unknowns (real numbers): $A = A(\alpha, \beta, \gamma)$. In particular, show that any anti-symmetric matrix $A$ can be written as a linear combination of the $\{E_j\}$ matrices:

$$A(\alpha, \beta, \gamma) = \alpha E_1 + \beta E_2 + \gamma E_3, \quad \alpha, \beta, \gamma \in \mathbb{R}. \tag{197}$$

This says that the set $\mathfrak{so}(3)$ of $3 \times 3$ anti-symmetric matrices is a 3-dimensional (abstract) vector space with $\{E_1, E_2, E_3\}$ acting as a set of (abstract) basis vectors – hence why we denoted them using ‘E’ initially.

4. Using the matrix commutator, $[A, B] = AB - BA$, as the Lie bracket, prove that the abstract vector space $\mathfrak{so}(3)$ is indeed a 3-dimensional Lie algebra – the special orthogonal algebra. To do this, simply verify that the matrix commutator $[\,\cdot\,, \cdot\,]$ is a binary operation on $\mathfrak{so}(3)$ and that $\mathfrak{so}(3)$ satisfies the three properties required of a Lie algebra.

Hint: First show that the matrix commutator $[\,\cdot\,, \cdot\,]$ is anti-symmetric and left-linear in general, then show that it obeys the Jacobi identity in general. It then suffices[90] to show that $[\,\cdot\,, \cdot\,]$ is a binary operation – i.e. that the commutator $[A, B]$ of any $3 \times 3$ anti-symmetric matrices $A, B$ is also anti-symmetric – by showing that (using Einstein summation notation):

$$[E_i, E_j] = \epsilon_{ijk} E_k, \tag{198}$$

where $\epsilon_{ijk}$ is the Levi-Civita symbol defined in tutorial 7.

[90] If the commutator of any basis vectors produces an anti-symmetric matrix, bilinearity then implies that the commutator is a binary operation.

5. Those of you familiar with vector cross-products will notice the similarity between the Lie algebra relation $[E_i, E_j] = \epsilon_{ijk} E_k$ and the cross-product relation for the standard basis vectors $\{e_j\}$ in 3 dimensions:

$$e_i \times e_j = \epsilon_{ijk} e_k. \tag{199}$$

This is because 3-dimensional Euclidean space $\mathbb{R}^3$ equipped with the vector cross-product is indeed a Lie algebra! In fact, it is identically the same Lie algebra as $\mathfrak{so}(3)$, simply presented in another way – we therefore say these Lie algebras are ‘isomorphic’. By defining the Lie bracket on $\mathbb{R}^3$ to be the cross-product:

$$[v, u] := v \times u, \quad v, u \in \mathbb{R}^3, \tag{200}$$

show that $(\mathbb{R}^3, \times)$ is indeed a Lie algebra.

Hint: You can essentially copy the proof you used for $\mathfrak{so}(3)$, or find the (obvious) isomorphism (one-to-one correspondence) between $\mathbb{R}^3$ and $\mathfrak{so}(3)$ – which has been hinted at in many ways.

6. In proving that cross-products in $\mathbb{R}^3$ form a Lie algebra, we have in particular established bilinearity. This shows that the following operator:

$$[r, \,\cdot\,] = r \times \tag{201}$$

is a linear operator, acting on vectors in $\mathbb{R}^3$ to give the cross product:

$$[r, \,\cdot\,](v) := [r, v] = r \times v. \tag{202}$$

Recalling the correspondence between linear operators and matrices, it follows that the operator $r \times$ has a matrix representation – this representation is given by the Lie algebra isomorphism between $\mathbb{R}^3$ and $\mathfrak{so}(3)$:

$$e_j \leftrightarrow E_j, \quad u \times v \leftrightarrow [u^i E_i, v^j E_j]. \tag{203}$$

In particular, we represent $v \times$ by the following matrix (using Einstein summation[91]):

$$v \times \;\to\; [v]_\times = v^j E_j = \begin{pmatrix} 0 & -v^3 & v^2 \\ v^3 & 0 & -v^1 \\ -v^2 & v^1 & 0 \end{pmatrix}. \tag{204}$$

Show explicitly, by matrix multiplication, that $[v]_\times u = v \times u$.

Hint: Represent $u$ as a column vector in the standard basis.

[91] Recall in 3 dimensions that $v = v^j e_j = v^1 e_1 + v^2 e_2 + v^3 e_3$ (the superscripts are indices, not powers).

7. Earlier you computed the general odd and even powers, $(E_j)^{2n+1}$ and $(E_j)^{2n}$, of the tangent matrices. If you were observant, you will have realized that $E^{4n+1} = E$ is the same periodic relation that the imaginary unit obeys: $i^{4n+1} = i$. This is part of a deeper connection between Lie algebras and Lie groups given by the ‘exponential map’. For compact connected Lie groups like the rotation group $SO(3)$, one can recover the entire group from its Lie algebra – i.e. all information about the rotation group can be obtained from knowledge of its infinitesimal behaviour about its identity.
Formally, we define the matrix exponential of an arbitrary square matrix $A$ as:

$$e^{A} := \sum_{n=0}^{\infty} \frac{1}{n!} A^n, \tag{206}$$

provided the series converges. Note we define the zeroth power of a square matrix to be the identity matrix: $A^0 = I$. The matrix exponential obeys the usual properties of the exponential function, except that in general $e^{A} e^{B} \neq e^{A+B}$, since matrix multiplication does not commute[92].

[92] The correct relation is given by the Baker-Campbell-Hausdorff formula.

More generally, the exponential map of a Lie group is given by the exponential map of Riemannian geometry – which makes use of the fact that a Lie group is a smooth manifold. Matrix Lie groups, such as the rotation group $SO(3)$, are just a special case in which the general exponential map can be expressed as the matrix exponential.

Q: Show that $e^{\theta E_j}$ is a solution to the matrix differential equation

$$\left.\frac{d}{d\theta} R_j(\theta)\right|_{\theta=0} = E_j. \tag{207}$$

Hint: Use the series expansion of $e^{\theta E_j}$ and the fact that $(\theta A)^n = \theta^n A^n$ for any scalar $\theta$ and any square matrix $A$.

8. As promised, we now use Lie algebras to establish a fundamental link between cross-products and rotations. In particular, using the definition of the matrix exponential, show that the rotation matrices $R_j(\theta)$ are given by exponentiating the tangent matrices which act as a basis for the Lie algebra $\mathfrak{so}(3)$:

$$e^{\theta E_j} = R_j(\theta), \tag{208}$$

for $j = 1, 2, 3$.

9. Using previous observations, we can express a rotation about an axis defined by some unit vector $\hat{v}$ using our Lie algebra isomorphism and the matrix exponential. In particular, $[\hat{v}]_\times = v^j E_j$ and:

$$R_{\hat{v}}(\theta) = e^{\theta v^j E_j}. \tag{209}$$

This is extremely inefficient, but if you have infinite time, check that the above expression coincides with that given by ‘Rodrigues’ Rotation Formula’. Otherwise, try expanding both expressions to, say, first order in $\theta$, then check that Rodrigues’ formula and the SGC formula coincide to first order. Note that if the rotation angle $\theta$ is small, but not infinitesimal, you can still obtain approximations of the rotation matrix $R_{\hat{v}}(\theta)$ of arbitrary accuracy by taking more terms in the exponential series expansion.

10. It is possible to derive the standard rotation matrices, $R_j(\theta)$, corresponding to rotations about the $x$, $y$ and $z$ axes by recalling the correspondence between linear maps and matrices. In particular, using trigonometry and standard geometry, you can derive a formula which rotates a vector $v = (x, y, z)$ about the $z$ axis while keeping the $z$ component constant. Recall from tutorial 8 that the matrix components of a linear map acting on some vector space are given by its action on the basis vectors for that vector space (using Einstein summation):

$$\hat{R} v = (\hat{R})^i{}_j\, v^j e_i. \tag{210}$$

Thus, in particular for the standard basis vectors:

$$\hat{R} e_i = (\hat{R})^j{}_i\, e_j. \tag{211}$$

So for example, for a rotation anticlockwise about the $x$-axis acting on a unit vector in the $x$-direction, we have:

$$\begin{aligned}
R_1(\theta) e_1 &= (R_1(\theta))^j{}_1\, e_j \\
&= (R_1(\theta))^1{}_1\, e_1 + (R_1(\theta))^2{}_1\, e_2 + (R_1(\theta))^3{}_1\, e_3 \\
&= e_1,
\end{aligned} \tag{212}$$

since $e_1$ is an eigenvector of $x$-axis rotations. Hence, without knowing $(R_1(\theta))^1{}_1$, $(R_1(\theta))^2{}_1$ and $(R_1(\theta))^3{}_1$ in advance, we can determine these components of the $x$-rotation matrix by comparing the coefficients of the rotated vector $e_1$. From this we deduce that $(R_1(\theta))^1{}_1 = 1$ and $(R_1(\theta))^j{}_1 = 0$ for $j = 2, 3$. Similarly, by geometrically finding $R_1(\theta) e_2$ and $R_1(\theta) e_3$ via trigonometry, you can work out the rest of the components of the $x$-rotation matrix $R_1(\theta)$. In the fashion just demonstrated, derive the $x$, $y$, $z$ rotation matrices $\{R_j(\theta)\}$.

11. With Joshua having neared the end of his Labours, the Warden decides to give him the peaceful and easy task of verifying Rodrigues’ Rotation Formula:

$$v(\theta) = v_0 \cos\theta + (k \times v_0) \sin\theta + k (k \cdot v_0)(1 - \cos\theta), \tag{213}$$

which describes a vector $v_0$ rotated through an angle $\theta$ about an axis defined by the unit vector $k$.
In particular, check that when you set the rotation axis $k$ equal to one of the standard basis vectors for $\mathbb{R}^3$, $k = e_j$, the resulting rotated vector $v(\theta)$ is the same as the vector $R_j(\theta) v_0$ you would get by applying the $R_j(\theta)$ rotation matrix.

12. Deciding that the last task was easy (despite being tedious), the Warden sets a final labour for Joshua – to use the Lie algebra isomorphism between $(\mathbb{R}^3, \times)$ and $\mathfrak{so}(3)$ to express Rodrigues’ rotation formula in matrix form. This means constructing an explicit matrix $R_k(\theta)$ such that $R_k(\theta) v_0 = v(\theta)$.

Hint: You can express the right-hand side of Rodrigues’ formula as some matrix / linear operator acting on the vector $v_0$, by writing the cross-product operators as matrices.

Hint: You will also need to express the dot product in matrix form – as a row vector (on the left) multiplying a column vector (on the right).

Hint: You will need to use the vector triple product formula for cross products to collect the cosine terms.

Exercise 25 (The Fault in Our Stars) Following the sabotage of the targeting computers for the LAWs defence system, the Orbital Death Star is an orbiting duck. Moments after the Georgians receive Sophia Lie’s solutions and warning message, Trinity College and University Hall fire two Space Honey Badgers at the Georgian Dragon. Unable to use the Krantz algorithm to aim the laser turrets, Angela – who makes a guest appearance at the critical moment – decides to use the infinitesimal rotation matrices to track the slow-moving honey badgers. Such a strategy avoids having to evaluate the sine and cosine functions without a computer – an approximation made possible by the fact that the lasers only have to perform small, slow rotations to track the fearsome honey badgers[93].

To be continued ...

[93] This is sufficient since the speed of light (hence of a laser beam) is $3 \times 10^8$ m/s, negating the need to consider time-of-flight at close distances.
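The strategy of Exercise 25 (and the first-order check suggested in task 9) can be sketched numerically. The code below is only an illustration under stated assumptions: the helper names (`hat`, `rodrigues`, `exp_series`), the sample axis, vector and angle are all hypothetical choices, and `hat` assumes the standard cross-product matrix convention for $[v]_\times$. It compares Rodrigues’ formula (213) with the first-order truncation $I + \theta [k]_\times$ of the exponential series (206), and with a longer partial sum.

```python
import math

def hat(v):
    """Cross-product matrix [v]x (standard convention): hat(v) applied to u gives v x u."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matvec(M, u):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(M[i][k] * u[k] for k in range(3)) for i in range(3)]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def cross(a, b):
    """Ordinary vector cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rodrigues(v0, k, theta):
    """Rodrigues' rotation formula (213); k is assumed to be a unit vector."""
    kxv = cross(k, v0)
    kdv = sum(ki * vi for ki, vi in zip(k, v0))
    c, s = math.cos(theta), math.sin(theta)
    return [v0[i] * c + kxv[i] * s + k[i] * kdv * (1 - c) for i in range(3)]

def exp_series(A, terms):
    """Partial sum I + A + A^2/2! + ... with `terms` terms of the series (206)."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]  # A^0 = I
    power = [row[:] for row in result]
    for n in range(1, terms):
        power = [[x / n for x in row] for row in matmul(power, A)]  # A^n / n!
        result = [[result[i][j] + power[i][j] for j in range(3)]
                  for i in range(3)]
    return result

# Sample data (arbitrary): a small rotation about the z-axis.
k, v0, theta = [0.0, 0.0, 1.0], [1.0, 2.0, 3.0], 0.01
A = [[theta * x for x in row] for row in hat(k)]

exact = rodrigues(v0, k, theta)
first_order = matvec(exp_series(A, 2), v0)   # infinitesimal rotation I + theta*[k]x
many_terms = matvec(exp_series(A, 20), v0)

err_first_order = max(abs(a - b) for a, b in zip(exact, first_order))
err_series = max(abs(a - b) for a, b in zip(exact, many_terms))
print("first-order error:", err_first_order)
print("20-term error:   ", err_series)
```

For a small angle the first-order error is of order $\theta^2$, which is why tracking a slowly moving target with infinitesimal rotations is plausible; adding more terms of the series drives the error down to rounding level.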
13 Tutorial 11: Fiery the angels fell – Project Death Star (III)

13.1 Prelude

Having not completed the last tutorial, the students of the St. George’s College Mathematical Sciences tutorial were unable to manually shoot down the space honey badgers fired from the Trinity College and University Hall sponsored dreadnoughts. As such, the honey badgers, Beelzebub and Mammon, unleashed the entirety of Pandaemonium on the St. George’s College Orbital Star (The Dragon). Luckily however, by some hidden cunning, the regular attendants – Matthew Fernandez, Joshua Bailey and William Cheng[94] – managed to escape the fiery collapse of the Dragon. With some resourcefulness – and the help of a distraction provided by Georgie, the college puppy – they secured the death star blueprints ... bringing them back to the SGC Mathematical Sciences tutors.

[94] Disclaimer: These students requested to feature in the story, on the premise of their consistent commitment.

In the last tutorial, the ‘big picture’ themes and main ideas that you should have understood were as follows.

1. With every Lie group, there comes attached a corresponding Lie algebra. Geometrically, we found this to be some abstract tangent plane approximation to the Lie group, encoding information about the group in an infinitesimally small neighbourhood about its identity element. For ‘compact connected’ Lie groups, such as the rotation group, one is able to reconstruct the entire group simply by knowing its Lie algebra – the reconstruction being performed by the exponential map.

2. A Lie algebra is an abstract vector space, characterized by an algebraic binary operation known as the ‘Lie bracket’. This operation is bilinear, anti-symmetric and obeys the Jacobi identity.

3. The algebra of vector cross-products in 3 dimensions and the Lie algebra of rotations were two explicit examples of a Lie algebra, which had a concrete matrix representation. They were in fact the ‘same’ Lie algebra in the sense that they were isomorphic (the same Lie algebraic structure represented in different ways).

4. The matrix exponential map allowed us to reconstruct the rotation matrices from the tangent matrices (representing cross-product operators) we derived.
Such a map is fundamental to understanding stability analysis, dynamical systems, linear differential equations, Riemannian geometry and various abstractions in higher-level mathematics.

In this tutorial, we will investigate the following structures as well as their applications:

1. The Circle Group $S^1$, the complex exponential and rotations in the complex plane (2 dimensions).

2. Hamilton’s Quaternions, the 3-dimensional sphere $S^3$ and rotations in 3-dimensional space.

13.2 The Circle Group

By now, most of you should have some familiarity with the algebra of complex numbers. Recall that a complex number can be represented in Cartesian form, $z = x + iy$, where $x, y \in \mathbb{R}$. As such, we can identify vectors in the complex plane $\mathbb{C}$ with vectors in the 2-dimensional real plane $\mathbb{R}^2$, via the map:

$$z \mapsto (\operatorname{Re}(z), \operatorname{Im}(z)), \tag{214}$$

which sends $z = x + iy$ to the ordered pair $(x, y)$. This identification (isomorphism) allows for a very efficient way to rotate 2-dimensional real vectors, via Euler’s formula

$$e^{i\theta} = \cos(\theta) + i \sin(\theta) \tag{215}$$

and the polar (radius-angle) representation of complex numbers:

$$z = r e^{i\theta}, \quad r = |z| = \sqrt{x^2 + y^2}, \quad \tan(\theta) = \frac{y}{x}. \tag{216}$$

In particular, given any 2-dimensional real vector, $v = x e_1 + y e_2$, we can represent it as the complex number $z = \sqrt{x^2 + y^2}\, e^{i \arctan(y/x)}$ (this form of the angle is valid for $x > 0$; in general the angle is $\operatorname{Arg}(z)$). If we denote $\theta = \arctan(y/x)$, then to rotate $v$ by some angle $\alpha$ anti-clockwise in the complex plane, we simply use the algebraic structure of the complex numbers:

$$z' = e^{i\alpha} z = |z| e^{i\alpha} e^{i\theta} = |z| e^{i(\alpha + \theta)}. \tag{217}$$

To recover the rotated real vector $v'$ from the complex number $z'$, we simply use Euler’s formula:

$$z' = |z| e^{i(\alpha + \theta)} = |z| \left(\cos(\alpha + \theta) + i \sin(\alpha + \theta)\right), \tag{218}$$
followed by the inverse isomorphism:

$$\begin{aligned}
v' &= |z'| \cos(\operatorname{Arg}(z'))\, e_1 + |z'| \sin(\operatorname{Arg}(z'))\, e_2 \\
&= \sqrt{x^2 + y^2} \cos(\alpha + \theta)\, e_1 + \sqrt{x^2 + y^2} \sin(\alpha + \theta)\, e_2.
\end{aligned} \tag{219}$$

In this manner, we avoid the use of two-dimensional rotation matrices and change-of-basis formulas. You might think – well, so what, rotations in two dimensions are so easy that you could compute them while running out to the front lawn during a 5:30am fire drill. Well, that may be true – however, the natural question to ask is: can we use some clever isomorphism with a higher-dimensional generalization of the complex numbers to easily compute rotations in 3 dimensions? Of course we can.

Problem 21 (DIY: Isosceles Triangle) Lamenting the loss of The Dragon, Georgie and the less-committed tutorial attendants, Matt, Joshua and William, decide to overcome their grief (and pay their respects) by completing more mathematics problems. By choosing your favourite vector in $\mathbb{R}^2$, use the above isomorphism between $\mathbb{R}^2$ and $\mathbb{C}$ to construct an isosceles triangle. This means making two copies of your vector and using Euler’s formula to rotate them in opposite directions, by your favourite acute angle $\alpha$. You may then need to horizontally or vertically translate your vectors away from the origin to form an acute triangle, with either the horizontal or vertical axis as the base.

Hint: Don’t choose the zero vector $0$. Your favourite vector should in fact be $(1, 0)$ or $e_1$. Don’t choose the zero angle. Your favourite acute angle should be $\frac{\pi}{4}$ or $\frac{\pi}{3}$. You can also form the third side by creating a vector with vector subtraction.

At this point, you may be wondering what the ominous ‘circle group’ is. Strictly speaking, a circle is a 1-dimensional smooth manifold, denoted by $S^1$ (meaning the 1-dimensional circle). A circle only appears to be two-dimensional because we embed it in a 2-dimensional or 3-dimensional space – however, for a circle of some fixed radius $r$, you can always parametrise it by one angular variable $\theta$.
You could parametrise it in Cartesian coordinates $(x, y)$ with two variables – however, this requires the constraint $r = \sqrt{x^2 + y^2}$, meaning you can always write one variable in terms of the other (because the radius is fixed) – hence really leaving only one independent variable.

If we now restrict our attention to complex numbers (or 2-dimensional real vectors) of some fixed radius – say $r = 1$ for simplicity – then we obtain a subset $S^1$ of the complex plane $\mathbb{C}$, corresponding to the unit circle centred at the origin. As it turns out, this subset forms an algebraic structure called ‘the circle group’ – satisfying the axioms of an abelian group.
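The rotation recipe (217)–(219) and the closure of the unit circle under multiplication can be sketched with Python’s built-in complex numbers. This is only an illustrative sketch: the function name `rotate` and the sample vector and angles are assumptions made here (the vector and angle follow the hint of Problem 21).

```python
import cmath
import math

def rotate(v, alpha):
    """Rotate the real 2-vector v = (x, y) anticlockwise by alpha via z -> e^{i alpha} z."""
    z = complex(v[0], v[1])              # the identification (214), read backwards
    z_rot = cmath.exp(1j * alpha) * z    # equation (217)
    return (z_rot.real, z_rot.imag)      # back to R^2, as in (219)

# Problem 21 with the hinted choices: v = e_1 and alpha = pi/3.
v = (1.0, 0.0)
alpha = math.pi / 3
left = rotate(v, alpha)      # copy rotated anticlockwise
right = rotate(v, -alpha)    # copy rotated clockwise

# The two equal sides and the base (the difference of the two rotated vectors).
side_left = math.hypot(*left)
side_right = math.hypot(*right)
base = math.hypot(left[0] - right[0], left[1] - right[1])
print("sides:", side_left, side_right, " base:", base)

# Closure of the unit circle: the product of two unit complex numbers is a unit
# complex number whose angle is the sum of the two angles.
z, w = cmath.exp(0.7j), cmath.exp(2.1j)
print(abs(z * w), cmath.phase(z * w))
```

Rotations never change the length of a vector, so the two rotated copies automatically have equal length; that is what makes the triangle isosceles.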
Exercise 26 By consulting either tutorial 8 or one of your tutors for the axioms of an abelian group, show that the subset of complex numbers with unit length forms an abelian group under multiplication.

Hint: Recall that if $z, w \in \mathbb{C}$ are complex numbers with unit length, then we can represent them in polar form by $z = e^{i\theta}$ and $w = e^{i\phi}$, where $\theta$ and $\phi$ are the principal arguments of $z$ and $w$, respectively. Therefore, you should parametrize $S^1$ as the set $\{e^{i\theta} : \theta \in \mathbb{R}\}$. Strictly speaking, we should restrict to $\theta \in [0, 2\pi)$ and use modular arithmetic (the formal term for what you usually do anyway): $\theta + 2\pi \equiv \theta \pmod{2\pi}$.

Strengthening the previous result, the circle group is in fact a Lie group! To show this, you could demonstrate that the multiplication map is smooth (simply corresponding to the addition of angles) and that the unit circle $S^1$ is a smooth manifold (for example, by forming charts from stereographic projections). Since it is compact (closed and bounded) and connected (meaning any two points on the circle are connected by some path on the circle), it follows that we can reconstruct the circle group from its Lie algebra via the ‘exponential map’.

Exercise 27 (Circular Reasoning) Wanting to design a bigger, better Orbital Death Star, the three amigos decide to program a new targeting algorithm with the algebra of Quaternions. However, with the closure of the university (due to Greens riots regarding their endorsement of environmentally-unsustainable science projects) and the permanent collapse of the ‘BigAir’ server, the three amigos are set on a quest to find and construct the Quaternion algebra. As such, they decide the circle group is a good place to start – maybe the rotation group can be reconstructed as a product of three circle groups?

First take any element $z = e^{i\theta}$ of the circle group, then look at its infinitesimal form by setting the rotation angle $\theta \to \delta\theta$, where $\delta\theta$ is infinitesimally small.
To this end, you can use the first-order Taylor series expansion of $e^{i\theta}$ to analyze the structure of $S^1$ about the identity ($\theta = 0$):

$$z = e^{i\delta\theta} \approx 1 + i\,\delta\theta. \tag{220}$$

By replicating the derivation of the Lie algebra for the 3-dimensional rotation group, show that the tangent to the circle group at the identity is $\left.\frac{dz}{d\theta}\right|_{\theta=0} = i$, so that the elements of its Lie algebra $\mathfrak{u}(1)$ are given by:

$$\left.\frac{dz}{d\theta}\right|_{\theta=0} \theta = i\theta, \quad \theta \in [0, 2\pi). \tag{221}$$

Therefore, we can represent any element of the circle algebra $\mathfrak{u}(1)$ by $i\theta$ – some angle multiplied by $i$. Since multiplication of two circle group elements corresponds to addition of angles (via the properties of the complex exponential), the Lie bracket on the circle algebra $\mathfrak{u}(1)$ is given by:

$$[a, b] = ab - ba. \tag{222}$$

Show that the elements of $\mathfrak{u}(1)$ do indeed obey the properties of a Lie algebra with this Lie bracket.

Hint: Since the multiplication of complex (or real) numbers commutes, this exercise is trivial as $[a, b] = 0\ \forall a, b \in \mathbb{C}$ – i.e. the Lie bracket is zero, hence trivially satisfies all required properties.

Formally, the circle group is often referred to as the 1-dimensional ‘unitary group’, $U(1)$, characterized by the unitary condition:

$$z z^\dagger = 1 \quad \forall z \in U(1), \tag{223}$$

where $\dagger$ is the ‘conjugate transpose’ – for complex numbers, this is simply the complex conjugate. By the relation $\overline{e^{i\theta}} = e^{-i\theta}$ it is clear that all elements of the circle group satisfy the unitarity condition: $e^{i\theta}\, \overline{e^{i\theta}} = e^{i\theta} e^{-i\theta} = e^{0} = 1$.

The circle group also acts as a building block for all higher-dimensional compact, connected, abelian Lie groups – that is, all such Lie groups are simply a direct product of circle groups: $T^n = S^1 \times \dots \times S^1$, corresponding to an $n$-dimensional torus (the 2-dimensional torus is your familiar doughnut).

Unfortunately, despite being central to the construction of abelian Lie groups, the circle group does not serve as a building block for quaternions – our desired algebraic structure to represent rotations in 3 dimensions. You can see this easily by noting that rotations in 3 dimensions don’t commute (equivalently, the matrix multiplication of rotation matrices isn’t commutative) – hence there is no chance that a commutative group such as $S^1$ will serve as an appropriate building block.

Finally, as a closing remark on the circle group, one should observe that $L^2(S^1)$ – the space of ‘square integrable’ functions[95] defined on the unit circle – is simply a space of periodic functions (we can always normalize the circumference $2\pi$ to any period we want). As such, the representation theory of the Lie group $S^1$ gives rise to Fourier series and Fourier analysis – quintessential to modern mathematics, engineering, computer science, physics, chemistry, biology and music (acoustic theory).

[95] Lebesgue measurable functions: $f \in L^2(S^1) \implies \int_{S^1} |f|^2 < \infty$.
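The passage from the infinitesimal form $1 + i\,\delta\theta$ back to a full element $e^{i\theta}$ of the circle group can be sketched numerically. The code below is a hedged illustration only: the function name `compose_small_rotations` and the sample angle are assumptions introduced here. It applies $n$ first-order rotations by $\theta/n$ in succession and compares the result with the complex exponential.

```python
import cmath

def compose_small_rotations(theta, n):
    """Apply n first-order 'infinitesimal' rotations 1 + i*theta/n in succession."""
    step = 1 + 1j * theta / n    # first-order form of e^{i theta / n}
    z = 1 + 0j                   # start at the identity of the circle group
    for _ in range(n):
        z *= step
    return z

theta = 1.0                      # sample angle in radians (arbitrary choice)
exact = cmath.exp(1j * theta)

# The error shrinks roughly like theta^2 / (2n) as the steps get finer.
errors = [abs(compose_small_rotations(theta, n) - exact) for n in (10, 100, 1000)]
for n, err in zip((10, 100, 1000), errors):
    print(n, err)
```

As the number of steps grows, the product approaches $e^{i\theta}$: this is the sense in which knowledge of the group near the identity (its Lie algebra) is enough to rebuild the whole circle group.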
13.3 The Quaternions

The story of Quaternions starts[96] with the Irish polymath – Sir William Rowan Hamilton, who is, by and large, one of the most influential people in the history of the mathematical sciences (at least on par with Euler). Hamilton was by all accounts a genius at an early age – when he wasn’t busy advancing the human frontiers of knowledge in mathematics and physics, he was topping languages at his university in his spare time. One problem that took his fancy was finding a way to extend complex numbers (which we showed corresponded to 2-dimensional real vectors) to higher spatial dimensions. Although he could not find a 3-dimensional generalization, when working with four dimensions he created quaternions. According to Hamilton, “on October 16 he was out walking along the Royal Canal in Dublin with his wife when the solution in the form of the equation

$$i^2 = j^2 = k^2 = ijk = -1 \tag{224}$$

suddenly occurred to him; Hamilton then promptly carved this equation using his penknife into the side of the nearby Broom Bridge”. These are the defining relations for the Quaternionic algebra – and certainly quite a discovery considering there are exactly four normed division algebras (the real and complex numbers being two of them)! We shall now continue our investigation.

[96] Technically speaking, Benjamin Olinde Rodrigues came up with the defining relation for quaternions around the same time as Hamilton – but Hamilton is credited historically, perhaps because he did a deeper investigation of their algebraic structure, whilst applying them to physics with great success.

In the same fashion that we put complex numbers into correspondence with two-dimensional vectors in $\mathbb{R}^2$, we can put Quaternions into correspondence with four-dimensional vectors in $\mathbb{R}^4$. This is done by representing an arbitrary quaternion $Q$ in the following way:

$$Q = a + ib + jc + kd, \tag{225}$$

where $a, b, c, d \in \mathbb{R}$ are real numbers and $i, j, k$ are the quaternionic generalization of the imaginary unit for complex numbers, obeying the fundamental relation:

$$i^2 = j^2 = k^2 = ijk = -1. \tag{226}$$

The addition of Quaternions is performed in the obvious way, like the addition of complex numbers – you can treat it as 4-dimensional vector addition, where the coefficients of $i, j, k$ add separately and the scalar part adds separately:

$$\begin{aligned}
Q_1 + Q_2 &= (a_1 + ib_1 + jc_1 + kd_1) + (a_2 + ib_2 + jc_2 + kd_2) \\
&= (a_1 + a_2) + i(b_1 + b_2) + j(c_1 + c_2) + k(d_1 + d_2).
\end{aligned} \tag{227}$$
Similarly, the other rules of addition of complex numbers (such as associativity and commutativity) still hold – likewise with scalar multiplication and its distribution over addition. What changes, however, is that unlike complex numbers, the multiplication of quaternions is not commutative! That is, $Q_1 \times Q_2 \neq Q_2 \times Q_1$ in general (cf. matrix multiplication). This follows directly from Hamilton’s fundamental relation.

Exercise 28 (Quaternions: A Quadrivial Quandary) Having rediscovered Hamilton’s Quaternions whilst walking over a bridge in Dublin, Ben Luo gives the following problem to the surviving members of the St. George’s College Death Star. In particular, using the defining relations:

$$i^2 = j^2 = k^2 = ijk = -1, \tag{228}$$

prove the following identities:

$$ij = k, \quad ji = -k, \tag{229}$$
$$jk = i, \quad kj = -i, \tag{230}$$
$$ki = j, \quad ik = -j. \tag{231}$$

Hint: Try multiplying the equation $ijk = -1$ (or subsequent products) from the left or right by $i$, $j$ or $k$, while using the relations $i^2 = -1$, $j^2 = -1$, $k^2 = -1$. Note that you must keep track of the order in which you multiply – just like you would for matrices or vector cross-products!

Given an arbitrary quaternion $Q = a + ib + jc + kd$, we call $a$ the ‘scalar part’ and $ib + jc + kd$ the ‘vector part’. This is because if we set $b = c = d = 0$, the resulting quaternion is simply a real number – which behaves like a scalar. Similarly, if we set $a = 0$, the resulting quaternion $Q = 0 + ib + jc + kd$ is ‘imaginary’ (provided at least one of the other coefficients is non-zero) – it will exhibit vector behaviour, in a fashion we will investigate later. First however, we must understand how the operations of conjugation and inversion behave for quaternions – and in particular, how to define the ‘length’ (norm) of a quaternion.

The conjugation of quaternions is a direct extension of the conjugation of complex numbers. In particular, given a quaternion $Q = a + ib + jc + kd$, its conjugate[97] is defined by:

$$\bar{Q} = a - ib - jc - kd. \tag{232}$$

That is, the conjugates of the quaternionic imaginary units are $\bar{i} = -i$, $\bar{j} = -j$ and $\bar{k} = -k$. Since $\bar{\bar{Q}} = Q$, conjugation of quaternions is said to be an ‘involution’ – meaning an operation that undoes itself (squares to give the identity).

[97] Some of you may prefer another notation for conjugates (for example $Q^*$) – any choice is fine as long as you specify it.
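The multiplication table of Exercise 28 and the conjugation rule (232) can be checked with a small quaternion type. This is a minimal sketch, not a library: the class name `Quaternion`, its field names and the constants `I`, `J`, `K` are illustrative choices made here, and the product formula is simply the expansion of $(a_1 + ib_1 + jc_1 + kd_1)(a_2 + ib_2 + jc_2 + kd_2)$ using the identities (229)–(231).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quaternion:
    a: float  # scalar part
    b: float  # coefficient of i
    c: float  # coefficient of j
    d: float  # coefficient of k

    def __add__(self, other):
        """Component-wise addition, as in equation (227)."""
        return Quaternion(self.a + other.a, self.b + other.b,
                          self.c + other.c, self.d + other.d)

    def __mul__(self, other):
        """Quaternion product, expanded with ij = k, jk = i, ki = j (order matters)."""
        a1, b1, c1, d1 = self.a, self.b, self.c, self.d
        a2, b2, c2, d2 = other.a, other.b, other.c, other.d
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )

    def conjugate(self):
        """Flip the sign of the vector part, as in equation (232)."""
        return Quaternion(self.a, -self.b, -self.c, -self.d)

# The three imaginary units (integer entries keep the checks exact).
I = Quaternion(0, 1, 0, 0)
J = Quaternion(0, 0, 1, 0)
K = Quaternion(0, 0, 0, 1)
MINUS_ONE = Quaternion(-1, 0, 0, 0)

print(I * J == K, J * I == K)                       # multiplication is not commutative
print(I * I == MINUS_ONE, (I * J) * K == MINUS_ONE)  # the defining relations (228)

Q = Quaternion(1, 2, 3, 4)
print(Q.conjugate().conjugate() == Q)               # conjugation is an involution
```

Since the product formula was itself derived from the identities, this only confirms internal consistency; the proofs requested in Exercise 28 still have to start from the defining relations.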
Exercise 29 (Conjugating Zachary) Caught in a moral ethics debate with Zach Menschelli over the construction of a new St. George’s College orbital death star, Matthew Fernandez decides to represent Zach’s argument as a quaternion. His logic is that by conjugating the quaternion – and therefore Zach’s argument – he will confuse Zach and win him over in the moral ethics debate. Thus, we consider the following ...

The conjugation of complex numbers cannot be expressed by multiplication or addition – it is a unique operation in that sense, corresponding to the geometric fact that conjugation is a reflection about the real axis (an operation in the two-dimensional orthogonal group $O(2)$ with determinant equal to $-1$). However, quaternions, being friendly creatures, permit an algebraic representation of conjugation:

$$\bar{Q} = -\frac{1}{2}\left(Q + iQi + jQj + kQk\right). \tag{233}$$

Using the previous identities derived, prove (via quaternion multiplication) that this conjugation identity coincides with the first definition:

$$\bar{Q} = a - ib - jc - kd. \tag{234}$$

Having defined conjugation for quaternions, we are now in a position to define a sensible notion of ‘length’ (norm). As inspiration, one may recall that we can compute the length (modulus) $|z|$ of a complex number $z$ by the following formula:

$$|z| = \sqrt{\bar{z} z}, \tag{235}$$

which simply follows from the polar representation of complex numbers. Similarly, we can define the length of a quaternion $Q = a + ib + jc + kd$ to be:

$$\|Q\| = \sqrt{Q \bar{Q}} = \sqrt{\bar{Q} Q} = \sqrt{a^2 + b^2 + c^2 + d^2}. \tag{236}$$

The last equality coincides with the 4-dimensional Euclidean norm – or equivalently, the four-dimensional version of Pythagoras’ theorem for the distance between the origin $(0, 0, 0, 0)$ and a point $(a, b, c, d)$.

Exercise 30 (The Social Norm) With college pride at an all time low – due to the loss of The Dragon – the students of St. George’s College start to take on a more serious disposition.
As part of this, they shift their gaze to the stars and the realm of mathematics that lies beyond. Therefore, at the next college party, to avoid being perceived as uncouth, you are asked to prove that:

$$\sqrt{Q \bar{Q}} = \sqrt{a^2 + b^2 + c^2 + d^2}, \tag{237}$$

using the algebraic properties of quaternions. Furthermore, show that if we multiply a quaternion $Q$ by some real number $\lambda$, then $\|\lambda Q\| = |\lambda| \|Q\|$ – i.e. the norm scales linearly. Finally, show that the norm is multiplicative. This means showing that for any two quaternions $Q_1, Q_2$, we have:

$$\|Q_1 Q_2\| = \|Q_1\| \|Q_2\|. \tag{238}$$

Given this sensible ‘norm’ (length) for quaternions, we can define a notion of distance (formally, a ‘metric’) on the space of quaternions. In particular, the distance between two quaternions $Q_1$ and $Q_2$ is defined to be the norm of their difference:

$$\rho(Q_1, Q_2) = \|Q_2 - Q_1\|, \tag{239}$$

where the map $\rho$ is the 4-dimensional Euclidean (Pythagorean) metric. This coincides with the usual notion of distance which you are familiar with in 1, 2 and 3 dimensional vector spaces.

Exercise 31 (Transcending Pythagoras) In particular, Joshua meets the ancient Greek mathematician Pythagoras (an eternal one) while on an excursion between worlds. Inspired by this meeting, he thinks of a correspondence between quaternions and Euclidean geometry, which is as follows. Representing the Quaternions $Q_1, Q_2$ as points (or vectors) $(a_1, b_1, c_1, d_1)$ and $(a_2, b_2, c_2, d_2)$ in four-dimensional real space $\mathbb{R}^4$, show that the distance formula

$$\rho(Q_1, Q_2) = \|Q_2 - Q_1\| \tag{240}$$

simply gives the four-dimensional equivalent of Pythagoras’ theorem.

Hint: This means showing that $\|Q_2 - Q_1\| = \sqrt{(a_2 - a_1)^2 + \dots + (d_2 - d_1)^2}$.

Having defined the length of a quaternion, one can now make sense of what it means to multiplicatively ‘invert’ a quaternion – i.e. to compute its reciprocal. In particular, it was stated earlier that Quaternions are one of only four normed division algebras – this means that there must exist some way of dividing quaternions, which requires inverting them. Similar to complex numbers of unit length – which we showed formed the ‘circle group’ (corresponding to two-dimensional rotations) – one can define a quaternion of unit length as follows.
Given any non-zero Quaternion $Q$, the corresponding unit Quaternion is given by:

$$\hat{Q} = \frac{1}{\|Q\|} Q. \tag{241}$$
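The norm (236), its multiplicative property (238), the distance (239) and the unit quaternion (241) can be sketched with quaternions stored as plain 4-tuples $(a, b, c, d)$. All function names below (`qmul`, `qconj`, `qnorm`, `qdist`, `qunit`) and the sample values are assumptions made for this illustration.

```python
import math

def qmul(p, q):
    """Quaternion product of 4-tuples (a, b, c, d) = a + ib + jc + kd."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qconj(q):
    """Conjugate: flip the sign of the vector part."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm(q):
    """Euclidean length sqrt(a^2 + b^2 + c^2 + d^2), the last form in (236)."""
    return math.sqrt(sum(x * x for x in q))

def qdist(p, q):
    """Distance rho(p, q) = ||q - p||, as in (239)."""
    return qnorm(tuple(y - x for x, y in zip(p, q)))

def qunit(q):
    """Unit quaternion (241); q is assumed to be non-zero."""
    n = qnorm(q)
    return tuple(x / n for x in q)

# Two sample quaternions (arbitrary choices).
Q1 = (1.0, 2.0, 3.0, 4.0)
Q2 = (0.5, -1.0, 2.0, -3.0)

print(qmul(Q1, qconj(Q1)))                         # a real scalar: (||Q1||^2, 0, 0, 0)
print(qnorm(qmul(Q1, Q2)), qnorm(Q1) * qnorm(Q2))  # the norm is multiplicative
print(qdist(Q1, Q2), qnorm(qunit(Q1)))
```

The first printed line shows why $\sqrt{Q\bar{Q}}$ makes sense as a length: the product of a quaternion with its conjugate has no vector part.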
    This is notsurprising as it how we usually construct unit vectors. To construct an inversion formula, we draw on the analogy provided by complex numbers. In particular, given any complex number z, we know that z¯z = |z|2, which is a real scalar. If we were to divide both sides by |z|2, we would have: z¯z |z|2 = 1. (242) Rearranging, we have: z 1 |z|2 ¯z = 1, (243) hence we can see explicitly that for any non-zero complex number z (|z|= 0), its inverse is given by: z−1 = 1 |z|2 ¯z. (244) One would expect that if you multiply a quaternion and its conjugate, that you get a real scalar. Dividing the resulting number by that scalar would give 1, meaning that inverting a non-zero quaternion (i.e. a quaternion whose components are not all zero) should amount to the same procedure as we just demonstrated for complex numbers. This is indeed true, hence given any non-zero quaternion Q, we can construct its inverse Q−1 using the following formula: Q−1 = 1 Q 2 ¯Q. (245) Exercise 32 (Broadening Units) Prove that above formula for the Quaternion in- verse works – this is very trivial. In particular, show that Q 1 Q 2 ¯Q = 1 using any of the previous formulas derived. Now, write down your favourite quaternion (one with all components a, b, c, d = 0) and compute the corresponding conjugate quaternion ¯Q, unit Quaternion ˆQ and the inverse quaternion Q−1. Finally, per- form the left and right multiplications explicitly: QQ−1 and Q−1Q and show they are both equal to 1. 13.4 Quaternions, Rotations and the 3-Sphere Recall we suggested that it might be possible to construct some algebra repre- senting 3-dimensional rotations, by taking the product of three copies of the circle group S1? This would form a 3-dimensional torus T3 = S1×S1×S1 parametrized 107
by three separate angles. Note that your usual torus (yummy doughnuts) is 2-dimensional: $T^2 = S^1 \times S^1$, since it is parametrized by two angles. Physically, you can see this explicitly by cutting a torus horizontally or vertically – clearly it consists of two different circles whose symmetry axes are perpendicular to each other (unless it is an oblique torus).

This intuition was not bad, although it was wrong. The correct geometric object representing 3-dimensional spatial rotations is in fact the 3-dimensional hypersphere, $S^3$ – referred to as the ‘3-Sphere’. Note that the spheres you are most familiar with are the 2-dimensional surfaces $S^2$ – they are parametrized by two angles, which you probably refer to as latitude and longitude (or $\theta$ and $\phi$) [98]. As stated in a previous tutorial, the 2-dimensional sphere is the boundary of a 3-dimensional ball – which is a solid (boundary surface + interior). Thus, you can think of a 3-dimensional sphere as the boundary surface of a 4-dimensional ball – if you ever want to be tripped out, YouTube a simulation of a topological 3-Sphere.

We shall now see how quaternions of unit length are in fact a direct representation of the symmetries of the 3-dimensional sphere $S^3$. As such, they provide an algebraic bridge between the Lie group of rotations SO(3), the 3-dimensional sphere and the special unitary group SU(2) (which we haven’t covered) – the latter being a symmetry group in quantum mechanics (regarding spin and angular momentum) and also the symmetry group representing interactions occurring via the weak nuclear force.

To see the correspondence between quaternions and rotations, recall Euler’s rotation theorem, which says that “Any rotation or sequence of rotations of a rigid body or reference frame about a fixed point, is equivalent to a single rotation by a given angle $\theta$ about some fixed axis that runs through the fixed point.”
As we saw in Tutorials 8–10, this was rather obvious when we constructed ‘Rodrigues’ Rotation Formula’ – the data this required was some 3-dimensional unit vector $\mathbf{u} = u_1 e_1 + u_2 e_2 + u_3 e_3$ representing the rotation axis, along with a scalar $\theta$ representing the rotation angle.

[98] Mathematicians and physicists often use opposite conventions...

Given a unit imaginary quaternion $u_x i + u_y j + u_z k$ (built from the components $u_x = u_1$, $u_y = u_2$, $u_z = u_3$ of the rotation axis), a quaternionic generalization of Euler’s formula is as follows:

$$Q = e^{\frac{1}{2}\theta (u_x i + u_y j + u_z k)} = \cos\frac{\theta}{2} + (u_x i + u_y j + u_z k)\,\sin\frac{\theta}{2}, \tag{246}$$

where $i, j, k$ are quaternionic imaginary units. We now see why we called the coefficients of $i, j, k$ the ‘vector part’ of the quaternion $Q$ – they do indeed correspond to some 3-dimensional vector $(u_x, u_y, u_z)$, which in this case describes a rotation
axis. This quaternion can be thought of as a function of four variables: the rotation axis $\mathbf{u}$ (3 variables) and the rotation angle $\theta$ (1 variable), and will act as a linear operator that rotates any 3-dimensional vector $\mathbf{v}$ by some angle $\theta$, anti-clockwise about the axis $\mathbf{u}$.

Explicitly, we first represent any 3-dimensional real vector $\mathbf{v} = v_1 e_1 + v_2 e_2 + v_3 e_3$ by a purely imaginary quaternion: $v = i v_1 + j v_2 + k v_3$. The rotated quaternion $v'$ (rotated by an angle $\theta$ anticlockwise about an axis $\mathbf{u}$) is given by a group action known as ‘conjugation’ by the quaternion $Q$:

$$v' = Q\,v\,Q^{-1}, \tag{247}$$

using quaternionic multiplication (note that order is very important – if you conjugate the wrong way, i.e. compute $Q^{-1} v Q$, you will get the reverse rotation). To recover the rotated vector, you simply apply the reverse isomorphism and replace the quaternionic imaginary units in the rotated quaternion with the standard basis vectors for $\mathbb{R}^3$.

Exercise 33 (Rotary Club) Having learned to rotate vectors using quaternions, the three amigos were subsequently given lifetime membership to the Perth Rotary Club. As part of this membership, they must explain how one would perform two successive rotations of a 3-dimensional vector using unit quaternions $Q_1$ and $Q_2$, representing rotations about axes $\mathbf{u}_1, \mathbf{u}_2$ and angles $\theta_1, \theta_2$. For you to obtain membership, write down an algebraic expression to do this. Now, in your expression – how do you guarantee that the rotation $Q_1$ is performed before $Q_2$, and not the other way around? Generalize your results to a sequence of $n$ rotations, where $n = 0, 1, 2, 3, \dots$

Finally, using the rotation matrices from the Lie group of rotations as an inspiration, suggest an easy way to invert a unit quaternion representing some rotation. Hint: think of the conjugation operation as well as replacing the rotation angle with its negative.

Exercise 34 (Drones) Having got wind of the renewed Project Death Star, the U.S. Military decides to send an Amazon drone to St.
George’s College to deliver a book entitled: Freedom. Inside this book is a 1 megaton nuclear warhead. Having not ordered this book, Ian Hardy decides to set up an old 20th century cannon on top of the college tower – but replaces the internal structure with a LAWS naval laser. The cannon is at an orientation of 45 degrees from the horizontal, facing the river (the x-axis), with the y-axis aligning with Tommy Moore College and the z-axis being vertically overhead. After setting the cannon to be the origin of some
3-dimensional real vector space – with $z = 0$ coinciding with the top of the tower – the Freedom Drone approaches the college at a coordinate of $(10, 7, 5)$ metres. Write down a single quaternion $Q$ that will rotate the cannon so that its line of sight is directed head-on at the Freedom Drone. Now use this quaternion to rotate the cannon and check that it indeed works. Hint: You may use three sequential rotations to obtain a single quaternion, or you can try to find the appropriate rotation axis $\mathbf{u}$ and angle $\theta$ to write down the quaternion in one go.

Finally, in the same manner that we were able to use the Lie algebra of rotations to perform vector cross products in 3 dimensions, we can also perform the scalar product (dot product) and vector cross product of 3-dimensional vectors using the isomorphism between 3-dimensional vectors and imaginary quaternions. In particular, given any two real 3-dimensional vectors $\mathbf{v}$ and $\mathbf{u}$, we represent them as quaternions $v$ and $u$ by replacing the standard basis vectors $\{e_1, e_2, e_3\}$ with the quaternionic imaginary units $\{i, j, k\}$ – which is what we did previously. Now, we can write their dot product and vector cross product in quaternion form:

$$\mathbf{v}\cdot\mathbf{u} = \tfrac{1}{2}\,(v\bar{u} + u\bar{v}) = \tfrac{1}{2}\,(\bar{v}u + \bar{u}v), \qquad \mathrm{cross}(\mathbf{v}, \mathbf{u}) = \tfrac{1}{2}\,(vu - \bar{u}\bar{v}). \tag{248}$$

The dot product comes out as a real scalar. For the cross product, replacing the quaternionic imaginary units with the standard basis vectors for $\mathbb{R}^3$ in the resulting quaternion (i.e. applying the inverse isomorphism) gives us the vector cross product of the two real vectors $\mathbf{v}$ and $\mathbf{u}$.

Exercise 35 (Mid-Semester Break) Deciding that university exams were too easy, the three amigos continued the renewed Project Death Star (v2.0) during the mid-semester break. As part of their daily intellectual exercise, help them verify that the above quaternion identities do indeed reproduce the dot and cross products computed by less-extravagant means.

This concludes our investigation of quaternions.
There are many more surprising and interesting properties, as well as their appearance in different areas of the mathematical sciences – perhaps one day, we will return to them. For now, Adieu and happy exams/holidays. Note: I expect everyone to complete unfinished tutorials before next semester! Michael Champion is back, so there is no hiding. :)
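For anyone who wants to check their answers to Exercises 32–35 numerically, the following is a minimal sketch, not a definitive implementation. It assumes quaternions are stored as plain 4-tuples $(a, b, c, d)$ standing for $a + bi + cj + dk$; the helper names (`qmul`, `qconj`, `qnorm`, `qinv`, `rotor`, `rotate`, and so on) are illustrative choices and do not appear in the tutorial. The rotation uses the convention $v' = Q v Q^{-1}$, under which the rotation is anticlockwise about the axis (right-hand rule); conjugating in the opposite order gives the reverse rotation.

```python
import math

# A quaternion a + b*i + c*j + d*k is stored as the tuple (a, b, c, d).
# All function names here are hypothetical helpers chosen for this sketch.

def qmul(p, q):
    """Hamilton product p*q (order matters: it is not commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def qconj(q):
    """Conjugate: flip the sign of the vector part."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm(q):
    """Euclidean length of the quaternion viewed as a point of R^4."""
    return math.sqrt(sum(x*x for x in q))

def qinv(q):
    """Inverse conj(q) / |q|^2, as in (245); q must be non-zero."""
    n2 = sum(x*x for x in q)
    return tuple(x / n2 for x in qconj(q))

def rotor(axis, theta):
    """Unit quaternion cos(theta/2) + u*sin(theta/2), as in (246).

    The axis is normalised first, so it must be a non-zero vector.
    """
    ux, uy, uz = axis
    length = math.sqrt(ux*ux + uy*uy + uz*uz)
    s = math.sin(theta / 2) / length
    return (math.cos(theta / 2), ux*s, uy*s, uz*s)

def rotate(vec, axis, theta):
    """Rotate vec by theta about axis using v' = Q v Q^{-1}."""
    q = rotor(axis, theta)
    v = (0.0, *vec)                      # embed vec as a purely imaginary quaternion
    _, x, y, z = qmul(qmul(q, v), qinv(q))
    return (x, y, z)

def dot_via_quaternions(v, u):
    """Dot product as (1/2)(v*conj(u) + u*conj(v)), first identity in (248)."""
    qv, qu = (0.0, *v), (0.0, *u)
    s = qmul(qv, qconj(qu))
    t = qmul(qu, qconj(qv))
    return 0.5 * (s[0] + t[0])           # the vector parts cancel

def cross_via_quaternions(v, u):
    """Cross product as (1/2)(v*u - conj(u)*conj(v)), last identity in (248)."""
    qv, qu = (0.0, *v), (0.0, *u)
    s = qmul(qv, qu)
    t = qmul(qconj(qu), qconj(qv))
    return tuple(0.5 * (x - y) for x, y in zip(s[1:], t[1:]))

if __name__ == "__main__":
    q = (1.0, 2.0, 3.0, 4.0)
    print(qmul(q, qinv(q)), qmul(qinv(q), q))   # both approximately (1, 0, 0, 0)
    # Quarter turn about the z-axis sends e1 to approximately e2.
    print(rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2))
    print(dot_via_quaternions((1, 2, 3), (4, 5, 6)))
    print(cross_via_quaternions((1, 2, 3), (4, 5, 6)))
```

Tuples were chosen over a class purely to keep the sketch short. For two successive rotations (Exercise 33), one would multiply the two rotors with `qmul` before conjugating, taking care over which factor sits on the left.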
14 Interlude: Academic and Intellectual Maturity

You may or may not be aware, but this semester the college has decided to ‘adopt’ some briefings on ‘academic maturity’. This is, in essence, to meet a strong demand/need that was implicit in Semester 1, but perhaps not explicit until a combined review of the academic status of students at this college. The purpose of this set of points, a subset of some initial pondering I did at the end of Semester 1, is to lay bare a set of essential facts (reviewed after several discussions) which I had erroneously taken for granted to be common knowledge.

Ultimately, the aim of the academic component of St. George’s College is to foster the development of wise, mature and competitively adept intellectuals who will make the most of their innate and external potential. As your tutor, who wants to see the best progress in your character development (not just academic), I am here to provide some guidance and support. Sometimes this includes stating the hard facts and hard truths. One would hope that you adopt these observations into your future judgements and considerations of how you orient yourselves in the real world – many of the lessons learned at university are just a reflection of the human condition at large.

14.1 Keeping a CV

For the specific purpose of illustrating what opportunities are available to mathematics and physics students, along with the general sort of ‘steps’ one must climb to attain those opportunities, I will upload my academic (Mathematics/Physics) Curriculum Vitae. Of course, everyone will have different journeys to take, but sometimes it helps to borrow ideas – or at least see what’s available on the menu.

Disclaimer: The CV which I will upload is a very ‘generic’ academic style one – with extra information at the end. Note that one would almost never submit a full-length CV when applying for scholarships, external workshops and schools or professional appointments.
In particular, one should always write a ‘purpose-specific’ tailored CV for applications. The generic full-length CV you keep is more of a personal record or ‘master file’ from which you can copy and paste / edit to appropriate length for applications.

Note that it’s extremely important to create and update a CV during your university training years. This is not just to help you apply for scholarships, travel or job opportunities – it’s also a very precise and encouraging way to keep track of your development! Personally, I would suggest using the typesetting program ‘LaTeX’
to create and update your CV (once you learn the minimal programming required, it’s more efficient than Word). This gives it a neater and more professional look. LaTeX is also the standard typesetting system used for most professional communication (journal articles, lecture notes, books etc.) in the academic world; for example, it makes referencing extremely easy. This makes Microsoft Office quite an outdated fossil which all academics should upgrade from.

14.2 Important Learnings and Observations

Unfortunately, with respect to selection criteria, there is an abysmal correlation between high school performance (ATAR scores) and university performance (coursework results). There are many reasons for this discrepancy.

• Fallacy: High ATAR = Intelligence = High University Performance.
• Fallacy: High ATAR = Strong Initiative and Self-Discipline.
• Fallacy: High ATAR = Self-Motivated and Passionate.
• Fallacy: High ATAR = Good Self Time-Management.
• Partial Truth: High ATAR = Well-developed study skills.

When people perceive that they are very successful – especially in a ‘be-all-end-all’ type affair, as the Year 12 ATAR scores are hyped up to be – they can become naively complacent. They assume that success with their ATAR entitles them to success at university, at life or that it bestows them with the eternal label: “Intelligent person”. Sometimes, in the few cases where this success was determined mostly by the individual student – and not the support of their teachers, parents, tutors or schooling system – complacency is somewhat justified. For the most part, it is (from my observations) pathological hubris.

This is not to say that students should not be rewarded or congratulated for success in high school – but rather, that they should know early on that ATAR scores are a very superficial means of assessing ‘potential for success’.
In this way, students can be made aware from a very early stage that they must be willing to learn and re-adapt themselves to new challenges and new scenarios that they are presented with. University is one such challenge. This should be part of the student’s gradual process of self-awareness, accountability and increased responsibilities, but also part of the university and college responsibility – if they wish to maintain students with ‘above average’ mindsets. To perform well at university – meaning maintaining an average of ≥ 90% (or ≥ 80% if you want to use the High-Distinction
standard at UWA) in coursework, a typical high-performing high-school student is suddenly presented with some cunningly ‘hidden’ challenges. These may or may not be obvious at the outset, depending on the perception and wisdom of the student. Nonetheless, they are summarised as follows:

• Success requires self-discipline and strategic time-management. For some people, this may require giving up things such as computer games or recreational skydiving. It also requires making lots of ‘lists’, keeping a diary and calendar – prioritising major tasks and actually getting through one’s lists. One must know one’s limits however – ‘down time’ is necessary to maintain optimal performance. However, there is a point where ‘down-time’ turns into avoidance and procrastination.

• Success requires self-motivation and self-regulation. It’s no longer the job of your teachers, parents or school to tell you what to do, or to inspire you to do well. Students who come from schools or families where all the hard behind-the-scenes work was ‘done for them’ are often slow to learn this lesson. It is in part because high school teaches students that their ATAR = their intelligence, without necessarily acknowledging the monumental efforts of some teachers. The converse may also be true – astounding individuals can emerge from abhorrent schools, entirely due to their own initiative.

• Success requires honest self-inventory. One bad habit that emerges from our present generation is the habit of ‘blame’ and misdirection. Sometimes when people under-perform, they look for excuses or people to blame – they may point to the lecturer, their tutors, the university, personal relationships or a whole plethora of external factors. The great contemporary intellect, Noam Chomsky, writes of this somewhere – in essence the habit of blame is partly embedded within modern media, which makes use of capitalistic ‘feel good’ psychology.
It also partly comes from high school, where teachers are held responsible and accountable for almost everything. At university, the student must take on the attitude that their learning, understanding and success is ultimately their own responsibility – their lecturers and courses are merely there to facilitate their journey, but the student must take their own steps. Often one finds that people who exhibit substandard performance, whilst being complacent, over-trivialise a mastery of skill or some accomplishment. This is most likely just another instance of the ‘Dunning-Kruger effect’ (see below).
At the end of the day, it’s no better to undersell yourself than to oversell yourself. Sure, you want to sell yourself well when applying for jobs, scholarships or various opportunities – or you may wish to undersell yourself so as not to alienate yourself from people who are easily intimidated and envious. However, in your own personal and private inventory of your abilities, you should strive to be very precise and accurate – this means having a good operational understanding of your weaknesses and strengths. As a general principle, one should always be working to improve their weaknesses and capitalize on their strengths – this alone will lead to a rise in grades and performance at university (and life in general).

• Dunning-Kruger effect.

“If you’re incompetent, you can’t know you’re incompetent. [...] the skills you need to produce a right answer are exactly the skills you need to recognize what a right answer is.” – David Dunning.

“Unskilled individuals suffer from illusory superiority, mistakenly rating their ability much higher than is accurate. This bias is attributed to a metacognitive inability of the unskilled to recognize their ineptitude.” [Kruger & Dunning, Journal of Personality and Social Psychology, Vol 77(6), Dec 1999, 1121-1134]

“Those persons to whom a skill or set of skills come easily may find themselves with weak self-confidence, as they may falsely assume that others have an equivalent understanding (Impostor syndrome).”

In summary, the Dunning-Kruger effect is a rife disease of the mind which can only be cured with ‘academic maturity’. Since it occurs in the population at large, it’s not unusual in any respect – for example, recalling Alexander Pope from 1709:

A little learning is a dangerous thing;
drink deep, or taste not the Pierian spring:
there shallow draughts intoxicate the brain,
and drinking largely sobers us again.
Similarly, one can find much earlier statements pertaining to humility and ability, for example, from Confucius – “Real knowledge is to know the extent of one’s ignorance.”

If you’re going to claim that you’re good or excellent at something (e.g. being intelligent), take a step back and ask yourself – in what context and by what measures are these claims met? If someone else claims this, ask yourself the same thing. On the other hand, if you claim to be bad at something, take a step outside of yourself and scope out the progress you’ve made – are
you really ‘bad’, or are you just in the process of learning and training? Certainly the latter provides a more useful and operationally efficient perspective [99]! Of course, ineptness does not mean being incapable of learning or unable to improve – indeed, a change of attitude can see inept people reaching a level of mastery in the relevant skill (here we speak of mathematics, problem solving, physics and logical thinking). Basically, Dunning-Kruger and false pride is an age-old problem, with the greatest irony being that sometimes inept people perceive those who are very adept as being arrogant or egotistical. C’est la vie.

14.3 Motivation

Despite the above critical analysis, one must conclude with the observation that amongst this group of students and tutors, we have some very talented individuals. For the most part, it is clear that the students and tutors who have attended tutorials so far have great potential in their future careers. One small, but perhaps obvious secret is that apart from a few cases of extraordinary innate genius or upbringing (Carl Friedrich Gauss), the majority of intellects who have had a dramatic and profound influence on science, art and the human condition in general were people who worked very hard behind the scenes. It’s not true that everyone can be as revolutionary or influential as Einstein, Dirac or David Hilbert, but it is true that with a small predisposition towards the mathematical sciences, accompanied by thousands of hours of mathematical entertainment and thought experiments, one can have a good chance at making important, interesting and lasting contributions to human knowledge and our understanding of the world around us. If you can convince yourself that it is something you enjoy, then the hours required to get to such a level aren’t really work at all – it’s just play.
Alternatively, you can make an immense amount of money with the strategic, organized and problem-solving mindset that the mathematical sciences equip you with – for example, the mathematician James H. Simons or the ex-theoretical physicists who crashed the Stock Market with their micro-trading AIs. Whatever your goal is, one fact remains – the better you perform, the more opportunities you create for yourself. Success is attractive, under-performance isn’t.

[99] Disclaimer: A grain of salt – you may also just be really bad.

“We at the height are ready to decline.
There is a tide in the affairs of (wo)men
Which, taken at the flood, leads on to fortune;
Omitted, all the voyage of their life
Is bound in shallows and in miseries.
On such a full sea are we now afloat,
And we must take the current when it serves,
Or lose our ventures.”
15 Tutorial 12: Metric Spaces and Relativity I

One way that we see progress in science is to take an everyday concept or principle and make it ‘abstract’. This means distilling the concept and extracting its fundamental elements so that the concept can be ‘re-applied’ to more general settings. Sometimes this is purely a creative affair, but almost always it leads to new applications, new ideas and entirely new fields of research.

In this tutorial, we take the everyday concept of ‘distance’ and mathematically formalize it to give a mathematical object known as a ‘metric’. We then couple this object with a set (e.g. a 3-dimensional space), to give a structure called a ‘metric space’. Such structures have fundamental applications to pure mathematics, optimization, computer science and physics.

Exercise 36 With the person closest to you, think of 3 examples of different notions of distance in everyday living. Furthermore, considering two distinct (possibly abstract) objects or locations, state two notions of distance that give different values for the distance between those objects.

15.1 Metric Spaces

To make the concept of ‘distance’ mathematically concrete, we formalize it with the notion of a ‘metric space’.

Definition 9 A metric space is a collection $(S, d)$ consisting of a set $S$ and a metric $d$ on $S$, which is a map $d : S \times S \to \mathbb{R}$ characterised by the following properties, for any $x, y, z \in S$:

1. positivity:
$$d(x, y) \geq 0, \tag{249}$$

2. non-degeneracy:
$$d(x, y) = 0 \ \text{iff}\ x = y, \tag{250}$$

3. symmetry:
$$d(x, y) = d(y, x), \tag{251}$$

4. triangle inequality:
$$d(x, z) \leq d(x, y) + d(y, z). \tag{252}$$
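The four properties in Definition 9 can be spot-checked by brute force on a finite sample of points. The sketch below is one hedged way to do this; the function name `check_metric_axioms`, the tolerance `tol` and the two candidate maps on the real line are all illustrative assumptions rather than anything defined in the tutorial. Finding no violations on a sample is only evidence, not a proof, whereas a single violation does show that a candidate fails to be a metric.

```python
from itertools import product

def check_metric_axioms(d, points, tol=1e-12):
    """Return a list of axiom violations of d found on the finite sample `points`.

    An empty list means no counterexample was found among the samples.
    `tol` absorbs floating-point rounding; it is an assumption of this sketch.
    """
    violations = []
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:                        # property 1, equation (249)
            violations.append(("positivity", x, y))
        if (abs(d(x, y)) <= tol) != (x == y):     # property 2, equation (250)
            violations.append(("non-degeneracy", x, y))
        if abs(d(x, y) - d(y, x)) > tol:          # property 3, equation (251)
            violations.append(("symmetry", x, y))
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:     # property 4, equation (252)
            violations.append(("triangle inequality", x, y, z))
    return violations

# Two hypothetical candidates on the real line, chosen only for illustration.
samples = [-2.0, -0.5, 0.0, 1.0, 3.5]
absolute = lambda x, y: abs(x - y)
squared = lambda x, y: (x - y) ** 2

print(check_metric_axioms(absolute, samples))     # no violations on this sample
print(check_metric_axioms(squared, samples)[:1])  # a triangle-inequality failure
```

The squared difference is positive, non-degenerate and symmetric, so it shows that the triangle inequality is a genuinely separate requirement: for example, going from $-2$ to $0$ directly "costs" 4, while passing through $-0.5$ costs only $2.25 + 0.25$.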
Conceptually, the metric $d$ gives us a measure of ‘distance’ on the set $S$. Note that the above properties are somewhat ‘intuitive’ – the symmetric property (3) just says that the distance from point A to point B should be the same as the distance from point B to point A. One could relax this property to obtain a more general notion of distance, for example, on a ‘directed graph’.

Problem 22 (Warm-up (Challenging)) It is possible to derive the first property (positivity) of a metric space from the other properties. In this manner, one may view properties 2-4 as fundamental axioms and property 1 as a consequence – meaning one could technically discard it from the definition of a metric space. Assuming properties 2-4 hold, prove that property 1 follows.

15.1.1 Euclidean Metric Spaces

We now consider a simple, but important example of a 1-dimensional metric space $(\mathbb{R}^1, d)$, where $d$ is the map given by the ‘absolute value’ operation.

Example 10 (Absolutely Easy) Given the set $\mathbb{R}$ of real numbers (which we can represent as the real number line), a ‘Euclidean’ metric $d$ is given by the absolute value of the difference. In particular, for any two numbers $x, y \in \mathbb{R}$, we define:

$$d(x, y) = |x - y|, \tag{253}$$

to be the distance between the points $x$ and $y$. To see that $d$ is indeed a metric on $\mathbb{R}$, we must check that $(\mathbb{R}, d)$ satisfies the four axioms of a metric space. In particular, let $x, y, z \in \mathbb{R}$ be any points on the real number line. The first three properties follow directly from the properties of the absolute value – the triangle inequality (fourth property) requires a bit more work.

1. Positivity: $d(x, y) = |x - y| \geq 0$ follows from the properties of the absolute value.

2. Non-degeneracy: $d(x, y) = |x - y| = 0$ holds if and only if $x = y$.

3. Symmetry: $d(x, y) = |x - y| = |y - x| = d(y, x)$.

4. Triangle inequality: $|x - z| \leq |x - y| + |y - z|$. To see that this property
holds, consider the following proof:

$$-|a| \leq a \leq |a| \ \text{and}\ -|b| \leq b \leq |b| \implies -(|a| + |b|) \leq a + b \leq |a| + |b| \implies |a + b| \leq |a| + |b| \tag{254}$$

for any $a, b \in \mathbb{R}$. In particular, if we let $a = x - y$ and $b = y - z$ we see that

$$|x - y + y - z| = |x - z| \leq |x - y| + |y - z|, \tag{255}$$

hence proving that the map $d(x, y) = |x - y|$ satisfies the triangle inequality. Alternatively, here is another proof of the triangle inequality:

$$(|a + b|)^2 = a^2 + b^2 + 2ab = |a|^2 + |b|^2 + 2ab \leq |a|^2 + |b|^2 + 2|a||b| = (|a| + |b|)^2. \tag{256}$$

Since both sides of the above inequality are non-negative, we can take the square root [100] of both sides, giving

$$|a + b| \leq |a| + |b| \quad \forall a, b \in \mathbb{R}. \tag{257}$$

Hence it follows that $(\mathbb{R}, d = |\cdot|)$ is a 1-dimensional metric space. We call this 1-dimensional ‘Euclidean Space’.

[100] Because the real square root is a monotonic increasing function, it does not affect the direction of the inequality.

Problem 23 (Challenge) With a simple trick, one can use the 1-dimensional triangle inequality to prove the ‘reverse triangle inequality’:

$$\big||a| - |b|\big| \leq |a - b|. \tag{258}$$

Do this if you finish the rest of the tutorial.

Although the 1-dimensional Euclidean metric space we previously considered was very simple, it serves as a ‘building block’ for more complicated metric spaces – for example, your familiar 2-dimensional Euclidean space.

Example 11 In two-dimensional Euclidean space (flat-land) $\mathbb{R}^2$, we can measure the ‘straight-line’ distance between two points using Pythagoras’ theorem. First we set up a 2D Cartesian coordinate system with $(0, 0)$ as the origin. Now, taking
points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$, one can draw a right-angled triangle with the vector $v = (x_2 - x_1, y_2 - y_1)$ pointing from $P_1$ to $P_2$ as its hypotenuse. In terms of these coordinates, we can then measure the ‘Euclidean distance’ between $P_1$ and $P_2$ to be the length of the hypotenuse of the previous triangle, as given by Pythagoras’ theorem [101]:

$$d(P_1, P_2) = \|v\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}. \tag{259}$$

It should be an easy exercise to verify that the 2-dimensional Euclidean metric $d$, given by the Pythagoras rule, is indeed a metric on 2-dimensional real space $\mathbb{R}^2$. Note however, that proving the 2-dimensional triangle inequality may require some cunning.

Exercise 37 (Easy (but not so easy)) Prove that the 2-dimensional Euclidean map $d$ given by (259) is indeed a metric on $\mathbb{R}^2$. In other words, prove that $(\mathbb{R}^2, d)$ satisfies the four axioms of a metric space.

Hint: Use Cartesian coordinates to label your points $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$, $P_3 = (x_3, y_3)$. To show that the 2-dimensional triangle inequality holds, one may consider the ‘geometrical’ proof by Euclid in Book 1 of Euclid’s Elements. Note that this proof should suffice for Euclidean spaces of arbitrary dimension, since any three points must lie in some 2-dimensional plane – an affine subspace of your total space, whence Euclid’s proof can be applied. Alternatively, to prove that

$$d(P_1, P_3) \leq d(P_1, P_2) + d(P_2, P_3) \tag{260}$$

for any points $\{P_1, P_2, P_3\}$ in $\mathbb{R}^2$, one can make use of the ‘1-dimensional triangle inequality’ proven earlier or prove an inequality known as the ‘Cauchy-Schwarz’ inequality (easy-but-not-so-easy).

For 3-dimensional real space, $\mathbb{R}^3$, we have the additional knowledge that it can be naturally equipped with a vector-space structure. In particular, recall that any point $(x, y, z)$ can be represented by a vector pointing from the origin $(0, 0, 0)$ to $(x, y, z)$ and that the addition and subtraction of ‘points’ simply corresponds to vector addition and subtraction.
[101] Recall, for a right-angled triangle with sides of length $a$, $b$ and $c$, where $c$ is the hypotenuse, one has $a^2 + b^2 = c^2$.

In this manner, a notion of distance between points A and B can be given by the Euclidean length of the vector between those points – that is, the ‘norm’ (recall that the norm of a vector $v = (x, y, z)$ in the standard basis is simply given by $\|v\| = \sqrt{x^2 + y^2 + z^2}$) of the vector pointing from A to B (or vice versa),
given by vector subtraction. This notion of distance is simply a 3-dimensional version of Pythagoras’ theorem, with A, B and the origin $O = (0, 0, 0)$ making up three points of a right-angled triangle. Therefore, for $A = (a_1, a_2, a_3)$, $B = (b_1, b_2, b_3) \in \mathbb{R}^3$, we see that

$$d(A, B) = \|\overrightarrow{AB}\| = \|B - A\|, \tag{261}$$

gives us our standard metric on $\mathbb{R}^3$, making $(\mathbb{R}^3, d)$ a 3-dimensional Euclidean metric space. Intuitively, one can easily generalize this observation to define a Euclidean metric on n-dimensional real space $\mathbb{R}^n$ for any dimension [103] $n \geq 0$.

Exercise 38 (Cauchy-Schwarz Inequality (Challenge)) Given any vector space $V$ (for example $\mathbb{R}^n$) equipped with a positive-definite inner product $\langle \cdot, \cdot \rangle$ (e.g. the dot product) and some corresponding induced ‘norm’ $\|\cdot\|$ (some notion of length, e.g. $\|v\| = \sqrt{v \cdot v}$), it follows that

$$|\langle v, u \rangle| \leq \|v\| \cdot \|u\|, \tag{262}$$

where $u, v$ are vectors in $V$. For the case of the dot product, $\langle u, v \rangle := u \cdot v$, prove that the Cauchy-Schwarz inequality holds on $\mathbb{R}^n$.

Hint: Take $n = 3$ for simplicity. Also, note that technically you cannot use the identity $u \cdot v = \|u\|\,\|v\| \cos(\theta)$ (where $\theta$ is the ‘angle’ between $u$ and $v$) since this explicitly relies on the Cauchy-Schwarz inequality being true! Indeed, such an identity is a characterising quality of Euclidean geometry.

Hint: You may want to consider expressing $\|u\|^2 \|v\|^2$ as a sum of dot products and cross products – this was used by Lagrange.

15.1.2 Fun Metric Spaces

In this section, we consider some ‘less obvious’ examples of metric spaces. Formally, these are referred to as ‘Fun spaces’, after the German mathematician, Friedrich P. Fun [104]. Apart from being fun, some of these metric spaces have powerful applications to the real world.

[103] Note that a zero-dimensional space, $\mathbb{R}^0$, would simply correspond to a single point and hence would trivially satisfy the axioms of a metric space.

[104] This statement is false.
Exercise 39 (The Taxicab Metric) The Taxicab metric, or ‘rectilinear distance’, was first formally considered by Minkowski [105] and can be defined on any real vector space, $\mathbb{R}^n$, to make a metric space formally denoted by $l^1(\mathbb{R}^n)$ (little ‘$l^1$’). It is nicknamed the ‘taxicab’ or ‘Manhattan’ metric because the distance between points on a 2-dimensional space, $\mathbb{R}^2$, is computed using an L-shaped pattern as a taxi driver in Manhattan would travel (rather than ‘as the crow flies’, which would give the ‘Euclidean’ distance).

Q1: Prove that the map $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$, defined by

$$d(P_1, P_2) = |x_2 - x_1| + |y_2 - y_1| \tag{263}$$

where $P_j = (x_j, y_j)$ are points in $\mathbb{R}^2$, defines a metric on $\mathbb{R}^2$.

Q2: In general usage, a circle refers to a 1-dimensional manifold which we usually view as an object embedded in a 2-dimensional plane (a graph). However, a more general definition is to consider a circle as a subset of points in $\mathbb{R}^2$ which are equidistant by some fixed amount $r$ from some chosen point which we call the ‘centre’ of the circle. Mathematically, if we take the centre to be the origin $(0, 0)$, then a circle $S^1(r = \text{radius}, 0 = \text{centre})$ is the locus of points $x \in \mathbb{R}^2$ such that

$$d(x, 0) = r. \tag{264}$$

Using this definition, work out what a circle of ‘radius 1’ centred at the origin would look like with the taxicab metric. Now, draw another circle with the same radius and origin on top of your original circle, but this time using the standard Euclidean metric. What do you notice?

Exercise 40 (The $l^p$ Family) In an attempt to digitally preserve old music, Matt Fernandez buys one million LP records. While listening to these records in Tower C2 and bragging about his view of the river, Matt decides to generalize the Taxicab metric in the following way:

$$d_{l^p}(X, Y) := \left(|y_1 - x_1|^p + |y_2 - x_2|^p + \dots + |y_n - x_n|^p\right)^{\frac{1}{p}} \tag{265}$$

where $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$ are points in n-dimensional real space, $\mathbb{R}^n$. Note the constant $p$ that appears in the exponents as a fixed power.
The subscript $l^p$ in $d$ is simply a label to keep track of which metric we are referring to.

[105] A mentor to Einstein and German mathematician responsible for the ‘4-dimensional’ model of Special Relativity.

Q1: Clearly, when $p = 1$ this gives us the taxicab metric on $\mathbb{R}^n$ (set $n = 2$ if you need to convince yourself). Now verify this.
Q2: When $p = 2$, what familiar metric do you get? Hint: Try setting $n = 2$ or $n = 3$ for simplicity.

Q3: Recalling the earlier definition of a circle $S(r, 0)$ of radius $r$ and centre 0 as the set of points $X \in \mathbb{R}^2$ such that

$$d_{l^p}(X, 0) = r, \tag{266}$$

draw what a circle of radius $r = 1$ and centre $(0, 0)$ would look like in $\mathbb{R}^2$ when

• $p = 1$ and $p = 2$
• $0 < p < 1$, for example $p = \frac{1}{2}$.
• $p = 100$ or some other large number.

In particular, what do you notice happens when $p \to 0$? If you start at $p = 1$ then vary $p$ to 0, you should get a Euclidean rhombus (diamond) whose edges ‘collapse’ inward toward the origin while its vertices remain fixed at $(\pm 1, 0)$ and $(0, \pm 1)$. Now what happens when $p \to \infty$? Your circle should look like a Euclidean square!

Hint: You are essentially solving the equation $d((x, y), 0) = 1$, using the above expression for your metric $d$ for the stated value of $p$. The set of points that form a solution to this equation should give you a graph of your ‘circle’.

Hint: It helps to consider the four different quadrants of the Cartesian plane separately, in order to get rid of the absolute value signs. This means $x \geq 0, y \geq 0$, then $x < 0, y \geq 0$, etc. It also helps to choose a few points, e.g. $(1, 0)$, which satisfy the equation $d((x, y), 0) = 1$, plot them and see if you can guess the overall pattern.

Q4: Challenge Prove that the map $d_{l^p}$ on $\mathbb{R}^n$ is indeed a metric for $p \geq 1$ (for $0 < p < 1$ the triangle inequality fails, as your drawings above may suggest). Hint: You will need the Cauchy-Schwarz inequality (or its generalization, Hölder’s inequality), one way or another, to prove that the triangle inequality is satisfied.

Q5: Challenge If you did your math correctly, you should see that when $p \to \infty$ our circle becomes a (Euclidean) square!! The special case when $p = \infty$ is mathematically very important – but also fun. In particular, the metric $d_\infty$ is referred to as the ‘Chebyshev’ distance [106], the ‘supremum distance’ – or colloquially as the ‘chessboard metric’.
Using some sorcery of inequalities and your knowledge of the algebra of limits, prove that in the limit p → ∞ (the case ‘p = ∞’):
$$ d_{l^\infty}(X, Y) := \lim_{p \to \infty} d_{l^p}(X, Y) = \max_{i = 1, \dots, n} \{ |y_1 - x_1|, |y_2 - x_2|, \dots, |y_n - x_n| \}. \tag{267} $$
This simply says that the distance between two points is the magnitude of the largest difference between their corresponding coordinates. An easy way to think of this is that on a chessboard, the minimum number of moves that a King requires to move from one particular square to another particular square is equal to the $d_{l^\infty}$ distance between the squares.

Hint: For simplicity, you can set n = 2 dimensions, then try a proof for general n afterwards.

[106] After the 19th-century Russian mathematician Pafnuty Chebyshev.

Exercise 41 (The Phlegethon River metric) [107] The Phlegethon river features in Greek literature as one of the five rivers of the underworld. More importantly, in Dante's Inferno, it features as a river of boiling blood in which murderers such as Attila the Hun are tortured for eternity. Considering the portion of the seventh circle of hell containing the Phlegethon river, we can view this as a 2-dimensional space. Since we are residents of St. George's College, an Anglican college, the centaur Nessus agrees to give us free rides along and across the Phlegethon river whenever we please. Setting up a coordinate system with the x-axis along the Phlegethon river and the y-axis in the perpendicular direction, we now distinguish places in the seventh circle according to the distance we have to walk to travel between them. In this respect, the ‘shortest’ path between places will typically make use of free boat rides from Nessus along the Phlegethon river, and we can use this notion of ‘shortest’ path to define a metric! Mathematically, we can define the ‘Phlegethon River metric’ $d_{River}$ on $\mathbb{R}^2$ to be:
$$ d_{River}(P_1, P_2) = \min \{ d_{l^2}(P_1, P_2), |y_2 - y_1| \} \tag{268} $$
where $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ are points in $\mathbb{R}^2$ and $d_{l^2}(P_1, P_2)$ is the standard Euclidean or $l^2$ distance between $P_1$ and $P_2$.

Q1: Choosing three distinct points, compute the ‘Phlegethon distance’ between them. Draw a diagram to illustrate the route you would take (shortest path) between points. What do you notice?
Hint: To see something interesting, choose points for which $y_2 \neq y_1$ and take at least one point to be on the opposite side of the Phlegethon river (x-axis).

Q2: Intermediate Recalling the earlier definition of a circle $S^1(r, 0)$ of radius r centred at the origin, draw what a circle of radius 7 and center (0, 0) would look like with the Phlegethon river metric.

[107] From recollection, Professor Brailey Sims referred to this as the ‘Jungle River Metric’.
Now, shift this circle vertically by 2 units, i.e. draw another circle of radius 7 with center (0, 2). Now sketch a circle of radius 1 centred at (2, 2).

Hint: If you get something crazy, it's probably correct. In particular, your first ‘circle’ should [108] just be two straight lines at y = ±3.5 extending to x = ±∞. Your second circle should look like a ‘Euclidean circle’ which ‘opens up’ and stretches out to x = ±∞ at y = −1.5. [109] Your third circle should look like the graph of an ordinary Euclidean circle.

Q3: Easy(ish) Prove that the Phlegethon River metric is indeed a metric. Hint: Since we already proved that $l^2$ and $l^1$ were metrics, it shouldn't take much to show that the Phlegethon metric on $\mathbb{R}^2$ satisfies the metric space axioms.

Q4: Challenge The Phlegethon metric we proposed made use of the Euclidean metric, $d_{l^2}$. Replacing this part of the definition with the general $l^p$ family of metrics, prove or disprove that
$$ d_{River}(P_1, P_2) = \min \{ d_{l^p}(P_1, P_2), |y_2 - y_1| \} \tag{269} $$
gives rise to a whole family of different metrics for 0 ≤ p ≤ ∞. We shall refer to such metrics as the ‘infernal p-metrics’. In particular, sketch what a circle of radius 1 and center (0, 0) would look like for p = 0.5, p = 1, p = 3, p = ∞. This should be an easy extension of the p = 2 case and your sketches for the circles under the $d_{l^p}$ metrics.

Exercise 42 (Cryptographer's Metric) An essential part of cryptography and information theory is ‘error detection’ and ‘error correction’. In particular, information is often delivered as a string of symbols or digits. In the modern world, most information is delivered in ‘binary’, meaning a base two system consisting of zeros and ones. Like your usual base 10 numbers, a binary string of digits represents a number.
[108] Acknowledgement to Matthew, Theresa and William for pointing out a flaw in my memory.
[109] The idea of an ‘exploded, bleeding circle’ seems befitting for the Phlegethon.

This number may correspond to some letter of the English or Cyrillic alphabet (for example), and often it comes with some ‘cipher’, meaning a rule that puts binary strings into a 1-1 correspondence with letters in your alphabet. Sometimes when information is transmitted, errors can accrue: for example, digits in the sent binary string get ‘flipped’ (meaning zeros get replaced by ones and vice versa). An obvious way to measure how much two strings of symbols (e.g. binary strings) differ is to count the number of positions at which the corresponding symbols of the two strings are different. In other words, one counts the minimum number of alterations (‘bit flips’ for binary strings) required to transform one string into the other. Mathematically, this gives us a metric known as the ‘Hamming distance’ between strings:
$$ d(\text{String}_1, \text{String}_2) = \#\ \text{of positions where String 1 and String 2 differ}. \tag{270} $$
For binary strings, e.g. $S_1 = (1, 0, 1, 1, 0)$ and $S_2 = (0, 0, 0, 0, 0)$, we can compute the Hamming distance easily by considering $S_1$ and $S_2$ as 5-dimensional vectors of integers modulo 2, i.e. $S_1 = (a, b, c, d, e)$ where $a, \dots, e \in \mathbb{Z}_2$, etc. [110] Subtracting the strings as vectors to form an ‘error’ vector, $\epsilon = (\text{String}_1 - \text{String}_2) \bmod 2$, a 1 appears in positions where String 1 and String 2 differ and a 0 appears where they agree. Therefore, we can write the Hamming distance between two binary strings, String 1 and String 2, as
$$ d(\text{String}_1, \text{String}_2) = \text{sum of all entries of } \epsilon, \tag{271} $$
where $\epsilon = (\text{String}_1 - \text{String}_2) \bmod 2$ [111] and the entries are added as ordinary integers.

Q1: Prove that the Hamming distance gives a metric on the set of ‘strings’ or ‘codewords’ of a given length L (note that a string of length L means an L-dimensional vector with symbols as its components).

Q2: When transmitting information, the ‘Hamming distance’ between the transmitted string and the received string is equal to the ‘minimum number of errors’ that could have occurred in the communication process. Explain why it is specifically equal to the ‘minimum’ number of possible errors, as opposed to, say, the ‘maximum’ number.

Q3 (Challenge): For binary strings, an n-bit error means that the Hamming distance between the transmitted and received string is equal to n, where n is some integer. One can ‘error-protect’ codewords/strings by appending ‘extra bits’ to the transmitted string, in some clever fashion.
This assumes that all codewords/strings differ by some ‘minimum’ distance $d_{min}$ to begin with. Note that all distinct strings necessarily differ by at least 1 digit, but by choosing large enough strings to carry smaller amounts of information (block codes), one can increase error-correction capacity. In essence, by adding more bits one can protect against larger errors.

[110] Recall that ‘modulo 2’ means that all positive and negative even integers become equivalent to 0 and all positive and negative odd integers become equivalent to 1.
[111] Note that it doesn't matter in which order we subtract the strings, since doing the subtraction modulo 2 gives the same result. One could even add the strings.
Can you think of a decoding algorithm to protect strings against 1-bit errors? Here you will need to assume strings have some length n = a + b, where a is the number of information digits and b is the number of error-correction digits.

Q4 (Extra Challenge): Having completed or searched for the solution to the last problem, what is the minimum number of extra bits required to protect 8-bit codewords/strings from 1-bit errors? What is the minimum number of extra bits required to protect against 2-bit errors? Can you devise ‘the most efficient’ error correction algorithm to protect n-bit words against k-bit errors, where k < n? If you can, prove that this is the most efficient. Upon completing this exercise, you've effectively reproduced the research of Richard Hamming in 1950 regarding the creation of linear error-correcting codes.

Exercise 43 (The Discrete Metric) Given a set S, one can endow it with the ‘discrete metric’ d. This is defined by:
$$ d(x, y) = \begin{cases} 1, & \text{if } x \neq y \\ 0, & \text{if } x = y \end{cases}, \tag{272} $$
for any members x, y of the set S. Note that every set (and so the underlying set of any topological space) can be equipped with this metric. Although it seems bizarre, the discrete metric is immensely useful for providing ‘counter-examples’ in topology as well as playing an essential role in some proofs.

Q: Prove that the discrete metric is indeed a metric.
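If you would like to experiment with the metrics from Exercises 40 to 43 before attempting the proofs, the following is a minimal Python sketch. The function names (`lp_distance`, `chebyshev_distance`, `hamming_distance`, `hamming_distance_mod2`, `discrete_distance`) are illustrative choices of ours, not notation from these notes, and a numerical experiment is of course no substitute for a proof.

```python
def lp_distance(x, y, p):
    """l^p distance between two points given as equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(abs(b - a) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev_distance(x, y):
    """The 'p = infinity' (chessboard) distance: largest coordinate difference."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return max(abs(b - a) for a, b in zip(x, y))


def hamming_distance(s1, s2):
    """Number of positions at which two equal-length strings differ, as in (270)."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(1 for a, b in zip(s1, s2) if a != b)


def hamming_distance_mod2(s1, s2):
    """Same distance for binary strings, via the error vector of (271)."""
    error = [(a - b) % 2 for a, b in zip(s1, s2)]  # 1 where the bits differ
    return sum(error)  # entries added as ordinary integers


def discrete_distance(x, y):
    """Discrete metric of (272): 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1


if __name__ == "__main__":
    X, Y = (0.0, 0.0), (3.0, 4.0)
    # Watch the l^p distance approach the Chebyshev distance as p grows.
    for p in (1, 2, 4, 16, 64):
        print(p, lp_distance(X, Y, p))
    print("chebyshev", chebyshev_distance(X, Y))
    print("hamming", hamming_distance((1, 0, 1, 1, 0), (0, 0, 0, 0, 0)))
```

Running the script prints the $l^p$ distance between (0, 0) and (3, 4) for increasing p, so you can watch it settle towards the largest coordinate difference, which is the content of (267).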
15.2 Non-Euclidean Metric Spaces and Relativity

When we used ‘Pythagoras’ theorem’ to give us notions of distance in n-dimensional real space, $\mathbb{R}^n$, we constructed ‘Euclidean metric spaces’ which give rise to ‘Euclidean geometry’, summarized by Euclid of Alexandria in his ‘Elements’ at around 300 BC. One thing that characterises such geometry is the parallel postulate: that two straight lines which are parallel (and non-overlapping) never intersect. Another characterising property is that the sum of angles in a triangle is always 180 degrees.

From around the turn of the 19th century, due in large part to the work of the German mathematician C.F. Gauss, ‘non-Euclidean’ notions of geometry began to emerge. In particular, geometries where ‘straight lines’ did intersect. Such advances propelled the field of ‘differential geometry’ and the work of Gauss' student, Bernhard Riemann, which underpins General Relativity, vast areas of applied mathematics, information theory, engineering and most of modern physics.

A simple example of a non-Euclidean geometry is the planet we live on! If we consider the Earth a sphere (or oblate spheroid), ‘straight lines’ (geodesics) correspond to ‘great circles’ on the sphere: these are circles whose plane passes through the centre of the sphere. All great circles necessarily intersect each other, so there exists no ‘straight line’ on a sphere which has a parallel. Similarly, angles in a spherical triangle sum to more than 180 degrees but less than 540 degrees.

Taking this notion further, together with Albert Einstein, the German mathematician Hermann Minkowski developed a model of the physical world as a ‘4-dimensional spacetime’ with a special metric known as the ‘Lorentz metric’. In this manner, the relativistic laws of physics can be naturally obtained by considering spacetime as a 4-dimensional hyperbolic space, rather than the Newtonian 3 + 1 dimensional [112] Euclidean space with which we are so familiar.
In the next tutorial, we will use our knowledge of metric spaces to investigate Minkowski's ideas.

[112] In Newtonian physics, one considers space and time as fundamentally ‘separate’ and independent entities, meaning one 3-dimensional space and events measured by some universal clock (one-dimensional time). In relativity, the 3+1 or ‘space + time’ split occurs when an observer decomposes events into his 3-dimensional rest space and the 1-dimensional subspace orthogonal to it, his ‘time’ axis.
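To see the claim about spherical triangles in action, here is a small Python sketch that computes the angle sum of a triangle drawn on the unit sphere. It assumes the vertices are given as unit vectors in $\mathbb{R}^3$ and that the sides are great-circle arcs; the helper names (`vertex_angle`, `angle_sum_degrees`) are our own and purely illustrative.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _tangent_towards(a, b):
    """Unit tangent vector at point a pointing along the great circle towards b."""
    proj = _dot(a, b)
    t = tuple(bi - proj * ai for ai, bi in zip(a, b))  # remove the component along a
    norm = math.sqrt(_dot(t, t))
    return tuple(ti / norm for ti in t)


def vertex_angle(a, b, c):
    """Angle (in radians) at vertex a of the spherical triangle abc on the unit sphere."""
    cos_angle = _dot(_tangent_towards(a, b), _tangent_towards(a, c))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.acos(cos_angle)


def angle_sum_degrees(a, b, c):
    """Sum of the three interior angles of the spherical triangle abc, in degrees."""
    total = vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)
    return math.degrees(total)


if __name__ == "__main__":
    # One octant of the sphere: 'north pole' plus two points on the equator.
    A, B, C = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    print(angle_sum_degrees(A, B, C))
```

The octant triangle used in the example has three right angles, so its angle sum is 270 degrees, comfortably between the bounds of 180 and 540 degrees quoted above.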
16 Tutorial 13/14: Relativity and Hyperbolic Distance

The outcome of the last study group session should have been a mathematical understanding of the following key ideas:

• We can mathematically formalize the notion of ‘distance’ between points in a set by endowing that set with a ‘metric’, a function which is characterised by a set of intuitive axioms.
• Metric spaces have vast applications to the real world: recall the Euclidean, Taxicab and Hamming metrics. More generally, many optimization problems, such as optimizing fuel consumption or finding lowest-risk paths through naval minefields for warships, can be viewed as ‘minimizing’ some abstract ‘distance’.
• Our perceptions of ‘geometry’, such as the ‘shape’ of objects, are fundamentally tied to the metric space we choose to work in. Recall that ‘circles’, for example, can look entirely different to the usual ‘Euclidean circle’ simply through a new choice of metric.

In this tutorial, we will revise elementary notions of ‘circular’ [113] and hyperbolic trigonometry. We then see how one can use this knowledge to represent different notions in Einstein's theory of Special Relativity, hence showing how properties of nature emerge from the notion of a metric space.

[113] That is, the standard trigonometry of sines, cosines, etc.

16.1 The Two Faces of Trigonometry

16.1.1 The Circular Face

Today, we shall proclaim Janus to be the patron of trigonometry. Recall that the geometric definitions of the sine, cosine and tangent functions arise by considering a right-angled triangle inscribed in a circle whose radius is the hypotenuse of said triangle. Now let θ be the angle between the hypotenuse and another radial line of the triangle, as in the figure below. By setting the radius of the circle r = 1, one then sees that the fundamental trigonometric identity:
$$ \sin^2(\theta) + \cos^2(\theta) = 1, \tag{273} $$
is simply a consequence (or expression) of Pythagoras' Theorem for right-angled triangles. For this reason, one may consider trigonometry to be the ‘geometry of the circle’.

Figure 4: Illustrating trigonometry as the ‘geometry of the circle’.

N.B: If you consider the two shorter sides of the triangle to be the x, y variables in the Cartesian plane as in the diagram, then the equation for the graph of a circle is also just a restatement of Pythagoras' theorem: the circle is the set of all points $(x, y) \in \mathbb{R}^2$ which satisfy the equation
$$ x^2 + y^2 = r^2. \tag{274} $$

Exercise 44 (Easy Warm-up) In the above formula, we chose the radius r = 1. Now, let r be arbitrary and use your standard trigonometry techniques to re-express Pythagoras' theorem:
$$ a^2 + b^2 = c^2, \tag{275} $$
in terms of sine and cosine, where c is the hypotenuse of the triangle and a, b are the two other sides. In particular, one should see a cancellation of $r^2$ from both sides, thus proving the fundamental trigonometric identity.
Exercise 45 (Easy) Recall that the tangent function is geometrically defined in terms of the ratio of the sides opposite and adjacent to the given angle θ:
$$ \tan(\theta) = \frac{\text{opposite}}{\text{adjacent}} = \frac{\sin(\theta)}{\cos(\theta)}. \tag{276} $$
Similarly, recall that the reciprocal trigonometric functions are defined as:
$$ \cot(\theta) = \frac{1}{\tan(\theta)}, \qquad \csc(\theta) = \frac{1}{\sin(\theta)}, \qquad \sec(\theta) = \frac{1}{\cos(\theta)}. \tag{277} $$
From this, show that the identities:
$$ 1 + \tan^2(\theta) = \sec^2(\theta), \qquad 1 + \cot^2(\theta) = \csc^2(\theta), \tag{278} $$
simply follow from the fundamental trigonometric identity.

Now recall Euler's formula [114]
$$ e^{i\theta} = \cos(\theta) + i \sin(\theta), \tag{279} $$
which relates the complex exponential function to trigonometry and the geometry of the complex plane $\mathbb{C}$. This is somewhat intuitive if we represent a complex number z = x + iy (where $x, y \in \mathbb{R}$) in polar form:
$$ z = r e^{i\theta} \tag{280} $$
where $r = \sqrt{x^2 + y^2}$ and θ is the angle between the vector z and the real axis (measured anticlockwise from the positive real axis, y = 0). Then as θ varies from 0 to 2π, the vector z traces out a circle of radius r, in which we can inscribe a right-angled triangle with sides of length r cos(θ) and r sin(θ).

Exercise 46 (Moderate) Using Euler's formula, prove that we can express sine and cosine as follows:
$$ \cos(\theta) = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin(\theta) = \frac{e^{i\theta} - e^{-i\theta}}{2i}. \tag{281} $$
Hint: It's easy to forget the factor of i in the denominator of the sine formula.

[114] Recall the imaginary unit i is defined such that $i^2 = -1$.
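Before proving (281), it can be reassuring to check it numerically. The sketch below uses Python's standard `cmath` and `math` modules; the function names `cos_from_exponentials` and `sin_from_exponentials` are hypothetical labels of ours. A numerical check at a handful of angles is only a sanity test, not a proof.

```python
import cmath
import math


def cos_from_exponentials(theta):
    """Right-hand side of the cosine formula in (281)."""
    return (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2


def sin_from_exponentials(theta):
    """Right-hand side of the sine formula in (281); note the 2i in the denominator."""
    return (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / (2j)


if __name__ == "__main__":
    for theta in (0.3, 1.0, 2.5):
        # Compare against the library sine and cosine; differences should be tiny.
        print(theta,
              abs(cos_from_exponentials(theta) - math.cos(theta)),
              abs(sin_from_exponentials(theta) - math.sin(theta)))
```

The printed differences should be at the level of floating-point rounding error. If you drop the factor of i from the sine formula, the second difference becomes large, which is a quick way to catch the mistake mentioned in the hint.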
Exercise 47 (Easy) A lesser known mathematician around the time of Isaac Newton was Abraham de Moivre. Apart from calculating the day of his own death (based on number of hours slept), De Moivre is known for the following trigonometric formula:
$$ (\cos(\theta) + i \sin(\theta))^n = \cos(n\theta) + i \sin(n\theta), \tag{282} $$
which holds for any $\theta \in \mathbb{R}$ and any $n \in \mathbb{Z}$. Prove this formula. Hint: Use Euler's formula.

16.1.2 The Hyperbolic Face

Recall that the equation for the graph of a circle of radius r, centred at the origin (0, 0) of a Cartesian coordinate system, is given by:
$$ x^2 + y^2 = r^2 \tag{283} $$
where $(x, y) \in \mathbb{R}^2$. From this circle and the Pythagorean theorem, we earlier obtained the fundamental circular trigonometric identity. Therefore, it seems plausible that by considering the geometry of other conic sections, one should obtain analogous sets of identities. An ellipse is just a distortion or ‘rescaling’ of the circle by different factors along each axis, ultimately leading back to the circular trigonometric identities.

If we now consider a unit equilateral (rectangular) hyperbola centred at the origin, with its foci and vertices lying along the x-axis (so that its branches open up along this axis), an equation for its graph is given by:
$$ x^2 - y^2 = 1. \tag{284} $$
This is a consequence of its geometric definition as “the locus of points such that the difference between the distances to each focus is constant” and Pythagoras' theorem [115].

Tutorial 12 Observation: If we equipped our 2-dimensional space $\mathbb{R}^2$ with a hyperbolic metric, defined by
$$ \eta(V_1, V_2) := x_1 x_2 - y_1 y_2 \tag{285} $$
where $V_j = (x_j, y_j)$, then (284) would simply be the equation for the ‘unit circle’ in this ‘hyperbolic space’!

[115] Alternatively, one may consider this equation as a hyperbolic version of the Pythagorean theorem.

For the Euclidean unit circle, we had x = cos(θ) and y = sin(θ), giving rise to the fundamental identity. In this hyperbolic case, we instead have:
$$ x = \cosh(\theta), \qquad y = \sinh(\theta) \tag{286} $$
where cosh and sinh are the ‘hyperbolic’ cosine and sine, giving us the fundamental hyperbolic trigonometric identity:
$$ \cosh^2(\theta) - \sinh^2(\theta) = 1. \tag{287} $$
From this identity, one can derive the consequential hyperbolic identities in an analogous fashion to the circular trigonometric identities.

Exercise 48 (Easy-Optional) As above, repeat the earlier exercise deriving trigonometric identities, but this time replace each function with its hyperbolic version (notationally, this just amounts to adding an ‘h’ at the end of the function name). Hint: Beware of new − signs appearing.

Continuing the analogy, recall that from Euler's formula one gets the trigonometric identities:
$$ \cos(z) = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin(z) = \frac{e^{iz} - e^{-iz}}{2i}, \tag{288} $$
where z was some real number (an angle). More generally, Euler's formula holds for complex numbers $z \in \mathbb{C}$. In particular, one can prove this since the complex Taylor series converges on the entire complex plane. Now, notice the following trick. We perform the transformation:
$$ z \to iz, \tag{289} $$
which corresponds to a counter-clockwise rotation by π/2 radians in the complex plane [116].

Fun Aside: This same transformation, S → iS, is used in Quantum Field Theory to convert quantum partition functions into thermodynamical ones (from statistical mechanics). In that case, S is the action [117] for some quantum theory and this ‘complex rotation’ is known as a ‘Wick rotation’. It features in the derivation of the Hawking-Bekenstein temperature of a black hole.

[116] Recall that multiplying a complex number (represented by a 2-dimensional vector) by the imaginary unit i rotates it by 90 degrees anti-clockwise. This can be seen in polar form, since multiplying by $e^{i\theta}$ rotates a complex number counter-clockwise by θ. In particular, $i = e^{i\pi/2}$.
[117] A mathematical object (‘integral’) which essentially contains the entire theory.

In our case, we set z = x to be some real number. After the transformation x → ix and some algebra, we then get the hyperbolic trigonometric functions:
$$ \cosh(x) = \frac{e^{x} + e^{-x}}{2}, \qquad \sinh(x) = \frac{e^{x} - e^{-x}}{2}. \tag{290} $$

Exercise 49 (Easy-Moderate) Using the Euler identities for sin, cos and their hyperbolic counterparts (in terms of the exponentials), prove that:
• cos(iθ) = cosh(θ).
• sin(iθ) = i sinh(θ).
Challenge: Using the last two identities, can you think of an easy ‘trick’ to extract the hyperbolic trigonometric identities from the corresponding circular trigonometric identities?

Following the results of the last exercise, one should observe:
$$ \cos^2(i\theta) = \cosh^2(\theta), \qquad \sin^2(i\theta) = -\sinh^2(\theta). \tag{291} $$
Therefore, in all circular trigonometric identities involving squares of trigonometric functions, making the transformation θ → iθ should induce the following transformations:
$$ \cos^2(\theta) \to \cosh^2(\theta), \qquad \sin^2(\theta) \to -\sinh^2(\theta). \tag{292} $$
In particular, observe what happens to the fundamental circular trigonometric identity:
$$ \cos^2(\theta) + \sin^2(\theta) \to \cosh^2(\theta) - \sinh^2(\theta). \tag{293} $$
In this manner, one should be able to quickly deduce the hyperbolic identities from the circular ones.

Exercise 50 (Easy) Using the above trick, complete the following hyperbolic identities:
$$ 1 - \coth^2(\theta) = \ldots, \qquad 1 - \tanh^2(\theta) = \ldots \tag{294} $$
Hint: Remember that $\tan^2(\theta)$, $\cot^2(\theta)$ and $\csc^2(\theta)$ will all involve a factor of $\sin^2(\theta)$, so these terms will pick up a minus sign when one switches to the hyperbolic counterparts.

Exercise 51 (Perambling Down Memory Lane) Recalling the angular addition formulas for sine and cosine, derive the hyperbolic counterparts. In particular, express the following in terms of products of sinh and cosh of x and y:
$$ \sinh(x + y) = \ldots, \qquad \cosh(x + y) = \ldots \tag{295} $$
Hence,
$$ \sinh(2x) = \ldots, \qquad \cosh(2y) = \ldots \tag{296} $$
Hint: Use the identities cos(iθ) = cosh(θ) and sin(iθ) = i sinh(θ), which you proved earlier.

16.2 Lorentz Metric and Relativity

From antiquity to Newton and onwards, most people have perceived the world as ‘3 + 1-dimensional’ in the sense that ‘space’ (3-dimensional) and ‘time’ (1-dimensional) were viewed as independent entities. In particular, to the uneducated there still persists a false notion of some ‘universal time’ or ‘universal clock’ which ‘ticks’ at the same rate for all observers. Such assumptions lead us to noting that nature obeys a set of symmetries: rotations and translations. This means that Newtonian space is governed by a 3 + 3 = 6 dimensional symmetry group [118] (the ‘Euclidean’ group, which sits inside the larger ‘Galilean’ group): 3 dimensions for rotations (one for each axis about which an object can rotate) and 3 dimensions for translations. This is simply wrong.

[118] A Lie group, recalling from tutorials 8-11.

As most of you would be familiar with, the theory of relativity arose at the beginning of the 20th Century to explain various phenomena such as the null result of the Michelson-Morley attempts to measure an ‘aether wind’. Although it is attributed to Albert Einstein, one should note that the mathematician Henri Poincaré and physicist Hendrik Lorentz had already observed a set of symmetries under which Maxwell's equations for electromagnetism were invariant [119]. This set of symmetries is known as the ‘Lorentz Group’ [120], whose elements are the ‘Lorentz transformations’. Such a group is 6-dimensional as it consists of three spherical rotations (spatial rotations) and three hyperbolic rotations (Lorentz boosts). If one enhances this symmetry group by adding in translations (three in space and one in time), it becomes the 10-dimensional ‘Poincaré group’. As Einstein deduced, not only does electromagnetism obey these symmetries, but in fact the physics for all observers [121] obeys these symmetries. Such an observation is equivalent to Einstein's statement that “the laws of physics are the same in all inertial reference frames”.

To understand the equivalence of the Lorentz symmetries to Einstein's postulates, we must establish the notion of ‘Minkowski spacetime’, that is, the universe and its entire history as a 4-dimensional metric space with a ‘hyperbolic’ metric.

16.2.1 Minkowski Spacetime

Despite relativity being somewhat ‘colloquial’ knowledge these days, for most people it is fundamental to their intuition to view time and space as separate entities. In part, this is because we measure ‘distance’ with rulers and ‘time’ with clocks. This is carried along with the notion of some universal time-keeping device: for example, we all synchronize our clocks to some state or international clock.

Now consider the radar, an invention from World War 2 [122] that uses ‘radio waves’. Radar can be used to measure distances by bouncing radio waves off objects and measuring their ‘time of flight’. This is possible because radio waves are electromagnetic waves and, as Maxwell showed, they must therefore travel at some fixed speed [123] c (the speed of light, shown to be constant via the Michelson-Morley experiment).
In this manner, one could think of distances between objects in terms of the number of ‘seconds’ it takes to transmit a radio wave, bounce it off an object and receive it again. As such, ‘spatial distances’ simply become another measure of ‘time’, and the union of 3-dimensional ‘space’ and 1-dimensional ‘time’ into a 4-dimensional spacetime no longer seems as foreign.

[119] It is not hard to envision that given sufficient time, Poincaré, Lorentz and collaborators would have reproduced the tenets of Special Relativity.
[120] A Lie symmetry group, for those of you who completed tutorials 8-10.
[121] In a flat spacetime, hence ignoring curvature arising from gravity.
[122] A secret technology developed by the British which played a huge role in their defence against the Germans during the bombing of London.
[123] Technically speaking, in a vacuum. Refractive processes alter the effective speed in the atmosphere.
We now propose spacetime to be the set of points in the 4-dimensional real vector space, $\mathbb{R}^4$. Note that this is not necessarily a Euclidean space since we haven't specified a metric yet! Points in spacetime are called events, which may include the college winning the piano competition and ‘battle of the bands’, or may include the creation of a Higgs boson in the Large Hadron Collider at CERN. An observer is defined to be some particle travelling along in spacetime, though you can consider yourself an observer. Thus an observer traces out a curve through spacetime, which we call a worldline: each point on that worldline is an event in the history of that observer. The time measured by each observer, i.e. by a clock they carry, is equal to the arc-length (or simply, ‘length’) of their worldline (or some segment of it when measuring time between events on their worldline).

Given an observer, one may define an ‘origin’ in spacetime relative to the worldline of that observer. This allows us to turn spacetime into a vector space, which is a structure you are familiar with. However, the principle of relativity tells us that no observer is ‘unique’! This means that the choice of origin (a ‘special point’) is not unique: prior to the observer, there is no ‘unique point’ in spacetime. As such, the correct intrinsic mathematical structure for spacetime is an ‘affine space’, which is essentially a vector space with its origin ‘scrubbed out’ (it is the role of an observer to specify the origin).

Unlike observers with mass, photons (or ‘light rays’) are massless and travel on special curves called ‘null lines’, so called because the length of such curves is zero (‘null’).

Exercise 52 (Gedanken) In a mad effort to avoid appearing in college tutorial questions, Ben Luo decides to accelerate himself close to the speed of light (relative to St. George's College) with an ‘acceletron’ built by Tessa McGrath.
Unfortunately, Tessa intentionally miscalibrated the acceletron, thus turning Ben into a photon.

Recall that the ‘proper time elapsed’ between events (points) on an observer's worldline is the ‘arc-length’ between those points. Thus, if photons travel on ‘null lines’, what is the proper time elapsed since Ben Luo turned into Ben Photon? How does this differ from the time that Tessa measures with her watch?

Can you see any problems with this geometric definition of proper time? If so, what restrictions would you suggest?

Some of you may be familiar with the Frenet-Serret formulas pertaining to the ‘differential geometry of curves’, from first year vector calculus. In particular, you may recall that the motion of a particle in 3-dimensional space can be modelled
as the curve traced out by some vector $\mathbf{r}(t) = (x(t), y(t), z(t))$ parametrised by some parameter t, for example time. In this manner, the ‘velocity’ of the particle is given as a vector $\mathbf{v}(t) = \frac{d}{dt}\mathbf{r}(t)$ which is tangent to the curve at the point (x(t), y(t), z(t)). If we now consider an infinitesimal line segment of length $\|d\mathbf{r}\| = \|\mathbf{v}(t)\|\,dt$ along the curve and integrate this between two points on the curve, one gets the ‘length’ (or rather, ‘arc length’) of the curve between those points. In appropriate units, this is simply the distance that the particle (whose trajectory is given by the curve) has travelled:
$$ L = \int_{\mathbf{r}_1}^{\mathbf{r}_2} \|d\mathbf{r}\| = \int_{t_1}^{t_2} \|\mathbf{v}(t)\|\,dt. \tag{297} $$

Exercise 53 (The Road Less Travelled) To help Ben in his dilemma, Claire Waddington develops a cure for Ben's photonitis, returning him to his original form. Enlightened by his journey, Ben takes recourse in analysing his travels with the Frenet-Serret formalism. In this manner, he models his journey as a worldline in 4-dimensional space given by $R(\tau) = (c\tau, x(\tau), y(\tau), z(\tau))$ where τ is his ‘proper time’. Tangent to his worldline is his 4-velocity vector
$$ U = \frac{dR(\tau)}{d\tau} = \left( c, \frac{dx(\tau)}{d\tau}, \frac{dy(\tau)}{d\tau}, \frac{dz(\tau)}{d\tau} \right). \tag{298} $$
Using the 3-dimensional curve discussed earlier as an analogy, write down an expression for the ‘arc length’ of Ben's worldline between the point at which Ben came to St. George's College and the point at which he was cured of being a photon.

Draw a diagram of Ben's worldline, with points labelled, to illustrate the line integral you have written. What do you notice about the period in which Ben was a photon?

16.2.2 Lorentz Metric and Light-Cone Structure

Thus far, any mention of ‘length’ in our 4-dimensional spacetime comes with an implicit reference to some spacetime ‘metric’. Recall from tutorial 12 that a metric d on some set S was defined to be a real-valued map on pairs of points of S with the following properties:

1. positivity: for any x, y ∈ S,
$$ d(x, y) \geq 0, \tag{299} $$
    2. non-degeneracy d(x, y)= 0 iff x = y, (300) 3. symmetric d(x, y) = d(y, x), (301) 4. triangle inequality d(x, z) ≤ d(x, y) + d(y, z). (302) The first exercise from tutorial 12 was to show that ‘positivity’ was a consequence of properties 2-4. In particular, the direction of the triangle inequality determines the positivity of the metric. If we now relax the notion of ‘positivity’ and con- sider the idea of ‘negative’ distances, this would require ‘reversing’ the triangle inequality. In other words, the fact that everyday ‘Euclidean distance’ is positive is a strict consequence of the triangle inequality – which itself is a consequence of the Cauchy-Schwarz inequality. Thus, to get ‘negative distances’ on a real vector space Rn, one needs to reverse the Cauchy-Schwarz inequality – this is the heart of Euclidean vector geometry. This is precisely what we need to do in relativ- ity. Upon the (arbitrary) choice of some origin O = (0, 0, 0, 0) in spacetime, we es- tablish a 4-dimensional Cartesian coordinate system (ct, x, y, z) with respect to this origin. We can then think of points in spacetime as marked by position vec- tors relative to the origin124, given the choice of some standard basis vectors125 {∂t, ∂x, ∂y, ∂z}. Thus in this basis, the point (ct, x, y, z) can be written as the 4-vector: R = ct∂t + x∂x + y∂y + z∂z. (303) Marking two points P1 = (ct1, x1, y1, z1) and P2 = (ct2, x2, y2, z2) by the two 4-vectors R1 =ct1∂t + x1∂x + y1∂y + z1∂z R2 =ct2∂t + x2∂x + y2∂y + z2∂z (304) we can then form a vector pointing from P1 to P2 via ‘vector subtraction’: P1P2 = R2 − R1 = c∆t∂t + ∆x∂x + ∆y∂y + ∆z∂z (305) 124 More precisely, by the ‘affine subtraction map’. 125 Although the notation correctly suggests these to be partial derivative operators, we can view these as ‘unit vectors’ in each of the coordinate directions – perhaps you prefer the notation ∂x = ex for example. 139
where $\Delta t = t_2 - t_1$, $\Delta x = x_2 - x_1$, etc.

Exercise 54 (Art Class) Draw a spacetime diagram with the origin at some point on an observer’s worldline. Now draw two more points $P_1$ and $P_2$ marked by vectors $R_1$ and $R_2$ (relative to the origin) to illustrate the vector subtraction process described above. You should get a triangle.

In this manner, we define the (squared) ‘Lorentz distance’ $\Delta S^2$ between two points $P_1$ and $P_2$ on a 4-dimensional spacetime to be the (squared) ‘length’ of the 4-vector $\overrightarrow{P_1P_2}$ as given by the Minkowski metric:

$$\Delta S^2 := \eta(\overrightarrow{P_1P_2}, \overrightarrow{P_1P_2}) = \eta(R_2 - R_1, R_2 - R_1) = -(c\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2. \tag{306}$$

Comparing to Euclidean geometry, you will notice that this almost looks like a ‘4-dimensional’ version of the Euclidean dot-product of a vector with itself:

$$\overrightarrow{P_1P_2} \cdot \overrightarrow{P_1P_2} = (c\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2. \tag{307}$$

The difference is that in the Lorentz distance, a minus sign appears in front of the ‘time’ term; we call this a ‘Lorentzian signature’ $(-, +, +, +)$. Those of you who have studied relativity will recall that $\Delta S^2$ is simply your ‘spacetime interval’, which is invariant under Lorentz transformations. The object $\eta$ is the Minkowski metric [126], which we can define as a type of ‘Lorentzian dot-product’ between two arbitrary 4-vectors in spacetime:

$$\eta(V_1, V_2) = V_1 \cdot_{Lorentz} V_2 = -a_1a_2 + b_1b_2 + c_1c_2 + d_1d_2, \tag{308}$$

where $V_j = a_j\,\partial_t + b_j\,\partial_x + c_j\,\partial_y + d_j\,\partial_z$ and $a_j, b_j, c_j, d_j$ are real constants, for $j = 1, 2$.

Exercise 55 (Easy) Compute the ‘Lorentzian dot-product’ between the following sets of vectors:
• $V_1 = 2\partial_t$, $V_2 = 6\partial_t + 9\partial_x + 6\partial_y + 9\partial_z$.
• $V_1 = 1\partial_t$, $V_2 = 1\partial_x + 1\partial_y + 1\partial_z$.
• $V_1 = 1\partial_x$, $V_2 = 1\partial_y + 1\partial_z$.
• $V_1 = 1\partial_t + 1\partial_x$, $V_2 = V_1$.

[126] The symbol $\eta$ is the Greek letter ‘eta’.
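The definition (308) is mechanical enough to be written as a short function. The following is a minimal sketch in Python, assuming 4-vectors are stored as plain tuples $(a, b, c, d)$ of components in the basis $\{\partial_t, \partial_x, \partial_y, \partial_z\}$; the function names `lorentz_dot` and `euclid_dot` and the sample vectors `W1`, `W2` are illustrative choices (they are deliberately not the vectors of Exercise 55).

```python
def lorentz_dot(v1, v2):
    """Lorentzian dot-product with signature (-, +, +, +), following (308)."""
    a1, b1, c1, d1 = v1
    a2, b2, c2, d2 = v2
    return -a1 * a2 + b1 * b2 + c1 * c2 + d1 * d2


def euclid_dot(v1, v2):
    """Ordinary 4-dimensional Euclidean dot-product, for comparison."""
    return sum(p * q for p, q in zip(v1, v2))


# Hypothetical sample vectors, written as (time, x, y, z) components.
W1 = (3, 1, 0, 2)
W2 = (1, 4, 0, 1)

print(lorentz_dot(W1, W2))  # -3 + 4 + 0 + 2 = 3
print(euclid_dot(W1, W2))   #  3 + 4 + 0 + 2 = 9
print(lorentz_dot(W1, W1))  # -9 + 1 + 0 + 4 = -4 (negative, unlike a Euclidean length squared)
```

The only difference between the two functions is the sign of the time term, which is exactly the ‘Lorentzian signature’ described above.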
Now compare these to the Euclidean dot-product between the same vectors. What major differences do you notice? Recalling the definition of the Euclidean length of a vector $V = a\,\partial_t + x\,\partial_x + y\,\partial_y + z\,\partial_z$:

$$\|V\|_{Euclid} = \sqrt{V \cdot_{Euclid} V} = \sqrt{a^2 + x^2 + y^2 + z^2}, \tag{309}$$

what do you notice about the Euclidean length of the vector $V_1 = 1\partial_t + 1\partial_x$ versus its Lorentzian length?

In Lorentzian geometry, there arise non-zero vectors which have zero length! Such vectors are called ‘null vectors’; they are tangent vectors to null geodesics (‘null lines’ or ‘light rays’). As such, these vectors are the 4-velocity vectors for ‘photons’. If we extract the 3-velocity vector $v$ as the spatial part of the 4-velocity vector $V$ for a photon, one should notice that the length of the 3-vector $v$ is simply the ‘speed’ of the photon: Speed $= c = \|v\|$, as measured by some observer. Because the Lorentz metric allows the ‘hyperbolic dot product’ between non-zero vectors to be positive, zero or negative, spacetime is endowed with a natural ‘light cone structure’. To see this, consider the following exercise.

Exercise 56 (Lawyer of the Universe) During an evening walk down the Swan river from St. George’s College, Christabel Moffat decides to reflect on her life choices. In this moment of contemplation, she has an epiphany: if she considers her straight-line walk from the college down the river to be a walk along the x-axis whilst measuring the time $t$ elapsed since leaving the college, she can plot her walk on a Minkowski spacetime diagram.

Q1: Draw a set of two coordinate axes on your page, labelling the vertical axis as time $t$ and the horizontal axis as displacement $x$ from the origin (St. George’s College). Consider the start of Christabel’s walk in her frame. Upon leaving the college, Christabel announces her adventurous nature by flashing two torches simultaneously: one in the direction of the river ($+x$ direction) and one in the direction of the College ($-x$ direction).
Plot this event on your spacetime diagram.

Hint: You should get two ‘45-degree’ (in our Euclidean view) lines emanating from the origin in opposite directions. These lines represent the journey of the light-rays emitted from the torches, in spacetime. These light rays define the surface of a 1-dimensional cone.

Q2: After drawing her spacetime diagram, Christabel wonders what on Earth caused her to take up art. In this manner, she starts to think about ‘causality’ –
that is, events which are ‘causally connected’ to each other in spacetime. To help Christabel find the cause of her artistic inspiration, consider that the ‘speed of light’ is a fundamental ‘speed limit’ in our universe. For one thing to cause another, it cannot transmit information faster than light (spooky ‘action at a distance’); therefore, everything that is causally connected must lie within some restricted region of Christabel’s spacetime diagram. Identify this region.

Hint: You can consider adding another axis to your spacetime diagram, say the y axis, which lies in the direction of Mounts Bay Road (perpendicular to the college-river axis). Now imagine that, instead of shining two torches upon leaving the college, Christabel sets off an LED hula hoop, meaning that light rays get shone outwards in a circle in the $x$-$y$ plane. If you consider the worldlines of all the light rays in this fashion, they should form a 2-dimensional surface – the light cone!

Q3: Once you have identified the region in spacetime ‘causally connected’ to Christabel, what can you say about points that lie outside this region? In particular, is it possible that any event outside this region could have influenced Christabel to draw spacetime diagrams?

Q4: In Christabel’s frame (of mind), everything is moving relative to her; she is stationary (at rest). Except when drawing spacetime diagrams, she considers space to be 3-dimensional and time to be separate. This is a lie, though it has some element of truth. In particular, the 3-dimensional space Christabel sees is her ‘rest space’ or ‘space of displacements’, meaning everything that is ‘orthogonal’ to her worldline via the Lorentz metric. Ignoring the z-coordinate for Christabel’s vertical direction, sketch the coordinates $t, x, y$ and draw on her spacetime diagram the set of all points which are ‘equidistant’ to her by some fixed amount – say 1.
This means sketching the surface:

$$\eta((ct, x, y, 0), (ct, x, y, 0)) = -(ct)^2 + x^2 + y^2 = 1, \tag{310}$$

using the coordinate notation $(ct, x, y, 0)$ to denote the 4-vector $V = ct\,\partial_t + x\,\partial_x + y\,\partial_y + 0\,\partial_z$. Note that you can set $c = 1$ for simplicity and work in so-called ‘natural units’.

Hint: Recalling our tutorial on metric spaces, you should recognize this set of points as falling under the general definition of a ‘sphere’, except that we are using the Lorentz metric! Therefore, what you are actually sketching is a ‘Lorentzian sphere’ or ‘Hyperbolic sphere’. Such a surface has a special name: a ‘hyperboloid of one sheet’.

In the last exercise, you should have noted the following observations:
• For each observer, there is a natural ‘light cone’ that surrounds them in spacetime. Everything inside this light cone is causally connected to that observer, meaning that they are related in a ‘cause-and-effect’ fashion. With respect to their origin, all causally connected events in their ‘past’ lie in their backward light-cone, and all causally connected future events lie in their forward light-cone.
• Every event outside the light cone is acausal, meaning that it is not causally connected to the observer.
• Everything that lies ‘on’ the light cone must travel at the speed of light. This means photons, carriers of the electromagnetic force. Possibly gravitons too, but they won’t exist in Minkowski space [127]!
• The unit sphere in Minkowski spacetime is a hyperboloid. This motivates one to consider spacetime as an exercise in ‘hyperbolic geometry’, as opposed to ‘Euclidean’ (Newtonian) geometry. Thus, our exercises on hyperbolic trigonometry!

In this manner, we say that spacetime has a natural ‘light cone structure’: each observer partitions spacetime, with respect to their worldline, into events that lie inside, on or outside a light cone situated at each point on their worldline. If we now consider a light-cone located at every point on their worldline, and then consider spacetime to be filled with worldlines (particles/observers), one realizes that you can ‘tessellate’ spacetime with light-cones [128]. This notion can be formalized mathematically to say that ‘light cones provide a foliation of spacetime’.

Formally, the splitting of spacetime into different regions can be done as follows. Given a 4-vector $V = (ct, x, y, z)$, we say that
• $V$ is time-like if $\eta(V, V) < 0$.
• $V$ is space-like if $\eta(V, V) > 0$.
• $V$ is null or light-like if $\eta(V, V) = 0$.

Clearly, photons (light) have null vectors as vectors tangent to their worldline. Similarly, all vectors which lie inside Christabel’s light-cone in the previous exercise would be time-like vectors.
All vectors outside would be space-like.

[127] Minkowski spacetime is flat. Gravitons are the predicted quanta of the gravitational field, hence if gravitons exist there must be gravity present, implying some curvature of spacetime.
[128] Or rather, stack light cones to fill spacetime without any gaps or overlaps on the same worldline.

Exercise 57 (Tachyon Express) In an attempt to get more time in the music practice rooms, Gabrielle Ruttico steals Tessa’s acceletron and tries to turn herself into a string-theory tachyon [129], thereby travelling faster than light and hopefully backwards in time.

Unfortunately, as an act of OHS, Dogburn already disabled the acceletron due to Ben’s previous enlightenment (photonic) journey. Disappointed with her time-travel progress, Gabrielle settles for more constructive stress-reduction strategies. Measured in her Gabby-centric coordinate system (the origin being the point in spacetime at which she discards the acceletron), the following events take place in spacetime:

• Tea is brewed in front of her, at a distance of 0.05m in her future rest frame (the Elsie room), 600 seconds after discarding the acceletron. To an order of magnitude, one can approximate the spatial displacement of the Elsie room as $(x, y, z) = (0, 10\text{m}, 10\text{m})$ from the music room. Thus, we can represent the tea brewing event with a vector: $V_{Tea} = 600c\,\partial_t + 0\,\partial_x + 10\,\partial_y + 10\,\partial_z = (600c, 0, 10, 10)$.
• Rowan’s room is stung by Adi, 1200 seconds after discarding the acceletron. To an order of magnitude, this event occurs at a displacement of $(x, y, z) = (0, 100\text{m}, 100\text{m})$ from the music room. Thus, in spacetime, this event is given by the vector $V = (1200c, 0, 100, 100)$ relative to Gabrielle’s origin.
• Rory the Cyborg sees battleships ablaze on the shores of Orion. This describes the clash of Orions against the Antaran race, leading to many cycles of peace and prosperity with the defeat of the Antaran empire. Orion’s belt is about 1000 light years from the sun; let’s say, along the x-axis in Gabby’s frame for simplicity. This event is described by a spacetime vector $(ct, x, y, z) = (0, 1000 \times 365 \times 24 \times 60 \times 60 \times c, 0, 0)$.
• Ben Luo spontaneously turns into a photon (long-term side-effects of photonitis). This is described by a vector: $V = (c, c, 0, 0)$.

Using the previous definitions, deduce which of the above events are ‘spacelike’, ‘timelike’ and ‘null’ (light-like) with respect to Gabrielle’s origin (i.e. the light-cone structure of her worldline).
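The classification into time-like, space-like and null vectors only depends on the sign of $\eta(V, V)$, so it can be sketched in a few lines of Python. This is a hedged illustration rather than a solution to the exercise: the function names `lorentz_norm_sq` and `classify`, the tolerance parameter `tol`, and the sample vectors are all assumptions introduced here, and the vectors are chosen with integer components (in natural units, $c = 1$) so that the comparison with zero is exact.

```python
def lorentz_norm_sq(v):
    """Return eta(V, V) for V = (t, x, y, z), using signature (-, +, +, +)."""
    t, x, y, z = v
    return -t * t + x * x + y * y + z * z


def classify(v, tol=0.0):
    """Classify a 4-vector by the sign of eta(V, V).

    tol is a hypothetical tolerance: for floating-point components one
    would treat |eta(V, V)| <= tol as 'null' rather than testing == 0.
    """
    s = lorentz_norm_sq(v)
    if s < -tol:
        return "time-like"
    if s > tol:
        return "space-like"
    return "null"


# Illustrative vectors (natural units, c = 1); not the events of Exercise 57.
print(classify((5, 3, 0, 0)))  # -25 + 9  = -16 -> time-like
print(classify((1, 2, 0, 0)))  # -1 + 4   =  3  -> space-like
print(classify((5, 3, 4, 0)))  # -25 + 25 =  0  -> null
```

For the events in the exercise you would first need to decide on units (for instance, keeping $c$ explicit or converting everything to metres) before applying the same sign test.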
[129] Tachyons are the mathematical reason for the instability (infinities and divergences) of traditional string theory; supersymmetry eliminates these tachyons and protects the theory from instabilities.

16.2.3 Projections and Familiar Formulas

As a reward for completing the earlier part of this tutorial, we now see how hyperbolic trigonometry can be used to replace your usual ‘time-dilation’ and ‘length-contraction’ formulas. Recall that given two vectors $v$ and $u$ in a Euclidean vector space, the Euclidean ‘dot-product’ contains information about their lengths and the ‘angle’ $\theta$ between them:

$$u \cdot_{Euclid} v = \|u\|\,\|v\| \cos(\theta). \tag{311}$$

This special geometric interpretation of the ‘inner product’ (dot-product) is due to the ‘Cauchy-Schwarz’ inequality, which holds for all positive-definite inner product spaces (vector spaces with positive-definite inner-products):

$$|u \cdot v| \leq \|u\|\,\|v\|. \tag{312}$$

To see this, we simply expand the absolute value (by its definition) and re-arrange the inequality:

$$-1 \leq \frac{u \cdot v}{\|u\|\,\|v\|} \leq 1. \tag{313}$$

Now, what monotone function lies between $\pm 1$? Cosine of course: on the domain $[0, \pi]$ it is monotone decreasing. Therefore, we can take the inverse cosine of each part of the inequality (reversing it, since cosine is decreasing):

$$0 \leq \arccos\left(\frac{u \cdot v}{\|u\|\,\|v\|}\right) \leq \pi, \tag{314}$$

which allows one to interpret $\frac{u \cdot v}{\|u\|\,\|v\|}$ as the cosine of some geometric angle $\theta$.

Inside a spacetime observer’s light cone, the vector space they generate relative to their origin is ‘negative definite’, as we partially saw earlier. In particular, the inner-product of any two future-pointing time-like vectors, $U$ and $V$, is negative:

$$\eta(U, V) \leq 0. \tag{315}$$

Therefore, inside the light cone, we get the ‘reverse Cauchy-Schwarz inequality’:

$$|U \cdot_{Lorentz} V| \geq \|V\|\,\|U\|. \tag{316}$$

Re-arranging and noting that $|U \cdot_{Lorentz} V| = -U \cdot_{Lorentz} V$ for any future-pointing time-like vectors $U, V$, we get:

$$-\frac{U \cdot_{Lorentz} V}{\|U\|\,\|V\|} \geq 1. \tag{317}$$

What hyperbolic trigonometric function satisfies the property that it is always $\geq 1$? Hyperbolic cosine! Thus, we can define the hyperbolic angle or rapidity between any
two forward-pointing time-like vectors, $U$ and $V$, in spacetime:

$$\theta = \cosh^{-1}\left(-\frac{U \cdot_{Lorentz} V}{\|U\|\,\|V\|}\right). \tag{318}$$

N.B.: The ‘norm’ used thus far, $\|U\|$, is not the Euclidean norm! It is the Lorentz norm. For it to define a positive length, we define it as:

$$\|U\|_{Lorentz} = \sqrt{|U \cdot_{Lorentz} U|}. \tag{319}$$

If $U$ is time-like, its Lorentz dot-product with itself is negative (from previous definitions), in which case:

$$\|U\|_{Lorentz} = \sqrt{|U \cdot_{Lorentz} U|} = \sqrt{-U \cdot_{Lorentz} U}. \tag{320}$$

Note that the idea of the hyperbolic angle allows us to re-write the usual time-dilation and length-contraction formulas in terms of hyperbolic trigonometry. In particular, if an observer measures some event described by a 4-vector $V$ relative to their origin, the time at which that event occurs will be the projection of $V$ onto (parallel to) their worldline! Recall that in Euclidean vector spaces we used the dot-product to give us the parallel projection of one vector on another, and that we could get perpendicular projections in a similar manner. Formally, given an event $V$ relative to some observer, one can decompose the event into components parallel and perpendicular to the observer’s worldline:

$$V = V_{\parallel} + V_{\perp}. \tag{321}$$

As it turns out, the space of vectors which are perpendicular to the worldline is a 3-dimensional vector space: their ‘rest space’, or physical space by everyday perception. Hence, if we describe an observer by the vector $U$ and an external event (or observer) by a vector $V$, the component of the projection of the external event onto the observer $U$ is given by:

$$U \cdot_{Lorentz} V = -\|U\|\,\|V\| \cosh(\theta), \tag{322}$$

where $\theta$ is the hyperbolic angle between $U$ and $V$. Hence,

$$\|V_{\parallel}\| = \|V\| \cosh(\theta), \qquad \|V_{\perp}\| = \|V\| \sinh(\theta). \tag{323}$$
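To make (318)–(323) concrete, here is a small numerical sketch in Python. It assumes natural units ($c = 1$), vectors stored as $(t, x, y, z)$ tuples, and an observer at rest described by $U = (1, 0, 0, 0)$; the names `lorentz_dot`, `lorentz_norm` and `rapidity`, and the sample event $V = (5, 3, 0, 0)$, are hypothetical choices made for illustration only.

```python
import math


def lorentz_dot(u, v):
    """Lorentzian dot-product with signature (-, +, +, +)."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2] + u[3] * v[3]


def lorentz_norm(u):
    """Lorentz norm as in (319): square root of |U . U|."""
    return math.sqrt(abs(lorentz_dot(u, u)))


def rapidity(u, v):
    """Hyperbolic angle (318) between two future-pointing time-like vectors."""
    ratio = -lorentz_dot(u, v) / (lorentz_norm(u) * lorentz_norm(v))
    # Clamp guards against rounding pushing the ratio just below 1
    # when u and v are (numerically) parallel.
    return math.acosh(max(1.0, ratio))


U = (1, 0, 0, 0)   # observer at rest (assumed for this sketch)
V = (5, 3, 0, 0)   # time-like event: eta(V, V) = -25 + 9 = -16

theta = rapidity(U, V)
norm_V = lorentz_norm(V)           # sqrt(16) = 4

print(theta)                       # acosh(5/4), which equals ln 2
print(norm_V * math.cosh(theta))   # parallel part: recovers the time component, 5
print(norm_V * math.sinh(theta))   # perpendicular part: recovers the spatial length, 3
```

In this sample the projections in (323) simply return the time and space components of $V$ in the observer’s own frame, which is what one would hope for an observer at rest.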
Problem 24 (Challenging) Prove the above projection formulas, using the fundamental hyperbolic trig identity and the fact that $V_{\parallel}$ and $V_{\perp}$ are ‘hyperbolic orthogonal’ (orthogonal with respect to the Lorentz metric).

Physically, one may interpret the length $\|V_{\parallel}\|$ of the parallel-projection $V_{\parallel}$ of an event $V$ onto an observer’s worldline as the time elapsed (relative to the observer’s origin) as measured by that observer. Since $V$ is a vector between the origin $O$ and some point $P$ (the event), this suggests that $\|V\|$ is the proper-time between events $O$ and $P$ and that $\|V_{\parallel}\|$ is the ‘dilated-time’ as measured by our initial observer. Thus, one would interpret:

$$\cosh(\theta) = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} \tag{324}$$

as the time-dilation or ‘Lorentz’ factor, which you may know as $\gamma$!

Exercise 58 (Relative Velocities and Hyper-trig) Using your knowledge of hyperbolic trigonometry, show that the above interpretation of $\cosh(\theta)$ as the ‘Lorentz’ factor $\gamma$ is indeed sensible. In particular, first show that (in natural units, $c = 1$):

$$v = \tanh(\theta). \tag{325}$$

Now, since $\tanh(\theta) = \frac{\sinh(\theta)}{\cosh(\theta)}$ and since:

$$\|V_{\parallel}\| = \|V\| \cosh(\theta), \qquad \|V_{\perp}\| = \|V\| \sinh(\theta), \tag{326}$$

what does this suggest? It suggests that:

$$\frac{\|V_{\perp}\|}{\|V_{\parallel}\|} = \frac{\sinh(\theta)}{\cosh(\theta)} = \; ? \; . \tag{327}$$

Of course, this is just ‘spatial distance (length)’ divided by time: a physical 3-dimensional velocity as measured by our proverbial observer!

Exercise 59 (Length Contraction and relativistic velocity addition) Show that the length-contraction factor is indeed given by $\sinh(\theta)$. Now use the hyperbolic trig identity: $\tanh(\theta + \alpha) = ...$ to derive the relativistic velocity addition formula by denoting $\tanh(\theta) = v$ and $\tanh(\alpha) = u$ for some 3-velocities with magnitudes $u$ and $v$.

Orthogonal complements!
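A numerical check is no substitute for the derivations requested in Exercises 58 and 59, but it can be a useful sanity test once you have an answer. The sketch below, in Python, assumes natural units ($c = 1$, so speeds are fractions of the speed of light); the helper names `gamma` and `compose_speeds` are hypothetical, and the composition is done purely by adding rapidities rather than by quoting any closed-form formula.

```python
import math


def gamma(v):
    """Lorentz factor (324) for a speed v given as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - v * v)


def compose_speeds(u, v):
    """Combine two collinear speeds by adding their rapidities.

    Uses v = tanh(theta) from (325): convert each speed to a rapidity,
    add the rapidities, and convert back.
    """
    return math.tanh(math.atanh(u) + math.atanh(v))


v = 0.6
theta = math.atanh(v)

print(math.cosh(theta))          # agrees with gamma(0.6) = 1.25 up to rounding
print(gamma(v))
print(compose_speeds(0.6, 0.7))  # stays below 1, although 0.6 + 0.7 = 1.3
```

Adding rapidities instead of speeds is the design point here: since $\tanh$ is bounded by 1, the combined speed can never exceed the speed of light.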
17 Tutorial 15: Differential Equations and Operators

In this tutorial, we will investigate the topic of ‘differential equations’. Differential equations are perhaps one of the most widely used mathematical tools in science. Together with their ‘discrete’ counterparts, ‘difference equations’ (recursion relations), differential equations describe the vast majority of explainable processes in the natural world.

Some popular examples of differential equations in physics include Newton’s 2nd Law of motion ($F = ma$), Maxwell’s equations for electromagnetism (describing light and all forms of electromagnetic radiation), Einstein’s gravitational field equations for general relativity, the ‘heat equation’ for thermodynamics, the Navier-Stokes equation governing fluid mechanics, the Simple Harmonic Oscillator equation (describing all oscillatory motion) and the ‘wave equation’. In a wider setting, we also have the Verhulst equations for ‘population-growth’, predator-prey models, ‘damped oscillation equations’ for LRC electrical circuits, chemical reaction rate equations and the Black-Scholes stochastic differential equations for modelling the stock market.

In essence, the list of applications of differential equations is endless. For this reason, if you ever want to get a job in mathematical modelling, it’s a good idea to get some mastery of differential equations! In this tutorial, we revise the basics with a non-standard presentation; in particular, we will study differential equations in the context of ‘differential operators’. This will allow you to link what you learn in ‘linear algebra’ and/or ‘quantum mechanics’ to differential equations, under the branches of mathematics known as ‘operator theory’ and ‘functional analysis’.
17.1 Differential Operators and Simple DEs

In mathematics, an ‘operator’ is a general term for an object which acts or ‘operates’ on another object, to produce a new (transformed) object. For example, you may recall from class or an earlier college tutorial that matrices are simply coordinate representations of (finite-dimensional) ‘linear operators’. In your courses, you will mostly study easy differential equations, in particular ‘linear ones’. As such, you will encounter ‘linear differential operators’. These operators are all ‘non-compact’ and, in some sense, ‘infinite-dimensional’, which gives them very interesting properties.
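The idea of an operator as ‘something that takes a function and returns a new function’ can be mimicked directly in code. The following Python sketch is only an approximation under stated assumptions: the operator `D` stands in for $\frac{d}{dx}$ using a central finite difference with a hypothetical step size `h`, so it is a numerical stand-in and not the exact operator; the test functions are arbitrary choices.

```python
import math


def D(f, h=1e-3):
    """Approximate d/dx as an operator: takes a function, returns a function.

    Uses a central difference with step h (an assumption of this sketch),
    so the result is accurate only up to discretisation and rounding error.
    """
    def df(x):
        return (f(x + h) - f(x - h)) / (2.0 * h)
    return df


def line(x):
    return 3.0 * x + 2.0


def combo(x):
    # A linear combination 2*sin(x) + 3*x^2, used to probe linearity of D.
    return 2.0 * math.sin(x) + 3.0 * x * x


print(D(line)(1.0))        # close to the gradient 3
print(D(math.sin)(0.5))    # close to cos(0.5)

# Linearity: D(2f + 3g) should agree with 2 D(f) + 3 D(g).
lhs = D(combo)(0.5)
rhs = 2.0 * D(math.sin)(0.5) + 3.0 * D(lambda x: x * x)(0.5)
print(abs(lhs - rhs))      # tiny (rounding error only)
```

Note that `D(f)` is itself a function, so operators can be applied repeatedly, e.g. `D(D(f))` for an approximate second derivative.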
Recall that a function $f$ of one variable is defined as a mapping between sets [130]:

$$f : S_{Domain} \to S_{Codomain}, \qquad x \mapsto f(x). \tag{328}$$

In general, the differential equations you will study will involve functions of a real variable, hence $S_{Domain} = \mathbb{R}$. Depending on the application, the function’s range may lie in the complex numbers $\mathbb{C}$ (e.g. if you are studying AC circuits or electromagnetism), or it may be real.

In essence, a differential equation is simply an equation involving derivatives of some function. Viewed another way, a differential equation is essentially a differential operator acting on some function to transform it. As a small technical note, one may be taught (perhaps misled by notation) that a differential equation involves ‘independent variable(s)’ which you differentiate with respect to, as well as ‘dependent variable(s)’ or ‘response variable(s)’ which you are differentiating; for example, $\frac{dy}{dx} = 0$. Technically speaking, here $y$ is a function of $x$, so we write $y = y(x)$ to formally denote this. It’s important to keep this in the back of your mind, even if the common notation omits it.

Example 12 (The World’s Simplest Differential Equation)

$$f(x) = 0. \tag{329}$$

Technically speaking, this is a differential equation with the zeroth derivative of $f$ with respect to $x$. As such it is a ‘zeroth-order’ differential equation. The world’s next simplest DE would be:

$$f(x) = c, \tag{330}$$

where $c$ is some constant. Then of course, one could set $f(x)$ equal to any specified function of $x$ which you desire. These are all trivial differential equations.

Example 13 (The World’s Next Simplest Differential Equation) We now wish to solve the following equation

$$\frac{df}{dx} = 0, \tag{331}$$

for the function $f = f(x)$. [131] The solution, of course, is trivial, since the differential equation asks: what functions $f$ have derivative zero with respect to $x$? Constant functions of course: $f(x) = c$, for some constant $c$.
[130] Note that a function maps from its domain (some predetermined set of values on which it is defined) to its co-domain (some pre-determined ‘target set’). Its ‘range’ is the image of its domain, that is, every member of its co-domain which is equal to $f$(some point in the domain). Sometimes, ‘co-domain’ and ‘range’ are used interchangeably.
[131] You may also see this equation written in the notation $f' = 0$, with primes being short-hand notation for differentiation.
Note that we can also view this differential equation as follows:

$$Df(x) = 0, \tag{332}$$

where $D = \frac{d}{dx}$ is a ‘first-order differential operator’. In this manner, we view the differential equation as some operator (the operator $\frac{d}{dx}$, which acts on objects by taking their derivative with respect to $x$) acting on $f$ to give zero.

In the previous example, one may see that asking for solutions to a differential equation:

$$Df = 0 \tag{333}$$

is the same as asking what the kernel of the relevant differential operator $D$ is, i.e. the set of functions which get sent to 0 when the operator $D$ acts on them. In this manner, to say that $f(x) = c$ is a solution to $\frac{df}{dx} = 0$ is the same as saying that the function $f(x) = c$ lies in the kernel of the differential operator $\frac{d}{dx}$.

Trivial examples aside, we now proceed with some more interesting (and useful) examples of differential equations, along with their solution strategies and algorithms on the way.

Example 14 (A First-Order DE) Consider the following differential equation:

$$Df = 0, \tag{334}$$

where $f = f(x)$ is some function of $x$ and $D$ acts by $Df = \frac{df}{dx} - m$, for some constant $m$. (Here $m$ is subtracted as a constant function rather than multiplied onto $f$, so this $D$ is affine rather than linear; we will still loosely call its solution set its ‘kernel’.) We can re-write this as

$$\frac{df}{dx} = m, \tag{335}$$

and seek all solutions to this differential equation (finding the ‘kernel’ of $D$). There are two ways to proceed from this point. One way is to make an intuitive guess and test if it is correct. In this case, asking for functions with a constant derivative is the same as asking for functions of constant gradient $m$, so we know that all solutions have to be ‘straight lines’, i.e. linear functions of the form:

$$f(x) = mx + b, \tag{336}$$

where $b$ is an arbitrary constant (the y-intercept of the graph $y = f(x) = mx + b$). Alternatively, we can proceed in a more ‘systematic way’ using the method of ‘separation of variables’, which is based on the concept of ‘exact differentials’.
In this manner, you can take it for granted that you can use Leibniz notation in a literal way which is still rigorous (with the right technology), so we multiply both sides of equation (335) by $dx$ to get rid of it from the denominator:

$$df = m\,dx. \tag{337}$$
We now have an exact differential $df$ of $f$ on the left-hand side. The fundamental theorem of calculus tells us how to integrate this precisely:

$$\begin{aligned} \int_{f(x_0)}^{f(x)} df = \int_{x_0}^{x} m\,d\tilde{x} \implies f(x) - f(x_0) &= m(x - x_0) \\ \implies f(x) &= m(x - x_0) + f(x_0) \\ &= mx + (f(x_0) - mx_0), \end{aligned} \tag{338}$$

where $x$ and $x_0$ are the limits we are integrating between, $x_0$ being some ‘initial point’ chosen a-priori. Without further information, we can simply re-label the constant $(f(x_0) - mx_0) = b$. Alternatively, we could have used the ‘indefinite integral’ approach and arrived at the same conclusion: $f(x) = mx + b$. Hence, the ‘kernel’ of the map $f \mapsto \frac{df}{dx} - m$ (with $m$ subtracted as a constant function, not multiplied onto $f$) is the set of all linear functions on $\mathbb{R}$ with gradient $m$, that is, the set of functions $\{f(x) = mx + b\}$, where $b$ is an arbitrary parameter.

In the previous example, we saw that two approaches worked. One was the semi-heuristic approach which relied on intuition and guessing the correct answer. The second was a somewhat algorithmic approach: ‘separation of variables’. We will now use separation of variables in one more toy example, before proceeding to a physical, less trivial example: Newton’s second law of motion.

Example 15 (Second, Third and Infinite Order Differential Equations) Consider now the kernel of the second-order differential operator $D = \frac{d^2}{dx^2}$. That is, consider all solutions of the differential equation:

$$\frac{d^2}{dx^2} f = 0, \tag{339}$$

where $f = f(x)$. This is a second-order differential equation because it involves two derivatives with respect to $x$. We shall now switch to using primes to denote derivatives when convenient, e.g. $\frac{d^2f}{dx^2} := f''$, $\frac{df}{dx} = f'$, etc. Again, it is easy to see that the solution to the differential equation $f'' = 0$ is the set of all straight lines, since straight lines have ‘no curvature’ (or equivalently, no acceleration). Systematically, we can use separation of variables as before:

$$\begin{aligned} \frac{d^2}{dx^2} f = 0 &\iff \frac{d}{dx}\left(\frac{df}{dx}\right) = 0 \iff \frac{d}{dx} f' = 0 \\ &\implies \int_{f'(x_0)}^{f'(x)} df' = \int_{x_0}^{x} 0 \cdot dx = 0 \\ &\implies f'(x) - f'(x_0) = 0, \quad \text{let } m = f'(x_0) \\ &\implies f' := \frac{df}{dx} = m \\ &\implies df = m\,dx. \end{aligned} \tag{340}$$

Again, we can avoid specifying limits of integration by using ‘indefinite integrals’ and keeping track of integration constants. For now, however, it’s best to leave the limits in, since students often have a bad habit of omitting the constants. Also, later on when one does ‘initial value problems’, keeping the limits of integration is equivalent to using ‘initial data’ to determine your integration constants at the end ...

For a more useful example of integration, we now turn to Newton’s definition (his second law of motion) of the force experienced by a point particle:

$$\text{Force} = \text{mass of particle} \times \text{acceleration of particle}. \tag{341}$$

In one-dimensional motion, recall that for a particle with displacement $x$ from the origin, its velocity $v$ and acceleration $a$ are defined as derivatives with respect to time:

$$v = \frac{dx}{dt}, \qquad a = \frac{dv}{dt} = \frac{d^2}{dt^2} x. \tag{342}$$

In this manner, we can re-write Newton’s second law in one dimension as:

$$F = m \frac{d^2x}{dt^2}. \tag{343}$$

This is simply a definition. To get useful, predictive physics out of Newton’s law, we need a force law, meaning some functional form for $F$! One simple example is to consider an object falling under gravity from the Newtonian view. For an object of mass $m$ that is small with respect to the Earth’s mass, close to the surface of the Earth we can approximate the force it experiences due to gravity as:

$$F_{grav} = mg, \tag{344}$$
where $g$ is the average acceleration due to gravity near the Earth’s surface, say $9.8\,\text{m/s}^2$ downwards. Ignoring air-resistance and all other effects, this ‘free fall’ motion is one-dimensional (downwards). If we let $x = x(t)$ be the vertical distance of an object from the Earth’s surface, Newton’s second law of motion gives rise to the following differential equation:

$$F_{grav} = m \frac{d^2x}{dt^2} \implies mg = m \frac{d^2x}{dt^2}. \tag{345}$$

To cancel the mass $m$ from both sides of the above differential equation is a subtle point: on the left-hand side, we have the ‘gravitational mass’ and on the right, we have the ‘inertial mass’. Indeed, this was something Newton considered. Flipping this around, we can say that the $g$ on the left side is ‘gravitational acceleration’ and that the $\frac{d^2x}{dt^2}$ on the right-hand side is ‘inertial acceleration’. That these are ‘equivalent’ is indeed a statement of ‘Einstein’s equivalence principle’ of General Relativity (in some weak form)! Physics aside, we end up with the second-order differential equation:

$$\frac{d^2x}{dt^2} = g. \tag{346}$$

Clearly $x$ must be some quadratic function (degree two polynomial) of $t$. To prove this explicitly, let $\dot{x} = \frac{dx}{dt}$, then use separation of variables:

$$\begin{aligned} \frac{d^2x}{dt^2} = \frac{d\dot{x}}{dt} = g &\implies d(\dot{x}) = g\,dt \implies \dot{x} = gt + u \\ &\iff dx = (gt + u)\,dt \implies \int dx = \int (gt + u)\,dt \\ &\implies x = \frac{1}{2} g t^2 + ut + c, \end{aligned} \tag{347}$$

where $u$ and $c$ are constants of integration [132]. Now, you may recall from high-school physics that this is a more familiar consequence of Newton’s second law for an object experiencing a constant force such as gravity! Hence we have $x(t) = \frac{1}{2}at^2 + ut + c$, where $a = g$ is the acceleration of the object (due to gravity) and $u$ is its ‘initial velocity’, that is, $v(0) = \frac{dx}{dt}\big|_{t=0}$. Furthermore, $c = x(0)$ is its ‘initial displacement’, i.e. its displacement at the initial time $t = 0$. Note that, differentiating (346) once more, we can equivalently say that the kernel of the third-order differential operator $D = \frac{d^3}{dt^3}$ is the set of all ‘parabolic functions’ (polynomials of degree at most two), or all functions of the form:

$$\frac{1}{2} g t^2 + ut + c, \tag{348}$$

where $g, u, c$ are arbitrary constants (in general). With the constants of integration undetermined, we call $x(t) = \frac{1}{2}gt^2 + ut + c$ the ‘general solution’ of the previous differential equation. Because the previous example was a ‘second order’ differential equation, we had to ‘integrate twice’, meaning we needed two pieces of ‘initial data’ to specify a unique solution. The data we need are the initial velocity $u = \dot{x}(0)$ of our object and its initial displacement $x(0)$. In general, for an ‘n-th’ order differential equation of a function of one real variable, you need $n$ pieces of ‘initial data’ to get a unique solution.

[132] Recall earlier comments: we can either carefully add constants of integration while doing indefinite integrals, or explicitly specify the limits of integration; both are equivalent.

17.2 Physical Examples

Thus far, we have considered ‘1-dimensional systems’ in the sense that the ‘response’ variable we solved for in our differential equations was 1-dimensional (a function). Since the concept of ‘dimension’ applies to more general things than just ‘space’, you will find that nature is governed by many differential equations with different dimensionality. As such, you can think of differential equations as modelling the ‘time-evolution’ of a (smooth [133]) system, that is, how some physical system evolves in time.

We now consider the following scenario. After borrowing the old cannon from Kings Park, St. George’s College refurbishes the cannon and places it on top of the tower. Having gone mad with power, the warden, Ian Hardy, decides to ‘cleanse’ college row by firing the cannon at the other colleges. The motion of the cannon-balls is, to some approximation, governed by Newton’s second law:

$$\vec{F} = m \frac{d^2}{dt^2} \vec{r}, \tag{349}$$

where $\vec{r} = \vec{r}(t)$ is the displacement vector for the cannon-ball, a vector function of time.
For now, we can establish a coordinate frame on top of the tower, considering the vertical direction of the motion to be in the $+z$ direction and the horizontal motion along college row to be in the $+x$ direction. To turn Newton’s second law into a differential equation for the ‘time-evolution’ of the cannon-ball (i.e. its trajectory), we need to know what forces are acting on the ball. For simplicity, let’s say that the forces acting on the cannon-ball are:

• Gravity acting downwards:
$$\vec{F}_{grav} = m\vec{g} \tag{350}$$
where $\vec{g} = (0, 0, -g)$ is the acceleration due to gravity and $g = 9.8\,\text{m/s}^2$ is its magnitude.
• A ‘drag force’ opposing the motion of the ball, due to friction between the cannon-ball and molecules of air. For reasonably low-velocity objects like cannon-balls, we can model aerodynamic drag linearly, that is, linearly proportional to the ball’s velocity:
$$\vec{F}_{drag} = -b\vec{v} \tag{351}$$
where $\vec{v} = (v_x, v_y, v_z)$ is the cannon-ball’s velocity vector (note that $v_y = 0$, by assumption / our chosen orientation of coordinate axes). Technically speaking, this is the ‘Stokes’ drag, modelling the air as a fluid (in the most general sense) and ignoring turbulence.

Thus, the total force acting on the ball is $\vec{F} = \vec{F}_{grav} + \vec{F}_{drag} = (-bv_x, 0, -mg - bv_z)$. We now get a system of second-order differential equations:

$$\vec{F} = m \frac{d^2}{dt^2} \vec{r} \iff (-bv_x, -bv_y, -mg - bv_z) = m\left(\frac{d^2x}{dt^2}, \frac{d^2y}{dt^2}, \frac{d^2z}{dt^2}\right)$$

$$\implies \frac{d^2x}{dt^2} = -\frac{b}{m} v_x \tag{352}$$

$$\frac{d^2y}{dt^2} = -\frac{b}{m} v_y$$

$$\frac{d^2z}{dt^2} = -\frac{b}{m} v_z - g. \tag{353}$$

Note that $v_x := \frac{dx}{dt}$, $v_z := \frac{dz}{dt}$, etc., so the velocities are functions of time.

[133] Technically, your system must be ‘smooth’ in the sense that it is differentiable, that is, its state-space is a smooth manifold. Some processes are not continuous, therefore not differentiable! Nonetheless, many discrete processes, such as quantum random walks (Ben Luo), can be approximated by some smooth process governed by a differential equation.

Problem 25 (Galilean Relativity) Recalling his study of the Galilean symmetry group, Ian Hardy notes that when he calculates the trajectory of his cannon-ball, he can rotate coordinates to make its motion 2-dimensional. This is because
‘Newtonian mechanics’ is invariant under the ‘Galilean Lie group’ – or equivalently, Newton’s laws of physics are the same in all Galilean inertial reference frames [134]. This means we can fire the cannon-ball so that its initial velocity in the y-direction is zero – hence we can ignore the y coordinate and the y differential equation, since we will simply have $y(t) = y_0$, where $y_0$ is the initial y coordinate of the cannon-ball.

[134] Einstein’s theory of special relativity says that nature (a 4-dimensional affine space instead of a 3-dimensional vector space) is invariant under transformations of the ‘Lorentz symmetry group’, which is different to the Galilean group.

Nonetheless, he needs to solve the x and z differential equations to get the motion. Luckily, these equations are uncoupled! This means that they are independent, so we can solve them separately.

I: Solve the projectile motion differential equation (352) for x(t), using separation of variables. You will need to use the fact that $v_x = \frac{dx}{dt}$ to do the second integration. When separating variables, you will need to use the fact that:
$$\int \frac{dv}{v} = \ln(v) + c, \tag{354}$$
where c is some constant of integration determined by your ‘initial value’ data (i.e. initial velocities and initial time). Alternatively, you can define your integration limits explicitly in terms of your initial data: $t_0 = 0$, $v_x(0)$, $v_z(0)$ and $x_0$, $z_0$.

II: Solve the differential equation (353) for the z component of the cannon-ball trajectory. Hint: recall that $\int \frac{f'}{f}\,dx = \ln(f) + c$, where f is some function, $f'$ is its derivative and c is a constant of integration.

You should check your answers with a tutor – or ask them for help. Note that your solutions should be of the form:
$$x(t) = x(0) + \frac{m}{b}v_x(0)\left(1 - e^{-\frac{b}{m}t}\right)$$
$$z(t) = z(0) - \frac{mg}{b}t + \frac{m}{b}\left(v_z(0) + \frac{mg}{b}\right)\left(1 - e^{-\frac{b}{m}t}\right). \tag{355}$$

Problem 26 (Physical Meaning of DEs) In the previous problem, find an expression for the velocity when the ‘drag force’ cancels out the gravitational force – i.e. when:
$$\mathbf{F}_{grav} + \mathbf{F}_{drag} = 0. \tag{356}$$
This is the point at which the net force acting on the cannon-ball is zero – meaning it travels thereafter with constant velocity. Such a velocity is referred to as the object’s terminal velocity. Physically, your answer should depend on the constant b, since this is the aerodynamic constant or ‘drag coefficient’, related to the geometry of the object and how we model air as a fluid.

Q2: Find the ‘time of flight’ of the cannon-ball. In particular, given that $z(0) = h$ is the height of the St. George’s College tower (say 15 metres) and that $z(t_{final}) \approx 0$ when the cannon-ball hits St. Catherine’s college, solve the z equation of motion for the time elapsed: $\Delta t = t_{final} - t_0 = t_{final}$ (setting $t_0 = 0$ for simplicity). Note: if you can’t solve it analytically, first try letting $b \to 0$, then solve the simplified case where there is ‘no air-resistance’.

Q3: Set b = 0 and re-solve the differential equation arising from Newton’s second law. This should give you the trajectory without air-resistance. In this case, you can express the z coordinate in terms of x and should get something of the form $z \sim x^2$, which is the equation for a parabola! This tells us that in the absence of drag forces (e.g. in a vacuum), projectile motion under gravity follows parabolic trajectories. Now, using your knowledge of limits and the exponential function, carefully take the limit $b \to 0$ for the trajectory $(x(t), 0, z(t))$ of the cannon-ball in the case with drag forces present. This should coincide with your result for the solution to the differential equation without air-resistance.

Thus we have considered a subset of a class of differential equations called ‘ordinary differential equations’ (ODEs). They are the simplest types of differential equations, which is why we can find nice ‘analytic’ solutions. In general, differential equations can be extremely hard to solve – sometimes, only numerical solutions are available (effectively speaking, since analytic solutions involving infinite sums of special functions can be slower for a computer to evaluate than a solution generated by numerical means).
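To make the contrast between ‘analytic’ and ‘numerical’ solutions concrete, here is a minimal Python sketch that integrates the x and z equations of motion with a classical fourth-order Runge-Kutta scheme and compares the result with the closed form $x(t) = x(0) + \frac{m}{b}v_x(0)(1 - e^{-bt/m})$ and the corresponding z(t). The function names, the mass, the drag constant and the launch velocities are all assumptions chosen for illustration; they are not data from the tutorial.

```python
import math


def closed_form(t, x0, z0, vx0, vz0, m, b, g=9.8):
    """Analytic trajectory for linear drag F = -b v (requires b > 0)."""
    decay = 1.0 - math.exp(-b * t / m)
    x = x0 + (m / b) * vx0 * decay
    z = z0 - (m * g / b) * t + (m / b) * (vz0 + m * g / b) * decay
    return x, z


def rk4_trajectory(t_end, x0, z0, vx0, vz0, m, b, g=9.8, steps=2000):
    """Integrate x'' = -(b/m) x' and z'' = -(b/m) z' - g with classical RK4."""

    def deriv(state):
        _, _, vx, vz = state
        return (vx, vz, -(b / m) * vx, -(b / m) * vz - g)

    h = t_end / steps
    state = (x0, z0, vx0, vz0)
    for _ in range(steps):
        k1 = deriv(state)
        k2 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = deriv(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(
            s + (h / 6.0) * (a1 + 2 * a2 + 2 * a3 + a4)
            for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4)
        )
    return state[0], state[1]


# Hypothetical launch data (assumed, not from the tutorial):
# a 5 kg ball with b = 0.5 kg/s fired from a 15 m tower.
params = dict(x0=0.0, z0=15.0, vx0=20.0, vz0=5.0, m=5.0, b=0.5)
x_exact, z_exact = closed_form(1.5, **params)
x_num, z_num = rk4_trajectory(1.5, **params)
print("difference in x:", abs(x_exact - x_num))
print("difference in z:", abs(z_exact - z_num))
```

For an equation this simple the two approaches should agree to many decimal places; the numerical route only becomes essential when no closed form is available.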
For engineering purposes, the overwhelming majority of physical processes are modelled using ‘numerical analysis’ to solve complicated differential equations. Nonetheless, it is important and instructive to get a handle on how differential equations with known analytic solutions behave.

17.3 Operators, Eigenfunctions and Spectra

In the previous example of projectile motion with air resistance, we can view the problem as a statement in operator theory:
$$D\mathbf{r}(t) = (0, 0, -mg) \tag{357}$$
where $D = m\frac{d^2}{dt^2} + b\frac{d}{dt}$ is the Newtonian operator minus the air-resistance operator (they act on $\mathbf{r}$ to give Newton’s law $\mathbf{F} = m\frac{d^2}{dt^2}\mathbf{r}$ and the drag force $\mathbf{F}_{drag} = -b\frac{d}{dt}\mathbf{r}$). In the absence of gravity, the kernel of the operator D is simply the set of solutions to the projectile motion differential equation on the International Space Station (where gravity is negligible).

You may wonder what the purpose of the operator viewpoint is – indeed, for such simple differential equations, it serves only an aesthetic purpose: to connect the theory of differential equations to linear and abstract algebra. However, the operator formalism is immensely useful when studying properties of more complicated equations – for example the ‘heat equation’. This is a very active area of research, as you may see by typing ‘Heat Kernel’ into Google.

For those of you who studied quantum mechanics, you should be familiar with the momentum operator:
$$\hat{p}_x = -i\hbar\frac{d}{dx}, \tag{358}$$
where i is the imaginary unit and $\hbar$ is the (reduced) Planck constant (which sets the ‘length’ scale of quantum behaviour). You may now ask, what are the eigenfunctions of the momentum operator? That is, what functions $\psi(x)$ solve the eigenvalue equation (a differential equation):
$$\hat{p}_x\psi(x) = k\psi(x), \tag{359}$$
where k is the eigenvalue of the eigenfunction $\psi$. Recalling eigenvectors and eigenvalues from linear algebra, you will notice that functions form a vector space in the abstract sense – that is, they satisfy all the vector space axioms (addition, linearity etc). Hence, eigenfunctions and eigenvectors are the same concept – except that eigenfunctions typically exist in infinite dimensional vector spaces. Now, to solve our problem of finding the eigenfunctions of the momentum operator, we have to solve the differential equation:
$$-i\hbar\frac{d\psi}{dx} = k\psi(x). \tag{360}$$

Exercise 60 (Snakes on a Plane Wave) After being turned into a photon, Ben Luo decides to take revenge on the tutorial students who drew inaccurate spacetime diagrams of his journey of enlightenment. In this manner, he decides to turn Matt Fernandez into a plane wave – that is, a solution to Schrödinger’s equation in free space.
To do this, he first has to set Matt loose into a region of the universe where gravity is negligible and there are no external interactions interfering with him. As his final revenge, Ben sets a bunch of quantum snakes loose to hunt down Matt. Having turned into a plane wave, Matt has a definite momentum but indefinite position. Quantum mechanically speaking, he exists across all space simultaneously – until an observer (or snake) performs a position measurement on him. Thus, he is safe for now. To help Matt, solve the above differential equation and show that you do indeed get ‘plane wave’ solutions.

In the last exercise, you should see that $\psi(x) = Ae^{\frac{ik}{\hbar}x}$ is the general solution to the differential equation. Here A is some constant, which is determined by ‘normalization’ (total probability summing to 1), which we can ignore for now [135]. The complex exponential function $e^{\frac{ik}{\hbar}x}$ is a ‘plane wave’ – like electromagnetic waves in free space. To see that explicitly, you can use Euler’s formula:
$$e^{\frac{ik}{\hbar}x} = \cos\left(\frac{k}{\hbar}x\right) + i\sin\left(\frac{k}{\hbar}x\right).$$
In the context of operator theory, we say that $e^{\frac{ik}{\hbar}x}$ is an eigenfunction of the momentum operator, $-i\hbar\frac{d}{dx}$, with eigenvalue k. Quantum mechanically, the process of acting an ‘operator’ (observable) on a wavefunction (representing a particle, person or cat in a box etc) is precisely the process of measurement. The eigenvalue we get is the resulting ‘measurement outcome’ – in this case, it is a value for the momentum k in the x-direction of Matt Plane Wave Fernandez.

How does this coincide with what you see in linear algebra? Well, eigenfunctions and eigenvectors are really just part of the same general concept – to make sense of everything in an efficient, powerful way, you will need to study the theory of ‘Hilbert Spaces’. This also illustrates one motivation and application for operator theory – the entirety of quantum mechanics is based on it! In more general terms, studying the properties of differential operators can tell you a lot about the properties of the solutions to the differential equations they generate ... even if you can’t find them!

[135] Since space is mathematically infinite, we must restrict to some finite space / region ... or we will get a divergent integral.
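If you would like a quick numerical sanity check of the eigenvalue statement, the sketch below applies $-i\hbar\frac{d}{dx}$ to a plane wave using a central finite difference and confirms that the result is (approximately) k times the original wavefunction. Setting $\hbar = 1$, the particular values of k and x, and the helper names are assumptions made purely for illustration.

```python
import cmath

HBAR = 1.0  # natural units: an assumption for this illustration


def plane_wave(x, k, amplitude=1.0):
    """psi(x) = A exp(i k x / hbar)."""
    return amplitude * cmath.exp(1j * k * x / HBAR)


def momentum_operator(psi, x, h=1e-5):
    """Apply p = -i hbar d/dx to psi at x via a central finite difference."""
    dpsi_dx = (psi(x + h) - psi(x - h)) / (2 * h)
    return -1j * HBAR * dpsi_dx


k = 2.5   # hypothetical momentum eigenvalue
x = 0.7   # any sample point will do


def psi(s):
    return plane_wave(s, k)


# If psi is an eigenfunction, (p psi)(x) / psi(x) should equal k at every x.
ratio = momentum_operator(psi, x) / psi(x)
print("eigenvalue estimate:", ratio)
```

The ratio is independent of the sample point x, which is exactly what it means for the plane wave to be an eigenfunction rather than just any function.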
18 Tutorial 16: Differential Equations and Integrating Factors

In the last tutorial, we looked at the preliminary notion of ‘differential operators’ in the context of linear ‘Ordinary Differential Equations’ (ODEs). In the examples and problems covered, we were able to solve the differential equations arising from various processes by the method of ‘separation of variables’. Although powerful, the method of separation of variables only works if a differential equation is ‘separable’ – most differential equations aren’t, although many important differential equations are.

As it turns out, whether or not a differential equation is separable is intimately tied to the coordinate system in which it arises. In particular, the study of the separability of Elliptic Partial Differential Equations (covering a vast class of physical phenomena), such as the ‘Laplace equation’, is a contemporary area of research in differential geometry [136]. Luckily for us, many differential equations which do not ‘appear’ to be separable can be put into a ‘separable form’ by using a simple tool – an ‘integrating factor’.

[136] In this regard, recent studies of the properties and existence of the ‘Benenti tensor’ mark a critical advancement in this area.

18.1 Review – Theory of separation of variables

Definition 10 (Separability) A first order differential equation in y is separable if it can be written in the form:
$$\frac{dy}{dx} = h(y)g(x), \tag{361}$$
for some functions h and g.

Previously, we looked at n-th order ordinary differential equations (specifically, for n = 1 and n = 2) of the form:
$$\frac{d^n}{dx^n}f(y) = g(x)h(y) \tag{362}$$
where f, g, h are suitably defined functions of x or y and $n \geq 1$ (in particular, we looked at simple cases with f(y) = y). Such equations were easy to solve because we could directly ‘integrate’ them. In particular, by letting $v(y) = \frac{d^{n-1}}{dx^{n-1}}f(y)$, the DE (362) becomes:
$$\frac{dv}{dx} = g(x)h(y), \tag{363}$$
hence separating variables gives:
$$\frac{dv(y)}{h(y)} = g(x)\,dx, \tag{364}$$
allowing us to explicitly integrate the left-hand side with respect to y and the right-hand side with respect to x. By applying ‘initial conditions’ (physical data), we then get a unique solution for $v(y) = \frac{d^{n-1}}{dx^{n-1}}f(y)$. Applying this process n times, we finally arrive at an implicit solution for y in terms of the independent variable, x:
$$G(y) = H(x), \tag{365}$$
where G and H are functions determined by integration. We can re-write this as:
$$F(x, y) = G(y) - H(x) = 0. \tag{366}$$
Hence, solutions to our original differential equation are level curves of the function F of two variables. If F is continuously differentiable on some open set U, then by the implicit function theorem, it follows (roughly [138]) that on some open subset of U, we can explicitly write y as a function of x:
$$y = Q(x), \tag{367}$$
where Q is some appropriate function, providing an explicit solution to our original differential equation.

[138] For a more ‘explicit’ (accurate) statement of the implicit function theorem, see your calculus textbook.

If you review this process, you will immediately see that the key ingredient behind ‘separation of variables’ for ODEs is the existence of a ‘total differential’ or ‘exact differential 1-form’ (recall tutorial 4). In particular, we had
$$\frac{dv(y)}{h(y)} = g(x)\,dx, \tag{368}$$
and it was stated that one could explicitly integrate the left-hand side of the equation:
$$\int\frac{dv(y)}{h(y)} = \int\frac{v'(y)}{h(y)}\,dy = \dots \tag{369}$$
In general, this is only possible if the function $\frac{v'(y)}{h(y)}$ is of a special form – in particular, if $v'(y) = \lambda h'(y)$ (for some constant $\lambda$), then we can use the identity:
$$\int\frac{h'(y)}{h(y)}\,dy = \ln[h(y)] + c, \tag{370}$$
where c is some constant of integration. This is because we have an exact differential [139]
$$\frac{h'(y)}{h(y)}\,dy = d(\ln[h(y)]), \tag{371}$$
allowing us to apply the ‘fundamental theorem of Calculus’. If the left-hand side, $\frac{dv(y)}{h(y)}$, is not of this form, either we cannot integrate it or we must use some special ‘tricks’ to put it into this form.

[139] Recall that for a function f of one variable – say y – its ‘exterior derivative’ or exact differential is given by: $df = \frac{df}{dy}dy$. The term dy is a differential 1-form – an object which is ‘dual’ to the unit vector $e_y$ in the y-direction.

Problem 27 (A question of separability) Consider the following differential equation for y as a function of t:
$$\frac{dy}{dt} + \frac{5}{t}y = t - 2 + \frac{2}{t}. \tag{372}$$
Now try to solve this differential equation for y explicitly in terms of t, using the method of separation of variables. Hint: If you can’t complete this problem in 5–10 minutes, move to the next section.

18.2 Integration Factors

For now, we shall consider ordinary first-order differential equations. The integrating factor method can be used recursively for higher-order differential equations ... if you are lucky. For example, recall the projectile motion problems in tutorial 15 – these were second order differential equations, but effectively amounted to two sequential first order differential equations for the projectile trajectory (first we solved for velocity, then displacement).

In general, the integrating factor method is useful for solving ODEs of the following form:
$$\frac{dy}{dx} + P(x)y = Q(x), \tag{373}$$
where P and Q are functions of the independent variable, x. To put this differential equation into ‘separable form’ (defined earlier), we may consider multiplying it by some function I(x) to get an exact differential on both sides. Assuming this is possible, we have:
$$I(x)\left(\frac{dy}{dx} + P(x)y\right) = I(x)Q(x), \tag{374}$$
where the left-hand side, $I(x)\left(\frac{dy}{dx} + P(x)y\right)$, is a total derivative. Alternatively, multiplying by dx on both sides, we get
$$I(x)\left(dy + P(x)y\,dx\right) = I(x)Q(x)\,dx, \tag{375}$$
where $I(x)(dy + P(x)y\,dx)$ is an exact differential [140].

[140] Recall that an exact differential, or ‘exterior derivative’, of a function f(x, y) of two variables is given by: $df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy$.

Hence, we must have:
$$I(x)\left(\frac{dy}{dx} + P(x)y\right) = I(x)\frac{dy}{dx} + y\frac{dI}{dx}$$
$$\iff I(x)\left(dy + P(x)y\,dx\right) = d(I(x)y) = y\,dI(x) + I\,dy = y\frac{dI}{dx}dx + I\,dy. \tag{376}$$
Comparing coefficients of dx and dy (linearly independent dual vectors) on the left- and right-hand sides, we must have:
$$\frac{dI}{dx} = I(x)P(x), \qquad I(x) = I(x). \tag{377}$$
Hence we have a separable first-order differential equation for the integrating factor, I(x):
$$\frac{dI}{dx} = I(x)P(x) \implies \frac{dI}{I} = P(x)\,dx \implies \int\frac{dI}{I} = \int P(x)\,dx$$
$$\implies \ln[I(x)] = C + \int P(x)\,dx \implies I(x) = e^{\int P(x)\,dx}, \quad \text{(set } C = 0\text{)}. \tag{378}$$
Note that here, the constant C of integration with respect to I(x) is superfluous – hence we discard it by setting it to zero. Hence we arrive at a functional expression for our integration factor. Doing this process in reverse, we can then solve our original differential equation!

To summarize, we can reduce any differential equation of the form:
$$\frac{dy}{dx} + P(x)y = Q(x), \tag{379}$$
to a separable one:
$$I(x)\left(\frac{dy}{dx} + P(x)y\right) = Q(x)I(x) \iff d(I(x)y) = Q(x)I(x)\,dx \iff I(x)y = \lambda + \int Q(x)I(x)\,dx, \tag{380}$$
where $\lambda$ is the constant of integration appearing from the left-hand side. We did this by multiplying both sides by an integration factor I(x), whose form is given by:
$$I(x) = e^{\int P(x)\,dx}. \tag{381}$$

Exercise 61 (Reading the Question) In an alternate universe, William is still working on the first problem – not having read the hint pertaining to needing an alternate solution strategy. To help alternate William out, try solving the earlier differential equation for y as a function of t
$$\frac{dy}{dt} + \frac{5}{t}y = t - 2 + \frac{2}{t}, \tag{382}$$
by using the integration factor method. Hint: Note that since the independent variable here is t, we have $P(t) = \frac{5}{t}$, $Q(t) = t - 2 + \frac{2}{t}$ and hence:
$$I(t) = e^{\int\frac{5}{t}dt} = e^{5\ln(t)} = e^{\ln(t^5)} = t^5, \tag{383}$$
(ignoring the constant of integration).
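The recipe summarized in (379)–(381) can also be carried out numerically. The following sketch is one possible illustration: it evaluates $I(x) = e^{\int P\,dx}$ and $\int Q I\,dx$ with Simpson’s rule and then recovers y(x). To avoid giving away Exercise 61, it is tested on a different, made-up equation, $\frac{dy}{dx} + \frac{1}{x}y = x$ with $y(1) = 1$, whose integrating factor is $I(x) = x$ and whose solution is $y = \frac{x^2}{3} + \frac{2}{3x}$. The function names and this test equation are assumptions, not part of the tutorial.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def solve_linear_ode(P, Q, x0, y0, x):
    """Value y(x) for y' + P(x) y = Q(x) with y(x0) = y0.

    Uses the integrating factor I(s) = exp(integral of P from x0 to s),
    normalised so that I(x0) = 1; the constant lambda in
    I y = lambda + integral(Q I) is then simply y0.
    """

    def integrating_factor(s):
        return math.exp(simpson(P, x0, s))

    integral = simpson(lambda s: Q(s) * integrating_factor(s), x0, x)
    return (y0 + integral) / integrating_factor(x)


# Hypothetical test equation: y' + y/x = x, y(1) = 1.
y_numeric = solve_linear_ode(lambda s: 1.0 / s, lambda s: s, 1.0, 1.0, 2.0)
y_exact = 2.0**2 / 3 + 2 / (3 * 2.0)
print(y_numeric, y_exact)
```

Choosing the lower integration limit to be the initial point $x_0$ is just a convenience: it corresponds to a particular choice of the (superfluous) constant C in (378).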
Problem 28 (Party Patrol) In a twist of events, during a “Filius Fogg” themed college party, Alice becomes a bit too rowdy. The Sheriff on duty, Matthew Goss, decides that it’s time to Taser Alice with the official RA Taser. When dry, the human skin has a resistance of about 100,000 Ohms. To model this tasering process, we can consider Alice to be a resistor of 100,000 Ω (Ohms) and Matt’s taser to be a discharging capacitor, with a capacitance of C = 100 µF (micro farads). This makes the system an ‘RC Circuit’. Using Ohm’s Law, we have
$$V = IR, \tag{384}$$
between any two points on the circuit, where the voltage V and current I are functions of time and the resistance R between the two points is constant. Since capacitance C is related to the voltage and charge Q stored in the taser, we have:
$$Q = CV. \tag{385}$$
Finally, since current is defined as the ‘rate of flow of charge’ through any chosen point in the circuit, we have $I = \frac{dQ}{dt}$. Thus, differentiating the capacitance equation with respect to t (noting that the capacitance C is constant), we get:
$$\frac{dQ}{dt} = I = C\frac{dV}{dt}. \tag{386}$$
Combining this with Ohm’s law and conservation of charge (Kirchhoff’s current law), we get the following differential equation for the voltage V:
$$\frac{dV}{dt} + \frac{1}{RC}V = 0. \tag{387}$$

Q0 [Easy]: Reproduce the intermediate steps required to derive the above differential equation. Now, if you like – instead of eliminating the current I, try eliminating the voltage V and arrive at a differential equation for the current I as a function of time t.

Q1: Is this differential equation separable? If so, re-arrange it to the canonical form defined at the start of the tutorial.

Q2: Solve this differential equation for the voltage V as a function of time t, where t = 0 is the time of initial discharge (tasering). To get a unique solution, set the initial voltage $V_0 := V(0)$ to be 150,000 volts. Hint: Your solution should involve some sort of ‘exponential decay’. Can you think of why this makes sense physically?
Q3: What is the voltage V after a time t = 3 seconds of tasing? Would this voltage ‘stun’ the target, or is it lethal?

Q4: There is one special constant that characterises an RC circuit. In fact, any sort of ‘exponential decay law’ (analogous to the ‘half-life’ of a radioactive substance), such as the ‘skin-depth’ of an electromagnetic wave penetrating some surface, has an equivalent constant. This constant is called the ‘RC time constant’, defined by:
$$\tau = RC. \tag{388}$$
This is equivalent to the time it takes to discharge the capacitor to $\frac{1}{e} = e^{-1} \approx 36.8\%$ of its initial charge. Prove that $\tau = RC$ is indeed the time at which $V = \frac{1}{e}V_0$. Now compute the time constant $\tau$ for this RC circuit.

Q6: Having sobered up from her first tasering, Alice continues to party – however, an unfortunate turn of events leads to someone spilling drink on her, thus making her extremely aggressive. Being a responsible Sheriff, Matt decides to taser Alice again. Uh oh! When setting the voltage, Matt forgot to take into account that now wet, Alice’s equivalent resistance is reduced to R = 1000 Ω. Recompute the quantities in Q3, Q4 and Q5 with this new value for resistance.

For most of you, the last problem should have been relatively easy since no integration factor was required. The following problem can be solved either using an integration factor, or (with a trivial trick) immediately by separation of variables.

Problem 29 (Fresher Inductions) Having already tased Alice, Matthew Goss grows hungry for power. As such, in the upcoming fresher inductions, he decides to connect new college students to an inductor – thus forming an ‘RL-Circuit’ (resistor-inductor circuit). An inductor is essentially a coil of wire (e.g. copper wound on a torus) which acts to resist changes in the electric current that flows through it. Drawing on the concept of ‘inertia’ from classical mechanics, one can very loosely consider it as something analogous to ‘mass’ for an electrical circuit.
The inductance L of a circuit element is defined by the magnetic flux $\phi$ through the circuit, generated by a flow of charge (current) I:
$$L = \frac{d\phi}{dI}. \tag{389}$$
Faraday’s law of induction states that the voltage induced by any change in magnetic flux through the circuit is given by:
$$V = \frac{d\phi}{dt} \implies V = L\frac{dI}{dt}. \tag{390}$$
Combining this with Ohm’s Law, V = IR, and the conservation of (electrical potential) energy (Kirchhoff’s voltage law), we get the following differential equation for the current I flowing through the circuit, as a function of time t:
$$V_{in} = IR + L\frac{dI}{dt}. \tag{391}$$
Here $V_{in}$ is the ‘input voltage’, which is constant in time.

Q0 [Easy]: If you study physics or engineering, derive the above differential equation based on the principles outlined. Hint: All the hard work has already been done.

Q1: Rearrange this differential equation into standard form, then either solve it directly using ‘separation of variables’, or find an ‘integration factor’ and then solve it.

Q2: The voltage $V_R$ across the resistive element of this circuit (the college fresher) is given by conservation of energy:
$$V_R = V_{in} - V_L, \tag{392}$$
where $V_L$ is the voltage across the inductor. Combining this with the above differential equation and using a ‘step-voltage’ input (meaning $V_{in}(t) = 0$ for $t < 0$ and $V_{in} = V_0$ for $t \geq 0$), one should get:
$$V_L(t) = V_0e^{-\frac{R}{L}t}, \qquad V_R = V_0\left(1 - e^{-\frac{R}{L}t}\right). \tag{393}$$
Similar to the RC circuit, we can define a special constant – the ‘time constant’ for the RL circuit. In particular, we define the time constant $\tau$ for an RL circuit to be the time it takes for the voltage across the inductor L to drop to a factor of $\frac{1}{e}$ of its initial value. Equivalently, this is the time taken for the voltage across the resistor (fresher) R to rise to within $\frac{1}{e}$ of its final value. Using this information, prove that the time constant is given by:
$$\tau = \frac{L}{R}. \tag{394}$$
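As a rough numerical companion to (391)–(394), the sketch below integrates $L\frac{dI}{dt} = V_0 - IR$ from $I(0) = 0$ with a fourth-order Runge-Kutta step and reads off the inductor voltage $V_L = V_0 - IR$ at time $t = L/R$. The component values (12 V, 1000 Ω, 2 H) and the function name are assumptions picked only for illustration; a numerical check like this is of course not a substitute for the proof the problem asks for.

```python
import math


def rl_current(t_end, V0, R, L, steps=1000):
    """RK4 integration of L dI/dt = V0 - I R, starting from I(0) = 0."""

    def dI_dt(i):
        return (V0 - i * R) / L

    h = t_end / steps
    i = 0.0
    for _ in range(steps):
        k1 = dI_dt(i)
        k2 = dI_dt(i + 0.5 * h * k1)
        k3 = dI_dt(i + 0.5 * h * k2)
        k4 = dI_dt(i + h * k3)
        i += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return i


# Hypothetical component values (assumed, not from the tutorial).
V0, R, L = 12.0, 1000.0, 2.0
tau = L / R

current_at_tau = rl_current(tau, V0, R, L)
V_R = current_at_tau * R   # voltage across the resistor
V_L = V0 - V_R             # voltage across the inductor, from (392)
print("V_L / V0 at t = tau:", V_L / V0, " compare 1/e:", math.exp(-1))
```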
19 Tutorial 17: Second Order Linear Differential Equations

In the last two tutorials we reviewed first order ordinary differential equations (ODEs) and how they arose as models of various physical phenomena. In particular, we looked at differential equations of the form:
$$\frac{dy}{dx} + P(x)y = 0, \tag{395}$$
where P is some function of the independent variable x. Such an equation was solved using ‘separation of variables’ and integrating both sides. We also studied ODEs of the form:
$$\frac{dy}{dx} + P(x)y = Q(x), \tag{396}$$
where P and Q are functions of x. For such equations, we had to multiply both sides by an ‘integrating factor’ – a function of the form
$$I(x) = e^{\int P(x)\,dx}, \tag{397}$$
in order to express the equation in ‘separable form’, whence we could apply the separation of variables method.

However, in general, physical processes may be modelled by differential equations containing higher order derivatives. In special cases like the projectile motion problem – a second order differential equation – it may be possible to recursively apply the ‘separation of variables’ method and integrate multiple times to obtain the general solution. Fortunately, not all second order differential equations are that easy (otherwise they would be boring) – hence we need a general solution strategy.

For those of you who have already studied second order differential equations, this will be good revision with a twist of applications and some extra insight into the underlying mathematical theory. In particular, we connect the mathematics of linear spaces (vector spaces) and differential operators to solution strategies for second order ODEs.

19.1 Homogeneous Second Order ODEs

A linear homogeneous second order ordinary differential equation in the dependent (response) variable f is given by:
$$a\frac{d^2f}{dx^2} + b\frac{df}{dx} + cf = 0, \tag{398}$$
where $a, b, c \in \mathbb{R}$ are real coefficients and f is a function of the independent variable x. To say that the second order ODE (398) is ‘linear’ means that the solution space of the differential equation is a two-dimensional vector space (an abstract vector space where functions are vectors). Recall that a vector space (‘linear space’) V is characterized by the property that any linear combination of two vectors $v_1, v_2 \in V$ is equal to another vector $u = \lambda_1v_1 + \lambda_2v_2$ (where $\lambda_j$ are constants) which also lies inside V.

In the context of our differential equation, this means that any linear combination of solutions $f_1$ and $f_2$ to (398) must also be a solution to the differential equation:
$$\lambda_1f_1 + \lambda_2f_2, \tag{399}$$
where $\lambda_{1,2}$ are real or complex coefficients. This means that the solution space to our second order ODE is closed under addition and scalar multiplication – making it a vector space. Note that it trivially contains the ‘additive identity’ or ‘zero element’, given by the function f(x) = 0.

Exercise 62 (Doubting Thomas) The expression ‘don’t be a doubting Thomas’ comes from Thomas the Apostle, who refused to believe that Jesus had returned from the dead and appeared to the other apostles. He demanded to see evidence with his own eyes – thus remaining skeptical till he could feel and see the wounds the martyr had received during crucifixion. In a dramatic re-adaptation of ancient events, Thomas McKenney refuses to believe that the second order ODE (398) is linear till he sees a proof with his own eyes.

Q1: To help Thomas, prove that given any two solutions $f_1(x)$ and $f_2(x)$ to the ODE (398),
$$g(x) = \lambda_1f_1 + \lambda_2f_2 \tag{400}$$
is also a solution, where $\lambda_{1,2}$ are real or complex coefficients.

Q2: To finish showing that the solution space to (398) is a linear space (vector space), we must show that it contains the zero element. Show that the ‘zero function’:
$$0(x) = 0 \quad \forall x \in D \tag{401}$$
where D is the domain on which the ODE is defined, is a solution to the ODE. This is called the ‘trivial solution’.

It is now instructive to see an explicit example of a second order ODE and its solutions.
Exercise 63
Q1: Verify that
$$f(x) = \lambda_1e^{\frac{(-1+\sqrt{13})}{6}x} + \lambda_2e^{\frac{(-1-\sqrt{13})}{6}x}$$
is a solution to the following differential equation:
$$3\frac{d^2f}{dx^2} + \frac{df}{dx} - f = 0, \tag{402}$$
where $\lambda_{1,2}$ are constant coefficients.

Q2 (Messy): By defining $f_0 = f(0)$ to be f evaluated at x = 0 and $f'_0 := \frac{df}{dx}\big|_{x=0}$ to be the derivative of f (with respect to x) evaluated at x = 0, express the constants $\lambda_1$ and $\lambda_2$ in terms of $f_0$ and $f'_0$.

As the last exercise illustrates, it is relatively easy to verify that a solution satisfies some differential equation once you have it – but in general, you will have to derive the solution yourself. To do this, we take a brief detour into the land of matrices and linear algebra to obtain a solution algorithm for second order linear ODEs.

19.2 Theory of Linear ODEs

In general, the corresponding demonstration can be performed for n-th order linear ODEs with constant coefficients. However, for simplicity, we shall stick to the case n = 2 – second order ODEs. If we are given a second order linear differential equation
$$a\frac{d^2f}{dx^2} + b\frac{df}{dx} + cf = 0, \tag{403}$$
you may have noticed (recalling tutorials 15 and 16) that we can re-write this as:
$$Df = 0, \tag{404}$$
where
$$D := a\frac{d^2}{dx^2} + b\frac{d}{dx} + c \tag{405}$$
is a linear (differential) operator. However, from linear algebra (or tutorial 8), you may recall that linear operators and matrices have a one-to-one correspondence – at least for finite dimensional vector spaces. This immediately suggests a relationship between linear algebra and differential equations. Now consider the following trick – very similar to the one we used to solve our second order projectile motion differential equation in tutorial 15:
$$f_1 := f, \qquad f_2 := \frac{df}{dx}, \tag{406}$$
where f is the dependent variable in our second order ODE (403). If we differentiate the above system of equations for $f_1$ and $f_2$, we get:
$$\frac{d}{dx}f_1 := \frac{df}{dx} = f_2,$$
$$\frac{d}{dx}f_2 := \frac{d^2f}{dx^2} = -\frac{b}{a}\frac{df}{dx} - \frac{c}{a}f = -\frac{c}{a}f_1 - \frac{b}{a}f_2, \tag{407}$$
where we have simply used the definitions of $f_{1,2}$ and expressed $f''$ in terms of f and $f'$ by re-arranging our ODE (403). This almost looks like a system of linear equations – indeed it is! If we define the 2-by-2 matrix M
$$M = \begin{pmatrix} 0 & 1 \\ -\frac{c}{a} & -\frac{b}{a} \end{pmatrix} \tag{408}$$
we see that the array (407) can be written as a matrix equation:
$$\frac{d}{dx}F = MF \tag{409}$$
where $F = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}$ is a column vector containing the functions $f_1$ and $f_2$. Note that the differential operator $\frac{d}{dx}$ simply acts on a general matrix by acting on each of its components – so for example,
$$\frac{d}{dx}F := \begin{pmatrix} \frac{df_1}{dx} \\ \frac{df_2}{dx} \end{pmatrix}. \tag{410}$$
Therefore, we have reduced our linear second order differential equation (403) to a system of coupled first-order differential equations (407). Intuitively, the form suggests ‘separation of variables’ – something like:
$$\frac{dF}{F} = M\,dx, \tag{411}$$
then integrating both sides. Strictly speaking, this is not ‘formally’ correct (although it can be formalized) – nonetheless, it gives the solution (provided $a \neq 0$):
$$F(x) = e^{Mx}C. \tag{412}$$
Here C is a 2-dimensional vector of constants, written as a column vector $C = [c_1\ c_2]^T$ (the transpose of a 1-by-2 row vector). The quantity $e^{Mx}$ is the ‘matrix exponential’ of the matrix Mx (M multiplied by x) – it is well-defined provided M has a finite operator norm. For now, you don’t have to worry about what this means – we encountered this object earlier in tutorial 8, where we used Taylor series to express the exponential of a matrix. Here it suffices to note that we can compute the exponential of a matrix easily by diagonalizing it – that is, by finding its eigenvalues (spectrum) and its eigenvectors.

If M is of a ‘nice form’, it will have two linearly independent eigenvectors corresponding to two (possibly distinct) eigenvalues. In the case where there is only one distinct eigenvector, we have to put M into its ‘Jordan normal form’, which is not too difficult, but well beyond the technicality we want to delve into [141]. Nonetheless, assuming we can diagonalize M, then there exists some matrix U consisting of the eigenvectors of M as its columns, such that:
$$M = U\Lambda U^{-1}, \tag{413}$$
where $U^{-1}$ denotes the inverse of U (in the special case where U is unitary, this is just its conjugate transpose $U^\dagger$). The matrix $\Lambda$ is the 2-by-2 diagonal matrix:
$$\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \tag{414}$$
consisting of the eigenvalues $\lambda_{1,2}$ of M. Hence, the matrix exponential $e^{Mx}$ is given by:
$$e^{Mx} = U\begin{pmatrix} e^{\lambda_1x} & 0 \\ 0 & e^{\lambda_2x} \end{pmatrix}U^{-1}. \tag{415}$$

[141] Alternatively, one can make use of the ‘Putzer algorithm’ to compute the matrix exponential. In some cases, it is easy to compute directly via Taylor series and brute force – with some intuition.

In the case where the eigenvalues are distinct (non-equal), one will find that the general solution to our second order linear differential equation (403) is given by:
$$f(x) = c_1e^{\lambda_1x} + c_2e^{\lambda_2x}, \tag{416}$$
where $\lambda_j$ are the eigenvalues of the matrix M and $c_j$ are constants determined by ‘boundary values’ (f and its derivative at x = 0) or ‘initial conditions’ (if the independent variable x represents ‘time’). Note, we can consider the functions $e^{\lambda_1x}$ and $e^{\lambda_2x}$ to be ‘basis vectors’ for the 2-dimensional solution space of our ODE – hence any general vector in that space (i.e. any solution) is necessarily some linear combination of $e^{\lambda_1x}$ and $e^{\lambda_2x}$!
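To see the matrix exponential in action, here is a small Python sketch that computes $e^{Mx}$ for a 2-by-2 matrix by truncating its Taylor series, using the ODE $3f'' + f' - f = 0$ from Exercise 63 as a test case. It starts from the vector $F(0) = (1, \lambda_1)^T$, which corresponds to the solution $f = e^{\lambda_1x}$ with $\lambda_1 = \frac{-1+\sqrt{13}}{6}$, and checks that $e^{Mx}F(0) = e^{\lambda_1x}F(0)$. The helper names, the number of Taylor terms and the sample point x = 1.2 are assumptions made for illustration only.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def mat_exp(A, terms=40):
    """exp(A) for a 2x2 matrix via the truncated Taylor series sum of A^n / n!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_mul(term, A)
        term = [[entry / n for entry in row] for row in term]  # now A^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


# Coefficients of 3 f'' + f' - f = 0 and the matrix M built from them.
a, b, c = 3.0, 1.0, -1.0
M = [[0.0, 1.0], [-c / a, -b / a]]

x = 1.2                                  # arbitrary sample point
lam1 = (-1 + math.sqrt(13)) / 6          # exponent appearing in Exercise 63
F0 = [1.0, lam1]                         # f(0) and f'(0) for f = exp(lam1 x)

E = mat_exp([[entry * x for entry in row] for row in M])
F_x = [E[0][0] * F0[0] + E[0][1] * F0[1],
       E[1][0] * F0[0] + E[1][1] * F0[1]]

print("f(x)  from exp(Mx):", F_x[0], " closed form:", math.exp(lam1 * x))
print("f'(x) from exp(Mx):", F_x[1], " closed form:", lam1 * math.exp(lam1 * x))
```

The Taylor series is used here only because it needs nothing beyond basic arithmetic; for larger matrices one would normally diagonalize or use a library routine instead.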
Exercise 64 (Spectrum of a Linear Second Order Differential Operator) Show that the eigenvalues of the matrix
$$M = \begin{pmatrix} 0 & 1 \\ -\frac{c}{a} & -\frac{b}{a} \end{pmatrix} \tag{417}$$
which arose in our construction of a general solution to the second order ODE (403), are given by the roots of the quadratic equation:
$$a\lambda^2 + b\lambda + c = 0. \tag{418}$$
Such an equation is called the characteristic equation or auxiliary equation for the second order ODE (403). Hint: Recall that the spectrum of a square n-by-n matrix M is obtained by solving the eigenvalue equation:
$$MF = \lambda F \iff (M - \lambda I)F = 0, \tag{419}$$
where I is the n-by-n identity matrix. Such a system has non-trivial solutions precisely when the determinant of $(M - \lambda I)$ vanishes:
$$\det(M - \lambda I) = 0. \tag{420}$$
This gives an n-th order polynomial in $\lambda$. For you, n = 2.

19.3 Explicit Algorithm and Illustrations

For some of you, the above derivation of a general solution to second order linear homogeneous ODEs for the case of ‘distinct eigenvalues’ may seem a bit abstract. To illustrate an easy-to-remember ‘algorithm’ for solving any such ODE, we will explore a physical example of a second order ODE – the damped harmonic oscillator. Such an example is fundamental to physics, engineering and many problems in mathematical modelling, since a variety of physical processes are governed by identical mathematics.

Recall that given an arbitrary linear homogeneous second order ODE
$$a\frac{d^2f}{dx^2} + b\frac{df}{dx} + cf = 0, \tag{421}$$
its corresponding characteristic equation (eigenvalue equation) is given by:
$$a\lambda^2 + b\lambda + c = 0. \tag{422}$$
Solutions to the characteristic equation are given by the quadratic formula:

$$ \lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{423} $$

As such, solutions depend fundamentally on the sign of the discriminant:

$$ \Delta := b^2 - 4ac. \tag{424} $$

This gives us the following cases:

• Case 1a: Real and Distinct
If the discriminant is positive:

$$ \Delta = b^2 - 4ac > 0, \tag{425} $$

then there are two real distinct eigenvalues, $\lambda_{1,2}$, and the general solution to our ODE (421) is given by:

$$ f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}, \tag{426} $$

where $c_j$ are constants to be determined by initial / boundary values.

• Case 1b: Complex Conjugate Pairs
If our discriminant is negative:

$$ \Delta = b^2 - 4ac < 0, \tag{427} $$

then we will have two complex roots, $\lambda_1$ and $\lambda_2 = \bar{\lambda}_1$, which are complex conjugates of each other:

$$ \lambda_1 = -\frac{b}{2a} + i\,\frac{\sqrt{|b^2 - 4ac|}}{2a}, \qquad \lambda_2 = -\frac{b}{2a} - i\,\frac{\sqrt{|b^2 - 4ac|}}{2a}. \tag{428} $$

Note that whenever complex eigenvalues appear for a linear ODE of any order (with constant real coefficients), they must always appear in complex conjugate pairs. This is a consequence of an elementary theorem on polynomials with real coefficients. In this case, the general solution is still given by:

$$ f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}, \tag{429} $$

except that $c_j$ may now be complex coefficients. Using some algebraic manipulations (Euler’s formula), we can express the general solution in a standard trigonometric form with real coefficients $k_{1,2}$:

$$ f(x) = e^{-\frac{b}{2a}x} \left( k_1 \cos(\omega x) + k_2 \sin(\omega x) \right), \tag{430} $$

where $\omega := \frac{\sqrt{|b^2 - 4ac|}}{2a}$.
• Case 2: Real Repeated Roots
When the discriminant vanishes:

$$ \Delta = b^2 - 4ac = 0, \tag{431} $$

we get a real repeated root: $\lambda_1 = \lambda_2 = -\frac{b}{2a}$. In this case, we may be tempted to write the general solution as:

$$ f(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} = (c_1 + c_2) e^{\lambda_1 x}, \tag{432} $$

but quickly realize that we only have one independent function: $e^{\lambda_1 x}$. Since we have a second order linear differential equation, we know that its solution space must be a two-dimensional vector space – thus, in order to create a basis for it, we need another function which is linearly independent from $e^{\lambda_1 x}$, but also a solution of our ODE. Such a function is given by multiplying $e^{\lambda_1 x}$ by $x$, hence we find that the set of functions:

$$ \{ e^{\lambda_1 x},\; x e^{\lambda_1 x} \} \tag{433} $$

spans the solution space [142]. In particular, the general solution to our ODE is given by:

$$ f(x) = c_1 e^{\lambda_1 x} + c_2 x e^{\lambda_1 x}, \tag{434} $$

where $c_j$ are constants determined by initial/boundary values.

[142] You have realized by now that the solution space to the second order ODE (421) is simply the kernel of the differential operator $D = a \frac{d^2}{dx^2} + b \frac{d}{dx} + c$.

Exercise 65 (Elementary, my dear Watson) In between legendary crime solving stints, Sherlock Holmes gets bored. So instead of solving cases of crime, he solves different cases of second order linear differential equations.

Q1: Using Euler’s formula

$$ e^{i\omega x} = \cos(\omega x) + i \sin(\omega x), \tag{435} $$

and the identity $e^{\left(-\frac{b}{2a} \pm i\omega\right)x} = e^{-\frac{b}{2a}x}\, e^{\pm i\omega x}$, help Sherlock prove that in Case 1b of complex conjugate roots, we can re-write the solution (429) in the trigonometric form (430).

Hint: This means starting from (429) and using Euler’s formula to get it into the form:

$$ \text{Stuff}_1 \times e^{-\frac{b}{2a}x} \cos(\omega x) + \text{Stuff}_2 \times e^{-\frac{b}{2a}x} \sin(\omega x), \tag{436} $$

where Stuff is some combination of $c_1$ and $c_2$. You then identify $\text{Stuff}_1$ and $\text{Stuff}_2$ as the constants $k_1$ and $k_2$. If you have done your algebra correctly, the constants $k_j$ are necessarily real for a real-valued solution $f$ (given $a, b, c$ are real coefficients).
Q2: If $e^{\lambda_1 x}$ is a solution to our ODE (421) in Case 2 of real repeated roots, prove that $x e^{\lambda_1 x}$ is indeed also a solution of (421). To do this, you need to substitute $x e^{\lambda_1 x}$ into the left-hand side of (421) and show that it vanishes.

We are now ready to consider a physical example of a second order linear differential equation. Imagine a spring placed on a table top, with a mass $m$ at one end of the spring and a wall at its other end. If it is stretched in a straight line, by some initial displacement $x_0$ from its equilibrium position $x = 0$, it will undergo oscillatory (harmonic) motion. However, due to the friction of the table top (and to a lesser extent, the air), this motion will be ‘damped’. The force on the mass due to the spring, when the mass has displacement $x(t)$ at time $t$ from its equilibrium position $x = 0$, is given by Hooke’s law:

$$ F_{\text{spring}} = -kx, \tag{437} $$

where $k$ is the ‘spring constant’ (related to the elasticity of the spring). The force of friction can be modelled as a ‘linear drag’ at low velocities – meaning it is directly proportional to the velocity $v(t) = \frac{dx}{dt}$ of the mass at the end of the spring:

$$ F_{\text{damp}} = -b \frac{dx}{dt}, \tag{438} $$

where $b$ is the ‘damping constant’ (related to the friction, air-resistance and internal energy loss in the spring). Newton’s second law of motion tells us that the net sum of forces acting on the mass $m$ at the end of the spring is equal to the mass $m$ times its acceleration, $a = \frac{d^2 x}{dt^2}$. Thus, we get:

$$ F_{\text{total}} = F_{\text{spring}} + F_{\text{damp}} \implies m \frac{d^2 x}{dt^2} = -b \frac{dx}{dt} - kx, \tag{439} $$

hence arriving at the following linear second order ODE with real coefficients:

$$ m \frac{d^2 x}{dt^2} + b \frac{dx}{dt} + kx = 0. \tag{440} $$

This is a differential equation in the displacement $x$ of the spring as a function of the time $t$.

Exercise 66 (A Divine Comedian) Rather than braving the dark forest and nine circles of hell by himself, Dante called upon the assistance of the ancient Roman poet Publius Vergilius Maro (Virgil) to see him through. In one of the lost cantos, Dante finds himself in a 10th circle of hell, where he has to solve the damped harmonic oscillator differential equation – however, his guide Virgil, not being educated in 18th-19th century mathematics, is unable to help him.
To help Dante reach his beloved Beatrice, you will need to solve the damped harmonic oscillator problem – which I will walk you through.

Step 1: Given the DHO equation (440), write down the corresponding characteristic equation (eigenvalue equation). Solve this equation using the quadratic formula and write an expression for the discriminant.

Step 2: Your discriminant should take the form $\Delta = b^2 - 4mk$. Depending on the mass $m$, damping constant $b$ and spring constant $k$, the discriminant may be positive, negative or zero – leading to vastly different motions of Dante’s spring. Define the angular frequency of the motion to be $\omega = \frac{\sqrt{|b^2 - 4mk|}}{2m}$ (noting the absolute value to make it real; here the coefficient $a$ of the general ODE (421) is the mass $m$). Now, write down the general solution for the displacement $x(t)$ of the spring at time $t$, for each of the three different cases considered earlier (positive, negative and zero discriminant).

Step 3: The constants $c_j$ or $k_j$ that you get in your solutions will be determined by the ‘initial conditions’ of Dante’s spring system. In particular, if we define $x_0 := x(0)$ to be the initial displacement and $\dot{x}_0 := \dot{x}(0)$ to be the initial velocity (dots indicating derivatives with respect to time $t$), write your undetermined constants $c_j$ and $k_j$ in terms of $x_0$ and $\dot{x}_0$.

Step 4 – Characterising the system: The potential energy stored in Dante’s oscillator at displacement $x$ is given by:

$$ U = \frac{1}{2} k x^2. \tag{441} $$

When $b = 0$, there is no damping and the total energy is conserved – in particular, it is converted back and forth between kinetic energy $\frac{1}{2} m \left( \frac{dx}{dt} \right)^2$ and potential energy. In that case, the total energy is given by its initial kinetic energy and potential energy:

$$ E_{\text{total}} = \frac{1}{2} k x_0^2 + \frac{1}{2} m (\dot{x}_0)^2. \tag{442} $$

The effect of damping is that the total energy decreases over time – some of it is lost due to friction, converting mechanical energy into thermal energy (heat).

Q1: In the case of negative discriminant, you get oscillatory motion (with exponential damping). It has angular frequency $\omega = \frac{\sqrt{|b^2 - 4mk|}}{2m}$. Given this, work out the period of one oscillation.

Q2: The damped harmonic oscillator is a resonating system. As such, we can define a ‘quality factor’ which characterises ‘how good a resonator’ it is. An ideal
oscillator would lose no energy per cycle of oscillation. We define the quality factor, in the case of negative discriminant, as:

$$ Q = 2\pi \, \frac{\text{Energy Stored}}{\text{Energy Lost Per Cycle}}. \tag{443} $$

Defining the ‘damping ratio’ $\xi = \frac{b}{2\sqrt{mk}}$, find an expression for the quality factor $Q$ of Dante’s spring in terms of $\xi$.

Step 5 [SAVE DANTE]: Having determined the solution to a damped harmonic oscillator, Dante decides to call upon the help of Odysseus (in Latin, Ulysses) and Aeneas to help him design a ‘spring catapult’. This will eject Dante from the 10th circle into Paradise, where he can be reunited with his lost love, Beatrice. Assuming Dante has mass $m = 70$ kg and that Paradise is 1000 km in the vertical direction from hell’s 10th circle, work out some possible set of values for the spring constant $k$, damping coefficient $b$, initial displacement $x_0$ and initial velocity $\dot{x}_0$ which will allow Dante to reach Paradise.

To do this, assume that Dante is ejected from the spring catapult when it reaches its maximum displacement (limited by the stretchiness of the spring) – at this instant, Dante will leave the spring with an upward force of $m \frac{d^2 x}{dt^2}$ and some velocity $\frac{dx}{dt}$. Assuming that the force of gravity acts downward on him once he leaves the spring, $F = -mg$ ($g = 10\ \text{m/s}^2$), Dante will need sufficient exit velocity to reach Paradise.

Hint: Oscillatory motion (negative discriminant) occurs when the motion is under-damped. In this case the spring oscillates about its equilibrium position, with an exponential envelope damping the motion so that after sufficient (infinite) time the oscillations cease – the spring then remains at equilibrium. Over-damped motion occurs when the discriminant is positive and will typically lead to the spring returning to equilibrium without oscillation.

Hint: This problem is not really well-defined and so might not be (easily) solvable. So you can re-interpret it to give something sensible and solvable.

The last exercise should illustrate that even for a deceptively ‘simple’ system such as the damped harmonic oscillator, there are a lot of fine details regarding the time-evolution and behaviour of the system. Such behaviour depends intricately on the constants that appear in the second order ODE governing its motion.

To finish, we shall look at a more explicit demonstration of second order ODEs: the LRC circuit. Recall from tutorial 16 the examples of an RC and LR circuit. These circuits involved a resistive
element R, a capacitor (charge storage device) C and some inductor (coil of wire) L. What happens when you shove all three elements together in a series circuit? To answer this question, you may use Kirchhoff’s Laws (conservation of energy and charge) along with the defining expressions for inductance (Faraday’s law), resistance (Ohm’s law) and capacitance to obtain a dynamical equation for the time evolution of the LRC circuit.

Exercise 67 The current $I(t)$ (time rate of flow of charge: $I = \frac{dq}{dt}$) flowing through any element of a series LRC circuit at time $t$ is governed by the following second order linear ODE:

$$ \left[ \frac{d^2}{dt^2} + \frac{R}{L} \frac{d}{dt} + \frac{1}{LC} \right] I(t) = 0, \tag{444} $$

where $R$ is the resistance, $L$ is the inductance and $C$ is the capacitance.

Q0: If you have studied circuits, derive the above differential equation.

Q1: The LRC circuit is simply a ‘re-hash’ of the damped harmonic oscillator. Looking at the coefficient of $\frac{dI}{dt}$ we see that the equivalent ‘damping constant’ is given by $\frac{R}{L}$ – correctly suggesting that the resistor acts to ‘oppose’ the flow of charge (current). Furthermore, if we multiply through by $L$, the coefficient $\frac{1}{C}$ of $I(t)$ suggests treating the capacitor as some ‘spring constant’ type term. This makes sense in that a capacitor ‘stores energy’ (charge) in the manner that a spring stores potential energy.

Solve the differential equation for the three different cases, depending on the sign of the discriminant:

$$ \Delta = \left( \frac{R}{L} \right)^2 - \frac{4}{LC}. \tag{445} $$

Now pick some values of $I(0)$, $\dot{I}(0)$, $R$, $L$ and $C$ and determine a unique solution – this should correspond to only one case. Graph the current $I(t)$ as a function of time $t$ for the solution you get.

Q2: Define $\omega_0 = \frac{1}{\sqrt{LC}}$ to be the ‘natural frequency’ of our LRC system. This is the frequency our system would oscillate at without ‘damping’ (without resistance) – i.e. a perfect resonator. Now define the ‘neper frequency’ $\alpha$ to be:

$$ \alpha = \frac{R}{2L}. \tag{446} $$

Re-write the LRC differential equation in terms of $\omega_0$ and $\alpha$.

Q3: The ‘damping ratio’ $\xi$ for an LRC circuit characterises the ‘energy loss’ with
respect to the resonating properties of the circuit. It is defined by:

$$ \xi = \frac{R}{2} \sqrt{\frac{C}{L}}. \tag{447} $$

Re-write this in terms of the natural frequency $\omega_0$ and the neper frequency $\alpha$.
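The three-case ‘algorithm’ of Section 19.3 can be collected into one routine. The following is a minimal Python sketch under stated assumptions, not a reference implementation: the function name `solve_linear_ode`, the exact comparison of the discriminant with zero, and the circuit values for R, L and C are hypothetical choices made purely for illustration.

```python
import math


def solve_linear_ode(a, b, c, f0, df0):
    """Return f with a f'' + b f' + c f = 0, f(0) = f0 and f'(0) = df0.

    The three branches follow the sign of the discriminant b^2 - 4ac.
    Note: the test against zero is exact; with measured coefficients a
    small tolerance would be more sensible.
    """
    disc = b * b - 4 * a * c
    if disc > 0:                       # Case 1a: real, distinct roots
        root = math.sqrt(disc)
        l1, l2 = (-b + root) / (2 * a), (-b - root) / (2 * a)
        c1 = (df0 - l2 * f0) / (l1 - l2)
        c2 = f0 - c1
        return lambda x: c1 * math.exp(l1 * x) + c2 * math.exp(l2 * x)
    if disc < 0:                       # Case 1b: complex conjugate roots
        decay = -b / (2 * a)
        omega = math.sqrt(-disc) / (2 * a)
        k1 = f0
        k2 = (df0 - decay * f0) / omega
        return lambda x: math.exp(decay * x) * (
            k1 * math.cos(omega * x) + k2 * math.sin(omega * x))
    lam = -b / (2 * a)                 # Case 2: real repeated root
    c1, c2 = f0, df0 - lam * f0
    return lambda x: (c1 + c2 * x) * math.exp(lam * x)


# Hypothetical series LRC values (ohms, henries, farads) and initial data.
R, L, C = 2.0, 1.0, 0.25
current = solve_linear_ode(1.0, R / L, 1.0 / (L * C), f0=1.0, df0=0.0)
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, current(t))
```

With these particular values the discriminant is negative, so the current is an exponentially damped oscillation; other choices of R, L and C land in the other two branches.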
20 Tutorial 18: Calculus of Vectors and Differential Forms I

In this tutorial, we will review the concept of (smooth) ‘vector-valued functions’ and the differentiable landscapes in which they arise. To understand various geometric structures and physical phenomena from a contemporary perspective, one will find that it is necessary to call upon the calculus of vector-valued functions. For now, we will restrict ourselves to the ‘differentiation’ side of things – in particular, investigating differential operators such as the ‘curl’, ‘gradient’, ‘divergence’ and ‘Laplacian’.

The vector calculus used by engineers, chemists and elementary physicists today is largely due to developments in 19th century mathematics and physics – applied and popularized largely by the work of James C. Maxwell, J. W. Gibbs and Oliver Heaviside. Nonetheless, one can provide an efficient abstraction and generalization of this ‘calculus of vectors’ to the calculus of ‘differential forms’ – defined on the ‘exterior algebra’ of a vector space. Collectively, this is called ‘exterior calculus’, or ‘Cartan calculus’ due to the work of the great geometer, Élie Cartan. Such a framework is the natural framework to study modern differential geometry – and by association, mathematical hydrodynamics (fluid mechanics), advanced mechanics, relativistic electromagnetism and general relativity.

20.1 Vector Valued Functions

Recall that a real-valued function $f$ of $n$ variables takes a point $x = (x_1, x_2, \dots, x_n)$ in $\mathbb{R}^n$ and maps it to some real number $f(x) = f(x_1, \dots, x_n) \in \mathbb{R}$. In ‘physical’ terminology, such an object is a ‘scalar-valued function’. In modelling natural phenomena, one quickly finds that scalar quantities (such as temperature, speed, distance) are insufficient to describe nature. In particular, many physical quantities are ‘vector-valued’ – for example, displacement, velocity, force, electromagnetic fields and fluid flow. This immediately demands a formal notion of a ‘vector-valued’ function.

Definition 11 A vector valued function $F$ on $\mathbb{R}^n$ takes a point $p = (x_1, \dots, x_n) \in \mathbb{R}^n$ and maps it to a vector $F(p) \in \mathbb{R}^n$. Mathematically [143],

$$ F : \mathbb{R}^n \to \mathbb{R}^n, \qquad p \mapsto F(p). \tag{448} $$

[143] Strictly speaking, $F$ maps $p$ to the ‘tangent space’ $T_p(\mathbb{R}^n)$ at $p$ (the space of all tangent vectors at $p$). However, due to affine parallelism, this is canonically isomorphic to the vector space $\mathbb{R}^n$ – so we can ignore the distinction.

We can represent a vector-valued function several ways. First, given some basis $\{e_j\}$ for $\mathbb{R}^n$, we can represent a vector-valued function simply by its component functions $F^j$:

$$ F(x) = (F^1(x), F^2(x), \dots, F^n(x)) = (F^1(x_1, \dots, x_n), \dots, F^n(x_1, \dots, x_n)). \tag{449} $$

Note that each component function, $F^j(x_1, \dots, x_n)$, is a function of $n$ variables. More explicitly, we can represent a vector-valued function in a geometrically invariant form:

$$ F(x) = F^1(x) e_1 + F^2(x) e_2 + \dots + F^n(x) e_n, \tag{450} $$

expanding it in terms of the standard basis vectors $e_j$ multiplied by the component functions $F^j(x)$. [Recall the convention of ‘raising’ the indices of the component functions and lowering those of the standard basis vectors – the Einstein summation convention explained in tutorial 7].

Example 16 In physics, ‘force’ is formally a vector quantity. For example, given a particle of mass $m$, the force it experiences depends on the nature of its trajectory through space, via Newton’s second law:

$$ F = ma. \tag{451} $$

If we let $x = (x_1, \dots, x_n)$ be the displacement vector of the particle, then Newton’s second law tells us that the force $F$ is a vector-valued function of the displacement $x$ as follows:

$$ F(x) = m \frac{d^2 x}{dt^2} = m(\ddot{x}_1, \ddot{x}_2, \dots, \ddot{x}_n). \tag{452} $$

Problem 30 (The Garden of Forking Paths) Caught in a surreal twist of Jorge Luis Borges’ ‘garden of forking paths’, a student of the mathematical sciences finds themselves in the middle of a garden. To their left and their right lie two opposing paths – forking off into darkness. The student is forewarned that upon
selecting one path, the other path will seal itself for eternity. The two paths ultimately lead to two very different futures (and pasts).

Thinking that they can cheat nature, the student begins down one path – then quickly reverses towards the opposing path. Unbeknown to the student is the presence of a sentient surveillance drone – monitoring the choices of the student. In this instance, the paths shift and transform – leading the student into the delusion that they have changed paths without detection.

From the point of view of the drone, which maintains constant altitude, the Garden can be represented as a 3-dimensional vector space with the center of the garden as the origin. The drone orbits the garden in a circle of constant radius $r$, described by the radius vector $\mathbf{r} = (x, y, 0)$ – with the Cartesian coordinates $x = x(t)$, $y = y(t)$ being functions of time. The altitude $z = 0$ is the starting (constant) altitude of the drone.

Q1: Switching to cylindrical polar coordinates, $x = r\cos(\theta)$, $y = r\sin(\theta)$ and $z = z$, one has the inverse transformations: $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan\left(\frac{y}{x}\right)$. Since $x, y$ were functions of $t$, it follows that $\theta$ is a function of $t$. Since $r$ is constant (for circular motion), we have $\frac{dr}{dt} = 0$.

Compute the velocity of the surveillance drone. In other words, compute the vector-valued function:

$$ \mathbf{v} = \frac{d\mathbf{r}}{dt}, \tag{453} $$

and simplify the expression in terms of $x$, $y$ and $\omega = \dot{\theta} = \frac{d\theta}{dt}$. Hint: Use the chain rule.

Q2: Given circular motion in some plane, the angular momentum of that motion will lie in a direction orthogonal (perpendicular) to that plane. In particular, the angular momentum for the drone’s motion is defined as the vector-valued function:

$$ \mathbf{L} = \mathbf{r} \times \mathbf{P}, \tag{454} $$

where $\mathbf{P} = m\dot{\mathbf{r}}$ is the linear momentum of the drone. The mass $m$ is constant. Compute the angular momentum of the drone. Simplify your expression so that your final result is strictly in terms of $m$, $\omega$ and $r$ – or $m$, $v$ and $r$.

Hint: $v = r\omega$ and $x^2 + y^2 = r^2$.

Q3: Recall that the ‘more general’ form of Newton’s second Law, which applies to physical systems beyond classical mechanics, is that given some object with
momentum $\mathbf{P}$, the force it experiences is defined by:

$$ \mathbf{F} = \frac{d\mathbf{P}}{dt}. \tag{455} $$

Using this definition of force, show that the ‘Torque’ experienced by the surveillance drone, defined by the vector-valued function:

$$ \boldsymbol{\tau} = \frac{d\mathbf{L}}{dt}, \tag{456} $$

simplifies to:

$$ \boldsymbol{\tau} = \mathbf{r} \times \mathbf{F}. \tag{457} $$

Hint: You can use the ‘product rule’ (Leibniz rule) for differentiation with the cross-product (i.e. scalar derivatives distribute over the cross product). Remember also that $\frac{d\mathbf{r}}{dt} = \mathbf{v} = \frac{1}{m}\mathbf{P}$ and that the cross-product is anti-symmetric – hence, $\mathbf{u} \times \mathbf{u} = 0$ for any vector $\mathbf{u}$.

20.2 Exterior Calculus

By now, in one way or another, most of you will have seen ‘differential forms’ – implicitly, or explicitly. At the very least, you will have come across differential forms when separating variables to solve first-order ordinary differential equations. You will have also come across them when integrating. For example, the fundamental theorem of calculus states that given any differentiable function $f$, one has:

$$ \int df = f + c, \tag{458} $$

where $c$ is a constant of integration. The integrand, $df$, is an ‘exact differential 1-form’ and the function $f$ is a ‘0-form’. To get some conceptual intuition for differential forms, we now look at the concept of a ‘dual basis’ for a vector space.

Definition 12 Given a basis $\{e_j\}$ for an n-dimensional vector space, $\mathbb{R}^n$, one can define a dual basis $\{\theta^j\}$ by its action on the original basis. In particular, one whose elements obey the relation:

$$ \theta^j(e_k) := \delta^j_k, \tag{459} $$

where $j$ and $k$ range across $1, 2, \dots, n$. Furthermore, these elements $\{\theta^j\}$ form a basis for a vector space – the dual vector space to $\mathbb{R}^n$, denoted by $(\mathbb{R}^n)^*$.
For our purposes (since they are isomorphic), we can identify the dual vector space with the original vector space: $(\mathbb{R}^n)^* \cong \mathbb{R}^n$. Since the dual space is a vector space, all linear combinations of the dual vectors $\theta^j$ are also dual vectors. Furthermore, their action distributes over addition and scalar multiplication of vectors:

$$ \theta^j(a e_k + b e_l) = a\theta^j(e_k) + b\theta^j(e_l). \tag{460} $$

Note that by convention, the indices on the vector basis are lowered and the indices on the dual basis are raised. The object $\delta^j_k$ is the ‘Kronecker delta’, defined in tutorial 7 as:

$$ \delta^j_k = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k. \end{cases} \tag{461} $$

Exercise 68 Take the standard basis $\{e_1, e_2, e_3\}$ for $\mathbb{R}^3$ and let $\{\theta^j\}$ (where $j = 1, 2, 3$) be a basis dual to it. Using the previous definition of a dual basis, compute the following quantities:

$$ \begin{aligned} \theta^1(e_1) &= \\ \theta^2(e_1) &= \\ \theta^3(e_1) &= \\ (a\theta^1 + b\theta^2)(e_2) &= \\ (a\theta^1 + b\theta^2)(e_1 + e_2) &= \\ (a\theta^1 + b\theta^2 + c\theta^3)(k e_3) &= \end{aligned} \tag{462} $$

Check these with your tutors.

Problem 31 (Explicit Representation) Still running along their chosen path, our student stumbles upon an opening in the garden – which reveals a clear, moonlit pool filled by a small natural waterfall. Drinking from the pool to refresh themselves, the student leaps back in horror to find that their reflection no longer matches them. At this instant, a Satyr emerges into the clearing and confronts the student: “The image you see is your dual self. This is your transformation, an explicit future selected by the path you chose.”

Q1: As an approximation to the existential crisis faced by our student, we can model 3-dimensional objects using vectors. To obtain a dual model, we simply replace these vectors with their dual vectors. Show that if we represent vectors in $\mathbb{R}^3$ as column vectors, then we can explicitly represent their dual vectors as row
vectors – i.e. their transpose [144]. In particular, by letting $e_1 = (1\ 0\ 0)^T$, $e_2 = (0\ 1\ 0)^T$ and $e_3 = (0\ 0\ 1)^T$, compute the quantities in the previous exercise by using this matrix representation along with matrix multiplication (row vectors × column vectors).

Remark: Recall that you can write the ‘dot-product’ of two vectors $u, v$ by writing $v$ as a column vector and multiplying it by $u$ written as a row vector (this is actually the dual vector of $u$). Hence $u \cdot v = u^T v$. In this manner, you can view the action of a dual vector on a vector as the dot-product of two ‘standard’ vectors.

As it turns out, for the real space $\mathbb{R}^3$, we can represent the standard unit vectors $e_1$, $e_2$ and $e_3$ in the $x$, $y$ and $z$ directions, respectively, as ‘tangent vectors’ or ‘partial derivative operators’ in those directions:

$$ e_1 := \partial_1 = \frac{\partial}{\partial x}, \qquad e_2 := \partial_2 = \frac{\partial}{\partial y}, \qquad e_3 := \partial_3 = \frac{\partial}{\partial z}. \tag{463} $$

This is a formal correspondence [145], however for now it suffices to view it in the following intuitive way.

[144] Incidentally, the matrix transpose operation is an explicit realization of the ‘dual map’ or ‘dual transformation’ – turning vectors into their duals and vice versa.

[145] See your tutor if you want a precise explanation.

Proof 1 (Sketch of Vector-Operator Correspondence) Consider a particle moving with constant unit velocity ($\|v\| = 1$) along the x-coordinate axis – its trajectory $\gamma(t) = (x(t), y(t), z(t))$ (parametrised by time $t$) is a straight line. From Newtonian mechanics, we know that its velocity vector at any point on the trajectory is tangent to the trajectory and points in the direction of the motion. Hence at the point $\gamma(t)$, the velocity vector of the particle is given in geometric form as:

$$ v(\gamma(t)) = \frac{d\gamma(t)}{dt} = \frac{\partial \gamma}{\partial x} \frac{dx}{dt} = 1 \cdot \frac{\partial \gamma(t)}{\partial x}. \tag{464} $$

However, on the left-hand side, we know: $v(\gamma(t)) = 1\, e_1|_{\gamma(t)}$ – a unit velocity vector in the x-direction, at the point $\gamma(t)$. Hence we can view the vector function
$v$ as a ‘differential operator’ acting on the curve $\gamma(t)$:

$$ v(\gamma(t)) = \frac{\partial}{\partial x}(\gamma(t)) \tag{465} $$

to take its partial derivative with respect to $x$. Since this is true for any point $\gamma(t)$ along the trajectory $\gamma$, this allows us to make the identification:

$$ v = e_1 = \frac{\partial}{\partial x}. \tag{466} $$

By considering similar motions in the $y$ and $z$ directions, we can make the identifications $e_2 = \frac{\partial}{\partial y}$ and $e_3 = \frac{\partial}{\partial z}$, completing the correspondence.

At this point, you might be wondering why this abstraction and formality is necessary. It is necessary to establish the correspondence between partial derivative operators (tangent vectors) and differential 1-forms as ‘dual vectors’. For the present, we will restrict ourselves to 3-dimensional vector spaces and functions of 3 variables. Note however, that the following is easily generalized to n-dimensional vector spaces, for $1 \leq n < \infty$.

First, recall the definition of the ‘total differential’ or ‘exact differential’ of a function.

Definition 13 (Exterior Derivative of a function) Given a differentiable function $f = f(x, y, z)$ of 3 variables $x, y, z$, its exterior derivative is given by:

$$ df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz. \tag{467} $$

This is an exact differential 1-form. The operator $d$ acting on $f$ is called the ‘exterior derivative operator’ – in this case (acting on functions), it simply coincides with the ‘total differential’.

Previously (tutorials 4-6) we viewed the objects $dx, dy, dz$ as ‘infinitesimal’ length elements in the $x, y, z$ directions – mentioning that they were ‘vectors in the abstract sense’. What we really meant to say is that $dx, dy, dz$ are exact differential 1-forms. They arise as the natural ‘dual basis’ for $\mathbb{R}^3$.

Example 17 Earlier we showed that we can represent the standard basis for $\mathbb{R}^3$ as ‘partial derivative operators’ – that is, $e_j = \partial_j := \frac{\partial}{\partial x^j}$ where $x^1 = x$, $x^2 = y$, $x^3 = z$. This allows us to identify $\{\theta^1 = dx, \theta^2 = dy, \theta^3 = dz\}$ as the dual basis, in the following way (definition):

$$ \theta^j(e_k) = dx^j(\partial_k) := \frac{\partial x^j}{\partial x^k}. \tag{468} $$
Since the coordinates $x^1, x^2, x^3$ are all independent, it follows that:

$$ \frac{\partial x^j}{\partial x^k} = \delta^j_k, \tag{469} $$

whence the collection $\{dx^j\}$ satisfies the defining property of a dual basis.

Exercise 69 (Voices in the wind) After meeting the Satyr and viewing their future self, our student is now confronted by a gale – carrying with it voices from their past life. Amidst this chaotic cacophony, the student hears scattered teachings of dimensional analysis.

Q1: To banish the gale, the student must work out the relationship between the dimensions of the basis vectors $e_j = \partial_j$ and the dual vectors $\theta^j = dx^j$. Do this.

Hint: To compute the dimensional relation required, you can use the definition of a dual basis, $\theta^j(e_k) = \delta^j_k$, and note that the Kronecker delta is a dimensionless quantity.

Q2: You are now told that the differential 1-forms (dual vectors) $dx^j$ represent ‘infinitesimal length elements’. For this to make sense, it follows that $[dx^j] = L$, where $L$ is some unit of length. From this and your result in Q1, compute the dimensions of the tangent vector (partial derivative operator) $e_j = \partial_j$.

The name differential ‘1-form’ is suggestive of the fact that there exist, in general, ‘differential k-forms’, where $k$ is some non-negative integer. Such a suggestion is true – where $k$ has an upper limit of $k = n$, $n$ being the dimension of your vector space (for us, $n = 3$). The reason for this limit will become apparent shortly. For now, it is necessary to introduce a special ‘product’ between differential forms – the ‘exterior product’. Under this product, differential forms form an ‘algebra’ known as the ‘exterior algebra’. As it turns out, any finite dimensional vector space (e.g. $\mathbb{R}^3$) automatically comes equipped with an exterior algebra.

Definition 14 (Exterior Product) Given two differential 1-forms, $\omega$ and $\theta$, their exterior product $\wedge$ is defined as:

$$ \omega \wedge \theta, \tag{470} $$

which is a differential 2-form. Furthermore, $\wedge$ is characterised by the following properties:

1. Antisymmetry:

$$ \omega \wedge \theta = -\theta \wedge \omega. \tag{471} $$

2. Bilinearity:

$$ (\lambda_1 \omega + \lambda_2 \phi) \wedge \theta = \lambda_1\, \omega \wedge \theta + \lambda_2\, \phi \wedge \theta, \tag{472} $$
and

$$ \omega \wedge (\lambda_2 \phi + \lambda_3 \theta) = \lambda_2\, \omega \wedge \phi + \lambda_3\, \omega \wedge \theta, \tag{473} $$

where $\lambda_j$ are real constants and $\theta, \phi, \omega$ are differential forms.

Exercise 70 Using the above definitions, compute the following exterior products:

1. $dx \wedge dx =$
2. $(dx + dy) \wedge dy =$
3. $(x\,dx + y\,dy + z\,dz) \wedge (x\,dy + y\,dy + z\,dy) =$
4. $(dy \wedge dz) - (dz \wedge dy) =$

As noted, the exterior product or ‘wedge product’ of two differential 1-forms produces an object known as a ‘differential 2-form’. In general, one can define a differential k-form, where the integer $k$ denotes the ‘degree’ of the differential form. Thus, it is natural to extend the definition of the exterior product to differential forms of arbitrary degree. To do this, we need to add the ‘associative property’.

Definition 15 Given differential forms $\omega, \phi, \theta$ of arbitrary degree, the exterior product is associative:

$$ \omega \wedge (\phi \wedge \theta) = (\omega \wedge \phi) \wedge \theta = \omega \wedge \phi \wedge \theta. \tag{474} $$

Hence we can omit the brackets.

Back in Leibniz’s day, when Bach was reinventing music and Newton was competing for priority in the invention of ‘calculus’, quantities such as ‘dx’ were viewed as ‘infinitesimal length elements’. One attempt to formalize this notion is found in an area of mathematics known as ‘non-standard analysis’ – something akin to the perturbation theory used by physicists. In modern geometry however, quantities such as $dx$ are formalized by the ‘calculus of differential forms’, pioneered by Élie Cartan.

The notion that ‘dx’ is an infinitesimal length element makes some sense, considering it has the correct dimensionality. In this regard, one may view differential 2-forms such as $dx \wedge dy$ as corresponding to an ‘infinitesimal area element’ – in particular, an infinitesimal parallelogram (square) with sides $dx$ and $dy$. Differential 3-forms such as $dx \wedge dy \wedge dz$ then correspond to ‘infinitesimal volume elements’ – an infinitesimal parallelepiped (box) with edges $dx, dy, dz$. The following exercises should help formalize this notion.
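The antisymmetry, bilinearity and associativity rules above are mechanical enough to experiment with on a computer. The following is a minimal Python sketch, assuming forms with constant (numeric) coefficients only; the representation of a form as a dictionary from sorted index tuples to coefficients, and the helper names `sorted_with_sign`, `wedge` and `combine`, are hypothetical choices for illustration. Coefficient functions such as $x\,dx$ would need a symbolic package instead.

```python
def sorted_with_sign(indices):
    """Sort basis indices by adjacent swaps and track the resulting sign.

    Returns (sign, sorted_tuple); sign is 0 when an index repeats, which
    encodes dx ^ dx = 0, and otherwise flips once per adjacent swap.
    """
    idx = list(indices)
    if len(set(idx)) != len(idx):
        return 0, tuple()
    sign = 1
    for end in range(len(idx) - 1, 0, -1):
        for j in range(end):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)


def wedge(alpha, beta):
    """Exterior product of two forms with constant coefficients."""
    out = {}
    for ia, ca in alpha.items():
        for ib, cb in beta.items():
            sign, key = sorted_with_sign(ia + ib)
            if sign != 0:
                out[key] = out.get(key, 0) + sign * ca * cb
    return {key: value for key, value in out.items() if value != 0}


def combine(a, alpha, b, beta):
    """Linear combination a*alpha + b*beta of forms of equal degree."""
    out = {}
    for coeff, form in ((a, alpha), (b, beta)):
        for key, value in form.items():
            out[key] = out.get(key, 0) + coeff * value
    return {key: value for key, value in out.items() if value != 0}


# Basis 1-forms on R^3: index 1 stands for dx, 2 for dy, 3 for dz.
dx, dy, dz = {(1,): 1}, {(2,): 1}, {(3,): 1}

print(wedge(dx, dy))  # {(1, 2): 1}
print(wedge(dy, dx))  # {(1, 2): -1}
```

Storing each basis product under a sorted index tuple means antisymmetry is applied once, when the indices are put in order, so two forms are equal exactly when their dictionaries are equal.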
Exercise 71 Using the anti-symmetry, linearity and associative properties of the exterior product, compute/simplify the following exterior products:

• $(dx \wedge dy) \wedge dx =$
• $(4\,dx \wedge 9\,dy \wedge dz) \wedge 3\,dx =$
• $(dx \wedge dy \wedge dz) - (dz \wedge dx \wedge dy) =$
• $(dx \wedge dy \wedge dz) + (dz \wedge dx \wedge dy) + (dy \wedge dz \wedge dx) =$

Problem 32 (Lines, Planes and Orientation) Considering that $dx$ and $dy$ are basis vectors for the dual vector space of $\mathbb{R}^3$, which is equivalent to $\mathbb{R}^3$, one may view the differential 2-form $(dx \wedge dy)$ as an object representing the x-y plane. Lines in the $x, y, z$ directions can then be represented by the differential 1-forms $dx, dy, dz$.

Q1: When taking the exterior product between $(dx \wedge dy)$ and any other differential form, only the components orthogonal to $dx$ and $dy$ (the $x$ and $y$ directions) survive. Show that this is true, by computing:

$$ \omega \wedge (dx \wedge dy), \tag{475} $$

where $\omega = a\,dx + b\,dy + c\,dz$ is an arbitrary differential 1-form (trivially, only the z-component should survive).

Q2: Let $\frac{1}{\sqrt{2}}(dx + dy)$ and $\frac{1}{\sqrt{2}}(dx - dy)$ represent the lines $y = -x$ and $y = x$. Since these lines are orthogonal, the wedge product of the corresponding differential forms should survive. Compute

$$ \frac{1}{\sqrt{2}}(dx + dy) \wedge \frac{1}{\sqrt{2}}(dx - dy). \tag{476} $$

Remark: The result you get should be proportional to $dx \wedge dy$ – which represents the x-y plane. This says that the differential forms $\frac{1}{\sqrt{2}}(dx + dy)$ and $\frac{1}{\sqrt{2}}(dx - dy)$ act as a basis for the x-y plane. Note that this makes sense, since if we change these to vectors via the dual map, $dx \to e_1$ and $dy \to e_2$, we simply get the standard basis vectors rotated by 45 degrees clockwise.

Q3: For a 3-dimensional vector space, what is the largest degree that a non-zero differential form can have? To answer this question, consider the differential 3-form $dx \wedge dy \wedge dz$ and try to compute its exterior product with any other differential form (of degree $k \geq 1$).

Q4: What happens when you change the order of the differential 1-forms appearing in $dx \wedge dy$, $dy \wedge dz$, $dz \wedge dx$ and $dx \wedge dy \wedge dz$?
In particular, to reverse the orientation of the $y$ direction we can replace $y$ with $-y$. Compare $dx \wedge d(-y)$ to $-(dx \wedge dy)$ and $(dy \wedge dx)$ – what do you notice? What relation does this suggest between signs and orientation of coordinates in a differential form?

You should notice that the overall sign changes. This is because each differential form inherits the orientation imposed on the underlying vector space. In particular, if we choose our orientation to be ‘right-handed’ ($e_3 = e_1 \times e_2$) then we define:

$$ dV = dx \wedge dy \wedge dz \tag{477} $$

to be the ‘orienting volume form’.

Q5: Prove that any differential 3-form on $\mathbb{R}^3$ must be a multiple of the orienting volume form, $dV = dx \wedge dy \wedge dz$.

Those of you who completed this tutorial may find some of the final concepts to be vague or abstract – not to worry! Next tutorial, we will make a lot of notions more ‘explicit’ – in particular, by linking exterior products to vector cross products and showing how they generalize the cross product to arbitrary dimensions. Furthermore, we shall illustrate a strict duality between lines and planes via the ‘Hodge dual map’ on differential forms. Finally, we shall define the exterior derivative $d$ on differential forms of arbitrary degree and see how we can use this to link the calculus of differential forms to the calculus of vector-valued functions.
21 Tutorial 19: Calculus of Vectors and Differential Forms II

In the last tutorial we reviewed the concept of ‘vector-valued’ functions and a ‘vector field’. We also established preliminary notions of ‘exterior algebra’ – that is, the exterior product (wedge product) and differential (exterior) forms.

This week, we will continue presenting ideas side-by-side from the 19th century perspective (vector calculus) and the 20th century perspective (exterior calculus). Although the latter may seem more advanced or abstract, it will become as intuitive as the calculus of vector spaces. Overall, these notions are necessary for understanding higher mathematics and quantifying the beauty of nature.

21.1 Gradients and Exterior Derivatives

Recall that the derivative $\frac{df}{dx}\big|_{x=x_0}$ of a function $f$ of one variable gives us the slope of the tangent line to the graph $y = f(x)$ at the point $x = x_0$ at which we evaluate the derivative. We can generalize this geometric relation between ‘slopes’ of graphs and derivatives to functions of several variables.

21.1.1 Gradients

Definition 16 (Gradient) Given a function $f : \mathbb{R}^n \to \mathbb{R}$, $(x^1, ..., x^n) \mapsto f(x^1, ..., x^n)$ of $n$ variables, its gradient vector field $\nabla f$ is given by:
\[ \nabla f = \frac{\partial f}{\partial x^1}\partial_1 + \frac{\partial f}{\partial x^2}\partial_2 + ... + \frac{\partial f}{\partial x^n}\partial_n, \tag{478} \]
where $\partial_j$ are the standard (Cartesian) basis vectors¹⁴⁶ for $\mathbb{R}^n$. In component form, we can denote $\nabla f$ by $\left(\frac{\partial f}{\partial x^1}, ..., \frac{\partial f}{\partial x^n}\right)$.

Note that the symbol $\nabla$ is the vector differential operator, nabla (named after the Greek word for a type of harp, cognate with the Hebrew ‘nevel’). We can represent $\nabla$ in Cartesian coordinates by:
\[ \nabla = \partial_1\frac{\partial}{\partial x^1} + \partial_2\frac{\partial}{\partial x^2} + ... + \partial_n\frac{\partial}{\partial x^n}, \tag{479} \]
where each $\partial_j$ in front of a partial derivative is to be read as a basis vector.

¹⁴⁶ You may recall the notation $e_j$ or $\hat{x}_j$ for the $j$-th standard basis vector. Here we use the notation $\partial_j := \frac{\partial}{\partial x^j}$, drawing upon the correspondence established in Tutorial 18 between standard basis vectors and partial derivative operators.
Hence, from this perspective, the gradient $\nabla f$ is simply given as the above operator acting on $f$. Furthermore, $\nabla$ is a first-order linear differential operator (since it can be represented as a sum of first-order linear differential operators) – hence it obeys the Leibniz product rule:
\[ \nabla(fg) = f\,\nabla g + g\,\nabla f, \tag{480} \]
and linearity property:
\[ \nabla(c_1 f + c_2 g) = c_1\nabla f + c_2\nabla g, \tag{481} \]
for arbitrary differentiable functions $f, g$ and constants $c_1, c_2$.

To get an intuition of how the ‘gradient vector field’ behaves, we first illustrate its first fundamental property – the gradient of a function is a vector field which points in the direction of the maximum (positive) rate of change of the function and whose magnitude is equal to the ‘slope’ of the graph of the function in that direction.

Example 18 (A Sunburnt Country) Nostalgic over happier times, Dr. Claire Waddington decides to read a poem by Dorothea Mackellar while partying in a desert music festival with Angus Turner:
“I love a sunburnt country, A land of sweeping plains, Of ragged mountain ranges, Of droughts and flooding rains....”
Realizing that she is surrounded entirely by barren sweeping plains, she recalls her time in Northern England. To this extent, she decides that she can approximate the hills between Durham and Edinburgh by Circular or Elliptical Paraboloids.

If we define a function $f(x, y) = h - (x^2 + y^2)$ of two variables, then the graph of a circular paraboloid in 3 dimensions is given by: $z = f(x, y)$. Letting $z$ represent the altitude of a point on the hill (whose peak height is $h$), and $(x, y)$ represent coordinates on a 2-dimensional map (restricted so that $x^2 + y^2 \leq h$), we can draw the ‘level sets’ of $f$ or ‘contour lines’ of the hill by drawing the concentric circles, $z = f(x, y)$ for different values of $z$ [Do this as an exercise].

The gradient vector field of $f$ is given by:
\[ \nabla f = \frac{\partial f}{\partial x}\partial_x + \frac{\partial f}{\partial y}\partial_y = -2x\,\partial_x - 2y\,\partial_y. \tag{482} \]
This points in the direction $-(x, y)$, which is the direction opposite the radius vector $\mathbf{r} = x\,\partial_x + y\,\partial_y$ – as such, it is orthogonal to the level sets (circles) of $f$. Note that $(x, y) = (0, 0)$, where $z = h$, corresponds to the summit of the hill – which is where the gradient vector field points towards (increasing altitude). Since the gradient field is perpendicular to the contour lines, it points in the direction of the maximum rate of increase of $z = f(x, y)$ – i.e. the steepest ascent up the hill.
The magnitude of $\nabla f$ is given by its norm: $\|\nabla f\| = 2\sqrt{x^2 + y^2} = 2r$. It is equal to the magnitude of the slope $\frac{dz}{dr} = -2r$ of the radial profile $z = h - r^2$ – the path of steepest ascent.

In the last example, we simply stated that $\nabla f$ pointed in the direction of maximum rate of change without proving it. Other than using properties of contour maps (which follow from said mathematics), we can prove our statement using the concept of ‘directional derivative’.

Given a function $f$ of $n$ variables, we can define its directional derivative in the direction of the vector $\mathbf{v}$ by the dot product of $\mathbf{v}$ with the gradient vector field of $f$:
\[ D_{\mathbf{v}} f := (\nabla f) \cdot \mathbf{v}. \tag{483} \]
This gives us the ‘rate of change of $f$’ in the direction of $\mathbf{v}$. If we want the rate of change per unit distance, we must normalize $\mathbf{v}$ to make it a unit vector:
\[ \hat{D}_{\mathbf{v}} f := (\nabla f) \cdot \frac{1}{\|\mathbf{v}\|}\mathbf{v}. \tag{484} \]

Problem 33 (Cartographer’s Catastrophe) One known property of contour maps in geography is that the path of steepest ascent is in a direction perpendicular to the contour lines. However, this is precisely the direction of the gradient vector field – hence the gradient vector field points in the direction of the maximum rate of increase of a function (for geographical maps, we have ‘altitude’ as a function of two variables $(x, y)$). To convince yourself, note that moving parallel to a contour line leaves you at a constant altitude – only when you deviate from the contour direction does your altitude begin to change.

One day, Carly Fazioli decides to become a Cartographer (map maker) – hereby changing her name to Carly Cartographer. When purchasing one of her maps, an ex-Georgian student of the mathematical sciences study group asks Carly to prove that paths perpendicular to contours are those of steepest ascent. Not having studied mathematics, Carly is in a catastrophe. Help Carly by proving this statement!

Hints:
• First note what value of $\theta$ maximizes $\cos(\theta)$ for $0 \leq \theta \leq \pi$.
• If we let $z = f(x, y)$ represent some landscape (with $z$ being altitude), write down the rate of change of $z$ in the direction of an arbitrary 2-dimensional vector $\mathbf{v}$ in the x-y plane. Do this in terms of $\|\mathbf{v}\|$ and $\|\nabla f\|$ via the formula: $\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos(\theta_{ab})$.
• Show that the directional derivative $\hat{D}_{\mathbf{v}} f$ of $f$ is maximized when $\mathbf{v}$ is parallel to the gradient vector field $\nabla f$ – i.e. it points in the same direction.
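The claim in the hints can be explored numerically before it is proved. The sketch below is an illustration only and does not replace the proof: it assumes the paraboloid $f(x, y) = h - (x^2 + y^2)$ from Example 18, picks an arbitrary sample point, and scans unit vectors $(\cos\theta, \sin\theta)$ for the one with the largest normalized directional derivative. The sample point and the 0.1 degree step are arbitrary choices.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = h - (x^2 + y^2); the constant h drops out."""
    return (-2.0 * x, -2.0 * y)


def directional_derivative(grad, theta):
    """Rate of change per unit distance along the unit vector (cos theta, sin theta)."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)


point = (1.0, 0.5)  # arbitrary sample point on the hill (an assumption)
g = grad_f(*point)

n = 3600  # scan all directions in steps of 0.1 degrees
angles = [2.0 * math.pi * k / n for k in range(n)]
best_theta = max(angles, key=lambda t: directional_derivative(g, t))

# Direction and magnitude of the gradient itself, for comparison.
grad_theta = math.atan2(g[1], g[0]) % (2.0 * math.pi)
grad_norm = math.hypot(g[0], g[1])

print("best scanned direction:", best_theta)
print("gradient direction:    ", grad_theta)
print("largest rate of change:", directional_derivative(g, best_theta))
print("norm of the gradient:  ", grad_norm)
```

Up to the resolution of the scan, the best direction coincides with the direction of $\nabla f$ and the largest rate of change coincides with $\|\nabla f\|$, which is what the $\cos(\theta)$ argument in the hints predicts.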
Aside: Those of you who paid attention will quickly notice that we only proved the maximizing property of the gradient vector field – we didn’t actually prove that it is perpendicular to contour lines! A general argument for this is given in Tutorial 20 (Section 22.1).

Toy problems aside, gradient vector fields also play a fundamental role in optimization theory and physics. In particular, you may recall that the ‘work done’ under a ‘conservative force’ field is ‘independent of the path taken’. This is a direct consequence of the general definition of work as the ‘line-integral’ of the force experienced along a path, as well as the ‘Kelvin-Stokes’ theorem and the fact that curl-free fields can be written (with some restriction) as a gradient vector field. One day you will understand the power of these statements, but for the meantime we will focus on the idea of ‘conservative forces’ and ‘potentials’.

Recall that Isaac Newton derived Kepler’s astronomical laws of orbital motion by postulating his own gravitational force law. In particular, given two massive bodies of mass $m$ and $M$, the gravitational force that $M$ exerts on $m$ is given by:
\[ \mathbf{F} = \frac{GmM}{r^2}\hat{\mathbf{r}}, \tag{485} \]
where $\hat{\mathbf{r}}$ is a unit vector pointing from $m$ to $M$, $r$ is the distance between the bodies and $G$ is Newton’s gravitational constant. Hence, since $\mathbf{F} = m\mathbf{a}$, the acceleration $\mathbf{a}$ experienced by $m$ due to this force is given by:
\[ \mathbf{a} = \frac{GM}{r^2}\hat{\mathbf{r}}. \tag{486} \]
To see that the gravitational force is conservative, note that it can be derived from a ‘gravitational potential’. In particular, the work done in moving the mass $m$ from a point at infinity to a radial distance $r$ from $M$ is given by:
\[ W = \int_{r'=\infty}^{r'=r} \mathbf{F} \cdot d\mathbf{l} = \int_{\infty}^{r} \frac{GmM}{r'^2}\,dr' = -\frac{GmM}{r}. \tag{487} \]
We then define the gravitational potential of the mass $M$ by the function $U(r) = \frac{1}{m}W = -\frac{GM}{r}$. Knowing this potential alone, we can reconstruct the gravitational force field generated by the mass $M$.

Exercise 72 (May the force be with you) On a voyage through deep space to find the ancient Sith empire, the mixed-powers Darth Revan decides to pass the time by deriving Newton’s gravitational force law – a first order approximation to Einstein’s theory of gravity. To help Darth Revan, compute the gradient vector field generated by the gravitational potential of a spherical starbase with mass $M$.
In other words, given
\[ U = -\frac{GM}{r}, \tag{488} \]
compute $\nabla U$ and prove that it is equal to
\[ \frac{GM}{r^2}\hat{\mathbf{r}}, \tag{489} \]
where $r = \sqrt{x^2 + y^2 + z^2}$ is the magnitude of the radial vector, $\mathbf{r} = x\,\partial_x + y\,\partial_y + z\,\partial_z$ and $\hat{\mathbf{r}} = \frac{1}{r}\mathbf{r}$ is the corresponding unit vector.

Hence, the gravitational force field generated by a massive body of mass $M$, exerted on a body of mass $m$, is given by:
\[ \mathbf{F} = m\nabla U. \tag{490} \]
Hint: It helps to show that $\partial_x r^{-1} = -x r^{-3}$. Similarly for $\partial_y r^{-1}$ and $\partial_z r^{-1}$. Then factorize the resulting gradient of $r^{-1}$ as $-r^{-2}\,r^{-1}(x, y, z) = -r^{-2}\hat{\mathbf{r}}$ and add appropriate constants.

To see that the gravitational force is conservative, note that the gravitational potential $U(r)$ (and hence work $W = mU$) generated by the mass $M$ only depends on the radial distance from the gravitational source $M$ – therefore, the work done to move another mass in the field of $M$ only depends on the end-points of its path (initial and final radial distances), but not the path taken.

As a bonus, note that the exact same mathematics can be applied to electrostatics (except that you can have negative charges but not negative mass). In particular, by replacing the masses $m$ and $M$ with charges $q$ and $Q$ in the above exercise, as well as replacing $\mathbf{F} = m\mathbf{g}$ with $\mathbf{F} = q\mathbf{E}$ and Newton’s constant $G$ with Coulomb’s constant $C = \frac{1}{4\pi\epsilon_0}$ – one can obtain identical results for electrostatics as those for gravity. In this case, the electric potential generated by a charge $Q$ is given by:
\[ U(r) = -\frac{CQ}{r} = -\frac{Q}{4\pi\epsilon_0 r}. \tag{491} \]
Its gradient $\nabla U$ is the electric field generated by the charge $Q$:
\[ \mathbf{E} = \nabla U = \frac{Q}{4\pi\epsilon_0 r^2}\hat{\mathbf{r}}, \tag{492} \]
where $\hat{\mathbf{r}}$ is the (radial) unit vector pointing in the direction away from $Q$. (With the more common sign convention $U = +\frac{Q}{4\pi\epsilon_0 r}$, the same field is written $\mathbf{E} = -\nabla U$.) Note, if $Q$ is a negative charge, $\mathbf{E}$ will point towards $Q$ and if $Q$ is positive, the electric field will point away.
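As a numerical sanity check on Exercise 72 (a sketch only, not a substitute for the derivation), one can compare a finite-difference gradient of $U = -GM/r$ with the closed form $\frac{GM}{r^2}\hat{\mathbf{r}}$. The units with $GM = 1$, the sample point and the step size are all assumptions made for illustration.

```python
import math

GM = 1.0  # assumed units in which G * M = 1


def potential(x, y, z):
    """Potential U = -GM / r of a point mass at the origin."""
    return -GM / math.sqrt(x * x + y * y + z * z)


def numerical_gradient(func, p, step=1e-5):
    """Central-difference estimate of the gradient of func at the point p."""
    grad = []
    for i in range(3):
        forward = list(p)
        backward = list(p)
        forward[i] += step
        backward[i] -= step
        grad.append((func(*forward) - func(*backward)) / (2.0 * step))
    return grad


p = (1.0, 2.0, 2.0)  # sample point with r = 3
r = math.sqrt(sum(c * c for c in p))

# Closed form GM / r^2 * r_hat, written component-wise as GM * (x, y, z) / r^3.
expected = [GM * c / r**3 for c in p]
computed = numerical_gradient(potential, p)

for e, c in zip(expected, computed):
    print(e, c)
```

The two columns agree to within the accuracy of the central-difference approximation. Swapping `GM` for $Q/(4\pi\epsilon_0)$ gives the same check for the electrostatic potential in (491).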
21.1.2 Exterior Derivatives

Recall that in the last tutorial and previous tutorials we defined the exterior derivative (total differential) of a function $f$ of $n$ variables, in local coordinates $(x^1, ..., x^n)$, to be:
\[ df = \frac{\partial f}{\partial x^1}dx^1 + \frac{\partial f}{\partial x^2}dx^2 + ... + \frac{\partial f}{\partial x^n}dx^n, \tag{493} \]
where the objects $dx^j$ were formally defined to be the ‘dual basis vectors’ – i.e. a basis for the vector space of differential 1-forms. These were related to the standard basis vectors $\{\partial_j\}$ by their action on them:
\[ dx^j(\partial_k) = \delta^j_k, \tag{494} \]
where $\delta^j_k$ is the Kronecker delta. Now notice the similarity between the exterior derivative and the gradient operators:
\[ df = \frac{\partial f}{\partial x^1}dx^1 + \frac{\partial f}{\partial x^2}dx^2 + ... + \frac{\partial f}{\partial x^n}dx^n, \qquad \nabla f = \frac{\partial f}{\partial x^1}\partial_1 + \frac{\partial f}{\partial x^2}\partial_2 + ... + \frac{\partial f}{\partial x^n}\partial_n. \tag{495} \]
Clearly $df$ and $\nabla f$ have the same components. In particular $df$ is the dual vector of the gradient vector field $\nabla f$. This duality is formally provided by the Euclidean metric $g$, whose components can be defined by its action on the standard basis:
\[ g(\partial_j, \partial_k) = \delta_{jk}. \tag{496} \]
Since $g$ is a bilinear map (linear in each ‘slot’), we can fill one slot with a vector to obtain a linear map:
\[ g(\partial_j, \cdot\,), \tag{497} \]
which acts on vectors. For a Cartesian coordinate system in Euclidean space, one simply has:
\[ g(\partial_j, \cdot\,) = dx^j, \tag{498} \]
making the duality between vectors and differential 1-forms trivial.

Now recall that an ‘exact differential 1-form’ $\omega$ is one that can be expressed in the form $\omega = df$, for some function $f$. The function $f$ is determined by $\omega$ only up to a function whose exterior derivative is zero – i.e. a constant. To see this, note that if we transform $f \to f + c$, where $c$ is constant, then one has:
\[ \omega = df \to d(f + c) = df + dc = df + 0. \tag{499} \]
In physics terms, the transformation $f \to f + c$ is an example of a ‘gauge transformation’ – electromagnetism is one gauge theory which has this gauge symmetry. Back to mathematics, we see that gradient vector fields $\nabla f$ and exact differential 1-forms $df$ are in one-to-one correspondence.

Clearly if we were to integrate $df$ between two points $P_1$ and $P_2$, by the fundamental theorem of calculus the result must only depend on the value of $f$ at these end points:
\[ \int_{P_1}^{P_2} df = f(P_2) - f(P_1). \tag{500} \]
It does not depend on the path taken between the two endpoints. Similarly, since the gradient vector field $\nabla f$ is dual to $df$, any line-integral of $\nabla f$ must be path-independent.

However, note that any conservative force can be expressed as the gradient of some suitable scalar potential: $\mathbf{F} = -\nabla U$. The work done under some force-field $\mathbf{F}$ in moving an object from one point to another is given by the line-integral of $\mathbf{F}$ between those points – which in general depends on the path taken. Since $\mathbf{F} = -\nabla U$ for conservative force fields, it follows that the work done must be path independent!

Clearly, this duality between the exterior derivative and gradient operator provides a neat way to prove a deep, fundamental result in physics. Moreover, it provides an easy way to generalize classical results from flat 3-dimensional spaces to arbitrary smooth manifolds in arbitrary dimensions. In a similar manner, we can use the exterior derivative to prove the claim made earlier that the gradient vector field points in a direction perpendicular to the contour curves (level sets) of a function.

21.2 Divergence

Previously, the gradient operator allowed us to turn a function into a vector field. In modern speak, it turned a rank-0 tensor into a rank-1 tensor field. A natural question, therefore, is whether there is a differential operator which turns rank-1 tensor fields (vector fields) into rank-0 tensors (functions)? Of course there is.

Definition 17 Given a differentiable vector field $\mathbf{v} = v^1\partial_1 + ... + v^n\partial_n$, its divergence in the standard Cartesian basis $\{\partial_j\}$ for $\mathbb{R}^n$ can be expressed as:
\[ \nabla \cdot \mathbf{v} = \frac{\partial v^1}{\partial x^1} + \frac{\partial v^2}{\partial x^2} + ... + \frac{\partial v^n}{\partial x^n}. \tag{501} \]
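The Cartesian formula for the divergence translates directly into a finite-difference computation. The following sketch is illustrative only: the vector field $(x^2, xy, z)$ is a hypothetical test field, not one taken from these notes, chosen because its divergence $2x + x + 1 = 3x + 1$ is easy to verify by hand.

```python
def field(x, y, z):
    """Hypothetical test field v = (x^2, x*y, z); its divergence is 3x + 1."""
    return (x * x, x * y, z)


def divergence(vec_field, p, step=1e-5):
    """Sum of central-difference estimates of d v^i / d x^i at the point p."""
    total = 0.0
    for i in range(3):
        forward = list(p)
        backward = list(p)
        forward[i] += step
        backward[i] -= step
        total += (vec_field(*forward)[i] - vec_field(*backward)[i]) / (2.0 * step)
    return total


p = (1.0, 2.0, 3.0)
print(divergence(field, p))  # approximately 3 * 1 + 1 = 4
```

Only the $i$-th component is differentiated with respect to the $i$-th coordinate, mirroring the sum over matched indices in the definition.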
Note that this definition is rather restrictive, since it refers explicitly to a Cartesian basis. Nonetheless, until you begin problems in curvilinear coordinates (spherical and cylindrical polar coordinates for example), it will suffice for most calculations.

Some insight into the geometrical significance of the ‘divergence’ operator can be given in 3 dimensions, by its relation to ‘flux integrals’. Note that given a two-dimensional surface $S$ and some vector field $\mathbf{v}$, the flux of $\mathbf{v}$ through $S$ is roughly the intensity of the component of $\mathbf{v}$ normal to $S$ – i.e. the rate of flow of some property through $S$, per unit area. If you think of $\mathbf{v}$ as the velocity vector field for some fluid and $S$ as some surface immersed in the fluid, then the flux of $\mathbf{v}$ through $S$ represents the intensity of the fluid flow through $S$. The divergence is then the ‘volume density’ of this flux.

Definition 18 (3-dimensional Divergence) Given a vector field $\mathbf{F}$ and some point $p$, let $S$ be some closed surface enclosing $p$ and bounding a volume $V$ (notation: we write $S = \partial V$, where $\partial$ denotes ‘boundary of’). The divergence of $\mathbf{F}$ at $p$ is then defined as the limit of the net flow (flux) of $\mathbf{F}$ across $S = \partial V$ divided by the volume $V$ enclosed by $S$, as $V$ collapses to zero:
\[ \mathrm{div}[\mathbf{F}](p) = \lim_{V \to p} \oint_{S=\partial V} \frac{1}{V}\,\mathbf{F} \cdot d\mathbf{S}. \tag{502} \]
Note that the quantity $\oint_{S=\partial V}$ is the surface integral over the boundary surface of $V$ and $d\mathbf{S} = \mathbf{n}\,dS$ is the outward-pointing normal vector of (perpendicular to) $S$, whose magnitude $dS$ is the infinitesimal surface area at any point on $S$. Furthermore, note that this definition of divergence does not depend on the explicit surface chosen – if it did, it would be useless!

As such, the divergence measures the ‘source’ or ‘sink’ behaviour of a vector field at any point. To see this explicitly, consider the following example.

Example 19 (Fracking Well) Fracking is a process by which water can be pumped deep below the Earth into fine rock formations to build pressure, crack rocks and release natural gas for extraction. At the top of some well site, the diffusion of the natural gas can be modelled by a vector field $\mathbf{F} = x\,\partial_x + y\,\partial_y + 100z\,\partial_z$ – i.e. a somewhat upward ‘conical flow’. The units of the components are those of a mass flow rate of natural gas per unit area – i.e. $\frac{[\text{Mass}]}{[\text{Time}][\text{Area}]}$. The divergence of this vector
field is given by:
\[ \mathrm{Div}[\mathbf{F}] = \nabla \cdot \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z} = 1 + 1 + 100 = 102, \tag{503} \]
ignoring units. Since this divergence is positive, the well opening can be viewed mathematically as a ‘source’ for the natural gas vector field. You can plot this vector field to see that it does indeed ‘look’ like a source.

If one were to consider another operation – for example, Carbon sequestration – then you may write a carbon vector field as: $\mathbf{F} = -x\,\partial_x - y\,\partial_y - 10z\,\partial_z$ at the top of some pump leading underground, for example. Computing the divergence should give you a negative quantity – hence corresponding to the top of the pump being a Carbon sink.

Example 20 (Electric Charge) Due to Michael Faraday, we often do calculations in electromagnetism with an artificial quantity – the ‘electric field’ $\mathbf{E}$ defined in terms of the force experienced by a positive test charge due to another given charge configuration. In particular, an electric charge $Q$ generates an electric field defined by $\mathbf{E} = \frac{1}{q}\mathbf{F}$, where $\mathbf{F}$ is the force experienced (a measurable quantity) by a positive test charge $q$ in the field of $Q$.

Regardless of the nature of the charge $Q$, at far enough distances, we can approximate the electric field generated by $Q$ as that of a point charge via Coulomb’s law:
\[ \mathbf{E} = \frac{Q}{4\pi\epsilon_0 r^2}\hat{\mathbf{r}}, \tag{504} \]
where $\hat{\mathbf{r}} = \frac{x}{r}\partial_x + \frac{y}{r}\partial_y + \frac{z}{r}\partial_z$ is a unit vector pointing outward from $Q$ (placing $Q$ at the origin $(0, 0, 0)$).

Without a second thought, Kate Lindley is told that ‘positive charges’ act as ‘sources’ for electric field lines and negative charges act as ‘sinks’. Graphically, this makes sense if we plot $\mathbf{E}$. However, if we try to formalize this notion, one might think that in the neighbourhood of positive charges an electric field has positive divergence – and around negative charges its divergence is negative. Is this really the case?

Considering a point-charge at the origin $(0, 0, 0)$ and computing its divergence, we
get:
\[ \mathrm{Div}[\mathbf{E}] = \nabla \cdot \mathbf{E} = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} = \frac{Q}{4\pi\epsilon_0}\left[\left(\frac{1}{r^3} - \frac{3x}{r^4}\,x r^{-1}\right) + (...) + (...)\right] = \frac{Q}{4\pi\epsilon_0}\left[\frac{3}{r^3} - \frac{3(x^2 + y^2 + z^2)}{r^5}\right] = 0, \tag{505} \]
since $E_x = \frac{Q}{4\pi\epsilon_0 r^2}\frac{x}{r} = \frac{Q}{4\pi\epsilon_0}\,x r^{-3}$, etc.

Oh no! What is going wrong here? The real problem is that the electric field $\mathbf{E}$ in its given form is not ‘defined’ at the origin, so the computation above only holds for $r \neq 0$ – technically speaking, this is a hole in the solution space to Maxwell’s equations for electromagnetism. If we use Gauss’ law to compute the charge enclosed by a sphere around the origin, then we must take into account that a point charge at the origin has an infinite charge density at the origin and has zero charge density everywhere else. To take this into account, we need the 3-dimensional Dirac delta distribution $\delta^3(\mathbf{r})$ – which is (informally speaking) defined to be infinity at $\mathbf{r} = 0$ and zero everywhere else. In this manner, we can write:
\[ \nabla \cdot \mathbf{E} = \frac{\rho}{\epsilon_0} = \frac{Q}{\epsilon_0}\delta^3(\mathbf{r}). \tag{506} \]
Since $\int_V \delta^3(\mathbf{r})\,dV = 1$ (the volume integral for a volume $V$ enclosing the origin – by properties of the Dirac delta), we recover $Q$ as the enclosed charge by Gauss’ law.

Note that the concepts in the last example apply directly to Newtonian gravity, by consideration of the gravitational field and Gauss’ law applied to gravity. A more satisfying explanation for this phenomenon is found readily in ‘De Rham Cohomology’, relating the concept of ‘closed differential forms’ and ‘non-exact differential forms’ to topology and a generalized notion of ‘charges’ and conservation laws.

Problem 34 In fluid mechanics, a fluid is classified as ‘incompressible’ if the divergence of its velocity vector field $\mathbf{F}$ is zero:
\[ \mathrm{Div}[\mathbf{F}] = 0. \tag{507} \]
Similarly, a fluid can be called ‘vortex-free’ if the ‘curl’ of its velocity vector field is zero (another vector differential operator). Fluids with vortices have non-zero circulation at various points
– i.e. the line integral of the velocity vector field about a closed loop containing a vortex is non-zero. Loosely speaking, if you integrate around two vortices of equal magnitude rotating in opposite directions, the circulation is zero – hence establishing the beginnings of a notion of duality between vortices and conserved charges from electromagnetism.

Ignoring these concepts for now, consider a fluid with a single counter-clockwise vortex modelled by the velocity vector field:
\[ \mathbf{v} = -y\,\partial_x + x\,\partial_y. \tag{508} \]
Such a vector field arises as the curl of the vector field: $-\frac{1}{2}(x^2 + y^2)\,\partial_z$. One result of vector calculus is that the divergence of a vector field arising as the curl of another vector field is zero. Hence a true vortex has zero divergence – in some regime (linear perhaps), they are conserved quantities.

Q1: Show that for $\mathbf{v} = -y\,\partial_x + x\,\partial_y$, one does indeed have $\mathrm{Div}[\mathbf{v}] = 0$.

Q2: Given an arbitrary vector field $\mathbf{F} = F_x\partial_x + F_y\partial_y + F_z\partial_z$ expressed in a Cartesian coordinate basis, its curl is a vector field defined by the expression:
\[ \nabla \times \mathbf{F} = \left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)\partial_x + \left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)\partial_y + \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)\partial_z. \]
Using this expression as well as the definition of the divergence, prove that $\mathrm{Div}[\nabla \times \mathbf{F}] = \nabla \cdot (\nabla \times \mathbf{F}) = 0$. This establishes our previous statements in some generality.

Hint: You will need Clairaut’s theorem – that is, partial derivatives of a (‘appropriately defined’) function commute.

21.3 Hodge Dual, Closed and Exact Forms

For next time ... perhaps.
22 Tutorial 20: Calculus of Vectors and Differential Forms III

In the last tutorial, we developed the concept of the ‘gradient vector field’ $\nabla f$ generated by a function $f$ and demonstrated its duality to the exterior derivative $df$ of the function $f$. We then went through several exercises and problems illustrating that the gradient vector field is orthogonal to the level sets of a function and that it points in the direction of the maximum rate of increase of that function. We then related gradients to ‘potentials’ and conservative forces – in particular, the Newtonian gravitational force and electric field / electrostatic force law generated by a point charge.

Furthermore, we investigated the concept of the ‘divergence’ of a vector field – a type of derivative operator which turns vector fields into functions. This was defined as the net flux of a vector field at a point, through some imaginary closed surface bounding that point – divided by the volume enclosed by the surface as the surface collapsed to zero. Hence the divergence was an operator that measured the ‘source / sink’ characteristics of a vector field at any given point.

22.1 Sleight of Hand

In the last tutorial, we proved that the gradient vector field $\nabla f$ pointed in the direction of the maximum rate of increase of $f$ – and that its magnitude $|\nabla f|$ was equal to the magnitude of the maximum rate of change of $f$. With some differential geometry, it then follows that the gradient vector field is orthogonal to the level sets (contours) of $f$ – however, we didn’t explicitly show this. A less advanced proof (using vector calculus) is illustrated as follows. First, we state two key ingredients (without proof):
• An $n$-dimensional smooth surface can be described by a family of $n$ orthogonal curves. Equivalently, the surface can be described by $n$ linearly independent tangent vector fields – these can be constructed to be mutually orthogonal via the Gram-Schmidt process.
• The ‘Implicit Function Theorem’ for a function of $n$ variables. By now, you should have covered this in class – if not, it can be found in any (decent) calculus textbook. Roughly speaking, this guarantees that if the differential $df$ (or gradient $\nabla f$) of a function $f$ is non-zero on some open set, then the level sets (contours) of $f$ (the sets $f(\mathbf{r}) = \text{constant}$) exist as smooth hypersurfaces there.
Proof 2 (Orthogonality of Gradient and level sets) Given a function $f$ of $n$ variables, $(x^1, ..., x^n) \in \mathbb{R}^n$, one defines a ‘level set’ of $f$ (generalization of the notion of ‘level curves’ to ‘level hypersurfaces’) as the set of points such that:
\[ f(x^1, ..., x^n) = c, \tag{509} \]
for some chosen constant $c$. Therefore, the family of level sets of $f$ is a one-parameter family generated by $c$ (as $c$ varies) – the union of this family is thus the ‘contour graph’ or ‘level graph’ of $f$.

If the gradient $\nabla f$ of $f$ (or exterior derivative $df$) is non-zero at some point $y = (y^1, ..., y^n)$ (with $f(y) = c$), then the implicit function theorem implies that, near $y$, the pre-image of $c$ is a submanifold of $\mathbb{R}^n$ – i.e. the set $\{x \in \mathbb{R}^n : f(x) = c\}$ is a smooth hypersurface in $\mathbb{R}^n$. This means it is generated by $n - 1$ independent curves, with well-defined tangent vector fields (velocity vectors) to each curve.

With the existence of a level set established, we parametrize one of the $n - 1$ curves in the level set by the vector function $\mathbf{r}_c(t) = (x^1(t), ..., x^n(t))$, where $c$ labels the level set (contour) and $t$ is our parameter. Evaluating our original function $f$ on this curve and taking its exterior derivative with respect to $t$, we find
\[ df = \frac{df}{dt}dt = \left[\frac{\partial f}{\partial x^1}\frac{dx^1}{dt} + ... + \frac{\partial f}{\partial x^n}\frac{dx^n}{dt}\right]dt. \tag{510} \]
However, we know that $f(\mathbf{r}_c(t)) = c$ (by construction of the level set), hence it follows that: $\frac{df}{dt}\big|_{\mathbf{r}_c(t)} = 0$. Hence, we have:
\[ \left[\frac{\partial f}{\partial x^1}\frac{dx^1}{dt} + ... + \frac{\partial f}{\partial x^n}\frac{dx^n}{dt}\right] = 0. \tag{511} \]
Now notice that the quantity on the left-hand side of this equation is simply the dot product of the two following vectors:
\[ \nabla f = \frac{\partial f}{\partial x^1}\partial_1 + ... + \frac{\partial f}{\partial x^n}\partial_n, \qquad \dot{\mathbf{r}}_c = \frac{dx^1}{dt}\partial_1 + ... + \frac{dx^n}{dt}\partial_n, \tag{512} \]
evaluated along the curve given by $\mathbf{r}_c(t)$ in a level set of $f$ labelled by $c$. The vector $\nabla f$ is the gradient of $f$ and $\dot{\mathbf{r}}_c$ is the tangent (velocity) vector to the curve $\mathbf{r}_c(t)$. Since $\nabla f \cdot \dot{\mathbf{r}}_c = 0$ as shown above, it follows that the gradient is orthogonal to the curve $\mathbf{r}_c(t)$. Repeating this argument for all other curves generating the level set
(hypersurface) $f(x) = c$, it follows that the gradient vector field is orthogonal to all the level curves of $f$ in this level set – hence it must be orthogonal to the level set (hypersurface).

Applying the above result to the case $n = 2$ for contour maps – i.e. graphs of the form $z = f(x, y)$ – we see that there is only one level curve ($n - 1 = 1$) generating each level set of $f$. Hence the gradient vector field $\nabla f$ possesses two properties – it is orthogonal to the contour lines of $f$ and it points in the direction of the maximum rate of increase of $f$ (its magnitude $\|\nabla f\|$ specifying this rate).

22.2 Curl of a Vector Field

We already know one derivative operator that acts on vector fields – the divergence. This turns a vector field into a function. However, one may also ask for an operation that preserves the ‘tensor rank’ of a vector field – i.e. a differential operator that turns vector fields into vector fields. One such operator is ‘curl’, denoted by $\nabla \times$. Such an operator measures ‘vorticity’ (rotation) in a vector field – i.e. clockwise or counterclockwise rotational flow behaviour.

We shall proceed by first giving a ‘geometric’ definition of the curl, then provide a formula to calculate the curl of a vector field in Cartesian coordinates. N.B. – The curl is only defined in three dimensions, just like the cross product.¹⁴⁷

Definition 19 (Curl of a Vector Field) Given a vector field $\mathbf{F}$, its curl $\nabla \times \mathbf{F}$ is defined implicitly as follows. Given a unit vector $\hat{\mathbf{n}}$ normal (orthogonal) to some (imaginary) surface $S$, the component of $\nabla \times \mathbf{F}$ in the direction of $\hat{\mathbf{n}}$ is defined as the ‘circulation per unit area’ of $\mathbf{F}$ around the curve $C = \partial S$ bounding $S$, as the surface $S$ collapses to zero:
\[ (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}} := \lim_{|S| \to 0} \frac{1}{|S|}\oint_{C=\partial S} \mathbf{F} \cdot d\mathbf{r}. \tag{513} \]
Here, the ‘circulation’ is the line integral $\oint_{C=\partial S} \mathbf{F} \cdot d\mathbf{r}$ (with positive anti-clockwise orientation) of our vector field $\mathbf{F}$ around the curve $C = \partial S$ bounding the surface $S$.
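The geometric definition can be made concrete with a small numerical experiment. The sketch below is an illustration under stated assumptions: it takes the hypothetical planar field $\mathbf{F} = -y\,\partial_x + x\,\partial_y$, the surface $S$ to be a square in the x-y plane with normal $\hat{\mathbf{n}} = \partial_z$, and approximates the line integral with a midpoint rule. The centre of the square and the list of side lengths are arbitrary choices.

```python
def field(x, y):
    """Hypothetical planar field F = (-y, x): a rigid counter-clockwise rotation."""
    return (-y, x)


def circulation_square(F, cx, cy, side, n=100):
    """Line integral of F around a square centred at (cx, cy), counter-clockwise."""
    h = side / 2.0
    corners = [(cx - h, cy - h), (cx + h, cy - h), (cx + h, cy + h), (cx - h, cy + h)]
    total = 0.0
    for k in range(4):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % 4]
        dx, dy = (x1 - x0) / n, (y1 - y0) / n
        for j in range(n):
            # Evaluate F at the midpoint of the j-th sub-segment of this edge.
            xm = x0 + (j + 0.5) * dx
            ym = y0 + (j + 0.5) * dy
            fx, fy = F(xm, ym)
            total += fx * dx + fy * dy
    return total


# Circulation per unit area as the square shrinks around the point (0.3, -0.2).
ratios = []
for side in (1.0, 0.1, 0.01):
    ratio = circulation_square(field, 0.3, -0.2, side) / side**2
    ratios.append(ratio)
    print(side, ratio)
```

For this particular field the ratio equals 2 for every square, which is the z-component of its curl. For a general field the ratio only approaches the curl component as the square shrinks, which is the content of the limit in the definition.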
¹⁴⁷ Such an operator is naturally extended to arbitrary dimensions by a combination of the ‘exterior derivative’ and ‘Hodge dual’ operators.

Intuitively, the ‘circulation’ (line integral defined above) of a vector field measures how much a vector field ‘rotates’ (circulates) at any given point. Since the
components of the curl are the area density of some infinitesimal circulation (in each direction), the curl of a vector field must be zero if it exhibits no rotational behaviour (no circulation). Here are some pictures pillaged from the internet to illustrate this.

Figure 5: Curl of a vector field $\mathbf{F}$.

Figure 6: Projections (components) of the curl defined geometrically as infinitesimal area densities of line integrals around the boundaries of imaginary (contrived) surfaces.
To get a clearer idea, we shall now give an algebraic definition of the curl as well as some examples and problems.

Given a vector field¹⁴⁸ $\mathbf{F} = F_1\partial_1 + F_2\partial_2 + F_3\partial_3$, expressed in the Cartesian basis $\{\partial_1, \partial_2, \partial_3\} = \{\partial_x, \partial_y, \partial_z\}$ for $\mathbb{R}^3$, its curl is given by the following formula:
\[ (\nabla \times \mathbf{F}) = \epsilon^{ijk}(\partial_i F_j)\partial_k = \epsilon^{ijk}\left(\frac{\partial F_j}{\partial x^i}\right)\partial_k = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\partial_x - \left(\frac{\partial F_3}{\partial x} - \frac{\partial F_1}{\partial z}\right)\partial_y + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\partial_z. \tag{514} \]

¹⁴⁸ Recall that $\partial_j := \frac{\partial}{\partial x^j}$ is both a differential operator (partial derivative) and the standard basis vector in the direction of the Cartesian coordinate $x^j$.

Note that in the first line, we used the Einstein summation convention as well as the ‘Levi-Civita’ or ‘Permutation’ symbol – defined in earlier tutorials. These give an elegant way to remember the curl. Alternatively, to remember the explicit formula in the last line, you can think of the curl informally as the determinant of a 3-by-3 matrix with the standard basis vectors $[\partial_x\ \partial_y\ \partial_z]$ in the first row, the partial derivative operators $\left[\frac{\partial}{\partial x}\ \frac{\partial}{\partial y}\ \frac{\partial}{\partial z}\right]$ in the second row and the components of $\mathbf{F}$ in the third row: $[F_1\ F_2\ F_3]$. The derivative operators in the second row act on the components in the third row of the matrix (taking partial derivatives of $\mathbf{F}$), whilst the basis vectors in the first row multiply everything – this ensures that the result is a vector field (not a function):
\[ \begin{vmatrix} \partial_x & \partial_y & \partial_z \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{vmatrix}. \]
We now proceed to some examples and problems.

Example 21 (Curl of a Gradient) In an ideal gymnasium, nobody in their right mind curls in the squat racks. Similar to the ideal gym squat rack, a gradient vector field has ZERO CURL. This is an extremely important and fundamental property of gradient vector fields – one which is responsible for the statement that the ‘work done to move a particle in a conservative force field is independent of the path taken’. In other words, it relates to path independence of line integrals of gradient vector fields (with some minor technicalities).

To prove this statement, consider a function $f$ of three Cartesian variables $(x, y, z)$.
Its gradient vector field is given by:
\[ \nabla f = \left(\frac{\partial f}{\partial x}\right)\partial_1 + \left(\frac{\partial f}{\partial y}\right)\partial_2 + \left(\frac{\partial f}{\partial z}\right)\partial_3. \tag{515} \]
The curl of this vector field is then given using the Cartesian formula, stated earlier:
\[ \begin{aligned} \nabla \times (\nabla f) &= (\partial_2(\partial_3 f) - \partial_3(\partial_2 f))\partial_1 - (\partial_1(\partial_3 f) - \partial_3(\partial_1 f))\partial_2 + (\partial_1(\partial_2 f) - \partial_2(\partial_1 f))\partial_3 \\ &= \left(\frac{\partial}{\partial y}\frac{\partial f}{\partial z} - \frac{\partial}{\partial z}\frac{\partial f}{\partial y}\right)\partial_1 - \left(\frac{\partial}{\partial x}\frac{\partial f}{\partial z} - \frac{\partial}{\partial z}\frac{\partial f}{\partial x}\right)\partial_2 + \left(\frac{\partial}{\partial x}\frac{\partial f}{\partial y} - \frac{\partial}{\partial y}\frac{\partial f}{\partial x}\right)\partial_3 \\ &= 0. \end{aligned} \tag{516} \]
Note that we are still using the notation $\partial_j = \frac{\partial}{\partial x^j}$. To get from the second line to the third line in the above derivation, we had to make use of Clairaut’s theorem – i.e. the fact that partial derivatives (in Cartesian coordinates) commute:
\[ \frac{\partial}{\partial x^j}\frac{\partial}{\partial x^k}f = \frac{\partial}{\partial x^k}\frac{\partial}{\partial x^j}f, \tag{517} \]
for arbitrary $j, k = 1, 2, 3$. For this commuting property to hold, it suffices that the second order partial derivatives of $f$ exist and are continuous.

Although we chose Cartesian coordinates for $\mathbb{R}^3$, we could have chosen any set of coordinates (with an appropriate modification to the curl formula) and arrived at the same general result:
\[ \nabla \times (\nabla f) = 0, \tag{518} \]
which holds $\forall f$ satisfying the conditions of Clairaut’s theorem. Alternatively, a very elegant and far more general proof can be found using exterior calculus via the exterior derivative $d$ and Hodge dual $\star$:
\[ d(df) = d^2 f = 0, \tag{519} \]
since $d^2 = 0$. Hopefully we can cover this in a future tutorial.

Problem 35 (Faraday’s Law of Induction) The Scottish Mathematical Physicist, James Clerk Maxwell, is perhaps most immortalized through the ‘Maxwell equations’ for electromagnetism. Technically, these laws were derived by other scientists / mathematicians such as Faraday, Ampere and Gauss – however, their vector-calculus form is due to the work of Maxwell. In this form, they are implicitly relativistically invariant – a symmetry that helped spur the discovery and development of special relativity.
One of Maxwell’s equations is a statement of Faraday’s law of induction. This says that a time-varying magnetic field $B$ is generated by an electric field $E$ with non-zero curl:

\[
\nabla \times E = -\frac{\partial}{\partial t} B. \tag{520}
\]

Now recall that the ‘Coulomb field’ – i.e. an electrostatic field generated by a point charge – can be written as the gradient vector field of some potential (tutorial 19):

\[
E = -\nabla\left(\frac{Q}{4\pi\epsilon_0 r}\right), \tag{521}
\]

where $\epsilon_0$ is the electric permittivity of free space.

Q1: Using an earlier result from this tutorial, prove that, as a consequence of Faraday’s law, a point charge cannot generate a time-varying magnetic field. In other words, show that any magnetic field arising from a point charge is necessarily static. Hint: Static here means that $\frac{\partial}{\partial t} B = 0$.

Q2: Instead of the electric field generated by a point charge, we now consider the following static electric field:

\[
E = -y\,\partial_x + x\,\partial_y. \tag{522}
\]

Draw a graph of this electric field in the $x$-$y$ plane, then compute its curl:

\[
\nabla \times E = \dots \tag{523}
\]

Now, using Faraday’s Law, solve the resulting vector differential equation for the magnetic field $B$. Hint: The differential equation is trivial. All you need to do is integrate over time $t$. Since the electric field is static, such an integration is simple.

Q3: Repeat the previous question, this time adding a harmonic time dependence to the electric field:

\[
E = -y e^{i\omega t}\,\partial_x + x e^{i\omega t}\,\partial_y, \tag{524}
\]

where $\omega$ is the angular frequency of the electric field.
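The zero-curl property of gradient fields from Example 21 can also be sanity-checked numerically. The following is a minimal sketch, not part of the original tutorial: the helper names `gradient` and `curl`, the sample function `f`, the step size `h` and the test point are all illustrative assumptions, and the derivatives are approximated by central differences rather than computed exactly.

```python
import math

def gradient(f, p, h=1e-3):
    """Central-difference gradient of a scalar function f at p = (x, y, z)."""
    grad = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        grad.append((f(*plus) - f(*minus)) / (2 * h))
    return tuple(grad)

def curl(F, p, h=1e-3):
    """Central-difference curl of a vector field F at p.

    F takes (x, y, z) and returns the components (F1, F2, F3).
    """
    def d(i, j):
        # Approximate the partial derivative of component F_j along x_i.
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        return (F(*plus)[j] - F(*minus)[j]) / (2 * h)

    return (d(1, 2) - d(2, 1),   # dF3/dy - dF2/dz
            d(2, 0) - d(0, 2),   # dF1/dz - dF3/dx
            d(0, 1) - d(1, 0))   # dF2/dx - dF1/dy

def f(x, y, z):
    # An arbitrary smooth sample function (an assumption for illustration).
    return math.sin(x * y) + x * z**2

def grad_f(x, y, z):
    return gradient(f, (x, y, z))

# Each component should vanish up to floating-point round-off.
curl_of_grad = curl(grad_f, (0.3, -0.7, 1.2))
print(curl_of_grad)
```

The same `curl` helper can be pointed at any field you write down by hand, which gives a quick way to check a pen-and-paper curl before using it in Faraday’s law.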
23 Tutorial 21: Coordinate Systems and Scale Factors

By now, all of you will have come across more than one type of ‘coordinate system’. For example, in two dimensions you will have used rectangular (Cartesian) coordinates, $(x, y)$, as well as ‘polar coordinates’, $(r, \theta)$. Depending on the symmetries of your problem, each coordinate system would have had its advantages and disadvantages.

In general, there are an infinite number of coordinate systems you could use to set up a problem. However, for spaces such as $\mathbb{R}^2$ and $\mathbb{R}^3$ equipped with a Euclidean metric, there is a special (finite) class of coordinate systems known as ‘separable’ coordinate systems. Such a term arises from the fact that the Laplace operator is separable in these systems – meaning that the Laplace equation is a differential equation that can be solved by ‘separation of variables’. More generally, the Hamilton-Jacobi equations are separable in such coordinate systems.

In this tutorial, we will explore a few examples of different 2-dimensional and 3-dimensional coordinate systems. In particular, we will develop several concepts and ideas from a ‘geometrical viewpoint’. This should help you gain some physical intuition behind objects such as the ‘Jacobian determinant’ as well as change-of-variables.

23.1 Orientation and Measure

Recall that if we are integrating a function $f = f(x)$ of one real variable $x \in \mathbb{R}$, we write:

\[
I = \int_L f(x)\,dx. \tag{525}
\]

The $L$ here denotes the subset of the real line $\mathbb{R}$ which we integrate over. For most integrals, this is just some interval[149] $L = [a, b]$. One way to view the process of integration is in terms of an operator (a ‘measure’) acting on a function. In particular, we can view the previous integral $I$ as the operator

\[
\int_L dx \tag{526}
\]

acting on the function $f$. This assigns some value to $f$ – its measure (Riemann integral) on the set $L$ (e.g. over an interval $L = [a, b]$).
Such an abstraction turns out to be very powerful and useful – for example, in the study of probability measures. Collectively, it is part of a beautiful area of mathematics known as ‘measure theory’.

[149] Note that strictly speaking, it doesn’t matter whether or not you include the endpoints – if the integral exists / converges, then you can take a limit which features the endpoints.

Fundamental to the construction of a measure $\int_L dx$ on the real line $\mathbb{R}$ is the existence of the object ‘$dx$’. By now, you should know that this is not just some hazily defined ‘infinitesimal’ quantity along the $x$-axis – it is in fact a well-defined ‘differential 1-form’. Geometrically, not only does it represent an infinitesimal line element in the $x$ direction, it also represents an ‘orientation’ on the $x$-axis. This orientation is in the positive $x$ direction. Technically, we could assign an opposite orientation by defining $dl = -dx$ in our definition of the measure, $\int_L dl$.

For one-dimensional integrals, the notion of orientation may seem trivial. However, generalizing to integrals over surfaces, volumes and general $n$-dimensional oriented manifolds, there are always two possible choices of ‘orientation’ – this is encoded in an object known as the ‘orienting $n$-form’.

23.2 Smooth Curves

When we perform an integral $\int f(x)\,dx$, we are integrating the function $f$ along the $x$-axis representing the real line $\mathbb{R}$. However, in general, one can integrate functions along an arbitrary curve. Such integrals form a class known as line integrals.

A curve is a 1-dimensional manifold, meaning it can be parametrised by one variable. In a Euclidean space such as $\mathbb{R}^2$ or $\mathbb{R}^3$, we can represent this curve by the functional equations (in the standard Cartesian basis):

\[
\gamma(t) = x(t)e_1 + y(t)e_2, \quad t \in L \tag{527}
\]

and

\[
\gamma(t) = x(t)e_1 + y(t)e_2 + z(t)e_3, \quad t \in L \tag{528}
\]

where $e_j$ are the standard basis vectors in the $x$, $y$ and $z$ directions and $L$ is some subset of $\mathbb{R}$ – e.g. $L = [0, 1]$ or $L = \mathbb{R}$ itself. In this manner, we can view $\gamma(t)$ as the position vector for some motion.
This means that for each value of $t$, we assign a vector $\gamma(t)$ which starts at the origin $0$ and points to some location on the curve.[150]

[150] Such a construction is only possible for affine spaces – such as Euclidean vector spaces. In general, one must be more subtle in defining and representing a curve.
At each point $\gamma(t_0)$ on the (smooth) curve $\gamma$, there is a unique vector tangent to the curve at that point. This vector, also known as the ‘velocity vector’, is given by:

\[
\dot{\gamma}(t_0) = \left.\frac{d\gamma}{dt}\right|_{t=t_0} = \lim_{\Delta t \to 0} \frac{\gamma(t_0 + \Delta t) - \gamma(t_0)}{\Delta t}. \tag{529}
\]

Exercise 73 (Constructing a Tangent Vector Field) At this point, the dissatisfied tutorial member asks for their money back – not having gained any geometrical intuition. To this end, they are given the following exercise to construct a tangent vector field.

1. Take Cartesian coordinates $x, y, z$ for $\mathbb{R}^3$ and draw a curve $\gamma(t) = (x(t), y(t), z(t))$ in $\mathbb{R}^3$.

2. Label two separate points $t_0$ and $t_1$ along the curve and draw the corresponding position vectors $\gamma(t_0)$ and $\gamma(t_1)$. Note that both these vectors start at the origin and point to $(x(t_0), y(t_0), z(t_0))$ and $(x(t_1), y(t_1), z(t_1))$, respectively.

3. Draw the displacement vector, $\Delta\gamma = \gamma(t_1) - \gamma(t_0)$, using your vector subtraction rules.

4. Recall that the linearisation of a function $f$ about some point $t$ is given by its first order Taylor expansion:

\[
f(t + \Delta t) \approx f(t) + \frac{df}{dt}\Delta t. \tag{530}
\]

Similarly, we can define the linearisation of the curve $\gamma$ about a point $t$, starting from:

\[
\gamma(t + \Delta t) := x(t + \Delta t)e_1 + y(t + \Delta t)e_2 + z(t + \Delta t)e_3. \tag{531}
\]

Using the previous result (530), expand each of the coordinates $x(t), y(t), z(t)$ about the point $t$ on the curve. Hence obtain an expression for $\gamma(t + \Delta t)$ and collect the coefficients for each of the standard basis vectors $e_j$.

5. Using your previous result, simplify the expression on the right-hand side of the following:

\[
\Delta\gamma = \gamma(t + \Delta t) - \gamma(t). \tag{532}
\]

Your result should involve a factor of $\Delta t$ multiplying everything on the right-hand side.
6. Dividing both sides by $\Delta t$, you should get a vector on the right-hand side which (to first order) does not involve $\Delta t$:

\[
\frac{\Delta\gamma}{\Delta t} = \frac{\gamma(t + \Delta t) - \gamma(t)}{\Delta t}. \tag{533}
\]

Now taking the limit $\Delta t \to 0$, we can define:

\[
\dot{\gamma}(t) = \lim_{\Delta t \to 0} \frac{\Delta\gamma}{\Delta t}. \tag{534}
\]

If you’ve done this exercise correctly, you should see that:

\[
\dot{\gamma}(t) = \frac{d\gamma}{dt}(t) = \frac{dx}{dt}e_1 + \frac{dy}{dt}e_2 + \frac{dz}{dt}e_3. \tag{535}
\]

7. In your diagram, let $\Delta t = t_1 - t_0$. If you imagine the limit $t_1 \to t_0^{+}$ (equivalently, $\Delta t \to 0$) you will see that the displacement vector $\Delta\gamma = \gamma(t_1) - \gamma(t_0)$ approaches a vector which is parallel (tangent) to the curve at $t_0$. Hence the ‘velocity vector’ $\dot{\gamma}(t_0)$ and the tangent vector to the curve at $t_0$ are the same geometrical object.

If we now wish to consider integrals along an arbitrary curve $\gamma$, we need an ‘orienting 1-form’ along the curve. Recalling the integral of a function $f$ of one variable along the $x$-axis, we had $dx$ as our orienting 1-form. For a curve $\gamma$, the orienting 1-form will geometrically correspond to an infinitesimal displacement along the curve. From the previous exercise, we already know that infinitesimal changes along a curve $\gamma$ are represented by the tangent (velocity) vector field $\frac{d\gamma}{dt}$. In particular, the magnitude (norm) of the tangent vector field:

\[
\|\dot{\gamma}(t)\| = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \dots}, \tag{536}
\]

is simply the rate at which distance along the curve changes with $t$ (the speed), in the direction of increasing $t$. Hence, an infinitesimal displacement $dl$ along the curve is given by:

\[
dl = \|d\gamma\| = \left\|\frac{d\gamma}{dt}(t)\right\| dt. \tag{537}
\]

This is simply a consequence of the chain rule – or in geometric terms, a consequence of the graph you drew in the previous exercise. The object $dl$ given by (537) is the orienting 1-form for the curve $\gamma$. It is also known as the ‘line-element’ along the curve – as such, it is a generalization of the orienting 1-form $dx$ along
the $x$-axis, considered earlier. Note that the structure of the equation (537) takes the form:

\[
dl = (\text{Some Factor}) \times (\text{Infinitesimal Change in Some Parameter}). \tag{538}
\]

Here our curve $\gamma$ was parametrised by $t$, with the corresponding 1-form $dt$. The factor appearing in front of $dt$ was given by the magnitude $\left\|\frac{d\gamma}{dt}(t)\right\|$ of the tangent (velocity) vector to the curve $\gamma$ at the point $t$. Such a factor is called a ‘scale factor’ – usually denoted $h_\gamma$. A scale factor serves to turn the 1-form $dt$ into an infinitesimal length $dl$. As such, one may assign the scale factor units of length and treat $dt$ as dimensionless[151]. Furthermore, $dl$ encodes an orientation for the curve $\gamma$ – positive in the direction of increasing $t$. If we defined $dl$ as the negative of $\left\|\frac{d\gamma}{dt}(t)\right\| dt$, we would get the reverse orientation.

Having constructed a way to get infinitesimal length elements along arbitrary curves (along with scale factors), one may generalize this to obtain infinitesimal area elements for smooth, orientable surfaces. Later we will look at the notion of an ‘orienting 2-form’ – an object which encodes both the orientation of a surface and an infinitesimal notion of ‘area’ along that surface. This allows us to perform integration on surfaces and hence define measures on them (such as the ‘surface area’).

A surface is a 2-dimensional manifold. This means that it can be parametrised by two variables, $(s, t)$. For most problems, you will look at surfaces which are embedded in $\mathbb{R}^3$ – as such, their graphs will be specified by a set of three coordinates parametrised by two variables: $(x(s, t), y(s, t), z(s, t))$. For now, note that when we change coordinates from Cartesian coordinates $(x, y)$ to another set of coordinates $(s, t)$, the infinitesimal area element $dx \wedge dy$ (orienting 2-form) has to change also:

\[
dx \wedge dy \to |J|\,ds \wedge dt. \tag{539}
\]

Here, the quantity $|J|$ is the Jacobian determinant.
For orthogonal coordinate systems, this is simply equal to the product of the scale factors in each direction $ds$ and $dt$:

\[
|J| = h_s h_t. \tag{540}
\]

It arises because in Cartesian coordinates, $dx\,dy$ represents the area of an infinitesimal rectangle. However, in a new coordinate system $(s, t)$, the element $ds\,dt$ may not represent an infinitesimal area (for example, it may have the wrong units) – hence to turn $ds\,dt$ into an area, we need to multiply it by an appropriate scaling function (the Jacobian determinant). Another way to look at this is to note that an infinitesimal area in the new coordinate system is given by the formula:

\[
dl_s \wedge dl_t = h_s\,ds \wedge h_t\,dt = h_s h_t\,ds \wedge dt, \tag{541}
\]

where $dl_s$ and $dl_t$ are defined as above. We will investigate this in more detail in a future tutorial, but for now it suffices to remember the relation (540) between the Jacobian determinant and the coordinate scale factors.

[151] Alternatively, if $\dot{\gamma}$ is a physical velocity, one would assign units of length/time to its magnitude and assign units of time to $dt$.

To get some operational understanding of orienting forms and how they arise in different coordinate systems, consider the following example.

Example 22 (Parabolic Coordinates) In two dimensions, instead of using Cartesian coordinates $(x, y)$, one may choose to use parabolic coordinates. Parabolic coordinates are useful for problems with some sort of parabolic symmetry – for example, investigating the ‘Stark effect’ (splitting of the spectral lines of an atom in a strong electric field).

Parabolic coordinates[152] $(\sigma, \tau)$ are a two-dimensional orthogonal coordinate system, in which the coordinate curves are parabolas. Such coordinates are defined implicitly as follows:

\[
x = \sigma\tau, \qquad y = \frac{1}{2}(\tau^2 - \sigma^2). \tag{542}
\]

[152] Note that $\tau$ is the Greek letter ‘tau’ – not the variable $t$.

Eliminating $\tau$, we see that curves of constant $\sigma$ correspond to confocal parabolas (parabolas with the same focus) opening upward in the positive $y$ direction:

\[
y = \frac{1}{2\sigma^2}x^2 - \frac{1}{2}\sigma^2. \tag{543}
\]

Similarly, eliminating $\sigma$, curves of constant $\tau$ correspond to confocal parabolas opening downward in the negative $y$ direction:

\[
y = -\frac{1}{2\tau^2}x^2 + \frac{1}{2}\tau^2. \tag{544}
\]

We now wish to derive the scale factors, $h_\tau$, $h_\sigma$, corresponding to infinitesimal displacements $d\tau$ and $d\sigma$ in the $\tau$ and $\sigma$ directions, respectively. To do this, consider a curve along the $\tau$ coordinate – meaning that we keep $\sigma$ constant. We can write this curve as

\[
\gamma(t) = (x(t), y(t)) = \left(\sigma\tau(t), \; \frac{1}{2}\big(\tau(t)^2 - \sigma^2\big)\right), \tag{545}
\]
where $\tau = \tau(t)$ is a function of the parameter $t$. Recalling our expression, $dl = h_\gamma\,dt = \left\|\frac{d\gamma(t)}{dt}\right\| dt$, for an infinitesimal displacement along a curve $\gamma$, we can work out the scale factor $h_\tau$ for the $\tau$ coordinate by getting the magnitude of a vector tangent to the $\tau$ coordinate curves. Mathematically, we have:

\[
\frac{d\gamma}{dt} = \left(\sigma\frac{d\tau}{dt}, \; \tau\frac{d\tau}{dt}\right), \tag{546}
\]

using the product rule and the fact that $\sigma$ is constant along the $\tau$ coordinate curves. Hence, taking $\tau$ to increase with $t$, the magnitude of this vector is given by:

\[
\left\|\frac{d\gamma}{dt}\right\| = \sqrt{(\sigma^2 + \tau^2)\left(\frac{d\tau}{dt}\right)^2} = \sqrt{\sigma^2 + \tau^2}\,\frac{d\tau}{dt}. \tag{547}
\]

Hence, using the chain rule, we see that the infinitesimal length $dl_\tau$ in the $\tau$ direction is given by:

\[
dl_\tau = \left\|\frac{d\gamma}{dt}\right\| dt = \sqrt{\sigma^2 + \tau^2}\,\frac{d\tau}{dt}\,dt = \sqrt{\sigma^2 + \tau^2}\,d\tau. \tag{548}
\]

The coefficient of $d\tau$ is identified to be the scale factor $h_\tau$ corresponding to $\tau$. Hence we have:

\[
h_\tau = \sqrt{\sigma^2 + \tau^2}. \tag{549}
\]

By considering the $\sigma$ coordinate curves (curves of constant $\tau$), we can derive the scale factor $h_\sigma$ in the same way. The result is:

\[
h_\sigma = \sqrt{\sigma^2 + \tau^2} = h_\tau. \tag{550}
\]

With these results in mind, the orienting area 2-form in parabolic coordinates is given by:

\[
dA = |J|\,d\sigma \wedge d\tau = h_\sigma h_\tau\,d\sigma \wedge d\tau = (\sigma^2 + \tau^2)\,d\sigma \wedge d\tau. \tag{551}
\]

The Jacobian determinant $|J| = h_\sigma h_\tau = (\sigma^2 + \tau^2)$ represents how the notion of area is warped in a parabolic coordinate system.[153]

[153] Note that areas themselves are geometrical quantities. Therefore, they do not depend on the choice of coordinate system. How we measure and compute areas, however, does change.

Using the previous example as a template, consider now your familiar and well-loved 2-dimensional polar coordinates $(r, \theta)$.
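Before turning to polar coordinates, the parabolic results (549)–(551) can be sanity-checked numerically. The sketch below is an illustration rather than part of the original tutorial: the function names, the step size `h` and the sample point `(sigma, tau)` are assumptions, and the tangent vectors to the coordinate curves are approximated by central differences of the map (542).

```python
import math

def parabolic_to_cartesian(sigma, tau):
    """The change of variables (542): returns (x, y)."""
    return (sigma * tau, 0.5 * (tau**2 - sigma**2))

def tangent_vectors(sigma, tau, h=1e-6):
    """Central-difference tangent vectors to the sigma and tau coordinate curves."""
    xp, yp = parabolic_to_cartesian(sigma + h, tau)
    xm, ym = parabolic_to_cartesian(sigma - h, tau)
    d_sigma = ((xp - xm) / (2 * h), (yp - ym) / (2 * h))
    xp, yp = parabolic_to_cartesian(sigma, tau + h)
    xm, ym = parabolic_to_cartesian(sigma, tau - h)
    d_tau = ((xp - xm) / (2 * h), (yp - ym) / (2 * h))
    return d_sigma, d_tau

sigma, tau = 1.3, 0.8                       # arbitrary sample point
d_sigma, d_tau = tangent_vectors(sigma, tau)

# Scale factors are the norms of the tangent vectors.
h_sigma = math.hypot(*d_sigma)
h_tau = math.hypot(*d_tau)

# Determinant of the 2x2 Jacobian matrix whose columns are the tangent vectors.
jacobian_det = d_sigma[0] * d_tau[1] - d_sigma[1] * d_tau[0]

# Orthogonality of the coordinate curves: the dot product should vanish.
dot = d_sigma[0] * d_tau[0] + d_sigma[1] * d_tau[1]

print(h_sigma, h_tau, math.sqrt(sigma**2 + tau**2))
print(jacobian_det, h_sigma * h_tau, dot)
```

Swapping `parabolic_to_cartesian` for another change of variables gives a quick check on hand-derived scale factors; note that the relation $|J| = h_s h_t$ should only be expected to hold when the printed dot product is (numerically) zero, i.e. for orthogonal coordinates.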
Exercise 74 (Arctic Renaissance) During a performance of the guitar orchestra piece ‘Arctic Renaissance’, a lost bipolar polar bear wanders into the St. George’s College Mathematical Sciences Tutorials. It turns out that the polar bear is lost because it did not take into account the scale factors in a polar coordinate system – thus grossly miscalculating its journey.

I: To help the bipolar bear find his way home, consider the change of variables $(x, y) \to (r, \theta)$ defined by:

\[
x = r\cos(\theta), \qquad y = r\sin(\theta). \tag{552}
\]

Now derive the scale factors $h_r$ and $h_\theta$ for each of the coordinate curves, $r$ and $\theta$.

II: Draw a picture illustrating the relation between $dl_\theta = h_\theta\,d\theta$ and $d\theta$. You should see that $dl_\theta$ is simply the formula for the length of an infinitesimal circular arc.

III: Compute the orienting area 2-form for polar coordinates: $dA = |J|\,dr \wedge d\theta$. Illustrate this with a diagram showing the area of an infinitesimal circular wedge (actually, an incomplete annulus).

IV: For those of you who have studied Jacobian maps (matrices), compute the Jacobian matrix for the change of variables $(x, y) \to (r, \theta)$. Now show that its determinant is indeed given by $|J| = r$. Hint: Ask your tutor for help!

Exercise 75 (Temporal Epilepsy) Having heard about the successful return of the bipolar polar bear to his homeland, an epileptic ellipse named ‘Eclectic’ walks into the SGC Mathematical Sciences Tutorial. It turns out that the ellipse’s epilepsy comes from having the incorrect scale factors for elliptical coordinates programmed into its DNA (a result of natural radiation-induced mutations). To help Eclectic, consider an elliptical coordinate system $(\mu, \nu)$ defined by:

\[
x = a\cosh(\mu)\cos(\nu), \qquad y = a\sinh(\mu)\sin(\nu). \tag{553}
\]

I: Show that the $\mu$ coordinate lines (curves of constant $\nu$) form hyperbolae. You can do this by eliminating $\mu$ from the above equations, using the identity:

\[
\cosh^2(\mu) - \sinh^2(\mu) = 1. \tag{554}
\]

Similarly, show that the $\nu$ coordinate lines (curves of constant $\mu$) form ellipses.
Hint: For the $\mu$ coordinate lines, you should arrive at the equation:

\[
\frac{x^2}{a^2\cos^2\nu} - \frac{y^2}{a^2\sin^2\nu} = \cosh^2\mu - \sinh^2\mu = 1. \tag{555}
\]
Note that these ellipses and hyperbolae are confocal – i.e. they have common foci located at $x = -a$ and $x = a$ on the $x$-axis.

II: Using the same approach as the example in parabolic coordinates, derive the orienting 1-forms $dl_\nu$ and $dl_\mu$. From these, work out the scale factors $h_\nu$ and $h_\mu$ for the elliptic coordinate system. Hint: You should get:

\[
h_\mu = h_\nu = a\sqrt{\sinh^2\mu + \sin^2\nu} = a\sqrt{\cosh^2\mu - \cos^2\nu}. \tag{556}
\]

III: Using the scale factors derived, compute the orienting area 2-form for elliptical coordinates:

\[
dA = |J|\,d\mu \wedge d\nu = h_\mu h_\nu\,d\mu \wedge d\nu. \tag{557}
\]

If you can, draw a diagram illustrating the infinitesimal area element (in a manner similar to what you did for the polar coordinates).

IV: For those of you who have studied Jacobian matrices, show that the determinant of the Jacobian matrix for the transformation $(x, y) \to (\mu, \nu)$ is given by:

\[
|J| = a^2\left(\sinh^2\mu + \sin^2\nu\right). \tag{558}
\]

In the next tutorial, we will see how these results can be used to perform line integrals (integrals along curves), surface integrals (integrals along surfaces) and integrals over arbitrary submanifolds of $\mathbb{R}^n$ (e.g. volume integrals in $\mathbb{R}^3$). In this manner, we will formalize the notion of the ‘orienting area 2-form’ and generalize it to give an ‘orienting volume $n$-form’.
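As a small preview of how the line element (537) gets used, here is a minimal numerical sketch that adds up $dl = \|d\gamma/dt\|\,dt$ along a curve to approximate its length. It is an illustration under stated assumptions, not the tutorial’s own method: the helper names `speed` and `arc_length`, the midpoint rule, the step counts and the two sample curves (a circle of radius 2 and one turn of a helix) are all choices made here for demonstration.

```python
import math

def speed(gamma, t, h=1e-5):
    """Norm of the tangent (velocity) vector at t, via a central difference."""
    ahead, behind = gamma(t + h), gamma(t - h)
    return math.sqrt(sum(((a - b) / (2 * h)) ** 2 for a, b in zip(ahead, behind)))

def arc_length(gamma, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of |d(gamma)/dt| dt over [a, b]."""
    dt = (b - a) / n
    return sum(speed(gamma, a + (k + 0.5) * dt) * dt for k in range(n))

def circle(t, radius=2.0):
    # A circle of radius 2 in the plane, parametrised by the angle t.
    return (radius * math.cos(t), radius * math.sin(t))

def helix(t):
    # One turn of a unit helix in R^3.
    return (math.cos(t), math.sin(t), t)

circle_length = arc_length(circle, 0.0, 2 * math.pi)
helix_length = arc_length(helix, 0.0, 2 * math.pi)

# Compare against the closed-form lengths 2*pi*radius and 2*pi*sqrt(2).
print(circle_length, 4 * math.pi)
print(helix_length, 2 * math.pi * math.sqrt(2))
```

For the circle the speed is just the polar scale factor evaluated at fixed radius, so the sum reduces to the familiar circumference formula.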
24 Tutorial 22: Line Integrals and Exterior Calculus

In the last tutorial, we reviewed the concept of tangent vector fields in their relation to the coordinate curves generating a coordinate system. In particular, we saw how simple geometric considerations could help us to understand how ‘scale factors’ arise in morphing infinitesimal length elements from one coordinate system to a new coordinate system. Such scaling functions appeared as the norms (magnitudes) of the tangent vector fields associated to the coordinate curves in the new coordinate system.

In particular, given a change of variables between Cartesian coordinates and some new orthogonal coordinate system, $(x, y, z) \to (u_1, u_2, u_3)$, our infinitesimal length elements change from $(dx, dy, dz)$ to $(h_1 du_1, h_2 du_2, h_3 du_3)$, where the functions $h_j$ ($j = 1, 2, 3$) are the ‘scale factors’ for the transformation. Such scaling factors turn infinitesimal changes in the coordinates $(du_1, du_2, du_3)$ into infinitesimal length changes $(dl_1 = h_1 du_1, \dots, dl_3 = h_3 du_3)$. As we saw, they can be computed as the norms of the tangent vector fields to the coordinate curves – viewing $u_1, u_2, u_3$ each as a curve parametrised by a single variable:

\[
h_j = \|\dot{u}_j\| = \sqrt{\left(\frac{\partial x}{\partial u_j}\right)^2 + \left(\frac{\partial y}{\partial u_j}\right)^2 + \left(\frac{\partial z}{\partial u_j}\right)^2}. \tag{559}
\]

Here we have represented the coordinate $u_j$ by the curve $\gamma_{u_j} = (x(u_j), y(u_j), z(u_j))$, which is parametrised by $u_j$.

In this tutorial, we will see how to make use of line-elements and tangent vector fields to perform integrations along smooth curves (so-called ‘line-integrals’). We will then set up some basic results from exterior calculus to give a natural extension of these ideas to surface integrals and integrals over arbitrary manifolds (given some atlas / set of coordinate charts).

24.1 Exterior Product and Derivatives

Recall that given the standard basis $(\partial_x, \partial_y, \partial_z)$ for $\mathbb{R}^3$ (unit vectors tangent to the $x$, $y$ and $z$ coordinate curves), one has the corresponding dual basis $(dx, dy, dz)$ consisting of dual vectors (differential 1-forms).
Since $(dx, dy, dz)$ is an orthonormal basis (with respect to the Euclidean metric), we can write any differential 1-form $\omega$ as:

\[
\omega = a\,dx + b\,dy + c\,dz, \tag{560}
\]
where $a, b, c$ are some unique set of coefficients (the components of $\omega$ in the basis $(dx, dy, dz)$). By their definition, the dual vectors obey the axioms of a vector space or ‘linear space’. For example, linearity:

\[
\lambda_1(a\,dx + b\,dy + c\,dz) = \lambda_1 a\,dx + \lambda_1 b\,dy + \lambda_1 c\,dz \tag{561}
\]

and

\[
(a_1 dx + b_1 dy + c_1 dz) + (a_2 dx + b_2 dy + c_2 dz) = (a_1 + a_2)dx + (b_1 + b_2)dy + (c_1 + c_2)dz, \tag{562}
\]

where $\lambda_1$, $a_j$, $b_j$ and $c_j$ are all scalars (constants).

Now recall that we defined the exterior (wedge) product $\wedge$ as a binary operation on the space of differential 1-forms. It satisfies the antisymmetry property:

\[
\omega \wedge \eta = -\eta \wedge \omega, \tag{563}
\]

as well as the (bi)linearity property:

\[
h\omega \wedge (f\eta + g\beta) = fh\,\omega \wedge \eta + gh\,\omega \wedge \beta, \tag{564}
\]

where $\omega, \eta, \beta$ are differential 1-forms and $f, g, h$ are functions. Note that the juxtaposition $fh$ denotes the multiplication of two functions – which is defined pointwise: $(fh)(r) = f(r)h(r)$ for $r = (x, y, z) \in \mathbb{R}^3$.

Given two differential 1-forms, $\omega$ and $\beta$, their exterior product, $\omega \wedge \beta$, is a differential 2-form. Under the usual ‘vector addition’, the space of differential 2-forms on $\mathbb{R}^3$ is also a linear space – meaning that the addition of differential 2-forms is a linear operation. A basis for this linear space is given by the following differential 2-forms:

\[
dx \wedge dy, \quad dy \wedge dz, \quad dz \wedge dx. \tag{565}
\]

Example 23 (A Sound Basis) To show that the previous basis is indeed a basis for the linear space of differential 2-forms in 3-dimensional Euclidean space, we note that the exterior product of two differential 1-forms is necessarily a differential 2-form (possibly zero). Therefore, if we take two arbitrary differential 1-forms, compute their exterior product and simplify the result, we should be left with:

\[
\text{Some Differential 2-form} = \text{Coefficients} \times \text{Some Basis Differential 2-form}. \tag{566}
\]
So in particular, using linearity, we note that the basis for differential 2-forms is necessarily generated by taking exterior products of the basis differential 1-forms. Looking at all possibilities, we have:

\[
dx \wedge dx = 0, \quad dx \wedge dy, \quad dx \wedge dz = -dz \wedge dx, \tag{567}
\]
\[
dy \wedge dx = -dx \wedge dy, \quad dy \wedge dy = 0, \quad dy \wedge dz, \tag{568}
\]
\[
dz \wedge dx, \quad dz \wedge dy = -dy \wedge dz, \quad dz \wedge dz = 0. \tag{569}
\]

Here we have made use of the anti-symmetry property of the exterior product, as well as its consequence: $\omega \wedge \omega = 0$ for any differential 1-form $\omega$. Thus, up to a $\pm$ sign (which we can discard), we are left with three unique possibilities:

\[
dx \wedge dy, \quad dy \wedge dz, \quad dz \wedge dx, \tag{570}
\]

as a basis for the linear space (denoted henceforth) $\Lambda^2(\mathbb{R}^3)$ of differential 2-forms on $\mathbb{R}^3$. This means that given some differential 2-form $\omega$, we can write it as:

\[
\omega = \omega_1\,dx \wedge dy + \omega_2\,dy \wedge dz + \omega_3\,dz \wedge dx, \tag{571}
\]

where $\omega_j$ are the components (functions – possibly constant) of $\omega$ in the standard basis for $\Lambda^2(\mathbb{R}^3)$.

Problem 36 (Supreme Commander) Once upon a midnight dreary, while he pondered weak and weary over the loss of his exterior bases in Supreme Commander, the Senior student-to-be, Zac Menschelli, nodded nearly napping. Suddenly there came a tapping, as of someone gently rapping, rapping on his chamber door – it turned out to be Georgie, the college raven. Having attended the mathematical sciences study group, Georgie tells Zac he needs to construct new bases at a new set of coordinates. He chooses spherical coordinates $(r, \theta, \phi)$, defined implicitly via:

\[
x = r\cos(\theta)\sin(\phi), \quad y = r\sin(\theta)\sin(\phi), \quad z = r\cos(\phi). \tag{572}
\]

Here $r$ is the radial coordinate ($0 \le r < \infty$), $\theta$ is the longitudinal angle ($0 \le \theta \le 2\pi$) and $\phi$ is the polar angle measured from the $z$-axis ($0 \le \phi \le \pi$).

I: Help Zac construct a basis for $\Lambda^1(\mathbb{R}^3)$, the space of differential 1-forms on $\mathbb{R}^3$, in spherical coordinates. Note that there are two ways you can do this.

1. Write down the ‘obvious’ basis.
2. Derive the basis, first by starting with the fact that $dx, dy, dz$ is a basis – then using the total differential formula (exterior derivative of a function):

\[
df(r, \theta, \phi) = \frac{\partial f}{\partial r}dr + \frac{\partial f}{\partial \theta}d\theta + \frac{\partial f}{\partial \phi}d\phi, \tag{573}
\]

to evaluate $dx$, $dy$ and $dz$ explicitly in terms of $r, \theta, \phi$ and $dr, d\theta, d\phi$.

II: Using a similar argument to the previous example, derive a basis for $\Lambda^2(\mathbb{R}^3)$, the space of differential 2-forms on $\mathbb{R}^3$, in spherical coordinates. Alternatively, start with the basis $\{dx \wedge dy, dy \wedge dz, dz \wedge dx\}$ and do a ‘change of variables’ – i.e. substitute in your expressions for $dx$, $dy$ and $dz$ in terms of spherical coordinates.

Challenge

III: Generalizing what you have done so far in what you think is the ‘most sensible’ way, write down a basis for the space of differential 3-forms on $\mathbb{R}^3$. In particular, what does this look like in Cartesian coordinates? What about spherical coordinates?

IV: Try to write a basis for the space of differential 4-forms on $\mathbb{R}^3$. If you have trouble doing this, try to construct a differential 4-form which is non-zero. What is the obstruction?

24.2 Orienting Volume Forms

Finally, it remains to introduce the last set of differential forms that exist over $\mathbb{R}^3$. In particular, if we take the exterior product of three differential 1-forms, or of a differential 2-form and a differential 1-form, we obtain a differential 3-form. Note that differential forms of higher degree on $\mathbb{R}^3$ are all necessarily zero. This is a consequence of the anti-symmetry property – meaning that each basis 1-form, $dx$, $dy$ and $dz$, can only appear once in a set of consecutive exterior products. To see this, note that if a differential 1-form $\omega$ appears more than once in a chain of exterior products, one can always permute the chain (possibly picking up a $\pm$ sign) so that it has the form “$\dots \wedge \omega \wedge \omega \wedge \dots$”, with $\omega \wedge \omega = 0$ collapsing the whole chain.
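The collapsing argument above can be turned into a small bookkeeping exercise. The following sketch is only an illustration (the names `normalise` and `surviving_forms` and the index encoding 0, 1, 2 for $dx, dy, dz$ are assumptions made here): it takes every chain of $p$ basis 1-forms, discards those with a repeated factor, sorts the rest while tracking the sign picked up by each swap, and records which distinct products survive.

```python
from itertools import product

BASIS = ("dx", "dy", "dz")

def normalise(indices):
    """Sort a chain of basis 1-forms, tracking the sign from each swap.

    Returns (sign, sorted_indices), or (0, ()) if a factor repeats,
    since w ^ w = 0 collapses the whole chain.
    """
    if len(set(indices)) < len(indices):
        return 0, ()
    idx = list(indices)
    sign = 1
    # Bubble sort: every swap of adjacent factors flips the sign.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def surviving_forms(p):
    """Distinct non-zero wedge products of p basis 1-forms, up to sign."""
    found = set()
    for indices in product(range(3), repeat=p):
        sign, key = normalise(indices)
        if sign != 0:
            found.add(key)
    return sorted(found)

for p in range(1, 5):
    names = [" ^ ".join(BASIS[i] for i in key) for key in surviving_forms(p)]
    print(p, len(names), names)
```

The sketch lists each survivor in sorted order (so it reports `dx ^ dz` rather than the cyclic choice $dz \wedge dx$ used in the text); the two differ only by a sign, which does not affect the count of independent basis elements.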
Similar to the construction of the basis for $\Lambda^2(\mathbb{R}^3)$, we can construct a basis for $\Lambda^3(\mathbb{R}^3)$ by seeing what survives when we take all possible combinations of exterior products of the bases for $\Lambda^2(\mathbb{R}^3)$ and $\Lambda^1(\mathbb{R}^3)$. If you did this correctly in the previous problem, you will have found that the only surviving differential 3-form is:

\[
dx \wedge dy \wedge dz, \tag{574}
\]
(up to a $\pm$ sign or some permutation of $x, y, z$). Note that so far, we have always chosen the cyclic convention $x \to y \to z \to x$ when ordering our differential 1-forms – this is intentional, as it corresponds to the choice of a ‘right-handed orientation’ on our vector space, $\mathbb{R}^3$. Such an orientation is the ‘standard orientation’ for 3-dimensional Euclidean space. In particular, for $\mathbb{R}^3$ we have that:

\[
\star 1 = dV = dx \wedge dy \wedge dz, \tag{575}
\]

is the ‘orienting volume 3-form’. Through the order of the differential 1-forms $dx, dy, dz$, it defines an orientation on $\mathbb{R}^3$. Furthermore, geometrically, one may think of this as representing an infinitesimal cube (or parallelepiped) with sides of length $dx$, $dy$ and $dz$ – giving it dimensions of ‘volume’ ($\text{Length}^3$).

In general, for an $n$-dimensional inner-product space (a vector space equipped with an inner-product or metric tensor) such as $\mathbb{R}^n$, one can equip it with an orienting volume $n$-form. This gives the space an orientation, as well as a way to measure ‘volumes’ – the orienting volume form appears when performing volume integrals on the space. It also acts as a basis for forms of the highest degree on that space – any differential $n$-form must be some scalar multiple of it.

Now that we have established the idea that differential $p$-forms (where $p$ is some integer) behave like ‘abstract vectors’ in a linear space (under ‘addition’), we can explore the relation between exterior products, differential forms and the vector calculus we already know and love.

Exercise 76 (Choice of Orientation) When questioned about her Orientation during ‘true colours’ week, a spherical coordinate system is caught off guard. Realizing that she has lost her orienting volume 3-form, she decides to construct a new one.

I: If you are (left)right-handed, construct a (left)right-handed orienting volume form in spherical coordinates. [If you are genuinely ambidextrous, construct two oppositely oriented coordinate systems with each hand, simultaneously (writing with two pens at once)].
You can do this by computing $dx, dy, dz$ in terms of spherical coordinates $(r, \theta, \phi)$ and the spherical basis 1-forms $(dr, d\theta, d\phi)$, then substituting your results into the expression $dV = dx \wedge dy \wedge dz$ (for the right-handed system). Alternatively, for the left-handed system, use $dV_{\text{Lefty}} = -dx \wedge dy \wedge dz$.

II: Now, use your results from Tutorial 21 to construct the orienting volume 3-form via the expression:

\[
dV = h_r\,dr \wedge h_\theta\,d\theta \wedge h_\phi\,d\phi = |J|\,dr \wedge d\theta \wedge d\phi. \tag{576}
\]
This requires knowing / deriving the scale factors $h_j$, or the determinant $|J|$ of the Jacobian matrix. If you did your math correctly, you should see that this is precisely the same as the volume-form you constructed in part I.

24.3 Duality and Orthogonality

By now, you should have noticed that the exterior product between differential forms behaves very similarly to the cross-product (vector product) between vectors. In particular, they are both ‘antisymmetric’ and ‘bilinear’ operations. However, there are some fundamental differences. The first obvious difference is that the exterior product obeys the associative law:

\[
(\omega \wedge \beta) \wedge \eta = \omega \wedge (\beta \wedge \eta), \tag{577}
\]

whereas the cross-product is not associative – in general,

\[
(u \times v) \times w \neq u \times (v \times w). \tag{578}
\]

Exercise 77 (Counterstrike) During an evening sesh of counterstrike, Steven Meek decides to take out his rage over his 1ms lack of reaction time (leading to his death), by constructing a counter-example to the (incorrect) statement that ‘the cross-product is associative’. Not wanting to be outdone by Big Dog, can you also construct a counter-example?

The second major difference is that the cross-product maps two vectors to another vector – an output which is the same ‘type’ of object as the input. The exterior product of two differential 1-forms, however, takes two differential 1-forms and turns them into a differential 2-form – hence the output is a different ‘type’ of object to the input. Yet, as the following exercise should illustrate, the coefficients / components that appear in both the cross-product and exterior product are ‘similar’[154] ...

[154] Spoiler Warning: Identical.

Exercise 78

I: Given the vectors $v = a_1\partial_x + b_1\partial_y + c_1\partial_z$ and $u = a_2\partial_x + b_2\partial_y + c_2\partial_z$, compute their cross product and simplify the result:

\[
v \times u = (\dots)\partial_x + (\dots)\partial_y + (\dots)\partial_z. \tag{579}
\]
II: Now, turn these vectors into dual vectors (differential 1-forms) by replacing the standard basis vectors with the basis differential 1-forms: $v = a_1 dx + b_1 dy + c_1 dz$, $u = a_2 dx + b_2 dy + c_2 dz$. Compute the exterior product between these 1-forms:

\[
v \wedge u = (\dots)\,dx \wedge dy + (\dots)\,dy \wedge dz + (\dots)\,dz \wedge dx. \tag{580}
\]

III: Compare the coefficients of the standard basis vectors appearing in your cross-product to the coefficients of the standard basis 2-forms in your exterior product. What do you notice?

In the last exercise, you should find that the components (coefficients) appearing in the cross and exterior products are exactly the same. Why is this the case? To make sense of this, note that the cross product $v \times u$ produces a new vector which is orthogonal to both $v$ and $u$ – its magnitude is equal to the area of the parallelogram spanned by $v$ and $u$. However, those of you who remember your rules of dimensional analysis will quickly point out that if your vectors are not dimensionless (e.g. physical vectors that have length), then their cross product produces a vector which has different units to the input vectors!

Now comes a strong peculiarity. If we ‘reflect’ the input vectors $v, u$ about some plane, then take their cross-product, the resulting vector is different to the one we obtain by first taking the cross-product, then reflecting the result $v \times u$! Therefore, the vector produced by a cross-product is not preserved by reflections – even though it is preserved by rotations. This means it is not a ‘true’ vector in the geometric sense – it is in fact a pseudo-vector.

Making the same considerations with the exterior product, one should quickly see that the exterior product of two differential 1-forms does not suffer the same technical peculiarities as the cross-product of two vectors.
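Both observations above can be checked on a concrete pair of vectors. The sketch below is an illustration with assumed names (`cross`, `wedge`, `reflect_x`) and arbitrarily chosen sample vectors: `wedge` expands the product of two 1-forms term by term using bilinearity and antisymmetry, collecting coefficients on the cyclic basis $dy \wedge dz$, $dz \wedge dx$, $dx \wedge dy$, and the reflection test compares "reflect then cross" with "cross then reflect" for the plane $x = 0$.

```python
# Cyclic basis 2-forms, keyed by the index pair (0 = x, 1 = y, 2 = z).
CYCLIC = {(1, 2): "dy^dz", (2, 0): "dz^dx", (0, 1): "dx^dy"}

def cross(v, u):
    """Cross product of two vectors given as (x, y, z) components."""
    return (v[1] * u[2] - v[2] * u[1],
            v[2] * u[0] - v[0] * u[2],
            v[0] * u[1] - v[1] * u[0])

def wedge(v, u):
    """Coefficients of v ^ u for 1-forms v, u given as (dx, dy, dz) components."""
    coeffs = {name: 0 for name in CYCLIC.values()}
    for i in range(3):
        for j in range(3):
            if i == j:
                continue                       # dx_i ^ dx_i = 0
            if (i, j) in CYCLIC:
                coeffs[CYCLIC[(i, j)]] += v[i] * u[j]
            else:
                # Antisymmetry: dx_i ^ dx_j = -(dx_j ^ dx_i).
                coeffs[CYCLIC[(j, i)]] -= v[i] * u[j]
    return coeffs

def reflect_x(v):
    """Reflect a vector in the plane x = 0."""
    return (-v[0], v[1], v[2])

v, u = (1, 2, 3), (4, 5, 6)                    # arbitrary sample components
cross_vu = cross(v, u)
wedge_vu = wedge(v, u)
cross_then_reflect = reflect_x(cross(v, u))
reflect_then_cross = cross(reflect_x(v), reflect_x(u))

print(cross_vu, wedge_vu)
print(cross_then_reflect, reflect_then_cross)
```

Reading the output, the coefficient of $\partial_x$ lines up with that of $dy \wedge dz$, $\partial_y$ with $dz \wedge dx$ and $\partial_z$ with $dx \wedge dy$, while the two reflected results differ by an overall sign, which is the pseudo-vector behaviour described in the text.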
Problem 37 (Gedanken) By drawing a diagram and considering the reflections described above, illustrate the fact that the cross-product of two vectors is a pseudo-vector and not a true vector. Now show geometrically that the exterior product does not suffer from this. Note that to this extent, you can represent a differential 2-form, for example $dx \wedge dy$, as either the x-y plane or an infinitesimal parallelogram with sides $dx$ and $dy$.

The discrepancy between cross and exterior products is resolved with the mighty Hodge dual operator, which we denote by $\ast$ – also known as the ‘Hodge star’. Conceptually speaking, the problem with the cross-product comes from the fact that lines and planes are ‘dual’ in 3-dimensions. In particular, note that we can
represent a plane by two linearly-independent vectors tangent to that plane (a basis for the 2-dimensional vector space described by that plane if it contains the origin 0), or we can equivalently describe it by a vector (or line) normal to the plane – i.e. a vector orthogonal to the two vectors tangent to the plane. Hence, in essence, the information which encodes that an object is a ‘plane’ may come as a pair of linearly-independent tangent vectors, or as a single normal vector! Since a normal vector can be used to generate a normal line, this shows that lines and planes are in some sense ‘dual’ to each other in 3-dimensions.

Such a duality is formalized with the ‘Hodge dual’ operator $\ast$. In general, for differential forms on $\mathbb{R}^n$, the Hodge dual turns a differential $p$-form $\omega$ into a differential $(n-p)$-form, $\ast\omega$. Thus in particular, for $n = 3$, one sees that the standard Cartesian basis differential 2-forms (representing planes) are dual to the basis differential 1-forms (representing lines).

Exercise 79 (Lines and Planes in 3-dimensions (BFFs)) The Hodge dual $\ast\omega$ of a differential form $\omega$ is defined implicitly via the relation:

$$\omega \wedge \ast\omega = \ast 1, \tag{581}$$

where $\ast 1 = dV = dx \wedge dy \wedge dz$ is the orienting volume 3-form on $\mathbb{R}^3$. To the best of your ability, use this definition to compute the following Hodge duals:

1. $\ast dx$
2. $\ast dy$
3. $\ast dz$
4. $\ast(dx \wedge dy)$
5. $\ast(dy \wedge dz)$
6. $\ast(dz \wedge dx)$
7. $\ast(dx \wedge dy \wedge dz)$
8. $\ast 1$

What do you notice about the relation between differential 1-forms and 2-forms? What about differential 3-forms and 0-forms (constants / functions such as ‘1’)?

If you have reached this point, try completing the same exercise except with Spherical Coordinates. Now try this for Polar Cylindrical Coordinates.
Now try this exercise for Ellipsoidal Coordinates. Now try this for Paraboloidal Coordinates.

In the next tutorial, we will see how to exploit these ‘abstractions’ of vector calculus to our advantage in the computation of contour, surface and volume integrals.
25 Tutorial 23: Serendipity and Integration

Problem 38 (Conceptual Puzzle) Consider a borderless (infinite) pool table 155. Placing the white ball in its starting location, is it possible to shoot the other balls such that the black ball ends up exactly where the white ball started? Note the following restrictions and assumptions.

• The pool table is frictionless, so momentum is conserved.
• You are restricted to 2-dimensional motion (no bouncing over other balls).
• You can ignore spin and any sources of energy loss – hence all momentum changes are linear.

If your answer is ‘yes’, then you need to provide a geometric arrangement that solves this problem. If your answer is ‘no’, then you need to provide physical principles for why it is not possible.

25.1 Introduction

Oftentimes in the process of scientific research, one may, in the quest to solve one particular problem, discover or invent something new. This discovery may or may not be related to the initial problem one was trying to solve, but it is important and interesting in its own right. Such a process is known as ‘serendipity’ and is common to all pursuits of exploration and knowledge.

In this tutorial, we will investigate some easy-to-understand consequences of research that has arisen in attempts to better the mathematical structure of Quantum Field Theory. In particular, we will look at some new analytical integration techniques that have been developed by Achim Kempf 156, David M Jackson and Alejandro H Morales in their attempts to mathematically formalise ‘path integrals’ in quantum field theory 157.

155 Billiards Table.
156 The ideas for this tutorial arose from personal conversation with Achim, to whom we are grateful.
157 Original paper can be found here: http://iopscience.iop.org/1751-8121/47/41/415204/pdf/1751-8121_47_41_415204.pdf.
25.2 Differentigration

Differentiation is, in general, an easy algorithmic process. Expressions may get tough and unwieldy, but at the end of the day, one can usually follow a set of straightforward rules to arrive at an answer. As such, it is relatively easy to program a computer to differentiate 158. Integration, on the other hand, is much less straightforward. Indeed, there are many numerical methods devoted to integration for this reason! Therefore, it may come as a surprise to know that you can turn the process of integration into one of differentiation. This sounds great! Of course, such surprises always come with some caveats and limitations. As it turns out, this ‘trick’ only works for ‘analytic functions’ – that is, functions which have a convergent power series representation.

Theorem 2 Given a function $f : \mathbb{R} \to \mathbb{R}$ which has a convergent series expansion, the following representation for its integral holds:

$$\int_0^x f(x')\,dx' = f(\partial_y)\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0}. \tag{582}$$

The expression $f(\partial_y)$ is the function $f(x)$ with the variable $x$ replaced by the partial derivative operator, $\partial_y := \frac{\partial}{\partial y}$. For non-polynomial functions, we can evaluate the right-hand side by expanding $f(x)$ as a power series (e.g. Taylor series) and replacing the argument $x$ with $\partial_y$. The resulting series of differential operators acts on everything to the right, $\left(\frac{e^{xy}-1}{y}\right)$, which we then evaluate at $y = 0$ (after differentiating).

This may seem an odd way to integrate, but perhaps not so odd when you look at it as a consequence of a more general identity arising from consideration of Fourier and Laplace transforms. To get some intuition, we proceed with an example.

Example 24 (The Immeasurable Man) Having invented a time-travel machine to avoid capture by the Roman army besieging Syracuse, the great geometer Archimedes travels to St. George’s College for help. In particular, Archimedes finds that he cannot integrate with the Roman culture – nor can he integrate functions on the real line.
However, he does understand preliminary concepts of differentiation – meaning, one can teach him to evaluate the following integral 159:

$$\int_0^x z\,dz. \tag{583}$$

Using Achim’s tricks, we have the following identity:

$$\int_0^x z\,dz = f(\partial_y)\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0}, \tag{584}$$

where $f(z) = z$. Thus, $f(\partial_y) = \partial_y = \frac{\partial}{\partial y}$. Collecting these statements we have

$$\begin{aligned}
\int_0^x z\,dz &= f(\partial_y)\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0} \\
&= \left.\frac{e^{xy}(xy - 1) + 1}{y^2}\right|_{y=0} \\
&= \lim_{y \to 0} \frac{e^{xy}(xy - 1) + 1}{y^2} \\
&= \left.\frac{x e^{xy}(xy - 1) + x e^{xy}}{2y}\right|_{y=0} \quad \text{(L'Hopital's Rule)} \\
&= \left.\frac{x^2 e^{xy}}{2}\right|_{y=0} = \frac{x^2}{2}.
\end{aligned} \tag{585}$$

Indeed, this is the same answer that one would obtain via Riemann integration.

You may wonder what benefit such a technique offers, given the effort required to evaluate an integral as simple as $\int_0^x z\,dz$. The answer is two-fold. The first reason is that an extension of this trick may be used to evaluate ‘improper integrals’ and/or ‘contour integrals’. Contour integrals require significant finesse in order to choose the correct integration paths (contours) so as to make use of the pole structure of $f(z)$ and results from residue calculus (Cauchy’s theorem).

158 An exception is when dealing with ‘special functions’, which may or may not have a series representation.
159 Archimedes could do this geometrically anyway.

Exercise 80 (Bored Beyond Measure) Unfortunately, Archimedes found the last example boring. This is because he already knew how to compute the area of a triangle. To make things more interesting, we now show him how to compute area
below a parabola (or equivalently, bounded by a parabola) using our integration trick. To this extent, compute the following integral:

$$\int_0^x f(z)\,dz = f(\partial_y)\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0}, \tag{586}$$

where $f(z) = z^2$. This is the area under the parabola bounded by the horizontal axis, $z = 0$ and $z = x$.

Hints: Note that you will have to differentiate twice (using the product rule). After differentiating, you will have to use L’Hopital’s (Bernoulli’s) rule to evaluate the limit $y \to 0$. In fact, you will have to use L’Hopital’s rule three times. Alternatively, you can play around with series expansions and limit identities if you don’t like L’Hopital’s rule.

At this point, the integrals performed so far may seem relatively trivial – after all, we have only integrated polynomials. Consider now the following example, which makes use of Taylor series!

Example 25 (Tailored Functions) Having travelled to Singapore to get a cheap, good quality tailored suit, Archimedes now travels back to St. George’s College to get a Taylor series for the exponential function. He asks William Cheng to provide such a series. With some probability $p$, $0 \le p < 1$, William provides the following (correct) Taylor series for the exponential centred around $x = 0$:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \tag{587}$$

Archimedes wishes to use this to derive an expression for:

$$\int_0^x f(z)\,dz \tag{588}$$

where $f(z) = e^{az}$ and $a$ is some constant. Using our trick, we have:

$$\int_0^x f(z)\,dz = f(\partial_y)\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0} = e^{a\partial_y}\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0}, \tag{589}$$

which requires expanding the differential operator, $e^{a\partial_y}$, as a Taylor series then acting it on everything to the right of it. Rather than getting into a lot of mess,
recall the following definition for the Taylor series of a function $g(y)$ centred at $y = a$:

$$g(y) = \sum_{n=0}^{\infty} \left.\left(\frac{d^n}{dy^n} g(y)\right)\right|_{y=a} \frac{(y - a)^n}{n!}, \tag{590}$$

where $g^{(n)}(a) = \left.\left(\frac{d^n}{dy^n} g(y)\right)\right|_{y=a}$ are called the ‘Taylor coefficients’ for the Taylor series of $g(y)$ centred at $a$. Therefore, the action of the differential operator $e^{a\partial_y}$ on some function $g$, evaluated at $y$, is given by:

$$\left. e^{a\partial_y} g(y)\right|_{y} = \sum_{n=0}^{\infty} \frac{a^n}{n!} \left.\frac{\partial^n}{\partial y^n} g(y)\right|_{y} = \sum_{n=0}^{\infty} \left.\left(\frac{\partial^n g(y)}{\partial y^n}\right)\right|_{y} \frac{a^n}{n!}. \tag{591}$$

Comparing this to the Taylor series of $g(y + a)$ centred at $y$, we have:

$$g(y + a) = \sum_{n=0}^{\infty} \left.\left(\frac{\partial^n g(y)}{\partial y^n}\right)\right|_{y} \frac{(y + a - y)^n}{n!}, \tag{592}$$

which is identical! Therefore, in general, we see that:

$$\left. e^{a\partial_y} g(y)\right|_{y} = g(y + a). \tag{593}$$

Therefore, we see that the differential operator $e^{a\partial_y}|_y$ acts on an arbitrary differentiable function $g(y)$ to translate its argument by $a$.

Aside: Recalling our earlier study of Lie groups and Lie algebras, this is because the operator $\partial_y$ is a basis element in the Lie algebra of translations. Therefore, taking its exponential gives a corresponding Lie group element, $e^{a\partial_y}$, which is a member of the Lie group of translations (a symmetry group). Note that these Lie groups and algebras act on the ‘ring of smooth functions’ on $\mathbb{R}$, as opposed to a vector space.

For practical purposes however, it suffices to remember that

$$e^{a\partial_y} g(y) = g(y + a). \tag{594}$$
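For a polynomial $g$ the operator series in (591) terminates, so the translation identity (594) can be checked with exact arithmetic. The following is a small sketch under that assumption; the helper names (`derivative`, `evaluate`, `translate`) and the sample polynomial are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs):
    """Coefficients of g'(y), where g(y) = sum_n coeffs[n] * y**n."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def evaluate(coeffs, y):
    """Evaluate the polynomial with the given coefficients at y."""
    return sum(c * y**n for n, c in enumerate(coeffs))


def translate(coeffs, a, y):
    """Apply exp(a d/dy) to a polynomial g and evaluate at y.

    Implements sum_n a**n / n! * g^(n)(y); the sum stops once the
    repeated derivative of the polynomial has no coefficients left.
    """
    total = Fraction(0)
    current = list(coeffs)
    n = 0
    while current:
        total += Fraction(a) ** n / factorial(n) * evaluate(current, Fraction(y))
        current = derivative(current)
        n += 1
    return total


# Illustrative polynomial g(y) = 1 - 2y + y^3 (an assumption, not from the text).
g = [1, -2, 0, 1]
lhs = translate(g, 2, Fraction(1, 2))    # exp(2 d/dy) g evaluated at y = 1/2
rhs = evaluate(g, Fraction(5, 2))        # g(1/2 + 2)
print(lhs == rhs)
```

Exact fractions are used so that the comparison is an equality rather than a floating-point approximation.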
Applying this to our integral, we see that

$$\begin{aligned}
\int_0^x e^{az}\,dz &= e^{a\partial_y}\left.\left(\frac{e^{xy} - 1}{y}\right)\right|_{y=0} \\
&= \left.\left(\frac{e^{x(y+a)} - 1}{y + a}\right)\right|_{y=0} \\
&= \frac{e^{ax} - 1}{a},
\end{aligned} \tag{595}$$

as expected!

Problem 39 (Sinus Problems) Having not adapted to the pollen in Western Australia, Archimedes develops intense hayfever and sinus problems over spring. After some inspiration, Archimedes develops a cure for his hayfever through mathematical biology. However, in this process, he needs to evaluate the integral:

$$\int_0^x \cos(az)\,dz, \tag{596}$$

using our differentiation-integration trick. Help Archimedes modify his gene expression by solving this problem. Hint: Use the fact that $\cos(az) = \frac{e^{iaz} + e^{-iaz}}{2}$ in conjunction with our previous identity, $e^{a\partial_y} g(y) = g(y + a)$, to evaluate the action of the differential operator $\cos(a\partial_y)|_{y=0}$ on $\frac{e^{xy} - 1}{y}$.

Exercise 81 (Differentiation by Parts) Integration ‘by parts’ is a trick that essentially relies on two things:

• The product (Leibniz) rule for differentiation.
• The fundamental theorem of calculus (or Generalized Stokes’ Theorem).

In particular, for two functions $f$, $g$, we have the mnemonic:

$$\int f\,dg = fg\,| - \int g\,df, \tag{597}$$

where $|$ specifies the integration bounds. This comes from:

$$\int d(fg) = fg\,|, \tag{598}$$
then expanding the left-hand side via the product rule (note that $d$ is the exterior derivative). Some integrals such as:

$$\int_0^x x^2 e^{ax}\,dx, \tag{599}$$

can therefore be solved recursively, via integration by parts.

I: Solve the aforementioned integral using integration by parts.

II: Solve this integral using Achim’s differentiation-integration trick, which we have studied so far.

25.3 Quantum Field Theory Aside

The second reason that we may consider our integration ‘trick’ is on theoretical grounds. In particular, recall Young’s ‘double slit’ experiment. If we shoot an electron at a plate with a single slit, followed by a fluorescent screen, we will observe a single dot on the screen where the electron hits. In this set-up, the electron behaves like a classical particle. Repeating this experiment for electrons fired with the same kinetic energy, we will build up a uniform pattern on the fluorescent screen.

Now, if we replace our single slit with a narrowly separated double slit and repeat our experiment, instead of building up a uniform distribution on the screen, we will observe an interference pattern. In particular, there will be areas of minimum intensity and areas of maximum intensity – something we would expect if the electron was a ‘wave’. There is nothing spooky about this result; in fact, it is simply a concrete illustration of the ‘matter-wave’ duality of particles as explained by quantum mechanics.

It appears then, that if we choose to measure the electron as a particle, we will observe particle behaviour. If we choose to observe it as a wave, then we see wavelike behaviour – in essence, this choice is made by having a ‘double-slit’ or ‘single-slit’. Using the interference pattern, we can construct a ‘probability density’ for the electron striking the screen and associate its path through each slit with some probability amplitude.
If we now consider what happens when we have two double slits in succession, then we have four choices of paths for the electron to hit the screen. We multiply successive probabilities to get the probability of each path taken.

Now, consider as Feynman did, if we have infinitely many successive double slits. More so, not just a countable infinity, but rather, one double slit at each point in
space – an uncountable infinity of double slits. In this manner, we see that there are an uncountable number of paths for the electron to travel – and an uncountable number of probabilities to multiply. To deal with this sort of ‘continuous infinity’, we have a familiar tool – integration!

As it turns out, the above thought experiment leads to a ‘path integral formulation’ of quantum mechanics. In this formulation, we don’t view electrons as waves – rather, we treat them as particles and obtain probabilities by doing an uncountable infinity of integrations. In particular, we perform an integral at every point along the path the particle travels in spacetime, then consider every possible path the particle can take in spacetime. Although such an approach gives results that agree with traditional quantum mechanics, it turns out that this ‘path integral’ is not a well-defined mathematical quantity.

This problem of not well-defined path integrals persists in the next stage of theoretical physics – quantum field theory. Despite many attempts by generations of mathematicians and physicists, the path integral is still not a well-defined or well-understood mathematical quantity. On the other hand, it yields results which can be experimentally measured to extreme accuracy. Therefore, by turning integration into differentiation – using tricks such as the one outlined earlier – one arrives at some interesting possibilities for constructing a well-defined path integral.

25.4 Generalizations

The results covered so far can be easily generalized to arbitrary finite intervals. In particular, note that:

$$\int_a^b f(x)\,dx = \int_0^{b-a} f(x + a)\,dx. \tag{600}$$

In this manner, we can change our lower integration limit from 0 to any finite real number. Alternatively, we have the following identity (also contained in Achim’s paper):

$$\int_a^b f(x)\,dx = f(-i\partial_x)\left.\frac{e^{ibx} - e^{iax}}{ix}\right|_{x=0}, \tag{601}$$

where $f(-i\partial_x)$ is the function $f(x)$ with its argument $x$ replaced by the differential operator $-i\partial_x = -i\frac{\partial}{\partial x}$.
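For polynomial integrands, the shift identity (600) combined with Theorem 2 can be sketched in a few lines of exact arithmetic. This is only an illustration under stated assumptions: the function names are hypothetical, and the code uses the fact that the kernel $(e^{xy}-1)/y$ has the Taylor series $\sum_k x^{k+1} y^k/(k+1)!$ in $y$, so its $n$-th $y$-derivative at $y = 0$ is $n!\,x^{n+1}/(n+1)!$.

```python
from fractions import Fraction
from math import comb, factorial


def kernel_derivative_at_zero(n, x):
    """n-th y-derivative of (exp(x*y) - 1)/y at y = 0.

    Read off from the kernel's Taylor series sum_k x**(k+1) * y**k / (k+1)!.
    """
    return Fraction(factorial(n), factorial(n + 1)) * Fraction(x) ** (n + 1)


def integrate_0_to_x(coeffs, x):
    """Apply f(d/dy) to the kernel at y = 0, for f(z) = sum_n coeffs[n] * z**n."""
    return sum(Fraction(c) * kernel_derivative_at_zero(n, x)
               for n, c in enumerate(coeffs))


def shift(coeffs, a):
    """Coefficients of g(x) = f(x + a), obtained by binomial expansion."""
    out = [Fraction(0)] * len(coeffs)
    for n, c in enumerate(coeffs):
        for k in range(n + 1):
            out[k] += Fraction(c) * comb(n, k) * Fraction(a) ** (n - k)
    return out


def integrate_a_to_b(coeffs, a, b):
    """Shift the integrand as in (600), then integrate from 0 to b - a."""
    return integrate_0_to_x(shift(coeffs, a), Fraction(b) - Fraction(a))


# Illustrative integrand f(z) = z^3 on [1, 2] (an assumed example).
print(integrate_a_to_b([0, 0, 0, 1], 1, 2))
```

The result can be compared against the antiderivative computed by hand, which is a quick way to build confidence before trying the complex-exponential form (601).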
For Fourier transforms (integrated over the whole real line), the following identity is useful:

$$\int_{-\infty}^{\infty} g(x)\,dx = 2\pi\, g(-i\partial_x)\,\delta(x)\big|_{x=0}, \tag{602}$$

where $\delta(x)$ is the Dirac delta distribution, centred at 0.

For Laplace transforms (integrated from zero to infinity), the following consequence is useful:

$$\int_0^{\infty} f(x)\,dx = 2\pi\, f(-i\partial_x)\,H(-i\partial_x)\,\delta(x)\big|_{x=0}, \tag{603}$$

where $H$ is the Heaviside distribution. As a function, $H(x) = 0$ for $x < 0$ and $H(x) = 1$ for $x \ge 0$.

Try experimenting with these generalizations of our initial identity to evaluate integrals that you already know how to perform using standard calculus rules. This should help you build some confidence and intuition with these techniques. It will also stop you from getting bored over summer!

Once you are confident that you have a handle on these integration tricks, you may apply them to harder integrals which you may not know how to solve otherwise. In particular, for those of you who use Fourier and Laplace transforms, or perform contour integrals in complex analysis, you should find that these tricks may simplify some of your problems.
26 2015 Academic Program Suggestions

26.1 Tutoring

In response to David Platt’s request for thoughts on the structuring of the academic program provided by St. George’s College, I have the following comments.

• The nature of Remedial Help
For people requiring coursework help, David has suggested that students should be encouraged / expected to offer academic assistance to other students, as part of ‘college spirit’. He suggested this means getting rid of paid tuition for coursework help. Such views represent an ideal academic college system, and indeed, they could be implemented if these expectations were made at the very start of the year. There are, however, some obvious obstacles and short-comings which will need to be addressed.

• The first issue is that the necessary expertise for academic assistance is limited to the students in each relevant discipline. Of these students, it may be fair to say that most of them would be willing to help another student as part of the college camaraderie. However, each student in a position capable of providing quality assistance will be limited by their own commitments and available time. Hence the assistance they provide may not match the time needed by the student requesting it. This is especially true in the case of students who are struggling and need several hours of dedicated 1-on-1 assistance. The student needing assistance then has two in-college options. One is to seek assistance / time from another student and the other is to wait until the original student is free again.
If there are enough skilled students who are able and willing to provide the hours / week required by students requiring assistance, then with appropriate communication (e.g. a college facebook study-group for each academic discipline), this idealized academic support system may work at college. In practice, a functional academic support system for remedial help would have to be a mix of both worlds.
If the college does not provide remedial help at a dedicated professional level, then there is less incentive for students to come to college, since they could just as easily make friends in class and ask assistance from them. In essence, there should be at least one dedicated tutor for each discipline who is sufficiently skilled. This solves the scenario in which a student requires help, but is unable to obtain in-college assistance from other students
in their required time-frame. This also protects the system against the circumstances where capable students are unwilling to assist certain students for personal/social reasons.

• To protect such a system from the obvious problem of students bypassing the student community help to get help from a dedicated tutor, it should be outlined in each study group (as well as at the start of year and throughout semester) that students seeking remedial help must first seek help from the relevant student body at the college. This could work by posting on the study group page. In the situation that there is evidence that the student has sought help from other students, but was unsuccessful – either due to lack of availability or expertise – then the student should be able to request help from a dedicated tutor. Perhaps there is also an explicit and reasonable expectation for the dedicated tutors to provide some academic support ‘off-the-books’, in the same manner that other students have that expectation upon themselves.
The key to prevent the hybrid system from falling back to the old system (where there are too many remedial tutors) is to encourage the psychology of community help at college. I believe that creating discipline-specific study / social groups (separate from formal, extra-curricular tutorials), which students of each discipline must join, will go a long way to help create and foster the academic environment envisioned by David Platt and Michael Champion. If successful, a hybrid system has the potential to both support a collective academic team environment as well as possessing some of the professionalism and fall-back structures that a traditional college should provide – for example, to cater for critical periods in semester (such as the exam periods when everyone is very busy).

We need the following implementation:

1. A facebook group for each subject discipline that students should / must enrol in at the start of the year (separate to tutorial groups). Along with this is the explicit expectation that they help other students academically when they can (new students won’t mind this if it’s introduced from the beginning).
2. Smaller quantity of tutors, but more quality. Getting a HD in a subject is probably not sufficient (in general) for someone to be a dedicated tutor in that discipline. The tutor should exhibit some consistent high performance in the discipline and some level of mastery.
3. Senior tutors, such as Claire and Raymond, should have more input / say in the direction of the academic program at college.
26.2 Mathematical Sciences Tutorial Plan

1. Mathematics of the GPS System: 3-4 tutorials. Requires elementary notions of non-Euclidean geometry, triangulation, consequences of general relativity, and error analysis (differential error). Combine this with tute on metric spaces.
2. Mathematics of Space Travel: 4-5 tutorials. Include two tutes on conic sections and the Kepler orbits. Include a tute on hyperbolic trigonometry / geometry, special relativity. Include tute on Alcubierre metric, warp drive and optimisation.
3. Integration Techniques and Applications: 3-4 tutorials. Take the Kempf integration tutorial as the last, then add two or three new tutes. Include one tute on the Gamma function, then another tute on other special functions.
4. Fourier Analysis and Spectral theory: 3-4 tutorials. This will include theory, some basic programming and experimental investigation of instruments / the wolf-note. We may also compare music tracks and voices in this manner.
5. Dimensional Analysis tutes from 2013: 3 tutorials.
6. Error Analysis tutes from 2013: 3-4 tutorials.
7. Re-hash of the Lie Groups / Lie Algebras: 3-4 tutorials. Second semester / end of year.
8. Differential Equations tutorials: 3-4 tutorials from 2013 (add new one on non-linear DEs).

Additionally, need the following ground-rules / changes:

1. Leon’s ‘mathematics is a mountain’, input-output type motivational talk.
2. Bring healthy snacks to tutes + chocolate to encourage people.
3. Use white-board / experimental apparatus at start of tute to motivate them.
4. Feedback / grade, informal qualification.
5. Rebrand: Mathematical Sciences Exploration Group.
27 Miscellaneous

A section of notes for topics that individuals have requested. Disclaimer: this is written from memory and pen-paper calculations.

27.1 Lagrangian Mechanics

27.1.1 Background

After a while, one begins to realise that using Newton’s laws to solve problems in classical mechanics can get very tedious and annoying. Thankfully, apart from making good cheese, wine and conquering most of Europe, the French were (and still are) also very good at producing world-class mathematicians. One such mathematician was Joseph Lagrange, who amongst a trillion other accomplishments, came up with a revolutionary reformulation of classical mechanics in conjunction with several other mathematicians 160 and physicists. This approach is now known as ‘Lagrangian mechanics’ and is an extremely powerful and vast generalisation of Newtonian mechanics. Today, almost the entirety of modern physics is based on the principles set down by Lagrange and Hamilton. It also has vast applications to optimization problems and many areas of engineering.

27.1.2 The Principle of Stationary Action

The fundamental concept behind Lagrangian mechanics is the ‘principle of stationary action’. It is more commonly referred to as the principle of ‘least action’, which is technically incorrect 161. It basically says that nature is lazy, and will always (classically) take the path of stationary action – which means it makes the following functional stationary:

$$S = \int L\,dt \tag{604}$$

Here the quantity $S$, called the ‘action’, is a functional – an object which acts on functions. The function $L$ is called the ‘Lagrangian’ of your theory – it contains all necessary information about your physical system. Different theories and different systems will have different Lagrangians. Finally, the integral used here is the indefinite integral with respect to time $t$, which parametrises the system.

160 Most notably, the Irish mathematician Sir William Rowan Hamilton.
161 Recall that when you are trying to find the critical points of a function, you first find its derivative and then set it to zero. This doesn’t just give you points at which the function is minimized – you also get inflection points and maxima.

For systems in classical mechanics, the Lagrangian sometimes (but not always!) takes the following form:

$$L = T - U \tag{605}$$

where $T$ is the total kinetic energy of the system and $U$ is its potential energy. If the system is conservative (i.e. no losses due to friction etc.) and the potential energy $U$ is time-independent, then the Lagrangian will take this special form. Note the minus sign in $T - U$ is important – if this was a plus sign, then the Lagrangian would be the total energy (or Hamiltonian, in this restricted set of cases). If the system is non-conservative, then one usually has to add extra terms to the action to account for losses / dissipation (or net gain) of energy. If the system is constrained – e.g. a bead confined to roll on some surface – then one needs to either use the method of Lagrange multipliers or to express the system in terms of unconstrained variables.

27.2 The Euler-Lagrange Equations of Motion

The Euler-Lagrange equations of motion are the equations you have to solve to determine the dynamical time evolution of your system in the Lagrangian formalism. In some subset of cases, these are simply equivalent to the equations of motion you get using Newton’s Second Law: $F = ma$. Here I will specify a simple system, then show how to derive the Euler-Lagrange equations for this system using the principle of stationary action. Later, I will specify a more general system then re-derive the Euler-Lagrange equations. Finally, I will give an example of the power of the Lagrangian formalism – in particular, a proof of the fact that a straight line is the shortest distance between two points in ordinary Euclidean geometry.
In the Lagrange formalism, a system is specified by a set of generalized coordinates: $q_1(t), \ldots, q_n(t)$ (parametrised by time $t$) and a set of generalized velocities which are the derivatives of the coordinates with respect to time $t$: $\dot q_1, \ldots, \dot q_n$. In non-relativistic mechanics, we view the time $t$ as the independent variable and the coordinates $q_i$ and velocities $\dot q_i$ as dependent variables, parametrised by $t$. The configuration space is then taken to be the set of all possible values: $(q_1, \ldots, q_n, \dot q_1, \ldots, \dot q_n)$ of the generalized coordinates and the corresponding velocities. Note that generalized coordinates represent points in some space $M$, and the generalized velocities are (tangent) vectors attached to these points (recall velocity is a vector quantity).
Hence the configuration space of a physical system naturally takes the form of a ‘tangent bundle’ 162, denoted $TM$.

Abstraction aside, we now consider the Lagrangian for a simple system (e.g. a point-particle moving with constant acceleration) with a generalized coordinate $q$ and a generalized velocity $\dot q = \frac{dq}{dt}$. The Lagrangian $L = L(q, \dot q)$ for this system is a function of $q$ and $\dot q$, defined on the configuration space 163 $TM$. The action $S[L]$ corresponding to this Lagrangian $L$ is given by:

$$S[L] = \int L(q, \dot q)\,dt. \tag{606}$$

We can compute the variation of this action $\delta S[L]$ by using integration by parts and computing the variation of the Lagrangian: $\delta L$. Note that to compute the variation of the Lagrangian, $\delta L$, we simply use the same rules as we do when computing a total differential (or ‘exterior derivative’). In particular, we have

$$\delta L(q, \dot q) = \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot q}\,\delta \dot q \tag{607}$$

Note that we have assumed that the Lagrangian $L$ does not explicitly depend on time $t$. It only depends on $t$ implicitly through $q(t)$ and $\dot q(t)$. If it did explicitly depend on $t$, e.g. for a system with a time-varying potential energy $U(t)$, then we would just include an extra term: $\frac{\partial L}{\partial t}$ in the variation of $L$. Therefore, we have:

$$\delta S[L] = \delta \int L\,dt = \int \delta L\,dt = \int \left(\frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot q}\,\delta \dot q\right) dt \tag{608}$$

Note that the variation ‘operator’ $\delta$ commutes with derivative operators. Hence for example, $\frac{d}{dt}\delta q = \delta \frac{d}{dt} q = \delta \dot q$. Our goal is to compute the ‘functional derivative’ of the functional $S$ with respect to the generalized coordinate $q$. The functional derivative allows us to differentiate functionals with respect to functions – apart from a few technicalities, it behaves much like the ordinary derivative. This means we want the quantity $\frac{\delta S}{\delta q}$, so we need the term $\delta q$ to the right of both terms in the integrand of (608). However, the second term contains $\delta \dot q := \delta \frac{d}{dt} q$. In order to ‘move’ the total derivative $\frac{d}{dt}$ away from the $q$, we use the integration by parts technique 164:

$$\begin{aligned}
\int \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\,\delta q\right) dt &= \int d\left(\frac{\partial L}{\partial \dot q}\,\delta q\right) \\
\implies \int \left\{\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right)\delta q\right\} dt + \int \left(\frac{\partial L}{\partial \dot q}\,\delta \frac{d}{dt} q\right) dt &= \left[\frac{\partial L}{\partial \dot q}\,\delta q\right]_{t=t_i}^{t=t_f}
\end{aligned} \tag{609}$$

where $t_i$ and $t_f$ denote the range of integration over time – we almost always use $t_i = -\infty$ and $t_f = +\infty$ for a classical action. Now, we make the (physically motivated) assumption that the quantity on the right-hand side vanishes: $\left[\frac{\partial L}{\partial \dot q}\,\delta q\right]_{t=t_i}^{t=t_f} = 0$. This is true for almost all physical Lagrangians $L$ 165. Therefore, taking this assumption, we get:

$$\begin{aligned}
\int \left\{\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right)\delta q\right\} dt + \int \left(\frac{\partial L}{\partial \dot q}\,\delta \frac{d}{dt} q\right) dt &= 0 \\
\implies \int \left(\frac{\partial L}{\partial \dot q}\,\delta \frac{d}{dt} q\right) dt &= -\int \left\{\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right)\delta q\right\} dt.
\end{aligned} \tag{610}$$

This allows us to write the variation (608) of the action as:

$$\delta S[L] = \int dt\left(\frac{\partial L}{\partial q}\,\delta q\right) - \int dt\left(\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right)\delta q\right) = \int dt\left\{\frac{\partial L}{\partial q} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right)\right\}\delta q. \tag{611}$$

Note that here we’ve made a common (mathematically-motivated 166) change of notation: $\int (\text{Stuff})\,dt =: \int dt\,(\text{Stuff})$. Finally, we bring the $\delta q$ in the integrand (611) to the left-hand side and formally define the functional derivative of the action $S$ to be:

$$\frac{\delta S[L]}{\delta q} = \frac{\partial L}{\partial q} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right). \tag{612}$$

162 A collection of points and the tangent spaces attached to those points. If the coordinate space $M$ is $n$-dimensional, then the tangent bundle $TM$ is $2n$-dimensional.
163 In general, $L$ could also be a function of higher derivatives of $q$, for example $L = L(q, \dot q, \ddot q, \ldots)$; however for most practical cases we just consider $L = L(q, \dot q)$.
164 Or rather, the fundamental theorem of calculus (for 1-dimensional problems) / a special case of the generalized Stokes theorem for higher dimensions.
165 One rare case where one gets so-called ‘boundary contributions’ to the action integral is in general relativity – in particular, the Gibbons-Hawking-York boundary term, which accounts for the case when spacetime is a manifold with a boundary.
166 In this manner, we can think of the integral sign and the variables we integrate with respect to ($dt$) as an abstract operator or ‘functional’ called a ‘measure’. Thus $\int dt$ is an operator which acts on functions to give some number – which is the value of the function it integrates.
In this language, the principle of stationary action states that the variation must vanish: $\delta S = 0$, which is equivalent to saying the functional derivative is zero: $\frac{\delta S[L]}{\delta q} = 0$. Therefore

$$\frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}} \right) = 0, \tag{613}$$

which is precisely the Euler-Lagrange equation of motion for this dynamical system! Thus we have explicitly demonstrated that the Euler-Lagrange equations are a direct consequence of the principle of stationary (or 'least') action; furthermore, we listed the assumptions made throughout the derivation. In particular, we assumed zero boundary contributions to the action, that the Lagrangian $L$ was not explicitly dependent on time (so $\frac{\partial L}{\partial t} = 0$), and that it only depended on the generalized coordinates and velocities: $L = L(q, \dot{q})$. If we relaxed some of these assumptions, we could obtain extra terms in the Euler-Lagrange equations.

Note, there is another way to view this derivation using Taylor expansions. This method is a bit more suggestive and intuitive in regards to why we call these techniques 'variational principles' or 'variational calculus'. The premise is that we perturb the action by perturbing the function it acts on, $S[L + \delta L] \approx S[L] + \delta S$, then define the variation as the difference between the perturbed action and the original action: $\delta S = S[L + \delta L] - S[L]$. Here the perturbation $\delta L$ is the one induced by perturbing the path, $q \to q + \delta q$. Paths $q(t)$ which satisfy the stationary action condition $\delta S[L] = 0$ are called stationary points (or extremals) of the action functional. In some cases they correspond to minima or maxima of the action, but in general they need not be either. For this reason, they are fundamental to variational calculus. For example, if the action represented the length of a curve or the surface area of a soap bubble, we could use variational calculus to find a curve with minimal length or the shape of a soap bubble surface with minimal area under some given constraints.

Example 26 As an example, take the motion of a point-particle with mass $m$ and position coordinate $x$, moving in one dimension. We view $x$ as a function of time $t$: $x = x(t)$. Then $x$ is our generalized coordinate with corresponding generalized velocity $\dot{x}$. If the particle is moving due to some conservative force acting on it, then it has some associated potential energy $U$. Assuming $U$ is independent of time $t$, we then have $U = U(x)$ in general (e.g. the particle could be moving vertically and experiencing a gravitational force with potential $U = U(x)$). The Lagrangian is then given by:

$$L = \text{Kinetic Energy} - \text{Potential Energy} = \frac{1}{2} m \dot{x}^2 - U(x). \tag{614}$$
The Euler-Lagrange equations then tell us that:

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{x}} \right) = 0, \tag{615}$$

hence we see that

$$-\frac{\partial U(x)}{\partial x} - \frac{d}{dt}(m \dot{x}) = 0. \tag{616}$$

Since $U$ is only a function of one variable, we write the partial derivative as a total derivative instead, hence:

$$-\frac{dU(x)}{dx} = m \ddot{x} \tag{617}$$

since the mass $m$ is constant. Recalling that a conservative force $\mathbf{F}$ can be defined as minus the gradient of some potential, $\mathbf{F} = -\nabla U$, we then identify $-\frac{dU(x)}{dx}$ as the component $F_x$ of the force acting on this particle in the $x$-direction. Hence we have:

$$F_x = m \ddot{x} \tag{618}$$

which is precisely Newton's second law. Note that this is based on the assumption that the Lagrangian $L$ was only dependent on $x$ and $\dot{x}$. In general, one may have a time-varying acceleration (e.g. a radiating charge or stealth fighter jet); in such a case, we would modify the Euler-Lagrange equations and therefore modify our statement of Newton's second law.

27.3 N-Dimensional Euler-Lagrange Equations

To see how this formalism generalizes to higher-dimensional systems, we proceed as follows. Let $q_i$ denote the $i$-th generalized coordinate for a system with $n$ generalized coordinates, $q_1, \dots, q_n$. The $n$ corresponding generalized velocities are then given by $\dot{q}_i$, where $i = 1, \dots, n$. Collecting the variables $q_1, \dots, q_n$ and $\dot{q}_1, \dots, \dot{q}_n$ into vectors $\mathbf{q}$ and $\dot{\mathbf{q}}$, respectively, we can view the Lagrangian as a function of $2n$ variables, parametrised by time $t$:

$$L = L(\mathbf{q}, \dot{\mathbf{q}}; t). \tag{619}$$

The action functional generated by this Lagrangian is given by:

$$S[L] = \int L\, dt. \tag{620}$$
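To vary the action, we Taylor expand $L(q_1, \dots, q_n, \dot{q}_1, \dots, \dot{q}_n)$ to first order in all its variables. In particular, we have:

$$\begin{aligned}
S[L + \delta L] &= \int L(\mathbf{q} + \delta \mathbf{q}, \dot{\mathbf{q}} + \delta \dot{\mathbf{q}})\, dt \\
&= \int \left[ L(\mathbf{q}, \dot{\mathbf{q}}) + \frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta \dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta \dot{q}_n \right] dt \\
&= \int L(\mathbf{q}, \dot{\mathbf{q}})\, dt + \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta \dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta \dot{q}_n \right] dt \\
&= S[L(\mathbf{q}, \dot{\mathbf{q}})] + \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta \dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta \dot{q}_n \right] dt,
\end{aligned} \tag{621}$$

hence

$$\begin{aligned}
\delta S := S[L + \delta L] - S[L] &= \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot{q}_1}\delta \dot{q}_1 + \dots + \frac{\partial L}{\partial \dot{q}_n}\delta \dot{q}_n \right] dt \\
&= \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + \dots + \frac{\partial L}{\partial q_n}\delta q_n - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_1} \right)\delta q_1 - \dots - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_n} \right)\delta q_n \right] dt \\
&= \int \left\{ \left[ \frac{\partial L}{\partial q_1} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_1} \right) \right]\delta q_1 + \dots + \left[ \frac{\partial L}{\partial q_n} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_n} \right) \right]\delta q_n \right\} dt
\end{aligned} \tag{622}$$

where we have used integration by parts to move the total derivative $\frac{d}{dt}$ from the perturbations, $\delta \dot{q}_i$, to the corresponding coefficients, $\frac{\partial L}{\partial \dot{q}_i}$. Again, one makes the assumption of vanishing boundary contributions: $\int d\left( \frac{\partial L}{\partial \dot{q}_i}\,\delta q_i \right) = \left[ \frac{\partial L}{\partial \dot{q}_i}\,\delta q_i \right]_{-\infty}^{\infty} = 0$. The principle of stationary action says that a physical system classically evolves such that the action is stationary: $\frac{\delta S}{\delta q_i} = 0$ for each $i$. Since the variations $\delta q_i$ of the coordinates are independent and arbitrary, this can only happen if each of their coefficients in the integral (622) vanishes. This means that we obtain a system of $n$ differential equations, which are the $n$-dimensional Euler-Lagrange equations:

$$\begin{aligned}
\frac{\partial L}{\partial q_1} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_1} \right) &= 0 \\
\frac{\partial L}{\partial q_2} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_2} \right) &= 0 \\
&\;\;\vdots \\
\frac{\partial L}{\partial q_n} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_n} \right) &= 0.
\end{aligned} \tag{623}$$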
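As a numerical sanity check of the stationarity argument above, the following sketch discretises the action for a two-coordinate system and perturbs a path that satisfies the Euler-Lagrange equations. The projectile Lagrangian $L = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2) - mgy$ (with $y$ pointing upward), the parameter values, the midpoint discretisation and all function names are illustrative assumptions, not part of the derivation in the text.

```python
import math

M, G = 1.0, 9.81      # mass and gravitational acceleration (illustrative values)
T, N = 1.0, 200       # total time and number of time steps (assumed)
DT = T / N

def lagrangian(vx, vy, y):
    """L = kinetic energy - potential energy for a projectile, y measured upward."""
    return 0.5 * M * (vx * vx + vy * vy) - M * G * y

def action(path):
    """Discretised action: sum of L * dt over each segment of the path."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        vx, vy = (x1 - x0) / DT, (y1 - y0) / DT
        total += lagrangian(vx, vy, 0.5 * (y0 + y1)) * DT
    return total

def classical_path(vx0=3.0, vy0=4.0):
    """Samples of the parabolic trajectory, which solves the Euler-Lagrange equations."""
    return [(vx0 * k * DT, vy0 * k * DT - 0.5 * G * (k * DT) ** 2)
            for k in range(N + 1)]

def perturbed(path, eps):
    """Add a variation (dx, dy) = eps * (1, -2) * sin(pi t / T), zero at both endpoints."""
    out = []
    for k, (x, y) in enumerate(path):
        bump = eps * math.sin(math.pi * k / N)
        out.append((x + bump, y - 2.0 * bump))
    return out

base = classical_path()
S0 = action(base)
for eps in (1e-1, 5e-2, 2.5e-2):
    dS = action(perturbed(base, eps)) - S0
    print(f"eps = {eps:6.4f}   S[q + dq] - S[q] = {dS:.6e}   dS / eps^2 = {dS / eps**2:.6f}")
```

If the base path really is a stationary point, the change in the action should scale like $\epsilon^2$ rather than $\epsilon$, so the last printed column should be essentially independent of `eps`.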
In this manner, one can now derive Newton's Second Law in $n$ dimensions by generalizing the 1-dimensional case outlined earlier. In particular, this is done by considering a potential $U = U(x_1, \dots, x_n)$ which depends on the $n$ position coordinates $x_1, \dots, x_n$. The velocities are given by $\dot{x}_i = \frac{dx_i}{dt}$. Putting these into vector quantities, the kinetic energy of a point particle of mass $m$ with velocity $\dot{\mathbf{x}}$ is given by:

$$K = \frac{1}{2} m |\dot{\mathbf{x}}|^2. \tag{624}$$

Since the potential energy $U$ is time-independent, we can write the Lagrangian for this system as:

$$L = K - U = \frac{1}{2} m |\dot{\mathbf{x}}|^2 - U(\mathbf{x}). \tag{625}$$

The Euler-Lagrange equations can be found using the system (623) earlier, with the generalized coordinates taken to be the position coordinates, $q_i = x_i$. In particular, since we have

$$\frac{\partial}{\partial \dot{q}_i} |\dot{\mathbf{x}}|^2 = \frac{\partial}{\partial \dot{q}_i}\left[ (\dot{q}_1)^2 + \dots + (\dot{q}_n)^2 \right] = 2 \dot{q}_i, \tag{626}$$

the Euler-Lagrange equation for the $i$-th coordinate of the point particle is given by:

$$m \frac{d}{dt}\dot{q}_i + \frac{\partial U}{\partial q_i} = 0. \tag{627}$$

Re-arranging, this is simply the $i$-th component of the $n$-dimensional version of Newton's Second Law of motion:

$$m \ddot{q}_i = -\frac{\partial U}{\partial q_i}. \tag{628}$$

Collecting the $n$ equations into one vector equation, this is made explicit:

$$\mathbf{F} := m \ddot{\mathbf{q}} = -\nabla U, \tag{629}$$

where $\nabla U$ is the gradient (vector) of the potential energy function $U$. This statement is, in fact, quite general: a conservative force $\mathbf{F}$ arising from a potential $U$ is necessarily given by $\mathbf{F} = -\nabla U$. So for example, given a gravitational potential $U = -\frac{GM}{r}$ (the potential energy per unit test mass), we see that the (conservative) gravitational force per unit mass is given by:

$$\mathbf{F} = -\nabla\left( -\frac{GM}{r} \right) = -\frac{GM}{r^2}\,\hat{\mathbf{r}}, \tag{630}$$

where $G$ is Newton's gravitational constant and $\hat{\mathbf{r}}$ is a unit vector pointing in the radial direction away from a massive object of mass $M$. The minus sign then accounts for the fact that the gravitational force is directed towards the massive object.
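The relation $\mathbf{F} = -\nabla U$ for the gravitational potential can be checked numerically. The sketch below compares a central-difference gradient of $U = -GM/r$ with the closed form $-\frac{GM}{r^2}\hat{\mathbf{r}}$; the value `GM = 1.0`, the sample point, the step size `h` and the function names are all assumptions chosen for illustration.

```python
import math

GM = 1.0  # product G*M in arbitrary units (illustrative value)

def potential(x, y, z):
    """Gravitational potential U = -GM / r."""
    return -GM / math.sqrt(x * x + y * y + z * z)

def force_numeric(x, y, z, h=1e-5):
    """Approximate F = -grad U using central differences with step h."""
    fx = -(potential(x + h, y, z) - potential(x - h, y, z)) / (2.0 * h)
    fy = -(potential(x, y + h, z) - potential(x, y - h, z)) / (2.0 * h)
    fz = -(potential(x, y, z + h) - potential(x, y, z - h)) / (2.0 * h)
    return fx, fy, fz

def force_analytic(x, y, z):
    """Closed form F = -(GM / r^2) r_hat = -GM * (x, y, z) / r^3."""
    r = math.sqrt(x * x + y * y + z * z)
    return tuple(-GM * c / r**3 for c in (x, y, z))

point = (1.0, 2.0, 2.0)  # a sample point with r = 3
print("numeric :", force_numeric(*point))
print("analytic:", force_analytic(*point))
```

Both results should point back towards the origin, which is the numerical counterpart of the minus sign discussed above.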
27.4 Examples

Example 27 (Simple Pendulum) Consider a vertical pendulum of mass $m$ and length $l$. We set up a coordinate system with horizontal (pointing right) coordinate $x$ and vertical (downward) coordinate $y$, where $\theta$ is the angle between the vertical $y$-axis and the arm of the pendulum. We set the origin to be at the pivot of the pendulum arm, with the mass hanging from the opposite end. Since this system is undergoing rotational motion with a fixed radius $l$ (the mass at the end of the pendulum moves in a circular arc whose radius is the length of the pendulum arm), the mass has a tangential velocity of $v = l\omega = l\dot{\theta}$. Therefore, the total kinetic energy is given by:

$$K = \frac{1}{2} m v^2 = \frac{1}{2} m l^2 \dot{\theta}^2. \tag{631}$$

The potential energy is the weight $mg$ multiplied by the height of the mass. Since $y$ points downward, the height of the mass relative to the pivot is $-y = -l\cos(\theta)$, so:

$$U = -mgy = -mgl\cos(\theta). \tag{632}$$

The Lagrangian is therefore given by

$$L(\theta, \dot{\theta}) = K - U = \frac{1}{2} m l^2 \dot{\theta}^2 + mgl\cos(\theta), \tag{633}$$

where $\theta$ and $\dot{\theta}$ are the generalized coordinate and corresponding generalized velocity, respectively. The Euler-Lagrange equation is given by

$$\frac{\partial L}{\partial \theta} - \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = 0, \tag{634}$$

that is, $-mgl\sin(\theta) - ml^2\ddot{\theta} = 0$, which simplifies to

$$\ddot{\theta} + \frac{g}{l}\sin(\theta) = 0. \tag{635}$$

This differential equation can be solved analytically for $\theta$ using Jacobi elliptic functions. Alternatively, one can make the small angle approximation to linearise this non-linear differential equation: $\sin(\theta) \approx \theta$, for small displacements $\theta \ll 1$ (radians). Note that using the Lagrangian approach, one only needs to compute the potential energy and kinetic energy for the pendulum system. This is a rather trivial task (as shown) which avoids the messiness of having to consider forces and 'tension', which is required by the Newtonian approach.
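To get a feel for how good the small angle approximation is, one can integrate the pendulum equation (635) numerically and compare the resulting period with the small-angle value $2\pi\sqrt{l/g}$. The following is a minimal sketch using a classical fourth-order Runge-Kutta step; the function name, the parameter values ($g = 9.81$, $l = 1$), the step size and the zero-crossing interpolation are assumptions made for illustration.

```python
import math

def pendulum_period(theta0, g=9.81, l=1.0, dt=1e-4):
    """Estimate the period of theta'' + (g/l) sin(theta) = 0, released from rest at theta0 > 0."""
    def accel(theta):
        return -(g / l) * math.sin(theta)

    theta, omega, t = theta0, 0.0, 0.0
    while True:
        # One classical fourth-order Runge-Kutta step for the pair (theta, omega).
        k1_th, k1_om = omega, accel(theta)
        k2_th, k2_om = omega + 0.5 * dt * k1_om, accel(theta + 0.5 * dt * k1_th)
        k3_th, k3_om = omega + 0.5 * dt * k2_om, accel(theta + 0.5 * dt * k2_th)
        k4_th, k4_om = omega + dt * k3_om, accel(theta + dt * k3_th)
        new_theta = theta + dt * (k1_th + 2 * k2_th + 2 * k3_th + k4_th) / 6.0
        new_omega = omega + dt * (k1_om + 2 * k2_om + 2 * k3_om + k4_om) / 6.0
        if new_theta <= 0.0:
            # The first crossing of theta = 0 marks a quarter period; interpolate linearly.
            frac = theta / (theta - new_theta)
            return 4.0 * (t + frac * dt)
        theta, omega, t = new_theta, new_omega, t + dt

small_angle_period = 2.0 * math.pi * math.sqrt(1.0 / 9.81)
for theta0 in (0.01, 0.5, 1.0, 2.0):
    ratio = pendulum_period(theta0) / small_angle_period
    print(f"theta0 = {theta0:4.2f} rad   T / T_small_angle = {ratio:.5f}")
```

For small release angles the ratio should be very close to 1, while for larger amplitudes the true period is noticeably longer than the linearised estimate.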
Another advantage of the Lagrangian formalism is that one may easily change coordinates without having to worry about introducing 'fictitious forces' (e.g. centrifugal, Coriolis): the principle of 'generalised coordinates' essentially bids one to express the Lagrangian in terms of the most 'natural' coordinate system for the problem at hand. Here we made use of the rotational nature of the problem to switch from the Cartesian $x, y$ coordinates to the polar coordinates $r, \theta$ (although we didn't use $r$, since the radial coordinate was fixed at $r = l$).

Example 28 (Harmonic Oscillator) Consider a 3-dimensional harmonic oscillator. Such a system may be envisioned as a mass attached to a spring, whose other end is fixed at some origin. If we let the origin of a 3-dimensional Cartesian coordinate system $(x, y, z)$ coincide with the initial (unstretched) position of the mass, then stretching the spring in any direction will induce a radial oscillatory motion. Let $k$ denote the spring constant and $m$ denote the mass at the end of the spring. The force on the mass is given by Hooke's law:

$$\mathbf{F} = -k\mathbf{r} \tag{636}$$

where $\mathbf{r}$ is the (radial) position vector: $\mathbf{r} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3 \sim (x, y, z)$. The potential energy of the spring is equal to the work required to stretch the spring from its rest position, i.e. minus the work done by the spring force:

$$U = -\int_0^r \mathbf{F} \cdot d\mathbf{l} = -\int_0^r (-kr')\, dr' = \frac{1}{2} k r^2. \tag{637}$$

The kinetic energy of the mass is given by

$$K = \frac{1}{2} m v^2 = \frac{1}{2} m \dot{r}^2, \tag{638}$$

where $v^2 = \dot{x}^2 + \dot{y}^2 + \dot{z}^2$, which equals $\dot{r}^2$ for purely radial motion. We could use Cartesian coordinates; however, radial coordinates are the 'natural choice' for this problem (since it is effectively a 1-dimensional problem: the motion only occurs in the radial direction, which is one-dimensional). Therefore we choose $r$ and $\dot{r} = \frac{d}{dt} r$ to be our generalized coordinate and generalized velocity, respectively, so that $L = K - U = \frac{1}{2} m \dot{r}^2 - \frac{1}{2} k r^2$. The Euler-Lagrange equation is given by

$$\frac{\partial L}{\partial r} - \frac{d}{dt}\frac{\partial L}{\partial \dot{r}} = 0, \tag{639}$$

which reduces to

$$\ddot{r} + \frac{k}{m} r = 0. \tag{640}$$

This second-order linear differential equation is solved by the usual means. In particular, the characteristic equation is given by:

$$\lambda^2 + \frac{k}{m} = 0, \tag{641}$$
whence the eigenvalues are $\lambda = \pm i\sqrt{\frac{k}{m}}$. Let $\omega := \sqrt{\frac{k}{m}}$ denote the fundamental (angular) frequency. Then the general solution is given by:

$$r(t) = c_1 e^{i\omega t} + c_2 e^{-i\omega t}, \tag{642}$$

where $c_1$ and $c_2$ are constants determined by the initial conditions. This can alternatively be expressed in real form,

$$r(t) = a_1\cos(\omega t) + a_2\sin(\omega t) \tag{643}$$

where $a_1$ and $a_2$ are constants determined by the initial conditions. In particular, $r(0) = a_1$ and $\dot{r}(0) = a_2\omega$. Hence $a_1$ is the initial displacement and $a_2$ is the initial velocity divided by the fundamental frequency. Note that if you've forgotten how to get from complex form to real form, recall that

$$\cos(x) = \frac{e^{ix} + e^{-ix}}{2}, \qquad \sin(x) = \frac{e^{ix} - e^{-ix}}{2i} \tag{644}$$

where $i^2 := -1$. Comparing coefficients, we see that the constants are explicitly related by:

$$c_1 = \frac{a_1}{2} + \frac{a_2}{2i}, \qquad c_2 = \frac{a_1}{2} - \frac{a_2}{2i}. \tag{645}$$

27.5 Multiple Independent Parameters

For the purpose of the (modern and topical) branch of mathematical physics known as 'minimal surface' theory, along with relativity and quantum field theory, it is important to extend the Lagrangian formalism to include physical systems (or more specifically, generalized coordinates) which depend on more than one independent parameter. Until now, we have considered systems which were parametrised by one independent variable: time $t$. We now consider systems which are parametrised by $k$ independent variables, which we shall denote $t_1, \dots, t_k$ for familiarity. For simplicity, we shall just consider systems with one generalized coordinate (parametrised by multiple variables) for now. The extension to an arbitrary number of generalised coordinates is done in the obvious way, analogous to our previous extension when we had just one independent parameter $t$. Let $t_1, \dots, t_k$ denote our $k$ independent parameters and let $q := q(t_1, \dots, t_k)$ denote our generalized coordinate, dependent on these parameters. The corresponding generalized velocities (with respect to each parameter) are then given by: $\frac{\partial q}{\partial t_1}, \dots, \frac{\partial q}{\partial t_k}$.
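Given some function $L := L\left(q, \frac{\partial q}{\partial t_1}, \dots, \frac{\partial q}{\partial t_k}; t_1, \dots, t_k\right)$ explicitly dependent on the generalized coordinate $q$ and generalized velocities $\frac{\partial q}{\partial t_i}$, and implicitly dependent on the independent parameters $t_1, \dots, t_k$, we now wish to formulate a variational problem. In particular, we consider the following action functional (a $k$-dimensional integral performed over $t_1, \dots, t_k$):

$$S[L] = \int L\, dt_1\, dt_2 \dots dt_k \tag{646}$$

and ask the question: which functions $q(t_1, \dots, t_k)$ make this action stationary? To solve the variational problem, we proceed as before to vary the action by Taylor expansion of $L$ in all its variables. In order to do this, some new notation will be handy. Let $v^q_i$ denote the $i$-th generalized velocity corresponding to the generalized coordinate $q$; in particular, we have $v^q_1 := \frac{\partial q}{\partial t_1}, \dots, v^q_k := \frac{\partial q}{\partial t_k}$. The variation of the Lagrangian is then given using the same rules as the total differential:

$$\delta L = \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + \dots + \frac{\partial L}{\partial v^q_k}\delta v^q_k. \tag{647}$$

Therefore, the variation of the action is given by:

$$\begin{aligned}
\delta S = \int \delta L\, dt_1 \dots dt_k &= \int \left[ \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + \dots + \frac{\partial L}{\partial v^q_k}\delta v^q_k \right] dt_1 \dots dt_k \\
&= \int \left[ \frac{\partial L}{\partial q}\delta q - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right)\delta q - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right)\delta q \right] dt_1 \dots dt_k \\
&= \int \left[ \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right) \right] \delta q\, dt_1 \dots dt_k,
\end{aligned} \tag{648}$$

where we have used integration by parts (or Stokes' theorem) for multiple variables, to move the derivatives $\frac{\partial}{\partial t_i}$ from the velocity variations $\delta \frac{\partial q}{\partial t_i}$ to the corresponding coefficients $\frac{\partial L}{\partial v^q_i}$, which introduces the minus signs. As before, we have assumed that the boundary contributions vanish. Therefore, we have the functional derivative of the action with respect to the generalized coordinate, given by:

$$\frac{\delta S}{\delta q} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right). \tag{649}$$

The principle of stationary action tells us that nature classically selects this functional derivative to be zero, which gives us the Euler-Lagrange equation for a system with one generalized coordinate $q$, parametrised by $k$ independent variables $t_1, \dots, t_k$:

$$0 = \left. \frac{\delta S}{\delta q} \right|_{\text{Nature}} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - \dots - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right). \tag{650}$$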
27.6 More Examples

We can use variational calculus to derive the (rather famous) minimal surface equation. In particular, we consider the following example.

Example 29 (Minimal Surface Equation) We consider all two-dimensional surfaces parametrised by two independent variables, $z := z(x, y)$, then ask the question: which surface of this general form has the minimal surface area? To answer this question, we can use the Euler-Lagrange equation (650) derived earlier. Say that the surface $z := z(x, y)$, parametrised by the two independent variables $t_1 = x$ and $t_2 = y$, has a domain $D$. Then (recall) its surface area is given by the double integral:

$$A = \iint_D \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2}\; dx\, dy. \tag{651}$$

We can view this as a variational problem by observing that $z$ is a generalised coordinate parametrised by two independent variables $x$ and $y$. The corresponding generalised velocities are given by (various notations) $v^z_1 = z_x := \frac{\partial z}{\partial x}$ and $v^z_2 = z_y := \frac{\partial z}{\partial y}$; we shall stick with the latter notation. Now, the total surface area $A$ can be viewed as an action functional, $A = A[L]$, whilst our integrand (the infinitesimal area differential) can be viewed as the corresponding Lagrangian: $L(z, z_x, z_y) = \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2} = \sqrt{1 + z_x^2 + z_y^2}$. Since we seek to minimize $A$, we need to first find surfaces (parametric functions) $z(x, y)$ which make the functional $A$ stationary. We then need to check that these stationary 'points' (functions) correspond to minima, rather than saddle (inflection-type) points or maxima. The first task can be achieved by solving the Euler-Lagrange equation (650), which takes the form:

$$\frac{\partial L}{\partial z} - \frac{d}{dx}\frac{\partial L}{\partial z_x} - \frac{d}{dy}\frac{\partial L}{\partial z_y} = 0 \implies 0 - \frac{d}{dx}\left( \frac{z_x}{\sqrt{1 + z_x^2 + z_y^2}} \right) - \frac{d}{dy}\left( \frac{z_y}{\sqrt{1 + z_x^2 + z_y^2}} \right) = 0, \quad\text{i.e.}\quad \frac{d}{dx}\left( \frac{z_x}{\sqrt{1 + z_x^2 + z_y^2}} \right) + \frac{d}{dy}\left( \frac{z_y}{\sqrt{1 + z_x^2 + z_y^2}} \right) = 0. \tag{652}$$

Although the last equation, known as the 'minimal surface equation', was derived by Lagrange in 1762, non-trivial (non-planar) solutions were not found until 1776 by the French mathematical engineer Jean Meusnier. The trivial, planar solution is given by:

$$Z(x, y) = Ax + By + C \tag{653}$$

where $A, B, C$ are constants. Here $Z_x = \frac{\partial Z}{\partial x} = A$, $Z_y = \frac{\partial Z}{\partial y} = B$ and $L = \sqrt{1 + A^2 + B^2}$, etc. Since $Z_x$, $Z_y$ and $L$ are all constant, both derivative terms in (652) vanish identically.
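The minimal surface equation (652) can also be tested numerically on a candidate surface. The sketch below evaluates the left-hand side of (652) with nested central differences for a plane of the form (653) and, for contrast, for a paraboloid, which is not a minimal surface. The step size, the sample point, the particular coefficients and all function names are illustrative assumptions.

```python
import math

def flux(z, x, y, h):
    """Return (z_x / W, z_y / W) with W = sqrt(1 + z_x^2 + z_y^2), via central differences."""
    zx = (z(x + h, y) - z(x - h, y)) / (2.0 * h)
    zy = (z(x, y + h) - z(x, y - h)) / (2.0 * h)
    w = math.sqrt(1.0 + zx * zx + zy * zy)
    return zx / w, zy / w

def residual(z, x, y, h=1e-3):
    """Left-hand side of the minimal surface equation, d/dx(z_x/W) + d/dy(z_y/W), at (x, y)."""
    fx_plus, _ = flux(z, x + h, y, h)
    fx_minus, _ = flux(z, x - h, y, h)
    _, fy_plus = flux(z, x, y + h, h)
    _, fy_minus = flux(z, x, y - h, h)
    return (fx_plus - fx_minus) / (2.0 * h) + (fy_plus - fy_minus) / (2.0 * h)

def plane(x, y):
    """Planar surface Z = A x + B y + C with A = 2, B = -3, C = 1 (arbitrary choices)."""
    return 2.0 * x - 3.0 * y + 1.0

def paraboloid(x, y):
    """A curved comparison surface that does not satisfy the equation."""
    return x * x + y * y

print("plane residual     :", residual(plane, 0.3, -0.2))
print("paraboloid residual:", residual(paraboloid, 0.3, -0.2))
```

A residual close to zero (up to finite-difference and rounding error) indicates that the surface satisfies the equation at that point; a residual of order one indicates that it does not.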
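Switching to cylindrical coordinates $(\rho, \theta, z)$, with $x = \rho\cos(\theta)$, $y = \rho\sin(\theta)$ and $z = z$, we have another solution to the minimal surface problem. This is given by the Catenoid, a surface of revolution parametrised by a single independent variable, $z$:

$$\rho = \lambda\cosh\left( \frac{z}{\lambda} \right) \tag{654}$$

where $\lambda$ is a constant. Note that $\rho$ is independent of the second independent variable $\theta$, since the surface is rotationally symmetric (it is produced by rotating a catenary curve about the $z$-axis). To show this is a solution, we can either re-derive the minimal surface equation, starting from the infinitesimal area element $dA = \rho\sqrt{1 + \left( \frac{\partial \rho}{\partial z} \right)^2 + \frac{1}{\rho^2}\left( \frac{\partial \rho}{\partial \theta} \right)^2}\; d\theta\, dz$, or try some messy manipulations with the chain rule and the Cartesian coordinate equation. It's far easier to start from the action principle again, with the Lagrangian $L\left( \rho, \frac{\partial \rho}{\partial z}, \frac{\partial \rho}{\partial \theta} \right)$. Since our Catenoid is independent of $\theta$ (symmetry in $\theta$), we have $\frac{\partial \rho}{\partial \theta} = 0$. Therefore, our Lagrangian is the coefficient function (the coefficient of $d\theta \wedge dz$) of our area 2-form element:

$$L = L(\rho, \rho_z) = \rho\sqrt{1 + \left( \frac{\partial \rho}{\partial z} \right)^2 + 0}. \tag{655}$$

Letting $\rho_z := \frac{\partial \rho}{\partial z}$, our Euler-Lagrange equation is given by:

$$\frac{\partial L}{\partial \rho} - \frac{d}{dz}\frac{\partial L}{\partial \rho_z} = 0, \tag{656}$$

which simplifies to:

$$\sqrt{1 + \rho_z^2} - \frac{d}{dz}\left( \frac{\rho_z\,\rho}{\sqrt{1 + \rho_z^2}} \right) = 0. \tag{657}$$

With some application of the chain and product rules, along with the hyperbolic trigonometric identities

$$1 + \sinh^2(x) = \cosh^2(x), \qquad \frac{d}{dx}\cosh(\lambda x) = \lambda\sinh(\lambda x), \qquad \frac{d}{dx}\sinh(\lambda x) = \lambda\cosh(\lambda x), \qquad \frac{d}{dx}\tanh(x) = \operatorname{sech}^2(x), \tag{658}$$

one can show that the Catenoid surface, given by $\rho(z) = \lambda\cosh\left( \frac{z}{\lambda} \right)$, solves the Euler-Lagrange equation (657). Indeed, $\rho_z = \sinh\left( \frac{z}{\lambda} \right)$ gives $\sqrt{1 + \rho_z^2} = \cosh\left( \frac{z}{\lambda} \right)$ and $\frac{\rho_z\,\rho}{\sqrt{1 + \rho_z^2}} = \lambda\sinh\left( \frac{z}{\lambda} \right)$, whose $z$-derivative is $\cosh\left( \frac{z}{\lambda} \right)$, so the two terms in (657) cancel. Hence the Catenoid corresponds to a 'critical surface' (cf. 'critical point') of the surface area functional $A$ and makes this functional (action) stationary. To see that it is indeed a minimal surface, simply note that the Lagrangian is given by the square root of a strictly-positive quantity. Since the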
Lagrangian is strictly positive, the corresponding area (action) integral is strictly positive. This means that the Catenoid surface (or in fact any surface!) cannot be a maximal surface. Hence the Catenoid is either a saddle-type stationary point or a minimum of the area action functional. It is in fact a minimal surface.

27.7 Closing Remarks

The Lagrangian formalism is, for the most part, a second-order formalism. This means that the equations of motion resulting from the Euler-Lagrange equations are usually second-order differential equations. For many different reasons, it is sometimes advantageous or necessary to switch to a first-order formalism: 'Hamiltonian mechanics'. To do this, one defines the Hamiltonian as the Legendre transform of the Lagrangian:

$$H(\mathbf{q}, \mathbf{p}; t) = \mathbf{p} \cdot \dot{\mathbf{q}} - L(\mathbf{q}, \dot{\mathbf{q}}; t) \tag{659}$$

where $\mathbf{p}$ is the conjugate momentum vector (related to the generalized velocities). The components of $\mathbf{p}$ are defined as the partial derivatives of the Lagrangian with respect to the generalized velocities:

$$p_i := \frac{\partial L}{\partial \dot{q}_i}. \tag{660}$$

In this formalism, the natural variables are now the generalized coordinates $\mathbf{q}$ and the conjugate momenta $\mathbf{p}$. From a practical point of view, the ultimate result is that Hamilton's equations are coupled first-order differential equations, which are often easier to solve than the Euler-Lagrange equations. Although the two formalisms are essentially equivalent, there are many theoretical motivations for the Hamiltonian formalism; most notably, it allows a dynamical system to be represented in 'phase space'. Evolution of the system is then described by trajectories $(\mathbf{q}(t), \mathbf{p}(t))$ in phase space. With such a structure, the system can be analysed using symplectic geometry and Liouville theory, the key point being that the Hamiltonian $H(\mathbf{q}, \mathbf{p})$ defines a 'flow' on phase space (a map on the cotangent bundle). This flow conserves a non-degenerate object called the 'symplectic form', which is the basis for many deep mathematical theorems regarding dynamics.
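As a small illustration of the first-order formalism, the sketch below builds the Hamiltonian of a one-dimensional harmonic oscillator from the Legendre transform (659), using the Lagrangian $L = \frac{1}{2}m\dot{q}^2 - \frac{1}{2}kq^2$ and $p = m\dot{q}$ from (660), and then integrates Hamilton's equations $\dot{q} = \frac{\partial H}{\partial p}$, $\dot{p} = -\frac{\partial H}{\partial q}$. These equations are not written out in the text above and are quoted here as the standard form; the symplectic Euler scheme, the parameter values and the function names are assumptions chosen for illustration.

```python
import math

M, K = 1.0, 4.0  # mass and spring constant (illustrative), so omega = sqrt(K / M) = 2

def hamiltonian(q, p):
    """H(q, p) = p * qdot - L, with qdot = p / M and L = M qdot^2 / 2 - K q^2 / 2."""
    qdot = p / M
    lagrangian = 0.5 * M * qdot * qdot - 0.5 * K * q * q
    return p * qdot - lagrangian

def evolve(q, p, dt, steps):
    """Symplectic Euler steps for Hamilton's equations of the oscillator."""
    for _ in range(steps):
        p -= K * q * dt     # dp/dt = -dH/dq = -K q
        q += (p / M) * dt   # dq/dt =  dH/dp = p / M (uses the updated p)
    return q, p

q0, p0 = 1.0, 0.0
dt = 1e-3
steps = round(math.pi / dt)  # roughly one period, since T = 2 pi / omega = pi
q1, p1 = evolve(q0, p0, dt, steps)

print("H at start:", hamiltonian(q0, p0))
print("H at end  :", hamiltonian(q1, p1))
print("(q, p) after about one period:", (q1, p1))
```

The trajectory $(q(t), p(t))$ traces a closed curve in phase space, so after about one period the point should return close to its starting value, and the value of $H$ should stay close to its initial value throughout.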