THE
AXIOM
St. Andrew’s Day 2021 Issue 11
42 | THE SCIENTIFIC ETONIAN
A NOTE FROM THE EDITORS
Two editions since our first online endeavour, we find ourselves on the cusp
of a new kind of beginning – a more hopeful one, as we see the pandemic
through its final stages. We have worked hard as a team to provide you with
this edition, which we hope many of you will be reading from crisp paper.
We hope you enjoy the wide span of topics, extending from a prototypical
Olympiad problem review (1992 Putnam A6) to matrix operations and more
applied maths (Neural Networks and their use in AI). There are also some
very practical tips for ensuring your older siblings don’t take advantage of you
(Pouring Drinks) and proving you have Supreme clothing without actually
owning it (Probabilistic Method).
As always, we would like to thank Dr Moston for overseeing the creation of
this edition, and the Provost and Eton College for their support. Credit
must also be given to the writers, editors and designers, for the majority of
whom this is their first time contributing to the Axiom.
So over Short Leave, you can add a riveting Axiom article to your fireside hot
chocolate...
Do enjoy,
The Editors
Contents
The Probabilistic Method
Hilbert’s Hotel
Neural Networks and Their Use in AI
Pouring Drinks
1992 Putnam Competition Question A6
Sophie Germain, Fermat and Primes
The Matrix Exponential
Crossnumber
Puzzles
The Probabilistic Method
Imagine you want to become a chief. You tell people
that you have the latest limited edition Supreme shirt
that costs £10,000. Unfortunately for you, no one
believes you – they want you to prove it. In this scenario,
the only possible way you can prove it is by actually
producing the physical object (or a fake). In math, it
seems like the same is true – if you want to prove that
an object with certain properties exists, you need to
actually construct it. However, this turns out not to be
the only way, which is why this article is being written.
It turns out that another way of proving the existence of
an object with certain properties is by defining a ‘bag of
objects’ from which we can ‘randomly’ select the desired
object with positive probability.
More formally, we define a finite probability space,
which is a finite set W where each element is mapped
to a weight in [0,1]. In English: we have a set of objects
with each object being assigned a probability. The only
constraint on the weights is that their sum is 1.
Brief definitions:
Read the first 4, then use the rest as a reference if you
come across an unfamiliar term.
• We typically call the map P, i.e., P(x) represents the
probability associated with the object x.
• |W| = number of elements in the set W
• An ‘event’ is a subset A of W.
• The probability of an event A is $P(A) = \sum_{x \in A} P(x)$, i.e. the sum
of the probabilities of all elements in A
• $X \cup Y$ means ‘union’, i.e. the set of all elements in X
or Y (or both), without duplicates.
• $X \cap Y$ means ‘intersect’, i.e. the set containing all
shared elements between those sets
• ‘Pairwise-disjoint sets’ = no two sets have a common
element
• $K_n$ is the complete graph with n vertices. This means
you have n vertices and as many edges as possible
($\binom{n}{2}$): any pair of vertices is linked by an edge.
• Monochromatic = all of the same color
• Subgraph (given vertices) = the graph containing
all edges in the original graph that are between the
given set of vertices
• $\binom{n}{r}$ = the ‘choose’ function. Gives the number of
ways in which a set of r objects can be selected from
a set of n objects. The value itself is $\frac{n!}{r!(n-r)!}$.
Also written nCr.
Let’s look at some examples. $P(x) = \frac{1}{|W|}$ corresponds to the
uniform distribution, because each element has equal
probability. What about $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$
for $k \in \{0, 1, \dots, n\}$? It’s the binomial distribution! These
distributions are ways of describing the weights assigned
to different elements.
Union Bound
We now look at a useful theorem, known as the Union
Bound / Boole’s Inequality. The statement of the theorem is:
$$P(A_1 \cup A_2 \cup \dots \cup A_k) \le P(A_1) + P(A_2) + \dots + P(A_k)$$
$P(A_1 \cup A_2 \cup \dots \cup A_k)$ just refers to the probability of the union
of all k events. We can prove this theorem relatively
straightforwardly by observing that
$P(A \cup B) + P(A \cap B) = P(A) + P(B)$, for events A, B. This is true
because the union is defined to be all distinct elements
in A and in B, removing duplicates that are present in
both, and the intersection is defined to be all duplicates.
Therefore, their sum covers every element of A and every
element of B, with shared elements counted twice, so the
elements covered on each side are
identical, so the sum of the weights / probabilities will be
as well.
A direct extension of this result is that
$P(A \cup B) \le P(A) + P(B)$, because $P(A \cap B) \ge 0$. Therefore, we
can repeatedly apply this corollary to all k events to get
the Union Bound. For example,
$$P(A \cup B \cup C) \le P(A \cup B) + P(C) \le P(A) + P(B) + P(C)$$
and so on.
If the events are pairwise disjoint, $P(A \cap B) = 0$,
meaning $P(A \cup B) = P(A) + P(B)$ from above, so
through the same inductive method, there is equality
in the union bound.
If $P(A_1) + P(A_2) + \dots + P(A_k) < 1$, then
$P(A_1 \cup A_2 \cup \dots \cup A_k) < 1 = P(W)$
if we treat W as an event. Therefore, $A_1 \cup A_2 \cup \dots \cup A_k \ne W$.
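The identities in this section can be checked on a small example. The sketch below is purely illustrative: the fair six-sided die, the events A, B and C, and the helper name `prob` are assumptions chosen for the demonstration and do not come from the article.

```python
from fractions import Fraction

# A finite probability space: a fair six-sided die (an assumed example).
# Each outcome is mapped to its weight, and the weights sum to 1.
W = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(W.values()) == 1

def prob(event):
    """Probability of an event: the sum of the weights of its elements."""
    return sum(W[x] for x in event)

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is at least 4"
C = {1}         # disjoint from A

# P(A union B) + P(A intersect B) = P(A) + P(B)
assert prob(A | B) + prob(A & B) == prob(A) + prob(B)
# Union bound: P(A union B) <= P(A) + P(B)
assert prob(A | B) <= prob(A) + prob(B)
# For disjoint events the bound becomes an equality.
assert prob(A | C) == prob(A) + prob(C)
```

Exact fractions are used instead of floats so that the equalities hold exactly rather than up to rounding.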
FOURTH OF JUNE 2019 | 41
Example Problem
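Let’s look at an example problem now. This is an actual
theorem proven by Paul Erdős using the probabilistic
method and is considered a classic: if $\binom{n}{m} < 2^{\binom{m}{2} - 1}$,
then the edges of $K_n$ can be coloured in two colours so that
there is no monochromatic $K_m$.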
Instead of trying to construct a coloured $K_n$ with no
monochromatic $K_m$, we will construct a probability
space, W, then add up the probabilities of all events where
a $K_m$ is monochromatic and use the union bound to show that we haven’t
covered all possible outcomes, so there must be some case
in which $K_n$ can be coloured with no monochromatic $K_m$.
Let the probability space be all two-colorings of the
edges of $K_n$, with a uniform distribution. If S is a set of m
vertices of $K_n$, let $A_S$ be the event that the corresponding
subgraph of $K_n$ is monochromatic. It’s enough to
check that
$$\sum_S P(A_S) < 1$$
because by a corollary of the union bound, this means
that $\bigcup_S A_S \ne W$, so there must be some colouring in which
no such subgraph of $K_n$ is monochromatic,
so we would have proven that we can colour $K_n$ in two
colours with no monochromatic $K_m$.
So, let’s check it. We will now calculate $P(A_S)$. Given m
vertices from the n, the number of edges in the corresponding
subgraph is $\binom{m}{2}$, because in $K_n$, any two vertices
have an edge between them, so the same will be true
here. The probability of a single edge being a given one of the
two colours is 0.5, and there are two options for which
colour all edges are coloured in. Therefore,
$$P(A_S) = 2 \cdot \left(\tfrac{1}{2}\right)^{\binom{m}{2}} = 2^{1 - \binom{m}{2}}$$
You should be getting excited at this point, as this is the
reciprocal of a quantity mentioned in the question:
we’re close! We can select a set of m vertices from n in
$\binom{n}{m}$ ways, so we have:
$$\sum_S P(A_S) = \binom{n}{m} \cdot 2^{1 - \binom{m}{2}} < 1$$
by the condition given in the question statement. So,
we’ve done it!
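To see the argument at work on a case small enough to enumerate, the following sketch compares the union-bound estimate with an exhaustive count over every two-colouring. The function names and the choice n = 5, m = 4 are assumptions made for illustration; brute force is only feasible for very small n.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def union_bound(n, m):
    """Sum over all m-subsets S of P(A_S) = C(n, m) * 2^(1 - C(m, 2))."""
    return comb(n, m) * Fraction(2, 2 ** comb(m, 2))

def fraction_with_monochromatic(n, m):
    """Exact fraction of two-colourings of K_n containing a monochromatic K_m."""
    edges = list(combinations(range(n), 2))
    # For each set S of m vertices, the edges of the corresponding subgraph.
    subgraphs = [list(combinations(s, 2)) for s in combinations(range(n), m)]
    bad = 0
    for colours in product((0, 1), repeat=len(edges)):
        colour = dict(zip(edges, colours))
        if any(len({colour[e] for e in sub}) == 1 for sub in subgraphs):
            bad += 1
    return Fraction(bad, 2 ** len(edges))

n, m = 5, 4                               # small enough to enumerate
bound = union_bound(n, m)                 # 5 * 2^(1 - 6) = 5/32
exact = fraction_with_monochromatic(n, m)
assert exact <= bound < 1                 # so some colouring has no monochromatic K_4
```

The exact fraction can be smaller than the bound because the union bound counts colourings with several monochromatic subgraphs more than once.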
Though I’m sure at the start you were confused about
how to actually solve problems using this probabilistic
method, I hope going through this example has made
things a bit clearer for you. The crux of the approach is
setting up a probability space then adding up the probabilities
of events you don’t want to happen and showing
that these events don’t cover the probability space entirely,
meaning what you do want must exist in the probability
space. A great way to solidify your understanding
is doing more problems. I highly recommend this handout
(https://web.evanchen.cc/handouts/ProbabilisticMethod/ProbabilisticMethod.pdf)
by Evan Chen (and
frankly all of his handouts!) to whoever is interested.
It’s also worth pointing out that the notion of expected
value is very important in the probabilistic method but
was omitted here to avoid making the article too long.
Unfortunately for you, the wannabe chief, reality
doesn’t quite work like this, so you actually do need to
buy that £10,000 shirt if you want to brag about
it.
Zachary Marinov KS
Hilbert’s Hotel
Hilbert’s Hotel was first introduced by David
Hilbert in 1924 for the purpose of challenging our
preconceived notions about the idea of infinity.
Essentially, it shows that a hotel which is fully
occupied and has infinitely many rooms can still
accommodate additional guests. This sounds
incredibly counterintuitive; if there are a countably
infinite number of rooms, and these rooms are all
occupied, how could more guests be brought in?
Surprisingly, there are a variety of different ways to
tackle this problem, although none of them involve
putting more than one guest in a room, which may
have been more practical…
How could a finite number of new guests
be accommodated in such a hotel?
The easiest way to solve this problem is simply
moving each guest into the next room. This would
involve moving the guest in room 1 to room 2, the
guest in room 2 to room 3, and so on and so forth,
allowing new guests to enter the rooms which are
emptied. In formulaic terms, if we assume that a
number of new guests Y want a room, and the room
number is represented by n, we can move the guests
already staying in the hotel from room n to room n
+ Y, freeing rooms 1 to Y. Interestingly, they would have to do this at the
exact same time, otherwise this process would take
an infinite period of time as there are infinitely many rooms
(of course, this would never be possible in real life,
but hypothetically it is). This seems easy enough.
However, the issue here is that this method will only
work for a finite number of new guests: if Y were infinite,
there would be no room numbered n + Y for
the existing guests to move into.
This would pose a problem in the highly unlikely
scenario that an infinite number of new guests
suddenly turned up on the doorstep and demanded a
place to stay. How would we accommodate both
the infinite guests already in the full hotel, and the
infinite new guests who want their own room? The
solution is actually quite simple; move each guest
already in a room to the room which is double their
current room number; thus, the guest in room n
would move to room 2n, and so on. This would result
in all the guests already in the hotel now staying in
even numbered rooms, which would leave every odd
numbered room available for the infinite number of
new guests. Incredibly, this simple solution allows for
an infinite number of guests to get rooms in a hotel
with an infinite number of rooms which were all full!
Notice the implication here: we have split an infinite
set into two infinite sets, each of which has the
same size as the original infinite set.
We now know how we would house an infinite
number of new guests in a hotel with an infinite
number of rooms and an infinite number of guests
staying in all of these rooms. However, this is
probably too simple for you. Imagine now, that there
are an infinite number of buses, which each carry
an infinite number of guests and pull up in front
of Hilbert’s hotel. The first step is identical to the
previous one; move all the guests already in the hotel
to the room number which is twice the number of
their current room, leaving all of the odd numbered
rooms open. Of course, since there are an infinite
number of infinite guests, not just an infinite number
of guests, it is not possible to simply move all guests
to the odd numbered rooms as done before. The
solution here is actually quite beautiful. What
we now must do is assume that the buses are each
numbered, and that so are the guests’ seats. The
guest on seat 1 of bus 1 would have to move into room
3, the guest on seat 2 would have to move into room
9, the guest on seat 3 would have to move into room
27, and on and on. As you have probably noticed,
they have to move into the room which is expressed
by $3^n$, with n being their seat number. But why 3?
The answer is actually quite simple: 3 is the smallest
prime number, apart from 2. Powers of
2 cannot be used here, as that would yield even
numbers, which are already occupied by the previous
guests. The guests on the second coach would therefore move into
the rooms expressed by $5^n$, and those on the third coach into
$7^n$, and so on. The reason as to why prime numbers
can be used here is because this guarantees that no
guests on any of the infinite number of coaches will
have to go into the same room as anyone else; the
Fundamental Theorem of Arithmetic shows that all
positive integers apart from 1 can be expressed as a
product of one or more primes, in one unique way!
Notice also, that even an infinite number of infinite
guests will still leave many rooms empty, that is to
say, odd numbers which are not powers of prime
numbers, such as 15.
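The room assignment for the buses can be sketched in a few lines. The helper names (`nth_prime`, `room_for_existing_guest`, `room_for_bus_guest`) are hypothetical, and the check only covers the first few buses and seats, since a program cannot enumerate an infinite hotel.

```python
from itertools import count

def nth_prime(k):
    """Return the k-th prime, counting from nth_prime(1) == 2 (trial division)."""
    found = 0
    for candidate in count(2):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            found += 1
            if found == k:
                return candidate

def room_for_existing_guest(n):
    """Guests already in the hotel move from room n to room 2n."""
    return 2 * n

def room_for_bus_guest(bus, seat):
    """Bus 1 uses powers of 3, bus 2 powers of 5, bus 3 powers of 7, and so on."""
    return nth_prime(bus + 1) ** seat

# No two guests share a room (checked for a finite sample only).
rooms = [room_for_existing_guest(n) for n in range(1, 51)]
rooms += [room_for_bus_guest(b, s) for b in range(1, 11) for s in range(1, 11)]
assert len(rooms) == len(set(rooms))
```

Room 15 never appears in the list, matching the observation that odd numbers which are not prime powers stay empty.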
But what if there were more? A popular version
of Hilbert’s Hotel involves it being a luxury resort
on the seaside, and what better way to get to the
seaside than on a boat? However, boats aren’t safe,
and it would be much safer to be on a bus on a boat,
wouldn’t it?
So, what if there were infinite ferries carrying infinite
buses each, which each carry infinite guests? Well,
we now have not one, not two, but three layers of
infinity. So, it follows that we should have a number
raised to a power which is raised to a power. As
before, the first step is the same; we must move all
the previous guests to the room number which is
twice their original one. Then, let the ferry
number be s, the bus number be m, and the seat number
be n. This time, the steps are slightly more complex:
you would have to raise the (m+1)st prime number to the
power of the seat number, then raise the (s+1)st prime number to that
number. This may seem confusing, but the (s+1)st
prime number has to be used, as the first prime
number is 2, which, again, is not possible in this
situation. Therefore, the guest on n = 1 of m = 1 of s =
1, that being the guest on the first seat of the first bus
on the first ferry, would have to enter room number
27, as this is the result of raising 3 to the power of 3
to the power of 1. What you may have realized is that
the room numbers needed will increase dramatically
from one guest to another; the guest on seat 2 of bus
1 on ferry 1, i.e., the guest presumably sitting right
next to guest 1, would have to enter room number
19683, as this is the result of raising 3 to the power
of 3 to the power of 2. The next guest would enter
room 7625597484987, so hopefully they don’t have
a fear of heights!
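The three room numbers quoted above can be reproduced with a short sketch. It assumes the rule is "the (s+1)st prime raised to the power of the (m+1)st prime raised to the seat number", which is the reading consistent with the worked numbers; the function names are hypothetical.

```python
from itertools import count

def nth_prime(k):
    """Return the k-th prime, counting from nth_prime(1) == 2 (trial division)."""
    found = 0
    for candidate in count(2):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            found += 1
            if found == k:
                return candidate

def room_for_ferry_guest(ferry, bus, seat):
    """Room number for the guest on a given seat of a given bus on a given ferry."""
    inner = nth_prime(bus + 1) ** seat      # prime for the bus, raised to the seat number
    return nth_prime(ferry + 1) ** inner    # prime for the ferry, raised to that result

# First three seats of bus 1 on ferry 1.
first_three = [room_for_ferry_guest(1, 1, seat) for seat in (1, 2, 3)]
```

Python integers have arbitrary precision, so even these rapidly growing room numbers are computed exactly.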
This may seem all very pointless. But what Hilbert’s
Hotel does very aptly demonstrate is that the
cardinality, which is the number of elements in a set,
is equal for both the odd numbered rooms and all
rooms. In a broader sense, Hilbert’s Hotel helps us
to understand that any countably infinite set
can be put into one-to-one correspondence with the set of all
natural numbers. Thus, the cardinality of the set of
all odd numbers is equal to the cardinality of the set
of all natural numbers, even though the set of natural
numbers includes even numbers as well! This is because the
set of odd numbers is countably infinite, and thus can
be paired off with the set of all natural numbers.
Richard Kim KS
Neural Networks and Their Use in AI
To understand the concept of neural networks, first
we must delve into what AI is and the role neural
networks play in it. Artificial Intelligence is the ability
of machines to perform cognitive tasks at human
level, ranging from chess to speech recognition. So
how does this work? How can a device coded from
only ones and zeros perform such things? The answer
is simply that the software is exposed to incredible
amounts of data (the more the better) and it detects
patterns within the data. Then, when coming across
a real-life issue, it uses the patterns it has learnt to
solve the problem. This process is called machine
learning, a subset of AI. All of these calculations are
done on neural networks.
First, I would like to introduce you to the structure
of a neural network. Figure 1 is the most basic form
of a neural network. There are 3 inputs, which are the
data given to the network, and there is an output,
the answer or solution. The middle layer is called the
hidden layer because its values aren’t specifically
observed. The circles in the example are called nodes,
also known as perceptrons or neurons. These are the
building blocks of the network. Each node holds a
value between 0 and 1. The value it holds is related to
how active a certain node is. To explain how this
network works I will give it an objective: to identify
handwritten digits.
A number is given through the input. Then it goes
through the hidden layers. Each node in the first
layer represents a specific pixel in the photo. If a
pixel is darker than its surroundings, suggesting
it’s part of the number, the corresponding node
becomes active. Each node in the next layer, instead
of being attached to a certain pixel, corresponds to
a cluster of pixels, like an edge. Nodes in this layer
become active signifying the edges in the number,
but the computer still can’t deduce the number. The
next layer is the addition of these edges to make
certain shapes; for example, in a nine there is a circle
at the top and a line coming down on the right of it.
Then, finally, after this layer the computer
adds the two shapes and deduces a nine. This is not a
realistic neural network, as normally for more complex
problems, there would be many more hidden
layers.
These nodes activate and deactivate, but how is it
decided if they are active? There is an equation for the
perceptron’s actions, as shown in the picture below.
To explain the equation, let’s isolate a node from a
hidden layer. From the previous three nodes we get
three numbers, all between 0 and 1, according to
how active the previous nodes were. Each of those numbers
is multiplied by its weight, a number specific to
the node. This number can change in every example,
but the reason it’s there is to take the importance of
that previous node into account. For example, if the
previous nodes represented the circle, the vertical
line and the small flick of a nine, we would say the
flick is the least important. It doesn’t fully distinguish
whether a number is a nine. Therefore, the weight
multiplies the activation number of the flick node to
take into account that it isn’t as important. On the
other hand, the circle of the nine is very important,
so the weight is multiplied to amplify its importance.
Then, the weighted activation numbers are summed and
added to another number called the bias
(I will talk about that later), and the result is passed
through a non-linear function. The non-linear function
squashes the total in the brackets
into a number between 0 and 1, which is the
node’s activation. Then that activation number is fed
to the next layer and the process is repeated.
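As a rough sketch of the calculation just described, the code below computes one node's activation from three previous nodes. The article does not say which non-linear function is used, so the sigmoid here is an assumption, as are the example activations, weights and bias.

```python
import math

def sigmoid(z):
    """A common non-linear function that maps any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_activation(inputs, weights, bias):
    """Weighted sum of the previous activations, plus the bias, squashed into (0, 1)."""
    total = sum(a * w for a, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Assumed activations of the three previous nodes: circle, vertical line, flick.
previous = [0.9, 0.8, 0.1]
# Assumed weights: the circle matters most, the flick least.
weights = [2.0, 1.5, 0.2]
bias = -2.0

activation = node_activation(previous, weights, bias)
assert 0.0 < activation < 1.0
```

Changing the flick's activation barely moves the result because its weight is small, whereas changing the circle's activation moves it a lot.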
How are the neural networks trained in the first
place? This has to do with the weights and the bias.
As I mentioned, a weight changes the
activation number of a neuron according to its
importance towards the solution. The bias has a
similar aim. It also changes the activation number, but
it does not depend on the previous neurons. Without
getting into their differences too much, both biases
and weights can completely change the solution of
a program. At the very start when the network is
still being trained, its weights and biases are random.
Therefore, it gives the wrong answer. However, each time it is
fed the right answer, the weights and biases change
slightly. Over time, it gets fine-tuned enough such
that it is nearly perfect. Importantly, you don’t want
to go past the perfect amount, or the weights get
too specific to the training examples. This leads to the
network getting new examples wrong, as a handwritten nine
is unlikely to be written with
absolutely perfect handwriting. All in all, weights
and biases are what train the network, and there
is a sweet spot where the amount of ‘training’ isn’t too
specific or too general.
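The idea of starting from random weights and nudging them each time the right answer is shown can be sketched with a single node. This is a toy illustration, not the method used by any particular system: the update rule is a simple gradient-descent step (an assumption, since the article does not name a rule), and the training data is the logical OR of two inputs rather than handwritten digits.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training data (assumed for illustration): inputs and the right answer.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

rng = random.Random(0)                               # fixed seed for repeatability
weights = [rng.uniform(-1, 1) for _ in range(2)]     # random starting weights
bias = rng.uniform(-1, 1)                            # random starting bias
learning_rate = 0.5

for _ in range(2000):
    for inputs, target in examples:
        output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        error = output - target                      # how far off the node was
        # Change each weight and the bias slightly to reduce the error.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error

predictions = [round(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias))
               for inputs, _ in examples]
```

After training, `predictions` can be compared with the right answers in `examples` to see how well the node has learnt the pattern.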
Now, moving away from the technicalities, I would
like to address the importance of the role AI and
neural networks will play in our future. The progress
that humans have made in just the last decade
is equal to the whole of the 18th Century. This is
incredible, and I don’t believe such progress will slow
down; it will only continue to get faster. Currently
we are moving towards a new era, the Augmented
Age. This is the fastest jump between eras we have
ever made, as the Information Age, which began in
the mid-20th Century, is drawing to an end. What
is the Augmented Age? The Augmented Age is the
time when the borderline between the physical
world and the virtual world will be less defined; this is
called augmented reality.
The next steps for computer science are clear. Currently
neural networks give us something called narrow
AI. This means that the AI generated can only
be used for one purpose, e.g. a chess AI can play chess
but cannot drive a car. However, in the future AGI
and Superintelligence may be a reality. AGI means
Artificial General Intelligence; this is human level AI.
It is AI that can handle the various cognitive tasks humans can
do with one program, instead of a program for each
task, hence ‘general’. The reason this is so difficult to
reach is that the variables are not well defined. In a chess
program, the program knows the definite rules of
chess. It knows the parameters of what can happen,
and nothing can change that. However, even trying
to make narrow AI for self-driving cars is hard, as
there are too many undefined parameters, such as a
dog running onto the road. Humans can deal with all
of that with their cognitive abilities but computers
can’t. Superintelligence is one step further. That is
the level of AI more powerful than humans, being
able to solve problems that humans aren’t able to.
Koza Kurumlu (BJH)
Pouring Drinks
Game Theory
If you have a sibling, you might already be familiar
with this problem. You both have a carton of
juice that you must divide between yourselves.
How do you do this fairly? This is a trivial game
theory situation where the dominant strategy of
either sibling would be to pour themselves more
juice at the expense of the other. Therefore, we
need a procedure where neither sibling loses out
(i.e., each sibling gets exactly ½ of the juice). The
solution for two people is quite simple: Person
A pours two glasses of juice; Person B chooses
which glass they want. A quick proof for why this
is fair goes as follows:
Let the amount of juice be 1 unit. Person A
pours x units of juice into one glass and 1 – x units
of juice into the other. Person B will choose
the glass which holds max(x, 1 – x) units of
juice, which means Person A gets min(x, 1 – x).
Person A will want to maximize their share, so
they will aim for the largest value of min(x, 1 – x), which is 0.5
(at x = 0.5), as we can see from the graph below. Since the most
Person A can guarantee is ½ of the juice, this procedure is
fair since Person B also gets at least ½.
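The maximisation can also be checked numerically in place of reading it off a graph. This is a small sketch; the function name and the grid of candidate pours in steps of 0.01 are assumptions for illustration.

```python
def share_of_pourer(x):
    """Juice left for Person A when Person B takes the fuller of the two glasses."""
    return min(x, 1 - x)

# Candidate amounts for the first glass: 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]

# Person A picks the pour that maximises what they are left with.
best_x = max(candidates, key=share_of_pourer)
best_share = share_of_pourer(best_x)
```

Any uneven pour gives Person A strictly less than the even split, which is why pouring two equal glasses is the best Person A can do.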
A procedure for 3 people
Now, some people are unfortunate enough to
have two siblings, so we need a new procedure to
deal with this. Here, we have an added complexity
where we need to prevent two people from conspiring
to give the third person an unfair share so
they can both share the rest between them. We
will again start with 1 unit of juice:
Person A pours what they think is ⅓ units of juice
into one glass which they then pass to Person B.
Person B can now choose to pour some of the
juice back into the carton if they believe the glass
contains more than ⅓ units of juice. Finally, the
glass is passed to Person C who can either choose
to accept the juice or not. If they accept it, Person
A and Person B then use the procedure for 2
people to divide the rest of the juice. If Person C
rejects this glass, Person B gets the glass if they
poured any juice back into the carton. If Person B
did not pour any juice back into the carton, Person
A gets the glass. The remaining two people then
carry out the procedure for 2 people to divide the
rest of the juice.
The reason this procedure works is that we first
ensure Person C gets ⅓ of the juice. If Person A
pours less than ⅓ units of juice, Person B will not
pour any back into the carton and Person C will
reject the glass, meaning that Person A must take
the glass with less than ⅓ units of juice. Person A
will want to avoid this by pouring at least ⅓ units
of juice. When this glass is passed to Person B, if
the glass contains more than ⅓ of the juice, they
will pour some back into the carton. This is so
that if they carry out the procedure for 2 people
later with Person A, there are ⅔ units of juice left,
allowing for Person B to get at least ⅓ of the juice.
If Person B tries to give Person C less than a ⅓,
Person C will simply reject it and Person B will
have to take the glass.
So now that we have ensured Person C gets ⅓ of
the juice, we have ⅔ units of juice left which can be
divided using the procedure for 2 people. We know
that this ensures each person gets ½ of the juice
remaining, so this means every single person is left
with ⅓ units of juice.
A procedure for n people
Finally, how can we devise a procedure where we
divide 1 unit of juice amongst n people such that
each person gets 1/n units? We can use an iterative
method here where each iteration, one person
walks away with 1/n units of juice. The way this
works is as follows:
Person A pours the juice into a glass, then any of
the other n – 1 people can choose to tell Person A
to stop pouring the juice. Once someone has told
Person A to stop, Person A stops pouring, and the
glass is given to the first person who said stop. The
process is then repeated until we are left with one
last person who then carries out the procedure for
2 people with Person A.
This solution works because everyone has an incentive
to tell Person A to stop as soon as 1/n units
of juice have been poured. This is because as soon
as more is poured, someone else can tell Person
A to stop, and now the rest of the players are
left with less than (n-1)/n units of juice, meaning
someone must walk away with less than 1/n units
of juice. Therefore, we can simply keep repeating
this process until two people are left and everyone
will walk away with a fair share.
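The procedure for n people can be simulated under some simplifying assumptions: each person other than the pourer is modelled by a single "stop level" (the amount in the glass at which they shout stop), the lowest level always shouts first, and the final two people end up splitting what is left equally. The function and variable names are hypothetical.

```python
from fractions import Fraction

def divide_juice(stop_levels):
    """Simulate the pouring procedure for 1 unit of juice.

    stop_levels maps each person other than the pourer 'A' to the glass level
    at which they shout 'stop'. Returns everyone's share, including 'A'.
    """
    remaining = Fraction(1)
    waiting = dict(stop_levels)
    shares = {}
    while len(waiting) > 1:
        # The person with the lowest stop level shouts first and takes the glass.
        first = min(waiting, key=waiting.get)
        glass = min(waiting.pop(first), remaining)
        shares[first] = glass
        remaining -= glass
    # Final two people: the procedure for 2 people, assumed to end in equal halves.
    last = next(iter(waiting))
    shares[last] = shares["A"] = remaining / 2
    return shares

quarter = Fraction(1, 4)
# Four people in total; everyone stops the pour at exactly 1/4.
fair = divide_juice({"B": quarter, "C": quarter, "D": quarter})
# B shouts too early; C and D still stop at 1/4 and lose nothing.
impatient = divide_juice({"B": Fraction(1, 10), "C": quarter, "D": quarter})
```

In the second run the person who shouts early ends up with less than 1/n, while those who wait for 1/n still receive at least their fair share.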
Aarit Bhattacharya KS
References
Stromquist, W. How to Cut a Cake Fairly, The
American Mathematical Monthly, Vol. 87, No. 8
(Oct. 1980), pp. 640-644
Talwalkar, P. How To Split A Cake Fairly Using
Math – Game Theory Tuesdays, Accessed from
https://mindyourdecisions.com/blog/2015/12/22/how-to-split-a-cake-fairly-using-math-game-theory-tuesdays/
Happy pouring!
1992 Putnam Competition Question A6
Introduction
The William Lowell Putnam Mathematical
Competition is the preeminent mathematics competition
for undergraduate students in the United States
and Canada. The competition consists of two 3-hour
sessions spread across one day. During each session,
participants work individually on 6 challenging mathematical
problems. The questions are labelled from A1
to A6 and B1 to B6, with questions A6 and B6 typically being
the hardest of them all. I will be discussing question
A6 from the 1992 Putnam Competition.
Question statement
The question asks,
“Four points are chosen independently and at
random on the surface of a sphere, using a
uniform distribution (each point equally likely
to be chosen). What is the probability that the
center of the sphere lies inside the tetrahedron
whose vertices are at the four points?”
At first, this question can seem quite daunting, as it’s hard to
visualize all of the different possible tetrahedrons; frankly, it’s
also difficult because you don’t know where to start! In order
to simplify the question, we can first look at a similar 2D problem
which we can then use to solve the 3D case.
Problem in 2D
Let’s imagine that the question asks, “Three points are chosen
at random on the circumference of a circle. What is the probability
that the center of the circle lies inside the triangle whose
vertices are at the three points?” This now greatly simplifies
the problem, making it easier to understand. Let’s select two
points, and draw two diameters, each passing through one of
the two chosen points and landing on the other side of the circle
at endpoints. The arc formed by these endpoints, that does
not contain the chosen points, must contain the third point
for the center to be contained by the triangle (see below). To
get your head around this insight, you can think about what
happens when the 3rd point is a diameter endpoint, and then
if you move it slightly outside the region.
Let’s now have a fresh look at the problem. Instead of selecting
points initially, we can choose two random diameters and
then randomly choose where our third point will be, as it will
end up in one of the four arcs created (see below). Finally, we
can randomly assign one end of each diameter to be the first
two points. In this situation, there are two options for each diameter
so four possibilities across both diameters. Only one of
these situations leads to the pre-chosen third point being on
the arc opposite the first two points. Therefore, the probability
of the triangle created by choosing three random points on
the circumference of a circle containing the center is ¼.
Problem in 3D
If we apply this logic to our 3D case, we can first randomly
choose three diameters in our sphere, and then randomly
choose where the fourth point will be, as it will be in one
of the 8 sections created by the three diameters. Finally,
we will randomly select an end of each diameter to place a
point. In a similar way to the 2D case, here we have 8 different
possibilities for where the first three points could be. Only one
of them leads to the fourth point being in the opposite sector,
or on the opposite side of the center, to the first three points,
where the tetrahedron created would contain the center of
the sphere. This means that we have a 1 in 8 chance of the tetrahedron
created by choosing four random points on the surface
of a sphere containing the center of the sphere.
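Both answers can be sanity-checked with a quick simulation. The sketch below is a Monte Carlo estimate, not a proof, and its details are assumptions: random points are generated by normalising Gaussian samples, and the centre is tested using the signs of determinants (the centre is inside exactly when it can be written as a combination of the vertices with all-positive weights).

```python
import random

def random_unit_vector(dim, rng):
    """Uniform random point on the unit circle (dim=2) or unit sphere (dim=3)."""
    while True:
        v = [rng.gauss(0, 1) for _ in range(dim)]
        norm = sum(c * c for c in v) ** 0.5
        if norm > 0:
            return [c / norm for c in v]

def cross2(a, b):
    return a[0] * b[1] - a[1] * b[0]

def det3(a, b, c):
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def same_sign(values):
    return all(v > 0 for v in values) or all(v < 0 for v in values)

def centre_in_triangle(a, b, c):
    return same_sign([cross2(a, b), cross2(b, c), cross2(c, a)])

def centre_in_tetrahedron(a, b, c, d):
    return same_sign([det3(b, c, d), -det3(a, c, d), det3(a, b, d), -det3(a, b, c)])

def estimate(dim, trials, seed=0):
    """Fraction of random triangles (dim=2) or tetrahedra (dim=3) containing the centre."""
    rng = random.Random(seed)
    inside = centre_in_triangle if dim == 2 else centre_in_tetrahedron
    hits = sum(inside(*[random_unit_vector(dim, rng) for _ in range(dim + 1)])
               for _ in range(trials))
    return hits / trials

estimate_2d = estimate(2, 20000)   # should be close to 1/4
estimate_3d = estimate(3, 20000)   # should be close to 1/8
```

With more trials the two estimates settle closer to ¼ and ⅛ respectively, in line with the argument above.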
Conclusion
Here we can see that even one of the hardest questions in an
extremely difficult competition, which at first seems exceptionally
challenging to wrap our heads around, can be solved
in quite an elegant manner if we work through the logic step
by step. Instead of trying to rush into the problem head-on,
sometimes we must take a step back and see what we can do
to simplify the question. In this situation, it would have been
exceedingly strenuous to try solving this question only in 3D,
but after having solved a similar question in 2D, we were able
to apply our knowledge to the 3D case as well. A question
which 123 of the 203 undergraduate level participants submitted
no solution for was able to be solved and understood by
a much younger audience such as ourselves. This article shows
that a change of perspective is all that is needed to solve even
some of the most challenging problems.
Rajas Nanda KS
Sophie Germain, Fermat and Primes
Number theory, the study of the natural (positive
whole) numbers, makes up a large part of both ancient
and modern pure mathematics. It is
one of the oldest, and often considered the purest,
of mathematical pursuits and was described as the
‘queen of mathematics’ by the great mathematician
and physicist Carl Friedrich Gauss. Its aim is to
delve into the complex and interesting relationships
between integers, and before the advent of computer
science, was considered solely as an object of
fascination (for those with that kind of curiosity).
Whether or not you agree with that conclusion,
number theory nevertheless provides many
interesting avenues for us to explore.
Some of these relationships lie waiting
to be discovered or proven, and generations of
mathematicians have dedicated their lives to fully
understanding and proving a conjecture that may
or may not turn out to be unsolvable. This idea is
perhaps most relevant in Fermat’s infamous Last
Theorem, which was finally proved in 1995 after more
than 350 years of scrutiny by the world’s greatest
minds. However, this story has been told one too
many times, and instead I’m going to attempt to
introduce and explain the individual cases that were
scattered across the 17th, 18th, and 19th centuries,
as well as the story of the first real attempt to prove
the general case. If anyone needs to brush up on
their memory of Fermat’s Last Theorem, it states
that there are no integers $x, y, z > 0$ and $n > 2$
such that $x^n + y^n = z^n$. When looking at the theorem,
the relevance of number theory, and especially of
primes, is obvious. The fundamental theorem of
arithmetic (which I won’t prove here) states that
every integer greater than 1 is either prime (a natural
number larger than 1 that is not the product of two
smaller natural numbers) or can be represented as a
product of primes. It follows from this and from the
fact that $a^{rs} = (a^r)^s = (a^s)^r$ that n can be split into its
component primes, and thus for a composite number
n the equation $x^n + y^n = z^n$ can be expressed as
$(x^m)^p + (y^m)^p = (z^m)^p$, where p is prime and $mp = n$.
Note that we only need to concentrate on odd
primes and on 4. If n is a power of 2 greater
than or equal to 4, it has no odd prime factor, but it is
a multiple of 4, so we can write $n = 4m$ and use the
exponent 4 instead. If n is already prime, we don’t need to worry, and
so this shows that we can always express the equation
with the exponent as an odd prime or 4. This means
that if we can show that Fermat’s Last Theorem is
true for all odd primes and 4, then it holds true for
all integers greater than 2. This reduction doesn’t prove
anything by itself, but this was the deceptively promising
situation in the early 19th century.
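The reduction can be made concrete with a few lines of code. This is a minimal sketch with hypothetical function names: for an exponent n > 2 it returns an exponent e that is either an odd prime or 4, together with m such that n = e·m, so that $x^n + y^n = z^n$ becomes $(x^m)^e + (y^m)^e = (z^m)^e$.

```python
def smallest_odd_prime_factor(n):
    """Return the smallest odd prime dividing n, or None if n is a power of 2."""
    while n % 2 == 0:
        n //= 2
    if n == 1:
        return None
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n  # no smaller odd divisor, so what remains is itself prime


def reduce_exponent(n):
    """Write n = e * m with e an odd prime or 4, and return (e, m)."""
    if n <= 2:
        raise ValueError("Fermat's Last Theorem concerns exponents n > 2")
    e = smallest_odd_prime_factor(n)
    if e is None:
        e = 4  # n is a power of 2 that is at least 4, hence a multiple of 4
    return e, n // e


for n in (6, 7, 8, 15):
    e, m = reduce_exponent(n)
    print(f"n = {n}: exponent {e}, with x, y, z replaced by their {m}th powers")
```

Any solution for the exponent n would give a solution for the exponent e, which is why only odd primes and 4 need to be ruled out.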
Fermat himself managed to prove the case
for n=4 using an infinite descent method, and Euler
proved the case for n=3 in 1770. The case for n=5
seemed to be a little trickier to crack, and it required
a preliminary proof called Sophie Germain’s theorem,
which is where we introduce the main character.
Sophie Germain was born in Paris in 1776 to
relatively wealthy parents. When she was only 13, the
French Revolution began, and with chaos on the
streets of Paris, Sophie retreated into the family’s
library and started reading Montucla’s L’Histoire
des Mathématiques. The story goes that she read
about Archimedes’ death at the hands of a Roman soldier whom he
was ignoring because he was absorbed in a mathematical
problem. She was fascinated by the fact that
someone could be so engrossed in mathematics, and
so started herself on the works of Newton, Euler
and Bézout. Unfortunately, this was 18th century
France, which was not particularly welcoming to
female scientists and mathematicians. Her parents
disapproved of her interest and denied her clothes
and a fire to stop her nocturnal studying.
When she was 18, the École Polytechnique
opened in Paris for the purpose of studying
mathematics and science. However, the idea of a
woman joining the university was out of the question.
Sophie instead made do with obtaining the lecture
notes and submitting observations under the
pseudonym Monsieur LeBlanc. Some of her insights
impressed Joseph-Louis Lagrange, who was at that
time giving lectures on analysis, so much so that
he asked to meet “Monsieur LeBlanc” in person.
Upon learning this, she was forced to reveal her real
identity, but Lagrange’s respect for her did not waver.
Instead, he became her sponsor and mentor.
Germain had interests in many areas of
mathematics and physics, but she spent most of her
time working on theories of elasticity (an equally
fascinating subject) and on number theory. In
particular, she was the first person to make a viable
attempt to fully prove Fermat’s Last Theorem.
Considering only odd prime exponents,
Fermat’s Last Theorem is often split into two cases.
For the equation $x^p + y^p = z^p$, where x, y and z
are pairwise coprime (a factor shared by any two of them
would also divide the third), the first case is where p doesn’t
divide $xyz$, and the second is where it does. The
result of Sophie Germain’s theorem is that if certain
conditions hold for the odd prime exponent p, then $p^2$
divides $xyz$, and so the first case has no solutions. These
two conditions are that there exists an auxiliary prime q such that:
• there is no value for x such that $x^p \equiv p \pmod{q}$
• $x^p + y^p + z^p \equiv 0 \pmod{q} \implies x \equiv 0 \pmod{q}$ or
$y \equiv 0 \pmod{q}$ or $z \equiv 0 \pmod{q}$
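The two conditions can be tested by brute force for small primes. The following is a sketch under stated assumptions: the function names are hypothetical, the search simply tries every residue modulo q, and it checks the conditions as listed above rather than following Germain’s own method. For p = 3, for instance, the candidate q = 7 passes both checks.

```python
from itertools import combinations_with_replacement


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_auxiliary_prime(p, q):
    """Check both conditions for the odd prime exponent p and candidate prime q."""
    # Non-zero p-th power residues modulo q.
    residues = {pow(x, p, q) for x in range(1, q)}
    # Condition 1: x^p = p (mod q) must have no solution (x = 0 included).
    if p % q in residues | {0}:
        return False
    # Condition 2: three non-zero p-th powers must never sum to 0 modulo q,
    # so x^p + y^p + z^p = 0 (mod q) forces one of x, y, z to be 0 modulo q.
    return all((a + b + c) % q != 0
               for a, b, c in combinations_with_replacement(sorted(residues), 3))


def auxiliary_primes(p, limit):
    """All auxiliary primes q <= limit for the exponent p."""
    return [q for q in range(3, limit + 1)
            if is_prime(q) and is_auxiliary_prime(p, q)]


for p in (3, 5, 7):
    print(p, auxiliary_primes(p, 100))
```

The search only tells us which small q work for a given p; it says nothing about whether the supply of auxiliary primes is finite or infinite.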
The proof relies mainly on modular arithmetic
and prime factors, as expected, and it draws on
other theorems such as Euler’s theorem and Fermat’s
Little Theorem. The result of this is that for a large
set of prime numbers, a certain case of Fermat’s
Last Theorem has been proven. Sophie Germain
planned to go further by showing that there were
infinitely many auxiliary primes for each odd prime. As
one of x, y or z had to be divisible by each of them,
the finite integers x, y, z would then have infinitely many
prime factors, which is impossible. Unfortunately, this ‘grand plan’, as she
called it, was doomed to fail, as Sophie later proved
herself that some primes in fact have only a finite number
of auxiliary primes. But this was still the first time
that someone had attempted a proof for all odd
prime exponents, and although it didn’t succeed,
she became the first person to make real progress
on the problem, proving the theorem for a specific
case. She did all this in an environment where she was
rarely accepted or encouraged as a mathematician.
When she started working on elasticity, and won a
prize from the Institut de France for developments
in elastic theory, she was still not able to actually
attend the college, and later, when her work on metal
deformations was used to build the Eiffel Tower in
the 1880s, the names of seventy-two scholars who
contributed to the design were listed around the
outside. Sophie Germain was not one of them.
Benedict Harvey KS
The Matrix Exponential
Introduction
Here we will be exploring what it means to raise a
number to a matrix power and how it differs from
the definitions you learn from the real number line.
To achieve this, I assume the reader knows how this
method works for the real numbers and how simple
matrix operations work. Even if you haven’t studied
matrices, this is a remarkably interesting and useful
topic in maths, physics and especially in quantum
mechanics, so this is good further reading for anyone
planning to study maths/physics at university.
Conceptually, this makes no sense. How can you
multiply a number by itself a matrix number of
times? So, to solve this problem, we need to redefine
what it means to raise a number to a matrix power.
Firstly, let’s simplify our case to the expression
exp(A) where A is a matrix and exp() is the natural
exponential function, i.e. $e^A$. This allows us to express
the exponential as a polynomial using the Taylor
series of exp():
$$\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
Replacing our real “x” with our matrix “A” allows us
to evaluate exp(A) since there are no matrices in the
powers. This is the new definition of what it means to
raise a number to the power of a matrix:
$$\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$
This is now a matrix equation, so I have rewritten “1”
as I, the identity matrix. It is important to note that
A must be a square matrix (n x n) since multiplication
requires the number of rows of one matrix to be
equal to the number of columns of the other. Since,
in this case, these are the same matrix, rows must
equal columns and hence A is square. Other than
this, there are no restrictions on A: it can be
any size, contain complex numbers and be
singular or non-singular. However, for this definition
to make sense, the series must converge for every square matrix.
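To see the definition at work, the series can simply be summed term by term. The sketch below is a plain-Python illustration with hypothetical helper names; the cutoff of 30 terms is an arbitrary choice that is ample for small matrices with modest entries, and it is not how numerical libraries compute the matrix exponential in practice.

```python
import math


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=30):
    """Approximate exp(A) by the first `terms` terms of I + A + A^2/2! + ..."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("A must be a square matrix")
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = identity  # holds A^k / k!, starting from k = 0
    for k in range(1, terms):
        # A^k / k! is obtained from A^(k-1) / (k-1)! by multiplying by A / k.
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


# A 1x1 matrix behaves like an ordinary number, so this should agree with e.
print(mat_exp([[1.0]])[0][0], math.e)
```

Each new term reuses the previous one, which avoids computing large powers and factorials separately.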
Proof (simplified)
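Let A be a square (n x n) matrix and let m be the largest
absolute value of any entry of A. Every entry of each term of the
series is then bounded in absolute value by the corresponding term
of a series of non-negative numbers that depends only on n and m.
Looking at the ratio between two consecutive terms of this bounding series
(allowed since its terms are ≥ 0), the ratio tends to 0, so the bounding series converges.
Since all smaller terms will converge quicker, every entry of the series for exp(A) converges too.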
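The steps of this argument can be written out explicitly. The sketch below assumes that m is the largest absolute value of an entry of A and uses a standard bound on the entries of $A^k$; it is one way to fill in the details and not necessarily the form the author had in mind.

```latex
% Step 1: each entry of A^k is a sum of n^{k-1} products of k entries of A.
\left| (A^k)_{ij} \right| \le n^{k-1} m^k \le (nm)^k \qquad (k \ge 1)

% Step 2: ratio of consecutive terms of the bounding series \sum_k (nm)^k / k!
\frac{(nm)^{k+1} / (k+1)!}{(nm)^k / k!} = \frac{nm}{k+1} \longrightarrow 0
\quad \text{as } k \to \infty

% Step 3: comparison, entry by entry
\sum_{k=0}^{\infty} \frac{\left| (A^k)_{ij} \right|}{k!}
\le \sum_{k=0}^{\infty} \frac{(nm)^k}{k!} = e^{nm} < \infty
```

Every entry of the series is therefore absolutely convergent, whatever the square matrix A.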
Now we have a well-defined series equivalent of
raising a number to the power of a matrix; however,
it’s in the form of an infinite series which, in most
cases, cannot be evaluated exactly and must be approximated
by taking a finite number of terms. Fortunately, there are a few cases
which reduce the series to a finite polynomial or a closed form.
Nilpotent case
A square matrix, N, can be described as nilpotent
if $N^x = 0$ for some positive integer x. Since every
power of N larger than x is then also 0, we only
need to evaluate a finite number of terms to find an
exact result for the matrix exponential. An example of
this may be:
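For illustration, here is one nilpotent matrix worked through. The matrix N below is chosen here as an assumption for the sake of the example; any strictly upper triangular matrix would behave in the same way.

```latex
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
N^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
\quad \Longrightarrow \quad
\exp(N) = I + N + \frac{N^2}{2!} + \cdots = I + N
        = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
```

Only the first two terms of the series survive, so the result is exact.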
Diagonal case
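A diagonal matrix is a matrix where the only non-zero
entries are on its leading diagonal. Moreover, any
power of a diagonal matrix is just the power of each
term on the diagonal. Using this it’s easy to prove
that:
$$\exp\begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix} = \begin{pmatrix} e^{d_1} & & \\ & \ddots & \\ & & e^{d_n} \end{pmatrix}$$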
The third case, shown below, is where the powers of the matrix
repeat periodically, so that each entry of the Taylor series is
itself a familiar Taylor series. To demonstrate
this, let’s look at a use of the matrix exponential
in the context of differential equations, its most
common use:
$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$
Hopefully, you can already see how useful matrices
are when it comes to solving any set of linear simultaneous
equations since you can always write them as a
single equation with a matrix constant. Differential
relationships are extremely common in the natural
world and hence the matrix exponential can be
extremely useful in simplifying differential equations in
higher dimensions. Solving the equation above, the matrix
exponential appears:
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \exp\left(t\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\right)\begin{pmatrix} x(0) \\ y(0) \end{pmatrix}$$
Using our definitions above, this is solvable for x and
y since the Taylor series will converge to a 2x2 matrix.
In fact, this particular Taylor series simplifies to an
incredibly significant result:
$$\exp\left(t\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\right) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$$
As some of you may know, this is the general rotation
matrix; increasing t by 1 rotates the point (x, y) by 1
radian about the origin. This is very pleasing.
Looking back at our original differential equation, this
is a very intuitive result. The matrix in that equation is the 90-degree
rotation matrix, and the only way for position, (x, y),
to always be perpendicular to velocity, (dx/dt, dy/dt), is for
the solutions to travel around a circle centred on the origin
(the unit circle, if the point starts on it).
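The simplification to the rotation matrix can be traced through the series directly. The derivation below assumes that the matrix in question is the 90-degree rotation matrix, written J here (the symbol J is introduced only for this sketch); its powers repeat with period four, which is what splits the series into the cosine and sine series.

```latex
J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
J^2 = -I, \quad J^3 = -J, \quad J^4 = I

\begin{aligned}
\exp(tJ) &= \sum_{k=0}^{\infty} \frac{t^k J^k}{k!} \\
         &= I \left( 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots \right)
          + J \left( t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots \right) \\
         &= (\cos t)\, I + (\sin t)\, J
          = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}
\end{aligned}
```

This mirrors Euler’s formula $e^{it} = \cos t + i \sin t$, with J playing the role of i.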
The matrix exponential also turns up in quantum mechanics,
particularly in Schrödinger’s equation, since
this topic is filled with higher dimensional differential
equations. However, as with many topics in maths
nowadays, it’s rare to calculate it by hand. If you want
a more visual explanation of this topic, 3Blue1Brown
on YouTube has an amazing video visualising the
matrix exponential. I based this article on that video,
and I highly recommend his channel for any further
study of maths you wish to do.
Arthur Tollit ma (NPTL)
Crossnumber
Across:
1) Number of sides in an irregular shape with average interior angle of 160°
3) Fibonacci number
5) Square number
6) Number of digits in 13!
7) Multiple of 14 Down
9) 22nd triangle number
12) 1/5 of 10 Down
13) Square number plus 49
15) (12 Across - 6 Across) x 7 Across
Down:
2) 10 times a square number
3) 4th power of an integer
4) (5413 - year Eton was founded) MOD 1000
6) A prime number found nowhere else in the grid
7) Root of $y = x^2 - 12x - 253$
8) Cube number
10) Fibonacci number
11) (12 Across + 1)^2
13) 6 Across + (5 Across MOD 17)
14) Sum of coefficients of a quartic with y-intercept of 5, where f(0) + f(1) = 18
Zachary Marinov KS
Puzzles
Sources:
BMO1, 2020; Purple Comet Spring Meet 2020; Ritangle 2020; 2015 AMC 8; Oxford MAT 2009
1) Given the equation
what is the value of sinx?
2) In the equation
the letters A, B, C, D and E represent different
base-10 digits. Given that C = 9, find A, B, D and E.
3) Find the sum of all values of x such that the set
{107, 122, 127, 137, 152, x} has a mean that is equal
to its median.
4) The diagram shown consists of four squares and two
equilateral triangles, all with a side length of 1 unit,
that surround a hexagon. Find a.
5) A football league consists of two four-team divisions.
Each team plays every other team in its
division N games. Each team plays every team in
the other division M games with N > 2M and M > 4.
Each team plays a 76 game schedule. How many
games does a team play within its own division?
6) For what values of k does
have four real solutions?
Aarit Bhattacharya KS

The Axiom - Issue 11 - St Andrew's Day 2021

  • 1.
  • 2.
    42 | THESCIENTIFIC ETONIAN A NOTE FROM THE EDITORS A NOTE FROM THE EDITORS Two editions since our first online endeavour, we find ourselves on the cusp of a new kind of beginning – a more hopeful one, as we see the pandemic through its final stages. We have worked hard as a team to provide you with this edition, which we hope many of you will be reading from crisp paper. We hope you enjoy the wide span of topics, extending from a prototypical Olympiad problem review (1992 Putnam A6) to matrix operations and more applied maths (Neural Networks and their use in AI). There are also some very practical tips for ensuring your older siblings don’t take advantage of you (Pouring Drinks) and proving you have Supreme clothing without actually owning it (Probabilistic Method). As always, we would like to thank Dr Moston for overseeing the creation of this edition, and to the Provost and Eton College for their support. Credit must also be given to the writers, editors and designers, for the majority of whom this is their first time contributing towards the Axiom. So over Short Leave, you can add a riveting Axiom article to your fireside hot chocolate... Do enjoy, The Editors 1
  • 3.
    Contents The Probabilistic Method.......................................3 Hilbert’sHotel .........................................................5 Neural Networks and their Use in AI ....................7 Pouring Drinks..........................................................9 1992 Putnam Competition Question A6 ............11 Sophie German, Fermat and Primes ...................13 The Matrix Exponential ........................................15 Crossnumber ..........................................................17 Puzzles ....................................................................19 2
  • 4.
    40 | THESCIENTIFIC ETONIAN The Probabilistic Method Imagine you want to become a chief. You tell people that you have the latest limited edition Supreme shirt that costs £10,000. Unfortunately for you, no one believes you – they want you to prove it. In this scenar- io, the only possible way you can prove it is by actually producing the physical object (or a fake). In math, it seems like the same is true – if you want to prove that an object with certain properties exists, you need to actually construct it. However, it’s not why this article is being written. It turns out that another way of proving the existence of an object with certain properties is by defining a ‘bag of objects’ from which we can ‘randomly’ select the desired object with positive probability. More formally, we define a finite probability space, which is a finite set W where each element is mapped to a weight in [0,1]. In English: we have a set of objects with each object being assigned a probability. The only constraint on the weights is that their sum is 1. Brief definitions: Read the first 4, then use the rest as a reference if you come across an unfamiliar term. • We typically call the map P, i.e., P(x) represents the probability associated with the object x. • |W| = number of elements in the set W • An ‘event’ is a subset A of W. • The probability of an event A is ∑x AP(x) i.e. the sum of the probabilities of all elements in A • X Y means ‘union’, i.e. the set of all elements in X and Y, without duplicates. • X Y means ‘intersect’, i.e. the set containing all shared elements between those sets • ‘Pairwise-disjoint sets’ = no two sets have a common element • Kn is the complete graph with n vertices. This means you have n vertices and as many edges as possible (( )) – any pair of vertices is linked by an edge. 
• Monochromatic = all of the same color • Subgraph (given vertices) = the graph containing all edges in the original graph that are between the given set of vertices • = the ‘choose’ function. Gives the number of ways in which a set of r objects can be selected from a set of n objects. The value itself is Also written nCr. Let’s look at some examples. corresponds to the uniform distribution, because each element has equal probability. What about for ? It’s the binomial distribution! These distributions are ways of describing the weights assigned to different elements. Union Bound We now look at a useful theorem, known as the Union Bound / Boole’s Inequality. The statement of the theo- rem is: just refers to the probability of the union of all k events. We can prove this theorem relatively straightforwardly by observing that , for events A, B. This is true because the union is defined to be all distinct elements in A and in B, removing duplicates that are present in both, and the intersection is defined to be all duplicates. Therefore, their sum covers all non-distinct elements in A and B, so the elements covered on each side are identical, so the sum of the weights / probabilities will be as well. A direct extension of this result is that , because .Therefore, we can repeatedly apply this corollary to all k events to get the Union Bound. For example, and so on. If the events are pairwise disjoint, , meaning from above, so through the same inductive method, there is inequality in the union bound. If , then if we treat W as an event. Therefore, . 3
  • 5.
    FOURTH OF JUNE2019 | 41 Example Problem Let’s look at an example problem now. This is an actual theorem proven by Paul Erdös using the probabilistic method and is considered a classic. Instead of trying to construct a coloured Kn with no monochromatic Km, we will construct a probability space, W, then add up all events where Km is monochro- matic and use the union bound to show that we haven’t covered all possible events, so there must be some case in which Kn can be coloured with no monochromatic Km. Let the probability space be all two-colorings of the edges of Kn, with a uniform distribution. If S is a set of m vertices of Kn, let As be the event that the correspond- ing subgraph of Kn is monochromatic. It’s enough to check that because by a corollary of the union bound, this means that , so there must be other events where the corresponding subgraph of Kn is not monochromatic, so we would have proven that we can colour Kn in two colours with no monochromatic Kn. So, let’s check it. We will now calculate . Given m vertices from the n, the number of edges in the corre- sponding subgraph is , because in Kn, any two verti- ces have an edge between them, so the same will be true here. The probability of a single edge being one of the two colours is 0.5, and there are two options for which colour all edges are coloured in. Therefore, You should be getting excited at this point, as this is the reciprocal of a quantity mentioned in the question – we’re close! We can select a set of m vertices from n in ways, so we have: by the condition given in the question statement. So, we’ve done it! Though I’m sure at the start you were confused about how to actually solve problems using this probabilistic method, I hope going through this example has made things a bit clearer for you. 
The crux of the approach is setting up a probability space then adding up the proba- bilities of events you don’t want to happen and showing that these events don’t cover the probability space en- tirely, meaning what you do want must exist in the prob- ability space. A great way to solidify your understanding is doing more problems. I highly recommend this hand- out (https://web.evanchen.cc/handouts/Probabilistic- Method/ProbabilisticMethod.pdf) by Evan Chen (and frankly all of his handouts!) to whoever is interested. It’s also worth pointing out that the notion of expected value is very important in the probabilistic method but was omitted here to avoid making the article too long. Unfortunately for you, the wanna-be-chief, reality doesn’t quite work like this, so you actually do need to buy that £10,000 pound shirt if you want to brag about it. 4 Zachary Marinov KS
  • 6.
    38 | THESCIENTIFIC ETONIAN Hilbert’s Hotel Hilbert’s Hotel was first introduced by David Hilbert in 1924 for the purpose of challenging our preconceived notions about the idea of infinity. Essentially, it shows that a hotel which is fully occupied and has infinitely many rooms can still accommodate additional guests. This sounds incredibly counterintuitive; if there are a countably infinite number of rooms, and these rooms are all occupied, how could more guests be brought in? Surprisingly, there are a variety of different ways to tackle this problem, although none of them involve putting more than one guest in a room, which may have been more practical… How could a countably finite number of new guests be accomodated in such a hotel? The easiest way to solve this problem is simply moving each guest into the next room. This would involve moving the guest in room 1 to room 2, the guest in room 2 to room 3, and so on and so forth, allowing new guests to enter the rooms which are emptied. In formulaic terms, if we assume that a number of new guests Y want a room, and the room number is represented by n, we can move the guests already staying in the hotel from room n to room n + Y. Interestingly, they would have to do this at the exact same time, otherwise this process would take an infinite period of time as there are infinite rooms (of course, this would never be possible in real life, but hypothetically it is). This seems easy enough, however, the issue here is that this method will only work for a finite number of new guests, as the process of moving each guest to the next room would take an infinite period of time if done for infinite new guests. This would pose a problem in the highly unlikely scenario that an infinite number of new guests suddenly turned up on the doorstep and demanded a place to stay. How would we accommodate for both the infinite guests already in the full hotel, and the infinite new guests who want their own room? 
The solution is actually quite simple; move each guest already in a room to the room which is double their current room number; thus, the guest in room n would move to room 2n, and so on. This would result in all the guests already in the hotel now staying in even numbered rooms, which would leave every odd numbered room available for the infinite number of new guests. Incredibly, this simple solution allows for an infinite number of guests to get rooms in a hotel with an infinite number of rooms which were all full! Notice the implication here: we have split an infinite set into two infinite sets of the same size, but which are the same as the original infinite set. We now know how we would house an infinite number of new guests in a hotel with an infinite number of rooms and an infinite number of guests staying in all of these rooms. However, this is probably too simple for you. Imagine now, that there are an infinite number of buses, which each carry an infinite number of guests and pull up in front of Hilbert’s hotel. The first step is identical to the previous one; move all the guests already in the hotel to the room number which is twice the number of their current room, leaving all of the odd numbered rooms open. Of course, since there are an infinite number of infinite guests, not just an infinite number of guests, it is not possible to simply move all guests to the odd numbered rooms as done before. The unique solution here is actually quite beautiful. What we now must do is assume that the buses are each numbered, and that so are the guests’ seats. The guest on seat 1 of bus 1 would have to move into room 3, the guest on seat 2 would have to move into room 9, the guest in seat 3 would have to move into room 27, and on and on. As you have probably noticed, they have to move into the room, which is expressed by 3n, with n being their seat number. But why 3? The answer is actually quite obvious; 3 is the smallest prime number, apart from 2. 
However, powers of 2 cannot be used here, as that would render even numbers, which are already occupied by the previous guests. The second coach would therefore move into the room expressed by 5n, and the third coach into 7n, and so on. The reason as to why prime numbers can be used here is because this guarantees that no guests on any of the infinite number of coaches will have to go into the same room as anyone else; the Fundamental Theorem of Arithmetic shows that all positive integers apart from 1 can be expressed as a product of one or more primes, in one unique way! Notice also, that even an infinite number of infinite guests will still leave many rooms empty, that is to say, odd numbers which are not powers of prime numbers, such as 15. But what if there were more? A popular version of Hilbert’s Hotel involves it being a luxury resort on the seaside, and what better way to get to the seaside than on a boat? However, boats aren’t safe, and it would be much safer to be on a bus on a boat, wouldn’t it? 5
  • 7.
    FOURTH OF JUNE2019 | 39 So, what if there were infinite ferries carrying infinite buses each, which each carry infinite guests? Well, we now have not one, not two, but three layers of infinity. So, it follows that we should have a number raised to a power which is raised to a power. As before, the first step is the same; we must move all the previous guests to the room number which is twice their original one. Then, let the number of the ships be s, coaches be m, and the number of seats be n. This time, the steps are slightly more complex: you would have to raise 3 to the power of the seat number then raise the (s+1)st prime number to that number. This may seem confusing, but the (s+1) st prime number has to be used, as the first prime number is 2, which, again, is not possible in this situation. Therefore, the guest on n = 1 of m = 1 of s = 1, that being the guest on the first seat of the first bus on the first ferry, would have to enter room number 27, as this is the result of raising 3 to the power of 3 to the power of 1. What you may have realized is that the room numbers needed will increase dramatically from one guest to another; the guest on seat 2 of bus 1 on ferry 1, i.e., the guest presumably sitting right next to guest 1, would have to enter room number 19683, as this is the result of raising 3 to the power of 3 to the power of 2. The next guest would enter room 7625597484987, so hopefully they don’t have a fear of heights! This may seem all very pointless. But what Hilbert’s Hotel does very aptly demonstrate is that the cardinality, which is the number of sets in an element, is equal for both the odd numbered rooms and all rooms. Yet in a broader sense, Hilbert’s Hotel helps us to understand that any infinite set of numbers (which must be countable) can be mapped to the set of all natural numbers. 
Thus, the cardinality of the set of all odd numbers is equal to the cardinality of the set of all natural numbers, even though the set of natural numbers includes even numbers! This is because the set of odd numbers is countably infinite, and thus can be mapped to the set of all natural numbers. 6 Richard Kim KS
  • 8.
    36 | THESCIENTIFIC ETONIAN To understand the concept of neural networks, first we must delve into what AI is and the role neural networks play in it. Artificial Intelligence is the ability of machines to perform cognitive tasks at human level, ranging from chess to speech recognition. So how does this work? How can a device coded from only ones and zeros perform such things? The answer is simply that the software is exposed to incredible amounts of data (the more the better) and it detects patterns within the data. Then, when coming across a real-life issue, it uses the patterns it has learnt to solve the problem. This process is called machine learning, a subset of AI. All of these calculations are done on neural networks. Neural Networks and Their Use in AI adds the two shapes and deduces a nine. This is not a realistic neural network, as normally for more com- plex problems, there would be many more hidden layers. These nodes activate and deactivate, but how is it de- cided if they are active? There is an equation for the perceptron’s actions, as shown in the picture below. To explain the equation, let’s isolate a node from a hidden layer. From the previous three nodes we get three numbers, all between 0 and 1, according to how active the previous nodes were. Those numbers are multiplied by their weight, a number specific to the node. This number can change in every example, but the reason it’s there is to take the importance of that previous node into account. For example, if the previous nodes represented the circle, the vertical line and the small flick of a nine, we would say the flick is the least important. It doesn’t fully distinguish whether a number is a nine. Therefore, the weight multiplies the activation number of the flick node to take into account that it isn’t as important. On the other hand, the circle of the nine is very important, First, I would like to introduce you to the structure of a neural network. 
Figure 1 is the most basic form of a neural network. There are 3 inputs, which is the data given to the network; and there is an output, the answer or solution. The middle layer is called the hidden layer because its values aren’t specifically ob- served. The circles in the example are called nodes, also known as perceptrons or neurons. These are the building blocks of the network. Each node holds a value between 0 and 1. The value it holds is related to how active a certain node is. To explain how this net- work works I will give it an objective: the objective of this neural network is to identify handwritten digits. A number is given through one input. Then it goes through the hidden layers. Each node in the first layer represents each specific pixel in the photo. If a pixel is darker compared to its surrounding, suggest- ing it’s part of the number, the corresponding node becomes active. The next layer of nodes, instead of being attached to a certain pixel corresponds to a cluster of pixels, like an edge. Nodes in this layer become active signifying the edges in the number, but the computer still can’t deduce the number. The next layer is the addition of these edges to make certain shapes, for example in a nine there is a circle at the top and a line coming down of the right of it. Then, finally, after this layer the computer adds the 7
  • 9.
    FOURTH OF JUNE2019 | 37 “The progress that humans made in just the last decade is equal to the whole of the 18th Century” so the weight is multiplied to amplify its importance. Then, the sum of the activation numbers and the weights are added to another number called the bias, I will talk about that later, and that’s then multiplied by a non-linear function. The non-linear function is an equation that places the total of the brackets into a number between 1 and 0, regarding the whole node’s activation. Then that activation number is fed to the next layer and the process is repeated. How are the neural networks trained in the first place? This has to do with the weights and the bias. As, I mentioned for the weights, it changes the activation number of a neutron according to its importance towards the solution. The bias has the same aim. It also changes the activation number, but it’s co-dependent on the previous neurons. Without getting into their differences too much, both biases and weights can completely change the solution of a program. At the very start when the network is still being trained, its weights and bias are random. Therefore, it gives the wrong answer. However, if it is fed the right answer, the weights and biases change slightly. Over time, it gets fine-tuned enough such that it is nearly perfect. Importantly, you don’t want to go past the perfect amount, or the weights get too specific. This leads to the network getting it wrong, as the written nine likely isn’t written with absolutely perfect handwriting. All in all, weights and biases are what train the network, and there is a sweet spot where amount of ‘training’ isn’t too specific or too general. Now, moving away from the technicalities, I would like to address the importance of the role AI and neural networks will play in our future. The pro- gress that humans have made in just the last decade is equal to the whole of the 18th Century. 
This is incredible, and I don’t believe such progress will slow down; it will only continue to get faster. Currently we are moving towards a new era, the Augmented Age. This is the fastest jump between eras we have ever made, as the Information Age, which began in the mid-20th Century, is drawing to an end. What is the Augmented Age? It is the time when the borderline between the physical world and the virtual world will be less defined; this blending is called augmented reality.

The next steps for computer science are clear. Currently neural networks give us something called narrow AI. This means that the AI generated can only be used for one purpose, e.g. a chess AI can play chess but cannot drive a car. However, in the future AGI and Superintelligence may be a reality. AGI means Artificial General Intelligence; this is human-level AI. It is AI that can perform the various cognitive tasks humans can do with one program, instead of a program for each task, hence ‘general’. The reason this is so difficult to reach is that the variables are open-ended. In a chess program, the program knows the definite rules of chess. It knows the parameters of what can happen, and nothing can change that. However, even trying to make narrow AI for self-driving cars is hard, as there are too many undefined parameters, such as a dog running onto the road. Humans can deal with all of that with their cognitive abilities, but computers can’t. Superintelligence is one step further. That is the level of AI more powerful than humans, able to solve problems that humans can’t.

Koza Kurumlu (BJH)
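The node calculation described earlier in this article (weighted sum, plus bias, passed through a non-linear function) can be sketched in a few lines of Python. This is only an illustration, not the article’s own network: the sigmoid is assumed as the non-linear function, and the names `sigmoid`, `node_activation`, `layer_activations` and all the numbers are made up for the sketch.

```python
import math

def sigmoid(z):
    """Assumed non-linear function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_activation(inputs, weights, bias):
    """Activation of one node: sigmoid(sum of weight * input, plus bias)."""
    total = sum(w * a for w, a in zip(weights, inputs)) + bias
    return sigmoid(total)

def layer_activations(inputs, weight_rows, biases):
    """Activations of a whole layer: one row of weights and one bias per node."""
    return [node_activation(inputs, row, b) for row, b in zip(weight_rows, biases)]

# Hypothetical numbers: 3 input activations feeding a hidden layer of 2 nodes,
# which in turn feed a single output node.
inputs = [0.0, 0.5, 1.0]
hidden = layer_activations(inputs, [[0.2, -0.4, 0.9], [1.5, 0.3, -2.0]], [0.1, -0.5])
output = node_activation(hidden, [1.0, -1.0], 0.0)
print(hidden, output)
```

Training, in this picture, would mean nudging the numbers in the weight rows and biases until `output` is close to the right answer for many example inputs.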
Pouring Drinks
Game Theory

If you have a sibling, you might already be familiar with this problem. You both have a carton of juice that you must divide between yourselves; how do you do this fairly? This is a trivial game theory situation where the dominant strategy of either sibling would be to pour themselves more juice at the expense of the other. Therefore, we need a procedure where neither sibling loses out (i.e., each sibling gets exactly ½ of the juice).

The solution for two people is quite simple: Person A pours two glasses of juice; Person B chooses which glass they want. A quick proof of why this is fair goes as follows. Let the amount of juice be 1 unit. Person A pours x units of juice into one glass and 1 - x units of juice into the other. Person B will choose the glass which holds max(x, 1 - x) units of juice, which means Person A gets min(x, 1 - x). Person A will want to maximize their share, so they will get the maximum over x of min(x, 1 - x), which is 0.5 (reached at x = 0.5), as a graph of min(x, 1 - x) against x shows. Since the most Person A can get is ½ of the juice, this procedure is fair, as Person B also gets at least ½.

A procedure for 3 people

Now, some people are unfortunate enough to have two siblings, so we need a new procedure to deal with this. Here, we have an added complexity: we need to prevent two people from conspiring to give the third person an unfair share so that they can share the rest between them. We will again start with 1 unit of juice. Person A pours what they think is ⅓ units of juice into one glass, which they then pass to Person B. Person B can now choose to pour some of the juice back into the carton if they believe the glass contains more than ⅓ units of juice. Finally, the glass is passed to Person C, who can choose either to accept the juice or not. If they accept it, Person A and Person B then use the procedure for 2 people to divide the rest of the juice.
If Person C rejects this glass, Person B gets the glass if they poured any juice back into the carton. If Person B did not pour any juice back into the carton, Person A gets the glass. The remaining two people then carry out the procedure for 2 people to divide the rest of the juice.

The reason this procedure works is that we first ensure Person C gets ⅓ of the juice. If Person A pours less than ⅓ units of juice, Person B will not pour any back into the carton and Person C will reject the glass, meaning that Person A must take the glass with less than ⅓ units of juice. Person A will want to avoid this by pouring at least ⅓ units of juice. When this glass is passed to Person B, if the glass contains more than ⅓ of the juice, they will pour some back into the carton. This is so that if they carry out the procedure for 2 people later with Person A, there are ⅔ units of juice left, allowing Person B to get at least ⅓ of the juice.
If Person B tries to give Person C less than ⅓, Person C will simply reject it and Person B will have to take the glass. So now that we have ensured Person C gets ⅓ of the juice, we have ⅔ units of juice left, which can be divided using the procedure for 2 people. We know that this ensures each person gets ½ of the juice remaining, so this means every single person is left with ⅓ units of juice.

A procedure for n people

Finally, how can we devise a procedure where we divide 1 unit of juice amongst n people such that each person gets 1/n units? We can use an iterative method here where, in each iteration, one person walks away with 1/n units of juice. The way this works is as follows: Person A pours the juice into a glass, and any of the other n - 1 people can tell Person A to stop pouring at any time. Once someone has told Person A to stop, Person A stops pouring, and the glass is given to the first person who said stop. The process is then repeated until we are left with one last person, who then carries out the procedure for 2 people with Person A.

This solution works because everyone has an incentive to tell Person A to stop as soon as 1/n units of juice have been poured. This is because as soon as more is poured, someone else can tell Person A to stop, and now the rest of the players are left with less than (n-1)/n units of juice, meaning someone must walk away with less than 1/n units of juice. Therefore, we can simply keep repeating this process until two people are left, and everyone will walk away with a fair share.

Happy pouring!

Aarit Bhattacharya KS

References
Stromquist, W. How to Cut a Cake Fairly, The American Mathematical Monthly, Vol. 87, No. 8 (Oct. 1980), pp. 640-644
Talwalkar, P. How To Split A Cake Fairly Using Math – Game Theory Tuesdays, accessed from https://mindyourdecisions.com/blog/2015/12/22/how-to-split-a-cake-fairly-using-math-game-theory-tuesdays/
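The two-person argument from earlier in this article can be checked numerically. The sketch below assumes Person B always takes the larger glass, and tries a grid of pouring choices x for Person A; the function names and the grid size are choices made purely for this illustration.

```python
def share_of_pourer(x):
    """Person A pours x and 1 - x; Person B takes the larger glass, so A keeps the smaller."""
    return min(x, 1 - x)

def best_pour(steps=1000):
    """Try x = 0, 1/steps, ..., 1 and return the x that maximises Person A's share."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=share_of_pourer)

x = best_pour()
print(x, share_of_pourer(x))  # Person A does best by pouring two equal glasses
```

Any attempt by Person A to pour unevenly only lowers `share_of_pourer(x)`, which is the sense in which the procedure is fair.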
Introduction

The William Lowell Putnam Mathematical Competition is the preeminent mathematics competition for undergraduate students in the United States and Canada. The competition consists of two 3-hour sessions held on a single day. During each session, participants work individually on 6 challenging mathematical problems. The questions are labelled from A1 to A6 and B1 to B6, with questions A6 and B6 being the hardest of them all. I will be discussing question A6 of the 1992 Putnam Competition.

Question statement

The question asks, “Four points are chosen independently and at random on the surface of a sphere, using a uniform distribution (each point equally likely to be chosen). What is the probability that the center of the sphere lies inside the tetrahedron whose vertices are at the four points?”

At first, this question can seem quite daunting, as it’s hard to visualize all of the different possible tetrahedrons; frankly, it’s also difficult because you don’t know where to start! In order to simplify the question, we can first look at a similar 2D problem, which we can then use to solve the 3D case.

Problem in 2D

Let’s imagine that the question asks, “Three points are chosen at random on the circumference of a circle. What is the probability that the center of the circle lies inside the triangle whose vertices are at the three points?” This greatly simplifies the problem, making it easier to understand.

Let’s select two points and draw two diameters, each passing through one of the two chosen points and ending at an endpoint on the other side of the circle. The arc between these two endpoints that does not contain the chosen points must contain the third point for the center to be contained by the triangle (see below). To get your head around this insight, you can think about what happens when the 3rd point is at a diameter endpoint, and then what happens if you move it slightly outside the region.
1992 Putnam Competition Question A6
Let’s now have a fresh look at the problem. Instead of selecting points initially, we can choose two random diameters and then randomly choose where our third point will be, as it will end up in one of the four arcs created (see below). Finally, we can randomly assign one end of each diameter to be one of the first two points. In this situation, there are two options for each diameter, so four possibilities across both diameters. Only one of these possibilities leads to the pre-chosen third point being on the arc opposite the first two points. Therefore, the probability that the triangle created by choosing three random points on the circumference of a circle contains the center is ¼.

Problem in 3D

If we apply this logic to our 3D case, we can first randomly choose three diameters in our sphere, and then randomly choose where the fourth point will be, as it will be in one of the 8 sections of the surface created by the three diameters (each pair of diameters defines a plane through the center, and these three planes cut the surface into 8 sections). Finally, we will randomly select an end of each diameter to place a point. In a similar way to the 2D case, here we have 8 different possibilities for where the first three points could be. Only one of them leads to the fourth point being in the opposite sector, or on the opposite side of the center, to the first three points, where the tetrahedron created would contain the center of the sphere. This means that there is a 1 in 8 chance that the tetrahedron created by choosing four random points on the surface of a sphere contains the center of the sphere.

Conclusion

Here we can see that even one of the hardest questions in an extremely difficult competition, which at first seems exceptionally challenging to wrap our heads around, can be solved in quite an elegant manner if we work through the logic step by step. Instead of trying to rush into the problem head-on, sometimes we must take a step back and see what we can do to simplify the question.
In this situation, it would have been exceedingly strenuous to try solving this question only in 3D, but after having solved a similar question in 2D, we were able to apply our knowledge to the 3D case as well. A question for which 123 of the 203 undergraduate-level participants submitted no solution could be solved and understood by a much younger audience such as ourselves. This article shows that a change of perspective is sometimes all that is needed to solve even some of the most challenging problems.

Rajas Nanda KS
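Both answers in this article (¼ in 2D and ⅛ in 3D) can be sanity-checked by simulation. The sketch below is an independent check rather than part of the argument above: it samples random points and tests directly whether the center is enclosed. The function names, the trial count and the seed are all choices made for this illustration, and the 3D test relies on the assumption that the center is inside the tetrahedron exactly when its four barycentric weights have the same sign.

```python
import math
import random

def centre_in_triangle(angles):
    """2D: the centre is inside iff no arc between neighbouring points reaches pi."""
    a = sorted(angles)
    gaps = [a[1] - a[0], a[2] - a[1], 2.0 * math.pi - (a[2] - a[0])]
    return max(gaps) < math.pi

def random_point_on_sphere(rng):
    """Uniform point on the unit sphere: normalise three independent Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(c * c for c in v))
        if r > 1e-12:
            return [c / r for c in v]

def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

def centre_in_tetrahedron(a, b, c, d):
    """3D: the centre is inside iff its four barycentric weights share one sign."""
    weights = [det3(b, c, d), -det3(a, c, d), det3(a, b, d), -det3(a, b, c)]
    return all(w > 0 for w in weights) or all(w < 0 for w in weights)

def estimate(trials=20000, seed=1):
    """Return the observed fractions for the 2D and 3D problems."""
    rng = random.Random(seed)
    hits_2d = sum(
        centre_in_triangle([rng.uniform(0.0, 2.0 * math.pi) for _ in range(3)])
        for _ in range(trials))
    hits_3d = sum(
        centre_in_tetrahedron(*[random_point_on_sphere(rng) for _ in range(4)])
        for _ in range(trials))
    return hits_2d / trials, hits_3d / trials

p2, p3 = estimate()
print(p2, p3)  # should land near 1/4 and 1/8 respectively
```

A simulation like this proves nothing, but it is a quick way to catch a mistake in an argument as slick as the one above.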
Sophie Germain, Fermat and Primes

Number theory, otherwise understood as the study of the natural (positive whole) numbers, makes up a large part of ancient and modern mathematics as an extension of pure mathematical theory. It is one of the oldest, and often considered the purest, of mathematical pursuits and was described as the ‘queen of mathematics’ by the great mathematician and physicist Carl Friedrich Gauss. Its aim is to delve into the complex and interesting relationships between integers, and before the advent of computer science, it was considered solely as an object of fascination (for those with that kind of curiosity). Whether or not you agree with that conclusion, number theory nevertheless provides many interesting avenues for us to explore. Some of these relationships lie waiting to be discovered or proven, and generations of mathematicians have dedicated their lives to fully understanding and proving a conjecture that may or may not turn out to be unsolvable.

This idea is perhaps most relevant in Fermat’s infamous Last Theorem, which was finally proved in 1995 after more than 350 years of scrutiny by the world’s greatest minds. However, this story has been told one too many times, and instead I’m going to attempt to introduce and explain the individual cases that were scattered across the 17th, 18th, and 19th centuries, as well as the story of the first real attempt to prove the general case.

If anyone needs to brush up on their memory of Fermat’s Last Theorem, it states that there are no integers x, y, z > 0 and n > 2 such that $x^n + y^n = z^n$. When looking at the theorem, the relevance of number theory, and especially of primes, is obvious: the fundamental theorem of arithmetic (which I won’t prove here) states that every integer greater than 1 is either prime (a natural number larger than 1 that is not the product of two smaller natural numbers) or can be represented as a product of primes.
It follows from this, and from the fact that $a^{rs} = (a^r)^s = (a^s)^r$, that n can be split into its component primes, and thus for a composite number n the equation $x^n + y^n = z^n$ can be expressed as $(x^m)^p + (y^m)^p = (z^m)^p$, where p is prime and mp = n. Note that we would like p to be an odd prime; the only exponents n > 2 with no odd prime factor are the powers of 2 greater than or equal to 4, and for those we can write n = 4m and use the exponent 4 instead. If n is already prime, we don’t need to worry, and so this shows that we can always express the equation with the exponent as an odd prime or 4. This means that if we can show that Fermat’s Last Theorem is true for all odd primes and 4, then it holds true for all integers greater than 2.

This doesn’t actually get us anywhere on its own, but this was the deceptively promising situation in the early 19th century. Fermat himself managed to prove the case for n=4 using an infinite descent method, and Euler proved the case for n=3 in 1770. The case for n=5 seemed to be a little trickier to crack, and it required a preliminary proof called Sophie Germain’s theorem, which is where we introduce the main character.

Sophie Germain was born in Paris in 1776 to relatively rich parents. When she was only 13, the French Revolution began, and with chaos on the streets of Paris, Sophie retreated into the family’s library and started reading Montucla’s L’Histoire des Mathématiques. The story goes that she read about Archimedes’ death at the hands of a Roman soldier whom he was ignoring due to his interest in a mathematical problem. She was fascinated by the fact that someone could be so engrossed in mathematics, and so started herself on the works of Newton, Euler and Bézout. Unfortunately, this was 18th century France, which was not particularly welcoming to female scientists and mathematicians.
Her parents disapproved of her interest and denied her clothes and a fire to stop her nocturnal studying. When she was 18, the École Polytechnique opened in Paris for the purpose of studying mathematics and science; however, the idea of a woman joining the university was out of the question. Sophie instead managed by obtaining the lecture notes and submitting observations under the pseudonym Monsieur LeBlanc. Some of her insights
impressed Joseph-Louis Lagrange, who was at that time giving lectures on analysis, so much so that he asked to meet “Monsieur LeBlanc” in person. Upon learning this, she was forced to reveal her real identity, but Lagrange’s respect for her did not waver. Instead, he became her sponsor and mentor.

Germain had interests in many areas of mathematics and physics, but she spent most of her time working on theories of elasticity (an equally fascinating subject) and on number theory. In particular, she was the first person to make a viable attempt to fully prove Fermat’s Last Theorem. Considering only odd prime exponents, Fermat’s Last Theorem is often split into two cases: for the equation $x^p + y^p = z^p$, where x, y and z are pairwise coprime (if two of them shared a factor, the third would share it too, and it could be divided out), the first case is where p doesn’t divide xyz, and the second is where it does. The result of Sophie Germain’s theorem is that if certain conditions hold for the odd prime exponent p, then $p^2$ divides xyz, and so the first case cannot occur. The two conditions are that there exists an auxiliary prime q such that:
• there is no value of x for which $x^p \equiv p \pmod{q}$
• $x^p + y^p + z^p \equiv 0 \pmod{q}$ implies $x \equiv 0 \pmod{q}$ or $y \equiv 0 \pmod{q}$ or $z \equiv 0 \pmod{q}$

The proof relies mainly on modular arithmetic and prime factors, as expected, and it uses help from other theorems such as Euler’s theorem and Fermat’s Little Theorem. The result is that for a large set of prime exponents, the first case of Fermat’s Last Theorem has been proven. Sophie Germain planned to go further, by showing that there were infinitely many auxiliary primes for each odd prime; as one of x, y or z would have to be divisible by each of them, the finite integers x, y, z would have infinitely many prime factors, which is impossible. Unfortunately, this ‘grand plan’, as she called it, was doomed to fail, as Sophie herself later proved that some primes in fact had only a finite number of auxiliary primes.
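For small numbers, the two conditions on an auxiliary prime can simply be checked by brute force. The sketch below is one way to do this under stated assumptions: the function names are invented for the illustration, q is assumed to be prime, and condition 2 is tested in the equivalent form "no three nonzero p-th powers add up to 0 modulo q".

```python
def pth_power_residues(p, q):
    """Set of values of x**p mod q as x runs over the nonzero residues mod q."""
    return {pow(x, p, q) for x in range(1, q)}

def is_auxiliary_prime(p, q):
    """Brute-force check of both conditions for exponent p and candidate prime q."""
    residues = pth_power_residues(p, q)
    # Condition 1: p itself must not be a p-th power modulo q.
    if p % q == 0 or p % q in residues:
        return False
    # Condition 2: x^p + y^p + z^p = 0 (mod q) must force one of x, y, z to be
    # 0 (mod q), i.e. no three nonzero p-th powers may sum to 0 modulo q.
    for r in residues:
        for s in residues:
            if (-(r + s)) % q in residues:
                return False
    return True

def is_prime(n):
    """Trial division, good enough for the small range used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

for p in (3, 5, 7):
    found = [q for q in range(3, 100) if is_prime(q) and is_auxiliary_prime(p, q)]
    print(p, found)
```

For instance, q = 7 passes both checks for p = 3: the only nonzero cubes modulo 7 are 1 and 6, so 3 is not a cube, and no three of them add up to a multiple of 7.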
But this was still the first time that someone had attempted a proof for all odd prime exponents, and although it didn’t succeed, she became the first person to make real progress on the problem, proving a specific case of the theorem. She did all this in an environment where she was rarely accepted or encouraged as a mathematician. When she started working on elasticity and won a prize from the Institut de France for developments in elastic theory, she was still not able to actually attend the college. Later, when her work on metal deformations was used to build the Eiffel Tower in the 1880s, the names of seventy-two scholars who contributed to the design were listed around the outside. Sophie Germain was not one of them.

Benedict Harvey KS
The Matrix Exponential

Introduction

Here we will be exploring what it means to raise a number to a matrix power and how it differs from the definitions you learn for the real number line. To achieve this, I assume the reader knows how exponentiation works for the real numbers and how simple matrix operations work. Even if you haven’t studied matrices, this is a remarkably interesting and useful topic in maths and physics, especially in quantum mechanics, so this is good further reading for anyone planning to study maths or physics at university.

Conceptually, this makes no sense. How can you multiply a number by itself a matrix number of times? So, to solve this problem, we need to redefine what it means to raise a number to a matrix power. Firstly, let’s simplify our case to the expression exp(A), where A is a matrix and exp() is the natural exponential function, i.e. $e^A$. This allows us to express the exponential as a polynomial using the Taylor series of exp():

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

Substituting our matrix “A” for our real “x” allows us to evaluate exp(A), since there are no matrices in the powers. This is the new definition of what it means to raise a number to the power of a matrix:

$$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$

This is now a matrix equation, so I have rewritten “1” as I, the identity matrix. It is important to note that A must be a square matrix (n x n), since multiplication requires the number of rows of one matrix to be equal to the number of columns of the other. Since, in this case, these are the same matrix, rows must equal columns and hence A is square. Other than this, there are no restrictions on A: it can be of any dimension, contain complex numbers and be singular or non-singular. However, for this definition to be usable, the series must converge for all such matrices.
Proof (simplified)

Let A be a square (n x n) matrix and let m be the largest absolute value of any entry of A. Then no entry of $A^k$ can be larger in absolute value than $n^{k-1} m^k$, so each entry of the series for exp(A) is bounded by the series with terms $n^{k-1} m^k / k!$. Looking at the ratio between two consecutive terms of this bounding series (allowed since its terms are ≥ 0):

$$\frac{n^{k} m^{k+1} / (k+1)!}{n^{k-1} m^{k} / k!} = \frac{nm}{k+1} \to 0 \quad \text{as } k \to \infty$$

so the bounding series converges. Since all smaller terms will converge quicker, the series for exp(A) converges as well.

Now we have a well-defined series equivalent of raising a number to the power of a matrix; however, it’s in the form of an infinite series which, in most cases, is impossible to evaluate without taking a finite number of terms. Fortunately, there are a few cases which reduce the series to a finite polynomial.

Nilpotent case

A square matrix, N, can be described as nilpotent if $N^x = 0$ for some positive integer x. Since every power of N larger than x will also be 0, we only need to evaluate a finite number of terms to find an exact result for the matrix polynomial. An example of this may be:
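To illustrate the nilpotent case concretely, here is a minimal sketch that sums the series term by term using exact fractions. The 3 x 3 matrix `N` is chosen here purely for illustration (it is not necessarily the example printed in the original layout), and the helper names `mat_mul`, `identity` and `exp_series` are likewise made up for this sketch.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """The n x n identity matrix, with exact Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def exp_series(A, terms):
    """Sum of the first `terms` terms of I + A + A^2/2! + A^3/3! + ..."""
    n = len(A)
    total = [[Fraction(0)] * n for _ in range(n)]
    term = identity(n)  # holds A^k / k!, starting at k = 0
    for k in range(terms):
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        term = [[entry / (k + 1) for entry in row] for row in mat_mul(term, A)]
    return total

# Illustrative strictly upper-triangular matrix: N^3 is the zero matrix.
N = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]

print(exp_series(N, 3))                       # I + N + N^2/2
print(exp_series(N, 3) == exp_series(N, 10))  # extra terms add nothing
```

Because every term from $N^3$ onwards vanishes, three terms already give exp(N) exactly, with no truncation error at all.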
Diagonal case

A diagonal matrix is a matrix where the only non-zero entries are on its leading diagonal. Moreover, any power of a diagonal matrix is obtained by raising each term on the diagonal to that power. Using this it’s easy to prove that:

$$\exp\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_n \end{pmatrix} = \begin{pmatrix} e^{a_1} & & \\ & \ddots & \\ & & e^{a_n} \end{pmatrix}$$

The third case, shown below, is where the powers of the matrix repeat periodically, so that each entry of the Taylor series is itself a familiar Taylor series. To demonstrate this, let’s look at a use of the matrix exponential in the context of differential equations, its most common use:

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$

Hopefully, you can already see how useful matrices are when it comes to solving a set of simultaneous linear equations, since you can always write them as a single equation with a matrix constant. Differential relationships are extremely common in the natural world, and hence the matrix exponential can be extremely useful in simplifying differential equations in higher dimensions. Solving the equation above, the matrix exponential appears:

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \exp\left(t\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\right)\begin{pmatrix} x(0) \\ y(0) \end{pmatrix}$$

Using our definitions above, this is solvable for x and y since the Taylor series will converge to a 2x2 matrix. In fact, this particular Taylor series simplifies to an incredibly significant result:

$$\exp\left(t\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\right) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$$

As some of you may know, this is the general rotation matrix; increasing t by 1 moves the point (x,y) 1 radian round a circle centred on the origin. This is very pleasing. Looking back at our original differential equation, this is a very intuitive result. The matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is the 90-degree rotation matrix, and the only way for position, (x,y), to always be perpendicular to velocity, $(\dot{x}, \dot{y})$, is for the solutions to move round a circle centred on the origin (the unit circle, if the starting point lies on it).

The matrix exponential also turns up in quantum mechanics, particularly in Schrödinger’s equation, since this topic is filled with higher dimensional differential equations. However, as with many topics in maths nowadays, it’s rare to calculate it by hand.

If you want a more visual explanation of this topic, 3Blue1Brown on YouTube has an amazing video visualising the matrix exponential.
I based this article on that video, and I highly recommend his channel for any further reading you wish to do for maths.

Arthur Tollit ma (NPTL)
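The rotation result from this article can be checked numerically by summing a finite number of terms of the series. The sketch below assumes the system is dx/dt = -y, dy/dt = x, so that the constant matrix is the 90-degree rotation matrix with rows (0, -1) and (1, 0); that sign convention, the helper names and the choice of 30 terms are all assumptions made for this illustration.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, terms=30):
    """Approximate exp(A) by the first `terms` terms of I + A + A^2/2! + ..."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    for k in range(terms):
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        term = [[entry / (k + 1) for entry in row] for row in mat_mul(term, A)]
    return total

def rotation_from_series(t, terms=30):
    """exp(t * J) for the assumed 90-degree rotation matrix J = [[0, -1], [1, 0]]."""
    return exp_series([[0.0, -t], [t, 0.0]], terms)

t = 1.0
R = rotation_from_series(t)
expected = [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]
print(R)
print(expected)
```

The two printed matrices should agree to many decimal places, because the even powers of the matrix build up the series for cos t and the odd powers build up the series for sin t.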
Crossnumber

Across:
1) Number of sides in an irregular shape with average interior angle of 160°
3) Fibonacci number
5) Square number
6) Number of digits in 13!
7) Multiple of 14 Down
9) 22nd triangle number
12) 1/5 of 10 Down
13) Square number plus 49
15) (12 Across - 6 Across) x 7 Across

Down:
2) 10 times a square number
3) 4th power of an integer
4) (5413 - year Eton was founded) MOD 1000
6) A prime number found nowhere else in the grid
7) Root of y = x^2 - 12x - 253
8) Cube number
10) Fibonacci number
11) (12 Across + 1)^2
13) 6 Across + (5 Across MOD 17)
14) Sum of coefficients of a quartic with y-intercept of 5, where f(0) + f(1) = 18
Zachary Marinov KS
Puzzles

1) Given the equation [equation missing], what is the value of sin x?
2) In the equation [equation missing], the letters A, B, C, D and E represent different base-10 digits. Given that C = 9, find A, B, D and E.
3) Find the sum of all values of x such that the set {107, 122, 127, 137, 152, x} has a mean that is equal to its median.
4) The diagram shown consists of four squares and two equilateral triangles, all with a side length of 1 unit, that surround a hexagon. Find a.
5) A football league consists of two four-team divisions. Each team plays every other team in its division N games. Each team plays every team in the other division M games, with N > 2M and M > 4. Each team plays a 76-game schedule. How many games does a team play within its own division?
6) For what values of k does [equation missing] have four real solutions?

Aarit Bhattacharya KS

Sources: BMO1, 2020; Purple Comet Spring Meet 2020; Ritangle 2020; 2015 AMC 8; Oxford MAT 2009