A nontechnical introduction to neural networks, with many examples and pictures. The first talk given at the Balliol College machine learning reading group.
Equations of lines

In fact, w₁x₁ + w₂x₂ + b gives the perpendicular distance of the point (x₁, x₂) from the line
(multiplied by the length of (w₁, w₂), but don't worry about this)
Further, this distance is signed: on one side of the line it's positive, and on the other it's negative
Thus w₁x₁ + w₂x₂ + b tells us which side of the line the point is on, and how far away
Aim: learn good values for w₁, w₂ and b
[Figure: a line in the plane, with a blue point on its positive side and an orange point on its negative side]
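To make this concrete, here is a minimal Python sketch of the idea; the particular weights, bias, and points are invented for illustration and do not come from the talk.

```python
import numpy as np

# Hypothetical weights and bias defining a line w1*x1 + w2*x2 + b = 0;
# all values here are made up for illustration.
w = np.array([3.0, 4.0])
b = -5.0

def signed_side(x):
    """w . x + b: positive on one side of the line, negative on the other."""
    return np.dot(w, x) + b

blue = np.array([2.0, 1.0])    # lands on the positive side
orange = np.array([0.0, 0.0])  # lands on the negative side

print(signed_side(blue))    # 5.0  -> positive side of the line
print(signed_side(orange))  # -5.0 -> negative side of the line

# Dividing by the length of (w1, w2) turns this into a true perpendicular distance
print(signed_side(blue) / np.linalg.norm(w))  # 1.0
```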
Training neural networks

To train a neural network, treat the problem as an optimisation problem: find the weights w and b
that minimise the training error
In real-world examples, there can be millions of weights
It turns out that the only reasonable way to optimise them is to use gradient descent
This means treating the training error, L, as a function of the weights, and then computing the
gradient ∇_w L with respect to the weights
If we update the weights with a small enough step, w ↦ w − α ∇_w L, then we are guaranteed
to decrease the training error
Here, α is called the step size: how far we move in the direction of the (negative) gradient
There is a lot of theory about how to do this, but I'm not going to discuss it due to time
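Below is a minimal Python sketch of gradient descent on a toy training error; the synthetic data, the single sigmoid "neuron", the squared-error loss, and the hand-derived gradients are all assumptions made for this example, not details from the talk.

```python
import numpy as np

# Toy setup: synthetic data and a single sigmoid "neuron" (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                             # 100 points, 2 features each
y = (X @ np.array([1.0, -2.0]) + 0.5 > 0).astype(float)   # labels from a hidden line

w = np.zeros(2)   # weights to learn
b = 0.0           # bias to learn
alpha = 0.1       # step size

def loss(w, b):
    """Training error L: mean squared error of the neuron's output."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.mean((p - y) ** 2)

print("initial loss:", loss(w, b))

for step in range(1000):
    # Gradient of L with respect to w and b, derived by hand for this toy loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    dz = 2.0 * (p - y) * p * (1.0 - p) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()
    # The update w -> w - alpha * grad_w; a small enough alpha decreases L
    w -= alpha * grad_w
    b -= alpha * grad_b

print("final loss:", loss(w, b))   # should be noticeably smaller
```

In practice, frameworks such as PyTorch or TensorFlow compute ∇_w L automatically via backpropagation, so the gradient never has to be derived by hand.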