A nontechnical introduction to neural networks, with many examples and pictures. The first talk given at the Balliol College machine learning reading group.
Equations of lines

In fact, w₁x₁ + w₂x₂ + b gives the perpendicular distance of the point (x₁, x₂) from the line
(multiplied by the length of (w₁, w₂), but don't worry about this)
Further, this distance is signed: on one side of the line it's positive, and on the other it's negative
Thus w₁x₁ + w₂x₂ + b tells us which side of the line the point is on, and how far away
Aim: learn good values for w₁, w₂ and b
[Figure: a line in the plane, with a blue point on its positive side and an orange point on its negative side]
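To make this concrete, here is a minimal Python sketch of the idea; the particular weights, bias, and points are invented for illustration and do not come from the talk.

```python
import numpy as np

# Hypothetical weights and bias defining a line w1*x1 + w2*x2 + b = 0;
# all values here are made up for illustration.
w = np.array([3.0, 4.0])
b = -5.0

def signed_side(x):
    """w . x + b: positive on one side of the line, negative on the other."""
    return np.dot(w, x) + b

blue = np.array([2.0, 1.0])    # lands on the positive side
orange = np.array([0.0, 0.0])  # lands on the negative side

print(signed_side(blue))    # 5.0  -> positive side of the line
print(signed_side(orange))  # -5.0 -> negative side of the line

# Dividing by the length of (w1, w2) turns this into a true perpendicular distance
print(signed_side(blue) / np.linalg.norm(w))  # 1.0
```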
Training neural networks

To train a neural network, treat the problem as an optimisation problem: find the weights w and b
that minimise the training error
In real-world examples, there can be millions of weights
It turns out that the only reasonable way to optimise them is to use gradient descent
This means treating the training error, L, as a function of the weights, and then computing the
gradient ∇_w L with respect to the weights
If we update the weights with a small enough step, w ↦ w − α ∇_w L, then we are guaranteed
to decrease the training error
Here, α is called the step size: how far we move in the direction of the (negative) gradient
There is a lot of theory about how to do this, but I'm not going to discuss it due to time
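Below is a minimal Python sketch of gradient descent on a toy training error; the synthetic data, the single sigmoid "neuron", the squared-error loss, and the hand-derived gradients are all assumptions made for this example, not details from the talk.

```python
import numpy as np

# Toy setup: synthetic data and a single sigmoid "neuron" (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                             # 100 points, 2 features each
y = (X @ np.array([1.0, -2.0]) + 0.5 > 0).astype(float)   # labels from a hidden line

w = np.zeros(2)   # weights to learn
b = 0.0           # bias to learn
alpha = 0.1       # step size

def loss(w, b):
    """Training error L: mean squared error of the neuron's output."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.mean((p - y) ** 2)

print("initial loss:", loss(w, b))

for step in range(1000):
    # Gradient of L with respect to w and b, derived by hand for this toy loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    dz = 2.0 * (p - y) * p * (1.0 - p) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()
    # The update w -> w - alpha * grad_w; a small enough alpha decreases L
    w -= alpha * grad_w
    b -= alpha * grad_b

print("final loss:", loss(w, b))   # should be noticeably smaller
```

In practice, frameworks such as PyTorch or TensorFlow compute ∇_w L automatically via backpropagation, so the gradient never has to be derived by hand.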