Intro to modelling-supervised learning

INTRO TO MACHINE
LEARNING
Justin Sebok

CONTENTS
What is machine learning?
Types of machine learning
Supervised learning and examples
Unsupervised learning and examples

WHAT IS MACHINE LEARNING?
Wikipedia: Machine Learning is a subfield of computer science which
gives computers the ability to learn without being explicitly
programmed.

WTF does that mean?!

Basically, Machine Learning involves using “algorithms” which learn
from patterns in data to improve their predictions of something.
Data → Algorithm → Predictions

“… without being explicitly programmed”
This is what makes machine learning so powerful. Rather than
requiring specific instructions like in traditional computing, machine
learning allows the computers to improve their predictions just using
the data inputs.

TWO MAIN TYPES OF MACHINE
LEARNING ALGORITHM
Supervised Learning: We know what we are trying to predict. We
use some examples that we (and the model) know the answer to, to
“train” our model. It can then generate predictions to examples we
don’t know the answer to.
Examples: Predict the price a house will sell at. Identify the gender of
someone based on a photograph.
Unsupervised Learning: We don’t know what we are trying to
predict. We are trying to identify some naturally occurring patterns in
the data which may be informative.
Examples: Try to identify “clusters” of customers based on data we
have on them

TYPES OF SUPERVISED LEARNING
Supervised learning can be further broken down based on the two
possible types of problem it may be trying to solve.
Classification Problems: These are problems where there is a finite
and countable number of possible answers. These are categories or
classes. There may be as few as 2 or as many as 1000+ classes, but
the count doesn’t matter as long as we can identify and list them all.
Examples: Identify the colour seen in a picture. Identify plant species.
Regression Problems: These are problems where the feature we
are trying to predict is a number on a continuous scale.
Examples: Predict someone’s height.


INTRO TO A FEW SUPERVISED
LEARNING MODELS
Nearest Neighbours (Classification and Regression)
Decision Trees (Classification and Regression)
Linear Regression (Regression)

QUICK TERMINOLOGY
Observation: One of the “things” we are looking at. Could be a
person, a time, or a place.
Feature: Some aspect of the observation that we know. Could be a
person’s hair colour, the latitude and longitude of a city, or the
number of rooms a house has. May be denoted as x
Label: The feature of an observation which we are trying to predict.
For labelled observations, we already know the answer. May be
denoted as y

NEAREST NEIGHBOURS
Conceptually one of the simplest Machine Learning algorithms.
Uses the proximity or similarity of observations to make predictions
about them

Method:
For the 1-Nearest Neighbour algorithm, find the closest labelled
observation to the unlabelled observation and apply the same label.
While it may seem very simple, it is often very effective!
It can be used for classification or regression
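The 1-Nearest Neighbour method can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the toy data, and the choice of straight-line (Euclidean) distance are assumptions made for the example, not something the slides specify.

```python
import math

def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the labelled observation closest to `query`."""
    # Euclidean distance from the query to every labelled observation.
    distances = [math.dist(x, query) for x in train_x]
    # Index of the smallest distance (an exact tie goes to the earliest
    # observation here, rather than a coin flip).
    nearest = min(range(len(train_x)), key=distances.__getitem__)
    return train_y[nearest]

# Hypothetical toy data: two features per observation, one label each.
train_x = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["red", "red", "blue", "blue"]

print(nearest_neighbour_predict(train_x, train_y, (5.5, 4.5)))  # blue
```

The same function works for regression: if the labels are numbers, the prediction is simply the number attached to the nearest observation.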

1 NEAREST NEIGHBOUR
PREDICTIONS

[Figure: 1-nearest-neighbour classification of an unlabelled point,
marked “?”]
Here there is some ambiguity. We are an equal distance from both
classes. In this case, for 1-NN we would just flip a coin to choose a
class at random.

[Figure: 1-nearest-neighbour regression. Labelled points with values
6, 3, 0, 8, 6, 1.5 and 5, and an unlabelled point marked “?”. The
prediction for “?” is 8, the value of its nearest neighbour.]

K-NEAREST NEIGHBOURS
The problem with 1-Nearest Neighbours is that outliers may result in
incorrect predictions.
What is an outlier?

An outlier is a point which is distant or very different from other
observations.
This may be a legitimate datapoint, or may be an example of “noise”
in the data.

How could we attempt to counteract this problem?

Why not try 2-Nearest Neighbours? Simply look at the 2 nearest
labelled examples and apply the label that they have.

What happens when we have a tie?

Flip a coin…
Or we could use 3-Nearest Neighbours: with an odd number of
neighbours there are no ties if we only have 2 classes.
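A rough sketch of k-Nearest Neighbours for classification: take the k closest labelled observations and let them vote. The function name, the toy data (including the deliberately planted “blue” outlier), and the use of Euclidean distance are all assumptions for illustration.

```python
import math
import random
from collections import Counter

def knn_classify(train_x, train_y, query, k=3, rng=random):
    """Predict the most common label among the k nearest observations."""
    # Sort the indices of the labelled observations by distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    winners = sorted(label for label, count in votes.items() if count == top)
    # A tied vote is settled by "flipping a coin" between the tied labels.
    return winners[0] if len(winners) == 1 else rng.choice(winners)

# Hypothetical toy data: the point at (2.2, 2.2) is a "blue" outlier
# sitting in the middle of the "red" group.
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.2, 2.2),
           (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = ["red", "red", "red", "blue", "blue", "blue", "blue"]

query = (2.3, 2.4)
print(knn_classify(train_x, train_y, query, k=1))  # blue (follows the outlier)
print(knn_classify(train_x, train_y, query, k=3))  # red  (outlier is outvoted)
```

With k=1 the outlier decides the prediction on its own; with k=3 it is outvoted by its two red neighbours.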

3-NEAREST NEIGHBOUR
PREDICTIONS
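[Figure: 3-nearest-neighbour classification of an unlabelled point,
marked “?”]

[Figure: 3-nearest-neighbour regression. Labelled points with values
6, 3, 0, 8, 6, 1.5 and 5, and an unlabelled point marked “?”]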
How can we use the 3-nearest neighbour approach in regression?

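[Figure: the same labelled points (6, 3, 0, 8, 6, 1.5 and 5). The
3-nearest-neighbour prediction for the unlabelled point is 4.67, the
average of its three nearest neighbours’ values.]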
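For regression, one common way to use k neighbours is to average their values. Below is a minimal sketch; the positions of the points are invented for this example (the slide only gives the values), chosen so that the three nearest neighbours of the query have the values 8, 6 and 0.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict the mean value of the k nearest labelled observations."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest_values = [train_y[i] for i in order[:k]]
    return sum(nearest_values) / len(nearest_values)

# Values taken from the slide; the (x1, x2) positions are hypothetical.
train_x = [(0.0, 1.0), (2.0, 0.0), (0.0, -2.5), (4.0, 0.0),
           (0.0, 5.0), (-6.0, 0.0), (5.0, 5.0)]
train_y = [8, 6, 0, 3, 6, 1.5, 5]

query = (0.0, 0.0)
print(knn_regress(train_x, train_y, query, k=1))            # 8.0
print(round(knn_regress(train_x, train_y, query, k=3), 2))  # 4.67
```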

SO WHAT K-VALUE DO I USE?
Choice of how many neighbours to use illustrates one of the main
trade-offs seen in machine learning:
Variance vs Bias

Variance is the error in prediction we get from following our training
data too closely. We end up basing our predictions on “random noise”
in the data. If we choose too small a k-value, we may have a high
level of variance.

Bias is the error in prediction we get from using a simplified model to
predict very complex real-world things. If we choose too large a
k-value, we may have a high level of bias.

VARIANCE VS BIAS
One big part of machine learning is striking the right balance
between these two types of errors.
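One way to see the two kinds of error is to compare the error on the training data with the error on fresh data for different k-values. The sketch below uses made-up data (a sine curve plus random noise) and hypothetical helper names; it is an illustration of the idea, not a recommended tuning procedure.

```python
import math
import random

def knn_regress_1d(train_x, train_y, query, k):
    """k-NN regression on a single feature: average the k closest values."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return sum(train_y[i] for i in order[:k]) / k

def mean_squared_error(train_x, train_y, xs, ys, k):
    """Average squared gap between k-NN predictions and the true values."""
    errors = [(knn_regress_1d(train_x, train_y, x, k) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    # Hypothetical data: a smooth pattern (sine) plus random noise.
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = make_data(40)
test_x, test_y = make_data(40)

for k in (1, 5, len(train_x)):
    train_err = mean_squared_error(train_x, train_y, train_x, train_y, k)
    test_err = mean_squared_error(train_x, train_y, test_x, test_y, k)
    print(f"k={k:>2}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training error is zero even though the model has memorised the noise (high variance). With k equal to the number of training points, every prediction is just the overall average (high bias).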

PROBLEM OF DIMENSIONALITY
1 Dimension: 5 observations to fill the space
How many observations do we need to fill 2 dimensions?
2 Dimensions: 25 observations to fill the space
How many observations do we need to fill 3 dimensions?
3 Dimensions: 125 observations to fill the space
As dimensionality increases, the number of observations required to
“fill the space” increases exponentially.
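The arithmetic behind these numbers is 5 to the power of the number of dimensions. A tiny sketch (the function name and the 10-dimension case are additions for illustration):

```python
def observations_to_fill(dimensions, per_dimension=5):
    """Observations needed if each dimension needs `per_dimension` of them."""
    return per_dimension ** dimensions

for d in (1, 2, 3, 10):
    print(d, observations_to_fill(d))
# 1 5
# 2 25
# 3 125
# 10 9765625
```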

DECISION TREES
Another quite simple Machine Learning technique.
We attempt to “cut” the space where our observations exist and
predict labels based on the sections our observations end up in.

We can display these cuts in the form of a tree, hence the name.
[Figure: an example of such a tree, used for predicting height]

DECISION TREE – “CUTTING THE
SPACE”

[Figure: an example of “cutting the space”]

DECISION TREES
Once we have cut our space into chunks, how do we generate
predictions in that area?
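[Figure: an area of the cut space containing training points with
values 6, 8 and 5. The prediction for a new point “?” in that area is
6.33, the average of those values.]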
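For a regression tree, the prediction inside an area is the average of the training values that fall in it: (6 + 8 + 5) / 3 = 6.33. A minimal sketch, where the single cut at x < 4, the point positions, and the function name are all invented for the example:

```python
def predict_region_mean(train_x, train_y, in_region):
    """Average the training values whose observation lies in the region."""
    values = [y for x, y in zip(train_x, train_y) if in_region(x)]
    return sum(values) / len(values)

# Hypothetical one-feature data; the values 6, 8 and 5 are from the slide.
train_x = [1.0, 2.5, 3.0, 5.0, 6.0]
train_y = [6, 8, 5, 1, 2]

# One "cut" at x = 4 gives two areas: x < 4 and x >= 4.
left_prediction = predict_region_mean(train_x, train_y, lambda x: x < 4)
print(round(left_prediction, 2))  # 6.33
```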

DECISION TREE – WHERE DO WE
CUT?
Each cut should improve the prediction accuracy by as much as
possible.
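A rough sketch of choosing a cut for classification on one feature: try a cut halfway between each pair of neighbouring points and keep the one that leaves the fewest training points wrongly labelled. Counting mistakes is an assumption made here to match the “prediction accuracy” wording; practical decision tree implementations usually score cuts with a measure such as Gini impurity or entropy instead. The names and data are hypothetical.

```python
from collections import Counter

def misclassified(labels):
    """Points an area gets wrong if it predicts its most common label."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_cut(xs, labels):
    """Return (threshold, mistakes) for the cut with the fewest mistakes."""
    points = sorted(zip(xs, labels))
    best = None
    for (left_x, _), (right_x, _) in zip(points, points[1:]):
        if left_x == right_x:
            continue  # no room to cut between identical positions
        threshold = (left_x + right_x) / 2
        left = [lab for x, lab in points if x < threshold]
        right = [lab for x, lab in points if x >= threshold]
        mistakes = misclassified(left) + misclassified(right)
        if best is None or mistakes < best[1]:
            best = (threshold, mistakes)
    return best

# Hypothetical data: reds on the left, blues on the right.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
labels = ["red", "red", "red", "red", "blue", "blue", "blue"]

print(misclassified(labels))  # 3 mistakes with no cut at all
print(best_cut(xs, labels))   # (5.0, 0)
```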

HOW COULD WE CUT A
“REGRESSION” DECISION TREE?

Very similar to the way classification trees are cut.
Each cut should reduce, by as much as possible, the difference between
the predicted output in an area and the actual training output.
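A sketch of a regression cut on one feature: each area predicts the average of its training values, and we keep the cut with the smallest total squared difference between those predictions and the actual values. Squared difference is an assumption (the slides do not name a measure), and the names and data are hypothetical.

```python
def sum_squared_error(values):
    """Total squared gap between each value and the area's average."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_regression_cut(xs, ys):
    """Return (threshold, error) for the cut with the smallest error."""
    points = sorted(zip(xs, ys))
    best = None
    for (left_x, _), (right_x, _) in zip(points, points[1:]):
        if left_x == right_x:
            continue  # no room to cut between identical positions
        threshold = (left_x + right_x) / 2
        left = [y for x, y in points if x < threshold]
        right = [y for x, y in points if x >= threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[1]:
            best = (threshold, error)
    return best

# Hypothetical data: high values on the left, low values on the right.
xs = [1.0, 2.5, 3.0, 5.0, 6.0]
ys = [6, 8, 5, 1, 2]

threshold, error = best_regression_cut(xs, ys)
print(threshold)  # 4.0
```

Cutting at 4.0 separates the values 6, 8 and 5 from the values 1 and 2, so each area’s average sits close to its own points.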

BIAS AND VARIANCE IN DECISION
TREES
What would a decision tree with a high degree of bias look like?
What would a decision tree with a high degree of variance look like?

LINEAR REGRESSION
I will assume everyone knows the basics of linear regression.
While I won’t go into any of the maths, it is very useful to look at
linear regression alongside the other models.

What is a very basic definition of linear regression?

LINEAR REGRESSION
What would a linear regression line with a high degree of bias look
like?
What would a linear regression line with a high degree of variance
look like?

SPECTRUM OF SUPERVISED
LEARNING TECHNIQUES
[Spectrum: “No assumptions about data” at one end, “Lots of
assumptions about data” at the other]
Where do the techniques we have discussed fall on this spectrum?

[Spectrum: “No assumptions about data” sits with “Not computationally
efficient”; “Lots of assumptions about data” sits with “Very
computationally efficient”]
The more assumptions we can make about our data, the more
computationally efficient we can make our model.

[Spectrum, from “No assumptions about data” (not computationally
efficient) to “Lots of assumptions about data” (very computationally
efficient): K-Nearest Neighbours, Decision Trees, Linear Regression]
