IntroML_6_Classification_part3

Introduction to
Machine Learning
(5 ECTS)
Giovanni Di Liberto
Asst. Prof. in Intelligent Systems, SCSS
Room G.15, O’Reilly Institute

Trinity College Dublin, The University of Dublin
Overview previous lectures
2
• Classification
• Evaluation
• Overfitting and Cross-validation
• Chance level
• K-nearest neighbour (KNN)
• Decision tree

Overview lecture
3
• Cross-validation
• Overfitting
• More about Support Vector Machines (SVM)
• Data projection (introduction)
• Introduction to regression

Trinity College Dublin, The University of Dublin 4
Support Vector Machine (SVM)
Linear Binary SVM Classification
- Scenario where the two classes are linearly
separable
- The solid line in the plot on the right represents
the decision boundary of an SVM classifier
- This line separates the two classes + stays as far
away from the closest training instances as
possible

A more realistic scenario.
We are going to get some errors. We can choose:
Do we prefer having higher precision or higher
recall? We can’t have both, but we can move the
decision boundary to make the solution the best as
possible for our goals.

Overfitting
6
https://towardsdatascience.com/techniques-for-handling-underfitting-
and-overfitting-in-machine-learning-348daa2380b9
Overfitted model: it does not
generalise well!
Maybe some datapoints were bad
measurements or mislabelled

Overfitting
7
Controlling for overfitting
- We want to make sure that our model is working for real. That it generalises.
Not that it works (good classification) because we are overfitting
- To do so, we fit the model on one portion of the data and test it on a
separate portion of the data. This approach controls for overfitting as the
model is evaluated on unseen data (cross-validation)
Preventing overfitting
- More complex models tend to overfit more
- There are strategies to reduce the amount of overfitting (e.g., regularisation,
early stopping)

Cross-validation (controlling for overfitting)
8
https://towardsdatascience.com/cross-validation-k-fold-vs-monte-carlo-e54df2fc179b
Class 1
Class 2
Ground truth Training set Test set

Cross-validation
9
Class 1
Class 2

Cross-validation
10
Class 1
Class 2
The model is overfitting! Too
complex
At least the cross-validation is
controlling for that i.e.,
prediction on the test set is
not very good

k-fold Cross-validation
11

Baseline – real vs. ideal
12
- Coin flip:
- 2 classes (head vs. tail)
- 50-50 chance
- Random
- Is that a zero or a one digit?
- 2 classes
- Let’s use a simple linear classifier. We definitely want this classifier to
perform better than chance.
- What is chance? Well, 2 classes.. Isnt’t that a 50-50 chance to get it right?
- Nope. That depends on the probability of encountering a 1 or a 0
- So, let’s say that we have equal number of and ones in the dataset. That
means that we have a 50-50 chance that a random classifier gets it right.
- Yes.. with infinite data

Baseline – real vs. ideal
13
- Small datasets have a higher chance that a random classifier would get it right
by chance
- So, classification results should be compared to a baseline (or chance level)
that is calculated by taking into account the sample size (N)
- We will see that in the coming lectures
- Things get more complicated with multiclass and imbalanced datasets
https://www.discovermagazine.com/mind/machine-learning-exceeding-chance-level-by-chance

Precision vs. recall
14
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Trade-off

ROC curve
15

Classification – evaluation metrics
16
F1-Score = harmonic mean of precision and recall
Precision, recall, and F1-score apply to both binary
balanced, binary imbalanced, and multiclass classification.

Classification in Python
17
X is the data matrix
(features)
y is the class (‘five’ or
‘not a five’)

- Some datasets are not even close
to being linearly separable.
- One approach is to use
polynomial features
e.g., x2 = (x1)2
x3 = (x1)3

- Some datasets are not even close
to being linearly separable.
- One approach is to use
polynomial features
e.g., x2 = (x1)2
x3 = (x1)3
- Kernel methods
https://towardsdatascience.com/the-kernel-trick-c98cdbcaeb3f

LDA: Linear Discriminant Analysis
and Data projection
x1
x2
Y ∈ {green,blue}
x2
x1
X: [x1, x2] Sometimes it is easier to look at things from a different angle,
instead of searching for a complicated solution

Data projection
x1
x2
Y ∈ {green,blue}
X: [x1, x2]
Xproj = X - [2,0]
Xproj = [x1, x2] - [2,0]
Xproj = [x1-2, x2]
xproj1
xproj2

Data projection
x1
x2
Y ∈ {green,blue}
X: [x1, x2]
Xproj = X - [2,3]
Xproj = [x1, x2] - [2,3]
Xproj = [x1-2, x2-3]
xproj1
xproj2

Data projection
A projection is a transformation of data points from one axis system to another
x1
x2
xproj1
xproj2
xproj1
xproj2

Data projection
x1
x2
x1
x2
Bad projection Good projection

x1
x2
Good projection
Data projection
LDA: Linear Discriminant Analysis
Find the axis that:
- Maximises the variance of the class
means (between-class)
- Minimises the within-class variance

x1
x2
Good projection
Data projection
xproj
Perfect separability between classes

Discussion
28

Discussion
29

Discussion
30
• How could we design a pothole detector that can map the potholes in
Dublin? What would be the data? How would we use this data to
perform classification and detect the potholes?
Problem/question Data collection
Preprocessing /
cleaning
Analysing
Interpretation /
outcome
Improve
ML
Visualisation Visualisation Visualisation

Supervised Learning
31
y = f(X)
f ynew
Model Training (learning or fit)
Xnew
f y
X
Using the model (test)
known known
unknown known
known unknown
Classification: y is a category/class
Regression: y is a number

Regression
Classification Regression
Find decision boundary:
e.g.:
Combination of X > boundary
 y class A
Combination of X < boundary
 y class B
Find decision boundary:
e.g.:
y = Combination of X

Regression
33
X2: inflation
X1: cost of materials
y = avg cost house
Using the past (of x) to
predict the future (of y)

Regression
Dependent variable
Independent variables

Classification in Python
35
(features)
‘not a five’)

Regression in Python
36
(features)
‘not a five’)

IntroML_6_Classification_part3

Recommended

Recommended

More Related Content

Similar to IntroML_6_Classification_part3

Similar to IntroML_6_Classification_part3 (20)

Recently uploaded

Recently uploaded (20)

IntroML_6_Classification_part3

Editor's Notes