This document provides an overview of lectures on machine learning topics including classification, overfitting, support vector machines, data projection, and regression. It discusses evaluating models, controlling overfitting through cross-validation, precision vs recall, and implementing classification and regression in Python using Scikit-Learn. Examples are provided on linear classification with SVM, handling non-linearly separable data, and using data projection techniques like LDA.
Trinity College Dublin, The University of Dublin
Overview previous lectures
• Classification
• Evaluation
• Overfitting and Cross-validation
• Chance level
• K-nearest neighbour (KNN)
• Decision tree
Overview lecture
• Cross-validation
• Overfitting
• More about Support Vector Machines (SVM)
• Data projection (introduction)
• Introduction to regression
Support Vector Machine (SVM)
Linear Binary SVM Classification
- Scenario where the two classes are linearly separable
- The solid line in the plot on the right represents the decision boundary of an SVM classifier
- This line separates the two classes and stays as far away from the closest training instances as possible
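As a rough illustration of this large-margin idea, the sketch below fits scikit-learn's LinearSVC on a made-up, well-separated two-class dataset (the slide's data comes from a plot, so this synthetic stand-in is an assumption):

```python
# Illustrative sketch: a linear SVM on (synthetic) linearly separable data
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs -> linearly separable classes
X = np.vstack([rng.normal(-3.0, 1.0, size=(20, 2)),
               rng.normal(+3.0, 1.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = LinearSVC()   # finds the boundary that maximises the margin
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy on the separable data
```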
Support Vector Machine (SVM)
A more realistic scenario: we are going to get some errors, so we have to choose.
Do we prefer higher precision or higher recall? We can’t have both, but we can move the decision boundary to make the trade-off as good as possible for our goals.
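One way to see this trade-off in code (a sketch, assuming overlapping synthetic blobs and a linear SVM): predictions come from thresholding the classifier's decision scores, and moving that threshold moves the decision boundary.

```python
# Sketch: trading precision against recall by moving the decision threshold
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(42)
# Overlapping blobs: some errors are unavoidable
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(+1.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LinearSVC().fit(X, y)
scores = clf.decision_function(X)   # signed distance to the boundary

for threshold in (-0.5, 0.0, 0.5):
    y_pred = (scores > threshold).astype(int)
    print(threshold,
          precision_score(y, y_pred),   # higher threshold: tends to rise
          recall_score(y, y_pred))      # higher threshold: falls
```

Raising the threshold makes the classifier pickier: it predicts fewer positives, which can only lower recall while precision usually improves.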
Overfitting
https://towardsdatascience.com/techniques-for-handling-underfitting-
and-overfitting-in-machine-learning-348daa2380b9
An overfitted model does not generalise well!
Maybe some data points were bad measurements or mislabelled
Overfitting
Controlling for overfitting
- We want to make sure that our model works for real, i.e. that it generalises, and not just that it classifies well because it is overfitting
- To do so, we fit the model on one portion of the data and test it on a separate portion. This controls for overfitting because the model is evaluated on unseen data (cross-validation)
Preventing overfitting
- More complex models tend to overfit more
- There are strategies to reduce the amount of overfitting (e.g., regularisation,
early stopping)
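A minimal sketch of the hold-out approach, assuming scikit-learn's make_classification as a stand-in dataset and an unconstrained decision tree as the overfitting-prone model:

```python
# Sketch: controlling for overfitting with a held-out test set
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained decision tree can memorise the training data
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # near-perfect
print("test accuracy: ", tree.score(X_test, y_test))    # the honest estimate
```

The gap between the two scores is the overfitting the test set exposes.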
Cross-validation (controlling for overfitting)
https://towardsdatascience.com/cross-validation-k-fold-vs-monte-carlo-e54df2fc179b
[Figure: two classes (ground truth) split into a training set and a test set]
Cross-validation
Cross-validation
The model is overfitting! It is too complex.
At least the cross-validation is controlling for that, i.e. the prediction on the test set is not very good.
k-fold Cross-validation
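In scikit-learn, k-fold cross-validation is one call; the dataset and k = 5 below are illustrative assumptions:

```python
# Sketch: k-fold cross-validation (here k = 5)
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
knn = KNeighborsClassifier(n_neighbors=5)

# Each fold serves once as the test set; the model is refit each time
scores = cross_val_score(knn, X, y, cv=5)
print(scores)         # one accuracy value per fold
print(scores.mean())  # the cross-validated estimate
```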
Baseline – real vs. ideal
- Coin flip:
- 2 classes (head vs. tail)
- 50-50 chance
- Random
- Is that a zero or a one digit?
- 2 classes
- Let’s use a simple linear classifier. We definitely want this classifier to perform better than chance.
- What is chance? Well, 2 classes... isn’t that a 50-50 chance of getting it right?
- Nope. That depends on the probability of encountering a 1 or a 0
- So, let’s say that we have equal numbers of zeros and ones in the dataset. That means that a random classifier has a 50-50 chance of getting it right.
- Yes... but only with infinite data
Baseline – real vs. ideal
- With small datasets, there is a higher chance that a random classifier scores well purely by chance
- So, classification results should be compared to a baseline (or chance level)
that is calculated by taking into account the sample size (N)
- We will see that in the coming lectures
- Things get more complicated with multiclass and imbalanced datasets
https://www.discovermagazine.com/mind/machine-learning-exceeding-chance-level-by-chance
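An empirical baseline can be computed instead of assumed; a sketch using scikit-learn's DummyClassifier on a small, made-up dataset whose features carry no signal:

```python
# Sketch: an empirical chance-level baseline with DummyClassifier
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))        # small dataset; features carry no signal
y = np.array([0] * 30 + [1] * 30)   # balanced binary labels

baseline = DummyClassifier(strategy="stratified", random_state=0)
scores = cross_val_score(baseline, X, y, cv=5)
print(scores)         # individual folds can deviate noticeably from 0.5
print(scores.mean())  # the baseline a real classifier should beat
```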
Precision vs. recall
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Trade-off
Classification – evaluation metrics
F1-Score = harmonic mean of precision and recall
Precision, recall, and F1-score apply to binary balanced, binary imbalanced, and multiclass classification.
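A quick sketch of the three metrics on made-up binary predictions (for multiclass you would also pass an averaging strategy such as average="macro"):

```python
# Sketch: precision, recall, and F1 on made-up binary predictions
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # 3 TP, 1 FN, 1 FP, 3 TN

print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75
```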
Classification in Python
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
X is the data matrix (features); y is the class (‘five’ or ‘not a five’)
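A runnable sketch of this example, assuming the small bundled digits dataset as a stand-in for the book's MNIST data and SGDClassifier as the linear model:

```python
# Sketch of the "five vs. not-five" classifier from the slide; the bundled
# digits dataset stands in for the MNIST data used in the book
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data               # data matrix: one row of pixel features per image
y = (digits.target == 5)      # class: 'five' (True) or 'not a five' (False)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SGDClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict(X_test[:3]))    # True/False for the first three test images
print(clf.score(X_test, y_test))  # accuracy (inflated by class imbalance)
```

Note that plain accuracy looks flattering here because about 90% of the images are already "not a five", which is exactly why the chance-level discussion above matters.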
Support Vector Machine (SVM)
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Support Vector Machine (SVM)
- Some datasets are not even close to being linearly separable.
- One approach is to use polynomial features, e.g., x2 = (x1)^2, x3 = (x1)^3
Support Vector Machine (SVM)
- Kernel methods
https://towardsdatascience.com/the-kernel-trick-c98cdbcaeb3f
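Both routes can be sketched with scikit-learn; the moons dataset and the hyperparameters below are illustrative assumptions, loosely following Géron's polynomial-SVM example:

```python
# Sketch: two ways to classify data that is not linearly separable
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC, SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Option 1: add polynomial features (x1^2, x1*x2, x1^3, ...) explicitly
poly_clf = make_pipeline(PolynomialFeatures(degree=3),
                         StandardScaler(),
                         LinearSVC(C=10, max_iter=5000))
poly_clf.fit(X, y)

# Option 2: the kernel trick gets the same effect without ever
# materialising the polynomial features
kernel_clf = SVC(kernel="poly", degree=3, coef0=1, C=5).fit(X, y)

print(poly_clf.score(X, y), kernel_clf.score(X, y))
```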
LDA: Linear Discriminant Analysis
and Data projection
[Figure: data with features X: [x1, x2] and classes Y ∈ {green, blue}]
Sometimes it is easier to look at things from a different angle, instead of searching for a complicated solution
Data projection
Y ∈ {green, blue}, X: [x1, x2]
Xproj = X - [2, 0]
Xproj = [x1, x2] - [2, 0]
Xproj = [x1 - 2, x2]
Data projection
Y ∈ {green, blue}, X: [x1, x2]
Xproj = X - [2, 3]
Xproj = [x1, x2] - [2, 3]
Xproj = [x1 - 2, x2 - 3]
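This translation is a one-liner in NumPy; the points below are made up for illustration:

```python
# Sketch: projecting (translating) points into a new axis system
import numpy as np

X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [1.0, 0.0]])          # rows are points [x1, x2]

X_proj = X - np.array([2.0, 3.0])   # Xproj = [x1 - 2, x2 - 3] for every row
print(X_proj)
```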
Data projection
A projection is a transformation of data points from one axis system to another
[Figure: the same points shown in the original axes (x1, x2) and in the projected axes (xproj1, xproj2)]
Data projection
[Figure: a bad projection vs. a good projection]
Data projection
LDA: Linear Discriminant Analysis
[Figure: a good projection axis]
Find the axis that:
- Maximises the variance of the class means (between-class)
- Minimises the within-class variance
Data projection
[Figure: the data projected onto the good axis, xproj]
Perfect separability between classes
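LDA can be used directly as a supervised projection in scikit-learn; the two-blob dataset below is an illustrative assumption:

```python
# Sketch: LDA as a supervised projection onto a single axis
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([+2.0, 0.0], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1)  # 2 classes -> at most 1 axis
X_proj = lda.fit_transform(X, y)                  # the projected 1-D data
print(X_proj.shape)      # (100, 1)
print(lda.score(X, y))   # accuracy when classifying along that axis
```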
Discussion
• How could we design a pothole detector that can map the potholes in
Dublin? What would be the data? How would we use this data to
perform classification and detect the potholes?
[Pipeline: Problem/question → Data collection → Preprocessing/cleaning → Analysing (ML) → Interpretation/outcome → Improve (and back to the start), with visualisation at every stage]
Supervised Learning
Model training (learning or fit): y = f(X), where X and y are known and f is unknown (learned from the data)
Using the model (test): ynew = f(Xnew), where Xnew and f are known and ynew is unknown (predicted)
Classification: y is a category/class
Regression: y is a number
Regression
Classification: find a decision boundary, e.g.:
combination of X > boundary → y is class A
combination of X < boundary → y is class B
Regression: find a function, e.g.:
y = combination of X
Regression
[Figure: regression example with features X1 = cost of materials, X2 = inflation, and target y = average cost of a house]
Using the past (of X) to predict the future (of y)
Regression
y is the dependent variable; X contains the independent variables
Classification in Python
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
X is the data matrix (features); y is the class (‘five’ or ‘not a five’)
Regression in Python
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
X is the data matrix (features); y is the target: a number, not a class
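A regression counterpart of the classification sketch earlier, assuming made-up linear data and scikit-learn's LinearRegression:

```python
# Sketch: linear regression, where y is a number rather than a class
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(100, 1))                 # data matrix (features)
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0.0, 0.5, size=100)  # numeric target

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # recovers roughly slope 3 and intercept 5
print(reg.predict([[4.0]]))        # prediction for a new input
```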
Editor's Notes
Mention that the main challenge is always to determine those axes (features). Not just 2D, multidimensional. It could be age, height,