1. Machine Learning Programming
BDA712-00
Lecturer: Josué Obregón PhD
Kyung Hee University
Department of Big Data Analytics
October 12, 2022
Logistic Regression II:
Model validation, image recognition and multiclass classification
1
Machine Learning Programming, KHU
2. Previously, in our course…
• Your first learning program: building a tiny supervised learning program
• Hyperspace! Multiple linear regression
• Getting real: recognizing a single digit using MNIST
• A discerning machine: from regression to classification
• Walking the gradient: the gradient descent algorithm
3. And today…
• Your first learning program: building a tiny supervised learning program
• Hyperspace! Multiple linear regression
• Getting real: recognizing a single digit using MNIST
• A discerning machine: from regression to classification
• Walking the gradient: the gradient descent algorithm
4. Today's agenda
• Model evaluation and selection
• Training vs. testing
• MNIST dataset
• Data input format
• Recognizing a single digit
• Data preprocessing and encoding
• Going multiclass
• Intuition behind the loss function
• Transforming linear regression to logistic regression
5. How should we think about model selection?
One of the central themes of this class, and of machine learning, is:
Generalizability: We want to construct models that generalize well to unseen data.
• i.e., we want to:
1. Add variables/flexibility as long as doing so helps capture meaningful trends in the data (avoid underfitting)
2. Ignore meaningless random fluctuations in the data (avoid overfitting)
6. How should we think about model selection?
Let's remind ourselves of the first Central Theme of this class.
1. Generalizability: We want to construct models that generalize well to unseen data.
• i.e., we want to:
1. Add variables/flexibility as long as doing so helps capture meaningful trends in the data (avoid underfitting)
2. Ignore meaningless random fluctuations in the data (avoid overfitting)
7. Assessing Model Performance
• Suppose we fit a model $\hat{f}(x)$ to some training data: $\mathrm{Train} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$
• We want to assess how well $\hat{f}$ performs
• We can compute the average squared prediction error over Train:
$$\mathrm{MSE}_{\mathrm{Train}} = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - \hat{f}(x^{(i)}) \right)^2$$
• But this may push us towards more overfit models.
• Instead, we should compute it using fresh test data: $\mathrm{Test} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$
$$\mathrm{MSE}_{\mathrm{Test}} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{f}(x^{(i)}) \right)^2$$
• This would tell us whether $\hat{f}$ generalizes well to new data
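Both formulas compute the same average, just over different data. As a quick sketch (the helper name and example values are mine, not from the slides), the shared computation is:

```python
import numpy as np

def mse(y, y_hat):
    # Average squared prediction error: mean of (y_i - f_hat(x_i))^2
    return np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)

# Example: predictions off by (0, 0, 2) give MSE = (0 + 0 + 4) / 3
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
error = mse(y_true, y_pred)
```

Passing the training pairs gives MSE_Train; passing held-out pairs gives MSE_Test.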
8. Assessing Model Accuracy: Training Error vs. Testing Error
Here are three different models fit to the same small Train data set. Which of these three
is the best model?
9. Assessing Model Accuracy: Training Error vs. Testing Error

Model                1      2      3
MSE_Train = RSS/n    23.2   5.2    7.5
10. Assessing Model Accuracy: Training Error vs. Testing Error
Here are some new observations, which form our Test data. How well do our models fit the Test data?
Solid green points: Test data. Open grey circles: Train data.
11. Assessing Model Accuracy: Training Error vs. Testing Error

Model       1      2      3
MSE_Train   23.2   5.2    7.5
MSE_Test    24.6   10.3   7.0
12. Assessing Model Accuracy
• As we increase the flexibility of our model, our training set error always decreases
• The same is not true for test set error
• The test set error will decrease as we add flexibility that helps to capture useful
trends
• As we add too much flexibility, the test set error will begin to increase
due to model overfitting
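The behaviour described above can be demonstrated with a small sketch, where polynomial degree stands in for "flexibility"; everything here is an assumed illustration on synthetic data, not course code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # A cubic trend plus noise.
    x = rng.uniform(-2, 2, n)
    y = x**3 - x + rng.normal(0, 1.0, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(30)

train_errors, test_errors = [], []
for degree in range(1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    train_errors.append(np.mean((y_train - np.polyval(coeffs, x_train)) ** 2))
    test_errors.append(np.mean((y_test - np.polyval(coeffs, x_test)) ** 2))
# Training error can only go down as degree grows; test error typically
# stops improving once the extra flexibility starts fitting noise.
```

Plotting the two error lists against degree reproduces the classic picture: a monotonically falling training curve and a test curve that bottoms out and turns upward.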
14. MNIST Data
• MNIST is a collection of labeled images that’s been assembled
specifically for supervised learning.
• Its name stands for “Modified NIST,” because it’s a remix of earlier data from
the National Institute of Standards and Technology.
• MNIST contains images of handwritten digits, labeled with their
numerical values.
• 60,000 images for training and 10,000 for testing
15. MNIST Data
• Digits are made up of 28 by 28 grayscale pixels, each represented by
one byte.
• In MNIST’s grayscale, 0 stands for “perfect background white,” and
255 stands for “perfect foreground black.”
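A tiny sketch of that encoding; the scaling step at the end is a common convention I'm assuming, not something the slide prescribes.

```python
import numpy as np

# One byte per pixel: 0 = background white, 255 = foreground black
# (the reverse of typical screen grayscale).
image = np.zeros((28, 28), dtype=np.uint8)  # an all-background "image"
image[10:18, 13:15] = 255                   # draw a dark vertical stroke

# Scale to [0, 1] so 1.0 means "fully inked" - an assumed preprocessing
# convention that keeps later arithmetic well-behaved.
scaled = image / 255.0
```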
16. How to interpret image data
https://dev.to/sandeepbalachandran/machine-learning-going-furthur-with-cnn-part-2-41km
17. Preparing the input matrices
1. Start from a 28×28 image.
2. Flatten (reshape) the 2D matrix into a 784-element 1D vector: 0 0 0 235 … 1 2 1 0
3. Add the bias column.
4. The result is the input for our logistic regression algorithm.
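The steps above can be sketched in NumPy (function names here are assumptions, not the course's reference implementation):

```python
import numpy as np

def prepend_bias(X):
    # Insert a column of 1s at position 0 of every row (the bias column).
    return np.insert(X, 0, 1, axis=1)

def prepare_images(images):
    # Flatten each 28x28 matrix into a 784-element row vector,
    # then add the bias column in front.
    flattened = images.reshape(images.shape[0], -1)
    return prepend_bias(flattened)

batch = np.zeros((10, 28, 28))   # a stand-in batch of ten blank images
X = prepare_images(batch)
# X now has shape (10, 785): 784 pixels plus one bias column per image
```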
19. Let's get real (Lab Session 06)
Goal: Build a program on top of our previous implementation that uses the MNIST dataset as input and classifies the images according to the digits from 0 to 9. Additionally, check the generalization capabilities of our model by measuring its performance on unseen data (the test set).
Let's do it!
• https://classroom.github.com/a/R-cw8Rn-
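Under the stated goal, a minimal one-vs-all logistic-regression loop might look like the sketch below. It uses tiny synthetic "images" instead of MNIST, and every name in it is an assumption rather than the lab's reference solution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, iterations, lr):
    # One weight vector per class, trained by batch gradient descent.
    w = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iterations):
        predictions = sigmoid(X @ w)
        gradient = X.T @ (predictions - Y) / X.shape[0]
        w -= lr * gradient
    return w

def classify(X, w):
    # Pick the class whose binary classifier is most confident.
    return np.argmax(sigmoid(X @ w), axis=1)

# Tiny synthetic dataset: 3 well-separated classes, 2 features plus bias.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 0], [0, 5]])
labels = np.repeat([0, 1, 2], 30)
X = centers[labels] + rng.normal(0, 0.5, (90, 2))
X = np.insert(X, 0, 1, axis=1)   # bias column
Y = np.eye(3)[labels]            # one-hot encoding of the labels
w = train(X, Y, iterations=200, lr=0.1)
accuracy = np.mean(classify(X, w) == labels)
```

For the lab itself, the same `train`/`classify` structure would be fed the 785-column MNIST matrices, with accuracy measured on the held-out test set rather than the training data.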
22. Acknowledgements
Some of the lecture notes for this class feature content borrowed, with or without modification, from the following sources:
• 95-791 Data Mining, Carnegie Mellon University, lecture notes (Prof. Alexandra Chouldechova)
• An Introduction to Statistical Learning, with Applications in R (Springer, 2013), with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
• Machine Learning online course by Andrew Ng