Hands-On Machine Learning Chapter 4: Model Training

Interaction Lab. Kumoh National Institute of Technology
Hands-On Machine Learning
with Scikit-Learn, Keras & TensorFlow
Chapter 4. Model Training
Jeong JaeYeop

Agenda
■Linear Regression
■Gradient Descent
■Polynomial Regression
■Training Curve
■Regularized Linear Regression
■Logistic Regression


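■Linear Regression
 $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
• $\hat{y}$ : Predicted value
• $n$ : Number of features
• $x_i$ : Value of the $i$-th feature
• $\theta_0$ : Bias term; $\theta_1, \dots, \theta_n$ : Feature weights
 $MSE(X, h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$
• Mean squared error
■ Cost function
■ Averages the squared difference between predicted value and actual value over the $m$ training samples
■ The closer the predictions are to the actual values, the smaller the MSE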
Linear Regression(1/4)

■Normal Equation
 $\hat{\theta} = (X^T X)^{-1} X^T y$
• $\hat{\theta}$ : Value of $\theta$ that minimizes the cost function
• $y$ : Target vector

■Normal Equation
 In Scikit-learn (LinearRegression)
• coef_ : Feature weights
• intercept_ : Bias term
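A minimal sketch of the normal equation next to Scikit-learn's LinearRegression. The synthetic data (y = 4 + 3x + noise), the seed, and the variable names are assumptions made for illustration, not taken from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)

# Normal equation: prepend a column of ones so that theta_hat[0] is the bias
X_b = np.c_[np.ones((100, 1)), X]
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# The same least-squares fit with Scikit-learn
lin_reg = LinearRegression()
lin_reg.fit(X, y)

print("normal equation:", theta_hat)
print("intercept_:", lin_reg.intercept_, "coef_:", lin_reg.coef_)
```

Both approaches should agree up to floating-point error: intercept_ corresponds to $\theta_0$ and coef_ to the remaining weights.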


■Gradient Descent
 Iteratively adjusts the parameters to minimize the cost function
 Step size : Determined by the learning rate
Gradient Descent(1/6)

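■Gradient Descent in Scikit-learn
 StandardScaler : Scale the features first, since gradient descent converges faster when all features have a similar scale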

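■Batch Gradient Descent
 Gradient is computed over the entire training set at every step
 $MSE(X, h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$
• $\nabla_\theta MSE(\theta) = \frac{2}{m} X^T (X\theta - y)$ (vector of all partial derivatives $\frac{\partial}{\partial \theta_j} MSE(\theta)$)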
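A minimal NumPy sketch of batch gradient descent using the gradient formula $\frac{2}{m} X^T (X\theta - y)$ and the update $\theta \leftarrow \theta - \eta \nabla_\theta MSE(\theta)$. The data, the learning rate `eta`, and the iteration count are assumed values chosen for illustration.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
m = 200
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.5, size=m)
X_b = np.c_[np.ones((m, 1)), X]  # add the bias column

eta = 0.1            # learning rate (assumed value)
n_iterations = 1000  # assumed value
theta = np.zeros(2)

for _ in range(n_iterations):
    # Gradient of the MSE over the entire training set
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients

print(theta)
```

With a suitable learning rate, the result should match the least-squares solution from the normal equation.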

■Stochastic Gradient Descent
 Gradient is computed from a single randomly chosen sample at each step
• Learning schedule
■ Gradually reduces the learning rate during training
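A sketch of stochastic gradient descent with a simple learning schedule. The schedule form `t0 / (t + t1)`, its constants, the data, and the epoch count are assumptions for illustration, not values from the slides.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
m = 200
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.1, size=m)
X_b = np.c_[np.ones((m, 1)), X]

t0, t1 = 5, 50  # learning schedule constants (assumed values)

def learning_schedule(t):
    """Learning rate that shrinks gradually as the step count t grows."""
    return t0 / (t + t1)

n_epochs = 50
theta = np.zeros(2)

for epoch in range(n_epochs):
    for step in range(m):
        i = rng.integers(m)                    # pick one random sample
        xi, yi = X_b[i], y[i]
        gradient = 2 * xi * (xi @ theta - yi)  # MSE gradient for that single sample
        eta = learning_schedule(epoch * m + step)
        theta = theta - eta * gradient

print(theta)
```

Because each step sees only one sample, the path is noisy. Shrinking the learning rate lets the parameters settle near the minimum instead of bouncing around it. Scikit-learn's SGDRegressor provides a library implementation of SGD for linear regression.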

■Stochastic Gradient Descent
 Figure: the first 20 steps of stochastic gradient descent

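■Mini-Batch Gradient Descent
 Gradient is computed on a small random subset of samples called a mini-batch
• Neither the entire training set nor a single sample
• Performance boost from hardware-optimized matrix operations, especially on GPUs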
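A sketch of mini-batch gradient descent in NumPy. The batch size, learning rate, epoch count, and data are assumed values for illustration.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
m = 200
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.1, size=m)
X_b = np.c_[np.ones((m, 1)), X]

batch_size = 20  # assumed value
eta = 0.05       # assumed value
n_epochs = 100   # assumed value
theta = np.zeros(2)

for epoch in range(n_epochs):
    indices = rng.permutation(m)  # reshuffle the training set every epoch
    for start in range(0, m, batch_size):
        batch = indices[start:start + batch_size]
        X_batch, y_batch = X_b[batch], y[batch]
        # MSE gradient computed on the mini-batch only
        gradients = (2 / len(batch)) * X_batch.T @ (X_batch @ theta - y_batch)
        theta = theta - eta * gradients

print(theta)
```

Each update is a small matrix operation, which is the part that hardware-optimized libraries and GPUs accelerate.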


■Polynomial Regression
 For data that is not linear but follows a more complex shape
• Add powers of each feature as new features
• Train a linear model on the extended feature set
Polynomial Regression(1/2)

■Polynomial Regression in Scikit-learn
 PolynomialFeatures
 Data : $y = 0.5x^2 + 1.0x + 2.0 + \text{noise}$
 Fitted model : $\hat{y} = 0.56x^2 + 0.93x + 1.78$
Polynomial Regression(2/2)
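A sketch of polynomial regression with PolynomialFeatures followed by LinearRegression. The data follows the quadratic form on the slide, but the seed, sample count, and input range are assumptions, so the fitted coefficients will not reproduce the slide's numbers exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data with noise: y = 0.5 x^2 + 1.0 x + 2.0 + noise
rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + 1.0 * X[:, 0] + 2.0 + rng.normal(size=m)

# Add x^2 as a new feature; columns of X_poly are [x, x^2]
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

# A plain linear model trained on the extended features
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

print("intercept:", lin_reg.intercept_)
print("coefficients for [x, x^2]:", lin_reg.coef_)
```

The model is still linear in its parameters. Only the feature set was extended.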


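■Training Curve
 Plots the model's error on the training set and on the validation set as a function of training set size
• Train the model several times on subsets of the training set of different sizes
Training Curve(1/2)

Training Curve(2/2)
 Figure: training curves of a degree-1 (linear) model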
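A sketch that computes training-curve data for a degree-1 (linear) model with Scikit-learn's learning_curve. The quadratic data, the 5-fold cross-validation, and the choice of RMSE as the metric are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Quadratic data (assumed for illustration), so a linear model underfits
rng = np.random.default_rng(42)
m = 200
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=m)

# Train on growing subsets of the training data and score each fit
train_sizes, train_scores, valid_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring="neg_root_mean_squared_error",
)

# Scores are negated RMSE values; flip the sign and average over the folds
train_rmse = -train_scores.mean(axis=1)
valid_rmse = -valid_scores.mean(axis=1)

for size, tr, va in zip(train_sizes, train_rmse, valid_rmse):
    print(f"{size:4d} samples  train RMSE={tr:.3f}  validation RMSE={va:.3f}")
```

For an underfitting model, both curves typically level off at a fairly high error and end up close to each other.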


■Regularization
 Reduces overfitting
• Constrain the weights of the model
Regularized Linear Regression(1/8)

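■Ridge Regression
 Regularization term : $\frac{\alpha}{2} \sum_{i=1}^{n} \theta_i^2$
• $J(\theta) = MSE(\theta) + \alpha \frac{1}{2} \sum_{i=1}^{n} \theta_i^2$
• $\alpha$ : Hyperparameter that controls the strength of regularization
• If $\alpha = 0$, ridge regression reduces to plain linear regression
• $w = (\theta_1, \theta_2, \theta_3, \dots, \theta_n)^T$ : Weight vector (the bias $\theta_0$ is not regularized)
• Regularization term in vector form : $\frac{\alpha}{2} (\lVert w \rVert_2)^2$
• In gradient descent, add $\alpha w$ to the MSE gradient vector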
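A sketch of ridge regression by gradient descent, adding $\alpha w$ to the MSE gradient and leaving the bias unregularized. The data, `alpha`, `eta`, and the iteration count are assumed values. Scikit-learn's Ridge minimizes the unnormalized sum of squared errors plus `alpha` times the squared L2 norm, so the comparison rescales its `alpha` by $m/2$ to match the cost function used here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(1)
m, n = 100, 3
X = rng.normal(size=(m, n))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=m)
X_b = np.c_[np.ones((m, 1)), X]

alpha = 0.1  # regularization strength (assumed value)
eta = 0.1    # learning rate (assumed value)
theta = np.zeros(n + 1)

for _ in range(2000):
    mse_gradient = (2 / m) * X_b.T @ (X_b @ theta - y)
    w = np.r_[0.0, theta[1:]]  # weight vector with a zero in the bias slot
    theta = theta - eta * (mse_gradient + alpha * w)

# Closed-form minimizer of J(theta) = MSE(theta) + (alpha / 2) * sum(theta_i^2)
A = np.eye(n + 1)
A[0, 0] = 0.0  # do not regularize the bias term
theta_closed = np.linalg.solve(X_b.T @ X_b + (alpha * m / 2) * A, X_b.T @ y)

# Scikit-learn equivalent under its own scaling of the penalty
ridge = Ridge(alpha=alpha * m / 2)
ridge.fit(X, y)

print(theta, theta_closed, ridge.intercept_, ridge.coef_)
```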


■Lasso Regression
 Regularization term : $\alpha \sum_{i=1}^{n} |\theta_i|$
 $J(\theta) = MSE(\theta) + \alpha \sum_{i=1}^{n} |\theta_i|$
 Tends to set the weights of the least important features to exactly zero
 Automatically performs feature selection and yields a sparse model

■Lasso Regression
 The cost function is not differentiable at $\theta_i = 0$
• Gradient descent still works if a subgradient vector is used instead
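A sketch showing the sparsity of a Lasso model and a subgradient of the L1 penalty. The data (five features, only the first two informative) and `alpha` are assumptions for illustration. `lasso_subgradient` is a hypothetical helper, not a Scikit-learn function, and Scikit-learn's Lasso scales the squared-error term differently from the slide's cost function.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Five features, but only the first two influence y (assumed setup)
rng = np.random.default_rng(0)
m = 200
X = rng.normal(size=(m, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=m)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)  # weights of the uninformative features are driven to zero

def lasso_subgradient(w, alpha):
    """Subgradient of alpha * sum(|w_i|): sign(w_i), taken as 0 where w_i = 0."""
    return alpha * np.sign(w)

print(lasso_subgradient(np.array([-2.0, 0.0, 3.0]), alpha=0.5))
```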


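■Elastic Net
 Mix of Ridge and Lasso, controlled by the mix ratio $r$
• $J(\theta) = MSE(\theta) + r\alpha \sum_{i=1}^{n} |\theta_i| + \frac{1-r}{2} \alpha \sum_{i=1}^{n} \theta_i^2$
• $r = 0$ : Equivalent to Ridge regression
• $r = 1$ : Equivalent to Lasso regression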
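A sketch of the Elastic Net penalty and its two limiting cases, followed by Scikit-learn's ElasticNet, whose `l1_ratio` parameter plays the role of the mix ratio $r$. `elastic_net_penalty` is a hypothetical helper, and the data and hyperparameter values are assumptions. Scikit-learn scales the squared-error term differently from the slide's cost function.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net_penalty(w, alpha, r):
    """r * alpha * sum(|w_i|) + (1 - r) / 2 * alpha * sum(w_i^2)."""
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return r * alpha * l1 + (1 - r) / 2 * alpha * l2

w = np.array([1.0, -2.0, 0.0])
print(elastic_net_penalty(w, alpha=0.1, r=0.0))  # pure Ridge penalty
print(elastic_net_penalty(w, alpha=0.1, r=1.0))  # pure Lasso penalty

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)
print(elastic_net.coef_)
```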

■Early stopping
 Stop training as soon as the validation error reaches its minimum
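A sketch of early stopping with SGDRegressor trained one epoch at a time through partial_fit. The data, the degree-10 polynomial features, the learning rate, and the `patience` rule (stop after 20 epochs without improvement) are all assumptions added for illustration. The slide only states that training stops when the validation error is minimal.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Noisy quadratic data and a deliberately flexible model (assumed setup)
rng = np.random.default_rng(42)
m = 200
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=m)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=42)

poly = PolynomialFeatures(degree=10, include_bias=False)
scaler = StandardScaler()
X_train_prep = scaler.fit_transform(poly.fit_transform(X_train))
X_val_prep = scaler.transform(poly.transform(X_val))

sgd_reg = SGDRegressor(penalty=None, learning_rate="constant",
                       eta0=0.0005, random_state=42)

best_val_mse, best_epoch, best_params = np.inf, None, None
val_errors = []
patience, epochs_without_improvement = 20, 0  # assumed stopping rule

for epoch in range(1000):
    sgd_reg.partial_fit(X_train_prep, y_train)  # one pass over the training set
    val_mse = mean_squared_error(y_val, sgd_reg.predict(X_val_prep))
    val_errors.append(val_mse)
    if val_mse < best_val_mse:
        # Remember the best parameters seen so far
        best_val_mse, best_epoch = val_mse, epoch
        best_params = (sgd_reg.intercept_.copy(), sgd_reg.coef_.copy())
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # validation error has stopped improving

print("best epoch:", best_epoch, "validation MSE:", best_val_mse)
```

Waiting a few epochs before stopping guards against quitting on a noisy bump in the validation error. The saved `best_params` are the ones to keep.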

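■Probability Estimate
 Computes a weighted sum of the input features (plus a bias term)
 $\hat{p} = h_\theta(x) = \sigma(\theta^T x)$
 Predicts the positive class ($\hat{y} = 1$) if $\hat{p} \ge 0.5$, otherwise the negative class ($\hat{y} = 0$)
• Binary classification
 $\sigma(\cdot)$ : Sigmoid function, $\sigma(t) = \frac{1}{1 + \exp(-t)}$
• Output : A number between 0 and 1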
Logistic Regression(1/3)
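A sketch of the probability estimate and the 50% decision rule. The parameter vector `theta` and the three inputs are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def predict_proba(theta, X_b):
    """Estimated probability of the positive class: sigma(theta^T x)."""
    return sigmoid(X_b @ theta)

def predict(theta, X_b, threshold=0.5):
    """Class 1 if the estimated probability reaches the threshold, else class 0."""
    return (predict_proba(theta, X_b) >= threshold).astype(int)

theta = np.array([-1.0, 2.0])   # hypothetical parameters [bias, weight]
X_b = np.array([[1.0, 0.0],     # each row: [1, x]
                [1.0, 0.5],
                [1.0, 2.0]])

print(predict_proba(theta, X_b))
print(predict(theta, X_b))
```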

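■Training and Cost Function
 Find the parameters $\theta$ so that the model estimates
• High probabilities for positive ($y = 1$) samples
• Low probabilities for negative ($y = 0$) samples
• $c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases}$
• $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{p}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{p}^{(i)}) \right]$
• $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma(\theta^T x^{(i)}) - y^{(i)} \right) x_j^{(i)}$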
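A sketch of training logistic regression by batch gradient descent on the log loss, using the partial-derivative formula in vectorized form. The data, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(theta, X_b, y):
    """Average log loss J(theta); probabilities are clipped to avoid log(0)."""
    p = np.clip(sigmoid(X_b @ theta), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_loss_gradient(theta, X_b, y):
    """Vector of partial derivatives: (1/m) * X^T (sigma(X theta) - y)."""
    return X_b.T @ (sigmoid(X_b @ theta) - y) / len(y)

# Synthetic binary data (assumed for illustration): class 1 when x is mostly positive
rng = np.random.default_rng(0)
m = 200
x = rng.normal(size=m)
y = (x + 0.3 * rng.normal(size=m) > 0).astype(float)
X_b = np.c_[np.ones(m), x]

theta = np.zeros(2)
initial_loss = log_loss(theta, X_b, y)
eta = 0.5  # learning rate (assumed value)

for _ in range(2000):
    theta = theta - eta * log_loss_gradient(theta, X_b, y)

final_loss = log_loss(theta, X_b, y)
accuracy = np.mean((sigmoid(X_b @ theta) >= 0.5) == (y == 1))
print(initial_loss, final_loss, accuracy)
```

There is no closed-form solution for this cost function, but it is convex, so gradient descent with a suitable learning rate approaches the global minimum.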

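■Softmax Regression
 Multinomial logistic regression
• Supports multiple classes directly, without training and combining several binary classifiers
 $\hat{p}_k = \sigma(s(x))_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))}$
 Predict the class with the highest score
• $\hat{y} = \underset{k}{\mathrm{argmax}}\, \sigma(s(x))_k = \underset{k}{\mathrm{argmax}}\, s_k(x) = \underset{k}{\mathrm{argmax}}\, (\theta^{(k)})^T x$
 Cost function
• Cross-entropy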
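A sketch of the softmax function, the argmax prediction, and the cross-entropy cost $J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log(\hat{p}_k^{(i)})$. The parameter matrix `Theta` and the inputs are hypothetical values for illustration. Subtracting the row maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: exp(s_k) / sum_j exp(s_j)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(proba, y):
    """Mean of -log(probability assigned to the true class); y holds class indices."""
    m = len(y)
    return -np.mean(np.log(proba[np.arange(m), y]))

# Hypothetical parameters: one column theta^(k) per class, rows are [bias, weight]
Theta = np.array([[0.0, 1.0, -1.0],
                  [2.0, 0.0, -2.0]])
X_b = np.array([[1.0, 1.5],    # each row: [1, x]
                [1.0, -2.0],
                [1.0, 0.0]])

scores = X_b @ Theta            # s_k(x) = (theta^(k))^T x for every class k
proba = softmax(scores)
y_pred = proba.argmax(axis=1)   # same class as the highest raw score

print(proba)
print(y_pred)
print(cross_entropy(proba, y_pred))
```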
