Interaction Lab. Kumoh National Institute of Technology
Hands-On Machine Learning
with Scikit-Learn, Keras & TensorFlow
Chapter 4. Model Training
Jeong JaeYeop
Agenda
■Linear Regression
■Gradient Descent
■Polynomial Regression
■Training Curve
■Regularized Linear Regression
■Logistic Regression
Data Engineering Lab., Kumoh National Institute of Technology
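Linear Regression(1/4)
■Linear Regression
➢ $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
• $\hat{y}$ : Predicted value
• $n$ : Number of features
• $x_i$ : Value of the $i$-th feature
• $\theta_j$ : $j$-th model parameter ($\theta_0$ is the bias term)
➢ $\mathrm{MSE}(\mathbf{X}, h_\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^T \mathbf{x}^{(i)} - y^{(i)}\right)^2$
• Mean squared error ($m$ : number of training instances)
■ Used as the cost function
■ Averages the squared error (predicted value - actual value)
■ The closer the predictions are to the actual values, the smaller the MSE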
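A minimal NumPy sketch of the prediction and MSE formulas above. The helper names `predict` and `mse` are hypothetical, and the input matrix is assumed to already contain a leading column of 1s for the bias term $\theta_0$.

```python
import numpy as np

def predict(X_b, theta):
    """Linear model prediction; each row of X_b starts with 1 for the bias term."""
    return X_b @ theta

def mse(X_b, y, theta):
    """Mean squared error of theta on (X_b, y), as in MSE(X, h_theta)."""
    errors = predict(X_b, theta) - y
    return np.mean(errors ** 2)

# Toy data: 3 instances, 1 feature, bias column prepended
X_b = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])

print(mse(X_b, y, np.array([1.0, 2.0])))   # y_hat = 1 + 2x fits exactly: 0.0
print(mse(X_b, y, np.array([0.0, 2.0])))   # every prediction is off by 1: 1.0
```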
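Linear Regression(2/4)
■Normal Equation
➢ $\hat{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
• $\hat{\theta}$ : Value of $\theta$ that minimizes the cost function
• $\mathbf{y}$ : Target vector containing $y^{(1)}$ to $y^{(m)}$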
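The Normal Equation can be evaluated directly with NumPy. This sketch assumes synthetic data generated from $y = 4 + 3x + \text{noise}$; the result is close to, but not exactly, $[4, 3]$ because of the noise. In practice `np.linalg.lstsq` (or `np.linalg.pinv`) is numerically safer than an explicit inverse, so the sketch also compares against it.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))                     # one feature in [0, 2)
y = 4 + 3 * X[:, 0] + rng.normal(size=m)       # assumed: y = 4 + 3x + Gaussian noise

X_b = np.c_[np.ones((m, 1)), X]                # add x0 = 1 to every instance
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

theta_lstsq = np.linalg.lstsq(X_b, y, rcond=None)[0]   # same solution, more stable
print(theta_best, theta_lstsq)
```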
Linear Regression(3/4)
■Normal Equation
➢ $\hat{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
Linear Regression(4/4)
■Normal Equation
➢ In Scikit-Learn (LinearRegression)
• coef_ : Feature weights
• intercept_ : Bias term
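A short sketch of the Scikit-Learn equivalent, again assuming synthetic data from $y = 4 + 3x + \text{noise}$. `LinearRegression` adds the bias term itself, so no column of 1s is needed, and the fitted parameters are exposed as `intercept_` and `coef_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)   # assumed synthetic data

lin_reg = LinearRegression()
lin_reg.fit(X, y)                            # bias column is handled internally
print(lin_reg.intercept_, lin_reg.coef_)     # bias, then array of feature weights
print(lin_reg.predict([[0.0], [2.0]]))
```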
Gradient Descent(1/6)
■Gradient Descent
➢ Repeatedly adjusts the parameters to minimize the cost function
➢ The size of each step is set by the learning rate
• Too small: many iterations are needed; too large: the algorithm may diverge
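To make the role of the learning rate concrete, here is a one-parameter sketch on a hypothetical toy cost $J(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$. With a small learning rate each step shrinks the distance to the minimum; with a rate that is too large every step overshoots further and the iterates blow up.

```python
def gradient_descent_1d(grad, theta0, learning_rate, n_steps):
    """Plain gradient descent on a single parameter."""
    theta = theta0
    for _ in range(n_steps):
        theta -= learning_rate * grad(theta)   # step against the gradient
    return theta

def grad(theta):
    """Derivative of the toy cost J(theta) = (theta - 3)^2."""
    return 2 * (theta - 3)

good = gradient_descent_1d(grad, 0.0, learning_rate=0.1, n_steps=100)
bad = gradient_descent_1d(grad, 0.0, learning_rate=1.1, n_steps=100)
print(good)   # converges to about 3.0
print(bad)    # diverges: |theta - 3| grows by a factor 1.2 per step
```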
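Gradient Descent(2/6)
■Gradient Descent in Scikit-Learn
➢ Feature scaling with StandardScaler
• When using gradient descent, all features should have a similar scale; otherwise convergence takes much longer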
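A sketch of scaling before gradient descent in Scikit-Learn, assuming hypothetical data with two features on very different scales. `make_pipeline` chains `StandardScaler` with `SGDRegressor`, so the scaler is fitted on the training data and applied automatically at prediction time; the hyperparameter values are illustrative, not tuned.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
m = 200
X = np.c_[rng.random(m), 1000 * rng.random(m)]   # hypothetical: feature 2 is ~1000x larger
y = 4 + 3 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(scale=0.1, size=m)

model = make_pipeline(
    StandardScaler(),                                      # zero mean, unit variance per feature
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=42),
)
model.fit(X, y)

scaled = model.named_steps["standardscaler"].transform(X)
print(scaled.mean(axis=0), scaled.std(axis=0))   # close to [0, 0] and [1, 1]
print(model.score(X, y))                          # R^2 on the training data
```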
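Gradient Descent(3/6)
■Batch Gradient Descent
➢ Uses the entire training set to compute the gradient at every step
➢ $\mathrm{MSE}(\mathbf{X}, h_\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^T \mathbf{x}^{(i)} - y^{(i)}\right)^2$
• $\frac{\partial}{\partial \theta_j}\mathrm{MSE}(\theta) = \frac{2}{m}\sum_{i=1}^{m}\left(\theta^T \mathbf{x}^{(i)} - y^{(i)}\right)x_j^{(i)}$
• $\nabla_\theta \mathrm{MSE}(\theta) = \frac{2}{m}\mathbf{X}^T(\mathbf{X}\theta - \mathbf{y})$
• Update step: $\theta \leftarrow \theta - \eta \nabla_\theta \mathrm{MSE}(\theta)$, where $\eta$ is the learning rate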
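A NumPy sketch of batch gradient descent using the gradient vector $\frac{2}{m}\mathbf{X}^T(\mathbf{X}\theta - \mathbf{y})$. The data ($y = 4 + 3x + \text{noise}$), learning rate, and epoch count are assumptions chosen so that the result should match the least-squares (Normal Equation) solution.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)   # assumed synthetic data
X_b = np.c_[np.ones((m, 1)), X]            # add bias feature x0 = 1

eta = 0.1                                  # learning rate (assumed)
n_epochs = 1000
theta = rng.normal(size=2)                 # random initialization

for epoch in range(n_epochs):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)   # full-batch MSE gradient
    theta = theta - eta * gradients

theta_normal = np.linalg.lstsq(X_b, y, rcond=None)[0]
print(theta, theta_normal)                 # the two solutions should agree closely
```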
Gradient Descent(4/6)
■Stochastic Gradient Descent
➢ Computes the gradient on a single randomly picked instance at each step
• Learning schedule
■ Gradually reduces the learning rate so the algorithm can settle near the minimum
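A sketch of stochastic gradient descent with a simple learning schedule $\eta(t) = t_0 / (t + t_1)$. The schedule form, the values of `t0` and `t1`, and the synthetic data are all assumptions; the point is that each update uses one random instance and the step size shrinks over time.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)      # assumed synthetic data
X_b = np.c_[np.ones((m, 1)), X]

t0, t1 = 5, 50                                 # learning schedule hyperparameters (assumed)

def learning_schedule(t):
    """Learning rate that shrinks as the step count t grows."""
    return t0 / (t + t1)

n_epochs = 50
theta = rng.normal(size=2)
for epoch in range(n_epochs):
    for i in range(m):
        idx = rng.integers(m)                        # pick one random instance
        xi, yi = X_b[idx:idx + 1], y[idx:idx + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)     # gradient on that single instance
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
print(theta)
```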
Gradient Descent(5/6)
■Stochastic Gradient Descent
[Figure: the first 20 steps of stochastic gradient descent]
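Gradient Descent(6/6)
■Mini-Batch Gradient Descent
➢ Computes the gradient on small random sets of instances called mini-batches
• A middle ground between the entire training set (batch GD) and a single instance (SGD)
• Gains a performance boost from hardware-optimized matrix operations, especially on GPUs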
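A NumPy sketch of mini-batch gradient descent: the training set is reshuffled every epoch and split into batches, and each update uses the MSE gradient of one batch. The batch size, constant learning rate, and synthetic data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)      # assumed synthetic data
X_b = np.c_[np.ones((m, 1)), X]

batch_size = 20        # assumed mini-batch size
eta = 0.02             # constant learning rate (assumed)
n_epochs = 500
theta = rng.normal(size=2)

for epoch in range(n_epochs):
    perm = rng.permutation(m)                       # reshuffle every epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X_b[idx], y[idx]
        gradients = 2 / len(idx) * xb.T @ (xb @ theta - yb)   # gradient on the mini-batch
        theta = theta - eta * gradients
print(theta)
```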
Polynomial Regression(1/2)
■Polynomial Regression
➢ Fits data that is more complex than a straight line (nonlinear shape)
• Add powers of each feature as new features
• Train a linear model on this extended set of features
Polynomial Regression(2/2)
■Polynomial Regression in Scikit-Learn
➢ PolynomialFeatures
➢ Data generated from $y = 0.5x^2 + 1.0x + 2.0 + \text{noise}$
➢ Model prediction: $\hat{y} = 0.56x^2 + 0.93x + 1.78$
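A sketch of the workflow above: generate data from $y = 0.5x^2 + 1.0x + 2.0 + \text{noise}$, add the squared feature with `PolynomialFeatures`, then fit `LinearRegression` on the extended features. The random seed and sample size are assumptions, so the fitted values will be near, not identical to, the 0.56 / 0.93 / 1.78 shown on the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3                             # x in [-3, 3)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=m)

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)                    # columns: [x, x^2]

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)   # roughly 2.0 and [1.0, 0.5]
```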
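Training Curve(1/2)
■Training Curve (learning curve)
➢ Shows the model's error on the training set and on the validation set as a function of training set size
• Train the model several times on differently sized subsets of the training set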
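A sketch of computing training curves by hand: fit the model on the first 1, 2, ..., m training instances and record the error on that subset and on a fixed validation set. The helper name `learning_curve_errors`, the 80/20 split, and the quadratic synthetic data are assumptions. Scikit-Learn also provides `sklearn.model_selection.learning_curve`, a cross-validated variant.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def learning_curve_errors(model, X, y, val_size=0.2, random_state=42):
    """Hypothetical helper: training and validation MSE for growing training subsets."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=val_size, random_state=random_state)
    train_errors, val_errors = [], []
    for size in range(1, len(X_train) + 1):
        model.fit(X_train[:size], y_train[:size])     # train on the first `size` instances
        train_errors.append(mean_squared_error(y_train[:size], model.predict(X_train[:size])))
        val_errors.append(mean_squared_error(y_val, model.predict(X_val)))
    return np.array(train_errors), np.array(val_errors)

rng = np.random.default_rng(42)
X = 6 * rng.random((100, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=100)   # assumed quadratic data

train_err, val_err = learning_curve_errors(LinearRegression(), X, y)
# Plot np.sqrt(train_err) and np.sqrt(val_err) against the subset size to draw the curves.
```

With a degree 1 model on quadratic data, both curves are expected to level off at a fairly high error, which is the typical sign of underfitting.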
Training Curve(2/2)
[Figure: learning curves of a degree 1 model]
Regularized Linear Regression(1/8)
■Regularization
➢ Reduces overfitting
• Constrains the model's weights
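Regularized Linear Regression(2/8)
■Ridge Regression
➢ Regularization term: $\alpha \frac{1}{2}\sum_{i=1}^{n}\theta_i^2$
• $J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2}\sum_{i=1}^{n}\theta_i^2$
• $\alpha$ : Hyperparameter that controls how much to regularize the model
• If $\alpha = 0$, ridge regression is just linear regression
• The sum starts at $i = 1$: the bias term $\theta_0$ is not regularized
• $\mathbf{w} = (\theta_1, \theta_2, \ldots, \theta_n)$ : Feature weight vector
• Regularization term $= \alpha \frac{1}{2}\left(\lVert \mathbf{w} \rVert_2\right)^2$
• In gradient descent, add $\alpha\mathbf{w}$ to the MSE gradient vector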
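A sketch comparing Scikit-Learn's `Ridge` with the closed-form ridge solution $\hat{\theta} = (\mathbf{X}^T\mathbf{X} + \alpha\mathbf{A})^{-1}\mathbf{X}^T\mathbf{y}$, where $\mathbf{A}$ is the identity matrix with a 0 in the top-left cell so the bias is not penalized. Note that `Ridge` minimizes $\lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2 + \alpha\lVert \mathbf{w} \rVert_2^2$, which differs from the slide's $J(\theta)$ by constant factors, so $\alpha$ values are not directly interchangeable between the two. The data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
m = 50
X = 3 * rng.random((m, 1))
y = 1 + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=m)   # hypothetical data

alpha = 1.0
ridge = Ridge(alpha=alpha, solver="cholesky")
ridge.fit(X, y)

# Closed form with an unpenalized bias: A = identity with A[0, 0] = 0
X_b = np.c_[np.ones((m, 1)), X]
A = np.eye(X_b.shape[1])
A[0, 0] = 0.0
theta = np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

print(theta, ridge.intercept_, ridge.coef_)   # the two fits should match
```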
Regularized Linear Regression(3/8)
■Ridge Regression
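Regularized Linear Regression(4/8)
■Lasso Regression
➢ Regularization term: $\alpha\sum_{i=1}^{n}\lvert\theta_i\rvert$
➢ $J(\theta) = \mathrm{MSE}(\theta) + \alpha\sum_{i=1}^{n}\lvert\theta_i\rvert$
➢ Tends to completely eliminate the weights of the least important features (sets them to zero)
➢ Automatically performs feature selection and outputs a sparse model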
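A sketch of Lasso's sparsity using Scikit-Learn's `Lasso`, assuming hypothetical data where only the first of five features affects the target. With a moderate `alpha`, the weights of the four irrelevant features should come out exactly 0. Scikit-Learn's `Lasso` minimizes $\frac{1}{2m}\lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2 + \alpha\lVert \mathbf{w} \rVert_1$, a scaled version of the slide's $J(\theta)$.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
m = 200
X = rng.normal(size=(m, 5))
# Hypothetical data: only feature 0 matters, features 1-4 are pure noise
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=m)

lasso = Lasso(alpha=0.3)
lasso.fit(X, y)
print(lasso.coef_)   # weights of the irrelevant features should be exactly 0
```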
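Regularized Linear Regression(5/8)
■Lasso Regression
➢ The Lasso cost function is not differentiable at $\theta_i = 0$
• Gradient descent still works if a subgradient vector is used instead
• $g(\theta, J) = \nabla_\theta \mathrm{MSE}(\theta) + \alpha\,(0, \mathrm{sign}(\theta_1), \ldots, \mathrm{sign}(\theta_n))^T$, where $\mathrm{sign}(\theta_i)$ is $-1$, $0$, or $+1$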
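A sketch of the Lasso subgradient vector: the MSE gradient plus $\alpha\,\mathrm{sign}(\theta_i)$ for each weight, with the bias left unpenalized. The function name `lasso_subgradient` is hypothetical; choosing $\mathrm{sign}(0) = 0$ (which `np.sign` does) is one valid subgradient at the non-differentiable point.

```python
import numpy as np

def lasso_subgradient(theta, X_b, y, alpha):
    """Subgradient of MSE(theta) + alpha * sum(|theta_i|), bias theta_0 not penalized."""
    m = len(y)
    mse_grad = 2 / m * X_b.T @ (X_b @ theta - y)
    penalty = alpha * np.sign(theta)   # np.sign(0) == 0, a valid subgradient choice
    penalty[0] = 0.0                   # do not regularize the bias term
    return mse_grad + penalty

# Small check: theta fits the data exactly, so only the penalty part remains
X_b = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
g = lasso_subgradient(np.array([0.0, 1.0]), X_b, y, alpha=0.5)
print(g)   # expected: 0 for the bias, 0.5 for theta_1
```

A gradient descent loop can then use `theta -= eta * lasso_subgradient(theta, X_b, y, alpha)` in place of the usual gradient.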
Regularized Linear Regression(6/8)
■Lasso Regression
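Regularized Linear Regression(7/8)
■Elastic Net
➢ A mix of Ridge and Lasso regularization
• $J(\theta) = \mathrm{MSE}(\theta) + r\alpha\sum_{i=1}^{n}\lvert\theta_i\rvert + \frac{1-r}{2}\alpha\sum_{i=1}^{n}\theta_i^2$
• $r$ : Mix ratio
• $r = 0$: equivalent to Ridge regression
• $r = 1$: equivalent to Lasso regression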
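In Scikit-Learn, `ElasticNet`'s `l1_ratio` parameter plays the role of the mix ratio $r$; its objective is $\frac{1}{2m}\lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2 + \alpha \cdot \text{l1\_ratio} \cdot \lVert \mathbf{w} \rVert_1 + \frac{1}{2}\alpha(1 - \text{l1\_ratio})\lVert \mathbf{w} \rVert_2^2$. This sketch on hypothetical data also checks the $r = 1$ case: with `l1_ratio=1.0` the fit should match `Lasso` with the same `alpha`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.3, size=100)   # hypothetical data

# l1_ratio corresponds to the mix ratio r
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)
print(elastic_net.coef_)

# r = 1: pure L1 penalty, so the result matches Lasso
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.allclose(enet_l1.coef_, lasso.coef_))
```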
Regularized Linear Regression(8/8)
■Early Stopping
➢ Stop training as soon as the validation error reaches its minimum
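A sketch of early stopping with `SGDRegressor.partial_fit`, which runs one pass over the data per call. After each epoch the validation error is measured, the best model so far is kept with `deepcopy`, and training stops once there has been no improvement for `patience` epochs. The patience value, learning rate, and hypothetical data are assumptions. `SGDRegressor` also offers a built-in `early_stopping=True` option (with `validation_fraction` and `n_iter_no_change`).

```python
from copy import deepcopy

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)   # hypothetical data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

sgd = SGDRegressor(learning_rate="constant", eta0=0.001, random_state=42)
patience = 20                  # assumed: epochs to wait without improvement
best_val_error, best_epoch, best_model = float("inf"), None, None

for epoch in range(500):
    sgd.partial_fit(X_train, y_train)              # one more pass over the training set
    val_error = mean_squared_error(y_val, sgd.predict(X_val))
    if val_error < best_val_error:                 # new minimum: remember this model
        best_val_error, best_epoch, best_model = val_error, epoch, deepcopy(sgd)
    elif epoch - best_epoch >= patience:           # no improvement for a while: stop
        break

print(best_epoch, best_val_error)
```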
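Logistic Regression(1/3)
■Probability Estimation
➢ Computes a weighted sum of the input features (plus a bias term) and outputs the logistic of the result
➢ $\hat{p} = h_\theta(\mathbf{x}) = \sigma(\theta^T \mathbf{x})$
➢ If the estimated probability is greater than 50%, the model predicts the positive class (1); otherwise it predicts the negative class (0)
• Binary classification
➢ $\sigma(\cdot)$ : Sigmoid function, $\sigma(t) = \frac{1}{1 + \exp(-t)}$
• Output between 0 and 1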
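A NumPy sketch of logistic regression's probability estimate and decision rule. The function names and the parameter vector are hypothetical; the inputs include a leading 1 for the bias term.

```python
import numpy as np

def sigmoid(t):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-t))

def predict_proba(X_b, theta):
    """Estimated probability of the positive class, p_hat = sigma(theta^T x)."""
    return sigmoid(X_b @ theta)

def predict(X_b, theta):
    """Predict 1 when the estimated probability is greater than 50%, else 0."""
    return (predict_proba(X_b, theta) > 0.5).astype(int)

theta = np.array([-3.0, 2.0])                    # hypothetical parameters: bias, weight
X_b = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
print(predict_proba(X_b, theta))   # sigma(-2), sigma(0), sigma(2)
print(predict(X_b, theta))         # [0 0 1]: exactly 50% is not "greater than"
```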
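Logistic Regression(2/3)
■Training and Cost Function
➢ Find the parameters $\theta$ so that the model estimates
• High probabilities for positive instances ($y = 1$)
• Low probabilities for negative instances ($y = 0$)
• Cost of a single instance: $c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases}$
• $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(\hat{p}^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - \hat{p}^{(i)}\right)\right]$
• $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\sigma\left(\theta^T \mathbf{x}^{(i)}\right) - y^{(i)}\right)x_j^{(i)}$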
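A sketch of the logistic cost $J(\theta)$ and its gradient, with a finite-difference check that the partial-derivative formula above matches the cost. The function names and random test data are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def log_loss(theta, X_b, y):
    """J(theta): average cross-entropy (log loss) of the logistic model."""
    p_hat = sigmoid(X_b @ theta)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def log_loss_gradient(theta, X_b, y):
    """Partial derivatives: (1/m) * sum((sigma(theta^T x) - y) * x_j)."""
    m = len(y)
    return X_b.T @ (sigmoid(X_b @ theta) - y) / m

# Compare the analytic gradient with central finite differences
rng = np.random.default_rng(0)
X_b = np.c_[np.ones(20), rng.normal(size=(20, 2))]   # hypothetical data with bias column
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([
    (log_loss(theta + eps * e, X_b, y) - log_loss(theta - eps * e, X_b, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, log_loss_gradient(theta, X_b, y), atol=1e-6))
```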
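Logistic Regression(3/3)
■Softmax Regression
➢ Multinomial logistic regression
• Supports multiple classes directly, without training and combining several binary classifiers
➢ $\hat{p}_k = \sigma(\mathbf{s}(\mathbf{x}))_k = \frac{\exp(s_k(\mathbf{x}))}{\sum_{j=1}^{K}\exp(s_j(\mathbf{x}))}$, where $s_k(\mathbf{x}) = (\theta^{(k)})^T\mathbf{x}$
➢ Predict the class with the highest score
• $\hat{y} = \underset{k}{\operatorname{argmax}}\ \sigma(\mathbf{s}(\mathbf{x}))_k = \underset{k}{\operatorname{argmax}}\ s_k(\mathbf{x}) = \underset{k}{\operatorname{argmax}}\ \left((\theta^{(k)})^T\mathbf{x}\right)$
➢ Cost function
• Cross-entropy: $J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)$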
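A NumPy sketch of softmax regression's forward pass: compute a score $s_k(\mathbf{x})$ per class, turn scores into probabilities with softmax, pick the argmax, and evaluate the cross-entropy against one-hot targets. The parameter matrix, inputs, and labels are hypothetical. Subtracting each row's maximum score before exponentiating avoids overflow without changing the probabilities.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow without changing the result."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(Y_onehot, P):
    """Average cross-entropy between one-hot targets and predicted probabilities."""
    return -np.mean(np.sum(Y_onehot * np.log(P), axis=1))

# Hypothetical parameter matrix: one column theta^(k) per class (bias row first)
Theta = np.array([[0.0, 1.0, -1.0],
                  [2.0, -1.0, 0.5]])
X_b = np.array([[1.0, 3.0], [1.0, -2.0]])
scores = X_b @ Theta                   # s_k(x) for every instance and class
P = softmax(scores)
y_pred = P.argmax(axis=1)              # same as scores.argmax(axis=1)
print(P.sum(axis=1), y_pred)

Y_onehot = np.eye(3)[[0, 1]]           # hypothetical true classes: 0 and 1
print(cross_entropy(Y_onehot, P))
```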
Q&A

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 

Hands-On Machine Learning Chapter 4: Model Training

  • 1. Interaction Lab. Kumoh National Institute of Technology
    Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow
    Chapter 4. Model Training
    Jeong JaeYeop
  • 2. Agenda
    ■ Linear Regression
    ■ Gradient Descent
    ■ Polynomial Regression
    ■ Learning Curves
    ■ Regularized Linear Regression
    ■ Logistic Regression
    Interaction Lab., Kumoh National Institute of Technology
  • 3. Linear Regression / Gradient Descent / Polynomial Regression
    Data Engineering Lab., Kumoh National Institute of Technology
  • 4. ■Linear Regression  𝑦 = 𝜃0 + 𝜃1𝑥1 + 𝜃2𝑥2 + ⋯ + 𝜃𝑛𝑥𝑛 • 𝑦 : Predicted value • 𝑛 : Size of data • 𝑥 : Input data  𝑀𝑆𝐸(𝑋, ℎ𝜃) = 1 𝑚 𝑖=1 𝑚 (𝜃𝑡𝑥 𝑖 − 𝑦(𝑖))2 • Mean squared error ■ Cost function ■ Predicted value – Actual value ■ Similar to the actual value, MSE value is small Linear Regression(1/4) Interaction Lab., Kumoh National Institue of Technology 4
  • 5. ■Normal Equation  𝜃 = (𝑋𝑡𝑋)−1𝑋𝑡𝑦 • 𝜃 : Value to minimize cost function • 𝑦 : Target vector Linear Regression(2/4) Interaction Lab., Kumoh National Institue of Technology 5
  • 6. ■Normal Equation  𝜃 = (𝑋𝑡 𝑋)−1 𝑋𝑡 𝑦 Linear Regression(3/4) Interaction Lab., Kumoh National Institue of Technology 6
  • 7. ■Normal Equation  In Sckit-learn • coef_ : weight • intercept_ : bias Linear Regression(4/4) Interaction Lab., Kumoh National Institue of Technology 7
  • 8. Gradient Descent / Polynomial Regression / Learning Curves
  • 9. ■Gradient Descent  To adjust the parameters repeatedly to minimize the cost function  Learning step : learning rate Gradient Descent(1/6)
  • 10. Gradient Descent (2/6)
    ■ Gradient Descent in Scikit-Learn
      - Scale the features first (e.g., with StandardScaler) so that they all have a similar scale
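  • 11. Gradient Descent (3/6)
    ■ Batch Gradient Descent
      - Computes the gradient over the entire training set at every step
      - $\mathrm{MSE}(\mathbf{X}, h_\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^\top \mathbf{x}^{(i)} - y^{(i)}\right)^2$ (written $\mathrm{MSE}(\theta)$ below)
        • Partial derivative: $\frac{\partial}{\partial\theta_j}\mathrm{MSE}(\theta) = \frac{2}{m}\sum_{i=1}^{m}\left(\theta^\top \mathbf{x}^{(i)} - y^{(i)}\right)x_j^{(i)}$
        • Gradient vector (all $j$ at once): $\nabla_\theta\mathrm{MSE}(\theta) = \frac{2}{m}\mathbf{X}^\top\left(\mathbf{X}\theta - \mathbf{y}\right)$
        • Update step: $\theta \leftarrow \theta - \eta\,\nabla_\theta\mathrm{MSE}(\theta)$, where $\eta$ is the learning rate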
  • 12. ■Stochastic Gradient Descent  Computed for only one sample data • Learning schedule ■ Gradually reduce the learning rate Gradient Descent(4/6) Interaction Lab., Kumoh National Institue of Technology 12
  • 13. ■Stochastic Gradient Descent  20 steps Gradient Descent(5/6) Interaction Lab., Kumoh National Institue of Technology 13
  • 14. Gradient Descent (6/6)
    ■ Mini-Batch Gradient Descent
      - Computes the gradient on small random subsets of instances called mini-batches
        • Uses neither the entire training set nor a single instance
        • Can run faster on GPUs, which accelerate the matrix operations on each mini-batch
  • 15. Polynomial Regression / Learning Curves / Regularized Linear Regression
  • 16. ■Polynomial Regression  Not linear, complex shape • Add the increments of each characteristic as a new characteristic • Train linear models on datasets with extended characteristics Polynomial Regression(1/2)
  • 17. Polynomial Regression (2/2)
    ■ Polynomial Regression in Scikit-Learn
      - PolynomialFeatures
      - Data: $y = 0.5x^2 + 1.0x + 2.0 + \text{noise}$
      - Fitted model: $\hat{y} = 0.56x^2 + 0.93x + 1.78$
  • 18. Learning Curves / Regularized Linear Regression / Logistic Regression
  • 19. ■Training Curve  Checkable training set and validation set • Make subset in training set and train several times Training Curve(1/2)
  • 20. Learning Curves (2/2)
      [Figure: learning curves of a degree-1 model]
  • 21. Regularized Linear Regression / Logistic Regression
  • 22. ■Regulation  Avoid overfitting • Limit weight in model Regulated Linear Regression(1/8)
  • 23. ■Ridge Regression  Regulation : 𝛼 𝑖=1 𝑛 𝜃𝑖 2 • 𝐽 𝜃 = 𝑀𝑆𝐸 𝜃 + 𝛼 1 2 𝑖=1 𝑛 𝜃𝑖 2 • 𝛼 : Parameter for regulate • If 𝛼 is 0, ridge regression is linear regression • 𝑊 = {𝜃1 + 𝜃2 + 𝜃3 + ⋯ + 𝜃𝑛} : Weight vector • Regulation : 1 2 ( 𝑊 2)2 • In Gradient Descent, 𝑀𝑆𝐸 + 𝛼𝑊 Regulated Linear Regression(2/8)
  • 24. Regularized Linear Regression (3/8)
    ■ Ridge Regression
      [Figure]
  • 25. ■Lasso Regression  Regulation : 𝛼 𝑖=1 𝑛 𝜃𝑖  𝐽 𝜃 = 𝑀𝑆𝐸 𝜃 + 𝛼 𝑖=1 𝑛 𝜃𝑖  Completely remove the weight of the less important variable  Automatically selects variables and is a sparse model Regulated Linear Regression(4/8) Interaction Lab., Kumoh National Institue of Technology 25
  • 26. ■Lasso Regression  Unable to Differentiate at 𝜃𝑖 = 0 • Subgradient vector Regulated Linear Regression(5/8) Interaction Lab., Kumoh National Institue of Technology 26
  • 27. Regularized Linear Regression (6/8)
    ■ Lasso Regression
      [Figure]
  • 28. ■Elastic Net  Ridge + Lasso • 𝐽 𝜃 = 𝑀𝑆𝐸 𝜃 + 𝑟𝛼 𝑖=1 𝑛 𝜃𝑖 + 1−𝑟 2 𝛼 1 2 𝑖=1 𝑛 𝜃𝑖 2 • 𝑟 = 0, Ridge regression • 𝑟 = 1, Lasso regression Regulated Linear Regression(7/8) Interaction Lab., Kumoh National Institue of Technology 28
  • 29. ■Early stopping  Abort training when error is minimal Regulated Linear Regression(8/8) Interaction Lab., Kumoh National Institue of Technology 29
  • 31. ■Probability Estimate  Compute the sum of weights for the input  𝑝 = ℎ𝜃 𝑥 = 𝜎 𝜃𝑇 𝑥  Probability more than 50% : correct • Binary classification  𝜎(∙) : Sigmoid function • Output : 0 ~ 1 Logistic Regression(1/3)
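  • 32. Logistic Regression (2/3)
    ■ Training and Cost Function
      - Goal: find the parameters for which the model estimates
        • high probabilities for positive instances ($y = 1$)
        • low probabilities for negative instances ($y = 0$)
        • Cost of a single instance: $c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases}$
        • Log loss: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(\hat{p}^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - \hat{p}^{(i)}\right)\right]$
        • Partial derivative: $\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\sigma\left(\theta^\top\mathbf{x}^{(i)}\right) - y^{(i)}\right)x_j^{(i)}$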
  • 33. ■Softmax Regression  Multinomial logistic regression • Train several binary classifier, not connection  𝑝𝑘 = 𝜎(𝑠(𝑥))𝑘 = exp(𝑠𝑘(𝑥)) 𝑗=1 𝐾 exp(𝑠𝑗(𝑥))  Pick best score • 𝑦 = 𝑎𝑟𝑔𝑚𝑎𝑥(𝜎 𝑠 𝑥 𝑘 ) = 𝑎𝑟𝑔𝑚𝑎𝑥(𝑠𝑘(𝑥)) = 𝑎𝑟𝑔𝑚𝑎𝑥 (𝜃(𝑘) )𝑇 𝑥  Cost function • Cross-entropy Logistic Regression(3/3) Interaction Lab., Kumoh National Institue of Technology 33
  • 34. Q&A
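
Slides 5 to 7 solve Linear Regression in closed form. The sketch below reproduces that workflow under stated assumptions. The data is generated as y = 4 + 3x + Gaussian noise; the slides do not give the generating values, and the seed is arbitrary. The sketch prepends x0 = 1 to every instance, applies the Normal Equation, and compares the result with Scikit-Learn's `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed toy data: y = 4 + 3x + Gaussian noise (values chosen for illustration)
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))                  # one feature in [0, 2)
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)

# Add x0 = 1 to every instance so that theta_0 acts as the bias term
X_b = np.c_[np.ones((m, 1)), X]

# Normal Equation: theta_hat = (X^T X)^(-1) X^T y
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Predictions at x = 0 and x = 2 give the two end points of the fitted line
X_new = np.array([[0.0], [2.0]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b @ theta_best

# Scikit-Learn stores the bias in intercept_ and the weights in coef_
lin_reg = LinearRegression().fit(X, y)
print(theta_best, lin_reg.intercept_, lin_reg.coef_)
```

Both approaches should agree up to floating-point error. The recovered parameters are close to, but not exactly, 4 and 3 because of the noise. In practice `np.linalg.lstsq` or `np.linalg.pinv` is numerically safer than an explicit inverse.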
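
Slide 10 recommends scaling features before running Gradient Descent. The following is a minimal sketch, assuming a made-up dataset whose two features have very different ranges. It shows that `StandardScaler` gives each feature zero mean and unit variance. It also shows the scaler chained in front of Scikit-Learn's `SGDRegressor` with `make_pipeline`; the hyperparameter values are illustrative choices, not taken from the slides.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed toy data: feature 0 lies in [0, 1), feature 1 in [0, 1000)
rng = np.random.default_rng(0)
m = 200
X = np.c_[rng.random(m), 1000 * rng.random(m)]
y = 3 * X[:, 0] + 0.005 * X[:, 1] + 0.1 * rng.standard_normal(m)

# StandardScaler rescales each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))

# In a pipeline, the scaler is fitted on the training data and applied
# automatically before the Gradient Descent-based regressor sees the data
model = make_pipeline(StandardScaler(),
                      SGDRegressor(max_iter=1000, tol=1e-3, random_state=42))
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```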
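
The Batch Gradient Descent update on slide 11, $\theta \leftarrow \theta - \eta \frac{2}{m}\mathbf{X}^\top(\mathbf{X}\theta - \mathbf{y})$, can be sketched in a few lines of NumPy. The data (y = 4 + 3x + noise), the learning rate 0.1, and the iteration count are assumptions for illustration. The result is checked against the least-squares solution from `np.linalg.lstsq`.

```python
import numpy as np

# Assumed toy data: y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)
X_b = np.c_[np.ones((m, 1)), X]  # add x0 = 1 to each instance

eta = 0.1             # learning rate (assumed value)
n_iterations = 1000   # number of steps (assumed value)
theta = rng.standard_normal(2)  # random initialization

for _ in range(n_iterations):
    # Gradient of the MSE over the *whole* training set: (2/m) X^T (X theta - y)
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients

# Closed-form least-squares solution for comparison
theta_best, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta, theta_best)
```

With these data, a much smaller learning rate needs many more iterations to get this close. A much larger one (around 0.5 here) makes the updates overshoot and diverge, which matches the behaviour described for slide 9.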
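
Slides 12 and 13 describe Stochastic Gradient Descent with a learning schedule. This is a sketch of that idea, assuming a schedule of the form t0 / (t + t1) with t0 = 5 and t1 = 50. The schedule shape and its constants are illustrative assumptions. At each step the sketch picks one random instance, computes the gradient on that instance only, and shrinks the learning rate as the step count grows.

```python
import numpy as np

# Assumed toy data: y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)
X_b = np.c_[np.ones((m, 1)), X]

n_epochs = 50
t0, t1 = 5, 50  # learning schedule hyperparameters (assumed values)

def learning_schedule(t):
    """Learning rate that decreases as the step count t grows."""
    return t0 / (t + t1)

theta = rng.standard_normal(2)  # random initialization
for epoch in range(n_epochs):
    for i in range(m):
        idx = rng.integers(m)            # pick one training instance at random
        xi = X_b[idx:idx + 1]            # shape (1, 2)
        yi = y[idx:idx + 1]              # shape (1,)
        # Gradient of the squared error on this single instance
        gradients = 2 * xi.T @ (xi @ theta - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients

theta_best, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta, theta_best)
```

The result lands close to, but usually not exactly on, the least-squares solution, which is the trade-off described on slide 12. Scikit-Learn's `SGDRegressor` provides a ready-made SGD implementation. Its default learning schedule (`learning_rate="invscaling"`) has a different form from the one above.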
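
Mini-batch Gradient Descent (slide 14) sits between the batch and stochastic variants. The sketch below assumes several illustrative choices: mini-batches of 20 instances, a reshuffle at each epoch, and a decaying learning rate t0 / (t + t1) with t0 = 10 and t1 = 100. None of these values come from the slides.

```python
import numpy as np

# Assumed toy data: y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)
X_b = np.c_[np.ones((m, 1)), X]

n_epochs = 200
batch_size = 20   # assumed mini-batch size
t0, t1 = 10, 100  # assumed learning schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)

theta = rng.standard_normal(2)
t = 0
for epoch in range(n_epochs):
    # Shuffle once per epoch, then walk through the data in mini-batches
    perm = rng.permutation(m)
    X_shuffled, y_shuffled = X_b[perm], y[perm]
    for start in range(0, m, batch_size):
        xb = X_shuffled[start:start + batch_size]
        yb = y_shuffled[start:start + batch_size]
        # MSE gradient computed on the mini-batch only
        gradients = 2 / len(yb) * xb.T @ (xb @ theta - yb)
        theta = theta - learning_schedule(t) * gradients
        t += 1

theta_best, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta, theta_best)
```

Each update is a small matrix product over one mini-batch. This is the kind of operation that GPUs accelerate.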
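
Slides 16 and 17 fit $y = 0.5x^2 + 1.0x + 2.0 + \text{noise}$ with `PolynomialFeatures`. This is a sketch of that setup. The x range [-3, 3), the sample size, the unit-variance noise, and the seed are assumptions, so the fitted numbers will differ somewhat from the slide's 0.56, 0.93, and 1.78.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data of the form on slide 17 (range, size, noise and seed assumed)
rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3          # x in [-3, 3)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.standard_normal(m)

# degree=2 adds x^2 as a new feature next to x; include_bias=False because
# LinearRegression fits the intercept itself
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
print(X[0], X_poly[0])   # [x] -> [x, x^2]

# An ordinary linear model trained on the extended features
lin_reg = LinearRegression().fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)  # bias, [weight of x, weight of x^2]
```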
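
Slides 19 and 20 build learning curves by training on growing subsets of the training set. The helper below, `learning_curve_errors`, is a hypothetical name for this sketch. It assumes an 80/20 train/validation split and RMSE as the error measure, and it applies a plain linear (degree-1) model to assumed quadratic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def learning_curve_errors(model, X, y, val_size=0.2, random_state=10):
    """Train `model` on the first k training instances for k = 1..len(train)
    and return the training and validation RMSE for each k."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=val_size, random_state=random_state)
    train_errors, val_errors = [], []
    for k in range(1, len(X_train) + 1):
        model.fit(X_train[:k], y_train[:k])
        train_pred = model.predict(X_train[:k])
        val_pred = model.predict(X_val)
        train_errors.append(np.sqrt(mean_squared_error(y_train[:k], train_pred)))
        val_errors.append(np.sqrt(mean_squared_error(y_val, val_pred)))
    return np.array(train_errors), np.array(val_errors)

# Assumed quadratic data: y = 0.5x^2 + x + 2 + noise
rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.standard_normal(m)

# A degree-1 (plain linear) model, as in the slide 20 figure
train_err, val_err = learning_curve_errors(LinearRegression(), X, y)
print(train_err[[0, 9, -1]], val_err[[0, 9, -1]])
```

With one instance the linear model fits perfectly, so the training error starts at zero. It then climbs to a plateau well above zero because a straight line cannot follow the curve, which is the underfitting pattern. Scikit-Learn also offers `sklearn.model_selection.learning_curve`, which computes cross-validated scores for several training sizes.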
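
Slides 23 and 24 show that a larger α shrinks the Ridge weights. A minimal sketch is below. The helper `ridge_closed_form` is a hypothetical name, and the data is made up. The helper solves the closed form $(\mathbf{X}^\top\mathbf{X} + \alpha\mathbf{A})^{-1}\mathbf{X}^\top\mathbf{y}$, where $\mathbf{A}$ is the identity matrix except for a 0 in the bias position. This minimizes $\lVert\mathbf{y} - \mathbf{X}\theta\rVert^2 + \alpha\lVert\mathbf{w}\rVert^2$, which is the objective Scikit-Learn's `Ridge` uses. That objective scales α differently from the slide's $J(\theta)$: Scikit-Learn's `alpha` plays the role of $m\alpha/2$ in the slide's notation.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Assumed toy data: 3 features, linear target plus noise
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.standard_normal((m, n))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + 0.5 * rng.standard_normal(m)

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - X_b theta||^2 + alpha * ||w||^2, leaving the bias unpenalized."""
    X_b = np.c_[np.ones((len(X), 1)), X]
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0  # do not regularize theta_0 (the bias term)
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

for alpha in [0.0, 1.0, 10.0, 100.0]:
    theta = ridge_closed_form(X, y, alpha)
    print(alpha, np.round(theta, 3))  # weights shrink toward 0 as alpha grows

ridge = Ridge(alpha=10.0).fit(X, y)
theta_10 = ridge_closed_form(X, y, 10.0)
print(ridge.intercept_, ridge.coef_)
```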
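
Slides 25 to 27 state that Lasso sets the weights of unimportant features to exactly zero, and that training uses a subgradient where $\lvert\theta_i\rvert$ has a corner. This sketch uses a hypothetical dataset in which only the first two of ten features matter, with arbitrary α values. Note that Scikit-Learn's `Lasso` minimizes $\frac{1}{2m}\lVert\mathbf{y} - \mathbf{X}\mathbf{w}\rVert^2 + \alpha\lVert\mathbf{w}\rVert_1$, so its α is scaled differently from the slide's. The function `lasso_subgradient` is an illustrative helper written for this sketch, not a library API.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Assumed toy data: 10 features, but only the first two influence the target
rng = np.random.default_rng(0)
m, n = 200, 10
X = rng.standard_normal((m, n))
true_w = np.zeros(n)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + rng.standard_normal(m)

lasso_sparse = Lasso(alpha=0.5).fit(X, y)      # moderate regularization
lasso_all_zero = Lasso(alpha=100.0).fit(X, y)  # very strong regularization
print(np.round(lasso_sparse.coef_, 3))         # irrelevant weights are exactly 0
print(np.round(lasso_all_zero.coef_, 3))       # every weight is eliminated

def lasso_subgradient(theta, X_b, y, alpha):
    """Subgradient of MSE(theta) + alpha * sum_{i>=1} |theta_i| (slide 26).
    np.sign returns 0 at 0, which is a valid subgradient choice there."""
    m = len(y)
    grad_mse = 2 / m * X_b.T @ (X_b @ theta - y)
    sign = np.sign(theta)
    sign[0] = 0.0  # the bias term theta_0 is not regularized
    return grad_mse + alpha * sign
```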
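
In Scikit-Learn, the mix ratio r from slide 28 corresponds to the `l1_ratio` parameter of `ElasticNet`. Its objective is $\frac{1}{2m}\lVert\mathbf{y} - \mathbf{X}\mathbf{w}\rVert^2 + \alpha\,\text{l1\_ratio}\,\lVert\mathbf{w}\rVert_1 + \frac{1}{2}\alpha(1 - \text{l1\_ratio})\lVert\mathbf{w}\rVert^2$, which has the same shape as the slide's $J(\theta)$ with a different MSE scaling. This is a minimal sketch on assumed data. It checks that `l1_ratio=1.0` reproduces `Lasso`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Assumed toy data: 10 features, only the first two are relevant
rng = np.random.default_rng(0)
m, n = 200, 10
X = rng.standard_normal((m, n))
true_w = np.zeros(n)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + rng.standard_normal(m)

# l1_ratio plays the role of r: an even mix of the L1 and L2 penalties
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(elastic_net.coef_, 3))

# With l1_ratio=1.0 the penalty is pure L1, i.e. Lasso
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
```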
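
Slide 29 and speaker note 22 describe early stopping: keep the parameters with the lowest validation error, and roll back to them once the error stops improving. The sketch below runs Batch Gradient Descent on assumed quadratic data with degree-10 polynomial features. The function `early_stopping_gd` is a hypothetical helper written for this sketch. Its `patience` parameter (how many epochs to wait without improvement) is an assumption added because real validation curves are noisy; the slide does not mention it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def early_stopping_gd(X_train, y_train, X_val, y_val,
                      eta=0.01, max_epochs=2000, patience=50):
    """Batch Gradient Descent that remembers the parameters with the lowest
    validation MSE and stops once `patience` epochs pass without improvement."""
    m, n = X_train.shape
    theta = np.zeros(n)
    best_theta, best_err, best_epoch = theta.copy(), np.inf, 0
    history = []
    for epoch in range(max_epochs):
        gradients = 2 / m * X_train.T @ (X_train @ theta - y_train)
        theta = theta - eta * gradients
        val_err = np.mean((X_val @ theta - y_val) ** 2)
        history.append(val_err)
        if val_err < best_err:
            # New best model: save a copy of its parameters
            best_err, best_theta, best_epoch = val_err, theta.copy(), epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_theta, best_epoch, history  # roll back to the best parameters

# Assumed quadratic data, split 50/50 into training and validation sets
rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.standard_normal(m)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=10)

# High-degree features make overfitting possible; scaling keeps Gradient Descent stable
poly = PolynomialFeatures(degree=10, include_bias=False)
scaler = StandardScaler()
X_train_prep = scaler.fit_transform(poly.fit_transform(X_train))
X_val_prep = scaler.transform(poly.transform(X_val))
X_train_b = np.c_[np.ones(len(X_train_prep)), X_train_prep]  # add x0 = 1
X_val_b = np.c_[np.ones(len(X_val_prep)), X_val_prep]

best_theta, best_epoch, history = early_stopping_gd(X_train_b, y_train, X_val_b, y_val)
print(best_epoch, len(history), min(history))
```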
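
Slides 31 and 32 give the sigmoid, the log loss, and its partial derivatives. This NumPy sketch implements those three formulas and trains the model with Batch Gradient Descent. The one-feature dataset (two overlapping Gaussian clusters) and the learning rate 0.5 are assumptions for illustration. For comparison, Scikit-Learn's `LogisticRegression` adds ℓ2 regularization by default (`C=1.0`), so its weights would not match these exactly.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def log_loss(theta, X_b, y):
    """J(theta) = -(1/m) * sum[y log(p) + (1 - y) log(1 - p)]"""
    p = sigmoid(X_b @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_loss_gradient(theta, X_b, y):
    """dJ/dtheta_j = (1/m) * sum_i (sigma(theta^T x_i) - y_i) x_ij, for all j at once."""
    return X_b.T @ (sigmoid(X_b @ theta) - y) / len(y)

# Assumed toy data: one feature, two overlapping Gaussian clusters
rng = np.random.default_rng(0)
m = 200
y = (rng.random(m) < 0.5).astype(float)                  # labels 0 / 1
x = np.where(y == 1, 2.0, -2.0) + rng.standard_normal(m)
X_b = np.c_[np.ones(m), x]                               # add x0 = 1

theta = np.zeros(2)
eta = 0.5  # assumed learning rate
for _ in range(2000):
    theta = theta - eta * log_loss_gradient(theta, X_b, y)

p_hat = sigmoid(X_b @ theta)
y_pred = (p_hat >= 0.5).astype(float)   # positive class when p_hat >= 50%
accuracy = np.mean(y_pred == y)
print(theta, accuracy)
```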
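
The softmax probabilities, the argmax prediction, and the cross-entropy cost on slide 33 can be written directly in NumPy. The parameter matrix `Theta` and the two inputs below are hypothetical numbers chosen for illustration. Subtracting each row's maximum score before exponentiating avoids overflow and leaves the probabilities unchanged.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: exp(s_k) / sum_j exp(s_j).
    Subtracting the row maximum first avoids overflow without changing the result."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy(Y_onehot, P, eps=1e-12):
    """J = -(1/m) * sum_i sum_k y_k^(i) log(p_k^(i)); eps guards against log(0)."""
    return -np.mean(np.sum(Y_onehot * np.log(P + eps), axis=1))

# Hypothetical parameters: one column theta^(k) per class, K = 3 classes
Theta = np.array([[0.5, -0.2, 0.1],
                  [1.0, 0.3, -1.2]])   # shape (n_features, K) = (2, 3)
X = np.array([[1.0, 2.0],
              [1.0, -1.0]])            # two instances, x0 = 1 included
scores = X @ Theta                     # s_k(x) = (theta^(k))^T x, shape (2, 3)
P = softmax(scores)
y_pred = P.argmax(axis=1)              # class with the highest probability
print(P, y_pred)
```

Because the exponential is increasing, the argmax of the probabilities equals the argmax of the raw scores, as slide 33 states.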

Editor's Notes

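  1. θ denotes the weights. Linear Regression multiplies the input features by the weights and sums them to produce a continuous output value. The cost function is the mean squared error: subtract the actual value from the predicted value, square the difference, and average over all instances. A small MSE means the predictions are close to the actual values.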
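  2. Solving Linear Regression analytically gives the optimal parameters directly from an equation. The code below generates data from a linear equation, and the next graph plots that data.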
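  3. Because x0 is always 1, the code first adds a 1 to every instance. Implementing the equation above in code gives values close to the ones used to generate the data, though slightly different because of the noise. Computing the predictions at x = 0 and x = 2 and drawing the line through them gives the plot shown.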
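  4. In Scikit-Learn this can be implemented with LinearRegression. coef_ holds the coefficients (weights) and intercept_ holds the bias; the values obtained are similar to the ones above.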
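  5. Gradient Descent gradually adjusts the parameter values to minimize the cost function; as the name suggests, it walks down the slope of the cost function. The size of each step is the learning rate, and it determines how much the parameters are updated. If the learning rate is too large, the algorithm diverges, as shown. If it is too small, it converges to the optimum very slowly and takes a long time. Also, a cost function with a more complex shape, like the one shown here, can have several minima, so the algorithm may fail to reach the global minimum. (The MSE cost function of Linear Regression is convex, so it has a single global minimum.)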
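  6. Gradient Descent is strongly affected by the scale of the input features, so the features should be rescaled, for example with Scikit-Learn's StandardScaler. The graph shows that when the features have very different scales, the algorithm does not take the direct path to the minimum.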
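  7. Batch Gradient Descent uses the whole training set at every training step. The cost function here is the mean squared error. At each update, the gradient of the cost function is multiplied by the learning rate and subtracted from the parameters. In the code below, because all the data are used at once, the gradient is computed by summing over all instances together. Different learning rates produce different behavior.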
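  8. Stochastic Gradient Descent does not use all the data; at each step it computes the gradient from a single instance. This makes it faster than Batch Gradient Descent, but its final parameters may be somewhat less accurate. Because it is stochastic, the graph shows that it does not follow the optimal path, but it still converges reasonably well. The key idea here is the learning schedule, which gradually reduces the learning rate. The algorithm starts with large steps while it is still far from the optimum, and the steps become smaller and smaller so that it can settle at the optimum.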
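  9. This implements 20 steps of Stochastic Gradient Descent. It adds code that picks instances at random and code that adjusts the learning rate. The graph beside it shows that the parameters change in large steps at first and are then fine-tuned once they get close to a good value.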
  10. Mini-batch Gradient Descent computes the gradient on a subset of the whole dataset. It is reported to perform well when run on a GPU. The graph below compares the three methods.
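  11. Polynomial Regression adds higher-degree terms of the features to the original equation. Each feature is raised to the appropriate powers, the results are added to the data as new features, and a model is trained on them, so Polynomial Regression can be implemented with ordinary Linear Regression. The code below generates the data.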
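  12. Scikit-Learn's PolynomialFeatures transforms the data by adding powers of each feature, and the degree parameter sets which powers are added. Here, with degree 2, you can see that the squared value has been added as a new feature. Fitting a model on the transformed data gives the weights and bias shown.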
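  13. Learning curves use the training set and the validation set to help find a model that suits the data. Here, the degree-300 model clearly overfits the data; even by eye, it follows the training data too closely. The linear (degree-1) model, by contrast, underfits because it cannot capture the pattern in the data. Only the quadratic model fits appropriately. Learning curves help choose a suitable model when these problems arise.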
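  14. This is the learning curve of the linear (degree-1) model on the previous data, and it shows that the model underfits. On the training set, the error is very low with one or two instances, but as more instances are added it rises and then stays flat: the model cannot capture the pattern in the data, so the error stops changing. On the validation set, the error is large at first because the model was trained on very few instances and cannot generalize, and it decreases as the model learns from more data. However, a straight line cannot model the data well, so the error eventually stops decreasing. Graphs like these are what you use to judge the model.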
  15. Regularization adds a constraint to ordinary Linear Regression to help training. In this book, overfitting is reduced by constraining the model's weights.
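  16. Ridge Regression is implemented by adding the regularization term (penalty) shown to the Linear Regression cost function. Here α is a hyperparameter that controls how much to regularize. If α = 0, the term vanishes and the model is plain Linear Regression. As a result, when Ridge Regression updates its parameters with Gradient Descent, it adds α times the weight vector to the gradient of the MSE. In the graph, the right-hand panel shows Polynomial Regression: the larger α is, the more strongly the weights are constrained, and the predictions become flatter and closer to a straight line.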
  17. The graph shows the coefficient (weight) values shrinking as α increases.
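  18. Lasso works similarly to Ridge but uses a different regularization term. A key characteristic of Lasso is that it tries to eliminate the weights it considers unnecessary. In other words, it produces a sparse model with few nonzero weights, keeping only the important features. In the Lasso graph, more weights are eliminated as α increases, so the predictions become more linear.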
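  19. Another characteristic of Lasso is that its cost function has a sharp corner at θi = 0, so it is not differentiable there. Instead, a subgradient vector is used, in which the sign term at that point takes a value between −1 and 1.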
  20. Increasing α eliminates more weights.
  21. Elastic Net is a middle ground between Lasso and Ridge, and the mix is controlled by the parameter r: r = 0 gives Ridge and r = 1 gives Lasso.
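  22. Another regularization method is early stopping. Save the model parameters at the point where the validation error is lowest. If the error then starts to increase, roll the parameters back to that best point and stop training.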
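  23. Logistic Regression is used to classify which class an input belongs to. It computes a weighted sum of the input features and passes it through the sigmoid function. It is a binary classifier that predicts the positive class when the estimated probability is at least 0.5 (that is, when the weighted sum θᵀx is at least 0) and the negative class otherwise.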
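  24. Logistic Regression is trained to output high probabilities for positive instances and low probabilities for negative instances, so the cost function is the log function shown. −log(t) is very small when t is close to 1 and very large when t is close to 0, so the cost reflects each instance's correct label. The next equation combines both cases into one expression, and which term is active depends on the label. The last equation is simply the derivative of the one above; the parameters are updated using this gradient.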
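  25. Softmax Regression can be thought of as multinomial Logistic Regression. It generalizes Logistic Regression to handle multiple classes directly, without training and combining several binary classifiers. Its cost function is cross entropy. The probability of each class is computed by taking the exponential of that class's score and dividing it by the sum of the exponentials of all the class scores. The probabilities of all classes therefore sum to 1, as a true probability distribution should. The predicted class is the one with the highest probability.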