Tuning Learning Rate
Taka Wang
20210727
Hyperparameters vs Model Parameters
● Learning rate
● Momentum, or the hyperparameters of the Adam optimization algorithm
● Number of layers
● Number of hidden units
● Mini-batch size
● Activation function
● Number of epochs
● ...
2
Effect of Learning Rate
The learning rate determines:
1. How fast the algorithm learns
2. Whether the cost function is minimized or not
3
Source: Understanding Learning Rate in Machine Learning
4
Source: Setting the learning rate of your neural network.
5
Source: Understanding Fastai's fit_one_cycle method
Adjust Learning Rate During Training
● Adaptive Learning Rate Methods (AdaGrad, Adam, etc.)
● Learning Rate Annealing
● Cyclical Learning Rate
● LR Finder
6
Source: How do we decide the optimizer used for training?
Learning Rate Schedule
8
9
Why use learning rate schedule?
● Too small a learning rate, and your neural network may not learn at all
● Too large a learning rate, and you may overshoot areas of low loss (or even overfit from the start of training)
➔ Find a set of reasonably “good” weights early in the training process with a larger learning rate.
➔ Tune these weights later in the process toward more optimal weights using a smaller learning rate.
10
Learning Rate Schedule
● Time-based decay
● Linear decay
● Step decay (Piecewise Constant
Decay)
● Polynomial decay
● Exponential decay

Two methods:
● Built-in schedules
● Custom callbacks (every batch)
Keras Example
import tensorflow as tf

# baseline MNIST classifier reused in the schedule examples that follow
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, activation='sigmoid', input_shape=(784,)))
model.add(tf.keras.layers.Dense(10, activation='softmax'))

model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, verbose=0, callbacks=[])
11
12
Time-based decay (InverseTimeDecay)
13
lr_fn = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.1,
    decay_steps=1.0,
    decay_rate=0.5,
)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=lr_fn),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
model.fit(x_train, y_train, epochs=5)
Source: Learning Rate Schedules in Deep Learning
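For reference, this schedule decays the rate as initial_learning_rate / (1 + decay_rate * step / decay_steps). A minimal plain-Python sketch of that formula (the constants match the configuration above):

# inverse-time decay: lr(step) = initial_lr / (1 + decay_rate * step / decay_steps)
def inverse_time_decay(step, initial_lr=0.1, decay_steps=1.0, decay_rate=0.5):
    return initial_lr / (1 + decay_rate * step / decay_steps)

for step in range(5):
    print(step, inverse_time_decay(step))  # 0.1, 0.0667, 0.05, 0.04, 0.0333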
Step Decay
14
import numpy as np
from tensorflow.keras.callbacks import LearningRateScheduler

class StepDecay:
    def __init__(self, initAlpha=0.01, factor=0.25, dropEvery=10):
        self.initAlpha = initAlpha   # initial learning rate
        self.factor = factor         # multiplicative drop factor
        self.dropEvery = dropEvery   # drop the LR every N epochs

    def __call__(self, epoch):
        # compute the learning rate for the current epoch
        exp = np.floor((1 + epoch) / self.dropEvery)
        alpha = self.initAlpha * (self.factor ** exp)
        return float(alpha)  # learning rate

cb = [LearningRateScheduler(StepDecay())]
model.fit(x_train, y_train, epochs=10, callbacks=cb)
Linear Decay & Polynomial Decay
15
Learning rate is decayed to zero over a fixed number of epochs.
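In tf.keras both can be expressed with the built-in PolynomialDecay schedule: power=1.0 gives linear decay, power > 1.0 gives polynomial decay. A minimal sketch with illustrative values (the step count is not from the slides):

import tensorflow as tf

# power=1.0 -> linear decay to zero; power > 1.0 -> polynomial decay
lr_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.1,
    decay_steps=1000,          # illustrative: total number of training steps
    end_learning_rate=0.0,
    power=1.0,
)
for step in [0, 250, 500, 750, 1000]:
    print(step, float(lr_fn(step)))   # 0.1, 0.075, 0.05, 0.025, 0.0

# plug it into an optimizer exactly as with InverseTimeDecay above:
# optimizer = tf.keras.optimizers.SGD(learning_rate=lr_fn)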
Cyclical Learning Rate
16
17
Let the learning rate vary cyclically between boundary values
Estimate reasonable bounds for those values
Claims & Proposal
● We don’t know what the optimal initial learning rate is.
● Monotonically decreasing our learning rate may lead to our network getting
“stuck” in plateaus of the loss landscape.
18
● Define a minimum learning rate
● Define a maximum learning rate
● Allow the learning rate to oscillate cyclically between the two bounds
Source: Escaping from Saddle Points (figure keywords: saddle point, convex function, critical point, update rule)
Loss Landscape
20
model architecture & dataset
Source: VISUALIZING THE LOSS LANDSCAPE OF NEURAL NETS
21
CLR - Policies
● batch size: number of training examples per weight update
● batch or iteration: number of weight updates per epoch (= total training examples / batch size)
● cycle: number of iterations for the LR to go lower -> upper -> lower
● step size: number of batches/iterations in half a cycle (see the formula sketch below)
https://github.com/bckenstler/CLR
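With these terms, Leslie Smith's triangular policy can be computed directly from the iteration index. A minimal sketch of the formula (this is not the CyclicLR callback from the repo above; the bounds are illustrative):

import math

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    # triangular cyclical learning rate (Smith, 2015)
    # step_size = iterations in half a cycle (lower -> upper)
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# LR rises from base_lr to max_lr over step_size iterations, then falls back
print(triangular_clr(0), triangular_clr(2000), triangular_clr(4000))
# -> 1e-4 (lower bound), 1e-2 (upper bound), 1e-4 (lower bound again)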
Implementations
22
# config values used below (from the PyImageSearch CLR example):
# MIN_LR = 1e-7, MAX_LR = 1e-2, BATCH_SIZE = 64,
# STEP_SIZE = 8 (4 or 8), CLR_METHOD = "triangular", NUM_EPOCHS = 96

opt = SGD(lr=config.MIN_LR, momentum=0.9)
model = MiniGoogLeNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# initialize the cyclical learning rate callback (CyclicLR from the repo linked below)
clr = CyclicLR(
    mode="triangular",
    base_lr=config.MIN_LR,
    max_lr=config.MAX_LR,
    step_size=config.STEP_SIZE * (trainX.shape[0] // config.BATCH_SIZE),
)
model.fit(
    ...,
    steps_per_epoch=trainX.shape[0] // config.BATCH_SIZE,
    epochs=config.NUM_EPOCHS,
    callbacks=[clr],
)
https://github.com/bckenstler/CLR
TensorFlow Addons Optimizers
23
!pip install -q -U tensorflow_addons
import tensorflow_addons as tfa
...
# hyperparameters used in the example
BATCH_SIZE = 64
EPOCHS = 10
INIT_LR = 1e-4
MAX_LR = 1e-2

steps_per_epoch = len(x_train) // BATCH_SIZE
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=INIT_LR,
    maximal_learning_rate=MAX_LR,
    scale_fn=lambda x: 1 / (2.0 ** (x - 1)),   # halves the amplitude each cycle (triangular2)
    step_size=2 * steps_per_epoch,
)
optimizer = tf.keras.optimizers.SGD(clr)

clr_model = tf.keras.models.load_model("initial_model")
clr_history = train_model(clr_model, optimizer=optimizer)
# no_clr_history = train_model(standard_model, optimizer="sgd")
Experiment Results - Triangular
24
Experiment Results - Triangular2
25
LR Finder (Range Test)
26
Automatic learning rate finder algorithm
28
Increase the learning rate after every mini-batch, over roughly 3~5 epochs
29
● Recommended minimum: the learning rate at which the loss decreases the fastest (steepest negative gradient)
● Recommended maximum: 10 times less (one order of magnitude lower) than the learning rate at which the loss is lowest (if the loss bottoms out at 0.1, a good value to start with is 0.01).
Source: The Learning Rate Finder Technique: How Reliable Is It?
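A minimal sketch of such a range test as a Keras callback: grow the learning rate exponentially from a chosen minimum to a chosen maximum after every mini-batch, record the loss, and stop once the loss blows up. The class name, bounds, and stopping rule are illustrative (this is not the PyImageSearch implementation):

import tensorflow as tf

class LRFinder(tf.keras.callbacks.Callback):
    # increase the LR exponentially after every mini-batch and record the loss
    def __init__(self, min_lr=1e-7, max_lr=1e-1, total_batches=1000):
        super().__init__()
        self.min_lr, self.max_lr = min_lr, max_lr
        self.total_batches = total_batches
        self.lrs, self.losses = [], []
        self.batch_num = 0

    def on_train_begin(self, logs=None):
        tf.keras.backend.set_value(self.model.optimizer.lr, self.min_lr)

    def on_train_batch_end(self, batch, logs=None):
        self.batch_num += 1
        self.lrs.append(float(tf.keras.backend.get_value(self.model.optimizer.lr)))
        self.losses.append(logs["loss"])
        # exponential interpolation between min_lr and max_lr
        new_lr = self.min_lr * (self.max_lr / self.min_lr) ** (self.batch_num / self.total_batches)
        tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)
        if logs["loss"] > 4 * min(self.losses):   # stop once the loss explodes
            self.model.stop_training = True

# afterwards, plot lr_finder.lrs against lr_finder.losses and pick the bounds as above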
30
Reminder
● use the same initial weights for the LRFinder and the subsequent model
training.
● We simply keep a copy of the model weights to reset them, that way they are
“as they were” before you ran the learning rate finder.
● We should never assume that the found learning rates are the best for any
model initialization ❌
● setting a narrower range than what is recommended is safer and could
reduce the risk of divergence due to very high learning rates.
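For the weight-reset point above, a small Keras sketch (assuming the model and data from the Keras example earlier, and the LRFinder sketch from the previous slide):

# keep a copy of the initial weights so the range test does not affect real training
initial_weights = model.get_weights()

lr_finder = LRFinder(min_lr=1e-7, max_lr=1e-1)
model.fit(x_train, y_train, epochs=3, callbacks=[lr_finder])

# restore the weights "as they were" before the range test, then train for real
model.set_weights(initial_weights)
model.fit(x_train, y_train, epochs=10)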
31
● min: where the loss decreases the fastest
● max: choose a slightly narrower value than the “one order of magnitude lower” rule of thumb
● Larger batch size → higher learning rate
Source: The Learning Rate Finder Technique: How Reliable Is It?
Summary
● Learning Rate Annealing
● Cyclical Learning Rate
● LR Finder
32
One Cycle Policy
33
34
Learning rate, batch size, momentum, and weight decay
Learning Rate
fastai Modification (cosine descent)
0.08~0.8
The maximum should be the value picked with
a learning rate finder procedure.
Source: Finding Good Learning Rate and The One Cycle Policy.
Cyclic Momentum
36
fastai Modification (cosine ascent)
Source: Finding Good Learning Rate and The One Cycle Policy.
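A minimal sketch of the fastai-style one-cycle schedule: the learning rate follows a cosine ascent from lr_max/div to lr_max over the first pct_start of training, then a cosine descent to lr_max/div_final, while momentum moves the opposite way. The div, div_final, and pct_start defaults follow the fit_one_cycle code on a later slide; the momentum values (0.95 -> 0.85 -> 0.95) are the commonly documented fastai defaults, assumed here. This is an illustration, not fastai's implementation:

import math

def annealing_cos(start, end, pct):
    # cosine interpolation: returns start at pct=0 and end at pct=1
    return start + (1 + math.cos(math.pi * (1 - pct))) * (end - start) / 2

def one_cycle(pct, lr_max=1e-2, div=25.0, div_final=1e5, pct_start=0.25, moms=(0.95, 0.85)):
    # LR and momentum at training progress pct (0..1), one-cycle style
    if pct < pct_start:                      # warm-up phase
        p = pct / pct_start
        return annealing_cos(lr_max / div, lr_max, p), annealing_cos(moms[0], moms[1], p)
    p = (pct - pct_start) / (1 - pct_start)  # annealing phase
    return annealing_cos(lr_max, lr_max / div_final, p), annealing_cos(moms[1], moms[0], p)

for pct in (0.0, 0.25, 1.0):
    print(pct, one_cycle(pct))   # LR: 4e-4 -> 1e-2 -> 1e-7; momentum: 0.95 -> 0.85 -> 0.95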
Weight Decay Matters
1e-3 vs 1e-5
Example of super-convergence
38
Source: Understanding Fastai's fit_one_cycle method
@log_args(but_as=Learner.fit)
@delegates(Learner.fit_one_cycle)
def fine_tune(self:Learner, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
    self.freeze()
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    self.unfreeze()
    self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)

@log_args(but_as=Learner.fit)
def fit_one_cycle(self:Learner, n_epoch, lr_max=None, div=25., div_final=1e5, pct_start=0.25,
                  wd=None, moms=None, cbs=None, reset_opt=False):
    "Fit `self.model` for `n_epoch` using the 1cycle policy."
    if self.opt is None: self.create_opt()
    self.opt.set_hyper('lr', self.lr if lr_max is None else lr_max)
    lr_max = np.array([h['lr'] for h in self.opt.hypers])
    scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
              'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
    self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
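For context, a typical call pattern: a sketch assuming a fastai vision Learner, where the dataset and architecture are placeholders rather than anything from the slides:

from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)            # small illustrative dataset
dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet18, metrics=accuracy)

learn.lr_find()                                 # LR range test; pick lr_max from the plot
learn.fit_one_cycle(5, lr_max=3e-3)             # 5 epochs with the 1cycle policy above
# learn.fine_tune(5)                            # freeze + one cycle, then unfreeze + one cycle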
References
● Keras learning rate schedules and decay (PyImageSearch)
● Cyclical Learning Rates with Keras and Deep Learning (PyImageSearch)
● Keras Learning Rate Finder (PyImageSearch)
● Learning Rate Schedule in Practice: an example with Keras and TensorFlow 2.0 👍
● Understanding Learning Rate in Machine Learning
● Learning Rate Schedules in Deep Learning
● Setting the learning rate of your neural network
● Exploring Super-Convergence 👍
● The Learning Rate Finder Technique: How Reliable Is It?
40
References - One Cycle
● One-cycle learning rate schedulers (Kaggle)
● Finding Good Learning Rate and The One Cycle Policy. 👍
● The 1cycle policy (fastbook author)
● Understanding Fastai's fit_one_cycle method 👍
41
Colab
● Keras learning rate schedules and decay (PyImageSearch)
● Cyclical Learning Rates with Keras and Deep Learning (PyImageSearch)
● Keras Learning Rate Finder (PyImageSearch) 💎
● TensorFlow Addons Optimizers: CyclicalLearningRate 👍
42
Further Reading
● Cyclical Learning Rates for Training Neural Networks (Leslie, 2015)
● Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates (Leslie et al., 2017)
● A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate,
batch size, momentum, and weight decay (Leslie, 2018)
● SGDR: Stochastic Gradient Descent with Warm Restarts (2016)
● Snapshot Ensembles: Train 1, get M for free (2017)
● A brief history of learning rate schedulers and adaptive optimizers 💎
● Faster Deep Learning Training with PyTorch – a 2021 Guide 💎
43