
- Deep Learning Foundations and Applications Jiaul Paik Lecture 5
- Gradient Descent Algorithm 1. Randomly set the values of the parameters (thetas) 2. Repeat until convergence: $\theta_j^{t+1} = \theta_j^{t} - r \cdot \frac{\partial E}{\partial \theta_j}$ for all $j$
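The update rule above can be sketched in a few lines of NumPy. The squared-error loss, the learning rate $r = 0.1$, and the gradient-norm stopping test are illustrative choices, not from the slides:

```python
import numpy as np

def gradient_descent(X, y, r=0.1, n_iters=1000, tol=1e-8):
    # Illustrative loss: E(theta) = ||X @ theta - y||^2 / (2m)
    m, d = X.shape
    theta = np.random.randn(d)                # 1. randomly set the thetas
    for _ in range(n_iters):                  # 2. repeat until convergence
        grad = X.T @ (X @ theta - y) / m      # dE/dtheta_j, for all j at once
        theta -= r * grad                     # theta_j <- theta_j - r * dE/dtheta_j
        if np.linalg.norm(grad) < tol:        # stop once the gradient vanishes
            break
    return theta

# Usage: recover the weights of a noiseless linear model.
X = np.random.randn(100, 3)
true_theta = np.array([2.0, -1.0, 0.5])
print(gradient_descent(X, X @ true_theta))   # approaches [2.0, -1.0, 0.5]
```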
- Parameter Initialization • Very large initialization leads to exploding gradients • Very small initialization leads to vanishing gradients • We need to maintain a balance
- Initialization • Xavier initialization: for every layer $l$, set the parameters according to a normal distribution, $W^{[l]} \sim N\!\left(0, \frac{1}{n^{[l-1]}}\right)$, where $n^{[l-1]}$ is the number of neurons in layer $(l-1)$
- Initialization • Kaiming initialization: for every layer $l$, set the parameters according to a normal distribution, $W^{[l]} \sim N\!\left(0, \frac{2}{n^{[l]}}\right)$ and $b^{[l]} = 0$, where $n^{[l]}$ is the number of neurons in layer $(l)$
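A minimal NumPy sketch of both schemes from the two slides above. The fan-in convention (scaling by the previous layer's width) is assumed here, which is the usual implementation choice:

```python
import numpy as np

def xavier_init(n_prev, n_curr):
    # Xavier: W^[l] ~ N(0, 1/n^[l-1]); keeps activation variance roughly
    # constant across layers for tanh/sigmoid units.
    return np.random.randn(n_curr, n_prev) * np.sqrt(1.0 / n_prev)

def kaiming_init(n_prev, n_curr):
    # Kaiming: W^[l] ~ N(0, 2/n); the factor 2 compensates for ReLU
    # zeroing out roughly half of the activations. Biases start at 0.
    W = np.random.randn(n_curr, n_prev) * np.sqrt(2.0 / n_prev)
    b = np.zeros((n_curr, 1))
    return W, b
```

Neither scheme is "too large" nor "too small" in the sense of the previous slide: the variance shrinks as the layer gets wider, which is the balance that keeps gradients from exploding or vanishing.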
- Computing Loss
- Cross Entropy
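A minimal NumPy sketch of the cross-entropy loss these two slides introduce; the averaged, index-based form below is a common convention, not necessarily the exact one on the slide:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    # L = -(1/m) * sum_i log p_i[y_i], where p_i is the predicted class
    # distribution for example i and y_i its true class index.
    m = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(m), labels] + eps))

# Usage: two examples, three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(cross_entropy(probs, labels))  # -(log 0.7 + log 0.8) / 2
```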
- Batch Normalization
- Internal Covariate Shift • Each layer of a neural network has inputs with a corresponding distribution • This distribution generally depends on • the randomness in the parameter initialization and • the randomness in the input data • The effect of these changing input distributions on the internal layers during training is called internal covariate shift
- Batch Normalization: Main idea • Normalize distribution of each input feature in each layer across each minibatch to N(0, 1) • Scale and shift
- Batch Normalization: How to do it? • Normalize the distribution of each input feature in each layer across each minibatch to N(0, 1) • Learn the scale and shift: $\gamma$ and $\beta$ are trainable parameters, found using backprop (Ioffe & Szegedy)
- Batch Normalization: Computing Gradients • Normalize the distribution of each input feature in each layer across each minibatch to N(0, 1) • Learn the scale and shift (Ioffe & Szegedy)
- Batch Normalization: At test time • You see only one example • You still need a mean and variance for normalization • They need to contain information learnt from all training examples • Run a moving average across all mini-batches of the entire training set (population statistics)
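Putting the last few slides together, here is a minimal NumPy sketch of a batch-norm layer: mini-batch statistics plus a learned scale $\gamma$ and shift $\beta$ during training, and moving-average population statistics at test time. The momentum of 0.9 and the epsilon are conventional defaults, not values from the slides:

```python
import numpy as np

class BatchNorm:
    """Per-feature batch normalization (Ioffe & Szegedy)."""

    def __init__(self, n_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(n_features)    # learned scale, trained by backprop
        self.beta = np.zeros(n_features)    # learned shift, trained by backprop
        self.running_mean = np.zeros(n_features)  # population statistics,
        self.running_var = np.ones(n_features)    # used at test time
        self.momentum, self.eps = momentum, eps

    def forward(self, x, training=True):
        # x has shape (batch_size, n_features)
        if training:
            mu = x.mean(axis=0)             # mini-batch mean per feature
            var = x.var(axis=0)             # mini-batch variance per feature
            # moving average across mini-batches -> test-time statistics
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            # a single test example: fall back on the population statistics
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)  # normalize to ~N(0, 1)
        return self.gamma * x_hat + self.beta       # then scale and shift
```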
- Regularization
- Improving Single Model Performance
- Regularization • Key idea • Add a term to the error/loss function
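A common choice for the added term, assumed in the sketch below, is the L2 penalty $\frac{\lambda}{2}\lVert W \rVert^2$; other penalties (e.g. L1) slot into the loss the same way:

```python
import numpy as np

def l2_regularize(loss, grad, W, lam=1e-3):
    # Regularized objective: E_reg = E + (lambda/2) * ||W||^2.
    reg_loss = loss + 0.5 * lam * np.sum(W ** 2)
    # The penalty adds lambda * W to the gradient ("weight decay").
    reg_grad = grad + lam * W
    return reg_loss, reg_grad
```

Because the penalty grows with the magnitude of the weights, minimizing the regularized loss pulls the parameters toward zero and discourages the single model from overfitting the training set.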
