2. MNIST Dataset
• The process of applying simple and complex transformations to enhance the performance of a model.
• Dataset augmentation applies transformations to your training examples.
• Augmentation can be done offline (the enlarged dataset is generated once before training) or online (transformations are applied on the fly during training); a sketch follows.
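A minimal sketch of the online variant using Keras preprocessing layers (assuming TensorFlow 2.6+; the specific transformations and magnitudes here are illustrative, not from the slides):

import tensorflow as tf

# Online augmentation: random transformations applied on the fly,
# so each epoch sees slightly different versions of the training images.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),  # rotate by up to ±10% of a full turn
    tf.keras.layers.RandomZoom(0.1),
])

# Offline augmentation would instead apply such transforms once and save
# the enlarged dataset to disk before training begins.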
5. Regularization
• Dropout: randomly turns off some neurons in each training iteration with probability p.
• Early Stopping: provides guidance on how many iterations can be run before the model begins to overfit.
• Weight Constraint: rescales weights so they stay within a pre-defined threshold.
• Noise: introduces stochastic noise into the training process; noise injection acts as a regularization method in a neural network that is overfitting (see the sketch after this list).
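As a rough illustration, the four techniques map onto Keras as follows (layer sizes and hyperparameter values are illustrative assumptions, not from the slides):

import tensorflow as tf
from tensorflow.keras import layers, constraints

model = tf.keras.Sequential([
    layers.GaussianNoise(0.1, input_shape=(784,)),             # noise injection
    layers.Dense(128, activation="relu",
                 kernel_constraint=constraints.MaxNorm(3.0)),  # weight constraint
    layers.Dropout(0.5),                                       # dropout with p = 0.5
    layers.Dense(10, activation="softmax"),
])

# Early stopping is supplied at fit() time as a callback:
stop = tf.keras.callbacks.EarlyStopping(patience=4)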
13. Cont.,
• Dropout is a technique where randomly selected neurons are ignored during the training process.
• Large neural networks trained on small datasets will almost always overfit the training data.
• Dropout changed the concept of learning all the weights together: in each iteration only part of the network is trained (partial learning).
• It can be used in fully connected, convolutional, and recurrent layers.
• A new hyperparameter is introduced that specifies the probability at which the outputs of a particular layer are dropped.
14. Some important aspects:
• Can be used with all network types and resolves co-adaptation.
• The weights of the network will be larger than in a normal network because of dropout.
• Dropout forces a neural network to learn more robust features.
• Dropout roughly doubles the number of iterations required to converge; however, training time for each epoch is shorter.
• Overfitting due to co-adaptation can also be addressed with L1 and L2 regularization.
• At test time the weights w are scaled by the keep probability p; typical drop rates are 0.5 for intermediate layers and 0.2 for the input layer (see the sketch below).
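A minimal NumPy sketch of the scaling logic. Note that most libraries implement the equivalent "inverted dropout", which divides kept activations by (1 − p) during training so that no weight rescaling is needed at test time:

import numpy as np

def dropout(x, p, training=True):
    # p is the drop probability (e.g. 0.5 for hidden layers, 0.2 for the input).
    if not training:
        return x  # inverted dropout: nothing to rescale at test time
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)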
16. CASE STUDY: CIFAR-10
• The deep network built has three convolution layers with 64, 128, and 256 filters, followed by two densely connected layers of size 512 and an output layer of size 10 (the number of classes in the CIFAR-10 dataset).
• ReLU is the activation function for the hidden layers and sigmoid for the output layer, trained with cross-entropy loss; a sketch of this architecture follows.
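A minimal Keras sketch of the described network; the 3x3 kernels, pooling layers, and Adam optimizer are assumptions, since the slide only specifies layer widths, activations, and the loss:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(64, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(256, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="sigmoid"),  # sigmoid output, as on the slide
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])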
18. Early stopping
• Early stopping is an optimization technique used to reduce overfitting without compromising the accuracy of the model.
• It decides the number of epochs / the training time: too little training underfits, and training for too long overfits.
• Stopping training once the model's performance starts to degrade on the validation set is called early stopping.
• Elements of early stopping:
Monitoring model performance.
A trigger to stop training.
The choice of model to use.
19. Cont.,
• Early stopping protects against overfitting and needs considerably fewer epochs to train.
• A callback is a powerful tool to customize the behavior of a Keras model during training, evaluation, or inference.
• Callbacks can inspect the internal states and statistics of a model during training.
• They can act at the start/stop of the training process, at the end of each epoch, or at the end of a training batch.
• Key EarlyStopping arguments: monitor, patience, mode, restore_best_weights.
import tensorflow as tf

# Stop when the monitored metric (val_loss by default) has not improved
# for 4 consecutive epochs, and restore the best weights seen during training.
callback = tf.keras.callbacks.EarlyStopping(patience=4, restore_best_weights=True)
history1 = model2.fit(trn_images, trn_labels, epochs=50,
                      validation_data=(valid_images, valid_labels),
                      callbacks=[callback])
21. Ensemble Methods
• A solution to the high variance of neural networks is to train multiple models and combine their predictions.
• Training data: vary the choice of data used to train each model in the ensemble.
• Ensemble models: vary the choice of the models used in the ensemble.
• Combinations: vary the way outcomes from ensemble members are combined (a sketch of simple averaging follows).
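A minimal sketch of the simplest combination scheme, averaging predicted class probabilities over independently trained models (the function name and models list are illustrative):

import numpy as np

def ensemble_predict(models, x):
    # Average class probabilities across ensemble members.
    preds = np.stack([m.predict(x) for m in models])
    return preds.mean(axis=0)

# e.g. labels = ensemble_predict([model_a, model_b, model_c], x_test).argmax(axis=1)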
22. Batch Normalization
• Batch normalization is one of the important techniques in deep learning.
• We normalize each layer's inputs by using the mean and standard deviation of the values in the current batch.
• Batch normalization acts as a regularizer by normalizing the inputs, stabilizes the backpropagation process, and can be adapted to most models to help them converge better.
• Normalizing the inputs to hidden layers helps in faster learning.
• Batch normalization reduces covariate shift.
23. Cont.,
• When the distribution of input data shifts between the training environment and the live environment, the input and output distributions may change, but the labels remain the same.
• Covariate shift can occur gradually over time or suddenly after the deployment of the model.
• Normalization has the effect of stabilizing the neural network.
• Batch normalization normalizes the outputs to mean 0 and standard deviation 1 (μ = 0, σ = 1); a minimal layer sketch follows.
• Learning rates can be made higher to improve the training process.
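Concretely, for each batch B the layer computes x̂ = (x − μ_B) / √(σ_B² + ε) and outputs γx̂ + β with learned scale γ and shift β. A minimal Keras sketch (layer sizes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(256, input_shape=(784,)),
    layers.BatchNormalization(),  # normalize over the current batch
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])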
28. Weight Initialization Techniques
• The weight initialization technique chosen for a neural network can determine how quickly the network converges.
• In neural networks, weights represent the strength of connections between units or neurons in adjacent network layers.
• Improperly initialized weights can negatively affect the training process by contributing to the vanishing or exploding gradient problem.
• Initializing with weights that are too large may result in exploding gradient values during forward propagation or back-propagation.
29. Cont.,
• Xavier Glorot and Yoshua Bengio (2010) proposed the "Xavier" initialization, which considers the size of the network (the number of input and output units) while initializing weights.
• This approach ensures that the weights stay within a reasonable range of values by making them inversely proportional to the square root of the number of units in the previous layer (see the sketch after this list).
• Weight pruning means eliminating unnecessary values in the weight tensors.
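A minimal Keras sketch of Xavier/Glorot initialization, which draws weights with variance 2 / (n_in + n_out); the layer width and activation are illustrative:

import tensorflow as tf
from tensorflow.keras import layers

# GlorotUniform is Keras's implementation of Xavier initialization.
layer = layers.Dense(
    128,
    activation="tanh",
    kernel_initializer=tf.keras.initializers.GlorotUniform(),
)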