Naitik (https://www.linkedin.com/in/naitikshukla/)
General guidelines on using DNNs and how to proceed, for starters
Training data
A few measures one can take to get better training data:
 Get your hands on as large a dataset as possible (DNNs are quite data-hungry: more is better)
 Remove any training sample with corrupted data (short texts, highly distorted images, spurious output labels, features with lots of null values, etc.)
 Data augmentation - create new examples (in the case of images: rescale, add noise, etc.)
Choose appropriate activation functions
Activations introduce the much-desired non-linearity into the model. For
years, the sigmoid activation function was the preferred choice. However,
a sigmoid function suffers from two inherent drawbacks:
1. Saturation of the sigmoid at the tails (which in turn causes the vanishing gradient problem).
2. Sigmoid outputs are not zero-centered.
A better alternative is the tanh function - mathematically, tanh is just a rescaled and
shifted sigmoid:
tanh(x) = 2*sigmoid(2x) - 1
 tanh can still suffer from the vanishing gradient problem, but the good news is
that tanh is zero-centered.
 Hence, using tanh as the activation function generally results in faster convergence.
Other alternatives are ReLU, SoftSign, etc.
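As a quick reference, here is a minimal NumPy-only sketch of the three activations mentioned above (not tied to any particular framework):

```python
import numpy as np

def sigmoid(x):
    # Saturates at the tails and outputs in (0, 1), so it is not zero-centered.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered; note tanh(x) = 2*sigmoid(2x) - 1.
    return np.tanh(x)

def relu(x):
    # Non-saturating for positive inputs; a common modern default.
    return np.maximum(0.0, x)

x = np.linspace(-5, 5, 11)
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```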
Number of Hidden Units and Layers
Keeping a larger number of hidden units than the optimal number is generally a safe
bet, since any regularization method will take care of the superfluous units (at least to some
extent).
On the other hand, with fewer hidden units than the optimal number, the chances of
underfitting the model are higher.
Selecting the optimal number of layers is relatively straightforward.
As @Yoshua-Bengio mentioned on Quora - “You just keep on adding
layers, until the test error doesn’t improve anymore”. ;)
Weight Initialization
Always initialize the weights with small random numbers to break the symmetry
between different units.
To initialize weights that are evenly distributed, a uniform distribution is probably
one of the best choices.
Furthermore, as shown in Glorot and Bengio (2010), units with more incoming
connections (fan_in) should have relatively smaller weights.
Thanks to these thorough experiments, we now have a tested formula that we can
use directly for weight initialization, i.e.:
weights drawn from ~ Uniform(-r, r)
where r = sqrt(6 / (fan_in + fan_out)) for tanh activations,
      r = 4 * sqrt(6 / (fan_in + fan_out)) for sigmoid activations,
and fan_in is the size of the previous layer and fan_out is the size of the next layer.
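A minimal NumPy sketch of this initialization (the layer sizes and function name are illustrative):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, sigmoid_activation=False, seed=0):
    rng = np.random.default_rng(seed)
    r = np.sqrt(6.0 / (fan_in + fan_out))
    if sigmoid_activation:
        r *= 4.0  # the sigmoid variant scales the bound by 4
    return rng.uniform(-r, r, size=(fan_in, fan_out))

W = glorot_uniform(fan_in=256, fan_out=128)  # weights for a 256 -> 128 layer
print(W.shape, W.min(), W.max())
```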
Learning Rates
This is probably one of the most important hyperparameters governing the learning
process.
Set the learning rate too small and your model might take ages to converge; make it too
large and, within the first few training examples, your loss might shoot up to the sky.
The optimal learning rate depends on the specific task; 0.01 is a commonly used default.
One possible alternative:
Gradually decrease the learning rate after each epoch or after a few thousand examples.
Although this can speed up training, it requires another manual decision about
the new learning rates.
These kinds of strategies were quite common a few years back. Typically, the learning rate is
halved after each epoch.
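A minimal sketch of this halve-every-epoch schedule (the initial rate of 0.01 is taken from the text; the function name is illustrative):

```python
def step_decay(epoch, initial_lr=0.01, drop=0.5):
    # Learning rate is halved after every epoch.
    return initial_lr * (drop ** epoch)

for epoch in range(5):
    print(epoch, step_decay(epoch))
```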
Better Alternative:
We have better momentum-based methods to change the learning rate based on the
curvature of the error function.
It might also help to set different learning rates for individual parameters in the model,
since some parameters might be learning at a relatively slower or faster rate.
Advanced Alternative:
There has been a good amount of research on optimization methods, resulting in adaptive
learning rates.
We now have numerous options, ranging from the good old Momentum method to Adagrad, Adam,
RMSProp, etc.
Methods like Adagrad or Adam effectively save us from manually choosing an initial
learning rate.
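To make the "adaptive, per-parameter rate" idea concrete, here is a minimal NumPy sketch of the standard Adagrad update rule (variable names are illustrative, not taken from the text):

```python
import numpy as np

def adagrad_update(params, grads, cache, lr=0.01, eps=1e-8):
    # Accumulate squared gradients per parameter.
    cache += grads ** 2
    # Parameters with large historical gradients get proportionally smaller steps.
    params -= lr * grads / (np.sqrt(cache) + eps)
    return params, cache

params = np.array([1.0, -2.0, 0.5])
cache = np.zeros_like(params)
grads = np.array([0.1, -0.4, 0.05])
params, cache = adagrad_update(params, grads, cache)
print(params)
```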
Hyperparameter Tuning: Shun Grid Search - Embrace Random Search
Grid Search has been prevalent in classical machine learning. However, Grid Search is
not at all efficient at finding optimal hyperparameters for DNNs,
primarily because of the time a DNN takes to try out each hyperparameter
combination. As the number of hyperparameters keeps increasing, the computation
required for Grid Search also increases exponentially.
There are two ways to go about it:
1. Based on your prior experience, you can manually tune some common
hyperparameters like the learning rate, number of layers, etc.
2. Instead of Grid Search, use Random Search/Random Sampling to
choose optimal hyperparameters. It is also possible to add some prior
knowledge to further decrease the search space (e.g., the learning rate shouldn't be too
large or too small); a sketch follows this list.
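A minimal sketch of random search over two hyperparameters, with a log-uniform prior on the learning rate; `train_and_evaluate` is a hypothetical stand-in for your own training and validation routine:

```python
import random

def random_search(n_trials=20):
    best = None
    for _ in range(n_trials):
        lr = 10 ** random.uniform(-4, -1)   # prior: lr neither too large nor too small
        n_layers = random.randint(2, 6)
        # train_and_evaluate is assumed to return a validation score (higher = better).
        score = train_and_evaluate(lr=lr, n_layers=n_layers)
        if best is None or score > best[0]:
            best = (score, lr, n_layers)
    return best
```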
Learning Methods
The good old Stochastic Gradient Descent might not be as efficient for DNNs.
There has been a lot of research on developing more flexible optimization algorithms,
e.g. Adagrad, Adam, AdaDelta, RMSProp, etc.
In addition to providing adaptive learning rates, these sophisticated methods also
use different rates for different model parameters, which generally results in
smoother convergence.
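In practice these optimizers are usually just swapped in at compile time. A minimal Keras sketch, assuming TensorFlow/Keras is available (the model architecture is a placeholder):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Any of Adagrad, Adam, RMSprop, etc. can be plugged in here.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])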
Best Practice:
Keep the dimensions of weights in powers of 2
Memory management is still done at the byte level, so it's always good to keep the
size of your parameters as 64, 128, 512, 1024 (all powers of 2). This might help in
sharding the matrices, weights, etc.
Unsupervised Pretraining
It doesn't matter whether you are working with NLP, computer vision, speech
recognition, etc. - unsupervised pretraining always helps the training of your
supervised or other unsupervised models.
For example, you can pretrain your model on the ImageNet dataset in an unsupervised manner,
and then fine-tune it for a 2-class supervised classification task.
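A minimal Keras sketch of the idea, assuming TensorFlow/Keras and purely illustrative shapes: pretrain an autoencoder without labels, then reuse the encoder for a 2-class supervised classifier:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(128, activation="relu")(inputs)
decoded = tf.keras.layers.Dense(784, activation="sigmoid")(encoded)

autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_unlabeled, x_unlabeled, epochs=10)   # unsupervised pretraining

encoder = tf.keras.Model(inputs, encoded)
clf_head = tf.keras.layers.Dense(2, activation="softmax")(encoder.output)
classifier = tf.keras.Model(encoder.input, clf_head)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# classifier.fit(x_labeled, y_labeled, epochs=10)        # supervised fine-tuning
```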
Mini-Batch vs. Stochastic Learning
The major objective of training a model is to learn appropriate parameters that result in
an optimal mapping from inputs to outputs.
Stochastic:
In a stochastic learning approach, the weights' gradients are updated after
each training sample, which introduces noise into the gradients (hence the word 'stochastic').
This has a very desirable effect: with the introduction of noise during training,
the model becomes less prone to overfitting.
However, stochastic learning can effectively waste a large portion of the computational power of
modern machines. If we are capable of computing matrix-matrix multiplications,
why should we limit ourselves to iterating through multiplications of individual
pairs of vectors?
When the model receives the training data as a stream (online
learning), resorting to stochastic learning is a good option.
Mini-Batch:
For greater throughput/faster learning, it’s recommended to use mini-batches instead of
stochastic learning.
 Selecting an appropriate batch size is equally important, so that we can still retain
some noise (by not using a huge batch)
 and simultaneously use the computational power of machines more effectively.
Commonly, a batch of 16 to 128 examples is a good choice (a power of 2); see the sketch below.
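A minimal NumPy sketch of mini-batch iteration (batch size and data shapes are illustrative); each step then computes gradients averaged over one mini-batch instead of a single example:

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=64, seed=0):
    # Shuffle once per epoch, then yield consecutive slices of the permutation.
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.random.randn(1000, 20)
y = np.random.randint(0, 2, size=1000)
for xb, yb in iterate_minibatches(X, y, batch_size=64):
    pass  # compute gradients on (xb, yb) and update parameters here
```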
Dropout for Regularization
Considering the millions of parameters to be learned, regularization becomes an imperative
requisite to prevent overfitting in DNNs.
You can keep using L1/L2 regularization as well, but Dropout is preferable for checking
overfitting in DNNs.
If the model is less complex, a dropout rate of 0.2 might suffice; otherwise the default value
of 0.5 is a good choice.
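A minimal Keras sketch of adding Dropout between dense layers (assuming TensorFlow/Keras; layer sizes are illustrative, and the rate is the fraction of units dropped during training):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(128,))
x = tf.keras.layers.Dense(512, activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.5)(x)   # 0.2 may suffice for less complex models
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```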
References:
1. Practical Recommendations for Gradient-Based Training of Deep Architectures (Yoshua Bengio)
2. How to train your Deep Neural Network (Rishabh Shukla)
3. Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al.)