Outline
 Machine Learning basics
 Introduction to Deep Learning
 What is Deep Learning?
 Why is it useful?
 Main components/hyper-parameters:
 activation functions
 optimizers, cost functions and training
 regularization methods
 tuning
 classification vs. regression tasks
 DNN basic architecture:
 Convolutional neural network
Most material is from the Stanford CS224n (NLP with Deep Learning) course
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed: methods that can learn from and make predictions on data.
[Diagram: labeled data → machine learning algorithm → learned model → prediction; the training phase fits the model to labeled data, the prediction phase applies the learned model to new data]
Machine Learning Basics
Supervised: Learning with a labeled training set
Example: email classification with already labeled emails
Unsupervised: Discover patterns in unlabeled data
Example: cluster similar documents based on text
Reinforcement learning: learn to act based on feedback/reward
Example: learn to play Go, reward: win or lose
Types of Learning
[Figure: three example tasks — regression, classification (class A vs. class B), clustering; source: http://mbjoseph.github.io/2013/11/27/measure.html]
Most machine learning methods work well because of human-designed representations and input features;
ML then becomes just optimizing weights to best make a final prediction
ML vs. Deep Learning
Deep learning algorithms attempt to learn (multiple levels of) representation by
using a hierarchy of multiple layers
If you provide the system tons of information, it begins to understand it and
respond in useful ways.
What is Deep Learning (DL)?
https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png
o Manually designed features are often over-specified, incomplete and take a
long time to design and validate
o Learned features are easy to adapt and fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable
framework for representing world, visual and linguistic information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data
Why is DL useful?
In ~2010, DL started outperforming
other ML techniques,
first in speech and vision, then in NLP
Neural Network Intro
Demo
How do we train?
h = σ(W₁x + b₁)
y = σ(W₂h + b₂)
[Diagram: input x (3 units) → hidden layer h (4 units) → output y (2 units), with the weights and activation functions labeled on the edges]
4 + 2 = 6 neurons (not counting inputs)
[3 × 4] + [4 × 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters in total
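As a sketch, here is the forward pass of this 3 → 4 → 2 network in NumPy; the random weight values are placeholders, only the shapes come from the slide:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 12 weights + 4 biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # 8 weights + 2 biases

x = np.array([1.0, 0.5, -0.2])  # an arbitrary 3-dimensional input
h = sigmoid(W1 @ x + b1)        # hidden layer: 4 units
y = sigmoid(W2 @ h + b2)        # output layer: 2 units
# learnable parameters: 12 + 8 weights and 4 + 2 biases = 26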
Training
1. Sample labeled data (a batch)
2. Forward it through the network, get predictions
3. Back-propagate the errors
4. Update the network weights
Optimize (min. or max.) the objective/cost function J(θ): generate an error signal that measures the difference between predictions and target values, and use that error signal to change the weights and get more accurate predictions. Subtracting a fraction of the gradient moves you towards a (local) minimum of the cost function.
https://medium.com/@ramrajchandradevan/the-evolution-of-gradient-descend-optimization-algorithm-4106a6702d39
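To make the update rule concrete, here is a minimal gradient-descent sketch in NumPy; the one-parameter linear model and the toy data are made up for the example:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = 2 * X + 1 + 0.1 * rng.normal(size=100)  # toy targets: y = 2x + 1 plus noise

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(200):
    pred = w * X + b               # forward pass: get predictions
    err = pred - Y                 # error signal vs. target values
    grad_w = 2 * np.mean(err * X)  # gradient of the mean-squared-error cost
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w               # subtract a fraction of the gradient
    b -= lr * grad_b
print(w, b)  # converges towards (2, 1)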
Non-linearities are needed to learn complex (non-linear) representations of the data;
otherwise the NN would be just a linear function.
More layers and neurons can approximate more complex functions.
Activation functions
W₂(W₁x) = (W₂W₁)x = Wx — a stack of linear layers collapses into a single linear map
Full list: https://en.wikipedia.org/wiki/Activation_function
http://cs231n.github.io/assets/nn1/layer_sizes.jpeg
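A quick NumPy check of this claim, with random matrices standing in for the layer weights:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
# two linear layers collapse into the single linear map W = W2 W1
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)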
Activation: Sigmoid
+ Nice interpretation as the firing rate of a neuron
• 0 = not firing at all
• 1 = fully firing
- Sigmoid neurons saturate and kill gradients, so the NN will barely learn
• when the neuron's activation is 0 or 1 (saturated)
→ the gradient at these regions is almost zero
→ almost no signal will flow to its weights
→ if the initial weights are too large, most neurons will saturate
Takes a real-valued number and
"squashes" it into the range between 0
and 1: ℝⁿ → [0, 1]ⁿ (applied elementwise)
http://adilmoujahid.com/images/activation.png
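A short NumPy sketch of the sigmoid and its gradient; the test points are arbitrary, chosen to show the saturation effect:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)

for z in [0.0, 2.0, 10.0]:
    print(z, sigmoid(z), sigmoid_grad(z))
# at z = 10 the gradient is ~4.5e-05: almost no signal flows to the weights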
Activation: ReLU
Takes a real-valued number and
thresholds it at zero: ℝⁿ → ℝ₊ⁿ (applied elementwise)
Most Deep Networks use ReLU nowadays
+ Trains much faster
• accelerates the convergence of SGD
• due to its linear, non-saturating form
+ Less expensive operations
• compared to sigmoid/tanh (exponentials etc.)
• implemented by simply thresholding a matrix at zero
+ More expressive
+ Mitigates the vanishing gradient problem
f(x) = max(0, x)
http://adilmoujahid.com/images/activation.png
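The thresholding itself is a one-liner in NumPy (the example inputs are arbitrary):

import numpy as np

def relu(x):
    return np.maximum(0, x)  # f(x) = max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]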
Overfitting
A learned hypothesis may fit the
training data very well, even
outliers (noise), but fail to
generalize to new examples (test
data)
http://wiki.bethanycrane.com/overfitting-of-data
https://www.neuraldesigner.com/images/learning/selection_error.svg
Regularization
L2 regularization (weight decay)
• A regularization term that penalizes big weights, added to the objective:
J_reg(θ) = J(θ) + λ Σₖ θₖ²
• The weight-decay value λ determines how dominant the regularization is during gradient computation
• Big weight-decay coefficient → big penalty for big weights
Dropout
• Randomly drop units (along with their connections) during training
• Each unit is retained with a fixed probability p, independent of other units
• The hyper-parameter p has to be chosen (tuned)
Early stopping
• Use the validation error to decide when to stop training
• Stop when the monitored quantity has not improved for n subsequent epochs
• n is called the patience
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research (2014)
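A sketch of all three methods in Keras; the layer sizes, the L2 coefficient, the dropout rate and the patience are arbitrary illustrative choices:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight decay
    layers.Dropout(0.5),  # drop each unit with probability 0.5 during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,                  # stop after 5 epochs without improvement
    restore_best_weights=True)
# model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop])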
Convolutional Neural Networks
• We know it is good to learn a small model.
• From this fully connected model, do we really need all the edges?
• Can some of these be shared?
Consider learning an image:
• Some patterns are much smaller than the whole image
Example: a "beak" detector can represent a small region with fewer parameters
The same pattern appears in different places, so detectors can be compressed!
What about training a lot of such "small" detectors, where each detector must "move around"?
An "upper-left beak" detector and a "middle beak" detector can be compressed to the same parameters.
The whole CNN
[Pipeline: input image → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flattened → Fully Connected Feedforward network → output (cat, dog, ...)]
A CNN is a neural network with some convolutional layers
(and some other layers). A convolutional layer has a number
of filters that perform the convolution operation.
A single filter acts as a detector (e.g. a beak detector).
Convolution
6 × 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Filter 1 (3 × 3):
1 -1 -1
-1 1 -1
-1 -1 1
With stride = 1, Filter 1 produces this 4 × 4 feature map:
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
Filter 2 (3 × 3):
-1 1 -1
-1 1 -1
-1 1 -1
produces:
-1 -1 -1 -1
-1 -1 -2 1
-1 -1 -2 1
-1 0 -4 3
Repeat this for each filter (stride = 1): two 4 × 4 images, forming a 2 × 4 × 4 feature map.
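A NumPy sketch of this operation (stride 1, "valid" region; as in most CNN libraries it is technically cross-correlation). It reproduces Filter 1's feature map above:

import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])

filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

out = np.zeros((4, 4), dtype=int)
for i in range(4):          # slide the 3 x 3 filter with stride 1
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * filter1)
print(out)  # first row: [ 3 -1 -3 -1]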
Convolution vs. Fully Connected
[Diagram: the same 6 × 6 image, flattened into a 36-dimensional input x₁ … x₃₆. In the fully-connected view, each output unit connects to all 36 inputs; in the convolution view, each feature-map value connects to only 9 of the 36 inputs (its 3 × 3 patch), and the filter weights are shared across positions.]
Max Pooling
The two 4 × 4 feature maps from the convolution step:
Filter 1 (1 -1 -1 / -1 1 -1 / -1 -1 1) output:
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
Filter 2 (-1 1 -1 / -1 1 -1 / -1 1 -1) output:
-1 -1 -1 -1
-1 -1 -2 1
-1 -1 -2 1
-1 0 -4 3
Max pooling keeps the maximum of each 2 × 2 block.
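The same pooling in NumPy, applied to Filter 1's feature map; the reshape trick assumes non-overlapping 2 × 2 blocks:

import numpy as np

fmap = np.array([[ 3,-1,-3,-1],
                 [-3, 1, 0,-3],
                 [-3,-3, 0, 1],
                 [ 3,-2,-2,-1]])

# keep the maximum of each non-overlapping 2 x 2 block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[3 0]
               #  [3 1]]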
Why Pooling
• Subsampling pixels will not change the object
[Figure: a bird image and its subsampled version — still a bird]
We can subsample the pixels to make the image smaller → fewer parameters to characterize the image
A CNN compresses a fully
connected network in two
ways:
• Reducing the number of connections
• Shared weights on the edges
Max pooling further reduces the complexity.
The whole CNN
[Pipeline: input → Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times)]
Each convolution + max-pooling stage produces a new "image", smaller than the original; the number of channels is the number of filters. For the example above, pooling the two 4 × 4 feature maps gives two 2 × 2 maps:
3 0
3 1
and
-1 1
0 3
The whole CNN
[Pipeline: input → Convolution → Max Pooling → Convolution → Max Pooling → Flattened → Fully Connected Feedforward network → output (cat, dog, ...); each convolution + pooling stage produces a new image]
Flattening
The pooled 2 × 2 maps
3 0
3 1
and
-1 1
0 3
are flattened into one vector (3, 0, 1, 3, -1, 1, 0, 3), which is fed into a fully connected feedforward network.
CNN in Keras
Only the network structure and the input format (vector → 3-D tensor) are modified.
[Pipeline: input → Convolution → Max Pooling → Convolution → Max Pooling]
The first convolution layer has 25 3 × 3 filters, e.g.
1 -1 -1
-1 1 -1
-1 -1 1
and
-1 1 -1
-1 1 -1
-1 1 -1
input_shape = (28, 28, 1): 28 × 28 pixels, 1 channel (black/white; 3 would mean RGB)
[Max-pooling example: the 2 × 2 block (3, -1 / -3, 1) pools to 3]
CNN in Keras
Only the network structure and the input format (vector → 3-D array) are modified.
Input: 1 x 28 x 28
Convolution → 25 x 26 x 26 (each filter has 3 × 3 = 9 parameters)
Max Pooling → 25 x 13 x 13
Convolution → 50 x 11 x 11 (each filter has 25 × 9 = 225 parameters)
Max Pooling → 50 x 5 x 5
CNN in Keras
Convolution
Max Pooling
Convolution
Max Pooling
Input
1 x 28 x 28
25 x 26 x 26
25 x 13 x 13
50 x 11 x 11
50 x 5 x 5
Flattened
1250
Fully connected
feedforward network
Output
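A Keras sketch matching these shapes; the convolution and pooling sizes come from the slides, while the 10-way softmax output is an assumption (the slides only say "Output"):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),          # 28 x 28 pixels, 1 channel
    layers.Conv2D(25, (3, 3)),               # -> 26 x 26 x 25 (9 weights per filter)
    layers.MaxPooling2D((2, 2)),             # -> 13 x 13 x 25
    layers.Conv2D(50, (3, 3)),               # -> 11 x 11 x 50 (225 weights per filter)
    layers.MaxPooling2D((2, 2)),             # -> 5 x 5 x 50
    layers.Flatten(),                        # -> 1250
    layers.Dense(10, activation="softmax"),  # assumed output layer
])
model.summary()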
References
 Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research (2014)
 Bergstra, James, and Yoshua Bengio. "Random search for hyper-parameter optimization." Journal of Machine Learning Research, Feb (2012)
 Kim, Y. "Convolutional Neural Networks for Sentence Classification." EMNLP (2014)
 Severyn, Aliaksei, and Alessandro Moschitti. "UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification." SemEval @ NAACL-HLT (2015)
 Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP (2014)
 Sutskever, Ilya, et al. "Sequence to sequence learning with neural networks." NIPS (2014)
 Bahdanau et al. "Neural machine translation by jointly learning to align and translate." ICLR (2015)
 Gal, Y., Islam, R., Ghahramani, Z. "Deep Bayesian Active Learning with Image Data." ICML (2017)
 Nair, V., Hinton, G.E. "Rectified linear units improve restricted Boltzmann machines." ICML (2010)
 Collobert, Ronan, et al. "Natural language processing (almost) from scratch." JMLR (2011)
 Kumar, Shantanu. "A Survey of Deep Learning Methods for Relation Extraction." arXiv preprint arXiv:1705.03645 (2017)
 Lin et al. "Neural Relation Extraction with Selective Attention over Instances." ACL (2016) [code]
 Zeng, D., et al. "Relation classification via convolutional deep neural network." COLING (2014)
 Nguyen, T.H., Grishman, R. "Relation extraction: Perspective from CNNs." VS @ HLT-NAACL (2015)
 Zhang, D., Wang, D. "Relation classification via recurrent NN." arXiv preprint arXiv:1508.01006 (2015)
 Zhou, P., et al. "Attention-based bidirectional LSTM networks for relation classification." ACL (2016)
 Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data." ACL-IJCNLP (2009)
Editor's Notes

1. Hyper-parameters
2. ReLU units can "die": a large gradient flowing through a ReLU could cause the weights to update in such a way that the neuron never activates on any datapoint again, so the gradient flowing through the unit will forever be zero from that point on. With a proper setting of the learning rate this is less frequently an issue.