Troubleshooting Deep Neural Networks
Full Stack Deep Learning - UC Berkeley Spring 2021 - Sergey Karayev, Josh Tobin, Pieter Abbeel
Lifecycle of an ML project
Per-project activities: planning & project setup → data collection & labeling → training & debugging → deploying & testing
Cross-project infrastructure: team & hiring; infra & tooling
Why talk about DL troubleshooting?
[XKCD #1838: https://xkcd.com/1838/]
Common sentiment among practitioners: 80-90% of time is spent debugging and tuning, and only 10-20% deriving math or implementing things.
Why is DL troubleshooting so hard?
Suppose you can't reproduce a result
[Figure: your learning curve falls well short of the published one. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.]
Why is your performance worse?
Poor model performance can stem from implementation bugs, hyperparameter choices, data/model fit, or dataset construction.
Most DL bugs are invisible
[Example: code that runs and looks plausible, but the labels are out of order!]
Models are sensitive to hyperparameters
[Andrej Karpathy, CS231n course notes; He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." Proceedings of the IEEE international conference on computer vision, 2015.]
Data / model fit
Data from the paper: ImageNet. Yours: self-driving car images.
Constructing good datasets is hard
[Chart: "Amount of lost sleep over..." data, compared between a PhD and Tesla. Slide from Andrej Karpathy's talk "Building the Software 2.0 Stack" at TrainAI 2018, 5/10/2018.]
Common dataset construction issues
• Not enough data
• Class imbalances
• Noisy labels
• Train / test from different distributions
• etc.
Takeaways: why is troubleshooting hard?
• Hard to tell if you have a bug
• Lots of possible sources for the same degradation in performance
• Results can be sensitive to small changes in hyperparameters and dataset makeup
Strategy for DL troubleshooting
Key mindset for DL troubleshooting: pessimism.
Key idea: since it's hard to disambiguate errors, start simple and gradually ramp up complexity.
[The troubleshooting loop: start simple → implement & debug → evaluate → either tune hyperparameters or improve model/data → evaluate again, until the model meets requirements.]
Quick summary
• Start simple: choose the simplest model & data possible (e.g., LeNet on a subset of your data)
• Implement & debug: once the model runs, overfit a single batch & reproduce a known result
• Evaluate: apply the bias-variance decomposition to decide what to do next
• Tune hyperparameters: use coarse-to-fine random searches
• Improve model/data: make your model bigger if you underfit; add data or regularize if you overfit
We'll assume you already have…
• An initial test set
• A single metric to improve
• Target performance based on human-level performance, published results, previous baselines, etc.
Running example: pedestrian detection. Classify each image as 0 (no pedestrian) or 1 (yes pedestrian). Goal: 99% classification accuracy.
Strategy for DL troubleshooting, step 1: start simple.
Starting simple
Steps:
a. Choose a simple architecture
b. Use sensible defaults
c. Normalize inputs
d. Simplify the problem
Demystifying architecture selection

Input type | Start here | Consider using this later
Images | LeNet-like architecture | ResNet
Sequences | LSTM with one hidden layer (or temporal convs) | Attention model or WaveNet-like model
Other | Fully connected neural net with one hidden layer | Problem-dependent
Dealing with multiple input modalities
Example: Input 1, Input 2 (an image, run through a ConvNet + Flatten), and Input 3 (the text "This is a cat", run through an LSTM).
1. Map each input into a lower-dimensional feature space (e.g., 64-, 72-, and 48-dim)
2. Concatenate the features (184-dim)
3. Pass through fully connected layers to the output (184 → 256 → 128 → T/F)
(A PyTorch sketch follows.)
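A minimal PyTorch sketch of this pattern. Only the 64/72/48 → 184 → 256 → 128 dimensions come from the slide; the specific layers and input shapes are illustrative assumptions.

import torch
import torch.nn as nn

class MultiModalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Linear(10, 64)                  # input 1: a feature vector -> 64-dim
        self.cnn = nn.Sequential(                     # input 2: an image -> 72-dim
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(3), nn.Flatten(),    # 8 * 3 * 3 = 72
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=48, batch_first=True)  # input 3: text -> 48-dim
        self.head = nn.Sequential(                    # step 3: FC layers to the output
            nn.Linear(64 + 72 + 48, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1),                        # single T/F logit
        )

    def forward(self, vec, img, seq):
        f1 = self.mlp(vec)                            # (B, 64)
        f2 = self.cnn(img)                            # (B, 72)
        _, (h, _) = self.lstm(seq)
        f3 = h[-1]                                    # (B, 48)
        fused = torch.cat([f1, f2, f3], dim=1)        # step 2: concatenate -> (B, 184)
        return self.head(fused)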
Recommended network / optimizer defaults
• Optimizer: Adam with learning rate 3e-4
• Activations: ReLU (FC and conv models), tanh (LSTMs)
• Initialization: He et al. normal (ReLU), Glorot normal (tanh)
• Regularization: none
• Data normalization: none
Definitions of recommended initializers (n is the number of inputs, m is the number of outputs):
• He et al. normal (used for ReLU): W ~ N(0, sqrt(2/n))
• Glorot normal (used for tanh): W ~ N(0, sqrt(2/(n+m)))
(A PyTorch sketch of these defaults follows.)
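A minimal PyTorch sketch of these defaults (the two-layer net is a placeholder, not from the lecture):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")   # He normal: std = sqrt(2/n)
        # for a tanh network, use Glorot instead: nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)        # Adam with LR 3e-4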
Important to normalize the scale of input data
• Subtract the mean and divide by the standard deviation
• For images, it's fine to scale values to [0, 1] or [-0.5, 0.5] (e.g., by dividing by 255)
• Careful: make sure your library doesn't already do it for you!
A short sketch follows.
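Both options for images, in NumPy (x is a uint8 batch; the shapes are illustrative):

import numpy as np

x = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

x01 = x.astype(np.float32) / 255.0                                   # scale to [0, 1]
x_std = (x01 - x01.mean(axis=(0, 1, 2))) / x01.std(axis=(0, 1, 2))   # per-channel: subtract mean, divide by std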
Consider simplifying the problem as well
• Start with a small training set (~10,000 examples)
• Use a fixed number of objects, classes, image size, etc.
• Create a simpler synthetic training set
Simplest model for pedestrian detection
• Start with a subset of 10,000 images for training, 1,000 for val, and 500 for test
• Use a LeNet architecture with sigmoid cross-entropy loss
• Adam optimizer with LR 3e-4
• No regularization
(A PyTorch sketch follows.)
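A minimal PyTorch sketch of this setup. The 3x32x32 input size is an assumption (the lecture doesn't specify image dimensions); everything else follows the bullets above.

import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 1),                         # one pedestrian/no-pedestrian logit
        )

    def forward(self, x):
        return self.net(x)

model = LeNet()
loss_fn = nn.BCEWithLogitsLoss()                      # sigmoid cross-entropy on the logit
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
# No regularization (no weight decay, no dropout) for the first version.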
Starting simple: summary
a. Choose a simple architecture: LeNet, LSTM, or fully connected
b. Use sensible defaults: Adam optimizer & no regularization
c. Normalize inputs: subtract the mean and divide by the std, or just divide by 255 (images)
d. Simplify the problem: start with a simpler version of your problem (e.g., smaller dataset)
Strategy for DL troubleshooting, step 2: implement & debug.
Implementing bug-free DL models
Steps:
a. Get your model to run
b. Overfit a single batch
c. Compare to a known result
Preview: the five most common DL bugs
• Incorrect shapes for your tensors. Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x+y).shape = (None, None) (see the demo below)
• Pre-processing inputs incorrectly. E.g., forgetting to normalize, or too much pre-processing
• Incorrect input to your loss function. E.g., softmaxed outputs to a loss that expects logits
• Forgetting to set up train mode for the net correctly. E.g., toggling train/eval, controlling batch norm dependencies
• Numerical instability (inf/NaN). Often stems from using an exp, log, or div operation
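A tiny NumPy demo of the silent broadcasting bug from the first bullet:

import numpy as np

x = np.zeros(4)          # shape (4,)
y = np.zeros((4, 1))     # shape (4, 1)
print((x + y).shape)     # (4, 4): broadcasting built a matrix, and no error was raised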
General advice for implementing your model
• Lightweight implementation: aim for the minimum possible new lines of code for v1; rule of thumb: <200 lines (tested infrastructure components are fine)
• Use off-the-shelf components, e.g., Keras; tf.layers.dense(…) instead of tf.nn.relu(tf.matmul(W, x)); tf.losses.cross_entropy(…) instead of writing out the exp
• Build complicated data pipelines later: start with a dataset you can load into memory
a. Get your model to run. Common issues and recommended resolutions:
• Shape mismatch or casting issue: step through model creation and inference in a debugger
• OOM: scale back memory-intensive operations one by one
• Other: standard debugging toolkit (Stack Overflow + interactive debugger)
Debuggers for DL code
• PyTorch: easy; use ipdb (sketch below)
• TensorFlow: trickier. Option 1: step through graph creation. Option 2: step into the training loop and evaluate tensors using sess.run(…). Option 3: use tfdbg, which stops execution at each sess.run(…) and lets you inspect:
  python -m tensorflow.python.debug.examples.debug_mnist --debug
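For example, a minimal ipdb sketch (the function is illustrative):

import ipdb

def debug_forward(model, batch):
    ipdb.set_trace()        # execution stops here: inspect batch.shape, model, etc., step with `n`
    return model(batch)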
Shape mismatch. Most common causes:
• Incorrect shapes: flipped dimensions when using tf.reshape(…); took a sum, average, or softmax over the wrong dimension; forgot to flatten after conv layers; forgot to get rid of extra "1" dimensions (e.g., if shape is (None, 1, 1, 4)); data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, loaded it as float32)
• Undefined shapes: confusing tensor.shape, tf.shape(tensor), and tensor.get_shape(); reshaping things to a shape of type Tensor (e.g., when loading data from a file)
Casting issue: data not in float32. Most common causes:
• Forgot to cast images from uint8 to float32
• Generated data using numpy in float64 and forgot to cast to float32
OOM. Most common causes:
• Too big a tensor: too large a batch size for your model (e.g., during evaluation); too large fully connected layers
• Too much data: loading too large a dataset into memory rather than using an input queue; allocating too large a buffer for dataset creation
• Duplicating operations: memory leak due to creating multiple models in the same session; repeatedly creating an operation (e.g., in a function that gets called over and over again)
• Other processes: other processes running on your GPU
Other common errors. Most common causes:
• Forgot to initialize variables
• Forgot to turn off bias when using batch norm
• "Fetch argument has invalid type": usually you overwrote one of your ops with an output during training
b. Overfit a single batch. Common issues and their most common causes (a minimal training-loop sketch follows this list):
• Error goes up: flipped the sign of the loss function / gradient; learning rate too high; softmax taken over the wrong dimension
• Error explodes: numerical issue (check all exp, log, and div operations); learning rate too high
• Error oscillates: data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly); learning rate too high
• Error plateaus: learning rate too low; gradients not flowing through the whole model; too much regularization; incorrect input to the loss function (e.g., softmax instead of logits, accidentally added a ReLU on the output); data or labels corrupted
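A minimal sketch of step b, reusing model, loss_fn, and optimizer from the LeNet sketch earlier; train_loader is assumed to yield (images, labels) batches.

images, labels = next(iter(train_loader))        # grab ONE batch and reuse it every step
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels.float().unsqueeze(1))
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())                 # should drive toward ~0; if it doesn't, suspect a bug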
c. Compare to a known result. Hierarchy of known results, from most to least useful:
1. Official model implementation evaluated on a similar dataset to yours: you can walk through the code line by line and ensure you have the same output
2. Official model implementation evaluated on a benchmark (e.g., MNIST): you can walk through the code line by line and ensure you have the same output
3. Unofficial model implementation: same as above, but with lower confidence
4. Results from the paper (with no code): you can ensure your performance is up to par with expectations
5. Results from your model on a benchmark dataset (e.g., MNIST): you can make sure your model performs well in a simpler setting
6. Results from a similar model on a similar dataset: you can get a general sense of what kind of performance can be expected
7. Super simple baselines (e.g., average of outputs or linear regression): you can make sure your model is learning anything at all
Summary: how to implement & debug
a. Get your model to run: step through in a debugger & watch out for shape, casting, and OOM errors
b. Overfit a single batch: look for corrupted data, over-regularization, broadcasting errors
c. Compare to a known result: keep iterating until the model performs up to expectations
Strategy for DL troubleshooting, step 3: evaluate.
Bias-variance decomposition
[Chart: breakdown of test error by source. Irreducible error 25; + avoidable bias 2 (i.e., underfitting) = train error 27; + variance 5 (i.e., overfitting) = val error 32; + val set overfitting 2 = test error 34.]
Test error = irreducible error + bias + variance + val overfitting
This assumes train, val, and test all come from the same distribution. What if not?
Handling distribution shift: use two val sets, one sampled from the training distribution and one from the test distribution.
The bias-variance tradeoff
Bias-variance with distribution shift
[Chart: breakdown of test error by source. Irreducible error 25; + avoidable bias 2 (i.e., underfitting) = train error 27; + variance 2 = train-val error 29; + distribution shift 3 = test-val error 32; + val overfitting 2 = test error 34.]
These differences are worked through in code below.
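The same decomposition in code, using the numbers from the example chart:

# Decompose test error into its sources (example numbers from the chart above).
irreducible = 25
train_err, train_val_err, test_val_err, test_err = 27, 29, 32, 34

avoidable_bias     = train_err - irreducible         # 2 (underfitting)
variance           = train_val_err - train_err       # 2 (overfitting)
distribution_shift = test_val_err - train_val_err    # 3
val_overfitting    = test_err - test_val_err         # 2
assert irreducible + avoidable_bias + variance + distribution_shift + val_overfitting == test_err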
Train, val, and test error for pedestrian detection (goal: 99% classification accuracy, i.e., 1% error)

Error source | Value
Goal performance | 1%
Train error | 20%
Validation error | 27%
Test error | 28%

• Train - goal = 19% (under-fitting)
• Val - train = 7% (over-fitting)
• Test - val = 1% (looks good!)
Summary: evaluating model performance
Test error = irreducible error + bias + variance + distribution shift + val overfitting
Strategy for DL troubleshooting, step 4: improve model/data.
Prioritizing improvements (i.e., applied bias-variance)
Steps:
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)
Addressing under-fitting (i.e., reducing bias). Try first (A) through try later (F):
A. Make your model bigger (i.e., add layers or use more units per layer)
B. Reduce regularization
C. Error analysis
D. Choose a different (closer to state-of-the-art) model architecture (e.g., move from LeNet to ResNet)
E. Tune hyperparameters (e.g., learning rate)
F. Add features
Train, val, and test error for pedestrian detection (goal: 99% classification accuracy, i.e., 1% error)

Error source | Baseline | Add more layers to the ConvNet | Switch to ResNet-101 | Tune learning rate
Goal performance | 1% | 1% | 1% | 1%
Train error | 20% | 7% | 3% | 0.8%
Validation error | 27% | 19% | 10% | 12%
Test error | 28% | 20% | 10% | 12%
Addressing over-fitting (i.e., reducing variance). Try first (A) through try later; H-J are not recommended:
A. Add more training data (if possible!)
B. Add normalization (e.g., batch norm, layer norm)
C. Add data augmentation
D. Increase regularization (e.g., dropout, L2, weight decay)
E. Error analysis
F. Choose a different (closer to state-of-the-art) model architecture
G. Tune hyperparameters
H. Early stopping (not recommended!)
I. Remove features (not recommended!)
J. Reduce model size (not recommended!)
Train, val, and test error for pedestrian detection (goal: 99% classification accuracy, i.e., 1% error)

Error source | Baseline | Increase dataset size to 250,000 | Add weight decay | Add data augmentation | Tune num layers, optimizer params, weight initialization, kernel size, weight decay
Goal performance | 1% | 1% | 1% | 1% | 1%
Train error | 0.8% | 1.5% | 1.7% | 2% | 0.6%
Validation error | 12% | 5% | 4% | 2.5% | 0.9%
Test error | 12% | 6% | 4% | 2.6% | 1.0%
Addressing distribution shift. Try first (A) through try later (C):
A. Analyze test-val set errors & collect more training data to compensate
B. Analyze test-val set errors & synthesize more training data to compensate
C. Apply domain adaptation techniques to training & test distributions
Error analysis: compare test-val set errors against train-val set errors (images where no pedestrian was detected).
• Error type 1: hard-to-see pedestrians
• Error type 2: reflections
• Error type 3 (test-val only): night scenes
Error analysis

Error type | Error % (train-val) | Error % (test-val) | Potential solutions | Priority
1. Hard-to-see pedestrians | 0.1% | 0.1% | Better sensors | Low
2. Reflections | 0.3% | 0.3% | Collect more data with reflections; add synthetic reflections to train set; try to remove with pre-processing; better sensors | Medium
3. Nighttime scenes | 0.1% | 1% | Collect more data at night; synthetically darken training images; simulate night-time data; use domain adaptation | High
Domain adaptation
What is it? Techniques to train on a "source" distribution and generalize to another "target" distribution using only unlabeled data or limited labeled data.
When should you consider using it?
• Access to labeled data from the test distribution is limited
• Access to relatively similar data is plentiful
Types of domain adaptation

Type | Use case | Example techniques
Supervised | You have limited data from the target domain | Fine-tuning a pre-trained model; adding target data to the train set
Unsupervised | You have lots of unlabeled data from the target domain | Correlation Alignment (CORAL); domain confusion; CycleGAN

(A fine-tuning sketch follows.)
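A minimal sketch of the supervised case: fine-tune an ImageNet-pretrained backbone on a small labeled target-domain set. torchvision's resnet18 is just an example; the weights argument is for recent torchvision versions (older ones use pretrained=True).

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # start from ImageNet weights
for p in model.parameters():
    p.requires_grad = False                        # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)      # fresh head for the target labels
# ...then train only model.fc on the limited labeled target-domain data.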
Rebalancing datasets
• If the test-val set looks significantly better than the test set, you have overfit to the val set
• This happens with small val sets or lots of hyperparameter tuning
• When it does, recollect val data
Strategy for DL troubleshooting, step 5: tune hyperparameters.
Hyperparameter optimization: which model & optimizer choices for the running example?
• Network: ResNet. How many layers? Weight initialization? Kernel size? Etc.
• Optimizer: Adam. Batch size? Learning rate? beta1, beta2, epsilon?
• Regularization: ….
Which hyperparameters to tune?
• Models are more sensitive to some hyperparameters than others
• Sensitivity depends on the choice of model
• The table below gives rules of thumb (only)
• Sensitivity is relative to default values! (E.g., if you are using all-zeros weight initialization or vanilla SGD, changing to the defaults will make a big difference.)

Hyperparameter | Approximate sensitivity
Learning rate | High
Learning rate schedule | High
Optimizer choice | Low
Other optimizer params (e.g., Adam beta1) | Low
Batch size | Low
Weight initialization | Medium
Loss function | High
Model depth | Medium
Layer size | High
Layer params (e.g., kernel size) | Medium
Weight of regularization | Medium
Nonlinearity | Low
Method 1: manual hyperparameter optimization
How it works: understand the algorithm (e.g., a higher learning rate means faster but less stable training); train & evaluate the model; guess a better hyperparameter value & re-evaluate. Can be combined with other methods (e.g., manually select parameter ranges to optimize over).
Advantages: for a skilled practitioner, may require the least computation to get a good result.
Disadvantages: requires a detailed understanding of the algorithm; time-consuming.
Method 2: grid search
How it works: evaluate every combination of evenly spaced values for each hyperparameter (e.g., hyperparameter 1 = batch size, hyperparameter 2 = learning rate).
Advantages: super simple to implement; can produce good results.
Disadvantages: not very efficient, since you need to train on all cross-combinations of hyperparameters; may require prior knowledge about parameters to get good results.
Method 3: random search
How it works: sample each hyperparameter value at random from a chosen range (e.g., batch size, learning rate), rather than from a fixed grid. A sketch follows.
Advantages: easy to implement; often produces better results than grid search.
Disadvantages: not very interpretable; may require prior knowledge about parameters to get good results.
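A minimal random-search sketch; train_and_eval is a hypothetical stand-in for a full training run that returns validation accuracy.

import random

best_params, best_acc = None, 0.0
for _ in range(20):
    params = {
        "lr": 10 ** random.uniform(-5, -2),             # sample learning rate on a log scale
        "batch_size": random.choice([32, 64, 128, 256]),
    }
    acc = train_and_eval(**params)                      # hypothetical helper
    if acc > best_acc:
        best_params, best_acc = params, acc
print(best_params, best_acc)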
Method 4: coarse-to-fine
How it works: run a coarse random search over a wide range; identify the best performers; narrow the range around them and search again; repeat, etc. A sketch continuing the random-search example follows.
Advantages: can narrow in on very high-performing hyperparameters; the most used method in practice.
Disadvantages: somewhat manual process.
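One way to run the fine stage, continuing the sketch above: narrow the learning-rate range to roughly half a decade around the best coarse result (again assuming the hypothetical train_and_eval).

import random

center = best_params["lr"]                                # best LR from the coarse stage
for _ in range(20):
    lr = center * 10 ** random.uniform(-0.5, 0.5)         # zoom in around the coarse winner
    acc = train_and_eval(lr=lr, batch_size=best_params["batch_size"])
    if acc > best_acc:
        best_params, best_acc = {"lr": lr, "batch_size": best_params["batch_size"]}, acc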
Method 5: Bayesian hyperparameter optimization
How it works (at a high level):
• Start with a prior estimate of parameter distributions
• Maintain a probabilistic model of the relationship between hyperparameter values and model performance
• Alternate between training with the hyperparameter values that maximize the expected improvement, and using the training results to update the probabilistic model
• To learn more, see: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f
Advantages: generally the most efficient hands-off way to choose hyperparameters.
Disadvantages: difficult to implement from scratch; can be hard to integrate with off-the-shelf tools. (A sketch using scikit-optimize follows.)
More on tools to do this automatically in the infrastructure & tooling lecture!
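A sketch using scikit-optimize's Gaussian-process optimizer (one off-the-shelf option, not the lecture's specific tooling). train_and_eval is the same hypothetical helper as above, and gp_minimize minimizes, so we negate the accuracy.

from skopt import gp_minimize
from skopt.space import Real, Integer

space = [Real(1e-5, 1e-2, prior="log-uniform", name="lr"),
         Integer(32, 256, name="batch_size")]

def objective(params):
    lr, batch_size = params
    return -train_and_eval(lr=lr, batch_size=batch_size)   # negate: lower is better

result = gp_minimize(objective, space, n_calls=30)
print(result.x, -result.fun)                                # best hyperparameters and their accuracy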
Summary of how to optimize hyperparameters
• Use coarse-to-fine random searches
• Consider Bayesian hyperparameter optimization solutions as your codebase matures
Conclusion
• DL debugging is hard due to many competing sources of error
• To train bug-free DL models, treat building your model as an iterative process
• The following steps can make the process easier and catch errors as early as possible
How to build bug-free DL models
• Start simple: choose the simplest model & data possible (e.g., LeNet on a subset of your data)
• Implement & debug: once the model runs, overfit a single batch & reproduce a known result
• Evaluate: apply the bias-variance decomposition to decide what to do next
• Tune hyperparameters: use coarse-to-fine random searches
• Improve model/data: make your model bigger if you underfit; add data or regularize if you overfit
Where to go to learn more
• Andrew Ng's book Machine Learning Yearning (http://www.mlyearning.org/)
• This Twitter thread: https://twitter.com/karpathy/status/1013244313327681536
• This blog post: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/

Thank you!

What Metrics Matter? What Metrics Matter?
What Metrics Matter?
CS, NcState
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesTuri, Inc.
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Vincenzo Lomonaco
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
Mohit Garg
 
IntroML_3_DataVisualisation
IntroML_3_DataVisualisationIntroML_3_DataVisualisation
IntroML_3_DataVisualisation
Elio Laureano
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Databricks
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual Learning
MLAI2
 
[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...
[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...
[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...
YONG ZHENG
 
Mis 589 Success Begins / snaptutorial.com
Mis 589  Success Begins / snaptutorial.comMis 589  Success Begins / snaptutorial.com
Mis 589 Success Begins / snaptutorial.com
WilliamsTaylor44
 

Similar to Lecture 7: Troubleshooting Deep Neural Networks (Full Stack Deep Learning - Spring 2021) (20)

IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
EE337 Course Introduction
EE337 Course IntroductionEE337 Course Introduction
EE337 Course Introduction
 
EE337 Course Introduction
EE337 Course IntroductionEE337 Course Introduction
EE337 Course Introduction
 
What to do as simulation expert
What to do as simulation expertWhat to do as simulation expert
What to do as simulation expert
 
IWMW 2002: Interoperability and Learning Standards briefing: Does Interoperab...
IWMW 2002: Interoperability and Learning Standards briefing: Does Interoperab...IWMW 2002: Interoperability and Learning Standards briefing: Does Interoperab...
IWMW 2002: Interoperability and Learning Standards briefing: Does Interoperab...
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter?
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural Networks
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
IntroML_3_DataVisualisation
IntroML_3_DataVisualisationIntroML_3_DataVisualisation
IntroML_3_DataVisualisation
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual Learning
 
[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...
[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...
[SAC 2015] Improve General Contextual SLIM Recommendation Algorithms By Facto...
 
Mis 589 Success Begins / snaptutorial.com
Mis 589  Success Begins / snaptutorial.comMis 589  Success Begins / snaptutorial.com
Mis 589 Success Begins / snaptutorial.com
 

More from Sergey Karayev

Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Sergey Karayev
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
Sergey Karayev
 
Testing and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningTesting and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep Learning
Sergey Karayev
 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep Learning
Sergey Karayev
 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep Learning
Sergey Karayev
 
AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019
Sergey Karayev
 

More from Sergey Karayev (6)

Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
 
Testing and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningTesting and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep Learning
 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep Learning
 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep Learning
 
AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019
 

Recently uploaded

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Lecture 7: Troubleshooting Deep Neural Networks (Full Stack Deep Learning - Spring 2021)

  • 1. Full Stack Deep Learning - UC Berkeley Spring 2021 - Sergey Karayev, Josh Tobin, Pieter Abbeel Troubleshooting Deep Neural Networks
  • 2. Full Stack Deep Learning - UC Berkeley Spring 2021 Lifecycle of a ML project 2 Planning & 
 project setup Data collection & labeling Training & debugging Deploying & 
 testing Team & hiring Per-project activities Infra & tooling Cross-project infrastructure Troubleshooting - overview
  • 3. Full Stack Deep Learning - UC Berkeley Spring 2021 Why talk about DL troubleshooting? 3 XKCD, https://xkcd.com/1838/ Troubleshooting - overview
  • 4. Full Stack Deep Learning - UC Berkeley Spring 2021 Why talk about DL troubleshooting? 4 Troubleshooting - overview
  • 5. Full Stack Deep Learning - UC Berkeley Spring 2021 Why talk about DL troubleshooting? 5 Troubleshooting - overview Common sentiment among practitioners: 80-90% of time debugging and tuning 10-20% deriving math or implementing things
  • 6. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is DL troubleshooting so hard? 6 Troubleshooting - overview
  • 7. Full Stack Deep Learning - UC Berkeley Spring 2021 Suppose you can’t reproduce a result 7 He, Kaiming, et al. "Deep residual learning for image recognition." 
 Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. Your learning curve Troubleshooting - overview
  • 8. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 8 Poor model performance Troubleshooting - overview
  • 9. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 9 Poor model performance Implementation bugs Troubleshooting - overview
  • 10. Full Stack Deep Learning - UC Berkeley Spring 2021 Most DL bugs are invisible 10 Troubleshooting - overview
  • 11. Full Stack Deep Learning - UC Berkeley Spring 2021 Most DL bugs are invisible 11 Troubleshooting - overview Labels out of order!
  • 12. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 12 Poor model performance Implementation bugs Troubleshooting - overview
  • 13. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 13 Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  • 14. Full Stack Deep Learning - UC Berkeley Spring 2021 Models are sensitive to hyperparameters 14 Andrej Karpathy, CS231n course notes Troubleshooting - overview
  • 15. Full Stack Deep Learning - UC Berkeley Spring 2021 Andrej Karpathy, CS231n course notes He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015. Models are sensitive to hyperparameters 15 Troubleshooting - overview
  • 16. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 16 Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  • 17. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 17 Data/model fit Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  • 18. Full Stack Deep Learning - UC Berkeley Spring 2021 Data / model fit 18 Yours: self-driving car images Troubleshooting - overview Data from the paper: ImageNet
  • 19. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 19 Data/model fit Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  • 20. Full Stack Deep Learning - UC Berkeley Spring 2021 Why is your performance worse? 20 Dataset construction Data/model fit Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  • 21. Full Stack Deep Learning - UC Berkeley Spring 2021 Constructing good datasets is hard 21 (Chart: “Amount of lost sleep over...”, PhD vs. Tesla; slide from Andrej Karpathy’s talk “Building the Software 2.0 Stack” at TrainAI 2018, 5/10/2018.) Troubleshooting - overview
  • 22. Full Stack Deep Learning - UC Berkeley Spring 2021 Common dataset construction issues • Not enough data • Class imbalances • Noisy labels • Train / test from different distributions • etc 22 Troubleshooting - overview
  • 23. Full Stack Deep Learning - UC Berkeley Spring 2021 Takeaways: why is troubleshooting hard? • Hard to tell if you have a bug • Lots of possible sources for the same degradation in performance • Results can be sensitive to small changes in hyperparameters and dataset makeup 23 Troubleshooting - overview
  • 24. Full Stack Deep Learning - UC Berkeley Spring 2021 Strategy for DL troubleshooting 24 Troubleshooting - overview
  • 25. Full Stack Deep Learning - UC Berkeley Spring 2021 Key mindset for DL troubleshooting 25 Pessimism Troubleshooting - overview
  • 26. Full Stack Deep Learning - UC Berkeley Spring 2021 Key idea of DL troubleshooting 26 …Start simple and gradually ramp up complexity Since it’s hard to disambiguate errors… Troubleshooting - overview
  • 27. Full Stack Deep Learning - UC Berkeley Spring 2021 Strategy for DL troubleshooting 27 Tune hyper-parameters Implement & debug Start simple Evaluate Improve model/data Meets requirements Troubleshooting - overview
  • 28. Full Stack Deep Learning - UC Berkeley Spring 2021 Quick summary 28 Start 
 simple • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) Troubleshooting - overview
  • 29. Full Stack Deep Learning - UC Berkeley Spring 2021 Quick summary 29 Implement & debug Start 
 simple • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result Troubleshooting - overview
  • 30. Full Stack Deep Learning - UC Berkeley Spring 2021 Quick summary 30 Implement & debug Start 
 simple Evaluate • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next Troubleshooting - overview
  • 31. Full Stack Deep Learning - UC Berkeley Spring 2021 Quick summary 31 Tune hyperparams Implement & debug Start 
 simple Evaluate • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next • Use coarse-to-fine random searches Troubleshooting - overview
  • 32. Full Stack Deep Learning - UC Berkeley Spring 2021 Tune hyperparams Quick summary 32 Implement & debug Start 
 simple Evaluate Improve model/data • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next • Use coarse-to-fine random searches • Make your model bigger if you underfit; add data or regularize if you overfit Troubleshooting - overview
  • 33. Full Stack Deep Learning - UC Berkeley Spring 2021 We’ll assume you already have… • Initial test set • A single metric to improve • Target performance based on human-level performance, published results, previous baselines, etc 33 Troubleshooting - overview
  • 34. Full Stack Deep Learning - UC Berkeley Spring 2021 We’ll assume you already have… 34 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - overview • Initial test set • A single metric to improve • Target performance based on human-level performance, published results, previous baselines, etc
  • 35. Full Stack Deep Learning - UC Berkeley Spring 2021 Questions? 35 Troubleshooting - overview
  • 36. Full Stack Deep Learning - UC Berkeley Spring 2021 Tune hyper-parameters Implement & debug Start simple Evaluate Improve model/data Meets requirements Strategy for DL troubleshooting 36 Troubleshooting - start simple
  • 37. Full Stack Deep Learning - UC Berkeley Spring 2021 Starting simple 37 Normalize inputs Choose a simple architecture Simplify the problem Use sensible defaults Steps b a c d Troubleshooting - start simple
  • 38. Full Stack Deep Learning - UC Berkeley Spring 2021 Demystifying architecture selection 38 (Table: input type / start here / consider using this later.) Images: LeNet-like architecture, later ResNet. Sequences: LSTM with one hidden layer (or temporal convs), later attention model or WaveNet-like model. Other: fully connected neural net with one hidden layer, later problem-dependent. Troubleshooting - start simple
  • 39. Full Stack Deep Learning - UC Berkeley Spring 2021 Dealing with multiple input modalities 39 “This” “is” “a” “cat” Input 2 Input 3 Input 1 Troubleshooting - start simple
  • 40. Full Stack Deep Learning - UC Berkeley Spring 2021 Dealing with multiple input modalities 40 “This” “is” “a” “cat” Input 2 Input 3 Input 1 Troubleshooting - start simple 1. Map each into a lower dimensional feature space
  • 41. Full Stack Deep Learning - UC Berkeley Spring 2021 Dealing with multiple input modalities 41 ConvNet Flatten “This” “is” “a” “cat” LSTM Input 2 Input 3 Input 1 (64-dim) (72-dim) (48-dim) Troubleshooting - start simple 1. Map each into a lower dimensional feature space
  • 42. Full Stack Deep Learning - UC Berkeley Spring 2021 Dealing with multiple input modalities 42 ConvNet Flatten Con“cat” “This” “is” “a” “cat” LSTM Input 2 Input 3 Input 1 (64-dim) (72-dim) (48-dim) (184-dim) Troubleshooting - start simple 2. Concatenate
  • 43. Full Stack Deep Learning - UC Berkeley Spring 2021 Dealing with multiple input modalities 43 ConvNet Flatten Concat “This” “is” “a” “cat” LSTM FC FC Output T/F Input 2 Input 3 Input 1 (64-dim) (72-dim) (48-dim) (184-dim) (256-dim) (128-dim) Troubleshooting - start simple 3. Pass through fully connected layers to output
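
A minimal sketch of the three steps above in PyTorch (an assumption on my part: the module name MultiModalNet and the layer sizes are illustrative stand-ins matching the slide's 64/72/48-dim encoders and 184-dim concat, not the lecture's actual code):

    import torch
    import torch.nn as nn

    class MultiModalNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Step 1: map each modality into a low-dimensional feature space
            self.conv = nn.Sequential(                       # Input 1: image -> 64-dim
                nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 64))
            self.lstm = nn.LSTM(input_size=32, hidden_size=72,
                                batch_first=True)            # Input 2: text -> 72-dim
            self.flat = nn.Sequential(nn.Flatten(),
                                      nn.Linear(10, 48))     # Input 3: features -> 48-dim
            # Step 3: fully connected layers down to the T/F output
            self.head = nn.Sequential(
                nn.Linear(184, 256), nn.ReLU(),
                nn.Linear(256, 128), nn.ReLU(),
                nn.Linear(128, 1))

        def forward(self, image, text, feats):
            h_img = self.conv(image)
            _, (h_txt, _) = self.lstm(text)                  # last hidden state, (1, B, 72)
            h_txt = h_txt.squeeze(0)
            h_feat = self.flat(feats)
            fused = torch.cat([h_img, h_txt, h_feat], dim=1) # Step 2: concat -> (B, 184)
            return self.head(fused)                          # logit for T/F

    net = MultiModalNet()
    out = net(torch.randn(2, 3, 32, 32), torch.randn(2, 5, 32), torch.randn(2, 10))
    print(out.shape)  # torch.Size([2, 1])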
  • 44. Full Stack Deep Learning - UC Berkeley Spring 2021 Starting simple 44 Normalize inputs Choose a simple architecture Simplify the problem Use sensible defaults Steps b a c d Troubleshooting - start simple
  • 45. Full Stack Deep Learning - UC Berkeley Spring 2021 Recommended network / optimizer defaults • Optimizer: Adam with learning rate 3e-4 • Activations: ReLU (FC and conv models), tanh (LSTMs) • Initialization: He et al. normal (ReLU), Glorot normal (tanh) • Regularization: None • Data normalization: None 45 Troubleshooting - start simple
  • 46. Full Stack Deep Learning - UC Berkeley Spring 2021 Definitions of recommended initializers 46 (n is the number of inputs, m is the number of outputs) • He et al. normal (used for ReLU): N(0, sqrt(2/n)) • Glorot normal (used for tanh): N(0, sqrt(2/(n+m))) Troubleshooting - start simple
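
A minimal sketch of those two initializers in PyTorch (an assumed framework here; the slide states the formulas, not this code):

    import torch.nn as nn

    def init_weights(module):
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            # He et al. normal for ReLU layers: N(0, sqrt(2/n))
            nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.LSTM):
            for name, param in module.named_parameters():
                if 'weight' in name:
                    # Glorot (Xavier) normal for tanh layers: N(0, sqrt(2/(n+m)))
                    nn.init.xavier_normal_(param)

    net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
    net.apply(init_weights)  # .apply() walks every submodule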
  • 47. Full Stack Deep Learning - UC Berkeley Spring 2021 Normalize inputs Starting simple 47 Choose a simple architecture Simplify the problem Use sensible defaults b a c Steps d Troubleshooting - start simple
  • 48. Full Stack Deep Learning - UC Berkeley Spring 2021 Important to normalize scale of input data • Subtract mean and divide by standard deviation • For images, fine to scale values to [0, 1] or [-0.5, 0.5]
 (e.g., by dividing by 255)
 [Careful, make sure your library doesn’t do it for you!] 48 Troubleshooting - start simple
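
A minimal sketch of both options (assumes image batches arrive as uint8 NumPy arrays; the function name and values are illustrative):

    import numpy as np

    def normalize(images_uint8, mean=None, std=None):
        x = images_uint8.astype(np.float32) / 255.0   # simple option: scale to [0, 1]
        if mean is not None and std is not None:
            x = (x - mean) / std                      # full option: zero mean, unit std
        return x

    # Compute mean/std once on the training set, then reuse for val/test:
    # mean, std = train_images.mean(), train_images.std()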
  • 49. Full Stack Deep Learning - UC Berkeley Spring 2021 Normalize inputs Starting simple 49 Choose a simple architecture Simplify the problem Use sensible defaults b a c Steps d Troubleshooting - start simple
  • 50. Full Stack Deep Learning - UC Berkeley Spring 2021 Consider simplifying the problem as well • Start with a small training set (~10,000 examples) • Use a fixed number of objects, classes, image size, etc. • Create a simpler synthetic training set 50 Troubleshooting - start simple
  • 51. Full Stack Deep Learning - UC Berkeley Spring 2021 Simplest model for pedestrian detection • Start with a subset of 10,000 images for training, 1,000 for val, and 500 for test • Use a LeNet architecture with sigmoid cross-entropy loss • Adam optimizer with LR 3e-4 • No regularization 51 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - start simple
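
A minimal sketch of that baseline in PyTorch (assumptions: a LeNet-style stack with a single logit for the pedestrian label; the exact layer sizes are illustrative, not the lecture's code):

    import torch
    import torch.nn as nn

    lenet = nn.Sequential(
        nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.LazyLinear(120), nn.ReLU(),
        nn.Linear(120, 84), nn.ReLU(),
        nn.Linear(84, 1))                       # single logit: pedestrian or not

    loss_fn = nn.BCEWithLogitsLoss()            # sigmoid cross-entropy on logits
    optimizer = torch.optim.Adam(lenet.parameters(), lr=3e-4)  # Adam, LR 3e-4
    # No regularization (no dropout / weight decay) for this first version.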
  • 52. Full Stack Deep Learning - UC Berkeley Spring 2021 Normalize inputs Starting simple 52 Choose a simple architecture Simplify the problem Use sensible defaults b a c Steps d Summary • Start with a simpler version of your problem (e.g., smaller dataset) • Adam optimizer & no regularization • Subtract mean and divide by std, or just divide by 255 (ims) • LeNet, LSTM, or fully connected Troubleshooting - start simple
  • 53. Full Stack Deep Learning - UC Berkeley Spring 2021 Questions? 53 Troubleshooting - start simple
  • 54. Full Stack Deep Learning - UC Berkeley Spring 2021 Tune hyper-parameters Implement & debug Start simple Evaluate Improve model/data Meets requirements Strategy for DL troubleshooting 54 Troubleshooting - debug
  • 55. Full Stack Deep Learning - UC Berkeley Spring 2021 Implementing bug-free DL models 55 Get your model to run Compare to a known result Overfit a single batch b a c Steps Troubleshooting - debug
  • 56. Full Stack Deep Learning - UC Berkeley Spring 2021 Preview: the five most common DL bugs • Incorrect shapes for your tensors
 Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x+y).shape = (None, None) • Pre-processing inputs incorrectly
 E.g., Forgetting to normalize, or too much pre-processing • Incorrect input to your loss function
 E.g., softmaxed outputs to a loss that expects logits • Forgot to set up train mode for the net correctly
 E.g., toggling train/eval, controlling batch norm dependencies • Numerical instability - inf/NaN
 Often stems from using an exp, log, or div operation 56 Troubleshooting - debug
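
The broadcasting bug in the first bullet is easy to reproduce; here is a quick NumPy demonstration (the same silent behavior occurs in TensorFlow and PyTorch):

    import numpy as np

    x = np.zeros(4)         # shape (4,)
    y = np.zeros((4, 1))    # shape (4, 1)
    print((x + y).shape)    # (4, 4), not (4,) or (4, 1)!
    # A loss averaged over this (4, 4) tensor still "runs", so nothing crashes.
    # Defend with explicit checks, e.g.:
    assert x.shape == y.squeeze(-1).shape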
  • 57. Full Stack Deep Learning - UC Berkeley Spring 2021 General advice for implementing your model 57 Use off-the-shelf components, e.g., • Keras • tf.layers.dense(…) 
 instead of 
 tf.nn.relu(tf.matmul(W, x)) • tf.losses.cross_entropy(…) 
 instead of writing out the exp Lightweight implementation • Minimum possible new lines of code for v1 • Rule of thumb: <200 lines • (Tested infrastructure components are fine) Build complicated data pipelines later • Start with a dataset you can load into memory Troubleshooting - debug
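
The same advice rendered in PyTorch (an assumption on my part; the slide's own examples are TensorFlow 1.x):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    layer = nn.Linear(128, 10)                  # instead of relu(W @ x) by hand
    logits = layer(torch.randn(32, 128))
    labels = torch.randint(0, 10, (32,))
    loss = F.cross_entropy(logits, labels)      # instead of writing out the exp/log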
  • 58. Full Stack Deep Learning - UC Berkeley Spring 2021 Implementing bug-free DL models 58 Get your model to run Compare to a known result Overfit a single batch b a c Steps Troubleshooting - debug
  • 59. Full Stack Deep Learning - UC Berkeley Spring 2021 Get your model to run a Shape mismatch Casting issue OOM Other Common issues Recommended resolution Scale back memory intensive operations one-by-one Standard debugging toolkit (Stack Overflow + interactive debugger) Implementing bug-free DL models 59 Step through model creation and inference in a debugger Troubleshooting - debug
  • 60. Full Stack Deep Learning - UC Berkeley Spring 2021 Get your model to run a Shape mismatch Casting issue OOM Other Common issues Recommended resolution Scale back memory intensive operations one-by-one Standard debugging toolkit (Stack Overflow + interactive debugger) Implementing bug-free DL models 60 Step through model creation and inference in a debugger Troubleshooting - debug
  • 61. Full Stack Deep Learning - UC Berkeley Spring 2021 Debuggers for DL code 61 • PyTorch: easy, use ipdb • TensorFlow: trickier 
 
 Option 1: step through graph creation Troubleshooting - debug
  • 62. Full Stack Deep Learning - UC Berkeley Spring 2021 Debuggers for DL code 62 • PyTorch: easy, use ipdb • TensorFlow: trickier 
 
 Option 2: step into training loop Evaluate tensors using sess.run(…) Troubleshooting - debug
  • 63. Full Stack Deep Learning - UC Berkeley Spring 2021 Debuggers for DL code 63 • PyTorch: easy, use ipdb • TensorFlow: trickier 
 
 Option 3: use tfdb Stops execution at each sess.run(…) and lets you inspect python -m tensorflow.python.debug.examples.debug_mnist --debug Troubleshooting - debug
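
A sketch of the PyTorch route (assumes `pip install ipdb`; the toy model and tensors are illustrative):

    import torch
    import ipdb

    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    pred = model(x)
    ipdb.set_trace()   # pauses here: inspect pred.shape, pred.dtype, model.weight, ...
    loss = ((pred - y) ** 2).mean()
    loss.backward()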
  • 64. Full Stack Deep Learning - UC Berkeley Spring 2021 Get your model to run a Shape mismatch Casting issue OOM Other Common issues Recommended resolution Scale back memory intensive operations one-by-one Standard debugging toolkit (Stack Overflow + interactive debugger) Implementing bug-free DL models 64 Step through model creation and inference in a debugger Troubleshooting - debug
  • 65. Full Stack Deep Learning - UC Berkeley Spring 2021 Shape mismatch Undefined shapes Incorrect shapes Common issues Most common causes • Flipped dimensions when using tf.reshape(…) • Took sum, average, or softmax over wrong dimension • Forgot to flatten after conv layers • Forgot to get rid of extra “1” dimensions (e.g., if shape is (None, 1, 1, 4)) • Data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, and loaded it as a float32) Implementing bug-free DL models 65 • Confusing tensor.shape, tf.shape(tensor), tensor.get_shape() • Reshaping things to a shape of type Tensor (e.g., when loading data from a file) Troubleshooting - debug
  • 66. Full Stack Deep Learning - UC Berkeley Spring 2021 Casting issue Data not in float32 Common issues Most common causes Implementing bug-free DL models 66 • Forgot to cast images from uint8 to float32 • Generated data using numpy in float64, forgot to cast to float32 Troubleshooting - debug
  • 67. Full Stack Deep Learning - UC Berkeley Spring 2021 OOM Too big a tensor Too much data Common issues Most common causes Duplicating operations Other processes • Other processes running on your GPU • Memory leak due to creating multiple models in the same session • Repeatedly creating an operation (e.g., in a function that gets called over and over again) • Too large a batch size for your model (e.g., during evaluation) • Too large fully connected layers • Loading too large a dataset into memory, rather than using an input queue • Allocating too large a buffer for dataset creation Implementing bug-free DL models 67 Troubleshooting - debug
  • 68. Full Stack Deep Learning - UC Berkeley Spring 2021 Other common errors Other bugs Common issues Most common causes Implementing bug-free DL models 68 • Forgot to initialize variables • Forgot to turn off bias when using batch norm • “Fetch argument has invalid type” - usually you overwrote one of your ops with an output during training Troubleshooting - debug
  • 69. Full Stack Deep Learning - UC Berkeley Spring 2021 Get your model to run Compare to a known result Overfit a single batch b a c Steps Implementing bug-free DL models 69 Troubleshooting - debug
  • 70. Full Stack Deep Learning - UC Berkeley Spring 2021 Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes Implementing bug-free DL models 70 Troubleshooting - debug
  • 71. Full Stack Deep Learning - UC Berkeley Spring 2021 Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes Implementing bug-free DL models 71 Troubleshooting - debug • Flipped the sign of the loss function / gradient • Learning rate too high • Softmax taken over wrong dimension
  • 72. Full Stack Deep Learning - UC Berkeley Spring 2021 Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Numerical issue. Check all exp, log, and div operations • Learning rate too high Implementing bug-free DL models 72 Troubleshooting - debug
  • 73. Full Stack Deep Learning - UC Berkeley Spring 2021 Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly) • Learning rate too high Implementing bug-free DL models 73 Troubleshooting - debug
  • 74. Full Stack Deep Learning - UC Berkeley Spring 2021 Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Learning rate too low • Gradients not flowing through the whole model • Too much regularization • Incorrect input to loss function (e.g., softmax instead of logits, accidentally add ReLU on output) • Data or labels corrupted Implementing bug-free DL models 74 Troubleshooting - debug
  • 75. Full Stack Deep Learning - UC Berkeley Spring 2021 Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Numerical issue. Check all exp, log, and div operations • Learning rate too high • Data or labels corrupted (e.g., zeroed or incorrectly shuffled) • Learning rate too high • Learning rate too low • Gradients not flowing through the whole model • Too much regularization • Incorrect input to loss function (e.g., softmax instead of logits) • Data or labels corrupted Implementing bug-free DL models 75 Troubleshooting - debug • Flipped the sign of the loss function / gradient • Learning rate too high • Softmax taken over wrong dimension
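
A minimal sketch of the single-batch overfit test itself (assumed PyTorch; the tiny linear model and synthetic batch are illustrative stand-ins). The loss should drive toward ~0 within a few hundred steps; if it goes up, explodes, oscillates, or plateaus instead, consult the table above:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
    loss_fn = nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)

    xb = torch.randn(16, 3, 32, 32)              # one fixed batch (synthetic here)
    yb = torch.randint(0, 2, (16, 1)).float()
    for step in range(500):                      # train on this same batch only
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(step, loss.item())             # should approach ~0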
  • 76. Full Stack Deep Learning - UC Berkeley Spring 2021 Get your model to run Compare to a known result Overfit a single batch b a c Steps Implementing bug-free DL models 76 Troubleshooting - debug
  • 77. Full Stack Deep Learning - UC Berkeley Spring 2021 Hierarchy of known results 77 More useful Less useful You can: 
 • Walk through code line-by-line and ensure you have the same output • Ensure your performance is up to par with expectations Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours
  • 78. Full Stack Deep Learning - UC Berkeley Spring 2021 Hierarchy of known results 78 More useful Less useful You can: 
 • Walk through code line-by-line and ensure you have the same output Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST)
  • 79. Full Stack Deep Learning - UC Berkeley Spring 2021 • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression) Hierarchy of known results 79 More useful Less useful You can: 
 • Same as before, but with lower confidence Troubleshooting - debug
  • 80. Full Stack Deep Learning - UC Berkeley Spring 2021 Hierarchy of known results 80 More useful Less useful You can: 
 • Ensure your performance is up to par with expectations Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from a paper (with no code)
  • 81. Full Stack Deep Learning - UC Berkeley Spring 2021 • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) Hierarchy of known results 81 More useful Less useful You can: 
 • Make sure your model performs well in a simpler setting Troubleshooting - debug
  • 82. Full Stack Deep Learning - UC Berkeley Spring 2021 • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression) Hierarchy of known results 82 More useful Less useful You can: 
 • Get a general sense of what kind of performance can be expected Troubleshooting - debug
  • 83. Full Stack Deep Learning - UC Berkeley Spring 2021 • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression) Hierarchy of known results 83 More useful Less useful You can: 
 • Make sure your model is learning anything at all Troubleshooting - debug
  • 84. Full Stack Deep Learning - UC Berkeley Spring 2021 Hierarchy of known results 84 More useful Less useful Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression)
  • 85. Full Stack Deep Learning - UC Berkeley Spring 2021 Summary: how to implement & debug 85 Get your model to run Compare to a known result Overfit a single batch b a c Steps Summary • Look for corrupted data, over- regularization, broadcasting errors • Keep iterating until model performs up to expectations Troubleshooting - debug • Step through in debugger & watch out for shape, casting, and OOM errors
  • 86. Full Stack Deep Learning - UC Berkeley Spring 2021 Questions? 86 Troubleshooting - debug
  • 87. Full Stack Deep Learning - UC Berkeley Spring 2021 Strategy for DL troubleshooting 87 Tune hyper-parameters Implement & debug Start simple Evaluate Improve model/data Meets requirements Troubleshooting - evaluate
  • 88. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance decomposition 88 Troubleshooting - evaluate
  • 89. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance decomposition 89 Troubleshooting - evaluate
  • 90. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance decomposition 90 Troubleshooting - evaluate
  • 91. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance decomposition 91 (Chart: breakdown of test error by source. Irreducible error 25; + avoidable bias, i.e. underfitting, 2 = train error 27; + variance, i.e. overfitting, 5 = val error 32; + val set overfitting 2 = test error 34.) Troubleshooting - evaluate
  • 92. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance decomposition • Test error = irreducible error + bias + variance + val overfitting • This assumes train, val, and test all come from the same distribution. What if not? 92 Troubleshooting - evaluate
  • 93. Full Stack Deep Learning - UC Berkeley Spring 2021 Handling distribution shift 93 Test data Use two val sets: one sampled from training distribution and one from test distribution Train data Troubleshooting - evaluate
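
A sketch of that split (NumPy stand-ins; real data loading is up to you):

    import numpy as np

    def hold_out(xs, n_val):
        """Return (rest, last n_val examples) as a simple validation split."""
        return xs[:-n_val], xs[-n_val:]

    train_pool = np.arange(10_000)    # stand-in: training-distribution examples
    test_pool = np.arange(2_000)      # stand-in: test-distribution examples

    train, train_val = hold_out(train_pool, 1_000)  # val from train distribution
    test, test_val = hold_out(test_pool, 1_000)     # val from test distribution
    # Track error on both val sets: the train_val vs. test_val gap estimates
    # distribution shift, separately from ordinary overfitting.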
  • 94. Full Stack Deep Learning - UC Berkeley Spring 2021 The bias-variance tradeoff 94 Troubleshooting - evaluate
  • 95. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance with distribution shift 95 Troubleshooting - evaluate
  • 96. Full Stack Deep Learning - UC Berkeley Spring 2021 Bias-variance with distribution shift 96 (Chart: breakdown of test error by source. Irreducible error 25; + avoidable bias, i.e. underfitting, 2 = train error 27; + variance 2 = train val error 29; + distribution shift 3 = test val error 32; + val overfitting 2 = test error 34.) Troubleshooting - evaluate
  • 97. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 97 (Table: goal performance 1%, train error 20%, validation error 27%, test error 28%.) Train - goal = 19% (under-fitting) 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - evaluate
  • 98. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 98 (Table: goal performance 1%, train error 20%, validation error 27%, test error 28%.) Val - train = 7% (over-fitting) 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - evaluate
  • 99. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 99 (Table: goal performance 1%, train error 20%, validation error 27%, test error 28%.) Test - val = 1% (looks good!) 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - evaluate
  • 100. Full Stack Deep Learning - UC Berkeley Spring 2021 Summary: evaluating model performance 100 Test error = irreducible error + bias + variance + distribution shift + val overfitting Troubleshooting - evaluate
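The decomposition is just arithmetic on the observed error rates, so a small helper can attribute test error to each source. This sketch treats goal performance as a stand-in for irreducible error, mirroring the "train - goal" calculation in the running example.

```python
def error_breakdown(goal, train, train_val, test_val, test):
    """Attribute test error (%) to sources per the decomposition above.
    Uses goal performance as a proxy for irreducible error."""
    return {
        "irreducible error (~goal)":      goal,
        "avoidable bias (under-fitting)": train - goal,
        "variance (over-fitting)":        train_val - train,
        "distribution shift":             test_val - train_val,
        "val overfitting":                test - test_val,
    }

# The numbers from the slide-96 breakdown:
for source, pct in error_breakdown(25, 27, 29, 32, 34).items():
    print(f"{source}: {pct}%")
```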
  • 101. Full Stack Deep Learning - UC Berkeley Spring 2021 Questions? 101 Troubleshooting - evaluate
  • 102. Full Stack Deep Learning - UC Berkeley Spring 2021 Strategy for DL troubleshooting 102 Start simple → Implement & debug → Evaluate → Improve model/data → Tune hyperparameters → Meets requirements Troubleshooting - improve
  • 103. Full Stack Deep Learning - UC Berkeley Spring 2021 Prioritizing improvements (i.e., applied bias-variance) 103 Steps: (a) address under-fitting; (b) address over-fitting; (c) address distribution shift; (d) re-balance datasets (if applicable). Troubleshooting - improve
  • 104. Full Stack Deep Learning - UC Berkeley Spring 2021 Addressing under-fitting (i.e., reducing bias) 104 Listed from "try first" to "try later": A. Make your model bigger (i.e., add layers or use more units per layer) B. Reduce regularization C. Error analysis D. Choose a different (closer to state-of-the-art) model architecture (e.g., move from LeNet to ResNet) E. Tune hyper-parameters (e.g., learning rate) F. Add features. A sketch of (A) and (B) follows below. Troubleshooting - improve
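As a sketch of remedies (A) and (B), here is one way to grow a model and dial back regularization in PyTorch; the architecture and numbers are illustrative assumptions.

```python
import torch.nn as nn

def make_model(width=64, depth=2, p_dropout=0.5):
    """Illustrative MLP whose capacity and regularization are knobs."""
    layers, d_in = [nn.Flatten()], 28 * 28
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU(), nn.Dropout(p_dropout)]
        d_in = width
    layers.append(nn.Linear(d_in, 10))
    return nn.Sequential(*layers)

small = make_model()                                    # under-fitting?
bigger = make_model(width=256, depth=4, p_dropout=0.1)  # (A) bigger model,
                                                        # (B) less regularization
```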
  • 105. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 105 Goal: 99% classification accuracy (i.e., 1% error). Add more layers to the ConvNet: train error 20% → 7%; validation error 27% → 19%; test error 28% → 20%. Troubleshooting - improve
  • 106. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 106 Goal: 99% classification accuracy (i.e., 1% error). Switch to ResNet-101: train error 7% → 3%; validation error 19% → 10%; test error 20% → 10%. Troubleshooting - improve
  • 107. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 107 Goal: 99% classification accuracy (i.e., 1% error). Tune learning rate: train error 3% → 0.8%; validation error 10% → 12%; test error 10% → 12%. Troubleshooting - improve
  • 108. Full Stack Deep Learning - UC Berkeley Spring 2021 Prioritizing improvements (i.e., applied bias-variance) 108 Steps: (a) address under-fitting; (b) address over-fitting; (c) address distribution shift; (d) re-balance datasets (if applicable). Troubleshooting - improve
  • 110. Full Stack Deep Learning - UC Berkeley Spring 2021 Addressing over-fitting (i.e., reducing variance) 110 Listed from "try first" to "try later": A. Add more training data (if possible!) B. Add normalization (e.g., batch norm, layer norm) C. Add data augmentation D. Increase regularization (e.g., dropout, L2, weight decay) E. Error analysis F. Choose a different (closer to state-of-the-art) model architecture G. Tune hyperparameters H. Early stopping I. Remove features J. Reduce model size (H-J are flagged "Not recommended!" on the slide). A sketch of (C) and (D) follows below. Troubleshooting - improve
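A sketch of remedies (C) and (D): data augmentation via torchvision transforms plus weight decay and dropout. The specific transforms, model, and coefficients are illustrative assumptions.

```python
import torch
import torchvision.transforms as T

# (C) Data augmentation: randomized crops, flips, and color jitter,
#     applied to training images only.
train_tfms = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])

# (D) Regularization: dropout in the model, weight decay on the optimizer.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Dropout(0.3),
    torch.nn.Linear(3 * 224 * 224, 2),   # pedestrian / no pedestrian
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
```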
  • 111. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 111 Running example: classify 0 (no pedestrian) vs 1 (yes pedestrian); goal: 99% classification accuracy. Goal performance 1%; train error 0.8%; validation error 12%; test error 12%. Troubleshooting - improve
  • 112. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 112 Increase dataset size to 250,000: train error 0.8% → 1.5%; validation error 12% → 5%; test error 12% → 6%. Troubleshooting - improve
  • 113. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 113 Add weight decay: train error 1.5% → 1.7%; validation error 5% → 4%; test error 6% → 4%. Troubleshooting - improve
  • 114. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 114 Add data augmentation: train error 1.7% → 2%; validation error 4% → 2.5%; test error 4% → 2.6%. Troubleshooting - improve
  • 115. Full Stack Deep Learning - UC Berkeley Spring 2021 Train, val, and test error for pedestrian detection 115 Tune num layers, optimizer params, weight initialization, kernel size, weight decay: train error 2% → 0.6%; validation error 2.5% → 0.9%; test error 2.6% → 1.0%. Goal met. Troubleshooting - improve
  • 116. Full Stack Deep Learning - UC Berkeley Spring 2021 Prioritizing improvements (i.e., applied bias-variance) 116 Steps: (a) address under-fitting; (b) address over-fitting; (c) address distribution shift; (d) re-balance datasets (if applicable). Troubleshooting - improve
  • 117. Full Stack Deep Learning - UC Berkeley Spring 2021 Addressing distribution shift 117 Try first Try later Troubleshooting - improve A. Analyze test-val set errors & collect more training data to compensate B. Analyze test-val set errors & synthesize more training data to compensate C. Apply domain adaptation techniques to training & test distributions
  • 118. Full Stack Deep Learning - UC Berkeley Spring 2021 Error analysis 118 [Images: misclassified examples, comparing test-val set errors vs. train-val set errors, all cases where no pedestrian was detected] Troubleshooting - improve
  • 119. Full Stack Deep Learning - UC Berkeley Spring 2021 Error analysis 119 Error type 1: hard-to-see pedestrians (appears in both train-val and test-val errors) Troubleshooting - improve
  • 120. Full Stack Deep Learning - UC Berkeley Spring 2021 Error analysis 120 Error type 2: reflections (appears in both train-val and test-val errors) Troubleshooting - improve
  • 121. Full Stack Deep Learning - UC Berkeley Spring 2021 Error analysis 121 Error type 3 (test-val only): night scenes Troubleshooting - improve
  • 122. Full Stack Deep Learning - UC Berkeley Spring 2021 Error analysis 122 Error type 1, hard-to-see pedestrians: 0.1% (train-val), 0.1% (test-val); potential solutions: better sensors; priority: Low. Error type 2, reflections: 0.3% (train-val), 0.3% (test-val); potential solutions: collect more data with reflections, add synthetic reflections to the train set, try to remove with pre-processing, better sensors; priority: Medium. Error type 3, nighttime scenes: 0.1% (train-val), 1% (test-val); potential solutions: collect more data at night, synthetically darken training images, simulate night-time data, use domain adaptation; priority: High. Troubleshooting - improve
  • 123. Full Stack Deep Learning - UC Berkeley Spring 2021 Domain adaptation 123 What is it? Techniques to train on a "source" distribution and generalize to a "target" distribution using only unlabeled data or limited labeled data. When should you consider using it? When access to labeled data from the test distribution is limited, but access to relatively similar data is plentiful. Troubleshooting - improve
  • 124. Full Stack Deep Learning - UC Berkeley Spring 2021 Types of domain adaptation 124 Supervised: use case is limited labeled data from the target domain; example techniques: fine-tuning a pre-trained model, adding target data to the train set (see the fine-tuning sketch below). Unsupervised: use case is lots of unlabeled data from the target domain; example techniques: Correlation Alignment (CORAL), domain confusion, CycleGAN. Troubleshooting - improve
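The supervised case in its simplest form is fine-tuning, sketched below with torchvision (assumes a recent torchvision; the frozen backbone and two-class head are illustrative choices, not the lecture's prescription).

```python
import torch.nn as nn
import torchvision.models as models

# Fine-tune a model pre-trained on the source domain (ImageNet) using
# limited labeled target-domain data (e.g., self-driving images).
model = models.resnet18(weights="IMAGENET1K_V1")   # source-domain features
for p in model.parameters():
    p.requires_grad = False                        # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)      # new head: pedestrian or not
# Then train only model.fc on target-domain labels (optionally
# unfreezing deeper layers once the head has converged).
```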
  • 125. Full Stack Deep Learning - UC Berkeley Spring 2021 Prioritizing improvements (i.e., applied bias-variance) 125 Steps: (a) address under-fitting; (b) address over-fitting; (c) address distribution shift; (d) re-balance datasets (if applicable). Troubleshooting - improve
  • 126. Full Stack Deep Learning - UC Berkeley Spring 2021 Rebalancing datasets 126 • If test-val looks significantly better than test, you have overfit to the val set • This happens with small val sets or lots of hyperparameter tuning • When it does, recollect val data Troubleshooting - improve
  • 127. Full Stack Deep Learning - UC Berkeley Spring 2021 Questions? 127 Troubleshooting - improve
  • 128. Full Stack Deep Learning - UC Berkeley Spring 2021 Strategy for DL troubleshooting 128 Start simple → Implement & debug → Evaluate → Improve model/data → Tune hyperparameters → Meets requirements Troubleshooting - tune
  • 129. Full Stack Deep Learning - UC Berkeley Spring 2021 Hyperparameter optimization 129 Running example: classify 0 (no pedestrian) vs 1 (yes pedestrian); goal: 99% classification accuracy. Model & optimizer choices? Network (ResNet): how many layers? weight initialization? kernel size? etc. Optimizer (Adam): batch size? learning rate? beta1, beta2, epsilon? Regularization: …. Troubleshooting - tune
  • 130. Full Stack Deep Learning - UC Berkeley Spring 2021 Which hyper-parameters to tune? 130 Choosing hyper-parameters: • Performance is more sensitive to some hyper-parameters than others • Sensitivity depends on the choice of model • The values below are rules of thumb (only) • Sensitivity is relative to default values! (e.g., if you are using all-zeros weight initialization or vanilla SGD, changing to the defaults will make a big difference) Approximate sensitivity: learning rate: High; learning rate schedule: High; optimizer choice: Low; other optimizer params (e.g., Adam beta1): Low; batch size: Low; weight initialization: Medium; loss function: High; model depth: Medium; layer size: High; layer params (e.g., kernel size): Medium; weight of regularization: Medium; nonlinearity: Low. Troubleshooting - tune
  • 131. Full Stack Deep Learning - UC Berkeley Spring 2021 Method 1: manual hyperparam optimization 131 How it works: • Understand the algorithm (e.g., a higher learning rate means faster but less stable training) • Train & evaluate the model • Guess a better hyperparam value & re-evaluate • Can be combined with other methods (e.g., manually select parameter ranges to optimize over). Advantages: • For a skilled practitioner, may require the least computation to get a good result. Disadvantages: • Requires detailed understanding of the algorithm • Time-consuming. Troubleshooting - tune
  • 132. Full Stack Deep Learning - UC Berkeley Spring 2021 Method 2: grid search 132 How it works: evaluate every combination on a grid over the hyperparameters (e.g., batch size × learning rate). Advantages: • Super simple to implement • Can produce good results. Disadvantages: • Not very efficient: need to train on all cross-combinations of hyper-parameters • May require prior knowledge about parameters to get good results. Troubleshooting - tune
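A sketch of grid search over the two hyperparameters from the figure; `train_and_eval` is a hypothetical stand-in for a real training run.

```python
from itertools import product

def train_and_eval(batch_size, lr):
    """Hypothetical stand-in: a real version would train a model with
    these values and return its validation error."""
    return abs(lr - 3e-4) + batch_size * 1e-5  # fake score for illustration

batch_sizes = [32, 64, 128]
learning_rates = [1e-4, 3e-4, 1e-3]
results = {(bs, lr): train_and_eval(bs, lr)
           for bs, lr in product(batch_sizes, learning_rates)}  # all 9 combos
best = min(results, key=results.get)
print("best (batch_size, lr):", best)
```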
  • 133. Full Stack Deep Learning - UC Berkeley Spring 2021 Method 3: random search 133 How it works: sample hyperparameter combinations at random from chosen ranges (e.g., over batch size and learning rate) instead of from a grid. Advantages: • Easy to implement • Often produces better results than grid search. Disadvantages: • Not very interpretable • May require prior knowledge about parameters to get good results. Troubleshooting - tune
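The same search done randomly; sampling the learning rate log-uniformly is a common convention, assumed here rather than specified by the slide.

```python
import random

random.seed(0)

def sample_config():
    return {
        "lr": 10 ** random.uniform(-5, -2),          # log-uniform learning rate
        "batch_size": random.choice([32, 64, 128]),  # discrete choice
    }

# Same budget as the 3x3 grid, but 9 distinct learning rates instead of 3:
trials = [sample_config() for _ in range(9)]
```

With the same budget, random search tries more distinct values of each hyperparameter than a grid does, which helps when only a few hyperparameters actually matter.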
  • 134. Full Stack Deep Learning - UC Berkeley Spring 2021 Method 4: coarse-to-fine 134 How it works: run a coarse random search over wide ranges of the hyperparameters (e.g., batch size and learning rate), identify the best performers, narrow the ranges around them, and repeat the search at a finer scale inside that region (the slides animate this over several steps). Troubleshooting - tune
  • 138. Full Stack Deep Learning - UC Berkeley Spring 2021 Method 4: coarse-to-fine 138 Advantages: • Can narrow in on very high-performing hyperparameters • The most used method in practice. Disadvantages: • Somewhat manual process. Troubleshooting - tune
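A sketch of one coarse-to-fine iteration over a single hyperparameter; `run` is a hypothetical stand-in for a training run that returns validation error.

```python
import random

random.seed(0)

def run(lr):
    """Hypothetical stand-in for training + evaluation."""
    return (lr - 3e-4) ** 2

# Coarse stage: wide log-uniform sweep.
coarse = [10 ** random.uniform(-6, -1) for _ in range(20)]
best_performers = sorted(coarse, key=run)[:5]
lo, hi = min(best_performers), max(best_performers)

# Fine stage: re-sample inside the region spanned by the best performers.
fine = [random.uniform(lo, hi) for _ in range(20)]
best = min(fine, key=run)
print(f"best lr after one refinement: {best:.2e}")
```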
  • 140. Full Stack Deep Learning - UC Berkeley Spring 2021 Method 5: Bayesian hyperparam opt 140 How it works (at a high level): • Start with a prior estimate of parameter distributions • Maintain a probabilistic model of the relationship between hyper-parameter values and model performance • Alternate between training with the hyper-parameter values that maximize the expected improvement, and using the training results to update the probabilistic model. Advantages: • Generally the most efficient hands-off way to choose hyperparameters. Disadvantages: • Difficult to implement from scratch • Can be hard to integrate with off-the-shelf tools. To learn more, see https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f. More on tools to do this automatically in the infrastructure & tooling lecture! Troubleshooting - tune
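As a hedged illustration of that propose/train/update loop, here is what it looks like with Optuna, one off-the-shelf library (not the lecture's endorsed tool; its TPE sampler plays the role of the probabilistic model). The search space and the fake objective are assumptions.

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    # Hypothetical stand-in: a real objective would train a model with
    # these values and return its validation error.
    return (lr - 3e-4) ** 2 + weight_decay

study = optuna.create_study(direction="minimize")   # maintains the surrogate
study.optimize(objective, n_trials=50)              # propose -> train -> update
print(study.best_params)
```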
  • 141. Full Stack Deep Learning - UC Berkeley Spring 2021 Summary of how to optimize hyperparams • Coarse-to-fine random searches • Consider Bayesian hyper-parameter optimization solutions as your codebase matures 141 Troubleshooting - tune
  • 142. Full Stack Deep Learning - UC Berkeley Spring 2021 Questions? 142 Troubleshooting - tune
  • 143. Full Stack Deep Learning - UC Berkeley Spring 2021 Conclusion 143 • DL debugging is hard due to many competing sources of error • To train bug-free DL models, we treat building our model as an iterative process • The following steps can make the process easier and catch errors as early as possible Troubleshooting - conclusion
  • 144. Full Stack Deep Learning - UC Berkeley Spring 2021 How to build bug-free DL models 144 Start simple → Implement & debug → Evaluate → Improve model/data → Tune hyperparams • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once the model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next • Make your model bigger if you underfit; add data or regularize if you overfit • Use coarse-to-fine random searches Troubleshooting - conclusion
  • 145. Full Stack Deep Learning - UC Berkeley Spring 2021 Where to go to learn more 145 • Andrew Ng's book Machine Learning Yearning (http://www.mlyearning.org/) • This Twitter thread: https://twitter.com/karpathy/status/1013244313327681536 • This blog post: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/ Troubleshooting - conclusion
  • 146. Full Stack Deep Learning - UC Berkeley Spring 2021 Thank you! 146