Five Insights from GoogLeNet
You Could Use In Your Own Deep Learning Nets
Auro Tripathy
Year 1989 Kicked Off Convolutional Neural Nets
Ten-Digit Classifier using a Modest Neural Network with Three Hidden Layers
Backpropagation Applied to Handwritten Zip Code Recognition. LeCun et al.
http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
Layer              | Hidden Units                      | Connections                     | Params
Out - H3 (FC)      | 10 (visible)                      | 10 x (30 W + 1 B) = 310         | 10 x (30 W + 1 B) = 310
H3 - H2 (FC)       | 30                                | 30 x (192 W + 1 B) = 5790       | 30 x (192 W + 1 B) = 5790
H2 - H1 (Conv)     | 12 x 4 x 4 = 192                  | 192 x (5 x 5 x 8 + 1) = 38592   | 5 x 5 x 8 x 12 + 192 biases = 2592
H1 - Input (Conv)  | 12 x 8 x 8 = 768                  | 768 x (5 x 5 x 1 + 1) = 19968   | 5 x 5 x 1 x 12 + 768 biases = 1068
Totals             | 16 x 16 in + 990 hidden + 10 out  | 64660 connections               | 9760 params
Each of the units in H2 combines local information coming from 8 of the 12 different feature maps in H1.
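To make the table's arithmetic concrete, here is a small, hypothetical Python helper (not from the paper) that reproduces the connection and parameter counts of the two convolutional layers; the weight-sharing assumption follows the table's own formulas.

```python
# Sanity check of the convolutional rows in the table above (LeCun et al., 1989).
# Each unit sees a k x k patch over in_channels maps plus one bias; weights are
# shared within a feature map, so parameters are far fewer than connections.

def conv_layer_counts(num_maps, map_size, kernel, in_channels):
    units = num_maps * map_size * map_size
    # Every unit has kernel*kernel*in_channels incoming connections plus a bias connection.
    connections = units * (kernel * kernel * in_channels + 1)
    # Weight sharing: one kernel per (feature map, input channel) pair, plus one bias per unit.
    params = kernel * kernel * in_channels * num_maps + units
    return units, connections, params

print(conv_layer_counts(num_maps=12, map_size=8, kernel=5, in_channels=1))  # H1-Input: (768, 19968, 1068)
print(conv_layer_counts(num_maps=12, map_size=4, kernel=5, in_channels=8))  # H2-H1:   (192, 38592, 2592)
```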
Year 2012 Marked The Inflection Point
Reintroducing CNNs Led to a Big Drop in Error for Image Classification.
Since Then, Networks Have Continued to Reduce the Error
Top-5 error (%) by ILSVRC entry (the chart also tracked the growing number of layers):

ILSVRC'10               28.2
ILSVRC'11               25.8
ILSVRC'12 (AlexNet)     16.4
ILSVRC'13               11.7
ILSVRC'14                7.3
ILSVRC'14 (GoogLeNet)    6.7
ILSVRC'15 (ResNet)       3.57
The Trend has been to Increase the Number of Layers (and Layer Size)
•  The typical 'design pattern' for Convolutional Neural Nets (a sketch follows below):
–  Stacked convolutional layers,
•  a linear filter followed by a non-linear activation
–  Followed by contrast normalization and max pooling,
–  Penultimate layers (one or more) are fully connected layers.
–  The ultimate layer is a loss layer, possibly more than one, combined in a weighted mix
•  Use of dropout to address the problem of over-fitting due to many layers
•  In addition to classification, the architecture is good for localization and object detection
–  despite concerns that max-pooling dilutes spatial information
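A minimal sketch of that pattern, assuming PyTorch (my choice of framework, not the slides'); the layer widths, input size, and 10-class head are illustrative.

```python
# Illustrative stack: conv -> ReLU -> normalization -> max pool, then fully connected, then a loss.
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # stacked convolutional layer (linear filter)
    nn.ReLU(inplace=True),                        # ...followed by a non-linear activation
    nn.LocalResponseNorm(5),                      # contrast/response normalization
    nn.MaxPool2d(kernel_size=2, stride=2),        # max pooling
)
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 10),                  # penultimate fully connected layer
)
criterion = nn.CrossEntropyLoss()                 # ultimate layer: a (softmax) loss

x = torch.randn(1, 3, 32, 32)
logits = classifier(features(x))
loss = criterion(logits, torch.tensor([4]))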
The Challenge of Deep Networks
1.  Adding layers increases the number of parameters and makes the network prone to over-fitting
–  Exacerbated by a paucity of data
–  More data means more expense in annotation
2.  More computation
–  A linear increase in filters results in a quadratic increase in compute: doubling the filter count of two consecutive layers roughly quadruples the cost of the convolution between them (see the sketch below)
–  If weights are close to zero, we've wasted compute resources
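A small sketch of the quadratic-compute point, using the standard multiply-add count for a convolution (output channels x input channels x kernel area x output grid); the specific channel counts and grid size are illustrative, not from the slides.

```python
# Multiply-adds for one convolutional layer: out_ch * in_ch * k * k * H * W.
# Doubling the filter count of this layer AND the layer feeding it doubles both
# out_ch and in_ch, so the cost grows roughly 4x (quadratically).
def conv_macs(in_ch, out_ch, k, h, w):
    return out_ch * in_ch * k * k * h * w

base    = conv_macs(in_ch=64,  out_ch=64,  k=3, h=28, w=28)
doubled = conv_macs(in_ch=128, out_ch=128, k=3, h=28, w=28)
print(doubled / base)   # -> 4.0
```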
Year 2014, GoogLeNet Took Aim at Efficiency and Practicality
Resultant benefits of the new architecture:
•  12 times fewer parameters than AlexNet
– Significantly more accurate than AlexNet
– Lower memory use and lower power use, acutely important for mobile devices.
•  Stays within the targeted 1.5 billion multiply-add budget
– Computational cost "less than 2X compared to AlexNet"
http://www.youtube.com/watch?v=ySrj_G5gHWI&t=12m42s
Introducing the Inception Module
[Diagram: the Inception module. The previous layer's output feeds parallel 1x1, 3x3, and 5x5 convolution paths and a 3x3 max-pooling path, with 1x1 convolutions used for dimension reduction; the branch outputs are concatenated.]
Intuition behind the Inception Module
•  Cluster neurons according to the correlation statistics in the dataset
–  An optimal layered network topology can be constructed by analyzing the correlation statistics of the preceding layer's activations and clustering neurons with highly correlated outputs.
•  We already know that, in the lower layers, there exist high correlations in image patches that are local and near-local.
–  These can be covered by 1x1 convolutions
–  Additionally, a smaller number of spatially spread-out clusters can be covered by convolution over larger patches; i.e., 3x3 and 5x5
–  And there will be a decreasing number of patches over larger and larger regions.
•  This also suggests that the architecture is a combination of all the convolutions (the 1x1, 3x3, 5x5) as input to the next stage
•  Since max-pooling has been successful, it suggests adding a pooling layer in parallel
In images, correlation tends to be local; exploit it.
A heterogeneous set of convolutions covers the spread-out clusters
[Diagram: three filter sizes applied to the previous layer:
–  Cover very local clusters with 1x1 convolutions
–  Cover more spread-out clusters with 3x3 convolutions
–  Cover even more spread-out clusters with 5x5 convolutions]
Conceiving the Inception Module
[Diagram: the naive Inception module. The previous layer's output feeds parallel 1x1, 3x3, and 5x5 convolutions and a 3x3 max-pooling path; the outputs are concatenated.]
Inception Module Put Into Practice
Judicious Dimension Reduction
[Diagram: the Inception module with dimension reduction. 1x1 convolutions reduce depth before the 3x3 and 5x5 convolutions and after the 3x3 max pooling, alongside a plain 1x1 branch; all branch outputs are concatenated.]
Insights…	
[Diagram: the GoogLeNet architecture with its nine Inception modules, labeled 3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b.]
GoogLeNet Insight #1
(Summary of the Previous Slides)
Leads to the following architecture choices (see the sketch after this list):
•  Choosing filter sizes of 1x1, 3x3, 5x5
•  Applying all three filters on the same "patch" of the image (no need to choose)
•  Concatenating all filter outputs as a single output vector for the next stage.
•  Concatenating an additional pooling path, since pooling is essential to the success of CNNs.
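A minimal sketch of Insight #1, assuming PyTorch; the class name NaiveInception and the per-branch channel counts are illustrative, not taken from the paper.

```python
# Parallel 1x1, 3x3, 5x5 convolutions plus 3x3 max pooling on the same input,
# concatenated along the channel axis (the "naive" Inception idea).
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1x1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3x3 = nn.Conv2d(in_ch, 96, kernel_size=3, padding=1)
        self.branch5x5 = nn.Conv2d(in_ch, 32, kernel_size=5, padding=2)
        self.pool      = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # All filter sizes see the same patch; outputs become one concatenated vector.
        return torch.cat(
            [self.branch1x1(x), self.branch3x3(x), self.branch5x5(x), self.pool(x)],
            dim=1,
        )

y = NaiveInception(192)(torch.randn(1, 192, 28, 28))   # -> (1, 64+96+32+192, 28, 28)
```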
GoogLeNet Insight #2
Decrease Dimensions Wherever Computation Requirements Increase,
via a 1x1 Dimension Reduction Layer
•  Use inexpensive 1x1 convolutions to compute reductions before the expensive 3x3 and 5x5 convolutions (see the sketch below)
•  1x1 convolutions include a ReLU activation, making them dual-purpose.
[Diagram: a 1x1 convolution followed by a ReLU, applied to the previous layer's output.]
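A minimal sketch of Insight #2, assuming PyTorch; the 192 -> 16 -> 32 channel counts and the 28x28 grid are illustrative, chosen only to show how the 1x1 reduction makes the 5x5 path cheaper.

```python
# A cheap 1x1 convolution (with ReLU) shrinks the channel depth
# before the expensive 5x5 convolution.
import torch.nn as nn

reduce_then_5x5 = nn.Sequential(
    nn.Conv2d(192, 16, kernel_size=1),            # 1x1 reduction: 192 -> 16 channels
    nn.ReLU(inplace=True),                        # the "dual purpose": reduction + non-linearity
    nn.Conv2d(16, 32, kernel_size=5, padding=2),  # 5x5 now sees only 16 input channels
)

# Rough multiply-add comparison at a 28x28 grid:
direct  = 32 * 192 * 5 * 5 * 28 * 28                            # 5x5 straight on 192 channels
reduced = 16 * 192 * 1 * 1 * 28 * 28 + 32 * 16 * 5 * 5 * 28 * 28
print(direct, reduced)                            # the reduced path is several times cheaper
```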
GoogLeNet Insight #3
Stack Inception Modules Upon Each Other
•  Occasionally insert max-pooling layers with stride 2 to decimate (halve) the resolution of the grid.
•  Stacking Inception layers benefits the results when used at the higher layers (not strictly necessary)
–  Lower layers are kept as traditional convolutions (for memory-efficiency reasons)
•  This stacking allows for tweaking each module without an uncontrolled blowup in computational complexity at later stages (see the sketch below).
–  For example, a tweak could be to increase the width at any stage.
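A minimal sketch of Insight #3, assuming PyTorch. The helper inception_like is a hypothetical stand-in for a real Inception module (such as the one sketched under Insight #1); the stem and the channel widths are illustrative.

```python
# A traditional convolutional stem, then stacked inception-style modules,
# with an occasional stride-2 max pool halving the grid resolution.
import torch
import torch.nn as nn

def inception_like(in_ch, out_ch):
    # Stand-in for a real Inception module: any block mapping in_ch -> out_ch channels.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True))

net = nn.Sequential(
    # Lower layers kept as traditional convolutions (memory efficiency).
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2, padding=1),
    # Inception modules stacked at the higher layers.
    inception_like(64, 256),
    inception_like(256, 480),
    nn.MaxPool2d(3, stride=2, padding=1),   # stride-2 pooling halves the resolution
    inception_like(480, 512),
)
print(net(torch.randn(1, 3, 224, 224)).shape)   # -> (1, 512, 28, 28)
```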
GoogLeNet Components
Stacking Inception Modules
[Diagram: Input -> Traditional convolutions (Conv + MaxPool + Conv + MaxPool) -> Nine Inception modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b) with an interleaved MaxPool -> Average Pooling -> Linear -> SoftMax w/Loss -> Label]
GoogLeNet Insight #4
Counter-Balancing Back-Propagation Downsides in Deep Networks
•  A potential problem
–  Back-propagating through deep networks can result in "vanishing gradients" (possibly meaning dead ReLUs).
•  A solution
–  Intermediate layers do have discriminatory power
–  Auxiliary classifiers were appended to the intermediate layers
–  During training, the intermediate losses were added to the total loss with a discount factor of 0.3 (see the sketch below)
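A minimal sketch of Insight #4, assuming PyTorch; the function name and the three logits tensors are hypothetical stand-ins for the outputs of the main and auxiliary classifier heads.

```python
# During training, the auxiliary classifier losses are added to the main loss
# with a 0.3 weight; at inference time the auxiliary heads are simply discarded.
import torch
import torch.nn.functional as F

def googlenet_style_loss(main_logits, aux0_logits, aux1_logits, labels):
    main_loss = F.cross_entropy(main_logits, labels)
    aux_loss = F.cross_entropy(aux0_logits, labels) + F.cross_entropy(aux1_logits, labels)
    return main_loss + 0.3 * aux_loss   # discounted intermediate losses (training only)

labels = torch.tensor([3, 7])
logits = [torch.randn(2, 1000) for _ in range(3)]
print(googlenet_style_loss(*logits, labels))
```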
Two Additional Loss Layers
for Training to Depth
[Diagram: the main path (Input -> Traditional convolutions (Conv + MaxPool + Conv + MaxPool) -> Nine Inception modules with an interleaved MaxPool -> Average Pooling -> Linear -> SoftMax w/Loss 2 -> Label), plus two auxiliary branches tapped from intermediate Inception modules, each consisting of Average Pooling -> 1x1 Conv -> Fully Connected -> DropOut -> Linear -> SoftMax w/Loss (Loss 0 and Loss 1).]
GoogLeNet Insight #5
End with a Global Average Pooling Layer Instead of a Fully Connected Layer
•  Fully-connected layers are prone to over-fitting
–  Hampers generalization
•  Average pooling has no parameters to optimize, thus no over-fitting.
•  Averaging is more native to the convolutional structure
–  A natural correspondence between feature maps and categories leads to easier interpretation
•  Average pooling does not exclude the use of dropout, a proven regularization method to avoid over-fitting (see the sketch below).
[Diagram: Global Average Pooling -> Linear layer (for adapting to other label sets) -> SoftMax w/Loss -> Label, placed after the last Inception module.]
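A minimal sketch of Insight #5, assuming PyTorch; the 1024-channel input, the dropout rate, and the 1000-class linear layer are illustrative.

```python
# Global average pooling replaces large fully connected layers: the pooling itself
# has no parameters, and a single linear layer adapts the features to the label set.
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),     # global average pooling: one value per feature map
    nn.Flatten(),
    nn.Dropout(p=0.4),           # dropout remains usable alongside average pooling
    nn.Linear(1024, 1000),       # adapts the pooled features to the label set
)

feature_maps = torch.randn(8, 1024, 7, 7)   # e.g. the output of the last Inception module
logits = head(feature_maps)                 # -> (8, 1000); feed into a softmax-with-loss
```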
Summarizing The Insights
1.  Fully exploit the fact that, in images, correlation tends to be local
•  Concatenate 1x1, 3x3, 5x5 convolutions along with pooling
2.  Decrease dimensions wherever computation requirements increase, via a 1x1 dimension reduction layer
3.  Stack Inception modules upon each other
4.  Counter-balance back-propagation downsides in deep networks
•  Use intermediate losses in the final loss
5.  End with a global average pooling layer instead of a fully connected layer
References
•  Seminal
– Backpropagation Applied to Handwritten Zip Code Recognition. LeCun et al.
•  Deep Networks
– Going Deeper with Convolutions
– Network In Network
