Five Insights from GoogLeNet You Could Use In Your Own Deep Learning Nets

Published in:
Engineering


- 1. Five Insights from GoogLeNet You Could Use In Your Own Deep Learning Nets
  Auro Tripathy, www.shatterline.com
  [Figure: Inception modules 3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b]
- 2. Year 1989 Kicked-Off Convolution Neural Nets
  Ten-Digit Classifier using a Modest Neural Network with Three Hidden Layers
  Backpropagation Applied to Handwritten Zip Code Recognition. LeCun et al. http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf

  | Layer | Hidden Units | Connections | Params |
  | --- | --- | --- | --- |
  | Out – H3 (FC) | 10 (visible) | 10 x (30 W + 1 B) = 310 | 10 x (30 W + 1 B) = 310 |
  | H3 – H2 (FC) | 30 | 30 x (192 W + 1 B) = 5790 | 30 x (192 W + 1 B) = 5790 |
  | H2 – H1 (Conv) | 12 x 4 x 4 = 192 | 192 x (5 x 5 x 8 + 1) = 38592 | 5 x 5 x 8 x 12 + 192 biases = 2592 |
  | H1 – Input (Conv) | 12 x 8 x 8 = 768 | 768 x (5 x 5 x 1 + 1) = 19968 | 5 x 5 x 1 x 12 + 768 biases = 1068 |
  | Totals | 16 x 16 in + 990 hidden + 10 out | 64660 connections | 9760 params |

  Each of the units in H2 combines local information coming from 8 of the 12 different feature maps in H1.
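
The totals in the table above can be re-derived with a few lines of arithmetic. The following is a minimal sketch that only uses the layer sizes stated on the slide; the tuple layout and the helper name `connections` are illustrative choices, not anything from the original paper.

```python
# Layer sizes as stated on the slide (LeCun et al., 1989).
# Each entry: (name, units, fan_in_per_unit, free_weights, free_biases)
layers = [
    ("H1 - Input (Conv)", 12 * 8 * 8, 5 * 5 * 1, 5 * 5 * 1 * 12, 12 * 8 * 8),
    ("H2 - H1 (Conv)", 12 * 4 * 4, 5 * 5 * 8, 5 * 5 * 8 * 12, 12 * 4 * 4),
    ("H3 - H2 (FC)", 30, 192, 30 * 192, 30),
    ("Out - H3 (FC)", 10, 30, 10 * 30, 10),
]


def connections(units, fan_in):
    """Every unit has fan_in weighted inputs plus one bias connection."""
    return units * (fan_in + 1)


total_connections = sum(connections(u, f) for _, u, f, _, _ in layers)
total_params = sum(w + b for _, _, _, w, b in layers)
hidden_units = sum(u for name, u, *_ in layers if not name.startswith("Out"))

print(total_connections, total_params, hidden_units)
```

The gap between connections and parameters comes from weight sharing: in the two convolutional layers each 5 x 5 kernel is reused at every position, so many connections share one weight.
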
- 3. Year 2012 Marked The Inflection Point
  Reintroducing CNNs led to a big drop in error for image classification. Since then, networks have continued to reduce it.
  [Chart: top-5 error (%) and number of layers per ILSVRC entry]
  • ILSVRC'10: 28.2
  • ILSVRC'11: 25.8
  • ILSVRC'12 (AlexNet): 16.4
  • ILSVRC'13: 11.7
  • ILSVRC'14: 7.3
  • ILSVRC'14 (GoogLeNet): 6.7
  • ILSVRC'15 (ResNet): 3.57
- 4. The Trend Has Been to Increase the Number of Layers (& Layer Size)
  • The typical ‘design pattern’ for Convolutional Neural Nets:
    – Stacked convolutional layers,
      • linear filter followed by a non-linear activation
    – Followed by contrast normalization and max pooling,
    – Penultimate layers (one or more) are fully connected layers.
    – Ultimate layer is a loss layer, possibly more than one, in a weighted mix
  • Use of dropout to address the problem of over-fitting due to many layers
  • In addition to classification, the architecture is good for localization and object detection
    – despite concerns that max-pooling dilutes spatial information
- 5. The Challenge of Deep Networks
  1. Adding layers increases the number of parameters and makes the network prone to over-fitting
    – Exacerbated by paucity of data
    – More data means more expense in their annotation
  2. More computation
    – Linear increase in filters results in quadratic increase in compute
    – If weights are close to zero, we’ve wasted compute resources
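
The "linear increase in filters, quadratic increase in compute" point can be checked with a small cost model. This is a sketch under stated assumptions: two chained stride-1 convolutions with unchanged spatial size, and illustrative sizes (28 x 28 grid, 3 x 3 kernel, 64 filters) that do not come from the slides.

```python
def conv_multiply_adds(height, width, kernel, in_channels, out_channels):
    """Multiply-adds for one stride-1 convolution that keeps the grid size."""
    return height * width * kernel * kernel * in_channels * out_channels


# The second of two chained layers sees the first layer's filters as its
# input channels, so scaling both layers scales its cost on both factors.
base = conv_multiply_adds(28, 28, 3, in_channels=64, out_channels=64)
doubled = conv_multiply_adds(28, 28, 3, in_channels=128, out_channels=128)

print(doubled / base)
```

Doubling the filter count in both layers multiplies the second layer's cost by four, which is the quadratic growth the slide refers to.
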
- 6. Year 2014, GoogLeNet Took Aim at Efficiency and Practicality
  Resultant benefits of the new architecture:
  • 12 times fewer parameters than AlexNet
    – Significantly more accurate than AlexNet
    – Lower memory use and lower power use, acutely important for mobile devices.
  • Stays within the targeted 1.5 billion multiply-add budget
    – Computational cost “less than 2X compared to AlexNet”
  http://www.youtube.com/watch?v=ySrj_G5gHWI&t=12m42s
- 7. Introducing the Inception Module
  [Diagram: Previous Layer; 1x1, 3x3 and 5x5 convolutions; 1x1 convolutions; 3x3 Max Pooling; Concatenate]
- 8. Intuition behind the Inception Module
  • Cluster neurons according to the correlation statistics in the dataset
    – An optimal layered network topology can be constructed by analyzing the correlation statistics of the preceding layer activations and clustering neurons with highly correlated outputs.
  • We already know that, in the lower layers, there exist high correlations in image patches that are local and near-local.
    – These can be covered by 1x1 convolutions
    – Additionally, a smaller number of spatially spread-out clusters can be covered by convolution over larger patches, i.e., 3x3 and 5x5
    – And there will be a decreasing number of patches over larger and larger regions.
  • It also suggests that the architecture is a combination of all the convolutions, the 1x1, 3x3, 5x5, as input to the next stage
  • Since max-pooling has been successful, it suggests adding a pooling layer in parallel
- 9. In Images, correlation tends to be local, exploit it.
  Heterogeneous set of convolutions to cover spread-out clusters
  • Cover very local clusters w/1x1 convolutions
  • Cover more spread-out clusters w/3x3 convolutions
  • Cover even more spread-out clusters w/5x5 convolutions
  [Diagram: Previous Layer; 1x1, 3x3 and 5x5 convolutions]
- 10. Conceiving the Inception Module
  [Diagram: Previous Layer; 1x1, 3x3 and 5x5 convolutions; 3x3 Max Pooling; Concatenate]
- 11. Inception Module Put Into Practice: Judicious Dimension Reduction
  [Diagram: Previous Layer; 1x1, 3x3 and 5x5 convolutions; 1x1 convolutions; 3x3 Max Pooling; Concatenate]
- 12. Insights…
  [Figure: Inception modules 3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b]
- 13. GoogLeNet Insight #1 (Summary from previous Slides)
  Leads to the following architecture choices:
  • Choosing filter sizes of 1X1, 3X3, 5X5
  • Applying all three filters on the same “patch” of image (no need to choose)
  • Concatenating all filters as a single output vector for the next stage.
  • Concatenating an additional pooling path since pooling is essential to the success of CNNs.
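
The concatenation step can be sketched as shape bookkeeping: every branch keeps the spatial grid, and the output depth is the sum of the branch depths. The branch widths below are the inception (3a) values from Table 1 of "Going Deeper with Convolutions", not from these slides, and the function name `inception_output_shape` is a hypothetical helper.

```python
def inception_output_shape(height, width, branch_channels):
    """Shape after concatenating all branches along the channel axis.

    Assumes every branch uses stride 1 with padding that preserves the
    grid, so only the channel count changes.
    """
    return (height, width, sum(branch_channels.values()))


# Output channels per branch for inception (3a), which sees a 28x28x192 input.
branches_3a = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}

shape_3a = inception_output_shape(28, 28, branches_3a)
print(shape_3a)
```

Because the branches are concatenated rather than summed, the next stage receives all filter sizes side by side and can learn which ones to rely on.
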
- 14. GoogLeNet Insight #2: Decrease dimensions wherever computation requirements increase via a 1X1 Dimension Reduction Layer
  • Use inexpensive 1X1 convolutions to compute reductions before the expensive 3X3 and 5X5 convolutions
  • 1X1 convolutions include a ReLU activation, making them dual-purpose.
  [Diagram: Previous Layer; 1x1; ReLU]
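
A rough multiply-add count shows why the 1X1 reduction pays off. This sketch assumes the 5x5 branch of inception (3a) as listed in Table 1 of "Going Deeper with Convolutions" (28x28x192 input, reduction to 16 channels, 32 output channels); those numbers and the helper name are not from the slides.

```python
def conv_multiply_adds(height, width, kernel, in_channels, out_channels):
    """Multiply-adds for one stride-1 convolution that keeps the grid size."""
    return height * width * kernel * kernel * in_channels * out_channels


H, W, C_IN = 28, 28, 192      # input grid and depth (assumed, see lead-in)
C_REDUCED, C_OUT = 16, 32     # 1x1 reduction width and 5x5 output width

# 5x5 convolution applied directly to the full-depth input.
direct = conv_multiply_adds(H, W, 5, C_IN, C_OUT)

# 1x1 reduction first, then the 5x5 convolution on the thinner tensor.
reduced = (conv_multiply_adds(H, W, 1, C_IN, C_REDUCED)
           + conv_multiply_adds(H, W, 5, C_REDUCED, C_OUT))

print(direct, reduced, round(direct / reduced, 2))
```

Under these assumptions the reduced path needs roughly a tenth of the multiply-adds of the direct 5x5 convolution, while the ReLU after the 1X1 layer adds an extra non-linearity.
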
- 15. GoogLeNet Insight #3: Stack Inception Modules Upon Each Other
  • Occasionally insert max-pooling layers with stride 2 to decimate (by half) the resolution of the grid.
  • Stacking Inception Layers benefits the results when used at higher layers (not strictly necessary)
    – Lower layers are kept in traditional convolutional fashion (for memory efficiency reasons)
  • This stacking allows for tweaking each module without uncontrolled blowup in computational complexity at later stages.
    – For example, a tweak could be to increase the width at any stage.
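
The halving effect of stride-2 max pooling can be sketched with the usual output-size formula. The slide only states the stride; the 3x3 window, the ceil rounding, and the starting size of 112 are assumptions made for illustration.

```python
import math


def pooled_size(size, window=3, stride=2):
    """Side length after an unpadded max-pool with ceil rounding (assumed)."""
    return math.ceil((size - window) / stride) + 1


# Apply four stride-2 pooling steps to an illustrative 112x112 grid.
sizes = [112]
for _ in range(4):
    sizes.append(pooled_size(sizes[-1]))

print(sizes)
```

Each pooling step halves the grid side, so the Inception modules stacked after it work on a quarter of the spatial positions.
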
- 16. GoogLeNet Components: Stacking Inception Modules
  [Diagram components: Input; Traditional Convolutions (Conv + MaxPool + Conv + MaxPool); Nine Inception Modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b); MaxPool; Average Pooling; Linear; SoftMax w/Loss; Label]
- 17. GoogLeNet Insight #4: Counter-Balancing Back-Propagation Downsides in Deep Networks
  • A potential problem
    – Back-propagating through deep networks could result in “vanishing gradients” (possibly meaning dead ReLUs).
  • A solution
    – Intermediate layers do have discriminatory powers
    – Auxiliary classifiers were appended to the intermediate layers
    – During training, the intermediate loss was added to the total loss with a discount factor of 0.3
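
The weighted mix of losses can be sketched in plain Python. This is a minimal illustration, assuming softmax cross-entropy for every classifier and the 0.3 discount stated on the slide; the function names and the example logits are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def total_training_loss(main_logits, aux_logits_list, label, aux_weight=0.3):
    """Main loss plus the discounted losses of the auxiliary classifiers."""
    main = softmax_cross_entropy(main_logits, label)
    aux = sum(softmax_cross_entropy(a, label) for a in aux_logits_list)
    return main + aux_weight * aux


# One main classifier and two auxiliary classifiers, all predicting class 1.
loss = total_training_loss([0.2, 1.5, -0.3], [[0.0, 0.4, 0.1], [0.3, 0.9, 0.0]], label=1)
print(loss)
```

The auxiliary terms only shape the gradients during training; at inference time only the main classifier's output would be used.
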
- 18. Two Additional Loss Layers for Training to Depth
  [Diagram components: Input; Traditional Convolutions (Conv + MaxPool + Conv + MaxPool); Nine Inception Modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b); MaxPool; Average Pooling; 1x1 Conv; Fully Connected; DropOut; Linear; SoftMax w/Loss 0; SoftMax w/Loss 1; SoftMax w/Loss 2; Label]
- 19. GoogLeNet Insight #5: End with Global Average Pooling Layer Instead of Fully Connected Layer
  • Fully-connected layers are prone to over-fitting
    – Hampers generalization
  • Average pooling has no parameters to optimize, thus no over-fitting.
  • Averaging is more native to the convolutional structure
    – Natural correspondence between feature maps and categories, leading to easier interpretation
  • Average pooling does not exclude the use of dropout, a proven regularization method to avoid over-fitting.
  [Diagram components: Inception modules 3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b; Global Average Pooling; Linear Layer for adapting to other label sets; SoftMax w/Loss; Label]
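
Global average pooling reduces each feature map to one number, which is why it adds no parameters. The sketch below uses plain nested lists; the 7x7x1024 final feature map and the 1000 classes used in the parameter comparison are taken from "Going Deeper with Convolutions", not from the slides, and the function name is illustrative.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2-D list, H x W) to its mean value."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two tiny 2x2 channels stand in for real feature maps.
pooled = global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])
print(pooled)

# Parameters of the classifier that follows (weights + biases), assuming a
# 7x7x1024 final feature map and 1000 classes.
fc_on_flattened = 7 * 7 * 1024 * 1000 + 1000
linear_after_gap = 1024 * 1000 + 1000
print(fc_on_flattened, linear_after_gap)
```

Under these assumptions, pooling first shrinks the final classifier by a factor of about 49, the number of spatial positions that get averaged away.
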
- 20. Summarizing The Insights
  1. Exploit fully the fact that, in images, correlation tends to be local
    • Concatenate 1X1, 3X3, 5X5 convolutions along with pooling
  2. Decrease dimensions wherever computation requirements increase via a 1X1 Dimension Reduction Layer
  3. Stack Inception Modules Upon Each Other
  4. Counter-Balance Back-Propagation Downsides in Deep Networks
    • Use intermediate losses in the final loss
  5. End with Global Average Pooling Layer Instead of Fully Connected Layer
- 21. References
  • Seminal
    – Backpropagation Applied to Handwritten Zip Code Recognition. LeCun et al.
  • Deep Networks
    – Going Deeper with Convolutions
    – Network In Network
