Min-Seo Kim
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: kms39273@naver.com
1
 Ongoing studies
• GoogLeNet
2
GoogLeNet
• The GoogLeNet submission to ILSVRC 2014 used 12× fewer parameters than the winning architecture of
Krizhevsky et al. (AlexNet) from two years prior, yet it was significantly more accurate.
• A notable factor is that with the ongoing traction of mobile and embedded computing, the efficiency of
algorithms – especially their power and memory use – gains importance.
Introduction
3
GoogLeNet
• Conventional CNNs have typically had a standard structure – stacked convolutional layers (optionally followed
by contrast normalization and max-pooling) followed by one or more fully-connected layers.
• Network-in-Network uses 1×1 convolutional layers followed by the ReLU activation function, an idea GoogLeNet
adopts heavily.
Related Work
4
GoogLeNet
• The most straightforward way of improving the performance of deep neural networks is by increasing their
size.
• However, this simple solution comes with two major drawbacks.
• A bigger size typically means a larger number of parameters, which makes the enlarged network more
prone to overfitting; creating the high-quality training sets needed to prevent this can be tricky and
expensive.
• Another drawback of uniformly increased network size is the dramatically increased use of computational
resources.
• Since in practice the computational budget is always finite, an efficient distribution of computing resources is
preferred to an indiscriminate increase of size.
Motivation and High Level Considerations
5
GoogLeNet - Architectural Details
• To extract features at multiple scales, 1×1, 3×3, and 5×5 convolution filters are applied in parallel and their
outputs are concatenated.
• However, this inevitably increases the computational load.
Inception module
6
GoogLeNet - Architectural Details
• Therefore, to address this issue, 1×1 convolution filters are used.
• Placed before the 3×3 and 5×5 filters, they reduce the channel dimension, which in turn reduces the
computational load and introduces additional non-linearity.
Inception module
7
GoogLeNet - Architectural Details
How does the 1×1 conv filter reduce the amount of computation?

Direct 5×5 convolution:
- input tensor = 28×28×192
- convolution filter = 5×5×192
- padding = 2
- stride = 1
- number of filters = 32
28×28×192×5×5×32 ≈ 120 million operations

With a 1×1 bottleneck, step 1 (1×1 convolution):
- input tensor = 28×28×192
- convolution filter = 1×1×192
- number of filters = 16
192×1×1×28×28×16 ≈ 2.4 million operations

Step 2 (5×5 convolution on the reduced tensor):
- input tensor = 28×28×16
- convolution filter = 5×5×16
- padding = 2
- stride = 1
- number of filters = 32
16×5×5×28×28×32 ≈ 10 million operations

Total of about 12.4 million operations: the number of operations has decreased roughly tenfold, and the
non-linearity has increased (an extra ReLU after the 1×1 convolution).
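As a quick check on these numbers, here is a minimal Python sketch that counts one multiplication per filter weight per output position (ignoring additions and biases, which the slide's figures also ignore):

```python
# Count multiplications for a convolution layer:
# every output position (h_out x w_out) applies each of the c_out filters
# over a k x k x c_in window of the input.
def conv_mults(h_out, w_out, c_in, k, c_out):
    return h_out * w_out * c_in * k * k * c_out

# Direct 5x5 convolution on a 28x28x192 tensor with 32 filters.
direct = conv_mults(28, 28, 192, 5, 32)       # ~120.4 million

# 1x1 bottleneck down to 16 channels, then the 5x5 convolution to 32 channels.
bottleneck = conv_mults(28, 28, 192, 1, 16)   # ~2.4 million
conv5x5 = conv_mults(28, 28, 16, 5, 32)       # ~10.0 million

print(f"direct 5x5:       {direct:,}")
print(f"1x1 + 5x5 total:  {bottleneck + conv5x5:,}")
print(f"reduction factor: {direct / (bottleneck + conv5x5):.1f}x")
```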
8
GoogLeNet - Architectural Details
• This is the parameter and operation calculation for the Inception 3a module inside the actual GoogLeNet
(a code sketch of the module follows below).
Inception in GoogLeNet (Inception 3a)
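For reference, a minimal PyTorch-style sketch of the Inception module, filled in with the channel counts the paper lists for Inception 3a (1×1: 64, 3×3 reduce: 96, 3×3: 128, 5×5 reduce: 16, 5×5: 32, pool projection: 32, giving 256 output channels on a 28×28×192 input). This is an illustrative re-implementation, not the authors' original code:

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Four parallel branches whose outputs are concatenated along the channel axis."""
    def __init__(self, c_in, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.b1 = nn.Sequential(nn.Conv2d(c_in, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction, then 3x3 convolution.
        self.b2 = nn.Sequential(
            nn.Conv2d(c_in, c3_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 reduction, then 5x5 convolution.
        self.b3 = nn.Sequential(
            nn.Conv2d(c_in, c5_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max-pooling, then 1x1 projection.
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(c_in, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Inception 3a: 28x28x192 in -> 28x28x(64+128+32+32) = 28x28x256 out.
inception_3a = Inception(192, 64, 96, 128, 16, 32, 32)
out = inception_3a(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```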
9
GoogLeNet - Architectural Details
Entire GoogLeNet
10
GoogLeNet - Architectural Details
• These are the lower layers, located close to the input image.
• For efficient memory usage, a plain CNN-style stem (convolution and max-pooling layers) is used in this lower
part.
• Inception modules are reserved for the higher layers, so they are not used here; a rough sketch of the stem
follows below.
Part 1
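A minimal PyTorch-style sketch of this stem, assuming the layer settings from the paper's architecture table (7×7/2 convolution with 64 filters, 3×3/2 max-pooling, local response normalization, a 1×1 convolution to 64 channels, a 3×3 convolution to 192 channels, and another 3×3/2 max-pooling). This is an illustrative reconstruction, not the original code:

```python
import torch
import torch.nn as nn

# Stem of GoogLeNet ("Part 1"): plain convolution/pooling layers before the
# first Inception module, taking a 224x224x3 image down to 28x28x192.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2, padding=1),
    nn.LocalResponseNorm(5),
    nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.LocalResponseNorm(5),
    nn.MaxPool2d(3, stride=2, padding=1),
)

x = torch.randn(1, 3, 224, 224)
print(stem(x).shape)  # torch.Size([1, 192, 28, 28])
```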
11
GoogLeNet - Architectural Details
• To extract features at various scales, the Inception modules described earlier are stacked in this part.
Part 2
12
GoogLeNet - Architectural Details
• As the model becomes very deep, the vanishing gradient problem can occur even when using the ReLU
activation function.
• Auxiliary classifiers are therefore attached to intermediate layers; they produce intermediate predictions so
that an additional gradient signal can be back-propagated into the middle of the network.
• To prevent them from having too much influence, the loss of each auxiliary classifier is multiplied by 0.3 and
added to the total loss of the network (a sketch of this weighting follows below).
• At test time, the auxiliary classifiers are removed and only the softmax output at the far end is used.
Part 3
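A minimal sketch of how the weighted auxiliary losses could be combined during training, assuming a model that returns the main logits plus two auxiliary logits (as GoogLeNet does); the function and variable names here are illustrative:

```python
import torch.nn.functional as F

def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets, aux_weight=0.3):
    """Total training loss: main cross-entropy plus down-weighted auxiliary losses.

    The auxiliary branches exist only to inject extra gradient into the middle
    of the network; at inference time they are discarded and only main_logits
    is used.
    """
    main_loss = F.cross_entropy(main_logits, targets)
    aux_loss = F.cross_entropy(aux1_logits, targets) + F.cross_entropy(aux2_logits, targets)
    return main_loss + aux_weight * aux_loss
```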
13
GoogLeNet - Architectural Details
• This is the end of the model, where the prediction is produced.
• Instead of a large fully-connected layer, global average pooling is applied to the final feature maps.
• This reduces each feature map to a single value without any additional parameters; a single linear layer and
softmax then produce the class scores (see the sketch below).
Part 4
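A minimal sketch of this classifier head, following the paper's table (global average pooling over the final 7×7×1024 feature maps, 40% dropout, and one linear layer to 1000 classes); illustrative, not the original code:

```python
import torch
import torch.nn as nn

# Classifier head ("Part 4"): global average pooling collapses each of the
# 1024 final 7x7 feature maps to one value, adding no parameters; only the
# final linear layer carries weights.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # 7x7x1024 -> 1x1x1024, parameter-free
    nn.Flatten(),
    nn.Dropout(p=0.4),
    nn.Linear(1024, 1000),     # the only learned parameters in the head
)

features = torch.randn(1, 1024, 7, 7)
print(head(features).shape)   # torch.Size([1, 1000])
```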
14
GoogLeNet
• GoogLeNet introduced a methodology different from existing CNN designs that simply stack more layers,
using Inception modules to spend the computational budget efficiently.
• It won first place in the ILSVRC 2014 classification task, finishing ahead of VGGNet.
Conclusions
