Korea University,
Department of Computer Science & Radio
Communication Engineering
2016010646 Bumsoo Kim
Google Net: Everything about GoogleNet and its architecture
What is 'GoogleNet'?
GoogleNet
ILSVRC (ImageNet) 2014 winning model
Google, "Going Deeper with Convolutions" (CVPR 2015)
Top-5 error rate = 6.7%
Revolution of Depth (22 layers)
How deep is 'GoogleNet'?
Revolution of Depth
Deeper is better?
Revolution of Depth
The winning models of 2014 were distinguished by their depth!
So, is a deeper network the key to better performance?
The answer is... NO!
Deeper is NOT better
Obstacles of Depth
Overfitting, Unbalance & Aliasing
ZF-Net did not provide a fundamental solution
Obstacles of Depth
ZF-Net fundamentally failed at increasing depth
The top models of 2014 succeeded in addressing this problem
HOW?
Unique module of GoogleNet
Inception
Network in Network
Inception modules
NIN (Network in Network) – Min Lin, 2013, "Network in Network"
Standard convolution: local receptive field, linear features. NIN mlpconv: local receptive field, linear + non-linear features.
Fully connected layers account for roughly 90% of the computation; NIN replaces them with Average Pooling, which Inception adopts.
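A minimal sketch of the Average Pooling idea (assuming PyTorch; the 1024-channel, 7x7 feature map and the 1000-class head are illustrative assumptions, not figures from the slide): instead of flattening the final feature map into a large fully connected layer, each channel is averaged to a single value and fed to a small classifier.

import torch
import torch.nn as nn

features = torch.randn(8, 1024, 7, 7)   # hypothetical final feature map

# FC-style head: flatten, then one big dense layer (many parameters)
fc_head = nn.Linear(1024 * 7 * 7, 1000)

# NIN / GoogleNet-style head: global average pooling, then a small dense layer
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # average each 7x7 map to a single value
    nn.Flatten(),              # shape: (8, 1024)
    nn.Linear(1024, 1000),
)

print(sum(p.numel() for p in fc_head.parameters()))   # ~50M parameters
print(sum(p.numel() for p in gap_head.parameters()))  # ~1M parameters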
1x1 Convolutions
Inception modules
Original (naive) model: various filters for various feature scales
Expensive unit: larger filter sizes are very expensive!
Original model → model with 1x1 Convolutions
1x1 Convolutions
Inception modules
1x1 Convolution: dimension reduction
Hebbian Principle: "Neurons that fire together, wire together." Neurons that activate in the same dimension can be grouped together as having similar properties.
Channel dimension is reduced: c2 (large) → c3 (small)
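A rough back-of-the-envelope sketch of the saving (the 28x28 map and the channel counts 256 and 64 are illustrative assumptions, not the paper's exact figures): a 5x5 convolution applied directly to a wide input is compared with first reducing the channels through a 1x1 convolution.

# hypothetical Inception branch: 256 input channels, 28x28 feature map
h, w, c_in, c_out = 28, 28, 256, 64

# direct 5x5 convolution on all 256 channels
direct = h * w * c_out * (5 * 5 * c_in)        # ~321M multiplies

# 1x1 reduction to 64 channels, then the 5x5 convolution
c_mid = 64
reduced = (h * w * c_mid * (1 * 1 * c_in)      # cheap 1x1 reduction
           + h * w * c_out * (5 * 5 * c_mid))  # 5x5 on the reduced map, ~93M total

print(direct, reduced, reduced / direct)       # roughly a 3-4x saving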
Do we really need 5x5 filters?
Inception
Factorizing Convolutions
5*5 = 25 → 2*(3*3) = 18 => 28% decrease in parameters!
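A small sketch of the factorization (assuming PyTorch; the channel count 64 is an arbitrary assumption), comparing the parameter count of a single 5x5 convolution with two stacked 3x3 convolutions that cover the same 5x5 receptive field:

import torch.nn as nn

c = 64  # illustrative channel count, kept equal on input and output

conv5 = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
conv3x2 = nn.Sequential(                      # two stacked 3x3 convolutions:
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),                    # extra non-linearity for free
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)                                             # same 5x5 receptive field overall

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(conv5), n_params(conv3x2))     # 25*c*c vs 18*c*c, a 28% reduction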
How effective is it?
Inception
130M 4.5B
Why do we need it?
Auxiliary Classifier
Gradient Vanishing Problem
ReLU doesn't provide a fundamental solution
Stacking multiple ReLU layers causes the same problem!
What is wrong with it?
Gradient Vanishing
Vanishing Gradients on Sigmoids
* Deep networks use an activation function for non-linearity.
* "Saturated": the weight 'W' is too large => 'z' is near 0 or 1 => z*(1-z) ≈ 0
import numpy as np  # (W: weight matrix, x: input vector, assumed already defined)
z = 1/(1 + np.exp(-np.dot(W, x)))  # forward pass: sigmoid activation
dx = np.dot(W.T, z*(1-z))          # backward pass: local gradient for x
dW = np.outer(z*(1-z), x)          # backward pass: local gradient for W
* Local gradient is small: 0 <= z*(1-z) <= 0.25 (maximum at z = 0.5), so the layers far from the output are barely trained.
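A tiny numeric sketch of why this compounds with depth (the layer counts are arbitrary assumptions): even in the best case, each sigmoid layer scales the backward signal by at most 0.25, so the gradient reaching the earliest layers shrinks roughly exponentially.

best_local_grad = 0.25                      # maximum of z*(1-z), at z = 0.5
for depth in (5, 10, 20):                   # hypothetical network depths
    print(depth, best_local_grad ** depth)  # upper bound on the surviving gradient
# 5 -> ~1e-3, 10 -> ~1e-6, 20 -> ~1e-12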
Details are not explained in the paper
Auxiliary Classifier
"Training Deeper Convolutional Networks with Deep Supervision" (Liwei Wang, Chen-Yu Lee, 2014)
Before the Gradient Vanishes!
Auxiliary Classifier
"Training Deeper Convolutional Networks with Deep Supervision" (Liwei Wang, Chen-Yu Lee, 2014)
The deeper the network, the higher the risk of the gradient vanishing, so updates are also injected at shallower depths!
Empirical Decision
Auxiliary Classifier
"Training Deeper Convolutional Networks with Deep Supervision" (Liwei Wang, Chen-Yu Lee, 2014)
Where to insert the auxiliary classifiers is decided purely "empirically", without theoretical justification.
How effective is it?
Auxiliary Classifier
"Training Deeper Convolutional Networks with Deep Supervision" (Liwei Wang, Chen-Yu Lee, 2014)
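A minimal sketch of how the auxiliary classifiers feed into training (assuming PyTorch; main_logits, aux1_logits and aux2_logits stand in for the outputs of the final and the two intermediate classifier heads; 0.3 is the discount weight given in the GoogleNet paper):

import torch.nn.functional as F

def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets):
    # main classifier loss, plus down-weighted losses from the two
    # auxiliary classifiers attached to intermediate layers
    main = F.cross_entropy(main_logits, targets)
    aux1 = F.cross_entropy(aux1_logits, targets)
    aux2 = F.cross_entropy(aux2_logits, targets)
    return main + 0.3 * (aux1 + aux2)

# gradients from the auxiliary heads enter the network at a shallower depth,
# so earlier layers still receive a usable training signal;
# at test time the auxiliary classifiers are simply discarded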
Train it yourself!
GoogleNet
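The implementation slides were image-only; as a stand-in, here is a minimal sketch of a single Inception module (assuming PyTorch; the example channel counts follow the inception(3a) row of the paper's architecture table as I recall it):

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception block with 1x1 reductions before the expensive filters."""
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(c_in, c1, 1)                       # 1x1 branch
        self.branch3 = nn.Sequential(nn.Conv2d(c_in, c3r, 1),       # 1x1 reduce
                                     nn.ReLU(inplace=True),
                                     nn.Conv2d(c3r, c3, 3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(c_in, c5r, 1),       # 1x1 reduce
                                     nn.ReLU(inplace=True),
                                     nn.Conv2d(c5r, c5, 5, padding=2))
        self.branch_pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                         nn.Conv2d(c_in, pool_proj, 1))

    def forward(self, x):
        # run the four branches in parallel and concatenate along channels
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

# example: apply an inception(3a)-shaped module to a dummy 28x28, 192-channel batch
x = torch.randn(1, 192, 28, 28)
out = InceptionModule(192, 64, 96, 128, 16, 32, 32)(x)
print(out.shape)  # torch.Size([1, 256, 28, 28])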
Thank You
