Five Insights from GoogLeNet
You Could Use In Your Own Deep Learning Nets
Auro Tripathy
Year 1989 Kicked Off Convolutional Neural Nets
Ten-Digit Classifier using a Modest Neural Network with Three Hidden Layers
Backpropagation Applied to Handwritten Zip Code Recognition. LeCun et al.
http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
Layer              | Hidden Units                      | Connections                     | Params
Out - H3 (FC)      | 10 (visible)                      | 10 x (30 W + 1 B) = 310         | 10 x (30 W + 1 B) = 310
H3 - H2 (FC)       | 30                                | 30 x (192 W + 1 B) = 5790       | 30 x (192 W + 1 B) = 5790
H2 - H1 (Conv)     | 12 x 4 x 4 = 192                  | 192 x (5 x 5 x 8 + 1) = 38592   | 5 x 5 x 8 x 12 + 192 biases = 2592
H1 - Input (Conv)  | 12 x 8 x 8 = 768                  | 768 x (5 x 5 x 1 + 1) = 19968   | 5 x 5 x 1 x 12 + 768 biases = 1068
Totals             | 16 x 16 in + 990 hidden + 10 out  | 64660 connections               | 9760 params
Each of the units in H2 combines local information coming from 8 of the 12 different feature maps in H1.
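To make the table's arithmetic concrete, here is a small, hypothetical Python helper (not from the paper) that reproduces the connection and parameter counts of the two convolutional layers; the weight-sharing assumption follows the table's own formulas.

```python
# Sanity check of the convolutional rows in the table above (LeCun et al., 1989).
# Each unit sees a k x k patch over in_channels maps plus one bias; weights are
# shared within a feature map, so parameters are far fewer than connections.

def conv_layer_counts(num_maps, map_size, kernel, in_channels):
    units = num_maps * map_size * map_size
    # Every unit has kernel*kernel*in_channels incoming connections plus a bias connection.
    connections = units * (kernel * kernel * in_channels + 1)
    # Weight sharing: one kernel per (feature map, input channel) pair, plus one bias per unit.
    params = kernel * kernel * in_channels * num_maps + units
    return units, connections, params

print(conv_layer_counts(num_maps=12, map_size=8, kernel=5, in_channels=1))  # H1-Input: (768, 19968, 1068)
print(conv_layer_counts(num_maps=12, map_size=4, kernel=5, in_channels=8))  # H2-H1:   (192, 38592, 2592)
```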
Year 2012 Marked The Inflection Point
Reintroducing CNNs Led to a Big Drop in Error for Image Classification.
Since Then, Networks Have Continued to Reduce the Error
Top-5 error (%) by ILSVRC entry (the chart also tracked the growing number of layers):

ILSVRC'10               28.2
ILSVRC'11               25.8
ILSVRC'12 (AlexNet)     16.4
ILSVRC'13               11.7
ILSVRC'14                7.3
ILSVRC'14 (GoogLeNet)    6.7
ILSVRC'15 (ResNet)       3.57
The Trend has been to Increase the Number of Layers (and Layer Size)
•  The typical 'design pattern' for Convolutional Neural Nets (a sketch follows below):
–  Stacked convolutional layers,
•  a linear filter followed by a non-linear activation
–  Followed by contrast normalization and max pooling,
–  Penultimate layers (one or more) are fully connected layers.
–  The ultimate layer is a loss layer, possibly more than one, combined in a weighted mix
•  Use of dropout to address the problem of over-fitting due to many layers
•  In addition to classification, the architecture is good for localization and object detection
–  despite concerns that max-pooling dilutes spatial information
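A minimal sketch of that pattern, assuming PyTorch (my choice of framework, not the slides'); the layer widths, input size, and 10-class head are illustrative.

```python
# Illustrative stack: conv -> ReLU -> normalization -> max pool, then fully connected, then a loss.
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # stacked convolutional layer (linear filter)
    nn.ReLU(inplace=True),                        # ...followed by a non-linear activation
    nn.LocalResponseNorm(5),                      # contrast/response normalization
    nn.MaxPool2d(kernel_size=2, stride=2),        # max pooling
)
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 10),                  # penultimate fully connected layer
)
criterion = nn.CrossEntropyLoss()                 # ultimate layer: a (softmax) loss

x = torch.randn(1, 3, 32, 32)
logits = classifier(features(x))
loss = criterion(logits, torch.tensor([4]))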
The Challenge of Deep Networks
1.  Adding layers increases the number of parameters and makes the network prone to over-fitting
–  Exacerbated by a paucity of data
–  More data means more expense in annotation
2.  More computation
–  A linear increase in filters results in a quadratic increase in compute: doubling the filter count of two consecutive layers roughly quadruples the cost of the convolution between them (see the sketch below)
–  If weights are close to zero, we've wasted compute resources
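A small sketch of the quadratic-compute point, using the standard multiply-add count for a convolution (output channels x input channels x kernel area x output grid); the specific channel counts and grid size are illustrative, not from the slides.

```python
# Multiply-adds for one convolutional layer: out_ch * in_ch * k * k * H * W.
# Doubling the filter count of this layer AND the layer feeding it doubles both
# out_ch and in_ch, so the cost grows roughly 4x (quadratically).
def conv_macs(in_ch, out_ch, k, h, w):
    return out_ch * in_ch * k * k * h * w

base    = conv_macs(in_ch=64,  out_ch=64,  k=3, h=28, w=28)
doubled = conv_macs(in_ch=128, out_ch=128, k=3, h=28, w=28)
print(doubled / base)   # -> 4.0
```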
Year 2014, GoogLeNet Took Aim at Efficiency and Practicality
Resultant benefits of the new architecture:
•  12 times fewer parameters than AlexNet
– Significantly more accurate than AlexNet
– Lower memory use and lower power use, acutely important for mobile devices.
•  Stays within the targeted 1.5 billion multiply-add budget
– Computational cost "less than 2X compared to AlexNet"
http://www.youtube.com/watch?v=ySrj_G5gHWI&t=12m42s
Introducing the Inception Module
[Diagram: the Inception module. The previous layer's output feeds parallel 1x1, 3x3, and 5x5 convolution paths and a 3x3 max-pooling path, with 1x1 convolutions used for dimension reduction; the branch outputs are concatenated.]
Intuition behind the Inception Module
•  Cluster neurons according to the correlation statistics in the dataset
–  An optimal layered network topology can be constructed by analyzing the correlation statistics of the preceding layer's activations and clustering neurons with highly correlated outputs.
•  We already know that, in the lower layers, there exist high correlations in image patches that are local and near-local.
–  These can be covered by 1x1 convolutions
–  Additionally, a smaller number of spatially spread-out clusters can be covered by convolution over larger patches; i.e., 3x3 and 5x5
–  And there will be a decreasing number of patches over larger and larger regions.
•  This also suggests that the architecture is a combination of all the convolutions (the 1x1, 3x3, 5x5) as input to the next stage
•  Since max-pooling has been successful, it suggests adding a pooling layer in parallel
In images, correlation tends to be local; exploit it.
A heterogeneous set of convolutions covers the spread-out clusters
[Diagram: three filter sizes applied to the previous layer:
–  Cover very local clusters with 1x1 convolutions
–  Cover more spread-out clusters with 3x3 convolutions
–  Cover even more spread-out clusters with 5x5 convolutions]
Conceiving the Inception Module
[Diagram: the naive Inception module. The previous layer's output feeds parallel 1x1, 3x3, and 5x5 convolutions and a 3x3 max-pooling path; the outputs are concatenated.]
Inception Module Put Into Practice
Judicious Dimension Reduction
[Diagram: the Inception module with dimension reduction. 1x1 convolutions reduce depth before the 3x3 and 5x5 convolutions and after the 3x3 max pooling, alongside a plain 1x1 branch; all branch outputs are concatenated.]
Insights…	
[Diagram: the GoogLeNet architecture with its nine Inception modules, labeled 3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b.]
GoogLeNet Insight #1
(Summary of the Previous Slides)
Leads to the following architecture choices (see the sketch after this list):
•  Choosing filter sizes of 1x1, 3x3, 5x5
•  Applying all three filters on the same "patch" of the image (no need to choose)
•  Concatenating all filter outputs as a single output vector for the next stage.
•  Concatenating an additional pooling path, since pooling is essential to the success of CNNs.
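A minimal sketch of Insight #1, assuming PyTorch; the class name NaiveInception and the per-branch channel counts are illustrative, not taken from the paper.

```python
# Parallel 1x1, 3x3, 5x5 convolutions plus 3x3 max pooling on the same input,
# concatenated along the channel axis (the "naive" Inception idea).
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1x1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3x3 = nn.Conv2d(in_ch, 96, kernel_size=3, padding=1)
        self.branch5x5 = nn.Conv2d(in_ch, 32, kernel_size=5, padding=2)
        self.pool      = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # All filter sizes see the same patch; outputs become one concatenated vector.
        return torch.cat(
            [self.branch1x1(x), self.branch3x3(x), self.branch5x5(x), self.pool(x)],
            dim=1,
        )

y = NaiveInception(192)(torch.randn(1, 192, 28, 28))   # -> (1, 64+96+32+192, 28, 28)
```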
GoogLeNet Insight #2
Decrease Dimensions Wherever Computation Requirements Increase,
via a 1x1 Dimension Reduction Layer
•  Use inexpensive 1x1 convolutions to compute reductions before the expensive 3x3 and 5x5 convolutions (see the sketch below)
•  1x1 convolutions include a ReLU activation, making them dual-purpose.
[Diagram: a 1x1 convolution followed by a ReLU, applied to the previous layer's output.]
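A minimal sketch of Insight #2, assuming PyTorch; the 192 -> 16 -> 32 channel counts and the 28x28 grid are illustrative, chosen only to show how the 1x1 reduction makes the 5x5 path cheaper.

```python
# A cheap 1x1 convolution (with ReLU) shrinks the channel depth
# before the expensive 5x5 convolution.
import torch.nn as nn

reduce_then_5x5 = nn.Sequential(
    nn.Conv2d(192, 16, kernel_size=1),            # 1x1 reduction: 192 -> 16 channels
    nn.ReLU(inplace=True),                        # the "dual purpose": reduction + non-linearity
    nn.Conv2d(16, 32, kernel_size=5, padding=2),  # 5x5 now sees only 16 input channels
)

# Rough multiply-add comparison at a 28x28 grid:
direct  = 32 * 192 * 5 * 5 * 28 * 28                            # 5x5 straight on 192 channels
reduced = 16 * 192 * 1 * 1 * 28 * 28 + 32 * 16 * 5 * 5 * 28 * 28
print(direct, reduced)                            # the reduced path is several times cheaper
```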
GoogLeNet Insight #3
Stack Inception Modules Upon Each Other
•  Occasionally insert max-pooling layers with stride 2 to decimate (halve) the resolution of the grid.
•  Stacking Inception layers benefits the results when used at the higher layers (not strictly necessary)
–  Lower layers are kept as traditional convolutions (for memory-efficiency reasons)
•  This stacking allows for tweaking each module without an uncontrolled blowup in computational complexity at later stages (see the sketch below).
–  For example, a tweak could be to increase the width at any stage.
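A minimal sketch of Insight #3, assuming PyTorch. The helper inception_like is a hypothetical stand-in for a real Inception module (such as the one sketched under Insight #1); the stem and the channel widths are illustrative.

```python
# A traditional convolutional stem, then stacked inception-style modules,
# with an occasional stride-2 max pool halving the grid resolution.
import torch
import torch.nn as nn

def inception_like(in_ch, out_ch):
    # Stand-in for a real Inception module: any block mapping in_ch -> out_ch channels.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True))

net = nn.Sequential(
    # Lower layers kept as traditional convolutions (memory efficiency).
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2, padding=1),
    # Inception modules stacked at the higher layers.
    inception_like(64, 256),
    inception_like(256, 480),
    nn.MaxPool2d(3, stride=2, padding=1),   # stride-2 pooling halves the resolution
    inception_like(480, 512),
)
print(net(torch.randn(1, 3, 224, 224)).shape)   # -> (1, 512, 28, 28)
```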
GoogLeNet Components
Stacking Inception Modules
[Diagram: Input -> Traditional convolutions (Conv + MaxPool + Conv + MaxPool) -> Nine Inception modules (3a, 3b, 4a, 4b, 4c, 4d, 4e, 5a, 5b) with an interleaved MaxPool -> Average Pooling -> Linear -> SoftMax w/Loss -> Label]
GoogLeNet Insight #4
Counter-Balancing Back-Propagation Downsides in Deep Networks
•  A potential problem
–  Back-propagating through deep networks can result in "vanishing gradients" (possibly meaning dead ReLUs).
•  A solution
–  Intermediate layers do have discriminatory power
–  Auxiliary classifiers were appended to the intermediate layers
–  During training, the intermediate losses were added to the total loss with a discount factor of 0.3 (see the sketch below)
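A minimal sketch of Insight #4, assuming PyTorch; the function name and the three logits tensors are hypothetical stand-ins for the outputs of the main and auxiliary classifier heads.

```python
# During training, the auxiliary classifier losses are added to the main loss
# with a 0.3 weight; at inference time the auxiliary heads are simply discarded.
import torch
import torch.nn.functional as F

def googlenet_style_loss(main_logits, aux0_logits, aux1_logits, labels):
    main_loss = F.cross_entropy(main_logits, labels)
    aux_loss = F.cross_entropy(aux0_logits, labels) + F.cross_entropy(aux1_logits, labels)
    return main_loss + 0.3 * aux_loss   # discounted intermediate losses (training only)

labels = torch.tensor([3, 7])
logits = [torch.randn(2, 1000) for _ in range(3)]
print(googlenet_style_loss(*logits, labels))
```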
Two Additional Loss Layers
for Training to Depth
[Diagram: the main path (Input -> Traditional convolutions (Conv + MaxPool + Conv + MaxPool) -> Nine Inception modules with an interleaved MaxPool -> Average Pooling -> Linear -> SoftMax w/Loss 2 -> Label), plus two auxiliary branches tapped from intermediate Inception modules, each consisting of Average Pooling -> 1x1 Conv -> Fully Connected -> DropOut -> Linear -> SoftMax w/Loss (Loss 0 and Loss 1).]
GoogLeNet Insight #5
End with a Global Average Pooling Layer Instead of a Fully Connected Layer
•  Fully-connected layers are prone to over-fitting
–  Hampers generalization
•  Average pooling has no parameters to optimize, thus no over-fitting.
•  Averaging is more native to the convolutional structure
–  A natural correspondence between feature maps and categories leads to easier interpretation
•  Average pooling does not exclude the use of dropout, a proven regularization method to avoid over-fitting (see the sketch below).
[Diagram: Global Average Pooling -> Linear layer (for adapting to other label sets) -> SoftMax w/Loss -> Label, placed after the last Inception module.]
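A minimal sketch of Insight #5, assuming PyTorch; the 1024-channel input, the dropout rate, and the 1000-class linear layer are illustrative.

```python
# Global average pooling replaces large fully connected layers: the pooling itself
# has no parameters, and a single linear layer adapts the features to the label set.
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),     # global average pooling: one value per feature map
    nn.Flatten(),
    nn.Dropout(p=0.4),           # dropout remains usable alongside average pooling
    nn.Linear(1024, 1000),       # adapts the pooled features to the label set
)

feature_maps = torch.randn(8, 1024, 7, 7)   # e.g. the output of the last Inception module
logits = head(feature_maps)                 # -> (8, 1000); feed into a softmax-with-loss
```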
Summarizing The Insights
1.  Fully exploit the fact that, in images, correlation tends to be local
•  Concatenate 1x1, 3x3, 5x5 convolutions along with pooling
2.  Decrease dimensions wherever computation requirements increase, via a 1x1 dimension reduction layer
3.  Stack Inception modules upon each other
4.  Counter-balance back-propagation downsides in deep networks
•  Use intermediate losses in the final loss
5.  End with a global average pooling layer instead of a fully connected layer
References
•  Seminal
– Backpropagation Applied to Handwritten Zip Code Recognition. LeCun et al.
•  Deep Networks
– Going Deeper with Convolutions
– Network In Network
