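[Slide title, truncated in extraction: "…n of Depth"]
[Figure: layer stack of a network, left edge cut off in extraction: …, 96, /4, pool/2; …, 256, pool/2; …nv, 384; …nv, 384; …, 256, pool/2; …, 4096; …, 4096; …, 1000]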
[Figure: VGG, 19 layers (ILSVRC 2014): 2 × (3x3 conv, 64), pool/2; 2 × (3x3 conv, 128), pool/2; 4 × (3x3 conv, 256), pool/2; 4 × (3x3 conv, 512), pool/2; 4 × (3x3 conv, 512), pool/2; fc, 4096; fc, 4096; fc, 1000]
[Figure: GoogLeNet, 22 layers (ILSVRC 2014). Stem: input, Conv 7x7+2(S), MaxPool 3x3+2(S), LocalRespNorm, Conv 1x1+1(V), Conv 3x3+1(S), LocalRespNorm, MaxPool 3x3+2(S). Body: nine Inception modules, each with parallel 1x1, 3x3, and 5x5 convolutions and a 3x3 MaxPool joined by DepthConcat, with MaxPool 3x3+2(S) between groups of modules. Two auxiliary classifiers (AveragePool 5x5+3(V), Conv 1x1+1(S), FC, FC, SoftmaxActivation) produce softmax0 and softmax1. Head: AveragePool 7x7+1(V), FC, SoftmaxActivation (softmax2).]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.
[Figure: residual network layer stack from the paper above: 7x7 conv, 64, /2, pool/2; then repeated bottleneck blocks of (1x1 conv, 3x3 conv, 1x1 conv) with widths (64, 64, 256), (128, 128, 512), (256, 256, 1024), and (512, 512, 2048), where the first 1x1 conv of each later stage is marked /2; ave pool, fc 1000]
AUTHOR(S): LEARNING OF OCCLUSION-AWARE ATTENTION FOR PEDESTRIAN DETECTION
…tion, outputting the classification scores using global average pooling or global max pooling from the feature map f(·). However, global average pooling raises the response value of the entire feature map for a specific class, because it averages over all pixels of the feature map. Global max pooling, on the other hand, does not raise the response of the entire feature map for a specific class, because it uses only the maximum pixel value of the feature map. The response score for each class under global average pooling and global max pooling is calculated as in Eq. (1).
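\[
v_i^c =
\begin{cases}
\dfrac{1}{M \times N} \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^{c}(x_i) & \text{(global average pooling)}, \\[6pt]
\max f_{m,n}^{c}(x_i) & \text{(global max pooling)},
\end{cases}
\tag{1}
\]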
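The two pooling rules in Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the feature map for one sample is stored as an array of shape (C, M, N); the function name `response_scores` and its `mode` argument are hypothetical and do not come from the paper.

```python
import numpy as np


def response_scores(feature_maps, mode="gap"):
    """Per-class response score v_i^c for one sample.

    feature_maps: array of shape (C, M, N), one M x N map per class c
                  (this layout is an assumption of the sketch).
    mode: "gap" for global average pooling, "gmp" for global max pooling.
    """
    if mode == "gap":
        # Average over all M x N positions of each class map.
        return feature_maps.mean(axis=(1, 2))
    if mode == "gmp":
        # Keep only the single largest response of each class map.
        return feature_maps.max(axis=(1, 2))
    raise ValueError("mode must be 'gap' or 'gmp'")


# One strong activation in class 0, nothing in class 1.
f = np.zeros((2, 2, 2))
f[0, 0, 0] = 4.0
gap_scores = response_scores(f, "gap")
gmp_scores = response_scores(f, "gmp")
```

With a single active pixel, the max-pooled score equals that pixel's value, while the average-pooled score is that value divided by M × N. Raising the average-pooled score therefore requires raising responses across the whole map.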
After outputting the score for each class, the attention for the pedestrian and occlusion regions is generated. First, we fuse the multi-channel feature map into one channel. In this work, we evaluate three types of fusion, shown in Fig. 1(b)∼(d): 1) standard fusion, 2) softmax-weighting fusion, and 3) squeeze-and-excitation (SE) block fusion. Standard fusion is simply the summation of the feature maps. In softmax weighting, the feature map of each channel is weighted by its softmax score, as in Eq. (2). Softmax weighting can mask unnecessary channels of the feature map. In SE block fusion, the feature map of each channel is weighted by the attention of an SE block, as in the Squeeze-and-Excitation Network. After fusing to one channel, the pedestrian classification attention and the occlusion state attention are fused. In this work, we calculate the attention by subtracting the occlusion attention from the pedestrian classification attention. We call the result the attention map because it contains both positive and negative values.
\[
\mathrm{Attention}_i = \sum_{c=1}^{C} f^{c}(x_i) \ast \frac{\exp(v_i^{c})}{\sum_{j=1}^{J} \exp(v_i^{j})}
\tag{2}
\]
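The softmax-weighting fusion of Eq. (2) and the subtraction step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: feature maps are taken to have shape (C, M, N), the softmax is taken over the same C scores (that is, J = C), and all function names are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()


def softmax_weighted_fusion(feature_maps, scores):
    """Fuse C channel maps into one map, weighting channel c by softmax(v)[c].

    feature_maps: array of shape (C, M, N).
    scores: length-C response scores v_i^c (assumes J = C in Eq. (2)).
    """
    weights = softmax(np.asarray(scores, dtype=float))
    # Contract the channel axis: sum_c weights[c] * feature_maps[c].
    return np.tensordot(weights, feature_maps, axes=1)


def attention_map(pedestrian_attention, occlusion_attention):
    """Pedestrian attention minus occlusion attention; may be negative."""
    return pedestrian_attention - occlusion_attention


maps = np.stack([np.ones((2, 2)), 3.0 * np.ones((2, 2))])
fused = softmax_weighted_fusion(maps, [0.0, 0.0])  # equal scores, equal weights
signed = attention_map(np.ones((2, 2)), 3.0 * np.ones((2, 2)))
```

Equal scores give equal weights, so the fused map is the plain channel average. A channel with a much lower score receives a weight near zero, which is the masking effect the text describes.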
3.4 Perception branch
The perception branch outputs the final result score using the attention map and the feature map from RoI pooling. The attention map can refine the RoI-pooled feature map, for example by masking unnecessary background features and enhancing the important locations. The converted feature map is the inner product of the attention map and the feature map from RoI pooling. The perception branch is composed of two fully connected layers, as in Fast R-CNN. The structure of the perception branch is the same as that of conventional Fast R-CNN; however, our model e…
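The refinement step can be sketched as below. The sketch assumes that the "inner product" of the attention map and the RoI-pooled feature map means an element-wise product, with one (H, W) attention map broadcast over all C channels. Both that reading and the name `refine_roi_features` are assumptions, not details confirmed by the text.

```python
import numpy as np


def refine_roi_features(roi_features, attention):
    """Weight every channel of an RoI feature map by a spatial attention map.

    roi_features: array of shape (C, H, W) from RoI pooling.
    attention: array of shape (H, W); positive values keep or enhance a
               location, zero masks it, negative values flip its sign.
    """
    if roi_features.shape[1:] != attention.shape:
        raise ValueError("attention map must match the RoI spatial size")
    # Broadcast the (H, W) map across the channel axis.
    return roi_features * attention[np.newaxis, :, :]


roi = np.ones((2, 2, 2))
att = np.array([[1.0, 0.0], [-1.0, 2.0]])
refined = refine_roi_features(roi, att)
```

In a full model, the refined map would then be flattened and passed to the two fully connected layers mentioned above.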
\[ t \log y + (1 - t) \log (1 - y) \tag{1} \]
\[ v_i^c = \frac{1}{M \times N} \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^{c}(x_i) \tag{2} \]
\[ v_i^1,\; v_i^2,\; v_i^3,\; v_i^c \tag{3} \]
\[ f(x_i) \tag{4} \]
\[ f(x_i, y_i) \tag{5} \]
Table 1. Classification error (%) on the ILSVRC validation set.
Networks          top-1 val. error   top-5 val. error
VGGnet-GAP        33.4               12.2
GoogLeNet-GAP     35.0               13.2
AlexNet∗-GAP      44.9               20.9
AlexNet-GAP       51.1               26.3
GoogLeNet         31.9               11.3
VGGnet            31.2               11.4
AlexNet           42.6               19.5
NIN               41.9               19.6
GoogLeNet-GMP     35.6               13.9
Table 2. Localization error (%) on the ILSVRC validation set. Backprop refers to using [23] for localization instead of CAM.
Method                  top-1 val. error   top-5 val. error
GoogLeNet-GAP           56.40              43.00
VGGnet-GAP              57.20              45.14
GoogLeNet               60.09              49.34
AlexNet∗-GAP            63.75              49.53
AlexNet-GAP             67.19              52.16
NIN                     65.47              54.19
Backprop on GoogLeNet   61.31              50.55
\[ L_{all}(x) = E_{att}(x) + E_{per}(x) \]
\[ g'(x) = (1 + M(x)) \cdot g(x) \]
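The two formulas above can be checked numerically with a short sketch. It assumes, based on the deck title rather than on any definition in this excerpt, that g(x) is a (C, H, W) feature map and M(x) is an (H, W) attention map. The function names are hypothetical.

```python
import numpy as np


def apply_attention(g, attention):
    """Compute g'(x) = (1 + M(x)) * g(x), broadcasting M over channels.

    g: feature map of shape (C, H, W) (assumed layout).
    attention: attention map M(x) of shape (H, W).
    """
    return (1.0 + attention[np.newaxis, :, :]) * g


def total_loss(e_att, e_per):
    """L_all(x) = E_att(x) + E_per(x): the two loss terms are simply added."""
    return e_att + e_per


g = np.full((2, 2, 2), 2.0)
unchanged = apply_attention(g, np.zeros((2, 2)))  # M = 0 leaves g as is
doubled = apply_attention(g, np.ones((2, 2)))     # M = 1 doubles g
loss = total_loss(0.25, 0.5)
```

Because of the added 1, a zero attention value leaves the feature unchanged instead of erasing it, so the attention can only rescale features relative to the original map.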
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie¹, Ross Girshick², Piotr Dollár², Zhuowen Tu¹, Kaiming He²
¹UC San Diego, ²Facebook AI Research
{s9xie,ztu}@ucsd.edu {rbg,pdollar,kaiminghe}@fb.com

Abstract
We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online¹.

1. Introduction
[Figure 1 diagram. Left: 256-d in; 256, 1x1, 64; 64, 3x3, 64; 64, 1x1, 256; + (shortcut); 256-d out. Right: 256-d in; 32 parallel paths in total, each 256, 1x1, 4; 4, 3x3, 4; 4, 1x1, 256; summed (+); + (shortcut); 256-d out.]
Figure 1. Left: A block of ResNet [14]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels).
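The "roughly the same complexity" claim in the caption can be checked by counting convolution weights for the two blocks, using the (# in channels, filter size, # out channels) triples shown in the figure. This is a back-of-the-envelope sketch that ignores biases and normalization parameters. The helper names are illustrative.

```python
def conv_params(c_in, kernel, c_out):
    """Weights in a kernel x kernel convolution, biases ignored."""
    return c_in * kernel * kernel * c_out


def block_params(layers, paths=1):
    """Weights in a block of `paths` identical branches of stacked convs."""
    return paths * sum(conv_params(*layer) for layer in layers)


# ResNet bottleneck: a single path, 64 channels wide.
resnet_block = block_params([(256, 1, 64), (64, 3, 64), (64, 1, 256)])

# ResNeXt block: 32 paths (cardinality 32), each 4 channels wide.
resnext_block = block_params([(256, 1, 4), (4, 3, 4), (4, 1, 256)], paths=32)

print(resnet_block, resnext_block)  # 69632 70144
```

Both blocks come to about 70k weights, so raising the cardinality to 32 while narrowing each path to 4 channels keeps the parameter budget nearly constant.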
…ing blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset. The robustness of VGG-nets and ResNets has been proven by various visual recognition tasks [7, 10, 9, 28, 31, 14] and by non-visual tasks involving speech [42, 30] and language [4, 41, 20].

Unlike VGG-nets, the family of Inception models [38, 17, 39, 37] have demonstrated that carefully designed …

arXiv:1611.05431v2 [cs.CV] 11 Apr 2017
Densely Connected Convolutional Networks
Gao Huang*, Cornell University, gh349@cornell.edu
Zhuang Liu*, Tsinghua University, liuzhuang13@mails.tsinghua.edu.cn
Laurens van der Maaten, Facebook AI Research, lvdmaaten@fb.com
Kilian Q. Weinberger, Cornell University, kqw4@cornell.edu
Abstract
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections (one between each layer and its subsequent layer), our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
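The connection count quoted in the abstract, and the way inputs accumulate when each layer receives all preceding feature-maps, can be illustrated with a small sketch. The growth-rate bookkeeping assumes each layer adds a fixed number of feature-maps to the concatenated input. The initial channel count `k0` and the function names are hypothetical.

```python
def direct_connections(num_layers):
    """Direct connections in a densely connected stack: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def dense_input_channels(k0, growth_rate, num_layers):
    """Input channels seen by each layer when all earlier outputs are concatenated.

    k0: channels entering the block (illustrative value, not from the source).
    growth_rate: feature-maps each layer contributes (k in the paper's notation).
    """
    return [k0 + growth_rate * i for i in range(num_layers)]


connections = direct_connections(5)           # versus 5 in a plain chain
channels = dense_input_channels(16, 4, 4)     # four layers, k = 4, k0 = 16
print(connections, channels)  # 15 [16, 20, 24, 28]
```

The input width grows linearly with depth inside a block, while the number of direct connections grows quadratically.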
1. Introduction
Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier.

[Figure 1 diagram: feature-maps x0 to x4 connected through layers H1 to H4.]
Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine several parallel layer sequences with different number of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

*Authors contributed equally
arXiv:1608.06993v5 [cs.CV] 28 Jan 2018
[Figure residue: tanh, ×, Σ, f(s_t), g(s_t), g′(s_t)]
[MIRU2018] Attention Branch Network Using the Characteristics of Global Average Pooling

[MIRU2018] Attention Branch Network Using the Properties of Global Average Pooling

  • 2. [Figure: network architecture diagrams compared by depth: an 8-layer network (label cut off); VGG, 19 layers (ILSVRC 2014); GoogLeNet, 22 layers (ILSVRC 2014); and a residual network from Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun, “Deep Residual Learning for Image Recognition”, arXiv 2015.]
  • 5. AUTHOR(S): LEARNING OF OCCLUSION-AWARE ATTENTION FOR PEDESTRIAN DETECTION

[…] outputting the classification scores using global average pooling or global max pooling from the feature map $f(\cdot)$. However, global average pooling raises the response value of the entire feature map for a specific class, because it averages all pixels of the feature map. Global max pooling, on the other hand, does not raise the response of the entire feature map for a specific class, because it uses only the maximum pixel value of the feature map. The response score for each class under global average pooling and global max pooling is calculated as in Eq. (1).

\[
v_i^c =
\begin{cases}
\dfrac{1}{M \times N} \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^c(x_i) & \text{(global average pooling)} \\
\max_{m,n} f_{m,n}^c(x_i) & \text{(global max pooling)}
\end{cases}
\tag{1}
\]

After the score for each class is output, the attention for the pedestrian and occlusion regions is generated. First, we fuse the multi-channel feature map into one channel. In this work, we validate three types of fusion, shown in Fig. 1(b)∼(d): 1) standard fusion, 2) softmax-weighting fusion, and 3) squeeze-and-excitation (SE) block fusion. Standard fusion is simply the summation of the feature maps. In softmax weighting, the feature map of each channel is weighted by its softmax score, as in Eq. (2). Softmax weighting can mask the unnecessary channel feature maps. In SE block fusion, the feature map of each channel is weighted by the attention of an SE block, as in the Squeeze-and-Excitation Network. After fusing to one channel, the pedestrian classification and occlusion state attentions are fused. In this work, we calculate the attention by subtracting the occlusion attention from the pedestrian classification attention. Here, we call this attention the attention map because it contains both positive and negative values.

\[
\mathrm{Attention}_i = \sum_{c=1}^{C} f^c(x_i) \ast \frac{\exp(v_i^c)}{\sum_{j=1}^{J} \exp(v_i^j)}
\tag{2}
\]

3.4 Perception branch

In the perception branch, the final result score is output using the attention map and the feature map of RoI pooling. The attention map can refine the feature map of RoI pooling, for example by masking unnecessary background features and enhancing the important locations. The converted map is the inner product of the attention map and the feature map from RoI pooling. The perception branch is composed of two fully connected layers, like Fast R-CNN. The structure of the perception branch is the same as in conventional Fast R-CNN; however, our model […]

Additional formulas shown on the slide: $t \log y + (1 - t) \log (1 - y)$; $v_i^c = \frac{1}{M \times N} \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^c(x_i)$; $v_i^1, v_i^2, v_i^3, v_i^c$; $f(x_i)$; $f(x_i, y_i)$; $M, N$; $C$.
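The scores of Eq. (1) and the softmax-weighted fusion of Eq. (2) on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the nested-list representation of feature maps, and the example values are hypothetical, and the softmax is assumed to run over the same C channels that are summed (that is, J = C).

```python
import math


def gap_score(fmap):
    """Eq. (1), global average pooling: mean of all M x N responses."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def gmp_score(fmap):
    """Eq. (1), global max pooling: the single largest response."""
    return max(v for row in fmap for v in row)


def softmax(scores):
    """Numerically stable softmax over the per-channel scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_weighted_fusion(fmaps, pool=gap_score):
    """Eq. (2): weight each channel map by the softmax of its score, then sum.

    Assumption: the softmax runs over the same channels being fused (J = C).
    """
    weights = softmax([pool(f) for f in fmaps])
    rows, cols = len(fmaps[0]), len(fmaps[0][0])
    return [
        [sum(w * f[m][n] for w, f in zip(weights, fmaps)) for n in range(cols)]
        for m in range(rows)
    ]


def attention_map(pedestrian, occlusion):
    """Subtract the occlusion attention from the pedestrian attention."""
    return [
        [p - o for p, o in zip(p_row, o_row)]
        for p_row, o_row in zip(pedestrian, occlusion)
    ]


# Hypothetical 2 x 2 feature maps for two channels (made-up values).
channels = [
    [[1.0, 3.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
fused = softmax_weighted_fusion(channels)
```

For the first example channel, the single strong response (3.0) is the global max pooling score, while global average pooling spreads it over all four positions and yields 1.0. Because the subtraction in `attention_map` can produce negative entries, the result carries both signs, as the excerpt notes.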
  • 6. Table 1. Classification error on the ILSVRC validation set.

Networks         top-1 val. error (%)   top-5 val. error (%)
VGGnet-GAP       33.4                   12.2
GoogLeNet-GAP    35.0                   13.2
AlexNet∗-GAP     44.9                   20.9
AlexNet-GAP      51.1                   26.3
GoogLeNet        31.9                   11.3
VGGnet           31.2                   11.4
AlexNet          42.6                   19.5
NIN              41.9                   19.6
GoogLeNet-GMP    35.6                   13.9

Table 2. Localization error on the ILSVRC validation set. Backprop refers to using [23] for localization instead of CAM.

Method                  top-1 val. error (%)   top-5 val. error (%)
GoogLeNet-GAP           56.40                  43.00
VGGnet-GAP              57.20                  45.14
GoogLeNet               60.09                  49.34
AlexNet∗-GAP            63.75                  49.53
AlexNet-GAP             67.19                  52.16
NIN                     65.47                  54.19
Backprop on GoogLeNet   61.31                  50.55
  • 7.
  • 8. $L_{all}(x) = E_{att}(x) + E_{per}(x)$ (figure labels: $E_{per}(x)$, $E_{att}(x)$)
  • 9.
  • 10. $g'(x) = (1 + M(x)) \cdot g(x)$ (figure labels: $g(x)$, $M(x)$, $g'(x)$)
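The formula on this slide, g′(x) = (1 + M(x)) · g(x), can be illustrated with a small Python sketch. It rests on assumptions the slide does not state: that g(x) is a stack of channel maps, that M(x) is a single-channel attention map shared by all channels, and that the product is element-wise. The function name and the values are hypothetical.

```python
def apply_attention(feature_maps, attention):
    """Return g'(x) = (1 + M(x)) * g(x), element-wise for every channel.

    Assumption: `attention` is one 2-D map applied to all channels.
    """
    return [
        [
            [(1.0 + attention[m][n]) * value for n, value in enumerate(row)]
            for m, row in enumerate(channel)
        ]
        for channel in feature_maps
    ]


# Hypothetical single-channel example (made-up values).
g = [[[2.0, 4.0]]]
M = [[0.0, 0.5]]
g_prime = apply_attention(g, M)
```

Where M(x) is zero the feature passes through unchanged, and where M(x) is positive the feature is amplified, so under this form the attention map cannot erase a feature on its own unless it takes the value −1.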
  • 11.
  • 12.
  • 13. Aggregated Residual Transformations for Deep Neural Networks
Saining Xie¹, Ross Girshick², Piotr Dollár², Zhuowen Tu¹, Kaiming He²
¹ UC San Diego, ² Facebook AI Research
{s9xie,ztu}@ucsd.edu {rbg,pdollar,kaiminghe}@fb.com
arXiv:1611.05431v2 [cs.CV] 11 Apr 2017

Abstract. We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.

Figure 1. Left: A block of ResNet [14]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels). [Diagram: the ResNet block maps a 256-d input through (256, 1x1, 64), (64, 3x3, 64), (64, 1x1, 256) to a 256-d output; the ResNeXt block uses 32 paths in total, each (256, 1x1, 4), (4, 3x3, 4), (4, 1x1, 256), summed to a 256-d output.]

1. Introduction (excerpt). […]ing blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset. The robustness of VGG-nets and ResNets has been proven by various visual recognition tasks [7, 10, 9, 28, 31, 14] and by non-visual tasks involving speech [42, 30] and language [4, 41, 20]. Unlike VGG-nets, the family of Inception models [38, 17, 39, 37] have demonstrated that carefully designed […]

Densely Connected Convolutional Networks
Gao Huang∗ (Cornell University, gh349@cornell.edu), Zhuang Liu∗ (Tsinghua University, liuzhuang13@mails.tsinghua.edu.cn), Laurens van der Maaten (Facebook AI Research, lvdmaaten@fb.com), Kilian Q. Weinberger (Cornell University, kqw4@cornell.edu)
∗ Authors contributed equally
arXiv:1608.06993v5 [cs.CV] 28 Jan 2018

Abstract. Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections (one between each layer and its subsequent layer), our network has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.

Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input. [Diagram labels: x0, x1, x2, x3, x4; H1, H2, H3, H4.]

1. Introduction. Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine several parallel layer sequences with different number of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.
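The DenseNet abstract on this slide states that a network of L layers has L(L+1)/2 direct connections when every layer takes all preceding feature-maps as input. A short Python check of that count follows; it is a sketch, not taken from the paper, and it assumes the network input counts as one of the preceding feature-maps. The function names are hypothetical.

```python
def dense_connections(num_layers):
    """Count connections when layer k receives k inputs.

    Assumption: the k inputs of layer k are the network input plus the
    outputs of the k - 1 earlier layers.
    """
    return sum(k for k in range(1, num_layers + 1))


def closed_form(num_layers):
    """The closed form L(L+1)/2 quoted in the abstract."""
    return num_layers * (num_layers + 1) // 2


# Compare the direct count with the closed form for a few depths.
counts = {L: (dense_connections(L), closed_form(L)) for L in (1, 4, 5, 100)}
```

A plain chain of the same depth has only L connections, so the gap grows quadratically: at L = 100 the dense count is 5050 against 100.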
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.