[Survey Paper] Research Trends in Pedestrian Detection Using Deep Learning

Chubu University, Machine Perception & Robotics Group
Hiroshi Fukui, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi

Pedestrian detection
• A technique that estimates the position and scale of pedestrians in an input image

Progress in pedestrian detection accuracy
[Figure: detection accuracy over time; periods shown: 2004-2005, 2009-2010]

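How pedestrians are detected
• Detection by combining feature extraction with a statistical learning method
– Representative example: Histogram of Oriented Gradients (HOG) features + Support Vector Machine (SVM) [Dalal 2005]
[Figure: input image → detection window → feature extraction → SVM classifier → classification result (pedestrian)]
[Dalal 2005] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", CVPR, 2005.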
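The feature-extraction-plus-classifier pipeline above can be sketched roughly as follows. This is a simplified illustration, not the exact descriptor of [Dalal 2005]: it builds per-cell histograms of unsigned gradient orientation and omits block normalization and bin interpolation. The function names, the cell size of 8 and the 9 bins are assumptions chosen for the example, and the weight vector `w` and bias `b` stand in for a trained linear SVM.

```python
import numpy as np

def hog_descriptor(window, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation."""
    window = window.astype(np.float64)
    gy, gx = np.gradient(window)                      # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    # Clamp to the last bin to guard against rounding at exactly 180 degrees.
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    rows, cols = window.shape[0] // cell, window.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            # Magnitude-weighted vote of every pixel in the cell.
            hist[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(), minlength=bins)
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-6)             # global L2 normalization

def linear_svm_score(desc, w, b):
    """Decision value of a linear SVM; positive values mean 'pedestrian'."""
    return float(desc @ w + b)
```

For a 128x64 detection window this yields 16 x 8 cells x 9 bins = 1,152 values; a detector would evaluate `linear_svm_score` on every window of a sliding-window scan.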

Progress in pedestrian detection accuracy
[Figure: detection accuracy over time; periods shown: 2004-2005, 2009-2010, 2013-2014, 2015-2016]

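Why did pedestrian detection performance improve so dramatically?
• The arrival of large-scale datasets
– Since 2009, datasets with tens of thousands of samples have appeared
– In addition to RGB images, LIDAR and stereo data have come into use
• Advances in deep convolutional neural networks
– Combination with channel-feature-based pedestrian detection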

Pedestrian detection datasets
• INRIA Dataset [Dalal 2004]
– Training samples: 1,568 (pedestrians: 1,208)
– Evaluation samples: 566
– Built from images taken with digital cameras
• Caltech Pedestrian Dataset [Dollár 2009]
– Training samples: 33,171 (pedestrians: 192,000)
– Evaluation samples: 4,024
– Built from images taken with a vehicle-mounted camera
• KITTI Dataset [Andreas 2012]
– Training and evaluation samples: 80,000 (pedestrians: 25,000)
– Built from images taken with a vehicle-mounted camera
– LIDAR, stereo, and GPS data are also released
[Andreas 2012] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite", CVPR, 2012.
[Dollár 2009] P. Dollár, C. Wojek, B. Schiele and P. Perona, "Pedestrian Detection: A Benchmark", CVPR, 2009.

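Toronto City Dataset
• A dataset positioned as the successor to the KITTI Dataset
– Contains more data than the KITTI Dataset
• Captures the whole city of Toronto with a variety of devices
– RGB images, aerial imagery (satellite, drone), LIDAR, panoramas, GPS
– Covers an area of 712 km², 8,439 km of roads, and 400,000 buildings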

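Advances in deep convolutional neural networks
• Triggered by the 2012 object recognition competition, CNNs came to be applied to pedestrian detection as well [Krizhevsky 2012]
– AlexNet achieved high-accuracy recognition over 1,000 object classes
[Figure: the AlexNet architecture and example AlexNet recognition results]
[Krizhevsky 2012] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS, 2012.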

How deep learning is applied to pedestrian detection
• Pedestrian detection with a two-stage detection structure
• Region-proposal-based pedestrian detection

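Pedestrian detection with a two-stage detection structure
• Combines a CNN with another classifier
– First-stage classifier: detects pedestrian candidate regions
– Second-stage CNN: takes the detected candidate regions as input and outputs the final classification result
[Figure: detection by the first-stage classifier → detection by the second-stage CNN]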
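A minimal sketch of the first-stage / second-stage split described above. The two scoring functions are placeholders supplied by the caller (for example a channel-feature detector and a CNN); their names, the box format and the thresholds `t1` and `t2` are assumptions made for this illustration only.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed format

def two_stage_detect(
    windows: List[Box],
    first_stage: Callable[[Box], float],
    second_stage: Callable[[Box], float],
    t1: float,
    t2: float,
) -> List[Tuple[Box, float]]:
    """Run a cheap detector on all windows, then a costly one on the survivors."""
    # Stage 1: a permissive threshold keeps recall high while discarding
    # most background windows.
    candidates = [b for b in windows if first_stage(b) >= t1]
    # Stage 2: only the remaining candidate regions reach the expensive model.
    detections = []
    for b in candidates:
        score = second_stage(b)
        if score >= t2:
            detections.append((b, score))
    return detections
```

Because the second stage sees only the candidates, its cost scales with the number of surviving windows rather than with the full sliding-window scan.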

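Effect of the two-stage detection structure
• Greatly reduces the enormous number of background patterns
– This reduces false detections
[Figure legend: background samples and pedestrian samples]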

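Pedestrian detectors used in the first stage
• The detector used differs by period
– Up to 2013: Color Self-Similarity + HOG + SVM
– From 2014: channel-feature-based pedestrian detectors
• What kind of detector is adopted as the first stage?
1. One that detects pedestrians without missing any (high recall)
2. One with a short detection time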

Types of channel features
• Channel features: features that treat LUV color, gradient magnitude, and HOG as channels of the image
[Figure: lineage of detectors, from HOG+SVM & DPM to channel features: ICF [Dollár 2009], VeryFast [Benenson 2012], SquaresChnFtrs [Benenson 2013], ACF [Dollár 2014], and the Filtered Channel Feature methods LDCF [Nam 2014] and Checkerboard [Zhang 2015]]
[Dollár 2009] P. Dollár, Z. Tu, P. Perona and S. Belongie, "Integral Channel Features", BMVC, 2009.
[Benenson 2012] R. Benenson, M. Mathias, R. Timofte and L. Van Gool, "Pedestrian detection at 100 frames per second", CVPR, 2012.
[Benenson 2013] R. Benenson, M. Mathias, T. Tuytelaars and L. Van Gool, "Seeking the strongest rigid detector", CVPR, 2013.
[Dollár 2014] P. Dollár, R. Appel, S. Belongie and P. Perona, "Fast feature pyramids for object detection", PAMI, 2014.
[Nam 2014] W. Nam, P. Dollár and J. H. Han, "Local Decorrelation For Improved Pedestrian Detection", NIPS, 2014.
[Zhang 2015] S. Zhang, R. Benenson and B. Schiele, "Filtered Channel Features for Pedestrian Detection", CVPR, 2015.

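Integral Channel Feature [Dollár 2009]
• A pedestrian detector based on channel features that add color and edge information to HOG features
– Uses boosted trees to select the features that are effective for pedestrian detection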
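In Integral Channel Features, each candidate feature is a sum over a rectangle of one channel, which an integral image makes available in constant time. The sketch below shows that building block only; the function names are chosen for this example, and the boosted-tree feature selection is not shown.

```python
import numpy as np

def integral_image(channel):
    """Summed-area table with a zero first row and column for easy indexing."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the channel over the rectangle [x, x+w) x [y, y+h) in four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```

A boosting stage can then evaluate many rectangle sums per window cheaply and keep only those that separate pedestrians from background.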

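VeryFast [Benenson 2012]
• Detects pedestrians in a single frame using models at multiple scales
– Conventional approach: apply a single model to the frame rescaled to multiple scales
• Generates a feature pyramid in order to build the models at multiple scales
– Uses Fast Feature Pyramid, which generates the feature pyramid quickly
– Fast because the number of feature extraction passes is greatly reduced
[Figure: "N/K models, 1 image scale" compared with "1 model, N image scales"]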

Aggregate Channel Feature [Dollár 2014]
• A pedestrian detector that uses channel features after an aggregation step
– The channel features used are the same as in ICF and VeryFast
[Dollár 2014] P. Dollár, R. Appel, S. Belongie and P. Perona, "Fast feature pyramids for object detection", PAMI, 2014.
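The aggregation step can be illustrated as summing each channel over non-overlapping blocks, which shrinks every channel by the block size. This is a sketch under assumptions: the function name is hypothetical, the block size of 4 follows the downsampling factor listed for ACF in the CCF comparison table later in the slides, and the smoothing applied in the original method is omitted.

```python
import numpy as np

def aggregate_channels(channels, block=4):
    """Sum a (C, H, W) channel stack over non-overlapping block x block cells."""
    c, h, w = channels.shape
    hb, wb = h // block, w // block
    cropped = channels[:, :hb * block, :wb * block]   # drop incomplete border cells
    # Split each spatial axis into (cells, block) and sum within every cell.
    return cropped.reshape(c, hb, block, wb, block).sum(axis=(2, 4))
```

The aggregated values themselves serve as the feature vector, so a 128x64 window with 10 channels gives 10 x 32 x 16 = 5,120 features.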

Filtered Channel Feature [Nam 2014] [Zhang 2015]
• Pedestrian detectors that apply various filters during the aggregation step
– LDCF: uses filters that remove the correlation between channel features
– Checkerboard: uses filters with checkerboard patterns
[Figure: filter banks from Figure 2 of [Zhang 2015]: (a) SquaresChnFtrs filters, (b) Checkerboards filters, (c) RandomFilters, (d) InformedFilters, (e) LDCF8 filters, (f) PcaForeground filters; the LDCF and Checkerboard banks are highlighted]

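CNNs used in the second stage
• The pedestrian candidate regions detected by the first stage are fed to the second-stage CNN, which outputs the final recognition result
• Grouping of the CNNs used in the second stage
1. Part-based pedestrian detection: higher accuracy
2. Pedestrian detection using CNN features: higher accuracy
3. Pedestrian detection with a cascade structure: higher speed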

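Grouping of two-stage detection methods
Method | Year | Miss rate (%) | Speed (fps) | ✔ marks (columns: parts / CNN features + classifier / cascade / scale handling) | Network model
Joint Deep Learning | 2013 | 39.32 | -- | ✔ ✔ | CNN + RBM
SDN | 2014 | 37.87 | 0.7 | ✔ | CNN + RBM
EIN | 2015 | 37.77 | 1 | | CNN
TACNN | 2015 | 34.99 | -- | | AlexNet
CCF | 2015 | 17.32 | -- | ✔ | VGG
Deep Cascade | 2015 | 26.21 | 15 | ✔ | VGG
DeepParts | 2015 | 11.89 | -- | ✔ | GoogLeNet
CompACT | 2015 | 11.75 | 2 | ✔ ✔ | CNN, VGGNet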

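1. Part-based pedestrian detection
• Trains the CNN on local regions of the pedestrian
– Representative methods: Joint Deep Learning, Switchable Deep Network, DeepParts
• Using part information during training extracts features that are more effective for pedestrian detection
– Multiple networks are often combined to obtain better part features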

Part information used by each method
Joint Deep Learning [Ouyang 2013]
• Learns local regions of the pedestrian
– The mask size differs at each level
– The masked feature maps are fed to an RBM for training
Switchable Deep Network [Luo 2013]
• Learns the pedestrian region divided into three parts
– An RBM selects which pedestrian regions are used for training
DeepParts [Tian 2015]
• Defines parts from occluded regions
– Part regions are defined from the occlusion labels of the Caltech dataset
– Parts are extracted on a grid basis
[Figure: network and part-prototype illustrations excerpted from the three papers]
[Luo 2013] P. Luo, Y. Tian, X. Wang and X. Tang, "Switchable Deep Network for Pedestrian Detection", CVPR, 2014.
[Tian 2015] Y. Tian, P. Luo, X. Wang and X. Tang, "Deep Learning Strong Parts for Pedestrian Detection", ICCV, 2015.
[Ouyang 2013] W. Ouyang and X. Wang, "Joint deep learning for pedestrian detection", ICCV, 2013.

2. Pedestrian detection using CNN features
• Convolutional Channel Features (CCF) [Yang 2015]
– A pedestrian detector that uses CNN feature maps together with boosted trees
– Builds the classifier on top of the ACF structure
– Achieves high accuracy in both pedestrian detection and face detection
[Figure: the CCF pipeline, Figure 1 of [Yang 2015]]
[Table from [Yang 2015]: miss rate by output layer]
Model | Output layer | #Output maps | Filter size | #Ds | Miss rate (%)
ACF | - | 10 | 3 | 4 | 41.22
LDCF | - | 40 | 7 | 4 | 38.66
ANet-s1 | conv1 | 96 | 11 | 4 | 61.65
ANet-s1 | conv2 | 256 | 5 | 4 | 51.52
ANet-s1 | conv3 | 384 | 3 | 4 | 43.73
ANet-s1 | conv4 | 384 | 3 | 4 | 48.37
ANet-s1 | conv5 | 256 | 3 | 4 | 53.37
VGG-16 | conv2-2 | 128 | 3 | 4 | 53.86
VGG-16 | conv3-3 | 256 | 3 | 4 | 31.28
VGG-16 | conv4-3 | 512 | 3 | 8 | 27.66
VGG-16 | conv5-3 | 512 | 3 | 16 | 51.52
[Yang 2015] B. Yang, J. Yan, Z. Lei and S. Z. Li, "Convolutional Channel Features: Tailoring CNN to Diverse Tasks", ICCV, 2015.

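3. Pedestrian detection with a cascade structure
• A cascade structure is introduced to increase speed
– Use multiple classifiers → Deep Cascade [Angelova 2015]
– Select which features to use → Complexity-Aware Cascade Training (CompACT) [Cai 2015]
• The goal shared by cascade-based pedestrian detectors
– Use the CNN as little as possible
→ Samples that are easy to classify are handled by a detector with low computational cost
→ Only samples that are hard to classify are passed to the CNN
[Cai 2015] Z. Cai, M. Saberian and N. Vasconcelos, "Learning Complexity-Aware Cascades for Deep Pedestrian Detection", ICCV, 2015.
[Angelova 2015] A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale and D. Ferguson, "Real-Time Pedestrian Detection With Deep Network Cascades", BMVC, 2015.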
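The "use the CNN as little as possible" idea can be sketched as an early-exit cascade that also tracks how much computation each sample consumed. The stage tuple layout, the function name and the cost values are illustrative assumptions, not taken from either paper.

```python
def run_cascade(sample, stages):
    """Evaluate stages ordered from cheap to expensive, stopping at the first rejection.

    stages: list of (score_fn, reject_below, cost) tuples.
    Returns (accepted, cost_spent).
    """
    spent = 0.0
    for score_fn, reject_below, cost in stages:
        spent += cost                      # this stage is evaluated, so pay for it
        if score_fn(sample) < reject_below:
            return False, spent            # easy negative: later stages never run
    return True, spent                     # survived every stage
```

With a cheap first stage of cost 1 and a CNN of cost 100, a background window rejected early costs 1 unit instead of 101, which is where the speed-up comes from when most windows are background.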

Deep Cascade [Angelova 2015]
• A pedestrian detector that combines VeryFast with a small CNN and a large CNN
– The fastest of the deep-learning-based pedestrian detectors surveyed (15 fps)
[Figure: cascade VeryFast → Tiny CNN → Baseline CNN, with each stage separating pedestrians from background; network diagrams of the Tiny CNN and Baseline CNN from [Angelova 2015]]

Region-proposal-based pedestrian detection
• Takes a single frame as input and estimates the positions of pedestrians
– No need for repeated classification over a raster (sliding-window) scan
• Representative methods
– Fast R-CNN [Girshick 2015]
– Faster R-CNN [Ren 2015]
– You Only Look Once [Redmon 2016]
– Single Shot MultiBox Detector [Liu 2016]
[Girshick 2015] R. Girshick, "Fast R-CNN", ICCV, 2015.
[Ren 2015] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS, 2015.
[Redmon 2016] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified, real-time object detection", CVPR, 2016.
[Liu 2016] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, "SSD: Single Shot MultiBox Detector", ECCV, 2016.

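Region-proposal-based pedestrian detection methods
Method | Year | Miss rate (%) | Speed (fps) | ✔ marks (columns: parts / CNN features + classifier / cascade / scale handling) | Network model
Fast R-CNN | 2015 | 12.86 | 3 | | Fast R-CNN
SA-FAST R-CNN | 2015 | 9.68 | 2.5 | ✔ | Fast R-CNN
Faster R-CNN | 2015 | 18.02 | 2 | | RPN
MS-CNN | 2016 | 10 | 2.5 | ✔ | RPN
RPN+BF | 2016 | 9.6 | 2 | ✔ ✔ | RPN
SSD | 2016 | 13.06 | 10 | ✔ | SSD
Fused DNN | 2016 | 8.2 | 0.5 | ✔ ✔ | SSD + FCN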

R-CNN
• An object detector that uses Selective Search and a CNN [Girshick 2014]
– Candidate regions detected by Selective Search are resized and fed to the CNN
– The final output is classified by an SVM using the CNN features
[Figure: Figure 1 of [Girshick 2014], "R-CNN: Regions with CNN features": 1. Input image → 2. Extract region proposals (~2k) → 3. Compute CNN features → 4. Classify regions; annotated as input image → detection by Selective Search → object recognition by CNN]
[Girshick 2014] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR, 2014.

Selective Search
• A segmentation-based (non-semantic) object detection method [Uijlings 2013]
– Outputs the final object locations by repeatedly merging an initial fine-grained segmentation
[Figure: initial segmentation → after merging]
[Uijlings 2013] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers and A. W. M. Smeulders, "Selective Search for Object Recognition", International Journal of Computer Vision, 2013.

Drawbacks of R-CNN (1 / 2)
[Figure: the R-CNN pipeline from Figure 1 of [Girshick 2014], input image → detection by Selective Search]
• The CNN must classify once for every detected candidate region
– About 2,000 CNN evaluations per image
– Each region is resized before it is fed to the CNN
– The dataset must be rebuilt to match Selective Search, and so on
→ Processing time and effort both increase

Drawbacks of R-CNN (2 / 2)
[Figure: the R-CNN pipeline from Figure 1 of [Girshick 2014], input image → detection by Selective Search]
• Selective Search is slow
– The sequence of candidate detection → cropping → CNN classification is complex
• Multiple classifiers must be prepared
→ Processing time increases

Drawbacks of R-CNN and the methods that address them
[Figure: the R-CNN pipeline from Figure 1 of [Girshick 2014] (1. Input image → 2. Extract region proposals (~2k) → 3. Compute CNN features → 4. Classify regions), annotated with the part improved by Fast R-CNN and the parts improved by Faster R-CNN]

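Fast R-CNN [Girshick 2015] & Faster R-CNN [Ren 2015]
• Fast R-CNN: performs feature extraction in a single pass over the image
• Faster R-CNN: performs both candidate region detection and object recognition in one CNN
– Introduces the Region Proposal Network (RPN)
[Figure: the Fast R-CNN and Faster R-CNN architectures]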
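Extracting features once is possible because each candidate region is pooled from the shared feature map into a fixed-size grid, rather than being cropped from the image and passed through the CNN separately. The following is a simplified sketch of that RoI max pooling; the function name, the integer feature-map coordinates and the bin-boundary rule are assumptions of this example and differ in detail from the published implementation.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """Max-pool one region of a (C, H, W) feature map to a fixed (C, oh, ow) grid.

    roi: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive, non-empty.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    c, h, w = region.shape
    oh, ow = out_size
    ys = np.linspace(0, h, oh + 1).astype(int)   # bin boundaries along height
    xs = np.linspace(0, w, ow + 1).astype(int)   # bin boundaries along width
    out = np.empty((c, oh, ow), dtype=feature_map.dtype)
    for i in range(oh):
        for j in range(ow):
            # Force each bin to cover at least one element for tiny regions.
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out
```

Every region yields the same output shape regardless of its size, so all regions can share the fully connected layers that follow.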

Pedestrian detection based on Fast R-CNN
• Scale-Aware Fast R-CNN (SAF R-CNN) [Li 2016]
– Introduces a training scheme for sub-networks that are specialized to the scale of the pedestrian
[Figure: the SAF R-CNN architecture from [Li 2016]: shared convolutional layers feed a large-size sub-network and a small-size sub-network (each with RoI pooling, fc6, fc7, cls_score and bbox_pred); a gate function driven by the proposal height produces Weight1 and Weight2 to combine their outputs]
[Li 2016] J. Li, X. Liang, S. Shen, T. Xu and S. Yan, "Scale-aware Fast R-CNN for Pedestrian Detection", ECCV, 2015.
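The role of the gate function can be illustrated with a small sketch in which the proposal height decides how much each sub-network contributes. The sigmoid form and the parameters `h_mean`, `alpha` and `beta` are assumptions made for illustration; the slide itself shows only that the two weights depend on the proposal height.

```python
import math

def gate_weights(height, h_mean=50.0, alpha=1.0, beta=10.0):
    """Return (w_large, w_small): tall proposals favor the large-size sub-network."""
    w_large = 1.0 / (1.0 + alpha * math.exp(-(height - h_mean) / beta))
    return w_large, 1.0 - w_large

def fuse_scores(score_large, score_small, height, **gate_kwargs):
    """Weighted combination of the two sub-network scores for one proposal."""
    w_large, w_small = gate_weights(height, **gate_kwargs)
    return w_large * score_large + w_small * score_small
```

A proposal at the reference height receives equal weights, while very tall or very short proposals are scored almost entirely by the matching sub-network.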

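Pedestrian detection based on Faster R-CNN (RPN)
• Multi-Scale CNN (MS-CNN) [Cai 2016]
– Handles pedestrian scale by introducing a cascade structure
• Output layers closer to the input layer detect pedestrians at smaller scales
[Figure: scale of the detected pedestrians, from small to large, across the output layers]
[Cai 2016] Z. Cai, Q. Fan, R. S. Feris and N. Vasconcelos, "A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection", ECCV, 2016.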

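Pedestrian detection based on Faster R-CNN (RPN)
• RPN + Boosted Forest (RPN+BF) [Zhang 2016]
– A boosted forest is trained on, and classifies, the features of the RPN anchor regions together with their detection scores
– Includes finer adjustments such as changing the layer over which the anchors are scanned
[Zhang 2016] L. Zhang, L. Lin, X. Liang and K. He, "Is Faster R-CNN Doing Well for Pedestrian Detection?", arXiv:1607.07032, 2016.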
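Anchor scanning in an RPN amounts to placing a fixed set of reference boxes at every feature-map position. The sketch below generates such anchors with a single pedestrian-like aspect ratio; the function name, the stride, the list of heights and the width/height ratio of 0.41 are illustrative assumptions rather than settings confirmed by the slides.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, heights, aspect_ratio=0.41):
    """Return (feat_h * feat_w * len(heights), 4) boxes as (x0, y0, x1, y1)."""
    # Anchor centers sit in the middle of each feature-map cell, in image pixels.
    cy, cx = np.meshgrid((np.arange(feat_h) + 0.5) * stride,
                         (np.arange(feat_w) + 0.5) * stride, indexing="ij")
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)        # (N, 2)
    h = np.asarray(heights, dtype=float)
    half = np.stack([h * aspect_ratio, h], axis=1) / 2.0         # (A, 2) half sizes
    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)
```

The RPN then predicts a score and box offsets for each anchor; in RPN+BF those scored regions are what the boosted forest re-classifies.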

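Single-shot methods
• Detect objects with one CNN in a single classification pass
– By contrast, Faster R-CNN uses one CNN but performs candidate region detection and object recognition as separate steps
• Representative methods
– You Only Look Once (YOLO)
– Single Shot MultiBox Detector (SSD)
[Figure: Faster R-CNN vs. YOLO and SSD]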

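You Only Look Once
• Outputs candidate regions and recognition results on a grid basis
– The network outputs, for each grid cell, (candidate region + score) × 2 and an object class prediction
– Fast because it does not need anchor scanning as Faster R-CNN does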
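The grid-based output can be decoded along the following lines. The tensor layout (two boxes of `x, y, w, h, score` followed by the class probabilities), the function name and the threshold are assumptions made for this sketch.

```python
import numpy as np

def decode_grid(pred, num_boxes=2, conf_thresh=0.5):
    """Decode an (S, S, num_boxes * 5 + C) prediction into detections.

    Returns (cx, cy, w, h, score, class_id) tuples in normalized image coordinates.
    """
    s = pred.shape[0]
    class_probs = pred[..., num_boxes * 5:]
    detections = []
    for row in range(s):
        for col in range(s):
            cls = int(np.argmax(class_probs[row, col]))
            p_cls = float(class_probs[row, col, cls])
            for b in range(num_boxes):
                x, y, w, h, conf = (float(v) for v in pred[row, col, b * 5:(b + 1) * 5])
                score = conf * p_cls          # box confidence times class probability
                if score >= conf_thresh:
                    # x and y are offsets inside the cell; w and h are image-relative.
                    detections.append(((col + x) / s, (row + y) / s, w, h, score, cls))
    return detections
```

One forward pass produces the whole tensor, so detection reduces to this decoding step followed by duplicate suppression.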

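Single Shot MultiBox Detector
• Outputs object locations and classification results from multiple convolutional layers
– Fast because candidate region detection and recognition are done in a single step
– Handles multiple scales by detecting objects from multiple convolutional layers
– Applied to pedestrian detection without modification, it reaches a miss rate of about 13%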

Fused DNN
• A pedestrian detector that combines SSD detections with segmentation [Du 2016]
– Proposes Soft-rejection based Network Fusion to merge the SSD detections with the segmentation result
• The best-performing method on the Caltech Pedestrian Dataset among those surveyed
[Du 2016] X. Du, M. El-Khamy, J. Lee and L. S. Davis, "Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection", arXiv:1610.03466, 2016.
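The idea of soft rejection is that a detection the segmentation disagrees with is down-weighted rather than discarded. The sketch below is one possible form of such a rule; the scaling formula and the `gain` and `floor` values are illustrative assumptions, not figures taken from the slides.

```python
import numpy as np

def soft_reject(score, box, mask, gain=4.0, floor=0.35):
    """Scale a detection score by how much of its box the pedestrian mask covers.

    box: (x0, y0, x1, y1) in pixels, end-exclusive; mask: boolean (H, W) array.
    """
    x0, y0, x1, y1 = box
    patch = mask[y0:y1, x0:x1]
    ratio = float(patch.mean()) if patch.size else 0.0   # covered fraction of the box
    # Well-supported boxes keep their score; unsupported ones are reduced
    # but never below `floor` times the original score.
    factor = min(1.0, max(gain * ratio, floor))
    return score * factor
```

Compared with a hard threshold on the mask overlap, a true pedestrian that the segmentation misses keeps a reduced score and can still be detected.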

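Conclusion: pedestrian detection up to 2016
• Datasets have grown in scale
• Methods now consider reducing processing cost as well as improving detection accuracy
– Cascade structures are introduced into two-stage detection
– Faster R-CNN and SSD are applied to pedestrian detection
Approach | Advantages | Disadvantages
Two-stage detection structure | Restricting the detection targets reduces false detections and classification time | The two stages increase classification time; the second-stage classifier is hard to tune
Region-proposal-based | Estimates pedestrian position and scale in a single detection pass | Handling pedestrian scale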

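What comes next for pedestrian detection?
• Larger datasets
– The more data there is, the better deep learning performs
– Data other than RGB images will be exploited
• Toronto City Dataset
– Captures the whole city of Toronto with a variety of devices
• RGB images, aerial imagery (satellite, drone), LIDAR, panoramas, GPS
• Covers an area of 712 km², 8,439 km of roads, and 400,000 buildings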

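What comes next for pedestrian detection?
• Will Region Proposal Networks and SSD become the norm?
– Many RPN-based methods have been proposed since the start of 2016
– A method that applies SSD to pedestrian detection has already been proposed: Fused DNN [Du 2016]
• Updated 2016.12.12
• What approaches will be added from here?
– Use of segmentation information
– Use of attribute information
– Use of computer-generated (CG) data
– Use of temporal (motion) information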

References cited in these slides
[Andreas 2012] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite", CVPR, 2012.
[Dollár 2009] P. Dollár, C. Wojek, B. Schiele and P. Perona, "Pedestrian Detection: A Benchmark", CVPR, 2009.
[Krizhevsky 2012] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS, 2012.
[Benenson 2013] R. Benenson, M. Mathias, T. Tuytelaars and L. Van Gool, "Seeking the strongest rigid detector", CVPR, 2013.
[Dollár 2014] P. Dollár, R. Appel, S. Belongie and P. Perona, "Fast feature pyramids for object detection", PAMI, 2014.
[Luo 2013] P. Luo, Y. Tian, X. Wang and X. Tang, "Switchable Deep Network for Pedestrian Detection", CVPR, 2014.
[Tian 2015] Y. Tian, P. Luo, X. Wang and X. Tang, "Deep Learning Strong Parts for Pedestrian Detection", ICCV, 2015.
[Ouyang 2013] W. Ouyang and X. Wang, "Joint deep learning for pedestrian detection", ICCV, 2013.
[Yang 2015] B. Yang, J. Yan, Z. Lei and S. Z. Li, "Convolutional Channel Features: Tailoring CNN to Diverse Tasks", ICCV, 2015.
[Cai 2015] Z. Cai, M. Saberian and N. Vasconcelos, "Learning Complexity-Aware Cascades for Deep Pedestrian Detection", ICCV, 2015.
[Girshick 2014] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR, 2014.
[Zhang 2016] L. Zhang, L. Lin, X. Liang and K. He, "Is Faster R-CNN Doing Well for Pedestrian Detection?", arXiv:1607.07032, 2016.
