Using Dashcam Videos to
Anticipate Accidents
詹富翔 Fu-Hsiang Chan (NTHU EE)
向宇 Yu Xiang (Stanford CS)
陳玉亭 Yu-Ting Chen (NTHU EE)
孫民 Min Sun (NTHU EE)
VSLab
MOTIVATION
Google’s self-driving car has been involved in 12 minor accidents, mostly caused by other human drivers.
We use dashcam videos to anticipate corner cases (e.g., accidents).
Google self-driving car project monthly report (2015)
POPULATION AND MOTOR VEHICLE DENSITY

                              Taiwan   USA      Japan   Korea   Germany   UK
Area (1,000 km²)               36.2   9,831.5   377.9    99.9    357.1   243.6
Population density (per km²)    641       32     337     490      229     255
Motorbike density (per km²)     614       26     232     165      155     140
Vehicle density (per km²)       195       25     199     147      144     135

Source: Environmental Protection Statistics Yearbook, Republic of China (year 101, i.e., 2012), Table 8-1.
Faster-RCNN (Detection)
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object
detection with region proposal networks. In NIPS, 2015
(Example frame: Faster-RCNN detections of cars, persons, and motorbikes.)
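Faster R-CNN scores many overlapping region proposals and keeps only the best non-overlapping boxes via non-maximum suppression (NMS). A minimal NumPy sketch of that pruning step (function names and the 0.5 threshold are illustrative, not from the slides):

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against many; boxes are (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # the two heavily overlapping boxes collapse to one
```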
ANTICIPATING ACCIDENTS MODEL

• Spatial attention model

(Diagram: at time steps t, t+1, t+2, an attention module produces a weighted sum of the detected objects' features in each frame, which is fed into an RNN that carries information forward in time.)
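The spatial attention in the diagram can be sketched as a soft weighting of per-object features conditioned on the RNN's previous hidden state: score each object, softmax the scores, and take the weighted sum. A minimal NumPy version (dimensions and the weight matrix `W` are illustrative assumptions, not the slides' exact parameterization):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def attend(obj_feats, h_prev, W):
    """Soft spatial attention over K detected objects.

    obj_feats: (K, D) features of the objects in the current frame
    h_prev:    (H,)   previous RNN hidden state
    W:         (D, H) projection scoring each object against the state
    """
    scores = obj_feats @ (W @ h_prev)   # (K,) relevance of each object
    alpha = softmax(scores)             # attention weights, sum to 1
    context = alpha @ obj_feats         # (D,) weighted sum of object features
    return alpha, context

rng = np.random.default_rng(0)
K, D, H = 4, 8, 16
obj = rng.normal(size=(K, D))
h = rng.normal(size=H)
W = rng.normal(size=(D, H))
alpha, ctx = attend(obj, h, W)
# alpha sums to 1; ctx is the frame's attention-weighted object summary
```

The context vector `ctx` is what the "weighted sum" boxes in the diagram pass to the RNN at each time step.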
ANTICIPATING ACCIDENTS MODEL

• Exponential loss: on positive videos, the penalty on a low accident score grows exponentially as the frame approaches the accident, pushing the model's confidence up earlier in time.
Ashesh Jain, Hema S. Koppula, Bharad Raghavan, Shane Soh, and Ashutosh Saxena,
“Car that knows before you do: Anticipating maneuvers via learning temporal driving models,” in ICCV, 2015.
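The exponential-loss idea (confidence should rise as the accident approaches) can be sketched as an exponentially weighted cross-entropy over the frames of a positive clip; the exact weighting used in the slides' model may differ:

```python
import numpy as np

def anticipation_loss(probs, accident_frame):
    """Exponentially weighted cross-entropy for a positive video.

    probs: (T,) predicted accident probability at frames t = 0..T-1.
    Frames far from the accident are penalized lightly, frames close to
    it heavily: weight_t = exp(-(accident_frame - t)).
    """
    t = np.arange(len(probs))
    weights = np.exp(-np.maximum(accident_frame - t, 0))
    return float(-(weights * np.log(probs)).sum())

# Two models with similar confidence overall: the one whose confidence
# ramps up *before* the accident incurs the smaller loss.
late  = np.array([0.1, 0.1, 0.1, 0.9])   # confident only at the accident
early = np.array([0.1, 0.5, 0.8, 0.9])   # confidence rises beforehand
print(anticipation_loss(late, 3) > anticipation_loss(early, 3))  # True
```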
Finetune Faster-RCNN

• Training set: KITTI dataset + 58 additional videos
• Testing set: 165 positive videos from the testing set

(Bar chart: detection mAP on general photos vs. StreetView photos for detectors pretrained as in [1] and [2] and our fine-tuned Faster-RCNN; reported values: 29%, 35%, 27%, 15%, 35%, 28%.)

[1] M. Everingham et al., “The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results,” 2007.
[2] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” in ECCV, 2014.
ANTICIPATING ACCIDENTS RESULTS

(Diagram: per-frame VGG (appearance) or IDT (motion) features feed a single-frame classifier (SFC) at each time step T, T+1, …; an RNN links the frames over time.)

Baselines compared:
• Single-frame classifier (SFC)
• Frame-based RNN
• Average attention
• Concatenating the frame feature with the average attention
• Attention on objects only
• Weighted sum of the frame feature with attention on objects
• Concatenating the frame feature with attention on objects
ANTICIPATING ACCIDENTS RESULTS
Our method anticipates accidents about 2 seconds before they occur
with 80% recall and 56.14% precision.
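The "about 2 seconds before" figure can be understood as a time-to-accident measurement: fix the score threshold that achieves the target recall, then measure how far before the accident frame each positive video first crosses it. A minimal NumPy sketch (the function name and the 20-fps assumption are illustrative):

```python
import numpy as np

def time_to_accident(frame_scores, accident_frame, threshold, fps=20.0):
    """Seconds between the first alarm and the accident (0 if no alarm fires)."""
    above = np.flatnonzero(frame_scores[:accident_frame + 1] >= threshold)
    if above.size == 0:
        return 0.0
    return (accident_frame - above[0]) / fps

scores = np.array([0.1, 0.2, 0.6, 0.7, 0.9])  # accident at the last frame
print(time_to_accident(scores, accident_frame=4, threshold=0.5))  # 0.1
```

Averaging this quantity over the positive test videos at the threshold giving 80% recall yields the reported anticipation time.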
RELATED WORK
B. Frohlich, M. Enzweiler, and U. Franke, “Will this car change the
lane? - turn signal recognition in the frequency domain,” in Intelligent
Vehicles Symposium (IV), 2014.
A. Doshi, B. Morris, and M. Trivedi, “On-road prediction of driver’s
intent with multimodal sensory cues,” IEEE Pervasive Computing, vol. 10,
no. 3, pp. 22–34, 2011.
Ashesh Jain, Avi Singh, Hema S Koppula, Shane Soh, and Ashutosh Saxena,
“Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in ICRA, 2016.
Ashesh Jain, Hema S. Koppula, Bharad Raghavan, Shane Soh, and Ashutosh Saxena,
“Car that knows before you do: Anticipating maneuvers via learning temporal driving models,” in ICCV, 2015.
Extracting Driving Behavior:
Global Metric Localization
from Dashcam Videos in the Wild
孫民 Min Sun (NTHU EE)
陳煥宗 Hwann-Tzong Chen (NTHU CS)
張劭平 (NTHU EE)
簡瑞霆 (NTHU CS)
王福恩 (NTHU EE)
楊尚達 (NTHU EE)
Above, you can see the two streams we use: appearance and motion features.
Along the time axis, an RNN passes the information that matters at each moment on to the next time step.
Within each frame, an attention model focuses on the objects that matter most.
At the bottom, an exponential loss drives the model's confidence higher the closer a frame is to the accident.