Using Dashcam Videos to
Anticipate Accidents
詹富翔 Fu-Hsiang Chan (NTHU EE)
向宇 Yu Xiang (Stanford CS)
陳玉亭 Yu-Ting Chen (NTHU EE)
孫民 Min Sun (NTHU EE)
VSLab
MOTIVATION
Google’s self-driving car has been involved in 12 minor accidents, mostly caused by other human drivers.
We use dashcam videos to anticipate corner cases (e.g., accidents).
Google self-driving car project monthly report (2015)
POPULATION AND MOTOR VEHICLE DENSITY

                              Taiwan   USA      Japan   Korea   Germany   UK
Area (1,000 km²)               36.2   9,831.5   377.9    99.9    357.1   243.6
Population density (per km²)    641       32     337     490      229     255
Motorbike density (per km²)     614       26     232     165      155     140
Vehicle density (per km²)       195       25     199     147      144     135

Source: Environmental Protection Statistics Yearbook, Republic of China (year 101, i.e., 2012), Table 8-1.
Faster-RCNN (Detection)
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object
detection with region proposal networks. In NIPS, 2015
(Example frame: Faster-RCNN detections of cars, persons, and motorbikes.)
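Faster R-CNN scores many overlapping region proposals and keeps only the best non-overlapping boxes via non-maximum suppression (NMS). A minimal NumPy sketch of that pruning step (function names and the 0.5 threshold are illustrative, not from the slides):

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against many; boxes are (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # the two heavily overlapping boxes collapse to one
```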
ANTICIPATING ACCIDENTS MODEL

• Spatial attention model

(Diagram: at time steps t, t+1, t+2, an attention module produces a weighted sum of the detected objects' features in each frame, which is fed into an RNN that carries information forward in time.)
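The spatial attention in the diagram can be sketched as a soft weighting of per-object features conditioned on the RNN's previous hidden state: score each object, softmax the scores, and take the weighted sum. A minimal NumPy version (dimensions and the weight matrix `W` are illustrative assumptions, not the slides' exact parameterization):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def attend(obj_feats, h_prev, W):
    """Soft spatial attention over K detected objects.

    obj_feats: (K, D) features of the objects in the current frame
    h_prev:    (H,)   previous RNN hidden state
    W:         (D, H) projection scoring each object against the state
    """
    scores = obj_feats @ (W @ h_prev)   # (K,) relevance of each object
    alpha = softmax(scores)             # attention weights, sum to 1
    context = alpha @ obj_feats         # (D,) weighted sum of object features
    return alpha, context

rng = np.random.default_rng(0)
K, D, H = 4, 8, 16
obj = rng.normal(size=(K, D))
h = rng.normal(size=H)
W = rng.normal(size=(D, H))
alpha, ctx = attend(obj, h, W)
# alpha sums to 1; ctx is the frame's attention-weighted object summary
```

The context vector `ctx` is what the "weighted sum" boxes in the diagram pass to the RNN at each time step.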
ANTICIPATING ACCIDENTS MODEL

• Exponential loss: on positive videos, the penalty on a low accident score grows exponentially as the frame approaches the accident, pushing the model's confidence up earlier in time.
Ashesh Jain, Hema S. Koppula, Bharad Raghavan, Shane Soh, and Ashutosh Saxena,
“Car that knows before you do: Anticipating maneuvers via learning temporal driving models,” in ICCV, 2015.
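The exponential-loss idea (confidence should rise as the accident approaches) can be sketched as an exponentially weighted cross-entropy over the frames of a positive clip; the exact weighting used in the slides' model may differ:

```python
import numpy as np

def anticipation_loss(probs, accident_frame):
    """Exponentially weighted cross-entropy for a positive video.

    probs: (T,) predicted accident probability at frames t = 0..T-1.
    Frames far from the accident are penalized lightly, frames close to
    it heavily: weight_t = exp(-(accident_frame - t)).
    """
    t = np.arange(len(probs))
    weights = np.exp(-np.maximum(accident_frame - t, 0))
    return float(-(weights * np.log(probs)).sum())

# Two models with similar confidence overall: the one whose confidence
# ramps up *before* the accident incurs the smaller loss.
late  = np.array([0.1, 0.1, 0.1, 0.9])   # confident only at the accident
early = np.array([0.1, 0.5, 0.8, 0.9])   # confidence rises beforehand
print(anticipation_loss(late, 3) > anticipation_loss(early, 3))  # True
```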
Finetune Faster-RCNN

• Training set: KITTI dataset + 58 additional videos
• Testing set: 165 positive videos from the testing set

(Bar chart: detection mAP on general photos vs. StreetView photos for detectors pretrained as in [1] and [2] and our fine-tuned Faster-RCNN; reported values: 29%, 35%, 27%, 15%, 35%, 28%.)

[1] M. Everingham et al., “The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results,” 2007.
[2] T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” in ECCV, 2014.
ANTICIPATING ACCIDENTS RESULTS

(Diagram: per-frame VGG (appearance) or IDT (motion) features feed a single-frame classifier (SFC) at each time step T, T+1, …; an RNN links the frames over time.)

Baselines compared:
• Single-frame classifier (SFC)
• Frame-based RNN
• Average attention
• Concatenating the frame feature with the average attention
• Attention on objects only
• Weighted sum of the frame feature with attention on objects
• Concatenating the frame feature with attention on objects
ANTICIPATING ACCIDENTS RESULTS
Our method anticipates accidents about 2 seconds before they occur
with 80% recall and 56.14% precision.
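The "about 2 seconds before" figure can be understood as a time-to-accident measurement: fix the score threshold that achieves the target recall, then measure how far before the accident frame each positive video first crosses it. A minimal NumPy sketch (the function name and the 20-fps assumption are illustrative):

```python
import numpy as np

def time_to_accident(frame_scores, accident_frame, threshold, fps=20.0):
    """Seconds between the first alarm and the accident (0 if no alarm fires)."""
    above = np.flatnonzero(frame_scores[:accident_frame + 1] >= threshold)
    if above.size == 0:
        return 0.0
    return (accident_frame - above[0]) / fps

scores = np.array([0.1, 0.2, 0.6, 0.7, 0.9])  # accident at the last frame
print(time_to_accident(scores, accident_frame=4, threshold=0.5))  # 0.1
```

Averaging this quantity over the positive test videos at the threshold giving 80% recall yields the reported anticipation time.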
RELATED WORK
B. Frohlich, M. Enzweiler, and U. Franke, “Will this car change the
lane? - turn signal recognition in the frequency domain,” in Intelligent
Vehicles Symposium (IV), 2014.
A. Doshi, B. Morris, and M. Trivedi, “On-road prediction of driver’s
intent with multimodal sensory cues,” IEEE Pervasive Computing, vol. 10,
no. 3, pp. 22–34, 2011.
Ashesh Jain, Avi Singh, Hema S Koppula, Shane Soh, and Ashutosh Saxena,
“Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in ICRA, 2016.
Ashesh Jain, Hema S. Koppula, Bharad Raghavan, Shane Soh, and Ashutosh Saxena,
“Car that knows before you do: Anticipating maneuvers via learning temporal driving models,” in ICCV, 2015.
Extracting Driving Behavior:
Global Metric Localization
from Dashcam Videos in the Wild
孫民 Min Sun (NTHU EE)
陳煥宗 Hwann-Tzong Chen (NTHU CS)
張劭平 (NTHU EE)
簡瑞霆 (NTHU CS)
王福恩 (NTHU EE)
楊尚達 (NTHU EE)
Above, you can see the two streams we use: appearance and motion features.
Along the time axis, an RNN passes the information that matters at each moment on to the next time step.
Within each frame, an attention model focuses on the objects that matter most.
At the bottom, an exponential loss drives the model's confidence higher the closer a frame is to the accident.