Mmai 2014 final

Naive Soul Guardian
Bloody Scenes Detection
with Deep Convolutional Neural Network
B99902080 李冠穎
R03944007 張人尹

Outline
● Motivation
● System Overview
o Convolutional Neural Network
o Fully-Convolutional Net
o Pixelation
● Experiment
● Future Work
● Reference
● Demo

Motivation
● Many videos contain bloody scenes; we want to protect kids from these inappropriate scenes
● Our system aims to detect and pixelate bloody scenes automatically

System Overview
● Videos → Decode → Frames
● Frames classified as 0 → Ignored frames
● Frames classified as 1 → Pixelated frames
● Ignored and pixelated frames → Encode → Pixelated videos
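The flow above can be sketched as a simple routing loop. This is only an illustration under assumptions: `process_frames`, `classify`, and `pixelate` are hypothetical names that do not come from the slides, and the sketch assumes that ignored frames are passed through unchanged so that the output video keeps every frame.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")  # any decoded-frame representation


def process_frames(
    frames: Sequence[Frame],
    classify: Callable[[Frame], int],    # hypothetical: returns 0 (background) or 1 (bloody)
    pixelate: Callable[[Frame], Frame],  # hypothetical: returns a pixelated copy
) -> List[Frame]:
    """Route each decoded frame by its predicted label."""
    output = []
    for frame in frames:
        if classify(frame) == 1:
            # Bloody frame: pixelate it before encoding.
            output.append(pixelate(frame))
        else:
            # Background frame: ignored, i.e. kept as-is (assumption).
            output.append(frame)
    return output
```

Keeping the classifier and the pixelation step as injected callables makes it easy to swap in a real model and filter later without changing the routing logic.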

Convolutional Neural Network
● Fine-tune the pre-trained CaffeNet (ImageNet)
o Human-labeled frames without bounding boxes
● Predict decoded frames
o Background (0) → ignored frames
o Bloody frame (1) → fully-convolutional net

Fully-Convolutional Net
● Classify each 227 × 227 box with stride 32 on a 451 × 451 image
● Generate an 8 × 8 classification map
o Interpolate probabilities to obtain the heat map
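The 8 × 8 map size follows from the window arithmetic: (451 − 227) / 32 + 1 = 8 positions per axis. The sketch below checks that number and shows one possible way to interpolate the map into a denser heat map. The slides do not say which interpolation was used, so bilinear interpolation is an assumption here, and `map_size` and `bilinear_resize` are hypothetical helper names.

```python
import numpy as np


def map_size(image_size: int, window: int, stride: int) -> int:
    """Number of window positions along one axis of a sliding-window scan."""
    return (image_size - window) // stride + 1


def bilinear_resize(prob_map: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly interpolate a 2-D probability map to (out_h, out_w)."""
    in_h, in_w = prob_map.shape
    # Sample positions in input coordinates; corners map to corners.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical blend weights
    wx = (xs - x0)[None, :]  # horizontal blend weights
    top = prob_map[y0][:, x0] * (1 - wx) + prob_map[y0][:, x1] * wx
    bottom = prob_map[y1][:, x0] * (1 - wx) + prob_map[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

For example, `bilinear_resize(prob_map, 451, 451)` would stretch an 8 × 8 map of per-window probabilities back to the input resolution.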

Pixelation
● Resize the heat map to the frame size
● Based on the heat map, blur frames with a Gaussian filter
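A minimal sketch of heat-map-guided blurring on a grayscale frame, using NumPy only. The slides do not specify how the heat map selects the blurred region, so the `threshold` parameter, the default `sigma`, and the function names are all assumptions for illustration. The heat map is assumed to be already resized to the frame size.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel covering about three sigmas per side."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(gray: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding; output has the input shape."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    padded = np.pad(gray.astype(float), radius, mode="edge")
    # Blur along rows, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def blur_by_heat_map(gray, heat, sigma=5.0, threshold=0.5):
    """Replace pixels whose heat value reaches the threshold (assumed rule)
    with their blurred counterparts; leave the rest untouched."""
    blurred = gaussian_blur(gray, sigma)
    return np.where(heat >= threshold, blurred, gray.astype(float))
```

A soft blend such as `heat * blurred + (1 - heat) * gray` would be an alternative that avoids hard edges around the blurred region.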

Experiment (I)
● Run on cml21
● Decoding/Encoding done by FFmpeg
● Decoded frames as training/validation data
o Pos = Segments from Saw 1, 2, 3, 7, Final Destination 4, 5…… + images crawled from Google Images
o Neg = Segments from The Big Bang Theory S8E11…… + part of ILSVRC 2013 val/test
o Random sample Pos : Neg = 2500 : 2500
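The balanced 2500 : 2500 sampling could look roughly like the following. This is a sketch only: the function name, the seed, and the use of file paths as items are assumptions, not details taken from the slides.

```python
import random


def balanced_sample(pos, neg, n_per_class=2500, seed=0):
    """Draw the same number of positive and negative items without
    replacement and return shuffled (item, label) pairs."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    sample = [(p, 1) for p in rng.sample(pos, n_per_class)]
    sample += [(n, 0) for n in rng.sample(neg, n_per_class)]
    rng.shuffle(sample)
    return sample
```

`pos` and `neg` must be lists with at least `n_per_class` items each; `random.sample` raises `ValueError` otherwise.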

Experiment (II)
● Classification Accuracy
o 73.46%

Experiment (III)
● Time (sec) to process a video clip
Clip (frames, resolution)      Decoding  Classification  Heat map  Pixelation  Encoding  Average time (sec/frame)
Saw6 (139 frames, 720x404)     0.34      41.18           22.99     72.43       0.02      0.99
CWL (109 frames, 1280x720)     0.79      36.95           0         0           1.24      0.36
FD5 (121 frames, 1024x576)     0.44      36.23           3.87      28.72       0.81      0.58
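The average-time column is consistent with the sum of the five stage times divided by the frame count. The short check below reproduces it from the numbers in the table; the variable and function names are illustrative only.

```python
# (frame count, [decoding, classification, heat map, pixelation, encoding]) in seconds
timings = {
    "Saw6": (139, [0.34, 41.18, 22.99, 72.43, 0.02]),
    "CWL": (109, [0.79, 36.95, 0.0, 0.0, 1.24]),
    "FD5": (121, [0.44, 36.23, 3.87, 28.72, 0.81]),
}


def seconds_per_frame(frames: int, stage_seconds: list) -> float:
    """Total processing time across all stages divided by the frame count."""
    return sum(stage_seconds) / frames


averages = {
    clip: round(seconds_per_frame(frames, stages), 2)
    for clip, (frames, stages) in timings.items()
}
```

For the CWL clip the heat map and pixelation stages take no time, which matches a clip in which no frame was classified as bloody.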

Future Work
● Train our model with more diverse data to increase accuracy and reduce false positives
● Accelerate blurring and smooth the boundaries of blurred regions
● Deploy on surveillance cameras for security
● Combine shot detection and motion vectors to reduce computation

Reference
● Caffe | Deep Learning Framework
○ http://caffe.berkeleyvision.org/
○ Classifying ImageNet: the instant Caffe way
○ Net Surgery for a Fully-Convolutional Model
● FFmpeg
○ https://www.ffmpeg.org/
● ImageNet
○ http://www.image-net.org/
● Tutorials by Hsinfu, Shiro, Jocelyn

Finally,
I wanna play a…
Q & A game
