Attention boosted deep networks for video classification
Natural Language Processing Lab
M2020064
조단비
Published in: 2020 IEEE International Conference on Image Processing (ICIP)
URL: https://ieeexplore.ieee.org/abstract/document/9190996
Contents
1. Introduction
2. Attention Integrated Deep Networks
3. Experiments
4. Summary
Introduction
> Traditional visual features
: color-based, shape-based, motion-based
> Hand-crafted features with machine learning
: support vector machine (SVM) and hidden Markov model (HMM)
> For image/video classification: Convolutional neural network (CNN)
> For modeling temporal information: Long short-term memory (LSTM)
> For focusing on the most informative parts of the signal: Attention mechanism
>> CNN + LSTM with attention
Attention Integrated Deep Networks
> 2D CNN: VGG16, VGG19, Inception V3, ResNet50, Xception
: to extract relevant features that represent individual video frames
> LSTM: Bi-directional LSTM
: to preserve information from both past and future frames
> Attention: applied either before or after the LSTM (see the sketch below)
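Below is a minimal sketch of the "attention after LSTM" variant, written in Keras as an assumption (the slides do not name a framework): per-frame features from a frozen 2D CNN (VGG19 as an example) feed a bi-directional LSTM, and a learned softmax weighting over time pools the LSTM outputs before classification. NUM_FRAMES, HIDDEN, and the scalar-score attention formulation are illustrative choices, not the paper's exact configuration.

    # Minimal sketch: 2D CNN per-frame features -> Bi-LSTM -> attention over
    # the LSTM outputs -> softmax classifier. Sizes are illustrative assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG19

    NUM_FRAMES, NUM_CLASSES, HIDDEN = 40, 101, 256

    backbone = VGG19(weights="imagenet", include_top=False, pooling="avg")
    backbone.trainable = False                        # frozen frame feature extractor

    frames = layers.Input(shape=(NUM_FRAMES, 224, 224, 3))
    feats = layers.TimeDistributed(backbone)(frames)            # (B, T, 512)

    # Bi-directional LSTM keeps information from past and future frames.
    h = layers.Bidirectional(
        layers.LSTM(HIDDEN, return_sequences=True))(feats)      # (B, T, 2*HIDDEN)

    # Attention after the LSTM: score each time step, softmax over time,
    # then pool the LSTM outputs with the resulting weights.
    scores = layers.Dense(1)(h)                                  # (B, T, 1)
    alpha = layers.Softmax(axis=1)(scores)                       # (B, T, 1)
    context = layers.Lambda(
        lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([h, alpha])

    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(context)
    model = Model(frames, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])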
Experiments
Network hyper-parameters
> Hidden units of LSTM: 64, 128, 256, 512
> Size of the dense layer for attention: the average number of frames used per video
- longer video sequences: surplus frames are discarded
- shorter video sequences: zero-padded to that length (see the sketch below)
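A minimal sketch of this frame-count normalization, assuming each video arrives as a (num_frames, feature_dim) array and target_len is the average frame count over the training set (an assumption about how that number is obtained):

    import numpy as np

    def fix_length(frame_feats: np.ndarray, target_len: int) -> np.ndarray:
        """Truncate sequences longer than target_len; zero-pad shorter ones."""
        t, d = frame_feats.shape
        if t >= target_len:
            return frame_feats[:target_len]          # discard surplus frames
        pad = np.zeros((target_len - t, d), dtype=frame_feats.dtype)
        return np.concatenate([frame_feats, pad], axis=0)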
Evaluation results
> Dataset
(1) UCF101: 13,320 videos (101 action categories)
(2) Sports-1M: 1 million YouTube videos (487 classes)
- select videos shorter than 20 seconds, which fall into 202 of the 487 classes
- keep only classes with more than 100 such videos
- total: 18,319 video sequences over 99 classes >> Sports-1M-99 (see the sketch below)
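A rough sketch of the Sports-1M-99 subset selection, assuming each video is described by a (video_id, class_label, duration_seconds) record; this field layout is illustrative, not the released metadata format:

    from collections import Counter

    def build_sports1m_99(records, max_duration=20.0, min_videos=100):
        # Keep only videos shorter than 20 seconds.
        short = [r for r in records if r[2] < max_duration]
        # Keep only classes that still have more than 100 such videos.
        counts = Counter(label for _, label, _ in short)
        kept_classes = {c for c, n in counts.items() if n > min_videos}
        return [r for r in short if r[1] in kept_classes]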
> Train : Test = 7 : 3
> Evaluation metric: accuracy averaged over 10 runs (see the sketch below)
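A minimal sketch of this evaluation protocol: a random 7:3 train/test split repeated 10 times, with the reported score being the mean accuracy. The train_and_score() helper is a hypothetical stand-in for training the network above and computing test accuracy, not part of the paper's code.

    import numpy as np
    from sklearn.model_selection import train_test_split

    def evaluate(videos, labels, train_and_score, runs=10):
        accs = []
        for seed in range(runs):
            x_tr, x_te, y_tr, y_te = train_test_split(
                videos, labels, test_size=0.3, random_state=seed, stratify=labels)
            accs.append(train_and_score(x_tr, y_tr, x_te, y_te))
        return float(np.mean(accs))      # average accuracy over the 10 runs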
Summary
1. Applying attention to the LSTM outputs (i.e., after the LSTM) achieves better accuracy
2. VGG19 is more suitable for integrating the attention block because of its lower feature dimensionality
3. The 2D CNN based networks outperform 3D CNNs
> Integrating the attention mechanism into 2D CNNs and LSTMs is effective for video classification
Thank You.