Attention boosted deep networks for video classification
Natural Language Processing Lab
M2020064
조단비
Published in: 2020 IEEE International Conference on Image Processing (ICIP)
URL: https://ieeexplore.ieee.org/abstract/document/9190996
Contents
1. Introduction
2. Attention Integrated Deep Networks
3. Experiments
4. Summary
Introduction
> Traditional visual features
: color-based, shape-based, motion-based
> Hand-crafted features with machine learning
: support vector machine (SVM) and hidden Markov model (HMM)
> For image/video classification: Convolutional neural network (CNN)
> For modeling temporal information: Long short-term memory (LSTM)
> For focusing on the most informative parts of the signal: Attention mechanism
>> CNN + LSTM with attention
Attention Integrated Deep Networks
> 2D CNN: VGG16, VGG19, Inception V3, ResNet50, Xception
: to extract relevant features that represent individual video frames
> LSTM: Bi-directional LSTM
: to preserve information from both past and future frames
> Attention: applied either before or after the LSTM (see the sketch below)
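Below is a minimal sketch of the "attention after LSTM" variant, written in Keras as an assumption (the slides do not name a framework): per-frame features from a frozen 2D CNN (VGG19 as an example) feed a bi-directional LSTM, and a learned softmax weighting over time pools the LSTM outputs before classification. NUM_FRAMES, HIDDEN, and the scalar-score attention formulation are illustrative choices, not the paper's exact configuration.

    # Minimal sketch: 2D CNN per-frame features -> Bi-LSTM -> attention over
    # the LSTM outputs -> softmax classifier. Sizes are illustrative assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG19

    NUM_FRAMES, NUM_CLASSES, HIDDEN = 40, 101, 256

    backbone = VGG19(weights="imagenet", include_top=False, pooling="avg")
    backbone.trainable = False                        # frozen frame feature extractor

    frames = layers.Input(shape=(NUM_FRAMES, 224, 224, 3))
    feats = layers.TimeDistributed(backbone)(frames)            # (B, T, 512)

    # Bi-directional LSTM keeps information from past and future frames.
    h = layers.Bidirectional(
        layers.LSTM(HIDDEN, return_sequences=True))(feats)      # (B, T, 2*HIDDEN)

    # Attention after the LSTM: score each time step, softmax over time,
    # then pool the LSTM outputs with the resulting weights.
    scores = layers.Dense(1)(h)                                  # (B, T, 1)
    alpha = layers.Softmax(axis=1)(scores)                       # (B, T, 1)
    context = layers.Lambda(
        lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([h, alpha])

    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(context)
    model = Model(frames, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])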
Experiments
Network hyper-parameters
> Hidden units of LSTM: 64, 128, 256, 512
> Size of the dense layer for attention: the average number of frames used per video
- longer video sequences: surplus frames are discarded
- shorter video sequences: zero-padded to that length (see the sketch below)
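A minimal sketch of this frame-count normalization, assuming each video arrives as a (num_frames, feature_dim) array and target_len is the average frame count over the training set (an assumption about how that number is obtained):

    import numpy as np

    def fix_length(frame_feats: np.ndarray, target_len: int) -> np.ndarray:
        """Truncate sequences longer than target_len; zero-pad shorter ones."""
        t, d = frame_feats.shape
        if t >= target_len:
            return frame_feats[:target_len]          # discard surplus frames
        pad = np.zeros((target_len - t, d), dtype=frame_feats.dtype)
        return np.concatenate([frame_feats, pad], axis=0)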
Evaluation results
> Dataset
(1) UCF101: 13,320 videos (101 action categories)
(2) Sports-1M: 1 million YouTube videos (487 classes)
- select videos shorter than 20 seconds, which fall into 202 of the 487 classes
- keep only classes with more than 100 such videos
- total: 18,319 video sequences over 99 classes >> Sports-1M-99 (see the sketch below)
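A rough sketch of the Sports-1M-99 subset selection, assuming each video is described by a (video_id, class_label, duration_seconds) record; this field layout is illustrative, not the released metadata format:

    from collections import Counter

    def build_sports1m_99(records, max_duration=20.0, min_videos=100):
        # Keep only videos shorter than 20 seconds.
        short = [r for r in records if r[2] < max_duration]
        # Keep only classes that still have more than 100 such videos.
        counts = Counter(label for _, label, _ in short)
        kept_classes = {c for c, n in counts.items() if n > min_videos}
        return [r for r in short if r[1] in kept_classes]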
> Train : Test = 7 : 3
> Evaluation metric: accuracy averaged over 10 runs (see the sketch below)
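A minimal sketch of this evaluation protocol: a random 7:3 train/test split repeated 10 times, with the reported score being the mean accuracy. The train_and_score() helper is a hypothetical stand-in for training the network above and computing test accuracy, not part of the paper's code.

    import numpy as np
    from sklearn.model_selection import train_test_split

    def evaluate(videos, labels, train_and_score, runs=10):
        accs = []
        for seed in range(runs):
            x_tr, x_te, y_tr, y_te = train_test_split(
                videos, labels, test_size=0.3, random_state=seed, stratify=labels)
            accs.append(train_and_score(x_tr, y_tr, x_te, y_te))
        return float(np.mean(accs))      # average accuracy over the 10 runs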
Summary
1. Applying attention to the LSTM outputs (i.e., after the LSTM) achieves better accuracy
2. VGG19 is more suitable for integrating the attention block because of its lower feature dimensionality
3. The 2D CNN based networks outperform 3D CNNs
> Integrating the attention mechanism into 2D CNNs and LSTMs is effective for video classification
Thank You.