This presentation explains how to integrate an attention mechanism with CNN and LSTM models.
The paper carries out video classification using attention-based CNN-LSTM models.
(9th April 2021)
3. Introduction
> Traditional visual features
: color-based, shape-based, motion-based
> Hand-crafted features on machine learning
: support vector machines (SVM) and hidden Markov models (HMM)
> For image/video classification: Convolutional neural network (CNN)
> For temporal information: Long short-term memory (LSTM)
> For weighting the signal toward the most relevant information: Attention mechanism
>> CNN + LSTM with an attention mechanism (minimal sketch below)
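A minimal sketch of this combination, assuming a TensorFlow/Keras pipeline; the backbone choice (VGG19), frame count, hidden size, and class count are placeholder assumptions, not the paper's exact settings.

```python
# Minimal sketch (not the authors' exact code): a 2D CNN extracts per-frame
# features, a Bi-LSTM models temporal order, and a soft-attention layer
# weights the LSTM outputs before classification.
import tensorflow as tf
from tensorflow.keras import layers, Model

num_frames, num_classes = 30, 101              # placeholder values
frame_shape = (224, 224, 3)

# Frozen ImageNet backbone as the per-frame feature extractor.
backbone = tf.keras.applications.VGG19(include_top=False, pooling="avg",
                                       input_shape=frame_shape)
backbone.trainable = False

frames = layers.Input(shape=(num_frames,) + frame_shape)
feats = layers.TimeDistributed(backbone)(frames)                   # (B, T, 512)
seq = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(feats)

# Soft attention over time: score each step, normalize, take a weighted sum.
scores = layers.Dense(1)(seq)                                      # (B, T, 1)
weights = layers.Softmax(axis=1)(scores)
context = layers.Lambda(lambda x: tf.reduce_sum(x[0] * x[1], axis=1))([seq, weights])

outputs = layers.Dense(num_classes, activation="softmax")(context)
model = Model(frames, outputs)
```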
4. Attention Integrated Deep Networks
> 2D CNN: VGG16, VGG19, Inception V3, ResNet50, Xception
> LSTM: Bi-directional LSTM
> Attention: before LSTM, after LSTM (placement sketch after this list)
To extract relevant features that can represent individual video frames
To preserve information from both past and future
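The sketch on the previous slide places attention after the Bi-LSTM outputs; below is a hedged sketch of the other placement listed here, where per-frame CNN features are reweighted before entering the Bi-LSTM (Keras assumed, layer sizes are illustrative only).

```python
# Hedged sketch of the "attention before LSTM" placement (assumed Keras layers,
# not the authors' exact implementation).
import tensorflow as tf
from tensorflow.keras import layers

def attention_before_lstm(cnn_feats, lstm_units=256):
    """cnn_feats: (batch, time, feat_dim) per-frame CNN features."""
    scores = layers.Dense(1)(cnn_feats)                 # one score per frame
    weights = layers.Softmax(axis=1)(scores)            # normalize over time
    # Reweight the frames but keep the sequence so the Bi-LSTM sees every step.
    weighted = layers.Lambda(lambda x: x[0] * x[1])([cnn_feats, weights])
    # The final Bi-LSTM state summarizes the reweighted clip.
    return layers.Bidirectional(layers.LSTM(lstm_units))(weighted)
```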
5. Experiments
Network hyper-parameters
> Hidden units of LSTM: 64, 128, 256, 512
> The size of the dense layer for attention: set to the average number of video frames used
- longer video sequences: extra frames are discarded
- shorter video sequences: zero-padded (see the sketch after this list)
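A short sketch of the frame-count normalization described above, using NumPy; `target_len` stands in for the average frame count used as the fixed sequence length.

```python
import numpy as np

def fix_sequence_length(frames: np.ndarray, target_len: int) -> np.ndarray:
    """frames: (T, H, W, C) video clip. Truncate long clips, zero-pad short ones."""
    t = frames.shape[0]
    if t >= target_len:
        return frames[:target_len]                      # long clip: discard extra frames
    pad = np.zeros((target_len - t,) + frames.shape[1:], dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)        # short clip: zero padding
```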
Evaluation results
> Dataset
(1) UCF101: 13,320 videos (101 action categories)
(2) Sports-1M: 1 million YouTube videos (487 classes)
- select videos shorter than 20 seconds, covering 202 of the 487 classes
- keep only classes with more than 100 video files
- total: 18,319 video sequences (99 classes) >> Sports-1M-99 (selection sketched below)
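A hedged sketch of this subset selection, assuming a hypothetical metadata table (`sports1m_metadata.csv` with `duration_sec` and `label` columns); the slides do not give the actual selection script.

```python
import pandas as pd

meta = pd.read_csv("sports1m_metadata.csv")             # hypothetical metadata file

# Keep videos shorter than 20 seconds.
short = meta[meta["duration_sec"] < 20]

# Keep only classes that still have more than 100 videos.
counts = short["label"].value_counts()
keep = counts[counts > 100].index
sports1m_99 = short[short["label"].isin(keep)]          # ~99 classes, ~18k clips
```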
7. Summary
1. Applying attention to LSTM outputs achieves better accuracy
2. VGG19 is more suitable for integrating the attention block because of its lower-dimensional feature output
3. 2D CNN outperforms 3D CNN
> Integrating the attention mechanism into 2D CNNs and LSTM
for video classification