In group activity recognition, the temporal dynamics of the whole activity can be inferred from the dynamics of the individual people performing it. We build a deep model to capture these dynamics using LSTMs. Building on this observation, we present a two-stage deep temporal model for group activity recognition: one LSTM models the action dynamics of each individual person in a sequence, and a second LSTM aggregates person-level information to recognize the whole activity. We evaluate our model on two datasets: the Collective Activity Dataset and a new volleyball dataset.
A Hierarchical Deep Temporal Model for Group Activity Recognition (CVPR16)
1. Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
Simon Fraser University
6. Hierarchical Modeling Approach
Understand person action:
- Track people in the scene
- Learn an action classifier
- <Bounding Box, Person Action> inputs
Understand group activity:
- Extract each person's representation
- Aggregate the people's representations
- Learn an activity classifier
- <Scene Representation, Scene Activity> inputs
8. Person Action Classifier
◼ Build a spatio-temporal representation of each person's actions.
◼ Track a manually annotated bounding box for each person over a fixed temporal window.
◼ Extract a deep visual representation of each tracked person using AlexNet's fc7 features.
◼ Feed the fc7 features to a person-level LSTM to model the temporal dimension.
◼ Extract spatio-temporal features per person from this LSTM.
9. Person Action Classifier
h_t is the feature vector extracted from the LSTM layer, representing the spatio-temporal features of a person at time t.
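The person-level stage above can be sketched with a minimal NumPy LSTM step: each frame's fc7 feature is fed to the recurrence, and the hidden state h_t after the last frame is that person's spatio-temporal representation. All dimensions are shrunk for illustration (the paper's fc7 features are 4096-d), and the weights are random stand-ins, not the authors' trained model.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step: x is the input, (h, c) the recurrent state.
    W, U, b hold the input/forget/output/candidate gate parameters stacked."""
    z = W @ x + U @ h + b                      # (4*hidden,)
    H = h.size
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i = sigmoid(z[:H])                         # input gate
    f = sigmoid(z[H:2*H])                      # forget gate
    o = sigmoid(z[2*H:3*H])                    # output gate
    g = np.tanh(z[3*H:])                       # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
fc7_dim, hidden, T = 8, 4, 10                  # toy sizes (assumed, not the paper's)
W = rng.standard_normal((4 * hidden, fc7_dim)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(T):                             # T = fixed temporal window per tracklet
    x_t = rng.standard_normal(fc7_dim)         # stand-in for AlexNet fc7 at frame t
    h, c = lstm_step(x_t, h, c, W, U, b)

# h is now h_T: the person's spatio-temporal feature for this window
print(h.shape)
```

Since h_t = o * tanh(c_t), every component of the person representation lies in (-1, 1); in the full model this vector (concatenated with fc7) feeds the group stage.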
10. Group Activity Classifier
◼ Build a spatio-temporal representation of the group activity in a given frame.
◼ Aggregate all individual person representations at every temporal step.
◼ Standard pooling operators (e.g. max/average pooling) are evaluated.
◼ Feed the aggregated representation to a group-level LSTM.
◼ Extract a spatio-temporal representation of the group activity from this top-level LSTM.
◼ Learn a softmax classifier on top of the group activity representation.
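The group stage can be sketched as: max-pool the person representations into one scene vector per timestep, aggregate the pooled sequence recurrently, then apply a softmax. A simple tanh recurrence stands in for the group-level LSTM here, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, num_people, person_dim = 10, 12, 6          # toy sizes (assumed)
person_feats = rng.standard_normal((T, num_people, person_dim))

# Step 1: pool person representations at every timestep (max pooling shown;
# the slides also mention average pooling as an alternative).
scene_seq = person_feats.max(axis=1)           # (T, person_dim)

# Step 2: recurrent aggregation over time. A plain tanh recurrence replaces
# the group-level LSTM purely for brevity.
hidden = 5
Wx = rng.standard_normal((hidden, person_dim)) * 0.1
Wh = rng.standard_normal((hidden, hidden)) * 0.1
h = np.zeros(hidden)
for t in range(T):
    h = np.tanh(Wx @ scene_seq[t] + Wh @ h)

# Step 3: softmax classifier on the group activity representation.
num_activities = 8                             # matches the volleyball dataset's 8 labels
Wc = rng.standard_normal((num_activities, hidden)) * 0.1
logits = Wc @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # distribution over group activities
print(probs.shape)
```

Max pooling makes the scene vector invariant to the number and ordering of people, which is why such simple operators work as the person-to-scene aggregator.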
15. Collective Activity Dataset
◼ Same label set for people and group activities
◼ 1925 video clips for training, 638 clips for testing
◼ Labels: 1. Crossing, 2. Queueing, 3. Talking, 4. Waiting, 5. Walking
16. Collective Activity Dataset
Method                                        Accuracy (%)
Contextual Model [Lan NIPS'10]                79.1
Deep Structured Model [Deng BMVC'15]          80.6
Our Model                                     81.5
Cardinality Kernel [Hajimirsadeghi CVPR'15]   83.4
17. New Volleyball Dataset
◼ 4830 annotated frames from 55 YouTube videos.
◼ 9 person-level labels and 8 group activity labels.
20. Summary
◼ A two-stage hierarchical model for group activity recognition
◼ LSTMs as a highly effective temporal model and temporal feature source
◼ Decent modeling of person relations with simple pooling
◼ Code & Dataset Link