Many video analysis tasks require temporal localization to detect content changes. However, most existing models developed for these tasks are pre-trained on general video action classification tasks. This is because large-scale annotation of temporal boundaries in untrimmed videos is expensive; consequently, no suitable datasets exist for pre-training in a manner sensitive to temporal boundaries. In this paper, for the first time, we investigate model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task. Instead of relying on costly manual annotations of temporal boundaries, we propose to synthesize temporal boundaries from existing video action classification datasets. Given different ways of synthesizing boundaries, BSP can then be conducted simply in a self-supervised manner by classifying the boundary types. This yields video representations that transfer much better to downstream temporal localization tasks. Extensive experiments show that the proposed BSP is superior and complementary to the existing action classification-based pre-training, achieving new state-of-the-art performance on several temporal localization tasks.
Boundary-sensitive Pre-training for Temporal Localization in Videos (CVPR 2021 talk)
1. Mengmeng Xu (Frost), @SAIC-KAUST
Boundary-sensitive Pre-training for
Temporal Localization
An Application of Generic Event Boundary
Our team
Juan-Manuel Pérez-Rúa Victor Escorcia Brais Martinez Xiatian Zhu
Li Zhang Bernard Ghanem Tao Xiang (Tony)
Table of Contents
• Temporal Action Localization (TAL)
• Generic Event Boundary in TAL
• Boundary Synthesis and Pretraining
• Results and Visualizations
• Future Directions
Let’s Start By Defining The Task
Temporal Action Localization
When is the Activity Happening?
[(1:20, 1:32), (1:43, 1:59), …]
What Activity is Happening?
[Long Jump, Long Jump, …]
Let’s Start By Defining The Task
Temporal Action Localization
Input: a long untrimmed video
Output: temporally localized activities, e.g., "Polishing Furniture" with its start time and end time
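The input/output interface above can be sketched as a small data structure. This is an illustrative sketch only; the `ActionSegment` type, `to_seconds` helper, and the confidence scores are hypothetical and not part of the talk:

```python
# Illustrative sketch of the TAL task interface (names are hypothetical).
# Input: a long untrimmed video; output: scored temporal segments with labels.
from dataclasses import dataclass


@dataclass
class ActionSegment:
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    label: str     # predicted activity class
    score: float   # detection confidence, assumed in [0, 1]


def to_seconds(mm_ss: str) -> float:
    """Convert an 'M:SS' timestamp to seconds."""
    minutes, seconds = mm_ss.split(":")
    return 60 * int(minutes) + int(seconds)


# The example predictions from the earlier slide, with made-up scores:
predictions = [
    ActionSegment(to_seconds("1:20"), to_seconds("1:32"), "Long Jump", 0.91),
    ActionSegment(to_seconds("1:43"), to_seconds("1:59"), "Long Jump", 0.87),
]

for seg in predictions:
    print(f"{seg.label}: [{seg.start:.0f}s, {seg.end:.0f}s] (score {seg.score:.2f})")
```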
Temporal Action Localization
Our goal is to encourage the development of
automated systems to recognize and
localize human activities in videos
Generic Event Boundary in TAL
Examples in TAL datasets (action change)
Collected from ANET [4]
Generic Event Boundary in TAL
Examples in TAL datasets (new subject)
Collected from ANET [4]
Boundary Synthesis and Pretraining
[Figure: comparison of pre-training schemes. Vanilla pre-training: an encoder followed by a classifier over action labels (Long Jump, Zumba, Cricket, ...). BSP pre-training: two clips are concatenated and the encoder's classifier predicts the synthesized boundary type (same-class, diff-class, diff-speed). Integration of BSP: the pre-trained encoder supplies features as TAL input; figure adapted from G-TAD [5].]
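The BSP pretext task can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, splice rule, and the diff-speed resampling factor are all my own choices, while the three boundary classes come from the slide:

```python
# Hedged sketch of BSP boundary synthesis (not the paper's actual code).
# Two trimmed clips are spliced to create an artificial temporal boundary,
# and the self-supervised label is the boundary type from the slide:
# same-class, diff-class, or diff-speed.
import numpy as np

BOUNDARY_CLASSES = ["same-class", "diff-class", "diff-speed"]


def synthesize_boundary(clip_a, clip_b, boundary_type, rng):
    """Splice two clips (T x H x W x C frame arrays) at a random point.

    - same-class : clip_b is another clip of the same action as clip_a
    - diff-class : clip_b comes from a different action class
    - diff-speed : clip_b is clip_a resampled at a different frame rate
    """
    if boundary_type == "diff-speed":
        clip_b = clip_a[::2]  # e.g. 2x playback speed of the same clip
    t = rng.integers(1, len(clip_a))  # random splice point inside clip_a
    spliced = np.concatenate([clip_a[:t], clip_b], axis=0)
    label = BOUNDARY_CLASSES.index(boundary_type)
    return spliced, label


# Toy stand-ins for decoded video frames (16 frames of 8x8 RGB):
rng = np.random.default_rng(0)
clip_a = np.zeros((16, 8, 8, 3))
clip_b = np.ones((16, 8, 8, 3))
frames, label = synthesize_boundary(clip_a, clip_b, "diff-class", rng)
```

A classifier trained to predict `label` from `frames` never needs manual boundary annotations, which is the point of the BSP pretext task.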
• Backbone pre-training on the GEBD dataset
• Advanced integration of boundary encoding
• Boundary-aware TAL solutions
Future Directions
References
[1] Lin, Tianwei, et al. "BSN: Boundary Sensitive Network for Temporal Action Proposal Generation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[2] Shou, Mike Zheng, et al. "Generic Event Boundary Detection: A Benchmark for Event Segmentation." arXiv preprint arXiv:2101.10511 (2021).
[3] Zhao, Hang, et al. "HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[4] Caba Heilbron, Fabian, et al. "ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[5] Xu, Mengmeng, et al. "G-TAD: Sub-Graph Localization for Temporal Action Detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Hi everyone. I am Mengmeng Xu. Thanks for coming to my talk. I will introduce our submission to the second track of the LOVEU workshop.
Most recent work focuses only on the development of the TAL head. In this talk, I will show that we can also improve the pre-training of the video encoder.