2022-04-21
Sangmin Woo
Computational Intelligence Lab.
School of Electrical Engineering
Korea Advanced Institute of Science and Technology (KAIST)
Video Grounding
Benchmarks & Approaches
Video Grounding?
Video Grounding tries to determine the temporal boundaries
of the video moment corresponding to the given sentence [1].
Natural Language Video Localization is retrieving a
specific temporal segment, or moment, from a video given a
natural language text description [2].
Video Moment Retrieval aims to extract a video moment
from the untrimmed video that best matches the query [3].
[1] Zhang, Zhu, et al. "Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding." NeurIPS 2020
[2] Hendricks, Lisa Anne, et al. "Localizing moments in video with natural language." ICCV 2017
[3] Zeng, Yawen, et al. "Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval." CVPR 2021
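The three definitions above describe the same interface: given an untrimmed video and a sentence, return a (start, end) span in time. The sketch below illustrates that interface under stated assumptions. It assumes some model has already scored a set of candidate moments against the query (the scores are made-up placeholders), and it compares the selected span with an annotated one using temporal IoU, a measure commonly used for this purpose. The names `Moment`, `temporal_iou`, and `best_moment` are hypothetical and do not come from the cited papers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Moment:
    """A temporal segment of a video, in seconds (hypothetical helper type)."""
    start: float
    end: float


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two time spans; 0.0 if they do not overlap."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def best_moment(scored: List[Tuple[Moment, float]]) -> Moment:
    """Return the candidate moment with the highest query-matching score."""
    return max(scored, key=lambda pair: pair[1])[0]


# Placeholder scores standing in for a model's output (assumed values).
candidates = [
    (Moment(0.0, 5.0), 0.10),
    (Moment(4.0, 12.0), 0.85),
    (Moment(10.0, 20.0), 0.30),
]
prediction = best_moment(candidates)
ground_truth = Moment(5.0, 12.0)  # assumed annotation for the query
print(prediction, temporal_iou(prediction, ground_truth))
```

Here the predicted span overlaps the annotated one for 7 of the 8 seconds they jointly cover, so the temporal IoU is 0.875.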
Keywords
Referring Expressions Comprehension
Temporal Language Grounding
Natural Language Video Localization
Video Description Grounding
Phrase Localization
Natural Language Object Retrieval
Video Moment Retrieval
Temporal Video Grounding
All keywords (Grounding, Retrieval, Localization, Referring) are interchangeable!
Natural Language Video Localization
Hendricks, Lisa Anne, et al. "Localizing moments in video with natural language." ICCV 2017
Chen, Jingyuan, et al. "Localizing natural language in videos." AAAI 2019
Grounded Video Description
Zhou, Luowei, et al. "Grounded video description." CVPR 2019
Dense Events Grounding
Bao, Peijun, et al. "Dense Events Grounding in Video." AAAI 2021
Spatio-Temporal Video Grounding
Tang, Zongheng, et al. "Human-centric spatio-temporal video grounding with visual transformers." TCSVT 2021
Zhang, Zhu, et al. "Where does it exist: Spatio-temporal video grounding for multi-form sentences." CVPR 2020
Yamaguchi, Masataka, et al. "Spatio-temporal person retrieval via natural language queries." ICCV 2017
Applications?
Video Question Answering
Lei, Jie, et al. "TVQA+: Spatio-temporal grounding for video question answering." ACL 2020
Searching Videos in YouTube
"A man is holding a woman while the woman is spreading her arms at the front of the ship."
https://www.youtube.com/results?search_query=A+man+is+holding+a+woman+while+the+woman+is+spreading+her+arms+at+the+front+of+the+ship
Approaches
https://github.com/SCZwangxiao/Temporal-Language-Grounding-in-videos
https://github.com/TheShadow29/awesome-grounding
https://github.com/yawenzeng/Awesome-Cross-Modal-Video-Moment-Retrieval
https://github.com/iworldtong/Awesome-Temporal-Sentence-Grounding-in-Videos
Thank You
Sangmin Woo
sangminwoo.github.io
smwoo95@kaist.ac.kr
sangminwoo
