SlideShare a Scribd company logo
1 of 22
Download to read offline
TRECVID 2022 DEEP VIDEO UNDERSTANDING
INTRODUCTION AND TASK OVERVIEW
Keith Curtis
National Institute of Standards and Technology
George Awad
National Institute of Standards and Technology
Disclaimer: Certain commercial entities, equipment, or materials
may be identified in this document in order to describe an
experimental procedure or concept adequately. Such identification is
not intended to imply recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that the entities, materials, or equipment are necessarily the
best available for the purpose. The views and conclusions contained
herein are those of the authors and should not be interpreted as
necessarily representing the official policies or endorsements, either
expressed or implied, of NIST, or the U.S. Government.
Table of Contents
• Task Goals & Definition
• Data
• Annotation Framework
• Topics (Queries)
• Participating Teams
• Evaluation and Results
• General Observation
Task Goals
• Analyze long duration videos holistically.
• Exploit all available modalities (audio, video, image, & text) to
analyze both visual and non-visual elements.
• As the movies domain data can simulate the real world, many
lessons learned are expected to benefit different kinds of
applications
Task Definition
▪ Given:
▪ Whole original movie (e.g 1.5 - 2hrs long)
▪ Image snapshots of main entities (persons, locations, and concepts) per movie
▪ Ontology of relationships, interactions, locations, and sentiments used to
annotate each movie at global movie-level (relationships between entities) as well
as on fine-grained scene-level (scene sentiment, interactions between characters,
and locations of scenes)
▪ Generate a knowledge-base of the main actors and their relations (such as family,
work, social, etc.) over the whole movie, and of interactions between them over the
scene level.
▪ The task supported two auto generated query types on the movie-level as well as on
scene-level per movie.
Data
• Training Set comprised of 14 Creative Commons (CC) licensed movies.
• Long duration videos with a self-contained storyline.
• Videos range from 18 minutes in length to 109 minutes.
• Test Set comprised of 6 movies licensed from Kinolorber*.
• Long duration videos with a self-contained storyline.
• Videos range from 79 minutes in length to 92 minutes.
*https://kinolorberedu.com/
Annotation Framework
• Movies are first divided into scenes.
• A set of dedicated annotators were hired to work with us on the annotation framework[1].
• Annotators watch full movies, isolate and take images of main characters, places, &
concepts. Draw Knowledge Graph (KG) of full movie using yEd* graphing tool.
• Annotators watch individual scenes, and draw KG over the scene level recording
interactions between characters, chronological order of such, scene sentiments,
relationships, character’s emotional states, and a natural language description.
[1] Loc, E., Curtis, K., Awad, G., Rajput, S., & Soboroff, I. (2022). Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding. P-VLAM, 12.
* https://www.yworks.com/products/yed
Annotation: Movie-level
• KG annotates relations between main characters, locations, and entities.
• XGML graph file is processed later for query generation
Annotation: Scene-level
• KG annotates interactions,
attributes, and relations
between characters.
• Natural language text
descriptions are also
provided for each scene.
Queries: Movie-level
1. Question Answering (Required): This query type represents questions on the
resulting KG in the form of multiple-choice questions.
2. Fill in the graph space (Optional): Given a list of people, entities, and/or
relationships for certain nodes, where some nodes are replaced by variables X, Y,
etc., solve for X, Y etc.
Queries: Scene-level
1. Find the next or previous interaction (Required): Given a scene number a, and an
interaction i between two characters x & y, what is the immediate next or previous
interaction, in scene b, between x and y?
2. Find the unique scene (Optional): Given a full, inclusive list of interactions, unique
to a specific scene in the movie, teams should find which scene this is.
Query Samples: Movie-level
Works at ?
Manny
Visual modality helps
to answer the query
**All images are under CC license
Query Samples: Scene-level
Jack Pam
Scene 19
Audio modality helps
to answer the query
** Images and video clip are under CC license
Metrics
Movie-Level
• Question answering : correct answers/total questions.
• Fill in Graph questions : Mean Reciprocal Rank (MMR).
Scene-Level
• Next / Previous interaction questions : correct answers/total questions..
• Find unique scene : Mean Reciprocal Rank (MMR).
DVU 2022: 3 Finishers (out of 13)
Team Organization
Adapt ADAPT Research Centre, Dublin City University
columbia_graphen Graphen, Inc., Columbia University
WHU_NERCMS
National Engineering Research Center for Multimedia
Software, Wuhan University, Wuhan City, Hubei Province,
China
Results Summary by run
Performance
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
ADAPT Columbia_1 Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2
Scene-Level
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
35.0%
40.0%
45.0%
Columbia_2 Columbia_1 WHU_NERCMS_1 WHU_NERCMS_2
Movie-Level
Results by query types : Movie-Level
Performance
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
Columbia_2 Columbia_1 WHU_NERCMS_1 WHU_NERCMS_2
Question Answering
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
WHU_NERCMS_1 Columbia_1 Columbia_2 WHU_NERCMS_2
Fill in Graph Space
Results by query types : Scene-Level
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
ADAPT Columbia_1 WHU_NERCMS_1 Columbia_2 WHU_NERCMS_2
Performance
Find Unique Scene
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
Columbia_1 ADAPT Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2
Next/Previous Interaction
Results by runs
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
70.0%
Calloused_Hands ChainedForLife Liberty_Kid like_me little_rock losing_ground
movie-level results by run
Columbia_1 Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2
Results by runs
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
Calloused_Hands ChainedForLife Liberty_Kid like_me little_rock losing_ground
scene-level results by run
ADAPT Columbia_1 Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2
Conclusions
1. Submissions for movie-level queries scored much higher than for
scene-level queries, indicating that the movie-level queries were
easier or movie-level algorithms were more successful than the
scene algorithms.
2. Movie-level Question Answering queries scored higher than Fill in
the Graph Space queries, also indicating that these were slightly
easier.
3. Scene-level Find Next or Previous Interaction queries scored close
results to Find the Unique Scene queries.
4. Queries for the movie ‘Calloused Hands’ scored higher than any
other movie. ‘Liberty Kid’ was among the lowest scoring.
Conclusions
5. 13 teams registered for this year’s task. Out of these, 3 teams
submitted runs. We would wish to see an improvement on this in
subsequent years.
6. Improvements to the dataset for this task is an ongoing process,
test-set movies from this year’s task will be added to the training-
set for next year’s task, and additional new movies will be
annotated and used for the test-set.

More Related Content

Similar to Multimedia Understanding at TRECVID 2022

Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionExplaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionVasileiosMezaris
 
Action_recognition-topic.pptx
Action_recognition-topic.pptxAction_recognition-topic.pptx
Action_recognition-topic.pptxcomputerscience98
 
Sentiment Analysis.pptx
Sentiment Analysis.pptxSentiment Analysis.pptx
Sentiment Analysis.pptxsherifsalem24
 
TAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AITAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AIYi-Shin Chen
 
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...Goergen Institute for Data Science
 
Dependency Injection in .NET applications
Dependency Injection in .NET applicationsDependency Injection in .NET applications
Dependency Injection in .NET applicationsBabak Naffas
 
Video Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyVideo Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyIRJET Journal
 
IceBreaker Solving Cold Start Problem For Video Recommendation Engines
IceBreaker  Solving Cold Start Problem For Video Recommendation EnginesIceBreaker  Solving Cold Start Problem For Video Recommendation Engines
IceBreaker Solving Cold Start Problem For Video Recommendation EnginesJamie Boyd
 
Autom editor video blooper recognition and localization for automatic monolo...
Autom editor  video blooper recognition and localization for automatic monolo...Autom editor  video blooper recognition and localization for automatic monolo...
Autom editor video blooper recognition and localization for automatic monolo...Carlos Toxtli
 
Branded Media Panel Presentation
Branded Media Panel PresentationBranded Media Panel Presentation
Branded Media Panel Presentationcourtherm
 
AAMAS-2006 TANDEM Design Method (poster format)
AAMAS-2006 TANDEM Design Method (poster format)AAMAS-2006 TANDEM Design Method (poster format)
AAMAS-2006 TANDEM Design Method (poster format)Steve Goschnick
 
2015 2016 assignment brief 1,16,2, 22 final
2015   2016 assignment brief 1,16,2, 22 final2015   2016 assignment brief 1,16,2, 22 final
2015 2016 assignment brief 1,16,2, 22 finalctkmedia
 
So you want your students to produce digital video: some practical guidance
So you want your students to produce digital video: some practical guidanceSo you want your students to produce digital video: some practical guidance
So you want your students to produce digital video: some practical guidanceChris Willmott
 
Military Police Host Nation Policing Simulation: A Case Study into Designing ...
Military Police Host Nation Policing Simulation: A Case Study into Designing ...Military Police Host Nation Policing Simulation: A Case Study into Designing ...
Military Police Host Nation Policing Simulation: A Case Study into Designing ...Ronald Punako, Jr.
 
TRECVID 2019 Video to Text
TRECVID 2019 Video to TextTRECVID 2019 Video to Text
TRECVID 2019 Video to TextGeorge Awad
 
Coursework Introduction
Coursework IntroductionCoursework Introduction
Coursework IntroductionATith
 
Video production pedagogy
Video production pedagogyVideo production pedagogy
Video production pedagogyChris Willmott
 
Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022George Awad
 

Similar to Multimedia Understanding at TRECVID 2022 (20)

Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionExplaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attention
 
Action_recognition-topic.pptx
Action_recognition-topic.pptxAction_recognition-topic.pptx
Action_recognition-topic.pptx
 
Sentiment Analysis.pptx
Sentiment Analysis.pptxSentiment Analysis.pptx
Sentiment Analysis.pptx
 
TAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AITAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AI
 
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...
 
Dependency Injection in .NET applications
Dependency Injection in .NET applicationsDependency Injection in .NET applications
Dependency Injection in .NET applications
 
Video Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyVideo Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: Survey
 
D25014017
D25014017D25014017
D25014017
 
IceBreaker Solving Cold Start Problem For Video Recommendation Engines
IceBreaker  Solving Cold Start Problem For Video Recommendation EnginesIceBreaker  Solving Cold Start Problem For Video Recommendation Engines
IceBreaker Solving Cold Start Problem For Video Recommendation Engines
 
Autom editor video blooper recognition and localization for automatic monolo...
Autom editor  video blooper recognition and localization for automatic monolo...Autom editor  video blooper recognition and localization for automatic monolo...
Autom editor video blooper recognition and localization for automatic monolo...
 
Branded Media Panel Presentation
Branded Media Panel PresentationBranded Media Panel Presentation
Branded Media Panel Presentation
 
Predicting Movie Success Using Neural Network
Predicting Movie Success Using Neural NetworkPredicting Movie Success Using Neural Network
Predicting Movie Success Using Neural Network
 
AAMAS-2006 TANDEM Design Method (poster format)
AAMAS-2006 TANDEM Design Method (poster format)AAMAS-2006 TANDEM Design Method (poster format)
AAMAS-2006 TANDEM Design Method (poster format)
 
2015 2016 assignment brief 1,16,2, 22 final
2015   2016 assignment brief 1,16,2, 22 final2015   2016 assignment brief 1,16,2, 22 final
2015 2016 assignment brief 1,16,2, 22 final
 
So you want your students to produce digital video: some practical guidance
So you want your students to produce digital video: some practical guidanceSo you want your students to produce digital video: some practical guidance
So you want your students to produce digital video: some practical guidance
 
Military Police Host Nation Policing Simulation: A Case Study into Designing ...
Military Police Host Nation Policing Simulation: A Case Study into Designing ...Military Police Host Nation Policing Simulation: A Case Study into Designing ...
Military Police Host Nation Policing Simulation: A Case Study into Designing ...
 
TRECVID 2019 Video to Text
TRECVID 2019 Video to TextTRECVID 2019 Video to Text
TRECVID 2019 Video to Text
 
Coursework Introduction
Coursework IntroductionCoursework Introduction
Coursework Introduction
 
Video production pedagogy
Video production pedagogyVideo production pedagogy
Video production pedagogy
 
Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022
 

More from George Awad

Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022George Awad
 
Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022George Awad
 
Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022George Awad
 
Video Search at TRECVID 2022
Video Search at TRECVID 2022Video Search at TRECVID 2022
Video Search at TRECVID 2022George Awad
 
TRECVID 2019 introduction slides
TRECVID 2019 introduction slidesTRECVID 2019 introduction slides
TRECVID 2019 introduction slidesGeorge Awad
 
TRECVID 2019 Instance Search
TRECVID 2019 Instance SearchTRECVID 2019 Instance Search
TRECVID 2019 Instance SearchGeorge Awad
 
TRECVID 2019 Ad-hoc Video Search
TRECVID 2019 Ad-hoc Video SearchTRECVID 2019 Ad-hoc Video Search
TRECVID 2019 Ad-hoc Video SearchGeorge Awad
 
[TRECVID 2018] Video to Text
[TRECVID 2018] Video to Text[TRECVID 2018] Video to Text
[TRECVID 2018] Video to TextGeorge Awad
 
[TRECVID 2018] Instance Search
[TRECVID 2018] Instance Search[TRECVID 2018] Instance Search
[TRECVID 2018] Instance SearchGeorge Awad
 
[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video Search[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video SearchGeorge Awad
 
Instance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVIDInstance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVIDGeorge Awad
 
Activity detection at TRECVID
Activity detection at TRECVIDActivity detection at TRECVID
Activity detection at TRECVIDGeorge Awad
 
The Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVIDThe Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVIDGeorge Awad
 
Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)George Awad
 
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)George Awad
 
TRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event DetectionTRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event DetectionGeorge Awad
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationGeorge Awad
 
TRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionTRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionGeorge Awad
 
TRECVID 2016 : Instance Search
TRECVID 2016 : Instance SearchTRECVID 2016 : Instance Search
TRECVID 2016 : Instance SearchGeorge Awad
 
TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search George Awad
 

More from George Awad (20)

Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022
 
Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022
 
Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022
 
Video Search at TRECVID 2022
Video Search at TRECVID 2022Video Search at TRECVID 2022
Video Search at TRECVID 2022
 
TRECVID 2019 introduction slides
TRECVID 2019 introduction slidesTRECVID 2019 introduction slides
TRECVID 2019 introduction slides
 
TRECVID 2019 Instance Search
TRECVID 2019 Instance SearchTRECVID 2019 Instance Search
TRECVID 2019 Instance Search
 
TRECVID 2019 Ad-hoc Video Search
TRECVID 2019 Ad-hoc Video SearchTRECVID 2019 Ad-hoc Video Search
TRECVID 2019 Ad-hoc Video Search
 
[TRECVID 2018] Video to Text
[TRECVID 2018] Video to Text[TRECVID 2018] Video to Text
[TRECVID 2018] Video to Text
 
[TRECVID 2018] Instance Search
[TRECVID 2018] Instance Search[TRECVID 2018] Instance Search
[TRECVID 2018] Instance Search
 
[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video Search[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video Search
 
Instance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVIDInstance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVID
 
Activity detection at TRECVID
Activity detection at TRECVIDActivity detection at TRECVID
Activity detection at TRECVID
 
The Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVIDThe Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVID
 
Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)
 
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
 
TRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event DetectionTRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event Detection
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept Localization
 
TRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionTRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text Description
 
TRECVID 2016 : Instance Search
TRECVID 2016 : Instance SearchTRECVID 2016 : Instance Search
TRECVID 2016 : Instance Search
 
TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search
 

Recently uploaded

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Multimedia Understanding at TRECVID 2022

  • 1. TRECVID 2022 DEEP VIDEO UNDERSTANDING INTRODUCTION AND TASK OVERVIEW Keith Curtis National Institute of Standards and Technology George Awad National Institute of Standards and Technology
  • 2. Disclaimer: Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NIST, or the U.S. Government.
  • 3. Table of Contents • Task Goals & Definition • Data • Annotation Framework • Topics (Queries) • Participating Teams • Evaluation and Results • General Observation
  • 4. Task Goals • Analyze long duration videos holistically. • Exploit all available modalities (audio, video, image, & text) to analyze both visual and non-visual elements. • As the movies domain data can simulate the real world, many lessons learned are expected to benefit different kinds of applications
  • 5. Task Definition ▪ Given: ▪ Whole original movie (e.g 1.5 - 2hrs long) ▪ Image snapshots of main entities (persons, locations, and concepts) per movie ▪ Ontology of relationships, interactions, locations, and sentiments used to annotate each movie at global movie-level (relationships between entities) as well as on fine-grained scene-level (scene sentiment, interactions between characters, and locations of scenes) ▪ Generate a knowledge-base of the main actors and their relations (such as family, work, social, etc.) over the whole movie, and of interactions between them over the scene level. ▪ The task supported two auto generated query types on the movie-level as well as on scene-level per movie.
  • 6. Data • Training Set comprised of 14 Creative Commons (CC) licensed movies. • Long duration videos with a self-contained storyline. • Videos range from 18 minutes in length to 109 minutes. • Test Set comprised of 6 movies licensed from Kinolorber*. • Long duration videos with a self-contained storyline. • Videos range from 79 minutes in length to 92 minutes. *https://kinolorberedu.com/
  • 7. Annotation Framework • Movies are first divided into scenes. • A set of dedicated annotators were hired to work with us on the annotation framework[1]. • Annotators watch full movies, isolate and take images of main characters, places, & concepts. Draw Knowledge Graph (KG) of full movie using yEd* graphing tool. • Annotators watch individual scenes, and draw KG over the scene level recording interactions between characters, chronological order of such, scene sentiments, relationships, character’s emotional states, and a natural language description. [1] Loc, E., Curtis, K., Awad, G., Rajput, S., & Soboroff, I. (2022). Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding. P-VLAM, 12. * https://www.yworks.com/products/yed
  • 8. Annotation: Movie-level • KG annotates relations between main characters, locations, and entities. • XGML graph file is processed later for query generation
  • 9. Annotation: Scene-level • KG annotates interactions, attributes, and relations between characters. • Natural language text descriptions are also provided for each scene.
  • 10. Queries: Movie-level 1. Question Answering (Required): This query type represents questions on the resulting KG in the form of multiple-choice questions. 2. Fill in the graph space (Optional): Given a list of people, entities, and/or relationships for certain nodes, where some nodes are replaced by variables X, Y, etc., solve for X, Y etc.
  • 11. Queries: Scene-level 1. Find the next or previous interaction (Required): Given a scene number a, and an interaction i between two characters x & y, what is the immediate next or previous interaction, in scene b, between x and y? 2. Find the unique scene (Optional): Given a full, inclusive list of interactions, unique to a specific scene in the movie, teams should find which scene this is.
  • 12. Query Samples: Movie-level Works at ? Manny Visual modality helps to answer the query **All images are under CC license
  • 13. Query Samples: Scene-level Jack Pam Scene 19 Audio modality helps to answer the query ** Images and video clip are under CC license
  • 14. Metrics Movie-Level • Question answering : correct answers/total questions. • Fill in Graph questions : Mean Reciprocal Rank (MMR). Scene-Level • Next / Previous interaction questions : correct answers/total questions.. • Find unique scene : Mean Reciprocal Rank (MMR).
  • 15. DVU 2022: 3 Finishers (out of 13) Team Organization Adapt ADAPT Research Centre, Dublin City University columbia_graphen Graphen, Inc., Columbia University WHU_NERCMS National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan City, Hubei Province, China
  • 16. Results Summary by run Performance 0.0% 5.0% 10.0% 15.0% 20.0% 25.0% 30.0% 35.0% 40.0% ADAPT Columbia_1 Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2 Scene-Level 0.0% 5.0% 10.0% 15.0% 20.0% 25.0% 30.0% 35.0% 40.0% 45.0% Columbia_2 Columbia_1 WHU_NERCMS_1 WHU_NERCMS_2 Movie-Level
  • 17. Results by query types : Movie-Level Performance 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 60.0% Columbia_2 Columbia_1 WHU_NERCMS_1 WHU_NERCMS_2 Question Answering 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 60.0% WHU_NERCMS_1 Columbia_1 Columbia_2 WHU_NERCMS_2 Fill in Graph Space
  • 18. Results by query types : Scene-Level 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 60.0% ADAPT Columbia_1 WHU_NERCMS_1 Columbia_2 WHU_NERCMS_2 Performance Find Unique Scene 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 60.0% Columbia_1 ADAPT Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2 Next/Previous Interaction
  • 19. Results by runs 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 60.0% 70.0% Calloused_Hands ChainedForLife Liberty_Kid like_me little_rock losing_ground movie-level results by run Columbia_1 Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2
  • 20. Results by runs 0.0% 5.0% 10.0% 15.0% 20.0% 25.0% 30.0% Calloused_Hands ChainedForLife Liberty_Kid like_me little_rock losing_ground scene-level results by run ADAPT Columbia_1 Columbia_2 WHU_NERCMS_1 WHU_NERCMS_2
  • 21. Conclusions 1. Submissions for movie-level queries scored much higher than for scene-level queries, indicating that the movie-level queries were easier or movie-level algorithms were more successful than the scene algorithms. 2. Movie-level Question Answering queries scored higher than Fill in the Graph Space queries, also indicating that these were slightly easier. 3. Scene-level Find Next or Previous Interaction queries scored close results to Find the Unique Scene queries. 4. Queries for the movie ‘Calloused Hands’ scored higher than any other movie. ‘Liberty Kid’ was among the lowest scoring.
  • 22. Conclusions 5. 13 teams registered for this year’s task. Out of these, 3 teams submitted runs. We would wish to see an improvement on this in subsequent years. 6. Improvements to the dataset for this task is an ongoing process, test-set movies from this year’s task will be added to the training- set for next year’s task, and additional new movies will be annotated and used for the test-set.