Team
• Mohammed Kaif Shaikh
• Akshat Jain
• Pooja Patil
• Darsh Jain
This paper proposes a Discriminative Latent Semantic Graph
(D-LSG) framework to generate natural language captions
that can summarize the visual contents in long videos. The
model has three main components:
• A conditional graph is used to enhance object proposals
by fusing contextual information from the video frames
• A dynamic graph aggregates the enhanced proposals into
compact visual words with higher semantic meaning
• A discriminative module validates the generated captions
by reconstructing visual words and scoring them against the
original visual words to ensure fidelity and relevance
The model can effectively leverage complex object
interactions, extract salient visual concepts from videos,
and generate captions that are content-relevant.
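The three components can be illustrated with a toy sketch. This is not the D-LSG implementation: dot-product attention stands in for the conditional graph, soft pooling onto fixed anchor vectors stands in for the dynamic graph, and mean cosine similarity stands in for the discriminative scoring. All function names, the anchor vectors, and the toy features are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def enhance_proposals(proposals, frame_context):
    """Conditional-graph stand-in (assumption): fuse a frame-level context
    vector into every proposal, then let each proposal attend over all others."""
    fused = [[p + c for p, c in zip(prop, frame_context)] for prop in proposals]
    enhanced = []
    for node in fused:
        edge_weights = softmax([dot(node, other) for other in fused])
        enhanced.append(weighted_sum(edge_weights, fused))
    return enhanced

def aggregate_to_visual_words(enhanced, anchors):
    """Dynamic-graph stand-in (assumption): K anchor vectors softly pool
    N enhanced proposals into K visual words."""
    words = []
    for anchor in anchors:
        weights = softmax([dot(anchor, e) for e in enhanced])
        words.append(weighted_sum(weights, enhanced))
    return words

def cosine(a, b):
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fidelity_score(original_words, reconstructed_words):
    """Discriminative stand-in (assumption): mean cosine similarity between
    original visual words and words reconstructed from the caption."""
    sims = [cosine(o, r) for o, r in zip(original_words, reconstructed_words)]
    return sum(sims) / len(sims)

# Toy input: three 2-D object proposals, one frame context vector, two anchors.
proposals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_context = [0.5, 0.5]
anchors = [[1.0, 0.0], [0.0, 1.0]]

enhanced = enhance_proposals(proposals, frame_context)
visual_words = aggregate_to_visual_words(enhanced, anchors)
score = fidelity_score(visual_words, visual_words)  # perfect reconstruction
```

In a trained model the anchors and edge weights would be learned, and the reconstructed words would come from the generated caption rather than being copied from the originals.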
Video captioning aims to use natural language descriptions
to summarize the visual contents in video data. This is a
challenging task as it requires:
• Modeling complex dependencies between objects and
their interactions
• Extracting high-level visual concepts from
spatio-temporal video data
• Generating captions that accurately reflect visual
content and are semantically rich
1. Discriminative Latent Semantic Graph for Video Captioning (2021)
• Authors: Yang Bai, Junyan Wang, Yang Long, Bingzhang Hu, Yang Song, Maurice Pagnucco, Yu Guan
• Methodology: Sequence-to-sequence model, Deep Learning
• Feature extraction technique: 2D CNNs, Faster R-CNN, LSTM models
• Classifier: Language LSTM, multimodal bilinear pooling
• Accuracy: 70%
2. Video Joint Modelling Based on Hierarchical Transformer for Co-summarization (2022)
• Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, and Rui Zhang
• Methodology: ML
• Feature extraction technique: GoogLeNet (CNN), Video Joint Modelling based on Hierarchical Transformer (VJMHT)
• Classifier: Transformer-based models (F-Transformer and S-Transformer) for modeling intra-shot and inter-shot dependencies in the video summarization process
• Accuracy: 80%
• Issues: Low F-measure or rank correlation; long training and inference times
• Research gap: Need for generalization across diverse datasets; handling long videos and temporal context
3. Video summarization using deep learning techniques: a detailed analysis and investigation (2023)
• Authors: Parul Saini, Krishan Kumar, Shamal Kashid, Ashray Saini, Alok Negi
• Methodology: Deep Learning
• Feature extraction technique: 3D CNN, FCNN, DG-CNN
• Classifier: K-Nearest Neighbors (K-NN), Deep Belief Network (DBN)
• Accuracy: 88%
• Issues: Some GAN-based models may produce very short summaries that lack important details, making it challenging to strike the right balance.
• Research gap: Additional effort should be put into video summarization algorithms that optimize summaries for the intended audience.
4. Semantic Text Summarization of Long Videos (2017)
• Authors: Shagan Sah, Sourabh Kulhare, Allison Gray, Subhashini Venugopalan, Emily Prud'hommeaux, Raymond Ptucha
• Methodology: Deep Learning, Neural Network
• Feature extraction technique: 3D CNN
• Classifier: Deep visual-captioning techniques for feature extraction in video summarization
• Accuracy: 70%
• Issues: Annotated ground-truth data for semantic video summarization may be limited or expensive to obtain, hindering supervised training of CNN-based models.
• Research gap: Enhancing CNNs' semantic understanding of video content is essential; explore techniques that allow CNNs to recognize actions and relationships within video frames.
5. Towards Diverse Paragraph Captioning for Untrimmed Videos (2021)
• Authors: Yuqing Song, Shizhe Chen, Qin Jin (Renmin University of China, INRIA)
• Methodology: Machine Learning, Reinforcement Learning
• Feature extraction technique: ResNet, VGG
• Classifier: MFT, Vtransformer, AdvInf, MART
• Accuracy: 79%
• Issues: The vanilla encoder brings a computational burden for long paragraph generation; both MLE and RL training make the model generate high-frequency words and phrases.
• Research gap: Scalability to longer videos; training with limited data; evaluation metrics beyond state-of-the-art
6. A Comprehensive Review of the Video-to-Text Problem (2021)
• Authors: Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil F. Guimarães, Ivan Sipiran, Jorge Pérez, Grethel Coello
• Methodology: ML, DL
• Feature extraction technique: 2D CNN, NLG (Natural Language Generator), RNN
• Classifier: AlexNet, ImageNet, ILSVRC
• Accuracy: 71%
• Issues: Selecting the most informative frames is essential for exact and precise video description generation. Model adaptation: fine-tuning AlexNet pre-trained on ImageNet may not always lead to optimal results for specific tasks.
• Research gap: Content variability; comparative analysis. Limited mention of multimodal integration: researchers can choose or create datasets that inherently require the integration of both visual and textual information.
7. Real Time Video to Text Summarization using Neural Network (2020)
• Authors: Abhishek Yadav, Anjali Vishwakarma, Shyama Panickar, and Prof. Satish Kuchiwale
• Methodology: Deep Learning
• Feature extraction technique: Convolutional Neural Network
• Classifier: RNN, SoftMax layer
• Accuracy: 75%
• Issues: Training RNNs for video summarization can suffer from the vanishing gradient problem, where gradients shrink toward zero over long sequences (the related exploding gradient problem is when they become too large). Either can impact training stability and convergence.
• Research gap: Research should aim to develop effective regularization techniques and architectural innovations to mitigate overfitting in RNN-based video summarization models.
8. Video Summarization by Learning Deep Side Semantic Embedding (2019)
• Authors: Yitian Yuan, Tao Mei, Peng Cui, and Wenwu Zhu
• Methodology: Deep Learning
• Feature extraction technique: 3D-CNN
• Classifier: DSSE model
• Accuracy: 80%
• Issues: Effectively measuring the semantic relevance between video frames and query information
• Research gap: The Deep Side Semantic Embedding (DSSE) model addresses these issues by leveraging side information to select semantically meaningful segments from videos.
9. Spatiotemporal Modeling for Video Summarization Using Convolutional Recurrent Neural Network (2019)
• Authors: Yuan Yuan, Haopeng Li, Qi Wang
• Methodology: Deep Learning
• Feature extraction technique: 2D-CNN, DCNNs
• Classifier: AlexNet and GoogLeNet
• Accuracy: 85%
• Issues: The increasing amount of video data, the difficulty of retrieving valuable information conveyed by videos, and the extremely heavy burden of data storage
• Research gap: Improving computational efficiency; further research into enabling real-time operation, especially for applications that require rapid summarization
10. Text Semantics Based Automatic Summarization for Chinese Videos (2015)
• Authors: Wang Xingqi, Zha Taotao, Wu Chunming, Fang Jinglong
• Methodology: ML
• Feature extraction technique: HLAC, HOG
• Classifier: Ant Colony
• Accuracy: -
• Issues: Broad range of content
• Research gap: There had been no attempt at text-semantics-based video summarization prior to their proposed method.
11. Video and Text Summarization Using VDAN and RNN (2021)
• Authors: Joys Princia A, Ms. J Sangeetha Priya, Kalai Selvi J, Rithi Afra J
• Methodology: Deep Learning and Neural Network
• Feature extraction technique: VDAN
• Classifier: Random Forest
• Accuracy: -
• Issues: Visual gaps and breaks between frames
• Research gap: Short-term dependencies of simple RNNs
The key problems this model aims to address:
• Current video captioning models cannot effectively leverage complex object-level
interactions and relationships in the video data.
• They fail to extract high-level visual concepts that capture salient information from
spatio-temporal video data.
• Existing models struggle to validate the fidelity and relevance of generated captions to
the source video's visual content.
1. Literature Review
• Survey prior work in video captioning and summarization
• Understand limitations of existing methods
• Identify opportunities for improvement
2. Problem Definition
• Clearly define the problem to be solved
• Set project objectives and scope
3. Data Collection
• Gather relevant video datasets for training and testing
• Ensure diversity of video content
4. Model Development
• Implement base encoder-decoder architecture
• Incorporate conditional graph for enhancing object proposals
• Develop dynamic graph for latent proposal aggregation
5. Training and Optimization
• Prepare training data and protocols
• Train model end-to-end with suitable loss functions
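As one example of a "suitable loss function" for step 5, the sketch below computes a token-level cross-entropy loss over a decoded caption, skipping padded positions. It is a minimal illustration in plain Python, not the training objective of any specific paper in the survey; the function names, the padding id of 0, and the toy logits are assumptions.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized score vector.
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]

def caption_cross_entropy(step_logits, target_ids, pad_id=0):
    """Mean negative log-likelihood of the target words, ignoring padding.

    step_logits: one list of vocabulary scores per decoding step.
    target_ids:  the ground-truth word id for each step.
    pad_id:      id reserved for padding (assumed to be 0 here).
    """
    total, count = 0.0, 0
    for logits, target in zip(step_logits, target_ids):
        if target == pad_id:
            continue
        total -= log_softmax(logits)[target]
        count += 1
    return total / count if count else 0.0

# Toy example: vocabulary of 4 words, three decoding steps, last step padded.
step_logits = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
]
target_ids = [1, 3, 0]
loss = caption_cross_entropy(step_logits, target_ids)

# With uniform scores the loss equals log(vocabulary size).
uniform_loss = caption_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
```

In an end-to-end setup this term would typically be combined with any auxiliary objectives the model uses, with gradients flowing back through the decoder and encoder.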
The following are possible future directions and applications of the
proposed model:
1. Video Search and Retrieval
• Generate textual captions to index video content
• Enable text-based semantic search of a video database
2. Video Highlight Detection
• Identify key moments and events in long videos
• Generate concise summaries for skimming videos
3. Law Enforcement
• Scan and index video evidence from body-worn cameras
• Surface video segments containing threats, violations, etc.
• Assist investigators in reviewing large volumes of footage
4. Multi-lingual Subtitling
• The generated text can be translated to create multi-lingual
subtitles and aid localization of video content.
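The search and retrieval application can be sketched with a small inverted index over generated captions. This is a keyword-matching illustration only, which is simpler than true semantic search; the clip ids, captions, and function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(captions):
    """Map each caption token to the set of video ids whose caption contains it."""
    index = defaultdict(set)
    for video_id, caption in captions.items():
        for token in tokenize(caption):
            index[token].add(video_id)
    return index

def search(index, query):
    """Return the video ids whose captions contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Hypothetical captions, as a captioning model might produce them.
captions = {
    "clip_001": "A man is playing a guitar on stage",
    "clip_002": "A woman is slicing a tomato",
    "clip_003": "A man is slicing bread in a kitchen",
}
index = build_index(captions)
hits = search(index, "man slicing")
```

A semantic variant would replace exact token matching with similarity between query and caption embeddings, at the cost of needing a text encoder.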
[1] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-up and top-down attention for
image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition. 6077–6086.
[2] Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? a new model and the kinetics dataset. In proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. 6299–6308.
[3] David Chen and William B Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies. 190–200.
[4] Yangyu Chen, Shuhui Wang, Weigang Zhang, and Qingming Huang. 2018. Less is more: Picking informative frames for video captioning. In ECCV. 358–373.
[5] Bo Dai, Sanja Fidler, Raquel Urtasun, and Dahua Lin. 2017. Towards diverse and natural image descriptions via a conditional gan. In ICCV. 2970–2979.
[6] Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the ninth
workshop on statistical machine translation. 376–380.
[7] Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2013.
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV. 2712–2719.
[8] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. 2017. Improved training of wasserstein gans. In Advances in
neural information processing systems. 5767–5777.
[9] Jingyi Hou, Xinxiao Wu, Xiaoxun Zhang, Yayun Qi, Yunde Jia, and Jiebo Luo. 2020. Joint Commonsense and Relation Reasoning for Image and Video
Captioning.. In AAAI. 10973–10980.
[10] Yaosi Hu, Zhenzhong Chen, Zheng-Jun Zha, and Feng Wu. 2019. Hierarchical global-local temporal modeling for video captioning. In Proceedings of the
27th ACM International Conference on Multimedia. 774–783.
[11] Eric Jang, Shixiang Gu, and Ben Poole. 2016. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144 (2016).
Semantic Summarization of videos

More Related Content

Similar to Semantic Summarization of videos, Semantic Summarization of videos

Real-time eyeglass detection using transfer learning for non-standard facial...
Real-time eyeglass detection using transfer learning for  non-standard facial...Real-time eyeglass detection using transfer learning for  non-standard facial...
Real-time eyeglass detection using transfer learning for non-standard facial...IJECEIAES
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...CSCJournals
 
A Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringA Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringIRJET Journal
 
SUMMARY GENERATION FOR LECTURING VIDEOS
SUMMARY GENERATION FOR LECTURING VIDEOSSUMMARY GENERATION FOR LECTURING VIDEOS
SUMMARY GENERATION FOR LECTURING VIDEOSIRJET Journal
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalIAEME Publication
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalIAEME Publication
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...IJDKP
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...IJDKP
 
INAPPROPRIATE & ABUSIVE CONTENT CENSORSHIP
INAPPROPRIATE & ABUSIVE CONTENT CENSORSHIPINAPPROPRIATE & ABUSIVE CONTENT CENSORSHIP
INAPPROPRIATE & ABUSIVE CONTENT CENSORSHIPIRJET Journal
 
Modelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object RecognitionModelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object RecognitionIJERA Editor
 
Profile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learningProfile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learningGihan Wikramanayake
 
Inverted File Based Search Technique for Video Copy Retrieval
Inverted File Based Search Technique for Video Copy RetrievalInverted File Based Search Technique for Video Copy Retrieval
Inverted File Based Search Technique for Video Copy Retrievalijcsa
 
Video saliency-recognition by applying custom spatio temporal fusion technique
Video saliency-recognition by applying custom spatio temporal fusion techniqueVideo saliency-recognition by applying custom spatio temporal fusion technique
Video saliency-recognition by applying custom spatio temporal fusion techniqueIAESIJAI
 
Content based video retrieval using discrete cosine transform
Content based video retrieval using discrete cosine transformContent based video retrieval using discrete cosine transform
Content based video retrieval using discrete cosine transformnooriasukmaningtyas
 
Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1VasileiosMezaris
 
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNINGATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNINGNathan Mathis
 
IRJET - Applications of Image and Video Deduplication: A Survey
IRJET -  	  Applications of Image and Video Deduplication: A SurveyIRJET -  	  Applications of Image and Video Deduplication: A Survey
IRJET - Applications of Image and Video Deduplication: A SurveyIRJET Journal
 
An Stepped Forward Security System for Multimedia Content Material for Cloud ...
An Stepped Forward Security System for Multimedia Content Material for Cloud ...An Stepped Forward Security System for Multimedia Content Material for Cloud ...
An Stepped Forward Security System for Multimedia Content Material for Cloud ...IRJET Journal
 
Deep fake video detection using machine learning.docx
Deep fake video detection using machine learning.docxDeep fake video detection using machine learning.docx
Deep fake video detection using machine learning.docxShakas Technologies
 

Similar to Semantic Summarization of videos, Semantic Summarization of videos (20)

Real-time eyeglass detection using transfer learning for non-standard facial...
Real-time eyeglass detection using transfer learning for  non-standard facial...Real-time eyeglass detection using transfer learning for  non-standard facial...
Real-time eyeglass detection using transfer learning for non-standard facial...
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
 
A Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringA Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question Answering
 
SUMMARY GENERATION FOR LECTURING VIDEOS
SUMMARY GENERATION FOR LECTURING VIDEOSSUMMARY GENERATION FOR LECTURING VIDEOS
SUMMARY GENERATION FOR LECTURING VIDEOS
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrieval
 
Query clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrievalQuery clip genre recognition using tree pruning technique for video retrieval
Query clip genre recognition using tree pruning technique for video retrieval
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
 
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI...
 
INAPPROPRIATE & ABUSIVE CONTENT CENSORSHIP
INAPPROPRIATE & ABUSIVE CONTENT CENSORSHIPINAPPROPRIATE & ABUSIVE CONTENT CENSORSHIP
INAPPROPRIATE & ABUSIVE CONTENT CENSORSHIP
 
Modelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object RecognitionModelling Framework of a Neural Object Recognition
Modelling Framework of a Neural Object Recognition
 
Profile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learningProfile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learning
 
Inverted File Based Search Technique for Video Copy Retrieval
Inverted File Based Search Technique for Video Copy RetrievalInverted File Based Search Technique for Video Copy Retrieval
Inverted File Based Search Technique for Video Copy Retrieval
 
Video saliency-recognition by applying custom spatio temporal fusion technique
Video saliency-recognition by applying custom spatio temporal fusion techniqueVideo saliency-recognition by applying custom spatio temporal fusion technique
Video saliency-recognition by applying custom spatio temporal fusion technique
 
Content based video retrieval using discrete cosine transform
Content based video retrieval using discrete cosine transformContent based video retrieval using discrete cosine transform
Content based video retrieval using discrete cosine transform
 
Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1
 
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNINGATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
ATTENTION BASED IMAGE CAPTIONING USING DEEP LEARNING
 
IRJET - Applications of Image and Video Deduplication: A Survey
IRJET -  	  Applications of Image and Video Deduplication: A SurveyIRJET -  	  Applications of Image and Video Deduplication: A Survey
IRJET - Applications of Image and Video Deduplication: A Survey
 
50120130404055
5012013040405550120130404055
50120130404055
 
An Stepped Forward Security System for Multimedia Content Material for Cloud ...
An Stepped Forward Security System for Multimedia Content Material for Cloud ...An Stepped Forward Security System for Multimedia Content Material for Cloud ...
An Stepped Forward Security System for Multimedia Content Material for Cloud ...
 
Deep fake video detection using machine learning.docx
Deep fake video detection using machine learning.docxDeep fake video detection using machine learning.docx
Deep fake video detection using machine learning.docx
 

Recently uploaded

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 

Recently uploaded (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 

Semantic Summarization of videos, Semantic Summarization of videos

  • 1. Team • Mohammed Kaif Shaikh • Akshat Jain • Pooja Patil • Darsh Jain
  • 2. This paper proposes a Discriminative Latent Semantic Graph (D-LSG) framework to generate natural language captions that can summarize the visual contents in long videos. The model has three main components: • A conditional graph is used to enhance object proposals by fusing contextual information from the video frames • A dynamic graph aggregates the enhanced proposals into compact visual words with higher semantic meaning • A discriminative module validates the generated captions by reconstructing visual words and scoring against the original visual words to ensure fidelity and relevance. The model can effectively leverage complex object interactions, extract salient visual concepts from videos, and generate captions that are content-relevant.
  • 3. Video captioning aims to use natural language descriptions to summarize the visual contents in video data. This is a challenging task as it requires: • Modeling complex dependencies between objects and their interactions • Extracting high-level visual concepts from spatio-temporal video data • Generating captions that accurately reflect visual content and are semantically rich
  • 4. Columns: Sr. No.; Title; Year; Authors; Methodology; Feature Extraction Technique; Classifier; Accuracy; Issues; Research Gap
    2. Video Joint Modelling Based on Hierarchical Transformer for Co-summarization (2022)
       • Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, Rui Zhang
       • Methodology: ML
       • Feature Extraction Technique: GoogLeNet (CNN), Video Joint Modelling based on Hierarchical Transformer (VJMHT)
       • Classifier: Transformer-based models (F-Transformer and S-Transformer) for modeling intra-shot and inter-shot dependencies in the video summarization process
       • Accuracy: 80%
       • Issues: Low F-measure or rank correlation; long training and inference times
       • Research Gap: Need for generalization across diverse datasets; handling long videos and temporal context
    3. Video summarization using deep learning techniques: a detailed analysis and investigation (2023)
       • Authors: Parul Saini, Krishan Kumar, Shamal Kashid, Ashray Saini, Alok Negi
       • Methodology: Deep Learning
       • Feature Extraction Technique: 3D CNN, FCNN, DG-CNN
       • Classifier: K-Nearest Neighbors (K-NN), Deep Belief Network (DBN)
       • Accuracy: 88%
       • Issues: Some GAN-based models may produce very short summaries that lack important details, making it challenging to strike the right balance.
       • Research Gap: Additional effort should go into video summarization algorithms that optimize summaries for the intended audience.
    4. Semantic Text Summarization of Long Videos (2017)
       • Authors: Shagan Sah, Sourabh Kulhare, Allison Gray, Subhashini Venugopalan, Emily Prud’hommeaux, Raymond Ptucha
       • Methodology: Deep Learning, Neural Network
       • Feature Extraction Technique: 3D CNN
       • Classifier: Deep visual-captioning techniques for feature extraction in video summarization
       • Accuracy: 70%
       • Issues: Annotated ground truth data for semantic video summarization may be limited or expensive to obtain, hindering supervised training of CNN-based models.
       • Research Gap: Enhancing CNNs' semantic understanding of video content is essential; explore techniques that allow CNNs to recognize actions and relationships within video frames.
    5. Towards Diverse Paragraph Captioning for Untrimmed Videos (2021)
       • Authors: Yuqing Song, Shizhe Chen, Qin Jin (Renmin University of China, INRIA)
       • Methodology: Machine Learning, Reinforcement Learning
       • Feature Extraction Technique: ResNet, VGG
       • Classifier: MFT, Vtransformer, AdvInf, MART
       • Accuracy: 79%
       • Issues: The vanilla encoder adds a computational burden for long paragraph generation; both MLE and RL training make the model generate high-frequency words and phrases.
       • Research Gap: Scalability to longer videos; training with limited data; evaluation metrics beyond state-of-the-art
    6. A Comprehensive Review of the Video-to-Text Problem (2021)
       • Authors: Jesus Perez-Martin, Benjamin Bustos, Silvio Jamil F. Guimarães, Ivan Sipiran, Jorge Pérez, Grethel Coello
       • Methodology: ML, DL
       • Feature Extraction Technique: 2D CNN, NLG (Natural Language Generator), RNN
       • Classifier: AlexNet, ImageNet, ILSVRC
       • Accuracy: 71%
       • Issues: Selecting the most informative frames is essential for exact and precise video description generation. Model adaptation: fine-tuning AlexNet pre-trained on ImageNet may not always lead to optimal results for specific tasks.
       • Research Gap: Content variability; comparative analysis; limited mention of multimodal integration: researchers can choose or create datasets that inherently require the integration of both visual and textual information.
    1. Discriminative Latent Semantic Graph for Video Captioning (2021)
       • Authors: Yang Bai, Junyan Wang, Yang Long, Bingzhang Hu, Yang Song, Maurice Pagnucco, Yu Guan
       • Methodology: Sequence-to-Sequence Model, Deep Learning
       • Feature Extraction Technique: 2D CNNs, Faster R-CNN, LSTM models
       • Classifier: Language LSTM, Multimodal bilinear pooling
       • Accuracy: 70%
  • 5. Columns: Sr. No.; Title; Year; Authors; Methodology; Feature Extraction Technique; Classifier; Accuracy; Issues; Research Gap
    7. Real Time Video to Text Summarization using Neural Network (2020)
       • Authors: Abhishek Yadav, Anjali Vishwakarma, Shyama Panickar, Prof. Satish Kuchiwale
       • Methodology: Deep Learning
       • Feature Extraction Technique: Convolutional Neural Network
       • Classifier: RNN, SoftMax layer
       • Accuracy: 75%
       • Issues: Training RNNs for video summarization can suffer from the vanishing gradient problem, where gradients become too small to propagate across long sequences. This can impact training stability and convergence.
       • Research Gap: Research should aim to develop effective regularization techniques and architectural innovations to mitigate overfitting in RNN-based video summarization models.
    8. Video Summarization by Learning Deep Side Semantic Embedding (2019)
       • Authors: Yitian Yuan, Tao Mei, Peng Cui, Wenwu Zhu
       • Methodology: Deep Learning
       • Feature Extraction Technique: 3D-CNN
       • Classifier: DSSE Model
       • Accuracy: 80%
       • Issues: Effectively measuring the semantic relevance between video frames and query information
       • Research Gap: The Deep Side Semantic Embedding (DSSE) model addresses these issues by leveraging side information to select semantically meaningful segments from videos.
    9. Spatiotemporal Modeling for Video Summarization Using Convolutional Recurrent Neural Network (2019)
       • Authors: Yuan Yuan, Haopeng Li, Qi Wang
       • Methodology: Deep Learning
       • Feature Extraction Technique: 2D-CNN, DCNNs
       • Classifier: AlexNet and GoogLeNet
       • Accuracy: 85%
       • Issues: The increasing amount of video data, the difficulty of retrieving valuable information conveyed by videos, and the extremely heavy burden of data storage
       • Research Gap: Improving computational efficiency; further research into enabling real-time operation, especially for applications that require rapid summarization
    10. Text Semantics Based Automatic Summarization for Chinese Videos (2015)
       • Authors: Wang Xingqi, Zha Taotao, Wu Chunming, Fang Jinglong
       • Methodology: ML
       • Feature Extraction Technique: HLAC, HOG
       • Classifier: Ant Colony
       • Accuracy: -
       • Issues: Broad range of content
       • Research Gap: There had been no attempt at text-semantics-based video summarization prior to their proposed method.
    11. Video and Text Summarization Using VDAN and RNN (2021)
       • Authors: Joys Princia A, Ms. J Sangeetha Priya, Kalai Selvi J, Rithi Afra J
       • Methodology: Deep Learning and Neural Network
       • Feature Extraction Technique: VDAN
       • Classifier: Random Forest
       • Accuracy: -
       • Issues: Visual gaps and breaks between frames
       • Research Gap: Short-term dependencies of simple RNNs
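The vanishing-gradient issue listed for RNN-based summarizers (entry 7) can be illustrated numerically. The sketch below is not taken from any of the surveyed papers: it assumes a toy scalar RNN, h_t = tanh(w * h_{t-1} + x), and the function name `backprop_magnitudes` and all parameter values are hypothetical. It tracks how the magnitude of the gradient of h_t with respect to h_0 shrinks as the sequence grows.

```python
import math


def backprop_magnitudes(w: float, x: float, steps: int) -> list[float]:
    """Return |d h_t / d h_0| for t = 1..steps in a scalar RNN
    h_t = tanh(w * h_{t-1} + x), starting from h_0 = 0."""
    h = 0.0
    grad = 1.0
    magnitudes = []
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: d tanh(u)/du = 1 - tanh(u)^2, and du/dh_{t-1} = w.
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes


# Illustrative values only: with |w| < 1 every per-step factor is below 1,
# so the product decays geometrically with sequence length.
magnitudes = backprop_magnitudes(w=0.5, x=0.1, steps=50)
print(f"after 1 step:   {magnitudes[0]:.3e}")
print(f"after 50 steps: {magnitudes[-1]:.3e}")
```

With a recurrent weight below 1 in magnitude the signal from early frames is effectively lost after a few dozen steps, which is one reason gated units such as LSTMs are commonly preferred for long videos.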
  • 6. The key problems this model aims to address:
    • Current video captioning models cannot effectively leverage complex object-level interactions and relationships in the video data.
    • They fail to extract high-level visual concepts that capture salient information from spatio-temporal video data.
    • Existing models struggle to validate the fidelity and relevance of generated captions to the source video's visual content.
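To make the third problem concrete, one simple way to score caption fidelity is to compare visual-word vectors reconstructed from a caption against the original visual words. The sketch below uses mean cosine similarity as a stand-in; the slides do not specify the scoring function, so `fidelity_score`, the vector values, and the pairing of words by position are all assumptions for illustration.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fidelity_score(original: list[list[float]],
                   reconstructed: list[list[float]]) -> float:
    """Mean cosine similarity between position-paired visual words.
    Hypothetical stand-in for a learned discriminative score."""
    if len(original) != len(reconstructed):
        raise ValueError("both inputs must contain the same number of visual words")
    if not original:
        return 0.0
    total = sum(cosine(o, r) for o, r in zip(original, reconstructed))
    return total / len(original)


# Made-up 2-D visual words for illustration.
original_words = [[1.0, 0.0], [0.0, 1.0]]
faithful = [[0.9, 0.1], [0.1, 0.9]]    # reconstruction close to the original
unfaithful = [[0.0, 1.0], [1.0, 0.0]]  # reconstruction with swapped content

print(fidelity_score(original_words, faithful))
print(fidelity_score(original_words, unfaithful))
```

A caption whose reconstruction stays close to the original visual words scores near 1, while one describing different content scores near 0.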
  • 9.
    1. Literature Review
       • Survey prior work in video captioning and summarization
       • Understand limitations of existing methods
       • Identify opportunities for improvement
    2. Problem Definition
       • Clearly define the problem to be solved
       • Set project objectives and scope
    3. Data Collection
       • Gather relevant video datasets for training and testing
       • Ensure diversity of video content
    4. Model Development
       • Implement base encoder-decoder architecture
       • Incorporate conditional graph for enhancing object proposals
       • Develop dynamic graph for latent proposal aggregation
    5. Training and Optimization
       • Prepare training data and protocols
       • Train model end-to-end with suitable loss functions
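The two graph components in the Model Development step can be sketched roughly as follows. This is an attention-style approximation under stated assumptions, not the D-LSG paper's exact formulation: `enhance_proposals` fuses frame context into each object proposal with dot-product attention and a residual sum, and `aggregate_to_visual_words` softly assigns the enhanced proposals to a fixed set of anchor vectors (which would be learned in a real model). All function names, anchors, and feature values are hypothetical.

```python
import math

Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: list[float], vectors: list[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def enhance_proposals(proposals: list[Vector], frames: list[Vector]) -> list[Vector]:
    """Conditional-graph stand-in: each proposal attends over frame features
    and adds the attended context to itself (residual connection)."""
    enhanced = []
    for proposal in proposals:
        attention = softmax([dot(proposal, frame) for frame in frames])
        context = weighted_sum(attention, frames)
        enhanced.append([p + c for p, c in zip(proposal, context)])
    return enhanced


def aggregate_to_visual_words(enhanced: list[Vector], anchors: list[Vector]) -> list[Vector]:
    """Dynamic-graph stand-in: softly assign N proposals to K anchors and
    return one assignment-weighted mean vector (a 'visual word') per anchor."""
    assignments = [softmax([dot(e, a) for a in anchors]) for e in enhanced]
    words = []
    for k in range(len(anchors)):
        column = [row[k] for row in assignments]
        mass = sum(column)  # strictly positive because softmax outputs are positive
        words.append([x / mass for x in weighted_sum(column, enhanced)])
    return words


# Made-up 2-D features: three object proposals, two frame features, two anchors.
proposals = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
frames = [[0.5, 0.5], [1.0, 0.0]]
anchors = [[1.0, 0.0], [0.0, 1.0]]

enhanced = enhance_proposals(proposals, frames)
visual_words = aggregate_to_visual_words(enhanced, anchors)
print(visual_words)
```

The aggregation compresses an arbitrary number of proposals into a fixed number of visual words, which is what lets a downstream language decoder work with a compact input.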
  • 10. Future scope and possible applications of the proposed model:
    1. Video Search and Retrieval
       • Generate textual captions to index video content
       • Enable text-based semantic search of video databases
    2. Video Highlight Detection
       • Identify key moments and events in long videos
       • Generate concise summaries for skimming videos
    3. Law Enforcement
       • Scan and index video evidence from body-worn cameras
       • Surface video segments containing threats, violations, etc.
       • Assist investigators in reviewing large volumes of footage
    4. Multi-lingual Subtitling
       • Translate the generated text to create multi-lingual subtitles and aid localization of video content
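The search-and-retrieval application can be sketched with a small inverted index over generated captions. This is a minimal keyword-matching illustration, not a semantic search system: the segment IDs, captions, and the function names `build_index` and `search` are made up for the example, and a production system would more likely use embeddings or a search engine.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captions: dict[str, str]) -> dict[str, set[str]]:
    """Map each caption token to the set of segment IDs whose caption contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for segment_id, caption in captions.items():
        for token in tokenize(caption):
            index[token].add(segment_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted IDs of segments whose captions contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(matches)


# Hypothetical captions, as if produced by a captioning model per video segment.
captions = {
    "clip_001": "A man is playing a guitar",
    "clip_002": "A dog is running on the beach",
    "clip_003": "A man is running with a dog",
}
index = build_index(captions)
print(search(index, "man running"))
print(search(index, "dog"))
```

Because the index is built from text alone, the same structure works for highlight lookup or for surfacing segments in large footage archives once captions exist.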