SlideShare a Scribd company logo
1 of 19
Download to read offline
1
2020/12/15
Summarizing Videos with Attention
Arithmer Dynamics Div. Christian Saravia
Self-Introduction
Christian Martin Saravia Hernandez
● Graduate School
○ Universidad Peruana de Ciencias Aplicadas
○ Bs. Software Engineering
● Former Job
○ Ficha Inc
■ Algorithm Development Engineer
● Research & application of deep learning in computer vision problems
● Current Job
○ Application of machine learning / deep learning to computer vision problems
■ Object detection
■ Object classification
○ Research topics
■ Attention & Transformers
Purpose
Purpose of this material
● Explore a solution to the task of video summarization using attention.
Agenda
Contents
● Introduction
○ Motivation
○ Contributions
● Dataset
● VASNet
○ Feature Extraction
○ Attention Network
○ Regressor Network
● Inference
○ Changepoint Detection
○ Kernel Temporal Segmentation
● Results
○ Measuring method
○ Dataset Results
Introduction
Motivation
● Early video summarization methods were based on unsupervised methods, leveraging low level spatio-temporal features and
dimensionality reduction with clustering techniques.Success of these methods solely stands on the ability to define distance/cost
functions between the keyshots/frames with respect to the original video.
● Current state of the art methods for video summarization are based on recurrent encoder-decoder architectures, usually with bi-
directional LSTM or GRU and soft attention. They are computationally demanding, especially in the bi-directional configuration.
Introduction
Contribution
1. A novel approach to sequence to sequence transformation for video summarization based on soft, self-attention mechanism. In
contrast, current state of the art relies on complex LSTM/GRU encoder-decoder methods.
1. A demonstration that a recurrent network can be successfully replaced with simpler, attention mechanism for the video
summarization.
Dataset
Dataset
● TVSum Dataset: https://github.com/yalesong/tvsum
● SumMe Dataset: https://gyglim.github.io/me/vsum/index.html
VASNet
Feature Extraction
● Given a time interval t, every 15 frames are collected in an ordered set X
● Each set then is used as input to GoogLeNet for feature extraction
● Then we extract the Pool 5 layer of GoogLeNet, which is a 1024 dimensional array (D = 1024).
VASNet
Attention Network
● Unnormalized self-attention weight et,i is calculated as an alignment between input feature Xt and the entire input sequence
Where,
N: Number of frames
U, V: Network weight matrices estimated together with other parameters of the network during optimization
s: Scale paramenter
● The attention weights αt are true probabilities representing the importance of input features with respect to the
desired frame level score at the time t
VASNet
Attention Network
VASNet
Regressor Network
● Linear transformation C is then applied to each input and the results then weighted with attention vector αt and averaged.
● The output is a context vector ct which is used for the final frame score regression.
● The context vector ct is then projected by a single layer, fully connected network with linear activation and residual sum followed by
dropout and layer normalization.
VASNet
Regressor Network
Inference
Inference
● The output of the model VASNet is a probability of importance per frame
● This probability must be analyzed in the range of the scene it corresponds
● However to get the number of frames is relative per video
● The problem to find the frames where a change a scene exist is called changepoint detection.
● For the datasets used, the changepoints (cps) are already calculated by using KTS algorithm with hyperparameter tuning
Inference
Changepoint detection
● In statistical analysis, change detection or change point detection
tries to identify times when the probability distribution of a
stochastic process or time series changes. In general the problem
concerns both detecting whether or not a change has occurred, or
whether several changes might have occurred, and identifying the
times of any such changes.
Inference
Kernel Temporal Segmentation (KTS)
● Kernel Temporal Segmentation (KTS) method splits the video into
a set of non-intersecting temporal segments.
● It treats the cps detection as a dynamic programming problem.
● The method is fast and accurate when combined with high-
dimensional descriptors.
Results
Measuring method
P: Precision
R: Recall
F Score: [2 * P * R / (P + R)] * 100
Results
Dataset Results
Results
Dataset Results
● Long video: https://youtu.be/873CBVbPJVE
● Summarize: https://youtu.be/weW4memH3Dg
● Full playlist: https://www.youtube.com/playlist?list=PLEdpjt8KmmQMfQEat4HvuIxORwiO9q9DB
References
Reference
● VASNet: https://arxiv.org/pdf/1812.01969.pdf
● VASNet official implementation: https://github.com/ok1zjf/VASNet
● KTS implementation: https://github.com/TatsuyaShirakawa/KTS
● Video summarization datasets and review: https://hal.inria.fr/hal-01022967/PDF/video_summarization.pdf
● Issue on testing on own videos: https://github.com/ok1zjf/VASNet/issues/2

More Related Content

What's hot

Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
Improving access to satellite imagery with Cloud computing
Improving access to satellite imagery with Cloud computingImproving access to satellite imagery with Cloud computing
Improving access to satellite imagery with Cloud computingRAHUL BHOJWANI
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Universitat Politècnica de Catalunya
 
Joint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersJoint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersUniversitat Politècnica de Catalunya
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - IntroductionJungwon Kim
 
Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)
Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)
Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Universitat Politècnica de Catalunya
 

What's hot (20)

Deep Learning for Computer Vision: Unsupervised Learning (UPC 2016)
Deep Learning for Computer Vision: Unsupervised Learning (UPC 2016)Deep Learning for Computer Vision: Unsupervised Learning (UPC 2016)
Deep Learning for Computer Vision: Unsupervised Learning (UPC 2016)
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
Improving access to satellite imagery with Cloud computing
Improving access to satellite imagery with Cloud computingImproving access to satellite imagery with Cloud computing
Improving access to satellite imagery with Cloud computing
 
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
 
Joint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clustersJoint unsupervised learning of deep representations and image clusters
Joint unsupervised learning of deep representations and image clusters
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - Introduction
 
Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)
Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)
Methodology (DLAI D6L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
 

Similar to Summarizing videos with Attention

Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxAlyaaMachi
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video SummarizationVasileiosMezaris
 
Future semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMFuture semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMKyuri Kim
 
IRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept DetectionIRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept DetectionIRJET Journal
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsQuantUniversity
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
Machine Learning approaches at video compression
Machine Learning approaches at video compression Machine Learning approaches at video compression
Machine Learning approaches at video compression Roberto Iacoviello
 
Biomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLABBiomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLABCodeOps Technologies LLP
 
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IRJET Journal
 
Survey paper on image compression techniques
Survey paper on image compression techniquesSurvey paper on image compression techniques
Survey paper on image compression techniquesIRJET Journal
 
Video Denoising using Transform Domain Method
Video Denoising using Transform Domain MethodVideo Denoising using Transform Domain Method
Video Denoising using Transform Domain MethodIRJET Journal
 
Near Reversible Data Hiding Scheme for images using DCT
Near Reversible Data Hiding Scheme for images using DCTNear Reversible Data Hiding Scheme for images using DCT
Near Reversible Data Hiding Scheme for images using DCTIJERA Editor
 
Deep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSDeep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSGanesan Narayanasamy
 
Pipeline anomaly detection
Pipeline anomaly detectionPipeline anomaly detection
Pipeline anomaly detectionGauravBiswas9
 
"Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ...
"Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ..."Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ...
"Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ...Edge AI and Vision Alliance
 
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...CARI-2020, Application of LSTM architectures for next frame forecasting in Se...
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...Mokhtar SELLAMI
 
“Stream loss”: ConvNet learning for face verification using unlabeled videos ...
“Stream loss”: ConvNet learning for face verification using unlabeled videos ...“Stream loss”: ConvNet learning for face verification using unlabeled videos ...
“Stream loss”: ConvNet learning for face verification using unlabeled videos ...Elaheh Rashedi
 

Similar to Summarizing videos with Attention (20)

Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video Summarization
 
Future semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTMFuture semantic segmentation with convolutional LSTM
Future semantic segmentation with convolutional LSTM
 
IRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept DetectionIRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept Detection
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and Innovations
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Machine Learning approaches at video compression
Machine Learning approaches at video compression Machine Learning approaches at video compression
Machine Learning approaches at video compression
 
Biomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLABBiomedical Signal and Image Analytics using MATLAB
Biomedical Signal and Image Analytics using MATLAB
 
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
 
Dynamic Adaptive Point Cloud Streaming
Dynamic Adaptive Point Cloud StreamingDynamic Adaptive Point Cloud Streaming
Dynamic Adaptive Point Cloud Streaming
 
Survey paper on image compression techniques
Survey paper on image compression techniquesSurvey paper on image compression techniques
Survey paper on image compression techniques
 
Ajila (1)
Ajila (1)Ajila (1)
Ajila (1)
 
Video Denoising using Transform Domain Method
Video Denoising using Transform Domain MethodVideo Denoising using Transform Domain Method
Video Denoising using Transform Domain Method
 
H43044145
H43044145H43044145
H43044145
 
Near Reversible Data Hiding Scheme for images using DCT
Near Reversible Data Hiding Scheme for images using DCTNear Reversible Data Hiding Scheme for images using DCT
Near Reversible Data Hiding Scheme for images using DCT
 
Deep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSDeep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUS
 
Pipeline anomaly detection
Pipeline anomaly detectionPipeline anomaly detection
Pipeline anomaly detection
 
"Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ...
"Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ..."Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ...
"Energy-efficient Hardware for Embedded Vision and Deep Convolutional Neural ...
 
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...CARI-2020, Application of LSTM architectures for next frame forecasting in Se...
CARI-2020, Application of LSTM architectures for next frame forecasting in Se...
 
“Stream loss”: ConvNet learning for face verification using unlabeled videos ...
“Stream loss”: ConvNet learning for face verification using unlabeled videos ...“Stream loss”: ConvNet learning for face verification using unlabeled videos ...
“Stream loss”: ConvNet learning for face verification using unlabeled videos ...
 

More from Arithmer Inc.

コーディネートレコメンド
コーディネートレコメンドコーディネートレコメンド
コーディネートレコメンドArithmer Inc.
 
Arithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システムArithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システムArithmer Inc.
 
Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介Arithmer Inc.
 
Arithmer Robo Introduction
Arithmer Robo IntroductionArithmer Robo Introduction
Arithmer Robo IntroductionArithmer Inc.
 
Arithmer AIチャットボット
Arithmer AIチャットボットArithmer AIチャットボット
Arithmer AIチャットボットArithmer Inc.
 
Arithmer R3 Introduction
Arithmer R3 Introduction Arithmer R3 Introduction
Arithmer R3 Introduction Arithmer Inc.
 
Arithmer Inspection Introduction
Arithmer Inspection IntroductionArithmer Inspection Introduction
Arithmer Inspection IntroductionArithmer Inc.
 
全力解説!Transformer
全力解説!Transformer全力解説!Transformer
全力解説!TransformerArithmer Inc.
 
Arithmer NLP Introduction
Arithmer NLP IntroductionArithmer NLP Introduction
Arithmer NLP IntroductionArithmer Inc.
 
Introduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesIntroduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesArithmer Inc.
 
Arithmer OCR Introduction
Arithmer OCR IntroductionArithmer OCR Introduction
Arithmer OCR IntroductionArithmer Inc.
 
Arithmer Dynamics Introduction
Arithmer Dynamics Introduction Arithmer Dynamics Introduction
Arithmer Dynamics Introduction Arithmer Inc.
 
ArithmerDB Introduction
ArithmerDB IntroductionArithmerDB Introduction
ArithmerDB IntroductionArithmer Inc.
 
3D human body modeling from RGB images
3D human body modeling from RGB images3D human body modeling from RGB images
3D human body modeling from RGB imagesArithmer Inc.
 
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose EstimationArithmer Inc.
 
Recommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningRecommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningArithmer Inc.
 
Survey on self supervised image segmentation
Survey on self supervised image segmentationSurvey on self supervised image segmentation
Survey on self supervised image segmentationArithmer Inc.
 
dataScienceofPhysics
dataScienceofPhysicsdataScienceofPhysics
dataScienceofPhysicsArithmer Inc.
 

More from Arithmer Inc. (20)

コーディネートレコメンド
コーディネートレコメンドコーディネートレコメンド
コーディネートレコメンド
 
Test for AI model
Test for AI modelTest for AI model
Test for AI model
 
最適化
最適化最適化
最適化
 
Arithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システムArithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システム
 
Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介
 
Arithmer Robo Introduction
Arithmer Robo IntroductionArithmer Robo Introduction
Arithmer Robo Introduction
 
Arithmer AIチャットボット
Arithmer AIチャットボットArithmer AIチャットボット
Arithmer AIチャットボット
 
Arithmer R3 Introduction
Arithmer R3 Introduction Arithmer R3 Introduction
Arithmer R3 Introduction
 
Arithmer Inspection Introduction
Arithmer Inspection IntroductionArithmer Inspection Introduction
Arithmer Inspection Introduction
 
全力解説!Transformer
全力解説!Transformer全力解説!Transformer
全力解説!Transformer
 
Arithmer NLP Introduction
Arithmer NLP IntroductionArithmer NLP Introduction
Arithmer NLP Introduction
 
Introduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesIntroduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave Machines
 
Arithmer OCR Introduction
Arithmer OCR IntroductionArithmer OCR Introduction
Arithmer OCR Introduction
 
Arithmer Dynamics Introduction
Arithmer Dynamics Introduction Arithmer Dynamics Introduction
Arithmer Dynamics Introduction
 
ArithmerDB Introduction
ArithmerDB IntroductionArithmerDB Introduction
ArithmerDB Introduction
 
3D human body modeling from RGB images
3D human body modeling from RGB images3D human body modeling from RGB images
3D human body modeling from RGB images
 
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose Estimation
 
Recommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningRecommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learning
 
Survey on self supervised image segmentation
Survey on self supervised image segmentationSurvey on self supervised image segmentation
Survey on self supervised image segmentation
 
dataScienceofPhysics
dataScienceofPhysicsdataScienceofPhysics
dataScienceofPhysics
 

Recently uploaded

Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2DianaGray10
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+Antonio de Llamas
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023Joshua Flannery
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024BookNet Canada
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerAnchore
 
full stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdffull stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdfHulkTheDevil
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationBuild Intuit
 
Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!Memoori
 
Arti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfArti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfwill854175
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceOpsTree solutions
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfROWELL MARQUINA
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxmprakaash5
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Dublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxDublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxKunal Gupta
 

Recently uploaded (20)

Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey Hightower
 
full stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdffull stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdf
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientation
 
Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!
 
Arti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdfArti Languages Pre Seed Pitchdeck 2024.pdf
Arti Languages Pre Seed Pitchdeck 2024.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer Experience
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptx
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Dublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptxDublin_mulesoft_meetup_API_specifications.pptx
Dublin_mulesoft_meetup_API_specifications.pptx
 

Summarizing videos with Attention

  • 1. 1 2020/12/15 Summarizing Videos with Attention Arithmer Dynamics Div. Christian Saravia
  • 2. Self-Introduction Christian Martin Saravia Hernandez ● Graduate School ○ Universidad Peruana de Ciencias Aplicadas ○ Bs. Software Engineering ● Former Job ○ Ficha Inc ■ Algorithm Development Engineer ● Research & application of deep learning in computer vision problems ● Current Job ○ Application of machine learning / deep learning to computer vision problems ■ Object detection ■ Object classification ○ Research topics ■ Attention & Transformers
  • 3. Purpose Purpose of this material ● Explore a solution to the task of video summarization using attention.
  • 4. Agenda Contents ● Introduction ○ Motivation ○ Contributions ● Dataset ● VASNet ○ Feature Extraction ○ Attention Network ○ Regressor Network ● Inference ○ Changepoint Detection ○ Kernel Temporal Segmentation ● Results ○ Measuring method ○ Dataset Results
  • 5. Introduction Motivation ● Early video summarization methods were based on unsupervised methods, leveraging low level spatio-temporal features and dimensionality reduction with clustering techniques.Success of these methods solely stands on the ability to define distance/cost functions between the keyshots/frames with respect to the original video. ● Current state of the art methods for video summarization are based on recurrent encoder-decoder architectures, usually with bi- directional LSTM or GRU and soft attention. They are computationally demanding, especially in the bi-directional configuration.
  • 6. Introduction Contribution 1. A novel approach to sequence to sequence transformation for video summarization based on soft, self-attention mechanism. In contrast, current state of the art relies on complex LSTM/GRU encoder-decoder methods. 1. A demonstration that a recurrent network can be successfully replaced with simpler, attention mechanism for the video summarization.
  • 7. Dataset Dataset ● TVSum Dataset: https://github.com/yalesong/tvsum ● SumMe Dataset: https://gyglim.github.io/me/vsum/index.html
  • 8. VASNet Feature Extraction ● Given a time interval t, every 15 frames are collected in an ordered set X ● Each set then is used as input to GoogLeNet for feature extraction ● Then we extract the Pool 5 layer of GoogLeNet, which is a 1024 dimensional array (D = 1024).
  • 9. VASNet Attention Network ● Unnormalized self-attention weight et,i is calculated as an alignment between input feature Xt and the entire input sequence Where, N: Number of frames U, V: Network weight matrices estimated together with other parameters of the network during optimization s: Scale paramenter ● The attention weights αt are true probabilities representing the importance of input features with respect to the desired frame level score at the time t
  • 11. VASNet Regressor Network ● Linear transformation C is then applied to each input and the results then weighted with attention vector αt and averaged. ● The output is a context vector ct which is used for the final frame score regression. ● The context vector ct is then projected by a single layer, fully connected network with linear activation and residual sum followed by dropout and layer normalization.
  • 13. Inference Inference ● The output of the model VASNet is a probability of importance per frame ● This probability must be analyzed in the range of the scene it corresponds ● However to get the number of frames is relative per video ● The problem to find the frames where a change a scene exist is called changepoint detection. ● For the datasets used, the changepoints (cps) are already calculated by using KTS algorithm with hyperparameter tuning
  • 14. Inference Changepoint detection ● In statistical analysis, change detection or change point detection tries to identify times when the probability distribution of a stochastic process or time series changes. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times of any such changes.
  • 15. Inference Kernel Temporal Segmentation (KTS) ● Kernel Temporal Segmentation (KTS) method splits the video into a set of non-intersecting temporal segments. ● It treats the cps detection as a dynamic programming problem. ● The method is fast and accurate when combined with high- dimensional descriptors.
  • 16. Results Measuring method P: Precision R: Recall F Score: [2 * P * R / (P + R)] * 100
  • 18. Results Dataset Results ● Long video: https://youtu.be/873CBVbPJVE ● Summarize: https://youtu.be/weW4memH3Dg ● Full playlist: https://www.youtube.com/playlist?list=PLEdpjt8KmmQMfQEat4HvuIxORwiO9q9DB
  • 19. References Reference ● VASNet: https://arxiv.org/pdf/1812.01969.pdf ● VASNet official implementation: https://github.com/ok1zjf/VASNet ● KTS implementation: https://github.com/TatsuyaShirakawa/KTS ● Video summarization datasets and review: https://hal.inria.fr/hal-01022967/PDF/video_summarization.pdf ● Issue on testing on own videos: https://github.com/ok1zjf/VASNet/issues/2