retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Unsupervised Video Summarization via Attention-Driven
Adversarial Learning
E. Apostolidis1,2, E. Adamantidou1, A. I. Metsai1, V. Mezaris1, I. Patras2
1 CERTH-ITI, Thermi - Thessaloniki, Greece
2 School of EECS, Queen Mary University of London, London, UK
26th Int. Conf. on Multimedia Modeling
Daejeon, Korea, January 2020
Outline
- Introduction
- Motivation
- Developed approach
- Experiments
- Summarization example
- Conclusions
Problem statement
Video summary: a short visual summary that encapsulates the flow of the story and the essential parts of the full-length video
Original video | Video summary (storyboard)
Problem statement
Applications of video summarization
- Professional CMS: effective indexing, browsing, retrieval & promotion of media assets
- Video sharing platforms: improved viewer experience, enhanced viewer engagement & increased content consumption
- Other summarization scenarios: movie trailer production, sports highlights video generation, video synopsis of 24h surveillance recordings
Related work
Deep-learning approaches
- Supervised methods that use types of feedforward neural nets (e.g. CNNs) and extract and use video semantics to identify important video parts based on sequence labeling [21], self-attention networks [7], or video-level metadata [17]
- Supervised approaches that capture the story flow using recurrent neural nets (e.g. LSTMs):
  - In combination with statistical models, to select a representative and diverse set of keyframes [27]
  - In hierarchies, to identify the video structure and select key-fragments [30, 31]
  - In combination with DTR units and GANs, to capture long-range frame dependencies [28]
  - To form attention-based encoder-decoders [9, 13] or memory-augmented networks [8]
- Unsupervised algorithms that do not rely on human annotations and build summaries:
  - Using adversarial learning to: minimize the distance between videos and their summary-based reconstructions [1, 16]; maximize the mutual information between summary and video [25]; learn a mapping from raw videos to human-like summaries based on summaries available online [20]
  - Through a decision-making process that is learned via RL and reward functions [32]
  - By learning to extract the key motions of appearing objects [29]
Motivation
Disadvantages of supervised learning
- A restricted amount of annotated data is available for the supervised training of a video summarization method
- Video summarization is highly subjective (it relies on the viewer's demands and aesthetics); there is no "ideal" or commonly accepted summary that could be used for training an algorithm
Advantages of unsupervised learning
- No need for labeled data; avoids the laborious and time-demanding annotation of video data
- Adaptability to different types of video; summarization is learned based on the video content
Contributions
- Introduce an attention mechanism in an unsupervised learning framework, whereas all previous attention-based summarization methods ([7-9, 13]) were supervised
- Investigate the integration of an attention mechanism into a variational auto-encoder for video summarization purposes
- Use attention to guide the generative adversarial training of the model, rather than using it to rank the video fragments as in [9]
Developed approach
Building on adversarial learning
- Starting point: the SUM-GAN architecture
- Main idea: build a keyframe selection mechanism by minimizing the distance between the deep representations of the original video and a reconstructed version of it based on the selected keyframes
- Problem: how to define a good distance?
- Solution: use a trainable discriminator network!
- Goal: train the Summarizer to maximally confuse the Discriminator when it distinguishes the original from the reconstructed video
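A minimal numpy sketch of this adversarial objective (the discriminator, weights and feature vectors below are toy stand-ins, not the actual SUM-GAN components):

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    """Toy discriminator: logistic probability that x is an 'original' video."""
    return 1.0 / (1.0 + np.exp(-x @ w))

# Hypothetical deep representations of the original video and of its
# reconstruction from the selected keyframes
original = rng.normal(size=4)
reconstructed = original + rng.normal(scale=0.1, size=4)
w = rng.normal(size=4)  # toy discriminator weights

# Discriminator objective: label originals as 1 and reconstructions as 0
d_loss = -np.log(discriminator(original, w)) - np.log(1.0 - discriminator(reconstructed, w))

# Summarizer ("generator") objective: fool the discriminator, i.e. make the
# reconstruction indistinguishable from the original video
g_loss = -np.log(discriminator(reconstructed, w))
```

Training alternates between these two objectives, so the learned discriminator itself serves as the "good distance" between the video and its summary-based reconstruction.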
Developed approach
Building on adversarial learning
- SUM-GAN-sl:
  - Contains a linear compression layer that reduces the size of the CNN feature vectors
  - Follows an incremental and fine-grained approach to train the model's components (a regularization factor is used in the training objective)
Developed approach
Introducing an attention mechanism
- Examined approaches:
  1) Integrate an attention layer within the variational auto-encoder (VAE) of SUM-GAN-sl
  2) Replace the VAE of SUM-GAN-sl with a deterministic attention auto-encoder
Developed approach
1) Adversarial learning driven by a variational attention auto-encoder
- Variational attention was described in [4] and used for natural language modeling
- It models the attention vector as Gaussian-distributed random variables
Variational auto-encoder
Developed approach
1) Adversarial learning driven by a variational attention auto-encoder
- Extended SUM-GAN-sl with variational attention, forming the SUM-GAN-VAAE architecture
- The attention weights for each frame are handled as random variables, and a latent space is computed for these values too
- At every time-step t, the attention component combines the encoder's output at t and the decoder's hidden state at t-1 to compute an attention weight vector
- The decoder is modified to update its hidden states based on both latent spaces during the reconstruction of the video
Variational attention auto-encoder
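Under the hood, variational attention replaces deterministic soft-max weights with a Gaussian latent variable sampled via the reparameterization trick, so gradients can flow through the sampling step. A toy numpy sketch (the energy values and the variance are illustrative assumptions, not values from the model):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical energy scores for T = 5 encoder time-steps
energies = rng.normal(size=5)

# Deterministic attention would simply use softmax(energies); variational
# attention instead treats the attention vector as a Gaussian random variable
mu = softmax(energies)         # mean of the latent attention variable
log_var = np.full(5, -4.0)     # small variance, chosen here for illustration

# Reparameterization trick: sample = mu + sigma * eps, with eps ~ N(0, I),
# which keeps the sampling step differentiable w.r.t. mu and log_var
eps = rng.normal(size=5)
alpha = mu + np.exp(0.5 * log_var) * eps
```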
Developed approach
2) Adversarial learning driven by a deterministic attention auto-encoder
- Inspired by the efficiency of the attention-based encoder-decoder network in [13]
- Built on the findings of [4] w.r.t. the impact of deterministic attention on VAEs
- The VAE is entirely replaced by an attention auto-encoder (AAE) network, forming the SUM-GAN-AAE architecture
Developed approach
2) Adversarial learning driven by a deterministic attention auto-encoder
Attention auto-encoder
Processing pipeline
- Weighted feature vectors are fed to the Encoder
- The Encoder's output (V) and the Decoder's previous hidden state are fed to the Attention component
  - For t > 1: use the hidden state of the Decoder's previous step (ht-1)
  - For t = 1: use the hidden state of the Encoder's last step (He)
- Attention weights (αt) are computed using an energy score function and a soft-max function
- αt is multiplied with V to form the context vector vt'
- vt' is combined with the Decoder's previous output yt-1
- The Decoder gradually reconstructs the video
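One decoding step of this pipeline can be sketched as follows (a bilinear energy score is assumed here purely for illustration; the exact score function follows the encoder-decoder of [13], and all sizes and values are toy stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, d = 6, 8                      # toy number of frames and feature size
V = rng.normal(size=(T, d))      # Encoder outputs, one row per time-step
h_prev = rng.normal(size=d)      # Decoder hidden state at t-1 (He at t = 1)
W = rng.normal(size=(d, d))      # weights of the assumed bilinear energy score

# Energy score of each encoder output against the decoder state,
# then soft-max to obtain the attention weights a_t
energies = V @ W @ h_prev
a_t = softmax(energies)

# Context vector v_t': attention-weighted sum of the encoder outputs; the
# Decoder combines it with its previous output y_{t-1} to emit the next frame
v_t = a_t @ V
```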
Developed approach
Model's I/O and summarization process
- Input: the CNN feature vectors of the (sampled) video frames
- Output: frame-level importance scores
- Summarization process:
  - CNN features pass through the linear compression layer and the frame selector → importance scores are computed at the frame level
  - Given a video segmentation (using KTS [18]), fragment-level importance scores are calculated by averaging the scores of each fragment's frames
  - The summary is created by selecting the fragments that maximize the total importance score, provided that the summary length does not exceed 15% of the video duration, by solving the 0/1 Knapsack problem
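The fragment selection step can be sketched as a textbook 0/1 knapsack over fragments (the scores and lengths below are made-up toy values):

```python
def knapsack_select(frag_scores, frag_lens, budget):
    """0/1 knapsack over video fragments: maximize total importance
    subject to a summary-length budget (e.g. 15% of the video duration)."""
    dp = [(0.0, [])] * (budget + 1)  # dp[c] = (best score, fragments) at capacity c
    for i in range(len(frag_scores)):
        new = dp[:]
        for c in range(frag_lens[i], budget + 1):
            cand = dp[c - frag_lens[i]][0] + frag_scores[i]
            if cand > new[c][0]:
                new[c] = (cand, dp[c - frag_lens[i]][1] + [i])
        dp = new
    return sorted(dp[budget][1])

# Fragment-level importance scores (frame scores averaged per fragment),
# fragment lengths in frames, and a budget of 15% of a 100-frame video
selected = knapsack_select([0.9, 0.2, 0.7, 0.4], [8, 5, 6, 4], 15)
```

Here fragments 0 and 2 fit the 15-frame budget with the highest total score, so `selected` is `[0, 2]`.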
Experiments
Datasets
- SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark)
  - 25 videos capturing multiple events (e.g. cooking and sports)
  - Video length: 1.6 to 6.5 min
  - Annotation: fragment-based video summaries
- TVSum (https://github.com/yalesong/tvsum)
  - 50 videos from 10 categories of the TRECVid MED task
  - Video length: 1 to 5 min
  - Annotation: frame-level importance scores
Experiments
Evaluation protocol
- The generated summary should not exceed 15% of the video length
- Similarity between an automatically generated summary (A) and a ground-truth summary (G) is expressed by the F-Score (%), with Precision (P) and Recall (R) measuring their temporal overlap (∩), where || · || denotes duration:
  P = ||A ∩ G|| / ||A||, R = ||A ∩ G|| / ||G||, F-Score = 2 · P · R / (P + R)
- These are the typical metrics for computing Precision and Recall at the frame level
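The frame-level metric above can be sketched directly on boolean per-frame selection masks (the two 10-frame masks are toy examples):

```python
import numpy as np

def f_score(auto, gt):
    """Frame-level F-Score (%) between a generated summary `auto` and a
    ground-truth summary `gt`, both boolean per-frame selection masks."""
    overlap = np.logical_and(auto, gt).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / auto.sum()   # ||A ∩ G|| / ||A||
    recall = overlap / gt.sum()        # ||A ∩ G|| / ||G||
    return 100.0 * 2 * precision * recall / (precision + recall)

# Toy 10-frame video: the summary picks frames 0-3, the ground truth frames 2-5
auto = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
gt   = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
score = f_score(auto, gt)   # overlap of 2 frames: P = R = 0.5, so F = 50.0
```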
Experiments
Evaluation protocol
- Slight but important distinction w.r.t. what is eventually used as the ground-truth summary
- Most used approach (by [1, 6, 7, 8, 14, 20, 21, 26, 27, 29, 30, 31, 32, 33]): an F-Score (F-Score1, F-Score2, ..., F-ScoreN) is computed between the generated summary and each of the N available user summaries of the video; the per-video result is then the maximum of these F-Scores for SumMe and their average for TVSum
Experiments
Evaluation protocol
- Slight but important distinction w.r.t. what is eventually used as the ground-truth summary
- Alternative approach (used in [9, 13, 16, 24, 25, 28]): a single ground-truth summary is formed per video from the multiple user annotations, and one F-Score is computed against it
Experiments
Implementation details
- Videos were down-sampled to 2 fps
- Feature extraction was based on the pool5 layer of GoogleNet, trained on ImageNet
- The linear compression layer reduces the size of these vectors from 1024 to 500
- All components are 2-layer LSTMs with 500 hidden units; the frame selector is a bi-directional LSTM
- Training is based on the Adam optimizer; the Summarizer's learning rate is 10^-4 and the Discriminator's is 10^-5
- Each dataset was split into two non-overlapping sets: a training set with 80% of the data and a testing set with the remaining 20%
- Experiments were run on 5 differently created random splits, reporting the average performance at the training-epoch level (i.e. for the same training epoch) over these runs
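The 80/20 split protocol above can be sketched with the standard library (`make_splits` and the seed are illustrative names and values, not the authors' code):

```python
import random

def make_splits(video_ids, n_splits=5, train_frac=0.8, seed=0):
    """Create n_splits differently shuffled, non-overlapping 80/20
    train/test partitions of a dataset's video ids."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        ids = video_ids[:]          # copy so the caller's list is untouched
        rng.shuffle(ids)
        cut = int(len(ids) * train_frac)
        splits.append((ids[:cut], ids[cut:]))
    return splits

# e.g. the 50 TVSum videos: 5 random splits of 40 training / 10 testing videos
splits = make_splits(list(range(50)))
```

Reported scores would then be averaged over the test sets of the 5 splits at the same training epoch.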
Experiments
- Step 1: Assessing the impact of the regularization factor σ
  (figures: performance of SUM-GAN-sl, SUM-GAN-VAAE and SUM-GAN-AAE for different values of σ)
- Outcomes:
  - The value of σ affects the models' performance and needs fine-tuning
  - This fine-tuning is dataset-dependent
  - The best overall performance of each model is observed for a different σ value
Experiments
- Step 2: Comparison with SoA unsupervised approaches based on multiple user summaries
- Outcomes:
  - A few SoA methods are comparable with (or even worse than) a random summary generator
  - The best method on TVSum shows random-level performance on SumMe
  - The best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum
  - Variational attention reduces SUM-GAN-sl's efficiency, due to the difficulty of efficiently defining two latent spaces in parallel with the continuous update of the model's components during training
  - Replacing the VAE with the AAE leads to a noticeable performance improvement over SUM-GAN-sl
+/- indicate better/worse performance compared to SUM-GAN-AAE
Note: SUM-GAN is not listed in this table as it follows the single gt-summary evaluation protocol
Experiments
- Step 3: Evaluating the effect of the introduced AAE component
  - Key-fragment selection: the attention mechanism leads to a much smoother series of importance scores
  - Training efficiency: much faster and more stable training of the model (loss curves for SUM-GAN-sl and SUM-GAN-AAE)
Experiments
- Step 4: Comparison with SoA supervised approaches based on multiple user summaries
- Outcomes:
  - The best methods on TVSum (MAVS and Tessellation-sup, respectively) seem adapted to this dataset, as they exhibit random-level performance on SumMe
  - Only a few supervised methods surpass the performance of a random summary generator on both datasets, with VASNet being the best among them
  - The performance of these methods ranges between 44.1 and 49.7 on SumMe, and between 56.1 and 61.4 on TVSum
  - The unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods
Experiments
- Step 5: Comparison with SoA approaches based on single ground-truth summaries
- Impact of the regularization factor σ (best scores in bold):
  - The model's performance is affected by the value of σ
  - The effect of σ depends (also) on the evaluation approach; the best performance when using multiple human summaries was observed for σ = 0.15
  - SUM-GAN-AAE outperforms the original SUM-GAN model on both datasets, even for the same value of σ
Experiments
- Step 5: Comparison with SoA approaches based on single ground-truth summaries
- Outcomes:
  - The SUM-GAN-AAE model performs consistently well on both datasets
  - SUM-GAN-AAE shows advanced performance compared to SoA supervised and unsupervised summarization methods (unsupervised approaches are marked with an asterisk)
Summarization example
Full video Generated summary
Conclusions
- Presented a video summarization method that combines:
  - The effectiveness of attention mechanisms in spotting the most important parts of the video
  - The learning efficiency of generative adversarial networks for unsupervised training
- Experimental evaluations on two benchmark datasets:
  - Documented the positive contribution of the introduced attention auto-encoder component to the model's training and summarization performance
  - Highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video summarization techniques
Key references
1. E. Apostolidis, et al.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: AI4TV, ACM MM 2019
2. E. Apostolidis, et al.: Fast shot segmentation combining global and local visual descriptors. In: IEEE ICASSP 2014. pp. 6583-6587
3. K. Apostolidis, et al.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: MMM 2018. pp. 29-41
4. H. Bahuleyan, et al.: Variational attention for sequence-to-sequence models. In: 27th COLING. pp. 1672-1682 (2018)
5. J. Cho: PyTorch implementation of SUM-GAN (2017), https://github.com/j-min/Adversarial_Video_Summary (last accessed on Oct. 18, 2019)
6. M. Elfeki, et al.: Video summarization via actionness ranking. In: IEEE WACV 2019. pp. 754-763
7. J. Fajtl, et al.: Summarizing videos with attention. In: ACCV 2018. pp. 39-54
8. L. Feng, et al.: Extractive video summarizer with memory augmented neural networks. In: ACM MM 2018. pp. 976-983
9. T. Fu, et al.: Attentive and adversarial learning for video summarization. In: IEEE WACV 2019. pp. 1579-1587
10. M. Gygli, et al.: Creating summaries from user videos. In: ECCV 2014. pp. 505-520
11. M. Gygli, et al.: Video summarization by learning submodular mixtures of objectives. In: IEEE CVPR 2015. pp. 3090-3098
12. S. Hochreiter, et al.: Long Short-Term Memory. Neural Computation 9(8), 1735-1780 (1997)
13. Z. Ji, et al.: Video summarization with attention-based encoder-decoder networks. IEEE Trans. on Circuits and Systems for Video Technology (2019)
14. D. Kaufman, et al.: Temporal Tessellation: A unified approach for video analysis. In: IEEE ICCV 2017. pp. 94-104
15. S. Lee, et al.: A memory network approach for story-based temporal summarization of 360 videos. In: IEEE CVPR 2018. pp. 1410-1419
16. B. Mahasseni, et al.: Unsupervised video summarization with adversarial LSTM networks. In: IEEE CVPR 2017. pp. 2982-2991
17. M. Otani, et al.: Video summarization using deep semantic features. In: ACCV 2016. pp. 361-377
18. D. Potapov, et al.: Category-specific video summarization. In: ECCV 2014. pp. 540-555
19. A. Radford, et al.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR 2016
20. M. Rochan, et al.: Video summarization by learning from unpaired data. In: IEEE CVPR 2019
21. M. Rochan, et al.: Video summarization using fully convolutional sequence networks. In: ECCV 2018. pp. 358-374
22. Y. Song, et al.: TVSum: Summarizing web videos using titles. In: IEEE CVPR 2015. pp. 5179-5187
23. C. Szegedy, et al.: Going deeper with convolutions. In: IEEE CVPR 2015. pp. 1-9
24. H. Wei, et al.: Video summarization via semantic attended networks. In: AAAI 2018. pp. 216-223
25. L. Yuan, et al.: Cycle-SUM: Cycle-consistent adversarial LSTM networks for unsupervised video summarization. In: AAAI 2019. pp. 9143-9150
26. Y. Yuan, et al.: Video summarization by learning deep side semantic embedding. IEEE Trans. on Circuits and Systems for Video Technology 29(1), 226-237 (2019)
27. K. Zhang, et al.: Video summarization with Long Short-Term Memory. In: ECCV 2016. pp. 766-782
28. Y. Zhang, et al.: DTR-GAN: Dilated temporal relational adversarial network for video summarization. In: ACM TURC 2019. pp. 89:1-89:6
29. Y. Zhang, et al.: Unsupervised object-level video summarization with online motion auto-encoder. Pattern Recognition Letters (2018)
30. B. Zhao, et al.: Hierarchical recurrent neural network for video summarization. In: ACM MM 2017. pp. 863-871
31. B. Zhao, et al.: HSA-RNN: Hierarchical structure-adaptive RNN for video summarization. In: IEEE/CVF CVPR 2018. pp. 7405-7414
32. K. Zhou, et al.: Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI 2018. pp. 7582-7589
33. K. Zhou, et al.: Video summarisation by classification with deep reinforcement learning. In: BMVC 2018
Thank you for your attention!
Questions?
Vasileios Mezaris, bmezaris@iti.gr
Code and documentation publicly available at:
https://github.com/e-apostolidis/SUM-GAN-AAE
This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV. The work of Ioannis Patras has been supported by EPSRC under grant No. EP/R026424/1.
Pramod Parajuli
 
Arneb
ArnebArneb
Summarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and TestingSummarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and Testing
Sebastiano Panichella
 

Similar to Unsupervised Video Summarization via Attention-Driven Adversarial Learning (20)

ReTV AI4TV Summarization
ReTV AI4TV SummarizationReTV AI4TV Summarization
ReTV AI4TV Summarization
 
PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020
 
ICME 2020 Tutorial Part II: Video summary (re-)use and recommendation
ICME 2020 Tutorial Part II: Video summary (re-)use and recommendationICME 2020 Tutorial Part II: Video summary (re-)use and recommendation
ICME 2020 Tutorial Part II: Video summary (re-)use and recommendation
 
Implementing Artificial Intelligence Strategies for Content Annotation and Pu...
Implementing Artificial Intelligence Strategies for Content Annotation and Pu...Implementing Artificial Intelligence Strategies for Content Annotation and Pu...
Implementing Artificial Intelligence Strategies for Content Annotation and Pu...
 
OOMEN MEZARIS ReTV
OOMEN MEZARIS ReTVOOMEN MEZARIS ReTV
OOMEN MEZARIS ReTV
 
Implementing artificial intelligence strategies for content annotation and pu...
Implementing artificial intelligence strategies for content annotation and pu...Implementing artificial intelligence strategies for content annotation and pu...
Implementing artificial intelligence strategies for content annotation and pu...
 
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
 
Mini Project- Digital Video Editing
Mini Project- Digital Video EditingMini Project- Digital Video Editing
Mini Project- Digital Video Editing
 
Using TV Metadata to optimise the repurposing and republication of TV Content...
Using TV Metadata to optimise the repurposing and republication of TV Content...Using TV Metadata to optimise the repurposing and republication of TV Content...
Using TV Metadata to optimise the repurposing and republication of TV Content...
 
ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
 ReTV: Bringing Broadcaster Archives to the 21st-century Audiences ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video Summarization
 
Software Tools for Building Industry 4.0 Applications
Software Tools for Building Industry 4.0 ApplicationsSoftware Tools for Building Industry 4.0 Applications
Software Tools for Building Industry 4.0 Applications
 
GAN-based video summarization
GAN-based video summarizationGAN-based video summarization
GAN-based video summarization
 
Paper id 36201508
Paper id 36201508Paper id 36201508
Paper id 36201508
 
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
 
UVM_Full_Print_n.pptx
UVM_Full_Print_n.pptxUVM_Full_Print_n.pptx
UVM_Full_Print_n.pptx
 
COCOMO methods for software size estimation
COCOMO methods for software size estimationCOCOMO methods for software size estimation
COCOMO methods for software size estimation
 
Arneb
ArnebArneb
Arneb
 
Summarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and TestingSummarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and Testing
 

More from VasileiosMezaris

Multi-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationMulti-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and Localization
VasileiosMezaris
 
CERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskCERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages Task
VasileiosMezaris
 
Spatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosSpatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees Videos
VasileiosMezaris
 
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
VasileiosMezaris
 
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
VasileiosMezaris
 
TAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsTAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for Explanations
VasileiosMezaris
 
Gated-ViGAT
Gated-ViGATGated-ViGAT
Gated-ViGAT
VasileiosMezaris
 
Combining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video SearchCombining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video Search
VasileiosMezaris
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiers
VasileiosMezaris
 
Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...
VasileiosMezaris
 
Are all combinations equal? Combining textual and visual features with multi...
Are all combinations equal?  Combining textual and visual features with multi...Are all combinations equal?  Combining textual and visual features with multi...
Are all combinations equal? Combining textual and visual features with multi...
VasileiosMezaris
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web application
VasileiosMezaris
 
Video Thumbnail Selector
Video Thumbnail SelectorVideo Thumbnail Selector
Video Thumbnail Selector
VasileiosMezaris
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
VasileiosMezaris
 
Misinformation on the internet: Video and AI
Misinformation on the internet: Video and AIMisinformation on the internet: Video and AI
Misinformation on the internet: Video and AI
VasileiosMezaris
 
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
VasileiosMezaris
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrieval
VasileiosMezaris
 
Fractional step discriminant pruning
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruning
VasileiosMezaris
 
Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...
VasileiosMezaris
 
Subclass deep neural networks
Subclass deep neural networksSubclass deep neural networks
Subclass deep neural networks
VasileiosMezaris
 

More from VasileiosMezaris (20)

Multi-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationMulti-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and Localization
 
CERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskCERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages Task
 
Spatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosSpatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees Videos
 
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
 
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
 
TAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsTAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for Explanations
 
Gated-ViGAT
Gated-ViGATGated-ViGAT
Gated-ViGAT
 
Combining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video SearchCombining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video Search
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiers
 
Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...
 
Are all combinations equal? Combining textual and visual features with multi...
Are all combinations equal?  Combining textual and visual features with multi...Are all combinations equal?  Combining textual and visual features with multi...
Are all combinations equal? Combining textual and visual features with multi...
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web application
 
Video Thumbnail Selector
Video Thumbnail SelectorVideo Thumbnail Selector
Video Thumbnail Selector
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
 
Misinformation on the internet: Video and AI
Misinformation on the internet: Video and AIMisinformation on the internet: Video and AI
Misinformation on the internet: Video and AI
 
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrieval
 
Fractional step discriminant pruning
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruning
 
Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...
 
Subclass deep neural networks
Subclass deep neural networksSubclass deep neural networks
Subclass deep neural networks
 

Recently uploaded

Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
vadgavevedant86
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
Sciences of Europe
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
Carl Bergstrom
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Microbiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdfMicrobiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdf
sammy700571
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
PsychoTech Services
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
ananya23nair
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
Shashank Shekhar Pandey
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Leonel Morgado
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Sérgio Sacani
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
eitps1506
 

Recently uploaded (20)

Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Microbiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdfMicrobiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdf
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
 

 To form attention-based encoder-decoders [9, 13], or memory-augmented networks [8]
 Unsupervised algorithms that do not rely on human annotations, and build summaries
  Using adversarial learning to: minimize the distance between videos and their summary-based reconstructions [1, 16]; maximize the mutual information between summary and video [25]; learn a mapping from raw videos to human-like summaries based on online available summaries [20]
  Through a decision-making process that is learned via RL and reward functions [32]
  By learning to extract key motions of appearing objects [29]

Motivation
Disadvantages of supervised learning
 Only a restricted amount of annotated data is available for the supervised training of a video summarization method
 Video summarization is highly subjective (it relies on the viewer's demands and aesthetics); there is no "ideal" or commonly accepted summary that could be used for training an algorithm
Advantages of unsupervised learning
 No need for training data; the laborious and time-demanding labeling of video data is avoided
 Adaptability to different types of video; summarization is learned based on the video content

Contributions
 Introduce an attention mechanism in an unsupervised learning framework, whereas all previous attention-based summarization methods ([7-9, 13]) were supervised
 Investigate the integration of an attention mechanism into a variational auto-encoder for video summarization purposes
 Use attention to guide the generative adversarial training of the model, rather than using it to rank the video fragments as in [9]

Developed approach
Building on adversarial learning
 Starting point: the SUM-GAN architecture
 Main idea: build a keyframe selection mechanism by minimizing the distance between the deep representations of the original video and a reconstructed version of it based on the selected keyframes
 Problem: how to define a good distance?
 Solution: use a trainable discriminator network!
 Goal: train the Summarizer to maximally confuse the Discriminator when distinguishing the original from the reconstructed video
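The adversarial idea above can be sketched with toy losses: the Discriminator learns to separate originals from reconstructions, while the Summarizer (playing the generator role) is trained to make reconstructions indistinguishable. This is a minimal sketch of standard GAN losses only; the full SUM-GAN objective also includes reconstruction and sparsity terms, and the logits here are placeholders, not outputs of the actual networks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adversarial_losses(d_logit_original, d_logit_reconstructed):
    # Discriminator: push originals toward "real" (1), reconstructions toward "fake" (0)
    d_loss = (-np.log(sigmoid(d_logit_original))
              - np.log(1.0 - sigmoid(d_logit_reconstructed)))
    # Summarizer (generator role): make reconstructions look "real" to the Discriminator
    g_loss = -np.log(sigmoid(d_logit_reconstructed))
    return d_loss, g_loss

# An undecided Discriminator (both logits 0) yields the classic equilibrium values
d_loss, g_loss = adversarial_losses(0.0, 0.0)
```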

Developed approach
Building on adversarial learning
 SUM-GAN-sl:
  Contains a linear compression layer that reduces the size of the CNN feature vectors
  Follows an incremental and fine-grained approach to train the model's components (regularization factor)
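The linear compression layer amounts to a single learned affine map applied per frame. A minimal sketch, assuming illustrative dimensions (e.g. 1024-d CNN features reduced to 500-d); the exact feature extractor and layer sizes used by SUM-GAN-sl are not restated here:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_compression(features, W, b):
    """Reduce the dimensionality of per-frame CNN feature vectors."""
    return features @ W + b

frames = rng.standard_normal((8, 1024))      # 8 frames, 1024-d CNN features (assumed size)
W = rng.standard_normal((1024, 500)) * 0.01  # learned weights (random here for illustration)
b = np.zeros(500)
compressed = linear_compression(frames, W, b)
print(compressed.shape)  # (8, 500)
```

Reducing the feature size up front keeps the downstream recurrent components small, which is the motivation for placing this layer first.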

Developed approach
Introducing an attention mechanism
 Examined approaches:
  1) Integrate an attention layer within the variational auto-encoder (VAE) of SUM-GAN-sl
  2) Replace the VAE of SUM-GAN-sl with a deterministic attention auto-encoder

Developed approach
1) Adversarial learning driven by a variational attention auto-encoder
 Variational attention was described in [4] and used for natural language modeling
 Models the attention vector as Gaussian distributed random variables
(figure: variational auto-encoder)

Developed approach
1) Adversarial learning driven by a variational attention auto-encoder
 Extended SUM-GAN-sl with variational attention, forming the SUM-GAN-VAAE architecture
 The attention weights for each frame were handled as random variables, and a latent space was computed for these values, too
 In every time-step t, the attention component combines the encoder's output at t and the decoder's hidden state at t - 1 to compute an attention weight vector
 The decoder was modified to update its hidden states based on both latent spaces during the reconstruction of the video
(figure: variational attention auto-encoder)
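Treating attention values as Gaussian random variables means sampling them with the usual reparameterization trick, z = mu + sigma * eps with eps ~ N(0, 1), so gradients can flow through mu and sigma. A minimal sketch; the networks that produce mu and log-variance are omitted, and the latent size is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample from N(mu, exp(log_var))."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(16)       # mean of the latent attention vector (size assumed)
log_var = np.zeros(16)  # log-variance; sigma = 1 here
z = sample_gaussian(mu, log_var, rng)
```

As the variance shrinks toward zero the sample collapses onto the mean, which is how the stochastic attention relates to its deterministic counterpart.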

Developed approach
2) Adversarial learning driven by a deterministic attention auto-encoder
 Inspired by the efficiency of the attention-based encoder-decoder network in [13]
 Built on the findings of [4] w.r.t. the impact of deterministic attention on VAEs
 The VAE was entirely replaced by an attention auto-encoder (AAE) network, forming the SUM-GAN-AAE architecture
  • 18–25. Developed approach — 2) Adversarial learning driven by a deterministic attention auto-encoder (attention auto-encoder; built up stepwise on these slides) Processing pipeline:  Weighted feature vectors are fed to the Encoder  The Encoder’s output (V) and the Decoder’s previous hidden state are fed to the Attention component  For t > 1: use the hidden state of the previous Decoder step (ht−1)  For t = 1: use the hidden state of the last Encoder step (He)  Attention weights (αt) are computed using an energy score function followed by a soft-max  αt is multiplied with V to form the context vector vt’  vt’ is combined with the Decoder’s previous output yt−1  The Decoder gradually reconstructs the video
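One decoding step of the pipeline above can be sketched as follows. This is a minimal illustration in plain Python: the dot-product energy score is an assumption for brevity ([13] uses a learned score function), and the list-of-lists representation of V stands in for the encoder's output matrix:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_step(V, h_prev):
    """One attention step: score each encoder output in V against the
    decoder's previous hidden state h_prev, normalize the energies with
    a soft-max to get the weights alpha_t, and form the context vector
    v_t' as the alpha-weighted sum of V."""
    energies = [dot(v, h_prev) for v in V]          # energy score function
    m = max(energies)                               # max-shift for stability
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    alphas = [e / total for e in exps]              # soft-max -> alpha_t
    context = [sum(a * v[d] for a, v in zip(alphas, V))
               for d in range(len(V[0]))]           # v_t' = sum_i alpha_i V_i
    return alphas, context
```

With this in place, the decoder combines v_t' with its previous output y_{t−1} to produce the next reconstructed frame feature, exactly as the pipeline bullets describe.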
  • 26. Developed approach — Model’s I/O and summarization process  Input: the CNN feature vectors of the (sampled) video frames  Output: frame-level importance scores  Summarization process:  CNN features pass through the linear compression layer and the frame selector → importance scores computed at the frame level  Given a video segmentation (using KTS [18]), fragment-level importance scores are calculated by averaging the scores of each fragment’s frames  The summary is created by selecting the fragments that maximize the total importance score, provided that the summary length does not exceed 15% of the video duration, by solving the 0/1 Knapsack problem
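The last two bullets (fragment averaging and Knapsack selection) can be sketched as below. This is an illustrative implementation, not the authors' code; fragments are assumed to be given as inclusive (start, end) frame-index pairs, and the budget is expressed in frames:

```python
def fragment_scores(frame_scores, fragments):
    """Average the frame-level importance scores within each
    (KTS-produced) fragment, given as inclusive (start, end) pairs."""
    return [sum(frame_scores[s:e + 1]) / (e + 1 - s) for s, e in fragments]

def select_fragments(scores, lengths, budget):
    """0/1 Knapsack by dynamic programming: pick the fragments that
    maximize total importance without exceeding the length budget
    (15% of the video duration)."""
    n = len(scores)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(budget + 1):
            dp[i][c] = dp[i - 1][c]                     # skip fragment i-1
            if lengths[i - 1] <= c:                     # or take it, if it fits
                cand = dp[i - 1][c - lengths[i - 1]] + scores[i - 1]
                if cand > dp[i][c]:
                    dp[i][c] = cand
    # Backtrack to recover which fragments were selected
    picked, c = [], budget
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            picked.append(i - 1)
            c -= lengths[i - 1]
    return sorted(picked)
```

For example, with fragment scores [0.9, 0.1, 0.8, 0.2], lengths [2, 2, 2, 4] and a 4-frame budget, the two highest-scoring fragments that fit (indices 0 and 2) are selected.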
  • 27. Experiments — Datasets  SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark)  25 videos capturing multiple events (e.g. cooking and sports)  Video length: 1.6 to 6.5 min  Annotation: fragment-based video summaries  TVSum (https://github.com/yalesong/tvsum)  50 videos from 10 categories of the TRECVid MED task  Video length: 1 to 5 min  Annotation: frame-level importance scores
  • 28. Experiments — Evaluation protocol  The generated summary should not exceed 15% of the video length  Similarity between the automatically generated (A) and the ground-truth (G) summary is expressed by the F-Score (%), with (P)recision and (R)ecall measuring their temporal overlap (∩), where ‖·‖ denotes duration: P = ‖A ∩ G‖ / ‖A‖, R = ‖A ∩ G‖ / ‖G‖, F = 2·P·R / (P + R)  These are the typical metrics for computing Precision and Recall at the frame level
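At the frame level, the protocol reduces to comparing two binary masks. A minimal sketch (the mask representation is an illustrative choice, with 1 marking a frame that belongs to the summary):

```python
def f_score(auto, gt):
    """Frame-level F-Score between an automatically generated summary
    and a ground-truth summary, both given as binary frame masks.
    Precision = overlap / |auto|, Recall = overlap / |gt|."""
    overlap = sum(a and g for a, g in zip(auto, gt))
    if not overlap:
        return 0.0
    precision = overlap / sum(auto)
    recall = overlap / sum(gt)
    return 2 * precision * recall / (precision + recall)

# A summary overlapping the ground truth on 1 of 2 selected frames:
score = f_score([1, 1, 0, 0], [1, 0, 1, 0])  # P = R = 0.5 -> F = 0.5
```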
  • 29–34. Experiments — Evaluation protocol  Slight but important distinction w.r.t. what is eventually used as the ground-truth summary  Most used approach (by [1, 6, 7, 8, 14, 20, 21, 26, 27, 29, 30, 31, 32, 33]): the generated summary is compared against each of the N available user summaries, producing F-Score1, F-Score2, …, F-ScoreN; these are then reduced per dataset  SumMe: keep the maximum of the N F-Scores  TVSum: keep the average of the N F-Scores
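The multiple-user-summaries protocol can be sketched as follows; the max-for-SumMe / mean-for-TVSum reduction follows the convention established in [27], and the binary-mask representation is an illustrative choice:

```python
def f_score(auto, gt):
    """Frame-level F-Score between two binary frame masks."""
    overlap = sum(a and g for a, g in zip(auto, gt))
    if not overlap:
        return 0.0
    p, r = overlap / sum(auto), overlap / sum(gt)
    return 2 * p * r / (p + r)

def evaluate_multi_user(auto, user_summaries, dataset):
    """Compute one F-Score per user summary (F-Score1..F-ScoreN), then
    reduce: maximum for SumMe, average for TVSum (convention of [27])."""
    scores = [f_score(auto, gt) for gt in user_summaries]
    return max(scores) if dataset == "SumMe" else sum(scores) / len(scores)
```

The alternative protocol on the next slide skips the reduction entirely: it builds a single ground-truth summary per video and computes one F-Score against it.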
  • 35–36. Experiments — Evaluation protocol  Slight but important distinction w.r.t. what is eventually used as the ground-truth summary  Alternative approach (used in [9, 13, 16, 24, 25, 28]): the generated summary is compared against a single ground-truth summary per video, yielding one F-Score
  • 37. Experiments — Implementation details  Videos were down-sampled to 2 fps  Feature extraction was based on the pool5 layer of GoogLeNet trained on ImageNet  The linear compression layer reduces the size of these vectors from 1024 to 500  All components are 2-layer LSTMs with 500 hidden units; the frame selector is a bi-directional LSTM  Training is based on the Adam optimizer; Summarizer’s learning rate = 10^-4; Discriminator’s learning rate = 10^-5  Each dataset was split into two non-overlapping sets: a training set with 80% of the data and a testing set with the remaining 20%  Experiments were run on 5 differently created random splits, and the average performance at the training-epoch level (i.e. for the same training epoch) over these runs is reported
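The 5-random-splits setup in the last bullet can be sketched as below; the function name and seeding are illustrative, not taken from the released code:

```python
import random

def make_splits(n_videos, n_splits=5, train_frac=0.8, seed=0):
    """Create n_splits random 80/20 train/test partitions of the video
    indices; within each split the two sets are non-overlapping and
    together cover the whole dataset."""
    rng = random.Random(seed)  # fixed seed for reproducible splits
    splits = []
    for _ in range(n_splits):
        idx = list(range(n_videos))
        rng.shuffle(idx)
        cut = int(train_frac * n_videos)
        splits.append((idx[:cut], idx[cut:]))
    return splits

# e.g. for TVSum's 50 videos: 5 splits of 40 training / 10 testing videos
splits = make_splits(50)
```

The reported numbers are then the per-epoch averages of the test F-Scores over the 5 splits.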
  • 38. Experiments — Step 1: Assessing the impact of the regularization factor σ (figure: performance curves for SUM-GAN-sl, SUM-GAN-VAAE and SUM-GAN-AAE)
  • 39. Experiments — Step 1: Assessing the impact of the regularization factor σ  Outcomes:  The value of σ affects the models’ performance and needs fine-tuning  This fine-tuning is dataset-dependent  The best overall performance of each model is observed for a different σ value (figure: performance curves for SUM-GAN-sl, SUM-GAN-VAAE and SUM-GAN-AAE)
  • 40. Experiments — Step 2: Comparison with SoA unsupervised approaches based on multiple user summaries  Outcomes:  A few SoA methods are comparable to (or even worse than) a random summary generator  The best method on TVSum shows random-level performance on SumMe  The best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum  Variational attention reduces the efficiency of SUM-GAN-sl, due to the difficulty of learning two latent spaces in parallel with the continuous updates of the model’s components during training  Replacing the VAE with the AAE leads to a noticeable performance improvement over SUM-GAN-sl (+/− indicate better/worse performance compared to SUM-GAN-AAE; note: SUM-GAN is not listed in this table as it follows the single ground-truth-summary evaluation protocol)
  • 41. Experiments — Step 3: Evaluating the effect of the introduced AAE component  Key-fragment selection: the attention mechanism leads to a much smoother series of importance scores
  • 42. Experiments — Step 3: Evaluating the effect of the introduced AAE component  Training efficiency: much faster and more stable training of the model (figure: loss curves for SUM-GAN-sl and SUM-GAN-AAE)
  • 43. Experiments — Step 4: Comparison with SoA supervised approaches based on multiple user summaries  Outcomes:  The best methods on TVSum (MAVS and Tessellationsup, respectively) seem adapted to this dataset, as they exhibit random-level performance on SumMe  Only a few supervised methods surpass the performance of a random summary generator on both datasets, with VASNet being the best among them  The performance of these methods ranges between 44.1–49.7 on SumMe and 56.1–61.4 on TVSum  The unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods
  • 44. Experiments — Step 5: Comparison with SoA approaches based on single ground-truth summaries  Impact of the regularization factor σ (best scores in bold in the original table)  The model’s performance is affected by the value of σ  The effect of σ depends (also) on the evaluation approach; the best performance when using multiple human summaries was observed for σ = 0.15  SUM-GAN-AAE outperforms the original SUM-GAN model on both datasets, even for the same value of σ
  • 45. Experiments — Step 5: Comparison with SoA approaches based on single ground-truth summaries  Outcomes:  The SUM-GAN-AAE model performs consistently well on both datasets  SUM-GAN-AAE shows improved performance compared to SoA supervised and unsupervised (*) summarization methods (unsupervised approaches marked with an asterisk)
  • 46. Summarization example  Full video  Generated summary (videos shown on the slide)
  • 47. Conclusions  Presented a video summarization method that combines:  The effectiveness of attention mechanisms in spotting the most important parts of the video  The learning efficiency of generative adversarial networks for unsupervised training  Experimental evaluations on two benchmark datasets:  Documented the positive contribution of the introduced attention auto-encoder component to the model’s training and summarization performance  Highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video summarization techniques
  • 48. Key references (1/2)
    1. E. Apostolidis, et al.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: AI4TV, ACM MM 2019
    2. E. Apostolidis, et al.: Fast shot segmentation combining global and local visual descriptors. In: IEEE ICASSP 2014. pp. 6583-6587
    3. K. Apostolidis, et al.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: MMM 2018. pp. 29-41
    4. H. Bahuleyan, et al.: Variational attention for sequence-to-sequence models. In: 27th COLING. pp. 1672-1682 (2018)
    5. J. Cho: PyTorch implementation of SUM-GAN (2017), https://github.com/j-min/Adversarial_Video_Summary (last accessed on Oct. 18, 2019)
    6. M. Elfeki, et al.: Video summarization via actionness ranking. In: IEEE WACV 2019. pp. 754-763
    7. J. Fajtl, et al.: Summarizing videos with attention. In: ACCV 2018. pp. 39-54
    8. L. Feng, et al.: Extractive video summarizer with memory augmented neural networks. In: ACM MM 2018. pp. 976-983
    9. T. Fu, et al.: Attentive and adversarial learning for video summarization. In: IEEE WACV 2019. pp. 1579-1587
    10. M. Gygli, et al.: Creating summaries from user videos. In: ECCV 2014. pp. 505-520
    11. M. Gygli, et al.: Video summarization by learning submodular mixtures of objectives. In: IEEE CVPR 2015. pp. 3090-3098
    12. S. Hochreiter, et al.: Long Short-Term Memory. Neural Computation 9(8), 1735-1780 (1997)
    13. Z. Ji, et al.: Video summarization with attention-based encoder-decoder networks. IEEE Trans. on Circuits and Systems for Video Technology (2019)
    14. D. Kaufman, et al.: Temporal Tessellation: A unified approach for video analysis. In: IEEE ICCV 2017. pp. 94-104
    15. S. Lee, et al.: A memory network approach for story-based temporal summarization of 360 videos. In: IEEE CVPR 2018. pp. 1410-1419
    16. B. Mahasseni, et al.: Unsupervised video summarization with adversarial LSTM networks. In: IEEE CVPR 2017. pp. 2982-2991
    17. M. Otani, et al.: Video summarization using deep semantic features. In: ACCV 2016. pp. 361-377
  • 49. Key references (2/2)
    18. D. Potapov, et al.: Category-specific video summarization. In: ECCV 2014. pp. 540-555
    19. A. Radford, et al.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR 2016
    20. M. Rochan, et al.: Video summarization by learning from unpaired data. In: IEEE CVPR 2019
    21. M. Rochan, et al.: Video summarization using fully convolutional sequence networks. In: ECCV 2018. pp. 358-374
    22. Y. Song, et al.: TVSum: Summarizing web videos using titles. In: IEEE CVPR 2015. pp. 5179-5187
    23. C. Szegedy, et al.: Going deeper with convolutions. In: IEEE CVPR 2015. pp. 1-9
    24. H. Wei, et al.: Video summarization via semantic attended networks. In: AAAI 2018. pp. 216-223
    25. L. Yuan, et al.: Cycle-SUM: Cycle-consistent adversarial LSTM networks for unsupervised video summarization. In: AAAI 2019. pp. 9143-9150
    26. Y. Yuan, et al.: Video summarization by learning deep side semantic embedding. IEEE Trans. on Circuits and Systems for Video Technology 29(1), 226-237 (2019)
    27. K. Zhang, et al.: Video summarization with Long Short-Term Memory. In: ECCV 2016. pp. 766-782
    28. Y. Zhang, et al.: DTR-GAN: Dilated temporal relational adversarial network for video summarization. In: ACM TURC 2019. pp. 89:1-89:6
    29. Y. Zhang, et al.: Unsupervised object-level video summarization with online motion auto-encoder. Pattern Recognition Letters (2018)
    30. B. Zhao, et al.: Hierarchical recurrent neural network for video summarization. In: ACM MM 2017. pp. 863-871
    31. B. Zhao, et al.: HSA-RNN: Hierarchical structure-adaptive RNN for video summarization. In: IEEE/CVF CVPR 2018. pp. 7405-7414
    32. K. Zhou, et al.: Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI 2018. pp. 7582-7589
    33. K. Zhou, et al.: Video summarisation by classification with deep reinforcement learning. In: BMVC 2018
  • 50. Thank you for your attention! Questions? Vasileios Mezaris, bmezaris@iti.gr  Code and documentation publicly available at: https://github.com/e-apostolidis/SUM-GAN-AAE  This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV. The work of Ioannis Patras has been supported by EPSRC under grant No. EP/R026424/1.