GAN-based video summarization

Download to read offline

Presentation on "GAN-based video summarization" at the AI4Media Workshop on GANs for Media Content Generation, on October 1st, 2020.

GAN-based video summarization

  1. GAN-based Video Summarization. Vasileios Mezaris, CERTH-ITI. Presentation at the AI4Media Workshop on GANs for Media Content Generation, Thessaloniki, October 2020. Joint work with E. Apostolidis, E. Adamantidou, A. Metsai (CERTH-ITI) and I. Patras (QMUL).
  2. Problem statement. Video summary: a short visual summary that encapsulates the flow of the story and the essential parts of the full-length video. Example: original video → video summary (storyboard).
  3. Problem statement: applications of video summarization.  Professional CMS: effective indexing, browsing, retrieval and promotion of media assets.  Video sharing platforms: improved viewer experience, enhanced viewer engagement and increased content consumption.  Other summarization scenarios: movie trailer production, sports highlights video generation, video synopsis of 24h surveillance recordings.
  4. Related work: deep-learning approaches.  Various supervised methods (i.e., learning from ground-truth, manually-generated summaries): using feedforward neural nets (CNNs) for e.g. identifying semantically-important video parts; exploiting video-level metadata; capturing the story flow using recurrent neural nets (e.g. LSTMs); and many more.  Unsupervised algorithms that do not rely on human annotations, and build summaries using adversarial learning to: minimize the distance between videos and their summary-based reconstructions; maximize the mutual information between summary and video; learn a mapping from raw videos to human-like summaries based on summaries available online; and a few more approaches (see the tutorial at IEEE ICME 2020, https://www.slideshare.net/VasileiosMezaris/icme2020-tutorial-videosummarizationpart1). Advantages of unsupervised methods: no need for training data (limited, hard to produce); avoidance of the subjectivity and biases of manually-generated summaries; adaptability to different types of video.
  5. GANs for unsupervised video summarization: SUM-GAN.  Our starting point: the SUM-GAN architecture [1].  Main idea: build a keyframe selection mechanism by minimizing the distance between the deep representations of the original video and a reconstructed version of it based on the selected keyframes.  Problem: how to define a good distance?  Solution: use a trainable discriminator network!  Goal: train the Summarizer to maximally confuse the Discriminator when distinguishing the original from the reconstructed video. [1] B. Mahasseni, M. Lam, S. Todorovic, "Unsupervised Video Summarization with Adversarial LSTM Networks", 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2982-2991.
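The adversarial idea on this slide can be illustrated with a minimal toy sketch. This is not the actual SUM-GAN model (which uses LSTM-based components); the linear summarizer/discriminator, their weights and all sizes here are hypothetical stand-ins, chosen only to show how frame scores produce a summary-based reconstruction whose "realness" a discriminator judges:

```python
import numpy as np

rng = np.random.default_rng(0)

def summarizer_scores(features, w):
    # Toy frame selector: per-frame importance in (0, 1) via a linear layer + sigmoid.
    return 1.0 / (1.0 + np.exp(-features @ w))

def discriminator(features, v):
    # Toy discriminator: scores in (0, 1) how much a sequence "looks original".
    return 1.0 / (1.0 + np.exp(-(features.mean(axis=0) @ v)))

T, D = 8, 4                      # frames x feature size (tiny, for illustration)
video = rng.normal(size=(T, D))  # stand-in for CNN frame features
w = rng.normal(size=D)           # hypothetical summarizer weights
v = rng.normal(size=D)           # hypothetical discriminator weights

scores = summarizer_scores(video, w)      # frame-level importance scores
reconstruction = scores[:, None] * video  # reconstruction based on selected frames
d_recon = discriminator(reconstruction, v)

# The Summarizer's adversarial goal is to make the reconstruction indistinguishable
# from the original, e.g. by minimizing a generator-style loss such as -log(d_recon).
gen_loss = -np.log(d_recon + 1e-12)
```

Training would alternate updates of the two networks, with the trainable discriminator effectively defining the distance between original and reconstructed video.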
  6. GANs for unsupervised video summarization: SUM-GAN-sl. Introduces two extensions [2]:  A linear compression layer that reduces the size of the CNN feature vectors.  An incremental and fine-grained approach to train the model's components. [2] E. Apostolidis, A. Metsai, E. Adamantidou, V. Mezaris, I. Patras, "A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization", Proc. 1st Int. Workshop on AI for Smart TV Content Production, Access and Delivery (AI4TV'19) at ACM Multimedia 2019, Nice, France, October 2019.
  7-10. GANs for unsupervised video summarization: SUM-GAN-sl.  Incremental approach to train the model's components, presented step by step over four slides (the loss functions, including a regularization factor, are given on the slides).
  11. GANs for unsupervised video summarization: SUM-GAN-AAE.  Adversarial learning driven by a deterministic attention auto-encoder.  The VAE of the previous architecture was entirely replaced by an attention auto-encoder (AAE) network, forming the SUM-GAN-AAE architecture [3]. [3] E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras, "Unsupervised Video Summarization via Attention-Driven Adversarial Learning", Proc. 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
  12-19. SUM-GAN-AAE: attention auto-encoder processing pipeline, built up step by step over eight slides.  Weighted feature vectors are fed to the Encoder.  The Encoder's output (V) and the Decoder's previous hidden state are fed to the Attention component: for t > 1, the hidden state of the previous Decoder step (ht-1); for t = 1, the hidden state of the last Encoder step (He).  Attention weights (αt) are computed using an energy score function followed by a soft-max function.  αt is multiplied with V to form the context vector vt'.  vt' is combined with the Decoder's previous output yt-1.  The Decoder gradually reconstructs the video.
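One step of the attention mechanism described on these slides can be sketched as follows. This is a simplified numpy illustration, not the paper's exact formulation: the bilinear energy score `V @ W @ h_prev` is one common choice of energy function, and the matrix `W` and all sizes are hypothetical:

```python
import numpy as np

def softmax(x):
    # Numerically stable soft-max over the frame axis.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(V, h_prev, W):
    """One decoding step of a simplified attention auto-encoder.

    V      : (T, D) Encoder outputs for T frames
    h_prev : (D,)   Decoder's previous hidden state (Encoder's last state at t = 1)
    W      : (D, D) trainable energy matrix (a common bilinear choice; hypothetical)
    """
    energies = V @ W @ h_prev  # energy score per frame
    alpha = softmax(energies)  # attention weights alpha_t, summing to 1
    context = alpha @ V        # context vector v_t' = sum_i alpha_i * V_i
    return alpha, context

rng = np.random.default_rng(1)
T, D = 6, 4
V = rng.normal(size=(T, D))
h_prev = rng.normal(size=D)
W = rng.normal(size=(D, D))

alpha, context = attention_step(V, h_prev, W)
```

The context vector would then be combined with the Decoder's previous output before the next reconstruction step.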
  20. Video summarization practicalities: model's I/O and summarization process.  Input: the CNN feature vectors of the (sampled) video frames.  Output: frame-level importance scores.  Summarization process: CNN features pass through the linear compression layer and the frame selector, and importance scores are computed at the frame level; given a video segmentation (using KTS), fragment-level importance scores are calculated by averaging the scores of each fragment's frames; the summary is created by selecting the fragments that maximize the total importance score, provided that the summary length does not exceed 15% of the video duration, by solving the 0/1 Knapsack problem.
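The fragment-selection step can be sketched with a standard 0/1 Knapsack dynamic program. The fragment durations and scores below are made-up toy values; the budget stands in for the 15%-of-duration limit:

```python
def select_fragments(durations, scores, budget):
    """0/1 Knapsack over video fragments: maximize total importance
    subject to total duration <= budget (e.g. 15% of the video)."""
    n = len(durations)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d, s = durations[i - 1], scores[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]          # skip fragment i-1
            if d <= b:                            # or take it, if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - d] + s)
    # Backtrack to recover the chosen fragment indices.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= durations[i - 1]
    return sorted(chosen)

# Toy fragment durations (seconds) and fragment-level importance scores
# (averaged frame scores); budget = 15% of a 100 s video.
durations = [6, 4, 5, 3, 7]
scores = [0.9, 0.4, 0.8, 0.7, 0.3]
print(select_fragments(durations, scores, budget=15))  # -> [0, 2, 3]
```

Fragments 0, 2 and 3 fit within the 15 s budget (14 s total) and jointly maximize the summed importance.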
  21. Experiments: datasets.  SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark): 25 videos capturing multiple events (e.g. cooking and sports); video length: 1 to 6 min; annotation: fragment-based video summaries.  TVSum (https://github.com/yalesong/tvsum): 50 videos from 10 categories of the TRECVid MED task; video length: 1 to 11 min; annotation: frame-level importance scores.
  22. Experiments: evaluation protocol.  The generated summary should not exceed 15% of the video length.  Similarity between the automatically generated summary (A) and the ground-truth summary (G) is expressed by the F-Score (%), with (P)recision and (R)ecall measuring their temporal overlap (∩), where ‖·‖ denotes duration: P = ‖A ∩ G‖ / ‖A‖, R = ‖A ∩ G‖ / ‖G‖, F = 2PR / (P + R).  These are the typical metrics for computing Precision and Recall at the frame level.
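At the frame level, the overlap-based F-Score reduces to set intersection over selected frame indices. A minimal sketch (the frame ranges are made-up toy values):

```python
def f_score(auto, gt):
    """Frame-level F-Score between an automatic summary A and a ground-truth
    summary G, each a set of selected frame indices.
    P = |A n G| / |A|, R = |A n G| / |G|, F = 2PR / (P + R)."""
    overlap = len(auto & gt)
    if overlap == 0:
        return 0.0
    p = overlap / len(auto)
    r = overlap / len(gt)
    return 2 * p * r / (p + r)

auto = set(range(0, 30))    # frames 0-29 selected by the model
gt = set(range(15, 45))     # frames 15-44 in the user summary
print(round(f_score(auto, gt) * 100, 1))  # -> 50.0 (%, as reported in the tables)
```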
  23-28. Experiments: evaluation protocol.  A slight but important distinction exists w.r.t. what is eventually used as the ground-truth summary.  Most used approach in the literature: the generated summary is compared against each of the N available user summaries of a video, producing F-Score1, F-Score2, …, F-ScoreN; these per-user scores are then combined into a single value per video, taking the maximum over the N scores for SumMe and their average for TVSum.
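The per-user score combination in this protocol (maximum for SumMe, average for TVSum) can be sketched directly; the F-Score values below are made-up toy numbers:

```python
def aggregate_f_scores(per_user_f, dataset):
    """Combine the N per-user F-Scores of one video into a single value,
    following the most used protocol: maximum for SumMe, average for TVSum."""
    if dataset == "SumMe":
        return max(per_user_f)
    if dataset == "TVSum":
        return sum(per_user_f) / len(per_user_f)
    raise ValueError(f"unknown dataset: {dataset}")

fs = [41.0, 55.5, 48.1]                            # toy F-Score1..F-ScoreN values
print(aggregate_f_scores(fs, "SumMe"))             # -> 55.5
print(round(aggregate_f_scores(fs, "TVSum"), 2))   # -> 48.2
```

The choice of aggregation matters: a method can look notably stronger under the max rule than under the average, which is part of why the protocol distinction on these slides is important.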
  29-30. Experiments: evaluation protocol.  Alternative approach: a single ground-truth summary is formed for each video from the available user annotations, and one F-Score is computed against it.
  31. Experiments: implementation details.  Videos were down-sampled to 2 fps.  Feature extraction was based on the pool5 layer of GoogleNet trained on ImageNet.  The linear compression layer reduces the size of these vectors from 1024 to 500.  All components are 2-layer LSTMs with 500 hidden units; the frame selector is a bi-directional LSTM.  Training was based on the Adam optimizer; Summarizer's learning rate = 10^-4; Discriminator's learning rate = 10^-5.  Each dataset was split into two non-overlapping sets: a training set with 80% of the data and a testing set with the remaining 20%.  Experiments were run on 5 differently created random splits, and the average performance is reported at the training-epoch level (i.e. for the same training epoch) over these runs.
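The 5-split evaluation setup can be sketched as follows; the function name and the fixed seed are hypothetical choices for reproducibility, not part of the original work:

```python
import random

def make_splits(video_ids, n_splits=5, train_frac=0.8, seed=42):
    """Create n random train/test splits; within each split, train and test
    are non-overlapping (80% / 20%), as in the experiments described above."""
    splits = []
    rng = random.Random(seed)  # fixed seed: hypothetical, for reproducibility
    for _ in range(n_splits):
        ids = list(video_ids)
        rng.shuffle(ids)
        cut = int(len(ids) * train_frac)
        splits.append((ids[:cut], ids[cut:]))
    return splits

# e.g. the 50 TVSum videos -> 5 splits of 40 training / 10 testing videos each
splits = make_splits(range(50))
```

Performance would then be averaged over the 5 runs at each training epoch, rather than cherry-picking the best epoch per run.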
  32. Experiments: comparison with SoA unsupervised approaches based on multiple user summaries.  Outcomes:  A few SoA methods are comparable to (or even worse than) a random summary generator.  The best method on TVSum shows random-level performance on SumMe.  The best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum.  Variational attention reduces SUM-GAN-sl's efficiency, due to the difficulty of efficiently defining two latent spaces in parallel with the continuous updating of the model's components during training.  Replacing the VAE with the AAE leads to a noticeable performance improvement over SUM-GAN-sl. Note: SUM-GAN is not listed in this table, as it follows the single gt-summary evaluation protocol.
  33. Experiments: evaluating the effect of the AAE component.  Training efficiency: much faster and more stable training of the model (loss curves for SUM-GAN-sl and SUM-GAN-AAE shown on the slide).
  34. Experiments: comparison with SoA supervised approaches based on multiple user summaries (+/- indicate better/worse performance compared to SUM-GAN-AAE).  Outcomes:  The best methods on TVSum (MAVS and Tessellationsup, respectively) seem adapted to this dataset, as they exhibit random-level performance on SumMe.  Only a few supervised methods surpass the performance of a random summary generator on both datasets, with VASNet being the best among them.  The performance of these methods ranges between 44.1-49.7 on SumMe and 56.1-61.4 on TVSum.  The unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods.
  35. Adapting / re-purposing the content.  Main requirements: target distribution platforms and devices have varying requirements (e.g. the optimal duration of a video differs from one platform to another); target audiences have different preferences / information needs.  Video summarization: create editions of the content that are adapted to different platforms and audiences.
  36. Adapting / re-purposing the content.  Web application [4] for video summarization (try it with your video!): http://multimedia2.iti.gr/videosummarization/service/start.html. Demo video: https://youtu.be/LbjPLJzeNII. [4] C. Collyda, K. Apostolidis, E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, "A Web Service for Video Summarization", Proc. ACM Int. Conf. on Interactive Media Experiences (IMX 2020), Barcelona, Spain, June 2020.
  37. Conclusions.  Presented two new video summarization methods, making use of: the learning efficiency of generative adversarial networks for unsupervised training; the effectiveness of attention mechanisms in spotting the most important parts of the video.  Experimental evaluations on two benchmark datasets: documented the positive contribution of the introduced attention auto-encoder component to the model's training and summarization performance; highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video summarization techniques.  Used GANs in a new web application for video summarization.  Keep in mind: complete automation is sometimes not desired! (AI + human symbiosis is key).
  38. Questions? Contact: Dr. Vasileios Mezaris, Information Technologies Institute, Centre for Research and Technology Hellas, Thermi-Thessaloniki, Greece. Tel: +30 2311 257770. Email: bmezaris@iti.gr, web: http://www.iti.gr/~bmezaris/. This work was supported in part by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV.
