ReTV AI4TV Summarization

ReTV project
ReTV projectReTV project
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Title of presentation
Subtitle
Name of presenter
Date
A Stepwise, Label-based Approach for Improving the
Adversarial Training in Unsupervised Video Summarization
E. Apostolidis1,2, A. I. Metsai1, E. Adamantidou1, V. Mezaris1, I. Patras2
1 CERTH-ITI, Thermi - Thessaloniki, Greece
2 School of EECS, Queen Mary University of London, London, UK
1st AI4TV Workshop, ACM MM 2019,
Nice, France, October 2019
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Outline
2
⚫ Problem statement
⚫ Related work
⚫ Limitations of existing approaches
⚫ Developed approach
⚫ Experiments
⚫ Summarization examples
⚫ Summary and next steps
⚫ Key references
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
3
Video summary: a short visual summary that encapsulates the flow of the story and
the essential parts of the full-length video
Original video
Video summary (storyboard)
Problem statement
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
4
Problem statement
General applications of video summarization
⚫ Professional CMS: effective indexing,
browsing, retrieval & promotion of media
assets!
⚫ Video sharing platforms: improved viewer
experience, enhanced viewer engagement &
increased content consumption!
⚫ However, different distribution channels have different requirements / restrictions w.r.t. the
video content duration (e.g. optimal Twitter videos < 45 sec.; ideal YouTube videos < 2 min.)
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
5
Related work
Deep-learning approaches
⚫ CNN-based methods that extract and use video semantics to identify important video parts
based on sequence labeling [49], self-attention networks [18], or video-level metadata [44, 46]
⚫ RNN-based approaches that capture temporal dependency over the video frames, using
⚫ e.g. LSTMs for keyframe selection [64]
⚫ hierarchies of LSTMs to identify the video structure and select key-fragments [70, 71]
⚫ combinations of LSTMs with DTR units and GANs to capture long-range frame dependency [67]
⚫ attention-based encoder-decoders [22, 28], or memory-augmented networks [19]
⚫ Unsupervised methods that do not rely on human-annotations, and build summaries
⚫ using adversarial learning objectives (GANs) to minimize the distance between training videos and a
distribution of their summarizations [40], or to maximize mutual information between the created
summary and the original video [60]
⚫ through a decision-making process that is learned via RL and reward functions [73]
⚫ by learning to extract key motions of appearing objects [68]
⚫ by learning a “video-to-summary” mapping from unpaired edited summaries available online [48]
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
6
Motivation and reasoning
Disadvantages of supervised learning
⚫ Restricted amount of annotated data is available for supervised training of a video
summarization method
⚫ Highly-subjective nature of video summarization (relying on viewer’s demands and aesthetics);
there is no “ideal” or commonly accepted summary that could be used for training an algorithm
Advantages of unsupervised learning
⚫ No need for learning data; avoid laborious and time-demanding labeling of video data
⚫ Adaptability to different types of video; summarization is learned based on the video content
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
⚫ Starting point: the SUM-GAN architecture
⚫ Main idea: build a keyframe selection mechanism
by minimizing the distance between the deep
representations of the original video and a
reconstructed version of it based on the selected
key-frames
⚫ Problem: how to define a good distance?
⚫ Solution: use a trainable discriminator network!
⚫ Goal: train the Summarizer to maximally confuse
the Discriminator when distinguishing the original
from the reconstructed video
7
Building on adversarial learning
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
⚫ New model: the SUM-GAN-sl architecture
⚫ Contains a linear compression layer that reduces:
⚫ The size of the CNN feature vectors from 1024
to 500
⚫ Thus also the number of trainable parameters
8
Building on adversarial learning
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
⚫ New model: the SUM-GAN-sl architecture
⚫ Follows an incremental and fine-grained approach to training the model’s components
9
Building on adversarial learning
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
⚫ New model: the SUM-GAN-sl architecture
⚫ Follows an incremental and fine-grained approach to training the model’s components
10
Building on adversarial learning
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
⚫ New model: the SUM-GAN-sl architecture
⚫ Follows an incremental and fine-grained approach to training the model’s components
11
Building on adversarial learning
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
⚫ New model: the SUM-GAN-sl architecture
⚫ Follows an incremental and fine-grained approach to training the model’s components
12
Building on adversarial learning
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
13
Datasets
⚫ SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark)
⚫ 25 videos capturing multiple events (e.g. cooking and sports)
⚫ video length: 1.6 to 6.5 min
⚫ annotation: fragment-based video summaries
⚫ TVSum (https://github.com/yalesong/tvsum)
⚫ 50 videos from 10 categories of TRECVid MED task
⚫ video length: 1 to 5 min
⚫ annotation: frame-level importance scores
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
14
Evaluation protocol
⚫ The generated summary should not exceed 15% of the video length
⚫ Similarity between automatically generated (A) and ground-truth (G) summary is expressed
by the F-Score (%), with (P)recision and (R)ecall measuring the temporal overlap (∩) (|| ||
means duration)
⚫ Typical metrics for computing Precision and Recall at the frame-level
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
15
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
16
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
17
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
F-Score1
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
18
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
F-Score2
F-Score1
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
19
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
F-ScoreN
F-Score2
F-Score1
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
20
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
F-ScoreN
F-Score2
F-Score1
SumMe: TVSum:
N
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
21
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Alternative approach (used in [22, 28, 40, 58, 60, 67])
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
22
Evaluation protocol
⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary
⚫ Alternative approach (used in [22, 28, 40, 58, 60, 67])
F-Score
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
⚫ To get insights about the efficiency of the used datasets and metrics, we examined:
⚫ the efficiency of a randomly generated summary (frames' importance scores were defined based
on a uniform distribution of probabilities, and the experiment was performed 100 times)
⚫ the human performance, i.e. how well a human annotator would perform based on the preferences
of the remaining annotators
⚫ the highest performance on TVSum according to the best user summary (with the highest overlap)
for each video of the dataset (best for SumMe is 100%)
⚫ Results consistent with the findings of a recent CVPR paper ([41])
Experiments
23
Preliminary study on summarization datasets
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
⚫ Step 1: Assessing the impact of regularization factor σ
Experiments
24
Evaluation outcomes
Findings
⚫ Value of σ affects the models’ performance and needs fine-
tuning
⚫ Too small and too large values lead to reduced efficiency, and
only a specific range of values results in good performance
⚫ Fine-tuning is dataset-dependent as the highest performance is
achieved for different values of σ in each dataset
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
⚫ Step 2: Selecting the best configuration
Experiments
25
Evaluation outcomes
Learning curves
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Findings
⚫ A few SoA methods are comparable (or even worse) with a random summary generator
⚫ Best method on SumMe (UnpairedVSN) performs slightly better than our method, and it is
less competitive on TVSum
⚫ Best method on TVSum shows random-level performance on SumMe (seems to be dataset-
tailored)
⚫ SUM-GAN-sl performs consistently well in both datasets and is the most competitive one
⚫ Step 3: Comparison with SoA unsupervised
approaches based on multiple user summaries
Experiments
26
Evaluation outcomes
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Findings
⚫ Best methods in TVSum (MAVS & Tessellationsup) are highly-adapted to this dataset, as they
exhibit random-level performance on SumMe
⚫ Only a few supervised methods clearly surpass the performance of a randomly-generated
summary on both datasets, with VASNet being the best among them
⚫ Their performance ranges in [44.1 - 49.7] on SumMe, and in [56.1 - 61.4] on TVSum
⚫ The performance of SUM-GAN-sl makes our unsupervised method comparable with SoA
supervised techniques for video summarization
⚫ Step 4: Comparison with SoA
supervised approaches based
on multiple user summaries
Experiments
27
Evaluation outcomes
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Findings
⚫ Model’s performance is affected by the value of σ
in a way similar to the one reported in [40]
⚫ Step 5: Comparison with SoA approaches based on single ground-truth summaries
⚫ Impact of regularization factor σ
Experiments
28
Evaluation outcomes
⚫ Τhe effect of this hyper-parameter depends on the used evaluation approach (best performance
when using multiple human summaries was observed for σ = 0.1)
⚫ SUM-GAN-sl clearly outperforms the original SUM-GAN model on both datasets
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Findings
⚫ Best version of SUM-GAN-sl (observed for σ = 0.5)
exceeds the performance of all other techniques
(both supervised and unsupervised ones) that
follow this evaluation protocol
⚫ Step 5: Comparison with SoA approaches based on single ground-truth summaries
Experiments
29
Evaluation outcomes
Unsupervised approaches
marked with an asterisk
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Summarization examples
30
Full video Generated summary
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
⚫ Summary
⚫ Conducted a study to assess and advance the effectiveness of unsupervised video summarization based
on adversarial learning
⚫ Focused on the SUM-GAN model and suggested a new training approach to advance the learning
efficiency of the adversarial module of the architecture
⚫ Examined the evaluation protocols and metrics, and made estimates on the possible performance on two
datasets and the suitability of the used metrics
⚫ Comparative evaluations with SoA showed that our model is among the best unsupervised methods and
comparable with supervised algorithms too
⚫ Next steps
⚫ Further improve our model by exploiting the efficiency of attention networks and the training capacity of
reinforcement learning approaches
⚫ Extend our model with mechanisms that capture the temporal structure of the video to support multi-
level video summarization
⚫ Investigate methods for video summarization tailored to the targeted audience and distribution channel
(apply rules about the content of the summary; integrate a human-critic in the pipeline)
Summary and next steps
31
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
1. J. Almeida, et al. Vison: Video summarization for online applications. Pat. Recogn. Lett., 33(4):397-409, Mar. 2012
2. R. Anirudh, et al. Diversity promoting online sampling for streaming video summarization. ICIP 2016, pp 3329-3333, Sept 2016
3. H. Bahuleyan, et al. Variational attention for sequence-to-sequence models. 27th Int. Conf. on Computational Linguistics, pp 1672-1682, USA, Aug. 2018
4. J. Calic, et al. Efficient layout of comic-like video summaries. IEEE Transactions on Circuits and Systems for Video Technology, 17(7):931-936, July 2007
5. P. P. K. Chan, et al. A novel method to reduce redundancy in adaptive threshold clustering key frame extraction systems. ICMLC 2011, vol 4, pp 1637-1642
6. J. Cho. PyTorch Implementation of SUM-GAN from “Unsupervised Video Summarization with Adversarial LSTM Networks”, 2017
7. K. Cho, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014 , pp. 1724-1734, Qatar, Oct. 2014
8. W. Chu, et al. Video co-summarization: Video summarization by visual co-occurrence. CVPR 2015, pp 3584-3592, June 2015
9. Y. Cong, et al. Sparse reconstruction cost for abnormal event detection. CVPR 2011, pp 3449-3456, June 2011
10. K. Darabi, et al. Personalized video summarization based on group scoring. ChinaSIP 2014, pp 310-314, July 2014
11. K. Davila, et al. Whiteboard video summarization via spatio-temporal conflict minimization. ICDAR 2017, vol 1, pp 355-362, Nov. 2017
12. S. E. F. de Avila, et al. Vsumm: A mechanism designed to produce static video summaries and a novel evaluation method. Pat. Rec. Lett., 32(1):56-68, Jan. 2011
13. M. S. Drew, et al. Clustering of compressed illumination-invariant chromaticity signatures for efficient video summarization. Im. Vis. Comp., 21:705-716, 2003
14. N. Ejaz, et al. Feature aggregation based visual attention model for video summarization. Computers & Electrical Engineering, 40(3):993-1005, 2014
15. N. Ejaz, et al. Adaptive key frame extraction for video summarization using an aggregation mechanism. J. of Vis. Comm. & Image Repr., 23(7):1031-1040, 2012
16. M. Elfeki, et al. Video summarization via actionness ranking. WACV 2019, Waikoloa Village, HI, USA, pp 754-763, Jan 2019
17. E. Elhamifar, et al. See all by looking at a few: Sparse modeling for finding representative objects. CVPR 2012, pp 1600-1607, June 2012
18. J. Fajtl, et al. Summarizing videos with attention. ACCV 2018 Workshops, pp 39-54, Cham, 2019. Springer International Publishing
19. L. Feng, et al. Extractive video summarizer with memory augmented neural networks. ACM MM 2018, pp 976-983, New York, NY, USA, 2018. ACM
20. S. Feng, et al. Online content-aware video condensation. CVPR 2012, pp 2082-2087, June 2012
21. M. Fleischman, et al. Temporal feature induction for baseball highlight classification. ACM MM 2007, pp 333-336, New York, NY, USA, 2007. ACM
22. T. Fu, et al. Attentive and adversarial learning for video summarization. WACV 2019, Waikoloa Village, HI, USA, pp 1579-1587, January 2019
23. M. Furini, et al. Stimo: Still and moving video storyboard for the web scenario. Multimedia Tools Appl., 46(1):47-69, Jan. 2010
Key references
32
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
24. B. Gong, et al. Diverse sequential subset selection for supervised video summarization. NIPS 2014, pp 2069-2077, Cambridge, MA, USA, 2014. MIT Press
25. M. Gygli, et al. Video summarization by learning sub-modular mixtures of objectives. CVPR 2015, pp 3090-3098, June 2015
26. M. Gygli, et al. Creating summaries from user videos. ECCV 2014, pp 505-520, Cham, 2014. Springer International Publishing
27. S. Hochreiter et al. Long short-term memory. Neural computation, 9(8):1735-1780, 1997
28. Z. Ji, et al. Video summarization with attention-based encoder-decoder networks. IEEE Trans. on Circuits and Systems for Video Technology, 2019
29. D. Kaufman, G. Levi, T. Hassner, and L. Wolf. Temporal tessellation: A unified approach for video analysis. ICCV 2017, pp 94-104, Oct 2017
30. A. Khosla, et al. Large-scale video summarization using web-image priors. CVPR 2013, pp 2698-2705, June 2013
31. C. Kim, et al. Object-based video abstraction for video surveillance systems. IEEE TCSVT, 12(12):1128-1138, Dec 2002
32. J.-L. Lai, et al. Key frame extraction based on visual attention model. Journal of Visual Communication and Image Representation, 23(1):114-125, 2012
33. S. Lee, et al. A Memory Network Approach for Story-based Temporal Summarization of 360 Videos. CVPR 2018
34. Y. J. Lee, et al. Discovering important people and objects for egocentric video summarization. CVPR 2012, pp 1346-1353, June 2012
35. X. Li, et al. A general framework for edited video and raw video summarization. IEEE Trans. on Image Processing, 26(8):3652-3664, Aug 2017
36. Y. Li, et al. An overview of video abstraction techniques. Hewlett Packard, Technical Reports, 01 2001
37. S. Lu, et al. A bag-of-importance model with locality-constrained coding based feature learning for video summarization. IEEE TMM, 16(6):1497-1509, Oct 2014
38. Z. Lu, et al. Story-driven summarization for egocentric video. CVPR 2013, pp 2714-2721, June 2013
39. Y.-F. Ma, et al. A generic framework of user attention model and its application in video summarization. IEEE Trans. on Multimedia, 7(5):907{919, Oct 2005
40. B. Mahasseni, et al. Unsupervised video summarization with adversarial LSTM networks. CVPR 2017, pp 2982-2991
41. M. Otani, et al. Rethinking the evaluation of video summaries. CVPR 2019
42. M. Otani, et al. Video summarization using deep semantic features. ACCV 2016
43. M. Otani, et al. Video summarization using textual descriptions for authoring video blogs. Multimedia Tools and Applications, 76(9):12097-12115, May 2017
44. R. Panda, et al. Weakly supervised summarization of web videos. ICCV 2017, pp 3677-3686, Oct 2017
45. J. Peng, et al. Keyframe-based video summary using visual attention clues. IEEE MultiMedia, 17(2):64-73, April 2010
46. D. Potapov, et al. Category-specific video summarization. ECCV 2017, pages 540-555, Zurich, Switzerland, Sept. 2014. Springer
Key references
33
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
47. A. Radford, et al. Unsupervised representation learning with deep convolutional generative adversarial networks. ICLR 2016
48. M. Rochan, et al. Video summarization by learning from unpaired data. CVPR 2019
49. M. Rochan, et al. Video summarization using fully convolutional sequence networks. ECCV 2018, pp 358-374, Cham, 2018. Springer International Publishing
50. A. Sharghi, et al. Query-focused extractive video summarization. ECCV 2016, pp 3-19, Cham, 2016. Springer International Publishing
51. Y. Song, et al. Tvsum: Summarizing web videos using titles. CVPR 2015, pp 5179-5187, June 2015
52. M. Sun, et al. Ranking highlights in personal videos by analyzing edited videos. IEEE Trans. on Image Processing, 25(11):5145-5157, Nov 2016
53. C. Szegedy, et al. Going deeper with convolutions. CVPR 2015, pp 1-9, June 2015
54. B. T. Truong, et al. Video abstraction: A systematic review and classification. ACM Trans. Multimedia Comput. Commun. Appl., 3(1), Feb. 2007
55. A. B. Vasudevan, et al. Query-adaptive video summarization via quality-aware relevance estimation. ACM MM 2017, pp 582-590, NY, USA, 2017. ACM
56. T. Wang, et al. Video collage: A novel presentation of video sequence. ICME 2007, pp 1479-1482, July 2007
57. X. Wang, et al. User-specific video summarization. ICMSP 2011, vol 1, pp 213-219, May 2011
58. H. Wei, et al. Video summarization via semantic attended networks. AAAI 2018
59. T. Yao, et al. Highlight detection with pairwise deep ranking for first-person video summarization. CVPR 2016, pp 982-990, June 2016
60. L. Yuan, et al. Cycle-sum: Cycle-consistent adversarial LSTM networks for unsupervised video summarization. AAAI 2019
61. Y. Yuan, et al. Video summarization by learning deep side semantic embedding. IEEE TCSVT, 29(1):226-237, Jan 2019
62. H. Zhang, et al. An integrated system for content-based video retrieval and browsing. Pattern Recognition, 30:643-658, 1997
63. K. Zhang, et al. Summary transfer: Exemplar-based subset selection for video summarization. CVPR 2016, pp 1059-1067, 2016
64. K. Zhang, et al. Video summarization with long short-term memory. ECCV 2016, pp 766-782, Cham, 2016. Springer International Publishing
65. S. Zhang, et al. Context-aware surveillance video summarization. IEEE Trans. on Image Processing, 25(11):5469-5478, Nov 2016
66. X.-D. Zhang, et al. Dynamic selection and effective compression of key frames for video abstraction. Pattern Recogn. Lett., 24(9-10):1523-1532, June 2003
67. Y. Zhang, et al. DTR-GAN: Dilated temporal relational adversarial network for video summarization. ACM TUR-C 2019: 89:1-89:6
68. Y. Zhang, et al. Unsupervised object-level video summarization with online motion auto-encoder. Pattern Recognition Letters, 2018
69. Y. Zhang, et al. Motion-state-adaptive video summarization via spatiotemporal analysis. IEEE TCSVT, 27(6):1340-1352, June 2017
Key references
34
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
70. B. Zhao, et al. Hierarchical recurrent neural network for video summarization. ACM MM 2017, pp 863-871, New York, NY, USA, 2017. ACM
71. B. Zhao, et al. HSA-RNN: Hierarchical structure-adaptive RNN for video summarization. CVPR 2018
72. B. Zhao, et al. Quasi real-time summarization for consumer videos. CVPR 2014, pp 2513-2520, June 2014
73. K. Zhou, et al. Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. AAAI 2018
74. K. Zhou, et al. Video summarization by classification with deep reinforcement learning. BMVC 2018
75. G. Zhu, et al. Trajectory based event tactics analysis in broadcast sports video. ACM MM 2007, pp 58-67, New York, NY, USA, 2007. ACM
Key references
35
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
36
Thank you for your attention!
Questions?
Vasileios Mezaris, bmezaris@iti.gr
Code and documentation publicly available at:
https://github.com/e-apostolidis/SUM-GAN-sl
This work was supported by the EUs Horizon 2020 research and innovation
programme under grant agreement H2020-780656 ReTV
1 of 36

Recommended

Icme2020 tutorial video_summarization_part1 by
Icme2020 tutorial video_summarization_part1Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1VasileiosMezaris
1K views112 slides
Unsupervised Video Summarization via Attention-Driven Adversarial Learning by
Unsupervised Video Summarization via Attention-Driven Adversarial LearningUnsupervised Video Summarization via Attention-Driven Adversarial Learning
Unsupervised Video Summarization via Attention-Driven Adversarial LearningVasileiosMezaris
1.3K views50 slides
Implementing Artificial Intelligence Strategies for Content Annotation and Pu... by
Implementing Artificial Intelligence Strategies for Content Annotation and Pu...Implementing Artificial Intelligence Strategies for Content Annotation and Pu...
Implementing Artificial Intelligence Strategies for Content Annotation and Pu...ReTV project
50 views23 slides
GAN-based video summarization by
GAN-based video summarizationGAN-based video summarization
GAN-based video summarizationVasileiosMezaris
367 views38 slides
ReTV at EBU MDN Workshop 2020 by
ReTV at EBU MDN Workshop 2020ReTV at EBU MDN Workshop 2020
ReTV at EBU MDN Workshop 2020ReTV project
733 views20 slides
PoR_evaluation_measure_acm_mm_2020 by
PoR_evaluation_measure_acm_mm_2020PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020VasileiosMezaris
110 views55 slides

More Related Content

What's hot

Hard-Negatives Selection Strategy for Cross-Modal Retrieval by
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalVasileiosMezaris
80 views15 slides
PGL SUM Video Summarization by
PGL SUM Video SummarizationPGL SUM Video Summarization
PGL SUM Video SummarizationVasileiosMezaris
164 views25 slides
MPEG Visual Quality Assessment: Tasks and Perspectives by
MPEG Visual Quality Assessment: Tasks and PerspectivesMPEG Visual Quality Assessment: Tasks and Perspectives
MPEG Visual Quality Assessment: Tasks and PerspectivesAlpen-Adria-Universität
547 views11 slides
Perception and Quality of Immersive Media by
Perception and Quality of Immersive MediaPerception and Quality of Immersive Media
Perception and Quality of Immersive MediaAlpen-Adria-Universität
577 views45 slides
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics by
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and MetricsMPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and MetricsAlpen-Adria-Universität
732 views3 slides
AzMoC++Mt by
AzMoC++MtAzMoC++Mt
AzMoC++MtAziz Motiwala
267 views3 slides

Similar to ReTV AI4TV Summarization

The End Of Testing As We Know It (TestCon - Rik Marselis).pdf by
The End Of Testing As We Know It (TestCon - Rik Marselis).pdfThe End Of Testing As We Know It (TestCon - Rik Marselis).pdf
The End Of Testing As We Know It (TestCon - Rik Marselis).pdfRik Marselis
236 views50 slides
Advanced Verification Methodology for Complex System on Chip Verification by
Advanced Verification Methodology for Complex System on Chip VerificationAdvanced Verification Methodology for Complex System on Chip Verification
Advanced Verification Methodology for Complex System on Chip VerificationVLSICS Design
371 views11 slides
IRJET- Sketch-Verse: Sketch Image Inversion using DCNN by
IRJET- Sketch-Verse: Sketch Image Inversion using DCNNIRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET- Sketch-Verse: Sketch Image Inversion using DCNNIRJET Journal
9 views3 slides
iEVM (Integrated Earned Value Management) An approach for integrating quality... by
iEVM (Integrated Earned Value Management) An approach for integrating quality...iEVM (Integrated Earned Value Management) An approach for integrating quality...
iEVM (Integrated Earned Value Management) An approach for integrating quality...IRJET Journal
44 views7 slides
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U... by
IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...IRJET Journal
10 views5 slides
Fractional step discriminant pruning by
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruningVasileiosMezaris
90 views20 slides

Similar to ReTV AI4TV Summarization(20)

The End Of Testing As We Know It (TestCon - Rik Marselis).pdf by Rik Marselis
The End Of Testing As We Know It (TestCon - Rik Marselis).pdfThe End Of Testing As We Know It (TestCon - Rik Marselis).pdf
The End Of Testing As We Know It (TestCon - Rik Marselis).pdf
Rik Marselis236 views
Advanced Verification Methodology for Complex System on Chip Verification by VLSICS Design
Advanced Verification Methodology for Complex System on Chip VerificationAdvanced Verification Methodology for Complex System on Chip Verification
Advanced Verification Methodology for Complex System on Chip Verification
VLSICS Design371 views
IRJET- Sketch-Verse: Sketch Image Inversion using DCNN by IRJET Journal
IRJET- Sketch-Verse: Sketch Image Inversion using DCNNIRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET Journal9 views
iEVM (Integrated Earned Value Management) An approach for integrating quality... by IRJET Journal
iEVM (Integrated Earned Value Management) An approach for integrating quality...iEVM (Integrated Earned Value Management) An approach for integrating quality...
iEVM (Integrated Earned Value Management) An approach for integrating quality...
IRJET Journal44 views
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U... by IRJET Journal
IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET Journal10 views
IRJET- The Application of Value Engineering to Highway Projects and Programs by IRJET Journal
IRJET- The Application of Value Engineering to Highway Projects and ProgramsIRJET- The Application of Value Engineering to Highway Projects and Programs
IRJET- The Application of Value Engineering to Highway Projects and Programs
IRJET Journal39 views
Mtech Second progresspresentation ON VIDEO SUMMARIZATION by NEERAJ BAGHEL
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONMtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
NEERAJ BAGHEL752 views
PetroSync - Successful Project Cost Estimation & Control by PetroSync
PetroSync - Successful Project Cost Estimation & ControlPetroSync - Successful Project Cost Estimation & Control
PetroSync - Successful Project Cost Estimation & Control
PetroSync383 views
GAS MANAGEMENT SYSTEM.pdf by RmsDagi
GAS MANAGEMENT SYSTEM.pdfGAS MANAGEMENT SYSTEM.pdf
GAS MANAGEMENT SYSTEM.pdf
RmsDagi2 views
Validation gaining confidence in simulation Darre Odeleye CEng MIMechE by Darre Odeleye
Validation gaining confidence in simulation Darre Odeleye CEng MIMechEValidation gaining confidence in simulation Darre Odeleye CEng MIMechE
Validation gaining confidence in simulation Darre Odeleye CEng MIMechE
Darre Odeleye178 views
IRJET- Storage Optimization of Video Surveillance from CCTV Camera by IRJET Journal
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET Journal18 views
Design and Development of Hybrid Storage Shelf by IRJET Journal
Design and Development of Hybrid Storage ShelfDesign and Development of Hybrid Storage Shelf
Design and Development of Hybrid Storage Shelf
IRJET Journal68 views
Reengineering framework for open source software using decision tree approach by IJECEIAES
Reengineering framework for open source software using decision tree approachReengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approach
IJECEIAES53 views
ICME 2020 Tutorial Part II: Video summary (re-)use and recommendation by ReTV project
ICME 2020 Tutorial Part II: Video summary (re-)use and recommendationICME 2020 Tutorial Part II: Video summary (re-)use and recommendation
ICME 2020 Tutorial Part II: Video summary (re-)use and recommendation
ReTV project803 views
IRJET- Critical Chain Project Management in Construction Projects by IRJET Journal
IRJET- Critical Chain Project Management in Construction ProjectsIRJET- Critical Chain Project Management in Construction Projects
IRJET- Critical Chain Project Management in Construction Projects
IRJET Journal15 views
Prototyping by Ifa Laili
PrototypingPrototyping
Prototyping
Ifa Laili1.7K views
Model based systems engineering by Capgemini
Model based systems engineeringModel based systems engineering
Model based systems engineering
Capgemini1.5K views
5 Killer Examples : How to Use Microlearning Based Training Effectively - EI ... by EI Design
5 Killer Examples : How to Use Microlearning Based Training Effectively - EI ...5 Killer Examples : How to Use Microlearning Based Training Effectively - EI ...
5 Killer Examples : How to Use Microlearning Based Training Effectively - EI ...
EI Design6.1K views

More from ReTV project

ReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital Age by
ReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital AgeReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital Age
ReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital AgeReTV project
323 views16 slides
ReTV @ EBU AI-AME 2021 by
ReTV @ EBU AI-AME 2021ReTV @ EBU AI-AME 2021
ReTV @ EBU AI-AME 2021ReTV project
540 views10 slides
ReTV AI4TV 2020 by
ReTV AI4TV 2020ReTV AI4TV 2020
ReTV AI4TV 2020ReTV project
472 views14 slides
Content Adaptation, Personalisation and Fine-Grained Retrieval: Applying AI ... by
Content Adaptation, Personalisation and Fine-Grained Retrieval:  Applying AI ...Content Adaptation, Personalisation and Fine-Grained Retrieval:  Applying AI ...
Content Adaptation, Personalisation and Fine-Grained Retrieval: Applying AI ...ReTV project
34 views25 slides
Implementing artificial intelligence strategies for content annotation and pu... by
Implementing artificial intelligence strategies for content annotation and pu...Implementing artificial intelligence strategies for content annotation and pu...
Implementing artificial intelligence strategies for content annotation and pu...ReTV project
1.2K views23 slides
ReTV: AI for Audience Prediction and Profiling by
ReTV: AI for Audience Prediction and ProfilingReTV: AI for Audience Prediction and Profiling
ReTV: AI for Audience Prediction and ProfilingReTV project
401 views14 slides

More from ReTV project(9)

ReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital Age by ReTV project
ReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital AgeReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital Age
ReTV: Come Aboard the Voyage of Audiovisual Content Reuse in the Digital Age
ReTV project323 views
ReTV @ EBU AI-AME 2021 by ReTV project
ReTV @ EBU AI-AME 2021ReTV @ EBU AI-AME 2021
ReTV @ EBU AI-AME 2021
ReTV project540 views
Content Adaptation, Personalisation and Fine-Grained Retrieval: Applying AI ... by ReTV project
Content Adaptation, Personalisation and Fine-Grained Retrieval:  Applying AI ...Content Adaptation, Personalisation and Fine-Grained Retrieval:  Applying AI ...
Content Adaptation, Personalisation and Fine-Grained Retrieval: Applying AI ...
ReTV project34 views
Implementing artificial intelligence strategies for content annotation and pu... by ReTV project
Implementing artificial intelligence strategies for content annotation and pu...Implementing artificial intelligence strategies for content annotation and pu...
Implementing artificial intelligence strategies for content annotation and pu...
ReTV project1.2K views
ReTV: AI for Audience Prediction and Profiling by ReTV project
ReTV: AI for Audience Prediction and ProfilingReTV: AI for Audience Prediction and Profiling
ReTV: AI for Audience Prediction and Profiling
ReTV project401 views
Using TV Metadata to optimise the repurposing and republication of TV Content... by ReTV project
Using TV Metadata to optimise the repurposing and republication of TV Content...Using TV Metadata to optimise the repurposing and republication of TV Content...
Using TV Metadata to optimise the repurposing and republication of TV Content...
ReTV project3.8K views
From TV to ReTV, Keynote by Lyndon Nixon at TVX 2019 @datatv by ReTV project
 From TV to ReTV, Keynote by Lyndon Nixon at TVX 2019 @datatv  From TV to ReTV, Keynote by Lyndon Nixon at TVX 2019 @datatv
From TV to ReTV, Keynote by Lyndon Nixon at TVX 2019 @datatv
ReTV project1.4K views
ReTV @ cross media cafe 2018 by ReTV project
ReTV @ cross media cafe 2018ReTV @ cross media cafe 2018
ReTV @ cross media cafe 2018
ReTV project2.4K views

Recently uploaded

Advanced_Recommendation_Systems_Presentation.pptx by
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptxneeharikasingh29
5 views9 slides
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init... by
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...DataScienceConferenc1
5 views18 slides
Amy slides.pdf by
Amy slides.pdfAmy slides.pdf
Amy slides.pdfStatsCommunications
5 views13 slides
Short Story Assignment by Kelly Nguyen by
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyenkellynguyen01
19 views17 slides
UNEP FI CRS Climate Risk Results.pptx by
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptxpekka28
11 views51 slides
Cross-network in Google Analytics 4.pdf by
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 views7 slides

Recently uploaded(20)

Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init... by DataScienceConferenc1
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M... by DataScienceConferenc1
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by DataScienceConferenc1
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862013 views
PRIVACY AWRE PERSONAL DATA STORAGE by antony420421
PRIVACY AWRE PERSONAL DATA STORAGEPRIVACY AWRE PERSONAL DATA STORAGE
PRIVACY AWRE PERSONAL DATA STORAGE
antony4204215 views
Data about the sector workshop by info828217
Data about the sector workshopData about the sector workshop
Data about the sector workshop
info82821712 views
Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra17 views
[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ... by DataScienceConferenc1
[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ...[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ...
[DSC Europe 23] Predrag Ilic & Simeon Rilling - From Data Lakes to Data Mesh ...
Data Journeys Hard Talk workshop final.pptx by info828217
Data Journeys Hard Talk workshop final.pptxData Journeys Hard Talk workshop final.pptx
Data Journeys Hard Talk workshop final.pptx
info82821710 views
Organic Shopping in Google Analytics 4.pdf by GA4 Tutorials
Organic Shopping in Google Analytics 4.pdfOrganic Shopping in Google Analytics 4.pdf
Organic Shopping in Google Analytics 4.pdf
GA4 Tutorials16 views

ReTV AI4TV Summarization

  • 1. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Title of presentation Subtitle Name of presenter Date A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization E. Apostolidis1,2, A. I. Metsai1, E. Adamantidou1, V. Mezaris1, I. Patras2 1 CERTH-ITI, Thermi - Thessaloniki, Greece 2 School of EECS, Queen Mary University of London, London, UK 1st AI4TV Workshop, ACM MM 2019, Nice, France, October 2019
  • 2. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Outline 2 ⚫ Problem statement ⚫ Related work ⚫ Limitations of existing approaches ⚫ Developed approach ⚫ Experiments ⚫ Summarization examples ⚫ Summary and next steps ⚫ Key references
  • 3. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 3 Video summary: a short visual summary that encapsulates the flow of the story and the essential parts of the full-length video Original video Video summary (storyboard) Problem statement
  • 4. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 4 Problem statement General applications of video summarization ⚫ Professional CMS: effective indexing, browsing, retrieval & promotion of media assets! ⚫ Video sharing platforms: improved viewer experience, enhanced viewer engagement & increased content consumption! ⚫ However, different distribution channels have different requirements / restrictions w.r.t. the video content duration (e.g. optimal Twitter videos < 45 sec.; ideal YouTube videos < 2 min.)
  • 5. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 5 Related work Deep-learning approaches ⚫ CNN-based methods that extract and use video semantics to identify important video parts based on sequence labeling [49], self-attention networks [18], or video-level metadata [44, 46] ⚫ RNN-based approaches that capture temporal dependency over the video frames, using ⚫ e.g. LSTMs for keyframe selection [64] ⚫ hierarchies of LSTMs to identify the video structure and select key-fragments [70, 71] ⚫ combinations of LSTMs with DTR units and GANs to capture long-range frame dependency [67] ⚫ attention-based encoder-decoders [22, 28], or memory-augmented networks [19] ⚫ Unsupervised methods that do not rely on human-annotations, and build summaries ⚫ using adversarial learning objectives (GANs) to minimize the distance between training videos and a distribution of their summarizations [40], or to maximize mutual information between the created summary and the original video [60] ⚫ through a decision-making process that is learned via RL and reward functions [73] ⚫ by learning to extract key motions of appearing objects [68] ⚫ by learning a “video-to-summary” mapping from unpaired edited summaries available online [48]
  • 6. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 6 Motivation and reasoning Disadvantages of supervised learning ⚫ Restricted amount of annotated data is available for supervised training of a video summarization method ⚫ Highly-subjective nature of video summarization (relying on viewer’s demands and aesthetics); there is no “ideal” or commonly accepted summary that could be used for training an algorithm Advantages of unsupervised learning ⚫ No need for learning data; avoid laborious and time-demanding labeling of video data ⚫ Adaptability to different types of video; summarization is learned based on the video content
  • 7. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Developed approach ⚫ Starting point: the SUM-GAN architecture ⚫ Main idea: build a keyframe selection mechanism by minimizing the distance between the deep representations of the original video and a reconstructed version of it based on the selected key-frames ⚫ Problem: how to define a good distance? ⚫ Solution: use a trainable discriminator network! ⚫ Goal: train the Summarizer to maximally confuse the Discriminator when distinguishing the original from the reconstructed video 7 Building on adversarial learning
  • 8. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Developed approach ⚫ New model: the SUM-GAN-sl architecture ⚫ Contains a linear compression layer that reduces: ⚫ The size of the CNN feature vectors from 1024 to 500 ⚫ Thus also the number of trainable parameters 8 Building on adversarial learning
  • 9. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Developed approach ⚫ New model: the SUM-GAN-sl architecture ⚫ Follows an incremental and fine-grained approach to training the model’s components 9 Building on adversarial learning
  • 10. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Developed approach ⚫ New model: the SUM-GAN-sl architecture ⚫ Follows an incremental and fine-grained approach to training the model’s components 10 Building on adversarial learning
  • 11. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Developed approach ⚫ New model: the SUM-GAN-sl architecture ⚫ Follows an incremental and fine-grained approach to training the model’s components 11 Building on adversarial learning
  • 12. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Developed approach ⚫ New model: the SUM-GAN-sl architecture ⚫ Follows an incremental and fine-grained approach to training the model’s components 12 Building on adversarial learning
  • 13. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 13 Datasets ⚫ SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark) ⚫ 25 videos capturing multiple events (e.g. cooking and sports) ⚫ video length: 1.6 to 6.5 min ⚫ annotation: fragment-based video summaries ⚫ TVSum (https://github.com/yalesong/tvsum) ⚫ 50 videos from 10 categories of TRECVid MED task ⚫ video length: 1 to 5 min ⚫ annotation: frame-level importance scores
  • 14. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 14 Evaluation protocol ⚫ The generated summary should not exceed 15% of the video length ⚫ Similarity between automatically generated (A) and ground-truth (G) summary is expressed by the F-Score (%), with (P)recision and (R)ecall measuring the temporal overlap (∩) (|| || means duration) ⚫ Typical metrics for computing Precision and Recall at the frame-level
  • 15. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 15 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
  • 16. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 16 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74])
  • 17. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 17 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74]) F-Score1
  • 18. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 18 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74]) F-Score2 F-Score1
  • 19. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 19 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74]) F-ScoreN F-Score2 F-Score1
  • 20. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 20 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Most used approach (by [16, 18, 19, 29, 48, 49, 64, 61, 70, 71, 73, 74]) F-ScoreN F-Score2 F-Score1 SumMe: TVSum: N
  • 21. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 21 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Alternative approach (used in [22, 28, 40, 58, 60, 67])
  • 22. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Experiments 22 Evaluation protocol ⚫ Slight but important distinction w.r.t. what is eventually used as ground-truth summary ⚫ Alternative approach (used in [22, 28, 40, 58, 60, 67]) F-Score
  • 23. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project ⚫ To get insights about the efficiency of the used datasets and metrics, we examined: ⚫ the efficiency of a randomly generated summary (frames' importance scores were defined based on a uniform distribution of probabilities, and the experiment was performed 100 times) ⚫ the human performance, i.e. how well a human annotator would perform based on the preferences of the remaining annotators ⚫ the highest performance on TVSum according to the best user summary (with the highest overlap) for each video of the dataset (best for SumMe is 100%) ⚫ Results consistent with the findings of a recent CVPR paper ([41]) Experiments 23 Preliminary study on summarization datasets
  • 24. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project ⚫ Step 1: Assessing the impact of regularization factor σ Experiments 24 Evaluation outcomes Findings ⚫ Value of σ affects the models’ performance and needs fine- tuning ⚫ Too small and too large values lead to reduced efficiency, and only a specific range of values results in good performance ⚫ Fine-tuning is dataset-dependent as the highest performance is achieved for different values of σ in each dataset
  • 25. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project ⚫ Step 2: Selecting the best configuration Experiments 25 Evaluation outcomes Learning curves
  • 26. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Findings ⚫ A few SoA methods are comparable (or even worse) with a random summary generator ⚫ Best method on SumMe (UnpairedVSN) performs slightly better than our method, and it is less competitive on TVSum ⚫ Best method on TVSum shows random-level performance on SumMe (seems to be dataset- tailored) ⚫ SUM-GAN-sl performs consistently well in both datasets and is the most competitive one ⚫ Step 3: Comparison with SoA unsupervised approaches based on multiple user summaries Experiments 26 Evaluation outcomes
  • 27. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Findings ⚫ Best methods in TVSum (MAVS & Tessellationsup) are highly-adapted to this dataset, as they exhibit random-level performance on SumMe ⚫ Only a few supervised methods clearly surpass the performance of a randomly-generated summary on both datasets, with VASNet being the best among them ⚫ Their performance ranges in [44.1 - 49.7] on SumMe, and in [56.1 - 61.4] on TVSum ⚫ The performance of SUM-GAN-sl makes our unsupervised method comparable with SoA supervised techniques for video summarization ⚫ Step 4: Comparison with SoA supervised approaches based on multiple user summaries Experiments 27 Evaluation outcomes
  • 28. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Findings ⚫ Model’s performance is affected by the value of σ in a way similar to the one reported in [40] ⚫ Step 5: Comparison with SoA approaches based on single ground-truth summaries ⚫ Impact of regularization factor σ Experiments 28 Evaluation outcomes ⚫ Τhe effect of this hyper-parameter depends on the used evaluation approach (best performance when using multiple human summaries was observed for σ = 0.1) ⚫ SUM-GAN-sl clearly outperforms the original SUM-GAN model on both datasets
  • 29. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Findings ⚫ Best version of SUM-GAN-sl (observed for σ = 0.5) exceeds the performance of all other techniques (both supervised and unsupervised ones) that follow this evaluation protocol ⚫ Step 5: Comparison with SoA approaches based on single ground-truth summaries Experiments 29 Evaluation outcomes Unsupervised approaches marked with an asterisk
  • 30. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project Summarization examples 30 Full video Generated summary
  • 31. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project ⚫ Summary ⚫ Conducted a study to assess and advance the effectiveness of unsupervised video summarization based on adversarial learning ⚫ Focused on the SUM-GAN model and suggested a new training approach to advance the learning efficiency of the adversarial module of the architecture ⚫ Examined the evaluation protocols and metrics, and made estimates on the possible performance on two datasets and the suitability of the used metrics ⚫ Comparative evaluations with SoA showed that our model is among the best unsupervised methods and comparable with supervised algorithms too ⚫ Next steps ⚫ Further improve our model by exploiting the efficiency of attention networks and the training capacity of reinforcement learning approaches ⚫ Extend our model with mechanisms that capture the temporal structure of the video to support multi- level video summarization ⚫ Investigate methods for video summarization tailored to the targeted audience and distribution channel (apply rules about the content of the summary; integrate a human-critic in the pipeline) Summary and next steps 31
  • 32. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 1. J. Almeida, et al. Vison: Video summarization for online applications. Pat. Recogn. Lett., 33(4):397-409, Mar. 2012 2. R. Anirudh, et al. Diversity promoting online sampling for streaming video summarization. ICIP 2016, pp 3329-3333, Sept 2016 3. H. Bahuleyan, et al. Variational attention for sequence-to-sequence models. 27th Int. Conf. on Computational Linguistics, pp 1672-1682, USA, Aug. 2018 4. J. Calic, et al. Efficient layout of comic-like video summaries. IEEE Transactions on Circuits and Systems for Video Technology, 17(7):931-936, July 2007 5. P. P. K. Chan, et al. A novel method to reduce redundancy in adaptive threshold clustering key frame extraction systems. ICMLC 2011, vol 4, pp 1637-1642 6. J. Cho. PyTorch Implementation of SUM-GAN from “Unsupervised Video Summarization with Adversarial LSTM Networks”, 2017 7. K. Cho, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014 , pp. 1724-1734, Qatar, Oct. 2014 8. W. Chu, et al. Video co-summarization: Video summarization by visual co-occurrence. CVPR 2015, pp 3584-3592, June 2015 9. Y. Cong, et al. Sparse reconstruction cost for abnormal event detection. CVPR 2011, pp 3449-3456, June 2011 10. K. Darabi, et al. Personalized video summarization based on group scoring. ChinaSIP 2014, pp 310-314, July 2014 11. K. Davila, et al. Whiteboard video summarization via spatio-temporal conflict minimization. ICDAR 2017, vol 1, pp 355-362, Nov. 2017 12. S. E. F. de Avila, et al. Vsumm: A mechanism designed to produce static video summaries and a novel evaluation method. Pat. Rec. Lett., 32(1):56-68, Jan. 2011 13. M. S. Drew, et al. Clustering of compressed illumination-invariant chromaticity signatures for efficient video summarization. Im. Vis. Comp., 21:705-716, 2003 14. N. Ejaz, et al. Feature aggregation based visual attention model for video summarization. Computers & Electrical Engineering, 40(3):993-1005, 2014 15. N. Ejaz, et al. Adaptive key frame extraction for video summarization using an aggregation mechanism. J. of Vis. Comm. & Image Repr., 23(7):1031-1040, 2012 16. M. Elfeki, et al. Video summarization via actionness ranking. WACV 2019, Waikoloa Village, HI, USA, pp 754-763, Jan 2019 17. E. Elhamifar, et al. See all by looking at a few: Sparse modeling for finding representative objects. CVPR 2012, pp 1600-1607, June 2012 18. J. Fajtl, et al. Summarizing videos with attention. ACCV 2018 Workshops, pp 39-54, Cham, 2019. Springer International Publishing 19. L. Feng, et al. Extractive video summarizer with memory augmented neural networks. ACM MM 2018, pp 976-983, New York, NY, USA, 2018. ACM 20. S. Feng, et al. Online content-aware video condensation. CVPR 2012, pp 2082-2087, June 2012 21. M. Fleischman, et al. Temporal feature induction for baseball highlight classification. ACM MM 2007, pp 333-336, New York, NY, USA, 2007. ACM 22. T. Fu, et al. Attentive and adversarial learning for video summarization. WACV 2019, Waikoloa Village, HI, USA, pp 1579-1587, January 2019 23. M. Furini, et al. Stimo: Still and moving video storyboard for the web scenario. Multimedia Tools Appl., 46(1):47-69, Jan. 2010 Key references 32
  • 33. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 24. B. Gong, et al. Diverse sequential subset selection for supervised video summarization. NIPS 2014, pp 2069-2077, Cambridge, MA, USA, 2014. MIT Press 25. M. Gygli, et al. Video summarization by learning sub-modular mixtures of objectives. CVPR 2015, pp 3090-3098, June 2015 26. M. Gygli, et al. Creating summaries from user videos. ECCV 2014, pp 505-520, Cham, 2014. Springer International Publishing 27. S. Hochreiter et al. Long short-term memory. Neural computation, 9(8):1735-1780, 1997 28. Z. Ji, et al. Video summarization with attention-based encoder-decoder networks. IEEE Trans. on Circuits and Systems for Video Technology, 2019 29. D. Kaufman, G. Levi, T. Hassner, and L. Wolf. Temporal tessellation: A unified approach for video analysis. ICCV 2017, pp 94-104, Oct 2017 30. A. Khosla, et al. Large-scale video summarization using web-image priors. CVPR 2013, pp 2698-2705, June 2013 31. C. Kim, et al. Object-based video abstraction for video surveillance systems. IEEE TCSVT, 12(12):1128-1138, Dec 2002 32. J.-L. Lai, et al. Key frame extraction based on visual attention model. Journal of Visual Communication and Image Representation, 23(1):114-125, 2012 33. S. Lee, et al. A Memory Network Approach for Story-based Temporal Summarization of 360 Videos. CVPR 2018 34. Y. J. Lee, et al. Discovering important people and objects for egocentric video summarization. CVPR 2012, pp 1346-1353, June 2012 35. X. Li, et al. A general framework for edited video and raw video summarization. IEEE Trans. on Image Processing, 26(8):3652-3664, Aug 2017 36. Y. Li, et al. An overview of video abstraction techniques. Hewlett Packard, Technical Reports, 01 2001 37. S. Lu, et al. A bag-of-importance model with locality-constrained coding based feature learning for video summarization. IEEE TMM, 16(6):1497-1509, Oct 2014 38. Z. Lu, et al. Story-driven summarization for egocentric video. CVPR 2013, pp 2714-2721, June 2013 39. Y.-F. Ma, et al. A generic framework of user attention model and its application in video summarization. IEEE Trans. on Multimedia, 7(5):907{919, Oct 2005 40. B. Mahasseni, et al. Unsupervised video summarization with adversarial LSTM networks. CVPR 2017, pp 2982-2991 41. M. Otani, et al. Rethinking the evaluation of video summaries. CVPR 2019 42. M. Otani, et al. Video summarization using deep semantic features. ACCV 2016 43. M. Otani, et al. Video summarization using textual descriptions for authoring video blogs. Multimedia Tools and Applications, 76(9):12097-12115, May 2017 44. R. Panda, et al. Weakly supervised summarization of web videos. ICCV 2017, pp 3677-3686, Oct 2017 45. J. Peng, et al. Keyframe-based video summary using visual attention clues. IEEE MultiMedia, 17(2):64-73, April 2010 46. D. Potapov, et al. Category-specific video summarization. ECCV 2017, pages 540-555, Zurich, Switzerland, Sept. 2014. Springer Key references 33
  • 34. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 47. A. Radford, et al. Unsupervised representation learning with deep convolutional generative adversarial networks. ICLR 2016 48. M. Rochan, et al. Video summarization by learning from unpaired data. CVPR 2019 49. M. Rochan, et al. Video summarization using fully convolutional sequence networks. ECCV 2018, pp 358-374, Cham, 2018. Springer International Publishing 50. A. Sharghi, et al. Query-focused extractive video summarization. ECCV 2016, pp 3-19, Cham, 2016. Springer International Publishing 51. Y. Song, et al. Tvsum: Summarizing web videos using titles. CVPR 2015, pp 5179-5187, June 2015 52. M. Sun, et al. Ranking highlights in personal videos by analyzing edited videos. IEEE Trans. on Image Processing, 25(11):5145-5157, Nov 2016 53. C. Szegedy, et al. Going deeper with convolutions. CVPR 2015, pp 1-9, June 2015 54. B. T. Truong, et al. Video abstraction: A systematic review and classification. ACM Trans. Multimedia Comput. Commun. Appl., 3(1), Feb. 2007 55. A. B. Vasudevan, et al. Query-adaptive video summarization via quality-aware relevance estimation. ACM MM 2017, pp 582-590, NY, USA, 2017. ACM 56. T. Wang, et al. Video collage: A novel presentation of video sequence. ICME 2007, pp 1479-1482, July 2007 57. X. Wang, et al. User-specific video summarization. ICMSP 2011, vol 1, pp 213-219, May 2011 58. H. Wei, et al. Video summarization via semantic attended networks. AAAI 2018 59. T. Yao, et al. Highlight detection with pairwise deep ranking for first-person video summarization. CVPR 2016, pp 982-990, June 2016 60. L. Yuan, et al. Cycle-sum: Cycle-consistent adversarial LSTM networks for unsupervised video summarization. AAAI 2019 61. Y. Yuan, et al. Video summarization by learning deep side semantic embedding. IEEE TCSVT, 29(1):226-237, Jan 2019 62. H. Zhang, et al. An integrated system for content-based video retrieval and browsing. Pattern Recognition, 30:643-658, 1997 63. K. Zhang, et al. Summary transfer: Exemplar-based subset selection for video summarization. CVPR 2016, pp 1059-1067, 2016 64. K. Zhang, et al. Video summarization with long short-term memory. ECCV 2016, pp 766-782, Cham, 2016. Springer International Publishing 65. S. Zhang, et al. Context-aware surveillance video summarization. IEEE Trans. on Image Processing, 25(11):5469-5478, Nov 2016 66. X.-D. Zhang, et al. Dynamic selection and effective compression of key frames for video abstraction. Pattern Recogn. Lett., 24(9-10):1523-1532, June 2003 67. Y. Zhang, et al. DTR-GAN: Dilated temporal relational adversarial network for video summarization. ACM TUR-C 2019: 89:1-89:6 68. Y. Zhang, et al. Unsupervised object-level video summarization with online motion auto-encoder. Pattern Recognition Letters, 2018 69. Y. Zhang, et al. Motion-state-adaptive video summarization via spatiotemporal analysis. IEEE TCSVT, 27(6):1340-1352, June 2017 Key references 34
  • 35. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 70. B. Zhao, et al. Hierarchical recurrent neural network for video summarization. ACM MM 2017, pp 863-871, New York, NY, USA, 2017. ACM 71. B. Zhao, et al. HSA-RNN: Hierarchical structure-adaptive RNN for video summarization. CVPR 2018 72. B. Zhao, et al. Quasi real-time summarization for consumer videos. CVPR 2014, pp 2513-2520, June 2014 73. K. Zhou, et al. Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. AAAI 2018 74. K. Zhou, et al. Video summarization by classification with deep reinforcement learning. BMVC 2018 75. G. Zhu, et al. Trajectory based event tactics analysis in broadcast sports video. ACM MM 2007, pp 58-67, New York, NY, USA, 2007. ACM Key references 35
  • 36. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project 36 Thank you for your attention! Questions? Vasileios Mezaris, bmezaris@iti.gr Code and documentation publicly available at: https://github.com/e-apostolidis/SUM-GAN-sl This work was supported by the EUs Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV