"Unsupervised Video Summarization via Attention-Driven Adversarial Learning", by E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras. Proceedings of the 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
Tutorial on "Video Summarization and Re-use Technologies and Tools", delivered at IEEE ICME 2020. These slides correspond to the first part of the tutorial, presented by Vasileios Mezaris and Evlampios Apostolidis. This part deals with automatic video summarization, and includes a presentation of the video summarization problem definition and a literature overview; an in-depth discussion on a few unsupervised GAN-based methods; and a discussion on video summarization datasets, evaluation protocols and results, and future directions.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2016-member-meeting-uofw
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Professor Jeff Bilmes of the University of Washington delivers the presentation "Image and Video Summarization" at the December 2016 Embedded Vision Alliance Member Meeting. Bilmes provides an overview of the state of the art in image and video summarization.
Explaining video summarization based on the focus of attentionVasileiosMezaris
Presentation of paper "Explaining video summarization based on
the focus of attention", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at IEEE ISM 2022, Dec. 2022, Naples, Italy.
In this paper we propose a method for explaining
video summarization. We start by formulating the problem as
the creation of an explanation mask which indicates the parts
of the video that influenced the most the estimates of a video
summarization network, about the frames’ importance. Then, we
explain how the typical analysis pipeline of attention-based networks for video summarization can be used to define explanation
signals, and we examine various attention-based signals that have
been studied as explanations in the NLP domain. We evaluate
the performance of these signals by investigating the video
summarization network’s input-output relationship according
to different replacement functions, and utilizing measures that quantify the capability of explanations to spot the most and
least influential parts of a video. We run experiments using an
attention-based network (CA-SUM) and two datasets (SumMe
and TVSum) for video summarization. Our evaluations indicate the advanced performance of explanations formed using the inherent attention weights, and demonstrate the ability of our
method to explain the video summarization results using clues
about the focus of the attention mechanism.
Presentation of the paper titled "Combining Global and Local Attention with Positional Encoding for Video Summarization", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software is available at https://github.com/e-apostolidis/PGL-SUM.
In October 2017, ISO/IEC JCT1 SC29/WG11 MPEG and ITU-T SG16/Q6 VCEG have jointly published a Call for Proposals on Video Compression with Capability beyond HEVC and its current extensions. It is targeting at a new generation of video compression technology that has substantially higher compression capability than the existing HEVC standard. The responses to the call are evaluated in April 2018, forming the kick-off for a new standardization activity in the Joint Video Experts Team (JVET) of VCEG and MPEG, with a target of finalization by the end of the year 2020. Three categories of video are addressed: Standard dynamic range video (SDR), high dynamic range video (HDR), and 360° video. While SDR and HDR cover variants of conventional video to be displayed e.g. on a suitable TV screen at very high resolution (UHD), the 360° category targets at videos capturing a full-degree surround view of the scene. This enables an immersive video experience with the possibility to look around in the rendered scene, e.g. when viewed using a head-mounted display. This application triggers various technical challenges which need to be addressed in terms of compression, encoding, transport, and rendering. The talk summarizes the current state of the complete standardization project. Focussing on the SDR and 360° video categories, it highlights the development of selected coding tools compared to the state of the art. Representative examples of the new technological challenges as well as corresponding proposed solutions are presented.
Tutorial on "Video Summarization and Re-use Technologies and Tools", delivered at IEEE ICME 2020. These slides correspond to the first part of the tutorial, presented by Vasileios Mezaris and Evlampios Apostolidis. This part deals with automatic video summarization, and includes a presentation of the video summarization problem definition and a literature overview; an in-depth discussion on a few unsupervised GAN-based methods; and a discussion on video summarization datasets, evaluation protocols and results, and future directions.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2016-member-meeting-uofw
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Professor Jeff Bilmes of the University of Washington delivers the presentation "Image and Video Summarization" at the December 2016 Embedded Vision Alliance Member Meeting. Bilmes provides an overview of the state of the art in image and video summarization.
Explaining video summarization based on the focus of attentionVasileiosMezaris
Presentation of paper "Explaining video summarization based on
the focus of attention", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at IEEE ISM 2022, Dec. 2022, Naples, Italy.
In this paper we propose a method for explaining
video summarization. We start by formulating the problem as
the creation of an explanation mask which indicates the parts
of the video that influenced the most the estimates of a video
summarization network, about the frames’ importance. Then, we
explain how the typical analysis pipeline of attention-based networks for video summarization can be used to define explanation
signals, and we examine various attention-based signals that have
been studied as explanations in the NLP domain. We evaluate
the performance of these signals by investigating the video
summarization network’s input-output relationship according
to different replacement functions, and utilizing measures that quantify the capability of explanations to spot the most and
least influential parts of a video. We run experiments using an
attention-based network (CA-SUM) and two datasets (SumMe
and TVSum) for video summarization. Our evaluations indicate the advanced performance of explanations formed using the inherent attention weights, and demonstrate the ability of our
method to explain the video summarization results using clues
about the focus of the attention mechanism.
Presentation of the paper titled "Combining Global and Local Attention with Positional Encoding for Video Summarization", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software is available at https://github.com/e-apostolidis/PGL-SUM.
In October 2017, ISO/IEC JCT1 SC29/WG11 MPEG and ITU-T SG16/Q6 VCEG have jointly published a Call for Proposals on Video Compression with Capability beyond HEVC and its current extensions. It is targeting at a new generation of video compression technology that has substantially higher compression capability than the existing HEVC standard. The responses to the call are evaluated in April 2018, forming the kick-off for a new standardization activity in the Joint Video Experts Team (JVET) of VCEG and MPEG, with a target of finalization by the end of the year 2020. Three categories of video are addressed: Standard dynamic range video (SDR), high dynamic range video (HDR), and 360° video. While SDR and HDR cover variants of conventional video to be displayed e.g. on a suitable TV screen at very high resolution (UHD), the 360° category targets at videos capturing a full-degree surround view of the scene. This enables an immersive video experience with the possibility to look around in the rendered scene, e.g. when viewed using a head-mounted display. This application triggers various technical challenges which need to be addressed in terms of compression, encoding, transport, and rendering. The talk summarizes the current state of the complete standardization project. Focussing on the SDR and 360° video categories, it highlights the development of selected coding tools compared to the state of the art. Representative examples of the new technological challenges as well as corresponding proposed solutions are presented.
Unsupervised Video Summarization via Attention-Driven Adversarial Learning
1. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Unsupervised Video Summarization via Attention-Driven
Adversarial Learning
E. Apostolidis (1,2), E. Adamantidou (1), A. I. Metsai (1), V. Mezaris (1), I. Patras (2)
1 CERTH-ITI, Thermi - Thessaloniki, Greece
2 School of EECS, Queen Mary University of London, London, UK
26th Int. Conf. on Multimedia Modeling
Daejeon, Korea, January 2020
3. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
3
Video summary: a short visual summary that encapsulates the flow of the story and
the essential parts of the full-length video
Original video
Video summary (storyboard)
Problem statement
4. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
4
Problem statement
Applications of video summarization
Professional CMS: effective indexing,
browsing, retrieval & promotion of media
assets
Video sharing platforms: improved viewer
experience, enhanced viewer engagement &
increased content consumption
Other summarization scenarios: movie trailer production, sports highlights video generation,
video synopsis of 24h surveillance recordings
5. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
5
Related work
Deep-learning approaches
Supervised methods that use feedforward neural nets (e.g. CNNs) to extract and use video semantics for identifying important video parts, based on sequence labeling [21], self-attention networks [7], or video-level metadata [17]
Supervised approaches that capture the story flow using recurrent neural nets (e.g. LSTMs)
In combination with statistical models to select a representative and diverse set of keyframes [27]
In hierarchies to identify the video structure and select key-fragments [30, 31]
In combination with DTR units and GANs to capture long-range frame dependency [28]
To form attention-based encoder-decoders [9, 13], or memory-augmented networks [8]
Unsupervised algorithms that do not rely on human annotations, and build summaries
Using adversarial learning to: minimize the distance between videos and their summary-based
reconstructions [1, 16]; maximize the mutual information between summary and video [25]; learn a
mapping from raw videos to human-like summaries based on online available summaries [20]
Through a decision-making process that is learned via RL and reward functions [32]
By learning to extract key motions of appearing objects [29]
6. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
6
Motivation
Disadvantages of supervised learning
Restricted amount of annotated data is available for supervised training of a video
summarization method
Highly subjective nature of video summarization (depending on the viewer’s demands and aesthetics);
there is no “ideal” or commonly accepted summary that could be used for training an algorithm
Advantages of unsupervised learning
No need for labeled training data; avoids the laborious and time-demanding labeling of video data
Adaptability to different types of video; summarization is learned based on the video content
7. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
7
Contributions
Introduce an attention mechanism in an unsupervised learning framework, whereas all
previous attention-based summarization methods ([7-9, 13]) were supervised
Investigate the integration of an attention mechanism into a variational auto-encoder for video
summarization purposes
Use attention to guide the generative adversarial training of the model, rather than using it to
rank the video fragments as in [9]
8. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
Starting point: the SUM-GAN architecture
Main idea: build a keyframe selection mechanism
by minimizing the distance between the deep
representations of the original video and a
reconstructed version of it based on the selected
keyframes
Problem: how to define a good distance?
Solution: use a trainable discriminator network!
Goal: train the Summarizer to maximally confuse
the Discriminator when distinguishing the original
from the reconstructed video
8
Building on adversarial learning
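To make the adversarial objective above concrete, the following is a minimal, illustrative PyTorch sketch (module and variable names are our own placeholders, not the released implementation): a discriminator tries to tell the original frame-feature sequence apart from its summary-based reconstruction, while the summarizer/reconstructor is trained to fool it.

import torch
import torch.nn as nn

feat_dim = 500                                  # compressed frame-feature size
frames = torch.randn(1, 200, feat_dim)          # toy video: 200 sampled frames

scorer = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())   # frame importance scores
autoenc = nn.LSTM(feat_dim, feat_dim, batch_first=True)        # stands in for the auto-encoder
disc_rnn = nn.LSTM(feat_dim, 128, batch_first=True)            # discriminator backbone
disc_out = nn.Linear(128, 1)                                   # "original vs reconstructed" logit

def discriminate(x):
    _, (h, _) = disc_rnn(x)                     # last hidden state summarizes the sequence
    return disc_out(h[-1])

scores = scorer(frames)                         # (1, T, 1) importance per frame
recon, _ = autoenc(frames * scores)             # reconstruction driven by the selected frames

bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(1, 1), torch.zeros(1, 1)
d_loss = bce(discriminate(frames), ones) + bce(discriminate(recon.detach()), zeros)   # Discriminator: tell them apart
g_loss = bce(discriminate(recon), ones)         # Summarizer: maximally confuse the Discriminator (plus reconstruction/sparsity terms in the paper)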
9. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
SUM-GAN-sl:
Contains a linear compression layer that reduces
the size of CNN feature vectors
9
Building on adversarial learning
(Slides 10-13 incrementally build the SUM-GAN-sl training scheme in a figure: besides the linear compression layer that reduces the size of the CNN feature vectors, the model’s components are trained in an incremental and fine-grained manner, using the loss terms shown in the figure, one of which involves a regularization factor σ.)
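The exact loss formulas appear only in the figure; as one hedged example, the regularization factor σ is commonly used in SUM-GAN-style models as the target summary ratio in a length-regularization term, sketched below (an assumption about the loss form, not the released code):

import torch

def length_regularization(scores: torch.Tensor, sigma: float = 0.15) -> torch.Tensor:
    # Penalize frame-score distributions whose mean deviates from the target
    # summary ratio sigma (the "regularization factor" tuned later in the experiments).
    return torch.abs(scores.mean() - sigma)

loss_sparsity = length_regularization(torch.rand(200), sigma=0.15)   # toy frame scores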
14. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
Examined approaches:
1) Integrate an attention layer within the variational auto-encoder (VAE) of SUM-GAN-sl
2) Replace the VAE of SUM-GAN-sl with a deterministic attention auto-encoder
14
Introducing an attention mechanism
15. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
Variational attention was described in [4] and used for natural language modeling
Models the attention vector as Gaussian distributed random variables
15
1) Adversarial learning driven by a variational attention auto-encoder
Variational auto-encoder
16. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
Extended SUM-GAN-sl with variational attention, forming the SUM-GAN-VAAE architecture
The attention weights for each frame were handled as random variables and a latent space
was computed for these values, too
In every time-step t the attention component combines the encoder's output at t and the
decoder's hidden state at t - 1 to compute an attention weight vector
The decoder was modified to update its hidden states based on both latent spaces during the
reconstruction of the video
16
1) Adversarial learning driven by a variational attention auto-encoder
Variational attention
auto-encoder
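For intuition, a minimal sketch of the variational-attention idea follows (illustrative PyTorch; it shows the reparameterization mechanism applied to the attention-derived context, under our own naming, and is not the SUM-GAN-VAAE implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalAttention(nn.Module):
    # Treat the attention-derived context as a Gaussian latent variable:
    # predict a mean and log-variance, sample with the reparameterization trick,
    # and return a KL term to be added to the training loss.
    def __init__(self, dim):
        super().__init__()
        self.to_mu = nn.Linear(dim, dim)
        self.to_logvar = nn.Linear(dim, dim)

    def forward(self, enc_outputs, dec_hidden):           # enc_outputs: (T, dim), dec_hidden: (dim,)
        energy = enc_outputs @ dec_hidden                  # similarity score per frame, shape (T,)
        alpha = F.softmax(energy, dim=0)                   # attention weights
        context = alpha @ enc_outputs                      # deterministic context vector, shape (dim,)
        mu, logvar = self.to_mu(context), self.to_logvar(context)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return z, alpha, kl

va = VariationalAttention(500)
z, alpha, kl = va(torch.randn(200, 500), torch.randn(500))   # toy usage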
17. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
Inspired by the efficiency of the attention-based encoder-decoder network in [13]
Built on the findings of [4] w.r.t. the impact of deterministic attention on VAE
VAE was entirely replaced by an attention auto-encoder (AAE) network, forming the SUM-GAN-AAE architecture
17
2) Adversarial learning driven by deterministic attention auto-encoder
19. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
19
2) Adversarial learning driven by deterministic attention auto-encoder
Processing pipeline
Weighted feature vectors fed to the Encoder
Attention auto-encoder
20. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
20
2) Adversarial learning driven by deterministic attention auto-encoder
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous hidden state fed to the Attention component
For t > 1: use the hidden state of the previous Decoder’s step (h_{t-1})
For t = 1: use the hidden state of the last Encoder’s step (h_E)
Attention auto-encoder
21. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
21
2) Adversarial learning driven by deterministic attention auto-encoder
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Attention auto-encoder
22. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
Developed approach
22
2) Adversarial learning driven by deterministic attention auto-encoder
Attention auto-encoder
23. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
αt multiplied with V and form Context Vector vt’
Developed approach
23
2) Adversarial learning driven by deterministic attention auto-encoder
Attention auto-encoder
24. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
αt multiplied with V and form Context Vector vt’
vt’ combined with Decoder’s previous output yt-1
Developed approach
24
2) Adversarial learning driven by deterministic attention auto-encoder
Attention auto-encoder
25. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
25
2) Adversarial learning driven by deterministic attention auto-encoder
Attention auto-encoder
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
αt multiplied with V and form Context Vector vt’
vt’ combined with Decoder’s previous output yt-1
Decoder gradually reconstructs the video
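One step of this pipeline, written as a small illustrative PyTorch sketch (the energy-score function is one possible choice and all names are ours, so treat it as a sketch of the mechanism rather than the exact SUM-GAN-AAE code):

import torch
import torch.nn as nn
import torch.nn.functional as F

dim, T = 500, 200
V = torch.randn(T, dim)                       # Encoder outputs for the weighted frame features
h_prev = torch.randn(dim)                     # Decoder hidden state from step t-1 (h_E at t = 1)
y_prev = torch.randn(dim)                     # Decoder output from step t-1
W_energy = nn.Linear(dim, dim, bias=False)    # bilinear energy-score function (one possible choice)
decoder_cell = nn.LSTMCell(2 * dim, dim)

e = V @ W_energy(h_prev)                      # energy score for every frame
alpha = F.softmax(e, dim=0)                   # attention weights α_t
context = alpha @ V                           # context vector v'_t
step_in = torch.cat([context, y_prev])        # combine v'_t with the previous Decoder output y_{t-1}
h_t, c_t = decoder_cell(step_in.unsqueeze(0),
                        (h_prev.unsqueeze(0), torch.zeros(1, dim)))   # Decoder reconstructs frame t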
26. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Developed approach
Input: The CNN feature vectors of the (sampled) video frames
Output: Frame-level importance scores
Summarization process:
CNN features pass through the linear compression layer and the frame selector, and importance scores are computed at the frame level
Given a video segmentation (using KTS [18]), fragment-level importance scores are calculated by averaging the scores of each fragment’s frames
The summary is created by selecting the fragments that maximize the total importance score, provided that the summary length does not exceed 15% of the video duration, by solving the 0/1 Knapsack problem
26
Model’s I/O and summarization process
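A compact sketch of this fragment-selection step (KTS shot boundaries are assumed to be given as (start, end) frame indices; function and variable names are illustrative):

import numpy as np

def select_fragments(frame_scores, shot_bounds, n_frames, budget=0.15):
    # Average frame scores inside each fragment, then pick the subset of fragments
    # that maximizes total importance under the 15%-of-duration budget (0/1 knapsack).
    frag_scores = [float(np.mean(frame_scores[s:e])) for s, e in shot_bounds]
    frag_lens = [e - s for s, e in shot_bounds]
    capacity = int(budget * n_frames)

    dp = np.zeros((len(shot_bounds) + 1, capacity + 1))          # classic DP table
    for i in range(1, len(shot_bounds) + 1):
        w, v = frag_lens[i - 1], frag_scores[i - 1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]
            if w <= c:
                dp[i][c] = max(dp[i][c], dp[i - 1][c - w] + v)

    selected, c = [], capacity                                   # backtrack the chosen fragments
    for i in range(len(shot_bounds), 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            selected.append(shot_bounds[i - 1])
            c -= frag_lens[i - 1]
    return sorted(selected)

summary = select_fragments(np.random.rand(100), [(0, 25), (25, 50), (50, 75), (75, 100)], 100)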
27. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
27
Datasets
SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark)
25 videos capturing multiple events (e.g. cooking and sports)
video length: 1.6 to 6.5 min
annotation: fragment-based video summaries
TVSum (https://github.com/yalesong/tvsum)
50 videos from 10 categories of TRECVid MED task
video length: 1 to 5 min
annotation: frame-level importance scores
28. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
28
Evaluation protocol
The generated summary should not exceed 15% of the video length
Similarity between the automatically generated (A) and the ground-truth (G) summary is expressed by the F-Score (%), with (P)recision and (R)ecall measuring their temporal overlap (∩); ‖·‖ denotes duration
Typical metrics for computing Precision and Recall at the frame level
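In formulas (the standard frame-level definitions these slides rely on):

P = \frac{\|A \cap G\|}{\|A\|}, \qquad R = \frac{\|A \cap G\|}{\|G\|}, \qquad F = 2 \cdot \frac{P \cdot R}{P + R} \times 100\%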
29. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
29
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach (by [1, 6, 7, 8, 14, 20, 21, 26, 27, 29, 30, 31, 32, 33])
(Slides 30-34 build this protocol up in a figure: an F-Score is computed between the automatically generated summary and each of the N available user summaries (F-Score1, F-Score2, ..., F-ScoreN), and these N per-user scores are then combined into a single value per video, with dataset-specific formulas for SumMe and TVSum.)
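A small sketch of that aggregation step, assuming the convention most commonly used in the literature (maximum over the user summaries for SumMe, average for TVSum); treat the per-dataset choice as our reading of the figure rather than a quote from it:

import numpy as np

def aggregate_fscores(per_user_fscores, dataset):
    # Combine the F-Scores computed against each of the N user summaries
    # into a single score per video.
    f = np.asarray(per_user_fscores, dtype=float)
    return f.max() if dataset == "SumMe" else f.mean()

print(aggregate_fscores([38.2, 51.7, 44.9], "SumMe"))   # -> 51.7
print(aggregate_fscores([55.0, 58.4, 60.1], "TVSum"))   # -> ~57.8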
35. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
35
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Alternative approach (used in [9, 13, 16, 24, 25, 28])
(Slide 36 adds the corresponding figure: a single F-Score is computed against the single ground-truth summary of each video.)
37. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Videos were down-sampled to 2 fps
Feature extraction was based on the pool5 layer of GoogleNet trained on ImageNet
Linear compression layer reduces the size of these vectors from 1024 to 500
All components are 2-layer LSTMs with 500 hidden units; Frame selector is a bi-directional LSTM
Training based on the Adam optimizer; Summarizer’s learning rate = 10^-4; Discriminator’s learning rate = 10^-5
Dataset was split into two non-overlapping sets; a training set having 80% of data and a testing
set having the remaining 20% of data
Ran experiments on 5 differently created random splits and report the average performance at
the training-epoch-level (i.e. for the same training epoch) over these runs
Experiments
37
Implementation details
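The same implementation details, gathered into a small config sketch (key names are our own; the authoritative values are in the released code at https://github.com/e-apostolidis/SUM-GAN-AAE):

config = {
    "frame_sampling_fps": 2,            # videos down-sampled to 2 fps
    "features": "GoogleNet pool5",      # ImageNet-trained, 1024-d frame descriptors
    "compressed_feat_dim": 500,         # linear compression layer: 1024 -> 500
    "lstm_layers": 2,                   # all components are 2-layer LSTMs
    "lstm_hidden_units": 500,           # frame selector is a bi-directional LSTM
    "optimizer": "Adam",
    "lr_summarizer": 1e-4,
    "lr_discriminator": 1e-5,
    "train_test_split": (0.8, 0.2),     # non-overlapping 80% / 20% split
    "num_random_splits": 5,             # performance averaged per training epoch over 5 splits
}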
39. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 1: Assessing the impact of regularization factor σ
Outcomes:
Value of σ affects the models’ performance and needs fine-tuning
Fine-tuning is dataset-dependent
The best overall performance of each model is observed for a different σ value
Experiments
39
SUM-GAN-sl SUM-GAN-VAAE SUM-GAN-AAE
40. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 2: Comparison with SoA unsupervised approaches based on multiple user summaries
Outcomes
A few SoA methods are comparable to (or even worse than) a random summary generator
Best method on TVSum shows random-level performance on SumMe
Best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum
Variational attention reduces the efficiency of SUM-GAN-sl, due to the difficulty of learning two latent spaces in parallel with the continuous updates of the model’s components during training
Replacement of VAE with AAE leads to a noticeable performance improvement over SUM-GAN-sl
Experiments
40
+/- indicate better/worse performance
compared to SUM-GAN-AAE
Note: SUM-GAN is not listed in this table as it follows
the single gt-summary evaluation protocol
41. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 3: Evaluating the effect of the introduced AAE component
Key-fragment selection: Attention mechanism leads to much smoother series of importance
scores
Experiments
41
42. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 3: Evaluating the effect of the introduced AAE component
Training efficiency: much faster and more stable training of the model
Experiments
42
Loss curves for the SUM-GAN-sl and SUM-GAN-AAE
43. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 4: Comparison with SoA supervised approaches based on multiple user summaries
Outcomes
Best methods on TVSum (MAVS and Tessellation-sup, respectively) seem adapted to this dataset, as
they exhibit random-level performance on SumMe
Only a few supervised methods surpass the performance of a random summary generator on both
datasets, with VASNet being the best among them
The performance of these methods ranges between 44.1 - 49.7 on SumMe, and 56.1 - 61.4 on TVSum
The unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods
Experiments
43
44. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 5: Comparison with SoA approaches based on single ground-truth summaries
Impact of regularization factor σ (best scores in bold)
The model’s performance is affected by the value of σ
The effect of σ depends (also) on the evaluation approach; best performance when using multiple
human summaries was observed for σ = 0.15
SUM-GAN-AAE outperforms the original SUM-GAN model on both datasets, even for the same value
of σ
Experiments
44
45. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Step 5: Comparison with SoA approaches based on single ground-truth summaries
Outcomes
SUM-GAN-AAE model performs consistently well on both datasets
SUM-GAN-AAE shows advanced performance compared to SoA supervised and unsupervised (*)
summarization methods
Experiments
45
Unsupervised approaches
marked with an asterisk
47. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Presented a video summarization method that combines:
The effectiveness of attention mechanisms in spotting the most important parts of the video
The learning efficiency of the generative adversarial networks for unsupervised training
Experimental evaluations on two benchmarking datasets:
Documented the positive contribution of the introduced attention auto-encoder component in the
model's training and summarization performance
Highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video
summarization techniques
Conclusions
47
48. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
1. E. Apostolidis, et al.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In:
AI4TV, ACM MM 2019
2. E. Apostolidis, et al.: Fast shot segmentation combining global and local visual descriptors. In: IEEE ICASSP 2014. pp. 6583-6587
3. K. Apostolidis, et al.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: MMM 2018. pp. 29-41
4. H. Bahuleyan, et al.: Variational attention for sequence-to-sequence models. In: 27th COLING. pp. 1672-1682 (2018)
5. J. Cho: PyTorch implementation of SUM-GAN (2017), https://github.com/j-min/Adversarial_Video_Summary (last accessed on Oct. 18, 2019)
6. M. Elfeki, et al.: Video summarization via actionness ranking. In: IEEE WACV 2019. pp. 754-763
7. J. Fajtl, et al.: Summarizing videos with attention. In: ACCV 2018. pp. 39-54
8. L. Feng, et al.: Extractive video summarizer with memory augmented neural networks. In: ACM MM 2018. pp. 976-983
9. T. Fu, et al.: Attentive and adversarial learning for video summarization. In: IEEE WACV 2019. pp. 1579-1587
10. M. Gygli, et al.: Creating summaries from user videos. In: ECCV 2014. pp. 505-520
11. M. Gygli, et al.: Video summarization by learning submodular mixtures of objectives. In: IEEE CVPR 2015. pp. 3090-3098
12. S. Hochreiter, et al.: Long Short-Term Memory. Neural Computation 9(8), 1735-1780 (1997)
13. Z. Ji, et al.: Video summarization with attention-based encoder-decoder networks. IEEE Trans. on Circuits and Systems for Video
Technology (2019)
14. D. Kaufman, et al.: Temporal Tessellation: A unified approach for video analysis. In: IEEE ICCV 2017. pp. 94-104
15. S. Lee, et al.: A memory network approach for story-based temporal summarization of 360 videos. In: IEEE CVPR 2018. pp. 1410-1419
16. B. Mahasseni, et al.: Unsupervised video summarization with adversarial LSTM networks. In: IEEE CVPR 2017. pp. 2982-2991
17. M. Otani, et al.: Video summarization using deep semantic features. In: ACCV 2016. pp. 361-377
Key references
48
49. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
18. D. Potapov, et al.: Category-specific video summarization. In: ECCV 2014. pp. 540-555
19. A. Radford, et al.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR 2016
20. M. Rochan, et al.: Video summarization by learning from unpaired data. In: IEEE CVPR 2019
21. M. Rochan, et al.: Video summarization using fully convolutional sequence networks. In: ECCV 2018. pp. 358-374
22. Y. Song, et al.: TVSum: Summarizing web videos using titles. In: IEEE CVPR 2015. pp. 5179-5187
23. C. Szegedy, et al.: Going deeper with convolutions. In: IEEE CVPR 2015. pp. 1-9
24. H. Wei, et al.: Video summarization via semantic attended networks. In: AAAI 2018. pp. 216-223
25. L. Yuan, et al.: Cycle-SUM: Cycle-consistent adversarial LSTM networks for unsupervised video summarization. In: AAAI 2019. pp. 9143-
9150
26. Y. Yuan, et al.: Video summarization by learning deep side semantic embedding. IEEE Trans. on Circuits and Systems for Video Technology
29(1), 226-237 (2019)
27. K. Zhang, et al.: Video summarization with Long Short-Term Memory. In: ECCV 2016. pp. 766-782
28. Y. Zhang, et al.: DTR-GAN: Dilated temporal relational adversarial network for video summarization. In: ACM TURC 2019. pp. 89:1-89:6
29. Y. Zhang, et al.: Unsupervised object-level video summarization with online motion auto-encoder. Pattern Recognition Letters (2018)
30. B. Zhao, et al.: Hierarchical recurrent neural network for video summarization. In: ACM MM 2017. pp. 863-871
31. B. Zhao, et al.: HSA-RNN: Hierarchical structure-adaptive RNN for video summarization. In: IEEE/CVF CVPR 2018. pp. 7405-7414
32. K. Zhou, et al.: Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI
2018. pp. 7582-7589
33. K. Zhou, et al.: Video summarisation by classification with deep reinforcement learning. In: BMVC 2018
Key references
49
50. retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
50
Thank you for your attention!
Questions?
Vasileios Mezaris, bmezaris@iti.gr
Code and documentation publicly available at:
https://github.com/e-apostolidis/SUM-GAN-AAE
This work was supported by the EU’s Horizon 2020 research and innovation
programme under grant agreement H2020-780656 ReTV. The work of Ioannis
Patras has been supported by EPSRC under grant No. EP/R026424/1.