Slides for the paper "Performance over Random: A robust evaluation protocol for video summarization methods", authored by E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, and I. Patras, published in the Proceedings of ACM Multimedia 2020 (ACM MM), Seattle, WA, USA, Oct. 2020.
"A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization" presentation from the ReTV project team at the AI4TV workshop at the ACM Multimedia 2019 conference focusing on
Unsupervised Video Summarization via Attention-Driven Adversarial Learning (VasileiosMezaris)
"Unsupervised Video Summarization via Attention-Driven Adversarial Learning", by E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras. Proceedings of the 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
Tutorial on "Video Summarization and Re-use Technologies and Tools", delivered at IEEE ICME 2020. These slides correspond to the first part of the tutorial, presented by Vasileios Mezaris and Evlampios Apostolidis. This part deals with automatic video summarization, and includes a presentation of the video summarization problem definition and a literature overview; an in-depth discussion on a few unsupervised GAN-based methods; and a discussion on video summarization datasets, evaluation protocols and results, and future directions.
Presentation of the paper titled "Combining Global and Local Attention with Positional Encoding for Video Summarization", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software is available at https://github.com/e-apostolidis/PGL-SUM.
Presentation of the paper titled "A Web Service for Video Smart-Cropping", by K. Apostolidis, V. Mezaris, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software and dataset are available at https://github.com/bmezaris/RetargetVid.
Hard-Negatives Selection Strategy for Cross-Modal Retrieval (VasileiosMezaris)
Cross-modal learning has gained a lot of interest recently, and many applications of it, such as image-text retrieval, cross-modal video search, or video captioning, have been proposed. In this work, we deal with the cross-modal video retrieval problem. The state-of-the-art approaches are based on deep network architectures and rely on mining hard-negative samples during training to optimize the network's parameters. Starting from a state-of-the-art cross-modal architecture that uses the improved marginal ranking loss function, we propose a simple strategy for hard-negative mining to identify which training samples are hard negatives and which, although presently treated as hard negatives, are likely not negative samples at all and should not be treated as such. Additionally, to take full advantage of network models trained using different design choices for hard-negative mining, we examine model combination strategies, and we design a hybrid one that effectively combines large numbers of trained models.
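To make the idea of hard-negative mining with a marginal ranking loss concrete, here is a minimal PyTorch sketch; the max-violation loss is the standard VSE++-style formulation, while the false-negative filter (the `false_neg_gap` rule) and all names and values are illustrative assumptions, not the strategy proposed in the paper.

```python
import torch

def ranking_loss_with_filtered_negatives(v, t, margin=0.2, false_neg_gap=0.05):
    """Max-violation marginal ranking loss with a naive false-negative filter.

    v, t: L2-normalized video and text embeddings of shape (B, D); row i of v
    matches row i of t. Negatives scoring more than `false_neg_gap` above the
    positive pair are treated as probable non-negatives and ignored
    (a hypothetical rule, not the paper's strategy).
    """
    sim = v @ t.t()                                   # (B, B) cosine similarities
    pos = sim.diag().view(-1, 1)                      # positive-pair similarities
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)

    suspect_t = (sim > pos + false_neg_gap) & ~eye        # suspects per text query
    suspect_v = (sim > pos.t() + false_neg_gap) & ~eye    # suspects per video query

    cost_t = (margin + sim - pos).clamp(min=0).masked_fill(eye | suspect_t, 0)
    cost_v = (margin + sim - pos.t()).clamp(min=0).masked_fill(eye | suspect_v, 0)

    # Keep only the hardest remaining negative per query (max-violation variant).
    return cost_t.max(dim=1).values.mean() + cost_v.max(dim=0).values.mean()

# Toy usage with random, normalized embeddings.
v = torch.nn.functional.normalize(torch.randn(8, 64), dim=1)
t = torch.nn.functional.normalize(torch.randn(8, 64), dim=1)
print(ranking_loss_with_filtered_negatives(v, t))
```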
Slides for the paper "Performance over Random: A robust evaluation protocol for video summarization methods", authored by E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, and I. Patras, published in the Proceedings of ACM Multimedia 2020 (ACM MM), Seattle, WA, USA, Oct. 2020.
"A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization" presentation from the ReTV project team at the AI4TV workshop at the ACM Multimedia 2019 conference focusing on
Unsupervised Video Summarization via Attention-Driven Adversarial LearningVasileiosMezaris
"Unsupervised Video Summarization via Attention-Driven Adversarial Learning", by E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras. Proceedings of the 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
Tutorial on "Video Summarization and Re-use Technologies and Tools", delivered at IEEE ICME 2020. These slides correspond to the first part of the tutorial, presented by Vasileios Mezaris and Evlampios Apostolidis. This part deals with automatic video summarization, and includes a presentation of the video summarization problem definition and a literature overview; an in-depth discussion on a few unsupervised GAN-based methods; and a discussion on video summarization datasets, evaluation protocols and results, and future directions.
Presentation of the paper titled "Combining Global and Local Attention with Positional Encoding for Video Summarization", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software is available at https://github.com/e-apostolidis/PGL-SUM.
Presentation of the paper titled "A Web Service for Video Smart-Cropping", by K. Apostolidis, V. Mezaris, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software and dataset are available at https://github.com/bmezaris/RetargetVid.
Hard-Negatives Selection Strategy for Cross-Modal RetrievalVasileiosMezaris
Cross-modal learning has gained a lot of interest recently, and many applications of it, such as image-text retrieval, cross-modal video search, or video captioning have been proposed. In this work, we deal with the cross-modal video retrieval problem. The state-of-the-art approaches are based on deep network architectures, and rely on mining hard-negative samples during training to optimize the selection of the network’s parameters. Starting from a state-of-the-art cross-modal architecture that uses the improved marginal ranking loss function, we propose a simple strategy for hard-negative mining to identify which training samples are hard-negatives and which, although presently treated as hard-negatives, are likely not negative samples at all and shouldn’t be treated as such. Additionally, to take full advantage of network models trained using different design choices for hard-negative mining, we examine model combination strategies, and we design a hybrid one effectively combining large numbers of trained models.
Presentation slides for our paper "Combining Adversarial and Reinforcement Learning for Video Thumbnail Selection", ACM ICMR 2021. https://doi.org/10.1145/3460426.3463630.
We developed a new method for unsupervised video thumbnail selection. The developed network architecture selects video thumbnails based on two criteria: the representativeness and the aesthetic quality of their visual content. Training relies on a combination of adversarial and reinforcement learning. The former is used to train a discriminator, whose goal is to distinguish the original from a reconstructed version of the video based on a small set of candidate thumbnails. The discriminator's feedback is a measure of the representativeness of the selected thumbnails. This measure is combined with estimates of the aesthetic quality of the thumbnails (made using a SoA Fully Convolutional Network) to form a reward and train the thumbnail selector via reinforcement learning. Experiments on two datasets (OVP and YouTube) show the competitiveness of the proposed method against other SoA approaches. An ablation study with respect to the adopted thumbnail selection criteria documents the importance of considering the aesthetics, and the contribution of this information when used in combination with measures of the representativeness of the visual content.
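As a rough illustration of how representativeness and aesthetics could be fused into a single reward, consider the toy sketch below; the equal weighting, the [0, 1] score ranges and the function name are assumptions for illustration, not the paper's exact reward formulation.

```python
def thumbnail_reward(representativeness, aesthetic_scores, weight=0.5):
    """Toy reward mixing a discriminator-based representativeness score with
    per-thumbnail aesthetic estimates (both assumed to lie in [0, 1])."""
    aesthetics = sum(aesthetic_scores) / len(aesthetic_scores)  # mean aesthetic quality
    return weight * representativeness + (1.0 - weight) * aesthetics

# Example: a fairly representative selection with mixed aesthetic quality.
print(thumbnail_reward(0.8, [0.9, 0.4, 0.7]))
```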
ReTV: Bringing Broadcaster Archives to the 21st-century Audiences (ReTV project)
Optimising audiovisual content for online publication and maximising user engagement. A presentation by Lyndon Nixon at Joint Technical Symposium 2019.
Perceptually Lossless Compression with Error Concealment for Periscope and So... (sipij)
We present a video compression framework that has two key features. First, we aim at achieving perceptually lossless compression for low frame rate videos (6 fps). Four well-known video codecs in the literature have been evaluated and the performance was assessed using four well-known performance metrics. Second, we investigated the impact of error concealment algorithms for handling corrupted pixels due to transmission errors in communication channels. Extensive experiments using actual videos have been performed to demonstrate the proposed framework.
The Advisory Group on MPEG Visual Quality Assessment (ISO/IEC JTC1 SC29/AG5) was founded in 2020 with the goal to select and design subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies in the context of the MPEG standardization work. In this talk, the current work items, as well as perspectives and first achievements of the group, are presented.
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics (Alpen-Adria-Universität)
The Quality of Experience (QoE) is well-defined in QUALINET white papers, but its assessment and metrics are subject to research. The aim of this workshop on “Quality of Immersive Media: Assessment and Metrics” is to provide a forum for researchers and practitioners to discuss the latest findings in this field. The scope of this workshop is (i) to raise awareness about MPEG efforts in the context of quality of immersive visual media and (ii) invite experts (outside of MPEG) to present new techniques relevant to this workshop.
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil... (Ijripublishers Ijri)
Global interconnect planning becomes a challenge as semiconductor technology continuously scales. Because of the increasing wire resistance and higher capacitive coupling in smaller features, the delay of global interconnects becomes large compared with the delay of a logic gate, introducing a huge performance gap that needs to be resolved. A novel equalized global link architecture and driver–receiver co-design flow are proposed for high-speed and low-energy on-chip communication by utilizing a continuous-time linear equalizer (CTLE). The proposed global link is analyzed using a linear system method, and the formula of the CTLE eye opening is derived to provide high-level design guidelines and insights. Compared with the separate driver–receiver design flow, over 50% energy reduction is observed.
Fast object re-detection and localization in video for spatio-temporal fragme... (LinkedTV)
Fast object re-detection and localization in video for spatio-temporal fragment creation, Jul. 2013, San Jose, California, USA. Talk provided by Vasileios Mezaris.
Tutorial on Point Cloud Compression and standardisation (Rufael Mekuria)
Tutorial on Point Cloud Compression and standardisation given at IEEE VCIP 2017 in December. I present the techniques for point cloud compression and the quality metrics and codecs designed in my PhD at CWI. I also detail the standardisation activity on point cloud compression that I started in 2014 and that, as of 2017, involves mobile device makers such as Apple, Huawei, Sony, Samsung and Nokia.
Improved Error Detection and Data Recovery Architecture for Motion Estimation... (IJERA Editor)
Given the critical role of motion estimation (ME) in a video coder, testing such a module is of priority concern. While focusing on the testing of ME in a video coding system, this work presents an error detection and data recovery (EDDR) design, based on the proposed residue-and-quotient (RQ) code, to embed into ME for video coding testing applications. An error in processing elements (PEs), i.e. key components of a ME, can be detected and recovered effectively by using the proposed EDDR design. Experimental results indicate that the proposed EDDR design for ME testing can detect errors and recover data with an acceptable area overhead and timing penalty. Importantly, the proposed EDDR design performs satisfactorily in terms of throughput and reliability for ME testing applications.
Video content analysis and retrieval system using video storytelling and inde... (IJECEIAES)
Videos are often used for communicating ideas, concepts, experiences, and situations, because of the significant advances made in video communication technology, and social media platforms have expanded video usage expeditiously. At present, recognition of a video is done using metadata like the video title, video descriptions, and video thumbnails. There are situations in which a video searcher requires only a video clip on a specific topic from a long video. This paper proposes a novel methodology for the analysis of video content, using video storytelling and indexing techniques for the retrieval of the intended video clip from a long-duration video. The video storytelling technique is used for video content analysis and to produce a description of the video. The video description thus created is used to prepare an index using the wormhole algorithm, guaranteeing the search of a keyword of definite length L within the minimum worst-case time. This video index can be used by a video searching algorithm to retrieve the relevant part of the video based on the frequency of the word in the keyword search of the video index. Instead of downloading and transferring a whole video, the user can download or transfer only the specifically needed video clip. The network constraints associated with the transfer of videos are considerably addressed.
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada... (INFOGAIN PUBLICATION)
Locally linear embedding (LLE) is an unsupervised learning algorithm which computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. LLE attempts to discover non-linear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is done using a modified LLE along with an adaptive nearest-neighbor approach to find the nearest neighbors and the connected components. The proposed feature extraction method is applied to a video. The resulting video feature description gives a new tool for the analysis of video.
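For readers unfamiliar with LLE, the snippet below applies standard LLE to placeholder frame descriptors via scikit-learn; the paper's modified LLE and adaptive nearest-neighbor selection are not reproduced here, and the data dimensions are made up.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Placeholder frame descriptors: 200 frames, 512-dimensional features each.
frames = np.random.rand(200, 512)

# Standard LLE: neighborhood-preserving embedding into 3 dimensions.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=3)
low_dim = lle.fit_transform(frames)
print(low_dim.shape)   # (200, 3)
```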
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str... (Alpen-Adria-Universität)
With the emergence of multiple modern video codecs, streaming service providers are forced to encode, store, and transmit bitrate ladders of multiple codecs separately, consequently suffering from additional energy costs for encoding, storage, and transmission. To tackle this issue, we introduce an online energy-efficient Multi-Codec Bitrate ladder Estimation scheme (MCBE) for adaptive video streaming applications. In MCBE, quality representations within the bitrate ladder of new-generation codecs (e.g., HEVC, AV1) that lie below the predicted rate-distortion curve of the AVC codec are removed. Moreover, perceptual redundancy between representations of the bitrate ladders of the considered codecs is also minimized based on a Just Noticeable Difference (JND) threshold. Therefore, random forest-based models predict the VMAF of bitrate ladder representations of each codec. In a live streaming session where all clients support the decoding of AVC, HEVC, and AV1, MCBE achieves impressive results, reducing cumulative encoding energy by 56.45%, storage energy usage by 94.99%, and transmission energy usage by 77.61% (considering a JND of six VMAF points). These energy reductions are in comparison to a baseline bitrate ladder encoding based on current industry practice.
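The two pruning ideas described above (discarding rungs that fall below the AVC rate-distortion curve, and enforcing a JND gap between the remaining rungs) can be sketched as follows; the data layout, the nearest-bitrate lookup and the toy numbers are simplifying assumptions and do not reproduce the actual MCBE scheme.

```python
def prune_bitrate_ladder(avc_rd, codec_rd, jnd=6.0):
    """Keep only rungs of `codec_rd` that beat the AVC curve and differ by >= jnd VMAF.

    avc_rd, codec_rd: lists of (bitrate_kbps, predicted_vmaf) pairs.
    """
    def avc_vmaf_at(bitrate):
        # Nearest-bitrate lookup; a real scheme would interpolate the RD curve.
        return min(avc_rd, key=lambda p: abs(p[0] - bitrate))[1]

    kept, last_vmaf = [], None
    for bitrate, vmaf in sorted(codec_rd):
        if vmaf <= avc_vmaf_at(bitrate):
            continue                      # below the AVC RD curve: redundant rung
        if last_vmaf is not None and vmaf - last_vmaf < jnd:
            continue                      # within one JND of the previous kept rung
        kept.append((bitrate, vmaf))
        last_vmaf = vmaf
    return kept

# Toy usage with made-up (bitrate, VMAF) points for AVC and a newer codec.
avc  = [(500, 60), (1500, 75), (3000, 85), (6000, 92)]
hevc = [(500, 68), (1500, 82), (3000, 90), (6000, 95)]
print(prune_bitrate_ladder(avc, hevc))
```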
Video traffic on the Internet is constantly growing; networked multimedia applications consume a predominant share of the available Internet bandwidth. A major technical breakthrough and enabler in multimedia systems research and of industrial networked multimedia services certainly was the HTTP Adaptive Streaming (HAS) technique. This resulted in the standardization of MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) which, together with HTTP Live Streaming (HLS), is widely used for multimedia delivery in today’s networks. Existing challenges in multimedia systems research deal with the trade-off between (i) the ever-increasing content complexity, (ii) various requirements with respect to time (most importantly, latency), and (iii) quality of experience (QoE). Optimizing towards one aspect usually negatively impacts at least one of the other two aspects if not both. This situation sets the stage for our research work in the ATHENA Christian Doppler (CD) Laboratory (Adaptive Streaming over HTTP and Emerging Networked Multimedia Services; https://athena.itec.aau.at/), jointly funded by public sources and industry. In this talk, we will present selected novel approaches and research results of the first year of the ATHENA CD Lab’s operation. We will highlight HAS-related research on (i) multimedia content provisioning (machine learning for video encoding); (ii) multimedia content delivery (support of edge processing and virtualized network functions for video networking); (iii) multimedia content consumption and end-to-end aspects (player-triggered segment retransmissions to improve video playout quality); and (iv) novel QoE investigations (adaptive point cloud streaming). We will also put the work into the context of international multimedia systems research.
Vignesh V Menon is invited to talk on "Video Coding for HTTP Adaptive Streaming" at Research@Lunch, a research webinar series by Humanitarian Technology (HuT) Labs, Amrita Vishwa Vidyapeetham University, India, exclusively for Ph.D. scholars and UG and PG researchers in India. This talk will introduce the basics of video codecs and highlight the scope of HAS-related research on video encoding.
Time: August 14, 10.00AM-10.30AM (CEST) or 1.30PM-2.00PM (IST)
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA... (ijcseit)
The paper deals with the development of mathematical models and algorithms for video processing in digital video surveillance systems to detect moving objects. The model and algorithm can be applied in video surveillance systems to identify moving objects in a surveillance area. The reduction of the computations required for video segmentation is considered, and an algorithm of observational discrete lines for the detection and tracking of moving objects is proposed in this article.
Paper discussion: Video-to-Video Synthesis (NIPS 2018) (Motaz Sabri)
This presentation was used at the Ridge-i Yomekai event in December 2018 for the NIPS 2018 paper "Video-to-Video Synthesis" by researchers from NVIDIA and MIT.
Design and Analysis of Quantization Based Low Bit Rate Encoding System (ijtsrd)
The objective of this paper is to develop a low bit rate encoding for VQ problems such as real-time image coding. The decision tree is generated by an offline process. A new systolic architecture to realize the encoder of full-search vector quantization (VQ) for high-speed applications is presented here. Over the past decades, digital video compression technologies have become an integral part of multimedia systems. The purpose is also to improve image quality in remote cardiac pulse measurement using an adaptive filter, and the approach to be used for feature extraction from many images is described. This paper presents a real-time application of the image compression technique which can be efficiently interfaced with any hardware; a Raspberry Pi is used for the compression of images. We have developed an algorithm, based on endoscopic images, that uses differential pulse code modulation. The compressor consists of a low-cost YEF colour space converter and a variable-length predictive algorithm for lossless compression. Mr. Nilesh Bodne | Dr. Sunil Kumar, "Design and Analysis of Quantization Based Low Bit Rate Encoding System", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd29289.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/29289/design-and-analysis-of-quantization-based-low-bit-rate-encoding-system/mr-nilesh-bodne
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is an open access journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.
Multi-Modal Fusion for Image Manipulation Detection and Localization (VasileiosMezaris)
Presentation of paper "Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization", by K. Triaridis, V. Mezaris, delivered at 30th Int. Conf. on MultiMedia Modeling (MMM 2024), Amsterdam, NL, Jan.-Feb. 2024.
Presentation of our top-scoring solution to the MediaEval 2023 NewsImages Task, "Cross-modal Networks, Fine-Tuning, Data Augmentation and Dual Softmax Operation for MediaEval NewsImages 2023", by A. Leventakis, D. Galanopoulos, V. Mezaris, delivered at the 2023 Multimedia Evaluation Workshop (MediaEval'23), Amsterdam, NL, Feb. 2024.
Spatio-Temporal Summarization of 360-degrees Videos (VasileiosMezaris)
Presentation of paper "An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos", by I. Kontostathis, E. Apostolidis, V. Mezaris, delivered at 30th Int. Conf. on MultiMedia Modeling (MMM 2024), Amsterdam, NL, Jan.-Feb. 2024.
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti... (VasileiosMezaris)
Presentation of paper "Masked Feature Modelling for the unsupervised pre-training of a Graph Attention Network block for bottom-up video event recognition", by D. Daskalakis, N. Gkalelis, V. Mezaris, delivered at IEEE ISM 2023, Dec. 2022, Laguna Hills, CA, USA.
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022 (VasileiosMezaris)
Matching images to articles is challenging and can be considered a special version of the cross-media retrieval problem. This working note paper presents our solution for the MediaEval NewsImages benchmarking task. We investigated the performance of two cross-modal networks, a pre-trained network and a trainable one, the latter originally developed for text-video retrieval tasks and adapted to the NewsImages task. Moreover, we utilize a method for revising the similarities produced by either one of the cross-modal networks, i.e., a dual softmax operation, to improve our solutions' performance. We report the official results for our submitted runs and additional experiments we conducted to evaluate our runs internally.
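For illustration, one common formulation of the dual softmax operation used to revise a text-video similarity matrix at retrieval time looks as follows; the temperature value is an assumption and the exact variant used in this work may differ.

```python
import torch

def dual_softmax_revision(sim, temperature=100.0):
    """Revise a (num_texts, num_videos) similarity matrix with a dual softmax.

    Each similarity is re-weighted by a softmax computed over the reverse
    (video-to-text) direction, boosting pairs that rank highly both ways.
    """
    prior = torch.softmax(sim * temperature, dim=0)  # per-video distribution over texts
    return sim * prior                               # revised text-to-video similarities

# Toy usage: 3 articles x 4 candidate images/videos.
sim = torch.rand(3, 4)
print(dual_softmax_revision(sim))
```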
TAME: Trainable Attention Mechanism for Explanations (VasileiosMezaris)
Presentation of paper "TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks", by M. Ntrougkas, N. Gkalelis, V. Mezaris, delivered at IEEE ISM 2022, Dec. 2022, Naples, Italy.
The apparent "black box" nature of neural networks is a barrier to adoption in applications where explainability is essential. This paper presents TAME (Trainable Attention Mechanism for Explanations), a method for generating explanation maps with a multi-branch hierarchical attention mechanism. TAME combines a target model's feature maps from multiple layers using an attention mechanism, transforming them into an explanation map. TAME can easily be applied to any convolutional neural network (CNN) by streamlining the optimization of the attention mechanism's training method and the selection of the target model's feature maps. After training, explanation maps can be computed in a single forward pass. We apply TAME to two widely used models, i.e. VGG-16 and ResNet-50, trained on ImageNet, and show improvements over previous top-performing methods. We also provide a comprehensive ablation study comparing the performance of different variations of TAME's architecture.
Presentation of paper "Gated-ViGAT: Efficient Bottom-Up Event
Recognition and Explanation Using a New Frame
Selection Policy and Gating Mechanism", by N. Gkalelis, D. Daskalakis, V. Mezaris, delivered at IEEE ISM 2022, Dec. 2022, Naples, Italy.
In this paper, Gated-ViGAT, an efficient approach for video event recognition, utilizing bottom-up (object) information, a new frame sampling policy and a gating mechanism is proposed. Specifically, the frame sampling policy uses weighted in-degrees (WiDs), derived from the adjacency matrices of graph attention networks (GATs), and a dissimilarity measure to select
the most salient and at the same time diverse frames representing
the event in the video. Additionally, the proposed gating mechanism fetches the selected frames sequentially, and commits early exiting when an adequately confident decision is achieved. In this way, only a few frames are processed by the computationally
expensive branch of our network that is responsible for the bottom-up information extraction. The experimental evaluation on two large, publicly available video datasets (MiniKinetics, ActivityNet) demonstrates that Gated-ViGAT provides a large computational complexity reduction in comparison to our previous approach (ViGAT), while maintaining the excellent event
recognition and explainability performance.
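The salient-but-diverse frame selection can be illustrated with a greedy sketch like the one below; the exact WiD computation, the dissimilarity measure and the thresholds used in Gated-ViGAT are not reproduced, so treat every name and value here as an assumption.

```python
import numpy as np

def select_salient_diverse_frames(adjacency, features, num_frames=5, sim_thresh=0.85):
    """Greedy frame selection: rank frames by weighted in-degree (WiD) and skip
    frames too similar to those already selected.

    adjacency: (T, T) attention/adjacency matrix of a frame-level graph attention head.
    features:  (T, D) L2-normalized frame features used for the diversity check.
    """
    wid = adjacency.sum(axis=0)               # weighted in-degree per frame
    order = np.argsort(-wid)                  # most salient frames first
    selected = []
    for idx in order:
        if len(selected) == num_frames:
            break
        if all(features[idx] @ features[j] < sim_thresh for j in selected):
            selected.append(int(idx))
    return selected

# Toy usage with random data for a 20-frame video.
rng = np.random.default_rng(0)
adj = rng.random((20, 20))
feat = rng.random((20, 64))
feat /= np.linalg.norm(feat, axis=1, keepdims=True)
print(select_salient_diverse_frames(adj, feat))
```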
Explaining video summarization based on the focus of attention (VasileiosMezaris)
Presentation of paper "Explaining video summarization based on
the focus of attention", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at IEEE ISM 2022, Dec. 2022, Naples, Italy.
In this paper we propose a method for explaining
video summarization. We start by formulating the problem as
the creation of an explanation mask which indicates the parts
of the video that influenced the most the estimates of a video
summarization network, about the frames’ importance. Then, we
explain how the typical analysis pipeline of attention-based networks for video summarization can be used to define explanation
signals, and we examine various attention-based signals that have
been studied as explanations in the NLP domain. We evaluate
the performance of these signals by investigating the video
summarization network’s input-output relationship according
to different replacement functions, and utilizing measures that quantify the capability of explanations to spot the most and
least influential parts of a video. We run experiments using an
attention-based network (CA-SUM) and two datasets (SumMe
and TVSum) for video summarization. Our evaluations indicate the advanced performance of explanations formed using the inherent attention weights, and demonstrate the ability of our
method to explain the video summarization results using clues
about the focus of the attention mechanism.
Combining textual and visual features for Ad-hoc Video Search (VasileiosMezaris)
In this presentation, our work in the context of the Ad-hoc Video Search (AVS) Task of TRECVID 2022 is presented. Our participation in the AVS task is based on a cross-modal deep network architecture, T x V ("T times V"), which utilizes several textual and visual features. As part of the retrieval stage, a dual-softmax approach is also utilized to revise the calculated text-video similarities.
Explaining the decisions of image/video classifiers (VasileiosMezaris)
Presentation delivered by Vasileios Mezaris at the 1st Nice Workshop on Interpretability, November 2022, Nice, France.
This presentation starts by discussing the motivation of explainability approaches for image and video classifiers. Then, we focus on three distinct problems: learning how to derive explanations for the decisions of a legacy (trained) image classifier; designing a classifier for video event recognition that can also deliver explanations for its decisions; and, taking a first look at possible explanation signals of a video summarizer. Technical details of our proposed solutions to these three problems are presented. Besides quantitative results concerning the goodness of the derived explanations, qualitative examples are also discussed in order to provide insight on the reasons behind classification errors, including possible dataset biases affecting the trained classifiers.
Learning visual explanations for DCNN-based image classifiers using an attent... (VasileiosMezaris)
I. Gkartzonika, N. Gkalelis, V. Mezaris, "Learning Visual Explanations for DCNN-Based Image Classifiers Using an Attention Mechanism", Proc. ECCV 2022 Workshop on Vision with Biased or Scarce Data (VBSD), Oct. 2022.
In this paper two new learning-based eXplainable AI (XAI) methods for deep convolutional neural network (DCNN) image classifiers, called L-CAM-Fm and L-CAM-Img, are proposed. Both methods use an attention mechanism that is inserted in the original (frozen) DCNN and is trained to derive class activation maps (CAMs) from the last convolutional layer’s feature maps. During training, CAMs are applied to the feature maps (L-CAM-Fm) or the input image (L-CAM-Img) forcing the attention mechanism to learn the image regions explaining the DCNN’s outcome. Experimental evaluation on ImageNet shows that the proposed methods achieve competitive results while requiring a single forward pass at the inference stage. Moreover, based on the derived explanations a comprehensive qualitative analysis is performed providing valuable insight for understanding the reasons behind classification errors, including possible dataset biases affecting the trained classifier.
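A minimal sketch of the central idea (an attention head on the frozen classifier's last convolutional feature maps producing class activation maps, optionally masking the input image as in L-CAM-Img) is given below; the layer design, sizes and masking details are assumptions rather than the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAMAttention(nn.Module):
    """Toy attention head turning last-conv feature maps into class activation maps."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.attn = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feature_maps, image=None, class_idx=0):
        cams = torch.sigmoid(self.attn(feature_maps))            # (B, classes, h, w)
        if image is None:
            return cams
        # L-CAM-Img-style use: upsample one class's CAM and mask the input image.
        cam = F.interpolate(cams[:, class_idx:class_idx + 1],
                            size=image.shape[-2:], mode="bilinear", align_corners=False)
        return cams, image * cam

# Toy usage: feature maps of a frozen backbone (2048 channels, 7x7) and a 224x224 image.
head = CAMAttention(in_channels=2048, num_classes=10)
feats = torch.randn(1, 2048, 7, 7)
img = torch.randn(1, 3, 224, 224)
cams, masked = head(feats, img, class_idx=3)
print(cams.shape, masked.shape)   # (1, 10, 7, 7) and (1, 3, 224, 224)
```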
Are all combinations equal? Combining textual and visual features with multi... (VasileiosMezaris)
D. Galanopoulos, V. Mezaris, "Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval", Proc. ECCV 2022 Workshop on AI for Creative Video Editing and Understanding (CVEU), Oct. 2022.
In this paper we tackle the cross-modal video retrieval problem and, more specifically, we focus on text-to-video retrieval. We investigate how to optimally combine multiple diverse textual and visual features into feature pairs that lead to generating multiple joint feature spaces, which encode text-video pairs into comparable representations. To learn these representations our proposed network architecture is trained by following a multiple space learning procedure. Moreover, at the retrieval stage, we introduce additional softmax operations for revising the inferred query-video similarities. Extensive experiments in several setups based on three large-scale datasets (IACC.3, V3C1, and MSR-VTT) lead to conclusions on how to best combine text-visual features and document the performance of the proposed network.
Presentation of the paper titled "Summarizing videos using concentrated attention and considering the uniqueness and diversity of the video frames", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at the ACM Int. Conf. on Multimedia Retrieval (ICMR’22), Newark, NJ, USA, June 2022. The corresponding software is available at https://github.com/e-apostolidis/CA-SUM.
Talk by Vasileios Mezaris, titled "Misinformation on the internet: Video and AI", delivered at the "Age of misinformation: an interdisciplinary outlook on fake news" webinar, on 17 December 2020.
Slides for the paper titled "Structured pruning of LSTMs via Eigenanalysis and Geometric Median for Mobile Multimedia and Deep Learning Applications", by N. Gkalelis and V. Mezaris, presented at the 22nd IEEE Int. Symposium on Multimedia (ISM), Dec. 2020.
Presentation of the paper titled "Migration-Related Semantic Concepts for the Retrieval of Relevant Video Content", by E. Elejalde, D. Galanopoulos, C. Niederee, V. Mezaris, published in the proceedings of the Int. Workshop on Artificial Intelligence and Robotics for Law Enforcement Agencies (AIRLEAs) at the 3rd Int. Conf. on Intelligent Technologies and Applications (INTAP 2020), Gjovik, Norway, Sept. 2020.
This is the presentation for the paper "Fractional Step Discriminant Pruning: A Filter Pruning Framework for Deep Convolutional Neural Networks", delivered by N. Gkalelis and V. Mezaris at the 7th IEEE Int. Workshop on Mobile Multimedia Computing (MMC2020) that was held as part of the IEEE Int. Conf. on Multimedia and Expo (ICME), in July 2020.
"Subclass deep neural networks: re-enabling neglected classes in deep network training for multimedia classification", by N. Gkalelis, V. Mezaris. Proceedings of the 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
Video & AI: capabilities and limitations of AI in detecting video manipulations (VasileiosMezaris)
Invited presentation given by Dr. Vasileios Mezaris during the Greek Media Literacy Week 2019; specifically, presented in the international conference on "Disinformation in Cyberspace: Media literacy meets Artificial Intelligence" that was organized as part of the Media Literacy Week 2019 in Athens, Greece, on November 15, 2019.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Seminar of U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cell utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules - a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to "burn" the pyruvates made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In cancer cells:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the Warburg phenomenon:
Warburg effect: Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his "discovery of the nature and mode of action of the respiratory enzyme".
Warburg effect: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
What are greenhouse gases and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
This pdf is about schizophrenia.
For more details, visit the YouTube channel @SELF-EXPLANATORY:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ~ 50–200 pc, stellar masses of M⋆ ~ 10^7–10^8 M⊙, and star-formation rates of SFR ~ 0.1–1 M⊙/yr. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ~2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Nutraceutical market, scope and growth: Herbal drug technology
GAN-based video summarization
1. Thessaloniki, October 2020
GAN-based Video Summarization
Vasileios Mezaris
CERTH-ITI
Presentation at the AI4Media
Workshop on GANs for Media
Content Generation
Joint work with
E. Apostolidis, E. Adamantidou,
A. Metsai (CERTH-ITI);
I. Patras (QMUL)
2.
Video summary: a short visual summary that encapsulates the flow of the story and
the essential parts of the full-length video
Original video
Video summary (storyboard)
Problem statement
3.
Problem statement
Applications of video summarization
Professional CMS: effective indexing,
browsing, retrieval & promotion of media
assets
Video sharing platforms: improved viewer
experience, enhanced viewer engagement &
increased content consumption
Other summarization scenarios: movie trailer production, sports highlights video generation,
video synopsis of 24h surveillance recordings
4.
Related work
Deep-learning approaches
Various supervised methods (i.e., learning from ground-truth manually-generated summaries)
Using feedforward neural nets (CNNs) for e.g. identifying semantically-important video parts
Exploiting video-level metadata
Capturing the story flow using recurrent neural nets (e.g. LSTMs)
…and many more
Unsupervised algorithms that do not rely on human annotations, and build summaries
Using adversarial learning to: minimize the distance between videos and their summary-based
reconstructions; maximize the mutual information between summary and video; learn a mapping
from raw videos to human-like summaries based on online available summaries
…and a few more approaches (see tutorial at IEEE ICME 2020,
https://www.slideshare.net/VasileiosMezaris/icme2020-tutorial-videosummarizationpart1)
+ No need for training data (limited, hard to produce)
+ Avoid the subjectivity & biases of manually-generated summaries
+ Adaptability to different types of video
5.
GANs for unsupervised video summarization
Our starting point: the SUM-GAN architecture [1]
Main idea: build a keyframe selection mechanism
by minimizing the distance between the deep
representations of the original video and a
reconstructed version of it based on the selected
keyframes
Problem: how to define a good distance?
Solution: use a trainable discriminator network!
Goal: train the Summarizer to maximally confuse
the Discriminator when distinguishing the original
from the reconstructed video
SUM-GAN
[1] B. Mahasseni, M. Lam, S. Todorovic, "Unsupervised Video Summarization with Adversarial LSTM Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2982-2991.
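To make the adversarial objective concrete, the sketch below (illustrative PyTorch, not the actual SUM-GAN code; all function and variable names are hypothetical) shows the two opposing loss terms: the Discriminator learns to separate the original video from its summary-based reconstruction, while the Summarizer is rewarded when the Discriminator is fooled.

```python
import torch
import torch.nn.functional as F

# d_original, d_reconstructed: discriminator outputs (probabilities of "original")
# for the original video and for its summary-based reconstruction.

def discriminator_loss(d_original, d_reconstructed):
    # The discriminator should label the original video as real (1)
    # and the summary-based reconstruction as fake (0).
    real = F.binary_cross_entropy(d_original, torch.ones_like(d_original))
    fake = F.binary_cross_entropy(d_reconstructed, torch.zeros_like(d_reconstructed))
    return real + fake

def summarizer_adversarial_loss(d_reconstructed):
    # The summarizer (generator) is rewarded when the discriminator
    # mistakes the reconstruction for the original video.
    return F.binary_cross_entropy(d_reconstructed, torch.ones_like(d_reconstructed))
```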
6.
SUM-GAN-sl introduces two extensions [2]:
A linear compression layer that reduces the size
of the CNN feature vectors
An incremental and fine-grained approach to
train the model’s components
[2] E. Apostolidis, A. Metsai, E. Adamantidou, V. Mezaris, I. Patras, "A Stepwise, Label-
based Approach for Improving the Adversarial Training in Unsupervised Video
Summarization", Proc. 1st Int. Workshop on AI for Smart TV Content Production,
Access and Delivery (AI4TV'19) at ACM Multimedia 2019, Nice, France, October 2019.
SUM-GAN-sl
GANs for unsupervised video summarization
7.
Incremental approach to train the model’s components
SUM-GAN-sl
GANs for unsupervised video summarization
8.
(Figure: the model's components and training losses, including the regularization factor)
SUM-GAN-sl
GANs for unsupervised video summarization
Incremental approach to train the model’s components
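The regularization factor referred to in the figure can be illustrated as follows. This is a hedged sketch, assuming the factor acts as a target fraction of selected frames in a sparsity/length term (a common choice in this family of models), and not necessarily the exact SUM-GAN-sl formulation; the name sigma and its default value are ours.

```python
import torch

def sparsity_loss(scores, sigma=0.3):
    # scores: frame-level importance scores in [0, 1] predicted by the frame selector.
    # sigma:  regularization factor, i.e. the target fraction of the video that the
    #         summarizer is encouraged to select (0.3 is an illustrative value).
    return torch.abs(scores.mean() - sigma)
```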
9.
SUM-GAN-sl
GANs for unsupervised video summarization
Incremental approach to train the model’s components
10.
SUM-GAN-sl
GANs for unsupervised video summarization
Incremental approach to train the model’s components
11.
Adversarial learning driven by deterministic
attention auto-encoder
The VAE of the previous architecture was entirely replaced by an attention auto-encoder (AAE) network, forming the SUM-GAN-AAE architecture [3]
[3] E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras, "Unsupervised
Video Summarization via Attention-Driven Adversarial Learning", Proc. 26th Int.
Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
SUM-GAN-AAE
GANs for unsupervised video summarization
12.
Attention auto-encoder
Processing pipeline
SUM-GAN-AAE
GANs for unsupervised video summarization
13.
Processing pipeline
Weighted feature vectors fed to the Encoder
Attention auto-encoder
SUM-GAN-AAE
GANs for unsupervised video summarization
14.
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
For t > 1: use the hidden state of the previous Decoder's step (ht-1)
For t = 1: use the hidden state of the last
Encoder’s step (He)
Attention auto-encoder
SUM-GAN-AAE
GANs for unsupervised video summarization
15.
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Attention auto-encoder
SUM-GAN-AAE
GANs for unsupervised video summarization
16.
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
Attention auto-encoder
SUM-GAN-AAE
GANs for unsupervised video summarization
17.
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
αt multiplied with V to form the Context Vector vt'
Attention auto-encoder
SUM-GAN-AAE
GANs for unsupervised video summarization
18.
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
αt multiplied with V to form the Context Vector vt'
vt’ combined with Decoder’s previous output yt-1
Attention auto-encoder
SUM-GAN-AAE
GANs for unsupervised video summarization
19.
Attention auto-encoder
Processing pipeline
Weighted feature vectors fed to the Encoder
Encoder’s output (V) and Decoder’s previous
hidden state fed to the Attention component
Attention weights (αt) computed using:
Energy score function
Soft-max function
αt multiplied with V to form the Context Vector vt'
vt’ combined with Decoder’s previous output yt-1
Decoder gradually reconstructs the video
SUM-GAN-AAE
GANs for unsupervised video summarization
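A minimal numpy sketch of one decoding step of the attention auto-encoder pipeline listed above. The exact energy score function of SUM-GAN-AAE is not reproduced here; a plain dot-product score stands in for it, and all shapes and variable names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_decoder_step(V, h_prev, y_prev, W_out):
    # V      : (T, d)  Encoder outputs for the T frames
    # h_prev : (d,)    previous Decoder hidden state (Encoder's last state at t = 1)
    # y_prev : (d,)    Decoder's previous output y_{t-1}
    # W_out  : (d, 2d) illustrative output projection
    e = V @ h_prev                                # energy scores (dot-product stand-in)
    alpha = softmax(e)                            # attention weights alpha_t
    context = alpha @ V                           # context vector v_t' = sum_i alpha_i * V_i
    combined = np.concatenate([context, y_prev])  # combine v_t' with y_{t-1}
    y_t = np.tanh(W_out @ combined)               # next reconstructed frame feature
    return y_t, alpha
```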
20.
Video summarization practicalities
Input: The CNN feature vectors of the (sampled) video frames
Output: Frame-level importance scores
Summarization process:
CNN features pass through the linear compression layer and the frame selector, which computes importance scores at the frame level
Given a video segmentation (using KTS), fragment-level importance scores are calculated by averaging the scores of each fragment's frames
The summary is created by selecting the fragments that maximize the total importance score, provided that the summary length does not exceed 15% of the video duration; this is solved as a 0/1 Knapsack problem (see the sketch below)
Model’s I/O and summarization process
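The summary-creation step described above can be sketched as follows (illustrative code, not the authors' implementation): frame-level scores are averaged over each fragment of the temporal segmentation, and fragments are then selected with a 0/1 Knapsack under the 15% length budget.

```python
import numpy as np

def create_summary(frame_scores, fragments, budget_ratio=0.15):
    # frame_scores: (N,) numpy array of importance scores, one per frame.
    # fragments:    list of inclusive (start, end) frame-index pairs from the
    #               temporal segmentation (e.g. KTS).
    values = [frame_scores[s:e + 1].mean() for s, e in fragments]   # fragment-level scores
    lengths = [e - s + 1 for s, e in fragments]                     # fragment durations (frames)
    capacity = int(budget_ratio * len(frame_scores))                # 15% of the video

    # Standard 0/1 Knapsack (dynamic programming) over the fragments.
    n = len(fragments)
    dp = np.zeros((n + 1, capacity + 1))
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]
            if lengths[i - 1] <= c:
                dp[i][c] = max(dp[i][c], dp[i - 1][c - lengths[i - 1]] + values[i - 1])

    # Backtrack to recover the selected fragments.
    selected, c = [], capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            selected.append(i - 1)
            c -= lengths[i - 1]

    summary = np.zeros(len(frame_scores), dtype=bool)
    for i in selected:
        s, e = fragments[i]
        summary[s:e + 1] = True
    return summary   # boolean mask marking the frames included in the summary
```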
21.
Experiments
Datasets
SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark)
25 videos capturing multiple events (e.g. cooking and sports)
video length: 1 to 6 min
annotation: fragment-based video summaries
TVSum (https://github.com/yalesong/tvsum)
50 videos from 10 categories of TRECVid MED task
video length: 1 to 11 min
annotation: frame-level importance scores
22.
Experiments
Evaluation protocol
The generated summary should not exceed 15% of the video length
Similarity between the automatically generated (A) and the ground-truth (G) summary is expressed by the F-Score (%), with Precision and Recall measuring their temporal overlap (|| || denotes duration): P = ||A∩G|| / ||A||, R = ||A∩G|| / ||G||, F = 2PR / (P+R)
Typical metrics for computing Precision and Recall at the frame-level
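With frame-level binary indicators for the automatic summary A and a ground-truth summary G, the Precision/Recall/F-Score computation reduces to the overlap counting below (illustrative sketch).

```python
import numpy as np

def f_score(auto, gt):
    # auto, gt: boolean (N,) frame-level summary indicators for A and G.
    overlap = np.logical_and(auto, gt).sum()      # |A ∩ G|
    if overlap == 0:
        return 0.0
    precision = overlap / auto.sum()              # |A ∩ G| / |A|
    recall = overlap / gt.sum()                   # |A ∩ G| / |G|
    return 2 * precision * recall / (precision + recall) * 100   # F-Score (%)
```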
23.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach in the literature
24.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach in the literature
25.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach in the literature
F-Score1
26.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach in the literature
F-Score2
F-Score1
27.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach in the literature
F-ScoreN
F-Score2
F-Score1
28.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Most used approach in the literature
F-ScoreN
F-Score2
F-Score1
SumMe: the per-video score is the maximum of the N per-user F-Scores; TVSum: the per-video score is the average of the N per-user F-Scores (see the sketch below)
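A sketch of the per-video score in this multi-summary protocol: an F-Score is computed against each of the N user summaries, and the result is their maximum on SumMe and their average on TVSum. It reuses the f_score function from the earlier sketch; names are illustrative.

```python
import numpy as np

def video_f_score(auto, user_summaries, dataset="SumMe"):
    # auto: boolean automatic summary; user_summaries: list of boolean user summaries.
    # Assumes f_score(auto, gt) from the earlier sketch is in scope.
    scores = [f_score(auto, gt) for gt in user_summaries]   # one F-Score per user summary
    # SumMe keeps the best match among users; TVSum averages over them.
    return max(scores) if dataset == "SumMe" else float(np.mean(scores))
```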
29.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Alternative approach
30.
Experiments
Evaluation protocol
Slight but important distinction w.r.t. what is eventually used as ground-truth summary
Alternative approach: a single ground-truth summary is composed per video (instead of keeping the N user summaries separate), and a single F-Score is computed against it
31.
Videos were down-sampled to 2 fps
Feature extraction was based on the pool5 layer of GoogleNet trained on ImageNet
Linear compression layer reduces the size of these vectors from 1024 to 500
All components are 2-layer LSTMs with 500 hidden units; Frame selector is a bi-directional LSTM
Training based on the Adam optimizer; Summarizer's learning rate = 10^-4; Discriminator's learning rate = 10^-5
Dataset was split into two non-overlapping sets; a training set having 80% of data and a testing
set having the remaining 20% of data
Ran experiments on 5 differently created random splits and report the average performance at
the training-epoch-level (i.e. for the same training epoch) over these runs
Experiments
Implementation details
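A hedged PyTorch sketch of the listed components (the dimensions follow the slide; the module structure and names are illustrative, not the released code): a linear layer compressing the 1024-d GoogleNet features to 500-d, followed by a 2-layer bi-directional LSTM frame selector that outputs per-frame importance scores.

```python
import torch
import torch.nn as nn

class FrameSelector(nn.Module):
    def __init__(self, in_dim=1024, hidden=500):
        super().__init__()
        self.compress = nn.Linear(in_dim, hidden)             # 1024 -> 500 compression
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.score = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, features):                               # features: (1, T, 1024)
        x = self.compress(features)
        out, _ = self.lstm(x)                                   # (1, T, 2 * hidden)
        return self.score(out).squeeze(-1)                      # (1, T) importance scores

# Separate optimizers with the learning rates reported on the slide (illustrative wiring):
# torch.optim.Adam(summarizer.parameters(), lr=1e-4)
# torch.optim.Adam(discriminator.parameters(), lr=1e-5)
```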
32.
Comparison with SoA unsupervised approaches based on multiple user summaries
Outcomes
A few SoA methods are comparable to (or even worse than) a random summary generator
Best method on TVSum shows random-level performance on SumMe
Best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum
Variational attention reduces the efficiency of SUM-GAN-sl, due to the difficulty of defining two latent spaces in parallel with the continuous updating of the model's components during training
Replacement of VAE with AAE leads to a noticeable performance improvement over SUM-GAN-sl
Experiments
Note: SUM-GAN is not listed in this table as it follows
the single gt-summary evaluation protocol
33.
Evaluating the effect of the AAE component
Training efficiency: much faster and more stable training of the model
Experiments
Loss curves for the SUM-GAN-sl and SUM-GAN-AAE
34.
Comparison with SoA supervised approaches based on multiple user summaries
Outcomes
The best methods on TVSum (MAVS and Tessellation-sup, respectively) seem adapted to this dataset, as they exhibit random-level performance on SumMe
Only a few supervised methods surpass the performance of a random summary generator on both
datasets, with VASNet being the best among them
The performance of these methods ranges between 44.1 - 49.7 on SumMe, and 56.1 - 61.4 on TVSum
The unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods
Experiments
+/- indicate
better/worse
performance
compared to
SUM-GAN-AAE
35.
Adapting / re-purposing the content
Main requirements:
Target distribution platforms & devices have varying requirements (e.g. the optimal
duration of a video differs from one platform to another)
Target audiences have different preferences / information needs
Video summarization:
Create editions of the content that are adapted to different platforms and audiences
36.
Adapting / re-purposing the content
Web application [4] for video summarization (try it with your video!):
http://multimedia2.iti.gr/videosummarization/service/start.html
Demo video:
https://youtu.be/LbjPLJzeNII
[4] C. Collyda, K. Apostolidis, E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, "A
Web Service for Video Summarization", Proc. ACM Int. Conf. on Interactive Media
Experiences (IMX 2020), Barcelona, Spain, June 2020.
37.
Presented two new video summarization methods, making use of:
The learning efficiency of the generative adversarial networks for unsupervised training
The effectiveness of attention mechanisms in spotting the most important parts of the video
Experimental evaluations on two benchmarking datasets
Documented the positive contribution of the introduced attention auto-encoder component in the
model's training and summarization performance
Highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video
summarization techniques
Used GANs in a new web application for video summarization
Keep in mind: complete automation is sometimes not desired! (AI + human symbiosis is key)
Conclusions
38.
Questions?
Contact: Dr. Vasileios Mezaris
Information Technologies Institute
Centre for Research and Technology Hellas
Thermi-Thessaloniki, Greece
Tel: +30 2311 257770
Email: bmezaris@iti.gr, web: http://www.iti.gr/~bmezaris/
This work was supported in part by the EU’s Horizon 2020 research and innovation programme under grant
agreement H2020-780656 ReTV.