This document is a second progress report on video summarization research. It outlines the topics covered: an introduction to video summarization, a literature review summarizing five papers on the topic, identified research gaps and challenges, the problem statement of finding key frames based on extracted text, an overview of relevant datasets and tools used, and conclusions. The literature review analyzes the objectives, methods, strengths, and limitations of the summarized papers.
"Unsupervised Video Summarization via Attention-Driven Adversarial Learning", by E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras. Proceedings of the 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
Presentation of the paper "Explaining video summarization based on the focus of attention", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at IEEE ISM 2022, Dec. 2022, Naples, Italy.
In this paper we propose a method for explaining video summarization. We start by formulating the problem as the creation of an explanation mask which indicates the parts of the video that most influenced the video summarization network's estimates of the frames' importance. Then, we explain how the typical analysis pipeline of attention-based networks for video summarization can be used to define explanation signals, and we examine various attention-based signals that have been studied as explanations in the NLP domain. We evaluate the performance of these signals by investigating the video summarization network's input-output relationship under different replacement functions, and by utilizing measures that quantify the capability of explanations to spot the most and least influential parts of a video. We run experiments using an attention-based network (CA-SUM) and two datasets (SumMe and TVSum) for video summarization. Our evaluations indicate the advanced performance of explanations formed using the inherent attention weights, and demonstrate the ability of our method to explain the video summarization results using clues about the focus of the attention mechanism.
Presentation of the paper titled "Combining Global and Local Attention with Positional Encoding for Video Summarization", by E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, delivered at the IEEE Int. Symposium on Multimedia (ISM), Dec. 2021. The corresponding software is available at https://github.com/e-apostolidis/PGL-SUM.
Presentation given in the Seminar of B.Tech 6th Semester during session 2009-10 By Paramjeet Singh Jamwal, Poonam Kanyal, Rittitka Mittal and Surabhi Tyagi.
These are the subject slides for the module MMS2401 - Multimedia System and Communication, taught at Shepherd College of Media Technology, affiliated with Purbanchal University.
This presentation discusses the basics of video compression, such as DCT, colour space conversion, and motion compensation. It also discusses standards such as H.264, MPEG-2, and MPEG-4.
Topics: what is video compression; introduction and motivation; working methodology of video compression; examples; applications; the need for video compression; advantages and disadvantages.
This presentation is about the JPEG compression algorithm. It briefly describes all the underlying steps in JPEG compression: picture preparation, DCT, quantization, rendering, and encoding.
This presentation covers the following topics-
1. Video Classification as a sequence of frames
2. Video Classification as a sequence of frame-blocks
3. 2D ConvNets for Videos
4. CNN + LSTM
Multimodal video abstraction into a static document using deep learning
Abstraction is a strategy that gives the essential points of a document in a short period of time. The video abstraction approach proposed in this research is based on multi-modal video data, which comprises both audio and visual data. Segmenting the input video into scenes and obtaining a textual and visual summary for each scene are the major video abstraction procedures used to summarize the video events into a static document. To recognize shot and scene boundaries in a video sequence, a hybrid features method was employed, which improves shot detection performance by selecting strong and flexible features. The most informative keyframes from each scene are then incorporated into the visual summary. A hybrid deep learning model was used for abstractive text summarization. The BBC archive provided the testing videos, which comprised BBC Learning English and BBC News; in addition, a news summary dataset was used to train the deep model. The textual summary was assessed using the Rouge metric, achieving a score of 40.49%, while the visual summary, evaluated using precision, recall, and F-score, achieved 94.9% accuracy, outperforming the other methods in the experiments.
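The Rouge figure quoted above is an n-gram overlap metric for comparing a generated summary against a reference. As a minimal sketch (the tokenisation and the exact ROUGE variant used in the paper are assumptions here), ROUGE-1 recall can be computed as:

```python
# Minimal ROUGE-1 recall sketch: the fraction of reference unigrams that
# also appear in the candidate summary. Real evaluations typically use a
# dedicated package with stemming and multiple ROUGE variants.

def rouge1_recall(reference, candidate):
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    overlap = sum(1 for w in ref_tokens if w in cand_tokens)
    return overlap / len(ref_tokens)

ref = "the storm closed roads across the north"
cand = "storm closed many roads"
print(round(rouge1_recall(ref, cand), 3))  # → 0.429 (3 of 7 reference words)
```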
Key frame extraction for video summarization using motion activity descriptors
Summarization of a video involves providing a gist of the entire video without affecting its semantics. This has been implemented using motion activity descriptors, which capture the relative motion between consecutive frames. Correctly capturing the motion in a video leads to the identification of its key frames. The motion is obtained using block matching techniques, which are an important part of this process. Two such techniques, Diamond Search and Three Step Search, have been studied and compared. The comparison was carried out across various videos differing in category, content, and objects. It is found that there is a trade-off between the summarization factor and precision during the summarization process.
Keywords: Video Summarization, Motion Descriptors, Block Matching
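As a rough illustration of the idea behind motion-based key frame selection (a simplified stand-in, not the paper's Diamond Search or Three Step Search block matching), frames can be kept wherever the motion activity relative to the previous frame, approximated here by the mean absolute pixel difference, exceeds a threshold:

```python
# Hypothetical sketch: key-frame selection by thresholding a crude motion
# -activity measure (mean absolute difference between consecutive frames).
# Frames are plain 2D lists of grey-level intensities.

def motion_activity(prev, curr):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(a - b)
                for row_p, row_c in zip(prev, curr)
                for a, b in zip(row_p, row_c))
    return total / (len(prev) * len(prev[0]))

def select_key_frames(frames, threshold):
    """Keep frame 0 as a reference, then every frame whose motion
    w.r.t. its predecessor exceeds the threshold."""
    keys = [0]
    for i in range(1, len(frames)):
        if motion_activity(frames[i - 1], frames[i]) > threshold:
            keys.append(i)
    return keys

# Tiny synthetic "video": three static frames, then a sudden change.
static = [[10, 10], [10, 10]]
moved = [[10, 90], [90, 10]]
video = [static, static, static, moved, moved]
print(select_key_frames(video, threshold=5.0))  # → [0, 3]
```

A real implementation would operate on decoded frames (e.g. via OpenCV) and use block-level motion vectors rather than whole-frame differences.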
Content Based Video Retrieval Using Integrated Feature Extraction and Persona...
Traditional video retrieval methods fail to meet the technical challenges posed by the large and rapid growth of multimedia data, demanding effective retrieval systems. In the last decade, Content Based Video Retrieval (CBVR) has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly; therefore, a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an implementation of automated video indexing and video search in a large video database. First, we apply automatic video segmentation and key-frame detection to extract frames from the video. Next, we extract textual keywords by applying Optical Character Recognition (OCR) technology to the key-frames and Automatic Speech Recognition (ASR) to the audio track of the video. We also extract colour, texture, and edge-detector features using different methods. Finally, we integrate all the keywords and features extracted by the above techniques for searching: a similarity measure is applied to retrieve the best matching videos, which are presented as output from the database. Additionally, we provide re-ranking of the results according to the user's interest.
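The keyword-matching stage of such a pipeline can be sketched as a simple inverted index over the OCR/ASR keywords. Everything below (the corpus, the overlap-count scoring, the function names) is an illustrative assumption, not the paper's implementation:

```python
# Hypothetical sketch: an inverted keyword index over per-video keyword
# sets, queried with a simple overlap-count similarity measure.

def build_index(video_keywords):
    """Map each keyword to the set of videos containing it."""
    index = {}
    for video, words in video_keywords.items():
        for w in words:
            index.setdefault(w, set()).add(video)
    return index

def search(index, video_keywords, query):
    """Rank videos by how many query keywords they contain."""
    terms = query.lower().split()
    scores = {}
    for t in terms:
        for video in index.get(t, ()):
            scores[video] = scores.get(video, 0) + 1
    return sorted(scores, key=lambda v: (-scores[v], v))

videos = {
    "lec1": {"fourier", "transform", "signals"},
    "lec2": {"neural", "networks", "training"},
    "lec3": {"fourier", "series", "signals"},
}
idx = build_index(videos)
print(search(idx, videos, "fourier signals"))  # → ['lec1', 'lec3']
```

A full system would combine these keyword scores with the visual (colour, texture, edge) feature similarities before ranking.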
Key Frame Extraction in Video Stream using Two-Stage Method with Colour and Structure
Key frame extraction, the summarization of videos for applications such as video object recognition and classification, video retrieval and archival, and surveillance, is an active research area in computer vision. This paper describes a new criterion for well-representative key frames and, correspondingly, creates a key frame selection algorithm based on a two-stage method. The two-stage method is used to extract accurate key frames that cover the content of the whole video sequence. First, an alternative sequence is obtained based on the colour characteristic difference between adjacent frames of the original sequence. Second, by analyzing the structural characteristic difference between adjacent frames of the alternative sequence, the final key frame sequence is obtained. An optimization step is then added, based on the number of final key frames, to ensure the effectiveness of key frame extraction. Khaing Thazin Min | Wit Yee Swe | Yi Yi Aung | Khin Chan Myae Zin, "Key Frame Extraction in Video Stream using Two-Stage Method with Colour and Structure", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd27971.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-processing/27971/key-frame-extraction-in-video-stream-using-two-stage-method-with-colour-and-structure/khaing-thazin-min
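The two stages can be sketched as follows; the thresholds and the edge-count structural measure are illustrative assumptions, not the paper's exact criteria:

```python
# Hypothetical two-stage sketch: stage 1 filters frames by a colour
# (histogram) difference; stage 2 refines the survivors by a structural
# difference, proxied here by a crude edge count.

def histogram(frame, bins=4, max_val=256):
    hist = [0] * bins
    for row in frame:
        for px in row:
            hist[px * bins // max_val] += 1
    return hist

def hist_diff(f1, f2):
    return sum(abs(a - b) for a, b in zip(histogram(f1), histogram(f2)))

def edge_count(frame):
    """Structural proxy: number of strong horizontal intensity steps."""
    return sum(1 for row in frame
               for a, b in zip(row, row[1:]) if abs(a - b) > 40)

def two_stage_key_frames(frames, colour_thr=2, struct_thr=1):
    # Stage 1: colour-based alternative sequence.
    candidates = [0] + [i for i in range(1, len(frames))
                        if hist_diff(frames[i - 1], frames[i]) > colour_thr]
    # Stage 2: keep candidates whose structure also changed.
    keys = [candidates[0]]
    for prev, curr in zip(candidates, candidates[1:]):
        if abs(edge_count(frames[curr]) - edge_count(frames[prev])) >= struct_thr:
            keys.append(curr)
    return keys

flat = [[10, 10], [10, 10]]
bright = [[10, 200], [10, 200]]  # colour and structure both change
print(two_stage_key_frames([flat, flat, bright, bright]))  # → [0, 2]
```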
Video content analysis and retrieval system using video storytelling and indexing techniques
Videos are often used for communicating ideas, concepts, experiences, and situations, because of the significant advances made in video communication technology, and social media platforms have expanded video usage expeditiously. At present, a video is recognized using metadata such as its title, description, and thumbnails. There are situations where a searcher requires only a video clip on a specific topic from a long video. This paper proposes a novel methodology for the analysis of video content, using video storytelling and indexing techniques for the retrieval of the intended video clip from a long-duration video. The video storytelling technique is used for video content analysis and to produce a description of the video. The video description thus created is used to prepare an index using the wormhole algorithm, guaranteeing the search of a keyword of definite length L within the minimum worst-case time. This video index can be used by a video searching algorithm to retrieve the relevant part of the video by virtue of the frequency of the word in the keyword search of the video index. Instead of downloading and transferring a whole video, the user can download or transfer only the specifically needed video clip. The network constraints associated with the transfer of videos are thereby considerably addressed.
Decision Making Analysis of Video Streaming Algorithm for Private Cloud Computing
This study tackles the issue of how to effectively deliver video streaming content over cloud computing infrastructures. The quality of service of video streaming is strongly influenced by bandwidth, jitter, and data loss, and a number of intelligent video streaming algorithms using different techniques have been proposed to deal with these issues. This study proposes and demonstrates a novel decision making analysis which combines ISO 9126 (the international standard for software engineering) with the Analytic Hierarchy Process to help experts select the best video streaming algorithm for a private cloud computing infrastructure. The given case study concluded that the Scalable Streaming algorithm is the best algorithm to implement for delivering a high quality of service of video streaming over the private cloud computing infrastructure.
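The Analytic Hierarchy Process step can be sketched as deriving priority weights from a pairwise-comparison matrix; the judgement values below are illustrative, not taken from the study:

```python
# Hypothetical AHP sketch: priority weights for candidate streaming
# algorithms, using the common column-normalisation approximation of the
# principal eigenvector of the pairwise-comparison matrix.

def ahp_weights(matrix):
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    normalised = [[matrix[r][c] / col_sums[c] for c in range(n)]
                  for r in range(n)]
    # Each row's average over the normalised columns is its priority weight.
    return [sum(row) / n for row in normalised]

# Three candidate algorithms compared pairwise on a Saaty-style 1-9 scale.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])  # highest weight wins the selection
```

In the study's setting, the criteria weighted this way would come from the ISO 9126 quality characteristics.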
Video Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison
Key-frame extraction is one of the important steps in semantic-concept-based video indexing and retrieval, and the accuracy of video concept detection highly depends on the effectiveness of the key-frame extraction method. Therefore, extracting key-frames efficiently and effectively from video shots is considered a very challenging research problem in video retrieval systems. One of many approaches to extract key-frames from a shot is to make use of unsupervised clustering: depending on the salient content of the shot and the results of clustering, key-frames can be extracted. Usually, however, because of the visual complexity and/or the content of the video shot, we tend to get near-duplicate or repetitive key-frames with the same semantic content in the output, and hence the accuracy of key-frame extraction decreases. In an attempt to improve accuracy, we propose a novel key-frame extraction method based on unsupervised clustering and mutual comparison, in which we assign a 70% weighting to the colour component (HSV histogram) and 30% to texture (GLCM) while computing a combined frame-similarity index used for clustering. We suggest a mutual comparison of the key-frames extracted from the output of the clustering, where each key-frame is compared with every other to remove near-duplicate key-frames. The proposed algorithm is computationally simple and able to detect non-redundant and unique key-frames for the shot, as a result improving the concept detection rate. The efficiency and effectiveness are validated on open-database videos.
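The 70/30 combined similarity index and the mutual-comparison step can be sketched as below; real features would be an HSV histogram and GLCM statistics, so the short "colour" and "texture" vectors here are stand-ins:

```python
# Hypothetical sketch of the weighted frame-similarity index (70% colour,
# 30% texture) and the mutual comparison that drops near-duplicate key frames.

def similarity(f1, f2, w_colour=0.7, w_texture=0.3):
    def sim(u, v):  # normalised closeness in (0, 1]
        d = sum(abs(a - b) for a, b in zip(u, v)) / len(u)
        return 1.0 / (1.0 + d)
    return (w_colour * sim(f1["colour"], f2["colour"])
            + w_texture * sim(f1["texture"], f2["texture"]))

def remove_near_duplicates(key_frames, sim_thr=0.9):
    """Mutual comparison: drop any key frame too similar to one kept earlier."""
    kept = []
    for kf in key_frames:
        if all(similarity(kf, other) < sim_thr for other in kept):
            kept.append(kf)
    return kept

frames = [
    {"colour": [1, 0, 0, 0], "texture": [5, 5]},  # key frame A
    {"colour": [1, 0, 0, 0], "texture": [5, 5]},  # near duplicate of A
    {"colour": [0, 0, 0, 9], "texture": [1, 2]},  # distinct key frame
]
print(len(remove_near_duplicates(frames)))  # → 2
```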
Video feature extraction based on modified LLE using adaptive nearest neighbor approach
Locally linear embedding (LLE) is an unsupervised learning algorithm which computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. LLE attempts to discover non-linear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is done using modified LLE along with an adaptive nearest neighbor approach to find the nearest neighbors and the connected components. The proposed feature extraction method is applied to a video; the resulting video feature description gives a new tool for the analysis of video.
A Deterministic Eviction Model for Removing Redundancies in Video Corpus
Traditional storage approaches are being challenged by huge data volumes. In multimedia content, every file does not necessarily get tagged as an exact duplicate; rather, files are prone to editing, resulting in similar copies of the same file. This paper proposes a similarity-based deduplication approach to evict similar duplicates from archive storage, which compares samples of binary hashes to identify the duplicates. The eviction is done by initially dividing the query video into dynamic key frames based on the video length. Binary hash codes of these frames are then compared with existing key frames to identify the differences, and a similarity score determined from these differences decides the eradication strategy for the duplicate copy. Duplicate elimination goes through two levels: removal of exact duplicates and removal of similar duplicates. The proposed approach shortens the comparison window by comparing only the candidate hash codes based on the dynamic key frames, and aims at accurate, lossless duplicate removal. The presented work is executed and tested on a synthetic video dataset. Results show a reduction in redundant data and an increase in available storage space. Binary hashes and similarity scores contributed to achieving a good deduplication ratio and overall performance.
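The hash-comparison core of such a scheme can be sketched with Hamming-distance similarity over binary hash codes; the hash length and the two thresholds below are illustrative assumptions:

```python
# Hypothetical sketch: comparing binary hash codes of key frames with a
# Hamming-distance similarity score, then classifying a pair as an exact
# duplicate, a similar duplicate, or distinct.

def hamming(h1, h2):
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

def similarity_score(h1, h2):
    return 1.0 - hamming(h1, h2) / len(h1)

def classify(h1, h2, exact_thr=1.0, similar_thr=0.85):
    s = similarity_score(h1, h2)
    if s >= exact_thr:
        return "exact duplicate"
    if s >= similar_thr:
        return "similar duplicate"
    return "distinct"

a = "1010110011010010"
b = "1010110011010010"  # identical copy
c = "1010110011010110"  # lightly edited copy (1 bit differs)
d = "0101001100101101"  # unrelated video
print(classify(a, b), classify(a, c), classify(a, d))
# → exact duplicate similar duplicate distinct
```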
M.Tech Second Progress Presentation on Video Summarization
1. Second Progress Presentation on
“Video Summarization”
Presented By:
Neeraj Baghel
M. Tech.(P) (CSE) II Yr
178150005
Supervised By:
Prof. Charul Bhatnagar
Professor, Dept. of CEA
GLA University, Mathura
Dept. of Computer Engineering & Applications,
GLA University, Mathura.
October 24, 2018
3. Introduction to Video Summarization
Video
• Video data is a great asset for information extraction and knowledge discovery.
• Due to its size and variability, it is extremely hard for users to monitor. [5]
Video Summarization
• Intelligent video summarization algorithms allow us to quickly browse a lengthy video by capturing the essence and removing redundant information. [5]
Fig 1: Video Summarization Work Flow [1]
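The essence-capturing step described above is often bootstrapped with simple visual-change detection. Below is a minimal, illustrative sketch (not the method of any surveyed paper): a frame is kept as a key frame whenever its color histogram differs enough from the last kept key frame. The function names, the 8-bin quantization, and the L1 threshold are assumptions made for the example.

```python
def color_histogram(frame, bins=8):
    """Quantize a frame (a list of (r, g, b) pixels) into a normalized color histogram."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = float(len(frame)) or 1.0
    return [h / total for h in hist]

def select_keyframes(frames, threshold=0.5):
    """Keep a frame whenever its histogram differs (L1 distance) from the last keyframe."""
    keyframes = []
    last_hist = None
    for i, frame in enumerate(frames):
        hist = color_histogram(frame)
        if last_hist is None or sum(abs(a - b) for a, b in zip(hist, last_hist)) > threshold:
            keyframes.append(i)
            last_hist = hist
    return keyframes
```

For example, two identical dark frames followed by a bright frame yield key frames at indices 0 and 2; the redundant middle frame is dropped.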
4. Types of Video Summarization
Video can be summarized in two different ways, as follows.
Fig 2: Video Summarization Technique Classification [7]
6. Paper 1: TVSum: Summarizing Web Videos Using Titles [2]
Video summarization is a challenging problem in part because knowing which part of a video is important requires prior knowledge about its main topic. We present TVSum, an unsupervised video summarization framework that uses title-based image search results to find visually important shots. [2]
Authors: Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes (Yahoo Labs, New York)
IEEE Conference on Computer Vision and Pattern Recognition, 2015
Fig. 1: An illustration of title-based video summarization. [2]
7.
Objective: To find which part of a video is important, and thus "summary worthy," which requires prior knowledge about its main topic.
Proposed Method: 1) TVSum, an unsupervised video summarization framework that uses the video title to find visually important shots. 2) A co-archetypal analysis technique that learns canonical visual concepts shared between the video and images.
Dataset: 1) TVSum50 dataset; 2) SumMe dataset.
Strength: TVSum is an unsupervised video summarization framework that uses the video title to find visually important shots.
Limitation: 1) Titles are free-formed, unconstrained, and often written ambiguously. 2) Learning from all title text is difficult.
8. Paper 2: Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach [5]
One of the main obstacles to research on video summarization is user subjectivity: users have various preferences over the summaries. This subjectiveness causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to the individual users. Second, it is very challenging to evaluate the performance of a video summarizer. [5]
Authors: Aidean Sharghi (A), Jacob S. Laurel (B), and Boqing Gong (A)
(A) University of Central Florida, Orlando; (B) University of Alabama at Birmingham
IEEE Conference on Computer Vision and Pattern Recognition, 2017
Fig. 2: Comparing the semantic information captured by captions and by the concept tags we collected. [8]
9.
Objective: The main obstacle to research on video summarization is user subjectivity: users have various preferences over the summaries.
Proposed Method: A memory-network-parameterized sequential determinantal point process that attends the user query onto different video frames and shots.
Dataset: 1) UT Egocentric (UTE) dataset.
Strength: 1) Introduces user preferences in the form of text queries. 2) Collects dense per-video-shot concept annotations.
Limitation: 1) Collecting dense per-video-shot concept annotations is costly.
10. Paper 3: Query-Conditioned Three-Player Adversarial Network for Video Summarization [9]
Video summarization plays an important role in video understanding by selecting key frames/shots. Traditionally, it aims to find the most representative and diverse contents in a video as short summaries. In this paper, the authors propose a query-conditioned three-player generative adversarial network to tackle this challenge. The generator learns the joint representation of the user query and the video content, and the discriminator takes three pairs of query-conditioned summaries as the input to discriminate the real summary from a generated and a random one. [9]
Authors: Yujia Zhang (1,2), Michael Kampffmeyer (3), Xiaodan Liang (4), Min Tan (1,2), Eric P. Xing (4)
(1) Institute of Automation, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) UiT The Arctic University of Norway; (4) Carnegie Mellon University
IEEE Conference on Computer Vision and Pattern Recognition, 2018
Fig. 3: Different video summarization.
11.
Objective: To find the most representative and diverse contents in a video as short summaries.
Proposed Method: A query-conditioned three-player generative adversarial network; the generator learns the joint representation of the user query and the video content.
Dataset: 1) UT Egocentric (UTE) dataset.
Strength: Results are more accurate with respect to the user query.
Limitation: 1) Does not consider randomly generated summaries.
12. Paper 4: Hierarchical Structure-Adaptive RNN for Video Summarization [10]
Video data follow a hierarchical structure: a video is composed of shots, and a shot is composed of several frames. However, few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by trivial strategies, such as fixed-length segmentation, which may destroy the underlying hierarchical structure of video data and further reduce the quality of generated summaries. [10]
Authors: Bin Zhao (1), Xuelong Li (2), Xiaoqiang Lu (2)
(1) Northwestern Polytechnical University, Shaanxi, P. R. China; (2) Chinese Academy of Sciences, Shaanxi, P. R. China
IEEE Conference on Computer Vision and Pattern Recognition, 2018
Fig. 4: The diagram of the proposed HSA-RNN, where Layer 1 and Layer 2 are designed to exploit the video structure and generate the video summary. [10]
13.
Objective: To exploit the underlying hierarchical structure of video data and further improve the quality of generated summaries.
Proposed Method: A structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN.
Dataset: 1) SumMe dataset; 2) TVSum dataset.
Strength: 1) Uses the hierarchical structure of video data to improve the quality of generated summaries.
Limitation: 1) Results are not based on user subjectivity.
14. Paper 5: Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder [11]
Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing videos every day, yet the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has been barely touched. [11]
Authors: Yujia Zhang (A), Xiaodan Liang (B), Dingwen Zhang (C), Min Tan (A), Eric P. Xing (B)
(A) University of Chinese Academy of Sciences, Beijing, China; (B) Carnegie Mellon University, Pittsburgh, PA, USA; (C) Xidian University, Xi'an, China
IEEE Conference on Computer Vision and Pattern Recognition, 2018
Fig. 5: Different types of video summarization techniques. [11]
15.
Objective: To extract key motions of participating objects and learn to summarize in an unsupervised and online manner.
Proposed Method: A novel online motion Auto-Encoder (online motion-AE) framework that operates on super-segmented object motion clips.
Dataset: 1) OrangeVille; 2) Base jumping dataset from the public CoSum dataset.
Strength: 1) Video is summarized based on moving object instances. 2) Each moving object is tracked.
Limitation: 1) Tracking many fast-moving objects is very complex. 2) Results are not based on user subjectivity.
16. Research Gap
• Title-based video summarization, where titles are free-formed and often written ambiguously, requires unsupervised learning of the title text.
• Collecting dense per-video-shot annotations using learning algorithms.
• HSA-RNN video summarization based on user subjectivity.
• Unsupervised object-level video summarization with an online motion auto-encoder that incorporates user subjectivity.
• Finding key frames based on extracted text and assigning a weight to each frame.
17. Challenges
Some challenges related to video summarization:
• Learning all title text.
• Accuracy of object-learning algorithms.
• Assigning weights to extracted text.
• Recovering lost information.
• Computational expense.
• Evaluating the performance of a video summarizer.
• No single video summarizer fits all users.
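Evaluating a video summarizer, one of the challenges above, is commonly done by comparing the machine summary against user summaries with an F-measure over the selected frames. The sketch below is a generic version of that protocol; the exact matching rules vary per dataset, and the function name is illustrative.

```python
def summary_f_score(predicted, ground_truth):
    """F-measure between two summaries given as collections of selected frame indices."""
    pred, gt = set(predicted), set(ground_truth)
    overlap = len(pred & gt)
    if not pred or not gt or overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # fraction of selected frames that match the user's
    recall = overlap / len(gt)       # fraction of the user's frames that were recovered
    return 2 * precision * recall / (precision + recall)
```

With multiple annotators (as in SumMe), this score is typically averaged or maximized over the per-user summaries.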
19. Datasets
UT Egocentric (UTE) [5]
The dataset contains 4 videos from head-mounted cameras, each about 3-5 hours long. (Size: 1.4 GB)
SumMe [12]
The dataset consists of 25 single-shot videos ranging in length from 1-6 minutes. It contains summaries created by 15 to 18 users, with the length constraint that each summary should be 5% to 15% of the original video. (Size: 2.2 GB)
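The SumMe length constraint (summaries of at most roughly 15% of the original video) turns summary generation into a budgeted selection problem. Below is a hedged sketch of a simple greedy selector, assuming per-shot importance scores are already available from some model (the scoring model itself is not shown, and the function name is illustrative):

```python
def pick_shots(scores, lengths, budget_fraction=0.15):
    """Greedily pick the highest-scoring shots until the length budget is used up."""
    total = sum(lengths)
    budget = budget_fraction * total
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + lengths[i] <= budget:  # skip shots that would exceed the budget
            chosen.append(i)
            used += lengths[i]
    return sorted(chosen)  # return shot indices in temporal order
```

In the literature this step is often solved more exactly as a 0/1 knapsack; greedy selection is just the simplest baseline.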
20. Datasets Cont…
YouTube-8M [2]
YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs and associated labels from a diverse vocabulary of 4700+ visual entities.
• Each video must be public and have at least 1000 views.
• Each video must be between 120 and 500 seconds long.
• Each video must be associated with at least one entity from the target vocabulary.
• Adult & sensitive content is removed (as determined by automated classifiers).
May 2018 version (current): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B audio-visual features.
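The inclusion rules above can be restated as a simple filter. This is only an illustrative restatement of the published criteria, not code from the YouTube-8M project, and the function name and parameters are assumptions:

```python
def eligible_for_youtube8m(duration_s, views, entities, is_sensitive):
    """Check a video against a simplified version of the YouTube-8M inclusion rules."""
    return (views >= 1000                  # at least 1000 public views
            and 120 <= duration_s <= 500   # between 120 and 500 seconds long
            and len(entities) >= 1         # at least one target-vocabulary entity
            and not is_sensitive)          # adult & sensitive content removed
```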
21. Tools
Matlab
Matlab is a commercial product that is widely used in the image and video processing community. It has an adequate image processing toolbox, as well as toolboxes for Kalman filters, neural networks, genetic algorithms, and so on. It runs on most Unix systems, including Linux, and on Windows 95/NT. For researchers developing vision algorithms, the lack of source code is a serious drawback.
OpenCV
OpenCV is a library of programming functions mainly aimed at real-time computer vision, originally developed by Intel. The library is cross-platform and free for use under the open-source BSD license.
22. Tools Cont…
Python
Python is an interpreted, high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.
23. Conclusion
Text retrieval can be used to assign a weight to each frame, and that weight can serve as an additional feature for generating the video summary.
24. References
1. https://www.slideshare.net/MikolajLeszczuk/results-on-video-summarization (D.L.V 01/09/18)
2. Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, "TVSum: Summarizing web videos using titles," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 5179-5187, 2015.
3. Y. Zhuang, R. Xiao, and F. Wu, "Key issues in video summarization and its application," in Proc. 2003 Joint Conf. of the 4th Int. Conf. on Information, Communications and Signal Processing and the 4th Pacific Rim Conf. on Multimedia, vol. 1, pp. 448-452, IEEE, 2003.
4. R. Kansagara, D. Thakore, and M. Joshi, "A study on video summarization techniques," International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, 2014.
5. A. Sharghi, J. S. Laurel, and B. Gong, "Query-focused video summarization: Dataset, evaluation, and a memory network based approach," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2127-2136, 2017.
25. References Cont…
6. P. Mundur, Y. Rao, and Y. Yesha, "Keyframe-based video summarization using Delaunay clustering," International Journal on Digital Libraries, vol. 6, no. 2, pp. 219-232, 2006.
7. M. Padmavathi, Y. Rao, and Y. Yesha, "Keyframe-based video summarization using Delaunay clustering," International Journal on Digital Libraries, vol. 6, no. 2, pp. 219-232, 2006.
8. S. Yeung, A. Fathi, and L. Fei-Fei, "VideoSET: Video summary evaluation through text," arXiv preprint arXiv:1406.5824, 2014.
9. Y. Zhang, M. Kampffmeyer, X. Liang, M. Tan, and E. P. Xing, "Query-conditioned three-player adversarial network for video summarization," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018.
10. B. Zhao, X. Li, and X. Lu, "HSA-RNN: Hierarchical structure-adaptive RNN for video summarization," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 7405-7414, 2018.
11. Y. Zhang, X. Liang, D. Zhang, M. Tan, and E. P. Xing, "Unsupervised object-level video summarization with online motion auto-encoder," arXiv preprint arXiv:1801.00543, 2018.
12. M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, "Creating summaries from user videos," in Proc. European Conference on Computer Vision (ECCV), pp. 505-520, Springer, Cham, 2014.