Due to recent advances in multimedia technology, various digital video contents have become available from different multimedia sources. Efficient management, storage, coding, and indexing of video are required because video contains a great deal of visual information and requires a large amount of memory. This paper proposes an efficient video indexing method for video with rapid motion or fast illumination change, in which motion information and feature points of specific objects are used. For accurate shot boundary detection, we use two steps: a block matching algorithm to obtain accurate motion information, and a modified displaced frame difference to compensate for the error in existing methods. We also propose an object matching algorithm based on the scale invariant feature transform, which uses feature points to group shots semantically. Computer simulation with five fast-motion videos shows the effectiveness of the proposed video indexing method.
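The shot-boundary step above rests on classical block matching. As a hedged illustration only (an exhaustive search with a sum-of-absolute-differences criterion; the block size, search range, and toy frames are our own choices, not the authors' settings), a minimal motion-estimation sketch:

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def crop(img, y, x, size):
    return [row[x:x + size] for row in img[y:y + size]]

def block_match(prev, curr, size=4, search=2):
    """For each size x size block of `curr`, find the displacement (dy, dx)
    within +/-search whose block in `prev` minimises the SAD."""
    h, w = len(curr), len(curr[0])
    vectors = {}
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            target = crop(curr, by, bx, size)
            best, best_cost = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - size and 0 <= x <= w - size:
                        cost = sad(target, crop(prev, y, x, size))
                        if cost < best_cost:
                            best_cost, best = cost, (dy, dx)
            vectors[(by, bx)] = best
    return vectors

# Toy frames: a bright 2x2 patch moves one pixel to the right.
prev = [[0] * 8 for _ in range(4)]
curr = [[0] * 8 for _ in range(4)]
for r in (1, 2):
    prev[r][5] = prev[r][6] = 255      # patch in the previous frame
    curr[r][6] = curr[r][7] = 255      # same patch, shifted one pixel right
vectors = block_match(prev, curr)
# The right-hand block's content came from one pixel to the left: (0, -1).
```

A displaced frame difference would then compare each block against its motion-compensated position rather than the same position, so that camera or object motion alone does not trigger a false shot boundary.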
Content Based Video Retrieval Using Integrated Feature Extraction and Persona... (IJERD Editor)
Traditional video retrieval methods fail to meet the technical challenges posed by the large and rapid growth of multimedia data, which demands effective retrieval systems. In the last decade, Content Based Video Retrieval (CBVR) has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly; therefore, a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an implementation of automated video indexing and video search in a large video database. First, we apply automatic video segmentation and key-frame detection to extract representative frames from the video. Next, we extract textual keywords by applying Optical Character Recognition (OCR) to the key-frames and Automatic Speech Recognition (ASR) to the audio track of the video. We also extract colour, texture, and edge features using different methods. Finally, we integrate all the keywords and features extracted by the above techniques for searching: a similarity measure is applied to retrieve the best-matching videos, which are presented as output from the database. Additionally, we provide re-ranking of the results according to the user's interest in the original result.
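The integration-and-ranking step can be sketched in a few lines. This is a hedged toy version, not the paper's implementation: the filenames, feature weights, and the choice of Jaccard overlap for keywords and histogram intersection for colour are all our own assumptions.

```python
def keyword_score(query_words, video_words):
    """Jaccard overlap between query keywords and a video's OCR/ASR keywords."""
    q, v = set(query_words), set(video_words)
    return len(q & v) / len(q | v) if q | v else 0.0

def histogram_score(h1, h2):
    """Histogram intersection of two normalised colour histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def search(query_words, query_hist, index, w_text=0.7, w_colour=0.3):
    """Rank videos by a weighted combination of keyword and colour similarity."""
    scored = [
        (w_text * keyword_score(query_words, meta["keywords"])
         + w_colour * histogram_score(query_hist, meta["hist"]), name)
        for name, meta in index.items()
    ]
    return [name for score, name in sorted(scored, reverse=True)]

# Hypothetical index built from OCR/ASR keywords and key-frame colour histograms.
index = {
    "lecture_dsp.mp4": {"keywords": ["fourier", "filter", "signal"],
                        "hist": [0.2, 0.5, 0.3]},
    "lecture_ml.mp4":  {"keywords": ["gradient", "loss", "network"],
                        "hist": [0.6, 0.2, 0.2]},
}
ranking = search(["fourier", "signal"], [0.2, 0.5, 0.3], index)
```

Re-ranking by user interest would amount to re-running the sort after boosting the scores of videos the user interacted with.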
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU... (sipij)
Long range infrared videos, such as the Defense Systems Information Analysis Center (DSIAC) videos, usually do not have high resolution. In recent years, there have been significant advancements in video super-resolution algorithms. Here, we summarize our study on the use of super-resolution videos for target detection and classification. We observed that super-resolution videos can significantly improve detection and classification performance. For example, for 3000 m range videos, we were able to improve the average precision of target detection from 11% (without super-resolution) to 44% (with 4x super-resolution) and the overall accuracy of target classification from 10% (without super-resolution) to 44% (with 2x super-resolution).
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) is an open access international journal that provides rapid publication (within a month) of articles in all areas of electronics and communication engineering and its applications. The journal welcomes publication of high quality papers on theoretical developments and practical applications in electronics and communication engineering. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publication.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
Performance Comparison of Digital Image Watermarking Techniques: A Survey (Editor IJCATR)
Digital watermarking is the process of embedding information into a digital signal. A watermark is a secondary image, which is overlaid on the host image and provides a means of protecting the image. In order to produce a high quality watermarked image, the watermark should be imperceptible. This paper presents different techniques of digital image watermarking based on the spatial and frequency domains, and shows that the spatial domain technique provides security, successful recovery of the watermark image, and a higher PSNR value compared to the frequency domain.
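The simplest spatial-domain technique of the kind surveyed is least-significant-bit (LSB) embedding, evaluated with PSNR. A minimal sketch (a generic LSB scheme with toy 2x2 images, not any specific scheme from the survey):

```python
import math

def embed_lsb(host, mark):
    """Embed a binary watermark into the least significant bit of each pixel."""
    return [[(p & ~1) | b for p, b in zip(hr, mr)]
            for hr, mr in zip(host, mark)]

def extract_lsb(marked):
    """Recover the watermark bits from the LSB plane."""
    return [[p & 1 for p in row] for row in marked]

def psnr(orig, marked):
    """Peak signal-to-noise ratio (dB) between original and watermarked images."""
    n = sum(len(r) for r in orig)
    mse = sum((a - b) ** 2 for ro, rm in zip(orig, marked)
              for a, b in zip(ro, rm)) / n
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

host = [[120, 121], [200, 201]]   # toy 8-bit host image
mark = [[1, 0], [0, 1]]           # binary watermark
marked = embed_lsb(host, mark)
```

Because each pixel changes by at most one grey level, the distortion is imperceptible and the PSNR is high, which matches the spatial-domain advantage the abstract reports.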
AN EMERGING TREND OF FEATURE EXTRACTION METHOD IN VIDEO PROCESSING (cscpconf)
Recent progress in technology and flourishing applications open up new prospects and challenges for the image and video processing community. Compared to still images, video sequences afford more information about how objects and scenarios change over time. The quality of a video is very significant before applying any kind of processing technique to it. This paper deals with two major problems in video processing: noise reduction and object segmentation on video frames. Foreground-based segmentation and fuzzy c-means clustering segmentation are compared with the proposed method, improvised fuzzy c-means segmentation based on colour. This was applied to the video frames to segment the various objects in the current frame. The proposed technique is a powerful method for image segmentation, and it works for both single- and multiple-feature data with spatial information. Experiments were conducted using various noises and filtering methods to show which is best suited among them, and the proposed segmentation approach generates good quality segmented frames.
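For reference, the baseline the paper improvises on, standard fuzzy c-means, can be sketched on one colour channel. This is a hedged textbook version (fuzzifier m = 2, quantile initialisation, toy intensities), not the paper's colour-based variant:

```python
def fuzzy_c_means(xs, c=2, m=2.0, iters=30):
    """1-D fuzzy c-means: returns cluster centres and the membership matrix.
    u[i][k] is the degree to which sample k belongs to cluster i."""
    srt = sorted(xs)
    # Spread the initial centres across the value range.
    centres = [srt[k * (len(xs) - 1) // (c - 1)] for k in range(c)]
    for _ in range(iters):
        u = []
        for i in range(c):
            row = []
            for x in xs:
                d_i = abs(x - centres[i]) or 1e-9
                denom = sum((d_i / (abs(x - centres[j]) or 1e-9)) ** (2 / (m - 1))
                            for j in range(c))
                row.append(1.0 / denom)
            u.append(row)
        # Centres are membership-weighted means of the samples.
        centres = [sum(u[i][k] ** m * xs[k] for k in range(len(xs)))
                   / sum(u[i][k] ** m for k in range(len(xs)))
                   for i in range(c)]
    return centres, u

# Pixel intensities from a frame with a dark background and a bright object.
pixels = [10, 12, 11, 200, 205, 198]
centres, u = fuzzy_c_means(pixels, c=2)
```

The soft memberships u[i][k] are what distinguish fuzzy c-means from hard k-means: boundary pixels can belong partially to both the object and the background.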
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering, Science and Technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Faro Visual Attention For Implicit Relevance Feedback In A Content Based Imag... (Kalle)
In this paper we propose an implicit relevance feedback method with the aim of improving the performance of known Content Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users' eye gaze data. This represents a new mechanism for implicit relevance feedback: usually the sources taken into account for image retrieval are based on the natural behaviour of the user in his/her environment, estimated by analysing mouse and keyboard interactions. In detail, after retrieving images by querying CBIRs with a keyword, our system computes the most salient regions (where users look with greater interest) of the retrieved images by gathering data from an unobtrusive eye tracker, such as the Tobii T60. According to the features, in terms of colour and texture, of these relevant regions, our system is able to re-rank the images initially retrieved by the CBIR. A performance evaluation, carried out on a set of 30 users using Google Images and keywords like "pyramid", shows that about 87% of the users are more satisfied with the output images when the re-ranking is applied.
VIDEO SEGMENTATION FOR MOVING OBJECT DETECTION USING LOCAL CHANGE & ENTROPY B... (csandit)
Motion detection and object segmentation are important research areas in image and video processing and computer vision. The techniques and mathematical modelling used to detect and segment region of interest (ROI) objects comprise the algorithmic modules of various high-level techniques in video analysis, object extraction, classification, and recognition. The detection of moving objects is significant in many tasks, such as video surveillance and moving object tracking. The design of a video surveillance system is directed at the automatic identification of events of interest, especially the tracking and classification of moving objects. An entropy based real-time adaptive non-parametric window thresholding algorithm for change detection is proposed in this research. Based on an estimate of the scatter of regions of change in a difference image, a threshold for every image block is calculated discriminatively using an entropy measure, and the global threshold is then attained by averaging the thresholds of all image blocks of the frame. The block threshold is calculated differently for regions of change and for the background. Experimental results show the proposed thresholding algorithm performs well for change detection with high efficiency.
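The per-block-entropy-then-average structure can be illustrated as follows. This is a loose sketch under stated assumptions: the paper's exact entropy formulation and scatter estimate are not reproduced here, and the block size, scale factor, and toy difference image are our own choices.

```python
import math

def entropy(block):
    """Shannon entropy of the pixel-value distribution in a block."""
    counts, n = {}, 0
    for row in block:
        for p in row:
            counts[p] = counts.get(p, 0) + 1
            n += 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def block_thresholds(diff, size=2, scale=16.0):
    """Give each size x size block of the absolute difference image a
    threshold proportional to its entropy: busy (changed) blocks get a
    higher threshold than flat background blocks."""
    h, w = len(diff), len(diff[0])
    thr = []
    for by in range(0, h, size):
        for bx in range(0, w, size):
            blk = [row[bx:bx + size] for row in diff[by:by + size]]
            thr.append(scale * entropy(blk))
    return thr

def change_mask(diff, size=2):
    """Global threshold = average of all block thresholds."""
    thr = block_thresholds(diff, size)
    t = sum(thr) / len(thr)
    return [[1 if p > t else 0 for p in row] for row in diff]

# Toy absolute difference image: a moving object in the lower-middle region.
diff = [
    [0,   0,   0, 0],
    [0,   0,   0, 0],
    [0, 200, 210, 0],
    [0, 190, 205, 0],
]
mask = change_mask(diff)
```

Because the background blocks contribute zero-entropy (hence zero) thresholds, averaging pulls the global threshold low enough to keep genuine changes while still rejecting flat regions.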
Video Feature Extraction Based on Modified LLE Using Ada... (INFOGAIN PUBLICATION)
Locally linear embedding (LLE) is an unsupervised learning algorithm which computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. LLE attempts to discover non-linear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is done using a modified LLE along with an adaptive nearest neighbor approach to find the nearest neighbors and the connected components. The proposed feature extraction method is applied to a video. The resulting video feature description gives a new tool for the analysis of video.
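The modified-LLE details are not reproduced here, but the neighborhood stage the abstract mentions, building a nearest-neighbor graph and finding its connected components, can be sketched directly (k, the distance metric, and the toy feature vectors are our own assumptions):

```python
def knn_graph(points, k=2):
    """Undirected k-nearest-neighbour graph over feature vectors
    (Euclidean distance), stored as an adjacency list."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in nearest[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise so the graph is undirected
    return adj

def connected_components(adj):
    """Depth-first search over the adjacency list."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Two well-separated groups of frame features form two components.
feats = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
comps = connected_components(knn_graph(feats, k=2))
```

LLE would then solve for reconstruction weights within each neighbourhood and embed each connected component separately, which is why the component structure matters.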
IJRET: International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A Survey on Multimedia Content Protection Mechanisms (IJECEIAES)
Cloud computing has emerged to influence how multimedia content providers like Disney render their multimedia services. When content providers use the public cloud, there is a chance of pirated copies appearing, leading to a loss in revenue. At the same time, technological advancements in content recording and hosting have made it easy to duplicate genuine multimedia objects. This problem has grown with the increased usage of cloud platforms for rendering multimedia content to users across the globe. Therefore, it is essential to have mechanisms to detect video copies, discover copyright infringement of multimedia content, and protect the interests of genuine content providers. It is a challenging and computationally expensive problem to address, considering the exponential growth of multimedia content over the internet. In this paper, we survey multimedia content protection mechanisms, throwing light on different kinds of multimedia, multimedia content modification methods, and techniques to protect intellectual property from abuse and copyright infringement. We also focus on the challenges involved in protecting multimedia content and the research gaps in the area of cloud-based multimedia content protection.
Digital Enhancement of Indian Manuscript, Yashodhar Charitra (csandit)
Over the years, many of our ancient manuscripts have been damaged by natural elements or
intentionally erased and re-used to record other information. While manual preservation
techniques are being carried out for the conservation of our ancient texts, digital image
processing is an alternative for the archival storage of the invaluable text contained in them. To
successfully recover text from such documents, it is important to understand the nature of the
writing and materials on which they are written. Different imaging and processing techniques
are needed, depending on the condition of the manuscript. In recent years, modern imaging
techniques have been applied to ancient manuscripts to recover writings that are not visible to
the naked eye or not recognizable due to various factors. In this paper, we apply imaging
techniques on an ancient manuscript, Yashodhar Charitra, and restore it digitally.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING (csandit)
Biometric identification using multiple modalities has attracted the attention of many researchers, as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising autoencoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and 97.14% rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips.
The objective of a video communication system is to deliver the maximum amount of video data from the source to the destination through a communication channel using all of its available bandwidth. To achieve this objective, the source coder should compress the original video sequence as much as possible, and the compressed video data should be robust and resilient to channel errors. However, while achieving high coding efficiency, compression also makes the coded video bitstream vulnerable to transmission errors; thus, the process of video data compression tends to work against the objectives of robustness and error resilience. Moreover, the extra information that needs to be transmitted for 3-D video brings a new challenge, and consumer 3-D applications will not gain more popularity unless these 3-D video coding problems are addressed.
Visual Hull Construction from Semitransparent Coloured Silhouettes (ijcga)
This paper attempts to create coloured semi-transparent shadow images that can be projected onto multiple screens simultaneously from different viewpoints. The inputs to this approach are a set of coloured shadow images and view angles, projection information and light configurations for the final projections. We propose a method to convert coloured semi-transparent shadow images to a 3D visual hull. A shadowpix type method is used to incorporate varying ratio RGB values for each voxel. This computes the desired image independently for each viewpoint from an arbitrary angle. An attenuation factor is used to curb the coloured shadow images beyond a certain distance. The end result is a continuous animated image that changes due to the rotated projection of the transparent visual hull.
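The visual hull idea can be illustrated with plain binary voxel carving. This sketch deliberately omits the paper's distinguishing features (coloured semi-transparency, per-voxel RGB ratios, and the attenuation factor) and assumes toy axis-aligned orthographic silhouettes:

```python
def carve(grid_size, silhouettes):
    """Carve a cubic voxel grid against axis-aligned binary silhouettes.
    `silhouettes` maps an axis (0, 1 or 2) to a 2-D mask; a voxel survives
    only if its orthographic projection along every given axis lands
    inside that axis's mask."""
    hull = set()
    n = grid_size
    for x in range(n):
        for y in range(n):
            for z in range(n):
                coords = (x, y, z)
                inside = True
                for axis, mask in silhouettes.items():
                    # Drop the coordinate along the viewing axis.
                    u, v = [c for i, c in enumerate(coords) if i != axis]
                    if not mask[u][v]:
                        inside = False
                        break
                if inside:
                    hull.add(coords)
    return hull

# Two orthogonal 2x2 silhouettes that agree on a single corner voxel.
sil_x = [[1, 0], [0, 0]]   # view along x: occupied at (y, z) = (0, 0)
sil_y = [[1, 0], [0, 0]]   # view along y: occupied at (x, z) = (0, 0)
hull = carve(2, {0: sil_x, 1: sil_y})
```

The paper's method would additionally store an RGB ratio per surviving voxel so that each projection reproduces its coloured semi-transparent shadow image.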
Terrain generation finds many applications such as in CGI movies, animations and video games. This paper describes a new and simple-to-implement terrain generator called the Uplift Model. It is based on the theory of crustal deformations by uplifts in Geology. When a number of uplifts are made on the Earth's surface, the final net effect is an average of the influence of each uplift at each point on the terrain. The result of applying this model from Nature is a very realistic looking effect in the generated terrains. The model uses 6 parameters which allow for a great variety in landscape types produced. Comparisons are made with other existing terrain generation algorithms. Averaging causes erosion of the surface whereas fractal surfaces tend to be very jagged and more suited to alien worlds.
Ghost Free Image Using Blur and Noise Estimation (ijcga)
This paper presents an efficient image enhancement method by fusion of two different exposure images in
low-light condition. We use two degraded images with different exposures: one is a long-exposure image
that preserves the brightness but contains blur and the other is a short-exposure image that contains a lot of
noise but preserves object boundaries. The weight map used for image fusion without artifacts of blur and
noise is computed using blur and noise estimation. To get a blur map, edges in a long-exposure image are
detected at multiple scales and the amount of blur is estimated at detected edges. Also, we can get a noise
map by noise estimation using a denoised short-exposure image. Ghost effect between two successive
images is avoided according to the moving object map that is generated by a sigmoid comparison function
based on the ratio of two input images. We can get result images by fusion of two degraded images using
the weight maps. The proposed method can be extended to high dynamic range imaging without using
information of a camera response function or generating a radiance map. Experimental results with various
sets of images show the effectiveness of the proposed method in enhancing details and removing ghost
artifacts.
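The ghost-suppression step above hinges on the sigmoid comparison of the exposure ratio. A minimal sketch (the steepness k, the offset t, the toy pixel values, and the simple weighted fusion are our own assumptions, not the paper's calibrated choices):

```python
import math

def moving_object_map(long_exp, short_exp, k=10.0, t=0.5):
    """Per-pixel ghost map: a sigmoid of how far the ratio of the two
    exposures deviates from 1 flags pixels where the scene moved."""
    mmap = []
    for row_l, row_s in zip(long_exp, short_exp):
        row = []
        for a, b in zip(row_l, row_s):
            ratio = (a + 1e-6) / (b + 1e-6)
            deviation = abs(ratio - 1.0)
            row.append(1.0 / (1.0 + math.exp(-k * (deviation - t))))
        mmap.append(row)
    return mmap

def fuse(long_exp, short_exp, weight):
    """Weighted per-pixel fusion: weight -> 1 prefers the short exposure."""
    return [[(1 - w) * a + w * b for a, w, b in zip(rl, rw, rs)]
            for rl, rw, rs in zip(long_exp, weight, short_exp)]

# Normalised intensities; an object moved at pixel (1, 1) between shots.
long_exp  = [[0.8, 0.8], [0.8, 0.2]]   # blurred but bright
short_exp = [[0.8, 0.8], [0.8, 0.8]]   # noisy but sharp
ghost = moving_object_map(long_exp, short_exp)
fused = fuse(long_exp, short_exp, ghost)
```

Where the two exposures agree, the map stays near 0 and the blur/noise weight maps decide the blend; where they disagree, the map rises towards 1 and the fusion falls back on one exposure, avoiding the ghost.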
IMAGE QUALITY ASSESSMENT OF TONE MAPPED IMAGES (ijcga)
This paper proposes an objective assessment method for perceptual image quality of tone mapped images. Tone mapping algorithms are used to display high dynamic range (HDR) images on standard display devices that have low dynamic range (LDR). The proposed method implements visual attention to define perceived structural distortion regions in LDR images, so that a reasonable measurement of distortion between HDR and LDR images can be performed. Since the human visual system is sensitive to structural information, quality metrics that can measure structural similarity between HDR and LDR images are used. Experimental results with a number of HDR and tone mapped image pairs show the effectiveness of the proposed method.
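The structural-similarity metrics the abstract refers to follow the SSIM formula. A hedged sketch of the single-window (global) form, without the sliding window, Gaussian weighting, or the paper's visual-attention weighting:

```python
def ssim(img1, img2, L=255):
    """Global structural similarity between two equal-size greyscale images.
    C1 and C2 are the standard stabilising constants (0.01*L)^2 and (0.03*L)^2."""
    xs = [p for row in img1 for p in row]
    ys = [p for row in img2 for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img  = [[10, 200], [30, 180]]     # toy reference patch
flat = [[100, 100], [100, 100]]   # structure destroyed, mean preserved-ish
```

An identical pair scores 1; flattening the patch destroys variance and covariance, so the score collapses, which is exactly the sensitivity to structural information the method exploits.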
MESH SIMPLIFICATION VIA A VOLUME COST MEASURE (ijcga)
We develop a polygonal mesh simplification algorithm based on a novel analysis of the mesh geometry. Particularly, we propose first a characterization of vertices as hyperbolic or non-hyperbolic depending upon their discrete local geometry. Subsequently, the simplification process computes a volume cost for each non-hyperbolic vertex, in analogy with spherical volume, to capture the loss of fidelity if that vertex is decimated. Vertices of least volume cost are then successively deleted and the resulting holes re-triangulated using a method based on a novel heuristic. Preliminary experiments indicate a performance comparable to that of the best known mesh simplification algorithms.
AN EMERGING TREND OF FEATURE EXTRACTION METHOD IN VIDEO PROCESSINGcscpconf
Recently the progress in technology and flourishing applications open up new forecast and defy
for the image and video processing community. Compared to still images, video sequences
afford more information about how objects and scenarios change over time. Quality of video is
very significant before applying it to any kind of processing techniques. This paper deals with
two major problems in video processing they are noise reduction and object segmentation on
video frames. The segmentation of objects is performed using foreground segmentation based
and fuzzy c-means clustering segmentation is compared with the proposed method Improvised
fuzzy c – means segmentation based on color. This was applied in the video frame to segment
various objects in the current frame. The proposed technique is a powerful method for image
segmentation and it works for both single and multiple feature data with spatial information.
The experimental result was conducted using various noises and filtering methods to show which is best suited among others and the proposed segmentation approach generates good quality segmented frames.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Faro Visual Attention For Implicit Relevance Feedback In A Content Based Imag...Kalle
In this paper we propose an implicit relevance feedback method with the aim to improve the performance of known Content Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users’ eye gaze data. This represents a new mechanism for implicit relevance feedback, in fact usually the sources taken into account for image retrieval are based on the natural behavior of the user in his/her environment estimated by analyzing mouse and keyboard interactions. In detail, after the retrieval of the images by querying CBIRs with a keyword, our system computes the most salient regions (where users look with a greater interest) of the retrieved images by gathering data from an unobtrusive eye tracker, such as Tobii T60. According to the features, in terms of color, texture, of these relevant regions our system is able to re-rank the images, initially, retrieved by the CBIR. Performance evaluation, carried out on a set of 30 users by using Google Images and “pyramid” like keyword, shows that about the 87% of the users is more satisfied of the output images when the re-raking is applied.
VIDEO SEGMENTATION FOR MOVING OBJECT DETECTION USING LOCAL CHANGE & ENTROPY B...csandit
Motion detection and object segmentation are an important research area of image-video
processing and computer vision. The technique and mathematical modeling used to detect and
segment region of interest (ROI) objects comprise the algorithmic modules of various high-level
techniques in video analysis, object extraction, classification, and recognition. The detection of
moving object is significant in many tasks, such as video surveillance & moving object tracking.
The design of a video surveillance system is directed on involuntary identification of events of
interest, especially on tracking and on classification of moving objects. An entropy based realtime
adaptive non-parametric window thresholding algorithm for change detection is
anticipated in this research. Based on the approximation of the value of scatter of sections of
change in a difference image, a threshold of every image block is calculated discriminatively
using entropy structure, and then the global threshold is attained by averaging all thresholds for
image blocks of the frame. The block threshold is calculated contrarily for regions of change
and background. Investigational results show the proposed thresholding algorithm
accomplishes well for change detection with high efficiency.
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...INFOGAIN PUBLICATION
Locally linear embedding (LLE) is an unsupervised learning algorithm which computes the low dimensional, neighborhood preserving embeddings of high dimensional data. LLE attempts to discover non-linear structure in high dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is done using modified LLE alongwith adaptive nearest neighbor approach to find the nearest neighbor and the connected components. The proposed feature extraction method is applied to a video. The video feature description gives a new tool for analysis of video.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
A Survey on Multimedia Content Protection Mechanisms IJECEIAES
Cloud computing has emerged to influence multimedia content providers like Disney to render their multimedia services. When content providers use the public cloud, there are chances to have pirated copies further leading to a loss in revenues. At the same time, technological advancements regarding content recording and hosting made it easy to duplicate genuine multimedia objects. This problem has increased with increased usage of a cloud platform for rendering multimedia content to users across the globe. Therefore it is essential to have mechanisms to detect video copy, discover copyright infringement of multimedia content and protect the interests of genuine content providers. It is a challenging and computationally expensive problem to be addressed considering the exponential growth of multimedia content over the internet. In this paper, we surveyed multimedia-content protection mechanisms which throw light on different kinds of multimedia, multimedia content modification methods, and techniques to protect intellectual property from abuse and copyright infringement. It also focuses on challenges involved in protecting multimedia content and the research gaps in the area of cloud-based multimedia content protection.
Digital enhancement of indian manuscript, yashodhar charitracsandit
Over the years, many of our ancient manuscripts have been damaged by natural elements or
intentionally erased and re-used to record other information. While manual preservation
techniques are being carried out for the conservation of our ancient texts, digital image
processing is an alternative for the archival storage of the invaluable text contained in them. To
successfully recover text from such documents, it is important to understand the nature of the
writing and materials on which they are written. Different imaging and processing techniques
are needed, depending on the condition of the manuscript. In recent years, modern imaging
techniques have been applied to ancient manuscripts to recover writings that are not visible to
the naked eye or not recognizable due to various factors. In this paper, we apply imaging
techniques on an ancient manuscript, Yashodhar Charitra, and restore it digitally.
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNINGcsandit
Biometric identification using multiple modalities has attracted the attention of many researchers as it produces more robust and trustworthy results than single-modality biometrics.
In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing different modalities, i.e., left ear, left profile face, frontal face, right profile face, and right ear, present in the facial video clips, we train supervised denoising autoencoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and 97.14% rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips.
The objective of a video communication system is to deliver the maximum amount of video data from the source to the destination through a communication channel using all of its available bandwidth. To achieve this objective, the source coding should compress the original video sequence as much as possible, and the compressed video data should be robust and resilient to channel errors. However, while achieving high coding efficiency, compression also makes the coded video bitstream vulnerable to transmission errors. Thus, the process of video data compression tends to work against the objectives of robustness and resilience to errors. Moreover, the extra information that needs to be transmitted in 3-D video has brought new challenges, and consumer applications will not gain popularity unless the 3-D video coding problems are addressed.
Visual hull construction from semitransparent coloured silhouettesijcga
This paper attempts to create coloured semi-transparent shadow images that can be projected onto multiple screens simultaneously from different viewpoints. The inputs to this approach are a set of coloured shadow images and view angles, projection information and light configurations for the final projections. We propose a method to convert coloured semi-transparent shadow images to a 3D visual hull. A shadowpix-type method is used to incorporate varying-ratio RGB values for each voxel. This computes the desired image independently for each viewpoint from an arbitrary angle. An attenuation factor is used to curb the coloured shadow images beyond a certain distance. The end result is a continuous animated image that changes due to the rotated projection of the transparent visual hull.
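The voxel-carving step underlying a visual hull can be sketched in a few lines. This is an illustrative toy version, not the paper's implementation: it assumes a caller-supplied `project` camera model, represents silhouettes as pixel sets, and ignores colour and attenuation entirely.

```python
def carve_visual_hull(grid_size, silhouettes, project):
    """Carve a cubic voxel grid against a set of silhouettes.

    silhouettes: list of sets of 2D pixel coordinates, one per view.
    project: assumed function (view_index, x, y, z) -> 2D pixel.
    """
    hull = []
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                # A voxel survives only if every view's silhouette
                # covers its projection.
                if all(project(v, x, y, z) in sil
                       for v, sil in enumerate(silhouettes)):
                    hull.append((x, y, z))
    return hull
```

With two orthographic views (one along z, one along x), only voxels consistent with both silhouettes survive the carving.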
Terrain generation finds many applications such as in CGI movies, animations and video games. This paper describes a new and simple-to-implement terrain generator called the Uplift Model. It is based on the theory of crustal deformations by uplifts in Geology. When a number of uplifts are made on the Earth's surface, the final net effect is an average of the influence of each uplift at each point on the terrain. The result of applying this model from Nature is a very realistic-looking effect in the generated terrains. The model uses 6 parameters which allow for a great variety in landscape types produced. Comparisons are made with other existing terrain generation algorithms. Averaging causes erosion of the surface, whereas fractal surfaces tend to be very jagged and more suited to alien worlds.
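The averaging idea can be made concrete with a short sketch. The radial-cosine bump shape and the parameter ranges below are assumptions for illustration; the abstract does not specify the model's actual six parameters.

```python
import math
import random

def generate_terrain(width, height, n_uplifts, max_radius, max_height, seed=0):
    """Toy uplift-averaging terrain: each uplift contributes a radial
    cosine bump, and the height at each point is the average of all
    uplift influences (the averaging produces the smoothed, eroded look)."""
    rng = random.Random(seed)
    uplifts = [(rng.uniform(0, width - 1), rng.uniform(0, height - 1),
                rng.uniform(1.0, max_radius), rng.uniform(1.0, max_height))
               for _ in range(n_uplifts)]
    terrain = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for ux, uy, r, h in uplifts:
                d = math.hypot(x - ux, y - uy)
                if d < r:
                    # Cosine falloff from the uplift centre to its radius
                    total += h * 0.5 * (1.0 + math.cos(math.pi * d / r))
            terrain[y][x] = total / n_uplifts
    return terrain
```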
Ghost free image using blur and noise estimationijcga
This paper presents an efficient image enhancement method by fusion of two different exposure images in
low-light condition. We use two degraded images with different exposures: one is a long-exposure image
that preserves the brightness but contains blur and the other is a short-exposure image that contains a lot of
noise but preserves object boundaries. The weight map used for image fusion without artifacts of blur and
noise is computed using blur and noise estimation. To get a blur map, edges in a long-exposure image are
detected at multiple scales and the amount of blur is estimated at detected edges. Also, we can get a noise
map by noise estimation using a denoised short-exposure image. Ghost effect between two successive
images is avoided according to the moving object map that is generated by a sigmoid comparison function
based on the ratio of two input images. We can get result images by fusion of two degraded images using
the weight maps. The proposed method can be extended to high dynamic range imaging without using
information of a camera response function or generating a radiance map. Experimental results with various
sets of images show the effectiveness of the proposed method in enhancing details and removing ghost
artifacts.
IMAGE QUALITY ASSESSMENT OF TONE MAPPED IMAGESijcga
This paper proposes an objective assessment method for perceptual image quality of tone mapped images. Tone mapping algorithms are used to display high dynamic range (HDR) images on standard display devices that have low dynamic range (LDR). The proposed method implements visual attention to define perceived structural distortion regions in LDR images, so that a reasonable measurement of distortion between HDR and LDR images can be performed. Since the human visual system is sensitive to structural information, quality metrics that can measure structural similarity between HDR and LDR images are used. Experimental results with a number of HDR and tone mapped image pairs show the effectiveness of the proposed method.
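A whole-image structural-similarity measure of the kind the abstract relies on can be sketched as follows. The standard SSIM stabilising constants are assumed, and the paper's attention-weighted, region-based variant is omitted.

```python
def ssim_global(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM between two equal-length grayscale pixel lists
    (simplified whole-image form of the structural similarity index)."""
    n = len(x)
    # Means, variances and covariance over the whole image
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Standard SSIM form: luminance and contrast/structure terms combined
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score 1.0; a uniformly brightened copy scores below 1.0 through the luminance term.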
MESH SIMPLIFICATION VIA A VOLUME COST MEASUREijcga
We develop a polygonal mesh simplification algorithm based on a novel analysis of the mesh geometry. Particularly, we propose first a characterization of vertices as hyperbolic or non-hyperbolic depending upon their discrete local geometry. Subsequently, the simplification process computes a volume cost for each non-hyperbolic vertex, in analogy with spherical volume, to capture the loss of fidelity if that vertex is decimated. Vertices of least volume cost are then successively deleted and the resulting holes re-triangulated using a method based on a novel heuristic. Preliminary experiments indicate a performance comparable to that of the best known mesh simplification algorithms.
3D-COMPUTER ANIMATION FOR A YORUBA NATIVE FOLKTALEijcga
Computer graphics has a wide range of applications, such as computer animation, computer modeling, etc. Since the invention of computer graphics, researchers have not paid much attention to the possibility of converting oral tales, otherwise known as folktales, into animated cartoon videos. This paper describes how to develop cartoons of local folktales that will be of huge benefit to Nigerians. The activities were divided into 5 stages: analysis, design, development, implementation and evaluation, which involved various processes and the use of various specialized software and hardware. After the implementation of this project, the video characteristics were evaluated using a Likert scale. Analysis of 30 user responses indicated that 17 users (56.7%) rated the image quality as excellent, the video and image synchronization was rated excellent by 9 users (30%), the background noise was rated excellent by 18 users (60%), the character impression was rated excellent by 11 users (36.67%), the general assessment of the storyline was rated excellent by 17 users (56.7%), the video impression was rated excellent by 11 users (36.67%) and the voice quality was rated excellent by 10 users (33.33%).
Analysis of design principles and requirements for procedural rigging of bipe...ijcga
Character rigging is the process of endowing a character with a set of custom manipulators and controls that make it easy for animators to animate. These controls consist of simple joints, handles, or even separate character selection windows. This research paper presents an automated rigging system for quadruped characters with custom controls and manipulators for animation. The full character rigging mechanism is procedurally driven, based on various principles and requirements used by riggers and animators. The automation is achieved initially by creating widgets according to the character type. These widgets can then be customized by the rigger according to the character's shape, height and proportions. Then joint locations for each body part are calculated and the widgets are replaced programmatically. Finally, a complete and fully operational procedurally generated character control rig is created and attached to the underlying skeletal joints. The functionality and feasibility of the rig were analyzed against various sources of actual character motion and a requirements criterion was met. The final rigged character provides an efficient and easy-to-manipulate control rig with no lagging and a high frame rate.
Ghost and Noise Removal in Exposure Fusion for High Dynamic Range Imagingijcga
For producing a single high dynamic range image (HDRI), multiple low dynamic range images (LDRIs)
are captured with different exposures and combined. In high dynamic range (HDR) imaging, local motion
of objects and noise in a set of LDRIs can influence a final HDRI: local motion of objects causes the ghost
artifact and LDRIs, especially captured with under-exposure, make the final HDRI noisy. In this paper, we
propose a ghost and noise removal method for HDRI using exposure fusion with subband architecture, in
which Haar wavelet filter is used. The proposed method blends weight map of exposure fusion in the
subband pyramid, where the weight map is produced for ghost artifact removal as well as exposure fusion.
Then, the noise is removed using multi-resolution bilateral filtering. After removing the ghost artifact and
noise in subband images, details of the images are enhanced using a gain control map. Experimental
results with various sets of LDRIs show that the proposed method effectively removes the ghost artifact and
noise, enhancing the contrast in a final HDRI.
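The subband split behind the fusion pyramid can be illustrated with a single Haar analysis step. This is an unnormalised averaging variant shown only to make the idea concrete, not the paper's exact filter bank.

```python
def haar_step(signal):
    """One level of a Haar-style split: pairwise averages (low band)
    and pairwise half-differences (detail band)."""
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    det = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg, det
```

Averages plus details reconstruct the original samples exactly (avg + det and avg - det recover each pair), which is what lets weight maps be blended per subband and then inverted.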
Terra formation control or how to move mountainsijcga
The new Uplift Model of terrain generation is generalized here and provides new possibilities for terra
formation control unlike previous fractal terrain generation methods. With the Uplift Model fine-grained
editing is possible allowing the designer to move mountains and small hills to more suitable locations
creating gaps or valleys or deep bays rather than only being able to accept the positions dictated by the
algorithm itself. Coupled with this is a compressed file storage format considerably smaller in size than the
traditional height field or height map storage requirements.
AUTOMATIC DETECTION OF LOGOS IN VIDEO AND THEIR REMOVAL USING INPAINTIijcga
In this paper, we propose an algorithm that automatically detects and removes moving logos in video. The
proposed algorithm consists of logo detection, tracking and extraction, and removal. In the logo detection
step, we automatically detect moving logos using the saliency map and the scale invariant feature transform.
In the logo tracking and extraction step, an improved mean shift tracking method and backward tracking
technique are used in which only logo regions are extracted by using color information. Finally, in the logo
removal step, the detected logo region is filled with the region of neighborhood using an exemplar-based
inpainting that can fill a large detected region without artifacts. These steps are effectively interconnected
using control flags. Experimental results with various test sequences show that the proposed algorithm is
effective to detect, track, and remove moving logos in video.
3D MESH SEGMENTATION USING LOCAL GEOMETRYijcga
We develop an algorithm to segment a 3D surface mesh into visually meaningful regions based on analyzing the local geometry of vertices. We begin with a novel characterization of mesh vertices as convex, concave or hyperbolic depending upon their discrete local geometry. Hyperbolic and concave vertices are considered potential feature region boundaries. We propose a new region growing technique starting from these boundary vertices leading to a segmentation of the surface that is subsequently simplified by a region-merging method. Experiments indicate that our algorithm segments a broad range of 3D models at a quality comparable to existing algorithms. Its use of methods that belong naturally to discretized surfaces and ease of implementation make it an appealing alternative in various applications.
One of the most efficient ways of generating goal-directed walking motions is synthesising the final motion
based on footprints. Nevertheless, current implementations have not examined the generation of continuous
motion based on footprints, where different behaviours can be generated automatically. Therefore, in this
paper a flexible approach for footprint-driven locomotion composition is presented. The presented solution
is based on the ability to generate footprint-driven locomotion, with flexible features like jumping, running,
and stair stepping. In addition, the presented system examines the ability of generating the desired motion
of the character based on predefined footprint patterns that determine which behaviour should be
performed. Finally, the generation of transition patterns, based on the velocity of the root and the number of footsteps required to achieve the target behaviour smoothly and naturally, is examined.
Segmentation of cysts in kidney and 3D volume calculation from CT images ijcga
This paper proposes a segmentation method and a three-dimensional (3-D) volume calculation method of
cysts in kidney from a number of computer tomography (CT) slice images. The input CT slice images
contain both sides of kidneys. There are two segmentation steps used in the proposed method: kidney
segmentation and cyst segmentation. For kidney segmentation, kidney regions are segmented from CT slice
images by using a graph-cut method that is applied to the middle slice of input CT slice images. Then, the
same method is used for the remaining CT slice images. In cyst segmentation, cyst regions are segmented
from the kidney regions by using fuzzy C-means clustering and level-set methods that can reduce noise of
non-cyst regions. For 3-D volume calculation, cyst volume calculation and 3-D volume visualization are
used. In cyst volume calculation, the area of cysts in each CT slice image equals the number of pixels in
the cyst regions multiplied by the spatial density of CT slice images, and then the volume of cysts is calculated
by multiplying the cyst area and thickness (interval) of CT slice images. In 3-D volume visualization, a 3-D
visualization technique is used to show the distribution of cysts in kidneys by using the result of cyst volume
calculation. The total 3-D volume is the sum of the calculated cyst volume in each CT slice image.
Experimental results show good performance of the 3-D volume calculation. The proposed cyst segmentation and 3-D volume calculation methods can provide practical support for surgery options and medical practice for medical students.
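The volume arithmetic described above is straightforward to sketch. Binary per-slice masks are assumed; the segmentation steps that produce them are omitted.

```python
def cyst_volume_mm3(masks, pixel_spacing_mm, slice_thickness_mm):
    """Total cyst volume from per-slice binary masks, following the
    abstract's arithmetic: per-slice area = (cyst pixel count) x
    (pixel area); volume = sum over slices of area x slice thickness."""
    pixel_area = pixel_spacing_mm ** 2  # mm^2 covered by one pixel
    total = 0.0
    for mask in masks:  # mask: 2D list of 0/1 values for one CT slice
        n_pixels = sum(sum(row) for row in mask)
        total += n_pixels * pixel_area * slice_thickness_mm
    return total  # mm^3
```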
AUGMENT REALITY IN VOLUMETRIC MEDICAL IMAGING USING STEREOSCOPIC...ijcga
This paper is written about augmented reality in medicine. Medical imaging equipment (CT, PET, MRI) produces 3D volumetric data, so using a stereoscopic 3D display, the observer feels depth perception. The major factors affecting depth (convergence, accommodation, relative size) are tested. Convergence and accommodation affect depth perception, but relative size is negligible.
The midpoint method or technique is a “measurement” and as each measurement it has a tolerance, but
worst of all it can be invalid, called Out-of-Control or OoC. The core of all midpoint methods is the accurate
measurement of the difference of the squared distances of two points to the “polar” of their midpoint
with respect to the conic. When this measurement is valid, it also measures the difference of the squared
distances of these points to the conic, although it may be inaccurate, called Out-of-Accuracy or OoA. The
primary condition is the necessary and sufficient condition that a measurement is valid. It is completely
new and can be checked ultra fast, before the actual measurement starts.
Modeling an incremental algorithm shows that the curve must be subdivided into “piecewise monotonic”
sections, the start point must be optimal, and it explains that the 2D-incremental method can find, locally,
the global Least Square Distance. Locally means that there are at most three candidate points for a given
monotonic direction; therefore the 2D-midpoint method has, locally, at most three measurements.
When all the possible measurements are invalid, the midpoint method cannot be applied, and in that case
the ultra fast “OoC-rule” selects the candidate point. This guarantees, for the first time, a 100% stable,
ultra-fast, berserkless midpoint algorithm, which can be easily transformed to hardware. The new algorithm
is on average (26.5±5)% faster than Mathematica, using the same resolution and tested using 42
different conics. Both programs are completely written in Mathematica and only ContourPlot[] has been
replaced with a module to generate the grid-points, drawn with Mathematica’s
Graphics[Line{gridpoints}] function.
A WIRE PARAMETER AND REACTION MANAGER BASED BIPED CHARACTER SETUP AND RIGGI...ijcga
In this paper a unique way of rigging a bipedal character inside 3Ds Max is discussed, using constraints, wire parameter setup and linking of the controls. 3Ds Max is commonly used for game character modeling and rigging. For this purpose a built-in Biped setup is present, which is a pre-rigged structure that can be applied to any bipedal character type easily and efficiently. However, the built-in Biped system lacks the functionality of manual control and motion-retargeted manipulation mostly found in custom rigs. This proposed system builds a custom rig using a unique approach of combining multiple tools and expressions.
This paper presents a geometric approach to the coordinatization of a measured space called the Map
Maker’s algorithm. The measured space is defined by a distance matrix for sites which are reordered and
mapped to points in a two-dimensional Euclidean space. The algorithm is tested on distance matrices
created from 2D random point sets and the resulting coordinatizations compared with the original point
sets for confirmation. Tolerance levels are set to deal with the cumulative numerical errors in the
processing of the algorithm. The final point sets are found to be the same apart from translations,
reflections and rotations as expected. The algorithm also serves as a method for projecting higher
dimensional data to 2D.
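A standard way to perform such a coordinatization from a distance matrix is classical multidimensional scaling, shown here as an illustrative stand-in; the Map Maker's algorithm itself, with its site reordering and tolerance handling, is not reproduced.

```python
import numpy as np

def coordinatize(D, dim=2):
    """Recover point coordinates (up to translation, rotation and
    reflection) from a Euclidean distance matrix via classical MDS."""
    n = D.shape[0]
    # Double-centre the squared distance matrix to obtain a Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # The top eigenpairs of the Gram matrix give the embedding
    w, V = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]   # take the 'dim' largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

Recomputing pairwise distances from the recovered points reproduces the input matrix, which is exactly the confirmation test the abstract describes.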
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...Journal For Research
The ever increasing number of surveillance camera networks being deployed all over the world has not only resulted in high interest in the development of algorithms to automatically analyze video footage, but has also opened new questions as to how to efficiently manage the vast amount of information generated. The user may not have sufficient time to watch the entire video, or the whole of the video content may not be of interest. In such cases, the user may just want to view a summary of the video instead of watching the whole video. In this paper, we present a video summarization technique developed in order to efficiently access the points of interest in the video footage. The technique aims to eliminate the sequences which contain no activity of significance. The system captures each frame from the video and then processes it; if the frame is of interest, it retains the frame, otherwise it discards it; hence the resultant video is very short. The proposed method is extended to rare event detection for security systems. These rare event detections refer to suspicious scenarios. The system will consider a particular frame of interest from video footage taken at a given time and search for actions from video footage across the particular area of interest specified by the user. The user is then notified about the objects and actions that occurred in the area of interest. This helps in detecting suspicious behavior that would otherwise have been deemed unsuspicious and gone unnoticed in the context of a narrow timeframe.
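The keep-or-discard step can be sketched with a simple mean-absolute-difference test between each frame and the last retained one. Frames are modelled as flat pixel lists; the threshold and the difference measure are assumptions for illustration, not the paper's exact criteria.

```python
def summarize(frames, threshold):
    """Keep only frames whose mean absolute difference from the
    previously retained frame exceeds the threshold, discarding
    stretches with no significant activity."""
    if not frames:
        return []
    kept = [frames[0]]  # always keep the first frame as a reference
    for frame in frames[1:]:
        prev = kept[-1]
        diff = sum(abs(a - b) for a, b in zip(frame, prev)) / len(frame)
        if diff > threshold:
            kept.append(frame)
    return kept
```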
Comparative Study of Different Video Shot Boundary Detection Techniquesijtsrd
Video shot boundary detection is the crucial first step in the field of video processing research. It enables content-based video retrieval, indexing and browsing. This paper reviews different techniques and methods implemented to achieve the task of SBD, along with key performance measurement parameters. It explains different preprocessing techniques, feature extraction methodologies, similarity computation techniques, etc. The outcomes of different approaches are framed with reference to accuracy and speed of computation, along with a comparison of precision, recall and F1 score. Swati Hadke | Ravi Mishra "Comparative Study of Different Video Shot Boundary Detection Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-5, August 2022, URL: https://www.ijtsrd.com/papers/ijtsrd50630.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/50630/comparative-study-of-different-video-shot-boundary-detection-techniques/swati-hadke
Inverted File Based Search Technique for Video Copy Retrievalijcsa
A video copy detection system is a content-based search engine focusing on Spatio-temporal features. It
aims to find whether a query video segment is a copy of video from the video database or not based on the
signature of the video. It is hard to find whether a video is a copied video or a similar video since the
features of the content are very similar from one video to the other. The main focus is to detect that the
query video is present in the video database with robustness depending on the content of video and also by
fast search of fingerprints. The Fingerprint Extraction Algorithm and Fast Search Algorithm are adopted
to achieve robust, fast, efficient and accurate video copy detection. As a first step, the Fingerprint
Extraction algorithm is employed which extracts a fingerprint through the features from the image content
of video. The images are represented as Temporally Informative Representative Images (TIRI). Then the
next step is to find the presence of copy of a query video in a video database, in which a close match of its
fingerprint in the corresponding fingerprint database is searched using inverted-file-based method.
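The inverted-file search idea can be sketched as a vote over shared fingerprint words. Fingerprints are assumed to be sequences of small integer hash words per video; the TIRI fingerprint extraction itself is omitted.

```python
from collections import defaultdict

def build_inverted_index(fingerprints):
    """Inverted file: hash word -> list of (video_id, position) postings."""
    index = defaultdict(list)
    for vid, words in fingerprints.items():
        for pos, w in enumerate(words):
            index[w].append((vid, pos))
    return index

def query(index, words):
    """Return the database video sharing the most fingerprint words
    with the query, or None when nothing matches."""
    votes = defaultdict(int)
    for w in words:
        for vid, _ in index.get(w, []):
            votes[vid] += 1
    return max(votes, key=votes.get) if votes else None
```

Because lookups touch only the postings of the query's own words, search cost scales with the query length rather than with the database size.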
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...IJCSEIT Journal
A video fingerprint is a recognizer that is derived from a piece of video content. The video fingerprinting
methods obtain unique features of a video that differentiates one video clip from another. It aims to identify
whether a query video segment is a copy of video from the video database or not based on the signature of
the video. It is difficult to find whether a video is a copied video or a similar video, since the features of the
content are very similar from one video to the other. The main focus of this paper is to detect that the query
video is present in the video database with robustness depending on the content of video and also by fast
search of fingerprints. The Fingerprint Extraction Algorithm and Fast Search Algorithms are adopted in
this paper to achieve robust, fast, efficient and accurate video copy detection. As a first step, the
Fingerprint Extraction algorithm is employed which extracts a fingerprint through the features from the
image content of video. The images are represented as Temporally Informative Representative Images
(TIRI). Then, the second step is to find the presence of copy of a query video in a video database, in which
a close match of its fingerprint in the corresponding fingerprint database is searched using an inverted-file-based
method. The proposed system is tested against various attacks like noise, brightness, contrast,
rotation and frame drop. Thus the performance of the proposed system on an average shows high true
positive rate of 98% and low false positive rate of 1.3% for different attacks.
Content based video retrieval using discrete cosine transformnooriasukmaningtyas
A content-based video retrieval (CBVR) framework is built in this paper. One of the essential features of the video retrieval process and CBVR is the colour value. The discrete cosine transform (DCT) is used to extract the features of a query video to compare with the video features stored in our database. An average result of 0.6475 was obtained using the DCT after applying it to the database we created and collected, across all categories. The technique was applied on our database of 100 videos, with 5 videos in each category.
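The DCT feature-extraction step can be illustrated with a direct 2D DCT-II on a small block. This is the unnormalised textbook form, shown for illustration; it is not necessarily the paper's exact variant.

```python
import math

def dct2_block(block):
    """2D DCT-II of a square block (separable: 1D DCT over rows,
    then over columns), turning pixel values into frequency
    coefficients usable as compact features."""
    n = len(block)

    def dct1(v):
        return [sum(v[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                    for i in range(n))
                for k in range(n)]

    rows = [dct1(r) for r in block]
    cols = [dct1([rows[i][j] for i in range(n)]) for j in range(n)]
    # Transpose back so result[u][v] indexes (row freq, column freq)
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

A constant block concentrates all its energy in the DC coefficient, which is why low-frequency DCT coefficients make good compact descriptors.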
Video content analysis and retrieval system using video storytelling and inde...IJECEIAES
Videos are often used for communicating ideas, concepts, experiences, and situations because of the significant advances made in video communication technology. Social media platforms have enhanced video usage expeditiously. At present, recognition of a video is done using metadata like the video title, video description, and video thumbnail. There are situations where a video searcher requires only a video clip on a specific topic from a long video. This paper proposes a novel methodology for the analysis of video content, using video storytelling and indexing techniques for the retrieval of the intended video clip from a long-duration video. The video storytelling technique is used for video content analysis and to produce a description of the video. The video description thus created is used for the preparation of an index using the wormhole algorithm, guaranteeing the search of a keyword of definite length L within the minimum worst-case time. This video index can be used by the video searching algorithm to retrieve the relevant part of the video by virtue of the frequency of the word in the keyword search of the video index. Instead of downloading and transferring a whole video, the user can download or transfer only the specifically necessary video clip. The network constraints associated with the transfer of videos are considerably addressed.
System analysis and design for multimedia retrieval systemsijma
Due to the extensive use of information technology and the recent developments in multimedia systems, the
amount of multimedia data available to users has increased exponentially. Video is an example of
multimedia data as it contains several kinds of data such as text, image, meta-data, visual and audio.
Content based video retrieval is an approach for facilitating the searching and browsing of large
multimedia collections over WWW. In order to create an effective video retrieval system, visual perception
must be taken into account. We conjectured that a technique which employs multiple features for indexing
and retrieval would be more effective in the discrimination and search tasks of videos. In order to validate
this, content-based indexing and retrieval systems were implemented using color histogram, texture feature
(GLCM), edge density and motion features.
In this work, we define a new method for indexing and retrieving non-geotagged video sequences based on visual content only, using the Local Binary Pattern (LBP) and Singular Value Decomposition (SVD) techniques. The main question of our system is: is it possible to determine the geographic location of a video on the GIS map from just the pixels of its frames? The proposed system is introduced to answer questions like that. The GIS database was constructed by storing reference images at the intersections between road segments on the map. The Local Binary Pattern (LBP) is used to extract features from images. The Singular Value Decomposition (SVD) technique is used to compress the length of the features and to index the images in the database. The input to the system is a video taken from a forward-facing camera mounted on a vehicle. The output of the proposed system is the geolocation of the keyframes of the video which correspond to the geo-tagged images retrieved from the GIS database.
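The LBP descriptor mentioned above can be sketched in its basic 3x3 form; the SVD compression and GIS matching stages are omitted.

```python
def lbp_codes(img):
    """Basic 3x3 local binary pattern codes for the interior pixels of a
    2D grayscale image (list of lists): each neighbour contributes one
    bit, set when it is at least as bright as the centre pixel."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise ring
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            codes.append(code)
    return codes
```

A histogram of these 8-bit codes is the texture feature that would then be compressed with SVD.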
Video Key-Frame Extraction using Unsupervised Clustering and Mutual ComparisonCSCJournals
Key-frame extraction is one of the important steps in semantic concept based video indexing and retrieval, and the accuracy of video concept detection highly depends on the effectiveness of the key-frame extraction method. Therefore, extracting key-frames efficiently and effectively from video shots is considered a very challenging research problem in video retrieval systems. One of many approaches to extract key-frames from a shot is to make use of unsupervised clustering. Depending on the salient content of the shot and the results of clustering, key-frames can be extracted. But usually, because of the visual complexity and/or the content of the video shot, we tend to get near-duplicate or repetitive key-frames having the same semantic content in the output, and hence the accuracy of key-frame extraction decreases. In an attempt to improve accuracy, we propose a novel key-frame extraction method based on unsupervised clustering and mutual comparison, where we assign 70% weight to the color component (HSV histogram) and 30% to texture (GLCM) while computing a combined frame similarity index used for clustering. We suggest a mutual comparison of the key-frames extracted from the output of the clustering, where each key-frame is compared with every other to remove near-duplicate key-frames. The proposed algorithm is both computationally simple and able to detect non-redundant and unique key-frames for the shot, as a result improving the concept detection rate. The efficiency and effectiveness are validated on open-database videos.
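The 70/30 combined similarity index can be sketched as follows. Histogram intersection is an assumed similarity measure; the abstract fixes only the weights, not the comparison function.

```python
def hist_intersection(h1, h2):
    """Overlap of two histograms, normalised by the larger total mass,
    giving a similarity in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / max(sum(h1), sum(h2), 1e-12)

def combined_similarity(color1, color2, tex1, tex2):
    """Frame-similarity index with the abstract's 70% colour (HSV
    histogram) and 30% texture (GLCM) weighting."""
    return 0.7 * hist_intersection(color1, color2) \
         + 0.3 * hist_intersection(tex1, tex2)
```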
Coronary heart disease is a disease with the highest mortality rates in the world. This makes the development of the diagnostic system as a very interesting topic in the field of biomedical informatics, aiming to detect whether a heart is normal or not. In the literature there are diagnostic system models by combining dimension reduction and data mining techniques. Unfortunately, there are no review papers that discuss and analyze the themes to date. This study reviews articles within the period 2009-2016, with a focus on dimension reduction methods and data mining techniques, validated using a dataset of UCI repository. Methods of dimension reduction use feature selection and feature extraction techniques, while data mining techniques include classification, prediction, clustering, and association rules.
Key frame extraction is an essential technique in the computer vision field. The extracted key frames should brief the salient events with an excellent feasibility, great efficiency, and with a high-level of robustness. Thus, it is not an easy problem to solve because it is attributed to many visual features. This paper intends to solve this problem by investigating the relationship between these features detection and the accuracy of key frames extraction techniques using TRIZ. An improved algorithm for key frame extraction was then proposed based on an accumulative optical flow with a self-adaptive threshold (AOF_ST) as recommended in TRIZ inventive principles. Several video shots including original and forgery videos with complex conditions are used to verify the experimental results. The comparison of our results with the-state-of-the-art algorithms results showed that the proposed extraction algorithm can accurately brief the videos and generated a meaningful compact count number of key frames. On top of that, our proposed algorithm achieves 124.4 and 31.4 for best and worst case in KTH dataset extracted key frames in terms of compression rate, while the-state-of-the-art algorithms achieved 8.90 in the best case.
Similar to Efficient video indexing for fast motion video (20)
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
International Journal of Computer Graphics & Animation (IJCGA) Vol.4, No.2, April 2014
DOI : 10.5121/ijcga.2014.4204
EFFICIENT VIDEO INDEXING FOR FAST-MOTION VIDEO
Min-Ho Park and Rae-Hong Park
Department of Electronic Engineering, School of Engineering, Sogang University
35 Baekbeom-ro (sinsu-dong), Mapo-gu, Seoul 121-742, Korea
ABSTRACT
Due to recent advances in multimedia technologies, various digital video contents have become available from different multimedia sources. Efficient management, storage, coding, and indexing of video are required because video contains a large amount of visual information and requires a large amount of memory. This paper proposes an efficient video indexing method for video with rapid motion or fast illumination change, in which motion information and feature points of specific objects are used. For accurate shot boundary detection, we make use of two steps: a block matching algorithm to obtain accurate motion information and a modified displaced frame difference to compensate for the error in existing methods. We also propose an object matching algorithm based on the scale invariant feature transform, which uses feature points to group shots semantically. Computer simulations with five fast-motion videos show the effectiveness of the proposed video indexing method.
KEYWORDS
Scene Change Detection, Shot Boundary Detection, Key Frame, Block Matching Algorithm, Displaced
Frame Difference, Scale Invariant Feature Transform, Video Indexing.
1. INTRODUCTION
Recently, computing speed and memory capacity have increased greatly. With a personal computer or mobile multimedia devices such as a smartphone, personal digital assistant, or notebook, one can connect to the Internet and easily get, copy, browse, and transmit digital information. Demand for multimedia information such as text, documents, images, music, and video has also increased with the rapid advance of multimedia consumer products and devices.
Mostly text-based search services have been provided, rather than multimedia search services such as music video search, which require huge memory and a high computational load. Digitized multimedia information is searched at sites connected through the Internet and used for various multimedia applications. Different from text-based information, it is relatively difficult to generate keywords for multimedia search. Automatic interpretation of images or video is also a big challenge. Text-based information can be analyzed and understood simply by recognizing the characters themselves, while it is a challenging problem to develop a general algorithm that can analyze and understand non-text-based information such as video, photos, and music produced under various conditions. Text-based search services limit the amount of accessible information, given the rapid spread of multimedia consumer devices and the Internet. The huge amount of digital multimedia content available today in digital libraries and on the Internet requires effective tools for accessing and browsing visual data
information.
Video is composed of many scenes each of which involves one story unit. This story unit consists
of many shots that contain visual content elements. We can understand the story by recognizing
these elements. Thus, content elements are very important to summarize or retrieve video
sequences. Shot boundary detection (SBD) is an essential step to efficient video search and
browsing, and thus a large number of SBD methods have been presented. Most SBD methods are
reasonably effective for the specific types of video they are developed for; however, no single SBD method using a simple low-level image feature can handle all types of scene boundaries, especially in general video sequences, the way human viewers can. Most existing SBD methods have drawbacks in processing video that contains rapid illumination changes or fast-moving objects and background [12].
As a solution to these problems and for efficient search, storage, and management of videos, SBD
and video content based indexing methods that group similar shots have been developed. SBD
finds shot boundaries of scene changes and splits a video into a number of video segments. For
SBD, features extracted from a given video include intensity/color difference, motion vector
(MV), texture, and so on. For performance improvement, more than two features are combined to
give reliable results that are robust to illumination change and abrupt or complex motion. Video
indexing is performed by computing similarities between input video and representative key
frames selected among shots.
For efficient video indexing of fast-motion video, this paper uses our previous work on SBD [12,
15]. Our previous SBD method uses the modified intensity difference and MV features for fast-
motion video containing motion blur. The similarity between key frames extracted from detected
shots is computed using the scale invariant feature transform (SIFT) that is invariant to size and
illumination change. By using the computed similarity as a measure to group similar shots, a
video is decomposed into a number of meaningful scenes, in which a number of frames including
the key frame are used for robustness to motion blur or rotation of objects in a scene.
The main contributions of the paper are: (i) object based shot grouping using the SIFT descriptor,
and (ii) shot change detection and video indexing for fast-motion video using motion information.
The rest of the paper is structured as follows. Section 2 reviews SBD and video indexing. Section 3 proposes a video indexing method using our previous SBD methods for fast-motion video. Section 4 gives experimental results of video indexing and discussions. Finally, Section 5 concludes the paper.
2. RELATED WORK: SBD AND VIDEO INDEXING
Compared with other multimedia information, video requires huge memory storage and contains
a vast amount of information as well. By character recognition, text-based information can be
easily and heuristically retrieved and understood. However, video analysis is much more complex
than text analysis and there exists no generic video analysis technique that successfully deals with
a variety of videos that vary depending on the intent of the video producer. A number of video analysis methods exist, but their performance is limited to specific applications. Shot
detection is used to enhance the performance of video analysis by decomposing a complex video
into a number of simple video segments.
2.1. SBD
Video consists of a number of scenes with different content, and a scene is composed of a number of shots, where a shot is a basic unit of content consisting of a number of frames. Figure 1 shows the hierarchical structure of video: video, scene, shot, and frame. SBD represents finding the boundaries of shots, i.e., the basic units of content. SBD methods are classified into two categories depending on the features used: frame difference and motion information, each of which is described in more detail below.
Figure 1. Hierarchical structure of video.
2.1.1. SBD Using Frame Difference Measure
For SBD, the simplest low-level one among various features extracted from input video is the
sum of absolute frame differences (FDs) of pixel intensities. Every shot has different objects,
people, and background, where shot boundary is defined as the frame at which change occurs.
Using this characteristic, the sum of absolute FDs is used to find the shot boundary. Between
successive frames k and k+1, the framewise FD measure d(k) at frame k is defined as [1, 2]

$$ d(k) = \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \left| I(x, y, k+1) - I(x, y, k) \right| \quad (1) $$

where I(x, y, k) represents the intensity at pixel (x, y) of frame k, with X × Y denoting the frame size. This measure is computationally fast and gives satisfactory SBD results for video with slow object/camera motion; however, it produces incorrect results for video containing fast motion or abrupt illumination change.

To solve this problem, an SBD method using the displaced frame difference (DFD) measure is used. It gives results more robust to motion of objects or people than the FD measure; however, it still gives inaccurate results for video with illumination change and abrupt motion.

2.1.2. SBD Using Motion Information

Motion information is another low-level feature for SBD. An improved method [3] uses motion estimation (ME) to avoid performance degradation of SBD methods caused by motion. This approach considers both object and camera motions, however, still shows degraded SBD performance for
fast-motion video such as sports clip. Other motion-based SBD methods [5], [17], [18] consider
dominant or global camera motions and detect shot boundaries by finding abrupt changes in
dominant motions. Each shot has relatively similar motions of objects, people, and camera.
Usually camera motion is relatively the same between frames within a shot and abruptly changes
between shots. This characteristic of content change between shots can be used for SBD, e.g., by
block matching algorithm (BMA) [4] and spatial temporal slice [5].
BMA and optical flow can compute motion of small objects or people between video frames, thus
they can predict object motion in successive frames and detect shot boundaries as the frames at
which abrupt motion occurs. On the contrary, spatial temporal slice method focuses on detecting
camera motion. Figure 2 shows SBD computed by the spatial temporal slice method, where the
horizontal direction represents time index and each vertical line denotes the intensity of the pixels
on the slice lying across the image frames.
Figure 2. SBD using the spatial temporal slice method.
Although motion based SBD methods consider global (camera) or local (object) motions, the
performance is sensitive to object or camera motion. The image frames in a video sequence are
easily blurred by fast motion, and the conventional features computed from the blurred images are
not effective or reliable for determining whether the frame is a shot boundary or not. In video
with fast motion of object or people and rapid illumination change, shot boundary is detected
incorrectly with large discontinuity measure (computed using low-level image features such as
color, edge, shape, motion, or their combinations) similar to that of correct shot boundary. In this
case, it is difficult to select an optimum threshold for SBD. For example, if threshold is set to a
low value, shot boundaries are incorrectly detected with ambiguous discontinuity measure values
(false detection). On the contrary, if threshold is set to a high value, true shot boundaries are
missed (miss detection). Also an adaptive SBD algorithm was proposed to detect not only abrupt
transitions, but also gradual transitions [16].
In this paper, instead of changing threshold adaptively, our previous SBD approach [12] is used,
in which motion based features were developed for SBD especially in the presence of fast moving
objects/background. The paper addresses the problem of false/miss detection of shot boundaries
in fast-motion video and presents a new algorithm that can overcome the problem. The algorithm
reduces the possibility of miss/false detection using the motion-based features: the modified DFD
and the blockwise motion similarity.
2.2. Key Frame Detection
A key frame is the frame that represents the content contained in a shot. A shot is assumed to contain no abruptly appearing or disappearing objects or people. Thus, rather than using all the
frames, only a small number of key frames containing distinctive objects or people can be
efficiently used for video indexing, which reduces data redundancy and does not require a high
computational load.
A simple and common method for selecting key frames [3] is to choose the frame in the middle of
the shot. Another method is to find start and end frames of the shot, or to select frames by
uniform sampling of frames. However, this method uses fixed sampling interval of frames, thus
has a drawback that it does not consider which frame is capable of representing the shot
effectively. If the sampling interval is small, many key frames are selected and thus can describe
content detail of the video with a high computational load. On the contrary, with a large sampling
interval, a small number of key frames may miss the frame containing distinctive objects or
people. Various key frame extraction methods for video summary were reviewed and compared
based on the method, data set, and the results [6]. A feature-based key frame extraction method
for structuring low quality video sequences was presented [7]. Also, a spatial-temporal approach
based key frame extraction method was developed [8].
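The middle-frame and uniform-sampling strategies above can be sketched as follows; this is a minimal illustration, and the function name and interval parameter are ours, not the paper's:

```python
def key_frames(shot_start, shot_end, interval=None):
    """Key-frame indices for a shot spanning [shot_start, shot_end].

    With no interval, return the single middle frame of the shot (the
    simple, common choice); with an interval, sample frames uniformly,
    trading computational load (small interval) against the risk of
    missing distinctive objects or people (large interval).
    """
    if interval is None:
        return [(shot_start + shot_end) // 2]
    return list(range(shot_start, shot_end + 1, interval))
```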
2.3. Video Indexing
Video indexing is defined as the automatic process of content-based classification of video data.
In general video indexing [3], video is decomposed into a number of shots by SBD and video
indexing is performed by computing the similarity between shots using the similarity of key
frames that are representative frames of shots. In computing the similarity between key frames,
features such as pixel intensity/color, motion information, shot length, and distance between shot
boundaries are used. A video indexing method based on a spatio-temporal segmentation was
proposed, in which salient regions are used as scene descriptors for video indexing [9]. A novel
active learning approach based on the optimum experimental design criteria in statistics was also
presented [10]. The computed similarities between key frames are represented as similarities
between shots by a semantic representation method [11].
3. VIDEO INDEXING
This section proposes a video indexing method. For the video indexing application, an SBD method for fast-motion video, proposed by the same authors [12, 15], is reviewed and its application to video indexing is presented.
3.1. SBD for Fast-Motion Video
This paper uses an SBD method proposed for fast-motion video [15], which employs two features [12]: a motion similarity measure and a modified DFD that compensates for the FD error caused by object or camera motion. The discontinuity measure used for SBD is a combination of these two measures, which are briefly reviewed below.
3.1.1. Motion Similarity Measure
Shot boundary is determined as the frame at which motion directions of objects/people and
background change abruptly. The same or similar shots contain usually smooth motion flow of
objects/people and background whereas different ones have different objects/people and
background. In [15], a blockwise motion similarity was proposed by detecting changes in motion
directions of the matched macroblocks (MBs) in three successive frames using the BMA.
The similarity measure of the motion direction at MB i is defined as
$$ MS_i(k) = \begin{cases} 1, & \text{if } \left| \theta_i(k, k+1) - \theta_i(k-1, k) \right| < Th \\ 0, & \text{otherwise} \end{cases} \quad (2) $$

where \theta_i(k, k+1) represents the direction angle of MB i computed from frames k and k+1, and Th denotes the angle threshold. The motion similarity MS(k) at frame k is defined as the average of MS_i(k):

$$ MS(k) = \frac{1}{N_b} \sum_{i=1}^{N_b} MS_i(k) \quad (3) $$

where N_b = XY / M^2 represents the total number of non-overlapping MBs in a frame, with X × Y and M × M denoting the frame size and MB size, respectively. The motion similarity MS(k) is larger if more block pairs at frame k have similar motion directions in a shot.
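As an illustration, the motion-direction similarity of equations (2) and (3) might be computed as follows, assuming the macroblock motion vectors have already been estimated by a BMA; the array layout and default threshold value are our assumptions, not the paper's:

```python
import numpy as np

def motion_similarity(mv_prev, mv_curr, angle_thresh_deg=30.0):
    """Blockwise motion-direction similarity MS(k), eqs. (2)-(3).

    mv_prev: (N_b, 2) array of MB motion vectors between frames k-1 and k.
    mv_curr: (N_b, 2) array of MB motion vectors between frames k and k+1.
    Returns the fraction of macroblocks whose motion direction changes
    by less than angle_thresh_deg degrees between the two frame pairs.
    """
    theta_prev = np.degrees(np.arctan2(mv_prev[:, 1], mv_prev[:, 0]))
    theta_curr = np.degrees(np.arctan2(mv_curr[:, 1], mv_curr[:, 0]))
    diff = np.abs(theta_curr - theta_prev)
    diff = np.minimum(diff, 360.0 - diff)           # wrap to [0, 180]
    ms_i = (diff < angle_thresh_deg).astype(float)  # eq. (2)
    return ms_i.mean()                              # eq. (3)
```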
3.1.2. Modified DFD
The DFD has been widely used as a measure of discontinuity, because motion-based difference
defined between two successive frames is a more efficient feature than other intensity-based
features [13]. The blockwise DFD d_i(u_i, v_i, k) computed by the BMA [4] is defined by

$$ d_i(u_i, v_i, k) = \min_{-M \le u, v \le M} \frac{1}{M^2} \sum_{x=0}^{M-1} \sum_{y=0}^{M-1} \left( I_i(x+u, y+v, k+1) - I_i(x, y, k) \right)^2 \quad (4) $$

where I_i(x, y, k) signifies the intensity at (x, y) of the M × M MB i at frame k, and u and v denote horizontal and vertical displacements, respectively, with a (2M+1) × (2M+1) search range assumed. The framewise DFD d(k) at frame k is computed by averaging the blockwise DFDs:

$$ d(k) = \frac{1}{N_b} \sum_{i=1}^{N_b} d_i(u_i, v_i, k) \quad (5) $$

where u_i and v_i signify the horizontal and vertical components of the MV for MB i, respectively, and N_b denotes the number of MBs at frame k. The DFD value at a shot boundary is high whereas it is low at a non-shot boundary. It is a useful feature for video sequences with slow motions of objects/people and relatively uniform illumination. However, the DFD value obtained from fast-motion sequences such as action movies or sports clips is too ambiguous for accurate SBD because
blur produced by fast motion gives large ME errors. Most sequences in fast-motion video are
somewhat blurred because the exposure time of video camera is long enough to produce blurring,
in which motion blur cannot be ignored.
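A minimal full-search BMA sketch of equations (4) and (5) could look like this; the block size, search range, and brute-force search strategy are illustrative choices (a real implementation would use a faster search):

```python
import numpy as np

def framewise_dfd(frame_k, frame_k1, M=8, search=4):
    """Framewise DFD d(k), eqs. (4)-(5): for each MxM macroblock of frame k,
    full-search block matching in frame k+1 over a (2*search+1)^2 range,
    then average the minimum mean-squared error over all macroblocks.
    Returns (d_k, motion_vectors)."""
    H, W = frame_k.shape
    dfds, mvs = [], []
    for by in range(0, H - M + 1, M):
        for bx in range(0, W - M + 1, M):
            block = frame_k[by:by + M, bx:bx + M].astype(float)
            best, best_mv = np.inf, (0, 0)
            for v in range(-search, search + 1):
                for u in range(-search, search + 1):
                    y, x = by + v, bx + u
                    if y < 0 or x < 0 or y + M > H or x + M > W:
                        continue  # candidate block falls outside the frame
                    cand = frame_k1[y:y + M, x:x + M].astype(float)
                    err = np.mean((cand - block) ** 2)  # eq. (4) inner term
                    if err < best:
                        best, best_mv = err, (u, v)
            dfds.append(best)
            mvs.append(best_mv)
    return float(np.mean(dfds)), np.array(mvs)  # eq. (5)
```

For identical successive frames the DFD is zero and all motion vectors stay at (0, 0), as expected for a static non-boundary frame.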
We assume that the DFD in a motion-blurred image is larger than that in a non-blurred image, and
that the larger the motion magnitude, the larger the DFD. To get a DFD value for accurate SBD, a
modified DFD [15] is used, in which the motion magnitude term and the distribution of the DFD
value that is computed in a sliding window of size N + 1 (N even) are used. First, we simply divide the conventional blockwise DFD d_i(u_i, v_i, k) by the motion magnitude \sqrt{u_i^2(k) + v_i^2(k)} for non-zero MVs, giving the normalized blockwise DFD d_i^*(k). The normalized framewise DFD d^*(k) at frame k is defined as

$$ d^*(k) = \frac{1}{N_b} \sum_{i=1}^{N_b} d_i^*(k) \quad (6) $$
In [15], it is observed that the magnitude of the normalized framewise DFD d^*(k) due to fast motion is reduced to some degree, while that due to shot boundaries is not significantly affected. Therefore, it is noted that the normalization decreases ambiguous values that give false classification in fast-motion video and reduces the possibility of false detection at shot boundaries. We can reduce miss detection by choosing shot boundary candidates that satisfy the following condition:

$$ d^{**}(k) = \begin{cases} d(k), & \text{if } d(k) = \max_{-N/2 \le j \le N/2} d(k+j) \text{ and } \dfrac{1}{3} \displaystyle\sum_{j=-1}^{1} d(k+j) > \dfrac{C_1}{N} \displaystyle\sum_{j=-N/2}^{N/2} d(k+j) \\ d^*(k), & \text{otherwise} \end{cases} \quad (7) $$

where C_1 is an empirically selected constant. In most video sequences, it is found that, at a shot boundary k, the DFD d(k) is usually maximum among the consecutive N frames and the average value of the framewise DFDs in the three frames around the shot boundary k is substantially larger than that of the N frames. In this case, d(k) is not normalized, just to enhance the chance of being detected as a shot boundary.
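The candidate-selection rule of equation (7) might be implemented roughly as follows, assuming the raw and normalized framewise DFDs have already been computed; the window size N, constant C_1, and the truncation of windows near the sequence ends are our assumptions:

```python
import numpy as np

def modified_dfd(d, d_star, N=8, C1=1.5):
    """Eq. (7): keep the raw DFD d(k) un-normalized when frame k is a
    strong local maximum (a likely shot boundary); otherwise use the
    motion-normalized DFD d_star(k).

    d, d_star: 1-D arrays of framewise DFDs; returns d** as an array."""
    out = d_star.astype(float).copy()
    h = N // 2
    for k in range(len(d)):
        lo, hi = max(0, k - h), min(len(d), k + h + 1)
        window = d[lo:hi]                 # frames k-N/2 .. k+N/2 (clipped)
        local3 = d[max(0, k - 1):k + 2]   # frames k-1, k, k+1 (clipped)
        if d[k] == window.max() and local3.mean() > (C1 / N) * window.sum():
            out[k] = d[k]  # candidate shot boundary: skip normalization
    return out
```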
3.1.3. Discontinuity Measure and Rules for SBD
Through experiments with various test video sequences, both the modified DFD and the motion
direction similarity measure are shown to be highly effective for SBD of fast-motion video. Next,
a measure of discontinuity based on the combination of the two features [15] is used:
$$ Z(k) = \frac{d^{**}(k)}{\dfrac{1}{N+1} \displaystyle\sum_{j=-N/2}^{N/2} d^{**}(k+j)} \, \exp\!\left( -C_2 \, \frac{MS(k)}{\dfrac{1}{N+1} \displaystyle\sum_{j=-N/2}^{N/2} MS(k+j)} \right) \quad (8) $$
where C2 is a constant that determines the rate of decrease of Z(k) by the motion similarity at shot
boundaries. Equation (8) shows that the discontinuity measure Z(k) is large when two features
change abruptly whereas small when they have little change. This different characteristic of the
discontinuity measure can be utilized for separation of shot boundaries from non-shot boundaries
and to reduce the ambiguity in SBD. A sliding window is used to consider the temporal variation
of the discontinuity measure. We can determine shot boundaries using following detection rules:
$$ Z(k) \ge Z(k+j), \quad -N/2 \le j \le N/2, \; j \ne 0 \quad (9) $$

$$ Z(k) > \frac{C_3}{N} \sum_{j=-N/2}^{N/2} Z(k+j) \quad (10) $$
If the discontinuity measure at frame k satisfies detection rules, it is assumed that an abrupt shot
change occurs at frame k. The first detection rule in equation (9) finds shot boundary candidates
whereas the second one in equation (10) prevents gradual cuts from being selected.
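A sketch of the discontinuity measure of equation (8) and the detection rules of equations (9) and (10); the constants C_2 and C_3, the window size, and the handling of truncated windows at the sequence ends are placeholder assumptions:

```python
import numpy as np

def detect_shot_boundaries(d_mod, ms, N=8, C2=1.0, C3=2.0):
    """Eqs. (8)-(10): combine the modified DFD d**(k) with the motion
    similarity MS(k) into the discontinuity measure Z(k), then apply the
    local-maximum rule (9) and the average-ratio rule (10)."""
    L = len(d_mod)
    h = N // 2
    Z = np.zeros(L)
    for k in range(L):
        lo, hi = max(0, k - h), min(L, k + h + 1)
        d_avg = d_mod[lo:hi].mean() or 1e-12   # guard against zero averages
        ms_avg = ms[lo:hi].mean() or 1e-12
        Z[k] = (d_mod[k] / d_avg) * np.exp(-C2 * ms[k] / ms_avg)  # eq. (8)
    boundaries = []
    for k in range(L):
        lo, hi = max(0, k - h), min(L, k + h + 1)
        win = Z[lo:hi]
        if Z[k] == win.max() and Z[k] > (C3 / N) * win.sum():  # eqs. (9)-(10)
            boundaries.append(k)
    return Z, boundaries
```

A frame with a DFD spike and little coherent motion yields a large Z(k) and is flagged; frames with smooth, similar motion are suppressed by the exponential term.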
3.2. Key Frame Extraction
Different shots contain different objects, people, and background. Although we cannot recognize
them and thus do not know what objects are contained in each shot, we can detect shot boundaries
by perceiving the sudden changes of visual content elements. The SIFT [14] matches the query
image with one of the training images. In SBD, a training image corresponds to the reference
frame and the query image is the current frame. It is assumed that the number of matched points
is large between successive frames within the same shot, whereas that at the shot boundary is zero
or much smaller than that within the same shot. Thus, the number of matched points is used as a
criterion for SBD [15].
The key idea in SBD methods using low-level features is that image attributes or properties change abruptly between shots; for example, conventional SBD methods find abrupt changes in the color histogram or in object/camera motions. In this paper, we investigate the discontinuity of local image
features such as those extracted from objects, people, and background at shot boundaries. It is not
unreasonable to assume that local image features should be continuously recognizable within the
same shot whereas a discontinuity occurs between successive shots. To detect shot boundaries,
discontinuities are detected by extracting and matching the SIFT feature points that can efficiently
represent objects, people, and background in successive video frames. In this paper, we use the
SIFT for establishing feature-based correspondences between two successive frames rather than
performing segmentation and matching of objects in successive frames. With many test video
sequences, it is observed that the SIFT features are continuously detected from objects, people,
and background, and then matched within the same shot, whereas they go through abrupt changes
over different shots. We establish feature point correspondences using the point set Pk extracted
from the reference frame k and the point set Pk+l detected at the current frame k+l, where l denotes
the frame distance between the reference and current frames and is to be selected depending on
the type of shot boundaries: hard-cut and gradual-transition such as panning, tilting, and fade
in/out. Two points are assumed to be correctly matched if a matched point pk+l (pk+l∈Pk+l) is
located within a square window at the current frame k+l, where pk (pk∈Pk) is the center point of
the window. To reduce the computational complexity, the modified SIFT can be used [15].
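The spatial-window check described above can be sketched as below. The keypoint coordinates and the raw descriptor matches are assumed to come from any SIFT implementation (e.g., OpenCV's); the function names, the match representation, and the boundary criterion threshold are illustrative assumptions, not the paper's code.

```python
# Sketch of the square-window validation of SIFT correspondences: a match
# (p_k, p_{k+l}) is accepted only if the current-frame point lies inside a
# square window centered on the reference-frame point.

def count_valid_matches(ref_pts, cur_pts, matches, half_win):
    """Count matches whose current-frame point lies within a
    (2*half_win+1)-pixel square window around the reference point."""
    count = 0
    for i, j in matches:              # i indexes ref_pts, j indexes cur_pts
        rx, ry = ref_pts[i]
        cx, cy = cur_pts[j]
        if abs(cx - rx) <= half_win and abs(cy - ry) <= half_win:
            count += 1
    return count

def is_shot_boundary(ref_pts, cur_pts, matches, half_win, min_matches):
    # Few (or zero) validated matches between successive frames signals a
    # shot boundary; many matches means the frames share the same shot.
    return count_valid_matches(ref_pts, cur_pts, matches, half_win) < min_matches
```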
SBD using SIFT features has a drawback that its performance is degraded for images or videos
blurred by fast motion or containing uniform regions such as sky. Also, if the rotation of objects
exceeds some value, e.g., 30 degrees, matching by SIFT features gives incorrect results. This
section presents a key frame extraction method that prevents performance degradation caused by
drawbacks of the SIFT.
The amount of motion of objects, people, or background between successive frames, obtained in
SBD, can be computed from the magnitude of the MVs. The framewise motion information MI(k)
between successive frames can be defined by

MI(k) = \frac{1}{N_b} \sum_{i=1}^{N_b} \left( u_i^2 + v_i^2 \right)^{1/2}, \qquad (11)

where u_i and v_i signify the horizontal and vertical components of the MV for MB i, respectively,
and N_b denotes the number of MBs at frame k.
If the motion of objects or people is large, MI(k) is large; otherwise, MI(k) is small. Figure 3 shows
an example of the framewise motion information MI(k) as a function of the frame index of a test
video. The frame with the minimum MI(k) is selected first, and N_k key frames are chosen
adjacent to (before or after) the frame with the minimum MI(k), where N_k is an experimentally
selected even number. Thus, (N_k + 1) key frames are defined in each shot. This key frame
extraction method can prevent the performance degradation of the SIFT caused by motion blur or
rotation of objects. Selecting more than one key frame gives results robust to three-dimensional
(3-D) rotation of objects, because if objects or people move, observation of several successive
frames provides 3-D information about them.
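Equation (11) and the key-frame rule can be sketched as follows. Here `mvs` holds the (u_i, v_i) macroblock motion vectors of one frame; the function names and the clipping at shot boundaries are illustrative assumptions, not the authors' implementation.

```python
# Sketch of equation (11) and the key-frame selection rule: MI(k) is the
# mean MV magnitude over the N_b macroblocks of frame k, and the N_k frames
# adjacent to the minimum-motion frame are kept alongside it (N_k+1 total).
import math

def motion_information(mvs):
    """MI(k) for one frame: mvs is a list of (u_i, v_i) macroblock MVs."""
    return sum(math.hypot(u, v) for u, v in mvs) / len(mvs)

def select_key_frames(mi_per_frame, n_k):
    """Return up to N_k+1 key-frame indices centered on the minimum-MI
    frame of a shot, clipped to the shot range; n_k is an even number."""
    k_min = min(range(len(mi_per_frame)), key=lambda k: mi_per_frame[k])
    lo = max(0, k_min - n_k // 2)
    hi = min(len(mi_per_frame) - 1, k_min + n_k // 2)
    return list(range(lo, hi + 1))
```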
Figure 3. MI(k) as a function of frame index.
3.3. Video Indexing
This section proposes a video indexing method, in which the SIFT [15] is applied to key frames
that are extracted between shot boundaries detected by the SBD method [12]. Video indexing is to
automatically classify video content using features extracted from video. As explained previously,
a scene, which is a unit of content representation in video, consists of a number of shots. Figure 4
shows two example structures of scenes consisting of general shots, in which scene 1 consists of
shot a and shot b whereas scene 2 is composed of shot c, shot d, and shot e. Generally, different
shots have different objects, people, and background [15], however, within a limited range of
frames the same objects or people can be contained in different shots. Also, between different
scenes, different objects and people are contained.
Figure 4. Examples of scene structure.
The SIFT is applied to the (N_k + 1) key frames to compute the similarity between key frames of
the shots detected by the SBD method [12], where it is assumed that there exist the same or similar
objects, people, or background in two key frames. If the number of matched points (i.e., key points)
is larger than a threshold, it is assumed that the same or similar objects, people, or background are
contained in these key frames. Then, key frames are grouped by the similarity (i.e., by the number
of matched points) computed using the SIFT. By grouping similar shots, a scene is constructed.
Using the SIFT, this paper detects different objects and people by comparing SIFT feature points
that are contained in key frames within the limited range of shots. The SIFT is robust to scale
change of objects; however, it is sensitive to motion blur and 3-D rotation of objects. Thus, in each
shot, we first detect the key frame with the lowest amount of motion and select N_k adjacent
frames, i.e., a total of (N_k + 1) key frames in each shot. For fast-motion video, the N_k adjacent
frames give blurred or rotated versions of the object in the minimum-motion key frame; thus,
similarity computation between shots gives more accurate results by using these N_k additional
frames. Similarity computation between key frames is performed by matching feature points
detected by the SIFT.
Figure 5 shows an example of video indexing. The SIFT is applied to the key frame set of shot S
(containing N_S key frames) and the key frame set of shot S+l (containing N_{S+l} key frames),
where -N_2/2 \le l \le N_2/2 with N_2 being an even constant. The similarity at (S, S+l) is
computed using the N_S \times N_{S+l} possible matchings. The relation between shot S and shot
S+l is represented by the number of matched key points. If the number is larger than a predefined
threshold value, shot S and shot S+l are similar shots containing the same or similar objects,
people, or background. In this case, the two similar shots are grouped into the same scene. Even if
two shots are not temporally adjacent (i.e., separated by l), they may contain the same or similar
objects, people, or background, as illustrated in Figure 4(b).
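The shot-pair similarity test just described can be sketched as follows. The `match_points` callable stands in for SIFT key-frame matching, and all names are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the shot-pair similarity test: shot S is compared with shots
# S+l inside the range N_2, matching all key-frame pairs, and a pair is
# declared similar when the total matched key points exceed a threshold.

def shot_similarity(key_frames_a, key_frames_b, match_points):
    """Total matched key points over all N_S x N_{S+l} key-frame pairs."""
    return sum(match_points(fa, fb)
               for fa in key_frames_a for fb in key_frames_b)

def similar_shot_pairs(shots, n2, threshold, match_points):
    """Return (s, s+l) pairs whose similarity exceeds the threshold;
    similarity is symmetric, so only l > 0 needs to be scanned."""
    pairs = []
    for s in range(len(shots)):
        for l in range(1, n2 // 2 + 1):
            if s + l < len(shots):
                sim = shot_similarity(shots[s], shots[s + l], match_points)
                if sim > threshold:
                    pairs.append((s, s + l))
    return pairs
```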
Figure 5. Video indexing.
It is difficult to select an appropriate value of the shot range N_2. For example, if the same objects,
people, or background occur frequently in the video, a large N_2 gives a high probability that a large
number of frames (or a number of the same/similar shots) are merged into the same scene.
Otherwise, a scene contains relatively different shots. In this paper, the empirical value
N_2 = 20 is used, as in existing methods.
To explain the procedure of the proposed method more clearly, pseudo code is shown in the
following:
Step 1. Shot Change Detection
WHILE with Whole Frames
Calculate DFD via ME for Shot Change Detection
Calculate Similarity of Motion Direction with Previous Frame
IF DFD & Motion Direction Similarity > THRESHOLD_0
Declare Shot Change
END IF
IF MIN(DFD)
Select Key Frame
END IF
END of WHILE
Step 2. Descriptor Calculation Using SIFT
WHILE with Whole Key Frames
Do SIFT
END of WHILE
Step 3. Grouping Shots with Descriptors
WHILE with Whole Key Frames
WHILE with Key Frames of Each Shot
Calculate Similarity of Each Key Frame by Comparing Descriptors of Key Frames
IF Similarity > THRESHOLD_1
Do Grouping Shots
END IF
END of WHILE
END of WHILE
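The grouping in Step 3 can be realized, for instance, with a union-find over shots: every similar shot pair is merged, and the resulting components become the scenes. This structure is a generic sketch of the grouping step, not the paper's stated implementation; the pair list would come from the SIFT similarity test.

```python
# Union-find sketch of shot grouping: similar shot pairs are merged and
# the connected components are reported as scenes.

def group_shots(num_shots, similar_pairs):
    parent = list(range(num_shots))

    def find(x):                      # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:        # union every similar shot pair
        parent[find(a)] = find(b)

    scenes = {}
    for s in range(num_shots):        # collect shots per component root
        scenes.setdefault(find(s), []).append(s)
    return sorted(scenes.values())
```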
4. EXPERIMENTAL RESULTS AND DISCUSSIONS
This section presents experimental results and discussions of the proposed video indexing
method, in which SBD is performed using our previous SBD method based on blockwise motion-
based features [12]. Experimental procedure and results of SBD are described briefly first,
followed by experimental results of video indexing.
Our previous SBD method for fast-motion video is applied to seven test videos as used in [12],
which consist of 8,000-15,000 frames sampled at 24 frames per second (fps), and a full search
BMA is used to estimate blockwise MV. For each sequence, a human observer identifies the shot
boundaries as the ground truth. The problem of SBD using low-level image features such as color
or motion information is inherently heuristic and it is not easy to mathematically justify the
optimal selection of parameter values. With the test video sequences used, the parameter set (C1,
C2, C3) that minimizes the detection error is experimentally selected as (3.1, 1.5, 6). The size N_k
of the temporal sliding window is defined as the maximum size that does not exceed the average
shot length [4], for which the optimal size N_k = 12 is experimentally selected. Note that our
previous SBD algorithm gives high detection performances for all test sequences used in
experiments, because of the motion-based discontinuity measure that is suitable for detecting shot
boundaries in fast-motion video such as action movies or sports clip [12].
Using SBD results, the performance of the proposed video indexing method is shown. Figures 6-
10 show experimental results of five test videos by the proposed video indexing method, in which
shot grouping results are illustrated. Horizontal and vertical axes show shot indices, and the
number of matched key points between two shots, i.e., the similarity between two shots, is
illustrated in two-dimensional (2-D) and 3-D representations. In each figure, the similarity between
two shots is represented as a gray value, e.g., the similarity between shots 10 and 20 is represented
as the gray level at (10, 20). The darker the point, the larger the similarity between the two shots.
Note that dark points gather mostly along the diagonal line, i.e., large similarity exists between
adjacent shots.
Video indexing is performed using test videos with shot detection results shown in [12]. For
example, test video 1 consists of 131 shots, and the similarity between each pair of shots is shown
in Figures 6(a) and 6(b) in 2-D and 3-D plots, respectively. Large similarity values occur along
the diagonal line, i.e., adjacent shots have large similarity values. As observed from the 2-D and
3-D plots of similarity, shots between 85 and 95 have large similarity values. If the similarity
value is larger than the pre-specified threshold value, then these 11 shots can be combined
together as the same scene as shown in Figure 4. Representation of scene structure of test video 1
depends on the parameter values such as the number N_k of adjacent frames in key frame detection
and the threshold for similarity values between a pair of shots.
(a) (b)
Figure 6. Result of video indexing (test video 1). (a) Similarity between two shots represented as
gray levels. (b) 3-D plot of the similarity between two shots.
Similarly, video indexing results with test videos 2, 3, 4, and 5 are shown in Figures 7, 8, 9, and
10, respectively, in 2-D and 3-D representations. Test videos 2, 3, 4, and 5 consist of 91, 111, 71,
and 131 shots, respectively, in which the number of shots depends on the characteristics of the
test video and parameter values used in SBD. Final scene structure of the test video depends on
the parameter values used in video indexing. Note that the similarity values along the diagonal
lines have different cluster patterns depending on the characteristics of the test video, i.e.,
illustrating different scene structure of different test videos.
(a) (b)
Figure 7. Result of video indexing (test video 2). (a) Similarity between two shots represented as
gray levels. (b) 3-D plot of the similarity between two shots.
(a) (b)
Figure 8. Result of video indexing (test video 3). (a) Similarity between two shots represented as
gray levels. (b) 3-D plot of the similarity between two shots.
(a) (b)
Figure 9. Result of video indexing (test video 4). (a) Similarity between two shots represented as
gray levels. (b) 3-D plot of the similarity between two shots.
(a) (b)
Figure 10. Result of video indexing (test video 5). (a) Similarity between two shots represented as
gray levels. (b) 3-D plot of the similarity between two shots.
Different video indexing methods have many heuristic parameters, depending on the characteristics
of the test video and the application. Because of the different heuristics used in each video indexing
method, it is difficult to directly compare the experimental results of the proposed video indexing
method with those of existing methods.
In the proposed algorithm, computational steps such as ME, detection of SIFT feature descriptors,
and matching between SIFT descriptors are performed. Among them the most computationally
expensive step is ME, whose computational complexity is O(n), where n is the number of frames
in the video.
In summary, this paper proposes a video indexing method suitable for fast-motion video with
motion blur or rotation of objects. For SBD, a number of frames adjacent to the key frame with
the lowest amount of motion are used in the SIFT feature matching for robustness to motion blur
due to fast motion or to 3-D rotation of objects in frames.
5. CONCLUSIONS
This paper proposes an efficient video indexing method for fast-motion video. First, an SBD
method robust to fast motion is used, in which two motion-based features such as the modified
DFD based on the BMA and the blockwise motion similarity are used. Then, a number of key
frames are detected in each shot and applied to video indexing using the SIFT feature matching,
which is robust to size and illumination change. The proposed video indexing algorithm gives
good results because SBD based on motion similarity information and modified DFD is
performed by considering the motion of objects, people, and background. Future research will focus
on developing an SBD method that is effective for different types of shot boundaries and on the
investigation of a shot grouping method that effectively summarizes fast-motion content.
REFERENCES
[1] Z. Rasheed and M. Shah, “Scene boundary detection in Hollywood movies and TV shows,” in Proc.
Computer Vision Pattern Recognition, vol. 2, pp. 343-348, Madison, WI, USA, June 2003.
[2] H. Liu, W. Meng, and Z. Liu, “Key frame extraction of online video based on optimized frame
difference,” in Proc. 2012 9th Int. Conf. Fuzzy Systems and Knowledge Discovery, pp. 1238-1242,
2012.
[3] W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank, “A survey on visual content-based video indexing
and retrieval,” IEEE Trans. Systems, Man, and Cybernetics–Part C: Applications and Reviews, vol.
41, no. 6, pp. 797-819, Nov. 2011.
[4] F. Moschetti, M. Kunt, and E. Debes, “A statistical adaptive block-matching motion estimation,”
IEEE Trans. Circuits Systems for Video Technology, vol. 13, no. 5, pp. 417-431, May 2003.
[5] C. W. Ngo, T. C. Pong, and H. J. Zhang, “Motion analysis and segmentation through spatio-temporal
slices processing,” IEEE Trans. Image Processing, vol. 12, no. 3, pp. 341-355, Mar. 2003.
[6] C. Sujatha and U. Mudenagudi, “A study on key frame extraction methods for video summary,” in
Proc. 2011 Int. Conf. Computational Intelligence and Communication Systems, pp. 73-77, Gwalior,
India, Oct. 2011.
[7] P. Kelm, S. Schmiedeke, and T. Sikora, “Feature-based video key frame extraction for low quality
video sequences,” in Proc. 10th Workshop Image Analysis for Multimedia Interactive Services, pp.
25-28, London, United Kingdom, May 2009.
[8] Z. Sun, K. Jia, and H. Chen, “Video key frame extraction based on spatial-temporal color
distribution,” in Proc. Int. Conf. Intelligent Information Hiding and Multimedia Signal Processing, pp.
196-199, Harbin, China, Aug. 2008.
[9] M. Sameh, S. Wided, A. Beghdadi, and C. B. Amar, “Video indexing using salient region based
spatio-temporal segmentation approach,” in Proc. 2012 Int. Conf. Multimedia Computing and
Systems, pp. 170-173, Tangier, Morocco, May 2012.
[10] Z.-J. Zha, M. Wang, Y.-T. Zheng, Y. Yang, R. Hong, and T.-S. Chua, “Interactive video indexing
with statistical active learning,” IEEE Trans. Multimedia, vol. 14, no. 1, pp. 17-27, Feb. 2012.
[11] S. Zhu and Y. Liu, “Scene segmentation and semantic representation for high-level retrieval,” IEEE
Signal Processing Letters, vol. 15, no. 1, pp. 713-716, 2008.
[12] M.-H. Park, R.-H. Park, and S. W. Lee, “Efficient shot boundary detection for action movies using
blockwise motion-based features,” Lecture Notes in Computer Science, Advances in Visual
Computing: First Int. Symposium, ISVC 2005, Eds. G. Bebis et al., vol. 3804, pp. 478-485, Dec.
2005.
[13] J.-Y. Kim, R.-H. Park, and S. Yang, “Block-based motion estimation using the pixelwise
classification of the motion compensation error,” in Proc. 2005 Digest of Technical Papers Int. Conf.
Consumer Electronics, pp. 259-260, Las Vegas, NV, USA, Jan. 2005.
[14] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. Journal of Computer
Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
[15] M.-H. Park, R.-H. Park, and S. W. Lee, “Shot boundary detection using scale invariant feature
transform,” in Proc. SPIE Visual Communications and Image Processing 2006, pp. 478-485, San
Jose, CA, USA, Jan. 2006.
[16] Y. Huo, P. Zhang, and Y. Wang, “Adaptive threshold video shot boundary detection algorithm based
on progressive bisection strategy,” Journal of Information & Computational Science, vol. 11, no. 2,
pp. 391-403, Jan. 2014.
[17] D. J. Lan, Y. F. Ma, and H. J. Zhang, “A novel motion-based representation for video mining,” in
Proc. Int. Conf. Multimedia and Expo, vol. 3, pp. 469-472, Baltimore, MD, USA, 2003.
[18] Y. F. Ma and H. J. Zhang, “Motion texture: A new motion based video representation,” in Proc. Int.
Conf. Pattern Recognition, vol. 2, pp. 548-551, Quebec, QU, Canada, 2002.
Authors
Min-Ho Park received the B.S. degree in electronic engineering from Dongguk University, Seoul, Korea, in
2004. He received the M.S. degree in electronic engineering from Sogang University, Seoul, Korea, in
2006. In 2006, he joined video team in the Research & Development of PIXTREE, Inc. as senior engineer.
He has been a senior software engineer in SmartTV Platform team in the Research & Development of LG
Electronics Inc. since 2010. His research interests include video processing, analysis, indexing, and
compression.
Rae-Hong Park received the B.S. and M.S. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1976 and 1979, respectively, and the M.S. and Ph.D. degrees in electrical
engineering from Stanford University, Stanford, CA, in 1981 and 1984, respectively. In 1984, he joined the
faculty of the Department of Electronic Engineering, Sogang University, Seoul, Korea, where he is
currently a Professor. In 1990, he spent his sabbatical year as a Visiting Associate Professor with the
Computer Vision Laboratory, Center for Automation Research, University of Maryland at College Park. In
2001 and 2004, he spent sabbatical semesters at Digital Media Research and Development Center (DTV
image/video enhancement), Samsung Electronics Co., Ltd. In 2012, he spent a sabbatical year in Digital
Imaging Business (R&D Team) and Visual Display Business (R&D Office), Samsung Electronics Co., Ltd.
His current research interests are video communication, computer vision, and pattern recognition. He
served as Editor for the Korea Institute of Telematics and Electronics (KITE) Journal of Electronics
Engineering from 1995 to 1996. Dr. Park was the recipient of a 1990 Post-Doctoral Fellowship presented
by the Korea Science and Engineering Foundation (KOSEF), the 1987 Academic Award presented by the
KITE, the 2000 Haedong Paper Award presented by the Institute of Electronics Engineers of Korea (IEEK),
the 1997 First Sogang Academic Award, and the 1999 Professor Achievement Excellence Award presented
by Sogang University. He is a co-recipient of the Best Student Paper Award of the IEEE Int. Symp.
Multimedia (ISM 2006) and IEEE Int. Symp. Consumer Electronics (ISCE 2011).