This document compares several block-matching motion estimation algorithms. It introduces block-matching motion estimation and describes common distortion metrics such as the mean squared error (MSE) and the sum of absolute differences (SAD). It then explains the full-search algorithm, which evaluates every candidate block in the search window. It also covers faster algorithms such as three-step search and four-step search, which evaluate fewer candidate blocks to reduce computational cost. The algorithms are evaluated and compared on video test sequences in terms of both computational performance and prediction quality.
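The distortion metrics and the full-search strategy described above can be illustrated with a minimal Python/NumPy sketch. It assumes grayscale frames held as 2-D NumPy arrays, 8×8 blocks and a ±7 pixel search range. These parameters, the function names and the use of PSNR as the quality measure are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences (SAD) between two equally sized blocks."""
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())

def mse(block_a, block_b):
    """Mean squared error (MSE) between two equally sized blocks."""
    diff = block_a.astype(np.float64) - block_b.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (8-bit samples assumed by default)."""
    err = mse(original, reconstructed)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

def full_search(cur, ref, top, left, block=8, p=7, metric=sad):
    """Exhaustive block matching over every displacement in [-p, p] x [-p, p].

    Returns ((dy, dx), best_cost, candidates_evaluated). Candidate blocks that
    would fall outside the reference frame are skipped.
    """
    target = cur[top:top + block, left:left + block]
    h, w = ref.shape
    best, best_cost, evaluated = (0, 0), None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block lies partly outside the frame
            cost = metric(target, ref[y:y + block, x:x + block])
            evaluated += 1
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost, evaluated

# Usage: the current frame is the reference moved 2 rows down and 3 columns left,
# so the block at (24, 24) is found at displacement (-2, +3) in the reference.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
vector, cost, n = full_search(cur, ref, top=24, left=24)
print(vector, cost, n)  # (-2, 3) 0 225
```

Full search is optimal within the window by construction, which is why it serves as the usual reference. The price is evaluating (2p + 1)² candidates per block, which is 225 for p = 7.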
A Cells Segmentation Approach in Epithelial Tissue using Histology Images by ...
A Comparison of Block-Matching Motion Estimation Algorithms
1. A Comparison of Block-Matching Motion Estimation Algorithms
María Santamaría and María Trujillo
October 4th 2012
Séptimo Congreso Colombiano de Computación, 7CCC 2012, Medellín - Colombia
2. Multimedia and Vision Laboratory
MMV is a research group of the Universidad del Valle in Cali, Colombia
[Figure: M. Santamaría and M. Trujillo; diagram relating the 3D world and 2D images through the camera system (optics problem), with computer vision as the inverse problem]
Multimedia and Vision Research Laboratory: http://mmv-lab.univalle.edu.co
A Comparison of Block-Matching Motion Estimation Algorithms, 7CCC 2012, Medellín - Colombia Slide 2
3. Content
Motivation
Motion Estimation
Block-Matching
Distortion Metrics
Selected Algorithms
Evaluation
Quality Metrics
Performance Metrics
Video Test Sequences
Results
Final Remarks
4. Motivation
Applications: video coding, tracking, 3D TV, gesture recognition and resolution enhancement
http://www.encodedmedia.com/
http://assets.vr-zone.net/15416/LGTV.jpg
http://csecar.wordpress.com/
http://www.newelectronics.co.uk/electronics-news/qualcomm-invests-in-gesture-recognition-technology/35620/
http://users.soe.ucsc.edu/~milanfar/research/resolution-enhancement.html
5. Motion Estimation
[Figure: video frames are the input to motion estimation, which produces motion vectors]
6. Block-Matching
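[Figure: the current block of the current frame is compared with candidate blocks inside a search area of the reference frame; the displacement of the best matched block is the motion vector]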
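The slide shows how a single motion vector is found. The slides do not spell out how the vectors are used afterwards. The sketch below makes some assumptions for illustration: frames are stored as lists of pixel rows, a dictionary maps each block's top-left corner to its motion vector, and the helper is named `predict_frame`. Under those assumptions, a motion-compensated prediction of the current frame copies each block's best matched block from the reference frame:

```python
def predict_frame(ref, vectors, n):
    """Motion-compensated prediction of the current frame (illustrative sketch).

    `vectors` maps the top-left corner (bx, by) of each n x n block of the
    current frame to its motion vector (dx, dy). This sketch assumes every
    displaced block lies inside the reference frame (no bounds checking)."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        for i in range(n):
            for j in range(n):
                # Copy the best matched block from the reference frame.
                pred[by + i][bx + j] = ref[by + dy + i][bx + dx + j]
    return pred


ref = [[4 * r + c for c in range(4)] for r in range(4)]
vectors = {(0, 0): (0, 0), (2, 0): (-2, 0), (0, 2): (2, 0), (2, 2): (0, -2)}
for row in predict_frame(ref, vectors, n=2):
    print(row)
# [0, 1, 0, 1]
# [4, 5, 4, 5]
# [10, 11, 2, 3]
# [14, 15, 6, 7]
```

A prediction of this kind can then be compared with the actual current frame using quality metrics such as PSNR or SSIM.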
7. Distortion Metrics
The two most popular measures of the match between two blocks are the Mean Square Error (MSE) and the Sum of Absolute Differences (SAD)
[Figure: distortion as a function of the candidate displacement (x, y)]
B. Xiong and C. Zhu, “A new multiplication-free block matching criterion,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 10, 2008
Elliott J. Rouse. A virtual curriculum vitae. http://www.elliottjrouse.com/
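Let C be an N x N current block and R a candidate block. The two metrics are SAD = Σ |C(i,j) - R(i,j)| and MSE = (1/N²) Σ (C(i,j) - R(i,j))², with both sums taken over all pixels (i, j) of the block. SAD needs no multiplications, which is one reason it is widely used in block matching. Below is a minimal sketch with blocks stored as lists of pixel rows; the function names are illustrative:

```python
def sad(cur, ref):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(c - r)
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))


def mse(cur, ref):
    """Mean Square Error between two equally sized blocks."""
    n = len(cur) * len(cur[0])
    return sum((c - r) ** 2
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r)) / n


current_block = [[10, 12], [14, 16]]
candidate_block = [[11, 12], [10, 18]]
print(sad(current_block, candidate_block))  # 7    (1 + 0 + 4 + 2)
print(mse(current_block, candidate_block))  # 5.25 ((1 + 0 + 16 + 4) / 4)
```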
8. Full-Search (FS)
The Full-Search algorithm evaluates all positions in a search window of (2W+1) x (2W+1) positions, where W is the search range (the maximum displacement)
It involves a high computational cost
It is simple
It is guaranteed to find the best match (minimum distortion) within the search window
[Figure: Full-Search, 1st stage and best matched block]
Y. Huzka and P. Kulla, “Trends in Block-matching Motion Estimation Algorithms,” 2004
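Below is a minimal sketch of Full-Search. It assumes SAD as the distortion metric and frames stored as lists of pixel rows. A candidate displacement (dx, dy) refers to the block whose top-left corner is (bx + dx, by + dy) in the reference frame. The helper names and the synthetic random frame are assumptions for illustration, and candidates that fall outside the frame get an infinite cost:

```python
import math
import random


def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block displaced
    by (dx, dy) in `ref`; infinity if the displaced block leaves the frame."""
    x, y = bx + dx, by + dy
    if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
        return math.inf
    return sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, bx, by, n, w):
    """Check every displacement in the (2w+1) x (2w+1) window.
    Returns the best motion vector and the number of explored positions."""
    best_mv, best_cost, explored = (0, 0), math.inf, 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            explored += 1
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, explored


# Synthetic test: the current frame is built so that the 8 x 8 block at
# (12, 12) is found at displacement (3, -2) in the reference frame.
rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
cur = [[ref[y - 2][x + 3] if 0 <= y - 2 < 32 and 0 <= x + 3 < 32 else 0
        for x in range(32)] for y in range(32)]
mv, explored = full_search(cur, ref, bx=12, by=12, n=8, w=7)
print(mv, explored)  # (3, -2) 225
```

The counter shows the (2W+1)² cost. With W = 7, every block requires 225 SAD evaluations whatever the motion.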
9. Three-Step Search (3SS)
[Figure: 3SS, search centre and 1st stage]
T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion Compensated Interframe Coding for Video Conferencing,” Proc. Nat. Telecommun. Conf., 1981
10. Three-Step Search (3SS)
[Figure: 3SS, 1st and 2nd stages and best candidate]
11. Three-Step Search (3SS)
[Figure: 3SS, 1st to 3rd stages and best candidate]
12. Three-Step Search (3SS)
The number of stages depends on the initial step size, i.e. the distance from the search centre at which the first 9 neighbors are selected
[Figure: 3SS, 1st to 3rd stages and best matched block]
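Below is a minimal sketch of the three-step search. It assumes SAD, frames as lists of pixel rows, and hypothetical helper names. The cost function caches each displacement, so the size of the cache counts the explored blocks. With an initial step of 4 the steps are 4, 2 and 1. That gives three stages and a maximum displacement of 7; a larger initial step adds stages, as the slide notes. The test frame is random texture with no motion, i.e. the zero motion vector (ZMV) case:

```python
import math
import random


def make_cost(cur, ref, bx, by, n):
    """Cached SAD for the n x n block at (bx, by); each distinct displacement
    is computed once, so len(cache) is the number of explored blocks."""
    cache = {}

    def cost(dx, dy):
        if (dx, dy) not in cache:
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
                cache[(dx, dy)] = math.inf  # candidate outside the frame
            else:
                cache[(dx, dy)] = sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
                                      for i in range(n) for j in range(n))
        return cache[(dx, dy)]

    return cost, cache


def three_step_search(cur, ref, bx, by, n, step=4):
    cost, cache = make_cost(cur, ref, bx, by, n)
    cx, cy = 0, 0
    while step >= 1:
        # Centre first, so that ties are resolved in favour of the centre.
        candidates = [(cx, cy)] + [(cx + sx * step, cy + sy * step)
                                   for sy in (-1, 0, 1) for sx in (-1, 0, 1)
                                   if (sx, sy) != (0, 0)]
        cx, cy = min(candidates, key=lambda p: cost(*p))
        step //= 2  # halve the step size: 4 -> 2 -> 1
    return (cx, cy), len(cache)


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
cur = [row[:] for row in ref]  # zero motion vector (ZMV) case
mv, explored = three_step_search(cur, ref, bx=12, by=12, n=8)
print(mv, explored)  # (0, 0) 25
```

The stage-by-stage descent assumes that the distortion decreases towards the best match. When it does not, 3SS can stop at a local minimum, which Full-Search avoids at a much higher cost.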
13. Four-Step Search (4SS)
[Figure: 4SS, search centre and 1st stage]
L.-M. Po and W.-C. Ma, “A novel four-step search algorithm for fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, 1996
14. Four-Step Search (4SS)
[Figure: 4SS, 1st and 2nd stages and best candidate]
15. Four-Step Search (4SS)
[Figure: 4SS, 1st to 3rd stages and best candidate]
16. Four-Step Search (4SS)
[Figure: 4SS, 1st to 4th stages and best candidate]
17. Four-Step Search (4SS)
Each new stage (except the reduced step stage) evaluates three or five new blocks
[Figure: 4SS, 1st to 4th stages and best matched block]
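Below is a minimal sketch of the four-step search following Po and Ma's description. It runs up to three nine-point 5 x 5 stages with step 2, then a 3 x 3 reduced step stage. Again it assumes SAD and hypothetical helper names. A 5 x 5 stage that keeps the minimum at its centre jumps straight to the reduced step stage:

```python
import math
import random


def make_cost(cur, ref, bx, by, n):
    """Cached SAD for the n x n block at (bx, by); len(cache) counts explored blocks."""
    cache = {}

    def cost(dx, dy):
        if (dx, dy) not in cache:
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
                cache[(dx, dy)] = math.inf  # candidate outside the frame
            else:
                cache[(dx, dy)] = sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
                                      for i in range(n) for j in range(n))
        return cache[(dx, dy)]

    return cost, cache


def neighbours(cx, cy, step):
    """The centre (listed first, so ties favour it) and its 8 neighbours at `step`."""
    return [(cx, cy)] + [(cx + sx * step, cy + sy * step)
                         for sy in (-1, 0, 1) for sx in (-1, 0, 1)
                         if (sx, sy) != (0, 0)]


def four_step_search(cur, ref, bx, by, n):
    cost, cache = make_cost(cur, ref, bx, by, n)
    cx, cy = 0, 0
    # Stages 1-3: nine-point pattern with step 2 (a 5 x 5 window).
    for _ in range(3):
        best = min(neighbours(cx, cy, 2), key=lambda p: cost(*p))
        if best == (cx, cy):
            break  # minimum at the centre: go to the reduced step stage
        cx, cy = best  # the next 5 x 5 pattern adds only 3 or 5 new points
    # Stage 4 (reduced step): nine-point pattern with step 1.
    cx, cy = min(neighbours(cx, cy, 1), key=lambda p: cost(*p))
    return (cx, cy), len(cache)


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
cur = [row[:] for row in ref]  # zero motion vector (ZMV) case
mv, explored = four_step_search(cur, ref, bx=12, by=12, n=8)
print(mv, explored)  # (0, 0) 17
```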
18. Diamond Search (DS)
[Figure: DS, search centre and 1st stage]
J. Y. Tham, S. Ranganath, M. Ranganath, and A. A. Kassim, “A novel unrestricted center-biased diamond search algorithm for block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 4, 1998
19. Diamond Search (DS)
[Figure: DS, 1st and 2nd stages and best candidate]
20. Diamond Search (DS)
[Figure: DS, 1st to 3rd stages and best candidate]
21. Diamond Search (DS)
[Figure: DS, 1st to 4th stages and best candidate]
22. Diamond Search (DS)
Each new stage (except the reduced step stage) evaluates four or five blocks
The neighbors are selected at mixed distances from the search centre
[Figure: DS, 1st to 4th stages and best matched block]
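Below is a minimal sketch of a diamond search. It uses the large/small diamond formulation: a 9-point large diamond is repeated until its centre is the best point, then a 5-point small diamond refines the result. This is a common form of DS and may differ in detail from Tham et al.'s unrestricted centre-biased variant. SAD, the helper names and the `max_moves` safety bound are assumptions:

```python
import math
import random

# Large and small diamond patterns, centre first so ties favour the centre.
LDSP = [(0, 0), (0, -2), (-1, -1), (1, -1), (-2, 0), (2, 0), (-1, 1), (1, 1), (0, 2)]
SDSP = [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]


def make_cost(cur, ref, bx, by, n):
    """Cached SAD for the n x n block at (bx, by); len(cache) counts explored blocks."""
    cache = {}

    def cost(dx, dy):
        if (dx, dy) not in cache:
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
                cache[(dx, dy)] = math.inf  # candidate outside the frame
            else:
                cache[(dx, dy)] = sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
                                      for i in range(n) for j in range(n))
        return cache[(dx, dy)]

    return cost, cache


def diamond_search(cur, ref, bx, by, n, max_moves=16):
    cost, cache = make_cost(cur, ref, bx, by, n)
    cx, cy = 0, 0
    for _ in range(max_moves):  # safety bound for this sketch
        best = min([(cx + dx, cy + dy) for dx, dy in LDSP], key=lambda p: cost(*p))
        if best == (cx, cy):
            break  # centre is the best point: switch to the small diamond
        cx, cy = best
    # Reduced step stage: small diamond around the final centre.
    cx, cy = min([(cx + dx, cy + dy) for dx, dy in SDSP], key=lambda p: cost(*p))
    return (cx, cy), len(cache)


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
cur = [row[:] for row in ref]  # zero motion vector (ZMV) case
mv, explored = diamond_search(cur, ref, bx=12, by=12, n=8)
print(mv, explored)  # (0, 0) 13
```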
23. Hexagonal Block Search (HEXBS)
[Figure: HEXBS, search centre and 1st stage]
C.-H. Cheung and L.-M. Po, “Novel cross-diamond-hexagonal search algorithms for fast block motion estimation,” IEEE Trans. Multimedia, vol. 7, no. 1, 2005
24. Hexagonal Block Search (HEXBS)
[Figure: HEXBS, 1st and 2nd stages and best candidate]
25. Hexagonal Block Search (HEXBS)
[Figure: HEXBS, 1st to 3rd stages and best candidate]
26. Hexagonal Block Search (HEXBS)
[Figure: HEXBS, 1st to 4th stages and best candidate]
27. Hexagonal Block Search (HEXBS)
Each new stage (except the reduced step stage) evaluates three new blocks
It is faster than the DS, but has a lower prediction quality
[Figure: HEXBS, 1st to 4th stages and best matched block]
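Below is a minimal sketch of hexagon-based search under the same assumptions (SAD, hypothetical helper names). A 7-point large hexagon is repeated until its centre is the best point, then a 5-point small pattern refines the result. Consecutive hexagons share four points, so each move evaluates only three new blocks, which matches the slide:

```python
import math
import random

# Large hexagon and small refinement pattern, centre first so ties favour it.
LARGE_HEX = [(0, 0), (-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL_PATTERN = [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]


def make_cost(cur, ref, bx, by, n):
    """Cached SAD for the n x n block at (bx, by); len(cache) counts explored blocks."""
    cache = {}

    def cost(dx, dy):
        if (dx, dy) not in cache:
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
                cache[(dx, dy)] = math.inf  # candidate outside the frame
            else:
                cache[(dx, dy)] = sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
                                      for i in range(n) for j in range(n))
        return cache[(dx, dy)]

    return cost, cache


def hexagon_search(cur, ref, bx, by, n, max_moves=16):
    cost, cache = make_cost(cur, ref, bx, by, n)
    cx, cy = 0, 0
    for _ in range(max_moves):  # safety bound for this sketch
        best = min([(cx + dx, cy + dy) for dx, dy in LARGE_HEX], key=lambda p: cost(*p))
        if best == (cx, cy):
            break  # centre is the best point of the hexagon
        cx, cy = best  # the next hexagon adds only 3 new points
    # Reduced step stage: small pattern around the final centre.
    cx, cy = min([(cx + dx, cy + dy) for dx, dy in SMALL_PATTERN], key=lambda p: cost(*p))
    return (cx, cy), len(cache)


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
cur = [row[:] for row in ref]  # zero motion vector (ZMV) case
mv, explored = hexagon_search(cur, ref, bx=12, by=12, n=8)
print(mv, explored)  # (0, 0) 11
```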
28. Multi-Directional Gradient Descent Search (MDGDS)
[Figure: MDGDS, search centre and 1st stage, with the search directions numbered 1 to 8]
L.-M. Po, K.-H. Ng, K.-M. Wong, and K.-W. Cheung, “Multi-direction search algorithm for block-based motion estimation,” in IEEE Asia Pacific Conf. on Circuits and Systems (APCCAS), 2008
29. Multi-Directional Gradient Descent Search (MDGDS)
[Figure: MDGDS, 1st and 2nd stages and best candidate]
30. Multi-Directional Gradient Descent Search (MDGDS)
[Figure: MDGDS, 1st to 3rd stages and best candidate]
31. Multi-Directional Gradient Descent Search (MDGDS)
[Figure: MDGDS, 1st to 4th stages and best candidate]
32. Multi-Directional Gradient Descent Search (MDGDS)
It is designed to avoid being trapped in a local minimum
[Figure: MDGDS, 1st to 4th stages and best matched block]
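The MDGDS pattern is only shown graphically here, so the following is not Po et al.'s algorithm. It is a simplified illustration of the idea on the slide. Instead of a single greedy descent, several one-directional descents start from the search centre (eight compass directions in this sketch), and the best end point is then refined. Exploring several directions makes it less likely that the search stops in the first local minimum it meets. All names, the direction set, the range `w` and the refinement step are assumptions:

```python
import math
import random

DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
CROSS = [(0, -1), (-1, 0), (1, 0), (0, 1)]


def make_cost(cur, ref, bx, by, n):
    """Cached SAD for the n x n block at (bx, by); len(cache) counts explored blocks."""
    cache = {}

    def cost(dx, dy):
        if (dx, dy) not in cache:
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
                cache[(dx, dy)] = math.inf  # candidate outside the frame
            else:
                cache[(dx, dy)] = sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
                                      for i in range(n) for j in range(n))
        return cache[(dx, dy)]

    return cost, cache


def multi_direction_descent(cur, ref, bx, by, n, w=7):
    cost, cache = make_cost(cur, ref, bx, by, n)
    best = (0, 0)
    for ux, uy in DIRECTIONS:
        # Independent descent along one direction: step while the SAD decreases.
        px, py = 0, 0
        while (max(abs(px + ux), abs(py + uy)) <= w
               and cost(px + ux, py + uy) < cost(px, py)):
            px, py = px + ux, py + uy
        if cost(px, py) < cost(*best):
            best = (px, py)
    # One local refinement step around the best end point.
    best = min([best] + [(best[0] + dx, best[1] + dy) for dx, dy in CROSS],
               key=lambda p: cost(*p))
    return best, len(cache)


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
cur = [row[:] for row in ref]  # zero motion vector (ZMV) case
mv, explored = multi_direction_descent(cur, ref, bx=12, by=12, n=8)
print(mv, explored)  # (0, 0) 9
```

In the ZMV case each direction stops after one probe, so this sketch happens to explore 9 blocks. That should not be read as a derivation of the MDGDS value in the EXB table.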
33. Fast Directional Gradient Descent Search (FDGDS)
It is an improvement of the MDGDS that increases the speed of the algorithm with little loss in prediction quality
[Figure: FDGDS, search centre and 1st stage with numbered search directions; panel labelled Relative Distortion Ratio]
L.-M. Po, K.-H. Ng, K.-W. Cheung, K.-M. Wong, Y. Uddin, and C.-W. Ting, “Novel Directional Gradient Descent Searches for Fast Block Motion Estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 8, 2009
34. Fast Directional Gradient Descent Search (FDGDS)
[Figure: FDGDS, 1st and 2nd stages and best candidate]
35. Fast Directional Gradient Descent Search (FDGDS)
[Figure: FDGDS, 1st to 3rd stages and best candidate]
36. Fast Directional Gradient Descent Search (FDGDS)
[Figure: FDGDS, 1st to 3rd stages and best matched block]
37. Quality Metrics
Peak Signal-to-Noise Ratio (PSNR)
It is a point-to-point metric
It is based on squared differences
It is not very well matched to perceived visual quality
Structural Similarity Index (SSIM)
It is a windowed metric
It is based on luminance, contrast and structure comparisons between an original and a distorted image
It takes into account the visual perception of the image
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, 2004
C. S. Varnan, A. Jagan, J. Kaur, D. Jyoti, and D. S. Rao, “Image quality assessment techniques on spatial domain,” International Journal on Computer Science and Technology, vol. 2, no. 3, 2011
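Below is a minimal sketch of both metrics for 8-bit images stored as lists of rows, using PSNR = 10 log10(255² / MSE). For brevity, SSIM is evaluated over a single window covering the whole image. The windowed metric of Wang et al. computes the same expression over local windows (an 11 x 11 Gaussian window in their paper) and averages the results, so this sketch is a simplification. The constants K1 = 0.01 and K2 = 0.03 follow Wang et al.; the function names are illustrative:

```python
import math


def psnr(original, predicted, peak=255):
    """PSNR in dB between two equally sized 8-bit images (lists of rows)."""
    n = len(original) * len(original[0])
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(original, predicted)
              for a, b in zip(row_a, row_b)) / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def ssim_one_window(x, y, peak=255, k1=0.01, k2=0.03):
    """SSIM expression of Wang et al. evaluated over one window covering the
    whole image (the standard metric averages it over local windows)."""
    a = [v for row in x for v in row]
    b = [v for row in y for v in row]
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / (n - 1)
    var_b = sum((v - mu_b) ** 2 for v in b) / (n - 1)
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / (n - 1)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


original = [[100, 120], [140, 160]]
predicted = [[110, 130], [150, 170]]  # every pixel off by 10, so MSE = 100
print(round(psnr(original, predicted), 2))            # 28.13
print(round(ssim_one_window(original, original), 4))  # 1.0
```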
39. Performance Metrics
Since the time an algorithm requires is proportional to the number of explored blocks (EXB), the computational cost of a block-matching algorithm (BMA) is determined by its EXB
EXB in the case of a Zero Motion Vector (ZMV)
BMA   3SS  4SS  DS  HEXBS  MDGDS  FDGDS
EXB   25   17   13  11     9      9
V. Padilla, “Algoritmos de block-matching para compresión de video,” Final Career Project, Systems Engineering Program, Universidad del Valle, 2009
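The counts below assume the standard pattern sizes:
- a 9-point first stage for 3SS and 4SS;
- a 9-point large diamond and a 5-point small diamond for DS;
- a 7-point large hexagon and a 5-point small pattern for HEXBS.

Under those assumptions, the ZMV values for 3SS, 4SS, DS and HEXBS can be reproduced by counting the new points evaluated at each stage when the centre remains the best point. The Full-Search line uses an assumed search range W = 7 for comparison. The MDGDS and FDGDS values are not derived here.

```latex
\begin{aligned}
\mathrm{EXB}_{\mathrm{3SS}}   &= 9 + 8 + 8 = 25 \\
\mathrm{EXB}_{\mathrm{4SS}}   &= 9 + 8 = 17 \\
\mathrm{EXB}_{\mathrm{DS}}    &= 9 + 4 = 13 \\
\mathrm{EXB}_{\mathrm{HEXBS}} &= 7 + 4 = 11 \\
\mathrm{EXB}_{\mathrm{FS}}    &= (2W+1)^2 = 15^2 = 225 \quad (W = 7)
\end{aligned}
```

Unlike the fast algorithms, the Full-Search count does not depend on the motion in the sequence.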
40. Video Test Sequences
Sequence          Size     # Frames  Motion
Akiyo             352x288  300       Small
Mother_daughter   352x288  300       Small
Silent            352x288  300       Small
Foreman           352x288  300       Medium
Garden            352x240  115       Medium
Mobile            352x288  300       Medium
Coastguard        352x288  300       Large
Football          352x288  260       Large
Stefan            352x240  300       Large
Block sizes used: 8x8, 16x16 and 32x32
All video sequences are in uncompressed YUV4MPEG format and are available at: http://media.xiph.org/video/derf/
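Block matching is usually run on the luma (Y) plane. The sketch below reads the luma frames of a YUV4MPEG2 stream using only the standard library. It assumes 8-bit samples, 4:2:0 chroma subsampling and even frame dimensions. That is the usual layout for these test sequences, but the `C` tag of each file should be checked. `read_y4m_luma` is a hypothetical helper name:

```python
def read_y4m_luma(data):
    """Parse an 8-bit 4:2:0 YUV4MPEG2 stream (bytes) and return
    (width, height, frames), each frame being its luma plane as a list of rows.
    Assumes even frame dimensions."""
    header_end = data.index(b"\n")
    tokens = data[:header_end].split()
    if tokens[0] != b"YUV4MPEG2":
        raise ValueError("not a YUV4MPEG2 stream")
    params = {tok[:1]: tok[1:] for tok in tokens[1:]}
    if not params.get(b"C", b"420").startswith(b"420"):
        raise ValueError("this sketch only handles 4:2:0 chroma subsampling")
    width, height = int(params[b"W"]), int(params[b"H"])
    luma_size = width * height
    chroma_size = (width // 2) * (height // 2)  # size of each of the U and V planes
    frames, pos = [], header_end + 1
    while pos < len(data):
        line_end = data.index(b"\n", pos)  # end of the "FRAME ..." line
        if not data[pos:line_end].startswith(b"FRAME"):
            raise ValueError("missing FRAME marker")
        start = line_end + 1
        y_plane = data[start:start + luma_size]
        frames.append([list(y_plane[r * width:(r + 1) * width]) for r in range(height)])
        pos = start + luma_size + 2 * chroma_size  # skip the U and V planes
    return width, height, frames


# Tiny synthetic stream: two 4x2 frames, luma 0..7 and zeroed chroma.
frame = b"FRAME\n" + bytes(range(8)) + bytes(4)
stream = b"YUV4MPEG2 W4 H2 F25:1 C420jpeg\n" + frame + frame
width, height, frames = read_y4m_luma(stream)
print(width, height, len(frames))  # 4 2 2
print(frames[0])                   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

For a real file, the bytes can be obtained with `open(path, "rb").read()`. Note that this loads the whole sequence into memory.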
41. Results
PSNR performance, block size of 8x8 pixels
[Figure: PSNR (dB) of FS, 3SS, 4SS, DS, HEXBS, MDGDS and FDGDS for the Football, Garden and Stefan sequences]
42. Results (ii)
SSIM performance, block size of 8x8 pixels
[Figure: SSIM of FS, 3SS, 4SS, DS, HEXBS, MDGDS and FDGDS for the Football, Garden and Stefan sequences]
45. Results (v)
SSIM performance of various algorithms for the Coastguard video sequence
[Figure: SSIM of 3SS, 4SS, DS and HEXBS for block sizes 8x8, 16x16 and 32x32]
46. Results (vi)
SSIM performance of various algorithms for the Football video sequence
[Figure: SSIM of 3SS, 4SS, DS and HEXBS for block sizes 8x8, 16x16 and 32x32]
47. Results (vii)
SSIM performance of various algorithms for the Garden video sequence
[Figure: SSIM of 3SS, 4SS, DS and HEXBS for block sizes 8x8, 16x16 and 32x32]
48. Final Remarks
The HEXBS shows a low computational cost but produces a low prediction quality
The MDGDS and the FDGDS show a low computational cost and produce the highest prediction quality
The FDGDS achieves a good trade-off between high prediction quality and low computational cost
The HEXBS is less affected by variations in block size, whilst the others show a large loss in prediction quality as the block size increases