International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 –
6464(Print), ISSN 0976 – 6472(Online), Volume 5, Issue 3, March (2014), pp. 34-42 © IAEME
MATLAB BASED MOTION ESTIMATION AND COMPRESSION IN VIDEO FRAMES USING TRUE MOTION TRACKER
Rekhanshi Raghava¹ and Dr. Anil Kumar Sharma²
¹M. Tech. Scholar, ²Professor & Principal
Department of Electronics & Communication Engineering
Institute of Engineering & Technology, Alwar-301030 (Raj.), India
ABSTRACT
Motion estimation is the process of determining motion vectors that describe the
transformation from one 2-D image to another, usually between adjacent frames in a video sequence.
The motion vectors may relate to the whole image (global motion estimation) or to specific parts, such
as rectangular blocks, arbitrarily shaped patches or even individual pixels. Motion can comprise
rotation and translation in all three dimensions, as well as zoom. We are concerned with the
"projected motion" of 3-D objects onto the 2-D plane of an imaging sensor. By motion estimation, we
mean the estimation of the displacement (or velocity) of image structures from one frame to another
in a time sequence of 2-D images. Motion estimation is a video compression technique that exploits
the temporal redundancy of the video sequence. Successive pictures in a video sequence tend to be
highly correlated: consecutive frames are similar except for the changes induced by objects moving
within the frames, which implies that the arithmetic difference between these pictures is small. In
contrast, objects in motion increase the arithmetic difference between frames, which in turn means
that more bits are required to encode the sequence. For this reason, motion estimation is used to
determine the displacement of the objects. In motion estimation and compensation, the motion
estimator creates a model by modifying one or more reference frames to match the current frame as
closely as possible. The current frame is motion compensated by subtracting the model from the
frame to produce a motion-compensated residual frame. This residual is coded and transmitted,
along with the information required for the decoder to recreate the model (typically a set of motion
vectors). At the same time, the encoded residual is decoded and added to the model to reconstruct a
decoded copy of the current frame (which may not be identical to the original frame because of
coding losses). This reconstructed frame is stored to be used as a reference frame for further
predictions.
Keywords: BMA, DFD, JPEG, MPEG, TMT.
1. INTRODUCTION
With the advent of the multimedia age and the spread of the Internet, video storage on CD/DVD
and streaming video have been gaining a lot of popularity. The ISO Moving Picture Experts Group
(MPEG) video coding standards pertain to compressed video storage on physical media such as
CD/DVD, whereas the International Telecommunications Union (ITU) addresses real-time point-to-point
or multi-point communications over a network. The former has the advantage of having higher
bandwidth for data transmission. In either standard, the basic flow of the entire compression and
decompression process is largely the same. The encoding side estimates the motion in the current
frame with respect to a previous frame. A motion-compensated image for the current frame is then
created, built from blocks of the previous frame. The motion vectors of the blocks used for motion
estimation are transmitted, and the difference between the compensated image and the current frame
is JPEG encoded and sent as well. The encoded image that is sent is then decoded at the encoder and
used as a reference frame for the subsequent frames. The decoder reverses the process and creates a
full frame. The whole idea behind motion-estimation-based video compression is to save bits by
sending JPEG-encoded difference images, which inherently have less energy and can be compressed
more highly than a full JPEG-encoded frame. Motion JPEG, where all frames are JPEG encoded,
achieves a compression ratio between 10:1 and 15:1, whereas MPEG can achieve a compression ratio of
30:1. The algorithms that have been implemented include Exhaustive Search (ES). This algorithm, also
known as Full Search, is the most computationally expensive block matching algorithm of all. It
calculates the cost function at each possible location in the search window. As a result, it finds
the best possible match and gives the highest PSNR of any block matching algorithm. The obvious
disadvantage of ES is that the larger the search window gets, the more computations it requires.
This work implements an integrated perspective of ES and true motion estimation. Taking pictures of
a 3-D real-world scene generates sequences of video images. When an object in the three-dimensional
real world moves, there are corresponding changes in the brightness (or luminance intensity) of its
two-dimensional image. The physical three-dimensional motion projected onto the two-dimensional
image space is referred to as "true motion." The ability to track true motion by observing changes
in luminance intensity is critical to many video applications.
2. MOTION ESTIMATION
A video sequence can be considered to be a discretized three-dimensional projection of the
real four-dimensional continuous space-time. The objects in the real world may move, rotate, or
deform. These movements cannot be observed directly; what is observed are the changes between
frames, which are mainly due to the movement of these objects. Using a model of the motion of
objects between frames, the encoder estimates the motion that occurred between the reference frame
and the current frame. This process is called motion estimation (ME). The encoder then uses this
motion model and information to move the contents of the reference frame to provide a better
prediction of the current frame. This process is known as motion compensation (MC), and the
prediction so produced is called the motion-compensated prediction (MCP) or the displaced frame
(DF). In this case, the coded prediction error signal is called the displaced-frame difference
(DFD). Fig. 2 shows a block diagram of a motion-compensated coding system. This is the most commonly
used interframe coding method. The underlying supposition behind motion estimation is that the
patterns corresponding to objects and background in a frame of a video sequence move within the
frame to form corresponding objects in the subsequent frame. The idea behind block matching is to
divide the current frame into a matrix of 'macro blocks' that are then compared with the
corresponding block and its adjacent neighbours in the previous frame to create a vector that
stipulates the movement of a macro block from one location to another in the previous frame. This
movement, calculated for all the macro blocks comprising a frame, constitutes the motion estimated
in the current frame. The search area for a good macro block match is constrained to p pixels on
all four sides of the corresponding macro block in the previous frame.
Fig. 1: Motion estimation process
This 'p' is called the search parameter. Larger motions require a larger p, and the larger the
search parameter, the more computationally expensive the process of motion estimation becomes.
Usually the macro block is taken as a square of side 16 pixels, and the search parameter p is 7 pixels.
The matching of one macro block with another is based on the output of a cost function. The macro
block that results in the least cost is the one that most closely matches the current block. There
are various cost functions, of which the most popular and least computationally expensive is the
Mean Absolute Difference (MAD), given by equation (i). Another cost function is the Mean Squared
Error (MSE), given by equation (ii), where N is the side of the macro block, and Cij and Rij are the
pixels being compared in the current macro block and the reference macro block, respectively. The
Peak Signal-to-Noise Ratio (PSNR), given by equation (iii), characterizes the motion-compensated
image that is created by using motion vectors and macro blocks from the reference frame.
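$$\mathrm{MAD} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \qquad \text{(i)}$$
$$\mathrm{MSE} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( C_{ij} - R_{ij} \right)^2 \qquad \text{(ii)}$$
$$\mathrm{PSNR} = 10 \log_{10} \left[ \frac{(\text{peak-to-peak value of original data})^2}{\mathrm{MSE}} \right] \qquad \text{(iii)}$$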
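The three measures in equations (i) to (iii) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' MATLAB code: blocks are assumed to be square lists of lists of integers, and the peak-to-peak value is assumed to be 255 (8-bit data). The function names are hypothetical.

```python
import math

def mad(cur, ref):
    """Mean absolute difference of two N-by-N blocks, as in equation (i)."""
    n = len(cur)
    return sum(abs(cur[i][j] - ref[i][j]) for i in range(n) for j in range(n)) / (n * n)

def mse(cur, ref):
    """Mean squared error of two N-by-N blocks, as in equation (ii)."""
    n = len(cur)
    return sum((cur[i][j] - ref[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

def psnr(cur, ref, peak=255):
    """PSNR in dB, as in equation (iii); peak=255 is an assumption for 8-bit pixels."""
    err = mse(cur, ref)
    if err == 0:
        return math.inf  # identical blocks: no noise at all
    return 10 * math.log10(peak ** 2 / err)

cur = [[10, 20], [30, 40]]
ref = [[12, 18], [30, 44]]
print(mad(cur, ref))   # 2.0
print(mse(cur, ref))   # 6.0
print(psnr(cur, ref))
```

MAD needs only subtractions, absolute values and additions per pixel, whereas MSE adds a multiplication, which is why MAD is the cheaper of the two cost functions.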
Fig. 2. Motion Compensated Video Coding
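The block-matching search described in this section can be illustrated with a minimal exhaustive (full) search for a single macro block. This sketch assumes frames stored as lists of rows, uses MAD as the cost, and returns the motion vector as the displacement (dx, dy) from the current block position to the best-matching block in the reference frame. The names `block_mad` and `full_search` are hypothetical.

```python
def block_mad(cur, ref, bx, by, dx, dy, n):
    """MAD between the n-by-n block of cur at (bx, by) and the block of ref displaced by (dx, dy)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
    return total / (n * n)

def full_search(cur, ref, bx, by, n=16, p=7):
    """Evaluate every displacement within +/-p pixels and keep the one with the least cost."""
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = (0, 0), block_mad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue  # candidate block would fall outside the reference frame
            cost = block_mad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost

# Toy frames: a textured 4x4 patch moves 2 pixels right and 1 pixel down.
ref = [[0] * 12 for _ in range(12)]
cur = [[0] * 12 for _ in range(12)]
for i in range(4):
    for j in range(4):
        ref[3 + i][3 + j] = 100 + 4 * i + j
        cur[4 + i][5 + j] = 100 + 4 * i + j

print(full_search(cur, ref, bx=5, by=4, n=4, p=3))  # ((-2, -1), 0.0)
```

With p = 7 the two loops visit (2p + 1)^2 = 225 candidate positions per macro block, which shows why the cost of the full search grows quickly with the search parameter.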

3. MOTION ESTIMATION ALGORITHMS IN VIDEO
There are two kinds of motion estimation algorithms: the first identifies the true motion of a
pixel (or a block) between video frames, and the second removes temporal redundancies between
video frames.
Tracking the true motion: The first kind of motion estimation algorithm aims to accurately track
the true motion of objects/features in video sequences. Video sequences are generated by projecting
a 3D real world onto a series of 2D images (e.g., using a CCD sensor). When objects in the 3D real
world move, the brightness (pixel intensity) of the 2D images changes correspondingly. The 2D motion
projected from the movement of a point in the 3D real world is referred to as the "true motion". For
example, Fig. 3(a) and (b) show two consecutive frames of a ball moving upright and Fig. 3(c) shows
the corresponding true motion of the ball. Computer vision, the goal of which is to identify an
unknown environment via a moving camera, is one of the many potential applications of true
motion.
Removing temporal redundancy: The second kind of motion estimation algorithm aims to remove
temporal redundancy in video compression. In motion pictures, similar scenes exist between a frame
and its previous frame. In order to minimize the amount of information to be transmitted, block-
based video coding standards (such as MPEG and H.263) encode the displaced difference block
instead of the original block. The residue (difference) is coded together with the motion vector. Since
the actual compression ratio depends on the removal of temporal redundancy, conventional block-
matching algorithms use minimal-residue as the criterion to find the motion vectors.
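The residue coding described above can be sketched as follows. Given one motion vector per block, the encoder builds a motion-compensated prediction from the reference frame, codes only the difference, and the decoder adds that difference back. This is a simplified illustration with no transform or quantization; the function names and the dictionary layout of `vectors` (block origin mapped to displacement) are assumptions made for the example.

```python
def motion_compensate(ref, vectors, n):
    """Build the prediction by copying, for each block, the displaced n-by-n block of ref."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in vectors.items():
        for i in range(n):
            for j in range(n):
                pred[by + i][bx + j] = ref[by + dy + i][bx + dx + j]
    return pred

def residual(cur, pred):
    """Displaced-frame difference: current frame minus prediction."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(cur, pred)]

def reconstruct(pred, res):
    """Decoder side: prediction plus (here lossless) residual."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(pred, res)]

ref = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
# The bottom-right 2x2 block of cur is the top-left block of ref with one pixel changed.
cur = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 1, 2], [13, 14, 5, 7]]
vectors = {(0, 0): (0, 0), (2, 0): (0, 0), (0, 2): (0, 0), (2, 2): (-2, -2)}

pred = motion_compensate(ref, vectors, n=2)
res = residual(cur, pred)
print(sum(abs(v) for row in res for v in row))  # 1
print(sum(abs(c - r) for crow, rrow in zip(cur, ref) for c, r in zip(crow, rrow)))  # 39
```

In this toy case the motion-compensated residual has a total absolute value of 1, against 39 for the plain frame difference, which is the saving that minimal-residue block matching tries to maximize.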
Fig. 3: (a) (b) Show two consecutive frames of a ball moving upright and
(c) Shows the true motion – the physical motion in 2D images.
Although the minimal-residue motion estimation algorithms are good at removing temporal
redundancy, they are not sufficient for finding the true motion vector. When more than one candidate
motion vector yields a minimal residue, the motion estimation algorithm for removing temporal
redundancy is satisfied with any of them, whereas motion estimation for tracking the true motion
aims to find the single vector that corresponds to the physical motion. In general, motion vectors
chosen for minimal residue, though good for redundancy removal, may not actually be the true motion.

Fig. 4: (a) A 2D image comes from the projection of a 3D real world. Here, we assume a pinhole
camera is used. (b) The 2D projection of the movement of a point in the 3D real world is
referred to as the "true motion"
4. TRUE MOTION TRACKER AND VIDEO COMPRESSION
Video compression can make use of the true motion tracker (TMT) in various ways, such
as rate-optimized motion vector coding, object-based video coding, and object-based global motion
compensation [19, 24, 52]. In this section, we demonstrate that the proposed true motion tracker
(TMT) can provide higher coding efficiency and better subjective visual quality than conventional
minimal-residue block-matching algorithms. Video compression plays an important role in many
multimedia applications, from video-conferencing and video-phone to video games. The key to
achieving compression is to remove temporal and spatial redundancies in video images. Block-matching
motion estimation algorithms (BMAs) have been widely exploited in various international
video compression standards to remove temporal redundancy. For differentially encoded motion
vectors, we observe that a piecewise continuous motion field reduces the bit-rate. Hence, we propose
a rate-optimized motion estimation algorithm based on the neighbourhood relaxation TMT. The
unique features of this algorithm come from two parts: (1) we incorporate the number of bits for
encoding motion vectors into the minimization criterion, and (2) instead of counting the actual
number of bits for motion vectors, we approximate the number of bits by the residues of the
neighbourhood. In addition, we present a motion-compensated frame-rate up-conversion scheme using
the decoded motion. Such use of the decoded motion can save computation on the decoder side. The
more accurate the motion information is, the better the performance of frame-rate up-conversion
turns out to be. Hence, using the true motion vectors for the compression results in a better
picture quality of frame-rate up-conversion than using the motion vectors estimated by the
minimal-residue block-matching algorithms (BMAs). We also use a motion-vector refinement scheme in
which small changes to the estimated motion vectors are allowed, so that the precision of motion
vectors already judged to be correct can be increased. The use of multiple resolutions in the
recognition process is computationally and conceptually interesting. In the analysis of signals, it
is often useful to observe a signal in successive approximations. For instance, in pattern
recognition applications the vision system attempts to classify an object from a coarse
approximation. If the classification does not succeed, additional details are added such that a more
accurate view of the object is obtained. This process can be continued until the object has been
recognized.
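The neighbourhood relaxation idea mentioned in this section (scoring a candidate vector by the block's own residue plus the residues of its neighbours) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' formulation: the score is the block's own sum of absolute differences plus a hypothetical `weight` times the residues that the four adjacent blocks would have under the same vector, and neighbours whose displaced block leaves the frame are simply ignored.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences, or None if the displaced block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
        return None
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))

def relaxed_vector(cur, ref, bx, by, n, p, weight=0.5):
    """Pick the vector minimising own residue + weight * residues of the 4 neighbouring blocks."""
    h, w = len(cur), len(cur[0])
    neighbours = [(bx + ox, by + oy) for ox, oy in ((-n, 0), (n, 0), (0, -n), (0, n))
                  if 0 <= bx + ox <= w - n and 0 <= by + oy <= h - n]
    best_vec, best_score = None, None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            own = sad(cur, ref, bx, by, dx, dy, n)
            if own is None:
                continue
            score = own
            for nx, ny in neighbours:
                s = sad(cur, ref, nx, ny, dx, dy, n)
                if s is not None:
                    score += weight * s
            if best_score is None or score < best_score:
                best_vec, best_score = (dx, dy), score
    return best_vec

# Reference frame: a ramp texture with a flat 4x4 patch; the scene then shifts right by one pixel.
ref = [[50 if 3 <= x <= 6 and 3 <= y <= 6 else 10 * y + x for x in range(10)] for y in range(10)]
cur = [[ref[y][max(x - 1, 0)] for x in range(10)] for y in range(10)]

print(relaxed_vector(cur, ref, 4, 4, n=2, p=1))            # (-1, 0)
print(relaxed_vector(cur, ref, 4, 4, n=2, p=1, weight=0))  # (-1, -1)
```

The block at (4, 4) lies inside the flat patch, so every candidate has zero residue and a purely minimal-residue search (weight 0) returns whichever candidate it scans first. With the neighbours included, only the displacement shared by the surrounding textured blocks scores zero, which yields the smoother motion field that the text associates with a lower bit-rate for differentially coded vectors.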
Multiresolution Technique with Different Image Sizes for Previous Frame: Reducing the number
of search positions and the number of pixels in the residual calculation can also reduce
computation. The multiresolution motion estimation algorithms rely on the technique of predicting an
approximate large-scale motion vector in a coarse-resolution video and refining the estimated motion
vector in a multiresolution fashion to obtain the motion vector at the finer resolution. The size of
the image is smaller at a coarser level (i.e., the levels form a pyramid). Since a block at the
coarser level represents a larger region than a block with the same number of pixels at the finer
level, a smaller search area can be used at coarser levels. In addition, multiresolution motion
estimation algorithms also reduce the number of pixels in the residual calculation. These algorithms
can be further divided into two groups: constant block size and variable block size.
(i) Constant block size: the same block size is used at each level. If the image size is
reduced to half as the level becomes coarser, one block at a coarser level covers four
corresponding blocks at the next finer level. In this way, the motion vector of the
coarser-level block is either directly used as the initial estimate for the four
corresponding finer-level blocks or interpolated to obtain four motion vectors of the
finer level.
(ii) Variable block size: different block sizes are employed at each level to maintain a
one-to-one correspondence between blocks in different levels. As a result, the motion
vector of each block can directly be used as an initial estimate for the corresponding
block at the finer level.
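For the constant-block-size case in item (i), the hand-over of motion vectors from a coarser level to the next finer level can be sketched as follows. The sketch assumes the image is halved in each dimension between levels, so each coarse block maps to a 2-by-2 group of fine blocks and displacements double when expressed in finer-level pixels. It covers only the "used directly as the initial estimate" option, not interpolation, and the function name is hypothetical.

```python
def propagate_to_finer_level(coarse_vectors):
    """Turn a grid of coarse-level vectors into initial estimates for the next finer level.

    coarse_vectors[r][c] is the (dx, dy) of the coarse block in row r, column c.
    Each coarse block covers four finer-level blocks, and its vector is doubled
    because one coarse pixel spans two finer pixels in each direction.
    """
    fine = []
    for row in coarse_vectors:
        fine_row = []
        for dx, dy in row:
            fine_row.extend([(2 * dx, 2 * dy)] * 2)  # two finer blocks side by side
        fine.append(fine_row)
        fine.append(list(fine_row))                  # and the two blocks below them
    return fine

coarse = [[(1, 0), (0, -1)]]
print(propagate_to_finer_level(coarse))
# [[(2, 0), (2, 0), (0, -2), (0, -2)], [(2, 0), (2, 0), (0, -2), (0, -2)]]
```

Each initial estimate would then be refined with a small local search at the finer level, which is where the saving over a full-range search comes from.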
Multiresolution Technique with Same Image Size for Previous Frame: Instead of reducing the
number of search locations, this multiresolution method trades the number of search locations for
better estimation quality. The method uses different image resolutions with the same image size at
every level of the pyramid. Since the same image size is used at each level, the number of possible
motion candidates is the same at each level. The block size is not the same at each level and is
reduced by half as the level becomes coarser. A block at the coarser level represents the same
region as that at the finer level. In the coarsest level, a set of motion candidates is selected
from the maximum motion candidate set using a full search with fewer pixels in the residual
calculation. In each of the finer levels, the motion candidate set is further screened. At the last
level, only a single motion vector is selected.
Fig. 5: Three-level multiresolution motion estimation scheme
The first level images are the images of original resolution. The second level images are the
images of a quarter resolution of the first level. (A pixel in second level corresponds to the low-pass
filtering of four pixels in the corresponding position.) The third level images are a quarter of the
second level.
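The three-level pyramid described above can be sketched as follows. The text only states that each coarser pixel is a low-pass filtering of four pixels; this sketch assumes the simplest such filter, a plain 2-by-2 average, and assumes frame dimensions divisible by four. The function names are hypothetical.

```python
def downsample(frame):
    """Halve width and height by averaging each 2x2 group of pixels (quarter resolution)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]

def build_pyramid(frame, levels=3):
    """Level 1 is the original frame; each further level is a quarter of the previous one."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

frame = [[0, 4, 8, 8],
         [4, 8, 8, 8],
         [1, 1, 2, 2],
         [1, 1, 2, 6]]
for level in build_pyramid(frame):
    print(level)
# [[0, 4, 8, 8], [4, 8, 8, 8], [1, 1, 2, 2], [1, 1, 2, 6]]
# [[4.0, 8.0], [1.0, 3.0]]
# [[4.0]]
```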

Fig. 6: Three different images of the Foreman sequence under multiresolution compression
5. SIMULATIONS AND RESULT ANALYSIS
We perform our simulations under the MPEG-4 test conditions shown in Table-1, where each
sequence has 300 frames. These sequences cover a wide range of motion content and have various
formats, including QCIF and CIF. The original frame rate is 30 frames per second (fps). The
sequences have been tested at various bit rates (10 to 1024 kilobits per second, or Kbps) and
sub-sampled frame rates (7.5 to 30 fps). When the coding bit-rate is lower than 512 Kbps, only the
first frame in each sequence is coded as an I frame and all the remaining frames are coded as P
frames. At high bit-rates (512 Kbps and 1024 Kbps) [...]. Search range means that the search is
performed within a square region of [-P, +P] around the position of the current block. For
comparison, the performances of FS, DS, and ARPS are reported as follows. The average peak
signal-to-noise ratio (PSNR) per frame of each reconstructed video sequence is computed for quality
comparison and documented in Table-2. Fig. 7 shows the simulation graph for the algorithms
discussed.
Table-1: MPEG-4 test conditions
Video             Format   Bit rate (Kbps)   Frame rate (fps)   Search range
Mother-Daughter   QCIF     24                10                 16
Foreman           CIF      512               15                 32
Foreman           CIF      1024              30                 32
Coast guard       QCIF     48                10                 16
Coast guard       CIF      112               10                 16
Fig. 7: Frame-based PSNR performance of FS, DS and ARPS in Foreman. Fast camera
panning with scene change happens during frames 160 to 220

Compared with FS (Full Search), ARPS (Adaptive Rood Pattern Search) greatly
improves the search speed, with a computational gain in the range of 94 to 447. ARPS maintains a
PSNR performance similar to that of FS in most sequences, with less than 0.12 dB degradation (except
0.23 dB in Coastguard at 112 Kbps and 0.49 dB in Foreman at 512 Kbps). When compared with DS
(Diamond Search), ARPS is consistently around 2 times faster with similar PSNR achieved. Even for
difficult test sequences such as Foreman and Coastguard, where large and/or complex motion content
is involved, ARPS still achieves superior PSNR to that of DS, by 0.27 dB and 0.38 dB, respectively,
as shown in Tables 2 and 3.
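The speed-up figures quoted above can be checked directly from the search-point counts in Table 3: the computational gain of ARPS over another algorithm is the ratio of their average numbers of search points per motion vector. The short script below performs that arithmetic on the Table 3 values. The variable names are illustrative only.

```python
# Average number of search points per motion vector, copied from Table 3: (FS, DS, ARPS).
search_points = {
    "Mother-Daughter (24)": (1024, 13.84, 6.12),
    "Foreman (512)": (4096, 22.58, 11.76),
    "Foreman (1024)": (4096, 18.54, 9.16),
    "Coast guard (48)": (1024, 17.46, 8.78),
    "Coast guard (112)": (1024, 20.77, 10.82),
}

gain_over_fs = {name: fs / arps for name, (fs, ds, arps) in search_points.items()}
gain_over_ds = {name: ds / arps for name, (fs, ds, arps) in search_points.items()}

for name in search_points:
    print(f"{name}: FS/ARPS = {gain_over_fs[name]:.1f}, DS/ARPS = {gain_over_ds[name]:.2f}")

print(f"gain over FS ranges from {min(gain_over_fs.values()):.1f} "
      f"to {max(gain_over_fs.values()):.1f}")
```

The smallest and largest FS/ARPS ratios come out at about 94.6 and 447.2, matching the quoted range of 94 to 447, and the DS/ARPS ratios lie between roughly 1.9 and 2.3, consistent with "around 2 times faster".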
Table-2: Average PSNR (dB) performance of FS, DS and ARPS
Video (bit rate in Kbps)   FS      DS      ARPS
Mother-Daughter (24)       34.82   34.76   34.61
Foreman (512)              35.01   34.26   34.50
Foreman (1024)             35.70   35.12   35.40
Coast guard (48)           28.88   28.72   28.77
Coast guard (112)          27.05   26.44   26.82
Adaptive rood pattern search (ARPS) is a simple, fast block-matching algorithm. By
exploiting the higher distribution of MVs in the horizontal and vertical directions and the spatial
inter-block correlation, ARPS adaptively uses an adjustable rood-shaped search pattern (which is
powerful in tracking the motion trend), together with the search point indicated by the predicted
MV, to match the different motion content of the video sequence for each macro block.
Table-3: Average number of search points per MV generation
Video (bit rate in Kbps)   FS     DS      ARPS
Mother-Daughter (24)       1024   13.84   6.12
Foreman (512)              4096   22.58   11.76
Foreman (1024)             4096   18.54   9.16
Coast guard (48)           1024   17.46   8.78
Coast guard (112)          1024   20.77   10.82
6. CONCLUSION
This work has explored the theory of true motion tracking in digital video with respect to its
applications. We have examined the basic features of true motion estimation algorithms.
The true motion tracker has a number of advantageous properties when applied to motion analysis:
• Dependable tracking: the neighbourhood helps to single out erroneous motion vectors
• Motion flexibility: the relaxation helps to accommodate non-translational motion
• High implementation efficiency: 99% of the computations are integer additions
Consequently, it may be used as a cost-effective motion estimation algorithm for video
coding, video interpolation, and video-object analysis.

