Dr. Mohieddin Moradi
mohieddinmoradi@gmail.com
1
Dream
Idea
Plan
Implementation
Section I
− Video Compression History
− A Generic Interframe Video Encoder
− The Principle of Compression
− Differential Pulse-Code Modulation (DPCM)
− Transform Coding
− Quantization of DCT Coefficients
− Entropy Coding
Section II
− Still Image Coding
− Prediction in Video Coding (Temporal and Spatial Prediction)
− A Generic Video Encoder/Decoder
− Some Motion Estimation Approaches
2
Outline
3
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
Lossless Compression
– Transparent (Totally reversible without any loss)
– Compression ratio not guaranteed
– Good for computer data where no loss is important
– Lempel–Ziv–Welch coding
• Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.
• LZW compresses a file into a smaller file using a table-based lookup algorithm.
• It replaces strings of characters with single codes.
• A dynamic coding system
• Used by PKZip, WinZip, GIF, TIF, PNG, Fax, UNIX and Linux, Microsoft DriveSpace (a disk compression utility supplied with MS-DOS), and many other compression systems
4
Encoder Decoder
Transparent
Lossy Compression
– Non-transparent.
– Compression ratio is guaranteed.
– Examples
• JPEG, MPEG, DV, Digital Betacam, Betacam SX, IMX, etc.
• MP3, Dolby E, Dolby Digital, DTS (originally Digital Theater Systems), etc.
• Both DTS and Dolby Digital are audio compression technologies, allowing movie makers to record
surround sound that can be reproduced in cinemas as well as homes.
– Good for media data ...
• … where compression ratio is important.
5
Encoder Decoder
Non-transparent
Lossless and Lossy Compression Techniques
− The Lossless approaches in compression process:
− DCT: Discrete Cosine Transform
− VLC: Variable Length Coding
− RLC: Run Length Coding
− The Lossy approaches in compression process:
− Chroma subsampling: 4:2:2, 4:2:0, 4:1:1
− DPCM: Differential Pulse Code Modulation
− Quantization
6
7
− It arises when parts of a picture are often replicated within a single frame of video (with minor
changes).
Spatial Redundancy in Still Images
This area
is all blue
This area is half
blue and half green
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
− Spatial Redundancy Reduction (pixels inside a picture are similar)
− Statistical Redundancy Reduction (more frequent symbols are assigned short code words and less
frequent ones longer words)
The Principle of Compression in Still Images
8
Differential Pulse Code Modulation (DPCM) for Spatial Redundancy Reduction
9
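DPCM can be illustrated with a one-dimensional sketch: each sample is sent as its difference from the previous sample, so smooth image rows produce small residual values (function names are illustrative):

```python
def dpcm_encode(samples):
    """Send each sample as its difference from the previous one."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)  # small values where the signal is smooth
        prev = s
    return residuals

def dpcm_decode(residuals):
    """Accumulate the differences to rebuild the original samples."""
    prev = 0
    samples = []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples
```

A smooth row such as 100, 102, 101, 105 becomes 100, 2, -1, 4; as long as the residuals are not quantised, the scheme is fully reversible.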
• Zig-Zag Scan
• Run-length coding
• VLC
Quantization
• Major Reduction
• Controls ‘Quality’
10
Intra-frame Compression (like still image compression)
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into frequency coefficients
Replaces the original data with shorter codes or symbols
Rearranges the
coefficients from raster
scan to low frequency first
Stores the compressed data & checks compression ratio.
Returns a quantisation signal if ratio is not achieved.
Quantises the data if the compression ratio
has not been achieved
11
Lossless and Lossy in Intra-frame Compression
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearrangement
(reversible)
Compression
ratio checking
(reversible)
Possible loss of entropy
(non-reversible!!)
Data reduction
(reversible)
12
Macroblocks after Color Sub-sampling
13
4:2:0
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into
frequency coefficients
14
Still Image Encoder (Compressor)
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into
frequency coefficients
Quantises the data if the compression ratio
has not been achieved
15
Still Image Encoder (Compressor)
– The quantizer cuts entropy.
– It is controlled by the data buffer.
– It may pass data through without change.
• The quantizer can effectively “switch off” (pass-through mode) when the data buffer reports that the amount of data is acceptable.
– It may quantize the data in a number of steps.
• If the data buffer fills, a signal is sent back to the quantizer, which switches to a coarser quantisation matrix to flatten the data and reduce its content.
Quantisation
16
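The divide-by-step-size operation tabulated on the following slides can be sketched as below; truncation toward zero is assumed here because it reproduces those tables, though a real codec specifies its rounding exactly:

```python
def quantise(coeffs, q_matrix):
    """Divide each DCT coefficient by its step size, truncating toward zero.
    Larger step sizes discard more entropy (and more picture detail)."""
    return [[int(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, q_matrix)]
```

With an all-ones matrix the block passes through unchanged; with the step-size row 1, 2, 2, 4, 4, 4, 8, 8 the example row 238, -43, -12, -14, -6, 0, -4, -8 becomes 238, -21, -6, -3, -1, 0, 0, -1.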
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
17
DCT Coefficients ÷ Quantisation Matrix (different step-sizes Q) = Quantised DCT Coefficients
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
Quantisation
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 0 0
-1 7 -3 -2 1 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
18
DCT Coefficients ÷ Quantisation Matrix (different step-sizes Q) = Quantised DCT Coefficients
Quantisation
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 0 0
-7 -12 8 -8 -1 -1 0 1
4 5 -7 1 2 -2 0 0
-5 -4 2 -1 1 0 0 0
-1 7 -1 -1 0 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
1 1 1 1 2 2 2 4
1 1 1 2 2 2 4 4
1 1 2 2 2 4 4 4
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
19
DCT Coefficients ÷ Quantisation Matrix (different step-sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
Quantisation
238 -43 -12 -14 -6 0 -2 -4
39 12 -9 13 4 -1 -1 -2
-16 12 10 8 -1 3 2 0
-3 -7 1 -1 2 0 0 0
-7 -12 4 -4 0 0 0 0
4 2 -3 0 1 -1 0 0
-2 -2 1 0 0 0 0 0
0 3 0 0 0 0 0 0
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
1 1 1 1 2 2 2 4
1 1 1 2 2 2 4 4
1 1 2 2 2 4 4 4
1 2 2 2 4 4 4 8
2 2 2 4 4 4 8 8
2 2 4 4 4 8 8 8
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
20
DCT Coefficients ÷ Quantisation Matrix (different step-sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
Quantisation
1 1 1 2 2 2 4 4
1 1 2 2 2 4 4 4
1 2 2 2 4 4 4 8
2 2 2 4 4 4 8 8
2 2 4 4 4 8 8 8
2 4 4 4 8 8 8 16
4 4 4 8 8 8 16 16
4 4 8 8 8 16 16 16
238 -43 -12 -7 -3 0 -1 -2
39 12 -4 6 2 0 0 -1
-16 6 5 4 0 1 1 0
-1 -3 0 0 1 0 0 0
-3 -6 2 -2 0 0 0 0
2 1 -1 0 0 0 0 0
-1 -1 0 0 0 0 0 0
0 1 0 0 0 0 0 0
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
21
DCT Coefficients ÷ Quantisation Matrix (different step-sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
Quantisation
1 2 2 4 4 4 8 8
2 2 4 4 4 8 8 8
2 4 4 4 8 8 8 16
4 4 4 8 8 8 16 16
4 4 8 8 8 16 16 16
4 8 8 8 16 16 16 32
8 8 8 16 16 16 32 32
8 8 16 16 16 32 32 32
238 -21 -6 -3 -1 0 0 -1
19 6 -2 3 1 0 0 0
-8 3 2 2 0 0 0 0
0 -1 0 0 0 0 0 0
-1 -3 1 -1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
22
DCT Coefficients ÷ Quantisation Matrix (different step-sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
DCT Coefficients after Quantization
23
24
DCT Coefficients after Quantization
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into
frequency coefficients
Rearranges the coefficients from
raster scan to low frequency first
Quantises the data if the compression ratio
has not been achieved
25
Intra-frame Compression (like still image compression)
– Rearranges the DCT coefficients.
• Changed from raster scan so that the DC and low frequency coefficients are first and the
high frequency coefficients are last.
– Helps to separate entropy from redundancy.
Zig-zag Scanning
26
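The scan order itself can be generated by walking the anti-diagonals of the block, reversing direction on alternate diagonals; a sketch (helper names are illustrative):

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order:
    DC and low-frequency positions first, high frequencies last."""
    order = []
    for s in range(2 * n - 1):  # s indexes the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate the direction
    return order

def zigzag_scan(block):
    """Flatten a square block of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

After quantisation, the flattened sequence typically ends in a long run of zeros, which run-length coding then collapses.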
231 31 -12 4 -6 0 4 0
29 8 9 3 4 -2 3 2
-16 2 1 5 -3 2 1 0
-1 -4 0 1 0 1 1 0
-6 -2 4 -1 -1 0 0 0
4 5 7 1 0 0 0 0
-5 -3 0 0 0 0 0 0
-1 3 1 0 0 0 0 0
Non Zig-zag Scanning
27
231 31 -12 4 -6 0 4 0
29 8 9 3 4 -2 3 2
-16 2 1 5 -3 2 1 0
-1 -4 0 1 0 1 1 0
-6 -2 4 -1 -1 0 0 0
4 5 7 1 0 0 0 0
-5 -3 0 0 0 0 0 0
-1 3 1 0 0 0 0 0
Zig-zag Scanning
28
Zig-zag Scanning
29
DC and low frequency coefficients are first
and the high frequency coefficients are last.
Zig-zag Scanning for Separating Redundancy and Entropy
30
Entropy / Redundancy
DC and low frequency coefficients are first
and the high frequency coefficients are last.
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into
frequency coefficients Replaces the original data
with shorter codes or symbols
Rearranges the
coefficients from raster
scan to low frequency first
Quantises the data if the compression ratio
has not been achieved
31
Intra-frame Compression (like still image compression)
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into
frequency coefficients Replaces the original data
with shorter codes or symbols
Rearranges the
coefficients from raster
scan to low frequency first
Stores the compressed data & checks
compression ratio. Returns a quantisation
signal if ratio is not achieved.
Quantises the data if the compression ratio
has not been achieved
32
Intra-frame Compression (like still image compression)
– Holds the results from the variable length coder and outputs data at a constant rate.
– If the data buffer empties, ‘packing’ data is output.
– If the data buffer fills, a signal is sent to the quantizer.
– The quantizer is instructed to reduce the amount of data.
Data Buffer
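A toy model of the buffer's feedback loop (all sizes and rates are in arbitrary units; the threshold and action names are assumptions for illustration, not from any standard):

```python
def buffer_actions(coded_sizes, out_rate, high_mark):
    """Track data-buffer fullness block by block: the entropy coder fills
    the buffer, the constant-rate output drains it. Returns one action
    per block: pad the output, pass through, or signal the quantiser."""
    fullness = 0
    actions = []
    for size in coded_sizes:
        fullness += size - out_rate
        if fullness < 0:
            actions.append("add packing data")    # buffer emptied: pad output
            fullness = 0
        elif fullness > high_mark:
            actions.append("signal quantiser")    # buffer filling: coarser steps
        else:
            actions.append("pass through")
    return actions
```

For example, block sizes 10, 100, 30 with an output rate of 40 and a high mark of 50 produce padding, then a quantiser signal, then pass-through.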
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
33
– Simple video signal
• DCT has most of its big numbers in the top left corner.
• Zig-zag scan has all the big numbers at the start of the scan.
• RLC and VLC can reduce the amount of data a lot.
• Amount of data entering the data buffer is small.
– Medium complexity video signal
• DCT has some of its big numbers in the top left corner.
• Zig-zag scan has a few big numbers at the start of the scan.
• RLC and VLC reduce data a bit.
• Amount of data entering the data buffer is OK.
• Data buffer sends the compressed data out as is.
• Data buffer adds packing data.
– Complex video signal
• DCT has big numbers all over the DCT block.
• Zig-zag scan still results in big numbers everywhere.
• RLC and VLC cannot reduce the amount of data very much.
• Amount of data entering the data buffer is too high.
• Data buffer sends a signal back to the quantiser.
• Quantiser reduces the amount of data by cutting entropy.
Data Buffer
34
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
35
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
36
− It arises when parts of a picture are often replicated within a single frame of video (with minor
changes).
Spatial Redundancy in Still Images
This area
is all blue
This area is half
blue and half green
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
This picture is the
same as the previous
one except for this
area
− It arises when successive frames of video display images of the same scene.
− Take advantage of similarity between successive frames
37
Temporal Redundancy in Moving Images
This picture is the
same as the previous
one except for this
area
38
Temporal Redundancy in Moving Images
Moving Picture Redundancies
Temporal Redundancy
− It arises when successive frames of video display images of the same scene.
Spatial Redundancy
− It arises when parts of a picture are often replicated within a single frame of video (with
minor changes).
39
Temporal Redundancy (inter-frame)
Spatial Redundancy (intra-frame)
The MPEG video compression algorithm achieves very high rates of compression by exploiting the
redundancy in video information.
− Spatial Redundancy Reduction (pixels inside a picture are similar)
− Temporal Redundancy Reduction (Similarity between the frames)
− Statistical Redundancy Reduction (more frequent symbols are assigned short code words and less
frequent ones longer words)
The Principle of Compression for Moving Images
40
Spatial Redundancy Reduction, Recall
41
Spatial Redundancy Reduction
Transform coding
Discrete Sine
Transform (DST)
Discrete Wavelet
Transform (DWT)
Hadamard
Transform(HT)
Discrete Cosine
Transform (DCT)
Differential Pulse Code Modulation
(DPCM)
The goal of the prediction model is to reduce redundancy by forming a prediction of the data and subtracting
this prediction from the current data.
− The residual is encoded and sent to the decoder which re-creates the same prediction so that it can add the
decoded residual and reconstruct the current frame.
− In order that the decoder can create an identical prediction, it is essential that the encoder forms the
prediction using only data available to the decoder, i.e. data that has already been coded and transmitted.
Prediction Model
42
Decoder
Encoded Residual
I. Re-creates the same prediction (predictor)
II. Adds the decoded residual to the prediction (predictor)
III. Reconstructs a version of the original block
Encoder
I. Forms a prediction (predictor)
II. Subtracts the prediction from the current data to create the residual
Spatial Prediction: The prediction is formed from previously coded image samples in the same frame
− The output of this process is a set of residual or difference samples and the more accurate the
prediction process, the less energy is contained in the residual.
Temporal Prediction: The prediction is formed from previously coded frames
43
Inter Frame (Temporal) and Intra Frame (Spatial) Prediction
44
Inter Frame (Temporal) and Intra Frame (Spatial) Prediction
45
Inter Frame (Temporal) and Intra Frame (Spatial) Prediction
The prediction for the current block of image samples is
created from previously-coded samples in the same frame.
− Assuming that the blocks of image samples are coded in
raster-scan order, which is not always the case, the
upper/left shaded blocks are available for intra prediction.
− These blocks have already been coded and placed in the
output bitstream.
− When the decoder processes the current block, the shaded
upper/left blocks are already decoded and can be used to
re-create the prediction.
− H.264/AVC uses spatial extrapolation to create an intra
prediction for a block or macroblock.
46
Intra Prediction
Available samples
Spatial extrapolation
47
Intra Prediction
(Prediction direction diagrams: Vertical, Horizontal, Mean (DC), Diagonal down-left, Horizontal up, Diagonal right, Vertical right, Vertical left, Horizontal down)
Intra Prediction (Ex: H.264/AVC)
Mode Mode Name
0 DC
1 Vertical
2 Horizontal
3 Diagonal down/right
4 Diagonal down/left
5 Vertical-right
6 Vertical-left
7 Horizontal-up
8 Horizontal-down
48
(Diagrams: 4x4 sample blocks A–P with the neighbouring row and column, showing the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.)
49
Intra Prediction for 4x4 Luma Blocks
Mode 0: DC Prediction
− If all samples A, B, C, D, I, J, K, L, are available, a=b=c=…=p = (A+B+C+D+I+J+K+L+4) / 8.
− If A, B, C, and D are not available and I, J, K, and L are available, a=b=c=…=p =(I+J+K+L+2) / 4.
− If I, J, K, and L are not available and A, B, C, and D are available, a=b=c=…=p =(A+B+C+D+2) /4.
− If all eight samples are not available, a=b=c=…=p = 128.
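The four cases above translate directly to code; a sketch in which `top` holds A–D and `left` holds I–L, each passed as an empty list when those neighbours are unavailable (names are illustrative):

```python
def intra_dc_4x4(top, left):
    """Mode 0 (DC): fill the 4x4 block with one value, the rounded mean
    of whichever neighbouring samples are available."""
    if top and left:
        dc = (sum(top) + sum(left) + 4) // 8
    elif left:
        dc = (sum(left) + 2) // 4
    elif top:
        dc = (sum(top) + 2) // 4
    else:
        dc = 128  # no neighbours available: predict mid-grey
    return [[dc] * 4 for _ in range(4)]
```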
Intra Prediction (Ex: H.264/AVC)
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
50
Intra Prediction for 4x4 Luma Blocks
Mode 1: Vertical Prediction
− This mode shall be used only if A, B, C, D are available. The prediction in this mode shall be as follows:
• a, e, i, m are predicted by A,
• b, f, j, n are predicted by B,
• c, g, k, o are predicted by C,
• d, h, l, p are predicted by D.
Intra Prediction (Ex: H.264/AVC)
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
51
Intra Prediction for 4x4 Luma Blocks
Mode 3: Diagonal Down/Right prediction
− This mode is used only if all A,B,C,D,I,J,K,L,Q are inside the picture. This is a 'diagonal' prediction.
• m is predicted by: (J + 2K + L + 2)/4
• i, n are predicted by: (I + 2J + K + 2)/4
• e, j, o are predicted by: (Q + 2I + J + 2)/4
• a, f, k, p are predicted by: (A + 2Q + I + 2)/4
• b, g, l are predicted by: (Q + 2A + B + 2)/4
• c, h are predicted by: (A + 2B + C + 2)/4
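Each filtered value above fills one down-right diagonal of the 4x4 block (samples along a diagonal share a predictor). A sketch; note that sample d is not listed on the slide, so its line here is an assumption completing the same 3-tap pattern with B, C, D:

```python
def intra_diag_down_right(A, B, C, D, I, J, K, L, Q):
    """Mode 3: fill each down-right diagonal (constant row - col)
    of the 4x4 block a..p with one 3-tap filtered neighbour value."""
    f3 = lambda x, y, z: (x + 2 * y + z + 2) // 4
    by_diag = {                # key: row - col
        3: f3(J, K, L),        # m
        2: f3(I, J, K),        # i, n
        1: f3(Q, I, J),        # e, j, o
        0: f3(A, Q, I),        # a, f, k, p
        -1: f3(Q, A, B),       # b, g, l
        -2: f3(A, B, C),       # c, h
        -3: f3(B, C, D),       # d (assumed: same pattern, not on the slide)
    }
    return [[by_diag[r - c] for c in range(4)] for r in range(4)]
```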
Intra Prediction (Ex: H.264/AVC)
M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
(Diagrams: prediction directions of mode 4 and mode 8 on the 4x4 sample grid.)
Intra Prediction for 4x4 Luma Blocks
Example of 4 x 4 luma block
– Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) respectively
– Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4) respectively
Intra Prediction (Ex: H.264/AVC)
53
Ex. Intra Prediction for 4x4 Luma Blocks
Intra Prediction (Ex: H.264/AVC)
A 4 × 4 luma block, part of the
highlighted macroblock
QCIF frame with highlighted macroblock
54
Ex. Intra Prediction for 4x4 Luma Blocks
− The 9 prediction modes 0-8 are calculated for the
following 4 × 4 block.
− The Sum of Absolute Errors (SAE) for each
prediction indicates the magnitude of the
prediction error.
− In this case, the best match to the actual current
block is given by mode 8, horizontal-up, because
this mode gives the smallest SAE.
− A visual comparison shows that the mode 8 P
block (prediction block) appears quite similar to
the original 4 × 4 block.
Intra Prediction (Ex: H.264/AVC)
55
Intra Prediction for 4x4 Chroma Blocks (Only one mode: DC Prediction)
− A, B, C, D are four 4x4 blocks in an 8x8 chroma block.
− S0, S1, S2, S3 are the sums of 4 neighboring pixels.
Intra Prediction (Ex: H.264/AVC)
If S0, S1, S2, S3 are all inside the frame:
A = (S0 + S2 + 4)/8
B = (S1 + 2)/4
C = (S3 + 2)/4
D = (S1 + S3 + 4)/8
If only S0 and S1 are inside the frame:
A = (S0 + 2)/4
B = (S1 + 2)/4
C = (S0 + 2)/4
D = (S1 + 2)/4
If only S2 and S3 are inside the frame:
A = (S2 + 2)/4
B = (S2 + 2)/4
C = (S3 + 2)/4
D = (S3 + 2)/4
If S0, S1, S2, S3 are all outside the frame: A = B = C = D = 128
(Diagram: S0 and S1 are the sums above blocks A and B; S2 and S3 are the sums to the left of blocks A and C.)
56
Intra Prediction (Ex: H.264/AVC)
16x16 Intra Prediction Mode
− Especially suitable for smooth areas
− Prediction Modes
• Mode 0 =Vertical Prediction
• Mode 1 = Horizontal Prediction
• Mode 2 = DC prediction
• Mode 3 = Plane prediction
− Residual coding
• Another 4x4 transform is applied to the 16 DC coefficients
• Only single scan is used.
57
Intra Prediction (Ex: H.264/AVC)
16x16 Intra Prediction Mode
− Mode 0 = Vertical Prediction
• Pred(i, j) = P(i, -1), i, j = 0, 1, ..., 15
− Mode 1 = Horizontal Prediction
• Pred(i, j) = P(-1, j), i, j = 0, 1, ..., 15
− Mode 2 = DC Prediction
• Pred(i, j) = ( Σ i=0..15 [ P(i, -1) + P(-1, i) ] + 16 ) / 32, i, j = 0, 1, ..., 15
− Mode 3 = Plane Prediction
• Pred(i, j) = max(0, min(255, (a + b×(i-7) + c×(j-7) + 16) / 32))
• where a = 16×(P(-1,15) + P(15,-1)), b = 5×(H/4)/16, c = 5×(V/4)/16
• H = Σ i=1..8 i × ( P(7+i, -1) - P(7-i, -1) )
• V = Σ j=1..8 j × ( P(-1, 7+j) - P(-1, 7-j) )
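A sketch of the mode 3 plane fit, with `top[i] = P(i, -1)` and `left[j] = P(-1, j)` for i, j = 0..15; the integer forms of b and c below, (5H + 32)/64 and (5V + 32)/64, are an assumed rounding of the 5×(H/4)/16 expression on the slide:

```python
def intra_16x16_plane(top, left):
    """Mode 3 (plane): fit a luminance gradient to the top row and
    left column of previously coded samples, then clip to 0..255."""
    H = sum(i * (top[7 + i] - top[7 - i]) for i in range(1, 9))
    V = sum(j * (left[7 + j] - left[7 - j]) for j in range(1, 9))
    a = 16 * (left[15] + top[15])      # anchored on the far corner samples
    b = (5 * H + 32) // 64             # horizontal gradient
    c = (5 * V + 32) // 64             # vertical gradient
    return [[max(0, min(255, (a + b * (i - 7) + c * (j - 7) + 16) // 32))
             for i in range(16)] for j in range(16)]
```

On a flat block all gradients vanish and every predicted sample equals the neighbouring sample value.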
58
Ex. Intra 16×16
− A luma macroblock with previously-encoded samples at the upper and left-hand edges.
− The best match is given by
mode 3 which in this case
produces a plane with a
luminance gradient from
light at the upper-left to dark
at the lower-right.
Intra Prediction (Ex: H.264/AVC)
59
Application of Difference Frame in Video Coding
Less Information
60
General Reasons for Differences Between Two Frames
Differences between two frames can be caused by
• Camera motion: the outlines of background or stationary objects can be seen in the Diff Image
• Object motion: the outlines of moving objects can be seen in the Diff Image
• Illumination changes: sun rising, headlights, etc.
• Scene Cuts: Lots of stuff in the Diff Image
• Noise: If the only difference between two frames is noise (nothing moved), then you won’t recognize
anything in the Difference Image
We try to minimize the entropy of the difference image by motion prediction.
61
Typical Camera Motions, Recall
Track right
Dolly
backward
Boom up
(Pedestal up)
Pan right
Tilt up
Track left
Dolly forward
Boom down
(Pedestal down)
Pan left
Tilt down
Roll
62
Typical Objects Motions, Recall
Translation: Simple movement of typically rigid objects
Camera pans vs. movement of objects
Rotation: Spinning about an axis
– Camera versus object rotation
Zoom in/out
– Camera zoom vs. object zoom (movement in/out)
Frame n+1
(Translation)
Frame n+1
(Rotation)
Frame n+2
(Zoom)
Frame n
63
Difference Frames and Motion
64
Difference Frames and Motion
65
Difference Frames and Motion
66
Difference Frames and Motion
Frame N
Frame N+1 (Frame N) - (Frame N+1)
67
Difference Frames and Motion
68
Frame N
Frame N+1
(Frame N) - (Frame N+1)
Difference Frames and Motion
69
Frame N
Frame N+1
(Frame N) - (Frame N+1)
Difference Frames and Motion
Difference Frame
Without Motion Prediction
Difference Frame
With Motion Prediction
Frame N Frame N+1
70
Goal: to remove the correlation by motion compensation
If you can see something in the Diff Image and recognize it, there’s still correlation in the difference image.
Temporal Prediction (Motion Prediction)
Temporal Redundancy Reduction
− Pixels in the successive frames of the same locations are highly correlated.
− In static parts of the picture, they are virtually the same.
− Due to motion, they are displaced, but their motion compensated values become more similar.
− The accuracy of the prediction can usually be improved by compensating for motion between the
reference frame(s) and the current frame.
• Hence motion compensated frame difference pixels become smaller
• Instead of transforming a block of pixels, their motion compensated values are transformed and quantised.
• The predicted frame is created from one or more past or future frames known as reference frames (anchor).
71
Temporal Prediction (Motion Prediction)
Frame 1 (as a predictor for frame 2) Frame 2 (current frame) Difference
Mid-grey represents a difference of zero and light or dark greys correspond to positive and negative differences.
Previous frame N Next frame N+1
Small difference
(Residual block)
Y
X
(Forward) Motion Vector
Best Match for Macroblock in previous Frame N
(the predictor for the macroblock in the next frame)
Macroblock in next frame N+1
Best Match for Macroblock in previous Frame N
(motion-compensated prediction)
Macroblock in next frame N+1
Differentiator
72
Movement is cancelled in the frames and added as vector information in the header.
Temporal Prediction (Motion Prediction)
73
Image t Image t-1 Diff. without motion compensation
Motion vectors for blocks Differences with motion compensation
Temporal Prediction (Motion Prediction), Example
74
Reference image Image to code
Temporal Prediction (Motion Prediction), Example
75
Searching area for finding predictor
Image to code
Reference image
Temporal Prediction (Motion Prediction), Example
76
(a motion compensated prediction)
The chosen candidate region becomes the
predictor for the current MxN block.
Motion Estimation (ME)
The process of finding the best match (finding MVs)
Residual block
Best Match for Macroblock in
previous Frame N
(motion-compensated prediction)
Macroblock in next frame N+1
Diff.
Motion Compensation (MC)
− The selected ‘best’ matching region in the
reference frame is subtracted from the current
macroblock to produce a residual macroblock.
− This residual block is encoded and transmitted together with a motion vector describing the position of the best matching region relative to the current macroblock position.
Temporal Prediction (Motion Prediction)
Motion Estimation (ME)
− Search an area in the reference frame (a past or future frame) to find a similar MxN-sample region.
− The process of finding the best match (Motion Vector) is known as motion estimation (ME).
Motion Compensation (MC)
− The chosen candidate region becomes the predictor for the current MxN block (a motion compensated
prediction) and is subtracted from the current block to form a residual MxN block.
− The residual block is encoded and transmitted and the offset between the current block and the position
of the candidate region (motion vector) is also transmitted.
− The decoder uses the received motion vector to re-create the predictor region.
− It decodes the residual block, adds it to the predictor and reconstructs a version of the original block.
77
Temporal Prediction (Motion Prediction)
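The encoder/decoder symmetry above can be sketched in a few lines: both sides copy the same predictor region out of the reference frame using the motion vector, so only the residual has to be transmitted (frames are plain lists of rows; names are illustrative):

```python
def motion_compensate(ref, mv, top, left, bs):
    """Copy the bs x bs predictor region pointed to by motion vector
    mv = (dy, dx) out of the reference frame."""
    dy, dx = mv
    return [row[left + dx:left + dx + bs]
            for row in ref[top + dy:top + dy + bs]]

def encode_block(cur_block, pred):
    """Encoder: residual = current block - motion-compensated prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur_block, pred)]

def decode_block(residual, pred):
    """Decoder: re-create the same prediction, add the residual back."""
    return [[r + p for r, p in zip(rr, pr)] for rr, pr in zip(residual, pred)]
```

Because both sides form the identical predictor, the decoder reconstructs exactly the encoder's input block.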
78
Motion Vector Extraction
− First, the frame to be approximated, the current frame, is chopped up into uniform non-overlapping
blocks.
− Then each block in the current frame is compared to areas of similar size from the previous frame in order
to find an area that is similar. A block from the current frame for which a similar area is sought is known as
a target block.
− The location of the similar or matching block in the past frame might be different from the location of the
target block in the current frame. The relative difference in locations is known as the motion vector.
− If the target block and matching block are found at the same location in their respective frames, then the motion vector that describes their difference is known as a zero vector.
Temporal Prediction (Motion Prediction)
Block-based Motion Estimation:
− Motion estimation of a macroblock involves finding a M×N-sample region in a reference frame that
closely matches the current macroblock (best match).
− The reference frame is a previously encoded frame from the sequence and may be before or after the
current frame in display order. (frames must be encoded out of order)
− Where there is a significant change between the reference and current frames (ex: a scene change or
an uncovered area) it may be more efficient to encode the macroblock without motion compensation
and so an encoder may choose intra mode encoding using intra prediction.
Block-based Motion Compensation:
− The luma and chroma samples of the selected ‘best’ matching region in the reference frame are subtracted from the current macroblock to produce a residual macroblock that is encoded and transmitted together with a motion vector describing the position of the best matching region relative to the current macroblock position.
Block-based Motion Estimation and Compensation
79
T=2 (current)
Block-based Motion Estimation and Compensation, Ex1
Search Window
T=1 (reference)
80
81
Block-based Motion Estimation and Compensation, Ex2
T=2 (current)
T=1 (reference)
82
Frame 1 s[x,y,t-1](previous) Frame 2 s[x,y,t](current) Partition of frame 2 into blocks (schematic)
Frame 2 with displacement vectors Difference between motion-compensated
prediction and current frame u[x,y,t] Referenced blocks in frame 1
Block-based Motion Estimation and Compensation, Ex3
83
Effectiveness of Block Based Motion Prediction
The effectiveness of compression techniques that use block based motion compensation depends on
the extent to which the following assumptions hold.
• Objects move in a plane that is parallel to the camera plane. Thus the effects of zoom and
object rotation are not considered, although tracking in the plane parallel to object motion is.
• Illumination is spatially and temporally uniform. That is, the level of lighting is constant
throughout the image and does not change over time.
• Occlusion of one object by another, and uncovered background are not considered.
Frame N
Frame N+1
Available
from earlier
frame (N)
Not available from earlier frame (N)
for prediction of frame N+1
84
Occlusion in Motion Estimation and Compensation, Example
Occlusion of parts of one object by another object
85
Motion Estimation and Block Matching Algorithms
To carry out motion compensation, the motion of the moving objects has to be estimated first.
− The technique in all the standard video codecs is the Block Matching Algorithm (BMA).
− In a typical BMA, a frame is divided into blocks of 𝑀 × 𝑁 pixels or, more usually, square blocks
of 𝑁 × 𝑁 pixels.
− Then, for a maximum motion displacement of w pixels/frame, the current block of pixels is
matched against a corresponding block at the same coordinates but in the previous/next
frame, within the square window of width 𝑁 + 2𝑤.
− The best match on the basis of a matching criterion yields the displacement.
− Measurements of video encoders’ complexity show that ME comprises about 50–70 per cent of the overall encoder’s complexity.
(Diagram: an (NxN) block at (m, n) in the current frame is matched against (NxN) blocks at (m+i, n+j), |i|, |j| ≤ w, inside an (N+2w)x(N+2w) search window in the previous frame.)
Mean Squared Error (MSE):
M(i, j) = (1/N²) Σ m=1..N Σ n=1..N ( f(m, n) - g(m+i, n+j) )², -w ≤ i, j ≤ w
Mean Absolute Error (MAE):
M(i, j) = (1/N²) Σ m=1..N Σ n=1..N | f(m, n) - g(m+i, n+j) |, -w ≤ i, j ≤ w
Complexity
• Various measures such as the Cross-correlation Function (CCF), Mean Squared Error (MSE) and Mean Absolute Error (MAE) can be used in the matching criterion.
• For the best match, in the CCF the correlation has to be maximised, whereas in the latter two the distortion must be minimised.
• In practical coders, both MSE and MAE are used, since it is believed that CCF would not give good motion tracking, especially when the displacement is not large.
• A full search evaluates the criterion at all (2w+1)² candidate displacements.
Motion Estimation and Block Matching Algorithms
86
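A sketch of the exhaustive (full) search with the MAE criterion above: f is the NxN block in the current frame, g the previous frame, and every displacement (i, j) with |i|, |j| ≤ w is tested (names are illustrative):

```python
def mae(block_a, block_b):
    """Mean absolute error between two equal-sized square blocks."""
    n = len(block_a)
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb)) / (n * n)

def full_search(cur, ref, top, left, n, w):
    """Exhaustive BMA: test all (2w+1)^2 displacements of the n x n block
    at (top, left); return the best motion vector and its MAE."""
    cur_block = [row[left:left + n] for row in cur[top:top + n]]
    best_err, best_mv = float("inf"), (0, 0)
    for i in range(-w, w + 1):
        for j in range(-w, w + 1):
            y, x = top + i, left + j
            if y < 0 or x < 0 or y + n > len(ref) or x + n > len(ref[0]):
                continue  # candidate block falls outside the frame
            cand = [row[x:x + n] for row in ref[y:y + n]]
            err = mae(cur_block, cand)
            if err < best_err:
                best_err, best_mv = err, (i, j)
    return best_mv, best_err
```

This brute force is what the fast algorithms on the following slides (TDL, TSS, CSA, diamond, hexagon) try to approximate with far fewer search points.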
87
Reference Frame Current Frame
Search
Range
Motion
Vector
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
11 22 33 44 55 66 77 88
12 22 33 44 55 66 77 88
12 22 33 44 55 66 77 88
12 22 33 44 55 66 77 88
12 22 33 44 55 66 77 88
11 23 34 44 55 66 77 88
11 23 34 44 55 66 77 88
11 23 34 44 55 66 77 88
11 23 34 44 55 66 77 88
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 0
|A|=12
Exhaustive Block Matching Algorithm (EBMA)
88
(Diagram: TDL search pattern; steps 1–4 re-centre on the best match, halving the step size as the search converges.)
Two-dimensional Log Search Algorithm (TDL)
89
(Diagram: TSS search pattern; steps 1–3 test nine points each with step sizes 4, 2 and 1, re-centring on the best match after each step.)
1 2
3
Three-Step Search Algorithm (TSS)
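A sketch of the three-step idea: nine candidate points per step, then the step size halves and the pattern re-centres on the best point so far. `cost(i, j)` stands in for whichever matching criterion (MAE, MSE) is in use; clamping to the search range is omitted for brevity:

```python
def three_step_search(cost, w=7):
    """TSS: evaluate a 3x3 pattern of displacements, halve the step size,
    re-centre on the best point, and stop once the step of 1 is done."""
    step = (w + 1) // 2          # step size 4 for a -7..+7 search range
    best = (0, 0)
    while step >= 1:
        candidates = [(best[0] + di * step, best[1] + dj * step)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        best = min(candidates, key=lambda p: cost(*p))
        step //= 2
    return best
```

With w = 7 this covers 25 distinct positions (9, then 8 new, then 8 new), against 225 for a full search.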
90
Cross Search Algorithm (CSA)
(Diagram: CSA search pattern on a -7..+7 grid.)
Step 1 => Step 2 (side) => Step 3 (side) => Step 4 (centre)
Total search: 5 + 3 + 3 + 8 = 19 points
Normal worst case: N = 4 steps for search range -7 to +7
91
Diamond Search
(Diagram: diamond search pattern on a -7..+7 grid.)
Step 1 => Step 2 (side) => Step 3 (side) => Step 4 (side) => Step 5
Total search: 9 + 6 + 6 + 4 + 4 = 29 points
92
Hexagon-Based Search Algorithm
First step => Second step => Third step => Final step
Algorithm   Maximum number of search points   w=4   w=8   w=16
FSM         (2w+1)^2                          81    289   1089
TDL         2 + 7 log2(w)                     16    23    30
TSS         1 + 8 log2(w)                     17    25    33
MMEA        1 + 6 log2(w)                     13    19    25
CDS         3 + 2w                            11    19    35
CSA         5 + 4 log2(w)                     13    17    21
OSA         1 + 4 log2(w)                     9     13    17
Computational Complexity
Complexity/Performance
93
The range of motion speed from w=4 to 16 pixels/frame
− The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-
square error, average error and standard deviation.
FSM: Full Search Mode
TDL: Two-dimensional Logarithmic
TSS: Three-step Search
MMEA: Modified Motion Estimation Algorithm
CDS: Conjugate Direction Search
OSA: Orthogonal Search Algorithm
CSA: Cross-search Algorithm
Algorithm   Split Screen                     Trevor White
            Entropy (bits/pel)  Std. dev.    Entropy (bits/pel)  Std. dev.
FSM 4.57 7.39 4.41 6.07
TDL 4.74 8.23 4.60 6.92
TSS 4.74 8.19 4.58 6.86
MMEA 4.81 8.56 4.69 7.46
CDS 4.84 8.86 4.74 7.54
OSA 4.85 8.81 4.72 7.51
CSA 4.82 8.65 4.68 7.42
Compensation Efficiency
Complexity/Performance
94
The motion compensation efficiencies of some algorithms for a motion speed of
w=8 pixels/frame for two test image sequences (Split screen and Trevor white)
− The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-
square error, average error and standard deviation.
FSM: Full Search Mode
TDL: Two-dimensional Logarithmic
TSS: Three-step Search
MMEA: Modified Motion Estimation Algorithm
CDS: Conjugate Direction Search
OSA: Orthogonal Search Algorithm
CSA: Cross-search Algorithm
Some Search Points Comparison
Methods Min Max Average Speed-up
FS 225 225 225 1
TSS 25 25 25 9
4SS 17 27 17.2 13.1
NTSS 17 33 17.5 12.8
Diamond 13 33 13.3 16.9
Method Ave Criterion Ave Distance (FS) Optimality
FS 2753 0 100%
TSS 2790 0.04 98.5%
4SS 2777 3.84 98.7%
NTSS 2775 2.98 99.0%
Diamond 2770 3.11 98.9%
95
96
Sub-pixel Motion Compensation
− In the first stage, motion estimation finds the best match
on the integer pixel grid (circles).
− The encoder searches the half-pixel positions
immediately next to this best match (squares) to see
whether the match can be improved and if required, the
quarter-pixel positions next to the best half-pixel position
(triangles) are then searched.
− The final match, at an integer, half-pixel or quarter-pixel
position, is subtracted from the current block or
macroblock.
Sub-pixel Motion Compensation
97
Close-up of reference region; reference region interpolated to half-pixel positions
98
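Generating the half-pixel grid by bilinear interpolation can be sketched as below. This is only an illustration of the principle, not any standard's exact filter (H.264, for example, specifies a 6-tap filter for half-pel luma samples).

```python
def half_pel(ref):
    """Bilinear 2x upsampling: integer samples are copied; half-pel samples
    are the average of their 2 (horizontal/vertical) or 4 (diagonal)
    integer-pel neighbours."""
    H, W = len(ref), len(ref[0])
    out = [[0.0] * (2 * W - 1) for _ in range(2 * H - 1)]
    for y in range(H):                          # copy integer-pel samples
        for x in range(W):
            out[2 * y][2 * x] = float(ref[y][x])
    for y in range(2 * H - 1):
        for x in range(2 * W - 1):
            if y % 2 == 0 and x % 2 == 1:       # horizontal half-pel
                out[y][x] = (out[y][x - 1] + out[y][x + 1]) / 2
            elif y % 2 == 1 and x % 2 == 0:     # vertical half-pel
                out[y][x] = (out[y - 1][x] + out[y + 1][x]) / 2
            elif y % 2 == 1 and x % 2 == 1:     # diagonal half-pel
                out[y][x] = (out[y - 1][x - 1] + out[y - 1][x + 1] +
                             out[y + 1][x - 1] + out[y + 1][x + 1]) / 4
    return out
```

A 2×2 region upsamples to a 3×3 region whose new samples sit halfway between the original values.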
Half-Pel Accuracy EBMA
99
Motion Compensation Block Size
Motion Compensation Block Size
100
Frame 2
Frame 1
Residual : no motion compensation
Residual : 16 × 16 block size
Residual : 8 × 8 block size
Residual : 4 × 4 block size
− The smaller motion compensation block sizes can produce better motion compensation results.
• Motion compensating each 8 × 8 block instead of each 16 × 16 macroblock reduces the residual
energy further and motion compensating each 4 × 4 block gives the smallest residual energy of all.
− However, a smaller block size leads to increased complexity, with more search operations to be carried
out, and an increase in the number of motion vectors that need to be transmitted.
− An effective compromise is to adapt the block size to the picture characteristics, for example
• choosing a large block size in flat, homogeneous regions of a frame
• choosing a small block size around areas of high detail and complex motion
Motion Compensation Block Size
101
102
Motion Compensated DPCM
[Figure: MC-DPCM encoder. The current frame f(x, y, t) is predicted by the motion-compensated previous reconstructed frame f̂(x + Dx, y + Dy, t − 1) held in an image buffer; the displaced frame difference d(x, y, t) is DCT coded and losslessly encoded, and the motion vector (Dx, Dy) found by motion estimation is transmitted as an extra coding parameter. At the decoder, a DCT-based decoder and image buffer mirror this loop.]
− It is possible to estimate the trajectory of each pixel between successive video frames, producing a field
of pixel trajectories known as the optical flow or optic flow.
− If the optical flow field is accurately known, it should be possible to form an accurate prediction of most of
the pixels of the current frame by moving each pixel from the reference frame along its optical flow
vector.
− However, this is not a practical method of motion compensation. (An accurate calculation of optical flow
is very computationally intensive)
Optical Flow or Optic Flow
103
Frame 1
(as a predictor for frame 2 )
Frame 2
(current frame )
Optical Flow or Optic Flow
− The macroblock, corresponding to a 16×16-pixel region of a frame, is the basic unit for motion
compensated prediction in a number of important visual coding standards including MPEG-1, MPEG-2,
MPEG-4 Visual, H.261, H.263 and H.264.
− For source video material in the popular 4:2:0 format, a macroblock is organized as shown in Figure.
− An H.261 codec processes each video frame in units of a macroblock.
104
[Figure: 4:2:0 macroblock, four 8×8 luminance (Y) blocks plus one 8×8 Cr and one 8×8 Cb block.]
Ex: Motion Estimation in H.261
Macro-block
– Motion estimation of a macroblock involves finding a 16×16-sample
region in a reference frame that closely matches the current
macroblock.
– Luminance: 16x16, four 8x8 blocks
– Chrominance: two 8x8 blocks
– Motion estimation only performed for luminance component
Motion Vector Range
– [ -15, 15]
– MB: 16 x 16
[Figure: ±15-pixel search area around the macroblock (MB) in the reference frame.]
105
Ex: Motion Estimation in H.261
[Figure: macroblock structure, luminance blocks Y0, Y1, Y2, Y3 and chrominance blocks Cr, Cb.]
− Integer pixel ME search only
− Motion vectors are differentially & separately encoded
− 11-bit VLC for MVD (Motion Vector Delta)
Example
MV = 2 2 3 5 3 1 -1
MVD = 0 1 2 -2 -2 -2…
− Binary: 1 010 0010 0011 0011 0011…
MVD_x = MV_x[n] − MV_x[n − 1]
MVD_y = MV_y[n] − MV_y[n − 1]
106
MVD VLC
… …
-2 & 30 0011
-1 011
0 1
1 010
2 & -30 0010
3 & -29 0001 0
Ex: Motion Vectors Coding in H.261
Prediction error (displaced frame difference) = f(x, y, t) − f(x + Δx, y + Δy, t − 1)
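The differential motion vector coding above can be reproduced in a few lines of Python. The table here holds only the subset of the H.261 MVD VLC codes shown on the slide.

```python
# Subset of the H.261 MVD VLC table shown above
VLC = {0: '1', 1: '010', -1: '011', 2: '0010', -2: '0011', 3: '00010'}

def encode_mvds(mvs):
    """Differentially encode one component of a motion vector sequence:
    MVD[n] = MV[n] - MV[n-1], then look each MVD up in the VLC table."""
    codes = []
    for prev, mv in zip(mvs, mvs[1:]):
        codes.append(VLC[mv - prev])
    return ' '.join(codes)
```

The slide's example sequence MV = 2 2 3 5 3 1 −1 gives MVD = 0 1 2 −2 −2 −2, which codes to the binary string shown above.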
Uncompressed SDTV digital video stream: 170 Mb/s, each picture 830 kBytes
Compressed frame sizes: I frame ≈ 100 kBytes, P frame ≈ 33–50 kBytes, B frame ≈ 12–30 kBytes
I - Intra coded picture without reference to other pictures. Compressed using spatial redundancy only
MPEG-2 Compressed SDTV Digital Video Stream - 3.9 Mb/s
P - Predictive coded picture using motion compensated prediction from past I or P frames
B - Bi-directionally predictive coded picture using both past and future I or P frames
I, P & B Frames (Ex: MPEG 1)
107
I: Intra Coded Frame
P: Predictively Coded (Predictive-coded ) Frame
B: Bidirectionally Coded (Bidirectional-coded) Frame
• Intraframe Compression
– Frames marked by (I) denote the frames that are strictly intraframe compressed.
– The purpose of these frames, called the "I pictures", is to serve as random access points
to the sequence.
I Frames
108
• P Frames use motion-compensated forward predictive compression on a block basis.
– Motion vectors and prediction errors are coded.
– Predicting blocks from closest (most recently decoded) I and P pictures are utilised.
Forward Prediction
P Frames
109
• B frames use motion-compensated bi-directional predictive compression on a block basis.
– Motion vectors and prediction errors are coded.
– Predicting blocks from closest (most recently decoded) I and P pictures are utilised.
Forward Prediction
Bi-Directional Prediction
B Frames
110
Backward Prediction
I-pictures
• They are coded without reference to the previous picture.
• They provide access points to the coded sequence for decoding (intraframe coded as for JPEG)
P-pictures
• They are predictively coded with reference to the previous I- or P-coded pictures.
• They themselves are used as a reference (anchor) for coding of the future pictures.
B-pictures
• Bidirectionally coded pictures, which may use past, future or combinations of both pictures in their
predictions.
D-pictures
• As intraframe coded, where only the DC coefficients are retained.
• Hence, the picture quality is poor and normally used for applications like fast forward.
• D-pictures are not part of the GOP; hence, they are not present in a sequence containing any other
picture types. 111
I, P, B and D Pictures Features
• Relative number of (I), (P), and (B) pictures can be arbitrary.
• Group of Pictures (GOP) is the Distance from one I frame to the next I frame.
• Ex: MPEG-2: An I picture is mandatory at least once in a sequence of 132 frames (period_max=132)
1 2 3 4 5 6 7 8 9 10 11 12 1
GOP = 12
Group of Pictures
112
An I picture is mandatory at least once in a sequence of 132 frames (period_max= 132)
GOP = 6
GOP = 2
GOP = 2
113
Group of Pictures, Examples
– I frames are independently encoded.
– P frames are based on previous I, P frames.
– B frames are based on previous and following I and/or P frames.
114
The Typical Size of Compressed Frames
I: Intra Coded Frame
P: Predictively Coded (Predictive-coded ) Frame
B: Bidirectionally Coded (Bidirectional-coded) Frame
Type Size Compression
I 18kB 7:1
P 6kB 20:1
B 2.5kB 50:1
Avg 4.8kB 27:1
Typical Sizes of MPEG-1 Frames
– If B-pictures are not used for predictions of future frames, then they can be coded with the highest
possible compression without any side effects.
– This is because, if one picture is coarsely coded and is used as a prediction, the coding distortions are
transferred to the next frame. This frame then needs more bits to clear the previous distortions, and the
overall bit rate may increase rather than decrease.
– The typical size of compressed P-frames is significantly smaller than that of I-frames (because temporal
redundancy is exploited in inter-frame compression).
– B-frames are even smaller than P-frames because
• the advantage of bi-directional prediction
• the lowest priority given to B-frames
115
Type Size Compression
I 18kB 7:1
P 6kB 20:1
B 2.5kB 50:1
Avg 4.8kB 27:1
Typical Sizes of MPEG-1 Frames
The Typical Size of Compressed Frames
Previous reference Current frame Future reference
Forward prediction
P-frame (Predictive Coded Frame) Features
Forward prediction
Why we need P frame?
116
[Figure: P-frame coding. The motion-compensated prediction from the previous frame is combined with the intraframe-compressed residual to reconstruct the P-picture P′.]
– The P-frames are forward predicted from the last I-frame or P-
frame.
– It is impossible to reconstruct them without the data of another
frame (I or P).
– Are coded with respect to the nearest previous I- or P-frames.
– This technique is called forward prediction.
– It uses motion compensation to provide more compression than I-
frames.
– About 30% the size of an I frame
P-frame (Predictive Coded Frame) Features
117
Forward Prediction
Difference
Forward Motion vector
Reference image
(previous image)
Encode:
− Motion Vector - difference in spatial location of macro-blocks.
− Small difference in content of the macro-blocks
Current image
Huffman
Coding
P-frame (Predictive Coded Frame) Features
118
000111010…...
B-frame (Bidirectional Coded Frame) Features
Previous reference Current frame Future reference
Forward prediction Backward prediction
Why we need B frame?
119
120
– The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous
frame because half of the ball was occluded by another object.
– A match however can readily be obtained from the next frame.
B-frame (Bidirectional Coded Frame) Features
Forward
prediction
Backward
prediction
Best match
Forward Motion Vector
Macroblock to be coded
Previous reference picture
Current B-picture
Future reference picture
Best match
Backward Motion Vector
121
Forward Prediction
Backward Prediction
B-frame (Bidirectional Coded Frame) Features
122
B-frame (Bidirectional Coded Frame) Features
– B-frame requires information of the previous and following I-frame
and/or P-frame for encoding and decoding.
– Three types of motion compensation techniques are used:
• Forward motion compensation uses past anchor frame
information.
• Backward motion compensation uses future anchor frame
information.
• Interpolative motion compensation uses the average of the
past and future anchor frame information.
– It uses motion compensation to provide more compression than I-and
P-frames.
– About 15% the size of an I frame.
123
B-frame (Bidirectional Coded Frame) Features
[Figure: B-frame coding. Forward, backward and averaged bi-directional predictions are combined with the intraframe-compressed residual to reconstruct the B-picture B′.]
124
– B-pictures have access to both past and future anchor pictures.
– Such an option increases the motion compensation efficiency, particularly when there are occluded
objects in the scene.
– In fact, one of the reasons for the introduction of B-pictures was this fact that the forward motion
estimation and P-pictures cannot compensate for the uncovered background of moving objects.
– From the two forward and backward motion vectors, the coder has a choice of choosing any of the
forward, backward or their combined motion-compensated predictions.
B-frame (Bidirectional Coded Frame) Features
125
– Note that B-pictures do not use motion compensation from each other, since they are not used as
predictors.
– Also note that the motion vector overhead in B-pictures is much more than in P-pictures.
– The reason is that, for B-pictures, there are more macroblock types, which increase the macroblock type
overhead, and for the bidirectionally motion-compensated macroblocks two motion vectors have to be
sent.
B-frame (Bidirectional Coded Frame) Features
- [ + ] =
Past reference Target Future reference
Encode:
− Two motion vectors - difference in spatial location of macro-blocks.
• Two motion vectors are estimated (one to a past frame, one to a future frame).
− Small difference in content of the macro-blocks
126
Interpolative compensation uses the weighted average of the past and future anchor frame information.
B-frame (Bidirectional Coded Frame) Features
FMV
BMV
𝒘 𝟏 𝒘 𝟐
DCT + Quant + RLE
Huffman CodeMotion Vectors 000111010…...
127
The combined motion-compensated predictions
– A weighted average of the forward and backward motion-compensated pictures is calculated.
– The weight is inversely proportional to the distance of the B-picture with its anchor pictures.
Ex: GOB structure of I, B1, B2, P
– The bidirectionally interpolated motion-compensated picture for B1 would be two-thirds of the forward
motion-compensated pixels from the I-picture and one-third from backward motion-compensated pixels
of the P-picture.
B-frame (Bidirectional Coded Frame) Features
Prediction for B1 = (2/3) × forward MC prediction (from I) + (1/3) × backward MC prediction (from P)
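The distance-weighted interpolation can be sketched as below. This is an illustrative helper operating on flat sample lists; a real codec weights whole prediction blocks.

```python
def bipredict(fwd, bwd, d_fwd, d_bwd):
    """Combine forward and backward motion-compensated predictions with
    weights inversely proportional to the temporal distance to each anchor."""
    w_f = d_bwd / (d_fwd + d_bwd)   # the nearer anchor gets the larger weight
    w_b = d_fwd / (d_fwd + d_bwd)
    return [w_f * f + w_b * b for f, b in zip(fwd, bwd)]
```

For B1 in a GOB of I, B1, B2, P: d_fwd = 1 and d_bwd = 2, so the forward prediction is weighted 2/3 and the backward prediction 1/3, as in the example above.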
New Search Mechanism: Prediction
Spatial Motion Vector Prediction:
a. Due to the spatial correlation, the motion vector of the current block is close to those in nearby blocks.
− Usage of Predictor:
1. Initial search point
2. DPCM coding of motion vector
128
[Figure: the motion vector v_{i,j} of the current (uncoded) block and its previously coded neighbours in the current frame.]
Predictor example 1: ṽ_{i,j} = Mean{ v_{i−1,j−1}, v_{i−1,j}, v_{i,j−1} }
Predictor example 2: ṽ_{i,j} = Median{ v_{i−1,j−1}, v_{i−1,j}, v_{i,j−1} }
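The median predictor can be sketched with a component-wise median over the coded neighbours (this is the common form used in H.263/H.264-style predictors; names here are illustrative):

```python
def median_mv(neighbors):
    """Component-wise median of an odd number of motion vectors,
    e.g. the left, top and top-left coded neighbours."""
    xs = sorted(v[0] for v in neighbors)
    ys = sorted(v[1] for v in neighbors)
    mid = len(neighbors) // 2
    return (xs[mid], ys[mid])
```

As noted above, the predicted vector serves both as the initial search point and as the reference for DPCM coding of the current block's motion vector.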
New Search Mechanism: Prediction
Temporal Motion Vector Prediction:
b. Due to the temporal correlation, the motion vector of the current block is close to those in nearby
blocks in the previous frame.
129
[Figure: the current block's vector v^t_{i,j} and its coded neighbouring vectors in the current frame, together with the vectors (all coded) at and around the co-located position in the previous frame, v^{t−1}.]
Predictor example: ṽ^t_{i,j} = Median{ v^t_{i−1,j−1}, v^t_{i−1,j}, v^t_{i,j−1}, v^{t−1}_{i,j}, v^{t−1}_{i,j+1}, v^{t−1}_{i+1,j} }
130
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
131
Inter-frame and Intra-frame Coding
Still Image Compression
Moving Picture Compression
Intra-frame Compression/Coding
– Is a picture coded without
reference to any picture
except itself.
– It is a still image encoded
in JPEG in real-time.
– Often, I pictures (I-frames)
are used for random
access and as references
for the decoding of other
pictures.
132
• Zig-Zag Scan
• Run-length coding
• VLC
Quantization
• Major Reduction
• Controls ‘Quality’
133
Intra-frame Compression (like still image compression)
Intra-frame Compression (like still image compression)
Data
Buffer
Entropy
Coding
Quantisation
Discrete
Cosine
Transform
Base band
input
Compressed
output
Zig-zag
Scanning
Rearranges the pixels into
frequency coefficients Replaces the original data
with shorter codes or symbols
Rearranges the
coefficients from raster
scan to low frequency first
Stores the compressed data & checks
compression ratio. Returns a quantisation
signal if ratio is not achieved.
Quantises the data if the compression ratio
has not been achieved
134
Inter-frame Compression/Coding
135
136
Inter-frame Compression/Coding
– Inter-frame coding removes temporal redundancy (Inter-frames reduce the average bit rate for the same
quality!)
– Relies on successive frames looking similar.
• Does not work well with cuts and breaks.
– 2 different types of comparison.
• P-frame & B-frame.
– Inter-frames need a ‘reference’ frame.
• i.e. an I-frame (or P-frame).
– Many inter-frames can be used after the I frame.
• This can reduce bit rate a lot.
– Eventually the process must be started again.
• The difference becomes too great especially if there is a cut.
137
Inter-frame Compression/Coding
138
A Generic Video Encoder
DCT
Motion
Estimation
Motion
Compensation
Frame store
Entropy Coding
(RLC then VLC)
Intra
Prediction
Intra/Inter Mode
Decision
Inverse Quantization
+
-
+
+
Video Input Output Bit streamQuantization
Inverse Transform
MVs
buffer
MVs
Deblocking
Filter
139
A Generic Video Encoder (Ex: AVC)
140
A Generic Video Encoder
141
Interframe Loop
− In interframe predictive coding, the difference between pixels in the current frame and their
prediction values from the reference frame (ex: previous frame) is coded and transmitted.
− At the receiver, after decoding the error signal of each pixel, it is added to a similar prediction
value to reconstruct the picture.
− The better the predictor, the smaller the error signal, and hence the transmission bit rate.
− When there is motion, assuming that movement in the picture is only a shift of object position, a pixel in the previous frame displaced by a motion vector is used as the prediction.
A Generic Video Encoder
142
Motion Estimator
− Assigning a motion vector to a group of pixels.
− A group of pixels is motion compensated, such that the motion vector overhead per pixel can
be very small.
− In standard codecs, a block of 16×16 pixels, known as a macroblock (MB) (to be differentiated
from 8 ×8 DCT blocks), is motion estimated and compensated.
− It should be noted that ME is only carried out on the luminance parts of the pictures.
− A scaled version of the same motion vector is used for compensation of chrominance blocks,
depending on the picture format.
A Generic Video Encoder
143
BD: Block Difference
DBD: Displaced Block Difference

BD = (1/256) Σ_{MB} |c[x, y] − r[x, y]|
DBD = (1/256) Σ_{MB} |c[x, y] − r[x + dx, y + dy]|

[Figure: H.261 MC/no-MC decision characteristic, per-pixel DBD versus BD; motion compensation is selected below the boundary line y = x/1.1.]
A Generic Video Encoder
– Not all blocks are motion compensated
– The one which generates less bits are preferred.
Motion Compensation Decision Characteristic (H.261)
144
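The decision can be sketched as follows: compute the per-pixel block difference and displaced block difference over the 16×16 macroblock, then motion compensate only when DBD falls below the y = x/1.1 characteristic. This is one reading of the H.261 decision curve; function names are ours.

```python
def block_diff(cur, ref, dx=0, dy=0):
    """(1/256) * sum of |c[x, y] - r[x + dx, y + dy]| over a 16x16 MB."""
    total = sum(abs(cur[y][x] - ref[y + dy][x + dx])
                for y in range(16) for x in range(16))
    return total / 256

def use_motion_compensation(bd, dbd):
    """Prefer MC only when the displaced difference is clearly smaller."""
    return dbd < bd / 1.1
```

A block that is already well predicted without displacement (DBD only marginally below BD) is left uncompensated, saving the motion vector bits.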
Inter/Intra Switch
− Every MB is either interframe or intraframe coded, called inter/intra MBs.
− The decision on the type of MB depends on the coding technique.
− Sometimes it might be advantageous to intraframe code an MB, rather than
interframe coding it.
There are at least two reasons for intraframe coding:
I. Scene cuts or, in the event of violent motion, interframe prediction errors may not be less than those
of the intraframe. Hence, intraframe pictures might be coded at lower bit rates.
II. Intraframe coded pictures have a better error resilience to channel errors.
A Generic Video Encoder
145
Inter/Intra Switch
− In interframe coding in the event of channel error, the error
propagates into the subsequent frames. If that part of the
picture is not updated, the error can persist for a long time.
− The variance of intraframe MB is compared with that of the
variance of interframe MB (motion compensated or not) in
previous frame. The smallest is chosen.
• For large variances, no preference between the two modes.
• For smaller variances, interframe is preferred.
− The reason is that, in intra mode, the DC coefficients of the
blocks have to be quantised with a quantiser without a dead
zone and with 8-bit resolutions. This increases the bit rate
compared to that of the interframe mode, and hence
interframe is preferred.
MC/NO_MC mode decision in H.261
A Generic Video Encoder
(Intraframe AC energy)
(Interframe AC energy)
146
DCT
− Every MB is divided into 8×8 luminance and chrominance pixel blocks.
− Each block is then transformed via the DCT.
− There are four luminance blocks in each MB, but the number of chrominance blocks depends
on the colour resolutions (image format).
A Generic Video Encoder
147
Quantiser
− There are two types of quantisers.
• With dead zone for the AC coefficients and the DC coefficient of inter MB
• Without the dead zone for the DC coefficient of intra MB.
− With a dead zone quantiser, if the modulus (absolute value) of a
coefficient is less than the quantiser step size q, it is set to zero; otherwise, it
is quantised according to quantiser indices.
Variable length coding
− The quantiser indices are variable length coded, according to the type of
VLC used.
− Motion vectors, as well as the address of coded MBs, are also variable
length coded.
A Generic Video Encoder
148
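The two quantiser types can be sketched as below. This is a minimal illustration producing only the quantiser index; real standards define the exact dead-zone widths and reconstruction levels.

```python
def quantise(coef, q, dead_zone=True):
    """Uniform quantiser with step size q. With a dead zone, coefficients
    whose modulus is below q are set to zero (AC and inter-DC coefficients);
    without one, every level is kept (intra-DC coefficients)."""
    if dead_zone:
        if abs(coef) < q:
            return 0
        sign = 1 if coef >= 0 else -1
        return sign * (abs(coef) // q)
    return round(coef / q)
```

Small coefficients, which are mostly noise, disappear inside the dead zone, which is why the intra DC coefficient, carrying the block's mean level, must be quantised without one.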
IQ and IDCT
− To generate a prediction for interframe coding, the quantised DCT coefficients are first inverse
quantised and inverse DCT coded.
− These are added to their previous picture values (after a frame delay by the frame store) to
generate a replica of decoded picture.
− The picture is then used as a prediction for coding of the next picture in the sequence.
Buffer
− The bit rate generated by an interframe coder is variable. (a function of motion of objects and their
details)
− Therefore, to transmit coded video into fixed rate channels, the bit rate has to be regulated. Storing
the coded data in a buffer and then emptying the buffer at the channel rate does this.
− However, if the picture activity is such that the buffer may overflow (violent motion), then a
feedback from the buffer to the quantiser can regulate the bit rate.
A Generic Video Encoder
149
A Generic Video Decoder
Motion
Compensation
Entropy
Decoding
Intra
Prediction
Intra/Inter Mode
Selection
Inverse Quantization
& Inverse DCT
+
+
Input Bit stream Video Output
Picture
Buffering
Deblocking
Filter
150
A Generic Video Decoder (Ex: AVC)
151
A Generic Video Decoder
− The compressed bitstream, after demultiplexing and Variable Length Decoding (VLD), separates the
motion vectors and the DCT coefficients.
− Motion vectors are used by motion compensation
− The DCT coefficients after the inverse quantisation and IDCT are converted to error data.
− They are then added to the motion-compensated previous frame to reconstruct the decoded picture.
152
Bit Rate Variation
Constant Bit Rate (CBR)
− Quantiser step size and even frame rate may change to adapt the bit rate to channel rate
− Video quality is variable
− Normally a complex structure is used to regulate the bit rate
Variable Bit Rate (VBR)
− Quantiser step size is nearly constant, generating almost constant quality picture
− Difficult to adapt to channel rate, but suitable for packet-switched network applications (e.g. Internet)
− No need for bit rate regulation (codec is simple)
− To achieve the requirement of random access, a set of pictures can be defined to form a
Group of Picture (GOP), consisting of a minimum of one I-frame, which is the first frame,
together with some P-frames and/or B-frames.
153
Group of Picture (GOP), Recall
Forward Prediction
Bi-Directional Prediction   Backward Prediction
GOP = 12
I: Intra Coded Frame
P: Predictively Coded (Predictive-coded ) Frame
B: Bidirectionally Coded (Bidirectional-coded) Frame
Example GOP Structures
MPEG-2: Simple Possibilities
MPEG-2: An I picture is mandatory at least once in a sequence of 132 frames (period_max=132)
154
I B B P B B I B B P B B I
I I I I I I I I I I I I I
I P I P I P I P I P I P I
I P I P I I I P I P I I I
Example GOP Structures
I I I I I I I I I …       High bit rate, broadcast quality. Easy to edit.
I B I B I B I B I …
I B B P B B P B B P …     Low bit rate, domestic and transmission quality. No further editing required.
Example GOP Structures
I I I I I I I I I …       I frame only (1-frame GOP), used by Sony IMX
I B I B I B I B I …       IB frame only (2-frame GOP), used by Sony Betacam SX
I B B P B B P B B P …     Long GOP, used by satellite, cable & DVD
157
How to Chose Current Frame: I, B, P?
To chose of how to encode the current frame is done by encoder
− Change of scene frames should be encoded as I-frame
− Encoder should never allocate too long sequences of P or B frames (Interframe coding is bad for error
resilience)
− B frames are computationally intensive
− Must compute forward and backward motion vectors
On average, natural images with fixed quantization intervals:
− Size(I-frame) : Size(P-frame) : Size(B-frame) = 6 : 3 : 2
− I and P pictures are called “anchor” pictures
− A GOP is a series of one or more pictures to assist random access into the picture sequence.
− The GOP length is normally defined as the distance between I-pictures, which is represented by
parameter N in the standard codecs.
− The distance between the anchor I/P and P-pictures is represented by M.
− The encoding or transmission order of pictures differs from the display or incoming picture order.
− This reordering introduces delays amounting to several frames at the encoder (equal to the number of B-
pictures between the anchor I- and P-pictures).
− The same amount of delay is introduced at the decoder in putting the transmission/ decoding sequence
back to its original. This format inevitably limits the application of MPEG-1 for telecommunications.
− A GOP, in coding, must start with an I picture and in display order, must start with an I or B picture and
must end with an I or P picture.
158
Group of pictures and Reordering
− In order to allow B frames to be decoded, frames are re-ordered when the MPEG file is created, so that
when a frame is received, the decoder will already have the required reference frames.
− To encode the frames in display order
I1B1B2P1B3B4P2B5B6P3B7B8P4
− The ordering of the frames in the file would be
I1P1B1B2P2B3B4P3B5B6P4B7B8
− When the decoder receives the P frame, it decodes it, but it would delay displaying the picture, as the
next frame is a B frame
159
Picture Re-ordering, Ex. 1
Encoder Input / Decoder Output (display order):
I B B P B B P B B I
1 2 3 4 5 6 7 8 9 10
Encoder Output / Decoder Input (transmission order):
I P B B P B B I B B
1 4 2 3 7 5 6 10 8 9
Picture Re-ordering, Ex. 1
160
1 2 3 4 5 6 7 8 9 10 11 12 1
Source and Display Order
Transmission Order
161
Picture Re-ordering, Ex. 2
Group of pictures and Reordering
162
0 3 1 2 6 4 5 9 7 8 12
I P B B P B B I B B P
Encoding Order of Frames
Intra frame coding
(Temporal reference)
0 1 2 3 4 5 6 7 8 9 10
I B B P B B P B B I B
Group of Picture (GOP)
Forward prediction Bidirectional prediction
N=8
M=3
Forward prediction Backward prediction
Picture Re-ordering, Ex. 3
• Give the encoded sequence of the following frames:
I1 P1 P2 B1 B2 B3 B4 P3 B5 I2 B6 B7 B8 P4
• Answer
I1 P1 P2 P3 B1 B2 B3 B4 I2 B5 P4 B6 B7 B8
163
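The reordering rule, in which every I/P anchor is transmitted ahead of the B-pictures that precede it in display order, can be sketched as:

```python
def transmission_order(display):
    """Reorder display-order frame labels ('I', 'P', 'B' prefixed) so that
    every anchor (I or P) precedes the B-frames that reference it."""
    out, pending_b = [], []
    for frame in display:
        if frame[0] == 'B':
            pending_b.append(frame)   # hold B-frames until their future anchor
        else:
            out.append(frame)         # emit the anchor first ...
            out += pending_b          # ... then the held B-frames
            pending_b = []
    return out + pending_b
```

Running this on the display sequence of Example 4 reproduces the answer given above.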
Picture Re-ordering, Ex. 4
164
Encoding order: I0, P3, B1, B2, P6, B4, B5, I9, B7, B8
Playback order: I0, B1, B2, P3, B4, B5, P6, B7, B8, I9
Picture Re-ordering, Ex. 5
Ex: Bit Rate and Compression Ratio
− Consider a video clip encoded in MPEG-1 with a frame rate of 30 frames per second and a group of
pictures with sequence:
I B B P B B P B B P B B P B B .....
− If the size of each I-frame, P-frame and B-frame is 12.5 KB, 6 KB and 2.5 KB respectively, calculate the
average bit rate for the video clip.
− Suppose that the uncompressed frames are each of size 150 KB, find the compression ratio.
165
Solution:
− In a GOP, there are 1 I-frame, 4 P-frames, and 10 B-frames.
I-frame = 1 ×12.5 = 12.5 KB
P-frame = 4 × 6 = 24 KB
B-frame = 10 × 2.5 = 25 KB
Size of a GOPs = 61.5 KB
In 1 second, there are 30 frames = 2 GOPs = 2 × 61.5 KB = 123 KB
Average bit rate = 123 × 1024 × 8 = 1007616 bit/s
− Overall compression ratio for the video stream
= original/compressed = 15×150 / 61.5 = 36.59.
166
Ex: Bit Rate and Compression Ratio
167
[Figure: PSNR over time. A long-GOP stream (I B B P B B I B …) shows PSNR varying with picture type within each GOP, while an I-frame-only stream gives uniform quality.]
Moving Picture Types Quality
168
Moving Picture Types Quality
169
To GOP or not to GOP
[Figure: PSNR of a long-GOP codec at the 1st, 5th and 10th generation.]
AVC-Intra100 : Cut Edit
AVC-Intra50 : Cut Edit
Still Pictures
Fast Motion
Confetti fall
Flashing lights
Landscape
Long GOP quality is
content dependent
170
Code and Decode Speed for Inter and Intra Codecs, Examples
Software Coded Performance
171
Code and Decode Speed for Inter and Intra Codecs, Examples
Core i7 4770, 4 core, 8 thread
Multi Slice Encoding
172
[Figure: single-CPU model, one CPU (CPU #0) encodes GOP 0, GOP 1, GOP 2 in sequence; multi-CPU model, each frame is split into four slices A, B, C, D and CPUs #0 to #3 each encode one slice in parallel. 1 GOP = 6 frames is used for the explanation.]
Blocking
− Borders of 8x8 blocks become visible in reconstructed frame (Caused by coarse quantization, with
different quantization applied to neighboring blocks.)
Ringing (Echoing, Ghosting)
− Distortions near edges of the image (caused by quantization/truncation of high-frequency transform (DCT/DWT) coefficients during compression)
173
Original image
Reconstructed image
(with ringing Artifacts)
De-blocking and De-ringing Filters
Deblocking and Deringing Filters
Low-pass filters are used to smooth the image where artifacts occur.
De-blocking:
− Do Low-pass filtering on the pixels at borders of 8x8 blocks
− One-dimensional filter applied perpendicular to 8x8 block borders
− Can be turned on or off for each block, usually go together with MC
− Advantage
• Decreases prediction error by smoothing the prediction frame
• Reduces high-frequency artifacts like mosquito effects
− Disadvantage
• Increases complexity & overhead
De-ringing:
− Detect edges of image features
− Adaptively apply 2D filter to smooth out areas near edges
− Little or no filtering applied to edge pixels in order to avoid blurring
174
Deblocking
Artifact Reduction: Post-processing vs. In-loop filtering
De-blocking/de-ringing often applied after the decoder
(post-processing)
− Reference frames are not filtered
− Developers free to select best filters for the application
or not filter at all
− It may require an additional frame buffer
De-blocking/de-ringing can be incorporated in the
compression algorithm (in-loop filtering)
− Reference frames are filtered
− Same filters must be applied in encoder and decoder
− Better image quality at very low bit-rates
175
Sensitivity to Transmission Errors
− Prediction and Variable Length Coding (VLC) makes the video stream very sensitive to
transmission errors on the bitstream
− Error in one frame will propagate to subsequent frames
− Bit errors in one part of the bit stream make the following bits undecodable
176
Effect of Transmission Errors
177
Example reconstructed video frames from a H.263 coded sequence, subject to packet losses
Error Resilient Encoding
− To help the decoder to resume normal decoding after errors occur, the encoder can
• Periodically insert INTRA mode (INTRA refresh)
• Insert resynchronization codewords at the beginning of a group of blocks (GOB)
− More sophisticated error-resilience tools
• Multiple description coding
− Trade-off between efficiency and error-resilience
− Can also use channel coding / retransmission to correct errors
178
Error Concealment
− With proper error-resilience tools, packet loss typically leads to the loss of an isolated segment of a frame
− The lost region can be “recovered” based on the received regions by spatial/temporal interpolation →
Error concealment
− Decoders on the market differ in their error concealment capabilities
179
[Figure: reconstructed frame without concealment vs. with concealment]
180
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
181
Projective Mapping
2-D Motion: Projection of 3-D motion, depending on 3D object motion and projection operator
Optical flow: “Perceived” 2-D motion based on changes in image pattern, also depends on illumination and
object surface texture
On the left, a sphere is rotating under a
constant ambient illumination, but the
observed image does not change.
On the right, a point light source is rotating
around a stationary sphere, causing the
highlight point on the sphere to rotate
182
When does optical flow break?
183
3D Motion to 2D Motion Projective Mapping
[Figure: projection geometry — a 3-D motion vector from point X to X′ projects through the camera center C onto the image plane as the 2-D motion vector d from x to x′]
184
2D Motion Corresponding to Rigid 3D Object Motion
− General case
− Projective mapping
185
Two Features of Projective Mapping
− Chirping: increasing perceived spatial frequency for far away objects
− Converging (Keystone): parallel lines converge in distance
Non-chirping models: (Original) (Affine) (Bilinear) (Projective)
Chirping models: (Relative-projective) (Pseudo-perspective) (Biquadratic)
186
Affine and Bilinear Transformation Models
Approximation of projective mapping:
I. Affine (6 parameters): Good for mapping triangles to triangles
II. Bilinear (8 parameters): Good for mapping blocks to quadrangles
187
Perspective and Pixel Coordinate Transformation Models
Approximation of projective mapping:
I. Perspective Transformation
II. Pixel Coordinate Transformation
Perspective
x′1 = (a1x1 + a2x2 + a3) / (a7x1 + a8x2 + 1)
x′2 = (a4x1 + a5x2 + a6) / (a7x1 + a8x2 + 1)
Eight Motion Parameters: a1, a2, a3, a4, a5, a6, a7, a8
Shift
x’1 = x1 + d1 x’2 = x2 + d2
Two Motion Parameters: d1, d2
The simplest block motion model, and the one mostly used for
block-based motion compensation!
188
Motion Field Corresponding to Different 2-D Motion Models
Translation
Affine
(a) (b)
Bilinear Projective
(c) (d)
189
Sample Motion Field
190
2D Motion Corresponding to Camera Motion
(b)(a)
Camera zoom Camera rotation around Z-axis (roll)
191
Optical flow or optic flow
− It is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by
the relative motion between an observer (an eye or a camera) and the scene
Methods for determining optical flow
− Phase correlation – inverse of normalized cross-power spectrum
− Block-based methods – minimizing sum of squared differences or sum of absolute differences,
or maximizing normalized cross-correlation
Optical flow
192
Optical Flow Equation
When the illumination condition is unknown, the best one can do is to estimate the optical flow.
Constant intensity assumption → Optical flow equation
Under "constant intensity assumption":
But, using Taylor's expansion
Comparing the above two, we obtain the optical flow equation
Brightness or intensity (𝝍)
193
Ambiguities in Motion Estimation
− Optical flow equation only constrains the flow vector in the gradient direction 𝑣 𝑛
− The flow vector in the tangent direction (𝑣 𝑡) is under-determined
− In regions with constant brightness (∇𝜓 = 0), the flow is indeterminate → Motion estimation is unreliable in
regions with flat texture, more reliable near edges
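A quick numerical check of the optical flow equation 𝜓ₓ·vₓ + 𝜓ᵧ·vᵧ + 𝜓ₜ = 0 (a sketch; the frames and the motion below are made-up test data): for a linear ramp translating between two frames, the finite-difference residual vanishes at every pixel, since intensity is conserved along the motion.

```python
import numpy as np

def of_residual(psi1, psi2, v):
    """Left-hand side of the optical flow equation psi_x*vx + psi_y*vy + psi_t,
    with spatial gradients from finite differences and psi_t = psi2 - psi1."""
    psi_y, psi_x = np.gradient(psi1)   # derivatives along rows, then columns
    return psi_x * v[0] + psi_y * v[1] + (psi2 - psi1)

# A linear ramp translating by (vx, vy) = (1, 2) between the two frames
x, y = np.meshgrid(np.arange(16.0), np.arange(16.0))
vx, vy = 1.0, 2.0
psi1 = 2.0 * x + 3.0 * y
psi2 = 2.0 * (x - vx) + 3.0 * (y - vy)
```

Note that the residual is zero for any (vx, vy) along the ramp's level lines as well — the aperture ambiguity described above.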
194
General Considerations for Motion Estimation
Two categories of approaches:
− Feature based (more often used in object tracking, 3D reconstruction from 2D)
− Intensity based (based on constant intensity assumption) (more often used for motion compensated
prediction, required in video coding, frame interpolation) → Our focus
Three important questions
− How to represent the motion field?
− What criteria to use to estimate motion parameters?
− How to search motion parameters?
195
Motion Representation
Global:
Entire motion field is represented
by a few global parameters
Pixel-based:
One MV at each pixel, with some
smoothness constraint between
adjacent MVs.
Region-based:
Entire frame is divided into regions,
each region corresponding to an
object or sub- object with consistent
motion, represented by a few
parameters.
Block-based:
Entire frame is divided into blocks,
and motion in each block is
characterized by a few
parameters.
196
Notations
Anchor frame: ψ1(x)
Target frame: ψ2(x)
Motion parameters: a
Motion vector at a pixel in the anchor frame: d(x)
Motion field: d(x; a), x ∈ Λ
Mapping function: w(x; a) = x + d(x; a), x ∈ Λ
197
Motion Estimation Criterion
198
Relation Among Different Criteria
− OF (Optical Flow) criterion is good only if motion is small.
− OF criterion can often yield closed-form solution as the objective function is quadratic in MVs.
− When the motion is not small, can iterate the solution based on the OF criterion to satisfy the DFD criterion.
− Bayesian criterion can be reduced to the DFD criterion plus motion smoothness constraint
− More in the textbook
199
Optimization Methods
Exhaustive search
– Typically used for the DFD criterion with p=1 (MAD)
– Guarantees reaching the global optimal
– Computation required may be unacceptable when number of parameters to search simultaneously is large!
– Fast search algorithms reach sub-optimal solution in shorter time
Gradient-based search
– Typically used for the DFD or OF criterion with p=2 (MSE)
− The gradient can often be calculated analytically
− When used with the OF criterion, closed-form solution may be obtained
– Reaches the local optimal point closest to the initial solution
Multi-resolution search
− Search from coarse to fine resolution, faster than exhaustive search
− Avoids being trapped in a local minimum
200
Gradient-based Search
Iteratively update the current estimate in the direction opposite to the gradient direction.
• The solution depends on the initial condition: it reaches the local minimum closest to the initial condition
• Choice of step size:
– Fixed step size: the step size must be small to avoid oscillation, so many iterations are required
– Steepest gradient descent (adjust the step size optimally)
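The fixed-stepsize variant can be sketched as follows (an illustrative toy, not a production routine: the "frame" is an analytic Gaussian so the DFD error is differentiable in d, and the gradient is estimated by central differences rather than analytically):

```python
import numpy as np

def frame(x, y):
    # Continuous image model so the DFD error is a smooth function of d
    return np.exp(-((x - 5.0) ** 2 + (y - 5.0) ** 2) / 8.0)

def dfd_error(d, psi1, xs, ys):
    # DFD criterion with p = 2: sum of squared prediction errors over the block
    return np.sum((frame(xs + d[0], ys + d[1]) - psi1) ** 2)

def gradient_search(psi1, xs, ys, d0=(0.0, 0.0), step=0.05, iters=800, eps=1e-4):
    """Fixed-stepsize descent: move opposite the (numerically estimated)
    gradient until the local minimum nearest d0 is reached."""
    d = np.array(d0, dtype=float)
    for _ in range(iters):
        g = np.array([
            (dfd_error(d + (eps, 0.0), psi1, xs, ys)
             - dfd_error(d - (eps, 0.0), psi1, xs, ys)) / (2 * eps),
            (dfd_error(d + (0.0, eps), psi1, xs, ys)
             - dfd_error(d - (0.0, eps), psi1, xs, ys)) / (2 * eps),
        ])
        d -= step * g
    return d

xs, ys = np.meshgrid(np.arange(11.0), np.arange(11.0))
psi1 = frame(xs + 1.5, ys - 0.5)   # anchor frame: target shifted by d* = (1.5, -0.5)
```

Starting from d0 = (0, 0) the iteration settles at the true displacement; from a distant start it could equally settle in a different local minimum, which is the weakness noted above.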
201
Block-Based Motion Estimation: Overview
• Assume all pixels in a block undergo a coherent motion, and search for the motion
parameters for each block independently
• Block matching algorithm (BMA): assume translational motion, 1 MV per block (2 parameters)
– Exhaustive BMA (EBMA)
– Fast algorithms
• Deformable block matching algorithm (DBMA): allow more complex motion (affine,
bilinear), to be discussed later.
202
Block Matching Algorithm
• Overview:
– Assume all pixels in a block undergo a translation, denoted by a single MV
– Estimate the MV for each block independently, by minimizing the DFD error over this block
• Minimizing function:
• Optimization method:
– Exhaustive search (feasible as one only needs to search one MV at a time), using MAD criterion (p=1)
– Fast search algorithms
Integer vs. fractional pel accuracy search
203
[Figure: EBMA — the current block Bm in the anchor frame is matched against candidates within a search region of size Rx × Ry in the target frame; the best match B′m defines the motion vector dm]
Exhaustive Block Matching Algorithm (EBMA)
204
[Figure: worked EBMA example — a block of the current frame is compared against candidate blocks within the search range of the reference frame; the sum of absolute differences (|A| = 12 here) selects the best-match motion vector]
Exhaustive Block Matching Algorithm (EBMA)
205
Sample Matlab Script for Integer-pel EBMA
206
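The Matlab script itself is not reproduced in this copy of the slides; the following Python sketch (block size, search range, and frame contents are arbitrary choices) implements the same integer-pel EBMA with the MAD criterion:

```python
import numpy as np

def ebma(anchor, target, block=8, search=4):
    """Integer-pel exhaustive block matching: for every block of the anchor
    frame, test all candidate displacements in a +/-search window of the
    target frame and keep the one with the smallest MAD."""
    H, W = anchor.shape
    mvs = np.zeros((H // block, W // block, 2), dtype=int)
    for bi in range(H // block):
        for bj in range(W // block):
            y0, x0 = bi * block, bj * block
            cur = anchor[y0:y0 + block, x0:x0 + block].astype(float)
            best = (np.inf, 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue  # candidate falls outside the target frame
                    mad = np.mean(np.abs(cur - target[y:y + block, x:x + block]))
                    if mad < best[0]:
                        best = (mad, dy, dx)
            mvs[bi, bj] = best[1], best[2]
    return mvs
```

The triple loop makes the cost obvious: (2·search+1)² MAD evaluations per block, which is exactly what the fast algorithms below try to avoid.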
Fractional Accuracy EBMA
• Real MVs may not always be multiples of pixels. To allow sub-pixel MVs, the search step size must
be less than 1 pixel
• Half-pel EBMA: step size = 1/2 pixel in both dimensions
• Difficulty:
– The target frame only has integer pels
• Solution:
– Interpolate the target frame by a factor of two before searching
– Bilinear interpolation is typically used
• Complexity:
– 4 times that of integer-pel search, plus additional operations for interpolation
• Fast algorithms:
– Search with integer precision first, then refine in a small search region with half-pel accuracy
207
Sub-pixel Motion Compensation
− In the first stage, motion estimation finds the best match
on the integer pixel grid (circles).
− The encoder searches the half-pixel positions
immediately next to this best match (squares) to see
whether the match can be improved and if required, the
quarter-pixel positions next to the best half-pixel position
(triangles) are then searched.
− The final match, at an integer, half-pixel or quarter-pixel
position, is subtracted from the current block or
macroblock.
208
Close-up of reference region Reference region interpolated to half-pixel positions
209
Half-Pel Accuracy EBMA
Bm: Current block
B'm: Matching block
dm
210
Bilinear Interpolation
Integer pels (x, y), (x+1, y), (x, y+1), (x+1, y+1) map onto the half-pel grid around (2x, 2y):
O[2x, 2y] = I[x, y]
O[2x+1, 2y] = (I[x, y] + I[x+1, y]) / 2
O[2x, 2y+1] = (I[x, y] + I[x, y+1]) / 2
O[2x+1, 2y+1] = (I[x, y] + I[x+1, y] + I[x, y+1] + I[x+1, y+1]) / 4
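The four formulas translate directly into a 2× upsampler for the half-pel search (a sketch; producing a (2h−1) × (2w−1) output is a boundary choice of this sketch — the slides do not specify how the last row and column are padded):

```python
import numpy as np

def upsample2_bilinear(I):
    """2x interpolation of the target frame for half-pel EBMA:
    even samples copy the integer pels, odd samples average neighbours."""
    I = I.astype(float)
    h, w = I.shape
    O = np.zeros((2 * h - 1, 2 * w - 1))
    O[0::2, 0::2] = I                                     # O[2x, 2y] = I[x, y]
    O[1::2, 0::2] = (I[:-1, :] + I[1:, :]) / 2            # between vertically adjacent pels
    O[0::2, 1::2] = (I[:, :-1] + I[:, 1:]) / 2            # between horizontally adjacent pels
    O[1::2, 1::2] = (I[:-1, :-1] + I[1:, :-1] + I[:-1, 1:] + I[1:, 1:]) / 4
    return O
```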
211
Half-Pel Accuracy EBMA
212
Pros and Cons with EBMA
• Blocking effect (discontinuity across block boundaries) in the predicted image
– Because the block-wise translation model is not accurate
– Fix: deformable BMA (next lecture)
• Motion field somewhat chaotic
– Because MVs are estimated independently from block to block
– Fix 1: mesh-based motion estimation (next lecture)
– Fix 2: imposing a smoothness constraint explicitly
• Wrong MVs in flat regions
– Because motion is indeterminate when the spatial gradient is near zero
• Nonetheless, EBMA is widely used for motion-compensated prediction in video coding
– Because of its simplicity and optimality in minimizing the prediction error
213
Fast Algorithms for BMA
• Key idea to reduce the computation in EBMA:
– Reduce # of search candidates:
• Only search for those that are likely to produce small errors.
• Predict possible remaining candidates, based on previous search result
– Simplify the error measure (DFD) to reduce the computation involved for each candidate
• Classical fast algorithms (large computation savings, but not as accurate as EBMA)
– Three-step
– 2D-log
– Conjugate direction
– The characteristics of fast algorithm
• Many new fast algorithms have been developed since then
– Some suitable for software implementation, others for VLSI implementation (memory access, etc)
214
Row-Column Search Algorithm
Step 1. Search along the row (15 points)
Step 2. Search along the column (14 points)
• 29 points in total for a ±7 search range
• Not optimal
215
Three Step Search (TSS) Method
TOTAL = 9 + 8 + 8 = 25 search points
Step 1: "O", 9 search points → find the min point (1st MSB): (x, y) = (a 0 0, b 0 0)
Step 2: "△", 8 search points → find the min point (2nd MSB): (x, y) = (-1 c 0, +1 d 0)
Step 3: "□", 8 search points → find the min point
a and b can each be +1, 0, or −1
c and d can each be +1, 0, or −1
The TSS method is also called the logarithmic search method!
First-step
Search Points
Second-step
Search Points
Third-step
Search Points
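A compact sketch of the three steps (the test frame below is a made-up smooth pattern; ties between equal-MAD candidates are broken by candidate order): starting from (0,0), the spacing halves from 4 to 2 to 1, each time keeping the best point of a 3×3 pattern centred on the previous best.

```python
import numpy as np

def tss(anchor, target, y0, x0, block=8):
    """Three-step search: 9 candidates at spacing 4, then 8 more at spacings
    2 and 1 around the running best -- 25 MAD evaluations for a +/-7 range."""
    def mad(dy, dx):
        y, x = y0 + dy, x0 + dx
        if y < 0 or x < 0 or y + block > target.shape[0] or x + block > target.shape[1]:
            return np.inf  # candidate outside the frame
        return np.mean(np.abs(anchor[y0:y0 + block, x0:x0 + block]
                              - target[y:y + block, x:x + block]))
    best = (0, 0)
    for step in (4, 2, 1):
        cands = [(best[0] + sy * step, best[1] + sx * step)
                 for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        best = min(cands, key=lambda d: mad(*d))
    return best
```

On a smooth error surface this finds the true MV; on surfaces with multiple minima it can converge to a local one, as the comparison tables later in the section show.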
216
[Figure: TSS search pattern — step-1 points (marked 1, spacing 4), step-2 points (2, spacing 2), and step-3 points (3, spacing 1) on the search grid]
Three-Step Search Algorithm (TSS)
217
Three Step Search (TSS) Method
[Figure: every location in the ±7 search window is reached by TSS with exactly 25 search points]
218
Three-Step Search Algorithm Example
[Figure: TSS example on a ±6 grid — n denotes a search point of step n]
The best matching MVs in steps 1–3 are (3,3), (3,5),
and (2,6). The final MV is (2,6). From [Musmann85].
219
Three Step Search Algorithm
Four Step Search (4SS) Method
Step 1. Search 9 points on a 5×5 grid (same start as the TSS); the minimum may fall on
1. a corner, 2. the center, or 3. a side point
Go to Step 2
220
Four Step Search (4SS) Method
Steps 2 and 3
– If the minimum was at a corner: add 5 points
– If at a side: add 3 points
– If at the center: go to the final 8-point step
Go to Step 3 if not at the center, or to Step 4 if at the center
221
222
Four Step Search (4SS) Method
Step 4. The search window is reduced to 3×3 (same as the center case)
Minimum search points = 9 + 8 = 17
Step 1 (center) + Step 4 (center)
Maximum search points = 9 + 5 + 5 + 8 = 27
Step 1 (center) + Step 2 (corner) + Step 3 (corner) + Step 4 (center)
223
Four Step Search (4SS) Method, Examples
[Figure: two 4SS examples on the ±7 grid.
Left: Step 1 → Step 2 (side) → Step 3 (corner) → Step 4 (center), total 9 + 3 + 5 + 8 = 25 points.
Right (worst case for 4SS): Step 1 → Step 2 (corner) → Step 3 (corner) → Step 4 (center), total 9 + 5 + 5 + 8 = 27 points]
224
Search Points of Four Step Search
[Figure: number of 4SS search points needed to reach each location in the ±7 window, ranging from 17 to 27]
225
The search pattern is cross-shaped.
− Step 1: "+" search (5 points)
− Step 2: (a) center (+8 points) → stop; (b) side (+3 points)
− Step N: center (+8 points) → stop
− Minimum search points = 5 + 8 = 13, Step 1 (center)
− Maximum search points = ???
Cross Search Method
Search shapes (the 9-point pattern reduced to 5 points)
226
Cross Search Algorithm (CSA)
[Figure: CSA example on the ±7 grid — Step 1 → Step 2 (side) → Step 3 (side) → Step 4 (center), total 5 + 3 + 3 + 8 = 19 points; the normal worst case is N = 4 steps for a search range of −7 to +7]
227
Search Points of Cross Search
[Figure: number of cross-search points needed to reach each location in the ±7 window, ranging from 13 to 19]
Cross-search Algorithm (CSA)
An example of the CSA (cross-search
algorithm) search for w=8 pixels/frame
• Another method of fast BMA is the cross-
search algorithm (CSA) . In this method, the
basic idea is still a logarithmic step search,
but with some differences, which lead to
fewer computational search points.
• The main difference is that at each iteration
there are four search locations, which are the
end points of a cross (×) rather than (+).
• Also, at the final stage, the search points can
be either the end points of (×) or (+) crosses,
as shown in Figure.
• For a maximum motion displacement of w
pixels/frame, the total number of
computations becomes 5 + 4·log2(w).
228
229
Step 1: 9 points ==> three cases
1. Center
2. Side
3. Corner
Steps 2 and 3
Final step: with the shrunk (small) diamond (same as the center case)
Minimum search points = 9 + 4 = 13, Step 1 (center)
Maximum search points = ? (33)
Diamond Search
230
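The large/small diamond patterns can be sketched as below (an illustration on a made-up smooth frame; the strict-improvement stopping rule and the max_steps cap are implementation choices of this sketch):

```python
import numpy as np

# Large diamond search pattern (LDSP, 9 points) and small diamond (SDSP, 5 points)
LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def diamond_search(anchor, target, y0, x0, block=8, max_steps=20):
    """Repeat the large diamond until its best point is the centre,
    then refine once with the small diamond."""
    def mad(dy, dx):
        y, x = y0 + dy, x0 + dx
        if y < 0 or x < 0 or y + block > target.shape[0] or x + block > target.shape[1]:
            return np.inf
        return np.mean(np.abs(anchor[y0:y0 + block, x0:x0 + block]
                              - target[y:y + block, x:x + block]))
    cy = cx = 0
    for _ in range(max_steps):
        best = min(((cy + dy, cx + dx) for dy, dx in LDSP), key=lambda d: mad(*d))
        if best == (cy, cx):
            break            # centre is best: switch to the small diamond
        cy, cx = best
    return min(((cy + dy, cx + dx) for dy, dx in SDSP), key=lambda d: mad(*d))
```

Unlike TSS, the number of steps is not fixed: the pattern keeps moving while the distortion decreases, which is why the search-point count varies with the motion.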
Diamond Search
[Figure: diamond search example on the ±7 grid — Step 1 → Step 2 (side) → Step 3 (side) → Step 4 (side) → Step 5, total 9 + 6 + 6 + 4 + 4 = 29 points]
231
Search Points of Diamond Search
[Figure: number of diamond-search points needed to reach each location in the ±7 window, ranging from 13 to 25]
232
Step 1. The 9 TSS points plus the 8 points around the center (17 points in total)
Four cases for the minimum:
1. Center_center
2. Center_side
3. Center_corner
4. Outside point
233
Step 2.
(1) Center ==> (0,0), stop (17 points, the minimum search)
(2) Center-side ==> perform a 3-point search, then stop
(3) Center-corner ==> perform a 5-point search, then stop
(4) Outside point ==> perform the regular TSS algorithm
Worst case: 25 + 8 = 33 points
234
Search Points for Novel 3-Step Search
[Figure: number of NTSS search points needed to reach each location in the ±7 window, ranging from 16 to 33]
235
[Figure: 2D-log search pattern — step-n points marked n, converging over four steps]
Two-dimensional Log Search Algorithm (TDL)
236
2D-Log Search, Example
[Figure: 2D-log search example on a ±6 grid — n denotes a search point of step n]
The best matching MVs in steps 1–5 are
(0,2), (0,4), (2,4), (2,6), and (2,6). The final
MV is (2,6). From [Musmann85].
237
Step 1: The large hexagon with seven checking points is centered at (0,0), the center of a predefined search window in the
motion field.
If the MBD (minimum block distortion) point is found to be at the center of the hexagon,
go to Step 3 (Ending);
otherwise,
go to Step 2 (Searching).
Step 2: With the MBD point of the previous search step as the center, a new large hexagon is formed. Three new candidate
points are checked, and the MBD point is again identified.
If the MBD point is at the center of the new hexagon,
go to Step 3 (Ending);
otherwise, repeat this step.
Step 3: Switch the search pattern from the large to the small hexagon. The four points covered by the small
hexagon are evaluated and compared with the current MBD point. The new MBD point is the final solution for the motion vector.
Hexagon-Based Search Algorithm
238
Hexagon Search Path
[Figure: hexagon search path — first, second, and third large-hexagon steps, then the final small-hexagon step]
239
Diamond Search
Minimum search points = 9 + 4 = 13
Number of search points: N_DS = 9 + M×n′ + 4, with M = 5 or 3
n′: number of steps
Hexagon Search
Minimum search points = 7 + 4 = 11
Number of search points: N_HEXBS = 7 + 3×n + 4
n: number of steps
Number of Search Points
240
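The two counting formulas, restated as code for a quick check (n and n′ count the moves of the large pattern; M depends on whether each diamond move landed on an edge or a corner point):

```python
def n_ds(steps, m=5):
    """Diamond-search points: 9 for the first large diamond, m new points
    per move (5 for an edge move, 3 for a corner move), plus 4 for the
    final small diamond."""
    return 9 + m * steps + 4

def n_hexbs(steps):
    """Hexagon-search points: 7 for the first large hexagon, 3 new points
    per move, plus 4 for the final small hexagon."""
    return 7 + 3 * steps + 4
```

With zero moves, the minima quoted on the slide follow directly: 13 points for diamond search and 11 for hexagon search, and the per-move cost (3 vs. 3-or-5) is why HEXBS needs fewer evaluations for the same motion.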
One More Step Hexagon Search
Minimum Distortion
241
Search Points of HEXBS Method
[Figure: number of HEXBS search points needed to reach each location in the ±7 window, ranging from 11 to 20]
Computational complexity (maximum number of search points for motion speeds w = 4, 8, 16):

Algorithm   Maximum number of search points   w=4   w=8   w=16
FSM         (2w+1)^2                           81   289   1089
TDL         2 + 7·log2(w)                      16    23     30
TSS         1 + 8·log2(w)                      17    25     33
MMEA        1 + 6·log2(w)                      13    19     25
CDS         3 + 2w                             11    19     35
CSA         5 + 4·log2(w)                      13    17     21
OSA         1 + 4·log2(w)                       9    13     17

Complexity/Performance
242
The range of motion speed from w=4 to 16 pixels/frame
− The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-
square error, average error and standard deviation.
FSM: Full Search Mode
TDL: two-dimensional logarithmic
TSS: three-step search
MMEA: modified motion estimation algorithm
CDS: conjugate direction search
OSA: orthogonal search algorithm
CSA: cross-search algorithm
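The "maximum number of search points" column follows directly from the formulas; restating them as code makes the table easy to verify (base-2 logarithms are assumed, and w should be a power of two for the counts to come out as integers):

```python
import math

def max_search_points(algorithm, w):
    """Maximum search points as a function of the search range w,
    per the complexity table above."""
    lg = math.log2(w)
    return {
        "FSM":  (2 * w + 1) ** 2,   # full search: every point in the window
        "TDL":  2 + 7 * lg,
        "TSS":  1 + 8 * lg,
        "MMEA": 1 + 6 * lg,
        "CDS":  3 + 2 * w,
        "CSA":  5 + 4 * lg,
        "OSA":  1 + 4 * lg,
    }[algorithm]
```

The gap between the quadratic full search and the logarithmic methods is the whole motivation for fast BMAs: at w = 16, FSM needs 1089 points against 17–35 for the fast algorithms.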
Compensation efficiency:

            Split Screen              Trevor White
Algorithm   Entropy     Standard      Entropy     Standard
            (bits/pel)  deviation     (bits/pel)  deviation
FSM         4.57        7.39          4.41        6.07
TDL         4.74        8.23          4.60        6.92
TSS         4.74        8.19          4.58        6.86
MMEA        4.81        8.56          4.69        7.46
CDS         4.84        8.86          4.74        7.54
OSA         4.85        8.81          4.72        7.51
CSA         4.82        8.65          4.68        7.42

Compensation efficiency
Complexity/Performance
243
The motion compensation efficiencies of some algorithms for a motion speed of
w=8 pixels/frame for two test image sequences (Split screen and Trevor white)
− The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-
square error, average error and standard deviation.
FSM: Full Search Mode
TDL: two-dimensional logarithmic
TSS: three-step search
MMEA: modified motion estimation algorithm
CDS: conjugate direction search
OSA: orthogonal search algorithm
CSA: cross-search algorithm
Some Search Points Comparison
Methods Min Max Average Speed-up
FS 225 225 225 1
TSS 25 25 25 9
4SS 17 27 17.2 13.1
NTSS 17 33 17.5 12.8
Diamond 13 33 13.3 16.9
Method Ave Criterion Ave Distance (FS) Optimality
FS 2753 0 100%
TSS 2790 0.04 98.5%
4SS 2777 3.84 98.7%
NTSS 2775 2.98 99.0%
Diamond 2770 3.11 98.9%
244
245
Hierarchical Block Matching Algorithm (HBMA), Ex: H.261
246
Multi-resolution Motion Estimation or Hierarchical Motion Estimation
• Problems with BMA
– Unless exhaustive search is used, the solution may not be the global minimum
– Exhaustive search requires extremely large computation
– Block wise translation motion model is not always appropriate
• Multiresolution approach
– Aim to solve the first two problems
– First estimate the motion in a coarse resolution over low-pass filtered, down-sampled image pair
• Can usually lead to a solution close to the true motion field
– Then modify the initial solution in successively finer resolution within a small search range
• Reduce the computation
– Can be applied to different motion representations, but we will focus on its application to BMA
247
− The assumption of monotonic variation of image intensity employed in the fast BMAs often causes
false estimations, especially for larger picture displacements.
− These methods perform well for slow moving objects, such as those in video conferencing.
− However, for higher motion speeds, due to the intrinsic selective nature of these methods, they
often converge to a local minimum of distortion.
− One method of alleviating this problem is to subsample the image to smaller sizes, such that the
motion speed is reduced by the sampling ratio.
− The process is done on a multilevel image pyramid, known as the Hierarchical Block Matching
Algorithm (HBMA).
− In this technique, pyramids of the image frames are reconstructed by successive two-dimensional
filtering and subsampling of the current and past image frames.
Multi-resolution Motion Estimation or Hierarchical Motion Estimation
[Figure: HBMA — MVs estimated at a coarser level (d) are passed down and refined by updates (q) at the next level, shown across the pyramid levels of the anchor and target frames]
Hierarchical Block Matching Algorithm (HBMA)
248
Hierarchical Block Matching Algorithm (HBMA)
249
[Figure: three-level pyramid of a 384×256 frame — a vector V2 estimated at level 2 is doubled to 2·V2 to seed the level-1 search; V1 is doubled to 2·V1 to seed the level-0 search, which yields V0]
Hierarchical Block Matching Algorithm (HBMA)
250
251
− Conventional block matching with a block size of 16 pixels, either full search or any fast method, is
first applied to the highest level of the pyramid (level 2).
− This motion vector is then doubled in size, and further refinement within a ±1-pixel search is carried out
at the next level. The process is repeated down to the lowest level.
− Therefore, with an n-level pyramid, the maximum motion speed of w at the highest level is reduced
to w / 2^(n−1).
− For example, a maximum motion speed of 32 pixels/frame with a three level pyramid is reduced to
8 pixels/frame, which is quite manageable by any fast search method.
− Note that this method can also be regarded as another type of fast search, with a performance
very close to the full search, irrespective of the motion speed, but the computational complexity
can be very close to the fast logarithmic methods.
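The whole procedure — build the pyramid, search the top level coarsely, then double the MV and refine — can be sketched for a single block as follows (the 2×2 mean-filter downsampling, the coarse range, and the ±1 refinement are textbook choices of this sketch, not a specific codec's):

```python
import numpy as np

def downsample2(img):
    # One pyramid level: 2x2 mean filtering followed by subsampling
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def block_mad(anchor, target, y0, x0, dy, dx, b):
    y, x = y0 + dy, x0 + dx
    if y < 0 or x < 0 or y + b > target.shape[0] or x + b > target.shape[1]:
        return np.inf
    return np.mean(np.abs(anchor[y0:y0 + b, x0:x0 + b] - target[y:y + b, x:x + b]))

def hbma_mv(anchor, target, y0, x0, block=8, levels=3, coarse_range=2):
    """One block's MV estimated coarse-to-fine: search +/-coarse_range at the
    pyramid top, then double the MV and refine +/-1 pel at each finer level."""
    pyr = [(anchor, target)]
    for _ in range(levels - 1):
        a, t = pyr[-1]
        pyr.append((downsample2(a), downsample2(t)))
    dy = dx = 0
    for lvl in range(levels - 1, -1, -1):
        a, t = pyr[lvl]
        s = 2 ** lvl
        rng = coarse_range if lvl == levels - 1 else 1
        cands = [(dy + sy, dx + sx) for sy in range(-rng, rng + 1)
                                    for sx in range(-rng, rng + 1)]
        dy, dx = min(cands, key=lambda d: block_mad(a, t, y0 // s, x0 // s,
                                                    d[0], d[1], block // s))
        if lvl:
            dy, dx = 2 * dy, 2 * dx   # propagate the MV to the next finer level
    return dy, dx
```

A motion of 6 pixels is found with only a ±2 search at the top plus two ±1 refinements, illustrating the w/2^(n−1) reduction described above.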
Multi-resolution Motion Estimation or Hierarchical Motion Estimation
252
• The number of levels is L
• The l-th level images of the anchor and target frames: ψ_t,l(x), x ∈ Λ_l, t = 1, 2, where Λ_l is the set of pixels at level l
• The initial estimate of the MV at level l is interpolated from the level l−1 solution: d̃_l(x) = U(d_{l−1})(x)
• Determine the update q_l(x) such that the error
  error = Σ_{x∈Λ_l} | ψ_2,l( x + d̃_l(x) + q_l(x) ) − ψ_1,l(x) |^p
  is minimized
• The new motion vector is d_l(x) = d̃_l(x) + q_l(x)
Hierarchical Block Matching Algorithm (HBMA)
Hierarchical Block Matching Algorithm (HBMA)
[Figure: panels (a)–(f); predicted anchor frame (29.32 dB)]
Example: Three-level HBMA
253
254
Deformable Block Matching Algorithm
255
Overview of DBMA
• Three steps:
– Partition the anchor frame into regular blocks
– Model the motion in each block by a more complex motion
• The 2-D motion caused by a flat surface patch undergoing rigid 3-D motion can be
approximated well by projective mapping
• Projective Mapping can be approximated by affine mapping and bilinear mapping
• Various possible mappings can be described by a node-based motion model
– Estimate the motion parameters block by block independently
• The discontinuity problem across block boundaries still remains
• Still cannot handle multiple motions within a block, or changes due
to illumination effects!
256
Affine and Bilinear Model
Approximation of projective mapping:
I. Affine (6 parameters): Good for mapping triangles to triangles
II. Bilinear (8 parameters): Good for mapping blocks to quadrangles
257
Node-Based Motion Model
Control nodes in this example: Block
corners
The motion at other points is interpolated from
the nodal MVs dm,k
Control node MVs can be described with
integer or half-pel accuracy, all have same
importance
Translation, affine, and bilinear are special
case of this model
258
Problems with DBMA
• Motion discontinuity across block boundaries, because nodal MVs are estimated
independently from block to block
– Fix: mesh-based motion estimation
• Cannot do well on blocks with multiple moving objects or changes due to illumination
effect
– Three mode method
• First apply EBMA to all blocks
• Blocks with small EBMA errors have translational motion
• Blocks with large EBMA errors may have non-translational motion
– First apply DBMA to these blocks
– Blocks still having errors are non-motion compensable
• [Ref] O. Lee and Y. Wang, Motion compensated prediction using nodal based deformable block matching. J.
Visual Communications and Image Representation (March 1995), 6:26-34
259
Mesh-Based Motion Estimation: Overview
− MPEG-4 object motion
− Affine warping motion model
− Deformable polygon meshes
− Similar MAD, SSE error measures
− Trade-offs: more accurate ME vs. tremendous complexity
− Bilinear and perspective motion models are rarely used in video coding
260
Mesh-Based Motion Estimation: Overview
(a) Using a triangular mesh
(b) Using a quadrilateral mesh
261
Mesh-based vs. Block-based
(a) block-based backward ME
(b) mesh-based backward ME
(c) mesh-based forward ME
262
Mesh-Based Motion Model
• The motion in each element is interpolated from nodal MVs
• Mesh-based vs. node-based model:
– Mesh-based: Each node has a single MV, which influences the motion of all four adjacent elements
– Node-based: Each node can have four different MVs, depending on which element it is considered within
[Figure: anchor frame, target frame, motion field, and predicted anchor frame (29.86 dB)]
Example: Half-pel EBMA
263
[Figure: mesh-based method (29.72 dB) vs. EBMA (29.86 dB)]
EBMA vs. Mesh-based Motion Estimation
264
Image Reconstruction with MV (Bilinear vs BMA)
265
Smooth Reconstruction
266
Performance of Spatial Transform Motion Compensation
267
− Rarely used in practice: BME/BMC mostly suffices
− Reference frame resampling: an option in H.263+/H.263++
− Global affine motion model: special-effect warping
− 3D subband & wavelet coding: align frames before temporal filtering [Taubman]
268
Global Motion Estimation
[Figure: global motion described by the four corner motion vectors MV00, MV01, MV10, MV11]
Video Compression, Part 2-Section 2, Video Coding Concepts
Classification of Diabetic Retinopathy.pptxasmshafi1
 
Operations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleOperations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleAhmed Gad
 
fdocuments.in_the-ericsson-commands.pdf
fdocuments.in_the-ericsson-commands.pdffdocuments.in_the-ericsson-commands.pdf
fdocuments.in_the-ericsson-commands.pdfSaidHaman
 
Descriptive analytics in r programming language
Descriptive analytics in r programming languageDescriptive analytics in r programming language
Descriptive analytics in r programming languageAshwini Mathur
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020NECST Lab @ Politecnico di Milano
 
aserra_phdthesis_ppt
aserra_phdthesis_pptaserra_phdthesis_ppt
aserra_phdthesis_pptaserrapages
 
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...Ealwan Lee
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiDatabricks
 
PosterFormatRNYF(1)
PosterFormatRNYF(1)PosterFormatRNYF(1)
PosterFormatRNYF(1)Usman Khalid
 
New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...
New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...
New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...Peter Laurinec
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...RISC-V International
 
Bind Peeking - The Endless Tuning Nightmare
Bind Peeking - The Endless Tuning NightmareBind Peeking - The Endless Tuning Nightmare
Bind Peeking - The Endless Tuning NightmareSage Computing Services
 
scical manual fx-250HC
scical manual fx-250HCscical manual fx-250HC
scical manual fx-250HCpearlapplepen
 

Similar to Video Compression, Part 2-Section 2, Video Coding Concepts (20)

“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
The_ERICSSON_commands_listed_below_are_f (1) (1).pdf
The_ERICSSON_commands_listed_below_are_f (1) (1).pdfThe_ERICSSON_commands_listed_below_are_f (1) (1).pdf
The_ERICSSON_commands_listed_below_are_f (1) (1).pdf
 
Classification of Diabetic Retinopathy.pptx
Classification of Diabetic Retinopathy.pptxClassification of Diabetic Retinopathy.pptx
Classification of Diabetic Retinopathy.pptx
 
Operations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleOperations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by Example
 
fdocuments.in_the-ericsson-commands.pdf
fdocuments.in_the-ericsson-commands.pdffdocuments.in_the-ericsson-commands.pdf
fdocuments.in_the-ericsson-commands.pdf
 
Descriptive analytics in r programming language
Descriptive analytics in r programming languageDescriptive analytics in r programming language
Descriptive analytics in r programming language
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
 
Control charts
Control chartsControl charts
Control charts
 
aserra_phdthesis_ppt
aserra_phdthesis_pptaserra_phdthesis_ppt
aserra_phdthesis_ppt
 
MNIST 10-class Classifiers
MNIST 10-class ClassifiersMNIST 10-class Classifiers
MNIST 10-class Classifiers
 
P1111214158
P1111214158P1111214158
P1111214158
 
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
 
Ewdts 2018
Ewdts 2018Ewdts 2018
Ewdts 2018
 
PosterFormatRNYF(1)
PosterFormatRNYF(1)PosterFormatRNYF(1)
PosterFormatRNYF(1)
 
New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...
New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...
New Clustering-based Forecasting Method for Disaggregated End-consumer Electr...
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
 
Bind Peeking - The Endless Tuning Nightmare
Bind Peeking - The Endless Tuning NightmareBind Peeking - The Endless Tuning Nightmare
Bind Peeking - The Endless Tuning Nightmare
 
scical manual fx-250HC
scical manual fx-250HCscical manual fx-250HC
scical manual fx-250HC
 

More from Dr. Mohieddin Moradi

An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1    An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1 Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Dr. Mohieddin Moradi
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Dr. Mohieddin Moradi
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles Dr. Mohieddin Moradi
 
Video Compression, Part 4 Section 1, Video Quality Assessment
Video Compression, Part 4 Section 1,  Video Quality Assessment Video Compression, Part 4 Section 1,  Video Quality Assessment
Video Compression, Part 4 Section 1, Video Quality Assessment Dr. Mohieddin Moradi
 

More from Dr. Mohieddin Moradi (20)

Video Quality Control
Video Quality ControlVideo Quality Control
Video Quality Control
 
HDR and WCG Principles-Part 5
HDR and WCG Principles-Part 5HDR and WCG Principles-Part 5
HDR and WCG Principles-Part 5
 
HDR and WCG Principles-Part 4
HDR and WCG Principles-Part 4HDR and WCG Principles-Part 4
HDR and WCG Principles-Part 4
 
HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3
 
HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2
 
HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
 
Broadcast Lens Technology Part 3
Broadcast Lens Technology Part 3Broadcast Lens Technology Part 3
Broadcast Lens Technology Part 3
 
Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2
 
Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3
 
An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1    An Introduction to HDTV Principles-Part 1
An Introduction to HDTV Principles-Part 1
 
Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3
 
Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2Broadcast Camera Technology, Part 2
Broadcast Camera Technology, Part 2
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles
 
Video Compression, Part 4 Section 1, Video Quality Assessment
Video Compression, Part 4 Section 1,  Video Quality Assessment Video Compression, Part 4 Section 1,  Video Quality Assessment
Video Compression, Part 4 Section 1, Video Quality Assessment
 

Recently uploaded

Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture design
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 

Video Compression, Part 2-Section 2, Video Coding Concepts

Lossless and Lossy Compression Techniques
− The lossless approaches in the compression process:
  − DCT: Discrete Cosine Transform
  − VLC: Variable Length Coding
  − RLC: Run Length Coding
− The lossy approaches in the compression process:
  − Chroma subsampling: 4:2:2, 4:2:0, 4:1:1
  − DPCM: Differential Pulse-Code Modulation
  − Quantization
Spatial Redundancy in Still Images
– Spatial redundancy arises when parts of a picture are replicated within a single frame of video (with minor changes).
– Example figure: a sky scene in which one area is all blue while another is half blue and half green, so large regions repeat the same "sky blue" value.
The Principle of Compression in Still Images
– Spatial redundancy reduction (pixels inside a picture are similar)
– Statistical redundancy reduction (more frequent symbols are assigned short code words and less frequent ones longer words)
Differential Pulse Code Modulation (DPCM) for Spatial Redundancy Reduction
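The DPCM idea behind this slide can be sketched in a few lines of Python (a hypothetical illustration with made-up helper names, not the deck's own code): transmit the first sample, then only the differences between neighbouring samples. When neighbouring pixels are similar, the differences are small and cheap to code, and the process is fully reversible.

```python
def dpcm_encode(samples):
    # keep the first sample, then send only the difference from the previous one
    diffs = [samples[0]]
    diffs += [cur - prev for prev, cur in zip(samples, samples[1:])]
    return diffs

def dpcm_decode(diffs):
    # rebuild each sample by accumulating the differences
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

# a smooth scanline: large sample values, but tiny differences
line = [100, 101, 103, 104, 104, 102, 100]
encoded = dpcm_encode(line)          # [100, 1, 2, 1, 0, -2, -2]
assert dpcm_decode(encoded) == line  # lossless round trip
```

The differences cluster near zero, which is exactly what the entropy coder later exploits.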
Intra-frame Compression (like still-image compression)
– Zig-zag scan, run-length coding and VLC
– Quantization: the major reduction step; controls 'quality'
Intra-frame Compression (like still-image compression)
– Discrete Cosine Transform: rearranges the pixels into frequency coefficients
– Zig-zag scanning: rearranges the coefficients from raster scan to low frequency first
– Quantisation: quantises the data if the compression ratio has not been achieved
– Entropy coding: replaces the original data with shorter codes or symbols
– Data buffer: stores the compressed data and checks the compression ratio; returns a quantisation signal if the ratio is not achieved
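The "rearranges the pixels into frequency coefficients" step can be illustrated with a naive 1-D DCT-II (a simplified sketch only; real encoders apply a fast 2-D transform to 8×8 blocks):

```python
import math

def dct_1d(x):
    # naive orthonormal DCT-II: correlate the input with cosine basis functions
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

# a flat run of pixels: all the energy lands in the DC coefficient,
# and every higher-frequency coefficient is (numerically) zero
coeffs = dct_1d([5, 5, 5, 5, 5, 5, 5, 5])
```

This is the whole point of the transform stage: smooth picture areas concentrate their energy into a few low-frequency coefficients, leaving the rest near zero for the later stages to discard cheaply.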
Lossless and Lossy in Intra-frame Compression
– Discrete Cosine Transform: rearrangement (reversible)
– Zig-zag scanning: rearrangement (reversible)
– Quantisation: possible loss of entropy (non-reversible!)
– Entropy coding: data reduction (reversible)
– Data buffer: compression ratio checking (reversible)
Macroblocks after Color Sub-sampling (4:2:0)
Still Image Encoder (Compressor)
– Discrete Cosine Transform: rearranges the pixels into frequency coefficients
– Zig-zag scanning
– Quantisation: quantises the data if the compression ratio has not been achieved
– Entropy coding
– Data buffer
Quantisation
– The quantizer cuts entropy.
– It is controlled by the data buffer.
– It may pass data through without change.
  • The quantizer can effectively "switch off" by going into pass-through mode when the data buffer reports that the amount of data is acceptable.
– It may quantize the data in a number of steps.
  • If the data buffer fills, a signal is sent back to the quantizer, which switches to a different quantisation matrix to flatten the data and reduce its content.
Quantisation with Different Step Sizes (Q)
– Each example divides the 8×8 block of DCT coefficients element-wise by a quantisation matrix: DCT coefficients ÷ quantisation matrix = quantised DCT coefficients.
– With an all-ones matrix (every step size Q = 1), the quantised coefficients are identical to the originals and nothing is lost.
– As the step sizes grow toward the bottom-right, high-frequency corner (2, then 4, 8, 16, 32), more and more of the small high-frequency coefficients are divided down to zero (e.g. -43 ÷ 2 → -21, 39 ÷ 2 → 19, -8 ÷ 8 → -1).
– At the coarsest setting only the DC term and a few low-frequency coefficients survive; the high-frequency corner of the block is entirely zero, ready for zig-zag scanning to separate redundancy from entropy.
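The element-wise division shown in these slides can be sketched directly (hypothetical helper names; truncation toward zero matches the worked numbers on the slides, e.g. -43 ÷ 2 → -21):

```python
def quantise(block, qmatrix):
    # divide each DCT coefficient by its step size, truncating toward zero
    return [[int(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(block, qmatrix)]

def dequantise(block, qmatrix):
    # the decoder can only multiply back; the truncated remainder is lost
    return [[c * q for c, q in zip(crow, qrow)]
            for crow, qrow in zip(block, qmatrix)]

# a 2x2 corner of the slide's example block, with growing step sizes
coeffs = [[238, -43], [39, 7]]
q      = [[1, 2], [2, 8]]
assert quantise(coeffs, q) == [[238, -21], [19, 0]]
# the small high-frequency value (7 with step 8) is rounded away entirely
```

Dequantising `[[238, -21], [19, 0]]` with the same matrix gives `[[238, -42], [38, 0]]`, not the original block: the entropy removed by the quantizer cannot be restored, which is why this is the one non-reversible stage.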
DCT Coefficients after Quantization
Intra-frame Compression (like still-image compression)
– Discrete Cosine Transform: rearranges the pixels into frequency coefficients
– Zig-zag scanning: rearranges the coefficients from raster scan to low frequency first
– Quantisation: quantises the data if the compression ratio has not been achieved
– Entropy coding
– Data buffer
Zig-zag Scanning
– Rearranges the DCT coefficients.
  • Changed from raster scan so that the DC and low-frequency coefficients are first and the high-frequency coefficients are last.
– Helps to separate entropy from redundancy.
Non Zig-zag Scanning
– Reading the quantised 8×8 block in raster order leaves the significant coefficients (231, 31, 29, -16, ...) scattered among zeros throughout the scan.
Zig-zag Scanning
– The same 8×8 block read in zig-zag order brings the large low-frequency coefficients to the front and groups the zeros at the end of the scan.
Zig-zag Scanning
– DC and low-frequency coefficients are first and the high-frequency coefficients are last.
  • 30. • . Zig-zag Scanning for Separating Redundancy and Entropy 30 RedundancyEntropy DC and low frequency coefficients are first and the high frequency coefficients are last.
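The zig-zag ordering itself is easy to generate programmatically. This sketch (function names are illustrative) walks the anti-diagonals of an n×n block starting at the DC corner, alternating direction so low-frequency coefficients come first:

```python
def zigzag_order(n=8):
    """Generate (row, col) pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                       # each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate direction
    return order

def zigzag_scan(block):
    """Reorder a 2-D coefficient block into a 1-D run, low frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Applied to the coefficient block on slide 27, the scan starts 231, 31, 29, −16, 8, −12, … and leaves the long runs of zeros at the end, where run-length coding compresses them efficiently.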
• 31. Intra-frame compression (like still-image compression). Baseband input → Discrete Cosine Transform → Zig-zag Scanning → Quantisation → Entropy Coding → Data Buffer → compressed output. The DCT rearranges the pixels into frequency coefficients; zig-zag scanning rearranges the coefficients from raster scan to low-frequency first; the quantiser quantises the data if the compression ratio has not been achieved; entropy coding replaces the original data with shorter codes or symbols.
• 32. Intra-frame compression (like still-image compression). Baseband input → Discrete Cosine Transform → Zig-zag Scanning → Quantisation → Entropy Coding → Data Buffer → compressed output. The DCT rearranges the pixels into frequency coefficients; zig-zag scanning rearranges the coefficients from raster scan to low-frequency first; the quantiser quantises the data if the compression ratio has not been achieved; entropy coding replaces the original data with shorter codes or symbols; the data buffer stores the compressed data, checks the compression ratio and returns a quantisation signal if the ratio is not achieved.
• 33. Data Buffer
– Holds the results from the variable-length coder and outputs data at a constant rate.
– If the data buffer empties, ‘packing’ data is output.
– If the data buffer fills, a signal is sent to the quantiser, instructing it to reduce the amount of data.
• 34. Data Buffer
– Simple video signal
  • DCT has most of its big numbers in the top-left corner.
  • Zig-zag scan has all the big numbers at the start of the scan.
  • RLC and VLC can reduce the amount of data a lot.
  • Amount of data entering the data buffer is small.
  • Data buffer adds packing data.
– Medium-complexity video signal
  • DCT has some of its big numbers in the top-left corner.
  • Zig-zag scan has a few big numbers at the start of the scan.
  • RLC and VLC reduce the data a bit.
  • Amount of data entering the data buffer is acceptable.
  • Data buffer sends the compressed data out as is.
– Complex video signal
  • DCT has big numbers all over the DCT block.
  • Zig-zag scan still results in big numbers everywhere.
  • RLC and VLC cannot reduce the amount of data very much.
  • Amount of data entering the data buffer is too high.
  • Data buffer sends a signal back to the quantiser.
  • Quantiser reduces the amount of data by cutting entropy.
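The buffer/quantiser feedback loop described above can be sketched as a toy model. The capacity, drain rate and threshold values below are invented for illustration and do not come from any real encoder:

```python
def rate_control(frame_bits, capacity=100_000, drain_per_frame=20_000):
    """Toy model of the buffer/quantiser feedback loop: the buffer drains at a
    constant channel rate; its fullness steers the quantiser step size."""
    fullness, q_step, log = 0, 1, []
    for bits in frame_bits:
        # crude model: doubling the step size roughly halves the bits produced
        fullness = max(0, fullness + bits // q_step - drain_per_frame)
        if fullness > 0.8 * capacity:       # nearly full: coarser quantisation
            q_step *= 2
        elif fullness == 0:                 # empty: padding sent, finer steps
            q_step = max(1, q_step // 2)
        log.append((fullness, q_step))
    return log
```

Feeding a run of complex frames shows the intended behaviour: the buffer fills, the step size climbs, and the output rate stays bounded by the drain rate.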
  • 36. 36 − It arises when parts of a picture are often replicated within a single frame of video (with minor changes). Spatial Redundancy in Still Images This area is all blue This area is half blue and half green Sky Blue Sky Blue Sky Blue Sky Blue Sky Blue Sky Blue Sky Blue Sky Blue
  • 37. This picture is the same as the previous one except for this area − It arises when successive frames of video display images of the same scene. − Take advantage of similarity between successive frames 37 Temporal Redundancy in Moving Images This picture is the same as the previous one except for this area
  • 38. 38 Temporal Redundancy in Moving Images
  • 39. Moving Picture Redundancies Temporal Redundancy − It arises when successive frames of video display images of the same scene. Spatial Redundancy − It arises when parts of a picture are often replicated within a single frame of video (with minor changes). 39 Temporal Redundancy (inter-frame) Spatial Redundancy (intra-frame)
  • 40. The MPEG video compression algorithm achieves very high rates of compression by exploiting the redundancy in video information. − Spatial Redundancy Reduction (pixels inside a picture are similar) − Temporal Redundancy Reduction (Similarity between the frames) − Statistical Redundancy Reduction (more frequent symbols are assigned short code words and less frequent ones longer words) The Principle of Compression for Moving Images 40
• 41. Spatial Redundancy Reduction, Recall
Spatial redundancy reduction techniques:
− Transform coding: Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Discrete Wavelet Transform (DWT), Hadamard Transform (HT)
− Differential Pulse Code Modulation (DPCM)
  • 42. The goal of the prediction model is to reduce redundancy by forming a prediction of the data and subtracting this prediction from the current data. − The residual is encoded and sent to the decoder which re-creates the same prediction so that it can add the decoded residual and reconstruct the current frame. − In order that the decoder can create an identical prediction, it is essential that the encoder forms the prediction using only data available to the decoder, i.e. data that has already been coded and transmitted. Prediction Model 42 Decoder Encoded Residual I. Re-creates the same prediction (predictor) II. Add the decoded residual to prediction (predictor) III. Reconstructs a version of the original block Encoder I. Forming a prediction (predictor) II. Subtracting prediction from the current data to create residual
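The encoder/decoder symmetry described above can be illustrated with a minimal lossless sketch, using the simplest possible predictor (the previous sample). All names are illustrative:

```python
def encode(samples, predict):
    """Encoder: form a prediction from already-coded data, send the residual."""
    decoded, residuals = [], []
    for x in samples:
        p = predict(decoded)           # uses only data the decoder will also have
        residuals.append(x - p)
        decoded.append(p + (x - p))    # encoder mirrors the decoder (lossless here)
    return residuals

def decode(residuals, predict):
    """Decoder: re-create the same prediction and add the decoded residual."""
    decoded = []
    for r in residuals:
        decoded.append(predict(decoded) + r)
    return decoded

prev = lambda d: d[-1] if d else 0     # simplest predictor: previous sample

data = [10, 12, 13, 13, 11]
res = encode(data, prev)               # residual has less energy than the data
assert decode(res, prev) == data
```

The key point the slide makes is visible in the code: both sides call the same `predict` on the same reconstructed history, so the prediction never drifts apart.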
  • 43. Spatial Prediction: The prediction is formed from previously coded image samples in the same frame − The output of this process is a set of residual or difference samples and the more accurate the prediction process, the less energy is contained in the residual. Temporal Prediction: The prediction is formed from previously coded frames 43 Inter Frame (Temporal) and Intra Frame (Spatial) Prediction
  • 44. 44 Inter Frame (Temporal) and Intra Frame (Spatial) Prediction
  • 45. 45 Inter Frame (Temporal) and Intra Frame (Spatial) Prediction
  • 46. The prediction for the current block of image samples is created from previously-coded samples in the same frame. − Assuming that the blocks of image samples are coded in raster-scan order, which is not always the case, the upper/left shaded blocks are available for intra prediction. − These blocks have already been coded and placed in the output bitstream. − When the decoder processes the current block, the shaded upper/left blocks are already decoded and can be used to re-create the prediction. − H.264/AVC uses spatial extrapolation to create an intra prediction for a block or macroblock. 46 Intra Prediction Available samples Spatial extrapolation
• 48. Intra Prediction (Ex: H.264/AVC) — the nine 4×4 luma prediction modes:
Mode 0 = DC (mean), Mode 1 = Vertical, Mode 2 = Horizontal, Mode 3 = Diagonal down/right, Mode 4 = Diagonal down/left, Mode 5 = Vertical-right, Mode 6 = Vertical-left, Mode 7 = Horizontal-up, Mode 8 = Horizontal-down.
(Figure: the prediction directions and the neighbouring samples A–M used to extrapolate each 4×4 block.)
• 49. Intra Prediction (Ex: H.264/AVC) — Intra Prediction for 4x4 Luma Blocks
Mode 0: DC Prediction
− If all samples A, B, C, D, I, J, K, L are available, a=b=c=…=p = (A+B+C+D+I+J+K+L+4) / 8.
− If A, B, C and D are not available and I, J, K and L are available, a=b=c=…=p = (I+J+K+L+2) / 4.
− If I, J, K and L are not available and A, B, C and D are available, a=b=c=…=p = (A+B+C+D+2) / 4.
− If none of the eight samples is available, a=b=c=…=p = 128.
(Figure: block samples a–p with neighbouring samples A–H, I–L and corner Q.)
• 50. Intra Prediction (Ex: H.264/AVC) — Intra Prediction for 4x4 Luma Blocks
Mode 1: Vertical Prediction
− This mode shall be used only if A, B, C, D are available. The prediction in this mode shall be as follows:
  • a, e, i, m are predicted by A,
  • b, f, j, n are predicted by B,
  • c, g, k, o are predicted by C,
  • d, h, l, p are predicted by D.
(Figure: block samples a–p with neighbouring samples A–H, I–L and corner Q.)
• 51. Intra Prediction (Ex: H.264/AVC) — Intra Prediction for 4x4 Luma Blocks
Mode 3: Diagonal Down/Right Prediction
− This mode is used only if all of A, B, C, D, I, J, K, L, Q are inside the picture. This is a 'diagonal' prediction:
  • m is predicted by (J + 2K + L + 2)/4
  • i, n are predicted by (I + 2J + K + 2)/4
  • e, j, o are predicted by (Q + 2I + J + 2)/4
  • a, f, k, p are predicted by (A + 2Q + I + 2)/4
  • b, g, l are predicted by (Q + 2A + B + 2)/4
  • c, h are predicted by (A + 2B + C + 2)/4
(Figure: block samples a–p with neighbouring samples A–H, I–L and corner Q.)
• 52. Intra Prediction (Ex: H.264/AVC) — Intra Prediction for 4x4 Luma Blocks
Example of a 4×4 luma block (samples a–p, neighbouring samples A–M):
– Samples a, d: predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) for mode 4
– Samples a, d: predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4) for mode 8
  • 53. 53 Ex. Intra Prediction for 4x4 Luma Blocks Intra Prediction (Ex: H.264/AVC) A 4 × 4 luma block, part of the highlighted macroblock QCIF frame with highlighted macroblock
  • 54. 54 Ex. Intra Prediction for 4x4 Luma Blocks − The 9 prediction modes 0-8 are calculated for the following 4 × 4 block. − The Sum of Absolute Errors (SAE) for each prediction indicates the magnitude of the prediction error. − In this case, the best match to the actual current block is given by mode 8, horizontal-up, because this mode gives the smallest SAE. − A visual comparison shows that the mode 8 P block (prediction block) appears quite similar to the original 4 × 4 block. Intra Prediction (Ex: H.264/AVC)
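A minimal sketch of this mode search, assuming unsigned 8-bit samples and covering only three of the nine modes (DC, vertical, horizontal); names are illustrative:

```python
def sae(a, b):
    """Sum of Absolute Errors between two 4x4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def predict_4x4(mode, top, left):
    """Three of the nine H.264 4x4 luma modes. `top` holds samples A-D from
    the row above, `left` holds I-L from the column to the left."""
    if mode == 0:                               # DC: one mean value for all 16
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    if mode == 1:                               # vertical: copy A-D downwards
        return [top[:] for _ in range(4)]
    if mode == 2:                               # horizontal: copy I-L across
        return [[left[r]] * 4 for r in range(4)]
    raise ValueError("only modes 0-2 are sketched here")

def best_mode(block, top, left):
    """Pick the mode whose prediction minimises the SAE, as on the slide."""
    return min(range(3), key=lambda m: sae(block, predict_4x4(m, top, left)))
```

An encoder would run this for every candidate mode and code the block using the winner, exactly as the slide selects mode 8 by its smallest SAE.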
• 55. Intra Prediction (Ex: H.264/AVC) — Intra Prediction for 4x4 Chroma Blocks (only one mode: DC Prediction)
− A, B, C, D are the four 4x4 blocks in an 8x8 chroma block.
− S0, S1, S2, S3 are the sums of 4 neighbouring pixels.
If S0, S1, S2, S3 are all inside the frame: A = (S0 + S2 + 4)/8, B = (S1 + 2)/4, C = (S3 + 2)/4, D = (S1 + S3 + 4)/8
If only S0 and S1 are inside the frame: A = (S0 + 2)/4, B = (S1 + 2)/4, C = (S0 + 2)/4, D = (S1 + 2)/4
If only S2 and S3 are inside the frame: A = (S2 + 2)/4, B = (S2 + 2)/4, C = (S3 + 2)/4, D = (S3 + 2)/4
If S0, S1, S2, S3 are all outside the frame: A = B = C = D = 128
  • 56. 56 Intra Prediction (Ex: H.264/AVC) 16x16 Intra Prediction Mode − Especially suitable for smooth areas − Prediction Modes • Mode 0 =Vertical Prediction • Mode 1 = Horizontal Prediction • Mode 2 = DC prediction • Mode 3 = Plane prediction − Residual coding • Another 4x4 transform is applied to the 16 DC coefficients • Only single scan is used.
• 57. Intra Prediction (Ex: H.264/AVC) — 16x16 Intra Prediction Mode
− Mode 0 = Vertical Prediction
  • Pred(i, j) = P(i, −1), i, j = 0, 1, ..., 15
− Mode 1 = Horizontal Prediction
  • Pred(i, j) = P(−1, j), i, j = 0, 1, ..., 15
− Mode 2 = DC Prediction
  • Pred(i, j) = ( Σ_{i=0..15} ( P(i, −1) + P(−1, i) ) + 16 ) / 32, i, j = 0, 1, ..., 15
− Mode 3 = Plane Prediction
  • Pred(i, j) = max(0, min(255, (a + b×(i−7) + c×(j−7) + 16) / 32)),
    where a = 16×(P(−1,15) + P(15,−1)), b = 5×(H/4)/16, c = 5×(V/4)/16,
    H = Σ_{i=1..8} i × ( P(7+i, −1) − P(7−i, −1) ),
    V = Σ_{j=1..8} j × ( P(−1, 7+j) − P(−1, 7−j) )
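The plane mode translates almost directly into code. This sketch follows the slide's formulas with integer rounding (b = (5H + 32)/64, arithmetically the same as 5×(H/4)/16); the function name and argument layout are illustrative:

```python
def plane_pred_16x16(top, left):
    """16x16 plane (mode 3) intra prediction following the slide's formulas.
    top[i] = P(i, -1), the row of samples above the macroblock;
    left[j] = P(-1, j), the column of samples to its left."""
    H = sum(i * (top[7 + i] - top[7 - i]) for i in range(1, 9))
    V = sum(j * (left[7 + j] - left[7 - j]) for j in range(1, 9))
    a = 16 * (left[15] + top[15])
    b = (5 * H + 32) // 64          # rounded 5*(H/4)/16
    c = (5 * V + 32) // 64          # rounded 5*(V/4)/16
    # luminance plane sloping along both axes, clipped to the 8-bit range
    return [[max(0, min(255, (a + b * (i - 7) + c * (j - 7) + 16) // 32))
             for i in range(16)] for j in range(16)]
```

With flat neighbours the result is a flat plane; with a luminance gradient in the neighbours it produces the light-to-dark ramp shown in the next slide's example.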
  • 58. 58 Ex. Intra 16×16 − A luma macroblock with previously-encoded samples at the upper and lefthand edges. − The best match is given by mode 3 which in this case produces a plane with a luminance gradient from light at the upper-left to dark at the lower-right. Intra Prediction (Ex: H.264/AVC)
  • 59. 59 Application of Difference Frame in Video Coding Less Information
  • 60. 60 General Reasons for Differences Between Two Frames Differences between two frames can be caused by • Camera motion: the outlines of background or stationary objects can be seen in the Diff Image • Object motion: the outlines of moving objects can be seen in the Diff Image • Illumination changes: sun rising, headlights, etc. • Scene Cuts: Lots of stuff in the Diff Image • Noise: If the only difference between two frames is noise (nothing moved), then you won’t recognize anything in the Difference Image We try to minimize entropy in difference image by motion prediction
  • 61. 61 Typical Camera Motions, Recall Track right Dolly backward Boom up (Pedestal up) Pan right Tilt up Track left Dolly forward Boom down (Pedestal down) Pan left Tilt down Roll
  • 62. 62 Typical Objects Motions, Recall Translation: Simple movement of typically rigid objects Camera pans vs. movement of objects Rotation: Spinning about an axis – Camera versus object rotation Zooms –in/out – Camera zoom vs. object zoom (movement in/out) Frame n+1 (Translation) Frame n+1 (Rotation) Frame n+2 (Zoom) Frame n
  • 67. Frame N Frame N+1 (Frame N) - (Frame N+1) 67 Difference Frames and Motion
  • 68. 68 Frame N Frame N+1 (Frame N) - (Frame N+1) Difference Frames and Motion
  • 69. 69 Frame N Frame N+1 (Frame N) - (Frame N+1) Difference Frames and Motion
  • 70. Difference Frame Without Motion Prediction Difference Frame With Motion Prediction Frame N Frame N+1 70 Goal: to remove the correlation by motion compensation If you can see something in the Diff Image and recognize it, there’s still correlation in the difference image. Temporal Prediction (Motion Prediction)
  • 71. Temporal Redundancy Reduction − Pixels in the successive frames of the same locations are highly correlated. − In static parts of the picture, they are virtually the same. − Due to motion, they are displaced, but their motion compensated values become more similar. − The accuracy of the prediction can usually be improved by compensating for motion between the reference frame(s) and the current frame. • Hence motion compensated frame difference pixels become smaller • Instead of transforming a block of pixels, their motion compensated values are transformed and quantised. • The predicted frame is created from one or more past or future frames known as reference frames (anchor). 71 Temporal Prediction (Motion Prediction) Frame 1 (as a predictor for frame 2 ) Frame 2 (current frame ) Difference Mid-grey represents a difference of zero and light or dark greys correspond to positive and negative differences.
• 72. Temporal Prediction (Motion Prediction)
(Figure: previous frame N and next frame N+1. The best match for the macroblock in the next frame N+1 is found in the previous frame N; this is the predictor, i.e. the motion-compensated prediction. The (forward) motion vector gives its displacement, and the differentiator produces a small difference — the residual block.)
Movement is cancelled in the frames and added as vector information in the header.
  • 73. 73 Image t Image t-1 Diff. without motion compensation Differences with motion compensationMotion vectors for Blocks Temporal Prediction (Motion Prediction), Example
  • 74. 74 Image to codeReference image Temporal Prediction (Motion Prediction), Example
• 75. Temporal Prediction (Motion Prediction), Example — (Figure: reference image with the search area for finding the predictor, and the image to code.)
• 76. Temporal Prediction (Motion Prediction)
Motion Estimation (ME)
− The process of finding the best match (finding MVs). The chosen candidate region becomes the predictor (a motion-compensated prediction) for the current M×N block.
Motion Compensation (MC)
− The selected ‘best’ matching region in the reference frame (the best match, found in the previous frame N, for the macroblock in the next frame N+1) is subtracted from the current macroblock to produce a residual macroblock.
− This residual block is encoded and transmitted together with a motion vector describing the position of the best-matching region relative to the current macroblock position.
  • 77. Motion Estimation (ME) − Search an area in the reference frame (a past or future frame) to find a similar MxN-sample region. − The process of finding the best match (Motion Vector) is known as motion estimation (ME). Motion Compensation (MC) − The chosen candidate region becomes the predictor for the current MxN block (a motion compensated prediction) and is subtracted from the current block to form a residual MxN block. − The residual block is encoded and transmitted and the offset between the current block and the position of the candidate region (motion vector) is also transmitted. − The decoder uses the received motion vector to re-create the predictor region. − It decodes the residual block, adds it to the predictor and reconstructs a version of the original block. 77 Temporal Prediction (Motion Prediction)
• 78. Temporal Prediction (Motion Prediction) — Motion Vector Extraction
− First, the frame to be approximated, the current frame, is chopped up into uniform non-overlapping blocks.
− Then each block in the current frame is compared to areas of similar size from the previous frame in order to find an area that is similar. A block from the current frame for which a similar area is sought is known as a target block.
− The location of the similar or matching block in the past frame might be different from the location of the target block in the current frame. The relative difference in locations is known as the motion vector.
− If the target block and matching block are found at the same location in their respective frames, then the motion vector that describes their difference is known as a zero vector.
• 79. Block-based Motion Estimation and Compensation
Block-based Motion Estimation:
− Motion estimation of a macroblock involves finding an M×N-sample region in a reference frame that closely matches the current macroblock (best match).
− The reference frame is a previously encoded frame from the sequence and may be before or after the current frame in display order (so frames may have to be encoded out of display order).
− Where there is a significant change between the reference and current frames (e.g. a scene change or an uncovered area), it may be more efficient to encode the macroblock without motion compensation, so an encoder may choose intra mode encoding using intra prediction.
Block-based Motion Compensation:
− The luma and chroma samples of the selected ‘best’ matching region in the reference frame are subtracted from the current macroblock to produce a residual macroblock that is encoded and transmitted together with a motion vector describing the position of the best-matching region relative to the current macroblock position.
  • 80. T=2 (current) Block-based Motion Estimation and Compensation, Ex1 Search Window T=1 (reference) 80
  • 81. 81 Block-based Motion Estimation and Compensation, Ex2 T=2 (current) T=1 (reference)
  • 82. 82 Frame 1 s[x,y,t-1](previous) Frame 2 s[x,y,t](current) Partition of frame 2 into blocks (schematic) Frame 2 with displacement vectors Difference between motion-compensated prediction and current frame u[x,y,t] Referenced blocks in frame 1 Block-based Motion Estimation and Compensation, Ex3
  • 83. 83 Effectiveness of Block Based Motion Prediction The effectiveness of compression techniques that use block based motion compensation depends on the extent to which the following assumptions hold. • Objects move in a plane that is parallel to the camera plane. Thus the effects of zoom and object rotation are not considered, although tracking in the plane parallel to object motion is. • Illumination is spatially and temporally uniform. That is, the level of lighting is constant throughout the image and does not change over time. • Occlusion of one object by another, and uncovered background are not considered.
  • 84. Frame N Frame N+1 Available from earlier frame (N) Not available from earlier frame (N) for prediction of frame N+1 84 Occlusion in Motion Estimation and Compensation, Example Occlusion parts of one object by another object
  • 85. 85 Motion Estimation and Block Matching Algorithms To carry out motion compensation, the motion of the moving objects has to be estimated first. − The technique in all the standard video codecs is the Block Matching Algorithm (BMA). − In a typical BMA, a frame is divided into blocks of 𝑀 × 𝑁 pixels or, more usually, square blocks of 𝑁 × 𝑁 pixels. − Then, for a maximum motion displacement of w pixels/frame, the current block of pixels is matched against a corresponding block at the same coordinates but in the previous/next frame, within the square window of width 𝑁 + 2𝑤. − The best match on the basis of a matching criterion yields the displacement. − Measurements of the video encoders’ complexity show that ME comprises almost 50–70 per cent of the overall encoder’s complexity.
• 86. Motion Estimation and Block Matching Algorithms
(Figure: an N×N block at (m, n) in the current frame is matched against N×N blocks, shifted by (i, j), inside an (N+2w)×(N+2w) search window in the previous frame.)
Mean Squared Error (MSE): M(i, j) = (1/N²) Σ_{m=1..N} Σ_{n=1..N} [ f(m, n) − g(m+i, n+j) ]², −w ≤ i, j ≤ w
Mean Absolute Error (MAE): M(i, j) = (1/N²) Σ_{m=1..N} Σ_{n=1..N} | f(m, n) − g(m+i, n+j) |, −w ≤ i, j ≤ w
Complexity: a full search evaluates (2w+1)² candidate positions.
• Various measures such as the Cross-correlation Function (CCF), Mean Squared Error (MSE) and Mean Absolute Error (MAE) can be used in the matching criterion.
• For the best match, in the CCF the correlation has to be maximised, whereas in the latter two the distortion must be minimised.
• In practical coders, both MSE and MAE are used, since it is believed that CCF would not give good motion tracking, especially when the displacement is not large.
• 87. Exhaustive Block Matching Algorithm (EBMA)
(Figure: reference frame with search range, current-frame block and resulting motion vector; the example sums the absolute differences between the candidate and current blocks, giving |A| = 12.)
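A straightforward sketch of the exhaustive search, using the sum of absolute differences (SAD) as the matching criterion; the names and the frame representation (lists of rows) are illustrative:

```python
def sad(block, ref, r0, c0):
    """Sum of absolute differences between `block` and the same-size region
    of `ref` whose top-left corner is (r0, c0)."""
    n = len(block)
    return sum(abs(block[r][c] - ref[r0 + r][c0 + c])
               for r in range(n) for c in range(n))

def full_search(block, ref, r0, c0, w):
    """Exhaustive block matching: test every displacement (i, j) with
    |i|, |j| <= w around the block position (r0, c0) in the reference frame
    and return the best motion vector with its matching cost."""
    n, best = len(block), None
    for i in range(-w, w + 1):
        for j in range(-w, w + 1):
            r, c = r0 + i, c0 + j
            if 0 <= r and 0 <= c and r + n <= len(ref) and c + n <= len(ref[0]):
                cost = sad(block, ref, r, c)
                if best is None or cost < best[0]:
                    best = (cost, (i, j))
    return best[1], best[0]
```

The cost of this guaranteed-optimal search is the (2w+1)² evaluations noted on the previous slide, which is why the fast search patterns on the following slides exist.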
• 88. Two-dimensional Log Search Algorithm (TDL)
(Figure: search points visited in steps 1–4, converging on the best match.)
• 89. Three-Step Search Algorithm (TSS)
(Figure: nine points tested at each of three steps; after each step the search is re-centred on the best point and the step size is halved.)
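The three-step idea can be sketched generically: probe nine points, re-centre on the best, halve the step. The `cost` callable stands in for a block-matching criterion such as SAD; the starting step of 4 corresponds to a ±7 search range:

```python
def three_step_search(cost, start=(0, 0), step=4):
    """Three-step search: evaluate cost(i, j) at the centre and its eight
    neighbours at the current step size, move the centre to the cheapest
    point, then halve the step (4 -> 2 -> 1 gives the three steps)."""
    centre = start
    while step >= 1:
        candidates = [(centre[0] + di * step, centre[1] + dj * step)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        centre = min(candidates, key=lambda p: cost(*p))
        step //= 2
    return centre
```

With step sizes 4, 2, 1 this visits at most 25 points instead of the full search's 225 for w = 7, matching the 1 + 8·log₂w count in the complexity table; the price is that the search can settle on a local minimum of the matching criterion.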
• 90. Cross Search Algorithm (CSA)
Step 1, then Step 2 (side), Step 3 (side) and Step 4 (centre); total search 5 + 3 + 3 + 8 = 19 points. Normal worst case: N = 4 steps for a search range of −7 to +7.
(Figure: search points on a ±7 grid.)
• 91. Diamond Search
Steps 1–5; total search 9 + 6 + 6 + 4 + 4 = 29 points for a search range of −7 to +7.
(Figure: search points on a ±7 grid.)
• 92. Hexagon-Based Search Algorithm
(Figure: first, second, third and final steps of the hexagon search pattern.)
• 93. Complexity/Performance — Computational Complexity
The range of motion speed is from w = 4 to 16 pixels/frame.

Algorithm   Maximum number of search points   w=4   w=8   w=16
TDL         2 + 7·log₂w                        16    23     30
TSS         1 + 8·log₂w                        17    25     33
MMEA        1 + 6·log₂w                        13    19     25
CDS         3 + 2w                             11    19     35
OSA         1 + 4·log₂w                         9    13     17
CSA         5 + 4·log₂w                        13    17     21
FSM         (2w+1)²                            81   289   1089

− The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-square error, average error and standard deviation.
FSM: Full Search Mode; TDL: Two-dimensional Logarithmic; TSS: Three-step Search; MMEA: Modified Motion Estimation Algorithm; CDS: Conjugate Direction Search; OSA: Orthogonal Search Algorithm; CSA: Cross-search Algorithm
• 94. Complexity/Performance — Compensation Efficiency
The motion compensation efficiencies of some algorithms for a motion speed of w = 8 pixels/frame, for two test image sequences (Split Screen and Trevor White):

            Split Screen                 Trevor White
Algorithm   Entropy (bits/pel)  Std dev  Entropy (bits/pel)  Std dev
FSM         4.57                7.39     4.41                6.07
TDL         4.74                8.23     4.60                6.92
TSS         4.74                8.19     4.58                6.86
MMEA        4.81                8.56     4.69                7.46
CDS         4.84                8.86     4.74                7.54
OSA         4.85                8.81     4.72                7.51
CSA         4.82                8.65     4.68                7.42

− The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-square error, average error and standard deviation.
FSM: Full Search Mode; TDL: Two-dimensional Logarithmic; TSS: Three-step Search; MMEA: Modified Motion Estimation Algorithm; CDS: Conjugate Direction Search; OSA: Orthogonal Search Algorithm; CSA: Cross-search Algorithm
• 95. Some Search Points Comparison

Method    Min   Max   Average   Speed-up
FS        225   225   225        1
TSS        25    25    25        9
4SS        17    27    17.2     13.1
NTSS       17    33    17.5     12.8
Diamond    13    33    13.3     16.9

Method    Ave Criterion   Ave Distance (FS)   Optimality
FS        2753            0                   100%
TSS       2790            0.04                98.5%
4SS       2777            3.84                98.7%
NTSS      2775            2.98                99.0%
Diamond   2770            3.11                98.9%
  • 96. 96 Sub-pixel Motion Compensation − In the first stage, motion estimation finds the best match on the integer pixel grid (circles). − The encoder searches the half-pixel positions immediately next to this best match (squares) to see whether the match can be improved and if required, the quarter-pixel positions next to the best half-pixel position (triangles) are then searched. − The final match, at an integer, half-pixel or quarter-pixel position, is subtracted from the current block or macroblock.
  • 97. Sub-pixel Motion Compensation 97 Close-up of reference region Reference region interpolated to half-pixel positions
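A sketch of interpolating a reference region to half-pixel positions using simple bilinear averaging. Real codecs use longer filters — H.264, for instance, uses a 6-tap FIR for the luma half-pel positions — so this is only illustrative:

```python
def half_pel_interpolate(ref):
    """Upsample a 2-D reference region by 2 in each direction with bilinear
    averaging, so motion vectors can point at half-pixel positions."""
    h, w = len(ref), len(ref[0])
    out = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for r in range(h):                       # copy integer-pixel positions
        for c in range(w):
            out[2 * r][2 * c] = ref[r][c]
    for r in range(0, 2 * h - 1, 2):         # horizontal half-pel positions
        for c in range(1, 2 * w - 1, 2):
            out[r][c] = (out[r][c - 1] + out[r][c + 1] + 1) // 2
    for r in range(1, 2 * h - 1, 2):         # vertical (and diagonal) half-pels
        for c in range(2 * w - 1):
            out[r][c] = (out[r - 1][c] + out[r + 1][c] + 1) // 2
    return out
```

After this step, the refinement described on the previous slide searches the half-pel samples around the best integer match, and the finally chosen region — now possibly at a half-pel offset — is subtracted from the current block.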
  • 100. Motion Compensation Block Size 100 Frame 2 Frame 1 Residual : no motion compensation Residual : 16 × 16 block size Residual : 8 × 8 block size Residual : 4 × 4 block size
  • 101. − The smaller motion compensation block sizes can produce better motion compensation results. • Motion compensating each 8 × 8 block instead of each 16 × 16 macroblock reduces the residual energy further and motion compensating each 4 × 4 block gives the smallest residual energy of all. − However, a smaller block size leads to increased complexity, with more search operations to be carried out, and an increase in the number of motion vectors that need to be transmitted. − An effective compromise is to adapt the block size to the picture characteristics, for example • choosing a large block size in flat, homogeneous regions of a frame • choosing a small block size around areas of high detail and complex motion Motion Compensation Block Size 101
• 102. Motion Compensated DPCM
The current frame f(x, y, t) is predicted from the previously coded frame held in the image buffer, displaced by the estimated motion vector (Dx, Dy): the predictor is f̂(x+Dx, y+Dy, t−1). The displaced frame difference d(x, y, t) = f(x, y, t) − f̂(x+Dx, y+Dy, t−1) is passed to the DCT-based coder, and the motion vector (Dx, Dy) is transmitted as an extra coding parameter. The decoder forms the same motion-compensated prediction from its own image buffer and adds the decoded difference.
  • 103. − It is possible to estimate the trajectory of each pixel between successive video frames, producing a field of pixel trajectories known as the optical flow or optic flow. − If the optical flow field is accurately known, it should be possible to form an accurate prediction of most of the pixels of the current frame by moving each pixel from the reference frame along its optical flow vector. − However, this is not a practical method of motion compensation. (An accurate calculation of optical flow is very computationally intensive) Optical Flow or Optic Flow 103 Frame 1 (as a predictor for frame 2 ) Frame 2 (current frame ) Optical Flow or Optic Flow
  • 104. − The macroblock, corresponding to a 16×16-pixel region of a frame, is the basic unit for motion compensated prediction in a number of important visual coding standards including MPEG-1, MPEG-2, MPEG-4 Visual, H.261, H.263 and H.264. − For source video material in the popular 4:2:0 format, a macroblock is organized as shown in Figure. − An H.261 codec processes each video frame in units of a macroblock. 104 Y Y Y Y Cr Cb 8 8 Ex: Motion Estimation in H.261
  • 105. Macro-block – Motion estimation of a macroblock involves finding a 16×16-sample region in a reference frame that closely matches the current macroblock. – Luminance: 16x16, four 8x8 blocks – Chrominance: two 8x8 blocks – Motion estimation only performed for luminance component Motion Vector Range – [ -15, 15] – MB: 16 x 16 15 15 15 15 Search Area in Reference Frame MB 105 Ex: Motion Estimation in H.261 𝑪𝒓 𝑪𝒃 𝒀 𝒀 𝟎 𝒀 𝟏 𝒀 𝟐 𝒀 𝟑
• 106. Ex: Motion Vectors Coding in H.261
− Integer-pixel ME search only
− Motion vectors are differentially & separately encoded:
  MVD_x = MV_x[n] − MV_x[n−1]
  MVD_y = MV_y[n] − MV_y[n−1]
− 11-entry VLC table for the MVD (Motion Vector Delta)
Example: MV = 2 2 3 5 3 1 −1 → MVD = 0 1 2 −2 −2 −2…
− Binary: 1 010 0010 0011 0011 0011…
MVD VLC table (excerpt): −2 & 30 → 0011; −1 → 011; 0 → 1; 1 → 010; 2 & −30 → 0010; 3 & −29 → 0001 0
(The displaced frame difference f(x, y, t) − f(x+Δx, y+Δy, t−1) is coded separately from the MVD.)
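The differential coding of motion-vector components can be sketched per component. Here the first vector is differenced against zero, which is one common convention (the slide's example lists the MVDs from the second vector on); names are illustrative:

```python
def mv_to_mvd(mvs):
    """Differential coding of one motion-vector component: send the
    difference from the previously coded vector (first one against 0)."""
    return [mv - prev for prev, mv in zip([0] + mvs[:-1], mvs)]

def mvd_to_mv(mvds):
    """Decoder side: accumulate the deltas to recover the motion vectors."""
    mvs, prev = [], 0
    for d in mvds:
        prev += d
        mvs.append(prev)
    return mvs

mv = [2, 2, 3, 5, 3, 1, -1]          # the slide's example sequence
assert mv_to_mvd(mv) == [2, 0, 1, 2, -2, -2, -2]
assert mvd_to_mv(mv_to_mvd(mv)) == mv
```

Because neighbouring blocks tend to move together, the deltas cluster around zero, which is what makes the short VLC codeword for MVD = 0 pay off.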
• 107. I, P & B Frames (Ex: MPEG 1)
Uncompressed SDTV digital video stream: ~170 Mb/s (each picture ≈ 830 kBytes).
MPEG-2 compressed SDTV digital video stream: 3.9 Mb/s.
I – Intra-coded picture, coded without reference to other pictures and compressed using spatial redundancy only (≈ 100 kBytes).
P – Predictive-coded picture, using motion-compensated prediction from past I or P frames (≈ 33–50 kBytes).
B – Bi-directionally predictive-coded picture, using both past and future I or P frames (≈ 12–30 kBytes).
I: Intra Coded Frame; P: Predictively Coded (Predictive-coded) Frame; B: Bidirectionally Coded (Bidirectional-coded) Frame
  • 108. • Intraframe Compression – Frames marked by (I) denote the frames that are strictly intraframe compressed. – The purpose of these frames, called the "I pictures", is to serve as random access points to the sequence. I Frames 108
  • 109. • P Frames use motion-compensated forward predictive compression on a block basis. – Motion vectors and prediction errors are coded. – Predicting blocks from closest (most recently decoded) I and P pictures are utilised. Forward Prediction P Frames 109
  • 110. • B frames use motion-compensated bi-directional predictive compression on a block basis. – Motion vectors and prediction errors are coded. – Predicting blocks from closest (most recently decoded) I and P pictures are utilised. Forward Prediction Bi-Directional Prediction B Frames 110 Backward Prediction
• 111. I, P, B and D Pictures Features
I-pictures
• They are coded without reference to previous pictures.
• They provide access points to the coded sequence for decoding (intraframe coded, as in JPEG).
P-pictures
• They are predictively coded with reference to the previous I- or P-coded pictures.
• They are themselves used as a reference (anchor) for coding of future pictures.
B-pictures
• Bidirectionally coded pictures, which may use past, future or combinations of both pictures in their predictions.
D-pictures
• Intraframe-coded pictures in which only the DC coefficients are retained.
• Hence the picture quality is poor, and they are normally used for applications like fast forward.
• D-pictures are not part of the GOP; hence they are not present in a sequence containing any other picture types.
  • 112. • Relative number of (I), (P), and (B) pictures can be arbitrary. • Group of Pictures (GOP) is the Distance from one I frame to the next I frame. • Ex: MPEG-2: An I picture is mandatory at least once in a sequence of 132 frames (period_max=132) 1 2 3 4 5 6 7 8 9 10 11 12 1 GOP = 12 Group of Pictures 112
  • 113. An I picture is mandatory at least once in a sequence of 132 frames (period_max= 132) GOP = 6 GOP = 2 GOP = 2 113 Group of Pictures, Examples
• 114. The Typical Size of Compressed Frames
– I frames are independently encoded.
– P frames are based on previous I and P frames.
– B frames are based on previous and following I and/or P frames.
I: Intra Coded Frame; P: Predictively Coded (Predictive-coded) Frame; B: Bidirectionally Coded (Bidirectional-coded) Frame

Typical Sizes of MPEG-1 Frames
Type   Size     Compression
I      18 kB    7:1
P      6 kB     20:1
B      2.5 kB   50:1
Avg    4.8 kB   27:1
• 115. The Typical Size of Compressed Frames
– If B-pictures are not used for predictions of future frames, then they can be coded with the highest possible compression without any side effects.
– This is because, if one picture is coarsely coded and is used as a prediction, the coding distortions are transferred to the next frame. That frame then needs more bits to clear the previous distortions, and the overall bit rate may increase rather than decrease.
– The typical size of compressed P-frames is significantly smaller than that of I-frames (because temporal redundancy is exploited in inter-frame compression).
– B-frames are even smaller than P-frames because of
  • the advantage of bi-directional prediction
  • the lowest priority given to B-frames

Typical Sizes of MPEG-1 Frames
Type   Size     Compression
I      18 kB    7:1
P      6 kB     20:1
B      2.5 kB   50:1
Avg    4.8 kB   27:1
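Using the table's typical MPEG-1 frame sizes, the average frame size for a given GOP pattern is a one-line computation. The 12-frame pattern below is a common choice, not something mandated by the table:

```python
def gop_average_size(pattern, sizes=None):
    """Average compressed frame size (kB) over one GOP pattern, using the
    typical MPEG-1 frame sizes from the table (I=18, P=6, B=2.5 kB)."""
    sizes = sizes or {"I": 18.0, "P": 6.0, "B": 2.5}
    return sum(sizes[f] for f in pattern) / len(pattern)

avg = gop_average_size("IBBPBBPBBPBB")   # (18 + 3*6 + 8*2.5) / 12 = 4.67 kB
```

The result, about 4.67 kB per frame, is close to the 4.8 kB average in the table, showing how the mix of picture types sets the overall bit rate.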
  • 116. Previous reference Current frame Future reference Forward prediction P-frame (Predictive Coded Frame) Features Forward prediction Why do we need a P frame? 116
  • 117. P-frame (Predictive Coded Frame) Features – The P-frames are forward predicted from the last I-frame or P-frame. – It is impossible to reconstruct them without the data of another frame (I or P). – They are coded with respect to the nearest previous I- or P-frames. – This technique is called forward prediction. – It uses motion compensation to provide more compression than I-frames. – About 30% of the size of an I-frame. 117 Forward Prediction
  • 118. Difference Forward Motion vector Reference image (previous image) Encode: − Motion Vector - difference in spatial location of macro-blocks. − Small difference in content of the macro-blocks Current image Huffman Coding P-frame (Predictive Coded Frame) Features 118 000111010…...
  • 119. B-frame (Bidirectional Coded Frame) Features Previous reference Current frame Future reference Forward prediction Backward prediction Why do we need a B frame? 119
  • 120. 120 – The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. – A match however can readily be obtained from the next frame. B-frame (Bidirectional Coded Frame) Features Forward prediction Backward prediction
  • 121. Best match Forward Motion Vector Macroblock to be coded Previous reference picture Current B-picture Future reference picture Best match Backward Motion Vector 121 Forward Prediction Backward Prediction B-frame (Bidirectional Coded Frame) Features
  • 123. – B-frame requires information of the previous and following I-frame and/or P-frame for encoding and decoding. – Three types of motion compensation techniques are used: • Forward motion compensation uses past anchor frame information. • Backward motion compensation uses future anchor frame information. • Interpolative motion compensation uses the average of the past and future anchor frame information. – It uses motion compensation to provide more compression than I- and P-frames. – About 15% of the size of an I-frame. 123 B-frame (Bidirectional Coded Frame) Features Forward Prediction Backward Prediction Bi-Directional Prediction
  • 124. 124 – B-pictures have access to both past and future anchor pictures. – Such an option increases the motion compensation efficiency, particularly when there are occluded objects in the scene. – In fact, one of the reasons for the introduction of B-pictures was this fact that the forward motion estimation and P-pictures cannot compensate for the uncovered background of moving objects. – From the two forward and backward motion vectors, the coder has a choice of choosing any of the forward, backward or their combined motion-compensated predictions. B-frame (Bidirectional Coded Frame) Features
  • 125. 125 – Note that B-pictures do not use motion compensation from each other, since they are not used as predictors. – Also note that the motion vector overhead in B-pictures is much more than in P-pictures. – The reason is that, for B-pictures, there are more macroblock types, which increase the macroblock type overhead, and for the bidirectionally motion-compensated macroblocks two motion vectors have to be sent. B-frame (Bidirectional Coded Frame) Features
  • 126. Past reference Target Future reference Encode: − Two motion vectors (FMV, BMV) - difference in spatial location of macro-blocks. • Two motion vectors are estimated (one to a past frame, one to a future frame). − Small difference in content of the macro-blocks 126 Interpolative compensation uses the weighted average (weights 𝒘𝟏, 𝒘𝟐) of the past and future anchor frame information. B-frame (Bidirectional Coded Frame) Features FMV BMV DCT + Quant + RLE Huffman Code Motion Vectors 000111010…...
  • 127. 127 The combined motion-compensated predictions – A weighted average of the forward and backward motion-compensated pictures is calculated. – The weight is inversely proportional to the distance of the B-picture from its anchor pictures. Ex: GOP structure of I, B1, B2, P – The bidirectionally interpolated motion-compensated picture for B1 would be two-thirds of the forward motion-compensated pixels from the I-picture and one-third from backward motion-compensated pixels of the P-picture. B-frame (Bidirectional Coded Frame) Features 𝑀𝑉 𝑓𝑜𝑟 𝐵1 = (2/3) 𝐹𝑀𝑉 (𝑓𝑟𝑜𝑚 𝐼) + (1/3) 𝐵𝑀𝑉 (𝑓𝑟𝑜𝑚 𝑃)
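The distance weighting described above can be sketched in a few lines of Python (a minimal illustration, not any standard's normative interpolation; the function name and the toy 8×8 blocks are made up for the example):

```python
import numpy as np

def bidirectional_prediction(fwd_mc, bwd_mc, dist_to_past, dist_to_future):
    """Distance-weighted average of the forward and backward
    motion-compensated blocks: each anchor's weight is inversely
    proportional to its temporal distance from the B-picture."""
    total = dist_to_past + dist_to_future
    w_fwd = dist_to_future / total   # closer past anchor -> larger weight
    w_bwd = dist_to_past / total
    return w_fwd * fwd_mc + w_bwd * bwd_mc

# B1 in the GOP  I B1 B2 P : one frame from I, two frames from P
fwd = np.full((8, 8), 90.0)    # forward MC block from the I-picture
bwd = np.full((8, 8), 120.0)   # backward MC block from the P-picture
pred = bidirectional_prediction(fwd, bwd, 1, 2)   # weights 2/3 and 1/3
```

For B1 the weights come out as 2/3 (forward) and 1/3 (backward), exactly as on the slide.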
  • 128. New Search Mechanism: Prediction Spatial Motion Vector Prediction: a. Due to the spatial correlation, the motion vector 𝒗𝒊,𝒋 of the current block is close to those of nearby blocks. − Usage of Predictor: 1. Initial search point 2. DPCM coding of motion vector 128 Previously coded blocks: v_{i,j−1}, v_{i,j−2}, v_{i−1,j−2}, v_{i−1,j−1}, v_{i−1,j}, v_{i−1,j+1} Predictor Example 1: ṽ_{i,j} = Mean{v_{i,j−1}, v_{i−1,j−1}, v_{i−1,j}} Predictor Example 2: ṽ_{i,j} = Median{v_{i,j−1}, v_{i−1,j−1}, v_{i−1,j}} Uncoded Block
  • 129. New Search Mechanism: Prediction Temporal Motion Vector Prediction: b. Due to the temporal correlation, the motion vector of the current block is close to those of nearby blocks in the previous frame (all of which are coded). 129 Predictor Example: ṽ^t_{i,j} = Median{v^t_{i,j−1}, v^t_{i−1,j−1}, v^t_{i−1,j}, v^{t−1}_{i,j}, v^{t−1}_{i,j+1}, v^{t−1}_{i+1,j}} where v^t denotes motion vectors of already-coded blocks in the current frame and v^{t−1} those of the co-located and neighbouring blocks in the previous frame. Current Frame Previous Frame
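The median predictors used in the examples above can be sketched as a small helper (a hypothetical function name; component-wise median over any set of neighbouring motion vectors, spatial or temporal):

```python
def median_mv_predictor(neighbor_mvs):
    """Component-wise median of neighbouring motion vectors,
    usable as the initial search point and as the DPCM predictor.
    For an even count this picks the upper median."""
    xs = sorted(v[0] for v in neighbor_mvs)
    ys = sorted(v[1] for v in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return (xs[mid], ys[mid])

# left, top-left and top neighbours of the current block
print(median_mv_predictor([(4, -2), (3, 0), (7, -1)]))   # -> (4, -1)
```

Replacing the sorted-middle pick with a mean gives Predictor Example 1 instead of Example 2.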
  • 131. 131 Inter-frame and Intra-frame Coding Still Image Compression Moving Picture Compression
  • 132. Intra-frame Compression/Coding – Is a picture coded without reference to any picture except itself. – It is a still image encoded in JPEG in real-time. – Often, I pictures (I-frames) are used for random access and as references for the decoding of other pictures. 132
  • 133. • Zig-Zag Scan • Run-length coding • VLC • Quantization – the major reduction step, controls ‘Quality’ 133 Intra-frame Compression (like still-image compression)
  • 134. Intra-frame Compression (like still-image compression) Discrete Cosine Transform: rearranges the pixels into frequency coefficients. Zig-zag Scanning: rearranges the coefficients from raster scan to low-frequency first. Quantisation: quantises the data if the compression ratio has not been achieved. Entropy Coding: replaces the original data with shorter codes or symbols. Data Buffer: stores the compressed data & checks the compression ratio; returns a quantisation signal if the ratio is not achieved. Base band input → Compressed output 134
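The zig-zag scan step can be generated programmatically (a small sketch; diagonals of constant i + j are traversed in alternating directions so that low-frequency coefficients come first):

```python
def zigzag_order(n=8):
    """Return the zig-zag scan order for an n x n coefficient block
    as a list of (row, col) pairs: each anti-diagonal has constant
    row + col, and the traversal direction alternates per diagonal."""
    order = []
    for s in range(2 * n - 1):                       # diagonal index
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate direction
    return order

scan = zigzag_order()
print(scan[:6])   # -> [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Indexing a quantised 8×8 DCT block in this order groups the trailing zeros together, which is what makes the subsequent run-length coding effective.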
  • 137. – Inter-frame coding removes temporal redundancy (Inter-frames reduce the average bit rate for the same quality!) – Relies on successive frames looking similar. • Does not work well with cuts and breaks. – 2 different types of comparison. • P-frame & B-frame. – Inter-frames need a ‘reference’ frame. • i.e. an I-frame (or P-frame). – Many inter-frames can be used after the I frame. • This can reduce bit rate a lot. – Eventually the process must be started again. • The difference becomes too great especially if there is a cut. 137 Inter-frame Compression/Coding
  • 139. DCT Motion Estimation Motion Compensation Frame store Entropy Coding (RLC then VLC) Intra Prediction Intra/Inter Mode Decision Inverse Quantization + - + + Video Input Output Bit stream Quantization Inverse Transform MVs buffer MVs Deblocking Filter 139 A Generic Video Encoder (Ex: AVC)
  • 141. 141 Interframe Loop − In interframe predictive coding, the difference between pixels in the current frame and their prediction values from the reference frame (ex: previous frame) is coded and transmitted. − At the receiver, after decoding the error signal of each pixel, it is added to a similar prediction value to reconstruct the picture. − The better the predictor, the smaller the error signal, and hence the transmission bit rate. − When there is motion, assuming that movement in the picture is only a shift of object position, a pixel in the previous frame, displaced by a motion vector, is used. A Generic Video Encoder
  • 142. 142 Motion Estimator − Assigning a motion vector to a group of pixels. − A group of pixels is motion compensated, such that the motion vector overhead per pixel can be very small. − In standard codecs, a block of 16×16 pixels, known as a macroblock (MB) (to be differentiated from 8 ×8 DCT blocks), is motion estimated and compensated. − It should be noted that ME is only carried out on the luminance parts of the pictures. − A scaled version of the same motion vector is used for compensation of chrominance blocks, depending on the picture format. A Generic Video Encoder
  • 143. 143 BD – Block Difference: BD = (1/256) Σ_{(x,y)∈MB} |c[x, y] − r[x, y]| DBD – Displaced Block Difference: DBD = (1/256) Σ_{(x,y)∈MB} |c[x, y] − r[x − dx, y − dy]| A Generic Video Encoder – Not all blocks are motion compensated. – The mode which generates fewer bits is preferred: MC is chosen when DBD falls sufficiently below BD (the decision boundary in the H.261 characteristic is approximately the line y = x/1.1). Motion Compensation Decision Characteristic (H.261)
  • 144. 144 Inter/Intra Switch − Every MB is either interframe or intraframe coded, called inter/intra MBs. − The decision on the type of MB depends on the coding technique. − Sometimes it might be advantageous to intraframe code an MB, rather than interframe coding it. There are at least two reasons for intraframe coding: I. Scene cuts or, in the event of violent motion, interframe prediction errors may not be less than those of the intraframe. Hence, intraframe pictures might be coded at lower bit rates. II. Intraframe coded pictures have a better error resilience to channel errors. A Generic Video Encoder
  • 145. 145 Inter/Intra Switch − In interframe coding in the event of channel error, the error propagates into the subsequent frames. If that part of the picture is not updated, the error can persist for a long time. − The variance of intraframe MB is compared with that of the variance of interframe MB (motion compensated or not) in previous frame. The smallest is chosen. • For large variances, no preference between the two modes. • For smaller variances, interframe is preferred. − The reason is that, in intra mode, the DC coefficients of the blocks have to be quantised with a quantiser without a dead zone and with 8-bit resolutions. This increases the bit rate compared to that of the interframe mode, and hence interframe is preferred. MC/NO_MC mode decision in H.261 A Generic Video Encoder (Intraframe AC energy) (Interframe AC energy)
  • 146. 146 DCT − Every MB is divided into 8×8 luminance and chrominance pixel blocks. − Each block is then transformed via the DCT. − There are four luminance blocks in each MB, but the number of chrominance blocks depends on the colour resolutions (image format). A Generic Video Encoder
  • 147. 147 Quantiser − There are two types of quantisers. • With dead zone for the AC coefficients and the DC coefficient of inter MB • Without the dead zone for the DC coefficient of intra MB. − With a dead zone quantiser, if the modulus (absolute value) of a coefficient is less than the quantiser step size q, it is set to zero; otherwise, it is quantised according to quantiser indices. Variable length coding − The quantiser indices are variable length coded, according to the type of VLC used. − Motion vectors, as well as the address of coded MBs, are also variable length coded. A Generic Video Encoder
  • 148. 148 IQ and IDCT − To generate a prediction for interframe coding, the quantised DCT coefficients are first inverse quantised and inverse DCT coded. − These are added to their previous picture values (after a frame delay by the frame store) to generate a replica of decoded picture. − The picture is then used as a prediction for coding of the next picture in the sequence. Buffer − The bit rate generated by an interframe coder is variable. (a function of motion of objects and their details) − Therefore, to transmit coded video into fixed rate channels, the bit rate has to be regulated. Storing the coded data in a buffer and then emptying the buffer at the channel rate does this. − However, if the picture activity is such that the buffer may overflow (violent motion), then a feedback from the buffer to the quantiser can regulate the bit rate. A Generic Video Encoder
  • 150. Motion Compensation Entropy Decoding Intra Prediction Intra/Inter Mode Selection Inverse Quantization & Inverse DCT + + Input Bit stream Video Output Picture Buffering Deblocking Filter 150 A Generic Video Decoder (Ex: AVC)
  • 151. 151 A Generic Video Decoder − The compressed bitstream, after demultiplexing and Variable Length Decoding (VLD), separates the motion vectors and the DCT coefficients. − Motion vectors are used by motion compensation − The DCT coefficients after the inverse quantisation and IDCT are converted to error data. − They are then added to the motion-compensated previous frame to reconstruct the decoded picture.
  • 152. 152 Bit Rate Variation Constant Bit Rate (CBR) − Quantiser step size and even frame rate may change to adapt the bit rate to channel rate − Video quality is variable − Normally a complex structure is used to regulate the bit rate Variable Bit Rate (VBR) − Quantiser step size is nearly constant, generating almost constant quality picture − Difficult to adapt to channel rate, but is suitable for packet Switched Network applications (e.g. Internet) − No need for bit rate regulation (codec is simple)
  • 153. − To achieve the requirement of random access, a set of pictures can be defined to form a Group of Picture (GOP), consisting of a minimum of one I-frame, which is the first frame, together with some P-frames and/or B-frames. 153 Group of Picture (GOP), Recall Forward Prediction Backward Prediction Bi-Directional Prediction GOP = 12 I: Intra Coded Frame P: Predictively Coded (Predictive-coded ) Frame B: Bidirectionally Coded (Bidirectional-coded) Frame
  • 154. Example GOP Structures MPEG-2: Simple Possibilities MPEG-2: An I picture is mandatory at least once in a sequence of 132 frames (period_max=132) 154 Examples: I B B P B B I B B P B B I … (IBBP); I I I I I … (I only); I P I P I P … (IP); I B I B I B … (IB)
  • 155. Example GOP Structures I-frame only and short-GOP structures (I I I I …, I B I B …): high bit rate, broadcast quality, easy to edit. Long GOP (I P B B P B B P …): low bit rate, domestic and transmission quality, no further editing required.
  • 156. Example GOP Structures I frame only (1 frame GOP): used by Sony IMX. IB frame only (2 frame GOP): used by Sony Betacam SX. Long GOP (I B B P B B P …): used by satellite, cable & DVD.
  • 157. 157 How to Choose the Current Frame Type: I, B or P? The choice of how to encode the current frame is made by the encoder − Scene-change frames should be encoded as I-frames − The encoder should never allocate too long sequences of P or B frames (interframe coding is bad for error resilience) − B frames are computationally intensive − Must compute forward and backward motion vectors On average, for natural images with fixed quantization intervals: − Size(I-frame) : Size(P-frame) : Size(B-frame) = 6 : 3 : 2
  • 158. − I and P pictures are called “anchor” pictures − A GOP is a series of one or more pictures to assist random access into the picture sequence. − The GOP length is normally defined as the distance between I-pictures, which is represented by parameter N in the standard codecs. − The distance between the anchor I/P and P-pictures is represented by M. − The encoding or transmission order of pictures differs from the display or incoming picture order. − This reordering introduces delays amounting to several frames at the encoder (equal to the number of B- pictures between the anchor I- and P-pictures). − The same amount of delay is introduced at the decoder in putting the transmission/decoding sequence back into its original order. This delay inevitably limits the application of MPEG-1 for telecommunications. − A GOP, in coding, must start with an I picture and in display order, must start with an I or B picture and must end with an I or P picture. 158 Group of pictures and Reordering
  • 159. − In order to allow B frames to be decoded, frames are re-ordered when the MPEG file is created, so that when a frame is received, the decoder will already have the required reference frames. − To encode the frames in display order I1B1B2P1B3B4P2B5B6P3B7B8P4 − The ordering of the frames in the file would be I1P1B1B2P2B3B4P3B5B6P4B7B8 − When the decoder receives the P frame, it decodes it, but it would delay displaying the picture, as the next frame is a B frame 159 Picture Re-ordering, Ex. 1
  • 160. Picture Re-ordering, Ex. 1 Encoder Input/Decoder Output: 1 2 3 4 5 6 7 8 9 10 (I B B P B B P B B I) Encoder Output/Decoder Input: 1 4 2 3 7 5 6 10 8 9 Encoder Decoder 160
  • 161. 1 2 3 4 5 6 7 8 9 10 11 12 1 Source and Display Order Transmission Order 161 Picture Re-ordering, Ex. 2
  • 162. Group of pictures and Reordering 162 0 3 1 2 6 4 5 9 7 8 12 I P B B P B B I B B P Encoding Order of Frames Intra frame coding (Temporal reference) 0 1 2 3 4 5 6 7 8 9 10 I B B P B B P B B I B Group of Picture (GOP) Forward prediction Bidirectional prediction N=8 M=3 Forward prediction Backward prediction Picture Re-ordering, Ex. 3
  • 163. • Give the encoded sequence of the following frames: I1P1P2B1B2B3B4P3B5I2B6B7B8P4 • Answer I1P1P2P3B1B2B3B4I2B5P4B6B7B8 163 Picture Re-ordering, Ex. 4
  • 164. 164 Encoding order: I0, P3, B1, B2, P6, B4, B5, I9, B7, B8 Playback order: I0, B1, B2, P3, B4, B5, P6, B7, B8, I9 Picture Re-ordering, Ex. 5
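The reordering rule running through these examples can be sketched as a small Python routine (an illustration of the rule only, not an MPEG encoder; the function name is made up): hold each B-frame back until the anchor it references forward has been emitted.

```python
def display_to_transmission(frames):
    """Reorder display-order frames into transmission order:
    each anchor (I or P) is sent before the B-frames that depend
    on it, so the decoder always has both references in hand."""
    out, pending_b = [], []
    for f in frames:
        if f.startswith('B'):
            pending_b.append(f)       # hold B until its future anchor arrives
        else:                         # I or P anchor
            out.append(f)
            out.extend(pending_b)     # held B-frames are now decodable
            pending_b = []
    return out + pending_b

display = ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6', 'B7', 'B8', 'I9']
print(display_to_transmission(display))
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

Applied to the sequences of Ex. 4 and Ex. 5, this reproduces the encoded orders given on the slides.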
  • 165. Ex: Bit Rate and Compression Ratio − Consider a video clip encoded in MPEG-1 with a frame rate of 30 frames per second and a group of pictures with sequence: I B B P B B P B B P B B P B B ..... − If the size of each I-frame, P-frame and B-frame is 12.5 KB, 6 KB and 2.5 KB respectively, calculate the average bit rate for the video clip. − Suppose that the uncompressed frames are each of size 150 KB, find the compression ratio. 165
  • 166. Solution: − In a GOP, there are 1 I-frame, 4 P-frames, and 10 B-frames. I-frame = 1 ×12.5 = 12.5 KB P-frame = 4 × 6 = 24 KB B-frame = 10 × 2.5 = 25 KB Size of a GOPs = 61.5 KB In 1 second, there are 30 frames = 2 GOPs = 2 × 61.5 KB = 123 KB Average bit rate = 123 × 1024 × 8 = 1007616 bit/s − Overall compression ratio for the video stream = original/compressed = 15×150 / 61.5 = 36.59. 166 Ex: Bit Rate and Compression Ratio
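The arithmetic in this solution can be packaged as a short Python check (the function and variable names are made up for the example):

```python
def gop_bitrate(frame_counts_kb, fps, gop_len):
    """Average bit rate (bit/s) of a stream built from repeating GOPs.
    frame_counts_kb: list of (count, size_in_KB) per frame type."""
    gop_kb = sum(n * kb for n, kb in frame_counts_kb)   # KB per GOP
    gops_per_sec = fps / gop_len
    return gop_kb * gops_per_sec * 1024 * 8             # KB/s -> bit/s

# 15-frame GOP: 1 I (12.5 KB), 4 P (6 KB), 10 B (2.5 KB), at 30 fps
rate = gop_bitrate([(1, 12.5), (4, 6), (10, 2.5)], fps=30, gop_len=15)
ratio = 15 * 150 / 61.5      # uncompressed: 150 KB per frame
print(rate, round(ratio, 2))  # -> 1007616.0 36.59
```

This matches the slide: 61.5 KB per GOP, two GOPs per second, hence 1 007 616 bit/s and a compression ratio of about 36.59:1.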
  • 167. 167 Moving Picture Types Quality (figure: PSNR over time for a long-GOP codec (B I B B P B B I B …), where quality varies across the GOP, versus an I-frame-only codec (I I I I …), where quality is uniform)
  • 169. 169 To GOP or not to GOP (figure: PSNR at the 1st, 5th and 10th generation for a long-GOP codec, AVC-Intra100 cut edit and AVC-Intra50 cut edit, across content types: still pictures, fast motion, confetti fall, flashing lights, landscape) Long GOP quality is content dependent
  • 170. 170 Code and Decode Speed for Inter and Intra Codecs, Examples Software Coded Performance
  • 171. 171 Code and Decode Speed for Inter and Intra Codecs, Examples Core i7 4770, 4 core, 8 thread
  Multi Slice Encoding 172 Single CPU Model CPU #0 Multi CPU Model CPU #0 CPU #1 CPU #2 CPU #3 (total 4 CPUs) A B C D CPU #0 CPU #1 CPU #2 CPU #3 GOP 0 GOP 1 GOP 2 A B C D A B C D * Use 1 GOP = 6 frames for explanation
  • 173. Blocking − Borders of 8x8 blocks become visible in reconstructed frame (Caused by coarse quantization, with different quantization applied to neighboring blocks.) Ringing (Echoing, Ghosting) − Distortions near edges of image (Caused by quantization/truncation of high-frequency transform (DCT/DWT) coefficients during compression) 173 Original image Reconstructed image (with ringing artifacts) De-blocking and De-ringing Filters
  • 174. Deblocking and Deringing Filters Low-pass filters are used to smooth the image where artifacts occur. De-blocking: − Do Low-pass filtering on the pixels at borders of 8x8 blocks − One-dimensional filter applied perpendicular to 8x8 block borders − Can be turned on or off for each block, usually go together with MC − Advantage • Decreases prediction error by smoothing the prediction frame • Reduces high-frequency artifacts like mosquito effects − Disadvantage • Increases complexity & overhead De-ringing: − Detect edges of image features − Adaptively apply 2D filter to smooth out areas near edges − Little or no filtering applied to edge pixels in order to avoid blurring 174 Deblocking
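A minimal deblocking sketch, assuming a simple symmetric 1-D smoothing of the two pixels on either side of each vertical 8×8 border (this illustrates the idea only and is not the H.263/AVC loop filter; the function name and the `strength` parameter are invented for the example):

```python
import numpy as np

def deblock_vertical_edges(img, block=8, strength=0.25):
    """Pull the two pixels adjacent to each vertical block border
    toward each other, low-pass filtering perpendicular to the
    border, which reduces visible 8x8 block discontinuities."""
    out = img.astype(float).copy()
    for x in range(block, img.shape[1], block):      # each border column
        a = img[:, x - 1].astype(float)              # left of the border
        b = img[:, x].astype(float)                  # right of the border
        step = b - a                                 # discontinuity size
        out[:, x - 1] = a + strength * step
        out[:, x]     = b - strength * step
    return out
```

A real in-loop filter would additionally gate the smoothing on the local gradient so genuine image edges are not blurred, which is the same concern the de-ringing description above addresses.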
  • 175. Artifact Reduction: Post-processing vs. In-loop filtering De-blocking/de-ringing often applied after the decoder (post-processing) − Reference frames are not filtered − Developers free to select best filters for the application or not filter at all − It may require an additional frame buffer De-blocking/de-ringing can be incorporated in the compression algorithm (in-loop filtering) − Reference frames are filtered − Same filters must be applied in encoder and decoder − Better image quality at very low bit-rates 175
  • 176. Sensitivity to Transmission Errors − Prediction and Variable Length Coding (VLC) makes the video stream very sensitive to transmission errors on the bitstream − Error in one frame will propagate to subsequent frames − Bit errors in one part of the bit stream make the following bits undecodable 176
  • 177. Effect of Transmission Errors 177 Example reconstructed video frames from a H.263 coded sequence, subject to packet losses
  • 178. Error Resilient Encoding − To help the decoder to resume normal decoding after errors occur, the encoder can • Periodically insert INTRA mode (INTRA refresh) • Insert resynchronization codewords at the beginning of a group of blocks (GOB) − More sophisticated error-resilience tools • Multiple description coding − Trade-off between efficiency and error-resilience − Can also use channel coding / retransmission to correct errors 178
  • 179. Error Concealment − With proper error-resilience tools, packet loss typically leads to the loss of an isolated segment of a frame − The lost region can be “recovered” based on the received regions by spatial/temporal interpolation → Error concealment − Decoders on the market differ in their error concealment capabilities 179 Without concealment With concealment
  • 181. 181 Projective Mapping 2-D Motion: Projection of 3-D motion, depending on 3D object motion and projection operator Optical flow: “Perceived” 2-D motion based on changes in image pattern, also depends on illumination and object surface texture On the left, a sphere is rotating under a constant ambient illumination, but the observed image does not change. On the right, a point light source is rotating around a stationary sphere, causing the highlight point on the sphere to rotate
  • 182. 182 When does optical flow break?
  • 183. 183 3D Motion to 2D Motion Projective Mapping (figure: a 3-D motion vector D of scene point X to X', projected through camera centre C onto the image plane as the 2-D motion vector d between x and x')
  • 184. 184 2D Motion Corresponding to Rigid 3D Object Motion − General case − Projective mapping
  • 185. 185 Two Features of Projective Mapping − Chirping: increasing perceived spatial frequency for far away objects − Converging (Keystone): parallel lines converge in distance Non-chirping models: (Original) (Affine) (Bilinear) (Relative-projective) Chirping models: (Projective) (Pseudo-perspective) (Biquadratic)
  • 186. 186 Affine and Bilinear Transformation Models Approximation of projective mapping: I. Affine (6 parameters): Good for mapping triangles to triangles II. Bilinear (8 parameters): Good for mapping blocks to quadrangles
  • 187. 187 Perspective and Pixel Coordinate Transformation Models Approximation of projective mapping: I. Perspective Transformation II. Pixel Coordinate Transformation Perspective: x’1 = (a1x1 + a2x2 + a3) / (a7x1 + a8x2 + 1) x’2 = (a4x1 + a5x2 + a6) / (a7x1 + a8x2 + 1) Eight Motion Parameters: a1, a2, a3, a4, a5, a6, a7, a8 Shift: x’1 = x1 + d1 x’2 = x2 + d2 Two Motion Parameters: d1, d2 The simplest block motion model, which is mostly used for block-based motion compensation!
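Both models above can be sketched as one function (a small illustration; the function name is made up). With a7 = a8 = 0 the denominator becomes 1 and the model reduces to an affine mapping; with additionally a1 = a5 = 1 and a2 = a4 = 0 it reduces to the pure shift model:

```python
def perspective_map(x1, x2, a):
    """Eight-parameter perspective mapping of image coordinates,
    a = (a1, ..., a8).  Special cases: a7 = a8 = 0 gives the affine
    model; identity linear part plus (a3, a6) gives the pure shift."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    den = a7 * x1 + a8 * x2 + 1.0
    return ((a1 * x1 + a2 * x2 + a3) / den,
            (a4 * x1 + a5 * x2 + a6) / den)

# pure shift by (d1, d2) = (2, -3): a = (1, 0, 2, 0, 1, -3, 0, 0)
print(perspective_map(5.0, 7.0, (1, 0, 2, 0, 1, -3, 0, 0)))   # -> (7.0, 4.0)
```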
  • 188. 188 Motion Field Corresponding to Different 2-D Motion Models Translation Affine (a) (b) Bilinear Projective (c) (d)
  • 190. 190 2D Motion Corresponding to Camera Motion (b)(a) Camera zoom Camera rotation around Z-axis (roll)
  • 191. 191 Optical flow or optic flow − It is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene Methods for determining optical flow − Phase correlation – inverse of normalized cross-power spectrum − Block-based methods – minimizing sum of squared differences or sum of absolute differences, or maximizing normalized cross-correlation Optical flow
  • 192. 192 Optical Flow Equation When the illumination condition is unknown, the best one can do is to estimate the optical flow. Constant intensity assumption → Optical flow equation Under the "constant intensity assumption", the brightness or intensity (𝝍) is unchanged along the motion trajectory; applying Taylor's expansion to the displaced intensity and comparing the two expressions, we obtain the optical flow equation.
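The equations referred to on this slide (rendered as images in the original deck) are the standard derivation of the optical flow equation under the constant intensity assumption:

```latex
% Constant intensity along the motion trajectory:
\psi\big(x + v_x\,\Delta t,\; y + v_y\,\Delta t,\; t + \Delta t\big) = \psi(x, y, t)

% First-order Taylor expansion of the left-hand side:
\psi(x,y,t) + \frac{\partial \psi}{\partial x}\, v_x \Delta t
            + \frac{\partial \psi}{\partial y}\, v_y \Delta t
            + \frac{\partial \psi}{\partial t}\, \Delta t
  = \psi(x, y, t)

% Comparing the two gives the optical flow equation:
\frac{\partial \psi}{\partial x}\, v_x + \frac{\partial \psi}{\partial y}\, v_y
  + \frac{\partial \psi}{\partial t} = 0
\quad\Longleftrightarrow\quad
\nabla\psi^{\mathsf T}\,\mathbf{v} + \frac{\partial \psi}{\partial t} = 0
```

The vector form makes the next slide's point visible at a glance: only the component of v along the gradient ∇ψ is constrained.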
  • 193. 193 Ambiguities in Motion Estimation − The optical flow equation only constrains the flow vector in the gradient direction (𝑣 𝑛) − The flow vector in the tangent direction (𝑣 𝑡) is under-determined − In regions with constant brightness (∇𝜓 = 0), the flow is indeterminate → Motion estimation is unreliable in regions with flat texture, more reliable near edges
  • 194. 194 General Considerations for Motion Estimation Two categories of approaches: − Feature based (more often used in object tracking, 3D reconstruction from 2D) − Intensity based (based on constant intensity assumption) (more often used for motion compensated prediction, required in video coding, frame interpolation) → Our focus Three important questions − How to represent the motion field? − What criteria to use to estimate motion parameters? − How to search motion parameters?
  • 195. 195 Motion Representation Global: Entire motion field is represented by a few global parameters Pixel-based: One MV at each pixel, with some smoothness constraint between adjacent MVs. Region-based: Entire frame is divided into regions, each region corresponding to an object or sub- object with consistent motion, represented by a few parameters. Block-based: Entire frame is divided into blocks, and motion in each block is characterized by a few parameters.
  • 196. 196 Notations Anchor frame: ψ1(x) Target frame: ψ2(x) Motion parameters: a Motion vector at a pixel in the anchor frame: d(x) Motion field: d(x; a), x ∈ Λ Mapping function: w(x; a) = x + d(x; a), x ∈ Λ
  • 198. 198 Relation Among Different Criteria − OF (Optical Flow) criterion is good only if motion is small. − OF criterion can often yield closed-form solution as the objective function is quadratic in MVs. − When the motion is not small, can iterate the solution based on the OF criterion to satisfy the DFD criterion. − Bayesian criterion can be reduced to the DFD criterion plus motion smoothness constraint − More in the textbook
  • 199. 199 Optimization Methods Exhaustive search – Typically used for the DFD criterion with p=1 (MAD) – Guarantees reaching the global optimum – Computation required may be unacceptable when the number of parameters to search simultaneously is large! – Fast search algorithms reach a sub-optimal solution in shorter time Gradient-based search – Typically used for the DFD or OF criterion with p=2 (MSE) − The gradient can often be calculated analytically − When used with the OF criterion, a closed-form solution may be obtained – Reaches the local optimum closest to the initial solution Multi-resolution search − Search from coarse to fine resolution, faster than exhaustive search − Avoids being trapped in a local minimum
  • 200. 200 Gradient-based Search Iteratively update the current estimate in the direction opposite the gradient direction. • The solution depends on the initial condition. Reaches the local minimum closest to the initial condition • Choice of step size: – Fixed stepsize: Stepsize must be small to avoid oscillation, requires many iterations – Steepest gradient descent (adjust stepsize optimally)
  • 201. 201 Block-Based Motion Estimation: Overview • Assume all pixels in a block undergo a coherent motion, and search for the motion parameters for each block independently • Block matching algorithm (BMA): assume translational motion, 1 MV per block (2 parameters) – Exhaustive BMA (EBMA) – Fast algorithms • Deformable block matching algorithm (DBMA): allow more complex motion (affine, bilinear), to be discussed later.
  • 202. 202 Block Matching Algorithm • Overview: – Assume all pixels in a block undergo a translation, denoted by a single MV – Estimate the MV for each block independently, by minimizing the DFD error over this block • Minimizing function: E(d_m) = Σ_{x∈B_m} |ψ2(x + d_m) − ψ1(x)|^p • Optimization method: – Exhaustive search (feasible as one only needs to search one MV at a time), using MAD criterion (p=1) – Fast search algorithms Integer vs. fractional pel accuracy search
  • 203. 203 Ry dm Target frame Rx Anchor frame Bm Current block Bm Best match Search region Exhaustive Block Matching Algorithm (EBMA)
  • 204. 204 Exhaustive Block Matching Algorithm (EBMA) (figure: numerical example — a block in the current frame is compared against every candidate position within the search range of the reference frame; the candidate minimising the sum of absolute differences, here |A| = 12, determines the motion vector)
  • 205. 205 Sample Matlab Script for Integer-pel EBMA
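The MATLAB script shown on this slide is not reproduced in this extraction; the following is a minimal integer-pel EBMA sketch in Python under the same assumptions (16×16 block, ±7 search range, MAD criterion; the function and variable names are made up):

```python
import numpy as np

def ebma(anchor, target, bx, by, N=16, R=7):
    """Integer-pel exhaustive block matching: find the MV (dx, dy)
    minimising the MAD between the block at (bx, by) in the anchor
    frame and the displaced block in the target frame."""
    blk = anchor[by:by + N, bx:bx + N].astype(float)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + N > target.shape[1] or y + N > target.shape[0]:
                continue                     # candidate falls outside the frame
            cand = target[y:y + N, x:x + N].astype(float)
            mad = np.mean(np.abs(blk - cand))
            if mad < best:
                best, best_mv = mad, (dx, dy)
    return best_mv
```

On a target frame that is a pure translation of the anchor, the exhaustive search recovers the exact shift, since the MAD is zero there.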
  • 206. 206 Fractional Accuracy EBMA • Real MV may not always be multiples of pixels. To allow sub-pixel MVs, the search stepsize must be less than 1 pixel • Half-pel EBMA: stepsize=1/2 pixel in both dimensions • Difficulty: – Target frame only has integer pels • Solution: – Interpolate the target frame by a factor of two before searching – Bilinear interpolation is typically used • Complexity: – 4 times that of integer-pel, plus additional operations for interpolation. • Fast algorithms: – Search in integer precision first, then refine in a small search region in half-pel accuracy.
  • 207. 207 Sub-pixel Motion Compensation − In the first stage, motion estimation finds the best match on the integer pixel grid (circles). − The encoder searches the half-pixel positions immediately next to this best match (squares) to see whether the match can be improved and if required, the quarter-pixel positions next to the best half-pixel position (triangles) are then searched. − The final match, at an integer, half-pixel or quarter-pixel position, is subtracted from the current block or macroblock.
  • 208. Sub-pixel Motion Compensation 208 Close-up of reference region Reference region interpolated to half-pixel positions
  • 209. 209 Half-Pel Accuracy EBMA Bm: Current block B'm: Matching block dm
  • 210. 210 Bilinear Interpolation O[2x, 2y] = I[x, y] O[2x+1, 2y] = (I[x, y] + I[x+1, y])/2 O[2x, 2y+1] = (I[x, y] + I[x, y+1])/2 O[2x+1, 2y+1] = (I[x, y] + I[x+1, y] + I[x, y+1] + I[x+1, y+1])/4
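These interpolation rules translate directly into a vectorised sketch that produces the half-pel grid used by fractional-accuracy EBMA (a minimal illustration; borders wrap around here via `np.roll`, whereas a real implementation would replicate edge pixels instead):

```python
import numpy as np

def upsample2_bilinear(I):
    """Interpolate a frame by a factor of two with the bilinear
    rules above: even/even samples copy the original pixels, the
    other three phases average 2 or 4 integer-pel neighbours."""
    h, w = I.shape
    I = I.astype(float)
    right = np.roll(I, -1, axis=1)           # I[x, y+1] neighbour (wraps at border)
    down  = np.roll(I, -1, axis=0)           # I[x+1, y] neighbour (wraps at border)
    diag  = np.roll(down, -1, axis=1)        # I[x+1, y+1] neighbour
    O = np.zeros((2 * h, 2 * w))
    O[0::2, 0::2] = I                        # integer-pel positions
    O[1::2, 0::2] = (I + down) / 2           # vertical half-pels
    O[0::2, 1::2] = (I + right) / 2          # horizontal half-pels
    O[1::2, 1::2] = (I + right + down + diag) / 4   # centre half-pels
    return O
```

Half-pel EBMA then searches this upsampled frame with a stepsize of two samples, which corresponds to half a pixel in the original grid.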
  • 212. 212 Pros and Cons with EBMA • Blocking effect (discontinuity across block boundary) in the predicted image – Because the block-wise translation model is not accurate – Fix: Deformable BMA (next lecture) • Motion field somewhat chaotic – because MVs are estimated independently from block to block – Fix 1: Mesh-based motion estimation (next lecture) – Fix 2: Imposing smoothness constraint explicitly • Wrong MV in the flat region – because motion is indeterminate when spatial gradient is near zero • Nonetheless, widely used for motion compensated prediction in video coding – Because of its simplicity and optimality in minimizing prediction error
  • 213. 213 Fast Algorithms for BMA • Key ideas to reduce the computation in EBMA: – Reduce the number of search candidates: • Only search those that are likely to produce small errors • Predict likely remaining candidates, based on previous search results – Simplify the error measure (DFD) to reduce the computation for each candidate • Classical fast algorithms (save much computation, but not as accurate as EBMA): – Three-step – 2D-log – Conjugate direction – These illustrate the characteristics of fast algorithms • Many new fast algorithms have been developed since then – Some are suitable for software implementation, others for VLSI implementation (memory access, etc.)
  • 214. 214 Row-Column Search Algorithm Step 1. Search along the row (15 points) Step 2. Search along the column (14 points) • 29 points in total • not optimal
  • 215. 215 Three Step Search (TSS) Method TOTAL = 9 + 8 + 8 = 25 search points Step 1: "O", 9 search points → find the minimum point; this fixes the most significant displacement digit: (x, y) = (a00, b00) Step 2: "△", 8 search points → find the minimum point; this fixes the second digit, e.g. (x, y) = (-1c0, +1d0) Step 3: "□", 8 points → find the final minimum point a and b can be +1, 0, or -1; c and d can be +1, 0, or -1 The TSS method is also called the logarithmic search method! [Legend: first-step, second-step, and third-step search points]
  • 216. 216 Three-Step Search Algorithm (TSS) [Figure: TSS search pattern; labels 1, 2, 3 mark the step in which each point is searched.]
  • 217. 217 Three Step Search (TSS) Method [Figure: search-point counts over the ±7 search window; TSS requires 25 points for every motion vector.]
  • 218. 218 Three-Step Search Algorithm Example [Figure: search points over the ±6 window; label n denotes a search point of step n.] The best matching MVs in steps 1–3 are (3,3), (3,5), and (2,6). The final MV is (2,6). From [Musmann85].
  • 219. 219 Three Step Search Algorithm
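The TSS procedure can be sketched compactly in Python. `cost(dx, dy)` stands for any block-distortion measure (e.g. the SAD of the block displaced by (dx, dy)); it is a caller-supplied placeholder here, not part of the algorithm itself.

```python
def three_step_search(cost, R=7):
    """Three-step search sketch: evaluate the centre and its 8 neighbours at
    the current step size, recentre on the minimum, then halve the step."""
    cx, cy = 0, 0
    best = cost(cx, cy)              # 1 point
    step = (R + 1) // 2              # 4 for a +/-7 search range
    while step >= 1:
        # 8 candidate points around the current centre at this step size
        for px, py in [(cx + dx, cy + dy)
                       for dy in (-step, 0, step)
                       for dx in (-step, 0, step)
                       if (dx, dy) != (0, 0)]:
            c = cost(px, py)
            if c < best:
                best, cx, cy = c, px, py
        step //= 2
    return (cx, cy), best            # 1 + 8 * 3 = 25 points for R = 7
```

With a convex cost centred on the slide's example MV (2,6), the sketch converges to the same final vector as the worked example.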
  • 220. Four Step Search (4SS) Method Step 1. Search 9 points on a 5×5 grid (same pattern as the TSS). The minimum point can be at: 1. Corner 2. Center 3. Side. Go to Step 2. 220
  • 221. Four Step Search (4SS) Method Steps 2 and 3. Depending on where the previous minimum lies: If Corner: add 5 new points If Side: add 3 new points If Center: go to the final 8-point step Repeat as Step 3 if the minimum is not at the center; go to Step 4 once it is. 221
  • 222. 222 Four Step Search (4SS) Method Step 4. The search window is reduced to 3×3 (same as the Center case) Minimum search points = 9 + 8 = 17: Step 1 (Center) + Step 4 (Center) Maximum search points = 9 + 5 + 5 + 8 = 27: Step 1 (Center) + Step 2 (Corner) + Step 3 (Corner) + Step 4 (Center)
  • 223. 223 Four Step Search (4SS) Method, Examples -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 Step.1 =>Step.2 Side => Step.3 Corner => Step.4 Center Total Search 9+3+5+8=25 Points -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 Step.1 =>Step.2 Corner => Step.3 Corner => Step.4 Center Total Search 9+5+5+8=27 Points Worst Cases for 4SS
  • 224. 224 Search Points of Four Step Search [Figure: search-point counts over the ±7 window for 4SS, ranging from 17 at the centre to 27 in the worst case.]
  • 225. 225 Cross Search Method The search pattern is cross-shaped (the 9-point pattern is changed to 5 points). − Step 1: "+" search (5 points) − Step 2: (a) center (+8 points): Stop (b) side (+3 points) − Step N: center (+8 points): Stop − Minimum search points = 5 + 8 = 13, Step 1 (Center) − Maximum search points = ??? Search Shapes
  • 226. 226 Cross Search Algorithm (CSA) -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 Step.1 => Step.2 Side => Step.3 Side => Step.4 Center Total Search 5+3+3+8=19 Points Normal Worse Case N=4 for search range -7 to +7
  • 227. 227 Search Points of Cross Search [Figure: search-point counts over the ±7 window for the cross search, ranging from 13 at the centre to 19.]
  • 228. Cross-search Algorithm (CSA) An example of the CSA (cross-search algorithm) search for w=8 pixels/frame • Another fast BMA method is the cross-search algorithm (CSA). In this method, the basic idea is still a logarithmic step search, but with some differences, which lead to fewer computational search points. • The main difference is that at each iteration there are four search locations, which are the end points of a cross (×) rather than (+). • Also, at the final stage, the search points can be either the end points of (×) or (+) crosses, as shown in the figure. • For a maximum motion displacement of w pixels/frame, the total number of computations becomes 5 + 4 log2 w. 228
  • 229. 229 Diamond Search Step 1: 9 points ==> three cases: 1. Center 2. Side 3. Corner Steps 2 and 3 Final step: with the shrunk diamond (same as the Center case) Minimum search points = 9 + 4 = 13, Step 1 (Center) Maximum search points = ? (33)
  • 230. 230 Diamond Search -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 Step.1 =>Step.2 Side => Step.3 Side => Step.4 Side => Step.5 Total Search 9+6+6+4+4=29 Points
  • 231. 231 Search Points of Diamond Search [Figure: search-point counts over the ±7 window for the diamond search, ranging from 13 at the centre to 25.]
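The diamond search alternates a 9-point large diamond with a final 4-point small diamond. A minimal Python sketch (with a caller-supplied, hypothetical `cost(dx, dy)` distortion measure):

```python
# Large and small diamond search patterns (offsets from the current centre)
LDSP = [(0, -2), (-1, -1), (1, -1), (-2, 0), (2, 0), (-1, 1), (1, 1), (0, 2)]
SDSP = [(0, -1), (-1, 0), (1, 0), (0, 1)]

def diamond_search(cost, max_iter=100):
    """Diamond search sketch: repeat the large diamond until its minimum
    stays at the centre, then refine once with the small diamond."""
    cx, cy = 0, 0
    best = cost(cx, cy)
    for _ in range(max_iter):
        bx, by = cx, cy
        for ox, oy in LDSP:
            c = cost(bx + ox, by + oy)
            if c < best:
                best, cx, cy = c, bx + ox, by + oy
        if (cx, cy) == (bx, by):     # minimum stayed at the centre: stop
            break
    for ox, oy in SDSP:              # final small-diamond refinement
        c = cost(cx + ox, cy + oy)
        if c < best:
            best, cx, cy = c, cx + ox, cy + oy
    return (cx, cy), best
```

For clarity this re-evaluates all large-diamond points each step; an optimized implementation would cache the points shared between consecutive diamonds.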
  • 232. 232 Step 1. TSS points (9 points) + around center points (8 points) Novel TSS Algorithm (NTSS) Center_center Center_side Center_corner Outside point 1. Center_center 2. Center_side 3. Center_corner 4. Outside
  • 233. 233 Novel TSS Algorithm (NTSS) Step 2: (1) Center ==> (0,0) Stop! (17 points, minimum search) (2) Center-side ==> perform a 3-point search (3) Center-corner ==> perform a 5-point search, then Stop! (4) Outside point ==> perform the regular TSS algorithm Center-side / Center-corner 25 + 8 = 33 (worst case)
  • 234. 234 Search Points for Novel 3-Step Search [Figure: search-point counts over the ±7 window for NTSS, ranging from 16–17 near the centre to 33 in the worst case.]
  • 235. 235 Two-dimensional Log Search Algorithm (TDL) [Figure: TDL search pattern; labels 1–4 mark the step in which each point is searched.]
  • 236. 236 2D-Log Search, Example [Figure: search points over the ±6 window; label n denotes a search point of step n.] The best matching MVs in steps 1–5 are (0,2), (0,4), (2,4), (2,6), and (2,6). The final MV is (2,6). From [Musmann85].
  • 237. 237 Step 1: The large hexagon with seven checking points is centered at (0,0), the center of a predefined search window in the motion field. If the MBD (minimum block distortion) point is found at the center of the hexagon, go to Step 3 (Ending); otherwise, go to Step 2 (Searching). Step 2: With the MBD point of the previous search step as the center, a new large hexagon is formed. Three new candidate points are checked, and the MBD point is again identified. If the MBD point is at the center of the new hexagon, go to Step 3 (Ending); otherwise, repeat this step. Step 3: Switch the search pattern from the large to the small hexagon. The four points covered by the small hexagon are evaluated and compared with the current MBD point. The new MBD point is the final motion vector. Hexagon-Based Search Algorithm
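The three steps above can be sketched in Python. As with the other sketches, `cost(dx, dy)` is a caller-supplied distortion measure; for simplicity the sketch re-checks all 6 hexagon points each iteration, whereas the actual HEXBS evaluates only the 3 new points when the hexagon moves.

```python
# Large hexagon (6 points around the centre) and small final pattern (4 points)
HEX = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL = [(0, -1), (-1, 0), (1, 0), (0, 1)]

def hexagon_search(cost, max_iter=100):
    """HEXBS sketch: move the 7-point large hexagon until the MBD point is
    its centre, then do one 4-point small-pattern refinement."""
    cx, cy = 0, 0
    best = cost(cx, cy)
    for _ in range(max_iter):
        bx, by = cx, cy
        for ox, oy in HEX:
            c = cost(bx + ox, by + oy)
            if c < best:
                best, cx, cy = c, bx + ox, by + oy
        if (cx, cy) == (bx, by):     # MBD point at the centre: go to Step 3
            break
    for ox, oy in SMALL:             # Step 3: small-pattern refinement
        c = cost(cx + ox, cy + oy)
        if c < best:
            best, cx, cy = c, cx + ox, cy + oy
    return (cx, cy), best
```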
  • 238. 238 Hexagon Search Path First step Second step Final step Third step
  • 239. 239 Number of Search Points Diamond Search: minimum search points = 9 + 4 = 13; number of search points N_DS = 9 + M × n′ + 4, where M = 5 or 3 and n′ is the number of steps. Hexagon Search: minimum search points = 7 + 4 = 11; number of search points N_HEXBS = 7 + 3 × n + 4, where n is the number of steps.
  • 240. 240 One More Step Hexagon Search Minimum Distortion
  • 241. 241 Search Points of HEXBS Method [Figure: search-point counts over the ±7 window for HEXBS, ranging from 11 at the centre to 20.]
  • 242. Complexity/Performance 242 Computational complexity: maximum number of search points for motion speeds from w = 4 to 16 pixels/frame

  Algorithm   Maximum number of search points   w=4   w=8   w=16
  TDL         2 + 7 log2 w                       16    23    30
  TSS         1 + 8 log2 w                       17    25    33
  MMEA        1 + 6 log2 w                       13    19    25
  CDS         3 + 2w                             11    19    35
  CSA         5 + 4 log2 w                       13    17    21
  FSM         (2w+1)^2                           81   289  1089
  OSA         1 + 4 log2 w                        9    13    17

  − The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-square error, average error and standard deviation. FSM: Full Search Mode; TDL: two-dimensional logarithmic; TSS: three-step search; MMEA: modified motion estimation algorithm; CDS: conjugate direction search; OSA: orthogonal search algorithm; CSA: cross-search algorithm
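The search-point formulas in the table can be checked numerically. A short sketch (the formulas are taken from the table, with logarithms to base 2):

```python
import math

# Maximum number of search points as a function of the search range w
formulas = {
    "FSM":  lambda w: (2 * w + 1) ** 2,         # exhaustive full search
    "TDL":  lambda w: 2 + 7 * math.log2(w),
    "TSS":  lambda w: 1 + 8 * math.log2(w),
    "MMEA": lambda w: 1 + 6 * math.log2(w),
    "CDS":  lambda w: 3 + 2 * w,
    "OSA":  lambda w: 1 + 4 * math.log2(w),
    "CSA":  lambda w: 5 + 4 * math.log2(w),
}

for w in (4, 8, 16):
    print(w, {name: int(f(w)) for name, f in formulas.items()})
```

Evaluating at w = 4, 8, 16 reproduces every entry of the table, including FSM's 81, 289, and 1089.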
  • 243. Complexity/Performance 243 Compensation efficiency: the motion compensation efficiencies of some algorithms for a motion speed of w = 8 pixels/frame for two test image sequences (Split Screen and Trevor White)

  Algorithm   Split Screen                     Trevor White
              Entropy (bits/pel)  Std. dev.   Entropy (bits/pel)  Std. dev.
  FSM         4.57                7.39        4.41                6.07
  TDL         4.74                8.23        4.60                6.92
  TSS         4.74                8.19        4.58                6.86
  MMEA        4.81                8.56        4.69                7.46
  CDS         4.84                8.86        4.74                7.54
  OSA         4.85                8.81        4.72                7.51
  CSA         4.82                8.65        4.68                7.42

  − The accuracy of the motion estimation is expressed in terms of errors: maximum absolute error, root-mean-square error, average error and standard deviation. FSM: Full Search Mode; TDL: two-dimensional logarithmic; TSS: three-step search; MMEA: modified motion estimation algorithm; CDS: conjugate direction search; OSA: orthogonal search algorithm; CSA: cross-search algorithm
  • 244. Some Search Points Comparison 244

  Method     Min   Max   Average   Speed-up
  FS         225   225   225        1
  TSS         25    25    25        9
  4SS         17    27    17.2      13.1
  NTSS        17    33    17.5      12.8
  Diamond     13    33    13.3      16.9

  Method     Ave Criterion   Ave Distance (FS)   Optimality
  FS         2753            0                   100%
  TSS        2790            0.04                98.5%
  4SS        2777            3.84                98.7%
  NTSS       2775            2.98                99.0%
  Diamond    2770            3.11                98.9%
  • 245. 245 Hierarchical Block Matching Algorithm (HBMA), Ex: H.261
  • 246. 246 Multi-resolution Motion Estimation or Hierarchical Motion Estimation • Problems with BMA – Unless exhaustive search is used, the solution may not be global minimum – Exhaustive search requires extremely large computation – Block wise translation motion model is not always appropriate • Multiresolution approach – Aim to solve the first two problems – First estimate the motion in a coarse resolution over low-pass filtered, down-sampled image pair • Can usually lead to a solution close to the true motion field – Then modify the initial solution in successively finer resolution within a small search range • Reduce the computation – Can be applied to different motion representations, but we will focus on its application to BMA
  • 247. 247 − The assumption of monotonic variation of image intensity employed in the fast BMAs often causes false estimations, especially for larger picture displacements. − These methods perform well for slow moving objects, such as those in video conferencing. − However, for higher motion speeds, due to the intrinsic selective nature of these methods, they often converge to a local minimum of distortion. − One method of alleviating this problem is to subsample the image to smaller sizes, such that the motion speed is reduced by the sampling ratio. − The process is done on a multilevel image pyramid, known as the Hierarchical Block Matching Algorithm (HBMA). − In this technique, pyramids of the image frames are reconstructed by successive two-dimensional filtering and subsampling of the current and past image frames. Multi-resolution Motion Estimation or Hierarchical Motion Estimation
  • 249. Hierarchical Block Matching Algorithm (HBMA) 249
  • 250. Hierarchical Block Matching Algorithm (HBMA) [Figure: three-level pyramid (384×256 at level 0); the MV V2 estimated at level 2 is doubled to 2V2 to initialize the level-1 search for V1, and 2V1 initializes the level-0 search for V0.] 250
  • 251. 251 − Conventional block matching with a block size of 16 pixels, either full search or any fast method, is first applied to the highest level of the pyramid (level 2). − This motion vector is then doubled in size, and further refinement within a 1-pixel search is carried out at the following level. The process is repeated down to the lowest level. − Therefore, with an n-level pyramid, the maximum motion speed of w at the highest level is reduced to w/2^(n−1). − For example, a maximum motion speed of 32 pixels/frame with a three-level pyramid is reduced to 8 pixels/frame, which is quite manageable by any fast search method. − Note that this method can also be regarded as another type of fast search, with a performance very close to the full search, irrespective of the motion speed, while the computational complexity can be very close to that of the fast logarithmic methods. Multi-resolution Motion Estimation or Hierarchical Motion Estimation
  • 252. 252 Hierarchical Block Matching Algorithm (HBMA) • The number of levels is L • ψ_{t,l}(x), x ∈ Λ_l, t = 1, 2, l = 1, 2, ..., L are the l-th level images of the anchor and target frames, where Λ_l is the set of pixels at level l • At the l-th level, the MV is d_l(x) • At the l-th level, the initial estimate interpolated from the previous level is d̃_l(x) = U(d_{l−1}(x)) • Determine the update q_l(x) such that the error Σ_{x ∈ Λ_l} | ψ_{2,l}(x + d̃_l(x) + q_l(x)) − ψ_{1,l}(x) |^p is minimized • The new motion vector is d_l(x) = d̃_l(x) + q_l(x)
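A compact Python sketch of this scheme, under simplifying assumptions: 2×2 mean downsampling stands in for the low-pass filter plus subsampling, SAD (p = 1) is the error measure, one MV per block, and border checks are omitted. All names are illustrative.

```python
def downsample2(I):
    """2x2 mean downsampling (stand-in for low-pass filter + subsample)."""
    h, w = len(I) // 2, len(I[0]) // 2
    return [[(I[2*y][2*x] + I[2*y][2*x+1]
              + I[2*y+1][2*x] + I[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def _search(cur, ref, bx, by, N, cx, cy, r):
    """SAD search of radius r about candidate MV (cx, cy); no border checks."""
    best, best_mv = float("inf"), (cx, cy)
    for dy in range(cy - r, cy + r + 1):
        for dx in range(cx - r, cx + r + 1):
            s = sum(abs(cur[by+y][bx+x] - ref[by+y+dy][bx+x+dx])
                    for y in range(N) for x in range(N))
            if s < best:
                best, best_mv = s, (dx, dy)
    return best_mv

def hbma(cur, ref, bx, by, N=16, R=8, levels=3):
    """Hierarchical BMA: search at the coarsest level, then at each finer
    level double the MV (the update U) and refine within +/-1 pel (q_l)."""
    cur_pyr, ref_pyr = [cur], [ref]
    for _ in range(levels - 1):
        cur_pyr.append(downsample2(cur_pyr[-1]))
        ref_pyr.append(downsample2(ref_pyr[-1]))
    s = 2 ** (levels - 1)            # coarsest level: reduced block and range
    mv = _search(cur_pyr[-1], ref_pyr[-1], bx // s, by // s, N // s, 0, 0, R // s)
    for l in range(levels - 2, -1, -1):
        s = 2 ** l
        mv = (2 * mv[0], 2 * mv[1])  # interpolate MV to the finer level
        mv = _search(cur_pyr[l], ref_pyr[l], bx // s, by // s, N // s,
                     mv[0], mv[1], 1)
    return mv
```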
  • 253. Hierarchical Block Matching Algorithm (HBMA) [Figure: panels (a)–(f), Example: three-level HBMA; predicted anchor frame (29.32 dB).] 253
  • 255. 255 Overview of DBMA • Three steps: – Partition the anchor frame into regular blocks – Model the motion in each block by a more complex motion • The 2-D motion caused by a flat surface patch undergoing rigid 3-D motion can be approximated well by a projective mapping • The projective mapping can be approximated by affine and bilinear mappings • Various possible mappings can be described by a node-based motion model – Estimate the motion parameters block by block independently • The discontinuity problem across block boundaries still remains • Still cannot solve the problem of multiple motions within a block or changes due to the illumination effect!
  • 256. 256 Affine and Bilinear Model Approximation of projective mapping: I. Affine (6 parameters): Good for mapping triangles to triangles II. Bilinear (8 parameters): Good for mapping blocks to quadrangles
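The two approximations can be written out concretely. A small Python sketch (the parameter ordering is an illustrative convention, not from any standard) shows how the extra xy term of the bilinear model lets the four corners of a block move independently, mapping it to a quadrangle:

```python
def affine_map(p, x, y):
    """6-parameter affine mapping: straight lines stay straight, so it
    maps triangles to triangles. p = (a0, a1, a2, b0, b1, b2)."""
    a0, a1, a2, b0, b1, b2 = p
    return (a0 + a1 * x + a2 * y,
            b0 + b1 * x + b2 * y)

def bilinear_map(p, x, y):
    """8-parameter bilinear mapping: the extra x*y term allows the four
    corners of a square block to move independently, mapping the block to
    an arbitrary quadrangle. p = (a0, a1, a2, a3, b0, b1, b2, b3)."""
    a0, a1, a2, a3, b0, b1, b2, b3 = p
    return (a0 + a1 * x + a2 * y + a3 * x * y,
            b0 + b1 * x + b2 * y + b3 * x * y)
```

With a3 = b3 = 0 the bilinear mapping reduces to the affine one; a nonzero a3 displaces only the corner where both x and y are nonzero.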
  • 257. 257 Node-Based Motion Model Control nodes in this example: block corners. Motion at other points is interpolated from the nodal MVs d_{m,k}. Control node MVs can be described with integer or half-pel accuracy; all have the same importance. Translation, affine, and bilinear are special cases of this model.
  • 258. 258 Problems with DBMA • Motion discontinuity across block boundaries, because nodal MVs are estimated independently from block to block – Fix: mesh-based motion estimation • Cannot do well on blocks with multiple moving objects or changes due to the illumination effect – Fix: three-mode method • First apply EBMA to all blocks • Blocks with small EBMA errors have translational motion • Blocks with large EBMA errors may have non-translational motion – Then apply DBMA to these blocks – Blocks still having large errors are non-motion-compensable • [Ref] O. Lee and Y. Wang, Motion compensated prediction using nodal based deformable block matching. J. Visual Communications and Image Representation (March 1995), 6:26-34
  • 259. 259 Mesh-Based Motion Estimation: Overview − MPEG-4 object motion − Affine warping motion model − Deformable polygon meshes − Similar MAD, SSE error measures − Trade-offs: more accurate ME vs. tremendous complexity − Bilinear and perspective motion models are rarely used in video coding
  • 260. 260 Mesh-Based Motion Estimation: Overview (a) Using a triangular mesh (b) Using a quadrilateral mesh
  • 261. 261 Mesh-based v.s Block-based (a) block-based backward ME (b) mesh-based backward ME (c) mesh-based forward ME
  • 262. 262 Mesh-Based Motion Model • The motion in each element is interpolated from nodal MVs • Mesh-based vs. node-based model: – Mesh-based: Each node has a single MV, which influences the motion of all four adjacent elements – Node-based: Each node can have four different MVs, depending on which element it is considered within
  • 265. Image Reconstruction with MV (Bilinear vs BMA) 265
  • 267. Performance of Spatial Transform Motion Compensation 267
  • 268. − Rarely used in practice: BME/BMC mostly suffices − Reference frame resampling: an option in H.263+/H.263++ − Global affine motion model: special-effect warping − 3D subband & wavelet coding: align frames before temporal filtering [Taubman] 268 Global Motion Estimation MV00 MV10 MV11 MV01