EE 5359
Project Report
Spring 2008
Study and Comparison of MPEG 2 and H.264 main
profiles and available transcoding methods
Priyanka Ankolekar
1000 51 4497
List of Acronyms
AVC: Advanced video coding
CABAC: Context-based adaptive binary arithmetic coding
CAVLC: Context-based adaptive variable length coding
DCT: Discrete cosine transform
GOP: Group of pictures
HDTV: High definition television
IDCT: Inverse DCT
IQ: Inverse quantization
ISO: International Organization for Standardization
ITU: International Telecommunication Union
JVT: Joint Video Team
MB: Macroblock
MC: Motion compensation
ME: Motion estimation
MV: Motion vector
NAL: Network abstraction layer
QP: Quantization parameter
VCEG: Video Coding Experts Group
VCL: Video coding layer
VLC: Variable length coding
VLD: Variable length decoding
Abstract
There is a high demand for multimedia applications like digital video recording
and teleconferencing. This has led to the development of various video coding
standards like MPEG-2 and H.264. The video coding layer of H.264 is
superficially similar to that of MPEG-2; however, there are several differences in
the details. In this project the MPEG-2 and H.264 video coding standards are
compared with a concentration on the main profiles. H.264 gives better compression performance than MPEG-2. However, MPEG-2 is already widely used in the fields of digital broadcasting, HDTV and DVD applications, which creates an incompatibility problem between H.264 video sources and the existing MPEG-2 decoders; this problem can be solved using transcoders. This project also discusses the criteria for efficient transcoding and a few transcoding architectures.
1 Introduction
Development of the international video coding standards such as MPEG-2 [7]
[11][17][16] boosted a diverse range of multimedia applications, including digital
video recording and teleconferencing. As a result of the growing demand for
better compression performance, advanced standards such as H.264 [1][2][6][9][18] were developed by the ITU-T and ISO/IEC Joint Video Team (JVT) in 2003. The
overall scheme of the video coding layer (VCL) of H.264 is superficially similar to
the encoding scheme of MPEG-2. However, there are significant differences in
the details. In this project the MPEG-2 and H.264 video coding standards are
compared, i.e. the similarities and differences are studied, with a concentration
on the main profiles.
H.264 can support various applications such as video broadcasting, video
streaming and video conferencing over fixed and wireless networks and over
different transport protocols. However, MPEG-2 has already been widely used in
the field of digital broadcasting, HDTV and DVD applications. The incompatibility
problem between H.264 video source and the existing MPEG-2 decoders can be
solved by using transcoders. In this project, the criteria for transcoding and a few
transcoding architectures are discussed.
The report has been structured in the following manner: Chapter 1 is an
introduction to the topic and explains the scope of the project. Chapter 2 explains
the various aspects of the MPEG-2 video coding standard while Chapter 3
covers the same for H.264 video coding standard. Chapter 4 shows a
comparison between the two standards. In Chapter 5, the topic of MPEG-2 to
H.264 transcoding is covered in greater detail.
2 MPEG-2
MPEG-2 is widely used as the format of digital television signals that are
broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV
systems. It also specifies the format of movies and other programs that are
distributed on DVD and similar disks. As such, TV stations, TV receivers, DVD
players, and other equipment are often designed to this standard. MPEG-2 was the second of several standards developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818) [16].
The video section, Part 2 of MPEG-2, is similar to the earlier MPEG-1 standard but also provides support for interlaced video, the format used by analog broadcast TV systems. MPEG-2 video is not optimized for low bit rates, especially below 1 Mbps at standard definition resolutions; however, it outperforms MPEG-1 at 3 Mbps and above. MPEG-2 is directed at broadcast formats at higher data rates, around 4 Mbps for DVD and 19 Mbps for HDTV. All standards-compliant MPEG-2 video decoders are fully capable of playing back MPEG-1 video streams. MPEG-2 video is formally known as ISO/IEC 13818-2 and as ITU-T Rec. H.262 [21].
2.1 MPEG-2 Profiles and Levels
MPEG-2 video supports a wide range of applications, from mobile video to high quality HD editing. For many applications it is unrealistic and too expensive to support the entire standard, so to allow applications to support only subsets of it, the standard defines profiles and levels. [21]
MPEG-2 video is a family of systems, each having a defined degree of commonality and compatibility. It allows four source formats, or 'Levels', to be coded, ranging from limited definition (roughly VCR quality) to full HDTV, each with a range of bit rates [22]. A level defines a subset of quantitative capabilities such as maximum bit rate, maximum frame size, etc. [16]. In addition to this flexibility in source formats, MPEG-2 allows different 'Profiles'. Each profile offers a collection of compression tools that together make up the coding system; a different profile means that a different set of compression tools is available. [22]
MPEG-2 Profiles
2.1.1.1 Simple Profile
This profile has the fewest tools and offers the basic toolkit for MPEG-2 encoding: intra (I) and predicted (P) frame encoding and decoding with YUV 4:2:0 color subsampling.
2.1.1.2 Main Profile
This profile has all the tools of the Simple Profile plus one more, bi-directional prediction. It gives better quality than the Simple Profile for the same bit rate. A Main Profile decoder decodes both Main and Simple Profile encoded pictures, and this backward compatibility pattern applies throughout the succession of profiles. A refinement of the Main Profile, sometimes unofficially known as Main Profile at Professional Level or MPEG-422, allows line-simultaneous (4:2:2) color-difference signals to be used, but not the scalable tools of the higher Profiles.
2.1.1.3 SNR Scalable Profile and Spatially Scalable Profile
The two Profiles after the Main Profile are, successively, the SNR Scalable Profile and the Spatially Scalable Profile. These add tools which allow the coded video data to be partitioned into a base layer and one or more 'top-up' signals. The top-up signals can improve either the noise (SNR scalability) or the resolution (spatial scalability). These scalable systems may have interesting uses: the lowest layer can be coded in a more robust way, and thus provide a means to broadcast to a wider area or to provide a service for more difficult reception conditions. Nevertheless, there is a premium to be paid for their use in receiver complexity. Owing to the added complexity, none of the Scalable Profiles is supported by digital video broadcasting (DVB). The inputs to the system are YUV component video; however, the first four profiles code the color-difference signals line-sequentially (4:2:0).
2.1.1.4 High Profile
It includes all the previous tools plus the ability to code line-simultaneous (4:2:2) color-difference signals. In effect, the High Profile is a 'super system', designed for the most sophisticated applications where there is no constraint on bit rate.
Table 1 is a tabulated form of the properties of the various MPEG-2 profiles.
Table 1. MPEG-2 Profiles [16]

Abbr.   | Name                       | Picture Coding Types | Chroma Format  | Aspect Ratios               | Scalable Modes
SP      | Simple profile             | I, P                 | 4:2:0          | square pixels, 4:3, or 16:9 | none
MP      | Main profile               | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | none
SNR     | SNR Scalable profile       | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | SNR (signal-to-noise ratio) scalable
Spatial | Spatially Scalable profile | I, P, B              | 4:2:0          | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable
HP      | High profile               | I, P, B              | 4:2:2 or 4:2:0 | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable
MPEG-2 Levels
2.1.1.5 Description of Levels
A level defines, within the MPEG-2 standard, physical parameters such as bit rates, picture sizes and resolutions. There are four levels specified by MPEG-2: High, High 1440, Main, and Low. MPEG-2 video at Main Profile and Main Level has sampling limits at ITU-R BT.601 parameters (PAL and NTSC). Profiles limit syntax (i.e. algorithms) whereas levels limit encoding parameters (sample rates, frame dimensions, coded bit rates, buffer size, etc.). Together, Video Main Profile and Main Level (abbreviated as MP@ML) keep complexity within current technical limits, yet still meet the needs of the majority of applications. MP@ML is the most widely accepted combination for most cable and satellite systems; however, different combinations are possible to suit other applications. [4]
Table 2 shows a comparison between the four MPEG-2 levels on the basis of the
frame size (PAL/NTSC) and the maximum bit rate for each.
Table 2. MPEG-2 Levels [22]
2.2 MPEG-2 Encoder
Figure 1. MPEG-2 encoder [10]
The various blocks of the MPEG-2 encoder are explained below:
DCT
The MPEG-2 encoder uses 8x8 2-D DCT. In the case of intra frames, it is applied
to 8x8 blocks of pels and in the case of inter frames it is applied to 8x8 blocks of
the residual (motion compensated prediction errors). Since the DCT is more efficient at compressing correlated sources, intra pictures compress more efficiently under the DCT than inter pictures, whose residuals are less correlated.
2.2.1 Quantizer
The DCT coefficients obtained above are then quantized by using a default or
modified matrix. User defined matrices may be downloaded and can occur in the
sequence header or in the quant matrix extension header. The quantizer step
sizes for DC coefficients of the luminance and chrominance components are 8, 4,
2 and 1 according to the intra DC precision of 8, 9, 10 and 11 bits respectively.
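A much simplified Python sketch of this step is given below; the exact MPEG-2 arithmetic (rounding, dead-zone handling and mismatch control) is omitted, and the 16/(W x quantiser_scale) scaling for AC coefficients is only a common approximation, not the normative formula.

```
import numpy as np

# Step size for the intra DC coefficient as a function of intra_dc_precision (bits).
INTRA_DC_STEP = {8: 8, 9: 4, 10: 2, 11: 1}

def quantize_intra(F, W, quantiser_scale, intra_dc_precision=8):
    # F: 8x8 DCT coefficients, W: intra weighting (quant) matrix.
    # Simplified: AC coefficients are scaled by W and quantiser_scale,
    # the DC coefficient uses its fixed step size.
    Q = np.round((16.0 * F) / (W * quantiser_scale)).astype(int)
    Q[0, 0] = int(round(F[0, 0] / INTRA_DC_STEP[intra_dc_precision]))
    return Q
```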
2.2.2 Motion estimation and compensation
In the motion estimation process, motion vectors for predicted and interpolated pictures are coded differentially between macroblocks. The two motion vector components, the horizontal component first and then the vertical component, are coded independently. The motion compensation process forms predictions from previously decoded pictures using motion vectors of integer- and half-pel resolution.
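The standard does not mandate a particular motion search, so the sketch below shows only a minimal integer-pel full search with a SAD criterion (the helper name is this report's own). Half-pel accuracy, as used by MPEG-2, would additionally interpolate the reference frame before matching.

```
import numpy as np

def full_search_mv(cur_block, ref_frame, top, left, search_range=7):
    # Exhaustive integer-pel search around the co-located position; returns the
    # (dy, dx) displacement minimizing the sum of absolute differences (SAD).
    n = cur_block.shape[0]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref_frame.shape[0] or x + n > ref_frame.shape[1]:
                continue
            sad = int(np.abs(cur_block.astype(int) -
                             ref_frame[y:y + n, x:x + n].astype(int)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```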
2.2.3 Coding decisions
There are four different coding modes in MPEG-2. The encoder chooses among them based on whether a frame picture is encoded as a frame or as two fields; in the case of interlaced pictures, it can choose to encode the picture as two fields or to use 16x8 motion compensation.
2.2.4 Scanning and VLC
The quantized transform coefficients are scanned and converted to a one
dimensional array. Two scanning methods are available:
a. Zigzag scan (Figure 2(a)): For progressive (non-interlaced) mode processing
b. Alternate scan (Figure 2(b)): For interlaced format video.
Figure 2. Scan patterns (4x4) [4]: (a) zigzag scan, (b) alternate scan
Figure 3. Scan matrices in MPEG-2 (8x8) [20]: (a) zigzag scan, (b) alternate scan
The list of values produced by scanning is then entropy coded using a variable
length code (VLC).
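A small Python sketch of how the zigzag order can be generated and applied to an 8x8 block of quantized coefficients is shown below; the alternate scan would simply use a different index table.

```
import numpy as np

def zigzag_order(n=8):
    # Zigzag traversal order of an n x n block, anti-diagonal by anti-diagonal.
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(max(0, d - n + 1), min(d, n - 1) + 1)]
        order.extend(diag if d % 2 else diag[::-1])
    return order

def zigzag_scan(block):
    # Convert a 2-D block of quantized coefficients into a 1-D list for VLC coding.
    return [block[i, j] for i, j in zigzag_order(block.shape[0])]

coeffs = np.arange(64).reshape(8, 8)
print(zigzag_scan(coeffs)[:10])   # 0, 1, 8, 16, 9, 2, 3, 10, 17, 24
```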
2.3 MPEG-2 Decoder
Figure 4. MPEG-2 decoder [7]
At the decoder side, the quantized DCT coefficients are reconstructed and inverse transformed to produce the prediction error. This prediction error is then added to the motion compensated prediction generated from a previously decoded picture to produce the reconstructed output.
The various parts of the MPEG-2 decoder are:
2.3.1 Variable length decoding
This process uses a table defined for decoding intra DC coefficients and three further tables, one each for non-intra DC coefficients, intra AC coefficients and non-intra AC coefficients. The decoded values indicate one of three courses of action: end of block, a normal coefficient, or escape coding.
2.3.2 Inverse scan
The output of the variable length decoding stage is one-dimensional and of length 64. The inverse scan process converts this one-dimensional data into a two-dimensional array of coefficients according to a predefined scan matrix.
2.3.3 Inverse quantization
At this stage the two dimensional DCT coefficients are inverse quantized to
produce the reconstructed DCT coefficients. This process involves the rescaling
of the coefficients by essentially multiplying them by the quantizer step size. The
quantizer step size can be modified by using either a weighting matrix or a scale factor. After inverse quantization, saturation and mismatch control operations are performed.
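A simplified Python sketch of intra-block inverse quantization (weighting matrix times quantizer scale, fixed DC step, followed by saturation) follows; the standard's exact rounding and mismatch control are deliberately left out.

```
import numpy as np

def inverse_quantize_intra(QF, W, quantiser_scale, intra_dc_step=8):
    # QF: 8x8 quantized levels, W: intra weighting matrix.
    F = np.round((QF * W * quantiser_scale) / 16.0)
    F[0, 0] = QF[0, 0] * intra_dc_step      # DC coefficient uses its own fixed step
    return np.clip(F, -2048, 2047)          # saturation of the reconstructed coefficients
```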
2.3.4 Inverse DCT
Once the reconstructed DCT coefficients are obtained, a 2D 8x8 inverse DCT is
applied to obtain the inverse transformed values. These values are then saturated to keep them in the range [-256, +255].
2.3.5 Motion Compensation
During this stage, predictions from previously decoded pictures are combined
with the inverse DCT transformed coefficient data to get the final decoded output.
3 H.264
H.264/AVC [1][9] was developed by the JVT (Joint Video Team) to achieve
MPEG-2 [7] quality compression at almost half the bit rate. H.264/AVC provides
significant coding efficiency, simple syntax specifications, and seamless
integration of video coding into all current protocols and multiplex architectures.
H.264 supports various applications such as video broadcasting, video
streaming, and video conferencing over fixed and wireless networks and over
different transport protocols. [4]
The H.264 video coding standard has the same basic functional elements as previous standards (MPEG-1, MPEG-2, MPEG-4 Part 2, H.261, H.263) [23], i.e., a transform for reduction of spatial correlation, quantization for bit rate control, motion compensated prediction for reduction of temporal correlation, and entropy coding for reduction of statistical correlation. However, to achieve better coding performance, the important changes in H.264 occur in the details of each functional element: intra-picture prediction, a new 4x4 integer transform, multiple reference pictures, variable block sizes and quarter-pel precision for motion compensation, a deblocking filter, and improved entropy coding. [1]
3.1 H.264 Profiles
Each Profile specifies a subset of the bitstream syntax and limits that shall be supported by all decoders conforming to that Profile. There are three Profiles in the first version: Baseline, Main, and Extended. The Baseline Profile is intended for real-time conversational services such as video conferencing and videophone. The Main Profile is designed for digital storage media and television broadcasting. The Extended Profile is aimed at multimedia services over the Internet. There are also four High Profiles defined in the fidelity range extensions [19] for applications such as content contribution, content distribution, and studio editing and post-processing: High, High 10, High 4:2:2, and High 4:4:4. The High Profile supports 8-bit video with 4:2:0 sampling for applications using high resolution. The High 10 Profile supports 4:2:0 sampling with up to 10 bits of representation accuracy per sample. The High 4:2:2 Profile supports up to 4:2:2 chroma sampling and up to 10 bits per sample. The High 4:4:4 Profile supports up to 4:4:4 chroma sampling, up to 12 bits per sample, and an integer residual color transform for coding RGB signals. The Profiles have common coding parts as well as profile-specific coding parts, as shown in Figure 5. [1]
3.1.1 Common Parts of All Profiles
3.1.1.1 I slice (Intra-coded slice)
This slice is coded by using prediction only from decoded samples within
the same slice.
3.1.1.2 P slice (Predictive-coded slice)
This slice (Figure 6) is coded by using inter prediction from previously-decoded
reference pictures, using at most one motion vector and reference index to
predict the sample values of each block.
3.1.1.3 CAVLC (Context-based Adaptive Variable Length Coding)
This is used for entropy coding. After transform and quantization, the probability that the level of a coefficient is zero or +/-1 is very high. CAVLC therefore handles zero and +/-1 coefficients differently from the remaining coefficient levels: the total numbers of zeros and of +/-1 values are coded, while for the other coefficients the levels themselves are coded.
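A rough Python sketch of the symbols this implies for one zigzag-scanned 4x4 block is given below (the real CAVLC additionally caps "trailing ones" at three, as reflected in the code, and selects VLC tables from neighbouring blocks; those details are omitted here).

```
def cavlc_symbols(scanned):
    # scanned: list of 16 quantized coefficients in scan order.
    nz_positions = [i for i, c in enumerate(scanned) if c != 0]
    if not nz_positions:
        return {"total_coeffs": 0, "trailing_ones": 0, "total_zeros": 0, "levels": []}
    coeffs = [scanned[i] for i in nz_positions]
    trailing = 0
    while trailing < min(3, len(coeffs)) and abs(coeffs[-1 - trailing]) == 1:
        trailing += 1
    return {
        "total_coeffs": len(coeffs),                        # number of non-zero coefficients
        "trailing_ones": trailing,                          # final +/-1 values (signalled by sign only)
        "total_zeros": nz_positions[-1] + 1 - len(coeffs),  # zeros before the last non-zero coefficient
        "levels": coeffs[: len(coeffs) - trailing],         # remaining levels coded explicitly
    }

print(cavlc_symbols([7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```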
3.1.2 Baseline Profile
3.1.2.1 Flexible macroblock order
Macroblocks need not be transmitted in raster scan order; a macroblock-to-slice-group map assigns each macroblock to a slice group.
3.1.2.2 Arbitrary slice order
The macroblock address of the first macroblock of a slice of a picture may be
smaller than the macroblock address of the first macroblock of some other
preceding slice of the same coded picture.
3.1.2.3 Redundant slice
This slice carries redundant coded data, obtained at the same or a different coding rate, for a previously coded slice of the same picture.
Figure 5. The specific coding parts of the Profiles in H.264 [1].
3.1.3 Main Profile
3.1.3.1 B slice (Bi-directionally predictive-coded slice)
This slice (Figure 6) is coded by using inter prediction from previously-decoded
reference pictures, using at most two motion vectors and reference indices to
predict the sample values of each block.
3.1.3.2 Weighted prediction
This is a scaling operation performed by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice. For a B slice, a prediction signal p is obtained by applying different weights to two reference signals r1 and r2:
Equation 1 [1]: p = w1 × r1 + w2 × r2, where w1 and w2 are weights.
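A minimal floating-point Python sketch of Equation 1 follows; the standard actually uses fixed-point weights plus an additive offset per reference, which are simplified away here.

```
import numpy as np

def weighted_biprediction(r1, r2, w1, w2):
    # p = w1*r1 + w2*r2 (Equation 1), clipped back to the 8-bit sample range.
    p = w1 * r1.astype(float) + w2 * r2.astype(float)
    return np.clip(np.round(p), 0, 255).astype(np.uint8)

# Unequal weights help, for example, during a fade between the two references.
r1 = np.full((4, 4), 200, dtype=np.uint8)
r2 = np.full((4, 4), 100, dtype=np.uint8)
print(weighted_biprediction(r1, r2, 0.75, 0.25)[0, 0])   # 175
```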
3.1.3.3 CABAC (Context-based Adaptive Binary Arithmetic Coding)
This is used for entropy coding. It utilizes arithmetic coding in order to achieve
good compression.
Figure 6. Illustration of temporal prediction (B and P slices)
3.1.4 Extended Profile
This profile includes all parts of the Baseline Profile: flexible macroblock order, arbitrary slice order, and redundant slices. Its other features are:
3.1.4.1 SP slice
A specially coded slice for efficient switching between video streams, coded in a manner similar to a P slice.
3.1.4.2 SI slice
A switching slice, coded in a manner similar to an I slice.
3.1.4.3 Data partition
The coded data is placed in separate data partitions; each partition can be placed in a different layer unit.
3.1.4.4 B slice
H.264 generalizes the concept of bidirectional prediction and supports not only
forward/backward prediction pairs but also forward/forward and
backward/backward pairs.
3.1.4.5 Weighted prediction
Earlier standards use equal weights for reference pictures, i.e. the prediction signal is obtained by averaging the reference signals with equal weights. Gradual transitions from scene to scene, however, are better served by unequal weights, so H.264 allows a weighted prediction method for macroblocks of P and B slices.
3.1.5 High Profiles
The High Profiles include all parts of the Main Profile: B slices, weighted prediction, and CABAC. Their salient additional features are:
3.1.5.1 Adaptive transform block size
H.264 uses an adaptive transform block size, 4 x 4 and 8 x 8 (High Profiles only),
whereas previous video coding standards used the 8 x 8 DCT. The smaller block
size leads to a significant reduction in ringing artifacts. Also, the 4 x 4 transform
has the additional benefit of removing the need for multiplications. [1]
3.1.5.2 Quantization scaling matrices
The quantization process can apply different scaling to each transform coefficient according to its frequency in order to optimize subjective quality. The High Profiles support perceptual-based quantization scaling matrices similar to those used in MPEG-2: the encoder can specify a matrix of scaling factors, according to the frequency associated with each transform coefficient, for use in inverse quantization scaling by the decoder. This allows the subjective quality to be optimized for the sensitivity of the human visual system, which is less sensitive to coding errors in the high-frequency transform coefficients.
Table 3 shows a comparison between the baseline, extended, main and high
profiles of H.264.
Table 3. Comparison chart for the various profiles of H.264 [18]

Feature                            | Baseline | Extended | Main | High
I and P Slices                     | Yes      | Yes      | Yes  | Yes
B Slices                           | No       | Yes      | Yes  | Yes
SI and SP Slices                   | No       | Yes      | No   | No
Multiple Reference Frames          | Yes      | Yes      | Yes  | Yes
In-Loop Deblocking Filter          | Yes      | Yes      | Yes  | Yes
CAVLC Entropy Coding               | Yes      | Yes      | Yes  | Yes
CABAC Entropy Coding               | No       | No       | Yes  | Yes
Flexible Macroblock Ordering (FMO) | Yes      | Yes      | No   | No
Arbitrary Slice Ordering (ASO)     | Yes      | Yes      | No   | No
Redundant Slices (RS)              | Yes      | Yes      | No   | No
3.2 H.264 Encoder
Figure 7. H.264 encoder [9]
The encoder blocks are explained below:
3.2.1.1 4x4 Integer transform
H.264 employs a 4x4 integer transform, a scaled approximation of the DCT, as compared to the 8x8 DCT adopted by the
previous standards. The smaller block size leads to a significant reduction in
ringing artifacts. Also, the 4 x 4 transform has the additional benefit of removing
the need for multiplications.
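The core forward transform can be written with a small integer matrix (the DCT-like normalization is folded into quantization), for example as in the Python sketch below.

```
import numpy as np

# Core matrix of the H.264 4x4 forward integer transform; the scaling that a true
# DCT would require is absorbed into the quantization stage.
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward_transform_4x4(x):
    # Y = Cf * X * Cf^T, computed entirely in integer arithmetic so that encoder
    # and decoder can match exactly (no IDCT mismatch as with the 8x8 DCT).
    return Cf @ x @ Cf.T

residual = np.array([[5, 11, 8, 10],
                     [9,  8, 4, 12],
                     [1, 10, 11, 4],
                     [19, 6, 15, 7]])
print(forward_transform_4x4(residual))
```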
3.2.1.2 Quantization and scan
The H.264 standard specifies the mathematical formulae of the quantization
process. The scale factor for each element in each sub-block varies as a function of the quantization parameter associated with the macroblock and as a function of the position of the element within the sub-block. The rate control algorithm controls the value of the quantization parameter. Two types of scan pattern are used for 4x4 blocks: one for frame-coded macroblocks and one for field-coded macroblocks.
3.2.1.3 Context-based adaptive variable length coding (CAVLC) and Context-
based adaptive binary arithmetic coding (CABAC) entropy coding
H.264 uses different variable length coding methods to match a symbol to a code based on its context characteristics: context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC). All syntax elements except the residual data are encoded with Exp-Golomb codes. To read the residual data (quantized transform coefficients), a zig-zag scan (for frame-coded macroblocks) or an alternate scan (for field-coded macroblocks) is used. For coding the residual data, the more sophisticated CAVLC method is employed. CABAC, employed in the Main and High profiles, gives higher coding efficiency but at higher complexity compared to CAVLC.
3.2.1.4 Deblocking filter
H.264 employs a deblocking filter to reduce blocking artifacts at block boundaries and to stop the propagation of accumulated coding noise. The filter is applied after the inverse transform in the encoder (before the macroblock is reconstructed and stored for future predictions) and in the decoder (before the macroblock is reconstructed and displayed). The deblocking filter is applied across the edges of the macroblocks and sub-blocks. The filtered image is used in motion compensated prediction of future frames and helps achieve more compression.
Figure 8. Diagram depicting how the loop filter works on the edges of the blocks
and sub-blocks [4]
3.2.1.5 Intra prediction
During intra prediction, the encoder derives a predicted block from previously decoded samples in the same picture. The predicted block is then subtracted from the current block and the residual is encoded. There are a total of nine prediction modes (Figure 9) for each 4x4 luma block, four prediction modes for each 16x16 luma block and four modes for each chroma block.
Figure 9. Intra prediction 4x4 [31]
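Three of the nine 4x4 luma modes are easy to sketch in Python from the neighbouring samples; the remaining diagonal modes and the boundary-availability rules are left out of this illustration.

```
import numpy as np

def intra_4x4_predict(above, left, mode):
    # above, left: the 4 previously decoded neighbouring samples on each side.
    if mode == 0:                                   # vertical: copy the row above downwards
        return np.tile(above, (4, 1))
    if mode == 1:                                   # horizontal: copy the left column across
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:                                   # DC: mean of the available neighbours
        return np.full((4, 4), int(round((above.sum() + left.sum()) / 8.0)))
    raise ValueError("only modes 0-2 are sketched here")

above = np.array([90, 92, 95, 97])
left = np.array([88, 89, 91, 93])
print(intra_4x4_predict(above, left, 2))            # a flat block of the mean value
```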
3.2.1.6 Inter prediction
Inter prediction is performed on the basis of temporal correlation and consists of
motion estimation and motion compensation. As compared to the previous
standards, H.264 supports a large number of block sizes from 16x16 to 4x4.
Moreover H.264 supports motion vector accuracy of one-quarter of the luma
sample.
3.2.1.7 Reference pictures
Unlike previous standards, which use only the immediately preceding I or P picture for inter prediction, H.264 can use more than one previously decoded reference picture. This enables the encoder to search for the best match for the current picture over a wider set of reference pictures than just the most recently encoded one.
3.3 H.264 Decoder
Figure 10 shows the block diagram of a general H.264/MPEG-4 AVC decoder.
Figure 10. H.264 decoder [7]
The input bitstream includes all the control information such as the picture or slice type, macroblock types and subtypes, reference frame indices, motion vectors, loop filter control and quantizer step size, as well as the coded data comprising the quantized transform coefficients. The decoder of Figure 10 works like the local decoder in the encoder; a simplified description follows. After entropy (CABAC or CAVLC) decoding, the transform coefficients are inverse scanned and inverse quantized prior to being inverse transformed. To the resulting 4x4 blocks of residual signal, an appropriate prediction signal (intra or motion compensated inter) is added, depending on the macroblock type (mb_type and sub_mb_type), the reference frame, the motion vector(s) and the decoded picture store. The reconstructed video frames undergo deblocking filtering prior to being stored for future use in prediction. The frames at the output of the deblocking filter may need to undergo reordering prior to display. [2]
4 Comparison between MPEG-2 and H.264
4.1 Key features of MPEG-2 video
The MPEG-2 coding standard has been designed to efficiently support both
interlaced and progressive video coding and produce high quality standard
definition video at about 4 Mbps. The MPEG-2 video standard uses a block-
based hybrid transform coding algorithm that employs transform coding of the
motion-compensated prediction error. While motion compensation exploits
temporal redundancies, the DCT transform exploits the spatial redundancies.
The asymmetric encoder-decoder complexity allows for a simpler decoder while
maintaining high quality and efficiency through a more complex encoder. [3]
4.2 Key features of H.264 video
The H.264 video coding standard has been developed recently through the joint work of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The H.264 video coding standard is flexible and offers
high bitrate requirements.
4.3 Comparison: Similarities and Differences between MPEG-2 video
and H.264 video
In this section, the MPEG-2 and H.264 video coding standards are compared with respect to various aspects such as bit rate, block size, macroblock size, intra prediction, motion estimation block sizes, quantization and motion vector prediction, among others. Table 4 tabulates these comparisons systematically, and they are further elaborated in the subsections below.
4.3.1 Increased efficiency
Compared with MPEG-2 video, the H.264 video format gives perceptually equivalent video at 1/3 to 1/2 of the MPEG-2 bit rates. Some extensions, known as the Fidelity Range Extensions, facilitate higher-fidelity video coding by supporting higher bit depths, including 10-bit and 12-bit encoding, and higher color resolution using the YUV 4:2:2 and YUV 4:4:4 sampling structures. This naturally makes H.264 attractive to video distributors, because it permits them to maximize the number of services that may be contained in a given amount of bandwidth [30].
The bit rate gains are not the result of any single feature but of a combination of a number of encoding tools. These gains come with a significant increase in encoding and decoding complexity [27]. In spite of the increased complexity, the dramatic bandwidth savings encourage TV broadcasters to adopt the new technology, as they can use the savings to provide new channels or new data and interactive services. With the coding gains of H.264, full-length HDTV-resolution movies can be stored on DVDs. Furthermore, the same video coding format can be used for broadcast TV as well as for internet streaming.
4.3.2 Coding flexibility
ISO/IEC 14496-10 (H.264), like previous MPEG standards, does not define a specific
encoder and decoder. Instead, it defines the syntax of an encoded bitstream and
describes the method of decoding that bitstream. The implementation is left to
the developer. [31]
H.264 uses the same hybrid coding approach as the other MPEG video standards, namely motion compensated transform coding, but differs significantly from MPEG-2 in the actual coding tools used. The main differences are the use of an integer transform, with energy compaction properties similar to those of the DCT, in place of the DCT; an in-loop deblocking filter (DF) to reduce block artifacts; and intra frame prediction (IFP). The coder control operation is responsible for functions such as reference frame management, coding mode selection, and managing the encoding parameter set. Besides these, the H.264 standard introduces several other new coding tools that improve coding efficiency.
Multiple reference picture motion compensation uses previously encoded
pictures more flexibly than does MPEG-2. In MPEG-2, a P-frame can use only a
single previously coded frame to predict the motion compensation values for an
incoming picture, while a B-frame can use only the immediately previous P- or I-
frame and the immediately subsequent P- or I-frame.
H.264 permits the use of up to 32 previously coded pictures, and it supports
more flexibility in the selection of motion compensation block sizes and shapes,
down to the use of a luma compensation block as small as 4-by-4 pixels. H.264
also supports quarter-sample motion compensation vector accuracy, as opposed
to MPEG-2's half-sample accuracy.
These refinements permit more precise segmentation of moving areas within the
image, and more precise description of movement. Further, in H.264, the motion-
compensated prediction signal may be weighted and offset by the encoder,
facilitating significantly improved performance in fades (fades can be problematic
for MPEG-2).
4.3.3 Deblocking filter
Block-based coding can generate blocking artifacts in the decoded pictures. In H.264, a deblocking filter is brought within the motion-compensated prediction loop, so that the filtered pictures can be used to predict subsequent pictures (Figure 8).
Switching slices have also been incorporated; they permit a decoder to jump between bitstreams, in order to smoothly change bit rates or perform stunt modes, without requiring all streams to send an I-frame at the switch point, which makes the decoder's job easier at switch points.
Table 4. Comparison between MPEG-2 and H.264

Algorithm Characteristic                           | MPEG-2                                                              | H.264
General                                            | Motion compensated predictive, residual transformed, entropy coded | Same basic structure as MPEG-2
Block size                                         | 8x8                                                                 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Macroblock size                                    | 16x16 (frame mode), 16x8 (field mode)                               | 16x16
Intra prediction                                   | None                                                                | Multi-direction, multi-pattern
Quantization                                       | Scalar quantization with step size of constant increment            | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding                                     | VLC                                                                 | CAVLC, CABAC
Weighted prediction                                | No                                                                  | Yes
Reference picture                                  | One picture                                                         | Multiple pictures
Motion estimation blocks                           | 16x16                                                               | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Motion vector prediction                           | Simple                                                              | Median and segmented prediction
Entropy coding                                     | Multiple VLC tables                                                 | Arithmetic coding and adaptive VLC tables
Frame distance for forward/backward prediction     | +/- 1                                                               | Unlimited
Fractional motion estimation                       | 1/2 pixel                                                           | 1/4 pixel
Deblocking filter                                  | None                                                                | Dynamic edge filters
Scalable coding support [2]                        | Yes, layered picture: spatial, SNR, temporal scalability            | Some support for temporal and SNR scalability
Bit rates with same quality HD video (1920 x 1080) | 12 - 20 Mbps                                                        | 7 - 8 Mbps
Transmission rate                                  | 2 - 15 Mbps                                                         | 64 kbps - 150 Mbps
4.3.4 Performance comparison between MPEG-2 and H.264 using standard
test streams – Simulation results
Test streams (foreman, news and carphone [26]) were encoded using the open-
source MPEG-2 codec [25] and the H.264 codec [24]. The results were
compared against each other for parameters like the signal to noise ratio (SNR),
GOP and compression ratio. CIF files were used for the “Foreman” and the
“News” clips whereas QCIF was used for the “Carphone” clip. The bit rate for
H.264 encoding was taken as the standard one used by the codes. The bit rate
for MPEG-2 encoding was adjusted on the basis of the bit rate of the H.264
encoding process. This helped to compare the two standards on a common
plane. While the aim of this project is to compare the Main profiles of MPEG-2
and H.264, simulations were run for the Simple/Baseline profiles too. This was
done in order to prove quantitatively that encoding using the Main profile for both
MPEG-2 and H.264 gives a better compression ratio and better quality video
than the Simple profile. Tables 5, 6 and 7 tabulate the results obtained after
running the simulations. Figures 11, 12 and 13 show screen shots of the
encoded videos (only for the Main profiles). Section 4.3.5 explains the
conclusions drawn on the basis of the results obtained from simulations.
4.3.5 Conclusion
From the tables below, the following is concluded:
• For the same bit rate and video resolution, the PSNR (dB) values are greater for H.264 encoded videos than for the MPEG-2 encoded videos, indicating better video quality. This can be verified from the screen shots.
• The compression ratio for H.264 encoded video is also better than that for MPEG-2 encoded video, in spite of the better video quality (see the sketch after this list).
Compression ratio = original file size / compressed file size
• The video quality for H.264 video is better than for MPEG-2 video for the
Simple/Baseline profiles as well. Therefore, it can be concluded that H.264
video coding standard gives better compression and better video quality
as compared to MPEG-2.
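For reference, the two figures of merit used above can be computed as in the Python sketch below (a peak value of 255 is assumed for 8-bit video, and the file sizes in the usage line are only illustrative numbers, not measured results).

```
import numpy as np

def psnr(original, decoded, peak=255.0):
    # Peak signal-to-noise ratio in dB between an original frame and its decoded version.
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(original_size_bytes, compressed_size_bytes):
    # Compression ratio = original file size / compressed file size.
    return original_size_bytes / compressed_size_bytes

print(compression_ratio(38016000, 1267200))   # e.g. a raw clip vs. an encoded bitstream
```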
5 Transcoding methods
5.1 Introduction to transcoding
In this fast growing world of multimedia and telecommunications there is a great
demand for efficient usage of the available bandwidth. With the growth of
technology there is an increase in the number of networks, types of devices and content representation formats, and as a result interoperability between different systems and networks is gaining in importance. Transcoding of video content is one effort in this direction. Besides this, a transcoder can also be used to insert new information, for example company logos, watermarks and error resilience features, into a compressed video stream. Transcoding techniques are also useful in supporting VCR trick modes such as fast-forward and reverse play for on-demand applications. [4]
Technically, transcoding is the decoding and re-coding of digital content from one compressed format to another, to enable transmission over different media and playback on various devices [29].
Having said this, why is H.264/AVC to MPEG-2 transcoding needed [14][15]? H.264/AVC was recently developed by the JVT (Joint Video Team) to provide better compression of video compared to previous standards. The new standard delivers significant coding efficiency, simple syntax specifications and seamless integration of video coding into all current protocols and multiplex architectures, and it represents a significant advancement in video coding technology by providing MPEG-2 comparable video quality at, on average, half the required bandwidth. Although widespread use of H.264 is anticipated, many legacy systems, including digital TVs and home receivers, use MPEG-2. This leads to the need for an efficient architecture that exploits the lower bandwidth cost of H.264 video without requiring a significant investment in additional video coding hardware.
Figure 14. H.264 to MPEG-2 transcoder applications [12]
5.2 How is transcoding done – the basic process
The simplest approach to transcoding is to completely decode the MPEG-2 bit
stream and then re-encode it with an H.264 encoder. The decode operation can
be performed either externally or as a part of the H.264 encoder. System issues,
such as handling SCTE-35 digital program insertion (DPI) messages, will require
that the decode and the encode operations be tightly coupled. The quality of
transcoding with this simple approach will not be high.
Figure 15 shows a comparison between direct encoding and transcoding. The
figure shows the PSNR (a measure of mean square error between the input and
decoded output) values computed at different bit rates. The PSNR numbers are
obtained by averaging the results over 18 different sequences of varying content
type and complexities. The top plot shows the performance of direct encoding
using an H.264 encoder. The bottom plot shows the performance of transcoding
where the video is originally coded with MPEG-2 at 4 Mbps, decoded and then re-encoded with the same encoder used for direct encoding. Transcoding can result in up to a 20 percent loss in compression efficiency.
A second approach is similar: the incoming MPEG-2 stream is decoded and then re-encoded using an H.264 encoder, but here the relevant information available from the MPEG-2 bit stream is reused.
Figure 15. Performance comparison between direct encoding and transcoding
[32]
5.3 Criteria for transcoding
Transcoding can be of various types [14]. Some of them are bit rate transcoding
to facilitate more efficient transport of video, spatial and temporal resolution
reduction transcoding for use in mobile devices with limited display and
processing power and error-resilience transcoding in order to achieve higher
resilience of the original bit stream to transmission errors.
To achieve optimum results by transcoding, the following criteria have to be
fulfilled:
(i) The quality of the transcoded bitstream should be comparable to that obtained by fully decoding and re-encoding the stream.
(ii) The information contained in the input stream should be used as much as
possible to avoid multigenerational deterioration.
(iii) The process should be cost efficient, low in complexity and achieve the
highest quality possible.
5.4 Transcoding of H.264 to MPEG-2
In order to provide better compression of video as compared to previous
standards, H.264/AVC video coding standard was recently developed by the JVT
(Joint Video Team) consisting of experts from VCEG (Video Coding Experts
Group) and MPEG. The new standard delivers significant coding efficiency, simple
syntax specifications, and seamless integration of video coding into all current
protocols and multiplex architectures. Thus H.264 can support various
applications such as video broadcasting, video streaming and video conferencing
over fixed and wireless networks and over different transport protocols. However
MPEG-2 has already been widely used in the field of digital broadcasting, HDTV
and DVD applications. Hence transcoding is a feasible method to solve the
incompatibility problem between H.264 video source and the existing MPEG-2
decoders.
An H.264/AVC to MPEG-2 transcoder is designed to transcode the H.264 video
stream to MPEG-2 format so as to be used by the MPEG-2 end equipment. It is
better to transmit H.264 bitstreams on public networks to save on the much
needed bandwidth and then transcode them into MPEG-2 bitstreams for local
MPEG-2 equipment like a set-top box.
5.5 Transcoding architectures
This section describes the various transcoding architectures [15]:
5.5.1 Open loop transcoding:
Open loop transcoders include selective transmission, where the high-frequency DCT coefficients are discarded, and requantization. They are computationally efficient, since they operate directly on the DCT coefficients; however, they suffer from the drift problem. Drift error occurs due to rounding, quantization loss and clipping functions.
Figure 16. Open loop transcoding architecture [15]
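A toy Python sketch of the requantization variant is shown below: each incoming level is rescaled to a coarser output step size directly, with no decoding loop and hence no way to correct the drift mentioned above. Weighting matrices and the standards' exact rounding rules are ignored.

```
def requantize(levels, q_in, q_out):
    # Open-loop requantization: dequantize with the incoming step size and
    # requantize with a coarser one, operating on the coefficients directly.
    return [int(round(level * q_in / q_out)) for level in levels]

# Example: coarsening the step size from 8 to 12 shrinks the coded magnitudes.
print(requantize([12, -7, 3, 0, 1], q_in=8, q_out=12))   # [8, -5, 2, 0, 1]
```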
5.5.2 Cascaded pixel domain transcoding architecture:
This is a drift free architecture. It is a concatenation of a simplified decoder and
encoder as shown in Figure 17. In this architecture, instead of performing the full
motion estimation, the encoder reuses the motion vectors along with other
information extracted from the input video bitstream thus reducing the
complexity.
Figure 17. Cascaded pixel domain transcoding architecture [15].
5.5.3 Simplified DCT domain transcoding (SDDT)
This architecture is based on the assumption that the DCT, the IDCT and motion compensation are all linear operations. Since the motion compensation is performed in the DCT domain, it is a computationally intensive operation: as shown in Figure 19, the goal is to compute the target block B from the four overlapping blocks B1, B2, B3 and B4.
Figure 18. Simplified DCT domain transcoding architecture [15].
Figure 19. DCT- Motion compensation [15].
SDDT eliminates the DCT/IDCT stages and halves the number of frame stores, as a result of which it requires less computation and memory than the cascaded pixel domain transcoder (CPDT). However, the linearity assumptions are not strictly true, since clipping functions are performed in the video encoder/decoder and rounding operations are performed in the interpolation for fractional-pixel MC. These violated assumptions may cause drift in the transcoded video.
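The block-assembly relation behind Figure 19 can be written with shift matrices. The Python sketch below builds the displaced block B from B1-B4 in the pixel domain; because the DCT is a linear orthonormal transform, the same matrix relation can be evaluated entirely on DCT coefficients, which is what the DCT-domain method exploits. The helper name and offset convention are this report's own illustration, not a particular published implementation.

```
import numpy as np

def mc_from_four_blocks(B1, B2, B3, B4, dy, dx, n=8):
    # Assemble the n x n target block whose top-left corner lies at offset (dy, dx)
    # inside B1.  B2 is B1's right neighbour, B3 its lower neighbour, B4 diagonal.
    U  = np.eye(n, k=dy)           # selects the lower rows of the upper blocks
    Lo = np.eye(n, k=-(n - dy))    # selects the upper rows of the lower blocks
    Hl = np.eye(n, k=-dx)          # selects the right-hand columns of the left blocks
    Hr = np.eye(n, k=n - dx)       # selects the left-hand columns of the right blocks
    # B = sum of (row-shift matrix) * Bi * (column-shift matrix); every operation is
    # linear, so the identity also holds after transforming all matrices by the DCT.
    return U @ B1 @ Hl + U @ B2 @ Hr + Lo @ B3 @ Hl + Lo @ B4 @ Hr

# Sanity check against direct pixel extraction from a 16x16 reference area.
ref = np.arange(256).reshape(16, 16).astype(float)
B1, B2 = ref[:8, :8], ref[:8, 8:]
B3, B4 = ref[8:, :8], ref[8:, 8:]
dy, dx = 3, 5
assert np.allclose(mc_from_four_blocks(B1, B2, B3, B4, dy, dx), ref[dy:dy + 8, dx:dx + 8])
```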
5.5.4 Cascaded DCT domain transcoding (CDDT)
The cascaded DCT-domain transcoder can be used for spatial and temporal
resolution downscaling and other coding parameter changes. Compared to
SDDT, greater flexibility is achieved using additional DCT-motion compensation
and frame memory resulting in higher cost and complexity. This architecture is
adopted for downscaling operations where the encoder side DCT-MC and
memory will not cost much.
Figure 20. Cascaded DCT domain transcoding architecture [15]
5.6 Conclusions
The selection of appropriate transcoding architecture depends upon the
application for which it is intended. There is generally a tradeoff between the
accuracy and the complexity and cost of the architecture. For example, the
simplest open loop architecture is the easiest to implement but it suffers from the
problem of drift whereas the cascaded DCT domain transcoding architecture
overcomes this problem but it is a very complex and expensive architecture to
implement.
6 References
[1] Soon-kak Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4
Part 10 (pp.186-216)”, Special issue on “ Emerging H.264/AVC video coding
standard”, J. Visual Communication and Image Representation, vol. 17,
pp.183-552, Apr. 2006.
[2] A. Puri, H. Chen and A. Luthra, “Video Coding using the H.264/MPEG-4 AVC
compression standard”, Signal Processing: Image Communication, vol.19, pp
793-849, Oct. 2004.
[3] H. Kalva, “Issues in H.264/MPEG-2 Video Transcoding”, Computer Science
and Engineering, Florida Atlantic University, Boca Raton, FL.
[4] S. Sharma, “Transcoding of H.264 bitstream to MPEG 2 bitstream”, Master’s
Thesis, May 2006, EE Department, University of Texas at Arlington.
[5] S. Sharma and K. R. Rao, “Transcoding of H.264 bitstream to MPEG-2
bitstream”, Proceedings of Asia-Pacific Conference on Communications 2007.
[6] “Emerging H.264/AVC Video Coding Standard”, J. Visual Communication and
Image Representation, vol.17, pp. 183-552, Apr. 2006.
[7] P. N. Tudor, “Tutorial on MPEG-2 Video Compression”, IEE J. Langham
Thomson Prize, Electronics and Communication Engineering Journal, Dec. 1995.
[8] “The MPEG-2 International Standard”, ISO/IEC, Reference number ISO/IEC
13818-2, 1996.
[9] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”,
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, Issue
7, pp. 560-576, July 2003.
[10] J. McVeigh et al., “A software-based real-time MPEG-2 video encoder”, IEEE
Trans. CSVT, Vol 10, pp 1178-1184, Oct. 2000.
[11] O.J. Morris, “MPEG-2: Where did it come from and what is it?”, IEE
Colloquium, pp. 1/1-1/5, 24 Jan. 1995.
[12] P. Kunzelmann and H. Kalva, “Reduced Complexity H.264 to MPEG-2
Transcoder”, ICCE 2007, pp. 1-2, Jan. 2007.
[13] N. Kamaci and Y. Altunbasak, “Performance Comparison of the Emerging
H.264 Video Coding Standard with the existing standards”, ICME, Vol.1, pp.
345-348, July 2003.
[14] J. Xin, C. Lin and M. Sun, “Digital Video Transcoding”, Proceedings of the
IEEE, Vol. 93, Issue 1,pp 84-97, Jan. 2005.
[15] A. Vetro, C. Christopoulos and H. Sun, “Video transcoding architectures
and techniques: an overview”, IEEE Signal Processing Magazine, Vol. 20, Issue
2, pp 18-29, Mar. 2003.
[16] “MPEG-2”, Wikipedia, Feb. 14, 2008.
Available at <http://en.wikipedia.org/wiki/Mpeg_2>
[17] “Introduction to MPEG 2 Video Compression”
Available at <http://www.bretl.com/mpeghtml/codecdia1.HTM>
[18] “H.264/MPEG-4 AVC”, Wikipedia, Feb. 18, 2008.
Available at < http://en.wikipedia.org/wiki/H.264>
[19] “H.264 A new Technology for Video Compression”.
Available at <http://www.nuntius.com/technology3.html>
[20] R. Periera, “Efficient transcoding of MPEG-2 to H.264”, Master’s thesis,
Dec. 2005, EE Department, University of Texas at Arlington.
[21] “H.262: Information technology - Generic coding of moving pictures and
associated audio information: Video”, International Telecommunication Union,
2000-02.
Available at <http://www.itu.int/rec/T-REC-H.262>
[22] “MPEG-2 White paper”, Pinnacle Technical Documentation, Version 0.5,
Pinnacle Systems, Feb. 29, 2000.
[23] M. Ghanbari, “Standard Codecs: Image Compression to Advanced Video
Coding,” Herts, UK: IEE, 2003.
[24] H.264 software (version 13.2) obtained from:
<http://iphome.hhi.de/suehring/tml/>
[25] MPEG-2 software (version 12) obtained from:
<http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html>
[26] Test streams (Foreman, News, Carphone) obtained from:
<http://www-ee.uta.edu/dip/Courses/EE5356/ee_5356.htm>
[27] Implementation Studies Group, “Main Results of the AVC Complexity
analysis”, MPEG document N4964, ISO/IEC JTC11/SC29/WG11, July 2002.
[28] A. Joch et al., “Performance comparison of video coding standards using
Lagrangian coder control”, IEEE Int. Conf. on Image Processing, Vol. 2, pp.
II-501 to II-504, Sept. 2002.
[29] I. Sylvester, “Transcoding: The future of the video market depends on it”,
IDC Executive Brief, Nov. 2006.
Available at <http://www.ed-china.com/ARTICLES/2006NOV/2/2006NOV10_HA_AVC_HN_12.PDF>
[30] R. Hoffner, “MPEG-4 Advanced Video Coding emerges”,
Available at <http://www.tvtechnology.com/features/Tech-Corner/F_Hoffner-03.09.05.shtml>
[31] S. Wagston and A. Susin, “IP core for an H.264 Decoder SoC”, 2007,
Available at <www.us.design-reuse.com/news/?id=15746&print=yes>
[32] S. Krishnamachari and K. Yang, “MPEG-2 to H.264 Transcoding: Why and
How?”, Dec. 1, 2006,
Available at <http://broadcastengineering.com/infrastructure/broadcasting_mpeg_transcoding_why/index1.html>