Introduction to Video Compression Techniques - Anurag Jain

  • Digital video compression is the enabling technology behind many multimedia applications. Compression algorithms reduce the bit-rate requirements for transmitting digital video and reduce delivery costs. With these appealing properties, digital video is rapidly becoming an experience of everyday life. For example, video telephony assists corporate and research users in a variety of collaborations utilizing the Public Switched Telephone Network or the Internet. DVD players, high-definition television devices, digital camcorders, digital VCRs, and time-shifting products provide consumers with enhanced entertainment environments and novel methods for accessing media content. In the near future, wireless videophones promise untethered video communication between users. Several compression standards are key to the success of digital video applications.
  • In this presentation we examine the basic ideas and techniques behind video compression, as well as various techniques used to reduce the complexity of video processing. This lecture provides an overview of current and emerging video compression standards, which have been crucial for overcoming bottlenecks such as limited bandwidth and storage capacity. We briefly discuss the motivation for creating standards, and the aspects of a video compression system that the standards actually specify. We then review the basics of video compression and highlight some of the important features of the H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 standards. We conclude by briefly mentioning two recent MPEG standards that are not targeted for compression, to clarify their scope.
  • Before discussing video compression standards, let us examine why video compression is important. Video compression is necessary to make many video applications practical, because raw video contains an immense amount of data. Most video applications require the communication or storage of video, and these capabilities are usually limited and/or very expensive. Consider, for example, video transmission within high-definition television, or HDTV. A popular HDTV video format is a progressively scanned 720x1280 pixels/frame, 60 frames/s video signal with 24 bits/pixel (8 bits each for red, green, and blue), which corresponds to a raw data rate of about 1.3 Gbits/sec. With modern digital communications, we can only transmit about 20 Mb/s in the 6 MHz bandwidth allocated per television channel. Therefore, powerful video compression techniques are applied to compress the video by about a factor of 70 to send the video, which has a raw rate of 1.3 Gb/s, through the available 20 Mb/s channel. As will be discussed later in this presentation, the MPEG-2 video compression standard is used to compress HDTV video.
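The arithmetic behind the HDTV example above can be checked in a few lines:

```python
# Back-of-the-envelope check of the HDTV numbers quoted above.
width, height = 1280, 720          # pixels per frame
fps = 60                           # frames per second
bits_per_pixel = 24                # 8 bits each for R, G, B

raw_rate = width * height * fps * bits_per_pixel   # bits per second
print(raw_rate / 1e9)              # ~1.33 Gb/s raw data rate

channel_rate = 20e6                # ~20 Mb/s available per 6 MHz channel
print(raw_rate / channel_rate)     # required compression factor, ~66x
```

The exact factor comes out near 66, consistent with the "about a factor of 70" quoted in the notes.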
  • Currently there are two families of video compression standards, determined under the auspices of the International Telecommunication Union Telecommunication Standardization Sector, or ITU-T, formerly the International Telegraph and Telephone Consultative Committee, or CCITT, and the International Organization for Standardization, or ISO. The first video compression standard to gain widespread acceptance was the ITU H.261, which was designed for videoconferencing over the integrated services digital network, or ISDN. H.261 was adopted as a standard in 1990. It was designed to operate at p = 1, 2, ..., 30 multiples of the baseline ISDN data rate, or p x 64 kb/s. In 1993, the ITU-T initiated a standardization effort with the primary goal of video telephony over the public switched telephone network, or PSTN, conventional analog telephone lines, where the total available data rate is only about 33.6 kb/s. The video compression portion of the standard is H.263. Its first phase was adopted in 1996. An enhanced H.263, known as H.263+, was finalized in 1997. A new long-term compression standard, known as H.26L, is currently under development. The moving pictures expert group, or MPEG, was established by the ISO in 1988 to develop a standard for compressing motion pictures and associated audio on digital storage media such as CD-ROM. The resulting standard, commonly known as MPEG-1, was finalized in 1991. It achieves approximately VHS quality video and audio at about 1.5 Mb/s. A second phase of their work, commonly known as MPEG-2, was an extension of MPEG-1 developed for digital television and higher bit rates. Currently, the video portion of digital television, or DTV, and high definition television, or HDTV, standards for large portions of North America, Europe, and Asia are based on MPEG-2.
A third phase of their work, known as MPEG-4, had the primary goal of providing increased functionality, including content-based processing, integration of both natural and synthetic (computer-generated) material, and interactivity with the scene.
  • Video compression standards provide a number of benefits, foremost of which is facilitating interoperability. By ensuring interoperability, standards lower the risk for both consumers and manufacturers, which results in quicker acceptance and widespread use. In addition, these standards are designed for a wide variety of applications, and the resulting economies of scale lead to reduced cost and greater use.
  • An important question is what does a video compression standard actually specify? A video compression system is composed of an encoder, compressed bit-streams, and a decoder. The encoder takes original video and compresses it to a bit-stream. The bit-stream is passed to the decoder which decodes it to produce the reconstructed video. One possibility is that the standard would specify both the encoder and decoder, but this approach would have a number of disadvantages. Instead, the standards have a limited scope to ensure interoperability while enabling as much differentiation as possible.
  • The standards do not specify the encoder or the decoder. Instead they specify the bit-stream syntax and the decoding process. The bit-stream syntax is the format for representing the compressed data. The decoding process is the set of rules for interpreting the bit-stream. Note that specifying the decoding process is different from specifying a particular decoder implementation. For example, the standard may specify that the decoder uses an IDCT, but not how to implement the IDCT. The IDCT may be implemented in a direct form, or by a fast algorithm similar to the FFT, and may be optimized for specific targets such as DSPs. The specific implementation is not standardized. This allows different designers and manufacturers to differentiate their work. The encoding process is also not standardized. For example, more sophisticated encoders can be designed that provide improved performance over baseline encoders. In addition, improvements can be incorporated even after a standard is finalized. For instance, improved algorithms for motion estimation or bit allocation may be incorporated in the future in a standard-compatible manner. The only constraint is that the encoder produces a syntactically correct bit-stream that can be properly decoded by a standard-compatible decoder. Because of these issues, it is important to remember that not all encoders are created equal.
  • On the next few pages, we present a very brief review of video compression. Compression is achieved by exploiting the similarities or correlations that exist in a typical video signal. This can be viewed as reducing the redundancy in the video data. For example, consecutive frames in a video sequence are often highly correlated in that they contain the same objects, perhaps undergoing some movement between the frames. We refer to this as temporal redundancy. Also within a single frame there is spatial redundancy as the amplitudes of nearby pixels are often correlated. Similarly, the red, green, and blue color components of a given pixel are often correlated. The redundancy in a video signal generally can be identified and exploited. Another goal of video compression is to reduce the irrelevancy in the video signal; that is, to reduce the information that is not perceptually important. For example, it would be wasteful to spend valuable bits coding video features that cannot be seen or perceived. Unfortunately, human visual perception is very difficult to model, so determining which data is perceptually irrelevant is a difficult task and therefore irrelevancy is difficult to exploit.
  • Current video compression standards achieve compression by applying the same basic principles. The temporal redundancy is exploited by applying MC-prediction. The spatial redundancy is exploited by applying the DCT. The color space redundancy is exploited by a color space conversion. The resulting DCT coefficients are quantized and the non-zero quantized DCT coefficients are run-length and Huffman coded to produce the compressed bit-stream.
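The DCT-plus-quantization core of this pipeline can be sketched in a few lines. This is a minimal illustration, not any standard's exact transform or quantizer; the uniform step size `q_step` and the sample ramp block are arbitrary choices for demonstration:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: 2-D coefficients = C @ block @ C.T
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

def encode_block(block, q_step=16):
    # Forward 8x8 DCT followed by uniform scalar quantization.
    coeffs = C @ block @ C.T
    return np.round(coeffs / q_step).astype(int)

def decode_block(qcoeffs, q_step=16):
    # Inverse quantization followed by the inverse DCT.
    return C.T @ (qcoeffs * q_step) @ C

# A smooth 8x8 block (a luminance ramp): the DCT compacts most of its
# energy into a handful of low-frequency coefficients.
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = (128 + 10 * x + 5 * y).astype(float)
qblock = encode_block(block)
print(np.count_nonzero(qblock), "nonzero coefficients out of 64")
recon = decode_block(qblock)
print(np.abs(recon - block).max())   # small reconstruction error
```

Only the nonzero quantized coefficients would then be run-length and Huffman coded, which is where the bit-rate saving comes from.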
  • The MPEG standard codes video in a hierarchy of units called sequences, pictures, groups of pictures, slices, macro-blocks, and DCT blocks. MC-prediction is performed on 16x16-pixel blocks. A 16x16-pixel block is called a macro-block and is coded using 8x8-pixel block DCTs, typically four 8x8-pixel blocks for luminance, two for chrominance, and possibly a forward and/or backward motion vector. The macro-blocks are scanned in a left-to-right, top-to-bottom fashion. A series of these macro-blocks form a slice. All the slices in a frame comprise a picture. Contiguous pictures form a GOP. The GOPs form the entire sequence.
  • A video sequence consists of a sequence of video frames or images. Each frame may be coded as a separate image, for example by independently applying JPEG-like coding to each frame. However, video has the property that neighboring video frames are typically very similar. Video compression can achieve much higher compression ratios than image compression by exploiting this temporal redundancy or similarity between frames. The fact that neighboring frames are highly similar can be exploited by coding a given frame by first predicting it based on a previously coded frame and then coding the prediction error. There are three basic types of coded frames: I-frames are intra-coded frames; that is, frames that are coded independently of all other frames; predictively coded, or P-frames, where the frame is coded based on a previously coded frame; and bi-directionally predicted frames, or B-frames, where the frame is coded using both previous and future coded frames.
  • Consecutive video frames typically contain the same imagery, although possibly at different spatial locations. To exploit the predictability among neighboring frames, it is important to estimate the motion between the frames and then form an appropriate prediction while compensating for the motion. The process of estimating the motion between frames is known as motion estimation. The process of predicting a given frame based on the previously coded reference frame, while compensating for the relative motion between the two frames, is referred to as motion-compensated prediction. Block-based, motion-compensated prediction is often used because it achieves good performance and has a basic, periodic structure that simplifies implementations. Examples of block-based forward and bi-directional motion-compensated prediction are illustrated on the left and right, respectively. The current frame to be coded is partitioned into 16x16-pixel blocks. For each block in the current frame, a prediction is formed by finding the best-matching block in a previously coded reference frame. The displacement or relative motion for the best-matching block is referred to as motion vector.
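A minimal full-search block matcher along these lines is sketched below. The frame sizes, search range, and SAD matching criterion here are illustrative; real encoders use much larger search ranges and faster search strategies:

```python
import numpy as np

def motion_estimate(cur_block, ref, top, left, search=4):
    """Full-search motion estimation: find the displacement (dy, dx) within
    +/-search pixels that minimizes the sum of absolute differences (SAD)."""
    n = cur_block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(cur_block - ref[y:y + n, x:x + n]).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy example: a bright square in the reference frame moves 2 pixels
# right and 1 pixel down in the current frame.
ref = np.zeros((32, 32))
ref[8:16, 8:16] = 255
cur = np.zeros((32, 32))
cur[9:17, 10:18] = 255
mv, sad = motion_estimate(cur[9:17, 10:18], ref, top=9, left=10)
print(mv, sad)   # best match at displacement (-1, -2) with SAD 0
```

The returned motion vector points back to the best-matching block in the reference frame; the encoder then codes only the vector and the (here zero) prediction residual.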
  • This page and the next illustrate high-level views of a typical video encoder and decoder. As previously discussed, the various standards specify the bit-stream syntax and the decoding process, but not the encoder processing or the specific decoder implementation. Therefore, these figures should be viewed only as examples of typical encoders and decoders in a video compression system. In the encoder, the input RGB video signal is first transformed into a luminance/chrominance color space, such as YUV, to exploit the color space redundancy. To exploit the temporal redundancy, motion estimation and motion-compensated prediction are used to form a prediction of the current frame from the previously encoded frame. The prediction error, or residual, is partitioned into 8x8 blocks and the 2-D DCT is computed for each block. The DCT coefficients are adaptively quantized to exploit the local video characteristics and human perception, and to meet any bit-rate targets. The quantized coefficients and other information are Huffman coded for increased efficiency. Often a buffer is used to couple the variable bit-rate output of the video encoder to the desired channel. This is accomplished via a buffer control mechanism whereby the buffer fullness is used to regulate the coarseness versus fineness of the coefficient quantization, and thereby the video bit-rate.
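The buffer control mechanism can be sketched as a simple feedback rule. The 0.8/0.2 occupancy thresholds and the step sizes below are illustrative choices, not taken from any standard:

```python
def update_quantizer(q, buffer_fullness, buffer_size, q_min=1, q_max=31):
    """Crude buffer-feedback rate control: coarsen quantization as the
    buffer fills (fewer bits per frame), refine it as the buffer drains.
    Thresholds and step sizes are illustrative, not from any standard."""
    occupancy = buffer_fullness / buffer_size
    if occupancy > 0.8:
        return min(q + 2, q_max)    # buffer nearly full: cut the bit rate
    if occupancy < 0.2:
        return max(q - 1, q_min)    # buffer nearly empty: spend more bits
    return q

print(update_quantizer(8, 90, 100))   # nearly full -> coarser quantizer
print(update_quantizer(8, 10, 100))   # nearly empty -> finer quantizer
print(update_quantizer(8, 50, 100))   # comfortable -> unchanged
```

Real rate-control schemes are considerably more elaborate, but they share this feedback structure between buffer occupancy and quantizer step size.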
  • The video decoding process is the inverse of the encoding process. The bit-stream is parsed and Huffman decoded. The non-zero DCT coefficients are identified and the inverse quantized. An inverse block DCT operation produces the residual signal, which is combined in a spatially adaptive manner with the previously reconstructed frame to reconstruct the current frame. Finally, the reconstructed frame is converted back to the RGB color space to produce the output video signal.
  • This page and the next highlight the basic structures used in the MPEG standards. The MPEG standards group video frames into coding units called groups of pictures, or GOPs. GOPs have the property of re-initializing the temporal prediction used during encoding, which is important to enable random access into a coded video stream. Specifically, the first frame of a GOP is an I-frame and the other frames may be I, P, or B frames. In this example, the GOP contains nine video frames, I0 through B8, where the subscript indicates the frame number. Frame I9 is the first frame of the next GOP. The arrows indicate the prediction dependencies. The frame at the base of each arrow, the anchor frame, is used to predict the frame at the tip of the arrow, the predicted frame. I frames are coded independently of other frames. P frames are predicted from the preceding I or P frame. B frames are predicted from both the preceding and following I or P frames. Notice that each B frame depends on data from a future frame, which means that the future frame must be decoded before the current B frame can be decoded. Also note that the use of B frames introduces additional delay. Therefore, while B frames are fine for broadcast or storage applications, they are often not appropriate for real-time, two-way communications or other applications where low delay is a requirement.
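One consequence of B-frame prediction is that decode order differs from display order: each anchor (I or P) must be decoded before the B frames that reference it. A small sketch, assuming the common IBBP pattern of a nine-frame GOP followed by the next GOP's I frame:

```python
def decode_order(gop):
    """Reorder a GOP from display order to decode order: each anchor
    (I or P) is emitted before the B frames that depend on it."""
    out, pending_b = [], []
    for i, ftype in enumerate(gop):
        if ftype == 'B':
            pending_b.append(i)    # B frames wait for their future anchor
        else:
            out.append(i)          # emit the anchor first...
            out.extend(pending_b)  # ...then the B frames it completes
            pending_b = []
    return out + pending_b

# Nine-frame GOP plus the I frame that opens the next GOP.
display = ['I', 'B', 'B', 'P', 'B', 'B', 'P', 'B', 'B', 'I']
print(decode_order(display))   # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
```

The reordering is exactly the source of the extra delay mentioned above: frames 1 and 2 cannot be decoded until frame 3 has arrived.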
  • The current video compression standards are based on the same basic building blocks, which include motion-compensated prediction, DCT, scalar quantization, run-length, and Huffman coding. Additional features added for particular applications include the capability to code interlaced video and error resilience or scalability tools. A major distinction between video compression standards is that the early standards – including H.261, H.263, MPEG-1, and MPEG-2 – used frame-based coding. Specifically, they viewed each frame as a rectangular group of pixels and attempted to code these pixels using block-based motion-compensated prediction and block DCT. In effect, these standards modeled the video as being composed of moving square blocks. In contrast, MPEG-4 provides the capability to model the video as being composed of a number of separate objects, such as a person, car, or background, and each object can have an arbitrary, non-square shape. MPEG-4 uses the same basic building blocks, but applies them to objects with arbitrary shapes. On the following pages, we briefly highlight the salient features of the different video compression standards. We first examine the MPEG-1 and MPEG-2 standards since they are the most popular, then the H.261 and H.263 standards. We end with the MPEG-4 standard since it is the newest and in many ways the most revolutionary.
  • The H.261 video compression standard was designed for real-time, two-way communication. Short delay was a critical feature, thus a maximum allowable delay of 150 ms was specified. H.261 was designed to operate over ISDN, at p = 1, 2, ..., 30 multiples of the baseline ISDN data rate, or p x 64 kb/s. H.261 uses only I and P frames; it does not use B frames, in order to minimize the delay. H.261 employs 16x16-pixel ME/MC-prediction and the 8x8-pixel block DCT. The motion estimation is computed to full-pixel accuracy, with a search range of +/-15 pixels. An interesting note is that H.261 provides the option of applying a 3x3 low-pass filter within the MC-prediction feedback loop to smooth the previous reconstructed frame as part of the prediction process. A loop filter is not used in MPEG-1, MPEG-2, or H.263, since they use motion estimation with half-pixel accuracy and the resulting spatial interpolation has a similar effect to the loop filter. H.261 was standardized in 1990.
  • The H.263 video compression standard was designed with the primary goal of communication over conventional analog telephone lines. Transmitting video, speech, and control data over a 33.6 kb/s modem means that there is typically only about 20 to 24 kb/s available for the video. The H.263 coder has a similar structure to H.261, and it was designed to facilitate interoperability between H.261 and H.263 coders. A number of enhancements over H.261 were introduced: reduced overhead information; improved error resilience; enhancements to some of the baseline coding techniques (including half-pixel MC-prediction); and improved compression efficiency via four advanced coding options. The advanced coding options are negotiated, in that the encoder and decoder communicate to determine which options can be used before compression begins. When all the coding options are used, H.263 provides a significant quality improvement over H.261, particularly at very low bit rates. For example, at rates below 64 kb/s, H.263 typically achieves approximately a 3 dB improvement over H.261 at the same bit rate, or a 50% reduction in bit rate for the same SNR quality. H.263 was standardized in 1995.
  • The moving pictures expert group, or MPEG, was originally established by ISO to develop a standard for compression of moving pictures, video, and associated audio on digital storage media such as CD-ROM. The resulting standard, commonly known as MPEG-1, was finalized in 1991 and achieves approximately VHS quality video and audio at about 1.5 Mb/s. A second phase of their work, commonly known as MPEG-2, was originally intended as an extension of MPEG-1 and was developed for application toward interlaced video from conventional television and for bit rates up to 10 Mb/s. A third phase was envisioned for higher-bit-rate applications such as HDTV, but it was recognized that those applications could also be addressed within the context of MPEG-2. Hence, the third phase was wrapped back into MPEG-2 and, as a result, there is no MPEG-3 standard. Both MPEG-1 and MPEG-2 are actually composed of a number of parts, including video, audio, systems, compliance testing, etc. The video compression parts of these standards are often referred to as MPEG-1 video and MPEG-2 video, or MPEG-1 and MPEG-2 for brevity. Currently, MPEG-2 video has been adopted as the video portion of the digital television and HDTV standards for large portions of North America, Europe, and Asia. MPEG-2 video is also the basis for the digital video disk, or DVD, standard. MPEG-2 is a superset of MPEG-1, supporting higher bit rates, higher resolutions, and interlaced pictures for television. For interlaced video, the even and odd fields may be coded separately or a pair of even and odd fields can be combined and coded as a frame. For field-based coding, MPEG-2 provides field-based methods for MC-prediction, Block-DCT, and alternate zigzag scanning. In addition, MPEG-2 provides a number of enhancements, including scalable extensions.
  • The MPEG standards were designed to address a large number of diverse applications, each requiring a number of different tools or functionalities. Encoders and decoders that support all the functionalities would be very complex and expensive, yet a typical application is likely to use only a small subset of the MPEG functionalities. Therefore, to enable more efficient implementations for different applications, MPEG grouped appropriate subsets of functionalities together and defined a set of profiles and levels. A profile corresponds to a set of functionalities that are useful for a particular range of applications. Within a profile, a level defines maximum values for some of the parameters, such as resolution, frame rate, bit rate, and buffer size. This figure illustrates a simplified version of the 2-D matrix of profiles and levels in MPEG-2. A decoder is specified by the profile and level that it conforms to, for example main profile at main level, or MP@ML. In general, a more complex profile/level is a superset of a less complex profile/level. Two widely used profile/levels are MP@ML, which is used to compress conventional television, as used on DVDs and standard-definition digital television, or SD-DTV, and main profile at high level, or MP@HL, which can be used to compress HDTV.
  • MPEG-4 is quite different from MPEG-1 and MPEG-2 in that its primary goals are to enable new functionalities, not just to provide better compression. MPEG-4 supports an object-based or content-based representation. This enables separate coding of different video objects in a video scene and, furthermore, allows individual access and manipulation of different objects in a video. Note that MPEG-4 does not specify how to identify or segment the objects in a video. That operation is performed at the encoder, which is not specified by the standard. However, if the individual objects are known, MPEG-4 provides a method to compress those objects. MPEG-4 also supports compression of synthetic or computer-generated video objects, as well as the integration of natural and synthetic objects within a single video. MPEG-4 also enables interaction with the individual video objects. In addition, MPEG-4 supports error-resilient communication over error-prone channels such as the Internet and third-generation wireless systems. MPEG-4 also includes most of the coding techniques developed in earlier standards. As a result, MPEG-4 supports both frame-based and object-based video coding. The first version of MPEG-4 was finalized in 1999. A second superset version, referred to as MPEG-4 Version 2, was finalized in 2000. A third version is currently being finalized.
  • A number of important differences become evident when comparing the various compression standards. MPEG-1, MPEG-2, H.261, and H.263 were primarily designed to compress video. They provide a pipe for storing or transmitting the video, use frame-based methods for coding, and are primarily designed for hardware implementations. In contrast, MPEG-4 is designed as a large set of tools for a variety of applications. These tools support both object-based and frame-based coding, and they also support the coding of synthetic video. MPEG-4 has a software emphasis and provides the capability to download certain types of algorithms that may be used at the decoder to support a rich variety of applications, such as interacting with the video or managing decoder client resources. Note, however, that MPEG-4 does not permit the downloading of complete encoding or decoding algorithms.
  • MPEG-2 interlaced prediction modes: Field prediction, in which the top and bottom fields of the reference frame predict the first field of the current frame, while the bottom field of the previous frame and the top field of the current frame predict the bottom field of the current frame. 16x8 motion compensation mode, in which a macroblock may carry two motion vectors (a B-picture macroblock may carry four). Dual prime motion compensation, in which the top field of the current frame is predicted using two motion vectors derived from the top and bottom fields of the reference frame; this mode applies only to P-pictures.

    1. 1. Introduction to Video Compression Techniques <ul><li>Soyeb Nagori & Anurag Jain, Texas Instruments </li></ul>
    2. 2. Agenda <ul><li>Video Compression Overview </li></ul><ul><li>Motivation for creating standards </li></ul><ul><li>What do the standards specify </li></ul><ul><li>Brief review of video compression </li></ul><ul><li>Current video compression standards H.261, H.263, MPEG-1-2-4 </li></ul><ul><li>Advanced Video Compression Standards, </li></ul><ul><ul><li>H.264, VC1, AVS </li></ul></ul>
    3. 3. Video Compression Overview <ul><li>Problem: </li></ul><ul><ul><li>Raw video contains an immense amount of data. </li></ul></ul><ul><ul><li>Communication and storage capabilities are limited and expensive. </li></ul></ul><ul><li>Example HDTV video signal: </li></ul>
    4. 4. Video Compression: Why? <ul><li>Bandwidth Reduction </li></ul>
    5. 5. Video Compression Standards <ul><li>JPEG: Continuous-tone still-image compression (variable bit rate) </li></ul><ul><li>H.261: Video telephony and teleconferencing over ISDN (p x 64 kb/s) </li></ul><ul><li>MPEG-1: Video on digital storage media, CD-ROM (1.5 Mb/s) </li></ul><ul><li>MPEG-2: Digital television (> 2 Mb/s) </li></ul><ul><li>H.263: Video telephony over PSTN (< 33.6 kb/s) </li></ul><ul><li>MPEG-4: Object-based coding, synthetic content, interactivity (variable bit rate) </li></ul><ul><li>H.264: From low-bitrate coding to HD encoding, HD-DVD, surveillance, video conferencing (variable bit rate) </li></ul>
    6. 6. Motivation for Standards <ul><li>Goal of standards: </li></ul><ul><ul><li>Ensuring interoperability – Enabling communication between devices made by different manufacturers. </li></ul></ul><ul><ul><li>Promoting a technology or industry. </li></ul></ul><ul><ul><li>Reducing costs. </li></ul></ul>
    7. 7. History of Video Standards
    8. 8. What Do the Standards Specify? <ul><li>A video compression system consists of the following: </li></ul><ul><ul><li>An encoder </li></ul></ul><ul><ul><li>Compressed bit-streams </li></ul></ul><ul><ul><li>A decoder </li></ul></ul><ul><li>What parts of the system do the standards specify? </li></ul>
    9. 9. What Do the Standards Specify? <ul><li>Not the encoder, not the decoder. </li></ul>
    10. 10. What Do the Standards Specify? <ul><li>Just the bit-stream syntax and the decoding process; for example, the standard specifies that an IDCT is used, but not how to implement the IDCT. </li></ul><ul><li>Enables improved encoding and decoding strategies to be employed in a standard-compatible manner. </li></ul>
    11. 11. Achieving Compression <ul><li>Reduce redundancy and irrelevancy. </li></ul><ul><li>Sources of redundancy: </li></ul><ul><ul><li>Temporal – Adjacent frames highly correlated. </li></ul></ul><ul><ul><li>Spatial – Nearby pixels are often correlated with each other. </li></ul></ul><ul><ul><li>Color space – RGB components are correlated among themselves. </li></ul></ul><ul><li>Irrelevancy – Perceptually unimportant information. </li></ul>
    12. 12. Basic Video Compression Architecture <ul><li>Exploiting the redundancies </li></ul><ul><ul><li>Temporal – MC-prediction and MC-interpolation </li></ul></ul><ul><ul><li>Spatial – Block DCT </li></ul></ul><ul><ul><li>Color – Color space conversion </li></ul></ul><ul><li>Scalar quantization of DCT coefficients </li></ul><ul><li>Run-length and Huffman coding of the non-zero quantized DCT coefficients </li></ul>
    13. 13. Video Structure <ul><li>MPEG Structure </li></ul>
    14. 14. Block Transform Encoding [block diagram: DCT, zig-zag, quantize, run-length code, Huffman code, yielding a bit-stream 011010001011101...]
    15. 15. Block Encoding [worked example: original image, DCT (DC and AC components), quantize, zigzag, run-length code, Huffman code, coded bitstream 10011011100011..., < 10 bits (0.55 bits/pixel)]
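The zigzag-scan and run-length steps in the block-encoding example can be sketched as follows; the sample coefficient values are made up for illustration:

```python
def zigzag_indices(n=8):
    # Visit an n x n block in zig-zag order: walk the anti-diagonals,
    # alternating direction, so low frequencies come first.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_length(qcoeffs):
    # Emit (run_of_zeros, level) pairs over the zig-zag scan; all
    # trailing zeros collapse into one end-of-block marker.
    pairs, run = [], 0
    for i, j in zigzag_indices(len(qcoeffs)):
        v = qcoeffs[i][j]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append('EOB')
    return pairs

# A typical quantized block: a few low-frequency coefficients survive.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][1] = 26, -3, 2, 1
print(run_length(block))   # [(0, 26), (0, -3), (0, 2), (5, 1), 'EOB']
```

In the standards, each (run, level) pair is then mapped to a Huffman codeword, which is how 64 coefficients compress to a handful of bits.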
    16. 16. Result of Coding/Decoding [figure: original block, reconstructed block, errors]
    17. 17. Examples [images: uncompressed (262 KB); compressed (50) (22 KB, 12:1); compressed (1) (6 KB, 43:1)]
    18. 18. Video Compression <ul><li>Main addition over image compression </li></ul><ul><ul><li>Exploit the temporal redundancy </li></ul></ul><ul><li>Predict current frame based on previously coded frames </li></ul><ul><li>Types of coded frames: </li></ul><ul><ul><li>I-frame – Intra-coded frame, coded independently of all other frames </li></ul></ul><ul><ul><li>P-frame – Predictively coded frame, coded based on previously coded frame </li></ul></ul><ul><ul><li>B-frame – Bi-directionally predicted frame, coded based on both previous and future coded frames </li></ul></ul>
    19. 19. Motion Compensated Prediction (P and B Frames) <ul><li>Motion compensated prediction – predict the current frame based on a reference frame while compensating for the motion </li></ul><ul><li>Examples of block-based motion-compensated prediction for P-frames and B-frames. </li></ul>
    20. 20. Find the differences!! <ul><li>Video coding is fun!! </li></ul>
    21. 21. Conditional Replenishment
    22. 22. Residual Coding
    23. 23. Example Video Encoder
    24. 24. Example Video Decoder
    25. 25. AC/DC prediction for Intra Coding
    26. 26. Group of Pictures (GOP) Structure <ul><li>Enables random access into the coded bit-stream. </li></ul><ul><li>Number of B frames and impact on search range. </li></ul>
    27. 27. Current Video Compression Standards <ul><li>Classification & Characterization of different standards </li></ul><ul><ul><li>Based on the same fundamental building blocks </li></ul></ul><ul><ul><ul><li>Motion-compensated prediction and interpolation </li></ul></ul></ul><ul><ul><ul><li>2-D Discrete Cosine Transform (DCT) </li></ul></ul></ul><ul><ul><ul><li>Color space conversion </li></ul></ul></ul><ul><ul><ul><li>Scalar quantization, run-length, and Huffman coding </li></ul></ul></ul><ul><ul><li>Other tools added for different applications </li></ul></ul><ul><ul><ul><li>Progressive or interlaced video </li></ul></ul></ul><ul><ul><ul><li>Improved compression, error resilience, scalability, etc </li></ul></ul></ul>
    28. 28. H.261 (1990) <ul><li>Goal: real-time, two-way video communication </li></ul><ul><li>Key features </li></ul><ul><ul><li>Low delay (150 ms) </li></ul></ul><ul><ul><li>Low bit rates (p x 64 kb/s) </li></ul></ul><ul><li>Technical details </li></ul><ul><ul><li>Uses I- and P-frames (no B-frames) </li></ul></ul><ul><ul><li>Full-pixel motion estimation </li></ul></ul><ul><ul><li>Search range +/- 15 pixels </li></ul></ul><ul><ul><li>Low-pass filter in the feedback loop </li></ul></ul>
    29. 29. H.263 (1995) <ul><li>Goal: communication over conventional analog telephone lines (< 33.6 kb/s) </li></ul><ul><li>Enhancements to H.261 </li></ul><ul><ul><li>Reduced overhead information </li></ul></ul><ul><ul><li>Improved error resilience features </li></ul></ul><ul><ul><li>Algorithmic enhancements </li></ul></ul><ul><ul><ul><li>Half-pixel motion estimation with larger motion search range </li></ul></ul></ul><ul><ul><li>Four advanced coding modes </li></ul></ul><ul><ul><ul><li>Unrestricted motion vector mode </li></ul></ul></ul><ul><ul><ul><li>Advanced prediction mode (median MV predictor using 3 neighbors) </li></ul></ul></ul><ul><ul><ul><li>PB-frame mode </li></ul></ul></ul><ul><ul><ul><li>OBMC (overlapped block motion compensation) </li></ul></ul></ul>
    30. 30. MPEG-1 and MPEG-2 <ul><li>MPEG-1 (1991) </li></ul><ul><ul><li>Goal is compression for digital storage media, CD-ROM </li></ul></ul><ul><ul><li>Achieves VHS quality video and audio at ~1.5 Mb/sec </li></ul></ul><ul><li>MPEG-2 (1993) </li></ul><ul><ul><li>Superset of MPEG-1 to support higher bit rates, higher resolutions, and interlaced pictures </li></ul></ul><ul><ul><li>Original goal to support interlaced video from conventional television. Eventually extended to support HDTV </li></ul></ul><ul><ul><li>Provides field-based coding and scalability tools </li></ul></ul>
    31. 31. MPEG-2 Profiles and Levels <ul><li>Goal: to enable more efficient implementations for different applications. </li></ul><ul><ul><li>Profile – Subset of the tools applicable for a family of applications. </li></ul></ul><ul><ul><li>Level – Bounds on the complexity for any profile. </li></ul></ul>DVD & Digital TV: Main Profile at Main Level (MP@ML) HDTV: Main Profile at High Level (MP@HL)
    32. 32. MPEG-4 (1998) <ul><li>Primary goals: new functionalities, not better compression </li></ul><ul><ul><li>Object-based or content-based representation – </li></ul></ul><ul><ul><ul><li>Separate coding of individual visual objects </li></ul></ul></ul><ul><ul><ul><li>Content-based access and manipulation </li></ul></ul></ul><ul><ul><li>Integration of natural and synthetic objects </li></ul></ul><ul><ul><li>Interactivity </li></ul></ul><ul><ul><li>Communication over error-prone environments </li></ul></ul><ul><ul><li>Includes frame-based coding techniques from earlier standards </li></ul></ul>
    33. 33. MV Prediction- MPEG-4
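The MV prediction used here (as in the H.263 advanced prediction mode earlier in the deck) takes a component-wise median of three neighbouring motion vectors; only the difference from this predictor is coded. A minimal Python sketch:

```python
def median_mv(mv_left, mv_top, mv_topright):
    """Component-wise median of the three neighbouring motion vectors
    (left, top, top-right), as used for MV prediction in H.263's
    advanced prediction mode and in MPEG-4."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(mv_left[0], mv_top[0], mv_topright[0]),
            median3(mv_left[1], mv_top[1], mv_topright[1]))
```

The encoder then transmits only `mv - median_mv(...)`, which is small for smooth motion fields.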
    34. 34. Comparing MPEG-1/2 and H.261/3 With MPEG-4 <ul><li>MPEG-1/2 and H.261/H.263 – Algorithms for compression – </li></ul><ul><ul><li>Basically describe a pipe for storage or transmission </li></ul></ul><ul><ul><li>Frame-based </li></ul></ul><ul><ul><li>Emphasis on hardware implementation </li></ul></ul><ul><li>MPEG-4 – Set of tools for a variety of applications – </li></ul><ul><ul><li>Define tools and glue to put them together </li></ul></ul><ul><ul><li>Object-based and frame-based </li></ul></ul><ul><ul><li>Emphasis on software </li></ul></ul><ul><ul><li>Downloadable algorithms, not encoders or decoders </li></ul></ul>
    35. 35. <ul><li>Half-pel accuracy motion estimation, range up to +/- 64 </li></ul><ul><li>Using bi-directional temporal prediction </li></ul><ul><li>Important for handling uncovered regions </li></ul><ul><li>Using perceptual-based quantization matrix for I-blocks (same as JPEG) </li></ul><ul><li>DC coefficients coded predictively </li></ul>MPEG-1 video vs H.261
    36. 36. MPEG-2 : MC for Interlaced Video <ul><li>Field prediction for field pictures </li></ul><ul><li>Field prediction for frame pictures </li></ul><ul><li>Dual prime for P-pictures </li></ul><ul><li>16x8 MC for field pictures </li></ul>
    37. 37. <ul><li>Each field is predicted individually from the reference fields </li></ul><ul><li>A P-field is predicted from one previous field </li></ul><ul><li>A B-field is predicted from two fields chosen from two reference pictures </li></ul>Field prediction for field pictures
    38. 39. Field Prediction for Frame Pictures <ul><li>Field prediction for frame pictures: the MB to be predicted is split into top-field pels and bottom-field pels. Each 16x8 field block is predicted separately with its own motion vector (P-frame) or two motion vectors (B-frame). </li></ul>
    39. 40. <ul><li>Common elements with other standards </li><li>Macroblocks: 16x16 luma + 2 x 8x8 chroma samples </li><li>Input: association of luma and chroma, with conventional sub-sampling of chroma (4:2:0) </li><li>Block motion displacement </li><li>Motion vectors over picture boundaries </li><li>Variable block-size motion </li><li>Block transforms </li><li>Scalar quantization </li><li>I, P and B picture coding types </li></ul>Advanced Video Coding Standard, H.264
    40. 41. H.264 Encoder block diagram [Block diagram: the input video signal is coded by intra-frame prediction or by motion estimation/motion compensation; the residual passes through integer transform, scaling and quantization, then entropy coding together with control data, quantized transform coefficients and motion data; a feedback loop with scaling & inverse transform and a de-blocking filter reconstructs the reference used for prediction, all under a common coder control]
    41. 42. H.264 <ul><li>New elements introduced </li></ul><ul><li>Every macroblock is partitioned in one of 7 ways </li></ul><ul><ul><li>Up to 16 mini-blocks (and as many MVs) </li></ul></ul><ul><li>Accuracy of motion compensation = 1/4 pixel </li></ul><ul><li>Multiple reference frames </li></ul>
    42. 43. <ul><li>Improved motion estimation </li></ul><ul><li>De-blocking filter in the prediction loop </li></ul><ul><li>Integer 4x4 DCT approximation </li></ul><ul><ul><li>Eliminates </li></ul></ul><ul><ul><ul><li>The problem of mismatch between different implementations </li></ul></ul></ul><ul><ul><ul><li>The problem of encoder/decoder drift </li></ul></ul></ul><ul><li>Arithmetic coding for MVs & coefficients </li></ul><ul><li>Computes SATD (Sum of Absolute Transformed Differences) instead of SAD </li></ul><ul><ul><li>Cost of the transformed differences (i.e. residual coefficients) for a 4x4 block, using a 4x4 Hadamard transform </li></ul></ul>H.264
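The SATD cost mentioned above can be sketched directly: transform the 4x4 residual with a Hadamard matrix and sum the absolute coefficients. Illustrative Python/NumPy; the final normalisation (halving the sum here) varies between implementations:

```python
import numpy as np

# 4x4 Hadamard matrix (entries are +/-1, so the transform is
# multiplier-free in a real implementation)
H4 = np.array([[1,  1,  1,  1],
               [1,  1, -1, -1],
               [1, -1, -1,  1],
               [1, -1,  1, -1]])

def satd4x4(cur, pred):
    """Sum of Absolute Transformed Differences for a 4x4 block:
    the residual is passed through the 4x4 Hadamard transform and
    the absolute transform coefficients are summed."""
    diff = cur.astype(int) - pred.astype(int)
    t = H4 @ diff @ H4.T          # 2-D Hadamard transform of the residual
    return int(np.abs(t).sum()) // 2
```

Compared with plain SAD, SATD better approximates the bit cost of the residual after transform coding, at the price of the extra transform per candidate.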
    43. 44. <ul><li>Half sample positions are obtained by applying a 6-tap filter . (1,-5,20,20,-5,1) </li></ul><ul><li>Quarter sample positions are obtained by averaging samples at integer and half sample positions </li></ul>H.264/AVC
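The interpolation rule above can be written out directly. This sketch follows the H.264 half-pel 6-tap filter and quarter-pel averaging over a 1-D row of luma samples, with the standard rounding and clipping:

```python
def half_pel(samples):
    """Apply the 6-tap filter (1, -5, 20, 20, -5, 1) to six consecutive
    integer-position luma samples; the result is rounded, scaled by
    1/32 and clipped to the 8-bit range [0, 255]."""
    assert len(samples) == 6
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * s for t, s in zip(taps, samples))
    return min(255, max(0, (acc + 16) >> 5))

def quarter_pel(a, b):
    """Quarter-sample value: rounded average of a neighbouring pair of
    integer- and/or half-sample positions, per the slide above."""
    return (a + b + 1) >> 1
```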
    44. 45. H.264/AVC Profiles
    45. 46. H.264/AVC Features Support for multiple reference pictures, which gives significant compression gains when motion is periodic in nature.
    46. 47. H.264/AVC Features <ul><li>PAFF (Picture-adaptive frame/field) </li></ul><ul><ul><li>Combine the two fields and code them as one single frame (frame mode). </li></ul></ul><ul><ul><li>Keep the two fields separate and code them as two individually coded fields (field mode). </li></ul></ul><ul><li>MBAFF (Macroblock-adaptive frame/field) </li></ul><ul><ul><li>The frame/field decision happens at the macroblock-pair level. </li></ul></ul>
    47. 48. H.264/AVC Features <ul><li>Flexible macro block ordering </li></ul><ul><ul><li>Picture can be partitioned into regions (slices) </li></ul></ul><ul><ul><li>Each region can be decoded independently. </li></ul></ul>
    48. 49. H.264/AVC Features <ul><li>Arbitrary slice ordering </li></ul><ul><ul><li>Since each slice can be decoded independently, slices can be sent out of order. </li></ul></ul><ul><li>Redundant pictures </li></ul><ul><ul><li>The encoder has the flexibility to send redundant pictures, which can be used to recover from loss of data. </li></ul></ul>
    49. 50. Comparison
    Feature               | MPEG-4                           | WMV9                             | H.264
    Transform             | 8x8 DCT                          | 4x4, 4x8, 8x4, 8x8               | 4x4, 8x8 (High Profile) integer DCT
    De-blocking filter    | No (optional)                    | Yes                              | Yes
    Weighted prediction   | No                               | No                               | Yes
    Reference frames      | One picture                      | Two (interlace)                  | Multiple pictures
    Entropy coding        | VLC                              | VLC                              | CAVLC, CABAC
    Intra prediction      | AC prediction (transform domain) | AC prediction (transform domain) | Spatial-domain prediction
    Prediction block size | 16x16, 8x8                       | 16x16, 16x8, 8x8, 4x4            | 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16
    50. 51. RD Comparison
    51. 52. Spatial Domain Intra Prediction <ul><li>What is Spatial Domain Intra Prediction? </li></ul><ul><li>New Approach to Prediction… </li></ul><ul><li>Advantages of the spatial domain prediction… </li></ul><ul><li>The Big Picture… </li></ul><ul><li>Intra-Prediction Modes </li></ul><ul><li>Implementation Challenges for Intra-Prediction </li></ul>
    52. 53. What is Intra Prediction… <ul><li>Intra prediction is the process of using pixel data predicted from the neighboring blocks to represent the current macro-block, so that only the prediction mode and residual are sent instead of the actual pixel data. </li></ul>[Figure: without prediction the transform engine codes the raw block values; with intra prediction the top neighbor's reconstructed samples predict the current block, and only the small residual values reach the transform engine]
    53. 54. New approach to Prediction... <ul><li>H.264/AVC uses a new approach to the prediction of intra blocks, performing the prediction in the spatial domain rather than in the frequency domain as in other codecs. </li></ul><ul><li>H.264/AVC uses the reconstructed but unfiltered data from the neighboring macroblocks to predict the samples of the current macroblock. </li></ul>
    54. 55. Advantages of spatial domain predictions… <ul><li>Intuitively, predicting pixels from the neighbouring (top/left) pixels of macro-blocks is more efficient than predicting the transform-domain values. </li></ul><ul><li>Predicting from samples in the pixel domain gives better compression for intra blocks in an inter frame. </li></ul><ul><li>Allows better compression, and hence flexible bit-rate control, by providing the flexibility to eliminate redundancies across multiple directions. </li></ul>
    55. 56. Intra Prediction Modes <ul><li>H.264/AVC supports intra-prediction for 4x4 blocks to help achieve better compression in detailed areas. </li></ul><ul><ul><li>Supports 9 prediction modes. </li></ul></ul><ul><ul><li>Supported only for luminance blocks. </li></ul></ul><ul><li>H.264/AVC also has a 16x16 mode, aimed at better compression for flat regions of a picture at a lower computational cost. </li></ul><ul><ul><li>Supports 4 prediction modes. </li></ul></ul><ul><ul><li>Supported for 16x16 luminance blocks and 8x8 chrominance blocks. </li></ul></ul>
    56. 57. LUMA 16x16 / CHROMA Intra-Prediction Modes explained...
    Intra16x16PredMode | Name
    0 | Vertical
    1 | Horizontal
    2 | DC
    3 | Plane
    57. 58. Luma 4x4 Intra-Prediction Modes explained... <ul><li>H.264/MPEG-4 AVC provides for eliminating redundancies in almost all directions using the 9 modes shown below. </li></ul>
    Intra4x4PredMode | Name
    0 | Vertical
    1 | Horizontal
    2 | DC
    3 | Diagonal_Down_Left
    4 | Diagonal_Down_Right
    5 | Vertical_Right
    6 | Horizontal_Down
    7 | Vertical_Left
    8 | Horizontal_Up
    58. 59. Luma 4x4 Intra-Prediction Modes explained...
    59. 60. Intra-Prediction Process… <ul><li>Determining the prediction mode (Only for a 4x4 block size mode). </li></ul><ul><li>Determination of samples to predict the block data. </li></ul><ul><li>Predict the block data. </li></ul>
    60. 61. Determining the prediction mode (Only for a 4x4 block size mode) <ul><li>A flag in the bit-stream indicates whether the prediction mode is present in the bit-stream or has to be implicitly derived. </li></ul><ul><li>In the implicit case, the prediction mode is the minimum of the prediction modes of neighbors ‘A’ and ‘B’. </li></ul>
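The implicit derivation described above is tiny; a Python sketch (the DC fallback for an unavailable neighbour follows common H.264 practice):

```python
def implicit_4x4_mode(mode_a, mode_b):
    """Implicitly derived Intra4x4 prediction mode: the minimum of the
    modes of neighbour blocks A (left) and B (above). When a neighbour
    is unavailable (None), DC (mode 2) is substituted."""
    DC = 2
    a = mode_a if mode_a is not None else DC
    b = mode_b if mode_b is not None else DC
    return min(a, b)
```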
    61. 62. Intra-Prediction Process… <ul><li>Determining the prediction mode (Only for a 4x4 block size mode). </li></ul><ul><li>Determination of samples to predict the block data. </li></ul><ul><li>Predict the block data. </li></ul>
    62. 63. Determination of samples to predict the block data. <ul><li>To predict a 4x4 block (a–p), a set of 13 samples (A–M) from the neighboring pixels is chosen. </li></ul><ul><li>For an 8x8 chrominance block, a set of 17 neighboring pixels is chosen as sample values. </li></ul><ul><li>Similarly, for predicting a 16x16 luminance block, a set of 33 neighboring pixels is selected as the samples. </li></ul>
    63. 64. Intra-Prediction Process… <ul><li>Determining the prediction mode (Only for a 4x4 block size mode). </li></ul><ul><li>Determination of samples to predict the block data. </li></ul><ul><li>Predict the block data. </li></ul>
    64. 65. Intra-Prediction Process… <ul><li>Horizontal prediction mode </li></ul>[Figure: with neighbor samples A–D above and I–L to the left, each row of the 4x4 block is filled with its left-neighbor sample: I I I I / J J J J / K K K K / L L L L]
    65. 66. Intra-Prediction Process… <ul><li>DC prediction mode </li></ul>[Figure: every sample of the 4x4 block is set to X, the mean of the neighbor samples A–D above and I–L to the left]
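The horizontal and DC modes shown on the two slides above reduce to a few NumPy lines; a sketch for a 4x4 luma block with left neighbours I–L and top neighbours A–D:

```python
import numpy as np

def predict_horizontal(left):
    """Horizontal mode: each row of the 4x4 block is filled with the
    reconstructed left-neighbour sample (I, J, K, L on the slide)."""
    return np.tile(np.asarray(left).reshape(4, 1), (1, 4))

def predict_dc(top, left):
    """DC mode: the whole block is set to the rounded mean of the four
    top (A-D) and four left (I-L) neighbour samples."""
    s = sum(top) + sum(left)
    return np.full((4, 4), (s + 4) >> 3)   # divide by 8 with rounding
```

The encoder evaluates such candidate predictions, picks the cheapest mode, and codes only the mode index plus the residual.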
    66. 67. Implementation challenges with the intra-Prediction… <ul><li>The dependence of a block’s prediction samples on its neighbors, which may themselves be part of the current MB, prevents parallel processing of block data. </li></ul><ul><li>Each of the 16 blocks in a given MB can choose any one of the nine prediction modes, and the entire processing changes with each mode: each mode has a totally different mathematical weighting function for deriving the predicted data from the samples. </li></ul>
    67. 68. H.264/AVC adaptive De-blocking filter <ul><li>Coarse quantization of the block-based image transform produces disturbing blocking artifacts at the block boundaries of the image. </li></ul><ul><li>The second source of blocking artifacts is motion-compensated prediction: motion-compensated blocks are generated by copying interpolated pixel data from different locations of possibly different reference frames. </li></ul><ul><li>When later P/B frames reference these images with blocky edges, the blocking artifacts propagate into the interiors of the current blocks, worsening the situation further. </li></ul>
    68. 69. H.264/AVC adaptive de-blocking filter : Impact on Reference frame Original Frame Reference frame De-blocked Reference frame
    69. 70. H.264/AVC adaptive de-blocking filter : Impact on Reference frame
    70. 71. H.264/AVC adaptive De-blocking filter : Advantage over post-processing approach. <ul><li>Ensures a certain level of quality. </li></ul><ul><li>No need for a potentially extra frame buffer at the decoder. </li></ul><ul><li>Improves both objective and subjective quality of video streams, because filtered reference frames offer higher-quality prediction for motion compensation. </li></ul>
    71. 72. H.264/AVC adaptive De-blocking filter : Introduction <ul><li>The best way to deal with these artifacts is to filter the blocky edges to obtain smoothed edges. This filtering process is known as de-block filtering. </li></ul><ul><li>Until recently, coding standards defined the de-blocking filter but did not mandate its use, as the implementation is cycle-consuming and is a function of the quality needed at the user end. </li></ul><ul><li>But it was soon found that if the de-block filter is not compulsorily implemented, frames suffer from blockiness caused in the past frames used as reference. </li></ul><ul><li>This, coupled with the increasing number-crunching power of modern DSPs, made it an easier choice for the standards body to make the de-block filter a mandatory tool in the decode loop – the IN-LOOP DEBLOCK FILTER. </li></ul><ul><li>This filter not only smooths the irritating blocky edges but also helps increase the rate-distortion performance. </li></ul>
    72. 73. H.264/AVC adaptive De-blocking filter process… <ul><li>Last process in the frame decoding, which ensures all top/left neighbors have been fully reconstructed and are available as inputs for de-blocking the current MB. </li></ul><ul><li>Applied to all 4x4 block edges except at the boundaries of the picture. </li></ul><ul><li>Filtering of the block edges of any slice can be selectively disabled by means of a flag. </li></ul><ul><li>Vertical edges are filtered first (left to right), followed by horizontal edges (top to bottom). </li></ul>
    73. 74. H.264/AVC adaptive De-blocking filter process… <ul><li>For de-blocking an edge, 8 pixel samples in all are required: 4 from one side of the edge (p3 p2 p1 p0) and 4 from the other (q0 q1 q2 q3). </li></ul><ul><li>Of these 8 samples, the de-block filter updates 6 pixels for a luminance block (p2′ p1′ p0′ | q0′ q1′ q2′) and 4 pixels for a chrominance block (p1′ p0′ | q0′ q1′). </li></ul>
    74. 75. H.264/AVC adaptive De-blocking filter, continued <ul><li>Is it just a low-pass filter? </li></ul><ul><li>We want to filter only blocking artifacts and not genuine edges!!! </li></ul><ul><li>Content-dependent boundary filtering strength. </li></ul><ul><li>Boundary strengths are a method of implementing adaptive filtering for a given edge, based on conditions involving </li></ul><ul><ul><li>MB type </li></ul></ul><ul><ul><li>Reference picture ID </li></ul></ul><ul><ul><li>Motion vector </li></ul></ul><ul><ul><li>Other MB coding parameters </li></ul></ul><ul><li>The boundary strength for a chrominance block is determined from the boundary strength of the corresponding luminance macroblock. </li></ul>
    75. 76. H.264/AVC adaptive De-blocking filter, continued <ul><li>The blocking artifacts are most noticeable in very smooth regions, where the pixel values do not change much across the block edge. </li></ul><ul><li>Therefore, in addition to the boundary strength, a filtering threshold based on the pixel values is used to determine whether the de-blocking process should be carried out for the current edge. </li></ul>
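The two-stage decision above (boundary strength from coding parameters, then a pixel-value gate with thresholds commonly called alpha and beta) can be sketched as follows. This is a simplified illustration of the H.264-style rules, not a full implementation:

```python
def boundary_strength(p_intra, q_intra, on_mb_edge,
                      p_coded, q_coded, same_ref, mv_diff_large):
    """Simplified boundary-strength (bS) derivation from the coding
    parameters listed on the slides (MB type, coded residual, reference
    picture, motion vectors). 4 = strongest filtering, 0 = none."""
    if (p_intra or q_intra) and on_mb_edge:
        return 4                      # intra block on a macroblock edge
    if p_intra or q_intra:
        return 3                      # intra block on an internal edge
    if p_coded or q_coded:
        return 2                      # coded residual on either side
    if not same_ref or mv_diff_large:
        return 1                      # different refs or large MV difference
    return 0                          # edge is skipped entirely

def should_filter(bs, p0, q0, p1, q1, alpha, beta):
    """Pixel-value gate: filter only when the edge looks like a blocking
    artifact (small local gradients) rather than a genuine image edge."""
    return (bs > 0 and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta and abs(q1 - q0) < beta)
```

In the standard, alpha and beta are looked up from the quantization parameter, so stronger quantization (more blocking) permits stronger filtering.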
    76. 77. <ul><li>THANK YOU </li></ul><ul><li>Hope It was Fun!!!! </li></ul>
