The document discusses video compression and the human visual system (HVS). It describes how the HVS processes light and forms images, including properties like spatial and temporal resolution. Color perception and visual perception factors like viewing distance are also covered. Common image and video formats are explained, such as RGB, YCbCr, and frame rates. Video compression takes advantage of spatial, temporal, and spectral redundancy to reduce file sizes. Transform-based methods like DCT and wavelets are widely used.
The document provides an overview of the High Efficiency Video Coding (HEVC) standard. Some key points:
- HEVC was created as a new video compression standard to address the growing needs of higher resolution video content and more efficient compression compared to prior standards like H.264.
- It achieves roughly a 50% bitrate reduction over H.264 for the same visual quality, or improved quality at the same bitrate.
- The standard uses a block-based coding structure with coding tree units and supports intra-frame and inter-frame coding with motion estimation/compensation.
- It introduces more intra-prediction modes and block sizes along with improved transforms, quantization, and entropy coding.
Trends and Recent Developments in Video Coding Standardization, by Mathias Wien
This document summarizes a tutorial on trends and recent developments in video coding standardization. It discusses the history of video coding standards organizations and the standards they have developed. These include MPEG-1, H.261, H.262, H.264, H.265 and the upcoming H.266 Versatile Video Coding standard. The document outlines the tutorial, which will cover topics like video resolutions, current compression techniques, VVC, and future trends in areas like multi-camera coding.
This document provides an overview of HEVC (High Efficiency Video Coding) including:
- HEVC aims to provide roughly half the bitrate of H.264/AVC at the same quality.
- It uses block-based hybrid video coding with improved intra-prediction, transform, quantization and entropy coding techniques.
- HEVC supports a wide range of resolutions, color spaces and bit depths for 4K and beyond.
Video coding standards define bitstream structures and decoding methods for video compression. Popular standards include MPEG-1/2/4 and H.264/HEVC developed by ISO/IEC and ITU-T. Standards are developed through identification of requirements, algorithm development, selection of core techniques, validation testing, and publication. They enable interoperability and future decoding of emerging standards.
The document provides an overview of the High Efficiency Video Coding (HEVC) standard. It was developed jointly by ISO/IEC and ITU-T to provide roughly half the bit-rate of H.264/AVC for the same subjective quality. Key aspects of HEVC include use of larger block sizes, intra-picture prediction with 33 directional modes, motion vectors with quarter-sample precision, transform sizes from 4x4 to 32x32, adaptive coefficient scanning, in-loop filtering including deblocking and sample adaptive offset, and support for lossless and transform skipping modes. Many companies are starting to support HEVC in their video products and services.
An Overview of High Efficiency Video Codec HEVC (H.265), by Varun Ravi
The document provides an overview of the High Efficiency Video Coding (HEVC) H.265 standard. It discusses the need for improved video compression standards due to increasing video content and limited bandwidth. HEVC was developed to meet this need by providing around 50% better compression over its predecessor H.264 while still maintaining high video quality. The document describes the various techniques used in HEVC such as improved block partitioning, transform sizes, prediction modes, and entropy coding that help achieve its compression gains. Both hardware and software implementations of HEVC decoders and encoders are discussed.
Video coding is an essential component of video streaming, digital TV, video chat and many other technologies. This presentation, an invited lecture to the US Patent and Trademark Office, describes some of the key developments in the history of video coding.
Many of the components of present-day video codecs were originally developed before 1990. From 1990 onwards, developments in video coding were closely associated with industry standards such as MPEG-2, H.264 and H.265/HEVC.
The presentation covers:
- Basic concepts of video coding
- Fundamental inventions prior to 1990
- Industry standards from 1990 to 2014
- Video coding patents and patent pools.
HEVC/H.265 is a video compression standard that provides around 50% better compression over H.264/AVC for the same level of video quality. It was finalized in 2013 by the joint collaboration of MPEG and ITU-T. Key features of HEVC include support for higher resolutions like 4K and 8K, improved parallel processing abilities, increased coding efficiency through larger block sizes and an expanded set of prediction modes.
1) The document discusses the high-level syntax of HEVC, including the video parameter set (VPS), sequence parameter set (SPS), and picture parameter set (PPS).
2) It describes the bitstream structure and how VPS, SPS, PPS, and slice data are organized in network abstraction layer (NAL) units.
3) Key coding units like coding tree blocks (CTBs), coding blocks (CBs), and coding units (CUs) are defined, as well as the quadtree partitioning syntax used in HEVC.
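The quadtree partitioning of coding tree blocks can be sketched in miniature. This is a hypothetical illustration, not HEVC syntax: the variance threshold below stands in for a real encoder's rate-distortion split decision, and `quadtree_split` is an invented name.

```python
import numpy as np

def quadtree_split(block, min_size=8, threshold=400.0):
    """Return a list of (y, x, size) leaf coding blocks for a square CTB."""
    def recurse(y, x, size):
        sub = block[y:y + size, x:x + size]
        # Stop splitting at the minimum CU size, or when the block is
        # homogeneous enough (toy criterion: low sample variance).
        if size <= min_size or np.var(sub) < threshold:
            return [(y, x, size)]
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += recurse(y + dy, x + dx, half)
        return leaves
    return recurse(0, 0, block.shape[0])

# A flat 64x64 CTB stays a single coding block; a noisy one splits further.
flat = np.zeros((64, 64))
print(quadtree_split(flat))  # [(0, 0, 64)]
```

A real encoder replaces the variance test with a comparison of the rate-distortion cost of coding the block whole versus coding its four sub-blocks.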
Introduction to H.264 Advanced Video Compression, by Iain Richardson
The document discusses H.264 advanced video compression. It provides an agenda that covers what H.264 is, how it works through prediction, transform and quantization techniques, its syntax, examples, and going deeper into its implementation. H.264 is widely used for video compression in broadcast digital TV, DVDs/Blu-Rays, IPTV, web video and mobile video. It works by predicting pixels from previous frames, applying transforms and quantization to remove redundant information, and using entropy coding techniques to further compress the data. The document provides resources to learn more about H.264 standards, implementations, and extensions.
The document discusses the H.264 video compression standard. It provides an overview of the standard, including its objectives to improve compression performance over previous standards. Key features that allow for superior compression compared to other standards are described, such as enhanced motion estimation and an improved deblocking filter. Performance comparisons show H.264 can provide bit rate savings of up to 50% compared to other standards like MPEG-2 and H.263.
The document discusses video compression basics and MPEG-2 video compression. It explains that video frames contain redundant spatial and temporal data that can be compressed. MPEG-2 uses three frame types (I, P, B frames) and compresses frames using intra-frame and inter-frame encoding techniques like DCT, quantization, and entropy encoding to remove redundancy. The encoding process transforms raw video frames to compressed bitstreams for efficient storage and transmission.
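The I/P frame idea above can be illustrated with a toy codec that stores I-frames whole and P-frames as differences from the previous frame. This is a sketch of temporal-redundancy removal only: motion compensation and B-frames are omitted, and `encode_gop`/`decode_gop` are invented names, not MPEG-2 APIs.

```python
import numpy as np

def encode_gop(frames, pattern="IPPP"):
    # I-frames stored whole; P-frames stored as the difference from the
    # previous frame (temporal redundancy removal, no motion search).
    coded, prev = [], None
    for i, f in enumerate(frames):
        if prev is None or pattern[i % len(pattern)] == "I":
            coded.append(("I", f.copy()))
        else:
            coded.append(("P", f - prev))
        prev = f
    return coded

def decode_gop(coded):
    # Rebuild each frame from the stored I-frame or accumulated differences.
    frames, prev = [], None
    for kind, data in coded:
        prev = data if kind == "I" else prev + data
        frames.append(prev)
    return frames
```

When consecutive frames are similar, the P-frame difference arrays are mostly near zero and compress far better than the raw frames would.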
Video Compression Standards - History & Introduction, by Champ Yen
This document provides an overview of several video compression standards including MPEG-1/2, MPEG-4, H.264, and HEVC/H.265. It discusses the key concepts of video coding such as entropy coding, quantization, transformation, and intra- and inter-prediction. For each standard, it describes the main coding tools and improvements over previous standards, focusing on techniques for more efficient prediction and extraction of redundant spatial and temporal information while maintaining quality. The development of these standards has moved towards more fine-grained partitioning and new coding ideas and tools to reduce bitrates further.
Outline:
a. MediaPlayer Subsystem
b. Related Files
c. MediaPlayer Frame of Playing Flow
-StageFright and AwesomePlayer Relation
-AwesomePlayer Frame and Playing Flow
d. Simple Playing Implementation
This presentation discusses the basics of video compression, such as the DCT, color space conversion, and motion compensation. It also discusses standards such as H.264, MPEG-2, and MPEG-4.
JPEG compression is a lossy compression technique that exploits human visual perception. It works by:
1) Splitting images into blocks and applying the discrete cosine transform (DCT) to each block to de-correlate pixel values.
2) Quantizing the resulting DCT coefficients, discarding less visible high-frequency data.
3) Entropy coding the quantized DCT coefficients using techniques like run-length encoding and Huffman coding to further compress the data.
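Steps 1 and 2 can be sketched as follows, assuming 8x8 blocks, an orthonormal DCT-II, and the example luminance quantization table from the JPEG specification (Annex K); `jpeg_forward` is an invented helper name.

```python
import numpy as np

# Example luminance quantization table from the JPEG specification (Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix (rows are basis vectors).
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def jpeg_forward(block):
    # Level-shift, 2-D DCT, then quantize: high frequencies get coarse steps.
    d = dct_matrix(8)
    coeffs = d @ (block - 128.0) @ d.T
    return np.round(coeffs / Q_LUMA).astype(int)

# A flat block keeps only its DC coefficient; every AC value quantizes to zero.
flat = np.full((8, 8), 80.0)
print(jpeg_forward(flat)[0, 0])  # round(8 * (80 - 128) / 16) = -24
```

The larger divisors toward the bottom-right of `Q_LUMA` are what discard the less visible high-frequency detail.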
The document discusses audio compression techniques. It begins with an introduction to pulse code modulation (PCM) and then describes μ-law and A-law compression standards which compress audio using companding algorithms. It also covers differential PCM and adaptive differential PCM (ADPCM) techniques. The document then discusses the MPEG audio compression standard, including its encoder architecture, three layer standards (Layers I, II, III), and applications. It concludes with a comparison of various MPEG audio compression standards and references.
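The μ-law companding idea is compact enough to sketch. This is the continuous μ = 255 characteristic, not the segmented 8-bit approximation used in actual G.711 codecs, and the function names are illustrative.

```python
import numpy as np

MU = 255.0  # parameter of the North American / Japanese mu-law standard

def mu_compress(x):
    # Companding: boost quiet samples logarithmically; x normalised to [-1, 1].
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_expand(y):
    # Exact inverse of mu_compress.
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = np.linspace(-1, 1, 9)
print(np.allclose(mu_expand(mu_compress(x)), x))  # True
```

Because quiet samples are expanded before uniform quantization and restored afterwards, the effective quantization steps are finer where the ear is most sensitive.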
H.261 is a video coding standard published in 1990 by ITU-T for videoconferencing over ISDN networks. It uses techniques like the DCT, motion compensation, and entropy coding to achieve compression ratios over 100:1 for video calling. H.261 remains widely used in applications like Windows NetMeeting and the video conferencing standards H.320, H.323, and H.324.
Video compression techniques exploit various types of redundancy in video signals to reduce the data required to represent them. Key techniques include intra-frame compression which uses spatial redundancy within frames via DCT, inter-frame compression which uses temporal redundancy between consecutive frames by encoding differences, and motion compensation which accounts for motion between frames. Popular video compression standards like MPEG use a combination of these techniques including I, P and B frames along with motion estimation to achieve much higher compression ratios than image compression alone.
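A minimal sketch of the motion-estimation idea described above: exhaustive block matching by sum of absolute differences (SAD) over a small search window. Real encoders use faster search patterns and sub-pixel refinement; `motion_search` is an invented name.

```python
import numpy as np

def motion_search(ref, cur, by, bx, bsize=8, radius=4):
    """Find the motion vector (dy, dx) minimising SAD within +/- radius."""
    block = cur[by:by + bsize, bx:bx + bsize].astype(np.int64)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(block - ref[y:y + bsize, x:x + bsize]).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (32, 32))
cur = np.roll(ref, (-2, 1), axis=(0, 1))  # content shifted by (2, -1)
print(motion_search(ref, cur, 8, 8))  # ((2, -1), 0)
```

The encoder then transmits only the motion vector plus the (here zero) prediction residual, instead of the raw block.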
Audio Compression Techniques
A type of lossy or lossless compression in which the amount of data in a recorded waveform is reduced for transmission, with or without some loss of quality; used in CD and MP3 encoding and Internet radio.
Dynamic range compression, also called audio level compression, reduces the dynamic range of an audio waveform, the difference between its loudest and quietest parts.
This document provides an overview of various video compression techniques and standards. It discusses fundamentals of digital video including frame rate, color resolution, spatial resolution, and image quality. It describes different compression techniques like intraframe, interframe, and lossy vs lossless. Key video compression standards discussed include MPEG-1, MPEG-2, MPEG-4, H.261, H.263 and JPEG for still image compression. Factors that impact compression like compression ratio, bit rate control, and real-time vs non-real-time are also summarized.
Comparison between JPEG (DCT) and JPEG 2000 (DWT) compression standards, by Rishab2612
This topic comes under image processing. A comparison between the JPEG and JPEG 2000 compression standard techniques is made. The PPT comprises results, analysis, and conclusion along with the relevant outputs.
The document discusses video compression techniques. It describes video compression as removing repetitive images, sounds, and scenes to reduce file size. There are two types: lossy compression which removes unnecessary data, and lossless compression which compresses without data loss. Common techniques involve predicting frames, exploiting temporal and spatial redundancies, and standards like MPEG. Applications include cable TV, video conferencing, storage media. Advantages are reduced file sizes and faster transfer, while disadvantages are recompilation needs and potential transmission errors.
A description of image compression: the types of redundancy present in images, the two classes of compression techniques, and four different lossless image compression techniques with proper diagrams (Huffman, Lempel-Ziv, run-length coding, arithmetic coding).
The document provides an overview of JPEG image compression. It discusses that JPEG is a commonly used lossy compression method that allows adjusting the degree of compression for a tradeoff between file size and image quality. The JPEG compression process involves splitting the image into 8x8 blocks, converting color space, applying discrete cosine transform (DCT), quantization, zigzag scanning, differential pulse-code modulation (DPCM) on DC coefficients, run length encoding on AC coefficients, and Huffman coding for entropy encoding. Quantization is the main lossy step that discards high frequency data imperceptible to human vision to achieve higher compression ratios.
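The zigzag scan and run-length steps can be sketched as follows. The end-of-block handling is simplified relative to real JPEG entropy coding, which pairs zero-runs with Huffman-coded size categories; both function names are illustrative.

```python
def zigzag_order(n=8):
    # JPEG zigzag: walk diagonals of constant y+x, alternating direction.
    key = lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1])
    return sorted(((y, x) for y in range(n) for x in range(n)), key=key)

def run_length_ac(block):
    # (zero-run, value) pairs over the 63 AC coefficients; the trailing
    # (0, 0) loosely stands in for JPEG's end-of-block symbol.
    seq = [block[y][x] for y, x in zigzag_order(len(block))][1:]
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][1] = 5, -3, 2
print(run_length_ac(block))  # [(0, -3), (2, 2), (0, 0)]
```

Scanning in zigzag order groups the surviving low-frequency coefficients at the front, so the long tail of quantized zeros collapses into a single end-of-block symbol.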
This document provides an overview and comparison of the H.264 and HEVC video coding standards. It describes the key features and innovations that allow each standard to compress video more efficiently than previous standards. H.264 introduced features like adaptive block sizes, multi-frame prediction, quarter-pixel motion compensation and loop filtering that improved compression performance over prior standards. HEVC aims to further increase compression efficiency through innovations such as larger coding tree blocks, additional intra-prediction modes, and improved entropy coding. The document analyzes these standards to understand how their new coding tools enable significantly higher compression ratios and support for new applications like higher resolution video.
Multimedia elements (like audio or video) are stored in media files.
The most common way to discover the type of a file is to look at the file extension.
Multimedia files have formats and different extensions like: .wav, .mp3, .mp4, .mpg, .wmv, and .avi.
The Importance of Terminology and sRGB Uncertainty - Notes - 0.5, by Thomas Mansencal
The organised and formatted embodiment of the Colour Science notes I have taken along the years.
It is aimed at the VFX industry, and is the work-in-progress subset of a broader and generic Colour Science presentation.
This document provides an overview of color video signals and color perception by the human visual system. It discusses:
1. The sensitivity of human cone cells to different wavelengths of light and how this determines color perception.
2. How color video signals like YUV, RGB, and composite video encode color and brightness information.
3. Standards for analog color television transmission including NTSC, PAL, and SECAM which differ in aspects like lines, frame rate, and color encoding.
This document provides information about an image processing course. The key details are:
- The course number is CSC 447 and is taught over 3 lecture hours and 2 lab hours. It is worth 65 marks and has a 3 hour exam.
- The course covers topics like image processing applications, enhancement techniques, restoration, segmentation, and scene analysis. It also covers specific techniques like using neural networks and parallel algorithms for image processing.
- The textbook for the course is "Digital Image Processing Using Matlab" by Rafael Gonzalez and Richard Woods. There are 11 lab assignments focused on topics like image display, filtering, transforms, and color conversion using Matlab.
- The course is taught by
Cathode ray tubes use an electron gun and phosphors to display pixels on a screen. The electron beam is focused and deflected horizontally and vertically to scan across the screen. Raster scan displays refresh the screen by scanning across rows of pixels stored in a frame buffer. Random scan displays draw images by moving and drawing the electron beam between points. Color CRTs use three electron guns and a shadow mask to display red, green, and blue pixels. LCDs use liquid crystals that can be aligned to block or transmit light, while plasma displays excite gas-filled capsules that emit UV light to illuminate phosphors.
This document outlines a course on Computer Graphics and Visualization (CSE304). It provides details on the subject teacher, textbook, schedule, assessments, topics to be covered in the course's 6 units, and expected learning outcomes. Students will learn about 2D and 3D computer graphics tools and techniques, apply algorithms for transformations and projections, and explore visibility, shading, curves, and object representation. Assessments include tests, a mandatory mini project in OpenGL, and a mid-term and end-term exam. Upon completing the course, students will have skills in various areas of computer graphics.
This document provides information about various camera settings and technologies for capturing clear images, including:
1. Clear Scan helps eliminate banding caused when a camera's frame rate does not match a CRT display's refresh rate.
2. Slow Shutter extends the camera's exposure time to produce blur effects or allow more light in low-light scenes.
3. Super Sampling uses a 1080p camera to produce sharper 720p images by maintaining higher frequency response.
4. Detail correction adds a spike-shaped detail signal to make edges appear sharper without degrading resolution. Settings like detail level and H/V ratio control the amount and balance of detail correction.
5. Other topics covered
Basics of Colour Television and Digital TV, by janakiravi
Main characteristics of the human eye with regard to the perception and mixing of colours; the three standards of colour transmission systems; CATV, DTH, HDTV, and smart TV.
For TS-SBTET, C-18, DECE 6 Unit, By Nenavath Ravi Kumar, MIST Hyderabad
This presentation is focused on basic understanding of video signal generation and its electronic interpretation. Contents are taken from bible of television!
This presentation is dedicated to R R Gulati.
This document provides an overview of color theory and color models used in digital images and video. It discusses how the human visual system perceives color and light, and various color spaces such as RGB, YUV, YCbCr. The document also covers color decimation, packing, and conversions between different color formats like 4:4:4, 4:2:2, 4:2:0. Hands-on exercises demonstrate repacking video files between different color models and formats using FFmpeg.
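The colour conversion and decimation described above can be sketched with the full-range BT.601 matrix used by JPEG/JFIF; broadcast video normally uses limited-range variants, and the function names below are illustrative.

```python
import numpy as np

# Full-range BT.601 RGB -> YCbCr matrix, as used by JPEG/JFIF.
M_601 = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])

def rgb_to_ycbcr(rgb):
    # rgb: float array with channels last, values 0-255.
    ycc = rgb @ M_601.T
    ycc[..., 1:] += 128.0  # centre the chroma channels
    return ycc

def subsample_420(chroma):
    # 4:2:0 decimation: average each 2x2 neighbourhood of a chroma plane.
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(rgb_to_ycbcr(np.array([255.0, 255.0, 255.0])))  # white -> Y=255, Cb=Cr=128
```

Separating luma from chroma is what makes 4:2:2 and 4:2:0 decimation possible: the eye's lower chroma acuity lets the Cb/Cr planes be stored at reduced resolution.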
Video and television systems work by presenting a sequence of images rapidly enough that the human eye perceives them as continuous motion. Different regions use different television standards that determine aspects like the number of lines, frames per second, and color systems. Video compression codecs like MPEG remove spatial and temporal redundancy to greatly reduce file sizes for storage and transmission while maintaining adequate quality.
Adaptive Median Filters
Elements of visual perception
Representing Digital Images
Spatial and Intensity Resolution
cones and rods
Brightness Adaptation
Spatial and Intensity Resolution
This document discusses fundamental concepts in digital video. It begins by explaining the differences between analog and digital video, and how digital video allows for direct access and repeated recording without quality degradation. It then examines various digital video standards including CCIR 601, CIF, and QCIF. It provides details on chroma subsampling ratios and how they reduce data requirements. The document also covers high-definition television standards and aims to increase the visual field rather than definition per unit area.
The document discusses various concepts related to capturing digital media, including:
1) Digital sampling breaks sounds and images into discrete data points that can be stored and reconstructed, with higher sampling rates providing more detail.
2) Video is created by lenses focusing light through an aperture onto a CCD sensor, which converts the light information into digital signals for color and brightness.
3) Color is represented digitally through combinations of red, green, and blue values, while print uses cyan, magenta, yellow, and black inks.
4) Microphones convert sound wave vibrations into electrical signals using transducers, and can have different pickup patterns and connection types.
This document discusses key elements that contribute to high quality image production, including spatial resolution, frame rate, dynamic range, color gamut, bit depth, and compression artifacts. It examines these elements in the context of 4K and 8K broadcast cameras and their advantages over HD. Factors like wider viewing angles, increased perceived motion, and benefits for nature documentaries are cited as motivations for 8K. Technical details covered include lens flange back distance, flare, shading, chromatic aberration, and testing procedures. Overall quality is represented as a function of these various image quality factors.
This document discusses various aspects of text and multimedia, including:
- Text attributes that can be changed like font, style, size, color, and effects to emphasize text.
- Common font types like serif, sans-serif, and script and their uses.
- Text formatting considerations like leading, kerning, and readability.
- Using text and its design to set mood and complement graphics in a multimedia project.
The document discusses various aspects of digital video technology including:
1) Digital video recording principles such as how images are captured using CCD sensors and converted into digital files for storage.
2) Television standards and connection systems including color encoding systems, aspect ratios, and connection types like SCART, HDMI, etc.
3) Broadcast systems including terrestrial, satellite, and multiplex broadcasting which allows multiple signals to be transmitted simultaneously.
4) Elements of producing video such as how cameras capture light and focus images, microphones capture sound, and controls like shutter speed, aperture, and white balance affect the image.
5) Digital editing principles involving converting analog to digital, compression formats, and linear
This document discusses color image processing and covers several topics:
- The electromagnetic spectrum and how color is perceived by the human visual system.
- Common color models like RGB, CMY, HSI and how to convert between them.
- Color fundamentals including hue, saturation, brightness.
- Pseudocolor image processing to assign color to monochrome images.
- Full color image processing using color models like HSI.
- The modulation transfer function (MTF) and how it relates to the image contrast sensitivity of the visual system.
The document provides an overview of graphics systems and their components. It discusses four major tasks for rendering geometric objects: modeling, geometric processing, rasterization, and hidden surface removal. It also outlines the major sections which discuss input devices, hard-copy devices, video display devices, and graphics workstations.
The document provides an overview of graphics systems and their components. It discusses four major tasks for rendering geometric objects: modeling, geometric processing, rasterization, and hidden surface removal. It also outlines the major sections which discuss input devices, hard-copy devices, video display devices, and graphics workstations.
Light ID based Interactive Exhibition Using Smart Glass and AR TechnologyVinayagam Mariappan
Digital technologies provide a valuable and engaging element of the exhibition visitor experience. At their best, digital displays and computer-driven interactives form an integrated part of the exhibition narrative and reinforce the learning objectives of the exhibition. This presentation covers Light ID based Interactive exhibition using smart class and AR for immersive and interactive exhibit.
Enabling secure management and distribution of live, linear and on demand video, Video Cloud migrates traditional broadcast transmission, cloud based media management, security and online streaming capabilities into a scalable, cloud-based alternative to traditional premise-based video delivery architectures.
OpenVLC is an open-source, software-defined, flexible, low-cost Visible Light Communication platform. The software solution is implemented as a Linux driver that can communicate directly with the cape and the Linux networking stack.Visible Light Communication, sometimes also referred to as “LiFi" uses standard off-the-shelf visible light LEDs to transmit data using the visible light spectrum.
This document provides an overview of visible light communications (VLC) using white LED lights. It discusses applications of VLC including indoor networking and discusses advantages like health safety, security, and lack of interference. It describes the VLC channel model and challenges like connectivity during movement, multi-user support, dimming control, and shadowing. Solutions to these challenges include handover techniques for mobility, time/code division multiple access for multi-user, and pulse width modulation or modified pulse position modulation for dimming control. Indoor VLC configurations and signal distribution methods are also summarized.
Ultra HD, or UHDTV, is the next generation television standard beyond HDTV and 4K. It provides viewers with a superior sense of reality through higher resolution, higher frame rates, more colors, and higher dynamic range. Key aspects of UHDTV include 8K resolution displays and broadcasting, wide color gamut, high dynamic range, and next-generation 22.2 channel surround sound. Standardization efforts are underway to define UHDTV formats, compression, transport, and end-to-end solutions. Early adopters like Japan and Korea are conducting trials of 8K satellite and terrestrial broadcasting using HEVC video coding and DVB transmission standards.
Vinayagam M is the director of eSilicon Labs in India. The document discusses various media services including IPTV, OTT services like Netflix, content delivery networks, and adaptive bitrate streaming. It provides details on how IPTV and OTT services work, differences between the two, examples of major OTT providers like Netflix, and technologies used for adaptive bitrate streaming including HLS, HDS, and MPEG-DASH.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
5. 5
HVS
• HVS properties influence the
design/tradeoffs of imaging/video
systems
• Basic properties of HVS “front-end”
– 4 types of photo-receptors in the retina
– Rods, 3 types of cones
• Rods
– Achromatic (no concept of color)
– Used for scotopic vision (low light levels)
– Concentrated in periphery
• Cones
– 3 types: S - Short, M- Medium, L - Long
– Red, Green, and Blue peaks
– Used for Photopic Vision (daylight levels)
– Concentrated in fovea (center of the
retina)
6. 6
HVS…
• Eyes, optic nerve, parts of the brain
• Transforms electromagnetic energy into neural signals
• Image Formation
– Cornea, Sclera, Pupil, Iris, Lens, Retina, Fovea
• Transduction
– Retina, Rods, and Cones
• Processing
– Optic Nerve, Brain
• Retina and Fovea
– Retina has photosensitive receptors at back of eye
– Fovea is small, dense region of receptors
Only cones (no rods)
Gives visual acuity
– Outside Fovea
Fewer receptors overall
Larger proportion of rods
(Diagram: the fovea as a small region within the retina)
7. 7
HVS…
• Transduction (Retina)
– Transform light to neural impulses
– Receptors signal bipolar cells
– Bipolar cells signal ganglion cells
– Axons of the ganglion cells form the optic nerve
• Image Formation in the Human Eye
8. 8
HVS…
• HVS Properties
– Tradeoff in resolution between space and time
Low resolution for high spatial AND high temporal frequencies
However, eye tracking can convert a fast-moving object into a low retinal frequency
– Achromatic versus chromatic channels
Achromatic channel has highest spatial resolution
Yellow/Blue has lower spatial resolution than Red/Green channel
– Color refers to how we perceive a narrow band of electromagnetic energy
Source, Object, Observer
10. 10
HVS…
• Color Perception (Color Theory)
– Hue
Distinguishes named colors, e.g., red, green, blue
Dominant wavelength of the light
– Saturation
Perceived purity (vividness) of a specific color
How far color is from a gray of equal intensity
– Brightness (lightness)
Perceived intensity
(Diagram: hue scale, with saturation and lightness variations of an original color)
11. 11
HVS…
• Visual Perception
– Resolution and Brightness
– Spatial Resolution depends on
Image Size
Viewing Distance
– Brightness
Sensitivity to brightness is higher than sensitivity to color
Different perception of primary colors
Relative Brightness: green:red:blue=59%:30%:11%
– B/W vs. Color
12. 12
HVS…
• Visual Perception
– Temporal Resolution
Effects caused by the persistence of vision (inertia of the human eye)
About 16 frames/second is perceived as a continuous sequence
Special Effect: Flicker
Flicker
Perceived if the frame rate or refresh rate of the screen is too low (<50 Hz)
Especially in large bright areas
Higher refresh rate requires
Higher scanning frequency
Higher bandwidth
13. 13
HVS…
• Visual Perception Influence
– Viewing distance
– Display aspect ratio (width/height; 4:3 for conventional TV)
– Number of details still visible
– Intensity (luminance)
14. 14
HVS…
• Imaging / Visual System designed
based on HVS principles
• Example
– Image Sensor
– Television
– Image / Video Display
• Image Sensor
– CCD (charge coupled device):
Arrays of photo diodes
Linearity
Less light needed
Electronic shuttering
– CMOS
Cheaper
Easy manufacturing
• Television
– NTSC (National Television System
Committee):
60 Hz, 30 fps, 525 scan lines
North America, Japan, Korea ….
– PAL (Phase Alternating Line):
50 Hz, 25 fps, 625 scan lines
Europe …
• Image / Video Display
– CRT Monitor
– LCD TV/Display Monitor
16. 16
IMAGE / VIDEO
• Images
– A view observed by the HVS at a time instant
– A multidimensional array of numbers (such as intensity image) or vectors
(such as color image)
Each component in the image is called a pixel and is associated with a pixel
value (a single number in the case of intensity images, or a vector in the
case of color images)
(Example: an array of numeric pixel values)
18. 18
IMAGE / VIDEO…
• Images / Video Frame
– A multidimensional function of spatial coordinates
– Spatial Coordinate
(x,y) for 2D case such as photograph,
(x,y,z) for 3D case such as CT scan images
(x,y,t) for movies
– The function f may represent intensity (for monochrome images) or color
(for color images) or other associated values
(Figure: the image “After snow storm” represented as f(x,y), with the origin and the x and y axes marked)
19. 19
IMAGE / VIDEO…
• Images / Video Frame
– An image that has been discretized both in Spatial coordinates and
associated value
Consists of 2 sets: (1) a point set and (2) a value set
Can be represented in the form
– I = {(x, a(x)) : x ∈ X, a(x) ∈ F}
where X and F are a point set and value set, respectively
An element of the image, (x,a(x)) is called a pixel
where
x is called the pixel location and
a(x) is the pixel value at the location x
– Conventional Coordinate for Image Representation
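As an illustrative sketch (not from the slides), the point-set/value-set definition I = {(x, a(x))} maps naturally onto a Python dict keyed by pixel location; the example locations and values below are hypothetical:

```python
# An image as a point set X and value set F: I = {(x, a(x)) : x in X, a(x) in F}.
# Here each key is a pixel location x and each value is the pixel value a(x).
image = {
    (0, 0): 39, (0, 1): 87, (0, 2): 15,
    (1, 0): 22, (1, 1): 13, (1, 2): 25,
}

def pixel_value(img, location):
    """Return a(x), the pixel value at location x."""
    return img[location]

print(pixel_value(image, (1, 1)))  # 13
```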
20. 20
IMAGE / VIDEO…
• Images / Video Frame Representation
– Basic Unit : Pixel
– Dimensions
Height
Width
– Frame rate determines how long each pixel value persists on screen,
i.e. how motion is rendered
– Color Depth of the pixel
How many bits are used to represent the color of
each pixel?
21. 21
IMAGE / VIDEO…
• Image Type
– Binary Image
– Intensity Image
– Color Image
– Index image
22. 22
IMAGE / VIDEO…
• Binary Image
– Binary image or black and white image
– Each pixel contains one bit
1 represents white
0 represents black
1111
1111
0000
0000
Binary Data
25. 25
IMAGE / VIDEO…
• Index Image
– Each pixel contains an index number pointing to a color in a color table
(Example: a grid of index values, each pointing into the color table below)
Color Table
Index No. | Red component | Green component | Blue component
    1     |      0.1      |       0.5       |      0.3
    2     |      1.0      |       0.0       |      0.0
    3     |      0.0      |       1.0       |      0.0
    4     |      0.5      |       0.5       |      0.5
    5     |      0.2      |       0.8       |      0.9
    …     |       …       |        …        |       …
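A minimal sketch of the index-image lookup: each pixel stores only an index, and the display expands it through the color table. The table entries follow the slide; the index grid itself is hypothetical:

```python
# Color table: index number -> (R, G, B) components, values as in the slide.
color_table = {
    1: (0.1, 0.5, 0.3),
    2: (1.0, 0.0, 0.0),
    3: (0.0, 1.0, 0.0),
    4: (0.5, 0.5, 0.5),
    5: (0.2, 0.8, 0.9),
}

# Index image: each pixel holds only an index into the table (example values).
index_image = [
    [2, 5, 2],
    [4, 1, 3],
]

# Expand the index image into a full RGB image, row by row.
rgb_image = [[color_table[i] for i in row] for row in index_image]
print(rgb_image[0][1])  # (0.2, 0.8, 0.9)
```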
26. 26
IMAGE / VIDEO…
• Colourspace Representations
– RGB (Red, Green, Blue) – Basic analog components (from camera/to TV)
– YPbPr (Y, B-Y, R-Y) – ANALOG Colourspace (derived from RGB)
Y = Luminance, B = Blue, R = Red
– YUV – Colour difference signals scaled to be modulated on a composite
carrier
– YIQ – Used in NTSC. I = In-phase, Q = Quadrature (IQ plane is a 33° rotation
of the UV plane)
– YCbCr/YCC – DIGITAL representation of the YPbPr Colourspace (8-bit, 2's
complement)
27. 27
IMAGE / VIDEO…
• RGB Color
– All color can be composed by adding specific amounts of R, G, & B
– 8 bits (2^8 = 256 levels) specify the amount of each color
– This is the scheme used by most electronic displays to generate color;
e.g. we often call our computer monitors, "RGB displays"
8-bits Red
8-bits Green
8-bits Blue
28. 28
IMAGE / VIDEO…
• Color Reduction
– Human eye is not as sensitive to color as it is to Luminance
– To this end, to save costs the various standards decided to
Maintain luminance information in our images, but Reduce color information
Using RGB, though, how do we easily reduce color information without
removing luminance?
For this, and other technical reasons, a separate color space was chosen by
most video standards …
29. 29
IMAGE / VIDEO…
• Colour Image: RGB
• YCbCr
– Even though most displays actually
use RGB to create the image, YCbCr
is used most often in consumer
electronics for transmission of the
image
– Historically, B/W televisions
transmitted only luminance (Y)
– The color signals were added later
30. 30
IMAGE / VIDEO…
• YCbCr Generated By Sub sampling
– YUV 4:4:4 = 8 bits per Y, U, V channel (no downsampling of the chroma
channels)
– YUV 4:2:2 = 4 Y pixels sampled for every 2 U and 2 V (2:1 horizontal
downsampling, no vertical downsampling)
– YUV 4:2:0 = 2:1 horizontal downsampling, 2:1 vertical downsampling
– YUV 4:1:1 = 4 Y pixels sampled for every 1 U and 1 V (4:1 horizontal
downsampling, no vertical downsampling)
• YUV 4:4:4
Y Y Y Y
Y Y Y Y
4:4:4 Format (3 bytes/pixel):
Cb Cr Cb Cr Cb Cr Cb Cr
Cb Cr Cb Cr Cb Cr Cb Cr
31. 31
IMAGE / VIDEO…
• YUV 4:2:2
• YUV 4:2:0
Y Y Y Y
Y Y Y Y
4:2:2 Format (2 bytes/pixel):
Cb Cr
Cb Cr
Cb Cr
Cb Cr
Y Y Y Y
Y Y Y Y
Cb Cr Cb Cr
4:2:0 Format (1.5 bytes/pixel):
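The bytes/pixel figures above (3, 2, and 1.5) follow directly from how many chroma samples survive the subsampling. A small sketch, assuming 8 bits per sample and even frame dimensions:

```python
# Bytes needed for one width x height frame at 8 bits per sample,
# for the three subsampling formats shown above.
def frame_bytes(width, height, fmt):
    luma = width * height                       # one Y sample per pixel
    chroma_per_channel = {
        "4:4:4": width * height,                # no chroma downsampling
        "4:2:2": (width // 2) * height,         # 2:1 horizontal only
        "4:2:0": (width // 2) * (height // 2),  # 2:1 horizontal and vertical
    }[fmt]
    return luma + 2 * chroma_per_channel        # Y + Cb + Cr

print(frame_bytes(1920, 1080, "4:2:0"))  # 3110400, i.e. 1.5 bytes/pixel
```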
32. 32
IMAGE / VIDEO…
• Up sampling
• Downsampling
(Diagrams: for upsampling, the input signal F(nT) is expanded and passed through an interpolating low-pass filter to give F(nT/2); for downsampling, a decimating low-pass filter prevents aliasing at the lower rate before producing the output F(2nT))
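A deliberately simplified sketch of the two operations: a 2-sample average stands in for the decimating low-pass filter, and midpoint insertion stands in for the interpolating one (real systems use properly designed filters):

```python
# 2:1 decimation: crude low-pass (pairwise average), keeping half the samples.
def downsample2(signal):
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

# 1:2 interpolation: insert the midpoint between each pair of neighbors.
def upsample2(signal):
    out = []
    for a, b in zip(signal, signal[1:]):
        out += [a, (a + b) / 2]
    out.append(signal[-1])
    return out

print(downsample2([1, 2, 3, 4, 5, 6, 7, 8]))  # [1.5, 3.5, 5.5, 7.5]
print(upsample2([1, 3, 5]))                   # [1, 2.0, 3, 4.0, 5]
```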
33. 33
IMAGE / VIDEO…
• RGB to YCbCr
• RGB to YUV Conversion
– Y = 0.299R + 0.587G + 0.114B
– U = (B - Y) * 0.565
– V = (R - Y) * 0.713
(Figure: the U-V plane at Y = 0.5)
Clamp the output to the 8-bit ranges: Y = [16, 235], U, V = [16, 240]
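The conversion formulas above, written as a small function over normalized RGB inputs in [0.0, 1.0] (clamping to the 8-bit ranges is left out for clarity):

```python
# RGB -> YUV using the weights from the slide.
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = (b - y) * 0.565                    # scaled blue color difference
    v = (r - y) * 0.713                    # scaled red color difference
    return y, u, v

# White has full luminance and zero color difference:
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
print(round(y, 3), round(u, 3), round(v, 3))  # 1.0 0.0 0.0
```

Note how the three Y weights sum to 1.0, matching the 30%/59%/11% relative-brightness split mentioned earlier.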
35. 35
VIDEO/IMAGE COMPRESSION
• How can we use fewer bits?
• To understand how image/audio/video signals are compressed to
save storage and increase transmission efficiency
• Reduces signal size by taking advantage of correlation
– Spatial
– Temporal
– Spectral
36. 36
VIDEO/IMAGE COMPRESSION…
• Compression Methods
• Need to take advantage of redundancy
– Images
Space
Frequency
– Video
Space
Frequency
Time
Compression Methods
– Lossless
    Statistical: Huffman, Arithmetic
    Universal: Lempel-Ziv
– Lossy
    Model-Based: Linear Predictive, AutoRegressive, Polynomial Fitting
    Waveform-Based
        Spatial/Time-Domain
        Frequency-Domain
            Filter-Based: Subband, Wavelet
            Transform-Based: Fourier, DCT
37. 37
VIDEO/IMAGE COMPRESSION…
• Need to take advantage of redundancy
(Diagram: RGB is converted to YCbCr and divided into blocks and macroblocks; I, B, and P frames with motion compensation remove temporal redundancy; transform, quantization, and coding remove spatial redundancy and produce the output bitstream)
38. 38
VIDEO/IMAGE COMPRESSION…
• Spatial Redundancy
– Take advantage of similarity among most neighboring
pixels
• RGB to YUV
– Less information required for YUV (humans less sensitive
to chrominance)
• Macro Blocks
– Take groups of pixels (16x16)
• Discrete Cosine Transformation (DCT)
– Based on Fourier analysis: represents the signal as a sum of sines and cosines
– Concentrates the signal energy into a few low-frequency coefficients
– Represent pixels in blocks with fewer numbers
• Quantization
– Reduce data required for coefficients
• Entropy coding
– Compress
40. 40
VIDEO/IMAGE COMPRESSION…
• When may spatial redundancy elimination be ineffective?
– High-resolution images and displays
– May appear ‘coarse’
• What kinds of images/movies?
– A varied image or ‘busy’ scene
– Many colors, few adjacent
(Figure: original image (63 kB) compared with low-quality (7 kB) and very-low-quality (4 kB) versions; the compressed versions look coarse due to loss of resolution)
Solution? Temporal Redundancy Reduction
49. 49
IMAGE CODING…
• Image compression system is composed of three key building blocks
– Representation
Concentrates important information into a few parameters
– Quantization
Discretizes parameters
– Binary encoding
Exploits non-uniform statistics of quantized parameters
Creates bitstream for transmission
51. 51
IMAGE CODING…
• Generally, the only operation that is lossy is the quantization stage
• The fact that all the loss (distortion) is localized to a single operation
greatly simplifies system design
• Can design loss to exploit human visual system (HVS) properties
• Source decoder performs the inverse of each of the three operations
52. 52
IMAGE CODING…
• Representations - Transform and Subband Filtering Methods
– Goal
Transform signal into another domain where most of the information (energy) is
concentrated into only a small fraction of the coefficients
– Enables perceptual processing
Exploiting HVS response to different frequency components
53. 53
IMAGE CODING…
• Representations - Transform and Subband Filtering Methods
– Examples of “traditional” transforms
KLT, DFT, DCT
– Examples of “traditional” Subband filtering methods
Perfect reconstruction filter banks, wavelets
– Transform and Subband interpretations
All of the above are linear representations and can be interpreted from either a
transform or a Subband filtering viewpoint
– Transform viewpoint
Express signal as a linear combination of basis vectors
Stresses linear expansion (linear algebra) perspective
– Subband filtering viewpoint
Pass signal through a set of filters and examine the frequencies passed by
each filter (Subband)
Stresses filtering (signal processing) perspective
54. 54
IMAGE CODING…
• Representations – Transform Image Coding
– A good transform provides
Most of the image energy is concentrated into a small fraction of the
coefficients
Coding only these small fraction of the coefficients and discarding the rest can
often lead to excellent reconstructed quality
The more energy compaction the better
– Orthogonal transforms are particularly useful
Energy in discarded coefficients is equal to energy in reconstruction error
55. 55
IMAGE CODING…
• Representations – Transform Image Coding
– Karhunen-Loeve Transform (KLT)
Optimal energy compaction
Requires knowledge of signal covariance
In general, no simple computational algorithm
– Discrete Fourier Transform (DFT)
Fast algorithms
Good energy compaction, but not as good as DCT
– Discrete Cosine Transform (DCT)
Fast algorithms
Good energy compaction
All real coefficients
Overall good performance and widely used for image and video coding
56. 56
IMAGE CODING…
• Discrete Cosine Transform (DCT)
– 1-D Discrete Cosine Transform (N-point)
– 1-D DCT basis vectors
– 2-D DCT: Separable transform of 1-D DCT
– 2-D DCT basis vectors?
Basis pictures!
– 2-D basis vectors for 2-D DCT are basis pictures!
– 64 basis pictures for 8x8-pixel 2-D DCT
– Image coding with the 2-D DCT is equivalent to approximating the image
as a linear combination of these basis pictures!
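The separable 2-D DCT described above can be sketched directly: apply a 1-D DCT-II to the rows and then to the columns. This is an illustrative implementation, not an optimized one; fast algorithms exist. The flat test block shows the energy-compaction property:

```python
import math

# N-point DCT-II (orthonormal scaling).
def dct_1d(x):
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        c = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(c * s)
    return out

# Separable 2-D DCT: 1-D DCT along rows, then along columns.
def dct_2d(block):
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[r][c] for r in range(len(rows))]) for c in range(len(rows[0]))]
    # cols[c][r] holds coefficient (row-frequency r, column-frequency c); transpose back.
    return [[cols[c][r] for c in range(len(cols))] for r in range(len(cols[0]))]

# Energy compaction: a flat 8x8 block has ALL its energy in the DC coefficient.
flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 3))  # 8.0 -- every other coefficient is ~0
```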
57. 57
IMAGE CODING…
• Representations – Coding Transform Coefficients
– Selecting the basis pictures to approximate an image is equivalent to
selecting the DCT coefficients to code
– General methods of coding/discarding coefficients
Zonal Coding
▫ Code all coefficients in a zone and discard others
▫ Example zone: Spatial low frequencies
▫ Only need to code coefficient amplitudes
Threshold Coding
▫ Keep coefficients with magnitude above a threshold
▫ Coefficient amplitudes and locations must be coded
▫ Provides best performance
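Both selection methods can be sketched in a few lines; the coefficient vector below is hypothetical:

```python
def zonal_select(coeffs, zone):
    # Zonal coding: keep coefficients inside a fixed low-frequency zone;
    # only amplitudes need coding, since the zone is known in advance.
    return [c if i in zone else 0 for i, c in enumerate(coeffs)]

def threshold_select(coeffs, t):
    # Threshold coding: keep coefficients whose magnitude exceeds t;
    # both amplitudes and locations must be coded.
    return [(i, c) for i, c in enumerate(coeffs) if abs(c) > t]

X = [120, -40, 3, 25, -2, 1, -18, 0]       # hypothetical coefficients
print(zonal_select(X, zone={0, 1, 2, 3}))  # [120, -40, 3, 25, 0, 0, 0, 0]
print(threshold_select(X, t=10))           # [(0, 120), (1, -40), (3, 25), (6, -18)]
```

Note how threshold coding catches the large coefficient at index 6 that the fixed zone misses, at the cost of coding positions.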
58. 58
IMAGE CODING…
• Video / Image Coding is Block-based Coding
– Frames are divided into sub-blocks and then coded
• Macroblock (MB) and Block Layer
– Process the data in blocks of 8x8
samples
– Convert Red-Green-Blue into
Luminance (greyscale) and
Chrominance (Blue color difference
and Red color difference)
– Use half resolution for Chrominance
(because eye is more sensitive to
greyscale than to color)
59. 59
IMAGE CODING…
• Macroblock (MB) and Block Layer
– Macroblock
Consist of
16x16 luminance block
8x8 chrominance block
Basic unit for motion estimation
– Block
8 pixels by 8 lines
Basic unit for DCT
60. 60
IMAGE CODING…
• Lossless Compression
– General-Purpose Compression: Entropy Encoding
– Remove statistical redundancy from data
i.e., encode common values with short codes, uncommon values with
longer codes
• Lossless Compression
– Huffman Coding
– Example: ABCCDEAAB
After compression: 1011000000001010111011
– Compression ratio
Code lengths are assigned according to the probability of each character
in the uncompressed data
Symbol frequencies: A:45 B:16 C:12 D:13 E:9 F:5
Resulting Huffman codes: A=1 B=011 C=000 D=001 E=0101 F=0100
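A minimal Huffman coder over these frequencies can be sketched with Python's heapq. Exact code strings depend on tie-breaking in the tree build, but the code lengths, and hence the 22-bit result for ABCCDEAAB, do not:

```python
import heapq

def huffman_codes(freq):
    # Min-heap of (weight, tiebreak, node); leaves are symbols,
    # internal nodes are (left, right) tuples.
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, (n1, n2)))
        tie += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freq = {"A": 45, "B": 16, "C": 12, "D": 13, "E": 9, "F": 5}
codes = huffman_codes(freq)
bits = "".join(codes[s] for s in "ABCCDEAAB")
print(len(bits))  # 22 (vs 27 bits for a fixed 3-bit code)
```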
61. 61
IMAGE CODING…
• Lossless Compression
– Run-Length Coding
Reduce the number of samples to code
Implementation is simple
Input Sequence
0,0,-3,5,1,0,-2,0,0,0,0,2,-4,3,-2,0,0,0,1,0,0,-2,EOB
Run-Length Sequence
(2,-3)(0,5)(0,1)(1,-2)(4,2)(0,-4)(0,3)(0,-2)(3,1)(2,-2)EOB
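The run-length sequence above can be reproduced with a short sketch, where each pair is (number of zeros preceding a value, value):

```python
def run_length_encode(seq):
    # Emit (zero-run, value) pairs; trailing zeros are implied by EOB.
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs + ["EOB"]

seq = [0, 0, -3, 5, 1, 0, -2, 0, 0, 0, 0, 2, -4, 3, -2, 0, 0, 0, 1, 0, 0, -2]
print(run_length_encode(seq))
# [(2, -3), (0, 5), (0, 1), (1, -2), (4, 2), (0, -4), (0, 3), (0, -2), (3, 1), (2, -2), 'EOB']
```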
66. 66
IMAGE CODING…
• Lossy Compression
– Quantization
Many-to-one mapping
Quantization is the most important means of irrelevancy reduction
– Implementation
Lookup Table
Divide by quantization step-size (round/truncate)
67. 67
IMAGE CODING…
• Lossy Compression
– Divide by quantization step-size
Input signal:0 1 2 3 4 5 6 7(3 bits)
Step-size:2
Quantization:0 0 1 1 2 2 3 3(2 bits)
Inverse quantization:0 0 2 2 4 4 6 6
Quantization Errors:0 1 0 1 0 1 0 1
– Lookup Table
Divide each DCT coefficient by an integer, discard remainder
Result: loss of precision
Typically, a few non-zero coefficients are left
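The step-size example above can be verified directly (a sketch using truncating division):

```python
step = 2
inputs = list(range(8))              # 3-bit input signal: 0..7
q = [x // step for x in inputs]      # quantization (divide and truncate)
deq = [level * step for level in q]  # inverse quantization
err = [x - r for x, r in zip(inputs, deq)]
print(q)    # [0, 0, 1, 1, 2, 2, 3, 3]
print(deq)  # [0, 0, 2, 2, 4, 4, 6, 6]
print(err)  # [0, 1, 0, 1, 0, 1, 0, 1]
```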
68. 68
IMAGE CODING…
• Lossy Compression
– Zigzag Scan
Efficient encoding of the position
of non-zero transform
coefficients
“Scan” quantized coefficients in a
zig-zag order
Non-zero coefficients tend to be
grouped together
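The zig-zag order can be generated by sorting block positions by anti-diagonal and alternating the traversal direction (a sketch shown on a 4×4 block; the same function covers 8×8):

```python
def zigzag_order(n):
    # Zig-zag scan for an n x n block: traverse anti-diagonals (r + c),
    # alternating direction so low-frequency positions come first.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

order = zigzag_order(4)
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```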
72. 72
VIDEO CODING…
• Video
– Sequence of frames (images) that are related
• Moving images contain significant temporal redundancy
– Successive frames are very similar
– Related along the temporal dimension - Temporal redundancy exists
73. 73
VIDEO CODING…
• Video Coding
– The objective of video coding is to compress moving images
– Main addition over image compression
Temporal redundancy
Video coder must exploit the temporal redundancy
– The MPEG (Moving Picture Experts Group) and H.26X are the major
standards for video coding
• Video coding algorithms usually contains two coding schemes :
– Intraframe coding
Intraframe coding does not exploit the correlation among adjacent
frames
Intraframe coding therefore is similar to the still image coding
– Interframe coding
Interframe coding includes a motion estimation/compensation process
to remove temporal redundancy
• Basic Concept
– Use interframe correlation for attaining better rate distortion
74. 74
VIDEO CODING…
• Usually high frame rate: Significant temporal redundancy
• Possible representations along temporal dimension
– Transform/Subband Methods
Good for textbook case of constant velocity uniform global motion
Inefficient for nonuniform motion, i.e., real-world motion
Requires a large number of frame stores
Leads to delay (memory cost may also be an issue)
– Predictive Methods
Good performance using only 2 frame stores
However, simple frame differencing is not enough
75. 75
VIDEO CODING…
• Main addition over image compression
– Exploit the temporal redundancy
• Predict current frame based on previously coded frames
• Types of coded frames
– I-frame
Intra-coded frame, coded independently of all other frames
– P-frame
Predictively coded frame, coded based on previously coded frame
– B-frame
Bi-directionally predicted frame, coded based on both previous and future coded
frames
76. 76
VIDEO CODING…
• Motion-Compensated Prediction
– Simple frame differencing fails when there is motion
– Must account for motion
Motion-compensated (MC) prediction
– MC-prediction generally provides significant improvements
– Questions
How can we estimate motion?
How can we form MC-prediction?
• Motion Estimation
– Ideal Situation
Partition video into moving objects
Describe object motion
Generally very difficult
– Practical approach: Block-Matching Motion Estimation
Partition each frame into blocks
Describe motion of each block
No object identification required
Good, robust performance
77. 77
VIDEO CODING…
• Block-Matching Motion Estimation
– Assumptions
Translational motion within block
All pixels within each block have the same motion
– ME Algorithm
Divide current frame into non-overlapping N1xN2 blocks
For each block, find the best matching block in reference frame
– MC-Prediction Algorithm
Use best matching blocks of reference frame as prediction of blocks in current
frame
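The ME and MC-prediction steps above can be sketched as an exhaustive search with a SAD cost; the frames below are hypothetical:

```python
def sad(cur, ref, bx, by, dx, dy, B):
    # Sum of absolute differences between the current block and the
    # candidate block displaced by (dx, dy) in the reference frame.
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(B) for j in range(B))

def full_search(cur, ref, bx, by, B, p):
    # Examine every candidate displacement in the (±p, ±p) window.
    H, W = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if 0 <= by + dy and by + dy + B <= H and 0 <= bx + dx and bx + dx + B <= W:
                cost = sad(cur, ref, bx, by, dx, dy, B)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Hypothetical frames: the current block at (2, 2) is the reference
# content shifted by (dx, dy) = (1, 2).
ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[ref[r + 2][c + 1] if r + 2 < 8 and c + 1 < 8 else -1
        for c in range(8)] for r in range(8)]
print(full_search(cur, ref, bx=2, by=2, B=2, p=3))  # ((1, 2), 0)
```

The returned displacement is the block's motion vector; using the matched reference block as the prediction is exactly the MC-prediction step.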
78. 78
VIDEO CODING…
• Block-Matching - Determining the Best Matching Block
– For each block in the current frame search for best matching block in the
reference frame
Metrics for determining “best match”
Candidate blocks: All blocks in, e.g., (± 32,±32) pixel area
Strategies for searching candidate blocks for best match
Full search: Examine all candidate blocks
Partial (fast) search: Examine a carefully selected subset
– Estimate of motion for best matching block: “motion vector”
• Motion Vectors and Motion Vector Field
– Motion Vector
Expresses the relative horizontal and vertical offsets (mv1,mv2), or motion, of
a given block from one frame to another
Each block has its own motion vector
– Motion Vector Field
Collection of motion vectors for all the blocks in a frame
79. 79
VIDEO CODING…
• Example of Fast Search: 3-Step
(Log) Search
– Goal: Reduce number of search
points
Example:(± 7,±7) search area
Dots represent search points
Search performed in 3 steps
(coarse-to-fine)
– Step 1: (±4 pixels)
– Step 2: (±2 pixels)
– Step 3: (±1 pixel)
– Best match is found at each step
– Next step: Search is centered
around the best match of prior step
– Speedup increases for larger
search areas
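The 3-step search can be sketched as follows, where cost is any matching function (here a hypothetical convex error surface standing in for block SAD):

```python
def three_step_search(cost, p=7):
    # cost(dx, dy): matching error for candidate displacement (dx, dy).
    # Coarse-to-fine steps of ±4, ±2, ±1 around the best match so far.
    cx, cy = 0, 0
    for step in (4, 2, 1):
        cands = [(cx + sx * step, cy + sy * step)
                 for sx in (-1, 0, 1) for sy in (-1, 0, 1)]
        cands = [(x, y) for x, y in cands if abs(x) <= p and abs(y) <= p]
        cx, cy = min(cands, key=lambda c: cost(*c))
    return cx, cy

# Hypothetical error surface with its minimum at (3, -5).
print(three_step_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 5) ** 2))  # (3, -5)
```

At most 9 candidates per step (25 total) are examined, versus 225 for a full search of the same (±7, ±7) area.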
80. 80
VIDEO CODING…
• Motion Vector Precision
– Motivation
Motion is not limited to integer-pixel offsets
However, video is only known at discrete pixel locations
To estimate sub-pixel motion, frames must be spatially interpolated
– Fractional MVs are used to represent the sub-pixel motion
– Improved performance (extra complexity is worthwhile)
– Half-pixel ME used in most standards: MPEG-1/2/4
– Why are half-pixel motion vectors better?
Can capture half-pixel motion
Averaging effect (from spatial interpolation) reduces prediction error ->
Improved prediction
For noisy sequences, averaging effect reduces noise -> Improved
compression
81. 81
VIDEO CODING…
• Practical Half-Pixel Motion Estimation Algorithm
– Half-Pixel ME (coarse-fine) Algorithm
Coarse Step: Perform integer motion estimation on blocks; find best integer-
pixel MV
Fine Step: Refine estimate to find best half-pixel MV
Spatially interpolate the selected region in reference frame
Compare current block to interpolated reference frame block
Choose the integer or half-pixel offset that provides best match
Typically, bilinear interpolation is used for spatial interpolation
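The bilinear spatial interpolation used for half-pixel positions can be sketched as a 2x upsampling, where each half-pixel sample is the average of its integer-pixel neighbors:

```python
def interp2x(frame):
    # Bilinear 2x upsampling: integer samples are copied, half-pixel
    # positions are averages of the surrounding integer-pixel samples.
    H, W = len(frame), len(frame[0])
    up = [[0] * (2 * W - 1) for _ in range(2 * H - 1)]
    for i in range(H):
        for j in range(W):
            up[2 * i][2 * j] = frame[i][j]
    for i in range(2 * H - 1):
        for j in range(2 * W - 1):
            if i % 2 == 0 and j % 2 == 1:          # horizontal half-pixel
                up[i][j] = (up[i][j - 1] + up[i][j + 1]) / 2
            elif i % 2 == 1 and j % 2 == 0:        # vertical half-pixel
                up[i][j] = (up[i - 1][j] + up[i + 1][j]) / 2
            elif i % 2 == 1 and j % 2 == 1:        # diagonal half-pixel
                up[i][j] = (up[i - 1][j - 1] + up[i - 1][j + 1]
                            + up[i + 1][j - 1] + up[i + 1][j + 1]) / 4
    return up

print(interp2x([[0, 2], [4, 6]]))  # [[0, 1.0, 2], [2.0, 3.0, 4.0], [4, 5.0, 6]]
```

The fine step of the coarse-fine algorithm then compares the current block against this interpolated grid at the eight half-pixel offsets around the integer MV.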
• Example
– MC-Prediction for Two Consecutive Frames
82. 82
VIDEO CODING…
• Bi-Directional MC-Prediction
– Bi-Directional MC-Prediction is used to estimate a block in the current
frame from a block in
Previous frame
Future frame
Average of a block from the previous frame and a block from the future frame
– Motion compensated prediction
Predict the current frame based on reference frame(s) while compensating for
the motion
– Examples of block-based motion-compensated prediction (P-frame)
and bi-directional prediction (B-frame)
83. 83
VIDEO CODING…
• Motion Estimation and Compensation
– The amount of data to be coded can be reduced significantly if the
previous frame is subtracted from the current frame
84. 84
VIDEO CODING…
• Motion Estimation and Compensation
– Uses Block-Matching
The MPEG and H.26X standards use block-matching technique for motion
estimation /compensation
In the block-matching technique, each current frame is divided into equal-size
blocks, called source blocks
Each source block is associated with a search region in the reference frame
The objective of block-matching is to find a candidate block in the search
region best matched to the source block
The relative distances between a source block and its candidate blocks are
called motion vectors
[Figure: video sequence showing the reconstructed reference frame and the current frame — X: source block for block-matching, Bx: search area associated with X, MV: motion vector]
86. 86
VIDEO CODING…
• Motion Estimation and Compensation
[Figure: block-matching between the reconstructed previous frame and the current frame yields the predicted current frame]
87. 87
VIDEO CODING…
• Motion Estimation and Compensation
– Search Range
– The size of the search range = (N1 + 2dx,max) × (N2 + 2dy,max)
– The number of candidate blocks = (2dx,max + 1) × (2dy,max + 1)
88. 88
VIDEO CODING…
• Motion Estimation and Compensation
– Motion Vector and Search Area
Search area: (n + 2p) × (n + 2p), for an n × n source block with maximum displacement ±p
Motion vector: (u, v)
89. 89
VIDEO CODING…
• Motion Estimation and Compensation
– Matching Function
Mean square error (MSE)
Mean absolute difference (MAD)
Number of threshold differences (NTD)
Normalized cross-correlation function (NCF)
MSE(d1, d2) = (1 / (N1 N2)) · Σ(n1 = 0..N1−1) Σ(n2 = 0..N2−1) [ f(n1, n2, t) − f(n1 − d1, n2 − d2, t − 1) ]²
MAD(d1, d2) = (1 / (N1 N2)) · Σ(n1 = 0..N1−1) Σ(n2 = 0..N2−1) | f(n1, n2, t) − f(n1 − d1, n2 − d2, t − 1) |
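The two matching functions can be written directly from their definitions. A sketch follows; the displaced indexing assumes the displacement keeps indices inside the reference block, so the demo uses d1 = d2 = 0:

```python
def mse(cur, ref, d1, d2):
    # Mean square error between the current block f(., ., t) and the
    # reference block displaced by (d1, d2) in f(., ., t-1).
    N1, N2 = len(cur), len(cur[0])
    return sum((cur[n1][n2] - ref[n1 - d1][n2 - d2]) ** 2
               for n1 in range(N1) for n2 in range(N2)) / (N1 * N2)

def mad(cur, ref, d1, d2):
    # Mean absolute difference: same structure, cheaper (no squaring).
    N1, N2 = len(cur), len(cur[0])
    return sum(abs(cur[n1][n2] - ref[n1 - d1][n2 - d2])
               for n1 in range(N1) for n2 in range(N2)) / (N1 * N2)

cur = [[10, 12], [14, 16]]
ref = [[9, 11], [13, 15]]   # every sample differs by 1
print(mse(cur, ref, 0, 0), mad(cur, ref, 0, 0))  # 1.0 1.0
```

MAD is the usual choice in practice because it avoids multiplications while ranking candidate blocks almost as well as MSE.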
90. 90
VIDEO CODING…
• Motion Estimation and Compensation
– Algorithm
Full search block matching (FSB)
Fast algorithm
▫ 2D Logarithmic Search (TDL)
▫ Three Step Search (TSS)
▫ Cross-Search Algorithm (CSA)
▫ …
– Full Search Algorithm
If p=7, then there are
(2p+1)(2p+1)=225 candidate blocks.
u
v
Search Area
Candidate
Block
91. 91
VIDEO CODING…
• Motion Estimation and Compensation
– Full Search Algorithm
Intensive computation
Need for fast Motion Estimation !
93. 93
VIDEO CODING…
• Motion Estimation and Compensation
– Three-Step Search
The first step involves block-matching based on 4-pel resolution at nine
locations
The second step involves block-matching based on 2-pel resolution around the
location determined by the first step
The third step repeats the process in the second step (but with resolution 1-pel)
[Figure: three-step search on a (−7..7) × (−7..7) grid; points labeled 1, 2, and 3 mark the locations examined in steps 1, 2, and 3]
96. 96
VIDEO CODER ARCHITECTURE
• Image / Video Coding Based on Block-Matching
– Assume frame f-1 has been encoded and reconstructed, and frame f is the
current frame to be encoded
• Exploiting the redundancies
– Temporal
MC-Prediction (P and B frames)
– Spatial
Block DCT
– Color
Color Space Conversion
• Scalar quantization of DCT coefficients
• Zigzag scanning, runlength and Huffman coding of the nonzero
quantized DCT coefficients
97. 97
VIDEO CODER ARCHITECTURE…
• Video Encoder
– Divide frame f into equal-size blocks
– For each source block,
Find its motion vector using the block-matching algorithm based on the
reconstructed frame f-1
Compute the DFD of the block
– Transmit the motion vector of each block to decoder
– Compress DFD’s of each block
– Transmit the encoded DFD’s to decoder
99. 99
VIDEO CODER ARCHITECTURE…
• Video Decoder
– Receive motion vector of each block from encoder
– Based on the motion vector ,find the best-matching block from the
reference frame
i.e., find the predicted current frame from the reference frame
– Receive the encoded DFD of each block from encoder
– Decode the DFD.
– Each reconstructed block in the current frame = Its decompressed DFD +
the best-matching block
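The decoder-side reconstruction step can be sketched as: reconstructed block = decoded DFD + the best-matching reference block addressed by the received motion vector (hypothetical 4×4 frame):

```python
def reconstruct_block(dfd, ref, bx, by, mv, B):
    # Decoder side: add the decoded DFD (prediction residual) to the
    # reference block located by the received motion vector.
    dx, dy = mv
    return [[dfd[i][j] + ref[by + dy + i][bx + dx + j]
             for j in range(B)] for i in range(B)]

ref = [[4 * r + c for c in range(4)] for r in range(4)]  # hypothetical frame
dfd = [[1, -1], [0, 2]]                                  # decoded residual
print(reconstruct_block(dfd, ref, bx=0, by=0, mv=(1, 1), B=2))
# [[6, 5], [9, 12]]
```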
102. 102
VIDEO CODEC STANDARDS
• Goal of Standards
– Ensuring Interoperability
Enabling communication between devices made by different manufacturers
– Promoting a technology or industry
– Reducing costs
What do the Standards Specify?
103. 103
VIDEO CODEC STANDARDS…
What do the Standards Specify?
• Not the encoder
• Not the decoder
• Just the bitstream syntax and the decoding process (e.g., use the IDCT,
but not how to implement the IDCT)
– Enables improved encoding & decoding strategies to be employed in a
standard-compatible manner
104. 104
VIDEO CODEC STANDARDS…
• The Scope of Picture and Video Coding Standardization
– Only the Syntax and Decoder are standardized:
Permits optimization beyond the obvious
Permits complexity reduction for implementability
Provides no guarantees of Quality
[Figure: chain from Source through Pre-Processing, Encoding, Decoding, and Post-Processing & Error Recovery to Destination; only the bitstream syntax and decoding process fall within the Scope of Standard]
106. 106
VIDEO CODEC STANDARDS…
• Based on the same fundamental building blocks
– Motion-compensated prediction (I, P, and B frames)
– 2-D Discrete Cosine Transform (DCT)
– Color space conversion
– Scalar quantization, runlengths, Huffman coding
• Additional tools added for different applications:
– Progressive or interlaced video
– Improved compression, error resilience, scalability, etc.
• MPEG-1/2/4, H.261/3/4
– Frame-based coding
• MPEG-4
– Object-based coding and Synthetic video
107. 107
VIDEO CODEC STANDARDS…
• The Video Standards uses all the three types of frames as shown
below
Encoding order: I0, P3, B1, B2, P6, B4, B5, I9, B7, B8.
Playback order: I0, B1, B2, P3, B4, B5, P6, B7, B8, I9.
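The reordering from playback (display) order to encoding order can be sketched as: emit each I/P anchor as soon as it appears, then the B frames that were waiting for that future anchor:

```python
def encoding_order(display):
    # B frames are held back until the future anchor (I or P) they
    # reference has been emitted, since they predict from both sides.
    out, pending_b = [], []
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)   # wait for the future anchor
        else:
            out.append(frame)         # I/P anchor goes out first
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(encoding_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```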
108. 108
VIDEO CODEC STANDARDS…
• Video Structure
– Video standards code video sequences in hierarchy of layers
– There are usually 5 Layers
GOP (Group of Pictures)
Picture
Slice
Macroblock
Block
109. 109
VIDEO CODEC STANDARDS…
• Video Structure
– A GOP usually starts with an I frame, followed by a sequence of P and B
frames
– A Picture is indeed a frame in the video sequence
– A Slice is a portion in a picture
Some standards do not have slices
Some view a slice as a row
A slice in H.264 need not be a row
It can be any shape containing an integral number of macroblocks
– A Macroblock is a 16×16 block
Many standards use Macroblocks as the basic unit for block-matching
operations
– A Block is an 8×8 block
Many standards use the Blocks as the basic unit for DCT
110. 110
VIDEO CODEC STANDARDS…
• Scalable Video Coding
– Three classes of scalable video coding techniques
Temporal Scalability
Spatial Scalability
SNR Scalability
– Uses B frames for attaining temporal scalability
B frames depend on other frames
No other frames depend on B frames
Discard B frames without affecting other frames
111. 111
VIDEO CODEC STANDARDS…
• Scalable Video Coding – Spatial Scalability
– Basically Resolution Scalability
Here the base layer is the low resolution version of the video sequence
– The base layer uses a coarser quantizer for DFD coding
– The residuals in the base layer are refined in the enhancement layer
115. 115
HEVC…
• MPEG-H
– High Efficiency Coding and Media Delivery in
Heterogeneous Environments a new suite of
standards providing technical solutions for
emerging challenges in multimedia industries
– Part 1: System, MPEG Media Transport (MMT)
Integrated services with multiple components in a hybrid
delivery environment, providing support for seamless and
efficient use of heterogeneous network environments,
including broadcast, multicast, storage media and mobile
networks
– Part 2: Video, High Efficiency Video Coding
(HEVC)
Highly immersive visual experiences, with ultra high definition
displays that give no perceptible pixel structure even if
viewed from such a short distance that they subtend a large
viewing angle (up to 55 degrees horizontally for 4Kx2K
resolution displays, up to 100 degrees for 8Kx4K)
– Part 3: Audio, 3D-Audio
Highly immersive audio experiences in which the decoding
device renders a 3D audio scene. This may be using 10.2 or
22.2 channel configurations or much more limited speaker
configurations or headphones, such as found in a personal
tablet or smartphone.
116. 116
HEVC…
• Transport/System Layer Integration
– On going definitions (MPEG, IETF,…,DVB): benefit from H.264/AVC
– MPEG Media Transport (MMT) ?
117. 117
HEVC…
• HEVC = High Efficiency Video Coding
• Joint project between ISO/IEC/MPEG and ITU-T/VCEG
– ISO/IEC: MPEG-H Part 2 (23008-2)
– ITU-T: H.265
• JCT-VC committee
– Joint Collaborative Team on Video Coding
– Co-chairs: Dr. Gary Sullivan (Microsoft, USA) and Dr. Jens-Rainer Ohm (RWTH
Aachen, Germany)
• Target
– Roughly half the bit-rate at the same subjective quality compared to H.264/AVC (50%
over H.264/AVC)
– x10 complexity max for encoder and x2/3 max for decoder
• Requirements
– Progressive required for all profiles and levels
Interlaced support using field SEI message
– Video resolution: sub QVGA to 8Kx4K, with more focus on higher resolution video
content (1080p and up)
– Color space and chroma sampling: YUV420, YUV422, YUV444, RGB444
– Bit-depth: 8-14 bits
– Parallel Processing Architecture
119. 119
HEVC…
• Potential applications
– Existing applications and usage scenarios
IPTV over DSL : Large shift in IPTV eligibility
Facilitated deployment of OTT and multi-screen services
More customers on the same infrastructure: most IP traffic is video
More archiving facilities
– Existing applications and usage scenarios
1080p60/50 with bitrates comparable to 1080i
Immersive viewing experience: Ultra-HD (4K, 8K)
Premium services (sports, live music, live events, …): home theater, bar
venues, mobile
HD 3DTV: full frame per view at today’s HD delivery rates
What becomes possible with 50% video rate reduction?
126. 126
HEVC…
• Video Coding Techniques : Block-based hybrid video coding
– Interpicture prediction
Temporal statistical dependences
– Intrapicture prediction
Spatial statistical dependences
– Transform coding
Spatial statistical dependences
• Uses YCbCr color space with 4:2:0 subsampling
– Y component
Luminance (luma)
Represents brightness (gray level)
– Cb and Cr components
Chrominance (chroma).
Color difference from gray toward blue and red
127. 127
HEVC…
• Video Coding Techniques : Block-based hybrid video coding
– Motion compensation
Quarter-sample precision is used for the MVs
7-tap or 8-tap filters are used for interpolation of fractional-sample
positions
– Intrapicture prediction
33 directional modes, planar (surface fitting), DC (flat)
Modes are encoded by deriving most probable modes (MPMs) based
on those of previously decoded neighboring PBs
– Quantization control
Uniform reconstruction quantization (URQ)
– Entropy coding
Context adaptive binary arithmetic coding (CABAC)
– In-Loop deblocking filtering
Similar to the one in H.264, and more friendly to parallel processing
– Sample adaptive offset (SAO)
Nonlinear amplitude mapping
For better reconstruction of amplitude by histogram analysis
128. 128
HEVC…
• Coding Tree Unit (CTU) - A picture is partitioned into CTUs
– The CTU is the basic processing unit instead of Macro Blocks (MB)
– Contains luma CTBs and chroma CTBs
A luma CTB covers L × L samples
Two chroma CTBs each cover L/2 × L/2 samples
– HEVC supports variable-size CTBs
The value of L may be equal to 16, 32, or 64.
Selected according to needs of encoders - In terms of memory and
computational requirements
Large CTB is beneficial when encoding high-resolution video content
– CTBs can be used as CBs or can be partitioned into multiple CBs using
quadtree structures
– The quadtree splitting process can be iterated until the size for a luma
CB reaches a minimum allowed luma CB size (8 × 8 or larger).
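The quadtree partitioning of a CTB into CBs can be sketched as a recursion; the split decision below is a hypothetical stand-in for the encoder's rate-distortion mode decision:

```python
def split_ctb(x, y, size, decide_split, min_size=8):
    # Recursively split a CTB into coding blocks (CBs). decide_split(x, y, size)
    # stands in for the encoder's mode decision; min_size bounds the recursion.
    if size > min_size and decide_split(x, y, size):
        h = size // 2
        return [cb
                for dx, dy in ((0, 0), (h, 0), (0, h), (h, h))
                for cb in split_ctb(x + dx, y + dy, h, decide_split, min_size)]
    return [(x, y, size)]

# Hypothetical decision: split the 64x64 CTB once, then split only the
# top-left 32x32 child again.
cbs = split_ctb(0, 0, 64, lambda x, y, s: s == 64 or (s == 32 and (x, y) == (0, 0)))
print(len(cbs))  # 7 coding blocks: four 16x16 and three 32x32
```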
129. 129
HEVC…
• Block Structure
– Coding Tree Units (CTU)
Corresponds to macroblocks in earlier coding standards (H.264, MPEG2, etc)
Luma and chroma Coding Tree Blocks (CTB)
Quadtree structure to split into Coding Units (CUs)
16x16, 32x32, or 64x64, signaled in SPS
130. 130
HEVC…
• A new framework composed of three
new concepts
– Coding Units (CU)
– Prediction Units (PU)
– Transform Units (TU)
• The decision whether to code a
picture area using inter or intra
prediction is made at the CU level
Goal: To be as flexible as possible and to adapt the
compression-prediction to image peculiarities
131. 131
HEVC…
• Block Structure
– Coding Units (CU)
Luma and chroma Coding Blocks (CB)
Rooted in CTU
Intra or inter coding mode
Split into Prediction Units (PUs) and Transform Units (TUs)
132. 132
HEVC…
• Block Structure
– Prediction Units (PU)
Luma and chroma Prediction Blocks (PB)
Rooted in CU
Partition and motion info
135. 135
HEVC…
• Intra Prediction
– 35 intra modes: 33 directional modes +
DC + planar
– For chroma, 5 intra modes: DC, planar,
vertical, horizontal, and luma derived
– Planar prediction (Intra_Planar)
Amplitude surface with a horizontal and
vertical slope derived from boundaries
– DC prediction (Intra_DC)
Flat surface with a value matching the
mean value of the boundary samples
– Directional prediction (Intra_Angular)
33 different directional predictions are
defined for square TB sizes from 4×4 up
to 32×32
136. 136
HEVC…
• Intra Prediction
– Adaptive reference sample filtering
3-tap filter: [1 2 1]/4
Not performed for 4x4 blocks
For larger than 4x4 blocks, adaptively performed for a subset of modes
Modes except vertical/near-vertical, horizontal/near-horizontal, and DC
– Mode dependent adaptive scanning
4x4 and 8x8 intra blocks only
All other blocks use only diagonal upright scan (left-most scan pattern)
137. 137
HEVC…
• Intra Prediction
– Boundary smoothing
Applied to DC, vertical, and horizontal modes, luma only
Reduces boundary discontinuity
– For DC mode, 1st column and row of samples in predicted block are
filtered
– For Hor/Ver mode, first column/row of pixels in predicted block are filtered
138. 138
HEVC…
• Inter Prediction
– Fractional sample interpolation
¼ pixel precision for luma
– DCT based interpolation filters
8-/7- tap for luma
4-tap for chroma
Supports 16-bit implementation
with non-normative shift
– High precision interpolation and
biprediction
– DCT-IF design
Forward DCT, followed by
inverse DCT
139. 139
HEVC…
• Inter Prediction
– Asymmetric Motion Partition (AMP) for Inter PU
– Merge
Derive motion (MV and ref pic) from spatial and
temporal neighbors
Which spatial/temporal neighbor is identified by
merge_idx
Number of merge candidates (≤ 5) signaled in slice
header
Skip mode = merge mode + no residual
– Advanced Motion Vector Prediction (AMVP)
Use spatial/temporal PUs to predict current MV
140. 140
HEVC…
• Transforms
– Core transforms: DCT based
4x4, 8x8, 16x16, and 32x32
Square transforms only
Support partial factorization
Near-orthogonal
Nested transforms
– Alternative 4x4 DST
4x4 intra blocks, luma only
– Transform skipping mode
By-pass the transform stage
Most effective on “screen content”
4x4 TBs only
141. 141
HEVC…
• Scaling and Quantization
– HEVC uses a uniform reconstruction quantization (URQ)
scheme controlled by a quantization parameter (QP).
– The range of the QP values is defined from 0 to 51
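The QP-to-step-size relation behind URQ can be sketched as follows: the step size doubles for every increase of 6 in QP (the exact normalization, QP = 4 giving a step size of 1, is an assumption of this sketch):

```python
def q_step(qp):
    # Step size doubles every 6 QP values; QP = 4 -> step 1 is assumed
    # here for illustration.
    return 2.0 ** ((qp - 4) / 6.0)

for qp in (4, 10, 16, 22):
    print(qp, q_step(qp))  # 1.0, 2.0, 4.0, 8.0
```

With QP running from 0 to 51, the step size spans roughly a 2^8 range, which is what lets one parameter cover everything from near-lossless to very coarse quantization.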
142. 142
HEVC…
• Entropy Coding
– One entropy coder, CABAC
Reuse H.264 CABAC core algorithm
More friendly to software and hardware
implementations
Easier to parallelize, reduced HW area, increased
throughput
– Context modeling
Reduced # of contexts
Increased use of by-pass bins
Reduced data dependency
– Coefficient coding
Adaptive coefficient scanning for intra 4x4 and 8x8
▫ Diagonal upright, horizontal, vertical
Processed in 4x4 blocks for all TU sizes
Sign data hiding:
▫ Sign of first non-zero coefficient conditionally hidden in
the parity of the sum of the non-zero coefficient
magnitudes
▫ Conditions: 2 or more non-zero coefficients, and
“distance” between first and last coefficient > 3
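The sign data hiding rule can be sketched as below; the even-parity-means-positive mapping is an assumption of this sketch:

```python
def hidden_sign(coeffs):
    # Sketch of sign data hiding for one 4x4 coefficient group. When the
    # group qualifies, the sign of the first non-zero coefficient is not
    # transmitted: it is inferred from the parity of the sum of non-zero
    # magnitudes (even -> positive is assumed here for illustration).
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    if len(nz) >= 2 and nz[-1] - nz[0] > 3:   # eligibility conditions
        parity = sum(abs(coeffs[i]) for i in nz) % 2
        return "+" if parity == 0 else "-"
    return None  # sign must be coded explicitly

print(hidden_sign([3, 0, 1, 0, 0, 2] + [0] * 10))  # '+' (sum 6, even)
print(hidden_sign([3, 1] + [0] * 14))              # None (distance <= 3)
```

The encoder enforces the parity, adjusting one coefficient by ±1 if needed, so the decoder can always recover the hidden sign.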
143. 143
HEVC…
• Entropy Coding - CABAC
– Binarization: CABAC uses Binary Arithmetic Coding which means that only binary decisions (1 or
0) are encoded. A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is
"binarized" or converted into a binary code prior to arithmetic coding. This process is similar to the
process of converting a data symbol into a variable length code but the binary code is further
encoded (by the arithmetic coder) prior to transmission.
– Stages are repeated for each bit (or "bin") of the binarized symbol.
– Context model selection: A "context model" is a probability model for one or more bins of the
binarized symbol. This model may be chosen from a selection of available models depending on
the statistics of recently coded data symbols. The context model stores the probability of each bin
being "1" or "0".
– Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability
model. Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").
– Probability update: The selected context model is updated based on the actual coded value (e.g. if
the bin value was "1", the frequency count of "1"s is increased)
144. 144
HEVC…
• Parallel Processing Tools
– Slices
– Tiles
– Wavefront parallel processing (WPP)
– Dependent Slices
• Slices
– Slices are a sequence of CTUs that are processed in the order
of a raster scan. Slices are self-contained and independent
– Each slice is encapsulated in a separate packet
145. 145
HEVC…
• Tile
– Self-contained and independently decodable rectangular regions
– Tiles provide parallelism at a coarse level of granularity
Using more tiles than cores is not efficient and breaks dependencies
146. 146
HEVC…
• WPP
– A slice is divided into rows of CTUs. Parallel processing of rows
– The decoding of each row can begin as soon as a few decisions have
been made in the preceding row for the adaptation of the entropy coder.
– Better compression than tiles. Parallel processing at a fine level of
granularity.
WPP cannot be combined with tiles
147. 147
HEVC…
• Dependent Slices
– Separate NAL units but dependent (Can only be decoded after part of
the previous slice)
– Dependent slices are mainly useful for ultra low delay applications
Remote Surgery
– Error resiliency gets worse
– Low delay
– Good efficiency; goes well with WPP
148. 148
HEVC…
• Slice Vs Tile
– Tiles are a kind of zero-overhead slice
Slice header is sent at every slice but tile information once for a sequence
Slices have packet headers too
Each tile can contain a number of slices and vice versa
– Slices are for :
Controlling packet sizes
Error resiliency
– Tiles are for:
Controlling parallelism (multiple core architecture)
Defining ROI regions
149. 149
HEVC…
• Tile Vs WPP
– WPP
Better compression than tiles
Parallel processing at a fine level of granularity
But …
Needs frequent communication between processing units
With a high number of cores, full utilization cannot be achieved
– Good for when
Relatively small number of nodes
Good inter core communication
No need to match to MTU size
Big enough shared cache
150. 150
HEVC…
• In-Loop Filters
– Two processing steps, a deblocking filter (DBF) followed by a
sample adaptive offset (SAO) filter, are applied to the
reconstructed samples
The DBF is intended to reduce the blocking artifacts due to block-
based coding
The DBF is only applied to the samples located at block
boundaries
The SAO filter is applied adaptively to all samples satisfying
certain conditions. e.g. based on gradient.
151. 151
HEVC…
• Loop Filters: Deblocking
– Applied to all samples adjacent to a PU or TU boundary
Except the case when the boundary is also a picture boundary, or
when deblocking is disabled across slice or tile boundaries
– HEVC only applies the deblocking filter to edges that are
aligned on an 8×8 sample grid
This restriction reduces the worst-case computational complexity
without noticeable degradation of the visual quality
It also improves parallel-processing operation
– The processing order of the deblocking filter is defined as
horizontal filtering for vertical edges for the entire picture first,
followed by vertical filtering for horizontal edges.
152. 152
HEVC…
• Loop Filters: Deblocking
– Simpler deblocking filter in HEVC (vs H.264 )
– Deblocking filter boundary strength is set according to
Block coding mode
Existence of non zero coefficients
Motion vector difference
Reference picture difference
153. 153
HEVC…
• Loop Filters: SAO
– A process that modifies the decoded
samples by conditionally adding an
offset value to each sample after the
application of the deblocking filter,
based on values in look-up tables
transmitted by the encoder.
– SAO: Sample Adaptive Offsets
New loop filter in HEVC
Non-linear filter
– For each CTB, signal SAO type and
parameters
– Encoder decides SAO type and
estimates SAO parameters (rate-
distortion opt.)
154. 154
HEVC…
• Special Coding
– I_PCM mode
The prediction, transform, quantization and entropy coding are bypassed
The samples are directly represented by a pre-defined number of bits
Main purpose is to avoid excessive consumption of bits when the signal
characteristics are extremely unusual and cannot be properly handled by hybrid
coding
– Lossless mode
The transform, quantization, and other processing that affects the decoded picture
are bypassed
The residual signal from inter- or intrapicture prediction is directly fed into the
entropy coder
It allows mathematically lossless reconstruction
SAO and deblocking filtering are not applied to these regions
– Transform skipping mode
Only the transform is bypassed
Improves compression for certain types of video content such as computer-
generated images or graphics mixed with camera-view content
Can be applied to TBs of 4×4 size only
155. 155
HEVC…
• High Level Parallelism
– Slices
Independently decodable packets
Sequence of CTUs in raster scan
Error resilience; parallelization
– Tiles
Independently decodable (re-entry)
Rectangular region of CTUs
Parallelization (esp. encoder)
1 slice = more tiles, or 1 tile = more slices
– WPP
Rows of CTUs
Decoding of each row can be parallelized: a CTU can start once the
CTUs it depends on in the row above are finished
– Main profile does not allow the tiles + WPP combination
156. 156
HEVC…
• Profiles, Levels and Tiers
– Historically, profile defines collection of coding
tools, whereas Level constrains decoder
processing load and memory requirements
– The first version of HEVC defined 3 profiles
Main Profile: 8-bit video in YUV4:2:0 format
Main 10 Profile: same as Main, up to 10-bit
Main Still Picture Profile: same as Main, one
picture only
– Levels and Tiers
Levels: max sample rate, max picture size,
max bit rate, DPB and CPB size, etc
Tiers: “main tier” and “high tier” within one
level
157. 157
HEVC…
• Complexity Analysis
– Software-based HEVC decoder capabilities
(published by NTT Docomo)
Single-threaded: 1080p@30 on ARMv7
(1.3GHz), 1080p@60 decoding on i5
(2.53GHz)
Multi-threaded: 4Kx2K@60 on i7 (2.7GHz),
12Mbps, decoding speed up to 100fps
– Other independent software-based HEVC
real-time decoder implementations published
by Samsung and Qualcomm during HEVC
development
– Decoder complexity not substantially higher
More complex modules: MC, Transform, Intra
Pred, SAO
Simpler modules: CABAC and deblocking