R, G, B format requires three matrices, one each for R, G, B quantized values
In Y, U, V representation, the U and V matrices can be half the size of the Y matrix
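As a rough illustration of the smaller U and V matrices, the sketch below halves a chrominance plane in both directions by averaging each 2 x 2 group of samples. The function name `subsample_2x2`, the averaging rule and the sample values are assumptions made for this example; the notes do not say how the reduction is carried out.

```python
def subsample_2x2(plane):
    """Average each 2x2 group of samples (even dimensions assumed)."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 4x4 chrominance plane, reduced to 2x2.
u_plane = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
u_small = subsample_2x2(u_plane)
```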
Source image matrix is divided into 8 x 8 blocks (submatrices)
The small block size keeps the DCT computation manageable; blocks are fed sequentially to the DCT, which transforms each block separately
Each pixel value in the 2-D matrix is quantized using 8 bits, giving a range of 0 to 255 for the intensity/luminance values and -128 to +127 for the chrominance values. All values are shifted to the range -128 to +127 before the DCT is computed
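The block division and level shift described above can be sketched as follows. This is a minimal example that assumes the image dimensions are multiples of 8 and that the samples are luminance values in 0 to 255; `split_into_blocks` and `level_shift` are illustrative names, not part of any standard API.

```python
def split_into_blocks(image, size=8):
    """Return the size x size blocks of the image in raster order."""
    blocks = []
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            blocks.append([row[c:c + size] for row in image[r:r + size]])
    return blocks


def level_shift(block):
    """Shift samples from the range 0..255 to the range -128..127."""
    return [[value - 128 for value in row] for row in block]


# Hypothetical 16x16 test image holding every value from 0 to 255 once.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
blocks = split_into_blocks(image)
shifted = [level_shift(block) for block in blocks]
```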
All 64 values in the input matrix contribute to each entry in the transformed matrix
The value in location F[0,0] of the transformed matrix is called the DC coefficient and is proportional to the average of all 64 values in the block
The other 63 values are called the AC coefficients; each has a horizontal and a vertical spatial frequency associated with it
Spatial frequency increases as we move from left to right (horizontally) or from top to bottom (vertically). Low spatial frequencies are clustered in the top-left corner.
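To make the role of the DC and AC coefficients concrete, here is a direct (unoptimized) 2-D DCT written from the textbook definition. The scaling used is the common orthonormal one, which is an assumption since the notes do not give the formula; with this scaling the DC term of an 8 x 8 block comes out as 8 times the block mean, so it tracks the average up to a constant factor.

```python
import math


def dct_2d(block):
    """Direct 2-D DCT of an n x n block; every input value feeds every output."""
    n = len(block)

    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for i in range(n):          # vertical frequency index
        for j in range(n):      # horizontal frequency index
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * j * math.pi / (2 * n)))
            out[i][j] = (2 / n) * scale(i) * scale(j) * total
    return out


# A flat block has no spatial variation, so only the DC term is non-zero.
flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

A production encoder would use a fast factorized DCT rather than this four-level loop; the direct form is shown only because it mirrors the definition.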
The human eye is most sensitive to the DC coefficient and the lower spatial frequency coefficients
If the magnitude of a higher frequency coefficient is below a certain threshold, the eye will not detect it
Set the frequency coefficients in the transformed matrix whose amplitudes are less than a defined threshold to zero (these coefficients cannot be recovered during decoding)
During quantization, the sizes of the DC and AC coefficients are reduced
A division operation is performed using the predefined threshold value as the divisor
Threshold values vary for each of the 64 DCT coefficients and are held in a 2-D matrix
The choice of threshold values is a trade-off between the level of compression required and the information loss that is acceptable
JPEG standard includes two default quantization tables -- one for the luminance coefficients and the other for use with the two sets of chrominance coefficients. Customized tables may be used
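The quantization step can be sketched as an element-wise division by a table of thresholds followed by rounding. The 2 x 2 table and coefficient values below are made up to keep the example short (they are not the JPEG default tables), and rounding to the nearest integer is an assumption.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its threshold and round to an integer."""
    return [
        [int(round(c / q)) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, table)
    ]


def dequantize(levels, table):
    """Decoder side: multiply back by the same thresholds."""
    return [
        [level * q for level, q in zip(l_row, q_row)]
        for l_row, q_row in zip(levels, table)
    ]


# Hypothetical coefficients and thresholds (illustration only).
coeffs = [[80.0, -31.0], [12.0, 3.0]]
table = [[16, 11], [12, 14]]
levels = quantize(coeffs, table)
restored = dequantize(levels, table)
```

Here the coefficient 3.0 falls below half its threshold and becomes zero, and -31.0 comes back as -33; both losses are permanent, which is the information loss the trade-off refers to.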
Vectoring -- the 2-D matrix of quantized DCT coefficients is represented in the form of a single-dimensional vector
After quantization, most of the high frequency coefficients (lower right corner) are zero.
To exploit the long runs of zeros, a zig-zag scan of the matrix is used
Zig-zag scan allows the DC coefficient and the lower frequency AC coefficients to be scanned first
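One way to generate the zig-zag order is to walk the anti-diagonals of the matrix, alternating direction on each one. The sketch below does this by sorting index pairs; the helper names and the 4 x 4 example block are illustrative (a real JPEG block is 8 x 8).

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n matrix."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Primary key: anti-diagonal index. Secondary key: go down the
        # odd diagonals and up the even ones.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def zigzag_scan(block):
    """Flatten a square block into a vector using the zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


# Hypothetical quantized 4x4 block: non-zero values sit near the top left.
block = [
    [5, -3, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
vector = zigzag_scan(block)
```

The non-zero values end up at the front of the vector and the zeros form one long run at the end.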
DC coefficients are encoded using differential encoding and AC coefficients are encoded using run-length encoding. Huffman coding is then applied to both.
DC coefficient is usually the largest in the transformed matrix.
DC coefficient varies slowly from one block to the next.
Only the difference between the DC coefficients of successive blocks is encoded, which reduces the number of bits required.
The difference values are encoded in the form (SSS, value), where the SSS field indicates the number of bits needed to encode the value and the value field holds the value in binary form.
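A small sketch of the (SSS, value) encoding of DC differences follows. Two details are assumptions not stated in the notes: the predictor for the first block starts at 0, and a negative difference is written as the complement of the binary form of its magnitude, which is the convention JPEG uses. The DC values are made up.

```python
def encode_dc_differences(dc_values):
    """Return (SSS, bit string) pairs for a sequence of DC coefficients."""
    pairs = []
    previous = 0  # assumed starting predictor
    for dc in dc_values:
        diff = dc - previous
        previous = dc
        sss = abs(diff).bit_length()  # bits needed for the magnitude
        if sss == 0:
            bits = ""  # a zero difference needs no value bits
        else:
            # Negative values: complement of the magnitude's binary form.
            value = diff if diff > 0 else diff + (1 << sss) - 1
            bits = format(value, "0{}b".format(sss))
        pairs.append((sss, bits))
    return pairs


# Hypothetical DC coefficients of five consecutive blocks.
dc_pairs = encode_dc_differences([12, 13, 11, 11, 10])
```

The differences here are 12, 1, -2, 0 and -1, so after the first block each value needs at most two bits.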
63 values of the AC coefficients
Long strings of zeros because of the zig-zag scan
Each AC coefficient is encoded as a pair of values -- (skip, value), where skip is the number of zeros in the run and value is the next non-zero coefficient
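The (skip, value) pairs can be produced with a single pass over the zig-zag vector. In this sketch the final pair (0, 0) is used to signal that only zeros remain; that end marker is an assumption added for the example, as are the function name and the sample data.

```python
def run_length_encode_ac(ac_coefficients):
    """Encode AC coefficients as (skip, value) pairs."""
    pairs = []
    run = 0
    for value in ac_coefficients:
        if value == 0:
            run += 1  # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))  # assumed end-of-block marker
    return pairs


# Hypothetical 63 AC coefficients: a few non-zero values, then zeros.
ac = [6, 7, 0, 0, 0, 3, -1] + [0] * 56
ac_pairs = run_length_encode_ac(ac)
```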
Long strings of binary digits replaced by shorter codewords
Prefix property of the Huffman codewords enables the encoded bitstream to be decoded unambiguously
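The prefix property can be seen in a small Huffman coder. This sketch builds the code from the symbol frequencies of the message itself, which is an assumption for the sake of a self-contained example (an encoder may equally use predefined tables); the symbols are hypothetical (skip, value) pairs.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Return a dict mapping each symbol to its codeword (a bit string)."""
    counts = Counter(symbols)
    # Heap entries: (weight, tie-breaker, {symbol: partial codeword}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        n1, _, low = heapq.heappop(heap)
        n2, _, high = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in low.items()}
        merged.update({s: "1" + w for s, w in high.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Decode by emitting a symbol as soon as a complete codeword is read."""
    reverse = {w: s for s, w in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return decoded


message = [(0, 6), (0, 1), (0, 1), (3, 3), (0, 1), (0, 0), (0, 1), (0, 0)]
code = build_huffman_code(message)
bitstream = "".join(code[s] for s in message)
recovered = huffman_decode(bitstream, code)
```

Because no codeword is the start of another, the decoder never has to look ahead or backtrack.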
Encapsulates the information relating to an encoded image
Video as a sequence of pictures (or frames)
JPEG algorithm applied to each frame -- moving JPEG (MJPEG). Exploits only spatial redundancy.
High correlation between successive frames. Only a small portion of each frame is involved with any motion that is taking place.
A combination of actual frame contents and predicted frame contents is used.
Motion estimation and motion compensation
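A minimal sketch of motion estimation by block matching is given below. It assumes an exhaustive search over a small window and the sum of absolute differences (SAD) as the matching cost; the notes do not name a search strategy or cost measure, so both are illustrative choices, as are the function names and the synthetic frames.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(size)
        for j in range(size)
    )


def find_motion_vector(ref, cur, cy, cx, size=4, search=2):
    """Return (dy, dx, cost) of the best match in the reference frame."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > len(ref) or rx + size > len(ref[0]):
                continue
            cost = sad(ref, cur, ry, rx, cy, cx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]


# Synthetic frames: the current frame is the reference shifted right by one.
ref = [[10 * y + x * x for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]
vector = find_motion_vector(ref, cur, cy=2, cx=2)
```

Motion compensation then uses the vector to fetch the matching block from the reference frame, so that only the (here zero) prediction error needs to be coded.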
Interframe and intraframe coding. High compression ratios can be achieved by using both. Random access requirement of image retrieval is satisfied by pure intraframe coding.
I-frames are coded without reference to other frames. Serve as reference pictures for predictive-coded frames.
P-frames are coded using motion compensated prediction from a past I-frame or P-frame.
B-frames are bidirectionally predictive-coded. Highest degree of compression, but require both past and future reference pictures for motion compensation.
D-frames are DC-coded. Of the DCT coefficients only the DC coefficients are present. Used in interactive applications like VoD for rewind and fast-forward operations.
I B B P B B P B B I (display order)
Bitstream order -- I P B B P B B I B B (each B-frame is sent after both of its reference pictures)
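The reordering from display order to bitstream order follows one rule: a B-frame cannot be decoded until the reference picture that comes after it has arrived, so each I- or P-frame is sent ahead of the B-frames that precede it in display order. A short sketch of that rule, with a hypothetical function name and numbered frame labels added for readability:

```python
def display_to_bitstream(frames):
    """Reorder frames so every B-frame follows both of its references."""
    ordered, waiting_b = [], []
    for frame in frames:
        if frame.startswith("B"):
            waiting_b.append(frame)  # hold until the next reference is sent
        else:
            ordered.append(frame)    # I- or P-frame goes out first
            ordered.extend(waiting_b)
            waiting_b = []
    ordered.extend(waiting_b)        # trailing B-frames, if any
    return ordered


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9", "I10"]
bitstream = display_to_bitstream(display)
```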
Prediction span, Group of Pictures (GOP)
Input frames are preprocessed (color space conversion and spatial resolution adjustment).
Frame types are decided for each frame/picture
Each picture is divided into macroblocks of 16 x 16 pixels.
Macroblocks are intracoded for I-frames and predictive coded or intracoded for P- and B-frames
Each macroblock is divided into six blocks of 8 x 8 pixels (4 luminance and 2 chrominance). DCT is applied to each block, and the transform coefficients are quantized, zig-zag scanned and variable-length coded.
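The division of a macroblock into six blocks can be sketched as follows. It assumes the two chrominance planes have already been reduced to 8 x 8 samples per macroblock; the function name, the argument layout and the test data are illustrative.

```python
def macroblock_to_blocks(y_mb, cb_block, cr_block):
    """Return six 8x8 blocks: four luminance, then the two chrominance."""
    if len(y_mb) != 16 or any(len(row) != 16 for row in y_mb):
        raise ValueError("luminance macroblock must be 16 x 16")
    luma = [
        [row[c:c + 8] for row in y_mb[r:r + 8]]
        for r in (0, 8)
        for c in (0, 8)
    ]
    return luma + [cb_block, cr_block]


# Hypothetical macroblock data.
y_mb = [[r * 16 + c for c in range(16)] for r in range(16)]
cb = [[0] * 8 for _ in range(8)]
cr = [[0] * 8 for _ in range(8)]
blocks = macroblock_to_blocks(y_mb, cb, cr)
```

Each of the six blocks would then go through the DCT, quantization, zig-zag scan and variable-length coding in turn.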