2. JPEG
JPEG is a commonly used method of lossy compression for digital images,
particularly for those images produced by digital photography. The degree of
compression can be adjusted, allowing a selectable tradeoff between storage
size and image quality. JPEG typically achieves 10:1 compression with little
perceptible loss in image quality.
Wikipedia, JPEG
4. 1. Color Transform & Downsampling
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
5. 1. Color Transform & Downsampling
The representation of the colors in the image is converted from RGB to Y′CbCr,
consisting of one luma component (Y′), representing brightness, and two
chroma components (Cb and Cr), representing color. This step is sometimes
skipped.
The resolution of the chroma data is reduced, usually by a factor of 2 or 3.
This reflects the fact that the eye is less sensitive to fine color details than to
fine brightness details.
Wikipedia, JPEG
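As a minimal sketch of this color transform, here is the full-range BT.601 conversion used by JFIF/JPEG (the function name is mine):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> Y'CbCr, as used by JFIF/JPEG.

    Inputs are 0..255; Cb and Cr are offset by 128 so that
    neutral gray maps to (Y, 128, 128).
    """
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr
```

Note that for white (255, 255, 255) the chroma channels land exactly on the neutral value 128, so all of the signal goes into the luma component.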
6. YUV
YUV (…) encodes a color image or video taking human
perception into account, allowing reduced bandwidth for
chrominance components, thereby typically enabling
transmission errors or compression artifacts to be more
efficiently masked by the human perception than using a "direct"
RGB-representation.
Wikipedia, YUV
7. YUV
Y′UV was invented when engineers wanted color television
in a black-and-white infrastructure.
The luma component already existed as the black and white signal; they added
the UV signal to this as a solution.
The U and V signals tell the television to shift the color of a certain pixel
without altering its brightness. Or the U and V signals tell the monitor to make
one color brighter at the cost of the other and by how much it should be
shifted.
Wikipedia, YUV
8. Y’CbCr
Y′CbCr is often confused with the YUV color space,
and typically the terms YCbCr and YUV are used
interchangeably, leading to some confusion.
The main difference is that YUV is analog and YCbCr is digital.
Y′CbCr is used to separate out a luma signal (Y′) that can be stored with high
resolution or transmitted at high bandwidth, and two chroma components (Cb
and Cr) that can be bandwidth-reduced, subsampled, compressed, or
otherwise treated separately for improved system efficiency.
Wikipedia, YCbCr
9. Chroma Subsampling
Chroma subsampling is the practice of encoding images by implementing less
resolution for chroma information than for luma information, taking
advantage of the human visual system's lower acuity for color differences
than for luminance.
The subsampling scheme is commonly expressed as a three-part ratio J:a:b that
describes the number of luminance and chrominance samples in a conceptual
region J pixels wide and 2 pixels high.
● J: horizontal sampling reference (width of the conceptual region). Usually, 4.
● a: number of chrominance samples (Cr, Cb) in the first row of J pixels.
● b: number of changes of chrominance samples (Cr, Cb) between first and second row of J pixels.
Wikipedia, Chroma Subsampling
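One common way to realize 4:2:0 subsampling is to average each 2×2 block of a chroma plane (a sketch; some encoders simply drop samples instead):

```python
def subsample_420(chroma):
    """4:2:0 chroma subsampling: average each 2x2 block of a
    chroma plane, halving resolution in both dimensions.

    `chroma` is a list of rows with even width and height.
    """
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

The luma plane is left at full resolution; only Cb and Cr are reduced, which cuts the total sample count from 3 per pixel to 1.5.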
12. 2. Discrete Cosine Transform
After subsampling, each channel must be split into 8×8 blocks. Depending on
chroma subsampling, this yields Minimum Coded Unit (MCU) blocks of size 8×8
(4:4:4 – no subsampling), 16×8 (4:2:2), or most commonly 16×16 (4:2:0).
If the data for a channel does not represent an integer number of blocks then the
encoder must fill the remaining area of the incomplete blocks with some form
of dummy data. Filling the edges with a fixed color can create ringing
artifacts along the visible part of the border; repeating the edge pixels is a
common technique that reduces (but does not necessarily completely eliminate)
such artifacts, and more sophisticated border filling techniques can also be
applied.
Wikipedia, JPEG
13. Fourier Transform
The Fourier transform (FT) decomposes
a function of time (a signal) into its constituent
frequencies. This is similar to the way a musical
chord can be expressed in terms of the volumes and frequencies of its
constituent notes.
The Fourier transform of a function of time is itself a complex-valued function of
frequency, whose magnitude (modulus) represents the amount of that
frequency present in the original function, and whose argument is the phase
offset of the basic sinusoid in that frequency.
Wikipedia, Fourier Transform
14. 2. Discrete Cosine Transform
A discrete cosine transform (DCT) expresses a finite sequence of data points in
terms of a sum of cosine functions oscillating at different frequencies. DCTs
are important to numerous applications in science and engineering, including
lossy compression of audio (e.g. MP3), images (e.g. JPEG), and video
(e.g. MPEG), where small high-frequency components can be discarded.
Wikipedia, DCT
The DCT transforms an 8×8 block of input values to a linear combination of
these 64 patterns. The patterns are referred to as the two-dimensional DCT
basis functions, and the output values are referred to as transform
coefficients.
Wikipedia, JPEG
15. Discrete Cosine Transform Example
G(u, v) = (1/4) α(u) α(v) Σ_{x=0..7} Σ_{y=0..7} g(x, y) · cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16]
u (horizontal spatial frequency) = 0 → 7
v (vertical spatial frequency) = 0 → 7
α(k) = 1/√2 for k = 0, otherwise 1
Wikipedia, Discrete Cosine Transform
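A naive sketch of JPEG's 8×8 forward DCT-II, transcribed directly from the definition (real encoders use fast factorizations rather than this O(n⁴) loop):

```python
import math

def dct2_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block, as used by JPEG's
    forward transform. Returns an 8x8 list of coefficients,
    with the DC coefficient at out[0][0]."""
    def alpha(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):          # horizontal spatial frequency
        for v in range(8):      # vertical spatial frequency
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            out[v][u] = 0.25 * alpha(u) * alpha(v) * s
    return out
```

For a constant block every AC coefficient vanishes and the entire signal collapses into the DC coefficient, which is exactly the energy-compaction property the next slides rely on.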
17. Discrete Cosine Transform
Note the top-left corner entry with the rather large
magnitude. This is the DC coefficient, which defines
the basic hue for the entire block. The remaining 63
coefficients are the AC coefficients.
The advantage of the DCT is its tendency to aggregate
most of the signal in one corner of the result. The
quantization step to follow accentuates this effect while
simultaneously reducing the overall size of the DCT
coefficients, resulting in a signal that is easy to
compress efficiently in the entropy stage.
Wikipedia, JPEG
18. 3. Quantization
The human eye is good at seeing small differences in
brightness over a relatively large area,
but not so good at distinguishing the exact strength of
a high frequency brightness variation.
Wikipedia, JPEG
19. 3. Quantization
The human eye is good at seeing small differences in brightness over a relatively
large area (DC Coefficients, basic hue), but not so good at distinguishing the
exact strength of a high frequency brightness variation (AC Coefficients).
This allows one to greatly reduce the amount of information in the high frequency
components. This is done by simply dividing each component in the frequency
domain by a constant for that component, and then rounding to the nearest
integer.
Wikipedia, JPEG
20. Quantization Matrix
A typical quantization matrix
(for a quality of 50% as specified in the original JPEG Standard)
Wikipedia, JPEG
21. Quantization Matrix
The quantization matrix is designed to provide more resolution to more
perceivable frequency components over less perceivable components
(usually lower frequencies over high frequencies) in addition to transforming as
many components to 0, which can be encoded with greatest efficiency.
This rounding operation is the only lossy operation in the whole process (other
than chroma subsampling) if the DCT computation is performed with sufficiently
high precision. As a result of this, it is typically the case that many of the higher
frequency components are rounded to zero, and many of the rest become
small positive or negative numbers, which take many fewer bits to represent.
Wikipedia, JPEG
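The divide-and-round step can be sketched in a few lines, here using the standard luminance quantization table from Annex K of the JPEG standard (the quality-50 table referenced on the previous slide; the function name is mine):

```python
# Standard luminance quantization table (JPEG Annex K, quality 50).
Q50 = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantize(dct_block, q=Q50):
    """Divide each DCT coefficient by its table entry and round
    to the nearest integer -- the lossy step of JPEG."""
    return [[round(dct_block[v][u] / q[v][u]) for u in range(8)]
            for v in range(8)]
```

Because the divisors grow toward the bottom-right, small high-frequency coefficients round to zero, producing the long zero runs that the entropy stage exploits.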
24. Run-length Encoding (RLE)
Run-length encoding (RLE) is a very simple form of lossless data compression in
which runs of data (that is, sequences in which the same data value occurs in
many consecutive data elements) are stored as a single data value and count,
rather than as the original run. This is most useful on data that contains many
such runs.
Example. WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWW
→ 12W1B12W3B24W
Wikipedia, Run-length Encoding
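The example above can be reproduced with a short run-length encoder (the count-then-symbol output format matches the example; the function name is illustrative):

```python
def rle_encode(s):
    """Collapse runs of identical symbols into count+symbol pairs,
    e.g. 'WWWB' -> '3W1B'."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # extend the current run
        out.append(f"{j - i}{s[i]}")    # emit count and symbol
        i = j
    return "".join(out)
```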
25. Shannon Entropy
Information entropy is the average rate at which
information is produced by a stochastic source
of data. The measure of information entropy
associated with each possible data value is its
negative log-probability, I(x) = −log₂ P(x).
When the data source produces a low-probability
value (when a low-probability event occurs), the
event carries more "information" ("surprisal")
than a high-probability event.
Wikipedia, Shannon Entropy
[Figure: Entropy H(X) of a coin flip as a function of the probability of heads]
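The coin-flip curve can be reproduced from the standard formula H = −Σ pᵢ log₂ pᵢ (the function name is mine):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)).
    Terms with p = 0 contribute nothing, by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A fair coin yields the maximum of 1 bit per flip; a certain outcome yields 0 bits, since it carries no surprisal.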
26. Huffman Encoding
In computer science and information theory, a Huffman code is a particular type
of optimal prefix code that is commonly used for lossless data compression.
The output from Huffman's algorithm can be viewed as a variable-length code
table for encoding a source symbol (such as a character in a file). The
algorithm derives this table from the estimated probability or frequency of
occurrence (weight) for each possible value of the source symbol. As in other
entropy encoding methods, more common symbols are generally represented
using fewer bits than less common symbols.
Wikipedia, Huffman Encoding
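A sketch of Huffman's algorithm using a binary heap of weighted subtrees (it returns only the code table; a real JPEG encoder uses canonical, length-limited tables):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code from symbol frequencies in `text`.
    Returns {symbol: bitstring}; rarer symbols get longer codes."""
    freq = Counter(text)
    # Heap entries: (weight, tiebreak, tree). A tree is either a
    # symbol or a (left, right) pair of subtrees.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

For the input "aaaabbc" the frequent symbol gets a 1-bit code and the rare ones 2-bit codes, illustrating the "more common symbols, fewer bits" principle from the quote.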
29. 4. Encoding
It involves arranging the image components in a "zigzag" order, employing a
run-length encoding (RLE) algorithm that groups similar frequencies together,
inserting length-coding zeros, and then using Huffman coding on what is left.
The JPEG standard provides general-purpose Huffman tables; encoders may
also choose to generate Huffman tables optimized for the actual frequency
distributions in images being encoded.
Wikipedia, JPEG
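The zigzag order itself can be generated by sorting block positions by anti-diagonal, alternating the direction on odd and even diagonals (a compact sketch; real codecs use a precomputed 64-entry table):

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag
    scan order: walk anti-diagonals from the top-left corner,
    reversing direction on each diagonal."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
```

Scanning the quantized block in this order front-loads the low-frequency coefficients and leaves the many zeroed high-frequency coefficients as one long tail, which is what makes RLE so effective here.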
30. Why RLE?
It involves arranging the image components in a
"zigzag" order, employing a run-length encoding
(RLE) algorithm that groups similar frequencies
together, inserting length-coding zeros, and then
using Huffman coding on what is left.
It is typically the case that many of the higher
frequency components are rounded to zero.
Wikipedia, JPEG
31. Why not use Huffman Encoding directly?
A traditional Huffman code would be obliged to use at least one bit per
character. … The entropy of English, given a good model, is about one bit per
character (Shannon, 1948), so a Huffman code is likely to be highly inefficient.
A traditional patch-up of Huffman codes uses them to compress blocks of
symbols, … but only at the expense of losing the elegant instantaneous
decodeability, … and having to compute the probabilities of all relevant strings
… end up explicitly computing the probabilities and codes for a huge number of
strings, most of which will never actually occur. … They are optimal symbol
codes, but for practical purposes we don’t want a symbol code.
Information Theory, Inference, and Learning Algorithms (MacKay, 2005)
32. 5. Summary
● Y′CbCr and Chroma Subsampling
● Discrete Cosine Transform on Spatial Frequency
● Effective Quantization of AC Coefficients
● Run-length Encoding
● Huffman Encoding
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm