2019.07.29
Kyle (Kwanghee Choi)
JPEG
JPEG is a commonly used method of lossy compression for digital images,
particularly for those images produced by digital photography. The degree of
compression can be adjusted, allowing a selectable tradeoff between storage
size and image quality. JPEG typically achieves 10:1 compression with little
perceptible loss in image quality.
Wikipedia, JPEG
JFIF Encoding
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
1. Color Transform & Downsampling
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
1. Color Transform & Downsampling
The representation of the colors in the image is converted from RGB to Y′CbCr,
consisting of one luma component (Y′), representing brightness, and two
chroma components (Cb and Cr), representing color. This step is sometimes
skipped.
The resolution of the chroma data is reduced, usually by a factor of 2 or 3.
This reflects the fact that the eye is less sensitive to fine color details than to
fine brightness details.
Wikipedia, JPEG
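As a minimal illustration of this step, the sketch below converts an RGB image to full-range Y′CbCr using the BT.601 coefficients specified by JFIF; the NumPy implementation and the function name are mine, not from the slides.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 RGB image (0..255) to full-range Y'CbCr (JFIF / BT.601)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b          # luma
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0  # blue-difference chroma
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0  # red-difference chroma
    return np.stack([y, cb, cr], axis=-1)
```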
YUV
YUV (…) encodes a color image or video taking human
perception into account, allowing reduced bandwidth for
chrominance components, thereby typically enabling
transmission errors or compression artifacts to be more
efficiently masked by the human perception than using a "direct"
RGB-representation.
Wikipedia, YUV
YUV
Y′UV was invented when engineers wanted color television
in a black-and-white infrastructure.
The luma component already existed as the black and white signal; they added
the UV signal to this as a solution.
The U and V signals tell the television to shift the color of a certain pixel
without altering its brightness. Or the U and V signals tell the monitor to make
one color brighter at the cost of the other and by how much it should be
shifted.
Wikipedia, YUV
Y’CbCr
Y′CbCr is often confused with the YUV color space,
and typically the terms YCbCr and YUV are used
interchangeably, leading to some confusion.
The main difference is that YUV is analog and YCbCr is digital.
Y′CbCr is used to separate out a luma signal (Y′) that can be stored with high
resolution or transmitted at high bandwidth, and two chroma components (CB
and CR) that can be bandwidth-reduced, subsampled, compressed, or
otherwise treated separately for improved system efficiency.
Wikipedia, YCbCr
Chroma Subsampling
Chroma subsampling is the practice of encoding images by implementing less
resolution for chroma information than for luma information, taking
advantage of the human visual system's lower acuity for color differences
than for luminance.
The subsampling scheme is commonly expressed as a three-part ratio J:a:b that
describes the number of luminance and chrominance samples in a conceptual
region that is J pixels wide and 2 pixels high.
● J: horizontal sampling reference (width of the conceptual region). Usually, 4.
● a: number of chrominance samples (Cr, Cb) in the first row of J pixels.
● b: number of changes of chrominance samples (Cr, Cb) between first and second row of J pixels.
Wikipedia, Chroma Subsampling
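A toy sketch of 4:2:0 chroma subsampling, keeping one chroma sample per 2×2 block via simple box averaging; real encoders may use different filters, and the function name is mine.

```python
import numpy as np

def subsample_420(chroma):
    """4:2:0 subsampling: average each 2x2 block of a chroma channel into one sample."""
    h, w = chroma.shape
    c = chroma[: h // 2 * 2, : w // 2 * 2]              # drop an odd edge row/column if present
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```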
Commonly used ratios for JPEGs
Wikipedia, Chroma Subsampling
2. Discrete Cosine Transform
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
2. Discrete Cosine Transform
After subsampling, each channel must be split into 8×8 blocks. Depending on
chroma subsampling, this yields Minimum Coded Unit (MCU) blocks of size 8×8
(4:4:4 – no subsampling), 16×8 (4:2:2), or most commonly 16×16 (4:2:0).
If the data for a channel does not represent an integer number of blocks then the
encoder must fill the remaining area of the incomplete blocks with some form
of dummy data. Filling the edges with a fixed color can create ringing
artifacts along the visible part of the border; repeating the edge pixels is a
common technique that reduces (but does not necessarily completely eliminate)
such artifacts, and more sophisticated border filling techniques can also be
applied.
Wikipedia, JPEG
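A small sketch (names mine) of splitting a channel into 8×8 blocks, padding by repeating edge pixels (np.pad with mode="edge"), which is the common technique described above:

```python
import numpy as np

def to_blocks(channel, b=8):
    """Pad a 2D channel by repeating edge pixels, then split it into b x b blocks."""
    h, w = channel.shape
    pad_h, pad_w = (-h) % b, (-w) % b                            # amount needed to reach a multiple of b
    padded = np.pad(channel, ((0, pad_h), (0, pad_w)), mode="edge")
    H, W = padded.shape
    return padded.reshape(H // b, b, W // b, b).swapaxes(1, 2)   # shape: (rows, cols, b, b)
```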
Fourier Transform
The Fourier transform (FT) decomposes
a function of time (a signal) into its constituent
frequencies. This is similar to the way a musical
chord can be expressed in terms of the volumes and frequencies of its
constituent notes.
The Fourier transform of a function of time is itself a complex-valued function of
frequency, whose magnitude (modulus) represents the amount of that
frequency present in the original function, and whose argument is the phase
offset of the basic sinusoid in that frequency.
Wikipedia, Fourier Transform
2. Discrete Cosine Transform
A discrete cosine transform (DCT) expresses a finite sequence of data points in
terms of a sum of cosine functions oscillating at different frequencies. DCTs
are important to numerous applications in science and engineering, including lossy
compression of audio (e.g. MP3), images (e.g. JPEG), and video (e.g. MPEG),
where small high-frequency components can be discarded.
Wikipedia, DCT
The DCT transforms an 8×8 block of input values to a linear combination of
these 64 patterns. The patterns are referred to as the two-dimensional DCT
basis functions, and the output values are referred to as transform
coefficients.
Wikipedia, JPEG
Discrete Cosine Transform Example
$$G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{7}\sum_{y=0}^{7} g_{x,y}\,\cos\!\left[\frac{(2x+1)u\pi}{16}\right]\cos\!\left[\frac{(2y+1)v\pi}{16}\right]$$
where u is the horizontal spatial frequency (0 → 7), v is the vertical spatial frequency (0 → 7),
α(k) = 1/√2 for k = 0 and 1 otherwise, g_{x,y} is the pixel value at (x, y), and G_{u,v} is the resulting DCT coefficient.
Wikipedia, Discrete Cosine Transform
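For concreteness, a small NumPy sketch of the orthonormal 8×8 2D DCT-II above (input samples are usually level-shifted by −128 beforehand); it is written for clarity rather than speed, and the function name is mine.

```python
import numpy as np

def dct2_8x8(block):
    """Orthonormal 2D DCT-II of an 8x8 block: G = C @ g @ C.T."""
    n = 8
    k = np.arange(n)
    # 1D basis matrix C[u, x] = alpha(u) * cos((2x + 1) * u * pi / 16)
    C = np.sqrt(2.0 / n) * np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)       # alpha(0) = 1/sqrt(2), absorbed into the first row's scale
    return C @ block @ C.T
```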
3. Quantization
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
Discrete Cosine Transform
Note the top-left corner entry with the rather large
magnitude. This is the DC coefficient, which defines
the basic hue for the entire block. The remaining 63
coefficients are the AC coefficients.
The advantage of the DCT is its tendency to aggregate
most of the signal in one corner of the result. The
quantization step to follow accentuates this effect while
simultaneously reducing the overall size of the DCT
coefficients, resulting in a signal that is easy to
compress efficiently in the entropy stage.
Wikipedia, JPEG
3. Quantization
Note the top-left corner entry with the rather large
magnitude. This is the DC coefficient, which defines
the basic hue for the entire block. The remaining 63
coefficients are the AC coefficients.
The human eye is good at seeing small differences in
brightness over a relatively large area,
but not so good at distinguishing the exact strength of
a high frequency brightness variation.
Wikipedia, JPEG
3. Quantization
The human eye is good at seeing small differences in brightness over a relatively
large area (DC Coefficients, basic hue), but not so good at distinguishing the
exact strength of a high frequency brightness variation (AC Coefficients).
This allows one to greatly reduce the amount of information in the high frequency
components. This is done by simply dividing each component in the frequency
domain by a constant for that component, and then rounding to the nearest
integer.
Wikipedia, JPEG
Quantization Matrix
A typical quantization matrix
(for a quality of 50% as specified in the original JPEG Standard)
Wikipedia, JPEG
Quantization Matrix
The quantization matrix is designed to provide more resolution to more
perceivable frequency components over less perceivable components
(usually lower frequencies over high frequencies), while also forcing as many
components as possible to 0, which can be encoded with the greatest efficiency.
This rounding operation is the only lossy operation in the whole process (other
than chroma subsampling) if the DCT computation is performed with sufficiently
high precision. As a result of this, it is typically the case that many of the higher
frequency components are rounded to zero, and many of the rest become
small positive or negative numbers, which take many fewer bits to represent.
Wikipedia, JPEG
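As a sketch of this step, here is the standard luminance quantization table (JPEG standard, Annex K; the "quality 50" table referred to above) together with the divide-and-round operation; the function name is mine.

```python
import numpy as np

# Standard luminance quantization table (JPEG Annex K, "quality 50").
Q50 = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(dct_block, q=Q50):
    """Divide each DCT coefficient by its table entry and round to the nearest integer."""
    return np.rint(dct_block / q).astype(np.int32)
```

Note how the large entries toward the lower-right of the table crush the high-frequency coefficients toward zero, which is exactly what the entropy-coding stage exploits.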
Visual Artifacts of JPEG
Wikipedia, JPEG
4. Encoding
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
Run-length Encoding (RLE)
Run-length encoding (RLE) is a very simple form of lossless data compression in
which runs of data (that is, sequences in which the same data value occurs in
many consecutive data elements) are stored as a single data value and count,
rather than as the original run. This is most useful on data that contains many
such runs.
Example. WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWW
→ 12W1B12W3B24W
Wikipedia, Run-length Encoding
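A toy sketch reproducing the example above; this character-level RLE is only illustrative (JPEG's actual run-length coding operates on runs of zero-valued AC coefficients).

```python
def rle(s):
    """Collapse runs of identical characters into count+character pairs, e.g. 'WWWB' -> '3W1B'."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(f"{j - i}{s[i]}")
        i = j
    return "".join(out)

print(rle("W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24))   # -> 12W1B12W3B24W
```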
Shannon Entropy
Information entropy is the average rate at which
information is produced by a stochastic source
of data. The measure of information entropy
associated with each possible data value is the negative logarithm of its
probability, I(x) = −log₂ P(x); the entropy H(X) is the average of this
quantity over all possible values.
When the data source produces a low-probability
value (when a low-probability event occurs), the
event carries more "information" ("surprisal")
than a high-probability event.
Wikipedia, Shannon Entropy
Entropy H(X) of a coin flip
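A minimal sketch of the entropy computation behind that plot; the function name is mine.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum of p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin   -> 1.0 bit (maximal uncertainty)
print(entropy([0.9, 0.1]))   # biased coin -> ~0.47 bits
```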
Huffman Encoding
In computer science and information theory, a Huffman code is a particular type
of optimal prefix code that is commonly used for lossless data compression.
The output from Huffman's algorithm can be viewed as a variable-length code
table for encoding a source symbol (such as a character in a file). The
algorithm derives this table from the estimated probability or frequency of
occurrence (weight) for each possible value of the source symbol. As in other
entropy encoding methods, more common symbols are generally represented
using fewer bits than less common symbols.
Wikipedia, Huffman Encoding
Huffman Encoding Example
Wikipedia, Huffman Encoding
Huffman Encoding Example
Wikipedia, Huffman Encoding
Near optimal
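A compact sketch of Huffman code construction (greedily merging the two lowest-weight subtrees) using Python's heapq; it is only illustrative and ignores the canonical table format that JPEG actually stores.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix code from symbol frequencies; frequent symbols get shorter codewords."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)                           # two least-frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}    # prepend 0/1 while walking up the tree
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], i, merged])
        i += 1
    return heap[0][2]

print(huffman_code("this is an example of a huffman tree"))
```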
4. Encoding
Entropy coding involves arranging the image components in a "zigzag" order,
employing a run-length encoding (RLE) algorithm that groups similar frequencies
together, inserting length-coding zeros, and then using Huffman coding on what is left.
The JPEG standard provides general-purpose Huffman tables; encoders may
also choose to generate Huffman tables optimized for the actual frequency
distributions in images being encoded.
Wikipedia, JPEG
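The zigzag scan itself can be generated by walking the anti-diagonals of the 8×8 block and alternating direction on odd and even diagonals; a small sketch (names mine):

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in JPEG zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                              # anti-diagonal index
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))   # alternate direction

print(zigzag_indices()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```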
Why RLE?
Entropy coding involves arranging the image components in a
"zigzag" order, employing a run-length encoding
(RLE) algorithm that groups similar frequencies
together, inserting length-coding zeros, and then
using Huffman coding on what is left.
It is typically the case that many of the higher
frequency components are rounded to zero.
Wikipedia, JPEG
Why not use Huffman Encoding directly?
A traditional Huffman code would be obliged to use at least one bit per
character. … The entropy of English, given a good model, is about one bit per
character (Shannon, 1948), so a Huffman code is likely to be highly inefficient.
A traditional patch-up of Huffman codes uses them to compress blocks of
symbols, … but only at the expense of losing the elegant instantaneous
decodeability, … and having to compute the probabilities of all relevant strings
… end up explicitly computing the probabilities and codes for a huge number of
strings, most of which will never actually occur. … They are optimal symbol
codes, but for practical purposes we don’t want a symbol code.
Information Theory, Inference, and Learning Algorithms (MacKay, 2005)
5. Summary
● Y′CbCr and Chroma Subsampling
● Discrete Cosine Transform on Spatial Frequency
● Effective Quantization of AC Coefficients
● Run-length Encoding
● Huffman Encoding
Ref. https://www.fileformat.info/mirror/egff/ch09_06.htm
