1
Audio Compression
Techniques
Prepared by
Razia Nisar Noorani
Lecture 8
2
Introduction
 Digital Audio Compression
 Removal of redundant or otherwise irrelevant
information from audio signal
 Audio compression algorithms are often referred to as
“audio encoders”
 Applications
 Reduces required storage space
 Reduces required transmission bandwidth
3
Audio Compression
 Audio signal – overview
 Sampling rate (# of samples per second)
 Bit rate (# of bits per second). Typically,
uncompressed stereo 16-bit 44.1 kHz audio has a bit rate of
about 1.41 Mbit/s (44,100 × 16 × 2 bits per second)
 Number of channels (mono / stereo / multichannel)
 Reduction by lowering those values or by data
compression / encoding
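
As a worked example of the bit-rate figure on this slide: the uncompressed bit rate is the product of sampling rate, bits per sample and channel count. A minimal sketch follows; the function name is illustrative and not part of the lecture.

```python
def uncompressed_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = uncompressed_bit_rate(44_100, 16, 2)
print(cd_rate)              # 1411200 bits per second
print(cd_rate / 1_000_000)  # 1.4112, i.e. about 1.4 Mbit/s
```

Halving any one of the three factors (for example going from stereo to mono) halves the bit rate, which is the "lowering those values" option mentioned on the slide.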
4
Audio Data Compression
 Redundant information
 Implicit in the remaining information
 Ex. oversampled audio signal
 oversampling is the process of sampling a signal with a
sampling frequency significantly higher than twice the
bandwidth or highest frequency of the signal being sampled
 Irrelevant information
 Perceptually insignificant
 Cannot be recovered from remaining information
5
Audio Data Compression
 Lossless Audio Compression
Removes redundant data
Resulting signal is same as original – perfect
reconstruction
 Lossy Audio Encoding
Removes irrelevant data
Resulting signal is similar to original
6
Audio Data Compression
 Audio vs. Speech Compression
Techniques
Speech Compression uses a human vocal
tract model to compress signals
Audio Compression does not use this
technique due to larger variety of possible
signal variations
7
Generic Audio Encoder
 Psychoacoustic Model
Psychoacoustics – study of how sounds are
perceived by humans
Uses perceptual coding
 eliminate information from audio signal that is
inaudible to the ear
Detects conditions under which different audio
signal components mask each other
8
Psychoacoustic Model
 Signal Masking
Threshold cut-off
Spectral (Frequency / Simultaneous) Masking
Temporal Masking
 Threshold cut-off and spectral masking
occur in frequency domain, temporal
masking occurs in time domain
9
Signal Masking
 Threshold cut-off
 Hearing threshold level
– a function of
frequency
 Any frequency
components below the
threshold will not be
perceived by human
ear
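
To make the idea of a frequency-dependent hearing threshold concrete, the sketch below uses a widely quoted closed-form approximation of the threshold in quiet (attributed to Terhardt). The formula is an assumption brought in for illustration and is not given in the lecture; the function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL at freq_hz (Terhardt-style fit)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db_spl):
    """A component below the threshold in quiet can be discarded."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


print(is_audible(1000, 20))  # True: 20 dB SPL at 1 kHz is above the threshold
print(is_audible(100, 10))   # False: 10 dB SPL at 100 Hz is below the threshold
```

The curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward very low and very high frequencies.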
10
Signal Masking
 Spectral Masking
 A frequency
component can be
partly or fully masked
by another component
that is close to it in
frequency
 This shifts the hearing
threshold
11
Signal Masking
 Temporal Masking
 A quieter sound can
be masked by a louder
sound if they are
temporally close
 A sound that occurs shortly before (pre-masking) or
shortly after (post-masking) a louder sound can be
masked
12
Spectral Analysis
 Performed by a device or algorithm that computes a
frequency domain representation of a time domain signal
 Tasks of Spectral Analysis
 To derive masking thresholds to determine which
signal components can be eliminated
 To generate a representation of the signal to which
masking thresholds can be applied
 Spectral Analysis is done through transforms or
filter banks
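
A minimal sketch of what "a frequency domain representation of a time domain signal" means in practice. It uses a naive discrete Fourier transform in pure Python for clarity (a real encoder would use an FFT or a filter bank); the function name is illustrative.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of DFT bins 0..N/2 for a real-valued block."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        magnitudes.append(abs(acc))
    return magnitudes


# A sine wave that completes exactly 5 cycles in a 64-sample block.
n = 64
tone = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)  # 5
```

Once the signal is expressed as per-frequency magnitudes like this, each component can be compared against a masking threshold at its own frequency.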
13
Spectral Analysis
 Transforms
Fast Fourier Transform (FFT)
Discrete Cosine Transform (DCT) - similar to
FFT but uses cosine values only
Modified Discrete Cosine Transform (MDCT)
[used by MPEG-1 Layer-III, MPEG-2 AAC,
Dolby AC-3] – overlapped and windowed
version of DCT
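
The "overlapped and windowed" property of the MDCT can be sketched as follows. Each block of 2M samples overlaps its neighbours by M samples and yields only M coefficients, yet windowed overlap-add of the inverse transforms restores the signal. This is a slow reference sketch, assuming a sine window and the textbook MDCT definition; it is not the optimized form used in any particular codec, and all names are illustrative.

```python
import math


def sine_window(m):
    """Sine window of length 2m; satisfies w[n]^2 + w[n+m]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def _basis(n, k, m):
    return math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))


def mdct(block):
    """2m windowed input samples -> m coefficients."""
    m = len(block) // 2
    return [sum(block[n] * _basis(n, k, m) for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs):
    """m coefficients -> 2m time-aliased output samples."""
    m = len(coeffs)
    return [2.0 / m * sum(coeffs[k] * _basis(n, k, m) for k in range(m))
            for n in range(2 * m)]


def round_trip(signal, m):
    """Analyze and resynthesize; len(signal) must be a multiple of m (m even)."""
    window = sine_window(m)
    padded = [0.0] * m + list(signal) + [0.0] * m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):  # hop size m: 50% overlap
        block = [padded[start + n] * window[n] for n in range(2 * m)]
        restored = imdct(mdct(block))
        for n in range(2 * m):
            out[start + n] += restored[n] * window[n]   # overlap-add
    return out[m:m + len(signal)]


signal = [float(i % 5) - 2.0 for i in range(16)]
restored = round_trip(signal, 4)
print(max(abs(a - b) for a, b in zip(signal, restored)) < 1e-9)  # True
```

In a lossy encoder the coefficients would be quantized between `mdct` and `imdct`; the overlap then smooths the block boundaries instead of producing audible discontinuities.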
14
Spectral Analysis
 Filter Banks
 a filter bank is an array of band-pass filters that
separates the input signal into multiple
components, each one carrying a single
frequency subband of the original signal
 Time sample blocks are passed through a set of
bandpass filters
 Masking thresholds are applied to resulting frequency
subband signals
 Polyphase and wavelet filter banks are the most popular
filter structures
15
Filter Bank Structures
 Polyphase Filter Bank
[used in all MPEG-1 audio layers]
Signal is separated into subbands, the widths
of which are equal over the entire frequency
range
The resulting subband signals are
downsampled to create shorter signals (which
are later reconstructed during decoding
process)
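
The split, downsample and reconstruct steps can be sketched with the simplest possible case: a two-band (Haar) filter bank. This is an illustrative stand-in chosen for brevity, not the 32-band polyphase bank of MPEG-1; the function names are hypothetical.

```python
def analyze(samples):
    """Split an even-length block into low and high subbands.

    Each subband has half as many samples as the input (downsampling by 2).
    """
    pairs = range(0, len(samples) - 1, 2)
    low = [(samples[i] + samples[i + 1]) / 2 for i in pairs]   # local average
    high = [(samples[i] - samples[i + 1]) / 2 for i in pairs]  # local detail
    return low, high


def synthesize(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


block = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0]
low, high = analyze(block)
print(low)                     # [2.0, 2.0, 3.0]
print(high)                    # [-1.0, 0.0, 2.0]
print(synthesize(low, high))   # [1.0, 3.0, 2.0, 2.0, 5.0, 1.0]
```

The two subbands together hold exactly as many samples as the input, so the split by itself adds no data; the saving comes later, when each subband is quantized according to its masking threshold.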
16
Filter Bank Structures
 Wavelet Filter Bank
[used by Enhanced Perceptual Audio
Coder (EPAC) by Lucent]
Unlike the polyphase filter bank, the subbands are
not equally wide (narrower at lower frequencies,
wider at higher frequencies)
This allows for better time resolution at high
frequencies (ex. short attacks), but at the expense
of frequency resolution there
17
Noise Allocation
 System Task: derive and apply shifted hearing
threshold to the input signal
 Anything below the threshold doesn’t need to be
transmitted
 Any noise below the threshold is irrelevant
 Frequency component quantization
 Tradeoff between space and noise
 Encoder saves on space by using just enough bits for
each frequency component to keep noise under the
threshold - this is known as noise allocation
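
A minimal sketch of the noise allocation idea, assuming the common rule of thumb that each extra bit of uniform quantization lowers the quantization noise by about 6.02 dB. The constant, the function name and the example levels are assumptions for illustration, not values from the lecture.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of uniform quantization


def bits_needed(signal_db, mask_db):
    """Fewest bits that keep quantization noise below the masking threshold."""
    if signal_db <= mask_db:
        return 0  # component is masked entirely: nothing to transmit
    return math.ceil((signal_db - mask_db) / DB_PER_BIT)


# (signal level, masking threshold) per subband, in dB
subbands = [(60.0, 30.0), (45.0, 40.0), (20.0, 35.0)]
print([bits_needed(s, m) for s, m in subbands])  # [5, 1, 0]
```

The third subband lies below its threshold and costs no bits at all, while the first needs five bits to push its quantization noise 30 dB down.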
18
Noise Allocation
 Pre-echo
 In case a single audio block contains silence followed
by a loud attack, pre-echo error occurs - there will be
audible noise in the silent part of the block after
decoding
 This is avoided by pre-monitoring audio data at
encoding stage and separating audio into shorter
blocks in potential pre-echo case
 This does not completely eliminate pre-echo, but can
make it short enough to be masked by the attack
(temporal masking)
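
The pre-monitoring step can be sketched as a simple transient detector: if the second half of a block is much louder than the first, switch to short blocks. The energy-ratio test, the threshold and the block lengths below are hypothetical placeholders, not the criteria of any specific encoder.

```python
def energy(samples):
    return sum(x * x for x in samples)


def choose_block_length(block, long_len=1024, short_len=128, ratio=8.0):
    """Return short_len when a sudden attack follows a quiet passage."""
    half = len(block) // 2
    before = energy(block[:half]) + 1e-12  # avoid division by zero on silence
    after = energy(block[half:])
    return short_len if after / before > ratio else long_len


silence_then_attack = [0.0] * 512 + [0.9, -0.8] * 256
steady_tone = [0.5, -0.5] * 512
print(choose_block_length(silence_then_attack))  # 128
print(choose_block_length(steady_tone))          # 1024
```

With short blocks, quantization noise from the attack can only spread over a short stretch before it, which temporal masking is more likely to hide.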
19
Additional Encoding Techniques
 Other encoding techniques are available
(as alternatives or in combination)
Predictive Coding
Coupling / Delta Encoding
Huffman Encoding
20
Additional Encoding Techniques
 Predictive Coding
 Often used in speech and image compression
 Estimates the expected value for each sample based
on previous sample values
 Transmits/stores the difference (residual) between the
predicted and the actual value
 The decoder generates the same estimate for each sample
and then corrects it with the stored difference
 Used for additional compression in MPEG-2 AAC
(Advanced Audio Coding)
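
A minimal sketch of predictive coding, assuming the simplest possible predictor: the estimate for each sample is just the previous sample. Real codecs use higher-order or adaptive predictors; the function names are illustrative.

```python
def encode(samples):
    """Store the difference between each sample and its prediction."""
    residuals = []
    previous = 0  # prediction for the first sample
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def decode(residuals):
    """Rebuild each sample as prediction plus stored difference."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


samples = [100, 102, 105, 105, 103]
residuals = encode(samples)
print(residuals)          # [100, 2, 3, 0, -2]
print(decode(residuals))  # [100, 102, 105, 105, 103]
```

After the first value the residuals are small numbers clustered around zero, which take fewer bits to store than the original samples.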
21
Additional Encoding Techniques
 Coupling / Delta encoding
 Used in cases where audio signal consists of two or
more channels (stereo or surround sound)
 Similarities between channels are used for
compression
 A sum and difference between two channels are
derived; difference is usually some value close to zero
and therefore requires less space to encode
 This is a lossless encoding step
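
A minimal sketch of the sum and difference step for a stereo pair of integer samples, showing that the original channels are recovered exactly. The function names and sample values are illustrative.

```python
def to_sum_diff(left, right):
    """Replace left/right with their per-sample sum and difference."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Recover left/right exactly: sum + diff = 2*left, sum - diff = 2*right."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


left = [1000, 1010, 990]
right = [998, 1011, 990]
total, diff = to_sum_diff(left, right)
print(diff)                        # [2, -1, 0]
print(from_sum_diff(total, diff))  # ([1000, 1010, 990], [998, 1011, 990])
```

When the two channels are similar, the difference channel stays near zero and can be coded with very few bits.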
22
Additional Encoding Techniques
 Huffman Coding
 Information-theory-based technique
 A value that occurs often in the signal is represented
by a shorter code word; rarer values get longer code words
 Implemented using look-up tables (code books) in both the
encoder and the decoder
 Provides substantial lossless compression, at the cost of
extra computation in the encoder and decoder
 Used by MPEG-1 Layer III and MPEG-2 AAC
23
Encoding - Final Stages
 Audio data packed into frames
 Frames stored or transmitted
24
Questions

Editor's Notes

  • #2 Hello. Today I will talk about the techniques commonly used for digital audio compression in various audio file formats.
  • #3 I will discuss the difference between redundant and irrelevant information later in the presentation. Whether the audio is stored or transmitted, compression optimizes its size.