Audio compression 1

1
Audio Compression
Techniques
Prepared by
Razia Nisar Noorani
Lecture 8

2
Introduction
 Digital Audio Compression
 Removal of redundant or otherwise irrelevant
information from audio signal
 Audio compression algorithms are often referred to as
“audio encoders”
 Applications
 Reduces required storage space
 Reduces required transmission bandwidth

3
Audio Compression
 Audio signal – overview
 Sampling rate (# of samples per second)
 Bit rate (# of bits per second). Typically, an
uncompressed stereo 16-bit 44.1 kHz signal has a
bit rate of about 1.4 Mbps (1,411.2 kbps)
 Number of channels (mono / stereo / multichannel)
 Reduction by lowering those values or by data
compression / encoding
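The uncompressed bit rate is simply the product of the three values listed on this slide. A minimal sketch of the arithmetic (the function name `pcm_bit_rate` is illustrative, not from the lecture):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# CD-quality audio: 44.1 kHz sampling, 16 bits per sample, stereo
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                  # 1411200 bits per second
print(cd_rate / 8 * 60 / 1e6)   # storage for one minute, in megabytes
```

Halving any one of the three values halves the bit rate, which is the "reduction by lowering those values" option; compression aims to reduce the rate without doing that.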

4
Audio Data Compression
 Redundant information
 Implicit in the remaining information
 Ex. oversampled audio signal
 oversampling is the process of sampling a signal with a
sampling frequency significantly higher than twice the
bandwidth or highest frequency of the signal being sampled
 Irrelevant information
 Perceptually insignificant
 Cannot be recovered from remaining information

5
 Lossless Audio Compression
Removes redundant data
Resulting signal is same as original – perfect
reconstruction
 Lossy Audio Encoding
Removes irrelevant data
Resulting signal is similar to original

6
 Audio vs. Speech Compression
Techniques
Speech Compression uses a human vocal
tract model to compress signals
Audio Compression does not use this
technique due to larger variety of possible
signal variations

7
Generic Audio Encoder
 Psychoacoustic Model
Psychoacoustics – study of how sounds are
perceived by humans
Uses perceptual coding
 eliminate information from audio signal that is
inaudible to the ear
Detects conditions under which different audio
signal components mask each other

8
Psychoacoustic Model
 Signal Masking
Threshold cut-off
Spectral (Frequency / Simultaneous) Masking
Temporal Masking
 Threshold cut-off and spectral masking
occur in frequency domain, temporal
masking occurs in time domain

9
Signal Masking
 Threshold cut-off
 Hearing threshold level
– a function of
frequency
 Any frequency
components below the
threshold will not be
perceived by human
ear
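The lecture does not give a formula for the threshold curve. One commonly used approximation of the threshold in quiet is Terhardt's formula, sketched below; the function names are illustrative, and the curve is a fit for an average listener, not an exact limit.

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in dB SPL at freq_hz (Terhardt's fit)."""
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(freq_hz: float, level_db_spl: float) -> bool:
    """True if a lone tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

for f in (100, 1000, 3300, 16000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply toward very low and very high frequencies.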

10
Signal Masking
 Spectral Masking
 A frequency
component can be
partly or fully masked
by another component
that is close to it in
frequency
 This shifts the hearing
threshold

11
Signal Masking
 Temporal Masking
 A quieter sound can
be masked by a louder
sound if they are
temporally close
 Sounds that occur
shortly before
(pre-masking) or after
(post-masking) the
louder sound can be
masked

12
Spectral Analysis
 Spectral analysis derives a frequency
domain representation of a time domain signal.
 Tasks of Spectral Analysis
 To derive masking thresholds to determine which
signal components can be eliminated
 To generate a representation of the signal to which
masking thresholds can be applied
 Spectral Analysis is done through transforms or
filter banks

13
Spectral Analysis
 Transforms
Fast Fourier Transform (FFT)
Discrete Cosine Transform (DCT) - similar to
the FFT but uses only real-valued cosine terms
Modified Discrete Cosine Transform (MDCT)
[used by MPEG-1 Layer-III, MPEG-2 AAC,
Dolby AC-3] – overlapped and windowed
version of DCT
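To make "overlapped and windowed" concrete, here is a minimal direct (non-fast) MDCT sketch using NumPy. It assumes a sine window and a frame hop of N samples; the helper names (`mdct_setup`, `analyze_synthesize`) are hypothetical, and real codecs use fast algorithms and their own window shapes.

```python
import numpy as np

def mdct_setup(N):
    """Cosine basis (N x 2N) and sine window for frames of 2N samples."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    window = np.sin(np.pi * (n + 0.5) / (2 * N))  # w[n]^2 + w[n+N]^2 = 1
    return basis, window

def mdct(frame, basis, window):
    """2N windowed time samples -> N frequency coefficients."""
    return basis @ (window * frame)

def imdct(coeffs, basis, window):
    """N coefficients -> 2N windowed time samples (still containing aliasing)."""
    N = len(coeffs)
    return (2.0 / N) * window * (basis.T @ coeffs)

def analyze_synthesize(x, N):
    """Transform 50%-overlapped frames and overlap-add the inverse transforms."""
    basis, window = mdct_setup(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N, N):
        frame = padded[start:start + 2 * N]
        out[start:start + 2 * N] += imdct(mdct(frame, basis, window), basis, window)
    return out[N:-N]

x = np.random.default_rng(0).standard_normal(64)  # length is a multiple of N
y = analyze_synthesize(x, N=8)
print(np.allclose(x, y))
```

Each frame of 2N samples yields only N coefficients, yet adding the overlapping halves of neighbouring frames cancels the aliasing, so no extra data is needed for the overlap.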

14
Spectral Analysis
 Filter Banks
 a filter bank is an array of band-pass filters that
separates the input signal into multiple
components, each one carrying a single
frequency subband of the original signal
 Time sample blocks are passed through a set of
bandpass filters
 Masking thresholds are applied to resulting frequency
subband signals
 Polyphase and wavelet filter banks are the most popular
filter structures

15
Filter Bank Structures
 Polyphase Filter Bank
[used in all MPEG-1 audio layers (I, II, and III)]
Signal is separated into subbands, the widths
of which are equal over the entire frequency
range
The resulting subband signals are
downsampled to create shorter signals (which
are later reconstructed during decoding
process)
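The split, downsample, and reconstruct cycle can be illustrated with the smallest possible case: two equal-width subbands. MPEG-1 uses a 32-band polyphase filter bank; the sketch below substitutes a two-band Haar filter pair purely for brevity, and the function names are illustrative.

```python
import numpy as np

def analysis(x):
    """Split x (even length) into low and high subbands, each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)    # low-pass branch
    high = (even - odd) / np.sqrt(2)   # high-pass branch
    return low, high

def synthesis(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

signal = np.array([4.0, 6.0, 5.0, 5.0, 1.0, -1.0, 0.0, 2.0])
low, high = analysis(signal)
print(len(low), len(high))                        # each subband is half as long
print(np.allclose(synthesis(low, high), signal))  # reconstruction check
```

The two subbands together hold exactly as many samples as the input, so splitting by itself saves nothing; the saving comes from quantizing each subband according to its masking threshold.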

16
Filter Bank Structures
 Wavelet Filter Bank
[used by Enhanced Perceptual Audio
Coder (EPAC) by Lucent]
Unlike the polyphase filter bank, the subbands are
not of equal width (narrower at low frequencies,
wider at high frequencies)
This allows for better time resolution at high
frequencies (ex. short attacks), but at the expense
of frequency resolution there

17
Noise Allocation
 System Task: derive and apply shifted hearing
threshold to the input signal
 Anything below the threshold doesn’t need to be
transmitted
 Any noise below the threshold is irrelevant
 Frequency component quantization
 Tradeoff between space and noise
 Encoder saves on space by using just enough bits for
each frequency component to keep noise under the
threshold - this is known as noise allocation
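A rough sketch of the idea, assuming the common rule of thumb that each extra bit of a uniform quantizer lowers the noise by about 6 dB. The band levels and thresholds below are made-up illustration values, and `bits_needed` is a hypothetical helper, not a codec routine.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: one extra bit lowers quantization noise by ~6 dB

def bits_needed(signal_db: float, mask_db: float) -> int:
    """Smallest bit count that keeps quantization noise under the threshold."""
    if signal_db <= mask_db:
        return 0  # the component itself is masked: nothing to transmit
    return math.ceil((signal_db - mask_db) / DB_PER_BIT)

# Hypothetical (signal level, masking threshold) pairs in dB, one per subband
bands = [(60.0, 20.0), (45.0, 40.0), (30.0, 35.0)]
allocation = [bits_needed(s, m) for s, m in bands]
print(allocation)  # [7, 1, 0]
```

A band far above its threshold gets many bits, a band just above it gets few, and a fully masked band gets none.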

18
Noise Allocation
 Pre-echo
 If a single audio block contains silence followed
by a loud attack, pre-echo occurs: quantization noise
is spread over the whole block, so there will be
audible noise in the silent part of the block after
decoding
 This is avoided by pre-monitoring audio data at
encoding stage and separating audio into shorter
blocks in potential pre-echo case
 This does not completely eliminate pre-echo, but can
make it short enough to be masked by the attack
(temporal masking)
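One simple way to pre-monitor a block is to compare the energy of consecutive sub-blocks and switch to short blocks when the energy jumps sharply. This is a sketch under assumed parameters (8 sub-blocks, a 10x energy ratio); the function name and thresholds are hypothetical and not taken from any standard.

```python
import numpy as np

def choose_block_type(block, n_sub=8, ratio=10.0, floor=1e-9):
    """Return 'short' if the block contains a sudden energy rise, else 'long'."""
    parts = np.array_split(np.asarray(block, dtype=float), n_sub)
    energy = np.array([np.mean(p ** 2) for p in parts]) + floor  # avoid divide by zero
    jump = energy[1:] / energy[:-1]  # energy growth between neighbouring sub-blocks
    return "short" if np.any(jump > ratio) else "long"

attack = np.concatenate([np.zeros(512), np.ones(512)])      # silence, then a loud onset
steady = np.sin(2 * np.pi * 16 * np.arange(1024) / 1024)    # stationary tone
print(choose_block_type(attack))  # short
print(choose_block_type(steady))  # long
```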

19
Additional Encoding Techniques
 Other encoding techniques are
available (alternative or in combination)
Predictive Coding
Coupling / Delta Encoding
Huffman Encoding

20
 Predictive Coding
 Often used in speech and image compression
 Estimates the expected value for each sample based
on previous sample values
 Transmits/stores the difference between the expected
and actual value
 The decoder generates the same estimate for each
sample and then adjusts it by the difference stored
for that sample
 Used for additional compression in MPEG-2 AAC
(Advanced Audio Coding)
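A minimal sketch of the principle, assuming the simplest possible predictor (each sample is predicted to equal the previous one). This is a generic time-domain illustration, not the predictor used in AAC, and the function names are illustrative.

```python
def dpcm_encode(samples):
    """Store the difference between each sample and its predicted value."""
    residuals = []
    prediction = 0
    for s in samples:
        residuals.append(s - prediction)
        prediction = s  # first-order predictor: next sample ~ current sample
    return residuals

def dpcm_decode(residuals):
    """Rebuild samples by adding each stored difference to the running prediction."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples

pcm = [100, 102, 105, 107, 108, 108, 106]
res = dpcm_encode(pcm)
print(res)                      # [100, 2, 3, 2, 1, 0, -2]
print(dpcm_decode(res) == pcm)  # True
```

After the first value, the residuals are much smaller than the samples themselves, so they can be stored with fewer bits.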

21
 Coupling / Delta encoding
 Used in cases where audio signal consists of two or
more channels (stereo or surround sound)
 Similarities between channels are used for
compression
 The sum and difference of the two channels are
derived; the difference is usually close to zero
and therefore requires less space to encode
 This is a lossless encoding process
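A small sketch of the sum/difference transform on integer samples, showing that both channels are recovered exactly. The sample values are made up and the function names are illustrative.

```python
import numpy as np

def to_sum_diff(left, right):
    """Replace (left, right) with (left + right, left - right)."""
    left = np.asarray(left, dtype=np.int32)
    right = np.asarray(right, dtype=np.int32)
    return left + right, left - right

def from_sum_diff(total, diff):
    """Invert the transform; both divisions are exact for integer input."""
    left = (total + diff) // 2    # total + diff == 2 * left
    right = (total - diff) // 2   # total - diff == 2 * right
    return left, right

left = [1000, -2000, 1500]
right = [998, -2003, 1500]
total, diff = to_sum_diff(left, right)
print(diff)  # small values when the channels are similar
rec_left, rec_right = from_sum_diff(total, diff)
print(np.array_equal(rec_left, left), np.array_equal(rec_right, right))
```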

22
 Huffman Coding
 Information-theory-based technique
 An element of a signal that occurs often in the
signal is represented by a shorter code, and the
mapping is stored in a look-up table
 Implemented using look-up tables in the encoder and
in the decoder
 Provides substantial lossless compression, at the
cost of some additional computation
 Used by MPEG-1 Layer III and MPEG-2 AAC
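A compact sketch of building and using a Huffman table. It derives the table from the data itself, whereas audio codecs typically ship predefined tables; the function names and sample data are illustrative.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:   # prefix-free: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

data = [0] * 8 + [1] * 4 + [-1] * 3 + [2]   # hypothetical quantized values
code = huffman_code(data)
bits = encode(data, code)
print(len(bits), "bits instead of", 2 * len(data))
print(decode(bits, code) == data)
```

The most frequent value (0) receives a 1-bit code, so the 16 values fit in 28 bits instead of the 32 needed with a fixed 2-bit code.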

23
Encoding - Final Stages
 Audio data packed into frames
 Frames stored or transmitted
