SlideShare a Scribd company logo
MPEG Audio Coding
Nimrod Peleg
Update: March 2004
Introduction
• High quality low bit-rate audio coding
• MPEG-1: Mono & Stereo, sampling rates of
32KHz, 44.1KHz and 48KHz.
• MPEG-2: Backward compatible coding of
5+1 multi-channel sound, more sampling
rates: 16KHz, 22.05KHz and 24KHz.
Some Facts
• MPEG-1: 1.5 Mbits/sec for audio and video:
~1.1Mbps for video, 0.3-0.4Mbps for audio
• Uncompressed CD audio is 44,100 samples/sec
* 16 bits/sample * 2 channels > 1.4Mbps
• Typical Compression factors: from 2.7 to 24
• With Compression rate 6:1 (16 bits stereo
sampled at 48 KHz is reduced to 256 kbps) and
optimal listening conditions, expert listeners
could not distinguish between coded and
original audio clips.
Some Facts (Cont’d)
• MPEG-1 audio supports sampling
frequencies of 32, 44.1 and 48 KHz.
• Supports one or two audio channels in one
of the four modes:
– Monophonic: single audio channel
– Dual-monophonic: two independent channels
(similar to stereo)
– Stereo: for stereo channels that share bits, but
not using joint-stereo coding
– Joint-stereo: takes advantage of the correlation
between stereo channels
Basic Idea: PsychoAcoustics
• How much noise can be introduced to the
signal without being audible ?
PsychoAcoustic Model
Masking in the frequency domain
Reminder: Cochlear filter mechanism
• The bandwidth of filters (in the ‘filter bank’)
varies strongly from low to high frequencies
• Center frequencies are call ‘critical bands’:
mapping frequency onto a linear distance
measure along the basiliar membrane.
• Filters bandwidth variation: 40:1
• Filters time response variation: 1:40
• Simultaneous control of time/frequency
artifacts at 40:1 resolution range is difficult !
Barks
• Assuming that each critical band corresponds to a fixed
distance along the basiliar membrane, we define a unit
of length z(f) to be one critical band, and call it “Bark”
(after Barkhausen).
• The approximation of z(f) is done using:
z/Bark= 13 Arctan(0.76/1KHz) + 3.5 Arctan[(f/7.5kHz)2)
• Bark width vary from ~100Hz in low freq. and
4kHz at ~15kHz.
The Barks table
Bark# flow(Hz) fhigh(Hz) fcenter(Hz) BandWidth
0 0 100 50 100
1 100 200 150 100
2 200 300 250 100
•
10 1270 1480 1370 210
11 1480 1720 1600 240
•
22 9500 12000 10500 2500
23 12000 15500 13500 3500
24 15500
Reminder: Auditory mechanism (HAS)
PsychoAcoustics Model
• Frequency is divided into “Barks”: bands of
non-uniform width (narrower in lower freq.)
according to the ear’s “resolution”
• Masking is influenced by two major parameters:
– Tonal component (Masks the near frequency)
– Noise component (Masks near and lower noises)
• Given an audio signal, the model creates a
masking function
Measures
• SMR: Signal to Mask Ratio
• SNR: Signal to Noise Ratio
• MNR = SNR - SMR
Is the ratio between the mask energy and the
quantization noise
• Positive MNR: The noise injected in the
quantization process is higher than masking
level, and will be heard after reconstruction
Basic Coding Structure
Audio In
Analysis
FliterBank
Perceptual
Model
Quantization
+Bit Allocation
Bitstream
Encoding
Bit-
Stream
Output
1 3
2 Side information
(Bit allocation etc.)
Included in bitstream
Side info.
Block switching info.
Perceptual
threshold
Basic Coding Structure (Cont’d)
1. Input signal is decomposed into sub-sampled
spectral components (time/freq. domain)
2. A time-dependent mask threshold is estimated
3. Spectral components are quantized and coded,
keeping the noise (introduced in quantization)
below the mask threshold (Many implementations)
• Bit-Allocation:
1 bit of quantization introduces about 6 dB of noise
Bit-Stream Structure
Frame 1 Frame 2 Frame ... Frame N
Header
(32 bits)
CRC
(16 bits)
Coded
Data
Ancillary
Data
Header Contents
• SyncWord (12 bits)
• Layer Code (2 bits): Layer I, II or III
• Bit-rate Index (4 bits): according to the table in
next slide, 32Kbps up-to 448Kbps.
• Sampling Frequency (2 bits): 48, 44.1 or 32kHz.
• Padding (1 bit): number of slots, N or N+1
• Mode (2 bits): Stereo, Joint Stereo, Dual or
Single Channel.
Available Bit-Rates (Kbps)
Index Layer I Layer II Layer III
0000 free format free format free format
0001 32 32 32
0010 64 48 40
0011 96 56 48
0100 128 64 56
0101 160 80 64
0110 192 96 80
0111 224 112 96
1000 256 128 112
1001 288 160 128
1010 320 192 160
1011 352 224 192
1100 384 256 224
1101 416 320 256
1110 448 384 320
Perceptual Model
• A good estimation of actual masked threshold is
essential for a better quality
• A very simple model would allocate bits
according to : n_bits=(27dB*lu-lo) / 6.02dB
lu: upper band limit lo: lower band limit
(Measured in Bark)
• More advanced models (SMR) are in use
Perceptual Model: reminder
• The masking occurs in each critical band.
• Critical band represents the bandwidth at which
subjective response change rather fast.
• The bandwidth of the critical bands varies from
100Hz at low frequencies to about (0.2 x f) for
frequencies above 500Hz.
• The loudness of a band of noise at a constant sound
pressure remains constant in the critical band.
• The corresponding unit for the critical band is
bark.
Filter Banks
• Sub-Band Coders use low number of
channels, connected with processing of
adjacent samples in time
• Transform Coders use high number of sub-
bands and joint processing of adjacent
samples in frequency
No Basic difference between both approaches
Quantization
Two basic approaches
• Block Companding (block floating point):
A number of values, ordered either in time
domain or in frequency domain are normalized
to maximum absolute value (by scale factor)
– Number of bits allocated for the block (derived
from the perceptual model) derives the
quantization step size
Quantization (Cont’d)
• Noise allocation + Scalar Quant. + Huffman:
Instead of bit allocation, an amount of allowed
noise equal to the estimated masked threshold
is calculated for each scale-factor sub-band
Quantization noise is colored using scale factors,
by changing quantization step size
– Quantized values are Huffman coded
– Process is controlled by iteration loops
MPEG-1 System
• First international standard for digital audio
compression
• Joint effort of ASPEC (AT&T, CNET ...)
and MUSICAM (Philips, Matsushita ,....)
• A three layer coding algorithms defined
with main system properties are increased
complexity (encoder mainly ) and quality
(at low bit-rates)
MPEG-1 Layers
• MPEG defines 3 layers for audio.
Basic model is same, but Codec complexity
increases with each layer.
• Data is divided into frames, each of them
contains 384 samples, 12 samples from each
of the 32 filtered sub-bands as shown in the
next slide
• All layers share definition of basic
bitstream format (4 bytes header, sync. Etc.)
MPEG-1 , Layer 1
• Input signal transformed into 32 uniform sub-
bands (same frequency width for each band).
• For each sub-band an adaptive bit allocation
(based on PA model) and quantization
• Psychoacoustic model uses only frequency
masking.
• No control on the amount of noise introduced
for each sub-band : bit allocation continues
until needed output rate achieved
Layer I : more details
• Filter bank: Equally spaced Polyphase filter:
design flexibility of generalized QMF and
low computational complexity
• 511 tap prototype used, optimized for very
steep response, and stop band attenuation
better than 96dB (equivalent to 16 bit resolution)
– reconstruction error: LSB (of 16) if no quantization
• Impulse response of 10.6ms (@48kHz)
• Time resolution: 0.66ms (@48kHz)
• The prototype filter keeps pre-echo artifacts !
Layer I : more details (Cont’d)
• Quantization step uses block companding of 12
subband samples
– Basic block length: 12*32=384 samples
• A 4-bit field signals the bit allocation: 0-16 bits
for each subband
• A 6-bit field scale factor (G) for each band,
– the exponent of the block companding quantization
• This method allows changes in the bit
allocation procedure
Layer 1 features
• A simplified version of MUSICAM
• Appropriate for consumer applications such
as studio use
(where very low data rated not necessary)
• Compatible with PASC by Philips
• Basic frame length: 8mSec (for 48KHz rate)
MPEG-1, Layer 2
• Further compression, by removing
redundancy (a little bit of the temporal
masking) and a more precise quantization
• Basic frame length: 24mSec (for 48KHz
sampling freq.): 1152 samples
• Identical to MUSICAM (Except frame
header)
• Application fields: consumer and
professional studio-like broadcasting,
recording, multi-media, audio workstations.
MPEG-1, Layer 2 (Cont’d)
• Additional coding of bit-allocation, scale
factors and different frame structure.
• Encoder forms larger groups of 3 blocks,
12 samples/block, and 32 sub-bands (total
of 1152 samples per frame).
• One bit-allocation type and 3 scale factors
for every 3 blocks frame.
• Radix coding allows allocation of fractional
bits for small quantized values
MPEG-1 , Layer 3 (mp3)
• Signal mapping resolution is increased (in
freq. domain: non-equal frequencies)
• Signal is divided into “critical bands”,
according to human ear resolution
• Adaptive allocation of noise to each critical
band, and logarithmic quantization
• Further compression by Huffman coding
• PA model includes temporal masking effects,
takes into account stereo redundancy.
Non-Linear Critical Bands
0 2 4 6 8 10 12 14 16 18 20 22 24
0 3 6 9 12 15
Critical Bands [KHz]
Bark Numbers
Layer 3 Basic Scheme
FilterBank
(Polyphase)
MDCT
Quantizer +
Huffman
Frame Packet
and error corr.
PsychoAcoustic
Model
(Optional)
Hybrid
Filter-
Bank
PCM In
Packet Bitstream
Layer 3 Basic Scheme (Cont’d)
• PolyPhase FilterBank: Filters the input into 32
time domain signals, representing 32 uniform
frequency bands.
• MDCT: Transforms each band into freq. domain,
getting 576 spectral lines (32*18 samples):
additional frequency resolution: 18 sub-subbands
• PA Model: Analyses the input, and controls the
quantization step size
• Quantizer + Coder: Quantizes according to PA
and needed rate + Huffman coding
Layer 3 Basic Scheme (Cont’d)
• Adaptive block switching: Dynamic switching
of the time-frequency decomposition (filter
bank resolution) is allowed
• This is important in order to ensure that the
time spread of the filter bank does not exceed
the pre-masking period (to avoid pre-echo)
• Adaptive window switching uses four optional
windows: normal (long), start, short and stop.
Window types
Normal window
for stationary signals
576 spectral lines
Start window
to switch from long to short
right 1/3 is zero to cancel aliasing
Stop window
to switch from short to long
left 1/3 is zero to cancel aliasing
Short window
1/3 length followed by 1/3 length MDCT
time resolution: 4ms (@48KHz)
192 spectral lines
Example sequence of window forms
Layer III Quantization and Structure
• Non-uniform quantization and VLC (Huffman)
• A-by-S iteration loop
• No bit direct allocation, but ‘Noise allocation’
(indirect allocation) using two iteration loops
Iteration loops
• Two nested iteration loops:
– Rate loop: Inner iteration for quantization and
coding of spectral lines (using Huffman tables)
- repeats with increasing step size, until number
of allocated bits does not exceed the allowed
maximum.
– Distortion Control loop: Outer iteration,
keeping quantization noise below masking
threshold according to the PA model
• Noise coloration: scale factors reduced until injected
noise is small enough
Layer 3 Features
• Short Blocks of 12 samples (in addition to
regular 36 samples blocks) improve time
resolution to cope for transients.
• PA Model and coding technique are NOT
part of the standard - related information
should be included in bit-stream
• Optional variable rate mode
• Application in Telecommunication, mainly
narrow-band ISDN, satellite links , Internet
DVD, etc.
Joint Stereo Coding
• Can be used for stereo redundancy reduction.
• Stereo and Dual-Channel signals require twice
the bandwidth if we code them separately.
• To decrease bit-rate (or increase quality) we
can use intensity stereo mode or Middle/Side
(MS) stereo coding.
• MS stereo coding is supported only in Layer III
Joint Stereo Coding (cont’d)
Intensity stereo coding:
• instead of separate L and R subband
samples, a single summed signal is
transmitted with R and L Scale Factors.
• The frequency spectra of the decoded stereo
signals are the same but the magnitudes are
different.
Joint Stereo Coding (cont’d)
Middle/Side Stereo Mode:
• Middle (sum of L and R) and Side (difference
of L and R) are transmitted instead of L and R.
• M is transmitted in the L channel and S in the
R channel.
• R and L channels can be reconstructed using:
2/)(2/)( SMRSML −=+=
Effectiveness of MPEG audio
Layer Target Ratio Quality @ Quality @ Theoretical
# bitrate 64 kbits 128 kbits Min. Delay
1 192 kbit 4:1 --- --- 19 ms
2 128 kbit 6:1 2.1 to 2.6 4+ 35 ms
3 64 kbit 12:1 3.6 to 3.8 4+ 59 ms
5 = perfect, 4 = just noticeable, 3 = slightly
annoying, 2 = annoying, 1 = very annoying
More About the PA Model
• The difference between max. signal level
and min. masking threshold is used in the
bit or noise allocation to determine Q level
• Two models given in the informative part of
the standard. model 1 recommended for
layers 1,2 and model 2 for layer 3
• PA model output is SMR for each band (L1,
L2) or group of bands (L3)
PA Model I (Layer 1,2)
• Transform length (FFT) is 512 samples for
layer 1 and 1024 for layer 2.
• The filter bank suffers lack of selectivity at low
frequencies.
• To compensate it: FFT in parallel to sub-band
filtering.
• Sound Pressure Level (SPL) is computed for
each band.
• Tonal and non-tonal components are extracted
from the power spectrum.
PA Model I (Cont’d)
• Using “decimation”, number of maskers is reduced:
only components (tonal and non-tonal) greater than the
absolute threshold are considered.
• Two or more components that are smaller than the
highest power within the distance of 0.5 bark are
removed from the list of tonal components.
• Masking thresholds (both t and non-t) are defined by
adding the masking index and masking function to
the masking component (both index and function
are provided in the standard as formal equations)
PA Model I (Cont’d)
• Global masking threshold, LTg, (for the
frequency component) is derived by
summing the powers of the individual
masking thresholds (tonal: LTtm , non-tonal LTnm )
and the threshold in quite:
[dB]101010log10)(
1
10/),(
1
10/),(10/)(
10 





++= ∑∑ ==
n
j
ijLTnm
m
j
ijLTtmiLTq
iLTg
PA Model I (Cont’d)
• To determine the Signal-to-Mask Ratio
(SMR) in sub-band n, the minimum global
masking threshold LTmin is used:
[dB])()()( min nLTnLnSMR SBSB −=
Where LSB(n) is the signal component
in sub-band n .
PA Model II: Layer 3
• The size of FFT (+ Hann window) can be
varied. In practice: model is computed twice
in parallel (192 samples for short block and
576 samples for long block).
• Masking in time (forward and backward) is
taken into calculations (spreading energy).
• Final energy threshold obtained by the
convolution (via FFT) of “spreading”
energy and partitioned original energy.
PA Model II (Cont’d)
• SMR is calculated by the ratio between
energy in the “scale factor” band (e_partn)
and the noise level in the scale factor band
(n_partn):
[dB])_/_(log10 10 nnn partnparteSMR =
n: index of coder partition
Scale factor: the maximum of the absolute values of 12 samples in
a sub-band is determined. (6 bits)
MPEG-2 Audio
• Backwards compatible - defines extensions:
– MultiChannel coding
•5 channel audio (L, R, C, LS, RS)
– Multilingual coding
•7 multilingual channels
– Lower sampling frequencies (LSF)
– Optional Low Frequency Enhancement (LFE)
MultiCahnnel Coding
• Up to 5 audio channels Matrixation of channels
for compatibility:
C: center Ls,Rs: surround
• Lc and Rc are MPEG-1 encoded
• Layer 1,2: Use syntax of MPEG1-L2
• Layer 3: flexible number of extension channels
Lc b L C a Ls Rc b R C a Rs
b
a
a
= + + ⋅L
NM O
QP = + + ⋅L
NM O
QP
=
+ +
=
2 2
1
1 1
2
1
2
1
2
1
2 2
0; ; ;
Multi-channel Configurations
3/2 3/1 2/2 3/0
L C R
LS RS
2/1 3/0 2/0
And more options...
Bi/multi
lingual,
hearing impaired
etc.
Multilingual Coding
• Up to 7 additional channels for multi-
lingual purposes
Low Sampling Frequency Coding
• For narrow band frequencies, no need for high
sampling rates (wide-band speech and medium
quality audio)
• Added sampling rates are the halves of the
MPEG-1 rates: 16K, 22.05K and 24KHz
• Need to change PA model tables
• Optional sixth channel: LFE capable of
handling signals from 15Hz to 120Hz (Sub-
Woofer), added to 5 regular channels
Encoder Scheme
Mapping
Composite
Channel
Coding
Quant.
and
Coding
Packet
Frame
PA
Model
Bit
Stream
PCM Input
Decoder Scheme
Bit-
Stream Frame
UnPacking
Rc, Lc
Decoding
Composite
status information
Decoding
Reconstruction of
Quantized Audio
Data
Rebuilding
Audio
Channels
Inverse
Mapping
PCM
Out
MPEG-2, Layer-I extensions
• A “slot” consists of 32 bits.
• The number of slots in frame depends on the
sampling frequency and bit-rate.
• Each frame contains information on 384
samples of the original input signal.
• Frame_size= 384 * (1/fs) (16mSec for fs=24KHz)
• Num_of_slots= bit_rate * (384/32)/fs
(32 for fs=24KHz)
MPEG-2, Layer II extensions
• Difference from MPEG-1 only in formatting,
possible quantization and PA model.
• A slot consists of 8 bits.
• Each frame contains information on 1152
samples of the original input signal.
MPEG-2, Layer III extensions
• Different scale factor band tables.
• Omission of some side information (due to changed
frame layout).
• Some changes in PA model tables.
• 21 Scale factors bands for each fs (long windows)
• 12 Scale factors bands for each fs (short windows)
• Scale factor band: a set of frequency lines that are
scaled by the same scale factor.
Witches:Witches: ““DingoDingo”” at 22.05Khzat 22.05Khz
0
10
20
30
40
50
60
70
80
90
100
128 112 96 80 64 56 48 40 32 24 16
bitrate [KBits/sec]
0
5
10
15
20
25
quality [4.5-5] quality [3.5-4.5] quality [2.5-3.5] quality [1.5-2.5] quality [1-1.5] SNR [dB]
Successful
granules SNR

More Related Content

What's hot

Multimedia Compression and Communication
Multimedia Compression and CommunicationMultimedia Compression and Communication
Multimedia Compression and Communication
Benesh Selvanesan
 
SPEECH COMPRESSION TECHNIQUES: A REVIEW
SPEECH COMPRESSION TECHNIQUES: A REVIEWSPEECH COMPRESSION TECHNIQUES: A REVIEW
SPEECH COMPRESSION TECHNIQUES: A REVIEW
ijiert bestjournal
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPCDisha Modi
 
Digital audio
Digital audioDigital audio
Digital audio
Devashish Raval
 
3. digital transmission fundamentals
3. digital transmission fundamentals3. digital transmission fundamentals
3. digital transmission fundamentalsRovin Valencia
 
Lecture 18 (5)
Lecture 18 (5)Lecture 18 (5)
Lecture 18 (5)
Deepakkumar5880
 
Final presentation
Final presentationFinal presentation
Final presentation
Meghasyam Tummalacherla
 
multirate signal processing for speech
multirate signal processing for speechmultirate signal processing for speech
multirate signal processing for speech
Rudra Prasad Maiti
 
Digital Communication 1
Digital Communication 1Digital Communication 1
Digital Communication 1
admercano101
 
Digital communication systems
Digital communication systemsDigital communication systems
Digital communication systemsNisreen Bashar
 
Introductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal ProcessingIntroductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal Processing
Angelo Salatino
 
Dc ch06 : multiplexing
Dc ch06 : multiplexingDc ch06 : multiplexing
Dc ch06 : multiplexing
Syaiful Ahdan
 
Multimedia Systems by Sahil Punni
Multimedia Systems by Sahil PunniMultimedia Systems by Sahil Punni
Multimedia Systems by Sahil Punni
Sahil Punni
 
An Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and JitterAn Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and Jitter
Dr. Mohieddin Moradi
 
Wireless digital communication and coding techniques new
Wireless digital communication and coding techniques newWireless digital communication and coding techniques new
Wireless digital communication and coding techniques new
Clyde Lettsome
 
Digital signal processing part1
Digital signal processing part1Digital signal processing part1
Digital signal processing part1
Vaagdevi College of Engineering
 
Signal Filtering
Signal FilteringSignal Filtering
Signal Filtering
Silicon Mentor
 

What's hot (18)

Multimedia Compression and Communication
Multimedia Compression and CommunicationMultimedia Compression and Communication
Multimedia Compression and Communication
 
SPEECH COMPRESSION TECHNIQUES: A REVIEW
SPEECH COMPRESSION TECHNIQUES: A REVIEWSPEECH COMPRESSION TECHNIQUES: A REVIEW
SPEECH COMPRESSION TECHNIQUES: A REVIEW
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPC
 
Digital audio
Digital audioDigital audio
Digital audio
 
3. digital transmission fundamentals
3. digital transmission fundamentals3. digital transmission fundamentals
3. digital transmission fundamentals
 
Lecture 18 (5)
Lecture 18 (5)Lecture 18 (5)
Lecture 18 (5)
 
Final presentation
Final presentationFinal presentation
Final presentation
 
multirate signal processing for speech
multirate signal processing for speechmultirate signal processing for speech
multirate signal processing for speech
 
Digital Communication 1
Digital Communication 1Digital Communication 1
Digital Communication 1
 
Digital communication systems
Digital communication systemsDigital communication systems
Digital communication systems
 
Introductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal ProcessingIntroductory Lecture to Audio Signal Processing
Introductory Lecture to Audio Signal Processing
 
Dc ch06 : multiplexing
Dc ch06 : multiplexingDc ch06 : multiplexing
Dc ch06 : multiplexing
 
Multimedia Systems by Sahil Punni
Multimedia Systems by Sahil PunniMultimedia Systems by Sahil Punni
Multimedia Systems by Sahil Punni
 
Dm3400 3401 final
Dm3400 3401 finalDm3400 3401 final
Dm3400 3401 final
 
An Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and JitterAn Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and Jitter
 
Wireless digital communication and coding techniques new
Wireless digital communication and coding techniques newWireless digital communication and coding techniques new
Wireless digital communication and coding techniques new
 
Digital signal processing part1
Digital signal processing part1Digital signal processing part1
Digital signal processing part1
 
Signal Filtering
Signal FilteringSignal Filtering
Signal Filtering
 

Similar to A1mpeg12 2004

Audio Compression_2023.pptx
Audio Compression_2023.pptxAudio Compression_2023.pptx
Audio Compression_2023.pptx
zulhelmanz
 
dab_1.pdf
dab_1.pdfdab_1.pdf
dab_1.pdf
ssusera6186a1
 
add9.5.ppt
add9.5.pptadd9.5.ppt
add9.5.ppt
AshenafiGirma5
 
audiocompression-130624061221-phpapp02.pptx
audiocompression-130624061221-phpapp02.pptxaudiocompression-130624061221-phpapp02.pptx
audiocompression-130624061221-phpapp02.pptx
PawachMetharattanara
 
MPEG/Audio Compression
MPEG/Audio CompressionMPEG/Audio Compression
MPEG/Audio Compression
Daniel Brewster
 
03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt
03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt
03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt
ZeyadAlabsy
 
Pass band transmission
Pass band transmission Pass band transmission
Sound digitalisation
Sound digitalisationSound digitalisation
Sound digitalisation
Azmawati Lazim
 
Mp3
Mp3Mp3
Digital Audio
Digital AudioDigital Audio
Digital Audio
Magic Finger Lounge
 
Multimedia Object - Audio
Multimedia Object - AudioMultimedia Object - Audio
Multimedia Object - Audio
Telkom Institute of Management
 
Digital communication
Digital communicationDigital communication
Digital communicationmeashi
 
Audio_Overview.pptx
Audio_Overview.pptxAudio_Overview.pptx
Audio_Overview.pptx
BinhHoang71
 
Digital transmission new unit 3
Digital transmission new unit 3Digital transmission new unit 3
Digital transmission new unit 3
Srashti Vyas
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
Akira Tamamori
 
Sampling
SamplingSampling
Digital multiplexing and transmission concept
Digital multiplexing and transmission conceptDigital multiplexing and transmission concept
Digital multiplexing and transmission concept
Saroj Kumar Gochhayat
 
Compression presentation 415 (1)
Compression presentation 415 (1)Compression presentation 415 (1)
Compression presentation 415 (1)
Godo Dodo
 
M1L1-2.ppt
M1L1-2.pptM1L1-2.ppt
M1L1-2.ppt
shareea2002
 

Similar to A1mpeg12 2004 (20)

Audio Compression_2023.pptx
Audio Compression_2023.pptxAudio Compression_2023.pptx
Audio Compression_2023.pptx
 
dab_1.pdf
dab_1.pdfdab_1.pdf
dab_1.pdf
 
add9.5.ppt
add9.5.pptadd9.5.ppt
add9.5.ppt
 
audiocompression-130624061221-phpapp02.pptx
audiocompression-130624061221-phpapp02.pptxaudiocompression-130624061221-phpapp02.pptx
audiocompression-130624061221-phpapp02.pptx
 
Audio compression
Audio compressionAudio compression
Audio compression
 
MPEG/Audio Compression
MPEG/Audio CompressionMPEG/Audio Compression
MPEG/Audio Compression
 
03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt
03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt
03_04-AnalogDigital-HYanikomeroglu-12Jan2011_14Jan2011_Old1.ppt
 
Pass band transmission
Pass band transmission Pass band transmission
Pass band transmission
 
Sound digitalisation
Sound digitalisationSound digitalisation
Sound digitalisation
 
Mp3
Mp3Mp3
Mp3
 
Digital Audio
Digital AudioDigital Audio
Digital Audio
 
Multimedia Object - Audio
Multimedia Object - AudioMultimedia Object - Audio
Multimedia Object - Audio
 
Digital communication
Digital communicationDigital communication
Digital communication
 
Audio_Overview.pptx
Audio_Overview.pptxAudio_Overview.pptx
Audio_Overview.pptx
 
Digital transmission new unit 3
Digital transmission new unit 3Digital transmission new unit 3
Digital transmission new unit 3
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
 
Sampling
SamplingSampling
Sampling
 
Digital multiplexing and transmission concept
Digital multiplexing and transmission conceptDigital multiplexing and transmission concept
Digital multiplexing and transmission concept
 
Compression presentation 415 (1)
Compression presentation 415 (1)Compression presentation 415 (1)
Compression presentation 415 (1)
 
M1L1-2.ppt
M1L1-2.pptM1L1-2.ppt
M1L1-2.ppt
 

Recently uploaded

Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 

Recently uploaded (20)

Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 

A1mpeg12 2004

  • 1. MPEG Audio Coding Nimrod Peleg Update: March 2004
  • 2. Introduction • High quality low bit-rate audio coding • MPEG-1: Mono & Stereo, sampling rates of 32KHz, 44.1KHz and 48KHz. • MPEG-2: Backward compatible coding of 5+1 multi-channel sound, more sampling rates: 16KHz, 22.05KHz and 24KHz.
  • 3. Some Facts • MPEG-1: 1.5 Mbits/sec for audio and video: ~1.1Mbps for video, 0.3-0.4Mbps for audio • Uncompressed CD audio is 44,100 samples/sec * 16 bits/sample * 2 channels > 1.4Mbps • Typical Compression factors: from 2.7 to 24 • With Compression rate 6:1 (16 bits stereo sampled at 48 KHz is reduced to 256 kbps) and optimal listening conditions, expert listeners could not distinguish between coded and original audio clips.
  • 4. Some Facts (Cont’d) • MPEG-1 audio supports sampling frequencies of 32, 44.1 and 48 KHz. • Supports one or two audio channels in one of the four modes: – Monophonic: single audio channel – Dual-monophonic: two independent channels (similar to stereo) – Stereo: for stereo channels that share bits, but not using joint-stereo coding – Joint-stereo: takes advantage of the correlation between stereo channels
  • 5. Basic Idea: PsychoAcoustics • How much noise can be introduced to the signal without being audible ? PsychoAcoustic Model Masking in the frequency domain
  • 6. Reminder: Cochlear filter mechanism • The bandwidth of filters (in the ‘filter bank’) varies strongly from low to high frequencies • Center frequencies are call ‘critical bands’: mapping frequency onto a linear distance measure along the basiliar membrane. • Filters bandwidth variation: 40:1 • Filters time response variation: 1:40 • Simultaneous control of time/frequency artifacts at 40:1 resolution range is difficult !
  • 7. Barks • Assuming that each critical band corresponds to a fixed distance along the basiliar membrane, we define a unit of length z(f) to be one critical band, and call it “Bark” (after Barkhausen). • The approximation of z(f) is done using: z/Bark= 13 Arctan(0.76/1KHz) + 3.5 Arctan[(f/7.5kHz)2) • Bark width vary from ~100Hz in low freq. and 4kHz at ~15kHz.
  • 8. The Barks table Bark# flow(Hz) fhigh(Hz) fcenter(Hz) BandWidth 0 0 100 50 100 1 100 200 150 100 2 200 300 250 100 • 10 1270 1480 1370 210 11 1480 1720 1600 240 • 22 9500 12000 10500 2500 23 12000 15500 13500 3500 24 15500
  • 10. PsychoAcoustics Model • Frequency is divided into “Barks”: bands of non-uniform width (narrower in lower freq.) according to the ear’s “resolution” • Masking is influenced by two major parameters: – Tonal component (Masks the near frequency) – Noise component (Masks near and lower noises) • Given an audio signal, the model creates a masking function
  • 11. Measures • SMR: Signal to Mask Ratio • SNR: Signal to Noise Ratio • MNR = SNR - SMR Is the ratio between the mask energy and the quantization noise • Positive MNR: The noise injected in the quantization process is higher than masking level, and will be heard after reconstruction
  • 12. Basic Coding Structure Audio In Analysis FliterBank Perceptual Model Quantization +Bit Allocation Bitstream Encoding Bit- Stream Output 1 3 2 Side information (Bit allocation etc.) Included in bitstream Side info. Block switching info. Perceptual threshold
  • 13. Basic Coding Structure (Cont’d) 1. Input signal is decomposed into sub-sampled spectral components (time/freq. domain) 2. A time-dependent mask threshold is estimated 3. Spectral components are quantized and coded, keeping the noise (introduced in quantization) below the mask threshold (Many implementations) • Bit-Allocation: 1 bit of quantization introduces about 6 dB of noise
  • 14. Bit-Stream Structure Frame 1 Frame 2 Frame ... Frame N Header (32 bits) CRC (16 bits) Coded Data Ancillary Data
  • 15. Header Contents • SyncWord (12 bits) • Layer Code (2 bits): Layer I, II or III • Bit-rate Index (4 bits): according to the table in next slide, 32Kbps up-to 448Kbps. • Sampling Frequency (2 bits): 48, 44.1 or 32kHz. • Padding (1 bit): number of slots, N or N+1 • Mode (2 bits): Stereo, Joint Stereo, Dual or Single Channel.
  • 16. Available Bit-Rates (Kbps) Index Layer I Layer II Layer III 0000 free format free format free format 0001 32 32 32 0010 64 48 40 0011 96 56 48 0100 128 64 56 0101 160 80 64 0110 192 96 80 0111 224 112 96 1000 256 128 112 1001 288 160 128 1010 320 192 160 1011 352 224 192 1100 384 256 224 1101 416 320 256 1110 448 384 320
  • 17. Perceptual Model • A good estimation of actual masked threshold is essential for a better quality • A very simple model would allocate bits according to : n_bits=(27dB*lu-lo) / 6.02dB lu: upper band limit lo: lower band limit (Measured in Bark) • More advanced models (SMR) are in use
  • 18. Perceptual Model: reminder • The masking occurs in each critical band. • Critical band represents the bandwidth at which subjective response change rather fast. • The bandwidth of the critical bands varies from 100Hz at low frequencies to about (0.2 x f) for frequencies above 500Hz. • The loudness of a band of noise at a constant sound pressure remains constant in the critical band. • The corresponding unit for the critical band is bark.
  • 19. Filter Banks • Sub-Band Coders use low number of channels, connected with processing of adjacent samples in time • Transform Coders use high number of sub- bands and joint processing of adjacent samples in frequency No Basic difference between both approaches
  • 20. Quantization Two basic approaches • Block Companding (block floating point): A number of values, ordered either in time domain or in frequency domain are normalized to maximum absolute value (by scale factor) – Number of bits allocated for the block (derived from the perceptual model) derives the quantization step size
  • 21. Quantization (Cont’d) • Noise allocation + Scalar Quant. + Huffman: Instead of bit allocation, an amount of allowed noise equal to the estimated masked threshold is calculated for each scale-factor sub-band Quantization noise is colored using scale factors, by changing quantization step size – Quantized values are Huffman coded – Process is controlled by iteration loops
  • 22. MPEG-1 System • First international standard for digital audio compression • Joint effort of ASPEC (AT&T, CNET ...) and MUSICAM (Philips, Matsushita ,....) • A three layer coding algorithms defined with main system properties are increased complexity (encoder mainly ) and quality (at low bit-rates)
  • 23. MPEG-1 Layers • MPEG defines 3 layers for audio. Basic model is same, but Codec complexity increases with each layer. • Data is divided into frames, each of them contains 384 samples, 12 samples from each of the 32 filtered sub-bands as shown in the next slide • All layers share definition of basic bitstream format (4 bytes header, sync. Etc.)
  • 24.
  • 25. MPEG-1 , Layer 1 • Input signal transformed into 32 uniform sub- bands (same frequency width for each band). • For each sub-band an adaptive bit allocation (based on PA model) and quantization • Psychoacoustic model uses only frequency masking. • No control on the amount of noise introduced for each sub-band : bit allocation continues until needed output rate achieved
  • 26. Layer I : more details • Filter bank: Equally spaced Polyphase filter: design flexibility of generalized QMF and low computational complexity • 511 tap prototype used, optimized for very steep response, and stop band attenuation better than 96dB (equivalent to 16 bit resolution) – reconstruction error: LSB (of 16) if no quantization • Impulse response of 10.6ms (@48kHz) • Time resolution: 0.66ms (@48kHz) • The prototype filter keeps pre-echo artifacts !
  • 27. Layer I : more details (Cont’d) • Quantization step uses block companding of 12 subband samples – Basic block length: 12*32=384 samples • A 4-bit field signals the bit allocation: 0-16 bits for each subband • A 6-bit field scale factor (G) for each band, – the exponent of the block companding quantization • This method allows changes in the bit allocation procedure
  • 28. Layer 1 features • A simplified version of MUSICAM • Appropriate for consumer applications such as studio use (where very low data rated not necessary) • Compatible with PASC by Philips • Basic frame length: 8mSec (for 48KHz rate)
  • 29. MPEG-1, Layer 2 • Further compression, by removing redundancy (a little bit of the temporal masking) and a more precise quantization • Basic frame length: 24mSec (for 48KHz sampling freq.): 1152 samples • Identical to MUSICAM (Except frame header) • Application fields: consumer and professional studio-like broadcasting, recording, multi-media, audio workstations.
  • 30. MPEG-1, Layer 2 (Cont’d) • Additional coding of bit-allocation, scale factors and different frame structure. • Encoder forms larger groups of 3 blocks, 12 samples/block, and 32 sub-bands (total of 1152 samples per frame). • One bit-allocation type and 3 scale factors for every 3 blocks frame. • Radix coding allows allocation of fractional bits for small quantized values
  • 31. MPEG-1 , Layer 3 (mp3) • Signal mapping resolution is increased (in freq. domain: non-equal frequencies) • Signal is divided into “critical bands”, according to human ear resolution • Adaptive allocation of noise to each critical band, and logarithmic quantization • Further compression by Huffman coding • PA model includes temporal masking effects, takes into account stereo redundancy.
  • 32. Non-Linear Critical Bands 0 2 4 6 8 10 12 14 16 18 20 22 24 0 3 6 9 12 15 Critical Bands [KHz] Bark Numbers
  • 33. Layer 3 Basic Scheme FilterBank (Polyphase) MDCT Quantizer + Huffman Frame Packet and error corr. PsychoAcoustic Model (Optional) Hybrid Filter- Bank PCM In Packet Bitstream
  • 34. Layer 3 Basic Scheme (Cont’d) • PolyPhase FilterBank: Filters the input into 32 time domain signals, representing 32 uniform frequency bands. • MDCT: Transforms each band into freq. domain, getting 576 spectral lines (32*18 samples): additional frequency resolution: 18 sub-subbands • PA Model: Analyses the input, and controls the quantization step size • Quantizer + Coder: Quantizes according to PA and needed rate + Huffman coding
  • 35. Layer 3 Basic Scheme (Cont’d) • Adaptive block switching: Dynamic switching of the time-frequency decomposition (filter bank resolution) is allowed • This is important in order to ensure that the time spread of the filter bank does not exceed the pre-masking period (to avoid pre-echo) • Adaptive window switching uses four optional windows: normal (long), start, short and stop.
  • 36. Window types Normal window for stationary signals 576 spectral lines Start window to switch from long to short right 1/3 is zero to cancel aliasing Stop window to switch from short to long left 1/3 is zero to cancel aliasing Short window 1/3 length followed by 1/3 length MDCT time resolution: 4ms (@48KHz) 192 spectral lines
  • 37. Example sequence of window forms
  • 38. Layer III Quantization and Structure • Non-uniform quantization and VLC (Huffman) • A-by-S iteration loop • No bit direct allocation, but ‘Noise allocation’ (indirect allocation) using two iteration loops
  • 39. Iteration loops • Two nested iteration loops: – Rate loop: Inner iteration for quantization and coding of spectral lines (using Huffman tables) - repeats with increasing step size, until number of allocated bits does not exceed the allowed maximum. – Distortion Control loop: Outer iteration, keeping quantization noise below masking threshold according to the PA model • Noise coloration: scale factors reduced until injected noise is small enough
  • 40. Layer 3 Features • Short Blocks of 12 samples (in addition to regular 36 samples blocks) improve time resolution to cope for transients. • PA Model and coding technique are NOT part of the standard - related information should be included in bit-stream • Optional variable rate mode • Application in Telecommunication, mainly narrow-band ISDN, satellite links , Internet DVD, etc.
  • 41. Joint Stereo Coding • Can be used for stereo redundancy reduction. • Stereo and Dual-Channel signals require twice the bandwidth if we code them separately. • To decrease bit-rate (or increase quality) we can use intensity stereo mode or Middle/Side (MS) stereo coding. • MS stereo coding is supported only in Layer III
  • 42. Joint Stereo Coding (cont’d) Intensity stereo coding: • instead of separate L and R subband samples, a single summed signal is transmitted with R and L Scale Factors. • The frequency spectra of the decoded stereo signals are the same but the magnitudes are different.
  • 43. Joint Stereo Coding (cont’d) Middle/Side Stereo Mode: • Middle (sum of L and R) and Side (difference of L and R) are transmitted instead of L and R. • M is transmitted in the L channel and S in the R channel. • R and L channels can be reconstructed using: 2/)(2/)( SMRSML −=+=
  • 44. Effectiveness of MPEG audio Layer Target Ratio Quality @ Quality @ Theoretical # bitrate 64 kbits 128 kbits Min. Delay 1 192 kbit 4:1 --- --- 19 ms 2 128 kbit 6:1 2.1 to 2.6 4+ 35 ms 3 64 kbit 12:1 3.6 to 3.8 4+ 59 ms 5 = perfect, 4 = just noticeable, 3 = slightly annoying, 2 = annoying, 1 = very annoying
  • 45. More About the PA Model • The difference between max. signal level and min. masking threshold is used in the bit or noise allocation to determine Q level • Two models given in the informative part of the standard. model 1 recommended for layers 1,2 and model 2 for layer 3 • PA model output is SMR for each band (L1, L2) or group of bands (L3)
  • 46. PA Model I (Layer 1,2) • Transform length (FFT) is 512 samples for layer 1 and 1024 for layer 2. • The filter bank suffers lack of selectivity at low frequencies. • To compensate it: FFT in parallel to sub-band filtering. • Sound Pressure Level (SPL) is computed for each band. • Tonal and non-tonal components are extracted from the power spectrum.
  • 47. PA Model I (Cont’d) • Using “decimation”, number of maskers is reduced: only components (tonal and non-tonal) greater than the absolute threshold are considered. • Two or more components that are smaller than the highest power within the distance of 0.5 bark are removed from the list of tonal components. • Masking thresholds (both t and non-t) are defined by adding the masking index and masking function to the masking component (both index and function are provided in the standard as formal equations)
  • 48. PA Model I (Cont’d) • Global masking threshold, LTg, (for the frequency component) is derived by summing the powers of the individual masking thresholds (tonal: LTtm , non-tonal LTnm ) and the threshold in quite: [dB]101010log10)( 1 10/),( 1 10/),(10/)( 10       ++= ∑∑ == n j ijLTnm m j ijLTtmiLTq iLTg
  • 49. PA Model I (Cont’d) • To determine the Signal-to-Mask Ratio (SMR) in sub-band n, the minimum global masking threshold LTmin is used: [dB])()()( min nLTnLnSMR SBSB −= Where LSB(n) is the signal component in sub-band n .
  • 50. PA Model II: Layer 3 • The size of FFT (+ Hann window) can be varied. In practice: model is computed twice in parallel (192 samples for short block and 576 samples for long block). • Masking in time (forward and backward) is taken into calculations (spreading energy). • Final energy threshold obtained by the convolution (via FFT) of “spreading” energy and partitioned original energy.
  • 51. PA Model II (Cont’d) • SMR is calculated by the ratio between energy in the “scale factor” band (e_partn) and the noise level in the scale factor band (n_partn): [dB])_/_(log10 10 nnn partnparteSMR = n: index of coder partition Scale factor: the maximum of the absolute values of 12 samples in a sub-band is determined. (6 bits)
  • 52. MPEG-2 Audio • Backwards compatible - defines extensions: – MultiChannel coding •5 channel audio (L, R, C, LS, RS) – Multilingual coding •7 multilingual channels – Lower sampling frequencies (LSF) – Optional Low Frequency Enhancement (LFE)
  • 53. MultiCahnnel Coding • Up to 5 audio channels Matrixation of channels for compatibility: C: center Ls,Rs: surround • Lc and Rc are MPEG-1 encoded • Layer 1,2: Use syntax of MPEG1-L2 • Layer 3: flexible number of extension channels Lc b L C a Ls Rc b R C a Rs b a a = + + ⋅L NM O QP = + + ⋅L NM O QP = + + = 2 2 1 1 1 2 1 2 1 2 1 2 2 0; ; ;
  • 54. Multi-channel Configurations 3/2 3/1 2/2 3/0 L C R LS RS 2/1 3/0 2/0 And more options... Bi/multi lingual, hearing impaired etc.
  • 55. Multilingual Coding • Up to 7 additional channels for multi- lingual purposes
  • 56. Low Sampling Frequency Coding • For narrow band frequencies, no need for high sampling rates (wide-band speech and medium quality audio) • Added sampling rates are the halves of the MPEG-1 rates: 16K, 22.05K and 24KHz • Need to change PA model tables • Optional sixth channel: LFE capable of handling signals from 15Hz to 120Hz (Sub- Woofer), added to 5 regular channels
  • 58. Decoder Scheme Bit- Stream Frame UnPacking Rc, Lc Decoding Composite status information Decoding Reconstruction of Quantized Audio Data Rebuilding Audio Channels Inverse Mapping PCM Out
  • 59. MPEG-2, Layer-I extensions • A “slot” consists of 32 bits. • The number of slots in frame depends on the sampling frequency and bit-rate. • Each frame contains information on 384 samples of the original input signal. • Frame_size= 384 * (1/fs) (16mSec for fs=24KHz) • Num_of_slots= bit_rate * (384/32)/fs (32 for fs=24KHz)
  • 60. MPEG-2, Layer II extensions • Difference from MPEG-1 only in formatting, possible quantization and PA model. • A slot consists of 8 bits. • Each frame contains information on 1152 samples of the original input signal.
  • 61. MPEG-2, Layer III extensions • Different scale factor band tables. • Omission of some side information (due to changed frame layout). • Some changes in PA model tables. • 21 Scale factors bands for each fs (long windows) • 12 Scale factors bands for each fs (short windows) • Scale factor band: a set of frequency lines that are scaled by the same scale factor.
  • 62. Witches:Witches: ““DingoDingo”” at 22.05Khzat 22.05Khz 0 10 20 30 40 50 60 70 80 90 100 128 112 96 80 64 56 48 40 32 24 16 bitrate [KBits/sec] 0 5 10 15 20 25 quality [4.5-5] quality [3.5-4.5] quality [2.5-3.5] quality [1.5-2.5] quality [1-1.5] SNR [dB] Successful granules SNR