Audio Compression using
Discrete Wavelet
Transform
Thyagarajan Venkatanarayanan
Meghasyam Tummalacherla
Overview
 Why audio compression?
 Overview of system
 Wavelet Representation
 The psychoacoustic model
 Results
Overview of the system
Encoding: Audio Signal → Wavelet Analysis → Quantization → (Mu-law) Compression → New representation, with the Psychoacoustic Model setting the thresholds and deciding the bit allocation
Decoding: New representation → (Mu-law) Expansion → Wavelet Synthesis → Reconstructed Signal
Why audio compression?
 To represent a signal with the minimum number of bits without losing the quality/message of the signal
 We use a Wavelet based coding method with a
psychoacoustic model to exploit perceptual masking and
eliminate source redundancies
Masking phenomena
 Masking refers to a process where one sound is rendered
inaudible because of the presence of another sound
 Simultaneous masking refers to a frequency domain
phenomenon which has been observed within critical
bands (in-band).
 Important to distinguish between two types of
simultaneous masking, namely
 tone-masking-noise: a tone occurring at the center of a
critical band masks noise of any subcritical bandwidth
 noise-masking-tone: follows the same pattern with the
roles of masker and maskee reversed
Temporal masking
 Masking also occurs in the time-domain.
 In the context of audio signal analysis, abrupt signal
transients (e.g., the onset of a percussive musical
instrument) create pre- and post- masking regions in time
Simultaneous masking
Critical Band and Masking
Adjacent critical bands are separated by a distance of one Bark
The Psychoacoustic model
 Based on empirical tests of human hearing
 Uses an N-point DFT for high resolution spectral analysis,
then estimates for each input frame individual
simultaneous masking thresholds due to the presence of
tone-like and noise-like maskers in the signal spectrum.
 A global masking threshold is then estimated for a subset
of the original N/2 frequency bins by (power) additive
combination of the tonal and non-tonal individual masking
thresholds
Step 1: Spectral analysis and
Normalization
 Input is segmented into frames of 512 samples, each multiplied by a Hanning window, and the power spectral density (PSD) is obtained using an N-point FFT:

$$P(k) = 10\log_{10}\left|\sum_{n=0}^{N-1} w(n)\,x(n)\,e^{-j2\pi kn/N}\right|^{2}, \qquad 0 \le k \le N/2$$

 Normalized so the maximum sits at 96 dB, per MPEG-1 codec parameters:

$$P_N(k) = P(k) - \max_k P(k) + 96 \ \text{dB}$$
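As a concrete sketch of this step (Python with NumPy assumed; `normalized_psd` is an illustrative name, not part of any codec API):

```python
import numpy as np

def normalized_psd(x, N=512):
    """Hann-windowed power spectral density of one frame, normalized so
    its maximum sits at 96 dB (MPEG-1 convention)."""
    w = np.hanning(N)                          # Hanning window
    X = np.fft.rfft(w * x[:N], n=N)            # N-point FFT, bins 0..N/2
    P = 10 * np.log10(np.abs(X) ** 2 + 1e-12)  # PSD in dB (eps avoids log 0)
    return P - P.max() + 96                    # normalize: max = 96 dB

frame = np.sin(2 * np.pi * 3000 / 44100 * np.arange(512))  # 3 kHz tone
PN = normalized_psd(frame)
```

For this tone, `PN` has 257 bins (0 through N/2) and peaks near bin 35 ≈ 3000/44100 × 512.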
Step 2: Identification of tonal &
noise (non tonal) frequencies
 Find Local Maxima
 A local maximum is classified as a tonal frequency if it exceeds its neighbours within a frequency-dependent range ±Δk by at least 7 dB
 Remaining maxima, not within the ±Δk range of a tonal frequency, are classified as noise frequencies
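A minimal sketch of the tonal classification (the neighbourhood Δk is fixed at ±2 bins here for simplicity; in MPEG-1 it widens with frequency):

```python
import numpy as np

def tonal_bins(PN, delta_k=2, margin_db=7.0):
    """Classify PSD bins as tonal: a strict local maximum that exceeds its
    neighbours at +/- delta_k bins by at least 7 dB.  (A fixed +/-2-bin
    neighbourhood is an assumed simplification of the MPEG-1 rule.)"""
    tonal = []
    for k in range(delta_k + 1, len(PN) - delta_k - 1):
        if PN[k] <= PN[k - 1] or PN[k] <= PN[k + 1]:
            continue                           # not a strict local maximum
        if PN[k] - PN[k - delta_k] >= margin_db and \
           PN[k] - PN[k + delta_k] >= margin_db:
            tonal.append(k)
    return tonal

PN = np.full(257, 20.0)
PN[34:37] = [60.0, 96.0, 60.0]    # a sharp spectral peak around bin 35
print(tonal_bins(PN))             # [35]
```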
Tonal and Noise Components
Tonal and Noise Masks
Step 3: Thresholding and
reorganization of Masks
 Any tonal/noise maskers below the absolute hearing
threshold are discarded
$$P_{TM,NM}(k) \le T_q(k)$$

where $T_q(k)$ is the absolute threshold of hearing (the amount of energy needed in a pure tone for it to be detected by a listener in a noiseless environment)
 Next, a 0.5 Bark sliding window is used to replace any pair of maskers occurring within that distance of each other by the stronger of the two
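The Bark conversion and the 0.5 Bark pruning pass might be sketched as follows (the `(bin, level_dB)` masker format is an assumption of this sketch, and the absolute-threshold test is omitted):

```python
import numpy as np

def bark(f):
    """Zwicker's Hz-to-Bark mapping."""
    return 13 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def prune_maskers(maskers, fs=44100, N=512):
    """Keep only the stronger of any pair of maskers closer than 0.5 Bark.
    `maskers` is a list of (bin, level_dB) pairs (assumed format)."""
    out = []
    for k, p in sorted(maskers):
        z = bark(k * fs / N)
        if out and z - bark(out[-1][0] * fs / N) < 0.5:
            if p > out[-1][1]:
                out[-1] = (k, p)               # replace the weaker neighbour
        else:
            out.append((k, p))
    return out

# Bins 35 and 36 are ~0.16 Bark apart, so the weaker one is dropped.
pruned = prune_maskers([(35, 96.0), (36, 80.0), (100, 70.0)])
print(pruned)  # [(35, 96.0), (100, 70.0)]
```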
Thresholding and reorganization
Step 4: Individual masking thresholds
 Each individual threshold represents a masking contribution at
frequency bin i due to the tone or noise masker located at bin
j.
 The tonal masker thresholds, $T_{TM}(i,j)$, are given (in dB) by:

$$T_{TM}(i,j) = P_{TM}(j) - 0.275\,z(j) + SF(i,j) - 6.025$$

 The noise masker thresholds, $T_{NM}(i,j)$, are given (in dB) by:

$$T_{NM}(i,j) = P_{NM}(j) - 0.175\,z(j) + SF(i,j) - 2.025$$
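The two formulas translate directly into code. The sketch below substitutes a crude two-slope spreading function for SF(i, j) (the full model uses level-dependent piecewise-linear slopes), so it illustrates the arithmetic rather than the exact MPEG-1 curves:

```python
def spreading(dz):
    """Very rough spreading function SF(i,j) in dB versus Bark separation
    dz = z(i) - z(j): steep below the masker, shallower above.  (Assumed
    two-slope simplification of the level-dependent piecewise-linear SF.)"""
    return 27.0 * dz if dz < 0 else -10.0 * dz

def tonal_threshold(P_tm_j, z_j, dz):
    """T_TM(i,j) = P_TM(j) - 0.275 z(j) + SF(i,j) - 6.025 dB."""
    return P_tm_j - 0.275 * z_j + spreading(dz) - 6.025

def noise_threshold(P_nm_j, z_j, dz):
    """T_NM(i,j) = P_NM(j) - 0.175 z(j) + SF(i,j) - 2.025 dB."""
    return P_nm_j - 0.175 * z_j + spreading(dz) - 2.025
```

Note the asymmetry in the offsets (6.025 vs 2.025 dB): noise is a more effective masker than a tone of the same level.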
Individual threshold corresponding
to tonal components
Individual threshold corresponding
to non-tonal components
Step 5: Global masking threshold
 The global masking threshold, Tg(i), is obtained by:
$$T_g(i) = 10\log_{10}\left(10^{0.1\,T_q(i)} + \sum_{l=1}^{L} 10^{0.1\,T_{TM}(i,l)} + \sum_{m=1}^{M} 10^{0.1\,T_{NM}(i,m)}\right) \text{dB}$$
 The global threshold for each frequency bin represents a
signal dependent, power additive modification of the
absolute threshold due to the spread of all tonal and noise
maskers in the signal power spectrum
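The power-additive combination is a one-liner once the individual thresholds are in hand; per the description above, the absolute threshold at bin i is included in the sum (sketch):

```python
import math

def global_threshold(Tq_i, T_tm, T_nm):
    """Power-additive combination (in dB) of the absolute threshold and all
    tonal/noise masking thresholds at one frequency bin i."""
    total = 10.0 ** (0.1 * Tq_i)                       # absolute threshold
    total += sum(10.0 ** (0.1 * t) for t in T_tm)      # tonal contributions
    total += sum(10.0 ** (0.1 * t) for t in T_nm)      # noise contributions
    return 10.0 * math.log10(total)

# Two equal 40 dB maskers combine to ~43 dB (a 3 dB gain); the absolute
# threshold here is far below and contributes nothing noticeable.
Tg = global_threshold(-100.0, [40.0], [40.0])
print(round(Tg, 2))  # 43.01
```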
Global masking threshold
Effective SNR (PSD – Global Threshold)
Wavelet representation of signal
$$g(t) = \sum_{k} c_{j_0}(k)\, 2^{j_0/2}\, \Phi(2^{j_0} t - k) + \sum_{k} \sum_{j=j_0}^{j_1} d_j(k)\, 2^{j/2}\, \Psi(2^j t - k)$$

 The audio signal is divided into non-overlapping frames of 512 samples (11.6 ms at 44.1 kHz). Each frame is multiplied by a Hanning window of the same length to avoid border distortions
Wavelet decomposition – initial step
 Given a signal s of length N, the DWT consists of log2 N
stages at most.
 The first step produces 2 sets of coefficients: approximation coefficients cA1 and detail coefficients cD1.
Recursive wavelet decomposition
 The wavelet decomposition of the signal s analyzed at
level j has the following structure: [cAj, cDj, ..., cD1].
Wavelet decomposition
 The wavelet transform coefficients are computed
recursively using an efficient pyramid algorithm
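The pyramid recursion is easy to sketch with the simplest (Haar) filter pair; real use would call MATLAB's wavedec or a library such as PyWavelets rather than this toy:

```python
import numpy as np

def haar_dwt(s, levels):
    """1-D pyramid DWT with the Haar filter pair: at each level the running
    approximation is split into a new approximation (lowpass) and detail
    (highpass) half-length sequence.  Returns [cA_J, cD_J, ..., cD_1],
    mirroring the ordering MATLAB's wavedec uses."""
    cA, details = np.asarray(s, dtype=float), []
    for _ in range(levels):
        even, odd = cA[0::2], cA[1::2]
        details.append((even - odd) / np.sqrt(2))   # detail coefficients cD
        cA = (even + odd) / np.sqrt(2)              # approximation cA
    return [cA] + details[::-1]

# A constant signal has all its energy in the final approximation:
coeffs = haar_dwt(np.ones(8), 3)
```

Orthonormality shows up directly: the L2 energy of the input (8 here) equals the total energy of the coefficients.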
Wavelet: Vanishing moments
 We choose orthonormal wavelets
 A wavelet with K filter coefficients can have at most K/2 vanishing moments
 To ensure regularity (how fast the coefficients decay to zero), we choose a wavelet with a high number of vanishing moments, as these are best suited for audio processing
Wavelet: Sparsity
 Sparsity – the number of non-zero coefficients; the fewer, the better
Dependence of efficiency on choice
of wavelet
 Type of wavelet basis has a significant impact on
efficiency of coding scheme.
 Used the MATLAB function “wavedec” to perform our 1-D wavelet decomposition
 Compression using “wdencmp”, with parameters obtained from “ddencmp”
 Efficiency of compression measured using two metrics returned by the function: PERFL0 and PERFL2
Dependence of efficiency on choice
of wavelet
 Daubechies refers to a particular family of wavelets. The
number refers to the number of vanishing moments
 Simply put, the higher the number of vanishing moments,
the smoother the wavelet (and longer the wavelet filter).
Wavelet        PERFL0   PERFL2
Haar           31.01    99.93
Daubechies-2   64.63    99.95
Daubechies-10  65.59    99.97
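The two scores can be approximated as follows (assumed definitions mirroring the MATLAB metrics: percentage of coefficients zeroed, and percentage of L2 energy retained):

```python
import numpy as np

def perf_scores(coeffs, kept):
    """Roughly what MATLAB's wdencmp reports: PERFL0 = percentage of wavelet
    coefficients set to zero, PERFL2 = percentage of L2 energy retained.
    (Assumed definitions; names follow the MATLAB metrics.)"""
    perfl0 = 100.0 * np.mean(kept == 0)
    perfl2 = 100.0 * np.sum(kept ** 2) / np.sum(coeffs ** 2)
    return perfl0, perfl2

c = np.array([4.0, 0.1, -0.1, 3.0])
kept = np.where(np.abs(c) > 1.0, c, 0.0)   # hard-threshold the small ones
p0, p2 = perf_scores(c, kept)
print(round(p0, 2), round(p2, 2))  # 50.0 99.92
```

Note how zeroing half the coefficients here barely dents the energy, which is exactly the pattern in the table above.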
Bit-rate reduction
 After testing the coder (with the Daubechies-10 wavelet) on 4 different music signals originally at 16 bits/sample (violin, drums, piano, Adele), we observed that the average number of bits required to encode them was around 7.5, i.e., more than a 50% reduction.
 We allocated one bit for every 6.02 dB of the maximum effective SNR in each frame
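The allocation rule above amounts to the following sketch (the 6.02 dB-per-bit figure is the standard uniform-quantizer SNR rule of thumb; the 4-bit floor matches the next slide):

```python
import math

def bits_for_frame(effective_snr_db, floor_bits=4):
    """Allocate one quantizer bit per 6.02 dB of the frame's peak effective
    SNR (PSD minus global masking threshold), never below 4 bits."""
    return max(floor_bits, math.ceil(max(effective_snr_db) / 6.02))

print(bits_for_frame([12.0, 45.0, 30.1]))  # ceil(45/6.02) = 8
print(bits_for_frame([1.0]))               # clamped to the 4-bit floor: 4
```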
Bit allocation per frame (minimum is 4)
Original vs reconstructed signal
Subjective tests
 It is important to eliminate chance in listening tests, so we provided several stimuli of each source material to each listener. We also did not reveal to the listener the actual order in which the stimuli were presented (e.g., original, coder 1, coder 2, etc.).
 Figures indicate the coder provided transparent coding for all audio sources.
 The quality of the piano signal was not as good as the others because it contains long segments of nearly steady or slowly decaying sinusoids, which the wavelet-based coder did not seem to handle well
Sample   Avg. probability original preferred over encoded   Sample size   Comments
Violin   0.25                                               12            Transparent
Piano    0.50                                               10            Nearly transparent
Drums    0.30                                               10            Transparent
Adele    0.27                                               15            Transparent
References
 [1] D. Sinha and A. Tewfik. “Low Bit Rate Transparent Audio Compression using
Adapted Wavelets”, IEEE Trans. ASSP, Vol. 41, No. 12, December 1993
 [2] T. Painter and A. Spanias, “A review of algorithms for perceptual coding of
digital audio signals,” DSP-97, 1997.
 [3] I. Daubechies, “Orthonormal bases of compactly supported wavelets”,
Commun. Pure Appl. Math., vol. 41, pp. 909-996, Nov. 1988
 [4] ISO/IEC JTC1/SC29/WG11 MPEG, IS11172-3 “Information Technology - Coding of
Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5
Mbit/s, Part 3: Audio” 1992. (“MPEG-1”)
 [5] R. Hellman, “Asymmetry of Masking Between Noise and Tone,” Percep. and
Psychophys., vol. 11, pp. 241-246, 1972
 [6] M. Schroeder, et al., “Optimizing Digital Speech Coders by Exploiting Masking
Properties of the Human Ear,” J. Acoust. Soc. Am., pp. 1647-1652, Dec. 1979
 [7] E. Zwicker and H. Fastl, Psychoacoustics Facts and Models, Springer-Verlag,
1990
 [8] C. Burrus, R. A. Gopinath and H. Guo, “Introduction to Wavelets & Wavelet
Transforms”, Prentice-Hall 1998


Editor's Notes

  • #5 Used to obtain compact digital representations of wideband audio signals for efficient transmission or storage. Central objective: to represent the signal with the minimum number of bits while achieving transparent reconstruction, i.e., generating output audio that cannot be distinguished from the original input, even by a sensitive listener. An audio compression scheme must exploit the two sources of irrelevancy and redundancy in audio signals: the masking characteristics of human hearing and the statistical redundancies in the signal. Our approach employs a wavelet-based coding method with a psychoacoustic model to exploit perceptual masking and eliminate source redundancies
  • #7 Masking also occurs in the time-domain. In the context of audio signal analysis, abrupt signal transients (e.g., the onset of a percussive musical instrument) create pre- and post- masking regions in time during which a listener will not perceive signals beneath the elevated audibility thresholds produced by a masker
  • #9 We consider the case of a single masking tone occurring at the center of a critical band. All levels in the figure are given in terms of dB. A hypothetical masking tone occurs at some masking level. This generates an excitation along the basilar membrane which is modeled by a spreading function and a corresponding masking threshold
  • #12 Local maxima in the sample PSD which exceed neighboring components within a certain Bark distance by at least 7 dB are classified as tonal. Tonal maskers are then computed from the spectral peaks listed in S_T as: $P_{TM}(k) = 10\log_{10} \sum_{j=-1}^{1} 10^{0.1\,P(k+j)}\ \text{dB}$. A single noise masker for each critical band is computed from (remaining) spectral lines not within the ±Δk neighborhood of a tonal masker using a similar sum
  • #17 where PTM(j) denotes the tonal masker in frequency bin j, z(j) denotes the Bark frequency of bin j, and SF(i, j), the spread of masking from masker bin j to maskee bin i, is a piecewise-linear function of the masker level P(j) and the Bark maskee-masker separation Δz = z(i) − z(j)
  • #23 The audio signal is represented in terms of the translates and dilates of the scaling function (say Daubechies-10) as: $g(t) = \sum_k c_{j_0}(k)\, 2^{j_0/2}\, \Phi(2^{j_0} t - k) + \sum_k \sum_{j=j_0}^{\infty} d_j(k)\, 2^{j/2}\, \Psi(2^j t - k)$. Such an expansion provides a multiresolution analysis of g(t). The choice of j0 sets the coarsest scale, whose space is spanned by Φ_{j0,k}(t). The audio signal is divided into non-overlapping frames of length 512 samples (11.6 ms at 44.1 kHz). Each frame is multiplied by a Hanning window of the same length to avoid border distortions. Restrictions: compact-support wavelets, to create orthogonal translates and dilates of the wavelet, and to ensure regularity (fast decay of coefficients, controlled by choosing wavelets with a large number of vanishing moments)
  • #24 Given a signal s of length N, the DWT consists of at most log2 N stages. The first step produces 2 sets of coefficients: approximation coefficients cA1 and detail coefficients cD1. The next step splits the approximation coefficients cA1 in two parts using the same scheme, replacing s by cA1 and producing cA2 and cD2, and so on