Audio Compression using Discrete Wavelet Transform
Thyagarajan Venkatanarayanan
Meghasyam Tummalacherla
Overview
 Why audio compression?
 Overview of system
 Wavelet Representation
 The psychoacoustic model
 Results
Overview of the system
[Block diagram]
Encoding: Audio Signal, Wavelet Analysis, Psychoacoustic Model (threshold, decide bits), (Mu law) Compression, Quantization, New representation
Decoding: New representation, (Mu law) Expansion, Wavelet Synthesis, Reconstructed Signal
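The (Mu law) compression and expansion blocks in the system diagram can be illustrated with the standard mu-law companding curve. This is a minimal sketch rather than the authors' implementation: the value mu = 255 and the function names are assumptions, since the slides do not state them.

```python
import math

MU = 255.0  # assumed value; the slides do not state which mu was used


def mu_compress(x, mu=MU):
    """Compress a sample x in [-1, 1] with the mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Invert mu_compress, mapping y in [-1, 1] back to the linear domain."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Small amplitudes are stretched before quantization (for example, `mu_compress(0.01)` is much larger than 0.01), so a uniform quantizer spends relatively more levels on quiet samples.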
Why audio compression?
 To represent a signal with the minimum number of bits without
losing the quality or message of the signal
 We use a wavelet-based coding method with a
psychoacoustic model to exploit perceptual masking and
eliminate source redundancies
Masking phenomena
 Masking refers to a process where one sound is rendered
inaudible because of the presence of another sound
 Simultaneous masking refers to a frequency domain
phenomenon which has been observed within critical
bands (in-band).
 Important to distinguish between two types of
simultaneous masking, namely
 tone-masking-noise: a tone occurring at the center of a
critical band masks noise of any subcritical bandwidth
 noise-masking-tone: follows the same pattern with the
roles of masker and maskee reversed
Temporal masking
 Masking also occurs in the time-domain.
 In the context of audio signal analysis, abrupt signal
transients (e.g., the onset of a percussive musical
instrument) create pre- and post- masking regions in time
Simultaneous masking
Critical Band and Masking
Adjacent critical bands are spaced one Bark apart
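To make the Bark unit concrete, the sketch below converts a frequency in Hz to the critical-band (Bark) scale. It assumes the widely used Zwicker-style approximation; the slides do not say which Bark formula the authors used, and the function name is illustrative.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Assumed formula: z(f) = 13*atan(0.00076 f) + 3.5*atan((f/7500)^2).
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)
```

With this mapping, 1 kHz falls at roughly 8.5 Bark, and the audible range up to about 22 kHz spans roughly 25 critical bands.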
The Psychoacoustic model
 Based on tests of human hearing
 Uses an N-point DFT for high resolution spectral analysis,
then estimates for each input frame individual
simultaneous masking thresholds due to the presence of
tone-like and noise-like maskers in the signal spectrum.
 A global masking threshold is then estimated for a subset
of the original N/2 frequency bins by (power) additive
combination of the tonal and non-tonal individual masking
thresholds
Step 1: Spectral analysis and Normalization
 The input is segmented into frames of length 512, each weighted by a Hanning window w(n), and the power spectral density (PSD) in dB is obtained using an N-point FFT:
$$P(k) = 10\log_{10}\left|\sum_{n=0}^{N-1} w(n)\,x(n)\,e^{-j\frac{2\pi k n}{N}}\right|^{2}, \qquad 0 \le k \le \frac{N}{2}$$
 The PSD is normalized so that its maximum is 96 dB, following the MPEG-1 codec parameters:
$$P_N(k) = P(k) - \max_k P(k) + 96$$
Step 2: Identification of tonal & noise (non-tonal) frequencies
 Find the local maxima of the PSD
 A local maximum is a tonal frequency if [condition given as an equation on the slide; not preserved]
 Remaining maxima that are not within the ±Δk range of a tonal
frequency are classified as noise frequencies
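The tonality condition itself is not preserved in the slide text. As an illustration only, the sketch below assumes the rule commonly given for MPEG-1 psychoacoustic model 1: a bin is tonal if it is a local maximum and exceeds its neighbours at distance Δk by at least 7 dB. The 7 dB margin, the default Δk set, and the function name are all assumptions.

```python
def find_tonal_bins(psd_db, deltas=(2,), margin_db=7.0):
    """Return indices of bins treated as tonal under the assumed 7 dB rule."""
    tonal = []
    reach = max(deltas)
    for k in range(reach, len(psd_db) - reach):
        # Strict local maximum relative to the immediate neighbours.
        is_local_max = psd_db[k] > psd_db[k - 1] and psd_db[k] > psd_db[k + 1]
        # Must also dominate the bins at +/- delta by the margin.
        dominates = all(
            psd_db[k] > psd_db[k + d] + margin_db
            and psd_db[k] > psd_db[k - d] + margin_db
            for d in deltas
        )
        if is_local_max and dominates:
            tonal.append(k)
    return tonal
```

In the model assumed here, the Δk set widens with frequency; a single value is used above to keep the example short.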
Tonal and Noise Components
Tonal and Noise Masks
Step 3: Thresholding and reorganization of Masks
 Any tonal or noise maskers at or below the absolute hearing threshold are discarded, i.e. those with
$$P_{TM,NM}(k) \le T_q(k)$$
where $T_q(k)$ is the absolute threshold of hearing (the
amount of energy needed in a pure tone such that it can
be detected by a listener in a noiseless environment)
 Next, a sliding window is used to replace any pair of maskers
occurring within a distance of 0.5 Bark by the stronger of
the two
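Both parts of Step 3 can be sketched as follows. The threshold-in-quiet curve is the commonly used Terhardt approximation, and the Bark mapping is the Zwicker-style formula; neither is stated on the slides, so both are assumptions. The greedy merge is a simplification of the sliding-window rule, and the function names are illustrative.

```python
import math


def hz_to_bark(f_hz):
    """Assumed Hz-to-Bark approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_in_quiet_db(f_hz):
    """Assumed absolute threshold of hearing in dB SPL (f_hz must be > 0)."""
    f_khz = f_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def prune_maskers(maskers, min_gap_bark=0.5):
    """Drop inaudible maskers, then keep the stronger of any close pair.

    maskers is a list of (frequency_hz, level_db) tuples.
    """
    audible = [(f, p) for f, p in maskers if p > threshold_in_quiet_db(f)]
    audible.sort(key=lambda m: m[0])
    kept = []
    for f, p in audible:
        if kept and hz_to_bark(f) - hz_to_bark(kept[-1][0]) < min_gap_bark:
            if p > kept[-1][1]:
                kept[-1] = (f, p)  # replace the weaker neighbour
        else:
            kept.append((f, p))
    return kept
```

For example, a 5 dB masker at 100 Hz lies below the threshold in quiet and is dropped, while of two maskers at 1000 Hz and 1020 Hz only the stronger survives.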
Thresholding and reorganization
Step 4: Individual masking thresholds
 Each individual threshold represents the masking contribution at
frequency bin i due to the tone or noise masker located at bin j.
Here z(j) is the Bark frequency of bin j and SF(i, j) is the spreading function from masker bin j to maskee bin i.
 The tonal masker thresholds, $T_{TM}(i, j)$, are given (in dB) by:
$$T_{TM}(i,j) = P_{TM}(j) - 0.275\,z(j) + SF(i,j) - 6.025$$
 The noise masker thresholds, $T_{NM}(i, j)$, are given (in dB) by:
$$T_{NM}(i,j) = P_{NM}(j) - 0.175\,z(j) + SF(i,j) - 2.025$$
Individual threshold corresponding
to tonal components
Individual threshold corresponding
to non-tonal components
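The two threshold formulas of Step 4 translate directly into code. The spreading function is not given on the slides; the sketch assumes the piecewise-linear form usually quoted for MPEG-1 psychoacoustic model 1, evaluated on the Bark distance between maskee and masker. All names are hypothetical.

```python
def spreading_db(dz, masker_db):
    """Assumed spreading function; dz = z(i) - z(j) in Bark.

    Returns None outside the assumed range of influence (-3 to +8 Bark).
    """
    if -3.0 <= dz < -1.0:
        return 17.0 * dz - 0.4 * masker_db + 11.0
    if -1.0 <= dz < 0.0:
        return (0.4 * masker_db + 6.0) * dz
    if 0.0 <= dz < 1.0:
        return -17.0 * dz
    if 1.0 <= dz < 8.0:
        return (0.15 * masker_db - 17.0) * dz - 0.15 * masker_db
    return None


def tonal_threshold_db(p_tm, z_i, z_j):
    """T_TM(i, j) = P_TM(j) - 0.275 z(j) + SF(i, j) - 6.025."""
    sf = spreading_db(z_i - z_j, p_tm)
    return None if sf is None else p_tm - 0.275 * z_j + sf - 6.025


def noise_threshold_db(p_nm, z_i, z_j):
    """T_NM(i, j) = P_NM(j) - 0.175 z(j) + SF(i, j) - 2.025."""
    sf = spreading_db(z_i - z_j, p_nm)
    return None if sf is None else p_nm - 0.175 * z_j + sf - 2.025
```

At the masker's own bin (zero Bark distance) the spreading term vanishes, so a 60 dB tonal masker at 10 Bark yields a threshold of 51.225 dB and a noise masker of the same level yields 56.225 dB.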
Step 5: Global masking threshold
 The global masking threshold, $T_g(i)$, is obtained by:
$$T_g(i) = 10\log_{10}\left(\sum_{l=1}^{L} 10^{0.1\,T_{TM}(i,l)} + \sum_{m=1}^{M} 10^{0.1\,T_{NM}(i,m)}\right)\ \text{dB}$$
 The global threshold for each frequency bin represents a
signal dependent, power additive modification of the
absolute threshold due to the spread of all tonal and noise
maskers in the signal power spectrum
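The power-additive combination in Step 5 can be sketched as below. The function name and the handling of the empty case are assumptions; the arithmetic follows the formula on the slide, which sums the tonal and noise contributions in the linear power domain.

```python
import math


def global_threshold_db(tonal_db, noise_db):
    """Combine individual thresholds (in dB) for one bin by adding powers."""
    power = (sum(10.0 ** (0.1 * t) for t in tonal_db)
             + sum(10.0 ** (0.1 * t) for t in noise_db))
    if power == 0.0:
        return float("-inf")  # no maskers contribute to this bin
    return 10.0 * math.log10(power)
```

Two equal contributions of 50 dB combine to about 53 dB, which is the familiar 3 dB increase for a doubling of power.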
Global masking threshold
Effective SNR (PSD - Global Threshold)
Wavelet representation of signal
$$g(t) = \sum_{k} c_{j_0}(k)\,2^{j_0/2}\,\Phi(2^{j_0}t - k) + \sum_{k}\sum_{j=j_0}^{j_1} d_j(k)\,2^{j/2}\,\Psi(2^{j}t - k)$$
 The audio signal is divided into non-overlapping frames of
512 samples (about 11.6 ms at 44.1 kHz). Each frame is
multiplied by a Hanning window of the same length to avoid
border distortions
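The framing step and the frame duration can be checked with a short sketch. Dropping a trailing partial frame is an assumption, as the slides do not say how the end of the signal is handled; the function names are illustrative.

```python
def split_frames(samples, frame_len=512):
    """Split samples into non-overlapping frames, dropping any partial tail."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_duration_ms(frame_len=512, sample_rate_hz=44100):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz
```

At 44.1 kHz a 512-sample frame lasts 512 / 44100 s, or about 11.6 ms.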
Wavelet decomposition - initial step
 Given a signal s of length N, the DWT consists of at most $\log_2 N$
stages.
 The first step produces two sets of coefficients: approximation
coefficients cA1 and detail coefficients cD1.
Recursive wavelet decomposition
 The wavelet decomposition of the signal s analyzed at
level j has the following structure: [cAj, cDj, ..., cD1].
Wavelet decomposition
 The wavelet transform coefficients are computed
recursively using an efficient pyramid algorithm
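The pyramid algorithm can be illustrated with the Haar wavelet, chosen here only because its two-tap filters keep the example short; the coder in these slides uses Daubechies wavelets and MATLAB's wavedec. The sketch assumes the signal length is divisible by 2 to the power of the level, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One analysis stage: return (approximation, detail) at half the length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, level):
    """Return [cA_level, cD_level, ..., cD_1] by recursing on the approximation."""
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # coarsest detail ends up first
    return [approx] + details
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared samples, which is a convenient sanity check.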
Wavelet: Vanishing moments
 We choose orthonormal wavelets
 A wavelet with K filter coefficients can have at most K/2 vanishing moments
 To ensure regularity (how fast the coefficients decay to zero), we choose a wavelet with a high number of vanishing moments, as these are best suited for audio processing
Wavelet: Sparsity
 Sparsity: the number of nonzero coefficients; the fewer, the better
Dependence of efficiency on choice of wavelet
 The type of wavelet basis has a significant impact on the
efficiency of the coding scheme.
 We used the MATLAB function "wavedec" to perform the 1-D
wavelet decomposition.
 Compression was done with "wdencmp", using parameters obtained
from "ddencmp"
 Compression efficiency was measured using two metrics
returned by the function: PERF0 and PERFL2
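The two scores can be sketched outside MATLAB as follows. This assumes the usual definitions: the zero score (labelled PERF0 in the results table) is the percentage of coefficients set to zero, and PERFL2 is the percentage of squared-norm energy retained after thresholding. Hard thresholding and the function names are assumptions for illustration, not the behaviour of wdencmp with the authors' settings.

```python
def hard_threshold(coeffs, thr):
    """Zero every coefficient whose magnitude does not exceed thr."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]


def perf0(kept):
    """Percentage of coefficients that are zero after thresholding."""
    return 100.0 * sum(1 for c in kept if c == 0.0) / len(kept)


def perfl2(original, kept):
    """Percentage of squared-norm energy retained after thresholding."""
    return 100.0 * sum(c * c for c in kept) / sum(c * c for c in original)
```

A good wavelet for compression gives a high zero percentage while keeping the retained energy close to 100.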
Dependence of efficiency on choice of wavelet
 Daubechies refers to a particular family of wavelets; the
number refers to the number of vanishing moments
 Simply put, the higher the number of vanishing moments,
the smoother the wavelet (and the longer the wavelet filter).
Wavelet          PERF0 (%)   PERFL2 (%)
Haar             31.01       99.93
Daubechies-2     64.63       99.95
Daubechies-10    65.59       99.97
Bit-rate reduction
 After testing the coder (with the Daubechies-10 wavelet)
on 4 different music signals originally at 16 bits/sample
(violin, drums, piano, Adele), we observed that the
average number of bits required to encode them was
around 7.5 bits/sample, i.e. a reduction of more than 50%.
 We allocated one bit for every 6.02 dB of the maximum effective
SNR in a frame
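The bit-allocation rule can be written as a short function. Rounding up, the 16-bit ceiling, and the function names are assumptions; the 6.02 dB-per-bit rule and the minimum of 4 bits come from the slides.

```python
import math


def bits_for_frame(max_effective_snr_db, min_bits=4, max_bits=16):
    """Allocate one bit per 6.02 dB of the frame's maximum effective SNR."""
    bits = math.ceil(max_effective_snr_db / 6.02)  # rounding up is an assumption
    return max(min_bits, min(max_bits, bits))


def reduction_percent(avg_bits, original_bits=16):
    """Bit-rate reduction relative to the original sample width."""
    return 100.0 * (1.0 - avg_bits / original_bits)
```

With the reported average of 7.5 bits per sample, `reduction_percent(7.5)` gives about 53%, consistent with the stated reduction of more than 50%.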
Bits Allocation per frame (lowest is 4)
Original vs reconstructed signal
Subjective tests
 It is important to eliminate chance in listening tests, so we provided several
stimuli of each source material to each listener. We also did not reveal to the
listener the actual order in which the stimuli were presented (e.g. original,
coder 1, coder 2, etc.).
 The figures indicate that the coder provided transparent coding for all audio sources.
 The quality of the piano signal was not as good as the others because it contains long
segments of nearly steady or slowly decaying sinusoids, which the wavelet-based
coder did not seem to handle well
Sample   Average probability of original preferred over encoded   Sample size   Comments
Violin   0.25                                                      12            Transparent
Piano    0.5                                                       10            Nearly transparent
Drums    0.3                                                       10            Transparent
Adele    0.27                                                      15            Transparent
Final presentation

  • 1. Audio Compression using Discrete Wavelet Transform Thyagarajan Venkatanarayanan Meghasyam Tummalacherla
  • 2. Overview  Why audio compression?  Overview of system  Wavelet Representation  The psychoacoustic model  Results
  • 3. Overview of the system Wavelet Analysis Psychoacoustic Model Threshold, decide bits (Mu law) Compression Audio Signal Quantization New representation (Mu law) Expansion Wavelet Synthesis Reconstructed Signal Encoding Decoding
  • 4. Why audio compression?  To represent signal with minimum number of bits without losing quality/message of the signal  We use a Wavelet based coding method with a psychoacoustic model to exploit perceptual masking and eliminate source redundancies
  • 5. Masking phenomena  Masking refers to a process where one sound is rendered inaudible because of the presence of another sound  Simultaneous masking refers to a frequency domain phenomenon which has been observed within critical bands (in-band).  Important to distinguish between two types of simultaneous masking, namely  tone-masking-noise: a tone occurring at the center of a critical band masks noise of any subcritical bandwidth  noise-masking-tone: follows the same pattern with the roles of masker and maskee reversed
  • 6. Temporal masking  Masking also occurs in the time-domain.  In the context of audio signal analysis, abrupt signal transients (e.g., the onset of a percussive musical instrument) create pre- and post- masking regions in time
  • 8. Critical Band and Masking Distance between each Critical band is one bark
  • 9. The Psychoacoustic model  Based on tests done on Human hearing  Uses an N-point DFT for high resolution spectral analysis, then estimates for each input frame individual simultaneous masking thresholds due to the presence of tone-like and noise-like maskers in the signal spectrum.  A global masking threshold is then estimated for a subset of the original N/2 frequency bins by (power) additive combination of the tonal and non-tonal individual masking thresholds
  • 10. Step 1: Spectral analysis and Normalization  Input is segmented into 512 length frames by applying a Hanning window and the power spectral density (PSD) is obtained using a N-point FFT: 𝑃 π‘˜ = log10 𝑛=0 π‘βˆ’1 𝑀 𝑛 π‘₯ 𝑛 π‘’βˆ’π‘—2πœ‹π‘˜π‘› 𝑁 2 0 ≀ π‘˜ ≀ 𝑁 2  Normalized to 96dB, for MPEG 1 codec parameters 𝑃 𝑁 π‘˜ = 𝑃 π‘˜ βˆ’ max 𝑃 π‘˜ + 96
  • 11. Step 2: Identification of tonal & noise (non tonal) frequencies  Find Local Maxima  Is a tonal frequency if  Remaining maxima, not in the Β±Ξ” π‘˜ range of a tonal frequency are classified as noise frequencies
  • 12. Tonal and Noise Components
  • 14. Step 3: Thresholding and reorganization of Masks  Any tonal/noise maskers below the absolute hearing threshold are discarded 𝑃 𝑇𝑀,𝑁𝑀 π‘˜ ≀ π‘‡π‘ž π‘˜ Where Tq(f) is the absolute threshold of hearing (the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment)  Next, a sliding is used to replace any pair of maskers occurring within a distance of 0.5 Bark by the stronger of the two
  • 16. Step 4: Individual masking thresholds  Each individual threshold represents a masking contribution at frequency bin i due to the tone or noise masker located at bin j.  The tonal masker thresholds, TTM(i, j), are given (in dB) by: 𝑇 𝑇𝑀 𝑖, 𝑗 = 𝑃 𝑇𝑀 𝑗 βˆ’ 0.275𝑧 𝑗 + 𝑆𝐹 𝑖, 𝑗 βˆ’ 6.025  The noise masker thresholds, TTM(i, j), are given (in dB) by: 𝑇 𝑁𝑀 𝑖, 𝑗 = 𝑃 𝑁𝑀 𝑗 βˆ’ 0.175𝑧 𝑗 + 𝑆𝐹 𝑖, 𝑗 βˆ’ 2.025
  • 19. Step 5: Global masking threshold  The global masking threshold, Tg(i), is obtained by: 𝑇𝑔 𝑖 = 10 log10 𝑙=1 𝐿 100.1𝑇 𝑇𝑀 𝑖,𝑙 + π‘š=1 𝑀 100.1𝑇 𝑁𝑀 𝑖,π‘š 𝑑𝐡  The global threshold for each frequency bin represents a signal dependent, power additive modification of the absolute threshold due to the spread of all tonal and noise maskers in the signal power spectrum
  • 21. Effective SNR (PSD – Global Threshold)
  • 22. Wavelet representation of signal 𝑔 𝑑 = π‘˜ 𝑐𝑗0 π‘˜ 2 𝑗 2 𝛷 2 𝑗 𝑑 βˆ’ π‘˜ + π‘˜ 𝑗=𝑗0 𝑗1 𝑑𝑗 π‘˜ 2 𝑗 2 𝛹(2 𝑗 𝑑 βˆ’ π‘˜)  Audio signal divided into non-overlapping frames of length 512 samples (11.5 ms at 44.1 kHz). Each frame is multiplied by a Hanning window of same length to avoid border distortions
  • 23. Wavelet decomposition – initial step  Given a signal s of length N, the DWT consists of log2 N stages at most.  First step produces 2 sets of coefficients: approximation coefficients CA1, and detail coefficients CD1.
  • 24. Recursive wavelet decomposition  The wavelet decomposition of the signal s analyzed at level j has the following structure: [cAj, cDj, ..., cD1].
  • 25. Wavelet decomposition  The wavelet transform coefficients are computed recursively using an efficient pyramid algorithm
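To make the recursive (pyramid) decomposition concrete, here is a minimal Python sketch using the Haar wavelet, the simplest orthonormal case. It is an illustration rather than a substitute for MATLAB's `wavedec`: the function names are hypothetical, the sign convention of the detail coefficients is an assumption, and longer filters such as Daubechies-10 also need boundary handling that is omitted here.

```python
import math


def haar_step(signal):
    """One analysis stage: return (approximation, detail), each half as long."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_wavedec(signal, level):
    """Return [cA_level, cD_level, ..., cD1] for a Haar decomposition."""
    if level < 1 or len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be a multiple of 2**level")
    approx = list(signal)
    details = []
    for _ in range(level):
        # Each stage re-splits the previous approximation only.
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


coeffs = haar_wavedec([4, 6, 10, 12, 8, 6, 5, 5], level=2)
```

Since the transform is orthonormal, the sum of squared coefficients equals the sum of squared input samples, which is why energy-based measures can be computed directly on the coefficients.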
  • 26. Wavelet: Vanishing moments
    - We choose orthonormal wavelets.
    - A wavelet with K filter coefficients can have at most K/2 vanishing moments.
    - To ensure regularity (which controls how fast the coefficients decay to zero), we choose a wavelet with a high number of vanishing moments, as such wavelets are best suited for audio processing.
  • 27. Wavelet: Sparsity
    - Sparsity is measured by the number of nonzero coefficients: the fewer, the better.
  • 28. Dependence of efficiency on choice of wavelet
    - The type of wavelet basis has a significant impact on the efficiency of the coding scheme.
    - We used the MATLAB function "wavedec" to perform the 1-D wavelet decomposition.
    - Compression was done with "wdencmp", using parameters obtained from "ddencmp".
    - Compression efficiency was measured using two metrics returned by "wdencmp": PERF0 (percentage of coefficients set to zero) and PERFL2 (percentage of signal energy retained).
  • 29. Dependence of efficiency on choice of wavelet
    - Daubechies refers to a particular family of wavelets. The number refers to the number of vanishing moments.
    - Simply put, the higher the number of vanishing moments, the smoother the wavelet (and the longer the wavelet filter).

    Wavelet         PERF0 (%)   PERFL2 (%)
    Haar            31.01       99.93
    Daubechies-2    64.63       99.95
    Daubechies-10   65.59       99.97
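The two scores in the table can be reproduced in spirit with the Python sketch below. It assumes hard thresholding and an orthonormal transform (so that coefficient energy equals signal energy). The function name and the sample values are illustrative and are not taken from the authors' MATLAB code.

```python
def compression_scores(coeffs, threshold):
    """Hard-threshold coefficients and report two percentage scores.

    Returns (kept, perf0, perfl2) where
      perf0  = percentage of coefficients that are zero after thresholding,
      perfl2 = percentage of the original energy that survives.
    """
    kept = [c if abs(c) > threshold else 0.0 for c in coeffs]
    perf0 = 100.0 * sum(1 for c in kept if c == 0.0) / len(kept)
    energy = sum(c * c for c in coeffs)
    # An all-zero input loses nothing; treat it as 100% retained.
    perfl2 = 100.0 if energy == 0 else 100.0 * sum(c * c for c in kept) / energy
    return kept, perf0, perfl2


kept, perf0, perfl2 = compression_scores([10.0, -0.5, 0.2, 3.0], threshold=1.0)
```

A sparser representation pushes the first score up for the same threshold, while the second score shows how little energy the discarded coefficients carried.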
  • 30. Bit-rate reduction
    - We tested the coder (with the Daubechies-10 wavelet) on 4 different music signals originally at 16 bits/sample (violin, drums, piano, Adele). The average number of bits required to encode them was around 7.5 bits/sample, i.e. a reduction of more than 50%.
    - We allocated one bit for every 6.02 dB of the maximum effective SNR in a frame.
  • 31. Bit allocation per frame (lowest is 4)
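The bit-allocation rule can be sketched in Python as follows. The effective SNR is the PSD minus the global masking threshold, and one bit is spent per 6.02 dB of its per-frame maximum, with a floor of 4 bits. Rounding up with `ceil` and the function names are assumptions; the slides do not state the rounding rule.

```python
import math


def bits_for_frame(psd_db, global_threshold_db, min_bits=4, db_per_bit=6.02):
    """Bits per sample for one frame from its peak effective SNR."""
    effective_snr = [p - t for p, t in zip(psd_db, global_threshold_db)]
    peak_snr = max(effective_snr)
    # Rounding up is an assumption made for this sketch.
    return max(min_bits, math.ceil(peak_snr / db_per_bit))


def bit_rate_reduction(average_bits, original_bits=16):
    """Fractional saving relative to the original word length."""
    return 1.0 - average_bits / original_bits


loud_frame_bits = bits_for_frame([60.0, 50.0, 20.0], [15.0, 30.0, 25.0])
quiet_frame_bits = bits_for_frame([10.0], [5.0])
saving = bit_rate_reduction(7.5)
```

With the reported average of about 7.5 bits against 16 bits/sample, `bit_rate_reduction` gives a saving a little above one half, in line with the figure quoted on the previous slide.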
  • 33. Subjective tests
    - It is important to eliminate chance in listening tests, so we provided several stimuli of each source material to each listener. We also did not reveal to the listener the actual order in which the stimuli were presented (e.g. original, coder 1, coder 2, etc.).
    - The figures indicate that the coder provided transparent coding for all audio sources.
    - The quality of the piano signal was not as good as the others because it contains long segments of nearly steady or slowly decaying sinusoids, which the wavelet-based coder did not seem to handle well.

    Sample   Avg. probability of original preferred over encoded   Sample size   Comments
    Violin   0.25                                                  12            Transparent
    Piano    0.5                                                   10            Nearly transparent
    Drums    0.3                                                   10            Transparent
    Adele    0.27                                                  15            Transparent

Editor's Notes

  1. Audio compression is used to obtain compact digital representations of wideband audio signals for efficient transmission or storage. The central objective is to represent the signal with a minimum number of bits while achieving transparent signal reconstruction, i.e., generating output audio that cannot be distinguished from the original input, even by a sensitive listener. An audio compression scheme must exploit the two sources of irrelevancy and redundancy in audio signals: the masking characteristics of the human hearing process and the statistical redundancies in the signal. Our approach employs a wavelet-based coding method together with a psychoacoustic model to exploit perceptual masking and eliminate source redundancies.
  2. Masking also occurs in the time domain. In the context of audio signal analysis, abrupt signal transients (e.g., the onset of a percussive musical instrument) create pre- and post-masking regions in time, during which a listener will not perceive signals beneath the elevated audibility thresholds produced by a masker.
  3. We consider the case of a single masking tone occurring at the center of a critical band. All levels in the figure are given in terms of dB. A hypothetical masking tone occurs at some masking level. This generates an excitation along the basilar membrane which is modeled by a spreading function and a corresponding masking threshold
  4. Local maxima in the sample PSD which exceed neighboring components within a certain Bark distance by at least 7 dB are classified as tonal. Tonal maskers are then computed from the spectral peaks listed in $S_T$ as follows: $P_{TM}(k) = 10 \log_{10} \sum_{j=-1}^{1} 10^{0.1\,P(k+j)}$ dB. A single noise masker for each critical band is computed from the (remaining) spectral lines not within the ±Δk neighborhood of a tonal masker using a similar sum.
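The tonal masker formula in this note sums the power of a spectral peak and its two immediate neighbors. A minimal Python sketch, with a hypothetical function name and made-up PSD values, is:

```python
import math


def tonal_masker_level_db(psd_db, k):
    """P_TM(k): power sum of bins k-1, k, k+1, returned in dB."""
    power = sum(10 ** (0.1 * psd_db[k + j]) for j in (-1, 0, 1))
    return 10 * math.log10(power)


# A 50 dB peak flanked by two 40 dB bins.
p_tm = tonal_masker_level_db([0.0, 40.0, 50.0, 40.0, 0.0], k=2)
```

The neighbors at 10 dB below the peak each add a tenth of its power, so the masker level ends up a little under 1 dB above the peak itself.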
  5. Here $P_{TM}(j)$ denotes the tonal masker in frequency bin j, $z(j)$ denotes the Bark frequency of bin j, and the spread of masking from masker bin j to maskee bin i, $SF(i, j)$, is a piecewise linear function of the masker level $P(j)$ and the Bark maskee-masker separation $\Delta z = z(i) - z(j)$.
  6. The audio signal is represented in terms of the translates and dilates of the scaling function and wavelet (say, Daubechies 10) as: $g(t) = \sum_{k} c_{j_0}(k)\, 2^{j_0/2}\, \Phi(2^{j_0} t - k) + \sum_{k} \sum_{j=j_0}^{\infty} d_j(k)\, 2^{j/2}\, \Psi(2^{j} t - k)$. Such an expansion provides a multiresolution analysis of g(t). The choice of $j_0$ sets the coarsest scale, whose space is spanned by $\Phi_{j_0,k}(t)$. The audio signal is divided into non-overlapping frames of 512 samples (about 11.6 ms at 44.1 kHz). Each frame is multiplied by a Hanning window of the same length to avoid border distortions. Restrictions: we use compactly supported wavelets, chosen to create orthogonal translates and dilates of the wavelet and to ensure regularity (fast decay of coefficients, controlled by choosing wavelets with a large number of vanishing moments).
  7. Given a signal s of length N, the DWT consists of at most $\log_2 N$ stages. The first step produces two sets of coefficients: approximation coefficients cA1 and detail coefficients cD1. More precisely, the first step filters s with a low-pass filter to obtain cA1 and with a high-pass filter to obtain cD1, each followed by downsampling by two. The next step splits the approximation coefficients cA1 in two parts using the same scheme, replacing s by cA1 and producing cA2 and cD2, and so on.