Example Based Audio Editing
Ramin Anushiravani
Advisor: Paris Smaragdis
Qualifying Exam Fall 15
Outline
• Motivation
– Why? What? How?
• Equalizer Matching
• Noise Matching
• Reverberation Matching
• Summary
Why? Motivation.
What? Graphic Equalizer.
iTunes Equalizer setting
How? Signal Processing!
Block diagram: Input, Example => Trim => Resample to 44.1 kHz => STFT => Function => ISTFT => Result, with a Normalize block. The steps before the STFT are labeled Preprocessing.
STFT notation (Smith, J.O., Spectral Audio Signal Processing, http://ccrma.stanford.edu/~jos/sasp/, online book, 2011 edition):
• R: hop size
• L: length of the signal
• k: frequency index
• w: window function
• time frame index
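The STFT step can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the deck's implementation: the frame index `m`, the Hann window, the 1024-sample frame, the hop `R = 256`, and peak normalization are all illustrative choices, and trimming and resampling to 44.1 kHz are left out.

```python
import numpy as np

def normalize(x):
    """Scale a signal to unit peak amplitude (an assumed normalization)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def stft(x, n_fft=1024, hop=256):
    """X[k, m] = sum_n x[n + m*R] * w[n] * exp(-j*2*pi*k*n/N), with R = hop."""
    w = np.hanning(n_fft)                       # window function w
    n_frames = 1 + (len(x) - n_fft) // hop      # number of full frames
    frames = np.stack(
        [x[m * hop : m * hop + n_fft] * w for m in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)          # shape: (n_fft // 2 + 1, n_frames)

fs = 44100
t = np.arange(fs) / fs                          # one second of audio
x = normalize(0.3 * np.sin(2 * np.pi * 1000 * t))
X = stft(x)
peak_bin = int(np.argmax(np.abs(X[:, 0])))      # bin closest to 1000 Hz
```

Each column of `X` is one time frame and each row one frequency index `k`; the bin spacing is `fs / n_fft`, about 43 Hz here.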
Equalizer Matching
Block diagram labels: input, example, STFT, Power Spectrum, Inverse, Element-wise multiplication, result
• P: Average Power Spectrum
• L: Total Number of frames
• Time-Invariant
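A minimal sketch of equalizer matching, assuming the EQ is the square root of the ratio of the two average power spectra (the slide's exact formula is not recoverable from the text). The function names and the synthetic spectrograms are hypothetical.

```python
import numpy as np

def average_power_spectrum(X):
    """P[k] = (1/L) * sum over the L frames of |X[k, m]|^2."""
    return np.mean(np.abs(X) ** 2, axis=1)

def eq_match(X_input, X_example, eps=1e-12):
    """Apply one time-invariant gain per bin so the input's average
    power spectrum matches the example's."""
    p_in = average_power_spectrum(X_input)
    p_ex = average_power_spectrum(X_example)
    eq = np.sqrt(p_ex / (p_in + eps))        # magnitude gain per frequency bin
    return X_input * eq[:, None], eq         # same gain for every time frame

# Synthetic complex spectrograms: 5 bins, different frame counts
rng = np.random.default_rng(0)
X_in = rng.standard_normal((5, 40)) + 1j * rng.standard_normal((5, 40))
tilt = np.array([1.0, 0.5, 2.0, 0.1, 3.0])
X_ex = tilt[:, None] * (rng.standard_normal((5, 60)) + 1j * rng.standard_normal((5, 60)))
X_out, eq = eq_match(X_in, X_ex)
```

Because the gain is real and positive, the input phase is left untouched, and the two signals do not need the same number of frames.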
Demo
Noise Matching
Block diagram: Denoise blocks followed by EQ blocks on two numbered paths (1 and 2), a subtraction (-) on each side, and a sum (+) at SNRx.
• Equalizing noisy signals
• Equalizing just the noise
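The final "add at a given SNR" step can be sketched as below. This assumes real-valued time-domain signals and an energy-ratio definition of SNR; `snr_db` and `mix_at_snr` are hypothetical helper names, not functions from the deck.

```python
import numpy as np

def snr_db(clean, noise):
    """Signal-to-noise ratio in dB from total signal energies."""
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def mix_at_snr(clean, noise, target_snr_db):
    """Scale the noise so that clean + gain * noise has the requested SNR."""
    gain = 10.0 ** ((snr_db(clean, noise) - target_snr_db) / 20.0)
    return clean + gain * noise, gain

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))   # stand-in for the EQ-matched clean signal
noise = rng.standard_normal(8000)                          # stand-in for the EQ-matched noise
mixed, gain = mix_at_snr(clean, noise, target_snr_db=22.0)
```

In the noise-matching pipeline the target would be the SNR measured on the original signal, so the result keeps the original balance between signal and noise.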
Demo
example input
Musical noise
Demo…
Denoising
Spectral Subtraction
Quantities in the equations:
• Additive stationary noise
• Noise profile estimate
• Estimate of the clean power spectrum
• Noise suppression factor
• Fourier transform of the noisy signal in one frame
In practice,
• The noise profile is estimated over multiple frequency bands.
• Spectral subtraction fails in low-SNR regions by creating musical noise. This artifact is reduced by post-filtering the spectral subtraction output.
(Philipos C. Loizou, Speech Enhancement: Theory and Practice, 2013)
(Esch and Vary, Efficient Musical Noise Suppression for Speech Enhancement Systems, 2009)
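A minimal sketch of power spectral subtraction, assuming a single noise power estimate per bin and a spectral floor of 1% of the noisy power (both are illustrative choices; the deck estimates the profile over frequency bands and adds a post-filter, which this sketch omits).

```python
import numpy as np

def spectral_subtraction(Y, noise_psd, floor=0.01):
    """Y: complex STFT (bins x frames); noise_psd: estimated noise power per bin."""
    power = np.abs(Y) ** 2
    # Subtract the noise profile, but never go below a fraction of the noisy power
    clean_power = np.maximum(power - noise_psd[:, None], floor * power)
    H = np.sqrt(clean_power / np.maximum(power, 1e-12))   # noise suppression factor
    return H * Y, H                                        # the noisy phase is kept

rng = np.random.default_rng(2)
n_bins, n_frames = 6, 50
noise = rng.standard_normal((n_bins, n_frames)) + 1j * rng.standard_normal((n_bins, n_frames))
speech = np.zeros((n_bins, n_frames), dtype=complex)
speech[2, :] = 20.0                                        # strong component in bin 2 only
Y = speech + noise
# Stand-in for a noise profile measured on speech-free frames
noise_psd = np.mean(np.abs(noise[:, :10]) ** 2, axis=1)
X_hat, H = spectral_subtraction(Y, noise_psd)
```

The gain stays near 1 where the signal dominates (bin 2) and fluctuates frame to frame where only noise is present; that fluctuation is what is heard as musical noise.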
Reverberation
Krannert Center for the Performing Arts, Foellinger Great Hall
Reverberation
Falkland Palace Bottle Dungeon (OpenAir database, www.openairlib.net)
reverb sound = dry sound * reverb kernel (convolution)
Approximate in the magnitude STFT domain: convolution between time frames of magnitude X and H at each frequency index.
(R. Talmon, I. Cohen, and S. Gannot, “Relative transfer function identification using convolutive transfer function approximation,” IEEE Trans. Audio, Speech, and Language Process., 2009.)
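The magnitude-domain approximation can be sketched as a separate convolution along time for every frequency bin. This is an illustrative sketch: the function name, the truncation to the original number of frames, and the exponentially decaying toy kernel are assumptions.

```python
import numpy as np

def reverberate_magnitude(X_mag, H_mag):
    """Y[k, n] = sum_tau X[k, n - tau] * H[k, tau]: convolution along time,
    done independently at each frequency index k."""
    n_bins, n_frames = X_mag.shape
    Y = np.zeros((n_bins, n_frames))
    for k in range(n_bins):
        Y[k] = np.convolve(X_mag[k], H_mag[k])[:n_frames]   # keep the original length
    return Y

# A single "click" in frame 1, smeared by a per-bin decaying kernel
X_mag = np.zeros((3, 8))
X_mag[:, 1] = 1.0
decay = np.array([0.9, 0.5, 0.1])
H_mag = decay[:, None] ** np.arange(4)[None, :]              # 4-frame kernel per bin
Y = reverberate_magnitude(X_mag, H_mag)
```

Each bin can have its own decay, which is how a frequency-dependent reverberation tail is represented in this domain.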
Reverb kernel
Reverberation Matching 1
Block diagram: input => Dereverberation => Adry, Ra; example => Dereverberation => Bdry, Rb
Ideal case: perfect decomposition of reverberant sounds into dry sounds and reverb kernels.
Running out of letters!
The focus is on decomposing magnitude spectrograms into magnitude-domain components.
I took the signals back to the time domain using the phase of the reverberated input.
Convolutive Non-negative Matrix Factorization
Update Equations:
(Paul O’Grady & Barak Pearlmutter, Convolutive NMF with a Sparseness Constraint, MLSP Conference, 2006)
Annotations:
• Convolution of non-negative matrices
• Shift operator
• Spectrum at time frame t
• Matrix of size Ly x k with all its elements set to 1
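One common form of convolutive NMF can be sketched as follows. This is an assumption-laden sketch rather than the slide's exact update equations: it minimizes the generalized KL divergence, updates all time-shifted bases jointly, and uses hypothetical names (`shift`, `reconstruct`, `conv_nmf`). Sparseness and anti-sparsity terms are not included.

```python
import numpy as np

def shift(M, t):
    """Shift operator: move columns right by t (left if t < 0), zero-filled."""
    out = np.zeros_like(M)
    n = M.shape[1]
    if t == 0:
        out[:] = M
    elif t > 0:
        out[:, t:] = M[:, : n - t]
    else:
        out[:, : n + t] = M[:, -t:]
    return out

def reconstruct(W, H):
    """Lambda = sum_t W[t] @ shift(H, t), with W of shape (T, m, k)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

def kl_divergence(V, Lam, eps=1e-12):
    return float(np.sum(V * np.log((V + eps) / (Lam + eps)) - V + Lam))

def conv_nmf(V, k, T, n_iter=50, seed=0, eps=1e-12):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((T, m, k)) + 0.1            # positive random initialization
    H = rng.random((k, n)) + 0.1
    ones = np.ones((m, n))                     # all-ones matrix in the denominators
    history = [kl_divergence(V, reconstruct(W, H))]
    for _ in range(n_iter):
        # Activation update, accumulated over all shifts
        ratio = V / (reconstruct(W, H) + eps)
        num = sum(W[t].T @ shift(ratio, -t) for t in range(T))
        den = sum(W[t].T @ shift(ones, -t) for t in range(T))
        H = H * num / (den + eps)
        # Basis update: every W[t] uses the same ratio matrix
        ratio = V / (reconstruct(W, H) + eps)
        for t in range(T):
            Ht = shift(H, t)
            W[t] = W[t] * (ratio @ Ht.T) / (ones @ Ht.T + eps)
        history.append(kl_divergence(V, reconstruct(W, H)))
    return W, H, history

rng = np.random.default_rng(3)
V = reconstruct(rng.random((4, 12, 2)), rng.random((2, 60)))   # synthetic 12 x 60 "spectrogram"
W, H, history = conv_nmf(V, k=2, T=4)
```

With `T = 1` this reduces to ordinary NMF with KL updates; larger `T` lets each basis span several time frames.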
Dereverberation
• Initialize with positive random values.
• Initialize with positive exponential decays.
• On each iteration, enforce anti-sparsity.
I dropped indices and absolute values, but they’re there.
Dereverberation
We can do better by using more prior knowledge.
• Set of dry speech bases (trained offline)
• Corresponding activation
• Reverberated activation matrix
• Convolution is associative
• Average R over multiple frequency bands
(Paris Smaragdis, “Convolutive speech bases and their application to supervised speech separation,” in Speech And Audio Processing. IEEE, 2007)
Demo
Panels: Reverb, Dereverberated
Labels: Hr, Wc, R, Hc, Fixed
Result
Original Input
Demo…
Reverberation Matching 2
Block diagram: input => Dereverberation => Adry, Ra; example => Dereverberation => Bdry, Rb; Match Kernels; Suppress Artifact; + => result
Summary
Example - Input => Example - Result
• Equalizer Matching: Find the power spectra => Find an EQ filter that matches them => Multiply the EQ filter with every time frame of the input sound's magnitude spectrogram.
• Noise Matching: Denoise => EQ-match the estimated clean and noise signals individually => Add the resulting input noise to the resulting clean signal using their original SNR.
• Reverberation Matching: Decompose into dry sounds and reverb kernels => Convolve the estimated dry input sound with the example sound’s estimated reverb kernel.
Equalizer Matching
Plot axes: Log Mag (dB) vs. log-spaced frequency (Hz)
Spectral Subtraction
noisy signal = clean signal + noise
A common assumption in most papers: noise and the clean signal are uncorrelated.
(Philipos C. Loizou, Speech Enhancement: Theory and Practice, 2013)
Equation annotations:
• Fourier Transform over a segment of x(n).
• AWGN. Same over all clean input segments.
• Estimated Noise PSD.
In practice H is learned over different frequency bands.
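A short derivation shows why the uncorrelated assumption matters. The symbols are assumptions rather than the slide's own notation: y, x and d are the noisy signal, clean signal and noise; Y, X and D are their Fourier transforms over one segment; and \hat{P}_d is the estimated noise PSD.

```latex
\begin{align}
y(n) &= x(n) + d(n) \\
|Y(k)|^2 &= |X(k)|^2 + |D(k)|^2 + 2\,\mathrm{Re}\{X(k)\,D^{*}(k)\} \\
E\{|Y(k)|^2\} &= E\{|X(k)|^2\} + E\{|D(k)|^2\}
  \quad \text{(the cross term vanishes for uncorrelated, zero-mean noise)} \\
|\hat{X}(k)|^2 &= |Y(k)|^2 - \hat{P}_d(k), \qquad
H(k) = \sqrt{\frac{|\hat{X}(k)|^2}{|Y(k)|^2}}
\end{align}
```

Without the assumption the cross term would remain, and subtracting a noise power estimate alone would not recover the clean power spectrum even on average.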
Musical Noise Reduction
(Esch and Vary, Efficient Musical Noise Suppression for Speech Enhancement Systems, 2009)
Aim: Retain the naturalness of the remaining background noise.
How?
• 1. Detect low-SNR frames based on the noisy signal and the estimated clean signal.
• 2. Design a smoothing window based on 1. The lower the SNR, the longer the window.
• 3. Design a post-filter to smooth the low-SNR frames, i.e. an FIR low-pass filter designed based on 2.
• 4. Element-wise multiply the noise suppression factor by 2.
Block diagram labels: Step 3, Enhanced Spectral Subtraction
SS + Musical Noise Reduction
Block diagram labels: Noisy Input (SNR = 22 dB), .^2, .^2, .^0.5, Musical Suppression PostFilter, G.*H
Much Better!
Metrics for Ideal Reverberation
• Energy Decay Curve (EDC)
• Energy Decay Relief: EDC at multiple frequency bands
Plot axes: Magnitude (dB) vs. time
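The energy decay curve can be computed from an impulse response by backward integration of its squared samples. The sketch below assumes a synthetic, purely exponential impulse response; the function name and test signal are hypothetical.

```python
import numpy as np

def energy_decay_curve_db(h):
    """EDC(t): energy remaining in h from time t onward, in dB relative to the total."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]       # backward (Schroeder) integration
    return 10.0 * np.log10(energy / energy[0])

fs = 8000
t = np.arange(fs) / fs                           # 1 s impulse response
rt60 = 0.5
h = np.exp(-3.0 * np.log(10.0) * t / rt60)       # amplitude falls 60 dB in rt60 seconds
edc_db = energy_decay_curve_db(h)
```

For this ideal decay the EDC is a straight line in dB, reaching about -30 dB at 0.25 s and -60 dB at 0.5 s. Applying the same function to band-passed versions of `h` gives the curves that make up the energy decay relief.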
Reverberation Model
• Time Domain Statistical Model
Where b(t) is zero-mean Gaussian noise and the decay constant is related to the reverberation time.
• Reverberation time = RT60 = time for the level to drop 60 dB below the original level.
Sabine Formula, in terms of:
– Volume of the enclosure
– Effective absorbing area
– Area of each wall
– Absorption coefficient
Reflection Coefficients:
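A worked example of these quantities, under the usual textbook forms: Sabine's formula RT60 = 0.161 V / A in metric units with A = sum of S_i * alpha_i, an amplitude envelope exp(-delta t) with delta = 3 ln(10) / RT60, and reflection coefficients beta = sqrt(1 - alpha). The room dimensions and absorption values are hypothetical.

```python
import numpy as np

def sabine_rt60(volume_m3, areas_m2, absorption):
    """RT60 = 0.161 * V / A, with effective absorbing area A = sum_i S_i * alpha_i."""
    A = float(np.sum(np.asarray(areas_m2) * np.asarray(absorption)))
    return 0.161 * volume_m3 / A

def decay_rate(rt60):
    """delta such that the envelope exp(-delta * t) has dropped 60 dB at t = rt60."""
    return 3.0 * np.log(10.0) / rt60

def reflection_coefficient(alpha):
    """beta = sqrt(1 - alpha): amplitude reflection from energy absorption."""
    return np.sqrt(1.0 - np.asarray(alpha))

# Hypothetical 5 m x 4 m x 3 m room, every surface with alpha = 0.2
areas = [20.0, 20.0, 15.0, 15.0, 12.0, 12.0]    # floor, ceiling, and four walls
alphas = [0.2] * 6
rt60 = sabine_rt60(60.0, areas, alphas)
delta = decay_rate(rt60)
betas = reflection_coefficient(alphas)
```

For this room the effective absorbing area is 18.8 m^2, which gives an RT60 of roughly half a second.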
Image Source Method
Figure labels: Source, Microphone, Mirror image of the original source, Actual path, Perceived path, Image source produces another image source
(Allen, J. and Berkley, D., “Image method for efficiently simulating small-room acoustics,” The Journal of the Acoustical Society of America, Vol. 65, No. 4, pp. 943-950, 1978)
(Pictures from: Alex Tu, Reverberation simulation from impulse response using the Image Source Method)
Equation annotations:
• Parameters that control which image source in which dimension
• Reflection coefficients of the six surfaces in a rectangular room
• Time delay of the considered image source
Non-Negative Matrix Factorization
• Applying gradient descent with positive initial conditions for W and H and a ‘clever’ learning rate results in multiplicative update rules (Lee and Seung, 1999).
• Normalize W
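A minimal sketch of NMF with multiplicative updates. It assumes the Euclidean (Frobenius) cost and normalizes the columns of W to unit sum with a compensating scale on H; the slide's own cost function and normalization are not recoverable from the text, so both choices are illustrative.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative X (m x n) as W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1             # positive initial conditions
    H = rng.random((k, n)) + 0.1
    errors = [np.linalg.norm(X - W @ H)]
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        scale = W.sum(axis=0) + eps          # normalize W: unit-sum columns
        W /= scale
        H *= scale[:, None]                  # compensate so W @ H is unchanged
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

rng = np.random.default_rng(4)
X = rng.random((12, 2)) @ rng.random((2, 30))   # exactly rank-2, non-negative
W, H, errors = nmf(X, k=2)
```

Because every factor in the updates is non-negative, W and H can never change sign, which is what "positive initial conditions" buys.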
Why NMF? (Lee and Seung, 1999)
• NMF: Visually meaningful. Decomposition can only be positive. Parts-based representation.
• Eigenfaces: Statistically meaningful. Eigenfaces are in the direction of the largest variance. Subtraction can occur.
Why NMF?
X ≈ W H with k = 2 components
• X: m (frequency) × n (time frame)
• W: m (frequency) × k (components)
• H: k (components) × n (time frame)
Why Not NMF?
(Adapted from: Paul O’Grady & Barak Pearlmutter, Convolutive NMF with a Sparseness Constraint, MLSP Conference, 2006)
Convolutive NMF
Diagram labels: X (m × n), H (k × n), T
Panels: Iteration 1, Iteration 2, Iteration 3, Iteration 10
Spectral Subtraction
Block diagram labels: Noisy Input (SNR = 22 dB), .^2, .^2, .^0.5, H (Noise Suppression Factor), Denoised-ish?
Go back to time domain: use noisy input phase.
Musical noise: mainly in low-SNR regions.
Panels: Noisy Signal (SNR = 22 dB), With Musical Noise, Without Musical Noise
Same results, better colormap?


Editor's Notes

  1. Why? What?
  2. Fix the STFT equations
  3. Use the powerful yet so simple equalizer matching to do denoising as well.
  4. Well, now we can’t ignore time here anymore. Reverbs are usually longer than a time frame and are presented in a convolutive manner. FIR filtering here gives you too many taps, and even when inverting you have to deal with whether it’s minimum phase and invertible and …
  5. Use a sound that gives you a less artifacty result.
  6. Definition of conv2
  7. Put some sounds here…
  8. If you’re interested, I designed a user interface to play with.
  9. Might want to get rid of details and only show some to intrigue questions
  10. EDC(t) is the total amount of signal energy remaining in the reverberator impulse response at time t (Smith, J.O. "Delay Lines", in Physical Audio Signal Processing, http://ccrma.stanford.edu/~jos/pasp/Delay_Lines.html, online book, 2010 edition)
  11. Polack's statistical reverb model
  12. http://ses.library.usyd.edu.au/bitstream/2123/10601/2/Reverberation%20simulation%20from%20impulse%20response.pdf
  13. Use a more interesting, realistic noise on the false colormap. Add the musical noise result as well.