Comparison of Single Channel
Blind Dereverberation Methods
for Speech Signals
Deha Deniz Türköz - MSc Thesis
Thesis Supervisor: Hakan Erdoğan
Sabancı Üniversitesi
27.06.2016
OUTLINE
1) Introduction
2) Background
a) Features of speech
b) Reverberation model
c) Room impulse response (RIR)
d) Non-negative matrix factorization (NMF)
3) Blind-Dereverberation Methods
a) Delayed linear prediction (DLP)
b) Weighted prediction error (G-WPE)
c) Laplacian based Weighted Prediction Error (L-WPE)
d) NMF based spectral modeling (NMF+N-CTF)
e) Sparsity penalized weighted least squares method (SPWLS)
4) Experiments and Comparisons
5) Discussion and Conclusion
1. Introduction
Reverberation:
● is an effect that occurs in speech recordings due to reflections from walls,
● decreases speech intelligibility,
● degrades applications such as ASR and hands-free teleconferencing,
● can be modeled with an LTI filter.
● If the filter h is known, the clean signal s can be recovered with a simple deconvolution operation, called dereverberation.
● In most cases, both h and s are unknown and the reverberant signal x is the only known quantity. Estimating h and s from x alone is called the "blind dereverberation problem", which is the main subject of this work.
The aim of this work is to compare existing blind dereverberation methods:
○ DLP: delayed linear prediction,
○ G-WPE: Gaussian-based weighted prediction error,
○ L-WPE: Laplacian-based weighted prediction error,
○ NMF+N-CTF: NMF-based spectral-temporal modeling,
and to propose a new algorithm:
○ SPWLS: sparsity penalized weighted least squares.
2. Background
a. Features of Speech
2a. Features of Speech
● Speech is a signal produced by the human vocal system.
● The input to the vocal tract is called the glottal signal, modeled as either:
○ white noise (unvoiced sounds), or
○ an impulse train (voiced sounds).
● The vocal tract can be modeled as an all-pole filter, which means speech production is a simple LTI filtering operation applied to the glottal signal.
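The source-filter model above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: a single second-order resonance (pole radius 0.95 at 500 Hz, sampling rate 16 kHz) stands in for the vocal tract, and a 100 Hz impulse train stands in for voiced glottal excitation. These values and the helper names `impulse_train` and `all_pole_filter` are illustrative, not taken from the thesis.

```python
import numpy as np

def impulse_train(n_samples, period):
    """Voiced glottal excitation: a unit impulse every `period` samples."""
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e

def all_pole_filter(excitation, a):
    """Filter through 1/A(z), with A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p.

    Difference equation: a[0] y[n] = e[n] - sum_{i=1..p} a[i] y[n-i].
    """
    a = np.asarray(a, dtype=float)
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y[n] = acc / a[0]
    return y

# Illustrative vocal tract: one resonance at 500 Hz (fs = 16 kHz), pole radius 0.95.
fs, f0, r = 16000, 500.0, 0.95
theta = 2 * np.pi * f0 / fs
a_tract = [1.0, -2 * r * np.cos(theta), r ** 2]

voiced = all_pole_filter(impulse_train(1600, fs // 100), a_tract)       # 100 Hz pitch
unvoiced = all_pole_filter(np.random.default_rng(0).standard_normal(1600), a_tract)
```

The same filter produces both sound classes; only the excitation changes, which is the point of the source-filter view.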
2a. Features of Speech
● Speech signals are non-stationary.
● General approach: divide the signal into short time segments and assume each segment is stationary.
● To analyze speech, the short-time Fourier transform (STFT) is used.
● The STFT divides the speech signal into overlapping segments, called frames, by applying a window function, and then computes the DFT of each frame.
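2a. Features of Speech
Formulation of the STFT:
$$X(n,k) = \sum_{m=0}^{N-1} x[nL + m]\, W[m]\, e^{-j 2\pi k m / N}, \qquad k = 0, \dots, N-1$$
where
L: frame shift,
N: frame size,
X(n,k): discrete STFT coefficients of the speech signal x[m] at frame n and frequency bin k,
W[m]: Hamming window.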
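The STFT formula can be sketched directly with NumPy. This minimal version assumes the frame index n starts at 0 with no padding and uses `np.hamming` for W[m]. The 30 ms window and 10 ms hop in the usage lines come from the experimental setup of this thesis (N = 480 and L = 160 samples at 16 kHz). The returned matrix holds one DFT per column, matching the matrix view of the STFT, and its squared magnitude is the power spectrogram.

```python
import numpy as np

def stft(x, N, L):
    """Return the STFT matrix with X[k, n] = sum_m x[nL + m] W[m] exp(-j 2 pi k m / N).

    Rows are frequency bins k = 0..N-1; column n is the DFT of one windowed frame.
    """
    W = np.hamming(N)                      # W[m] = 0.54 - 0.46 cos(2 pi m / (N - 1))
    n_frames = 1 + (len(x) - N) // L       # frames that fit entirely inside x
    X = np.empty((N, n_frames), dtype=complex)
    for n in range(n_frames):
        X[:, n] = np.fft.fft(x[n * L : n * L + N] * W)
    return X

fs = 16000
N, L = 480, 160                                   # 30 ms window, 10 ms hop at 16 kHz
x = np.cos(2 * np.pi * 40 * np.arange(fs) / N)    # tone centered exactly on bin k = 40
X = stft(x, N, L)
S = np.abs(X) ** 2                                # power spectrogram, one column per frame
```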
2a. Features of Speech
● The STFT of a signal can be viewed as a matrix whose columns hold the complex DFT coefficients of the frames.
2a. Features of Speech
● A spectrogram visualizes how the frequency content of a signal changes over time.
● The spectrogram S(n,k) uses the power of the STFT coefficients X(n,k) as intensity values in a 2-D image:
$$S(n,k) = |X(n,k)|^2$$
2. Background
b. Reverberation Model
2b. Reverberation Model
● The reverberant environment can be modeled as an LTI filter whose impulse response is called the room impulse response (RIR).
● Reverberation model:
$$x(t) = h(t) * s(t)$$
where * denotes convolution and
h(t): RIR, unknown
s(t): clean (anechoic) signal, unknown
x(t): reverberant (echoed) signal, known
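A minimal sketch of the reverberation model, assuming the observation is truncated to the length of the clean signal and that an optional additive noise term may be supplied (it anticipates the noise n(t) introduced with the RIR decomposition). The two-tap "RIR" in the usage lines is a toy example, not a measured response, and `reverberate` is an illustrative name.

```python
import numpy as np

def reverberate(s, h, n=None):
    """x(t) = h(t) * s(t) (+ n(t)), truncated to len(s) samples."""
    x = np.convolve(s, h)[: len(s)]   # linear convolution with the RIR
    if n is not None:
        x = x + n[: len(x)]
    return x

s = np.zeros(8)
s[0] = 1.0                            # a single click as the "clean" signal
h = np.array([1.0, 0.0, 0.0, 0.5])    # direct path plus one echo 3 samples later
x = reverberate(s, h)                 # -> [1, 0, 0, 0.5, 0, 0, 0, 0]
```

Only x would be available in the blind setting; s and h are known here only because the example constructs them.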
2b. Reverberation Model
Reverberation effect on the spectrogram (figure).
2. Background
c. Room Impulse Response (RIR)
2c. Room Impulse Response (RIR)
The length of the RIR depends on the
● room size,
● room temperature,
● room shape,
● microphone's distance to the speech source,
● absorption of sound in the room,
● reverberation time RT60: the time required for the reflected sound to decay by 60 dB.
● The RIR has the characteristics of an FIR filter.
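To make the RT60 definition concrete, the sketch below builds a synthetic RIR as exponentially decaying Gaussian noise. This is a common statistical approximation and an assumption here; the thesis uses RIRs from the REVERB Challenge data. RT60 is then recovered with Schroeder backward integration. The fit range of -5 to -25 dB, extrapolated to -60 dB, and the function names are illustrative choices.

```python
import numpy as np

def synthetic_rir(rt60, fs, length_s=None, seed=0):
    """Gaussian noise with an exponential envelope whose energy falls by 60 dB after rt60 s."""
    if length_s is None:
        length_s = 2.0 * rt60
    t = np.arange(int(length_s * fs)) / fs
    alpha = 3.0 * np.log(10.0) / rt60        # amplitude reaches 10**-3 (i.e. -60 dB) at t = rt60
    h = np.random.default_rng(seed).standard_normal(len(t)) * np.exp(-alpha * t)
    return h / np.max(np.abs(h))

def estimate_rt60(h, fs, hi_db=-5.0, lo_db=-25.0):
    """Schroeder energy decay curve, line fit between hi_db and lo_db, extrapolated to -60 dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]      # energy remaining from each sample onward
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.nonzero((edc_db <= hi_db) & (edc_db >= lo_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)   # decay rate in dB per second
    return -60.0 / slope

fs = 16000
h = synthetic_rir(0.54, fs)                  # 0.54 s is one of the RT60 values in the test data
rt60_est = estimate_rt60(h, fs)
```

Fitting a straight line to the decay curve in dB works because an exponential energy envelope becomes linear on a log scale.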
2c. Room Impulse Response (RIR)
The RIR is usually divided into two parts:
1. Early reverberation
2. Late reverberation: the most detrimental part of the echo
The observed signal is then
$$x(t) = d(t) + r(t) + n(t)$$
where
n(t): noise
d(t): clean signal + early echo (the desired signal)
r(t): late echo
Lh: the length of the RIR
h(t): the RIR (early echo + late echo)
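2c. Room Impulse Response (RIR)
Then, the early and late reverberation components are
$$d(t) = \sum_{\tau=0}^{D-1} h(\tau)\, s(t-\tau), \qquad r(t) = \sum_{\tau=D}^{L_h-1} h(\tau)\, s(t-\tau)$$
where D is the length of the early reverberation.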
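The early/late decomposition can be checked numerically. In this sketch D is measured in samples, and the 50 ms boundary (800 samples at 16 kHz) is a commonly used early/late split chosen only for illustration. The random decaying RIR and white "speech" are toy stand-ins, and `early_late` is an illustrative name.

```python
import numpy as np

def early_late(s, h, D):
    """Return d(t) (taps 0..D-1: clean + early echo) and r(t) (taps D..Lh-1: late echo)."""
    h_early, h_late = h.copy(), h.copy()
    h_early[D:] = 0.0
    h_late[:D] = 0.0
    d = np.convolve(s, h_early)[: len(s)]
    r = np.convolve(s, h_late)[: len(s)]
    return d, r

fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)                                         # 1 s of toy "speech"
h = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 1000.0)   # toy decaying RIR, Lh = 4000
n = 1e-3 * rng.standard_normal(fs)                                  # additive noise n(t)
d, r = early_late(s, h, D=800)                                      # D = 800 samples (50 ms)
x = d + r + n                                                       # x(t) = d(t) + r(t) + n(t)
```

Because convolution is linear, d + r reproduces h * s exactly, and r is zero for the first D samples.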
2c. Room Impulse Response (RIR)
(figure)
2. Background
d. Non-negative Matrix Factorization (NMF)
2d. Non-negative Matrix Factorization (NMF)
NMF decomposes a matrix V into the product of two matrices B and G with non-negative entries:
$$V \approx BG$$
B: basis (dictionary) matrix, G: weight (gain) matrix.
● This can be posed as an optimization problem:
$$\min_{B \ge 0,\; G \ge 0} C(V, BG)$$
where C is the cost function measuring the distance between V and BG.
2d. Non-negative Matrix Factorization (NMF)
● The columns of B are called basis vectors.
● The number of columns of B is kept smaller than the dimensions of V.
● Iterative algorithms are used to solve the NMF problem, since it has no closed-form (or unique) solution.
● The initial B and G can be random positive numbers, or supervised matrices for faster convergence.
● Popular cost functions for measuring the distance between V and BG are:
○ Euclidean distance,
○ Kullback-Leibler (KL) divergence,
○ Itakura-Saito (IS) divergence.
2d. Non-negative Matrix Factorization (NMF)
The Kullback-Leibler divergence between V and BG is defined as [6]:
where "1" is the matrix of ones with the same size as V.
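KL-based NMF is commonly solved with the multiplicative updates of Lee and Seung [6], where the matrix of ones appears in the update denominators. The sketch below is a minimal version, assuming random positive initialization and a small constant `eps` to avoid division by zero. The names `kl_div` and `nmf_kl` are illustrative.

```python
import numpy as np

def kl_div(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat) = sum(V log(V / V_hat) - V + V_hat)."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

def nmf_kl(V, R, n_iter=200, seed=0, eps=1e-12):
    """Factor V (F x T, non-negative) as B @ G (B: F x R, G: R x T) with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    B = rng.random((F, R)) + eps            # random positive initialization
    G = rng.random((R, T)) + eps
    ones = np.ones_like(V)                  # "1": matrix of ones with the same size as V
    for _ in range(n_iter):
        G *= (B.T @ (V / (B @ G + eps))) / (B.T @ ones + eps)
        B *= ((V / (B @ G + eps)) @ G.T) / (ones @ G.T + eps)
    return B, G

rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 50))   # toy non-negative matrix of rank 3
B, G = nmf_kl(V, R=3)
```

Multiplicative updates keep B and G non-negative automatically and do not increase the KL divergence, but different initializations can still end in different local minima.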
2d. Non-negative Matrix Factorization (NMF)
● NMF is a non-convex problem with multiple local minima. As a result, B and G can differ for the same V matrix.
● NMF is a common method in speech processing, deep learning, clustering, and computer vision.
● In speech processing, NMF is applied to audio source separation, source/filter modeling, blind dereverberation [3][4], speech denoising, and more.
3. Blind-Dereverberation Methods
a. Delayed Linear Prediction (DLP)
3a. Delayed Linear Prediction (DLP)
The time-domain signals are denoted x(t), s(t), h(t), and the corresponding STFT-domain signals x(n,k), s(n,k), h(n,k).
3a. Delayed Linear Prediction (DLP)
● DLP estimates the inverse filter coefficients from the reverberant signal.
● An inverse filter of length Lw can be used to approximately obtain a dereverberated signal as:
● In matrix form, reverberation can be formulated as
3a. Delayed Linear Prediction (DLP)
3a. Delayed Linear Prediction (DLP)
● This means the desired signal can be estimated using only the reverberant signal and its past samples.
● Then, the inverse filter is
● The number of zeros in the inverse filter vector equals the delay D.
● In conclusion, the DLP algorithm is a simple technique for dereverberation.
● However, it may not work well in most cases, because the inverse filter is restricted to an FIR filter.
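The DLP idea can be sketched as a time-domain least-squares problem. This illustration rests on a few assumptions. The prediction uses lags D + 1 to D + Lw, so the inverse filter has exactly D zeros after its leading 1, matching the description above; some papers index the lags from D instead. The filter is found with `np.linalg.lstsq` rather than the Levinson-Durbin recursion mentioned in the experiments. The toy signal has one late echo that the prediction removes, and `dlp_dereverb` is an illustrative name.

```python
import numpy as np

def dlp_dereverb(x, D, Lw):
    """Delayed linear prediction.

    Predict x(t) from x(t-D-1), ..., x(t-D-Lw) by least squares; the prediction error
    e(t) = x(t) - sum_j c_j x(t-D-j) is the dereverberated estimate.
    Returns (estimate, inverse_filter) with inverse_filter = [1, 0 (D times), -c_1, ..., -c_Lw].
    """
    T = len(x)
    start = D + Lw                                  # first sample with a complete history
    # Column j-1 holds x(t-D-j) for t = start..T-1.
    A = np.column_stack([x[start - D - j : T - D - j] for j in range(1, Lw + 1)])
    c, *_ = np.linalg.lstsq(A, x[start:], rcond=None)
    inv_filter = np.concatenate(([1.0], np.zeros(D), -c))
    estimate = np.convolve(x, inv_filter)[:T]       # apply the FIR inverse filter
    return estimate, inv_filter

rng = np.random.default_rng(0)
s = rng.standard_normal(20000)                      # white "clean" signal
x = s.copy()
x[10:] += 0.6 * s[:-10]                             # one late echo, 10 samples after the direct path
est, w = dlp_dereverb(x, D=5, Lw=20)
```

Any echo arriving within the first D samples would be left in the estimate, which is why the output is the desired signal (clean plus early echo) rather than the purely clean signal.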
3. Blind-Dereverberation Methods
b. Weighted Prediction Error (G-WPE)
3b. Weighted Prediction Error (G-WPE)
Assumption 1: the speech signal has a local Gaussian distribution within short frames of length Lf,
Assumption 2: samples are mutually uncorrelated beyond a certain time lag,
Assumption 3: the variance is constant within short-time frames of length Lf.
3b. Weighted Prediction Error (G-WPE)
● Dereverberation can be done either in the time domain or in the STFT domain.
● Working in the time domain is very costly because the matrices involved are very large, so the STFT domain is used.
● The probability density function of the desired signal d(n,k) in the STFT domain is
$$p\big(d(n,k)\big) = \frac{1}{\pi\,\lambda(n,k)} \exp\left(-\frac{|d(n,k)|^2}{\lambda(n,k)}\right)$$
n: frame number, k: frequency bin, λ(n,k): time-varying variance.
Then,
3b. Weighted Prediction Error (G-WPE)
● The variance values change only across time frames.
Thus,
● Applying maximum likelihood estimation to the Gaussian pdf, the log-likelihood function for dereverberation in the STFT domain becomes:
The parameter vector for likelihood maximization is:
3b. Weighted Prediction Error (G-WPE)
The log-likelihood cannot be maximized analytically with respect to the parameter vector; there is no closed-form solution. Thus, an iterative algorithm is needed.
3b. Weighted Prediction Error (G-WPE)
A two-step procedure was proposed in [1] to solve the likelihood maximization problem:
1. Keep the variances constant and solve for the prediction filter coefficients that maximize the likelihood, then obtain the desired signal estimate;
2. Keep the filter coefficients constant and update the variances;
and repeat until a convergence criterion is satisfied or a maximum number of iterations is completed.
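The two-step procedure can be sketched per frequency bin as follows. This is a minimal single-channel version under stated assumptions:
- Lf = 1 (one variance per frame, as in the experimental setup);
- delayed history vectors built from lags D to D + Lw - 1;
- the setup's minimum variance 1e-6 as a floor;
- a tiny diagonal loading for numerical safety (not part of [1]).

The function names and the toy test signal are illustrative.

```python
import numpy as np

def wpe_bin(x, D=3, Lw=10, n_iter=5, eps=1e-6):
    """Single-channel G-WPE for one frequency bin x[n] (complex STFT coefficients over frames)."""
    N = len(x)
    assert N > D + Lw, "need more frames than the prediction span"
    # Delayed history: Xt[j, n] = x[n - D - j], zero where n - D - j < 0.
    Xt = np.zeros((Lw, N), dtype=complex)
    for j in range(Lw):
        Xt[j, D + j :] = x[: N - D - j]
    d = x.astype(complex)                               # initial desired-signal estimate
    for _ in range(n_iter):
        lam = np.maximum(np.abs(d) ** 2, eps)           # variances from current estimate (Lf = 1)
        Xw = Xt / lam                                   # weight frame n by 1 / lam[n]
        R = Xw @ Xt.conj().T                            # sum_n xt[n] xt[n]^H / lam[n]
        R += 1e-10 * np.real(np.trace(R)) / Lw * np.eye(Lw)   # tiny loading, numerical safety only
        p = Xw @ x.conj()                               # sum_n xt[n] x[n]^* / lam[n]
        g = np.linalg.solve(R, p)                       # variances fixed: weighted LS filter
        d = x - g.conj() @ Xt                           # d[n] = x[n] - g^H xt[n]
    return d

def wpe(X, **kwargs):
    """Apply wpe_bin to every frequency bin (row) of an STFT matrix X[k, n]."""
    return np.stack([wpe_bin(X[k], **kwargs) for k in range(X.shape[0])])

rng = np.random.default_rng(0)
n_frames = 2000
scale = np.exp(rng.standard_normal(n_frames))           # time-varying, speech-like variance
s = scale * (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)) / np.sqrt(2)
x = s.copy()
x[4:] += 0.5 * s[:-4]                                   # late echo 4 frames later (beyond D = 3)
d = wpe_bin(x, D=3, Lw=10)
```

The variance weights are what distinguish G-WPE from plain DLP: low-energy frames get large weights, which counteracts the bias that ordinary least squares would have toward high-energy frames.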
3. Blind-Dereverberation Methods
c. Laplacian-Based Weighted Prediction Error (L-WPE)
3c. Laplacian-Based Weighted Prediction Error (L-WPE)
L-WPE [2] proposes that speech in the STFT domain is modeled more precisely by a Laplacian model than by a Gaussian model.
● Assumption 1: the speech signal has a local Laplacian distribution within short frames of length Lf,
● Assumption 2: the real and imaginary parts of the desired signal's STFT coefficients are independent and have equal variance in each time-frequency bin.
3c. Laplacian-Based Weighted Prediction Error (L-WPE)
Then, the pdf of the Laplacian model is
As in the G-WPE method, maximum likelihood (ML) estimation is used for the parameter vector. The likelihood function is then:
3c. Laplacian-Based Weighted Prediction Error (L-WPE)
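There is no closed-form solution for maximizing the likelihood function, so it is solved numerically:
1. Keep the variances constant and solve for the prediction coefficients that maximize the likelihood (equivalently, minimize a weighted ℓ1 norm), then obtain the desired signal estimate;
2. Keep the prediction coefficients constant and update the variances.
Step 1: fix the variances and update the prediction coefficients.
The likelihood function can be rewritten as
3c. Laplacian-Based Weighted Prediction Error (L-WPE)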
Thus, the likelihood function can be written as:
3c. Laplacian-Based Weighted Prediction Error (L-WPE)
The problem can then be cast as a linear programming (LP) problem:
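In Step 1 of L-WPE, the prediction filter minimizes a weighted sum of the absolute real and imaginary parts of the prediction error. The slides solve this exactly as an LP with CVX. The sketch below is a dependency-free stand-in that swaps in a different technique, iteratively reweighted least squares (IRLS), which approximates the same ℓ1 minimizer.

It assumes the prediction error is written as y - A z. In L-WPE, the rows of A would be the delayed STFT vectors of one frequency bin, y the current frames, and c the inverse per-frame scales from Step 2. The function name and the `delta` floor are illustrative.

```python
import numpy as np

def weighted_l1_fit(A, y, c, n_iter=50, delta=1e-8):
    """Approximately minimize sum_n c[n] * (|Re r[n]| + |Im r[n]|), r = y - A @ z, by IRLS.

    A: (N, P) complex, y: (N,) complex, c: (N,) positive weights. Returns z: (P,) complex.
    """
    P = A.shape[1]
    # Real-valued equivalent of A @ z acting on [Re z; Im z].
    Ar = np.block([[A.real, -A.imag], [A.imag, A.real]])
    yr = np.concatenate([y.real, y.imag])
    cr = np.concatenate([c, c])              # same weight for real and imaginary parts
    omega = cr.copy()                        # first pass: plain weighted least squares
    for _ in range(n_iter):
        sw = np.sqrt(omega)
        zr, *_ = np.linalg.lstsq(Ar * sw[:, None], yr * sw, rcond=None)
        r = yr - Ar @ zr
        omega = cr / np.maximum(np.abs(r), delta)   # |r| = r^2 / |r|: reweighting toward l1
    return zr[:P] + 1j * zr[P:]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 3)) + 1j * rng.standard_normal((200, 3))
z_true = np.array([0.5 - 0.2j, -0.3 + 0.1j, 0.2 + 0.4j])
y = A @ z_true
y[::20] += 50.0                              # a few large outliers (heavy-tailed errors)
z = weighted_l1_fit(A, y, c=np.ones(200))
```

The example shows why a Laplacian (ℓ1) criterion is attractive: a few very large errors barely affect the fit, whereas they would pull a least-squares solution far from the truth.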
3c. Laplacian-Based Weighted Prediction Error (L-WPE)
Step 2: fix the prediction coefficients and update the variances.
Maximizing the log-likelihood with respect to the variance gives a closed-form solution:
● These two steps are repeated until a convergence criterion is satisfied or the maximum number of iterations is reached.
3. Blind-Dereverberation Methods
d. NMF-Based Spectral Modeling (NMF+N-CTF)
3d. NMF based spectral modeling (NMF+N-CTF)
● The method in [3] combines the non-negative convolutive transfer function (N-CTF) model with non-negative matrix factorization (NMF).
● N-CTF model assumption: in each frequency bin, the power spectrogram of the reverberant signal is obtained by convolving, across frames, the power spectrogram of the clean speech with that of the RIR:
$$|x(n,k)|^2 \approx \sum_{l} |h(l,k)|^2\, |s(n-l,k)|^2$$
3d. NMF based spectral modeling (NMF+N-CTF)
Assumptions:
● The phase components at different frames are mutually independent,
● they are zero-mean random variables with a Gaussian distribution,
● the clean signal and RIR spectral coefficients are mutually independent.
For simplicity, capital letters X(n,k), S(n,k), H(n,k) denote the spectrograms of x(n,k), s(n,k), h(n,k) (this notation differs from the other methods).
3d. NMF based spectral modeling (NMF+N-CTF)
The Kullback-Leibler (KL) divergence is used to estimate the power spectrogram of s(n,k) from the previous equation:
where X̂(n,k) is the estimated power spectrogram of the reverberant signal.
3d. NMF based spectral modeling (NMF+N-CTF)
To obtain a more accurate estimate, the sparsity of the clean speech spectrogram can be added as a regularization term with a weight parameter.
As a non-negativity constraint, the entries of the clean speech and RIR spectrograms must be non-negative.
3d. NMF based spectral modeling (NMF+N-CTF)
This model can be solved with an iterative learning method:
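One common way to solve this kind of KL-based model is with multiplicative updates. The sketch below illustrates that approach under explicit assumptions; it is not the exact algorithm of [3]. It assumes:
- the per-bin convolution X̂[k, n] = Σ_l H[k, l] S[k, n − l];
- KL multiplicative updates for S (with sparsity weight `lam`) and for H;
- S initialized with the reverberant spectrogram and H initialized randomly;
- the normalization described later in this section, which divides the columns of H by its first column and moves the scale into S.

The array layout and names are illustrative.

```python
import numpy as np

def nctf_forward(S, H):
    """N-CTF model per frequency bin k: Xhat[k, n] = sum_l H[k, l] * S[k, n - l]."""
    K, N = S.shape
    Xhat = np.zeros((K, N))
    for l in range(H.shape[1]):
        Xhat[:, l:] += H[:, l:l + 1] * S[:, : N - l]
    return Xhat

def kl_div(X, Y, eps=1e-12):
    """Generalized KL divergence sum(X log(X / Y) - X + Y)."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))

def nctf_kl(X, Lh, lam=0.0, n_iter=100, seed=0, eps=1e-12):
    """Estimate S, H >= 0 minimizing KL(X || nctf_forward(S, H)) + lam * sum(S)."""
    K, N = X.shape
    S = X + eps                                        # initialize S with the reverberant spectrogram
    H = np.random.default_rng(seed).random((K, Lh)) + eps
    for _ in range(n_iter):
        Q = X / (nctf_forward(S, H) + eps)
        num, den = np.zeros_like(S), np.zeros_like(S)
        for l in range(Lh):                            # correlate Q (and ones) with H along frames
            num[:, : N - l] += H[:, l:l + 1] * Q[:, l:]
            den[:, : N - l] += H[:, l:l + 1]
        S = S * num / (den + lam + eps)
        Q = X / (nctf_forward(S, H) + eps)
        for l in range(Lh):
            H[:, l] *= np.sum(S[:, : N - l] * Q[:, l:], axis=1) / (np.sum(S[:, : N - l], axis=1) + eps)
        scale = H[:, :1].copy()                        # divide H's columns by its first column ...
        H, S = H / scale, S * scale                    # ... and move the scale into S (Xhat unchanged)
    return S, H

rng = np.random.default_rng(1)
S_true = rng.random((5, 60))
H_true = np.exp(-np.arange(4) / 1.5) * np.ones((5, 1))  # decaying "RIR" per bin, H[:, 0] = 1
X = nctf_forward(S_true, H_true)
S_est, H_est = nctf_kl(X, Lh=4)
```

With `lam = 0` each update does not increase the KL divergence. With `lam > 0`, the rescaling step also changes the sparsity penalty, so the normalization is a practical convention rather than part of the descent.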
3d. NMF based spectral modeling (NMF+N-CTF)
Adding the NMF approach:
The clean speech magnitude spectrogram S is formulated as the product of a dictionary matrix B and a weight matrix G:
$$S \approx BG$$
where R is the number of basis vectors in the dictionary matrix B (the dictionary size), with R < N (the frame size of s).
3d. NMF based spectral modeling (NMF+N-CTF)
After combining N-CTF and NMF, the problem becomes:
Approach: keep two of the three variables (B, G, and H) fixed and update the third, in turn, until a convergence criterion is satisfied or the maximum number of iterations is reached.
3d. NMF based spectral modeling (NMF+N-CTF)
● To remove the scale ambiguity, each column of B is normalized to sum to one after each iteration.
● The columns of H are element-wise divided by the first column of H.
● By nature, the RIR consists of decaying impulses.
● The mapping coefficient matrix between the clean and reverberant speech signals can be formulated as:
3d. NMF based spectral modeling (NMF+N-CTF)
● For the online method, the basis matrix B and the weight matrix G are initialized with random non-negative numbers.
● Alternatively, B and G can be initialized with supervised matrices to increase efficiency.
● In this work, the online method is employed.
3. Blind-Dereverberation Methods
e. Sparsity Penalized Weighted Least Squares Method (SPWLS)
3e. Sparsity penalized weighted least squares method (SPWLS)
❖ SPWLS combines the idea of variance normalization through a weight matrix with the sparsity of speech spectrogram matrices.
❖ To enforce sparsity of a variable, ℓ1-norm regularization is generally used.
❖ With ℓ1 regularization, the optimization problem, known as the Lasso problem, requires an iterative algorithm to solve.
❖ Some popular algorithms for the Lasso problem are
➢ ISTA (iterative shrinkage-thresholding algorithm) [7]
➢ FISTA (fast ISTA)
➢ SALSA (split augmented Lagrangian shrinkage algorithm)
3e. Sparsity penalized weighted least squares method (SPWLS)
The convolution equation (in the STFT domain, at a fixed frequency k) can be rewritten in matrix form as:
$$x = Hs + n$$
Then, adding a regularization term for sparsity, we need to solve the Lasso problem:
$$\min_{s} \; \frac{1}{2}\|x - Hs\|_2^2 + \lambda \|s\|_1$$
where n: noise signal, s: clean speech signal, x: reverberant signal, H: convolution matrix of the RIR, λ: regularization parameter.
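The matrix form x = Hs + n relies on a convolution (Toeplitz) matrix built from the RIR taps of one frequency bin. This is a minimal sketch that keeps the full convolution length; truncating to the first len(s) rows would give a same-length model. The helper name `convolution_matrix` is illustrative.

```python
import numpy as np

def convolution_matrix(h, n_s):
    """Toeplitz matrix H, shape (n_s + len(h) - 1, n_s), such that H @ s == np.convolve(h, s)."""
    h = np.asarray(h)
    H = np.zeros((n_s + len(h) - 1, n_s), dtype=np.result_type(h.dtype, np.float64))
    for j in range(n_s):
        H[j : j + len(h), j] = h           # column j: RIR taps delayed by j frames
    return H

rng = np.random.default_rng(0)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)     # complex RIR taps h(l, k), fixed k
s = rng.standard_normal(10) + 1j * rng.standard_normal(10)   # clean STFT coefficients s(n, k)
x = convolution_matrix(h, len(s)) @ s                        # noise-free x = H s
```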
3e. Sparsity penalized weighted least squares method (SPWLS)
● Add weights to the problem, as in the L-WPE and G-WPE methods.
● Add an extra regularization on the norm of the filter h to avoid a trivial solution.
● Our optimization loss function becomes:
where
λ: regularization parameter, W: diagonal weight matrix with 1/(std) values,
target norm: the desired norm of the filter h,
k: frequency index (fixed), n: frame index.
3e. Sparsity penalized weighted least squares method (SPWLS)
● The problem is non-differentiable at its minimum (the ℓ1 term has a kink wherever an entry of s is zero).
● s and h therefore need to be computed numerically with an iterative approach.
● Our approach requires a good initialization for s and h, which can be obtained from an earlier method such as G-WPE.
● Our approach: alternately update s and h, each time minimizing the objective function with respect to the corresponding variable.
● ISTA is used for updating both s and h.
3e. Sparsity penalized weighted least squares method (SPWLS)
ISTA minimizes functions of the form f(s) + g(s), where f is differentiable and g is usually non-differentiable but simple.
Step 1 (update s): take a gradient descent step on the first function f(·):
$$z^{(i)} = s^{(i)} - \mu \nabla f\big(s^{(i)}\big)$$
(i: iteration index)
The result z^{(i)} is an intermediate solution.
● If we calculate the gradient of the first function f(·):
3e. Sparsity penalized weighted least squares method (SPWLS)
μ: positive step size parameter, which sets how far we move along the negative gradient.
Step 2 (update s): a proximal operator step of g(·) is performed around that intermediate solution:
$$s^{(i+1)} = \arg\min_{s} \left( \frac{1}{2\mu}\|s - z^{(i)}\|_2^2 + g(s) \right)$$
For the ℓ1-norm penalty, the proximal step is a thresholding (shrinkage) operation:
$$s^{(i+1)} = \frac{z^{(i)}}{|z^{(i)}|}\,\max\big(|z^{(i)}| - a,\ 0\big) \quad \text{(element-wise)}$$
This step sets components with magnitude below a to zero and shrinks the others toward zero by a. (In our algorithm, a is determined by the step size and the regularization parameter.)
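Steps 1 and 2 can be sketched as a plain ISTA loop for the unweighted Lasso problem min_s ½‖x − Hs‖² + λ‖s‖₁ stated earlier. This illustration rests on three assumptions. It omits the weight matrix W and the filter regularization of the full SPWLS loss. It uses the classic step size μ = 1/‖H‖₂², whereas the thesis uses a fixed or scheduled step. It sets the threshold to a = μλ, the standard choice for this objective. The soft threshold works for real or complex s.

```python
import numpy as np

def soft_threshold(z, a):
    """Element-wise soft thresholding: zero entries with |z| <= a, shrink the rest toward 0 by a."""
    mag = np.abs(z)
    return np.where(mag > a, (1.0 - a / np.maximum(mag, 1e-300)) * z, 0.0)

def ista(H, x, lam, n_iter=200, mu=None):
    """ISTA for min_s 0.5 * ||x - H s||_2^2 + lam * ||s||_1 (real or complex data)."""
    if mu is None:
        mu = 1.0 / np.linalg.norm(H, 2) ** 2     # 1 / Lipschitz constant of grad f
    s = np.zeros(H.shape[1], dtype=np.result_type(H, x))
    for _ in range(n_iter):
        grad = H.conj().T @ (H @ s - x)          # gradient of f(s) = 0.5 * ||x - H s||^2
        z = s - mu * grad                        # Step 1: gradient step -> intermediate solution
        s = soft_threshold(z, mu * lam)          # Step 2: proximal step with a = mu * lam
    return s

rng = np.random.default_rng(0)
H = rng.standard_normal((60, 40))
s_true = np.zeros(40)
s_true[[3, 17, 30]] = [2.0, -1.5, 1.0]           # sparse "clean" vector
x = H @ s_true + 0.01 * rng.standard_normal(60)
s_hat = ista(H, x, lam=0.5, n_iter=500)
```

A larger λ zeroes more components; the step size only affects convergence speed as long as it stays below 1/‖H‖₂².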
3e. Sparsity penalized weighted least squares method (SPWLS)
● After updating s, the weight matrix W is updated according to the new variance values of s.
Next, the problem is solved for h, which is updated according to:
● ISTA is used again:
Step 1 (update h): minimizing f(·) is a simple least-squares problem with an exact solution:
3e. Sparsity penalized weighted least squares method (SPWLS)
Step 2 (update h): a proximal operator step for the regularization of h.
● The step size for the inner gradient descent iterations for s can be varied at each iteration as:
where the schedule depends on hyperparameters, the initial step size, and the inner and outer iteration indices.
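Step 1 of the h-update can be sketched as an exact weighted least-squares solve. For illustration only, this assumes two things: the weights W act on the residual x − Sh (as in weighted prediction error), and S is the convolution matrix of the current s truncated to the observed frames. The proximal step for the filter-norm regularization is omitted, since its exact form is not stated on the slide. The name `update_h_ls` and the 1/|s| weights in the usage lines are hypothetical.

```python
import numpy as np

def update_h_ls(x, s, Lh, w=None):
    """h = argmin_h ||W (x - S h)||_2^2 with S[n, l] = s[n - l]; w is the diagonal of W (default ones)."""
    N = len(x)
    S = np.zeros((N, Lh), dtype=np.result_type(s, x))
    for l in range(Lh):
        S[l:, l] = s[: N - l]                 # column l: s delayed by l frames
    w = np.ones(N) if w is None else np.asarray(w, dtype=float)
    h, *_ = np.linalg.lstsq(S * w[:, None], x * w, rcond=None)
    return h

rng = np.random.default_rng(0)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)    # current clean estimate (one bin)
h_true = np.array([1.0, 0.5 - 0.2j, 0.25j, -0.1])
x = np.convolve(s, h_true)[:200]                                 # noise-free observation
h = update_h_ls(x, s, Lh=4, w=1.0 / (np.abs(s) + 1e-6))          # hypothetical 1/std-style weights
```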
4) Experiments and Comparisons
TEST DATA
Experiment 1: 3 male and 3 female clean voices convolved with 6 different RIRs, with 30 dB and 60 dB additive noise (for the DLP, G-WPE, NMF+N-CTF, and SPWLS methods).
72 different samples (6 voices × 6 RIRs × 2 noise levels) were dereverberated.
Experiment 2: 1 male and 1 female clean voice convolved with 5 different RIRs, with 30 dB and 60 dB additive noise (for all methods).
20 different samples (2 voices × 5 RIRs × 2 noise levels) were dereverberated.
● The test data were taken from the REVERB Challenge dataset.
● The sampling frequency was 16 kHz for all files.
● The reverberation times (RT60) of the RIRs were 0.17, 0.11, 0.95, 0.33, 0.54, and 0.35 s, respectively.
● L-WPE was not run on the RIR with RT60 = 0.95 s because of its excessive run time.
● Café noise at 30 dB and 60 dB levels was used as additive noise.
Setup
● The delay D was set to 3 frames for the G-WPE, L-WPE, and DLP methods.
● Lf, the number of frames used for variance calculation, was set to 1 for the G-WPE, L-WPE, and SPWLS methods.
● The number of iterations was set to 5 for the G-WPE, L-WPE, and SPWLS methods.
● The number of iterations for the NMF+N-CTF method was set to 100.
● STFT parameters: hop size = 10 ms, window size = 30 ms.
● Minimum variance (to avoid division by zero): v = 1e-6.
● The number of STFT frames used for prediction is set according to the internally computed RT60 estimate.
SPWLS-specific parameters:
● step size = 1e-7,
● ISTA regularization parameter = 1e5,
● number of inner ISTA iterations = 10,
● ISTA regularization parameter for the filter = 10,
● the RIR H is initialized with the output of the G-WPE method.
NMF+N-CTF parameters:
● dictionary size ndict = 100,
● the online method is used.
Computational Efficiency
● All algorithms were implemented in MATLAB and run on a computer with a 2.5 GHz Intel Xeon CPU.
● SPWLS is the fastest method, followed by G-WPE, DLP, NMF+N-CTF, and L-WPE.
● L-WPE is very slow because of its linear programming (LP) step, which is solved with the CVX toolbox for MATLAB.
● Run times for the data with RT60 = 0.54 s:
○ L-WPE: ~1 day
○ NMF+N-CTF: ~1.5 hours (100 iterations, ndict = 100)
○ G-WPE: ~4 min (5 iterations)
○ SPWLS: ~2 min (5 iterations)
○ DLP: ~3 min (1 iteration), implemented with the Levinson-Durbin algorithm
Test Methods
● The accuracy of dereverberation is measured with the cepstral distortion (CD), averaged over short time frames.
● CD is a popular measure of speech quality between the clean signal and the reconstructed signal:
$$\mathrm{CD} = \frac{10}{\ln 10}\sqrt{\left(c_0 - \hat{c}_0\right)^2 + 2\sum_{i=1}^{12}\left(c_i - \hat{c}_i\right)^2}$$
c_i: cepstral coefficients of the clean speech signal, orders 1 to 12,
ĉ_i: cepstral coefficients of the estimated speech signal, orders 1 to 12,
c_0, ĉ_0: zero-order coefficients, representing the overall power of the spectral envelope.
● The CD between identical signals is 0, and it approaches 0 as the signals become more similar.
● Our aim is to keep the CD as small as possible after dereverberation.
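The CD formula can be sketched as follows. This version assumes cepstra are computed as FFT-based real cepstra of the log power spectrum of Hamming-windowed frames (30 ms window, 10 ms hop at 16 kHz). With that choice the 10/ln 10 factor yields dB. The thesis does not state how its cepstral coefficients were obtained, and the function names are illustrative.

```python
import numpy as np

def frame_cepstra(x, N=480, L=160, order=12, eps=1e-12):
    """Real cepstra c_0..c_order of Hamming-windowed frames, from the log power spectrum."""
    w = np.hamming(N)
    n_frames = 1 + (len(x) - N) // L
    frames = np.stack([x[n * L : n * L + N] * w for n in range(n_frames)])
    log_power = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)
    cep = np.fft.irfft(log_power, n=N, axis=1)
    return cep[:, : order + 1]

def cepstral_distortion(clean, est, **kwargs):
    """Frame average of (10 / ln 10) * sqrt((c0 - c0_hat)^2 + 2 * sum_{i=1..12} (ci - ci_hat)^2)."""
    c, c_hat = frame_cepstra(clean, **kwargs), frame_cepstra(est, **kwargs)
    n = min(len(c), len(c_hat))
    diff = c[:n] - c_hat[:n]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(diff[:, 0] ** 2 + 2.0 * np.sum(diff[:, 1:] ** 2, axis=1))
    return float(np.mean(per_frame))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)      # 1 s toy signal at 16 kHz
```

A pure gain change only shifts c_0. For example, doubling the amplitude multiplies the power by 4, giving a CD of 10·log10(4) ≈ 6 dB.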
● STOI (short-time objective intelligibility measure): over short-time frames, STOI compares the temporal envelopes of the clean and dereverberated speech in terms of correlation coefficients.
● PESQ (Perceptual Evaluation of Speech Quality): a common standardized test method for speech quality. Three types of PESQ measure are applied.
● Signal-to-noise ratio (SNR) between the clean signal and the dereverberated signal.
● Segmental SNR (segSNR): the SNR computed over short time frames.
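The two SNR measures can be sketched as below. The frame sizes reuse the 30 ms / 10 ms STFT setting. Clipping per-frame values to [-10, 35] dB is a common segSNR convention assumed here, not a setting stated in the thesis. Function names are illustrative.

```python
import numpy as np

def snr_db(clean, est):
    """SNR = 10 log10( sum s^2 / sum (s - s_hat)^2 ) over the whole signal."""
    noise = clean - est
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def seg_snr_db(clean, est, N=480, L=160, lo=-10.0, hi=35.0, eps=1e-12):
    """Segmental SNR: per-frame SNR, clipped to [lo, hi] dB, averaged over frames."""
    n_frames = 1 + (min(len(clean), len(est)) - N) // L
    vals = []
    for n in range(n_frames):
        s = clean[n * L : n * L + N]
        e = s - est[n * L : n * L + N]
        vals.append(10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps)))
    return float(np.mean(np.clip(vals, lo, hi)))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
est = clean + 0.1 * rng.standard_normal(16000)
overall, segmental = snr_db(clean, est), seg_snr_db(clean, est)
```

segSNR weights every frame equally, so quiet frames (where reverberation tails dominate) count as much as loud ones, unlike the global SNR.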
Test results versus iteration number (Experiment 2, 20 files) [figures]
Test results for the NMF+N-CTF method (Experiment 2, 20 files) [figures]
Spectrogram results of dereverberated signals (iter# = 1, 5, and 100) [figures]
NUMERICAL RESULTS (iter# = 5) [table]
Test results: averages [figures]
NUMERICAL RESULTS (for the long RIR with RT60 = 0.54 s) [table]
NUMERICAL RESULTS - NMF+N-CTF Method [table]
ndict = dictionary matrix size, #iter = number of iterations
NNCTF1: ndict = 100, #iter = 100
NNCTF2: ndict = 500, #iter = 200
NNCTF3: ndict = 1000, #iter = 200
NNCTF4: ndict = 1000, #iter = 400
NNCTF5: ndict = 1000, #iter = 240
5) DISCUSSION & CONCLUSION
● L-WPE gives the best test results.
● Considering both time efficiency and test results, G-WPE performs better and could be better suited to real-time applications.
● The L-WPE algorithm is much more complex than G-WPE because of its linear programming step, so it runs very slowly.
● NMF+N-CTF results:
○ the method converges,
○ the test results are not as good as those reported in the original paper,
○ the method could perform better with a good initialization or a supervised dictionary matrix,
○ increasing the dictionary size improves the test results, but increasing the number of iterations does not always improve them,
○ the method uses no phase information.
● For one iteration, L-WPE was slower than DLP and G-WPE was faster.
● SPWLS did not perform well in terms of CD. Its performance might be improved by setting more constraints on h or by decreasing the step size. Note that SPWLS tries to eliminate the whole echo, not only the late echo as G-WPE, L-WPE, and DLP do.
● SPWLS shows promise owing to its time efficiency and its SNR and PESQ results.
● The spectrogram results show that L-WPE and G-WPE successfully eliminate the late reverberant parts.
● DLP is used mainly as a baseline for L-WPE and G-WPE, since both methods are rooted in DLP. As expected, L-WPE and G-WPE perform better.
REFERENCES
[1] Nakatani, Tomohiro, et al. "Speech dereverberation based on variance-normalized delayed linear prediction." IEEE Transactions on Audio, Speech, and Language Processing 18.7 (2010): 1717-1731.
[2] Jukić, Ante, and Simon Doclo. "Speech dereverberation using weighted prediction error with Laplacian model of the desired signal." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.
[3] Mohammadiha, Nasser, Paris Smaragdis, and Simon Doclo. "Joint acoustic and spectral modeling for speech dereverberation using non-negative representations." 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.
[4] Mohammadiha, Nasser, and Simon Doclo. "Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24.2 (2016): 276-289.
[5] Selesnick, Ivan. "Introduction to sparsity in signal processing." Connexions (2012).
[6] Lee, Daniel D., and H. Sebastian Seung. "Algorithms for non-negative matrix factorization." Advances in Neural Information Processing Systems. 2001.
[7] Combettes, Patrick L., and Jean-Christophe Pesquet. "Proximal splitting methods in signal processing." Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer New York, 2011. 185-212.

Speech enhancement using spectral subtraction technique with minimized cross ...
 
Active noise control
Active noise controlActive noise control
Active noise control
 
Geometric Approach to Spectral Substraction
Geometric Approach to Spectral SubstractionGeometric Approach to Spectral Substraction
Geometric Approach to Spectral Substraction
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Honda presentation
Honda presentationHonda presentation
Honda presentation
 

Similar to Comparison of Single Channel Blind Dereverberation Methods for Speech Signals

LTE Physical Layer Transmission Mode Selection Over MIMO Scattering Channels
LTE Physical Layer Transmission Mode Selection Over MIMO Scattering ChannelsLTE Physical Layer Transmission Mode Selection Over MIMO Scattering Channels
LTE Physical Layer Transmission Mode Selection Over MIMO Scattering ChannelsIllaKolani1
 
EBDSS Max Research Report - Final
EBDSS  Max  Research Report - FinalEBDSS  Max  Research Report - Final
EBDSS Max Research Report - FinalMax Robertson
 
Design of Low-Pass Digital Differentiators Based on B-splines
Design of Low-Pass Digital Differentiators Based on B-splinesDesign of Low-Pass Digital Differentiators Based on B-splines
Design of Low-Pass Digital Differentiators Based on B-splinesCSCJournals
 
Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...
Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...
Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...IJNSA Journal
 
DONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDonyMa
 
Pres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxPres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxDonyMa
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processinganeetaanu
 
A novel and efficient mixed-signal compressed sensing for wide-band cognitive...
A novel and efficient mixed-signal compressed sensing for wide-band cognitive...A novel and efficient mixed-signal compressed sensing for wide-band cognitive...
A novel and efficient mixed-signal compressed sensing for wide-band cognitive...Polytechnique Montreal
 
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...IJNSA Journal
 
Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...
Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...
Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...IJCSEA Journal
 
Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...
Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...
Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...Rupesh Sharma
 
Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...
Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...
Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...VLSICS Design
 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...IJERA Editor
 
Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...Usatyuk Vasiliy
 
Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...
Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...
Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...Polytechnique Montreal
 
Iterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderIterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderCSCJournals
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)
High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)
High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)Enrique Monzo Solves
 

Similar to Comparison of Single Channel Blind Dereverberation Methods for Speech Signals (20)

LTE Physical Layer Transmission Mode Selection Over MIMO Scattering Channels
LTE Physical Layer Transmission Mode Selection Over MIMO Scattering ChannelsLTE Physical Layer Transmission Mode Selection Over MIMO Scattering Channels
LTE Physical Layer Transmission Mode Selection Over MIMO Scattering Channels
 
Dq31784792
Dq31784792Dq31784792
Dq31784792
 
EBDSS Max Research Report - Final
EBDSS  Max  Research Report - FinalEBDSS  Max  Research Report - Final
EBDSS Max Research Report - Final
 
Design of Low-Pass Digital Differentiators Based on B-splines
Design of Low-Pass Digital Differentiators Based on B-splinesDesign of Low-Pass Digital Differentiators Based on B-splines
Design of Low-Pass Digital Differentiators Based on B-splines
 
Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...
Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...
Multi-carrier Equalization by Restoration of RedundancY (MERRY) for Adaptive ...
 
DONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptx
 
Pres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxPres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptx
 
L010628894
L010628894L010628894
L010628894
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processing
 
A novel and efficient mixed-signal compressed sensing for wide-band cognitive...
A novel and efficient mixed-signal compressed sensing for wide-band cognitive...A novel and efficient mixed-signal compressed sensing for wide-band cognitive...
A novel and efficient mixed-signal compressed sensing for wide-band cognitive...
 
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
 
Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...
Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...
Classical Discrete-Time Fourier TransformBased Channel Estimation for MIMO-OF...
 
Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...
Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...
Design Ofdm System And Remove Nonlinear Distortion In OFDM Signal At Transmit...
 
Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...
Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...
Optimization of Cmos 0.18 µM Low Noise Amplifier Using Nsga-Ii for UWB Applic...
 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
 
Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...
 
Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...
Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...
Projected Barzilai-Borwein Methods Applied to Distributed Compressive Spectru...
 
Iterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderIterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO Decoder
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)
High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)
High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs (Paper)
 

Comparison of Single Channel Blind Dereverberation Methods for Speech Signals

  • 1. Comparison of Single Channel Blind Dereverberation Methods for Speech Signals Deha Deniz Türköz - MSc Thesis Thesis Supervisor: Hakan Erdoğan Sabancı Üniversitesi 27.06.2016
  • 2. OUTLINE 1) Introduction 2) Background a) Features of speech b) Reverberation model c) Room impulse response (RIR) d) Non-negative matrix factorization (NMF) 3) Blind-Dereverberation Methods a) Delayed linear prediction (DLP) b) Gaussian-based weighted prediction error (G-WPE) c) Laplacian-based weighted prediction error (L-WPE) d) NMF-based spectral modeling (NMF+N-CTF) e) Sparsity penalized weighted least squares method (SPWLS) 4) Experiments and Comparisons 5) Discussion and Conclusion
  • 4. 1. Introduction Reverberation: ● is an effect that occurs in speech recordings due to reflections from walls, ● decreases speech intelligibility, ● degrades applications such as automatic speech recognition (ASR) and hands-free teleconferencing, ● can be modeled with a linear time-invariant (LTI) filter.
  • 5. ● If the filter h is known, the clean signal s can be recovered from the reverberated signal x with a simple deconvolution operation; this is called dereverberation. ● In most cases h and s are unknown and x is the only known signal. Estimating h and s from x alone is called the blind-dereverberation problem, which is the main subject of this work. 1. Introduction
  • 6. The aim of this work is to compare existing blind-dereverberation methods ○ DLP: delayed linear prediction, ○ G-WPE: Gaussian-based weighted prediction error, ○ L-WPE: Laplacian-based weighted prediction error, ○ NMF+N-CTF: NMF-based spectral-temporal modeling, and to propose a new algorithm ○ SPWLS: sparsity penalized weighted least squares. 1. Introduction
  • 8. 2a. Features of Speech ● Speech is a signal produced by the human vocal system. ● The input of the vocal tract is called the glottal signal, modeled as: ○ white noise, ○ an impulse train. ● The vocal tract can be modeled as an all-pole filter, which means speech production is a simple LTI filtering of the glottal signal.
  • 9. 2a. Features of Speech ● Speech signals are non-stationary. ● General approach: divide the signal into short time segments and assume that each segment is stationary. ● To analyze speech: the short-time Fourier transform (STFT). ● The STFT divides the speech signal into overlapping segments, called frames, using a window function, and computes the DFT of each frame.
  • 10. 2a. Features of Speech Formulation of the STFT: L: frame shift, N: frame size, X(n,k): discrete STFT coefficient of speech signal x[m] at frame n and frequency bin k, w[m]: Hamming window.
  • 11. 2a. Features of Speech ● The STFT of a signal can be interpreted as a matrix whose columns hold the complex DFT coefficients of the frames.
  • 12. 2a. Features of Speech ● To visualize how the frequency content of a signal changes over time, a spectrogram is used. ● The spectrogram S(n,k) uses the power spectrum of the STFT matrix X(n,k) as intensity values in a 2D image: S(n,k) = |X(n,k)|²
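The STFT and spectrogram definitions above can be sketched in Python with NumPy. This is a minimal illustration, not the thesis implementation (which was written in MATLAB): it assumes one common form of the STFT, X(n,k) = sum_{m=0}^{N-1} x[nL + m] w[m] exp(-j 2 pi k m / N), keeps only full frames, and stores frames as columns. The names `stft` and `spectrogram` are illustrative.

```python
import numpy as np

def stft(x, N, L):
    """Hamming-windowed STFT with frame size N and frame shift L.

    Returns an N x num_frames complex matrix; column n holds X(n, k), k = 0..N-1.
    Only full frames are kept (no padding) in this sketch.
    """
    x = np.asarray(x, dtype=float)
    w = np.hamming(N)
    num_frames = 1 + (len(x) - N) // L
    X = np.empty((N, num_frames), dtype=complex)
    for n in range(num_frames):
        X[:, n] = np.fft.fft(x[n * L : n * L + N] * w)
    return X

def spectrogram(X):
    """Power spectrogram |X(n, k)|^2, same layout as X (rows: bins k, columns: frames n)."""
    return np.abs(X) ** 2

fs = 16000                  # sampling rate used in the experiments
N = round(0.030 * fs)       # 30 ms window -> 480 samples
L = round(0.010 * fs)       # 10 ms hop    -> 160 samples
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)            # 1 s test tone at 1 kHz
S = spectrogram(stft(x, N, L))
S_db = 10 * np.log10(S + 1e-12)             # dB scale for display as a 2D image
```

With the experimental setup (16 kHz, 30 ms window, 10 ms hop) this gives N = 480 and L = 160 samples, and the 1 kHz tone peaks in bin 1000 * 480 / 16000 = 30.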
  • 14. 2b. Reverberation Model ● The reverberation environment can be modeled as an LTI filter called the room impulse response (RIR). ● Reverberation model: x(t) = h(t) * s(t), where * denotes convolution; h(t): RIR, unknown; s(t): clean (anechoic) signal, unknown; x(t): reverberated (echoed) signal, known.
  • 15. 2b. Reverberation Model Reverberation effect on the spectrogram:
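A minimal sketch of the reverberation model x(t) = h(t) * s(t), assuming a toy RIR with hypothetical reflection times and gains, and white noise standing in for speech. It also illustrates the non-blind case from the introduction: when h is known, s can be recovered by dividing in the DFT domain. This works here only because the toy filter's frequency response never reaches zero and no noise is added.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)          # stand-in for the clean (anechoic) signal s(t)

# Toy RIR (hypothetical values): direct path plus three decaying reflections.
h = np.zeros(200)
h[0] = 1.0
h[[40, 90, 150]] = [0.5, 0.25, 0.1]

x = np.convolve(h, s)                  # reverberated signal x(t) = h(t) * s(t)

# Non-blind case: h is known, so divide in the DFT domain. With nfft = len(x),
# circular convolution equals linear convolution, and |H| >= 1 - 0.85 > 0 here.
nfft = len(x)
s_hat = np.fft.ifft(np.fft.fft(x, nfft) / np.fft.fft(h, nfft)).real[: len(s)]
```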
  • 17. 2c. Room Impulse Response (RIR) The length of the RIR depends on ● room size, ● room temperature, ● room shape, ● the microphone's distance to the speech source, ● absorption of sound in the room. RT60: the time required for the reflected sound to decay by 60 dB. ● The RIR has an FIR filter characteristic.
  • 18. 2c. Room Impulse Response (RIR) The RIR is usually divided into two parts: 1. early reverberation, 2. late reverberation, the most detrimental part of the echo. The observed signal is x(t) = d(t) + r(t) + n(t), where n(t): noise, d(t): clean signal + early echo (desired signal), r(t): late echo, Lh: the length of the RIR, h(t): RIR (early echo + late echo).
  • 19. 2c. Room Impulse Response (RIR) The early and late reverberation components are then obtained by splitting h(t) at D, where D is the length of the early reverberation: the taps before D form the early part and the taps from D to Lh - 1 form the late part.
  • 20. 2c. Room Impulse Response (RIR)
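The early/late split can be illustrated with a toy RIR. The sketch below assumes an exponentially decaying white-noise RIR whose envelope falls by 60 dB at RT60, a hypothetical 50 ms early/late boundary D measured in samples (the STFT-domain methods count D in frames instead), and a synthetic signal in place of speech. It checks that d(t) + r(t) + n(t) equals the full reverberated signal plus noise.

```python
import numpy as np

def synthetic_rir(rt60, fs, length_s, seed=0):
    """Toy RIR: white noise under an exponential envelope exp(-a t).

    a = 3 ln(10) / RT60 makes the amplitude 60 dB lower after RT60 seconds
    (20 log10 exp(-a RT60) = -60). A common toy model, not a measured RIR.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(round(length_s * fs)) / fs
    a = 3.0 * np.log(10.0) / rt60
    return rng.standard_normal(t.size) * np.exp(-a * t)

fs = 16000
h = synthetic_rir(rt60=0.54, fs=fs, length_s=0.6)
D = round(0.05 * fs)                  # hypothetical early/late boundary: 50 ms, in samples
h_early = np.concatenate([h[:D], np.zeros(len(h) - D)])
h_late = h - h_early                  # taps from D to Lh - 1

rng = np.random.default_rng(1)
s = rng.standard_normal(fs)           # 1 s stand-in for clean speech
n = 1e-3 * rng.standard_normal(len(s) + len(h) - 1)   # additive noise n(t)

d = np.convolve(h_early, s)           # desired signal: clean + early echo
r = np.convolve(h_late, s)            # late reverberation
x = d + r + n                         # x(t) = d(t) + r(t) + n(t)
```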
  • 22. 2d. Non-negative Matrix Factorization (NMF) NMF decomposes a non-negative matrix V into the product of two matrices B and G with non-negative entries, V ≈ BG. B: basis or dictionary matrix, G: weight or gain matrix. ● This can be formulated as the optimization problem: minimize C(V, BG) over B ≥ 0, G ≥ 0, where C is the cost function measuring the distance between V and BG.
  • 23. 2d. Non-negative Matrix Factorization (NMF) ● The columns of B are called basis vectors. ● The number of columns of B is kept smaller than the dimensions of V. ● Iterative algorithms are used to solve the NMF problem, since it has no closed-form solution and its solution is not unique. ● The initial B and G matrices can be random positive numbers, or supervised matrices for faster convergence. ● Popular distance functions between V and BG are: ○ Euclidean distance, ○ Kullback-Leibler (KL) divergence, ○ Itakura-Saito (IS) divergence.
  • 24. 2d. Non-negative Matrix Factorization (NMF) The Kullback-Leibler divergence between V and BG is defined as [6]: where "1" is the matrix of ones with the same size as V.
  • 25. 2d. Non-negative Matrix Factorization (NMF) ● The NMF problem is non-convex and has multiple local minima. As a result, B and G can differ for the same V matrix. ● NMF is a common method in speech processing, deep learning, clustering, and computer vision. ● In speech processing, NMF is applied to audio source separation, source/filter modeling, blind dereverberation [3][4], speech denoising, and more.
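For reference, the generalized KL divergence commonly used in NMF is D(V || BG) = sum_ij ( V_ij log(V_ij / (BG)_ij) - V_ij + (BG)_ij ), and the standard Lee-Seung multiplicative updates use the all-ones matrix "1" in their denominators. The sketch below implements those standard updates; it is an illustration, not necessarily the exact variant in [6] or in the thesis code, and the function names are hypothetical.

```python
import numpy as np

def kl_divergence(V, Y, eps=1e-12):
    """Generalized KL divergence D(V || Y) = sum(V * log(V / Y) - V + Y)."""
    return float(np.sum(V * np.log((V + eps) / (Y + eps)) - V + Y))

def nmf_kl(V, R, n_iter=200, seed=0, eps=1e-12):
    """NMF V ~= B @ G using the Lee-Seung multiplicative updates for the KL cost.

    B: F x R dictionary (basis vectors in columns), G: R x T gains.
    Random positive initialization (the unsupervised case).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    B = rng.random((F, R)) + eps
    G = rng.random((R, T)) + eps
    ones = np.ones_like(V)                     # the all-ones matrix "1", same size as V
    for _ in range(n_iter):
        G *= (B.T @ (V / (B @ G + eps))) / (B.T @ ones + eps)
        B *= ((V / (B @ G + eps)) @ G.T) / (ones @ G.T + eps)
    return B, G

rng = np.random.default_rng(42)
V = rng.random((20, 8)) @ rng.random((8, 30))  # non-negative test matrix of rank <= 8
B1, G1 = nmf_kl(V, R=8, n_iter=1)              # same seed: first iterate of the run below
B, G = nmf_kl(V, R=8, n_iter=300)
```

Because the updates never change sign, B and G stay non-negative, and the KL cost is non-increasing from one iteration to the next.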
  • 26. 3. Blind-Dereverberation Methods a. Delayed linear prediction (DLP)
  • 27. 3a. Delayed Linear Prediction (DLP) The time-domain signals x(t), s(t), h(t) have the STFT-domain counterparts x(n,k), s(n,k), h(n,k), respectively. Then, the reverberation model is written in the STFT domain.
  • 28. 3a. Delayed Linear Prediction (DLP) ● DLP estimates the inverse filter coefficients from the reverberated signal. ● An inverse filter of length Lw can be used to approximately obtain the dereverberated signal as: ● In matrix form, reverberation can be formulated as
  • 29. 3a. Delayed Linear Prediction (DLP)
  • 30. 3a. Delayed Linear Prediction (DLP) ● This means the desired signal can be estimated using only the reverberated signal and its past samples. ● Then, the inverse filter is ● The number of zeros in the inverse filter vector equals the delay D. ● In conclusion, DLP is a simple technique for dereverberation. ● However, it may not work well in most cases, because the inverse filter is restricted to an FIR filter.
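A minimal per-frequency-bin sketch of delayed linear prediction in the STFT domain, assuming the desired signal is estimated as d(n,k) = x(n,k) - sum_{l=0}^{Lw-1} g(l,k) x(n-D-l,k) with g found by ordinary least squares. The thesis implemented DLP with the Levinson-Durbin algorithm; this sketch uses `numpy.linalg.lstsq` instead, and the toy data (white complex "speech" plus a delayed autoregressive reverberation tail) is hypothetical.

```python
import numpy as np

def delayed_matrix(x, D, Lw):
    """Row n holds [x[n-D], x[n-D-1], ..., x[n-D-Lw+1]] (zeros before the signal starts)."""
    T = len(x)
    Xd = np.zeros((T, Lw), dtype=x.dtype)
    for l in range(Lw):
        if D + l < T:
            Xd[D + l:, l] = x[: T - D - l]
    return Xd

def dlp_bin(x, D=3, Lw=10):
    """Delayed linear prediction for one STFT frequency bin.

    Predict x(n) from frames at least D frames in the past, then subtract the prediction.
    """
    Xd = delayed_matrix(x, D, Lw)
    g, *_ = np.linalg.lstsq(Xd, x, rcond=None)   # least-squares prediction filter
    return x - Xd @ g                             # estimate of the desired signal

rng = np.random.default_rng(0)
T, D = 5000, 3
s = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
x = s.copy()
for n in range(D + 1, T):                         # toy delayed AR reverberation tail
    x[n] += 0.5 * x[n - D] + 0.3 * x[n - D - 1]
d = dlp_bin(x, D=D, Lw=10)
```

The delay D keeps the predictor from using the most recent frames, so the short-term correlation of the desired signal is not removed along with the late reverberation.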
  • 32. 3b. Weighted Prediction Error (G-WPE) Assumption 1: the speech signal has a local Gaussian distribution over short frames of length Lf. Assumption 2: samples are mutually uncorrelated beyond a certain distance. Assumption 3: the variance is constant within short-time frames of size Lf.
  • 33. 3b. Weighted Prediction Error (G-WPE) ● Dereverberation can be done in both the time domain and the STFT domain. ● The time domain is very costly because it involves very large matrices, so the STFT domain is used. ● Probability density function of the desired signal in the STFT domain, with n: frame number, k: frequency bin, and a time-varying variance. Then,
  • 34. 3b. Weighted Prediction Error (G-WPE) ● The variance values change only with respect to time frames. Thus, ● Applying likelihood maximization to the Gaussian pdf, the log-likelihood function for dereverberation in the STFT domain becomes: Parameter vector for likelihood maximization:
  • 35. 3b. Weighted Prediction Error (G-WPE) Maximizing this function with respect to the parameter vector cannot be done analytically, since there is no closed-form solution. Thus, an iterative algorithm is needed.
  • 36. 3b. Weighted Prediction Error (G-WPE) A two-step procedure was proposed in [1] to solve the likelihood maximization problem: 1. keep the variances fixed and solve for the prediction filter coefficients that maximize the likelihood, which gives the desired-signal estimate; 2. keep the filter fixed and update the variances. The two steps alternate until a convergence criterion is satisfied or a maximum number of iterations is completed.
  • 37. 3b. Weighted Prediction Error (G-WPE)
  • 39. 3c. Laplacian-Based Weighted Prediction Error (L-WPE) L-WPE [2] suggests that, in the STFT domain, speech is modeled more precisely by a Laplacian model than by a Gaussian model. ● Assumption 1: the speech signal has a local Laplacian distribution over short frames of length Lf. ● Assumption 2: the real and imaginary parts of the desired signal's STFT coefficients are independent and, in each time-frequency bin, have equal variance.
  • 40. 3c. Laplacian-Based Weighted Prediction Error (L-WPE) Then, the pdf of the Laplacian model is As in the G-WPE method, maximum likelihood (ML) estimation is used for the parameter vector. Then, the likelihood function is:
  • 41. 3c. Laplacian-Based Weighted Prediction Error (L-WPE) There is no closed-form solution for maximizing the likelihood function, so it is solved numerically: 1. keep the variances fixed and solve for the filter coefficients that maximize the likelihood (equivalently, minimize an ℓ1 norm), then obtain the desired signal; 2. keep the filter fixed and update the variances. Step 1 (fix the variances, update the filter): the likelihood function can be rewritten in terms of the filter coefficients as
  • 42. 3c. Laplacian-Based Weighted Prediction Error (L-WPE) Thus, the likelihood function can be written as:
  • 43. 3c. Laplacian-Based Weighted Prediction Error (L-WPE) Then, the problem can be cast as a linear programming problem:
  • 44. 3c. Laplacian-Based Weighted Prediction Error (L-WPE) Step 2 (fix the filter, update the variances): maximizing the log-likelihood with respect to the variance gives the closed-form solution: ● These two steps are repeated until a convergence criterion is satisfied or the maximum number of iterations is reached.
  • 45. 3c. Laplacian-Based Weighted Prediction Error (L-WPE)
  • 46. 3. Blind-Dereverberation Methods d. NMF-based spectral modeling (NMF+N-CTF)
  • 47. 3d. NMF-based spectral modeling (NMF+N-CTF) ● The method in [3] combines the non-negative convolutive transfer function (N-CTF) model with non-negative matrix factorization (NMF). ● N-CTF model assumption: for each frequency bin, the power spectrogram of the reverberated signal is obtained by convolving, along the time frames, the power spectrogram of the clean speech signal with that of the RIR.
  • 48. 3d. NMF-based spectral modeling (NMF+N-CTF) Assumptions: ● the phases of the STFT coefficients at different frames are mutually independent, ● zero-mean random variables with a Gaussian distribution, ● the clean signal and RIR spectral coefficients are mutually independent. For simplicity, a shorthand notation is set for x(n,k), and likewise for s(n,k) and h(n,k) (this differs from the notation of the other methods).
  • 49. 3d. NMF-based spectral modeling (NMF+N-CTF) The Kullback-Leibler (KL) divergence is used to estimate the power spectrogram of s(n,k) from the previous equation: Here, the model output is the estimated power spectrogram of the reverberated signal.
  • 50. 3d. NMF-based spectral modeling (NMF+N-CTF) To obtain a more accurate estimate, the sparsity of the clean speech spectrogram can be added as a weighted regularization term. As non-negativity constraints, the estimated spectrogram values are required to be greater than zero.
  • 51. 3d. NMF-based spectral modeling (NMF+N-CTF) This model can be solved iteratively with the following updates:
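A sketch of N-CTF estimation with KL multiplicative updates, assuming the model Y(n,k) = sum_l H(l,k) S(n-l,k) on power spectrograms, an ℓ1 sparsity penalty lam * sum(S), and the standard multiplicative updates obtained by splitting the KL gradient into its positive and negative parts. It also divides H by its first lag (the normalization mentioned in this section) while rescaling S so that the model output is unchanged. Arrays are stored as bins x frames and bins x lags; these choices and all names are assumptions, not the updates of [3] verbatim.

```python
import numpy as np

def nctf_model(S, H):
    """Y[k, n] = sum_l H[k, l] * S[k, n - l]: per-bin convolution along frames."""
    T = S.shape[1]
    Y = np.zeros_like(S)
    for l in range(H.shape[1]):
        Y[:, l:] += H[:, l:l + 1] * S[:, : T - l]
    return Y

def kl_div(X, Y, eps=1e-12):
    """Generalized KL divergence D(X || Y)."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))

def nctf_kl(X, Lh, lam=0.0, n_iter=100, seed=0, eps=1e-12):
    """Multiplicative KL updates for S (clean power spectrogram) and H (RIR power spectrogram)."""
    rng = np.random.default_rng(seed)
    T = X.shape[1]
    S = X + eps                                # initialize S from the observation
    H = rng.random((X.shape[0], Lh)) + eps
    for _ in range(n_iter):
        Q = X / (nctf_model(S, H) + eps)       # ratio X / Y
        num, den = np.zeros_like(S), np.zeros_like(S)
        for l in range(Lh):                    # dD/dS[k,m] = sum_l H[k,l] (1 - Q[k,m+l]) + lam
            num[:, : T - l] += H[:, l:l + 1] * Q[:, l:]
            den[:, : T - l] += H[:, l:l + 1]
        S = S * num / (den + lam + eps)
        Q = X / (nctf_model(S, H) + eps)
        for l in range(Lh):                    # dD/dH[k,l] = sum_n S[k,n-l] (1 - Q[k,n])
            H[:, l] *= np.sum(S[:, : T - l] * Q[:, l:], axis=1) / (np.sum(S[:, : T - l], axis=1) + eps)
        scale = H[:, :1].copy()                # resolve scale ambiguity: H[:, 0] = 1
        H /= scale
        S *= scale                             # compensate so the model Y is unchanged
    return S, H

rng = np.random.default_rng(3)
S_true = rng.random((8, 60)) ** 2
H_true = np.tile([1.0, 0.5, 0.25, 0.1], (8, 1))
X = nctf_model(S_true, H_true)
S1, H1 = nctf_kl(X, Lh=4, n_iter=1)            # same seed: first iterate of the run below
S_est, H_est = nctf_kl(X, Lh=4, n_iter=200)
```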
  • 52. 3d. NMF-based spectral modeling (NMF+N-CTF) Adding the NMF approach: the clean speech magnitude spectrogram S can be written as the product of a dictionary matrix B and a weight matrix G, S ≈ BG, where R is the number of basis vectors in B (the dictionary size), with R < N (N: the frame size of s).
  • 53. 3d. NMF-based spectral modeling (NMF+N-CTF) After combining N-CTF and NMF, the problem becomes: Approach: keep two of the variables fixed and update the third, in turn, until a convergence criterion is met or the maximum number of iterations is reached.
  • 54. 3d. NMF-based spectral modeling (NMF+N-CTF)
  • 55. 3d. NMF-based spectral modeling (NMF+N-CTF) ● To remove the scale ambiguity, after each iteration each column of B is normalized to sum to one. ● The columns of H are element-wise divided by the first column of H. ● By nature, an RIR consists of decaying impulses. ● The mapping coefficient matrix between the clean and reverberated speech signals can be formulated as: where,
  • 57. 3d. NMF-based spectral modeling (NMF+N-CTF) ● In the online method, the basis matrix B and the weight matrix G are initialized with random non-negative numbers. ● B and G can instead be initialized with supervised matrices to increase efficiency. ● In this work, we use the online method.
  • 58. 3. Blind-Dereverberation Methods e. Sparsity penalized weighted least squares method (SPWLS)
  • 59. 3e. Sparsity penalized weighted least squares method (SPWLS) ❖ SPWLS combines variance normalization through a weight matrix with the sparsity of speech spectrogram matrices. ❖ To promote sparsity of a variable, ℓ1-norm regularization is generally used. ❖ With ℓ1 regularization, the optimization problem, known as the Lasso problem, requires an iterative algorithm. ❖ Popular algorithms for the Lasso problem include ➢ ISTA (iterative shrinkage-thresholding algorithm) [7] ➢ FISTA ➢ SALSA
  • 60. 3e. Sparsity penalized weighted least squares method (SPWLS) The convolution equation (in the STFT domain, for a fixed frequency k) can be rewritten in matrix form as x = Hs + n. Then, adding a regularization term for sparsity, we need to solve the Lasso problem: n: noise signal, s: clean speech signal, x: reverberated signal, H: convolution matrix of the RIR.
  • 61. 3e. Sparsity penalized weighted least squares method (SPWLS) ● Add weights to the problem as in the L-WPE and G-WPE methods. ● Add an extra regularization on the norm of the filter h to avoid a trivial solution. ● Our optimization loss function becomes: where a regularization parameter weights the sparsity term, W: diagonal weight matrix with 1/(std) values, a target norm is set for the filter h, k: frequency index (fixed), n: frame index.
  • 62. 3e. Sparsity penalized weighted least squares method (SPWLS) ● The objective is non-differentiable at its local minimum (because of the ℓ1 term). ● Therefore, s and h need to be computed numerically with an iterative approach. ● Our approach requires a good initialization for s and h, which can be obtained from an earlier method such as G-WPE. ● Our approach performs alternating updates of s and h, each minimizing the objective function with respect to that variable. ● ISTA is used to update both s and h.
  • 63. 3e. Sparsity penalized weighted least squares method (SPWLS) ISTA minimizes functions of the form f(s) + g(s), where f is differentiable and g is usually non-differentiable but simple. Step 1 to update s: take a gradient descent step on f(·) (i: iteration index). The result is an intermediate solution. ● The gradient of f(·) is:
  • 64. 3e. Sparsity penalized weighted least squares method (SPWLS) The step size is a positive parameter that sets how far we move along the negative gradient. Step 2 to update s: a proximal operator step of g(·) is performed around the intermediate solution: For the ℓ1-norm penalty, the proximal step is a thresholding/shrinkage operation: it sets components with small magnitude to zero and shrinks the remaining ones (a: the threshold used in our algorithm).
  • 65. 3e. Sparsity penalized weighted least squares method (SPWLS) ● After updating s, we update the W matrix according to the new variance values of s. Next, we solve the problem for h, updating it according to: ● ISTA is used again: Step 1 to update h: the minimizer of f(·) is a simple least-squares problem with an exact solution:
  • 66. 3e. Sparsity penalized weighted least squares method (SPWLS) Step 2 to update h: a proximal operator step for the regularization of h. ● The step size of the inner gradient descent iteration for s can be set to change at each iteration as: which depends on two hyperparameters, the initial step size, and the inner and outer iteration indices.
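The two ISTA steps can be sketched for a generic weighted Lasso, min_s 0.5 ||W(x - A s)||² + lam ||s||₁, with W diagonal. This illustrates ISTA itself, not the full SPWLS objective (which also has the filter-norm term): it assumes a fixed step size 1/||WA||₂² rather than the thesis's step-size schedule, and uses complex soft thresholding so it also applies to STFT coefficients. Names and test data are hypothetical.

```python
import numpy as np

def soft_threshold(v, a):
    """Proximal operator of a*||.||_1: zero entries with |v| <= a, shrink the rest by a.

    Works for real or complex v (complex entries keep their phase).
    """
    mag = np.abs(v)
    return np.where(mag > a, (1 - a / np.maximum(mag, 1e-300)) * v, 0)

def ista(A, x, lam, w=None, n_iter=500, mu=None):
    """Minimize 0.5*||W(x - A s)||_2^2 + lam*||s||_1 with ISTA, W = diag(w)."""
    if w is None:
        w = np.ones(A.shape[0])
    WA = w[:, None] * A
    Wx = w * x
    if mu is None:
        mu = 1.0 / np.linalg.norm(WA, 2) ** 2       # 1 / Lipschitz constant of the gradient
    s = np.zeros(A.shape[1], dtype=np.result_type(A, x))
    for _ in range(n_iter):
        grad = -WA.conj().T @ (Wx - WA @ s)          # Step 1: gradient of the smooth term f
        s = soft_threshold(s - mu * grad, mu * lam)  # Step 2: proximal (shrinkage) step
    return s

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 30))
s_true = np.zeros(30)
s_true[[3, 11, 25]] = [2.0, -1.5, 3.0]
x = A @ s_true
s_hat = ista(A, x, lam=0.01, n_iter=3000)
```

For example, soft_threshold([3, -0.5, -2], 1) gives [2, 0, -1]: the small entry is erased and the others move toward zero by the threshold.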
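For the h-update, Step 1 is a weighted least-squares fit once s is fixed. A minimal sketch, assuming the reverberated signal is modeled as a causal convolution of h with s truncated to the signal length, and a hypothetical weight vector w standing in for the diagonal of W. The proximal step for the norm regularization on h is omitted because the slides do not give its exact form.

```python
import numpy as np

def conv_matrix(s, Lh, T):
    """T x Lh matrix C with (C @ h)[n] = sum_l h[l] * s[n - l] (causal, truncated to T)."""
    C = np.zeros((T, Lh), dtype=s.dtype)
    for l in range(Lh):
        C[l:, l] = s[: T - l]
    return C

def update_h(x, s, Lh, w):
    """Weighted least squares: argmin_h ||diag(w) (x - C(s) h)||_2."""
    C = conv_matrix(s, Lh, len(x))
    h, *_ = np.linalg.lstsq(w[:, None] * C, w * x, rcond=None)
    return h

rng = np.random.default_rng(1)
T, Lh = 400, 8
s = rng.standard_normal(T)
h_true = 0.6 ** np.arange(Lh)             # toy decaying filter
x = np.convolve(h_true, s)[:T]
w = 1.0 / np.maximum(np.abs(s), 0.1)      # hypothetical 1/std-like weights
h_est = update_h(x, s, Lh, w)
```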
  • 69. TEST DATA Experiment 1: 3 male and 3 female clean voices convolved with 6 different RIRs, with 30 dB and 60 dB additive noise (for the DLP, G-WPE, NMF+N-CTF, and SPWLS methods); 72 different samples (6 voices × 6 RIRs × 2 noise levels) were dereverberated. Experiment 2: 1 male and 1 female clean voice convolved with 5 different RIRs, with 30 dB and 60 dB additive noise (for all methods); 20 different samples were dereverberated. ● The test data were taken from the REVERB Challenge data set.
  • 70. TEST DATA ● The sampling frequency was 16 kHz for all files. ● The RT60 values of the RIRs were 0.17, 0.11, 0.95, 0.33, 0.54, and 0.35 s, respectively. ● L-WPE was not run for the RIR with RT60 = 0.95 s because of its excessive run time. ● As additive noise, a cafe environment noise at 30 dB and 60 dB levels was used.
  • 71. Setup ● The delay D was set to 3 frames for the G-WPE, L-WPE, and DLP methods. ● Lf, the number of frames used for variance calculations, was set to 1 frame for the G-WPE, L-WPE, and SPWLS methods. ● The number of iterations for G-WPE, L-WPE, and SPWLS was set to 5. ● The number of iterations for NMF+N-CTF was set to 100. ● STFT parameters: hop size = 10 ms, window size = 30 ms. ● Minimum variance to avoid division by zero: v = 10^-6. ● The number of STFT frames used for prediction changes with the RT60 estimate.
  • 72. Setup SPWLS-specific parameters: ● step size = 10^-7, ● ISTA regularization parameter = 10^5, ● number of inner ISTA iterations i = 10, ● ISTA regularization parameter for the filter = 10. ● SPWLS initializes the RIR H with the output of the G-WPE method. NMF+N-CTF parameters: ● dictionary matrix size ndict = 100, ● the online method is used.
  • 73. Computational Efficiency ● All algorithms are implemented in MATLAB on a computer with a 2.5 GHz Intel Xeon CPU. ● The fastest method is SPWLS, followed by G-WPE, DLP, NMF+N-CTF, and L-WPE. ● L-WPE is very slow because of its linear programming (LP) part, for which the CVX toolbox for MATLAB is used. ● Run times for the data with RT60 = 0.54 s: ○ L-WPE: ~one day ○ NMF+N-CTF: ~1.5 hours (100 iterations, ndict = 100) ○ G-WPE: ~4 min (5 iterations) ○ SPWLS: ~2 min (5 iterations) ○ DLP: ~3 min (1 iteration), implemented with the Levinson-Durbin algorithm
  • 74. Test Methods ● The accuracy of dereverberation is measured with the average cepstral distortion (CD) over short-time frames. ● CD is a popular measure of speech quality between the clean signal and the reconstructed signal. It uses the clean speech cepstral coefficients of orders 1 to 12, the estimated speech cepstral coefficients of orders 1 to 12, and the zero-order coefficient, which denotes the power spectrum envelope in dB. ● CD between similar signals approaches 0. ● Our aim is to make CD as small as possible after dereverberation.
  • 75. Test Methods ● STOI (short-time objective intelligibility): over short-time frames, STOI compares the temporal envelopes of the clean and dereverberated speech using correlation coefficients. ● PESQ (Perceptual Evaluation of Speech Quality): a common standardized method for measuring speech quality; three types of PESQ measures are applied. ● Signal-to-noise ratio (SNR) between the clean and dereverberated signals. ● Segmental SNR (segSNR): the average of SNR values computed over short-time frames.
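A sketch of the per-frame cepstral distortion, assuming the common definition CD = (10 / ln 10) * sqrt((c0 - ĉ0)² + 2 * sum_{i=1}^{12} (c_i - ĉ_i)²) averaged over frames, with cepstra computed as the real cepstrum of the log power spectrum of Hamming-windowed frames (30 ms window, 10 ms hop at 16 kHz). The evaluation tool used in the thesis may compute the cepstrum differently or clip per-frame values; function names are illustrative.

```python
import numpy as np

def frame_signal(x, N, L):
    """Stack full frames of length N with hop L as rows."""
    num = 1 + (len(x) - N) // L
    idx = np.arange(N)[None, :] + L * np.arange(num)[:, None]
    return x[idx]

def cepstra(x, N=480, L=160, order=12, eps=1e-10):
    """Real cepstrum of the log power spectrum, coefficients 0..order, one row per frame."""
    frames = frame_signal(np.asarray(x, dtype=float), N, L) * np.hamming(N)
    log_pow = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)
    c = np.fft.irfft(log_pow, n=N, axis=1)
    return c[:, : order + 1]

def cepstral_distortion(clean, est, **kw):
    """Average per-frame CD in dB between a clean and an estimated signal."""
    c, ch = cepstra(clean, **kw), cepstra(est, **kw)
    n = min(len(c), len(ch))
    diff = c[:n] - ch[:n]
    per_frame = (10 / np.log(10)) * np.sqrt(diff[:, 0] ** 2 + 2 * np.sum(diff[:, 1:] ** 2, axis=1))
    return float(np.mean(per_frame))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
cd_same = cepstral_distortion(clean, clean)
cd_scaled = cepstral_distortion(clean, 2.0 * clean)
```

Scaling a signal by 2 adds ln 4 to every log-power value, which changes only c0, so the CD between x and 2x is 10 * log10(4) ≈ 6.02 dB while identical signals give 0.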
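SNR and segmental SNR can be sketched as follows. The segmental version averages per-frame SNRs; clipping each frame to [-10, 35] dB is a common convention that is assumed here, not taken from the thesis. The frame settings reuse the 30 ms / 10 ms STFT setup at 16 kHz, and the function names are illustrative.

```python
import numpy as np

def snr_db(clean, est):
    """SNR in dB, treating clean - est as the noise."""
    noise = clean - est
    return float(10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

def seg_snr_db(clean, est, N=480, L=160, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs, each clipped to [lo, hi] dB (assumed convention)."""
    vals = []
    for start in range(0, len(clean) - N + 1, L):
        c = clean[start:start + N]
        e = c - est[start:start + N]
        v = 10 * np.log10((np.sum(c ** 2) + eps) / (np.sum(e ** 2) + eps))
        vals.append(np.clip(v, lo, hi))
    return float(np.mean(vals))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
est = 1.1 * clean                 # error is 0.1 * clean, i.e. 20 dB in every frame
snr = snr_db(clean, est)
seg = seg_snr_db(clean, est)
```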
  • 83-89. Test results - NMF+N-CTF Method (experiment 2, for 20 files)
  • 90-91. Spectrogram results of dereverberated signals
  • 92. Spectrogram results of dereverberated signals (iter# = 1)
  • 93-94. Spectrogram results of dereverberated signals (iter# = 5)
  • 95. Spectrogram results of dereverberated signals (iter# = 100)
  • 99. NUMERICAL RESULTS (for long RIR with RT60 = 0.54 s)
  • 100. NUMERICAL RESULTS - NMF+N-CTF Method ● ndict = dictionary matrix size, #iter = number of iterations ○ NNCTF1: ndict = 100, #iter = 100 ○ NNCTF2: ndict = 500, #iter = 200 ○ NNCTF3: ndict = 1000, #iter = 200 ○ NNCTF4: ndict = 1000, #iter = 400 ○ NNCTF5: ndict = 1000, #iter = 240
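To illustrate what ndict and #iter control, the sketch below factorizes a non-negative matrix V ≈ WH with the KL-divergence multiplicative updates of Lee and Seung [6]: ndict is the number of columns of the dictionary W, and n_iter is the number of update sweeps. This is plain NMF, not the NMF+N-CTF algorithm (no convolutive transfer function, no online updates); V is a random stand-in for a magnitude spectrogram, and all names are hypothetical.

```python
import numpy as np

def kl_div(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH)."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def nmf_kl(V, ndict, n_iter, seed=0, eps=1e-12):
    """NMF with Lee-Seung multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, ndict)) + eps      # dictionary (spectral atoms)
    H = rng.random((ndict, T)) + eps      # activations
    history = []
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_div(V, W @ H))
    return W, H, history

# Usage, mirroring the NNCTF1 sizes (ndict = 100, #iter = 100).
V = np.random.default_rng(0).random((257, 200))
W, H, hist = nmf_kl(V, ndict=100, n_iter=100)
```

The multiplicative updates keep W and H non-negative and do not increase the KL divergence, so the recorded history is non-increasing.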
  • 101. NUMERICAL RESULTS - NMF+N-CTF Method
  • 102. Listen to the results
  • 104. DISCUSSION & CONCLUSION ● The L-WPE method gives the best test results. ● Considering both run time and test results, G-WPE offers a better balance than L-WPE and could be better suited to real-time applications. ● The L-WPE algorithm is much more complex than G-WPE because of its linear programming step; as a result, it runs very slowly. ● NMF+N-CTF results: ○ the method converges, ○ the test results are not as good as those reported in the original paper, ○ the method could perform better with a good initialization or a supervised dictionary matrix, ○ increasing the dictionary size improves the test results, but increasing the number of iterations does not always do so, ○ the method uses no phase information.
  • 105. DISCUSSION & CONCLUSION ● For one iteration, L-WPE was slower and G-WPE was faster than DLP. ● SPWLS did not perform well in terms of CD. To improve its performance, more constraints could be imposed on h. Unlike G-WPE, L-WPE and DLP, which target only the late reverberation, SPWLS attempts to remove the entire reverberation. Decreasing the step size might also help. ● SPWLS shows promise given its time efficiency and its SNR and PESQ results. ● The spectrograms show that L-WPE and G-WPE successfully eliminate the late reverberation. ● DLP is included only as a baseline for L-WPE and G-WPE, since both methods are derived from it. As expected, L-WPE and G-WPE perform better.
  • 106. REFERENCES
[1] Nakatani, Tomohiro, et al. "Speech dereverberation based on variance-normalized delayed linear prediction." IEEE Transactions on Audio, Speech, and Language Processing 18.7 (2010): 1717-1731.
[2] Jukić, Ante, and Simon Doclo. "Speech dereverberation using weighted prediction error with Laplacian model of the desired signal." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.
[3] Mohammadiha, Nasser, Paris Smaragdis, and Simon Doclo. "Joint acoustic and spectral modeling for speech dereverberation using non-negative representations." 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.
[4] Mohammadiha, Nasser, and Simon Doclo. "Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24.2 (2016): 276-289.
[5] Selesnick, Ivan. "Introduction to sparsity in signal processing." Connexions (2012).
[6] Lee, Daniel D., and H. Sebastian Seung. "Algorithms for non-negative matrix factorization." Advances in Neural Information Processing Systems. 2001.
[7] Combettes, Patrick L., and Jean-Christophe Pesquet. "Proximal splitting methods in signal processing." Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer New York, 2011. 185-212.