978-1-5386-4304-4/18/$31.00 ©2018 IEEE.
Investigations on the Role of Analysis Window
Shape Parameter in Speech Enhancement using
Hybrid Method
*S. China Venkateswarlu, Research Scholar, ECE-JNTUK; Professor-ECE, MRIET-JNTUH,
Secunderabad, Telangana State, India
sonagiricv@gmail.com
****N. Karthik, Research Scholar, VelTech, Chennai;
Asst. Professor-ECE, NREC, Secunderabad
karthik011190@gmail.com
**A. Subba Rami Reddy, Principal & Professor-ECE,
GKCE, Sullurpet, JNTUA, India
skitprincipal@gmail.com
***K. Satya Prasad, Professor-ECE, JNTUK-Kakinada, India
Prasad_kodati@yahoo.co.in
Abstract: This paper investigates how the shape of the analysis window affects speech quality when noise is reduced using an analysis window of optimum shape. In the speech enhancement process, the noise-corrupted signal is segmented into frames, and each frame is windowed using an analysis window whose shape parameter is varied. The windowed speech segments are passed to the hybrid methods, and the enhanced speech signal is reconstructed in the time domain. The focus is on the effect of the analysis window shape on the enhancement process; across the window shapes tested, the shape is observed to play an important role. The quality of the enhanced speech is measured using six objective measures. The measures obtained for the different analysis windows are compared, and an optimum shape constant for the analysis window is proposed for better speech quality. The paper evaluates the improvement in speech quality in terms of these six objective quality measures using the Discrete Wavelet Transform and proposes two hybrid methods.
Keywords: discrete wavelet transform, Hamming window, Gaussian, Kaiser, Dolph-Chebyshev and polynomial windows, speech enhancement, shape parameter alpha, objective quality measures, thresholding.
I. INTRODUCTION
Speech is the primary means of human communication, and this drives a strong trend to expand and improve telecommunications [1]. Nowadays people rely on communication devices such as telephones, mobile phones and the internet, and customers demand wide coverage and high quality. A speech signal, however, is often degraded by additive background noise, which makes listening difficult for the end user in a noisy environment. It is therefore necessary to develop speech enhancement algorithms. Speech enhancement is an important field of speech processing; it refers to methods that aim to recover the speech signal from a noisy observation. Over the last decades, many algorithms and approaches have been proposed for this problem, such as spectral subtraction [2], wavelet-based methods [3], hidden Markov modelling [4] and signal subspace methods [5], to improve the perceptual quality of the speech recovered from the corrupted input signal.
Wavelet-based de-noising is one approach to speech enhancement, and wavelets have been found to be a powerful tool for removing noise. In this work, Telugu speech sentences are applied to this algorithm for enhancement. The paper is organized as follows: Section I introduces the problem and describes the fixed and analysis windows; Section II explains the objective measures; Section III describes the algorithms; Section IV presents the implementation and simulation results; and Section V gives the conclusion.
A. Fixed and analysis windows
Many windows have been proposed in the literature [7-10]. They are known as suboptimal solutions, and the best window depends on the application. Windows can be categorized as fixed or adjustable [11]. The Dolph-Chebyshev window is constructed in the frequency domain by taking uniformly spaced samples W(k) of the window's Discrete Fourier Transform (DFT), with a parameter that sets the side-lobe attenuation. The discrete-time Dolph-Chebyshev window function can then be written as
$$w(n) = \frac{1}{N}\sum_{k=0}^{N-1} W(k)\, e^{j 2\pi n k / N}$$ (1.1)
B. Kaiser-Bessel Windows
The Kaiser-Bessel window was formulated as a simple-to-compute approximation to the zeroth-order discrete prolate spheroidal sequences in the main lobe [8]. In the discrete time domain, the Kaiser window is defined by [9]
(1.2)
where α is the adjustable shape parameter and I0(x) is the zeroth-order modified Bessel function of the first kind, described by the power series expansion
(1.3)
As α increases, the main lobe widens and the side-lobe attenuation increases. For α = 0, the Kaiser-Bessel window is a rectangular window. For α = 5.3, the Kaiser-Bessel window is close to a Hamming window. Two parameters determine the amplitude response of the windows defined above: the sequence length N and the shape parameter. In this study the number of FFT points in the windowed speech frame is fixed to 160 or 240, so the window length N is fixed. Only the shape parameter α can therefore be varied to obtain the desired magnitude response, and a lower side-lobe level is bought at the cost of a wider main lobe.
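The Kaiser window and the Bessel series described above can be illustrated with a short sketch. It assumes the common textbook form w(n) = I0(α·sqrt(1 − (2n/(N−1) − 1)²)) / I0(α) and the series I0(x) = Σ_k [(x/2)^k / k!]²; the paper's own equations (1.2) and (1.3) may be parameterized differently. The function names `bessel_i0` and `kaiser_window` and the number of series terms are illustrative choices, not taken from the paper.

```python
import math


def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function of the first kind,
    evaluated with a truncated power series (assumed textbook form)."""
    total = 0.0
    term = 1.0  # holds (x/2)^k / k!, starting at k = 0
    for k in range(terms):
        total += term * term
        term *= (x / 2.0) / (k + 1)
    return total


def kaiser_window(n_points, alpha):
    """Kaiser window of length n_points with shape parameter alpha."""
    if n_points == 1:
        return [1.0]
    denom = bessel_i0(alpha)
    window = []
    for n in range(n_points):
        ratio = 2.0 * n / (n_points - 1) - 1.0  # runs from -1 to +1
        arg = alpha * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(bessel_i0(arg) / denom)
    return window


if __name__ == "__main__":
    # alpha = 0 reduces to a rectangular window; larger alpha tapers the ends.
    for alpha in (0.0, 2.0, 5.3):
        w = kaiser_window(160, alpha)
        print(alpha, round(w[0], 4), round(w[80], 4))
```

With the frame length fixed at 160 samples, sweeping `alpha` in such a function is the only remaining way to change the taper, which is the degree of freedom the study varies.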
II. Objective Measures
i) Signal-to-Noise Ratio (SNR): The SNR is the ratio of signal energy to noise energy, expressed in decibels (dB), and is given by [2-4] as
(2.1)
ii) Segmental Signal-to-Noise Ratio (Seg-SNR): The Seg-SNR is the frame-based SNR and is defined [2-4] as
(2.2)
iii) Weighted Spectral Slope Distance: WSS distance measure
computes the weighted difference between the spectral slopes
in each frequency band. The spectral slope is obtained as the
difference between adjacent spectral magnitudes in decibels.
The WSS measure is defined and evaluated [18] as
(2.3)
iv) Log Likelihood Ratio: The LLR measure is defined [19] as
LLR= (2.4)
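v) Cepstrum distance: The cepstrum distance [19] provides an estimate of the log spectral distance between two spectra. The cepstral coefficients c(m) are computed recursively from the LPC coefficients a_m of order p as
$$c(m) = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c(k)\, a_{m-k}, \quad 1 \le m \le p$$ (2.5)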
vi) Frequency weighted segmental SNR: It is computed [4,
20] using the following equation
(2.6)
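Some of the measures listed above are simple enough to sketch directly. The following is a minimal illustration, not the authors' evaluation code: it assumes the usual definitions of SNR (clean-signal energy over error energy), segmental SNR (frame-wise SNR averaged over frames, with a commonly used clamp to [-10, 35] dB that is an assumption here), the standard LPC-to-cepstrum recursion, and the usual cepstral distance scaling. All function and parameter names are hypothetical.

```python
import math

EPS = 1e-12  # guards against division by zero in silent or perfect frames


def snr_db(clean, enhanced):
    """Overall SNR in dB: clean-signal energy over error energy."""
    signal = sum(s * s for s in clean)
    error = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10.0 * math.log10((signal + EPS) / (error + EPS))


def segmental_snr_db(clean, enhanced, frame_len, hop, lo=-10.0, hi=35.0):
    """Mean of frame-wise SNRs; the [lo, hi] clamp is an assumed convention."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        values.append(min(hi, max(lo, snr_db(c, e))))
    return sum(values) / len(values)


def lpc_to_cepstrum(a):
    """Cepstral coefficients c(1..p) from LPC coefficients a(1..p)."""
    p = len(a)
    c = [0.0] * p
    for m in range(1, p + 1):
        acc = a[m - 1]
        for k in range(1, m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c[m - 1] = acc
    return c


def cepstral_distance(c_clean, c_enhanced):
    """Cepstral distance in dB between two cepstral coefficient vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(c_clean, c_enhanced))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)


if __name__ == "__main__":
    clean = [math.sin(0.1 * n) for n in range(960)]
    enhanced = [0.9 * s for s in clean]
    print(round(snr_db(clean, enhanced), 2))
    print(round(segmental_snr_db(clean, enhanced, 240, 60), 2))
```

The frame length of 240 samples with a hop of 60 in the usage lines corresponds to 30 ms frames with 75% overlap at 8 kHz.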
III. Algorithms
A. Wiener Filter Method
Noise-corrupted speech signals are processed by the Wiener filter speech enhancement algorithm based on a priori SNR estimation [21]. Let s(t) and b(t) denote the speech and the additive noise processes, respectively. The observed signal x(t) is given by
x(t) = s(t) + b(t). (3.1)
(3.2)
(3.3)
GW (k, t) = (3.4)
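As an illustration of a Wiener gain driven by an a priori SNR estimate, the sketch below uses the standard forms G = ξ / (1 + ξ) and the decision-directed estimate ξ = β·|Ŝ_prev|²/λ_b + (1 − β)·max(γ − 1, 0), where γ is the a posteriori SNR. This is an assumption about what the missing equations (3.2) to (3.4) express; the smoothing constant `beta = 0.98` and all names are hypothetical, not values from the paper.

```python
def decision_directed_xi(prev_clean_power, noisy_power, noise_power, beta=0.98):
    """A priori SNR estimate for one frequency bin (decision-directed form).

    prev_clean_power: |S_hat|^2 estimated in the previous frame
    noisy_power:      |X|^2 observed in the current frame
    noise_power:      estimated noise power for this bin
    beta:             smoothing constant (assumed value, not from the paper)
    """
    gamma = noisy_power / noise_power  # a posteriori SNR
    return beta * prev_clean_power / noise_power + (1.0 - beta) * max(gamma - 1.0, 0.0)


def wiener_gain(xi):
    """Wiener gain for a bin with a priori SNR xi (linear scale, not dB)."""
    return xi / (1.0 + xi)


if __name__ == "__main__":
    xi = decision_directed_xi(prev_clean_power=4.0, noisy_power=6.0, noise_power=1.0)
    # The enhanced spectral amplitude is the gain times the noisy amplitude.
    print(round(xi, 3), round(wiener_gain(xi), 3))
```

The gain approaches 1 in bins where speech dominates and 0 in bins where noise dominates, which is why the quality of the a priori SNR estimate matters more than the gain formula itself.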
B. Spectral Subtraction
The noise-corrupted speech is processed by the spectral subtraction method to obtain the enhanced speech. Spectral subtraction [21-24] is a popular frequency-domain method for reducing the effect of additive uncorrelated noise in a signal.
y(n) = s(n) + v(n) (3.5)
= (3.6)
Here Y(w, k) is the DFT of y(n, k) given by
= (3.7)
In this paper, all the measures were evaluated with a = 1.7.
(3.8)
where 0 ≤ µ ≤ 1 is the exponential averaging constant. In this work µ is set to 0.85.
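A minimal sketch of power spectral subtraction follows. It rests on two assumptions that the surviving text does not confirm: that a = 1.7 acts as an over-subtraction factor applied to the noise power estimate, and that µ = 0.85 smooths that noise estimate recursively across noise-only frames. The spectral floor value and all function names are hypothetical.

```python
def update_noise_power(prev_noise_power, frame_power, mu=0.85):
    """Exponentially averaged noise power per bin (assumed role of mu)."""
    return [mu * n + (1.0 - mu) * p for n, p in zip(prev_noise_power, frame_power)]


def spectral_subtract(frame_power, noise_power, a=1.7, floor=0.01):
    """Subtract a times the noise power from each bin of the noisy power spectrum.

    Bins that would fall below floor * noise power are set to that floor,
    which avoids negative power values (floor is an illustrative choice).
    """
    return [max(p - a * n, floor * n) for p, n in zip(frame_power, noise_power)]


if __name__ == "__main__":
    noise = [1.0, 1.0, 1.0]
    noise = update_noise_power(noise, [1.2, 0.8, 1.0])  # one noise-only frame
    noisy_frame = [10.0, 1.0, 4.0]
    print([round(v, 3) for v in spectral_subtract(noisy_frame, noise)])
```

The enhanced magnitude is the square root of each output bin; it is combined with the phase of the noisy frame before the inverse DFT.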
C. Speech Enhancement Using Wavelet Transform
The wavelet-based speech enhancement technique was proposed by Donoho and Johnstone [8]. This method is based on thresholding the wavelet coefficients of the noisy speech signal. The fundamental idea behind wavelets is to analyse a signal according to scale.
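Thresholding of wavelet coefficients can be sketched in a few lines. The example below uses a one-level Haar transform purely as a stand-in, since the text does not name the wavelet or the decomposition depth, together with the soft and hard thresholding rules and the universal threshold T = σ·sqrt(2 ln N) associated with Donoho and Johnstone. Function names are illustrative.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence (stand-in wavelet)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def soft_threshold(coeffs, t):
    """Shrink every coefficient towards zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t and zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def universal_threshold(sigma, n):
    """Universal threshold for noise standard deviation sigma and n samples."""
    return sigma * math.sqrt(2.0 * math.log(n))


def denoise_frame(frame, sigma):
    """Threshold only the detail coefficients of one frame, then invert."""
    approx, detail = haar_dwt(frame)
    t = universal_threshold(sigma, len(frame))
    return haar_idwt(approx, soft_threshold(detail, t))
```

Soft thresholding avoids the discontinuity that hard thresholding introduces at ±T, at the price of biasing large coefficients towards zero.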
D. Hybrid Thresholding
In this method the authors propose two new thresholding schemes: the first combines the modified improved thresholding scheme with soft thresholding, and the second combines modified improved thresholding with improved thresholding. They are defined by the two equations given below.
(4.13)
(3.9)
IV. Implementation and Simulation Results
A. Speech Materials
This section describes how the speech samples were acquired. Each of a set of well-known Telugu proverbs was recorded three times at a normal speaking rate in a quiet room by three male and three female native Telugu speakers (aged around 23 years), at a sampling rate of 48 kHz with 16-bit resolution. The digitized speech was then downsampled to 8 kHz and normalized for analysis. Gaussian white noise was added to the speech signal at four SNRs: 15 dB, 10 dB, 5 dB and 0 dB. The DWT is used to obtain the enhanced speech signal from the noisy speech signal. The resulting pairs of reference and enhanced signals are used to evaluate the objective measures of speech quality.
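Adding white Gaussian noise at a prescribed SNR, as described above, amounts to scaling the noise so that the ratio of speech power to noise power matches the target. The sketch below shows one way to do this; the function name, the fixed seed and the synthetic test tone are illustrative and not part of the paper's setup.

```python
import math
import random


def add_white_noise(speech, target_snr_db, seed=0):
    """Return speech plus white Gaussian noise scaled to the target SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the scale so that speech_power / (scale^2 * noise_power)
    # equals 10^(target_snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # A synthetic tone stands in for a recorded sentence sampled at 8 kHz.
    tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
    for snr in (15, 10, 5, 0):
        noisy = add_white_noise(tone, snr)
        err = sum((y - s) ** 2 for y, s in zip(noisy, tone))
        sig = sum(s * s for s in tone)
        print(snr, round(10.0 * math.log10(sig / err), 3))
```

Scaling by the measured noise power, rather than the nominal variance, makes the realized SNR of each noisy file match the target closely even for short utterances.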
Figure 4: Block diagram of wavelet de-noising of the speech signal. Blocks: noisy speech, framing and windowing, DWT, level-dependent thresholding, IDWT, windowing, overlap and add, enhanced speech.
Figure 4(a): LLR variation of enhanced speech as a function of β
Figure 4(b): WSS variation of enhanced speech as a function of β
Figure 4(c): CEP variation of enhanced speech as a function of β
Figure 4(d): SEG-SNR variation of enhanced speech as a function of β
Figure 4(e): FWSEG variation of enhanced speech as a function of β
Figure 4(f): SNR variation of enhanced speech as a function of β
Phonetically balanced clean speech signals and noise-corrupted signals at different SNR levels were taken from the NOIZEUS speech corpus. The noise-corrupted speech signal is segmented into frames of 160 samples (20 ms at an 8 kHz sampling rate). The 160-point Discrete Fourier Transform (DFT) of each segment is computed after applying the Kaiser window with variable shape parameter α.
The spectral subtraction algorithm is applied to the spectral components of each segment using the equations of Section III. The signal is reconstructed in the time domain using the IDFT and the overlap-add method (with 40% overlap between frames). The signal thus obtained is the enhanced signal. Its quality is analyzed using six objective measures for speech enhancement: WSS, LLR, fwseg-SNR, Cep, Seg-SNR and SNR. All the measures are computed by segmenting the sentences using 30-ms Hamming windows with 75% overlap between adjacent frames. Tenth-order LPC analysis was used in computing the LPC-based objective measure LLR.
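The framing arithmetic above (160-sample frames, 40% overlap, hence a hop of 96 samples) and the overlap-add reconstruction can be sketched as follows. A Hamming window is used only as a stand-in for the Kaiser window, and normalizing by the summed window is an assumption made here so that an unprocessed signal is recovered exactly; the paper does not describe its synthesis normalization. All names are illustrative.

```python
import math


def hamming(n_points):
    """Hamming window, used here as a stand-in analysis window."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_signal(x, frame_len=160, overlap=0.4):
    """Split x into overlapping frames; returns the frames and the hop size."""
    hop = int(round(frame_len * (1.0 - overlap)))
    frames = [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]
    return frames, hop


def overlap_add(windowed_frames, window, hop, length):
    """Overlap-add windowed frames and divide by the accumulated window."""
    out = [0.0] * length
    norm = [0.0] * length
    for i, frame in enumerate(windowed_frames):
        start = i * hop
        for n in range(len(frame)):
            out[start + n] += frame[n]
            norm[start + n] += window[n]
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]


if __name__ == "__main__":
    x = [math.sin(0.05 * n) for n in range(544)]
    win = hamming(160)
    frames, hop = frame_signal(x)
    windowed = [[v * w for v, w in zip(f, win)] for f in frames]
    # Any per-frame enhancement step would operate on `windowed` here.
    y = overlap_add(windowed, win, hop, len(x))
    print(hop, len(frames), max(abs(a - b) for a, b in zip(x, y)))
```

At 8 kHz, 160 samples correspond to 20 ms, so the hop of 96 samples advances the analysis by 12 ms per frame.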
V. Conclusion
In this paper, we proposed an improved wavelet thresholding speech enhancement system. The proposed Super-Soft thresholding algorithm modifies the noisy speech wavelet coefficients in a way that avoids sharp time-frequency discontinuities in the speech spectrogram, which can decrease the quality of the enhanced speech signal. The system also uses an estimate of the clean speech signal energy in each frame to select the threshold for the current frame. A further advantage is that, unlike most other wavelet-based algorithms, whose performance depends strongly on the detection of unvoiced segments, the proposed method does not require any voiced/unvoiced detection. The results confirm the improvement in performance achieved by the system.
Acknowledgements
This work was carried out at the research facility of the Department of Electronics & Communication Engineering, GKCE, Sullurpet, Nellore Dt., Andhra Pradesh, as part of the authors' research work. The authors thank the authorities of JNTUK, Kakinada, A.P., India, for encouraging this research, and the experts who contributed to its development.
VI. References
[1] P. Loizou, Speech Enhancement: Theory and Practice. CRC Press, 2007.
[2] B. P. Milner, "Speech enhancement," in Speech Technology for Telecommunications. London: Chapman and Hall, 1998.
[3] J. H. L. Hansen, "Speech enhancement," Encyclopaedia of Electrical and Electronics Engineering, Wiley, vol. 20, pp. 159-175, 1999.
[4] Y. Ephraim, H. Lev-Ari, and W. Roberts, "A brief survey of speech enhancement," in The Electronic Handbook, 2nd ed. CRC Press, Apr. 2005.
[5] Y. Ephraim and I. Cohen, "Recent advancements in speech enhancement," in The Electrical Engineering Handbook. CRC Press, ch. 15, pp. 12, 26, 2006.
[6] D. O'Shaughnessy, "Enhancing speech degraded by additive noise or interfering speakers," IEEE Communications Magazine, pp. 46-52, Feb. 1989.
[7] F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proc. IEEE, vol. 66, no. 1, pp. 51-83, Jan. 1978.
[8] J. F. Kaiser, "Nonrecursive digital filter design using I0-sinh window function," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 74), San Francisco, Calif, USA, pp. 20-23, April 1974.
[9] T. Saramäki, "A class of window functions with nearly minimum side lobe energy for designing FIR filters," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 89), Portland, Ore, USA, vol. 1, pp. 359-362, May 1989.
[10] C. L. Dolph, "A current distribution for broadside arrays which optimizes the relationship between beamwidth and side-lobe level," Proc. IRE, vol. 34, pp. 335-348, June 1946.
[11] T. Saramäki, "Finite impulse response filter design," in Handbook for Digital Signal Processing, S. K. Mitra and J. F. Kaiser, Eds. New York, NY, USA: Wiley, 1993.
[12] L. R. Rabiner, J. H. McClellan, and T. Parks, "FIR digital filter design techniques using weighted Chebyshev approximation," Proc. IEEE, vol. 63, pp. 595-610, April 1975.
[13] P. Singla and T. Singh, "Desired order of continuous polynomial time window functions for harmonic analysis," IEEE Transactions on Instrumentation and Measurement, vol. 59, no. 9, pp. 2475-2481, September 2010.
[14] S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 113-120, April 1979.
[15] S. Kamath and P. Loizou, "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 4160-4164, 2002.
[16] S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, 2nd ed. John Wiley & Sons Ltd, July 2000.
[17] P. Scalart and J. Filho, "Speech enhancement based on a priori signal to noise estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1996, pp. 629-632.
[18] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proc. IEEE, vol. 67, pp. 1586-1604, December 1979.
[19] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoustics, Speech and Signal Process., vol. 32, no. 6, pp. 1109-1121, 1984.
[20] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
[21] D. L. Donoho, "De-noising by soft thresholding," IEEE Transactions on Information Theory, vol. 41, no. 3, 1995.
[22] J. I. Agbinya, "Discrete wavelet transform techniques in speech processing," in IEEE Tencon Digital Signal Processing Applications Proceedings, IEEE, New York, NY, 1996, pp. 514-519.
[23] Y. Ghanbari and M. Karami, "A modified speech enhancement system based on the thresholding of the wavelet packets," in 13th ICEE-2005, vol. 3, Zanjan, Iran, May 10-12, 2005.
[24] Y. Hu and P. Loizou, "Evaluation of objective measures for speech enhancement," IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 1, pp. 229-238, Jan. 2008.
[25] P. Krishnamoorthy, "An overview of subjective and objective quality measures for noisy speech enhancement algorithms," IETE Technical Review, vol. 28, issue 4, pp. 292-301, Jul.-Aug. 2011.
[26] S. China Venkateswarlu, A. S. R. Reddy, and K. S. Prasad, "Speech enhancement using Boll's spectral subtraction method based on Gaussian window," Global Journal of Researches in Engineering: F, E and E Engineering, vol. 14, issue 6, version 1.0, pp. 9-20, 2014.
Biography
S. China Venkateswarlu was born in Mittathmakur, a village near Gudur, Nellore District, Andhra Pradesh, India, in 1969. He obtained his B.Tech (Electronics & Communication Engineering) from Nagarjuna University (Acharya Nagarjuna University), A.P., India. He earned his DPH & S.Tech from the College of Medical Technology, Ongole, A.P., India, and his M.Tech (Digital Systems & Computer Electronics) from JNTU College of Engineering, Kukatpally, Hyderabad, A.P., India. He is pursuing a PhD degree at JNTU College of Engineering, Kakinada, A.P., India, all in Electronics & Communication Engineering. He has worked as Associate Lecturer, Assistant Professor, Associate Professor, Professor, and Head of the Department in different engineering colleges of Andhra Pradesh, India, and is currently a Professor in the Dept. of ECE of MRIET-Hyderabad, TG, India. He has more than 16 years of teaching experience and has published more than 25 papers in refereed national and international journals. He has presented more than 15 papers in national and international journals. He has reviewed one book on Digital Signal Processing from M/s Pearson Education. He is a life member of ISTE, CSI, and IAENG, and works as a reviewer for different international journals. His areas of interest are Digital Signal Processing, Speech Processing, Embedded Systems and Digital Image Processing. E-mail IDs: sonagiricv@gmail.com, cvenkateswarlus@gmail.com
Dr. A. Subbarami Reddy was born in Anjimedu, a village near Tirupati, Andhra Pradesh, India, in 1958. He obtained his M.Sc Physics (Electronics) from S.V University, Tirupati, India. He earned his AMIE from the Institution of Engineers (India), Kolkata, India, his M.Tech from what is now NIT, Kurukshetra, India, and his PhD degree from Andhra University, Waltair, India, all in Electronics and Communication Engineering. He has worked as Laboratory Assistant, Associate Lecturer, Lecturer, Assistant Professor, Associate Professor, Professor, Sr. Professor, Head of the Department and Dean in different engineering colleges of Andhra Pradesh, India. He is presently the Principal of GKCE, Andhra Pradesh, India. He has more than 30 years of experience and has published more than 25 papers in refereed international and national journals. He is a life member of ISTE (India) and a Fellow of IETE. His areas of interest include Signal Processing and its subfields. Email ID: skitprincipal@gmail.com
K. Satya Prasad, B.Tech., ME., Ph.D., L.M.I.S.T.E., MIEEE, F.I.E., F.I.E.T.E., is an Ex-Rector and Professor of ECE at JNT University, Kakinada, A.P., India. He has worked in various engineering colleges and has 38 years of teaching, R&D and industrial experience. He has published 28 research papers in national and international journals and has presented 20 national and 2 international conference papers. His research interests include Signal Processing, Speech Processing, Wireless Communications and Embedded Systems. He has guided 4 PhDs, and 15 students are pursuing a PhD under his guidance. He has guided 15 M.Tech students and 13 projects. He has authored or co-authored a book entitled Electronic Devices and Circuits. He has served as a member of curriculum development and as Chairman, Board of Studies, JNTU, Hyderabad and GPR College of Engineering, Kurnool. He officiated as Head of the ECE department from January 1997 to September 2003, Vice-Principal, JNTUCEK, from October 2003 to May 2005, and Principal, JNTUCEK, from May 2005 to June 2007. He has completed an R&D project sanctioned by the Ministry of Human Resources and Development, Government of India. Email ID: prasad_kodati@yahoo.co.in.