Paper id 28201448
1. International Journal of Research in Advent Technology, Vol.2, No.8, August 2014
E-ISSN: 2321-9637
98
Speech Enhancement of Punjabi Language at Phoneme
Level using Digital Signal Processing Techniques
Jaismine Jassal1, Manjot Kaur Gill2
M.Tech. student, Dept. of Computer Science and Engineering1, Guru Nanak Dev Engg. College, Ludhiana1
Assistant Professor, Dept. of Information Technology2, Guru Nanak Dev Engg. College, Ludhiana2
Email: jassal.priya@yahoo.com1, gill.manjot@gmail.com2
Abstract-This paper presents an overview of several of the most commonly used methods for enhancing degraded speech.
The common methods, namely Spectral Subtraction, Wiener Filter, Kalman Filter and RASTA Filter, are explained together with the Proposed Method, which combines features of these methods. Each method uses certain Digital Signal Processing (DSP) techniques. Framing, windowing, DFT (Discrete Fourier Transform), FFT (Fast Fourier Transform), noise detection and SNR are the common techniques and parameters used in each method. These methods are applied to phonemes of the Punjabi language extracted from recorded words.
Keywords- Noise, speech enhancement, phonemes, SNR (Signal to Noise Ratio).
1. INTRODUCTION
Speech signals in real-world scenarios are often corrupted by various types of degradation. The most common degradations include background noise, reverberation and speech from competing speaker(s). Degraded speech is poor in terms of both quality and intelligibility. Therefore, degraded speech needs to be processed to enhance its perceptual quality and intelligibility, and several methods have been proposed in the literature for this purpose. Degraded speech is processed in the frequency domain to achieve enhancement. Different types of environmental noise were added, and the results were computed and compared.
This paper provides an overview of some of the commonly used methods, a comparison between them, and the proposed method. The rest of the paper is organised as follows: Section 2 reviews methods for processing speech degraded by background noise. Section 3 describes the Punjabi language and its phonemes. Section 4 covers the methodology followed. Section 5 presents the comparative results and discussion of the methods applied to the phonemes. The conclusion is given in Section 6.
2. ENHANCEMENT OF NOISY SPEECH
Background noise is the most common factor that causes
degradation of the quality and intelligibility of speech. The
term background noise refers to any unwanted signal that is
added to the desired signal. Background noise can be stationary
or non-stationary, and is assumed to be uncorrelated with, and additive to, the speech signal. Mathematically, speech degraded by background noise can be expressed as the sum of clean speech and background noise (Krishnamoorthy and Prasanna, 2010):
s(n) = x(n) + p(n) (1)
where s(n), x(n) and p(n) denote the noisy speech, clean speech and background noise respectively, and n is the sample index. In the frequency domain this can be represented as
S(f) = X(f) + P(f) (2)
where f is the frequency-bin index.
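Equation (2) follows from equation (1) because the DFT is linear. The minimal Python sketch below checks this numerically. It uses a naive DFT and hypothetical toy signals in place of real speech and noise; the signal choices are illustrative only.

```python
import cmath
import math

def dft(signal):
    """Naive O(N^2) DFT: bin k is the sum over n of signal[n] * exp(-j*2*pi*k*n/N)."""
    n_samples = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]

N = 32
# Hypothetical toy signals standing in for clean speech x(n) and noise p(n).
x = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
p = [0.1 * ((-1) ** n) for n in range(N)]   # deterministic "noise" at the Nyquist bin
s = [xn + pn for xn, pn in zip(x, p)]       # equation (1): s(n) = x(n) + p(n)

S, X, P = dft(s), dft(x), dft(p)
# Equation (2): by linearity of the DFT, S(f) = X(f) + P(f) in every bin f.
max_err = max(abs(S[k] - (X[k] + P[k])) for k in range(N))
print(f"largest |S(f) - (X(f) + P(f))|: {max_err:.1e}")
```

The printed error is at floating-point rounding level. This is why noise can be reasoned about bin by bin in the frequency domain, as all the methods below do.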
The problem of enhancing noisy speech has received considerable attention in the literature, and a variety of methods have been proposed to solve it. An overview of each of them is given below.
2.1. Spectral Subtraction
Spectral Subtraction is a very popular method for enhancing the quality of speech that has been degraded by additive noise. It is a spectral amplitude estimation method for restoring signals degraded by additive noise. Phase distortion can be ignored because the human ear is assumed to be insensitive to phase (Saeed, 2005). The method restores the signal by subtracting an estimate of the noise spectrum from the noisy signal spectrum (Saeed, 2005). In Spectral Subtraction, the noise in the degraded speech is estimated from the ‘pauses’ or ‘quiet’ periods in the speech signal, when no speech is being uttered and only noise is present. The noise spectrum is then usually updated as more noise-only (silent) frames appear in the speech signal.
However, since noise is random by nature, the estimate can exceed the actual noise in some frequency bins, so the spectrum obtained after subtraction can become negative. These negative values need to be set to zero or a small positive value. This clipping itself introduces some distortion of the signal, but less than the distortion caused by leaving the spectrum negative. Spectral Subtraction takes place in the frequency domain rather than in the time domain in which the signal is given. The transformation to the frequency domain is usually done using the Discrete Fourier Transform (DFT); here the Fast Fourier Transform (FFT) is used instead. The FFT computes the same result as the DFT but is an efficient algorithm for doing so (O(N log N) rather than O(N²) operations for N points). It is therefore quicker and uses fewer resources, making the system more efficient (Paul, 2009).
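A minimal Python sketch of magnitude spectral subtraction on a single frame follows. It makes several assumptions:

- The noise magnitude spectrum has already been estimated from speech pauses.
- A textbook radix-2 FFT is used, so frame lengths must be powers of two.
- Negative magnitudes are clipped to a floor value, and the noisy phase is reused, as described above.

The function names and the `floor` parameter are illustrative, not taken from the paper.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT using ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in spectrum])]

def spectral_subtract(noisy_frame, noise_mag, floor=0.0):
    """Magnitude spectral subtraction on one frame.

    noise_mag: per-bin noise magnitude estimate (e.g. averaged over speech pauses).
    floor: value substituted wherever |S(f)| - noise_mag(f) would be negative.
    The noisy phase is reused, since the ear is assumed insensitive to phase.
    """
    spec = fft(noisy_frame)
    cleaned = []
    for s_k, n_k in zip(spec, noise_mag):
        mag = max(abs(s_k) - n_k, floor)              # subtract, then clip negatives
        cleaned.append(cmath.rect(mag, cmath.phase(s_k)))
    return [v.real for v in ifft(cleaned)]

N = 32
clean = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]        # hypothetical speech tone
noise = [0.5 * math.cos(2 * math.pi * 5 * n / N) for n in range(N)]  # hypothetical stationary noise
noisy = [c + v for c, v in zip(clean, noise)]

noise_mag = [abs(v) for v in fft(noise)]   # idealised estimate from a noise-only frame
enhanced = spectral_subtract(noisy, noise_mag)
print(max(abs(e - c) for e, c in zip(enhanced, clean)))
```

With this idealised, exactly known tonal noise, the subtraction empties the noise bins and the clean tone is recovered to rounding error. With real noise the estimate is only approximate, and that is where the clipping step matters.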
2.2. Wiener Filtering Method
An improvement on Spectral Subtraction is the Wiener Filter. In signal processing, the Wiener Filter is used to produce an estimate of a desired or target random process by linear time-invariant filtering of an observed noisy process. It assumes known stationary signal and noise spectra and additive noise. The Wiener Filter minimizes the mean square error between the estimated random process and the desired process. Its goal is to filter out the noise that has corrupted a signal (Paul, 2009).
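In the frequency domain, the Wiener filter is commonly applied as a per-bin gain H(f) = Px(f) / (Px(f) + Pp(f)). Here Px and Pp are the clean-speech and noise power spectra. The sketch below makes two assumptions: the noise power Pp(f) is known (for example, estimated during pauses), and Px(f) is estimated by simple power subtraction. Both function names are hypothetical.

```python
def wiener_gains(noisy_power, noise_power, eps=1e-12):
    """Per-bin Wiener gain H(f) = Px(f) / (Px(f) + Pp(f)).

    noisy_power: |S(f)|^2 for one frame; noise_power: estimated Pp(f) (assumed known).
    The clean power Px(f) is estimated as max(|S(f)|^2 - Pp(f), 0), an assumed
    power-subtraction estimate; the small eps avoids division by zero.
    """
    gains = []
    for s_pow, p_pow in zip(noisy_power, noise_power):
        x_pow = max(s_pow - p_pow, 0.0)
        gains.append(x_pow / (x_pow + p_pow + eps))
    return gains

def apply_gains(noisy_spectrum, gains):
    """Enhanced spectrum X_hat(f) = H(f) * S(f); the noisy phase is kept."""
    return [g * s for g, s in zip(gains, noisy_spectrum)]

# Hypothetical per-bin values: a strong speech bin, a weak one and a noise-only bin.
noisy_spectrum = [10.0 + 0j, 2 ** 0.5 + 0j, 1.0 + 0j]
noise_power = [1.0, 1.0, 1.0]
gains = wiener_gains([abs(s) ** 2 for s in noisy_spectrum], noise_power)
print([round(g, 3) for g in gains])
print(apply_gains(noisy_spectrum, gains))
```

For these three bins, the gains come out at roughly 0.99, 0.5 and 0. High-SNR bins pass almost unchanged, while noise-only bins are suppressed. Unlike the hard subtraction above, the gain changes smoothly with the estimated SNR.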
2.3. Kalman Filtering Method
The next method for improving the signal is Kalman Filtering. It is an adaptive least-squares-error filter that provides an efficient, recursive computational solution for estimating a signal in the presence of Gaussian noise. The algorithm makes optimal use of imprecise data on a linear (or nearly linear) system with Gaussian errors to continuously update the best estimate of the system's current state (Gannot et al, 1998). Kalman Filter theory is based on a state-space approach. A state equation models the dynamics of the signal generation process, and an observation equation models the noisy and distorted observation signal.
However, this method is best suited to reducing white noise, in keeping with the Kalman assumptions. In deriving the Kalman equations, it is normally assumed that the observation noise (the additive noise that appears in the observation vector) is uncorrelated and normally distributed, which implies that the noise is white. Nevertheless, different methods have been developed to fit the Kalman approach to colored noise (Gannot et al, 1998).
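To make the predict/update recursion concrete, here is a minimal scalar Kalman filter in Python. It assumes:

- a first-order autoregressive (AR(1)) state model with a known coefficient `a`;
- white Gaussian process noise and observation noise with known variances `q` and `r`.

Kalman-based speech enhancement such as Gannot et al. (1998) uses higher-order AR models whose parameters must be estimated. All names and values here are illustrative.

```python
import random

def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x(n) = a*x(n-1) + w(n), y(n) = x(n) + v(n),
    with white Gaussian w ~ N(0, q) and v ~ N(0, r). Returns estimates and gains."""
    x_est, p_est = x0, p0
    estimates, gains = [], []
    for y in observations:
        # Prediction from the state equation (a priori estimate and covariance).
        x_pred = a * x_est
        p_pred = a * a * p_est + q
        # The Kalman gain weights the innovation y - x_pred.
        k = p_pred / (p_pred + r)
        # Correction: a posteriori estimate and covariance.
        x_est = x_pred + k * (y - x_pred)
        p_est = (1.0 - k) * p_pred
        estimates.append(x_est)
        gains.append(k)
    return estimates, gains

def mse(estimate, reference):
    """Mean squared error between two equal-length sequences."""
    return sum((e - c) ** 2 for e, c in zip(estimate, reference)) / len(reference)

# Hypothetical AR(1) "clean" signal observed in white Gaussian noise.
rng = random.Random(0)
a, q, r = 0.95, 0.1, 1.0
clean, noisy, x = [], [], 0.0
for _ in range(2000):
    x = a * x + rng.gauss(0.0, q ** 0.5)
    clean.append(x)
    noisy.append(x + rng.gauss(0.0, r ** 0.5))

filtered, gains = kalman_ar1(noisy, a, q, r)
print(f"MSE noisy {mse(noisy, clean):.3f}, filtered {mse(filtered, clean):.3f}, gain {gains[-1]:.3f}")
```

With these settings, the gain settles near 0.24 and the filtered error is well below the raw observation-noise variance. The white, Gaussian observation noise here is exactly the assumption discussed above.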
2.4. RASTA Method
The next technique is RASTA, i.e. Relative Spectral analysis. RASTA filtering can be performed to compensate for linear channel distortions. It can be applied in either the log-spectral or the cepstral domain; in effect, the filter band-passes the time trajectory of each feature coefficient. Linear channel distortions appear as an additive constant in both the log-spectral and the cepstral domains. The high-pass portion of the equivalent band-pass filter alleviates the effect of the convolutional noise introduced by the channel. The low-pass portion helps to smooth frame-to-frame spectral changes (Urmila and Vilas, n.d).
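The band-pass behaviour described above can be sketched with the widely cited RASTA filter H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1). It is applied along time to the trajectory of one coefficient. These coefficients are an assumption, since the paper gives none.

- The numerator coefficients sum to zero, so a constant channel offset is rejected.
- The recursive pole shapes the low-frequency side of the passband.

The filter is implemented causally here, with zero initial conditions.

```python
import math

def rasta_filter(trajectory, pole=0.98):
    """Filter one coefficient's trajectory (one value per frame) with
    H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1), implemented causally
    with zero initial conditions. Coefficients are the commonly quoted RASTA values (assumed)."""
    b = [0.2, 0.1, 0.0, -0.1, -0.2]   # FIR part: coefficients sum to 0, so a constant input is rejected
    past = [0.0, 0.0, 0.0, 0.0]       # x[n-1], x[n-2], x[n-3], x[n-4]
    y_prev = 0.0
    out = []
    for x in trajectory:
        taps = [x] + past
        y = sum(bk * xk for bk, xk in zip(b, taps)) + pole * y_prev   # recursive term with the pole
        out.append(y)
        past = [x] + past[:3]
        y_prev = y
    return out

# Hypothetical log-energy trajectory: a 4 Hz modulation sampled at 100 frames/s.
traj = [math.sin(2 * math.pi * 4 * n / 100) for n in range(500)]
shifted = [v + 3.0 for v in traj]          # same trajectory plus a constant channel offset
out1, out2 = rasta_filter(traj), rasta_filter(shifted)
print(abs(out1[-1] - out2[-1]))
```

Adding a constant 3.0 to every frame changes the output only during a short start-up transient. Such a constant is what a fixed linear channel contributes in the log domain. The 4 Hz modulation, meanwhile, passes with a gain close to 1.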
2.5. The Proposed Method for Speech Enhancement
The Proposed method uses the features of the Wiener and Kalman Filtering methods. The two are not connected in a simple cascade; instead, the blocks interact. This combination of the Wiener and Kalman approaches can be termed a hybrid approach, used to improve performance even at low SNRs (0-15 dB). The method is designed to enhance speech (i.e. phonemes in our case) degraded by noise. It contains certain features of Wiener filtering along with some of the parameters and features used in the Kalman filtering technique.
The features taken from Wiener filtering include doubling the magnitude and eliminating negative magnitudes. Sometimes the estimated noise is larger than the current signal, which results in a negative magnitude. This would lead to poor sound quality, so the magnitude is limited to positive values to reduce musical noise. It was also necessary to keep the code flexible so that a range of values could be tested for the different parameters.
The features taken from Kalman filtering consist of the innovation process, the Kalman gain and the recursive update. The Kalman gain matrix acts as a coefficient on the innovation sequence. Their product gives a correction factor that is used to update the initial prediction of the state vector. The final, optimal estimate is the sum of the initial predicted value and the correction factor. Likewise, the a priori error covariance is updated to give the a posteriori error covariance matrix at time n. The SNR was used as well. The tests were conducted using the combination of all these factors to obtain enhanced results that are better than those of the filtering methods discussed above.
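The Kalman steps described above correspond to the textbook Kalman recursion. They are written out below in standard notation. The symbols are assumptions, since the paper gives no equations: F is the state-transition matrix, H the observation matrix, Q and R the process and observation noise covariances, and y(n) the observation.

$$
\begin{aligned}
\hat{\mathbf{x}}(n\,|\,n-1) &= \mathbf{F}\,\hat{\mathbf{x}}(n-1\,|\,n-1) \\
\mathbf{P}(n\,|\,n-1) &= \mathbf{F}\,\mathbf{P}(n-1\,|\,n-1)\,\mathbf{F}^{T} + \mathbf{Q} \\
\mathbf{e}(n) &= \mathbf{y}(n) - \mathbf{H}\,\hat{\mathbf{x}}(n\,|\,n-1) \\
\mathbf{K}(n) &= \mathbf{P}(n\,|\,n-1)\,\mathbf{H}^{T}\big(\mathbf{H}\,\mathbf{P}(n\,|\,n-1)\,\mathbf{H}^{T} + \mathbf{R}\big)^{-1} \\
\hat{\mathbf{x}}(n\,|\,n) &= \hat{\mathbf{x}}(n\,|\,n-1) + \mathbf{K}(n)\,\mathbf{e}(n) \\
\mathbf{P}(n\,|\,n) &= \big(\mathbf{I} - \mathbf{K}(n)\,\mathbf{H}\big)\,\mathbf{P}(n\,|\,n-1)
\end{aligned}
$$

Each line maps onto the description above:

- e(n) is the innovation.
- K(n)e(n) is the correction factor added to the initial prediction.
- The last line turns the a priori covariance P(n|n-1) into the a posteriori covariance P(n|n).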
3. PUNJABI LANGUAGE PHONEMES
Phonemes are the smallest segmental units of sound that form contrasts between utterances (Phonemes, n.d).
The Punjabi language has 38 consonants, 10 non-nasal vowels and 10 nasal vowels, as shown in Figure 1 (Vivek and Meenakshi, 2013):
Figure 1. Punjabi Consonants and Vowels
Consonants are further divided into aspirated and non-aspirated consonants (Phonemes, n.d). Aspirated consonants have the sounds (h, B, P, T, J, C, D, K, d, G, Q), whereas non-aspirated consonants (p, b, q, t, s, j, c, h, d, r, V, S, g, l, n, x, v, X) have a single-character sound.
The ten non-nasal vowels are divided into two forms, i.e. independent vowels (A, Aw, au, aU, ie, eI, AY, a o, AO) and dependent vowels (w, i, I, u, U, y, Y, o, O). There are three nasal symbols (N, M, `) that produce a double sound, and three paireens (h, v, r).
4. METHODOLOGY
Step 1. Input: Word-level input is fed into the system. This can be done by recording the word with a microphone.
Step 2. Phoneme extraction: Words are broken into phonemes. This is done with the help of Sound Forge 5.0.
Step 3. Add noise: Different types of noise are added. Random noise is generated in Matlab (7.12) with the same length as the signal (phoneme). Other types of noise, such as cars, aircraft, household, bells and water, were also added after truncating them to the length of the speech (phoneme).
Step 4. DSP techniques: Techniques such as digital filtering, blocking into frames, windowing, noise detection, SNR (Signal-to-Noise Ratio) and FFT (Fast Fourier Transform) are applied before the filtering methods.
Step 5. Filtering methods: The methods explained
above in Section 2 are used and then the results are
computed and compared.
Step 6. Output: Enhanced speech.
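The framing, windowing and noise-detection operations of Step 4 can be sketched in Python as below. The paper does not state its settings or its noise-detection rule, so the following are assumptions chosen for illustration:

- 25 ms frames with a 12.5 ms hop at 8 kHz;
- a Hamming window;
- a simple energy threshold of -30 dB relative to the loudest frame.

```python
import math
import random

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame_len):
    """Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def detect_noise_frames(frames, threshold_db=-30.0):
    """Flag frames whose energy lies more than |threshold_db| below the loudest frame.

    A crude energy-based detector (assumed): frames flagged True are treated as
    noise-only pauses from which a noise spectrum could be estimated.
    """
    energies = [sum(s * s for s in f) + 1e-12 for f in frames]
    peak = max(energies)
    return [10 * math.log10(e / peak) < threshold_db for e in energies]

# Hypothetical 8 kHz test signal: low-level noise, with a 440 Hz tone in the middle third.
fs = 8000
rng = random.Random(1)
sig = []
for n in range(2400):
    tone = math.sin(2 * math.pi * 440 * n / fs) if 800 <= n < 1600 else 0.0
    sig.append(tone + 0.001 * rng.gauss(0.0, 1.0))

frame_len, hop = 200, 100                  # 25 ms frames, 12.5 ms hop at 8 kHz
win = hamming(frame_len)
frames = [[s * w for s, w in zip(f, win)] for f in frame_signal(sig, frame_len, hop)]
flags = detect_noise_frames(frames)
print(len(frames), flags)
```

Frames lying entirely in the noise-only stretches are flagged, and frames containing the tone are not. A real system would need a more robust detector, because speech energy varies widely between phonemes.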
5. RESULTS AND DISCUSSION
Different types of noises were used along with different
levels of SNR (Signal to Noise Ratio).
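The paper does not spell out its SNR formula. Assuming the usual global definition, in the notation of equation (1), the SNR in decibels is

$$
\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{\sum_{n} x^{2}(n)}{\sum_{n} p^{2}(n)}
$$

At 20 dB, for example, the clean-signal energy is 10^(20/10) = 100 times the noise energy. At 40 dB it is 10^(40/10) = 10,000 times. The 20 dB to 40 dB range tested here therefore spans a factor of 100 in noise energy.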
The test for random noise generated in Matlab was also done at different SNR values. Throughout the development of the algorithms, tests were continuously carried out to verify that the filters were operating as required. In these tests the developer listened to the filtered speech, viewed the spectrogram, and examined graphs of the speech signals that had gone through the filters. This helped to track the progress of the filtering methods. Once the algorithms were working, they were set up so that the SNR of the signal could be changed. Users could then choose their own SNR value and run the filters to see how well they functioned under different levels of noise in the speech signals.
The test itself consisted of different speech samples, each of which was tested at SNR values ranging from 20 dB to 40 dB. Most tests were held in a relaxed atmosphere at the PC, using either headphones or speakers. First, the phoneme is selected; afterwards, a noise is chosen and added to the phoneme. The selected phoneme was extracted from the recorded word using Sound Forge 5.0. The lengths of the phoneme and the noise were then made equal by truncation in Matlab, using equation 4:
$$\mathrm{Len} = \min\big(\mathrm{length}(x),\ \mathrm{length}(p)\big) \qquad (4)$$
where 'Len' stores the minimum of the lengths of the clean signal x and the noise signal p. The noisy signal is then computed with the addition operator in Matlab, as shown in equation 5. Here (1:Len) shortens both the clean and the noise signal to 'Len' samples:
$$s(1{:}\mathrm{Len}) = x(1{:}\mathrm{Len}) + p(1{:}\mathrm{Len}) \qquad (5)$$
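Equations (4) and (5) can be mirrored in Python as below, where the slice `[:length]` plays the role of Matlab's (1:Len). Scaling the noise to a requested SNR is an assumption: it is one common way to "change the value of the SNR", and the paper does not say how this was done. The function names and test signals are hypothetical.

```python
import math
import random

def add_noise_at_snr(clean, noise, snr_db):
    """Truncate both signals to Len = min(lengths), scale the noise to the requested
    SNR, and add it: equations (4) and (5) plus an assumed SNR-scaling step."""
    length = min(len(clean), len(noise))          # equation (4)
    x, p = clean[:length], noise[:length]         # Python [:Len] ~ Matlab (1:Len)
    px = sum(v * v for v in x)
    pp = sum(v * v for v in p)
    # Gain g chosen so that 10*log10(px / (g^2 * pp)) equals snr_db.
    g = math.sqrt(px / (pp * 10 ** (snr_db / 10)))
    return [xv + g * pv for xv, pv in zip(x, p)], g   # equation (5), with scaled noise

def snr_db(clean, noisy):
    """Measured SNR of a noisy signal against its clean reference."""
    residual = [nv - xv for xv, nv in zip(clean, noisy)]
    return 10 * math.log10(sum(v * v for v in clean) / sum(v * v for v in residual))

rng = random.Random(42)
phoneme = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1200)]   # hypothetical phoneme
noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]                      # longer noise recording
for target in (20, 30, 40):
    noisy, g = add_noise_at_snr(phoneme, noise, target)
    print(target, len(noisy), round(snr_db(phoneme[:len(noisy)], noisy), 6))
```

The longer noise recording is cut to the phoneme's 1200 samples, and the measured SNR matches each requested value to rounding error.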
In the labelling of each figure, SS, WF, RF, KF and PM denote the Spectral Subtraction method, Wiener Filtering method, RASTA Filtering method, Kalman Filtering method and the Proposed Method respectively.
The graph of the original signal, i.e. the phoneme (ey), is shown in Figure 2.
Figure 2. Original Signal
5.1. Graph of random noise for each method at different values of SNR
The graphs are plotted in Matlab 7.12 using the function 'plot', which has the syntax:
plot(X,Y); (6)
This creates a 2-D line plot of the data in Y versus the corresponding values in X. X and Y can both be vectors of equal length, both be matrices of equal size, or one can be a vector and the other a matrix with a dimension matching the vector's length.
5.1.1. Comparison at SNR 20.0 dB
Figure 3. (a) SS (b) WF (c) RF (d) KF (e) PM at 20 dB showing clean signal (blue), noisy signal (red) and filtered signal (green).
5.1.2. Comparison at SNR 30.0 dB
Figure 4. (a) SS (b) WF (c) RF (d) KF (e) PM at 30 dB showing clean signal (blue), noisy signal (red) and filtered signal (green).
5.1.3. Comparison at SNR 40.0 dB
Figure 5. (a) SS (b) WF (c) RF (d) KF (e) PM at 40 dB showing clean signal (blue), noisy signal (red) and filtered signal (green).
5.2. Graph of birds005.wav noise for each method at different values of SNR
The previous graphs clearly show that the Proposed method produces the best result compared with the other filters, and that each filter performs better as the SNR increases. Apart from random noise, other types of noise were also introduced, which were truncated to the length of the phoneme. The results for each filter at SNRs ranging from 20 dB to 40 dB with the noise birds005.wav are shown below:
5.2.1. Comparison at SNR 20.0 dB
Figure 6. (a) SS (b) WF (c) RF (d) KF (e) PM at 20 dB showing clean signal (blue), noisy signal (red) and filtered signal (green).
5.2.2. Comparison at SNR 30.0 dB
Figure 7. (a) SS (b) WF (c) RF (d) KF (e) PM at 30 dB showing clean signal (blue), noisy signal (red) and filtered signal (green).
5.2.3. Comparison at SNR 40.0 dB
Figure 8. (a) SS (b) WF (c) RF (d) KF (e) PM at 40 dB showing clean signal (blue), noisy signal (red) and filtered signal (green).
5.3. Spectrogram of birds005.wav noise for each method
at different values of SNR
Another method used to compare the different filters is the spectrogram. It is a visual representation of the spectrum of frequencies in a sound or other signal as they vary with time or some other variable. Spectrograms are also commonly referred to as spectral waterfalls, voiceprints or voicegrams (Spectrogram, n.d). Spectrograms can be used to identify spoken words phonetically. More broadly, they are used extensively in fields such as music, sonar, radar, speech processing and seismology (Spectrogram, n.d).
5.3.1. Comparison at SNR 20.0 dB
Figure 9. (a) Original Signal (b) SS (c) WF (d) RF (e) KF (f) PM at 20 dB
5.3.2. Comparison at SNR 30.0 dB
Figure 10. (a) Original Signal (b) SS (c) WF (d) RF (e) KF (f) PM at 30 dB
5.3.3. Comparison at SNR 40.0 dB
Figure 11. (a) Original Signal (b) SS (c) WF (d) RF (e) KF
(f) PM at 40 dB
The spectrograms above show more precisely that each filter performs better as the SNR (Signal to Noise Ratio) increases. They also indicate that the Proposed method performs better even at low SNR values, as described in the earlier comparison.
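The paper does not describe how the spectrograms in Figures 9 to 11 were computed. A common approach, assumed here, takes the magnitude of a short-time Fourier transform. The signal is cut into overlapping windowed frames, and each frame's spectrum is converted to decibels. The minimal Python sketch below uses a Hann window and a naive DFT for readability; the frame length, hop and test tone are illustrative.

```python
import cmath
import math

def spectrogram_db(signal, frame_len=256, hop=128):
    """Return a list of frames; each frame lists magnitudes in dB for DFT bins 0..frame_len//2.

    Hann-windowed frames and a naive DFT are used for clarity; an FFT would be used in practice.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        row = []
        for k in range(frame_len // 2 + 1):
            bin_k = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(20 * math.log10(abs(bin_k) + 1e-12))   # small offset avoids log10(0)
        frames.append(row)
    return frames

fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]   # hypothetical 1 kHz test tone
spec = spectrogram_db(tone, frame_len=64, hop=32)                   # 125 Hz per bin, so 1 kHz is bin 8
peaks = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), set(peaks))
```

Each row of `spec` is one time slice, and plotting the rows side by side as an image gives the familiar spectrogram. Here every slice peaks at bin 8 (1 kHz), as expected for a steady tone.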
6. CONCLUSION
After studying and comparing all the filtering techniques, it is clear that the proposed method gives better results for random noise as well as for the other noises that were recorded and used in these methods.
The discussion in the previous section also makes clear that, even at low SNR values, the results of the Proposed method are better than those of the other four filters. Table 1 shows ratings from 1 to 5, corresponding to very poor, poor, bad, good and very good respectively.
In Table 1, the five methods are labelled SS, WF, RF, KF and PM, denoting the Spectral Subtraction method, Wiener Filtering method, RASTA Filtering method, Kalman Filtering method and the proposed method respectively. The ratings are based on the results computed and the comparisons shown in the previous section.
Table 1: Rating for each method based on the testing results
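| Noise type (.wav) | SS | WF | RF | KF | PM |
|---|---|---|---|---|---|
| Randn | 2 | 4 | 4 | 3 | 5 |
| cars002.wav | 2 | 4 | 4 | 3 | 5 |
| household018.wav | 3 | 3 | 4 | 3 | 5 |
| aircraft003.wav | 3 | 4 | 4 | 2 | 5 |
| animals006.wav | 2 | 3 | 4 | 2 | 4 |
| birds005.wav | 2 | 3 | 3 | 2 | 4 |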
REFERENCES
[1] P. Krishnamoorthy, S. R. Mahadeva Prasanna (2010), Temporal and Spectral Processing Methods for Processing of Degraded Speech: A Review.
[2] Paul Coffey (2009), Enhancement of Speech in Noisy Condition, Project Report, National University of Ireland, B.E. Electronic Engineering.
[3] Phonemes (n.d), Available from: https://www.princeton.edu/~achaney/tmve/wiki100k/docs/Phoneme.html
[4] S. Gannot, D. Burshtein, E. Weinstein (1998), Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms, IEEE Transactions on Speech and Audio Processing, vol. 6, no. 4, pp. 373-385.
[5] Saeed V. Vaseghi (2005), Advanced Digital Signal Processing and Noise Reduction, Third edition.
[6] Spectrogram (n.d), Available from: en.wikipedia.org/wiki/Spectrogram.
[7] Urmila Shrawankar, Vilas Thakare (n.d), Techniques for Feature Extraction in Speech Recognition System: A Comparative Study, Available from: arxiv.org/ftp/arxiv/papers/1305/1305.1145.pdf
[8] Vivek Sharma, Meenakshi Sharma (2013), A Quantitative Study of the Automatic Speech Recognition Technique, International Journal of Advances in Science and Technology, vol. 1, issue 1.