A Novel Parallel Model Method for Noise Speech Recognition
ZHANG Mingxin1,2, CHEN Guoping1,2, NI Hong1, ZHANG Dongbin1
1 (Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080, China)
2 (Graduate School of the Chinese Academy of Sciences, Beijing 100039, China)
Abstract ─ In noise robust speech recognition, the parallel model combination (PMC) method is suitable for non-stationary environmental noise, and in theory the performance of the combined model can approach that of a model matched to the noisy environment, so PMC is an important and active research direction in noise robust speech recognition. In this paper a new feature, MFCC_FWD_BWD, based on forward-backward difference dynamic parameters, is presented to make PMC simpler and more direct. On this basis, a novel parallel sub-state hidden Markov model (PSSHMM), whose topology differs from that of the standard hidden Markov model (HMM), is also presented for PMC; in PSSHMM each state has parallel sub-states with transitions among them. In experiments, PSSHMM with the MFCC_FWD_BWD feature achieves good results under every noise type and SNR, and its robustness is especially good for non-stationary noise.
Key words ─ Parallel model, Speech recognition, Noise robustness, PMC
1 Introduction
The recognition rate of LVCSR (large vocabulary continuous speech recognition) systems has reached a fairly high level in laboratory environments. However, when such a system works in a noisy environment, its performance degrades seriously. This degradation has greatly hindered the application of LVCSR, so robust speech recognition in noisy environments is becoming increasingly important.
Current research on noise robust speech recognition mainly follows three approaches. First, robust feature representations are used, such as relative spectral processing (RASTA) [1], perceptual linear prediction (PLP) and cepstral mean normalization (CMN). Second, the test speech features are modified so that they better match the conditions of the pre-trained recognition model; methods based on spectral subtraction [2] and speech enhancement belong to this category. Third, compensation is applied to the pre-trained model to match the noisy background; such model-based compensation schemes include parallel model combination (PMC) [3][4], maximum likelihood linear regression (MLLR), etc. Because PMC is suitable for non-stationary noise and the performance of the combined model, without retraining, can approximate that of a matched model trained on noisy speech from the corresponding environment, it has attracted great attention. PMC is the subject of this paper.
In this paper, the basic PMC method is introduced first. Then the new feature MFCC_FWD_BWD, whose dynamic parameters are based on forward and backward differences, is described. Next, PSSHMM is presented as the model for PMC-based noise robust speech recognition, and the model parameter combination algorithm is explained. Finally, the evaluation and conclusions are given.
2 Basic PMC Method
When an LVCSR system works in an additive noise environment, the matched model would be one retrained on noisy speech, either sampled in that environment or obtained by adding the noise to clean speech in the time-domain waveform. Such a model has the best performance in that noise environment. However, retraining a matched model online for an arbitrary environment is impractical because of its great computational cost. Fortunately, the PMC method does not need retraining: it assumes that the clean speech model contains enough information about the speech and the noise model contains enough information about the noise, so the two models can be combined to match the noisy background [4].
In this paper, the speech model is a standard HMM. The noise model is a full-state-transition model with single Gaussian components, obtained by clustering noise feature vectors; unlike the HMM speech model, it has no starting and ending states.
In order to describe the effects of the noise on the clean speech, a series of assumptions are required, as shown in the following [3].
1) Speech and noise are independent.
2) Speech and noise are additive in the time domain.
3) A single-Gaussian or multiple-Gaussian-component model contains sufficient information to represent the distribution of the observation feature vectors in the cepstral or log-spectral domain.
4) The frame/state alignment used to generate the speech models from the clean speech data is not altered by the addition of noise.
Under the above assumptions, speech and noise are treated as additive in the power spectral domain [5]. As the features used in recognition are usually in the cepstral domain, the model parameters of speech and noise must be transformed to the spectral domain; after model combination, the combined model is transformed back to the cepstral domain. The procedure is shown in Fig.1.
Fig.1 Parallel model combination procedure (cepstral parameters of the speech and noise models are each mapped by $\mathbf{C}^{-1}$ and $\exp\{\cdot\}$ to the linear spectral domain, combined by PMC, then mapped back by $\log\{\cdot\}$ and $\mathbf{C}$)
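To make the procedure of Fig.1 concrete, the minimal sketch below shows how a cepstral mean vector could be mapped to the linear spectral domain and back with a DCT matrix. This is our own illustration, not the authors' implementation: the use of a truncated DCT-II, the pseudo-inverse in place of $\mathbf{C}^{-1}$, the dimensions and all names are assumptions.

```python
import numpy as np

def dct_matrix(n_ceps, n_chan):
    """Truncated DCT-II matrix C mapping n_chan log filter-bank channels
    to n_ceps cepstral coefficients (assumed, HTK-style form)."""
    k = np.arange(n_ceps)[:, None]
    j = np.arange(n_chan)[None, :]
    return np.sqrt(2.0 / n_chan) * np.cos(np.pi * k * (j + 0.5) / n_chan)

def cepstral_to_linear(mu_c, C):
    """Cepstral mean -> log-spectral mean (pseudo-inverse of the truncated C)
    -> linear spectral mean (exp{.}), following the left half of Fig.1."""
    mu_log = np.linalg.pinv(C) @ mu_c
    return np.exp(mu_log)

def linear_to_cepstral(mu_lin, C):
    """Linear spectral mean -> log{.} -> cepstral mean (C), right half of Fig.1."""
    return C @ np.log(mu_lin)

# Toy usage: combine one speech mean and one noise mean as in Fig.1.
C = dct_matrix(13, 24)
mu_speech_c = np.random.randn(13)   # placeholder cepstral means
mu_noise_c = np.random.randn(13)
combined_lin = cepstral_to_linear(mu_speech_c, C) + cepstral_to_linear(mu_noise_c, C)
mu_combined_c = linear_to_cepstral(combined_lin, C)
```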
3 Feature Vector Construction Method for PMC
The PMC method requires that the feature vector used in recognition can regenerate the raw parameter vectors that will be combined in the power spectral domain [3]. This requirement is especially important for the dynamic parameters in the feature vector. For this reason, we present a novel feature vector, MFCC_FWD_BWD. Its static part is the same as that of MFCC_D_A, while its dynamic part uses forward and backward difference parameters in place of the delta and acceleration parameters. The MFCC_FWD_BWD feature is constructed as follows:
$$\mathbf{O}^c_{NFVec}(\tau) = \left[\, \mathbf{O}^c(\tau)^T \;\; \Delta\mathbf{O}^c_{Fw}(\tau)^T \;\; \Delta\mathbf{O}^c_{Bw}(\tau)^T \,\right]^T \qquad (1)$$
where $\Delta\mathbf{O}^c_{Fw}(\tau) = \mathbf{O}^c(\tau + w_{FB}) - \mathbf{O}^c(\tau)$ is the forward difference part and $\Delta\mathbf{O}^c_{Bw}(\tau) = \mathbf{O}^c(\tau) - \mathbf{O}^c(\tau - w_{FB})$ is the backward difference part. In matrix form,
$$\mathbf{O}^c_{NFVec}(\tau) =
\begin{bmatrix} \mathbf{O}^c(\tau) \\ \Delta\mathbf{O}^c_{Fw}(\tau) \\ \Delta\mathbf{O}^c_{Bw}(\tau) \end{bmatrix} =
\begin{bmatrix} \mathbf{0} & \mathbf{I} & \mathbf{0} \\ \mathbf{0} & -\mathbf{I} & \mathbf{I} \\ -\mathbf{I} & \mathbf{I} & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \mathbf{O}^c(\tau - w_{FB}) \\ \mathbf{O}^c(\tau) \\ \mathbf{O}^c(\tau + w_{FB}) \end{bmatrix}
= \mathbf{A}_N \, \mathbf{O}^c_{NTVec}(\tau). \qquad (2)$$
However, MFCC_D_A is constructed as
$$\mathbf{O}^c_{FVec}(\tau) =
\begin{bmatrix} \mathbf{O}^c(\tau) \\ \Delta\mathbf{O}^c(\tau) \\ \Delta^2\mathbf{O}^c(\tau) \end{bmatrix} =
\begin{bmatrix} \mathbf{0} & \mathbf{0} & \mathbf{I} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & -\mathbf{I} & \mathbf{0} & \mathbf{I} & \mathbf{0} \\ \mathbf{I} & \mathbf{0} & -2\mathbf{I} & \mathbf{0} & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{O}^c(\tau - 2w) \\ \mathbf{O}^c(\tau - w) \\ \mathbf{O}^c(\tau) \\ \mathbf{O}^c(\tau + w) \\ \mathbf{O}^c(\tau + 2w) \end{bmatrix}
= \mathbf{A} \, \mathbf{O}^c_{TVec}(\tau). \qquad (3)$$
Comparing formulas (2) and (3), we can see that the MFCC_FWD_BWD construction matrix $\mathbf{A}_N$ is square and invertible, so the static time series $\mathbf{O}^c_{NTVec}(\tau)$ can be recovered from the feature vector $\mathbf{O}^c_{NFVec}(\tau)$. In contrast, the MFCC_D_A construction matrix $\mathbf{A}$ is not invertible, so $\mathbf{O}^c_{TVec}(\tau)$ cannot be recovered from $\mathbf{O}^c_{FVec}(\tau)$. An invertible construction matrix is necessary for PMC.
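As a sanity check on this claim, the sketch below (our own illustration; the block layout follows equation (2) as reconstructed above, and the dimension and variable names are assumptions) builds $\mathbf{A}_N$, verifies that it is invertible, and recovers the static time series from an MFCC_FWD_BWD vector.

```python
import numpy as np

def build_A_N(dim):
    """Block matrix A_N of equation (2): its rows produce the static, forward
    difference and backward difference parts from [O(t-w); O(t); O(t+w)]."""
    I, Z = np.eye(dim), np.zeros((dim, dim))
    return np.block([
        [Z,  I, Z],    # static part O(t)
        [Z, -I, I],    # forward difference  O(t+w) - O(t)
        [-I, I, Z],    # backward difference O(t) - O(t-w)
    ])

dim = 13                                     # static cepstral dimension (assumed)
A_N = build_A_N(dim)
assert np.linalg.matrix_rank(A_N) == 3 * dim  # A_N is invertible

# Recover the static time series from an MFCC_FWD_BWD feature vector.
o_time = np.random.randn(3 * dim)            # stacked [O(t-w); O(t); O(t+w)]
o_feat = A_N @ o_time                        # MFCC_FWD_BWD vector, equation (2)
o_recovered = np.linalg.solve(A_N, o_feat)
assert np.allclose(o_recovered, o_time)
```

The analogous check on the MFCC_D_A matrix of equation (3) fails because that matrix maps five static blocks onto three output blocks and therefore has no inverse.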
4 Parallel Sub-State HMM for PMC
4.1 Speech model and noise model used in PMC
In our system, the clean speech model is the commonly used standard HMM, a finite state machine that is well suited to describing the speech generation process. The topology of the HMM is shown in Fig.2. An HMM can be characterized by the following three important parameters:
$N_S$, the number of states in the model;
$\mathbf{T}_S$, the state-transition probability matrix;
$b_j(\mathbf{o}_t)$, $j = 2, \ldots, N_S - 1$, the output observation probability distributions.
In the model, both the starting state and the ending state are non-emitting states, which are used to connect HMMs. Here $b_j(\mathbf{o}_t)$ is usually described by one or more Gaussian components, i.e. $b_j(\mathbf{o}_t) = \sum_{m=1}^{M_s} c_{jm} N(\mathbf{o}_t; \boldsymbol{\mu}_{jm}, \mathbf{\Sigma}_{jm})$, where $M_s \ge 1$ (for convenience of explanation we let $M_s = 1$ in the following, so $b_j(\mathbf{o}_t) = N(\mathbf{o}_t; \boldsymbol{\mu}_j, \mathbf{\Sigma}_j)$).
The noise model in this paper is defined as a full-state-transition model, whose topology is shown in Fig.3. It is composed of several states whose parameters are obtained by clustering background noise features. The noise model can also be characterized by three important parameters:
$N_{Noi}$, the number of states in the noise model;
$\mathbf{T}_{Noi}$, the full-state-transition probability matrix of the noise model;
$b_k(\mathbf{o}_t)$, $k = 1, \ldots, N_{Noi}$, the output observation probability distributions,
where $b_k(\mathbf{o}_t)$ is described by a single Gaussian component, i.e. $b_k(\mathbf{o}_t) = N(\mathbf{o}_t; \tilde{\boldsymbol{\mu}}_k, \tilde{\mathbf{\Sigma}}_k)$.
Fig.2 HMM topology
Fig.3 Noise model topology
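The paper states only that the noise states are obtained by clustering background noise features. One plausible realization, sketched below entirely under our own assumptions (a simple k-means clustering, diagonal covariances, and a bigram estimate of the full transition matrix), illustrates how such a model could be trained from a matrix of noise feature frames.

```python
import numpy as np

def train_noise_model(noise_feats, n_states, n_iter=20, seed=0):
    """Cluster background-noise feature vectors (T x D) into n_states
    single-Gaussian states and estimate a full transition matrix from the
    frame-by-frame cluster labels (hypothetical k-means sketch)."""
    rng = np.random.default_rng(seed)
    T, D = noise_feats.shape
    means = noise_feats[rng.choice(T, size=n_states, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every frame to its nearest state mean.
        dists = ((noise_feats[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for k in range(n_states):
            if np.any(labels == k):
                means[k] = noise_feats[labels == k].mean(axis=0)
    # One diagonal-covariance Gaussian per state (variance floored for stability).
    variances = np.stack([
        noise_feats[labels == k].var(axis=0) + 1e-6 if np.any(labels == k)
        else np.ones(D)
        for k in range(n_states)
    ])
    # Full state-transition matrix estimated from label bigrams, with smoothing.
    trans = np.full((n_states, n_states), 1e-3)
    for prev, cur in zip(labels[:-1], labels[1:]):
        trans[prev, cur] += 1.0
    trans /= trans.sum(axis=1, keepdims=True)
    return means, variances, trans
```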
4.2 PSSHMM
PMC combines the clean speech model and the noise model to obtain the matched model. In this subsection, the proposed PSSHMM used as the combined matched model is described in detail. The topology of PSSHMM is shown in Fig.4. PSSHMM is a composite HMM in which each state has several parallel sub-states. These sub-states are generated by combining the corresponding clean speech state with each noise state. In PSSHMM there are two kinds of transition: one is the transition of the global HMM, as shown in Fig.5; the other is the transition among the sub-states, which follows the noise state-transition matrix. Viewed as a time-synchronously expanded state series, the sub-states are arranged in parallel and at each time point only one sub-state emits an observation, as shown in Fig.6. It can also be seen that transitions among sub-states occur between consecutive time-synchronous states.
Fig.4 PSSHMM topology
Fig.5 PSSHMM as a composite HMM
Fig.6 PSSHMM time-synchronously expanded state series
The PSSHMM can be described using the following five parameters:
$N_{pm}$, the number of states in the model;
$\mathbf{T}_{pm}$, the state-transition probability matrix;
$D_{sub}$, the number of sub-states in each model state;
$\mathbf{T}_{sub}$, the sub-state-transition probability matrix;
$b_{jk}(\mathbf{o}_t)$, $j = 2, \ldots, N_{pm}-1$, $k = 1, \ldots, D_{sub}$, the output observation probability distribution of each sub-state.
Here $b_{jk}(\mathbf{o}_t)$ is described by a single Gaussian component, i.e. $b_{jk}(\mathbf{o}_t) = N(\mathbf{o}_t; \hat{\boldsymbol{\mu}}_{jk}, \hat{\mathbf{\Sigma}}_{jk})$, where $\hat{\boldsymbol{\mu}}_{jk}$ and $\hat{\mathbf{\Sigma}}_{jk}$ are obtained by the parallel model combination algorithm discussed in the following section.
It should be emphasized that the output probability of a parallel model state is related to that of its sub-states. The relation can be described by $b_j(\mathbf{o}_t) = \max_k \{\, b_{jk}(\mathbf{o}_t) \cdot P(k \mid k_{t-1}) \,\}$, which directly affects recognition decoding, where $k_{t-1}$ is the previous optimal sub-state label and $P(k \mid k_{t-1})$ is the sub-state-transition probability, i.e. $P(k \mid k_{t-1}) = \mathbf{T}_{sub}[k_{t-1}, k]$.
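The sketch below shows how this relation could be evaluated at decoding time. It is a minimal illustration under our own assumptions (diagonal-covariance Gaussians, log-domain scores for numerical stability, and hypothetical variable names), not the authors' decoder.

```python
import numpy as np

def log_gauss(o, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mean) ** 2 / var)

def psshmm_state_logprob(o_t, sub_means, sub_vars, T_sub, k_prev):
    """Log output probability of one PSSHMM state:
    log b_j(o_t) = max_k [ log b_jk(o_t) + log T_sub[k_prev, k] ].
    Also returns the winning sub-state label, used as k_prev at the next frame."""
    n_sub = sub_means.shape[0]
    scores = np.array([
        log_gauss(o_t, sub_means[k], sub_vars[k]) + np.log(T_sub[k_prev, k])
        for k in range(n_sub)
    ])
    k_best = int(scores.argmax())
    return scores[k_best], k_best
```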
5 Parallel Model Parameter Combination Algorithm
Using the MFCC_FWD_BWD feature, we combine the clean speech model and the noise model to obtain a parallel model that matches the noisy environment. In this paper the log-add algorithm [4] is used for the model parameter combination. The log-add algorithm combines only the means and leaves the variances unchanged. It is assumed that a clean speech model state is described by a Gaussian component $N(\,[\boldsymbol{\mu}^{c}\;\Delta\boldsymbol{\mu}^{c}_{Fw}\;\Delta\boldsymbol{\mu}^{c}_{Bw}],\,[\mathbf{\Sigma}^{c}\;\Delta\mathbf{\Sigma}^{c}_{Fw}\;\Delta\mathbf{\Sigma}^{c}_{Bw}]\,)$ and a noise model state by a Gaussian component $N(\,[\tilde{\boldsymbol{\mu}}^{c}\;\Delta\tilde{\boldsymbol{\mu}}^{c}_{Fw}\;\Delta\tilde{\boldsymbol{\mu}}^{c}_{Bw}],\,[\tilde{\mathbf{\Sigma}}^{c}\;\Delta\tilde{\mathbf{\Sigma}}^{c}_{Fw}\;\Delta\tilde{\mathbf{\Sigma}}^{c}_{Bw}]\,)$. The parameter combination steps are:
1) Transform the clean speech model mean parameters from MFCC_FWD_BWD to static time series parameters in the cepstral domain, i.e.
$$\left[\, \boldsymbol{\mu}^{cT}_{\tau-w} \;\; \boldsymbol{\mu}^{cT}_{\tau} \;\; \boldsymbol{\mu}^{cT}_{\tau+w} \,\right]^T = \mathbf{A}_N^{-1} \left[\, \boldsymbol{\mu}^{cT} \;\; \Delta\boldsymbol{\mu}^{cT}_{Fw} \;\; \Delta\boldsymbol{\mu}^{cT}_{Bw} \,\right]^T \qquad (4)$$
The same applies to the noise model:
$$\left[\, \tilde{\boldsymbol{\mu}}^{cT}_{\tau-w} \;\; \tilde{\boldsymbol{\mu}}^{cT}_{\tau} \;\; \tilde{\boldsymbol{\mu}}^{cT}_{\tau+w} \,\right]^T = \mathbf{A}_N^{-1} \left[\, \tilde{\boldsymbol{\mu}}^{cT} \;\; \Delta\tilde{\boldsymbol{\mu}}^{cT}_{Fw} \;\; \Delta\tilde{\boldsymbol{\mu}}^{cT}_{Bw} \,\right]^T \qquad (5)$$
2) Using the IDCT, transform the static time series parameters from the cepstral domain to the log-spectral domain, i.e.
$$\left[\, \boldsymbol{\mu}^{lT}_{\tau-w} \;\; \boldsymbol{\mu}^{lT}_{\tau} \;\; \boldsymbol{\mu}^{lT}_{\tau+w} \,\right]^T = \left[\, (\mathbf{C}^{-1}\boldsymbol{\mu}^{c}_{\tau-w})^T \;\; (\mathbf{C}^{-1}\boldsymbol{\mu}^{c}_{\tau})^T \;\; (\mathbf{C}^{-1}\boldsymbol{\mu}^{c}_{\tau+w})^T \,\right]^T \qquad (6)$$
$$\left[\, \tilde{\boldsymbol{\mu}}^{lT}_{\tau-w} \;\; \tilde{\boldsymbol{\mu}}^{lT}_{\tau} \;\; \tilde{\boldsymbol{\mu}}^{lT}_{\tau+w} \,\right]^T = \left[\, (\mathbf{C}^{-1}\tilde{\boldsymbol{\mu}}^{c}_{\tau-w})^T \;\; (\mathbf{C}^{-1}\tilde{\boldsymbol{\mu}}^{c}_{\tau})^T \;\; (\mathbf{C}^{-1}\tilde{\boldsymbol{\mu}}^{c}_{\tau+w})^T \,\right]^T \qquad (7)$$
3) Combine the mean parameters of the clean speech model and the noise model using the log-add algorithm, i.e.
$$\hat{\boldsymbol{\mu}}^{l}_{\tau} = \log\{\exp\{\boldsymbol{\mu}^{l}_{\tau}\} + \exp\{\tilde{\boldsymbol{\mu}}^{l}_{\tau}\}\} \qquad (8)$$
$$\hat{\boldsymbol{\mu}}^{l}_{\tau+w} = \log\{\exp\{\boldsymbol{\mu}^{l}_{\tau+w}\} + \exp\{\tilde{\boldsymbol{\mu}}^{l}_{\tau+w}\}\} \qquad (9)$$
$$\hat{\boldsymbol{\mu}}^{l}_{\tau-w} = \log\{\exp\{\boldsymbol{\mu}^{l}_{\tau-w}\} + \exp\{\tilde{\boldsymbol{\mu}}^{l}_{\tau-w}\}\} \qquad (10)$$
4) Using the DCT, transform the combined model parameters from the log-spectral domain back to the cepstral domain, i.e.
$$\left[\, \hat{\boldsymbol{\mu}}^{cT}_{\tau-w} \;\; \hat{\boldsymbol{\mu}}^{cT}_{\tau} \;\; \hat{\boldsymbol{\mu}}^{cT}_{\tau+w} \,\right]^T = \left[\, (\mathbf{C}\hat{\boldsymbol{\mu}}^{l}_{\tau-w})^T \;\; (\mathbf{C}\hat{\boldsymbol{\mu}}^{l}_{\tau})^T \;\; (\mathbf{C}\hat{\boldsymbol{\mu}}^{l}_{\tau+w})^T \,\right]^T \qquad (11)$$
5) Transform the combined static time series parameters back to MFCC_FWD_BWD, i.e.
$$\left[\, \hat{\boldsymbol{\mu}}^{cT} \;\; \Delta\hat{\boldsymbol{\mu}}^{cT}_{Fw} \;\; \Delta\hat{\boldsymbol{\mu}}^{cT}_{Bw} \,\right]^T = \mathbf{A}_N \left[\, \hat{\boldsymbol{\mu}}^{cT}_{\tau-w} \;\; \hat{\boldsymbol{\mu}}^{cT}_{\tau} \;\; \hat{\boldsymbol{\mu}}^{cT}_{\tau+w} \,\right]^T \qquad (12)$$
Thus, the sub-state output observation probability components of the combined parallel model states, $b_{jk}(\mathbf{o}_t)$, $j = 2, \ldots, N_{pm}-1$, $k = 1, \ldots, D_{sub}$, can be calculated by combining each clean speech model state $b_j(\mathbf{o}_t)$ with each noise model state $b_k(\mathbf{o}_t)$, $k = 1, \ldots, N_{Noi}$, using the above log-add algorithm.
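A compact sketch of steps 1) to 5) for the means only, as in the log-add algorithm, is given below. This is our own illustration under stated assumptions: $\mathbf{A}_N$ and $\mathbf{C}$ can be built as in the earlier sketches, $\mathbf{C}^{-1}$ is approximated by a pseudo-inverse of the truncated DCT, and all helper names are hypothetical.

```python
import numpy as np

def log_add_combine(mu_speech_fb, mu_noise_fb, A_N, C):
    """Combine one clean-speech mean and one noise mean, both given as
    MFCC_FWD_BWD vectors [static; fwd diff; bwd diff], via steps 1)-5)."""
    C_inv = np.linalg.pinv(C)     # stands in for C^-1 of equations (6)-(7)
    A_inv = np.linalg.inv(A_N)

    # 1) MFCC_FWD_BWD -> static cepstral time series [mu(t-w); mu(t); mu(t+w)].
    ts_speech = A_inv @ mu_speech_fb
    ts_noise = A_inv @ mu_noise_fb

    dim = C.shape[0]              # number of static cepstral coefficients
    combined_blocks = []
    for i in range(3):            # blocks for t-w, t, t+w
        s_c = ts_speech[i * dim:(i + 1) * dim]
        n_c = ts_noise[i * dim:(i + 1) * dim]
        # 2) cepstral -> log-spectral domain (IDCT).
        s_l, n_l = C_inv @ s_c, C_inv @ n_c
        # 3) log-add combination of the means.
        comb_l = np.log(np.exp(s_l) + np.exp(n_l))
        # 4) log-spectral -> cepstral domain (DCT).
        combined_blocks.append(C @ comb_l)
    # 5) static time series -> MFCC_FWD_BWD of the combined model.
    return A_N @ np.concatenate(combined_blocks)
```

Applying this routine to every pair of a clean speech state mean and a noise state mean would yield the sub-state means $\hat{\boldsymbol{\mu}}_{jk}$ of the PSSHMM.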
6 Evaluation
Our experiment is based on HTK3.0 [6] speech recognition platform that has been improved
and changed for PMC. The acoustic models are context dependent single Gaussian component
model. The database used for clean speech was Mandarin 863 Speech Database. We select 9515
sentences of 16 female speakers as training set and 400 sentences of 4 speakers outside the
training set as testing set. The second database used is NOISEX92 as noise database. We select
four kinds of representative noise: babble, f16, machinegun and white. The four kinds of noise are
added to clean speech according to certain ratio to form four different SNR: 30db, 20db, 10db,
0db noisy speech for test. All of feature vector sizes are 39.
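For the noisy test-set construction, a common way to add noise at a target SNR (our own sketch; the paper does not specify its exact scaling procedure) is to scale the noise so that the speech-to-noise power ratio matches the desired value in dB.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Add noise to clean speech at a target SNR in dB, assuming both are
    1-D waveforms and the noise is at least as long as the speech."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

# Hypothetical usage: a 10 dB babble-noise version of one utterance.
# noisy = add_noise_at_snr(clean_waveform, babble_noise, snr_db=10)
```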
In the experiments, we first test the recognition performance on clean speech using the MFCC_D_A and MFCC_FWD_BWD features. As shown in Table 1, the word recognition accuracies of the two features are almost the same, differing by less than 0.5%. So the new feature MFCC_FWD_BWD simplifies the parameter combination procedure at the cost of only a slight decrease in recognition accuracy.
Table 1. Accuracy comparison of clean speech (MFCC_D_A vs MFCC_FWD_BWD)
Feature kind Acc (%)
MFCC_D_A 75.08
MFCC_FWD_BWD 74.65
In the noise robust experiments, the baseline system uses the MFCC_D_A feature and standard HMMs, while the PMC test system uses the MFCC_FWD_BWD feature and PSSHMM. In PMC, the number of noise model states is 3. To compare the performance of the new method, we also test the spectral subtraction method, which is widely used in noise robust speech recognition. The results are given in Tables 2-4.
Table 2. Baseline system performance
Baseline system (MFCC_D_A & HMM)
Acc(%) babble f16 machinegun white Avg.
30db 66.52 69.10 68.46 55.77 64.96
20db 47.80 51.34 61.47 18.92 44.88
10db 5.70 10.41 48.34 5.34 17.45
0db 0.36 0.00 33.46 0.00 8.46
Avg. 30.10 32.71 52.93 20.01 33.94
Table 3. Spectral subtraction method performance
Spectral subtraction (MFCC_D_A & HMM)
Acc (%) babble f16 machinegun white Avg.
30db 65.67 69.10 66.70 65.83 66.83
20db 51.62 57.42 58.47 34.97 50.62
10db 13.69 24.44 37.31 5.46 20.23
0db 0.00 1.67 11.62 2.11 3.85
Avg. 32.75 38.16 43.53 27.09 35.38
Table 4. PMC method performance
PMC (MFCC_FWD_BWD & PSSHMM)
Acc (%) babble f16 machinegun white Avg.
30db 76.08 74.63 75.08 67.87 73.42
20db 69.65 63.78 73.67 50.05 64.29
10db 46.65 31.63 71.47 21.66 42.85
0db 12.46 5.80 65.63 5.68 22.39
Avg. 51.21 43.96 71.46 36.32 50.74
From Table 2, it can be seen that the recognition accuracy of the baseline system, without any robustness processing, decreases sharply as the SNR falls. Table 3 shows that with spectral subtraction feature processing, the average recognition accuracy increases by 4.2% relative to the baseline system. Table 4 shows the performance of the PMC method using the MFCC_FWD_BWD feature and PSSHMM. It is clear that this method achieves excellent noise robustness: its recognition accuracy is far higher than those of the baseline system and the spectral subtraction method, with relative increases of 49.5% and 43.4% respectively.
Comparing the recognition accuracies for machinegun noise in Tables 2-4, we find that the spectral subtraction method fails to provide robustness; on the contrary, its accuracy decreases by 17.8% relative to the baseline system. The PMC method, however, shows excellent robustness, achieving a relative increase of 35.0% over the baseline system.
7 Conclusion
In this paper, the PMC method using the proposed MFCC_FWD_BWD feature and PSSHMM achieves excellent noise robustness for every noise type and SNR level. Its recognition accuracy improves by 49.5% relative to the baseline system and by 43.4% relative to the spectral subtraction method. For machinegun noise in particular, spectral subtraction yields no improvement, whereas PMC stands out with a 35.0% relative increase over the baseline system. Our planned future work is further research on the model parameter combination algorithm to improve recognition performance.
References
[1] B. E. D. Kingsbury, N. Morgan, Recognizing Reverberant Speech with RASTA-PLP, Proc. ICASSP-97, pp. 1259-1262, Munich, Germany, 1997.
[2] Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, et al., Robust Speech Recognition with Spectral Subtraction in Low SNR, Proc. ICSLP-04, pp. 2077-2080, Jeju Island, Korea, 2004.
[3] Mark J. F. Gales, Steve Young, Robust Continuous Speech Recognition Using Parallel Model Combination, IEEE Trans. Speech and Audio Processing, vol. 4, pp. 352-359, 1996.
[4] Jeih-weih Huang, Jia-lin Shen, Lin-shan Lee, New Approach for Domain Transformation and Parameter Combination for Improved Accuracy in Parallel Model Combination (PMC) Techniques, IEEE Trans. Speech and Audio Processing, vol. 9, pp. 842-855, 2001.
[5] Febe de Wet, Johan de Veth, Lou Boves, et al., Additive Background Noise as a Source of Non-Linear Mismatch in the Cepstral and Log-Energy Domain, Computer Speech and Language, vol. 19, pp. 31-54, 2005.
[6] Steve Young, Dan Kershaw, Julian Odell, et al., The HTK Book (for HTK v3.0), Cambridge University, 2000.