On the Use of Weighted Filter Bank Analysis for
the Derivation of Robust MFCCs
Wei-Wen Hung
(Member, IEEE)
Department of Electrical Engineering
Ming Chi Institute of Technology
84 Gungjuan Road, Taishan, Taipei, Taiwan, 24306, Republic of China
E-mail :wwhung@ccsun.mit.edu.tw
FAX : 886-02-2906-1780; Tel. : 886-02-2906-0379
and
Hsiao-Chuan Wang
(Senior Member, IEEE)
(Associate Editor of IEEE Transactions on Speech and Audio Processing)
Department of Electrical Engineering
National Tsing Hua University
Hsinchu, 30043, Taiwan, Republic of China
E-mail : hcwang@ee.nthu.edu.tw
FAX : 886-03-571-5971; Tel. : 886-03-574-2587
EDICS number : SPL.SA.1.6 Speech Recognition
Re : SPL-2145
Corresponding Author : Wei-Wen Hung
Abstract – In this paper, we discuss the use of weighted filter bank analysis (WFBA) to increase the
discriminating ability of mel frequency cepstral coefficients (MFCCs). The WFBA emphasizes the peak
structure of the log filter bank energies (LFBEs) obtained from filter bank analysis while attenuating the
components with lower energy in a simple, direct and effective way. Experimental results for recognition
of continuous Mandarin telephone speech indicate that the WFBA-based cepstral features are more
robust than those derived by employing the standard filter bank analysis and some widely used cepstral
liftering and frequency filtering schemes both in channel-distorted and noisy conditions.
Indexing Terms – Weighted filter bank analysis (WFBA), log filter bank energy (LFBE), mel frequency
cepstral coefficient (MFCC).
This research has been partially sponsored by the National Science Council, Taiwan, ROC, under
contract number NSC-89-2614-E-007-002.
LIST OF FIGURES AND TABLES
Fig. 1. Block diagram for the derivation of MFCCs based on the weighted filter bank analysis.
Fig. 2. F-ratio curves of mel frequency cepstral coefficients based on various schemes.
(A) For the 12-order cepstral coefficients.
(B) For the 12-order delta cepstral coefficients.
Fig. 3. Relationships between fuzzy factors and syllable recognition rates under different conditions.
Table I. COMPARISONS OF SYLLABLE RECOGNITION RATES FOR VARIOUS SCHEMES
UNDER DIFFERENT CONDITIONS.
I. INTRODUCTION
Filter bank analysis (FBA) is one of the most widely used spectral analysis techniques in speech
applications. It typically uses a bank of heavily overlapped band-pass filters, which roughly
approximates the frequency response of the basilar membrane in the cochlea, to cover the frequency
range of interest in a speech signal. The outputs of these band-pass filters can essentially be
treated as a short-time spectral envelope. This measured spectral envelope is prone to statistical
variation due to speaker characteristics, background noise, channel effects and limitations of the
underlying speech analysis model, which can make spectral comparisons unreliable. To suppress these
undesired variations and to obtain a more reliable distance measure, a cepstral liftering (CL)
scheme [1] has been developed to account for the
sensitivity of cepstral coefficients. In this regard, the weights $L(m)$ used in the liftering process
take advantage of the statistical characteristics of the cepstral coefficients, and the resulting
liftered distance measure is given by

$$d\big(C_{(CL)}, \tilde{C}_{(CL)}\big) = \sum_{m=1}^{L} \big[c_{m(CL)} - \tilde{c}_{m(CL)}\big]^2 = \sum_{m=1}^{L} \big[L(m) \cdot c_m - L(m) \cdot \tilde{c}_m\big]^2, \tag{1}$$

where $C_{(CL)} = [c_{1(CL)}, c_{2(CL)}, \cdots, c_{L(CL)}]$ and $\tilde{C}_{(CL)} = [\tilde{c}_{1(CL)}, \tilde{c}_{2(CL)}, \cdots, \tilde{c}_{L(CL)}]$ are two liftered cepstral
vectors. Various types of weighting functions including linear, sinusoidal, exponential, band-pass and
ramp lifters have been introduced in the literature.
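As a rough illustration of the liftered distance in (1), the following Python sketch applies a sinusoidal lifter to two cepstral vectors and computes their squared distance. The lifter form $L(m) = 1 + (L/2)\sin(\pi m / L)$ is the raised-sine lifter commonly associated with [1] and is an assumption here, as are the function names and the random test vectors; none of them are taken from this paper.

```python
import numpy as np

def sinusoidal_lifter(num_coeffs):
    """Raised-sine lifter L(m) = 1 + (L/2) * sin(pi * m / L), m = 1..L (assumed form)."""
    m = np.arange(1, num_coeffs + 1)
    return 1.0 + (num_coeffs / 2.0) * np.sin(np.pi * m / num_coeffs)

def liftered_distance(c, c_tilde, lifter):
    """Squared Euclidean distance between two liftered cepstral vectors, as in (1)."""
    c = np.asarray(c, dtype=float)
    c_tilde = np.asarray(c_tilde, dtype=float)
    diff = lifter * c - lifter * c_tilde
    return float(np.sum(diff ** 2))

# Hypothetical 12-order cepstral vectors, for illustration only.
rng = np.random.default_rng(0)
c = rng.normal(size=12)
c_tilde = rng.normal(size=12)
lifter = sinusoidal_lifter(12)
d = liftered_distance(c, c_tilde, lifter)
```

With a lifter of all ones the function reduces to the ordinary squared Euclidean cepstral distance, which makes it easy to compare liftered and unliftered measures on the same data.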
Besides the cepstral liftering scheme, Battle et al. [2] proposed an alternative that improves the
robustness of FBA-based speech features by filtering the frequency sequence of log filter bank energies
(LFBEs). This frequency filtering (FF) scheme not only approximately equalizes the variances of the
cepstral coefficients up to a certain quefrency index, but also decorrelates the log filter bank
energies to some extent. The filtering is accomplished by passing the sequence of log filter bank
energies through a finite impulse response (FIR) filter of the form

$$H(z) = \sum_{i} h_i \cdot z^{-i}. \tag{2}$$
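To make the frequency filtering idea concrete, here is a minimal Python sketch for the particular filter $H(z) = z - z^{-1}$, which is the FF configuration used in the experiments of Section III. Filtering along the band index gives $y(i) = s(i+1) - s(i-1)$ for a log filter bank energy sequence $s(i)$. Treating the values beyond the first and last bands as zero is an assumption of this sketch, and the function name and sample values are hypothetical.

```python
import numpy as np

def frequency_filter(lfbe):
    """Apply H(z) = z - z^{-1} along the band index: y(i) = s(i+1) - s(i-1)."""
    s = np.asarray(lfbe, dtype=float)
    padded = np.pad(s, 1)  # zeros beyond the first and last band (an assumption)
    return padded[2:] - padded[:-2]

# Made-up log filter bank energies for four bands.
lfbe = np.array([1.0, 2.0, 4.0, 7.0])
filtered = frequency_filter(lfbe)
```

The output has the same length as the input, so the filtered sequence can stand in for the original LFBEs in whatever processing follows.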
Although the aforementioned cepstral liftering and frequency filtering schemes have been widely used in
enhancing the robustness of cepstral features, there is still a need to investigate new approaches for
achieving better performance. Subsequently, we shall introduce a new weighted filter bank analysis
(WFBA) scheme which results in a set of discriminating cepstral features in a simple, direct and effective
way while maintaining a relatively low computation cost.
II. WEIGHTED FILTER BANK ANALYSIS SCHEME
Assuming that $x(n)$ represents a frame of a speech signal that has been pre-emphasized and
Hamming-windowed, the derivation of conventional mel frequency cepstral coefficients (MFCCs)
proceeds as follows. First, the speech frame $x(n)$, where $1 \le n \le N$, is transformed from the time
domain into the frequency domain by applying an $N$-point short-time Fourier transform (STFT), and the
resulting power spectrum $|X(k)|^2$ can be formulated as

$$|X(k)|^2 = \left| \sum_{n=1}^{N} x(n) \cdot \exp\left(-j \cdot \frac{2\pi \cdot k \cdot n}{N}\right) \right|^2, \tag{3}$$

where $1 \le k \le N$. Once the power spectrum $|X(k)|^2$ is obtained, we can calculate the filter bank
energy $e(i)$ passing through the $i$-th mel-scaled critical band-pass filter $\psi_i(k)$ by

$$e(i) = \sum_{k=1}^{N} |X(k)|^2 \cdot \psi_i(k), \tag{4}$$

where $1 \le i \le Q$ and $Q$ is the number of mel-scaled triangular band-pass filters. Finally, a discrete
cosine transform (DCT) is applied to the frequency sequence of log filter bank energies
$\{\log[e(i)],\ 1 \le i \le Q\}$. Thus, the mel frequency cepstral coefficients $c_m$ can be expressed as

$$c_m = \sum_{i=1}^{Q} \log[e(i)] \cdot \cos\left(m \cdot \frac{2i-1}{2Q} \cdot \pi\right), \tag{5}$$

where $1 \le m \le L$ and $L$ is the desired number of cepstral features.
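The three steps in (3) to (5) can be sketched in Python as follows. This is an illustrative sketch under several assumptions that the text does not specify: the mel-scale formula, the placement of the triangular filters, a one-sided spectrum from `np.fft.rfft` instead of the full range $1 \le k \le N$ (a real frame has a symmetric spectrum), the pre-emphasis coefficient 0.97, $Q = 20$ filters, a 256-sample frame and an 8 kHz sampling rate. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    # A common mel-scale formula (assumed; the text does not specify one).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular filters psi_i(k) over the bins of a one-sided spectrum."""
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_filters + 2))
    psi = np.zeros((num_filters, bin_freqs.size))
    for i in range(num_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        psi[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return psi

def dct_cepstra(log_energies, num_ceps):
    """Eq. (5): c_m = sum_i log[e(i)] * cos(m * (2i - 1) * pi / (2Q)), m = 1..L."""
    log_energies = np.asarray(log_energies, dtype=float)
    Q = log_energies.size
    i = np.arange(1, Q + 1)
    m = np.arange(1, num_ceps + 1)[:, None]
    return np.cos(m * (2 * i - 1) * np.pi / (2 * Q)) @ log_energies

def mfcc_from_frame(frame, psi, num_ceps):
    power = np.abs(np.fft.rfft(frame)) ** 2          # |X(k)|^2, eq. (3), one-sided
    energies = psi @ power                           # e(i), eq. (4)
    return dct_cepstra(np.log(energies), num_ceps)   # eq. (5); assumes e(i) > 0

# Random noise stands in for one speech frame.
rng = np.random.default_rng(0)
signal = rng.normal(size=256)
emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
frame = emphasized * np.hamming(256)
psi = mel_filter_bank(20, 256, 8000)
mfcc = mfcc_from_frame(frame, psi, 12)
```

A quick sanity check on the DCT step is that a flat sequence of log energies yields zero for every $c_m$ with $m \ge 1$, since each cosine basis vector in (5) sums to zero over the $Q$ bands.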
From the above description, it follows that distortion of the speech signal causes considerable
spectral variation and thus degrades recognition performance. However, it is also known that more noise
can be perceptually tolerated in the spectral formant regions than in the spectral valleys. Our goal is
therefore to emphasize the high-energy parts of the log filter bank energies so that the cepstral
features become less susceptible to environmental interference. In our approach, shown in Fig. 1, the
log filter bank energies are multiplied by a set of weighting factors $w(i)$ before the discrete cosine
transform is performed, that is [3],

$$c_{m(WFBA)} = \sum_{i=1}^{Q} w(i) \cdot \log[e(i)] \cdot \cos\left(m \cdot \frac{2i-1}{2Q} \cdot \pi\right). \tag{6}$$
In this study, we investigate the effects of the following two types of weighting functions.
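Type 1.

$$w(i) = \frac{\beta_i}{\sum_{j=1}^{Q} \beta_j} \quad \text{and} \quad \beta_i = \sum_{r=1}^{Q} \left( \frac{\log[e(i)+1.0]}{\log[e(r)+1.0]} \right)^{\frac{1}{F-1}}. \tag{7}$$

Type 2.

$$w(i) = \frac{\log[e(i)+1.0]}{\sum_{j=1}^{Q} \log[e(j)+1.0]}. \tag{8}$$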
For the first type of weighting function, a fuzzy membership function is used to determine the weights.
By properly adjusting the fuzzy factor $F$, we can achieve various degrees of fuzziness for the WFBA
scheme. When the fuzzy factor $F$ tends to 1.0 and $e(i)$ is the maximum energy, the weights are
distributed with $w(i) = 1.0$ and $w(j) = 0.0$ for $j \ne i$. On the other hand, as $F \to \infty$, all the
weights become equal and are set to $1/Q$. In the second type, the weighting terms are directly
proportional to the log energy of each critical band. In addition, it does not require a priori
determination of the fuzzy factor and therefore needs less computation. We will refer to the cepstral
features calculated by the WFBA scheme using the Type 1 and Type 2 weighting functions as “FWFBA” and
“DWFBA”, respectively.
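The two weighting functions and the weighted DCT of (6) can be sketched in Python as below. This is only an illustration under assumptions: the function names and the made-up energies are hypothetical, the filter bank energies are taken to be strictly positive, and the Type 1 weights are evaluated directly from the double ratio in (7), which can overflow numerically when $F$ is extremely close to 1.

```python
import numpy as np

def fuzzy_weights(energies, fuzzy_factor):
    """Type 1 weights of eq. (7); undefined at fuzzy_factor == 1."""
    if fuzzy_factor == 1.0:
        raise ValueError("fuzzy_factor must differ from 1")
    log_e = np.log(np.asarray(energies, dtype=float) + 1.0)
    ratios = log_e[:, None] / log_e[None, :]      # [i, r] = log[e(i)+1] / log[e(r)+1]
    beta = np.sum(ratios ** (1.0 / (fuzzy_factor - 1.0)), axis=1)
    return beta / np.sum(beta)

def direct_weights(energies):
    """Type 2 weights of eq. (8)."""
    log_e = np.log(np.asarray(energies, dtype=float) + 1.0)
    return log_e / np.sum(log_e)

def wfba_cepstra(energies, weights, num_ceps):
    """Eq. (6): weight the log energies, then apply the cosine transform of eq. (5)."""
    energies = np.asarray(energies, dtype=float)
    Q = energies.size
    i = np.arange(1, Q + 1)
    m = np.arange(1, num_ceps + 1)[:, None]
    basis = np.cos(m * (2 * i - 1) * np.pi / (2 * Q))
    return basis @ (weights * np.log(energies))

# Made-up filter bank energies for Q = 4 bands.
energies = np.array([1.0, 10.0, 100.0, 1000.0])
w_sharp = fuzzy_weights(energies, 1.05)    # F close to 1: weight concentrates on the peak
w_flat = fuzzy_weights(energies, 1000.0)   # large F: weights approach 1/Q
w_direct = direct_weights(energies)
c_fwfba = wfba_cepstra(energies, fuzzy_weights(energies, 1.9), 3)
```

On this toy input, a fuzzy factor near 1 puts almost all of the weight on the band with the largest energy, while a very large fuzzy factor gives nearly uniform weights, matching the two limiting cases described in the text.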
III. EXPERIMENTS AND DISCUSSIONS
The MAT (Mandarin Across Taiwan) speech database [3] was used to evaluate the presented
schemes. The database provided by the Computational Linguistic Society of R.O.C. was collected over
the public telephone network and each Mandarin word comprised 1~23 Mandarin syllables. From the
MAT database, we chose 8320 phonetically balanced Mandarin words (37784 syllables) spoken by 81
males and 79 females to train the right-context-dependent sub-syllable HMMs of 410 Mandarin syllables.
Moreover, each syllable model contains six to seven states in which the output observation distribution is
characterized by a 4-mixture Gaussian density function with diagonal covariance matrix. In the testing
phase, the evaluated schemes were applied to a 500-utterance (4754 syllables) recognition task in which
the testing utterances spoken by 15 males and 15 females were selected from a different set of the MAT
database. The feature vector was composed of 12-order mel frequency cepstral coefficients and their
first-order time derivatives. To simulate various noisy conditions, the 500 testing utterances were
corrupted by additive white Gaussian noise (AWGN) at signal-to-noise ratios (SNRs) of 10 dB, 20 dB
and 30 dB. In addition, a sinusoidal lifter [1] and an FIR filter of the form $H(z) = z - z^{-1}$ [2]
were used in the experiments for comparison; they are abbreviated as SL and FF, respectively.
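The noise corruption step can be illustrated with a short Python sketch that adds white Gaussian noise to a waveform at a target SNR. It assumes that the SNR is defined from the average power over the whole utterance, which the text does not state; the function name, the 440 Hz test tone and the 8 kHz sampling rate are placeholders for illustration.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that the utterance-level SNR is about snr_db."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# A synthetic tone stands in for a test utterance.
rng = np.random.default_rng(0)
t = np.arange(100000) / 8000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = add_awgn(clean, 20.0, rng)
measured_snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Because the noise is random, the measured SNR only approximates the target; the deviation shrinks as the utterance gets longer.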
To evaluate the discriminating ability of the speech features produced by the various schemes, we
treated each state of all the syllable models as a separate speech class and used the F-ratio measure
[4] to test class separability in the feature space. The F-ratio measure takes into account the variance
of the class means and the mean of the within-class variances. It has been confirmed that good class
separability, indicated by a large F-ratio, gives high recognition accuracy. Fig. 2 shows the F-ratio
curves of the 12-order mel frequency cepstral coefficients and their first-order time derivatives
derived by applying the various schemes. From these curves, we find that lower-quefrency coefficients
generally have higher F-ratios and should therefore offer better class separation. In addition, the
WFBA scheme always achieves higher F-ratios than the other schemes across the cepstral coefficients.
In particular, the FWFBA is superior to the DWFBA, at the price of a higher computation cost.
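For readers unfamiliar with the F-ratio, the following Python sketch computes it per feature dimension as the variance of the class means divided by the mean of the within-class variances, following the verbal description above. The use of population variances (no degrees-of-freedom correction), the function name and the toy data are assumptions of this sketch; the exact definition used in [4] may differ in such details.

```python
import numpy as np

def f_ratio(features, labels):
    """Per-dimension F-ratio: variance of class means / mean of within-class variances."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    variances = np.array([features[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / variances.mean(axis=0)

# Toy data: dimension 0 separates the two classes, dimension 1 does not.
features = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 2.0]])
labels = np.array([0, 0, 1, 1])
ratios = f_ratio(features, labels)
```

In this toy case the first dimension has a large F-ratio and the second has an F-ratio of zero, which mirrors how the measure ranks features by class separability.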
For the recognition of continuous Mandarin telephone speech, we evaluated these schemes in terms of
syllable recognition rate (S.R.R.). Two kinds of environmental conditions, channel distortion and noise
corruption, were investigated to see whether the WFBA scheme achieves better syllable recognition rates
than the other evaluated schemes. In the channel-compensated condition, the widely used cepstral mean
subtraction (CMS) [5] was employed to cancel the embedded channel effect. Fig. 3 illustrates the
relationship between the fuzzy factor and the syllable recognition rate under different conditions. The
syllable recognition rate initially increases with the fuzzy factor $F$, attains a maximum and then
decreases as the fuzzy factor increases further. The optimal value of the fuzzy factor is related to the
SNR: the lower the SNR of the additive white Gaussian noise, the smaller the optimal fuzzy factor.
Moreover, further improvement in syllable recognition rate can be obtained by integrating the WFBA with
the CMS. As shown in Table I, the WFBA technique also outperforms the SL and FF schemes and exhibits
consistent improvements for the channel-distorted, channel-compensated and various noisy conditions. As
far as computation cost is concerned, the computational complexity of the DWFBA is lower than that of
the FWFBA. Finally, it is worth noting that the optimal value of the fuzzy factor depends heavily on
the SNR and is still not easily derived. In this study, the optimal values of the fuzzy factor under the
various conditions were determined in a time-consuming manner, by selecting some specific values and
their neighbors and comparing the corresponding syllable recognition rates.
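Cepstral mean subtraction, used above for channel compensation, can be sketched in a few lines of Python. The sketch assumes the usual per-utterance formulation, in which the mean of each cepstral coefficient over all frames of an utterance is subtracted from that coefficient; the function name and the random data are hypothetical, and the constant offset below is a simplified stand-in for a stationary channel.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames (rows are frames)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Random cepstra stand in for a 50-frame utterance with 12 coefficients per frame.
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 12))
channel_offset = rng.normal(size=12)   # constant shift modeling a fixed channel
distorted = clean + channel_offset
compensated = cepstral_mean_subtraction(distorted)
```

Any offset that is constant over the utterance is removed exactly, so the compensated features match those obtained from the undistorted cepstra after the same subtraction.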
IV. CONCLUSIONS
In this paper, a weighted filter bank analysis scheme with emphasis on the peak structure of log filter
bank energies is proposed for the derivation of robust cepstral features. Two kinds of weighting functions
employed in the WFBA are investigated. The experiments show that by properly adjusting the fuzzy
factor the FWFBA has higher capability in enhancing the discriminating ability of cepstral features than the
conventional FBA scheme and the other two widely used schemes, i.e., cepstral liftering and frequency
filtering schemes. Also, instead of the FWFBA, the DWFBA can offer a simpler form for weighting the
LFBEs with much less computation cost while maintaining comparable recognition accuracy. In addition,
it is shown that the WFBA is effective for noisy speech recognition and can be well combined with some
environment-compensated techniques, such as the CMS, to achieve higher recognition rates if necessary.
REFERENCES
[1] B. H. Juang, L. R. Rabiner, and J. G. Wilpon, “On the use of band-pass liftering in speech recognition,”
IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 7, pp. 947-954, July 1987.
[2] E. Battle, C. Nadeu, and J. A. R. Fonollosa, “Feature decorrelation methods in speech recognition: A
comparative study,” Proceedings of International Conference on Spoken Language Processing, pp.
951-954, 1998.
[3] W. W. Hung and H. C. Wang, “A fuzzy approach for equalization of the cepstral variances,”
Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 3,
SP-P7, pp. 1611-1614, Istanbul, June 2000.
[4] S. Nicholson, B. Milner, and S. Cox, “Evaluating feature set performance using the F-ratio and
J-measures,” Proceedings of European Conference on Speech Communication and Technology,
vol. 1, pp. 413-416, Greece, September 1997.
[5] S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Trans. Acoust.,
Speech, Signal Processing, vol. ASSP-29, pp. 254-272, 1981.
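[Fig. 1. Block diagram for the derivation of MFCCs based on the weighted filter bank analysis: $x(n)$ → pre-emphasis and Hamming windowing → $|\mathrm{STFT}|^2$, giving $|X(k)|^2$ → mel-scaled filters $\psi_1(k), \ldots, \psi_Q(k)$, giving $e(1), \ldots, e(Q)$ → $\log[e(i)+1.0]$ → weighting by $w(1), \ldots, w(Q)$ → DCT → $c_{m(WFBA)}$.]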
[Fig. 2 (A). F-ratio curves for the 12-order cepstral coefficients. Horizontal axis: cepstral coefficients C1 to C12; vertical axis: F-ratio measures (ticks from 0.25 to 2.75). Curves: MFCC, FWFBA (F=1.9), DWFBA, FF, SL.]

[Fig. 2 (B). F-ratio curves for the 12-order delta cepstral coefficients. Horizontal axis: delta cepstral coefficients C13 to C24; vertical axis: F-ratio measures (ticks from 0.2 to 2.7). Curves: MFCC, FWFBA (F=1.9), DWFBA, FF, SL.]
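[Fig. 3. Relationships between fuzzy factors and syllable recognition rates under different conditions. Horizontal axis: fuzzy factors (less than 1.0, 1.001, 1.01, 1.1, 1.3, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.2, 10, 100, 1000); vertical axis: syllable recognition rates (%), from 0 to 50. Curves: NO CMS + NO AWGN, NO CMS + 30 dB AWGN, NO CMS + 20 dB AWGN, NO CMS + 10 dB AWGN, CMS + NO AWGN.]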
Table I. COMPARISONS OF SYLLABLE RECOGNITION RATES (S.R.Rs, %) FOR VARIOUS SCHEMES UNDER DIFFERENT CONDITIONS.

| Scheme | NO CMS, NO AWGN | CMS, NO AWGN | NO CMS, 30 dB AWGN | NO CMS, 20 dB AWGN | NO CMS, 10 dB AWGN |
|---|---|---|---|---|---|
| MFCC | 40.79 | 44.90 | 36.16 | 24.41 | 8.01 |
| Sinusoidal Lifter (SL) | 40.32 | 45.01 | 37.42 | 26.34 | 9.78 |
| Frequency Filtering (FF) | 41.49 | 46.73 | 38.94 | 28.85 | 10.46 |
| DWFBA | 42.95 | 48.81 | 39.98 | 30.25 | 11.32 |
| FWFBA (F = 1000.0) | 40.98 | 44.78 | 37.03 | 24.01 | 7.59 |
| FWFBA (F = 2.0) | 43.21 | 48.36 | 40.66 | 30.51 | 11.95 |
| FWFBA (F = 1.001) | 7.18 | 10.53 | 2.89 | 0.0 | 0.0 |
| FWFBA (F < 1.0) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| FWFBA (F = optimal value) | 43.8 (F = 1.9) | 49.27 (F = 2.2) | 41.21 (F = 1.9) | 32.15 (F = 1.8) | 14.81 (F = 1.6) |

 
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
Multi carrier equalization by restoration of redundanc y (merry) for adaptive...
 
Economia01
Economia01Economia01
Economia01
 
Economia01
Economia01Economia01
Economia01
 
research journal
research journalresearch journal
research journal
 
Investigation of repeated blasts at Aitik mine using waveform cross correlation
Investigation of repeated blasts at Aitik mine using waveform cross correlationInvestigation of repeated blasts at Aitik mine using waveform cross correlation
Investigation of repeated blasts at Aitik mine using waveform cross correlation
 
ANALYSIS OF FRACTIONALLY SPACED WIDELY LINEAR EQUALIZATION OVER FREQUENCY SEL...
ANALYSIS OF FRACTIONALLY SPACED WIDELY LINEAR EQUALIZATION OVER FREQUENCY SEL...ANALYSIS OF FRACTIONALLY SPACED WIDELY LINEAR EQUALIZATION OVER FREQUENCY SEL...
ANALYSIS OF FRACTIONALLY SPACED WIDELY LINEAR EQUALIZATION OVER FREQUENCY SEL...
 
EC8553 Discrete time signal processing
EC8553 Discrete time signal processing EC8553 Discrete time signal processing
EC8553 Discrete time signal processing
 

Recently uploaded

Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 

Recently uploaded (20)

Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 

129966863931865940[1]

LIST OF FIGURES AND TABLES

Fig. 1. Block diagram for the derivation of MFCCs based on the weighted filter bank analysis.
Fig. 2. F-ratio curves of mel frequency cepstral coefficients based on various schemes. (A) For the 12-order cepstral coefficients. (B) For the 12-order delta cepstral coefficients.
Fig. 3. Relationships between fuzzy factors and syllable recognition rates under different conditions.
Table I. COMPARISONS OF SYLLABLE RECOGNITION RATES FOR VARIOUS SCHEMES UNDER DIFFERENT CONDITIONS.
I. INTRODUCTION

Filter bank analysis (FBA) is one of the most extensively employed spectral analysis techniques in speech applications. This approach typically uses a bank of highly overlapped band-pass filters, which roughly approximates the frequency response of the basilar membrane in the cochlea, to cover the frequency range of interest in a speech signal. The outputs of those band-pass filters can essentially be treated as a short-time spectral envelope. This measured spectral envelope is prone to statistical variation due to speaker characteristics, background noise, channel effects and limitations of the underlying speech analysis model, and this variation may make spectral comparisons unreliable. To suppress those undesired variations and to obtain a more reliable distance measure, a cepstral liftering (CL) scheme [1] has been developed to account for the sensitivity of cepstral coefficients. The weights $L(m)$ used in the liftering process take advantage of the statistical characteristics of cepstral coefficients, and the resulting liftered distance measure is given by

$$d\left(C^{(CL)}, \tilde{C}^{(CL)}\right) = \sum_{m=1}^{L} \left[ c_m^{(CL)} - \tilde{c}_m^{(CL)} \right]^2 = \sum_{m=1}^{L} \left[ L(m) \cdot c_m - L(m) \cdot \tilde{c}_m \right]^2, \tag{1}$$

where $C^{(CL)} = [c_1^{(CL)}, c_2^{(CL)}, \cdots, c_L^{(CL)}]$ and $\tilde{C}^{(CL)} = [\tilde{c}_1^{(CL)}, \tilde{c}_2^{(CL)}, \cdots, \tilde{c}_L^{(CL)}]$ are two liftered cepstral vectors. Various types of weighting functions, including linear, sinusoidal, exponential, band-pass and ramp lifters, have been introduced in the literature.

Besides the cepstral liftering scheme, Batlle et al. [2] proposed an alternative that improves the robustness of FBA-based speech features by filtering the frequency sequence of log filter bank energies (LFBEs). The frequency filtering (FF) scheme not only approximately equalizes the variances of cepstral coefficients up to a certain quefrency index, but also decorrelates the log filter bank energies to some
extent. This filtering process can be accomplished by passing the sequence of log filter bank energies through a finite impulse response (FIR) filter of the form

$$H(z) = \sum_{i} h_i \cdot z^{-i}. \tag{2}$$

Although the aforementioned cepstral liftering and frequency filtering schemes have been widely used to enhance the robustness of cepstral features, there is still a need to investigate new approaches for achieving better performance. In the following, we introduce a new weighted filter bank analysis (WFBA) scheme, which yields a set of discriminating cepstral features in a simple, direct and effective way while maintaining a relatively low computation cost.

II. WEIGHTED FILTER BANK ANALYSIS SCHEME

Assuming that $x(n)$ represents a frame of a speech signal that has been pre-emphasized and Hamming-windowed, the derivation of conventional mel frequency cepstral coefficients (MFCCs) proceeds as follows. First, the speech frame $x(n)$, where $1 \le n \le N$, is transformed from the time domain into the frequency domain by applying an $N$-point short-time Fourier transform (STFT), and the resulting power spectrum $|X(k)|^2$ can be formulated as

$$|X(k)|^2 = \left| \sum_{n=1}^{N} x(n) \cdot \exp\left( -j \cdot \frac{2\pi \cdot n \cdot k}{N} \right) \right|^2, \tag{3}$$

where $1 \le k \le N$. Once the power spectrum $|X(k)|^2$ is obtained, we can calculate the filter bank energy $e(i)$ passing through the $i$-th mel-scaled critical band-pass filter $\psi_i(k)$ by

$$e(i) = \sum_{k=1}^{N} |X(k)|^2 \cdot \psi_i(k), \tag{4}$$
where $1 \le i \le Q$ and $Q$ is the number of mel-scaled triangular band-pass filters. Finally, a discrete cosine transform (DCT) is applied to the frequency sequence of log filter bank energies $\{\log[e(i)],\ 1 \le i \le Q\}$. Thus, the mel frequency cepstral coefficients $c_m$ can be expressed as

$$c_m = \sum_{i=1}^{Q} \log[e(i)] \cdot \cos\left( m \cdot \frac{2i-1}{2Q} \cdot \pi \right), \tag{5}$$

where $1 \le m \le L$ and $L$ is the desired number of cepstral features.

From the above description, we can see that a distorted speech signal causes considerable spectral variations and results in performance degradation. However, it is also known that more noise can be perceptually tolerated in the spectral formant regions than in the spectral valleys. Therefore, our goal is to emphasize the high-energy parts of the log filter bank energies so that the cepstral features become less susceptible to environmental interference. In our approach, shown in Fig. 1, the log filter bank energies are multiplied by a set of weighting factors prior to performing the discrete cosine transform, that is [3]

$$c_m^{(WFBA)} = \sum_{i=1}^{Q} w(i) \cdot \log[e(i)] \cdot \cos\left( m \cdot \frac{2i-1}{2Q} \cdot \pi \right). \tag{6}$$

In this study, we investigate the effects of the following two types of weighting functions.

Type 1.

$$w(i) = \frac{\beta_i}{\sum_{j=1}^{Q} \beta_j} \quad \text{and} \quad \beta_i = \sum_{r=1}^{Q} \left( \frac{\log[e(i) + 1.0]}{\log[e(r) + 1.0]} \right)^{\frac{1}{F-1}}. \tag{7}$$

Type 2.

$$w(i) = \frac{\log[e(i) + 1.0]}{\sum_{j=1}^{Q} \log[e(j) + 1.0]}. \tag{8}$$

For the first type of weighting function, a fuzzy membership function is used to determine the weights. By properly adjusting the fuzzy factor $F$, we can achieve various extents of fuzziness for the WFBA
scheme. When the fuzzy factor $F$ tends to 1.0 and $e(i)$ is the maximum energy, the weights become $w(i) = 1.0$ and $w(j) = 0.0$ for $j \ne i$. On the other hand, in the case of $F \to \infty$, all the weights become equal and are set to $1/Q$. In the second type, the weighting terms are directly proportional to the log energy of each critical band. In addition, it does not require a priori determination of the fuzzy factor and therefore needs less computation. We will refer to the cepstral features calculated by the WFBA scheme using Type 1 and Type 2 weighting functions as "FWFBA" and "DWFBA", respectively.

III. EXPERIMENTS AND DISCUSSIONS

The MAT (Mandarin Across Taiwan) speech database [3] was used to evaluate the presented schemes. The database, provided by the Computational Linguistic Society of R.O.C., was collected over the public telephone network, and each Mandarin word comprised 1~23 Mandarin syllables. From the MAT database, we chose 8320 phonetically balanced Mandarin words (37784 syllables) spoken by 81 males and 79 females to train the right-context-dependent sub-syllable HMMs of 410 Mandarin syllables. Each syllable model contains six to seven states, in which the output observation distribution is characterized by a 4-mixture Gaussian density function with diagonal covariance matrix. In the testing phase, the evaluated schemes were applied to a 500-utterance (4754 syllables) recognition task in which the testing utterances, spoken by 15 males and 15 females, were selected from a different set of the MAT database. The feature vector was composed of 12-order mel frequency cepstral coefficients and their first-order time derivatives. To simulate various noisy conditions, the 500 testing utterances were corrupted by additive white Gaussian noise (AWGN) at signal-to-noise ratios (SNRs) of 10 dB, 20 dB and 30 dB. In addition, a sinusoidal lifter [1] and an FIR filter of the form $H(z) = z - z^{-1}$ [2] were
used in the experiments for comparative purposes and are abbreviated as SL and FF, respectively.

To evaluate the discriminating abilities of the speech features obtained with the various schemes, we treated each state from all the syllable models as a separate speech class and used the F-ratio measure [4] to test the class separability in the feature space. The F-ratio measure takes into account the variance of means and the mean of variances among classes. It has been confirmed that good class separability, indicated by a large F-ratio, gives high recognition accuracy. Fig. 2 shows the F-ratio curves of the 12-order mel frequency cepstral coefficients and their first-order time derivatives derived by applying the various schemes. From these curves, we can see that lower quefrency coefficients generally have higher F-ratios and should therefore offer better class separation. In addition, the WFBA scheme always achieves higher F-ratios than the other schemes across the cepstral coefficients. In particular, the FWFBA is superior to the DWFBA, at the price of a higher computation cost.

For the recognition of continuous Mandarin telephone speech, we evaluated these schemes in terms of syllable recognition rate (S.R.R). Two kinds of environmental conditions, channel distortion and noise corruption, were investigated to see whether the WFBA scheme can achieve better syllable recognition rates than the other evaluated schemes. In the channel-compensated condition, the widely used cepstral mean subtraction (CMS) [5] was employed to cancel the embedded channel effect. Fig. 3 illustrates the relationships between the fuzzy factors and the syllable recognition rates under different conditions. It shows that the syllable recognition rate initially increases with the fuzzy factor $F$, attains a maximum value and then decreases as the fuzzy factor increases further.
The optimal value of the fuzzy factor is clearly related to the SNR: the smaller the SNR of the additive white Gaussian noise, the smaller the optimal value of the fuzzy factor. Moreover, we also find that further improvement in syllable recognition rate can be obtained by
integrating the WFBA with the CMS. On the other hand, as shown in Table I, the WFBA technique outperforms the SL and FF schemes and exhibits consistent improvements for the channel-distorted, channel-compensated and various noisy conditions. As far as computation cost is concerned, the computation complexity required by the DWFBA is lower than that of the FWFBA. Finally, it is worth noting that the optimal value of the fuzzy factor depends heavily on the SNR and is still not easily derived. In this study, the optimal values of the fuzzy factor under various conditions were determined in a time-consuming manner by selecting some specific values and their neighbors and comparing the corresponding syllable recognition rates.

IV. CONCLUSIONS

In this paper, a weighted filter bank analysis scheme with emphasis on the peak structure of log filter bank energies is proposed for the derivation of robust cepstral features. Two kinds of weighting functions employed in the WFBA are investigated. The experiments show that, by properly adjusting the fuzzy factor, the FWFBA enhances the discriminating ability of cepstral features more than the conventional FBA scheme and the other two widely used schemes, i.e., the cepstral liftering and frequency filtering schemes. As an alternative to the FWFBA, the DWFBA offers a simpler form for weighting the LFBEs with much less computation cost while maintaining comparable recognition accuracy. In addition, it is shown that the WFBA is effective for noisy speech recognition and can be combined with environment compensation techniques, such as the CMS, to achieve higher recognition rates if necessary.
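To make the two weighting functions concrete, the following is a minimal sketch of Eqs. (6) to (8) in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the log term is taken as log[e(i) + 1.0] as drawn in Fig. 1, the exponent in Eq. (7) is read as 1/(F - 1), and only F > 1 with strictly positive energies is handled. The Type 1 weights are computed through the algebraically equivalent form w(i) = l_i^p / sum_j l_j^p, with l_i = log[e(i) + 1.0] and p = 1/(F - 1), which avoids overflow when F is close to 1.

```python
import math


def log_energies(energies):
    """Return log[e(i) + 1.0] for each filter bank energy e(i) >= 0."""
    return [math.log(e + 1.0) for e in energies]


def type1_weights(energies, fuzzy_factor):
    """Fuzzy (FWFBA) weights of Eq. (7); assumes F > 1 and all e(i) > 0.

    Eq. (7) reduces to w(i) = l_i**p / sum_j l_j**p with p = 1 / (F - 1).
    Dividing by the largest l_i first keeps every power within [0, 1].
    """
    if fuzzy_factor <= 1.0:
        raise ValueError("this sketch only handles F > 1")
    logs = log_energies(energies)
    if min(logs) <= 0.0:
        raise ValueError("all energies must be strictly positive")
    p = 1.0 / (fuzzy_factor - 1.0)
    peak = max(logs)
    powers = [(l / peak) ** p for l in logs]
    total = sum(powers)
    return [x / total for x in powers]


def type2_weights(energies):
    """Direct (DWFBA) weights of Eq. (8): proportional to the log energies."""
    logs = log_energies(energies)
    total = sum(logs)
    return [l / total for l in logs]


def wfba_cepstra(energies, weights, num_ceps):
    """Weighted DCT of Eq. (6), returning c_1 ... c_L for L = num_ceps."""
    logs = log_energies(energies)
    q = len(logs)
    return [
        sum(
            w * l * math.cos(m * (2 * i - 1) * math.pi / (2 * q))
            for i, (w, l) in enumerate(zip(weights, logs), start=1)
        )
        for m in range(1, num_ceps + 1)
    ]


# Hypothetical energies for one frame with Q = 4 filters.
frame = [1.0, 3.0, 9.0, 2.0]
fwfba = wfba_cepstra(frame, type1_weights(frame, fuzzy_factor=1.9), num_ceps=3)
dwfba = wfba_cepstra(frame, type2_weights(frame), num_ceps=3)
```

In this form the limiting cases described in Section II are easy to check: a fuzzy factor just above 1 concentrates almost all of the weight on the band with the largest energy, while a very large fuzzy factor gives nearly uniform weights of 1/Q.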
REFERENCES

[1] B. H. Juang, L. R. Rabiner, and J. G. Wilpon, "On the use of band-pass liftering in speech recognition," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 7, pp. 947-954, July 1987.
[2] E. Batlle, C. Nadeu, and J. A. R. Fonollosa, "Feature decorrelation methods in speech recognition: A comparative study," Proceedings of International Conference on Spoken Language Processing, pp. 951-954, 1998.
[3] W. W. Hung and H. C. Wang, "A fuzzy approach for equalization of the cepstral variances," Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 3, SP-P7, pp. 1611-1614, Istanbul, June 2000.
[4] S. Nicholson, B. Milner, and S. Cox, "Evaluating feature set performance using the F-ratio and J-measures," Proceedings of European Conference on Speech Communication and Technology, vol. 1, pp. 413-416, Greece, September 1997.
[5] S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 254-272, 1981.
Fig. 1. Block diagram for the derivation of MFCCs based on the weighted filter bank analysis. [Diagram: $x(n)$ → pre-emphasis and Hamming windowing → $|\mathrm{STFT}| \times |\mathrm{STFT}|$, giving $|X(k)|^2$ → mel filters $\psi_1(k), \ldots, \psi_Q(k)$, giving $e(1), \ldots, e(Q)$ → $\log[e(i) + 1.0]$ → multiplication by the weights $w(1), \ldots, w(Q)$ → DCT → $c_m^{(WFBA)}$.]
Fig. 2 (A). [Plot: F-ratio measures versus cepstral coefficients C1 to C12; curves: MFCC, FWFBA (F=1.9), DWFBA, FF, SL.]
Fig. 2 (B). [Plot: F-ratio measures versus delta cepstral coefficients C13 to C24; curves: MFCC, FWFBA (F=1.9), DWFBA, FF, SL.]
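The F-ratio plotted in Fig. 2 is described in the text only as combining the variance of the class means with the mean of the within-class variances. A minimal sketch of one common form of that measure, computed separately for each feature dimension, is given below. The function name is hypothetical and the use of population variances is an assumption; the exact normalization used in [4] may differ.

```python
import statistics


def f_ratio(classes):
    """Per-dimension F-ratio for a list of classes.

    `classes` is a list of classes; each class is a list of feature
    vectors (lists of floats) with the same dimension. For each
    dimension the result is
        variance of the class means / mean of the within-class variances,
    using population variances (an assumption of this sketch).
    """
    num_dims = len(classes[0][0])
    ratios = []
    for d in range(num_dims):
        per_class = [[vec[d] for vec in cls] for cls in classes]
        class_means = [statistics.fmean(vals) for vals in per_class]
        class_vars = [statistics.pvariance(vals) for vals in per_class]
        between = statistics.pvariance(class_means)
        within = statistics.fmean(class_vars)
        ratios.append(between / within)
    return ratios


# Two hypothetical classes of one-dimensional features.
example = f_ratio([[[0.0], [2.0]], [[4.0], [6.0]]])
```

A larger value means the class means are spread widely relative to the scatter inside each class, which is the sense in which the text links large F-ratios to better class separability.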
Fig. 3. [Plot: syllable recognition rates (%), from 0 to 50, versus fuzzy factors ranging from less than 1.0 to 1000; curves: NO CMS + NO AWGN, NO CMS + 30 dB AWGN, NO CMS + 20 dB AWGN, NO CMS + 10 dB AWGN, CMS + NO AWGN.]
Table I. S.R.Rs (%)

Schemes                     NO CMS          CMS             NO CMS          NO CMS          NO CMS
                            NO AWGN         NO AWGN         30 dB AWGN      20 dB AWGN      10 dB AWGN
MFCC                        40.79           44.90           36.16           24.41           8.01
Sinusoidal Lifter (SL)      40.32           45.01           37.42           26.34           9.78
Frequency Filtering (FF)    41.49           46.73           38.94           28.85           10.46
DWFBA                       42.95           48.81           39.98           30.25           11.32
FWFBA (F = 1000.0)          40.98           44.78           37.03           24.01           7.59
FWFBA (F = 2.0)             43.21           48.36           40.66           30.51           11.95
FWFBA (F = 1.001)           7.18            10.53           2.89            0.0             0.0
FWFBA (F < 1.0)             0.0             0.0             0.0             0.0             0.0
FWFBA (F = optimal value)   43.8 (F = 1.9)  49.27 (F = 2.2) 41.21 (F = 1.9) 32.15 (F = 1.8) 14.81 (F = 1.6)