1. Methods and algorithms of speech recognition
course
Lecture 5
Nikolay V. Karpov
nkarpov(а)hse.ru
2. ◦ Cepstral Coefficients
     Relation to pole positions
     Relation to LPC filter coefficients
   ◦ Line Spectrum Frequencies
     Relation to pole positions and to formant frequencies
   ◦ Summary of LPC parameter sets
Most speech recognisers describe the spectrum of speech
sounds using cepstral coefficients. This is because they are good
at discriminating between different phonemes, are fairly
independent of each other and have approximately Gaussian
distributions for a particular phoneme.
Most speech coders describe the spectrum of speech sounds
using line spectrum frequencies. This is because they can be
quantised to low precision without distorting the spectrum too
much.
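3. Cepstrum: inverse Fourier transform of the log spectrum (periodic
spectrum ⇒ discrete cepstrum)
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left(V(e^{i\omega})\right) e^{i\omega n}\, d\omega$$
The coefficients $c_n$ can be obtained directly from the pole positions $x_k$.
Define
$$C(z) = \sum_{n=-\infty}^{\infty} c_n z^{-n} \quad\Rightarrow\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} C(e^{i\omega})\, e^{i\omega n}\, d\omega$$
This is the standard inverse z-transform derived by taking the inverse
Fourier transform of both sides of the first equation.
By equating the Fourier transforms of the two expressions for $c_n$, we get
$$C(z) = \log(V(z)) = \log\left(\frac{G}{A(z)}\right) = \log(G) - \log(A(z))$$
$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p}\left(1 - x_k z^{-1}\right)$$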
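The defining integral can be checked numerically. The following is a minimal sketch, assuming the poles $x_k$ and the gain $G$ are already known; the function name `cepstrum_by_integration` and the parameter `num_points` are illustrative choices, not part of the lecture. It approximates the integral by an equally spaced Riemann sum and evaluates $\log A$ factor by factor so that the principal complex logarithm stays valid.

```python
import cmath
import math


def cepstrum_by_integration(poles, gain, n, num_points=4096):
    """Approximate c_n = (1/2pi) * integral over [-pi, pi) of log V(e^{iw}) e^{iwn} dw.

    poles : the x_k, roots of A(z), assumed to lie inside the unit circle
    gain  : the real gain G > 0 of V(z) = G / A(z)
    """
    total = 0j
    for m in range(num_points):
        w = -math.pi + 2.0 * math.pi * m / num_points
        z_inv = cmath.exp(-1j * w)
        # log V = log G - sum_k log(1 - x_k z^{-1}).  Each factor has a positive
        # real part when |x_k| < 1, so the principal logarithm needs no unwrapping.
        log_v = math.log(gain) - sum(cmath.log(1 - x * z_inv) for x in poles)
        total += log_v * cmath.exp(1j * w * n)
    # (1/2pi) * sum * (2pi/num_points) simplifies to sum / num_points
    return total / num_points


# Poles of A(z) = 1 - 0.7 z^-1 + 0.5 z^-2 (a conjugate pair), with an assumed gain G = 2
x1 = complex(0.35, math.sqrt(0.3775))
example_poles = [x1, x1.conjugate()]
c0 = cepstrum_by_integration(example_poles, 2.0, 0)
c1 = cepstrum_by_integration(example_poles, 2.0, 1)
```

For this example `c0` should come out close to log(2) and `c1` close to 0.7, with negligible imaginary parts.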
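4. By using the Taylor series
$$\log(1-y) = -\sum_{n=1}^{\infty}\frac{y^n}{n}, \quad \text{for } |y| < 1$$
$$C(z) = \log(G) - \log(A(z)) = \log(G) - \sum_{k=1}^{p}\log\left(1 - x_k z^{-1}\right) = \log(G) + \sum_{k=1}^{p}\sum_{n=1}^{\infty}\frac{x_k^n}{n}\, z^{-n}$$
By collecting all the terms in $z^{-n}$, we can get $c_n$ in terms of $x_k$:
$$c_n = \begin{cases} 0 & \text{for } n < 0 \\ \log(G) & \text{for } n = 0 \\ \displaystyle\sum_{k=1}^{p} \frac{x_k^n}{n} & \text{for } n > 0 \end{cases}$$
Because $|x_k| < 1$, the $c_n$ decrease exponentially with $n$.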
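The closed-form relation between the $c_n$ and the pole positions translates directly into code. This is a minimal sketch; the function name `cepstrum_from_poles` is hypothetical, and it assumes that complex poles occur in conjugate pairs so that the result is real.

```python
import math


def cepstrum_from_poles(poles, gain, num_coeffs):
    """Return [c_0, c_1, ..., c_{num_coeffs-1}] from the poles x_k and the gain G.

    Uses c_0 = log(G) and c_n = sum_k x_k**n / n for n > 0.
    """
    coeffs = [math.log(gain)]
    for n in range(1, num_coeffs):
        c_n = sum(x ** n for x in poles) / n
        # With conjugate-pair poles the imaginary parts cancel; keep the real part.
        coeffs.append(c_n.real)
    return coeffs


# Poles of A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, with an assumed gain G = 2
x1 = complex(0.35, math.sqrt(0.3775))
c = cepstrum_from_poles([x1, x1.conjugate()], 2.0, 4)
```

Because each term is a power of a pole with $|x_k| < 1$, the values in `c` shrink quickly as $n$ grows, which is why a recogniser can keep only the first few coefficients.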
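5. Differentiating $C(z) = \log(G) - \log(A(z))$:
$$C'(z) = -\frac{A'(z)}{A(z)} \;\Rightarrow\; A(z)\,C'(z) = -A'(z) \;\Rightarrow\; A(z)\,zC'(z) = -zA'(z)$$
$$\left(1 - \sum_{k=1}^{p} a_k z^{-k}\right)\left(z\sum_{m=0}^{\infty} -m c_m z^{-(m+1)}\right) = -z\sum_{n=1}^{p} n a_n z^{-(n+1)}$$
$$\Rightarrow\; \left(1 - \sum_{k=1}^{p} a_k z^{-k}\right)\left(\sum_{m=1}^{\infty} m c_m z^{-m}\right) = \sum_{n=1}^{p} n a_n z^{-n}$$
$$\Rightarrow\; \sum_{m=1}^{\infty} m c_m z^{-m} - \sum_{k=1}^{p}\sum_{m=1}^{\infty} m c_m a_k z^{-(m+k)} = \sum_{n=1}^{p} n a_n z^{-n}$$
Replacing $m$ by $n-k$ (to make the $z$ exponent uniform) gives
$$\sum_{n=1}^{\infty} n c_n z^{-n} = \sum_{n=1}^{p} n a_n z^{-n} + \sum_{k=1}^{p}\sum_{n=k+1}^{\infty} (n-k)\, c_{n-k}\, a_k z^{-n}$$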
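6. Equating the coefficients of $z^{-n}$ on both sides (with $a_n = 0$ for $n > p$):
$$n c_n = n a_n + \sum_{k=1}^{\min(p,\,n-1)} (n-k)\, c_{n-k}\, a_k \;\Rightarrow\; c_n = a_n + \frac{1}{n}\sum_{k=1}^{\min(p,\,n-1)} (n-k)\, c_{n-k}\, a_k$$
Thus we have a recurrence relation to calculate the $c_n$ from the $a_k$
coefficients:
$$c_1 = a_1$$
$$c_2 = a_2 + \tfrac{1}{2}\, c_1 a_1$$
$$c_3 = a_3 + \tfrac{1}{3}\left(2 c_2 a_1 + c_1 a_2\right)$$
$$c_4 = a_4 + \tfrac{1}{4}\left(3 c_3 a_1 + 2 c_2 a_2 + c_1 a_3\right)$$
$$c_5 = \dots$$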
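The recurrence is easy to implement without finding any roots. The following is a minimal sketch, assuming the sign convention $A(z) = 1 - \sum_k a_k z^{-k}$ and taking $a_n = 0$ for $n > p$; the function name `lpc_to_cepstrum` is illustrative.

```python
def lpc_to_cepstrum(a, num_coeffs):
    """Return [c_1, ..., c_num_coeffs] from the LPC coefficients a = [a_1, ..., a_p].

    Implements c_n = a_n + (1/n) * sum_{k=1}^{min(p, n-1)} (n - k) * c_{n-k} * a_k.
    c_0 = log(G) is not produced here because it depends only on the gain.
    """
    p = len(a)
    c = []  # c[j - 1] holds c_j
    for n in range(1, num_coeffs + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = sum((n - k) * c[n - k - 1] * a[k - 1]
                  for k in range(1, min(p, n - 1) + 1))
        c.append(a_n + acc / n)
    return c


# A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, i.e. a_1 = 0.7 and a_2 = -0.5
c = lpc_to_cepstrum([0.7, -0.5], 4)
```

Note that the recurrence keeps producing non-zero $c_n$ for $n > p$, even though the $a_n$ terms vanish there.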
7. These coefficients are called the complex cepstrum coefficients
(even though they are real). The cepstrum coefficients use log|V|
instead of log(V) and (except for c0) are half as big.
Note the cute names: spectrum→cepstrum, frequency→quefrency,
filter→lifter, etc.
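The "half as big" statement follows in one step from the fact that $\log|V|$ is the real part of $\log V$. The sketch below writes $\hat{c}_n$ for the coefficients obtained from $\log|V|$; this symbol is introduced here for illustration only. It uses the facts that the $c_n$ are real and that $c_n = 0$ for $n < 0$.

```latex
\log\left|V(e^{i\omega})\right|
  = \operatorname{Re}\,\log V(e^{i\omega})
  = \tfrac{1}{2}\left( C(e^{i\omega}) + C^{*}(e^{i\omega}) \right)
  = \tfrac{1}{2}\sum_{n=-\infty}^{\infty} \left( c_n + c_{-n} \right) e^{-i\omega n}
\\[4pt]
\Rightarrow\quad
\hat{c}_n = \tfrac{1}{2}\left( c_n + c_{-n} \right)
  = \begin{cases}
      c_0 & n = 0 \\
      \tfrac{1}{2}\, c_{|n|} & n \neq 0
    \end{cases}
```

So $\hat{c}_n$ is symmetric in $n$, equals $c_0$ at $n = 0$, and is half of $c_n$ for $n > 0$.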
8.
$$V(z) = \frac{G}{A(z)}, \qquad A(z) = 1 - \sum_{j=1}^{p} a_j z^{-j} = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}$$
$$P(z) = A(z) + z^{-(p+1)} A^{*}(z^{*-1}) = 1 - (a_1 + a_p) z^{-1} - (a_2 + a_{p-1}) z^{-2} - \cdots - (a_p + a_1) z^{-p} + z^{-(p+1)}$$
$$Q(z) = A(z) - z^{-(p+1)} A^{*}(z^{*-1}) = 1 - (a_1 - a_p) z^{-1} - (a_2 - a_{p-1}) z^{-2} - \cdots - (a_p - a_1) z^{-p} - z^{-(p+1)}$$
$V(z)$ is stable if and only if the roots of $P(z)$ and $Q(z)$ all lie on the unit
circle and they are interleaved.
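For real $a_k$, the term $z^{-(p+1)} A^{*}(z^{*-1})$ has the same coefficients as $A(z)$ in reverse order, so $P(z)$ and $Q(z)$ can be formed by adding and subtracting two lists. The following is a minimal sketch under that assumption; the function name `lsf_polynomials` is hypothetical.

```python
def lsf_polynomials(a):
    """Return the coefficient lists of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(p+1).

    a = [a_1, ..., a_p] are real LPC coefficients with A(z) = 1 - sum_k a_k z^-k.
    """
    # Coefficients of A(z), padded with a zero for the z^-(p+1) term
    A = [1.0] + [-ak for ak in a] + [0.0]
    # z^-(p+1) A*(z*^-1): for real a_k this is A's coefficient list reversed
    A_rev = A[::-1]
    P = [x + y for x, y in zip(A, A_rev)]  # symmetric coefficients
    Q = [x - y for x, y in zip(A, A_rev)]  # antisymmetric coefficients
    return P, Q


# A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, i.e. a_1 = 0.7 and a_2 = -0.5
P, Q = lsf_polynomials([0.7, -0.5])
```

The symmetry of `P` and the antisymmetry of `Q` are what force their roots onto the unit circle when $V(z)$ is stable.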
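9. If the roots of $P(z)$ are at $\exp(2\pi j f_i)$ for $i = 1, 3, \dots$ and those of $Q(z)$
are at $\exp(2\pi j f_i)$ for $i = 0, 2, \dots$ with $f_{i+1} > f_i \ge 0$, then the LSF
frequencies are defined as $f_1, f_2, \dots, f_p$.
Note that the roots corresponding to $f_0$ and $f_{p+1}$ are always at $z = +1$ and $z = -1$ respectively (that is, $f_0 = 0$ and $f_{p+1} = 0.5$).
Example:
$$A(z) = 1 - 0.7 z^{-1} + 0.5 z^{-2}$$
$$z^{-3} A^{*}(z^{*-1}) = 0.5 z^{-1} - 0.7 z^{-2} + z^{-3}$$
$$P(z) = 1 - 0.2 z^{-1} - 0.2 z^{-2} + z^{-3}$$
$$Q(z) = 1 - 1.2 z^{-1} + 1.2 z^{-2} - z^{-3}$$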
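The LSFs of this example can be found numerically. The sketch below is one possible approach, not necessarily the method used in the course: because $P$ has symmetric and $Q$ antisymmetric coefficients, multiplying each by $e^{j\omega(p+1)/2}$ on the unit circle leaves a real cosine sum and a real sine sum respectively, whose sign changes in $0 < \omega < \pi$ can be located on a grid and refined by bisection. All function names and the `grid` parameter are illustrative, and roots closer together than one grid step would be missed.

```python
import math


def lsf_polynomials(a):
    """Coefficients of P(z) and Q(z) for real a = [a_1, ..., a_p]."""
    A = [1.0] + [-ak for ak in a] + [0.0]
    A_rev = A[::-1]
    return ([x + y for x, y in zip(A, A_rev)],
            [x - y for x, y in zip(A, A_rev)])


def unit_circle_roots(coeffs, trig, grid=1024):
    """Angles w in (0, pi) where sum_k coeffs[k] * trig(w * (m - k)) changes sign."""
    m = (len(coeffs) - 1) / 2.0

    def f(w):
        return sum(c * trig(w * (m - k)) for k, c in enumerate(coeffs))

    roots = []
    for i in range(1, grid - 1):
        lo, hi = math.pi * i / grid, math.pi * (i + 1) / grid
        if f(lo) * f(hi) < 0:
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


def lpc_to_lsf(a):
    """Return the LSFs f_1 < ... < f_p as fractions of the sampling frequency."""
    P, Q = lsf_polynomials(a)
    # The open interval (0, pi) excludes the fixed roots at z = +1 and z = -1
    angles = unit_circle_roots(P, math.cos) + unit_circle_roots(Q, math.sin)
    return sorted(w / (2.0 * math.pi) for w in angles)


lsf = lpc_to_lsf([0.7, -0.5])  # A(z) = 1 - 0.7 z^-1 + 0.5 z^-2
```

For this example $P(z) = (1 + z^{-1})(1 - 1.2 z^{-1} + z^{-2})$ and $Q(z) = (1 - z^{-1})(1 - 0.2 z^{-1} + z^{-2})$, so `lsf` should be close to $\arccos(0.6)/2\pi \approx 0.1476$ and $\arccos(0.1)/2\pi \approx 0.2341$.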
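10.
$$P(z) = 0 \;\Leftrightarrow\; A(z) = -z^{-(p+1)} A^{*}(z^{*-1}) \;\Leftrightarrow\; H(z) = -1$$
$$Q(z) = 0 \;\Leftrightarrow\; A(z) = +z^{-(p+1)} A^{*}(z^{*-1}) \;\Leftrightarrow\; H(z) = +1$$
where
$$H(z) = \frac{A(z)}{z^{-(p+1)} A^{*}(z^{*-1})} = z\prod_{i=1}^{p}\frac{1 - x_i z^{-1}}{z^{-1}\left(1 - x_i^{*} z\right)} = z\prod_{i=1}^{p}\frac{z - x_i}{1 - x_i^{*} z}$$
Here the $x_i$ are the roots of $A(z) = G\,V^{-1}(z)$, i.e. the poles of $V(z)$.
It turns out that, provided all the $x_i$ lie inside the unit circle, the
absolute values of the terms making up $H(z)$ are either all > 1 or
else all < 1. This can be seen by taking the absolute value of a typical term.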
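The step for a typical term can be worked out as follows. This is a sketch of the standard argument, comparing the squared magnitudes of the numerator and denominator of one factor of $H(z)$:

```latex
|z - x_i|^2 - |1 - x_i^{*} z|^2
  = \left( |z|^2 - z x_i^{*} - z^{*} x_i + |x_i|^2 \right)
  - \left( 1 - x_i^{*} z - x_i z^{*} + |x_i|^2 |z|^2 \right)
  = \left( |z|^2 - 1 \right)\left( 1 - |x_i|^2 \right)
\\[4pt]
\text{so for } |x_i| < 1: \qquad
\left| \frac{z - x_i}{1 - x_i^{*} z} \right| > 1 \iff |z| > 1,
\qquad
\left| \frac{z - x_i}{1 - x_i^{*} z} \right| < 1 \iff |z| < 1
```

The leading factor $z$ behaves the same way, so $|H(z)| > 1$ outside the unit circle and $|H(z)| < 1$ inside it. Hence $H(z) = \pm 1$ is only possible when $|z| = 1$, which places all the roots of $P(z)$ and $Q(z)$ on the unit circle.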
11. Filter Coefficients: $a_i$
– Stability check difficult; Sensitive to errors; Cannot interpolate
Pole Positions: $x_i$
+ Stability check easy; Can interpolate but unordered
– Hard to calculate; Sensitive to errors near $|x_i| = 1$
Reflection Coefficients: $r_i$
+ Stability check easy; Can interpolate
– Sensitive to errors near ±1
Log Area Ratios: $g_i$
+ Stability guaranteed; Can interpolate
Cepstral Coefficients: $c_i$
+ Good for speech recognition
– Stability check difficult
Line Spectrum Frequencies: $f_i$
+ Stability check easy; Can interpolate; Vary smoothly in time; Strongly
correlated ⇒ better coding; Related to spectral peaks (formants)
– Awkward to calculate