Cepstral coefficients

Methods and algorithms of speech recognition
course

Lecture 5

Nikolay V. Karpov

nkarpov(а)hse.ru

◦ Cepstral Coefficients
  • Relation to pole positions
  • Relation to LPC filter coefficients
◦ Line Spectrum Frequencies
  • Relation to pole positions and to formant frequencies
◦ Summary of LPC parameter sets

• Most speech recognisers describe the spectrum of speech
sounds using cepstral coefficients. This is because they are good
at discriminating between different phonemes, are fairly
independent of each other and have approximately Gaussian
distributions for a particular phoneme.

• Most speech coders describe the spectrum of speech sounds
using line spectrum frequencies. This is because they can be
quantised to low precision without distorting the spectrum too
much.

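Cepstrum: inverse Fourier transform of the log spectrum (periodic
spectrum ⇒ discrete cepstrum):

$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left(V(e^{i\omega})\right) e^{i\omega n}\, d\omega$$

The coefficients $c_n$ can be obtained directly from the $x_k$.

Define

$$C(z) = \sum_{n=-\infty}^{\infty} c_n z^{-n} \quad\Rightarrow\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} C(e^{i\omega})\, e^{i\omega n}\, d\omega$$

This is the standard inverse z-transform, derived by taking the inverse
Fourier transform of both sides of the first equation.

By equating the Fourier transforms of the two expressions for $c_n$, we
get

$$C(z) = \log(V(z)) = \log\left(\frac{G}{A(z)}\right) = \log(G) - \log(A(z))$$

where

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p}\left(1 - x_k z^{-1}\right)$$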

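By using the Taylor series

$$\log(1 - y) = -\sum_{n=1}^{\infty} \frac{y^n}{n} \quad \text{for } |y| < 1$$

we can expand

$$C(z) = \log(G) - \log(A(z)) = \log(G) - \sum_{k=1}^{p} \log\left(1 - x_k z^{-1}\right) = \log(G) + \sum_{k=1}^{p}\sum_{n=1}^{\infty} \frac{x_k^n}{n} z^{-n}$$

By collecting all the terms in $z^{-n}$, we can get $c_n$ in terms of the $x_k$:

$$c_n = \begin{cases} 0 & \text{for } n < 0 \\ \log(G) & \text{for } n = 0 \\ \displaystyle\sum_{k=1}^{p} \frac{x_k^n}{n} & \text{for } n > 0 \end{cases}$$

Because $|x_k| < 1$, the $c_n$ decrease exponentially with $n$.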
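The pole-position formula can be checked numerically against the integral definition of the cepstrum. The sketch below is illustrative only: the function names are hypothetical, the integral is approximated by a Riemann sum over 1024 points, and the example poles are assumed to be the roots of A(z) = 1 - 0.7 z^-1 + 0.5 z^-2 with G = 1.

```python
import cmath
import math


def cepstrum_from_poles(poles, gain, n_max):
    """Return [c_0, ..., c_n_max] using c_0 = log(G), c_n = sum_k x_k**n / n."""
    c = [math.log(gain)]
    for n in range(1, n_max + 1):
        total = sum(x ** n for x in poles) / n
        c.append(total.real)  # imaginary parts cancel for conjugate-pair poles
    return c


def cepstrum_from_log_spectrum(poles, gain, n_max, n_points=1024):
    """Approximate (1/2pi) * integral of log V(e^{iw}) e^{iwn} dw by a Riemann sum."""
    c = []
    for n in range(n_max + 1):
        acc = 0j
        for m in range(n_points):
            w = 2 * math.pi * m / n_points
            z = cmath.exp(1j * w)
            # log V = log G - sum_k log(1 - x_k / z); each factor has positive
            # real part when |x_k| < 1, so the principal branch of log is safe.
            log_v = math.log(gain) - sum(cmath.log(1 - x / z) for x in poles)
            acc += log_v * cmath.exp(1j * w * n)
        c.append((acc / n_points).real)
    return c


# Assumed example: the roots of A(z) = 1 - 0.7 z^-1 + 0.5 z^-2.
disc = cmath.sqrt(0.7 ** 2 - 4 * 0.5)
example_poles = [(0.7 + disc) / 2, (0.7 - disc) / 2]

closed_form = cepstrum_from_poles(example_poles, gain=1.0, n_max=5)
numerical = cepstrum_from_log_spectrum(example_poles, gain=1.0, n_max=5)
```

Both lists should agree to within rounding error, and the magnitudes shrink roughly like |x_k|^n / n as n grows.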

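Differentiating $C(z) = \log(G) - \log(A(z))$:

$$C'(z) = -\frac{A'(z)}{A(z)} \;\Rightarrow\; A(z)\,C'(z) = -A'(z) \;\Rightarrow\; A(z)\,zC'(z) = -zA'(z)$$

$$\left(1 - \sum_{k=1}^{p} a_k z^{-k}\right)\left(-z\sum_{m=0}^{\infty} m c_m z^{-(m+1)}\right) = -z\sum_{n=1}^{p} n a_n z^{-(n+1)}$$

$$\left(1 - \sum_{k=1}^{p} a_k z^{-k}\right)\left(\sum_{m=1}^{\infty} m c_m z^{-m}\right) = \sum_{n=1}^{p} n a_n z^{-n}$$

$$\sum_{m=1}^{\infty} m c_m z^{-m} - \sum_{k=1}^{p}\sum_{m=1}^{\infty} m c_m a_k z^{-(m+k)} = \sum_{n=1}^{p} n a_n z^{-n}$$

Replacing $m$ by $n-k$ (to make the $z$ exponent uniform) gives

$$\sum_{n=1}^{\infty} n c_n z^{-n} = \sum_{n=1}^{p} n a_n z^{-n} + \sum_{k=1}^{p}\sum_{n=k+1}^{\infty} (n-k)\, c_{n-k}\, a_k z^{-n}$$

Equating the coefficients of $z^{-n}$:

$$n c_n = n a_n + \sum_{k=1}^{\min(p,\,n-1)} (n-k)\, c_{n-k}\, a_k \;\Rightarrow\; c_n = a_n + \frac{1}{n}\sum_{k=1}^{\min(p,\,n-1)} (n-k)\, c_{n-k}\, a_k$$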
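Thus we have a recurrence relation to calculate the $c_n$ from the $a_k$
coefficients:

$$\begin{aligned}
c_1 &= a_1 \\
c_2 &= a_2 + \tfrac{1}{2}\, c_1 a_1 \\
c_3 &= a_3 + \tfrac{1}{3}\left(2 c_2 a_1 + c_1 a_2\right) \\
c_4 &= a_4 + \tfrac{1}{4}\left(3 c_3 a_1 + 2 c_2 a_2 + c_1 a_3\right) \\
c_5 &= \dots
\end{aligned}$$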
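The recurrence translates almost line for line into code. This is a minimal sketch, not a reference implementation: the function name `lpc_to_cepstrum` is hypothetical, `a[k-1]` is assumed to hold $a_k$ in the convention A(z) = 1 - Σ a_k z^-k, and $a_n$ is taken as 0 for n > p, as the finite sum over n in the derivation implies.

```python
import math


def lpc_to_cepstrum(a, gain, n_max):
    """Return [c_0, ..., c_n_max] from LPC coefficients a_1..a_p.

    Uses c_0 = log(G) and
    c_n = a_n + (1/n) * sum_{k=1}^{min(p, n-1)} (n-k) * c_{n-k} * a_k.
    """
    p = len(a)
    c = [math.log(gain)]
    for n in range(1, n_max + 1):
        a_n = a[n - 1] if n <= p else 0.0   # a_n vanishes beyond the model order
        acc = sum((n - k) * c[n - k] * a[k - 1]
                  for k in range(1, min(p, n - 1) + 1))
        c.append(a_n + acc / n)
    return c


# Assumed example: A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, i.e. a_1 = 0.7, a_2 = -0.5.
c = lpc_to_cepstrum([0.7, -0.5], gain=1.0, n_max=4)
```

Note that $c_0$ never enters the sum, because the largest $k$ is $n-1$; the gain therefore affects only $c_0$.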

These coefficients are called the complex cepstrum coefficients
(even though they are real). The cepstrum coefficients use log|V|
instead of log(V) and (except for c0) are half as big.
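The factor of one half follows because log|V| is the real part of log(V). A short numerical sketch, under the assumption of a stable all-pole V(z) with real coefficients; the helper name `inverse_dft_of_log` and the example values (G = 2, poles of A(z) = 1 - 0.7 z^-1 + 0.5 z^-2) are illustrative only:

```python
import cmath
import math


def inverse_dft_of_log(poles, gain, n_max, magnitude_only, n_points=1024):
    """Riemann-sum estimate of (1/2pi) * integral of L(w) e^{iwn} dw,
    where L is log V(e^{iw}) or, if magnitude_only, log|V(e^{iw})|."""
    coeffs = []
    for n in range(n_max + 1):
        acc = 0j
        for m in range(n_points):
            w = 2 * math.pi * m / n_points
            z = cmath.exp(1j * w)
            log_v = math.log(gain) - sum(cmath.log(1 - x / z) for x in poles)
            if magnitude_only:
                log_v = complex(log_v.real, 0.0)   # log|V| = Re(log V)
            acc += log_v * cmath.exp(1j * w * n)
        coeffs.append((acc / n_points).real)
    return coeffs


disc = cmath.sqrt(0.7 ** 2 - 4 * 0.5)
example_poles = [(0.7 + disc) / 2, (0.7 - disc) / 2]

complex_cepstrum = inverse_dft_of_log(example_poles, 2.0, 4, magnitude_only=False)
real_cepstrum = inverse_dft_of_log(example_poles, 2.0, 4, magnitude_only=True)
```

For n ≥ 1 each entry of `real_cepstrum` should be half the matching entry of `complex_cepstrum`, while the n = 0 entries are both log(G).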

Note the cute names: spectrum→cepstrum, frequency→quefrency,
filter→lifter, etc.

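$$V(z) = \frac{G}{A(z)}, \qquad A(z) = 1 - \sum_{j=1}^{p} a_j z^{-j} = 1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p}$$

$$\begin{aligned}
P(z) &= A(z) + z^{-(p+1)} A^*(z^{*-1}) \\
     &= 1 - (a_1 + a_p) z^{-1} - (a_2 + a_{p-1}) z^{-2} - \dots - (a_p + a_1) z^{-p} + z^{-(p+1)}
\end{aligned}$$

$$\begin{aligned}
Q(z) &= A(z) - z^{-(p+1)} A^*(z^{*-1}) \\
     &= 1 - (a_1 - a_p) z^{-1} - (a_2 - a_{p-1}) z^{-2} - \dots - (a_p - a_1) z^{-p} - z^{-(p+1)}
\end{aligned}$$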

V(z) is stable if and only if the roots of P(z) and Q(z) all lie on the unit
circle and are interleaved.

If the roots of P(z) are at $\exp(2\pi j f_i)$ for $i = 1, 3, \dots$ and those of Q(z)
are at $\exp(2\pi j f_i)$ for $i = 0, 2, \dots$ with $f_{i+1} > f_i \ge 0$, then the LSF
frequencies are defined as $f_1, f_2, \dots, f_p$.
Note that $z = +1$ and $z = -1$ are always roots, so it is always true that
$f_0 = 0$ and $f_{p+1} = 0.5$.
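Example:

$$\begin{aligned}
A(z) &= 1 - 0.7 z^{-1} + 0.5 z^{-2} \\
z^{-3} A^*(z^{*-1}) &= 0.5 z^{-1} - 0.7 z^{-2} + z^{-3} \\
P(z) &= 1 - 0.2 z^{-1} - 0.2 z^{-2} + z^{-3} \\
Q(z) &= 1 - 1.2 z^{-1} + 1.2 z^{-2} - z^{-3}
\end{aligned}$$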
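The example can be reproduced with a few lines of code. This is a sketch under stated assumptions rather than a production LSF routine: real LPC coefficients in the convention A(z) = 1 - Σ a_k z^-k, hypothetical function names, and a simple grid scan with bisection that could miss roots closer together than the grid spacing. The fixed roots at z = +1 and z = -1 are deliberately excluded by scanning only the open interval (0, 0.5).

```python
import math


def lsf_polynomials(a):
    """Coefficients of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(p+1).

    a[k-1] holds a_k for A(z) = 1 - sum_k a_k z^-k (real coefficients assumed).
    """
    padded = [1.0] + [-ak for ak in a] + [0.0]   # A(z), padded to degree p+1
    mirrored = padded[::-1]                      # z^-(p+1) A(1/z)
    p_poly = [u + v for u, v in zip(padded, mirrored)]
    q_poly = [u - v for u, v in zip(padded, mirrored)]
    return p_poly, q_poly


def unit_circle_zeros(coeffs, symmetric, n_grid=2000):
    """Frequencies f in (0, 0.5) where poly(exp(2*pi*j*f)) = 0."""
    n = len(coeffs) - 1
    trig = math.cos if symmetric else math.sin

    def g(w):
        # exp(j*w*n/2) * poly(exp(j*w)) is real for symmetric coefficients and
        # imaginary for antisymmetric ones; g returns that real or imaginary part.
        return sum(c * trig(w * (n / 2 - k)) for k, c in enumerate(coeffs))

    zeros = []
    grid = [math.pi * i / n_grid for i in range(1, n_grid)]
    for lo, hi in zip(grid, grid[1:]):
        if g(lo) * g(hi) < 0:                    # sign change brackets a root
            for _ in range(60):                  # refine by bisection
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi) / (2 * math.pi))
    return zeros


def lsf_from_lpc(a):
    """LSFs f_1..f_p in cycles per sample, sorted in increasing order."""
    p_poly, q_poly = lsf_polynomials(a)
    return sorted(unit_circle_zeros(p_poly, True) + unit_circle_zeros(q_poly, False))


p_poly, q_poly = lsf_polynomials([0.7, -0.5])   # a_1 = 0.7, a_2 = -0.5
lsfs = lsf_from_lpc([0.7, -0.5])
```

For this example P(z) factors as (1 + z^-1)(1 - 1.2 z^-1 + z^-2) and Q(z) as (1 - z^-1)(1 - 0.2 z^-1 + z^-2), so the two LSFs are the angles of 0.6 + 0.8j and 0.1 + j√0.99 divided by 2π, with the P root below the Q root.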

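$$P(z) = 0 \;\Leftrightarrow\; A(z) = -z^{-(p+1)} A^*(z^{*-1}) \;\Leftrightarrow\; H(z) = -1$$

$$Q(z) = 0 \;\Leftrightarrow\; A(z) = +z^{-(p+1)} A^*(z^{*-1}) \;\Leftrightarrow\; H(z) = +1$$

where

$$H(z) = \frac{A(z)}{z^{-(p+1)} A^*(z^{*-1})} = \frac{\prod_{i=1}^{p}\left(1 - x_i z^{-1}\right)}{z^{-(p+1)}\prod_{i=1}^{p}\left(1 - x_i^* z\right)} = z\prod_{i=1}^{p}\frac{z - x_i}{1 - x_i^* z}$$

Here the $x_i$ are the roots of $A(z) = G\,V^{-1}(z)$.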

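It turns out that, provided all the $x_i$ lie inside the unit circle, the
absolute values of the terms making up H(z) are either all > 1 or
else all < 1. Taking $|\cdot|^2$ of a typical term gives
$|z - x_i|^2 - |1 - x_i^* z|^2 = (|z|^2 - 1)(1 - |x_i|^2)$, so each term
exceeds 1 in magnitude exactly when $|z| > 1$. Hence $|H(z)| = 1$ only on the
unit circle, which is where the roots of P(z) and Q(z) must lie.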
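The behaviour of |H(z)| inside, on and outside the unit circle is easy to probe numerically. The sketch below is illustrative: `h_of_z` is a hypothetical helper, and the roots are assumed to be those of A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, both of which have magnitude √0.5.

```python
import cmath
import math


def h_of_z(z, roots):
    """H(z) = z * prod_i (z - x_i) / (1 - conj(x_i) * z)."""
    h = complex(z)
    for x in roots:
        h *= (z - x) / (1 - x.conjugate() * z)
    return h


# Assumed example roots (both inside the unit circle).
disc = cmath.sqrt(0.7 ** 2 - 4 * 0.5)
example_roots = [(0.7 + disc) / 2, (0.7 - disc) / 2]

angles = [2 * math.pi * k / 16 for k in range(16)]
outside = [abs(h_of_z(1.2 * cmath.exp(1j * t), example_roots)) for t in angles]
on_circle = [abs(h_of_z(cmath.exp(1j * t), example_roots)) for t in angles]
inside = [abs(h_of_z(0.8 * cmath.exp(1j * t), example_roots)) for t in angles]

# H = -1 at a root of P(z) and H = +1 at a root of Q(z) for this example.
h_at_p_root = h_of_z(complex(0.6, 0.8), example_roots)
h_at_q_root = h_of_z(complex(0.1, math.sqrt(0.99)), example_roots)
```

Every sampled value in `outside` should exceed 1, every value in `inside` should fall below 1, and the values in `on_circle` should equal 1 up to rounding.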

• Filter Coefficients: $a_i$
  – Stability check difficult; Sensitive to errors; Cannot interpolate
• Pole Positions: $x_i$
  + Stability check easy; Can interpolate but unordered
  – Hard to calculate; Sensitive to errors near $|x_i| = 1$
• Reflection Coefficients: $r_i$
  + Stability check easy; Can interpolate
  – Sensitive to errors near ±1
• Log Area Ratios: $g_i$
  + Stability guaranteed; Can interpolate
• Cepstral Coefficients: $c_i$
  + Good for speech recognition
  – Stability check difficult
• Line Spectrum Frequencies: $f_i$
  + Stability check easy; Can interpolate; Vary smoothly in time; Strongly
    correlated ⇒ better coding; Related to spectral peaks (formants)
  – Awkward to calculate
