NLP: Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition

  1. Natural Language Processing: Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition
     Vladimir Kulyukin, www.vkedco.blogspot.com
  2. Outline
     - Audio Processing
     - Zero Crossing Rate
     - Dynamic Time Warping
     - Spoken Word Recognition
  3. Audio Processing
  4. Samples
     - Samples are successive snapshots of a specific signal
     - Audio files store samples of sound waves
     - A microphone converts an acoustic signal into an analog electrical signal; an analog-to-digital converter then transforms the analog signal into digital samples
  5. Digital Audio Signal
     [Figure: digital audio signal, sound pressure plotted against time]
  6. Amplitude
     - Amplitude (in audio processing) is a measure of sound pressure
     - Amplitude is measured at a specific rate
     - Each amplitude measurement yields one digital sample
     - Some samples have positive values, others have negative values
  7. Digital Approximation Accuracy
     - Any digitization of an analog signal carries some inaccuracy
     - Approximation accuracy depends on two factors: 1) sampling rate and 2) resolution
     - In audio processing, sampling is the reduction of a continuous signal to a discrete signal
     - Sampling rate is the number of samples per unit of time
     - Resolution is the size of a sample (e.g., the number of bits)
  8. Sampling Rate & Resolution
     - Sampling rate is measured in Hertz (Hz); for a sampling rate, 1 Hz means one sample per second
     - For example, if the audio is sampled 44,100 times per second, then its sampling rate is 44,100 Hz
     - Some typical resolutions are 8 bits, 16 bits, and 32 bits
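To make sampling rate and resolution concrete, here is a minimal sketch that takes snapshots of a sine tone at a chosen rate and stores each one as a signed 16-bit sample. The class and method names (`SamplingSketch`, `sampleSine`, `byteSize`) are hypothetical and are not part of the deck's source files.

```java
final class SamplingSketch {
    /** Samples a sine tone of freqHz at sampleRateHz and quantizes each value to 16 bits. */
    static short[] sampleSine(double freqHz, double sampleRateHz, int numSamples) {
        short[] out = new short[numSamples];
        for (int n = 0; n < numSamples; n++) {
            double t = n / sampleRateHz;                             // time of the n-th snapshot, in seconds
            double analog = Math.sin(2.0 * Math.PI * freqHz * t);    // stand-in for the analog value, in [-1, 1]
            out[n] = (short) Math.round(analog * Short.MAX_VALUE);   // 16-bit resolution
        }
        return out;
    }

    /** Bytes needed to store uncompressed mono audio of the given length. */
    static long byteSize(double seconds, int sampleRateHz, int bitsPerSample) {
        return (long) (seconds * sampleRateHz) * bitsPerSample / 8;
    }
}
```

For instance, `SamplingSketch.byteSize(1.0, 44100, 16)` shows that one second of 16-bit mono audio at 44,100 Hz occupies 88,200 bytes.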
  9. Nyquist-Shannon Sampling Theorem
     - This theorem states that perfect reconstruction of a signal is possible if the sampling frequency is greater than two times the maximum frequency of the signal being sampled
     - For example, a signal with a maximum frequency of 50 Hz can, theoretically, be reconstructed if it is sampled at a rate above 100 Hz
     - Sampling below this rate causes aliasing: distinct frequencies become indistinguishable in the sampled signal
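Aliasing can be seen numerically. The sketch below (hypothetical names, chosen only for illustration) samples a 70 Hz cosine and a 30 Hz cosine at 100 Hz, which is too slow for the 70 Hz tone, and compares the resulting samples.

```java
final class AliasingSketch {
    /** Returns numSamples samples of cos(2*pi*f*t) taken at sampleRateHz. */
    static double[] sampleCosine(double freqHz, double sampleRateHz, int numSamples) {
        double[] out = new double[numSamples];
        for (int n = 0; n < numSamples; n++) {
            out[n] = Math.cos(2.0 * Math.PI * freqHz * n / sampleRateHz);
        }
        return out;
    }

    /** Largest absolute difference between two equally long sample arrays. */
    static double maxAbsDiff(double[] a, double[] b) {
        double max = 0.0;
        for (int n = 0; n < a.length; n++) {
            max = Math.max(max, Math.abs(a[n] - b[n]));
        }
        return max;
    }
}
```

At 100 Hz the two tones produce the same samples up to rounding error, so they cannot be told apart after sampling. At 200 Hz, which is above twice the higher frequency, the samples differ clearly.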
  10. Audio File Formats
      - WAVE (WAV) is often associated with Windows but is now supported on other platforms
      - AIFF is common on Mac OS
      - AU is common on Unix/Linux
      - These formats are similar but vary in how they represent data, pack samples (e.g., little-endian vs. big-endian), etc.
      - A Java example of how to manipulate WAV files can be downloaded from WavFileManip.java
  11. Zero Crossing Rate
  12. What is Zero Crossing Rate (ZCR)?
      - Zero Crossing Rate (ZCR) is a measure of the number of times, in a given sequence of samples, that the amplitude crosses the horizontal line at 0
      - ZCR can be used to detect silence vs. non-silence, voiced vs. unvoiced speech, speaker's identity, etc.
      - ZCR is essentially the count of successive samples that change algebraic sign
  13. ZCR Source (source code is in ZeroCrossingRate.java)

      public class ZeroCrossingRate {
          // Counts sign changes between successive samples and divides the count by normalizer.
          public static double computeZCR01(double[] signals, double normalizer) {
              long numZC = 0;
              for (int i = 1; i < signals.length; i++) {
                  if ((signals[i] >= 0 && signals[i-1] < 0) ||
                      (signals[i] < 0 && signals[i-1] >= 0)) {
                      numZC++;
                  }
              }
              return numZC / normalizer;
          }
      }
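A small usage sketch shows how the counting rule behaves on two synthetic signals. It repeats the sign-change test from `computeZCR01` so that it runs on its own; the class `ZcrDemo`, the choice of normalizer (number of adjacent sample pairs), and the test signals are assumptions made for illustration.

```java
final class ZcrDemo {
    /** Same counting rule as computeZCR01, normalized by the number of adjacent sample pairs. */
    static double zcr(double[] s) {
        if (s.length < 2) return 0.0;
        long numZC = 0;
        for (int i = 1; i < s.length; i++) {
            if ((s[i] >= 0 && s[i-1] < 0) || (s[i] < 0 && s[i-1] >= 0)) numZC++;
        }
        return numZC / (double) (s.length - 1);
    }

    /** A pure tone: few sign changes per sample when freqHz is far below sampleRateHz. */
    static double[] tone(double freqHz, double sampleRateHz, int n) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            s[i] = Math.sin(2.0 * Math.PI * freqHz * i / sampleRateHz + 0.1);
        }
        return s;
    }

    /** An extreme noise-like signal whose sign flips on every sample. */
    static double[] alternating(int n) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) s[i] = (i % 2 == 0) ? 0.2 : -0.2;
        return s;
    }
}
```

A 100 Hz tone sampled at 8000 Hz gives a ZCR close to zero, while the alternating signal gives the maximum value of 1.0.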
  14. ZCR in Voiced vs. Unvoiced Speech
      - Voiced speech is produced when vowels (and voiced consonants) are spoken
      - Voiced speech is characterized by tones of nearly constant frequency that last for some duration
      - Unvoiced speech is produced when unvoiced consonants such as /s/ and /f/ are spoken
      - Unvoiced speech is non-periodic and random-like because air passes through a narrow constriction of the vocal tract
  15. ZCR in Voiced vs. Unvoiced Speech
      - Phonetic theory states that voiced speech has a smooth air flow through the vocal tract, whereas unvoiced speech has a turbulent air flow that produces noise
      - Thus, voiced speech should have a low ZCR, whereas unvoiced speech should have a high ZCR
  16. Amplitude of Voiced vs. Unvoiced Speech
      - Amplitude of unvoiced speech is low
      - Amplitude of voiced speech is high
      - Given a frame of digital samples, we can use its average absolute amplitude as a measure of the frame's energy
      - This measure can be used to classify frames as vowels or consonants
  17. ZCR & Amplitude of Voiced & Unvoiced Speech

                   Voiced   Unvoiced
      ZCR          LOW      HIGH
      Amplitude    HIGH     LOW
  18. Detection of Silence & Non-Silence (source code is in WavFileManip.detectSilence())

      silence_buffer = [];
      non_silence_buffer = [];
      buffer = [];
      while ( there are still frames left ) {
          read a specific number of frames into buffer;
          compute ZCR and average amplitude of buffer;
          if ( ZCR and average amplitude are both below specific thresholds ) {
              add the buffer to silence_buffer;
          } else {
              add the buffer to non_silence_buffer;
          }
      }
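The pseudocode above can be sketched in runnable form as follows. This is not the code of `WavFileManip.detectSilence()`, which is not shown in the slides; it works on an in-memory `double[]` instead of a WAV file, and the class name, method names, and thresholds are hypothetical.

```java
final class SilenceDetectorSketch {
    /** ZCR of signal[from, to), normalized by the number of adjacent sample pairs. */
    static double zcr(double[] s, int from, int to) {
        int crossings = 0;
        for (int i = from + 1; i < to; i++) {
            if ((s[i] >= 0 && s[i-1] < 0) || (s[i] < 0 && s[i-1] >= 0)) crossings++;
        }
        int pairs = to - from - 1;
        return pairs > 0 ? crossings / (double) pairs : 0.0;
    }

    /** Average absolute amplitude of signal[from, to). */
    static double avgAmplitude(double[] s, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) sum += Math.abs(s[i]);
        return sum / (to - from);
    }

    /** Returns one flag per buffer of frameSize samples: true = silence, false = non-silence. */
    static boolean[] detectSilence(double[] signal, int frameSize,
                                   double zcrThreshold, double ampThreshold) {
        int numBuffers = (signal.length + frameSize - 1) / frameSize;
        boolean[] silent = new boolean[numBuffers];
        for (int b = 0; b < numBuffers; b++) {
            int from = b * frameSize;
            int to = Math.min(signal.length, from + frameSize);   // the last buffer may be shorter
            silent[b] = zcr(signal, from, to) < zcrThreshold
                     && avgAmplitude(signal, from, to) < ampThreshold;
        }
        return silent;
    }
}
```

Suitable thresholds depend on the recording conditions and would normally be tuned on sample recordings.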
  19. Dynamic Time Warping (source code is in DTW.java)
  20. Introduction
      - Dynamic Time Warping (DTW) is a method to find an optimal alignment between two time-dependent sequences
      - DTW aligns ("warps") two sequences in a nonlinear way to match each other
      - DTW has been successfully used in automatic speech recognition (ASR), bioinformatics (genetic sequence matching), and video analysis
  21. Basic Definitions
      - There are two sequences: $X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$
      - There is a feature space $F$ such that $x_i \in F$ and $y_j \in F$, where $1 \le i \le N$ and $1 \le j \le M$
      - There is a local cost measure mapping 2-tuples of features to non-negative reals: $c: F \times F \to \mathbb{R}_{\ge 0}$
  22. Sample Sequences
  23. Sample Alignment
  24. Cost Matrix DTW(N, M)
      - X and Y are sequences X[1:N] and Y[1:M]
      - [Figure: N x M grid with X positions 1, ..., i, ..., N on the horizontal axis and Y positions 1, ..., j, ..., M on the vertical axis]
      - $dtw(i, j)$ is the cost of warping X[1:i] with Y[1:j]
  25. Warping Path
      $P = (p_1, \ldots, p_L)$, where $p_j = (n_j, m_j) \in [1, N] \times [1, M]$ and $j \in [1, L]$, is a warping path if
      1) $p_1 = (1, 1)$ and $p_L = (N, M)$
      2) $n_1 \le n_2 \le \ldots \le n_L$ and $m_1 \le m_2 \le \ldots \le m_L$
      3) $p_{l+1} - p_l \in \{(1, 0), (0, 1), (1, 1)\}$ for $1 \le l \le L - 1$
  26. Valid Warping Path
      [Figure: path drawn on a grid with X positions 1-4 and Y positions 1-5]
      $P = (p_1, p_2, p_3, p_4, p_5, p_6)$, where $p_1 = (1, 1)$, $p_2 = (1, 2)$, $p_3 = (2, 3)$, $p_4 = (2, 4)$, $p_5 = (3, 5)$, $p_6 = (4, 5)$
  27. Invalid Warping Path
      [Figure: candidate path drawn on a grid with X positions 1-4 and Y positions 1-5]
      $p_1 \ne (1, 1)$, so the 1st condition is not satisfied
  28. Invalid Warping Path
      [Figure: candidate path drawn on a grid with X positions 1-4 and Y positions 1-5]
      $p_3 = (3, 3)$, $p_4 = (2, 4)$, and $3 > 2$, so the 2nd condition is not satisfied
  29. Invalid Warping Path
      [Figure: candidate path drawn on a grid with X positions 1-4 and Y positions 1-5]
      $p_2 = (2, 2)$, $p_3 = (3, 4)$, $p_3 - p_2 = (3, 4) - (2, 2) = (1, 2) \notin \{(1, 0), (0, 1), (1, 1)\}$, so the 3rd condition is not satisfied
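The three conditions can be checked mechanically. The following sketch uses a hypothetical class `WarpingPathCheck` and represents a path as an array of 1-based pairs `{n, m}`; both choices are assumptions made for illustration.

```java
final class WarpingPathCheck {
    /** Checks the boundary, monotonicity, and step-size conditions on an N x M grid. */
    static boolean isWarpingPath(int[][] p, int N, int M) {
        int L = p.length;
        if (L == 0) return false;
        // 1) boundary condition: start at (1,1), end at (N,M)
        if (p[0][0] != 1 || p[0][1] != 1 || p[L-1][0] != N || p[L-1][1] != M) return false;
        for (int l = 0; l + 1 < L; l++) {
            int dn = p[l+1][0] - p[l][0];
            int dm = p[l+1][1] - p[l][1];
            // 2) monotonicity: neither index may decrease
            if (dn < 0 || dm < 0) return false;
            // 3) step size: only (1,0), (0,1), or (1,1) is allowed
            boolean ok = (dn == 1 && dm == 0) || (dn == 0 && dm == 1) || (dn == 1 && dm == 1);
            if (!ok) return false;
        }
        return true;
    }
}
```

The valid path from slide 26 passes all three checks, while a path containing the step from (2, 2) to (3, 4) fails the step-size check, as on slide 29.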
  30. Total Cost of a Warping Path
      If $P = (p_1, \ldots, p_L)$ is a warping path between sequences X and Y, then its total cost is
      $c_P(X, Y) = \sum_{j=1}^{L} c(x_{n_j}, y_{m_j})$
  31. Example
      [Figure: path drawn on a grid with X positions 1-4 and Y positions 1-5]
      Assume that $P = (p_1, p_2, p_3, p_4, p_5, p_6)$, where $p_1 = (1, 1)$, $p_2 = (1, 2)$, $p_3 = (2, 3)$, $p_4 = (2, 4)$, $p_5 = (3, 5)$, $p_6 = (4, 5)$, is a warping path between X[1:4] and Y[1:5].
      Then the total cost of P is $c(x_1, y_1) + c(x_1, y_2) + c(x_2, y_3) + c(x_2, y_4) + c(x_3, y_5) + c(x_4, y_5)$.
      The notation $c(x_i, y_j)$ can be simplified to read $c(i, j)$ or $c(X_i, Y_j)$.
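The sum can be sketched in a few lines. The example on slide 31 does not give concrete values for X and Y, so the sequences used in the usage note are hypothetical, as are the class name and the 0/1 local cost (borrowed from the worked example later in the deck).

```java
final class PathCostSketch {
    /** Assumed local cost: 0 if the features match, 1 otherwise. */
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    /** Sums c(x_{n_j}, y_{m_j}) over all path points; path holds 1-based pairs {n_j, m_j}. */
    static int totalCost(char[] x, char[] y, int[][] path) {
        int total = 0;
        for (int[] p : path) {
            total += cost(x[p[0] - 1], y[p[1] - 1]);   // convert 1-based indices to 0-based
        }
        return total;
    }
}
```

With the hypothetical sequences X = (a, b, g, g) and Y = (a, b, b, g, g), the path from slide 31 costs 0 + 1 + 0 + 1 + 0 + 0 = 2.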
  32. DTW(X, Y): Cost of an Optimal Warping Path
      $DTW(X, Y) = \min \{ c_p(X, Y) \mid p \text{ is a warping path} \}$
  33. Remarks on DTW(X, Y)
      - There may be several optimal warping paths with the same total cost DTW(X, Y)
      - DTW(X, Y) is symmetric whenever the local cost measure is symmetric
      - DTW(X, Y) does not necessarily satisfy the triangle inequality (the length of one side of a triangle never exceeds the sum of the lengths of the other two sides); in the example worked out on slides 36-75, DTW(Y, Z) = 2 exceeds DTW(Y, X) + DTW(X, Z) = 0 + 1
  34. DTW Equations: Base Cases
      [Figure: cost matrix grid with X positions 1, ..., i, ..., N on the horizontal axis and Y positions 1, ..., j, ..., M on the vertical axis]
      - Initial condition: $dtw(1, 1) = c(1, 1)$
      - 1st column: $dtw(1, j) = dtw(1, j - 1) + c(1, j)$
      - 1st row: $dtw(i, 1) = dtw(i - 1, 1) + c(i, 1)$
  35. DTW Equations: Recursion
      [Figure: cost matrix grid with X positions 1, ..., i, ..., N on the horizontal axis and Y positions 1, ..., j, ..., M on the vertical axis]
      - Inner cell: $dtw(i, j) = \min\{dtw(i - 1, j),\ dtw(i - 1, j - 1),\ dtw(i, j - 1)\} + c(i, j)$
      - Interpretation: the cost of warping X[1:i] with Y[1:j] is the local cost of matching X[i] with Y[j] plus the minimum of the following three costs: 1) the cost of warping X[1:i-1] with Y[1:j]; 2) the cost of warping X[1:i-1] with Y[1:j-1]; 3) the cost of warping X[1:i] with Y[1:j-1]
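The base cases and the recursion translate directly into a table-filling loop. The sketch below is not the code from DTW.java, which is not shown in the slides; the class name is hypothetical, sequences are passed as strings of one-character features, and the 0/1 local cost is the one used in the example that follows.

```java
final class DtwSketch {
    /** Local cost: 0 if the features match, 1 otherwise. */
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    /** Fills the full cost matrix; d[i-1][j-1] holds dtw(i, j) (Java arrays are 0-based). */
    static int[][] costMatrix(char[] x, char[] y) {
        int n = x.length, m = y.length;          // both sequences are assumed non-empty
        int[][] d = new int[n][m];
        d[0][0] = cost(x[0], y[0]);                                          // initial condition
        for (int i = 1; i < n; i++) d[i][0] = d[i-1][0] + cost(x[i], y[0]);  // cells dtw(i, 1)
        for (int j = 1; j < m; j++) d[0][j] = d[0][j-1] + cost(x[0], y[j]);  // cells dtw(1, j)
        for (int i = 1; i < n; i++) {
            for (int j = 1; j < m; j++) {
                int best = Math.min(d[i-1][j], Math.min(d[i-1][j-1], d[i][j-1]));
                d[i][j] = best + cost(x[i], y[j]);                           // inner cell
            }
        }
        return d;
    }

    /** DTW(X, Y) is the value in the last cell of the matrix. */
    static int dtw(String x, String y) {
        int[][] d = costMatrix(x.toCharArray(), y.toCharArray());
        return d[x.length() - 1][y.length() - 1];
    }
}
```

Calling `DtwSketch.dtw("abg", "abbg")`, `DtwSketch.dtw("abbg", "agg")`, and `DtwSketch.dtw("abg", "agg")` gives a way to check the hand computations on the following slides.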
  36. Example
      - Let the feature space be $F = \{a, b, g\}$
      - Let the local cost measure be defined as follows: $c(x, y) = 0$ if $x = y$ and $c(x, y) = 1$ if $x \ne y$
      - Let the sequences be $X = (a, b, g)$, $Y = (a, b, b, g)$, and $Z = (a, g, g)$
      - Let us compute DTW(X, Y), DTW(Y, Z), and DTW(X, Z). Work it out on paper.
  37. DTW(X, Y)
  38. Example: DTW(X,Y)
      [Each slide in this example adds one cell to the cost matrix, with X = (a, b, g) on the horizontal axis and Y = (a, b, b, g) on the vertical axis; the completed matrix is shown on slide 50.]
      $dtw(1,1) = c(a, a) = 0$
  39. Example: DTW(X,Y)
      $dtw(2,1) = c(2,1) + dtw(1,1) = c(b, a) + dtw(1,1) = 1 + 0 = 1$
  40. Example: DTW(X,Y)
      $dtw(3,1) = c(3,1) + dtw(2,1) = c(g, a) + dtw(2,1) = 1 + 1 = 2$
  41. Example: DTW(X,Y)
      $dtw(1,2) = c(1,2) + dtw(1,1) = c(a, b) + dtw(1,1) = 1 + 0 = 1$
  42. Example: DTW(X,Y)
      $dtw(1,3) = c(1,3) + dtw(1,2) = c(a, b) + dtw(1,2) = 1 + 1 = 2$
  43. Example: DTW(X,Y)
      $dtw(1,4) = c(1,4) + dtw(1,3) = c(a, g) + dtw(1,3) = 1 + 2 = 3$
  44. Example: DTW(X,Y)
      $dtw(2,2) = c(2,2) + \min\{dtw(1,2), dtw(1,1), dtw(2,1)\} = c(b, b) + \min\{1, 0, 1\} = 0 + 0 = 0$
  45. Example: DTW(X,Y)
      $dtw(3,2) = c(3,2) + \min\{dtw(2,2), dtw(2,1), dtw(3,1)\} = c(g, b) + \min\{0, 1, 2\} = 1 + 0 = 1$
  46. Example: DTW(X,Y)
      $dtw(2,3) = c(2,3) + \min\{dtw(1,3), dtw(1,2), dtw(2,2)\} = c(b, b) + \min\{2, 1, 0\} = 0 + 0 = 0$
  47. Example: DTW(X,Y)
      $dtw(3,3) = c(3,3) + \min\{dtw(2,3), dtw(2,2), dtw(3,2)\} = c(g, b) + \min\{0, 0, 1\} = 1 + 0 = 1$
  48. Example: DTW(X,Y)
      $dtw(2,4) = c(2,4) + \min\{dtw(1,4), dtw(1,3), dtw(2,3)\} = c(b, g) + \min\{3, 2, 0\} = 1 + 0 = 1$
  49. Example: DTW(X,Y)
      $dtw(3,4) = c(3,4) + \min\{dtw(2,4), dtw(2,3), dtw(3,3)\} = c(g, g) + \min\{1, 0, 1\} = 0 + 0 = 0$
      So DTW(X,Y) = 0
  50. Example: DTW(X,Y)
      Completed cost matrix:

      Y
      4 g |  3  1  0
      3 b |  2  0  1
      2 b |  1  0  1
      1 a |  0  1  2
          +----------
             a  b  g
             1  2  3   X

      DTW(X, Y) = 0. The Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows): P = ((1,1), (2,2), (2,3), (3,4)).
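Chasing pointers amounts to walking back from cell (N, M) to cell (1, 1), always moving to the cheapest of the three predecessor cells. The following sketch recomputes the cost matrix and then backtracks; the class name and the tie-breaking rule (prefer the diagonal) are assumptions, so on inputs with ties it may return a different, equally cheap path than the one drawn on a slide.

```java
final class DtwBacktrack {
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    /** d[i-1][j-1] holds dtw(i, j). */
    static int[][] costMatrix(char[] x, char[] y) {
        int n = x.length, m = y.length;
        int[][] d = new int[n][m];
        d[0][0] = cost(x[0], y[0]);
        for (int i = 1; i < n; i++) d[i][0] = d[i-1][0] + cost(x[i], y[0]);
        for (int j = 1; j < m; j++) d[0][j] = d[0][j-1] + cost(x[0], y[j]);
        for (int i = 1; i < n; i++)
            for (int j = 1; j < m; j++)
                d[i][j] = Math.min(d[i-1][j], Math.min(d[i-1][j-1], d[i][j-1])) + cost(x[i], y[j]);
        return d;
    }

    /** Returns an optimal warping path as 1-based pairs {i, j}, from (1,1) to (N,M). */
    static int[][] optimalPath(String xs, String ys) {
        char[] x = xs.toCharArray(), y = ys.toCharArray();
        int[][] d = costMatrix(x, y);
        int i = x.length - 1, j = y.length - 1;
        int[][] rev = new int[x.length + y.length][];   // a path has at most N + M - 1 cells
        int len = 0;
        rev[len++] = new int[] {i + 1, j + 1};
        while (i > 0 || j > 0) {
            if (i == 0) {
                j--;                                     // only a step along Y is possible
            } else if (j == 0) {
                i--;                                     // only a step along X is possible
            } else {
                int diag = d[i-1][j-1], prevX = d[i-1][j], prevY = d[i][j-1];
                if (diag <= prevX && diag <= prevY) { i--; j--; }   // prefer the diagonal on ties
                else if (prevX <= prevY) { i--; }
                else { j--; }
            }
            rev[len++] = new int[] {i + 1, j + 1};
        }
        int[][] path = new int[len][];
        for (int k = 0; k < len; k++) path[k] = rev[len - 1 - k];   // reverse into forward order
        return path;
    }
}
```

For X = (a, b, g) and Y = (a, b, b, g), `DtwBacktrack.optimalPath("abg", "abbg")` returns the path ((1,1), (2,2), (2,3), (3,4)) shown on slide 50.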
  51. DTW(Y, Z)
  52. DTW(Y, Z)
      [Each slide in this example adds one cell to the cost matrix, with Y = (a, b, b, g) on the horizontal axis and Z = (a, g, g) on the vertical axis; the completed matrix is shown on slide 64.]
      $dtw(1,1) = c(a, a) = 0$
  53. DTW(Y, Z)
      $dtw(2,1) = c(b, a) + dtw(1,1) = 1 + 0 = 1$
  54. DTW(Y, Z)
      $dtw(3,1) = c(b, a) + dtw(2,1) = 1 + 1 = 2$
  55. DTW(Y, Z)
      $dtw(4,1) = c(g, a) + dtw(3,1) = 1 + 2 = 3$
  56. DTW(Y, Z)
      $dtw(1,2) = c(a, g) + dtw(1,1) = 1 + 0 = 1$
  57. DTW(Y, Z)
      $dtw(1,3) = c(a, g) + dtw(1,2) = 1 + 1 = 2$
  58. DTW(Y, Z)
      $dtw(2,2) = c(b, g) + \min\{dtw(1,2), dtw(1,1), dtw(2,1)\} = 1 + \min\{1, 0, 1\} = 1 + 0 = 1$
  59. DTW(Y, Z)
      $dtw(3,2) = c(b, g) + \min\{dtw(2,2), dtw(2,1), dtw(3,1)\} = 1 + \min\{1, 1, 2\} = 1 + 1 = 2$
  60. DTW(Y, Z)
      $dtw(4,2) = c(g, g) + \min\{dtw(3,2), dtw(3,1), dtw(4,1)\} = 0 + \min\{2, 2, 3\} = 0 + 2 = 2$
  61. DTW(Y, Z)
      $dtw(2,3) = c(b, g) + \min\{dtw(1,3), dtw(1,2), dtw(2,2)\} = 1 + \min\{2, 1, 1\} = 1 + 1 = 2$
  62. DTW(Y, Z)
      $dtw(3,3) = c(b, g) + \min\{dtw(2,3), dtw(2,2), dtw(3,2)\} = 1 + \min\{2, 1, 2\} = 1 + 1 = 2$
  63. DTW(Y, Z)
      $dtw(4,3) = c(g, g) + \min\{dtw(3,3), dtw(3,2), dtw(4,2)\} = 0 + \min\{2, 2, 2\} = 0 + 2 = 2$
  64. DTW(Y, Z)
      Completed cost matrix:

      Z
      3 g |  2  2  2  2
      2 g |  1  1  2  2
      1 a |  0  1  2  3
          +-------------
             a  b  b  g
             1  2  3  4   Y

      DTW(Y, Z) = 2. The Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows): P = ((1,1), (2,2), (3,2), (4,3)).
  65. DTW(X, Z)
  66. DTW(X, Z)
      [Each slide in this example adds one cell to the cost matrix, with X = (a, b, g) on the horizontal axis and Z = (a, g, g) on the vertical axis; the completed matrix is shown on slide 75.]
      $dtw(1,1) = c(a, a) = 0$
  67. DTW(X, Z)
      $dtw(2,1) = c(b, a) + dtw(1,1) = 1 + 0 = 1$
  68. DTW(X, Z)
      $dtw(3,1) = c(g, a) + dtw(2,1) = 1 + 1 = 2$
  69. DTW(X, Z)
      $dtw(1,2) = c(a, g) + dtw(1,1) = 1 + 0 = 1$
  70. DTW(X, Z)
      $dtw(1,3) = c(a, g) + dtw(1,2) = 1 + 1 = 2$
  71. DTW(X, Z)
      $dtw(2,2) = c(b, g) + \min\{dtw(1,2), dtw(1,1), dtw(2,1)\} = 1 + \min\{1, 0, 1\} = 1 + 0 = 1$
  72. DTW(X, Z)
      $dtw(3,2) = c(g, g) + \min\{dtw(2,2), dtw(2,1), dtw(3,1)\} = 0 + \min\{1, 1, 2\} = 0 + 1 = 1$
  73. DTW(X, Z)
      $dtw(2,3) = c(b, g) + \min\{dtw(1,3), dtw(1,2), dtw(2,2)\} = 1 + \min\{2, 1, 1\} = 1 + 1 = 2$
  74. DTW(X, Z)
      $dtw(3,3) = c(g, g) + \min\{dtw(2,3), dtw(2,2), dtw(3,2)\} = 0 + \min\{2, 1, 1\} = 0 + 1 = 1$
  75. DTW(X, Z)
      Completed cost matrix:

      Z
      3 g |  2  2  1
      2 g |  1  1  1
      1 a |  0  1  2
          +----------
             a  b  g
             1  2  3   X

      DTW(X, Z) = 1. The Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows): P = ((1,1), (2,2), (3,3)).
  76. Possible Optimizations of DTW
      - The computation of DTW can be optimized so that only the cells within a specific window of the cost matrix are considered
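The slide does not say which window is meant. One common choice, assumed here, is a band around the diagonal in which only cells with |i - j| <= w are computed. The class name, the parameter `w`, and the `INF` marker are hypothetical.

```java
final class WindowedDtw {
    /** Marker for cells outside the window; small costs can be added to it without overflow. */
    static final int INF = Integer.MAX_VALUE / 2;

    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    /** DTW restricted to cells with |i - j| <= w; returns INF if no warping path fits in the window. */
    static int dtw(String xs, String ys, int w) {
        char[] x = xs.toCharArray(), y = ys.toCharArray();
        int n = x.length, m = y.length;
        int[][] d = new int[n + 1][m + 1];               // row 0 and column 0 are padding
        for (int[] row : d) java.util.Arrays.fill(row, INF);
        d[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            int lo = Math.max(1, i - w), hi = Math.min(m, i + w);
            for (int j = lo; j <= hi; j++) {
                int best = Math.min(d[i-1][j], Math.min(d[i-1][j-1], d[i][j-1]));
                if (best < INF) d[i][j] = best + cost(x[i-1], y[j-1]);
            }
        }
        return d[n][m];
    }
}
```

With w = 1 the example sequences X = (a, b, g) and Y = (a, b, b, g) still give cost 0, because the optimal path stays within one cell of the diagonal. With w = 0 no warping path exists, since the sequences differ in length. The window is a trade-off: a narrow band saves work but can miss the true optimum.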
  77. Possible Optimizations of DTW
      - If we care only about the total cost of warping sequence X with sequence Y, we do not need to store the entire N x M cost matrix; two columns suffice, because each column depends only on the previous one
      - The storage savings are huge, but the running time remains the same: O(N x M)
      - We can also normalize the DTW cost by N x M to keep it low
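A minimal sketch of the two-column idea follows, again with a hypothetical class name and the 0/1 local cost from the worked example. It keeps only the previous and the current column of the cost matrix and swaps them after each step along X.

```java
final class TwoColumnDtw {
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    /** Computes DTW(X, Y) with O(M) storage; both sequences are assumed non-empty. */
    static int dtw(String xs, String ys) {
        char[] x = xs.toCharArray(), y = ys.toCharArray();
        int m = y.length;
        int[] prev = new int[m];    // column for position i - 1 of X
        int[] curr = new int[m];    // column for position i of X
        prev[0] = cost(x[0], y[0]);
        for (int j = 1; j < m; j++) prev[j] = prev[j-1] + cost(x[0], y[j]);
        for (int i = 1; i < x.length; i++) {
            curr[0] = prev[0] + cost(x[i], y[0]);
            for (int j = 1; j < m; j++) {
                int best = Math.min(prev[j], Math.min(prev[j-1], curr[j-1]));
                curr[j] = best + cost(x[i], y[j]);
            }
            int[] tmp = prev; prev = curr; curr = tmp;   // reuse the old column as scratch space
        }
        return prev[m - 1];
    }
}
```

The price of the saved storage is that the optimal warping path can no longer be recovered by backtracking, because the earlier columns have been overwritten.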
  78. Spoken Word Recognition (source code is in WavAudioDictionary.java)
  79. General Outline
      - Given a directory of audio files with spoken words, process each file into a table that maps specific words (or phrases) to digital signal vectors
      - These signal vectors can be pre-processed to eliminate silences
      - An input audio file is taken and digitized into a digital signal vector
      - The input vector is compared with each vector in the table by computing the DTW score between them; the table entry with the lowest score is the best match
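The outline can be sketched as a nearest-template search. This is not the code of WavAudioDictionary.java, which is not shown in the slides: the table is modeled as two parallel arrays, the local cost |x - y| is an assumption, and all names are hypothetical.

```java
final class WordRecognizerSketch {
    /** DTW cost between two non-empty signal vectors with local cost |x - y| (two-column version). */
    static double dtw(double[] x, double[] y) {
        int m = y.length;
        double[] prev = new double[m], curr = new double[m];
        prev[0] = Math.abs(x[0] - y[0]);
        for (int j = 1; j < m; j++) prev[j] = prev[j-1] + Math.abs(x[0] - y[j]);
        for (int i = 1; i < x.length; i++) {
            curr[0] = prev[0] + Math.abs(x[i] - y[0]);
            for (int j = 1; j < m; j++) {
                double best = Math.min(prev[j], Math.min(prev[j-1], curr[j-1]));
                curr[j] = best + Math.abs(x[i] - y[j]);
            }
            double[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[m - 1];
    }

    /** Returns the word whose template has the lowest DTW cost to the input vector. */
    static String recognize(double[] input, String[] words, double[][] templates) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int k = 0; k < words.length; k++) {
            double c = dtw(input, templates[k]);
            if (c < bestCost) {
                bestCost = c;
                best = words[k];
            }
        }
        return best;
    }
}
```

Because DTW tolerates stretching, an input that is a slower rendition of a template (for example {0, 0, 1, 2, 2, 1, 0} against {0, 1, 2, 1, 0}) still matches it with cost 0.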
  80. Optimizations
      - If we use DTW to compute the similarity between the digital audio input vector and the vectors in the table, it is vital to keep the vectors as short as possible without sacrificing precision
      - Possible suggestions: decreasing the sampling rate and merging samples into super-features (e.g., Haar coefficients)
      - Parallelizing similarity computations
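Both shortening suggestions can be sketched in a few lines. The names are hypothetical, and the pairwise average is used as a stand-in for a Haar approximation coefficient (the standard coefficient is the pair sum divided by the square root of 2, which differs only by a constant factor).

```java
final class ShortenVector {
    /** Keeps every k-th sample (naive decimation with no low-pass filter). */
    static double[] decimate(double[] x, int k) {
        double[] out = new double[(x.length + k - 1) / k];
        for (int i = 0; i < out.length; i++) out[i] = x[i * k];
        return out;
    }

    /** Replaces each pair of samples by its average; a trailing odd sample is dropped. */
    static double[] pairwiseAverage(double[] x) {
        double[] out = new double[x.length / 2];
        for (int i = 0; i < out.length; i++) out[i] = (x[2*i] + x[2*i + 1]) / 2.0;
        return out;
    }
}
```

Halving both vectors cuts the number of DTW cells by a factor of four. Decimating without filtering lowers the effective sampling rate, so frequencies above half the new rate will alias.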
