 NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition
 


 



     Presentation Transcript

    • Natural Language Processing: Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition. Vladimir Kulyukin, www.vkedco.blogspot.com
    • Outline
        • Audio Processing
        • Zero Crossing Rate
        • Dynamic Time Warping
        • Spoken Word Recognition
    • Audio Processing
    • Samples
        • Samples are successive snapshots of a specific signal
        • Audio files are samples of sound waves
        • Microphones convert acoustic signals into analog electrical signals, and then an analog-to-digital converter transforms the analog signals into digital samples
    • Digital Audio Signal [Figure: waveform plot; vertical axis: sound pressure, horizontal axis: time]
    • Amplitude
        • Amplitude (in audio processing) is a measure of sound pressure
        • Amplitude is measured at a specific rate
        • Amplitude measurements result in digital samples
        • Some samples have positive values
        • Some samples have negative values
    • Digital Approximation Accuracy
        • Any digitization of analog signals carries some inaccuracy
        • Approximation accuracy depends on two factors: 1) sampling rate and 2) resolution
        • In audio processing, sampling is the reduction of a continuous signal to a discrete signal
        • Sampling rate is the number of samples per unit of time
        • Resolution is the size of a sample (e.g., the number of bits)
    • Sampling Rate & Resolution
        • Sampling rate is measured in Hertz (Hz)
        • For a sampling rate, 1 Hz means one sample per second
        • For example, if the audio is sampled 44100 times per second, then its sampling rate is 44100 Hz
        • Some typical resolutions are 8 bits, 16 bits, and 32 bits
    • Nyquist-Shannon Sampling Theorem
        • This theorem states that perfect reconstruction of a signal is possible if the sampling frequency is greater than two times the maximum frequency of the signal being sampled
        • For example, if a signal has a maximum frequency of 50 Hz, then it can, theoretically, be reconstructed if sampled at a rate above 100 Hz, which avoids aliasing (the effect where different frequencies become indistinguishable after sampling)
    • Audio File Formats
        • WAVE (WAV) is often associated with Windows but is now implemented on other platforms
        • AIFF is common on Mac OS
        • AU is common on Unix/Linux
        • These are similar formats that vary in how they represent data, pack samples (e.g., little-endian vs. big-endian), etc.
        • A Java example of how to manipulate WAV files can be downloaded from WavFileManip.java
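To make the roles of sampling rate and resolution concrete, here is a minimal sketch that estimates how many bytes of raw, uncompressed audio a recording needs. The class name `AudioSize`, the method `rawBytes`, and the channel-count parameter are assumptions introduced for illustration; they are not part of the slides or of WavFileManip.java.

```java
class AudioSize {
    // Bytes of raw audio = number of samples * bytes per sample * channels.
    // Assumes the resolution is a whole number of bytes (8, 16, 32 bits).
    static long rawBytes(int sampleRateHz, int bitsPerSample, int channels, double seconds) {
        long samples = Math.round(sampleRateHz * seconds);
        return samples * (bitsPerSample / 8) * channels;
    }

    public static void main(String[] args) {
        // One second of 44100 Hz, 16-bit, single-channel audio.
        System.out.println(rawBytes(44100, 16, 1, 1.0));
    }
}
```

Doubling either the sampling rate or the resolution doubles the storage, which is why both are worth reducing before running DTW on the signal.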
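The sampling condition can be checked in a few lines. The sketch below is illustrative only: the class `Nyquist` and both method names are hypothetical, and `aliasedFrequency` uses the standard frequency-folding rule for a pure tone with a non-negative frequency.

```java
class Nyquist {
    // True if the sampling rate is strictly greater than twice the maximum frequency.
    static boolean satisfiesNyquist(double maxFreqHz, double sampleRateHz) {
        return sampleRateHz > 2.0 * maxFreqHz;
    }

    // Apparent frequency of a pure tone after sampling: the tone's frequency
    // folded into the range [0, sampleRateHz / 2]. Assumes freqHz >= 0.
    static double aliasedFrequency(double freqHz, double sampleRateHz) {
        double r = freqHz % sampleRateHz;       // position within one multiple of the rate
        return Math.min(r, sampleRateHz - r);   // fold around half the sampling rate
    }

    public static void main(String[] args) {
        System.out.println(satisfiesNyquist(50.0, 100.0));   // exactly 2x is not enough
        System.out.println(aliasedFrequency(70.0, 100.0));   // a 70 Hz tone sampled at 100 Hz
    }
}
```

Under these assumptions a 70 Hz tone sampled at 100 Hz is indistinguishable from a 30 Hz tone, which is the aliasing effect the theorem guards against.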
    • Zero Crossing Rate
    • What is Zero Crossing Rate (ZCR)?
        • Zero Crossing Rate (ZCR) is a measure of the number of times, in a given sample, that the amplitude crosses the horizontal line at 0
        • ZCR can be used to detect silence vs. non-silence, voiced vs. unvoiced speech, speaker's identity, etc.
        • ZCR is essentially the count of successive samples changing algebraic signs
    • ZCR Source (source code is in ZeroCrossingRate.java)
        public class ZeroCrossingRate {
            // Counts sign changes between successive samples and divides by normalizer.
            public static double computeZCR01(double[] signals, double normalizer) {
                long numZC = 0;
                for (int i = 1; i < signals.length; i++) {
                    // Zero is treated as non-negative: a crossing is a change between < 0 and >= 0.
                    if ((signals[i] >= 0 && signals[i-1] < 0) ||
                        (signals[i] < 0 && signals[i-1] >= 0)) {
                        numZC++;
                    }
                }
                return numZC / normalizer;
            }
        }
    • ZCR in Voiced vs. Unvoiced Speech
        • Voiced speech is produced when vowels are spoken
        • Voiced speech is characterized by constant-frequency tones of some duration
        • Unvoiced speech is produced when consonants are spoken
        • Unvoiced speech is non-periodic and random-like because air passes through a narrow constriction of the vocal tract
    • ZCR in Voiced vs. Unvoiced Speech
        • Phonetic theory states that voiced speech has a smooth air flow through the vocal tract whereas unvoiced speech has a turbulent air flow that produces noise
        • Thus, voiced speech should have a low ZCR whereas unvoiced speech should have a high ZCR
    • Amplitude of Voiced vs. Unvoiced Speech
        • Amplitude of unvoiced speech is low
        • Amplitude of voiced speech is high
        • Given a digital sample, we can use average amplitude as a measure of the sample's energy
        • This can be used to classify samples as vowels and consonants
    • ZCR & Amplitude of Voiced & Unvoiced Speech
                      Voiced    Unvoiced
        ZCR           LOW       HIGH
        Amplitude     HIGH      LOW
    • Detection of Silence & Non-Silence (source code is in WavFileManip.detectSilence())
        silence_buffer = [];
        non_silence_buffer = [];
        buffer = [];
        while ( there are still frames left ) {
            Read a specific number of frames into buffer;
            Compute ZCR and average amplitude of buffer;
            if ( ZCR and average amplitude are below specific thresholds ) {
                add the buffer to silence_buffer;
            } else {
                add the buffer to non_silence_buffer;
            }
        }
    • Dynamic Time Warping (source code is in DTW.java)
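The silence-detection loop can be sketched in Java as follows. This is not the code from WavFileManip.detectSilence(), which is not shown in the slides. The class `SilenceDetector`, the frame size, both thresholds, and the choice to normalize ZCR by the number of adjacent sample pairs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SilenceDetector {
    // Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative).
    static double zcr(double[] frame) {
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i] >= 0) != (frame[i - 1] >= 0)) crossings++;
        }
        return frame.length > 1 ? (double) crossings / (frame.length - 1) : 0.0;
    }

    // Mean absolute sample value, used here as a simple energy estimate.
    static double averageAmplitude(double[] frame) {
        double sum = 0.0;
        for (double s : frame) sum += Math.abs(s);
        return frame.length > 0 ? sum / frame.length : 0.0;
    }

    // Returns one flag per frame: true means the frame was classified as silence.
    static List<Boolean> detectSilence(double[] signal, int frameSize,
                                       double zcrThreshold, double ampThreshold) {
        List<Boolean> silent = new ArrayList<>();
        for (int start = 0; start < signal.length; start += frameSize) {
            int end = Math.min(start + frameSize, signal.length);
            double[] frame = Arrays.copyOfRange(signal, start, end);
            // Silence only if BOTH measures are below their thresholds.
            silent.add(zcr(frame) < zcrThreshold && averageAmplitude(frame) < ampThreshold);
        }
        return silent;
    }

    public static void main(String[] args) {
        double[] signal = {0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.5, -0.5};
        System.out.println(detectSilence(signal, 4, 0.5, 0.1));
    }
}
```

In practice the thresholds would have to be tuned on recordings from the target microphone and environment; the values in `main` are placeholders.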
    • Introduction
        • Dynamic Time Warping (DTW) is a method to find an optimal alignment between two time-dependent sequences
        • DTW aligns ("warps") two sequences in a nonlinear way to match each other
        • DTW has been successfully used in automatic speech recognition (ASR), bioinformatics (genetic sequence matching), and video analysis
    • Basic Definitions
        • There are two sequences: $X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$
        • There is a feature space $F$ such that $x_i \in F$ and $y_j \in F$, where $1 \le i \le N$ and $1 \le j \le M$
        • There is a local cost measure mapping 2-tuples of features to non-negative reals: $c : F \times F \to \mathbb{R}_{\ge 0}$
    • Sample Sequences
    • Sample Alignment
    • Cost Matrix DTW(N, M)
        • X and Y are sequences X[1:N] and Y[1:M]
        • $dtw(i, j)$ is the cost of warping X[1:i] with Y[1:j]
        • [Figure: N × M grid; horizontal axis: X indices 1, 2, …, i, …, N; vertical axis: Y indices 1, 2, …, j, …, M]
    • Warping Path
        • $P = (p_1, \ldots, p_L)$, where $p_j = (n_j, m_j) \in [1, N] \times [1, M]$ and $j \in [1, L]$, is a warping path if
        • 1) $p_1 = (1, 1)$ and $p_L = (N, M)$
        • 2) $n_1 \le n_2 \le \ldots \le n_L$ and $m_1 \le m_2 \le \ldots \le m_L$
        • 3) $p_{l+1} - p_l \in \{(1, 0), (0, 1), (1, 1)\}$ for $1 \le l \le L - 1$
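    • Valid Warping Path
        • $P = (p_1, p_2, p_3, p_4, p_5, p_6)$, where $p_1 = (1, 1)$, $p_2 = (1, 2)$, $p_3 = (2, 3)$, $p_4 = (2, 4)$, $p_5 = (3, 5)$, $p_6 = (4, 5)$
        • [Figure: the path plotted on a grid with horizontal axis 1 to 4 and vertical axis 1 to 5]
    • Invalid Warping Path
        • $p_1 \ne (1, 1)$, so constraint 1 is not satisfied
        • [Figure: path of six points on a 4 × 5 grid that does not start at (1, 1)]
    • Invalid Warping Path
        • $p_3 = (3, 3)$, $p_4 = (2, 4)$, and $3 > 2$, so the 2nd constraint is not satisfied
        • [Figure: path of six points on a 4 × 5 grid that steps backward along the horizontal axis]
    • Invalid Warping Path
        • $p_2 = (2, 2)$, $p_3 = (3, 4)$, and $p_3 - p_2 = (3, 4) - (2, 2) = (1, 2) \notin \{(1, 0), (0, 1), (1, 1)\}$, so the 3rd condition is not satisfied
        • [Figure: path of five points on a 4 × 5 grid that skips a cell along the vertical axis]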
    • Total Cost of a Warping Path
        • If $P = (p_1, \ldots, p_L)$ is a warping path between sequences X and Y, then its total cost is $c_P(X, Y) = \sum_{j=1}^{L} c(x_{n_j}, y_{m_j})$
    • Example
        • Assume that $P = (p_1, p_2, p_3, p_4, p_5, p_6)$, where $p_1 = (1, 1)$, $p_2 = (1, 2)$, $p_3 = (2, 3)$, $p_4 = (2, 4)$, $p_5 = (3, 5)$, $p_6 = (4, 5)$, is a warping path between X[1:4] and Y[1:5]
        • Then the total cost of P is $c(x_1, y_1) + c(x_1, y_2) + c(x_2, y_3) + c(x_2, y_4) + c(x_3, y_5) + c(x_4, y_5)$
        • The notation $c(x_i, y_j)$ can be simplified to read $c(i, j)$ or $c(X_i, Y_j)$
        • [Figure: the path plotted on a grid with X (1 to 4) on the horizontal axis and Y (1 to 5) on the vertical axis]
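The three warping-path conditions and the total-cost sum can be checked mechanically. The sketch below is a hypothetical helper, not part of DTW.java. It represents a path as an array of 1-based (n, m) pairs and assumes numeric sequences with local cost |x − y|, a choice the slides leave open.

```java
class WarpingPath {
    // Checks the warping-path conditions for a path over X[1:n] and Y[1:m].
    // path[k] = {n_k, m_k}, using 1-based indices as on the slides.
    static boolean isValid(int[][] path, int n, int m) {
        int len = path.length;
        if (len == 0) return false;
        // Condition 1: the path starts at (1, 1) and ends at (n, m).
        if (path[0][0] != 1 || path[0][1] != 1) return false;
        if (path[len - 1][0] != n || path[len - 1][1] != m) return false;
        // Condition 3: every step is (1, 0), (0, 1), or (1, 1).
        // These step sizes also imply condition 2 (monotonicity).
        for (int k = 1; k < len; k++) {
            int dn = path[k][0] - path[k - 1][0];
            int dm = path[k][1] - path[k - 1][1];
            boolean ok = (dn == 1 && dm == 0) || (dn == 0 && dm == 1) || (dn == 1 && dm == 1);
            if (!ok) return false;
        }
        return true;
    }

    // Total cost: sum of local costs along the path.
    static double totalCost(int[][] path, double[] x, double[] y) {
        double cost = 0.0;
        for (int[] p : path) {
            cost += Math.abs(x[p[0] - 1] - y[p[1] - 1]); // assumed local cost |x - y|
        }
        return cost;
    }

    public static void main(String[] args) {
        int[][] p = {{1, 1}, {1, 2}, {2, 3}, {2, 4}, {3, 5}, {4, 5}};
        System.out.println(isValid(p, 4, 5));
        System.out.println(totalCost(p, new double[] {1, 2, 3, 4}, new double[] {1, 1, 2, 2, 3}));
    }
}
```

The path in `main` is the valid path from the slides; the numeric sequences are made-up values chosen only to exercise `totalCost`.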
    • DTW(X, Y): Cost of an Optimal Warping Path
        • $DTW(X, Y) = \min\{c_P(X, Y) \mid P \text{ is a warping path}\}$
    • Remarks on DTW(X, Y)
        • There may be several optimal warping paths with the same total cost DTW(X, Y)
        • DTW(X, Y) is symmetric whenever the local cost measure is symmetric
        • DTW(X, Y) does not necessarily satisfy the triangle inequality (for a distance, the requirement that the sum of the lengths of two sides is at least the length of the remaining side)
    • DTW Equations: Base Cases
        • Initial condition: $dtw(1, 1) = c(1, 1)$
        • 1st row: $dtw(i, 1) = dtw(i - 1, 1) + c(i, 1)$
        • 1st column: $dtw(1, j) = dtw(1, j - 1) + c(1, j)$
        • [Figure: N × M grid; horizontal axis: X indices 1 to N; vertical axis: Y indices 1 to M]
    • DTW Equations: Recursion
        • Inner cell: $dtw(i, j) = \min\{dtw(i - 1, j),\ dtw(i - 1, j - 1),\ dtw(i, j - 1)\} + c(i, j)$
        • Interpretation: the cost of warping X[1:i] with Y[1:j] is the cost of warping X[i] with Y[j] plus the minimum of the following three costs: 1) the cost of warping X[1:i-1] with Y[1:j]; 2) the cost of warping X[1:i-1] with Y[1:j-1]; 3) the cost of warping X[1:i] with Y[1:j-1]
        • [Figure: N × M grid; horizontal axis: X indices 1 to N; vertical axis: Y indices 1 to M]
    • Example
        • Let the feature space be $F = \{a, b, g\}$
        • Let the local cost measure be defined as follows: $c(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y \end{cases}$
        • Let the sequences be: $X = (a, b, g)$, $Y = (a, b, b, g)$, $Z = (a, g, g)$
        • Let us compute dtw(X, Y), dtw(Y, Z), and dtw(X, Z). Work it out on paper.
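As a way to check paper results, the base cases and the recursion can be transcribed almost directly into Java. This is a minimal sketch, not the DTW.java referenced by the slides. The class name `SymbolDtw` is hypothetical, sequences are passed as non-empty strings with one character per feature, and the local cost is the 0/1 measure from the example.

```java
class SymbolDtw {
    // Local cost from the example: 0 if the symbols match, 1 otherwise.
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    // Fills the full N x M cost matrix and returns dtw(N, M).
    // Both sequences must be non-empty.
    static int dtw(String x, String y) {
        int n = x.length(), m = y.length();
        int[][] d = new int[n][m];   // d[i][j] holds dtw(i + 1, j + 1)
        d[0][0] = cost(x.charAt(0), y.charAt(0));                 // initial condition
        for (int i = 1; i < n; i++) {                             // 1st row
            d[i][0] = d[i - 1][0] + cost(x.charAt(i), y.charAt(0));
        }
        for (int j = 1; j < m; j++) {                             // 1st column
            d[0][j] = d[0][j - 1] + cost(x.charAt(0), y.charAt(j));
        }
        for (int i = 1; i < n; i++) {                             // inner cells
            for (int j = 1; j < m; j++) {
                int best = Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
                d[i][j] = best + cost(x.charAt(i), y.charAt(j));
            }
        }
        return d[n - 1][m - 1];
    }

    public static void main(String[] args) {
        System.out.println(dtw("abg", "abbg")); // X vs. Y
        System.out.println(dtw("abbg", "agg")); // Y vs. Z
        System.out.println(dtw("abg", "agg"));  // X vs. Z
    }
}
```

For these three sequences the costs are 0, 2, and 1 respectively. Since DTW(Y, Z) = 2 exceeds DTW(Y, X) + DTW(X, Z) = 0 + 1, this small example also illustrates that DTW need not satisfy the triangle inequality.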
    • DTW(X, Y)
    • Example: DTW(X,Y)
        • $dtw(1, 1) = c(a, a) = 0$
        • [Figure: 3 × 4 cost matrix with X = (a, b, g) on the horizontal axis and Y = (a, b, b, g) on the vertical axis, filled in cell by cell on the following slides]
    • Example: DTW(X,Y)
        • $dtw(2, 1) = c(2, 1) + dtw(1, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1$
    • Example: DTW(X,Y)
        • $dtw(3, 1) = c(3, 1) + dtw(2, 1) = c(g, a) + dtw(2, 1) = 1 + 1 = 2$
    • Example: DTW(X,Y)
        • $dtw(1, 2) = c(1, 2) + dtw(1, 1) = c(a, b) + dtw(1, 1) = 1 + 0 = 1$
    • Example: DTW(X,Y)
        • $dtw(1, 3) = c(1, 3) + dtw(1, 2) = c(a, b) + dtw(1, 2) = 1 + 1 = 2$
    • Example: DTW(X,Y)
        • $dtw(1, 4) = c(1, 4) + dtw(1, 3) = c(a, g) + dtw(1, 3) = 1 + 2 = 3$
    • Example: DTW(X,Y)
        • $dtw(2, 2) = c(2, 2) + \min\{dtw(1, 2), dtw(1, 1), dtw(2, 1)\} = c(b, b) + \min\{1, 0, 1\} = 0 + 0 = 0$
    • Example: DTW(X,Y)
        • $dtw(3, 2) = c(3, 2) + \min\{dtw(2, 2), dtw(2, 1), dtw(3, 1)\} = c(g, b) + \min\{0, 1, 2\} = 1 + 0 = 1$
    • Example: DTW(X,Y)
        • $dtw(2, 3) = c(2, 3) + \min\{dtw(1, 3), dtw(1, 2), dtw(2, 2)\} = c(b, b) + \min\{2, 1, 0\} = 0 + 0 = 0$
    • Example: DTW(X,Y)
        • $dtw(3, 3) = c(3, 3) + \min\{dtw(2, 3), dtw(2, 2), dtw(3, 2)\} = c(g, b) + \min\{0, 0, 1\} = 1 + 0 = 1$
    • Example: DTW(X,Y)
        • $dtw(2, 4) = c(2, 4) + \min\{dtw(1, 4), dtw(1, 3), dtw(2, 3)\} = c(b, g) + \min\{3, 2, 0\} = 1 + 0 = 1$
    • Example: DTW(X,Y)
        • $dtw(3, 4) = c(3, 4) + \min\{dtw(2, 4), dtw(2, 3), dtw(3, 3)\} = c(g, g) + \min\{1, 0, 1\} = 0 + 0 = 0$
        • So DTW(X, Y) = 0
    • Example: DTW(X,Y)
        • DTW(X, Y) = 0. The Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows in the slide figure): P = ((1,1), (2,2), (2,3), (3,4)).
        • Completed cost matrix (columns: X; rows: Y, with j = 1 at the bottom):
              Y      X: a (1)   b (2)   g (3)
              g (4)     3       1       0
              b (3)     2       0       1
              b (2)     1       0       1
              a (1)     0       1       2
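"Chasing pointers" can be sketched as a backward walk over the filled matrix: starting at (N, M), repeatedly move to the cheapest of the three predecessor cells until (1, 1) is reached. The class `DtwPath` below is a hypothetical illustration using the 0/1 cost from the example and non-empty string inputs. Its tie-breaking rule (prefer the diagonal) is an assumption, so when several optimal paths exist it may return a different one than the slides show.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DtwPath {
    static int cost(char x, char y) { return x == y ? 0 : 1; }

    // Returns one optimal warping path as 1-based (i, j) pairs.
    static List<int[]> optimalPath(String x, String y) {
        int n = x.length(), m = y.length();
        int[][] d = new int[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                int c = cost(x.charAt(i), y.charAt(j));
                if (i == 0 && j == 0) d[i][j] = c;
                else if (j == 0) d[i][j] = d[i - 1][0] + c;
                else if (i == 0) d[i][j] = d[0][j - 1] + c;
                else d[i][j] = c + Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
            }
        }
        // Walk back from (n, m) to (1, 1), always moving to the cheapest predecessor.
        List<int[]> path = new ArrayList<>();
        int i = n - 1, j = m - 1;
        path.add(new int[] {i + 1, j + 1});
        while (i > 0 || j > 0) {
            if (i == 0) j--;
            else if (j == 0) i--;
            else {
                int diag = d[i - 1][j - 1], left = d[i - 1][j], down = d[i][j - 1];
                if (diag <= left && diag <= down) { i--; j--; } // ties prefer the diagonal
                else if (left <= down) i--;
                else j--;
            }
            path.add(new int[] {i + 1, j + 1});
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        for (int[] p : optimalPath("abg", "abbg")) {
            System.out.println("(" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

For X = (a, b, g) and Y = (a, b, b, g) the backward walk has no ties, so it recovers the same path as the slide: (1,1), (2,2), (2,3), (3,4).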
    • DTW(Y, Z)
    • DTW(Y, Z)
        • $dtw(1, 1) = c(a, a) = 0$
        • [Figure: 4 × 3 cost matrix with Y = (a, b, b, g) on the horizontal axis and Z = (a, g, g) on the vertical axis, filled in cell by cell on the following slides]
    • DTW(Y, Z)
        • $dtw(2, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1$
    • DTW(Y, Z)
        • $dtw(3, 1) = c(b, a) + dtw(2, 1) = 1 + 1 = 2$
    • DTW(Y, Z)
        • $dtw(4, 1) = c(g, a) + dtw(3, 1) = 1 + 2 = 3$
    • DTW(Y, Z)
        • $dtw(1, 2) = c(a, g) + dtw(1, 1) = 1 + 0 = 1$
    • DTW(Y, Z)
        • $dtw(1, 3) = c(a, g) + dtw(1, 2) = 1 + 1 = 2$
    • DTW(Y, Z)
        • $dtw(2, 2) = c(b, g) + \min\{dtw(1, 2), dtw(1, 1), dtw(2, 1)\} = 1 + \min\{1, 0, 1\} = 1 + 0 = 1$
    • DTW(Y, Z)
        • $dtw(3, 2) = c(b, g) + \min\{dtw(2, 2), dtw(2, 1), dtw(3, 1)\} = 1 + \min\{1, 1, 2\} = 1 + 1 = 2$
    • DTW(Y, Z)
        • $dtw(4, 2) = c(g, g) + \min\{dtw(3, 2), dtw(3, 1), dtw(4, 1)\} = 0 + \min\{2, 2, 3\} = 0 + 2 = 2$
    • DTW(Y, Z)
        • $dtw(2, 3) = c(b, g) + \min\{dtw(1, 3), dtw(1, 2), dtw(2, 2)\} = 1 + \min\{2, 1, 1\} = 1 + 1 = 2$
    • DTW(Y, Z)
        • $dtw(3, 3) = c(b, g) + \min\{dtw(2, 3), dtw(2, 2), dtw(3, 2)\} = 1 + \min\{2, 1, 2\} = 1 + 1 = 2$
    • DTW(Y, Z)
        • $dtw(4, 3) = c(g, g) + \min\{dtw(3, 3), dtw(3, 2), dtw(4, 2)\} = 0 + \min\{2, 2, 2\} = 0 + 2 = 2$
    • DTW(Y, Z)
        • DTW(Y, Z) = 2. The Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows in the slide figure): P = ((1,1), (2,2), (3,2), (4,3)).
        • Completed cost matrix (columns: Y; rows: Z, with j = 1 at the bottom):
              Z      Y: a (1)   b (2)   b (3)   g (4)
              g (3)     2       2       2       2
              g (2)     1       1       2       2
              a (1)     0       1       2       3
    • DTW(X, Z)
    • DTW(X, Z)
        • $dtw(1, 1) = c(a, a) = 0$
        • [Figure: 3 × 3 cost matrix with X = (a, b, g) on the horizontal axis and Z = (a, g, g) on the vertical axis, filled in cell by cell on the following slides]
    • DTW(X, Z)
        • $dtw(2, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1$
    • DTW(X, Z)
        • $dtw(3, 1) = c(g, a) + dtw(2, 1) = 1 + 1 = 2$
    • DTW(X, Z)
        • $dtw(1, 2) = c(a, g) + dtw(1, 1) = 1 + 0 = 1$
    • DTW(X, Z)
        • $dtw(1, 3) = c(a, g) + dtw(1, 2) = 1 + 1 = 2$
    • DTW(X, Z)
        • $dtw(2, 2) = c(b, g) + \min\{dtw(1, 2), dtw(1, 1), dtw(2, 1)\} = 1 + \min\{1, 0, 1\} = 1 + 0 = 1$
    • DTW(X, Z)
        • $dtw(3, 2) = c(g, g) + \min\{dtw(2, 2), dtw(2, 1), dtw(3, 1)\} = 0 + \min\{1, 1, 2\} = 0 + 1 = 1$
    • DTW(X, Z)
        • $dtw(2, 3) = c(b, g) + \min\{dtw(1, 3), dtw(1, 2), dtw(2, 2)\} = 1 + \min\{2, 1, 1\} = 1 + 1 = 2$
    • DTW(X, Z)
        • $dtw(3, 3) = c(g, g) + \min\{dtw(2, 3), dtw(2, 2), dtw(3, 2)\} = 0 + \min\{2, 1, 1\} = 0 + 1 = 1$
    • DTW(X, Z)
        • DTW(X, Z) = 1. The Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows in the slide figure): P = ((1,1), (2,2), (3,3)).
        • Completed cost matrix (columns: X; rows: Z, with j = 1 at the bottom):
              Z      X: a (1)   b (2)   g (3)
              g (3)     2       2       1
              g (2)     1       1       1
              a (1)     0       1       2
    • Possible Optimizations of DTW The computation of DTW can be optimized so that only the cells within a specific window are considered
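One common way to restrict the computation to a window is to fill only the cells within a fixed band around the diagonal. The slides do not say which window DTW.java uses, so the sketch below is an assumption: a hypothetical class `WindowedDtw`, a band defined by |i − j| ≤ window, numeric sequences, and local cost |x − y|.

```java
import java.util.Arrays;

class WindowedDtw {
    // DTW restricted to cells with |i - j| <= window; all other cells stay unreachable.
    static double dtw(double[] x, double[] y, int window) {
        int n = x.length, m = y.length;
        // The band must be at least |n - m| wide, or cell (n, m) cannot be reached.
        int w = Math.max(window, Math.abs(n - m));
        double[][] d = new double[n + 1][m + 1];   // row and column 0 are padding
        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            int lo = Math.max(1, i - w), hi = Math.min(m, i + w);
            for (int j = lo; j <= hi; j++) {
                double c = Math.abs(x[i - 1] - y[j - 1]); // assumed local cost
                d[i][j] = c + Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(dtw(new double[] {1, 2, 3}, new double[] {1, 2, 2, 3}, 1));
    }
}
```

A narrow window reduces the number of cells visited from N × M to roughly N times the band width, at the price of possibly missing an optimal path that strays far from the diagonal.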
    • Possible Optimizations of DTW
        • You may have realized by now that if we care only about the total cost of warping sequence X with sequence Y, we do not need to store the entire N × M cost matrix: we need only two columns, the current one and the previous one
        • The storage savings are huge, but the running time remains the same: O(N × M)
        • We can also normalize the DTW cost by N × M to keep it low
    • Spoken Word Recognition (source code is in WavAudioDictionary.java)
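The two-column idea can be sketched as follows. Each cell depends only on the previous column and on the cell below it in the current column, so two arrays of length M suffice. The class `TwoColumnDtw` is hypothetical, the inputs are assumed to be non-empty numeric sequences, and the local cost |x − y| is an assumption.

```java
class TwoColumnDtw {
    // Computes dtw(N, M) while keeping only the previous and current columns.
    static double dtw(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double[] prev = new double[m];   // column i - 1
        double[] curr = new double[m];   // column i
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double c = Math.abs(x[i] - y[j]); // assumed local cost
                if (i == 0 && j == 0) curr[j] = c;                 // initial condition
                else if (i == 0) curr[j] = curr[j - 1] + c;        // 1st column
                else if (j == 0) curr[j] = prev[j] + c;            // 1st row
                else curr[j] = c + Math.min(prev[j], Math.min(prev[j - 1], curr[j - 1]));
            }
            double[] tmp = prev; prev = curr; curr = tmp;          // reuse the two buffers
        }
        return prev[m - 1]; // after the final swap, prev holds column N
    }

    // DTW cost divided by N x M, as suggested on the slide.
    static double normalizedDtw(double[] x, double[] y) {
        return dtw(x, y) / ((double) x.length * y.length);
    }

    public static void main(String[] args) {
        System.out.println(dtw(new double[] {1, 2, 3}, new double[] {1, 3, 3}));
    }
}
```

The trade-off is that the optimal warping path can no longer be recovered by backtracking, because the earlier columns have been overwritten.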
    • General Outline
        • Given a directory of audio files with spoken words, process each file into a table that maps specific words (or phrases) to digital signal vectors
        • These signal vectors can be pre-processed to eliminate silences
        • An input audio file is taken and digitized into a digital signal vector
        • The input vector is compared with each digital vector in the table by computing the DTW score between them; the table entry with the lowest score is the best match
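The matching step of this outline can be sketched as a nearest-neighbor search under DTW. This is not the code from WavAudioDictionary.java, which is not shown in the slides. The class `WordRecognizer`, the in-memory `Map` standing in for the table, and the local cost |x − y| are assumptions, and reading and digitizing the audio files is left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class WordRecognizer {
    // Plain DTW over numeric signal vectors with assumed local cost |x - y|.
    static double dtw(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double[][] d = new double[n + 1][m + 1];   // row and column 0 are padding
        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double c = Math.abs(x[i - 1] - y[j - 1]);
                d[i][j] = c + Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    // Returns the dictionary word whose signal vector has the lowest DTW score
    // against the input, or null if the dictionary is empty.
    static String recognize(Map<String, double[]> dictionary, double[] input) {
        String best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : dictionary.entrySet()) {
            double score = dtw(e.getValue(), input);
            if (score < bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> dictionary = new LinkedHashMap<>();
        dictionary.put("yes", new double[] {0, 1, 0, 1});   // made-up vectors
        dictionary.put("no", new double[] {0, 5, 5, 0});
        System.out.println(recognize(dictionary, new double[] {0, 5, 5, 5, 0}));
    }
}
```

The vectors in `main` are toy data; real signal vectors would come from the digitized audio files after silence removal.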
    • Optimizations
        • If we use DTW to compute the similarity between the digital audio input vector and the vectors in the table, it is vital to keep the vectors as short as possible without sacrificing precision
        • Possible suggestions: decreasing the sampling rate and merging samples into super-features (e.g., Haar coefficients)
        • Parallelizing similarity computations
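Both shortening suggestions can be sketched in a few lines. The class `SignalShortener` and its methods are hypothetical. `decimate` is a naive reduction of the sampling rate with no low-pass filtering, and `pairwiseAverage` computes only the averaging half of one unnormalized Haar step; both are simplifying assumptions rather than the author's method.

```java
class SignalShortener {
    // Keeps every k-th sample, starting with the first (naive decimation, k >= 1).
    static double[] decimate(double[] signal, int k) {
        double[] out = new double[(signal.length + k - 1) / k];
        for (int i = 0; i < out.length; i++) {
            out[i] = signal[i * k];
        }
        return out;
    }

    // Replaces each pair of neighboring samples by its average, halving the length.
    // A trailing unpaired sample is dropped.
    static double[] pairwiseAverage(double[] signal) {
        double[] out = new double[signal.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (signal[2 * i] + signal[2 * i + 1]) / 2.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = {1, 3, 5, 7};
        System.out.println(java.util.Arrays.toString(pairwiseAverage(s)));
    }
}
```

Halving both vectors cuts the number of DTW cells by a factor of four, so even one averaging level gives a noticeable speedup.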
    • References
        • M. Müller. Information Retrieval for Music and Motion, Ch. 4. Springer. ISBN 978-3-540-74047-6.
        • R. G. Bachu, et al. "Separation of Voiced and Unvoiced using Zero Crossing Rate and Energy of the Speech Signal." American Society for Engineering Education (ASEE) Zone Conference Proceedings, 2008.