
- 1. Natural Language Processing: Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition. Vladimir Kulyukin, www.vkedco.blogspot.com
- 2. Outline: Audio Processing; Zero Crossing Rate; Dynamic Time Warping; Spoken Word Recognition
- 3. Audio Processing
- 4. Samples. Samples are successive snapshots of a signal; audio files are samples of sound waves. Microphones convert acoustic signals into analog electrical signals, and an analog-to-digital converter then transforms the analog signals into digital samples.
- 5. Digital Audio Signal (figure: sound pressure plotted against time)
- 6. Amplitude. Amplitude (in audio processing) is a measure of sound pressure. Amplitude is measured at a specific rate, and the measurements result in digital samples. Some samples have positive values; some have negative values.
- 7. Digital Approximation Accuracy. Any digitization of an analog signal carries some inaccuracy. Approximation accuracy depends on two factors: 1) sampling rate and 2) resolution. In audio processing, sampling is the reduction of a continuous signal to a discrete signal. Sampling rate is the number of samples per unit of time; resolution is the size of a sample (e.g., the number of bits).
- 8. Sampling Rate & Resolution. Sampling rate is measured in Hertz (Hz), i.e., samples per second. For example, if audio is sampled at 44,100 samples per second, its sampling rate is 44100 Hz. Some typical resolutions are 8 bits, 16 bits, and 32 bits.
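The arithmetic behind these figures is worth making concrete: the storage cost of uncompressed audio is just sampling rate times sample size times channel count. A minimal sketch (class and method names are illustrative, not from the deck's code):

```java
public class AudioStorage {
    // Bytes needed for one second of uncompressed audio:
    // samples/second * bytes/sample * channels.
    public static long bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels;
    }

    public static void main(String[] args) {
        // CD-quality audio: 44100 Hz, 16-bit, stereo -> 176400 bytes/s.
        System.out.println(bytesPerSecond(44100, 16, 2));
    }
}
```

So one minute of CD-quality stereo audio takes roughly 10 MB uncompressed, which is why the resolution and rate choices above matter.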
- 9. Nyquist-Shannon Sampling Theorem. The theorem states that perfect reconstruction of a signal is possible if the sampling frequency is greater than twice the maximum frequency of the signal being sampled. For example, if a signal has a maximum frequency of 50 Hz, then it can, theoretically, be reconstructed without aliasing (the effect that makes different signals indistinguishable once sampled) if sampled at any rate above 100 Hz.
- 10. Audio File Formats. WAVE (WAV) is often associated with Windows but is now implemented on other platforms as well; AIFF is common on Mac OS; AU is common on Unix/Linux. These are similar formats that vary in how they represent data, pack samples (e.g., little-endian vs. big-endian), etc. A Java example of how to manipulate WAV files can be downloaded as WavFileManip.java.
- 11. Zero Crossing Rate
- 12. What is Zero Crossing Rate (ZCR)? Zero Crossing Rate (ZCR) is a measure of the number of times, in a given sample, that the amplitude crosses the horizontal line at 0. ZCR can be used to detect silence vs. non-silence, voiced vs. unvoiced speech, a speaker's identity, etc. ZCR is essentially the count of successive samples changing algebraic sign.
- 13. ZCR Source

```java
public class ZeroCrossingRate {
    public static double computeZCR01(double[] signals, double normalizer) {
        long numZC = 0;
        for (int i = 1; i < signals.length; i++) {
            if ((signals[i] >= 0 && signals[i-1] < 0) ||
                (signals[i] < 0 && signals[i-1] >= 0)) {
                numZC++;
            }
        }
        return numZC / normalizer;
    }
}
```

Source code is in ZeroCrossingRate.java
- 14. ZCR in Voiced vs. Unvoiced Speech. Voiced speech is produced when vowels are spoken and is characterized by constant-frequency tones of some duration. Unvoiced speech is produced when consonants are spoken; it is non-periodic and random-like because air passes through a narrow constriction of the vocal tract.
- 15. ZCR in Voiced vs. Unvoiced Speech Phonetic theory states that voiced speech has a smooth air flow through the vocal tract whereas unvoiced speech has a turbulent air flow that produces noise Thus, voiced speech should have a low ZCR whereas unvoiced speech should have a high ZCR
- 16. Amplitude of Voiced vs. Unvoiced Speech. The amplitude of unvoiced speech is low; the amplitude of voiced speech is high. Given a digital sample, we can use average amplitude as a measure of the sample's energy. This can be used to classify samples as vowels or consonants.
- 17. ZCR & Amplitude of Voiced & Unvoiced Speech

  |           | Voiced | Unvoiced |
  |-----------|--------|----------|
  | ZCR       | LOW    | HIGH     |
  | Amplitude | HIGH   | LOW      |
- 18. Detection of Silence & Non-Silence

```
silence_buffer = []; non_silence_buffer = []; buffer = [];
while (there are still frames left) {
    read a specific number of frames into buffer;
    compute ZCR and average amplitude of buffer;
    if (ZCR and average amplitude are below specific thresholds) {
        add buffer to silence_buffer;
    } else {
        add buffer to non_silence_buffer;
    }
}
```

Source code is in WavFileManip.detectSilence()
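The pseudocode above can be fleshed out into a per-frame classifier. This is a minimal sketch, not the deck's actual WavFileManip.detectSilence(); the thresholds and the normalization of ZCR by frame length are illustrative assumptions:

```java
public class SilenceDetector {
    // Classifies one fixed-size frame of a signal as silence or non-silence
    // using its ZCR and average absolute amplitude, following the pseudocode:
    // a frame is silent when both measures fall below their thresholds.
    public static boolean isSilent(double[] frame, double zcrThreshold, double ampThreshold) {
        int numZC = 0;
        double sumAbs = 0.0;
        for (int i = 0; i < frame.length; i++) {
            sumAbs += Math.abs(frame[i]);
            if (i > 0 && ((frame[i] >= 0 && frame[i-1] < 0) ||
                          (frame[i] < 0 && frame[i-1] >= 0))) {
                numZC++;
            }
        }
        double zcr = numZC / (double) frame.length; // crossings per sample
        double avgAmp = sumAbs / frame.length;      // average energy proxy
        return zcr < zcrThreshold && avgAmp < ampThreshold;
    }
}
```

A caller would walk the signal frame by frame and route each frame into the silence or non-silence buffer based on this test.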
- 19. Dynamic Time Warping source code is in DTW.java
- 20. Introduction. Dynamic Time Warping (DTW) is a method to find an optimal alignment between two time-dependent sequences. DTW aligns ("warps") two sequences in a nonlinear way to match each other. DTW has been successfully used in automatic speech recognition (ASR), bioinformatics (genetic sequence matching), and video analysis.
- 21. Basic Definitions. There are two sequences X = (x_1, …, x_N) and Y = (y_1, …, y_M). There is a feature space F such that x_i ∈ F and y_j ∈ F, where 1 ≤ i ≤ N and 1 ≤ j ≤ M. There is a local cost measure mapping 2-tuples of features to non-negative reals: c: F × F → R≥0.
- 22. Sample Sequences
- 23. Sample Alignment
- 24. Cost Matrix DTW(N, M). X and Y are sequences X[1:N] and Y[1:M]. dtw(i, j) is the cost of warping X[1:i] with Y[1:j]. (Figure: an N × M grid with X indexed by i = 1, …, N along one axis and Y indexed by j = 1, …, M along the other.)
- 25. Warping Path. P = (p_1, …, p_L), where p_l = (n_l, m_l) ∈ [1, N] × [1, M] and l ∈ [1, L], is a warping path if: 1) p_1 = (1, 1) and p_L = (N, M); 2) n_1 ≤ n_2 ≤ … ≤ n_L and m_1 ≤ m_2 ≤ … ≤ m_L; 3) p_{l+1} − p_l ∈ {(1, 0), (0, 1), (1, 1)} for 1 ≤ l ≤ L − 1.
- 26. Valid Warping Path. (Figure: a path on a 4 × 5 grid.) P = (p_1, p_2, p_3, p_4, p_5, p_6), where p_1 = (1, 1), p_2 = (1, 2), p_3 = (2, 3), p_4 = (2, 4), p_5 = (3, 5), p_6 = (4, 5).
- 27. Invalid Warping Path. (Figure: a path on the grid.) p_1 ≠ (1, 1), so constraint 1 is not satisfied.
- 28. Invalid Warping Path. p_3 = (3, 3), p_4 = (2, 4), and 3 > 2, so the 2nd constraint is not satisfied.
- 29. Invalid Warping Path. p_2 = (2, 2), p_3 = (3, 4), and p_3 − p_2 = (3, 4) − (2, 2) = (1, 2) ∉ {(1, 0), (0, 1), (1, 1)}, so the 3rd constraint is not satisfied.
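The three warping-path constraints can be checked mechanically. A minimal sketch (class and method names are illustrative); note that the step condition (constraint 3) implies the monotonicity condition (constraint 2), so one loop covers both:

```java
public class WarpingPath {
    // Checks the warping-path constraints for a candidate path given as an
    // array of (n, m) pairs over an N x M grid, with 1-based indices.
    public static boolean isValid(int[][] p, int n, int m) {
        int len = p.length;
        if (len == 0) return false;
        // Constraint 1: the path starts at (1, 1) and ends at (N, M).
        if (p[0][0] != 1 || p[0][1] != 1) return false;
        if (p[len-1][0] != n || p[len-1][1] != m) return false;
        for (int l = 1; l < len; l++) {
            int dn = p[l][0] - p[l-1][0];
            int dm = p[l][1] - p[l-1][1];
            // Constraint 3 (which also enforces constraint 2): each step
            // must be (1, 0), (0, 1), or (1, 1).
            boolean step = (dn == 1 && dm == 0) || (dn == 0 && dm == 1)
                        || (dn == 1 && dm == 1);
            if (!step) return false;
        }
        return true;
    }
}
```

Against the examples above, the path from slide 26 passes, while a path taking a (1, 2) step or one not starting at (1, 1) fails.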
- 30. Total Cost of a Warping Path. If P = (p_1, …, p_L) is a warping path between sequences X and Y, then its total cost is c_P(X, Y) = Σ_{l=1}^{L} c(x_{n_l}, y_{m_l}).
- 31. Example. Assume that P = (p_1, p_2, p_3, p_4, p_5, p_6), where p_1 = (1, 1), p_2 = (1, 2), p_3 = (2, 3), p_4 = (2, 4), p_5 = (3, 5), p_6 = (4, 5), is a warping path between X[1:4] and Y[1:5]. Then the total cost of P is c(x_1, y_1) + c(x_1, y_2) + c(x_2, y_3) + c(x_2, y_4) + c(x_3, y_5) + c(x_4, y_5). The notation c(x_i, y_j) can be simplified to read c(i, j) or c(X[i], Y[j]).
- 32. DTW(X, Y) – Cost of an Optimal Warping Path. DTW(X, Y) = min{ c_P(X, Y) : P is a warping path }.
- 33. Remarks on DTW(X, Y). There may be several warping paths with the same total cost DTW(X, Y). DTW(X, Y) is symmetric whenever the local cost measure is symmetric. DTW(X, Y) does not necessarily satisfy the triangle inequality (the sum of the lengths of two sides is greater than the length of the remaining side).
- 34. DTW Equations: Base Cases. Initial condition: dtw(1, 1) = c(1, 1). 1st column: dtw(1, j) = dtw(1, j − 1) + c(1, j). 1st row: dtw(i, 1) = dtw(i − 1, 1) + c(i, 1).
- 35. DTW Equations: Recursion. Inner cell: dtw(i, j) = min{dtw(i − 1, j), dtw(i − 1, j − 1), dtw(i, j − 1)} + c(i, j). Interpretation: the cost of warping X[1:i] with Y[1:j] is the cost of matching X[i] with Y[j] plus the minimum of the following three costs: 1) the cost of warping X[1:i−1] with Y[1:j]; 2) the cost of warping X[1:i−1] with Y[1:j−1]; 3) the cost of warping X[1:i] with Y[1:j−1].
- 36. Example. Let the feature space F = {a, b, g}. Let the local cost measure be c(x, y) = 0 if x = y, 1 if x ≠ y. Let the sequences be X = (a, b, g), Y = (a, b, b, g), Z = (a, g, g). Let us compute dtw(X, Y), dtw(Y, Z), and dtw(X, Z). Work it out on paper.
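The base cases and recursion from the previous slides are enough to work this example out in code as well. A minimal sketch over symbol sequences with the 0/1 cost measure defined in this example (the deck's full version, with pointer chasing for the optimal path, is in DTW.java):

```java
public class SimpleDTW {
    // 0/1 local cost measure from the example: 0 if symbols match, 1 otherwise.
    static int cost(char x, char y) { return x == y ? 0 : 1; }

    // Fills the full N x M cost matrix: the first row and column are
    // cumulative sums (base cases), each inner cell applies the three-way
    // recursion, and dtw(N, M) is the cost of an optimal warping path.
    public static int dtw(char[] x, char[] y) {
        int n = x.length, m = y.length;
        int[][] d = new int[n][m];
        d[0][0] = cost(x[0], y[0]);
        for (int i = 1; i < n; i++) d[i][0] = d[i-1][0] + cost(x[i], y[0]);
        for (int j = 1; j < m; j++) d[0][j] = d[0][j-1] + cost(x[0], y[j]);
        for (int i = 1; i < n; i++) {
            for (int j = 1; j < m; j++) {
                d[i][j] = Math.min(d[i-1][j], Math.min(d[i-1][j-1], d[i][j-1]))
                        + cost(x[i], y[j]);
            }
        }
        return d[n-1][m-1];
    }

    public static void main(String[] args) {
        char[] x = {'a', 'b', 'g'};
        char[] y = {'a', 'b', 'b', 'g'};
        char[] z = {'a', 'g', 'g'};
        System.out.println(dtw(x, y)); // 0
        System.out.println(dtw(y, z)); // 2
        System.out.println(dtw(x, z)); // 1
    }
}
```

The printed values match the matrices filled in on the following slides.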
- 37. DTW(X, Y)
- 38. Example: DTW(X,Y). dtw(1, 1) = c(a, a) = 0.
- 39. Example: DTW(X,Y). dtw(2, 1) = c(2, 1) + dtw(1, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1.
- 40. Example: DTW(X,Y). dtw(3, 1) = c(3, 1) + dtw(2, 1) = c(g, a) + dtw(2, 1) = 1 + 1 = 2.
- 41. Example: DTW(X,Y). dtw(1, 2) = c(1, 2) + dtw(1, 1) = c(a, b) + dtw(1, 1) = 1 + 0 = 1.
- 42. Example: DTW(X,Y). dtw(1, 3) = c(1, 3) + dtw(1, 2) = c(a, b) + dtw(1, 2) = 1 + 1 = 2.
- 43. Example: DTW(X,Y). dtw(1, 4) = c(1, 4) + dtw(1, 3) = c(a, g) + dtw(1, 3) = 1 + 2 = 3.
- 44. Example: DTW(X,Y). dtw(2, 2) = c(2, 2) + min{dtw(1, 2), dtw(1, 1), dtw(2, 1)} = c(b, b) + min{1, 0, 1} = 0 + 0 = 0.
- 45. Example: DTW(X,Y). dtw(3, 2) = c(3, 2) + min{dtw(2, 2), dtw(2, 1), dtw(3, 1)} = c(g, b) + min{0, 1, 2} = 1 + 0 = 1.
- 46. Example: DTW(X,Y). dtw(2, 3) = c(2, 3) + min{dtw(1, 3), dtw(1, 2), dtw(2, 2)} = c(b, b) + min{2, 1, 0} = 0 + 0 = 0.
- 47. Example: DTW(X,Y). dtw(3, 3) = c(3, 3) + min{dtw(2, 3), dtw(2, 2), dtw(3, 2)} = c(g, b) + min{0, 0, 1} = 1 + 0 = 1.
- 48. Example: DTW(X,Y). dtw(2, 4) = c(2, 4) + min{dtw(1, 4), dtw(1, 3), dtw(2, 3)} = c(b, g) + min{3, 2, 0} = 1 + 0 = 1.
- 49. Example: DTW(X,Y). dtw(3, 4) = c(3, 4) + min{dtw(2, 4), dtw(2, 3), dtw(3, 3)} = c(g, g) + min{1, 0, 1} = 0 + 0 = 0. So DTW(X,Y) = 0.
- 50. Example: DTW(X,Y). DTW(X, Y) = 0. The optimal warping path (OWP) P can be found by chasing pointers (the red arrows in the slide): P = ((1, 1), (2, 2), (2, 3), (3, 4)).
- 51. DTW(Y, Z)
- 52. DTW(Y, Z). dtw(1, 1) = c(a, a) = 0.
- 53. DTW(Y, Z). dtw(2, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1.
- 54. DTW(Y, Z). dtw(3, 1) = c(b, a) + dtw(2, 1) = 1 + 1 = 2.
- 55. DTW(Y, Z). dtw(4, 1) = c(g, a) + dtw(3, 1) = 1 + 2 = 3.
- 56. DTW(Y, Z). dtw(1, 2) = c(a, g) + dtw(1, 1) = 1 + 0 = 1.
- 57. DTW(Y, Z). dtw(1, 3) = c(a, g) + dtw(1, 2) = 1 + 1 = 2.
- 58. DTW(Y, Z). dtw(2, 2) = c(b, g) + min{dtw(1, 2), dtw(1, 1), dtw(2, 1)} = 1 + 0 = 1.
- 59. DTW(Y, Z). dtw(3, 2) = c(b, g) + min{dtw(2, 2), dtw(2, 1), dtw(3, 1)} = 1 + min{1, 1, 2} = 1 + 1 = 2.
- 60. DTW(Y, Z). dtw(4, 2) = c(g, g) + min{dtw(3, 2), dtw(3, 1), dtw(4, 1)} = 0 + min{2, 2, 3} = 0 + 2 = 2.
- 61. DTW(Y, Z). dtw(2, 3) = c(b, g) + min{dtw(1, 3), dtw(1, 2), dtw(2, 2)} = 1 + min{2, 1, 1} = 1 + 1 = 2.
- 62. DTW(Y, Z). dtw(3, 3) = c(b, g) + min{dtw(2, 3), dtw(2, 2), dtw(3, 2)} = 1 + min{2, 1, 2} = 1 + 1 = 2.
- 63. DTW(Y, Z). dtw(4, 3) = c(g, g) + min{dtw(3, 3), dtw(3, 2), dtw(4, 2)} = 0 + min{2, 2, 2} = 0 + 2 = 2.
- 64. DTW(Y, Z). DTW(Y, Z) = 2. The optimal warping path (OWP) P can be found by chasing pointers (the red arrows in the slide): P = ((1, 1), (2, 2), (3, 2), (4, 3)).
- 65. DTW(X, Z)
- 66. DTW(X, Z). dtw(1, 1) = c(a, a) = 0.
- 67. DTW(X, Z). dtw(2, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1.
- 68. DTW(X, Z). dtw(3, 1) = c(g, a) + dtw(2, 1) = 1 + 1 = 2.
- 69. DTW(X, Z). dtw(1, 2) = c(a, g) + dtw(1, 1) = 1 + 0 = 1.
- 70. DTW(X, Z). dtw(1, 3) = c(a, g) + dtw(1, 2) = 1 + 1 = 2.
- 71. DTW(X, Z). dtw(2, 2) = c(b, g) + min{dtw(1, 2), dtw(1, 1), dtw(2, 1)} = 1 + min{1, 0, 1} = 1 + 0 = 1.
- 72. DTW(X, Z). dtw(3, 2) = c(g, g) + min{dtw(2, 2), dtw(2, 1), dtw(3, 1)} = 0 + min{1, 1, 2} = 0 + 1 = 1.
- 73. DTW(X, Z). dtw(2, 3) = c(b, g) + min{dtw(1, 3), dtw(1, 2), dtw(2, 2)} = 1 + min{2, 1, 1} = 1 + 1 = 2.
- 74. DTW(X, Z). dtw(3, 3) = c(g, g) + min{dtw(2, 3), dtw(2, 2), dtw(3, 2)} = 0 + min{2, 1, 1} = 0 + 1 = 1.
- 75. DTW(X, Z). DTW(X, Z) = 1. The optimal warping path (OWP) P can be found by chasing pointers (the red arrows in the slide): P = ((1, 1), (2, 2), (3, 3)).
- 76. Possible Optimizations of DTW. The computation of DTW can be optimized so that only the cells within a specific window are considered.
- 77. Possible Optimizations of DTW. You may have realized by now that if we care only about the total cost of warping sequence X with sequence Y, we do not need to keep the entire N × M cost matrix; we need only two columns. The storage savings are huge, but the running time remains O(N × M). We can also normalize the DTW cost by N × M to keep it low.
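The two-column idea can be sketched as follows; this variant uses |x − y| as an illustrative local cost on real-valued samples and is not the deck's DTW.java:

```java
public class TwoColumnDTW {
    // Computes DTW(X, Y) keeping only the previous and current columns, so
    // storage is O(M) instead of O(N x M); running time stays O(N x M).
    public static double dtw(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double[] prev = new double[m];
        double[] cur = new double[m];
        // Base-case column: cumulative cost of warping X[1] against Y[1:j].
        prev[0] = Math.abs(x[0] - y[0]);
        for (int j = 1; j < m; j++) prev[j] = prev[j-1] + Math.abs(x[0] - y[j]);
        for (int i = 1; i < n; i++) {
            cur[0] = prev[0] + Math.abs(x[i] - y[0]); // base-case row entry
            for (int j = 1; j < m; j++) {
                cur[j] = Math.min(prev[j], Math.min(prev[j-1], cur[j-1]))
                       + Math.abs(x[i] - y[j]);
            }
            double[] tmp = prev; prev = cur; cur = tmp; // reuse the two arrays
        }
        return prev[m-1];
    }
}
```

Since the recursion for column i only reads column i − 1 and the cells already filled in column i, discarding older columns loses nothing but the ability to chase pointers back for the optimal path itself.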
- 78. Spoken Word Recognition source code is in WavAudioDictionary.java
- 79. General Outline. Given a directory of audio files with spoken words, process each file into a table that maps specific words (or phrases) to digital signal vectors. These signal vectors can be pre-processed to eliminate silences. An input audio file is taken and digitized into a digital signal vector. DTW scores are then computed between the input vector and the digital vectors in the table, and the best-scoring word is selected.
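That lookup step can be sketched as a nearest-neighbor search over the table. The map layout, the |x − y| local cost, and all names here are illustrative assumptions, not the deck's WavAudioDictionary.java:

```java
import java.util.Map;

public class WordRecognizer {
    // DTW over real-valued signal vectors with |x - y| as the local cost.
    static double dtw(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double[][] d = new double[n][m];
        d[0][0] = Math.abs(x[0] - y[0]);
        for (int i = 1; i < n; i++) d[i][0] = d[i-1][0] + Math.abs(x[i] - y[0]);
        for (int j = 1; j < m; j++) d[0][j] = d[0][j-1] + Math.abs(x[0] - y[j]);
        for (int i = 1; i < n; i++)
            for (int j = 1; j < m; j++)
                d[i][j] = Math.min(d[i-1][j], Math.min(d[i-1][j-1], d[i][j-1]))
                        + Math.abs(x[i] - y[j]);
        return d[n-1][m-1];
    }

    // Returns the dictionary word whose stored signal vector has the lowest
    // DTW cost against the input vector.
    public static String recognize(Map<String, double[]> table, double[] input) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : table.entrySet()) {
            double c = dtw(e.getValue(), input);
            if (c < bestCost) { bestCost = c; best = e.getKey(); }
        }
        return best;
    }
}
```

DTW tolerates the speaker stretching or compressing the word in time, which is exactly why it outperforms a rigid sample-by-sample comparison here.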
- 80. Optimizations. If we use DTW to compute the similarity between the digital audio input vector and the vectors in the table, it is vital to keep the vectors as short as possible without sacrificing precision. Possible suggestions: decreasing the sampling rate, merging samples into super-features (e.g., Haar coefficients), and parallelizing similarity computations.
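The sample-merging suggestion can be as simple as block averaging (the Haar super-feature variant would store wavelet coefficients instead of plain averages); a minimal illustrative sketch:

```java
public class Downsampler {
    // Shortens a signal vector by averaging non-overlapping blocks of
    // blockSize samples; a trailing partial block is averaged as well.
    public static double[] blockAverage(double[] signal, int blockSize) {
        int outLen = (signal.length + blockSize - 1) / blockSize;
        double[] out = new double[outLen];
        for (int b = 0; b < outLen; b++) {
            int start = b * blockSize;
            int end = Math.min(start + blockSize, signal.length);
            double sum = 0.0;
            for (int i = start; i < end; i++) sum += signal[i];
            out[b] = sum / (end - start);
        }
        return out;
    }
}
```

Halving the vector length this way cuts the O(N × M) DTW work by roughly a factor of four, at the price of some temporal detail.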
- 81. References. M. Müller. Information Retrieval for Music and Motion, Ch. 4. Springer, ISBN 978-3-540-74047-6. Bachu, R. G., et al. "Separation of Voiced and Unvoiced Using Zero Crossing Rate and Energy of the Speech Signal." American Society for Engineering Education (ASEE) Zone Conference Proceedings, 2008.
