Thesis Defense 2007

  1. Phone Recognition using Structural Support Vector Machine
      Student: Chao-Hong Meng
      Advisor: Lin-Shan Lee
      June 15, 2009
  2. What is this work about?
      (figure: MFCC/PLP features are mapped to a phone sequence, e.g. "sh iy hv ae", by an HMM or by a Structural SVM)
  3. Outline
      Motivation
      Baseline Model
      Structural SVM
      Experiments
      Conclusion
  4. Motivation
      Speech recognition: directly modeling the output sequence given the acoustic features is hard
      Traditional approach: an acoustic model combined with a language model
  5. Motivation (cont.)
      Structural SVM:
        a model that can handle structured output
        formulated by Joachims
        directly models the relation between X and Y
      As preliminary research:
        X is MFCC/PLP/posterior features
        Y is the phone sequence
  6. Outline
      Motivation
      Baseline Model
      Structural SVM
      Experiments
      Conclusion
  7. Baseline Model
      Baseline models: monophone HMM and Tandem
      The difference is their inputs:
        HMM: MFCC/PLP
        Tandem: posteriors
  8. Outline
      Motivation
      Baseline Model
      Structural SVM
      Experiments
      Conclusion
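  9. Compatibility Function
      Define a compatibility function $F(x, y; w)$
      $x$ is the feature vector sequence
      $y$ is the correct phone sequence of $x$
      The goal is to find a $w$ such that $F(x, y; w) > F(x, y'; w)$ for every $y' \neq y$
      (figure: candidate phone sequences ranked from high to low compatibility)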
  10. Combined Feature Function
      We assume that $F$ is an inner product: $F(x, y; w) = \langle w, \Psi(x, y) \rangle$
      $\Psi(x, y)$ is a combined feature function that encodes the relationship between $x$ and $y$
      In this research, transition counts and emission counts are considered
      The output of $\Psi(x, y)$: (transition count, emission count)
  11. A Simplified Example
      Settings: 3 different phone labels (A, B, C) and 2-dim feature vectors (dimensions z1, z2)
      Given a training sample, calculate the transition count matrix A and the emission count matrix B
      (figure: matrix A indexed by the labels A, B, C; matrix B indexed by the labels A, B, C and the feature dimensions z1, z2)
  12. Combined Feature Function
      Concatenate the rows of A and B
      The output of $\Psi(x, y)$ consists of:
        transition counts
        emission counts
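The toy training sample from the simplified example did not survive extraction, so the sketch below invents one: labels A, B, C, 2-dim frames (z1, z2), and the label sequence "A B B C" are all hypothetical. It builds the transition count matrix A and the emission matrix B and then concatenates their rows into the combined feature. Treating the emission "count" of real-valued frames as the per-label sum of feature vectors is an assumption, and the helper names are illustrative only.

```python
from typing import List, Sequence, Tuple


def count_matrices(
    x: Sequence[Sequence[float]], y: Sequence[int], K: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Transition count matrix A (K x K) and emission matrix B (K x D) for one labelled sample."""
    D = len(x[0])
    A = [[0.0] * K for _ in range(K)]
    for prev, cur in zip(y, y[1:]):
        A[prev][cur] += 1.0  # label `prev` is followed by label `cur`
    B = [[0.0] * D for _ in range(K)]
    for frame, label in zip(x, y):
        for d in range(D):
            B[label][d] += frame[d]  # add the frame's features to its label's row (assumption)
    return A, B


def combined_feature(x: Sequence[Sequence[float]], y: Sequence[int], K: int) -> List[float]:
    """Hypothetical Psi(x, y): the rows of A concatenated, followed by the rows of B."""
    A, B = count_matrices(x, y, K)
    return [v for row in A for v in row] + [v for row in B for v in row]


labels = ["A", "B", "C"]                              # hypothetical phone labels
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # hypothetical 2-dim frames (z1, z2)
y = [labels.index(c) for c in "ABBC"]                 # hypothetical phone sequence
A, B = count_matrices(x, y, len(labels))
print(A)  # [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(B)  # [[1.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
print(len(combined_feature(x, y, len(labels))))  # 15 = 3*3 transition + 3*2 emission values
```

The same layout scales to the real setting: with K phones and D-dim frames, the vector has K·K + K·D entries.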
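  13. Formal Formulation
      Output label set (phone set) with size K: $\Sigma = \{\sigma_1, \dots, \sigma_K\}$
      Kronecker delta: $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise
      Define $\Lambda(y) = (\delta(\sigma_1, y), \dots, \delta(\sigma_K, y))^\top$
      Only one element of $\Lambda(y)$ is non-zero
      Tensor product: $a \otimes b$ has entries $a_i b_j$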
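  14. Formal Formulation
      Define $\Psi(x, y)$ as follows, for $x = (x_1, \dots, x_T)$ and $y = (y_1, \dots, y_T)$:
        Transition count: $\sum_{t=2}^{T} \Lambda(y_{t-1}) \otimes \Lambda(y_t)$
        Emission count: $\sum_{t=1}^{T} \Lambda(y_t) \otimes x_t$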
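The Kronecker-delta and tensor-product definition can be checked numerically. In the sketch below, Λ(y) is the one-hot indicator of label y, the tensor product of two vectors is flattened row-major, and summing Λ(y_{t−1}) ⊗ Λ(y_t) over time yields transition counts while summing Λ(y_t) ⊗ x_t yields per-label feature sums. The label-major ordering of the emission block, the helper names, and the toy data are assumptions for illustration.

```python
from typing import List, Sequence


def kronecker_delta(a: int, b: int) -> float:
    """delta(a, b) = 1 if a == b, else 0."""
    return 1.0 if a == b else 0.0


def indicator(label: int, K: int) -> List[float]:
    """Lambda(y) = (delta(sigma_1, y), ..., delta(sigma_K, y)); exactly one entry is 1."""
    return [kronecker_delta(k, label) for k in range(K)]


def tensor(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Tensor (Kronecker) product of two vectors, flattened: entry i*len(b) + j is a[i] * b[j]."""
    return [ai * bj for ai in a for bj in b]


def vec_add(u: Sequence[float], v: Sequence[float]) -> List[float]:
    return [p + q for p, q in zip(u, v)]


def psi_tensor(x: Sequence[Sequence[float]], y: Sequence[int], K: int) -> List[float]:
    """Psi(x, y) = (sum_t Lambda(y_{t-1}) (x) Lambda(y_t), sum_t Lambda(y_t) (x) x_t)."""
    D = len(x[0])
    transition = [0.0] * (K * K)
    for t in range(1, len(y)):
        transition = vec_add(transition, tensor(indicator(y[t - 1], K), indicator(y[t], K)))
    emission = [0.0] * (K * D)
    for t in range(len(y)):
        emission = vec_add(emission, tensor(indicator(y[t], K), x[t]))
    return transition + emission


# Hypothetical toy sample: labels A=0, B=1, C=2 and 2-dim frames.
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
y = [0, 1, 1, 2]  # "A B B C"
print(psi_tensor(x, y, K=3))
# [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0]
```

Because each Λ(y) has a single non-zero entry, every tensor product lights up exactly one transition cell (or one label row of the emission block), which is why the sums reduce to the count matrices A and B.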
  15. Formal Formulation
      For different pairs (x, y), $\Psi(x, y)$ takes a different value
      Recall that the compatibility function is defined as $F(x, y; w) = \langle w, \Psi(x, y) \rangle$
      So, for a given x, each candidate y receives a different preference score
      $\Psi(x, y)$ is obtained from the training data; $w$ is to be estimated
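Because Ψ(x, y) only contains first-order transition counts and per-frame emission terms, ⟨w, Ψ(x, y)⟩ decomposes over adjacent label pairs, so the most preferred y for a given x can be found with Viterbi dynamic programming rather than by scoring every sequence. The slides do not describe the decoder, so this is a sketch under that assumption; the weight layout (K·K transition weights followed by K·D emission weights) and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frames = Sequence[Sequence[float]]


def psi(x: Frames, y: Sequence[int], K: int) -> List[float]:
    """Psi(x, y): K*K transition counts followed by K*D per-label sums of feature vectors."""
    D = len(x[0])
    v = [0.0] * (K * K + K * D)
    for prev, cur in zip(y, y[1:]):
        v[prev * K + cur] += 1.0
    for frame, label in zip(x, y):
        for d in range(D):
            v[K * K + label * D + d] += frame[d]
    return v


def compatibility(w: Sequence[float], x: Frames, y: Sequence[int], K: int) -> float:
    """F(x, y; w) = <w, Psi(x, y)>."""
    return sum(wi * pi for wi, pi in zip(w, psi(x, y, K)))


def viterbi(w: Sequence[float], x: Frames, K: int) -> Tuple[List[int], float]:
    """Return argmax_y F(x, y; w) and its score, exploiting the first-order structure of Psi."""
    D = len(x[0])

    def emit(t: int, b: int) -> float:
        return sum(w[K * K + b * D + d] * x[t][d] for d in range(D))

    score = [emit(0, b) for b in range(K)]
    back = []
    for t in range(1, len(x)):
        # best predecessor a for each current label b
        prev = [max(range(K), key=lambda a: score[a] + w[a * K + b]) for b in range(K)]
        score = [score[prev[b]] + w[prev[b] * K + b] + emit(t, b) for b in range(K)]
        back.append(prev)
    last = max(range(K), key=lambda b: score[b])
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1], score[last]


# Hypothetical weights for K=3 labels and 2-dim frames (9 transition + 6 emission weights).
w = [0.1, -0.2, 0.3, 0.0, 0.4, -0.1, 0.2, 0.2, -0.3, 1.0, -1.0, 0.0, 0.5, 0.3, 0.3]
x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
path, best = viterbi(w, x, K=3)
print(path, best)
```

The Viterbi score equals `compatibility(w, x, path, 3)`, and it matches the maximum found by enumerating all 3³ label sequences, while costing O(T·K²) instead of O(K^T).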
  16. Formal Formulation
      $\Psi(x, y)$ contains the information about transitions and emissions
      (figure: label sequence Y, feature sequence X, and weight vector w)
  17. Large Margin
      Purpose: given a training sample $(x_i, y_i)$, we want the pair with the correct answer $y_i$ to rank first,
      and the gap between the first and the others to be as large as possible
      Margin: $\gamma_i = F(x_i, y_i; w) - \max_{y \neq y_i} F(x_i, y; w)$
      Maximize the margin
      (figure: ranked candidates for training samples 1, 2 and 3)
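For a tiny problem the margin γ_i can be computed by brute force: score the correct sequence, score every other sequence, and subtract the best rival. The sketch below does exactly that; the weights and data are hypothetical, and enumeration is only feasible for very short sequences and few labels.

```python
from itertools import product
from typing import List, Sequence

Frames = Sequence[Sequence[float]]


def psi(x: Frames, y: Sequence[int], K: int) -> List[float]:
    """Psi(x, y): K*K transition counts followed by K*D per-label sums of feature vectors."""
    D = len(x[0])
    v = [0.0] * (K * K + K * D)
    for prev, cur in zip(y, y[1:]):
        v[prev * K + cur] += 1.0
    for frame, label in zip(x, y):
        for d in range(D):
            v[K * K + label * D + d] += frame[d]
    return v


def compatibility(w: Sequence[float], x: Frames, y: Sequence[int], K: int) -> float:
    """F(x, y; w) = <w, Psi(x, y)>."""
    return sum(wi * pi for wi, pi in zip(w, psi(x, y, K)))


def margin(w: Sequence[float], x: Frames, y_true: Sequence[int], K: int) -> float:
    """gamma = F(x, y_true; w) - max over y != y_true of F(x, y; w), by exhaustive enumeration."""
    y_true = list(y_true)
    rivals = (list(y) for y in product(range(K), repeat=len(x)))
    best_rival = max(compatibility(w, x, y, K) for y in rivals if y != y_true)
    return compatibility(w, x, y_true, K) - best_rival


# Hypothetical example: 2 labels, 1-dim frames, zero transition weights.
w = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0]  # 2*2 transition weights, then 2*1 emission weights
x = [[1.0], [-1.0]]
print(margin(w, x, [0, 1], K=2))  # 2.0: the correct sequence ranks first with a gap of 2
print(margin(w, x, [0, 0], K=2))  # -2.0: a wrong sequence outranks the answer
```

A negative γ_i is exactly the situation the slack variables of the primal form are meant to absorb.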
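  18. Structural SVM
      The primal form of Structural SVM minimizes $\frac{1}{2}\|w\|^2$ plus a slack penalty weighted by $C$
      $\xi_i$ are slack variables, which allow the margin of sample $i$ to fall short of its target value, even to become negative
      Each constraint states that the margin should be at least its target value, relaxed by $\xi_i$
      $C$: error tolerance weight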
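The exact primal objective on this slide did not survive extraction. For reference, the standard n-slack, margin-rescaling formulation of structural SVMs (Tsochantaridis, Hofmann, Joachims and Altun) is written below; treating it as the thesis's objective is an assumption. Here Δ(y_i, y) is a label loss such as Hamming distance, and setting Δ ≡ 1 gives the simpler variant in which every margin must reach 1.

```latex
\min_{w,\ \xi \ge 0}\ \frac{1}{2}\lVert w \rVert^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\langle w,\ \Psi(x_i, y_i) - \Psi(x_i, y) \rangle \ \ge\ \Delta(y_i, y) - \xi_i
\qquad \forall i,\ \forall y \ne y_i
```

The left-hand side of each constraint is F(x_i, y_i; w) − F(x_i, y; w), so the constraints bound the same gap that defines the margin γ_i, and C trades margin size against total slack.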
  19. Outline
      Motivation
      Baseline Model
      Structural SVM
      Experiments
      Conclusion
  20. Experimental Settings
      Corpus: TIMIT (continuous English speech)
      Phone set: 48 phones
      Front end: MFCC/PLP + delta + delta-delta (39 dimensions in total), processed by CMS (cepstral mean subtraction)
      HMM: a 3-state HMM for each phone, with 32 Gaussian mixtures in each state
      Tandem: 1000 hidden nodes; input window of the current frame plus the 4 previous and 4 next frames; output reduced to 37 dimensions with PCA
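The Tandem input window (the current frame plus 4 previous and 4 next frames) amounts to frame stacking. Assuming the network sees the 39-dim front-end features, each input vector has 9 × 39 = 351 values. Padding at utterance edges by repeating the first or last frame is an assumption, as is the helper name; PCA is not shown.

```python
from typing import List, Sequence


def stack_context(frames: Sequence[Sequence[float]], left: int = 4, right: int = 4) -> List[List[float]]:
    """Concatenate each frame with its `left` previous and `right` next frames.

    Frames beyond the utterance edges are replaced by the first/last frame (assumed padding scheme).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp the index to [0, n-1]
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


# Hypothetical utterance: 5 frames of 39-dim features, frame t filled with the value t.
frames = [[float(t)] * 39 for t in range(5)]
inputs = stack_context(frames)
print(len(inputs), len(inputs[0]))  # 5 351  (9 frames x 39 dims per network input)
```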
  21. Experimental Settings
      Structural SVM: define $\Psi(x, y)$ as previously stated
      Assuming a first-order hypothesis, the output dimension of $\Psi(x, y)$ is 48 × 48 + 48 × 39 = 4176
      (figure: a 48 × 48 transition matrix and a 48 × 39 emission matrix)
      The transition part could also be second order
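The 4176 figure is the sum of the two weight-block sizes. The hypothetical helper below reproduces it and, under labeled assumptions, shows how the size would change for a second-order transition table indexed by phone triples, or for Tandem-style inputs of 37 (PCA-reduced) or 48 (assumed one posterior per phone) dimensions.

```python
def psi_dimension(num_phones: int, feature_dim: int, order: int = 1) -> int:
    """Length of Psi(x, y): an order-`order` transition table plus a phones x features emission table."""
    transition = num_phones ** (order + 1)  # first order: 48 x 48 phone pairs
    emission = num_phones * feature_dim     # 48 x 39 for MFCC/PLP input
    return transition + emission


print(psi_dimension(48, 39))           # 4176, as on the slide
print(psi_dimension(48, 39, order=2))  # 112464 if second-order transitions index phone triples (assumption)
print(psi_dimension(48, 37))           # 4080 for 37-dim PCA-reduced inputs (assumption)
print(psi_dimension(48, 48))           # 4608 for 48-dim posteriors without reduction (assumption)
```

Going to second order multiplies the transition block by 48, so it dominates the feature vector.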
  22. The Results of HMM
      Best: 70.42% (Tandem)
      (figure: accuracy of HMM vs. Tandem; Tandem is better)
  23. Primal Form, Linear Kernel, First-Order Markov
      In theory, solving the primal form and solving the dual form should give the same result
      (figure: results of solving the primal form vs. solving the dual form)
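The slides compare solving the primal and the dual form without naming a solver. As one illustrative way to attack the primal directly, the sketch below runs batch subgradient descent (a swapped-in technique, not necessarily what the thesis used) on the equivalent unconstrained objective ½‖w‖² + (C/n) Σ_i max_y [Δ(y_i, y) + ⟨w, Ψ(x_i, y) − Ψ(x_i, y_i)⟩], with Hamming loss as Δ and loss-augmented Viterbi to find the most violated labelling. The data, function names, and the 1/(t+1) step size are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Frames = Sequence[Sequence[float]]


def psi(x: Frames, y: Sequence[int], K: int) -> List[float]:
    """K*K transition counts followed by K*D per-label sums of feature vectors."""
    D = len(x[0])
    v = [0.0] * (K * K + K * D)
    for prev, cur in zip(y, y[1:]):
        v[prev * K + cur] += 1.0
    for frame, label in zip(x, y):
        for d in range(D):
            v[K * K + label * D + d] += frame[d]
    return v


def decode(x: Frames, w: Sequence[float], K: int, y_true: Optional[Sequence[int]] = None) -> List[int]:
    """Viterbi argmax of <w, Psi(x, y)>; if y_true is given, add a Hamming loss term (loss-augmented)."""
    D = len(x[0])

    def local(t: int, b: int) -> float:
        s = sum(w[K * K + b * D + d] * x[t][d] for d in range(D))
        if y_true is not None and b != y_true[t]:
            s += 1.0  # one unit of loss per wrongly labelled frame
        return s

    score = [local(0, b) for b in range(K)]
    back = []
    for t in range(1, len(x)):
        prev = [max(range(K), key=lambda a: score[a] + w[a * K + b]) for b in range(K)]
        score = [score[prev[b]] + w[prev[b] * K + b] + local(t, b) for b in range(K)]
        back.append(prev)
    path = [max(range(K), key=lambda b: score[b])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1]


def train(data: List[Tuple[Frames, List[int]]], K: int, C: float = 1.0, epochs: int = 200) -> List[float]:
    """Batch subgradient descent on the margin-rescaled structural SVM primal."""
    D = len(data[0][0][0])
    n = len(data)
    w = [0.0] * (K * K + K * D)
    for epoch in range(epochs):
        grad = list(w)  # gradient of the regulariser 1/2 ||w||^2
        for x, y in data:
            y_hat = decode(x, w, K, y_true=y)  # most violated labelling (equals y when the hinge is inactive)
            for j, (ph, pt) in enumerate(zip(psi(x, y_hat, K), psi(x, y, K))):
                grad[j] += (C / n) * (ph - pt)
        step = 1.0 / (epoch + 1)  # decreasing step size (assumed schedule)
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: 2 labels, 2-dim frames.
data = [([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [0, 1, 1]),
        ([[0.0, 1.0], [1.0, 0.0]], [1, 0])]
w = train(data, K=2, C=10.0)
for x, y in data:
    print(decode(x, w, K=2), y)
```

A dual solver would instead optimize over one multiplier per (sample, labelling) constraint; with a linear kernel both routes target the same optimum, which is the point made on the slide.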
  24. Dual Form, Linear Kernel, First-Order Markov
      Best: 71.75%
      (figure: accuracy vs. slack variable weight C, for posterior input and for MFCC/PLP input)
      Accuracy increases with C
  25. Dual Form, Linear Kernel, First-Order Markov
      Features without dimensionality reduction work better than dimensionality-reduced features
  26. Best Results of Baseline and Structural SVM
      Structural SVM: 71.75%; baseline: 70.42%
      Absolute improvement: 1.33%
      (figure annotation: "Getting Worse")
  27. Dual Form, Linear Kernel, Second-Order Markov
      (figure: results for C = 1 and C = 10, both annotated "Better")
  28. Outline
      Motivation
      Baseline Model
      Structural SVM
      Experiments
      Conclusion
  29. Conclusion
      Structural SVM performs poorly when the input is MFCC/PLP
      With posterior probabilities as input, however, Structural SVM improves over Tandem by more than 1% absolute (1.33% in the best case)
  30. Thanks for your attention