1.
Phone Recognition using Structural Support Vector Machine. Student: Chao-Hong Meng. Advisor: Lin-Shan Lee. June 15, 2009.
2.
What is this work about? [Figure: MFCC/PLP feature vectors are mapped to a phone sequence (e.g. "sh iy hv ae") either by an HMM or by a Structural SVM.]
3.
Outline: Motivation, Baseline Model, Structural SVM, Experiments, Conclusion.
4.
Motivation. Speech recognition: modeling is hard. Traditional approach: $\hat{y} = \arg\max_{y} P(x \mid y)\,P(y)$, which combines an acoustic model $P(x \mid y)$ and a language model $P(y)$.
5.
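Motivation (cont.). Structural SVM: a model that can handle structured output, formulated by Joachims. It directly models the relationship between the input $x$ and the output $y$. In this preliminary research, $x$ is a sequence of MFCC/PLP or posterior feature vectors, and $y$ is a phone sequence.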
6.
Outline: Motivation, Baseline Model, Structural SVM, Experiments, Conclusion. Next: Baseline Model.
7.
Baseline Model. The baseline models are a monophone HMM and Tandem. They differ only in their inputs: the HMM uses MFCC/PLP features, while Tandem uses posterior probabilities.
8.
Outline: Motivation, Baseline Model, Structural SVM, Experiments, Conclusion. Next: Structural SVM.
9.
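Compatibility Function. Define a compatibility function $F(x, y)$, where $x$ is a feature vector sequence and $y$ is a phone sequence. If $\hat{y}$ is the correct phone sequence of $x$, the goal is to find an $F$ such that $\hat{y} = \arg\max_{y} F(x, y)$; that is, when all candidate phone sequences are ranked from high to low by $F(x, y)$, the correct one comes first.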
10.
Combined Feature Function. We assume that $F$ is an inner product, $F(x, y) = \mathbf{w}^\top \Psi(x, y)$, where $\Psi(x, y)$ is a combined feature function that encodes the relationship between $x$ and $y$. In this research, transition counts and emission counts are considered, so the output of $\Psi(x, y)$ is (transition count, emission count).
11.
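A Simplified Example. Settings: 3 different phone labels (A, B, C) and 2-dimensional feature vectors with components z1 and z2. For a training sample $(x, y)$, calculate the transition count matrix A (3 × 3, rows and columns indexed by the labels A, B, C) and the emission count matrix B (3 × 2, rows indexed by A, B, C and columns by z1, z2). [Figure: the training sample and its two count matrices.]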
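The counting step can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code: the sample values are hypothetical, and because the features are continuous, the "emission count" for a label is assumed here to be the sum of the feature vectors of the frames carrying that label, which is what the tensor-product formulation used later in the slides reduces to.

```python
import numpy as np

LABELS = ["A", "B", "C"]  # the 3 phone labels of the toy example

def count_matrices(x, y, labels=LABELS):
    """Return (transition count matrix, emission count matrix) for one sample.

    x: array of shape (T, D), one D-dim feature vector per frame.
    y: list of T labels drawn from `labels`.
    """
    K, D = len(labels), x.shape[1]
    idx = [labels.index(lab) for lab in y]
    A = np.zeros((K, K))  # A[i, j]: how often label i is followed by label j
    B = np.zeros((K, D))  # B[k]: sum of the feature vectors of frames labelled k (assumption)
    for t in range(len(idx) - 1):
        A[idx[t], idx[t + 1]] += 1
    for t, k in enumerate(idx):
        B[k] += x[t]
    return A, B

# Hypothetical sample: 4 frames of 2-dim features (z1, z2) labelled A A B C.
x = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]])
y = ["A", "A", "B", "C"]
A, B = count_matrices(x, y)
print(A)
print(B)
```

Here A records the transitions A→A, A→B and B→C once each, and row A of B holds the sum of the first two frames.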
12.
Combined Feature Function (cont.). Concatenate the rows of A and B: the output of $\Psi(x, y)$ consists of the transition counts followed by the emission counts.
13.
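Formal Formulation. Let the output label set (phone set) be $\Sigma = \{\sigma_1, \dots, \sigma_K\}$, with size $K$. Kronecker delta: $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise. Define $\Lambda(y) = \big(\delta(y, \sigma_1), \dots, \delta(y, \sigma_K)\big)^\top$; only one element of $\Lambda(y)$ is nonzero. Tensor product: for $a \in \mathbb{R}^P$ and $b \in \mathbb{R}^Q$, $a \otimes b \in \mathbb{R}^{PQ}$ with $(a \otimes b)_{i + (j-1)P} = a_i b_j$.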
14.
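Formal Formulation (cont.). For $x = (x_1, \dots, x_T)$ and $y = (y_1, \dots, y_T)$, define $\Psi(x, y)$ as follows:
$$\Psi(x, y) = \begin{pmatrix} \sum_{t=1}^{T-1} \Lambda(y_t) \otimes \Lambda(y_{t+1}) \\ \sum_{t=1}^{T} x_t \otimes \Lambda(y_t) \end{pmatrix}$$
The first block is the transition count and the second block is the emission count.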
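To see that the tensor-product definition really is "concatenate the rows of A and B", the sketch below builds $\Psi(x, y)$ directly from one-hot vectors $\Lambda(y_t)$ with `numpy.kron`. It is an illustration under assumptions: labels are given as integer indices, and `np.kron(a, b)` orders components row-major (index $i \cdot |b| + j$) and the operands are placed label first, which is a fixed permutation of the index convention in the formula above and does not change $\mathbf{w}^\top\Psi$ as long as $\mathbf{w}$ uses the same ordering.

```python
import numpy as np

def one_hot(k, K):
    """Lambda(y): a K-vector with a single 1 at the position of label k."""
    v = np.zeros(K)
    v[k] = 1.0
    return v

def psi(x, y, K):
    """First-order combined feature function Psi(x, y).

    x: (T, D) feature vectors; y: length-T sequence of label indices in [0, K).
    Returns the transition part (K*K entries) followed by the emission part (K*D).
    """
    T, D = x.shape
    trans = np.zeros(K * K)
    emit = np.zeros(K * D)
    for t in range(T - 1):
        # Lambda(y_t) (x) Lambda(y_{t+1}) is one-hot at index y_t*K + y_{t+1}
        trans += np.kron(one_hot(y[t], K), one_hot(y[t + 1], K))
    for t in range(T):
        # Lambda(y_t) (x) x_t copies x_t into the D-slot block of label y_t
        emit += np.kron(one_hot(y[t], K), x[t])
    return np.concatenate([trans, emit])

# Same hypothetical sample as the toy example: labels A A B C -> 0 0 1 2.
x = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]])
y = [0, 0, 1, 2]
v = psi(x, y, K=3)
print(v.shape)  # 3*3 transition entries + 3*2 emission entries
```

Reshaping the first 9 entries to 3 × 3 recovers the transition count matrix, and the last 6 entries reshaped to 3 × 2 recover the emission matrix.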
15.
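Formal Formulation (cont.). For different pairs $(x, y)$, $\Psi(x, y)$ takes different values. Recall that the compatibility function is defined as $F(x, y) = \mathbf{w}^\top \Psi(x, y)$, where $\Psi$ is obtained from the training data and $\mathbf{w}$ is to be estimated. So, for a given $x$, $F$ expresses a different preference for each $y$.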
16.
Formal Formulation (cont.). The weight vector $\mathbf{w}$ contains the information about transitions and emissions. [Figure relating Y, X, and w.]
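The formulation ranks phone sequences by $F(x, y) = \mathbf{w}^\top \Psi(x, y)$, but the slides do not show how the best-scoring sequence is found. For a first-order model a standard choice is Viterbi dynamic programming; the sketch below is that technique, not the authors' code. It assumes $\mathbf{w}$ is split into a K × K transition block `W_trans` and a K × D emission block `W_emit` (hypothetical names), so that $F$ decomposes into per-transition and per-frame scores, and it checks Viterbi against brute-force enumeration on random data.

```python
import itertools
import numpy as np

def score(W_trans, W_emit, x, y):
    """F(x, y) = w . Psi(x, y), with w split into transition and emission blocks."""
    s = sum(W_trans[y[t], y[t + 1]] for t in range(len(y) - 1))
    s += sum(W_emit[y[t]] @ x[t] for t in range(len(y)))
    return s

def viterbi(W_trans, W_emit, x):
    """Return argmax_y F(x, y) for a first-order model by dynamic programming."""
    T, K = len(x), W_trans.shape[0]
    emit = x @ W_emit.T          # emit[t, k] = W_emit[k] . x_t
    delta = emit[0].copy()       # best score of a path ending in label k at frame 0
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + W_trans   # cand[i, j]: come from label i, move to j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + emit[t]
    y = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):         # follow back-pointers to frame 0
        y.append(int(back[t, y[-1]]))
    return y[::-1]

rng = np.random.default_rng(0)
K, D, T = 3, 2, 5
W_trans, W_emit = rng.normal(size=(K, K)), rng.normal(size=(K, D))
x = rng.normal(size=(T, D))
best = viterbi(W_trans, W_emit, x)
brute = max(itertools.product(range(K), repeat=T),
            key=lambda y: score(W_trans, W_emit, x, y))
print(best, list(brute))
```

Viterbi costs $O(TK^2)$ instead of the $K^T$ of enumeration, which matters with 48 phones and hundreds of frames per utterance.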
17.
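Large Margin. Purpose: given a training sample $(x_i, \hat{y}_i)$, we want the pair with the correct answer to be ranked first, and the gap between the first and all the others to be as large as possible. Margin: $\gamma_i = F(x_i, \hat{y}_i) - \max_{y \neq \hat{y}_i} F(x_i, y)$; the margin is to be maximized. [Figure: ranked candidates for training samples 1, 2, and 3.]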
18.
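Structural SVM. The primal form of Structural SVM:
$$\min_{\mathbf{w},\ \xi \ge 0}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad \forall i,\ \forall y \neq \hat{y}_i:\ \mathbf{w}^\top\big[\Psi(x_i, \hat{y}_i) - \Psi(x_i, y)\big] \ge 1 - \xi_i,$$
where the $\xi_i$ are slack variables, which allow the margin to be negative. The constraint means that the margin of training sample $i$ should be at least $1 - \xi_i$. $C$ is the error tolerance weight.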
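For a fixed $\mathbf{w}$, each optimal slack is simply the largest margin violation of its training sample, so the primal objective can be evaluated directly. The sketch below assumes the margin-1, $C/n$-weighted form and a precomputed list `dpsi` (a hypothetical name) whose rows are $\Psi(x_i, \hat{y}_i) - \Psi(x_i, y)$ for every wrong $y$. Enumerating all wrong sequences is only feasible for toy data; practical solvers search for the most violated constraint instead.

```python
import numpy as np

def primal_objective(w, dpsi, C):
    """Structural SVM primal objective for a fixed w (assumed margin-1 form).

    dpsi[i]: array (M_i, d) whose rows are Psi(x_i, y_hat_i) - Psi(x_i, y)
             for every wrong label sequence y of training sample i.
    The optimal slack is xi_i = max(0, max_y (1 - w . dpsi_i(y))).
    """
    xi = np.array([max(0.0, float(np.max(1.0 - d @ w))) for d in dpsi])
    return 0.5 * (w @ w) + C / len(dpsi) * xi.sum(), xi

# Two hypothetical training samples in a 2-dim feature space.
w = np.array([1.0, 0.0])
dpsi = [np.array([[2.0, 0.0], [0.5, 1.0]]),  # sample 1: margins 2.0 and 0.5
        np.array([[1.0, 1.0]])]               # sample 2: margin 1.0
obj, xi = primal_objective(w, dpsi, C=1.0)
print(obj, xi)
```

Sample 1 violates the unit margin by 0.5 and needs that much slack, while sample 2 meets it exactly; raising $C$ makes such violations more expensive relative to $\lVert\mathbf{w}\rVert^2$.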
19.
Outline: Motivation, Baseline Model, Structural SVM, Experiments, Conclusion. Next: Experiments.
20.
Experimental Settings.
Corpus: TIMIT (continuous English speech)
Phone set: 48 phones
Front end: MFCC/PLP + delta + delta-delta (39 dimensions in total), processed by CMS (cepstral mean subtraction)
HMM: a 3-state HMM for each phone, with 32 Gaussian mixtures in each state
Tandem: a neural network with 1000 hidden nodes that looks at the 4 previous frames, the 4 next frames, and the current frame; its output is reduced to 37 dimensions with PCA
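The Tandem front end involves two mechanical steps around the neural network: stacking each frame with 4 frames of context on either side (9 × 39 = 351 inputs), and PCA on the network's output. The sketch below illustrates only those two steps under stated assumptions: the network itself is omitted and replaced by random stand-in posteriors, the 48-dimensional output (one posterior per phone) is an assumption, and edge frames are padded by repetition, which the slides do not specify.

```python
import numpy as np

def stack_context(feats, left=4, right=4):
    """Concatenate each frame with its `left` previous and `right` next frames.

    Utterance edges are padded by repeating the first/last frame (assumption).
    """
    T = len(feats)
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0), feats,
                             np.repeat(feats[-1:], right, axis=0)])
    return np.stack([padded[t:t + left + 1 + right].reshape(-1) for t in range(T)])

def pca_reduce(data, dim):
    """Project mean-centred rows of `data` onto their top `dim` principal axes."""
    centred = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:dim].T

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 39))       # hypothetical utterance: 200 frames x 39 dims
mlp_input = stack_context(mfcc)         # 9 frames x 39 dims = 351 inputs per frame
posteriors = rng.dirichlet(np.ones(48), size=200)  # stand-in for the network's output
tandem = pca_reduce(posteriors, 37)     # 37-dim Tandem features
print(mlp_input.shape, tandem.shape)
```

In a real system the PCA basis would be estimated once on training data and reused for test utterances, rather than fitted per utterance as in this toy.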
21.
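Experimental Settings (cont.). Structural SVM: define $\Psi$ as previously stated. Assuming a first-order Markov hypothesis, the output dimension of $\Psi$ is 48 × 48 (transition matrix) + 48 × 39 (emission matrix) = 4176. The model could also be second order. [Figure: a 48 × 48 transition matrix and a 48 × 39 emission matrix; for second order, a 48 × 48 × 48 transition tensor.]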
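The dimension count generalizes to other model orders. The helper below is a hypothetical illustration; its second-order case assumes, as the figure residue suggests, that the 48 × 48 transition matrix is replaced by a 48 × 48 × 48 tensor with no separate first-order block.

```python
def psi_dim(num_phones, feat_dim, order=1):
    """Length of Psi: (order+1)-gram transition counts plus per-phone emission sums."""
    return num_phones ** (order + 1) + num_phones * feat_dim

print(psi_dim(48, 39))           # 4176
print(psi_dim(48, 39, order=2))  # 112464
```

Going to second order multiplies the transition block by 48, so it dominates the weight vector almost entirely.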
22.
The Results of HMM. [Chart comparing HMM and Tandem; Tandem performs better.] Best: 70.42%.
23.
Primal Form, Linear Kernel, First-Order Markov. [Chart comparing the results of solving the primal form and solving the dual form.] In theory, the two should give the same results.
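Why the two should agree in theory can be written out. Assuming the primal is the margin-1 form with slack weight $C/n$, its Lagrange dual and the recovered weight vector are shown below; with a linear kernel the inner product $\delta\Psi_i(y)^\top\delta\Psi_j(\bar{y})$ is the plain dot product, and because the primal is a convex quadratic program with linear constraints, strong duality holds and both routes yield the same $\mathbf{w}^\ast$.

```latex
\begin{aligned}
\text{Primal:}\quad & \min_{\mathbf{w},\,\xi \ge 0}\ \tfrac12\lVert\mathbf{w}\rVert^2 + \tfrac{C}{n}\sum_{i}\xi_i
\quad\text{s.t.}\quad \mathbf{w}^\top\delta\Psi_i(y) \ge 1-\xi_i \ \ \forall i,\ \forall y\ne\hat y_i \\
\text{Dual:}\quad & \max_{\alpha\ge 0}\ \sum_{i,\,y\ne\hat y_i}\alpha_{iy}
 - \tfrac12 \sum_{i,\,y\ne\hat y_i}\ \sum_{j,\,\bar y\ne\hat y_j}\alpha_{iy}\,\alpha_{j\bar y}\,
 \delta\Psi_i(y)^\top\delta\Psi_j(\bar y)
\quad\text{s.t.}\quad \sum_{y\ne\hat y_i}\alpha_{iy}\le \tfrac{C}{n}\ \ \forall i \\
\text{Recovery:}\quad & \mathbf{w}^\ast = \sum_{i,\,y\ne\hat y_i}\alpha^\ast_{iy}\,\delta\Psi_i(y),
\qquad \delta\Psi_i(y) = \Psi(x_i,\hat y_i)-\Psi(x_i,y)
\end{aligned}
```

The stationarity condition $\partial L/\partial \mathbf{w} = 0$ gives the recovery formula, and $\partial L/\partial \xi_i = 0$ together with the nonnegativity of the multipliers on $\xi_i \ge 0$ gives the box constraint $\sum_y \alpha_{iy} \le C/n$.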
24.
Dual Form, Linear Kernel, First-Order Markov. [Plot: accuracy vs. slack variable weight C, for posterior input and for MFCC/PLP input.] Best: 71.75%. Accuracy increases with C.
25.
Dual Form, Linear Kernel, First-Order Markov. Results without dimensionality reduction are better than results with dimensionality reduction.
26.
Best Results of Baseline and Structural SVM. [Chart: best baseline 70.42% vs. best Structural SVM 71.75%, an absolute improvement of 1.33%; one setting is annotated "Getting Worse".]
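The absolute gain can also be expressed relative to the remaining errors. Assuming the reported figures are phone accuracies and the error rate is 100% minus accuracy, the arithmetic is:

```latex
\begin{aligned}
\text{absolute gain} &= 71.75\% - 70.42\% = 1.33\% \\
\text{error rates} &= 100\% - 70.42\% = 29.58\% \ \text{(baseline)}, \qquad 100\% - 71.75\% = 28.25\% \ \text{(Structural SVM)} \\
\text{relative error reduction} &= \frac{1.33}{29.58} \approx 4.5\%
\end{aligned}
```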
27.
Dual Form, Linear Kernel, Second-Order Markov. [Chart: results for C = 1 and C = 10, with two settings annotated as better.]
28.
Outline: Motivation, Baseline Model, Structural SVM, Experiments, Conclusion. Next: Conclusion.
29.
Conclusion. Structural SVM performs poorly when the input is MFCC/PLP, but it improves on Tandem by about 1% absolute when the input is posterior probabilities.