# Thesis Defense 2007

### Transcript

• 1. Phone Recognition using Structural Support Vector Machine
Student: Chao-Hong Meng
June 15, 2009
• 2. What is this work about?
Input: MFCC/PLP features
Models: HMM, Structural SVM
Output: phone sequence, e.g. sh iy hv ae
• 3. Outline
Motivation
Baseline Model
Structural SVM
Experiments
Conclusion
• 4. Motivation
Speech recognition: acoustic model and language model
Modeling is hard
• 5. Motivation (cont.)
Structural SVM:
A model that can handle structured output
Formulated by Joachims
Directly models the mapping from the input $x$ to the output $y$
As preliminary research:
$x$ is MFCC/PLP/posterior features
$y$ is the phone sequence
• 6. Outline
Motivation
Baseline Model
Structural SVM
Experiments
Conclusion
• 7. Baseline Model
Baseline models:
Monophone HMM
Tandem
They differ in their inputs:
HMM: MFCC/PLP
Tandem: posterior probabilities
• 8. Outline
Motivation
Baseline Model
Structural SVM
Experiments
Conclusion
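• 9. Compatibility Function
Define a compatibility function $F(x, y)$
$x$ is the feature vector sequence
$y$ is the correct phone sequence of $x$
The goal is to find an $F$ such that $F(x, y) > F(x, y')$ for every other phone sequence $y' \neq y$
[figure: candidate phone sequences ranked by compatibility, high to low]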
• 10. Combined Feature Function
We assume that the compatibility function $F$ is an inner product: $F(x, y) = \langle w, \Psi(x, y) \rangle$
$\Psi(x, y)$ is the combined feature function, which encodes the relationship between $x$ and $y$
In this research, the transition count and the emission count are considered
The output of $\Psi(x, y)$: (transition count, emission count)
• 11. A Simplified Example
Settings:
3 different phone labels (A, B, C)
2-dim feature vectors (z1, z2)
A training sample: [figure: training sample over the labels A, B, C with features z1, z2]
Calculate:
Transition count matrix A
Emission count matrix B
• 12. Combined Feature Function
Concatenate the rows of A and B
The output of the combined feature function $\Psi(x, y)$ consists of:
transition count
emission count
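The counting described on slides 10 to 12 can be sketched as follows. This is a minimal illustration, not the thesis code: the training sample from slide 11 is not available here, so the sequence `x`, the labels `y`, and the function name `combined_feature` are hypothetical, and the emission count is assumed to be the per-label sum of feature vectors.

```python
from typing import List, Sequence

LABELS = ["A", "B", "C"]  # the 3 phone labels of the simplified example


def combined_feature(x: Sequence[Sequence[float]], y: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[float]:
    """Return the rows of the transition matrix followed by the rows of the emission matrix."""
    k, d = len(labels), len(x[0])
    index = {label: i for i, label in enumerate(labels)}
    transition = [[0.0] * k for _ in range(k)]  # transition[i][j]: count of label i followed by label j
    emission = [[0.0] * d for _ in range(k)]    # emission[j]: sum of the frames labeled j (assumption)
    for t, (frame, label) in enumerate(zip(x, y)):
        j = index[label]
        for n in range(d):
            emission[j][n] += frame[n]
        if t > 0:
            transition[index[y[t - 1]]][j] += 1.0
    flat_transition = [v for row in transition for v in row]
    flat_emission = [v for row in emission for v in row]
    return flat_transition + flat_emission


# Hypothetical training sample: four 2-dim frames and their labels.
x = [[1.0, 2.0], [0.5, 0.0], [0.0, 1.0], [2.0, 1.0]]
y = ["A", "A", "B", "C"]
psi = combined_feature(x, y)
print(len(psi))  # 3 * 3 transition entries + 3 * 2 emission entries
```

Flattening row by row keeps the layout fixed, so each weight in a weight vector of the same length always lines up with the same transition or emission entry.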
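• 13. Formal Formulation
Output label set (phone set) with size $K$: $\{C_1, \dots, C_K\}$
Kronecker delta: $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise
Define $\Lambda(y_t) = (\delta(C_1, y_t), \dots, \delta(C_K, y_t))^\top$
Only one element of $\Lambda(y_t)$ is nonzero
Tensor product: $a \otimes b$ contains every product $a_i b_j$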
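• 14. Formal Formulation
Define the combined feature function $\Psi(x, y)$ as follows, where $\Lambda(y_t)$ is the one-hot indicator vector of the label $y_t$ and $x_t$ is the feature vector at frame $t$:
Transition count: $\sum_{t=2}^{T} \Lambda(y_{t-1}) \otimes \Lambda(y_t)$
Emission count: $\sum_{t=1}^{T} \Lambda(y_t) \otimes x_t$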
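To see why Kronecker deltas and tensor products reproduce plain counting, the sketch below builds the feature vector from one-hot indicator vectors and outer products. The helper names (`one_hot`, `outer`, `psi`) and the toy data are assumptions made for illustration; integer labels 0..K-1 stand in for the phones.

```python
from typing import List, Sequence


def one_hot(label: int, k: int) -> List[float]:
    """Indicator vector built from Kronecker deltas: exactly one entry is 1."""
    return [1.0 if i == label else 0.0 for i in range(k)]


def outer(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Tensor (outer) product of two vectors, flattened row by row."""
    return [ai * bj for ai in a for bj in b]


def add(u: Sequence[float], v: Sequence[float]) -> List[float]:
    return [ui + vi for ui, vi in zip(u, v)]


def psi(x: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[float]:
    """Sum the per-frame tensor products into (transition part, emission part)."""
    d = len(x[0])
    transition = [0.0] * (k * k)
    emission = [0.0] * (k * d)
    for t in range(len(y)):
        emission = add(emission, outer(one_hot(y[t], k), x[t]))
        if t > 0:
            transition = add(transition, outer(one_hot(y[t - 1], k), one_hot(y[t], k)))
    return transition + emission


# Hypothetical sample: K = 3 labels, 2-dim frames.
x = [[1.0, 2.0], [0.5, 0.0], [0.0, 1.0], [2.0, 1.0]]
y = [0, 0, 1, 2]
features = psi(x, y, k=3)
print(features)
```

Because each indicator vector has a single nonzero entry, every outer product touches one row of the transition or emission matrix, which is exactly the counting view of the simplified example.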
• 15. Formal Formulation
For different pairs of (x, y), the combined feature function $\Psi(x, y)$ takes a different value
Recall that we define the compatibility function as $F(x, y) = \langle w, \Psi(x, y) \rangle$
So we have a different preference for each y given x
$\Psi(x, y)$: obtained from training data
$w$: to be estimated
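Given a weight vector, picking the preferred y for a given x means maximizing the compatibility score over all label sequences. The slides do not say how this search is done; the sketch below assumes a first-order chain and uses Viterbi dynamic programming, one standard choice. The weights `w_trans` and `w_emit`, the input `x`, and all function names are made up for illustration.

```python
from itertools import product
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def score(w_trans: Matrix, w_emit: Matrix, x: Matrix, y: Sequence[int]) -> float:
    """Inner product of w with (transition count, emission count), written as a sum over frames."""
    total = 0.0
    for t, (frame, label) in enumerate(zip(x, y)):
        total += sum(wi * xi for wi, xi in zip(w_emit[label], frame))
        if t > 0:
            total += w_trans[y[t - 1]][label]
    return total


def viterbi(w_trans: Matrix, w_emit: Matrix, x: Matrix) -> List[int]:
    """Return a label sequence with the highest score, using first-order dynamic programming."""
    k = len(w_emit)

    def emit(j: int, frame: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w_emit[j], frame))

    best = [emit(j, x[0]) for j in range(k)]
    backpointers = []
    for frame in x[1:]:
        prev, best, pointers = best, [], []
        for j in range(k):
            i = max(range(k), key=lambda i: prev[i] + w_trans[i][j])
            best.append(prev[i] + w_trans[i][j] + emit(j, frame))
            pointers.append(i)
        backpointers.append(pointers)
    j = max(range(k), key=lambda j: best[j])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return path[::-1]


# Hypothetical weights for 3 labels and 2-dim features.
w_trans = [[0.5, 0.0, -1.0], [-1.0, 0.5, 0.0], [0.0, -1.0, 0.5]]
w_emit = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

best_path = viterbi(w_trans, w_emit, x)
# Exhaustive search over all 3^3 sequences, only feasible for toy sizes.
brute_force = max(product(range(3), repeat=len(x)),
                  key=lambda y: score(w_trans, w_emit, x, y))
print(best_path, list(brute_force))
```

The exhaustive search is included only as a cross-check; with 48 phones and hundreds of frames the dynamic program is the practical option.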
• 16. Formal Formulation
$w$ contains the information about transition and emission
[figure: X, Y, w]
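• 17. Large Margin
Purpose:
Given a training sample $(x_i, y_i)$, we want the pair with the correct answer $y_i$ to rank first among all pairs $(x_i, y)$
The gap between the first and the others should be as large as possible
Margin: $F(x_i, y_i) - \max_{y \neq y_i} F(x_i, y)$, where $F$ is the compatibility function
Maximise the margin
[figure: margins of training samples 1, 2, and 3]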
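The margin of one training sample can be computed directly on a toy problem by enumerating every competing label sequence. This is a sketch under assumptions: the weights, the input, and the function names are hypothetical, and brute-force enumeration is used only because the example is tiny.

```python
from itertools import product
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def score(w_trans: Matrix, w_emit: Matrix, x: Matrix, y: Sequence[int]) -> float:
    """Compatibility score: transition weights plus emission weights along the sequence."""
    total = 0.0
    for t, (frame, label) in enumerate(zip(x, y)):
        total += sum(wi * xi for wi, xi in zip(w_emit[label], frame))
        if t > 0:
            total += w_trans[y[t - 1]][label]
    return total


def margin(w_trans: Matrix, w_emit: Matrix, x: Matrix, y_true: Sequence[int]) -> float:
    """Score of the correct sequence minus the best score among all other sequences."""
    k = len(w_emit)
    f_true = score(w_trans, w_emit, x, y_true)
    f_best_other = max(
        score(w_trans, w_emit, x, y)
        for y in product(range(k), repeat=len(x))
        if list(y) != list(y_true)
    )
    return f_true - f_best_other


# Hypothetical weights and input (3 labels, 2-dim features).
w_trans = [[0.5, 0.0, -1.0], [-1.0, 0.5, 0.0], [0.0, -1.0, 0.5]]
w_emit = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(margin(w_trans, w_emit, x, [0, 0, 1]))  # positive: the correct answer ranks first
print(margin(w_trans, w_emit, x, [0, 0, 0]))  # negative: another sequence scores higher
```

A positive margin means the correct sequence is ranked first; training tries to push this value up for every sample.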
• 18. Structural SVM
The primal form of Structural SVM:
$\xi_i$ are slack variables, which allow the margin to be negative
Each constraint puts a lower bound on the margin, relaxed by the slack $\xi_i$
$C$: error tolerance weight
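For reference, one widely used primal form is the n-slack formulation with margin rescaling from the SVM-struct line of work. It is given here as an assumption about the kind of objective meant, not as the slide's own equation, and the thesis may have used a different variant. In it, $\Psi$ is the combined feature function, $\Delta(y_i, y)$ is a loss between the correct and a competing label sequence, $n$ is the number of training samples, and $C$ weights the slack.

```latex
\begin{aligned}
\min_{w,\ \xi \ge 0} \quad & \frac{1}{2}\lVert w \rVert^{2} + \frac{C}{n}\sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \langle w, \Psi(x_i, y_i) \rangle - \langle w, \Psi(x_i, y) \rangle \ \ge\ \Delta(y_i, y) - \xi_i
\qquad \forall i,\ \forall y \neq y_i
\end{aligned}
```

Under this form, a larger $C$ penalizes margin violations more heavily, while a smaller $C$ favors a small norm of $w$.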
• 19. Outline
Motivation
Baseline Model
Structural SVM
Experiments
Conclusion
• 20. Experimental Settings
Corpus: TIMIT
Continuous English speech
Phone set: 48 phones
Frontend:
MFCC/PLP + delta + delta-delta (39 dimensions in total)
Processed by CMS (cepstral mean subtraction)
HMM:
3-state HMM for each phone
32 Gaussian mixtures in each state
Tandem:
1000 hidden nodes
Input context: the 4 previous frames, the current frame, and the 4 next frames
Reduced to 37 dimensions with PCA
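Two of the frontend steps above, CMS and the 9-frame context window, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, CMS is assumed to be applied per utterance, and repeating the edge frames at the utterance boundaries is an assumption, since the slides do not describe the padding.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]


def cepstral_mean_subtraction(frames: Frames) -> List[List[float]]:
    """Subtract the per-utterance mean of every feature dimension."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    return [[f[i] - mean[i] for i in range(d)] for f in frames]


def stack_context(frames: Frames, left: int = 4, right: int = 4) -> List[List[float]]:
    """Concatenate each frame with its `left` previous and `right` next frames."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-left, right + 1):
            s = min(max(t + offset, 0), n - 1)  # repeat edge frames at the boundaries (assumption)
            window.extend(frames[s])
        stacked.append(window)
    return stacked


# Synthetic utterance: 20 frames of 39-dim features.
frames = [[float(t * (i + 1)) for i in range(39)] for t in range(20)]
normalized = cepstral_mean_subtraction(frames)
stacked = stack_context(normalized)
print(len(stacked), len(stacked[0]))  # 20 frames, 9 * 39 values per stacked frame
```

With 39-dim features, each stacked frame has 9 * 39 = 351 values, which would be the input size of a network that sees this context.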
• 21. Experimental Settings
Structural SVM
Define the combined feature function as previously stated:
Transition matrix: 48 × 48 (first-order assumption; could be second order)
Emission matrix: 48 × 39
Output dimension of the combined feature function: 48 * 48 + 48 * 39 = 4176
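The dimension arithmetic can be checked with a small helper. The function name `psi_dimension` is hypothetical, and the second-order size is an assumption: it supposes that a second-order model keeps one transition entry per phone triple, which the slides do not spell out.

```python
def psi_dimension(num_phones: int, feature_dim: int, order: int = 1) -> int:
    """Length of the combined feature vector for a given Markov order."""
    transition = num_phones ** (order + 1)  # pairs for first order, triples for second order (assumption)
    emission = num_phones * feature_dim
    return transition + emission


first_order = psi_dimension(48, 39)            # 48 * 48 + 48 * 39
second_order = psi_dimension(48, 39, order=2)  # 48 * 48 * 48 + 48 * 39 under the stated assumption
print(first_order, second_order)
```

The transition part grows by a factor of 48 with each extra order, while the emission part depends only on the number of phones and the input dimension.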
• 22. The Results of HMM
[figure: results for HMM and Tandem, annotated "Better"]
Best: 70.42%
• 23. Primal Form, Linear Kernel, First-Order Markov
[figure: results from solving the primal form and from solving the dual form]
In theory, they should be the same
• 24. Dual Form, Linear Kernel, First-Order Markov
[figure: accuracy against the slack variable weight C, for posterior input and for MFCC/PLP input]
Accuracy increases with C
Best: 71.75%
• 25. Dual Form, Linear Kernel, First-Order Markov
Results without dimension reduction are better than results with dimension reduction
• 26. Best Results of Baseline and Structural SVM
Baseline: 70.42%
Structural SVM: 71.75%
Absolute improvement: 1.33%
[figure annotation: "Getting Worse"]
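The improvement is an absolute difference between two percentages. As a small worked example, the same two numbers can also be expressed as a relative error reduction, assuming the reported figures are accuracies so that the error rate is 100 minus the accuracy; the relative figure is derived here and does not appear on the slides.

```python
baseline_accuracy = 70.42  # best baseline result, in percent
ssvm_accuracy = 71.75      # best Structural SVM result, in percent

absolute_gain = ssvm_accuracy - baseline_accuracy
baseline_error = 100.0 - baseline_accuracy  # assumes the figures are accuracies
relative_error_reduction = 100.0 * absolute_gain / baseline_error

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative error reduction: {relative_error_reduction:.1f}%")
```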
• 27. Dual Form, Linear Kernel, Second-Order Markov
[figure: results for C = 1 and C = 10, each annotated "Better"]
• 28. Outline
Motivation
Baseline Model
Structural SVM
Experiments
Conclusion
• 29. Conclusion
Structural SVM performs badly when the input is MFCC/PLP
But Structural SVM improves on Tandem by about 1% absolute when using posterior probabilities
• 30. Thanks for your attention