Thesis Defense 2007

Transcript

  • 1. Phone Recognition using Structural Support Vector Machine
    Student: Chao-Hong Meng
    Advisor: Lin-Shan Lee
    June 15, 2009
  • 2. What is this work about?
    [Diagram: MFCC/PLP input; HMM and Structural SVM models; output phone sequence, e.g. "sh iy hv ae"]
  • 3. Outline
    Motivation
    Baseline Model
    Structural SVM
    Experiments
    Conclusion
  • 4. Motivation
    Speech recognition: modeling is hard
    Traditional approach: split the problem into an acoustic model and a language model
  • 5. Motivation (cont.)
    Structural SVM:
    A model that can handle structured output
    Formulated by Joachims
    Directly models the relationship between input x and output y
    As preliminary research:
    x is MFCC/PLP/Posterior
    y is a phone sequence
  • 6. Outline
    Motivation
    Baseline Model
    Structural SVM
    Experiments
    Conclusion
  • 7. Baseline Model
    Baseline models:
    Monophone HMM
    Tandem
    The difference is in their inputs:
    HMM: MFCC/PLP
    Tandem: Posterior
  • 8. Outline
    Motivation
    Baseline Model
    Structural SVM
    Experiments
    Conclusion
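  • 9. Compatibility Function
    Define a compatibility function over pairs (x, y)
    x is the feature vector sequence
    y is the correct phone sequence of x
    The goal is to find a compatibility function such that, when candidate phone sequences for x are ranked from high to low, the correct y scores highest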
  • 10. Combined Feature Function
    We assume that the compatibility function is an inner product of a weight vector w and a combined feature function
    The combined feature function encodes the relationship between x and y
    In this research, the transition count and emission count are considered
    The output of the combined feature function:
    (transition count, emission count)
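  • 11. A Simplified Example
    Settings:
    3 different phone labels (A, B, C)
    2-dim feature vectors
    A training sample: [figure lost; surviving labels are z1, z2 and the phones A, B, C]
    Calculate:
    Transition count matrix A
    Emission count matrix B
    [Matrix entries lost; rows and columns were labelled with the phones A, B, C]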
  • 12. Combined Feature Function
    Concatenate the rows of A and B
    The output of the combined feature function consists of:
    transition count
    emission count
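The simplified example on slides 11 and 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `combined_feature`, the integer label encoding (0 = A, 1 = B, 2 = C) and the toy data are hypothetical, and the "emission count" is taken to be the sum of the feature vectors assigned to each label, which is one common reading for real-valued features.

```python
def combined_feature(x, y, num_labels):
    """Concatenate the rows of the transition matrix A and the emission matrix B."""
    dim = len(x[0])
    A = [[0.0] * num_labels for _ in range(num_labels)]  # A[i][j]: count of i -> j transitions
    B = [[0.0] * dim for _ in range(num_labels)]         # B[k]: sum of frames labelled k
    for t, (frame, label) in enumerate(zip(x, y)):
        for d in range(dim):
            B[label][d] += frame[d]
        if t > 0:
            A[y[t - 1]][label] += 1.0
    return [v for row in A for v in row] + [v for row in B for v in row]

# Hypothetical toy sample: 4 frames of 2-dim features, labels A B B C.
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
y = [0, 1, 1, 2]
psi = combined_feature(x, y, num_labels=3)
print(len(psi))  # 3*3 transition entries + 3*2 emission entries = 15
```

With 3 labels and 2-dim features the vector has 15 entries, the same layout that gives 48 * 48 + 48 * 39 entries in the experimental setting.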
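  • 13. Formal Formulation
    Output label set (phone set) with size K
    Kronecker delta: equals 1 when its two arguments match and 0 otherwise
    Define a K-dimensional indicator vector for a label from Kronecker deltas
    Only one element of this vector is nonzero
    Tensor product: forms all pairwise products of the elements of two vectors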
  • 14. Formal Formulation
    Define the combined feature function as the concatenation of two parts:
    Transition Count
    Emission Count
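The formulas on these two slides did not survive in the transcript. As a hedged reconstruction, the standard sequence-labelling feature map from the structural SVM literature has the following form. The symbols $\Lambda$, $\Psi$, $c_k$ and $T$ are assumptions and may differ from the notation on the original slides.

```latex
% Label set \{c_1, \dots, c_K\}; \delta is the Kronecker delta.
\Lambda(c) = \bigl(\delta(c_1, c), \dots, \delta(c_K, c)\bigr)^{\top} \in \{0, 1\}^{K}

% Combined feature function for x = (x_1, \dots, x_T), y = (y_1, \dots, y_T):
\Psi(x, y) =
\begin{pmatrix}
  \sum_{t=2}^{T} \Lambda(y_{t-1}) \otimes \Lambda(y_t) \\[4pt]
  \sum_{t=1}^{T} \Lambda(y_t) \otimes x_t
\end{pmatrix}
```

The upper block has $K \times K$ entries (transition counts) and the lower block has $K \times D$ entries for $D$-dimensional frames (emission counts).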
  • 15. Formal Formulation
    For different pairs (x, y), the combined feature function takes a different value
    Recall that the compatibility function is the inner product of w and the combined feature function:
    the combined feature function is obtained from training data
    w is to be estimated
    So, for a given x, different y receive different preference
  • 16. Formal Formulation
    (symbol lost) contains the information about transition and emission
    [Diagram labels: X, Y, w]
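To make the role of w concrete, here is a minimal sketch of scoring and decoding under a first-order model. It assumes w is split into a transition block `w_trans` (K x K) and an emission block `w_emit` (K x D), matching a transition-count plus emission-count feature layout. All names and the toy weights are hypothetical, and the Viterbi search is the usual way to find the highest-scoring label sequence, not necessarily the decoder used in the thesis.

```python
def score(w_trans, w_emit, x, y):
    """Compatibility score: inner product of w with the combined features of (x, y)."""
    total = 0.0
    for t, (frame, label) in enumerate(zip(x, y)):
        total += sum(wk * xk for wk, xk in zip(w_emit[label], frame))
        if t > 0:
            total += w_trans[y[t - 1]][label]
    return total

def decode(w_trans, w_emit, x):
    """Viterbi search for the label sequence with the highest compatibility score."""
    K = len(w_emit)
    emit = [[sum(wk * xk for wk, xk in zip(w_emit[k], frame)) for k in range(K)]
            for frame in x]
    best = emit[0][:]          # best[k]: best score of a path ending in label k
    back = []                  # back-pointers, one list per frame after the first
    for t in range(1, len(x)):
        prev, best, ptr = best, [], []
        for j in range(K):
            i = max(range(K), key=lambda i: prev[i] + w_trans[i][j])
            best.append(prev[i] + w_trans[i][j] + emit[t][j])
            ptr.append(i)
        back.append(ptr)
    path = [max(range(K), key=lambda k: best[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy weights: 2 labels, 2-dim features, 3 frames.
w_trans = [[0.0, 1.0], [-1.0, 0.5]]
w_emit = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y_hat = decode(w_trans, w_emit, x)
print(y_hat, score(w_trans, w_emit, x, y_hat))
```

Because the score decomposes over adjacent label pairs, the search costs on the order of T * K^2 operations rather than enumerating all K^T sequences.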
  • 17. Large Margin
    Purpose:
    Given a training sample (x, y), we want the pair with the correct answer to rank first
    The gap between the first and the others should be as large as possible
    Margin: the gap to maximise
    [Figure: Training samples 1, 2, 3]
  • 18. Structural SVM
    The primal form of Structural SVM:
    The slack variables allow the margin to be negative
    Each constraint requires the margin to be at least a lower bound that the slack variable relaxes
    C: error tolerance weight
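The optimisation problem itself is missing from the transcript. For orientation, a commonly cited primal form of the structural SVM (the n-slack formulation without loss rescaling) is given below. The symbols $\Psi$, $\xi_i$ and $n$ are assumptions, and the exact variant used in the thesis (for example the scaling of $C$ or a loss-dependent margin) may differ.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\,\lVert w \rVert^{2} + \frac{C}{n} \sum_{i=1}^{n} \xi_i

\text{s.t.} \quad \forall i,\ \forall y \ne y_i: \quad
\langle w, \Psi(x_i, y_i) \rangle - \langle w, \Psi(x_i, y) \rangle \ \ge\ 1 - \xi_i
```

The left-hand side of each constraint is the margin between the correct sequence $y_i$ and a competitor $y$. A slack $\xi_i > 1$ lets that margin become negative, and $C$ sets how heavily such violations are penalised.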
  • 19. Outline
    Motivation
    Baseline Model
    Structural SVM
    Experiments
    Conclusion
  • 20. Experimental Settings
    Corpus: TIMIT
    Continuous English
    Phone set: 48 phones
    Frontend:
    MFCC/PLP + delta + delta-delta (39 dimensions in total)
    Processed by CMS (cepstral mean subtraction)
    HMM:
    3-state HMM for each phone
    32 Gaussian mixtures in each state
    Tandem:
    1000 hidden nodes
    Input window: the current frame plus the 4 previous and 4 next frames
    Reduced to 37 dimensions with PCA
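Two of the frontend steps listed above, CMS and the 9-frame input window, are simple enough to sketch. The version below assumes the mean is computed per utterance and that edge frames are repeated to fill the window at utterance boundaries. Both choices, and the function names, are assumptions rather than details given on the slide.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from every feature dimension."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

def stack_context(frames, left=4, right=4):
    """Concatenate each frame with its `left` previous and `right` next frames."""
    T = len(frames)
    stacked = []
    for t in range(T):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), T - 1)  # assumption: repeat edge frames
            window.extend(frames[idx])
        stacked.append(window)
    return stacked

# Hypothetical utterance: 5 frames of 39-dim features.
utterance = [[float(t + d) for d in range(39)] for t in range(5)]
normalised = cepstral_mean_subtraction(utterance)
windows = stack_context(normalised)
print(len(windows), len(windows[0]))  # 5 frames, 9 * 39 values each
```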
  • 21. Experimental Settings
    Structural SVM
    Define the combined feature function as previously stated
    Assume the first-order hypothesis (could be second order)
    The output dimension of the combined feature function:
    48 * 48 + 48 * 39 = 4176
    Transition matrix: 48 x 48
    Emission matrix: 48 x 39
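The dimension count can be checked with a short helper. The first-order figure follows directly from the slide. The second-order figure assumes transitions are counted over label triples while the emission block stays unchanged, which the slide does not spell out. The helper name `feature_dim` is hypothetical.

```python
def feature_dim(num_phones, feat_dim, order=1):
    """Size of the combined feature vector: transition block plus emission block."""
    transition = num_phones ** (order + 1)  # label pairs (order 1) or triples (order 2)
    emission = num_phones * feat_dim
    return transition + emission

print(feature_dim(48, 39))           # 48*48 + 48*39 = 4176
print(feature_dim(48, 39, order=2))  # 48*48*48 + 48*39 = 112464, under the stated assumption
```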
  • 22. The Results of HMM
    Best: 70.42%
    [Chart comparing HMM and Tandem, annotated "Better"]
  • 23. Primal Form, Linear Kernel, First-Order Markov
    In theory, solving the primal form and solving the dual form should give the same result
    [Chart: Solving Primal Form vs. Solving Dual Form]
  • 24. Dual Form, Linear Kernel, First-Order Markov
    Best: 71.75%
    Accuracy increases with C (the slack variable weight)
    [Chart series: Input: Posterior; Input: MFCC/PLP]
  • 25. Dual Form, Linear Kernel, First-Order Markov
    Results without dimension reduction are better than results with dimension reduction
  • 26. Best Results of Baseline and Structural SVM
    Baseline: 70.42%
    Structural SVM: 71.75%
    1.33% absolute improvement
    [Chart annotation: Getting Worse]
  • 27. Dual Form, Linear Kernel, Second-Order Markov
    [Charts for C = 1 and C = 10, each annotated "Better"]
  • 28. Outline
    Motivation
    Baseline Model
    Structural SVM
    Experiments
    Conclusion
  • 29. Conclusion
    Structural SVM performs badly when the input is MFCC/PLP
    But Structural SVM improves on Tandem by about 1% absolute when using posterior probabilities
  • 30. Thanks for your attention