Exploiting contextual information for improved phoneme recognition
1. EXPLOITING CONTEXTUAL INFORMATION
FOR IMPROVED PHONEME RECOGNITION
Joel Pinto, B. Yegnanarayana, H. Hermansky, Mathew Magimai.-Doss
presented by
Sebastian T. Hafner
2. OVERVIEW
• Introduction
• Basic Phoneme Recognizer
• Contextual Information
    • at the feature level
    • at the posterior level
21. POSTERIOR PROBABILITY
\[
\frac{p(x_t \mid q_t = i)}{p(x_t)} = \frac{P(q_t = i \mid x_t)}{P(q_t = i)}
\]
P (qt = i) = P (qt = j) ∀i, j ∈ {1, 2, . . . , 39}
equal probability for each phoneme
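The relation on this slide can be illustrated with a small sketch: dividing each posterior by its prior gives a scaled likelihood, and with equal priors this is only a constant rescaling. The function name `scaled_likelihoods` and the three-class toy vector are assumptions for illustration; the slides use 39 phoneme classes.

```python
def scaled_likelihoods(posteriors, priors=None):
    """Divide each posterior P(q_t = i | x_t) by the prior P(q_t = i)."""
    n = len(posteriors)
    if priors is None:
        # Assumption taken from the slide: equal probability for each class.
        priors = [1.0 / n] * n
    return [p / pr for p, pr in zip(posteriors, priors)]


# Toy posterior vector with 3 classes (hypothetical values for illustration).
posteriors = [0.04, 0.64, 0.32]
scaled = scaled_likelihoods(posteriors)

# With equal priors every entry is multiplied by the same constant,
# so the most probable class stays the same.
best_posterior = max(range(len(posteriors)), key=posteriors.__getitem__)
best_scaled = max(range(len(scaled)), key=scaled.__getitem__)
```

Under the equal-prior assumption the posteriors can therefore stand in for the scaled likelihoods up to a constant factor.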
44. 1 LARGE MLP
MLP output: P(q_t = i, s_t = j | x_t)
39 phonemes × 3 states = 117 classes
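A minimal sketch of how 117 joint (phoneme, state) outputs could be indexed and summed back to 39 phoneme posteriors. The index layout in `joint_index` and the helper `phoneme_posteriors` are assumptions, not taken from the slides; the summation itself is just the sum rule of probability.

```python
NUM_PHONEMES = 39
NUM_STATES = 3


def joint_index(phoneme, state):
    """Map a (phoneme, state) pair to one of the 117 output classes.

    The layout (states of one phoneme stored next to each other) is an
    assumption made for this sketch.
    """
    return phoneme * NUM_STATES + state


def phoneme_posteriors(joint):
    """Sum P(q_t = i, s_t = j | x_t) over the states j of each phoneme i."""
    return [
        sum(joint[joint_index(i, j)] for j in range(NUM_STATES))
        for i in range(NUM_PHONEMES)
    ]


# Toy input: a flat distribution over all 117 classes.
joint = [1.0 / (NUM_PHONEMES * NUM_STATES)] * (NUM_PHONEMES * NUM_STATES)
marginal = phoneme_posteriors(joint)
```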
45. LARGE MLP VS. 3 SMALLER
classifier                          labels for training MLP
                                    uniform    force aligned
one MLP with 117 classes            69.87      71.67
three MLPs, each with 39 classes    70.13      69.70
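The "uniform" label column can be pictured with a short sketch, assuming that uniform labels mean splitting every phoneme segment into three equally long parts, one per state. That reading and the helper name `uniform_state_labels` are assumptions; force-aligned labels would come from an alignment step that is not shown here.

```python
def uniform_state_labels(num_frames, num_states=3):
    """Assign each frame of one phoneme segment to a state by equal division.

    Frame i of a segment with num_frames frames gets state
    floor(i * num_states / num_frames).
    """
    return [i * num_states // num_frames for i in range(num_frames)]


# A segment of 6 frames is split into three blocks of 2 frames each.
labels_6 = uniform_state_labels(6)
# A segment of 7 frames cannot be split evenly; the first state gets 3 frames.
labels_7 = uniform_state_labels(7)
```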
67. EXPERIMENT B
original data    modified data
0.04             0.25
0.64             0.64
0.32             0.11
remaining values: random, sum = 1
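One way to read the example above: the largest posterior is kept, and the other values are replaced by random numbers so that the vector still sums to 1. The sketch below follows that reading. The function name `randomize_non_max` and the way the random values are drawn and rescaled are assumptions, not the procedure used in the experiment.

```python
import random


def randomize_non_max(posteriors, rng):
    """Keep the largest posterior and randomize the remaining values.

    The remaining probability mass (1 - max) is redistributed at random
    over the other classes, so the result still sums to 1.
    """
    top = max(range(len(posteriors)), key=posteriors.__getitem__)
    rest_mass = 1.0 - posteriors[top]
    draws = [rng.random() for _ in posteriors]
    draws[top] = 0.0  # the largest value is not randomized
    total = sum(draws)
    if total == 0:
        # Nothing to redistribute (e.g. a single class): return a copy.
        return list(posteriors)
    modified = [rest_mass * d / total for d in draws]
    modified[top] = posteriors[top]
    return modified


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = [0.04, 0.64, 0.32]
modified = randomize_non_max(original, rng)
```

Note that this sketch does not guarantee that the kept value stays the largest one when the remaining mass exceeds it; in the example above (0.64 kept, 0.36 redistributed) it always does.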
68. RECOGNITION ACCURACY
experiment      1 state MLP    3 state MLP
baseline        68.12          71.55
experiment A    62.77          70.27
experiment B    64.24          70.75
Information in phoneme posteriors!
70. SUMMARY
• contextual information in features
• contextual information in probabilities