This document presents a thesis on efficient acoustic model refinement for low resource languages using semi-supervised learning methods. The author proposes iterative semi-supervised learning frameworks that exploit unlabeled data through progressive decoding. A baseline non-iterative procedure is also described, in which the most confident unlabeled utterances are added after a single decoding pass. Experimental results on a Tamil speech corpus show that the iterative frameworks yield lower word error rates than the baseline by refining the acoustic models over multiple iterations. The bin-based iterative framework additionally reduces computation time by avoiding a decode of the entire unlabeled dataset in each iteration.
Efficient Acoustic Model Refinement for Low Resource Languages
1. Efficient Acoustic Model Refinement for Low Resource
Languages using Semi-Supervised Learning Methods
Chellapriyadharshini M (MT2016041)
Guide: Prof. V. Ramasubramanian
June, 2018
Chellapriyadharshini M (IIIT-Bangalore) · Efficient Acoustic Model Refinement for Low Resource Languages using Semi-Supervised Learning · June 2018
2. Outline
1 Overview
Introduction
Motivation
2 Related Work
3 Proposed Framework
Overview
Corpus & Environment
Baseline: Non-Iterative Procedure
Iterative: Progressive Decoding of DU
Iterative: Progressive Decoding of Bins
Combined Procedure: Active Learning + Semi-Supervised Learning
4 Results
Comparison of Results
5 Future Work & Conclusion
Future Work
Conclusion
5. Overview
Introduction
This thesis addresses the problem of efficient acoustic-model
refinement using semi-supervised learning for low resource languages.
Proposed Method
The proposed semi-supervised learning method decodes the large unlabeled
training corpus using the seed model and, through various selection protocols,
retains the decoded utterances of high reliability, as judged by confidence
scores and iterative bootstrapping. The seed model is further improved using active learning.
M. Chellapriyadharshini, Anoop Toffy, SrinivasaRaghavan K. M.,
V. Ramasubramanian, “Semi-supervised and active-learning scenarios:
Efficient acoustic model refinement for a low resource Indian
language”, accepted at INTERSPEECH 2018, Hyderabad, India.
9. Overview
Motivation
Deep Learning Techniques - requirement of very large training corpus
Resource Scarce Languages
1 limited availability of digital spoken language corpus
2 lack of script level representations
3 limited means of labeling the speech corpus
4 limited access to linguistic knowledge, expertise or resources by which
to acquire lexical representations, annotations etc.
5 labeling is expensive - due to the high throughput of the incoming data
- Voice Search
Semi-supervised learning is therefore essential in the ASR context,
as it reduces the need for labeled transcriptions and other scarce
resources.
10. Related Work [1]
Lightly Supervised: as explored in prior work
11. Related Work [2]
Semi-Supervised / Unsupervised: as explored in prior work
12. Related Work [3]
Active Learning: as explored in prior work
Low-Resource Languages: as explored in prior work
13. Related Work [4]
Data Selection Strategies: as explored in prior work
19. Overview
Not Applicable:
Lack of availability of large amounts of Approximate Transcriptions /
Text Corpora
× Lightly-Supervised
× Language Models trained from large text corpora & interpolation
Lack of availability of large amounts of Audio corpus
× Iterative Strategy : Data Doubling based on Confidence
Limitations specific to the Language
× Models trained on close Dialects
× Multi-lingually trained Monolingual systems
Applicable Methods Reused:
Semi-Supervised Self-Training Approach
Confidence Scores based on Aposteriori Probability of acoustic units
What’s Different?
∗ Iterative Strategies - to make the best use of available data
∗ Combined Approach: Active Learning + Semi-Supervised Learning
21. Corpus & Environment
Tamil language read speech data provided by SpeechOcean and
Microsoft for the ‘Low Resource Speech Recognition Challenge for
Indian Languages’ in Interspeech 2018.
Total: 15.07 hours
Lexicon : IIT-Madras Common Label Set Lexicon for Tamil.
Vocabulary : 32540 words
Experiments conducted with the Kaldi ASR toolkit.
Acoustic model training: DNN-HMM framework.
Language model: word-level trigram model built from the
training corpus.
23. Baseline: Non-Iterative Procedure [1]
Semi-Supervised Learning using 25% Seed Data:
24. Baseline: Non-Iterative Procedure [2]
Confidence Score: a measure of the accuracy of the predicted labels.
It is the a posteriori probability of a phone or word hypothesis w, given
a sequence of acoustic feature vectors O_1^T.
Figure: Confidence Level vs. WER
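In symbols, following the slide's notation (with O_1^T the sequence of acoustic feature vectors), the confidence score is the standard posterior of the hypothesis w, obtained via Bayes' rule:

```latex
C(w) \;=\; P(w \mid O_1^T)
     \;=\; \frac{P(O_1^T \mid w)\, P(w)}{\sum_{w'} P(O_1^T \mid w')\, P(w')}
```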
25. Baseline: Non-Iterative Procedure [3]
Semi-Supervised Self-Training:
Seed acoustic model AMseed is built on Dseed.
AMseed is used to predict approximate transcriptions for DU; the
accuracy of the decoding is measured via confidence scores.
Confidence intervals: (.95, 1), (.9, .95), (.85, .9), (.8, .85), (0, .8)
The most confident of the predicted transcriptions are then added to the
training corpus for re-training.
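The binning-and-selection step can be sketched as follows; `decode` and `train` are hypothetical stand-ins for the Kaldi decoding and DNN-HMM training recipes, and the bin boundaries follow the confidence intervals listed above:

```python
# Sketch of the baseline, non-iterative self-training pass (assumed helpers).
BINS = [(0.95, 1.0), (0.90, 0.95), (0.85, 0.90), (0.80, 0.85), (0.0, 0.80)]

def bin_index(conf):
    """Index of the confidence bin (lo, hi] that `conf` falls into."""
    for i, (lo, hi) in enumerate(BINS):
        if lo < conf <= hi:
            return i
    return len(BINS) - 1  # catch-all lowest bin

def self_train_once(am_seed, d_seed, d_u, decode, train, n_bins_to_add=2):
    """One decoding of DU: keep the most confident bins, then re-train."""
    hyps = decode(am_seed, d_u)            # [(utt, transcript, confidence), ...]
    bins = {i: [] for i in range(len(BINS))}
    for utt, text, conf in hyps:
        bins[bin_index(conf)].append((utt, text))
    selected = [p for i in range(n_bins_to_add) for p in bins[i]]
    return train(d_seed + selected)        # seed + confident hypotheses
```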
26. Baseline: Non-Iterative Procedure [4]
Framework
27. Baseline: Non-Iterative Procedure [5]
WER Profile on T
28. Baseline: Non-Iterative Procedure [6]
Distribution of Confidence Bins
30. Iterative: Progressive Decoding of DU [1]
Decode the entire DU repeatedly to derive progressively better
decodings, such that the bins have progressively increasing
populations of utterances.
The reuse of the iteratively refined bins results in progressively more
accurate acoustic models.
The iterative procedure yields a lower WER profile than the
non-iterative procedure.
Starting from the best model produced by the above iterative
procedure, a ‘global’ iteration is carried out.
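A minimal sketch of this progressive decoding of DU, with hypothetical `decode`, `select`, and `train` helpers standing in for the actual Kaldi steps:

```python
# Sketch of iterative semi-supervised training: re-decode ALL of DU each
# iteration with the latest model (assumed helpers, not the real Kaldi calls).
def progressive_decode_du(am, d_seed, d_u, decode, select, train, n_iters=5):
    for _ in range(n_iters):
        hyps = decode(am, d_u)            # decode the entire unlabeled set DU
        confident = select(hyps)          # e.g. keep confidence > 0.9
        am = train(d_seed + confident)    # refined model for the next pass
    return am
```

Note that the full unlabeled set is decoded in every iteration, which is exactly the cost the bin-based variant on the next slides avoids.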
31. Iterative: Progressive Decoding of DU [2]
Framework
32. Iterative: Progressive Decoding of DU [3]
Redistribution of utterances in bins
33. Iterative: Progressive Decoding of DU [4]
WER Profile on T
35. Iterative: Progressive Decoding of Bins [1]
The utterances belonging to each bin obtained after the first decoding
are frozen.
Only the decoded transcriptions of these fixed bin contents
get better, until convergence.
Once a bin Bi converges, the converged acoustic model AMi is then
used as the starting point for the iterations on the next bin
Bi+1.
Advantage: the entire DU need not be decoded each time, which
reduces the computation time many-fold.
The two proposed iterative learning methods produce comparable
results, so the second method can be preferred for its lower
computation time.
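The bin-wise iteration can be sketched as below; again `decode` and `train` are hypothetical stand-ins for the Kaldi steps, and a fixed iteration count stands in for the convergence test:

```python
# Sketch of bin-wise refinement: bin membership is frozen after the first
# decoding, and only the current bin is re-decoded (assumed helpers).
def progressive_decode_bins(am, d_seed, bins, decode, train, n_iters=3):
    training_set = list(d_seed)
    for bin_utts in bins:                     # B1, B2, ... in confidence order
        for _ in range(n_iters):              # stand-in for "until convergence"
            hyps = decode(am, bin_utts)       # only this bin is re-decoded
            am = train(training_set + hyps)
        training_set += decode(am, bin_utts)  # keep Bi's converged transcripts
    return am
```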
36. Iterative: Progressive Decoding of Bins [2]
Framework
37. Iterative: Progressive Decoding of Bins [3]
WER Profile on T
39. Combined Procedure: Active Learning + Semi-Supervised
Learning [1]
Active Learning eases the labeling bottleneck by asking queries in the
form of unlabeled instances to be labeled by an oracle.
Pool Based Active Learning: queries are selected from a large pool of
unlabeled instances.
40. Combined Procedure: Active Learning + Semi-Supervised
Learning [2]
Evaluate the informativeness of the unlabeled samples by some means
- Querying Strategy.
Uncertainty Sampling: selects the sample about which the model is
“least certain” how to label, i.e., whose prediction has the lowest confidence.
This technique is popular in statistical sequence-modeling tasks, as in
the case of speech, because the most likely label sequence (and its
associated likelihood) can be computed efficiently using dynamic
programming.
41. Combined Procedure: Active Learning + Semi-Supervised
Learning [3]
42. Combined Procedure: Active Learning + Semi-Supervised
Learning [4]
Seed Corpus built by Uncertainty Sampling from 2.5% initial seed:
Only enough manual labeling effort is available to transcribe 25% of
the data set.
So instead of labeling randomly selected utterances, we pick and
choose the subset to be labeled, so as to improve the quality
of the initial seed acoustic model.
Initial data split: Dseed : DU : T = 2.5 : 87.5 : 10
AMseed is trained on Dseed and used to decode DU, and the corresponding
confidence scores are computed.
Select the ‘x%’ of DU with the lowest confidence scores, have them
transcribed, add them to the training corpus Dseed, and re-train the
acoustic model.
Repeat these steps until the training corpus Dseed has grown to
25% of the entire data set.
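The seed-growing loop above can be sketched as follows; `decode`, `train`, and the oracle `label` are hypothetical stand-ins, and `decode` is assumed here to return one confidence score per utterance:

```python
# Sketch of seed growth by uncertainty sampling: repeatedly send the LEAST
# confident x% of DU to the oracle for transcription (assumed helpers).
def grow_seed_by_uncertainty(am, d_seed, d_u, decode, train, label,
                             x_percent=2.5, target_fraction=0.25):
    total = len(d_seed) + len(d_u)
    while len(d_seed) / total < target_fraction and d_u:
        hyps = decode(am, d_u)                 # [(utt, confidence), ...]
        hyps.sort(key=lambda p: p[1])          # least confident first
        k = max(1, int(len(d_u) * x_percent / 100))
        chosen = [u for u, _ in hyps[:k]]
        d_seed = d_seed + label(chosen)        # oracle transcribes them
        d_u = [u for u in d_u if u not in chosen]
        am = train(d_seed)                     # re-train on the grown seed
    return am, d_seed, d_u
```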
43. Combined Procedure: Active Learning + Semi-Supervised
Learning [5]
Now Semi-Supervised learning is applied using the Non-Iterative
procedure explained previously.
44. Combined Procedure: Active Learning + Semi-Supervised
Learning [6]
WER Profile on T
46. Comparison of Results [1]
WER on T
47. Comparison of Results [2]
WER on T
Dseed: 50% of the total possible WER reduction is achieved,
after iterative training.
Dseed built by uncertainty sampling: 41.2% of the total possible WER
reduction is achieved, without any iterative training.
49. Future Work
To extend the iterative procedure on the combined active and
semi-supervised framework.
To extend the whole work on the 50 hours data.
To use a different measure of confidence of prediction - select
utterances that provide most benefit to the whole data set.
Explore different language models - varying training corpus size,
in-domain / out-of-domain data, multiple language model
components using many sources and then combine them with varying
weights.
Ensemble methods: for instance, Co-Training (Semi-Supervised) and
Query By Committee (Active Learning) combination.
51. Conclusion
We have addressed the problem of acoustic model training in a low
resource setting, where only a small amount of seed data is assumed to
be available, and have proposed semi-supervised learning and active
learning protocols for refining the seed acoustic model from a larger,
but unlabeled, training corpus.
The proposed semi-supervised learning achieves as much as 50% (with
iteration) and 41% (without iteration) of the best realizable WER
reduction.
53. Questions?
Thank you for your time!