Improving Speech Recognition Using Limited Accent Diverse British English Training Data With Acoustic Model and Data Selection
presentation_ASR_MIT

  1. Improving Speech Recognition Using Limited Accent Diverse British English Training Data With Acoustic Model and Data Selection. By Maryam Najafian. Supervisor: Prof. Martin Russell, University of Birmingham, UK. 4th October 2016. Email: m.najafian@utdallas.edu
  2. Motivation: Regional accents can be a problem for speech technology!
  3. Overview
      • Problems: (1) the multi-conditional data problem, (2) recognition of 14 regional accents of British English, (3) defining an approach to measure accent difficulty.
      • Low-dimensional visualisation of the accent identification (AID) feature space reveals the expected relationships between regional accents.
      • One approach to accent-robust ASR is adaptation to the speaker's accent, using an online AID system to select an accent-specific acoustic model [1,2,3].
      • Another approach to accent-robust ASR is to use AID to analyse the training data and apply data selection to train a DNN-based system [1,2].
      [1] M. Najafian, "Acoustic model selection for recognition of regional accented speech," Ph.D. dissertation, University of Birmingham, UK, 2016.
      [2] M. Najafian et al., "Identification of British English regional accents using fusion of i-vector and multi-accent phonotactic systems," in Odyssey, 2016, pp. 132-139.
      [3] M. Najafian et al., "Unsupervised model selection for recognition of regional accented speech," in Proc. Interspeech, 2014.
      [4] M. Najafian et al., "Improving speech recognition using limited accent diverse British English training data with deep neural networks," in MLSP, 2016.
  4. Objectives
      • This research is concerned with automatic speech recognition (ASR) for accented speech, using a range of different AID systems for GMM-HMM and DNN-HMM based acoustic model selection.
      • The systems are trained on the SI training set (92 speakers, 7861 utterances) of the WSJCAM0 corpus of read British English speech.
      • They are tested and adapted on the Accents of British Isles (ABI) corpus, which covers 14 different accents (285 speakers).
  5. Baseline AID System Design
      • Phonotactic: accuracy 80.65%
      • I-vector: accuracy 76.76%
      • ACCDIST-SVM: accuracy 95%
  6. ACCDIST Accent ID feature space
  7. GMM-HMM: Unsupervised Adaptation
  8. GMM-HMM: Speaker versus Accent Adaptation. Supervised speaker versus accent adaptation; unsupervised speaker versus accent adaptation.
  9. DNN-HMM versus GMM-HMM ???
  10. Accent properties of WSJCAM0 using an i-vector based AID
  11. DNN-based ASR versus i-vector based AID error rates
  12. DNN-HMM: Extra Training Material (ETM) and Extra Pre-Training Material (EPM). The relationship between AID and ASR error rates motivated an analysis of the effect of supplementing the WSJCAM0 training set with different types of accented speech.
  13. Summary and publications
      • To address the multi-accent learning problem in a deep learning acoustic modelling framework with limited resources, this work introduced a concept called accent difficulty to analyse the training set.
      • A relative gain of 46.85% is achieved in recognising the Accents of British Isles corpus by applying a baseline DNN model rather than a Gaussian mixture model.
      • Our results show that, across all accent regions, supplementing the training set with a small amount of data from the most difficult accent (2.25 hours of Glaswegian accent) yields a gain in performance similar to that from using a large amount of accent-diverse data (8.96 hours from 14 accent regions), even though this accent accounts for just 14% of the test data.
  14. Thank you for listening. Email: m.najafian@utdallas.edu
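
Two ideas from the slides can be illustrated with a minimal Python sketch: choosing an accent-specific acoustic model from AID output, and computing a relative gain between two systems. This is not the author's implementation. The function names, the accent scores, and the word error rates below are all hypothetical, and the sketch assumes that "relative gain" means relative reduction in word error rate, which the slides do not state explicitly.

```python
from typing import Dict


def select_accent_model(aid_scores: Dict[str, float]) -> str:
    """Return the accent label with the highest AID score.

    The label would then be used to look up an accent-specific
    acoustic model (that lookup is outside this sketch).
    """
    if not aid_scores:
        raise ValueError("aid_scores must not be empty")
    return max(aid_scores, key=aid_scores.get)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in word error rate, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical AID scores for one test speaker (not taken from the slides).
scores = {
    "Glaswegian": 0.62,
    "Birmingham": 0.21,
    "Standard Southern English": 0.17,
}
chosen = select_accent_model(scores)

# Hypothetical WERs (not taken from the slides): a drop from 20.0% to 15.0%.
gain = relative_error_reduction(20.0, 15.0)

print(f"selected accent model: {chosen}")
print(f"relative gain: {gain:.2f}%")
```

Under that assumption, the 46.85% figure in the summary would correspond to `relative_error_reduction(gmm_wer, dnn_wer)` for the GMM-HMM and DNN-HMM systems, whose absolute error rates are not given in the transcript.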
