Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
[course site]
Day 3 Lecture 3
Speaker ID I
Javier Hernando
2
Acknowledgments
Miquel India, Omid Ghahabi, Pooyan Safari
Ph.D. candidates
3
Speech Recognition
Emotion - Happy
Gender - Woman
Age - Teengger
.
.
.
4
Speaker ID as Biometrics
5
Security
6
Applications
7
Modalities
8
Tasks
9
Features
Humans use different features to recognise the speaker:
● Pronunciation, diction, …
● Prosody, rhythm, speed, v...
10
HMMs and GMMs
11
GMM-UBM Universal Background Model
MAP Adaptation
12
Supervectors
13
i-vectors
Joint Factor Analysis (JFA) model
14
i-Vector dimension
15
i-Vector Training
T is trained iteratively according to the GMM posteriors
Given T and the UBM, i-vectors are extracted...
16
i-vector Scoring
17
SoA Speaker Recognition
18
DL in Speaker Recognition
19
DL Features
20
BN Features
After M. Mclaren e al.,“Advances in deep neural network approaches to speaker recognition” ICASSP 2015.
Autoencoder
Denoising Autoencoder
Denoising autoencoder for cepstral domain
dereverberation.
● Transfrom noisy features of
reverberant...
Upcoming SlideShare
Loading in …5
×

Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

841 views

Published on

https://telecombcn-dl.github.io/2017-dlsl/

Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.

The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

  1. 1. [course site] Day 3 Lecture 3 Speaker ID I Javier Hernando
  2. 2. 2 Acknowledgments Miquel India, Omid Ghahabi, Pooyan Safari Ph.D. candidates
  3. 3. 3 Speech Recognition Emotion - Happy Gender - Woman Age - Teengger . . .
  4. 4. 4 Speaker ID as Biometrics
  5. 5. 5 Security
  6. 6. 6 Applications
  7. 7. 7 Modalities
  8. 8. 8 Tasks
  9. 9. 9 Features Humans use different features to recognise the speaker: ● Pronunciation, diction, … ● Prosody, rhythm, speed, volume,… ● Acoustic aspects of the voice. Desirable aspects for those features: ● Practical ○ To appear frequently and naturally during the speech ○ Easily measurable for the system ● Robust ○ Any change over time or by speaker’s health ○ Any change by different transmission characteristics or by background noise ● Secure ○ Hard to falsify No feature has all those properties Spectrum-derived features are the more used by now because of their effectiveness
  10. 10. 10 HMMs and GMMs
  11. 11. 11 GMM-UBM Universal Background Model MAP Adaptation
  12. 12. 12 Supervectors
  13. 13. 13 i-vectors Joint Factor Analysis (JFA) model
  14. 14. 14 i-Vector dimension
  15. 15. 15 i-Vector Training T is trained iteratively according to the GMM posteriors Given T and the UBM, i-vectors are extracted for each speaker utterance
  16. 16. 16 i-vector Scoring
  17. 17. 17 SoA Speaker Recognition
  18. 18. 18 DL in Speaker Recognition
  19. 19. 19 DL Features
  20. 20. 20 BN Features After M. Mclaren e al.,“Advances in deep neural network approaches to speaker recognition” ICASSP 2015.
  21. 21. Autoencoder
  22. 22. Denoising Autoencoder Denoising autoencoder for cepstral domain dereverberation. ● Transfrom noisy features of reverberant speech to clean speech features. ● Pre-Trainning with Deep Belief Networks (DBN) Zhang et al., Deep neural network-based bottleneck feature and denoising autoencoder-based fro distant-talking speaker identification, EURASSIP Journal on Audio, Speech, and Music Processing (2015) 2015:12

×