Acoustic modeling using deep belief networks

This is my report on speech recognition. I hope it is helpful to you.

  1. Acoustic Modeling using Deep Belief Networks
     Yueshen Xu (xuyueshen@163.com), Zhejiang University
  2. Abstract
     • Problem
       - Achieving better phone recognition
     • Method
       - Deep neural networks containing many layers of features and large numbers of parameters, rather than Gaussian Mixture Models
     • Steps
       - Step 1: Pre-train a multi-layer generative model of the spectral feature vectors, without making use of any discriminative information
       - Step 2: Use backpropagation to make those features better at predicting a probability distribution over HMM states
  3. Introduction
     • Typical automatic speech recognition system
       - Modeling the sequential structure of speech signals: Hidden Markov Models (HMMs)
       - Spectral representation of the sound wave: HMM states + mixtures of Gaussians + Mel-frequency cepstral coefficients (MFCCs)
     • New research direction
       - Deeper acoustic models containing many layers of features
       - Feedforward neural networks
     • Advantages
       - Estimating the posterior probabilities of HMM states does not require detailed assumptions about the data distribution
       - Suitable for both discrete and continuous features
  4. Introduction
     • Comparison of MFCCs and GMMs
       - MFCCs: partially overcome the very strong conditional independence assumption of HMMs
       - GMMs: easy to fit to data using the EM algorithm, but inefficient at modeling high-dimensional data
     • Previous work on neural networks
       - Using the backpropagation algorithm to train neural networks discriminatively
       - Generative modeling vs. discriminative training: generative modeling can efficiently exploit unlabeled speech
  5. Introduction
     • Main novelty of this paper
       - Achieving consistently better phone recognition performance by pre-training a multi-layer neural network, one layer at a time, as a generative model
     • General description
       - The generative pre-training creates many layers of feature detectors
       - The backpropagation algorithm then adjusts the features in every layer to make them more useful for discrimination
  6. Learning a multilayer generative model
     • Two vital assumptions of this paper
       - The discrimination is more directly related to the underlying causes of the data than to the individual elements of the data itself
       - A good feature-vector representation of the underlying causes can be recovered from the input data by modeling its higher-order statistical structure
     • Directed view
       - Fit a multilayer generative model with infinitely many layers of latent variables
     • Undirected view
       - Fit a relatively simple type of learning module that has only one layer of latent variables
  7. Learning a multilayer generative model
     • Undirected view: Restricted Boltzmann Machine (RBM)
       - A bipartite graph in which visible units are connected to hidden units
       - No visible-visible or hidden-hidden connections
     • Visible units vs. hidden units
       - Visible units represent the observations
       - Hidden units represent features, linked to the visible units by undirected weighted connections
     • RBMs used in this paper
       - Binary RBM: both hidden and visible units are binary and stochastic
       - Gaussian-Bernoulli RBM: hidden units are binary, but visible units are linear with Gaussian noise
  8. Learning a multilayer generative model
     • Binary RBM
       - The weights on the connections and the biases of the individual units define a probability distribution over the joint states of the visible and hidden units via an energy function
       - The conditional distribution p(h | v, θ): p(h_j = 1 | v, θ) = σ(b_j + Σ_i v_i w_ij)
       - The conditional distribution p(v | h, θ): p(v_i = 1 | h, θ) = σ(a_i + Σ_j h_j w_ij)
       - Here σ(x) = 1 / (1 + e^(-x)) is the logistic function (see the sketch below)
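
A minimal NumPy sketch of these two conditionals (not code from the paper); the helper names sample_hidden and sample_visible, and the (n_visible, n_hidden) weight layout, are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b):
    """p(h_j = 1 | v, theta) = sigmoid(b_j + sum_i v_i w_ij), then a Bernoulli sample."""
    p_h = sigmoid(b + v @ W)          # v: (batch, n_visible), W: (n_visible, n_hidden)
    return p_h, (np.random.rand(*p_h.shape) < p_h).astype(float)

def sample_visible(h, W, a):
    """p(v_i = 1 | h, theta) = sigmoid(a_i + sum_j h_j w_ij), then a Bernoulli sample."""
    p_v = sigmoid(a + h @ W.T)
    return p_v, (np.random.rand(*p_v.shape) < p_v).astype(float)
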
  9. Learning a multilayer generative model
     • Learning a DBN
       - Update each weight w_ij using the difference between two measured pairwise correlations: Δw_ij ∝ <v_i h_j>_data − <v_i h_j>_reconstruction (a sketch of this update follows below)
     • Directed view
       - A sigmoid belief net consisting of multiple layers of binary stochastic units
       - Hidden layers: binary features
       - Visible layer: binary data vectors
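
A sketch of a single CD-1 weight update implementing the correlation-difference rule above; it reuses the hypothetical sample_hidden / sample_visible helpers from the previous sketch and assumes a mini-batch of binary visible row vectors.

def cd1_update(v_data, W, a, b, lr=0.01):
    # Positive phase: correlations measured with the data clamped on the visible units.
    p_h_data, h_sample = sample_hidden(v_data, W, b)
    pos_corr = v_data.T @ p_h_data                    # <v_i h_j>_data

    # Negative phase: one step of alternating Gibbs sampling (the "reconstruction").
    p_v_recon, _ = sample_visible(h_sample, W, a)
    p_h_recon, _ = sample_hidden(p_v_recon, W, b)
    neg_corr = p_v_recon.T @ p_h_recon                # <v_i h_j>_reconstruction

    n = v_data.shape[0]
    W += lr * (pos_corr - neg_corr) / n               # delta w_ij
    a += lr * (v_data - p_v_recon).mean(axis=0)       # visible biases
    b += lr * (p_h_data - p_h_recon).mean(axis=0)     # hidden biases
    return W, a, b
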
  10. Learning a multilayer generative model
     • Generating data from the model
       - Binary states are chosen for the top layer of hidden units
       - The weights on the top-down connections are adjusted by performing gradient ascent in the expected log probability of generating the data
     • Challenge
       - Getting unbiased samples from the exponentially large posterior is intractable
       - The hidden units are not conditionally independent given the data
     • Learning with tied weights (1/2)
       - Learning context: a sigmoid belief net with an infinite number of layers and tied symmetric weights between layers
       - The posterior can be computed by simply multiplying the visible vector by the transposed weight matrix
  11. Learning a multilayer generative model
     • An infinite sigmoid belief net with tied weights
       - Inference is easy: once posteriors have been sampled for the first hidden layer, the same process can be used for the next hidden layer
     • Learning is a little more difficult
       - Because every copy of the tied weight matrix gets different derivatives
  12. Learning a multilayer generative model
     • Unbiased estimate of the sum of the derivatives
       - h(2) can be viewed as a noisy but unbiased estimate of the probabilities for the visible units predicted by h(1)
       - h(3) can be viewed as a noisy but unbiased estimate of the probabilities for the visible units predicted by h(2)
  13. Learning a multilayer generative model
     • Learning different weights in each layer
       - Make the generative model more powerful by allowing different weights in different layers
       - Step 1: Learn with all of the weight matrices tied together
       - Step 2: Untie the bottom weight matrix from the other matrices
       - Step 3: Freeze the bottom matrix, obtaining W(1)
       - Step 4: Keep all of the remaining matrices tied together and continue learning the higher matrices
     • This involves first inferring h(1) from v using W(1), and then inferring h(2), h(3), and h(4) in a similar bottom-up manner using W or W^T (see the layer-by-layer sketch below)
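
A sketch of this greedy, layer-by-layer procedure under the same assumptions as the earlier sketches (the cd1_update and sample_hidden helpers are hypothetical); each learned weight matrix is frozen and its hidden activations become the "data" for the next RBM.

import numpy as np

def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.01):
    """Greedy layer-wise pre-training: one RBM per layer, each frozen in turn."""
    weights = []
    inputs = data
    n_visible = data.shape[1]
    for n_hidden in layer_sizes:
        W = 0.01 * np.random.randn(n_visible, n_hidden)
        a, b = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            W, a, b = cd1_update(inputs, W, a, b, lr)
        weights.append((W, b))                    # freeze this layer's weights
        inputs, _ = sample_hidden(inputs, W, b)   # p(h | v) drives the next layer
        n_visible = n_hidden
    return weights
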
  14. Learning a multilayer generative model
     • Deep belief net (DBN)
       - Having learned K layers of features, we obtain a directed generative model called a "Deep Belief Net"
       - The DBN has K different weight matrices between its lower layers and an infinite number of higher layers
     • This paper models the whole system as a feedforward, deterministic neural network (a forward-pass sketch follows below)
     • The network is then discriminatively fine-tuned by using backpropagation to maximize the log probability of the correct HMM states
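
A sketch of the resulting deterministic forward pass, assuming the weights list produced by the hypothetical pretrain_dbn above plus a separately initialized softmax output layer (W_out, b_out); the fine-tuning step itself, backpropagating the cross-entropy against the correct HMM-state labels, is not shown.

import numpy as np

def dbn_forward(x, weights, W_out, b_out):
    """Logistic hidden layers initialized from the pre-trained RBMs, softmax output."""
    h = x
    for W, b in weights:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    logits = h @ W_out + b_out
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)       # p(HMM state | input frame window)
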
  15. Using Deep Belief Nets for Phone Recognition
     • Visible units
       - A context window of n successive frames of speech coefficients (see the sketch below)
     • Generating phone sequences
       - The resulting feedforward neural network is discriminatively trained to output a probability distribution over all possible labels of the central frame
       - The pdfs over all possible labels for each frame are then fed into a standard Viterbi decoder
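
A sketch of assembling the context-window inputs; the window length n = 11 and the edge padding are assumptions for illustration, not values taken from the paper.

import numpy as np

def context_windows(frame_features, n=11):
    """Stack n successive frames (n odd), centered on each frame, into one visible vector."""
    half = n // 2
    padded = np.pad(frame_features, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + n].ravel()
                     for t in range(frame_features.shape[0])])
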
  16. Conclusions
     • Initiative
       - This is the first application to acoustic modeling of neural networks in which multiple layers of features are generatively pre-trained
     • This approach can be extended to explicitly model the covariance structure of the input features
     • It can be used to jointly train acoustic and language models
     • It can be applied to large-vocabulary tasks in place of GMMs
  17. Thank you
