Acoustic modeling using deep belief networks
This is my report for speech recognition. I hope it is of help for you.

Presentation Transcript

  • Acoustic Modeling using Deep Belief Networks
    Yueshen Xu (xuyueshen@163.com), Zhejiang University
  • Abstract
    - Problem: achieving better phone recognition
    - Method: deep neural networks that contain many layers of features and a large number of parameters, rather than Gaussian mixture models
    - Step 1: pre-train the network as a multi-layer generative model of the spectral feature vectors, without making use of any discriminative information
    - Step 2: use backpropagation to make those features better at predicting a probability distribution over HMM states
  • Introduction
    - Typical automatic speech recognition system
      - The sequential structure of the speech signal is modeled by a Hidden Markov Model (HMM)
      - The spectral representation of the sound wave: each HMM state uses a mixture of Gaussians over Mel-frequency cepstral coefficients (MFCCs)
    - New research direction
      - Deeper acoustic models containing many layers of features
      - Feedforward neural networks
    - Advantages
      - Estimating the posterior probabilities of HMM states does not require detailed assumptions about the data distribution
      - Suitable for both discrete and continuous features
  • Introduction
    - Comparison of MFCCs and GMMs
      - MFCCs: partially overcome the very strong conditional independence assumption of HMMs
      - GMMs: easy to fit to data using the EM algorithm, but inefficient at modeling high-dimensional data
    - Previous work on neural networks
      - Using the backpropagation algorithm to train neural networks discriminatively
    - Generative modeling vs. discriminative training
      - Generative modeling can efficiently exploit unlabeled speech
  • Introduction
    - Main novelty of this paper
      - Consistently better phone recognition performance is achieved by pre-training a multi-layer neural network, one layer at a time, as a generative model
    - General description
      - The generative pre-training creates many layers of feature detectors
      - The backpropagation algorithm then adjusts the features in every layer to make them more useful for discrimination
  • Learning a multilayer generative model
    - Two vital assumptions of this paper
      - The discrimination is more directly related to the underlying causes of the data than to the individual elements of the data itself
      - A good feature-vector representation of the underlying causes can be recovered from the input data by modeling its higher-order statistical structure
    - Directed view: fit a multilayer generative model with infinitely many layers of latent variables
    - Undirected view: fit a relatively simple learning module that has only one layer of latent variables
  • Learning a multilayer generative model
    - Undirected view: the Restricted Boltzmann Machine (RBM)
      - A bipartite graph in which visible units are connected to hidden units
      - No visible-visible or hidden-hidden connections
      - Visible units represent observations; hidden units represent features; the two are linked by undirected weighted connections
    - RBMs used in this paper
      - Binary RBM: both hidden and visible units are binary and stochastic
      - Gaussian-Bernoulli RBM: hidden units are binary, but visible units are linear with Gaussian noise
  • Learning a multilayer generative model
    - Binary RBM
      - The weights on the connections and the biases of the individual units define a probability distribution over the joint states of the visible and hidden units via an energy function:
        E(v, h; θ) = -Σ_i a_i v_i - Σ_j b_j h_j - Σ_{i,j} v_i w_ij h_j,   p(v, h; θ) ∝ e^{-E(v, h; θ)}
      - The conditional distribution p(h | v, θ) factorizes over the hidden units: p(h_j = 1 | v, θ) = σ(b_j + Σ_i v_i w_ij)
      - The conditional distribution p(v | h, θ) factorizes over the visible units: p(v_i = 1 | h, θ) = σ(a_i + Σ_j h_j w_ij)
      - Here σ(x) = 1 / (1 + e^{-x}) is the logistic function and θ = {W, a, b} denotes the weights and biases
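
A minimal numpy sketch of these two conditional distributions and the energy function for a binary RBM. The variable names, shapes, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    """p(h_j = 1 | v, theta): factorial Bernoulli over the hidden units."""
    return sigmoid(b + v @ W)              # v: (n_visible,), W: (n_visible, n_hidden)

def p_v_given_h(h, W, a):
    """p(v_i = 1 | h, theta): factorial Bernoulli over the visible units."""
    return sigmoid(a + h @ W.T)

def energy(v, h, W, a, b):
    """E(v, h; theta) = -a.v - b.h - v.W.h for a binary RBM."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

# toy example
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)
v = rng.integers(0, 2, n_vis).astype(float)
h = (rng.random(n_hid) < p_h_given_v(v, W, b)).astype(float)
print(p_h_given_v(v, W, b), energy(v, h, W, a, b))
```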
  • Learning a multilayer generative model
    - Learning the DBN
      - Each weight w_ij is updated using the difference between two measured, pairwise correlations:
        Δw_ij ∝ <v_i h_j>_data - <v_i h_j>_reconstruction
    - Directed view
      - A sigmoid belief net consisting of multiple layers of binary stochastic units
      - Hidden layers: binary features
      - Visible layer: binary data vectors
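
The update rule above can be realized with one-step contrastive divergence (CD-1). The following self-contained numpy sketch shows one mini-batch update; the learning rate, batch size, and layer sizes are my own assumptions rather than settings from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.05, seed=0):
    """One CD-1 step: <v h>_data minus <v h>_reconstruction."""
    rng = np.random.default_rng(seed)
    # positive phase: hidden probabilities and a binary sample, driven by the data
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: reconstruct the visibles, then recompute hidden probabilities
    pv1 = sigmoid(a + h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(b + v1 @ W)
    # pairwise correlations (outer products), averaged over the mini-batch
    pos = v0.T @ ph0 / len(v0)
    neg = v1.T @ ph1 / len(v1)
    W += lr * (pos - neg)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

# toy mini-batch of binary vectors standing in for quantized spectral features
rng = np.random.default_rng(1)
v0 = rng.integers(0, 2, (8, 6)).astype(float)
W = 0.01 * rng.standard_normal((6, 4)); a = np.zeros(6); b = np.zeros(4)
W, a, b = cd1_update(v0, W, a, b)
```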
  • Learning a multilayer generative model
    - Generating data from the model
      - Binary states are chosen for the top layer of hidden units, then propagated down through the top-down connections
      - The weights on the top-down connections are adjusted by performing gradient ascent on the expected log probability of the data
    - Challenge
      - Getting unbiased samples from the exponentially large posterior is intractable, because the hidden units are not conditionally independent given the data
    - Learning with tied weights (1/2)
      - Learning context: a sigmoid belief net with an infinite number of layers and tied, symmetric weights between layers
      - The posterior can then be computed by simply multiplying the visible vector by the transposed weight matrix
  • Learning a multilayer generative model
    - An infinite sigmoid belief net with tied weights
      - Inference is easy: once posteriors have been sampled for the first hidden layer, the same process can be reused for the next hidden layer
      - Learning is a little more difficult, because every copy of the tied weight matrix gets different derivatives
  • Learning a multilayer generative model
    - Unbiased estimate of the sum of derivatives
      - h(2) can be viewed as a noisy but unbiased estimate of the probabilities for the visible units predicted by h(1)
      - h(3) can be viewed as a noisy but unbiased estimate of the probabilities for the visible units predicted by h(2)
  • Learning a multilayer generative model
    - Learning different weights in each layer
      - The generative model is made more powerful by allowing different weights in different layers
      - Step 1: learn with all of the weight matrices tied together
      - Step 2: untie the bottom weight matrix from the other matrices
      - Step 3: freeze the bottom matrix, obtaining W(1)
      - Step 4: keep all remaining matrices tied together and continue learning the higher matrices
    - This involves first inferring h(1) from v using W(1), and then inferring h(2), h(3), and h(4) in a similar bottom-up manner using W or W^T; a minimal sketch of this greedy procedure follows below
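
A minimal numpy sketch of the greedy, layer-by-layer recipe: train an RBM on the data, freeze its weights, use its hidden probabilities as the "data" for the next RBM, and repeat. The layer sizes, number of epochs, and the compact CD-1 trainer are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, epochs=5, lr=0.05, seed=0):
    """Train one binary RBM with CD-1 and return (W, a, b)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(b + data @ W)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(a + h0 @ W.T)            # mean-field reconstruction
        ph1 = sigmoid(b + pv1 @ W)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

def pretrain_stack(data, layer_sizes):
    """Greedy pre-training: freeze each W(k), then feed hidden probabilities upward."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm_cd1(x, n_hidden)
        weights.append((W, b))                 # W(k) and its hidden bias, now frozen
        x = sigmoid(b + x @ W)                 # becomes the next layer's "data"
    return weights

# toy binary data standing in for spectral feature vectors
rng = np.random.default_rng(2)
v = rng.integers(0, 2, (32, 20)).astype(float)
stack = pretrain_stack(v, layer_sizes=[16, 12, 8])
```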
  • Learning a multilayer generative model
    - Deep belief net (DBN)
      - Having learned K layers of features, we obtain a directed generative model called a "deep belief net"
      - The DBN has K different weight matrices between its lower layers and an infinite number of higher layers
      - This paper models the whole system as a feedforward, deterministic neural network
      - The network is then discriminatively fine-tuned using backpropagation to maximize the log probability of the correct HMM states
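
A sketch of this fine-tuning step under stated assumptions: the pre-trained weights initialize a feedforward sigmoid network, a randomly initialized softmax layer is placed on top, and one backpropagation step descends the negative log-likelihood of the correct labels (equivalently, ascends the log probability). The softmax head, learning rate, and toy data are my own assumptions, not the authors' setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_step(x, y, weights, W_out, b_out, lr=0.1):
    """One backprop step on -log p(correct label | x).

    weights: list of (W, b) pairs taken from the pre-trained RBM stack.
    y: integer class labels (e.g. HMM-state indices) for each row of x.
    """
    # forward pass through the pre-trained sigmoid layers, then softmax
    activations = [x]
    for W, b in weights:
        activations.append(sigmoid(b + activations[-1] @ W))
    logits = activations[-1] @ W_out + b_out
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # backward pass: gradient of the mean negative log-likelihood
    delta = probs.copy()
    delta[np.arange(len(y)), y] -= 1.0
    delta /= len(y)
    grad_W_out, grad_b_out = activations[-1].T @ delta, delta.sum(axis=0)
    delta = delta @ W_out.T * activations[-1] * (1 - activations[-1])
    grads = []
    for i in reversed(range(len(weights))):
        W, b = weights[i]
        grads.append((i, activations[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            delta = delta @ W.T * activations[i] * (1 - activations[i])

    # gradient-descent updates (gradient ascent on the log-likelihood)
    W_out -= lr * grad_W_out
    b_out -= lr * grad_b_out
    for i, gW, gb in grads:
        weights[i] = (weights[i][0] - lr * gW, weights[i][1] - lr * gb)
    return weights, W_out, b_out

# toy usage: two pre-trained layers feeding a 5-class softmax
rng = np.random.default_rng(3)
x = rng.random((16, 20)); y = rng.integers(0, 5, 16)
weights = [(0.01 * rng.standard_normal((20, 16)), np.zeros(16)),
           (0.01 * rng.standard_normal((16, 12)), np.zeros(12))]
W_out = 0.01 * rng.standard_normal((12, 5)); b_out = np.zeros(5)
weights, W_out, b_out = finetune_step(x, y, weights, W_out, b_out)
```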
  • Using Deep Belief Nets for Phone Recognition
    - Visible units: a context window of n successive frames of speech coefficients
    - Generating phone sequences
      - The resulting feedforward neural network is discriminatively trained to output a probability distribution over all possible labels of the central frame
      - The distributions over all possible labels for each frame are then fed into a standard Viterbi decoder
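
A small sketch of how a context window of n successive frames might be assembled into the network's visible vector. The window size and the edge-padding scheme are my own assumptions for illustration.

```python
import numpy as np

def context_windows(frames, n=11):
    """Stack n successive frames (centered on each frame) into one input vector.

    frames: (num_frames, num_coeffs) array of per-frame speech coefficients.
    Returns: (num_frames, n * num_coeffs) array; utterance edges padded by repetition.
    """
    half = n // 2
    padded = np.concatenate([np.repeat(frames[:1], half, axis=0),
                             frames,
                             np.repeat(frames[-1:], half, axis=0)], axis=0)
    return np.stack([padded[t:t + n].reshape(-1) for t in range(len(frames))])

# toy utterance: 100 frames of 13 coefficients -> 100 visible vectors of length 143
utt = np.random.default_rng(4).standard_normal((100, 13))
X = context_windows(utt, n=11)
print(X.shape)  # (100, 143)
```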
  • Conclusions
    - This is the first application to acoustic modeling of neural networks in which multiple layers of features are generatively pre-trained
    - The approach can be extended to explicitly model the covariance structure of the input features
    - It can be used to jointly train acoustic and language models
    - It can be applied to large-vocabulary tasks in place of GMMs
  • Thank you