SlideShare a Scribd company logo
替代役男 ⺩王俊豪
chuchuhao831@gmail.com
Sequence Learning
from character based acoustic model to
End-to-End ASR system
figure from A. Graves, Supervised sequence labelling, ch2 p9
…
a stream of input data
= a sequence of acoustic feature
= a list of MFCC ( Here we use)
Learning
Algorithm Thee ound oof
Error ?
Extract Feature
Sequence Learning
Outline
1. Feedforward Neural Network
2. Recurrent Neural Network
3. Bidirectional Recurrent Neural Network
4. Long Short-Term Memory
5. Connectionist Temporal Classification
6. Build a End-to-End System
7. Experiment on Low-resource Language
Active Function
Weighted
Feedforward Neural Network
Active Function
Weighted
Recurrent Neural Network
Active Function
Weighted
Bidirectional RNN
figure from A. Graves, Supervised sequence labelling, ch3 p24
figure from A. Graves, Supervised sequence labelling, ch3 p23
Active func
G
G
G Input Gate
Output Gate
Forget Gate
G
W Weighted
Gate
M Multiplyer
M
M
M
W
W
RNNs
LSTM
W
W
W
W
W
W
W
LSTM Forward Pass, Previous Layer Input
G
G
G Input Gate
Output Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
Previous layer
LSTM Fordward Pass, Input Gate
G
G
Input Gate
Output Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
Previous layer
G
LSTM Fordward Pass, Foget Gate
G
G Input Gate
Output Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
Previous layer
G
LSTM Forward Pass, Cell
G
G Input Gate
Output Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
G
LSTM Forward Pass, Output Gate
G Input Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
G
Previous layer
G
LSTM Forward Pass, Output
G
G Input Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
G
G
G
G Input Gate
Output Gate
Forget Gate
M
M
M
W
W
W
W
W
W
W
LSTM Backward Pass, Output Gate
RNN
LSTM
figure from A. Graves, Supervised sequence labelling, p33,35
Additional CTC Output Layer
(1) Predict label at any timestep
Connectionist
Temporal
Classification
(2) Probability of Sentence
Predict label
at any tilmestep
Framewise Acoustic Feature
Label
A B C D E F G I J
K L M N O P Q R
S T U V W X Y Z
{space}
{blank}
others like ‘ “ .
Probability of Sentence
Thh..e S..outtd ooof
Th.e S.outd of
The Soutd of
Target
Step 1, remove repeated labels
Step 2, remove all blanks
paths:
labelling:
Predict
Now, we can do
the maximum likelihood training
Choose the labelling :
Best Path, the most probable path will correspond to most
probable labelling
Output Decoding
Something wrong here. Prefix Search would be better
How to connect target label
and RNNs Output
Sum over paths probability, forward
l_s = .
s-0 : .h.e.l.
s-1 : .h.e.l
s-2 : .h.e.
( s-2 blank to blank )
l_s = l
s-0 : .h.e.l.l
s-1 : .h.e.l.
s-2 : .h.e.l
( s-2 same non-blank label )
only allow transition
(1) between blank and non-blank label
(2) distinct non-blank label
Sum over paths probability, backward
l_s = .
s+0 : .l.l.o.
s+1 : l.l.o.
s+2 : .l.o.
( s+2 blank to blank )
l_s = l
s+0 : l.l.o.
s+1 : .l.o.
s+2 : l.o.
( s+2 same non-blank label )
only allow transition
(1) between blank and non-blank label
(2) distinct non-blank label
Sum over paths probability
D GO
Target : dog -> .d.o.g..t=0
D GO .t=1
D GO .t=2
D GO .t=T-2
D GO .t=T-1
D GO .t=T
…
.
.d
.d.
.d.o
.d.o.
.d.o.g
.d.o.g.
s=1
s=2
s=3
s=4
s=5
s=6
s=7
Sum over paths probability
D GO
backward-forward :
the probability of all the paths corresponding to l
that go through the symbol at time t
.t=0
D GO .t=1
D GO .t=t’
D GO .t=T-1
D GO .t=T
…
…
last we derivates w.r.t. unnormalised outputs, get error signal
A. Graves. Sequence Transduction with
Recurrent Neural Networks
Transcription network + Prediction network
Previous Labelfeature at time t
label prediction at time t next label probability
A. Graves. Towards End-to-end Speech Recognition with
Recurrent Neural Networks ( Google Deepmind )
Additional CTC Output Layer
Additional WER Output Layer
Minimise Objective Function -> Minimise Loss Function
Spectrogram feature
Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech
recognition ( Baidu Research)
Additional CTC Output Layer
Additional WER Output Layer
Spectrogram feature
Language Model Transcription
RNN Output : arther n tickets for the game
Decode Target: are there any tickets for the gam
Q( L ) = log( P( L | x ) ) + a * log( P_LM( L ) ) + b * word_count( L )
Low-resources language experiment
Target: ? ka didto sigi ra og tindog
Output: h bai to giy ngtndog
test 0.44,
test 0.71
1. Boulard and Morgan (1994) Connectionist speech recognition A hybrid approach. 

2. A. Grave. Supervised Sequence Labelling with Recurrent Neural Networks

3. A. Grave. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with
Recurrent Neural Network. 

4. A. Graves. Sequence Transduction with Recurrent Neural Networks

5. A. Graves. Towards End-to-end Speech Recognition with Recurrent Neural Networks ( Google
Deepmind ) 

6. Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition ( Baidu Research) 

7. standford CTC: https://github.com/amaas/stanford-ctc

8. R. Pascanu , On the difficulty of training recurrent neural networks

9. L. Besacier. Automatic Speech Recognition for Under-Resourced Languages: A Survey

10.A. Gibiansky. http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-
networks/

11. LSTM Mathematic formalism ( mandarin )

http://blog.csdn.net/u010754290/article/details/47167979

11. Assumption of Markov Model

http://jedlik.phy.bme.hu/~gerjanos/HMM/node5.html

12. Story about ANN-HMM Hybrid ( mandarin )

HMM : http://www.taodocs.com/p-5260781.html

13. Generative model v.s. discriminative model 

http://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-
discriminative-algorithm
Reference
Memory Network :
A. Grave. Neural Turing Machine
Appendix B. future work
figure from http://www.slideshare.net/yuzurukato/neural-turing-machines-43179669
Appendix A. Generative v.s. discriminative
A generative model learns the joint probability p(x,y).
A discriminative model learns the conditional probability p(y|x).
data input : (1,0), (1,0), (2,0), (2,1)
y=0 y=1
x=1 1/2 0
x=2 1/4 1/4
y=0 y=1
x=1 1 0
x=2 1/2 1/2
According to Andrew Y. Ng. On Discriminative vs. Generative
classifier: A comparison of logistic regression and naive Bayes
The overall gist is that discriminative models generally
outperform generative models in classification tasks.
Generative Discriminative

More Related Content

What's hot

Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Resnet.pptx
Resnet.pptxResnet.pptx
Resnet.pptx
YanhuaSi
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
Divya Gera
 
[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax
JaeJun Yoo
 
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationArtificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Mohammed Bennamoun
 
Gradient descent optimizer
Gradient descent optimizerGradient descent optimizer
Gradient descent optimizer
Hojin Yang
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term Memory
Yan Xu
 
02 order of growth
02 order of growth02 order of growth
02 order of growth
Hira Gul
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
Sungjoon Choi
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
Sangwoo Mo
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
Muhammad Rasel
 
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial
Hojin Yang
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
Knoldus Inc.
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
Hemantha Kulathilake
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
Sungchul Kim
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
Jinwon Lee
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
HARISH R
 
Connectionist Temporal Classification
Connectionist Temporal ClassificationConnectionist Temporal Classification
Connectionist Temporal Classification
Julius Hietala
 

What's hot (20)

Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Resnet.pptx
Resnet.pptxResnet.pptx
Resnet.pptx
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax[PR12] categorical reparameterization with gumbel softmax
[PR12] categorical reparameterization with gumbel softmax
 
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationArtificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
 
Gradient descent optimizer
Gradient descent optimizerGradient descent optimizer
Gradient descent optimizer
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term Memory
 
02 order of growth
02 order of growth02 order of growth
02 order of growth
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
Connectionist Temporal Classification
Connectionist Temporal ClassificationConnectionist Temporal Classification
Connectionist Temporal Classification
 

Viewers also liked

End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
Universitat Politècnica de Catalunya
 
音声認識と深層学習
音声認識と深層学習音声認識と深層学習
音声認識と深層学習
Preferred Networks
 
Persistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-ChipPersistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-Chip
Baidu USA Research
 
Deep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh TomarDeep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh Tomar
WithTheBest
 
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Universitat Politècnica de Catalunya
 
Medical Deontology: 'Are We Truthfully The "GOOD DOCTORS??'
Medical Deontology:  'Are We Truthfully The "GOOD DOCTORS??' Medical Deontology:  'Are We Truthfully The "GOOD DOCTORS??'
Medical Deontology: 'Are We Truthfully The "GOOD DOCTORS??'
Prof. Mridul Panditrao
 
Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)
Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)
Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...
Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...
Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...
Universitat Politècnica de Catalunya
 
Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)
Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)
Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Universitat Politècnica de Catalunya
 
ele3102 general principles in teaching language skills
ele3102 general principles in teaching language skillsele3102 general principles in teaching language skills
ele3102 general principles in teaching language skillsNur Amirah Hamzah
 
Teaching and learning materials
Teaching and learning materialsTeaching and learning materials
Teaching and learning materials
Vicente Antofina
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Universitat Politècnica de Catalunya
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Universitat Politècnica de Catalunya
 
Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)
Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)
Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Modeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksModeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural Networks
Josh Patterson
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
ananth
 
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Universitat Politècnica de Catalunya
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
Andrii Gakhov
 

Viewers also liked (20)

End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
音声認識と深層学習
音声認識と深層学習音声認識と深層学習
音声認識と深層学習
 
Persistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-ChipPersistent RNNs: Stashing Recurrent Weights On-Chip
Persistent RNNs: Stashing Recurrent Weights On-Chip
 
Deep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh TomarDeep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh Tomar
 
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Langua...
 
Medical Deontology: 'Are We Truthfully The "GOOD DOCTORS??'
Medical Deontology:  'Are We Truthfully The "GOOD DOCTORS??' Medical Deontology:  'Are We Truthfully The "GOOD DOCTORS??'
Medical Deontology: 'Are We Truthfully The "GOOD DOCTORS??'
 
Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)
Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)
Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)
 
Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...
Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...
Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC ...
 
Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)
Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)
Word Embeddings (D2L4 Deep Learning for Speech and Language UPC 2017)
 
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
 
ele3102 general principles in teaching language skills
ele3102 general principles in teaching language skillsele3102 general principles in teaching language skills
ele3102 general principles in teaching language skills
 
Teaching and learning materials
Teaching and learning materialsTeaching and learning materials
Teaching and learning materials
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
 
Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)
Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)
Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)
 
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
 
Modeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksModeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural Networks
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
Generative Adversarial Networks (D2L5 Deep Learning for Speech and Language U...
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
 

Similar to Sequence Learning with CTC technique

Hossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on GraphsHossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on Graphs
knowdiff
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
HAMNAHAMNA8
 
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Subhabrata Mukherjee
 
Karen miga centromere sequence characterization and variant detection
Karen miga centromere sequence characterization and variant detectionKaren miga centromere sequence characterization and variant detection
Karen miga centromere sequence characterization and variant detection
GenomeInABottle
 
Rnaseq forgenefinding
Rnaseq forgenefindingRnaseq forgenefinding
Rnaseq forgenefinding
Sucheta Tripathy
 
Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...
Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...
Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...
Ana Marasović
 
Landmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesLandmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music Melodies
Sankalp Gulati
 
Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3
Aref35
 
Itblock2 150209161919-conversion-gate01
Itblock2 150209161919-conversion-gate01Itblock2 150209161919-conversion-gate01
Itblock2 150209161919-conversion-gate01
Xuan Phu Nguyen
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in R
herbps10
 
Mcs 031
Mcs 031Mcs 031
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
Ana Marasović
 
SPIRE2013-tabei20131009
SPIRE2013-tabei20131009SPIRE2013-tabei20131009
SPIRE2013-tabei20131009Yasuo Tabei
 
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana
 
mooc_presentataion_mayankmanral on the subject puthon
mooc_presentataion_mayankmanral on the subject puthonmooc_presentataion_mayankmanral on the subject puthon
mooc_presentataion_mayankmanral on the subject puthon
garvitbisht27
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
Dmitriy Selivanov
 
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Mail.ru Group
 

Similar to Sequence Learning with CTC technique (20)

Hossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on GraphsHossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on Graphs
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
 
Sma
SmaSma
Sma
 
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
 
Karen miga centromere sequence characterization and variant detection
Karen miga centromere sequence characterization and variant detectionKaren miga centromere sequence characterization and variant detection
Karen miga centromere sequence characterization and variant detection
 
Rnaseq forgenefinding
Rnaseq forgenefindingRnaseq forgenefinding
Rnaseq forgenefinding
 
Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...
Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...
Poster for RepL4NLP - Multilingual Modal Sense Classification Using a Convolu...
 
Landmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesLandmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music Melodies
 
Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3
 
Itblock2 150209161919-conversion-gate01
Itblock2 150209161919-conversion-gate01Itblock2 150209161919-conversion-gate01
Itblock2 150209161919-conversion-gate01
 
Zoooooohaib
ZoooooohaibZoooooohaib
Zoooooohaib
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in R
 
Reginf pldi3
Reginf pldi3Reginf pldi3
Reginf pldi3
 
Mcs 031
Mcs 031Mcs 031
Mcs 031
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
 
SPIRE2013-tabei20131009
SPIRE2013-tabei20131009SPIRE2013-tabei20131009
SPIRE2013-tabei20131009
 
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16
 
mooc_presentataion_mayankmanral on the subject puthon
mooc_presentataion_mayankmanral on the subject puthonmooc_presentataion_mayankmanral on the subject puthon
mooc_presentataion_mayankmanral on the subject puthon
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
 
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
 

Recently uploaded

WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 

Recently uploaded (20)

WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 

Sequence Learning with CTC technique

  • 1. 替代役男 ⺩王俊豪 chuchuhao831@gmail.com Sequence Learning from character based acoustic model to End-to-End ASR system
  • 2. figure from A. Graves, Supervised sequence labelling, ch2 p9 … a stream of input data = a sequence of acoustic feature = a list of MFCC ( Here we use) Learning Algorithm Thee ound oof Error ? Extract Feature Sequence Learning
  • 3. Outline 1. Feedforward Neural Network 2. Recurrent Neural Network 3. Bidirectional Recurrent Neural Network 4. Long Short-Term Memory 5. Connectionist Temporal Classification 6. Build a End-to-End System 7. Experiment on Low-resource Language
  • 5.
  • 7.
  • 9. figure from A. Graves, Supervised sequence labelling, ch3 p24
  • 10. figure from A. Graves, Supervised sequence labelling, ch3 p23
  • 11. Active func G G G Input Gate Output Gate Forget Gate G W Weighted Gate M Multiplyer M M M W W RNNs LSTM W W W W W W W
  • 12. LSTM Forward Pass, Previous Layer Input G G G Input Gate Output Gate Forget Gate M M M W W W W W W W Previous layer
  • 13. LSTM Fordward Pass, Input Gate G G Input Gate Output Gate Forget Gate M M M W W W W W W W Previous layer G
  • 14. LSTM Fordward Pass, Foget Gate G G Input Gate Output Gate Forget Gate M M M W W W W W W W Previous layer G
  • 15. LSTM Forward Pass, Cell G G Input Gate Output Gate Forget Gate M M M W W W W W W W G
  • 16. LSTM Forward Pass, Output Gate G Input Gate Forget Gate M M M W W W W W W W G Previous layer G
  • 17. LSTM Forward Pass, Output G G Input Gate Forget Gate M M M W W W W W W W G
  • 18. G G G Input Gate Output Gate Forget Gate M M M W W W W W W W LSTM Backward Pass, Output Gate
  • 19. RNN LSTM figure from A. Graves, Supervised sequence labelling, p33,35
  • 20. Additional CTC Output Layer (1) Predict label at any timestep Connectionist Temporal Classification (2) Probability of Sentence
  • 21. Predict label at any tilmestep Framewise Acoustic Feature Label A B C D E F G I J K L M N O P Q R S T U V W X Y Z {space} {blank} others like ‘ “ .
  • 22. Probability of Sentence Thh..e S..outtd ooof Th.e S.outd of The Soutd of Target Step 1, remove repeated labels Step 2, remove all blanks paths: labelling: Predict Now, we can do the maximum likelihood training
  • 23. Choose the labelling : Best Path, the most probable path will correspond to most probable labelling Output Decoding Something wrong here. Prefix Search would be better
  • 24. How to connect target label and RNNs Output
  • 25. Sum over paths probability, forward
  • 26. l_s = . s-0 : .h.e.l. s-1 : .h.e.l s-2 : .h.e. ( s-2 blank to blank ) l_s = l s-0 : .h.e.l.l s-1 : .h.e.l. s-2 : .h.e.l ( s-2 same non-blank label ) only allow transition (1) between blank and non-blank label (2) distinct non-blank label
  • 27. Sum over paths probability, backward
  • 28. l_s = . s+0 : .l.l.o. s+1 : l.l.o. s+2 : .l.o. ( s+2 blank to blank ) l_s = l s+0 : l.l.o. s+1 : .l.o. s+2 : l.o. ( s+2 same non-blank label ) only allow transition (1) between blank and non-blank label (2) distinct non-blank label
  • 29. Sum over paths probability D GO Target : dog -> .d.o.g..t=0 D GO .t=1 D GO .t=2 D GO .t=T-2 D GO .t=T-1 D GO .t=T … . .d .d. .d.o .d.o. .d.o.g .d.o.g. s=1 s=2 s=3 s=4 s=5 s=6 s=7
  • 30. Sum over paths probability D GO backward-forward : the probability of all the paths corresponding to l that go through the symbol at time t .t=0 D GO .t=1 D GO .t=t’ D GO .t=T-1 D GO .t=T … …
  • 31. last we derivates w.r.t. unnormalised outputs, get error signal
  • 32. A. Graves. Sequence Transduction with Recurrent Neural Networks Transcription network + Prediction network Previous Labelfeature at time t label prediction at time t next label probability
  • 33. A. Graves. Towards End-to-end Speech Recognition with Recurrent Neural Networks ( Google Deepmind ) Additional CTC Output Layer Additional WER Output Layer Minimise Objective Function -> Minimise Loss Function Spectrogram feature
  • 34. Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition ( Baidu Research) Additional CTC Output Layer Additional WER Output Layer Spectrogram feature Language Model Transcription RNN Output : arther n tickets for the game Decode Target: are there any tickets for the gam Q( L ) = log( P( L | x ) ) + a * log( P_LM( L ) ) + b * word_count( L )
  • 35. Low-resources language experiment Target: ? ka didto sigi ra og tindog Output: h bai to giy ngtndog test 0.44, test 0.71
  • 36. 1. Boulard and Morgan (1994) Connectionist speech recognition A hybrid approach. 2. A. Grave. Supervised Sequence Labelling with Recurrent Neural Networks 3. A. Grave. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Network. 4. A. Graves. Sequence Transduction with Recurrent Neural Networks 5. A. Graves. Towards End-to-end Speech Recognition with Recurrent Neural Networks ( Google Deepmind ) 6. Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition ( Baidu Research) 7. standford CTC: https://github.com/amaas/stanford-ctc 8. R. Pascanu , On the difficulty of training recurrent neural networks 9. L. Besacier. Automatic Speech Recognition for Under-Resourced Languages: A Survey 10.A. Gibiansky. http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural- networks/ 11. LSTM Mathematic formalism ( mandarin ) http://blog.csdn.net/u010754290/article/details/47167979 11. Assumption of Markov Model http://jedlik.phy.bme.hu/~gerjanos/HMM/node5.html 12. Story about ANN-HMM Hybrid ( mandarin ) HMM : http://www.taodocs.com/p-5260781.html 13. Generative model v.s. discriminative model http://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and- discriminative-algorithm Reference
  • 37. Memory Network : A. Grave. Neural Turing Machine Appendix B. future work figure from http://www.slideshare.net/yuzurukato/neural-turing-machines-43179669
  • 38. Appendix A. Generative v.s. discriminative A generative model learns the joint probability p(x,y). A discriminative model learns the conditional probability p(y|x). data input : (1,0), (1,0), (2,0), (2,1) y=0 y=1 x=1 1/2 0 x=2 1/4 1/4 y=0 y=1 x=1 1 0 x=2 1/2 1/2 According to Andrew Y. Ng. On Discriminative vs. Generative classifier: A comparison of logistic regression and naive Bayes The overall gist is that discriminative models generally outperform generative models in classification tasks. Generative Discriminative