5. Engineering
OVERVIEW OF THE ASR

Ŵ = argmax_W p(X|W) p(W)

- X (features) = x1, x2, ..., xk
- W (words) = w1, w2, ..., wn
- p(X|W): the acoustic model (AM); p(W): the language model (LM)
- AM and LM are built in advance; decoding runs online and produces the recognition result
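The argmax above can be illustrated with a toy rescoring loop. The candidate sentences and the AM/LM probabilities below are made up for the example, not taken from the slides:

```python
import math

# Toy illustration of W* = argmax_W p(X|W) p(W).
# Hypothetical candidates with made-up p(X|W) ("am") and p(W) ("lm") values.
candidates = {
    "it is rainy today": {"am": 0.020, "lm": 0.30},
    "it is rain today":  {"am": 0.025, "lm": 0.05},
}

def decode(candidates):
    # Work in log space: log p(X|W) + log p(W), then take the argmax.
    return max(candidates,
               key=lambda w: math.log(candidates[w]["am"])
                           + math.log(candidates[w]["lm"]))

print(decode(candidates))  # "it is rainy today"
```

A real decoder searches a huge graph instead of enumerating candidates, but the scoring rule is the same.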
6. Engineering
FLOW OF THE ASR PROCESS

Speech
  → Extract features → features (FBank, MFCC, etc.)
  → Features to phones → phone sequence
  → Phone sequence to words → words
  → Words to sentence → Text: 今日は雨です ("It is rainy today")
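The first step of the flow, feature extraction, can be sketched as a minimal log-mel filterbank (FBank) computation. The frame sizes and mel-scale formulas below are typical defaults (25 ms window, 10 ms hop at 16 kHz), assumed for illustration rather than taken from the slides:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def fbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_mels=40):
    # Slice the waveform into overlapping frames and window each one.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2  # per-frame power spectrum

    # Build triangular mel filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)

    # Log-mel energies, shape (n_frames, n_mels).
    return np.log(power @ fb.T + 1e-10)

# Example: one second of random "audio" -> a (98, 40) feature matrix.
feats = fbank(np.random.randn(16000))
print(feats.shape)
```

MFCCs would add a DCT on top of these log-mel energies; the framing and filterbank stages are shared.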
Components used by the decoder:
- AM (DNN), stored as *.nnet
- LM (HCLG.fst), which uses:
  - HMM (HC.fst)
  - Lexicon (L.fst)
  - Grammar (G.fst)
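The HC, L, and G transducers are combined by WFST composition. The sketch below is a toy, unweighted composition in plain Python to show the idea; the dict-based transducer format, the naive epsilon handling, and the tiny L/G examples are all hypothetical, not Kaldi's or OpenFst's actual representation:

```python
# Transducer format (hypothetical): {"start": state, "final": {states},
#   "arcs": {state: [(in_label, out_label, next_state), ...]}}; "" is epsilon.

def compose(t1, t2):
    """Compose t1 with t2: t1's output labels must match t2's input labels."""
    start = (t1["start"], t2["start"])
    arcs, final = {}, set()
    stack, seen = [start], {start}

    def add(src, arc):
        arcs.setdefault(src, []).append(arc)
        if arc[2] not in seen:
            seen.add(arc[2])
            stack.append(arc[2])

    while stack:
        s = stack.pop()
        s1, s2 = s
        if s1 in t1["final"] and s2 in t2["final"]:
            final.add(s)
        for i1, o1, n1 in t1["arcs"].get(s1, []):
            if o1 == "":                    # epsilon output: t1 advances alone
                add(s, (i1, "", (n1, s2)))
            else:
                for i2, o2, n2 in t2["arcs"].get(s2, []):
                    if i2 == o1:            # matching middle label
                        add(s, (i1, o2, (n1, n2)))
    return {"start": start, "final": final, "arcs": arcs}

def transduce(t, inputs):
    """Follow the arcs (deterministic in this toy example) for the inputs."""
    state, out = t["start"], []
    for sym in inputs:
        (i, o, nxt), = [a for a in t["arcs"].get(state, []) if a[0] == sym]
        if o:
            out.append(o)
        state = nxt
    assert state in t["final"]
    return out

# L-like transducer: phone sequence "a m e" -> word "ame" (雨)
L = {"start": 0, "final": {3},
     "arcs": {0: [("a", "", 1)], 1: [("m", "", 2)], 2: [("e", "ame", 3)]}}
# G-like acceptor: allows the single-word sentence "ame"
G = {"start": 0, "final": {1}, "arcs": {0: [("ame", "ame", 1)]}}

LG = compose(L, G)
print(transduce(LG, ["a", "m", "e"]))  # ['ame']
```

The real HCLG composition additionally carries weights and needs proper epsilon filtering and determinization, which OpenFst provides.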
Topics:
- Training the AM on Hadoop and GPUs
- Training the LM on Hadoop
- Developing the decoder
- Developing the server side of the ASR
9. Engineering
DEVELOPING THE AM

Pre-process (G2P, etc.): data → features and transcripts

On Hadoop (MapReduce; data on the order of tens of millions; takes a couple of days):
- ML training of mono-phone models
- ML training of tri-phone models
- Forced alignment → inferred alignments (features and tri-phones)

On GPUs (takes a couple of weeks):
- Training a neural network (NN) on the features and the tri-phone alignments
- Result: the NN-based AM

Tri-phone example for the mono-phone sequence "a m e":
  sil-a+m  a-m+e  m-e+sil
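The tri-phone expansion shown above can be sketched in a few lines; the helper name and the silence-padding convention are assumptions for illustration:

```python
def to_triphones(phones, sil="sil"):
    """Expand a mono-phone sequence into left-right context tri-phones,
    padding the utterance boundaries with silence."""
    padded = [sil] + phones + [sil]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(to_triphones(["a", "m", "e"]))  # ['sil-a+m', 'a-m+e', 'm-e+sil']
```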
11. Engineering
DEVELOPING THE LM

Pre-process (filtering, G2P, etc.): transcripts (corpora) on the order of tens of millions

On Hadoop (takes about a day):
- Counting words
- Building the n-gram model from the transcripts

On a CPU (takes about a day and a couple of hundred GB of memory; Hadoop can't be used):
- Building the WFST from the lexicon and the n-gram model
- Result: the WFST-based LM (a fixed-probability WFST)
- Lots of processes, but not suitable for distributed processing because of the graph structure
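The Hadoop side of LM training (counting words, building the n-gram counts) fits MapReduce naturally. Below is a toy map/reduce sketch over a hypothetical three-sentence corpus; the real input is tens of millions of sentences:

```python
from collections import Counter
from itertools import chain

# Hypothetical mini-corpus standing in for the real transcripts.
transcripts = [
    "today is rainy",
    "today is sunny",
    "it is rainy today",
]

def mapper(sentence, n=2):
    """Map step: emit (n-gram, 1) pairs, with <s>/</s> sentence boundaries."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return [(tuple(words[i:i + n]), 1) for i in range(len(words) - n + 1)]

def reducer(pairs):
    """Reduce step: sum the counts per n-gram key."""
    counts = Counter()
    for key, c in pairs:
        counts[key] += c
    return counts

bigrams = reducer(chain.from_iterable(mapper(s) for s in transcripts))
unigrams = reducer(chain.from_iterable(mapper(s, n=1) for s in transcripts))

# Maximum-likelihood bigram estimate p(rainy | is) = count(is rainy) / count(is)
p = bigrams[("is", "rainy")] / unigrams[("is",)]
print(round(p, 3))  # 0.667
```

Counting shards and merges this way across machines; only the later WFST-building stage is stuck on a single large-memory CPU node because of the graph structure.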