1/27
SP Study
Paper Reading
NAIST, AHC-Lab, SP-GROUP
Mori Takuma (M1)
10/18/2016, © 2016 Takuma Mori, AHC-Lab, IS, NAIST
2/27
1. Towards End-to-End Speech
Recognition with Deep
Convolutional Neural
Networks
2. Deep Speech 2: End-to-End
Speech Recognition in
English and Mandarin
Outline
3/27
1. Towards End-to-End Speech
Recognition with Deep
Convolutional Neural
Networks
2. Deep Speech 2: End-to-End
Speech Recognition in
English and Mandarin
Outline
4/27
Towards End-to-End Speech
Recognition with Deep
Convolutional Neural
Networks
Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang,
César Laurent, Yoshua Bengio, Aaron Courville
Université de Montréal, INTERSPEECH 2016
5/27
Background
• Convolutional Neural Networks (CNNs) are effective
models for reducing spectral variations and
modeling spectral correlations in acoustic
features for automatic speech recognition.
• Connectionist Temporal Classification (CTC) with
Recurrent Neural Networks (RNNs), which is
proposed for labeling unsegmented sequences, makes
it feasible to train an ‘end-to-end’ speech
recognition system instead of hybrid settings.
Problem
• RNNs are computationally expensive and sometimes
difficult to train.
Solution
• We propose an end-to-end speech framework for
sequence labeling, by combining hierarchical CNNs
with CTC directly without recurrent connections.
Overview
6/27
• Log-mel-filter-bank to phoneme
• Input: 40-dimensional log-mel-filter-bank coefficients (plus an energy term) with deltas and delta-deltas
• Our CNN acoustic model (whose architecture differs from the one shown above): the complete CNN consists of stacked convolutional and pooling layers, at the top of which are multiple fully-connected layers.
Proposed Method
CTC
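As a concrete illustration of this input pipeline, here is a minimal sketch of assembling 40-dimensional log-mel filterbank features with an energy term, deltas, and delta-deltas. The use of librosa, the 25 ms/10 ms window and hop sizes, and the file name are my assumptions for illustration; the paper does not prescribe a specific toolkit.

```python
# Minimal sketch of the input features described above (not the authors' code):
# 40 log-mel coefficients + energy, with deltas and delta-deltas (123 dims total).
# "utt.wav" is a placeholder path.
import librosa
import numpy as np

y, sr = librosa.load("utt.wav", sr=16000)

# 40 log-mel filterbank coefficients per frame (25 ms window, 10 ms shift).
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
log_mel = librosa.power_to_db(mel)                        # (40, T)

# Energy term: log frame energy appended as an extra coefficient.
energy = np.log(librosa.feature.rms(
    y=y, frame_length=400, hop_length=160) + 1e-10)       # (1, T)

static = np.vstack([log_mel, energy])                     # (41, T)
feats = np.vstack([static,
                   librosa.feature.delta(static, order=1),
                   librosa.feature.delta(static, order=2)])  # (123, T)
print(feats.shape)
```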
7/27
Convolution
8/27
• Sequence of acoustic feature values: $X \in \mathbb{R}^{c \times b \times f}$, with channels $c$, frequency bandwidth $b$, and time length $f$.
• A convolutional layer convolves $X$ with $k$ filters $\{W_i\}_{i=1}^{k}$, where $W_i \in \mathbb{R}^{c \times m \times n}$ ($m$ along the frequency axis, $n$ along the frame axis).
• The resulting $k$ pre-activation feature maps form a 3D tensor $H \in \mathbb{R}^{k \times b_H \times f_H}$.
• Each $H_i$ is computed as $H_i = W_i \ast X + b_i$, $i = 1, \ldots, k$.
Convolution
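A minimal PyTorch sketch of the convolution above. The tensor sizes (c = 3 channels for static/delta/delta-delta, b = 41 frequency bins, and the filter shape) are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of H_i = W_i * X + b_i with torch (shapes follow the slide's notation).
import torch
import torch.nn.functional as F

c, b, f = 3, 41, 100       # channels, frequency bandwidth, time length
k, m, n = 64, 8, 5         # number of filters, filter height (freq), filter width (time)

X = torch.randn(1, c, b, f)           # one utterance, with a batch dimension added
W = torch.randn(k, c, m, n)           # k filters W_i
bias = torch.randn(k)                 # b_i

# conv2d actually cross-correlates, which only flips the filters
# and does not change the model class.
H = F.conv2d(X, W, bias)              # (1, k, b_H, f_H) = (1, 64, 34, 96)
print(H.shape)
```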
9/27
• We take the number of piece-wise linear functions (maxout pieces) to be 2 in this example.
Maxout
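A small sketch of a maxout activation with 2 pieces, matching the example above: each maxout unit outputs the elementwise maximum of its p linear feature maps. The channel layout and sizes are assumptions for illustration.

```python
# Maxout over p pieces: (batch, pieces * units, ...) -> (batch, units, ...).
import torch

def maxout(pre_activations: torch.Tensor, pieces: int = 2) -> torch.Tensor:
    b, ch = pre_activations.shape[:2]
    rest = pre_activations.shape[2:]
    grouped = pre_activations.view(b, ch // pieces, pieces, *rest)
    return grouped.max(dim=2).values

h = torch.randn(1, 128, 34, 96)    # e.g. 64 maxout units x 2 pieces
print(maxout(h, pieces=2).shape)   # torch.Size([1, 64, 34, 96])
```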
10/27
• With $\tilde{H}_i$ before pooling and $\hat{H}_i$ after pooling:
  $[\hat{H}_i]_{r,t} = \max_{j=1,\ldots,p} \, [\tilde{H}_i]_{r \times s + j,\, t}$
• Here $s$ is the step size and $p$ the pooling size; all the $[\tilde{H}_i]_{r \times s + j,\, t}$ values inside the max share the same time index $t$.
Pooling
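The pooling formula above corresponds to max pooling along the frequency axis only. A minimal PyTorch sketch follows; the pool size p and step s are illustrative choices, not the paper's values.

```python
# Max over p consecutive frequency bins with step s, independently at each time index t.
import torch
import torch.nn.functional as F

p, s = 3, 3                           # pooling size and step (illustrative)
H_tilde = torch.randn(1, 64, 34, 96)  # (batch, k, frequency, time)

# Pool along frequency only; the time axis keeps kernel/stride 1,
# so every value inside one max shares the same time index t.
H_hat = F.max_pool2d(H_tilde, kernel_size=(p, 1), stride=(s, 1))
print(H_hat.shape)                    # torch.Size([1, 64, 11, 96])
```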
11/27
Connectionist Temporal Classification
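For reference, a hedged sketch of how a CTC objective is typically wired up using PyTorch's built-in CTCLoss (not the authors' implementation); the 61-phoneme-plus-blank output size is an assumption based on TIMIT.

```python
# Per-frame class posteriors over phonemes plus a blank are aligned to an
# unsegmented label sequence by summing over all valid alignments.
import torch
import torch.nn as nn

T, N, C = 96, 1, 62          # frames, batch, classes (61 phonemes + blank), illustrative
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=-1)
targets = torch.randint(low=1, high=C, size=(N, 20))     # 20 phoneme labels
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)    # class 0 reserved for the blank symbol
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()              # gradients flow back into the acoustic model
```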
12/27
• Data: TIMIT corpus (462-speaker training set, 50-speaker development set)
• Our model achieves an 18.2% phoneme error rate on the core test set, slightly better than both the LSTM baseline model and the transducer model with an explicit RNN language model.
Experimental Result
13/27
1. Towards End-to-End Speech
Recognition with Deep
Convolutional Neural
Networks
2. Deep Speech 2: End-to-End
Speech Recognition in
English and Mandarin
Outline
14/27
Deep Speech 2: End-to-End
Speech Recognition in
English and Mandarin
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared
Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam
Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan,
Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick
LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair,
Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun,
Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani
Yogatama, Jun Zhan, Zhenyao Zhu
Baidu Research – Silicon Valley AI Lab, ICML 2016
15/27
Background
• This "end-to-end" vision of training simplifies the training process.
Contributions
• We show that an end-to-end deep learning approach
can be used to recognize either English or
Mandarin Chinese speech, two vastly different
languages.
• In several cases, our system is competitive with
the transcription of human workers when
benchmarked on standard datasets.
• Finally, we show that our system can be
inexpensively deployed in an online setting,
delivering low latency when serving users at
scale.
Overview
16/27
• Spectrogram to Text
• Input: sequence of log-spectrograms of power-normalized audio clips (20 ms windows)
• At each output time-step $t$, the RNN makes a prediction $p(l_t \mid x)$, where $l_t$ is either a character in the alphabet or the blank symbol.
• At inference time, CTC models are paired with a language model trained on a bigger corpus of text, and decoding maximizes
  $Q(y) = \log(p_{\mathrm{RNN}}(y \mid x)) + \alpha \log(p_{\mathrm{LM}}(y)) + \beta\,\mathrm{wc}(y)$  (1)
  where $\mathrm{wc}(y)$ is the word (or character) count of the transcription $y$.
Proposed Method
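A minimal sketch of how Eq. (1) can be used to rescore beam-search candidates. The language-model interface, the dummy LM, and the alpha/beta values are placeholders, not the paper's tuned settings.

```python
# Combine the CTC model's score with an external LM score and a word-count bonus.
from typing import Callable

def q_score(log_p_rnn: float,
            y: str,
            lm_logprob: Callable[[str], float],
            alpha: float = 1.0,
            beta: float = 1.0) -> float:
    """Q(y) = log p_RNN(y|x) + alpha * log p_LM(y) + beta * wc(y)."""
    word_count = len(y.split())
    return log_p_rnn + alpha * lm_logprob(y) + beta * word_count

# Toy usage: rank two beam-search candidates with a dummy "LM" that prefers "sat".
dummy_lm = lambda y: 0.0 if y.endswith("sat") else -5.0
cands = {"the cat sat": -12.3, "the cat sad": -12.1}
best = max(cands, key=lambda y: q_score(cands[y], y, dummy_lm, alpha=2.0, beta=0.4))
print(best)   # the language model overrides the small acoustic-score difference
```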
17/27
• Recent research has shown that BatchNorm (Batch Normalization) can speed up the convergence of RNN training, though it does not always improve generalization error.
• A recurrent layer is implemented as
  $h_t^l = f(W^l h_t^{l-1} + U^l h_{t-1}^l)$
• Two ways of applying BatchNorm ($B$: the BatchNorm transformation):
  1. $h_t^l = f(B(W^l h_t^{l-1} + U^l h_{t-1}^l))$
  2. $h_t^l = f(B(W^l h_t^{l-1}) + U^l h_{t-1}^l)$
• 1 (normalizing the whole pre-activation) is not effective.
• 2 (normalizing only the input-to-hidden term) works well.
Batch Normalization for Deep RNNs
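A toy sketch of option 2: BatchNorm is applied only to the input-to-hidden term, sequence-wise (statistics over all time steps and items in the minibatch). This is an illustrative re-implementation, not Baidu's code; the sizes and the plain ReLU nonlinearity are assumptions.

```python
# h_t = f(B(W x_t) + U h_{t-1}); the recurrent term is left unnormalized.
import torch
import torch.nn as nn

class BNRecurrentLayer(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size, bias=False)
        self.U = nn.Linear(hidden_size, hidden_size, bias=False)
        self.bn = nn.BatchNorm1d(hidden_size)   # statistics over batch and time
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (time, batch, input_size)
        T, B, _ = x.shape
        # Sequence-wise normalization: normalize W x over all (time, batch) items at once.
        wx = self.bn(self.W(x).reshape(T * B, -1)).reshape(T, B, -1)
        h = torch.zeros(B, wx.size(-1))
        outputs = []
        for t in range(T):
            h = self.act(wx[t] + self.U(h))
            outputs.append(h)
        return torch.stack(outputs)              # (time, batch, hidden)

layer = BNRecurrentLayer(input_size=161, hidden_size=256)
print(layer(torch.randn(50, 4, 161)).shape)      # torch.Size([50, 4, 256])
```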
18/27
Batch Normalization for Deep RNNs
19/27
• Even with Batch Normalization, we find training with CTC to be occasionally unstable, particularly in the early stages.
• We use the length of the utterance as a heuristic for difficulty.
• In the first training epoch, we iterate through minibatches in the training set in increasing order of the length of the longest utterance in the minibatch. After the first epoch, training reverts to a random order over minibatches.
SortaGrad
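A small sketch of this curriculum (SortaGrad) as a batch-ordering policy. The batching scheme, lengths, and batch size are placeholders for a real data pipeline.

```python
# Epoch 0: minibatches in increasing order of their longest utterance; later epochs: random.
import random

def minibatch_order(utterance_lengths, batch_size, epoch, seed=0):
    idx = list(range(len(utterance_lengths)))
    # Group indices into minibatches (here: consecutive indices, for simplicity).
    batches = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
    if epoch == 0:
        # First epoch: sort minibatches by the longest utterance they contain.
        batches.sort(key=lambda b: max(utterance_lengths[i] for i in b))
    else:
        random.Random(seed + epoch).shuffle(batches)
    return batches

lengths = [320, 1500, 90, 700, 410, 1210, 260, 980]   # frames per utterance (made up)
print(minibatch_order(lengths, batch_size=2, epoch=0))
print(minibatch_order(lengths, batch_size=2, epoch=1))
```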
20/27
• GRU and LSTM reach similar accuracy
• GRUs are faster to train and less likely to
diverge.
• GRU architecture achieves better WER for all
network depths.
Comparison of vanilla RNNs and GRUs
21/27
• Convolution in frequency attempts to model spectral variance due to speaker variability more concisely than is possible with large fully connected networks.
• Convolutions are evaluated both in the time-and-frequency domain (2D) and in the time-only domain (1D).
Frequency Convolutions
22/27
• Bidirectional RNN models cannot stream the
transcription process as the utterance arrives
from the user.
• The lookahead convolution layer learns weights to linearly combine each neuron's activations T timesteps into the future, which allows us to control the amount of future context needed.
Lookahead Convolution and Unidirectional Models
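A minimal sketch of a lookahead convolution implemented as a per-channel (depthwise) 1-D convolution over future frames; the lookahead T, layer width, and exact placement in the network are illustrative assumptions.

```python
# Each channel linearly combines its own activations over the next T timesteps,
# giving a unidirectional model a small, fixed amount of future context.
import torch
import torch.nn.functional as F

def lookahead(h: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """h: (batch, channels, time); weights: (channels, T) learned per channel."""
    C, T = weights.shape
    # Pad only the future side so output[t] depends on h[t .. t+T-1].
    h_padded = F.pad(h, (0, T - 1))
    # Depthwise 1-D convolution: one filter per channel, no mixing across channels.
    return F.conv1d(h_padded, weights.unsqueeze(1), groups=C)

h = torch.randn(2, 256, 100)     # activations of a unidirectional RNN layer
w = torch.randn(256, 5)          # T = 5 future steps per neuron
print(lookahead(h, w).shape)     # torch.Size([2, 256, 100])
```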
23/27
• The only architectural changes we make to our
networks are due to the characteristics of the
Chinese character set.
• The network outputs probabilities for about 6000
characters, which includes the Roman alphabet,
since hybrid Chinese-English transcripts are
common.
• We use a character level language model in
Mandarin as words are not usually segmented in
text.
Adaptation to Mandarin
24/27
• Optimizer: Synchronous SGD.
• Our training distributes work over multiple GPUs in a data-parallel fashion with synchronous SGD.
• Each GPU uses a local copy of the model to work on
a portion of the current minibatch and then
exchanges computed gradients with all other GPUs.
• It is reproducible, which facilitates discovering
and fixing regressions.
System Optimizations
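A minimal sketch of one synchronous data-parallel update using torch.distributed: each worker computes local gradients, all-reduces them, and applies the same averaged update. The process-group setup shown is a single-process gloo demo, not the paper's multi-GPU configuration.

```python
# Synchronous SGD step: average gradients across workers, then step identically.
import os
import torch
import torch.distributed as dist

def synchronous_sgd_step(model, loss, optimizer):
    optimizer.zero_grad()
    loss.backward()                                  # local gradients on this worker
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # All-reduce sums gradients across workers; divide to get the average.
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()                                 # identical update on every worker

if __name__ == "__main__":
    # Single-process demo (world size 1) with the gloo backend, to show the call sequence.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = model(torch.randn(4, 10)).pow(2).mean()
    synchronous_sgd_step(model, loss, opt)
    dist.destroy_process_group()
```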
25/27
• Training data (English): 11,940 hours of labeled speech containing 8 million utterances.
Experimental Result
26/27
• Training data (Mandarin): 9,400 hours of labeled speech containing 11 million utterances.
• Development data: 2,000 utterances, plus a test set of 1,882 examples of noisy speech.
Experimental Result
27/27
End Slide.