Jaesung Bae (bjsd3@kaist.ac.kr)
Combining Time- and Frequency-domain Convolution in Convolutional Neural Network-based Phone Recognition
Paper by László Tóth; slides by Jaesung Bae
Abstract
• Most authors
→Focus on convolution along the frequency axis.
• To gain invariance to speaker and speaking style.
• Others use time-domain convolution
→To cover a longer time-span of input in a hierarchical manner.
• Here
→The two types of network are combined.
• 16.7% error rate on the TIMIT phone recognition task.
• A new record (at the time of publication).
1. Introduction
• CNN
→Processes the input in small localized parts, looking for the presence of relevant local features.
→Pooling
• Makes the features more translation tolerant.
• Effect
→Convolution along the frequency axis
• Effective: gives invariance to speaker and speaking style.
→Convolution along the time axis
• Less effective: too much convolution along time is harmful.
• Negligible benefit reported by Abdel-Hamid et al. and Sainath et al.
• Some teams have nevertheless applied it successfully.
• There, the position information is not discarded during pooling.
• So the convolution is not for shift invariance, but for allowing the network to process a longer time span.
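The translation-tolerance point above can be illustrated with a toy max-pooling example (a minimal sketch, not code from the paper): a small shift of a spectral peak within a pooling window leaves the pooled output unchanged, which is why pooling along frequency helps absorb speaker variation.

```python
import numpy as np

def max_pool_1d(x, size):
    """Non-overlapping max pooling along a 1-D feature vector."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

# A spectral feature with one peak, and a copy shifted by one bin
# (the shift stays inside the same pooling window):
x = np.array([0, 5, 0, 0, 0, 0, 0, 0, 0], dtype=float)
x_shift = np.roll(x, 1)

# With pooling size 3, both inputs yield the same pooled output,
# so the peak's exact bin position is discarded.
print(max_pool_1d(x, 3))        # [5. 0. 0.]
print(max_pool_1d(x_shift, 3))  # [5. 0. 0.]
```

Shifts that cross a window boundary would still change the output, which is why pooling gives tolerance to small shifts only.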
2. Convolutional Neural Networks
• Differences from a standard ANN
→1. Locality
• Trained on a time-frequency representation instead of MFCC features.
→2. Weight sharing.
• 2.1. Convolution along the frequency axis
→Input: 40 mel filterbank channels plus the frame-level energy, along with the corresponding delta and delta-delta parameters.
• The same input representation as in earlier referenced work.
→Max pooling is used.
→The number of filters is varied to cover the whole frequency range of the 40 mel filterbank channels.
→Both full and limited weight sharing are possible.
• Limited weight sharing is used here.
→2 convolutional layers and 4 fully connected layers.
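A minimal numpy sketch of the frequency-axis operation described above, for a single frame and a single filter. This simplifies heavily: it shows full weight sharing over the 40 mel channels only (no energy/delta streams, no limited weight sharing, which the paper actually uses); the pooling size of 3 matches Sec. 4.2.

```python
import numpy as np

def conv1d_freq(spectrum, kernel):
    """Valid-mode 1-D correlation along the frequency axis."""
    k = len(kernel)
    return np.array([spectrum[i:i + k] @ kernel
                     for i in range(len(spectrum) - k + 1)])

def max_pool(x, size):
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

rng = np.random.default_rng(0)
mel = rng.random(40)      # one frame: 40 mel filterbank channels
kernel = rng.random(7)    # one convolutional filter spanning 7 channels
feat = max_pool(conv1d_freq(mel, kernel), 3)   # pooling size 3
print(feat.shape)         # (11,)
```

With limited weight sharing, each frequency band would instead get its own `kernel`, since spectral patterns differ between low and high frequencies.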
2. Convolutional Neural Networks
• 2.2. Convolution along the time axis
→Motivated by hierarchical ANN models.
→A network is first applied along frequency, and a second network then convolves along time on top of it.
→Difference from applying only frequency-domain convolution
• The input blocks are processed by just one layer of neurons in one case, and by a sub-network of several layers in the other.
→Difference from applying only time-domain convolution
• Instead of pooling over a window of size r, several filters are placed at different positions along time.
• The goal is not shift invariance, but enabling the model to hierarchically process a fairly wide range of input without increasing the number of weights.
• Q: Does this amount to a pooling size of 1?
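The weight-sharing-without-pooling idea can be sketched as follows (an illustrative stand-in, not the paper's implementation: the real sub-network has several layers, here collapsed into one shared weight matrix plus ReLU). The block geometry matches Sec. 4.1: 5 blocks of 9 frames with an overlap of 4, i.e. a shift of 5 frames.

```python
import numpy as np

def subnet(block, W):
    """Stand-in for the multi-layer sub-network: one shared layer + ReLU."""
    return np.maximum(block @ W, 0)

def time_conv_layer(frames, W, block_len=9, shift=5, n_blocks=5):
    """Apply the same sub-network at several positions along time and
    concatenate the outputs (shared weights, no pooling over position)."""
    outs = [subnet(frames[i * shift : i * shift + block_len].ravel(), W)
            for i in range(n_blocks)]
    return np.concatenate(outs)

rng = np.random.default_rng(0)
frames = rng.random((29, 41))       # 29 frames of 40 mel channels + energy
W = rng.random((9 * 41, 400))       # shared weights, 400-unit bottleneck output
out = time_conv_layer(frames, W)
print(out.shape)                    # (2000,) = 5 blocks x 400 outputs
```

Because the outputs are concatenated rather than pooled, the upper network still sees *where* along time each feature occurred, matching the slide's point about position information.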
2. Convolutional Neural Networks
• 2.3. Convolution along both time and frequency
→The network shown in Fig. 1a is substituted for the sub-network of Fig. 1b.
→First the sub-network is trained; its output layer is then discarded, and the full network is constructed with randomly initialized weights in the upper layers.
→Only the upper part is trained for 1 iteration.
→Then the whole network is trained until convergence is reached.
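The staged schedule above can be written down as a small orchestration sketch (all function names are placeholders, not from the paper's code; the stand-ins below just record the call order):

```python
def staged_training(train_subnetwork, attach_upper_layers,
                    train_upper_only, train_whole_network):
    """Hypothetical sketch of the paper's three-stage training schedule."""
    sub = train_subnetwork()               # 1. train sub-network, drop its output layer
    net = attach_upper_layers(sub)         # 2. add randomly initialized upper layers
    train_upper_only(net, iterations=1)    # 3. train only the new upper part, 1 iteration
    train_whole_network(net)               # 4. train the whole network to convergence
    return net

# Demonstration with stand-in callables that just record the call order:
log = []
staged_training(lambda: log.append("train sub-network") or "sub",
                lambda sub: log.append("attach upper layers") or "net",
                lambda net, iterations: log.append("train upper part"),
                lambda net: log.append("train whole network"))
print(log)
```

The brief upper-only stage presumably lets the random upper weights adapt before gradients are allowed to disturb the pretrained sub-network.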
3. Experimental Setting
• 10% of the training set is held out as a validation set.
• Evaluation phase
→Label outputs were mapped to the usual set of 39 labels.
• A bigram language model was used.
→LM weight: 1.0, phone insertion penalty: 0.0.
• Trained by semi-batch back-propagation with batch size 100.
• Frame-level cross-entropy cost (not CTC loss).
• Learning rate: 0.001, halved after any iteration in which the validation loss does not decrease.
• Training was halted when the improvement in the error was smaller than 0.1% in two subsequent iterations.
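The learning-rate and stopping rules can be replayed over a list of per-iteration validation errors. This is my reading of the schedule, not the authors' code, so treat the exact tie-breaking as an assumption:

```python
def train_schedule(val_errors, lr0=0.001):
    """Replay the schedule over per-iteration validation errors (%):
    halve the rate after any iteration with no decrease; stop once the
    improvement is below 0.1% in two subsequent iterations."""
    lr, best = lr0, float("inf")
    small_improvements = 0
    for err in val_errors:
        improvement = best - err
        if improvement <= 0:
            lr /= 2                        # error did not decrease: halve
        if improvement < 0.1:
            small_improvements += 1
            if small_improvements == 2:    # two near-flat iterations: halt
                break
        else:
            small_improvements = 0
        best = min(best, err)
    return lr

# One bump at iteration 3 halves the rate; two <0.1% steps end training.
print(train_schedule([20.0, 19.0, 19.2, 18.9, 18.85, 18.83]))  # 0.0005
```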
4. Result and Discussion
• Baseline model: fully connected, 4 hidden layers of 2000 ReLU units (from the reference).
• 4.1. Convolution along time
→The architecture of Fig. 1b was investigated in the authors' earlier study.
→5 input blocks, each covering 9 frames of input context, with an overlap of 4 frames.
→Sub-network: 3 layers of 2000 neurons, plus a bottleneck layer of 400 neurons.
→Upper part of the network: a hidden layer of 2000 neurons.
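The block geometry above implies the total input context, which is worth checking explicitly:

```python
# Time-convolution geometry of Sec. 4.1: 5 input blocks of 9 frames each,
# neighboring blocks overlapping by 4 frames.
block_len, overlap, n_blocks = 9, 4, 5
shift = block_len - overlap                       # 5 frames between block starts
total_context = block_len + shift * (n_blocks - 1)
print(total_context)                              # 29 frames of input context
```

So the hierarchical model sees 29 frames of context while the sub-network weights only ever process 9 frames at a time.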
4. Result and Discussion
• 4.2. Convolution along frequency
→First, find the optimal number of convolutional filters.
• The number of filters was varied from 4 to 8.
• Neighboring filters overlapped by 2-3 mel channels.
→The filter width along time was set to 15 frames.
• To match the input context of the baseline model.
→The pooling size was 3.
→The experiments settled on 7 filters, each 7 channels wide.
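A hypothetical layout check (my own construction, not from the paper) for how 7 filters of width 7 can be spaced so that together they tile all 40 mel channels, with each filter position holding its own weights under limited weight sharing:

```python
import numpy as np

n_filters, width, n_channels = 7, 7, 40
# Spread the filter start positions evenly across the usable range:
starts = np.round(np.linspace(0, n_channels - width, n_filters)).astype(int)
covered = set()
for s in starts:
    covered.update(range(s, s + width))
print(starts.tolist())
assert covered == set(range(n_channels))   # every mel channel is covered
```

With this even spacing the overlaps come out to 1-2 channels; the 2-3 channel overlaps the slide mentions presumably refer to the smaller filter counts in the 4-8 sweep.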
4. Result and Discussion
• 4.2. Convolution along frequency (continued)
→Second, find the optimal pooling size.
• A pooling size of 5 gave the best result.
• Using various pooling sizes within the same model might also be possible.
• 4.3. Convolution along time and frequency
→The same dropout parameters were used for each layer.
→Dropout rate: 0.25.
• Conclusion
→16.7% error rate on the TIMIT dataset.
→Further architectural modifications remain to be explored.
