Jaesung Bae(bjsd3@kaist.ac.kr)
Combining Time- and Frequency-Domain Convolution in Convolutional Neural Network-Based Phone Recognition
Laszlo Toth
ppt: Jaesung Bae
Abstract
• Most authors
→Focus on convolution along the frequency axis
• To gain invariance to speaker and speaking style.
→Others use time-domain convolution
• To cover a longer time span of input in a hierarchical manner.
• Here
→The two networks are combined.
• 16.7% error rate on the TIMIT phone recognition task.
• A new record.
1. Introduction
• CNN
→Processes the input in small localized parts, looking for the presence of relevant local features.
→By pooling
• The network becomes more translation tolerant.
• Effect
→Convolution along the frequency axis
• Works well: gives invariance to speaker and speaking style.
→Convolution along the time axis
• Not that effective. Too much convolution along time is harmful.
• Negligible benefit (Abdel-Hamid et al. and Sainath et al.).
• Some teams have applied it successfully.
• During pooling, the position information is not discarded.
• So convolution here is not for shift invariance, but for processing a longer time span.
2. Convolutional Neural Networks
• Differences from a standard ANN
→1. Locality
• Trained on a time-frequency representation instead of MFCC features
→2. Weight sharing
• 2.1. Convolution along the frequency axis
→Input: 40 mel filterbank channels plus the frame-level energy, along with the corresponding delta and delta-delta parameters.
• Reference used in other work
→Uses max pooling
→The number of filters used to cover the whole frequency range of the 40 mel filterbank channels is varied.
→Full weight sharing or limited weight sharing is possible
• Here, limited weight sharing is used.
→2 convolutional layers and 4 fully connected layers.
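A minimal NumPy sketch of frequency-axis convolution with limited weight sharing and max pooling, as outlined in 2.1. This is an illustration under stated assumptions, not the paper's implementation: the function name `freq_conv_lws`, the even spacing of band start positions, and the ReLU activation are all assumptions made for the example.

```python
import numpy as np

def freq_conv_lws(x, weights, biases, pool):
    """Frequency-axis convolution with limited weight sharing (sketch).

    x:       (n_mel, n_ctx) patch of filterbank features (channels x frames)
    weights: (n_bands, n_units, width, n_ctx), one filter set per band
    biases:  (n_bands, n_units)
    pool:    number of neighbouring frequency shifts that are max-pooled
    Returns an array of shape (n_bands, n_units).
    """
    n_mel, n_ctx = x.shape
    n_bands, n_units, width, _ = weights.shape
    span = width + pool - 1  # mel channels touched by one band
    # Assumption: band start positions are spread evenly over the channels.
    starts = np.linspace(0, n_mel - span, n_bands).round().astype(int)
    out = np.empty((n_bands, n_units))
    for b, s in enumerate(starts):
        # The same filters are reused only at the `pool` shifts of this band.
        acts = np.stack([
            np.tensordot(weights[b], x[s + k : s + k + width],
                         axes=([1, 2], [0, 1])) + biases[b]
            for k in range(pool)
        ])  # shape (pool, n_units)
        # ReLU (assumed activation), then max pooling over the shifts.
        out[b] = np.maximum(acts, 0.0).max(axis=0)
    return out
```

Because weights are shared only inside each band's pooling region, every band can learn filters specific to its part of the spectrum, while the max over neighbouring shifts gives tolerance to small frequency shifts.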
2. Convolutional Neural Networks
• 2.2. Convolution along the time axis.
→Motivated by hierarchical ANN models.
→A network is first applied over frequency, and a network that convolves over time is then placed on top of it.
→This paper's difference from applying only frequency-domain convolution
• Input blocks are processed by just one layer of neurons in one case, and by a sub-network of several layers in the other.
→This paper's difference from applying only time-domain convolution
• Instead of pooling with size r, several filters are placed at different positions along time.
• The goal is not shift invariance, but to enable the model to hierarchically process a fairly wide range of input without increasing the number of weights.
• Q. Is the pooling size 1?
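The hierarchical time-axis scheme in 2.2 can be sketched as follows. This is a hedged illustration, not the paper's code: `time_conv_forward`, the ReLU activation, and the list-of-`(W, b)` layer format are assumptions, and the final output layer is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def time_conv_forward(frames, sub_layers, upper_layers, block_len, hop, n_blocks):
    """Apply one shared sub-network to several blocks along time (sketch).

    frames:       (n_frames, n_feat) input context window
    sub_layers:   list of (W, b) pairs, shared by every block
    upper_layers: list of (W, b) pairs applied to the concatenated outputs
    """
    block_outputs = []
    for i in range(n_blocks):
        # Flatten the i-th block of consecutive frames into one vector.
        h = frames[i * hop : i * hop + block_len].reshape(-1)
        for W, b in sub_layers:  # same weights for every block
            h = relu(W @ h + b)
        block_outputs.append(h)
    # Outputs are concatenated rather than pooled, so the position of
    # each block along time is preserved for the upper layers.
    h = np.concatenate(block_outputs)
    for W, b in upper_layers:
        h = relu(W @ h + b)
    return h
```

Reusing the sub-network weights across blocks lets the model see a wider time span without adding weights in the lower layers; only the first upper layer grows with the number of blocks.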
2. Convolutional Neural Networks
• 2.3. Convolution along both time and frequency
→The network shown in Fig. 1a is substituted for the sub-network in Fig. 1b.
→First, the sub-network is trained; then its output layer is discarded and the full network is constructed with randomly initialized weights in the upper layers.
→Only the upper part is trained for 1 iteration.
→Then the whole network is trained until convergence is reached.
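The staged training recipe above can be written as a small driver loop. This is only a sketch of the ordering of the stages: `train_epoch` and `has_converged` are hypothetical callables supplied by the caller, and the part names `"subnet"` and `"upper"` are labels invented for this example.

```python
from typing import Callable, List, Tuple

def run_schedule(train_epoch: Callable[[Tuple[str, ...]], float],
                 has_converged: Callable[[List[float]], bool],
                 max_epochs: int = 50) -> List[Tuple[str, ...]]:
    """Return the sequence of parameter groups updated in each iteration.

    Assumes the sub-network has already been trained on its own and its
    output layer has been replaced by randomly initialized upper layers.
    train_epoch(parts) runs one iteration updating only `parts` and
    returns the validation error (hypothetical interface).
    """
    log = []
    # One iteration in which only the new upper layers are updated.
    errors = [train_epoch(("upper",))]
    log.append(("upper",))
    # Then the whole network is trained until the stopping rule fires.
    for _ in range(max_epochs):
        errors.append(train_epoch(("subnet", "upper")))
        log.append(("subnet", "upper"))
        if has_converged(errors):
            break
    return log
```

Training the upper layers alone first presumably keeps the random upper weights from disturbing the pre-trained sub-network in the very first updates.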
3. Experimental Setting
• 10% of the training set is held out as the validation set.
• Evaluation phase
→Output labels were mapped to the usual set of 39 labels.
• A bigram language model was used.
→LM weight: 1.0, phone insertion penalty: 0.0
• Trained by semi-batch back-propagation with a batch size of 100.
• Frame-level cross-entropy cost (not CTC loss).
• Learning rate: 0.001, halved after each iteration in which the validation loss does not decrease.
• Training was halted when the improvement in the error was smaller than 0.1% in two subsequent iterations.
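The learning-rate and stopping rules above can be replayed on a list of validation errors. This is a sketch under assumptions: the function name `replay_schedule` is invented here, and the same validation error (in %) is used for both the halving rule and the stopping rule, which the slides do not state explicitly.

```python
def replay_schedule(val_errors, lr=0.001, min_gain=0.1, patience=2):
    """Replay the learning-rate and stopping rules on recorded errors.

    val_errors: validation error (in %) measured after each iteration.
    Returns (stop_iteration, lrs), where lrs[i] is the learning rate
    in effect after iteration i has been evaluated.
    """
    lrs = []
    stalled = 0   # consecutive iterations with improvement < min_gain
    prev = None
    for it, err in enumerate(val_errors):
        if prev is not None:
            gain = prev - err
            if gain <= 0:
                lr /= 2  # no decrease on validation data: halve the rate
            stalled = stalled + 1 if gain < min_gain else 0
        lrs.append(lr)
        prev = err
        if stalled >= patience:
            return it, lrs  # halt training here
    return len(val_errors) - 1, lrs

stop_it, lrs = replay_schedule([30.0, 25.0, 25.5, 24.0, 23.95, 23.9, 20.0])
```

In this made-up error sequence the rate is halved once (after the error rises from 25.0 to 25.5), and training halts after two consecutive improvements below 0.1.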
4. Result and Discussion
• Baseline model: fully connected network with 4 hidden layers of 2000 ReLU units (reference).
• 4.1. Convolution along time.
→The architecture of Fig. 1b was investigated in the author's earlier study.
→5 input blocks, covering 9 frames of input context with an overlap of 4 frames.
→Sub-network: 3 layers of 2000 neurons. Bottleneck layer of 400 neurons.
→Upper part of the network: hidden layer of 2000 neurons.
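A quick way to check the block layout is to compute the frame spans. This sketch assumes that each of the 5 blocks covers 9 frames and that neighbouring blocks share 4 frames; the helper `block_spans` is a name introduced for this example.

```python
def block_spans(n_blocks, block_len, overlap):
    """Return (start, end) frame indices of each block, end exclusive."""
    hop = block_len - overlap  # shift between neighbouring blocks
    return [(i * hop, i * hop + block_len) for i in range(n_blocks)]

spans = block_spans(n_blocks=5, block_len=9, overlap=4)
total_context = spans[-1][1] - spans[0][0]
print(spans)          # [(0, 9), (5, 14), (10, 19), (15, 24), (20, 29)]
print(total_context)  # 29
```

Under that reading, the five overlapping blocks together span 29 frames of input context.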
4. Result and Discussion
• 4.2. Convolution along frequency.
→First, an attempt to find the optimal number of convolutional filters.
• Number of filters: 4 to 8
• Neighboring filters overlapped by 2-3 mel channels.
→Filter width was set to 15 frames.
• To match the baseline model.
→Pooling size was 3.
→Based on the experiments, 7 filters with width 7 were used.
4. Result and Discussion
• 4.2. Convolution along frequency.
→Second, to find the optimal pooling size.
• 5 gave the best result.
• Using various pooling sizes in the same model may also be possible.
• 4.3. Convolution along time and frequency.
→Same dropout parameters for each layer.
→Dropout rate: 0.25.
• Conclusion
→16.7% error rate on the TIMIT dataset.
→Further experiments with modifications are needed.
