RNNs for Speech
Faster and smaller RNNs with new regularization techniques.
Good Old RNNs
Cannot train RNNs!!
Gradients get crazy!!
Fishes are better at remembering!!!
I watched Schmidhuber and liked him!!
I don’t care about the baseline, I use what the cool boys use!!
Why so big? Occam will cry!!
My GPU has 4GB!!
I can’t wait months to train!!
X et al. said GRUs are better!!
What else?
I need an RNN-sized model with LSTM performance!!
I need a smaller model or a better smartphone!!
FastGRNN
http://manikvarma.org/pubs/kusupati18.pdf
This reset gate makes no sense!!
May the ReLU be with you!!
I do speech recognition!!
I watched Bengio and liked him!!
LightGRU
https://arxiv.org/abs/1803.10225
I need Regularization!!!
Dropout is not good!!!
AWD-LSTM
https://arxiv.org/abs/1708.02182
FastGRNN
● 2 trainable matrices vs 6 trainable matrices in a GRU layer.
● Low-rank approximation of matrices: W = W1 (W2)^T
● Integer quantization for parameters.
● Piecewise linear approximation of non-linearities.
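The low-rank bullet above can be illustrated with a small sketch. It is not the FastGRNN authors' code: the helper names (`low_rank_matvec`, `param_counts`) and the example sizes (128 x 128, rank 16) are assumptions chosen for illustration. It shows that with W = W1 (W2)^T the product W x can be computed from the two factors without forming W, and how the parameter count changes.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def low_rank_matvec(W1, W2, x):
    """Compute (W1 @ W2.T) @ x without building the full matrix."""
    return matvec(W1, matvec(transpose(W2), x))

def param_counts(n_out, n_in, rank):
    """Parameters of a full matrix vs. its two rank-r factors."""
    return n_out * n_in, rank * (n_out + n_in)

# Tiny hypothetical example: a 3x2 matrix of rank 1.
W1 = [[1.0], [2.0], [3.0]]   # shape (3, 1)
W2 = [[4.0], [5.0]]          # shape (2, 1)
y = low_rank_matvec(W1, W2, [1.0, 1.0])

# Hypothetical layer size: 128x128 weights factored at rank 16.
full, factored = param_counts(128, 128, 16)
```

With these assumed sizes the factored form stores 4096 values instead of 16384; the saving grows as the rank shrinks relative to the layer width.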
FastGRNN vs GRU
Light Gated Recurrent Units
● Remove the reset gate.
● Replace tanh with ReLU
● Batch normalization to reduce ReLU instability.
● Specifically targeting speech recognition.
● Orthogonal weight initialization, Variational dropout
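A minimal sketch of one Light GRU step, following the bullets above: only an update gate, and a ReLU candidate state instead of tanh. The function and weight names (`light_gru_step`, `Wz`, `Uz`, `Wh`, `Uh`) are assumptions for illustration, and batch normalization of the input projections is deliberately left out to keep the sketch short.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def light_gru_step(x, h_prev, Wz, Uz, Wh, Uh):
    """One step with an update gate z and a ReLU candidate (no reset gate).

    Simplification: batch norm on Wz@x and Wh@x is omitted in this sketch.
    """
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    cand = [max(0.0, a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, h_prev))]
    # Interpolate between the previous state and the candidate state.
    return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, cand)]

# With all-zero weights the gate is 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zeros_in = [[0.0], [0.0]]             # shape (2, 1)
zeros_hh = [[0.0, 0.0], [0.0, 0.0]]   # shape (2, 2)
h_new = light_gru_step([1.0], [2.0, -4.0], zeros_in, zeros_hh, zeros_in, zeros_hh)
```

Compared with a standard GRU step, the reset gate and its two weight matrices simply do not appear.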
Redundancy of Reset Gate
Results
40 log-mel filter banks
Maximum likelihood linear regression
All together
GRU FastGRNN LightGRU
ASGD Weight-Dropped LSTM (AWD-LSTM)
● DropConnect
● Averaged SGD
● Embedding Dropout
● Activation Regularization
Weight Dropping
● Apply DropConnect to the hidden-to-hidden connections (all U matrices).
● Prevents overfitting in the recurrent connections.
● No need to modify the optimized RNN implementations in DL frameworks.
● Apply the same dropout mask across the whole sequence.
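The weight-dropping idea above can be sketched on a plain tanh RNN. This is an illustrative sketch, not the AWD-LSTM code: the names `drop_connect` and `run_rnn` are hypothetical, and rescaling the surviving weights by 1/(1-p) is an assumption borrowed from ordinary inverted dropout. The point it shows is that the mask is drawn once on U and the same dropped matrix is reused at every time step.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def drop_connect(U, p, rng):
    """Zero each weight with probability p; rescale survivors by 1/(1-p)."""
    scale = 1.0 / (1.0 - p)
    return [[0.0 if rng.random() < p else u * scale for u in row] for row in U]

def run_rnn(xs, h0, W, U, p, rng):
    """Plain tanh RNN whose hidden-to-hidden matrix is dropped once per sequence."""
    U_dropped = drop_connect(U, p, rng)   # one mask for the whole sequence
    h = h0
    for x in xs:
        pre = zip(matvec(W, x), matvec(U_dropped, h))
        h = [math.tanh(a + b) for a, b in pre]
    return h, U_dropped

rng = random.Random(0)
W = [[0.5], [-0.5]]              # input-to-hidden, shape (2, 1)
U = [[1.0, 1.0], [1.0, 1.0]]     # hidden-to-hidden, shape (2, 2)
h_final, U_dropped = run_rnn([[1.0], [0.5], [-1.0]], [0.0, 0.0], W, U, p=0.5, rng=rng)
```

Because only the weight matrix changes, the recurrence itself is untouched, which is why an optimized RNN kernel can be used as-is.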
Averaged SGD and NT-ASGD
Number of steps to start averaging
Weights optimized per iteration
Weights used as the final model
PyTorch implementation:
https://github.com/pytorch/pytorch/blob/cd9b27231b51633e76e28b6a34002ab83b0660fc/torch/optim/asgd.py
NT-ASGD: start averaging only once the validation metric fails to improve.
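One way to read the NT-ASGD trigger is sketched below. The function names (`nt_asgd_trigger`, `average_weights`) and the patience parameter `n` are assumptions for illustration, not the PyTorch API linked above: averaging starts at the first validation check whose loss is worse than the best loss seen more than `n` checks earlier, and the final model is the mean of the weights collected from that point on.

```python
def nt_asgd_trigger(val_losses, n):
    """Return the index of the check at which averaging would start, or None.

    Triggers when the current loss exceeds the best loss recorded at least
    n checks before the most recent one (a non-monotonic criterion).
    """
    logs = []
    for t, v in enumerate(val_losses):
        if t > n and v > min(logs[: t - n]):
            return t
        logs.append(v)
    return None

def average_weights(snapshots):
    """Element-wise mean of a list of weight vectors."""
    count = len(snapshots)
    return [sum(ws) / count for ws in zip(*snapshots)]

# Hypothetical validation curve: improves, then stalls.
trigger = nt_asgd_trigger([5.0, 4.0, 3.5, 3.6, 3.7, 3.8], n=2)
never = nt_asgd_trigger([5.0, 4.0, 3.0, 2.0], n=2)
final = average_weights([[1.0, 2.0], [3.0, 4.0]])
```

The patience `n` keeps a single noisy validation result from switching the optimizer too early.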
Embedding Dropout
● Apply dropout at the word level: the vectors of randomly selected words are
zeroed out entirely.
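A small sketch of word-level dropout on an embedding table. The name `embedding_dropout` is hypothetical, and rescaling the kept rows by 1/(1-p) is an assumption taken from standard inverted dropout. Whole rows are dropped, so a dropped word disappears everywhere it occurs in the batch.

```python
import random

def embedding_dropout(embedding, p, rng):
    """Zero entire word vectors with probability p; rescale the kept rows."""
    scale = 1.0 / (1.0 - p)
    dropped = []
    for row in embedding:
        if rng.random() < p:
            dropped.append([0.0] * len(row))          # this word is removed
        else:
            dropped.append([v * scale for v in row])  # this word is kept
    return dropped

rng = random.Random(0)
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 words, 2 dimensions
masked = embedding_dropout(table, p=0.5, rng=rng)
```

This differs from ordinary dropout on the embedding output, which would zero individual dimensions rather than whole words.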
Activation Regularization
● Penalize the network for producing large outputs and large changes in
hidden states, both of which can lead to overfitting.
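The two penalties can be sketched as follows. The function names and the coefficients `alpha` and `beta` are assumptions for illustration, as is the use of the mean of squared values as the norm. The first term penalizes large outputs, the second penalizes large differences between consecutive hidden states.

```python
def activation_regularization(outputs, alpha):
    """alpha times the mean squared output over all time steps."""
    flat = [v for h in outputs for v in h]
    return alpha * sum(v * v for v in flat) / len(flat)

def temporal_activation_regularization(hiddens, beta):
    """beta times the mean squared change between consecutive hidden states."""
    diffs = [b - a
             for h_now, h_next in zip(hiddens, hiddens[1:])
             for a, b in zip(h_now, h_next)]
    return beta * sum(d * d for d in diffs) / len(diffs)

# Three time steps of a 2-unit hidden state (hypothetical values).
hiddens = [[0.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
ar = activation_regularization(hiddens, alpha=2.0)
tar = temporal_activation_regularization(hiddens, beta=1.0)
```

Both terms would be added to the training loss alongside the usual cross-entropy.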
Results
