Efficient Lattice Rescoring
using Recurrent Neural
Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
Proc. of ICASSP 2014
Introduced by Makoto Morishita
2016/02/25 MT Study Group
What is a Language Model
• Language models assign a probability to each sentence.

W1 = speech recognition system → P(W1) = 4.021 * 10^-3 ← Best!
W2 = speech cognition system → P(W2) = 8.932 * 10^-4
W3 = speck podcast histamine → P(W3) = 2.432 * 10^-7
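To make this concrete, here is a minimal sketch of how a bigram LM assigns a sentence probability via the chain rule. All probabilities in `bigram_prob` and the unseen-bigram floor are made up for illustration, not values from the paper.

```python
# Minimal sketch: scoring a sentence with a bigram LM via the chain rule.
# The probabilities below are illustrative; a real model estimates them
# from corpus counts.
bigram_prob = {
    ("<s>", "speech"): 0.10, ("speech", "recognition"): 0.40,
    ("recognition", "system"): 0.30, ("speech", "cognition"): 0.02,
    ("cognition", "system"): 0.05,
}

def sentence_prob(words, floor=1e-6):
    """P(w1..wn) ~ prod_i P(w_i | w_{i-1}); unseen bigrams get a tiny floor."""
    p = 1.0
    for prev, word in zip(["<s>"] + words, words):
        p *= bigram_prob.get((prev, word), floor)
    return p

print(sentence_prob("speech recognition system".split()))  # higher: plausible
print(sentence_prob("speck podcast histamine".split()))    # lower: implausible
```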
In this paper…
• The authors propose 2 new methods to efficiently re-score speech recognition lattices.

[Figure: word lattice with nodes 0–9 and arcs labeled hi / hy / high / this / is / my / mobile / phone / phones]
Language Models

n-gram back-off model
• Use the preceding n−1 words to estimate the next-word probability.

[Figure: hypothesis "This is my mobile" with candidate next words phone / hone / home]

• If bi-gram, use only the immediately preceding word (see the sketch below).
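As an illustration, here is a minimal back-off sketch in the "stupid backoff" style, which is simpler than the exact discounting scheme such models use: fall back to a scaled unigram probability when a bigram was never observed. The toy corpus and `alpha` are assumptions.

```python
# Minimal sketch of a back-off bigram model ("stupid backoff" style):
# fall back to the unigram distribution, scaled by alpha, when a bigram
# was never observed in training.
from collections import Counter

corpus = "this is my mobile phone this is my home".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = sum(unigrams.values())

def backoff_prob(prev, word, alpha=0.4):
    if (prev, word) in bigrams:
        return bigrams[(prev, word)] / unigrams[prev]  # observed bigram
    return alpha * unigrams.get(word, 0) / total       # back off to unigram

print(backoff_prob("mobile", "phone"))  # seen bigram
print(backoff_prob("mobile", "home"))   # unseen bigram: backs off
```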
Feedforward neural network language model
• Use the preceding n−1 words as input to a feedforward neural network. [Y. Bengio et al. 2002]

[Figure: feedforward NNLM architecture, from http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html]
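A rough sketch of the forward pass, assuming the usual Bengio-style architecture (embedding lookup, one tanh hidden layer, softmax output); all dimensions and weights below are placeholders, not the paper's configuration.

```python
# Minimal sketch of a feedforward NNLM forward pass (assumed architecture):
# concatenate embeddings of the previous n-1 words, apply a tanh hidden
# layer, then a softmax over the vocabulary.
import numpy as np

V, d, h, n = 10, 8, 16, 3          # vocab size, embed dim, hidden dim, order
rng = np.random.default_rng(0)
C = rng.normal(size=(V, d))        # embedding matrix (placeholder weights)
H = rng.normal(size=(h, (n - 1) * d))
U = rng.normal(size=(V, h))

def ffnn_lm(context_ids):
    """P(w | w_{i-n+1}..w_{i-1}) for a feedforward NNLM."""
    x = np.concatenate([C[i] for i in context_ids])  # concat embeddings
    hidden = np.tanh(H @ x)
    logits = U @ hidden
    e = np.exp(logits - logits.max())
    return e / e.sum()                               # softmax over vocab

print(ffnn_lm([1, 4]))  # distribution over the next word
```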
Recurrent neural network language model
• Use the full history context via a recurrent neural network. [T. Mikolov et al. 2010]

[Figure: RNNLM architecture — the current word w_{i−1} as a 1-of-N vector plus the history vector s_{i−2} feed a sigmoid hidden layer s_{i−1}; a softmax output gives P(w_i | w_{i−1}, s_{i−2})]
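The recurrence in the figure can be written out directly: s_{i−1} = sigmoid(U·onehot(w_{i−1}) + W·s_{i−2}) and P(w_i | w_{i−1}, s_{i−2}) = softmax(V·s_{i−1}). Below is a minimal sketch; `U`, `W`, `Vo` are assumed names and random placeholders, not a trained model.

```python
# Minimal sketch of the RNNLM recurrence (Mikolov-style simple RNN).
import numpy as np

vocab, hidden = 10, 16
rng = np.random.default_rng(1)
U = rng.normal(size=(hidden, vocab))    # input weights
W = rng.normal(size=(hidden, hidden))   # recurrent weights
Vo = rng.normal(size=(vocab, hidden))   # output weights

def rnn_step(word_id, s_prev):
    x = np.zeros(vocab); x[word_id] = 1.0            # 1-of-N input vector
    s = 1.0 / (1.0 + np.exp(-(U @ x + W @ s_prev)))  # sigmoid hidden state
    logits = Vo @ s
    e = np.exp(logits - logits.max())
    return e / e.sum(), s                            # next-word dist, new state

s = np.zeros(hidden)
for w in [3, 7, 2]:          # feed a word sequence; s carries the full history
    probs, s = rnn_step(w, s)
```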
Language Model States

LM states
• To use an LM for the re-scoring task, we need to store LM states so that sentences can be scored efficiently.
bi-gram
[Figure: SR lattice (nodes 0–3, arcs a–e) and the corresponding bi-gram LM state graph — states 0<s>, 1a, 1b, 2c, 2d, 3e: one state per (node, last word) pair]
tri-gram
[Figure: the same SR lattice and its tri-gram LM state graph — states 0<s>; 1<s>,a; 2<s>,b; 2a,c; 2a,d; 3e,c; 3e,d, with some states duplicated across paths: one state per (node, last two words) pair]
• States become larger! (see the counting sketch below)
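The growth is easy to demonstrate: the sketch below expands a toy lattice (hypothetical arcs, not exactly the figure's lattice) into (node, last n−1 words) LM states and counts them for bi-gram vs. tri-gram.

```python
# Minimal sketch: expand a lattice so each node carries an n-gram LM state
# (the last n-1 words). The toy lattice is hypothetical; the state count
# grows quickly as n increases.
from collections import deque

arcs = {0: [("a", 1), ("b", 1)], 1: [("c", 2), ("d", 2)], 2: [("e", 3)]}

def expand(start=0, n=3):
    """Return the set of (node, last n-1 words) LM states reachable from start."""
    init = (start, ("<s>",) * (n - 1))
    states, queue = {init}, deque([init])
    while queue:
        node, hist = queue.popleft()
        for word, nxt in arcs.get(node, []):
            state = (nxt, (hist + (word,))[-(n - 1):])  # shift history window
            if state not in states:
                states.add(state)
                queue.append(state)
    return states

print(len(expand(n=2)), len(expand(n=3)))  # tri-gram yields more states
```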
Difference
• n-gram back-off model & feedforward NNLM
- Use only a fixed n-gram context.
• Recurrent NNLM
- Uses the whole past word history.
- LM states grow rapidly.
- This incurs a large computational cost.

We want to reduce the number of recurrent NNLM states.
Hypothesis

Context information gradually diminishes
• We don't have to distinguish all of the histories.
• e.g.
"I am presenting the paper about RNNLM."
≒
"We are presenting the paper about RNNLM."

Similar histories make similar vectors
• We don't have to distinguish all of the histories.
• e.g.
"I am presenting the paper about RNNLM."
≒
"I am introducing the paper about RNNLM."
Proposed Method

n-gram based history clustering
• "I am presenting the paper about RNNLM."
≒
"We are presenting the paper about RNNLM."
• If the most recent n-gram context is the same, we use the same history vector (see the sketch below).
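A minimal sketch of what this clustering might look like in code: history vectors are cached under the truncated (n−1)-word key, so hypotheses whose most recent words match share one vector. It reuses `rnn_step` from the RNNLM sketch above; the key length, word ids, and hidden size are illustrative assumptions.

```python
# Minimal sketch of n-gram based history clustering: cache RNNLM history
# vectors keyed by the last n-1 words, so hypotheses sharing that recent
# context reuse a single vector.
import numpy as np

history_cache = {}

def clustered_state(full_history, n=3, hidden=16):
    key = tuple(full_history[-(n - 1):])   # only the last n-1 words form the key
    if key not in history_cache:
        s = np.zeros(hidden)
        for w in full_history:             # run the RNN once per new key
            _, s = rnn_step(w, s)
        history_cache[key] = s
    return history_cache[key]

# "I am presenting ..." vs. "We are presenting ..." (as ids): for n=3 they
# share the same last two words, hence the same cached history vector.
s1 = clustered_state([1, 2, 5, 6])
s2 = clustered_state([3, 4, 5, 6])
assert s1 is s2
```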
History vector based clustering
• "I am presenting the paper about RNNLM."
≒
"I am introducing the paper about RNNLM."
• If the history vector is similar to an existing one, we reuse that history vector (see the sketch below).
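A sketch of the vector-based variant: a new history vector is merged with an existing one when their distance falls under a threshold. Euclidean distance and the threshold value are assumptions; the paper's exact similarity measure may differ.

```python
# Minimal sketch of history vector based clustering: reuse an existing
# history vector when a new one is close enough to it.
import numpy as np

kept_vectors = []

def vector_cluster(s, threshold=0.1):
    for v in kept_vectors:
        if np.linalg.norm(s - v) < threshold:  # close enough: reuse existing
            return v
    kept_vectors.append(s)                     # otherwise keep as a new cluster
    return s
```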
Experiments

Experimental results
[Table: WER and lattice size for the baseline 4-gram back-off LM, feedforward NNLM, RNNLM reranking, RNNLM n-gram based history clustering, and RNNLM history vector based clustering]
• Comparable WER and a 70% reduction in lattice size.
Experimental results
[Table rows: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering]
• Same WER and a 45% reduction in lattice size.
Experimental results
[Table rows: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering]
• Same WER and a 7% reduction in lattice size.
Experimental results
[Table: the same five systems — baseline 4-gram back-off LM, feedforward NNLM, RNNLM reranking, and the two clustering methods]
• Comparable WER and a 72.4% reduction in lattice size.
Conclusion
• The proposed methods achieve WER comparable to 10k-best re-ranking, together with over 70% compression in lattice size.
• Smaller lattices make the computational cost lower!
References
• "This, too, is Deep Learning in a sense: Recurrent Neural Network Language Models" [MLAC2013, Day 9]
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
Prefix tree structuring