Efficient Lattice Rescoring
using Recurrent Neural
Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
Proc. of ICASSP 2014
Introduced by Makoto Morishita
2016/02/25 MT Study Group
What is a Language Model
• Language models assign a probability to each sentence.

W1 = "speech recognition system"    P(W1) = 4.021 * 10^-3
W2 = "speech cognition system"      P(W2) = 8.932 * 10^-4
W3 = "speck podcast histamine"      P(W3) = 2.432 * 10^-7
What is a Language Model
• Language models assign a probability to each sentence.

W1 = "speech recognition system"    P(W1) = 4.021 * 10^-3   ← Best!
W2 = "speech cognition system"      P(W2) = 8.932 * 10^-4
W3 = "speck podcast histamine"      P(W3) = 2.432 * 10^-7
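To make the scoring concrete, here is a minimal sketch of how a bi-gram language model assigns a probability to a sentence via the chain rule. The probabilities in `BIGRAM_P` are invented for illustration; they are not the model behind the numbers on the slide.

```python
# Toy bi-gram LM: P(sentence) = product of P(w_i | w_{i-1}).
# All probabilities below are invented for illustration.
BIGRAM_P = {
    ("<s>", "speech"): 0.4, ("speech", "recognition"): 0.5,
    ("recognition", "system"): 0.6, ("system", "</s>"): 0.7,
    ("speech", "cognition"): 0.01, ("cognition", "system"): 0.3,
}

def sentence_prob(words, unseen=1e-7):
    """Chain rule over bi-grams; unseen pairs get a tiny floor probability."""
    p = 1.0
    for prev, cur in zip(["<s>"] + words, words + ["</s>"]):
        p *= BIGRAM_P.get((prev, cur), unseen)
    return p

# The fluent sentence gets the higher probability:
assert sentence_prob("speech recognition system".split()) > \
       sentence_prob("speech cognition system".split())
```

In practice the product is accumulated in log space to avoid floating-point underflow on long sentences.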
In this paper…
• The authors propose 2 new methods to efficiently re-score speech recognition lattices.

[Figure: example speech recognition lattice — nodes 0–9, arcs labeled "high / hi / hy", "this", "is", "my", "mobile", "phone / phones"]
Language Models
n-gram back-off model
• Use the preceding n-1 words to estimate the next word's probability.

[Figure: "This is my mobile phone", with competing hypotheses "hone" and "home" for the last word]
n-gram back-off model
• Use the preceding n-1 words to estimate the next word's probability.

[Figure: same example, highlighting the context words — if bi-gram, use these words]
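The back-off idea can be sketched as follows: when a bi-gram was seen in training, use its estimate; otherwise back off to a discounted uni-gram estimate. This is a stupid-backoff-style toy with invented counts, not the properly normalized Katz or Kneser-Ney smoothing a real system would use.

```python
# Minimal back-off sketch. Counts are invented for illustration.
BIGRAM_COUNTS = {("my", "mobile"): 8, ("mobile", "phone"): 6}
UNIGRAM_COUNTS = {"my": 20, "mobile": 10, "phone": 7, "hone": 1, "home": 5}
TOTAL = sum(UNIGRAM_COUNTS.values())

def backoff_prob(prev, word, alpha=0.4):
    """Bi-gram estimate when seen; otherwise a penalized uni-gram estimate."""
    big = BIGRAM_COUNTS.get((prev, word))
    if big is not None:
        return big / UNIGRAM_COUNTS[prev]            # seen bi-gram
    return alpha * UNIGRAM_COUNTS.get(word, 0) / TOTAL  # back off

assert backoff_prob("mobile", "phone") == 0.6
assert backoff_prob("mobile", "home") < backoff_prob("mobile", "phone")
```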
Feedforward neural network language model
• Use the preceding n-1 words and a feedforward neural network.
[Y. Bengio et al. 2002]
Feedforward neural network language model

[Figure: feedforward NNLM architecture, from http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html]
[Y. Bengio et al. 2002]
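A minimal sketch of the feedforward NNLM: embed the previous n-1 words, concatenate the embeddings, pass them through one hidden layer, and take a softmax over the vocabulary. All sizes and weights below are random stand-ins, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, H, N = 10, 4, 8, 3          # vocab, embedding, hidden sizes; n-gram order

E = rng.normal(size=(V, D))        # word embedding table
W1 = rng.normal(size=((N - 1) * D, H))
W2 = rng.normal(size=(H, V))

def ffnn_lm(context_ids):
    """P(next word | previous n-1 words) for a toy feedforward NNLM."""
    x = np.concatenate([E[i] for i in context_ids])  # concat n-1 embeddings
    h = np.tanh(x @ W1)                              # hidden layer
    z = h @ W2
    p = np.exp(z - z.max())
    return p / p.sum()                               # softmax distribution

p = ffnn_lm([1, 5])   # tri-gram context = two preceding word ids
assert p.shape == (V,) and abs(p.sum() - 1.0) < 1e-9
```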
Recurrent neural network language model
• Use the full history context and a recurrent neural network.
[T. Mikolov et al. 2010]

[Figure: RNNLM architecture — 1-of-V input vector for the current word w_{i-1}, recurrent history vector s_{i-2}, sigmoid hidden layer producing s_{i-1}, softmax output giving P(w_i | w_{i-1}, s_{i-2})]
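The recurrent step can be sketched as: the new hidden state is a sigmoid of the current word's input weights plus the transformed previous state, and a softmax over it gives P(w_i | w_{i-1}, s_{i-2}). This follows the Mikolov-style architecture in the figure; the weights are random placeholders, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
V, H = 10, 6                       # vocab size, hidden (history) size

U = rng.normal(size=(V, H))        # 1-of-V input word -> hidden
W = rng.normal(size=(H, H))        # previous state -> hidden
O = rng.normal(size=(H, V))        # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnnlm_step(word_id, prev_state):
    """One step: new state from current word + previous state,
    then P(next word | full history) via softmax."""
    s = sigmoid(U[word_id] + prev_state @ W)
    z = s @ O
    p = np.exp(z - z.max())
    return s, p / p.sum()

s = np.zeros(H)                    # s_0: empty history
for w in [2, 7, 3]:                # feed a word sequence
    s, p = rnnlm_step(w, s)        # state carries the whole history
assert abs(p.sum() - 1.0) < 1e-9
```

Note that unlike the n-gram models above, the state `s` depends on every word fed in so far, which is exactly why lattice rescoring with an RNNLM is expensive.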
Language Model States
LM states
• To use an LM for the re-scoring task, we need to store the LM states so that sentences can be scored efficiently.
bi-gram

[Figure: SR lattice (nodes 0–3, word arcs a, b, c, d, e) and its bi-gram LM-state expansion — one state per (node, last word) pair: 0<s>, 1a, 1b, 2c, 2d, 3e]
tri-gram

[Figure: the same SR lattice and its tri-gram LM-state expansion — one state per (node, last two words) pair: 0<s>; 1<s>,a; 2<s>,b; 2a,c; 2a,d; 3e,c; 3e,d — each node now splits into multiple states]
tri-gram

[Figure: the same tri-gram LM-state expansion as the previous slide]
The state space becomes larger!
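Why the states grow: in the expanded lattice each state is a (node, LM history) pair, so a longer history splits every node into more states. A toy count on a small lattice (the arc set is invented for illustration, not the one in the figure):

```python
# Each lattice node must be split per distinct LM history ending there.
# For an n-gram LM the history is the last n-1 words; an RNNLM's "history"
# is every past word, so its state count grows even faster.
ARCS = [  # toy lattice: (from_node, to_node, word)
    (0, 1, "a"), (0, 1, "b"),
    (1, 2, "c"), (1, 2, "d"),
    (2, 3, "e"),
]

def expanded_states(n):
    """Distinct (node, last n-1 words) states reachable from node 0."""
    start = (0, ("<s>",) * (n - 1))
    states, frontier = {start}, [start]
    while frontier:
        node, hist = frontier.pop()
        for src, dst, word in ARCS:
            if src == node:
                new = (dst, (hist + (word,))[-(n - 1):])
                if new not in states:
                    states.add(new)
                    frontier.append(new)
    return states

# Higher order -> more states on the same lattice:
assert len(expanded_states(2)) < len(expanded_states(3))
```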
Difference
• n-gram back-off model & feedforward NNLM
- Use only a fixed window of n-1 words.
• Recurrent NNLM
- Uses the whole past word history.
- LM states grow rapidly.
- This incurs a high computational cost.
We want to reduce the number of recurrent NNLM states.
Hypothesis
Context information gradually diminishes
• We don't have to distinguish all of the histories.
• e.g.
"I am presenting the paper about RNNLM."
≒
"We are presenting the paper about RNNLM."
Similar histories make similar vectors
• We don't have to distinguish all of the histories.
• e.g.
"I am presenting the paper about RNNLM."
≒
"I am introducing the paper about RNNLM."
Proposed Method
n-gram based history clustering
• "I am presenting the paper about RNNLM."
≒
"We are presenting the paper about RNNLM."
• If the most recent n-gram context is the same, we use the same history vector.
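A sketch of n-gram based history clustering: the RNNLM hidden vector is cached under the truncated context (the last n-1 words), so any two histories sharing that context reuse one vector. `rnnlm_state` here is a hypothetical stand-in for the real RNN forward pass.

```python
def rnnlm_state(history):
    """Stand-in for running the RNN over the full history."""
    return "state(" + " ".join(history) + ")"

def cached_state(history, cache, n=3):
    """Reuse one hidden vector per distinct last-(n-1)-word context."""
    key = tuple(history[-(n - 1):])     # truncate: last n-1 words only
    if key not in cache:
        cache[key] = rnnlm_state(history)
    return cache[key]

cache = {}
s1 = cached_state("i am presenting the paper".split(), cache)
s2 = cached_state("we are presenting the paper".split(), cache)
assert s1 is s2            # same tri-gram context -> one shared state
assert len(cache) == 1
```

Collapsing states this way is what shrinks the expanded lattice while (per the hypothesis) changing the scores very little.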
History vector based clustering
• "I am presenting the paper about RNNLM."
≒
"I am introducing the paper about RNNLM."
• If the history vector is similar to another existing vector, we reuse that history vector.
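A sketch of history vector based clustering: a new history's vector is merged into an existing one when their distance falls below a threshold. The vectors and the Euclidean distance threshold are invented for illustration; the paper's actual similarity measure may differ.

```python
import numpy as np

def find_or_add(vec, clusters, threshold=0.1):
    """Reuse an existing representative vector if one is close enough."""
    for rep in clusters:
        if np.linalg.norm(vec - rep) < threshold:
            return rep                 # merge into the existing state
    clusters.append(vec)               # otherwise start a new cluster
    return vec

clusters = []
v1 = find_or_add(np.array([0.50, 0.20]), clusters)
v2 = find_or_add(np.array([0.52, 0.21]), clusters)  # near v1 -> merged
v3 = find_or_add(np.array([0.90, 0.80]), clusters)  # far -> new cluster
assert v2 is v1 and len(clusters) == 2
```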
Experiments
Experimental results

[Table: WER comparison of the baseline 4-gram back-off LM, feedforward NNLM, RNNLM reranking, RNNLM n-gram based history clustering, and RNNLM history vector based clustering]
Experimental results

[Table: same systems as above]
Comparable WER and 70% reduction in lattice size
Experimental results

[Table: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering]
Same WER and 45% reduction in lattice size
Experimental results
28
RNNLM n-gram based history clustering
RNNLM history vector based clustering
Same WER and

7% reduction in lattice size
Experimental results
Experimental results

[Table: same systems as above]
Comparable WER and 72.4% reduction in lattice size
Conclusion
• The proposed methods achieve WER comparable to 10k-best re-ranking, together with over 70% compression in lattice size.
• Smaller lattices reduce the computational cost!
References
• "This is also Deep Learning in a sense: the Recurrent Neural Network Language Model" [MLAC2013 Day 9]
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
Prefix tree structuring
