Efficient Lattice Rescoring Using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
ICASSP 2014
I introduced this paper at the NAIST Machine Translation Study Group.
1. Efficient Lattice Rescoring using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
Proc. of ICASSP 2014
Introduced by Makoto Morishita
2016/02/25 MT Study Group
2. What is a Language Model
• Language models assign a probability to each sentence.
W1 = speech recognition system   P(W1) = 4.021 * 10^-3
W2 = speech cognition system     P(W2) = 8.932 * 10^-4
W3 = speck podcast histamine     P(W3) = 2.432 * 10^-7
3. What is a Language Model
• Language models assign a probability to each sentence.
W1 = speech recognition system   P(W1) = 4.021 * 10^-3  ← Best!
W2 = speech cognition system     P(W2) = 8.932 * 10^-4
W3 = speck podcast histamine     P(W3) = 2.432 * 10^-7
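To make this concrete, here is a tiny sketch (my own toy example, not from the paper or the slides) of how a sentence probability is accumulated word by word with the chain rule; the bigram probabilities below are made-up numbers.

```python
# Toy sketch: scoring a sentence with a bigram LM via the chain rule.
# All probabilities here are made-up illustrative values.
toy_bigram = {
    ("<s>", "speech"): 0.2,
    ("speech", "recognition"): 0.3,
    ("recognition", "system"): 0.4,
    ("system", "</s>"): 0.5,
}

def sentence_prob(words, bigram, unseen=1e-7):
    """P(W) = product over i of P(w_i | w_{i-1}); unseen bigrams get a small floor."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram.get((prev, cur), unseen)
    return prob

print(sentence_prob(["speech", "recognition", "system"], toy_bigram))  # 0.012
print(sentence_prob(["speck", "podcast", "histamine"], toy_bigram))    # tiny value
```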
4. In this paper…
• The authors propose 2 new methods to efficiently re-score speech recognition lattices.
(Figure: an example speech recognition lattice over nodes 0-9, with competing word arcs such as hi / high / hy, this, is, my, mobile, and phone / phones)
6. n-gram back-off model
• Use the preceding n-gram words to estimate the next-word probability.
(Figure: predicting the word that follows "This is my mobile": candidates phone / hone / home)
7. n-gram back-off model
• Use the preceding n-gram words to estimate the next-word probability (a small back-off sketch follows this slide).
(Figure: the same example; with a bi-gram model, only the immediately preceding word "mobile" is used)
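As a rough illustration of the back-off idea (a sketch with assumed probabilities and weights, not the exact LM used in the paper): if the bi-gram was observed, use its probability; otherwise fall back to the unigram scaled by a back-off weight.

```python
# Sketch of a back-off bigram LM with hypothetical probabilities and weights.
bigram_prob = {("mobile", "phone"): 0.6}
unigram_prob = {"phone": 0.01, "hone": 0.0001, "home": 0.02}
backoff_weight = {"mobile": 0.3}   # probability mass left for unseen continuations

def p_backoff(prev, word):
    """P(word | prev): use the bigram if seen, otherwise back off to the unigram."""
    if (prev, word) in bigram_prob:
        return bigram_prob[(prev, word)]
    return backoff_weight.get(prev, 1.0) * unigram_prob.get(word, 1e-7)

print(p_backoff("mobile", "phone"))  # seen bigram: 0.6
print(p_backoff("mobile", "home"))   # backed-off estimate: 0.3 * 0.02
```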
8. Feedforward neural network language model
• Use the preceding n-gram words as input to a feedforward neural network.
[Y. Bengio et al. 2002]
9. Feedforward neural network language model
(Figure: feedforward NNLM architecture) [Y. Bengio et al. 2002]
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
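For readers who prefer code to diagrams, here is a minimal forward pass in the spirit of a Bengio-style feedforward NNLM; the sizes and random weights are placeholders of my own, not trained parameters from any of the cited work.

```python
import numpy as np

# Sketch of a feedforward NNLM forward pass: embed the n-1 previous words,
# concatenate the embeddings, apply a hidden layer, then a softmax over the vocab.
rng = np.random.default_rng(0)
V, d, h, n = 1000, 32, 64, 3            # vocab size, embedding, hidden, n-gram order
E = rng.normal(size=(V, d))             # word embedding table
W_hidden = rng.normal(size=((n - 1) * d, h))
W_output = rng.normal(size=(h, V))

def ffnn_lm_probs(context_ids):
    """Return P(w | w_{i-n+1}, ..., w_{i-1}) for every word w in the vocabulary."""
    x = np.concatenate([E[i] for i in context_ids])   # concatenated context embeddings
    hidden = np.tanh(x @ W_hidden)
    logits = hidden @ W_output
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = ffnn_lm_probs([12, 45])         # ids of the two previous words (trigram model)
print(probs.shape, round(float(probs.sum()), 3))
```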
10. Recurrent neural network language model
• Use the full history context with a recurrent neural network. [T. Mikolov et al. 2010]
(Figure: RNNLM architecture; a one-hot encoding of the current word w_{i-1} and the history vector s_{i-2} feed a sigmoid hidden layer s_{i-1}, and a softmax output layer gives P(w_i | w_{i-1}, s_{i-2}))
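The same recurrence, written out as a minimal Mikolov-style step (my own sketch with random placeholder weights): the current word w_{i-1} and the previous hidden vector s_{i-2} produce the new hidden vector s_{i-1} through a sigmoid, and a softmax output gives P(w_i | w_{i-1}, s_{i-2}).

```python
import numpy as np

# Sketch of one RNNLM step; weights are random placeholders, not trained parameters.
rng = np.random.default_rng(0)
V, h = 1000, 64
U = rng.normal(size=(V, h))    # input word -> hidden
W = rng.normal(size=(h, h))    # previous hidden -> hidden (the recurrence)
O = rng.normal(size=(h, V))    # hidden -> output

def rnnlm_step(word_id, s_prev):
    """Return (P(w_i | w_{i-1}, s_{i-2}), s_{i-1}) for one step."""
    s = 1.0 / (1.0 + np.exp(-(U[word_id] + s_prev @ W)))   # sigmoid hidden state
    logits = s @ O
    exp = np.exp(logits - logits.max())
    return exp / exp.sum(), s

probs, state = rnnlm_step(12, np.zeros(h))   # start from a zero history vector
print(probs.shape, round(float(probs.sum()), 3))
```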
12. LM states
• To use an LM for the re-scoring task, we need to store LM states so that sentences can be scored efficiently.
13. bi-gram
(Figure: the SR lattice and its bi-gram LM state expansion; each expanded state is keyed by the last word, e.g. <s>, a, b, c, d, e)
14. tri-gram
(Figure: the SR lattice and its tri-gram LM state expansion; each expanded state is keyed by the last two words, e.g. <s>,a; <s>,b; a,c; a,d; e,c; e,d)
15. tri-gram
(Figure: the same tri-gram LM state expansion as the previous slide)
States become larger! (A small sketch of this expansion follows below.)
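To see why the expanded states multiply, here is a small sketch of lattice expansion under an n-gram LM on a toy lattice of my own (the node and word labels are illustrative, not the slides' exact lattice): each expanded state pairs a lattice node with the last n-1 words, so raising the order from bi-gram to tri-gram already produces more distinct states.

```python
from collections import deque

# Toy lattice: node -> list of (word, next_node). Illustrative only.
lattice = {0: [("a", 1), ("b", 1)], 1: [("c", 2), ("d", 2)], 2: [("e", 3)], 3: []}

def expand(lattice, start=0, order=3):
    """Return the set of (node, last n-1 words) LM states created by expansion."""
    init = (start, ("<s>",))
    states, queue = {init}, deque([init])
    while queue:
        node, hist = queue.popleft()
        for word, nxt in lattice[node]:
            new_hist = (hist + (word,))[-(order - 1):]   # keep only the last n-1 words
            state = (nxt, new_hist)
            if state not in states:
                states.add(state)
                queue.append(state)
    return states

print(len(expand(lattice, order=2)))   # bi-gram expansion: 6 states
print(len(expand(lattice, order=3)))   # tri-gram expansion: 9 states, already larger
```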
16. Difference
• n-gram back-off model & feedforward NNLM
- Use only a fixed window of n-gram words.
• Recurrent NNLM
- Uses the whole past word history.
- LM states grow rapidly.
- This incurs a high computational cost.
We want to reduce the number of recurrent NNLM states.
18. Context information gradually diminishes
• We don't have to distinguish all of the histories.
• e.g.
I am presenting the paper about RNNLM.
≒
We are presenting the paper about RNNLM.
19. Similar histories make similar vectors
• We don't have to distinguish all of the histories.
• e.g.
I am presenting the paper about RNNLM.
≒
I am introducing the paper about RNNLM.
21. n-gram based history clustering
• I am presenting the paper about RNNLM.
≒
We are presenting the paper about RNNLM.
• If the truncated n-gram context is the same, we reuse the same history vector (see the sketch below).
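A minimal sketch of how I read the n-gram based history clustering (my own helper names and toy vectors, not the paper's implementation): during lattice expansion, the RNNLM history vector is cached under the last n-1 words only, so paths that end in the same truncated n-gram share a single state.

```python
import numpy as np

# Sketch of n-gram based history clustering: reuse the first RNNLM history
# vector computed for any path that ends in the same last n-1 words.
history_cache = {}

def shared_history(full_history, state_vector, order=4):
    """Merge paths whose truncated n-gram context is identical."""
    key = tuple(full_history[-(order - 1):])        # e.g. the last 3 words for a 4-gram
    return history_cache.setdefault(key, state_vector)

# Two different full histories that end in the same three words map to one state.
s1 = shared_history("i am presenting the paper".split(), np.ones(3))
s2 = shared_history("we are presenting the paper".split(), np.zeros(3))
print(s1 is s2)   # True: the second path reuses the first path's vector
```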
22. History vector based clustering
• I am presenting the paper about RNNLM.
≒
I am introducing the paper about RNNLM.
• If the history vector is similar enough to an existing one, we reuse that history vector (see the sketch below).
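And a corresponding sketch for history-vector based clustering (the distance measure and threshold here are my assumptions, not the paper's exact criterion): a new RNNLM history vector is merged into an existing state whenever the two vectors are close enough.

```python
import numpy as np

# Sketch of history-vector based clustering with a hypothetical Euclidean threshold.
kept_vectors = []   # representative history vectors kept so far

def cluster_history(vector, threshold=0.1):
    """Reuse an existing history vector if one is close enough, else keep this one."""
    for rep in kept_vectors:
        if np.linalg.norm(vector - rep) < threshold:
            return rep              # similar history: merge into the existing state
    kept_vectors.append(vector)
    return vector                   # sufficiently different: start a new state

v1 = cluster_history(np.array([0.50, 0.10]))
v2 = cluster_history(np.array([0.52, 0.11]))   # close to v1, so the states merge
print(v1 is v2)   # True
```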
24. Experimental results
(Table: WER and lattice size for the baseline 4-gram back-off LM, feedforward NNLM, RNNLM reranking, RNNLM n-gram based history clustering, and RNNLM history vector based clustering)
25. Experimental results
(Table: the same comparison as the previous slide)
26. Experimental results
(Table: the same comparison as the previous slide)
Comparable WER and a 70% reduction in lattice size.
27. Experimental results
(Table: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering)
Same WER and a 45% reduction in lattice size.
28. Experimental results
(Table: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering)
Same WER and a 7% reduction in lattice size.
29. Experimental results
(Table: the same systems as the earlier results slides)
Comparable WER and a 72.4% reduction in lattice size.
31. Conclusion
• The proposed methods achieve WER comparable to 10k-best re-ranking, with over 70% compression in lattice size.
• Smaller lattices reduce the computational cost.