Efficient Lattice Rescoring Using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
ICASSP 2014
I introduced this paper at the NAIST Machine Translation Study Group.
1. Efficient Lattice Rescoring using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
Proc. of ICASSP 2014
Introduced by Makoto Morishita
2016/02/25 MT Study Group
2. What is a Language Model
• Language models assign a probability to each sentence.
W1 = speech recognition system   P(W1) = 4.021 * 10^-3
W2 = speech cognition system     P(W2) = 8.932 * 10^-4
W3 = speck podcast histamine     P(W3) = 2.432 * 10^-7
3. What is a Language Model
• Language models assign a probability to each sentence.
W1 = speech recognition system   P(W1) = 4.021 * 10^-3  ← Best!
W2 = speech cognition system     P(W2) = 8.932 * 10^-4
W3 = speck podcast histamine     P(W3) = 2.432 * 10^-7
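To make this concrete, here is a tiny sketch (my own toy example, not from the paper or the slides) of how a sentence probability is accumulated word by word with the chain rule; the bigram probabilities below are made-up numbers.

```python
# Toy sketch: scoring a sentence with a bigram LM via the chain rule.
# All probabilities here are made-up illustrative values.
toy_bigram = {
    ("<s>", "speech"): 0.2,
    ("speech", "recognition"): 0.3,
    ("recognition", "system"): 0.4,
    ("system", "</s>"): 0.5,
}

def sentence_prob(words, bigram, unseen=1e-7):
    """P(W) = product over i of P(w_i | w_{i-1}); unseen bigrams get a small floor."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram.get((prev, cur), unseen)
    return prob

print(sentence_prob(["speech", "recognition", "system"], toy_bigram))  # 0.012
print(sentence_prob(["speck", "podcast", "histamine"], toy_bigram))    # tiny value
```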
4. In this paper…
• The authors propose 2 new methods to efficiently re-score speech recognition lattices.
(Figure: an example speech recognition lattice over nodes 0-9, with competing word arcs such as hi / high / hy, this, is, my, mobile, and phone / phones)
6. n-gram back-off model
• Use the preceding n-gram words to estimate the next-word probability.
(Figure: predicting the word that follows "This is my mobile": candidates phone / hone / home)
7. n-gram back-off model
• Use the preceding n-gram words to estimate the next-word probability (a small back-off sketch follows this slide).
(Figure: the same example; with a bi-gram model, only the immediately preceding word "mobile" is used)
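As a rough illustration of the back-off idea (a sketch with assumed probabilities and weights, not the exact LM used in the paper): if the bi-gram was observed, use its probability; otherwise fall back to the unigram scaled by a back-off weight.

```python
# Sketch of a back-off bigram LM with hypothetical probabilities and weights.
bigram_prob = {("mobile", "phone"): 0.6}
unigram_prob = {"phone": 0.01, "hone": 0.0001, "home": 0.02}
backoff_weight = {"mobile": 0.3}   # probability mass left for unseen continuations

def p_backoff(prev, word):
    """P(word | prev): use the bigram if seen, otherwise back off to the unigram."""
    if (prev, word) in bigram_prob:
        return bigram_prob[(prev, word)]
    return backoff_weight.get(prev, 1.0) * unigram_prob.get(word, 1e-7)

print(p_backoff("mobile", "phone"))  # seen bigram: 0.6
print(p_backoff("mobile", "home"))   # backed-off estimate: 0.3 * 0.02
```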
8. Feedforward neural network language model
• Use the preceding n-gram words as input to a feedforward neural network.
[Y. Bengio et al. 2002]
9. Feedforward neural network language model
(Figure: feedforward NNLM architecture) [Y. Bengio et al. 2002]
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
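For readers who prefer code to diagrams, here is a minimal forward pass in the spirit of a Bengio-style feedforward NNLM; the sizes and random weights are placeholders of my own, not trained parameters from any of the cited work.

```python
import numpy as np

# Sketch of a feedforward NNLM forward pass: embed the n-1 previous words,
# concatenate the embeddings, apply a hidden layer, then a softmax over the vocab.
rng = np.random.default_rng(0)
V, d, h, n = 1000, 32, 64, 3            # vocab size, embedding, hidden, n-gram order
E = rng.normal(size=(V, d))             # word embedding table
W_hidden = rng.normal(size=((n - 1) * d, h))
W_output = rng.normal(size=(h, V))

def ffnn_lm_probs(context_ids):
    """Return P(w | w_{i-n+1}, ..., w_{i-1}) for every word w in the vocabulary."""
    x = np.concatenate([E[i] for i in context_ids])   # concatenated context embeddings
    hidden = np.tanh(x @ W_hidden)
    logits = hidden @ W_output
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = ffnn_lm_probs([12, 45])         # ids of the two previous words (trigram model)
print(probs.shape, round(float(probs.sum()), 3))
```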
10. Recurrent neural network language model
• Use the full history context with a recurrent neural network. [T. Mikolov et al. 2010]
(Figure: RNNLM architecture; a one-hot encoding of the current word w_{i-1} and the history vector s_{i-2} feed a sigmoid hidden layer s_{i-1}, and a softmax output layer gives P(w_i | w_{i-1}, s_{i-2}))
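The same recurrence, written out as a minimal Mikolov-style step (my own sketch with random placeholder weights): the current word w_{i-1} and the previous hidden vector s_{i-2} produce the new hidden vector s_{i-1} through a sigmoid, and a softmax output gives P(w_i | w_{i-1}, s_{i-2}).

```python
import numpy as np

# Sketch of one RNNLM step; weights are random placeholders, not trained parameters.
rng = np.random.default_rng(0)
V, h = 1000, 64
U = rng.normal(size=(V, h))    # input word -> hidden
W = rng.normal(size=(h, h))    # previous hidden -> hidden (the recurrence)
O = rng.normal(size=(h, V))    # hidden -> output

def rnnlm_step(word_id, s_prev):
    """Return (P(w_i | w_{i-1}, s_{i-2}), s_{i-1}) for one step."""
    s = 1.0 / (1.0 + np.exp(-(U[word_id] + s_prev @ W)))   # sigmoid hidden state
    logits = s @ O
    exp = np.exp(logits - logits.max())
    return exp / exp.sum(), s

probs, state = rnnlm_step(12, np.zeros(h))   # start from a zero history vector
print(probs.shape, round(float(probs.sum()), 3))
```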
12. LM states
• To use an LM for the re-scoring task, we need to store LM states so that sentences can be scored efficiently.
13. bi-gram
(Figure: the SR lattice and its bi-gram LM state expansion; each expanded state is keyed by the last word, e.g. <s>, a, b, c, d, e)
14. tri-gram
(Figure: the SR lattice and its tri-gram LM state expansion; each expanded state is keyed by the last two words, e.g. <s>,a; <s>,b; a,c; a,d; e,c; e,d)
15. tri-gram
(Figure: the same tri-gram LM state expansion as the previous slide)
States become larger! (A small sketch of this expansion follows below.)
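To see why the expanded states multiply, here is a small sketch of lattice expansion under an n-gram LM on a toy lattice of my own (the node and word labels are illustrative, not the slides' exact lattice): each expanded state pairs a lattice node with the last n-1 words, so raising the order from bi-gram to tri-gram already produces more distinct states.

```python
from collections import deque

# Toy lattice: node -> list of (word, next_node). Illustrative only.
lattice = {0: [("a", 1), ("b", 1)], 1: [("c", 2), ("d", 2)], 2: [("e", 3)], 3: []}

def expand(lattice, start=0, order=3):
    """Return the set of (node, last n-1 words) LM states created by expansion."""
    init = (start, ("<s>",))
    states, queue = {init}, deque([init])
    while queue:
        node, hist = queue.popleft()
        for word, nxt in lattice[node]:
            new_hist = (hist + (word,))[-(order - 1):]   # keep only the last n-1 words
            state = (nxt, new_hist)
            if state not in states:
                states.add(state)
                queue.append(state)
    return states

print(len(expand(lattice, order=2)))   # bi-gram expansion: 6 states
print(len(expand(lattice, order=3)))   # tri-gram expansion: 9 states, already larger
```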
16. Difference
• n-gram back-off model & feedforward NNLM
- Use only a fixed window of n-gram words.
• Recurrent NNLM
- Uses the whole past word history.
- LM states grow rapidly.
- This incurs a high computational cost.
We want to reduce the number of recurrent NNLM states.
18. Context information gradually diminishes
• We don't have to distinguish all of the histories.
• e.g.
I am presenting the paper about RNNLM.
≒
We are presenting the paper about RNNLM.
19. Similar histories make similar vectors
• We don't have to distinguish all of the histories.
• e.g.
I am presenting the paper about RNNLM.
≒
I am introducing the paper about RNNLM.
21. n-gram based history clustering
• I am presenting the paper about RNNLM.
≒
We are presenting the paper about RNNLM.
• If the truncated n-gram context is the same, we reuse the same history vector (see the sketch below).
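A minimal sketch of how I read the n-gram based history clustering (my own helper names and toy vectors, not the paper's implementation): during lattice expansion, the RNNLM history vector is cached under the last n-1 words only, so paths that end in the same truncated n-gram share a single state.

```python
import numpy as np

# Sketch of n-gram based history clustering: reuse the first RNNLM history
# vector computed for any path that ends in the same last n-1 words.
history_cache = {}

def shared_history(full_history, state_vector, order=4):
    """Merge paths whose truncated n-gram context is identical."""
    key = tuple(full_history[-(order - 1):])        # e.g. the last 3 words for a 4-gram
    return history_cache.setdefault(key, state_vector)

# Two different full histories that end in the same three words map to one state.
s1 = shared_history("i am presenting the paper".split(), np.ones(3))
s2 = shared_history("we are presenting the paper".split(), np.zeros(3))
print(s1 is s2)   # True: the second path reuses the first path's vector
```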
22. History vector based clustering
• I am presenting the paper about RNNLM.
≒
I am introducing the paper about RNNLM.
• If the history vector is similar enough to an existing one, we reuse that history vector (see the sketch below).
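And a corresponding sketch for history-vector based clustering (the distance measure and threshold here are my assumptions, not the paper's exact criterion): a new RNNLM history vector is merged into an existing state whenever the two vectors are close enough.

```python
import numpy as np

# Sketch of history-vector based clustering with a hypothetical Euclidean threshold.
kept_vectors = []   # representative history vectors kept so far

def cluster_history(vector, threshold=0.1):
    """Reuse an existing history vector if one is close enough, else keep this one."""
    for rep in kept_vectors:
        if np.linalg.norm(vector - rep) < threshold:
            return rep              # similar history: merge into the existing state
    kept_vectors.append(vector)
    return vector                   # sufficiently different: start a new state

v1 = cluster_history(np.array([0.50, 0.10]))
v2 = cluster_history(np.array([0.52, 0.11]))   # close to v1, so the states merge
print(v1 is v2)   # True
```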
24. Experimental results
(Table: WER and lattice size for the baseline 4-gram back-off LM, feedforward NNLM, RNNLM reranking, RNNLM n-gram based history clustering, and RNNLM history vector based clustering)
25. Experimental results
(Table: the same comparison as the previous slide)
26. Experimental results
(Table: the same comparison as the previous slide)
Comparable WER and a 70% reduction in lattice size.
27. Experimental results
(Table: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering)
Same WER and a 45% reduction in lattice size.
28. Experimental results
(Table: RNNLM n-gram based history clustering vs. RNNLM history vector based clustering)
Same WER and a 7% reduction in lattice size.
29. Experimental results
(Table: the same systems as the earlier results slides)
Comparable WER and a 72.4% reduction in lattice size.
31. Conclusion
• The proposed methods achieve WER comparable to 10k-best re-ranking, with over 70% compression in lattice size.
• Smaller lattices reduce the computational cost.