Language Model
MT STUDY MEETING 5/21
HIROYUKI FUDABA
How can you say whether a sentence is natural or not?
$e_1$ = he is big
$e_2$ = is big he
$e_3$ = this is a purple dog
How can you say whether a sentence is natural or not?
$e_1$ = he is big ← correct
$e_2$ = is big he ← grammatically wrong
$e_3$ = this is a purple dog ← semantically wrong
Language model probability
We want to treat “naturalness” statistically.
We represent it with the language model probability $P(e)$:
$P(e = \text{he is big}) = 0.7$
$P(e = \text{is big he}) = 0.3$
$P(e = \text{this is a purple dog}) = 0.5$
Some ways to estimate $P(e)$
• n-gram model
• positional language model
• factored language model
• cache language model
Basis of n-gram
We notate a sentence as $\mathbf{e} = e_1^I$, with $I$ being its length:
$e$ = he is big
$e_1 = \text{he},\ e_2 = \text{is},\ e_3 = \text{big},\ I = 3$
We can define $P(e)$ as follows:
$P(e = \text{he is big}) = P(I = 3, e_1 = \text{he}, e_2 = \text{is}, e_3 = \text{big})$
$= P(e_1 = \text{he}, e_2 = \text{is}, e_3 = \text{big}, e_4 = \langle\text{eos}\rangle)$
$= P(e_0 = \langle\text{bos}\rangle, e_1 = \text{he}, e_2 = \text{is}, e_3 = \text{big}, e_4 = \langle\text{eos}\rangle)$
Estimating $P(e)$ in a simple way
Assuming that natural sentences appear more frequently than unnatural ones, a simple way to estimate $P(e)$ is the following:
Bring a big training corpus $E_{train}$.
Count the frequency of each sentence in $E_{train}$:
$P_S(e) = \dfrac{freq(e)}{size(E_{train})} = \dfrac{c_{train}(e)}{\sum_{\tilde{e}} c_{train}(\tilde{e})}$
$c_{train}(e = \text{he is big})$ returns how many sentences exactly match “he is big”.
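A minimal Python sketch of this whole-sentence estimator, assuming a toy tokenized corpus (the data and names below are illustrative, not from the book):

```python
from collections import Counter

# Toy training data: a list of tokenized sentences.
E_train = [("he", "is", "big"), ("he", "is", "big"), ("she", "is", "small")]

c_train = Counter(E_train)  # c_train(e): how many sentences exactly match e

def P_S(e):
    # P_S(e) = c_train(e) / sum over all sentences of c_train
    return c_train[tuple(e)] / sum(c_train.values())

print(P_S(["he", "is", "big"]))    # 2/3
print(P_S(["he", "is", "small"]))  # 0.0 -- the problem shown on the next slide
```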
Problem with the simple estimation
When $E_{train}$ contains neither sentence $e_1$ nor $e_2$, you cannot say which is more natural:
$c_{train}(e_1) = c_{train}(e_2) = 0$
$P_S(e_1) = \dfrac{c_{train}(e_1)}{\sum_{\tilde{e}} c_{train}(\tilde{e})} = 0$
$P_S(e_2) = \dfrac{c_{train}(e_2)}{\sum_{\tilde{e}} c_{train}(\tilde{e})} = 0$
You cannot compare the two if both values are 0 …
Solution to $P(e) = 0$
Rather than treating a sentence as a whole, let's treat it as data composed of words, using the chain rule $P(X, Y) = P(X \mid Y)\, P(Y)$:
$P(e = \text{he is big}) = P(e_1 = \text{he} \mid e_0 = \langle\text{bos}\rangle)$
$\times P(e_2 = \text{is} \mid e_0 = \langle\text{bos}\rangle, e_1 = \text{he})$
$\times P(e_3 = \text{big} \mid e_0 = \langle\text{bos}\rangle, e_1 = \text{he}, e_2 = \text{is})$
$\times P(e_4 = \langle\text{eos}\rangle \mid e_0 = \langle\text{bos}\rangle, e_1 = \text{he}, e_2 = \text{is}, e_3 = \text{big})$
Solution to $P(e) = 0$
$P_S(e) = \dfrac{c_{train}(e)}{\sum_{\tilde{e}} c_{train}(\tilde{e})} = P(e_1^I) = \prod_{i=1}^{I+1} P_{ML}(e_i \mid e_0^{i-1})$
$P_{ML}(e_i \mid e_0^{i-1}) = \dfrac{c_{train}(e_0^i)}{c_{train}(e_0^{i-1})}$
So far $P(e_1^I)$ is completely equal to $P_S(e)$, which means it still doesn't work.
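A sketch checking this slide's point numerically: with full-history conditionals $P_{ML}(e_i \mid e_0^{i-1}) = c_{train}(e_0^i)/c_{train}(e_0^{i-1})$, the chain-rule product collapses back to the whole-sentence estimate. The corpus is a toy assumption:

```python
from collections import Counter

corpus = [["he", "is", "big"], ["he", "is", "big"], ["she", "is", "small"]]

# Count every sentence prefix, including the empty prefix and <bos>/<eos>.
prefix = Counter()
for sent in corpus:
    tokens = ["<bos>"] + sent + ["<eos>"]
    for i in range(len(tokens) + 1):
        prefix[tuple(tokens[:i])] += 1

def P_chain(sent):
    # Product of P_ML(e_i | e_0^{i-1}) = c(e_0^i) / c(e_0^{i-1})
    tokens = ["<bos>"] + sent + ["<eos>"]
    p = 1.0
    for i in range(1, len(tokens) + 1):
        p *= prefix[tuple(tokens[:i])] / prefix[tuple(tokens[:i - 1])]
    return p

print(P_chain(["he", "is", "big"]))  # 2/3: exactly the sentence-count estimate
```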
Idea of n-gram model
Rather than conditioning on all of the words that appear before the current word, let's condition on only the $n - 1$ words immediately before it.
Instead of considering all of the preceding words, consider only the $n - 1$ words just before the current one.
(Figure: the sequence ⟨bos⟩ he is big ⟨eos⟩, first with arrows from the full history, then from only the previous $n - 1$ words)
The n-gram model, precisely
From the previous expression
$P(e_1^I) = \prod_{i=1}^{I+1} P_{ML}(e_i \mid e_0^{i-1})$
we can approximate $P(e)$ as follows:
$P(e_1^I) \approx \prod_{i=1}^{I+1} P_{ML}(e_i \mid e_{i-n+1}^{i-1})$
How does this help?
With $n = 2$:
$P(e = \text{he is big}) \approx P(e_i = \text{he} \mid e_{i-1} = \langle\text{bos}\rangle)$
$\times P(e_i = \text{is} \mid e_{i-1} = \text{he})$
$\times P(e_i = \text{big} \mid e_{i-1} = \text{is})$
$\times P(e_i = \langle\text{eos}\rangle \mid e_{i-1} = \text{big})$
Intuitively, a subsequence appears at least as often as any longer sequence containing it, so $P(e)$ estimated with an n-gram model is less likely to be 0.
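A sketch of the maximum-likelihood n-gram estimate with $n = 2$ on a toy corpus (all names and data here are illustrative assumptions):

```python
from collections import Counter

corpus = [["he", "is", "big"], ["he", "is", "small"]]

bigram, context = Counter(), Counter()
for sent in corpus:
    tokens = ["<bos>"] + sent + ["<eos>"]
    for prev, w in zip(tokens, tokens[1:]):
        bigram[(prev, w)] += 1   # c(e_{i-1}, e_i)
        context[prev] += 1       # c(e_{i-1})

def P_ml(w, prev):
    # P_ML(e_i | e_{i-1}) = c(e_{i-1}, e_i) / c(e_{i-1})
    return bigram[(prev, w)] / context[prev]

p = P_ml("he", "<bos>") * P_ml("is", "he") * P_ml("big", "is") * P_ml("<eos>", "big")
print(p)  # 0.5: "he is big" is scored from bigrams, not whole-sentence matches
```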
Smoothing the n-gram model
• The n-gram model is less likely to estimate $P(e) = 0$
• But it can still estimate 0
→ Smoothing
Idea of smoothing
Combine the probabilities of the n-gram and the (n-1)-gram.
Even if the probability of a word $w$ cannot be estimated with the n-gram, there is a possibility that it can be estimated with the (n-1)-gram:
$P_{3\text{-}gram}(\text{small} \mid \text{he is}) = 0$
$P_{2\text{-}gram}(\text{small} \mid \text{is}) = 0.03$
(Chart: 3-gram vs. 2-gram probabilities, 0 to 0.25, for P(he|<bos>), P(is|<bos> he), P(big|he is), P(small|he is), P(<eos>|is big))
Linear interpolation
The easiest and most basic way to express this idea:
$P(e_i \mid e_{i-n+1}^{i-1}) = (1 - a)\, P_{ML}(e_i \mid e_{i-n+1}^{i-1}) + a\, P_{ML}(e_i \mid e_{i-n+2}^{i-1}), \quad 0 \le a \le 1$
The problem is adjusting $a$ to a good value.
So how can we do that?
Adjusting $a$ to a good value
An easy way to achieve this is the following (sketched in code below):
• Bring a dataset that is different from the training data
• Select the $a$ that gives the highest likelihood to that dataset
We can improve performance further by considering each context.
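A sketch combining linear interpolation with the held-out tuning of $a$ described above; the corpus, the held-out set, and the grid over $a$ are all illustrative assumptions:

```python
import math
from collections import Counter

corpus = [["he", "is", "big"], ["he", "is", "small"]]
heldout = [["he", "is", "small"]]  # assumed held-out data; disjoint in practice

bigram, unigram = Counter(), Counter()
for sent in corpus:
    tokens = ["<bos>"] + sent + ["<eos>"]
    unigram.update(tokens)
    bigram.update(zip(tokens, tokens[1:]))

def P_interp(w, prev, a):
    # (1 - a) * P_ML(w | prev) + a * P_ML(w), with 0 <= a <= 1
    p_bi = bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0
    p_uni = unigram[w] / sum(unigram.values())
    return (1 - a) * p_bi + a * p_uni

def heldout_loglik(a):
    total = 0.0
    for sent in heldout:
        tokens = ["<bos>"] + sent + ["<eos>"]
        for prev, w in zip(tokens, tokens[1:]):
            total += math.log(P_interp(w, prev, a))
    return total

best_a = max((k / 100 for k in range(1, 100)), key=heldout_loglik)
print(best_a)
```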
Witten-Bell smoothing
How should we choose $a$ if the n-gram counts look like the following?
Context “President was”: elected 5, the 3, in 3, First 3, … (52 kinds, total 110)
Context “President Ronald”: Reagan 38, Caza 1, Venetiaan 1 (3 kinds, total 40)
Witten-Bell smoothing
An unseen word is likely after the context “President was”, so $a$ should be large, so that the (n-1)-gram is emphasized more:
$P(e_i \mid e_{i-n+1}^{i-1}) = (1 - a)\, P_{ML}(e_i \mid e_{i-n+1}^{i-1}) + a\, P_{ML}(e_i \mid e_{i-n+2}^{i-1})$
Context “President was”: elected 5, the 3, in 3, First 3, … (52 kinds, total 110)
Context “President Ronald”: Reagan 38, Caza 1, Venetiaan 1 (3 kinds, total 40)
Witten-Bell smoothing
An unseen word is unlikely after the context “President Ronald”, so $a$ should be small, so that the n-gram is emphasized more:
$P(e_i \mid e_{i-n+1}^{i-1}) = (1 - a)\, P_{ML}(e_i \mid e_{i-n+1}^{i-1}) + a\, P_{ML}(e_i \mid e_{i-n+2}^{i-1})$
Context “President was”: elected 5, the 3, in 3, First 3, … (52 kinds, total 110)
Context “President Ronald”: Reagan 38, Caza 1, Venetiaan 1 (3 kinds, total 40)
Idea of Witten-Bell smoothing
If you only have a single coefficient $a$ to adjust, you cannot take each word's context into account.
→ Why not use a different $a$ for each context?
Witten-Bell smoothing, precisely
Simple smoothing:
$P(e_i \mid e_{i-n+1}^{i-1}) = (1 - a)\, P_{ML}(e_i \mid e_{i-n+1}^{i-1}) + a\, P_{ML}(e_i \mid e_{i-n+2}^{i-1})$
Witten-Bell smoothing:
$P_{WB}(e_i \mid e_{i-n+1}^{i-1}) = \big(1 - a(e_{i-n+1}^{i-1})\big)\, P_{ML}(e_i \mid e_{i-n+1}^{i-1}) + a(e_{i-n+1}^{i-1})\, P_{ML}(e_i \mid e_{i-n+2}^{i-1})$
$a(e_{i-n+1}^{i-1}) = \dfrac{u(e_{i-n+1}^{i-1}, *)}{u(e_{i-n+1}^{i-1}, *) + c(e_{i-n+1}^{i-1})}$
Witten-Bell smoothing, precisely
$a(e_{i-n+1}^{i-1}) = \dfrac{u(e_{i-n+1}^{i-1}, *)}{u(e_{i-n+1}^{i-1}, *) + c(e_{i-n+1}^{i-1})}$
$u(e_{i-n+1}^{i-1}, *)$ represents how many kinds of words continue after the context $e_{i-n+1}^{i-1}$:
$u(\text{President was}, *) = 52$
$u(\text{President Ronald}, *) = 3$
Context “President was”: elected 5, the 3, in 3, First 3, … (52 kinds, total 110)
Context “President Ronald”: Reagan 38, Caza 1, Venetiaan 1 (3 kinds, total 40)
Witten-Bell smoothing, precisely
$a(e_{i-n+1}^{i-1}) = \dfrac{u(e_{i-n+1}^{i-1}, *)}{u(e_{i-n+1}^{i-1}, *) + c(e_{i-n+1}^{i-1})}$
$a(\text{President was}) = \dfrac{52}{110 + 52} = 0.32$
$a(\text{President Ronald}) = \dfrac{3}{40 + 3} = 0.07$
Context “President was”: elected 5, the 3, in 3, First 3, … (52 kinds, total 110)
Context “President Ronald”: Reagan 38, Caza 1, Venetiaan 1 (3 kinds, total 40)
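A sketch computing the Witten-Bell coefficient from follower counts that reproduce the slide's table; only the named followers come from the slide, and "w5" … "w52" are fillers standing in for the 48 kinds summarized there as "…":

```python
from collections import Counter

followers = {
    ("President", "was"): Counter(
        {"elected": 5, "the": 3, "in": 3, "First": 3,
         **{f"w{k}": 2 for k in range(5, 53)}}  # 48 filler kinds, 96 tokens
    ),
    ("President", "Ronald"): Counter({"Reagan": 38, "Caza": 1, "Venetiaan": 1}),
}

def a_wb(ctx):
    # a(ctx) = u(ctx, *) / (u(ctx, *) + c(ctx))
    u = len(followers[ctx])           # u(ctx, *): distinct follower kinds
    c = sum(followers[ctx].values())  # c(ctx): total follower tokens
    return u / (u + c)

print(a_wb(("President", "was")))     # 52 / (52 + 110) = 0.32...
print(a_wb(("President", "Ronald")))  # 3 / (3 + 40) = 0.069...
```

The two printed values match the slide's 0.32 and 0.07.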
Absolute discounting
• Yet another smoothing method
• Unlike Witten-Bell smoothing, which uses $P_{ML}$, it subtracts a constant value $d$ from the frequency of each word in order to estimate the probability:
$P_d(e_i \mid e_0^{i-1}) = \dfrac{\max\big(c_{train}(e_0^i) - d,\ 0\big)}{c_{train}(e_0^{i-1})}$
Absolute discounting
So why subtract? We want to treat low-frequency words like unknown words, because low-frequency counts cannot really be trusted.
By doing this, the (n-1)-gram gets emphasized more.
Absolute discounting
$P_d(e_i \mid e_{i-n+1}^{i-1}) = \dfrac{\max\big(c_{train}(e_{i-n+1}^i) - d,\ 0\big)}{c_{train}(e_{i-n+1}^{i-1})}$
With $d = 0.5$:
$P_d(e_i = \text{Reagan} \mid e_{i-2}^{i-1} = \text{President Ronald}) = \dfrac{38 - 0.5}{40} = 0.9375$
$P_d(e_i = \text{Caza} \mid e_{i-2}^{i-1} = \text{President Ronald}) = \dfrac{1 - 0.5}{40} = 0.0125$
$P_d(e_i = \text{Venetiaan} \mid e_{i-2}^{i-1} = \text{President Ronald}) = \dfrac{1 - 0.5}{40} = 0.0125$
Context “President was”: elected 5, the 3, in 3, First 3, … (52 kinds, total 110)
Context “President Ronald”: Reagan 38, Caza 1, Venetiaan 1 (3 kinds, total 40)
Absolute discounting
$P_d(e_i = \text{Reagan} \mid e_{i-2}^{i-1} = \text{President Ronald}) = 0.9375$
$P_d(e_i = \text{Caza} \mid e_{i-2}^{i-1} = \text{President Ronald}) = 0.0125$
$P_d(e_i = \text{Venetiaan} \mid e_{i-2}^{i-1} = \text{President Ronald}) = 0.0125$
The leftover probability mass becomes the weight of the (n-1)-gram model:
$a(e_{i-n+1}^{i-1}) = 1 - (0.9375 + 0.0125 + 0.0125) = 0.0375$
An efficient way of computing this is the following:
$a(e_{i-n+1}^{i-1}) = \dfrac{u(e_{i-n+1}^{i-1}, *) \times d}{c(e_{i-n+1}^{i-1})}$
Absolute discounting
Now that we do not use maximum likelihood, the n-gram probability is estimated as follows:
$P(e_i \mid e_{i-n+1}^{i-1}) = P_d(e_i \mid e_{i-n+1}^{i-1}) + a(e_{i-n+1}^{i-1})\, P(e_i \mid e_{i-n+2}^{i-1})$
This is quite similar to linear interpolation, but differs in that absolute discounting uses $P_d$:
$P(e_i \mid e_{i-n+1}^{i-1}) = (1 - a)\, P_{ML}(e_i \mid e_{i-n+1}^{i-1}) + a\, P_{ML}(e_i \mid e_{i-n+2}^{i-1})$
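A sketch of absolute discounting with $d = 0.5$ on the slide's "President Ronald" counts; `P_lower` stands in for whatever (n-1)-gram model receives the leftover mass and is an assumption here:

```python
from collections import Counter

d = 0.5
followers = {("President", "Ronald"): Counter({"Reagan": 38, "Caza": 1, "Venetiaan": 1})}

def P_disc(w, ctx):
    # max(c(ctx, w) - d, 0) / c(ctx)
    return max(followers[ctx][w] - d, 0) / sum(followers[ctx].values())

def a_ad(ctx):
    # leftover mass: u(ctx, *) * d / c(ctx)
    return len(followers[ctx]) * d / sum(followers[ctx].values())

def P_abs(w, ctx, P_lower):
    return P_disc(w, ctx) + a_ad(ctx) * P_lower(w)

ctx = ("President", "Ronald")
print(P_disc("Reagan", ctx))                    # (38 - 0.5) / 40 = 0.9375
print(a_ad(ctx))                                # 3 * 0.5 / 40 = 0.0375
print(P_abs("Nixon", ctx, lambda w: 1 / 1000))  # unseen word, still nonzero
```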
Kneser-Ney smoothing
• Achieves excellent performance
• Similar to absolute discounting
• Pays attention to words that only appear in specific contexts
Kneser-Ney smoothing
A lower-order model is needed only when the count in the higher-order model is small.
Suppose “San Francisco” is common, but “Francisco” appears only after “San”.
Then both “San” and “Francisco” get a high unigram probability,
but we want to give “Francisco” a low unigram probability!
Kneser-Ney smoothing
Kneser-Ney is defined as follows, with $u(*, \cdot)$ counting distinct preceding words:
$P_{kn}(e_i \mid e_{i-n+1}^{i-1}) = \dfrac{\max\big(u(*, e_{i-n+2}^{i}) - d,\ 0\big)}{u(*, e_{i-n+2}^{i-1}, *)}$
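A sketch of the Kneser-Ney continuation idea at the unigram level: score a word by how many distinct words precede it rather than by raw frequency. The toy corpus is an illustrative assumption:

```python
from collections import Counter

corpus = [["San", "Francisco"], ["San", "Francisco"], ["San", "Francisco"],
          ["eat", "San"], ["near", "San"], ["the", "San"]]

preceding = Counter()  # u(*, w): number of distinct words seen before w
seen = set()
for sent in corpus:
    tokens = ["<bos>"] + sent
    for prev, w in zip(tokens, tokens[1:]):
        if (prev, w) not in seen:  # count each (prev, w) pair only once
            seen.add((prev, w))
            preceding[w] += 1

def P_cont(w):
    # continuation probability: u(*, w) / sum over w' of u(*, w')
    return preceding[w] / sum(preceding.values())

print(P_cont("San"))        # 0.5: "San" follows four different words
print(P_cont("Francisco"))  # 0.125: "Francisco" only ever follows "San"
```

Even though "Francisco" is frequent in this corpus, its continuation probability stays low, which is exactly the behavior the slide asks for.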
Unknown words
Even though smoothing reduces the chance of $P(e) = 0$, the possibility of getting 0 still remains.
We may give unknown words a probability as follows, with $V$ the vocabulary size:
$P_{unk}(e_i) = \dfrac{1}{V}$
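A tiny sketch of reserving uniform probability for unknown words; the vocabulary size $V$ and the mixing weight are assumptions, since the slide only gives $P_{unk}(e_i) = 1/V$:

```python
V = 1_000_000   # assumed vocabulary size
a_unk = 0.01    # assumed weight reserved for the unknown-word model

def P_word(p_smoothed):
    # mix the smoothed n-gram probability with the uniform unknown model
    return (1 - a_unk) * p_smoothed + a_unk * (1 / V)

print(P_word(0.0))  # never exactly 0, even for an unseen word
```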