### [Book Reading] 機械翻訳 - Section 3 No.1

• 1. Language Model — MT study meeting, 5/21, Hiroyuki Fudaba
• 2. How can you tell whether a sentence is natural or not? e1 = "he is dog", e2 = "is big he", e3 = "this is a purple dog"
• 3. How can you tell whether a sentence is natural or not? e1 = "he is dog" (correct), e2 = "is big he" (grammatically wrong), e3 = "this is a purple dog" (semantically wrong)
• 4. Language model probability — We want to treat "naturalness" statistically, so we represent it with the language model probability P(e): P(e = "he is big") = 0.7, P(e = "is big he") = 0.3, P(e = "this is a purple dog") = 0.5
• 5. Some ways to estimate P(e): the n-gram model, the positional language model, the factored language model, and the cache language model
• 6. Basis of the n-gram — We write a sentence as e = e_1^I, where I is its length. For e = "he is big": e_1 = he, e_2 = is, e_3 = big, I = 3. We can then define P(e) as: P(e = "he is big") = P(I = 3, e_1 = he, e_2 = is, e_3 = big) = P(e_1 = he, e_2 = is, e_3 = big, e_4 = ⟨eos⟩) = P(e_0 = ⟨bos⟩, e_1 = he, e_2 = is, e_3 = big, e_4 = ⟨eos⟩)
• 7. Estimating P(e) in a simple way — Assuming that natural sentences appear more frequently than unnatural ones, a simple way to estimate P(e) is: bring a big training corpus E_train, count the frequency of each sentence in it, and set P_S(e) = freq(e) / size(E_train) = c_train(e) / Σ_ẽ c_train(ẽ), where c_train(e = "he is big") returns how many training sentences exactly match "he is big".
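
The whole-sentence estimate on slide 7 can be sketched in a few lines of Python; the toy corpus and function names are illustrative, not from the book:

```python
from collections import Counter

def sentence_lm(corpus):
    """Estimate P(e) as the relative frequency of the whole sentence."""
    counts = Counter(corpus)          # c_train(e) for every training sentence
    total = sum(counts.values())      # size(E_train)
    return lambda e: counts[e] / total

# Toy training data: "he is big" occurs twice out of four sentences.
corpus = ["he is big", "he is big", "she is small", "it is red"]
p = sentence_lm(corpus)
print(p("he is big"))   # 0.5
print(p("he is dog"))   # 0.0 -- the zero-probability problem of slide 8
```

Any sentence absent from the corpus gets probability exactly 0, which motivates the word-level decomposition that follows.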
• 8. Problem with the simple estimation — When E_train contains neither e1 nor e2, you cannot say which is more natural: c_train(e1) = c_train(e2) = 0, so P_S(e1) = 0 and P_S(e2) = 0, and two zero probabilities cannot be compared.
• 9. Solution to P(e) = 0 — Rather than treating a sentence as a whole, think of it as data composed of words. Using P(X, Y) = P(X | Y) P(Y): P(e = "he is big") = P(e_1 = he | e_0 = ⟨bos⟩) × P(e_2 = is | e_0 = ⟨bos⟩, e_1 = he) × P(e_3 = big | e_0 = ⟨bos⟩, e_1 = he, e_2 = is) × P(e_4 = ⟨eos⟩ | e_0 = ⟨bos⟩, e_1 = he, e_2 = is, e_3 = big)
• 10. Solution to P(e) = 0 (cont.) — P_S(e) = c_train(e) / Σ_ẽ c_train(ẽ) = P(e_1^I) = Π_{i=1}^{I+1} P_ML(e_i | e_0^{i-1}), where P_ML(e_i | e_0^{i-1}) = c_train(e_0^i) / c_train(e_0^{i-1}). So far P(e_1^I) is exactly equal to P_S(e), so this alone still does not work.
• 11. Idea of the n-gram model — Rather than considering all the words that appear before the current word, consider only the n − 1 words just before it. [Figure: the sentence ⟨bos⟩ he is big ⟨eos⟩ with the full preceding history highlighted]
• 12. Idea of the n-gram model (cont.) — [Figure: the same sentence, now highlighting only the n − 1 preceding words]
• 13. n-gram, precisely — From the previous expression P(e_1^I) = Π_{i=1}^{I+1} P_ML(e_i | e_0^{i-1}), we can approximate P(e) as: P(e_1^I) ≈ Π_{i=1}^{I+1} P_ML(e_i | e_{i-n+1}^{i-1})
• 14. How does this help? With a bigram (n = 2) model: P(e = "he is big") ≈ P(e_i = he | e_{i-1} = ⟨bos⟩) × P(e_i = is | e_{i-1} = he) × P(e_i = big | e_{i-1} = is) × P(e_i = ⟨eos⟩ | e_{i-1} = big). Intuitively, a subsequence appears more often than any sequence containing it, so P(e) estimated with an n-gram model is less likely to be 0.
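
The bigram decomposition on slide 14 corresponds to the following minimal maximum-likelihood sketch (the toy corpus and names are invented for illustration):

```python
from collections import Counter

def bigram_mle(corpus):
    """Maximum-likelihood bigram model with <bos>/<eos> markers."""
    big, uni = Counter(), Counter()
    for sent in corpus:
        words = ["<bos>"] + sent.split() + ["<eos>"]
        for prev, cur in zip(words, words[1:]):
            big[(prev, cur)] += 1   # c_train(prev cur)
            uni[prev] += 1          # c_train(prev)
    return lambda cur, prev: big[(prev, cur)] / uni[prev] if uni[prev] else 0.0

corpus = ["he is big", "he is small", "she is big"]
p = bigram_mle(corpus)
print(p("is", "he"))    # 1.0: "he" is always followed by "is"
print(p("big", "is"))   # 2/3: "is" is followed by "big" in 2 of 3 cases
```
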
• 15. Smoothing the n-gram model — The n-gram model is less likely to estimate P(e) = 0, but it still can → smoothing.
• 16. Idea of smoothing — Combine the probabilities of the n-gram and the (n−1)-gram. Even if the probability of a word w cannot be estimated with the n-gram, there is a chance it can be estimated with the (n−1)-gram: P_3-gram(small | he is) = 0, but P_2-gram(small | is) = 0.03. [Bar chart: estimated probabilities for P(he|⟨bos⟩), P(is|⟨bos⟩ he), P(big|he is), P(small|he is), P(⟨eos⟩|is big)]
• 17. Linear interpolation — The easiest and most basic way to express this idea: P(e_i | e_{i-n+1}^{i-1}) = (1 − α) P_ML(e_i | e_{i-n+1}^{i-1}) + α P_ML(e_i | e_{i-n+2}^{i-1}), with 0 ≤ α ≤ 1. The problem is adjusting α to a good value. So how can we do that?
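
Slide 17's interpolation can be sketched as follows; the trigram and bigram estimates `p3` and `p2` are hypothetical stand-ins that mirror the numbers on slide 16:

```python
def interpolate(p_high, p_low, a):
    """P(w|ctx) = (1 - a) * P_ML(w | full ctx) + a * P_ML(w | shortened ctx)."""
    def p(word, ctx):
        return (1 - a) * p_high(word, ctx) + a * p_low(word, ctx[1:])
    return p

# Hypothetical estimates: P_3gram(small | he is) = 0, P_2gram(small | is) = 0.03.
p3 = lambda w, ctx: 0.0
p2 = lambda w, ctx: 0.03 if (w, ctx) == ("small", ("is",)) else 0.0

p = interpolate(p3, p2, a=0.2)
print(p("small", ("he", "is")))   # ~0.006 = 0.8 * 0 + 0.2 * 0.03, no longer zero
```
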
• 18. Adjusting α to a good value — An easy way to achieve this: bring a dataset different from the training data (held-out data) and select the α that gives that dataset the highest likelihood. Performance can be improved further by considering each context separately.
• 19. Witten-Bell smoothing — How should we choose α if the n-gram counts look like this? Context "President was": elected 5, the 3, in 3, First 3, … (52 kinds, sum 110). Context "President Ronald": Reagan 38, Caza 1, Venetiaan 1 (3 kinds, sum 40).
• 20. Witten-Bell smoothing — An unknown word is likely to appear after the context "President was", so α should be large, emphasizing the (n−1)-gram in P(e_i | e_{i-n+1}^{i-1}) = (1 − α) P_ML(e_i | e_{i-n+1}^{i-1}) + α P_ML(e_i | e_{i-n+2}^{i-1}).
• 21. Witten-Bell smoothing — An unknown word is unlikely to appear after the context "President Ronald" (it is almost always followed by "Reagan"), so α should be small, emphasizing the n-gram.
• 22. Idea of Witten-Bell smoothing — If you have only a single coefficient α to adjust, you cannot take the context of each word into account → why not use a different α for each context?
• 23. Witten-Bell smoothing, precisely — Simple smoothing: P(e_i | e_{i-n+1}^{i-1}) = (1 − α) P_ML(e_i | e_{i-n+1}^{i-1}) + α P_ML(e_i | e_{i-n+2}^{i-1}). Witten-Bell smoothing: P_WB(e_i | e_{i-n+1}^{i-1}) = (1 − α(e_{i-n+1}^{i-1})) P_ML(e_i | e_{i-n+1}^{i-1}) + α(e_{i-n+1}^{i-1}) P_ML(e_i | e_{i-n+2}^{i-1}), where α(e_{i-n+1}^{i-1}) = u(e_{i-n+1}^{i-1}, ∗) / (u(e_{i-n+1}^{i-1}, ∗) + c(e_{i-n+1}^{i-1})).
• 24. Witten-Bell smoothing, precisely (cont.) — In α(e_{i-n+1}^{i-1}) = u(e_{i-n+1}^{i-1}, ∗) / (u(e_{i-n+1}^{i-1}, ∗) + c(e_{i-n+1}^{i-1})), the term u(e_{i-n+1}^{i-1}, ∗) represents how many kinds of words continue after the context e_{i-n+1}^{i-1}: u(President was, ∗) = 52, u(President Ronald, ∗) = 3.
• 25. Witten-Bell smoothing, precisely (cont.) — α(President was) = 52 / (110 + 52) = 0.32, α(President Ronald) = 3 / (40 + 3) = 0.07.
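
The Witten-Bell coefficient α(context) of slides 23–25 can be computed directly from raw n-gram counts; the data below rebuilds the "President Ronald" column of the slide-19 table:

```python
from collections import Counter, defaultdict

def witten_bell_alpha(ngrams):
    """alpha(ctx) = u(ctx, *) / (u(ctx, *) + c(ctx)), from observed n-grams."""
    followers = defaultdict(Counter)
    for *ctx, word in ngrams:
        followers[tuple(ctx)][word] += 1
    def alpha(ctx):
        f = followers[tuple(ctx)]
        u = len(f)            # u(ctx, *): distinct continuations
        c = sum(f.values())   # c(ctx): total count of the context
        return u / (u + c)
    return alpha

# "President Ronald": Reagan 38, Caza 1, Venetiaan 1 (3 kinds, sum 40).
ngrams = ([("President", "Ronald", "Reagan")] * 38
          + [("President", "Ronald", "Caza"),
             ("President", "Ronald", "Venetiaan")])
alpha = witten_bell_alpha(ngrams)
print(round(alpha(("President", "Ronald")), 2))   # 0.07 = 3 / (40 + 3)
```
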
• 26. Absolute discounting — Yet another smoothing method. Unlike Witten-Bell smoothing, which uses P_ML, it subtracts a constant d from the frequency of each word in order to estimate the probability: P_d(e_i | e_0^{i-1}) = max(c_train(e_0^i) − d, 0) / c_train(e_0^{i-1})
• 27. Absolute discounting — So why subtract? We want to treat low-frequency words like unknown words, because low counts cannot really be trusted. By subtracting, the (n−1)-gram gets more emphasis.
• 28. Absolute discounting — P_d(e_i | e_{i-n+1}^{i-1}) = max(c_train(e_{i-n+1}^i) − d, 0) / c_train(e_{i-n+1}^{i-1}). With d = 0.5 for the context "President Ronald": P_d(Reagan | President Ronald) = (38 − 0.5) / 40 = 0.9375, P_d(Caza | President Ronald) = (1 − 0.5) / 40 = 0.0125, P_d(Venetiaan | President Ronald) = (1 − 0.5) / 40 = 0.0125
• 29. Absolute discounting — The discounted mass becomes the interpolation weight: α(President Ronald) = 1 − (0.9375 + 0.0125 + 0.0125) = 0.0375. An efficient way to compute this is α(e_{i-n+1}^{i-1}) = u(e_{i-n+1}^{i-1}, ∗) × d / c(e_{i-n+1}^{i-1}).
• 30. Absolute discounting — Since we no longer use the maximum likelihood estimate directly, the n-gram probability is estimated as P(e_i | e_{i-n+1}^{i-1}) = P_d(e_i | e_{i-n+1}^{i-1}) + α(e_{i-n+1}^{i-1}) P(e_i | e_{i-n+2}^{i-1}). This is quite similar to simple interpolation, P(e_i | e_{i-n+1}^{i-1}) = (1 − α) P_ML(e_i | e_{i-n+1}^{i-1}) + α P_ML(e_i | e_{i-n+2}^{i-1}), but differs in that absolute discounting uses P_d.
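
A minimal sketch of absolute discounting with interpolation, following slides 28–30; the lower-order model here is a hypothetical uniform distribution, purely for illustration:

```python
from collections import Counter, defaultdict

def absolute_discount(ngrams, d, p_lower):
    """P(w|ctx) = max(c(ctx,w) - d, 0)/c(ctx) + alpha(ctx) * p_lower(w)."""
    followers = defaultdict(Counter)
    for *ctx, word in ngrams:
        followers[tuple(ctx)][word] += 1
    def p(word, ctx):
        f = followers[tuple(ctx)]
        c = sum(f.values())
        if c == 0:                      # unseen context: back off entirely
            return p_lower(word)
        alpha = len(f) * d / c          # alpha(ctx) = u(ctx,*) * d / c(ctx)
        return max(f[word] - d, 0) / c + alpha * p_lower(word)
    return p

# Slide-28 counts after "President Ronald", d = 0.5, and a dummy
# lower-order model: uniform over a hypothetical 10-word vocabulary.
ngrams = ([("President", "Ronald", "Reagan")] * 38
          + [("President", "Ronald", "Caza"),
             ("President", "Ronald", "Venetiaan")])
p = absolute_discount(ngrams, d=0.5, p_lower=lambda w: 0.1)
print(p("Reagan", ("President", "Ronald")))   # ~0.94125 = 0.9375 + 0.0375 * 0.1
```
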
• 31. Kneser-Ney smoothing — Achieves excellent performance. Similar to absolute discounting, but takes a special interest in words that appear only in specific contexts.
• 32. Kneser-Ney smoothing — The lower-order model is needed only when the count in the higher-order model is small. Suppose "San Francisco" is common, but "Francisco" appears only after "San". Then both "San" and "Francisco" get a high unigram probability, but we want to give "Francisco" a low unigram probability!
• 33. Kneser-Ney smoothing — The lower-order Kneser-Ney distribution counts distinct contexts instead of raw frequencies: P_kn(e_i | e_{i-n+2}^{i-1}) = max(u(∗, e_{i-n+2}^i) − d, 0) / u(∗, e_{i-n+2}^{i-1}), where u(∗, ·) is the number of distinct words that precede the given sequence.
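
The continuation-count idea behind the lower-order Kneser-Ney estimate (the "San Francisco" example on slide 32) can be sketched as follows; the bigram data is invented for illustration:

```python
from collections import defaultdict

def kn_unigram(bigrams, d=0.5):
    """Lower-order KN estimate: count distinct left contexts, not raw frequency."""
    left = defaultdict(set)
    for prev, word in bigrams:
        left[word].add(prev)               # u(*, word): who precedes this word?
    total = sum(len(s) for s in left.values())   # number of distinct bigram types
    return lambda w: max(len(left[w]) - d, 0) / total

# "Francisco" only ever follows "San", however frequent the pair is;
# "San" itself follows many different words.
bigrams = ([("San", "Francisco")] * 50
           + [("in", "San"), ("to", "San"), ("near", "San"), ("visit", "San")])
p = kn_unigram(bigrams)
print(p("Francisco") < p("San"))   # True: "Francisco" gets a low continuation probability
```

Despite 50 occurrences of the bigram, "Francisco" has only one distinct left context, so its continuation probability stays low, exactly as slide 32 wants.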
• 34. Unknown words — Even though smoothing reduces the chance of P(e) = 0, the possibility of getting 0 still remains. We can give unknown words some probability as follows: P_unk(e_i) = 1 / V, where V is the vocabulary size.