2.3.3 BLEU
2.3.4 METEOR
2.3.5 RIBES
2.3.6 Meta Evaluation
2.4 Statistical Testing
MT study / May 14, 2015
Seitaro Shinagawa, AHC-lab
機械翻訳 (Machine Translation), Chapter 2
2.3.3 BLEU
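BLEU evaluates the n-gram matching rate between r (reference) and e (translated text).
$$m_n(r, e) = |\{g_n \in e\} \cap \{g_n \in r\}|$$
$$c_n(e) = |\{g_n \in e\}|$$
・$m_n(r, e)$ : the number of n-gram matches between the reference and the translated text
・$c_n(e)$ : the number of n-grams in e
☆ N-gram position is ignored.
Calculate the geometric mean of the matching rates from 1-gram to 4-gram.
$$\mathrm{BLEU}(R, E) = \left( \prod_{n=1}^{4} \frac{\sum_{i=1}^{N} m_n(\{r_1^{(i)}, \ldots, r_M^{(i)}\}, e^{(i)})}{\sum_{i=1}^{N} c_n(e^{(i)})} \right)^{1/4} \cdot \mathrm{BP}(R, E) \qquad (2.9)$$
・$\mathrm{BP}(R, E)$ : brevity penalty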
$$m_n(\{r_1, \ldots, r_M\}, e) = \left| \{g_n \in e\} \cap \bigcup_{j=1}^{M} \{g_n \in r_j\} \right|$$
・If you have M references per e, choose the maximum of $m_n(r_j, e)$:
$$m_n(\{r_1, \ldots, r_M\}, e) = \max(m_n(r_1, e), m_n(r_2, e), \ldots, m_n(r_M, e))$$
2.3.3 BLEU example
r1 : I’d like to stay there for five nights , from June sixth .
r2 : I want to stay for five nights , from June sixth .
e : Well , I’d like to stay five nights beginning June sixth .

n (n-gram)       1    2    3    4
m_n(r1, e)      11    7    4    2
m_n(r2, e)       9    4    1    0
c_n(e)          13   12   11   10

For i = 1, r1 is accepted (※): |r2| < |r1| < |e|
$$\mathrm{BLEU}(\{r_1, r_2\}, e) = \mathrm{BLEU}(r_1, e) = \left( \frac{11}{13} \cdot \frac{7}{12} \cdot \frac{4}{11} \cdot \frac{2}{10} \right)^{1/4} \cdot \mathrm{BP}(r_1, e) \cong 0.4353 \cdot \mathrm{BP}(r_1, e)$$
※ : Choose the one whose length is close to e (and short).
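The counts in the example can be checked with a short script. This is only a sketch: the helper names `ngrams` and `match_count` are made up for illustration, the sentences are tokenized by hand with "I'd" split into "I" and "'d", and repeated n-grams are clipped by the reference count, which is a common BLEU convention assumed here rather than taken from the slides. The brevity penalty is left out.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_count(ref, hyp, n):
    """m_n(r, e): n-grams of e that also occur in r (clipped counts, an assumption)."""
    ref_counts = Counter(ngrams(ref, n))
    hyp_counts = Counter(ngrams(hyp, n))
    return sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())

# Hand-tokenized sentences from the example.
r1 = "I 'd like to stay there for five nights , from June sixth .".split()
r2 = "I want to stay for five nights , from June sixth .".split()
e = "Well , I 'd like to stay five nights beginning June sixth .".split()

m_r1 = [match_count(r1, e, n) for n in range(1, 5)]
m_r2 = [match_count(r2, e, n) for n in range(1, 5)]
c = [len(ngrams(e, n)) for n in range(1, 5)]

# With several references, take the larger match count for each n.
m = [max(a, b) for a, b in zip(m_r1, m_r2)]

# Geometric mean of the four matching rates (BLEU without the brevity penalty).
geo_mean = 1.0
for m_n, c_n in zip(m, c):
    geo_mean *= m_n / c_n
geo_mean **= 0.25

print(m_r1, m_r2, c, round(geo_mean, 4))
```

Because r1 gives the larger match count for every n, the maximum over the two references reduces to the r1 row of the table.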
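2.3.3 brevity penalty
$$\mathrm{BP}(R, E) = \min\left(1, \exp\left(1 - \frac{\sum_{i=1}^{N} |\tilde{r}^{(i)}|}{\sum_{i=1}^{N} |e^{(i)}|}\right)\right) \qquad (2.10)$$
・$\sum_{i=1}^{N} |e^{(i)}| \ll \sum_{i=1}^{N} |\tilde{r}^{(i)}|$ : $\mathrm{BP}(R, E) \cong 0$
・$\sum_{i=1}^{N} |e^{(i)}| > \sum_{i=1}^{N} |\tilde{r}^{(i)}|$ : $\mathrm{BP}(R, E) = 1$
・$\sum_{i=1}^{N} |e^{(i)}| \cong \sum_{i=1}^{N} |\tilde{r}^{(i)}|$ : $\mathrm{BP}(R, E) \cong 1$
BP penalizes translated text that is too short relative to the reference.
・$\tilde{r}^{(i)}$ : the reference chosen for $e^{(i)}$ (see ※ in the BLEU example)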
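The three cases of the brevity penalty can be illustrated numerically. The function below is a minimal sketch of (2.10); the name `brevity_penalty`, the handling of an empty translation, and the sentence lengths are all assumptions chosen for illustration.

```python
import math

def brevity_penalty(ref_lengths, hyp_lengths):
    """BP(R, E) = min(1, exp(1 - sum|r~| / sum|e|)), summed over the test set."""
    total_ref = sum(ref_lengths)
    total_hyp = sum(hyp_lengths)
    if total_hyp == 0:
        return 0.0  # assumption: an empty translation receives the full penalty
    return min(1.0, math.exp(1.0 - total_ref / total_hyp))

# Hypothetical lengths of the chosen references and of the translations.
too_short = brevity_penalty([20, 20], [2, 3])     # translations far too short
longer = brevity_penalty([20, 20], [25, 22])      # translations longer than references
similar = brevity_penalty([20, 20], [19, 20])     # about the same length

print(too_short, longer, similar)
```

The penalty is computed from corpus-level totals, so a single short sentence does not necessarily lower the score if the other translations are long enough.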
2.3.4 METEOR
Using BLEU naively is problematic. (※ref -> 134)
・Lack of recall: the brevity penalty does not adequately compensate for the lack of recall. [Lavie 2004]
・Fluency and grammaticality are measured only indirectly: explicit word matching is required.
・Geometric averaging: the score becomes zero whenever one of the component n-gram scores is zero.
METEOR (Metric for Evaluation of Translation with Explicit Ordering) addresses these problems.
2.3.4 METEOR
r : I 'd like to stay there for five nights , from June sixth . (14 words)
e : Well , I 'd like to stay five nights beginning June sixth . (13 words)
For explicit word matching, take an alignment between r and e.
Ex) 11 alignments
・Precision = (the number of words aligned) / (the number of words in e) = 11 / 13
・Recall = (the number of words aligned) / (the number of words in r) = 11 / 14
・F-measure (2.11): if α = 0.5, it is the harmonic mean of precision and recall.
A harmonic mean is desirable for METEOR: both high precision and high recall are essential.
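The precision, recall, and F-measure of the example can be worked through in a few lines. The exact form of (2.11) is not recoverable from the slide text, so this sketch assumes the usual METEOR parameterization F = P·R / (α·P + (1 − α)·R), which reduces to the harmonic mean when α = 0.5. The function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall (assumed METEOR form)."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return precision * recall / (alpha * precision + (1.0 - alpha) * recall)

aligned = 11          # words aligned between r and e
words_in_e = 13
words_in_r = 14

precision = aligned / words_in_e
recall = aligned / words_in_r
f = f_measure(precision, recall, alpha=0.5)

print(round(precision, 4), round(recall, 4), round(f, 4))
```

Unlike an arithmetic mean, the harmonic mean stays low when either precision or recall is low, which is why both need to be high for a good score.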
2.3.4 METEOR
・FP : fragmentation penalty (2.11)
・Groups : the number of groups of sequential words
Ex)
r : I 'd like to stay there for five nights , from June sixth .
e : Well [ , ](1) [ I 'd like to stay ](2) [ five nights ](3) beginning [ June sixth . ](4)
Summary of METEOR
・High precision and high recall are desirable.
・FP favors alignments that divide the text into few, long groups of sequential words.
・The hyperparameters α, β, γ need to be tuned.
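Counting the groups of sequential words can be sketched as follows. The alignment is written out by hand as (position in e, position in r) pairs for the example sentences. The penalty form γ·(groups / aligned words)^β and the values γ = 0.5, β = 3.0 are assumptions based on the commonly cited METEOR definition, not values taken from the slides; `count_chunks` and `fragmentation_penalty` are hypothetical names.

```python
def count_chunks(alignment):
    """Count groups of aligned words that are contiguous in both e and r."""
    chunks = 0
    prev = None
    for e_pos, r_pos in sorted(alignment):
        # A new group starts whenever either side stops being consecutive.
        if prev is None or e_pos != prev[0] + 1 or r_pos != prev[1] + 1:
            chunks += 1
        prev = (e_pos, r_pos)
    return chunks

def fragmentation_penalty(chunks, matches, gamma=0.5, beta=3.0):
    """Assumed form: gamma * (chunks / matches) ** beta."""
    return gamma * (chunks / matches) ** beta

# (position in e, position in r), 1-based, for the 11 aligned words.
alignment = [(2, 10), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5),
             (8, 8), (9, 9), (11, 12), (12, 13), (13, 14)]

chunks = count_chunks(alignment)
fp = fragmentation_penalty(chunks, len(alignment))
print(chunks, round(fp, 4))
```

Fewer, longer groups give a smaller penalty, so a translation that preserves long stretches of the reference word order is scored higher.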
2.3.5 RIBES
Using BLEU naively is problematic for scoring Japanese-to-English translation. (※ref -> 111)
RIBES (Rank-based Intuitive Bilingual Evaluation Score) addresses this problem.
http://www.researchgate.net/profile/Katsuhito_Sudoh/publication/221012636_Automatic_Evaluation_of_Translation_Quality_for_Distant_Language_Pairs/links/00b4952d8d9f8ab140000000.pdf
2.3.5 RIBES
Ex)
Position number in r:
r : I(1) 'd(2) like(3) to(4) stay(5) there(6) for(7) five(8) nights(9) ,(10) from(11) June(12) sixth(13) .(14)
Positions aligned by r:
e : Well(-) ,(10) I(1) 'd(2) like(3) to(4) stay(5) five(8) nights(9) beginning(-) June(12) sixth(13) .(14)
Rank vector h = (8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11)
Scoring uses a rank correlation coefficient, which suits translation between languages that require extensive reordering.
Rank vector h -> rank correlation coefficient (Spearman's ρ or Kendall's τ) -> the coefficient is used as the score.
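The step from aligned positions to the rank vector can be sketched in a few lines. The function name `rank_vector` is hypothetical, and the sketch assumes a one-to-one alignment, so that every aligned position in r is distinct.

```python
def rank_vector(aligned_positions):
    """Replace each aligned reference position by its rank among all aligned positions."""
    order = sorted(aligned_positions)
    rank = {pos: i + 1 for i, pos in enumerate(order)}
    return [rank[pos] for pos in aligned_positions]

# Reference positions of the aligned words, listed in the word order of e.
aligned = [10, 1, 2, 3, 4, 5, 8, 9, 12, 13, 14]
h = rank_vector(aligned)
print(h)
```

Unaligned words in e ("Well", "beginning") do not appear in the vector, which is why h has 11 entries although e has 13 words.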
Spearman's ρ (2.13)
・Calculate the distance between h and k = (1, 2, ..., |h|).
Kendall's τ (2.14)
・For each pair (h_k, h_k') with k < k': if h_k < h_k', then return 1 (○ in the table below; × otherwise).
If rank vector h = (8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11) is given:

h_k \ h_k'   8    1    2    3    4    ⋯
8                 ×    ×    ×    ×
1                      ○    ○    ○
2                           ○    ○
3                                ○
4
⋮

・h = (h_1, h_2, ..., h_|h|), h_k ∈ h (k = 1, 2, ..., |h|)
・|h| : length of h
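Both coefficients can be computed for the example rank vector. The slide equations (2.13) and (2.14) did not survive extraction, so this sketch assumes the textbook definitions: Spearman's ρ = 1 − 6·Σ(h_k − k)² / (|h|(|h|² − 1)) and Kendall's τ = 2·(number of increasing pairs) / (number of pairs) − 1. The function names are hypothetical and the rank vector is assumed to have at least two elements without ties.

```python
from itertools import combinations

def spearman_rho(h):
    """Spearman's rho between h and the identity ranking (1, 2, ..., |h|)."""
    n = len(h)
    d2 = sum((h_k - k) ** 2 for k, h_k in enumerate(h, start=1))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def kendall_tau(h):
    """Kendall's tau: share of pairs (h_k, h_k') with k < k' that are in increasing order."""
    pairs = list(combinations(h, 2))  # pairs keep their original order in h
    increasing = sum(1 for a, b in pairs if a < b)
    return 2.0 * increasing / len(pairs) - 1.0

h = [8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11]
rho = spearman_rho(h)
tau = kendall_tau(h)
print(round(rho, 4), round(tau, 4))
```

For this rank vector the two coefficients come out the same, since only one element is out of place.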
RIBES score (2.15), in a Spearman variant and a Kendall variant
・h(r, e) : rank vector aligned by r and e
・Brevity penalty: |e| ≅ |r| is better.
・|h(r, e)| = |e| is desirable (∵ |h(r, e)| ≤ |e|).
Summary of RIBES
・A rank correlation coefficient is useful for bilingual translation.
・The Spearman score is almost equal to the Kendall score.
・The hyperparameters α, β need to be tuned.
2.3.6 Meta Evaluation of Automatic Evaluation
A good automatic evaluation correlates with human evaluation.

Human evaluation        x_1  x_2  x_3  x_4  ⋯  x_{S-1}  x_S
Automatic evaluation    y_1  y_2  y_3  y_4  ⋯  y_{S-1}  y_S

Assuming that score samples x_s, y_s (s = 1, 2, ..., S) are given, calculate the Pearson product-moment correlation coefficient (2.19).
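The meta-evaluation step can be sketched with the standard definition of the Pearson product-moment correlation coefficient (covariance divided by the product of the standard deviations). The score lists below are invented purely for illustration, and `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

human = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical human scores x_s
metric = [0.10, 0.25, 0.30, 0.45, 0.50]  # hypothetical automatic scores y_s

corr = pearson(human, metric)
print(round(corr, 4))
```

A value close to 1 would indicate that the automatic metric ranks the outputs much as the human evaluators do.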
2.4 Statistical Testing
How can we judge which evaluation is the best?
・Scores may differ depending on the system or the evaluators.
・Our test resources (data, human evaluators) are limited.
-> Statistical testing problem: calculate a confidence interval.
"A score falls outside the confidence interval with probability p."
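Bootstrapping
Make N test sets from the whole set of texts as below.
Ex) From 200 texts, randomly choose 100 texts N times (1st, 2nd, ..., Nth test set).
Translate each test set with the statistical machine translation system and get the scores S = (s_1, s_2, ..., s_N).
After sorting S in ascending order, delete the extreme scores: N·p/2 scores from each end. The remaining range is the confidence interval.
s_1 s_2 | s_3 ・・・ s_{N-2} | s_{N-1} s_N -> confidence interval <s_3, s_{N-2}>
Assuming p = 0.05 and N = 1000, delete 25 scores from each end (50 in total).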
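The bootstrapping procedure can be sketched as below. Several things here are assumptions made for illustration: per-text scores stand in for real translations, the test-set score is taken to be the mean of the per-text scores (a real metric such as BLEU would be computed on the whole set), and texts are drawn with replacement, which the slides do not specify. The name `bootstrap_interval` is hypothetical.

```python
import random

def bootstrap_interval(text_scores, n_sets=1000, set_size=100, p=0.05, seed=0):
    """Return the (low, high) confidence interval of the test-set score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sets):
        # Draw one test set; sampling with replacement is an assumption.
        sample = rng.choices(text_scores, k=set_size)
        scores.append(sum(sample) / set_size)  # stand-in for a corpus-level metric
    scores.sort()
    cut = round(n_sets * p / 2)  # number of extreme scores deleted at each end
    kept = scores[cut:n_sets - cut]
    return kept[0], kept[-1]

# 200 hypothetical per-text scores.
data_rng = random.Random(1)
text_scores = [data_rng.random() for _ in range(200)]

low, high = bootstrap_interval(text_scores)
print(round(low, 4), round(high, 4))
```

With p = 0.05 and N = 1000, the 25 lowest and 25 highest scores are dropped and the interval spans the remaining 950.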
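Comparing SMT systems using bootstrapping
Ex) From 200 texts, randomly choose 100 texts N times (1st, 2nd, ..., Nth test set).
Translate each test set with both statistical machine translation systems and get the scores S:
・System a: s_1^(a), s_2^(a), ⋯, s_N^(a)
・System b: s_1^(b), s_2^(b), ⋯, s_N^(b)
・Notation: s_testset^(System)
Win rate of system a: N_a, the proportion of the N test sets on which system a scores higher than system b.
If N_a is over 95%, system a is better than system b with p = 0.05.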
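The comparison can be sketched as a paired bootstrap: both systems are scored on the same resampled test sets and the wins of system a are counted. As before, per-text scores, the mean as test-set score, and sampling with replacement are assumptions for illustration, and `win_rate` is a hypothetical name.

```python
import random

def win_rate(scores_a, scores_b, n_sets=1000, set_size=100, seed=0):
    """Fraction of resampled test sets on which system a scores higher than b."""
    rng = random.Random(seed)
    indices = range(len(scores_a))
    wins = 0
    for _ in range(n_sets):
        # Both systems are evaluated on the same sampled texts.
        sample = rng.choices(indices, k=set_size)
        s_a = sum(scores_a[i] for i in sample) / set_size
        s_b = sum(scores_b[i] for i in sample) / set_size
        if s_a > s_b:
            wins += 1
    return wins / n_sets

# Hypothetical per-text scores: system a is uniformly better by 0.1.
data_rng = random.Random(2)
scores_b = [data_rng.random() for _ in range(200)]
scores_a = [s + 0.1 for s in scores_b]

rate = win_rate(scores_a, scores_b)
print(rate, rate > 0.95)
```

Evaluating both systems on identical samples keeps the comparison paired, so differences caused by easy or hard test sets cancel out.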
References
[Lavie 2004] Lavie, Alon, Kenji Sagae, and Shyamsundar Jayaraman. "The significance
of recall in automatic metrics for MT evaluation." Machine Translation: From Real
Users to Research. Springer Berlin Heidelberg, 2004. 134-143.
111) Isozaki, Hideki, et al. "Automatic evaluation of translation quality for distant
language pairs." Proceedings of the 2010 Conference on Empirical Methods in
Natural Language Processing. Association for Computational Linguistics, 2010.
134) Lavie, Alon, and Michael J. Denkowski. "The METEOR metric for automatic
evaluation of machine translation." Machine translation 23.2-3 (2009): 105-115.
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 

Recently uploaded (20)

Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 

[Book Reading] 機械翻訳 - Section 2 No.2

  • 1. 機械翻訳 (Machine Translation), Chapter 2: 2.3.3 BLEU, 2.3.4 METEOR, 2.3.5 RIBES, 2.3.6 Meta Evaluation, 2.4 Statistical Testing. MT study, May 14, 2015. Seitaro Shinagawa, AHC-lab.
  • 2. 2.3.3 BLEU
      BLEU evaluates the n-gram matching rate between r (reference) and e (translation). ☆ N-gram position is ignored.
      ・The number of n-grams in e: $c_n(e) = |\{g_n \in e\}|$
      ・The number of matches between the reference and the translated text: $m_n(r, e) = |\{g_n \in e\} \cap \{g_n \in r\}|$
      Calculate a geometric mean from 1-gram to 4-gram and multiply it by the brevity penalty BP:
      $$\mathrm{BLEU}(R, E) = \mathrm{BP}(R, E) \cdot \left( \prod_{n=1}^{4} \frac{\sum_{i=1}^{N} m_n(\{r_1^{(i)}, \ldots, r_M^{(i)}\}, e^{(i)})}{\sum_{i=1}^{N} c_n(e^{(i)})} \right)^{1/4} \qquad (2.9)$$
      With M references per e:
      $$m_n(\{r_1, \ldots, r_M\}, e) = \left| \{g_n \in e\} \cap \bigcup_{j=1}^{M} \{g_n \in r_j\} \right|$$
      If you have M references per e, choose the maximum of $m_n(r_j, e)$:
      $$m_n(\{r_1, \ldots, r_M\}, e) = \max\left( m_n(r_1, e), m_n(r_2, e), \ldots, m_n(r_M, e) \right)$$
  • 3. 2.3.3 BLEU example
      r1 : I’d like to stay there for five nights , from June sixth .
      r2 : I want to stay for five nights , from June sixth .
      e : Well , I’d like to stay five nights beginning June sixth .
      n (n-gram order) |  1 |  2 |  3 |  4
      m_n(r1, e)       | 11 |  7 |  4 |  2
      m_n(r2, e)       |  9 |  4 |  1 |  0
      c_n(e)           | 13 | 12 | 11 | 10
      r1 (i = 1) is accepted (※), with |r2| < |r1| < |e|. Applying Eq. (2.9):
      $$\mathrm{BLEU}(\{r_1, r_2\}, e) = \mathrm{BLEU}(r_1, e) = \left( \frac{11}{13} \cdot \frac{7}{12} \cdot \frac{4}{11} \cdot \frac{2}{10} \right)^{1/4} \cdot \mathrm{BP}(r_1, e) \cong 0.4353 \cdot \mathrm{BP}(r_1, e)$$
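The counts in this example can be reproduced with a short sketch. It assumes whitespace tokenization with "I'd" split into "I" and "'d" and the final period split off (the tokenization that gives c_1 = 13). The helper names `ngrams` and `match_count` are illustrative and do not come from the book.

```python
from collections import Counter
from math import prod

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def match_count(refs, hyp, n):
    """m_n: n-grams of hyp that also occur in the pooled references."""
    pooled = Counter()
    for ref in refs:
        pooled |= ngrams(ref, n)  # union keeps the max count per n-gram
    return sum((ngrams(hyp, n) & pooled).values())

# Assumed tokenization: "I'd" -> "I", "'d"; sentence-final "." is a token.
r1 = "I 'd like to stay there for five nights , from June sixth .".split()
r2 = "I want to stay for five nights , from June sixth .".split()
e = "Well , I 'd like to stay five nights beginning June sixth .".split()

m_r1 = [match_count([r1], e, n) for n in range(1, 5)]
m_r2 = [match_count([r2], e, n) for n in range(1, 5)]
c = [sum(ngrams(e, n).values()) for n in range(1, 5)]

# Geometric mean of the four n-gram precisions (BLEU without the BP factor).
precision_part = prod(m / k for m, k in zip(m_r1, c)) ** 0.25

print(m_r1, m_r2, c, round(precision_part, 4))
```

Pooling both references with `match_count([r1, r2], e, n)` gives the same counts as r1 alone here, since r2 contributes no additional matching n-grams.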
  • 4. 2.3.3 brevity penalty
      $$\mathrm{BP}(R, E) = \min\left\{ 1, \exp\left( 1 - \frac{\sum_{i=1}^{N} |\tilde{r}^{(i)}|}{\sum_{i=1}^{N} |e^{(i)}|} \right) \right\} \qquad (2.10)$$
      ・$\sum_{i=1}^{N} |e^{(i)}| \ll \sum_{i=1}^{N} |\tilde{r}^{(i)}|$: $\mathrm{BP}(R, E) \approx 0$
      ・$\sum_{i=1}^{N} |e^{(i)}| > \sum_{i=1}^{N} |\tilde{r}^{(i)}|$: $\mathrm{BP}(R, E) = 1$
      ・$\sum_{i=1}^{N} |e^{(i)}| \cong \sum_{i=1}^{N} |\tilde{r}^{(i)}|$: $\mathrm{BP}(R, E) \cong 1$
      BP penalizes translated text that is too short relative to the reference.
      ※ $\tilde{r}^{(i)}$: choose the reference whose length is closest to that of e (preferring the shorter one).
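A minimal sketch of Eq. (2.10), assuming sentence lengths are already available as integers. The tie-breaking rule in `closest_ref_length` (prefer the shorter reference) is one reading of the "(and short)" note, and both function names are illustrative.

```python
from math import exp

def closest_ref_length(ref_lengths, hyp_length):
    """Pick the reference length closest to the hypothesis length.

    Ties are broken in favor of the shorter reference (an assumption).
    """
    return min(ref_lengths, key=lambda r: (abs(r - hyp_length), r))

def brevity_penalty(ref_lengths, hyp_lengths):
    """BP over a corpus: min(1, exp(1 - sum|r~| / sum|e|))."""
    return min(1.0, exp(1.0 - sum(ref_lengths) / sum(hyp_lengths)))

# Hypothetical lengths illustrating the three cases on the slide.
print(brevity_penalty([100], [10]))   # e much shorter than r: close to 0
print(brevity_penalty([10], [12]))    # e longer than r: exactly 1
print(brevity_penalty([10], [10]))    # same length: exactly 1
```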
  • 5. 2.3.4 METEOR
      There are problems with using BLEU naively (※ ref. 134):
      ・Lack of recall: the brevity penalty does not adequately compensate for the lack of recall [Lavie 2004].
      ・Fluency and grammaticality are measured only indirectly: explicit word matching is required.
      ・Geometric averaging: the score is zero whenever one of the component n-gram scores is zero.
      METEOR (Metric for Evaluation of Translation with Explicit ORdering) addresses these problems.
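The geometric-averaging problem is visible in the BLEU example from slide 3: scored against r2 alone, the 4-gram match count is 0, so the whole product collapses to zero. A small sketch using those counts (the function name is illustrative):

```python
from math import prod

def geometric_mean_precision(matches, totals):
    """Geometric mean of the n-gram precisions m_n / c_n."""
    return prod(m / c for m, c in zip(matches, totals)) ** (1 / len(matches))

c = [13, 12, 11, 10]                                      # c_n(e) from slide 3
against_r1 = geometric_mean_precision([11, 7, 4, 2], c)   # all counts nonzero
against_r2 = geometric_mean_precision([9, 4, 1, 0], c)    # m_4 = 0

print(against_r1, against_r2)
```

Even though 9 of 13 unigrams match r2, the single zero at n = 4 drives the sentence-level score to 0.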
  • 6. 2.3.4 METEOR
      r : I 'd like to stay there for five nights , from June sixth . (14 words)
      e : Well , I 'd like to stay five nights beginning June sixth . (13 words)
      For explicit word matching, take an alignment between r and e. Ex) Here there are 11 alignments.
      F-measure in Eq. (2.11), evaluated with 𝛼 = 0.5:
      ・Precision: the number of aligned words divided by the number of words in e.
      ・Recall: the number of aligned words divided by the number of words in r.
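A sketch of the F-measure for this example. The parameterization F = P·R / (α·P + (1 − α)·R) follows the METEOR paper cited as ref. 134; that the book's Eq. (2.11) uses exactly this form is an assumption, and `f_measure` is an illustrative name.

```python
def f_measure(matches, hyp_len, ref_len, alpha=0.5):
    """Weighted harmonic mean of precision and recall."""
    precision = matches / hyp_len   # aligned words / words in e
    recall = matches / ref_len      # aligned words / words in r
    if precision == 0 or recall == 0:
        return 0.0
    return precision * recall / (alpha * precision + (1 - alpha) * recall)

# 11 alignments, 13 words in e, 14 words in r, alpha = 0.5.
f = f_measure(11, 13, 14, alpha=0.5)
print(round(f, 4))  # 0.8148 (= 22/27)
```

With α = 0.5 this reduces to the ordinary harmonic mean 2PR / (P + R).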
  • 7. The harmonic mean is desirable for METEOR (2.11): both high precision and high recall are essential.
  • 8. 2.3.4 METEOR
      FP: fragmentation penalty in Eq. (2.11), based on the number of groups of sequential words.
      r : I 'd like to stay there for five nights , from June sixth .
      e : Well , I 'd like to stay five nights beginning June sixth .
      Ex) The aligned words of e form four groups: (1) "," (2) "I 'd like to stay" (3) "five nights" (4) "June sixth ."
      Summary of METEOR
      ・High precision and high recall are desirable.
      ・FP favors translations whose aligned words form a few long contiguous groups.
      ・The hyperparameters 𝛼, 𝛽, 𝛾 need to be tuned.
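The group count can be checked with a sketch that takes the alignment as (position in e, position in r) pairs; the r positions are the ones listed on the RIBES example slide. The penalty form γ·(groups / matches)^β follows the METEOR paper cited as ref. 134, and the values γ = 0.5 and β = 3.0 are illustrative defaults, not values from the book.

```python
def count_chunks(alignment):
    """Count maximal runs of pairs that are adjacent in both e and r."""
    chunks = 0
    prev = None
    for e_pos, r_pos in sorted(alignment):
        if prev is None or (e_pos, r_pos) != (prev[0] + 1, prev[1] + 1):
            chunks += 1  # a new group starts here
        prev = (e_pos, r_pos)
    return chunks

def fragmentation_penalty(chunks, matches, gamma=0.5, beta=3.0):
    """Assumed form: gamma * (chunks / matches) ** beta."""
    return gamma * (chunks / matches) ** beta

# 1-based positions: (word index in e, word index in r).
alignment = [(2, 10), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5),
             (8, 8), (9, 9), (11, 12), (12, 13), (13, 14)]

chunks = count_chunks(alignment)
penalty = fragmentation_penalty(chunks, len(alignment))
print(chunks, penalty)
```

Under the same assumption, the final score would be (1 − penalty) times the F-measure, so fewer and longer groups yield a higher score.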
  • 9. 2.3.5 RIBES
      For scoring Japanese-to-English translation, there are problems with using BLEU naively (※ ref. 111). RIBES (Rank-based Intuitive Bilingual Evaluation Score) addresses this problem.
      http://www.researchgate.net/profile/Katsuhito_Sudoh/publication/221012636_Automatic_Evaluation_of_Translation_Quality_for_Distant_Language_Pairs/links/00b4952d8d9f8ab140000000.pdf
  • 10. 2.3.5 RIBES
      To evaluate translation between languages that require extensive word reordering, score with a rank correlation coefficient.
      r : I 'd like to stay there for five nights , from June sixth .
      e : Well , I 'd like to stay five nights beginning June sixth .
      Ex) Position numbers in r: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
      Words of e aligned by r: (10) (1) (2) (3) (4) (5) (8) (9) (12) (13) (14)
      Rank vector 𝒉 = (8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11)
      Rank vector 𝒉 → rank correlation coefficient (Spearman's 𝝆 or Kendall's 𝝉) → use the coefficient as the score.
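The step from aligned positions to the rank vector can be sketched as follows, assuming each aligned word of e maps to a distinct position in r (`rank_vector` is an illustrative name):

```python
def rank_vector(ref_positions):
    """Replace each aligned reference position by its rank (1 = smallest)."""
    rank = {pos: i for i, pos in enumerate(sorted(ref_positions), start=1)}
    return [rank[pos] for pos in ref_positions]

# Reference positions of the aligned words, in the word order of e.
aligned = [10, 1, 2, 3, 4, 5, 8, 9, 12, 13, 14]
h = rank_vector(aligned)
print(h)  # [8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11]
```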
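  • 11. Spearman's 𝝆 and Kendall's 𝝉 (2.13) (2.14)
      𝒉 = (ℎ_1, ℎ_2, ⋯, ℎ_|𝒉|), ℎ_𝑘 ∈ 𝒉 (𝑘 = 1, 2, ⋯, |𝒉|); |𝒉|: length of 𝒉.
      Spearman's 𝝆: calculate the distance between 𝒉 and 𝒌 = (1, 2, ⋯, |𝒉|).
      Kendall's 𝝉: for each pair 𝑘 < 𝑘′, return 1 if ℎ_𝑘 < ℎ_𝑘′.
      If the rank vector 𝒉 = (8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11) is given, the pairwise comparison starts as follows (○: ℎ_𝑘 < ℎ_𝑘′, ×: otherwise):
      ℎ_𝑘 \ ℎ_𝑘′ | 8 | 1 | 2 | 3 | 4 | ⋯
      8          |   | × | × | × | × |
      1          |   |   | ○ | ○ | ○ |
      2          |   |   |   | ○ | ○ |
      3          |   |   |   |   | ○ |
      4          |   |   |   |   |   |
      ⋮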
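A sketch of both coefficients for the example rank vector, using the textbook definitions with values in [-1, 1]. The book's Eqs. (2.13) and (2.14) may be normalized differently (for instance to [0, 1]), so the exact scaling here is an assumption.

```python
from itertools import combinations

def spearman_rho(h):
    """1 - 6 * sum (h_k - k)^2 / (n (n^2 - 1)), with k = 1..n."""
    n = len(h)
    d2 = sum((hk - k) ** 2 for k, hk in enumerate(h, start=1))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(h):
    """2 * (#pairs k < k' with h_k < h_k') / C(n, 2) - 1."""
    n = len(h)
    increasing = sum(1 for a, b in combinations(h, 2) if a < b)
    return 2 * increasing / (n * (n - 1) / 2) - 1

h = [8, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11]
print(spearman_rho(h), kendall_tau(h))
```

For this vector 48 of the 55 pairs are increasing, and both coefficients come out at 41/55 (about 0.745).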
  • 12. RIBES (2.15), in a Spearman variant and a Kendall variant
      𝒉(𝒓, 𝒆): rank vector aligned by r and e.
      ・|𝒉(𝒓, 𝒆)| = |𝒆| is desirable (∵ |𝒉(𝒓, 𝒆)| ≤ |𝒆|).
      ・Brevity penalty: |𝒆| ≅ |𝒓| is better.
      Summary of RIBES
      ・Rank correlation coefficients are useful for evaluating bilingual translation.
      ・The Spearman score is almost equal to the Kendall score.
      ・The hyperparameters 𝛼, 𝛽 need to be tuned.
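As a rough sketch of how the pieces on this slide could be combined, the function below multiplies a rank correlation normalized to [0, 1] by (|h| / |e|)^α and by a brevity penalty raised to β, which is the commonly cited form of RIBES. Whether this matches the book's Eq. (2.15) term by term is an assumption, and the default values of `alpha` and `beta` are illustrative.

```python
from math import exp

def ribes(normalized_corr, n_aligned, hyp_len, ref_len, alpha=0.25, beta=0.10):
    """Assumed form: corr * (|h| / |e|) ** alpha * BP ** beta."""
    precision = n_aligned / hyp_len                  # |h| / |e|, at most 1
    bp = min(1.0, exp(1.0 - ref_len / hyp_len))      # penalizes |e| < |r|
    return normalized_corr * precision ** alpha * bp ** beta

# Example sentence pair: |h| = 11, |e| = 13, |r| = 14.
# 41/55 is the Kendall tau of the example rank vector, mapped to [0, 1].
nkt = (41 / 55 + 1) / 2
score = ribes(nkt, n_aligned=11, hyp_len=13, ref_len=14)
print(score)
```

Both extra factors equal 1 only when every word of e is aligned and e is at least as long as r, which is the sense in which |h(r, e)| = |e| and |e| ≅ |r| are desirable.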
  • 13. 2.3.6 Meta Evaluation of Automatic Evaluation
      A good automatic evaluation metric correlates with human evaluation.
      Human evaluation:     𝑥_1, 𝑥_2, 𝑥_3, 𝑥_4, ⋯, 𝑥_{𝑆−1}, 𝑥_𝑆
      Automatic evaluation: 𝑦_1, 𝑦_2, 𝑦_3, 𝑦_4, ⋯, 𝑦_{𝑆−1}, 𝑦_𝑆
      Assuming that score samples 𝑥_𝑠, 𝑦_𝑠 (𝑠 = 1, 2, ⋯, 𝑆) are given, calculate the Pearson product-moment correlation coefficient (2.19).
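A minimal sketch of the meta-evaluation step using the standard Pearson product-moment formula. The score lists are hypothetical and serve only to show the call.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for S = 5 systems (or sentences).
human = [1, 2, 3, 4, 5]
metric = [0.10, 0.25, 0.30, 0.45, 0.50]
print(pearson(human, metric))
```

A value close to 1 would indicate that the automatic metric ranks outputs much as the human judges do.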
  • 14. 2.4 Statistical Testing
      How can we judge which evaluation is the best?
      Problem:
      ・Scores may differ with another system or other evaluators.
      ・Our test resources (data, humans) are limited.
      Statistical testing: calculate a confidence interval. "A score falls outside the confidence interval with probability p."
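  • 15. Bootstrapping
      Make N test sets from the whole set of texts as follows. Ex) From 200 texts, randomly choose 100 texts for each of the 1st, 2nd, ⋯, Nth test sets.
      Run statistical machine translation on each test set and get the scores 𝑺 = (s_1, s_2, ⋯, s_𝑁).
      After sorting 𝑺 in ascending order, delete the extreme scores: 𝑁 ⋅ 𝑝/2 from the low end and 𝑁 ⋅ 𝑝/2 from the high end. The remaining range is the confidence interval, e.g. <𝑠_3, 𝑠_{𝑁−2}> when two scores are deleted from each end.
      Assuming p = 0.05 and N = 1000, delete 50 scores (25 + 25).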
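The procedure can be sketched as below. Two simplifications are assumptions, not statements from the slide: texts are drawn with replacement, and the score of a test set is taken to be the mean of hypothetical per-text scores (a corpus-level metric such as BLEU would instead be recomputed on each sampled set).

```python
import random

def bootstrap_interval(scores, n_sets=1000, set_size=100, p=0.05, seed=0):
    """Confidence interval from n_sets resampled test sets."""
    rng = random.Random(seed)
    set_scores = sorted(
        sum(rng.choices(scores, k=set_size)) / set_size
        for _ in range(n_sets)
    )
    cut = round(n_sets * p / 2)          # 25 from each end for N=1000, p=0.05
    kept = set_scores[cut:n_sets - cut]  # drop the extreme scores
    return kept[0], kept[-1]

# Hypothetical per-text scores for 200 texts.
per_text_scores = [i / 200 for i in range(200)]
low, high = bootstrap_interval(per_text_scores)
print(low, high)
```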
  • 16. Comparing SMT systems using bootstrapping
      Ex) From 200 texts, randomly choose 100 texts for each of the 1st, 2nd, ⋯, Nth test sets.
      Translate each test set with both systems and get the scores s_1^(a), s_2^(a), ⋯, s_𝑁^(a) and s_1^(b), s_2^(b), ⋯, s_𝑁^(b) (notation: s_testset^(system)).
      Win rate of system a, 𝑁𝑎: the share of the N test sets on which system a scores higher than system b.
      If 𝑁𝑎 is over 95%, system a is better than system b with p = 0.05.
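A sketch of the comparison, under the same assumptions as a simple bootstrap: texts are drawn with replacement, both systems are scored on the same sampled texts, and a test-set score is the sum of hypothetical per-text scores. The name `win_rate` is illustrative.

```python
import random

def win_rate(scores_a, scores_b, n_sets=1000, set_size=100, seed=0):
    """Fraction of resampled test sets on which system a beats system b."""
    rng = random.Random(seed)
    ids = range(len(scores_a))
    wins = 0
    for _ in range(n_sets):
        sample = rng.choices(ids, k=set_size)    # same texts for both systems
        total_a = sum(scores_a[i] for i in sample)
        total_b = sum(scores_b[i] for i in sample)
        wins += total_a > total_b
    return wins / n_sets

# Hypothetical per-text scores: system a is uniformly better than system b.
scores_a = [0.6] * 200
scores_b = [0.4] * 200
rate = win_rate(scores_a, scores_b)
print(rate, rate > 0.95)
```

Scoring both systems on identical samples keeps the comparison paired, so differences in text difficulty cancel out within each test set.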
  • 17. References
      [Lavie 2004] Lavie, Alon, Kenji Sagae, and Shyamsundar Jayaraman. "The significance of recall in automatic metrics for MT evaluation." Machine Translation: From Real Users to Research. Springer Berlin Heidelberg, 2004. 134-143.
      111) Isozaki, Hideki, et al. "Automatic evaluation of translation quality for distant language pairs." Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010.
      134) Lavie, Alon, and Michael J. Denkowski. "The METEOR metric for automatic evaluation of machine translation." Machine Translation 23.2-3 (2009): 105-115.