SlideShare a Scribd company logo
1 of 1
Download to read offline
Sampling-based Alignment and Hierarchical Sub-sentential
Alignment in Chinese–Japanese Translation of Patents
Wei Yang, Zhongwen Zhao, Baosong Yang and Yves Lepage
Graduate School of Information, Production and Systems
Waseda University
{kevinyoogi@akane., zzw890827@fuji., yangbaosong@fuji.}waseda.jp, yves.lepage@waseda.jp
This paper describes Chinese–Japanese translation systems based on different alignment
methods using the JPO corpus and our submission (ID: WASUIPS) to the subtask of the
2015 Workshop on Asian Translation. One of the alignment methods used is bilingual hi-
erarchical sub-sentential alignment combined with sampling-based multilingual alignment.
We also accelerated this method and in this paper, we evaluate the translation results and
time spent on several machine translation tasks. The training time is much faster than the
standard baseline pipeline (GIZA++/Moses) and MGIZA/Moses.
Bilingual hierarchical sub-sentential alignment method used in Phrase-based
Statistical Machine Translation (PB-SMT)
• Associative approaches: use a local maximization process in which each sentence is
processed independently.
– Anymalign1: is an open source multilingual associative aligner (Lardilleux and Lep-
age, 2009; Lardilleux et al., 2013). This method samples large numbers of sub-
corpora randomly to obtain source and target word or phrase occurrence distributions.
– Cutnalign: is a bilingual hierarchical sub-sentential alignment method (Lardilleux et
al., 2012). It is based on a recursive binary segmentation process of the alignment
matrix between a source sentence and its corresponding target sentence. We make
use of this method in combination with Anymalign. It is a three-step approach:
∗ measure the strength of the translation link between any source and target pair of
words;
∗ compute the optimal joint clustering of a bipartite graph to search the best alignment;
∗ segment and align a pair of sentences.
When building alignment matrices, the strength between two words is evaluated using the
following formula (Lardilleux et al., 2012).
w(s, t) = p(s|t) × p(t|s) (1)
(p(s|t) and p(t|s)) are translation probabilities estimated by Anymalign. An example of
alignment matrix is shown in Table 1.
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据 ε ε ε 0.27 0.46 0.01 ε ε 0.002 ε ε ε ε ε ε 0.02
这些 0.38 ε ε 0.02 ε ε ε 0.001 ε ε ε ε ε ε ε 0.01
值 0.012 0.27 0.44 ε ε ε ε ε ε ε ε ε ε ε ε 0.03
, 0.002 0.01 0.01 0.13 0.12 0.21 0.10 0.002 0.001 0.002 0.001 0.01 0.01 0.01 ε 0.01
通过 ε ε 0.01 ε ε 0.06 ε ε 0.52 ε ε ε ε 0.02 ε 0.01
upgma ε ε ε ε ε ε 0.75 ε ε ε ε ε ε ε ε 0.02
法 ε ε ε ε ε ε ε 0.013 0.013 ε ε ε ε ε ε 0.01
进行 ε ε ε ε ε ε ε ε ε ε ε 0.01 0.23 0.34 0.21 0.01
聚类 ε ε ε ε ε ε ε ε ε 0.045 0.045 ε ε ε ε 0.02
分析 ε ε ε ε ε ε ε ε ε ε ε 0.5 ε ε ε 0.01
。 0.01 0.02 0.01 0.02 0.01 0.01 0.02 0.01 0.02 0.01 0.002 0.01 0.02 0.01 0.01 0.7
Table 1: An example of an alignment matrix which contains the translation strength for
each word pair (Chinese–Japanese). The scores are obtained using Anymalign’s output.
Computing by w.
The optimal joint clustering of a bipartite graph is computed recursively using the fol-
lowing formula for searching the best alignment between words in the source and target
languages (Zha et al., 2001; Lardilleux et al., 2012).
cut(X, Y ) = W(X, Y ) + W(X, Y ) (2)
X, X, Y , Y denote the segmentation of the sentences. Here the block we start with is the
entire matrix. Splitting horizontally and vertically into two parts gives four sub-blocks.
W(X, Y ) =
s∈X,t∈Y
w(s, t) (3)
W(X, Y ) is the sum of all translation strengths between all source and target words inside
a sub-block (X, Y ).
The point where to is found on the x and y which minimize Ncut (Lardilleux et al., 2012):
Ncut(X, Y ) =
cut(X, Y )
cut(X, Y ) + 2 × W(X, Y )
+
cut(X, Y )
cut(X, Y ) + 2 × W(X, Y )
(4)
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
そ
れ
ら
の 値 に
基
づ
い
て
u
p
g
m
a
法
に
よ
っ
て
ク
ラ
ス
タ
|
分
析
を
行
っ
た 。
根据
这些
值
,
通过
upgma
法
进行
聚类
分析
。
Table 2: Steps in recursive segmentation and alignment result using sampling-based
alignment and hierarchical sub-sentential alignment method.
SMT experiments
• Experimental protocol (Chinese and Japanese data used): Chinese–Japanese JPO
Patent Corpus (JPC)2 provided by WAT 2015 for the patents subtask. We used sen-
tences of 40 words or less than 40 words as our training data for the translation models,
but use all of the Japanese sentences in the parallel corpus for training the language
models. We used all of the development data for tuning.
Baseline Chinese Japanese
train
sentences 820,184 820,184
words 15,655,674 20,279,246
mean ± std.dev. 19.39 ± 6.71 25.08 ± 7.75
tune
sentences 4,000 4,000
words 114,363 143,853
mean ± std.dev. 28.71 ± 18.34 36.12 ± 21.73
test
sentences 2,000 2,000
words 55,582 70,117
mean ± std.dev. 27.83 ±16.73 35.09 ± 20.16
• Experimental results
(using the different alignment approaches, tools and Moses versions)
– alignment tools: GIZA++ (baseline) and MGIZA, moses 2.1.1.
s→t Moses Aligner BLEU RIBES Training time
zh→ja
2.1.1 MGIZA 37.70 0.783000 5:34:28
2.1.1 GIZA++ 37.46 0.778914 4:43:56
– alignment tools: the alignment method of combining sampling-based alignment and
bilingual hierarchical sub-sentential alignment methods. Here, 2 (c) shows option -i of
Anymalign is 2, and Cutnlaign version where core component is implemented in C.
Language Moses
Aligner
BLEU Training timeAnymalign + Cutnalign
Timeout (s) i
zh-ja 3.0 1200 2 (c) 36.11 1:2:8
zh-ja 3.0 5400 2 (c) 36.07 2:9:29
zh-ja 2.1.1 1200 2 (c) 35.95 0:57:1
zh-ja 2.1.1 1200 2 (python) 35.93 1:1:16
Conclusion
We have shown that it is possible to accelerate development of SMT systems follow-
ing the work by Lardilleux et al. (2012) and Yang and Lepage (2015) on bilingual
hierarchical sub-sentential alignment. We performed several machine translation ex-
periments using different alignment methods and obtained a significant reduction of
processing training time. Setting different timeouts for Anymalign did not change the
translation quality. In other word, we get a relative steady translation quality even
when less time is allotted to word-to-word association computation. Here, the fastest
training time was only 57 minutes, one fifth compared with the use of GIZA++ or
MGIZA.
1
https://anymalign.limsi.fr
2
http://lotus.kuee.kyoto-u.ac.jp/WAT/patent/index.html

More Related Content

What's hot

Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionYu Liu
 
Diploma_Semester-II_Advanced Mathematics_Definite Integration
Diploma_Semester-II_Advanced Mathematics_Definite IntegrationDiploma_Semester-II_Advanced Mathematics_Definite Integration
Diploma_Semester-II_Advanced Mathematics_Definite IntegrationRai University
 
Hyers ulam rassias stability of exponential primitive mapping
Hyers  ulam rassias stability of exponential primitive mappingHyers  ulam rassias stability of exponential primitive mapping
Hyers ulam rassias stability of exponential primitive mappingAlexander Decker
 
Introductory maths analysis chapter 14 official
Introductory maths analysis   chapter 14 officialIntroductory maths analysis   chapter 14 official
Introductory maths analysis chapter 14 officialEvert Sandye Taasiringan
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Tsuyoshi Sakama
 
Numerical solutions for linear fredholm integro differential
 Numerical solutions for linear fredholm integro differential  Numerical solutions for linear fredholm integro differential
Numerical solutions for linear fredholm integro differential Alexander Decker
 
Introductory maths analysis chapter 00 official
Introductory maths analysis   chapter 00 officialIntroductory maths analysis   chapter 00 official
Introductory maths analysis chapter 00 officialEvert Sandye Taasiringan
 
Isoparametric bilinear quadrilateral element _ ppt presentation
Isoparametric bilinear quadrilateral element _ ppt presentationIsoparametric bilinear quadrilateral element _ ppt presentation
Isoparametric bilinear quadrilateral element _ ppt presentationFilipe Giesteira
 
Limit in Dual Space
Limit in Dual SpaceLimit in Dual Space
Limit in Dual SpaceQUESTJOURNAL
 
Diploma_Semester-II_Advanced Mathematics_Indefinite integration
Diploma_Semester-II_Advanced Mathematics_Indefinite integrationDiploma_Semester-II_Advanced Mathematics_Indefinite integration
Diploma_Semester-II_Advanced Mathematics_Indefinite integrationRai University
 
TOC 1 | Introduction to Theory of Computation
TOC 1 | Introduction to Theory of ComputationTOC 1 | Introduction to Theory of Computation
TOC 1 | Introduction to Theory of ComputationMohammad Imam Hossain
 
Bounded solutions of certain (differential and first order
Bounded solutions of certain (differential and first orderBounded solutions of certain (differential and first order
Bounded solutions of certain (differential and first orderAlexander Decker
 
Introductory maths analysis chapter 01 official
Introductory maths analysis   chapter 01 officialIntroductory maths analysis   chapter 01 official
Introductory maths analysis chapter 01 officialEvert Sandye Taasiringan
 
TOC 8 | Derivation, Parse Tree & Ambiguity Check
TOC 8 | Derivation, Parse Tree & Ambiguity CheckTOC 8 | Derivation, Parse Tree & Ambiguity Check
TOC 8 | Derivation, Parse Tree & Ambiguity CheckMohammad Imam Hossain
 
The Existence of Maximal and Minimal Solution of Quadratic Integral Equation
The Existence of Maximal and Minimal Solution of Quadratic Integral EquationThe Existence of Maximal and Minimal Solution of Quadratic Integral Equation
The Existence of Maximal and Minimal Solution of Quadratic Integral EquationIRJET Journal
 

What's hot (19)

Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
 
Diploma_Semester-II_Advanced Mathematics_Definite Integration
Diploma_Semester-II_Advanced Mathematics_Definite IntegrationDiploma_Semester-II_Advanced Mathematics_Definite Integration
Diploma_Semester-II_Advanced Mathematics_Definite Integration
 
Chapter 6 - Matrix Algebra
Chapter 6 - Matrix AlgebraChapter 6 - Matrix Algebra
Chapter 6 - Matrix Algebra
 
Hyers ulam rassias stability of exponential primitive mapping
Hyers  ulam rassias stability of exponential primitive mappingHyers  ulam rassias stability of exponential primitive mapping
Hyers ulam rassias stability of exponential primitive mapping
 
Introductory maths analysis chapter 14 official
Introductory maths analysis   chapter 14 officialIntroductory maths analysis   chapter 14 official
Introductory maths analysis chapter 14 official
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章
 
Numerical solutions for linear fredholm integro differential
 Numerical solutions for linear fredholm integro differential  Numerical solutions for linear fredholm integro differential
Numerical solutions for linear fredholm integro differential
 
Project 7
Project 7Project 7
Project 7
 
Introductory maths analysis chapter 00 official
Introductory maths analysis   chapter 00 officialIntroductory maths analysis   chapter 00 official
Introductory maths analysis chapter 00 official
 
Isoparametric bilinear quadrilateral element _ ppt presentation
Isoparametric bilinear quadrilateral element _ ppt presentationIsoparametric bilinear quadrilateral element _ ppt presentation
Isoparametric bilinear quadrilateral element _ ppt presentation
 
Limit in Dual Space
Limit in Dual SpaceLimit in Dual Space
Limit in Dual Space
 
Diploma_Semester-II_Advanced Mathematics_Indefinite integration
Diploma_Semester-II_Advanced Mathematics_Indefinite integrationDiploma_Semester-II_Advanced Mathematics_Indefinite integration
Diploma_Semester-II_Advanced Mathematics_Indefinite integration
 
TOC 1 | Introduction to Theory of Computation
TOC 1 | Introduction to Theory of ComputationTOC 1 | Introduction to Theory of Computation
TOC 1 | Introduction to Theory of Computation
 
Bounded solutions of certain (differential and first order
Bounded solutions of certain (differential and first orderBounded solutions of certain (differential and first order
Bounded solutions of certain (differential and first order
 
NCM Graph theory talk
NCM Graph theory talkNCM Graph theory talk
NCM Graph theory talk
 
Introductory maths analysis chapter 01 official
Introductory maths analysis   chapter 01 officialIntroductory maths analysis   chapter 01 official
Introductory maths analysis chapter 01 official
 
J0744851
J0744851J0744851
J0744851
 
TOC 8 | Derivation, Parse Tree & Ambiguity Check
TOC 8 | Derivation, Parse Tree & Ambiguity CheckTOC 8 | Derivation, Parse Tree & Ambiguity Check
TOC 8 | Derivation, Parse Tree & Ambiguity Check
 
The Existence of Maximal and Minimal Solution of Quadratic Integral Equation
The Existence of Maximal and Minimal Solution of Quadratic Integral EquationThe Existence of Maximal and Minimal Solution of Quadratic Integral Equation
The Existence of Maximal and Minimal Solution of Quadratic Integral Equation
 

Similar to Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Alignment in Chinese–Japanese Translation of Patents

Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...
Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...
Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...Association for Computational Linguistics
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...irjes
 
On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves  On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves IJECEIAES
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deren Lei
 
Accelerated life testing
Accelerated life testingAccelerated life testing
Accelerated life testingSteven Li
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesTwo Sigma
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)neeraj7svp
 
Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information
Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence InformationLatent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information
Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Informationcsandit
 
Método Topsis - multiple decision makers
Método Topsis  - multiple decision makersMétodo Topsis  - multiple decision makers
Método Topsis - multiple decision makersLuizOlimpio4
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Xin-She Yang
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Techniqueiosrjce
 
Subjective evaluation answer ppt
Subjective evaluation answer pptSubjective evaluation answer ppt
Subjective evaluation answer pptMrunal Pagnis
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionXin-She Yang
 
A Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdfA Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdfLori Head
 

Similar to Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Alignment in Chinese–Japanese Translation of Patents (20)

semeval2016
semeval2016semeval2016
semeval2016
 
Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...
Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...
Wei Yang - 2014 - Consistent Improvement in Translation Quality of Chinese–Ja...
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
 
On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves  On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves
 
F5233444
F5233444F5233444
F5233444
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
Accelerated life testing
Accelerated life testingAccelerated life testing
Accelerated life testing
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality Guarantees
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
 
Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information
Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence InformationLatent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information
Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information
 
Método Topsis - multiple decision makers
Método Topsis  - multiple decision makersMétodo Topsis  - multiple decision makers
Método Topsis - multiple decision makers
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
J017256674
J017256674J017256674
J017256674
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Technique
 
O0447796
O0447796O0447796
O0447796
 
Subjective evaluation answer ppt
Subjective evaluation answer pptSubjective evaluation answer ppt
Subjective evaluation answer ppt
 
Lecture1.pptx
Lecture1.pptxLecture1.pptx
Lecture1.pptx
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential Evolution
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
A Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdfA Decision Support System for Solving Linear Programming Problems.pdf
A Decision Support System for Solving Linear Programming Problems.pdf
 

More from Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 

Recently uploaded (20)

Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 

Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Alignment in Chinese–Japanese Translation of Patents

  • 1. Sampling-based Alignment and Hierarchical Sub-sentential Alignment in Chinese–Japanese Translation of Patents Wei Yang, Zhongwen Zhao, Baosong Yang and Yves Lepage Graduate School of Information, Production and Systems Waseda University {kevinyoogi@akane., zzw890827@fuji., yangbaosong@fuji.}waseda.jp, yves.lepage@waseda.jp This paper describes Chinese–Japanese translation systems based on different alignment methods using the JPO corpus and our submission (ID: WASUIPS) to the subtask of the 2015 Workshop on Asian Translation. One of the alignment methods used is bilingual hi- erarchical sub-sentential alignment combined with sampling-based multilingual alignment. We also accelerated this method and in this paper, we evaluate the translation results and time spent on several machine translation tasks. The training time is much faster than the standard baseline pipeline (GIZA++/Moses) and MGIZA/Moses. Bilingual hierarchical sub-sentential alignment method used in Phrase-based Statistical Machine Translation (PB-SMT) • Associative approaches: use a local maximization process in which each sentence is processed independently. – Anymalign1: is an open source multilingual associative aligner (Lardilleux and Lep- age, 2009; Lardilleux et al., 2013). This method samples large numbers of sub- corpora randomly to obtain source and target word or phrase occurrence distributions. – Cutnalign: is a bilingual hierarchical sub-sentential alignment method (Lardilleux et al., 2012). It is based on a recursive binary segmentation process of the alignment matrix between a source sentence and its corresponding target sentence. We make use of this method in combination with Anymalign. It is a three-step approach: ∗ measure the strength of the translation link between any source and target pair of words; ∗ compute the optimal joint clustering of a bipartite graph to search the best alignment; ∗ segment and align a pair of sentences. When building alignment matrices, the strength between two words is evaluated using the following formula (Lardilleux et al., 2012). w(s, t) = p(s|t) × p(t|s) (1) (p(s|t) and p(t|s)) are translation probabilities estimated by Anymalign. An example of alignment matrix is shown in Table 1. そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 ε ε ε 0.27 0.46 0.01 ε ε 0.002 ε ε ε ε ε ε 0.02 这些 0.38 ε ε 0.02 ε ε ε 0.001 ε ε ε ε ε ε ε 0.01 值 0.012 0.27 0.44 ε ε ε ε ε ε ε ε ε ε ε ε 0.03 , 0.002 0.01 0.01 0.13 0.12 0.21 0.10 0.002 0.001 0.002 0.001 0.01 0.01 0.01 ε 0.01 通过 ε ε 0.01 ε ε 0.06 ε ε 0.52 ε ε ε ε 0.02 ε 0.01 upgma ε ε ε ε ε ε 0.75 ε ε ε ε ε ε ε ε 0.02 法 ε ε ε ε ε ε ε 0.013 0.013 ε ε ε ε ε ε 0.01 进行 ε ε ε ε ε ε ε ε ε ε ε 0.01 0.23 0.34 0.21 0.01 聚类 ε ε ε ε ε ε ε ε ε 0.045 0.045 ε ε ε ε 0.02 分析 ε ε ε ε ε ε ε ε ε ε ε 0.5 ε ε ε 0.01 。 0.01 0.02 0.01 0.02 0.01 0.01 0.02 0.01 0.02 0.01 0.002 0.01 0.02 0.01 0.01 0.7 Table 1: An example of an alignment matrix which contains the translation strength for each word pair (Chinese–Japanese). The scores are obtained using Anymalign’s output. Computing by w. The optimal joint clustering of a bipartite graph is computed recursively using the fol- lowing formula for searching the best alignment between words in the source and target languages (Zha et al., 2001; Lardilleux et al., 2012). cut(X, Y ) = W(X, Y ) + W(X, Y ) (2) X, X, Y , Y denote the segmentation of the sentences. Here the block we start with is the entire matrix. Splitting horizontally and vertically into two parts gives four sub-blocks. W(X, Y ) = s∈X,t∈Y w(s, t) (3) W(X, Y ) is the sum of all translation strengths between all source and target words inside a sub-block (X, Y ). The point where to is found on the x and y which minimize Ncut (Lardilleux et al., 2012): Ncut(X, Y ) = cut(X, Y ) cut(X, Y ) + 2 × W(X, Y ) + cut(X, Y ) cut(X, Y ) + 2 × W(X, Y ) (4) そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 そ れ ら の 値 に 基 づ い て u p g m a 法 に よ っ て ク ラ ス タ | 分 析 を 行 っ た 。 根据 这些 值 , 通过 upgma 法 进行 聚类 分析 。 Table 2: Steps in recursive segmentation and alignment result using sampling-based alignment and hierarchical sub-sentential alignment method. SMT experiments • Experimental protocol (Chinese and Japanese data used): Chinese–Japanese JPO Patent Corpus (JPC)2 provided by WAT 2015 for the patents subtask. We used sen- tences of 40 words or less than 40 words as our training data for the translation models, but use all of the Japanese sentences in the parallel corpus for training the language models. We used all of the development data for tuning. Baseline Chinese Japanese train sentences 820,184 820,184 words 15,655,674 20,279,246 mean ± std.dev. 19.39 ± 6.71 25.08 ± 7.75 tune sentences 4,000 4,000 words 114,363 143,853 mean ± std.dev. 28.71 ± 18.34 36.12 ± 21.73 test sentences 2,000 2,000 words 55,582 70,117 mean ± std.dev. 27.83 ±16.73 35.09 ± 20.16 • Experimental results (using the different alignment approaches, tools and Moses versions) – alignment tools: GIZA++ (baseline) and MGIZA, moses 2.1.1. s→t Moses Aligner BLEU RIBES Training time zh→ja 2.1.1 MGIZA 37.70 0.783000 5:34:28 2.1.1 GIZA++ 37.46 0.778914 4:43:56 – alignment tools: the alignment method of combining sampling-based alignment and bilingual hierarchical sub-sentential alignment methods. Here, 2 (c) shows option -i of Anymalign is 2, and Cutnlaign version where core component is implemented in C. Language Moses Aligner BLEU Training timeAnymalign + Cutnalign Timeout (s) i zh-ja 3.0 1200 2 (c) 36.11 1:2:8 zh-ja 3.0 5400 2 (c) 36.07 2:9:29 zh-ja 2.1.1 1200 2 (c) 35.95 0:57:1 zh-ja 2.1.1 1200 2 (python) 35.93 1:1:16 Conclusion We have shown that it is possible to accelerate development of SMT systems follow- ing the work by Lardilleux et al. (2012) and Yang and Lepage (2015) on bilingual hierarchical sub-sentential alignment. We performed several machine translation ex- periments using different alignment methods and obtained a significant reduction of processing training time. Setting different timeouts for Anymalign did not change the translation quality. In other word, we get a relative steady translation quality even when less time is allotted to word-to-word association computation. Here, the fastest training time was only 57 minutes, one fifth compared with the use of GIZA++ or MGIZA. 1 https://anymalign.limsi.fr 2 http://lotus.kuee.kyoto-u.ac.jp/WAT/patent/index.html