3. 14/06/05 Copyright (C) 2014 by Yusuke Oda, AHC-Lab, IS, NAIST 3
Synchronous Context Free Grammar
(SCFG)
4.
Learning SCFG
● Synchronous rules are extracted from a parallel corpus and its word alignments.
  – f: source sentence
  – e: target sentence
  – A: set of word alignments
5.
Closed Phrase Pair under Word Alignment
● A phrase pair is closed under its word alignment when the pair and the alignment satisfy: no alignment link connects a word inside the pair to a word outside it, and at least one link lies inside the pair.
[Figure: word alignment between "he will dissolve the diet in the near future" and "彼 は 近い うち に 国会 を 解散 する".]
(国会 を → the diet)
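The closure condition above can be sketched as a small check. This is an illustrative sketch, assuming alignments are stored as sets of index pairs; the function name, index conventions, and the partial alignment below are assumptions, not from the slides.

```python
# Checking whether a phrase pair is "closed" (consistent) under a
# word alignment: no link may cross the phrase boundary.

def is_closed(f_span, e_span, alignment):
    """f_span/e_span are (start, end) index ranges (end exclusive);
    alignment is a set of (f_idx, e_idx) links."""
    for f_idx, e_idx in alignment:
        f_inside = f_span[0] <= f_idx < f_span[1]
        e_inside = e_span[0] <= e_idx < e_span[1]
        # A link crossing the phrase boundary breaks closure.
        if f_inside != e_inside:
            return False
    # Require at least one link inside so the pair is actually aligned.
    return any(f_span[0] <= f < f_span[1] and e_span[0] <= e < e_span[1]
               for f, e in alignment)

# f: "彼 は 近い うち に 国会 を 解散 する"
# e: "he will dissolve the diet in the near future"
alignment = {(0, 0), (2, 7), (3, 8), (5, 4), (7, 2)}  # partial, illustrative
print(is_closed((5, 7), (3, 5), alignment))  # (国会 を, the diet) -> True
print(is_closed((5, 7), (2, 5), alignment))  # includes "dissolve" -> False
```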
6.
Extracting Abstract Rules
● We can build more abstract synchronous rules by replacing the words of a smaller, covered phrase pair inside a larger phrase pair with a non-terminal symbol.
[Figure: the phrase pair (近い うち に 国会 を 解散 する, dissolve the diet in the near future) covering the smaller pairs (国会 を, the diet) and (近い うち, near future).]
(国会 を, the diet)
(近い うち, near future)
(近い うち ... 解散 する, dissolve the ... near future)
(X1 に X2 解散 する, dissolve X2 in the X1)
7.
Hiero Grammar
● Hierarchical phrase grammar (Hiero grammar):
  – the set of all synchronous rules extracted from the parallel corpus
● Algorithm:
  1. R ← P, where P is the set of all possible phrase pairs in the parallel corpus.
  2. If a rule r ∈ R contains a phrase pair p ∈ P as a sub-phrase, add to R the rule obtained by replacing p in r with a non-terminal.
  3. Repeat step 2 until no new rule is produced.
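One abstraction step of the algorithm above might be sketched as follows. This is an illustrative sketch: the function names are assumptions, and the sub-phrase is assumed to occur contiguously on both sides.

```python
# One Hiero abstraction step: given a phrase pair and a smaller phrase
# pair it covers, replace the smaller pair with a non-terminal symbol.

def abstract_rule(pair, sub_pair, index=1):
    """pair and sub_pair are (source_tokens, target_tokens) tuples of
    token lists; the sub-phrase must occur contiguously in both sides."""
    nt = f"X{index}"

    def substitute(tokens, sub):
        n = len(sub)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == sub:
                return tokens[:i] + [nt] + tokens[i + n:]
        raise ValueError("sub-phrase not found")

    src, tgt = pair
    s_src, s_tgt = sub_pair
    return substitute(src, s_src), substitute(tgt, s_tgt)

pair = (["近い", "うち", "に", "国会", "を", "解散", "する"],
        ["dissolve", "the", "diet", "in", "the", "near", "future"])
sub = (["国会", "を"], ["the", "diet"])
src, tgt = abstract_rule(pair, sub)
print(src)  # ['近い', 'うち', 'に', 'X1', '解散', 'する']
print(tgt)  # ['dissolve', 'X1', 'in', 'the', 'near', 'future']
```

Applying the same step again to (近い うち, near future) with X2 would yield the fully abstract rule shown on the previous slide.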
8.
Constraints of Hiero Rules
● To limit the size and ambiguity of the Hiero grammar, we can impose constraints on rule extraction.
● Minimal phrase pair
  – (国会 を, the diet) ... BAD
  – (国会, the diet) ... GOOD
● Phrase length
  – (奈良 先端 科学 技術 大学院 大学 情報 科学 研究 科 自然 言語 処理 学 研究 室, ...) BAD (too many words)
● Number of symbols
  – X → 〈あらゆる X1 を 全て X2 の 方 へ ねじ曲げ た の だ, ...〉 BAD (too many symbols)
● Rank of rule
  – X → 〈X1 が X2 で X3 に X4 した, ...〉 BAD (too many non-terminals)
9.
Glue Rules
● To build whole sentences out of small rules, we introduce glue rules such as:
  – S → 〈X1, X1〉
  – S → 〈S1 X2, S1 X2〉
10.
Introducing Syntax Labels
● Up to here, we considered the basic ideas of Hiero rules.
  – The only non-terminal symbols are X and S.
● This model is very simple, but very ambiguous.
● Next, we introduce syntax information into Hiero rules.
  = syntax-augmented machine translation (SAMT)
[Figure: parse tree of "this is a pen" (S → NP VP; PRP VBZ DT NN), illustrating Hiero + syntax → SAMT.]
11.
Combinatory Categorial Grammar (CCG)
● SAMT uses categories (≒ partial syntax labels) based on the idea of combinatory categorial grammar (CCG).
● Categories:
  – A/B: syntax label A with absence of its right-side child B
  – A\B: syntax label A with absence of its left-side child B
  – A+B: concatenation of two syntax labels A and B
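As a rough illustration of how these categories can label spans that are not themselves constituents, here is a hypothetical sketch. The helper name, the span encoding, and the toy parse of "this is a pen" are assumptions for illustration only.

```python
# Assign a SAMT-style category to a span, given the constituent spans
# of a target-side parse tree.

def samt_label(span, constituents):
    """constituents maps (i, j) spans to labels. Returns a plain label,
    an A/B or A\\B partial label, or an A+B concatenation."""
    if span in constituents:
        return constituents[span]
    i, j = span
    for (a, b), label in constituents.items():
        # A missing its right-side child B: span (i, j) is a left part of A.
        if a == i and b > j and (j, b) in constituents:
            return f"{label}/{constituents[(j, b)]}"
        # A missing its left-side child B: span (i, j) is a right part of A.
        if b == j and a < i and (a, i) in constituents:
            return f"{label}\\{constituents[(a, i)]}"
    # Concatenation of two adjacent constituents.
    for k in range(i + 1, j):
        if (i, k) in constituents and (k, j) in constituents:
            return f"{constituents[(i, k)]}+{constituents[(k, j)]}"
    return None

# Toy parse of "this is a pen"
constituents = {(0, 1): "NP", (1, 2): "VBZ", (2, 3): "DT", (3, 4): "NN",
                (2, 4): "NP", (1, 4): "VP", (0, 4): "S"}
print(samt_label((0, 4), constituents))  # S        (a real constituent)
print(samt_label((1, 3), constituents))  # VP/NN    (VP missing NN on the right)
print(samt_label((0, 2), constituents))  # S/NP     (S missing its object NP)
```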
12.
Extracting SAMT Rules
[Figure: the aligned pair (近い うち に 国会 を 解散 する, dissolve the diet in the near future) with the target-side parse (VP → VB NP PP, etc.) and the SAMT labels derived for its spans, such as NP\DT, IN+DT, VP/PP, and VP\VB.]
VP → 〈NP\DT1 に NP2 解散 する, dissolve NP2 in the NP\DT1〉
VP → 〈近い うち IN+DT1 国会 を VB2, VB2 the diet IN+DT1 near future〉
etc...
13.
Probabilistic Formalization of Hiero Model
● We treat translation with a Hiero grammar as maximization of the posterior probability (as in the phrase-based model):
  ê = argmax_e P(e | f)
● And we assume the probability is modeled as a log-linear model:
  P(e, D | f) ∝ exp( Σ_i w_i · h_i(f, e, D) )
  – D: derivation (≒ set of used synchronous rules)
  – w_i: weights
  – h_i: feature functions
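The log-linear model above can be sketched in a few lines: the (unnormalized) log-score of a derivation is the weighted sum of its feature values, and the decoder keeps whichever derivation scores highest. The feature names and values below are illustrative placeholders, not from the slides.

```python
# Log-linear scoring of competing derivations.

def loglinear_score(features, weights):
    """features and weights map feature names to values; returns the
    unnormalized log-probability sum_i w_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"lm": 0.5, "fwd_trans": 0.3, "length": -0.1}
derivation_a = {"lm": -12.0, "fwd_trans": -4.0, "length": 7.0}
derivation_b = {"lm": -15.0, "fwd_trans": -2.0, "length": 7.0}

score_a = loglinear_score(derivation_a, weights)  # -7.9
score_b = loglinear_score(derivation_b, weights)  # -8.8
best = max([("a", score_a), ("b", score_b)], key=lambda t: t[1])
print(best[0])  # a
```

Since the normalization constant is the same for every derivation of the same input, comparing these raw sums is enough for decoding.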
14.
Features of Hiero Model (1)
● Generative model: likelihoods of translation probability
  – Forward model: P(target side | source side) of each rule
  – Backward model: P(source side | target side) of each rule
15.
Features of Hiero Model (2)
● Generative model: likelihoods of translation probability
  – Syntax model (f)
  – Syntax model (e)
16.
Features of Hiero Model (3)
● Lexical translation model: goodness of phrase alignment
  – Forward model
  – Backward model
17.
Features of Hiero Model (4)
● Language model: measures the fluency of the hypothesis
● Out-of-vocabulary (OOV) penalty: adjusts the LM
● Length penalty: adjusts the number of words in the hypothesis
● Glueing penalty: adjusts the number of glue rules in the derivation
18.
Decoding of Hiero Model
● Given an input sentence f and a set of SCFG rules G, we find the optimal output sequence:
  ê = e( argmax_{D ∈ D(G, f)} score(D) )
  – D(G, f): set of possible derivations given the grammar and the input
  – e(D): sequence of terminal symbols in a given derivation
19.
Decoding Process
1. Calculate the intersection between the input sentence f and the grammar G
   = generating a syntax forest using the CYK algorithm.
2. Transform the syntax forest into the corresponding translation forest.
3. Output the sequence of terminal symbols in the translation forest that maximizes the model score.
[Figure: source-side syntax forest for "犬 が 本 の 上 に 座った" and the corresponding target forest for "the dog sat on the book".]
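Step 1 above, parsing the input under (the source side of) the grammar, can be sketched with a plain CYK recognizer that fills a chart of labeled spans. The toy grammar and function name below are illustrative assumptions, not the slides' grammar.

```python
# CYK chart parsing: chart[(i, j)] collects the non-terminal labels
# that can cover tokens[i:j]; the full chart is the parse forest.

def cyk(tokens, lexical, binary):
    """lexical: word -> set of labels; binary: (B, C) -> set of labels A
    for rules A -> B C. Returns chart[(i, j)] = set of labels."""
    n = len(tokens)
    chart = {(i, i + 1): set(lexical.get(tokens[i], set())) for i in range(n)}
    for width in range(2, n + 1):          # spans from short to long
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):       # every split point
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        cell |= binary.get((b, c), set())
            chart[(i, j)] = cell
    return chart

lexical = {"the": {"DT"}, "dog": {"NN"}, "sat": {"VP"}}
binary = {("DT", "NN"): {"NP"}, ("NP", "VP"): {"S"}}
chart = cyk(["the", "dog", "sat"], lexical, binary)
print(chart[(0, 3)])  # {'S'}
```

A real decoder keeps back-pointers per chart entry (which rule and split produced each label) so that step 3 can read the best-scoring derivation back out of the forest.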
20.
Synchronous Tree Substitution Grammar
(STSG)
21.
Synchronous Tree Substitution Grammar
● STSG is an extension of tree substitution grammar (TSG) for bilingual analysis.
● STSG is a subset of synchronous tree adjoining grammar (STAG):
  SCFG (Hiero) ⊂ STSG ⊂ STAG
● Definition: a grammar is given by
  – a set of non-terminal symbols
  – a start symbol
  – a set of terminal symbols
  – a set of rules
  – a weight semiring
22.
Synchronous Rules of STSG
● Definition: a rule pairs two elementary trees,
  – an elementary tree of the source language,
  – an elementary tree of the target language,
  – and an association between their frontier non-terminals.
● Every rule is also associated with a weight.
[Figure: the paired elementary trees S(x1:NP VP(x2:NP V(開けた))) and S(x1:NP VP(VBD(opened) x2:NP)); the x1/x2 leaves are frontier nodes.]
23.
Expressive Power of STSG
● SCFG cannot express differences in syntactic structure, but STSG can.
● Example:
  – This synchronous rule cannot be generated from smaller SCFG rules, because the two trees' internal structures do not correspond.
  – The STSG framework can treat such correspondences of tree structure directly.
[Figure: the paired rules NP(NP(N(犬) P(が)) PP(x1:CD PC(匹))) and NP(x1:CD NNS(dogs)).]
24.
Translation Models under STSG Framework
● In the STSG framework, we can use the sequence of frontier nodes (the leaves of a synchronous rule) instead of the full tree.
● Four translation models are available, depending on whether we choose the tree or the frontier sequence as the data structure for the source and target languages:
  – source: frontier, target: frontier → string-to-string translation (= SCFG)
  – source: frontier, target: tree → string-to-tree translation
  – source: tree, target: frontier → tree-to-string translation
  – source: tree, target: tree → tree-to-tree translation
[Figure: the elementary tree S(x1:NP VP(x2:NP V(開けた))) versus its sequence of frontier nodes x1:NP x2:NP 開けた.]
25.
Retrieving STSG Synchronous Rules
● Heuristic method (similar to SCFG rule extraction), using:
  – the syntax tree generated from the source sentence
  – the syntax tree generated from the target sentence
[Figure: the aligned pair (近い うち に 国会 を 解散 する, dissolve the diet in the near future) with both parse trees, and an extracted rule pairing the elementary trees VP(x1:PP x2:NP VP(V(解散) P(する))) and VP(VB(dissolve) x2:NP x1:PP).]
26.
GHKM Algorithm
● Galley-Hopkins-Knight-Marcu (GHKM) algorithm
  – Generates STSG synchronous rules (string-to-tree rules) by composing minimal rules, using the inside-outside algorithm.
  1. Detect minimal rules from the target syntax trees.
  2. Generate larger synchronous rules by composing minimal rules.
27.
GHKM: Alignment Span (1)
● Alignment span of a node n:
  – the set of indexes of source-sentence words aligned to the words under the partial tree rooted at n
● Complement alignment span of n:
  – the set of indexes of source-sentence words aligned to words other than those under n
● Closure:
  – the minimum contiguous range that covers the alignment span
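The three definitions above can be sketched directly as set operations. Here a tree node is represented simply by the set of target-word indexes it dominates, and the alignment as a set of (target, source) index pairs; these encodings and the partial alignment below are illustrative assumptions.

```python
# Alignment span, complement span, and closure for one tree node.

def alignment_span(node_words, alignment):
    """Source indexes aligned to words under the node."""
    return {s for t, s in alignment if t in node_words}

def complement_span(node_words, alignment):
    """Source indexes aligned to words NOT under the node."""
    return {s for t, s in alignment if t not in node_words}

def closure(span):
    """Minimum contiguous range covering the span."""
    return set(range(min(span), max(span) + 1)) if span else set()

# target: "he will dissolve the diet in the near future"
# source: "彼 は 近い うち に 国会 を 解散 する"
alignment = {(0, 0), (2, 7), (4, 5), (7, 2), (8, 3)}  # partial, illustrative
np_node = {3, 4}  # NP over "the diet"
span = alignment_span(np_node, alignment)           # {5}
print(span, closure(span))                          # {5} {5}
print(sorted(complement_span(np_node, alignment)))  # [0, 2, 3, 7]
```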
28.
GHKM: Alignment Span (2)
[Figure: the aligned pair ("he will dissolve the diet in the near future", "彼 は 近い うち に 国会 を 解散 する") with the target parse tree (S → NP(PRP) MD VP(VB NP PP)), illustrating the alignment span of each node.]
29.
GHKM: Admissible Node
● Admissible node:
  – a node n in the target syntax tree whose alignment span is non-empty and whose closure does not overlap the complement alignment span:
    closure(span(n)) ∩ comp_span(n) = ∅
[Figure: the same aligned pair, with the admissible nodes of the target parse tree highlighted.]
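The admissibility test above is a one-line set check once the spans are known. This sketch reuses the same set-based encoding as before (an illustrative assumption, not the slides' notation):

```python
# A node is admissible when the closure of its alignment span does not
# overlap its complement span (and the span is non-empty).

def is_admissible(span, comp_span):
    if not span:
        return False
    closed = set(range(min(span), max(span) + 1))
    return not (closed & comp_span)

# NP over "the diet": span {5}, complement {0, 2, 3, 7} -> admissible
print(is_admissible({5}, {0, 2, 3, 7}))  # True
# Span {2, 4}: its closure {2, 3, 4} hits complement index 3 -> not
print(is_admissible({2, 4}, {3}))        # False
```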
30.
GHKM: Minimal Rule
● Split the syntax tree at the admissible nodes; each fragment becomes a minimal rule.
[Figure: the target parse of "he will dissolve the diet in the near future" split at its admissible nodes, yielding minimal rules such as VP(x1:PP x2:NP x3:VB) ↔ x3 x2 x1 and (DT(the) JJ(near) NN(future)) ↔ 近い うち.]
31.
Extension for Tree-to-tree Model (1)
● We need to extract node pairs of the two syntax trees that are admissible with respect to each other.
● First, find the admissible nodes in each given tree.
● A node pair is bidirectionally admissible when both nodes are admissible and each node's alignment span is covered by the span of the other node.
● Span:
  – the minimum range over the sentence that covers all terminal symbols under a node
32.
Extension for Tree-to-tree Model (2)
[Figure: the aligned pair (近い うち に 国会 を 解散 する, dissolve the diet in the near future) with both parse trees, illustrating bidirectionally admissible node pairs.]
33.
Features of STSG Model (1)
● Generative model: likelihoods of translation probability
34.
Features of STSG Model (2)
● Lexical translation model: goodness of phrase alignment
35.
Features of STSG Model (3)
● Height penalty: adjusts the depth of the derivation
● Internal node penalty: adjusts the total size of the derivation
● Some features introduced for the Hiero model are also available.
36.
Decoding of STSG Model
● STSG decoding is basically the same method as Hiero decoding; the exact formulation depends on the translation model.
37.
Difference of Formalization of Each Model
● String-to-string model
  – Same model as the Hiero (SCFG) model.
● String-to-tree model
  – Uses no information from the syntax of the source sentence.
● Tree-to-string model / Tree-to-tree model
  – Explicitly use syntax information of the source sentence.
  – The translation process can be divided into syntax analysis and decoding.
[Diagram: non-syntax-based translation: source sentence → decoder → translation hypotheses; syntax(tree)-based translation: source sentence → syntax analyzer → syntax tree of source sentence → decoder → translation hypotheses.]
38.
Formalization of Syntax-based Translation
● The syntax-based translation model uses the syntax tree of the source sentence.
● We can ignore the probability of the tree given the sentence, because the tree is already decided during syntax analysis.
39.
Questions & Discussions