MT Study Group

Translating into Morphologically Rich Languages with Synthetic Phrases

Victor Chahuneau, Eva Schlinger, Noah A. Smith, Chris Dyer

Proc. of EMNLP 2013, Seattle, USA

Introduced by Akiva Miura, AHC-Lab

15/10/15 © 2014 Akiba (Akiva) Miura, AHC-Lab, IS, NAIST
Contents

1. Introduction
2. Translate-and-Inflect Model
3. Morphological Grammars and Features
4. Inflection Model Parameter Estimation
5. Synthetic Phrases
6. Translation Experiments
7. Conclusion
8. Impressions
1. Introduction

Problem:

● MT into morphologically rich languages (e.g. inflecting languages) is challenging due to:
  • lexical sparsity
  • the large variety of grammatical features expressed with morphology

● This paper:
  • introduces a method using target language morphological grammars to address this challenge
  • improves translation quality from English into Russian/Hebrew/Swahili
1. Introduction (cont'd)

Proposed Approach:

● The proposed approach decomposes the MT process into two steps:
  1. translating a word (or phrase) into a meaning-bearing stem
  2. selecting an appropriate inflection using a feature-rich discriminative model that conditions on the source context of the word being translated

● An inventory of translation rules for individual words and short phrases is obtained using the standard translation rule extraction techniques of Chiang (2007)
  • the authors call these synthetic phrases
1. Introduction (cont'd)

Advantages:

● The major advantages of this approach are:
  1. synthesized forms are targeted to a specific translation context
  2. multiple, alternative phrases may be generated, with the final choice among rules left to the global translation model
  3. virtually no language-specific engineering is necessary
  4. any phrase- or syntax-based decoder can be used without modification
  5. we can generate forms that were not attested in the bilingual training data
2. Translate-and-Inflect Model

Figure 1: The inflection model predicts a form for the target verb lemma σ = пытаться (pytat'sya) based on its source attempted and the linear and syntactic source context. The correct inflection string for this training instance is μ = mis-sfm-e (equivalent to the more traditional morphological string +MAIN+IND+PAST+SING+FEM+MEDIAL+PERF).

[Figure: the sentence pair "she had attempted to cross the road on her bike" / "она пыталась пересечь пути на ее велосипед", annotated with POS tags (PRP VBD VBN TO VB DT NN IN PRP$ NN), Brown clusters (C50 C473 C28 …), dependency links (nsubj, aux, xcomp, root), and the predicted analysis σ: пытаться_V, μ: mis-sfm-e.]
2.	
  Translate-­‐and-­‐Inflect	
  Model	
  (cont’d)	
  
15/10/15	
 2014©Akiba	
  (Akiva)	
  Miura	
  	
  	
  AHC-­‐Lab,	
  IS,	
  NAIST	
 7	
Modeling	
  Inflec1on:	
  
this sentence.2 This assumption is further relaxed
in §5 when the model is integrated in the translation
system.
We decompose the probability of generating each
target word f in the following way:
p(f | e, i) =
µ=f
p( | ei)
gen. stem
p(µ | , e, i)
gen. inflection
Here, each stem is generated independently from a
single aligned source word ei, but in practice we
1
This prevents the model from generating words that would
be difficult for the language model to reliably score.
2
This is the same assumption that Brown et al. (1993) make
in, for example, IBM Model 1.
tor and si
parameter
a feature
feature sp
allows to
put and th
morpholo
expressed
in the sum
features to
as such ca
bias terms
diagonal V
variables
onal Vij)
1678
h stem,
lable, a
ectional
the set
ns for a
ammar;
flections
ilingual
ministic
gical in-
m f. In
oach in
though
d to de-
ote this
abilistic
erally expresses multiple grammatical features, we
would like a model that naturally incorporates rich,
possibly overlapping features in its representation of
both the input (i.e., conditioning context) and out-
put (i.e., the inflection pattern). We therefore use
the following parametric form to model inflectional
probabilities:
u(µ, e, i) = exp (e, i) W (µ)+
(µ) V (µ) ,
p(µ | , e, i) =
u(µ, e, i)
µ u(µ , e, i)
. (1)
Here, is an m-dimensional source context fea-
ture vector function, is an n-dimensional mor-
phology feature vector function, W Rm n and
ut (i.e., conditioning context) and out-
inflection pattern). We therefore use
g parametric form to model inflectional
:
e, i) = exp (e, i) W (µ)+
(µ) V (µ) ,
e, i) =
u(µ, e, i)
µ u(µ , e, i)
. (1)
s an m-dimensional source context fea-
function, is an n-dimensional mor-
ure vector function, W Rm n and
- n-dimensional morphology feature vector function	
put (i.e., the inflection pattern). We therefore use
the following parametric form to model inflectional
probabilities:
u(µ, e, i) = exp (e, i) W (µ)+
(µ) V (µ) ,
p(µ | , e, i) =
u(µ, e, i)
µ u(µ , e, i)
. (1)
Here, is an m-dimensional source context fea-
ture vector function, is an n-dimensional mor-
phology feature vector function, W Rm n and
- m-dimensional source context feature vector function
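As a concrete sketch of Eq. (1) (the function name, toy dimensions, and random feature values below are illustrative, not from the paper), the score u(μ, e, i) and the normalized inflection distribution can be computed as:

```python
import numpy as np

def inflection_probs(phi, psi_all, W, V):
    """p(mu | sigma, e, i) for every candidate inflection.

    phi     : (m,) source context feature vector phi(e, i)
    psi_all : (k, n) morphology feature vectors psi(mu) for k candidates
    W       : (m, n) interaction weight matrix
    V       : (n,) diagonal of the morphology bias matrix
    """
    # u(mu, e, i) = exp(phi^T W psi(mu) + psi(mu)^T diag(V) psi(mu))
    scores = psi_all @ (W.T @ phi) + (psi_all ** 2) @ V
    u = np.exp(scores - scores.max())   # stabilized exponentiation
    return u / u.sum()                  # normalize over the k candidates

# toy example: 3 candidate inflections, m=4 context features, n=2 morph features
rng = np.random.default_rng(0)
p = inflection_probs(rng.normal(size=4), rng.normal(size=(3, 2)),
                     rng.normal(size=(4, 2)), rng.normal(size=2))
```

Because the normalization runs only over the candidate inflections of one stem (not over whole sentences), evaluating this distribution stays cheap.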
2.	
  Translate-­‐and-­‐Inflect	
  Model	
  (cont’d)	
  
15/10/15	
 2014©Akiba	
  (Akiva)	
  Miura	
  	
  	
  AHC-­‐Lab,	
  IS,	
  NAIST	
 8	
Source	
  Context	
  Feature	
  Extrac1on:	
  
nsubj xcomprootnsubj xcomproot
he inflection model predicts a form for the target verb lemma =пытаться (pytat’sya
pted and the linear and syntactic source context. The correct inflection string for the obs
particular training instance is µ = mis-sfm-e (equivalent to the more traditional morpho
+PAST+SING+FEM+MEDIAL+PERF).
source aligned word ei
parent word e i
with its dependency i i
all children ej | j = i with their dependency i j
source words ei 1 and ei+1
token
part-of-speech tag
word cluster
– are ei, e i
at the root of the dependency tree?
– number of children, siblings of ei
ource features (e, i) extracted from e and its linguistic analysis. i denotes the parent o
the dependency tree and i i the typed dependency link.
ce Context Features: (e, i)
select the best inflection of a target-
ord, given the source word it translates
parser trained on the Penn Treeban
basic Stanford dependencies.
• Assignment of tokens to one of
Figure 2: Source features φ(e, i) extracted from e and its linguistic
analysis. πi denotes the parent of the token in position i in the
dependency tree and πi → i the typed dependency link.
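A minimal sketch of this extraction (the sentence fragment, tags, cluster IDs, and feature-naming scheme are invented for illustration; the paper obtains them from a Treebank-trained dependency parser and Brown clusters):

```python
def source_context_features(tokens, pos, cluster, heads, deps, i):
    """Collect phi(e, i) as a dict of string-named binary features."""
    feats = {}

    def add(prefix, j):
        if 0 <= j < len(tokens):
            feats[f"{prefix}_tok_{tokens[j]}"] = 1
            feats[f"{prefix}_pos_{pos[j]}"] = 1
            feats[f"{prefix}_cl_{cluster[j]}"] = 1

    add("src", i)            # aligned word e_i
    add("prev", i - 1)       # linear context e_{i-1}
    add("next", i + 1)       # linear context e_{i+1}
    if heads[i] is None:     # e_i at the root of the dependency tree?
        feats["src_is_root"] = 1
    else:                    # parent with its typed dependency link
        add(f"parent({deps[i]})", heads[i])
    children = [j for j, h in enumerate(heads) if h == i]
    for j in children:       # children with their typed dependency links
        add(f"child({deps[j]})", j)
    feats[f"n_children_{len(children)}"] = 1
    return feats

# "she had attempted ..." with attempted (index 2) as the root
phi = source_context_features(["she", "had", "attempted"],
                              ["PRP", "VBD", "VBN"],
                              ["C50", "C473", "C28"],
                              [2, 2, None], ["nsubj", "aux", "root"], 2)
```

String-named binary features like these map naturally onto the m-dimensional φ(e, i) vector via a feature index.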
3. Morphological Grammars and Features

Supervised morphology features:

ψ(μ) is extracted from the output of a morphological analyzer (available only for Russian in this paper).

Since a positional tag set is used, it is straightforward to convert each fixed-length tag μ into a feature vector by defining a binary feature for each key-value pair (e.g., Tense=past) composing the tag.
3. Morphological Grammars and Features (cont'd)

Unsupervised morphology features:

ψ(μ) is generated by an unsupervised analyzer (no language-specific engineering). For the unsupervised analyzer there is no mapping from morphemes to structured morphological attributes; however, features can be created from the affix sequences obtained after morphological segmentation. Binary features are produced corresponding to the content of each potential affixation position relative to the stem:

  prefix                suffix
  ...-3 -2 -1 STEM +1 +2 +3...

For example, the unsupervised analysis μ = wa+ki+wa+STEM of the Swahili word wakiwapiga will produce the following features:

  prefix[-3][wa](μ) = 1,
  prefix[-2][ki](μ) = 1,
  prefix[-1][wa](μ) = 1.
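These positional affix features can be sketched directly (the segmentation itself is assumed to be given by the unsupervised analyzer):

```python
def affix_features(prefixes, suffixes):
    """Binary psi(mu) features for each affixation position around STEM."""
    feats = {}
    # prefixes fill positions -1, -2, ... counting leftward from the stem
    for k, p in enumerate(reversed(prefixes), start=1):
        feats[f"prefix[-{k}][{p}]"] = 1
    # suffixes fill positions +1, +2, ... counting rightward from the stem
    for k, s in enumerate(suffixes, start=1):
        feats[f"suffix[+{k}][{s}]"] = 1
    return feats

# mu = wa+ki+wa+STEM for the Swahili word wakiwapiga
psi = affix_features(["wa", "ki", "wa"], [])
```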
3. Morphological Grammars and Features (cont'd)

Unsupervised Morphological Segmentation:

Let M represent the set of all possible morphemes and define a regular grammar M* M M* (i.e., zero or more prefixes, a stem, and zero or more suffixes). To infer this structure for the words in the target language, assume the vocabulary was generated by the following process (inference is performed with blocked Gibbs sampling):

1. Sample morpheme distributions from symmetric Dirichlet distributions: θp ~ Dir|M|(αp) for prefixes, θσ ~ Dir|M|(ασ) for stems, and θs ~ Dir|M|(αs) for suffixes.
2. Sample length distribution parameters λp ~ Beta(βp, γp) for prefix sequences and λs ~ Beta(βs, γs) for suffix sequences.
3. Sample a vocabulary by creating each word type w using the following steps:
   a) Sample affix sequence lengths: lp ~ Geom(λp); ls ~ Geom(λs)
   b) Sample lp prefixes p1, …, plp independently from θp; ls suffixes s1, …, sls independently from θs; and a stem σ ~ θσ.
   c) Concatenate prefixes, the stem, and suffixes: w = p1 + … + plp + σ + s1 + … + sls
4. Inflection Model Parameter Estimation

Estimating Parameters:

To set the parameters W and V of the inflection prediction model, we use stochastic gradient descent to maximize the conditional log-likelihood of a training set consisting of pairs of source (English) sentence contextual features (φ) and target word inflectional features (ψ).

When morphological category information is available, we train an independent model for each open-class category (in Russian: nouns, verbs, adjectives, numerals, adverbs); otherwise a single model is used for all words.
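One stochastic gradient step on the conditional log-likelihood can be sketched as follows (this builds on the softmax form of Eq. (1); the shapes, learning rate, and random data are illustrative, not the paper's settings):

```python
import numpy as np

def sgd_step(phi, psi_all, gold, W, V, lr=0.1):
    """One gradient ascent step on log p(mu_gold | sigma, e, i); updates W, V in place."""
    scores = psi_all @ (W.T @ phi) + (psi_all ** 2) @ V
    p = np.exp(scores - scores.max())
    p /= p.sum()
    # gradient = observed minus expected morphology features
    err = psi_all[gold] - p @ psi_all
    W += lr * np.outer(phi, err)                            # grad wrt W
    V += lr * ((psi_all[gold] ** 2) - p @ (psi_all ** 2))   # grad wrt diag(V)
    return float(np.log(p[gold]))

rng = np.random.default_rng(1)
phi, psi_all = rng.normal(size=4), rng.normal(size=(3, 2))
W, V = np.zeros((4, 2)), np.zeros(2)
ll = [sgd_step(phi, psi_all, 0, W, V) for _ in range(50)]
```

The objective is concave in W and V, so with a small enough step the per-example log-likelihood increases steadily.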
4. Inflection Model Parameter Estimation (cont'd)

Intrinsic Evaluation:

Table 1: Corpus statistics.

                      Parallel                                            Parallel+Monolingual
           Sentences  EN-tokens  TRG-tokens  EN-types  TRG-types  Sentences  TRG-tokens  TRG-types
  Russian  150k       3.5M       3.3M        131k      254k       20M        360M        1,971k
  Hebrew   134k       2.7M       2.0M        48k       120k       806k       15M         316k
  Swahili  15k        0.3M       0.3M        23k       35k        596k       13M         334k

Statistics of the parallel corpora used to train the inflection model are summarized in Table 1. It is important to note that this richly parameterized model is trained on the full parallel training corpus, not just on a handful of development sentences (which are typically used to tune MT system parameters). Despite this scale, training is simple: the inflection model is trained to discriminate among different inflectional paradigms, not over all possible target language sentences (Blunsom et al., 2008) or learning from all observable rules (Subotin, 2011). This makes the training problem relatively tractable: all experiments in this paper were trained on a single processor using a Cython implementation of the SGD optimizer. For the largest model, trained on 3.3M Russian words, training was performed in less than 16 hours.

Table 2: Intrinsic evaluation of the inflection model (N: nouns, V: verbs, A: adjectives, M: numerals).

                              acc.    ppl.   | |
  Supervised  Russian  N      64.1%   3.46    9.16
                       V      63.7%   3.41   20.12
                       A      51.5%   6.24   19.56
                       M      73.0%   2.81    9.14
                    average   63.1%   3.98   14.49
  Unsup.      Russian  all    71.2%   2.15    4.73
              Hebrew   all    85.5%   1.49    2.55
              Swahili  all    78.2%   2.09   11.46
4. Inflection Model Parameter Estimation (cont'd)

Feature Ablation:

Table 3: Feature ablation experiments using supervised Russian classification experiments.

  Features (φ(e, i))       acc.
  all                      54.7%
  − linear context         52.7%
  − dependency context     44.4%
  − POS tags               54.5%
  − Brown clusters         54.5%
  − POS tags, Brown cl.    50.9%
  − lexical items          51.2%
5. Synthetic Phrases (cont'd)

Figure 3: Examples of highly weighted features learned by the inflection model. A few frequent morphological features are shown with their top corresponding source context features.

Russian (supervised)
  Verb: 1st person               child(nsubj)=I, child(nsubj)=we
  Verb: future tense             child(aux)=MD, child(aux)=will
  Noun: animate                  source=animals/victims/...
  Noun: feminine gender          source=obama/economy/...
  Noun: dative case              parent(iobj)
  Adjective: genitive case       grandparent(poss)

Hebrew
  Suffix ים (masculine plural)             parent=NNS, after=NNS
  Prefix א (1st person sing. + future)     child(nsubj)=I, child(aux)='ll
  Prefix כ (preposition like/as)           child(prep)=IN, parent=as
  Suffix י (possessive mark)               before=my, child(poss)=my
  Suffix ה (feminine mark)                 child(nsubj)=she, before=she
  Prefix כש (when)                         before=when, before=WRB

Swahili
  Prefix li (past)                         source=VBD, source=VBN
  Prefix nita (1st person sing. + future)  child(aux), child(nsubj)=I
  Prefix ana (3rd person sing. + present)  source=VBZ
  Prefix wa (3rd person plural)            before=they, child(nsubj)=NNS
  Suffix tu (1st person plural)            child(nsubj)=she, before=she
  Prefix ha (negative tense)               source=no, after=not

To generate synthetic phrases from the data, an additional phrase-based translation model is first constructed on the parallel corpus preprocessed to replace inflected surface words with their stems. A set of non-gappy phrases is then extracted for each sentence (e.g., X → <attempted, …). If a phrase already existing in the standard phrase table happens to be recreated, both phrases are kept and will compete with each other with different features in the decoder. For example, for the large EN→RU system, 6% …
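The phrase-synthesis step described above can be sketched as follows (the stemmed rules, the inflector callable, and the rule representation are stand-ins for illustration, not the paper's implementation):

```python
def synthesize_phrases(stem_rules, inflect, context):
    """Create inflected synthetic rules from stem-level translation rules.

    stem_rules : list of (source_phrase, target_stem) pairs
    inflect    : callable returning scored candidate inflected forms,
                 e.g. inflect(stem, context) -> [(form, prob), ...]
    """
    synthetic = []
    for src, stem in stem_rules:
        for form, prob in inflect(stem, context):
            # each candidate becomes its own rule; the decoder's global
            # model makes the final choice among competing rules
            synthetic.append({"src": src, "tgt": form,
                              "inflection_prob": prob})
    return synthetic

# toy inflector proposing two candidate Russian past-tense forms per stem
toy = lambda stem, ctx: [(stem + "л", 0.7), (stem + "ла", 0.3)]
rules = synthesize_phrases([("attempted", "пыта")], toy, context=None)
```

Keeping all candidates as separate rules, each carrying its inflection-model score as a feature, is what lets an unmodified decoder arbitrate among them.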
6. Translation Experiments

Experimental Set-Up:

● Compared systems:
  • a baseline system (standard Hiero) using a 4-gram LM
  • an enriched system with a class-based n-gram LM trained on the monolingual data mapped to 600 Brown clusters
  • the enriched system further augmented with the inflected synthetic phrases

● Datasets:
  • Russian: the News Commentary parallel corpus and additional monolingual data crawled from news websites
  • Hebrew: transcribed TED talks and additional monolingual news data
  • Swahili: a parallel corpus obtained by crawling the Global Voices project website and additional monolingual data taken from the Helsinki Corpus of Swahili
6. Translation Experiments (cont'd)

Translation Results:

Table 4: Translation quality (measured by BLEU) averaged over 3 MIRA runs.

                   EN→RU      EN→HE      EN→SW
  Baseline         14.7±0.1   15.8±0.3   18.3±0.1
  +Class LM        15.7±0.1   16.8±0.4   18.7±0.2
  +Synthetic
    unsupervised   16.2±0.1   17.6±0.1   19.0±0.1
    supervised     16.7±0.1   —          —

In a larger-scale condition with rule shape indicator features and bigram cluster features, the BLEU score improves from 18.8 to 19.6 with the addition of word clusters and reaches 20.0 with synthetic phrases. Details regarding this system are reported in Ammar et al. (2013).
7. Conclusion

The paper
● presents an efficient technique that exploits morphologically analyzed corpora to produce new inflections possibly unseen in the bilingual training data
  • the method decomposes into two simple independent steps involving well-understood discriminative models
● also achieves language independence by exploiting unsupervised morphological segmentation in the absence of linguistically informed morphological analyses
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
ijistjournal
 
Deep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigmDeep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigm
MeetupDataScienceRoma
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
butest
 
Interpreter
InterpreterInterpreter
Interpreter
Ahmad Al-bsheer
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
Latent Semantic Transliteration using Dirichlet Mixture
Latent Semantic Transliteration using Dirichlet MixtureLatent Semantic Transliteration using Dirichlet Mixture
Latent Semantic Transliteration using Dirichlet Mixture
Rakuten Group, Inc.
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
AbdurrahimDerric
 
referát.doc
referát.docreferát.doc
referát.doc
butest
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
kevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
kevig
 
Syntax analysis
Syntax analysisSyntax analysis
Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...
Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...
Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...
IJECEIAES
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
Lifeng (Aaron) Han
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabic
ijnlc
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian language
ijnlc
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
SDL
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
ijnlc
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
sipij
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
kevig
 

Similar to [Paper Introduction] Translating into Morphologically Rich Languages with Synthetic Phrases (20)

PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
 
Deep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigmDeep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigm
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
Interpreter
InterpreterInterpreter
Interpreter
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
Latent Semantic Transliteration using Dirichlet Mixture
Latent Semantic Transliteration using Dirichlet MixtureLatent Semantic Transliteration using Dirichlet Mixture
Latent Semantic Transliteration using Dirichlet Mixture
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
 
referát.doc
referát.docreferát.doc
referát.doc
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 
Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...
Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...
Fuzzy Recursive Least-Squares Approach in Speech System Identification: A Tra...
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabic
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian language
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 

More from NAIST Machine Translation Study Group

[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
NAIST Machine Translation Study Group
 
[Paper Introduction] Distant supervision for relation extraction without labe...
[Paper Introduction] Distant supervision for relation extraction without labe...[Paper Introduction] Distant supervision for relation extraction without labe...
[Paper Introduction] Distant supervision for relation extraction without labe...
NAIST Machine Translation Study Group
 
On using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translationOn using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translation
NAIST Machine Translation Study Group
 
RNN-based Translation Models (Japanese)
RNN-based Translation Models (Japanese)RNN-based Translation Models (Japanese)
RNN-based Translation Models (Japanese)
NAIST Machine Translation Study Group
 
[Paper Introduction] Efficient top down btg parsing for machine translation p...
[Paper Introduction] Efficient top down btg parsing for machine translation p...[Paper Introduction] Efficient top down btg parsing for machine translation p...
[Paper Introduction] Efficient top down btg parsing for machine translation p...
NAIST Machine Translation Study Group
 
[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...
[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...
[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...
NAIST Machine Translation Study Group
 
[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...
[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...
[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...
NAIST Machine Translation Study Group
 
[Paper Introduction] Bilingual word representations with monolingual quality ...
[Paper Introduction] Bilingual word representations with monolingual quality ...[Paper Introduction] Bilingual word representations with monolingual quality ...
[Paper Introduction] Bilingual word representations with monolingual quality ...
NAIST Machine Translation Study Group
 
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
NAIST Machine Translation Study Group
 
[Book Reading] 機械翻訳 - Section 3 No.1
[Book Reading] 機械翻訳 - Section 3 No.1[Book Reading] 機械翻訳 - Section 3 No.1
[Book Reading] 機械翻訳 - Section 3 No.1
NAIST Machine Translation Study Group
 
[Paper Introduction] Training a Natural Language Generator From Unaligned Data
[Paper Introduction] Training a Natural Language Generator From Unaligned Data[Paper Introduction] Training a Natural Language Generator From Unaligned Data
[Paper Introduction] Training a Natural Language Generator From Unaligned Data
NAIST Machine Translation Study Group
 
[Book Reading] 機械翻訳 - Section 5 No.2
[Book Reading] 機械翻訳 - Section 5 No.2[Book Reading] 機械翻訳 - Section 5 No.2
[Book Reading] 機械翻訳 - Section 5 No.2
NAIST Machine Translation Study Group
 
[Book Reading] 機械翻訳 - Section 7 No.1
[Book Reading] 機械翻訳 - Section 7 No.1[Book Reading] 機械翻訳 - Section 7 No.1
[Book Reading] 機械翻訳 - Section 7 No.1
NAIST Machine Translation Study Group
 
[Book Reading] 機械翻訳 - Section 2 No.2
 [Book Reading] 機械翻訳 - Section 2 No.2 [Book Reading] 機械翻訳 - Section 2 No.2
[Book Reading] 機械翻訳 - Section 2 No.2
NAIST Machine Translation Study Group
 

More from NAIST Machine Translation Study Group (14)

[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Netwo...
 
[Paper Introduction] Distant supervision for relation extraction without labe...
[Paper Introduction] Distant supervision for relation extraction without labe...[Paper Introduction] Distant supervision for relation extraction without labe...
[Paper Introduction] Distant supervision for relation extraction without labe...
 
On using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translationOn using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translation
 
RNN-based Translation Models (Japanese)
RNN-based Translation Models (Japanese)RNN-based Translation Models (Japanese)
RNN-based Translation Models (Japanese)
 
[Paper Introduction] Efficient top down btg parsing for machine translation p...
[Paper Introduction] Efficient top down btg parsing for machine translation p...[Paper Introduction] Efficient top down btg parsing for machine translation p...
[Paper Introduction] Efficient top down btg parsing for machine translation p...
 
[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...
[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...
[Paper Introduction] Supervised Phrase Table Triangulation with Neural Word E...
 
[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...
[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...
[Paper Introduction] Evaluating MT Systems with Second Language Proficiency T...
 
[Paper Introduction] Bilingual word representations with monolingual quality ...
[Paper Introduction] Bilingual word representations with monolingual quality ...[Paper Introduction] Bilingual word representations with monolingual quality ...
[Paper Introduction] Bilingual word representations with monolingual quality ...
 
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
 
[Book Reading] 機械翻訳 - Section 3 No.1
[Book Reading] 機械翻訳 - Section 3 No.1[Book Reading] 機械翻訳 - Section 3 No.1
[Book Reading] 機械翻訳 - Section 3 No.1
 
[Paper Introduction] Training a Natural Language Generator From Unaligned Data
[Paper Introduction] Training a Natural Language Generator From Unaligned Data[Paper Introduction] Training a Natural Language Generator From Unaligned Data
[Paper Introduction] Training a Natural Language Generator From Unaligned Data
 
[Book Reading] 機械翻訳 - Section 5 No.2
[Book Reading] 機械翻訳 - Section 5 No.2[Book Reading] 機械翻訳 - Section 5 No.2
[Book Reading] 機械翻訳 - Section 5 No.2
 
[Book Reading] 機械翻訳 - Section 7 No.1
[Book Reading] 機械翻訳 - Section 7 No.1[Book Reading] 機械翻訳 - Section 7 No.1
[Book Reading] 機械翻訳 - Section 7 No.1
 
[Book Reading] 機械翻訳 - Section 2 No.2
 [Book Reading] 機械翻訳 - Section 2 No.2 [Book Reading] 機械翻訳 - Section 2 No.2
[Book Reading] 機械翻訳 - Section 2 No.2
 

Recently uploaded

SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
maazsz111
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 

Recently uploaded (20)

SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 

[Paper Introduction] Translating into Morphologically Rich Languages with Synthetic Phrases

  • 1. MT Study Group
  Translating into Morphologically Rich Languages with Synthetic Phrases
  Victor Chahuneau, Eva Schlinger, Noah A. Smith, Chris Dyer
  Proc. of EMNLP 2013, Seattle, USA
  Introduced by Akiva Miura, AHC-Lab
  15/10/15 ©Akiba (Akiva) Miura, AHC-Lab, IS, NAIST
  • 2. Contents
  1. Introduction
  2. Translate-and-Inflect Model
  3. Morphological Grammars and Features
  4. Inflection Model Parameter Estimation
  5. Synthetic Phrases
  6. Translation Experiments
  7. Conclusion
  8. Impressions
  • 3. 1. Introduction
  Problem:
  - MT into morphologically rich languages (e.g. inflecting languages) is challenging due to:
    - lexical sparsity
    - the large variety of grammatical features expressed with morphology
  - This paper:
    - introduces a method that uses target-language morphological grammars to address this challenge
    - improves translation quality from English into Russian/Hebrew/Swahili
  • 4. 1. Introduction (cont'd)
  Proposed Approach:
  - The proposed approach decomposes the MT process into two steps:
    1. translating a word (or phrase) into a meaning-bearing stem
    2. selecting an appropriate inflection using a feature-rich discriminative model that conditions on the source context of the word being translated
  - An inventory of translation rules for individual words and short phrases is obtained using the standard translation rule extraction techniques of Chiang (2007)
    - The authors call these synthetic phrases
  • 5. 1. Introduction (cont'd)
  Advantages:
  - The major advantages of this approach are:
    1. synthesized forms are targeted to a specific translation context
    2. multiple, alternative phrases may be generated, with the final choice among rules left to the global translation model
    3. virtually no language-specific engineering is necessary
    4. any phrase- or syntax-based decoder can be used without modification
    5. forms can be generated that were not attested in the bilingual training data
  • 6. 2. Translate-and-Inflect Model
  Figure 1: The inflection model predicts a form for the target verb lemma σ = пытаться (pytat'sya) based on its source word attempted and the linear and syntactic source context. The correct inflection string for the observed form in this training instance is μ = mis-sfm-e (equivalent to the more traditional morphological string +MAIN+IND+PAST+SING+FEM+MEDIAL+PERF).
  Example sentence pair: она пыталась пересечь пути на ее велосипед / she had attempted to cross the road on her bike
  • 7. 2.  Translate-and-Inflect Model (cont'd)
Modeling Inflection:
The probability of generating each target word f decomposes in the following way (each stem σ is generated independently from a single aligned source word e_i, the same assumption Brown et al. (1993) make in, for example, IBM Model 1):

    p(f | e, i) = Σ_{(σ,μ)=f}  p(σ | e_i) · p(μ | σ, e, i)
                               [gen. stem]  [gen. inflection]

Since a morphologically inflected word generally expresses multiple grammatical features, the model should naturally incorporate rich, possibly overlapping features in its representation of both the input (i.e., the conditioning context) and the output (i.e., the inflection pattern). The inflectional probabilities therefore take the parametric form:

    u(μ, e, i) = exp( φ(e, i)ᵀ W ψ(μ) + ψ(μ)ᵀ V ψ(μ) )
    p(μ | σ, e, i) = u(μ, e, i) / Σ_{μ′} u(μ′, e, i)        (1)

Here φ is an m-dimensional source context feature vector function, ψ is an n-dimensional morphology feature vector function, and W ∈ ℝ^(m×n) and V ∈ ℝ^(n×n) are parameter matrices.
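The log-linear inflection model of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration of the softmax over candidate inflections, with made-up dimensions and random features; the paper's actual feature templates and per-category models are not reproduced here:

```python
import numpy as np

def inflection_prob(phi, psis, W, V):
    """p(mu | sigma, e, i) for each candidate inflection mu of a stem sigma.

    phi  : (m,) source-context feature vector phi(e, i)
    psis : (k, n) morphology feature vectors psi(mu), one row per candidate
    W    : (m, n) weights coupling context features to morphology features
    V    : (n, n) weights among morphology features
    """
    # u(mu, e, i) = exp( phi^T W psi(mu) + psi(mu)^T V psi(mu) )
    scores = psis @ (W.T @ phi) + np.einsum('kn,nm,km->k', psis, V, psis)
    scores -= scores.max()       # subtract max for numerical stability
    u = np.exp(scores)
    return u / u.sum()           # normalize over the candidate inflections
```

With all weights set to zero the model is uniform over candidates, which is a quick sanity check on the normalization.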
  • 8. 2.  Translate-and-Inflect Model (cont'd)
Source Context Feature Extraction:
Figure 2: Source features φ(e, i) extracted from e and its linguistic analysis. π_i denotes the parent of the token in position i in the dependency tree and π_i → i the typed dependency link.

For each of the following positions, the token, its part-of-speech tag, and its word cluster are extracted:
  – the aligned word e_i
  – its parent e_{π_i} with the dependency link π_i → i
  – all children e_j | π_j = i with their dependency links i → j
  – the neighboring source words e_{i−1} and e_{i+1}
Additional features: whether e_i or e_{π_i} is at the root of the dependency tree, and the number of children and siblings of e_i.

The analysis uses a parser trained on the Penn Treebank with basic Stanford dependencies, and an assignment of tokens to word clusters.
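The feature inventory of Figure 2 can be sketched as a small extraction routine. Feature-name strings like 'child(nsubj)_tok=...' are illustrative, not the paper's exact templates:

```python
def source_context_features(tokens, pos, clusters, heads, deprels, i):
    """Binary source-context features phi(e, i) for the aligned token e_i.

    tokens/pos/clusters : per-token word, POS tag, and word cluster id
    heads/deprels       : dependency head index (-1 = root) and relation label
    i                   : position of the source word aligned to the target word
    """
    feats = set()
    def add(prefix, j):  # token, POS tag, and cluster at position j
        feats.add(f'{prefix}_tok={tokens[j]}')
        feats.add(f'{prefix}_pos={pos[j]}')
        feats.add(f'{prefix}_cluster={clusters[j]}')
    add('self', i)
    if i > 0:
        add('prev', i - 1)                        # linear context e_{i-1}
    if i + 1 < len(tokens):
        add('next', i + 1)                        # linear context e_{i+1}
    if heads[i] == -1:
        feats.add('self_is_root')                 # e_i at the root of the tree
    else:
        add(f'parent({deprels[i]})', heads[i])    # parent with typed dependency
    children = [j for j, h in enumerate(heads) if h == i]
    for j in children:
        add(f'child({deprels[j]})', j)            # children with typed dependencies
    feats.add(f'num_children={len(children)}')
    return feats
```

For the Figure 1 fragment "she had attempted", the features for attempted include the root indicator, its nsubj child she, and its aux child had.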
  • 9. 3.  Morphological Grammars and Features
Supervised morphology features:
ψ(μ) – extracted from the output of a morphological analyzer (available only for Russian in this paper).
Since a positional tag set is used, it is straightforward to convert each fixed-length tag μ into a feature vector by defining a binary feature for each key-value pair (e.g., Tense=past) composing the tag.
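The conversion from a fixed-length positional tag to binary key-value features is mechanical. A minimal sketch, where the slot names and the '-' convention for unset slots are illustrative assumptions rather than the analyzer's actual tagset:

```python
def positional_tag_features(tag, slot_names):
    """Binary features psi(mu) from a fixed-length positional morphological tag.

    tag        : tag string with one character per attribute slot; '-' = unset
    slot_names : attribute name of each position (hypothetical names)
    """
    return {f'{name}={value}' for name, value in zip(slot_names, tag)
            if value != '-'}
```

E.g. with slots [POS, Tense, Number, Gender], the tag 'Vpsf' yields the features POS=V, Tense=p, Number=s, Gender=f.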
  • 10. 3.  Morphological Grammars and Features (cont'd)
Unsupervised morphology features:
ψ(μ) – generated by an unsupervised analyzer (no language-specific engineering).
For the unsupervised analyzer there is no mapping from morphemes to structured morphological attributes; however, binary features can be created from the affix sequences obtained after morphological segmentation, corresponding to the content of each potential affixation position relative to the stem:

    prefix                  suffix
    ...-3 -2 -1  STEM  +1 +2 +3...

For example, the unsupervised analysis μ = wa+ki+wa+STEM of the Swahili word wakiwapiga produces the features:

    prefix[-3][wa](μ) = 1,  prefix[-2][ki](μ) = 1,  prefix[-1][wa](μ) = 1.
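The positional affix features can be generated directly from a segmented analysis. A small sketch reproducing the wakiwapiga example above (the dict-of-indicator representation is an implementation choice, not prescribed by the paper):

```python
def unsupervised_morph_features(prefixes, suffixes):
    """Binary features for affix content at each position relative to STEM.

    prefixes : prefix morphemes in surface order (outermost first)
    suffixes : suffix morphemes in surface order (closest to the stem first)
    """
    feats = {}
    for k, p in enumerate(prefixes):
        # prefix positions are counted -len(prefixes) .. -1, left to right
        feats[f'prefix[{k - len(prefixes)}][{p}]'] = 1
    for k, s in enumerate(suffixes):
        # suffix positions are counted +1 .. +len(suffixes)
        feats[f'suffix[+{k + 1}][{s}]'] = 1
    return feats
```

Applied to the segmentation wa+ki+wa+STEM this yields exactly the three prefix features shown on the slide.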
  • 11. 3.  Morphological Grammars and Features (cont'd)
Unsupervised Morphological Segmentation:
Let M represent the set of all possible morphemes and define a regular grammar: M* M M* (i.e., zero or more prefixes, a stem, and zero or more suffixes). To infer structure for the words in the target language, assume the vocabulary was generated by the following process (inference by blocked Gibbs sampling):
1.  Sample morpheme distributions from symmetric Dirichlet distributions: θp ~ Dir|M|(αp) for prefixes, θσ ~ Dir|M|(ασ) for stems, and θs ~ Dir|M|(αs) for suffixes.
2.  Sample length distribution parameters λp ~ Beta(βp, γp) for prefix sequences and λs ~ Beta(βs, γs) for suffix sequences.
3.  Sample a vocabulary by creating each word type w using the following steps:
  a)  Sample affix sequence lengths: lp ~ Geom(λp); ls ~ Geom(λs)
  b)  Sample lp prefixes p1, ..., plp independently from θp; ls suffixes s1, ..., sls independently from θs; and a stem σ ~ θσ.
  c)  Concatenate prefixes, the stem, and suffixes: w = p1 + ... + plp + σ + s1 + ... + sls
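Step 3 of the generative process (sampling one word type given the morpheme distributions) can be sketched directly. This is only the forward sampler, assuming the θ and λ parameters have already been drawn; the Geometric is taken with support {0, 1, 2, ...}, so zero affixes are possible:

```python
import random

def sample_word(morphemes, theta_p, theta_stem, theta_s, lam_p, lam_s, rng):
    """Generate one word type w = p1 + ... + plp + stem + s1 + ... + sls.

    morphemes      : the morpheme inventory M (shared index for all thetas)
    theta_*        : multinomial weights over M for prefixes, stems, suffixes
    lam_p, lam_s   : Geometric success parameters for affix-sequence lengths
    rng            : a random.Random instance
    """
    def geom(lam):           # lp ~ Geom(lam): failures before first success
        k = 0
        while rng.random() > lam:
            k += 1
        return k
    lp, ls = geom(lam_p), geom(lam_s)               # step 3a: affix lengths
    prefixes = rng.choices(morphemes, weights=theta_p, k=lp)   # step 3b
    stem = rng.choices(morphemes, weights=theta_stem, k=1)[0]
    suffixes = rng.choices(morphemes, weights=theta_s, k=ls)
    return ''.join(prefixes) + stem + ''.join(suffixes)        # step 3c
```

With λ = 1 both affix lengths are zero, so the sampled word is just a stem, which makes the degenerate case easy to verify.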
  • 12. 4.  Inflection Model Parameter Estimation
Estimating Parameters:
To set the parameters W and V of the inflection prediction model, we use stochastic gradient descent to maximize the conditional log-likelihood of a training set consisting of pairs of source (English) sentence contextual features (φ) and target word inflectional features (ψ). When morphological category information is available, we train an independent model for each open-class category (in Russian: nouns, verbs, adjectives, numerals, adverbs); otherwise a single model is used for all words.
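One SGD update for this conditional log-likelihood has the usual "observed minus expected features" form. A sketch for a single training instance, ignoring regularization, learning-rate schedules, and the per-category models; the Eq. (1) scores are recomputed inside the step:

```python
import numpy as np

def sgd_step(phi, psis, gold, W, V, lr=0.1):
    """One SGD update on -log p(mu_gold | sigma, e, i) for the model
    u(mu) = exp(phi^T W psi(mu) + psi(mu)^T V psi(mu)); updates W, V in place
    and returns the loss before the update."""
    scores = psis @ (W.T @ phi) + np.einsum('kn,nm,km->k', psis, V, psis)
    p = np.exp(scores - scores.max())
    p /= p.sum()
    # Gradient of the log-likelihood = observed features - expected features.
    exp_psi = p @ psis                                    # E[psi(mu)]
    W += lr * np.outer(phi, psis[gold] - exp_psi)
    exp_psi_sq = np.einsum('k,kn,km->nm', p, psis, psis)  # E[psi psi^T]
    V += lr * (np.outer(psis[gold], psis[gold]) - exp_psi_sq)
    return -np.log(p[gold])
```

Since the scores are linear in the entries of W and V, the objective is convex and repeated steps on a fixed instance drive the loss down monotonically for a small enough learning rate.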
  • 13. 4.  Inflection Model Parameter Estimation (cont'd)
Intrinsic Evaluation:

Table 1: Corpus statistics.
                    Parallel                                        Parallel+Monolingual
          Sentences  EN-tokens  TRG-tokens  EN-types  TRG-types    Sentences  TRG-tokens  TRG-types
Russian     150k       3.5M       3.3M        131k      254k         20M        360M       1,971k
Hebrew      134k       2.7M       2.0M         48k      120k         806k        15M        316k
Swahili      15k       0.3M       0.3M         23k       35k         596k        13M        334k

The inflection model is trained on the full parallel training corpus, not just on a handful of development sentences (which are typically used to tune MT system parameters). Despite this scale, training is simple: the model is trained to discriminate among different inflectional paradigms, not over all possible target language sentences (Blunsom et al., 2008) or by learning from all observable rules (Subotin, 2011). All experiments were run on a single processor using a Cython implementation of the SGD optimizer; for the largest model (3.3M Russian words), 10 SGD iterations were performed in less than 16 hours.

Table 2: Intrinsic evaluation of the inflection model (N: nouns, V: verbs, A: adjectives, M: numerals).
                           acc.    ppl.    |μ|
Supervised  Russian   N    64.1%   3.46    9.16
                      V    63.7%   3.41   20.12
                      A    51.5%   6.24   19.56
                      M    73.0%   2.81    9.14
                      avg  63.1%   3.98   14.49
Unsup.      Russian   all  71.2%   2.15    4.73
            Hebrew    all  85.5%   1.49    2.55
            Swahili   all  78.2%   2.09   11.46
  • 14. 4.  Inflection Model Parameter Estimation (cont'd)
Feature Ablation:

Table 3: Feature ablation experiments using supervised Russian classification.
Features (φ(e, i))            acc.
all                           54.7%
– linear context              52.7%
– dependency context          44.4%
– POS tags                    54.5%
– Brown clusters              54.5%
– POS tags, Brown clusters    50.9%
– lexical items               51.2%
  • 15. 5.  Synthetic Phrases (cont'd)
Figure 3: Examples of highly weighted features learned by the inflection model: a few frequent morphological features and their top corresponding source context features.

Russian (supervised):
  Verb: 1st person             child(nsubj)=I, child(nsubj)=we
  Verb: future tense           child(aux)=MD, child(aux)=will
  Noun: animate                source=animals/victims/...
  Noun: feminine gender        source=obama/economy/...
  Noun: dative case            parent(iobj)
  Adjective: genitive case     grandparent(poss)
Hebrew:
  Suffix ים (masculine plural)             parent=NNS, after=NNS
  Prefix א (1st person sing. + future)     child(nsubj)=I, child(aux)='ll
  Prefix כ (preposition like/as)           child(prep)=IN, parent=as
  Suffix י (possessive marker)             before=my, child(poss)=my
  Suffix ה (feminine marker)               child(nsubj)=she, before=she
  Prefix כש (when)                         before=when, before=WRB
Swahili:
  Prefix li (past)                         source=VBD, source=VBN
  Prefix nita (1st person sing. + future)  child(aux), child(nsubj)=I
  Prefix ana (3rd person sing. + present)  source=VBZ
  Prefix wa (3rd person plural)            before=they, child(nsubj)=NNS
  Suffix tu (1st person plural)            child(nsubj)=she, before=she
  Prefix ha (negative tense)               source=no, after=not

To generate synthetic phrases, an additional phrase-based translation model is first constructed on the parallel corpus preprocessed to replace inflected surface words with their stems, and a set of non-gappy phrases is extracted for each sentence. If a synthesized phrase recreates one already existing in the standard phrase table, both phrases are kept and compete with each other with different features in the decoder.
  • 16. 6.  Translation Experiments
Experimental Set-Up:
l  Comparing:
  •  A baseline system (standard Hiero) using a 4-gram LM
  •  An enriched system with a class-based n-gram LM trained on the monolingual data mapped to 600 Brown clusters
  •  The enriched system further augmented with the inflected synthetic phrases
l  Datasets:
  •  Russian: News Commentary parallel corpus and additional monolingual data crawled from news websites
  •  Hebrew: transcribed TED talks and additional monolingual news data
  •  Swahili: parallel corpus obtained by crawling the Global Voices project website and additional monolingual data taken from the Helsinki Corpus of Swahili
  • 18. 6.  Translation Experiments (cont'd)
Translation Results:

Table 4: Translation quality (measured by BLEU) averaged over 3 MIRA runs.
                          EN→RU      EN→HE      EN→SW
Baseline                  14.7±0.1   15.8±0.3   18.3±0.1
+Class LM                 15.7±0.1   16.8±0.4   18.7±0.2
+Synthetic, unsupervised  16.2±0.1   17.6±0.1   19.0±0.1
+Synthetic, supervised    16.7±0.1   —          —

In large-scale conditions (with rule shape indicator features and bigram cluster features), the BLEU score improves from 18.8 to 19.6 with the addition of word clusters and reaches 20.0 with synthetic phrases; details regarding this system are reported in Ammar et al. (2013).
  • 19. 7.  Conclusion
The paper
l  presents an efficient technique that exploits morphologically analyzed corpora to produce new inflections possibly unseen in the bilingual training data
  •  The method decomposes into two simple, independent steps involving well-understood discriminative models
l  also achieves language independence by exploiting unsupervised morphological segmentation in the absence of linguistically informed morphological analyses