Latent Domain Word Alignment for Heterogeneous Corpora
Hoang Cuong
Joint work with Khalil Sima’an, appearing at NAACL 2015
ILLC, University of Amsterdam

1. An Introduction

Bitext word alignment
The alignment task: identifying the translation relationships among the words in parallel sentences.
Proposed by [Brown et al., 1993], it has turned out to be one of the most important tasks in Natural Language Processing.

Bitext word alignment
Figure: the Statistical Alignment Framework (a), cf. [Brown et al., 1993]: Bilingual Data → Alignment Model → Viterbi Decoding.

SMT with a Mix-of-Domains Haystack
We have big data to train SMT systems, thanks to Europarl, UN, Common Crawl, ...
The data come from very different domains. How does this affect alignment accuracy?
Does bigger data produce better alignment quality? Not necessarily, and this is in fact not so surprising: in domain adaptation, [Moore and Lewis, 2010; Axelrod et al., 2011; Cuong and Sima’an, 2014] show that bigger data does not mean better translation!

Word Alignment with a Mix-of-Domains Haystack
Why? A haystack means too many competing translations:
maestra → master (computer); maestra → teacher (education); maestra → dean (education); maestra → crack (other); maestra → ...
Suboptimal alignment quality has been observed repeatedly [Gao et al., 2011; Bach et al., 2008; Banerjee et al., 2012].

How can we overcome this problem?

Disentangling the Subdomains
Figure: the Statistical Alignment Framework (a), Bilingual Data → Model → Viterbi Decoding, vs. the Statistical Latent Domain Alignment Framework (b), Bilingual Data → Model_1 ... Model_i ... Model_K (one per Domain_1 ... Domain_i ... Domain_K) → Viterbi Decoding.

Disentangling the Subdomains
Technical contributions:
“Splitting” the alignment statistics P(f, a | e) into domain-sensitive alignment statistics P(f, a | e, D), with a latent variable D.
Combining the domain-sensitive alignment statistics.

“Splitting” alignment statistics
Figure: the HMM alignment model (a), with an observed layer (source words f_{j-1}, f_j, f_{j+1}) and a latent alignment layer (alignment variables a_{j-1}, a_j, a_{j+1} pointing to target words).

“Splitting” alignment statistics
Figure: the latent domain HMM alignment model. An additional latent layer, the domain variable D, is conditioned on by both of the other two layers (the observed source words and the latent alignments).
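The latent-domain HMM factorization can be sketched in a few lines. This is not the authors' implementation; the table layout (`trans`, `jump`) and smoothing constant are assumptions, chosen only to show how every parameter becomes additionally indexed by the domain D:

```python
# Minimal sketch of the latent-domain HMM factorization
#   P(f, a | e, D) = prod_j P(a_j | a_{j-1}, D) * P(f_j | e_{a_j}, D),
# where both the jump (transition) and translation (emission) tables
# are indexed by the latent domain D.

def score_alignment(f, e, a, D, trans, jump):
    """Score source sentence f aligned to target e via alignment a, given domain D.

    trans[D][(f_word, e_word)] : domain-sensitive translation probability
    jump[D][delta]             : domain-sensitive jump probability
    """
    p = 1.0
    prev = 0
    for j, fj in enumerate(f):
        p *= jump[D].get(a[j] - prev, 1e-12)      # transition depends on D
        p *= trans[D].get((fj, e[a[j]]), 1e-12)   # emission depends on D
        prev = a[j]
    return p
```

With domain-split tables, the same sentence pair can score very differently under different domains, which is exactly the disambiguation effect the model is after.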

Likelihood
L ∝ ∏_{(f, e)} Σ_D P(D) [ P(f | e, D) P(e | D) + P(e | f, D) P(f | D) ]
A joint model combining language models and translation models.
Unfortunately, it is too complex to train (we cannot learn it from scratch for now!).
Deep Neural Networks might help (as suggested during the talk)!
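The per-pair mixture term above can be sketched directly. The component models are passed in as placeholder callables (an assumption for illustration, not the paper's parameterization):

```python
# Sketch of the mixture likelihood for a single sentence pair (f, e):
#   L(f, e) = sum_D P(D) * [ P(f|e,D) P(e|D) + P(e|f,D) P(f|D) ]
# tm_fe / tm_ef are translation models, lm_e / lm_f are language models,
# all domain-conditioned and supplied by the caller.

def pair_likelihood(f, e, domains, prior, tm_fe, lm_e, tm_ef, lm_f):
    total = 0.0
    for D in domains:
        total += prior[D] * (tm_fe(f, e, D) * lm_e(e, D) +
                             tm_ef(e, f, D) * lm_f(f, D))
    return total
```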

Learning
Our temporary solution: EM with partial supervision.
Number of domains: D takes values in {1, ..., N+1}, where N is the number of available seed samples whose domain we know in advance, plus one so-called “out-domain”.
Parameter constraints: we keep the domain prior parameters fixed for all sentence pairs that belong to the seed samples.
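The partially supervised E-step can be sketched as follows. The function names and the `seed_domain` argument are assumptions for illustration; the point is that domain responsibilities are inferred for unlabeled pairs but clamped to the known label for seed samples:

```python
# Sketch of a partially supervised E-step over domains: posterior
# responsibilities P(D | f, e) are computed for unlabeled sentence pairs,
# but kept fixed (one-hot) for seed samples with a known domain.

def domain_posteriors(pair, domains, prior, likelihood, seed_domain=None):
    if seed_domain is not None:
        # Seed sample: the domain assignment stays fixed at its label.
        return {D: (1.0 if D == seed_domain else 0.0) for D in domains}
    scores = {D: prior[D] * likelihood(pair, D) for D in domains}
    Z = sum(scores.values())
    return {D: s / Z for D, s in scores.items()}
```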

Combining domain-sensitive alignment statistics
â = argmax_a Σ_D P(f, a, D | e)
  = argmax_a Σ_D P(f, a | e, D) P(D | e)
  = argmax_a Σ_D P(f, a | e, D) P(e | D) P(D).
Unfortunately, this decoding problem is NP-hard (see [DeNero and Macherey, 2011; Chang et al., 2014]).

Combining domain-sensitive alignment statistics
â = argmax_a Σ_D P(f, a | e, D) P(e | D) P(D).
Two potential solutions:
A Lagrangian relaxation-based decoder (which we would really rather not implement!).
Defining an approximate objective function, e.g., a lower bound of the sum (this work!):
â = argmax_a max_D P(f, a | e, D) P(e | D) P(D).

Data Preparation
Figure: the mixed corpus C_mix: Legal, Pharmacy, and Hardware subsets, plus the rest (3.7M).
We train the latent domain alignment model with prior knowledge derived from the domain information of the three subsets, and compare alignment accuracy to the baseline.

Alignment results (1 Million)
Model     Prior             Prec.↑  Rec.↑  AER↓
Baseline  -                 66.95   61.29  36.00
Latent    Pharmacy (100K)   67.85   61.72  35.36
Latent    Legal (100K)      67.57   62.29  35.17
Latent    Hardware (100K)   69.41   63.58  33.63
Latent    ALL (300K)        69.64   63.30  33.68

Alignment results (2 Million)
Model     Prior             Prec.↑  Rec.↑  AER↓
Baseline  -                 68.34   61.58  35.22
Latent    Pharmacy (100K)   68.85   62.58  34.43
Latent    Legal (100K)      69.98   64.01  33.13
Latent    Hardware (100K)   69.45   63.23  33.81
Latent    ALL (300K)        71.51   63.87  32.53

Alignment results (4 Million)
Model     Prior             Prec.↑  Rec.↑  AER↓
Baseline  -                 69.37   64.30  33.26
Latent    Pharmacy (100K)   69.69   62.80  33.94
Latent    Legal (100K)      70.51   63.94  32.93
Latent    Hardware (100K)   71.75   64.44  32.10
Latent    ALL (300K)        72.16   64.30  31.99

Discussion
Word alignment should involve latent concepts representing the domains of the data.
We presented the benefits: with latent domains, the more we know about the data, the more we can improve performance.
We strongly believe this should apply to any statistical model, not just alignment models.
Challenge: can we learn the latent domain (alignment) models from scratch?

Bibliography
Amittai Axelrod, Xiaodong He, and Jianfeng Gao.
Domain adaptation via pseudo in-domain data selection.
In Proceedings of EMNLP, 2011.
Nguyen Bach, Qin Gao, and Stephan Vogel.
Improving word alignment with language model based confidence scores.
In Proceedings of the Third Workshop on Statistical Machine Translation, 2008.
Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way, and Josef van Genabith.
Translation quality-based supplementary data selection by incremental update of translation models.
In Proceedings of COLING, pages 149–166, 2012.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer.
The mathematics of statistical machine translation: parameter estimation.
Computational Linguistics, 1993.
Yin-Wen Chang, Alexander M. Rush, John DeNero, and Michael Collins.
A constrained viterbi relaxation for bidirectional word alignment.
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Association
for Computational Linguistics, 2014.
URL http://www.aclweb.org/anthology/P/P14/P14-1139.
Hoang Cuong and Khalil Sima’an.
Latent domain translation models in mix-of-domains haystack.
In Proceedings of COLING, 2014.
John DeNero and Klaus Macherey.
Model-based aligner combination using dual decomposition.
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies - Volume 1. Association for Computational Linguistics, 2011.
URL http://dl.acm.org/citation.cfm?id=2002472.2002526.
Qin Gao, Will Lewis, Chris Quirk, and Mei-Yuh Hwang.
Incremental training and intentional over-fitting of word alignment.
In Proceedings of MT Summit XIII. Asia-Pacific Association for Machine Translation, September 2011.
URL http://research.microsoft.com/apps/pubs/default.aspx?id=153368.
Robert C. Moore and William Lewis.
Intelligent selection of language model training data.
In Proceedings of the ACL 2010 Conference Short Papers, ACLShort ’10, pages 220–224, Stroudsburg, PA,
USA, 2010. Association for Computational Linguistics.
URL http://dl.acm.org/citation.cfm?id=1858842.1858883.