Latent Domain Word Alignment for Heterogeneous Corpora
Hoang Cuong
Joint work with Khalil Sima’an, appearing at NAACL 2015
ILLC, University of Amsterdam
1. An Introduction
Bitext word alignment
Alignment task: identifying translation relationships among the words in parallel sentences.
Proposed by [Brown et al., 1993], it has turned out to be one of the most important tasks in Natural Language Processing.
Figure: Statistical Alignment Framework (a), with components Bilingual Data, Alignment Model and Viterbi Decoding, cf. [Brown et al., 1993].
SMT with Mix-of-Domains Haystack
We have Big DATA to train SMT systems.
Thanks to Europarl, UN, Common Crawl, ...
Data come from very different domains.
How does this affect the alignment accuracy?
Bigger data ≠ better alignment quality
This is in fact not so surprising!
In domain adaptation, [Moore and Lewis, 2010; Axelrod et al., 2011; Cuong and Sima’an, 2014] show that bigger data does not mean better translation!
Word Alignment with Mix-of-Domains Haystack
Why? Haystack = too many different translations!
maestra → master (computer);
maestra → teacher (education); maestra → dean (education);
maestra → crack (other), maestra → ...
Suboptimal alignment quality has been repeatedly observed [Gao et al., 2011; Bach et al., 2008; Banerjee et al., 2012].
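The "too many different translations" effect can be illustrated with a small sketch. The counts below are invented for illustration only (they are not from the talk's data); the point is merely that pooling domains flattens the translation distribution of "maestra", while conditioning on a domain keeps it sharp.

```python
from collections import Counter

# Hypothetical aligned-word counts for Spanish "maestra", invented for illustration.
counts = {
    "computer":  {"master": 90, "teacher": 5, "dean": 5},
    "education": {"master": 5, "teacher": 70, "dean": 25},
}

def normalize(c):
    """Turn raw counts into relative frequencies."""
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

# Pooled estimate t(e | maestra): all domains are mixed together.
pooled = Counter()
for domain_counts in counts.values():
    pooled.update(domain_counts)
pooled_t = normalize(pooled)

# Domain-conditioned estimates t(e | maestra, D).
domain_t = {d: normalize(c) for d, c in counts.items()}

print(pooled_t)              # no translation clearly dominates
print(domain_t["computer"])  # "master" clearly dominates
```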
How to overcome this problem?
Disentangling the Subdomains
Figure: Statistical Alignment Framework (a) vs. Statistical Latent Domain Alignment Framework (b). Panel (a): Bilingual Data, Model, Viterbi Decoding. Panel (b): Bilingual Data, Model 1 ... Model i ... Model K, Viterbi Decoding, Domain 1 ... Domain i ... Domain K.
Technical contributions
“Splitting” alignment statistics P(f, a| e) into different
domain-sensitive alignment statistics P(f, a| e, D) with
latent variable D
Combining domain-sensitive alignment statistics
“Splitting” alignment statistics
Figure: HMM alignment model with observed and latent alignment layers (a). Observed layer (source words): f_{j−1}, f_j, f_{j+1}. Latent alignment layer (target words): a_{j−1}, a_j, a_{j+1}.
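As a rough sketch of what "Viterbi Decoding" means for an HMM alignment model of this shape, the function below finds the most probable alignment sequence by dynamic programming. All parameter tables (`emit`, `trans`, `init`) and their toy values are hypothetical; a trained aligner would estimate them with EM.

```python
import math

def viterbi_align(src, tgt, emit, trans, init):
    """Most probable alignment a_1..a_J under a toy HMM alignment model.

    emit[(f, e)] : p(f | e), translation probability (toy values)
    trans[d]     : p(jump of d target positions), d = a_j - a_{j-1}
    init[i]      : p(a_1 = i)
    """
    I, J = len(tgt), len(src)
    tiny = 1e-12  # floor for unseen events, an assumption of this sketch

    def lp(p):
        return math.log(max(p, tiny))

    # delta[j][i]: best log-score of any alignment prefix ending with a_j = i
    delta = [[-math.inf] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        delta[0][i] = lp(init[i]) + lp(emit.get((src[0], tgt[i]), 0.0))
    for j in range(1, J):
        for i in range(I):
            k = max(range(I),
                    key=lambda k: delta[j - 1][k] + lp(trans.get(i - k, 0.0)))
            back[j][i] = k
            delta[j][i] = (delta[j - 1][k] + lp(trans.get(i - k, 0.0))
                           + lp(emit.get((src[j], tgt[i]), 0.0)))
    # Follow back-pointers from the best final state.
    a = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        a.append(back[j][a[-1]])
    return a[::-1]

# Toy example with made-up probabilities.
src = ["la", "maestra"]
tgt = ["the", "teacher"]
emit = {("la", "the"): 0.9, ("la", "teacher"): 0.1,
        ("maestra", "teacher"): 0.8, ("maestra", "the"): 0.05}
trans = {0: 0.3, 1: 0.5, -1: 0.2}
init = [0.5, 0.5]
alignment = viterbi_align(src, tgt, emit, trans, init)
print(alignment)  # each source position mapped to a target index
```

In a latent domain variant, the same routine could be run once per domain with domain-specific tables; how the per-domain results are combined is a separate question.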
Figure: Latent domain HMM alignment model. Observed layer (source words): f_{j−1}, f_j, f_{j+1}. Latent alignment layer (target words): a_{j−1}, a_j, a_{j+1}. Latent domain layer: D. The additional latent layer represents domains, and both of the other two layers are conditioned on it.
Likelihood
Likelihood:
\[ L \propto \prod_{\langle f, e \rangle} \sum_{D} P(D) \Big[ P(f \mid e, D)\, P(e \mid D) + P(e \mid f, D)\, P(f \mid D) \Big] \]
A joint model between language models and
translation models
Too complex to train, unfortunately (we cannot learn
from scratch now!).
Deep Neural Networks might help (suggested in the talk
of the speaker)!
Learning
Our temporary solution: EM with Partial Supervision
Number of Domains: D takes values in [1..(N + 1)]: one for each of the N available seed samples whose domain we know in advance, plus one for the so-called "out-domain".
Parameter Constraints: We keep the domain prior
parameters fixed for all sentence pairs that belong to
seed samples.
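The idea of EM with partial supervision can be sketched on a much simpler model than the one in the talk. The code below is not the authors' latent domain HMM: it is a toy mixture of unigram models over monolingual "sentences", and it reads the parameter constraint as clamping the domain posterior of every seed sentence to its known domain. The function name `em_partial`, the smoothing constant `alpha` and the example data are all assumptions made for illustration; the toy uses two domains and omits the extra "out-domain" for brevity.

```python
import math
from collections import defaultdict

def em_partial(sentences, seed_labels, num_domains, iters=20, alpha=0.1):
    """EM for a mixture of unigram models, with seed posteriors held fixed.

    sentences   : list of token lists
    seed_labels : dict {sentence index: known domain id}
    """
    vocab = sorted({w for s in sentences for w in s})

    def one_hot(d):
        return [1.0 if k == d else 0.0 for k in range(num_domains)]

    # Seeds start (and stay) at their known domain; the rest start uniform.
    post = [one_hot(seed_labels[i]) if i in seed_labels
            else [1.0 / num_domains] * num_domains
            for i in range(len(sentences))]

    for _ in range(iters):
        # M-step: re-estimate P(D) and P(w | D) from expected counts.
        prior = [sum(p[d] for p in post) / len(sentences)
                 for d in range(num_domains)]
        word_p = []
        for d in range(num_domains):
            c = defaultdict(float)
            for s, p in zip(sentences, post):
                for w in s:
                    c[w] += p[d]
            total = sum(c.values()) + alpha * len(vocab)
            word_p.append({w: (c[w] + alpha) / total for w in vocab})
        # E-step: update posteriors of unlabeled sentences only.
        for i, s in enumerate(sentences):
            if i in seed_labels:
                continue  # partial supervision: seed posteriors are clamped
            logs = [math.log(max(prior[d], 1e-300))
                    + sum(math.log(word_p[d][w]) for w in s)
                    for d in range(num_domains)]
            m = max(logs)
            z = [math.exp(l - m) for l in logs]
            post[i] = [v / sum(z) for v in z]
    return post

# Made-up data: sentences 0 and 1 are seeds for domains 0 and 1.
sentences = [["court", "law"], ["drug", "dose"],
             ["law", "judge"], ["dose", "tablet"]]
post = em_partial(sentences, seed_labels={0: 0, 1: 1}, num_domains=2)
for s, p in zip(sentences, post):
    print(s, [round(v, 3) for v in p])
```

The seeds anchor each mixture component to a known domain, which is what lets the unlabeled sentences be pulled toward the right component instead of EM having to discover the domains from scratch.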
Combining domain-sensitive alignment statistics
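\[
\begin{aligned}
\hat{a} &= \arg\max_{a} \sum_{D} P(f, a, D \mid e) \\
        &= \arg\max_{a} \sum_{D} P(f, a \mid e, D)\, P(D \mid e) \\
        &= \arg\max_{a} \sum_{D} P(f, a \mid e, D)\, P(e \mid D)\, P(D).
\end{aligned}
\]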
Unfortunately, the decoding problem is NP-hard (see [DeNero and Macherey, 2011; Chang et al., 2014]).
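The step from P(D | e) to P(e | D) P(D) in the derivation above can be spelled out with Bayes' rule. This is a short sketch; it assumes, as the slide's notation suggests, that the operator over D is a sum that marginalizes out the latent domain.

```latex
\begin{aligned}
P(f, a \mid e) &= \sum_{D} P(f, a, D \mid e) = \sum_{D} P(f, a \mid e, D)\, P(D \mid e), \\
P(D \mid e) &= \frac{P(e \mid D)\, P(D)}{P(e)}, \\
\arg\max_{a} \sum_{D} P(f, a \mid e, D)\, P(D \mid e)
  &= \arg\max_{a} \frac{1}{P(e)} \sum_{D} P(f, a \mid e, D)\, P(e \mid D)\, P(D) \\
  &= \arg\max_{a} \sum_{D} P(f, a \mid e, D)\, P(e \mid D)\, P(D).
\end{aligned}
```

The factor 1 / P(e) is positive and does not depend on a, so dropping it leaves the maximizer unchanged.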
\[ \hat{a} = \arg\max_{a} \sum_{D} P(f, a \mid e, D)\, P(e \mid D)\, P(D). \]
Two potential solutions
Lagrangian relaxation-based decoder (ack ack I don’t
want to implement this!!!)
Defining an approximate objective function, e.g., its
lower bound (this work!)
ˆa = argmax
a
D
P(f, a| e, D)P(e| D)P(D)
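The slides do not spell out which lower bound is used. As one possible reading, assumed here purely for illustration: a sum of non-negative terms is never smaller than its largest term, so replacing the sum over D by a maximum over D gives a lower bound that needs only one Viterbi pass per domain. The sketch below compares the two objectives by brute force on made-up scores; the function name and all numbers are hypothetical.

```python
def exact_and_bound(scores):
    """Compare the exact objective with a largest-term lower bound.

    scores[D][a] stands for P(f, a | e, D) * P(e | D) * P(D), given for
    every candidate alignment a (toy values, enumerated by brute force).
    Returns (argmax_a of the sum over D, argmax_a of the max over D).
    """
    alignments = list(next(iter(scores.values())).keys())
    exact = max(alignments, key=lambda a: sum(s[a] for s in scores.values()))
    bound = max(alignments, key=lambda a: max(s[a] for s in scores.values()))
    return exact, bound

# Two candidate alignments of a two-word sentence, two domains (made-up scores).
scores = {
    "legal":    {(0, 1): 0.30, (1, 0): 0.25},
    "pharmacy": {(0, 1): 0.02, (1, 0): 0.20},
}
exact, bound = exact_and_bound(scores)
print(exact, bound)
```

On these toy numbers the two maximizers differ, which shows that optimizing a lower bound is an approximation rather than an exact replacement for the original objective.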
Data Preparation
Figure: The corpus Cmix, with the Legal, Pharmacy and Hardware subsets and the rest (3.7M).
We train the latent domain alignment model with prior knowledge derived from the domain information of the three subsets, and compare its alignment accuracy to the baseline.
Alignment results
1 Million
Model    | Prior           | Prec.↑ | Rec.↑ | AER↓
Baseline | -               | 66.95  | 61.29 | 36.00
Latent   | Pharmacy (100K) | 67.85  | 61.72 | 35.36
Latent   | Legal (100K)    | 67.57  | 62.29 | 35.17
Latent   | Hardware (100K) | 69.41  | 63.58 | 33.63
Latent   | ALL (300K)      | 69.64  | 63.30 | 33.68
2 Million
Model    | Prior           | Prec.↑ | Rec.↑ | AER↓
Baseline | -               | 68.34  | 61.58 | 35.22
Latent   | Pharmacy (100K) | 68.85  | 62.58 | 34.43
Latent   | Legal (100K)    | 69.98  | 64.01 | 33.13
Latent   | Hardware (100K) | 69.45  | 63.23 | 33.81
Latent   | ALL (300K)      | 71.51  | 63.87 | 32.53
4 Million
Model    | Prior           | Prec.↑ | Rec.↑ | AER↓
Baseline | -               | 69.37  | 64.30 | 33.26
Latent   | Pharmacy (100K) | 69.69  | 62.80 | 33.94
Latent   | Legal (100K)    | 70.51  | 63.94 | 32.93
Latent   | Hardware (100K) | 71.75  | 64.44 | 32.10
Latent   | ALL (300K)      | 72.16  | 64.30 | 31.99
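For readers unfamiliar with the three columns, the sketch below computes precision, recall and alignment error rate (AER) under their standard definitions, with predicted links A, sure gold links S and possible gold links P. The slides do not say whether the gold standard distinguishes sure from possible links, so that distinction and the example links are assumptions of this sketch.

```python
def alignment_scores(predicted, sure, possible):
    """Precision, recall and AER for sets of (source_idx, target_idx) links.

    Every sure link is also treated as a possible link.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    precision = len(a & p) / len(a)
    recall = len(a & s) / len(s)
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return precision, recall, aer

# Made-up links for a three-word sentence pair.
sure = {(0, 0), (1, 1), (2, 2)}
possible = sure | {(2, 3)}
predicted = {(0, 0), (1, 1), (2, 3)}
precision, recall, aer = alignment_scores(predicted, sure, possible)
print(precision, recall, aer)
```

When every gold link is sure (S = P), AER reduces to one minus the F1 score of precision and recall.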
Discussion
Word alignment should involve latent concepts representing the domains of the data.
We show the benefit of the latent domain: the more we know about the data, the more we can improve performance.
We strongly believe this applies to any statistical model, not only to alignment models.
Challenge: Can we learn the latent domain (alignment) models from scratch?
Bibliography
Amittai Axelrod, Xiaodong He, and Jianfeng Gao.
Domain adaptation via pseudo in-domain data selection.
In Proceedings of EMNLP, 2011.
Nguyen Bach, Qin Gao, and Stephan Vogel.
Improving word alignment with language model based confidence scores.
In Proceedings of the Third Workshop on Statistical Machine Translation, 2008.
Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way, and Josef van Genabith.
Translation quality-based supplementary data selection by incremental update of translation models.
In Martin Kay and Christian Boitet, editors, COLING 2012, 24th International Conference on
Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012,
Mumbai, India, pages 149–166. Indian Institute of Technology Bombay, 2012.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer.
The mathematics of statistical machine translation: parameter estimation.
Comput. Linguist., 1993.
Yin-Wen Chang, Alexander M. Rush, John DeNero, and Michael Collins.
A constrained Viterbi relaxation for bidirectional word alignment.
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Association
for Computational Linguistics, 2014.
URL http://www.aclweb.org/anthology/P/P14/P14-1139.
Hoang Cuong and Khalil Sima’an.
Latent domain translation models in mix-of-domains haystack.
In Proceedings of COLING, 2014.
John DeNero and Klaus Macherey.
Model-based aligner combination using dual decomposition.
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies - Volume 1. Association for Computational Linguistics, 2011.
URL http://dl.acm.org/citation.cfm?id=2002472.2002526.
Qin Gao, Will Lewis, Chris Quirk, and Mei-Yuh Hwang.
Incremental training and intentional over-fitting of word alignment.
In Proceedings of MT Summit XIII. Asia-Pacific Association for Machine Translation, September 2011.
URL http://research.microsoft.com/apps/pubs/default.aspx?id=153368.
Robert C. Moore and William Lewis.
Intelligent selection of language model training data.
In Proceedings of the ACL 2010 Conference Short Papers, ACLShort ’10, pages 220–224, Stroudsburg, PA,
USA, 2010. Association for Computational Linguistics.
URL http://dl.acm.org/citation.cfm?id=1858842.1858883.
21 / 21

More Related Content

What's hot

Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Daniele Di Mitri
 
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextrudolf eremyan
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingNimrita Koul
 
A project on advanced C language
A project on advanced C languageA project on advanced C language
A project on advanced C languagesvrohith 9
 
Diversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesDiversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesBryan Gummibearehausen
 
Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰
Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰
Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰ssuserc35c0e
 
ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"nozyh
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
 
DDH 2021-03-03: Text Processing and Searching in the Medical Domain
DDH 2021-03-03: Text Processing and Searching in the Medical DomainDDH 2021-03-03: Text Processing and Searching in the Medical Domain
DDH 2021-03-03: Text Processing and Searching in the Medical DomainLuukBoulogne
 
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...RIILP
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment andrefsantos
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for TranslationRIILP
 
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)Daisuke BEKKI
 

What's hot (20)

Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
 
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
 
Icml12
Icml12Icml12
Icml12
 
A project on advanced C language
A project on advanced C languageA project on advanced C language
A project on advanced C language
 
Gate-Cs 1997
Gate-Cs 1997Gate-Cs 1997
Gate-Cs 1997
 
Diversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesDiversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News Stories
 
Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰
Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰
Fasttext(Enriching Word Vectors with Subword Information) 논문 리뷰
 
ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"
 
P-6
P-6P-6
P-6
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
DDH 2021-03-03: Text Processing and Searching in the Medical Domain
DDH 2021-03-03: Text Processing and Searching in the Medical DomainDDH 2021-03-03: Text Processing and Searching in the Medical Domain
DDH 2021-03-03: Text Processing and Searching in the Medical Domain
 
grammer
grammergrammer
grammer
 
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
Trivandrum
TrivandrumTrivandrum
Trivandrum
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation
 
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
Why parsing is a part of Language Faculty Science (by Daisuke Bekki)
 

Viewers also liked

Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA RIILP
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015RIILP
 
ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015
ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015
ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015RIILP
 
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015RIILP
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic RIILP
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - AcclaroRIILP
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015RIILP
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD RIILP
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU RIILP
 
Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD RIILP
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic RIILP
 
ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015
ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015
ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015RIILP
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015RIILP
 
2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT IntroductionRIILP
 
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...RIILP
 
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015RIILP
 
7. Intellectual Property - Alberto Massidda (Translated)
7. Intellectual Property - Alberto Massidda (Translated)7. Intellectual Property - Alberto Massidda (Translated)
7. Intellectual Property - Alberto Massidda (Translated)RIILP
 
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translationRIILP
 
10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine Translation10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine TranslationRIILP
 

Viewers also liked (19)

Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
 
ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015
ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015
ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015
 
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - Acclaro
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU
 
Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic
 
ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015
ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015
ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
 
2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction
 
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
 
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015
ESR12 Hanna Bechara - EXPERT Summer School - Malaga 2015
 
7. Intellectual Property - Alberto Massidda (Translated)
7. Intellectual Property - Alberto Massidda (Translated)7. Intellectual Property - Alberto Massidda (Translated)
7. Intellectual Property - Alberto Massidda (Translated)
 
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
 
10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine Translation10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine Translation
 

Similar to Latent Domain Alignment Improves Word Alignment for Mixed Corpora

Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
Variability Management in Domain Specific Languages
Variability Management in Domain Specific LanguagesVariability Management in Domain Specific Languages
Variability Management in Domain Specific LanguagesDavid Méndez-Acuña
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for RetrievalBhaskar Mitra
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...kevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
Ontology-based Cooperation of Information Systems
Ontology-based Cooperation of Information SystemsOntology-based Cooperation of Information Systems
Ontology-based Cooperation of Information SystemsRaji Ghawi
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
Modular Ontologies - A Formal Investigation of Semantics and Expressivity
Modular Ontologies - A Formal Investigation of Semantics and ExpressivityModular Ontologies - A Formal Investigation of Semantics and Expressivity
Modular Ontologies - A Formal Investigation of Semantics and ExpressivityJie Bao
 
Incremental Difference as Feature for Lipreading
Incremental Difference as Feature for LipreadingIncremental Difference as Feature for Lipreading
Incremental Difference as Feature for LipreadingIDES Editor
 
Latent Relational Model for Relation Extraction
Latent Relational Model for Relation ExtractionLatent Relational Model for Relation Extraction
Latent Relational Model for Relation ExtractionGaetano Rossiello, PhD
 
Word Segmentation and Lexical Normalization for Unsegmented Languages
Word Segmentation and Lexical Normalization for Unsegmented LanguagesWord Segmentation and Lexical Normalization for Unsegmented Languages
Word Segmentation and Lexical Normalization for Unsegmented Languageshs0041
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Andre Freitas
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
 
neural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationneural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationJEE HYUN PARK
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingKaterina Vylomova
 
Traits: A New Language Feature for PHP?
Traits: A New Language Feature for PHP?Traits: A New Language Feature for PHP?
Traits: A New Language Feature for PHP?Stefan Marr
 

Similar to Latent Domain Alignment Improves Word Alignment for Mixed Corpora (20)

Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
Variability Management in Domain Specific Languages
Variability Management in Domain Specific LanguagesVariability Management in Domain Specific Languages
Variability Management in Domain Specific Languages
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Latent Domain Alignment Improves Word Alignment for Mixed Corpora

  • 1. Latent Domain Word Alignment for Heterogeneous Corpora. Hoang Cuong (ILLC, University of Amsterdam). Joint work with Khalil Sima’an, appearing at NAACL 2015.
  • 2. An Introduction: bitext word alignment. The alignment task is to identify translation relationships among the words in parallel sentences. Proposed by [Brown et al., 1993], it has turned out to be one of the most important tasks in Natural Language Processing.
  • 3. Bitext word alignment. Figure: Statistical Alignment Framework (a): Bilingual Data → Alignment Model → Viterbi Decoding, cf. [Brown et al., 1993].
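The pipeline in the figure (bilingual data, alignment model, Viterbi decoding) can be sketched with IBM Model 1, the simplest of the models in [Brown et al., 1993]. This is only a minimal illustration under stated assumptions: the three-sentence bitext is made up, the NULL word is omitted, and the function names `train_ibm1` and `viterbi_align` are hypothetical. Here `f` denotes source words and `e` target words, as in the rest of the slides.

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(f | e)."""
    f_vocab = {f for f_sent, _ in bitext for f in f_sent}
    # Uniform initialisation over the source vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e
        for f_sent, e_sent in bitext:
            for f in f_sent:
                # E-step: posterior over which target word generated f.
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    p = t[(f, e)] / norm
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalise the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def viterbi_align(f_sent, e_sent, t):
    """Under Model 1 each source word picks its best target position independently."""
    return [max(range(len(e_sent)), key=lambda i: t[(f_sent[j], e_sent[i])])
            for j in range(len(f_sent))]

# Toy bitext (hypothetical): (source sentence, target sentence) pairs.
bitext = [
    (["la", "casa"], ["the", "house"]),
    (["la", "flor"], ["the", "flower"]),
    (["casa", "verde"], ["green", "house"]),
]
t = train_ibm1(bitext)
alignment = viterbi_align(["la", "casa"], ["the", "house"], t)
print(alignment)  # target position chosen for each source word
```

The decoder is trivial here because Model 1 scores each source position independently; the HMM model used later in the slides needs dynamic programming instead.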
  • 4. SMT with Mix-of-Domains Haystack. We have Big DATA to train SMT systems, thanks to Europarl, UN, Common Crawl, ... The data come from very different domains. How does this affect alignment accuracy? Bigger data ≠ better alignment quality. This is in fact not so surprising! In domain adaptation, [Moore and Lewis, 2010; Axelrod et al., 2011; Cuong and Sima’an, 2014] show that bigger data does not mean better translation!
  • 5. Word Alignment with Mix-of-Domains Haystack. Why? Haystack = too many different translations! maestra → master (computer); maestra → teacher (education); maestra → dean (education); maestra → crack (other); maestra → ... Suboptimal alignment quality has been repeatedly observed [Gao et al., 2011; Bach et al., 2008; Banerjee et al., 2012].
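A small sketch of why pooling domains hurts: with relative-frequency estimates, the translation distribution of "maestra" is spread over many options when all domains are mixed, but becomes sharp once it is conditioned on a domain. The link counts below are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from collections import Counter, defaultdict

# Hypothetical counts of aligned links for the source word "maestra", by domain.
links = [
    ("computer", "master", 40),
    ("education", "teacher", 45),
    ("education", "dean", 5),
    ("other", "crack", 10),
]

def normalise(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {word: n / total for word, n in counter.items()}

pooled = Counter()
per_domain = defaultdict(Counter)
for domain, target, n in links:
    pooled[target] += n
    per_domain[domain][target] += n

p_pooled = normalise(pooled)                                 # all domains mixed
p_domain = {d: normalise(c) for d, c in per_domain.items()}  # one table per domain

print(p_pooled)
print(p_domain["education"])
```

In the pooled table no translation reaches a probability of one half, while the education-only table puts most of its mass on "teacher".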
  • 6. How to overcome this problem?
  • 7. Disentangling the Subdomains. Figure: Statistical Alignment Framework (a) vs. Statistical Latent Domain Alignment Framework (b). (a) Bilingual Data → Model → Viterbi Decoding. (b) Bilingual Data → Domain_1, ..., Domain_i, ..., Domain_K → Model_1, ..., Model_i, ..., Model_K → Viterbi Decoding.
  • 8. Disentangling the Subdomains. Technical contributions: (1) “splitting” the alignment statistics P(f, a | e) into different domain-sensitive alignment statistics P(f, a | e, D) with latent variable D; (2) combining the domain-sensitive alignment statistics.
  • 9. “Splitting” alignment statistics. Figure: HMM alignment model with an observed layer (source words f_{j−1}, f_j, f_{j+1}) and a latent alignment layer (target words, a_{j−1}, a_j, a_{j+1}).
  • 10. “Splitting” alignment statistics. Figure: Latent domain HMM alignment model, with an observed layer (source words f_{j−1}, f_j, f_{j+1}), a latent alignment layer (target words, a_{j−1}, a_j, a_{j+1}), and a latent domain layer D. The additional latent layer representing domains is conditioned on by both of the other two layers.
  • 11. Likelihood. $\mathcal{L} \propto \prod_{\langle f, e \rangle} \sum_{D} P(D)\,\bigl[P(f \mid e, D)\,P(e \mid D) + P(e \mid f, D)\,P(f \mid D)\bigr]$. This is a joint model between language models and translation models. Unfortunately, it is too complex to train (we cannot learn it from scratch yet!). Deep Neural Networks might help (as suggested in the speaker's talk)!
  • 12. Learning. Our temporary solution: EM with Partial Supervision. Number of domains: the values of D ∈ [1..(N + 1)] depend on the N available seed samples whose domain we know in advance, plus the so-called “out-domain”. Parameter constraints: we keep the domain prior parameters fixed for all sentence pairs that belong to seed samples.
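One way to read the partial supervision is as a clamped E-step: sentence pairs from seed samples keep a fixed, one-hot domain assignment, while all other pairs receive an ordinary mixture posterior over the N + 1 domains. The sketch below is an assumption about how this could look, not the authors' implementation; the function name `domain_posteriors` and the toy scores are hypothetical.

```python
import math

def domain_posteriors(log_joint, seed_domain=None):
    """E-step for the latent domain of one sentence pair.

    log_joint[d] is log P(D=d) + log P(f, e | D=d) under the current parameters.
    seed_domain is the known domain index of a seed pair, or None otherwise.
    """
    if seed_domain is not None:
        # Partial supervision (assumed form): clamp to the known domain.
        return [1.0 if d == seed_domain else 0.0 for d in range(len(log_joint))]
    # Normalise in log space (log-sum-exp) for numerical stability.
    m = max(log_joint)
    weights = [math.exp(x - m) for x in log_joint]
    z = sum(weights)
    return [w / z for w in weights]

# N = 2 seed domains plus the "out-domain" gives 3 values of D (toy scores).
log_joint = [math.log(0.2), math.log(0.6), math.log(0.2)]
free = domain_posteriors(log_joint)                     # pair with unknown domain
clamped = domain_posteriors(log_joint, seed_domain=0)   # pair from a seed sample
print(free, clamped)
```

The clamped posteriors then enter the M-step exactly like the free ones, so seed pairs only ever update the statistics of their own domain.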
  • 13. Combining domain-sensitive alignment statistics. $\hat{a} = \arg\max_{a} \sum_{D} P(f, a, D \mid e) = \arg\max_{a} \sum_{D} P(f, a \mid e, D)\,P(D \mid e) = \arg\max_{a} \sum_{D} P(f, a \mid e, D)\,P(e \mid D)\,P(D)$. Unfortunately, the decoding problem is NP-hard (see [DeNero and Macherey, 2011; Chang et al., 2014]).
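To make the exact objective concrete, the sketch below scores every possible alignment by the sum over domains and keeps the best one. It is a brute-force illustration only: it assumes a Model 1 style factorisation of P(f, a | e, D) into per-word lexical probabilities (the slides use an HMM), it treats P(e | D) P(D) as a given weight per domain, and all numbers and names are hypothetical. The enumeration grows as |e|^|f|, so it is feasible only for tiny sentences.

```python
from itertools import product
from math import prod

def decode_exact(f_sent, e_sent, t_by_domain, weight):
    """Return the alignment maximising sum_D weight[D] * prod_j t_D(f_j | e_{a_j})."""
    best, best_score = None, -1.0
    for a in product(range(len(e_sent)), repeat=len(f_sent)):
        score = sum(
            weight[d] * prod(t.get((f, e_sent[i]), 0.0) for f, i in zip(f_sent, a))
            for d, t in t_by_domain.items()
        )
        if score > best_score:
            best, best_score = a, score
    return best, best_score

# Hypothetical domain-specific lexical tables t_D(f | e).
t_by_domain = {
    "education": {("la", "the"): 0.9, ("la", "teacher"): 0.1,
                  ("maestra", "teacher"): 0.8, ("maestra", "the"): 0.2},
    "computer": {("la", "the"): 0.9, ("la", "teacher"): 0.1,
                 ("maestra", "teacher"): 0.1, ("maestra", "the"): 0.3},
}
weight = {"education": 0.7, "computer": 0.3}  # stands in for P(e | D) P(D)

best, score = decode_exact(["la", "maestra"], ["the", "teacher"], t_by_domain, weight)
print(best, score)
```

Because the sum over D sits outside the product over positions, the score no longer decomposes word by word, which is why the per-position argmax that works for a single model does not apply here.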
  • 14. Combining domain-sensitive alignment statistics. $\hat{a} = \arg\max_{a} \sum_{D} P(f, a \mid e, D)\,P(e \mid D)\,P(D)$. Two potential solutions: (1) a Lagrangian relaxation-based decoder (ack ack, I don’t want to implement this!!!); (2) defining an approximate objective function, e.g., its lower bound (this work!): $\hat{a} = \arg\max_{a}\,[\ldots]_{D}\,P(f, a \mid e, D)\,P(e \mid D)\,P(D)$.
  • 15. Data Preparation. Figure: the mixed corpus C_mix, consisting of the Legal, Pharmacy, and Hardware subsets plus the rest (3.7M). We train the latent domain alignment model with the prior knowledge derived from the domain information of the three subsets, and compare its alignment accuracy to the baseline.
  • 16. Alignment results (data size: 1 Million).
      Model     Prior            Prec.↑  Rec.↑  AER↓
      Baseline  -                66.95   61.29  36.00
      Latent    Pharmacy (100K)  67.85   61.72  35.36
      Latent    Legal (100K)     67.57   62.29  35.17
      Latent    Hardware (100K)  69.41   63.58  33.63
      Latent    ALL (300K)       69.64   63.30  33.68
  • 17. Alignment results (data size: 2 Million).
      Model     Prior            Prec.↑  Rec.↑  AER↓
      Baseline  -                68.34   61.58  35.22
      Latent    Pharmacy (100K)  68.85   62.58  34.43
      Latent    Legal (100K)     69.98   64.01  33.13
      Latent    Hardware (100K)  69.45   63.23  33.81
      Latent    ALL (300K)       71.51   63.87  32.53
  • 18. Alignment results (data size: 4 Million).
      Model     Prior            Prec.↑  Rec.↑  AER↓
      Baseline  -                69.37   64.30  33.26
      Latent    Pharmacy (100K)  69.69   62.80  33.94
      Latent    Legal (100K)     70.51   63.94  32.93
      Latent    Hardware (100K)  71.75   64.44  32.10
      Latent    ALL (300K)       72.16   64.30  31.99
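For reference, the three columns in the tables above can be computed from sets of alignment links using the standard definitions of precision, recall, and alignment error rate (AER). The sketch below is a generic implementation of those definitions, not the evaluation script behind the slides; the link sets and the function name `alignment_scores` are hypothetical.

```python
def alignment_scores(predicted, sure, possible=None):
    """Precision, recall and AER (in percent) for sets of (source, target) links.

    sure holds the unambiguous gold links; possible is a superset that also
    contains the ambiguous ones. Without possible links, it defaults to sure.
    """
    possible = sure if possible is None else possible | sure
    a_s = len(predicted & sure)
    a_p = len(predicted & possible)
    precision = 100.0 * a_p / len(predicted)
    recall = 100.0 * a_s / len(sure)
    aer = 100.0 * (1.0 - (a_s + a_p) / (len(predicted) + len(sure)))
    return precision, recall, aer

# Toy example: links are (source position, target position) pairs.
predicted = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1), (2, 2), (3, 3)}
p, r, aer = alignment_scores(predicted, sure)
print(round(p, 2), round(r, 2), round(aer, 2))
```

When there are no separate possible links, AER reduces to 1 − F1. The reported numbers are consistent with this: for the 1 Million baseline, precision 66.95 and recall 61.29 give F1 ≈ 64.00 and hence AER ≈ 36.00.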
  • 19. Discussion. Word alignment should involve latent concepts representing the domains of the data. The benefit we present: with the latent domain, the more we know about the data, the better the performance. We strongly believe this should be applicable to any statistical model and is not limited to alignment models. Challenge: can we learn the latent domain (alignment) models from scratch?
  • 20. Bibliography I
      Amittai Axelrod, Xiaodong He, and Jianfeng Gao. Domain adaptation via pseudo in-domain data selection. In Proceedings of EMNLP, 2011.
      Nguyen Bach, Qin Gao, and Stephan Vogel. Improving word alignment with language model based confidence scores. In Proceedings of the Third Workshop on Statistical Machine Translation, 2008.
      Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way, and Josef van Genabith. Translation quality-based supplementary data selection by incremental update of translation models. In Martin Kay and Christian Boitet, editors, COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India, pages 149–166. Indian Institute of Technology Bombay, 2012.
      Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist., 1993.
      Yin-Wen Chang, Alexander M. Rush, John DeNero, and Michael Collins. A constrained Viterbi relaxation for bidirectional word alignment. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2014. URL http://www.aclweb.org/anthology/P/P14/P14-1139.
  • 21. Bibliography II
      Hoang Cuong and Khalil Sima’an. Latent domain translation models in mix-of-domains haystack. In Proceedings of COLING, 2014.
      John DeNero and Klaus Macherey. Model-based aligner combination using dual decomposition. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, 2011. URL http://dl.acm.org/citation.cfm?id=2002472.2002526.
      Qin Gao, Will Lewis, Chris Quirk, and Mei-Yuh Hwang. Incremental training and intentional over-fitting of word alignment. In Proceedings of MT Summit XIII. Asia-Pacific Association for Machine Translation, September 2011. URL http://research.microsoft.com/apps/pubs/default.aspx?id=153368.
      Robert C. Moore and William Lewis. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort ’10, pages 220–224, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=1858842.1858883.