Chenhui Chu - 2017 - An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation
ACL 2017, Vancouver, July 31, 2017
An Empirical Comparison of Domain Adaptation Methods for
Neural Machine Translation
Chenhui Chu,1 Raj Dabre,2 Sadao Kurohashi2
chu@ids.osaka-u.ac.jp, raj@nlp.ist.i.kyoto-u.ac.jp, kuro@i.kyoto-u.ac.jp
1Osaka University, 2Kyoto University
1. Attention-based Neural Machine Translation [Bahdanau+ 2015]
[Figure: attention-based encoder-decoder NMT — the input "新たな 翻訳 手法 を 提案 する" is embedded and encoded by forward and backward LSTMs; an attention mechanism and a decoder LSTM with a softmax output layer then generate the output "We propose a novel translation method" <EOS>.]
NMT requires large-scale data to learn its large number of parameters!
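For illustration, below is a minimal NumPy sketch of one additive (Bahdanau-style) attention step: each encoder state is scored against the current decoder state, the scores are normalized with a softmax, and the weights form a context vector that feeds the next decoding step. The parameter names (W_enc, W_dec, v) and dimensions are illustrative assumptions, not the exact parameterization of the system in this poster.

import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(enc_states, dec_state, W_enc, W_dec, v):
    # enc_states: (T, d_enc) forward + backward LSTM encoder states
    # dec_state:  (d_dec,)   current decoder LSTM state
    # W_enc: (d_att, d_enc), W_dec: (d_att, d_dec), v: (d_att,)
    scores = np.tanh(enc_states @ W_enc.T + W_dec @ dec_state) @ v  # (T,)
    weights = softmax(scores)                                       # (T,)
    context = weights @ enc_states                                  # (d_enc,)
    return context, weights  # context feeds the decoder/softmax for the next word

# Toy usage with random parameters (a real model learns these).
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 6, 8, 8, 4  # e.g. 6 source tokens: 新たな 翻訳 手法 を 提案 する
context, weights = additive_attention(
    rng.normal(size=(T, d_enc)), rng.normal(size=(d_dec,)),
    rng.normal(size=(d_att, d_enc)), rng.normal(size=(d_att, d_dec)),
    rng.normal(size=(d_att,)))
print(weights.round(2), context.shape)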
2. The Low Resource Domain Problem
BLEU-4 scores for translation performance:
System   NTCIR-CE   ASPEC-CJ   IWSLT-CE   WIKI-CJ
SMT      29.54      36.39      14.31      36.83
NMT      37.11      42.92       7.87      18.29
NMT outperforms SMT in the resource-rich NTCIR-CE and ASPEC-CJ settings, but falls far behind SMT in the resource-poor IWSLT-CE and WIKI-CJ settings.
3. Overview of This Study
Feature      | Fine tuning [Luong+ 2015] etc. | Multi domain [Kobus+ 2016] | Proposed: mixed fine tuning
Performance  | Good                           | Good                       | Best
Speed        | Fast                           | Slowest                    | Slower
Overfitting  | Yes                            | No                         | No
We conducted an empirical study on different domain adaptation methods for NMT.
4. Domain Adaptation Methods for Comparison
4.1. Fine tuning [Luong+ 2015; Sennrich+ 2016; Servan+ 2016; Freitag+ 2016]
[Diagram: train an NMT model (out-of-domain) on the Source-Target (out-of-domain) corpus, then continue training it on the Source-Target (in-domain) corpus to obtain the NMT model (in-domain).]
Fine tuning tends to overfit due to the small size of the in-domain data.
4.2. Multi domain [Kobus+ 2016]
[Diagram: append the in-domain tag (<2in-domain>) and the out-of-domain tag (<2out-of-domain>) to the respective source sentences, oversample the smaller in-domain corpus, merge the two corpora, and train a single NMT model (mixed).]
[Kobus+ 2016] studies multi domain translation, but it is not aimed at domain adaptation in particular.
Multi domain w/o tags: the same method without appending the domain tags.
Multi domain + Fine tuning: the multi domain model further fine tuned on the Source-Target (in-domain) corpus.
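A minimal Python sketch of the multi domain corpus preparation described above: the domain tag is prepended to every source sentence, the smaller in-domain corpus is oversampled to roughly match the out-of-domain corpus, and the two are merged. The helper names and the in-memory (source, target) pair representation are illustrative assumptions; the tag strings are the ones from the poster.

import math, random

def tag_source(pairs, tag):
    # Prepend a domain tag token to every source sentence.
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]

def oversample(pairs, target_size):
    # Repeat the corpus until it roughly matches target_size sentence pairs.
    repeats = math.ceil(target_size / len(pairs))
    return (pairs * repeats)[:target_size]

def build_multi_domain_corpus(out_domain, in_domain):
    # Merge out-of-domain and (oversampled) in-domain data with domain tags.
    tagged_out = tag_source(out_domain, "<2out-of-domain>")
    tagged_in = tag_source(oversample(in_domain, len(out_domain)), "<2in-domain>")
    merged = tagged_out + tagged_in
    random.shuffle(merged)  # shuffle so mini-batches mix both domains
    return merged

# Toy usage: 4 out-of-domain pairs, 2 in-domain pairs (oversampled to 4).
out_domain = [("patent sentence %d" % i, "translation %d" % i) for i in range(4)]
in_domain = [("spoken sentence %d" % i, "translation %d" % i) for i in range(2)]
print(len(build_multi_domain_corpus(out_domain, in_domain)))  # 8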
4.3. Proposed: mixed fine tuning
[Diagram: first train an NMT model (out-of-domain) on the Source-Target (out-of-domain) corpus, then fine tune it on the merged corpus, i.e., the Source-Target (out-of-domain) corpus plus the oversampled Source-Target (in-domain) corpus with the domain tags (<2in-domain>, <2out-of-domain>) appended, to obtain the NMT model (mixed).]
Mixing the out-of-domain data into the fine tuning corpus prevents overfitting.
Mixed fine tuning w/o tags: the same method without appending the domain tags.
Mixed fine tuning + Fine tuning: the mixed fine tuned model further fine tuned on the Source-Target (in-domain) corpus.
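To make the three training schedules concrete, here is a hedged Python sketch that contrasts fine tuning, multi domain, and mixed fine tuning. init_nmt_model and train are hypothetical placeholders for an NMT toolkit's model construction and (continued) training, not a real API, and build_multi_domain_corpus is the helper from the data-preparation sketch above.

def init_nmt_model():
    # Hypothetical placeholder: build a fresh attention-based NMT model.
    return object()

def train(model, corpus):
    # Hypothetical placeholder: run (or continue) training of `model` on `corpus`
    # until convergence on a development set, returning the updated model.
    return model

def fine_tuning(out_domain, in_domain):
    # 4.1: pretrain on out-of-domain data, then continue on the in-domain data only
    # (tends to overfit because the in-domain corpus is small).
    model = train(init_nmt_model(), out_domain)
    return train(model, in_domain)

def multi_domain(out_domain, in_domain):
    # 4.2: a single training run on the tagged, oversampled, merged corpus.
    # At test time the <2in-domain> tag is appended to the source sentences.
    merged = build_multi_domain_corpus(out_domain, in_domain)
    return train(init_nmt_model(), merged)

def mixed_fine_tuning(out_domain, in_domain):
    # 4.3: pretrain on out-of-domain data, then fine tune on the merged corpus;
    # keeping out-of-domain data in the fine tuning mix prevents overfitting.
    model = train(init_nmt_model(), out_domain)
    merged = build_multi_domain_corpus(out_domain, in_domain)
    return train(model, merged)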
5. Experimental Results & Future Work
High Quality In-domain Corpus Setting Results (BLEU-4), Chinese-to-English:
System NTCIR-CE IWSLT-CE
IWSLT-CE SMT - 14.31
IWSLT-CE NMT - 7.87
NTCIR-CE SMT 29.54 4.33
NTCIR-CE NMT 37.11 2.60
Fine tuning 17.37 16.41
Multi domain 36.40 16.34
Multi domain w/o tags 37.32 14.97
Multi domain + Fine tuning 14.47 15.82
Mixed fine tuning 37.01 18.01
Mixed fine tuning w/o tags 39.67 17.43
Mixed fine tuning + Fine tuning 32.03 17.11
Low Quality In-domain Corpus Setting Results (BLEU-4), Chinese-to-Japanese:
System ASPEC-CJ WIKI-CJ
WIKI-CJ SMT - 36.83
WIKI-CJ NMT - 18.29
ASPEC-CJ SMT 36.39 17.43
ASPEC-CJ NMT 42.92 20.01
Fine tuning 22.10 37.66
Multi domain 42.52 35.79
Multi domain w/o tags 40.78 33.74
Multi domain + Fine tuning 22.78 34.61
Mixed fine tuning 42.56 37.57
Mixed fine tuning w/o tags 41.86 37.23
Mixed fine tuning + Fine tuning 31.63 37.77
# Red numbers indicate the best system and all systems that were not significantly different from the best system
• Resource rich (out-of-domain): NTCIR-CE patent (train: 1M sentence pairs), ASPEC-CJ scientific (train: 672k)
• Resource poor (in-domain): IWSLT-CE spoken (train: 209k), WIKI-CJ automatically extracted Wikipedia (train: 136k)
Future work: leverage in-domain monolingual corpora
[# Figure from Dr. Toshiaki Nakazawa]