SlideShare a Scribd company logo
1 of 1
Download to read offline
ACL 2017, Vancouver, July 31, 2017
An Empirical Comparison of Domain Adaptation Methods for
Neural Machine Translation
Chenhui Chu,1 Raj Dabre,2 Sadao Kurohashi2
chu@ids.osaka-u.ac.jp, raj@nlp.ist.i.kyoto-u.ac.jp, kuro@i.kyoto-u.ac.jp
1Osaka University, 2Kyoto University
Osaka University Kyoto University
1. Attention-based Neural Machine Translation [Bahdanau+ 2015]
2. The Low Resource Domain Problem
新たな 翻訳 手法 を 提案 する
embedding
forward LSTM
backward LSTM
We propose a novel translationmethod
attention
decoder LSTM
softmax
input
output
<EOS>
<EOS>
It	requires	large-scale data	for	learning	large	amounts of	parameters!
System NTCIR-CE	 ASPEC-CJ IWSLT-CE WIKI-CJ
SMT	 29.54 36.39	 14.31	 36.83	
NMT	 37.11 42.92	 7.87	 18.29	
3. Overview of This Study
BLEU-4	scores	for	translation	performance
Methods
Features
Fine tuning
[Luong+	2015]	etc.
Multi	domain	
[Kobus+	2016]
Proposed:	mixed	
fine	tuning
Performance Good Good Best
Speed Fast Slowest Slower
Overfitting Yes No No
We conducted an empirical study on different domain adaptation
methods for NMT
4. Domain Adaptation Methods for Comparison
Source-Target
(out-of-domain) NMT Model
(out-of-domain)
Source-Target
(in-domain) NMT Model
(in-domain)
NMT Model
(mixed)
merge
Append in-domain tag
(<2in-domain>)
NMT Model
(in-domain)
Append out-of-domain
tag (<2out-of-domain>)
Oversample the smaller
in-domain corpus
Source-Target
(out-of-domain)
Source-Target
(in-domain)
NMT Model
(mixed)
merge
Append in-domain tag
(<2in-domain>)
NMT Model
(out-of-domain)
NMT Model
(in-domain)
Append out-of-domain
tag (<2out-of-domain>)
Oversample the smaller
in-domain corpus
Source-Target
(out-of-domain)
Source-Target
(in-domain)
System NTCIR-CE	 IWSLT-CE
IWSLT-CE	SMT	 - 14.31	
IWSLT-CE	NMT	 - 7.87	
NTCIR-CE	SMT	 29.54 4.33
NTCIR-CE	NMT	 37.11 2.60	
Fine	tuning	 17.37 16.41
Multi	domain 36.40 16.34	
Multi	domain	w/o	tags 37.32 14.97	
Multi	domain	+	Fine	tuning	 14.47 15.82	
Mixed	fine	tuning 37.01 18.01	
Mixed	fine	tuning	w/o	tags 39.67 17.43	
Mixed	fine	tuning	+	Fine	tuning	 32.03 17.11	
System ASPEC-CJ	 WIKI-CJ
WIKI-CJ	SMT	 - 36.83	
WIKI-CJ	NMT	 - 18.29	
ASPEC-CJ	SMT	 36.39	 17.43	
ASPEC-CJ	NMT	 42.92	 20.01	
Fine	tuning	 22.10	 37.66	
Multi	domain 42.52	 35.79	
Multi	domain	w/o	tags 40.78	 33.74	
Multi	domain	+	Fine	tuning	 22.78	 34.61	
Mixed	fine	tuning 42.56	 37.57
Mixed	fine	tuning	w/o	tags 41.86	 37.23	
Mixed	fine	tuning	+	Fine	tuning	 31.63	 37.77	
5. Experimental Results & Future Work
4.1.	Fine tuning [Luong+	2015;	Sennrich+	2016;	Servan+	2016;	Freitag+	2016]
Fine tuning tends to overfit due to the small size of the in-domain data
4.2. Multi domain [Kobus+ 2016]
4.3. Proposed: mixed fine tuning
[Kobus+ 2016] studies multi domain
translation, but it is not for domain adaptation in particular
Multi domain w/o tags:
if do not append tags
Multi domain + Fine tuning
Mixed fine tuning + Fine tuning
Mixed fine tuning w/o tags:
if do not append tags
Prevent overfitting
• High Quality In-domain Corpus Setting Results (BLEU-4) • Low Quality In-domain Corpus Setting Results (BLEU-4)
# Red numbers indicate the best system and all systems that were not significantly different from the best system
• Resource	rich:	NTCIR-CE	patent (train:	1M),	ASPEC-CJ	scientific	(train:	672k)
• Resource	Poor:	IWSLT-CE	spoken	(train:	209k),	WIKI-CJ	automatically	
extracted Wikipedia (train:	136k)
Future work: leverage in-domain monolingual corpora
[# Figure from Dr. Toshiaki Nakazawa]

More Related Content

What's hot

Karnaugh map or K-map method
Karnaugh map or K-map methodKarnaugh map or K-map method
Karnaugh map or K-map methodAbdullah Moin
 
Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFishAnushree Prasanna Kumar
 
Vlsi project presentation
Vlsi project presentationVlsi project presentation
Vlsi project presentationmdaman89
 
Design and implementation of Closed Loop Control of Three Phase Interleaved P...
Design and implementation of Closed Loop Control of Three Phase Interleaved P...Design and implementation of Closed Loop Control of Three Phase Interleaved P...
Design and implementation of Closed Loop Control of Three Phase Interleaved P...IJMTST Journal
 
Cilk - An Efficient Multithreaded Runtime System
Cilk - An Efficient Multithreaded Runtime SystemCilk - An Efficient Multithreaded Runtime System
Cilk - An Efficient Multithreaded Runtime SystemShareek Ahamed
 
Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2IAEME Publication
 
Design and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select AdderDesign and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select Adderijsrd.com
 
Implementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select AdderImplementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select Adderinventionjournals
 
Fast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeFast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeChristopher Conlan
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionMengmeng Xu
 
A shared-filesystem-memory approach for running IDA in parallel over informal...
A shared-filesystem-memory approach for running IDA in parallel over informal...A shared-filesystem-memory approach for running IDA in parallel over informal...
A shared-filesystem-memory approach for running IDA in parallel over informal...openseesdays
 
(slides 4) Visual Computing: Geometry, Graphics, and Vision
(slides 4) Visual Computing: Geometry, Graphics, and Vision(slides 4) Visual Computing: Geometry, Graphics, and Vision
(slides 4) Visual Computing: Geometry, Graphics, and VisionFrank Nielsen
 
jimmy hacking (at) Microsoft
jimmy hacking (at) Microsoftjimmy hacking (at) Microsoft
jimmy hacking (at) MicrosoftJimmy Schementi
 
Designing a machine learning algorithm for Apache Spark
Designing a machine learning algorithm for Apache SparkDesigning a machine learning algorithm for Apache Spark
Designing a machine learning algorithm for Apache SparkMarco Gaido
 
Advanced flow mapping[bernard]
Advanced flow mapping[bernard]Advanced flow mapping[bernard]
Advanced flow mapping[bernard]Bernard Artola
 
Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...
Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...
Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...IJTET Journal
 

What's hot (19)

Karnaugh map or K-map method
Karnaugh map or K-map methodKarnaugh map or K-map method
Karnaugh map or K-map method
 
Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFish
 
Update on Benchmark 7
Update on Benchmark 7Update on Benchmark 7
Update on Benchmark 7
 
Vlsi project presentation
Vlsi project presentationVlsi project presentation
Vlsi project presentation
 
Design and implementation of Closed Loop Control of Three Phase Interleaved P...
Design and implementation of Closed Loop Control of Three Phase Interleaved P...Design and implementation of Closed Loop Control of Three Phase Interleaved P...
Design and implementation of Closed Loop Control of Three Phase Interleaved P...
 
Cilk - An Efficient Multithreaded Runtime System
Cilk - An Efficient Multithreaded Runtime SystemCilk - An Efficient Multithreaded Runtime System
Cilk - An Efficient Multithreaded Runtime System
 
Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2
 
Design and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select AdderDesign and Verification of Area Efficient Carry Select Adder
Design and Verification of Area Efficient Carry Select Adder
 
Implementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select AdderImplementation of Low Power and Area Efficient Carry Select Adder
Implementation of Low Power and Area Efficient Carry Select Adder
 
Fast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeFast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster Code
 
35th 36th Lecture
35th 36th Lecture35th 36th Lecture
35th 36th Lecture
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action Detection
 
Fp12_Efficient_SCM
Fp12_Efficient_SCMFp12_Efficient_SCM
Fp12_Efficient_SCM
 
A shared-filesystem-memory approach for running IDA in parallel over informal...
A shared-filesystem-memory approach for running IDA in parallel over informal...A shared-filesystem-memory approach for running IDA in parallel over informal...
A shared-filesystem-memory approach for running IDA in parallel over informal...
 
(slides 4) Visual Computing: Geometry, Graphics, and Vision
(slides 4) Visual Computing: Geometry, Graphics, and Vision(slides 4) Visual Computing: Geometry, Graphics, and Vision
(slides 4) Visual Computing: Geometry, Graphics, and Vision
 
jimmy hacking (at) Microsoft
jimmy hacking (at) Microsoftjimmy hacking (at) Microsoft
jimmy hacking (at) Microsoft
 
Designing a machine learning algorithm for Apache Spark
Designing a machine learning algorithm for Apache SparkDesigning a machine learning algorithm for Apache Spark
Designing a machine learning algorithm for Apache Spark
 
Advanced flow mapping[bernard]
Advanced flow mapping[bernard]Advanced flow mapping[bernard]
Advanced flow mapping[bernard]
 
Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...
Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...
Area Delay Power Efficient and Implementation of Modified Square-Root Carry S...
 

Similar to Chenhui Chu - 2017 - An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation

Readinggroup xiang 24112016
Readinggroup xiang 24112016Readinggroup xiang 24112016
Readinggroup xiang 24112016Xiang Zhang
 
Compiler Optimization-Space Exploration
Compiler Optimization-Space ExplorationCompiler Optimization-Space Exploration
Compiler Optimization-Space Explorationtmusabbir
 
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERHARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERcscpconf
 
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxCH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxHafizSaifullah4
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Deep learning-based switchable network for in-loop filtering in high efficie...
Deep learning-based switchable network for in-loop filtering in  high efficie...Deep learning-based switchable network for in-loop filtering in  high efficie...
Deep learning-based switchable network for in-loop filtering in high efficie...IJECEIAES
 
Continuous test suite failure prediction
Continuous test suite failure predictionContinuous test suite failure prediction
Continuous test suite failure predictionssuser94f898
 
Design and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix addersDesign and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix addersIJERA Editor
 
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVCSVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVCIRJET Journal
 
Design Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic Circuits
Design Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic CircuitsDesign Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic Circuits
Design Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic CircuitsIJRES Journal
 
WMS Performance Shootout 2011
WMS Performance Shootout 2011WMS Performance Shootout 2011
WMS Performance Shootout 2011Jeff McKenna
 
Real-time Code Sharing Service for one-to-many coding classes
Real-time Code Sharing Service for one-to-many coding classesReal-time Code Sharing Service for one-to-many coding classes
Real-time Code Sharing Service for one-to-many coding classesa2tt
 
Comparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus UsingComparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus Usingjorgerodriguessimao
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Chainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_otaChainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_otaPreferred Networks
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look ForPostgreSQL 10: What to Look For
PostgreSQL 10: What to Look ForAmit Langote
 

Similar to Chenhui Chu - 2017 - An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation (20)

Readinggroup xiang 24112016
Readinggroup xiang 24112016Readinggroup xiang 24112016
Readinggroup xiang 24112016
 
Compiler Optimization-Space Exploration
Compiler Optimization-Space ExplorationCompiler Optimization-Space Exploration
Compiler Optimization-Space Exploration
 
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODERHARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
HARDWARE SOFTWARE CO-SIMULATION OF MOTION ESTIMATION IN H.264 ENCODER
 
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxCH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptx
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Deep learning-based switchable network for in-loop filtering in high efficie...
Deep learning-based switchable network for in-loop filtering in  high efficie...Deep learning-based switchable network for in-loop filtering in  high efficie...
Deep learning-based switchable network for in-loop filtering in high efficie...
 
Cuda project paper
Cuda project paperCuda project paper
Cuda project paper
 
Continuous test suite failure prediction
Continuous test suite failure predictionContinuous test suite failure prediction
Continuous test suite failure prediction
 
Design and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix addersDesign and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix adders
 
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVCSVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
 
C0161018
C0161018C0161018
C0161018
 
C0161018
C0161018C0161018
C0161018
 
Design Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic Circuits
Design Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic CircuitsDesign Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic Circuits
Design Of 64-Bit Parallel Prefix VLSI Adder For High Speed Arithmetic Circuits
 
WMS Performance Shootout 2011
WMS Performance Shootout 2011WMS Performance Shootout 2011
WMS Performance Shootout 2011
 
Test PDF file
Test PDF fileTest PDF file
Test PDF file
 
Real-time Code Sharing Service for one-to-many coding classes
Real-time Code Sharing Service for one-to-many coding classesReal-time Code Sharing Service for one-to-many coding classes
Real-time Code Sharing Service for one-to-many coding classes
 
Comparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus UsingComparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus Using
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Chainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_otaChainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_ota
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look ForPostgreSQL 10: What to Look For
PostgreSQL 10: What to Look For
 

More from Association for Computational Linguistics

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsAssociation for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 

Recently uploaded (20)

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 

Chenhui Chu - 2017 - An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation

  • 1. ACL 2017, Vancouver, July 31, 2017 An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation Chenhui Chu,1 Raj Dabre,2 Sadao Kurohashi2 chu@ids.osaka-u.ac.jp, raj@nlp.ist.i.kyoto-u.ac.jp, kuro@i.kyoto-u.ac.jp 1Osaka University, 2Kyoto University Osaka University Kyoto University 1. Attention-based Neural Machine Translation [Bahdanau+ 2015] 2. The Low Resource Domain Problem 新たな 翻訳 手法 を 提案 する embedding forward LSTM backward LSTM We propose a novel translationmethod attention decoder LSTM softmax input output <EOS> <EOS> It requires large-scale data for learning large amounts of parameters! System NTCIR-CE ASPEC-CJ IWSLT-CE WIKI-CJ SMT 29.54 36.39 14.31 36.83 NMT 37.11 42.92 7.87 18.29 3. Overview of This Study BLEU-4 scores for translation performance Methods Features Fine tuning [Luong+ 2015] etc. Multi domain [Kobus+ 2016] Proposed: mixed fine tuning Performance Good Good Best Speed Fast Slowest Slower Overfitting Yes No No We conducted an empirical study on different domain adaptation methods for NMT 4. Domain Adaptation Methods for Comparison Source-Target (out-of-domain) NMT Model (out-of-domain) Source-Target (in-domain) NMT Model (in-domain) NMT Model (mixed) merge Append in-domain tag (<2in-domain>) NMT Model (in-domain) Append out-of-domain tag (<2out-of-domain>) Oversample the smaller in-domain corpus Source-Target (out-of-domain) Source-Target (in-domain) NMT Model (mixed) merge Append in-domain tag (<2in-domain>) NMT Model (out-of-domain) NMT Model (in-domain) Append out-of-domain tag (<2out-of-domain>) Oversample the smaller in-domain corpus Source-Target (out-of-domain) Source-Target (in-domain) System NTCIR-CE IWSLT-CE IWSLT-CE SMT - 14.31 IWSLT-CE NMT - 7.87 NTCIR-CE SMT 29.54 4.33 NTCIR-CE NMT 37.11 2.60 Fine tuning 17.37 16.41 Multi domain 36.40 16.34 Multi domain w/o tags 37.32 14.97 Multi domain + Fine tuning 14.47 15.82 Mixed fine tuning 37.01 18.01 Mixed fine tuning w/o tags 39.67 17.43 Mixed fine tuning + Fine tuning 32.03 17.11 System ASPEC-CJ WIKI-CJ WIKI-CJ SMT - 36.83 WIKI-CJ NMT - 18.29 ASPEC-CJ SMT 36.39 17.43 ASPEC-CJ NMT 42.92 20.01 Fine tuning 22.10 37.66 Multi domain 42.52 35.79 Multi domain w/o tags 40.78 33.74 Multi domain + Fine tuning 22.78 34.61 Mixed fine tuning 42.56 37.57 Mixed fine tuning w/o tags 41.86 37.23 Mixed fine tuning + Fine tuning 31.63 37.77 5. Experimental Results & Future Work 4.1. Fine tuning [Luong+ 2015; Sennrich+ 2016; Servan+ 2016; Freitag+ 2016] Fine tuning tends to overfit due to the small size of the in-domain data 4.2. Multi domain [Kobus+ 2016] 4.3. Proposed: mixed fine tuning [Kobus+ 2016] studies multi domain translation, but it is not for domain adaptation in particular Multi domain w/o tags: if do not append tags Multi domain + Fine tuning Mixed fine tuning + Fine tuning Mixed fine tuning w/o tags: if do not append tags Prevent overfitting • High Quality In-domain Corpus Setting Results (BLEU-4) • Low Quality In-domain Corpus Setting Results (BLEU-4) # Red numbers indicate the best system and all systems that were not significantly different from the best system • Resource rich: NTCIR-CE patent (train: 1M), ASPEC-CJ scientific (train: 672k) • Resource Poor: IWSLT-CE spoken (train: 209k), WIKI-CJ automatically extracted Wikipedia (train: 136k) Future work: leverage in-domain monolingual corpora [# Figure from Dr. Toshiaki Nakazawa]