SlideShare a Scribd company logo
Why Neural Translations are
the Right Length
Xing Shi, Kevin Knight, Deniz Yuret
EMNLP 2016, short
Introduction
● One of the most remarkable feature of NMT is that
it produces translations of the right length.
○ In SMT, length of the output tend to be shorter due to using BLEU.
○ MERT avoided this problem, but it was the temporary relief.
● Following table show the reason of this problem
SMT NMT
Training Maximum BLEU training Maximum likelihood training
Clarity of finising translating clear not clear
Path numbers of Beam serch heavy Far fewer from the SMT’s
Toy problem
● Copy a input sentence as a output sentence.
○ Vocabularies: only 3 words, ‘a’, ‘b’, and ‘<EOS>’
○ Max length of inputs: 9
○ Encoder-Decoder
○ only 4 hidden-unit LSTM
Result 1
● left: unit1-unit2, right: unit3-unit4
Result
1. “string length”
2. “The number of ‘b’ - that of ‘a’ ”
3. “Initial character of a string is ‘b’ if this values < 1.0.”
4. “Positive correlation between this value and the number of ‘b’
Negative correlation between this value and the length.”
Result 2
● Value1 decrease about 1.0 each time
a letter is read.
● Value1 increase about 1.0 each time
a letter is written.
● If value1 reaches 0, probability of
decoding ‘<EOS>’.
Translation
● WMT2014 En-Fr (12,075,604 pairs)
● 2 layers, 1000-unit LSTM encoder-decoder
(not use “Attention”)
● Using the least squares method to find the cell.
Result
● 109 and 334 maybe
controls the length
● P(‘<EOS>’) shaply
increases when both
value are reach 0
(from 10-8
to 0.998)
(This is the last slide.)

More Related Content

More from Hayahide Yamagishi

[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
Hayahide Yamagishi
 
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
Hayahide Yamagishi
 
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
Hayahide Yamagishi
 
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
Hayahide Yamagishi
 
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
Hayahide Yamagishi
 
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
Hayahide Yamagishi
 
[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend
Hayahide Yamagishi
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
Hayahide Yamagishi
 
[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs
Hayahide Yamagishi
 
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
Hayahide Yamagishi
 
[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?
[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?
[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?
Hayahide Yamagishi
 
A hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsA hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documents
Hayahide Yamagishi
 
ニューラル論文を読む前に
ニューラル論文を読む前にニューラル論文を読む前に
ニューラル論文を読む前に
Hayahide Yamagishi
 
ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御
Hayahide Yamagishi
 
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
Hayahide Yamagishi
 
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
Hayahide Yamagishi
 

More from Hayahide Yamagishi (16)

[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
 
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
[修論発表会資料] 目的言語の文書文脈を用いたニューラル機械翻訳
 
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
[論文読み会資料] Beyond Error Propagation in Neural Machine Translation: Characteris...
 
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
[ACL2018読み会資料] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use C...
 
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
[NAACL2018読み会] Deep Communicating Agents for Abstractive Summarization
 
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
 
[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend[ML論文読み会資料] Teaching Machines to Read and Comprehend
[ML論文読み会資料] Teaching Machines to Read and Comprehend
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
 
[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs[ML論文読み会資料] Training RNNs as Fast as CNNs
[ML論文読み会資料] Training RNNs as Fast as CNNs
 
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
入力文への情報の付加によるNMTの出力文の変化についてのエラー分析
 
[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?
[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?
[ACL2017読み会] What do Neural Machine Translation Models Learn about Morphology?
 
A hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsA hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documents
 
ニューラル論文を読む前に
ニューラル論文を読む前にニューラル論文を読む前に
ニューラル論文を読む前に
 
ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御ニューラル日英翻訳における出力文の態制御
ニューラル日英翻訳における出力文の態制御
 
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
[EMNLP2016読み会] Memory-enhanced Decoder for Neural Machine Translation
 
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
[ACL2016] Achieving Open Vocabulary Neural Machine Translation with Hybrid Wo...
 

Recently uploaded

Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
shahdabdulbaset
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
zubairahmad848137
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
NazakatAliKhoso2
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
MIGUELANGEL966976
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 

Recently uploaded (20)

Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 

Why neural translations are the right length

  • 1. Why Neural Translations are the Right Length Xing Shi, Kevin Knight, Deniz Yuret EMNLP 2016, short
  • 2. Introduction ● One of the most remarkable feature of NMT is that it produces translations of the right length. ○ In SMT, length of the output tend to be shorter due to using BLEU. ○ MERT avoided this problem, but it was the temporary relief. ● Following table show the reason of this problem SMT NMT Training Maximum BLEU training Maximum likelihood training Clarity of finising translating clear not clear Path numbers of Beam serch heavy Far fewer from the SMT’s
  • 3. Toy problem ● Copy a input sentence as a output sentence. ○ Vocabularies: only 3 words, ‘a’, ‘b’, and ‘<EOS>’ ○ Max length of inputs: 9 ○ Encoder-Decoder ○ only 4 hidden-unit LSTM
  • 4. Result 1 ● left: unit1-unit2, right: unit3-unit4
  • 5. Result 1. “string length” 2. “The number of ‘b’ - that of ‘a’ ” 3. “Initial character of a string is ‘b’ if this values < 1.0.” 4. “Positive correlation between this value and the number of ‘b’ Negative correlation between this value and the length.”
  • 6. Result 2 ● Value1 decrease about 1.0 each time a letter is read. ● Value1 increase about 1.0 each time a letter is written. ● If value1 reaches 0, probability of decoding ‘<EOS>’.
  • 7. Translation ● WMT2014 En-Fr (12,075,604 pairs) ● 2 layers, 1000-unit LSTM encoder-decoder (not use “Attention”) ● Using the least squares method to find the cell.
  • 8.
  • 9. Result ● 109 and 334 maybe controls the length ● P(‘<EOS>’) shaply increases when both value are reach 0 (from 10-8 to 0.998) (This is the last slide.)