What do Neural Machine Translation Models Learn about Morphology?
Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad and James Glass
Presented at the ACL 2017 reading group, 8/11
Hayahide Yamagishi (M1)
Introduction
● “Little is known about what and how much NMT models learn
about each language and its features.”
● They try to answer the following questions:
1. Which parts of the NMT architecture capture word structure?
2. What is the division of labor between different components?
3. How do different word representations help learn better morphology and modeling of infrequent words?
4. How does the target language affect the learning of word structure?
● Task: Part-of-Speech tagging and morphological tagging
Task
● Part-of-Speech (POS) tagging
○ computer → NN
○ computers → NNS
● Morphological tagging
○ he → 3rd person, singular, masculine, subject
○ him → 3rd person, singular, masculine, object
● Task: hidden states → tag
○ They test each hidden state in this way.
○ If the accuracy is high, the hidden states have learned a good word representation.
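A minimal sketch of this probing setup (my illustration, not the authors' code; the `encoder` interface is an assumption): run a trained, frozen encoder over tagged sentences and keep one (hidden state, tag) pair per word.

```python
import torch

# Sketch: pair each encoder hidden state with its word's POS/morph tag.
# `encoder` is an assumed callable returning one state vector per token.
def collect_state_tag_pairs(encoder, sentences, tag_sequences):
    pairs = []
    with torch.no_grad():                     # the NMT model is never updated
        for words, tags in zip(sentences, tag_sequences):
            states = encoder(words)           # (sentence_length, hidden_dim)
            pairs.extend(zip(states, tags))   # one (state, tag) pair per word
    return pairs
```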
Methodology
1. Training the NMT models (Bahdanau attention, LSTM)
2. Using the trained models as feature extractors.
3. Training a feedforward NN on the state-tag pairs
○ one hidden layer: input layer, hidden layer, output layer
4. Test
● “Our goal is not to beat the state-of-the-art on a given task.”
● “We also experimented with a linear classifier and observed
similar trends to the non-linear case.”
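The probe itself might look like the sketch below; PyTorch, the dimensions, and the hyperparameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# One-hidden-layer probe: input layer -> hidden layer -> output tag logits.
class TagProbe(nn.Module):
    def __init__(self, state_dim, num_tags, hidden_dim=500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),   # input layer -> hidden layer
            nn.ReLU(),
            nn.Linear(hidden_dim, num_tags),    # hidden layer -> output layer
        )

    def forward(self, state):
        return self.net(state)

def train_probe(probe, pairs, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for state, tag in pairs:                # tag: integer class index
            opt.zero_grad()
            logits = probe(state.unsqueeze(0))  # add a batch dimension
            loss = loss_fn(logits, torch.tensor([tag]))
            loss.backward()                     # only probe weights get gradients
            opt.step()
```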
Data
● Language Pair:
○ {Arabic, German, French, Czech} - English
○ Arabic - Hebrew (Both languages are morphologically-rich and similar.)
○ Arabic - German (Both languages are morphologically-rich but different.)
● Parallel corpus: TED
● POS annotated data
○ Gold: included in some datasets
○ Predicted: produced by freely available taggers
Char-based Encoder
● Character-aware Neural Language
Model [Kim+, AAAI2016]
● Character-based Neural Machine
Translation [Costa-jussa and Fonollosa,
ACL2016]
● Character embeddings are composed into a word embedding
● The resulting word embeddings are fed into the word-level RNN (a language model in [Kim+, AAAI2016]; the NMT encoder here)
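A rough sketch of the character-to-word composition (a simplified Kim-style character CNN; the real model uses several filter widths plus a highway layer, and all sizes here are assumptions):

```python
import torch
import torch.nn as nn

# Embed characters, convolve over the character sequence, max-pool over
# time: one vector per word, built only from its characters.
class CharWordEncoder(nn.Module):
    def __init__(self, num_chars, char_dim=25, num_filters=200, width=5):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size=width,
                              padding=width // 2)

    def forward(self, char_ids):               # (num_words, max_word_len)
        x = self.char_emb(char_ids)            # (num_words, len, char_dim)
        x = self.conv(x.transpose(1, 2))       # (num_words, filters, len)
        return x.max(dim=2).values             # max over time -> word vectors
```

The resulting word vectors simply replace the word-embedding lookup; the rest of the word-level model is unchanged.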
Effect of word representation (Encoder)
● Word-based vs. Char-based model
● Char-based models yield higher tagging accuracy.
Impact of word frequency
● Frequent words don’t need the character information.
● “The char-based model is able to learn character n-gram
patterns that are important for identifying word structure.”
Confusion matrices
(figure slide: tag confusion matrices for the word- and char-based models)
Analyzing specific tags
● In Arabic, the determiner “Al-” is attached to the noun as a prefix.
● The char-based model can distinguish “DT+NNS” from “NNS”.
Effect of encoder depth
● The LSTM adds context information → layer 0 (the word embeddings) performs worse.
● States from layer 1 are more effective than states from layer 2.
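For concreteness, a sketch of what “layer 0/1/2” refers to, assuming a 2-layer LSTM encoder (the split into two separate nn.LSTM modules is only so every layer's states can be read out; dimensions are assumptions):

```python
import torch.nn as nn

# Layer 0 = context-free word embeddings; layers 1 and 2 = LSTM outputs.
class TwoLayerEncoder(nn.Module):
    def __init__(self, vocab_size, dim=500):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm1 = nn.LSTM(dim, dim, batch_first=True)
        self.lstm2 = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, word_ids):                # (batch, seq_len)
        layer0 = self.emb(word_ids)             # embeddings: no context yet
        layer1, _ = self.lstm1(layer0)          # contextualized once
        layer2, _ = self.lstm2(layer1)          # contextualized twice
        return layer0, layer1, layer2           # probe each set separately
```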
Effect of encoder depth
● Char-based models show similar tendencies.
Effect of encoder depth
● BLEU: 2-layer NMT > 1-layer NMT
○ word / char : +1.11 / +0.56
● Layer 1 learns the word representation (structure)
● Layer 2 learns the word meaning
● Word representation alone < word representation + word meaning
Effect of target language
● Translating into a morphologically-rich language is harder.
○ Arabic-English: 24.69 BLEU
○ English-Arabic: 13.37 BLEU
● “How does the target language affect the learned source
language representations?”
○ “Does translating into a morphologically-rich language require more
knowledge about source language morphology?”
● Experiment: Arabic - {Arabic, Hebrew, German, English}
○ Arabic-Arabic: Autoencoder
Result
(figure slide: tagging accuracy for Arabic-source models with each target language)
Effect of target languages
● They expected that translating into morphologically-rich languages would
make the model learn more about morphology. → No
● The accuracy doesn't correlate with the BLEU score
○ The autoencoder doesn't learn morphological representations.
○ A model that merely recreates its input doesn't have to learn morphology.
○ “A better translation model learns more informative representation.”
● Possible explanation
○ The Arabic-English model is simply better than the Arabic-Hebrew and Arabic-German ones.
○ The weaker models may not have the capacity to learn representations of word structure.
Decoder Analysis
● Similar experiments
○ The decoder's input is the gold previous word (teacher forcing).
○ The char-based decoder's input is a char-based representation; its output is still word-level.
● Language pairs: Arabic-English and English-Arabic
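A sketch of the decoder-side state collection under these conditions (`init_state` and `step` are assumed interfaces, not a real library API):

```python
import torch

# Feed the gold previous word at each step (teacher forcing), record the
# decoder hidden state, and pair it with the target word's tag for probing.
def collect_decoder_states(decoder, src_context, tgt_words, tgt_tags):
    pairs = []
    state = decoder.init_state(src_context)
    with torch.no_grad():
        for prev_word, tag in zip(tgt_words, tgt_tags):
            state = decoder.step(prev_word, state, src_context)
            pairs.append((state, tag))          # probe predicts tag from state
    return pairs
```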
Effect of decoder states
● Decoder states don't carry much morphological information.
● BLEU doesn't correlate with the accuracy
○ French-English: 37.8 BLEU / 54.26% accuracy
Effect of attention
● Encoder states
○ Task: creating a generic, close to language-independent representation of the source sentence.
○ With attention, these states serve as a memory that the decoder consults.
○ When the model translates a noun, the attention looks at the source nouns.
● Decoder states
○ Task: using the encoder's representation to generate the target sentence in a specific language.
○ “Without the attention mechanism, the decoder is forced to learn more
informative representations of the target language.”
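Since the models use Bahdanau attention (see Methodology), a minimal additive-attention sketch makes the “memory” picture concrete; shapes and parameter names are illustrative:

```python
import torch
import torch.nn as nn

# Additive (Bahdanau-style) attention: score each encoder state against the
# current decoder state, softmax into weights, and mix a context vector.
class AdditiveAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_enc = nn.Linear(dim, dim, bias=False)
        self.w_dec = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dim); enc_states: (batch, src_len, dim)
        scores = self.v(torch.tanh(
            self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                          # (batch, src_len)
        weights = torch.softmax(scores, dim=-1) # where the decoder "looks"
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights                 # summary of the source memory
```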
Effect of word representation (Decoder)
● Char-based representations don't help the decoder
○ The decoder’s predictions are still done at word level.
○ “In Arabic-English the char-based model reduces the number of generated
unknown words in the MT test set by 25%.”
○ “In English-Arabic the number of unknown words remains roughly the same
between word-based and char-based models.”
Conclusion
● Their results lead to the following conclusions:
○ Char-based representations are better than word-based ones.
○ Lower layers capture morphology, while deeper layers improve translation performance.
○ Translating into morphologically-poorer languages leads to better source
representations.
○ The attentional decoder learns impoverished representations that do not
carry much information about morphology.
● “Jointly learning translation and morphology can possibly
lead to better representations and improved translation.”
