Jaemin Jeong Seminar 2
Neural Machine Translation
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
Our solution requires no changes to the model architecture of a standard NMT system.
Instead, it introduces an artificial token at the beginning of the
input sentence to specify the required target language.
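The artificial token is plain data preprocessing, so it can be sketched in a few lines. This is a minimal sketch that assumes the `<2xx>` spelling used in the paper's examples (for example `<2es>` for Spanish). The helper name `add_target_token` is hypothetical.

```python
def add_target_token(source: str, target_lang: str) -> str:
    """Prepend an artificial target-language token such as <2es> to a source sentence.

    The "<2xx>" format follows the paper's examples; xx is a language code.
    """
    return f"<2{target_lang}> {source}"


# The same source sentence is routed to different target languages
# by changing only the token; the model itself is untouched.
examples = [
    add_target_token("Hello, how are you?", "es"),
    add_target_token("Hello, how are you?", "ja"),
]
for line in examples:
    print(line)
```

Because the token is just another vocabulary item, the encoder and decoder need no extra inputs or branches.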
Abstract
Using a shared wordpiece vocabulary, our approach enables multilingual NMT
with a single model.
English -> French : comparable performance
English -> German : state-of-the-art
French -> English : state-of-the-art
German -> English : state-of-the-art
Several attractive benefits
• Simplicity
• Low-resource language improvements
• Zero-shot translation
Simplicity
• No changes are made to the architecture of the model.
  • New data is simply added.
• No changes are made to the training procedure.
  • Mini-batches are sampled from the mixed-language data, just as in the single-language case.
• No a priori decisions are made about how to allocate parameters to different languages; the system
adapts automatically to use the total number of parameters efficiently to minimize the global loss.
  • With single-language-pair models, 100 languages would require 100² models.
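Sampling mini-batches from mixed-language data can be sketched as pooling every language pair and shuffling once. This is a minimal sketch under the assumption that each source sentence already carries its `<2xx>` token. The function name `mixed_batches` and the one-shot shuffle-and-chunk strategy are illustrative assumptions, not the GNMT input pipeline.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence with <2xx> token, target sentence)


def mixed_batches(corpora: Dict[str, List[Pair]], batch_size: int,
                  seed: int = 0) -> List[List[Pair]]:
    """Pool all language pairs, shuffle once, and cut the pool into mini-batches.

    A batch may contain examples from several language pairs; to the model it
    looks exactly like a single-language batch.
    """
    pooled = [ex for examples in corpora.values() for ex in examples]
    rng = random.Random(seed)  # seeded for reproducibility
    rng.shuffle(pooled)
    return [pooled[i:i + batch_size] for i in range(0, len(pooled), batch_size)]


corpora = {
    "en-es": [("<2es> hello", "hola"), ("<2es> thanks", "gracias")],
    "en-fr": [("<2fr> hello", "bonjour"), ("<2fr> thanks", "merci"), ("<2fr> yes", "oui")],
}
batches = mixed_batches(corpora, batch_size=2)
for batch in batches:
    print(batch)
```

Nothing in the loop depends on the language pair. This reflects the slide's point that the training procedure itself does not change.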
Low-resource language improvements
All parameters are implicitly shared by all the language pairs being modeled.
This forces the model to generalize across language boundaries during training.
When language pairs with little available data and language pairs with abundant
data are mixed into a single model, translation quality on the low-resource
language pair is significantly improved.
Zero-shot translation
Main Contribution: Zero-shot
A surprising benefit of modeling several language pairs in a
single model is that the model can learn to translate between
language pairs it has never seen in this combination during
training.
Unlike zero-shot translation, zero-resource translation (Firat et al.)
requires an additional fine-tuning step.
Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016).
System Architecture
• Google's Neural Machine Translation (GNMT)
Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).
System Architecture
• Add an artificial token
System Architecture
To address the translation of unknown words and to limit the
vocabulary for computational efficiency, we use a shared wordpiece
model across all the source and target data used for training, usually
with 32,000 word pieces.
• BPE
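The slide names BPE as the idea behind the shared subword vocabulary. The sketch below learns BPE merges on a toy word-frequency table: it repeatedly counts adjacent symbol pairs and merges the most frequent one. This is textbook byte-pair encoding, not Google's wordpiece implementation. The toy corpus and the function name `learn_bpe` are illustrative assumptions. A production vocabulary of about 32,000 pieces would come from many more merges over all source and target data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn BPE merge operations from a word-frequency table."""
    # Every word starts as a tuple of single-character symbols.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair with one symbol.
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = new_vocab.get(tuple(out), 0) + freq
        vocab = new_vocab
    return merges


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

Frequent fragments such as "est" become single units. Rare or unknown words still decompose into known pieces, which is how a shared subword model addresses unknown words.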
Datasets, Training Protocols and Evaluation Metrics
• Train
  • WMT'14: En -> Fr, En -> De
  • Google-internal large-scale production datasets: En <-> Ja, En <-> Ko, En <-> Es, En <-> Pt
• Test
  • newstest2014 and newstest2015: Fr -> En, De -> En
• Multilingual models take a little more time to train than single-language-pair models.
  • Larger batch size
  • Higher initial learning rate
• Evaluation: BLEU score
• To test the influence of varying amounts of training data per language pair, we explore two strategies:
  • With oversampling vs. without any change
  • Random sampling
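The oversampling strategy can be sketched as upsampling every language pair to the size of the largest one, while the "without any change" strategy uses the data as is. This is a minimal sketch assuming sampling with replacement for the extra examples. The function name `oversample` and the toy data are hypothetical.

```python
import random
from typing import Dict, List


def oversample(corpora: Dict[str, List], seed: int = 0) -> Dict[str, List]:
    """Upsample every language pair to the size of the largest pair."""
    rng = random.Random(seed)
    target = max(len(examples) for examples in corpora.values())
    balanced = {}
    for pair, examples in corpora.items():
        if not examples:
            raise ValueError(f"language pair {pair!r} has no examples")
        extra = target - len(examples)
        # Keep every original example, then add random duplicates to close the gap.
        balanced[pair] = examples + [rng.choice(examples) for _ in range(extra)]
    return balanced


balanced = oversample({"en-de": ["a", "b", "c"], "en-fr": list("abcdefghij")})
print({pair: len(examples) for pair, examples in balanced.items()})
```

Oversampling gives the smaller pair more weight in the mixed batches. Leaving the data unchanged lets the larger pair dominate the global loss.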
Many to One
• Since there is only a single target language, no additional source token is required. We perform
three sets of experiments:
  • First set
    • Single: De -> En, Fr -> En
    • Multi: De, Fr -> En
    • With oversampling vs. without any change
  • Second set
    • Single: Ja -> En, Ko -> En
    • Multi: Ja, Ko -> En
  • Third set
    • Single: Pt -> En, Es -> En
    • Multi: Pt, Es -> En
One to Many
• Smaller language pair (En→De)
• Larger language pair (En→Fr)
Many to Many
Large-scale Experiments
This section shows the result of combining
12 production language pairs, with a total
of 3B parameters (255M per single-pair model),
into a single multilingual model.
255M × 12 ≈ 3.06B ≈ 3B
Zero-Shot Translation
The most straightforward approach to translating between
languages for which little or no parallel data is available
is explicit bridging:
xx → En (bridging) En → yy
Example: Es -> Ja is performed as Es -> En, then En -> Ja.
Disadvantages:
• translation time doubles
• loss of quality
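The cost difference between explicit bridging and zero-shot translation can be seen by counting decoder calls. This is a minimal sketch that assumes a hypothetical `Decoder` interface mapping a token list, starting with a `<2xx>` token, to an output token list. `fake_decode` is a stand-in that records calls and does not translate. The sketch models only the doubled decoding time; the quality loss from compounding errors is not modeled.

```python
from typing import Callable, List

# Hypothetical decoder interface: input tokens (led by a <2xx> token) -> output tokens.
Decoder = Callable[[List[str]], List[str]]


def zero_shot(decode: Decoder, src: List[str], tgt_lang: str) -> List[str]:
    """One decoding pass: ask the multilingual model for tgt_lang directly."""
    return decode([f"<2{tgt_lang}>"] + src)


def bridged(decode: Decoder, src: List[str], tgt_lang: str, pivot: str = "en") -> List[str]:
    """Two decoding passes: src -> pivot, then pivot -> tgt_lang."""
    pivot_tokens = decode([f"<2{pivot}>"] + src)
    return decode([f"<2{tgt_lang}>"] + pivot_tokens)


calls: List[str] = []


def fake_decode(tokens: List[str]) -> List[str]:
    """Toy stand-in for a model: records the target token and echoes the rest."""
    calls.append(tokens[0])
    return tokens[1:]


bridged(fake_decode, ["hola"], "ja")
print(calls)  # ['<2en>', '<2ja>']
calls.clear()
zero_shot(fake_decode, ["hola"], "ja")
print(calls)  # ['<2ja>']
```

Bridging needs two full decodes (Es -> En, then En -> Ja). The multilingual model answers the Es -> Ja request in one pass by switching the target token.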
Effect of Direct Parallel Data
We explore two ways of leveraging available parallel data to improve zero-shot
translation quality:
• Incrementally training the multilingual model on the additional parallel data for the zero-shot
directions.
• Training a new multilingual model with all available parallel data mixed equally.
1. "Zero-Shot": English <-> {Belarusian (Be), Russian (Ru), Ukrainian (Uk)}
2. "From-Scratch": "Zero-Shot" + additional data Ru <-> {Be, Uk}
3. "Incremental": Take the best checkpoint of the "Zero-Shot" model and run incremental training
on a small portion of the data used to train the "From-Scratch" model for a short period until
convergence (here, 3% of the "Zero-Shot" model's total training time).
Visual Analysis
• Evidence for an interlingua
Visual Analysis
• Partially separated representations
Mixing Languages
Mixed target-language token, with weight 0 ≤ w ≤ 1:
$(1 - w)\,\langle 2ja \rangle + w\,\langle 2ko \rangle$
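The mixing formula interpolates between the embeddings of the two target-language tokens. This is a minimal sketch that assumes a hypothetical toy embedding table with random 8-dimensional vectors standing in for the learned `<2ja>` and `<2ko>` embeddings. It shows only the interpolation, not how the mixed vector is fed to a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8  # toy size; real models use much larger embeddings

# Hypothetical embedding table for the two target-language tokens.
token_embeddings = {
    "<2ja>": rng.normal(size=embed_dim),
    "<2ko>": rng.normal(size=embed_dim),
}


def mixed_target_embedding(w: float) -> np.ndarray:
    """Return (1 - w) * e(<2ja>) + w * e(<2ko>) for 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * token_embeddings["<2ja>"] + w * token_embeddings["<2ko>"]


# Sweeping w moves the conditioning vector from pure Japanese to pure Korean.
for w in (0.0, 0.5, 1.0):
    print(w, np.round(mixed_target_embedding(w)[:3], 3))
```

At w = 0 the vector equals the `<2ja>` embedding and at w = 1 it equals `<2ko>`. Intermediate weights probe how the model behaves between the two target languages.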
Conclusion
We present a simple solution to multilingual NMT.
We show that:
• We can train multilingual NMT models that translate between a number of
different languages using a single model in which all parameters are shared (at slightly lower
translation quality).
• Zero-shot translation without explicit bridging is possible.

原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
 
Gas agency management system project report.pdf
Gas agency management system project report.pdfGas agency management system project report.pdf
Gas agency management system project report.pdf
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 

2021 04-04-google nmt

  • 1.
  • 2. Jaemin Jeong Seminar 2 Neural Machine Translation Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
  • 3. Neural Machine Translation
  • 4. Our solution requires no changes to the model architecture of a standard NMT system. Instead, it introduces an artificial token at the beginning of the input sentence to specify the required target language.
  • 5. Abstract. Using a shared wordpiece vocabulary, our approach enables multilingual NMT with a single model. Results: English -> French: comparable performance; English -> German: state of the art; French -> English: state of the art; German -> English: state of the art.
  • 6. Several attractive benefits: simplicity, low-resource language improvements, and zero-shot translation.
  • 7. Simplicity. No changes are made to the architecture of the model; new data is simply added. No changes are made to the training procedure: mini-batches are sampled from the mixed-language data just as in the single-language case. No a-priori decisions are made about how to allocate parameters to different languages; the system adapts automatically to use the total number of parameters efficiently to minimize the global loss. With single-language-pair models, 100 languages would require up to 100² models.
  • 8. Low-resource language improvements. All parameters are implicitly shared by all the language pairs being modeled, which forces the model to generalize across language boundaries during training. When language pairs with little available data and language pairs with abundant data are mixed into a single model, translation quality on the low-resource language pair is observed to improve significantly.
  • 9. Zero-shot translation. A surprising benefit of modeling several language pairs in a single model is that the model can learn to translate between language pairs it has never seen in this combination during training. Main contribution: zero-shot translation. It differs from zero-resource translation in that the latter approach requires an additional fine-tuning step. Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016).
  • 10. System Architecture: Google’s Neural Machine Translation (GNMT) system. Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).
  • 11. System Architecture: add an artificial token.
  • 12. System Architecture (BPE). To address the translation of unknown words and to limit the vocabulary size for computational efficiency, a shared wordpiece model is used across all the source and target data used for training, usually with 32,000 word pieces.
  • 13. Datasets, Training Protocols and Evaluation Metrics. Training data: WMT'14 En -> Fr and En -> De, plus Google-internal large-scale production datasets for En <-> Ja, En <-> Ko, En <-> Es, and En <-> Pt. Test data: newstest2014 and newstest2015 (Fr -> En, De -> En). Multilingual models take a little more time to train than single-language-pair models and use a larger batch size and a higher initial learning rate. Evaluation metric: BLEU score. To test the influence of varying amounts of training data per language pair, two strategies are explored: with oversampling vs. without any change (random sampling).
  • 14. Many to One. Since there is only a single target language, no additional source token is required. Three sets of experiments are performed. First set: single De -> En and Fr -> En vs. multi De, Fr -> En (with oversampling vs. without any change). Second set: single Ja -> En and Ko -> En vs. multi Ja, Ko -> En. Third set: single Pt -> En and Es -> En vs. multi Pt, Es -> En.
  • 15. One to Many. Smaller language pair (En -> De); larger language pair (En -> Fr).
  • 16. Many to Many
  • 17. Large-scale Experiments. This section shows the result of combining 12 production language pairs, with a total of 3B parameters across the single-pair models (255M per single model), into a single multilingual model: 12 × 255M = 3.06B ≈ 3B.
  • 18. Zero-Shot Translation. The most straightforward approach to translating between languages for which little or no parallel data is available is explicit bridging: xx -> En, then En -> yy (e.g. Es -> En, then En -> Ja, instead of Es -> Ja directly). Disadvantages: translation time doubles, and quality is lost.
  • 19. Effect of Direct Parallel Data. Two ways of leveraging available parallel data to improve zero-shot translation quality are explored: incrementally training the multilingual model on the additional parallel data for the zero-shot directions, and training a new multilingual model with all available parallel data mixed equally. The compared models are: 1. “Zero-Shot”: English <-> {Belarusian (Be), Russian (Ru), Ukrainian (Uk)}. 2. “From-Scratch”: the “Zero-Shot” data plus additional Ru <-> {Be, Uk} data. 3. “Incremental”: take the best checkpoint of the “Zero-Shot” model and run incremental training on a small portion of the data used to train the “From-Scratch” model for a short period until convergence (in this case, 3% of the “Zero-Shot” model's total training time).
  • 20. Effect of Direct Parallel Data
  • 21. Visual Analysis: evidence for an interlingua.
  • 22. Visual Analysis: partially separated representations.
  • 23. Mixing Languages
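  • 24. Mixing Languages: the target-language token is replaced by a weighted combination of the two token embeddings, $(1 - w)\,\langle 2\mathrm{ja} \rangle + w\,\langle 2\mathrm{ko} \rangle$, for a mixing weight $w \in [0, 1]$.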
  • 25. Conclusion. We present a simple solution to multilingual NMT. We show that (1) multilingual NMT models can be trained to translate between a number of different languages using a single model in which all parameters are shared, at slightly lower translation quality, and (2) zero-shot translation without explicit bridging is possible.
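The tagging step from slide 4 (an artificial target-language token at the start of the input) can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace-joined text and the `<2xx>` token spelling that appears on slide 24; the function names `tag_source` and `make_training_pair` are hypothetical, and in a real system the token would be a single vocabulary entry rather than a string prefix.

```python
def tag_source(source: str, target_lang: str) -> str:
    """Prepend an artificial target-language token such as "<2es>".

    Hypothetical helper: only the source side changes; the model
    architecture and the target sentence stay as in single-pair training.
    """
    return f"<2{target_lang}> {source}"


def make_training_pair(source: str, target: str, target_lang: str) -> tuple[str, str]:
    """Hypothetical helper: build one (tagged source, untouched target) example."""
    return tag_source(source, target_lang), target


# Hypothetical example sentences.
print(make_training_pair("Hello, how are you?", "¿Hola, cómo estás?", "es"))
# ('<2es> Hello, how are you?', '¿Hola, cómo estás?')
print(tag_source("Hello, how are you?", "ja"))
# <2ja> Hello, how are you?
```

The same English sentence can be sent to any target language simply by changing the token, which is why no architectural change is needed.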
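The "100 languages -> 100² models" figure on slide 7 is a round upper bound. Counting translation directions exactly gives N(N − 1), since each ordered pair of distinct languages needs its own single-pair model. A quick Python check (the function name is hypothetical); a single multilingual model replaces all of these.

```python
def single_pair_models(num_languages: int) -> int:
    """Hypothetical helper: models needed if every direction gets its own model.

    Each ordered pair of distinct languages (src != tgt) is one direction.
    """
    return num_languages * (num_languages - 1)


for n in (2, 12, 100):
    # n, exact number of directed pairs, the rounder n ** 2 bound
    print(n, single_pair_models(n), n ** 2)
# 2 2 4
# 12 132 144
# 100 9900 10000
```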
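Slide 7 notes that mini-batches are sampled from the mixed-language data just as in the single-language case. A minimal sketch, assuming the examples of every language pair are pooled and shuffled once before slicing; the function name `mixed_minibatches` and the toy corpora are hypothetical, and a production trainer would stream data rather than hold it in a list.

```python
import random


def mixed_minibatches(corpora: dict[str, list[tuple[str, str]]],
                      batch_size: int, seed: int = 0) -> list[list[tuple[str, str]]]:
    """Hypothetical helper: pool every language pair's (tagged source, target)
    examples, shuffle the pool once, and slice it into mini-batches.

    Batching is identical to the single-pair case; the only difference is
    that the pool now mixes several languages.
    """
    pool = [pair for corpus in corpora.values() for pair in corpus]
    random.Random(seed).shuffle(pool)  # seeded for reproducibility
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


# Hypothetical toy corpora; sources already carry their target-language token.
corpora = {
    "en-es": [("<2es> Hello", "Hola"), ("<2es> Thank you", "Gracias")],
    "en-ja": [("<2ja> Hello", "こんにちは")],
    "es-en": [("<2en> Hola", "Hello")],
}
for batch in mixed_minibatches(corpora, batch_size=2):
    print(batch)
```

A single batch can therefore contain examples bound for different target languages; the target token on each source tells the shared model which one to produce.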
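Slide 13 compares training with oversampling against mixing the data without any change. The slides do not say how the oversampling is performed; the sketch below assumes that smaller language pairs are grown to the size of the largest one by repeating them whole and filling the remainder with a random sample. Both function names and the toy corpora are hypothetical.

```python
import random


def oversample_to_largest(corpora: dict[str, list], seed: int = 0) -> dict[str, list]:
    """Hypothetical strategy (a): grow every corpus to the largest corpus size.

    A smaller corpus is repeated as many whole times as fit, and the
    remainder is filled with a random sample drawn without replacement.
    Assumes every corpus is non-empty.
    """
    rng = random.Random(seed)
    target = max(len(c) for c in corpora.values())
    balanced = {}
    for name, corpus in corpora.items():
        repeats, remainder = divmod(target, len(corpus))
        balanced[name] = corpus * repeats + rng.sample(corpus, remainder)
    return balanced


def mix_without_change(corpora: dict[str, list]) -> dict[str, list]:
    """Hypothetical strategy (b): keep the natural, imbalanced corpus sizes."""
    return {name: list(corpus) for name, corpus in corpora.items()}


# Hypothetical sizes: a larger En->Fr corpus and a smaller En->De one.
corpora = {"en-fr": [f"fr{i}" for i in range(5)], "en-de": ["de0", "de1"]}
print({k: len(v) for k, v in oversample_to_largest(corpora).items()})
# {'en-fr': 5, 'en-de': 5}
print({k: len(v) for k, v in mix_without_change(corpora).items()})
# {'en-fr': 5, 'en-de': 2}
```

Oversampling makes each language pair contribute equally to the mixed mini-batches, at the cost of showing small-pair examples more often.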
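Slide 12 describes a single wordpiece vocabulary (usually 32,000 units) shared across all source and target languages, and labels it BPE. The sketch below is the classic byte-pair-encoding merge-learning loop, a closely related technique swapped in for illustration; it is not Google's wordpiece implementation. "Shared" means one merge table learned from the pooled text of every language. The toy corpus is hypothetical.

```python
from collections import Counter


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn byte-pair-encoding merges from word frequencies.

    Words start as character sequences plus an end-of-word marker "</w>",
    and each step merges the most frequent adjacent symbol pair.
    """
    vocab = Counter({tuple(w) + ("</w>",): f for w, f in word_freqs.items()})
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: the first pair seen wins
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if symbols[i:i + 2] == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


# Hypothetical toy corpus, standing in for text pooled from all languages.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(corpus, 3))
# [('e', 's'), ('es', 't'), ('est', '</w>')]
```

Because rare words decompose into learned sub-units, unknown words can still be represented, and the vocabulary size is fixed by the number of merges.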
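Slide 18 contrasts explicit bridging (xx -> En, then En -> yy) with zero-shot translation; one stated disadvantage of bridging is that translation time doubles. The sketch below makes the doubling visible by counting decoding passes. It assumes a hypothetical `Model` interface (a function from a token-tagged sentence to a translation) and a stub that only records its calls; it does not model the quality loss.

```python
from typing import Callable

# Hypothetical interface: a model maps a token-tagged sentence to a translation.
Model = Callable[[str], str]


def bridged_translate(model: Model, sentence: str, pivot: str, target: str) -> str:
    """Explicit bridging: translate to the pivot language, then to the target."""
    pivot_text = model(f"<2{pivot}> {sentence}")   # first decoding pass
    return model(f"<2{target}> {pivot_text}")      # second decoding pass


def zero_shot_translate(model: Model, sentence: str, target: str) -> str:
    """Zero-shot: request the target language directly in one pass."""
    return model(f"<2{target}> {sentence}")


# Hypothetical stub standing in for a trained multilingual model; it records calls.
calls: list[str] = []


def stub_model(tagged: str) -> str:
    calls.append(tagged)
    return f"translation({tagged})"


bridged_translate(stub_model, "Hola", pivot="en", target="ja")
print(len(calls))  # 2 decoding passes (Es -> En -> Ja)
calls.clear()
zero_shot_translate(stub_model, "Hola", target="ja")
print(len(calls))  # 1 decoding pass (Es -> Ja)
```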
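The mixing formula on slide 24, (1 − w)·<2ja> + w·<2ko>, is a linear interpolation between the embeddings of two target-language tokens. A minimal sketch with plain Python lists; the function name and the 2-dimensional embedding values are hypothetical, and in a real model the mixed vector would take the place of the single target-token embedding at the start of the input.

```python
def mix_token_embeddings(emb_a: list[float], emb_b: list[float], w: float) -> list[float]:
    """Hypothetical helper: return (1 - w) * emb_a + w * emb_b, element by element.

    w = 0 reproduces the first target-language token, w = 1 the second.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [(1.0 - w) * a + w * b for a, b in zip(emb_a, emb_b)]


# Hypothetical 2-dimensional embeddings for the <2ja> and <2ko> tokens.
emb_2ja = [1.0, 0.0]
emb_2ko = [0.0, 1.0]
for w in (0.0, 0.25, 1.0):
    print(w, mix_token_embeddings(emb_2ja, emb_2ko, w))
# 0.0 [1.0, 0.0]
# 0.25 [0.75, 0.25]
# 1.0 [0.0, 1.0]
```

Sweeping w from 0 to 1 moves the requested output gradually from one target language toward the other.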