SlideShare a Scribd company logo
1 of 25
Jaemin Jeong Seminar 2
Neural Machine Translation
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
Jaemin Jeong Seminar 3
Neural Machine Translation
Jaemin Jeong Seminar 4
Our Solution requires no changes to the model architecture
from a standard NMT system.
But instead introduces an artificial token at the beginning of the
input sentence to specify the required target language.
Jaemin Jeong Seminar 5
Using a shared wordpiece vocabulary, our approach enables multilingual NMT
systems using a single model.
English -> French : comparable performance
English -> German : state-of-the-art
French -> English : state-of-the-art
German -> English : state-of-the-art
Abstract
Jaemin Jeong Seminar 6
Simplicity
Low-resource language improvements
Zero-shot translation
Several attractive benefits
Jaemin Jeong Seminar 7
 No changes are made to the architecture of the model.
 New data is simply added
No changes are made to the training procedure.
 The mini-batches is sampling from the mixed-language just like a single-
language case.
No a-prior decisions about how to allocate parameters for different languages are
made, the system adapts automatically to use the total number of parameters
efficiently to minimize the global loss.
 if single-language model... , 100 languages -> required 1002 models
Simplicity
Jaemin Jeong Seminar 8
All parameters are implicitly shared by all the language pairs being modeled.
This forces the model to generalize across language boundaries during training.
It is observed that when language pairs with little available data and language pairs
with abundant data are mixed into a single model, translation quality on the low
resource language pair is significantly improved.
Low-resource language improvements
Jaemin Jeong Seminar 9
A surprising benefit of modeling several language pairs in a
single model is that the model can learn to translate between
language pairs it has never seen in this combination during
training.
Main Contribution : Zero shot
Zero resource is the additional fine-tuning step which is
required in the latter approach.
Zero-shot translation
Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016).
Jaemin Jeong Seminar 10
 Google’s Neural Machine Translation (GNMT)
System Architecture
Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).
Jaemin Jeong Seminar 11
Add artificial token
System Architecture
Jaemin Jeong Seminar 12
To address the issue of translation of unknown words and to limit the
vocabulary for computational efficiency we use a shared word piece
model across all the source and target data used for training, usually
with 32000 word pieces.
System Architecture
BPE
Jaemin Jeong Seminar 13
 Train
 WMT 14
 En -> Fr
 En -> De
 Train
 Google-internal large-scale production
datasets
 En <-> Ja
 En <-> Kr
 En <-> Es
 En <-> Pt
 Test
 Newstest2014 and newstest2015
 Fr -> En
 De -> En
Datasets, Training Protocols and Evaluation Metrics
 Multilingual models take a little more time to train
than single language pair models.
 Larger batch size
 Higher initial learning rate
 Evaluate : BLEU score metric
 To test the influence of varying amounts of
training data per language pair we explore two
strategies
 With oversampling vs Without any change
 Random sampling
Jaemin Jeong Seminar 14
 Since there is only a single target language no
additional source token is required -- Perform
three sets of experiments
 First set
 Single : De -> En, Fr -> En
 Multi : De, Fr -> En
 With oversampling vs Without any change
 Second set
 Single : Ja -> En, Ko -> En
 Multi : Ja, Ko -> En
 Third set
 Single : Pt -> En, Es -> En
 Multi : Pt, Es -> En
Many to One
Jaemin Jeong Seminar 15
One to Many
smaller language pair
(En→De)
larger language pair
(En→Fr)
Jaemin Jeong Seminar 16
Many to Many
Jaemin Jeong Seminar 17
Large-scale Experiments
This section shows the result of combining
12 production language pairs having a total
of 3B parameters (255M per single model)
into a single multilingual model.
255M * 12 = 3B
Jaemin Jeong Seminar 18
Zero-Shot Translation
The most straight-forward approach of
translating between languages where no
or little parallel data is available is to use
explicit bridging.
xx→En (bridging) En→yy
Disadvantages :
• time doubles
• loss of quality
Es -> En En -> Ja
Es -> Ja
Jaemin Jeong Seminar 19
explore two ways of leveraging available parallel data to improve zero-shot
translation quality.
 Incrementally training the multilingual model on the additional parallel data for the zero-shot
directions.
 Training a new multilingual model with all available parallel data mixed equally.
1. “Zero-Shot” : English <-> {Belarusian(Be), Russian(Ru), Ukrainian(Uk)}
2. “From-Scratch” : “Zero-Shot” + (additional data) Ru <-> {Be, Uk}
3. “Incremental” : Take the best checkpoint of the “Zero-Shot” model, and run incremental training
on a small portion of the data used to train the “From-Scratch” model for a short period of time until
convergence. (in this case 3% of “Zero-Shot” model total training time)
Effect of Direct Parallel Data
Jaemin Jeong Seminar 20
Effect of Direct Parallel Data
Jaemin Jeong Seminar 21
 Evidence for an Interlingua
Visual Analysis
Jaemin Jeong Seminar 22
 Partially Separated Representations
Visual Analysis
Jaemin Jeong Seminar 23
Mixing Languages
Jaemin Jeong Seminar 24
Mixing Languages
1 – 𝑤 < 2𝑗𝑎 > + 𝑤 < 2𝑘𝑜 >
Jaemin Jeong Seminar 25
We present a simple solution to multilingual NMT.
We show
 We can train multilingual NMT models that can be used to translate between a number of
different languages using a single model where all parameters are shared. (slightly lower
translation quality)
 Zero-shot translation without explicit bridging is possible.
Conclusion

More Related Content

What's hot

Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...MLAI2
 
Learning group dssm - 20170605
Learning group   dssm - 20170605Learning group   dssm - 20170605
Learning group dssm - 20170605Shuai Zhang
 
Inter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT JodhpurInter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT JodhpurniveditJain
 
Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3TOMMYLINK1
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...Joonhyung Lee
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningMLAI2
 
Hufman coding basic
Hufman coding basicHufman coding basic
Hufman coding basicradthees
 
Machine Learning with Go
Machine Learning with GoMachine Learning with Go
Machine Learning with GoJames Bowman
 
Graph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchGraph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchAmrinder Arora
 
Semantic Mask for Transformer Based End-to-End Speech Recognition
Semantic Mask for Transformer Based End-to-End Speech RecognitionSemantic Mask for Transformer Based End-to-End Speech Recognition
Semantic Mask for Transformer Based End-to-End Speech RecognitionWhenty Ariyanti
 
Oblivious Neural Network Predictions via MiniONN Transformations
Oblivious Neural Network Predictions via MiniONN TransformationsOblivious Neural Network Predictions via MiniONN Transformations
Oblivious Neural Network Predictions via MiniONN TransformationsSherif Abdelfattah
 
Interpixel redundancy
Interpixel redundancyInterpixel redundancy
Interpixel redundancyNaveen Kumar
 
Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2TOMMYLINK1
 
Knowledge distillation deeplab
Knowledge distillation deeplabKnowledge distillation deeplab
Knowledge distillation deeplabFrozen Paradise
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 

What's hot (20)

Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...
 
Learning group dssm - 20170605
Learning group   dssm - 20170605Learning group   dssm - 20170605
Learning group dssm - 20170605
 
Inter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT JodhpurInter IIT Tech Meet 2k19, IIT Jodhpur
Inter IIT Tech Meet 2k19, IIT Jodhpur
 
Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
 
Hufman coding basic
Hufman coding basicHufman coding basic
Hufman coding basic
 
Machine Learning with Go
Machine Learning with GoMachine Learning with Go
Machine Learning with Go
 
Graph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First SearchGraph Traversal Algorithms - Breadth First Search
Graph Traversal Algorithms - Breadth First Search
 
Semantic Mask for Transformer Based End-to-End Speech Recognition
Semantic Mask for Transformer Based End-to-End Speech RecognitionSemantic Mask for Transformer Based End-to-End Speech Recognition
Semantic Mask for Transformer Based End-to-End Speech Recognition
 
Oblivious Neural Network Predictions via MiniONN Transformations
Oblivious Neural Network Predictions via MiniONN TransformationsOblivious Neural Network Predictions via MiniONN Transformations
Oblivious Neural Network Predictions via MiniONN Transformations
 
Interpixel redundancy
Interpixel redundancyInterpixel redundancy
Interpixel redundancy
 
CNN for modeling sentence
CNN for modeling sentenceCNN for modeling sentence
CNN for modeling sentence
 
Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2
 
Knowledge distillation deeplab
Knowledge distillation deeplabKnowledge distillation deeplab
Knowledge distillation deeplab
 
Run length encoding
Run length encodingRun length encoding
Run length encoding
 
deep CNN vs conventional ML
deep CNN vs conventional MLdeep CNN vs conventional ML
deep CNN vs conventional ML
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Data compression
Data compressionData compression
Data compression
 
Ca notes
Ca notesCa notes
Ca notes
 

Similar to Neural Machine Translation Seminars

EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...ijnlc
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...eraser Juan José Calderón
 
Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...IJECEIAES
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
 
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...AishwaryaChemate
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Toru Fujino
 
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfOffline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfPo-Chuan Chen
 
Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Sheeyam Shellvacumar
 
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGEADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGEkevig
 
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGEADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGEkevig
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupYves Peirsman
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
Two Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationTwo Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationIJECEIAES
 
Open vocabulary problem
Open vocabulary problemOpen vocabulary problem
Open vocabulary problemJaeHo Jang
 

Similar to Neural Machine Translation Seminars (20)

EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
 
team10.ppt.pptx
team10.ppt.pptxteam10.ppt.pptx
team10.ppt.pptx
 
Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)
 
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfOffline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
 
Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.
 
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGEADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
 
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGEADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
“Neural Machine Translation for low resource languages: Use case anglais - wo...
“Neural Machine Translation for low resource languages: Use case anglais - wo...“Neural Machine Translation for low resource languages: Use case anglais - wo...
“Neural Machine Translation for low resource languages: Use case anglais - wo...
 
NLP unit-VI.pptx
NLP unit-VI.pptxNLP unit-VI.pptx
NLP unit-VI.pptx
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
Two Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationTwo Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query Translation
 
Open vocabulary problem
Open vocabulary problemOpen vocabulary problem
Open vocabulary problem
 

More from JAEMINJEONG5

Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJAEMINJEONG5
 
2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptxJAEMINJEONG5
 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretabilityJAEMINJEONG5
 
2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layersJAEMINJEONG5
 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basisJAEMINJEONG5
 
2021 01-02-linformer
2021 01-02-linformer2021 01-02-linformer
2021 01-02-linformerJAEMINJEONG5
 
2020 12-04-shake shake
2020 12-04-shake shake2020 12-04-shake shake
2020 12-04-shake shakeJAEMINJEONG5
 
2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricksJAEMINJEONG5
 
2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heartJAEMINJEONG5
 
2020 11 1_sleep_net
2020 11 1_sleep_net2020 11 1_sleep_net
2020 11 1_sleep_netJAEMINJEONG5
 
2020 11 3_face_detection
2020 11 3_face_detection2020 11 3_face_detection
2020 11 3_face_detectionJAEMINJEONG5
 

More from JAEMINJEONG5 (20)

Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptx
 
2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx
 
Swin transformer
Swin transformerSwin transformer
Swin transformer
 
2021 06-02-tabnet
2021 06-02-tabnet2021 06-02-tabnet
2021 06-02-tabnet
 
2021 05-04-u2-net
2021 05-04-u2-net2021 05-04-u2-net
2021 05-04-u2-net
 
2021 04-03-sean
2021 04-03-sean2021 04-03-sean
2021 04-03-sean
 
2021 04-01-dalle
2021 04-01-dalle2021 04-01-dalle
2021 04-01-dalle
 
2021 03-02-spade
2021 03-02-spade2021 03-02-spade
2021 03-02-spade
 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretability
 
2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers
 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basis
 
2021 01-02-linformer
2021 01-02-linformer2021 01-02-linformer
2021 01-02-linformer
 
2020 12-04-shake shake
2020 12-04-shake shake2020 12-04-shake shake
2020 12-04-shake shake
 
2020 12-03-vit
2020 12-03-vit2020 12-03-vit
2020 12-03-vit
 
2020 12-2-detr
2020 12-2-detr2020 12-2-detr
2020 12-2-detr
 
2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks
 
2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart
 
2020 11 1_sleep_net
2020 11 1_sleep_net2020 11 1_sleep_net
2020 11 1_sleep_net
 
2020 12-1-adam w
2020 12-1-adam w2020 12-1-adam w
2020 12-1-adam w
 
2020 11 3_face_detection
2020 11 3_face_detection2020 11 3_face_detection
2020 11 3_face_detection
 

Recently uploaded

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 

Recently uploaded (20)

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 

Neural Machine Translation Seminars

  • 1.
  • 2. Jaemin Jeong Seminar 2 Neural Machine Translation Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
  • 3. Jaemin Jeong Seminar 3 Neural Machine Translation
  • 4. Jaemin Jeong Seminar 4 Our Solution requires no changes to the model architecture from a standard NMT system. But instead introduces an artificial token at the beginning of the input sentence to specify the required target language.
  • 5. Jaemin Jeong Seminar 5 Using a shared wordpiece vocabulary, our approach enables multilingual NMT systems using a single model. English -> French : comparable performance English -> German : state-of-the-art French -> English : state-of-the-art German -> English : state-of-the-art Abstract
  • 6. Jaemin Jeong Seminar 6 Simplicity Low-resource language improvements Zero-shot translation Several attractive benefits
  • 7. Jaemin Jeong Seminar 7  No changes are made to the architecture of the model.  New data is simply added No changes are made to the training procedure.  The mini-batches is sampling from the mixed-language just like a single- language case. No a-prior decisions about how to allocate parameters for different languages are made, the system adapts automatically to use the total number of parameters efficiently to minimize the global loss.  if single-language model... , 100 languages -> required 1002 models Simplicity
  • 8. Jaemin Jeong Seminar 8 All parameters are implicitly shared by all the language pairs being modeled. This forces the model to generalize across language boundaries during training. It is observed that when language pairs with little available data and language pairs with abundant data are mixed into a single model, translation quality on the low resource language pair is significantly improved. Low-resource language improvements
  • 9. Jaemin Jeong Seminar 9 A surprising benefit of modeling several language pairs in a single model is that the model can learn to translate between language pairs it has never seen in this combination during training. Main Contribution : Zero shot Zero resource is the additional fine-tuning step which is required in the latter approach. Zero-shot translation Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016).
  • 10. Jaemin Jeong Seminar 10  Google’s Neural Machine Translation (GNMT) System Architecture Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).
  • 11. Jaemin Jeong Seminar 11 Add artificial token System Architecture
  • 12. Jaemin Jeong Seminar 12 To address the issue of translation of unknown words and to limit the vocabulary for computational efficiency we use a shared word piece model across all the source and target data used for training, usually with 32000 word pieces. System Architecture BPE
  • 13. Jaemin Jeong Seminar 13  Train  WMT 14  En -> Fr  En -> De  Train  Google-internal large-scale production datasets  En <-> Ja  En <-> Kr  En <-> Es  En <-> Pt  Test  Newstest2014 and newstest2015  Fr -> En  De -> En Datasets, Training Protocols and Evaluation Metrics  Multilingual models take a little more time to train than single language pair models.  Larger batch size  Higher initial learning rate  Evaluate : BLEU score metric  To test the influence of varying amounts of training data per language pair we explore two strategies  With oversampling vs Without any change  Random sampling
  • 14. Jaemin Jeong Seminar 14  Since there is only a single target language no additional source token is required -- Perform three sets of experiments  First set  Single : De -> En, Fr -> En  Multi : De, Fr -> En  With oversampling vs Without any change  Second set  Single : Ja -> En, Ko -> En  Multi : Ja, Ko -> En  Third set  Single : Pt -> En, Es -> En  Multi : Pt, Es -> En Many to One
  • 15. Jaemin Jeong Seminar 15 One to Many smaller language pair (En→De) larger language pair (En→Fr)
  • 16. Jaemin Jeong Seminar 16 Many to Many
  • 17. Jaemin Jeong Seminar 17 Large-scale Experiments This section shows the result of combining 12 production language pairs having a total of 3B parameters (255M per single model) into a single multilingual model. 255M * 12 = 3B
  • 18. Jaemin Jeong Seminar 18 Zero-Shot Translation The most straight-forward approach of translating between languages where no or little parallel data is available is to use explicit bridging. xx→En (bridging) En→yy Disadvantages : • time doubles • loss of quality Es -> En En -> Ja Es -> Ja
  • 19. Jaemin Jeong Seminar 19 explore two ways of leveraging available parallel data to improve zero-shot translation quality.  Incrementally training the multilingual model on the additional parallel data for the zero-shot directions.  Training a new multilingual model with all available parallel data mixed equally. 1. “Zero-Shot” : English <-> {Belarusian(Be), Russian(Ru), Ukrainian(Uk)} 2. “From-Scratch” : “Zero-Shot” + (additional data) Ru <-> {Be, Uk} 3. “Incremental” : Take the best checkpoint of the “Zero-Shot” model, and run incremental training on a small portion of the data used to train the “From-Scratch” model for a short period of time until convergence. (in this case 3% of “Zero-Shot” model total training time) Effect of Direct Parallel Data
  • 20. Jaemin Jeong Seminar 20 Effect of Direct Parallel Data
  • 21. Jaemin Jeong Seminar 21  Evidence for an Interlingua Visual Analysis
  • 22. Jaemin Jeong Seminar 22  Partially Separated Representations Visual Analysis
  • 23. Jaemin Jeong Seminar 23 Mixing Languages
  • 24. Jaemin Jeong Seminar 24 Mixing Languages 1 – 𝑤 < 2𝑗𝑎 > + 𝑤 < 2𝑘𝑜 >
  • 25. Jaemin Jeong Seminar 25 We present a simple solution to multilingual NMT. We show  We can train multilingual NMT models that can be used to translate between a number of different languages using a single model where all parameters are shared. (slightly lower translation quality)  Zero-shot translation without explicit bridging is possible. Conclusion