Learning to Translate with Joey NMT
PyData Meetup Montreal
Julia Kreutzer
Feb 25, 2021
Today
1. Neural Machine Translation 101
a. Translation as an ML problem
b. Transformer model
c. The role of data
2. Joey NMT
a. Features and purpose
b. Demo
c. Use cases
3. Q & A
Assuming basic ML knowledge, familiarity with neural networks.
What's the technology behind modern machine translation?
How can you get started?
Why open-sourcing?
Why another toolkit?
[Optional] Demo Preparation
If you want to train your own translation model during this presentation:
1. Open joey_demo.ipynb on Colab.
2. Create a copy.
3. Select GPU runtime: Runtime -> Change runtime type -> Hardware accelerator: GPU
4. Run all cells: Runtime -> Run all
5. Come back to the talk :)
We'll inspect later what's happening there.
Neural Machine Translation 101
Translation as an ML Problem
Challenges
➢ Unlimited length
➢ Structural dependencies
➢ Unseen words
➢ Figurative language
Seq2Seq
➢ Modeling sentences (mostly)
➢ Connections between all words
➢ Sub-word modeling
➢ A lot of training data
Input: What is a poutine ?
Output: Qu'est-ce qu'une poutine ?
The Transformer
"Attention is all you need"
Vaswani et al. 2017
[Figure: Transformer architecture; labels: Decoder, Specialties. Source: Vaswani et al. 2017]
Training vs. Inference
Conditional language modeling: Predict the next token y_t:
● given source X and all previous tokens of the reference during training.
● given source X and previously predicted tokens during inference.
Training with maximum likelihood estimation (MLE), inference with greedy or beam search.
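The difference between the two conditioning regimes can be sketched with a toy example. Everything here is hypothetical: `TOY_TABLE` stands in for a trained model and only looks at the previous token, whereas a real NMT model conditions on the source X and the full prefix.

```python
import math

# Hypothetical toy "model": next-token distributions keyed by the previous token.
TOY_TABLE = {
    "<s>": {"qu'est-ce": 0.6, "est-ce": 0.4},
    "qu'est-ce": {"qu'une": 0.9, "</s>": 0.1},
    "est-ce": {"qu'une": 0.8, "</s>": 0.2},
    "qu'une": {"poutine": 0.95, "</s>": 0.05},
    "poutine": {"?": 0.9, "</s>": 0.1},
    "?": {"</s>": 1.0},
}


def next_token_probs(source, prefix):
    """Return p(y_t | X, y_<t); this toy ignores the source and older tokens."""
    return TOY_TABLE[prefix[-1]]


def training_nll(source, reference):
    """Training (MLE): condition every step on the reference prefix."""
    prefix = ["<s>"]
    nll = 0.0
    for gold in reference + ["</s>"]:
        probs = next_token_probs(source, prefix)
        nll -= math.log(probs.get(gold, 1e-9))
        prefix.append(gold)  # feed the reference token, not the prediction
    return nll


def greedy_decode(source, max_len=10):
    """Inference: condition every step on the model's own predictions."""
    prefix = ["<s>"]
    while len(prefix) <= max_len:
        probs = next_token_probs(source, prefix)
        best = max(probs, key=probs.get)
        if best == "</s>":
            break
        prefix.append(best)  # feed the predicted token back in
    return prefix[1:]


print(greedy_decode("What is a poutine ?"))
print(training_nll("What is a poutine ?", ["qu'est-ce", "qu'une", "poutine", "?"]))
```

Minimizing the negative log-likelihood returned by `training_nll` over a parallel corpus is what MLE training amounts to in this simplified picture.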
Beam Search
Source: G. Neubig's course on MT and Seq2Seq
Keep the k most likely prediction sequences in each step.
➢ more expensive than greedy
➢ more exact
Implementation on mini-batches is tricky!
[Figure: beam search example with k=2]
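A minimal, unbatched sketch of beam search on a made-up distribution may help. The table `TOY` and the function names are assumptions for illustration; there is no length normalization and no mini-batching, which is where real implementations get tricky.

```python
import math

# Hypothetical next-token distributions, keyed by the previous token.
TOY = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
    "w": {"</s>": 1.0},
}


def toy_next_probs(tokens):
    return TOY[tokens[-1]]


def beam_search(next_probs, k=2, max_len=10, bos="<s>", eos="</s>"):
    """Keep the k highest-scoring partial sequences at every step."""
    beams = [(0.0, [bos])]  # (log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, p in next_probs(tokens).items():
                candidates.append((score + math.log(p), tokens + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:k]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    best_score, best_tokens = max(finished, key=lambda c: c[0])
    return [t for t in best_tokens if t not in (bos, eos)], math.exp(best_score)


print(beam_search(toy_next_probs, k=1))  # greedy search is the special case k=1
print(beam_search(toy_next_probs, k=2))
```

On this toy table, greedy search commits to "a" because it is the likeliest first token, while k=2 keeps "b" alive long enough to find the more probable full sequence.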
Words?
Pre-processing plays a huge role in NMT.
qu'est-ce qu'une poutine ? 4 tokens, 4 types
vs
qu ' est - ce qu ' une poutine ? 10 tokens, 8 types
➢ Sub-words instead of words: frequency-based automatic segmentation.
➢ Algorithms: BPE, unigram LM.
➢ Implementations: subword-nmt, SentencePiece.
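The token and type counts above can be reproduced with a few lines of Python. The second function sketches the frequency-counting step at the core of BPE (finding the most frequent adjacent symbol pair); it is a simplified illustration with hypothetical names, not the subword-nmt or SentencePiece implementation.

```python
from collections import Counter


def count_tokens_and_types(text):
    """Whitespace-split the text; tokens are occurrences, types are distinct tokens."""
    tokens = text.split()
    return len(tokens), len(set(tokens))


def most_frequent_pair(words):
    """Count adjacent character pairs across words and return the most frequent one."""
    pairs = Counter()
    for word in words:
        symbols = list(word)
        pairs.update(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0]


print(count_tokens_and_types("qu'est-ce qu'une poutine ?"))
print(count_tokens_and_types("qu ' est - ce qu ' une poutine ?"))
print(most_frequent_pair(["un", "une", "qu'un"]))
```

BPE repeats the pair-counting step, merging the winning pair into a new symbol each time, until a chosen number of merges is reached.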
The Role of Data
A "base"-sized Transformer has ~65M weights. How much data does it need?
➢ It depends!
➢ "As much as you can find" heuristic
➢ Beyond parallel data
○ unsupervised NMT
○ data augmentation
○ dictionaries
○ pre-trained embeddings
○ multilingual modeling
How similar are source and target language?
What kind of quality are you expecting?
How complex is the text?
Evaluation
Input: What is a poutine ?
Reference: Qu'est-ce qu'une poutine ?
Outputs:
1. Est-ce qu'une poutine ?
2. Que-ce une poutine ?
3. Qu'une poutine ?
4. Qu'est-ce qu'un poutin ?
5. C'est qu'une poutine .
How should these outputs be ranked / scored?
Scores for the outputs above:
Output                          BLEU   ChrF
1. Est-ce qu'une poutine ?      59.5   82.8
2. Que-ce une poutine ?         32.0   51.4
3. Qu'une poutine ?             39.4   58.3
4. Qu'est-ce qu'un poutin ?     19.0   74.4
5. C'est qu'une poutine .       32.0   60.8
BLEU: geometric average of token n-gram precisions, with a brevity penalty
ChrF: character n-gram F-score
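To make the character n-gram F-score concrete, here is a simplified sketch. It assumes character n-grams of order 1 to 6 with whitespace removed and beta = 2 (recall weighted more than precision); these choices and the function names are assumptions, and the sketch is not expected to reproduce the exact scores above, which come from a full metric implementation.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_sketch(hypothesis, reference, max_n=6, beta=2.0):
    """Average n-gram precision and recall over orders, then combine as F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


reference = "Qu'est-ce qu'une poutine ?"
for output in ["Qu'est-ce qu'un poutin ?", "Qu'une poutine ?"]:
    print(output, round(chrf_sketch(output, reference), 1))
```

Because it works on characters, this kind of score still rewards an output such as "Qu'est-ce qu'un poutin ?" that is nearly right at the character level but shares few whole tokens with the reference.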
Joey NMT
Joint work with Jasmijn Bastings, Mayumi Ohta and Joey NMT contributors
Problem
+ A lot of code for NMT is online.
+ Free compute through Colab.
+ Data is freely available.
Does that mean it's accessible?
Is it clean?
How long would I have to study it?
Are all features documented?
How can I run it on Colab?
How do I need to prepare data to use it?
Solution
Joey NMT: clean, minimalist, documented.
➢ Much smaller than other toolkits
➢ Covers core features
➢ User study on usability
➢ The core API changes very little.
➢ Examples, pre-trained models, tutorials, FAQ
➢ Based on PyTorch
Does not do everything, does not grow much.
Features
You can:
● train an RNN/Transformer model
● on CPU, one or multiple GPUs
● monitor the training process
● configure hyperparameters
● store it, load it, test it
And more:
● follow training recipes
● modify the code easily
● get inspiration from other extensions
● share/load pre-trained models
It's cute, but can it compete?
Quality?
➢ Comparable to other toolkits.
Adoption?
➢ Not as popular.
Innovation?
➢ More and more research.
It might not be the best choice for
➢ exact replication of another paper -> use their code instead
➢ non-seq2seq applications
➢ performance-critical applications (not optimized for it)
➢ loading BERT (not implemented)
Demo
Cool stuff feat. Joey NMT
Grassroots research communities
➢ Masakhane: NLP for African languages
➢ Turkic Interlingua: NLP for Turkic languages
Extensions
➢ Reinforcement learning
➢ Sign language translation
➢ Speech translation
➢ Image captioning
➢ Slack bot
More on this list.
Material
➢ Neural networks in NLP
○ Y. Goldberg: A Primer on Neural Network Models for Natural Language Processing
○ G. Neubig: CMU CS 11-747: Neural Networks for NLP
➢ Neural machine translation
○ P. Koehn: Neural Machine Translation (Draft Chapter of the Statistical MT book)
○ G. Neubig: Tutorial on Neural Machine Translation
○ A. Rush: The Annotated Transformer
○ J. Bastings: The Annotated Encoder-Decoder
○ M. Müller: Seven Recommendations for MT Evaluation
➢ Joey NMT
○ Joey NMT paper
○ Joey NMT tutorial
○ Masakhane notebooks and YouTube tutorial
○ Turkic Interlingua YouTube tutorial
Thank you!
jkreutzer@google.com
Twitter: @KreutzerJulia
Q & A
