Seq2Seq &
Neural Machine Translation
By Sam Witteveen
Seq2Seq Model Uses
• Machine Translation
• Auto Reply
• Dialogue Systems
• Speech Recognition
• Time Series
• Chatbots
• Audio
• Image Captioning
• Q&A
• many more
Why Seq2Seq
• Sequences preserve the order of the inputs
• They allow us to process information that has a
time or order element to it
• They allow us to preserve information that
ordinary feed-forward neural networks would
lose
Seq2Seq Key Components
• Encoder - reads the input one time step at a
time, then creates a hidden state to be passed to
the decoder
• Decoder - takes the hidden state(s) and uses
them to predict the next step in the output
sequence
• Lots of data
Seq2Seq Key idea
The aim is to convert a sequence into a fixed size
feature vector that encodes only the important
information in the sequence while losing the
unnecessary information.
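A minimal sketch of the key idea, assuming a toy recurrent update with arbitrary hand-picked weights (this is not an LSTM, and `encode` and `hidden_size` are names made up for illustration). It only shows that sequences of different lengths are folded into a state of the same fixed size.

```python
import math

def encode(sequence, hidden_size=4):
    """Toy recurrent encoder: fold a sequence of numbers into a fixed-size state."""
    state = [0.0] * hidden_size
    for x in sequence:
        # Each unit mixes the new input with a unit of the previous state.
        # The weights 0.5 and 0.8 are arbitrary; a real encoder learns them.
        state = [
            math.tanh(0.5 * x * (i + 1) / hidden_size + 0.8 * state[i - 1])
            for i in range(hidden_size)
        ]
    return state

short_state = encode([1.0, 2.0])
long_state = encode([0.3, -1.2, 0.7, 2.0, -0.5, 1.1])
print(len(short_state), len(long_state))  # 4 4
```

Whatever the input length, the decoder receives a vector of the same size, which is why the encoder has to keep the important information and drop the rest.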
Seq2Seq
[Diagram: an encoder of LSTM cells reads the embedded input "How are you ?" (steps 1-4); a decoder of LSTM cells, started with <GO>, emits "I am good" (steps 5-7).]
Seq2Seq
Bidirectional Encoder
[Diagram: two rows of LSTM cells read the embedded input "How are you ?" (steps 1-4).]
The BiLSTM Secret
• Bidirectional LSTMs generally work better than
anything else for almost every NLP task
• Often the more BiLSTMs the better
• State of the art is usually BiLSTMs with Attention
• State of the art is still often lacking
Stacked
Bidirectional Encoder
[Diagram: four stacked rows of LSTM cells read the embedded input "How are you ?" (steps 1-4).]
Decoder
[Diagram: the decoder LSTM receives the context vector and <GO>, and emits "I am good" through a TimeDistributed dense layer with softmax over vocab_size outputs.]
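The dense softmax layer at the decoder output can be sketched as follows, assuming a hypothetical six-word vocabulary and made-up scores (`vocab` and `logits` are illustrative, not taken from the talk). The softmax turns one score per vocabulary entry into probabilities, and the most probable entry is picked as the next word.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - highest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary; vocab_size is simply its length.
vocab = ["<PAD>", "<GO>", "<EOS>", "I", "am", "good"]
# Made-up decoder scores for one time step, one per vocabulary entry.
logits = [0.1, 0.0, 0.2, 2.5, 0.3, 0.4]

probs = softmax(logits)
next_word = vocab[probs.index(max(probs))]
print(next_word)  # I
```

Applying the same layer at every decoder time step is what the TimeDistributed wrapper expresses in Keras.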
What is padding
No padding:
Hello how are you today ?
I am fine
Padded to length 8:
‘Hello’ ‘how’ ‘are’ ‘you’ ‘?’ ‘<pad>’ ‘<pad>’ ‘<pad>’
‘I’ ‘am’ ‘fine’ ‘<pad>’ ‘<pad>’ ‘<pad>’ ‘<pad>’ ‘<pad>’
All input sequences must be the same
length as each other.
All output sequences must be the
same length as each other.
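A small sketch of padding in plain Python, assuming sentences are already split into tokens (`pad_batch` is a hypothetical helper, not a library function). Sentences longer than the chosen length are truncated here, which is one possible choice.

```python
PAD = "<pad>"

def pad_batch(sentences, max_len):
    """Pad (or truncate) every token list to exactly max_len tokens."""
    padded = []
    for tokens in sentences:
        tokens = tokens[:max_len]  # truncate anything that is too long
        padded.append(tokens + [PAD] * (max_len - len(tokens)))
    return padded

batch = pad_batch(
    [["Hello", "how", "are", "you", "?"], ["I", "am", "fine"]],
    max_len=8,
)
for row in batch:
    print(row)
```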
Special Tokens
<PAD> padded zero input
<EOS> end of sentence
<GO> telling the decoder to start
<UNK> unknown
<ES2> language to translate to
<OOV> out of vocabulary
Lookup tables
• We can’t pass words directly to the network
• We have to assign each word to an index
number
‘Hello’, ‘how’, ‘are’, ‘you’, ‘?’, ‘<pad>’, ‘<pad>’, ‘<pad>’
23,4,13,14,8,0,0,0
‘Hello’, ‘you’, ‘!’, ‘<pad>’, ‘<pad>’, ‘<pad>’, ‘<pad>’, ‘<pad>’
23,14,23,0,0,0,0,0
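A lookup table can be sketched as a plain dictionary, assuming indices are handed out in order of first appearance and 0 is reserved for padding (`build_lookup` is a hypothetical helper, and the resulting index numbers differ from the ones on the slide).

```python
def build_lookup(sentences, pad="<pad>"):
    """Assign every distinct token an integer index; padding gets 0."""
    word_to_id = {pad: 0}
    for tokens in sentences:
        for tok in tokens:
            if tok not in word_to_id:
                word_to_id[tok] = len(word_to_id)
    return word_to_id

sentences = [
    ["Hello", "how", "are", "you", "?", "<pad>", "<pad>", "<pad>"],
    ["Hello", "you", "!", "<pad>", "<pad>", "<pad>", "<pad>", "<pad>"],
]
word_to_id = build_lookup(sentences)
encoded = [[word_to_id[tok] for tok in tokens] for tokens in sentences]
print(encoded[0])  # [1, 2, 3, 4, 5, 0, 0, 0]
print(encoded[1])  # [1, 4, 6, 0, 0, 0, 0, 0]
```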
Lookup tables
• We often discard words that are used very little and
replace them with the unknown token <UNK>
• The bigger the vocabulary, the harder prediction
becomes, because the decoder has more words to
choose between, so we keep the words that are used
the most.
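One way to keep only the most-used words, sketched with the standard library (`build_vocab`, `encode`, and the tiny corpus are made up for illustration). Anything outside the kept vocabulary maps to <UNK>.

```python
from collections import Counter

def build_vocab(corpus, max_words):
    """Keep the max_words most frequent tokens; reserve 0 and 1 for special tokens."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Map tokens to indices, falling back to <UNK> for unseen or discarded words."""
    return [vocab.get(tok, vocab["<UNK>"]) for tok in tokens]

corpus = [["how", "are", "you"], ["how", "are", "things"], ["how", "odd"]]
vocab = build_vocab(corpus, max_words=2)
print(encode(["how", "are", "you"], vocab))  # [2, 3, 1]
```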
Embeddings
• Embeddings allow us to extract more semantic
meaning from the words
• Embeddings like Word2Vec have been trained
on billions of words and have abstracted a lot of
the meaning from those words.
Inputs
• GloVe embedding = 100 dim
• Max sequence length = 30
• Batch size = 64
• Tensor input shape = (64,30,100)
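The input shape follows directly from the three numbers above. A rough sketch using nested lists, assuming a hypothetical vocabulary of 1000 words and random vectors standing in for the pretrained embeddings:

```python
import random

batch_size, max_len, embed_dim = 64, 30, 100
vocab_size = 1000  # hypothetical; the slide does not give a vocabulary size

random.seed(0)
# Stand-in embedding table: one 100-dim vector per vocabulary index.
embedding = [
    [random.uniform(-1.0, 1.0) for _ in range(embed_dim)]
    for _ in range(vocab_size)
]
# A batch of padded sentences, each a list of 30 word indices.
token_ids = [
    [random.randrange(vocab_size) for _ in range(max_len)]
    for _ in range(batch_size)
]
# Looking up every index replaces it with its embedding vector.
inputs = [[embedding[idx] for idx in row] for row in token_ids]

shape = (len(inputs), len(inputs[0]), len(inputs[0][0]))
print(shape)  # (64, 30, 100)
```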
Language Translation
• USD $40 billion a year
• Google translates over 100 billion words a day
• Facebook has been working on its own systems
• E-commerce, etc.
Why is translation hard?
• The correct word to use depends on other words in
the sentence
• Order of words can change in different languages
• Rules don’t work, need to use statistical approaches
• Traditional SMT was really complicated
Why NMT
Chris Manning - Stanford Course
Any to Any in the past
• Google: ~80 languages
• 6400 bilingual MT systems (80 × 80)
• Interlingua: 80 encoders + 80 decoders
<ES2> translate to Spanish
<IT2> translate to Italian
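The arithmetic behind these counts, plus a sketch of how a target-language token might be used. It assumes the token is prepended to the source sentence, as in Google's multilingual system; `tag_source` is a hypothetical helper.

```python
n_languages = 80

# One dedicated system per (source, target) combination, as counted on the slide.
pairwise_systems = n_languages * n_languages
# One encoder and one decoder per language around a shared representation.
shared_components = n_languages + n_languages
print(pairwise_systems, shared_components)  # 6400 160

def tag_source(tokens, target_token):
    """Prepend the target-language token so one model can serve many targets."""
    return [target_token] + tokens

to_spanish = tag_source(["How", "are", "you", "?"], "<ES2>")
to_italian = tag_source(["How", "are", "you", "?"], "<IT2>")
print(to_spanish)
print(to_italian)
```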
NMT
[Diagram: the encoder reads the embedded input "How are you" (steps 1-3) and produces a context; the decoder, started with <GO>, emits "I am good" (steps 5-7).]
Code time
Vanilla Seq2Seq Problems
• Works well on short sentences but not long ones
• LSTMs can remember out to about 30 steps
• Drops off very quickly after 30
Advanced Seq2Seq
• Attention
• Teacher Forcing
• Peeking
• Beam Search
Attention
• It even improves results on short sentences
• NMT without attention often generates sentences
with good grammar but gets names wrong or
repeats itself
• Attention gives us something like a fixed vector of
RAM that we can use to score the words
What words are important?
Last Friday David’s team went out but the others stayed in
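Attention Scoring
[Diagram: the encoder reads the embedded input "How are you" (steps 1-3) and produces hidden states h0, h1, h2; the decoder, started with <GO>, emits "I am good" (steps 5-7). An attention vector scores the encoder states, with example values .8, .1, .7.]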
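Attention scoring can be sketched as below. The slide does not say which scoring function is used, so this assumes plain dot-product scores followed by a softmax; the state values are made up and `attend` is a hypothetical helper.

```python
import math

def softmax(scores):
    """Normalise scores into attention weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score each encoder state against the decoder state, then average them."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the encoder states.
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(len(decoder_state))
    ]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for h0, h1, h2
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

Here h0 and h2 line up with the decoder state and receive equal, larger weights, while h1 receives the smallest.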
Teacher Forcing
When training the network, instead of letting
the decoder feed its own prediction into the next
step, feed in the correct word/state
[Diagram: GRU cells; the encoder reads the embedded input "How are you ?" (steps 1-4); the decoder, started with <GO>, is fed the correct words "I" and "am" while producing "I am fine" (steps 5-7).]
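In practice teacher forcing often comes down to how the training data is laid out: the decoder input is the correct output shifted right by one step. A minimal sketch, assuming <GO> and <EOS> mark the start and end (`teacher_forcing_pairs` is a hypothetical helper):

```python
def teacher_forcing_pairs(target_tokens, go="<GO>", eos="<EOS>"):
    """Build decoder inputs and targets for one training sentence."""
    # At each step the decoder is fed the previous correct word...
    decoder_inputs = [go] + target_tokens
    # ...and is trained to predict the next correct word.
    decoder_targets = target_tokens + [eos]
    return decoder_inputs, decoder_targets

decoder_inputs, decoder_targets = teacher_forcing_pairs(["I", "am", "fine"])
print(decoder_inputs)   # ['<GO>', 'I', 'am', 'fine']
print(decoder_targets)  # ['I', 'am', 'fine', '<EOS>']
```

At prediction time there is no correct word to feed in, so the decoder consumes its own previous output instead.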
Peeking
[Diagram: a BiLSTM encoder reads the embedded input "How are you" (steps 1-3) and produces a context vector; the decoder, started with <GO>, emits "I am good" (steps 5-7).]
Keras Resources
Resources
• https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
• http://distill.pub/2016/augmented-rnns/
• Chris Manning - NLP with Deep Learning Stanford
• Quoc Le - Seq2Seq Deep Learning: https://www.youtube.com/watch?v=G5RY_SUJih4
Papers
• Grammar as a Foreign Language - Vinyals et al.
• Google’s Neural Machine Translation System: Bridging
the Gap between Human and Machine Translation - Wu et al.
• Google’s Multilingual Neural Machine Translation System:
Enabling Zero-Shot Translation - Johnson et al.
• Neural Machine Translation and Sequence-to-sequence
Models: A Tutorial - Neubig
The End
Contact
email: s@maclea.ai
twitter: sam_witteveen
https://github.com/samwit/
