Deep Learning for Machine Translation
A dramatic turn of paradigm
Alberto Massidda
Who we are
● Founded in 2001;
● Branches in Milan, Rome and London;
● Market leader in enterprise ready solutions based on Open Source tech;
● Expertise:
○ Open Source
○ DevOps
○ Public and private cloud
○ Search
○ BigData and many more...
This presentation is Open Source (yay!)
https://creativecommons.org/licenses/by-nc-sa/3.0/
Outline
1. Statistical Machine Translation
2. Neural Machine Translation
3. Domain Adaptation
4. Zero shot translation
5. Unsupervised Neural MT
Statistical Machine Translation
Translation as the recovery of an enciphered message, using the laws of probability:
1. Foreign language as a noisy channel
2. Language model and Translation model
3. Training (building the translation model)
4. Decoding (translating with the translation model)
Noisy channel model
Goal
Translate a sentence in foreign language f to our language e:
$\hat{e} = \arg\max_e p(e \mid f) = \arg\max_e p(e)\, p(f \mid e)$
The abstract model
1. Transmit e over a noisy channel.
2. The channel garbles the sentence and f is received.
3. Try to recover e by thinking about:
a. how likely it is that e was the message, p(e) (source model)
b. how e is turned into f, p(f|e) (channel model)
Word choice and word reordering
P(f|e) cares about words, in any order.
● “It’s too late” → “Tardi troppo è” ✓
● “It’s too late” → “È troppo tardi” ✓
● “It’s too late” → “È troppa birra” ✗
P(e) cares about word order.
● “È troppo tardi” ✓
● “Tardi troppo è” ✗
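A minimal sketch of how the two models combine, assuming invented probabilities for the three candidates above (none of the numbers come from a real model). The translation model scores word choice only, so the scrambled sentence ties with the fluent one, and the language model breaks the tie.

```python
# Toy noisy-channel scorer. Every probability below is invented for illustration.

# p(e): language model score of each candidate Italian sentence (fluency).
language_model = {
    "È troppo tardi": 0.6,
    "Tardi troppo è": 0.01,   # right words, wrong order
    "È troppa birra": 0.3,    # fluent, but a different meaning
}

# p(f|e): translation model score for the source "It's too late" (word choice).
translation_model = {
    "È troppo tardi": 0.4,
    "Tardi troppo è": 0.4,    # same words, same score: order is ignored
    "È troppa birra": 0.001,
}

def best_translation(candidates):
    """Return the candidate e that maximizes p(e) * p(f|e)."""
    return max(candidates, key=lambda e: language_model[e] * translation_model[e])

best = best_translation(list(language_model))
print(best)
```

Neither model is enough on its own: the translation model cannot reject "Tardi troppo è", and the language model cannot reject "È troppa birra".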
P(e) and P(f|e)
Where do these numbers come from?
Language model
P(e) comes from a language model, a machine that assigns scores to sentences, estimating their likelihood.
1. Record every sentence ever said in English (1 billion?)
2. If the sentence “how’s it going?” appears 76413 times in that database, then we say:
$P(\text{how's it going?}) = \frac{76413}{1{,}000{,}000{,}000} = 0.000076413$
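The counting idea can be sketched in a few lines, assuming a tiny made-up corpus in place of "every sentence ever said". The function name `sentence_probability` is hypothetical.

```python
from collections import Counter

def sentence_probability(sentence, corpus):
    """Relative frequency of a whole sentence in a list of recorded sentences."""
    counts = Counter(corpus)
    return counts[sentence] / len(corpus)

# Four recorded sentences stand in for the full database.
corpus = ["how's it going?", "how's it going?", "fine, thanks", "see you later"]

p_seen = sentence_probability("how's it going?", corpus)
p_unseen = sentence_probability("how is it going today?", corpus)
print(p_seen, p_unseen)
```

A sentence that never appears in the database gets probability zero, however reasonable it is. This is one reason to break sentences into smaller pieces rather than count them whole.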
Translation model
Next we need to worry about P(f|e), the probability of a French string f given an
English string e.
This is called a translation model.
It boils down to computing alignments between source and target languages.
Computing alignments intuition
Pairs of English and Chinese words which come together in a parallel example
may be translations of each other.
Training Data
A parallel corpus is a collection of texts, each of which is translated into one or
more other languages than the original.
EN | IT
Look at that! | Guarda lì!
I've never seen anything like that! | Non ho mai visto nulla di simile!
That's incredible! | È incredibile!
That's terrific. | È eccezionale.
Computing alignments: Expectation Maximization
This algorithm iterates over the data, reinforcing latent properties of the system.
It converges to a local optimum without any user supervision.
Example with a 2-sentence corpus:
b c ↔ x y
b ↔ y
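A minimal sketch of Expectation Maximization for word alignment, in the style of IBM Model 1 without a NULL word. It assumes the two-pair corpus is ("b c", "x y") and ("b", "y"); the function name `train_alignments` and the iteration count are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def train_alignments(pairs, iterations=50):
    """Estimate t[(target, source)] = p(target word | source word) with EM."""
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform table: no alignment is preferred.
    t = {(tw, sw): 1.0 / len(target_vocab)
         for tw in target_vocab for sw in source_vocab}

    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)   # expected counts per source word
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    fraction = t[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # M-step: renormalize the expected counts into probabilities.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]
    return t

corpus = [(["b", "c"], ["x", "y"]), (["b"], ["y"])]
table = train_alignments(corpus)
for (tw, sw), prob in sorted(table.items()):
    print(f"p({tw}|{sw}) = {prob:.3f}")
```

The second pair says that "b" goes with "y". Each iteration then shifts more of the first pair's probability mass onto "c" with "x", so p(y|b) and p(x|c) drift toward 1 without any supervision.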
Decoding
Now it’s time to decode our string encoded by the noisy channel.
Word alignments are leveraged to build a “space” for a search algorithm.
Translating is searching in a space of options.
Translation options as a coverage set
Decoding in action
1. The algorithm builds the search space as a tree of options, sorted by p(e|f).
a. Search space is limited to a fixed size named “beam”.
2. Options are expanded in order of decreasing probability.
a. Reordering adds a penalty.
b. Language model penalizes each stage output.
3. Translation stops when all source words are translated, or covered.
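The steps above can be sketched as a toy beam-search decoder. It works on single words rather than phrases, scores each hypothesis with a translation probability, a bigram language model and a fixed reordering penalty, and uses invented probabilities throughout. The names `decode`, `options`, `lm` and `reorder_penalty` are assumptions made for this example.

```python
import math

def decode(source, options, lm, beam_size=3, reorder_penalty=0.5):
    """Toy beam-search decoder over word-level translation options.

    A hypothesis is (log_score, covered_positions, output_words, last_position).
    """
    beam = [(0.0, frozenset(), ("<s>",), -1)]
    for _ in source:  # each stage covers exactly one more source word
        expanded = []
        for score, covered, output, last in beam:
            for pos, word in enumerate(source):
                if pos in covered:
                    continue
                for target, p_tm in options[word].items():
                    new_score = score + math.log(p_tm)
                    # The language model scores the new word given the previous one.
                    new_score += math.log(lm.get((output[-1], target), 1e-4))
                    # Jumping away from the next source position is penalized.
                    if pos != last + 1:
                        new_score += math.log(reorder_penalty)
                    expanded.append(
                        (new_score, covered | {pos}, output + (target,), pos)
                    )
        # Keep only the `beam_size` most probable hypotheses.
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam_size]
    return " ".join(beam[0][2][1:])  # drop the "<s>" start marker


source = ["it's", "too", "late"]
options = {
    "it's": {"è": 0.9},
    "too": {"troppo": 0.6, "troppa": 0.4},
    "late": {"tardi": 0.9},
}
lm = {
    ("<s>", "è"): 0.5,
    ("è", "troppo"): 0.4,
    ("è", "troppa"): 0.1,
    ("troppo", "tardi"): 0.5,
}
translation = decode(source, options, lm)
print(translation)
```

The loop runs once per source word, so decoding ends exactly when every source position is covered. A larger beam explores more reorderings at a higher cost.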
Neural machine translation
NMT is based on probability too, but has some differences:
● End-to-end training: no more separate Translation + Language Models.
● Markovian assumption, instead of Naive Bayesian: words move together.
If a sentence f of length n is a sequence of words $f_1, f_2, \dots, f_n$, then p(f) is:
$p(f) = \prod_{i=1}^{n} p(f_i \mid f_1, \dots, f_{i-1})$
Neural network review: feed-forward
Weighted links determine how strongly a neuron can influence its neighbours.
The deviation between outputs and expected values drives the rebalancing of the weights.
But a feed-forward network is not suitable for mapping the temporal dependencies
between words. We need an architecture that can explicitly map sequences.
Recurrent network
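A minimal sketch of the recurrence, assuming scalar inputs and two fixed hypothetical weights (`w_in`, `w_rec`) in place of learned weight matrices. Each hidden state mixes the current input with the previous state, so the final state depends on the order of the sequence.

```python
import math

def rnn_states(inputs, w_in=0.5, w_rec=0.8):
    """Run a one-unit recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

forward = rnn_states([1.0, 2.0])
swapped = rnn_states([2.0, 1.0])
print(forward[-1], swapped[-1])
```

The same two inputs in a different order give a different final state, which is exactly the sensitivity to sequence that a plain feed-forward layer lacks.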
Neural language model
Encoder - Decoder architecture
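With a sentence f and e:
$p(f_1, \dots, f_n, e_1, \dots, e_m)$ (one single sequence)
Languages are independent (vocabulary and domain), so
we can split into 2 separate RNNs:
1. $h_n = \mathrm{RNN}(f_1, \dots, f_n)$ (summary vector of source)
2. $p(e_i \mid e_1, \dots, e_{i-1}, h_n)$: each new word depends on history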
Sequence-to-sequence (seq2seq) architecture
[Figure: an encoder RNN with hidden states h reads "THE WAITER TOOK THE PLATES"; a decoder RNN with hidden states g emits "IL CAMERIERE PRESE I PIATTI".]
Summary vector as information bottleneck
A fixed-size representation degrades as sentence length increases.
This is because alignment learning operates on many-to-many logic:
the gradient flows towards every source word for any alignment mistake.
Let’s gate the gradient flow through a context vector, computed as a weighted average of
the source hidden states (also known as “soft search” or “attention”).
The weights are computed by a feed-forward network with softmax activation.
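A minimal sketch of the context vector computation. To stay short it scores each source state with a plain dot product, where the slides use a feed-forward network; the softmax and the weighted average are the same. The vectors are invented and `attention_context` is a hypothetical name.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step."""
    # Stand-in scorer: dot product between decoder state and each encoder state.
    scores = [sum(g * h for g, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    # Context vector: weighted average of the source hidden states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(size)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [0.0, 3.0]
weights, context = attention_context(decoder_state, encoder_states)
print(weights, context)
```

The decoder receives a fresh context vector at every step, so no single fixed-size summary has to carry the whole source sentence.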
Attention model
[Figure, repeated over five slides, one per output word: the encoder states h over "THE WAITER TOOK THE PLATES" are combined by a weighted sum (+) into the context for the decoder state g that emits the next word of "IL CAMERIERE PRESE I PIATTI". At every step the five attention weights sum to 1: one source word receives 0.7 and the others 0.05 or 0.1, and the dominant weight moves to a different source word at each step.]
Neural domain adaptation
Sometimes we want our network to assume a
particular style, but we don’t have enough data.
Solution: adapt an already trained network.
1. First, train the full network with general data to
obtain a general model.
2. Then, train the last layers on the new data so that
it shapes the style of the output.
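The two training phases can be sketched on a toy two-weight model, y = w2 * tanh(w1 * x), where `w1` plays the role of the early layers and `w2` the last layer. The model, the data and the `trainable` argument are all assumptions made for this example; a real system would freeze layers in its deep learning framework.

```python
import math

def predict(params, x):
    return params["w2"] * math.tanh(params["w1"] * x)

def train(params, data, trainable, lr=0.1, epochs=200):
    """Gradient descent on squared error, updating only the named weights."""
    params = dict(params)  # leave the caller's weights untouched
    for _ in range(epochs):
        for x, y in data:
            hidden = math.tanh(params["w1"] * x)
            error = params["w2"] * hidden - y
            grads = {
                "w2": error * hidden,
                "w1": error * params["w2"] * (1 - hidden ** 2) * x,
            }
            for name in trainable:
                params[name] -= lr * grads[name]
    return params

general_data = [(1.0, 1.0), (-1.0, -1.0)]   # plenty of general-purpose data
domain_data = [(1.0, 2.0)]                  # scarce in-domain data

# Phase 1: train every weight on general data.
general = train({"w1": 0.5, "w2": 0.5}, general_data, trainable=("w1", "w2"))
# Phase 2: freeze w1 and train only the last layer on the new data.
adapted = train(general, domain_data, trainable=("w2",))
print(general, adapted)
```

Freezing the early weights keeps what was learned from the general data, while the small in-domain set only has to move the last layer.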
Zero shot translation: Google Neural MT
We can use a single system for multilingual MT: just feed all the different
parallel data into the same system.
Tag the input data with the desired target language: NMT will translate into
the target language!
As a side effect, we build an internal “shared knowledge representation”.
This enables translation between unseen language pairs.
GNMT
Training examples, with the input tagged by target language:
<2IT> I am here → Sono qui (EN → IT)
<2DE> je suis ici → Ich bin hier (FR → DE)
Zero-shot pair, never seen in training: EN → DE?
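The tagging step is plain string preprocessing. A minimal sketch, assuming the `<2XX>` tag format from the example above and a hypothetical helper named `tag_for_target`:

```python
def tag_for_target(sentence, target_lang):
    """Prefix a source sentence with the token naming the desired target language."""
    return f"<2{target_lang.upper()}> {sentence}"

# Training data mixes language pairs in a single system.
training_pairs = [
    (tag_for_target("I am here", "it"), "Sono qui"),          # EN -> IT
    (tag_for_target("je suis ici", "de"), "Ich bin hier"),    # FR -> DE
]

# At inference time the same tag requests a pair never seen in training.
zero_shot_input = tag_for_target("I am here", "de")           # EN -> DE
print(training_pairs, zero_shot_input)
```

Nothing in the model architecture changes: the target language is just one more input token.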
Unsupervised NMT
We can translate even without parallel data, using just two monolingual corpora.
Each corpus builds a latent semantic space. Similar languages build similar spaces.
Translation as geometrical mapping between affine latent semantic spaces.
[Figure: an encoder maps the source sentence x into the latent space z. One decoder reconstructs x^ from z (auto-encoder); another decoder produces the target sentence y from z.]
Links
https://www.tensorflow.org/tutorials/seq2seq
NMT (seq2seq) Tutorial
https://github.com/google/seq2seq
A general-purpose encoder-decoder framework for TensorFlow
https://github.com/awslabs/sockeye
seq2seq framework with a focus on NMT based on Apache MXNet
http://www.statmt.org/
Old school statistical MT reference site
QA

Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - Codemotion Rome 2018

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - Codemotion Rome 2018

  • 1. Deep Learning for Machine Translation A dramatic turn of paradigm Alberto Massidda
  • 2. Who we are ● Founded in 2001; ● Branches in Milan, Rome and London; ● Market leader in enterprise ready solutions based on Open Source tech; ● Expertise: ○ Open Source ○ DevOps ○ Public and private cloud ○ Search ○ BigData and many more...
  • 3. This presentation is Open Source (yay!) https://creativecommons.org/licenses/by-nc-sa/3.0/
  • 4. Outline 1. Statistical Machine Translation 2. Neural Machine Translation 3. Domain Adaptation 4. Zero shot translation 5. Unsupervised Neural MT
  • 5. Statistical Machine Translation Translating as a ciphered message recovery through probability laws: 1. Foreign language as a noisy channel 2. Language model and Translation model 3. Training (building the translation model) 4. Decoding (translating with the translation model)
  • 6. Noisy channel model Goal: translate a sentence in foreign language f into our language e. The abstract model 1. Transmit e over a noisy channel. 2. The channel garbles the sentence and f is received. 3. Try to recover e by thinking about: a. how likely it is that e was the message, p(e) (source model) b. how e is turned into f, p(f|e) (channel model)
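The formula on this slide did not survive extraction. What follows is the standard noisy-channel formulation obtained with Bayes' rule, given as a likely reading rather than a copy of the slide. The denominator p(f) is dropped because it does not depend on e.

```latex
\hat{e} = \arg\max_{e} \, p(e \mid f)
        = \arg\max_{e} \, \frac{p(e)\, p(f \mid e)}{p(f)}
        = \arg\max_{e} \, p(e)\, p(f \mid e)
```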
  • 7. Word choice and word reordering P(f|e) cares about words, in any order. ● “It’s too late” → “Tardi troppo è” ✓ ● “It’s too late” → “È troppo tardi” ✓ ● “It’s too late” → “È troppa birra” ✗ P(e) cares about word order. ● “È troppo tardi” ✓ ● “Tardi troppo è” ✗
  • 8. P(e) and P(f|e) Where do these numbers come from?
  • 9. Language model P(e) comes from a language model, a machine that assigns scores to sentences, estimating their likelihood. 1. Record every sentence ever said in English (1 billion?) 2. If the sentence “how’s it going?” appears 76413 times in that database, then we say: P(“how’s it going?”) = 76413 / 1,000,000,000
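The counting idea on this slide can be sketched in a few lines. This is a toy illustration: the four-sentence `corpus` and the function name `sentence_probability` are made up for the example and stand in for the "database of every sentence ever said".

```python
from collections import Counter


def sentence_probability(sentence, corpus):
    """Relative frequency of `sentence` among all recorded sentences."""
    counts = Counter(corpus)
    return counts[sentence] / len(corpus)


# Hypothetical miniature "database" of recorded sentences.
corpus = ["how's it going?", "how's it going?", "it's too late", "look at that!"]

p_seen = sentence_probability("how's it going?", corpus)        # seen twice out of four
p_unseen = sentence_probability("it's going too late", corpus)  # never recorded
```

A sentence that was never recorded gets probability zero under this scheme, which is why practical language models score shorter word sequences (n-grams) instead of whole sentences.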
  • 10. Translation model Next we need to worry about P(f|e), the probability of a French string f given an English string e. This is called a translation model. It boils down to computing alignments between source and target languages.
  • 11. Computing alignments intuition Pairs of English and Chinese words which come together in a parallel example may be translations of each other.
  • 12. Training Data A parallel corpus is a collection of texts, each of which is translated into one or more languages other than the original.
      EN | IT
      Look at that! | Guarda lì!
      I've never seen anything like that! | Non ho mai visto nulla di simile!
      That's incredible! | È incredibile!
      That's terrific. | È eccezionale.
  • 13. Computing alignments: Expectation Maximization This algorithm iterates over the data, progressively amplifying latent properties of a system. It converges to a local optimum without any user supervision. Example with a 2-sentence corpus: [Figure: alignment example over the words b, c and x, y]
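A minimal sketch of how Expectation Maximization can learn word translation probabilities, in the style of IBM Model 1. The slide does not name a specific model, so treat that choice as an assumption. The two-sentence corpus (`b c` paired with `x y`, and `b` paired with `y`) is also assumed, since the slide's figure is not recoverable. The function name `train_ibm1` is illustrative.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """corpus: list of (e_words, f_words) pairs. Returns t[(f, e)], an estimate of p(f | e)."""
    e_vocab = {e for e_words, _ in corpus for e in e_words}
    f_vocab = {f for _, f_words in corpus for f in f_words}
    # Start from a uniform table: every f is equally likely for every e.
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each f word over the e words of its sentence pair.
        for e_words, f_words in corpus:
            for f in f_words:
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: renormalise the fractional counts into probabilities.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e]
    return t


# Assumed toy corpus: "b c" <-> "x y" and "b" <-> "y".
corpus = [(["b", "c"], ["x", "y"]), (["b"], ["y"])]
t = train_ibm1(corpus)
```

The second sentence pair tells the model that `b` goes with `y`; over the iterations that evidence pushes `c` towards `x` in the first pair, without any alignment ever being labelled by hand.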
  • 14. Decoding Now it’s time to decode our string encoded by the noisy channel. Word alignments are leveraged to build a “space” for a search algorithm. Translating is searching in a space of options.
  • 15. Translation options as a coverage set
  • 16. Decoding in action 1. The algorithm builds the search space as a tree of options, sorted by p(e|f). a. Search space is limited to a fixed size named “beam”. 2. Each option is picked on highest probability first. a. Reordering adds a penalty. b. Language model penalizes each stage output. 3. Translation stops when all source words are translated, or covered.
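The decoding steps above can be sketched as a toy word-based beam search. This is a simplified illustration, not a phrase-based decoder: the translation options, the bigram table, the penalty value and the name `beam_decode` are all assumptions made for the example.

```python
import math


def beam_decode(source, options, lm_logprob, beam_size=3, reorder_penalty=0.5):
    """Toy beam-search decoder.

    source: list of source words.
    options: dict mapping a source word to a list of (target_word, log_prob).
    lm_logprob: function (previous_target_word, target_word) -> log probability.
    """
    # A hypothesis is (score, covered source positions, output words, last covered position).
    beam = [(0.0, frozenset(), (), -1)]
    for _ in range(len(source)):  # every step covers exactly one more source word
        candidates = []
        for score, covered, output, last in beam:
            for pos, word in enumerate(source):
                if pos in covered:
                    continue
                for target, tm_logprob in options[word]:
                    prev = output[-1] if output else "<s>"
                    new_score = (score + tm_logprob + lm_logprob(prev, target)
                                 - reorder_penalty * abs(pos - last - 1))
                    candidates.append((new_score, covered | {pos}, output + (target,), pos))
        # Prune: keep only the `beam_size` highest-scoring hypotheses.
        candidates.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = candidates[:beam_size]
    return list(beam[0][2])  # all source words are covered: translation is complete


# Hypothetical translation options and bigram language model.
options = {
    "it's": [("è", math.log(0.9))],
    "too": [("troppo", math.log(0.6)), ("troppa", math.log(0.4))],
    "late": [("tardi", math.log(0.9))],
}
bigrams = {("<s>", "è"): 0.5, ("è", "troppo"): 0.5, ("troppo", "tardi"): 0.5}


def lm(prev, word):
    return math.log(bigrams.get((prev, word), 0.01))


translation = beam_decode(["it's", "too", "late"], options, lm)
```

Jumping around in the source sentence costs `reorder_penalty` per skipped position, and every extension is also scored by the language model, so fluent, mostly monotone hypotheses tend to survive pruning.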
  • 18. Neural machine translation NMT is based on probability too, but has some differences: ● End-to-end training: no more separate Translation + Language Models. ● Markovian assumption, instead of Naive Bayesian: words move together. If a sentence f of length n is a sequence of words f_1, f_2, ..., f_n, then p(f) is the product of the probability of each word given the words before it.
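The slide's own formula did not survive extraction. Written out, and assuming it was the standard chain-rule factorisation used by neural sequence models, the sentence probability is:

```latex
p(f) = \prod_{i=1}^{n} p(f_i \mid f_1, \dots, f_{i-1})
```

A strict k-th order Markov assumption would truncate the history to the last k words, that is, p(f_i | f_{i-k}, ..., f_{i-1}).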
  • 19. Neural network review: feed-forward Weighted links determine how strongly a neuron can influence its neighbours. The deviation between outputs and expected values drives the rebalancing of the weights. But a feed-forward network is not suitable for mapping the temporal dependencies between words. We need an architecture that can explicitly map sequences.
  • 22. Encoder - Decoder architecture With a sentence f and e : (one single sequence) Languages are independent (vocabulary and domain), so we can split in 2 separate RNNs: 1. (summary vector of source) 2. Each new word depends on history
  • 23. Sequence-to-sequence (seq2seq) architecture THE WAITER TOOK THE PLATES h h h h h g g g g g IL CAMERIERE PRESE I PIATTI
  • 24. Summary vector as information bottleneck A fixed-size representation degrades as sentence length increases. This is because the alignment learning operates on many-to-many logic: the gradient flows to every position for any alignment mistake. Let’s gate the gradient flow through a context vector, computed as a weighted average of the source hidden states (also known as “soft search” or “attention”). The weights are computed by a feed-forward network with softmax activation.
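A minimal sketch of the context vector as a softmax-weighted average of the source hidden states. The slide says the weights come from a feed-forward network; to keep the example short, a plain dot product is swapped in as the scoring function, and that substitution is an assumption of this sketch. The names `attention_context` and `dot`, and the toy vectors, are illustrative.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def dot(a, b):
    # Stand-in scoring function; the slides use a small feed-forward network here.
    return sum(x * y for x, y in zip(a, b))


def attention_context(decoder_state, encoder_states, score=dot):
    """Return (weights, context), where context is the weighted average of encoder states."""
    weights = softmax([score(decoder_state, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Toy hidden states for a three-word source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
```

Because the weights are recomputed for every output word, each decoding step gets its own context vector instead of sharing one fixed summary of the whole source sentence.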
  • 25. Attention model THE WAITER TOOK THE PLATES h h h h h g g g g g IL CAMERIERE PRESE I PIATTI + 0.7 0.05 0.1 0.05 0.1
  • 26. Attention model THE WAITER TOOK THE PLATES h h h h h g g g g g IL CAMERIERE PRESE I PIATTI + 0.1 0.05 0.1 0.05 0.7
  • 27. Attention model THE WAITER TOOK THE PLATES h h h h h g g g g g IL CAMERIERE PRESE I PIATTI + 0.05 0.05 0.7 0.1 0.1
  • 28. Attention model THE WAITER TOOK THE PLATES h h h h h g g g g g IL CAMERIERE PRESE I PIATTI + 0.05 0.1 0.1 0.7 0.05
  • 29. Attention model THE WAITER TOOK THE PLATES h h h h h g g g g g IL CAMERIERE PRESE I PIATTI + 0.05 0.7 0.1 0.1 0.05
  • 30. Neural domain adaptation Sometimes we want our network to adopt a particular style, but we don’t have enough data. Solution: adapt an already trained network. 1. First, train the full network on general data to obtain a general model. 2. Then, train only the last layers on the new data, so that it shapes the style of the output.
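The two-step recipe can be illustrated with a deliberately tiny "network" made of two scalar weights, where `w1` plays the role of the early layers and `w2` the last layer. Everything here (the `train` function, the data, the learning rate) is a made-up toy, not a real NMT setup.

```python
def train(w1, w2, data, lr=0.01, epochs=200, freeze_first=False):
    """Gradient descent on squared error for the model y = w2 * (w1 * x)."""
    for _ in range(epochs):
        for x, y in data:
            hidden = w1 * x
            error = w2 * hidden - y
            grad_w2 = error * hidden
            grad_w1 = error * w2 * x
            w2 -= lr * grad_w2
            if not freeze_first:  # frozen layers keep their general-purpose weights
                w1 -= lr * grad_w1
    return w1, w2


# Step 1: plenty of general data (here the target is y = 2x).
general_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w1_general, w2_general = train(1.0, 1.0, general_data)

# Step 2: a little in-domain data (y = 3x), training only the last layer.
domain_data = [(1.0, 3.0), (2.0, 6.0)]
w1_adapted, w2_adapted = train(w1_general, w2_general, domain_data, freeze_first=True)
```

After step 2 the first weight is untouched, while the last one has moved so that the whole model fits the in-domain examples.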
  • 31. Zero shot translation: Google Neural MT We can use a single system for multilingual MT: just feed all the different parallel data into the same system. Tag the input data with the desired target language: NMT will translate into that language! As a side effect, we build an internal “shared knowledge representation”. This makes it possible to translate between unseen language pairs. [Figure: GNMT trained on French, English, German and Italian. “<2DE> je suis ici” → “Ich bin hier” (FR → DE); “<2IT> I am here” → “Sono qui” (EN → IT); EN → DE?]
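The tagging step amounts to prepending a target-language token to each source sentence. A minimal sketch, using the `<2XX>` token format shown on the slide; the helper name `tag_for_target` is made up for the example.

```python
def tag_for_target(sentence, target_lang):
    """Prepend the token that tells the model which language to produce."""
    return f"<2{target_lang.upper()}> {sentence}"


# Training pairs from the slide: FR -> DE and EN -> IT.
training_examples = [
    (tag_for_target("je suis ici", "de"), "Ich bin hier"),
    (tag_for_target("I am here", "it"), "Sono qui"),
]

# Zero-shot request: EN -> DE was never seen as a pair during training.
zero_shot_input = tag_for_target("I am here", "de")
```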
  • 33. Unsupervised NMT We can translate even without parallel data, using just two monolingual corpora. Each corpus builds a latent semantic space. Similar languages build similar spaces. Translation as geometrical mapping between affine latent semantic spaces. [Figure: an encoder maps the source sentence x to a latent space z; one decoder reconstructs x^ (auto-encoder), another decoder produces the target sentence y.]
  • 34. Links
      ○ https://www.tensorflow.org/tutorials/seq2seq NMT (seq2seq) Tutorial
      ○ https://github.com/google/seq2seq A general-purpose encoder-decoder framework for Tensorflow
      ○ https://github.com/awslabs/sockeye seq2seq framework with a focus on NMT based on Apache MXNet
      ○ http://www.statmt.org/ Old school statistical MT reference site
  • 36. QA