A 35-minute talk about developing an NMT version of the Koreanizer, a Ro-Ko transliterator, as an extended follow-up to my PyCon KR 2019 talk, where I presented an SMT-based transliterator.
In this talk, we go through the essential concepts for building an NMT system, the encoder-decoder architecture and the attention model, and also introduce dynamic programming as a key programming technique for developing such a system, with some code examples.
4. Introduction
● Transfer-based translation
Previously on PyCon KR 2019
Bernard Vauquois' pyramid (figure: source text and target text connected at the word, phrase, and syntax levels)
6. Introduction
● Interlingual Translation
○ Two phases
■ Analysis : analyze the source language into a semantic representation
■ Generation : convert the representation into a target language
Previously on PyCon KR 2019
Bernard Vauquois' pyramid (figure: source text → analysis → interlingua → generation → target text)
7. Outline
● Introduction
● Neural Machine Translation
○ Drawbacks in SMT
○ Neural Language Model
○ Encoder-Decoder architecture
○ Attention Model
○ Ro-Ko Transliterator
● Dynamic Programming
○ Definition
○ Code examples
8. Neural Machine Translation
● Phrase-based translation
○ the translation task breaks source sentences up into multiple chunks
○ and then translates them phrase by phrase
● Local translation problem
○ can’t capture long-range dependencies in languages
■ e.g., gender agreement, syntactic structure
○ this leads to disfluency in translation outputs
Drawbacks in SMT
9. Neural Machine Translation
● Standard Network for a text sequence
○ Inputs and outputs can have different lengths in different examples
○ Doesn’t share features learned across different positions of the text
Neural Language Model
quoted from Andrew Ng’s Coursera lecture
10. Neural Machine Translation
● RNN Language Model
○ P(w1 w2 w3 ... wt) = P(w1) x P(w2 | w1) x P(w3 | w1 w2) x …… x P(wt | w1 w2 ... wt-1)
○ Each step in the RNN outputs a distribution over the next word given the preceding words
○ P(<s>Cats average 15 hours of sleep a day</s>)
Neural Language Model
(figure: unrolled RNN with hidden states a0, a1, a2, …; each step reads the previous word and emits P(cats | <s>), P(average | cats), P(15 | cats average), …, P(</s> | ……))
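As a rough illustration of the factorization above (not code from the talk), the sketch below multiplies per-step conditional probabilities in log space; the probability values are made-up placeholders for the softmax outputs an RNN language model would emit at each step.

```python
import math

# Toy per-step conditional probabilities; in a real RNN LM each of these
# would come from a softmax over the vocabulary at that step.
step_probs = {
    ("<s>",): {"cats": 0.02},
    ("<s>", "cats"): {"average": 0.10},
    ("<s>", "cats", "average"): {"15": 0.05},
}

def sentence_log_prob(tokens, cond_prob):
    """Sum log P(w_t | w_1 ... w_{t-1}) over the sentence."""
    total = 0.0
    for t in range(1, len(tokens)):
        context, word = tuple(tokens[:t]), tokens[t]
        total += math.log(cond_prob(context, word))
    return total

print(sentence_log_prob(["<s>", "cats", "average", "15"],
                        lambda ctx, w: step_probs[ctx][w]))
```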
11. Neural Machine Translation
● Conditional Language Model
○ P(y1 y2 … yT | x1 x2 … xT)
(figure: a language model network compared with a machine translation network)
Neural Language Model
quoted from Andrew Ng’s Coursera lecture
12. NMT
● Encoder
○ reads the source sentence to build a “thought” vector
○ the vector represents the sentence meaning
● Decoder
○ processes the “thought” vector to emit a translation
Encoder-Decoder architecture
quoted from Google’s Tensorflow tutorial
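To make the two halves concrete, here is a minimal tf.keras sketch of an encoder-decoder model; it is an illustration rather than the Koreanizer's actual network, and the vocabulary sizes and layer dimensions are placeholder assumptions.

```python
import tensorflow as tf

# Placeholder sizes, not the values used in the talk.
SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN = 1000, 1000, 64, 128

# Encoder: read the source sequence and keep only its final state
# (the "thought" vector).
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(SRC_VOCAB, EMB_DIM)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(HIDDEN, return_state=True)(enc_emb)

# Decoder: start from the thought vector and predict the target sequence
# one token at a time (teacher forcing during training).
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(TGT_VOCAB, EMB_DIM)(dec_inputs)
dec_out = tf.keras.layers.LSTM(HIDDEN, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(TGT_VOCAB)(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```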
14. NMT
● Problem of long sequences
○ works well with short sentences
○ performance drops on long sentences
Attention Model
quoted from Andrew Ng’s Coursera lecture
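The attention mechanism addresses this by letting the decoder look back at every encoder state instead of relying on one fixed vector. The NumPy toy below computes softmax attention weights from a dot-product score and returns the weighted context vector; the scoring function and the shapes are illustrative assumptions, not the exact model from the talk.

```python
import numpy as np

def attention_context(encoder_states, decoder_state):
    """Toy attention step: score every encoder state, softmax the scores,
    and return the weighted average (context vector) plus the weights."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax -> attention weights
    return weights @ encoder_states, weights

enc = np.random.randn(5, 8)   # 5 source positions, hidden size 8
dec = np.random.randn(8)      # current decoder hidden state
context, weights = attention_context(enc, dec)
print(weights.round(3), context.shape)        # weights sum to 1, context is (8,)
```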
16. Dynamic Programming
● To grown-ups
○ A method used in mathematical optimization and computer programming
○ Simplifies a problem by breaking it down into simpler sub-problems in a recursive manner
○ Applicable under two conditions
■ optimal sub-structure
■ overlapping sub-problems
Definition
17. Dynamic Programming
● Fibonacci Numbers
○ F0 = 0, F1 = 1, and Fn = Fn-1 + Fn-2 for n > 1
○ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …
● Approaches
○ by Recursion (Naive approach)
○ by Memoization (Top-down)
○ by Tabulation (Bottom-up)
Code Examples
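A short Python sketch of the three approaches listed above (illustrative code; the exact examples shown in the talk may differ):

```python
from functools import lru_cache

def fib_naive(n):
    # Plain recursion: recomputes the same sub-problems over and over
    # (exponential time).
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Top-down: the same recursion, but each sub-problem is cached after
    # its first computation (linear time).
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    # Bottom-up: fill a table from the smallest sub-problems upward.
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_naive(10), fib_memo(10), fib_tab(10))  # 55 55 55
```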
18. Dynamic Programming
● Word Segmentation
○ “whatdoesthisreferto” ⇒ “what does this refer to”
● Best segmentation Ps
○ the one with the highest probability
● Probability of a segmentation
○ Pw(first word) x Ps(rest of segmentation)
● Pw(word)
○ estimated by counting (unigram)
● Ps(“choosespain”)
○ Pw(“choose”) x Pw(“spain”) > Pw(“chooses”) x Pw(“pain”)
Code Examples
19. Dynamic Programming
● Segmentation problem Ps(“whatdoesthisreferto”)
→ Pw(“w”) x Ps(“hatdoesthisreferto”)
→ Pw(“wh”) x Ps(“atdoesthisreferto”)
→ Pw(“wha”) x Ps(“tdoesthisreferto”)
→ Pw(“what”) x Ps(“doesthisreferto”)
→ ……
Code Examples
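Putting the recurrence together, the sketch below memoizes Ps so that each suffix such as “doesthisreferto” is solved only once. The unigram counts and the unseen-word penalty are made-up assumptions; a real segmenter would estimate Pw from a large corpus.

```python
from functools import lru_cache

# A tiny made-up unigram table standing in for corpus counts.
COUNTS = {"what": 50, "does": 40, "this": 60, "refer": 5, "to": 100,
          "hat": 10, "ref": 3, "er": 4}
TOTAL = sum(COUNTS.values())

def pw(word):
    # Unigram probability Pw; unseen words get a small, length-penalized score.
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    return 10.0 / (TOTAL * 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    """Return (probability, words) of the best segmentation of `text`."""
    if not text:
        return 1.0, []
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        rest_prob, rest_words = segment(rest)
        # Ps(text) considers every split: Pw(first word) x Ps(rest).
        candidates.append((pw(first) * rest_prob, [first] + rest_words))
    return max(candidates)  # keep the highest-probability split

print(segment("whatdoesthisreferto")[1])
# ['what', 'does', 'this', 'refer', 'to']
```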