A machine-learning based automatic translator, rebuilt at home
#DeepLearning
2019.8.24
이홍주 (lee.hongjoo@yandex.com)
KOREANizer encore :
NMT-based Ro-Ko Transliterator
2019.8.24
이홍주 (lee.hongjoo@yandex.com)
Introduction
Previously on PyCon KR 2019
https://www.pycon.kr/program/talk-detail?id=117
Introduction
● Transfer-based translation
Previously on PyCon KR 2019
Bernard Vauquois' pyramid
[Figure: transfer between source text and target text can happen at the level of words, phrases, or syntax]
Introduction
● SMT-based Ro-Ko Transliteration
Previously on PyCon KR 2019
● Romanized K-pop Lyrics
○ 12,095 songs
○ 1,586,305 lines
○ 121,469 unique bi-word pairs
■ ex. “모르는 moreuneun”
Introduction
● Interlingual Translation
○ Two phases
■ Analysis : Analyze the source language into a semantic representation
■ Generation : Convert the representation into the target language
Previously on PyCon KR 2019
Bernard Vauquois' pyramid
[Figure: the source text is analyzed up into an Interlingua, from which the target text is generated]
Outline
● Introduction
● Neural Machine Translation
○ Drawbacks in SMT
○ Neural Language Model
○ Encoder-Decoder architecture
○ Attention Model
○ Ro-Ko Transliterator
● Dynamic Programming
○ Definition
○ Code examples
Neural Machine Translation
● Phrase-based translation
○ The translation task breaks source sentences into multiple chunks
○ and then translates them phrase by phrase
● Local translation problem
○ cannot capture long-range dependencies in language
■ e.g., gender agreement, syntactic structure
○ which leads to disfluency in translation outputs
Drawbacks in SMT
Neural Machine Translation
● A standard network for a text sequence
○ Inputs and outputs can have different lengths in different examples
○ Doesn't share features learned across different positions of the text
Neural Language Model
quoted from Andrew Ng’s Coursera lecture
Neural Machine Translation
● RNN Language Model
○ P(w1 w2 w3 ... wt) = P(w1) x P(w2 | w1) x P(w3 | w1 w2) x ...... x P(wt | w1 w2 ... wt-1)
○ Each step of the RNN outputs a distribution over the next word given the preceding words
○ P(<s>Cats average 15 hours of sleep a day</s>)
Neural Language Model
[Figure: unrolled RNN language model. Starting from activation a0 and the token <s>, each step emits a distribution over the next word: P(cats | <s>), P(average | cats), P(15 | cats average), ..., P(</s> | ......)]
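The product above is just the chain rule over words. As a minimal sketch (not the talk's model), the toy code below scores a sentence by summing per-step log conditionals; the names sentence_log_prob and toy_cond_prob are hypothetical, and a real RNN LM would supply the conditional from its softmax output at each step instead of the uniform stand-in used here.

```python
import math

def sentence_log_prob(tokens, cond_prob):
    """Chain rule: log P(w1 ... wt) = sum_i log P(w_i | w_1 ... w_{i-1})."""
    history, total = [], 0.0
    for tok in tokens:
        total += math.log(cond_prob(tok, tuple(history)))  # one step's conditional
        history.append(tok)
    return total

# Toy stand-in for the per-step softmax of an RNN LM: uniform over a tiny vocabulary.
VOCAB = ["cats", "average", "15", "hours", "of", "sleep", "a", "day", "</s>"]
def toy_cond_prob(token, history):
    return 1.0 / len(VOCAB)

tokens = "cats average 15 hours of sleep a day </s>".split()
print(sentence_log_prob(tokens, toy_cond_prob))  # 9 * log(1/9) with the toy model
```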
● Conditional Language Model
○ P(y1 y2 … yT | x1 x2 … xT)
[Figure: a language model network compared with a machine translation network, which conditions the same kind of decoder on an encoded source sentence]
Neural Machine Translation Neural Language Model
quoted from Andrew Ng’s Coursera lecture
NMT
● Encoder
○ reads the source sentence to build a “thought” vector
○ the vector represents the sentence meaning
● Decoder
○ processes the “thought” vector to emit a translation
Encoder-Decoder architecture
quoted from Google’s Tensorflow tutorial
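A minimal sketch of this encoder-decoder idea in Keras, not the talk's actual model; the vocabulary and hidden sizes are assumed toy values. The encoder LSTM's final states act as the "thought" vector and initialize the decoder LSTM, which emits a distribution over target words at each step.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed toy sizes, for illustration only.
src_vocab, tgt_vocab, latent_dim = 5000, 5000, 256

# Encoder: reads the source sentence; its final LSTM states act as the "thought" vector.
enc_inputs = keras.Input(shape=(None,), dtype="int32", name="source_tokens")
enc_emb = layers.Embedding(src_vocab, latent_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: initialized with the encoder states, predicts the target sequence
# (trained with teacher forcing on shifted target tokens).
dec_inputs = keras.Input(shape=(None,), dtype="int32", name="target_tokens")
dec_emb = layers.Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_out = layers.LSTM(latent_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
dec_probs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = keras.Model([enc_inputs, dec_inputs], dec_probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```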
NMT seq2seq model
quoted from Andrew Ng’s Coursera lecture
NMT
● Problem of long sequences
○ the encoder-decoder works well on short sentences
○ but performance drops on long sentences
Attention Model
quoted from Andrew Ng’s Coursera lecture
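The slides quote Andrew Ng's attention model, which scores encoder states with a small alignment network; as a simplified illustration only, the NumPy sketch below uses plain dot-product scores to show how attention weights over the encoder states produce a per-step context vector. The function name attention_context and the toy shapes are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """Score every encoder state against the current decoder state (dot product
    here), normalize the scores with softmax, and return the weighted sum of
    encoder states as the context vector for this decoder step."""
    scores = encoder_states @ decoder_state      # shape (T,)
    weights = softmax(scores)                    # attention weights, sum to 1
    return weights @ encoder_states, weights     # context (d,), weights (T,)

# Toy example: 5 encoder positions with hidden size 4.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=(4,))
context, weights = attention_context(decoder_state, encoder_states)
print(weights.round(3), context.round(3))
```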
Ro-Ko Transliteration
● http://enc-koreanizer.herokuapp.com
Enc-Koreanizer
Dynamic Programming
● To grown-ups
○ In mathematical optimization and computer programming, a method for simplifying a problem by breaking it down into simpler sub-problems in a recursive manner
○ Applicable under two conditions
■ optimal sub-structure
■ overlapping sub-problems
Definition
Dynamic Programming
● Fibonacci Numbers
○ F(0) = 0, F(1) = 1, and F(n) = F(n-1) + F(n-2) for n > 1
○ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …
● Approaches
○ by Recursion (Naive approach)
○ by Memoization (Top-down)
○ by Tabulation (Bottom-up)
Code Examples
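A minimal sketch of the three approaches listed above (the function names are mine, not from the talk):

```python
from functools import lru_cache

def fib_recursive(n):
    """Naive recursion: overlapping sub-problems are recomputed, exponential time."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Top-down memoization: each F(k) is computed once and cached."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    """Bottom-up tabulation: build up from F(0), F(1) in O(n) time, O(1) space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert fib_recursive(10) == fib_memo(10) == fib_tab(10) == 55
```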
Dynamic Programming
● Word Segmentation
○ “whatdoesthisreferto” ⇒ “what does this refer to”
● Best segmentation Ps
○ the one with the highest probability
● Probability of a segmentation
○ Pw(first word) x Ps(rest of segmentation)
● Pw(word)
○ estimated by counting (unigram)
● Ps(“choosespain”)
○ Pw(“choose”) x Pw(“spain”) > Pw(“chooses”) x Pw(“pain”)
Code Examples
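As a small illustration of the Ps comparison above, with hypothetical toy counts (a real model would estimate Pw from a large corpus):

```python
# Hypothetical unigram counts for the four words involved; real counts come from a corpus.
COUNTS = {"choose": 1200, "spain": 900, "chooses": 80, "pain": 700}
TOTAL = 1_000_000  # assumed corpus size, for illustration

def Pw(word):
    """Unigram probability of a single word, estimated by counting."""
    return COUNTS.get(word, 0) / TOTAL

p_choose_spain = Pw("choose") * Pw("spain")    # segmentation "choose spain"
p_chooses_pain = Pw("chooses") * Pw("pain")    # segmentation "chooses pain"
print(p_choose_spain > p_chooses_pain)         # True with these toy counts
```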
Dynamic Programming
● Segmentation problem
Ps(“whatdoesthisreferto”)
→ Pw(“w”) x Ps(“hatdoesthisreferto”)
→ Pw(“wh”) x Ps(“atdoesthisreferto”)
→ Pw(“wha”) x Ps(“tdoesthisreferto”)
→ Pw(“what”) x Ps(“doesthisreferto”)
→ ……
Code Examples
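A minimal sketch of the full dynamic-programming segmenter, in the spirit of Peter Norvig's classic word segmenter; COUNTS is a hypothetical toy unigram table and the unseen-word penalty is an assumption, while lru_cache supplies the memoization over overlapping sub-problems.

```python
from functools import lru_cache
from math import prod

# Hypothetical toy unigram counts; a real segmenter would estimate these from a large corpus.
COUNTS = {"what": 500, "does": 300, "this": 400, "refer": 50, "to": 900}
TOTAL = sum(COUNTS.values())

def Pw(word):
    """Unigram probability estimated by counting; unseen words get a crude length penalty."""
    return COUNTS.get(word, 0.001 / len(word)) / TOTAL

def splits(text, max_word_len=20):
    """All ways to split text into (first word, remainder)."""
    return [(text[:i], text[i:]) for i in range(1, min(len(text), max_word_len) + 1)]

@lru_cache(maxsize=None)
def segment(text):
    """Ps(text): the best segmentation maximizes Pw(first word) * Ps(rest).
    lru_cache memoizes the overlapping sub-problems (the dynamic-programming part)."""
    if not text:
        return []
    candidates = ([first] + segment(rest) for first, rest in splits(text))
    return max(candidates, key=lambda words: prod(Pw(w) for w in words))

print(segment("whatdoesthisreferto"))  # ['what', 'does', 'this', 'refer', 'to']
```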
Contacts
lee.hongjoo@yandex.com
https://www.linkedin.com/in/hongjoo-lee/
https://github.com/midnightradio/consalad-5th.git
