A 35-minute talk about developing an NMT version of the Koreanizer, a Ro-Ko transliterator, as an extended follow-up to my PyCon KR 2019 talk, where I presented an SMT-based transliterator.
In this talk, we go through the essential concepts for building an NMT system, the encoder-decoder architecture and the attention model, and also introduce dynamic programming as a key programming technique for developing such a system, with some code examples.
4. Introduction
● Transfer-based translation
Previously on PyCon KR 2019
Bernard Vauquois' pyramid (figure: source text and target text connected at the word, phrase, and syntax levels)
6. Introduction
● Interlingual Translation
○ Two phases
■ Analysis : analyze the source language into a semantic representation
■ Generation : convert the representation into a target language
Previously on PyCon KR 2019
Bernard Vauquois' pyramid (figure: source text → analysis → interlingua → generation → target text)
7. Outline
● Introduction
● Neural Machine Translation
○ Drawbacks in SMT
○ Neural Language Model
○ Encoder-Decoder architecture
○ Attention Model
○ Ro-Ko Transliterator
● Dynamic Programming
○ Definition
○ Code examples
8. Neural Machine Translation
● Phrase-based translation
○ the translation task breaks source sentences up into multiple chunks
○ and then translates them phrase by phrase
● Local translation problem
○ can’t capture long-range dependencies in languages
■ e.g., gender agreement, syntactic structure
○ this leads to disfluency in translation outputs
Drawbacks in SMT
9. Neural Machine Translation
● Standard Network for a text sequence
○ Inputs and outputs can have different lengths in different examples
○ Doesn’t share features learned across different positions of the text
Neural Language Model
quoted from Andrew Ng’s Coursera lecture
10. Neural Machine Translation
● RNN Language Model
○ P(w1 w2 w3 ... wt) = P(w1) x P(w2 | w1) x P(w3 | w1 w2) x …… x P(wt | w1 w2 ... wt-1)
○ Each step in the RNN outputs a distribution over the next word given the preceding words
○ P(<s>Cats average 15 hours of sleep a day</s>)
Neural Language Model
(figure: unrolled RNN with hidden states a0, a1, a2, …; each step reads the previous word and emits P(cats | <s>), P(average | cats), P(15 | cats average), …, P(</s> | ……))
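As a rough illustration of the factorization above (not code from the talk), the sketch below multiplies per-step conditional probabilities in log space; the probability values are made-up placeholders for the softmax outputs an RNN language model would emit at each step.

```python
import math

# Toy per-step conditional probabilities; in a real RNN LM each of these
# would come from a softmax over the vocabulary at that step.
step_probs = {
    ("<s>",): {"cats": 0.02},
    ("<s>", "cats"): {"average": 0.10},
    ("<s>", "cats", "average"): {"15": 0.05},
}

def sentence_log_prob(tokens, cond_prob):
    """Sum log P(w_t | w_1 ... w_{t-1}) over the sentence."""
    total = 0.0
    for t in range(1, len(tokens)):
        context, word = tuple(tokens[:t]), tokens[t]
        total += math.log(cond_prob(context, word))
    return total

print(sentence_log_prob(["<s>", "cats", "average", "15"],
                        lambda ctx, w: step_probs[ctx][w]))
```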
11. Neural Machine Translation
● Conditional Language Model
○ P(y1 y2 … yT | x1 x2 … xT)
(figure: a language model network compared with a machine translation network)
Neural Language Model
quoted from Andrew Ng’s Coursera lecture
12. NMT
● Encoder
○ reads the source sentence to build a “thought” vector
○ the vector represents the sentence meaning
● Decoder
○ processes the “thought” vector to emit a translation
Encoder-Decoder architecture
quoted from Google’s Tensorflow tutorial
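To make the two halves concrete, here is a minimal tf.keras sketch of an encoder-decoder model; it is an illustration rather than the Koreanizer's actual network, and the vocabulary sizes and layer dimensions are placeholder assumptions.

```python
import tensorflow as tf

# Placeholder sizes, not the values used in the talk.
SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN = 1000, 1000, 64, 128

# Encoder: read the source sequence and keep only its final state
# (the "thought" vector).
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(SRC_VOCAB, EMB_DIM)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(HIDDEN, return_state=True)(enc_emb)

# Decoder: start from the thought vector and predict the target sequence
# one token at a time (teacher forcing during training).
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(TGT_VOCAB, EMB_DIM)(dec_inputs)
dec_out = tf.keras.layers.LSTM(HIDDEN, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(TGT_VOCAB)(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```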
14. NMT
● Problem of long sequences
○ works well with short sentences
○ performance drops on long sentences
Attention Model
quoted from Andrew Ng’s Coursera lecture
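The attention mechanism addresses this by letting the decoder look back at every encoder state instead of relying on one fixed vector. The NumPy toy below computes softmax attention weights from a dot-product score and returns the weighted context vector; the scoring function and the shapes are illustrative assumptions, not the exact model from the talk.

```python
import numpy as np

def attention_context(encoder_states, decoder_state):
    """Toy attention step: score every encoder state, softmax the scores,
    and return the weighted average (context vector) plus the weights."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax -> attention weights
    return weights @ encoder_states, weights

enc = np.random.randn(5, 8)   # 5 source positions, hidden size 8
dec = np.random.randn(8)      # current decoder hidden state
context, weights = attention_context(enc, dec)
print(weights.round(3), context.shape)        # weights sum to 1, context is (8,)
```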
16. Dynamic Programming
● To grown-ups
○ A method used in mathematical optimization and computer programming
○ Simplifies a problem by breaking it down into simpler sub-problems in a recursive manner
○ Applicable under two conditions
■ optimal sub-structure
■ overlapping sub-problems
Definition
17. Dynamic Programming
● Fibonacci Numbers
○ F0 = 0, F1 = 1, and Fn = Fn-1 + Fn-2 for n > 1
○ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …
● Approaches
○ by Recursion (Naive approach)
○ by Memoization (Top-down)
○ by Tabulation (Bottom-up)
Code Examples
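A short Python sketch of the three approaches listed above (illustrative code; the exact examples shown in the talk may differ):

```python
from functools import lru_cache

def fib_naive(n):
    # Plain recursion: recomputes the same sub-problems over and over
    # (exponential time).
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Top-down: the same recursion, but each sub-problem is cached after
    # its first computation (linear time).
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    # Bottom-up: fill a table from the smallest sub-problems upward.
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_naive(10), fib_memo(10), fib_tab(10))  # 55 55 55
```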
18. Dynamic Programming
● Word Segmentation
○ “whatdoesthisreferto” ⇒ “what does this refer to”
● Best segmentation Ps
○ the one with the highest probability
● Probability of a segmentation
○ Pw(first word) x Ps(rest of segmentation)
● Pw(word)
○ estimated by counting (unigram)
● Ps(“choosespain”)
○ Pw(“choose”) x Pw(“spain”) > Pw(“chooses”) x Pw(“pain”)
Code Examples
19. Dynamic Programming
● Segmentation problem Ps(“whatdoesthisreferto”)
→ Pw(“w”) x Ps(“hatdoesthisreferto”)
→ Pw(“wh”) x Ps(“atdoesthisreferto”)
→ Pw(“wha”) x Ps(“tdoesthisreferto”)
→ Pw(“what”) x Ps(“doesthisreferto”)
→ ……
Code Examples
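Putting the recurrence together, the sketch below memoizes Ps so that each suffix such as “doesthisreferto” is solved only once. The unigram counts and the unseen-word penalty are made-up assumptions; a real segmenter would estimate Pw from a large corpus.

```python
from functools import lru_cache

# A tiny made-up unigram table standing in for corpus counts.
COUNTS = {"what": 50, "does": 40, "this": 60, "refer": 5, "to": 100,
          "hat": 10, "ref": 3, "er": 4}
TOTAL = sum(COUNTS.values())

def pw(word):
    # Unigram probability Pw; unseen words get a small, length-penalized score.
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    return 10.0 / (TOTAL * 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    """Return (probability, words) of the best segmentation of `text`."""
    if not text:
        return 1.0, []
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        rest_prob, rest_words = segment(rest)
        # Ps(text) considers every split: Pw(first word) x Ps(rest).
        candidates.append((pw(first) * rest_prob, [first] + rest_words))
    return max(candidates)  # keep the highest-probability split

print(segment("whatdoesthisreferto")[1])
# ['what', 'does', 'this', 'refer', 'to']
```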