A Machine-Learning-Based Automatic Translator, Rebuilt at Home
#DeepLearning
2019.8.24
이홍주 (lee.hongjoo@yandex.com)
KOREANizer encore: NMT-based Ro-Ko Transliterator
Introduction Previously on PyCon KR 2019
https://www.pycon.kr/program/talk-detail?id=117
Introduction
● Transfer-based translation
Previously on PyCon KR 2019
[Figure: Bernard Vauquois' pyramid. The source text and the target text each have the levels words, phrases, and syntax.]
Introduction
● SMT-based Ro-Ko transliteration
Previously on PyCon KR 2019
● Romanized K-pop Lyrics
○ 12,095 songs
○ 1,586,305 lines
○ 121,469 unique bi-word pairs
■ e.g., “모르는” paired with “moreuneun”
Introduction
● Interlingual Translation
○ Two phases
■ Analysis: analyze the source language into a semantic representation
■ Generation: convert the representation into a target language
Previously on PyCon KR 2019
[Figure: Bernard Vauquois' pyramid. Analysis leads from the source text up to the interlingua; generation leads from the interlingua down to the target text.]
Outline
● Introduction
● Neural Machine Translation
○ Drawbacks in SMT
○ Neural Language Model
○ Encoder-Decoder architecture
○ Attention Model
○ Ro-Ko Transliterator
● Dynamic Programming
○ Definition
○ Code examples
Neural Machine Translation
● Phrase based translation
○ The translation task breaks each source sentence into multiple chunks
○ and then translates them phrase by phrase
● Local translation problem
○ can’t capture long-range dependencies in languages
■ e.g., gender agreement, syntactic structure
○ this leads to disfluent translation output
Drawbacks in SMT
Neural Machine Translation
● Why a standard network fits a text sequence poorly
○ Inputs and outputs can have different lengths in different examples
○ It doesn’t share features learned across different positions of the text
Neural Language Model
quoted from Andrew Ng’s Coursera lecture
Neural Machine Translation
● RNN Language Model
○ $P(w_1 w_2 w_3 \dots w_t) = P(w_1) \times P(w_2 \mid w_1) \times P(w_3 \mid w_1 w_2) \times \dots \times P(w_t \mid w_1 w_2 \dots w_{t-1})$
○ Each step in RNN outputs distribution over the next word given preceding words
○ P(<s>Cats average 15 hours of sleep a day</s>)
Neural Language Model
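[Figure: an RNN unrolled over the sentence. Starting from the initial activation a0, each step reads the previous token (<s>, cats, average, …, day) and outputs a next-word distribution: P(cats|<s>), P(average|cats), P(15|cats average), …, P(</s>|…).]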
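The chain-rule factorization on this slide can be sketched in a few lines of Python. This is only an illustration, not the talk's code: the probability table `TOY_BIGRAMS` holds made-up numbers, and the hypothetical `toy_next_word_prob` looks only at the previous word, whereas an RNN would condition on the whole history. Because the sentence starts with `<s>`, the first factor is P(cats|<s>).

```python
import math

def sentence_log_prob(tokens, next_word_prob):
    """Chain rule: sum of log P(w_t | w_1 ... w_{t-1}) over the sentence."""
    total = 0.0
    for t in range(1, len(tokens)):
        history, word = tuple(tokens[:t]), tokens[t]
        total += math.log(next_word_prob(history, word))
    return total

# Hypothetical toy probabilities, invented for illustration only.
TOY_BIGRAMS = {
    ("<s>", "cats"): 0.5,
    ("cats", "sleep"): 0.4,
    ("sleep", "</s>"): 0.5,
}

def toy_next_word_prob(history, word):
    """Stand-in for the RNN output; it only uses the last word of the history."""
    return TOY_BIGRAMS.get((history[-1], word), 1e-6)

tokens = ["<s>", "cats", "sleep", "</s>"]
prob = math.exp(sentence_log_prob(tokens, toy_next_word_prob))
print(prob)  # product of the three conditional probabilities, about 0.1
```

Summing log-probabilities instead of multiplying raw probabilities keeps long sentences from underflowing to zero.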
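Neural Machine Translation
● Conditional Language Model
○ $P(y_1 y_2 \dots y_{T_y} \mid x_1 x_2 \dots x_{T_x})$, where the source length $T_x$ and the target length $T_y$ may differ
[Figure: two network diagrams, labeled “Language Model” and “Machine Translation”]
Neural Language Model
quoted from Andrew Ng’s Coursera lecture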
NMT
● Encoder
○ reads the source sentence to build a “thought” vector
○ the vector represents the sentence meaning
● Decoder
○ processes the “thought” vector to emit a translation
Encoder-Decoder architecture
quoted from Google’s Tensorflow tutorial
NMT seq2seq model
quoted from Andrew Ng’s Coursera lecture
NMT
● Problem of long sequences
○ works well with short sentences
○ performance drops on long sentences
Attention Model
quoted from Andrew Ng’s Coursera lecture
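One way to picture how attention eases the long-sentence problem: instead of relying on a single fixed vector, the decoder takes a weighted average of all encoder states at every output step. The sketch below uses plain dot-product scoring as a simple stand-in; the quoted lecture scores with a small neural network, and the function names and toy vectors here are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step."""
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, hs))
              for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of encoder states.
    context = [sum(w * hs[i] for w, hs in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Hypothetical 2-dimensional states for a three-token source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([0.0, 2.0], encoder_states)
print(weights, context)
```

Encoder states that align with the current decoder state receive larger weights, so distant source positions stay reachable however long the sentence is.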
Ro-Ko Transliteration
● http://enc-koreanizer.herokuapp.com
Enc-Koreanizer
Dynamic Programming
● To grown-ups
○ Both a mathematical optimization method and a computer programming method
○ Simplifies a problem by breaking it down into simpler sub-problems in a recursive manner
○ Applicable under two conditions
■ optimal sub-structure
■ overlapping sub-problems
Definition
Dynamic Programming
● Fibonacci Numbers
○ $F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$ for $n > 1$
○ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …
● Approaches
○ by Recursion (Naive approach)
○ by Memoization (Top-down)
○ by Tabulation (Bottom-up)
Code Examples
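The three approaches could look like the following in Python. This is a minimal sketch rather than the code shown in the talk; the function names are chosen here for illustration.

```python
from functools import lru_cache

def fib_recursive(n):
    """Naive recursion: recomputes the same sub-problems many times."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Top-down: same recursion, but each result is cached after first use."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_table(n):
    """Bottom-up: fill a table from the smallest sub-problems upward."""
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print([fib_table(i) for i in range(11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

The naive version takes exponential time because its sub-problems overlap; memoization and tabulation both solve each sub-problem once, so they run in linear time.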
Dynamic Programming
● Word Segmentation
○ “whatdoesthisreferto” ⇒ “what does this refer to”
● Best segmentation $P_s$
○ one with highest probability
● Probability of a segmentation
○ $P_w(\text{first word}) \times P_s(\text{rest of segmentation})$
● $P_w(\text{word})$
○ estimated by counting (unigram)
● $P_s(\text{“choosespain”})$
○ $P_w(\text{“choose”}) \times P_w(\text{“spain”}) > P_w(\text{“chooses”}) \times P_w(\text{“pain”})$
Code Examples
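The unigram estimate and the “choosespain” comparison can be sketched as below. The counts in `COUNTS` are invented purely so that the example reproduces the inequality on the slide; a real system would count words in a large corpus. The helper names `pw` and `p_segmentation` are assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical word counts, invented for illustration only.
COUNTS = Counter({"choose": 50, "chooses": 10, "spain": 20, "pain": 40})
TOTAL = sum(COUNTS.values())

def pw(word):
    """Unigram estimate: count of the word divided by the total count."""
    return COUNTS[word] / TOTAL

def p_segmentation(words):
    """Probability of a segmentation as the product of its word probabilities."""
    prob = 1.0
    for word in words:
        prob *= pw(word)
    return prob

candidates = [("choose", "spain"), ("chooses", "pain")]
best = max(candidates, key=p_segmentation)
print(best)  # ('choose', 'spain') with these made-up counts
```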
Dynamic Programming
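● Segmentation problem $P_s(\text{“whatdoesthisreferto”})$
→ $P_w(\text{“w”}) \times P_s(\text{“hatdoesthisreferto”})$
→ $P_w(\text{“wh”}) \times P_s(\text{“atdoesthisreferto”})$
→ $P_w(\text{“wha”}) \times P_s(\text{“tdoesthisreferto”})$
→ $P_w(\text{“what”}) \times P_s(\text{“doesthisreferto”})$
→ ……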
Code Examples
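The expansion above maps directly onto a memoized recursion: try every possible first word, solve the rest recursively, and keep the most probable result. The sketch below is one possible implementation under stated assumptions, not the talk's code: the counts in `COUNTS` are made up, and the length-based penalty for unseen strings in `pw` is a smoothing choice made here so that unknown chunks are not ruled out entirely.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts; a real system would count words in a corpus.
COUNTS = {"what": 30, "does": 25, "this": 40, "refer": 5, "to": 60,
          "do": 20, "is": 50, "at": 45, "hat": 3, "he": 35, "ref": 2, "er": 1}
TOTAL = sum(COUNTS.values())

def pw(word):
    """Unigram probability; unseen strings get a tiny, length-penalized value."""
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    return 1.0 / (TOTAL * 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    """Return (log-probability, words) of the best segmentation of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        rest_logp, rest_words = segment(rest)  # cached after the first call
        candidates.append((math.log(pw(first)) + rest_logp,
                           (first,) + rest_words))
    return max(candidates)

best_logp, best_words = segment("whatdoesthisreferto")
print(" ".join(best_words))  # what does this refer to
```

The same suffix, such as “referto”, is reached through many different prefixes. Caching it is what turns the exponential search into a quadratic number of sub-problems, which is the overlapping sub-problems condition from the definition slide.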
Contacts
lee.hongjoo@yandex.com
https://www.linkedin.com/in/hongjoo-lee/
https://github.com/midnightradio/consalad-5th.git
