Understanding Neural Machine Translation through GNMT

Babel Fish - 고병일
December 2016

# Contents
• Introduction
• Google Translate
• RBMT / SMT
• Word Embedding
• Recurrent Neural Network
• GRU/LSTM
• Encoder-Decoder Model
• Bidirectional RNN
• Attention Model
• Google NMT
• Summary

Babel Fish / Babel Pish
http://babelpish.github.io/

Is this what I built translators for?
I'm starting to feel a wave of self-reproach...

Because Google went and built this.

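Google
• Initially launched its service using SYSTRAN's rule-based machine translation engine.
• Launched statistical machine translation (SMT) in 2006.
• Franz Och joined: 2003
• From October 2007, all translation engines were switched to SMT
• Developed mkcls and GIZA++ (tools later used by Moses)
• Major contributions to phrase-based SMT
• October 2016
• Announced and launched Google Neural Machine Translation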
• translate.google.co.kr
Franz Och

GNMT
• https://arxiv.org/pdf/1609.08144v2.pdf
• Sure enough, his name is on it:
• Jeffrey Dean

The GNMT phenomenon
SMT(N): Our core goal is to achieve this year. or This is the will to disperse and go to his senses and our energy can make it on their mind. I think you have.
RBMT(S): Our main goal has to have this or that mind that it can do and can accomplish to disperse our energy if it gets back senses and advances to should achieve this year.
Source: 우리의 핵심 목표는 ~~ 그런 마음을 가지셔야 합니다. ("Our core goal is ~~ you should have that mindset.")
That mindset: "This is what we must achieve this year, and as we pull ourselves together and move forward, we can manage to disperse our energy."

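First, what is translation?
• Conveying text written in one language as text with the equivalent meaning in another language.
• Source language: the language the original text is written in
• Target language: the language it is translated into
• Machine translation
• Translating natural language by machine.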

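Rule-based Machine Translation
Input: 나는 사과를 먹었습니다 ("I ate an apple")
• Morphological analysis: 나/pronoun 는/particle(SUBJ) 사과/noun 를/particle(OBJ) 먹다/verb 었/pre-final ending (past) 습니다/sentence-final ending
• Syntactic analysis: 나는 = subject, 사과를 = object, 먹었습니다 = predicate
• Word-order transfer: subject - object - predicate (Korean) -> subject - predicate - object (English)
• Target-word substitution: 나 -> I, 사과 -> apple (singular), 먹다 + past -> eat/past -> ate
• Target-word generation: I / ate / an apple
Output: I ate an apple
#MorphologicalAnalysis #Parsing #ReorderingPatterns #BilingualDictionary
#TheCrownJewelOfNLP #Expensive #WSD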
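The five RBMT steps can be sketched as a toy Python pipeline. Everything here is hypothetical: a hand-written lookup table stands in for a morphological analyzer, particle tags stand in for a parser, and a three-entry dictionary stands in for the bilingual dictionary. A real system needs thousands of such rules, which is why RBMT is expensive to build and maintain.

```python
# Toy rule-based MT pipeline. The lexicon, tags, and rules are hypothetical
# stand-ins for a real morphological analyzer, parser, and bilingual dictionary.

# Morphological analysis results, looked up by surface form
MORPHS = {
    "나는": [("나", "PRON"), ("는", "SUBJ")],
    "사과를": [("사과", "NOUN"), ("를", "OBJ")],
    "먹었습니다": [("먹다", "VERB"), ("었", "PAST"), ("습니다", "END")],
}

# Bilingual dictionary plus an irregular past-tense table
LEXICON = {"나": "I", "사과": "apple", "먹다": "eat"}
PAST_FORMS = {"eat": "ate"}


def analyze(sentence):
    """1) Morphological analysis: split each word into (morpheme, tag) pairs."""
    return [MORPHS[word] for word in sentence.split()]


def parse(words):
    """2) Syntactic analysis: assign S/O/V roles from particles and POS tags."""
    roles = {}
    for morphs in words:
        tags = [tag for _, tag in morphs]
        if "SUBJ" in tags:
            roles["S"] = morphs
        elif "OBJ" in tags:
            roles["O"] = morphs
        elif "VERB" in tags:
            roles["V"] = morphs
    return roles


def reorder(roles):
    """3) Word-order transfer: Korean SOV -> English SVO."""
    return [roles["S"], roles["V"], roles["O"]]


def generate(ordered):
    """4) Substitute target words, 5) inflect and add articles."""
    out = []
    for morphs in ordered:
        stem, pos = morphs[0]
        tags = [tag for _, tag in morphs]
        word = LEXICON[stem]
        if pos == "VERB" and "PAST" in tags:
            word = PAST_FORMS[word]
        elif pos == "NOUN":  # singular count noun -> indefinite article
            word = ("an " if word[0] in "aeiou" else "a ") + word
        out.append(word)
    return " ".join(out)


def translate(sentence):
    return generate(reorder(parse(analyze(sentence))))


print(translate("나는 사과를 먹었습니다"))  # I ate an apple
```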

Statistical Machine Translation
나는 사과를 먹었습니다 -> I ate an apple
• Translation Model
• Language Model
• Reordering Model
• Decoder
Phrase candidates:
나는 -> I / i / ai / …
사과를 -> apple / an apple / a apple / …
먹었습니다 -> eat / ate / drink / …
Candidate outputs, from worst to best:
I apple eat
I eat apple
I ate apple
I ate a apple
I ate an apple
Parallel corpus
세라는 사과를 좋아합니다. / Sera likes an apple.
나는 오렌지를 먹었습니다. / I ate an orange.
…
Size: 2~3 million bi-sentences
Training: learn the models from the parallel corpus
Decoding: search for the best-scoring translation
#ParallelCorpus #Moses #EasyToImplement
#OmissionsHappen #WSD #ButBuildingItYourself..
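A minimal sketch of how an SMT decoder picks among the candidates, assuming a toy log-linear score: translation-model log-probability + bigram language-model log-probability + a simple distortion penalty for reordering. The phrase table, bigram probabilities, and penalty weight are all made up for illustration, and exhaustive search stands in for the beam search a real decoder such as Moses uses.

```python
import itertools
import math

# Hypothetical phrase table: P(English phrase | Korean phrase)
PHRASE_TABLE = {
    "나는": {"I": 0.8, "i": 0.1, "ai": 0.1},
    "사과를": {"apple": 0.5, "an apple": 0.3, "a apple": 0.2},
    "먹었습니다": {"ate": 0.6, "eat": 0.3, "drink": 0.1},
}

# Hypothetical bigram language model P(w_i | w_{i-1}); unseen bigrams get FLOOR
BIGRAM = {
    ("<s>", "I"): 0.5, ("I", "ate"): 0.3, ("I", "eat"): 0.2,
    ("ate", "an"): 0.2, ("ate", "a"): 0.2, ("an", "apple"): 0.4,
    ("apple", "</s>"): 0.5,
}
FLOOR = 1e-4


def lm_logprob(sentence):
    """Language model: log P(sentence) under the bigram table."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(BIGRAM.get(pair, FLOOR)) for pair in zip(words, words[1:]))


def decode(source, distortion_weight=1.0):
    """Return the highest-scoring translation by exhaustive search."""
    src = source.split()
    best, best_score = None, -math.inf
    for order in itertools.permutations(range(len(src))):
        # Reordering model: penalize every source phrase that moved
        reorder = -distortion_weight * sum(i != j for i, j in enumerate(order))
        for choice in itertools.product(*(PHRASE_TABLE[src[j]].items() for j in order)):
            tm = sum(math.log(p) for _, p in choice)  # translation model
            hyp = " ".join(phrase for phrase, _ in choice)
            score = tm + lm_logprob(hyp) + reorder
            if score > best_score:
                best, best_score = hyp, score
    return best


print(decode("나는 사과를 먹었습니다"))  # I ate an apple
```

The language model is what rejects "a apple": that bigram is unseen, so it only gets the floor probability, while the reordering penalty is paid back by the fluent SVO bigrams.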

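RBMT
- Follows the actual process of human translation
- Ambiguity (word-sense) problems
- The terminology base becomes huge and complex -> maintenance problems
SMT
- Handles term translation and reordering statistically
- Grammar suffers for language pairs with different word order -> Korean<>English
- Korean: SOV
- English: SVO
- Problems such as omitted words occur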

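NMT translation (seq2seq)
• Translates using information from the entire sentence
RBMT/SMT: source -> understand the source (analyze it) -> analyze (split it into fine-grained rules) -> reconstruct (combine the rules) -> translation
NMT: source -> understand (grasp the meaning of the source) -> meaning representation -> reconstruct (generate words directly from the meaning representation) -> translation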

• Before looking at Google NMT,
• let's first look at NMT in general.
• And before that…
• Deep Learning / Neural Network
• Word Embedding
• Recurrent Neural Network
• GRU / LSTM
• Encoder-Decoder Model

Word Embedding
• One-hot Representation
• Distributed Representation
• Word embedding
cat = [ 0, 0, 0, 1, 0, 0, … ]
-> One-hot Representation
-> For a real-world vocabulary, the vector is 1000…00000 entries long
cat = [ 31.2, 10.9, 92.1, … ]
-> Distributed Representation
-> Can be represented in as few as 500~1000 dimensions
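To make the contrast concrete, here is a small Python sketch. The vocabulary and the 3-dimensional dense vectors are hypothetical (real embeddings are learned and have hundreds of dimensions); the point is that one-hot vectors of different words are always orthogonal, while dense vectors can encode similarity.

```python
import math

# Hypothetical 6-word vocabulary; "cat" sits at index 3 as in the slide
VOCAB = ["dog", "apple", "car", "cat", "school", "coffee"]


def one_hot(word):
    """One-hot: a |VOCAB|-length vector with a single 1."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


# Hypothetical dense vectors (learned embeddings would have hundreds of dimensions)
DENSE = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


print(one_hot("cat"))                          # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(cosine(one_hot("cat"), one_hot("dog")))  # 0.0: one-hot vectors share nothing
print(cosine(DENSE["cat"], DENSE["dog"]) > cosine(DENSE["cat"], DENSE["car"]))  # True
```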

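Word Embedding (cont.)
• CBOW
• Predicts a “word” from its surrounding words: the word is defined by its context
• Skip-gram
• Predicts the surrounding words from the word: the word is defined by whatever best explains its neighbors
• Word2vec: https://code.google.com/archive/p/word2vec/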

Recurrent Neural Network
• A neural network architecture for time-series (sequential) data

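Recurrent Neural Network (cont.)
• A structure that feeds information from previous steps into the current step
Unrolled over 나는 학교에 간다 ("I go to school"):
step 1: input 나는 -> hidden state reflects 나는
step 2: input 학교에 -> hidden state reflects 나는 학교에
step 3: input 간다 -> hidden state reflects 나는 학교에 간다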
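The unrolled picture corresponds to a loop that feeds each step's hidden state into the next one. A minimal Elman-style sketch, assuming hypothetical 2-dimensional word vectors and hand-picked weights (a real model learns them):

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[k], x))
            + sum(w * hj for w, hj in zip(W_h[k], h_prev))
            + b[k]
        )
        for k in range(len(b))
    ]


def rnn_forward(inputs, W_x, W_h, b):
    """Run the RNN over a sequence, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)  # the previous state feeds into this step
        states.append(h)
    return states


# Hypothetical 2-d word vectors and weights
EMB = {"나는": [1.0, 0.0], "학교에": [0.0, 1.0], "간다": [1.0, 1.0]}
W_X = [[0.5, -0.3], [0.8, 0.2]]
W_H = [[0.1, 0.4], [-0.6, 0.3]]
B = [0.0, 0.1]

words = "나는 학교에 간다".split()
states = rnn_forward([EMB[w] for w in words], W_X, W_H, B)
for word, h in zip(words, states):
    print(word, [round(v, 3) for v in h])
```

Because the first word's vector flows through every later step, changing 나는 changes the final state too.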

RNN: Vanishing Gradient
• Overcoming the vanishing gradient problem in RNNs
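Why gradients vanish can be seen with a one-unit RNN, h_t = tanh(w·h_{t-1} + x_t): by the chain rule, dh_T/dh_0 is a product of T factors w·(1 - h_t²). With |w| < 1 every factor is smaller than 1 in magnitude, so the product shrinks exponentially with sequence length. The weight and inputs below are hypothetical.

```python
import math


def gradient_through_time(w, xs, h0=0.0):
    """dh_T/dh_0 for the one-unit RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # dh_t/dh_{t-1} = w * tanh'(.) = w * (1 - h_t^2)
    return grad


# Hypothetical weight and constant input
for T in (5, 20, 50):
    print(T, gradient_through_time(0.9, [0.5] * T))
```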

GRU / LSTM (cont.)
• Architectures designed to overcome the vanishing gradient problem
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit (GRU)
• http://aikorea.org/blog/rnn-tutorial-4/
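A sketch of one GRU step, following the formulation in Cho et al. (2014), where the new state is z·h_prev + (1 - z)·h_candidate. Biases are omitted and the parameter dictionary layout is a hypothetical choice. Because the update gate can copy the previous state almost unchanged, information can pass through many steps without being squashed by repeated tanh calls.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(W, x, U, h):
    """W x + U h (biases omitted for brevity)."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h))
        for w_row, u_row in zip(W, U)
    ]


def gru_step(x, h, p):
    """One GRU step in the Cho et al. (2014) form: h' = z*h + (1 - z)*h_candidate."""
    z = [sigmoid(v) for v in linear(p["W_z"], x, p["U_z"], h)]  # update gate
    r = [sigmoid(v) for v in linear(p["W_r"], x, p["U_r"], h)]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    h_cand = [math.tanh(v) for v in linear(p["W"], x, p["U"], reset_h)]  # candidate state
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, h_cand)]


# Hypothetical 1-unit parameters; a large W_z pushes the update gate toward 1
keep = {"W_z": [[20.0]], "U_z": [[0.0]], "W_r": [[1.0]], "U_r": [[1.0]],
        "W": [[1.0]], "U": [[1.0]]}
print(gru_step([1.0], [0.7], keep))  # ~[0.7]: the old state is carried over
```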

Encoder-Decoder translation model
• Combines two RNNs: one for the input language (encoder) and one for the output language (decoder)
그 는 달린다 ("He runs")

Encoder-Decoder model results (using GRU)
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, K. Cho et al., 2014
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, K. Cho et al., 2014

Bidirectional Recurrent NN
• Sequence to Sequence Learning with Neural Networks
• Ilya Sutskever (Google) et al., 2014
• Feeding the input sentence in reversed order improved performance, even on long sentences
• Training
• 나는 학교에 간다 -- I go to school
• 간다 학교에 나는 -- I go to school (reversed source)
• -> Bi-directional RNN: read the input in both directions
• Perplexity: 5.8 -> 4.7
• BLEU: 25.9 -> 30.6
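The reversal trick and the bidirectional encoder can both be sketched with a one-unit tanh RNN (weights and inputs are hypothetical). Reversing is just `xs[::-1]`; a bidirectional encoder runs one RNN forward and another over the reversed input, then pairs the two states at each position so every word sees both its left and right context.

```python
import math


def run_rnn(xs, w_x, w_h):
    """One-unit tanh RNN; returns the hidden state after each input."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_encode(xs, fwd=(0.8, 0.5), bwd=(0.6, 0.4)):
    """Pair forward and backward states per position (hypothetical weights)."""
    forward = run_rnn(xs, *fwd)
    backward = run_rnn(xs[::-1], *bwd)[::-1]  # reversed input, re-aligned to original order
    return list(zip(forward, backward))


xs = [1.0, 0.0, -1.0]                # hypothetical inputs for 나는 / 학교에 / 간다
print(run_rnn(xs[::-1], 0.8, 0.5))   # reversed-source variant: just flip the input
print(bidirectional_encode(xs))
```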

Soft Attention Model
“I love coffee” -> “나는 커피를 사랑한다”

Even for long sentences... Attention Model
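At each decoding step, soft attention scores every encoder state against the current decoder state, turns the scores into weights with a softmax, and takes the weighted sum as the context vector. This sketch uses dot-product scoring (Luong-style) for brevity; GNMT's attention function is a small feed-forward network instead. The 3-dimensional vectors for "I", "love", "coffee" are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, keys):
    """Score each encoder state by dot product, normalize, and mix."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(keys[0]))]
    return weights, context


# Hypothetical encoder states for "I", "love", "coffee" and a decoder state
# at the step that should produce "커피를"
encoder_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
decoder_state = [0.1, 0.2, 2.0]
weights, context = dot_attention(decoder_state, encoder_states)
print([round(w, 2) for w in weights])  # most weight on "coffee"
```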

Neural Machine Translation
• RNN
• GRU/LSTM
• Encoder Decoder
• Soft Attention Model
• Limitations
• Vocabulary size: OOV (out-of-vocabulary) words
• Model Size
• Training & Translation Time

Google NMT << NMT: what GNMT inherits
• LSTM
• Encoder Decoder
• Attention Mechanism
• 1st encoder layer

Google NMT <> NMT: what GNMT changes
• Deep layer : 8 layers
• Encoder
• 1 bidirectional RNN layer
• 7 unidirectional RNN layers
• Decoder
• 8 unidirectional RNN layers
• Residual networks
• Parallelization
• WPM : Word Piece Model
• Quantization / TPU
• Beam search with length normalization
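Plain beam search favors short outputs, because every extra token adds a negative log-probability. The GNMT paper divides the log-probability by a length penalty lp(Y) = (5 + |Y|)^α / (5 + 1)^α (it also adds a coverage penalty, omitted here). A sketch of the rescoring step, with hypothetical hypotheses and α = 0.6:

```python
def length_penalty(length, alpha=0.6):
    """GNMT-style length penalty: lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha."""
    return (5 + length) ** alpha / (5 + 1) ** alpha


def normalized_score(log_prob, length, alpha=0.6):
    return log_prob / length_penalty(length, alpha)


# Hypothetical finished beam hypotheses: (tokens, total log-probability)
hypotheses = [
    (["I", "go"], -2.0),
    (["I", "go", "to", "school", "every", "day"], -2.6),
]

raw_best = max(hypotheses, key=lambda h: h[1])
norm_best = max(hypotheses, key=lambda h: normalized_score(h[1], len(h[0])))
print(" ".join(raw_best[0]))   # I go
print(" ".join(norm_best[0]))  # I go to school every day
```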

Residual Network
• Counters vanishing gradients
• Makes deep layer stacks trainable
http://smerity.com/articles/2016/google_nmt_arch.html
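A residual connection adds a layer's input to its output, x_{i+1} = layer_i(x_i) + x_i, so a deep stack always has an identity path for the signal and its gradient. The toy layer below is hypothetical and deliberately weak: without the residual path, 8 stacked layers shrink the signal to almost nothing; with it, the input survives.

```python
import math


def weak_layer(v):
    """Hypothetical layer whose output is small relative to its input."""
    return [0.1 * math.tanh(a) for a in v]


def run_stack(x, layers, residual):
    for layer in layers:
        y = layer(x)
        x = [yi + xi for yi, xi in zip(y, x)] if residual else y  # residual: add input back
    return x


layers = [weak_layer] * 8  # 8 layers, as in GNMT's encoder and decoder depth
x0 = [1.0, -2.0]
plain = run_stack(x0, layers, residual=False)
res = run_stack(x0, layers, residual=True)
print(plain)
print(res)
```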

Deep Layers

Parallelization

Segmentation Approach
• WPM
• Word Piece Model
• Sub-word unit
• Helps keep the vocabulary size bounded
• Suits Asian languages such as Korean/Japanese
• 50k~60k : target language vocabulary
• Mixed Word/Character Model
• <B> , <M> , <E>
• Miki -> OOV, so it is spelled out character by character:
• <B>M <M>i <M>k <E>i
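The mixed word/character fallback is easy to sketch: in-vocabulary words stay whole, and an OOV word is spelled out with <B>, <M>, <E> markers. The vocabulary is hypothetical, and treating single-character words as always in-vocabulary is an assumption (the slide does not cover that case).

```python
def mixed_word_char(tokens, vocab):
    """Keep in-vocabulary words; spell OOV words as <B>/<M>/<E>-tagged characters."""
    out = []
    for tok in tokens:
        # Assumption: single-character tokens are treated as in-vocabulary
        if tok in vocab or len(tok) == 1:
            out.append(tok)
        else:
            out.append("<B>" + tok[0])                   # beginning character
            out.extend("<M>" + ch for ch in tok[1:-1])  # middle characters
            out.append("<E>" + tok[-1])                  # end character
    return out


VOCAB = {"likes", "coffee"}  # hypothetical word vocabulary
print(mixed_word_char(["Miki", "likes", "coffee"], VOCAB))
# ['<B>M', '<M>i', '<M>k', '<E>i', 'likes', 'coffee']
```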

Quantization
• Floating-point numbers: 32 bit -> integers: 8 bit
• Model size: 75% smaller
• Keeps inference fast enough on CPU
https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
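A sketch of the 32-bit to 8-bit idea using one common scheme, symmetric linear quantization with a single scale per tensor. This is not necessarily the exact scheme GNMT or the linked article uses; the weights are hypothetical. Storing 1 byte instead of 4 per value is where the 75% size reduction comes from.

```python
from array import array


def quantize_int8(values):
    """Symmetric linear quantization: float -> int8 with one shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # all-zero input: any scale works
    q = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


weights = array("f", [0.5, -1.27, 0.03, 1.0])  # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(list(q))
print(f"{len(weights) * weights.itemsize} bytes -> {len(q) * q.itemsize} bytes")
```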

TPU
• Tensor Processing Unit
https://tensorflow.blog/2016/05/19/tensor-processing-unittpu/

GNMT: one more thing
• https://arxiv.org/pdf/1611.04558v1.pdf
• Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
• <2es> Hello, how are you? -> ¿Hola como estás?
• 12 languages
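The multilingual trick needs no change to the model architecture: each source sentence just gets an artificial target-language token prepended, as in the <2es> example. A minimal sketch; the helper name and the "ko" code are hypothetical.

```python
def add_target_token(sentence, target_lang):
    """Prefix the source sentence with an artificial target-language token."""
    return f"<2{target_lang}> {sentence}"


# Hypothetical batch: the same English source, routed to two target languages
batch = [("Hello, how are you?", "es"), ("Hello, how are you?", "ko")]
inputs = [add_target_token(s, lang) for s, lang in batch]
print(inputs)
```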

GNMT: one more thing (cont.)
• My guess: the system now in service may be the form implemented in this paper.

• Mixed-language input is also possible

For more details…

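Summary
• NMT dramatically simplifies earlier machine translation approaches (RBMT / SMT)
• No need for a parser, a structure transfer module, …
• Yet it translates well, and the output does not read like a translation
• GPU-based
• GNMT
• Solves NMT's problems in a practical way
• Deepens the usual ~4-layer structure to 8 layers, and handles the resulting training problem with residual networks
• Works around the GPU requirement with quantization, making translation serving possible on CPU/TPU
• Massive server infrastructure
• When Google does it, it really is different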

감사합니다.
Thank you.
ありがとうございます.
Dank.
Merci.
谢谢.

Refs.
• https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-1/
• http://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf
• http://kv-emptypages.blogspot.kr/2016/09/comparing-neural-mt-smt-and-rbmt.html
• http://neuralnetworksanddeeplearning.com/chap6.html
• https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
• http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf
• http://www.slideshare.net/ToshiakiNakazawa/gnmt-66491745
• http://www.slideshare.net/ToshiakiNakazawa/attentionbased-nmt-description?next_slideshow=1
• http://www.slideshare.net/YusukeOda1/encoderdecoder-tis
• https://shuuki4.wordpress.com/2016/01/27/word2vec-%EA%B4%80%EB%A0%A8-%EC%9D%B4%EB%A1%A0-%EC%A0%95%EB%A6%AC/
• http://eric-yuan.me/rnn2-lstm/
• https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
