
Attention Mechanism (Seq2Seq)

Introduction to Attention Mechanism (Neural Machine Translation / Python, Tensorflow)


  1. What is Attention Mechanism? Kim SuSang (healess1@gmail.com)
  2. Submitted on 17 Aug 2015 / cited 608 times. The explanation below is based on this paper.
  3. We read with the key points in mind: 나는 지금 배가 고파 판교에 피자 주문하고 싶다 / I like to order a pizza because I'm hungry. When reading a sentence, we focus on the important words.
  4. LSTMs exist, but as sentences get longer... 나는 지금 배가 고파 판교에 피자 주문하고 싶다
  5. Conventional Seq2Seq vs. Seq2Seq with Attention: 나는 지금 배가 고파 판교에 피자 주문하고 싶다 → I like to order a pizza because I'm hungry ○ Based on stacked RNNs
  6. Attention Mechanism: positions [0 1 2 3 4 5 6] index the source tokens [나는] [지금] [배가 고파] [판교에] [피자] [주문] [하고 싶다]. A softmax produces a weight per position; multiplying each source vector by its weight and summing (vector₀·w₀ + vector₁·w₁ + … + vector₆·w₆) yields the summation vector (context). The attention layer (element-wise weighted summation = blending) then feeds RNN(hidden × context) to produce: I like to order a pizza because I'm hungry.
  7. Attention Models (Global): Ct = context vector, ht = target hidden state, hs = source hidden states
  8. 8. Local Attention Models ● Predictive alignment ● Align weights
  9. English-German Results: on English-to-German translation (4.5M sentence pairs), a new state of the art (SOTA) is achieved. 4-layer stacked LSTMs with 1000-dim cells/embeddings; vocabulary of the 50K most frequent English and German words.
  10. The metric used in the paper (BLEU). BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. However, translation, a field AI was expected to take over soon, turns out to still have a long way to go. To gauge the level of AI translation, the researchers collected BLEU scores, which measure machine-translation quality, for the English-German pair only. The best AI this year scored 31.7 points, far short of the 50 points the translation industry regards as the threshold for a good translation. [Source: JoongAng Ilbo] The world's first "AI Index" report: "AI is catching up with humans"
  11. 11. Learning Curve & BLEU
  12. Alignment Quality. Alignment Error Rate (AER) is a commonly used metric for assessing sentence alignments. It combines precision and recall such that a perfect alignment must contain all of the sure alignments and may contain some possible alignments: AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), meaning the best alignment is the one with AER = 0.
  13. 13. Attention Matrix(Attention Score)
  14. Hard (0/1) vs Soft (softmax) Attention
  15. Luong vs Bahdanau: Luong et al., Effective Approaches to Attention-based Neural Machine Translation (Sep 2015) vs. Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate (Sep 2014)
  16. Example. Base: non-attention model · Ref: human reference · Src: source sentence · Best: best model (attention/ensemble)
  17. 17. Research Data https://nlp.stanford.edu/projects/nmt/
  18. Source: Seq2Seq vs Attention (TensorFlow). Seq2Seq: https://github.com/TensorMSA/tensormsa_jupyter/blob/master/chap13_chatbot_lecture/6.Chatbot(QA-seq2seq)-Word_onehot.ipynb · Seq2Seq with Attention: https://github.com/TensorMSA/tensormsa_jupyter/blob/master/chap13_chatbot_lecture/6.Chatbot(QA-seq2seq%20with%20Attention).ipynb
  19. Any questions? You can send mail to ◉ healess1@gmail.com. Thanks!
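The softmax-weighted "blending" on slides 6–7 can be sketched in a few lines of NumPy. This is a minimal illustration, not the talk's actual TensorFlow code; the dot-product score and the toy dimensions (7 source tokens, hidden size 4) are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(h_t, h_s):
    """Global attention with a dot-product score.
    h_t: target hidden state, shape (d,)
    h_s: source hidden states, shape (T, d)
    Returns alignment weights a_t (T,) and context vector c_t (d,).
    """
    scores = h_s @ h_t      # one score per source position
    a_t = softmax(scores)   # weights: non-negative, sum to 1
    c_t = a_t @ h_s         # weighted sum of source states = context
    return a_t, c_t

rng = np.random.default_rng(0)
h_s = rng.normal(size=(7, 4))   # 7 source tokens, hidden dim 4
h_t = rng.normal(size=4)
a, c = attention_context(h_t, h_s)
```

The context vector `c` is then combined with the decoder hidden state (the slide's RNN(hidden × context)) to predict the next target word.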
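The BLEU metric on slide 10 can be sketched as a sentence-level score: clipped n-gram precisions combined with a brevity penalty. This is a minimal sketch; real implementations (e.g. sacrebleu, NLTK) work at corpus level and apply smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions, times a brevity penalty for short candidates."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

ref = "i like to order a pizza because i am hungry".split()
```

A perfect hypothesis scores 1.0; the 31.7 in the article is on the usual 0–100 scale (score × 100).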
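The AER formula on slide 12 is easy to check in code. Representing alignment links as sets of (source, target) index pairs is an assumption for illustration:

```python
def aer(A, S, P):
    """Alignment Error Rate.
    A: predicted alignment links, S: sure links, P: possible links (S ⊆ P).
    Lower is better; a perfect alignment gives 0."""
    return 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))

S = {(0, 0), (1, 1), (2, 2)}   # sure links
P = S | {(2, 3)}               # possible links include all sure links
```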
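The hard-vs-soft distinction on slide 14 reduces to how the alignment scores are turned into weights, which a short NumPy sketch (toy scores assumed) makes concrete:

```python
import numpy as np

scores = np.array([0.1, 2.0, 0.3])   # alignment scores for 3 source positions

# Soft attention: softmax gives a differentiable distribution over all positions
e = np.exp(scores - scores.max())
soft = e / e.sum()

# Hard attention: a one-hot choice of a single position (argmax here; the
# stochastic version samples a position and needs e.g. REINFORCE to train)
hard = np.zeros_like(scores)
hard[np.argmax(scores)] = 1.0
```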
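The two papers compared on slide 15 differ mainly in the alignment score function: Luong uses a multiplicative score, Bahdanau an additive one. A sketch of both with toy random weights (the dimension 4 and weight shapes are assumptions):

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)
h_t = rng.normal(size=d)        # decoder (target) hidden state
h_s = rng.normal(size=(7, d))   # encoder (source) hidden states

# Luong "general" (multiplicative) score: h_t^T W h_s
W = rng.normal(size=(d, d))
luong_scores = h_s @ W.T @ h_t                      # shape (7,)

# Bahdanau (additive) score: v^T tanh(W1 h_t + W2 h_s)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)
bahdanau_scores = np.tanh(h_t @ W1.T + h_s @ W2.T) @ v   # shape (7,)
```

Either score vector is passed through a softmax to get the alignment weights; Bahdanau also computes attention from the *previous* decoder state, while Luong uses the current one.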
