Machine Translation for Everyone (Chanjun Park)
○ Overview
Research on NMT took off in earnest in 2014, and today a wide range of Transformer-based NMT systems are being studied.
Going further, Transformer-based models such as BERT, GPT-2, and XLNet are being developed in language representation, currently the hottest research area in NLP.
This tech talk first looks briefly at RBMT and SMT, then examines NMT in detail, from RNN-based NMT through Transformer-based NMT.
It then covers Automatic Post-Editing, Parallel Corpus Filtering, and Quality Estimation, which are run every year as shared tasks at WMT, and introduces various applied research areas built on NMT (e.g., a real-time lecture interpretation system and a grammatical error correction system). The talk is pitched so that listeners who know nothing about machine translation, or are simply curious, can follow it.
○ Contents
1) What is machine translation?
2) A brief introduction to RBMT
3) A brief introduction to SMT
4) From RNN-based deep learning to the Transformer
5) Applied research that uses NMT
a. Automatic Post-Editing
b. Quality Estimation
c. Parallel Corpus Filtering
d. Grammatical Error Correction
e. Real-time lecture interpretation system
6) Introduction to OpenNMT
2. Who Am I
2018.06 ~ 2019.07: SYSTRAN (now LLsoLLu)
Automatic Speech Recognition
Machine Translation
Grammar Error Correction
Real-time lecture interpreter system
2019.07 ~ now: Korea University
Graduate program, Department of Computer Science (advisor: Prof. Heuiseok Lim)
NLP&AI Lab
Research interests: machine translation, grammatical error correction, real-time interpretation systems
Runs a Naver blog on natural language processing, aimed at non-majors
3. Contents
1. Overview of MT
2. RBMT
3. SMT
4. NMT (RNN ~ Transformer ~ MASS)
5. Subfields of NMT
6. Open Source
7. Trying out a machine translation service
15. Statistical Machine Translation at a Glance
Early SMT translated word by word.
In 2003, translation at the level of phrases (groups of words) was proposed; it translates better than the word-based approach.
Hierarchical Phrase-Based SMT introduces variables inside phrases: instead of a fixed pair like "eat an apple -> 사과를 먹다", the rule is expressed as "eat X -> X를 먹다"! The advantage is that X can accept many different words, such as apple or banana (see the sketch below).
Prereordering-based SMT reorders the source words before translating.
Syntax-Based SMT replaces "eat X" of Hierarchical Phrase-Based SMT with "eat NP" (noun phrase): by allowing only noun phrases in the slot rather than any phrase, it removes unnecessary translation candidates in advance!
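To make the variable idea concrete, here is a toy sketch in plain Python (not a real SMT decoder; the rule and the tiny lexicon are made up for illustration) of how one hierarchical rule covers inputs that a fixed phrase pair cannot:

# Minimal illustration of a hierarchical phrase rule with one variable X.
# A toy lexicon stands in for the sub-translation of X; a real system
# would translate X recursively with further rules.
lexicon = {"an apple": "사과", "a banana": "바나나"}

def apply_rule(source: str) -> str:
    """Apply the rule 'eat X -> X를 먹다' if the source matches."""
    prefix = "eat "
    if source.startswith(prefix):
        x = source[len(prefix):]              # the span bound to X
        return lexicon.get(x, x) + "를 먹다"  # reorder: X moves before the verb
    raise ValueError("rule does not apply")

print(apply_rule("eat an apple"))   # 사과를 먹다
print(apply_rule("eat a banana"))   # 바나나를 먹다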
29. Annotations == BiRNN Concatenated Hidden States
The encoder uses a bidirectional RNN: the forward network and the backward network each produce a set of hidden state vectors, and for each word the two vectors are joined to form the set of context vectors, called the annotations.
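A minimal PyTorch sketch of this step, with illustrative sizes not taken from the talk: a bidirectional GRU whose forward and backward hidden states come out concatenated per word, giving the annotations:

import torch
import torch.nn as nn

emb_dim, hid_dim, src_len = 32, 64, 7              # illustrative sizes

# Bidirectional encoder: forward and backward GRUs over the source.
encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)

src_embeddings = torch.randn(1, src_len, emb_dim)  # (batch, length, emb)
annotations, _ = encoder(src_embeddings)

# For each source word, the forward and backward hidden states are
# already concatenated along the last dimension: 2 * hid_dim.
print(annotations.shape)  # torch.Size([1, 7, 128])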
30. Attention
To determine the attention weights (the energies), a neural network such as a feed-forward neural network (FFNN) is used internally; the attention weights are then used to take a weighted sum over the set of context vectors (the annotations), giving the new context vector c_t:

e_{t,j} = a(s_{t-1}, h_j),    \alpha_{t,j} = \exp(e_{t,j}) / \sum_k \exp(e_{t,k}),    c_t = \sum_j \alpha_{t,j} h_j

where e is the energy, a the alignment model, s_{t-1} the previous decoder hidden state, h_j the annotations, and c_t the resulting context vector.
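A minimal PyTorch sketch of these equations (the layer sizes and the exact FFNN shape are assumptions; Bahdanau et al. [1] use a similar alignment model): a small FFNN scores each annotation against the previous decoder state, softmax turns the energies into weights, and the weighted sum gives c_t:

import torch
import torch.nn as nn

hid_dim, src_len = 64, 7
annotations = torch.randn(1, src_len, 2 * hid_dim)   # h_1 .. h_J
s_prev = torch.randn(1, hid_dim)                     # s_{t-1}

# Alignment model a(s_{t-1}, h_j): a small FFNN giving one energy per word.
align = nn.Sequential(nn.Linear(hid_dim + 2 * hid_dim, hid_dim),
                      nn.Tanh(),
                      nn.Linear(hid_dim, 1))

s_expanded = s_prev.unsqueeze(1).expand(-1, src_len, -1)        # (1, J, hid)
energies = align(torch.cat([s_expanded, annotations], dim=-1))  # e_{t,j}
weights = torch.softmax(energies, dim=1)                        # alpha_{t,j}
context = (weights * annotations).sum(dim=1)                    # c_t
print(context.shape)  # torch.Size([1, 128])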
31. The Decoding Step
The decoder takes the newly computed context vector c_t, its previous hidden state s_{t-1}, and the previous output word y_{t-1} as input, updates its hidden state to s_t, and uses it to determine the new output word y_t.
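Continuing the sketch, one decoder step could look as follows (a GRUCell and a linear readout are assumed stand-ins for the decoder RNN and output layer):

import torch
import torch.nn as nn

hid_dim, emb_dim, vocab = 64, 32, 1000
decoder_cell = nn.GRUCell(emb_dim + 2 * hid_dim, hid_dim)
readout = nn.Linear(hid_dim, vocab)

y_prev = torch.randn(1, emb_dim)        # embedding of y_{t-1}
s_prev = torch.randn(1, hid_dim)        # s_{t-1}
context = torch.randn(1, 2 * hid_dim)   # c_t from the attention step

s_t = decoder_cell(torch.cat([y_prev, context], dim=-1), s_prev)  # update s_t
y_t = readout(s_t).argmax(dim=-1)       # choose the new output word y_t
print(y_t)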
32. Put More Simply
In the vanilla RNN encoder-decoder, decoding uses only the final encoder hidden state (h3).
Source: https://www.youtube.com/watch?v=WsQLdu2JMgI&t=418s
55. Self-Attention
The query, key, and value vectors are created by multiplying the input vector by three learnable matrices (W^Q, W^K, W^V)!
They are smaller than the original vectors (64 dimensions), an architectural choice made to keep the computational cost of multi-head attention constant.
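A single-head sketch in PyTorch, assuming the base Transformer sizes (512-dimensional inputs, 64-dimensional heads): the three learnable matrices produce queries, keys, and values, and scaled dot-product attention combines them:

import torch
import torch.nn as nn

model_dim, head_dim, seq_len = 512, 64, 5
x = torch.randn(1, seq_len, model_dim)        # input vectors

# The three learnable projection matrices W^Q, W^K, W^V.
w_q = nn.Linear(model_dim, head_dim, bias=False)
w_k = nn.Linear(model_dim, head_dim, bias=False)
w_v = nn.Linear(model_dim, head_dim, bias=False)

q, k, v = w_q(x), w_k(x), w_v(x)              # each (1, 5, 64)

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5   # scaled dot product
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([1, 5, 64])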
63. Multi-Head Attention
The self-attention computation is run 8 times, with 8 different sets of weight matrices.
Instead of doing it in one shot, split it across 8 heads!!
➔ This gives the model several different "representation spaces"!!
Put simply, learning multiple different representations lets the model express the input better. In code, the per-head size is:
self.dim_per_head = model_dim // head_count
64. Multi-Head Attention
Each head has its own query, key, and value vectors!
That is, the scaled dot-product attention step runs 8 times, so there will be 8 result matrices!!
65. Multi-Head Attention
The problem is that these 8 matrices cannot be sent straight to the feed-forward layer, which accepts only a single matrix per position as input!
So we first concatenate them all into one matrix and then multiply by one more weight matrix, W^O.
One matrix again, success!!
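Putting slides 63-65 together, a minimal multi-head sketch (sizes assumed from the base Transformer; an illustration, not OpenNMT's actual implementation): project, split into 8 heads, attend, concatenate, and multiply by W^O:

import torch
import torch.nn as nn

model_dim, head_count, seq_len = 512, 8, 5
dim_per_head = model_dim // head_count        # 64, as on slide 63
x = torch.randn(1, seq_len, model_dim)

w_q = nn.Linear(model_dim, model_dim, bias=False)   # all heads at once
w_k = nn.Linear(model_dim, model_dim, bias=False)
w_v = nn.Linear(model_dim, model_dim, bias=False)
w_o = nn.Linear(model_dim, model_dim, bias=False)   # the W^O matrix

def split_heads(t):  # (1, seq, 512) -> (1, 8, seq, 64)
    return t.view(1, seq_len, head_count, dim_per_head).transpose(1, 2)

q, k, v = split_heads(w_q(x)), split_heads(w_k(x)), split_heads(w_v(x))
scores = q @ k.transpose(-2, -1) / dim_per_head ** 0.5
heads = torch.softmax(scores, dim=-1) @ v           # 8 attention outputs

concat = heads.transpose(1, 2).reshape(1, seq_len, model_dim)  # concatenate
out = w_o(concat)                                   # back to a single matrix
print(out.shape)  # torch.Size([1, 5, 512])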
74. Decoder: Differences from the Encoder
1. Masked self-attention.
2. The Key and Value matrices are taken from the encoder's output.
75. Decoder: Differences from the Encoder
<1> Masked: the self-attention layer in the decoder may attend only to positions before the current one in the output sequence, because the decoder, unlike the encoder, has to produce its results sequentially!! (A sketch of the mask follows below.)
<2> The Query matrices come from the layer below, while the Key and Value matrices are taken from the encoder's output.
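A small sketch of the mask itself: an upper-triangular matrix of -inf added to the raw attention scores, so that softmax assigns zero weight to future positions:

import torch

seq_len = 5
scores = torch.randn(1, seq_len, seq_len)   # raw decoder self-attention scores

# Positions j > i (the future) get -inf, so softmax sends their weight to 0.
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
weights = torch.softmax(scores + mask, dim=-1)

print(weights[0])  # lower-triangular: no attention paid to future positions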
77. Linear & Softmax
After passing through the decoder, we are left with a single vector of real numbers, which must be converted into a word from the vocabulary!!
The linear layer is a fully-connected network that projects the decoder's final output vector into a much larger vector of logits.
The logits vector has the same size as the vocabulary!!!
The softmax layer then turns these scores into probabilities.
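A sketch of this last step (the vocabulary size is an assumed example): one linear layer produces vocabulary-sized logits and softmax turns them into probabilities:

import torch
import torch.nn as nn

model_dim, vocab_size = 512, 32000             # illustrative sizes
generator = nn.Linear(model_dim, vocab_size)   # the "linear layer"

decoder_output = torch.randn(1, model_dim)     # the final decoder vector
logits = generator(decoder_output)             # size == vocabulary size
probs = torch.softmax(logits, dim=-1)          # scores -> probabilities

print(logits.shape, probs.sum())  # torch.Size([1, 32000]); probs sum to 1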
83. MASS
MASS is a pre-training method that randomly selects a span of k tokens in the input, masks it, and trains the model to predict the masked span.
Because the tokens left unmasked on the encoder side are masked on the decoder side, the decoder must predict the masked tokens using only the hidden representations supplied by the encoder and the attention over them; this creates a setting in which the encoder and decoder are pre-trained together.
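A minimal sketch of the masking scheme on a token list (pure Python; the mask symbol and span choice are illustrative, and the decoder-side details are omitted): the encoder sees the sentence with a span of k tokens masked, and the decoder's training target is exactly that span:

import random

MASK = "[M]"

def mass_masking(tokens, k):
    """Randomly choose a span of k tokens and mask it on the encoder side.
    The decoder's training target is the masked span itself."""
    start = random.randrange(len(tokens) - k + 1)
    target_span = tokens[start:start + k]
    encoder_input = tokens[:start] + [MASK] * k + tokens[start + k:]
    return encoder_input, target_span

sentence = "the cat sat on the mat".split()
enc_in, target = mass_masking(sentence, k=3)
print(enc_in)   # e.g. ['the', '[M]', '[M]', '[M]', 'the', 'mat']
print(target)   # e.g. ['cat', 'sat', 'on']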
84. MASS
k: the number of masked tokens.
BERT's masked LM is the case k = 1: a single token is masked on the encoder side, and the decoder predicts that one token.
GPT's standard LM is the case k = m, the full length of the input sentence: every token on the encoder side is masked.
85. MASS
MASS fundamentally requires monolingual data for pre-training.
However, it can also perform cross-lingual tasks such as machine translation.
For a cross-lingual task like English-French translation, a single MASS model is pre-trained on both 'English-English' and 'French-French' data; to tell the two languages apart, an extra piece of information called a "language embedding" is added.
Facebook's XLM, by contrast, pre-trains the encoder and the decoder separately, with BERT's masked LM and a standard LM respectively.
88. Automatic Post-Editing
Automatic Post-Editing (APE) is the process of correcting the errors contained in the output of a machine translation system to produce a higher-quality translation.
89. Quality Estimation
Machine translation quality estimation predicts the translation quality of a machine-translated sentence without consulting a reference translation, and its importance in machine translation has been growing recently.
It is useful when the user does not know the source language or the target language well!
What about BLEU?
90. Quality Estimation
Estimate the translation quality of a machine-translated sentence from the source sentence and the machine-translated (target) sentence alone.
In other words, no reference sentence is needed!! This is the decisive difference from BLEU.
Data (QE data): 1) source sentences, 2) machine-translated sentences, 3) quality annotations for the machine-translated sentences at each granularity (see the made-up instance below).
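As a concrete, made-up illustration, one sentence-level QE training instance might look like this (HTER as the quality label is one common choice at WMT):

# A hypothetical sentence-level QE instance: no reference translation,
# just the source, the MT output, and a human quality annotation.
qe_instance = {
    "source": "The weather is nice today.",
    "mt_output": "오늘 날씨가 좋다.",
    "quality": 0.12,   # e.g. HTER: fraction of edits needed to fix the output
}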
104. References
[1] Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR, pages 1-15.
[2] Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proc. of EMNLP.
[3] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998-6008, 2017.
[4] Marcin Junczys-Dowmunt and Roman Grundkiewicz. MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing.
[5] Matīss Rikters. Impact of Corpora Quality on Neural Machine Translation (2018). In Proceedings of the 8th Conference Human Language Technologies - The Baltic Perspective (Baltic HLT 2018).
[6] H. Xu and P. Koehn. Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora. EMNLP (2017), 2935-2940. http://www.aclweb.org/anthology/D17-1319
[7] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proc. of ACL.
[8] Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. EMNLP 2018.
[9] Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. In Proc. of ACL.
105. References
[10] Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. CoRR, abs/1701.02810.
[11] Hyun Kim, Jong-Hyeok Lee, and Seung-Hoon Na. 2017. Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation. In Conference on Machine Translation (WMT).
[12] Changki Lee, Junseok Kim, Hyoung-Gyu Lee, and Jaesong Lee. (2015). English-Japanese machine translation based on Neural Machine Translation. Communications of KIISE, 33(10), 48-52.
[13] http://jalammar.github.io/illustrated-transformer/
[14] Kihyun Kim's Natural Language Processing Deep Learning Camp (Korean book)
[15] Natural Language Processing Tutorial 2019
[16] https://www.etri.re.kr/webzine/20190201/sub02.html
[17] https://blog.naver.com/bcj1210
[18] https://brunch.co.kr/@kakao-it/156
[19] https://www.youtube.com/watch?v=WsQLdu2JMgI