Min-Seo Kim
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: kms39273@naver.com
1
Background
• Machine translation is a field that has been extensively researched for a long time in the areas of NLP (Natural
Language Processing) and Computer Science.
• Initially, research in this field was based on rule-based methods, but these methods did not yield good
performance.
• Later, a shift towards statistical methods led to significant improvements in performance.
• The emergence of Neural Machine Translation (NMT) has received considerable attention and acclaim in the
field.
Machine Translation
2
Background
• RBMT is designed based on the grammatical, syntactic, and lexical rules of each language.
• Since the order of sentences varies from language to language, specific translation rules are required for each
language.
• The process involves converting the analyzed language into an intermediate language (interlingual) and then
mapping it back to the target language's words through a reverse process.
• This method lacks flexibility and has difficulty adapting to new languages or expressions.
Rule-based Machine Translation (RBMT)
3
Background
• SMT operates using statistical probability models.
• It estimates the probability of a given source language sentence being translated into a target language
sentence.
• Unlike the rule-based approach of RBMT, SMT relies more on patterns learned from large amounts of data
rather than on the rules and structure of the language.
Statistical Machine Translation (SMT)
4
Background
• NMT is a machine translation approach based on artificial neural networks.
• It converts input sentences into vector forms.
• Utilizes an Encoder – Decoder structure.
• Requires only pairs of data – operates in an End-to-End manner.
Neural Machine Translation (NMT)
5
Background
• Example sentence: "I was wondering if you can help me on this problem."
• Tokenization: Breaking down the sentence into individual words or tokens.
• Example: "I / was / wondering / if / you / can / help / me / on / this / problem"
• Cleaning and Extraction: Removing unnecessary words and extracting the essential ones.
• Example: "i / wondering / you / help / problem"
• Encoding: Representing words or phrases in a numerical format.
• Example 1: "4, 6, 1, 5, 7"
• Example 2: "[1, 0], [0, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0]"
Text preprocessing
6
Background
• Breaking down the sentence "화분에 예쁜 꽃이 피었다" into
individual grammatical components.
• Tokenization:
• 화분(명사) / 에(조사) / 예쁘(어간) / ㄴ(어미) / 꽃(명사) /
이(조사) / 피(어간) / 었(어미) / 다(어미)
• Conjugation of the verb '모르다'
Tokenization
• Tokenization is very difficult, especially for a morphologically rich language such as Korean.
• Hence the need for efficient preprocessing.
7
Background
• Example code for tokenization using NLTK's TreebankWordTokenizer (a runnable sketch follows this slide).
Tokenization
8
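The code on the slide is shown as an image; below is a minimal sketch of the same idea using NLTK's TreebankWordTokenizer (assuming NLTK is installed; the sentence is the running example from the earlier slide).

```python
from nltk.tokenize import TreebankWordTokenizer

# Split the example sentence into word-level tokens (Penn Treebank conventions).
tokenizer = TreebankWordTokenizer()
sentence = "I was wondering if you can help me on this problem."
tokens = tokenizer.tokenize(sentence)
print(tokens)
# Roughly: ['I', 'was', 'wondering', 'if', 'you', 'can', 'help', 'me', 'on', 'this', 'problem', '.']
```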
Background
• Cleaning:
• Converting from uppercase to lowercase.
• Removing words with low frequency of occurrence.
• Eliminating short words, pronouns, and articles.
• Stemming(어간 추출):
• Extracting word stems.
• Example 1: Using the Porter Algorithm.
• Lemmatization(표제어 추출):
• Extracting lemmas (base forms of words).
• Stopword(불용어 제거):
• Removing stopwords (commonly used words that may be irrelevant in some contexts).
Cleaning and Extraction
9
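As an illustration of these cleaning and extraction steps, a short NLTK-based sketch (the toolkit choice and the crude length-based cleaning rule are assumptions; the WordNet and stopword corpora must be downloaded once):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

# One-time downloads for the lemmatizer and the stopword list.
nltk.download("wordnet", quiet=True)
nltk.download("stopwords", quiet=True)

tokens = ["I", "was", "wondering", "if", "you", "can", "help", "me", "on", "this", "problem"]

# Cleaning: lowercase everything and drop very short words (a crude stand-in for
# removing pronouns and articles).
cleaned = [t.lower() for t in tokens if len(t) > 2]

# Stemming with the Porter algorithm vs. lemmatization with WordNet.
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(t) for t in cleaned])        # e.g. 'wondering' -> 'wonder'
print([lemmatizer.lemmatize(t) for t in cleaned])

# Stopword removal.
stop_words = set(stopwords.words("english"))
print([t for t in cleaned if t not in stop_words])
```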
Background
• Integer-Encoding
Encoding
10
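The slide's figure is an image; a minimal sketch of frequency-based integer encoding on a toy corpus (the corpus and the convention of reserving 0 for padding are illustrative assumptions):

```python
from collections import Counter

corpus = [["i", "wondering", "you", "help", "problem"],
          ["you", "help", "me"]]

# Count word frequencies, then assign smaller integer ids to more frequent words.
counts = Counter(word for sent in corpus for word in sent)
word_to_id = {word: idx + 1 for idx, (word, _) in enumerate(counts.most_common())}  # 0 kept for padding

encoded = [[word_to_id[w] for w in sent] for sent in corpus]
print(word_to_id)
print(encoded)  # each sentence becomes a list of integer ids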
Problem Statement
• The problem of a bottleneck due to fixed-size vectors.
• In Neural Machine Translation (NMT), performance significantly deteriorates with longer sentences.
Issues with the Encoder-Decoder Model
11
Previous work
• A deep learning structure designed for analyzing sequential data.
• The output o(2) reflects both past and current information, since it is computed from h(1) and x(2).
• U: Input layer to hidden layer.
• W: Hidden layer at time t to hidden layer at time t+1.
• V: Hidden layer to output layer.
Recurrent Neural Networks (RNN)
12
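A numerical sketch of a single vanilla-RNN step with the weight matrices U, W, V named as on the slide (the layer sizes and random toy inputs are arbitrary assumptions):

```python
import numpy as np

input_dim, hidden_dim, output_dim = 4, 8, 3
rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_dim, input_dim))    # input layer -> hidden layer
W = rng.normal(size=(hidden_dim, hidden_dim))   # hidden layer at t -> hidden layer at t+1
V = rng.normal(size=(output_dim, hidden_dim))   # hidden layer -> output layer

def rnn_step(x_t, h_prev):
    """h_t mixes the current input with the previous hidden state; o_t is read from h_t."""
    h_t = np.tanh(U @ x_t + W @ h_prev)
    o_t = V @ h_t
    return h_t, o_t

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):     # a toy length-5 input sequence
    h, o = rnn_step(x_t, h)
```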
Previous work
• RNNs are models that are flexible in terms of the length of input and output values, allowing for the
construction of RNNs in various structures depending on the form of the input and output.
Recurrent Neural Networks (RNN)
13
Previous work
Recurrent Neural Networks (RNN)
14
Previous work
• As the number of time steps in a vanilla RNN grows, a long-term dependency problem arises: information from earlier time steps is not sufficiently carried forward to later ones.
• If the information needed for a prediction appears at the beginning of the sequence, the model can no longer predict effectively.
• Example 1: "I grew up in France and want to be a plumber who is the best in the world and I speak
fluent French."
Long Short-Term Memory (LSTM)
15
Previous work
Long Short-Term Memory (LSTM)
• The core idea of LSTM is to store the information from previous steps in
a memory cell and pass it forward.
• It determines how much of the past information to forget based on the
current information, multiplies it accordingly, and then adds the current
information to this result to pass it on to the next time step.
16
Previous work
Long Short-Term Memory (LSTM)
• Forget Gate
• A gate that decides how much of the past
information to forget.
17
Previous work
Long Short-Term Memory (LSTM)
• Input Gate & Input Candidate
• The Input Gate determines the amount of
current information to be entered into the cell,
and the Input Candidate calculates the current
information.
18
Previous work
Long Short-Term Memory (LSTM)
• Calculations in the Memory Cell
• The process of storing information in the
memory cell using the Forget Gate, Input Gate,
and Input Candidate.
19
Previous work
Long Short-Term Memory (LSTM)
• Output Gate
• The Output Gate decides the amount of the
current memory cell value to be output as the
current hidden layer value.
• Output Layer
• The calculation for the output layer is the same
as in RNN.
20
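Putting the pieces above together, a sketch of one LSTM step combining the Forget Gate, Input Gate, Input Candidate, memory-cell update, and Output Gate (the weight shapes, omitted biases, and toy sizes are my assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_dim, input_dim = 8, 4
rng = np.random.default_rng(0)

def new_weight():
    return rng.normal(size=(hidden_dim, hidden_dim + input_dim))

W_f, W_i, W_c, W_o = (new_weight() for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z)               # forget gate: how much of the past cell to keep
    i_t = sigmoid(W_i @ z)               # input gate: how much new information to write
    c_tilde = np.tanh(W_c @ z)           # input candidate: the current information itself
    c_t = f_t * c_prev + i_t * c_tilde   # memory cell update
    o_t = sigmoid(W_o @ z)               # output gate
    h_t = o_t * np.tanh(c_t)             # hidden state fed to the output layer
    return h_t, c_t
```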
Previous work
• The Gated Recurrent Unit (GRU) is a model that simplifies the operational process of LSTM.
GRU
21
Previous work
GRU
• Reset Gate
• The purpose of the Reset Gate in GRU is to
appropriately reset past information. It uses the
sigmoid function as its output, generating
values between (0, 1), which are then multiplied
with the previous hidden layer.
22
Previous work
GRU
• Update Gate
• The Update Gate combines the roles of LSTM's forget gate and input gate, determining the ratio at which past and current information are blended when the state is updated.
23
Previous work
GRU
• Candidate
• This stage involves calculating the current time
step's candidate information.
• The key aspect is that it doesn't use the
information from the past hidden layer directly;
instead, it multiplies it by the result of the reset
gate.
24
Previous work
GRU
• Hidden Layer Calculation
• This step involves calculating the current hidden
layer by combining the results of the update
gate and the candidate.
• The output of the sigmoid function determines
the amount of information from the current
time step, while 1 minus the sigmoid function's
output determines the amount of information
from the previous time step.
• It varies by context whether LSTM or GRU is more effective.
• A clear advantage of GRU is that it has fewer weights to learn.
25
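A corresponding sketch of one GRU step; the mixing convention (u_t weighting the candidate and 1 − u_t weighting the previous hidden state) follows the slide's description, and the shapes and sizes are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_dim, input_dim = 8, 4
rng = np.random.default_rng(0)
W_r, W_u, W_h = (rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for _ in range(3))

def gru_step(x_t, h_prev):
    z = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ z)                                         # reset gate
    u_t = sigmoid(W_u @ z)                                         # update gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate uses the reset past
    h_t = u_t * h_tilde + (1.0 - u_t) * h_prev                     # mix current and past information
    return h_t
```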
Methodology
Methodology
26
Methodology
• Assume that the hidden state up to step t-1 has been calculated.
Methodology
27
Methodology
Methodology
• Apply an FC (fully connected) layer followed by the tanh function.
• The distribution obtained after applying softmax is the Attention Distribution.
• A Context vector is created through a weighted sum.
28
Methodology
Methodology
• Concatenate the Context vector with the output at time t-1 and feed it into the decoder.
• Then, after passing through an FC layer and softmax, predict the output at time t.
29
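A numerical sketch of the attention steps on the last two slides (FC layer + tanh to score each source position, softmax for the Attention Distribution, weighted sum for the Context vector); the additive scoring form and all dimensions are illustrative assumptions in the spirit of Bahdanau attention:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

Tx, enc_dim, dec_dim, att_dim = 6, 8, 8, 10
rng = np.random.default_rng(0)
H = rng.normal(size=(Tx, enc_dim))          # encoder hidden states h_1 .. h_Tx
s_prev = rng.normal(size=dec_dim)           # decoder hidden state at step t-1
W_a = rng.normal(size=(att_dim, dec_dim))
U_a = rng.normal(size=(att_dim, enc_dim))
v_a = rng.normal(size=att_dim)

# FC layer + tanh produces one alignment score per source position.
scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])
attention_distribution = softmax(scores)      # softmax over source positions
context_vector = attention_distribution @ H   # weighted sum of encoder states
```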
Baseline
Base RNN decoder
• Encoder's Hidden State
• The encoder converts the input sentence x = (x_1, x_2, ..., x_Tx) into a fixed-length vector.
• This fixed-length vector is also known as the context vector c; h_t denotes the encoder hidden state at time t.
30
Baseline
Base RNN decoder
• Given the context vector c from the encoder, the next word y_t is predicted based on c and the previously predicted words y_1, y_2, ..., y_{t-1}.
• The translation is generated through the conditional probability written out below, and the objective is to maximize it.
31
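The conditional probability referred to above, written out (following the paper's formulation):

p(\mathbf{y}) = \prod_{t=1}^{T} p\big(y_t \mid \{y_1, \ldots, y_{t-1}\},\, c\big)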
Baseline
• A forward RNN reads the sequence from the beginning in order and computes the forward hidden states.
• A backward RNN reads the sequence in reverse, from the end to the
beginning, and calculates the backward hidden state.
• For each word, concatenate the forward hidden state and the backward
hidden state.
Align and translate
• h_i contains information from both before and after the word.
32
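In formula form, each annotation h_i concatenates the forward and backward hidden states (as in the paper):

h_i = \left[\, \overrightarrow{h}_i^{\top} \,;\, \overleftarrow{h}_i^{\top} \,\right]^{\top}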
Baseline
Bahdanau Attention Decoder
• The conditional probability p(y_i | y_1, ..., y_{i-1}, x) is the probability of the target word y_i given the previously generated words and the input sentence x.
• It is computed by a function g that takes the previous output y_{i-1}, the current decoder hidden state s_i, and the context vector c_i.
• The hidden state s_i is updated by a function f whose inputs are the previous hidden state s_{i-1}, the previous output word y_{i-1}, and the context vector c_i.
• The context vector c_i is computed as a weighted sum of the encoder hidden states (annotations) h_j.
• The weights α_ij are determined by the attention mechanism.
• The attention weights α_ij are obtained by applying a softmax to the alignment scores e_ij, which measure how well the inputs around position j match the output at position i.
• The alignment model a computes e_ij as a function of the decoder's previous hidden state s_{i-1} and the encoder hidden state h_j.
33
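Collected as formulas (as given in the paper):

p(y_i \mid y_1, \ldots, y_{i-1}, \mathbf{x}) = g(y_{i-1}, s_i, c_i), \qquad s_i = f(s_{i-1}, y_{i-1}, c_i)

c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j, \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j)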
Experiments
• Using the WMT’14 dataset for translating English to French.
• No UNK: Sentences without any unknown words.
• RNNsearch-50*: Trained for an extended period until there was no further improvement in performance on
the development set.
• Moses: A conventional phrase-based translation system that utilizes a separate monolingual corpus.
Quantitative results
• The longer the sentence, the more significant the difference in performance.
34
Experiments
Alignment matrix (attention weight visualization)
35
Experiments
Alignment matrix (attention weight visualization)
36
Paper review
• This paper introduces a novel architecture, 'RNNsearch', to overcome the limitations of the conventional encoder-decoder approach to neural machine translation by dynamically searching for the relevant input words (annotations) while generating each target word, rather than compressing the source into a single fixed-length vector.
• The experimental results show that RNNsearch significantly outperforms the conventional encoder-decoder model,
especially in translating longer sentences, and demonstrates robustness against the length of the source sentence.
• Remarkably, RNNsearch achieves a translation performance comparable to existing phrase-based statistical
machine translation systems, marking a promising step towards improved machine translation and a deeper
understanding of natural languages.
Conclusions
Editor's Notes

1. Frequency-based sorting and padding, one-hot encoding for the cross-entropy computation, Word2Vec encoding, and TF-IDF encoding (plus an interim summary) could also be covered, but are not treated separately here.
2. A deep learning structure for analyzing sequential data. o(2) reflects both past and current information via h(1) and x(2). U: input layer → hidden layer; W: hidden layer at time t → hidden layer at time t+1; V: hidden layer → output layer.
3. Because RNNs are flexible with respect to the lengths of their inputs and outputs, they can be arranged in various structures depending on the form of the input and output.
4. As the time steps of a vanilla RNN grow, the long-term dependency problem arises: earlier information is not sufficiently carried forward. If the information needed for a prediction lies at the beginning, prediction becomes infeasible.
5. The core idea of LSTM is to store information from previous steps in a memory cell and pass it along. Based on the current information, it decides how much of the past to forget (by multiplication), adds the current information to that result, and passes it to the next time step.
6. Multiply the current input and the previous hidden state by their respective weights, add them, and apply the sigmoid; the output is multiplied with the cell state of the previous step. Since the sigmoid lies in (0, 1), a value near 1 keeps most of the past information, while a value near 0 discards most of it.
7. The information the current step actually carries (the input candidate) is written to the cell, weighted by how important it is (the input gate).
8. First, past information is forgotten by the amount computed in the forget gate; then the current candidate, scaled by the input gate, is added to obtain the memory cell at the current step. In the formula, * denotes a pointwise operation.
9. The Gated Recurrent Unit (GRU) is a model that slightly simplifies the operation of LSTM.
10. The reset gate aims to appropriately reset past information: it uses a sigmoid output in (0, 1) and multiplies it with the previous hidden state.
11. The update gate is like a merger of LSTM's forget gate and input gate, deciding the ratio at which past and current information are refreshed. The sigmoid output u(t) determines the amount of current information, and (1 − u(t)) is multiplied with the previous hidden state; these are analogous to LSTM's input gate and forget gate, respectively.
12. The step that computes the current candidate information. The key point is that the previous hidden state is not used directly; it is first multiplied by the reset-gate output. Here τ is the hyperbolic tangent and * is a pointwise operation.
13. The step that combines the update-gate result and the candidate to compute the current hidden state. The sigmoid output determines the amount of current information, and one minus the sigmoid output determines the amount of past information. Depending on the task, either LSTM or GRU may perform better; a clear advantage of GRU is that it has fewer weights to learn.
14. Input to the RNN-based encoder.
15. Assume the hidden states up to step t−1 have been computed; input to the RNN-based encoder.
16. Apply an FC layer and then tanh; the distribution obtained after softmax is the attention distribution; the context vector is formed by a weighted sum.
17. The context vector is concatenated with the output at time t−1 and fed into the decoder; then, through an FC layer and softmax, the output at time t is predicted.
18. The encoder's hidden state: the encoder converts the input sentence x = (x_1, x_2, ..., x_Tx) into a fixed-length vector, the context vector c. h_t denotes the hidden state at time t. (y_{t−1}: previous word, s_t: hidden state, c: context vector.)
19. Given the context vector c from the encoder, the next word y_t is predicted from c and the previously predicted words y_1, y_2, ..., y_{t−1}. The translation is generated through the conditional probability below, which is maximized. (y_{t−1}: previous word, s_t: hidden state, c: context vector.)
20. Bidirectional RNN, attention, decoder.
21. y_{t−1}: previous word; s_t: hidden state; c: context vector; a: log softmax.
22. RNNencdec-30: the baseline without attention; RNNsearch: with attention applied.
23. RNNencdec-30: the baseline without attention; RNNsearch: with attention applied.
24. RNNencdec-30: the baseline without attention; RNNsearch: with attention applied.