Min-Seo Kim
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: kms39273@naver.com

Previous work
RNN (Recurrent Neural Network)
• An RNN is a network structure suited to processing sequence data or time-series data.
• An RNN carries past information forward in a hidden state and feeds it into the current step, which lets it
capture the continuity and context of data over time.
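The way an RNN folds past information into the current step can be sketched as a single recurrence. This is a minimal illustration, assuming the common tanh formulation h_t = tanh(W_x x_t + W_h h_prev + b); the function names and toy weights are hypothetical and not taken from the slides.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

def run_rnn(inputs, W_x, W_h, b):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Toy weights (hypothetical values, chosen only for illustration).
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

# Only the first input is non-zero, yet later states stay non-zero
# because each step reuses the previous hidden state.
states = run_rnn([[1.0], [0.0], [0.0]], W_x, W_h, b)
```

Because every step depends on the previous hidden state, the loop in `run_rnn` cannot be parallelized across time steps.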

Previous work
LSTM (Long Short-Term Memory)
• LSTM emerged as a solution to the long-term dependency problem: in a vanilla RNN, information from early
time steps is not carried far enough to reach later steps as the sequence grows longer.

Previous work
GRU (Gated Recurrent Unit)
• An LSTM needs considerable computing power because each cell contains four neural network layers (three
gates plus a candidate state). The GRU emerged as a lighter alternative that implements a similar gating
mechanism with only three.
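The four-versus-three comparison can be made concrete by counting parameters. This is a rough sketch, assuming each internal layer is one weight matrix over the concatenated input and hidden state plus one bias vector; real libraries may lay out biases differently, and the sizes below are hypothetical.

```python
def gated_cell_params(input_size, hidden_size, layers_in_cell):
    """Parameter count of a gated recurrent cell.

    Each internal layer maps [x_t, h_prev] to a hidden-size vector:
    a weight matrix of shape hidden x (input + hidden) plus one bias vector.
    """
    per_layer = hidden_size * (input_size + hidden_size) + hidden_size
    return layers_in_cell * per_layer

input_size, hidden_size = 128, 256  # hypothetical sizes

# LSTM: forget, input, and output gates plus the candidate cell state.
lstm_params = gated_cell_params(input_size, hidden_size, 4)
# GRU: reset and update gates plus the candidate hidden state.
gru_params = gated_cell_params(input_size, hidden_size, 3)

print(lstm_params, gru_params, gru_params / lstm_params)
```

Under these assumptions the GRU cell always has three quarters of the LSTM cell's parameters, whatever sizes are chosen.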

Background
Problem with the Encoder-Decoder Model
• An RNN encoder-decoder compresses the whole source sequence into a single, fixed-size context vector,
which becomes a bottleneck. To address this, machine translation has shifted toward approaches that move
beyond the RNN-based framework.

Methodology
• Does not use recurrent or convolutional layers (RNN or CNN), so the network itself has no built-in notion of
sequence order.
• Uses positional encoding to supply position information, and self-attention to capture context.
[Figure: Transformer architecture, encoder and decoder]
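As an illustration of positional encoding, the sketch below uses the sinusoidal form from the original paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The slides do not say which variant was presented, so treat this choice and the function name as assumptions.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # i walks over the even dimensions, so i / d_model equals 2i'/d_model
        # in the paper's notation.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Each row is added to the token embedding at that position.
pe = positional_encoding(max_len=50, d_model=8)
```

Every position gets a distinct vector, which is how a model without recurrence can still tell token order apart.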

Baseline
Scaled Dot-Product Attention
• Takes Query (Q), Key (K), and Value (V) as inputs.
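A minimal sketch of what happens to Q, K, and V, following the paper's formula Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. It uses plain Python lists instead of a tensor library to stay dependency-free; the helper names and toy inputs are illustrative assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v. Returns (n x d_v output, n x m weights)."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights

# Toy example (hypothetical values): each query matches one key more strongly.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
output, weights = scaled_dot_product_attention(Q, K, V)

# With identical keys every value gets the same weight.
uniform_out, uniform_w = scaled_dot_product_attention(
    [[1.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]], [[2.0], [4.0]]
)
```

The scaling by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.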
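
Baseline
Multi-Head Attention
• Reported to work better than a single attention function over the full model dimension. Queries, keys, and
values are each mapped through several different learned linear projections, and attention runs on every
projected set in parallel. This yields multiple attention functions (heads), each working on its own
projection of the inputs; their outputs are concatenated and projected once more.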
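The projection-then-attend idea can be sketched as follows. This is a simplified self-attention version in plain Python, assuming random projection matrices in place of learned ones; `multi_head_attention`, `random_matrix`, and the sizes are hypothetical names and values chosen for illustration.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def random_matrix(rows, cols, rng):
    """Stand-in for a learned weight matrix (random values, not trained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(X, heads, W_o):
    """heads: list of (W_q, W_k, W_v) triples, one per head."""
    head_outputs = [
        attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v))
        for W_q, W_k, W_v in heads
    ]
    # Concatenate the head outputs along the feature dimension.
    concat = [[x for h in head_outputs for x in h[t]] for t in range(len(X))]
    # The final projection mixes information across heads.
    return matmul(concat, W_o)

rng = random.Random(0)
seq_len, d_model, num_heads = 4, 8, 2
d_head = d_model // num_heads

X = random_matrix(seq_len, d_model, rng)
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
    for _ in range(num_heads)
]
W_o = random_matrix(num_heads * d_head, d_model, rng)
out = multi_head_attention(X, heads, W_o)
```

Splitting `d_model` across heads keeps the total cost close to that of one full-width attention while letting each head attend to a different projection of the input.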

Experiments
English-to-German translation task (WMT 2014)
• Translation quality is measured with the BLEU score, which compares how similar machine-translated results
are to those translated by humans.
• The Transformer scores higher than the other models compared, while also having a lower training cost.
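To make "how similar" concrete, the sketch below computes clipped n-gram precision, one ingredient of BLEU. It is not a full BLEU implementation: real BLEU combines precisions for several n-gram orders with a geometric mean and a brevity penalty. The function names and example sentences are assumptions made for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference.

    Each n-gram is credited at most as often as it occurs in the reference,
    so repeating a matching word does not inflate the score.
    """
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

reference = "the cat is on the mat".split()   # human translation (made up)
candidate = "the cat sat on the mat".split()  # machine output (made up)

p1 = clipped_precision(candidate, reference, 1)  # unigram precision
p2 = clipped_precision(candidate, reference, 2)  # bigram precision
print(p1, p2)
```

Higher-order n-grams reward correct word order, which is why the bigram precision here is lower than the unigram precision for the same sentence pair.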

Experiments
Model Variation

Experiments
English Constituency Parsing
• To test whether the Transformer generalizes to other tasks, it was applied to English constituency parsing.
• Constituency parsing groups the words of a sentence into nested grammatical constituents (such as noun
phrases and verb phrases), producing a parse tree.
• Despite not being specifically tuned for this task, the Transformer performs well.

Paper review
Conclusions
• The Transformer replaces the recurrent layers commonly used in encoder-decoder architectures with
multi-headed self-attention.
• For translation tasks, the Transformer can be trained much faster than architectures based on recurrent or
convolutional layers.
• The authors are optimistic about the future of attention-based models and plan to apply them to other tasks.
• Plans include extending the Transformer to handle input and output modalities beyond text, and exploring
local, restricted attention mechanisms to efficiently process large inputs and outputs such as images, audio,
and video.
• Another research goal is to make the generation process less sequential.
240115_Attention Is All You Need (2017 NIPS).pptx

Editor's Notes

  1. Input to the RNN-based encoder
  2. Y t-1: previous word; S t: hidden state; C: context vector
  3. Y t-1: previous word; S t: hidden state; C: context vector
  4. RNNencdec-30: baseline without attention; Search: with attention applied
  5. RNNencdec-30: baseline without attention; Search: with attention applied
  6. RNNencdec-30: baseline without attention; Search: with attention applied