Transformers for NLP
An Introduction to the Attention on Steroids
What is NLP?
Natural Language Processing (NLP) is a subfield of artificial intelligence
concerned with programming computers to process and analyze large amounts
of natural language data.
NLP techniques
In timeline
● Symbolic NLP (1950s to 1990s)
● Statistical NLP (1990s to 2010s)
● Neural NLP (2010s to present)
Deep Learning for NLP
Deep Learning Models
Neural Networks
Word to representation
(word2vec)
● Layered structure
● True targets vs output predictions
● Weights and loss functions
● Optimizers
RNN
Language Generation
Encoder-Decoder
Translation
● Joint RNNs
● Creating context and using the final context of the first RNN
● Training using parallel corpora
Image source: https://www.slideshare.net/darvind/nlp-using-transformers
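The "final context of first RNN" idea can be sketched in a few lines. This is a minimal illustration, not the presenter's model: the plain tanh cell, the random weights, the dimensions, and the name `rnn_encode` are all assumptions made for the example.

```python
import math
import random

def rnn_encode(inputs, w_xh, w_hh):
    """Run a plain tanh RNN over `inputs`; return the list of hidden states."""
    hidden = [0.0] * len(w_hh)
    states = []
    for x_t in inputs:  # one step per source token, strictly in order
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_xh[i], x_t))
                + sum(w * h for w, h in zip(w_hh[i], hidden))
            )
            for i in range(len(w_hh))
        ]
        states.append(hidden)
    return states

random.seed(0)
embed_dim, hidden_dim, src_len = 4, 3, 5  # hypothetical sizes

def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

source = random_matrix(src_len, embed_dim)   # stand-in for word vectors
w_xh = random_matrix(hidden_dim, embed_dim)  # input-to-hidden weights
w_hh = random_matrix(hidden_dim, hidden_dim) # hidden-to-hidden weights

encoder_states = rnn_encode(source, w_xh, w_hh)
context = encoder_states[-1]  # the single vector handed to the decoder RNN
print(len(encoder_states), len(context))
```

Whatever the source length, the decoder here receives only one fixed-size vector, which is the bottleneck that attention models relax.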
Deep Learning Models
Attention Models
Translation and Image Recognition
● Pay selective attention
● Utilize intermediate states while decoding
● Do alignment while translating
Transformer
Translation
Image source: https://www.slideshare.net/darvind/nlp-using-transformers
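"Utilize intermediate states while decoding" can be sketched as follows. The example assumes dot-product scoring for brevity (early attention models for translation typically used a small learned scoring network instead), and the vectors and the name `attend` are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Weight every encoder state by how well it matches the decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)  # soft alignment over source positions
    dim = len(decoder_state)
    context = [
        sum(w * enc[j] for w, enc in zip(weights, encoder_states))
        for j in range(dim)
    ]
    return weights, context

# Hypothetical intermediate encoder states, one per source word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]  # hypothetical decoder state at one output step

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
```

The weights act as the soft alignment: source positions whose state matches the current decoder state contribute more to the context vector used for the next output word.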
Transformers in
Translation
This architecture aims to solve sequence-to-sequence tasks while
handling long-range dependencies with ease. It relies entirely on
self-attention to compute representations of its input and output,
without using sequence-aligned RNNs or convolution.
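For reference, the self-attention operation in the paper these slides cite (arXiv 1706.03762) is scaled dot-product attention. The symbols below follow that paper and do not appear on the slides: Q, K and V are the query, key and value matrices, and d_k is the key dimension.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Each row of the softmax output is a set of weights over all positions, so every position can draw directly on every other position in a single step.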
Transformers in
Translation
Three types of attention
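● Within encoder (self-attention over the source sequence)
● Encoder-Decoder (decoder positions attend to the encoder output)
● Within decoder (masked self-attention over the target generated so far)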
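The three uses differ only in where the queries, keys and values come from, and in whether future positions are masked. A rough sketch under simplifying assumptions: learned projection matrices and multiple heads are omitted, the input vectors are made up, and `attention` is a name chosen for this example.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention on lists of vectors.

    With causal=True, position i may only attend to positions <= i.
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = range(i + 1) if causal else range(len(keys))
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, visible))
            for c in range(len(values[0]))
        ])
    return outputs

source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # encoder-side vectors (hypothetical)
target = [[0.5, 0.5], [1.0, 0.0]]              # decoder-side vectors (hypothetical)

encoder_out = attention(source, source, source)                # within encoder
decoder_self = attention(target, target, target, causal=True)  # within decoder
cross = attention(decoder_self, encoder_out, encoder_out)      # encoder-decoder
print(len(encoder_out), len(decoder_self), len(cross))
```

The causal flag is what keeps the decoder from looking at target words it has not produced yet. In the encoder-decoder case the queries come from the decoder while the keys and values come from the encoder output.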
Transformers in
Translation
Self-attention
● A richer representation than single-word embeddings, capturing
the interdependencies between words
● Parallel computation
● Captures long-range dependencies
Transformers in
Translation
Image source: https://arxiv.org/pdf/1706.03762.pdf
Transformers in
Translation
Challenges
● A Transformer attends over a fixed maximum number of tokens at a
time, so longer text has to be split into segments (chunks) before
being fed into the system as input
● This chunking causes context fragmentation: tokens near a segment
boundary lose the context that fell into the neighbouring segment.
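A toy illustration of chunking and the fragmentation it causes. The whitespace tokenisation, the segment length of 4, and the function name `split_into_segments` are arbitrary choices for the example; real systems use subword tokens and much longer segments.

```python
def split_into_segments(tokens, segment_len):
    """Cut a token list into consecutive segments of at most `segment_len` tokens."""
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]

tokens = "the cat that the dog chased yesterday ran up the tree".split()
segments = split_into_segments(tokens, segment_len=4)

for segment in segments:
    print(segment)
```

Here the verb "ran" lands in the second segment while its subject "cat" stays in the first. A model that processes segments independently never sees the two together.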
Transformers in
Translation
Future directions
● Transformer-XL for language modelling
● Google’s BERT
Thank you!!!
