Transformers for Natural Language Processing
Dr. Ash Pahwa
©2022 Dr. Ash Pahwa
Bio: Dr. Ash Pahwa
 Ph.D. Computer Science
 Website: www.AshPahwa.com
 Affiliation
 California Institute of Technology, Pasadena
 UC Irvine, UCLA, UCSD, Chapman
 Field of Expertise
 Machine Learning, Deep Learning, Digital Image Processing,
Database Management, CD-ROM/DVD
 Worked for
 General Electric, AT&T Bell Laboratories, Oracle, UC Santa Barbara
Outline
1. What are Transformers?
2. Transformers – Applications: GPT+BERT
3. Problem – Context Sensitive Embeddings
1. Bank Word Embeddings
4. Transformer Architecture
1. Word2Vec Embeddings
2. Positional Encoding
3. Self Attention
4. Feed Forward Neural Network
What are Transformers?
 Transformers are a new (2017) family of deep learning neural network architectures
 They solve problems experienced by the RNN (Recurrent Neural Network) architecture
 The Transformer architecture contains
 Encoder
 Decoder
 Primary application: translation
Attention is All You Need
Google Research: NIPS 2017
Transformers Applications
BERT and GPT
 Google: BERT’s model architecture is based on the Encoder of the Transformer
 OpenAI: GPT’s model architecture is based on the Decoder of the Transformer
OpenAI – GPT: Generative Pre-trained Transformer
GPT-1 Paper: 2018 + GPT-2 Paper: 2019 + GPT-3 Paper: 2020
 GPT-2: Language Models Are Unsupervised Multitask Learners, by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
 GPT-3: Language Models are Few-Shot Learners, by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al.
Bi-Directional Transformer
BERT: a Sesame Street character
 ELMo: 2018: Bi-directional RNN/LSTM
 Embeddings from Language Models (ELMo): Univ. of Washington
 2018: BERT from Google
 Bi-directional Encoder Representations from Transformers
 Based on the Transformer Encoder
 BERT: the word vector for the same word differs from sentence 1 to sentence 2
Derivative Architecture from
BERT
GPT & BERT Applications
 GPT:
 Machine Translation
 Text Generation
 BERT
 Context sensitive enhanced word
embeddings
 Used in Google search engine
Copyright 2021 - Dr. Ash Pahwa 10
Synonymy & Polysemy
 Synonymy refers to cases where two different words have the same meaning
 Cars & Automobile
 Polysemy refers to cases where the same word has different meaning based on
the context
 Example
 I banked on my husband; he was about to drop me to the bank. He got
late and I wanted to take a cab but there was a taxi strike. I ended up
driving my husband’s vehicle. It was showing low fuel warning, I had to go
to gas station to refill, by the time I reached the bank, car parking was full.
 Synonymy:
 cab, taxi
 vehicle, car
 fuel, gas
 Polysemy
 bank, bank
Copyright 2021 - Dr. Ash Pahwa 11
Test Your English Language Skills
 I went to a “bank” to deposit money. What is the meaning of the word ‘bank’? (Answer: A)
 I went to a “bank” of a river to take a walk. What is the meaning of the word ‘bank’? (Answer: B)
How Do Humans Interpret Language?
 Humans read the whole sentence to interpret the meaning of a word
 I went to a “bank” to deposit money
 I went to a “bank” of a river to take a walk
How Does BERT Interpret Language?
 BERT uses the self-attention method
 I went to a “bank” to deposit money
 I went to a “bank” of a river to take a walk
Self Attention
 Bank embedding: a bunch of floating-point numbers
 Self attention separates the two senses into distinct embeddings:
 Bank-1: financial institution
 Bank-2: bank of a river
Transformer Architecture
1. Word Embeddings
2. Positional Embeddings (Excel + Python)
3. Self-Attention
4. Masking
5. Train the Neural Network
Word Embeddings
Word Embeddings: Word2Vec: Google Research
Mikolov + Chen + Corrado + Dean: ICLR 2013
Example: Vector Math:
King – Man + Woman = Queen
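A minimal sketch of this vector math, assuming gensim and its pretrained word2vec-google-news-300 vectors (a large one-time download):

```python
# Sketch: king - man + woman ~= queen with pretrained word2vec vectors.
# Assumes gensim is installed; the model download is ~1.6 GB.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected top match: ('queen', ~0.71)
```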
Positional Encoding
Position of a Word in a Sentence: BERT
 In any sequence data (as in NLP), the position of a word is important
Need for Positional Encoding
 We feed all the words into the model at once, in parallel
 Hence the need for positional encoding
 Positional encoding represents the order of the words in a sentence
 Advantages
 Decrease in training time
 Learns long-term dependencies between words
Positional Encoding

$PE_{(pos,\,2i)} = \sin\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$

$PE_{(pos,\,2i+1)} = \cos\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$
Positional Encoding: Excel
Positional Encoding Value
 $d_{model} = 10$
 Position: 0 – 5
 Value of ‘i’: 0 – 9

$PE\_Value_{(pos,\,2i)} = \dfrac{pos}{10000^{2i/d_{model}}}$
Sine and Cosine of the Values

$PE\_Value_{(pos,\,2i)} = \dfrac{pos}{10000^{2i/d_{model}}}$

$PE_{(pos,\,2i)} = \sin\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$

$PE_{(pos,\,2i+1)} = \cos\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$
Positional Encoding: Python
Positional Embedding: Implementation in Python
 Load libraries
 Define constants
 $d_{model} = 10$
 Position: 0 – 5
 Value of ‘i’: 0 – 9
Positional Vector Value

$PE\_Value_{(pos,\,2i)} = \dfrac{pos}{10000^{2i/d_{model}}}$
Sine and Cosine Values

$PE_{(pos,\,2i)} = \sin\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$

$PE_{(pos,\,2i+1)} = \cos\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$
Positional Encoding in Python
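The slide's code appears as an image; a minimal numpy sketch of the same computation, assuming $d_{model} = 10$ and positions 0 – 5 as above:

```python
# Sketch of the positional encoding defined by the formulas above.
import numpy as np

d_model = 10
n_positions = 6

pos = np.arange(n_positions)[:, np.newaxis]   # column of positions 0..5
i = np.arange(d_model)[np.newaxis, :]         # row of dimension indices 0..9

# PE_Value(pos, 2i) = pos / 10000^(2i / d_model); i // 2 pairs sin/cos dims
values = pos / np.power(10000.0, 2 * (i // 2) / d_model)

pe = np.zeros((n_positions, d_model))
pe[:, 0::2] = np.sin(values[:, 0::2])   # even dimensions: sine
pe[:, 1::2] = np.cos(values[:, 1::2])   # odd dimensions: cosine
print(pe.round(3))
```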
Heat-Map: Plot of Positional Encoding (d_model = 10, positions 0 – 5)

Heat-Map: Plot of Positional Encoding (d_model = 512, positions 0 – 100)
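A sketch of the heat-map plot, assuming matplotlib and the `pe` array from the previous snippet; for the second plot, rerun with d_model = 512 and 100 positions:

```python
# Heat-map of the positional encoding matrix (rows = positions).
import matplotlib.pyplot as plt

plt.pcolormesh(pe, cmap="RdBu")
plt.xlabel("embedding dimension (i)")
plt.ylabel("position (pos)")
plt.colorbar(label="PE value")
plt.title("Positional encoding: d_model = 10, positions 0-5")
plt.show()
```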
Self Attention
Transformer Architecture
[Figure: two stacked encoders. Text + positional embeddings enter Encoder 1 (multi-head attention followed by a feedforward network); its output enters Encoder 2 (self attention followed by a feedforward network), which outputs the modified text embeddings.]
Embeddings for all the words of a sentence
Sentence: I am good
Suppose each word embedding contains 512 floating-point numbers
X matrix dimension = 3 x 512
Matrix X: embeddings of all the words of the sentence

X     | 1     | 2     | … | 512
I     | 1.76  | 2.22  | … | 6.66
am    | 7.77  | 0.631 | … | 5.35
good  | 11.44 | 10.10 | … | 3.33
Weight Matrices
 Initialized randomly
 Learned by training

W_Q: 512 x 64   W_K: 512 x 64   W_V: 512 x 64
Create Q, K, V Vectors

Q = X·W_Q = (3 x 512) * (512 x 64) = 3 x 64
K = X·W_K = (3 x 512) * (512 x 64) = 3 x 64
V = X·W_V = (3 x 512) * (512 x 64) = 3 x 64

(X is the 3 x 512 embedding matrix above; W_Q, W_K, W_V are the 512 x 64 weight matrices.)
Create Q, K, V Vectors

Q     | 1     | 2     | … | 64
I     | 3.69  | 7.42  | … | 4.44
am    | 11.11 | 7.07  | … | 76.7
good  | 99.3  | 3.69  | … | 0.85

K     | 1     | 2     | … | 64
I     | 5.31  | 6.78  | … | 0.96
am    | 11.71 | 0.86  | … | 11.31
good  | 10.10 | 11.44 | … | 5.11

V     | 1     | 2     | … | 64
I     | 67.85 | 91.2  | … | 0.13
am    | 13.13 | 63.1  | … | 4.44
good  | 12.12 | 96.1  | … | 43.4
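A minimal sketch of these projections for the 3-word sentence; X and the weight matrices are random here for illustration (in a real model the weights are learned, and the slide's numbers are examples):

```python
# Sketch: project the 3 x 512 sentence matrix into Q, K, V (each 3 x 64).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 512))      # one 512-dim embedding per word
W_Q = rng.normal(size=(512, 64))   # query projection weights
W_K = rng.normal(size=(512, 64))   # key projection weights
W_V = rng.normal(size=(512, 64))   # value projection weights

Q = X @ W_Q    # (3, 64)
K = X @ W_K    # (3, 64)
V = X @ W_V    # (3, 64)
print(Q.shape, K.shape, V.shape)
```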
Self Attention: Step-1
 Compute Q·Kᵀ
 Dimensions
 Q = 3 x 64
 K = 3 x 64
 Kᵀ = 64 x 3
 Q·Kᵀ = 3 x 3
Self Attention Matrix = Q·Kᵀ
 The Q·Kᵀ matrix displays the self-attention data
 It shows how strongly the words are related to each other
 Q·Kᵀ = 3 x 3

Q·Kᵀ  | I   | am  | good
I     | 110 | 90  | 80
am    | 70  | 99  | 70
good  | 90  | 70  | 100
Word Relations
 The word ‘I’ is most strongly related to the word ‘I’
 The word ‘am’ is most strongly related to the word ‘am’
 The word ‘good’ is most strongly related to the word ‘good’

Q·Kᵀ  | I   | am  | good
I     | 110 | 90  | 80
am    | 70  | 99  | 70
good  | 90  | 70  | 100

 I went to a “bank” to deposit money: what is the meaning of the word ‘bank’?
 I went to a “bank” of a river to take a walk: what is the meaning of the word ‘bank’?
Step-2
 Normalize the self-attention matrix
 Compute $\dfrac{QK^T}{\sqrt{\text{dimension of key vector}}} = \dfrac{QK^T}{\sqrt{64}}$

Q·Kᵀ  | I   | am  | good
I     | 110 | 90  | 80
am    | 70  | 99  | 70
good  | 90  | 70  | 100

Q·Kᵀ/√d | I               | am              | good
I       | 110/√64 = 13.75 | 90/√64 = 11.25  | 80/√64 = 10
am      | 70/√64 = 8.75   | 99/√64 = 12.375 | 70/√64 = 8.75
good    | 90/√64 = 11.25  | 70/√64 = 8.75   | 100/√64 = 12.5
Step-3: Normalize + Softmax
 Apply the softmax function to each row of the normalized matrix

Q·Kᵀ/√d | I               | am              | good
I       | 110/√64 = 13.75 | 90/√64 = 11.25  | 80/√64 = 10
am      | 70/√64 = 8.75   | 99/√64 = 12.375 | 70/√64 = 8.75
good    | 90/√64 = 11.25  | 70/√64 = 8.75   | 100/√64 = 12.5

softmax(Q·Kᵀ/√d) | I     | am   | good
I                | 0.90  | 0.07 | 0.03
am               | 0.025 | 0.95 | 0.025
good             | 0.21  | 0.03 | 0.76
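As a quick sanity check, the softmax of the ‘I’ row can be computed directly (the slide's entries are rounded approximations):

```python
# Softmax of the scaled scores for the word "I".
import numpy as np

row = np.array([13.75, 11.25, 10.0])
e = np.exp(row - row.max())           # subtract max for numerical stability
print((e / e.sum()).round(3))         # ~[0.904 0.074 0.021]
```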
Step-4: Attention Matrix $Z = \mathrm{softmax}\!\left(\dfrac{QK^T}{\sqrt{d}}\right)V$
 Dimensions of Z = (3 x 64)

softmax(Q·Kᵀ/√d) | I     | am   | good
I                | 0.90  | 0.07 | 0.03
am               | 0.025 | 0.95 | 0.025
good             | 0.21  | 0.03 | 0.76

V     | 1     | 2    | … | 64
I     | 67.85 | 91.2 | … | 0.13
am    | 13.13 | 63.1 | … | 4.44
good  | 12.12 | 96.1 | … | 43.4

V = X·W_V = (3 x 512) * (512 x 64) = 3 x 64
Step-4: Attention Matrix $Z = \mathrm{softmax}\!\left(\dfrac{QK^T}{\sqrt{d}}\right)V$
Each output row is a softmax-weighted sum of the rows of V:
 z1 = 0.90·[67.85, 91.2, …, 0.13] + 0.07·[13.13, 63.1, …, 4.44] + 0.03·[12.12, 96.1, …, 43.4]
 z2 = 0.025·[67.85, 91.2, …, 0.13] + 0.95·[13.13, 63.1, …, 4.44] + 0.025·[12.12, 96.1, …, 43.4]
 z3 = 0.21·[67.85, 91.2, …, 0.13] + 0.03·[13.13, 63.1, …, 4.44] + 0.76·[12.12, 96.1, …, 43.4]
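Steps 1 – 4 amount to scaled dot-product attention; a minimal end-to-end sketch, using random Q, K, V matrices in place of the slide's example values:

```python
# Sketch of steps 1-4: scaled dot-product attention for a 3-word sentence.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 64)) for _ in range(3))

scores = Q @ K.T                          # Step 1: (3 x 3) word relations
scaled = scores / np.sqrt(K.shape[-1])    # Step 2: divide by sqrt(64) = 8
weights = softmax(scaled, axis=-1)        # Step 3: each row sums to 1
Z = weights @ V                           # Step 4: (3 x 64) attention output
print(Z.shape)                            # (3, 64)
```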
Masking Approach
 Key words of every sentence are masked
 The neural network predicts the masked word
 After training, the following three weight matrices converge to their final values:

W_Q (512 x 64)   W_K (512 x 64)   W_V (512 x 64)
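For intuition, the masked-word objective can be tried with a pretrained BERT; this sketch assumes the Hugging Face transformers library and illustrates the idea rather than the training procedure itself:

```python
# Sketch: masked-word prediction with a pretrained BERT model.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("I went to the [MASK] to deposit money."):
    print(pred["token_str"], round(pred["score"], 3))  # "bank" should rank high
```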
Train the Neural Network
 Train the neural network with the training data
 Compute the modified embeddings

Training data with masked words → Embeddings of words → Q, K, V matrices + self-attention vectors → Modified embeddings
How Does BERT Interpret Language?
 BERT uses the self-attention method
 I went to a “bank” to deposit money
 I went to a “bank” of a river to take a walk
Summary
1. What are Transformers?
2. Transformers – Applications: GPT+BERT
3. Problem – Context Sensitive Embeddings
1. Bank Word Embeddings
4. Transformer Architecture
1. Word2Vec Embeddings
2. Positional Encoding
3. Self Attention
4. Feed Forward Neural Network