Insoo Chung
Big ideas in NLP
Simple and effective big ideas that led us here — CSCE181
My name is Insoo Chung
• Graduate student at Texas A&M
• Worked on Machine Translation @ Samsung Research
• On-device translation, simultaneous interpretation,
speech-to-speech translation
• Published at ACL, EMNLP, ICASSP, and WMT
• Interned with Siri @ Apple this Summer
• Performance analysis of on-device Siri
• NLP enthusiast!
Howdy
Big ideas in NLP
What we are covering today
1. Natural language processing — what and why
2. Deep learning — a tiny bit
3. Big ideas
👉 Word2Vec — words
👉 Recurrent neural networks — sentence input
👉 Sequence to sequence framework — sentence output
👉 Attention — long sentence
Natural language processing
Definition
Branch of AI, concerned with giving computers the ability to understand natural language
utterance → text → text → utterance
(usually)
🗣📱
Definition from: https://www.ibm.com/cloud/learn/natural-language-processing
+ generating natural language as well
Language and intelligence
Why?
+ Most knowledge is stored in some form of language → we want computers to exploit it
If computers could process NL, they would be more useful to people.
https://dev.to/spectrumcetb/evolution-of-nlp-f54
Language and intelligence
Turing test
Linguistic ability ≈ Intelligence
“I propose to consider the question, ‘Can machines think?’”, Turing
https://en.wikipedia.org/wiki/Turing_test
If C cannot determine which player is a human, the machine passes the test
Language and intelligence
Artificial general intelligence
Turing test is one of the tests used for confirming human-level AGI
Linguistic abilities of AI are important in moving towards the next step
But language is so weird!
Slide from Stanford’s CS224n course, original cartoon by Randall Munroe
Then deep learning
happened 💥
Use cases
Commercial products
Machine translation (text → text)
Voice assistants (utterance → utterance)
Source: (left) Google Translate, (right) Apple Siri
So what’s under the hood?
Deep learning
… for NLP
NLP models can be viewed as a conditional probability function which can be learned using deep learning.
P(y | x, θ)
With input x and model parameters θ, the model outputs the most probable outcome y.
Deep learning provides an effective way to learn the model parameters θ, given A LOT of data and computation.
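To make the notation concrete, here is a minimal sketch, not any particular model from the talk, of what P(y | x, θ) looks like in code: the parameters θ turn an input vector x into a probability distribution over outcomes, and the prediction is the most probable y. Shapes and values below are made up purely for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 2))   # hypothetical params: 4-d input, 2 possible outcomes
x = np.array([0.3, -0.4, 0.2, -0.3])

p_y_given_x = softmax(x @ theta)  # P(y | x, theta)
y_hat = int(np.argmax(p_y_given_x))
print(p_y_given_x, y_hat)         # distribution and the most probable outcome y
```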
Big ideas in NLP
What we are covering today
👉Word2Vec
👉Recurrent neural networks
👉Sequence to sequence framework
👉Attention - a tiny bit
Word2Vec
Language representation in computers
Words are represented as vectors of numbers in NLP. How?
1. Words are associated with random vectors:
   brown = [+0.3, −0.4, +0.2, −0.3, ...]ᵀ
   fox = [−0.2, −0.1, −0.1, +0.3, ...]ᵀ
2. We go through many sentences and learn θ that predicts prev/next word probabilities correctly.
3. The result?
   • Word vectors populated in n-d space that hold semantic/syntactic meaning
Example from: https://web.stanford.edu/class/cs224n/slides/cs224n-2022-lecture01-wordvecs1.pdf
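A toy sketch of the training idea above, loosely following skip-gram with negative sampling. It is a simplification (one shared vector table, a two-sentence corpus, made-up hyperparameters), not the original word2vec implementation.

```python
import numpy as np

sentences = [["the", "quick", "brown", "fox"], ["the", "lazy", "brown", "dog"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
dim = 8
vectors = rng.normal(scale=0.1, size=(len(vocab), dim))    # step 1: random vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for _ in range(200):                                        # step 2: go through sentences
    for sent in sentences:
        for i, center in enumerate(sent):
            for j in range(max(0, i - 1), min(len(sent), i + 2)):
                if i == j:
                    continue
                c, o = idx[center], idx[sent[j]]
                neg = rng.integers(len(vocab))              # one random "negative" word
                for target, label in ((o, 1.0), (neg, 0.0)):
                    score = sigmoid(vectors[c] @ vectors[target])
                    grad = score - label                    # push observed pairs up, random pairs down
                    vc, vt = vectors[c].copy(), vectors[target].copy()
                    vectors[c] -= lr * grad * vt
                    vectors[target] -= lr * grad * vc

print(vectors[idx["brown"]][:4])                            # a learned (toy) word vector
```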
Word2Vec
Learned vectors
Semantically close words are near each other
Syntactic relationships are preserved with relative positioning
   e.g. vec(slower) − vec(slow) ≈ vec(faster) − vec(fast)
We have computable representations for words!
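With pretrained vectors, both properties can be checked directly. A hedged sketch, assuming the gensim package and its downloadable "glove-wiki-gigaword-50" vectors are available; any pretrained word vectors would do.

```python
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-50")   # pretrained word vectors (downloads on first use)

# vec(slower) - vec(slow) + vec(fast) should land near "faster"
print(kv.most_similar(positive=["slower", "fast"], negative=["slow"], topn=3))

# semantically close words are near each other
print(kv.most_similar("banking", topn=5))
```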
Recurrent neural networks
Dealing with sequence inputs
We now know how to deal with words using adjacency stats, but how do we handle sentences?
→ Consider movie review sentiment analysis.
“Very good” → 1: positive
“I enjoyed this as much as my cat enjoys baths” → 0: negative
f(y | x, θ)
How do we deal with a sentence, i.e. a sequence of words?
→ If we consider every possible sentence, the possible # of inputs would be ∞ - intractable
→ Break it down to the word level: then the possible # of words wouldn’t be that many (~30K) - tractable!
[“Very”, “good”] → 1: positive
[“I”, “enjoyed”, “this”, “as”, “much”, “as”, “my”, “cat”, “enjoys”, “baths”] → 0: negative
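A minimal sketch of the word-level view: a review becomes a short list of ids drawn from a fixed vocabulary. The tiny vocabulary below is hypothetical; real systems use on the order of 30K entries.

```python
# Map each word of a review to an integer id from a fixed vocabulary.
vocab = {"<unk>": 0, "very": 1, "good": 2, "i": 3, "enjoyed": 4, "this": 5,
         "as": 6, "much": 7, "my": 8, "cat": 9, "enjoys": 10, "baths": 11}

def encode(review):
    return [vocab.get(w, vocab["<unk>"]) for w in review.lower().split()]

print(encode("Very good"))                                     # [1, 2]
print(encode("I enjoyed this as much as my cat enjoys baths"))
```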
Recurrent neural networks
Dealing with sequence inputs
Recurrent neural network
1. Handle words step-by-step.
2. Use the previous state vector and the current word vector to create the next state vector.
3. Use the final step’s output to determine the result.
P(h<t+1> | x<t>, h<t>, θ)   (context h<t> + word x<t> → new context h<t+1>)
Example: h<0> → f(“Very”, h<0>) = h<1> → f(“good”, h<1>) = h<2> → 1: positive
ŷ(s) = P(sentiment | review) = h<2> = f(x<2>, h<1>) = f(x<2>, f(x<1>, h<0>))
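A minimal sketch of the recurrence on the “Very good” example, with random weights just to show the mechanics. Unlike the slide, which reads the final state directly as the prediction, this sketch adds a small output projection to squash the final state into P(positive | review).

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hid = 8, 16
W_x = rng.normal(scale=0.1, size=(dim, hid))     # word vector -> state contribution
W_h = rng.normal(scale=0.1, size=(hid, hid))     # previous state -> state contribution
w_out = rng.normal(scale=0.1, size=hid)          # final state -> sentiment score
embed = {"very": rng.normal(size=dim), "good": rng.normal(size=dim)}  # toy word vectors

def f(x, h):                                     # one RNN step: h<t+1> = f(x<t>, h<t>)
    return np.tanh(x @ W_x + h @ W_h)

h = np.zeros(hid)                                # h<0>
for word in ["very", "good"]:
    h = f(embed[word], h)                        # h<1>, then h<2>

p_positive = 1.0 / (1.0 + np.exp(-(h @ w_out)))  # sigmoid over the final state
print(p_positive)
```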
Recurrent neural networks
Use case
We can read ∞-many possible sentences!
But how can we produce sequence outputs?
Seq2seq
Producing sequence outputs
Same principle: produce one word at a time
1. Words are read by the encoder RNN and its state is updated.
2. The final state of the encoder is fed in as the initial state of the decoder.
3. The decoder RNN does its thing: it emits one output word at a time depending on its state; the state is also updated for the next step.
4. Decoding terminates when a special token is emitted.
We can generate ∞-many possible sentences!
Image source: https://towardsdatascience.com/sequence-to-sequence-tutorial-4fde3ee798d8
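A minimal sketch of the whole loop with random weights (so the output is meaningless); the point is the control flow: the encoder reads and updates its state, the final state seeds the decoder, and greedy decoding stops at a special token. Vocabulary, weights, and the shared cell are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
src_vocab = ["i", "like", "cats"]
tgt_vocab = ["<eos>", "ich", "mag", "katzen"]
dim, hid = 8, 16

emb_src = rng.normal(size=(len(src_vocab), dim))
emb_tgt = rng.normal(size=(len(tgt_vocab), dim))
W_x, W_h = rng.normal(size=(dim, hid)), rng.normal(size=(hid, hid))
W_out = rng.normal(size=(hid, len(tgt_vocab)))

def step(x, h):                                   # one RNN step (same toy cell for enc/dec)
    return np.tanh(x @ W_x + h @ W_h)

# Encoder: read words, update the state
h = np.zeros(hid)
for w in ["i", "like", "cats"]:
    h = step(emb_src[src_vocab.index(w)], h)

# Decoder: the final encoder state is the initial decoder state
out, prev = [], emb_tgt[0]                        # dummy "previous token" embedding to start
for _ in range(10):                               # safety cap on output length
    h = step(prev, h)
    word_id = int(np.argmax(h @ W_out))           # greedy: pick the most probable next word
    if tgt_vocab[word_id] == "<eos>":             # terminate on the special token
        break
    out.append(tgt_vocab[word_id])
    prev = emb_tgt[word_id]

print(out)
```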
Attention
Handling long dependency
Words can be too far away!
P(h′<t> | y<t−1>, h′<t−1>, θ)
h′<t> has to account for every x<i> ∈ X and all y<i<t>
- Questions and answers can easily have 20 words
- Relying on a single context vector to account for the context of 30+ words is not optimal
Image source: https://towardsdatascience.com/sequence-to-sequence-tutorial-4fde3ee798d8
Attention
Handling long dependency
Tailor-made context vector for each step! (👈 no details today)
Image source: https://www.tensorflow.org/text/tutorials/nmt_with_attention
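A minimal sketch of that tailor-made context vector: dot-product scores between the current decoder state and every encoder state, a softmax over the scores, and a weighted sum. The vectors are random placeholders, just to show the computation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(6, 16))   # one state per input word (6 words, 16-d)
dec_state = rng.normal(size=16)         # decoder state at the current step

scores = enc_states @ dec_state         # similarity of this step to every input word
weights = softmax(scores)               # attention weights, sum to 1
context = weights @ enc_states          # context vector built just for this step

print(weights.round(2), context.shape)
```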
Transformer
What empowered the seq2seq framework further
The attention-based Transformer model greatly improved NLP performance
👉 Parallel encoding is possible (decoding is still auto-regressive)
👉 SOTA performance in a multitude of tasks
👉 Performance scales indefinitely with the size of data and number of parameters
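A minimal sketch of why encoding parallelizes: self-attention over a whole sentence is one batched matrix computation, softmax(QKᵀ/√d)·V, rather than a step-by-step recurrence. Shapes and weights below are made up.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 8                                   # 5 input positions, 8-d vectors
X = rng.normal(size=(n, d))                   # embeddings for the whole sentence at once
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
attn = softmax(Q @ K.T / np.sqrt(d))          # (n, n) weights: every position attends to every position
out = attn @ V                                # new representations for all positions in parallel
print(out.shape)
```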
State of the art
GPT-3
Language models are flexible task solvers!
Source: https://beta.openai.com/examples
State of the art
Codex (GPT-3 descendant)
Image source: GitHub Copilot
Use cases
https://ai.googleblog.com/2019/05/introducing-translatotron-end-to-end.html
Audio = array of spectrogram patches
No textual representation in-between audio in/out
Different modalities
The sequence view of NLP provides a useful view for other modalities!
Use cases
Different modalities
https://arxiv.org/pdf/2010.11929v2.pdf
Image = array of image patches
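A minimal sketch of the “image = array of patches” view from the linked ViT paper: a 224×224 RGB image reshaped into a sequence of 16×16 patches that a sequence model can consume like words.

```python
import numpy as np

img = np.zeros((224, 224, 3))                         # dummy RGB image
p = 16
patches = img.reshape(224 // p, p, 224 // p, p, 3)    # carve height and width into 16x16 tiles
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
print(patches.shape)                                   # (196, 768): 196 "tokens" of dimension 768
```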
Use cases
https://arxiv.org/pdf/2102.00719.pdf
Video = array of images
Different modalities
State of the art
DALL-E 2
Image source: https://openai.com/dall-e-2/
NLP model as encoder Generative model as decoder
State of the art
Midjourney examples
@midjourneyartwork
State of the art
Midjourney examples
@midjourney.architecture
Recap
Big ideas in NLP
A. Language is important, but hard to compute
   - Context, nuances, ∞-many possible sentences
B. Word2vec creates a means to map words to meaning vectors
   - Allows computational representation
C. RNNs can read sentences at the word level
   - Compute friendly
D. Seq2seq provides a way to generate sentences
   - More flexibility
E. Attention lets you handle long sentences
Simple and effective ideas changed the game
CEO of OpenAI
Implications?
The father of FPS
John Carmack on Lex Fridman’s podcast: https://youtu.be/xLi83prR5fg
He also said (something along these lines):
“The remaining ideas are simple enough to be written down on the back of an envelope”
“The code for AGI will be ~10,000 lines of code and will take one man to implement it”
Where to learn more?
A. Stanford CS224n: Natural Language Processing - lecture
B. Unreasonable Effectiveness of Recurrent Neural Networks - article
C. Illustrated Transformer - article
D. Speech and Language Processing - book
E. MIT 6.S191: Introduction to Deep Learning - lecture
Questions?