NLP with Deep Learning
Yasuto Tamura
Supervised Learning
Blackbox Model
• Approximation of real-world blackbox
• Supervision by errors between predictions and labels (see the sketch below)
(Figure: the model’s predictions are marked correct or incorrect against the labels, and this comparison provides the supervision.)
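To make the supervision signal concrete, here is a minimal PyTorch sketch; the toy model, data, and hyperparameters are placeholders, not from the slides. Predictions are compared with labels through a loss, and the error drives the parameter updates.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # toy stand-in for the "blackbox approximation"
loss_fn = nn.CrossEntropyLoss()              # measures the error between predictions and labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)                        # 8 toy inputs
y = torch.randint(0, 2, (8,))                # their correct labels

for _ in range(10):
    logits = model(x)                        # predictions
    loss = loss_fn(logits, y)                # prediction/label mismatch
    optimizer.zero_grad()
    loss.backward()                          # the supervision signal
    optimizer.step()                         # update the model
```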
Unsupervised Learning
• Dimension reduction
• Clustering
(Figure: both approaches rely on handcrafted rules.)
Deep Learning: Model as Neural Networks
• NLP: Vectors → Vectors (e.g., outputs such as Positive / Negative)
• Image processing: A tensor → Vectors/tensors (e.g., outputs such as Cat, Dog, Horse, Goat, …)
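A minimal sketch of these two mappings, with made-up layer sizes: a text model that maps a sequence of word vectors to a vector of class scores, and an image model that maps a 3-channel tensor to a vector of class scores.

```python
import torch
import torch.nn as nn

# NLP: a sequence of word vectors -> a vector of class scores (e.g., Positive / Negative)
text_model = nn.Sequential(nn.Flatten(), nn.Linear(10 * 32, 2))
word_vectors = torch.randn(1, 10, 32)        # 10 tokens, each a 32-dim vector
print(text_model(word_vectors).shape)        # torch.Size([1, 2])

# Image processing: a 3-channel tensor -> a vector of class scores (Cat, Dog, ...)
image_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1),
                            nn.Flatten(), nn.Linear(8, 4))
image = torch.randn(1, 3, 64, 64)            # a toy RGB image tensor
print(image_model(image).shape)              # torch.Size([1, 4])
```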
Deep Learning: Various Training Examples
• Sentiment analysis: Vectors → Vectors (Positive / Negative), supervised with the correct label
• Translation: Vectors → Vectors, supervised with the correct translation
• ChatGPT: Vectors → Vectors (candidate outputs No. 1, No. 2, No. 3), supervised by giving rankings to the outputs
Expressions of Tokens in Deep Learning
One-hot encoding (e.g., ”a”, ”am”, ”!” at indices No. 1, No. 2, …, No. N)
• Sparse
• High-dimensional (N words)
• Hardcoded

Word embedding (e.g., ”a”, ”am”, ”!” as D-dimensional vectors with dimensions No. 1, …, No. D)
• Dense
• Lower-dimensional (D-dimensional)
• Learned from data
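The contrast can be illustrated in PyTorch; the tiny vocabulary and the embedding dimension D = 4 are made up for the example.

```python
import torch
import torch.nn as nn

vocab = {"a": 0, "am": 1, "!": 2}            # N = 3 here; in practice N is huge
token_id = torch.tensor([vocab["am"]])

# One-hot: sparse, N-dimensional, hardcoded
one_hot = nn.functional.one_hot(token_id, num_classes=len(vocab)).float()
print(one_hot)                               # tensor([[0., 1., 0.]])

# Word embedding: dense, D-dimensional (D = 4 here), learned from data
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)
print(embedding(token_id))                   # a dense 4-dim vector, updated during training
```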
Expression of Texts in Deep Learning
(Figure: vocabulary tokens such as ”am”, ”person”, ”banana”, ”love”, ”a”, ”cat”, ”dog”, ”!”, and the example sentence ”I am a cat person”. Each token’s one-hot encoding is mapped to a word embedding, and a neural network combines the word embeddings of the sentence into a single text embedding.)
These embeddings are obtained through training.
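A minimal sketch of the pipeline in the figure, with mean pooling standing in for the neural network that combines word embeddings into a text embedding; real models such as BERT are of course much deeper.

```python
import torch
import torch.nn as nn

vocab = {"I": 0, "am": 1, "a": 2, "cat": 3, "person": 4}
sentence = ["I", "am", "a", "cat", "person"]
token_ids = torch.tensor([[vocab[w] for w in sentence]])    # shape (1, 5)

embedding = nn.Embedding(len(vocab), 8)       # one-hot index -> 8-dim word embedding
word_embeddings = embedding(token_ids)        # shape (1, 5, 8): one vector per token
text_embedding = word_embeddings.mean(dim=1)  # shape (1, 8): one vector for the whole text
print(text_embedding.shape)
```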
Self-Supervised Learning: BERT and GPT
• BERT: filling a blank (e.g., ”I am a ___ person” → ”I am a cat person”)
• GPT: predicting the next token (e.g., ”I” → ”I am” → ”I am a” → ”I am a cat”)
• Word/text embeddings can be trained without labels
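Both objectives can be tried out with Hugging Face pipelines; the sketch below assumes the `transformers` library and the standard `bert-base-uncased` and `gpt2` checkpoints, which are downloaded on first use.

```python
from transformers import pipeline

# BERT-style self-supervision: fill in the blanked-out token
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("I am a [MASK] person.")[0]["token_str"])

# GPT-style self-supervision: predict the next tokens
generate = pipeline("text-generation", model="gpt2")
print(generate("I am a", max_new_tokens=2)[0]["generated_text"])
```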
Transfer Learning, Intuitively
Conventional machine learning
• Processing “coffee beans” for every task
• Needs a relatively large amount of data

Transfer learning
• Using preprocessed “instant coffee” for various tasks
• Performance can be adjusted with extra “coffee beans”
Transfer Learning: Using Coffee (Embeddings)
• Image pipeline: an input image (a tensor, usually 3-channel) → pre-trained CNN (ResNet, ViT, etc.) → a converted tensor or a vector, used for:
  • Image classification
  • Object detection
  • Image captioning, etc.
• Text pipeline: an input text (a sequence of vectors) → pre-trained NLP models (ELMo, BERT, GPT, etc.) → a sequence of vectors or a vector, used for:
  • Sentiment analysis
  • Token classification
  • Summarization, etc.
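A hedged sketch of the text pipeline: a frozen pre-trained encoder (here assumed to be the sentence-transformers model `all-MiniLM-L6-v2`) turns texts into vectors, and only a small classifier is trained on top for sentiment analysis; the data is a toy example.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")    # frozen pre-trained "instant coffee"

texts = ["great movie", "loved it", "terrible plot", "boring film"]
labels = [1, 1, 0, 0]                                # 1 = positive, 0 = negative

embeddings = encoder.encode(texts)                   # each text -> a single vector
classifier = LogisticRegression().fit(embeddings, labels)   # only this small head is trained

print(classifier.predict(encoder.encode(["what a wonderful film"])))
```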
Visualizing Embeddings with Dimension Reduction
(Figure: input images, as tensors, or input texts, as vectors, go through a pre-trained model (CNNs such as ResNet or ViT; NLP models such as ELMo, BERT, or GPT) to produce embeddings with hundreds or thousands of dimensions; dimension reduction (t-SNE, UMAP) then projects them to a 2D visualization.)
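A minimal sketch of the visualization step; the embeddings are random stand-ins for real model outputs, and t-SNE from scikit-learn is used here (umap-learn's `umap.UMAP` offers a similar `fit_transform` interface).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.random.rand(200, 768)        # e.g. 200 texts with 768-dim BERT embeddings
points_2d = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)

plt.scatter(points_2d[:, 0], points_2d[:, 1], s=5)
plt.title("2D projection of high-dimensional embeddings")
plt.show()
```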
Visualizing Embeddings by ResNet 101
Visualizing Embeddings by BERT
Types of Topic Analysis
Topic analysis
• Topic classification (supervised)
  • Needs labels for our uses
  • Technically the same as sentiment classification
• Topic modeling (unsupervised)
  • BERTopic (embedding-based): clusters more fluent texts; effective for Twitter
  • Conventional topic modeling (LDA, NMF; discrete): works on unstructured texts; can detect relations within the BoW (a short LDA sketch follows below)
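For the conventional, discrete branch, here is a short scikit-learn sketch of LDA on a toy bag-of-words corpus; the corpus and topic count are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs are pets", "dogs chase cats",
        "stocks and bonds are investments", "investors buy stocks"]

bow = CountVectorizer(stop_words="english")          # discrete bag-of-words features
X = bow.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = bow.get_feature_names_out()
for topic in lda.components_:                        # print the top 3 words per topic
    print([words[i] for i in topic.argsort()[-3:]])
```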
BERTopic: Clustering Text Embedding
(Figure: input texts are embedded with pre-trained NLP models (ELMo, BERT, GPT, etc.), and the embeddings are clustered into Topic 1, Topic 2, and Topic 3.)
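A sketch of the BERTopic flow under the same assumptions as the figure; the embedding model name and toy corpus are illustrative, and BERTopic's default pipeline (UMAP + HDBSCAN, as in the editor's notes) is used.

```python
from bertopic import BERTopic

docs = ["I love my cat", "dogs are great pets", "cats and dogs",
        "the stock market fell", "investors sold their shares",
        "bonds and stocks are assets"] * 20          # repeated so clustering has enough points

topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
topics, probs = topic_model.fit_transform(docs)      # embed, reduce, and cluster the texts
print(topic_model.get_topic_info())                  # one row per discovered topic
```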

Editor's Notes

  • #5 In the figure, a vector represents a word or a label, and a sequence of vectors represents a sentence. A matrix represents a 1-channel image, and a tensor represents a multi-channel (RGB) image.
  • #9 The Transformer is a small component within the networks.
  • #13 The ResNet is trained for image classification on the ImageNet dataset.
  • #15 Classical discrete topic modeling is used in the Job Skill App, where we filtered the tokens down to only those needed. LDA: Latent Dirichlet allocation. NMF: Non-negative matrix factorization.
  • #16 SBERT, UMAP, HDBSCAN, etc.; the choice of components is up to your design. Dimensionality reduction is needed to tackle the curse of dimensionality and to obtain better clustering. Stop words are filtered out in the tokenization step for better topic representations.