I'll share some slides I prepared for a workshop on NLP with deep learning.
Maybe you could consider using some of the figures.
Or I'd appreciate some feedback.
2. Supervised Learning
Black-box Model
• Approximation of a real-world black box
• Supervision by errors between predictions and labels
[Figure: the model's predictions are compared with the labels (correct/incorrect), and the errors are fed back as supervision]
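To make the supervision loop concrete, here is a minimal sketch (PyTorch assumed; the linear model and toy data are illustrative, not from the slides): the error between predictions and labels drives the parameter updates.

```python
import torch
import torch.nn as nn

# A stand-in "black box" approximation: a linear model with 4 inputs, 2 classes.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 4)               # a batch of inputs
labels = torch.randint(0, 2, (8,))  # the correct labels

optimizer.zero_grad()
loss = loss_fn(model(x), labels)    # error between predictions and labels
loss.backward()                     # the error becomes the supervision signal
optimizer.step()                    # the model moves toward the correct answers
```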
4. Deep Learning: Models as Neural Networks
• NLP: vectors → vectors (e.g., positive/negative)
• Image processing: a tensor → vectors/tensors (e.g., cat, dog, horse, goat…)
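A shape-level sketch of the two mappings above (PyTorch assumed; layer sizes and class counts are made up for illustration):

```python
import torch
import torch.nn as nn

# NLP: vectors in, vectors out (here, 2 sentiment scores: positive/negative).
text_model = nn.Linear(300, 2)                    # 300-dim word vector → 2 scores
word_vec = torch.randn(1, 300)
print(text_model(word_vec).shape)                 # torch.Size([1, 2])

# Image processing: a tensor in, a vector out (here, 4 scores: cat/dog/horse/goat).
image_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))
image = torch.randn(1, 3, 32, 32)                 # a 3-channel image tensor
print(image_model(image).shape)                   # torch.Size([1, 4])
```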
5. Deep Learning: Various Training Examples
• Sentiment analysis: vectors → vectors, supervised by the correct label (positive/negative)
• Translation: vectors → vectors, supervised by the correct translation
• ChatGPT: vectors → vectors, supervised by giving rankings (No. 1, No. 2, No. 3) to outputs
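A quick way to see the first two input→output mappings in action is Hugging Face's pipeline API (transformers assumed installed; models download on first use). Note this shows inference with already-trained models, not the training itself:

```python
from transformers import pipeline

# Sentiment analysis: text in, a label (positive/negative) out.
sentiment = pipeline("sentiment-analysis")
print(sentiment("I am a cat person")[0])          # e.g. {'label': 'POSITIVE', 'score': ...}

# Translation: text in, translated text out.
translate = pipeline("translation_en_to_de", model="t5-small")
print(translate("I am a cat person")[0]["translation_text"])
```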
6. Representations of Tokens in Deep Learning
One-hot encoding (each token, e.g., "a", "am", …, "!", gets one of N slots: No. 1, No. 2, …, No. N)
• Sparse
• High-dimensional (N words)
• Hardcoded

Word embedding (each token maps to a dense vector with entries No. 1, …, No. D)
• Dense
• Lower-dimensional (D-dimensional)
• Learned from data
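A minimal sketch of the contrast (PyTorch assumed; the three-token vocabulary is a toy stand-in for the slide's example):

```python
import torch
import torch.nn as nn

vocab = {"a": 0, "am": 1, "!": 2}   # toy vocabulary: token → slot number
N = len(vocab)                      # vocabulary size
D = 2                               # embedding dimension, with D < N in practice

# One-hot encoding: sparse, N-dimensional, hardcoded.
one_hot = torch.eye(N)[vocab["am"]]               # tensor([0., 1., 0.])

# Word embedding: dense, D-dimensional, learned from data (trainable parameters).
embedding = nn.Embedding(num_embeddings=N, embedding_dim=D)
dense = embedding(torch.tensor(vocab["am"]))      # a 2-dim trainable vector
print(one_hot, dense)
```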
7. Representations of Texts in Deep Learning
[Figure: the tokens of "I am a cat person" are looked up in a vocabulary ("am", "person", "banana", "love", "a", "cat", "dog", "!", "I"), one-hot encoded, mapped to word embeddings, and passed through a neural network to produce a text embedding]
These embeddings are obtained through training.
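A minimal sketch of the word-embedding → text-embedding step (PyTorch assumed; mean pooling stands in for the slide's neural network, which in practice would be an RNN or Transformer):

```python
import torch
import torch.nn as nn

vocab = {"I": 0, "am": 1, "a": 2, "cat": 3, "person": 4}   # toy vocabulary
embedding = nn.Embedding(len(vocab), 4)                    # word embeddings, D = 4

ids = torch.tensor([vocab[t] for t in "I am a cat person".split()])
word_vecs = embedding(ids)            # (5, 4): one vector per token
text_vec = word_vecs.mean(dim=0)      # (4,): a single text embedding via mean pooling
print(text_vec)
```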
8. Self-Supervised Learning: BERT and GPT
• BERT: filling in a blank
• GPT: predicting the next token
Word/text embeddings can be trained without labels.
[Figure: BERT fills the blank in "I am a ___ person" to get "I am a cat person"; GPT predicts one token at a time: "I" → "I am" → "I am a" → "I am a cat"]
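Both objectives can be tried directly with pre-trained models (Hugging Face transformers assumed; the model names are common defaults, not from the slides):

```python
from transformers import pipeline

# BERT-style self-supervision: fill in a blank (masked language modeling).
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("I am a [MASK] person.")[0]["token_str"])

# GPT-style self-supervision: predict the next token(s).
generate = pipeline("text-generation", model="gpt2")
print(generate("I am a", max_new_tokens=2)[0]["generated_text"])
```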
9. Transfer Learning, Intuitively
Conventional machine learning
• Processing "coffee beans" from scratch for every task
• Needs a relatively large amount of data

Transfer learning
• Using preprocessed "instant coffee" for various tasks
• Performance can be adjusted with extra "coffee beans"
10. Transfer Learning: Using Coffee (Embeddings)
[Figure: an input image, or a tensor (usually 3-channel), goes into a pre-trained CNN (ResNet, ViT, etc.) and comes out as a converted tensor or a vector, used for image classification, object detection, image captioning, etc.]
[Figure: an input text, or a sequence of vectors, goes into a pre-trained NLP model (ELMo, BERT, GPT, etc.) and comes out as a sequence of vectors or a vector, used for sentiment analysis, token classification, summarization, etc.]
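A minimal sketch of brewing the "instant coffee" on the vision side (torchvision assumed; the input is a dummy tensor standing in for a real image):

```python
import torch
from torchvision import models

# Use a pre-trained ResNet as a frozen feature extractor.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()          # drop the classification head, keep the embedding
resnet.eval()

image = torch.randn(1, 3, 224, 224)      # a dummy 3-channel input tensor
with torch.no_grad():
    embedding = resnet(image)            # (1, 512) vector, reusable across downstream tasks
print(embedding.shape)
```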
11. Visualizing Embeddings with Dimension Reduction
[Figure: input images or texts go through a pre-trained model (CNN: ResNet, ViT, etc.; NLP: ELMo, BERT, GPT, etc.) to produce embeddings (hundreds or thousands of dimensions), which dimension reduction (t-SNE, UMAP) maps to a 2D visualization]
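A minimal sketch of the reduction step (scikit-learn's t-SNE assumed; the embeddings are random stand-ins for real model outputs):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE   # umap-learn's UMAP offers a similar fit_transform API

embeddings = np.random.rand(200, 768)            # e.g., 768-dim BERT-like vectors
points_2d = TSNE(n_components=2).fit_transform(embeddings)

plt.scatter(points_2d[:, 0], points_2d[:, 1], s=5)
plt.title("2D visualization of embeddings")
plt.show()
```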
In the figures, a vector represents a word or a label, and a sequence of vectors represents a sentence.
A matrix represents a 1-channel image, and a tensor represents a multi-channel (RGB) image.
The Transformer is a small component within these networks.
ResNet is trained for image classification on the ImageNet dataset.
Classical discrete topic modeling is used in the Job Skill App. There, we filtered the tokens down to only the necessary ones.
LDA: Latent Dirichlet allocation
NMF: Non-negative matrix factorization
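A minimal sketch of both classical methods (scikit-learn assumed; the documents are toy examples):

```python
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["python for data analysis", "coffee beans and roasting",
        "machine learning with python", "brewing instant coffee"]   # toy corpus

counts = CountVectorizer(stop_words="english").fit_transform(docs)  # token counts, stop words removed

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
nmf = NMF(n_components=2, random_state=0).fit(counts)
print(lda.components_.shape, nmf.components_.shape)                 # (topics, vocabulary)
```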
SBERT, UMAP, HDBSCAN, etc.; the choice among them is up to your design. A sketch of the pipeline follows below.
Dimensionality reduction is needed to tackle the curse of dimensionality and to obtain better clustering.
Stop words are filtered out in the tokenization step for better topic representations.
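A minimal sketch of that embedding → reduction → clustering pipeline (sentence-transformers, umap-learn, and hdbscan assumed installed; the documents and hyperparameters are illustrative):

```python
from sentence_transformers import SentenceTransformer
import umap
import hdbscan

docs = ["python developer", "java engineer", "barista", "coffee roaster"]   # toy job skills

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)   # SBERT text embeddings
reduced = umap.UMAP(n_neighbors=2, n_components=2).fit_transform(embeddings)  # reduce dimensionality
labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(reduced)   # density-based topic clusters
print(labels)
```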