NLP with Deep Learning
Yasuto Tamura
Supervised Learning
Blackbox Model
• Approximation of real-world blackbox
• Supervision by errors between predictions and labels (see the sketch below)
(Figure: the model’s predictions are marked correct or incorrect against the labels, and this comparison provides the supervision.)
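To make the supervision signal concrete, here is a minimal PyTorch sketch; the toy model, data, and hyperparameters are placeholders, not from the slides. Predictions are compared with labels through a loss, and the error drives the parameter updates.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # toy stand-in for the "blackbox approximation"
loss_fn = nn.CrossEntropyLoss()              # measures the error between predictions and labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)                        # 8 toy inputs
y = torch.randint(0, 2, (8,))                # their correct labels

for _ in range(10):
    logits = model(x)                        # predictions
    loss = loss_fn(logits, y)                # prediction/label mismatch
    optimizer.zero_grad()
    loss.backward()                          # the supervision signal
    optimizer.step()                         # update the model
```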
Unsupervised Learning
• Dimension reduction
• Clustering
(Figure: both approaches rely on handcrafted rules.)
Deep Learning: Model as Neural Networks
• NLP: Vectors → Vectors (e.g., outputs such as Positive / Negative)
• Image processing: A tensor → Vectors/tensors (e.g., outputs such as Cat, Dog, Horse, Goat, …)
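A minimal sketch of these two mappings, with made-up layer sizes: a text model that maps a sequence of word vectors to a vector of class scores, and an image model that maps a 3-channel tensor to a vector of class scores.

```python
import torch
import torch.nn as nn

# NLP: a sequence of word vectors -> a vector of class scores (e.g., Positive / Negative)
text_model = nn.Sequential(nn.Flatten(), nn.Linear(10 * 32, 2))
word_vectors = torch.randn(1, 10, 32)        # 10 tokens, each a 32-dim vector
print(text_model(word_vectors).shape)        # torch.Size([1, 2])

# Image processing: a 3-channel tensor -> a vector of class scores (Cat, Dog, ...)
image_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1),
                            nn.Flatten(), nn.Linear(8, 4))
image = torch.randn(1, 3, 64, 64)            # a toy RGB image tensor
print(image_model(image).shape)              # torch.Size([1, 4])
```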
Deep Learning: Various Training Examples
• Sentiment analysis: Vectors → Vectors (Positive / Negative), supervised with the correct label
• Translation: Vectors → Vectors, supervised with the correct translation
• ChatGPT: Vectors → Vectors (candidate outputs No. 1, No. 2, No. 3), supervised by giving rankings to the outputs
Expressions of Tokens in Deep Learning
One-hot encoding (e.g., ”a”, ”am”, ”!” at indices No. 1, No. 2, …, No. N)
• Sparse
• High-dimensional (N words)
• Hardcoded

Word embedding (e.g., ”a”, ”am”, ”!” as D-dimensional vectors with dimensions No. 1, …, No. D)
• Dense
• Lower-dimensional (D-dimensional)
• Learned from data
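The contrast can be illustrated in PyTorch; the tiny vocabulary and the embedding dimension D = 4 are made up for the example.

```python
import torch
import torch.nn as nn

vocab = {"a": 0, "am": 1, "!": 2}            # N = 3 here; in practice N is huge
token_id = torch.tensor([vocab["am"]])

# One-hot: sparse, N-dimensional, hardcoded
one_hot = nn.functional.one_hot(token_id, num_classes=len(vocab)).float()
print(one_hot)                               # tensor([[0., 1., 0.]])

# Word embedding: dense, D-dimensional (D = 4 here), learned from data
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)
print(embedding(token_id))                   # a dense 4-dim vector, updated during training
```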
Expression of Texts in Deep Learning
(Figure: vocabulary tokens such as ”am”, ”person”, ”banana”, ”love”, ”a”, ”cat”, ”dog”, ”!”, and the example sentence ”I am a cat person”. Each token’s one-hot encoding is mapped to a word embedding, and a neural network combines the word embeddings of the sentence into a single text embedding.)
These embeddings are obtained through training.
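A minimal sketch of the pipeline in the figure, with mean pooling standing in for the neural network that combines word embeddings into a text embedding; real models such as BERT are of course much deeper.

```python
import torch
import torch.nn as nn

vocab = {"I": 0, "am": 1, "a": 2, "cat": 3, "person": 4}
sentence = ["I", "am", "a", "cat", "person"]
token_ids = torch.tensor([[vocab[w] for w in sentence]])    # shape (1, 5)

embedding = nn.Embedding(len(vocab), 8)       # one-hot index -> 8-dim word embedding
word_embeddings = embedding(token_ids)        # shape (1, 5, 8): one vector per token
text_embedding = word_embeddings.mean(dim=1)  # shape (1, 8): one vector for the whole text
print(text_embedding.shape)
```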
Self-Supervised Learning: BERT and GPT
• BERT: filling a blank (e.g., ”I am a ___ person” → ”I am a cat person”)
• GPT: predicting the next token (e.g., ”I” → ”I am” → ”I am a” → ”I am a cat”)
• Word/text embeddings can be trained without labels
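Both objectives can be tried out with Hugging Face pipelines; the sketch below assumes the `transformers` library and the standard `bert-base-uncased` and `gpt2` checkpoints, which are downloaded on first use.

```python
from transformers import pipeline

# BERT-style self-supervision: fill in the blanked-out token
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("I am a [MASK] person.")[0]["token_str"])

# GPT-style self-supervision: predict the next tokens
generate = pipeline("text-generation", model="gpt2")
print(generate("I am a", max_new_tokens=2)[0]["generated_text"])
```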
Transfer Learning, Intuitively
Conventional machine learning
• Processing “coffee beans” for every task
• Needs a relatively large amount of data

Transfer learning
• Using preprocessed “instant coffee” for various tasks
• Performance can be adjusted with extra “coffee beans”
Transfer Learning: Using Coffee (Embeddings)
• Image pipeline: an input image (a tensor, usually 3-channel) → pre-trained CNN (ResNet, ViT, etc.) → a converted tensor or a vector, used for:
  • Image classification
  • Object detection
  • Image captioning, etc.
• Text pipeline: an input text (a sequence of vectors) → pre-trained NLP models (ELMo, BERT, GPT, etc.) → a sequence of vectors or a vector, used for:
  • Sentiment analysis
  • Token classification
  • Summarization, etc.
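A hedged sketch of the text pipeline: a frozen pre-trained encoder (here assumed to be the sentence-transformers model `all-MiniLM-L6-v2`) turns texts into vectors, and only a small classifier is trained on top for sentiment analysis; the data is a toy example.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")    # frozen pre-trained "instant coffee"

texts = ["great movie", "loved it", "terrible plot", "boring film"]
labels = [1, 1, 0, 0]                                # 1 = positive, 0 = negative

embeddings = encoder.encode(texts)                   # each text -> a single vector
classifier = LogisticRegression().fit(embeddings, labels)   # only this small head is trained

print(classifier.predict(encoder.encode(["what a wonderful film"])))
```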
Visualizing Embeddings with Dimension Reduction
(Figure: input images, as tensors, or input texts, as vectors, go through a pre-trained model (CNNs such as ResNet or ViT; NLP models such as ELMo, BERT, or GPT) to produce embeddings with hundreds or thousands of dimensions; dimension reduction (t-SNE, UMAP) then projects them to a 2D visualization.)
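A minimal sketch of the visualization step; the embeddings are random stand-ins for real model outputs, and t-SNE from scikit-learn is used here (umap-learn's `umap.UMAP` offers a similar `fit_transform` interface).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.random.rand(200, 768)        # e.g. 200 texts with 768-dim BERT embeddings
points_2d = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)

plt.scatter(points_2d[:, 0], points_2d[:, 1], s=5)
plt.title("2D projection of high-dimensional embeddings")
plt.show()
```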
Visualizing Embeddings by ResNet 101
Visualizing Embeddings by BERT
Types of Topic Analysis
Topic analysis
• Topic classification (supervised)
  • Needs labels for our uses
  • Technically the same as sentiment classification
• Topic modeling (unsupervised)
  • BERTopic (embedding-based): clusters more fluent texts; effective for Twitter
  • Conventional topic modeling (LDA, NMF; discrete): works on unstructured texts; can detect relations within the BoW (a short LDA sketch follows below)
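For the conventional, discrete branch, here is a short scikit-learn sketch of LDA on a toy bag-of-words corpus; the corpus and topic count are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs are pets", "dogs chase cats",
        "stocks and bonds are investments", "investors buy stocks"]

bow = CountVectorizer(stop_words="english")          # discrete bag-of-words features
X = bow.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = bow.get_feature_names_out()
for topic in lda.components_:                        # print the top 3 words per topic
    print([words[i] for i in topic.argsort()[-3:]])
```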
BERTopic: Clustering Text Embedding
(Figure: input texts are embedded with pre-trained NLP models (ELMo, BERT, GPT, etc.), and the embeddings are clustered into Topic 1, Topic 2, and Topic 3.)
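A sketch of the BERTopic flow under the same assumptions as the figure; the embedding model name and toy corpus are illustrative, and BERTopic's default pipeline (UMAP + HDBSCAN, as in the editor's notes) is used.

```python
from bertopic import BERTopic

docs = ["I love my cat", "dogs are great pets", "cats and dogs",
        "the stock market fell", "investors sold their shares",
        "bonds and stocks are assets"] * 20          # repeated so clustering has enough points

topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
topics, probs = topic_model.fit_transform(docs)      # embed, reduce, and cluster the texts
print(topic_model.get_topic_info())                  # one row per discovered topic
```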

Editor's Notes

  • #5 In the figure, a vector represents a word or a label, and a sequence of vectors represents a sentence. A matrix represents a 1-channel image, and a tensor represents a multi-channel (RGB) image.
  • #9 The Transformer is a small component within the networks.
  • #13 The ResNet is trained for image classification on the ImageNet dataset.
  • #15 Classical discrete topic modeling is used in the Job Skill App, where we filtered the tokens down to only those needed. LDA: Latent Dirichlet allocation. NMF: Non-negative matrix factorization.
  • #16 SBERT, UMAP, HDBSCAN, etc.; the choice of components is up to your design. Dimensionality reduction is needed to tackle the curse of dimensionality and to obtain better clustering. Stop words are filtered out in the tokenization step for better topic representations.