This presentation delves into Natural Language Processing (NLP) and its goal of making human language understandable to machines. The complexities of language, such as ambiguity and intricate structure, are highlighted as major challenges. The talk traces the evolution of NLP through deep learning methods into a new era defined by large-scale language models. Obstacles such as low-resource languages and ethical issues, including bias and hallucination, are acknowledged as enduring challenges in the field. Overall, the presentation provides a condensed yet comprehensive view of NLP's accomplishments and remaining hurdles.
3. What is NLP? Natural Language Processing
• Processing of natural languages used by humans
• e.g., English, Japanese, Khmer, Chinese …
• Making natural language understandable to machines
[Figure: natural languages (English, Japanese, Khmer, …) vs. artificial languages (C/C++, Java, Python, R, Lojban); Natural Language Processing targets the former.]
4. Some Tasks in NLP
[Figure: example NLP tasks.
• Machine translation: "Recent advances in deep learning-based optimization and computational hardware have greatly facilitated progress in natural language processing" ↔ its Japanese translation.
• Summarization: the same sentence condensed to "drivers of NLP progress = deep-learning optimization and computational hardware".
• Proofreading/error correction: "Recent advances in dept learning-based optimization and computational hardware has greatly facilitated progress in naturail language processing" → the corrected sentence above.
• Other tasks shown: information retrieval, document classification/clustering, dialogue systems.]
6. What are Languages for?
• Tool for Communication
• Tool for Thought
• Tool for Record
7. Some Features of Languages
• Arbitrariness: the connection between a word and its meaning is often arbitrary
• Sociolinguistics: many aspects of language usage are based on social customs and can't always be logically explained
• Evolving: language changes over time and varies across regions and cultures
• Networked: language reflects complex relationships between entities and concepts
• Ambiguous: a single expression can have multiple meanings based on context
8. Some Features of Languages
• Computers must have vast knowledge of languages
• Computers must be flexible in interpreting meaning in text
9. Some Difficulties in NLP
• Ambiguity in Semantics
• sleep = 寝る (different expressions, one meaning)
• Natural Language Processing = NLP
• "… machine learning …" vs. "… car machine …" (one word, different senses)
• Dealing with hierarchical sequential data
• character → word → phrase → sentence → paragraph → text
• What does language comprehension entail for a machine?
• To a machine, input text is just a sequence of SYMBOLs
11. Before vs. After Deep Learning
[Figure: Traditional NLP: input text → POS tagging → syntactic parsing → predicate-argument recognition → application system → output, with training data feeding each stage. Deep learning-based NLP: input text → neural network trained end-to-end on training data → output.]
12. NLP’s Methods in a Century
• 1940s~1960s
• First Computer: ENIAC (1946)
• Translating Machine Project (1952)
• Limit of Computing Performance
• 1960s~1990s
• Digital text data; Brown Corpus (1967)
• MEDLINE Database Service (1971)
• Manual-Rule-based Text Analysis
• 1990s~2010s (※1990: WWW, 1998: Google)
• Large-scale Corpora + Machine Learning
• Translation by Analogy (Kyoto Univ., 1981)
• Statistical Machine Translation (1980s)
• CALO (origin of Siri), Watson
• 2010s
• Mainly on Neural Network
• Word2Vec (Google, 2013)
• Neural Machine Translation (2014)
• Transformer (Google, 2017)
• Pre-trained Language Models: BERT (2018), GPT-2 (2019), BART (2019)
• 2020s
• Large-scale Pre-trained Language Models (LLMs)
• GPT-3 (2020), T5 (2020), ChatGPT (2022), GPT-4 (2023)
13. Neural Network + NLP
Basics of neural networks and how they work in NLP
14. Basic structure of Neural Network
• Linear and Non-linear Transformation
• Matrix Multiplication + Non-linear Activation Function
• y = f(∑ᵢ wᵢxᵢ)
[Figure: a single unit; inputs x₁, x₂, …, xₙ with weights 0.3, −0.5, 0.1, …, 0.4 are summed and passed through the sigmoid activation 1/(1+e^(-x)), giving output 0.76.]
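A single unit is easy to mirror in code. Below is a minimal Python sketch of y = f(∑ᵢ wᵢxᵢ) with a sigmoid activation; the input values are illustrative, not the slide's:

```python
import math

def sigmoid(x):
    # Non-linear activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    # One unit: weighted sum followed by the activation, y = f(sum_i w_i * x_i).
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

# Illustrative inputs combined with three of the slide's weights:
print(neuron([1.0, 1.0, 1.0], [0.3, -0.5, 0.1]))  # sigmoid(-0.1) ≈ 0.475
```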
15. Basic structure of Neural Network
• Linear and Non-linear Transformation
• Matrix Multiplication + Non-linear Activation Function
• y = f(∑ᵢ wᵢxᵢ)
• Input: adjective and noun in a movie review
• Output: 0 (negative) / 1 (positive)
• Such a wonderful movie. → 1 (positive)
[Figure: inputs movie, boring, wonderful, time with weights 0.3, −2.5, 3.1, 0.1; with "wonderful" and "movie" present, the weighted sum is 3.4 and the sigmoid 1/(1+e^(-x)) outputs 0.97 (positive).]
16. Basic structure of Neural Network
• Linear and Non-linear Transformation
• Matrix Multiplication + Non-linear Activation Function
• y = f(∑ᵢ wᵢxᵢ)
• Input: adjective and noun in a movie review
• Output: 0 (negative) / 1 (positive)
• Boring time of the year. → 0 (negative)
[Figure: same weights (movie 0.3, boring −2.5, wonderful 3.1, time 0.1); with "boring" and "time" present, the weighted sum is −2.4 and the sigmoid outputs 0.08 (negative).]
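Both outputs follow directly from the slide's weights. A minimal sketch that reproduces them, assuming each input is 1 if the word occurs and 0 otherwise:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Per-word weights from the slides (one weight per vocabulary word).
weights = {"movie": 0.3, "boring": -2.5, "wonderful": 3.1, "time": 0.1}

def score(words):
    # Sum the weights of the vocabulary words that occur, then squash to (0, 1).
    return sigmoid(sum(weights[w] for w in words if w in weights))

print(score("such a wonderful movie".split()))   # ≈ 0.97 → positive
print(score("boring time of the year".split()))  # ≈ 0.08 → negative
```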
17. Basic structure of Neural Network
How to represent these words?
18. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
[Figure: over a vocabulary (movie, great, like, …, love, I) of 10K–100K words: one-hot(like) = (0, 0, 1, …, 0, 0), one-hot(movie) = (1, 0, 0, …, 0, 0); bag-of-words("I like movie") = (1, 0, 1, …, 0, 1).]
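A minimal sketch of both encodings over a toy vocabulary (the vocabulary and its ordering are assumptions for illustration):

```python
import numpy as np

vocab = ["movie", "great", "like", "love", "I"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # A vector with a single 1 at the word's vocabulary index.
    v = np.zeros(len(vocab))
    v[index[word]] = 1
    return v

def bag_of_words(sentence):
    # Each element counts how often that vocabulary word occurs.
    v = np.zeros(len(vocab))
    for w in sentence.split():
        if w in index:
            v[index[w]] += 1
    return v

print(one_hot("like"))                    # [0. 0. 1. 0. 0.]
print(bag_of_words("I like movie"))       # [1. 0. 1. 0. 1.]
print(one_hot("like") @ one_hot("love"))  # 0.0: synonyms look unrelated
```

The zero dot product on the last line is exactly the synonymy problem raised on the next slide.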
19. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
• Problems:
• Cannot deal with Synonymy
• Cannot deal with differences in word orders
• Distributional hypothesis
words in the same contexts tend to be semantically similar
[Figure: one-hot(like) = (1, 0, 0, …, 0, 0) and one-hot(love) = (0, 0, 0, …, 1, 0) share no non-zero elements, so the two synonyms look completely unrelated.]
20. Embedding Representation
• Represent a word by a real number vector
• Share features of each word in embedding space (vector space)
• Represent similar words with similar vectors
• (uses far fewer dimensions than Bag-of-words)
[Figure: dense vectors of 100–1K dimensions, with values defined by the neural network: book = (0.12, −1.90, …, 0.55, 1.37) and dictionary = (0.17, −1.80, …, 0.52, 1.57) are similar; phone = (0.97, 0.10, …, 1.63, −0.11) is not.]
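Similarity between embeddings is typically measured with cosine similarity. A minimal sketch with made-up low-dimensional vectors (the slide's vectors are partly elided, so these values are illustrative only):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: near 1 for similar directions, near 0 for unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative 4-dimensional embeddings (real ones use 100-1K dimensions).
book       = np.array([0.12, -1.90, 0.55, 1.37])
dictionary = np.array([0.17, -1.80, 0.52, 1.57])
phone      = np.array([0.97,  0.10, 1.63, -0.11])

print(cosine(book, dictionary))  # ≈ 0.99: similar
print(cosine(book, phone))       # ≈ 0.15: not similar
```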
21. word2vec
• Learn to represent a word by an embedding vector
• CBoW, skip-gram
• Example:
• Start with randomly initialized embedding vectors
• Learn to predict a target word given its contexts
• Maximize target word’s probability
…… he ate an apple yesterday with ……
(w_{t−2} w_{t−1} w_t w_{t+1} w_{t+2}: target word w_t and its surrounding context window)
Maximize: ∑_{t=1..T} ∑_{−c ≤ j ≤ c, j ≠ 0} log P(w_t | w_{t+j})
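A minimal training sketch using the gensim library (gensim is an assumption; the slide names no toolkit) with a toy corpus far too small for meaningful vectors:

```python
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

# Toy corpus; real word2vec training uses millions of sentences.
corpus = [
    ["he", "ate", "an", "apple", "yesterday"],
    ["she", "ate", "an", "orange", "yesterday"],
]

# CBoW (sg=0): maximize the probability of each target word given the
# words in a +/-2 window around it, as in the objective above.
model = Word2Vec(sentences=corpus, vector_size=50, window=2,
                 min_count=1, sg=0, epochs=100)

print(model.wv["apple"][:5])           # learned embedding (first 5 dims)
print(model.wv.most_similar("apple"))  # nearest neighbors in vector space
```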
22. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
• Problems:
• Cannot deal with Synonymy
→ Embedding representation
• Cannot deal with differences in word orders
• Distributional hypothesis
words in the same contexts tend to be semantically similar
[Figure: embedding vectors like = (1.67, 0.45, −1.80, 0.12) and love = (1.57, 0.65, −1.60, 0.15) are close in embedding space.]
23. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
• Problems:
• Cannot deal with Synonymy
→ Embedding representation
• Cannot deal with differences in word orders
→ revise network architecture
• Distributional hypothesis
words in the same contexts tend to be semantically similar
24. Dealing with Sequential Data
• Recurrent Neural Network
• Related architectures
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit (GRU)
• Transformer
• Attention mechanism
• Basic Neural Network architecture (Feed-Forward Network)
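To make this concrete, a minimal PyTorch sketch (PyTorch and all sizes are illustrative assumptions) of an embedding layer feeding an LSTM:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 128, 256

embed = nn.Embedding(vocab_size, embed_dim)          # word ID -> vector
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# One sentence already mapped to (made-up) word IDs: shape (batch=1, seq=4).
word_ids = torch.tensor([[4, 17, 305, 9]])
vectors = embed(word_ids)                            # (1, 4, 128)
outputs, (h_n, c_n) = lstm(vectors)                  # outputs: (1, 4, 256)

print(outputs.shape)  # one hidden state per word, in order
print(h_n.shape)      # final hidden state: a fixed-size sentence summary
```

The per-word outputs suit tagging-style tasks, while the final hidden state can serve as a fixed-size representation of the whole sequence.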
28. Solving NLP Tasks with LLMs
• Pre-train & Fine-tune Paradigm
• Pre-train a language model with large corpora
• Fine-tune pre-trained model on specific task with (small) data sets
• Prompt-based Method
• Pre-train a language model with large corpora
• Ask the model to solve various tasks with prompts written in natural language
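A sketch of the prompt-based interaction pattern (gpt2 via the Hugging Face transformers library is an illustrative stand-in; real prompt-based use relies on much larger, instruction-tuned models):

```python
from transformers import pipeline  # assumes Hugging Face transformers

generator = pipeline("text-generation", model="gpt2")

# The task is described in natural language; the model simply continues.
prompt = "Translate English to Japanese:\nEnglish: I have a pen.\nJapanese:"
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```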
29. Pre-train & Fine-tune Paradigm
• Train a Language Model with (very very) large data sets
• Fine-tune pre-trained model on (specific) target tasks
• Document Classification, Machine Translation, Summarization…
[Figure: pre-training on large raw corpora (e.g., the Japanese and English Wikipedia articles defining machine learning), then fine-tuning on a target task such as translating "Recent advances in deep learning-based optimization and computational hardware have greatly facilitated progress in natural language processing" into Japanese. Pre-training objectives: GPT-n predicts the continuation ("I have a" → "pen"); BERT fills in masked tokens ("I [MASK] a pen" → "have"); BART reconstructs corrupted input ("I [MASK] pen a" → "I have a pen").]
30. Pre-trained Model
• Why pre-train a language model?
• Large-scale training data is available!!!
• Neural models are data-hungry!
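The BERT-style masked prediction shown earlier can be tried directly. A minimal sketch assuming the Hugging Face transformers library (not named on the slide; downloads the model on first run):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for candidate in unmasker("I [MASK] a pen."):
    print(candidate["token_str"], round(candidate["score"], 3))
# Words like "have" should rank highly, matching the slide's example.
```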
33. (Probabilistic) Language Model
• Models that assign probabilities to sequences of words
• To evaluate text (sequences) generated by a system
• P(I, like, watching, movie) > P(I, eat, watch, movie)
• To predict the next word
• Many NLP tasks can be formulated as language modeling
[Figure: predicting the next word after "The capital of Japan is": Beijing 0.20, Seattle 0.05, Tokyo 0.75.]
P(Tokyo | The, capital, of, Japan, is) = 0.75
P(The, capital, of, Japan, is, Tokyo | ja, 日本, の, 首都, は, …) (e.g., translation framed as language modeling)
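A minimal sketch of querying a pre-trained causal language model for this next-word distribution (GPT-2 and the Hugging Face transformers library are illustrative assumptions; the slide specifies no model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of Japan is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)

# Print the five most probable next tokens and their probabilities.
for token_id in probs.topk(5).indices:
    print(repr(tokenizer.decode([int(token_id)])), float(probs[token_id]))
```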
34. Training with Human Feedback
L. Ouyang, J. Wu, X. Jiang, D. Almeida, et al. 2022. Training Language Models to Follow Instructions with Human Feedback. arXiv:2203.02155.
35. Improve LLMs with Discussions
• Solving NLP Problems through Human-System
Collaboration: A Discussion-based Approach
• Kaneko et al. 2023
37. Threats to LLMs
• A turning point in a wide range of fields
• including search engines, finance, advertising, education, and law
• There has been an explosive increase in services incorporating LLMs
• Occupations such as translator, investigator, and writer are shrinking (Eloundou+ 2023)
• Hallucinations
• Even when unsure, they can calmly state falsehoods, responding without any basis in fact
• Bias
• They learn and amplify societal biases related to gender, race, etc.
• They can be adjusted to respond in ways that benefit specific individuals
or groups.
• Personal information exposure
• Misuse
- T. Eloundou, S. Manning, P. Mishkin, D. Rock. 2023. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. arXiv:2303.10130.
- Naoaki Okazaki. 2023. The Wonders and Threats of Large Language Models. https://speakerdeck.com/chokkan/20230327_riken_llm
41. Summary
• NLP aims to make human language understandable to machines.
• NLP faces many difficulties rooted in the characteristics of language, such as ambiguity and complex network structures.
• Deep learning-based methods have driven NLP to impressive achievements over recent decades.
• A new era of NLP has arrived with large-scale language models.
• Many problems remain for NLP:
• Low-resource languages
• Bias, Hallucination, etc.