This presentation delves into Natural Language Processing (NLP) and its goal of making human language understandable to machines. The complexities of language, such as ambiguity and intricate structure, are highlighted as major challenges. The talk traces the evolution of NLP through deep learning methods into a new era defined by large-scale language models. Obstacles such as low-resource languages and ethical issues, including bias and hallucination, are acknowledged as enduring challenges in the field. Overall, the presentation provides a condensed yet comprehensive view of NLP's accomplishments and remaining hurdles.
3. What is NLP? Natural Language Processing
• Processing of natural languages used by humans
• e.g., English, Japanese, Khmer, Chinese …
• Making natural language understandable to machines
[Figure: natural languages (English, Japanese, Khmer, …) vs. artificial languages (C/C++, Java, Python, R, Lojban); Natural Language Processing targets the former.]
4. Some Tasks in NLP
[Figure: example NLP tasks.
• Machine translation: "Recent advances in deep learning-based optimization and computational hardware have greatly facilitated progress in natural language processing" ↔ its Japanese translation.
• Summarization: the same sentence condensed to "drivers of NLP progress = deep-learning optimization and computational hardware".
• Proofreading/error correction: "Recent advances in dept learning-based optimization and computational hardware has greatly facilitated progress in naturail language processing" → the corrected sentence above.
• Other tasks shown: information retrieval, document classification/clustering, dialogue systems.]
6. What are Languages for?
• Tool for Communication
• Tool for Thought
• Tool for Record
7. Some Features of Languages
• Arbitrariness: the connection between a word and its meaning is often arbitrary
• Sociolinguistics: many aspects of language usage are based on social customs and can't always be logically explained
• Evolving: language changes over time and varies across regions and cultures
• Networked: language reflects complex relationships between entities and concepts
• Ambiguous: a single expression can have multiple meanings based on context
8. Some Features of Languages
• Computers must have vast knowledge of languages
• Computers must be flexible in interpreting meaning in text
9. Some Difficulties in NLP
• Ambiguity in Semantics
• sleep = 寝る (different expressions, one meaning)
• Natural Language Processing = NLP
• "… machine learning …" vs. "… car machine …" (one word, different senses)
• Dealing with hierarchical sequential data
• character → word → phrase → sentence → paragraph → text
• What does language comprehension entail for a machine?
• To a machine, input text is just a sequence of SYMBOLs
11. Before vs. After Deep Learning
[Figure: Traditional NLP: input text → POS tagging → syntactic parsing → predicate-argument recognition → application system → output, with training data feeding each stage. Deep learning-based NLP: input text → neural network trained end-to-end on training data → output.]
12. NLP’s Methods in a Century
• 1940s~1960s
• First Computer: ENIAC (1946)
• Translating Machine Project (1952)
• Limit of Computing Performance
• 1960s~1990s
• Digital text data; Brown Corpus (1967)
• MEDLINE Database Service (1971)
• Manual-Rule-based Text Analysis
• 1990s~2010s (※1990: WWW, 1998: Google)
• Large-scale Corpora + Machine Learning
• Translation by Analogy (Kyoto Univ., 1981)
• Statistical Machine Translation (1980s)
• CALO (origin of Siri), Watson
• 2010s
• Mainly on Neural Network
• Word2Vec (Google, 2013)
• Neural Machine Translation (2014)
• Transformer (Google, 2017)
• Pre-trained Language Models: BERT (2018), GPT-2 (2019), BART (2019)
• 2020s
• Large-scale Pre-trained Language Models (LLMs)
• GPT-3 (2020), T5 (2020), ChatGPT (2022), GPT-4 (2023)
13. Neural Network + NLP
Basics of neural networks and how they work in NLP
14. Basic structure of Neural Network
• Linear and Non-linear Transformation
• Matrix Multiplication + Non-linear Activation Function
• y = f(∑ᵢ wᵢxᵢ)
[Figure: a single unit; inputs x₁, x₂, …, xₙ with weights 0.3, −0.5, 0.1, …, 0.4 are summed and passed through the sigmoid activation 1/(1+e^(-x)), giving output 0.76.]
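A single unit is easy to mirror in code. Below is a minimal Python sketch of y = f(∑ᵢ wᵢxᵢ) with a sigmoid activation; the input values are illustrative, not the slide's:

```python
import math

def sigmoid(x):
    # Non-linear activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights):
    # One unit: weighted sum followed by the activation, y = f(sum_i w_i * x_i).
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

# Illustrative inputs combined with three of the slide's weights:
print(neuron([1.0, 1.0, 1.0], [0.3, -0.5, 0.1]))  # sigmoid(-0.1) ≈ 0.475
```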
15. Basic structure of Neural Network
• Linear and Non-linear Transformation
• Matrix Multiplication + Non-linear Activation Function
• y = f(∑ᵢ wᵢxᵢ)
• Input: adjective and noun in a movie review
• Output: 0 (negative) / 1 (positive)
• Such a wonderful movie. → 1 (positive)
[Figure: inputs movie, boring, wonderful, time with weights 0.3, −2.5, 3.1, 0.1; with "wonderful" and "movie" present, the weighted sum is 3.4 and the sigmoid 1/(1+e^(-x)) outputs 0.97 (positive).]
16. Basic structure of Neural Network
• Linear and Non-linear Transformation
• Matrix Multiplication + Non-linear Activation Function
• y = f(∑ᵢ wᵢxᵢ)
• Input: adjective and noun in a movie review
• Output: 0 (negative) / 1 (positive)
• Boring time of the year. → 0 (negative)
[Figure: same weights (movie 0.3, boring −2.5, wonderful 3.1, time 0.1); with "boring" and "time" present, the weighted sum is −2.4 and the sigmoid outputs 0.08 (negative).]
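Both outputs follow directly from the slide's weights. A minimal sketch that reproduces them, assuming each input is 1 if the word occurs and 0 otherwise:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Per-word weights from the slides (one weight per vocabulary word).
weights = {"movie": 0.3, "boring": -2.5, "wonderful": 3.1, "time": 0.1}

def score(words):
    # Sum the weights of the vocabulary words that occur, then squash to (0, 1).
    return sigmoid(sum(weights[w] for w in words if w in weights))

print(score("such a wonderful movie".split()))   # ≈ 0.97 → positive
print(score("boring time of the year".split()))  # ≈ 0.08 → negative
```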
17. Basic structure of Neural Network
How to represent these words?
18. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
[Figure: over a vocabulary (movie, great, like, …, love, I) of 10K–100K words: one-hot(like) = (0, 0, 1, …, 0, 0), one-hot(movie) = (1, 0, 0, …, 0, 0); bag-of-words("I like movie") = (1, 0, 1, …, 0, 1).]
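A minimal sketch of both encodings over a toy vocabulary (the vocabulary and its ordering are assumptions for illustration):

```python
import numpy as np

vocab = ["movie", "great", "like", "love", "I"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # A vector with a single 1 at the word's vocabulary index.
    v = np.zeros(len(vocab))
    v[index[word]] = 1
    return v

def bag_of_words(sentence):
    # Each element counts how often that vocabulary word occurs.
    v = np.zeros(len(vocab))
    for w in sentence.split():
        if w in index:
            v[index[w]] += 1
    return v

print(one_hot("like"))                    # [0. 0. 1. 0. 0.]
print(bag_of_words("I like movie"))       # [1. 0. 1. 0. 1.]
print(one_hot("like") @ one_hot("love"))  # 0.0: synonyms look unrelated
```

The zero dot product on the last line is exactly the synonymy problem raised on the next slide.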
19. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
• Problems:
• Cannot deal with Synonymy
• Cannot deal with differences in word orders
• Distributional hypothesis
words in the same contexts tend to be semantically similar
[Figure: one-hot(like) = (1, 0, 0, …, 0, 0) and one-hot(love) = (0, 0, 0, …, 1, 0) share no non-zero elements, so the two synonyms look completely unrelated.]
20. Embedding Representation
• Represent a word by a real number vector
• Share features of each word in embedding space (vector space)
• Represent similar words with similar vectors
• (uses far fewer dimensions than Bag-of-words)
[Figure: dense vectors of 100–1K dimensions, with values defined by the neural network: book = (0.12, −1.90, …, 0.55, 1.37) and dictionary = (0.17, −1.80, …, 0.52, 1.57) are similar; phone = (0.97, 0.10, …, 1.63, −0.11) is not.]
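Similarity between embeddings is typically measured with cosine similarity. A minimal sketch with made-up low-dimensional vectors (the slide's vectors are partly elided, so these values are illustrative only):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: near 1 for similar directions, near 0 for unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative 4-dimensional embeddings (real ones use 100-1K dimensions).
book       = np.array([0.12, -1.90, 0.55, 1.37])
dictionary = np.array([0.17, -1.80, 0.52, 1.57])
phone      = np.array([0.97,  0.10, 1.63, -0.11])

print(cosine(book, dictionary))  # ≈ 0.99: similar
print(cosine(book, phone))       # ≈ 0.15: not similar
```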
21. word2vec
• Learn to represent a word by an embedding vector
• CBoW, skip-gram
• Example:
• Start with randomly initialized embedding vectors
• Learn to predict a target word given its contexts
• Maximize target word’s probability
…… he ate an apple yesterday with ……
(w_{t−2} w_{t−1} w_t w_{t+1} w_{t+2}: target word w_t and its surrounding context window)
Maximize: ∑_{t=1..T} ∑_{−c ≤ j ≤ c, j ≠ 0} log P(w_t | w_{t+j})
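A minimal training sketch using the gensim library (gensim is an assumption; the slide names no toolkit) with a toy corpus far too small for meaningful vectors:

```python
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

# Toy corpus; real word2vec training uses millions of sentences.
corpus = [
    ["he", "ate", "an", "apple", "yesterday"],
    ["she", "ate", "an", "orange", "yesterday"],
]

# CBoW (sg=0): maximize the probability of each target word given the
# words in a +/-2 window around it, as in the objective above.
model = Word2Vec(sentences=corpus, vector_size=50, window=2,
                 min_count=1, sg=0, epochs=100)

print(model.wv["apple"][:5])           # learned embedding (first 5 dims)
print(model.wv.most_similar("apple"))  # nearest neighbors in vector space
```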
22. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
• Problems:
• Cannot deal with Synonymy
→ Embedding representation
• Cannot deal with differences in word orders
• Distributional hypothesis
words in the same contexts tend to be semantically similar
[Figure: embedding vectors like = (1.67, 0.45, −1.80, 0.12) and love = (1.57, 0.65, −1.60, 0.15) are close in embedding space.]
23. Neural Network for NLP
• How to represent input word/sentence/text as a vector ?
• Simple solution: one-hot vector for word level
• Bag-of-words: each element represents occurrence frequency
• Problems:
• Cannot deal with Synonymy
→ Embedding representation
• Cannot deal with differences in word orders
→ revise network architecture
• Distributional hypothesis
words in the same contexts tend to be semantically similar
24. Dealing with Sequential Data
• Recurrent Neural Network
• Related architectures
• Long Short-Term Memory (LSTM)
• Gated Recurrent Unit (GRU)
• Transformer
• Attention mechanism
• Basic Neural Network architecture (Feed-Forward Network)
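To make this concrete, a minimal PyTorch sketch (PyTorch and all sizes are illustrative assumptions) of an embedding layer feeding an LSTM:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 128, 256

embed = nn.Embedding(vocab_size, embed_dim)          # word ID -> vector
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# One sentence already mapped to (made-up) word IDs: shape (batch=1, seq=4).
word_ids = torch.tensor([[4, 17, 305, 9]])
vectors = embed(word_ids)                            # (1, 4, 128)
outputs, (h_n, c_n) = lstm(vectors)                  # outputs: (1, 4, 256)

print(outputs.shape)  # one hidden state per word, in order
print(h_n.shape)      # final hidden state: a fixed-size sentence summary
```

The per-word outputs suit tagging-style tasks, while the final hidden state can serve as a fixed-size representation of the whole sequence.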
28. Solving NLP Tasks with LLMs
• Pre-train & Fine-tune Paradigm
• Pre-train a language model with large corpora
• Fine-tune pre-trained model on specific task with (small) data sets
• Prompt-based Method
• Pre-train a language model with large corpora
• Ask the model to solve various tasks with prompts written in natural language
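A sketch of the prompt-based interaction pattern (gpt2 via the Hugging Face transformers library is an illustrative stand-in; real prompt-based use relies on much larger, instruction-tuned models):

```python
from transformers import pipeline  # assumes Hugging Face transformers

generator = pipeline("text-generation", model="gpt2")

# The task is described in natural language; the model simply continues.
prompt = "Translate English to Japanese:\nEnglish: I have a pen.\nJapanese:"
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```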
29. Pre-train & Fine-tune Paradigm
• Train a Language Model with (very very) large data sets
• Fine-tune pre-trained model on (specific) target tasks
• Document Classification, Machine Translation, Summarization…
[Figure: pre-training on large raw corpora (e.g., the Japanese and English Wikipedia articles defining machine learning), then fine-tuning on a target task such as translating "Recent advances in deep learning-based optimization and computational hardware have greatly facilitated progress in natural language processing" into Japanese. Pre-training objectives: GPT-n predicts the continuation ("I have a" → "pen"); BERT fills in masked tokens ("I [MASK] a pen" → "have"); BART reconstructs corrupted input ("I [MASK] pen a" → "I have a pen").]
30. Pre-trained Model
• Why pre-train a language model?
• Large-scale training data is available!!!
• Neural models are data-hungry!
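The BERT-style masked prediction shown earlier can be tried directly. A minimal sketch assuming the Hugging Face transformers library (not named on the slide; downloads the model on first run):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for candidate in unmasker("I [MASK] a pen."):
    print(candidate["token_str"], round(candidate["score"], 3))
# Words like "have" should rank highly, matching the slide's example.
```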
33. (Probabilistic) Language Model
• Models that assign probabilities to sequences of words
• To evaluate text (sequences) generated by a system
• P(I, like, watching, movie) > P(I, eat, watch, movie)
• To predict the next word
• Many NLP tasks can be formulated as language modeling
[Figure: predicting the next word after "The capital of Japan is": Beijing 0.20, Seattle 0.05, Tokyo 0.75.]
P(Tokyo | The, capital, of, Japan, is) = 0.75
P(The, capital, of, Japan, is, Tokyo | ja, 日本, の, 首都, は, …) (e.g., translation framed as language modeling)
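A minimal sketch of querying a pre-trained causal language model for this next-word distribution (GPT-2 and the Hugging Face transformers library are illustrative assumptions; the slide specifies no model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of Japan is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)

# Print the five most probable next tokens and their probabilities.
for token_id in probs.topk(5).indices:
    print(repr(tokenizer.decode([int(token_id)])), float(probs[token_id]))
```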
34. Training with Human Feedback
L. Ouyang, J. Wu, X. Jiang, D. Almeida, et al. 2022. Training Language Models to Follow Instructions with Human Feedback. arXiv:2203.02155.
35. Improve LLMs with Discussions
• Solving NLP Problems through Human-System
Collaboration: A Discussion-based Approach
• Kaneko et al. 2023
37. Threats to LLMs
• A turning point in a wide range of fields
• including search engines, finance, advertising, education, and law
• There has been an explosive increase in services incorporating LLMs
• Occupations such as translator, investigator, and writer are shrinking (Eloundou+ 2023)
• Hallucinations
• Even when unsure, they can calmly state falsehoods, responding without any basis in fact
• Bias
• They learn and amplify societal biases related to gender, race, etc.
• They can be adjusted to respond in ways that benefit specific individuals
or groups.
• Personal information exposure
• Misuse
- T. Eloundou, S. Manning, P. Mishkin, D. Rock. 2023. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. arXiv:2303.10130.
- Naoaki Okazaki. 2023. The Wonders and Threats of Large Language Models. https://speakerdeck.com/chokkan/20230327_riken_llm
41. Summary
• NLP aims to make human language understandable to machines.
• NLP faces many difficulties rooted in the characteristics of language, such as ambiguity and complex network structures.
• Deep learning-based methods have driven NLP to impressive achievements over recent decades.
• A new era of NLP has arrived with large-scale language models.
• Many problems remain for NLP:
• Low-resource languages
• Bias, Hallucination, etc.