[introductory]
A Gentle Introduction to Technologies Behind
Language Models
and Recent Achievement in ChatGPT
2023.05.25 (Thu)
PAKDD 2023 Tutorial 2
Jun Suzuki
Tohoku University
Kyosuke Nishida
NTT
Human Informatics Laboratories
Naoaki Okazaki
Tokyo Institute of Technology
● [Part 1, Part 2] https://www.fai.cds.tohoku.ac.jp/research/activities/
● [Part 3, Part 4] https://speakerdeck.com/kyoun/pakdd2023-tutorial
● [Part 5] https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu)
Schedule
● Overview [This part] (10 min)
● Part 1: [introductory] Language models (LMs) (20 min)
● Part 2: Large language models (LLMs) (20 min)
● Part 3: Technologies underlying ChatGPT-like LLMs (20 min)
● Part 4: Recent achievements in ChatGPT-like LLMs (20 min)
● Part 5: Efforts for Responsible LLMs (20 min)
● Q&A (20 min)
(Plus two short breaks, 10 min each)
Abstract
● Language models (LMs) have a long history in natural language processing
(NLP) research. Their usage was mainly a text generation module (or
calculating likelihood of word sequences) in machine translation and speech
recognition systems, used together with translation or acoustic models. After
the current neural era, LMs take an essential role in the NLP field. In fact, LMs
are integrated into any models/systems to tackle almost all the NLP tasks and
provide state-of-the-art performance on conventional NLP benchmarks. The
usage of LMs is considered to be shifting to more like a world model of
languages or a general-purpose feature generator of any language-related
tasks. More recently, the public sometimes treats LMs like ChatGPT, and its
successor GPT-4, as general-purpose AI after starting an online service in the
public domain.
● This tutorial will first introduce some introductory topics we should know when
discussing the recent advances in LMs like ChatGPT. We will then briefly
introduce the technologies behind ChatGPT-like LMs. Additionally, we also
provide ChatGPT’s social impacts discussed recently in public.
Aims of This Tutorial
● This tutorial aims to introduce necessary factual pieces of knowledge for
discussing the latest LMs to researchers outside of the NLP field.
● In addition, another goal is for audiences to understand the strengths and
weaknesses of LMs from the viewpoint of LM users, and help their future
studies by learning the knowledge from this tutorial.
● Our audiences require no prior knowledge of LMs.
Brief Overview
● This tutorial will begin with some introductory topics that we
should know when discussing recent advances in LMs like
ChatGPT.
[Part 1, Part 2]
● https://www.fai.cds.tohoku.ac.jp/research/activities/
● We will then briefly introduce the technologies behind
ChatGPT-like LMs and their achievements.
[Part 3, Part 4]
● https://speakerdeck.com/kyoun/pakdd2023-tutorial
● Finally, we will cover a topic surrounding ChatGPT-like LMs,
responsible LLMs, recently discussed by the general public.
[Part 5]
● https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
Contents (1/5)
● Part 1: [introductory] Neural Language Models (LMs)
◆ Traditional Definition of LMs
◆ Typical Base Model Architecture: Transformer
◆ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ Universal Features
◆ Pre-training & Fine-tuning Scheme
◆ Multi-task Learner
Contents (2/5)
● Part 2: Large Language Models (LLMs)
◆ LMs’ Scaling Laws (Parameter size ver.)
◆ LLMs: GPT-3
◆ Prompt Engineering
◆ Achievement and Remaining Issues
Contents (3/5)
● Part 3: Technologies underlying ChatGPT-like LLMs
◆ Codex
◆ InstructGPT and ChatGPT
◆ GPT-4 and ChatGPT plugins
◆ LLaMA
◆ Alpaca and Vicuna
◆ MPT
◆ LLaVA
Contents (4/5)
● Part 4: Recent achievements in ChatGPT-like LLMs
◆ Performance in Natural Language Processing benchmarks and exams
◆ Interesting results in Vision-and-Language Understanding evaluations
◆ Open-source applications powered by LLMs
◆ Remaining Issues
Contents (5/5)
● Part 5: Efforts for Responsible LLMs
◆ High-level overview of potential harms of LLMs
◆ Efforts for reducing potential harms in GPT-4 and PaLM 2
Part 1: [introductory]
Neural Language Models (LMs)
We only discuss neural LMs
=> In this talk, “LM” means “neural LM”
(unless otherwise specified)
Selected LMs Discussed in This Talk
Legend: ① model nickname ② date first published or announced in public ③ URL of the announcement (e.g., paper or blog)
[Figure: timeline from 2018 to 2024]
● ① ELMo ② 2018.02 ③ arxiv.org/abs/1802.05365
● ① GPT-1 ② 2018.06 ③ openai.com/research/language-unsupervised
● ① BERT ② 2018.10 ③ arxiv.org/abs/1810.04805
● ① GPT-2 ② 2019.02 ③ openai.com/blog/better-language-models/
● ① RoBERTa ② 2019.07 ③ arxiv.org/abs/1907.11692
● ① T5 ② 2019.10 ③ arxiv.org/abs/1910.10683
● ① GPT-3 ② 2020.05 ③ arxiv.org/abs/2005.14165
● ① FLAN ② 2021.09 ③ arxiv.org/abs/2109.01652
● ① T0 ② 2021.10 ③ arxiv.org/abs/2110.08207
● ① Megatron-Turing NLG ② 2021.10 ③ www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
● ① BLOOM ② 2022.01 ③ github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml
● ① Chinchilla ② 2022.03 ③ arxiv.org/abs/2203.15556
● ① PaLM ② 2022.04 ③ storage.googleapis.com/pathways-language-model/PaLM-paper.pdf
● ① OPT ② 2022.05 ③ arxiv.org/abs/2205.01068
● ① LLaMA ② 2023.02 ③ research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
● ① GPT-4 ② 2023.03 ③ cdn.openai.com/papers/gpt-4.pdf
● ① PaLM 2 ② 2023.05 ③ ai.google/static/documents/palm2techreport.pdf
Contents of Part 1
● Part 1: [introductory] Neural Language Models (LMs)
◆ ① Traditional Definition of LMs
◆ ② Typical Base Model Architecture: Transformer
◆ ③ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ ④ Universal Features
◆ ⑤ Pre-training & Fine-tuning Scheme
◆ ⑥ Multi-task Learner
① Traditional Definition of Language Model (LM)
Language Model = Probabilistic model
◆ Modeling probabilities for all next words (a probability distribution over the vocabulary), given a context

P_θ(Y) = ∏_{j=1}^{J+1} P_θ(y_j | Y_{<j})

● Y_{<j}: context, i.e., the prefix text before the j-th word
● y_j: target word, i.e., the j-th word
[Figure: given the context "the dark ocean is __", the model assigns a conditional probability to every word in the vocabulary as the next word (e.g., "dark", "blue", "white", "today", "fine", "excellent", "good", "scary", ...)]
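The product-of-conditionals definition above can be made concrete with a toy bigram LM, where the context is just the previous word. All words and probabilities below are invented for illustration; they are not from the tutorial.

```python
import math

# Toy bigram LM: P(y_j | previous word). Illustrative probabilities only.
BIGRAM_PROBS = {
    ("<s>", "the"): 0.4,
    ("the", "dark"): 0.1,
    ("dark", "ocean"): 0.2,
    ("ocean", "is"): 0.5,
    ("is", "scary"): 0.05,
    ("scary", "</s>"): 0.3,
}

def sequence_log_prob(words):
    """log P_theta(Y) = sum_j log P_theta(y_j | Y_<j), with a bigram context."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(BIGRAM_PROBS[(prev, cur)])
               for prev, cur in zip(tokens, tokens[1:]))

print(sequence_log_prob(["the", "dark", "ocean", "is", "scary"]))
```

Working in log space avoids numerical underflow when the sequence is long; exponentiating the result recovers P_θ(Y) itself.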
Example of Traditional LMs: 𝑛-gram LM
● Example: Google 𝑛-gram
https://ai.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html
● Counts of consecutive 𝑛 words, turned into the conditional probability P_θ(y_j | Y_{<j})

Context: "serve as the" (total count: 187491)
Target word | Count | Probability
incoming | 92 | 0.00049
independent | 794 | 0.00423
index | 223 | 0.00119
indication | 72 | 0.00038
indicator | 120 | 0.00064
indicators | 45 | 0.00024
indispensable | 111 | 0.00059
indispensible | 40 | 0.00021
individual | 234 | 0.00125
industrial | 52 | 0.00028
industry | 607 | 0.00324
info | 42 | 0.00022
informal | 102 | 0.00054
information | 838 | 0.00447
informational | 41 | 0.00022
initial | 5331 | 0.02843
initiating | 125 | 0.00067
initiation | 63 | 0.00034
initiator | 81 | 0.00043
... | ... | ...
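The probabilities in the table are maximum-likelihood estimates: each count divided by the total count of the context. A minimal sketch, using a few rows from the table above:

```python
# MLE estimate of P(target | "serve as the") from Google n-gram counts.
# Only a few rows from the table are included.
counts = {"incoming": 92, "independent": 794, "information": 838, "initial": 5331}
total = 187491  # total count of the context "serve as the"

def ngram_prob(word):
    """P(word | 'serve as the') = count('serve as the' + word) / total."""
    return counts[word] / total

print(round(ngram_prob("initial"), 5))
```

This also shows the weakness discussed later: a target word never seen after the context gets probability zero unless smoothing is applied.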
Neural LM
● Fit the probability distribution over the next word, given a context, with a (deep) neural network
[Figure: training on the input text "... serve as the input ...". At each time step, the network reads the context so far ("serve", "as", "the", ...) and outputs its current estimated probability distribution over the vocabulary (e.g., incoming 0.03, independent 0.12, index 0.04, indication 0.01, ..., input 0.11, initial 0.21, ...). The parameters are updated in the direction that moves this distribution toward the one-hot correct-answer vector for the target word ("input").]
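The update in the figure can be sketched as follows. This is a minimal stand-in, not the tutorial's model: the "deep neural network" is reduced to a single linear layer over a fixed context vector, and all sizes and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["incoming", "independent", "index", "indication", "input", "initial"]
target = vocab.index("input")   # correct answer for the context "serve as the"

context = rng.normal(size=8)            # stand-in for the encoded context
W = 0.1 * rng.normal(size=(len(vocab), 8))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(100):
    probs = softmax(W @ context)                   # current estimated distribution
    one_hot = np.eye(len(vocab))[target]           # correct-answer vector
    W -= 0.5 * np.outer(probs - one_hot, context)  # cross-entropy gradient step

print(softmax(W @ context)[target])  # probability of "input" is now high
```

The gradient `probs - one_hot` is exactly the "parameter update direction" in the figure: it pushes probability mass from wrong words onto the target word.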
Typical usages of LMs (1/2)
● If we have an LM, we can ...
1. Evaluate the likelihood of given texts
Example: a statistical machine translation (SMT) system
[Figure: the translation model expands the input text into many (intermediate) candidate translations, each with a translation score; the language model (LM) then assigns each candidate an LM score, and the system outputs the best-scoring candidate.]
Typical usages of LMs (2/2)
● If we have an LM, we can ...
2. Generate texts
● Estimate the next word one-by-one, auto-regressively
● For greedy search, compute ŷ_j = argmax_{y_j} P_θ(y_j | Y_{<j}) at each step
Example: generating text using a neural LM (e.g., Generative Pre-trained Transformer: GPT)
[Figure: greedy generation of "Nice to meet you, too ..." as a continuation of "We have never met before, right?". At each step, the model outputs a probability distribution over the vocabulary (e.g., "A", "this", "that", ..., "meet", "have", "you", ..., "Nice", "to", "too", ",", "."), the most probable word is appended ("Nice to" → "meet" → "you" → ", too"), and the process repeats.]
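The greedy loop in the figure can be sketched directly. The tiny next-word table below is an illustrative assumption standing in for the neural LM's predicted distributions:

```python
# Greedy auto-regressive generation: at each step pick argmax_y P(y | context).
# NEXT maps the last two words to a toy next-word distribution.
NEXT = {
    ("Nice", "to"): {"meet": 0.7, "have": 0.2, "you": 0.1},
    ("to", "meet"): {"you": 0.8, "this": 0.2},
    ("meet", "you"): {",": 0.6, ".": 0.4},
    ("you", ","): {"too": 0.9, "that": 0.1},
}

def greedy_generate(prefix, steps):
    out = list(prefix)
    for _ in range(steps):
        dist = NEXT[tuple(out[-2:])]
        out.append(max(dist, key=dist.get))  # argmax over the vocabulary
    return out

print(" ".join(greedy_generate(["Nice", "to"], 4)))
```

Replacing the argmax with sampling from `dist` gives stochastic generation; beam search keeps several high-scoring prefixes instead of one.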
Why use a Neural LM?
● Rough comparison of the general abilities of 𝑛-gram LMs and neural LMs:

| | 𝑛-gram LM | Neural LM |
| Computational cost | 😩 | 😋 |
| Longer context | 😩 | 😋 |
| Unseen context | 😩 | 😋 |
| Performance (in terms of perplexity) | 😩 | 😋 |
② Typical Base Model Architecture: Transformer
◆ Surprisingly, all the famous LMs developed recently have chosen the Transformer as their base model
Transformer (1/2)
● 2017: First draft paper published
◆ Developed primarily for machine translation tasks
[Screenshot of the arXiv submission history: From: Ashish Vaswani; v1 Mon, 12 Jun 2017; v2 Mon, 19 Jun 2017; v3 Tue, 20 Jun 2017; v4 Fri, 30 Jun 2017; v5 Wed, 6 Dec 2017]
Transformer (2/2)
● 2017: First draft paper published
● 2018: Selected as the base model architecture of BERT (and GPT)
◆ => fundamental model for language
● 2023 [Current]: used as the base model architecture for (almost) all famous LMs
● e.g., GPT-3 (ChatGPT), GPT-4, PaLM (PaLM 2), OPT, LLaMA, BLOOM, ...
(GPT: Generative Pre-trained Transformer)
(BERT: Bidirectional Encoder Representations from Transformers)
[FYI] Transformer for Image Processing
● Also used for image processing tasks
◆ 2020: Vision Transformer (ViT)
◆ Competitive with conventional CNNs
[Figure: the given image is split into patches according to a mesh before being fed to the Transformer]
https://openreview.net/forum?id=YicbFdNTTy
③ Three Major Model Types
● Encoder type (masked LMs)
● e.g., BERT, RoBERTa
● Bidirectional
● Decoder type (causal LMs)
● e.g., GPT
● Unidirectional (left-to-right)
● Encoder-decoder type
● e.g., T5
● Encoder: bidirectional; Decoder: unidirectional
We do not discuss the encoder type deeply in this talk.
Reason: this type has become less important these days in the context of “Generative AI”.
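The bidirectional vs. unidirectional distinction comes down to the attention mask. A minimal sketch (shapes and naming are my assumptions, not from the tutorial), where `True` means "position i may attend to position j":

```python
import numpy as np

T = 4  # sequence length

# Encoder type (BERT-like): every position sees the whole sequence.
encoder_mask = np.ones((T, T), dtype=bool)

# Decoder type (GPT-like): position i sees only positions <= i (left-to-right).
decoder_mask = np.tril(np.ones((T, T), dtype=bool))

# Encoder-decoder type (T5-like): bidirectional over the input sequence,
# causal over the output sequence, plus cross-attention from output to input.

print(decoder_mask.astype(int))
```

The lower-triangular mask is what makes a decoder-type model usable as a generative LM: at training time no position can peek at its own target word.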
④ LMs as Universal Features
● Typical usages of LMs => If we have an LM, we can ...
◆ 1. Evaluate the likelihood of given texts
◆ 2. Generate texts
(1 and 2 are identical to traditional usages, as with 𝑛-gram LMs)
◆ 3. Use LMs as universal features (pioneer of this usage)
● A novel usage of LMs
● Enabled by the ability of neural networks
● All LMs explicitly/implicitly take this approach
LMs as Universal Features
● Through LM training, a neural LM implicitly encodes/captures many linguistic aspects, such as:
● the distribution of word occurrences given contexts
● semantically similar/dissimilar expressions
● syntactic/semantic structural information
● Such learned linguistic aspects can help a lot for many NLP tasks
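The "semantically similar expressions" point can be sketched with cosine similarity over feature vectors. The vectors below are made-up stand-ins for LM hidden states, not real model outputs:

```python
import numpy as np

# Toy stand-ins for LM hidden states ("universal features") of words in context.
features = {
    "good": np.array([0.9, 0.1, 0.0]),
    "excellent": np.array([0.85, 0.15, 0.05]),
    "scary": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically similar expressions end up close in feature space.
print(cosine(features["good"], features["excellent"]),
      cosine(features["good"], features["scary"]))
```

In practice the feature vector for a downstream task is taken from the hidden states of a pre-trained LM (ELMo-style contextual embeddings being an early example).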
⑤ Pre-training / Fine-tuning Scheme
● Two-stage training scheme for LMs
◆ 1. Pre-training (first step)
● Train the LM on an extremely large set of raw texts obtained from the web (e.g., news articles, Wikipedia, arXiv, GitHub)
◆ 2. Fine-tuning (second step)
● Train the LM on human-annotated data
◆ Human-annotated data: relatively small
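A schematic of the two-stage scheme, with "training" reduced to counting update steps so the order of stages and the data-size asymmetry stand out. All names and sizes here are illustrative assumptions:

```python
# Stage 1 uses a huge raw corpus; stage 2 a small annotated set.
def pretrain(model, raw_texts):
    model["pretrain_steps"] += len(raw_texts)         # self-supervised next-word prediction
    return model

def finetune(model, labeled_examples):
    model["finetune_steps"] += len(labeled_examples)  # supervised task objective
    return model

model = {"pretrain_steps": 0, "finetune_steps": 0}
model = pretrain(model, ["raw web text"] * 100_000)   # first step
model = finetune(model, [("input", "label")] * 100)   # second step
print(model)
```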
Viewpoint from the Data Sparsity Problem
● The pre-training / fine-tuning scheme
◆ is a variant of transfer learning
◆ requires less human-annotated data to achieve reasonable task performance
● Frees us from preparing a large amount of human-annotated data for every NLP task
◆ Increasing human-annotated data is relatively expensive and time-consuming
● May solve (or alleviate) an essential problem in the NLP community, the data sparsity problem, which remained unsolved for a long time
⑥ LMs as Multitask Learner
● Tackle any NLP task with a single model
◆ Made possible by casting every task as a unified “text-to-text” generation task
Example: T5
[Figure copied from https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html]
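The "text-to-text" casting amounts to prepending a task description to the input so one model can route between tasks. A sketch using prefixes in the style of the T5 paper (the exact prefix strings are conventions, not an API):

```python
# Cast heterogeneous NLP tasks into one "text-to-text" format via task prefixes.
def to_text_to_text(task, text):
    prefixes = {
        "translation": "translate English to German: ",
        "summarization": "summarize: ",
        "sentiment": "sst2 sentence: ",
    }
    return prefixes[task] + text

print(to_text_to_text("summarization", "Language models have a long history ..."))
```

The model's output is always a text string too, so classification labels, translations, and summaries are all produced by the same decoder.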
LMs as Multitask Learner
+ Pre-training / Fine-tuning Scheme
● A single LM can solve many NLP tasks
[Figure: first step, pre-training on a large-scale dataset from the web (e.g., news articles, Wikipedia, arXiv, GitHub); second step, fine-tuning on MT, QA, and SA data in a unified “text-to-text” format. The resulting LM handles fact checking, sentiment analysis (“○○ is good / is bad”), machine translation (“Hello!” → Japanese “Konnichiwa”), text summarization (e.g., today's news: stock prices, the ordinary Diet session, Tokyo, ...), and question answering.]
T5 (and partially GPT-2): pioneers of this approach
=> a first trial toward artificial general intelligence in the LM literature
● Summary/Take-Home Message: Part 1
Summary/Take-Home Message: Part 1 (1/2)
● ① Language Model: a probabilistic model
◆ Models the probabilities of all possible next words (a probability distribution over the vocabulary), given a context
● ② Base Model Architecture: Transformer
◆ Developed mainly for neural machine translation (NMT)
● ③ Model Types
◆ Encoder type (not discussed further in this talk)
◆ Decoder type
◆ Encoder-decoder type
Summary/Take-Home Message: Part 1 (2/2)
● ④ Universal Features
◆ A neural LM implicitly encodes/captures many linguistic aspects
● e.g., ELMo, BERT, GPT-2, RoBERTa, ...
● ⑤ Pre-training & Fine-tuning Scheme
◆ A variant of transfer learning (from pre-trained LMs)
◆ Frees us from preparing a large amount of human-annotated data for every NLP task
◆ Alleviates the data sparsity problem (a long-standing problem in NLP)
● ⑥ Multi-task Learner
◆ Tackles any NLP task with a single model
◆ A step toward artificial general intelligence
Part 2: Large language models (LLMs)
Contents of Part 2
● Part 2: Large Language Models (LLMs)
◆ ① LMsʼ Scaling Laws (Parameter size ver.)
◆ ② LLMs: GPT-3
◆ ③ Prompt Engineering
◆ ④ Achievement: LLMs and Prompts
◆ ⑤ Remaining Issues: LLMs and Prompts
① LMsʼ Scaling Laws (Parameter size)
● General tendency: more parameters => better performance!?
[Kaplan+, 2020] Scaling Laws for Neural Language Models, arXiv:2001.08361
[Figure: test loss against N, the number of parameters (w/o embedding vectors), on a logarithmic scale; smaller loss is better]
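The parameter-count scaling law in Kaplan et al. (2020) is a power law in N. A sketch with the constants as reported in that paper (treat them as approximate fits, valid only in the regimes the paper studies):

```python
# Kaplan et al. (2020): L(N) = (N_c / N) ** alpha_N
ALPHA_N = 0.076   # fitted exponent from the paper
N_C = 8.8e13      # fitted constant from the paper (non-embedding parameters)

def loss(n_params):
    """Predicted test loss for a model with n_params non-embedding parameters."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N={n:.0e}  L={loss(n):.3f}")
```

Because the exponent is small, each 10x increase in parameters shaves off only a modest, but reliably predictable, fraction of the loss, which is what makes the "more parameters => better performance" tendency on the slide a straight line on log-log axes.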
LMsʼ Scaling Laws: Parameter Size
[Figure: model size (in billions of parameters, logarithmic scale) against release date from 01/2018 to 10/2023, for ELMo, GPT-1, BERT, GPT-2, RoBERTa, T5, GPT-3, Megatron-Turing NLG, Chinchilla, PaLM, OPT, BLOOM, LLaMA, GPT-4 (?), and PaLM 2. Model size grows roughly x10 per year.]
7
② LLMs: GPT-3
l Introduced several new concepts
u 1. Larger model: the first LM with >100B parameters
u 2. The potential of prompt engineering (in-context learning)
l Few-shot learning
l Instructions
https://arxiv.org/abs/2005.14165
[Figure: the same model-size timeline as the previous slide; GPT-3 marks a large jump-up in model size (note: the y-axis is log scale)]
8
GPT-3: Impact of Using Large Neural Models
l By scaling up the model parameters,
LMs are able to gain the ... :
u 1. Capacity to memorize much of the training data in its entirety
u 2. Ability to produce very fluent sentences, like a human writer
u 3. Patterns that make simple calculations, reasoning,
argumentative thinking, common sense, etc., solvable at an elementary
level
( Some might say they "gain the ability to solve/think through simple
calculations ...", but we avoid that claim in this talk )
9
Generate Just Like Human Writers (?)
l An attempt to generate fake news
Figure 3.14: The GPT-3 generated news article that humans had the greatest difficulty
distinguishing from a human written article (accuracy: 12%).
Title: United Methodists Agree to Historic Split
Subtitle: Those who oppose gay marriage will form their own denomination
Article: After two days of intense debate, the United Methodist Church has agreed to a historic split
- one that is expected to end in the creation of a new denomination, one that will be "theologically
and socially conservative," according to The Washington Post. The majority of delegates attending
the church's annual General Conference in May voted to strengthen a ban on the ordination of
LGBTQ clergy and to write new rules that will "discipline" clergy who officiate at same-sex weddings.
But those who opposed these measures have a new plan: They say they will form a separate
denomination by 2020, calling their church the Christian Methodist denomination.
The Post notes that the denomination, which claims 12.5 million members, was in the early 20th
century the "largest Protestant denomination in the U.S.,” but that it has been shrinking in recent
decades. The new split will be the second in the church's history. The first occurred in 1968, when
roughly 10 percent of the denomination left to form the Evangelical United Brethren Church. The
Post notes that the proposed split "comes at a critical time for the church, which has been losing
members for years," which has been "pushed toward the brink of a schism over the role of LGBTQ
people in the church." Gay marriage is not the only issue that has divided the church. In 2016, the
denomination was split over ordination of transgender clergy, with the North Pacific regional
conference voting to ban them from serving as clergy, and the South Pacific regional conference
voting to allow them.
Given the Title, Subtitle, and the prefix
(prompt) word “Article:”
10
Generate texts just like human writers?
l Several reports attest to GPT-3's ability to
generate fluent sentences that can potentially fool us
https://www.technologyreview.com/2020/08/14/1006
780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/
l A computer science student at UC Berkeley
developed a fake blog site using GPT-3.
l One of the fake blog posts reached #1 on Hacker News.
https://metastable.org/gpt-3.html
l Someone posted on a Reddit message
board for nearly a week using GPT-3.
l The GPT-3 postings were eventually
stopped by the developer, but many
readers were unaware that the posts were
AI-generated.
11
Effectiveness of LLMs
l Task performance on several NLP tasks https://arxiv.org/abs/2005.14165
12
③ Prompt Engineering
l As model sizes become larger,
fine-tuning costs also become larger
l To reduce the computational cost:
=> methods that require no additional training of pre-trained
LLMs
l An alternative to the pre-training / fine-tuning
scheme
13
What is a Prompt?
l The starting text given to an LM to generate subsequent text;
equivalent to the “context” (prefix text) in LMs
Example: standard text completion
Prompt: “Sensoji is” => [Language Model] => “the oldest temple in Tokyo ...”
Example: machine translation experiment
(context format used in the GPT-2 & GPT-3 papers)
English sentence = Japanese sentence
Hello. = こんにちは。
It is snowing. = 雪が降っている
Today it is sunny. = 今日は晴天です。
Yesterday it rained. =
u Confusing, but we now usually call the entire
starting text (prefix text) the “prompt”, not just the last
sentence, since we place several different types of
text in the prompt.
u The term resembles the “prompt” of a
shell in a terminal.
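The translation prompt above can be assembled mechanically. The sketch below (function names are my own, for illustration) joins a header, a few exemplar pairs, and the query sentence into one prefix text that would be fed to the LM as-is.

```python
# Minimal sketch: build the few-shot machine-translation prompt from
# the slide. Exemplars are (source, target) pairs; the query ends with
# "=" so the LM is expected to continue with the translation.

def build_mt_prompt(exemplars, query,
                    header="English sentence = Japanese sentence"):
    lines = [header]
    lines += [f"{src} = {tgt}" for src, tgt in exemplars]
    lines.append(f"{query} =")  # leave the target side empty
    return "\n".join(lines)

prompt = build_mt_prompt(
    [("Hello.", "こんにちは。"), ("It is snowing.", "雪が降っている")],
    "Yesterday it rained.",
)
print(prompt)
```

Note that the whole returned string, exemplars included, is the "prompt" in current usage, not only the final query line.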
14
More Information in the Context
l Conventional LMs
u Generate sentences following the context words
=> LMsʼ basic task: text completion
l LM + Prompt
u Generate sentences by giving, as context words, a sort of instruction
on how we expect the LM to solve a target task
l e.g., 1. examples (exemplars),
2. (human) instructions explaining the target task
u Elicit the implicit capabilities of the LLM to solve the given task,
without changing (no learning) the model parameters
l Prerequisite: the capability was already learned through pre-training
( => in general, cannot solve a task that hasnʼt been learned )
u Use the context to "control" generation
15
Example of Prompt: Few-shot (1/3)
l Equivalent to the “context” (prefix text)
u But not just a standard prefix text: it also serves as an order or
instruction for what we want the LM to generate
Example: question answering with a Generative Pre-trained Transformer (GPT)
Task: Q: What is the oldest temple in Tokyo? A: Sensoji / Sensoji Temple
Question 𝑞 (with instruction): What is the oldest temple in Tokyo ? The answer is “
Estimation 𝑒 (by the LM): Sensoji Temple ” <EOS>
16
Example of Prompt: Few-shot (2/3)
l A setting where the model is given a few demonstrations
(examples/exemplars) of the task at inference time
u No weight updates
https://arxiv.org/abs/2005.14165
17
Effectiveness of Few-shot Learning
l Task performance on several NLP tasks https://arxiv.org/abs/2005.14165
18
Example of Prompt: Chain-of-thought (3/3)
l Scaling up model size alone has not proved sufficient for
high performance on challenging tasks
u Arithmetic, commonsense, and symbolic reasoning
Chain-of-thought (CoT) prompting improves performance
[Figure: a one-shot CoT prompt, consisting of an example (one-shot) followed by a question]
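A concrete contrast between a standard one-shot prompt and a CoT prompt can be written as plain strings. The exemplar below is modeled on the arithmetic example popularized by Wei et al. (2022); the exact wording here is an assumption for illustration, not a verbatim prompt from the paper.

```python
# Standard vs. chain-of-thought (CoT) one-shot prompts for an
# arithmetic word problem. The only difference: the CoT exemplar's
# answer spells out intermediate reasoning before the final answer.

QUESTION = ("Q: The cafeteria had 23 apples. They used 20 and bought "
            "6 more. How many apples do they have?")

standard_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: 11\n"                       # answer only, no reasoning
    f"{QUESTION}\nA:"
)

cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"  # reasoning steps, then answer
    f"{QUESTION}\nA:"
)

print(cot_prompt)
```

Conditioned on the CoT exemplar, the model tends to emit its own step-by-step reasoning before the answer, which is what improves accuracy on these tasks.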
19
Example of Prompt: Chain-of-thought (3/3)
l Examples of CoT prompting for challenging tasks
20
Effectiveness of CoT Prompting
l Performance of CoT prompting for challenging tasks
21
④ Achievements and Remaining Issues
l Achievements
u LLMs
l Very fluent, human-like sentences
u Prompt engineering
l Solve many language tasks without
1. parameter updates
2. preparing task-specific supervised data
22
Remaining Issues
l Prompts: not consistently effective
u Limited availability, dependent on the pre-training data
l Effective prompts: somewhat artificial (or non-intuitive to humans)
u LM tasks: basically text-completion tasks
l Sometimes generate texts that are
unreliable (not based on facts),
biased (discriminatory or socially criticized),
inconsistent (opposing responses in the same context), or
harmful (information that should not be publicly shared)
=> Instruction tuning and chat tuning (Part 3)
23
l Summary/Take Home Message: Part 2
24
Summary/Take Home Message: Part 2 (1/2)
l ① LMsʼ scaling laws
u Larger is better?
l ② LLMs: GPT-3
u Very fluent, like a human writer
l ③ Prompt engineering (in-context learning)
u Background: avoiding the fine-tuning cost
l No additional cost for training the large model
u Few-shot examples, chain-of-thought, instructions, ...
25
Summary/Take Home Message: Part 2 (2/2)
l ④ Achievements and Remaining Issues: LLMs and
Prompts
u Achievements
l Very fluent, like a human writer
l Solve many language tasks without preparing task-specific
supervised data
u Remaining issues
l Prompts can be somewhat artificial
l How do we prevent the generation of unreliable, biased,
inconsistent, or harmful texts?
  • 2. 1 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Schedule ● Overview [This part] ● Part 1: [introductory] Language models (LMs) ● Part 2: Large language models (LLMs) ● Part 3: Technologies underlying ChatGPT-like LLMs ● Part 4: Recent achievements in ChatGPT-like LLMs ● Part 5: Efforts for Responsible LLMs ● Q&A 10 min 20 min 20 min 20 min 20 min (Short Break: 10 min) (Short Break: 10 min) 20 min 20 min
  • 3. 2 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Abstract ● Language models (LMs) have a long history in natural language processing (NLP) research. They were mainly used as a text generation module (or for calculating the likelihood of word sequences) in machine translation and speech recognition systems, together with translation or acoustic models. In the current neural era, LMs have taken on an essential role in the NLP field. In fact, LMs are integrated into models/systems tackling almost all NLP tasks and provide state-of-the-art performance on conventional NLP benchmarks. The usage of LMs is shifting toward that of a world model of language, or a general-purpose feature generator for any language-related task. More recently, the public has sometimes treated LMs like ChatGPT, and its successor GPT-4, as general-purpose AI since their launch as publicly available online services. ● This tutorial will first introduce some introductory topics we should know when discussing the recent advances in LMs like ChatGPT. We will then briefly introduce the technologies behind ChatGPT-like LMs. Additionally, we discuss ChatGPT’s social impacts as recently debated in public.
  • 4. 3 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Aims of This Tutorial ● This tutorial aims to introduce the factual knowledge necessary for discussing the latest LMs to researchers outside the NLP field. ● In addition, another goal is for the audience to understand the strengths and weaknesses of LMs from the viewpoint of LM users, so that the knowledge from this tutorial can help their future studies. ● No prior knowledge of LMs is required.
  • 5. 4 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Brief Overview ● This tutorial will begin with some introductory topics that we should know when discussing recent advances in LMs like ChatGPT. [Part 1, Part 2] ● https://www.fai.cds.tohoku.ac.jp/research/activities/ ● We will then briefly introduce the technologies behind ChatGPT-like LMs and their achievements. [Part 3, Part 4] ● https://speakerdeck.com/kyoun/pakdd2023-tutorial ● Finally, we will cover a topic around ChatGPT-like LMs, Responsible LLMs, recently discussed among the general public. [Part 5] ● https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
  • 6. 5 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Contents (1/5) ● Part 1: [introductory] Neural Language models (LMs) ◆ Traditional Definition of LMs ◆ Typical Base Model Architecture: Transformer ◆ Three Major Model Types: Encoder, Decoder, Encoder-decoder ◆ Universal Features ◆ Pre-training & Fine-tuning Scheme ◆ Multi-task Learner
  • 7. 6 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Contents (2/5) ● Part 2: Large Language Models (LLMs) ◆ LMs’ Scaling Laws (Parameter size ver.) ◆ LLMs: GPT-3 ◆ Prompt Engineering ◆ Achievement and Remaining Issues
  • 8. 7 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Contents (3/5) ● Part 3: Technologies underlying ChatGPT-like LLMs ◆ Codex ◆ InstructGPT and ChatGPT ◆ GPT-4 and ChatGPT plugins ◆ LLaMA ◆ Alpaca and Vicuna ◆ MPT ◆ LLaVA
  • 9. 8 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Contents (4/5) ● Part 4: Recent achievements in ChatGPT-like LLMs ◆ Performance in Natural Language Processing benchmarks and exams ◆ Interesting results in Vision-and-Language Understanding evaluations ◆ Open-source applications powered by LLMs ◆ Remaining Issues
  • 10. 9 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Contents (5/5) ● Part 5: Efforts for Responsible LLMs ◆ High-level overview of potential harms of LLMs ◆ Efforts for reducing potential harms in GPT-4 and PaLM 2
  • 11. [introductory] A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT 2023.05.25 (Thu) PAKDD 2023 Tutorial 2 Jun Suzuki Tohoku University Kyosuke Nishida NTT Human Informatics Laboratories Naoaki Okazaki Tokyo Institute of Technology PART 1
  • 12. 1 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Part 1: [introductory] Neural Language models (LMs) We only discuss neural LMs => In this talk, “LM” means “neural LM” (unless otherwise specified)
  • 13. 2 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Selected LMs Discussed in This Talk l aa ① Model nick name ② Date of first published or announced in public ③ URL for the announcement e.g., paper or blog ① ELMo ② 2018.02 ③ arxiv.org/abs/1802.05365 ① GPT-1 ② 2018.06 ③ openai.com/research/language- unsupervised ① GPT-2 ② 2019.02 ③ openai.com/blog/better- language-models/ ① GPT-3 ② 2020.05 ③ arxiv.org/abs/2005.14165 * 1 https://www.microsoft.com/en- us/research/blog/using-deepspeed- and-megatron-to-train-megatron- turing-nlg-530b-the-worlds-largest- and-most-powerful-generative- language-model/ ① BLOOM ② 2022.01 ③ *2 ① Chinchilla ② 2022.03 ③ arxiv.org/abs/2203.15556 ① PaLM ② 2022.04 ③ storage.googleapis.com/pathways- language-model/PaLM-paper.pdff *2 https://github.com/bigscience- workshop/bigscience/tree/mas ter/train/tr11-176B-ml ① RoBERTa ② 2019.07 ③ arxiv.org/abs/1907.11692 ① T5 ② 2019.10 ③ arxiv.org/abs/1910.10683 ① Megatron-turing NLG ② 2021.10 ③ *1 ① PaLM2 ② 2023.05 ③ https://ai.google/static/documents/palm2techreport.pdf ① GPT-4 ② 2023.03 ③cdn.openai.com/papers/gpt-4.pdf 2018 2019 2020 2021 2022 2023 2024 ① FLAN ② 2021.09 ③ arxiv.org/abs/2109.01652 ① BERT ② 2018.10 ③ arxiv.org/abs/1810.04805 ① T0 ② 2021.10 ③ arxiv.org/abs/2110.08207 ① LLaMA ② 2023.02 ③research.facebook.com/publications/llama- open-and-efficient-foundation-language- models/ ① OPT ② 2022.05 ③ arxiv.org/abs/2205.01068
  • 14. 3 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Selected LMs Discussed in This Talk l aa ① ELMo ② 2018.02 ③ arxiv.org/abs/1802.05365 ① GPT-1 ② 2018.06 ③ openai.com/research/language- unsupervised ① GPT-2 ② 2019.02 ③ openai.com/blog/better- language-models/ ① GPT-3 ② 2020.05 ③ arxiv.org/abs/2005.14165 * 1 https://www.microsoft.com/en- us/research/blog/using-deepspeed- and-megatron-to-train-megatron- turing-nlg-530b-the-worlds-largest- and-most-powerful-generative- language-model/ ① BLOOM ② 2022.01 ③ *2 ① Chinchilla ② 2022.03 ③ arxiv.org/abs/2203.15556 ① PaLM ② 2022.04 ③ storage.googleapis.com/pathways- language-model/PaLM-paper.pdff *2 https://github.com/bigscience- workshop/bigscience/tree/mas ter/train/tr11-176B-ml ① RoBERTa ② 2019.07 ③ arxiv.org/abs/1907.11692 ① T5 ② 2019.10 ③ arxiv.org/abs/1910.10683 ① Megatron-turing NLG ② 2021.10 ③ *1 ① PaLM2 ② 2023.05 ③ https://ai.google/static/documents/palm2techreport.pdf ① GPT-4 ② 2023.03 ③cdn.openai.com/papers/gpt-4.pdf 2018 2019 2020 2021 2022 2023 2024 ① FLAN ② 2021.09 ③ arxiv.org/abs/2109.01652 ① BERT ② 2018.10 ③ arxiv.org/abs/1810.04805 ① T0 ② 2021.10 ③ arxiv.org/abs/2110.08207 ① LLaMA ② 2023.02 ③research.facebook.com/publications/llama- open-and-efficient-foundation-language- models/ ① OPT ② 2022.05 ③ arxiv.org/abs/2205.01068 ① Model nick name ② Date of first published or announced in public ③ URL for the announcement e.g., paper or blog
  • 15. 4 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Contents of Part 1 l Part 1: [introductory] Neural Language Models (LMs) u ① Traditional Definition of LMs u ② Typical Base Model Architecture: Transformer u ③ Three Major Model Types: Encoder, Decoder, Encoder-decoder u ④ Universal Features u ⑤ Pre-training & Fine-tuning Scheme u ⑥ Multi-task Learner
  • 16. 5 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) ① Traditional Definition of Language Model (LM) Language Model = Probabilistic model u Modeling the probabilities of all possible next words (a probability distribution over the vocabulary), given a context: P_θ(Y) = ∏_{j=1}^{J+1} P_θ(y_j | Y_{<j}), where Y_{<j} is the context (the prefix text before the j-th word), y_j is the target (j-th) word, and P_θ(y_j | Y_{<j}) is the conditional probability of the next word. Example: for the context “ocean is ___” (or, with more context, “dark ocean is ___”), the model assigns a probability to every word in the vocabulary (the, “,”, dark, blue, white, today, fine, excellent, good, scary, …).
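The chain-rule definition above can be sketched in code. This is a toy example with hand-assigned conditional probabilities (the `cond_prob` table and its values are invented for illustration, not a trained model), conditioning on the first word as given context:

```python
# Toy conditional distributions P(y_j | Y_<j); in a real LM these come
# from a trained model. All values here are illustrative only.
cond_prob = {
    ("the",): {"ocean": 0.01, "dark": 0.02},
    ("the", "ocean"): {"is": 0.30},
    ("the", "ocean", "is"): {"dark": 0.05, "blue": 0.20},
}

def sequence_prob(tokens):
    """P(Y) = prod_j P(y_j | Y_<j), via the chain rule (first token given)."""
    p = 1.0
    for j in range(1, len(tokens)):
        context, target = tuple(tokens[:j]), tokens[j]
        p *= cond_prob[context][target]
    return p

p = sequence_prob(["the", "ocean", "is", "blue"])  # = 0.01 * 0.30 * 0.20 = 0.0006
```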
  • 17. 6 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Example of Traditional LMs: 𝑛-gram LM l Example: Google 𝑛-gram https://ai.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html Counts of consecutive 𝑛 words for the context “serve as the ___” (total count 187491), with each probability estimated as count / total: incoming 92 (0.00049), independent 794 (0.00423), index 223 (0.00119), indication 72 (0.00038), indicator 120 (0.00064), indicators 45 (0.00024), indispensable 111 (0.00059), indispensible 40 (0.00021), individual 234 (0.00125), industrial 52 (0.00028), industry 607 (0.00324), info 42 (0.00022), informal 102 (0.00054), information 838 (0.00447), informational 41 (0.00022), initial 5331 (0.02843), initiating 125 (0.00067), initiation 63 (0.00034), initiator 81 (0.00043). Each factor P_θ(y_j | Y_{<j}) in P_θ(Y) = ∏_{j=1}^{J+1} P_θ(y_j | Y_{<j}) is estimated from these context/target-word counts.
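The count-to-probability estimate on this slide can be reproduced directly. The numbers below are a small excerpt of the Google 𝑛-gram figures shown on the slide:

```python
# Counts of "serve as the <word>" 4-grams, copied from the slide
# (an excerpt of the Google n-gram data it displays).
counts = {
    "initial": 5331,
    "information": 838,
    "independent": 794,
    "industry": 607,
}
total = 187491  # total count of "serve as the *" given on the slide

def ngram_prob(word):
    # Maximum-likelihood n-gram estimate:
    # P(word | context) = count(context, word) / count(context)
    return counts[word] / total

p_initial = ngram_prob("initial")  # ~0.02843, matching the slide
```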
  • 18. 7 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Neural LM l Fit the probability distribution over a sequence of words given a context with a (deep) neural network [Figure: at time steps 1–5 the network reads the input text “... serve as the input ...” (training data) word by word; at each step it outputs its current estimated probability distribution over the vocabulary (e.g., incoming 0.03, independent 0.12, index 0.04, indication 0.01, ..., input 0.11, initial 0.21, ...); the correct-answer vector is one-hot on the target word (“input”), and the parameters are updated in the direction that raises its probability.]
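The training signal in that diagram is the cross-entropy between the model's predicted distribution and the one-hot correct answer. A minimal sketch in plain Python, using the illustrative (unnormalized) probabilities from the slide; the function names are mine, not from the tutorial:

```python
import math

# One training step of a neural LM, in miniature: the model outputs a
# probability for each vocabulary word, and the loss is cross-entropy
# against the one-hot correct-answer vector (gradients of this loss
# would drive the parameter update shown in the figure).
vocab = ["incoming", "independent", "index", "indication", "input", "initial"]
predicted = [0.03, 0.12, 0.04, 0.01, 0.11, 0.21]  # slide values, illustrative
target_word = "input"  # the next word in the training text

def cross_entropy(pred, target_index):
    # -log P(correct word) under the model's current distribution
    return -math.log(pred[target_index])

loss = cross_entropy(predicted, vocab.index(target_word))
```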
  • 19. 8 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Typical usages of LMs (1/2) l If we have an LM, we can ... ?? 1, Evaluate likelihood of given texts Example: a statistical machine translation (SMT) system [Figure: the input text goes through the Translation Model, which produces intermediate candidate output texts with translation scores; the Language Model (LM) assigns each candidate an LM score, and the two scores are combined to rank the candidates and pick the output text.]
  • 20. 9 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Typical usages of LMs (2/2) l If we have an LM, we can ... ?? 2, Generate texts l Estimate next words one-by-one auto-regressively l For greedy search, compute y_j = argmax_y P_θ(y | Y_{<j}) at each step, under the model P_θ(Y) = ∏_{j=1}^{J+1} P_θ(y_j | Y_{<j})
  • 21. 10 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Typical usages of LMs (2/2) l If we have an LM, we can ... ?? 2, Generate texts l Estimate next words one-by-one auto-regressively l For greedy search, compute y_j = argmax_y P_θ(y | Y_{<j}) at each step Example: generating text with a neural LM (e.g., Generative pre-trained transformer: GPT) [Figure: given the context “We have never met before, right? Nice to meet you!”, the model repeatedly estimates a probability distribution over the vocabulary (A, this, that, ..., meet, have, you, ..., Nice, ..., to, ..., too, “,”, “.”) and picks the next word one step at a time: “Nice” → “to” → “meet” → “you” → “,” → “too” ...]
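The greedy generation loop sketched above is just repeated argmax over next-word distributions. Here is a minimal, self-contained version; `next_dist` is a stand-in for a trained neural LM, and its hand-crafted distributions are invented for illustration:

```python
# Greedy autoregressive generation: at each step, append
# argmax_y P(y | generated-so-far), stopping at an end-of-sequence token.
def next_dist(context):
    # Stand-in for a neural LM; distributions are hand-crafted toys.
    table = {
        ("Nice",): {"to": 0.9, "day": 0.1},
        ("Nice", "to"): {"meet": 0.8, "see": 0.2},
        ("Nice", "to", "meet"): {"you": 0.95, "them": 0.05},
        ("Nice", "to", "meet", "you"): {"<eos>": 0.7, ",": 0.3},
    }
    return table[tuple(context)]

def greedy_generate(prompt, max_steps=10):
    out = list(prompt)
    for _ in range(max_steps):
        dist = next_dist(out)
        word = max(dist, key=dist.get)  # argmax over the vocabulary
        if word == "<eos>":
            break
        out.append(word)
    return out

generated = greedy_generate(["Nice"])  # -> ["Nice", "to", "meet", "you"]
```

Beam search or sampling would replace the single `max` with tracking several candidates or drawing from the distribution.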
  • 22. 11 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Why use a Neural LM l Rough comparison between the general abilities of the 𝑛-gram LM and the Neural LM: computational cost: 𝑛-gram LM 😩, Neural LM 😋; longer context: 𝑛-gram LM 😩, Neural LM 😋; unseen context: 𝑛-gram LM 😩, Neural LM 😋; performance (in terms of perplexity): 𝑛-gram LM 😩, Neural LM 😋
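Perplexity, the metric named on this slide, is the exponential of the average negative log-likelihood per token; lower is better. A minimal sketch (the token probabilities are illustrative):

```python
import math

def perplexity(token_probs):
    # exp of the average negative log-likelihood per token
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is uniform over 4 equally likely choices per token
# has perplexity 4: it is "as confused as" a 4-way coin flip.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```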
  • 23. 12 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) ② Typical Base Model Architecture: Transformer u Surprisingly, all the famous LMs developed recently have chosen the Transformer as their base model [Figure: recap of the neural-LM training diagram (time steps, input text “... serve as the input ...”, estimated probability distribution over the vocabulary, correct-answer vector, parameter update direction).]
  • 24. 13 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Transformer (1/2) l 2017: First draft paper published u Developed primarily for machine translation tasks [arXiv submission history, from Ashish Vaswani: v1 Mon, 12 Jun 2017 17:57:34 UTC; v2 Mon, 19 Jun 2017 16:49:45 UTC; v3 Tue, 20 Jun 2017 05:20:02 UTC; v4 Fri, 30 Jun 2017 17:29:30 UTC; v5 Wed, 6 Dec 2017 03:30:32 UTC]
  • 25. 14 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Transformer (2/2) l 2017: Published First draft paper l 2018: Selected as base model architecture of BERT (and GPT) u => fundamental model for language l 2023 [Current]: used as base model architecture for (almost) all famous LMs l e.g., GPT-3 (ChatGPT), GPT-4, PaLM (PaLM2), OPT, LLaMa, Bloom, ... (GPT: Generative Pre-trained Transformer) (BERT: Bidirectional Encoder Representations from Transformers)
  • 26. 15 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) [FYI] Transformer for Image Processing l Used for image processing tasks u 2020: Vision Transformer (ViT) u Competitive with conventional CNNs Splits a given image into patches according to a mesh https://openreview.net/forum?id=YicbFdNTTy
  • 27. 16 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) ③ Three Major Model Types l Encoder type (Masked LMs) l e.g., BERT, RoBERTa l Bidirectional l Decoder type (Causal LMs) l e.g., GPT l Unidirectional (left-to-right) l Encoder-decoder type l e.g., T5 l Encoder: bidirectional, Decoder: unidirectional We do not discuss the Encoder type in depth in this talk. Reason: this type has become less important these days in the context of “Generative AI”
  • 28. 17 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) ④ LMs as Universal Features l Typical usages of LMs => If we have an LM, we can ... ?? u 1, Evaluate likelihood of given texts u 2, Generate texts (usages 1 and 2 are identical to traditional usages like 𝑛-gram LMs) u 3, Use LMs as universal features l A novel usage of LMs, enabled by the ability of neural networks l All LMs explicitly/implicitly take this approach (ELMo pioneered this usage)
  • 29. 18 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) LMs as Universal Features l LM training => a neural LM implicitly encodes/captures many linguistic aspects, such as l the distribution of word occurrences given contexts l semantically similar/dissimilar expressions l syntactic/semantic structural information l Such learned linguistic aspects can help a lot for many NLP tasks
  • 30. 19 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) ⑤ Pre-training / Fine-tuning Scheme l Two-stage training scheme of LMs u 1, Pre-training (first step) l Train LMs on extremely many raw texts obtained from the web (a large-scale dataset, e.g., news articles, Wikipedia, arXiv, GitHub) u 2, Fine-tuning (second step) l Train the pre-trained LM on human-annotated data u Human-annotated data: relatively small
  • 31. 20 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Viewpoint from the Data Sparsity Problem l Pre-training / Fine-tuning scheme u A variant of transfer learning u Requires less human-annotated data to achieve reasonable task performance l Frees us from having to prepare a large amount of human annotation data for every NLP task u It is relatively expensive and time-consuming to increase human annotation data l May solve (or alleviate) an essential problem in the NLP community (the data sparsity problem) that has remained unsolved for a long time
  • 32. 21 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) ⑥ LMs as Multitask Learner l Tackle any NLP task with a single model u Possible by casting every task as a unified “text-to-text” generation task Copied from https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html Example: T5
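The "text-to-text" casting can be sketched as prepending a task prefix to the input and expecting the answer as output text. The prefixes below mirror T5's published convention; the example sentence and the helper function are invented for illustration:

```python
# T5-style text-to-text casting: every task becomes an
# (input text -> output text) pair, distinguished by a task prefix.
def to_text_to_text(task, text):
    prefixes = {
        "translate_en_de": "translate English to German: ",
        "summarize": "summarize: ",
        "sentiment": "sst2 sentence: ",
    }
    return prefixes[task] + text

# One model, one interface, many tasks:
example = to_text_to_text("summarize", "The tutorial covered language models ...")
```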
  • 33. 22 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) LMs as Multitask Learner + Pre-training / Fine-tuning Scheme l A single LM can solve many NLP tasks [Figure: first step, pre-training on a large-scale dataset from the Web (e.g., news articles, Wikipedia, arXiv, GitHub); second step, fine-tuning on MT/QA/SA data in a unified “text-to-text” format; the resulting Language Model handles fact checking, sentiment analysis (“○○ is good / is bad”), machine translation (“Hello!” → Japanese “こんにちは”), text summarization (e.g., a Japanese news digest: today’s news on stock prices, the ordinary Diet session, Tokyo), and question answering.] T5 (and partially GPT-2): pioneers of this approach => the first trial toward artificial general intelligence in the LM literature
  • 34. 23 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) l Summary/Take Home Message: Part 1
  • 35. 24 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Summary/Take Home Message: Part 1 (1/2) l ① Language Model: Probabilistic Model u Modeling the probabilities of all possible next words (a probability distribution over the vocabulary), given a context l ② Base Model Architecture: Transformer u Developed mainly for neural machine translation (NMT) l ③ Model Types u Encoder-type (discussion of this type omitted in this talk) u Decoder-type u Encoder-decoder-type
  • 36. 25 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Summary/Take Home Message: Part 1 (2/2) l ④ Universal Features u Neural LM: implicitly encodes/captures many linguistic aspects l E.g., ELMo, BERT, GPT-2, RoBERTa, ... l ⑤ Pre-training & Fine-tuning Scheme u A variant of transfer learning (from pre-trained LMs) u Frees us from having to prepare a large amount of human annotation data for every NLP task u Alleviates the data sparsity problem (a long-standing problem in NLP) l ⑥ Multi-task Learner u Tackle any NLP task with a single model u Toward artificial general intelligence
  • 37. [introductory] A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT 2023.05.25 (Thu) PAKDD 2023 Tutorial 2 Jun Suzuki Tohoku University Kyosuke Nishida NTT Human Informatics Laboratories Naoaki Okazaki Tokyo Institute of Technology PART 2
  • 38. 1 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Part 2: Large language models (LLMs)
  • 39. 2 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Selected LMs Discussed in This Talk l aa ① Model nick name ② Date of first published or announced in public ③ URL for the announcement e.g., paper or blog ① ELMo ② 2018.02 ③ arxiv.org/abs/1802.05365 ① GPT-1 ② 2018.06 ③ openai.com/research/language- unsupervised ① GPT-2 ② 2019.02 ③ openai.com/blog/better- language-models/ ① GPT-3 ② 2020.05 ③ arxiv.org/abs/2005.14165 * 1 https://www.microsoft.com/en- us/research/blog/using-deepspeed- and-megatron-to-train-megatron- turing-nlg-530b-the-worlds-largest- and-most-powerful-generative- language-model/ ① BLOOM ② 2022.01 ③ *2 ① Chinchilla ② 2022.03 ③ arxiv.org/abs/2203.15556 ① PaLM ② 2022.04 ③ storage.googleapis.com/pathways- language-model/PaLM-paper.pdff *2 https://github.com/bigscience- workshop/bigscience/tree/mas ter/train/tr11-176B-ml ① RoBERTa ② 2019.07 ③ arxiv.org/abs/1907.11692 ① T5 ② 2019.10 ③ arxiv.org/abs/1910.10683 ① Megatron-turing NLG ② 2021.10 ③ *1 ① PaLM2 ② 2023.05 ③ https://ai.google/static/documents/palm2techreport.pdf ① GPT-4 ② 2023.03 ③cdn.openai.com/papers/gpt-4.pdf 2018 2019 2020 2021 2022 2023 2024 ① FLAN ② 2021.09 ③ arxiv.org/abs/2109.01652 ① BERT ② 2018.10 ③ arxiv.org/abs/1810.04805 ① T0 ② 2021.10 ③ arxiv.org/abs/2110.08207 ① LLaMA ② 2023.02 ③research.facebook.com/publications/llama- open-and-efficient-foundation-language- models/ ① OPT ② 2022.05 ③ arxiv.org/abs/2205.01068
  • 40. 3 PAKDD-2023 Tutorial 2: A Gentle Introduction to Technologies Behind Language Models and Recent Achievement in ChatGPT / 2023.05.25 (Thu) Selected LMs Discussed in This Talk l aa ① Model nick name ② Date of first published or announced in public ③ URL for the announcement e.g., paper or blog ① ELMo ② 2018.02 ③ arxiv.org/abs/1802.05365 ① GPT-1 ② 2018.06 ③ openai.com/research/language- unsupervised ① GPT-2 ② 2019.02 ③ openai.com/blog/better- language-models/ ① GPT-3 ② 2020.05 ③ arxiv.org/abs/2005.14165 * 1 https://www.microsoft.com/en- us/research/blog/using-deepspeed- and-megatron-to-train-megatron- turing-nlg-530b-the-worlds-largest- and-most-powerful-generative- language-model/ ① BLOOM ② 2022.01 ③ *2 ① Chinchilla ② 2022.03 ③ arxiv.org/abs/2203.15556 ① PaLM ② 2022.04 ③ storage.googleapis.com/pathways- language-model/PaLM-paper.pdff *2 https://github.com/bigscience- workshop/bigscience/tree/mas ter/train/tr11-176B-ml ① RoBERTa ② 2019.07 ③ arxiv.org/abs/1907.11692 ① T5 ② 2019.10 ③ arxiv.org/abs/1910.10683 ① Megatron-turing NLG ② 2021.10 ③ *1 ① PaLM2 ② 2023.05 ③ https://ai.google/static/documents/palm2techreport.pdf ① GPT-4 ② 2023.03 ③cdn.openai.com/papers/gpt-4.pdf 2018 2019 2020 2021 2022 2023 2024 ① FLAN ② 2021.09 ③ arxiv.org/abs/2109.01652 ① BERT ② 2018.10 ③ arxiv.org/abs/1810.04805 ① T0 ② 2021.10 ③ arxiv.org/abs/2110.08207 ① LLaMA ② 2023.02 ③research.facebook.com/publications/llama- open-and-efficient-foundation-language- models/ ① OPT ② 2022.05 ③ arxiv.org/abs/2205.01068
• 41. Contents of Part 2
● Part 2: Large Language Models (LLMs)
  ○ ① LMsʼ Scaling Laws (parameter-size version)
  ○ ② LLMs: GPT-3
  ○ ③ Prompt Engineering
  ○ ④ Achievements: LLMs and Prompts
  ○ ⑤ Remaining Issues: LLMs and Prompts
• 42. ① LMsʼ Scaling Laws (Parameter Size)
● General tendency: more parameters => better performance!?
[Figure: test loss vs. N, the number of parameters excluding embedding vectors (log scale); smaller loss is better]
[Kaplan+, 2020] Scaling Laws for Neural Language Models, arXiv:2001.08361
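This tendency can be sketched numerically. A minimal sketch, assuming the approximate fitted constants reported by Kaplan et al. (alpha_N ≈ 0.076, N_c ≈ 8.8 × 10^13 non-embedding parameters); the exact values depend on the dataset and training setup, so treat the numbers below as illustrative only:

```python
# Sketch of the parameter-count scaling law from Kaplan et al. (2020):
#   L(N) ~ (N_c / N) ** alpha_N
# The constants are the approximate fitted values from the paper.
ALPHA_N = 0.076
N_C = 8.8e13  # non-embedding parameters

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy loss (nats/token) for a model with
    n_params non-embedding parameters, ignoring data and compute
    bottlenecks (i.e., assuming neither is the limiting factor)."""
    return (N_C / n_params) ** ALPHA_N

# Loss falls slowly but steadily as parameters grow: 0.1B .. 100B.
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:8.0e} params -> predicted loss {predicted_loss(n):.3f}")
```

The power-law form means each ×10 in parameters shaves off only a constant factor of loss, which is why the field kept scaling by roughly an order of magnitude per year (next slide).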
• 43. LMsʼ Scaling Laws: Parameter Size
[Figure: model size in billions of parameters (log scale) plotted against release date from 01/2018 to 10/2023, covering ELMo, GPT-1, BERT, GPT-2, RoBERTa, T5, GPT-3, Megatron-Turing NLG, BLOOM, Chinchilla, PaLM, OPT, LLaMA, GPT-4 (?), and PaLM 2]
● Roughly ×10 per year
• 44. ② LLMs: GPT-3
● Introduces several new concepts
  ○ 1. Larger model: the first LM with >100B parameters
  ○ 2. Potential of prompt engineering (in-context learning): few-shot learning, instructions
https://arxiv.org/abs/2005.14165
[Figure: the same model-size timeline; GPT-3 is a large jump in model size (note: the y-axis is log scale)]
• 45. GPT-3: Impact of Using Large Neural Models
● By scaling up the model parameters, LMs gain:
  ○ 1. The capacity to memorize much of the training data in its entirety
  ○ 2. The ability to produce very fluent sentences, like a human writer
  ○ 3. Patterns for solving simple calculations, reasoning, argumentative thinking, common sense, etc., at an elementary level
(Some might say they "gain the ability to solve/think through simple calculations ...", but we avoid that phrasing in this talk.)
• 46. Generate Just Like Human Writers (?)
● An attempt to generate fake news: the model is given the title, subtitle, and the prefix (prompt) word "Article:"
Figure 3.14: The GPT-3-generated news article that humans had the greatest difficulty distinguishing from a human-written article (accuracy: 12%).
Title: United Methodists Agree to Historic Split
Subtitle: Those who oppose gay marriage will form their own denomination
Article: After two days of intense debate, the United Methodist Church has agreed to a historic split - one that is expected to end in the creation of a new denomination, one that will be "theologically and socially conservative," according to The Washington Post. The majority of delegates attending the church's annual General Conference in May voted to strengthen a ban on the ordination of LGBTQ clergy and to write new rules that will "discipline" clergy who officiate at same-sex weddings. But those who opposed these measures have a new plan: They say they will form a separate denomination by 2020, calling their church the Christian Methodist denomination. The Post notes that the denomination, which claims 12.5 million members, was in the early 20th century the "largest Protestant denomination in the U.S.," but that it has been shrinking in recent decades. The new split will be the second in the church's history. The first occurred in 1968, when roughly 10 percent of the denomination left to form the Evangelical United Brethren Church. The Post notes that the proposed split "comes at a critical time for the church, which has been losing members for years," which has been "pushed toward the brink of a schism over the role of LGBTQ people in the church." Gay marriage is not the only issue that has divided the church. In 2016, the denomination was split over ordination of transgender clergy, with the North Pacific regional conference voting to ban them from serving as clergy, and the South Pacific regional conference voting to allow them.
• 47. Generate Texts Just Like Human Writers?
● Several reports attest to GPT-3's ability to generate fluent sentences that can potentially fool us.
● A computer-science student at UC Berkeley built a fake blog using GPT-3; one of the fake blog posts reached #1 on Hacker News.
https://www.technologyreview.com/2020/08/14/1006780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/
● Someone posted to a Reddit message board for nearly a week using GPT-3; the developer eventually stopped the posting, but many readers were unaware that the posts were AI-generated.
https://metastable.org/gpt-3.html
• 48. Effectiveness of LLMs
● Task performance on several NLP tasks
https://arxiv.org/abs/2005.14165
• 49. ③ Prompt Engineering
● As models get larger, the fine-tuning cost also gets larger.
● To reduce this computational cost => methods that require no additional training of the pre-trained LLM
● An alternative to the pre-training / fine-tuning scheme
• 50. What Is a Prompt?
● The starting text given to an LM, from which it generates the subsequent text; equivalent to the "context" (prefix text) in LMs.
● Example (standard text completion): prompt "Sensoji is" => the LM continues "the oldest temple in Tokyo ..."
● Example (machine translation experiments in the GPT-2 & GPT-3 papers), context format "English sentence = Japanese sentence":
Hello. = こんにちは。
It is snowing. = 雪が降っている
Today it is sunny. = 今日は晴天です。
Yesterday it rained. =
● Confusingly, we now usually call the entire starting text (prefix text) the "prompt", not just the last sentence, since we put several different types of text into it. The name resembles the "prompt" of a shell in a terminal.
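The translation context above can be assembled mechanically. A minimal sketch using plain string formatting; the example sentence pairs and the "English sentence = Japanese sentence" separator line follow the format shown on this slide, and the helper function name is ours:

```python
# Build the "English sentence = Japanese sentence" translation context.
# The whole string, including the trailing "Yesterday it rained. =",
# is the prompt; the LM is expected to complete it with the translation.
examples = [
    ("Hello.", "こんにちは。"),
    ("It is snowing.", "雪が降っている"),
    ("Today it is sunny.", "今日は晴天です。"),
]

def build_translation_prompt(pairs, query: str) -> str:
    lines = ["English sentence = Japanese sentence"]  # format header
    lines += [f"{en} = {ja}" for en, ja in pairs]      # solved examples
    lines.append(f"{query} =")                         # model continues here
    return "\n".join(lines)

prompt = build_translation_prompt(examples, "Yesterday it rained.")
print(prompt)
```

Note that nothing task-specific is trained: the "=" pattern in the prefix is the only signal telling the LM that a translation should follow.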
• 51. More Information in Context
● Conventional LMs: generate sentences following the context words => the LMsʼ basic task is text completion.
● LM + prompt: generate sentences given, as context words, a kind of instruction describing how we expect the LM to solve a target task.
  ○ e.g., 1. examples (exemplars), 2. a (human-written) instruction explaining the target task
  ○ Elicits the implicit capabilities of the LLM to solve the given task without changing (no learning of) the model parameters.
    Prerequisite: the capability was already acquired during pre-training (=> in general, a task that was never learned cannot be solved).
  ○ Uses the context to "control" generation.
• 52. Example of Prompt: Few-shot (1/3)
● Equivalent to the "context" (prefix text), but not just a standard prefix: it orders or instructs the LM on what sentence we want it to generate.
● Example (question answering with a generative pre-trained Transformer, GPT):
Task: Q: What is the oldest temple in Tokyo? A: Sensoji / Sensoji Temple
Prompt (question 𝑞 plus instruction): Question: What is the oldest temple in Tokyo? The answer is “
Estimation 𝑒: Sensoji Temple ” <EOS>
• 53. Example of Prompt: Few-shot (2/3)
● A setting where the model is given a few demonstrations (examples/exemplars) of the task at inference time; no weight updates are performed.
https://arxiv.org/abs/2005.14165
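The few-shot setting amounts to concatenating a handful of solved demonstrations in front of the unsolved query; no gradient step is taken. A minimal sketch; the Q:/A: layout mirrors the question-answering example on the previous slide, while the task description and demonstration pairs here are illustrative inventions of ours:

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Concatenate a task description, K solved demonstrations, and the
    unsolved query. The LM answers purely by completing this text;
    its weights are never updated (in-context learning)."""
    parts = [task_description]
    for q, a in demonstrations:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {query}\nA:")  # the LM fills in this last answer
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "Answer the question.",
    [("What is the capital of Japan?", "Tokyo"),
     ("What is the tallest mountain in Japan?", "Mt. Fuji")],
    "What is the oldest temple in Tokyo?",
)
print(prompt)
```

With K = 0 demonstrations this degenerates to zero-shot prompting (instruction only); GPT-3's paper evaluates K from 0 up to several dozen.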
• 54. Effectiveness of Few-shot Learning
● Task performance on several NLP tasks
https://arxiv.org/abs/2005.14165
• 55. Example of Prompt: Chain-of-thought (3/3)
● Scaling up model size alone has not proved sufficient for high performance on challenging tasks: arithmetic, commonsense, and symbolic reasoning.
● Chain-of-thought (CoT) prompting improves performance.
[Figure: a one-shot CoT prompt consisting of an example and a question]
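The only difference from plain few-shot prompting is in the demonstration: the exemplar's answer spells out its intermediate reasoning steps, nudging the model to do the same before giving a final answer. A minimal sketch; the exemplar text is the well-known arithmetic example from the chain-of-thought paper (Wei et al., 2022), and the builder function is ours:

```python
# One-shot chain-of-thought prompt: the single demonstration includes
# the intermediate reasoning, not just the final answer.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar to the new question."""
    return f"{COT_EXEMPLAR}\n\nQ: {question}\nA:"

print(build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
))
```

Compare with a standard exemplar whose answer would simply read "A: 11": the extra reasoning text in the demonstration is what changes the model's behavior.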
• 56. Example of Prompt: Chain-of-thought (3/3)
● Examples of CoT prompting for challenging tasks
• 57. Effectiveness of CoT Prompting
● Performance of CoT prompting on challenging tasks
• 58. ④ Achievements and Remaining Issues
● Achievements
  ○ LLMs: very fluent, human-like sentences
  ○ Prompt engineering: solves many language tasks without 1. updating parameters or 2. preparing task-specific supervised data
• 59. Remaining Issues
● Prompts are not consistently effective: limited availability, dependent on the pre-training data.
● Effective prompts can be somewhat artificial (non-intuitive to humans), since LM tasks are basically text-completion tasks.
● LLMs sometimes generate unreliable (not based on facts), biased (discriminatory or socially criticized), inconsistent (opposing responses in the same context), or harmful (information that should not be publicly shared) texts.
=> Instruction tuning and chat tuning (Part 3)
• 60. Summary/Take Home Message: Part 2
• 61. Summary/Take Home Message: Part 2 (1/2)
● ① LMsʼ scaling laws: larger is better?
● ② LLMs: GPT-3: very fluent, like a human writer
● ③ Prompt engineering (in-context learning)
  ○ Background: avoids the fine-tuning cost (no additional training of the large model is needed)
  ○ Few-shot examples, chain-of-thought, instructions, ...
• 62. Summary/Take Home Message: Part 2 (2/2)
● ④ Achievements and Remaining Issues: LLMs and Prompts
  ○ Achievements
    Very fluent, like a human writer
    Solves many language tasks without preparing task-specific supervised data
  ○ Remaining issues
    Prompts can be somewhat artificial
    How do we prevent generating unreliable, biased, inconsistent, or harmful texts?