[introductory]
A Gentle Introduction to Technologies Behind
Language Models
and Recent Achievement in ChatGPT
2023.05.25 (Thu)
PAKDD 2023 Tutorial 2
Jun Suzuki
Tohoku University
Kyosuke Nishida
NTT
Human Informatics Laboratories
Naoaki Okazaki
Tokyo Institute of Technology
● [Part 1, Part 2] https://www.fai.cds.tohoku.ac.jp/research/activities/
● [Part 3, Part 4] https://speakerdeck.com/kyoun/pakdd2023-tutorial
● [Part 5] https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
Schedule
● Overview [This part] (10 min)
● Part 1: [introductory] Language models (LMs) (20 min)
● Part 2: Large language models (LLMs) (20 min)
● Part 3: Technologies underlying ChatGPT-like LLMs (20 min)
● Part 4: Recent achievements in ChatGPT-like LLMs (20 min)
● Part 5: Efforts for Responsible LLMs (20 min)
● Q&A (20 min)
(Plus two short breaks, 10 min each)
Abstract
● Language models (LMs) have a long history in natural language processing (NLP) research. They were mainly used as a text generation module (or for computing the likelihood of word sequences) in machine translation and speech recognition systems, together with translation or acoustic models. In the current neural era, LMs play an essential role in the NLP field: they are integrated into models and systems for almost every NLP task and provide state-of-the-art performance on conventional NLP benchmarks. Their usage is shifting toward something like a world model of language, or a general-purpose feature generator for any language-related task. More recently, since becoming available as public online services, LMs such as ChatGPT and its successor GPT-4 are sometimes treated by the public as general-purpose AI.
● This tutorial will first introduce some introductory topics we should know when discussing recent advances in LMs like ChatGPT. We will then briefly introduce the technologies behind ChatGPT-like LMs. Finally, we will cover ChatGPT's social impacts as recently discussed in public.
Aims of This Tutorial
● This tutorial aims to introduce the factual knowledge needed to discuss the latest LMs to researchers outside the NLP field.
● Another goal is for the audience to understand the strengths and weaknesses of LMs from the viewpoint of LM users, and to support their future studies with the knowledge from this tutorial.
● No prior knowledge of LMs is required.
Brief Overview
● This tutorial will begin with some introductory topics that we should know when discussing recent advances in LMs like ChatGPT.
[Part 1, Part 2] https://www.fai.cds.tohoku.ac.jp/research/activities/
● We will then briefly introduce the technologies behind ChatGPT-like LMs and their achievements.
[Part 3, Part 4] https://speakerdeck.com/kyoun/pakdd2023-tutorial
● Finally, we will cover a topic around ChatGPT-like LMs, Responsible LLMs, as recently discussed by the general public.
[Part 5] https://speakerdeck.com/chokkan/efforts-for-responsible-llms-pakdd-2023-tutorial-2
Contents (1/5)
● Part 1: [introductory] Neural Language models (LMs)
◆ Traditional Definition of LMs
◆ Typical Base Model Architecture: Transformer
◆ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ Universal Features
◆ Pre-training & Fine-tuning Scheme
◆ Multi-task Learner
Contents (2/5)
● Part 2: Large Language Models (LLMs)
◆ LMs’ Scaling Laws (Parameter size ver.)
◆ LLMs: GPT-3
◆ Prompt Engineering
◆ Achievement and Remaining Issues
Contents (3/5)
● Part 3: Technologies underlying ChatGPT-like LLMs
◆ Codex
◆ InstructGPT and ChatGPT
◆ GPT-4 and ChatGPT plugins
◆ LLaMA
◆ Alpaca and Vicuna
◆ MPT
◆ LLaVA
Contents (4/5)
● Part 4: Recent achievements in ChatGPT-like LLMs
◆ Performance in Natural Language Processing benchmarks and exams
◆ Interesting results in Vision-and-Language Understanding evaluations
◆ Open-source applications powered by LLMs
◆ Remaining Issues
Contents (5/5)
● Part 5: Efforts for Responsible LLMs
◆ High-level overview of potential harms of LLMs
◆ Efforts for reducing potential harms in GPT-4 and PaLM 2
PART 1
Part 1: [introductory] Neural Language Models (LMs)
We only discuss neural LMs.
=> In this talk, "LM" means "neural LM" (unless otherwise specified).
Selected LMs Discussed in This Talk
For each model: ① model nickname, ② date first published or announced in public, ③ URL of the announcement (e.g., paper or blog).
● ELMo | 2018.02 | arxiv.org/abs/1802.05365
● GPT-1 | 2018.06 | openai.com/research/language-unsupervised
● BERT | 2018.10 | arxiv.org/abs/1810.04805
● GPT-2 | 2019.02 | openai.com/blog/better-language-models/
● RoBERTa | 2019.07 | arxiv.org/abs/1907.11692
● T5 | 2019.10 | arxiv.org/abs/1910.10683
● GPT-3 | 2020.05 | arxiv.org/abs/2005.14165
● FLAN | 2021.09 | arxiv.org/abs/2109.01652
● T0 | 2021.10 | arxiv.org/abs/2110.08207
● Megatron-Turing NLG | 2021.10 | https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
● BLOOM | 2022.01 | https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml
● Chinchilla | 2022.03 | arxiv.org/abs/2203.15556
● PaLM | 2022.04 | storage.googleapis.com/pathways-language-model/PaLM-paper.pdf
● OPT | 2022.05 | arxiv.org/abs/2205.01068
● LLaMA | 2023.02 | research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
● GPT-4 | 2023.03 | cdn.openai.com/papers/gpt-4.pdf
● PaLM 2 | 2023.05 | ai.google/static/documents/palm2techreport.pdf
Contents of Part 1
● Part 1: [introductory] Neural Language Models (LMs)
◆ ① Traditional Definition of LMs
◆ ② Typical Base Model Architecture: Transformer
◆ ③ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ ④ Universal Features
◆ ⑤ Pre-training & Fine-tuning Scheme
◆ ⑥ Multi-task Learner
① Traditional Definition of Language Model (LM)
Language Model = Probabilistic model
◆ Modeling the probabilities of all possible next words (a probability distribution over the vocabulary), given a context

$$P_\theta(Y) = \prod_{j=1}^{J+1} P_\theta(y_j \mid Y_{<j})$$

● $Y_{<j}$: context, i.e., the prefix text before the j-th word
● $y_j$: target word, i.e., the j-th word

Example: given the context "the ocean is ___", the LM assigns a conditional probability to every word in the vocabulary (the, dark, blue, white, today, fine, excellent, good, scary, ...); extending the context to "the dark ocean is ___" changes the distribution.
Example of Traditional LMs: 𝑛-gram LM
● Example: Google 𝑛-gram (counts of consecutive 𝑛 words)
https://ai.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html

Context "serve as the" (total count: 187491):

| Target word | Count | Probability |
| incoming | 92 | 0.00049 |
| independent | 794 | 0.00423 |
| index | 223 | 0.00119 |
| indication | 72 | 0.00038 |
| indicator | 120 | 0.00064 |
| indicators | 45 | 0.00024 |
| indispensable | 111 | 0.00059 |
| indispensible | 40 | 0.00021 |
| individual | 234 | 0.00125 |
| industrial | 52 | 0.00028 |
| industry | 607 | 0.00324 |
| info | 42 | 0.00022 |
| informal | 102 | 0.00054 |
| information | 838 | 0.00447 |
| informational | 41 | 0.00022 |
| initial | 5331 | 0.02843 |
| initiating | 125 | 0.00067 |
| initiation | 63 | 0.00034 |
| initiator | 81 | 0.00043 |
| ... | ... | ... |

Each probability is the count divided by the context total, e.g., P(initial | serve as the) = 5331 / 187491 ≈ 0.02843.
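The table above is exactly a maximum-likelihood 𝑛-gram estimate: divide each count by the context total. A minimal sketch using a few of the counts shown above:

```python
from collections import Counter

# Count-based n-gram LM estimate, mirroring the table:
# P(target | context) = count(context + target) / count(context)
ngram_counts = Counter({
    ("serve", "as", "the", "initial"): 5331,
    ("serve", "as", "the", "information"): 838,
    ("serve", "as", "the", "independent"): 794,
    # ... remaining 4-gram counts from the table
})
context_total = 187491  # total count of the context "serve as the"

def ngram_probability(context, target):
    return ngram_counts[tuple(context) + (target,)] / context_total

print(ngram_probability(["serve", "as", "the"], "initial"))  # ~0.02843
```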
Neural LM
● Fit the probability distribution over next words, given a context, with a (deep) neural network
[Figure: training a neural LM. At each time step, the network reads the context (e.g., "serve as the" from the training text "... serve as the input ...") and outputs an estimated probability distribution over the vocabulary (e.g., incoming 0.03, independent 0.12, index 0.04, indication 0.01, ..., input 0.11, initial 0.21, ...). The parameters are updated toward the one-hot correct-answer vector for the target word ("input").]
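A minimal PyTorch sketch of the training step in the figure, with illustrative layer sizes (the tutorial does not prescribe a specific architecture): the network maps a context to logits over the vocabulary, and cross-entropy pushes the estimated distribution toward the one-hot correct answer.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 128, 256  # illustrative sizes

class TinyNeuralLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)   # logits over the vocabulary

    def forward(self, context_ids):                    # context_ids: (batch, time)
        h, _ = self.rnn(self.embed(context_ids))
        return self.out(h[:, -1, :])                   # next-word logits

model = TinyNeuralLM()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()                        # softmax + negative log-likelihood

context = torch.randint(0, vocab_size, (1, 3))         # e.g., "serve as the" as word ids
target = torch.randint(0, vocab_size, (1,))            # e.g., "input"
loss = loss_fn(model(context), target)                 # compare estimate to correct answer
optimizer.zero_grad()
loss.backward()                                        # compute parameter update direction
optimizer.step()
```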
Typical usages of LMs (1/2)
● If we have an LM, we can ...
1. Evaluate the likelihood of given texts
Example: statistical machine translation (SMT) system
[Figure: given an input text, the translation model proposes intermediate candidate outputs, each with a translation score; the language model (LM) assigns each candidate an LM score for fluency; the system combines both scores to select the final output text.]
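The reranking idea in the figure fits in a few lines. The scores below are made-up illustrative log-probabilities, and the weighted log-linear combination is one common choice, not the only one:

```python
# Combine a translation-model score with an LM score and pick the candidate
# that is both adequate (TM) and fluent (LM).
candidates = [
    {"text": "candidate A", "tm_score": -21.5, "lm_score": -35.2},
    {"text": "candidate B", "tm_score": -145.2, "lm_score": -21.2},
    {"text": "candidate C", "tm_score": -72.4, "lm_score": -119.2},
]

def combined_score(c, lm_weight=1.0):
    return c["tm_score"] + lm_weight * c["lm_score"]  # log-linear combination

best = max(candidates, key=combined_score)
print(best["text"])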
Typical usages of LMs (2/2)
● If we have an LM, we can ...
2. Generate texts
● Estimate the next word one-by-one, auto-regressively
● For greedy search, compute at each step

$$\hat{y}_j = \operatorname*{argmax}_{y} \; P_\theta(y \mid Y_{<j})$$
Typical usages of LMs (2/2)
Example: generating text with a neural LM (e.g., a generative pre-trained transformer: GPT).
[Figure: given the context "We have never met before, right?", the model estimates the continuation word by word: "Nice" → "to" → "meet" → "you" → ", too ...". At each step it computes a probability distribution over the entire vocabulary (A, this, that, ..., meet, have, you, ..., Nice, ..., to, ..., too, ",", ".") and, under greedy search, appends the argmax word to the context.]
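A minimal sketch of greedy auto-regressive decoding, reusing the hypothetical `lm` interface from the scoring sketch in ①:

```python
def greedy_generate(lm, prompt_words, max_steps=20, eos="<EOS>"):
    """Greedy decoding: at each step take argmax_y P(y | Y_<j) and extend the context."""
    words = list(prompt_words)
    for _ in range(max_steps):
        dist = lm.next_word_distribution(words)   # P(. | Y_<j) over the vocabulary
        next_word = max(dist, key=dist.get)       # argmax of the distribution
        if next_word == eos:
            break
        words.append(next_word)                   # extend the context and repeat
    return words
```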
Why use a Neural LM?
● Rough comparison of the general abilities of the 𝑛-gram LM and the neural LM:

| | 𝑛-gram LM | Neural LM |
| Computational cost | 😩 | 😋 |
| Longer context | 😩 | 😋 |
| Unseen context | 😩 | 😋 |
| Performance (in terms of perplexity) | 😩 | 😋 |
② Typical Base Model Architecture: Transformer
◆ Surprisingly, all the famous LMs developed recently have chosen the Transformer as their base model
Transformer (1/2)
● 2017: First draft paper published ("Attention Is All You Need")
◆ Developed primarily for machine translation tasks
arXiv submission history (from Ashish Vaswani):
[v1] Mon, 12 Jun 2017 17:57:34 UTC
[v2] Mon, 19 Jun 2017 16:49:45 UTC
[v3] Tue, 20 Jun 2017 05:20:02 UTC
[v4] Fri, 30 Jun 2017 17:29:30 UTC
[v5] Wed, 6 Dec 2017 03:30:32 UTC
Transformer (2/2)
● 2017: First draft paper published
● 2018: Selected as the base model architecture of BERT (and GPT)
◆ => the fundamental model for language
● 2023 [Current]: used as the base model architecture of (almost) all famous LMs
◆ e.g., GPT-3 (ChatGPT), GPT-4, PaLM (PaLM 2), OPT, LLaMA, BLOOM, ...
(GPT: Generative Pre-trained Transformer)
(BERT: Bidirectional Encoder Representations from Transformers)
[FYI] Transformer for Image Processing
● Also used for image processing tasks
◆ 2020: Vision Transformer (ViT)
◆ Competitive with conventional CNNs
◆ Splits a given image into patches along a grid
https://openreview.net/forum?id=YicbFdNTTy
③ Three Major Model Types
● Encoder type (masked LMs)
◆ e.g., BERT, RoBERTa
◆ Bidirectional
● Decoder type (causal LMs)
◆ e.g., GPT
◆ Unidirectional (left-to-right)
● Encoder-decoder type
◆ e.g., T5
◆ Encoder: bidirectional; Decoder: unidirectional
Note: we do not discuss the encoder type in depth in this talk.
Reason: this type has become less central recently in the context of "generative AI".
④ LMs as Universal Features
● Typical usages of LMs => If we have an LM, we can ...
◆ 1. Evaluate the likelihood of given texts (identical to traditional usages, as with 𝑛-gram LMs)
◆ 2. Generate texts (also a traditional usage)
◆ 3. Use LMs as universal features (the pioneering usage discussed here)
● A novel usage of LMs
● The representational ability of neural networks has enabled it
● All recent LMs explicitly or implicitly take this approach
LMs as Universal Features
● LM training => a neural LM implicitly encodes/captures many linguistic aspects, such as:
◆ The distribution of word occurrences given contexts
◆ Semantically similar/dissimilar expressions
◆ Syntactic/semantic structural information
● Such learned linguistic aspects can help a lot with many NLP tasks
⑤ Pre-training / Fine-tuning Scheme
● Two-stage training scheme of LMs
◆ First step: Pre-training
● Train LMs on extremely many raw texts obtained from the web (large-scale datasets, e.g., news articles, Wikipedia, arXiv, GitHub)
◆ Second step: Fine-tuning
● Train LMs on human-annotated data
◆ Human-annotated data: relatively small
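As an illustration of the two-stage scheme, here is a sketch assuming the Hugging Face `transformers` library and a toy annotated dataset (not the pipeline of any specific model in this talk): load a model that has already been pre-trained on large web text, then fine-tune it on a small labeled set.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # step 1: a pre-trained LM
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

annotated_examples = ["review: great movie => positive",
                      "review: boring plot => negative"]  # toy annotated data

model.train()                                        # step 2: fine-tune
for text in annotated_examples:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```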
Viewpoint from the Data Sparsity Problem
● Pre-training / fine-tuning scheme
◆ A variant of transfer learning
◆ Requires less human-annotated data to achieve reasonable task performance
● Frees us from having to prepare a large amount of human annotation data for every NLP task
◆ Increasing human annotation data is relatively expensive and time-consuming
● May solve (or at least alleviate) an essential problem in the NLP community, the data sparsity problem, which has remained unsolved for a long time
⑥ LMs as Multitask Learner
● Tackle any NLP task with a single model
◆ Made possible by casting every task as a unified "text-to-text" generation problem (see the sketch below)
Example: T5
[Figure copied from https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html]
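A minimal sketch of T5-style text-to-text usage, assuming the Hugging Face `transformers` library and the public `t5-small` checkpoint; switching tasks is just switching the prompt prefix:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks, one model: only the text prefix changes.
for prompt in ["translate English to German: Hello!",
               "summarize: Today's news: stock prices rose, ..."]:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=30)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```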
LMs as Multitask Learner + Pre-training / Fine-tuning Scheme
● A single LM can solve many NLP tasks
[Figure: first step, pre-training on a large-scale dataset from the Web (e.g., news articles, Wikipedia, arXiv, GitHub); second step, fine-tuning on task data (MT data, QA data, SA data, ...) in a "text-to-text" format. The resulting single language model handles fact checking, sentiment analysis ("○○ is good / is bad"), machine translation ("Hello!" → "こんにちは"), text summarization (e.g., condensing today's news on stock prices, the ordinary Diet session, Tokyo, ...), and question answering.]
T5 (and partially GPT-2): pioneers of this approach
=> a first trial toward artificial general intelligence in the LM literature
● Summary/Take Home Message: Part 1
Summary/Take Home Message: Part 1 (1/2)
● ① Language Model: Probabilistic Model
◆ Modeling the probabilities of all possible next words (a probability distribution over the vocabulary), given a context
● ② Base Model Architecture: Transformer
◆ Developed mainly for neural machine translation (NMT)
● ③ Model Types
◆ Encoder type (not discussed further in this talk)
◆ Decoder type
◆ Encoder-decoder type
Summary/Take Home Message: Part 1 (2/2)
● ④ Universal Features
◆ Neural LMs implicitly encode/capture many linguistic aspects
● e.g., ELMo, BERT, GPT-2, RoBERTa, ...
● ⑤ Pre-training & Fine-tuning Scheme
◆ A variant of transfer learning (from pre-trained LMs)
◆ Frees us from having to prepare a large amount of human annotation data for every NLP task
◆ Alleviates the data sparsity problem (a long-standing problem in NLP)
● ⑥ Multi-task Learner
◆ Tackle any NLP task with a single model
◆ A step toward artificial general intelligence
PART 2
Part 2: Large language models (LLMs)
Contents of Part 2
● Part 2: Large Language Models (LLMs)
◆ ① LMs' Scaling Laws (Parameter size ver.)
◆ ② LLMs: GPT-3
◆ ③ Prompt Engineering
◆ ④ Achievement: LLMs and Prompts
◆ ⑤ Remaining Issues: LLMs and Prompts
① LMs' Scaling Laws (Parameter size)
● General tendency: more parameters => better performance!?
[Kaplan+, 2020] Scaling Laws for Neural Language Models, arXiv:2001.08361
[Figure from the paper: test loss vs. N, the number of parameters (w/o embedding vectors), on a logarithmic scale; smaller loss is better.]
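For reference, the parameter-size law in [Kaplan+, 2020] has a simple power-law form; the constants below are the approximate fits reported in the paper for non-embedding parameters, quoted from memory, so treat them as indicative:

$$L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}$$

In other words, every ×10 in N multiplies the loss by roughly $10^{-0.076} \approx 0.84$, which is why the curve looks like a straight line on log-log axes.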
LMs' Scaling Laws: Parameter Size
[Figure: model size (in billions of parameters, logarithmic scale) vs. release date from 01/2018 to 10/2023, covering ELMo, GPT-1, BERT, GPT-2, RoBERTa, T5, GPT-3, Megatron-Turing NLG, BLOOM, Chinchilla, PaLM, OPT, LLaMA, GPT-4 (?), and PaLM 2. Model size grew roughly ×10 per year.]
② LLMs: GPT-3 (https://arxiv.org/abs/2005.14165)
● Introduced several new concepts
◆ 1. Larger model: the first >100B-parameter LM
◆ 2. The potential of prompt engineering (in-context learning)
● Few-shot learning
● Instruction
[Figure: the same model-size timeline as above; GPT-3 is a large jump-up in model size (note: the y-axis is log-scale).]
GPT-3: Impact of Using Large Neural Models
● By scaling up the model parameters, LMs gain:
◆ 1. The capacity to memorize much of the training data in its entirety
◆ 2. The ability to produce very fluent sentences, like a human writer
◆ 3. Patterns for solving simple calculations, reasoning, argumentative thinking, common sense, etc., at an elementary level
(Some might say they "gain the ability to solve/think through simple calculations...", but we avoid that phrasing in this talk)
Generate Just Like Human Writers (?)
● Trial: generating fake news
Figure 3.14 (GPT-3 paper): the GPT-3-generated news article that humans had the greatest difficulty distinguishing from a human-written article (accuracy: 12%).
Title: United Methodists Agree to Historic Split
Subtitle: Those who oppose gay marriage will form their own denomination
Article: After two days of intense debate, the United Methodist Church has agreed to a historic split
- one that is expected to end in the creation of a new denomination, one that will be "theologically
and socially conservative," according to The Washington Post. The majority of delegates attending
the church's annual General Conference in May voted to strengthen a ban on the ordination of
LGBTQ clergy and to write new rules that will "discipline" clergy who officiate at same-sex weddings.
But those who opposed these measures have a new plan: They say they will form a separate
denomination by 2020, calling their church the Christian Methodist denomination.
The Post notes that the denomination, which claims 12.5 million members, was in the early 20th
century the "largest Protestant denomination in the U.S.,” but that it has been shrinking in recent
decades. The new split will be the second in the church's history. The first occurred in 1968, when
roughly 10 percent of the denomination left to form the Evangelical United Brethren Church. The
Post notes that the proposed split "comes at a critical time for the church, which has been losing
members for years," which has been "pushed toward the brink of a schism over the role of LGBTQ
people in the church." Gay marriage is not the only issue that has divided the church. In 2016, the
denomination was split over ordination of transgender clergy, with the North Pacific regional
conference voting to ban them from serving as clergy, and the South Pacific regional conference
voting to allow them.
(The model was given the title, the subtitle, and the prefix (prompt) word "Article:".)
Generate Texts Just Like Human Writers?
● Several reports attest to GPT-3's ability to generate fluent sentences that can potentially fool us:
◆ A computer science student at UC Berkeley built a fake blog site using GPT-3; one of the fake blog posts reached #1 on Hacker News.
https://www.technologyreview.com/2020/08/14/1006780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/
◆ Someone posted on a Reddit message board for nearly a week using GPT-3. The developer eventually stopped the posting, but many readers were unaware that the posts were AI-generated.
https://metastable.org/gpt-3.html
Effectiveness of LLMs
● Task performance on several NLP tasks (see https://arxiv.org/abs/2005.14165)
③ Prompt Engineering
● As model size grows, the fine-tuning cost also grows
● To reduce the computational cost => methods that require no additional training of pre-trained LLMs
● An alternative to the pre-training / fine-tuning scheme
What is a Prompt?
● The starting text given to an LM to generate subsequent text; equivalent to the "context" (prefix text) of an LM
Example (standard text completion): prompt "Sensoji is" => Language Model => "the oldest temple in Tokyo ..."
Example (the machine translation experiments in the GPT-2 & GPT-3 papers), context format "English sentence = Japanese sentence":
  Hello. = こんにちは。
  It is snowing. = 雪が降っている
  Today it is sunny. = 今日は晴天です。
  Yesterday it rained. =
Note: somewhat confusingly, we now usually call the entire starting text (prefix text) the "prompt", not just the last sentence, since we place several different types of text in it. (The usage resembles the "prompt" of a shell in a terminal.)
More Information in Context
● Conventional LMs
◆ Generate sentences following the context words => the LM's basic task: text completion
● LM + prompt
◆ Generate sentences by giving, as context words, instructions on how we expect the LM to solve a target task
● e.g., 1. examples (exemplars); 2. (human) instructions explaining the target task
◆ Elicit the implicit capabilities of the LLM to solve the given task, without changing (no learning of) the model parameters
● Prerequisite: the capability was already acquired through pre-training (=> in general, the LM cannot solve a task it has not learned)
◆ Use the context to "control" generation
Example of Prompt: Few-shot (1/3)
● Equivalent to the "context" (prefix text)
◆ But not just a standard prefix: it serves as an order or instruction for what we want the LM to generate
Example (question answering). Q: What is the oldest temple in Tokyo? A: Sensoji / Sensoji Temple
Prompt (instruction + question q): What is the oldest temple in Tokyo? The answer is "
Generative Pre-trained Transformer (GPT) => Estimation e: Sensoji Temple" <EOS>
Example of Prompt: Few-shot (2/3)
● A setting where the model is given a few demonstrations (examples/exemplars) of the task at inference time, as sketched below
◆ No weight updates
https://arxiv.org/abs/2005.14165
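A minimal sketch of assembling such a few-shot prompt; the translation demonstrations follow the format shown in the GPT-3 paper's figure:

```python
# Build a few-shot prompt: task description, demonstrations, and an
# unfinished final line for the model to complete. No weights are updated;
# the string is simply fed to the LM as context.
task_description = "Translate English to French:"
demonstrations = [("sea otter", "loutre de mer"),
                  ("peppermint", "menthe poivrée")]
query = "cheese"

prompt = task_description + "\n"
for english, french in demonstrations:
    prompt += f"{english} => {french}\n"   # few-shot examples (exemplars)
prompt += f"{query} =>"                     # the model completes this line

print(prompt)
```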
Effectiveness of Few-shot Learning
● Task performance on several NLP tasks (see https://arxiv.org/abs/2005.14165)
Example of Prompt: Chain-of-thought (3/3)
● Scaling up model size alone has not proved sufficient for high performance on challenging tasks
◆ Arithmetic, commonsense, and symbolic reasoning
● Chain-of-thought (CoT) prompting improves performance: the prompt contains an example (one-shot) with intermediate reasoning steps, followed by the question (see the sketch below)
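A typical one-shot CoT prompt, following the well-known example from the CoT paper (Wei+, 2022); the demonstration answer spells out intermediate steps that the model then imitates:

```python
# One-shot chain-of-thought prompt: the demonstration includes the reasoning,
# not just the final answer.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""  # the LM is expected to continue with step-by-step reasoning
```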
Example of Prompt: Chain-of-thought (3/3)
● Examples of CoT prompting for challenging tasks
Effectiveness of CoT Prompting
● Performance of CoT prompting on challenging tasks
④ Achievement and Remaining Issues
● Achievement
◆ LLMs
● Generate text as fluent as human-written sentences
◆ Prompt engineering
● Solve many language tasks without 1) parameter updates or 2) preparing task-specific supervised data
Remaining Issues
● Prompts: not consistently effective
◆ Limited availability, dependent on the pre-training data
● Effective prompts are often somewhat artificial (or non-intuitive to humans)
◆ LM tasks are basically text-completion tasks
● LLMs sometimes generate texts that are unreliable (not based on facts), biased (discriminatory or socially criticized), inconsistent (contradictory responses in the same context), or harmful (information that should not be publicly shared)
=> Instruction tuning and chat tuning (Part 3)
● Summary/Take Home Message: Part 2
Summary/Take Home Message: Part 2 (1/2)
● ① LMs' scaling laws
◆ Larger is better?
● ② LLMs: GPT-3
◆ As fluent as a human writer
● ③ Prompt engineering (in-context learning)
◆ Motivation: avoiding the fine-tuning cost
● No additional cost for training a large model
◆ Few-shot examples, chain-of-thought, instruction, ...
Summary/Take Home Message: Part 2 (2/2)
● ④ Achievement and Remaining Issues: LLMs and Prompts
◆ Achievement
● As fluent as a human writer
● Solve many language tasks without preparing task-specific supervised data
◆ Remaining Issues
● Prompts can be somewhat artificial
● How to prevent the generation of unreliable, biased, inconsistent, or harmful texts
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

PAKDD2023_Tutorial_T2 (Overview, Part 1, and Part 2)

• 6. Contents (1/5)
● Part 1: [introductory] Neural Language Models (LMs)
◆ Traditional Definition of LMs
◆ Typical Base Model Architecture: Transformer
◆ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ Universal Features
◆ Pre-training & Fine-tuning Scheme
◆ Multi-task Learner

• 7. Contents (2/5)
● Part 2: Large Language Models (LLMs)
◆ LMs' Scaling Laws (Parameter size ver.)
◆ LLMs: GPT-3
◆ Prompt Engineering
◆ Achievement and Remaining Issues

• 8. Contents (3/5)
● Part 3: Technologies underlying ChatGPT-like LLMs
◆ Codex
◆ InstructGPT and ChatGPT
◆ GPT-4 and ChatGPT plugins
◆ LLaMA
◆ Alpaca and Vicuna
◆ MPT
◆ LLaVA

• 9. Contents (4/5)
● Part 4: Recent achievements in ChatGPT-like LLMs
◆ Performance in Natural Language Processing benchmarks and exams
◆ Interesting results in Vision-and-Language Understanding evaluations
◆ Open-source applications powered by LLMs
◆ Remaining Issues

• 10. Contents (5/5)
● Part 5: Efforts for Responsible LLMs
◆ High-level overview of potential harms of LLMs
◆ Efforts for reducing potential harms in GPT-4 and PaLM 2
• 11. PART 1
• 12. Part 1: [introductory] Neural Language Models (LMs)
● We only discuss neural LMs => in this talk, "LM" means "neural LM" (unless otherwise specified)
• 13.–14. Selected LMs Discussed in This Talk
Timeline of selected models (① model nickname, ② date first published or announced in public, ③ URL of the announcement, e.g., paper or blog):
● ELMo (2018.02): arxiv.org/abs/1802.05365
● GPT-1 (2018.06): openai.com/research/language-unsupervised
● BERT (2018.10): arxiv.org/abs/1810.04805
● GPT-2 (2019.02): openai.com/blog/better-language-models/
● RoBERTa (2019.07): arxiv.org/abs/1907.11692
● T5 (2019.10): arxiv.org/abs/1910.10683
● GPT-3 (2020.05): arxiv.org/abs/2005.14165
● FLAN (2021.09): arxiv.org/abs/2109.01652
● T0 (2021.10): arxiv.org/abs/2110.08207
● Megatron-Turing NLG (2021.10): www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
● BLOOM (2022.01): github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml
● Chinchilla (2022.03): arxiv.org/abs/2203.15556
● PaLM (2022.04): storage.googleapis.com/pathways-language-model/PaLM-paper.pdf
● OPT (2022.05): arxiv.org/abs/2205.01068
● LLaMA (2023.02): research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
● GPT-4 (2023.03): cdn.openai.com/papers/gpt-4.pdf
● PaLM 2 (2023.05): ai.google/static/documents/palm2techreport.pdf
• 15. Contents of Part 1
● Part 1: [introductory] Neural Language Models (LMs)
◆ ① Traditional Definition of LMs
◆ ② Typical Base Model Architecture: Transformer
◆ ③ Three Major Model Types: Encoder, Decoder, Encoder-decoder
◆ ④ Universal Features
◆ ⑤ Pre-training & Fine-tuning Scheme
◆ ⑥ Multi-task Learner
• 16. ① Traditional Definition of Language Model (LM)
● Language model = probabilistic model
◆ Modeling the probabilities of all possible next words (a probability distribution over the vocabulary), given a context:
$$P_\theta(Y) = \prod_{j=1}^{J+1} P_\theta(y_j \mid Y_{<j})$$
where $Y_{<j}$ is the context (the prefix text before the $j$-th word) and $y_j$ is the target ($j$-th) word.
● Example: given the context "the dark ocean is __", the model assigns a conditional probability to every word in the vocabulary (white, today, fine, excellent, good, scary, ...). A minimal sketch follows below.
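To make the chain-rule definition concrete, here is a minimal sketch (not from the slides; the toy probability table is invented for illustration) that scores a word sequence as a product of conditional probabilities:

```python
# Minimal sketch of P(Y) = prod_j P(y_j | Y_<j), with a hypothetical
# toy table of conditional probabilities standing in for a trained LM.
import math

toy_lm = {
    ("<s>",): {"the": 0.40},
    ("<s>", "the"): {"dark": 0.05},
    ("<s>", "the", "dark"): {"ocean": 0.10},
    ("<s>", "the", "dark", "ocean"): {"is": 0.30},
    ("<s>", "the", "dark", "ocean", "is"): {"scary": 0.10, "blue": 0.20},
}

def sequence_log_prob(words):
    """Sum log P(y_j | Y_<j) over the sequence (log of the product)."""
    context, logp = ("<s>",), 0.0
    for w in words:
        logp += math.log(toy_lm[context][w])
        context += (w,)
    return logp

print(sequence_log_prob(["the", "dark", "ocean", "is", "scary"]))
```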
• 17. Example of Traditional LMs: n-gram LM
● Example: Google n-gram (https://ai.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html)
● Counts of consecutive n words give the conditional probability P(target word | context); here the context is "serve as the", with total count 187491:

  Target word     Count   Probability
  incoming           92   0.00049
  independent       794   0.00423
  index             223   0.00119
  indication         72   0.00038
  indicator         120   0.00064
  indicators         45   0.00024
  indispensable     111   0.00059
  indispensible      40   0.00021
  individual        234   0.00125
  industrial         52   0.00028
  industry          607   0.00324
  info               42   0.00022
  informal          102   0.00054
  information       838   0.00447
  informational      41   0.00022
  initial          5331   0.02843
  initiating        125   0.00067
  initiation         63   0.00034
  initiator          81   0.00043
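The count-based estimate behind the table is simply count(context + word) / count(context). A small sketch using a subset of the counts quoted above:

```python
# Sketch of the n-gram probability estimate: relative frequency of the
# target word after the context "serve as the". Counts are the Google
# n-gram figures quoted on the slide (subset); total = 187491.
counts = {"incoming": 92, "industry": 607, "information": 838, "initial": 5331}
total = 187491

for word, c in counts.items():
    print(f"P({word} | 'serve as the') = {c}/{total} = {c / total:.5f}")
# e.g., P(initial | 'serve as the') = 5331/187491 = 0.02843
```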
• 18. Neural LM
● Fit the probability distribution over a sequence of words, given a context, with a (deep) neural network
● [Figure] At each time step the network reads the input text (training data) "... serve as the input ...", outputs its current estimated probability distribution over the vocabulary (e.g., incoming 0.03, independent 0.12, input 0.11, initial 0.21, ...), and is compared against the one-hot correct-answer vector of the target word (input = 1, all others = 0); the gap gives the parameter update direction. A training-step sketch follows below.
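A minimal sketch of one such training step, assuming a small recurrent network as the "deep neural network" (the slide does not fix an architecture; the model sizes and token ids below are invented):

```python
# One neural-LM training step: predict each next word, compare the
# estimated distribution against the one-hot correct answer via
# cross-entropy, and take a step in the parameter update direction.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
emb = nn.Embedding(vocab_size, d_model)
rnn = nn.LSTM(d_model, d_model, batch_first=True)
head = nn.Linear(d_model, vocab_size)             # hidden state -> vocabulary logits
params = list(emb.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params)

tokens = torch.tensor([[5, 9, 2, 7, 3]])          # toy ids for "serve as the input ..."
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # each position predicts the next word

hidden, _ = rnn(emb(inputs))                      # (1, T-1, d_model)
logits = head(hidden)                             # estimated distribution per time step
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()                                   # gradient = parameter update direction
opt.step()
```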
• 19. Typical usages of LMs (1/2)
● If we have an LM, we can ...
1. Evaluate the likelihood of given texts
● Example: a statistical machine translation (SMT) system. [Figure] Given an input text, the translation model proposes candidate output texts with translation scores; the language model (LM) assigns each intermediate candidate an LM score, and the combined scores rank the candidates.
• 20.–21. Typical usages of LMs (2/2)
● If we have an LM, we can ...
2. Generate texts
● Estimate the next word one-by-one, auto-regressively
● For greedy search, compute $\hat{y}_j = \operatorname{argmax}_{y_j} P_\theta(y_j \mid Y_{<j})$ at each step
● Example: generating text with a neural LM (e.g., a generative pre-trained transformer, GPT). Given the context "We have never met before, right?", the model picks from the whole vocabulary step by step: "Nice" → "to" → "meet" → "you" → ", too ...". A sketch of this loop follows below.
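A sketch of the greedy loop with a public checkpoint (the Hugging Face GPT-2 model is used here as a stand-in for "a neural LM"; the prompt and length are arbitrary):

```python
# Greedy auto-regressive generation: at each step take the argmax over
# the vocabulary distribution and append it to the context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("We have never met before, right?", return_tensors="pt").input_ids
for _ in range(8):                        # generate 8 tokens greedily
    logits = lm(ids).logits               # (1, T, vocab_size)
    next_id = logits[0, -1].argmax()      # argmax_y P(y | Y_<j)
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))
```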
• 22. Why Use a Neural LM
● Rough comparison of the general abilities of the n-gram LM and the neural LM (😩 = weak, 😋 = strong):

                                          n-gram LM   Neural LM
  Computational cost                         😩          😋
  Longer context                             😩          😋
  Unseen context                             😩          😋
  Performance (in terms of perplexity)       😩          😋
• 23. ② Typical Base Model Architecture: Transformer
◆ Surprisingly, all the famous LMs developed recently have chosen the Transformer as their base model

• 24. Transformer (1/2)
● 2017: first draft paper published
◆ Developed primarily for machine translation tasks
● [arXiv submission history] From: Ashish Vaswani. v1: Mon, 12 Jun 2017; v2: Mon, 19 Jun 2017; v3: Tue, 20 Jun 2017; v4: Fri, 30 Jun 2017; v5: Wed, 6 Dec 2017

• 25. Transformer (2/2)
● 2017: first draft paper published
● 2018: selected as the base model architecture of BERT (and GPT) => a fundamental model for language
◆ (BERT: Bidirectional Encoder Representations from Transformers; GPT: Generative Pre-trained Transformer)
● 2023 [current]: used as the base model architecture for (almost) all famous LMs
◆ e.g., GPT-3 (ChatGPT), GPT-4, PaLM (PaLM 2), OPT, LLaMA, BLOOM, ...
A minimal sketch of the attention operation at its core follows below.
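As a pointer to what "Transformer" means mechanically, here is a minimal sketch of the scaled dot-product attention at its core (toy sizes; single head, no learned projections):

```python
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import torch

def attention(q, k, v, mask=None):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5   # (T, T) token-to-token scores
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v      # weighted sum of value vectors

T, d = 5, 16                      # 5 tokens, 16-dim vectors (toy sizes)
q = k = v = torch.randn(T, d)     # self-attention: Q, K, V from the same sequence
print(attention(q, k, v).shape)   # torch.Size([5, 16])
```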
• 26. [FYI] Transformer for Image Processing
● Also used for image processing tasks
◆ 2020: Vision Transformer (ViT), which splits a given image into parts according to a mesh
◆ Competitive with conventional CNNs
● https://openreview.net/forum?id=YicbFdNTTy
• 27. ③ Three Major Model Types
● Encoder type (masked LMs)
◆ e.g., BERT, RoBERTa
◆ Bidirectional
● Decoder type (causal LMs)
◆ e.g., GPT
◆ Unidirectional (left-to-right)
● Encoder-decoder type
◆ e.g., T5
◆ Encoder: bidirectional; decoder: unidirectional
● We do not discuss the encoder type deeply in this talk. Reason: this type has become less important these days in the context of "generative AI". (A small masking sketch follows below.)
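The three types differ mainly in how attention is masked; a small sketch (reusing the `attention()` function above):

```python
# Decoder (causal LM): a triangular mask blocks future positions, so the
# model reads left-to-right. Encoder (masked LM): no mask, every position
# attends in both directions. Encoder-decoder combines the two.
import torch

T = 5
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
print(causal_mask)   # True = blocked: position j cannot see positions > j
# Pass mask=causal_mask to the attention() sketch above for decoder-style
# attention; pass mask=None for encoder-style (bidirectional) attention.
```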
• 28. ④ LMs as Universal Features
● Typical usages of LMs: if we have an LM, we can ...
◆ 1. Evaluate the likelihood of given texts
◆ 2. Generate texts
(usages 1 and 2 are identical to traditional usages, as with n-gram LMs)
◆ 3. Use LMs as universal features
- A novel usage of LMs, enabled by the ability of neural networks (ELMo is the pioneer of this usage)
- All LMs explicitly/implicitly take this approach

• 29. LMs as Universal Features
● LM training implicitly encodes/captures many linguistic aspects, such as:
◆ The distribution of word occurrences given contexts
◆ Semantically similar/dissimilar expressions
◆ Syntactic/semantic structural information
● Such learned linguistic aspects can help a lot on many NLP tasks

• 30. ⑤ Pre-training / Fine-tuning Scheme
● A two-stage training scheme for LMs:
◆ 1. Pre-training (first step): train the LM on extremely many raw texts obtained from the web (large-scale datasets, e.g., news articles, Wikipedia, arXiv, GitHub)
◆ 2. Fine-tuning (second step): train the LM on human-annotated data, which is relatively small

• 31. Viewpoint from the Data Sparsity Problem
● The pre-training / fine-tuning scheme:
◆ A variant of transfer learning
◆ Requires less human-annotated data to achieve reasonable task performance
● Frees us from having to prepare a large amount of human annotation data for every NLP task
◆ Increasing human annotation data is relatively expensive and time-consuming
● May solve (or alleviate) an essential problem, the data sparsity problem, that has remained unsolved in the NLP community for a long time
• 32. ⑥ LMs as a Multitask Learner
● Tackle any NLP task with a single model
◆ Possible by casting every task into a unified "text-to-text" generation format
● Example: T5 (figure copied from https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html); a usage sketch follows below
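A usage sketch of this text-to-text framing with the public T5 checkpoint (the choice of the "t5-small" checkpoint is ours; the task prefixes follow the released model, and the outputs will vary):

```python
# T5 casts every task as text in, text out; the task is selected by a
# plain-text prefix in the input.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")

for prompt in [
    "translate English to German: Hello!",        # machine translation
    "summarize: The United Methodist Church has agreed to a historic split ...",
    "cola sentence: The course is jumping well.",  # acceptability judgment
]:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = t5.generate(ids, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))
```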
• 33. LMs as a Multitask Learner + Pre-training / Fine-tuning Scheme
● A single LM can solve many NLP tasks
● [Figure] First step: pre-training on large-scale web data (e.g., news articles, Wikipedia, arXiv, GitHub). Second step: fine-tuning on MT/QA/SA/... data in the "text-to-text" format. The resulting LM serves fact checking, sentiment analysis ("○○ is good / is bad"), machine translation ("Hello!" → "こんにちは" [Japanese for "Hello"]), text summarization (e.g., Japanese headlines: "Today's news: stock prices ..., the ordinary Diet session ..., Tokyo ..."), and question answering.
● T5 (and partially GPT-2): pioneers of this approach => a first trial toward artificial general intelligence in the LM literature
• 34. Summary/Take Home Message: Part 1

• 35. Summary/Take Home Message: Part 1 (1/2)
● ① Language model: a probabilistic model
◆ Models the probabilities of all possible next words (a probability distribution over the vocabulary), given a context
● ② Base model architecture: Transformer
◆ Developed mainly for neural machine translation (NMT)
● ③ Model types
◆ Encoder type (discussion of this type is omitted in this talk)
◆ Decoder type
◆ Encoder-decoder type

• 36. Summary/Take Home Message: Part 1 (2/2)
● ④ Universal features
◆ Neural LMs implicitly encode/capture many linguistic aspects
◆ e.g., ELMo, BERT, GPT-2, RoBERTa, ...
● ⑤ Pre-training & fine-tuning scheme
◆ A variant of transfer learning (from pre-trained LMs)
◆ Frees us from preparing a large amount of human annotation data for every NLP task
◆ Alleviates the data sparsity problem (a long-standing problem in NLP)
● ⑥ Multi-task learner (towards artificial general intelligence)
◆ Tackles any NLP task with a single model
• 37. PART 2
• 38. Part 2: Large Language Models (LLMs)
• 39.–40. Selected LMs Discussed in This Talk (the model timeline from Part 1, repeated)
• 41. Contents of Part 2
● Part 2: Large Language Models (LLMs)
◆ ① LMs' Scaling Laws (Parameter size ver.)
◆ ② LLMs: GPT-3
◆ ③ Prompt Engineering
◆ ④ Achievement: LLMs and Prompts
◆ ⑤ Remaining Issues: LLMs and Prompts
• 42. ① LMs' Scaling Laws (Parameter size)
● General tendency: more parameters => better performance!?
● [Kaplan+, 2020] Scaling Laws for Neural Language Models, arXiv:2001.08361
● [Figure] Test loss vs. N, the number of parameters (w/o embedding vectors), on a logarithmic scale; smaller is better. A numeric sketch follows below.
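Numerically, the paper fits a power law in N; a sketch using its reported constants (N_c ≈ 8.8e13 non-embedding parameters, α_N ≈ 0.076; treat the exact values as illustrative):

```python
# Kaplan et al. (2020): test loss L(N) ~ (N_c / N)^alpha_N, so every
# ~10x in parameters buys a roughly constant multiplicative loss drop.
def predicted_loss(n_params, n_c=8.8e13, alpha_n=0.076):
    return (n_c / n_params) ** alpha_n

for n in [1e8, 1e9, 1e10, 1e11]:   # 0.1B .. 100B parameters
    print(f"N = {n:.0e}: predicted loss = {predicted_loss(n):.2f}")
```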
• 43. LMs' Scaling Laws: Parameter Size
● [Chart] Model size (in billions of parameters, logarithmic scale, 0.1–1000) vs. announcement date (01/2018–10/2023) for ELMo, GPT-1, BERT, GPT-2, RoBERTa, T5, GPT-3, Megatron-Turing NLG, BLOOM, Chinchilla, PaLM, OPT, LLaMA, GPT-4 (?), and PaLM 2
● Roughly ×10 per year

• 44. ② LLMs: GPT-3
● Introduced several new concepts:
◆ 1. Larger model: the first >100B-parameter LM
◆ 2. Potential of prompt engineering (in-context learning): few-shot learning, instruction
● https://arxiv.org/abs/2005.14165
● [Chart, as on the previous slide] A large jump-up in model size at GPT-3 (note: the y-axis is log-scale)

• 45. GPT-3: Impact of Using Large Neural Models
● By scaling up the model parameters, LMs gain:
◆ 1. The capacity to memorize much of the training data in its entirety
◆ 2. The ability to produce very fluent sentences, like a human writer
◆ 3. Solvable patterns for simple calculations, reasoning, argumentative thinking, common sense, etc., at the elementary level
(Some might say they "gain the ability to solve/think about simple calculations ...", but we do not use that framing in this talk.)

• 46. Generate Just Like Human Writers (?)
● Trial to generate fake news: GPT-3 is given the title, subtitle, and the prefix (prompt) word "Article:"
● Figure 3.14 [GPT-3 paper]: the GPT-3-generated news article that humans had the greatest difficulty distinguishing from a human-written article (accuracy: 12%):
Title: United Methodists Agree to Historic Split
Subtitle: Those who oppose gay marriage will form their own denomination
Article: After two days of intense debate, the United Methodist Church has agreed to a historic split - one that is expected to end in the creation of a new denomination, one that will be "theologically and socially conservative," according to The Washington Post. The majority of delegates attending the church's annual General Conference in May voted to strengthen a ban on the ordination of LGBTQ clergy and to write new rules that will "discipline" clergy who officiate at same-sex weddings. But those who opposed these measures have a new plan: They say they will form a separate denomination by 2020, calling their church the Christian Methodist denomination. The Post notes that the denomination, which claims 12.5 million members, was in the early 20th century the "largest Protestant denomination in the U.S.," but that it has been shrinking in recent decades. The new split will be the second in the church's history. The first occurred in 1968, when roughly 10 percent of the denomination left to form the Evangelical United Brethren Church. The Post notes that the proposed split "comes at a critical time for the church, which has been losing members for years," which has been "pushed toward the brink of a schism over the role of LGBTQ people in the church." Gay marriage is not the only issue that has divided the church. In 2016, the denomination was split over ordination of transgender clergy, with the North Pacific regional conference voting to ban them from serving as clergy, and the South Pacific regional conference voting to allow them.

• 47. Generate Texts Just Like Human Writers?
● Several reports attest to GPT-3's ability to generate fluent sentences that can potentially fool us:
◆ A student who studies computer science at UC Berkeley developed a fake blog site using GPT-3; one of the fake blog posts reached #1 on Hacker News. (https://www.technologyreview.com/2020/08/14/1006780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/)
◆ Someone posted on a Reddit message board for nearly a week using GPT-3; the posting was eventually stopped by the developer, but many readers were unaware that the posts were AI-generated. (https://metastable.org/gpt-3.html)

• 48. Effectiveness of LLMs
● Task performance on several NLP tasks [figures from https://arxiv.org/abs/2005.14165]
• 49. ③ Prompt Engineering
● As the model size becomes larger, the fine-tuning cost also becomes larger
● To reduce the computational cost => methods that require no additional training of the pre-trained LLM
● An alternative to the pre-training / fine-tuning scheme

• 50. What is a Prompt?
● The starting text given to an LM to generate the subsequent text; equivalent to the "context" (prefix text) in LMs
● Example (standard text completion): prompt "Sensoji is" => the LM continues "the oldest temple in Tokyo ..."
● Why "prompt"? Perhaps because it looks like the "prompt" of a shell in a terminal(?)
● Example (machine translation experiment in the GPT-2 & GPT-3 papers), context format "English sentence = Japanese sentence":
Hello. = こんにちは。
It is snowing. = 雪が降っている
Today it is sunny. = 今日は晴天です。
Yesterday it rained. =
● Confusing, but currently we usually call the entire starting text (prefix text) the "prompt", not just the last sentence, since we put several different types of text into it

• 51. More Information in the Context
● Conventional LMs
◆ Generate sentences following the context words => the LM's basic task: text completion
● LM + prompt
◆ Generate sentences by giving, as context words, a sort of instruction for how we expect the LM to solve a target task
◆ e.g., 1. examples (exemplars); 2. a (human) instruction explaining the target task
◆ Elicits the implicit capabilities of the LLM to solve the given task, without changing (no learning of) the model parameters
◆ Prerequisite: the capability was already learned through pre-training (=> in general, a task that hasn't been learned cannot be solved this way)
◆ Uses the context to "control" generation

• 52. Example of a Prompt: Few-shot (1/3)
● Equivalent to the "context" (prefix text), but not just a standard prefix: it is more of an order or instruction for what we want the LM to generate
● Example (question answering) with a generative pre-trained transformer (GPT):
◆ Question q: "What is the oldest temple in Tokyo?"
◆ Prompt (instruction + question): Q: What is the oldest temple in Tokyo? The answer is "
◆ Estimation e: Sensoji Temple " <EOS>
◆ Expected answer: Sensoji / Sensoji Temple
• 53. Example of a Prompt: Few-shot (2/3)
● A setting where the model is given a few demonstrations (examples/exemplars) of the task at inference time
◆ No weight updates
● [Figure from https://arxiv.org/abs/2005.14165]; a prompt-building sketch follows below
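A sketch of assembling such a few-shot prompt as plain text (the demonstrations mirror the English-to-French example in the GPT-3 paper; the model simply completes the string, with no weight updates):

```python
examples = [("sea otter", "loutre de mer"), ("plush girafe", "girafe peluche")]
query = "cheese"

prompt = "Translate English to French:\n"   # task description
for en, fr in examples:                     # a few demonstrations
    prompt += f"{en} => {fr}\n"
prompt += f"{query} =>"                     # the LM continues, e.g., " fromage"
print(prompt)
```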
• 54. Effectiveness of Few-shot Learning
● Task performance on several NLP tasks [figures from https://arxiv.org/abs/2005.14165]

• 55. Example of a Prompt: Chain-of-thought (3/3)
● Scaling up model size alone has not proved sufficient for high performance on challenging tasks
◆ Arithmetic, commonsense, and symbolic reasoning
● Chain-of-thought (CoT) prompting improves performance
● [Figure] A one-shot example (a demonstration with intermediate reasoning) followed by the question
• 56. Example of a Prompt: Chain-of-thought (3/3)
● Examples of CoT prompting for challenging tasks; a sketch follows below
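A sketch of a one-shot CoT prompt in the style of these examples (the demonstration shows intermediate reasoning steps before the answer; the exact wording follows the commonly cited arithmetic example and is illustrative):

```python
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""
print(cot_prompt)   # an LLM is expected to continue with step-by-step reasoning
```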
• 57. Effectiveness of CoT Prompting
● Performance of CoT prompting for challenging tasks

• 58. ④ Achievement and Remaining Issues
● Achievement
◆ LLMs: very fluent, human-like sentences
◆ Prompt engineering: solves many language tasks without 1. parameter updates or 2. preparing task-specific supervised data

• 59. Remaining Issues
● Prompts: not consistently effective
◆ Limited availability; dependent on the pre-training data
● Effective prompts: somewhat artificial (or non-intuitive to humans)
◆ LM tasks are basically text-completion tasks
● LLMs sometimes generate unreliable (not based on facts), biased (discriminatory or socially criticized), inconsistent (opposed responses in the same context), or harmful (information that should not be publicly shared) texts
=> Addressed by instruction tuning and chat tuning (Part 3)
• 60. Summary/Take Home Message: Part 2

• 61. Summary/Take Home Message: Part 2 (1/2)
● ① LMs' scaling laws
◆ Larger is better?
● ② LLMs: GPT-3
◆ Very fluent, like a human writer
● ③ Prompt engineering (in-context learning)
◆ Background: avoiding the fine-tuning cost (no additional training of the large model is needed)
◆ Few-shot examples, chain-of-thought, instruction, ...

• 62. Summary/Take Home Message: Part 2 (2/2)
● ④ Achievement and remaining issues: LLMs and prompts
◆ Achievement:
- Very fluent, like a human writer
- Solves many language tasks without preparing task-specific supervised data
◆ Remaining issues:
- Prompts can be somewhat artificial
- How do we prevent the generation of unreliable, biased, inconsistent, or harmful texts?