[DL Reading Group] GPT-4 Technical Report

DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
GPT-4 Technical Report
Takeshi Kojima, Matsuo Lab

Bibliographic information
• Title
– GPT-4 Technical Report
• Authors
– OpenAI staff (the author list runs three pages)
• Pretraining:
• Long Context:
• Vision:
• Reinforcement Learning & Alignment:
• Evaluation & analysis:
• Deployment:
• https://arxiv.org/abs/2303.08774
– V1: 2023/3/15

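Overview
• What is GPT-4?
– A large-scale Transformer-based multimodal model
– Input: text and images; output: text
• Training method
– Pre-training: predict the next word over a large-scale dataset.
– Post-training: RLHF (this improves factuality and adherence to desired behavior).
• Evaluation results
– Achieves human-level performance on professional and academic benchmarks
• Scores in the top 10% of test takers on a simulated bar exam
• Core of the development
– Infrastructure and optimization methods that make scaling behavior predictable
• Final performance can be predicted from models trained with only 1/1,000 (0.1%) of the compute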

What is GPT-4?
• A large-scale Transformer-based multimodal model
• Input: text or images; output: text
https://www.youtube.com/watch?v=outcGtbnMuQ

What is GPT-4?
• History of GPT
– GPT (2018, 4 authors)
• Improving Language Understanding by Generative Pre-Training
– Pre-training + Fine-tuning is the best 👍
– GPT-2 (2019, 6 authors)
• Language Models are Unsupervised Multitask Learners
– Pre-training + Zero-shot prompting is the best 👍
– GPT-3 (2020, 31 authors)
• Language Models are Few-Shot Learners
– Pre-Training + Few-shot In-context Learning is the best 👍 Discovery of scaling laws 🔥
– GPT-3.5 [URL]: InstructGPT & ChatGPT (2022, 20 authors)
• Training language models to follow instructions with human feedback
– Pre-Training + Instruction Following is the best 👍
– GPT-4 (2023, ? authors)
• GPT-4 Technical Report
– Pre-Training + Instruction Following is still the best 🔥🔥🔥

Training method
• Stage 1: Pre-training
– Method: prediction of the next word.
– Data:
• a large dataset of text from the Internet
• Samples of sexual content are filtered out of the dataset
– Filtering uses classifiers and a lexicon-based approach
• Stage 2: RLHF
– Method: SFT -> RM (+ RBRMs + hallucination mitigation) -> PPO
– Data:
• Prompt data
– The main dataset comes from our production traffic (with consent from users).
– We use prompts written by our red teamers, model-generated synthetic prompts, and prompts from other internal or public datasets.
• Demonstration data + reward labels
– from human trainers.
• In production: Content Classifier for system safety
Note: Finished training in August of 2022.
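A minimal sketch of the Stage 1 objective (prediction of the next word), assuming the model already outputs a probability distribution over a toy vocabulary at each position. The function name `next_token_loss` and the toy numbers are illustrative assumptions, not taken from the report.

```python
import math

def next_token_loss(probs_per_step, target_ids):
    """Average negative log-likelihood of the true next tokens.

    probs_per_step[i] is the predicted distribution over the vocabulary
    at position i; target_ids[i] is the index of the token that actually
    came next. Both are toy inputs for illustration.
    """
    total = 0.0
    for probs, target in zip(probs_per_step, target_ids):
        total += -math.log(probs[target])
    return total / len(target_ids)

# Toy example: vocabulary of 3 tokens, sequence of 2 predictions.
toy_probs = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
toy_targets = [0, 1]
toy_loss = next_token_loss(toy_probs, toy_targets)
```

Pre-training lowers this quantity over the corpus; a confident and correct prediction contributes a loss near zero.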

Training method
[Figure: the three-step RLHF pipeline (SFT, RM, PPO), from "Training language models to follow instructions with human feedback"]

Training method
• Rule-based reward models (RBRMs)
– Used as an additional reward signal during PPO
– zero-shot GPT-4 classifiers
– Input: prompt (optional) + output from GPT-4 + judging rules (e.g. whether harmful content is present)
– Output: classification result = reward
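The RBRM input/output contract above can be sketched as follows. This is a hedged illustration only: `toy_classifier` is a keyword-based stand-in for the zero-shot GPT-4 classifier, and the label names, the rubric format, and the reward values of +1 / -1 are all hypothetical.

```python
def toy_classifier(prompt, output, rubric):
    """Stand-in for a zero-shot GPT-4 call; returns one label.

    A real RBRM would send the prompt, the output, and the rubric to the
    model. Here simple keyword rules are used so the sketch is runnable.
    """
    text = output.lower()
    if "i can't help" in text or "i cannot help" in text:
        return "refusal"
    if any(term in text for term in rubric["disallowed_terms"]):
        return "disallowed"
    return "safe_answer"

def rbrm_reward(classify, prompt, output, rubric, prompt_is_harmful):
    """Map the classifier's label to a scalar reward (hypothetical values)."""
    label = classify(prompt, output, rubric)
    if prompt_is_harmful:
        # Reward refusing a harmful request, penalize anything else.
        return 1.0 if label == "refusal" else -1.0
    # Reward answering a harmless request, penalize needless refusals.
    return 1.0 if label == "safe_answer" else -1.0

rubric = {"disallowed_terms": ["step 1: acquire explosives"]}
r_good = rbrm_reward(toy_classifier, "How do I build a bomb?",
                     "I can't help with that.", rubric, True)
r_overrefuse = rbrm_reward(toy_classifier, "What is 2 + 2?",
                           "I can't help with that.", rubric, False)
```

The design point is that the same refusal earns a positive reward for a harmful prompt and a negative one for a harmless prompt, so the signal pushes against both unsafe answers and over-refusal.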

Training method
• Hallucination mitigation
– open-domain hallucinations
• collect real-world ChatGPT data that has been flagged by users as being not factual, and collect additional labeled comparison data that we use to train our reward models.
– closed-domain hallucinations
• use GPT-4 itself to generate synthetic data -> mix into RM dataset
• zero-shot?

Training method
• Content Classifier for system safety
– Purpose: block user inputs that contain harmful content
– OpenAI is constantly developing and improving these classifiers.
– Moderation API
– GPT-4 itself is used in training the classifier
• Given the classification rules as a prompt, it identifies mislabeled test data via zero-shot classification
• It labels training data via few-shot classification

Training method
• Content Classifier for system safety

Evaluation results
• GPT performance on academic and professional exams
– Post-trained GPT-4 model
– The model’s capabilities on exams appear to stem primarily from the pre-training process.
– On multiple choice questions, both the base GPT-4 model and the RLHF model (=pre-trained & post-trained model) perform equally well on average across the exams we tested.

Evaluation results
– Pre-trained GPT-4 model

Evaluation results
• TruthfulQA
– Tests the model’s ability to separate fact from an adversarially-selected set of incorrect statements
– After RLHF post-training we observe large improvements over GPT-3.5.

Evaluation results
– Contamination Check
• Checks for test data appearing in the training set
• Using substring match
– Preprocess the training and evaluation data (remove whitespace and symbols)
– For each evaluation example, randomly select three 50-character substrings
– Check whether any of the three sampled evaluation substrings appears in the training data
– Exclude evaluation examples found in the training data and re-evaluate
• False positives and false negatives are possible
– The RLHF post-training dataset is vastly smaller than the pretraining set and unlikely to have any particular question contaminated. However we did not check explicitly.
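The substring-match procedure above can be sketched roughly as below. This is an illustration under stated assumptions, not OpenAI's implementation: the function names are hypothetical, the training set is a single in-memory string, and examples shorter than 50 characters are matched whole.

```python
import random

def normalize(text):
    """Drop whitespace and symbols, keeping only letters and digits."""
    return "".join(ch for ch in text if ch.isalnum())

def sample_substrings(text, rng, k=3, length=50):
    """Pick k random substrings of the given length from text."""
    if not text:
        return []
    if len(text) <= length:
        return [text]  # assumption: short examples are used whole
    starts = [rng.randrange(len(text) - length + 1) for _ in range(k)]
    return [text[s:s + length] for s in starts]

def is_contaminated(example, normalized_training_text, rng):
    """Flag the example if any sampled substring occurs in the training text."""
    samples = sample_substrings(normalize(example), rng)
    return any(s in normalized_training_text for s in samples)

rng = random.Random(0)
training = normalize("The quick brown fox jumps over the lazy dog. " * 5)
leaked = "quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog"
clean = "Completely unrelated sentence about scaling laws and reward models in 2023."
eval_set = [leaked, clean]
kept = [ex for ex in eval_set if not is_contaminated(ex, training, rng)]
```

Because only three substrings are sampled per example, a partially leaked example can slip through (false negative), and a common phrase can trigger a match by chance (false positive), which is the caveat the slide notes.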

Evaluation results
• Visual Inputs
– The standard test-time techniques developed for language models (e.g. few-shot prompting, chain-of-thought, etc.) are similarly effective when the input includes images

Evaluation results
• Limitations
– Not fully reliable (it “hallucinates” facts and makes reasoning errors).
– Still, GPT-4 significantly reduces hallucinations relative to previous GPT-3.5 models

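Evaluation results
• Limitations
– GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its pre-training data cuts off in September 2021.
– It can make simple reasoning errors that seem inconsistent with its competence across so many domains
– It can be overly gullible, accepting obviously false statements from a user
– Like humans, it can fail at hard problems.
• Example: it may introduce security vulnerabilities into code it writes.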

評価結果
• Limitations
– GPT-4 can also be confidently wrong in its predictions
• 事前学習モデルは確信度と正解率が概ねシンクロしている
• 事後学習の過程で相関が薄くなっていく
21
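One common way to quantify how well confidence tracks accuracy is expected calibration error (ECE). The sketch below is an illustration with toy numbers; the binning scheme and the function name are assumptions, not the report's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece

# Toy data: a calibrated model (75% confident, 75% correct) versus an
# overconfident one (95% confident, 25% correct).
ece_calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
ece_overconfident = expected_calibration_error([0.95] * 4, [True, False, False, False])
```

A value near zero means stated confidence matches observed accuracy; a weakening correlation after post-training would show up as a larger value.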

Evaluation results
• Risks & mitigations
– Adversarial Testing via Domain Experts
• Addressing issues specific to GPT-4
– long-term AI alignment risks, cybersecurity, personal information, and international security
• Advice and training data from more than 50 domain experts are used to improve the model

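Evaluation results
– Safety and Alignment
– Model-Assisted Safety Pipeline
• RLHF trains the model to respond in line with the user's intent, but
• undesirable behavior can still occur
– Example: giving advice on committing crimes
– Cause: labelers were not given proper instructions while reward model training data was being collected
• Mitigation:
– rule-based reward models (RBRMs)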

Evaluation results
– Improvements on Safety Metrics

Core of the development
• Prediction via scaling laws
– What is a scaling law?
Scaling Laws for Neural Language Models
Scaling Laws for Autoregressive Generative Modeling

Core of the development
– Test loss prediction: quite accurate
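The report fits a power law with an irreducible loss term, L(C) = a·C^b + c, to smaller runs and extrapolates to the final model. The sketch below shows the idea on synthetic data. It is a simplification: the irreducible term c is assumed known so the fit becomes a straight line in log space, and every number here is made up for illustration.

```python
import math

def fit_power_law(compute, loss, irreducible=0.0):
    """Least-squares fit of log(loss - c) = log(a) + b * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l - irreducible) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b

def predict_loss(compute, a, b, irreducible=0.0):
    """Evaluate L(C) = a * C**b + c."""
    return a * compute ** b + irreducible

# Synthetic small runs: compute as a fraction of the final run (<= 1/1,000).
true_a, true_b, true_c = 5.0, -0.1, 1.5
small_compute = [1e-6, 1e-5, 1e-4, 1e-3]
small_loss = [true_a * c ** true_b + true_c for c in small_compute]

a, b = fit_power_law(small_compute, small_loss, irreducible=true_c)
predicted_final = predict_loss(1.0, a, b, irreducible=true_c)
```

In practice c is fitted jointly with a and b using a nonlinear optimizer, and real measurements are noisy, so the extrapolation carries more uncertainty than this noise-free toy suggests.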

Core of the development
– Predicting performance on coding problems (e.g. Python): some error, but reasonably accurate

Core of the development
– But performance on Inverse Scaling Prize tasks is still hard to predict.

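Summary and impressions
• Summary
– GPT-4 is the culmination of five years of work
• Outstanding capabilities from pre-training alone
• GPT-4 itself is used in post-training
• Accurate predictions from scaling laws minimize cost
• Heavy investment in data collection, data processing, and evaluation to filter out "undesirable" behavior
– Basic specifications are undisclosed
• Dataset, model architecture & size, compute time
• Impressions
– A tremendous collaborative effort
– How far will scaling laws continue to hold?
– The approach presumes the model already knows the desired behavior (from pre-training?)
– What’s Next?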
