Trying Out Source Code Generation

CDLE LT Koshien Preliminary Round #4
2021/03/16
鵜飼 (Ukai)

Today's topics
• What is source code generation? (an overview of GPT-3)
• Investigating with the GPT-3 API
• Attempting to reproduce GPT-2

What is source code generation?

Source code generation workflow

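GPT-3 features
• Strong at text generation
– GPT-3 was used to post on a message board, and nobody noticed for nearly a week
– It only came to light after a user, suspicious that long posts were appearing in a very short time, made an inquiry
• The model is large
– It has 175 billion parameters, roughly 500 times as many as BERT
– Training takes an enormous amount of time and money
• Usable for a wide range of tasks
– A single model can handle diverse tasks such as translation, reading comprehension, and question answering
– Instead of fine-tuning, giving it a few sample texts lets it adapt to various generation patterns

Paper         Parameters  Layers  d_model
GPT-2 paper   117M        12      768
GPT-2 paper   345M        24      1024
GPT-2 paper   762M        36      1280
GPT-2 paper   1542M       48      1600
GPT-3 paper   125M        12      768
GPT-3 paper   350M        24      1024
GPT-3 paper   760M        24      1536
GPT-3 paper   1.3B        24      2048
GPT-3 paper   2.7B        32      2560
GPT-3 paper   6.7B        32      4096
GPT-3 paper   13.0B       40      5140
GPT-3 paper   175.0B      96      12288
(Slide annotations on the table: "GPT-2", "GPT-3", "comparable to BERT's parameter count", "GPT")

Source: "Text-generation AI GPT-3 found to have been conversing with humans on Reddit for a week without anyone noticing" - Gigazine
https://gigazine.net/news/20201008-gpt-3-reddit/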

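How to give text to GPT-3
• The input text consists of three elements
– Task description: an explanation of the task
– Example: sample pairs of a question and the answer to be generated
– Prompt: the opening text from which generation should continue
• There are three usage patterns, depending on how the text is given
– Zero-shot
• Provide the task description and the prompt
• Provide no examples
– One-shot
• Provide one example
– Few-shot
• Provide multiple examples
Language Models are Few-Shot Learners: https://arxiv.org/abs/2005.14165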
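
The three elements and three usage patterns above can be sketched in Python as follows. This is only an illustration: the function names `build_prompt` and `shot_type` are hypothetical, and the `// sampleN` / `description:` / `code:` layout is borrowed from the prompts used later in this talk rather than being a format GPT-3 requires.

```python
def build_prompt(task_description, examples, prompt):
    """Assemble a prompt from the three elements described above.

    examples is a list of (description, code) pairs; its length decides
    whether the result is zero-shot (0), one-shot (1), or few-shot (2+).
    """
    parts = [f"// {task_description}"]
    for i, (description, code) in enumerate(examples, start=1):
        parts.append(f"// sample{i}")
        parts.append(f"description: {description}")
        parts.append(f"code: {code}")
    # The final, unanswered entry is what the model is asked to complete.
    parts.append(f"description: {prompt}")
    parts.append("code:")
    return "\n".join(parts)


def shot_type(examples):
    """Name the usage pattern implied by the number of examples."""
    if len(examples) == 0:
        return "zero-shot"
    if len(examples) == 1:
        return "one-shot"
    return "few-shot"


examples = [("a red button that says stop", "<button>Stop</button>")]
print(shot_type(examples))
print(build_prompt("Generate React code from a description", examples, "a blue button"))
```

Passing an empty list of examples yields a zero-shot prompt with the same overall layout.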

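Investigating with the GPT-3 API

Investigating source code generation through the API
• Using the API published by OpenAI, we investigated the following combinations (items 1 and 5 are presented here)
# | Language | Purpose | Result
1 | HTML/CSS, React | Generating prototype layouts | Words with a distinctive color can apparently be turned into a layout. Complex shapes look difficult.
2 | HTML/CSS, React | Generating dynamic components (calculation logic) | Changes on the level of altering the numbers in the samples given as shots seem feasible. Complex processing was not tested.
3 | HTML/CSS, React | Generating dynamic components (validation) | Validation of postal codes, e-mail addresses, amounts of money, and the like seems feasible.
4 | Java | Code generation from a system development perspective | For processing such as file I/O, code that looked like it would run almost as-is was generated.
5 | Python | Generating data analysis and model training code | Succeeded in generating source code from loading a dataset through building a model.
OpenAI: https://beta.openai.com/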

Investigating HTML/CSS, React
• The following text was given as the task description and examples
// Here are the 2 description:code pairs used to give GPT-3 some context for how to provide a response
// sample1
description: a red button that says stop
code: <button style={{color: 'white', backgroundColor: 'red'}}>Stop</button>
// sample2
description: a blue box that contains 3 yellow circles with red borders
code: <div style={{backgroundColor: 'blue', padding: 20}}><div style={{backgroundColor: 'yellow', border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div><div style={{backgroundColor: 'yellow', borderWidth: 1, border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div><div style={{backgroundColor: 'yellow', border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div></div>

Results for HTML/CSS, React
# | description | code
1 | a button that like a watermelon | <button style={{backgroundColor: 'red', border: '5px solid green', borderRadius: '50%', padding: 20}}>I like watermelon</button>
2 | a button that like an apple | <button style={{backgroundColor: 'red', borderWidth: 1, border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}>Apple</button>
3 | a button that like an orange | <button style={{color: 'white', backgroundColor: 'orange'}}>Like</button>
4 | a button that like Japan | <button style={{color: 'white', backgroundColor: 'red'}}>Japan</button>
5 | a button that like a tv | <button style={{backgroundColor: 'white', padding: 20}}>Like a TV</button>
(The slide also showed a rendered output image for each row.)
• The generated source code ran without any manual fixes
• Extracting a color from the noun partly worked, but other aspects such as shape were not always captured
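
One way to check which colors and shapes the model picked for each noun is to pull the quoted style values out of the generated JSX. The sketch below is an assumption about how such a check could be done, not part of the original investigation; `extract_styles` is a hypothetical helper, and it only sees quoted values (numeric ones such as `padding: 20` are ignored).

```python
import re

# Matches style entries such as  backgroundColor: 'red'  inside a JSX style object.
STYLE_ENTRY = re.compile(r"(\w+):\s*'([^']*)'")


def extract_styles(jsx):
    """Return the quoted style properties of a generated JSX snippet as a dict."""
    return dict(STYLE_ENTRY.findall(jsx))


# Three of the generated snippets from the results table.
generated = {
    "watermelon": "<button style={{backgroundColor: 'red', border: '5px solid green', borderRadius: '50%', padding: 20}}>I like watermelon</button>",
    "orange": "<button style={{color: 'white', backgroundColor: 'orange'}}>Like</button>",
    "tv": "<button style={{backgroundColor: 'white', padding: 20}}>Like a TV</button>",
}

for noun, jsx in generated.items():
    styles = extract_styles(jsx)
    shape = "round" if "borderRadius" in styles else "not round"
    print(noun, styles.get("backgroundColor"), shape)
```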

Investigating Python (1/2)
// sample1
description: merge Dataframe [[1,2,3], [4,5,6]] and [[a,b,c],[d,e,f]]
code:
import pandas as pd
df_a = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
df_b = pd.DataFrame([[a, b, c], [d, e, f]])
pd.merge(df_a, df_b)
// sample2
description: analysis statistics for Titanic dataset
code:
import pandas as pd
train = pd.read_csv("train.csv")
print("Train shape: " + train.shape)
train.describe()

Results for Python (1/2)
#1 (shots: 2)
description: calculate the average of [1, 2, 3, 4, 5, 6]
code:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
#2 (shots: 2)
description: calculate the mean of [1, 2, 3, 4, 5, 6]
code:
import numpy as np
np.mean([1, 2, 3, 4, 5, 6])
#3 (shots: 2)
description: create classification model using iris dataset
code:
import pandas as pd
from sklearn.model_selection import train_test_split
• When the description used a noun that appears in a Python method name, the corresponding code was generated; when it used a noun that does not, the generated code sometimes differed from what was expected
• Libraries such as scikit-learn and numpy were not given in advance, yet import statements for them were generated automatically
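
For reference, both the "average" and the "mean" descriptions ask for the same single number. A minimal standard-library sketch of that computation (not the generated code, which used numpy) is:

```python
from statistics import mean

values = [1, 2, 3, 4, 5, 6]

# "average" and "mean" in the two descriptions refer to this same value.
average = mean(values)
print(average)  # 3.5
```

The code generated for the "average" description stopped after creating the array, while the "mean" description produced the call that actually computes this value.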

Investigating Python (2/2)
• Code for building a machine learning model was added as a third sample, and the following was given
// sample1
description: merge Dataframe [[1,2,3], [4,5,6]] and [[a,b,c],[d,e,f]]
code:
import pandas as pd
df_a = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
df_b = pd.DataFrame([[a, b, c], [d, e, f]])
pd.merge(df_a, df_b)
// sample2
description: analysis statistics for Titanic dataset
code:
import pandas as pd
train = pd.read_csv("train.csv")
print("Train shape: " + train.shape)
train.describe()
// sample3
description: train decision tree using Titanic dataset, and test the test dataset
code:
from sklearn import tree
train_y = train["Survived"].values
train_X = train[["Pclass", "Sex", "Age", "Fare"]].values
test_X = test[["Pclass", "Sex", "Age", "Fare"]].values
tree = tree.DecisionTreeClassifier()
tree = tree.fit(train_y, train_X)
pred = tree.predict(test_X)
print(pred)
(Slide annotation: sample3 is the one sample that was added)

Results for Python (2/2)
#4
shots: 3 (decision tree shot added)
shot indentation: yes
description: ... dataset
code (after formatting):
from sklearn import svm
iris = datasets.load_iris()
#5
shots: 3 (decision tree shot added)
shot indentation: no (";"-separated)
description: ... dataset
code (after formatting):
import pandas as pd;
import numpy as np;
import matplotlib.pyplot as plt;
from sklearn import svm, datasets;
iris = datasets.load_iris()
X = iris.data[:, :2];
y = iris.target[:, :2];
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
svm = svm.SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
• When the text given in advance was laid out with indentation following Python syntax, generation sometimes did not go well
• As an experiment, the samples were given as one-liners with ";" in place of the indented layout, and the generated result came closer to the intended source code
• Imports of the various libraries were also generated
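
The ";"-separated one-liner form mentioned above could be produced with a small helper like the one below. This is one possible reading of the experiment, under the assumption that each sample consists only of flat, top-level statements (as the samples shown in this talk do); `to_one_liner` is a hypothetical name.

```python
def to_one_liner(code, separator="; "):
    """Join the statements of a flat (block-free) snippet into a single line.

    Assumption: every non-empty line is a complete top-level statement.
    Lines that open an indented block (def, for, if, ...) cannot be joined
    this way, so they are rejected.
    """
    statements = []
    for line in code.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith(":"):
            raise ValueError(f"block statement cannot be flattened: {stripped!r}")
        statements.append(stripped)
    return separator.join(statements)


sample = """
import pandas as pd
train = pd.read_csv("train.csv")
train.describe()
"""
print(to_one_liner(sample))
```

Snippets containing loops or function definitions would need a different encoding, which is why the helper refuses them rather than producing invalid Python.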

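Attempting to reproduce GPT-2

Fine-tuning-based investigation using a public model
• We fine-tuned a pre-trained model and attempted source code generation
– Experimental conditions
• As a first step, the goal was narrowed down to generating React code
• The publicly released pre-trained GPT-2 model (117M parameters) was used
• For fine-tuning, we used .jsx files extracted from the repositories of the top 100 GitHub accounts by star count, ranked per Organization (979 files)
– Reference: https://gitstar-ranking.com/organizations
Related repository: https://github.com/nshepperd/gpt-2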
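
Gathering the .jsx files into a single fine-tuning corpus might look like the sketch below. It assumes the repositories have already been cloned under one local directory; `collect_jsx`, the directory layout, and the use of `<|endoftext|>` as a separator between files are all assumptions for illustration, not details taken from the experiment.

```python
from pathlib import Path


def collect_jsx(repos_dir, output_file):
    """Concatenate every .jsx file under repos_dir into one training text file.

    Returns the number of files written. Files that are not valid UTF-8
    are skipped.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(repos_dir).rglob("*.jsx")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue
            out.write(text)
            # Assumed separator between files (the name of GPT-2's
            # end-of-text marker); the training script may expect otherwise.
            out.write("\n<|endoftext|>\n")
            count += 1
    return count
```

The output file should live outside `repos_dir` so that it is not mixed in with the source tree being scanned.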
Organization star ranking
Rank  Organization
1     Microsoft
2     google
3     facebook
4     apache
5     alibaba
6     vuejs
7     tensorflow
8     freeCodeCamp
9     Tencent
10    github
// sample1
code: <button style={{color: 'white', backgroundColor: 'red'}}>Stop</button>
// sample2
code: <div style={{backgroundColor: 'blue', padding: 20}}><div style={{backgroundColor: 'yellow', border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div><div style={{backgroundColor: 'yellow', borderWidth: 1, border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div><div style={{backgroundColor: 'yellow', border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div></div>

Results of the fine-tuning-based investigation using a public model
• Instead of a button, the model generated code that appears to build a Todo list
• The fine-tuning data included Todo lists, so the output may have been influenced more by the fine-tuning data than by the samples given immediately beforehand
#1
description: a button that like a watermelon
code:
import React from 'react';
import { faCaret, faTooltip } from 'faslib';
class Helppoint extends React.Component {
  static propTypes = {
    children: PropTypes.any,
    label: PropTypes.node.isRequired,
  };
  export default class Home extends React.Component {
    constructor(){
      super();
    }
    componentDidMount(){
      this.getStateFromProps();
      this.setState({
        children: null,
        label: null,
        ariaLabel: false,
      });
      this.setState({
        children: {
          label: 'First 5 minutes',
        }
      },
      asyncHandler => {
        this.handleCreateNewTodo({ label: "First 5 minutes" }),
        this.handleNewTodoCreate({ template: 'newtodo' })
      });
    };
    static defaultProps = {
      title: 'Home',
    },
    asyncSelect(e) {
      this.setState({
        createNewTodo: e.previous.testCase(),
        newTodo: e.previous.testCase()
      });
    },
    saveData() {
      return `${saveFile( '#' + this.props.title + '_home' )}${renderText}_${saveJSON(this.props)}`;
    }
    render() {
      const { children } = this.props;
      if (children.length === 0) {
        return <div>
          <p>
            {this.state.children}
(Slide annotation: traces of an attempt to generate a Todo list)

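Investigation and results: reproducing GPT-2 pre-training
• Source code generation with a GPT-2 model built from scratch
– Experimental conditions
• URL-filtered data was used for pre-training
– CommonCrawl was used, keeping only URLs that are linked from Reddit
– Following the GPT-2 paper, Wikipedia data was removed: several evaluation datasets also draw on it, so removing it makes proper evaluation easier
– Reference: https://github.com/jcpeterson/openwebtext
• Scraping
– HTML markup was stripped to extract the text portion of each web page
– With processing time in mind, one year of text data was used as a first step (about 42MB)
– Results
• Words such as "the", "he", and "been" appeared here and there, but no naturally flowing text was generated
• There is room for improvement in the amount of training data, preprocessing, parameter settings, and so on
Related repository: https://github.com/akanyaani/gpt-2-tensorflow2.0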
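
The scraping step, stripping HTML markup and keeping the page text, can be sketched with Python's standard `html.parser` module. This is a minimal illustration and not the pipeline used in the experiment; `TextExtractor` and `html_to_text` are hypothetical names, and skipping `<script>` and `<style>` contents is an assumed design choice.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<html><head><style>p {color: red}</style></head><body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
print(html_to_text(page))
```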
// Here are the 2 description:code pairs used to give GPT-3 some context for how to provide a response
// sample1
code: <button style={{color: 'white', backgroundColor: 'red'}}>Stop</button>
// sample2
code: <div style={{backgroundColor: 'blue', padding: 20}}><div style={{backgroundColor: 'yellow', border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div><div style={{backgroundColor: 'yellow', borderWidth: 1, border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div><div style={{backgroundColor: 'yellow', border: '5px solid red', borerRadius: '50%', padding: 20, width: 100, height: 100}}></div></div>
#1
description: a button that like a watermelon
code:
s the the ? he. ? he's ? he' ? e be ?
he. ? t the ? o a ? h ? h been been
and ? e the the a ? he ? n

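Summary
• Investigating with the GPT-3 API
– Confirmed that simple source code can be generated for HTML/CSS, React, Java, Python, and others
• Reproducing GPT-2
– Both fine-tuning a pre-trained model and pre-training from scratch were tried, but as far as the investigation has gone, generating source code this way appears difficult
• Other developments
1. Trade-off between the length and the number of sample texts
• Because the length of the input text is capped, giving long samples reduces the number of samples that fit, and the output is more easily swayed by those few samples. Giving many short samples makes the output more general, but makes it harder to generate long text
2. The GPT-Neo effort
• A project by volunteer researchers aiming at an open-source GPT-3
3. Moves toward fine-tuning support in the GPT-3 API
GPT-Neo: https://github.com/EleutherAI/gpt-neo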
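
The trade-off in item 1 above comes down to simple arithmetic on the context window. The sketch below uses assumed figures for illustration: a 2048-token window (the context size reported for GPT-3), a 50-token prompt, and 300 tokens reserved for the output; `max_examples` is a hypothetical helper.

```python
def max_examples(context_limit, example_tokens, prompt_tokens, output_tokens):
    """Number of equally sized examples that fit in the context window.

    All arguments are token counts. The space left after reserving room for
    the prompt and the generated output is divided among the examples.
    """
    available = context_limit - prompt_tokens - output_tokens
    if available <= 0 or example_tokens <= 0:
        return 0
    return available // example_tokens


# Longer examples leave room for fewer shots.
for size in (50, 200, 800):
    print(size, max_examples(2048, size, 50, 300))
```

With these assumed numbers, 50-token examples allow 33 shots, 200-token examples allow 8, and 800-token examples allow only 2.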
