2. 1 The trigger: the "cat incident"
• Probably what first brought deep learning to the general public's attention
• In 2012, a deep neural network built from autoencoders, developed by the Google Brain team, learned to recognize high-level concepts such as cats from unlabeled data alone
Le et al. (2012) Building high-level features using large-scale unsupervised learning
8. 7 Neurons

[Figure: a neuron (nerve cell) with 784 inputs x_0, x_1, ..., x_782, x_783, total input u, activation function f, and output z]

z = f(u)
u = w_0 x_0 + w_1 x_1 + ... + w_783 x_783 + b
  = Σ_{i=0}^{783} w_i x_i + b
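A minimal sketch of this single neuron with NumPy, assuming a 784-dimensional input (a flattened 28×28 MNIST image) and a sigmoid for f, which is one common choice of activation; the random values are purely illustrative:

```python
import numpy as np

def sigmoid(u):
    # activation function f(u); sigmoid is one common choice
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
x = rng.random(784)                  # inputs x_0 .. x_783 (e.g. a flattened 28x28 image)
w = rng.standard_normal(784) * 0.01  # weights w_0 .. w_783
b = 0.0                              # bias

u = np.dot(w, x) + b                 # total input: u = sum_i w_i x_i + b
z = sigmoid(u)                       # output: z = f(u)
print(u, z)
```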
9. 8 Softmax classifier
• Use the softmax function as the activation function of the output layer

softmax_k(u) = e^{u_k} / Σ_{j=1}^{K} e^{u_j}

• The softmax outputs of the units in the layer sum to 1
• Given an input x, weights W, and bias b, the softmax output can be interpreted as the probability that x belongs to class i:

P(Y = i | x, W, b) = softmax_i(Wx + b) = e^{W_i x + b_i} / Σ_j e^{W_j x + b_j}

• The predicted class for input x is the class i that maximizes the conditional probability P:

y_pred = argmax_i P(Y = i | x, W, b)
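The classifier above can be sketched in a few lines of NumPy; the sizes (784 inputs, K = 10 classes, as for MNIST digits) and the random weights are illustrative assumptions. Subtracting the maximum before exponentiating does not change the result and avoids overflow:

```python
import numpy as np

def softmax(u):
    # softmax_k(u) = e^{u_k} / sum_j e^{u_j}; shift by max for numerical stability
    e = np.exp(u - u.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 784)) * 0.01  # weights, one row W_i per class
b = np.zeros(10)                           # biases b_i
x = rng.random(784)                        # input vector

p = softmax(W @ x + b)        # P(Y = i | x, W, b) for each class i
y_pred = int(np.argmax(p))    # predicted class = argmax_i P(Y = i | x, W, b)
print(p.sum(), y_pred)
```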
18. 17 Running it
git clone https://github.com/lisa-lab/DeepLearningTutorials.git
cd DeepLearningTutorials
python code/logistic_sgd.py
... loading data
... building the model
... training the model
epoch 1, minibatch 83/83, validation error 12.458333 %
epoch 1, minibatch 83/83, test error of best model 12.375000 %
epoch 2, minibatch 83/83, validation error 11.010417 %
epoch 2, minibatch 83/83, test error of best model 10.958333 %
...
epoch 73, minibatch 83/83, validation error 7.500000 %
epoch 73, minibatch 83/83, test error of best model 7.489583 %
Optimization complete with best validation score of 7.500000 %,
with test performance 7.489583 %
The code run for 74 epochs, with 8.225877 epochs/sec
The code for file logistic_sgd.py ran for 9.0s
24. 23 Structure of the relation between neurons in a multilayer perceptron

[Figure: unit j in layer l (total input u_j^(l), activation f^(l), output z_j^(l)) feeding unit k in layer l+1 (total input u_k^(l+1), activation f^(l+1), output z_k^(l+1))]

z_j^(l) = f^(l)(u_j^(l))
z_k^(l+1) = f^(l+1)(u_k^(l+1)) = f^(l+1)(Σ_j w_kj^(l+1) z_j^(l)) = f^(l+1)(Σ_j w_kj^(l+1) f^(l)(u_j^(l)))
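The layer-to-layer relation can be sketched as one step of a forward pass; the layer sizes (4 units in layer l, 3 in layer l+1) and the tanh activation are illustrative assumptions, not part of the slides:

```python
import numpy as np

def f(u):
    # activation f^(l+1); tanh is an illustrative choice
    return np.tanh(u)

rng = np.random.default_rng(0)
z_l = rng.random(4)                    # outputs z_j^(l) of layer l
W = rng.standard_normal((3, 4)) * 0.5  # weights w_kj^(l+1), row k, column j

u_next = W @ z_l      # u_k^(l+1) = sum_j w_kj^(l+1) z_j^(l)
z_next = f(u_next)    # z_k^(l+1) = f^(l+1)(u_k^(l+1))
print(z_next)
```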
26. 25 Backpropagation

[Figure: units i, j, k in layers l-1, l, l+1; weight w_ji^(l) between i and j, weight w_kj^(l+1) between j and k; total inputs u_j^(l) and u_k^(l+1); output z_j^(l); deltas δ_1^(l+1), ..., δ_k^(l+1), δ_{k+1}^(l+1) of layer l+1 flowing back into δ_j^(l)]
27. 25 Backpropagation
• Consider the derivative ∂E/∂w_ji^(l) of the error function E with respect to the weight w_ji^(l) between unit i in layer l-1 and unit j in layer l
• By the chain rule,

∂E/∂w_ji^(l) = (∂E/∂u_j^(l)) (∂u_j^(l)/∂w_ji^(l))   (1)

• Consider the first factor ∂E/∂u_j^(l). Define this as the delta δ_j^(l)
• A change in u_j^(l) affects E only through the changes it causes in the total inputs u_k^(l+1) of the layer-(l+1) units connected to unit j. Applying the chain rule again,

δ_j^(l) = ∂E/∂u_j^(l) = Σ_k (∂E/∂u_k^(l+1)) (∂u_k^(l+1)/∂u_j^(l))   (2)
28. • Since u_k^(l+1) = Σ_j w_kj^(l+1) z_j^(l) = Σ_j w_kj^(l+1) f(u_j^(l)), we have

∂u_k^(l+1)/∂u_j^(l) = w_kj^(l+1) f'(u_j^(l))

so (2) becomes

δ_j^(l) = Σ_k δ_k^(l+1) w_kj^(l+1) f'(u_j^(l))   (3)

That is, δ_j^(l) can be computed from the deltas of the next layer up, l+1
• For the second factor ∂u_j^(l)/∂w_ji^(l), since u_j^(l) = Σ_i w_ji^(l) z_i^(l-1),

∂u_j^(l)/∂w_ji^(l) = z_i^(l-1)   (4)

• From (3) and (4), the target derivative (1) is

∂E/∂w_ji^(l) = δ_j^(l) z_i^(l-1)   (5)
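Equations (3) and (5) can be checked numerically on a tiny two-layer network. The layer sizes, the sigmoid activation, and the squared-error E = 0.5·||z_out − t||² are all illustrative assumptions; the backpropagated gradient of one weight is compared against a finite-difference estimate:

```python
import numpy as np

def f(u):
    return 1.0 / (1.0 + np.exp(-u))   # sigmoid activation

def fp(u):
    return f(u) * (1.0 - f(u))        # its derivative f'(u)

rng = np.random.default_rng(0)
z_prev = rng.random(3)                # z^(l-1)
W1 = rng.standard_normal((4, 3))      # weights w_ji^(l)
W2 = rng.standard_normal((2, 4))      # weights w_kj^(l+1)
t = rng.random(2)                     # target, with E = 0.5 * ||z_out - t||^2

# forward pass
u1 = W1 @ z_prev; z1 = f(u1)          # layer l
u2 = W2 @ z1;     z_out = f(u2)       # layer l+1 (output)

# backward pass
delta2 = (z_out - t) * fp(u2)         # delta_k^(l+1) = dE/du_k^(l+1) at the output
delta1 = (W2.T @ delta2) * fp(u1)     # eq. (3): delta_j^(l) = sum_k delta_k^(l+1) w_kj^(l+1) f'(u_j^(l))
grad_W1 = np.outer(delta1, z_prev)    # eq. (5): dE/dw_ji^(l) = delta_j^(l) z_i^(l-1)

# finite-difference check of one entry of eq. (5)
def E(W):
    return 0.5 * np.sum((f(W2 @ f(W @ z_prev)) - t) ** 2)

eps = 1e-6
Wp = W1.copy(); Wp[0, 0] += eps
num = (E(Wp) - E(W1)) / eps
print(grad_W1[0, 0], num)
```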
31. 27 Running it
git clone https://github.com/lisa-lab/DeepLearningTutorials.git
cd DeepLearningTutorials
python code/mlp.py
... loading data
... building the model
... training
epoch 1, minibatch 2500/2500, validation error 9.620000 %
epoch 1, minibatch 2500/2500, test error of best model 10.090000 %
epoch 2, minibatch 2500/2500, validation error 8.610000 %
epoch 2, minibatch 2500/2500, test error of best model 8.740000 %
epoch 3, minibatch 2500/2500, validation error 8.000000 %
epoch 3, minibatch 2500/2500, test error of best model 8.160000 %
epoch 4, minibatch 2500/2500, validation error 7.600000 %
epoch 4, minibatch 2500/2500, test error of best model 7.790000 %
...
epoch 820, minibatch 2500/2500, test error of best model 1.650000 %
epoch 821, minibatch 2500/2500, validation error 1.680000 %
epoch 822, minibatch 2500/2500, validation error 1.690000 %
...
epoch 999, minibatch 2500/2500, validation error 1.700000 %
epoch 1000, minibatch 2500/2500, validation error 1.700000 %
Optimization complete. Best validation score of 1.680000 % obtained at iteration 205000
with test performance 1.650000 %
The code for file mlp.py ran for 84.29m