4. Outline
What is Machine Learning?
What is Deep Learning?
Hands-on Tutorial of Deep Learning
Tips for Training DL Models
Variants - Convolutional Neural Network
6. Machine Learning in One Sentence
"Field of study that gives computers the ability
to learn without being explicitly programmed."
- Arthur Lee Samuel, 1959
7. Machine Learning vs Artificial Intelligence
AI is the simulation of human intelligence processes
Outcome-based: judged by the result, i.e. whether it exhibits human intelligence
Even a system built from very detailed hand-written rules can count as AI
Machine learning is one way to achieve AI:
learn the rules from data,
find a good-enough function
that solves the specific problem
(Figure: machine learning as a subset of artificial intelligence)
8. Goal of Machine Learning
For a specific task, find the best function to complete it
Task: the "Who's that Pokémon?" quiz at the end of each episode
(Figure: f*(image of Bulbasaur) = "Bulbasaur"; f*(image of Mewtwo) = "Mewtwo")
10. 1. Define a Set of Functions
(Framework: define a set of functions → evaluate and search → pick the best function)
A set of functions, f(‧): {f(ө1),…,f(ө*),…,f(өn)}
Example: the Pokémon trainers in Beitou Park
11. 2. Evaluate and Search
(Framework: define a set of functions → evaluate and search → pick the best function)
Evaluate f1 = f(ө1) on examples, then revise ө based on the results:
e.g. avoid trainers carrying a Pikachu (move away from ө1)
12. 3. Pick the Best Function
(Framework: define a set of functions → evaluate and search → pick the best function)
Find the Pokémon master trainer
23. A Neuron
(Figure: a neuron with inputs x1, …, xn, weights w1, …, wn, bias b, and activation function σ)
z = w1x1 + w2x2 + …… + wnxn + b
̂y = σ(z)
Input: x1, …, xn; Weight: w1, …, wn; Bias: b; σ: activation function; Output: ̂y
24. How a Neuron Works
Example: a neuron with weights (-1, 3), bias 3, and a linear activation σ(z) = z
z = w1x1 + w2x2 + …… + wnxn + b
For input (5, 2):
̂y = z = (-1)*5 + 3*2 + 3 = 4
Ө: weights, bias
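The slide's single-neuron example can be sketched in NumPy (a linear activation σ(z) = z is assumed, as in the slide):

```python
import numpy as np

def neuron(x, w, b, activation=lambda z: z):
    """A single neuron: weighted sum of inputs plus bias, passed through σ."""
    z = np.dot(w, x) + b
    return activation(z)

# Slide values: inputs (5, 2), weights (-1, 3), bias 3, linear activation
y_hat = neuron(np.array([5, 2]), np.array([-1, 3]), 3)
print(y_hat)  # (-1)*5 + 3*2 + 3 = 4
```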
25. Fully Connected Neural Network
Many neurons connected into a network
Universality theorem: a network with enough neurons can represent any function
(Figure: inputs X1, …, Xn fully connected to the next layer with weights w1,1, …, w2,n)
26. Fully Connected Neural Network
A simple network with linear activation functions
(Figure: inputs (5, 2) fed through a layer with the weights shown: -0.5, +0.2, -0.1, +0.5, -1.5, +0.8)
27. Fully Connected Neural Network
A simple network with linear activation functions
(Figure: the same network extended with a second layer, weights +0.4, +0.1, +0.5, +0.9, producing outputs 0.12 and 0.55 for input (5, 2))
28. Given the Network Weights
Given the input and the weights,
f(x, Ө) is fully determined (Ө: weights, bias)
(Figure: f(input image) computed through the network with weights -0.5, +0.5, -0.1, +0.2, +0.4, +0.1, +0.5, +0.9, giving outputs 0.12 and 0.55)
A Neural Network = A Function
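The idea "a neural network = a function" can be sketched as a forward pass through fully connected linear layers. The weight matrices below are hypothetical, since the slide's exact weight layout is not recoverable from the figure:

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass through fully connected layers with linear activations."""
    a = x
    for W, b in zip(weights, biases):
        a = W @ a + b   # linear activation: σ(z) = z
    return a

# Hypothetical 2-2-2 network, loosely echoing the slide's numbers
W1 = np.array([[-0.5, 0.2], [-0.1, 0.5]])
W2 = np.array([[0.4, 0.1], [0.5, 0.9]])
b1 = np.zeros(2)
b2 = np.zeros(2)
y = forward(np.array([5.0, 2.0]), [W1, W2], [b1, b2])
print(y.shape)  # (2,) -- two output values, i.e. f(x, Ө)
```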
29. Recall: Deep Learning Framework
(Framework: define a set of functions → evaluate and search → pick the best function)
A specific network architecture defines a set of functions, f(‧): {f(ө1),…,f(ө*),…,f(өn)}
Evaluate and search, e.g. f(ө94), f(ө87), f(ө945), …: keep revising the parameters of f
Pick the best function: find the best-fitting parameters, f(ө*)
30. How Do We Evaluate a Model?
The closer the output values are to the actual values, the better
A loss function quantifies the gap between
network outputs and actual values
The loss function L(Ө) is a function of Ө
(Figure: inputs X1, X2 pass through f(x, Ө) to produce ̂y1, ̂y2, which are compared with the actual values y1, y2 by the loss L(Ө))
31. Goal: Minimize the Total Loss
Find the best function, i.e. the one that minimizes the total loss
Find the best network weights, ө*:
ө* = argmin_ө L(ө)
The crucial question: how do we find ө*?
Brute force (enumerate all possible values)?
Suppose each weight is restricted to 0.0, 0.1, …, 0.9 and there are 500 weights
That is 10^500 combinations in total
Evaluating 10^6 combinations per second, it would take about 10^486 years
The universe is only about 10^10 years old since the Big Bang
Impossible to enumerate
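The slide's estimate can be checked with a few lines of integer arithmetic (seconds per year rounded to 3.15 × 10^7):

```python
import math

combos = 10 ** 500             # 500 weights, 10 candidate values each
per_second = 10 ** 6           # evaluations per second
seconds_per_year = 31_500_000  # roughly 3.15 * 10^7
years = combos // per_second // seconds_per_year
print(int(math.log10(years)))  # 486 -- the slide's "about 10^486 years"
```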
37. Summary – Gradient Descent
Gradient descent optimizes a continuous objective function
by stepping in the direction of improvement
The gradient is affected by the loss function
The gradient is affected by the activation function
The step size is affected by the learning rate
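The procedure above can be sketched as a minimal gradient descent on a one-dimensional quadratic loss (the learning rate and step count are arbitrary choices for illustration):

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of a continuous loss."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

# L(theta) = (theta - 3)^2, so dL/dtheta = 2*(theta - 3); minimum at theta = 3
theta_star = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
print(round(theta_star, 4))  # 3.0
```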
38. Drawbacks of Gradient Descent
Problem 1
One update per epoch makes convergence very slow
(an epoch means seeing all the training data once)
Can we speed it up?
A solution: stochastic gradient descent (SGD)
Problem 2
Gradient-based methods cannot guarantee finding the global optimum
Momentum can reduce the chance of getting stuck in a local minimum
40. Stochastic Gradient Descent
SGD: randomly draw one training sample and update once based on its loss
Another problem: updating one sample at a time is also slow
Mini-batch: update once per mini-batch
Benefits of mini-batch
Compared with SGD: faster to complete one epoch
Compared with GD: faster to converge (to the optimum)
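The mini-batch scheme above can be sketched on a toy linear-regression problem (the data and hyper-parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                      # noise-free targets for a clean illustration

def minibatch_sgd(X, y, batch_size=16, lr=0.1, epochs=50):
    """Linear regression trained with mini-batch SGD: one update per mini-batch."""
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)    # shuffle the samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = X[batch] @ w
            grad = X[batch].T @ (pred - y[batch]) / len(batch)
            w -= lr * grad          # update once per mini-batch
    return w

w = minibatch_sgd(X, y)
print(np.round(w, 3))               # recovers weights close to true_w
```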
56. Handling Ordinal Data
Ordinal variables (ordered categories)
For example: {Low, Medium, High}
Encode them in order:
{Low, Medium, High} → {1, 2, 3}
For bucketed values, create a new feature using the mean or median:

UID  Age        UID  Age
P1   0-17       P1   15
P2   0-17  →    P2   15
P3   55+        P3   70
P4   26-35      P4   30
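Both encodings above are simple lookups; a minimal sketch follows. The representative values for age buckets not shown on the slide (18-25, 36-45, 46-55) are assumptions:

```python
# Ordinal variables keep their order, so encode them with ordered integers,
# and bucketed numerics (age ranges) with one representative value per bucket.
level_map = {'Low': 1, 'Medium': 2, 'High': 3}
age_map = {'0-17': 15, '18-25': 22, '26-35': 30,
           '36-45': 40, '46-55': 50, '55+': 70}  # middle buckets assumed

levels = ['High', 'Low', 'Medium']
ages = ['0-17', '0-17', '55+', '26-35']

encoded_levels = [level_map[v] for v in levels]
encoded_ages = [age_map[v] for v in ages]
print(encoded_levels)  # [3, 1, 2]
print(encoded_ages)    # [15, 15, 70, 30]
```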
82. Alternative: Functional API
The way to go for defining a complex model
For example: multiple outputs, multiple input sources
Why "Functional API"?
All layers and models are callable (like a function call)
Example
from keras.layers import Input, Dense
input = Input(shape=(200,))
output = Dense(10)(input)
83.
# Sequential deep learning model (layers stacked in order)
model = Sequential()
model.add(Dense(128, input_dim=200))
model.add(Activation('sigmoid'))
model.add(Dense(256))
model.add(Activation('sigmoid'))
model.add(Dense(5))
model.add(Activation('softmax'))
model.summary()
# Functional API
from keras.layers import Input, Dense
from keras.models import Model
input = Input(shape=(200,))
x = Dense(128,activation='sigmoid')(input)
x = Dense(256,activation='sigmoid')(x)
output = Dense(5,activation='softmax')(x)
# define the Model (callable like a function)
model = Model(inputs=[input], outputs=[output])
84. Good Use Case for Functional API (1)
A model is callable as well, so it is easy to re-use a
trained model
This re-uses the architecture and the weights as well
# If model and input are defined already,
# re-use the same architecture (and weights) of the above model
y1 = model(input)
85. Good Use Case for Functional API (2)
Easy to combine multiple input sources
(Figure: x1 → Dense(100) → y1; y1 and x2 are concatenated into new_x2 → Dense(200) → output)
x1 = Input(shape=(10,))
y1 = Dense(100)(x1)
x2 = Input(shape=(20,))
new_x2 = keras.layers.concatenate([y1, x2])
output = Dense(200)(new_x2)
model = Model(inputs=[x1, x2], outputs=[output])
86. Today
Our exercise uses the "Sequential" model
because it is more straightforward for understanding the
details of stacking layers
90. Tips for Deep Learning
(Flowchart: Good result on the training dataset? If no, adjust the activation function, loss function, optimizer, or learning rate. If yes, check the result on the testing dataset.)
91. Tips for Deep Learning
(Flowchart: if the result on the training dataset is poor, adjust the activation function, loss function, optimizer, or learning rate)
∂L/∂ө = (∂L/∂̂y)(∂̂y/∂z)(∂z/∂ө)
The gradient is affected by the loss function
92. Using MSE
When specifying the loss function:
# specify the loss function and optimizer (cross-entropy)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd)
# specify the loss function and optimizer (MSE)
model.compile(loss='mean_squared_error',
              optimizer=sgd)
96. How to Select a Loss Function
Classification commonly uses cross-entropy,
paired with softmax as the activation function of the output layer
Regression commonly uses mean absolute/squared error
You can also define a loss function for your specific problem
e.g. an unbalanced dataset with class 0 : class 1 = 99 : 1
Self-defined loss function, e.g. a cost table
(rows: actual class, columns: predicted class):

Loss      Class 0  Class 1
Class 0      0        99
Class 1      1        0
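One way to encode such a cost table as a self-defined loss, sketched in plain NumPy rather than as a Keras loss function (reading the table as cost[actual][predicted] is an assumption):

```python
import numpy as np

# Cost table from the slide, read as cost[actual][predicted] (an assumption):
# predicting class 0 for an actual class-1 sample costs 1, the reverse costs 99.
cost = np.array([[0.0, 99.0],
                 [1.0, 0.0]])

def expected_cost_loss(y_true, y_prob):
    """Mean expected misclassification cost under the predicted probabilities."""
    return np.mean([cost[t] @ p for t, p in zip(y_true, y_prob)])

y_true = np.array([0, 1])
y_prob = np.array([[0.9, 0.1],   # mostly right on the class-0 sample
                   [0.2, 0.8]])  # mostly right on the class-1 sample
print(expected_cost_loss(y_true, y_prob))  # (99*0.1 + 1*0.2) / 2 = 5.05
```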
97. Current Best Model Configuration

Component            Selection
Loss function        categorical_crossentropy
Activation function  sigmoid + softmax
Optimizer            SGD
98. Tips for Deep Learning
(Flowchart: if the result on the training data is poor, adjust the activation function, loss function, optimizer, or learning rate)
99. Exercise: 02_learningRateSelection.py
(5-8 minutes)
# specify the optimizer
from keras.optimizers import SGD, Adam, RMSprop, Adagrad
sgd = SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
Try different learning rates and pick the best one.
It is suggested to change one order of magnitude at a time, e.g. 0.1 vs 0.01 vs 0.001
101. How to Set the Learning Rate
Mostly trial and error; it is usually not larger than 0.1
Adjust one order of magnitude at a time:
0.1 → 0.01 → 0.001
then fine-tune: 0.01 → 0.012 → 0.015 → 0.018 → …
or your lucky numbers!
102. Tips for Deep Learning
(Flowchart: if the result on the training data is poor, adjust the activation function, loss function, optimizer, or learning rate)
∂L/∂ө = (∂L/∂̂y)(∂̂y/∂z)(∂z/∂ө)
The gradient is affected by the activation function
108. Leaky ReLU
Allows a small gradient when the input to the activation
function is smaller than 0 (e.g. α = 0.1):
f(x) = x if x > 0, αx otherwise.
df/dx = 1 if x > 0, α otherwise.
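The definition above, sketched in NumPy:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """f(x) = x if x > 0, else alpha * x."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.1):
    """df/dx = 1 if x > 0, else alpha -- never exactly zero."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))       # [-0.2  0.   3. ]
print(leaky_relu_grad(x))  # [0.1 0.1 1. ]
```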
109. Leaky ReLU in Keras
More activation functions are listed at
https://keras.io/layers/advanced-activations/
# For example
from keras.layers.advanced_activations import LeakyReLU
lrelu = LeakyReLU(alpha=0.02)
model.add(Dense(128, input_dim=200))
# specify the activation function
model.add(lrelu)
113. How to Select Activation Functions
Hidden layers
ReLU is the usual choice
Sigmoid is less recommended because of its vanishing gradient problem
Output layer
Regression: linear
Classification: softmax
114. Current Best Model Configuration

Component            Selection
Loss function        categorical_crossentropy
Activation function  relu + softmax
Optimizer            SGD
115. Tips for Deep Learning
(Flowchart: if the result on the training dataset is poor, adjust the activation function, loss function, optimizer, or learning rate)
116. Optimizers in Keras
SGD – Stochastic Gradient Descent
Adagrad – Adaptive Learning Rate
RMSprop – Similar to Adagrad
Adam – Similar to RMSprop, with Momentum
Nadam – Adam + Nesterov Momentum
117. Optimizer – SGD
Stochastic gradient descent
Supports momentum, learning rate decay, and Nesterov momentum
Effect of momentum:
without momentum: update = -lr*gradient
with momentum: update = -lr*gradient + m*last_update
Learning rate decay after each update
(1/t decay): lr = lr / (1 + decay*t)
t: number of updates done
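The two update rules above, written out in plain Python:

```python
# Sketch of the two SGD rules described above.
def sgd_update(gradient, lr, last_update=0.0, momentum=0.0):
    """update = -lr * gradient + momentum * last_update"""
    return -lr * gradient + momentum * last_update

def decayed_lr(lr0, decay, t):
    """1/t decay: lr = lr0 / (1 + decay * t), with t = updates done so far."""
    return lr0 / (1 + decay * t)

print(sgd_update(gradient=2.0, lr=0.1))                                  # -0.2
print(sgd_update(gradient=2.0, lr=0.1, last_update=-0.2, momentum=0.9))  # -0.38
print(decayed_lr(lr0=0.1, decay=0.01, t=100))                            # 0.05
```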
130. How to Select Optimizers
The usual opening move: Adam
Adaptive learning rate for every weight
Momentum included
Keras recommends RMSProp for RNNs
When training RNNs, watch out for the exploding gradient problem
Clipping the gradient: brute force that works
The war between RMSProp and Adam is still raging
131. Tips for Deep Learning
(Flowchart: if the result on the training data is poor, adjust the activation function, loss function, optimizer, or learning rate)
132. Current Best Model Configuration

Component            Selection
Loss function        categorical_crossentropy
Activation function  relu + softmax
Optimizer            Adam

After 50 epochs: 90% accuracy!
135. Tips for Deep Learning
(Flowchart: good result on the training dataset, but not on the testing dataset → early stopping, regularization, dropout)
What is overfitting?
The training result keeps improving, but the testing result gets worse
Early Stopping
Regularization
Dropout
136. Tips for Deep Learning
(Same flowchart: good on training, poor on testing → Early Stopping, Regularization, Dropout)
145. Early Stopping in Keras
EarlyStopping
monitor: the performance index to monitor
patience: how many consecutive epochs without improvement to tolerate
''' EarlyStopping '''
from keras.callbacks import EarlyStopping
earlyStopping = EarlyStopping(monitor='val_loss',
                              patience=3)
156. How to Set Dropout
Do not add Dropout at the start
Do not add Dropout at the start
Do not add Dropout at the start
a) Dropout makes training performance worse
b) Dropout is for avoiding overfitting; it is not a cure-all
c) When there are few parameters, use regularization instead
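For reference, a sketch of what a dropout layer does internally (inverted dropout, which rescales the surviving activations so nothing needs to change at test time); in Keras this is simply `model.add(Dropout(rate))`:

```python
import numpy as np

def dropout_forward(a, rate, training, rng):
    """Inverted dropout: randomly zero activations during training and rescale
    the survivors by 1/(1-rate), so the expected activation is unchanged."""
    if not training:
        return a                      # no-op at test time
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((4, 1000))
out = dropout_forward(a, rate=0.5, training=True, rng=rng)
print(out.shape, round(out.mean(), 2))  # mean stays close to 1.0
```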
164. Tips for Training Your Own DL Model
(Flowchart: Good result on the training dataset? If no, adjust the activation function, loss function, optimizer, or learning rate. Good result on the testing dataset? If no, apply early stopping, regularization, or dropout.)
170. Introduction
"Transfer": use the knowledge learned from task A to
tackle another task B
Example: a sheep/alpaca classifier
(Figure: images of sheep, alpacas, and other animals)
171. Use as a Fixed Feature Extractor
Take a known model, like VGG, trained on ImageNet
(ImageNet: 10 million images with labels)
Take the output of one of its layers
as the feature vectors
Train a classifier based on the features
extracted by the known model
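The idea can be sketched without downloading an actual pretrained model: a frozen transform stands in for the pretrained layers, and only a small classifier head is trained on its outputs (everything here is synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pretrained network cut at some layer: a frozen transform.
# (With a real model one would call e.g. VGG on images and take a layer's
# output as the feature vector; the random projection only illustrates this.)
W_frozen = rng.normal(size=(3072, 64))

def extract_features(X):
    return np.maximum(X @ W_frozen, 0)  # frozen weights + ReLU, never trained

# Synthetic "images" and labels; only the small head on top is trained.
X = rng.normal(size=(200, 3072))
y = (X[:, 0] > 0).astype(float)
F = extract_features(X)
head, *_ = np.linalg.lstsq(F, y, rcond=None)  # least-squares classifier head
pred = (F @ head) > 0.5
print((pred == y).mean())  # training accuracy of the head on frozen features
```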
172. Use as Initialization
Initialize your net with the
weights of a known model
Use your dataset to further
train your model,
i.e. fine-tuning the known
model
(Figure: the VGG model's weights are copied into your model as the starting point)
173. Short Summary
Unlabeled data (lack of y)
Semi-supervised learning
Insufficient data (lack of both x and y)
Transfer learning (focus on layer transfer)
Use as fixed feature extractor
Use as initialization
Resources: https://keras.io/applications/
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, “How transferable are
features in deep neural networks?”, https://arxiv.org/abs/1411.1792, 2014
174. Variants of Deep Neural Network
Convolutional Neural Network (CNN)
182. Convolution in Computer Vision (CV)
Common applications: blur, sharpen, emboss
http://setosa.io/ev/image-kernels/
183. Convolution in Computer Vision (CV)
Add up each pixel and its local neighbors, weighted by a filter (kernel)
Perform this convolution process on every pixel
(Figure: an image and a filter)
184. Convolution in Computer Vision (CV)
Add up each pixel and its local neighbors, weighted by a filter (kernel)
Perform this convolution process on every pixel
(Figure: an image and a filter)
A filter can be seen as a pattern
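The pixel-by-pixel process described above, sketched in NumPy with a 3x3 sharpen kernel (no padding, and no kernel flipping, which does not matter here because the kernel is symmetric):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each output pixel is the sum of the
    local neighborhood weighted by the kernel (valid region only, no padding)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)
print(convolve2d(image, sharpen))  # [[5. 6.] [9. 10.]]
```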
189. CNN Structure
(Figure: input image → [Convolutional Layer & Activation Layer → Pooling Layer], which can be performed many times, each step producing a new image → Flatten → Feedforward DNN → "PIKACHU!!!")
208. CIFAR-10 Dataset
60,000 (50,000 training + 10,000 testing) samples of
32x32 color images in 10 classes
10 classes:
airplane, automobile, ship, truck,
bird, cat, deer, dog, frog, horse
Official website:
https://www.cs.toronto.edu/~kriz/cifar.html
209. Overview of CIFAR-10 Dataset
Files of CIFAR-10 dataset
data_batch_1, …, data_batch_5
test_batch
4 elements in the input dataset
data
labels
batch_label
filenames
210. How to Load Samples from a File
This reading function is provided by the official site
# this function is provided from the official site
def unpickle(file):
    import cPickle
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict

# reading a batch file
raw_data = unpickle(dataset_path + fn)
211. How to Load Samples from a File
The same function, fixed for Python 3
# this function is provided from the official site
def unpickle(file):
    import pickle
    fo = open(file, 'rb')
    dict = pickle.load(fo, encoding='latin1')
    fo.close()
    return dict

# reading a batch file
raw_data = unpickle(dataset_path + fn)
212. Checking the Data Structure
Useful functions and attributes
# [1] the type of the input dataset
type(raw_data)
# <type 'dict'>
# [2] check keys in the dictionary
raw_data_keys = raw_data.keys()
# ['data', 'labels', 'batch_label', 'filenames']
# [3] check dimensions of the pixel values
print("dim(data)", numpy.array(raw_data['data']).shape)
# dim(data) (10000, 3072)
217. Input Structures in Keras
Depends on the configuration parameter of Keras:
channels_last → (Height, Width, Depth)
channels_first → (Depth, Height, Width)
(Figure: a 32x32 image with Width, Height, and Depth axes)
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
218. Reshape the Training/Testing Inputs
Use the image width and height to split the flat vector into its R, G, B parts,
reshape each one-dimensional vector into a two-dimensional matrix, and finally
use dstack to stack the R, G, B planes into a three-dimensional array
X_train = numpy.asarray(
[numpy.dstack(
(
r[0:(width*height)].reshape(height,width),
r[(width*height):(2*width*height)].reshape(height,width),
r[(2*width*height):(3*width*height)].reshape(height,width))
) for r in img_px_values]
)
Y_train = np_utils.to_categorical(numpy.array(img_lab), classes)
219. Saving Each Data Sample as an Image
SciPy library (pillow package)
The dimension of "arr_data" should be (height, width, 3)
Supported image formats:
.bmp, .png
''' saving ndarray to image'''
from scipy.misc import imsave
def ndarray2image(arr_data, image_fn):
imsave(image_fn, arr_data)
220. Saving Each Data Sample as an Image
PIL library (on Linux)
The dimension of "arr_data" should be (height, width, 3)
Supported image formats:
.bmp, .jpeg, .png, etc.
''' saving ndarray to image'''
from PIL import Image
def ndarray2image(arr_data, image_fn):
img = Image.fromarray(arr_data, 'RGB')
img.save(image_fn)
224. Number of Parameters of Each Layer
Only one function is needed; it also reports the total parameters:
# check the parameters of every layer
model.summary()
32*3*3*3 + 32 = 896
32*3*3*32 + 32 = 9,248
7200*512 + 512 = 3,686,912
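These counts follow from "parameters = weights + biases", which is easy to verify:

```python
# Parameter count of a layer = weights + biases; checking the slide's numbers.
def conv_params(n_filters, kh, kw, in_depth):
    return n_filters * kh * kw * in_depth + n_filters

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

print(conv_params(32, 3, 3, 3))   # 896  (first conv layer, RGB input)
print(conv_params(32, 3, 3, 32))  # 9248 (second conv layer)
print(dense_params(7200, 512))    # 3686912
```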
225. Let's Start Training
Two validation methods:
Validate by splitting the training samples
Validate with the testing samples
''' training'''
# define batch size and number of epochs
batch_size = 128
epoch = 32
# [1] validation data comes from training data
fit_log = model.fit(X_train, Y_train, batch_size=batch_size,
nb_epoch=epoch, validation_split=0.1,
shuffle=True)
# [2] validation data comes from testing data
fit_log = model.fit(X_train, Y_train, batch_size=batch_size,
nb_epoch=epoch, shuffle=True,
validation_data=(X_test, Y_test))
226. Saving Training History
Save the training history to a .csv file
'''saving training history'''
import csv
# define the output file name
history_fn = 'ccmd.csv'
# create the output file
def write_csv(history_fn, fit_log):
    with open(history_fn, 'wb') as csv_file:
        w = csv.writer(csv_file, lineterminator='\n')
        # convert the data structure from dictionary to ndarray
        temp = numpy.array(fit_log.history.values())
        # write headers
        w.writerow(fit_log.history.keys())
        # write values
        for i in range(temp.shape[1]):
            w.writerow(temp[:, i])
227. Model Saving and Prediction
Saving/loading the whole CNN model
Predicting the classes with new image samples
'''saving model'''
from keras.models import load_model
model.save('my_first_cnn_model.h5')
del model
'''loading model'''
model = load_model('my_first_cnn_model.h5')
'''prediction'''
pred = model.predict_classes(X_test, batch_size, verbose=0)
229. Practice – Design a CNN Model
Design a CNN model and make it run successfully (lines
17-32 of the script)
# set dataset path
dataset_path = './cifar_10/'
classes = 10
X_train, X_test, Y_train, Y_test = \
    utils.read_dataset(dataset_path, "img")
'''CNN model'''
model = Sequential()
# build your CNN model here
# CNN
model.add(Flatten())
# DNN
230. Let's Try CNN
Hints:
Check the format of the training dataset / validation dataset
Design your own CNN model
Don't forget to save the model
Figure reference:
https://unsplash.com/collections/186797/coding
235. Recap – Fundamentals
Fundamentals of deep learning
A neural network = a function
Gradient descent
Stochastic gradient descent
Mini-batch
Guidelines to determine a network structure
236. Recap – Improvement on Training Set
How to improve performance on training dataset
Activation Function
Loss Function
Optimizer
Learning Rate
237. Recap – Improvement on Testing Set
How to improve performance on testing dataset
Early Stopping
Regularization
Dropout
238. Recap – CNN
Fundamentals of CNN
Concept of filters
Hyper-parameters
Filter size
Zero-padding
Stride
Depth (total number of filters)
Pooling layers
How to train a CNN in Keras
CIFAR-10 dataset
240. Saving and Loading a Model
# save model
model.save("filename.h5")
# load model
from keras.models import load_model
model_test = load_model("filename.h5")
# BTW, use model.summary() to check your layers
model_test.summary()
241. How to Get the Trained Weights
weights = model.get_weights()
model.layers[1].set_weights(weights[0:2])
# get weights
myweights = model_test.get_weights()
# set weights
model_test.layers[1].set_weights(myweights[0:2])
# BTW, use model.summary() to check your layers
model_test.summary()
242. How to Get a Layer's Output
# rebuild the first layer as a standalone model with the trained weights
model_layer1 = Sequential()
model_layer1.add(Dense(128, input_dim=200,
                 weights=model_test.layers[0].get_weights()))
model_layer1.add(Activation('relu'))
# predict to obtain the layer's output
model_layer1.predict(X_train[0:1])
243. fit_generator
When the data is too large to read in at once (memory limitation):
# a generator that yields one mini-batch at a time
def train_generator(batch_size):
    while 1:
        data = np.genfromtxt('pkgo_city66_class5_v1.csv',
                             delimiter=',',
                             skip_header=1)
        for i in range(0, len(data) // batch_size):
            x = data[i*batch_size:(i+1)*batch_size, :200]
            y = data[i*batch_size:(i+1)*batch_size, 200]
            yield x, y

model.fit_generator(train_generator(28),
                    epochs=30,
                    steps_per_epoch=100,
                    validation_steps=100)  # or pass validation_data
249. Go Deeper in Deep Learning
"Neural Networks and Deep Learning"
written by Michael Nielsen
http://neuralnetworksanddeeplearning.com/
"Deep Learning"
written by Yoshua Bengio, Ian J. Goodfellow, and Aaron Courville
http://www.iro.umontreal.ca/~bengioy/dlbook/
Course: Machine Learning and Having It Deep and Structured
http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html
(Slide Credit: Hung-Yi Lee)
250. References
Keras documentation: the official Keras site, very detailed
Keras GitHub: find examples suited to your application under example/
YouTube channel of Prof. Hung-Yi Lee (NTU EE)
Convolutional Neural Networks for Visual Recognition (cs231n)
Suggestions about the course are welcome:
cmchang@iis.sinica.edu.tw and chihfan@iis.sinica.edu.tw