6. 6
Max Pool
Input (4 x 4):
3 8 2 1
7 9 1 1
4 5 2 3
5 6 1 2
Output after 2 x 2 max pooling:
9 2
6 3
Pooling is applied after a convolutional layer to reduce noise and data size.
Local information is concentrated into higher-level information.
Besides max pooling, there is also average pooling.
Pool size: 2 x 2
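A minimal NumPy sketch of this 2 x 2 max pooling on the example input (swap .max for .mean to get average pooling):

import numpy as np

def max_pool_2x2(x):
    # Split the array into non-overlapping 2 x 2 blocks and keep each block's max
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[3, 8, 2, 1],
              [7, 9, 1, 1],
              [4, 5, 2, 3],
              [5, 6, 1, 2]])

print(max_pool_2x2(x))
# [[9 2]
#  [6 3]]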
7. 7
Input: x0, x1, x2
Output: a
Parameters (to be optimized): w0, w1, w2, b
z = w0 x0 + w1 x1 + w2 x2 + b
a = f(z), where f is the activation function
The nonlinear activation acts like an on/off switch for each neuron, letting the network model relationships a linear function cannot.
Sigmoid: f(z) = 1 / (1 + e^(-z))
ReLU: f(z) = max(0, z)
http://cs231n.github.io/neural-networks-1/
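A small NumPy sketch of this neuron; the input, weight, and bias values are made up for illustration:

import numpy as np

def sigmoid(z):
    # Squashes z into (0, 1): large negative z -> "off", large positive z -> "on"
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through, zeroes out the rest
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])   # inputs x0, x1, x2 (made-up values)
w = np.array([0.1, 0.4, -0.2])   # weights w0, w1, w2 (made-up values)
b = 0.05
z = np.dot(w, x) + b             # z = w0 x0 + w1 x1 + w2 x2 + b
print(sigmoid(z), relu(z))       # a = f(z) for each activation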
8. 8
Fully Connected Layers
All the inputs from the previous layer are combined at each node:
a_0^{[1]} = f(w_{0,0}^{[1]} x_0 + w_{1,0}^{[1]} x_1 + w_{2,0}^{[1]} x_2 + w_{3,0}^{[1]} x_3 + b_0^{[1]})
a_1^{[1]} = f(w_{0,1}^{[1]} x_0 + w_{1,1}^{[1]} x_1 + w_{2,1}^{[1]} x_2 + w_{3,1}^{[1]} x_3 + b_1^{[1]})
...
[Diagram: inputs x_0 ... x_3 feeding three fully connected layers of six nodes each, with activations a_0^{[1]} ... a_5^{[1]}, a_0^{[2]} ... a_5^{[2]}, and a_0^{[3]} ... a_5^{[3]}]
All the local features extracted in previous layers are fully connected with different weights to construct global features.
Complicated relationships between inputs can be revealed by deep networks.
https://github.com/drewnoff/spark-notebook-ml-labs/tree/master/labs/DLFramework
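A minimal NumPy sketch of one fully connected layer matching the equations above; the sizes and random weights are illustrative:

import numpy as np

def dense(x, W, b, f):
    # Fully connected layer: node i computes f(sum_j w_{j,i} x_j + b_i),
    # i.e. a^[1] = f(W x + b) with the weights w_{j,i} stored in row i of W.
    return f(W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # inputs x0 ... x3
W1 = rng.standard_normal((6, 4))      # 6 nodes, each combining all 4 inputs
b1 = np.zeros(6)
relu = lambda z: np.maximum(0.0, z)

a1 = dense(x, W1, b1, relu)           # activations a_0^{[1]} ... a_5^{[1]}
print(a1)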
10. 10
[Diagram: layer L-2 with activations a_0^{[L-2]}, a_1^{[L-2]}, a_2^{[L-2]}, ..., a_1023^{[L-2]} is connected by W^{[L-1]}, b^{[L-1]} to layer L-1 with activations a_0^{[L-1]}, ..., a_1023^{[L-1]}, which is connected by W^{[L]}, b^{[L]} to the output layer L with y_0, y_1, y_2, ..., y_9 for the digit labels 0-9]
Goal: minimize the difference between the predicted y and the true y.
1. With the initial parameters W and b, predict the label y with forward propagation and calculate the cost.
2. Optimize the parameters of layer L, W^{[L]} and b^{[L]}, assuming the inputs from layer L-1, A^{[L-1]}, do not change.
3. Calculate the change of the layer L-1 input, A^{[L-1]}, that is needed to minimize the cost function, assuming the parameters W^{[L]} and b^{[L]} do not change.
4. Optimize the parameters of layer L-1, W^{[L-1]} and b^{[L-1]}, from the desired changes of A^{[L-1]}.
5. Proceed like this all the way to the first layer, optimizing the parameters W and b of all layers.
6. Running forward propagation and backpropagation once over the whole training set is called one epoch; run multiple epochs until the cost is near its minimum. (A sketch of steps 1-5 follows below.)
[Diagram arrows: forward propagation A^{[L-2]} → A^{[L-1]} → y; backpropagation runs in the reverse direction]
https://en.wikipedia.org/wiki/Backpropagation
https://en.wikipedia.org/wiki/Geoffrey_Hinton
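A minimal NumPy sketch of steps 1-5 for a two-layer network; the sizes, random data, learning rate, and loss (softmax cross-entropy) are illustrative assumptions, not the project's exact settings:

import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 64, 1024, 10                # toy layer sizes
W1 = rng.standard_normal((n_hidden, n_in)) * 0.01   # W^[L-1]
b1 = np.zeros((n_hidden, 1))                        # b^[L-1]
W2 = rng.standard_normal((n_out, n_hidden)) * 0.01  # W^[L]
b2 = np.zeros((n_out, 1))                           # b^[L]

X = rng.standard_normal((n_in, 32))                 # 32 random samples
Y = np.eye(n_out)[:, rng.integers(0, n_out, 32)]    # one-hot true labels

lr = 0.1
for epoch in range(10):
    # Step 1: forward propagation and cost (softmax + cross-entropy)
    A1 = np.maximum(0.0, W1 @ X + b1)               # A^[L-1], ReLU
    Z2 = W2 @ A1 + b2
    Yhat = np.exp(Z2 - Z2.max(axis=0))
    Yhat /= Yhat.sum(axis=0)                        # predicted y
    cost = -np.mean(np.sum(Y * np.log(Yhat + 1e-12), axis=0))

    # Step 2: gradients of the last layer, with A^[L-1] held fixed
    dZ2 = (Yhat - Y) / X.shape[1]
    dW2 = dZ2 @ A1.T
    db2 = dZ2.sum(axis=1, keepdims=True)

    # Step 3: desired change of the L-1 layer input, with W^[L] held fixed
    dA1 = W2.T @ dZ2

    # Step 4: gradients of layer L-1 from the desired change of A^[L-1]
    dZ1 = dA1 * (A1 > 0)                            # ReLU derivative
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)

    # Step 5: update the parameters of all layers
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    print(epoch, cost)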
11. 11
A proportion of nodes in the layer are randomly ignored for each training sample.
This technique forces the network to learn features in a distributed way and reduces overfitting.
Dropout is applied after the two pooling layers and the first two fully connected layers.
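A minimal sketch of the mechanics in NumPy (inverted dropout, a common formulation; the rate of 0.25 is illustrative):

import numpy as np

def dropout(a, rate, rng):
    # Randomly ignore a proportion `rate` of nodes for this training sample,
    # rescaling the survivors so the expected activation stays the same.
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

rng = np.random.default_rng(0)
a = rng.standard_normal(8)       # activations of one layer
print(dropout(a, 0.25, rng))     # about 25% of the nodes are zeroed out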
12. 12
After 1 hour of training (30 epochs), the model achieved 99.67% accuracy.
Some images are hard even for a human to recognize; more samples like these can help.
Can you predict the true value?
18. 18
• A convolutional neural network is very powerful for analyzing visual images.
• The convolutional layers capture local features.
• The pooling layers concentrate local changes and also reduce noise and data size.
• The fully connected layers combine all the local features to generate global features.
• The global features are combined to make the final judgement, here the probabilities of the labels 0-9 (see the model sketch below).
• Can humans understand artificial neural networks?
• Is there any similarity between how the brain and a CNN process visual information?
• What is the meaning of the local and global features generated by machines?
• Can humans understand machines' logic?
Python code of the project at kaggle: https://www.kaggle.com/dingli/digits-recognition-with-cnn-keras
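For reference, a minimal Keras sketch of the kind of architecture summarized above (two convolution + pooling stages with dropout, then fully connected layers and a softmax over the 10 digit labels); the exact layer sizes and dropout rates in the Kaggle notebook may differ:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),            # pooling layer, followed by dropout
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # fully connected: global features
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"), # probabilities of labels 0-9
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])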