This document introduces neural networks and deep learning. It covers perceptrons, multilayer perceptrons for recognizing handwritten digits, and the backpropagation algorithm for training neural networks. It also describes deep convolutional neural networks, including local receptive fields, shared weights, and pooling layers. As an example, it presents AlphaGo and how it combines a convolutional neural network with Monte Carlo tree search to master the game of Go.
2. Contents
1. Perceptron
2. Multilayer Perceptron (MLP)
3. Algorithm of Neural Networks
4. Deep Networks
5. AlphaGo
3. Structure of a perceptron (developed in the 1950s)
A simple model that emulates a single neuron
A perceptron takes binary inputs ($x_1, x_2, x_3, \dots$)
and produces a single binary output (0 or 1)
Perceptron
$$\text{output} = \begin{cases} 0 & \text{if } \sum_j \omega_j x_j \le T \\ 1 & \text{if } \sum_j \omega_j x_j > T \end{cases}$$
[Diagram: binary inputs $x_1, x_2, x_3$ with weights $\omega_1, \omega_2, \omega_3$ feeding a neuron that compares $\sum_j \omega_j x_j$ against a threshold $T$]
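A minimal Python sketch of this rule (the example weights and threshold are illustrative assumptions, not from the slides):

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the binary inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Illustrative weights and threshold:
print(perceptron([1, 0, 1], weights=[2, 3, 1], threshold=3))  # 0, since 2 + 1 = 3 is not > 3
```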
4. Realistic example
Suppose the weekend is coming up
There is a cheese festival in your city
And you like cheese
→ Do you decide to go or not?
1. Is the weather good? ($x_1 = 1$ or $0$)
2. Does your girlfriend want to accompany you? ($x_2 = 1$ or $0$)
3. Is the festival near public transit? ($x_3 = 1$ or $0$)
The decision depends on the output value
Perceptron
$$\text{output} = \begin{cases} 0\ (\text{don't go}) & \text{if } \sum_j \omega_j x_j \le \text{threshold} \\ 1\ (\text{go}) & \text{if } \sum_j \omega_j x_j > \text{threshold} \end{cases}$$
5. Affecting factors (input)
1. Weather ($x_1 = 1$ or $0$)
2. Girlfriend ($x_2 = 1$ or $0$)
3. Public transit ($x_3 = 1$ or $0$)
Weight, threshold, and output (decision)
If $\omega_1 = 6$, $\omega_2 = 2$, and $\omega_3 = 2$:
→ Threshold = 5: the decision depends only on the weather ($\omega_2 + \omega_3 = 4$ can never exceed 5)
→ Threshold = 3: more willing to go to the festival (any two favorable factors suffice)
Perceptron
Go (1): $\sum_j \omega_j x_j > \text{threshold}$
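A short Python sketch of this decision rule with the weights above (variable names are my own):

```python
def go_to_festival(weather, girlfriend, transit, threshold):
    """Perceptron decision: 1 = go, 0 = don't go."""
    weights = [6, 2, 2]  # weather dominates; the other factors matter less
    total = sum(w * x for w, x in zip(weights, [weather, girlfriend, transit]))
    return 1 if total > threshold else 0

# Bad weather, but girlfriend and transit are favorable:
print(go_to_festival(0, 1, 1, threshold=5))  # 0: 4 <= 5, so only the weather can decide
print(go_to_festival(0, 1, 1, threshold=3))  # 1: 4 > 3, more willing to go
```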
6. Recognizing handwritten digits
Handwritten digits
Rule-based approach
A "9" has a loop at the top and a vertical stroke in the bottom right
Rules become complicated, and there are many exceptions
[Image: sample handwritten digits 5 0 4 1 9 2]
7. Neural networks approach
Use a large set of training samples (handwritten digit data)
Develop a system able to learn from those examples
- Automatically infer the rules for recognizing digits
Recognizing handwritten digits
8. Four-layer net (with two hidden layers)
Multilayer Perceptron (MLP)
Input: intensities of pixels (e.g., 4096 input neurons for a 64-by-64 grayscale image of "9")
Output: < 0.5 for "not 9"; > 0.5 for "9"
Not efficient: the network only decides whether the image is a "9"
[Diagram: binary inputs (pixel intensities) feeding two hidden layers and one output neuron]
9. Three-layer net to recognize each digit
Multilayer Perceptron (MLP)
Desired output for "5":
$$y(x) = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0)^T$$
Handwritten digit as a 28-by-28 pixel image
Binary input (intensity of a pixel)
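A small numpy sketch of the target encoding and a forward pass through such a net (layer sizes follow the slide; the 30 hidden neurons and the random weights are illustrative placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hot target vector for the digit "5" (index 5 of the 10 outputs).
y = np.zeros(10)
y[5] = 1.0

# Three-layer net: 784 inputs (28x28 pixels) -> 30 hidden -> 10 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((30, 784)), rng.standard_normal(30)
W2, b2 = rng.standard_normal((10, 30)), rng.standard_normal(10)

x = rng.random(784)            # placeholder for pixel intensities
a1 = sigmoid(W1 @ x + b1)      # hidden-layer activations
a2 = sigmoid(W2 @ a1 + b2)     # output activations; argmax gives the predicted digit
print(np.argmax(a2))
```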
16. Deep Networks
Architectures for learning deep networks:
Convolutional Neural Network (CNN)
Deep Belief Net (DBN)
Stacked Auto-Encoder (SAE)
Recurrent Neural Network (RNN)
17. Convolutional Neural Networks
Three characteristics of CNNs:
Local receptive fields (connectivity)
- Reduce connections between neurons
Shared weights
- Reduce the total number of weights and biases
Pooling layers
- Simplify (condense) information
18. Convolutional Neural Networks
Local receptive field (connectivity)
[Diagram: a 5-by-5 kernel (window) with weights $w_{1,1}, w_{1,2}, \dots, w_{5,5}$ slides over a 28-by-28 input, producing a 24-by-24 feature map via 2D convolution]
1. Detect local information (features), e.g., edges and shapes
2. Reduce connections between layers
• Fully connected network → $28 \times 28 \times 24 \times 24$ connections
• Locally connected network → $5 \times 5 \times 24 \times 24$ connections
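A quick check of these sizes in Python (straightforward arithmetic, shown for concreteness):

```python
n, k = 28, 5              # input width, kernel width
out = n - k + 1           # valid 2D convolution output width
print(out)                # 24
print(n * n * out * out)  # fully connected: 451,584 connections
print(k * k * out * out)  # locally connected: 14,400 connections
```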
19. Convolutional Neural Networks
Shared weights
1. Detect the same feature at other positions
2. Reduce the total number of weights and biases
3. Construct multiple feature maps (kernels)
$$\text{output} = \sigma\!\left(b + \sum_{l=0}^{4} \sum_{m=0}^{4} \omega_{l,m}\, a_{j+l,\,k+m}\right)$$
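A minimal numpy sketch of this equation: one shared 5-by-5 kernel and one bias produce a whole 24-by-24 feature map (random values stand in for learned weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(a, w, b):
    """Apply output = sigmoid(b + sum_{l,m} w[l,m] * a[j+l, k+m]) at every position (j, k)."""
    n, k = a.shape[0], w.shape[0]
    out = np.empty((n - k + 1, n - k + 1))
    for j in range(out.shape[0]):
        for m in range(out.shape[1]):
            out[j, m] = sigmoid(b + np.sum(w * a[j:j+k, m:m+k]))  # same w and b everywhere
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # placeholder input activations
kernel = rng.standard_normal((5, 5))  # one shared set of weights
print(feature_map(image, kernel, b=0.1).shape)  # (24, 24)
```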
20. Convolutional Neural Networks
Pooling layer
1. Simplify (condense) information in the feature map
2. Reduce connections (weights and biases)
Max-pooling: output only the maximum activation in each window
[Diagram: a convolutional layer followed by a pooling layer]
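A short numpy sketch of 2-by-2 max-pooling (the window size is the common choice; the slide does not specify one):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Condense a feature map by keeping the maximum activation in each size-by-size window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]         # trim to a multiple of size
    windows = trimmed.reshape(h // size, size, w // size, size) # group into windows
    return windows.max(axis=(1, 3))                             # max over each window

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))  # [[ 5.  7.] [13. 15.]]
```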
21. Deep Networks Application
AlphaGo
An artificial intelligence algorithm for the game of Go (바둑)
Google DeepMind
Convolutional Neural Network (CNN)
Monte Carlo Tree Search (MCTS)
Achieved a 99.8% winning rate against other Go programs
Defeated the human European Go champion 5:0
Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
22. AlphaGo
Convolutional Neural Network in AlphaGo
The board position as input: a 19-by-19 image
Policy network
Decides where to place a stone by calculating a probability for each move
Value network
Evaluates the current board position (winning probability)
Supervised learning (SL)
Learns from human expert moves (30 million board states)
Reinforcement learning (RL)
Self-play to reinforce the policy network
23. AlphaGo
Convolutional Neural Network in AlphaGo
Both the policy network and the value network have 13 convolutional layers
Input: the 19-by-19 board status
Output: the policy network gives a probability distribution over the next move; the value network gives the value (winning probability) of the game state
Computation and implementation (distributed version):
CPUs: 1202
GPUs: 176
Search threads: 64
Evaluating the 13-layer convolutional network takes about 30 billion operations
29. Appendix: Backpropagation algorithm
• Cost function:
$$C(\omega, b) = \frac{1}{2n} \sum_x \left\| y(x) - a^L(x) \right\|^2$$
• Cost function for a single training example:
$$C = \frac{1}{2} \left\| y - a^L \right\|^2 = \frac{1}{2} \sum_j \left( y_j - a_j^L \right)^2$$
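A small numpy sketch of the single-example quadratic cost (just the formula above, evaluated directly):

```python
import numpy as np

def quadratic_cost(y, a):
    """C = 0.5 * ||y - a||^2 for one training example."""
    return 0.5 * np.sum((y - a) ** 2)

y = np.zeros(10); y[5] = 1.0  # target vector for the digit "5"
a = np.full(10, 0.1)          # placeholder network output
print(quadratic_cost(y, a))   # 0.5 * (9 * 0.01 + 0.81) = 0.45
```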
Editor's Notes
How can we train such a network?
The network has the same structure as described earlier.
Initially, the network's weights and biases are set randomly.
When a single training example is fed in, the output we want is the target vector above.
But because the current weights are random, the network is very unlikely to produce that result,
so the current output will be some arbitrary vector.
We therefore define a cost function for this network as follows:
over all training data, the average difference between the output vector and the target vector.
Since the goal, as stated earlier, is to minimize the difference between target and output,
training the network can be viewed as the problem of minimizing the cost function.
Computing the output of the 13-layer convolutional network requires about 30 billion operations.
Selection: from the current state, search future moves along a given path (to a fixed number of moves ahead).
Expansion: expand the last node of the best move found among those searched from the current state.
Evaluation: predict the final (end-of-game) outcome from the expanded move (fast rollout: judge only a local region (3 by 3) of the board, not the whole board).
Backup: update the statistics along the path from the current state to the expanded node.
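A heavily simplified Python skeleton of the four MCTS phases described above (the `game` interface, the random rollout policy, and the UCB constant are illustrative assumptions; AlphaGo's actual search also mixes in policy-network priors and value-network evaluations):

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def ucb(node, c=1.4):
    # Upper confidence bound used to pick among children during Selection.
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root, game, n_simulations=1000):
    for _ in range(n_simulations):
        node = root
        # 1. Selection: follow the best-scoring children while fully expanded.
        while node.children and len(node.children) == len(game.legal_moves(node.state)):
            node = max(node.children.values(), key=ucb)
        # 2. Expansion: add one unexplored move as a new leaf.
        untried = [m for m in game.legal_moves(node.state) if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(game.play(node.state, move), parent=node)
            node = node.children[move]
        # 3. Evaluation: fast rollout with random moves until the game ends.
        state = node.state
        while not game.is_over(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        result = game.result(state)
        # 4. Backup: propagate the result along the path back to the root.
        # (A full implementation would flip the result sign for alternating players.)
        while node is not None:
            node.visits += 1
            node.value += result
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)
```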