2. Neural Network
● A neural network is the model used in deep learning
● Designed to simulate the human brain
● Consists of several layers; data is passed through them one by one
[Diagram: data flows from the input layer, through two hidden layers, to the output layer]
3. Perceptron
● In one perceptron, the output value is computed as
y = Σi wi xi
z = f(y)
● Where xi are the inputs, wi are the weights, and f is the activation function
[Diagram: inputs x1, x2, x3 are multiplied by their weights, summed into y, and passed through f to produce the output z]
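As a minimal sketch (not from the slides), a single perceptron fits in a few lines of NumPy; the input and weight values below are illustrative assumptions:

```python
import numpy as np

def perceptron(x, w, f):
    """Compute y = sum_i w_i * x_i, then z = f(y)."""
    y = np.dot(w, x)   # weighted sum of the inputs
    return f(y)        # apply the activation function

# Illustrative inputs and weights (not from the slides)
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, -0.6, 0.9])
print(perceptron(x, w, np.tanh))  # tanh as an example activation
```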
4. One Layer
● For an input vector x, the output of one layer is computed as
y = Wx + b
z = f(y)
Where W is the weight matrix of the layer, b is the bias vector, and f is the activation function.
● W and b are called parameters because we modify them to optimize the model
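A minimal sketch of one layer's forward pass, assuming NumPy; the layer sizes and values are illustrative:

```python
import numpy as np

def layer_forward(x, W, b, f):
    """One layer: y = Wx + b, then z = f(y)."""
    y = W @ x + b
    return f(y)

# Illustrative shapes: 3 inputs -> 4 outputs (sizes are assumptions)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # weight matrix of the layer
b = np.zeros(4)                  # bias vector
x = np.array([0.2, -1.0, 0.5])   # input vector
z = layer_forward(x, W, b, np.tanh)
```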
5. Activation Function
● Without activation functions, multiple layers would be meaningless: a composition of linear maps is still just one linear map
● A good activation function is non-linear, differentiable, and monotonically increasing.
● Logistic function
● Hyperbolic Tangent
● Rectified linear function (ReLU)
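For reference, the three activation functions listed above can be written in NumPy as follows:

```python
import numpy as np

def logistic(y):
    """Logistic function: 1 / (1 + e^-y), squashes output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def tanh(y):
    """Hyperbolic tangent, squashes output into (-1, 1)."""
    return np.tanh(y)

def relu(y):
    """Rectified linear function: max(0, y)."""
    return np.maximum(0.0, y)
```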
6. For Various Problems
● The activation function of the output layer depends on the type of problem
● For regression:
○ Activation function: Identity function
○ Length of output vector: Any
● For binary classification:
○ Activation function: Logistic function
○ Length of output vector: 1
● For multi-class classification (see the softmax sketch below):
○ Activation function: Softmax function
○ Length of output vector: Number of classes
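A minimal sketch of the softmax function; subtracting the maximum is a standard numerical-stability trick, not something the slides specify:

```python
import numpy as np

def softmax(y):
    """Map raw scores to a probability vector that sums to 1."""
    e = np.exp(y - np.max(y))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099]
```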
7. Example Task
● Task setting:
○ Given a picture of a hand-written number 0-9, we want to tell which number it is
○ The training dataset consists of many pictures, each labeled with the correct answer.
● Analyze task:
○ Problem type: Multi-class classification
○ Activation function of output layer: Softmax function
8. Learn: Minimize Error
● Consider an error function E(xi, di; W, b) which represents how far off the model is from the true value for the i-th picture. Here xi is the vector representation of the picture.
● In our example we use the cross-entropy error
E(xi, di; W, b) = -Σk dik log zik
where zik is the k-th output of the network for picture xi. Here, dik is 1 only if xi is actually a picture of the number k, and 0 otherwise.
● We want to modify W and b to minimize the error function.
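A minimal sketch of this error for one picture, assuming the one-hot target d and softmax output z described above (the example values are illustrative):

```python
import numpy as np

def cross_entropy(z, d):
    """E = -sum_k d_k * log(z_k) for one picture.

    z: softmax output of the network (probabilities)
    d: one-hot target (d_k = 1 only for the correct class)
    """
    eps = 1e-12  # avoid log(0)
    return -np.sum(d * np.log(z + eps))

z = np.array([0.7, 0.2, 0.1])   # illustrative network output
d = np.array([1.0, 0.0, 0.0])   # the true class is 0
print(cross_entropy(z, d))      # -log(0.7) ≈ 0.357
```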
9. Learn: Gradient Descent
We use Gradient Descent to modify parameters W and b.
● Imagine you are on a mountain and want to reach the top, but you have lost your map and cannot see far because of fog. How do you reach the top?
➔Move in the direction that takes you highest.
● The vector that indicates the direction to move is called the gradient
● Since we want to minimize (instead of maximize), we update the parameters by subtracting the gradient.
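The update rule is simply "subtract the gradient", scaled by a learning rate; a sketch (the function name and learning-rate value are assumptions):

```python
def sgd_step(params, grads, lr=0.01):
    """Subtract the gradient from every parameter in place.

    params: list of NumPy arrays (e.g. [W1, b1, W2, b2])
    grads:  gradients of the error with respect to each parameter
    lr:     learning rate, scales the step size (value is an assumption)
    """
    for theta, g in zip(params, grads):
        theta -= lr * g  # theta <- theta - lr * dE/dtheta
```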
10. Learn: Backpropagation
● Neural networks with multiple layers are too complex for their gradients to be computed directly.
● This problem was one bottleneck in the early stages of deep learning's development.
● Backpropagation computes the gradient from the output layer back to the input layer (a lot of chain rules).
[Diagram: the same network; backpropagation runs from the output layer back toward the input layer]
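A minimal sketch of backpropagation for a tiny two-layer network with tanh hidden units and a softmax/cross-entropy output; the architecture, sizes, and data are illustrative assumptions, not the slides' exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)             # illustrative input vector
d = np.array([0.0, 1.0])           # one-hot target, 2 classes

# Parameters of a tiny 3 -> 4 -> 2 network (sizes are assumptions)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

# Forward pass
y1 = W1 @ x + b1
z1 = np.tanh(y1)                   # hidden activation
y2 = W2 @ z1 + b2
e = np.exp(y2 - y2.max())
z2 = e / e.sum()                   # softmax output

# Backward pass: apply the chain rule from output layer to input layer
g_y2 = z2 - d                      # dE/dy2 for softmax + cross-entropy
g_W2 = np.outer(g_y2, z1)          # dE/dW2
g_b2 = g_y2                        # dE/db2
g_z1 = W2.T @ g_y2                 # push the gradient back one layer
g_y1 = g_z1 * (1.0 - z1 ** 2)      # tanh'(y1) = 1 - tanh(y1)^2
g_W1 = np.outer(g_y1, x)           # dE/dW1
g_b1 = g_y1                        # dE/db1

# One gradient descent step on every parameter
lr = 0.1
W1 -= lr * g_W1; b1 -= lr * g_b1
W2 -= lr * g_W2; b2 -= lr * g_b2
```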
11. Why Deep Learning?
● Deep learning has tons of parameters (things that we can change to optimize
the model)
➔Better accuracy
➔Hard to optimize
➔Need a lot of data
● Flexible model
➔Can be applied to different types of problems
➔Easy to modify the model for various situations
12. Experiment
● Run a Python script for the example task
● The input vector has length 784 (28 × 28 pixels)
● The configuration (sketched below) is:
○ 2 hidden layers
○ 1000 perceptrons in each layer
○ ReLU as the activation function for the hidden layers
○ Softmax as the activation function for the output layer
○ Batch size: 100
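A sketch of this configuration in the style of Chainer's MNIST example; the code below is my paraphrase of that example, not the exact script that was run:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    """784 -> 1000 -> 1000 -> 10, with ReLU in the hidden layers."""

    def __init__(self, n_units=1000, n_out=10):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)  # input size (784) inferred
            self.l2 = L.Linear(None, n_units)
            self.l3 = L.Linear(None, n_out)

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)  # softmax is applied inside the loss below

# L.Classifier applies softmax_cross_entropy to the raw outputs
model = L.Classifier(MLP())
```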
13. Result
● For this experiment, I used Chainer's example code
● Execution time: about 45 minutes
● Final Validation Loss: 0.107
● Final Validation Accuracy: 0.98
14. Use Cases of Deep Learning
● Convolutional Neural Network: Deep learning specialized for image data
○ Object identification
○ Face recognition
● Recurrent Neural Network: Deep learning for sequential data
○ Speech recognition
○ Text processing
● DQN: A combination of deep learning and Q-learning
○ AlphaGo, which defeated a top-level Go player, was built on deep reinforcement learning of this kind