Topic 8
Introduction to Deep Learning
Dr. Sunu Wibirama
Artificial Intelligence Course Module
Course code: UGMx 001001132012
July 4, 2022
1 Course Learning Outcomes
This topic fulfills CPMK 5 (course learning outcome 5): the ability to define several classical machine learning techniques (linear regression, rule-based machine learning, probabilistic machine learning, clustering) and the basic concepts of deep learning, along with their implementation in image recognition (convolutional neural network).
The indicators that this outcome has been achieved are: knowing the brief history and applications of deep learning; understanding the concepts of backpropagation and the perceptron; and understanding how a convolutional neural network works.
2 Scope of Material
The material in this topic covers the following:
a) Artificial intelligence and deep learning: this material discusses the history and development of neural network research as one of the origins of artificial intelligence technology. It also covers the invention of the perceptron, the perceptron's inability to solve a simple nonlinear problem, the invention of the backpropagation method, and the use of deep learning today. It also discusses the factors underlying the rapid development of deep learning technology: hardware, big data, and software.
b) Visualizing deep learning: this material explains how deep learning works visually, with easy-to-understand illustrations. The basic concepts explained are the training process in a neural network, how a neural network processes its inputs, the concept of weights, the concept of activation functions, and the basics of the feed-forward neural network.
c) Deep learning essentials: this material discusses the basic concepts of deep learning in detail: the perceptron, stacking perceptrons to form neural networks, optimization through backpropagation, and adaptive learning. The concept of a loss function is also discussed in detail, covering binary cross entropy, mean squared error, and empirical loss.
d) Convolutional neural network: this material discusses in detail the concept of feature engineering with convolution, pooling, normalization, and the dense network in a deep learning architecture, the convolutional neural network. It also discusses the strengths and weaknesses of convolutional neural networks and their various implementations.
01/07/2022
sunu@ugm.ac.id
Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Introduction to Deep Learning (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Source: A. Amini (6.S191 Introduction to Deep Learning | MIT, 2019)
The rise of deep learning
https://www.mygreatlearning.com/blog/deep-learning-applications/
Deep learning applications
Source: A. Amini (6.S191 Introduction to Deep Learning | MIT, 2019)
What is deep learning?
Artificial intelligence and deep learning
Neural networks vs. deep neural networks
Source: https://www.pnas.org/content/pnas/116/4/1074/F2.large.jpg
http://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html
Milestones in the development of neural networks
Introduction to Deep Learning (Part 02)
Early version of neural networks
(Andrew Beam, 2017)
(Andrew Beam, 2017)
The first AI Winter (1969)
The early perceptron was not able to classify a simple nonlinear function such as the XOR function
XOR problem in the early perceptron
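The XOR limitation can be checked directly. As a small sketch (not from the lecture; the grid of candidate weights is my own choice), we search every linear threshold unit w1·x1 + w2·x2 + b over a coarse grid and find that none reproduces XOR:

```python
import itertools

# XOR truth table: inputs and target outputs.
xor_cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, b):
    """True if a single linear threshold unit reproduces all four XOR cases."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in xor_cases)

# Brute-force search over a coarse grid of weights and biases.
grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
found = any(separates(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False: no single-layer perceptron reproduces XOR
```

No grid resolution would help: XOR is not linearly separable, which is exactly the weakness Minsky and Papert pointed out.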
(Andrew Beam, 2017)
The emergence of backpropagandists (1986)
Neural networks were used to recognize handwriting on bank checks (LeCun, 1999)
(Andrew Beam, 2017)
The rise of deep learning in Silicon Valley (2012)
• 2012 was the first year that neural nets grew to prominence
• Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used
them to win that year’s ImageNet competition (the
annual Olympics of computer vision)
• They dropped the classification error record from 26.2% to
15.3%, a remarkable improvement at the time.
• Ever since then, a host of companies have been using
deep learning at the core of their services:
• Facebook uses neural nets for their automatic tagging
algorithms
• Google for their photo search
• Amazon for their product recommendations
• Pinterest for their home feed personalization
• Instagram for their search infrastructure.
(Andrew Beam, 2017)
Deep learning has mastered GO (2016)
Playing video games with deep learning
(Andrew Beam, 2017)
https://towardsdatascience.com/machine-learning-methods-to-aid-in-coronavirus-response-70df8bfc7861
Using deep learning to investigate the probability of COVID-19 infection through classification of CT scan images (2020)
Source: A. Amini (6.S191 Introduction to Deep Learning | MIT, 2019)
What was behind the emergence of deep learning?
Introduction to Deep Learning (Part 03)
Workflow of traditional machine learning
Most machine learning research tries to develop novel features for more accurate performance.
Deep learning
Pros and cons
Source: Introducing Deep Learning with Matlab (Mathworks, 2018)
Where to start learning deep learning?
• Easy: John D. Kelleher, “Deep Learning”, MIT Press, 2019
• Medium: Jon Krohn, Grant Beyleveld, Aglaé Bassens, “Deep Learning Illustrated”, Pearson, 2019
• Hard: Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press, 2016
MIT 6.S191 – Introduction to Deep Learning
http://introtodeeplearning.com
Deep Learning Course @ NYU Center for Data Science
https://cds.nyu.edu/deep-learning/
https://atcold.github.io/pytorch-Deep-Learning/
Introduction to Deep Learning (Part 04)
Read these papers (1986 and 2015) for a detailed explanation
Basic concept of deep learning
The Perceptron: forward propagation
Source: A. Amini (6.S191 Introduction to Deep Learning | MIT, 2019)
Source: A. Amini (6.S191 Introduction to Deep Learning | MIT, 2019)
Common activation functions
Source: A. Amini (6.S191 Introduction to Deep Learning | MIT, 2019)
Importance of activation functions
Now, let’s start with the simplest case: a camera with a resolution of 2 × 2 pixels.
The following slides are based on Brandon Rohrer’s lecture (2017), crowned as the best deep learning lecture on KDnuggets
A four-pixel camera
Categorize images: solid, vertical, diagonal, horizontal
If we have a picture, can we ask our computer to decide what type of image it is?
However, if there are so many possible combinations, simple rules can’t do it
Instead of coding the rules by hand, we use neural networks
Introduction to Deep Learning (Part 05)
Diagrams of neurons
[Figure: dendrites, soma, and axon of a neuron. Drawing of Purkinje cells (A) and granule cells (B) from pigeon cerebellum by Santiago Ramón y Cajal, 1899; Instituto Cajal, Madrid, Spain]
Diagrams of neurons
• Axons can connect to dendrites strongly, weakly, or
somewhere in between.
• Medium connection (.6)
• Strong connection (1.0)
• Weak connection (.2)
• No connection is a 0.
• Lots of axons connect with the dendrites of one neuron.
• Each has its own connection strength.
Redrawing and simplifying …
Adding quantitative weights
[Figure: example connection weights .8, .9, .2, .3, .5]
Back to our 2 x 2 image…
Input neurons
Pixel brightness
We quantify the brightness with a scaled range: -.75, -.50, -.25, 0.0, +.25, +.50, +.75, +1.0
Input vector: (.75, -.75, 0.0, .50)
Receptive fields
Each input neuron will only consider the pixel value at a particular position, regardless of the pixel values at the other positions
A neuron
Sum all the inputs
.75 + (-.75) + 0.0 + .50 = .50
Weights
With all weights set to 1.0: .75 × 1.0 + (-.75) × 1.0 + 0.0 × 1.0 + .50 × 1.0 = .50
The art of neural networks lies in the strength of each connection: the weight of each input.
Weights
With weights (-.5, .8, 0.0, -.2): .75 × (-.5) + (-.75) × .8 + 0.0 × 0.0 + .50 × (-.2) = -1.075
Weights
Now, we represent each weight visually:
Black: negative
Missing: zero
Orange: positive
Squash the result
The weighted sum -1.075 is squashed to -0.746.
Sigmoid squashing function
[Figure: an S-shaped curve mapping inputs on the horizontal axis (-2.0 to 2.0) to outputs on the vertical axis (-1.0 to 1.0)]
Your number goes in on the horizontal axis.
The squashed version comes out on the vertical axis.
No matter what you start with,
the answer stays between -1 and 1.
Introduction to Deep Learning (Part 06)
Squash the result
The weighted sum -1.075 is squashed to -0.746.
Weighted sum-and-squash neuron
Inputs (.75, -.75, 0.0, .50) produce the output -0.746.
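The weighted sum-and-squash neuron can be sketched in a few lines. This is a sketch with assumptions: the input-to-weight pairing is mine, chosen so the weighted sum matches the slides' value of -1.075, and tanh is used as the squashing function (the lecture's exact S-curve may differ slightly, so the squashed value does not match -0.746 exactly):

```python
import math

# Pixel inputs from the slides; the input-to-weight pairing below is an assumption
# chosen so the weighted sum matches the slides' value of -1.075.
inputs = [0.75, -0.75, 0.0, 0.50]
weights = [-0.5, 0.8, 0.0, -0.2]

def neuron(x, w):
    """Weighted sum-and-squash neuron: multiply, add, then squash into (-1, 1)."""
    z = sum(xi * wi for xi, wi in zip(x, w))
    return math.tanh(z)  # tanh is one common choice of squashing function

z = sum(xi * wi for xi, wi in zip(inputs, weights))
print(round(z, 3))                       # -1.075
print(round(neuron(inputs, weights), 3)) # a value in (-1, 0)
```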
Make lots of neurons, identical except for weights
To keep our picture clear, weights will be either 1.0 (orange), -1.0 (black), or 0.0 (missing)
Receptive fields get more complex
Repeat for additional layers
Receptive fields get still more complex
Repeat with a variation
Rectified linear units (ReLUs)
If your number is positive, keep it. Otherwise you get a zero.
[Figure: the ReLU curve is zero for negative inputs and the identity for positive inputs]
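The ReLU rule above is one line of code. A minimal sketch:

```python
def relu(x):
    """Rectified linear unit: keep positive values, zero out the rest."""
    return x if x > 0 else 0.0

print([relu(v) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)])  # [0.0, 0.0, 0.0, 0.5, 2.0]
```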
• Positive: solid white; Negative: solid black
• Positive: left vertical; Negative: right vertical
• Positive: right diagonal; Negative: left diagonal
• Positive: bottom horizontal; Negative: top horizontal
Add an output layer: solid, vertical, diagonal, horizontal
Remember the big picture?
If we have more pixels, we can represent more complex receptive fields
Now, let’s set the receptive fields according to the input (solid, vertical, diagonal, horizontal)
Summary
• Traditional machine learning research focuses on finding novel features. This works well for small training datasets, but requires domain knowledge.
• Deep learning is a type of neural network with many hidden layers and a new type of training method.
• So far, we have learned the basic concepts of:
– Perceptron
– Activation functions in neural networks
– Feed-forward architecture
Deep Learning Essentials (Part 01)
Core components of deep learning
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
The Perceptron: forward propagation
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
The Perceptron: forward propagation
The sigmoid activation g(z) = 1 / (1 + e^(-z)) uses Euler’s number e = 2.718…
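Forward propagation through a single perceptron is a weighted sum plus a bias, passed through the activation. A minimal sketch with made-up weights and inputs (the specific numbers are mine, not from the lecture):

```python
import math

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + e^(-z)), using Euler's number e."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b):
    """Forward propagation: weighted sum plus bias, then activation."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return sigmoid(z)

# Hypothetical inputs, weights, and bias for illustration: z = 1 + 3 - 4 = 0.
print(perceptron([1.0, 2.0], [3.0, -2.0], 1.0))  # 0.5
```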
Common activation functions
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Importance of activation functions
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
The Perceptron: example
Deep Learning Essentials (Part 02)
Core components of deep learning
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
From perceptron to feed forward networks
Perceptron: simplified
Note: for simplicity, we omit the bias from the drawing
From perceptron to feed forward networks
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Multi-output perceptron
From perceptron to feed forward networks
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Single layer neural networks
From perceptron to feed forward networks
Deep feed forward networks
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
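Stacking layers of perceptrons gives a deep feed-forward network: each layer feeds its outputs to the next. A tiny sketch with made-up weights (the 2-3-1 shape and all numbers are illustrative assumptions):

```python
import math

def dense(x, W, b):
    """One fully connected layer: for each row of W, a sigmoid perceptron."""
    return [1.0 / (1.0 + math.exp(-(sum(wij * xj for wij, xj in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]

# A tiny 2-3-1 network with made-up weights, stacking layers of perceptrons.
W1 = [[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]]; b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]];                     b2 = [0.2]
hidden = dense([1.0, 2.0], W1, b1)  # hidden layer activations
output = dense(hidden, W2, b2)      # network output, a value in (0, 1)
print(len(hidden), len(output))     # 3 1
```

Adding more hidden layers between W1 and W2 is what makes the network "deep".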
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Example problem
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Example problem: Will I pass this class?
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Quantifying loss
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Empirical loss
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Binary cross entropy loss
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Mean squared error loss
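The two losses named above can be sketched directly from their standard definitions (the definitions are textbook-standard; the variable names and sample numbers are mine):

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """BCE: -(1/n) * sum( y*log(p) + (1-y)*log(1-p) ), for labels in {0, 1}."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

def mean_squared_error(y_true, y_pred):
    """MSE: (1/n) * sum( (y - yhat)^2 ), for real-valued targets."""
    n = len(y_true)
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n

print(mean_squared_error([1.0, 2.0], [1.5, 1.5]))  # 0.25
print(binary_cross_entropy([1, 0], [0.9, 0.1]))    # a small positive loss
```

Both are instances of the empirical loss: an average of a per-example loss over the dataset.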
Deep Learning Essentials (Part 03)
Core components of deep learning
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Core maths in deep learning
Chain rule
A composite function is the composition of two functions: one function takes the output of the other as input.
Consider f(x) = x^2 and g(x) = 2x. You can compose these functions as
h(x) = f(g(x)) = (2x)^2
To calculate derivatives of composite functions, you need to use the chain rule:
dh(x)/dx = df(g)/dg · dg(x)/dx
Note:
The chain rule is important for understanding deep neural networks: to update the network parameters, the chain rule is used to calculate the derivative of the cost function with respect to each parameter, and the parameters are updated accordingly (backpropagation).
Chain rule
Again, consider f(x) = x^2 and g(x) = 2x.
From the basic rules of derivatives, we have:
df(x)/dx = 2x and dg(x)/dx = 2.
Using the chain rule, let's calculate the derivative of h(x) = f(g(x)):
dh(x)/dx = (df(g)/dg) (dg(x)/dx)
Because f(g) = g^2, we have f'(g) = 2g = 2 g(x).
In addition, we also have g'(x) = 2.
Thus: h'(x) = f'(g(x)) g'(x) = 2(2x) 2 = 8x
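The result h'(x) = 8x can be sanity-checked numerically with a central finite difference (a minimal sketch of my own, not from the slides):

```python
def h(x):
    # h(x) = f(g(x)) with f(x) = x**2 and g(x) = 2*x, i.e. (2x)^2 = 4x^2
    return (2 * x) ** 2

def numeric_derivative(func, x, eps=1e-6):
    # Central finite-difference approximation of d func / dx
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x = 3.0
analytic = 8 * x                       # chain rule: h'(x) = 2*(2x)*2 = 8x
numeric = numeric_derivative(h, x)
print(analytic, round(numeric, 4))    # both should be ~24.0
```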
Partial derivatives and gradient
• You can use the stylized cursive letter d, ∂, often called "curly d", to denote partial derivatives.
• As a first example, take a function f(x, y) that takes the two variables x and y as input.
• The partial derivatives of f(x, y) are derivatives with respect to each independent variable (x and y).
• The partial derivative of f(x, y) with respect to x treats y as a constant.
• Likewise, the partial derivative of f(x, y) with respect to y treats x as a constant.
• Suppose you calculate the derivative with respect to each variable of a function f(x1, x2, x3, ..., xn) and store all these partial derivatives in a vector denoted by the symbol ∇ ("nabla").
• This vector is the gradient of f: ∇f is pronounced "gradient of f", "grad f", or "del f", and contains the partial derivatives of the function with respect to each variable:
∇f = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)
http://www.claudiobellei.com/2018/01/06/backprop-word2vec/
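The gradient can also be approximated numerically, one central difference per coordinate (a minimal sketch; the example function f(x, y) = x^2 + y^2 is my own choice):

```python
def f(point):
    # Example function (my own choice for illustration): f(x, y) = x^2 + y^2
    x, y = point
    return x ** 2 + y ** 2

def gradient(func, point, eps=1e-6):
    # Approximate the gradient: one central difference per coordinate
    grad = []
    for i in range(len(point)):
        plus = list(point); plus[i] += eps
        minus = list(point); minus[i] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

g = gradient(f, [1.0, 2.0])
print([round(v, 4) for v in g])   # analytic gradient is (2x, 2y) = (2.0, 4.0)
```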
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Learning: loss optimization
In deep learning, the way to adjust the weights is to optimize the loss function.
Thus, learning means automatically finding the weights that minimize the loss function:
find the elements of the weight vector W (consisting of several weights) that minimize the loss J(W).
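Repeatedly stepping the weights against the gradient can be sketched in a few lines (a toy example; the quadratic loss and learning rate are my own choices, not from the slides):

```python
def loss(w):
    # Toy convex loss with its minimum at w = 3 (illustrative only)
    return (w - 3.0) ** 2

def grad(w):
    # Analytic derivative of the toy loss: dJ/dw = 2*(w - 3)
    return 2.0 * (w - 3.0)

w = 0.0            # initial weight
lr = 0.1           # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)   # step in the opposite direction of the gradient
print(round(w, 4))      # converges to 3.0
```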
Loss optimization
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
The gradient of the loss is computed for all layers of the network.
Gradient descent
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Computing gradient: backpropagation
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Training deep neural networks is difficult
Loss functions can be difficult to optimize
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Source: https://www.jeremyjordan.me/nn-learning-rate/
Setting the learning rate
Adaptive learning rates
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
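One widely used adaptive scheme is Adam, which scales each step by running estimates of the gradient's first and second moments. A compact sketch (my own minimal implementation on a toy loss, not taken from the slides):

```python
def adam_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: adapt the step per parameter using running
    # estimates of the gradient mean (m) and squared gradient (v)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)      # bias-corrected second moment
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# Minimize the toy loss J(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 3001):
    g = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, g, m, v, t)
print(round(w, 2))   # w approaches the minimum at 3.0
```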
Review: core components of deep learning
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Sunu Wibirama
Convolutional Neural Network (Part 01)
Kecerdasan Buatan | Artificial Intelligence
What is Convolutional Neural Network?
• Directly using the raw original image for recognition (as in the previous lecture) leads to poor accuracy; the CNN was developed to address this.
• CNN is one of the most popular deep learning architectures.
• It is a very good deep learning architecture for computer vision tasks: object recognition / object classification.
• It applies “convolution” to extract features from the training set before sending the feature values to neural network layers.
• Used by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton to win the ImageNet competition (dropping the top-5 classification error from 26.2% to 15.3%).
• Used in Nvidia's autonomous car.
How can you differentiate cat vs. dog?
How does a self-driving car work?
https://devblogs.nvidia.com/deep-learning-self-driving-cars/
Training the neural network: during data collection, camera inputs are recorded together with the actual output by the human driver; during training, the network's computed output is compared against that actual output. The trained network is then used to generate steering commands from a single front-facing center camera.
CNN architecture
A CNN consists of a feature extraction network followed by a classifier network.
Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
Feature extraction network: convolution
Convolution → ReLU → Pooling → Convolution → ReLU → Convolution → ReLU → Pooling
How can you differentiate a Samoyed dog and a Wolf?
The computer should be robust to small variations of a similar object (e.g., images of a Samoyed dog taken from different points of view), but sensitive enough to recognize different objects with almost similar appearance (Samoyed dog vs. Wolf).
A toy CNN: X’s (Samoyed dog) and O’s (Wolf)
The CNN takes a two-dimensional array of pixels as input and says whether the picture is of an X or an O.
For example, one input image yields “X”, another yields “O”.
Trickier cases: variations of a similar object
The CNN should still output “X” or “O” under translation, scaling, weight (stroke thickness), and rotation.
Deciding is hard
What do you think? Is it the same “X”?
What computers see
Comparing the pixel grids directly, some parts match the original image and some parts don’t (original image vs. rotated image).
Computers are literal
A computer will say, “Uncertain — I don’t know whether this image matches the other.”
CNNs match pieces of the image
Rather than matching the whole image, we match parts of the image.
Features match pieces of the image
• We provide convolutional filters for this. The values of each filter are trained using the backpropagation algorithm.
• These values are therefore updated continuously throughout training, much like the connection weights of an ordinary neural network.
Three different convolutional filters (kernels): a diagonal line (downward left to right), a diagonal line (downward right to left), and a little “X”.
Notes about the convolutional kernel
In this example, the pixel values of the kernel are fixed for the sake of simplicity. In a real-world CNN, however, the kernel’s values are initialized with random values and then learned and optimized through backpropagation (just like the weights in a deep neural network).
Sunu Wibirama
Convolutional Neural Network (Part 02)
Kecerdasan Buatan | Artificial Intelligence
CNN architecture: feature extraction network + classifier network
Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
Recall the three convolutional filters (kernels): a diagonal line (downward left to right), a diagonal line (downward right to left), and a little “X”.
Those features match the image exactly.
Filtering: the math behind the match
1. Line up the feature (filter) and the image patch.
2. Multiply each image pixel by the corresponding feature pixel.
3. Add the products up.
4. Divide by the total number of pixels in the feature.
5. Stride / slide the filter to the next patch (usually by one or two pixels) and repeat.
Note: the stride is the number of pixels by which we slide the filter.
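The filtering steps can be sketched directly (a minimal illustration; the 3x3 patch and filter values, 1 and -1, are my own toy numbers):

```python
def match_score(patch, kernel):
    # Multiply element-wise, sum, and divide by the pixel count
    total = 0.0
    count = 0
    for prow, krow in zip(patch, kernel):
        for p, k in zip(prow, krow):
            total += p * k
            count += 1
    return total / count

# A 3x3 diagonal-line filter and a perfectly matching patch (values 1 or -1)
kernel = [[ 1, -1, -1],
          [-1,  1, -1],
          [-1, -1,  1]]
patch = [[ 1, -1, -1],
         [-1,  1, -1],
         [-1, -1,  1]]
print(match_score(patch, kernel))   # 1.0: every pixel matches
```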
Filtering: the math behind the match
Each aligned pair of pixels is multiplied: matching pixels contribute 1 x 1 = 1 or -1 x -1 = 1, while a mismatch contributes -1 x 1 = -1.
Summing the products and dividing by the number of pixels gives the match score for this patch, e.g. .55 when some pixels disagree.
Convolution: trying every possible match
Moving the filter over every position in the image, we get a map of match scores.
A 9 x 9 image (n x n) convolved with a 3 x 3 filter (m x m) yields a map of size (n - (m - 1)) x (n - (m - 1)) = 7 x 7.
Each of the three filters produces its own filtered map.
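Sliding the filter over every valid position can be sketched as follows (a minimal illustration; the "valid" output size is n - m + 1 on each side, and the diagonal example image is my own):

```python
def convolve(image, kernel):
    # "Valid" convolution: slide the kernel over every position where it
    # fits, computing the normalized match score at each position
    n, m = len(image), len(kernel)
    out = []
    for r in range(n - m + 1):
        row = []
        for c in range(n - m + 1):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(m) for j in range(m))
            row.append(total / (m * m))
        out.append(row)
    return out

image = [[1 if r == c else -1 for c in range(9)] for r in range(9)]   # diagonal stroke
kernel = [[1 if i == j else -1 for j in range(3)] for i in range(3)]  # diagonal-line filter
result = convolve(image, kernel)
print(len(result), len(result[0]))   # 7 7: a 9x9 image and 3x3 kernel give a 7x7 map
```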
Convolution layer
One image becomes a stack of filtered images
Feature extraction network: ReLU
Convolution → ReLU → Pooling → Convolution → ReLU → Convolution → ReLU → Pooling
Normalization
• Keep the math from breaking by tweaking each of the values just a bit.
• Change everything negative to zero.
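The "change everything negative to zero" rule is exactly the ReLU function; a minimal sketch of my own (the sample feature map is made up):

```python
def relu(x):
    # Rectified Linear Unit: pass positives through, clamp negatives to zero
    return max(0.0, x)

def relu_map(feature_map):
    # Apply ReLU to every value in a 2D filtered image
    return [[relu(v) for v in row] for row in feature_map]

fmap = [[0.77, -0.11], [-0.11, 1.00]]
print(relu_map(fmap))   # negatives become 0.0
```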
Rectified Linear Units (ReLUs)
ReLU layer
A stack of images becomes a stack of images with no negative values.
Feature extraction network: Pooling
Convolution → ReLU → Pooling → Convolution → ReLU → Convolution → ReLU → Pooling
Pooling: Shrinking the image stack
1. Pick a window size (usually 2 or 3).
2. Decide on a stride (the moving step, usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
Note: the stride is the number of pixels by which we slide the window.
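The four steps above, sketched for a 2x2 window with stride 2 (a minimal illustration; the sample 4x4 map is my own):

```python
def max_pool(feature_map, window=2, stride=2):
    # Slide a window across the map and keep the maximum of each window
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - window + 1, stride):
        out_row = []
        for c in range(0, cols - window + 1, stride):
            out_row.append(max(feature_map[rr][cc]
                               for rr in range(r, r + window)
                               for cc in range(c, c + window)))
        pooled.append(out_row)
    return pooled

fmap = [[0.1, 0.5, 0.2, 0.9],
        [0.4, 0.3, 0.8, 0.1],
        [0.7, 0.2, 0.6, 0.4],
        [0.0, 0.6, 0.3, 0.5]]
print(max_pool(fmap))   # the 4x4 map shrinks to 2x2
```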
Pooling
From each window, take the maximum value (max pooling). We get a similar pattern to the original map, but smaller.
Pooling layer
A stack of images becomes a stack of smaller images.
Layers get stacked
The output of one becomes the input of the next
Convolution → ReLU → Pooling
Deep stacking
Layers can be repeated several (or many) times:
Convolution → ReLU → Pooling → Convolution → ReLU → Convolution → ReLU → Pooling
Sunu Wibirama
Convolutional Neural Network (Part 03)
Kecerdasan Buatan | Artificial Intelligence
CNN architecture: feature extraction network + classifier network
Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
Fully connected layer
The outputs are stacked into one layer
Vote depends on how strongly a value predicts X or O
Some pixels of this stacked layer (we call it the extracted feature) will have large values for a certain input (whether it is X or O).
Fully connected layer
Feature values vote on X or O
These pixels decide the label of a new, unseen input. Pixels #1, 4, 5, 10, and 11 vote for class X; pixels #2, 3, 9, and 12 vote for class O.
Feature values vote on X or O
Suppose we have a new, unseen input, and we want to decide whether it is a Samoyed dog or a Wolf. The X output receives a vote of .92 and the O output a vote of .51 — so, according to your AI system, this is a dog.
Fully connected layer
A list of feature values becomes a list of votes: these features become the input of a deep neural network. Fully connected layers can also be stacked.
Putting it all together
A set of pixels becomes a set of votes.
Convolution → ReLU → Pooling → Convolution → ReLU → Convolution → ReLU → Pooling → Fully connected → Fully connected → X: .92, O: .51
Backpropagation
Error = right answer – actual answer
Use backpropagation to minimize the loss function
(Source: Alexander Amini, “Introduction to Deep Learning”, MIT, 2019)
Backpropagation can be used in a CNN to automatically find the weights that minimize the loss function.
CNNs can be used on other data
• Any 2D (or 3D) data.
• Things closer together are more closely related than things far away.
Examples: images (rows x columns of pixels), sound (intensity in each frequency band x time steps), and text (words in dictionary x position in sentence).
Limitations
• CNNs only capture local “spatial” patterns in data.
• If the data can’t be made to look like an image, CNNs are less useful.
• If your data is just as useful after swapping any of your columns with each other, then you can’t use Convolutional Neural Networks.
• Example: customer data (name, age, address, email, purchases, browsing activity, …), with one row per customer.
Take home messages
• In this lecture, we learned about a well-known variant of deep learning, the Convolutional Neural Network (CNN).
• A CNN consists of two parts: a feature extraction network and a classifier network.
• Convolution layers are useful for extracting features from input images.
• Pooling layers are used to shrink (downsample) the data.
• ReLU layers are used for normalization, setting all negative values to zero.
• There are many variants of CNN, each developed to solve a specific problem.
Source: https://towardsdatascience.com/top-10-cnn-architectures-every-machine-learning-engineer-should-know-68e2b0e07201
End of File

Understanding Emerging Technology and Its Impact on Online & Blended LearningUnderstanding Emerging Technology and Its Impact on Online & Blended Learning
Understanding Emerging Technology and Its Impact on Online & Blended Learning
 
Contents
ContentsContents
Contents
 
Survey on Graphical Password by Image Segmentation 2021 2022
Survey on Graphical Password by Image Segmentation 2021 2022Survey on Graphical Password by Image Segmentation 2021 2022
Survey on Graphical Password by Image Segmentation 2021 2022
 
Kaur2013
Kaur2013Kaur2013
Kaur2013
 
Android Based Quiz Application
Android Based Quiz ApplicationAndroid Based Quiz Application
Android Based Quiz Application
 
The power of_deep_learning_models_applications
The power of_deep_learning_models_applicationsThe power of_deep_learning_models_applications
The power of_deep_learning_models_applications
 
cybersec.pdf
cybersec.pdfcybersec.pdf
cybersec.pdf
 
IRJET - A Review of Cyber Security using Biometric Devices
IRJET -  	  A Review of Cyber Security using Biometric DevicesIRJET -  	  A Review of Cyber Security using Biometric Devices
IRJET - A Review of Cyber Security using Biometric Devices
 
Unraveling Information about Deep Learning
Unraveling Information about Deep LearningUnraveling Information about Deep Learning
Unraveling Information about Deep Learning
 
IRJET- Artificial Intelligence for Unearthing Yarn Breakage
IRJET- Artificial Intelligence for Unearthing Yarn BreakageIRJET- Artificial Intelligence for Unearthing Yarn Breakage
IRJET- Artificial Intelligence for Unearthing Yarn Breakage
 
SIRG-BSU_7_1.pptx
SIRG-BSU_7_1.pptxSIRG-BSU_7_1.pptx
SIRG-BSU_7_1.pptx
 
SIRG-BSU_7.pptx
SIRG-BSU_7.pptxSIRG-BSU_7.pptx
SIRG-BSU_7.pptx
 
Proposing a new method of image classification based on the AdaBoost deep bel...
Proposing a new method of image classification based on the AdaBoost deep bel...Proposing a new method of image classification based on the AdaBoost deep bel...
Proposing a new method of image classification based on the AdaBoost deep bel...
 
SocialCom 2009 - Social Synchrony
SocialCom 2009 - Social SynchronySocialCom 2009 - Social Synchrony
SocialCom 2009 - Social Synchrony
 
ChatGPT for State The Art- Prof. Wisnu Jatmiko (UIN Raden Fatah 2023).pdf
ChatGPT for State The Art- Prof. Wisnu Jatmiko (UIN Raden Fatah 2023).pdfChatGPT for State The Art- Prof. Wisnu Jatmiko (UIN Raden Fatah 2023).pdf
ChatGPT for State The Art- Prof. Wisnu Jatmiko (UIN Raden Fatah 2023).pdf
 

Recently uploaded

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 

Recently uploaded (20)

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 

Modul Topik 8 - Kecerdasan Buatan

  • 1. Topik 8: Pengantar Deep Learning (Introduction to Deep Learning). Dr. Sunu Wibirama. Lecture module for the Kecerdasan Buatan (Artificial Intelligence) course. Course code: UGMx 001001132012. July 4, 2022.
  • 2. July 4, 2022. Course Learning Outcomes: this topic addresses CPMK 5, namely the ability to define several classical machine learning techniques (linear regression, rule-based machine learning, probabilistic machine learning, clustering) and the basic concepts of deep learning, together with their implementation in image recognition (convolutional neural network). The indicators for achieving this CPMK are: knowing a brief history and the applications of deep learning; understanding the concepts of backpropagation and the perceptron; and understanding how a convolutional neural network works. Scope of Material: a) Artificial intelligence and deep learning: this part covers the history and development of neural network research as one of the origins of artificial intelligence technology. It also covers the invention of the perceptron, the perceptron's inability to solve a simple non-linear problem, the invention of the backpropagation method, and the use of deep learning today, as well as the drivers behind the rapid growth of deep learning technology: hardware, big data, and software. b) Visualizing deep learning: this part explains how deep learning works visually, with easy-to-follow illustrations. The basic concepts explained include the training process in a neural network, how a neural network handles its inputs, the concepts of weights and activation functions, and the basics of the feed-forward neural network. c) Deep learning essentials: this part discusses in detail the fundamentals of deep learning, namely the perceptron, stacking perceptrons to form neural networks, optimization through backpropagation, and adaptive learning. Loss functions are also covered in detail, including binary cross entropy, mean squared error, and empirical loss. d) Convolutional neural network: this part discusses in detail the concept of feature engineering with convolution, the concepts of pooling, normalization, and dense networks in a deep learning architecture, the convolutional neural network. It also discusses the strengths and weaknesses of convolutional neural networks and their various implementations.
  • 3. 01/07/2022. sunu@ugm.ac.id. Copyright © 2022 Sunu Wibirama | Do not distribute without permission. @sunu_wibirama. Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada, INDONESIA. Introduction to Deep Learning (Part 01). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022. The rise of deep learning (Source: A. Amini, 6.S191 Introduction to Deep Learning | MIT, 2019).
  • 4. Deep learning applications (https://www.mygreatlearning.com/blog/deep-learning-applications/). What is deep learning? Artificial intelligence and deep learning (Source: A. Amini, 6.S191 Introduction to Deep Learning | MIT, 2019).
  • 5. Neural networks vs. deep neural networks (Source: https://www.pnas.org/content/pnas/116/4/1074/F2.large.jpg). Milestones in the development of neural networks (http://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html).
  • 6. End of File
  • 7. Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada, INDONESIA. Introduction to Deep Learning (Part 02). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022. Early version of neural networks (Andrew Beam, 2017).
  • 8. The first AI Winter (1969) (Andrew Beam, 2017). The XOR problem in the early perceptron: the early perceptron was not able to classify a simple non-linear function such as the XOR function.
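The XOR limitation is easy to see concretely: a single perceptron computes one weighted sum plus a threshold, which can only draw one straight decision boundary, and no straight line separates the XOR cases. Adding one hidden layer fixes it. A minimal sketch with hand-picked weights (the specific weights and thresholds are illustrative, not taken from the slides):

```python
def step(z):
    # Heaviside step: the unit fires (1) if the weighted sum crosses the threshold.
    return 1 if z > 0.0 else 0

def xor_net(x1, x2):
    # Two hidden perceptrons: h1 behaves like OR, h2 like AND.
    h1 = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    h2 = step(x1 + x2 - 1.5)    # fires only when both inputs are 1
    # Output fires when OR is on but AND is off, which is exactly XOR.
    return step(h1 - h2 - 0.5)
```

No choice of weights in a single `step(w1*x1 + w2*x2 + b)` unit can reproduce this truth table, which is why the hidden layer is essential.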
  • 9. The emergence of backpropagandists (1986) (Andrew Beam, 2017). Neural networks were used to recognize handwriting on bank checks (LeCun, 1999).
  • 10. The rise of deep learning in Silicon Valley (2012). 2012 was the first year that neural nets grew to prominence: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used them to win that year's ImageNet competition (the annual Olympics of computer vision), dropping the classification error record from 26.2% to 15.3%, a remarkable improvement at the time. Ever since then, a host of companies have been using deep learning at the core of their services: Facebook uses neural nets for its automatic tagging algorithms, Google for photo search, Amazon for product recommendations, Pinterest for home feed personalization, and Instagram for its search infrastructure. Deep learning has mastered Go (2016) (Andrew Beam, 2017).
  • 11. Playing video games with deep learning (Andrew Beam, 2017). Using deep learning to estimate the probability of Covid-19 infection through classification of CT scan images (2020) (https://towardsdatascience.com/machine-learning-methods-to-aid-in-coronavirus-response-70df8bfc7861).
  • 12. What was behind the emergence of deep learning? (Source: A. Amini, 6.S191 Introduction to Deep Learning | MIT, 2019). End of File
  • 13. Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada, INDONESIA. Introduction to Deep Learning (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022. Workflow of traditional machine learning: most machine learning research efforts try to develop novel features for more accurate performance.
  • 14.
  • 15. Deep learning
  • 16.
  • 17. Pros and cons (Source: Introducing Deep Learning with Matlab, MathWorks, 2018).
  • 18. Where to start learning deep learning? Easy: John D. Kelleher, "Deep Learning", MIT Press, 2019. Medium: Jon Krohn, Grant Beyleveld, Aglae Bassens, "Deep Learning Illustrated", Pearson, 2019. Hard: Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning", MIT Press, 2016. MIT 6.S191 – Introduction to Deep Learning (http://introtodeeplearning.com).
  • 19. Deep Learning Course @ NYU Center for Data Science (https://cds.nyu.edu/deep-learning/, https://atcold.github.io/pytorch-Deep-Learning/).
  • 20. End of File
  • 21. Sunu Wibirama, Introduction to Deep Learning (Part 04), Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. Read these papers (1986, 2015) for a detailed explanation.
  • 22. Basic concept of deep learning. The perceptron: forward propagation (Source: A. Amini, 6.S191 Introduction to Deep Learning | MIT, 2019).
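Forward propagation of a single perceptron is just three steps: multiply inputs by weights, add a bias, and pass the result through a non-linear activation. A minimal sketch (the input values, weights, and the choice of a logistic sigmoid here are illustrative):

```python
import math

def sigmoid(z):
    # Logistic sigmoid activation: squashes any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_forward(x, w, b):
    # Forward propagation: dot product of inputs and weights,
    # plus a bias, passed through the activation function.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Example with made-up numbers:
y = perceptron_forward([1.0, 2.0], [0.5, -0.25], b=0.1)
```

Because the sigmoid output lies strictly between 0 and 1, `y` can be read as a soft yes/no decision.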
  • 23. Common activation functions. Importance of activation functions (Source: A. Amini, 6.S191 Introduction to Deep Learning | MIT, 2019).
  • 24. Now, let's go to the simplest case: a camera with a 2 x 2 pixel resolution. The following slides are based on Brandon Rohrer's lecture (2017), named the best deep learning lecture by KDnuggets. A four-pixel camera.
  • 25. Categorize images: solid, vertical, diagonal, or horizontal. If we have a picture, can we ask our computer to decide what type of image it is?
  • 26. Categorize images: solid, vertical, diagonal, or horizontal.
  • 27. Categorize images: solid, vertical, diagonal, or horizontal. However, if there are many possible combinations, simple rules can't handle it.
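For perfectly clean 2 x 2 images, the hand-written rules the slides allude to are still easy to state (the pixel names and the exact-equality comparisons here are illustrative); the slides' point is that this approach stops scaling as soon as brightness values are noisy and the combinations multiply:

```python
def classify(p):
    # p = (top_left, top_right, bottom_left, bottom_right) pixel brightnesses.
    tl, tr, bl, br = p
    if tl == tr == bl == br:
        return "solid"
    if tl == bl and tr == br:
        return "vertical"      # left column differs from right column
    if tl == tr and bl == br:
        return "horizontal"    # top row differs from bottom row
    if tl == br and tr == bl:
        return "diagonal"
    return "unknown"
```

With real-valued, noisy pixels none of these equalities hold exactly, which is why the next slides replace the rules with a neural network that learns weighted comparisons instead.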
  • 28. Instead of coding the rules ourselves, we use neural networks. End of File
  • 29. Sunu Wibirama, Introduction to Deep Learning (Part 05), Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. Diagrams of neurons: dendrites, soma, axon. Drawing of Purkinje cells (A) and granule cells (B) from pigeon cerebellum by Santiago Ramón y Cajal, 1899; Instituto Cajal, Madrid, Spain.
  • 30. Diagrams of neurons: axons can connect to dendrites strongly, weakly, or somewhere in between; a medium connection is .6.
  • 31. A strong connection is 1.0; a weak connection is .2; no connection is 0.
  • 32. Lots of axons connect with the dendrites of one neuron, each with its own connection strength. Redrawing and simplifying…
  • 33. Adding quantitative weights: .8, .9, .2, .3, .5. Back to our 2 x 2 image…
  • 34. Input neurons. Pixel brightness: we quantify the brightness with a scaled range (-.75, -.50, -.25, 0.0, +.25, +.50, +.75, +1.0).
  • 35. Input vector: .75, -.75, 0.0, .50. Receptive fields: each input neuron considers only the pixel value at one particular position, regardless of the pixel values at the other positions.
  • 36. A neuron. Sum all the inputs: .50 + 0.00 + (-.75) + .75 = .50.
  • 37. Weights. With all weights set to 1.0, the weighted sum of the inputs (.50, 0.00, -.75, .75) stays .50. The art of neural networks lies in the strength of each connection, hence the weight of each neuron. With weights -.2, 0.0, .8, and -.5, the weighted sum becomes .50 x (-.2) + 0.00 x 0.0 + (-.75) x .8 + .75 x (-.5) = -1.075.
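The slide's worked arithmetic can be checked directly: with the second set of weights, each input is multiplied by its weight and the products are added.

```python
# Inputs read from the 2 x 2 image and one weight per input,
# as in the slide's worked example.
inputs  = [0.50, 0.00, -0.75, 0.75]
weights = [-0.2, 0.0, 0.8, -0.5]

# Weighted sum: multiply each input by its weight, then add.
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
# weighted_sum is -1.075, the value the next slide squashes.
```

Setting every weight to 1.0 instead recovers the plain sum of .50 from the previous slide.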
  • 38. We now represent each weight visually: black is negative, missing is zero, orange is positive. Squash the result: -1.075 becomes -0.746.
  • 39. Sigmoid squashing function (plot, with inputs from -2.0 to 2.0).
  • 40. Your number goes in here (the horizontal axis).
  • 41. The squashed version comes out here (the vertical axis).
  • 42. No matter what you start with, the answer stays between -1 and 1. End of File
  • 43. Introduction to Deep Learning (Part 06). Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. Squash the result: the weighted sum -1.075 is squashed to -0.746.
  • 44. Weighted sum-and-squash neuron. Make lots of neurons, identical except for their weights. To keep the picture clear, weights will be either 1.0 (orange), -1.0 (black), or 0.0 (missing).
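The weighted sum-and-squash neuron described above can be sketched as follows; using tanh as the squashing function is an assumption here, chosen to match the (-1, 1) output range shown on the earlier slides:

```python
import math

def neuron(inputs, weights):
    """Weighted sum-and-squash neuron: multiply each input by its
    weight, add them up, then squash the total into (-1, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return math.tanh(total)

# Four pixel inputs with weights restricted to 1.0, -1.0, or 0.0,
# as on the slide.
out = neuron([0.75, -0.75, 0.0, 0.50], [1.0, -1.0, 0.0, 1.0])
```

Many such neurons, identical except for their weights, form one layer of the network.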
  • 45. Receptive fields get more complex. Repeat for additional layers.
  • 46. Receptive fields get still more complex. Repeat with a variation.
  • 47. Rectified linear units (ReLUs): if your number is positive, keep it; otherwise you get a zero. Resulting feature detectors: positive: solid white, negative: solid black; positive: left vertical, negative: right vertical; positive: right diagonal, negative: left diagonal; positive: bottom horizontal, negative: top horizontal.
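The ReLU rule on this slide ("if your number is positive, keep it; otherwise you get a zero") is a one-liner:

```python
def relu(x):
    """Rectified linear unit: keep positive numbers, zero out the rest."""
    return x if x > 0 else 0.0

# Applied element-wise, ReLU removes all negative values from a feature map.
values = [-1.5, -0.2, 0.0, 0.3, 2.0]
rectified = [relu(v) for v in values]  # [0.0, 0.0, 0.0, 0.3, 2.0]
```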
  • 48. Add an output layer: solid, vertical, diagonal, horizontal.
  • 49. Remember the big picture? If we have more pixels, we can represent more complex receptive fields. Now, let's set the receptive fields according to the input: solid, vertical, diagonal, horizontal.
  • 50. Now, let's set the receptive fields according to the input (continued).
  • 51. Now, let's set the receptive fields according to the input (continued).
  • 52. Now, let's set the receptive fields according to the input (continued). Summary: traditional machine learning research focuses on finding novel features, which works well for small training datasets but requires domain knowledge. Deep learning is a type of neural network with many hidden layers and a new type of training method. So far, we have learned the basic concepts of: perceptron; activation functions in neural networks; the feed-forward architecture.
  • 53. End of File
  • 54. Deep Learning Essentials (Part 01). Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. Core components of deep learning (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 55. Core components of deep learning. The Perceptron: forward propagation (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 56. The Perceptron: forward propagation (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 57. The Perceptron: forward propagation, where e = 2.718... (Euler's number). Common activation functions (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
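The perceptron forward pass shown on these slides, a bias plus a weighted sum of inputs, passed through an activation, can be sketched as follows; the logistic sigmoid 1/(1 + e^(-z)) is used as the activation, matching the Euler-number formula on the slide:

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_forward(x, w, b):
    """Forward propagation: z = b + sum(x_i * w_i), output = sigmoid(z)."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return sigmoid(z)

# Two inputs: z = 1 + (1)(3) + (2)(-2) = 0, so the output is sigmoid(0) = 0.5.
y = perceptron_forward([1.0, 2.0], [3.0, -2.0], 1.0)
```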
  • 58. Importance of activation functions (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 59. Importance of activation functions. The Perceptron: example (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 60. The Perceptron: example (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 61. The Perceptron: example (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019). End of File
  • 62. Deep Learning Essentials (Part 02). Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. Core components of deep learning (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 63. From perceptron to feed forward networks. Perceptron, simplified (note: for simplicity, we remove the drawing of the bias). (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 64. From perceptron to feed forward networks: multi-output perceptron; single-layer neural networks (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 65. From perceptron to feed forward networks: single-layer neural networks; multi-output perceptron (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 66. From perceptron to feed forward networks: deep feed forward networks. Example problem (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
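Stacking perceptrons into a feed-forward network, as these slides do, can be sketched with one hidden layer; the layer sizes, toy weights, and sigmoid activation here are illustrative assumptions, not values from the lecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, weights, biases):
    """One dense layer: each output neuron is a perceptron over all inputs."""
    return [sigmoid(b + sum(xi * wi for xi, wi in zip(x, w)))
            for w, b in zip(weights, biases)]

def feed_forward(x, w1, b1, w2, b2):
    """Two stacked layers: input -> hidden -> output."""
    hidden = layer(x, w1, b1)
    return layer(hidden, w2, b2)

# 2 inputs, 3 hidden neurons, 1 output neuron (toy weights).
w1 = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, -1.0, 0.5]]
b2 = [0.0]
y = feed_forward([1.0, 2.0], w1, b1, w2, b2)
```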
  • 67. Example problem: Will I pass this class? (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 68. Example problem: Will I pass this class? (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 69. Quantifying loss; empirical loss (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 70. Binary cross entropy loss; mean squared error loss (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
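The two loss functions named on this slide can be sketched as follows; for binary cross entropy, y is the true label in {0, 1} and p is the predicted probability:

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

# Errors of 0.2 and 0.1 give an MSE of (0.04 + 0.01) / 2 = 0.025.
mse = mean_squared_error([1.0, 0.0], [0.8, 0.1])
```

Binary cross entropy suits classification outputs interpreted as probabilities; mean squared error suits regression targets.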
  • 71. End of File
  • 72. Deep Learning Essentials (Part 03). Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. Core components of deep learning (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 73. Core components of deep learning (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019). Core maths in deep learning.
  • 74. Core maths in deep learning. Chain rule: a composite function is the composition of two functions, where one function takes the output of the other as input. Consider f(u) = u^2 and g(x) = 2x. You can compose these functions as h(x) = f(g(x)) = (2x)^2 = 4x^2. To calculate derivatives of composite functions, you need the chain rule: dh/dx = (df/dg) * (dg/dx). Note: the chain rule is central to deep neural networks: to update the network parameters, the chain rule is used to calculate the derivative of the cost function with respect to each parameter, and the parameters are updated accordingly (backpropagation).
  • 75. Chain rule, continued. Again, consider f(u) = u^2 and g(x) = 2x. From the basic rules of differentiation, we have df(u)/du = 2u and dg(x)/dx = 2. Using the chain rule, let's calculate the derivative of h(x): dh(x)/dx = (df/dg) * (dg/dx). Because u = g(x), we have f'(g(x)) = 2g(x). In addition, we also have g'(x) = 2. Thus: h'(x) = f'(g(x)) * g'(x) = 2g(x) * 2 = 2 * 2x * 2 = 8x. Partial derivatives and gradient: the stylized cursive letter d, written ∂ and called "curly d", denotes a partial derivative. As a first example, take a function f(x, y) of two input variables x and y. The partial derivatives of f(x, y) are its derivatives with respect to each independent variable (x and y).
  • 76. Partial derivatives and gradient: the partial derivative of f(x, y) with respect to x treats y as a constant; likewise, the partial derivative of f(x, y) with respect to y treats x as a constant. Suppose you calculate the derivative of a function f(x1, x2, ..., xn) with respect to each of its variables and store all these partial derivatives in a vector denoted by the symbol ∇ ("nabla"). This is the gradient of f: ∇f, pronounced "gradient of f", "grad f", or "del f", contains the partial derivatives of the function with respect to each variable: ∇f = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn). http://www.claudiobellei.com/2018/01/06/backprop-word2vec/
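The worked chain-rule example above can be checked numerically: for h(x) = f(g(x)) with f(u) = u^2 and g(x) = 2x, the analytic derivative h'(x) = 8x should agree with a finite-difference estimate:

```python
def f(u):
    return u ** 2

def g(x):
    return 2 * x

def h(x):
    # Composite function: h(x) = f(g(x)) = (2x)^2 = 4x^2
    return f(g(x))

def h_prime(x):
    # Chain rule: h'(x) = f'(g(x)) * g'(x) = 2*(2x) * 2 = 8x
    return 8 * x

def numerical_derivative(func, x, eps=1e-6):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)
```

At x = 3, both the analytic formula and the finite-difference estimate give approximately 24, confirming the chain-rule result.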
  • 77. Learning: loss optimization. In deep learning, the way to adjust the weights is to optimize the loss function. Thus, learning means automatically finding the weights that optimize the loss function. Loss optimization: find the elements of the weight vector (consisting of several weights) that minimize the loss. (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 78. Loss optimization, for the weights of all layers (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 79. Loss optimization (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 80. Loss optimization (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 81. Gradient descent (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 82. Computing gradients: backpropagation (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 83. Computing gradients: backpropagation (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 84. Computing gradients: backpropagation (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 85. Training deep neural networks is difficult (Alexander Amini, 2019). Loss functions can be difficult to optimize (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 86. Loss functions can be difficult to optimize (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019). Setting the learning rate (Source: https://www.jeremyjordan.me/nn-learning-rate/)
  • 87. Adaptive learning rates (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019)
  • 88. Review: core components of deep learning (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019). End of File
  • 89. Convolutional Neural Network (Part 01). Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. What is a convolutional neural network? Directly using the original image for recognition (as in the previous lecture) leads to poor accuracy; the CNN was developed to address this. The CNN is one of the most popular deep learning architectures and a very good architecture for computer vision tasks: object recognition and object classification. It implements "convolution" to extract features from the training set before sending the feature values to the neural network layers. It was used by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton to win the ImageNet competition (dropping the classification error from 26.2% to 15.3%), and it is used in Nvidia's autonomous car. How can you differentiate a cat from a dog?
  • 90. What is a convolutional neural network? Proposed method: top-5 error rate of 15.3%. How does a self-driving car work? (https://devblogs.nvidia.com/deep-learning-self-driving-cars/) High-level view of the data collection system: during training, the network's computed output is compared with the actual output of a human driver; the trained network is then used to generate steering commands from a single front-facing center camera.
  • 91. CNN architecture: a feature extraction network followed by a classifier network (Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)
  • 92. Feature extraction network: Convolution, ReLU, Pooling, Convolution, ReLU, Convolution, ReLU, Pooling. How can you differentiate a Samoyed dog from a wolf? The computer should be robust to small variations of a similar object (e.g., images of a Samoyed dog taken from different points of view), but sensitive enough to recognize different objects with an almost similar appearance (Samoyed dog vs. wolf).
  • 93. A toy CNN: X's (Samoyed dog) and O's (wolf). The CNN says whether a picture, a two-dimensional array of pixels, is of an X or an O.
  • 94. For example: CNN(image of X) outputs X; CNN(image of O) outputs O. Trickier cases are variations of the same object: translation, scaling, weight, rotation.
  • 95. Deciding is hard: what do you think, is it the same "X"? What computers see: some parts match the original image, some parts don't (original image vs. rotated image).
  • 96. Computers are literal: comparing whole images, the computer will say, "Uncertain, I don't know whether this image matches the other." Rather than matching the whole thing, a CNN matches pieces of the image.
  • 97. Features match pieces of the image. Rather than matching the whole thing, we match parts of the image. We provide convolutional filters for this. Note that the values of a filter are trained with the backpropagation algorithm, so these values are continuously updated throughout the training process; this is similar to updating the connection weights of an ordinary neural network. Three different convolutional filters (kernels): a diagonal line (downward left to right), a diagonal line (downward right to left), and a little "X".
  • 98. Notes about the convolutional kernel: in this example, the pixel values of the kernel are fixed for the sake of simplicity. In a real-world CNN, however, the kernel's values are initialized with random values and then learned and optimized through backpropagation (just like the weights in a deep neural network).
  • 99. End of File
  • 100. Convolutional Neural Network (Part 02). Sunu Wibirama, Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. CNN architecture: a feature extraction network followed by a classifier network (Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)
  • 101. Features match pieces of the image. Three different convolutional filters (kernels): a diagonal line (downward left to right), a diagonal line (downward right to left), and a little "X". Those features match the image exactly.
  • 102. Those features match the image exactly (shown patch by patch).
  • 103. Those features match the image exactly (shown patch by patch, continued).
  • 104. Filtering: the math behind the match. 1. Line up the feature (filter) and the image patch. 2. Multiply each image pixel by the corresponding feature pixel. 3. Add them up. 4. Divide by the total number of pixels in the feature. 5. Stride / slide the filter (usually by one or two pixels) to the next patch and repeat. Note: the stride is the number of pixels by which we slide the filter.
  • 105-111. Filtering: the math behind the match, pixel by pixel. When a filter pixel and the image pixel under it agree, their product is 1 (1 x 1 = 1, and -1 x -1 = 1); when they disagree, the product is -1 (-1 x 1 = -1). For a perfectly matching patch, every product is 1, so the average over the patch is 1.
  • 112. Filtering: for a partially matching patch, the average of the products is lower, e.g., .55. Convolution: trying every possible match. Moving the filter over the whole image, we get a map of match values: a 9 x 9 image (n x n) convolved with a 3 x 3 filter (m x m) yields a map of size (9 - (3 - 1)) x (9 - (3 - 1)) = 7 x 7, i.e., (n - m + 1) x (n - m + 1) in general.
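The filtering steps above (line up, multiply, add, divide by the number of filter pixels, slide) can be sketched as a plain loop over a 2-D array; as on the slides, the kernel is not flipped, and the output map size follows the (n - m + 1) formula:

```python
def convolve(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise, add up, and divide by the number of kernel pixels."""
    n, m = len(image), len(kernel)
    size = n - m + 1  # output map is (n - m + 1) x (n - m + 1)
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(m) for b in range(m))
            row.append(total / (m * m))
        out.append(row)
    return out

# A 3 x 3 patch of 1s and -1s matched against an identical 3 x 3 kernel
# gives a single value of 1.0: a perfect match.
img = [[1, -1, 1], [-1, 1, -1], [1, -1, 1]]
result = convolve(img, img)
```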
  • 113. Convolution: trying every possible match, producing one map per filter.
  • 114. Convolution layer: one image becomes a stack of filtered images, one per filter.
  • 115. Feature extraction network: ReLU. Convolution, ReLU, Pooling, Convolution, ReLU, Convolution, ReLU, Pooling. Normalization: keep the math from breaking by tweaking each of the values just a bit; change everything negative to zero.
  • 116. Rectified Linear Units (ReLUs).
  • 117. Rectified Linear Units (ReLUs): f(x) = max(0, x) passes positive values through unchanged and replaces negative values with zero.
  • 118. ReLU layer: a stack of images becomes a stack of images with no negative values. Feature extraction network: Pooling. Convolution, ReLU, Pooling, Convolution, ReLU, Convolution, ReLU, Pooling.
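A ReLU layer is a single element-wise operation, which makes it easy to sketch. The tiny "stack" below is made-up example data standing in for the filtered images on the slides:

```python
import numpy as np

def relu(feature_maps):
    # ReLU: replace every negative value with zero; positive values pass
    # through unchanged, so the shape of the stack is preserved.
    return np.maximum(feature_maps, 0.0)

stack = np.array([[[0.77, -0.11],
                   [-0.11, 0.55]]])  # a tiny made-up "stack" of one image
activated = relu(stack)
print(activated.min())  # → 0.0
```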
  • 119. Pooling: shrinking the image stack. 1. Pick a window size (usually 2 or 3). 2. Decide a stride (moving step, usually 2). 3. Walk your window across your filtered images. 4. From each window, take the maximum value. Note: the stride is the number of pixels by which we slide the window.
  • 120. Pooling: the window slides across the feature map, taking the maximum at each position.
  • 121. Pooling: the window then moves on by the stride, and the maximum is taken again.
  • 122. Pooling (max pooling): we get a similar pattern to the original map, but smaller.
  • 123. Pooling layer: a stack of images becomes a stack of smaller images. Layers get stacked: the output of one layer becomes the input of the next. Convolution, ReLU, Pooling.
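Chaining the three layer types gives a minimal sketch of one Convolution-ReLU-Pooling pass. The "X" image and diagonal filter are illustrative stand-ins, and the helpers condense the earlier definitions:

```python
import numpy as np

def conv(img, k):
    # Valid convolution: average of element-wise products at each position.
    n, m = img.shape[0], k.shape[0]
    return np.array([[np.mean(img[i:i + m, j:j + m] * k)
                      for j in range(n - m + 1)] for i in range(n - m + 1)])

def relu(a):
    # Change everything negative to zero.
    return np.maximum(a, 0.0)

def max_pool(a, window=2, stride=2):
    # Keep the maximum value from each window.
    out = (a.shape[0] - window) // stride + 1
    return np.array([[a[i * stride:i * stride + window,
                        j * stride:j * stride + window].max()
                      for j in range(out)] for i in range(out)])

img = np.where(np.eye(9) + np.fliplr(np.eye(9)) > 0, 1.0, -1.0)  # 9x9 "X"
k = np.where(np.eye(3) > 0, 1.0, -1.0)                           # 3x3 filter
out = max_pool(relu(conv(img, k)))  # 9x9 -> 7x7 -> 7x7 -> 3x3
print(out.shape)  # → (3, 3)
```

The output of each stage feeds the next, exactly as in the layer diagram, and the image shrinks from 9 x 9 pixels to a 3 x 3 feature map.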
  • 124. Deep stacking: layers can be repeated several (or many) times. Convolution, ReLU, Pooling, Convolution, ReLU, Convolution, ReLU, Pooling. End of File.
  • 125. Sunu Wibirama (sunu@ugm.ac.id), Convolutional Neural Network (Part 03), Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada. Kecerdasan Buatan | Artificial Intelligence. CNN architecture: a feature extraction network followed by a classifier network. Source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
  • 126. Fully connected layer: the outputs are stacked into one layer.
  • 127. Fully connected layer: the vote depends on how strongly a value predicts X or O. Some pixels of this stacked layer (the extracted feature) will have large values for a certain input (whether it is X or O).
  • 128. Fully connected layer: feature values vote on X or O. These pixels decide the label of an unseen new input: pixels #1, 4, 5, 10, and 11 vote for class X; pixels #2, 3, 9, and 12 vote for class O. Suppose we have a new unseen input and want to decide whether it is a Samoyed dog or a wolf.
  • 129. Fully connected layer: for the new input, the votes for class X sum to 0.92.
  • 130. Fully connected layer: the votes for class O sum to 0.51. OK, so since 0.92 > 0.51, this is a dog, according to your AI system.
  • 131. Fully connected layer: a list of feature values becomes a list of votes; these features become the input of a deep neural network. Fully connected layers can also be stacked.
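The voting in a fully connected layer can be sketched as a matrix-vector product followed by a softmax. The feature vector and weights below are random stand-ins for the values that training would actually produce:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened feature vector (output of the last pooling layer)
# and a fully connected layer with one output "vote" per class (X and O).
# The weights are random stand-ins for values backpropagation would learn.
features = rng.uniform(0.0, 1.0, size=12)
weights = rng.uniform(-1.0, 1.0, size=(2, 12))
bias = np.zeros(2)

votes = weights @ features + bias            # one raw score per class
probs = np.exp(votes) / np.exp(votes).sum()  # softmax: scores -> probabilities
label = ["X", "O"][int(np.argmax(probs))]
print(label, probs.round(2))
```

Each output neuron weighs every feature value, so features that strongly predict a class contribute large terms to that class's vote.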
  • 132. Putting it all together: a set of pixels becomes a set of votes. Convolution, ReLU, Pooling, Convolution, ReLU, Convolution, ReLU, Pooling, Fully connected, Fully connected: X = 0.92, O = 0.51. Backpropagation: error = right answer - actual answer.
  • 133. Backpropagation: the error computed at the output votes (0.92, 0.51) is propagated backwards through the fully connected, pooling, ReLU, and convolution layers.
  • 134. Backpropagation: along the way, each weight is adjusted to reduce the error.
  • 135. Use backpropagation to minimize the loss function: backpropagation can be used in a CNN to automatically find the weights that minimize the loss function. (Source: Alexander Amini, "Introduction to Deep Learning", MIT, 2019.)
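A minimal sketch of this idea for a single linear "vote" unit and a squared-error loss: all numbers are toy values, and the update rule is plain gradient descent on one unit rather than full CNN backpropagation.

```python
import numpy as np

# Toy example: one linear unit trained by gradient descent.
x = np.array([0.9, 0.1, 0.4])   # extracted features of one example
w = np.zeros(3)                  # weights to be learned
target = 1.0                     # right answer: this example is class "X"
lr = 0.5                         # learning rate

for _ in range(200):
    y = w @ x                    # actual answer of the network
    error = target - y           # right answer - actual answer
    w += lr * error * x          # step downhill on the squared error

print(round(float(w @ x), 3))   # → 1.0
```

Repeating the compare-and-adjust step drives the actual answer toward the right answer, which is exactly what minimizing the loss means.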
  • 136. CNN can be applied to other kinds of data: any 2D (or 3D) data in which things closer together are more closely related than things far away (as with rows and columns of pixels). Sound: time steps versus intensity in each frequency band. Text: position in sentence versus words in a dictionary.
  • 137. Limitations: CNN only captures local "spatial" patterns in data. If the data can't be made to look like an image, a CNN is less useful; if your data is just as useful after swapping any of your columns with each other, then you can't use convolutional neural networks. Example: customer data (name, age, address, email, purchases, browsing activity, ...). Take home messages: in this lecture we learned about a well-known variant of deep learning, the Convolutional Neural Network (CNN). A CNN consists of two parts: a feature extraction network and a classifier network. Convolution layers extract features from input images. Pooling layers shrink the data. ReLU layers normalize the data by setting negative values to zero. There are many variants of CNN, each developed to solve a specific problem. Source: https://towardsdatascience.com/top-10-cnn-architectures-every-machine-learning-engineer-should-know-68e2b0e07201
  • 138. End of File.