Topic 5
Linear Regression and Model Selection
Machine Learning
Dr. Sunu Wibirama
Artificial Intelligence Course Module (Modul Kuliah Kecerdasan Buatan)
Course code: UGMx 001001132012
June 22, 2022
1 Course Learning Outcomes
This topic fulfills CPMK 5: the ability to define several classic machine learning
techniques (linear regression, rule-based machine learning, probabilistic machine
learning, clustering) and the basic concepts of deep learning and its implementation
in image recognition (convolutional neural networks).
The indicators for achieving this CPMK are: understanding the fundamentals of linear
regression, being able to distinguish between classification and regression,
understanding the implementation of linear regression, and understanding the concepts
of K-Fold Cross Validation, the curse of dimensionality, overfitting, and underfitting.
2 Material Coverage
This topic covers the following material:
a) Simple Linear Regression: explains the basic concept of linear regression, the
residual sum of squares, and a worked example of predicting new values.
b) Correlation and coefficient of determination: explains the concepts of correlation
and the coefficient of determination. Correlation describes the strength of the
relationship between the variables x and y. The coefficient of determination R² is
the proportion of the variation in the dependent variable (Y) that can be predicted
from the independent variable (x).
c) Multiple Linear Regression: explains regression with more than one independent
variable, including how to compute the regression parameters using linear algebra
techniques (Gauss elimination).
d) Model Selection: explains overfitting, underfitting, and model selection using
K-Fold Cross Validation, along with the concepts of regularization and the curse of
dimensionality.
14/06/2022
sunu@ugm.ac.id
Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Simple Linear Regression (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Supervised Machine Learning
1st type of supervised learning: Regression
• Regression is a measure of the relation between values of one variable (e.g.,
housing price) and corresponding values of another variable (e.g., time).
• Regression means predicting a numerical output value using training data.
• Basic algorithm: linear regression
2nd type of supervised learning: Classification
(Figure: four families of classification models: neural network, geometric, logical/rule-based, and probabilistic.)
Using regression for modeling COVID-19 infection
Puno, G.R., Puno, R.C.C. and Maghuyop, I.V., 2021. COVID-19 case fatality rates across Southeast Asian countries (SEA): a preliminary estimate using a simple linear regression model. Journal of Health Research.
(Figures: deaths and confirmed COVID-19 cases among the SEA countries as of May 21, 2020; scatterplots and regression lines of confirmed cases and deaths for the six SEA countries as of May 21, 2020.)
Eye Tracking Calibration: Fixation-Based
• Goal: improving spatial accuracy during gaze interaction.
• The user is asked to fixate their gaze on calibration targets (an animated white or
red circle).
• Mapping from eye position to calibration target: second-order polynomial regression.
(Figure: gaze calibration target.)
Simple Linear Regression (Part 02)
Advertising budget case
Suppose that we are statistical consultants hired by a client to provide advice on
how to improve sales of a particular product.
The advertising dataset consists of the sales of that product in 200 different
markets, along with the advertising budget for TV (TV ads).
If we determine that there is a correlation between advertising and sales, then we
can instruct our client to adjust the advertising budget, thereby indirectly
increasing sales.
Our goal is to develop an accurate mathematical model that can be used to predict
sales on the basis of the TV ads budget.
(Figure: sales in thousands of units plotted against TV ads budget in IDR million.)
"Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani"
Advertising budget case
The advertising budget is an input variable, while sales is an output variable.
The inputs go by different names, such as predictors, independent variables,
regressors, features, or sometimes just variables, denoted using the symbol X.
The output variable (in this case, sales) is often called the response or dependent
variable, and is typically denoted using the symbol Y.
A reasonable form of the relationship between the response and the regressor is the
linear relationship:

    Y = β₀ + β₁X

(Figure: sales in thousands of units plotted against TV ads budget in IDR million.)
Linear Regression

    Y = β₀ + β₁X

Y : response variable or dependent variable
X : predictor variable, feature, independent variable, or regressor
β₀ : intercept
β₁ : slope
Simple Linear Regression (SLM)
• Regression analysis: the relationships among variables are not deterministic (i.e.,
not exact). There must be a random component to the equation that relates the
variables.
• This random component takes into account considerations that are not being measured
or, in fact, are not understood by the scientists or engineers.
• To accommodate this random component: the simple linear regression model (SLM)

    Y = β₀ + β₁X + ε

• In the above, β₀ and β₁ are the unknown intercept and slope parameters,
respectively, and ε is a random variable assumed to be distributed with E(ε) = 0 and
Var(ε) = σ², called the random error or random disturbance.
Basic concept: we want to find a line (linear model) that generalizes the data, where
the line represents all data with minimum error.
(Figure: weighted linear equation in deep learning.)
Simple Linear Regression (Part 03)
Least squares and the fitted model
• The residual sum of squares (RSS) is often called the sum of squares of the errors
about the regression line and is denoted by SSE.
• This minimization procedure for estimating the parameters is called the method of
least squares. Hence, we shall find a and b so as to minimize

    SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − a − b·xᵢ)²
Side note: turning points
• We can find turning points (that might correspond to minima) of a function f(x) by
searching for points where the gradient of the function, df(x)/dx, is zero. To
determine whether a turning point corresponds to a maximum, a minimum, or a saddle
point, we can examine the second derivative, d²f(x)/dx². If, at a turning point x₀,
the second derivative is positive, we know that this turning point is a minimum.
Least squares and the fitted model
• Setting the partial derivatives equal to zero, we obtain the normal equations for
estimating the regression coefficients:

    b = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / (n Σxᵢ² − (Σxᵢ)²)

    a = (Σyᵢ Σxᵢ² − Σxᵢ Σxᵢyᵢ) / (n Σxᵢ² − (Σxᵢ)²)
Least squares and the fitted model
• Simpler formula (with the additional notation x̄ = Σxᵢ/n and ȳ = Σyᵢ/n):

    b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,    a = ȳ − b·x̄
Simple Linear Regression (Part 04)
Example: weights of 10 students
• This is a study of the weights of 10 students, hypothesized to be affected by daily
calorie consumption.
• Goal of the analysis: to understand whether calorie intake affects weight.
• We must identify the predictor variable (feature) and the response variable:
• x (predictor) = calories consumed per day
• y (response) = weight
• n = 10 students
Example: weights of 10 students
After collecting data from 10 students, we get the following table:

Subject #   Calories/day (x)   Weight (y)
    1             530              89
    2             300              48
    3             358              56
    4             510              72
    5             302              54
    6             300              42
    7             387              60
    8             527              85
    9             415              63
   10             512              74

(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
To find a and b, we must compute x², y², xy, and the sum (Σ) of each column.
(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
Since we have only one predictor (feature), computing a and b is straightforward:
a = 2.608, b = 0.149
(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
Then we create the regression model. Based on the previous slide, we obtain this
equation: y = 2.608 + 0.149x
(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
Using this equation, we can then perform prediction.
Suppose a student consumes 600 calories/day:
    y = 2.608 + 0.149x
    prediction of y = 2.608 + (0.149 × 600) = 92 kilograms
You can also predict the calorie consumption given the weight of the student.
For example, if the weight of the student is 40 kilograms, then:
    40 = 2.608 + 0.149x
    37.392 = 0.149x
    prediction of x ≈ 250.95 calories/day
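The worked example above can be reproduced with a short pure-Python script, using the 10-student data and the normal-equation formulas for a and b from the earlier slide:

```python
# Simple linear regression on the 10-student data from the slides:
# slope b and intercept a via the normal-equation formulas.
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]  # calories/day
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]            # weight (kg)
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope
a = (sy * sxx - sx * sxy) / (n * sxx - sx ** 2)  # intercept

print(f"y = {a:.3f} + {b:.3f}x")  # y = 2.608 + 0.149x
print(f"predicted weight at 600 cal/day: {a + b * 600:.1f} kg")  # 92.0
```

The coefficients match the slide's values, and the 600 cal/day prediction reproduces the 92 kg result.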
Correlation
Correlation
• Correlation testing is used to measure the strength of the relationship between x
and y.
• Regression analysis is commonly performed together with correlation analysis.
• The correlation coefficient (r) is expressed by:

    r = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / √[(n Σxᵢ² − (Σxᵢ)²)(n Σyᵢ² − (Σyᵢ)²)]
Data: weights of 10 students
• The relationship between the predictor variable (x) and the response variable (y)
is very strong: we found a correlation value of r = 0.95.
• In summary, body weight is indeed strongly related to daily calorie consumption.
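The r = 0.95 value can be checked directly from the correlation formula above, again using the 10-student data:

```python
from math import sqrt

# Pearson correlation coefficient r for the 10-student data,
# using the sum formula from the slides.
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(f"r = {r:.2f}")  # r = 0.95
```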
Coefficient of determination
• The coefficient of determination, denoted R² or r² and pronounced "R squared", is
the proportion of the variance in the dependent variable (y) that is predictable
from the independent variable (x).
• R² is a statistic that gives some information about the goodness of fit of a model
(i.e., the adequacy of the regression model): 0 ≤ R² ≤ 1.
• An R² of 1 indicates that the regression predictions perfectly fit the data. An R²
of 0 indicates that the dependent variable cannot be reliably predicted from the
independent variable.
• From our data, the coefficient of determination is R² = 0.90.
Interpretation: 90% of the variation in the response variable (y) is explained by the
predictor variable (x), while the remaining 10% is explained by other factors.
Coefficient of determination
• Another way to compute R² or r²:

    R² = 1 − SSE/SST

SSE: sum of squared errors
SST: total corrected sum of squares
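Both routes give the same number on the 10-student data; this short script computes R² as 1 − SSE/SST after fitting the line:

```python
# R^2 for the 10-student example, computed as 1 - SSE/SST.
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)  # slope (Sxy / Sxx)
a = ybar - b * xbar                      # intercept

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual error
sst = sum((yi - ybar) ** 2 for yi in y)                      # total variation
r2 = 1 - sse / sst
print(f"R^2 = {r2:.2f}")  # R^2 = 0.90
```

This agrees with squaring the correlation coefficient: 0.95² ≈ 0.90.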
Multiple Linear Regression (Part 01)
Review: Ads budget and sales
(Figure: sales in thousands of units plotted against TV, radio, and newspaper ads
budgets, each in IDR million.)
The input variables are typically denoted using the symbol X, with a subscript to
distinguish them. So X₁ will be the TV budget, X₂ the radio budget, and X₃ the
newspaper budget.
Complexity of engineering problems
• In some cases, more than one independent variable is needed to develop a regression
model. Thus, we need multiple linear regression.
• The estimated response is obtained from the sample regression equation:

    ŷ = b₀ + b₁x₁ + b₂x₂ + … + b_k x_k

• We obtain the least squares estimators b₀, b₁, …, b_k of the parameters by fitting
the model to the data points.
Multiple linear regression
• In using the concept of least squares to arrive at the estimates b₀, b₁, …, b_k, we
minimize the expression

    SSE = Σᵢ (yᵢ − b₀ − b₁x₁ᵢ − … − b_k x_kᵢ)²

• We generate the set of k + 1 normal equations for multiple linear regression.
• Solve this linear system Xb = y using a numerical method or linear algebra (e.g.,
Gauss elimination or the matrix inverse).
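A minimal sketch of this procedure with NumPy: build the design matrix, form the normal equations (XᵀX)b = Xᵀy, and solve. The data below are made up for illustration, generated from y = 1 + 2x₁ + 3x₂ so the recovered coefficients are known in advance:

```python
import numpy as np

# Least-squares estimates for y = b0 + b1*x1 + b2*x2 via the normal equations.
# Synthetic (illustrative) data: y is exactly 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
b = np.linalg.solve(X.T @ X, X.T @ y)            # solve (X^T X) b = X^T y
print(b)  # [1. 2. 3.]
```

In practice `np.linalg.lstsq` is preferred over explicitly forming XᵀX, which can be poorly conditioned.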
Multiple linear regression – fitted model
For the advertising data, a linear regression fit to sales using TV ads
budget and Radio ads budget as predictors.
Multiple Linear Regression (Part 02)
Example
• We use data on the pull strength of a wire bond in a semiconductor manufacturing
process, wire length, and die height to illustrate building an empirical model.
• Note: for simplicity, units are omitted.
• Number of observations (n): 25
• Independent variables: wire length (x₁) and die height (x₂)
• Dependent variable: pull strength (y)
Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
Example
Compute the coefficients of the normal equations (the sums Σx₁ᵢ, Σx₂ᵢ, Σx₁ᵢ², Σx₂ᵢ²,
Σx₁ᵢx₂ᵢ, Σx₁ᵢyᵢ, and Σx₂ᵢyᵢ over the 25 observations).
Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
Multiple Linear Regression (Part 03)
Solving a system of linear equations
• Systems of linear equations that have to be solved simultaneously arise in problems
that include several (possibly many) variables that depend on each other.
• The general form of a system of n linear algebraic equations is:

    a₁₁x₁ + a₁₂x₂ + … + a₁ₙxₙ = b₁
    a₂₁x₁ + a₂₂x₂ + … + a₂ₙxₙ = b₂
    ⋮
    aₙ₁x₁ + aₙ₂x₂ + … + aₙₙxₙ = bₙ

• Solving a system of linear equations can be performed in various ways, such as: the
inverse method, Gauss elimination, Gauss-Jordan elimination, LU decomposition, or
iterative methods (Jacobi, Gauss-Seidel).
• You need a basic understanding of numerical methods to solve a system of linear
equations.
Example – Inverse method
We can write the normal equations in matrix form as follows:

    Xb = y

To solve for vector b, we find the inverse of matrix X:

    b = X⁻¹y
But not all matrices have an inverse…
Example – Gauss Elimination
Goal: to convert the system of equations into upper-triangular form.
Example – Gauss Elimination
With R1 as the pivot row:
Step 1: R1 is multiplied by 8.24 (206/25) and subtracted from R2.
Step 2: R1 is multiplied by 331.76 (8294/25) and subtracted from R3.
Example – Gauss Elimination
With R2 as the pivot row:
Step 3: R2 is multiplied by 12.64664 (8834.44/698.56) and subtracted from R3.
Step 4: Final form of the system, with an upper triangular matrix.
Example – Gauss Elimination
Both the Gaussian elimination method and the inverse matrix method give the same
solution.
Note: if you use the inverse matrix method, you need to make sure that the matrix is
not singular so that you can compute the inverse.
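The row operations above can be written as a short routine. This is a minimal sketch without pivoting, mirroring the slides' procedure, so it assumes all pivots are nonzero; the 3×3 system used to check it is made up for illustration:

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with back-substitution.
    No pivoting (as in the slides), so nonzero pivots are assumed."""
    n = len(b)
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    # forward elimination: reduce A to upper-triangular form
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]  # multiplier (e.g. 206/25 = 8.24 in the slides)
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # back-substitution, starting from the last equation
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# small check on a made-up 3x3 system
sol = gauss_solve([[2.0, 1.0, 1.0], [1.0, 3.0, 2.0], [1.0, 0.0, 0.0]],
                  [4.0, 5.0, 6.0])
print([round(v, 6) for v in sol])  # [6.0, 15.0, -23.0]
```

Production code should use partial pivoting (or `numpy.linalg.solve`) for numerical stability.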
Model Selection (Part 01)
The philosophy of machine learning
Given these examples, decide which arithmetic operation (addition, subtraction,
multiplication, or division) is the best choice to explain the mapping of the unknown
function between inputs and outputs.
(Kelleher, 2019)
The philosophy of machine learning
• Multiplication is the best choice.
• However, if the number of inputs to the unknown function increases (perhaps to
hundreds or thousands of inputs), the variety of potential functions to be considered
gets larger.
• This is where machine learning comes in to search for the best function:
find a function that generalizes well to unseen data, based on previously seen data.
(Kelleher, 2019)
Machine Learning Pipeline
https://machinelearningmastery.com/machine-learning-checklist/
Why is machine learning difficult?
• First, datasets include noise. Learning a function that exactly matches the data
means learning the noise as well. Hence, "data are the new oil".
• Second, sometimes the set of possible functions is larger than the set of examples
in the dataset (an ill-posed problem: the information given in the problem is not
sufficient to find a single best solution).
• Third, not all features that you extract are useful for finding a good model. More
features do not necessarily lead to a more accurate model. This is related to the
curse of dimensionality.
Overfitting and underfitting
(Figures: examples of underfitting and overfitting for a regression problem and a
classification problem.)
Model Selection (Part 02)
Model selection (1)
• An important problem in many applications of regression analysis involves selecting
the set of regressor variables (independent variables) to be used in the model.
• Our case: we are sure that not all of the candidate regressors are necessary to
adequately model the response Y.
• Our solution: variable selection, i.e., screening the candidate variables to obtain
a regression model that contains the "best" subset of regressor variables.
• Goal: the final model contains enough regressor variables so that, in the intended
use of the model (prediction, for example), it will perform satisfactorily.
• We would also like the model to use as few regressor variables as possible, to keep
model maintenance costs to a minimum and to make the model easy to use.
Model selection (2)
• We assume that there are K candidate regressors x₁, x₂, …, x_K and a single
response variable y.
• All models will include an intercept term β₀, so the model with all variables
included would have K + 1 terms.
• If there are K candidate regressors, there are 2^K total equations to be examined.
For example, for two candidates x₁, x₂, we must examine:

    E(Y) = β₀
    E(Y) = β₀ + β₁x₁
    E(Y) = β₀ + β₂x₂
    E(Y) = β₀ + β₁x₁ + β₂x₂

• Hence, the number of equations to be examined increases rapidly as the number of
candidate variables increases.
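The 2^K enumeration can be sketched directly: fit every subset of regressors by least squares and compare the residual sum of squares. The toy data below are synthetic (K = 2, with x₂ irrelevant by construction):

```python
from itertools import combinations
import numpy as np

# Best-subset enumeration: with K candidate regressors there are 2**K models.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 30)
x2 = rng.uniform(0, 1, 30)
y = 1 + 2 * x1 + rng.normal(0, 0.1, 30)  # x2 does not influence y

features = {"x1": x1, "x2": x2}
results = {}
for k in range(len(features) + 1):
    for subset in combinations(features, k):
        # design matrix: intercept column plus the chosen regressors
        X = np.column_stack([np.ones_like(y)] + [features[f] for f in subset])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        results[subset] = np.sum((y - X @ coef) ** 2)  # residual sum of squares
        print(subset, f"RSS = {results[subset]:.3f}")
```

The subset containing x₁ drops the RSS sharply, while adding x₂ barely helps, which is the pattern variable selection looks for.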
Model selection (3)
• We can use R² to evaluate a linear regression model.
• Another way to assess the quality of a regression model is to use the total mean
square error of the regression model.
• For a sample, MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)². In expectation, the MSE of an estimator ŷ
decomposes as:

    MSE(ŷ) = E[(ŷ − y)²]
           = E[ŷ²] − 2y·E[ŷ] + y²
           = Var(ŷ) + (E[ŷ])² − 2y·E[ŷ] + y²
           = Var(ŷ) + (E[ŷ] − y)²
           = Var(ŷ) + Bias(ŷ)²

• We choose the best model with the least MSE score.
MSE vs model complexity
• A simple model is restrictive: it cannot capture the variability in complex real
data.
• A higher-order model is flexible: it can capture the variability in complex real
data.
• Increasing model complexity means increasing flexibility. However, this comes at a
cost.
• If the model is too complex, it fails to generalize: the test MSE is high even
though the training MSE is low.
• The ideal condition is achieved when the model is not too complex (blue line,
spline 1): the test MSE and training MSE both reach their minimum.
(Figure legend: linear regression; spline 1; spline 2; real data; training MSE;
test MSE.)
Source: G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to
Statistical Learning, with applications in R, Springer, 2013.
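This train-vs-test MSE pattern can be reproduced with a small experiment: fit polynomials of increasing degree to synthetic noisy data (made up here for illustration) and compare the two errors:

```python
import numpy as np

# Train/test MSE vs model complexity on synthetic data:
# y = sin(2*pi*x) + noise, fitted by polynomials of increasing degree.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)
x_tr, y_tr = x[::2], y[::2]    # 20 training points
x_te, y_te = x[1::2], y[1::2]  # 20 held-out test points

mse = {}
for degree in (1, 4, 9):
    p = np.polyfit(x_tr, y_tr, degree)
    mse_tr = np.mean((np.polyval(p, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(p, x_te) - y_te) ** 2)
    mse[degree] = (mse_tr, mse_te)
    print(f"degree {degree}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

Training MSE keeps falling as the degree grows, while test MSE improves from degree 1 to degree 4 and then tends to worsen as the model starts fitting noise.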
Model Selection (Part 03)
Bias-variance trade-off
• Variance refers to the amount by which the estimated model would change if we
estimated it using a different training data set. If a model has overly high
variance, it means that the model is learning the noise of the training data in
addition to the true association.
• Bias is the average error between the predicted values and the true values of new
data. It reflects the ability of a model to learn the true association. If a model
has overly high bias, it means that the algorithm was not able to learn a good
approximation of the true association in the real data.
• Training set: typically 60% of the data. As the name suggests, this is used for
training a regression model (finding the regression parameters).
• Validation set: also called the development set; typically 20% of the data. This
set is not used during training. It is used to test the quality of the trained model.
• Test set: typically 20% of the data. Its only purpose is to report the accuracy of
the final model.
Bias problem: high training error; the validation error is similar in magnitude to
the training error.
Variance problem: low training error; the validation error is very high.
Overfitting and underfitting
Underfitting: a linear model is not sufficient to fit the samples; high bias, low
variance.
Good fit: a polynomial of degree 4 approximates the true function almost perfectly;
balanced bias and variance.
Overfitting: a higher-degree polynomial fits the noise in the sample data; low bias,
high variance.
Cross validation
• The loss (MSE) that we calculate from the validation set is sensitive to the choice of data in the validation set.
• This is particularly problematic if our dataset (and hence our validation set) is small.
• K-fold cross-validation splits the data into K equally (or as close to equal as possible) sized blocks.
• Each block takes its turn as the validation set for a training set comprised of the other K − 1 blocks.
• Averaging over the resulting K loss values gives us the final loss value.
Source: S. Rogers and M. Girolami, A First Course in Machine Learning, Chapman & Hall/CRC, 2017.
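The steps above can be sketched in plain Python, using simple linear regression as the model and MSE as the loss (all function names here are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_cv(xs, ys, k=5):
    """Each block takes its turn as validation set; average the K losses."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]  # near-equal blocks
    losses = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit_line([xs[j] for j in train], [ys[j] for j in train])
        losses.append(mse(model, [xs[j] for j in val], [ys[j] for j in val]))
    return sum(losses) / k

xs = list(range(20))
ys = [2 * x + 1 for x in xs]      # noiseless linear data
print(k_fold_cv(xs, ys, k=5))     # → 0.0 (noiseless linear data)
```

On noisy data the averaged validation loss gives a far less optimistic estimate of generalization error than the training loss does, which is exactly why it is used for model selection.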
End of File
Model Selection (Part 04)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Curse of dimensionality
As the number of independent variables (features, dimensions) grows, the amount of data we need to generalize accurately grows exponentially.
Source: C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Curse of dimensionality
The amount of training data needed to cover 20% of the feature range grows exponentially with the number of dimensions (features/independent variables). (Figure panels: 1 dimension, 2 dimensions, 3 dimensions.)
http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
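The figure's numbers follow from one line of arithmetic: a hyper-cube covering a fraction f of a d-dimensional unit feature space must span f^(1/d) of each axis.

```python
def axis_fraction(volume_fraction, d):
    """Per-axis span of a hyper-cube covering `volume_fraction` of a d-dim unit cube."""
    return volume_fraction ** (1 / d)

for d in (1, 2, 3):
    print(f"{d}D: {axis_fraction(0.20, d):.0%} of each axis")
# → 1D: 20%, 2D: 45%, 3D: 58% of each axis to cover 20% of the space
```

So in three dimensions you must already sample more than half of every feature's range to cover the same 20% of the space that a single feature covered with 20% of its range.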
Data availability and regression parameters
• Previously, we used least squares to find the regression coefficients b₀, b₁, …, bₚ by minimizing the residual sum of squares (RSS), also called the sum of squared errors (SSE).
• Least squares works well if the number of observations n is much larger than the number of regression parameters p, i.e., n ≫ p.
• However, data are expensive (curse of dimensionality: as the number of independent variables increases, you need more data). If p approaches n, or even p > n, the model will overfit.
• By constraining or shrinking the estimated coefficients, we can often substantially reduce the variance at the cost of a negligible increase in bias.
Regularization: ridge regression
• Ridge regression is very similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity.
• Ridge regression minimizes the following equation:

$$\sum_{i=1}^{n} \left( y_i - b_0 - \sum_{j=1}^{p} b_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} b_j^2 = \mathrm{SSE} + \lambda \sum_{j=1}^{p} b_j^2$$

where λ ≥ 0 is a tuning parameter, to be determined separately.
• Ridge regression seeks coefficient estimates that fit the data well by making the SSE small. However, the second term, λ ∑ⱼ bⱼ², called the shrinkage penalty, is small when b₁, …, bₚ are close to zero, so it has the effect of shrinking the estimates of bⱼ towards zero.
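The ridge objective has a closed-form solution, b = (XᵀX + λI)⁻¹ Xᵀy. A minimal NumPy sketch (assuming centered data so the intercept can be dropped; in practice b₀ is left unpenalized, and NumPy availability is an assumption):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam*I) b = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -1.0])      # noiseless data with true coefficients [3, -1]
print(ridge_fit(X, y, lam=0.0))    # ≈ [3, -1]: lam = 0 reduces to least squares
print(ridge_fit(X, y, lam=10.0))   # shrunk towards zero (smaller norm)
```

Note that λ > 0 also makes XᵀX + λI invertible even when p > n, which is exactly the data-scarce regime discussed on the previous slide.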
Regularization: ridge regression
• If λ = 0, we recover the original least squares solution.
• A fifth-order polynomial function can fit six data points exactly, and we can see this if we set λ = 0.
• In general, N data points can be perfectly fitted by an (N − 1)th order polynomial.
• If we increase λ, we begin to see the regularization taking effect.
• λ = 1 × 10⁻⁶ follows the general shape of the exact fifth-order polynomial but without as much variability, and is consequently further from the data points. Thus, it can be used to obtain better predictions while avoiding overfitting.
• It is common to use cross validation to choose the value of λ that gives the best predictive performance.
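Choosing λ by cross validation, as the last bullet suggests, can be sketched end-to-end (the candidate grid and helper names are illustrative; NumPy is assumed):

```python
import numpy as np

def cv_mse_for_lambda(X, y, lam, k=5):
    """Average validation MSE of ridge regression over k folds."""
    n, p = X.shape
    folds = [np.arange(i, n, k) for i in range(k)]   # near-equal blocks
    losses = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])
        # closed-form ridge fit on the training blocks only
        b = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(p),
                            X[train].T @ y[train])
        losses.append(np.mean((y[val] - X[val] @ b) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=40)   # one informative feature + noise
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best = min(grid, key=lambda lam: cv_mse_for_lambda(X, y, lam))
print("best lambda on this data:", best)
```

The winning λ depends on the data: more noise and fewer observations push the cross-validated optimum towards larger λ.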
How to avoid overfitting and underfitting
• Fixing a high bias problem:
  • Train a more complex model: we can increase the degree of the independent variable → instead of using linear regression, we can use polynomial regression.
  • Train the model with more independent variables: we add more information to the model (multiple linear regression), so that we have a better representation of the dependent variable.
• Fixing a high variance problem:
  • Obtain more data → avoid overfitting by providing more exposure to the model.
  • Decrease the number of independent variables.
  • Perform regularization (ridge regression).
  • Choose the best model using cross validation.
• The goal is to achieve a balance between bias and variance → this is where experimentation takes place. Try several models and choose the best (the most optimal) one by looking at one or more metrics (MSE, R², and adjusted R²).
End of File
Simple Linear Regression (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022

Supervised Machine Learning
1st type of supervised learning: Regression
• Regression is a measure of the relation between values of one variable (e.g., housing price) and corresponding values of another variable (e.g., time).
• Regression means to predict the numerical output value using training data.
• Basic algorithm: linear regression

2nd type of supervised learning: Classification
Model families: neural network model, geometric model, logical model/rule-based model, probabilistic model
Using regression for modeling Covid-19 infection
Puno, G.R., Puno, R.C.C. and Maghuyop, I.V., 2021. COVID-19 case fatality rates across Southeast Asian countries (SEA): a preliminary estimate using a simple linear regression model. Journal of Health Research.
(Figures: deaths and confirmed cases of COVID-19 infections among the SEA countries as of May 21, 2020; scatterplots and regression lines of confirmed cases and deaths for the six SEA countries.)

Eye Tracking Calibration: Fixation-Based
• Goal: improving spatial accuracy during gaze interaction.
• The user is asked to fixate their gaze on calibration targets (animated white or red circles).
• Mapping from eye position to calibration target: second-order polynomial regression.
End of File
Simple Linear Regression (Part 02)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022

Advertising budget case
Suppose that we are statistical consultants hired by a client to provide advice on how to improve sales of a particular product. The advertising dataset consists of the sales of that product in 200 different markets, along with the advertising budget for TV (TV ads). If we determine that there is a correlation between advertising and sales, then we can instruct our client to adjust the advertising budget, thereby indirectly increasing sales. Our goal is to develop an accurate mathematical model that can be used to predict sales on the basis of the TV ads budget.
(Figure: sales (thousands) versus TV ads budget (IDR million). "Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.")
  • 8. Advertising budget case. The advertising budget is an input variable, while sales is the output variable. The inputs go by different names, such as predictors, independent variables, regressors, features, or sometimes just variables, and are denoted using the symbol x. The output variable, in this case sales, is often called the response or dependent variable and is typically denoted using the symbol y. A reasonable form of a relationship between the response and the regressor is the linear relationship: y = β0 + β1x.
    Linear Regression. y = β0 + β1x, where y is the response (dependent) variable, x is the predictor variable (feature, independent variable, regressor), β0 is the intercept, and β1 is the slope.
  • 9. Simple Linear Regression (SLM). Regression analysis: the relationships among variables are not deterministic (i.e., not exact). There must be a random component in the equation that relates the variables. This random component takes into account considerations that are not being measured or, in fact, are not understood by the scientists or engineers. To accommodate this random component: the simple linear regression model (SLM), y = β0 + β1x + ε. Here β0 and β1 are the unknown intercept and slope parameters, respectively, and ε is a random variable assumed to be distributed with E(ε) = 0 and Var(ε) = σ², called the random error or random disturbance. Basic concept: we want to find a line (linear model) that generalizes the data, i.e., a line that represents all data with minimum error.
    Weighted linear equation in deep learning.
  • 10. End of File
  • 11. Simple Linear Regression (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Recap of the simple linear regression model (SLM) from Part 02: y = β0 + β1x + ε, where β0 and β1 are the unknown intercept and slope parameters and ε is a random error with E(ε) = 0 and Var(ε) = σ². Basic concept: we want to find a line (linear model) that generalizes the data with minimum error.
  • 12. Least squares and the fitted model. The residual sum of squares (RSS) is often called the sum of squares of the errors about the regression line and is denoted by SSE. The minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find b0 and b1 so as to minimize SSE = Σ(yi − b0 − b1xi)².
    Side note: turning points. We can find turning points (that might correspond to minima) of a function f(x) by searching for points where the gradient of the function, f′(x), is zero. To determine whether a turning point corresponds to a maximum, a minimum, or a saddle point, we can examine the second derivative, f″(x). If, at a turning point x0, the second derivative is positive, we know that this turning point is a minimum.
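  The turning-point test above is easy to verify numerically. A minimal sketch with a hypothetical function f(x) = (x − 3)² + 1 (my own example, not from the slides), whose gradient vanishes at x = 3 and whose second derivative is positive there, using central-difference approximations:

```python
def f(x):
    return (x - 3.0) ** 2 + 1.0

def derivative(g, x, h=1e-5):
    # Central-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

def second_derivative(g, x, h=1e-4):
    # Central-difference approximation of g''(x)
    return (g(x + h) - 2 * g(x) + g(x - h)) / h ** 2

x0 = 3.0
assert abs(derivative(f, x0)) < 1e-6   # gradient is zero: turning point
assert second_derivative(f, x0) > 0    # positive curvature: a minimum
```

  The same check underlies least squares: the SSE is a quadratic bowl in (b0, b1), so the point where both partial derivatives vanish is its unique minimum.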
  • 13. Least squares and the fitted model. Setting the partial derivatives equal to zero, we obtain the normal equations for estimating the regression coefficients (slope b and intercept a):
    b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
    a = (Σx²Σy − ΣxΣxy) / (nΣx² − (Σx)²)
    Simpler formula (additional notation): with Sxy = Σ(xi − x̄)(yi − ȳ) and Sxx = Σ(xi − x̄)², the slope is b = Sxy / Sxx and the intercept is a = ȳ − b·x̄.
  • 14. End of File
  • 15. Simple Linear Regression (Part 04). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Recap of the normal equations from Part 03:
    b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
    a = (Σx²Σy − ΣxΣxy) / (nΣx² − (Σx)²)
  • 16. Example: weights of 10 students. There is a study of the weights of 10 students, which are predicted to be affected by daily calorie consumption. Goal of the analysis: to understand whether calorie intake affects weight. We must identify the predictor variable (feature) and the response variable: x (predictor) = calories consumed per day; y (response) = weight; n = 10 students.
    After collecting data from the 10 students, we get the following table (Source: Digital Talent Scholarship Training Course Materials, 2019):
    Subject  Calories/day (x)  Weight (y)
    1        530               89
    2        300               48
    3        358               56
    4        510               72
    5        302               54
    6        300               42
    7        387               60
    8        527               85
    9        415               63
    10       512               74
  • 17. Example: weights of 10 students. To find a and b, we must compute x², y², xy, and their sums (Σ). For this data: Σx = 4141, Σy = 643, Σxy = 279294, Σx² = 1802235. Since we have only one predictor (feature), computing a and b is straightforward: a = 2.608 and b = 0.149. (Source: Digital Talent Scholarship Training Course Materials, 2019)
  • 18. Example: weights of 10 students. Then we create the regression model. Based on the previous slide, we obtain this equation: ŷ = 2.608 + 0.149x. (Source: Digital Talent Scholarship Training Course Materials, 2019)
    Using this equation, we can then perform predictions. Suppose a student consumes 600 calories/day: prediction of y = 2.608 + (0.149 × 600) ≈ 92 kilograms. You can also predict the calorie consumption given the weight of the student. For example, if the weight of the student is 40 kilograms, then: 40 = 2.608 + 0.149x, so 37.392 = 0.149x, and the prediction of x ≈ 250.95 calories/day.
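  The worked example above can be reproduced directly from the normal-equation formulas. A minimal sketch in plain Python (no libraries), using the calorie/weight data from the table:

```python
# Data from the example: daily calorie intake (x) and weight (y) of 10 students
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Normal equations for simple linear regression
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept

print(round(a, 3), round(b, 3))  # 2.608 0.149, matching the slide

# Prediction for a student consuming 600 calories/day
print(round(a + b * 600))        # 92 (kilograms)
```

  Note that the second form of the intercept, a = (Σy − bΣx)/n, is algebraically equivalent to the ratio given in the normal equations.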
  • 19. End of File
  • 20. Correlation. Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Correlation testing is used to measure the strength of the relationship between x and y. Regression analysis is commonly performed together with correlation analysis. The correlation coefficient (r) is expressed by:
    r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]
  • 21. Example: weights of 10 students. Recap of the setting from the regression example: x (predictor) = calories consumed per day, y (response) = weight, n = 10 students; the goal of the analysis is to understand whether calorie intake affects weight. The data are the same table of 10 subjects as before (x from 300 to 530 calories/day, y from 42 to 89 kg).
  • 22. Data: weights of 10 students. The relationship between the predictor variable (x) and the response variable (y) is very strong: we found a 95% correlation value (r = 0.95). In summary, body weight is indeed affected by daily calorie consumption.
    Coefficient of determination. The coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variance in the dependent variable (y) that is predictable from the independent variable (x). R² is a statistic that gives some information about the goodness of fit of a model (i.e., the adequacy of the regression model): 0 ≤ R² ≤ 1. An R² of 1 indicates that the regression predictions perfectly fit the data; an R² of 0 indicates that the dependent variable cannot be reliably predicted from the independent variable. From our data, the coefficient of determination is R² = 0.90. Interpretation: 90% of the variation in the response variable (y) is explained by the predictor variable (x), while the remaining 10% is explained by other variables.
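  Applying the correlation formula above to the same calorie/weight data reproduces both numbers quoted on the slide (r ≈ 0.95 and R² = r² ≈ 0.90); a quick sketch:

```python
import math

x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)
sy2 = sum(b * b for b in y)

# Pearson correlation coefficient
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 2))      # 0.95
print(round(r * r, 2))  # 0.9  -> coefficient of determination
```

  For simple linear regression fitted by least squares, R² equals the square of the correlation coefficient, which is why both routes agree.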
  • 23. Coefficient of determination. Another way to compute R² (or r²): R² = 1 − SSE/SST, where SSE is the sum of squared errors, SSE = Σ(yi − ŷi)², and SST is the total corrected sum of squares, SST = Σ(yi − ȳ)².
    End of File
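  The alternative route, R² = 1 − SSE/SST, gives the same value for the fitted line ŷ = 2.608 + 0.149x from the earlier example; a quick check:

```python
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
a, b = 2.608, 0.149  # fitted intercept and slope from the example

y_hat = [a + b * xi for xi in x]
y_bar = sum(y) / len(y)

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # errors about the line
sst = sum((yi - y_bar) ** 2 for yi in y)               # total corrected sum of squares

r2 = 1 - sse / sst
print(round(r2, 2))  # 0.9
```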
  • 24. Multiple Linear Regression (Part 01). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Review: ads budget and sales. Figure: sales (thousands) versus TV ads budget, radio ads budget, and newspaper ads budget (IDR million). The input variables are typically denoted using the symbol X, with a subscript to distinguish them. So X1 will be the TV budget, X2 the radio budget, and X3 the newspaper budget.
  • 25. Complexity of engineering problems. In some cases, more than one independent variable is needed to develop a regression model; thus, we need multiple linear regression. The estimated response is obtained from the sample regression equation ŷ = b0 + b1x1 + b2x2 + … + bkxk. We obtain the least squares estimators of the parameters by fitting this equation to the data points.
    Multiple linear regression. In using the concept of least squares to arrive at the estimates b0, b1, …, bk, we minimize the expression SSE = Σi (yi − b0 − b1x1i − … − bkxki)². We generate the set of k + 1 normal equations for multiple linear regression, then solve this linear system xb = y using a numerical method or linear algebra (e.g., Gauss elimination or the matrix inverse).
  • 26. Multiple linear regression: fitted model. For the advertising data, a linear regression fit to sales using TV ads budget and radio ads budget as predictors.
    End of File
  • 27. Multiple Linear Regression (Part 02). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Recap: for the advertising data, a linear regression fit to sales using TV ads budget and radio ads budget as predictors.
  • 28. Multiple linear regression (recap). Using least squares to arrive at the estimates b0, b1, …, bk, we minimize SSE = Σi (yi − b0 − b1x1i − … − bkxki)², generate the set of k + 1 normal equations, and solve the linear system xb = y by a numerical method or linear algebra (e.g., Gauss elimination or the matrix inverse).
    Example. We use data on the pull strength of a wire bond in a semiconductor manufacturing process, wire length, and die height to illustrate building an empirical model. Note: for simplicity, units are omitted. Number of observations (n): 25. Independent variables: wire length (x1) and die height (x2). Dependent variable: pull strength (y). Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
  • 29. Example. Compute the coefficients of the normal equations from the data sums (tables shown in the slides). Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
  • 30. End of File
  • 31. Multiple Linear Regression (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Solving a system of linear equations. Systems of linear equations that have to be solved simultaneously arise in problems that involve several (possibly many) variables that depend on each other. The general form is a system of n linear algebraic equations. Solving such a system can be done in various ways, such as: the inverse method, Gauss elimination, Gauss-Jordan elimination, LU decomposition, or iterative methods (Jacobi, Gauss-Seidel). You need a basic understanding of numerical methods to solve a system of linear equations.
  • 32. Example: inverse method. We can write the normal equations in matrix form as xb = y. To solve for the vector b, we find the inverse of the matrix x: b = x⁻¹y. But not all matrices have an inverse (a singular matrix cannot be inverted).
  • 33. Example: Gauss elimination. Goal: to convert the system into an upper triangular matrix, which can then be solved by back substitution.
  • 34. Example: Gauss elimination. With R1 as the pivot row: Step 1: multiply R1 by 8.24 (= 206/25) and subtract the result from R2. Step 2: multiply R1 by 331.76 (= 8294/25) and subtract the result from R3. With R2 as the pivot row: Step 3: multiply R2 by 12.64664 (= 8834.44/698.56) and subtract the result from R3. Step 4: final form of the system with an upper triangular matrix.
  • 35. Example: Gauss elimination. The Gaussian elimination method and the inverse matrix method give the same solution. Note: if you use the inverse matrix method, you need to make sure that the matrix is not singular so that the inverse can be computed.
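  The elimination-plus-back-substitution procedure above can be written generically. A minimal sketch without partial pivoting (it assumes nonzero pivots); the 3×3 system here is a made-up example with known solution, not the wire-bond numbers from the slides:

```python
def gauss_solve(A, y):
    """Solve A b = y by forward elimination and back substitution.
    Minimal sketch: no partial pivoting, assumes nonzero pivots."""
    n = len(A)
    A = [row[:] for row in A]  # work on copies
    y = y[:]
    # Forward elimination: reduce A to upper triangular form
    for p in range(n):                  # p = pivot row
        for r in range(p + 1, n):
            m = A[r][p] / A[p][p]       # multiplier, like 206/25 in the slide
            for c in range(p, n):
                A[r][c] -= m * A[p][c]
            y[r] -= m * y[p]
    # Back substitution
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * b[c] for c in range(r + 1, n))
        b[r] = (y[r] - s) / A[r][r]
    return b

# Example system with known solution b = (1, 2, 3)
A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 2.0],
     [1.0, 0.0, 4.0]]
y = [7.0, 13.0, 13.0]
b = gauss_solve(A, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

  Production code would add partial pivoting (swapping rows to put the largest pivot on the diagonal) for numerical stability.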
  • 36. End of File
  • 37. Model Selection (Part 01). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    The philosophy of machine learning. Given these examples, decide which arithmetic operation (addition, subtraction, multiplication, or division) is the best choice to explain the mapping of the unknown function between inputs and outputs (Kelleher, 2019).
  • 38. The philosophy of machine learning. Multiplication is the best choice. However, if the number of inputs to the unknown function increases (perhaps to hundreds or thousands of inputs), the variety of potential functions to be considered gets larger. This is where machine learning takes part in searching for the best function: find a function that generalizes well to unseen data, based on previously seen data. (Kelleher, 2019)
    Machine Learning Pipeline. https://machinelearningmastery.com/machine-learning-checklist/
  • 39. Why is machine learning difficult? First, datasets include noise. Learning a function that exactly matches the data means learning the noise as well; hence, "data are the new oil". Second, sometimes the set of possible functions is larger than the set of examples in the dataset (an ill-posed problem: the information given in the problem is not sufficient to find a single best solution). Third, not all features that you extract are useful for finding a good model; a lot of features do not necessarily lead to a more accurate model. This is called the curse of dimensionality.
    Overfitting and underfitting. Figures: a regression problem and a classification problem.
  • 40. End of File
  • 41. Model Selection (Part 02). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Model selection (1). An important problem in many applications of regression analysis is selecting the set of regressor variables (independent variables) to be used in the model. Our case: we are sure that not all of the candidate regressors are necessary to adequately model the response Y. Our solution: variable selection, i.e., screening the candidate variables to obtain a regression model that contains the "best" subset of regressor variables. Goal: the final model contains enough regressor variables so that, in the intended use of the model (prediction, for example), it will perform satisfactorily. We would also like the model to use as few regressor variables as possible, to keep model maintenance costs to a minimum and to make the model easy to use.
  • 42. Model selection (2). We assume that there are k candidate regressors, x1, x2, …, xk, and a single response variable y. All models will include an intercept term β0, so the model with all variables included would have k + 1 terms. If there are k candidate regressors, there are 2^k total equations to be examined. For example, for two candidates x1, x2, we must examine:
    E(Y) = β0
    E(Y) = β0 + β1x1
    E(Y) = β0 + β2x2
    E(Y) = β0 + β1x1 + β2x2
    Hence, the number of equations to be examined increases rapidly as the number of candidate variables increases.
    Model selection (3). We can use R² to evaluate a linear regression model. Another way to assess the quality of a regression model is to use the total mean square error of the regression model. We define the mean square error (MSE) for the regression model as MSE = (1/n) Σi (ŷi − yi)². The expected squared error decomposes as E[(ŷ − y)²] = E[(ŷ − E[ŷ])²] + (E[ŷ] − y)² = Var(ŷ) + (Bias(ŷ))². We choose the best model with the least MSE score.
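  The count of 2^k candidate models follows from choosing, independently for each regressor, whether to include it. A small sketch enumerating the subsets with itertools (the helper name `candidate_models` is mine):

```python
from itertools import combinations

def candidate_models(regressors):
    """All subsets of the candidate regressors; each subset is one
    candidate model (the intercept b0 is always included)."""
    models = []
    for size in range(len(regressors) + 1):
        models.extend(combinations(regressors, size))
    return models

models = candidate_models(["x1", "x2"])
print(models)       # [(), ('x1',), ('x2',), ('x1', 'x2')]
print(len(models))  # 4 == 2**2
```

  For k = 10 candidates this is already 1024 models, which is why exhaustive search quickly becomes impractical.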
  • 43. MSE vs model complexity. A simple model is restrictive: it cannot capture the variability in complex real data. A higher-order model is flexible: it can capture the variability in complex real data. Increasing model complexity means increasing flexibility; however, this comes with a cost. If the model is too complex, it fails to generalize: the test MSE is high even though the training MSE is low. The ideal condition is achieved when the model is not too complex (blue line, spline 1): both the test MSE and the training MSE reach their minimum. Figure legend: linear regression, spline 1 (moderately complex), spline 2 (very complex), real data, training MSE, test MSE. Source: G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning, with applications in R, Springer, 2013.
    End of File
  • 44. Model Selection (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Bias-variance trade-off. Variance refers to the amount by which the fitted model would change if we estimated it using a different training data set. If a model has overly high variance, it means that the model is learning the noise of the training data in addition to the true association. Bias is the average error between the predicted values and the true values of new data; it reflects the ability of a model to learn the true association. If a model has overly high bias, it means that the algorithm was not able to learn a good approximation of the true association in the real data.
    Training set: typically 60% of the data; as the name suggests, this is used for training a regression model (finding the regression parameters). Validation set: also called the development set, typically 20% of the data; it is not used during training, but to test the quality of the trained model. Test set: typically 20% of the data; its only purpose is to report the accuracy of the final model.
    Bias problem: high training error, with validation error similar in magnitude to the training error. Variance problem: low training error, but very high validation error.
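  The 60/20/20 split described above is a convention, not a fixed rule. A minimal sketch that shuffles sample indices and cuts them into training, validation, and test sets (the function name and the seed are my own choices):

```python
import random

def train_val_test_split(n_samples, train=0.6, val=0.2, seed=42):
    """Split sample indices into train / validation / test index lists.
    Default fractions follow the slide's 60/20/20 convention."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    n_train = int(train * n_samples)
    n_val = int(val * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

tr, va, te = train_val_test_split(100)
print(len(tr), len(va), len(te))  # 60 20 20
```

  Shuffling before splitting matters: if the data are ordered (e.g., by date or by class), a contiguous split would give unrepresentative sets.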
  • 45. Overfitting and underfitting. Underfitting: a linear model is not sufficient to fit the samples (high bias, low variance). Good fit: a polynomial of degree 4 approximates the true function almost perfectly (balanced bias and variance). Overfitting: a higher-degree polynomial fits the noise in the sample data (low bias, high variance).
    Cross validation. The loss (MSE) that we calculate from the validation set is sensitive to the choice of data in our validation set. This is particularly problematic if our dataset (and hence our validation set) is small. K-fold cross-validation splits the data into K equally (or as close to equal as possible) sized blocks. Each block takes its turn as a validation set for a training set comprised of the other K − 1 blocks. Averaging over the resulting K loss values gives us our final loss value. Source: S. Rogers and M. Girolami, A First Course in Machine Learning, Chapman & Hall/CRC, 2017.
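  The K-fold procedure above is easy to sketch without any library: split the indices into K near-equal blocks and let each block serve once as the validation set (the helper name `k_fold_splits` is mine):

```python
def k_fold_splits(n_samples, k):
    """Return (train_indices, val_indices) pairs for K-fold cross-validation.
    Blocks are as close to equal-sized as possible."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]                  # this block validates
        train = indices[:start] + indices[start + size:]   # the other K-1 blocks train
        splits.append((train, val))
        start += size
    return splits

splits = k_fold_splits(10, k=5)
print(len(splits))   # 5 folds
print(splits[0][1])  # [0, 1] -> first validation block
# Averaging the K validation losses gives the final loss estimate.
```

  Every sample is used for validation exactly once, which is what makes the averaged loss less sensitive to any single split.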
  • 46. Cross validation (figure: K-fold illustration). End of File
  • 47. Model Selection (Part 04). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Curse of dimensionality. As the number of independent variables (features, dimensions) grows, the amount of data we need to generalize accurately grows exponentially. Source: C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
  • 48. Curse of dimensionality. The amount of training data needed to cover 20% of the feature range grows exponentially with the number of dimensions (features/independent variables); the figure illustrates this for 1, 2, and 3 dimensions. http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
    Data availability and regression parameters. Previously, we used least squares to find the coefficients by minimizing the residual sum of squares (RSS), also called the sum of squared errors (SSE). Least squares works well when the number of observations n is much larger than the number of regression parameters p (n ≫ p). However, data are expensive (curse of dimensionality: if the number of independent variables increases, you need more data). If p is close to n, or even p > n, the model will overfit. By constraining or shrinking the estimated coefficients, we can often substantially reduce the variance at the cost of a negligible increase in bias.
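  The 20%-coverage claim can be made concrete: to capture a fraction f of a unit hypercube's volume, a sub-cube must span f^(1/d) of each axis, and this per-axis span approaches the full range as the number of dimensions d grows. A quick check (the function name is mine):

```python
def edge_needed(fraction, dims):
    """Per-axis edge length (unit feature range) of a hypercube that
    covers `fraction` of the data volume in `dims` dimensions."""
    return fraction ** (1.0 / dims)

for d in (1, 2, 3, 10):
    print(d, round(edge_needed(0.2, d), 3))
# 1 -> 0.2, 2 -> 0.447, 3 -> 0.585, 10 -> 0.851
```

  In 10 dimensions you must span 85% of every feature's range just to cover 20% of the volume, so a fixed-size dataset becomes extremely sparse as features are added.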
  • 49. Regularization: ridge regression. Ridge regression is very similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity. Ridge regression minimizes the following equation:
    Σ(i=1..n) (yi − b0 − Σ(j=1..p) bj·xij)² + λ Σ(j=1..p) bj² = SSE + λ Σ(j=1..p) bj²
    where λ ≥ 0 is a tuning parameter, to be determined separately. Ridge regression seeks coefficient estimates that fit the data well by making the SSE small. However, the second term, λΣbj², called the shrinkage penalty, is small when b1, …, bp are close to zero, so it has the effect of shrinking the estimates towards zero.
    If λ = 0, we retrieve the original solution, just as least squares does. A fifth-order polynomial function can fit six data points exactly, and we can see this if we set λ = 0; in general, N data points can be perfectly fitted by an (N − 1)th-order polynomial. If we increase λ, we begin to see the regularization taking effect: λ = 1 × 10⁻⁶ follows the general shape of the exact fifth-order polynomial but without as much variability, and subsequently lies further from the data points. Thus it can be used to obtain better predictions while avoiding overfitting. It is common to use cross validation to choose the value of λ that gives the best predictive performance.
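  For the single-predictor case, setting the two partial derivatives of the ridge objective to zero gives a small closed form (the intercept b0 is left unpenalized, matching the penalty Σ(j≥1) bj² above; the function name and the derivation layout are my own). A sketch using the earlier calorie/weight data:

```python
def ridge_simple(x, y, lam):
    """Minimize sum (y - b0 - b1*x)^2 + lam * b1^2 for one predictor.
    Normal equations:  n*b0  + Sx*b1          = Sy
                       Sx*b0 + (Sxx + lam)*b1 = Sxy"""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * (sxx + lam) - sx ** 2
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1

x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]

b0, b1 = ridge_simple(x, y, lam=0.0)  # lam = 0 recovers least squares
print(round(b0, 3), round(b1, 3))     # 2.608 0.149

_, b1_shrunk = ridge_simple(x, y, lam=1e5)
print(b1_shrunk < b1)                 # True: larger lam shrinks the slope toward zero
```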
  • 50. How to avoid overfitting and underfitting.
    Fixing a high-bias problem: (a) train a more complex model: we can increase the degree of the independent variable, i.e., use polynomial regression instead of linear regression; (b) train the model with more independent variables: we add more information to the model (multiple linear regression) so that we have a better representation of the dependent variable.
    Fixing a high-variance problem: (a) obtain more data: avoid overfitting by giving the model more exposure; (b) decrease the number of independent variables; (c) perform regularization (ridge regression); (d) choose the best model using cross validation.
    The goal is to achieve a balance between bias and variance; this is where experimentation takes place. You try several models and choose the best (most optimal) one by looking at one or more metrics (e.g., MSE, r, and R²).
    End of File