Topic 5
Linear Regression and Model Selection
Machine Learning
Dr. Sunu Wibirama
Artificial Intelligence Course Module (Modul Kuliah Kecerdasan Buatan)
Course code: UGMx 001001132012
June 22, 2022
1 Course Learning Outcomes
This topic fulfills CPMK 5: the ability to define several classic machine learning
techniques (linear regression, rule-based machine learning, probabilistic machine
learning, clustering) and the basic concepts of deep learning and its implementation
in image recognition (convolutional neural networks).
The indicators for achieving this CPMK are: understanding the fundamentals of linear
regression, being able to distinguish between classification and regression,
understanding the implementation of linear regression, and understanding the concepts
of K-Fold Cross Validation, the curse of dimensionality, overfitting, and underfitting.
2 Material Coverage
This topic covers the following material:
a) Simple Linear Regression: explains the basic concept of linear regression, the
residual sum of squares, and a worked example of predicting new values.
b) Correlation and coefficient of determination: explains the concepts of correlation
and the coefficient of determination. Correlation describes the strength of the
relationship between the variables x and y. The coefficient of determination R² is
the proportion of the variation in the dependent variable (Y) that can be predicted
from the independent variable (x).
c) Multiple Linear Regression: explains regression with more than one independent
variable, including how to compute the regression parameters using linear algebra
techniques (Gauss elimination).
d) Model Selection: explains overfitting, underfitting, and model selection using
K-Fold Cross Validation, along with the concepts of regularization and the curse of
dimensionality.
14/06/2022
sunu@ugm.ac.id
Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Simple Linear Regression (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Supervised Machine Learning
1st type of supervised learning: Regression
• Regression is a measure of the relation between values of one variable (e.g.,
housing price) and corresponding values of another variable (e.g., time).
• Regression means predicting a numerical output value using training data.
• Basic algorithm: linear regression
2nd type of supervised learning: Classification
(Figure: four families of classification models: neural network, geometric, logical/rule-based, and probabilistic.)
Using regression for modeling COVID-19 infection
Puno, G.R., Puno, R.C.C. and Maghuyop, I.V., 2021. COVID-19 case fatality rates across Southeast Asian countries (SEA): a preliminary estimate using a simple linear regression model. Journal of Health Research.
(Figures: deaths and confirmed COVID-19 cases among the SEA countries as of May 21, 2020; scatterplots and regression lines of confirmed cases and deaths for the six SEA countries as of May 21, 2020.)
Eye Tracking Calibration: Fixation-Based
• Goal: improving spatial accuracy during gaze interaction.
• The user is asked to fixate their gaze on calibration targets (an animated white or
red circle).
• Mapping from eye position to calibration target: second-order polynomial regression.
(Figure: gaze calibration target.)
Simple Linear Regression (Part 02)
Advertising budget case
Suppose that we are statistical consultants hired by a client to provide advice on
how to improve sales of a particular product.
The advertising dataset consists of the sales of that product in 200 different
markets, along with the advertising budget for TV (TV ads).
If we determine that there is a correlation between advertising and sales, then we
can instruct our client to adjust the advertising budget, thereby indirectly
increasing sales.
Our goal is to develop an accurate mathematical model that can be used to predict
sales on the basis of the TV ads budget.
(Figure: sales in thousands of units plotted against TV ads budget in IDR million.)
"Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani"
Advertising budget case
The advertising budget is an input variable, while sales is an output variable.
The inputs go by different names, such as predictors, independent variables,
regressors, features, or sometimes just variables, denoted using the symbol X.
The output variable (in this case, sales) is often called the response or dependent
variable, and is typically denoted using the symbol Y.
A reasonable form of the relationship between the response and the regressor is the
linear relationship:

    Y = β₀ + β₁X

(Figure: sales in thousands of units plotted against TV ads budget in IDR million.)
Linear Regression

    Y = β₀ + β₁X

Y : response variable or dependent variable
X : predictor variable, feature, independent variable, or regressor
β₀ : intercept
β₁ : slope
Simple Linear Regression (SLM)
• Regression analysis: the relationships among variables are not deterministic (i.e.,
not exact). There must be a random component to the equation that relates the
variables.
• This random component takes into account considerations that are not being measured
or, in fact, are not understood by the scientists or engineers.
• To accommodate this random component: the simple linear regression model (SLM)

    Y = β₀ + β₁X + ε

• In the above, β₀ and β₁ are the unknown intercept and slope parameters,
respectively, and ε is a random variable assumed to be distributed with E(ε) = 0 and
Var(ε) = σ², called the random error or random disturbance.
Basic concept: we want to find a line (linear model) that generalizes the data, where
the line represents all data with minimum error.
(Figure: weighted linear equation in deep learning.)
Simple Linear Regression (Part 03)
Least squares and the fitted model
• The residual sum of squares (RSS) is often called the sum of squares of the errors
about the regression line and is denoted by SSE.
• This minimization procedure for estimating the parameters is called the method of
least squares. Hence, we shall find a and b so as to minimize

    SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − a − b·xᵢ)²
Side note: turning points
• We can find turning points (that might correspond to minima) of a function f(x) by
searching for points where the gradient of the function, df(x)/dx, is zero. To
determine whether a turning point corresponds to a maximum, a minimum, or a saddle
point, we can examine the second derivative, d²f(x)/dx². If, at a turning point x₀,
the second derivative is positive, we know that this turning point is a minimum.
Least squares and the fitted model
• Setting the partial derivatives equal to zero, we obtain the normal equations for
estimating the regression coefficients:

    b = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / (n Σxᵢ² − (Σxᵢ)²)

    a = (Σyᵢ Σxᵢ² − Σxᵢ Σxᵢyᵢ) / (n Σxᵢ² − (Σxᵢ)²)
Least squares and the fitted model
• Simpler formula (with the additional notation x̄ = Σxᵢ/n and ȳ = Σyᵢ/n):

    b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,    a = ȳ − b·x̄
Simple Linear Regression (Part 04)
Example: weights of 10 students
• This is a study of the weights of 10 students, hypothesized to be affected by daily
calorie consumption.
• Goal of the analysis: to understand whether calorie intake affects weight.
• We must identify the predictor variable (feature) and the response variable:
• x (predictor) = calories consumed per day
• y (response) = weight
• n = 10 students
Example: weights of 10 students
After collecting data from 10 students, we get the following table:

Subject #   Calories/day (x)   Weight (y)
    1             530              89
    2             300              48
    3             358              56
    4             510              72
    5             302              54
    6             300              42
    7             387              60
    8             527              85
    9             415              63
   10             512              74

(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
To find a and b, we must compute x², y², xy, and the sum (Σ) of each column.
(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
Since we have only one predictor (feature), computing a and b is straightforward:
a = 2.608, b = 0.149
(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
Then we create the regression model. Based on the previous slide, we obtain this
equation: y = 2.608 + 0.149x
(Source: Digital Talent Scholarship Training Course Materials, 2019)
Example: weights of 10 students
Using this equation, we can then perform prediction.
Suppose a student consumes 600 calories/day:
    y = 2.608 + 0.149x
    prediction of y = 2.608 + (0.149 × 600) = 92 kilograms
You can also predict the calorie consumption given the weight of the student.
For example, if the weight of the student is 40 kilograms, then:
    40 = 2.608 + 0.149x
    37.392 = 0.149x
    prediction of x ≈ 250.95 calories/day
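The worked example above can be reproduced with a short pure-Python script, using the 10-student data and the normal-equation formulas for a and b from the earlier slide:

```python
# Simple linear regression on the 10-student data from the slides:
# slope b and intercept a via the normal-equation formulas.
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]  # calories/day
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]            # weight (kg)
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope
a = (sy * sxx - sx * sxy) / (n * sxx - sx ** 2)  # intercept

print(f"y = {a:.3f} + {b:.3f}x")  # y = 2.608 + 0.149x
print(f"predicted weight at 600 cal/day: {a + b * 600:.1f} kg")  # 92.0
```

The coefficients match the slide's values, and the 600 cal/day prediction reproduces the 92 kg result.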
Correlation
Correlation
• Correlation testing is used to measure the strength of the relationship between x
and y.
• Regression analysis is commonly performed together with correlation analysis.
• The correlation coefficient (r) is expressed by:

    r = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / √[(n Σxᵢ² − (Σxᵢ)²)(n Σyᵢ² − (Σyᵢ)²)]
Data: weights of 10 students
• The relationship between the predictor variable (x) and the response variable (y)
is very strong: we found a correlation value of r = 0.95.
• In summary, body weight is indeed strongly related to daily calorie consumption.
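The r = 0.95 value can be checked directly from the correlation formula above, again using the 10-student data:

```python
from math import sqrt

# Pearson correlation coefficient r for the 10-student data,
# using the sum formula from the slides.
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(f"r = {r:.2f}")  # r = 0.95
```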
Coefficient of determination
• The coefficient of determination, denoted R² or r² and pronounced "R squared", is
the proportion of the variance in the dependent variable (y) that is predictable
from the independent variable (x).
• R² is a statistic that gives some information about the goodness of fit of a model
(i.e., the adequacy of the regression model): 0 ≤ R² ≤ 1.
• An R² of 1 indicates that the regression predictions perfectly fit the data. An R²
of 0 indicates that the dependent variable cannot be reliably predicted from the
independent variable.
• From our data, the coefficient of determination is R² = 0.90.
Interpretation: 90% of the variation in the response variable (y) is explained by the
predictor variable (x), while the remaining 10% is explained by other factors.
Coefficient of determination
• Another way to compute R² or r²:

    R² = 1 − SSE/SST

SSE: sum of squared errors
SST: total corrected sum of squares
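Both routes give the same number on the 10-student data; this short script computes R² as 1 − SSE/SST after fitting the line:

```python
# R^2 for the 10-student example, computed as 1 - SSE/SST.
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)  # slope (Sxy / Sxx)
a = ybar - b * xbar                      # intercept

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual error
sst = sum((yi - ybar) ** 2 for yi in y)                      # total variation
r2 = 1 - sse / sst
print(f"R^2 = {r2:.2f}")  # R^2 = 0.90
```

This agrees with squaring the correlation coefficient: 0.95² ≈ 0.90.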
Multiple Linear Regression (Part 01)
Review: Ads budget and sales
(Figure: sales in thousands of units plotted against TV, radio, and newspaper ads
budgets, each in IDR million.)
The input variables are typically denoted using the symbol X, with a subscript to
distinguish them. So X₁ will be the TV budget, X₂ the radio budget, and X₃ the
newspaper budget.
Complexity of engineering problems
• In some cases, more than one independent variable is needed to develop a regression
model. Thus, we need multiple linear regression.
• The estimated response is obtained from the sample regression equation:

    ŷ = b₀ + b₁x₁ + b₂x₂ + … + b_k x_k

• We obtain the least squares estimators b₀, b₁, …, b_k of the parameters by fitting
the model to the data points.
Multiple linear regression
• In using the concept of least squares to arrive at the estimates b₀, b₁, …, b_k, we
minimize the expression

    SSE = Σᵢ (yᵢ − b₀ − b₁x₁ᵢ − … − b_k x_kᵢ)²

• We generate the set of k + 1 normal equations for multiple linear regression.
• Solve this linear system Xb = y using a numerical method or linear algebra (e.g.,
Gauss elimination or the matrix inverse).
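A minimal sketch of this procedure with NumPy: build the design matrix, form the normal equations (XᵀX)b = Xᵀy, and solve. The data below are made up for illustration, generated from y = 1 + 2x₁ + 3x₂ so the recovered coefficients are known in advance:

```python
import numpy as np

# Least-squares estimates for y = b0 + b1*x1 + b2*x2 via the normal equations.
# Synthetic (illustrative) data: y is exactly 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
b = np.linalg.solve(X.T @ X, X.T @ y)            # solve (X^T X) b = X^T y
print(b)  # [1. 2. 3.]
```

In practice `np.linalg.lstsq` is preferred over explicitly forming XᵀX, which can be poorly conditioned.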
Multiple linear regression – fitted model
For the advertising data, a linear regression fit to sales using TV ads
budget and Radio ads budget as predictors.
Multiple Linear Regression (Part 02)
Example
• We use data on the pull strength of a wire bond in a semiconductor manufacturing
process, wire length, and die height to illustrate building an empirical model.
• Note: for simplicity, units are omitted.
• Number of observations (n): 25
• Independent variables: wire length (x₁) and die height (x₂)
• Dependent variable: pull strength (y)
Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
Example
Compute the coefficients of the normal equations (the sums Σx₁ᵢ, Σx₂ᵢ, Σx₁ᵢ², Σx₂ᵢ²,
Σx₁ᵢx₂ᵢ, Σx₁ᵢyᵢ, and Σx₂ᵢyᵢ over the 25 observations).
Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
Multiple Linear Regression (Part 03)
Solving a system of linear equations
• Systems of linear equations that have to be solved simultaneously arise in problems
that include several (possibly many) variables that depend on each other.
• The general form of a system of n linear algebraic equations is:

    a₁₁x₁ + a₁₂x₂ + … + a₁ₙxₙ = b₁
    a₂₁x₁ + a₂₂x₂ + … + a₂ₙxₙ = b₂
    ⋮
    aₙ₁x₁ + aₙ₂x₂ + … + aₙₙxₙ = bₙ

• Solving a system of linear equations can be performed in various ways, such as: the
inverse method, Gauss elimination, Gauss-Jordan elimination, LU decomposition, or
iterative methods (Jacobi, Gauss-Seidel).
• You need a basic understanding of numerical methods to solve a system of linear
equations.
Example – Inverse method
We can write the normal equations in matrix form as follows:

    Xb = y

To solve for vector b, we find the inverse of matrix X:

    b = X⁻¹y
But not all matrices have an inverse…
Example – Gauss Elimination
Goal: to convert the system of equations into upper-triangular form.
Example – Gauss Elimination
With R1 as the pivot row:
Step 1: R1 is multiplied by 8.24 (206/25) and subtracted from R2.
Step 2: R1 is multiplied by 331.76 (8294/25) and subtracted from R3.
Example – Gauss Elimination
With R2 as the pivot row:
Step 3: R2 is multiplied by 12.64664 (8834.44/698.56) and subtracted from R3.
Step 4: Final form of the system, with an upper triangular matrix.
Example – Gauss Elimination
Both the Gaussian elimination method and the inverse matrix method give the same
solution.
Note: if you use the inverse matrix method, you need to make sure that the matrix is
not singular so that you can compute the inverse.
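The row operations above can be written as a short routine. This is a minimal sketch without pivoting, mirroring the slides' procedure, so it assumes all pivots are nonzero; the 3×3 system used to check it is made up for illustration:

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with back-substitution.
    No pivoting (as in the slides), so nonzero pivots are assumed."""
    n = len(b)
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    # forward elimination: reduce A to upper-triangular form
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]  # multiplier (e.g. 206/25 = 8.24 in the slides)
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # back-substitution, starting from the last equation
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# small check on a made-up 3x3 system
sol = gauss_solve([[2.0, 1.0, 1.0], [1.0, 3.0, 2.0], [1.0, 0.0, 0.0]],
                  [4.0, 5.0, 6.0])
print([round(v, 6) for v in sol])  # [6.0, 15.0, -23.0]
```

Production code should use partial pivoting (or `numpy.linalg.solve`) for numerical stability.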
Model Selection (Part 01)
The philosophy of machine learning
Given these examples, decide which arithmetic operation (addition, subtraction,
multiplication, or division) is the best choice to explain the mapping of the unknown
function between inputs and outputs.
(Kelleher, 2019)
The philosophy of machine learning
• Multiplication is the best choice.
• However, if the number of inputs to the unknown function increases (perhaps to
hundreds or thousands of inputs), the variety of potential functions to be considered
gets larger.
• This is where machine learning comes in to search for the best function:
find a function that generalizes well to unseen data, based on previously seen data.
(Kelleher, 2019)
Machine Learning Pipeline
https://machinelearningmastery.com/machine-learning-checklist/
Why is machine learning difficult?
• First, datasets include noise. Learning a function that exactly matches the data
means learning the noise as well. Hence, "data are the new oil".
• Second, sometimes the set of possible functions is larger than the set of examples
in the dataset (an ill-posed problem: the information given in the problem is not
sufficient to find a single best solution).
• Third, not all features that you extract are useful for finding a good model. More
features do not necessarily lead to a more accurate model. This is related to the
curse of dimensionality.
Overfitting and underfitting
(Figures: examples of underfitting and overfitting for a regression problem and a
classification problem.)
Model Selection (Part 02)
Model selection (1)
• An important problem in many applications of regression analysis involves selecting
the set of regressor variables (independent variables) to be used in the model.
• Our case: we are sure that not all of the candidate regressors are necessary to
adequately model the response Y.
• Our solution: variable selection, i.e., screening the candidate variables to obtain
a regression model that contains the "best" subset of regressor variables.
• Goal: the final model contains enough regressor variables so that, in the intended
use of the model (prediction, for example), it will perform satisfactorily.
• We would also like the model to use as few regressor variables as possible, to keep
model maintenance costs to a minimum and to make the model easy to use.
Model selection (2)
• We assume that there are K candidate regressors x₁, x₂, …, x_K and a single
response variable y.
• All models will include an intercept term β₀, so the model with all variables
included would have K + 1 terms.
• If there are K candidate regressors, there are 2^K total equations to be examined.
For example, for two candidates x₁, x₂, we must examine:

    E(Y) = β₀
    E(Y) = β₀ + β₁x₁
    E(Y) = β₀ + β₂x₂
    E(Y) = β₀ + β₁x₁ + β₂x₂

• Hence, the number of equations to be examined increases rapidly as the number of
candidate variables increases.
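The 2^K enumeration can be sketched directly: fit every subset of regressors by least squares and compare the residual sum of squares. The toy data below are synthetic (K = 2, with x₂ irrelevant by construction):

```python
from itertools import combinations
import numpy as np

# Best-subset enumeration: with K candidate regressors there are 2**K models.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 30)
x2 = rng.uniform(0, 1, 30)
y = 1 + 2 * x1 + rng.normal(0, 0.1, 30)  # x2 does not influence y

features = {"x1": x1, "x2": x2}
results = {}
for k in range(len(features) + 1):
    for subset in combinations(features, k):
        # design matrix: intercept column plus the chosen regressors
        X = np.column_stack([np.ones_like(y)] + [features[f] for f in subset])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        results[subset] = np.sum((y - X @ coef) ** 2)  # residual sum of squares
        print(subset, f"RSS = {results[subset]:.3f}")
```

The subset containing x₁ drops the RSS sharply, while adding x₂ barely helps, which is the pattern variable selection looks for.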
Model selection (3)
• We can use R² to evaluate a linear regression model.
• Another way to assess the quality of a regression model is to use the total mean
square error of the regression model.
• For a sample, MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)². In expectation, the MSE of an estimator ŷ
decomposes as:

    MSE(ŷ) = E[(ŷ − y)²]
           = E[ŷ²] − 2y·E[ŷ] + y²
           = Var(ŷ) + (E[ŷ])² − 2y·E[ŷ] + y²
           = Var(ŷ) + (E[ŷ] − y)²
           = Var(ŷ) + Bias(ŷ)²

• We choose the best model with the least MSE score.
MSE vs model complexity
• A simple model is restrictive: it cannot capture the variability in complex real
data.
• A higher-order model is flexible: it can capture the variability in complex real
data.
• Increasing model complexity means increasing flexibility. However, this comes at a
cost.
• If the model is too complex, it fails to generalize: the test MSE is high even
though the training MSE is low.
• The ideal condition is achieved when the model is not too complex (blue line,
spline 1): the test MSE and training MSE both reach their minimum.
(Figure legend: linear regression; spline 1; spline 2; real data; training MSE;
test MSE.)
Source: G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to
Statistical Learning, with applications in R, Springer, 2013.
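This train-vs-test MSE pattern can be reproduced with a small experiment: fit polynomials of increasing degree to synthetic noisy data (made up here for illustration) and compare the two errors:

```python
import numpy as np

# Train/test MSE vs model complexity on synthetic data:
# y = sin(2*pi*x) + noise, fitted by polynomials of increasing degree.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)
x_tr, y_tr = x[::2], y[::2]    # 20 training points
x_te, y_te = x[1::2], y[1::2]  # 20 held-out test points

mse = {}
for degree in (1, 4, 9):
    p = np.polyfit(x_tr, y_tr, degree)
    mse_tr = np.mean((np.polyval(p, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(p, x_te) - y_te) ** 2)
    mse[degree] = (mse_tr, mse_te)
    print(f"degree {degree}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

Training MSE keeps falling as the degree grows, while test MSE improves from degree 1 to degree 4 and then tends to worsen as the model starts fitting noise.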
Model Selection (Part 03)
Bias-variance trade-off
• Variance refers to the amount by which the estimated model would change if we
estimated it using a different training data set. If a model has overly high
variance, it means that the model is learning the noise of the training data in
addition to the true association.
• Bias is the average error between the predicted values and the true values of new
data. It reflects the ability of a model to learn the true association. If a model
has overly high bias, it means that the algorithm was not able to learn a good
approximation of the true association in the real data.
• Training set: typically 60% of the data. As the name suggests, this is used for
training a regression model (finding the regression parameters).
• Validation set: also called the development set; typically 20% of the data. This
set is not used during training. It is used to test the quality of the trained model.
• Test set: typically 20% of the data. Its only purpose is to report the accuracy of
the final model.
Bias problem: high training error; the validation error is similar in magnitude to
the training error.
Variance problem: low training error; the validation error is very high.
Overfitting and underfitting
Underfitting: a linear model is not sufficient to fit the samples; high bias, low
variance.
Good fit: a polynomial of degree 4 approximates the true function almost perfectly;
balanced bias and variance.
Overfitting: a higher-degree polynomial fits the noise in the sample data; low bias,
high variance.
Cross validation
• The loss (MSE) that we calculate from the validation set is sensitive to the choice of data in the validation set.
• This is particularly problematic if our dataset (and hence our validation set) is small.
• K-fold cross-validation splits the data into K equally (or as close to equal as possible) sized blocks.
• Each block takes its turn as the validation set for a training set comprised of the other K − 1 blocks.
• Averaging over the resulting K loss values gives us the final loss value.
Source: S. Rogers and M. Girolami, A First Course in Machine Learning, Chapman & Hall/CRC, 2017.
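The steps above can be sketched in plain Python, using simple linear regression as the model and MSE as the loss (all function names here are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_cv(xs, ys, k=5):
    """Each block takes its turn as validation set; average the K losses."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]  # near-equal blocks
    losses = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit_line([xs[j] for j in train], [ys[j] for j in train])
        losses.append(mse(model, [xs[j] for j in val], [ys[j] for j in val]))
    return sum(losses) / k

xs = list(range(20))
ys = [2 * x + 1 for x in xs]      # noiseless linear data
print(k_fold_cv(xs, ys, k=5))     # → 0.0 (noiseless linear data)
```

On noisy data the averaged validation loss gives a far less optimistic estimate of generalization error than the training loss does, which is exactly why it is used for model selection.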
End of File
Model Selection (Part 04)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022
Curse of dimensionality
As the number of independent variables (features, dimensions) grows, the amount of data we need to generalize accurately grows exponentially.
Source: C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Curse of dimensionality
The amount of training data needed to cover 20% of the feature range grows exponentially with the number of dimensions (features/independent variables). (Figure panels: 1 dimension, 2 dimensions, 3 dimensions.)
http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
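The figure's numbers follow from one line of arithmetic: a hyper-cube covering a fraction f of a d-dimensional unit feature space must span f^(1/d) of each axis.

```python
def axis_fraction(volume_fraction, d):
    """Per-axis span of a hyper-cube covering `volume_fraction` of a d-dim unit cube."""
    return volume_fraction ** (1 / d)

for d in (1, 2, 3):
    print(f"{d}D: {axis_fraction(0.20, d):.0%} of each axis")
# → 1D: 20%, 2D: 45%, 3D: 58% of each axis to cover 20% of the space
```

So in three dimensions you must already sample more than half of every feature's range to cover the same 20% of the space that a single feature covered with 20% of its range.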
Data availability and regression parameters
• Previously, we used least squares to find the regression coefficients b₀, b₁, …, bₚ by minimizing the residual sum of squares (RSS), also called the sum of squared errors (SSE).
• Least squares works well if the number of observations n is much larger than the number of regression parameters p, i.e., n ≫ p.
• However, data are expensive (curse of dimensionality: as the number of independent variables increases, you need more data). If p approaches n, or even p > n, the model will overfit.
• By constraining or shrinking the estimated coefficients, we can often substantially reduce the variance at the cost of a negligible increase in bias.
Regularization: ridge regression
• Ridge regression is very similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity.
• Ridge regression minimizes the following equation:

$$\sum_{i=1}^{n} \left( y_i - b_0 - \sum_{j=1}^{p} b_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} b_j^2 = \mathrm{SSE} + \lambda \sum_{j=1}^{p} b_j^2$$

where λ ≥ 0 is a tuning parameter, to be determined separately.
• Ridge regression seeks coefficient estimates that fit the data well by making the SSE small. However, the second term, λ ∑ⱼ bⱼ², called the shrinkage penalty, is small when b₁, …, bₚ are close to zero, so it has the effect of shrinking the estimates of bⱼ towards zero.
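The ridge objective has a closed-form solution, b = (XᵀX + λI)⁻¹ Xᵀy. A minimal NumPy sketch (assuming centered data so the intercept can be dropped; in practice b₀ is left unpenalized, and NumPy availability is an assumption):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam*I) b = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -1.0])      # noiseless data with true coefficients [3, -1]
print(ridge_fit(X, y, lam=0.0))    # ≈ [3, -1]: lam = 0 reduces to least squares
print(ridge_fit(X, y, lam=10.0))   # shrunk towards zero (smaller norm)
```

Note that λ > 0 also makes XᵀX + λI invertible even when p > n, which is exactly the data-scarce regime discussed on the previous slide.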
Regularization: ridge regression
• If λ = 0, we recover the original least squares solution.
• A fifth-order polynomial function can fit six data points exactly, and we can see this if we set λ = 0.
• In general, N data points can be perfectly fitted by an (N − 1)th order polynomial.
• If we increase λ, we begin to see the regularization taking effect.
• λ = 1 × 10⁻⁶ follows the general shape of the exact fifth-order polynomial but without as much variability, and is consequently further from the data points. Thus, it can be used to obtain better predictions while avoiding overfitting.
• It is common to use cross validation to choose the value of λ that gives the best predictive performance.
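Choosing λ by cross validation, as the last bullet suggests, can be sketched end-to-end (the candidate grid and helper names are illustrative; NumPy is assumed):

```python
import numpy as np

def cv_mse_for_lambda(X, y, lam, k=5):
    """Average validation MSE of ridge regression over k folds."""
    n, p = X.shape
    folds = [np.arange(i, n, k) for i in range(k)]   # near-equal blocks
    losses = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])
        # closed-form ridge fit on the training blocks only
        b = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(p),
                            X[train].T @ y[train])
        losses.append(np.mean((y[val] - X[val] @ b) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=40)   # one informative feature + noise
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best = min(grid, key=lambda lam: cv_mse_for_lambda(X, y, lam))
print("best lambda on this data:", best)
```

The winning λ depends on the data: more noise and fewer observations push the cross-validated optimum towards larger λ.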
How to avoid overfitting and underfitting
• Fixing a high bias problem:
  • Train a more complex model: we can increase the degree of the independent variable → instead of using linear regression, we can use polynomial regression.
  • Train the model with more independent variables: we add more information to the model (multiple linear regression), so that we have a better representation of the dependent variable.
• Fixing a high variance problem:
  • Obtain more data → avoid overfitting by providing more exposure to the model.
  • Decrease the number of independent variables.
  • Perform regularization (ridge regression).
  • Choose the best model using cross validation.
• The goal is to achieve a balance between bias and variance → this is where experimentation takes place. Try several models and choose the best (the most optimal) one by looking at one or more metrics (MSE, R², and adjusted R²).
End of File
Simple Linear Regression (Part 01)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022

Supervised Machine Learning
1st type of supervised learning: Regression
• Regression is a measure of the relation between values of one variable (e.g., housing price) and corresponding values of another variable (e.g., time).
• Regression means to predict the numerical output value using training data.
• Basic algorithm: linear regression

2nd type of supervised learning: Classification
Model families: neural network model, geometric model, logical model/rule-based model, probabilistic model
Using regression for modeling Covid-19 infection
Puno, G.R., Puno, R.C.C. and Maghuyop, I.V., 2021. COVID-19 case fatality rates across Southeast Asian countries (SEA): a preliminary estimate using a simple linear regression model. Journal of Health Research.
(Figures: deaths and confirmed cases of COVID-19 infections among the SEA countries as of May 21, 2020; scatterplots and regression lines of confirmed cases and deaths for the six SEA countries.)

Eye Tracking Calibration: Fixation-Based
• Goal: improving spatial accuracy during gaze interaction.
• The user is asked to fixate their gaze on calibration targets (animated white or red circles).
• Mapping from eye position to calibration target: second-order polynomial regression.
End of File
Simple Linear Regression (Part 02)
Kecerdasan Buatan | Artificial Intelligence
Version: January 2022

Advertising budget case
Suppose that we are statistical consultants hired by a client to provide advice on how to improve sales of a particular product. The advertising dataset consists of the sales of that product in 200 different markets, along with the advertising budget for TV (TV ads). If we determine that there is a correlation between advertising and sales, then we can instruct our client to adjust the advertising budget, thereby indirectly increasing sales. Our goal is to develop an accurate mathematical model that can be used to predict sales on the basis of the TV ads budget.
(Figure: sales (thousands) versus TV ads budget (IDR million). "Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.")
  • 8. Advertising budget case. The advertising budget is an input variable, while sales is the output variable. The inputs go by different names, such as predictors, independent variables, regressors, features, or sometimes just variables, and are denoted using the symbol x. The output variable, in this case sales, is often called the response or dependent variable and is typically denoted using the symbol y. A reasonable form of a relationship between the response and the regressor is the linear relationship: y = β0 + β1x.
    Linear Regression. y = β0 + β1x, where y is the response (dependent) variable, x is the predictor variable (feature, independent variable, regressor), β0 is the intercept, and β1 is the slope.
  • 9. Simple Linear Regression (SLM). Regression analysis: the relationships among variables are not deterministic (i.e., not exact). There must be a random component in the equation that relates the variables. This random component takes into account considerations that are not being measured or, in fact, are not understood by the scientists or engineers. To accommodate this random component: the simple linear regression model (SLM), y = β0 + β1x + ε. Here β0 and β1 are the unknown intercept and slope parameters, respectively, and ε is a random variable assumed to be distributed with E(ε) = 0 and Var(ε) = σ², called the random error or random disturbance. Basic concept: we want to find a line (linear model) that generalizes the data, i.e., a line that represents all data with minimum error.
    Weighted linear equation in deep learning.
  • 10. End of File
  • 11. Simple Linear Regression (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Recap of the simple linear regression model (SLM) from Part 02: y = β0 + β1x + ε, where β0 and β1 are the unknown intercept and slope parameters and ε is a random error with E(ε) = 0 and Var(ε) = σ². Basic concept: we want to find a line (linear model) that generalizes the data with minimum error.
  • 12. Least squares and the fitted model. The residual sum of squares (RSS) is often called the sum of squares of the errors about the regression line and is denoted by SSE. The minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find b0 and b1 so as to minimize SSE = Σ(yi − b0 − b1xi)².
    Side note: turning points. We can find turning points (that might correspond to minima) of a function f(x) by searching for points where the gradient of the function, f′(x), is zero. To determine whether a turning point corresponds to a maximum, a minimum, or a saddle point, we can examine the second derivative, f″(x). If, at a turning point x0, the second derivative is positive, we know that this turning point is a minimum.
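  The turning-point test above is easy to verify numerically. A minimal sketch with a hypothetical function f(x) = (x − 3)² + 1 (my own example, not from the slides), whose gradient vanishes at x = 3 and whose second derivative is positive there, using central-difference approximations:

```python
def f(x):
    return (x - 3.0) ** 2 + 1.0

def derivative(g, x, h=1e-5):
    # Central-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

def second_derivative(g, x, h=1e-4):
    # Central-difference approximation of g''(x)
    return (g(x + h) - 2 * g(x) + g(x - h)) / h ** 2

x0 = 3.0
assert abs(derivative(f, x0)) < 1e-6   # gradient is zero: turning point
assert second_derivative(f, x0) > 0    # positive curvature: a minimum
```

  The same check underlies least squares: the SSE is a quadratic bowl in (b0, b1), so the point where both partial derivatives vanish is its unique minimum.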
  • 13. Least squares and the fitted model. Setting the partial derivatives equal to zero, we obtain the normal equations for estimating the regression coefficients (slope b and intercept a):
    b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
    a = (Σx²Σy − ΣxΣxy) / (nΣx² − (Σx)²)
    Simpler formula (additional notation): with Sxy = Σ(xi − x̄)(yi − ȳ) and Sxx = Σ(xi − x̄)², the slope is b = Sxy / Sxx and the intercept is a = ȳ − b·x̄.
  • 14. End of File
  • 15. Simple Linear Regression (Part 04). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Recap of the normal equations from Part 03:
    b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
    a = (Σx²Σy − ΣxΣxy) / (nΣx² − (Σx)²)
  • 16. Example: weights of 10 students. There is a study of the weights of 10 students, which are predicted to be affected by daily calorie consumption. Goal of the analysis: to understand whether calorie intake affects weight. We must identify the predictor variable (feature) and the response variable: x (predictor) = calories consumed per day; y (response) = weight; n = 10 students.
    After collecting data from the 10 students, we get the following table (Source: Digital Talent Scholarship Training Course Materials, 2019):
    Subject  Calories/day (x)  Weight (y)
    1        530               89
    2        300               48
    3        358               56
    4        510               72
    5        302               54
    6        300               42
    7        387               60
    8        527               85
    9        415               63
    10       512               74
  • 17. Example: weights of 10 students. To find a and b, we must compute x², y², xy, and their sums (Σ). For this data: Σx = 4141, Σy = 643, Σxy = 279294, Σx² = 1802235. Since we have only one predictor (feature), computing a and b is straightforward: a = 2.608 and b = 0.149. (Source: Digital Talent Scholarship Training Course Materials, 2019)
  • 18. Example: weights of 10 students. Then we create the regression model. Based on the previous slide, we obtain this equation: ŷ = 2.608 + 0.149x. (Source: Digital Talent Scholarship Training Course Materials, 2019)
    Using this equation, we can then perform predictions. Suppose a student consumes 600 calories/day: prediction of y = 2.608 + (0.149 × 600) ≈ 92 kilograms. You can also predict the calorie consumption given the weight of the student. For example, if the weight of the student is 40 kilograms, then: 40 = 2.608 + 0.149x, so 37.392 = 0.149x, and the prediction of x ≈ 250.95 calories/day.
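  The worked example above can be reproduced directly from the normal-equation formulas. A minimal sketch in plain Python (no libraries), using the calorie/weight data from the table:

```python
# Data from the example: daily calorie intake (x) and weight (y) of 10 students
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Normal equations for simple linear regression
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept

print(round(a, 3), round(b, 3))  # 2.608 0.149, matching the slide

# Prediction for a student consuming 600 calories/day
print(round(a + b * 600))        # 92 (kilograms)
```

  Note that the second form of the intercept, a = (Σy − bΣx)/n, is algebraically equivalent to the ratio given in the normal equations.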
  • 19. End of File
  • 20. Correlation. Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Correlation testing is used to measure the strength of the relationship between x and y. Regression analysis is commonly performed together with correlation analysis. The correlation coefficient (r) is expressed by:
    r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]
  • 21. Example: weights of 10 students. Recap of the setting from the regression example: x (predictor) = calories consumed per day, y (response) = weight, n = 10 students; the goal of the analysis is to understand whether calorie intake affects weight. The data are the same table of 10 subjects as before (x from 300 to 530 calories/day, y from 42 to 89 kg).
  • 22. Data: weights of 10 students. The relationship between the predictor variable (x) and the response variable (y) is very strong: we found a 95% correlation value (r = 0.95). In summary, body weight is indeed affected by daily calorie consumption.
    Coefficient of determination. The coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variance in the dependent variable (y) that is predictable from the independent variable (x). R² is a statistic that gives some information about the goodness of fit of a model (i.e., the adequacy of the regression model): 0 ≤ R² ≤ 1. An R² of 1 indicates that the regression predictions perfectly fit the data; an R² of 0 indicates that the dependent variable cannot be reliably predicted from the independent variable. From our data, the coefficient of determination is R² = 0.90. Interpretation: 90% of the variation in the response variable (y) is explained by the predictor variable (x), while the remaining 10% is explained by other variables.
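  Applying the correlation formula above to the same calorie/weight data reproduces both numbers quoted on the slide (r ≈ 0.95 and R² = r² ≈ 0.90); a quick sketch:

```python
import math

x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)
sy2 = sum(b * b for b in y)

# Pearson correlation coefficient
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 2))      # 0.95
print(round(r * r, 2))  # 0.9  -> coefficient of determination
```

  For simple linear regression fitted by least squares, R² equals the square of the correlation coefficient, which is why both routes agree.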
  • 23. Coefficient of determination. Another way to compute R² (or r²): R² = 1 − SSE/SST, where SSE is the sum of squared errors, SSE = Σ(yi − ŷi)², and SST is the total corrected sum of squares, SST = Σ(yi − ȳ)².
    End of File
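  The alternative route, R² = 1 − SSE/SST, gives the same value for the fitted line ŷ = 2.608 + 0.149x from the earlier example; a quick check:

```python
x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]
a, b = 2.608, 0.149  # fitted intercept and slope from the example

y_hat = [a + b * xi for xi in x]
y_bar = sum(y) / len(y)

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # errors about the line
sst = sum((yi - y_bar) ** 2 for yi in y)               # total corrected sum of squares

r2 = 1 - sse / sst
print(round(r2, 2))  # 0.9
```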
  • 24. Multiple Linear Regression (Part 01). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Review: ads budget and sales. Figure: sales (thousands) versus TV ads budget, radio ads budget, and newspaper ads budget (IDR million). The input variables are typically denoted using the symbol X, with a subscript to distinguish them. So X1 will be the TV budget, X2 the radio budget, and X3 the newspaper budget.
  • 25. Complexity of engineering problems. In some cases, more than one independent variable is needed to develop a regression model; thus, we need multiple linear regression. The estimated response is obtained from the sample regression equation ŷ = b0 + b1x1 + b2x2 + … + bkxk. We obtain the least squares estimators of the parameters by fitting this equation to the data points.
    Multiple linear regression. In using the concept of least squares to arrive at the estimates b0, b1, …, bk, we minimize the expression SSE = Σi (yi − b0 − b1x1i − … − bkxki)². We generate the set of k + 1 normal equations for multiple linear regression, then solve this linear system xb = y using a numerical method or linear algebra (e.g., Gauss elimination or the matrix inverse).
  • 26. Multiple linear regression: fitted model. For the advertising data, a linear regression fit to sales using TV ads budget and radio ads budget as predictors.
    End of File
  • 27. Multiple Linear Regression (Part 02). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Recap: for the advertising data, a linear regression fit to sales using TV ads budget and radio ads budget as predictors.
  • 28. Multiple linear regression (recap). Using least squares to arrive at the estimates b0, b1, …, bk, we minimize SSE = Σi (yi − b0 − b1x1i − … − bkxki)², generate the set of k + 1 normal equations, and solve the linear system xb = y by a numerical method or linear algebra (e.g., Gauss elimination or the matrix inverse).
    Example. We use data on the pull strength of a wire bond in a semiconductor manufacturing process, wire length, and die height to illustrate building an empirical model. Note: for simplicity, units are omitted. Number of observations (n): 25. Independent variables: wire length (x1) and die height (x2). Dependent variable: pull strength (y). Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
  • 29. Example. Compute the coefficients of the normal equations from the data sums (tables shown in the slides). Reference: Applied Statistics and Probability for Engineers, by Montgomery and Runger.
  • 30. End of File
  • 31. Multiple Linear Regression (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Solving a system of linear equations. Systems of linear equations that have to be solved simultaneously arise in problems that involve several (possibly many) variables that depend on each other. The general form is a system of n linear algebraic equations. Solving such a system can be done in various ways, such as: the inverse method, Gauss elimination, Gauss-Jordan elimination, LU decomposition, or iterative methods (Jacobi, Gauss-Seidel). You need a basic understanding of numerical methods to solve a system of linear equations.
  • 32. Example: inverse method. We can write the normal equations in matrix form as xb = y. To solve for the vector b, we find the inverse of the matrix x: b = x⁻¹y. But not all matrices have an inverse (a singular matrix cannot be inverted).
  • 33. Example: Gauss elimination. Goal: to convert the system into an upper triangular matrix, which can then be solved by back substitution.
  • 34. Example: Gauss elimination. With R1 as the pivot row: Step 1: multiply R1 by 8.24 (= 206/25) and subtract the result from R2. Step 2: multiply R1 by 331.76 (= 8294/25) and subtract the result from R3. With R2 as the pivot row: Step 3: multiply R2 by 12.64664 (= 8834.44/698.56) and subtract the result from R3. Step 4: final form of the system with an upper triangular matrix.
  • 35. Example: Gauss elimination. The Gaussian elimination method and the inverse matrix method give the same solution. Note: if you use the inverse matrix method, you need to make sure that the matrix is not singular so that the inverse can be computed.
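  The elimination-plus-back-substitution procedure above can be written generically. A minimal sketch without partial pivoting (it assumes nonzero pivots); the 3×3 system here is a made-up example with known solution, not the wire-bond numbers from the slides:

```python
def gauss_solve(A, y):
    """Solve A b = y by forward elimination and back substitution.
    Minimal sketch: no partial pivoting, assumes nonzero pivots."""
    n = len(A)
    A = [row[:] for row in A]  # work on copies
    y = y[:]
    # Forward elimination: reduce A to upper triangular form
    for p in range(n):                  # p = pivot row
        for r in range(p + 1, n):
            m = A[r][p] / A[p][p]       # multiplier, like 206/25 in the slide
            for c in range(p, n):
                A[r][c] -= m * A[p][c]
            y[r] -= m * y[p]
    # Back substitution
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * b[c] for c in range(r + 1, n))
        b[r] = (y[r] - s) / A[r][r]
    return b

# Example system with known solution b = (1, 2, 3)
A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 2.0],
     [1.0, 0.0, 4.0]]
y = [7.0, 13.0, 13.0]
b = gauss_solve(A, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

  Production code would add partial pivoting (swapping rows to put the largest pivot on the diagonal) for numerical stability.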
  • 36. End of File
  • 37. Model Selection (Part 01). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    The philosophy of machine learning. Given these examples, decide which arithmetic operation (addition, subtraction, multiplication, or division) is the best choice to explain the mapping of the unknown function between inputs and outputs (Kelleher, 2019).
  • 38. The philosophy of machine learning. Multiplication is the best choice. However, if the number of inputs to the unknown function increases (perhaps to hundreds or thousands of inputs), the variety of potential functions to be considered gets larger. This is where machine learning takes part in searching for the best function: find a function that generalizes well to unseen data, based on previously seen data. (Kelleher, 2019)
    Machine Learning Pipeline. https://machinelearningmastery.com/machine-learning-checklist/
  • 39. Why is machine learning difficult? First, datasets include noise. Learning a function that exactly matches the data means learning the noise as well; hence, "data are the new oil". Second, sometimes the set of possible functions is larger than the set of examples in the dataset (an ill-posed problem: the information given in the problem is not sufficient to find a single best solution). Third, not all features that you extract are useful for finding a good model; a lot of features do not necessarily lead to a more accurate model. This is called the curse of dimensionality.
    Overfitting and underfitting. Figures: a regression problem and a classification problem.
  • 40. End of File
  • 41. Model Selection (Part 02). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Model selection (1). An important problem in many applications of regression analysis is selecting the set of regressor variables (independent variables) to be used in the model. Our case: we are sure that not all of the candidate regressors are necessary to adequately model the response Y. Our solution: variable selection, i.e., screening the candidate variables to obtain a regression model that contains the "best" subset of regressor variables. Goal: the final model contains enough regressor variables so that, in the intended use of the model (prediction, for example), it will perform satisfactorily. We would also like the model to use as few regressor variables as possible, to keep model maintenance costs to a minimum and to make the model easy to use.
  • 42. Model selection (2). We assume that there are k candidate regressors, x1, x2, …, xk, and a single response variable y. All models will include an intercept term β0, so the model with all variables included would have k + 1 terms. If there are k candidate regressors, there are 2^k total equations to be examined. For example, for two candidates x1, x2, we must examine:
    E(Y) = β0
    E(Y) = β0 + β1x1
    E(Y) = β0 + β2x2
    E(Y) = β0 + β1x1 + β2x2
    Hence, the number of equations to be examined increases rapidly as the number of candidate variables increases.
    Model selection (3). We can use R² to evaluate a linear regression model. Another way to assess the quality of a regression model is to use the total mean square error of the regression model. We define the mean square error (MSE) for the regression model as MSE = (1/n) Σi (ŷi − yi)². The expected squared error decomposes as E[(ŷ − y)²] = E[(ŷ − E[ŷ])²] + (E[ŷ] − y)² = Var(ŷ) + (Bias(ŷ))². We choose the best model with the least MSE score.
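  The count of 2^k candidate models follows from choosing, independently for each regressor, whether to include it. A small sketch enumerating the subsets with itertools (the helper name `candidate_models` is mine):

```python
from itertools import combinations

def candidate_models(regressors):
    """All subsets of the candidate regressors; each subset is one
    candidate model (the intercept b0 is always included)."""
    models = []
    for size in range(len(regressors) + 1):
        models.extend(combinations(regressors, size))
    return models

models = candidate_models(["x1", "x2"])
print(models)       # [(), ('x1',), ('x2',), ('x1', 'x2')]
print(len(models))  # 4 == 2**2
```

  For k = 10 candidates this is already 1024 models, which is why exhaustive search quickly becomes impractical.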
  • 43. MSE vs model complexity. A simple model is restrictive: it cannot capture the variability in complex real data. A higher-order model is flexible: it can capture the variability in complex real data. Increasing model complexity means increasing flexibility; however, this comes with a cost. If the model is too complex, it fails to generalize: the test MSE is high even though the training MSE is low. The ideal condition is achieved when the model is not too complex (blue line, spline 1): both the test MSE and the training MSE reach their minimum. Figure legend: linear regression, spline 1 (moderately complex), spline 2 (very complex), real data, training MSE, test MSE. Source: G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning, with applications in R, Springer, 2013.
    End of File
  • 44. Model Selection (Part 03). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Bias-variance trade-off. Variance refers to the amount by which the fitted model would change if we estimated it using a different training data set. If a model has overly high variance, it means that the model is learning the noise of the training data in addition to the true association. Bias is the average error between the predicted values and the true values of new data; it reflects the ability of a model to learn the true association. If a model has overly high bias, it means that the algorithm was not able to learn a good approximation of the true association in the real data.
    Training set: typically 60% of the data; as the name suggests, this is used for training a regression model (finding the regression parameters). Validation set: also called the development set, typically 20% of the data; it is not used during training, but to test the quality of the trained model. Test set: typically 20% of the data; its only purpose is to report the accuracy of the final model.
    Bias problem: high training error, with validation error similar in magnitude to the training error. Variance problem: low training error, but very high validation error.
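  The 60/20/20 split described above is a convention, not a fixed rule. A minimal sketch that shuffles sample indices and cuts them into training, validation, and test sets (the function name and the seed are my own choices):

```python
import random

def train_val_test_split(n_samples, train=0.6, val=0.2, seed=42):
    """Split sample indices into train / validation / test index lists.
    Default fractions follow the slide's 60/20/20 convention."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    n_train = int(train * n_samples)
    n_val = int(val * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

tr, va, te = train_val_test_split(100)
print(len(tr), len(va), len(te))  # 60 20 20
```

  Shuffling before splitting matters: if the data are ordered (e.g., by date or by class), a contiguous split would give unrepresentative sets.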
  • 45. Overfitting and underfitting. Underfitting: a linear model is not sufficient to fit the samples (high bias, low variance). Good fit: a polynomial of degree 4 approximates the true function almost perfectly (balanced bias and variance). Overfitting: a higher-degree polynomial fits the noise in the sample data (low bias, high variance).
    Cross validation. The loss (MSE) that we calculate from the validation set is sensitive to the choice of data in our validation set. This is particularly problematic if our dataset (and hence our validation set) is small. K-fold cross-validation splits the data into K equally (or as close to equal as possible) sized blocks. Each block takes its turn as a validation set for a training set comprised of the other K − 1 blocks. Averaging over the resulting K loss values gives us our final loss value. Source: S. Rogers and M. Girolami, A First Course in Machine Learning, Chapman & Hall/CRC, 2017.
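  The K-fold procedure above is easy to sketch without any library: split the indices into K near-equal blocks and let each block serve once as the validation set (the helper name `k_fold_splits` is mine):

```python
def k_fold_splits(n_samples, k):
    """Return (train_indices, val_indices) pairs for K-fold cross-validation.
    Blocks are as close to equal-sized as possible."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]                  # this block validates
        train = indices[:start] + indices[start + size:]   # the other K-1 blocks train
        splits.append((train, val))
        start += size
    return splits

splits = k_fold_splits(10, k=5)
print(len(splits))   # 5 folds
print(splits[0][1])  # [0, 1] -> first validation block
# Averaging the K validation losses gives the final loss estimate.
```

  Every sample is used for validation exactly once, which is what makes the averaged loss less sensitive to any single split.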
  • 46. Cross validation (figure: K-fold illustration). End of File
  • 47. Model Selection (Part 04). Kecerdasan Buatan | Artificial Intelligence. Version: January 2022.
    Curse of dimensionality. As the number of independent variables (features, dimensions) grows, the amount of data we need to generalize accurately grows exponentially. Source: C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
  • 48. Curse of dimensionality. The amount of training data needed to cover 20% of the feature range grows exponentially with the number of dimensions (features/independent variables); the figure illustrates this for 1, 2, and 3 dimensions. http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
    Data availability and regression parameters. Previously, we used least squares to find the coefficients by minimizing the residual sum of squares (RSS), also called the sum of squared errors (SSE). Least squares works well when the number of observations n is much larger than the number of regression parameters p (n ≫ p). However, data are expensive (curse of dimensionality: if the number of independent variables increases, you need more data). If p is close to n, or even p > n, the model will overfit. By constraining or shrinking the estimated coefficients, we can often substantially reduce the variance at the cost of a negligible increase in bias.
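  The 20%-coverage claim can be made concrete: to capture a fraction f of a unit hypercube's volume, a sub-cube must span f^(1/d) of each axis, and this per-axis span approaches the full range as the number of dimensions d grows. A quick check (the function name is mine):

```python
def edge_needed(fraction, dims):
    """Per-axis edge length (unit feature range) of a hypercube that
    covers `fraction` of the data volume in `dims` dimensions."""
    return fraction ** (1.0 / dims)

for d in (1, 2, 3, 10):
    print(d, round(edge_needed(0.2, d), 3))
# 1 -> 0.2, 2 -> 0.447, 3 -> 0.585, 10 -> 0.851
```

  In 10 dimensions you must span 85% of every feature's range just to cover 20% of the volume, so a fixed-size dataset becomes extremely sparse as features are added.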
  • 49. Regularization: ridge regression. Ridge regression is very similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity. Ridge regression minimizes the following equation:
    Σ(i=1..n) (yi − b0 − Σ(j=1..p) bj·xij)² + λ Σ(j=1..p) bj² = SSE + λ Σ(j=1..p) bj²
    where λ ≥ 0 is a tuning parameter, to be determined separately. Ridge regression seeks coefficient estimates that fit the data well by making the SSE small. However, the second term, λΣbj², called the shrinkage penalty, is small when b1, …, bp are close to zero, so it has the effect of shrinking the estimates towards zero.
    If λ = 0, we retrieve the original solution, just as least squares does. A fifth-order polynomial function can fit six data points exactly, and we can see this if we set λ = 0; in general, N data points can be perfectly fitted by an (N − 1)th-order polynomial. If we increase λ, we begin to see the regularization taking effect: λ = 1 × 10⁻⁶ follows the general shape of the exact fifth-order polynomial but without as much variability, and subsequently lies further from the data points. Thus it can be used to obtain better predictions while avoiding overfitting. It is common to use cross validation to choose the value of λ that gives the best predictive performance.
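  For the single-predictor case, setting the two partial derivatives of the ridge objective to zero gives a small closed form (the intercept b0 is left unpenalized, matching the penalty Σ(j≥1) bj² above; the function name and the derivation layout are my own). A sketch using the earlier calorie/weight data:

```python
def ridge_simple(x, y, lam):
    """Minimize sum (y - b0 - b1*x)^2 + lam * b1^2 for one predictor.
    Normal equations:  n*b0  + Sx*b1          = Sy
                       Sx*b0 + (Sxx + lam)*b1 = Sxy"""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * (sxx + lam) - sx ** 2
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1

x = [530, 300, 358, 510, 302, 300, 387, 527, 415, 512]
y = [89, 48, 56, 72, 54, 42, 60, 85, 63, 74]

b0, b1 = ridge_simple(x, y, lam=0.0)  # lam = 0 recovers least squares
print(round(b0, 3), round(b1, 3))     # 2.608 0.149

_, b1_shrunk = ridge_simple(x, y, lam=1e5)
print(b1_shrunk < b1)                 # True: larger lam shrinks the slope toward zero
```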
  • 50. How to avoid overfitting and underfitting.
    Fixing a high-bias problem: (a) train a more complex model: we can increase the degree of the independent variable, i.e., use polynomial regression instead of linear regression; (b) train the model with more independent variables: we add more information to the model (multiple linear regression) so that we have a better representation of the dependent variable.
    Fixing a high-variance problem: (a) obtain more data: avoid overfitting by giving the model more exposure; (b) decrease the number of independent variables; (c) perform regularization (ridge regression); (d) choose the best model using cross validation.
    The goal is to achieve a balance between bias and variance; this is where experimentation takes place. You try several models and choose the best (most optimal) one by looking at one or more metrics (e.g., MSE, r, and R²).
    End of File