Jin-Woo Jeong
Network Science Lab
Dept. of Mathematics
The Catholic University of Korea
E-mail: zeus0208b@gmail.com
▪ Introduction
• What is Linear Regression
▪ Algorithm
• Hypothesis
• Cost function : MSE
• How to minimize MSE
• Codes
▪ Advantage / Disadvantage
• Advantage
• Disadvantage
▪ Q / A
1. Introduction
What is Linear Regression
• Simple Linear Regression (line)
• Multiple Linear Regression (plane, hyperplane)
y = Wx + b
• We need to find
W : weight
b : bias (intercept)
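In the multiple case the model extends to several features; a standard formulation (notation not from the slide) is:

$$y = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b = \mathbf{w}^{\top}\mathbf{x} + b$$

with one weight per feature, which traces out a plane (d = 2) or a hyperplane (d > 2).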
[Figure: Simple Linear Regression]
2. Algorithm
Hypothesis
• H(W,b) = Wx + b
• Initialize W and b (randomly)
• x = [1, 2, 3]
• y = [3, 5, 7]
• For this example, initialize W = 1, b = 0
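As a concrete illustration (a minimal sketch, not the slide's own code), the hypothesis and toy data can be written in Python:

```python
import numpy as np

# Toy data from the slide; it satisfies y = 2x + 1,
# so the true parameters are W = 2, b = 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

def hypothesis(x, W, b):
    """H(W, b) = Wx + b"""
    return W * x + b

# Initial guess from the slide: W = 1, b = 0.
print(hypothesis(x, 1.0, 0.0))  # [1. 2. 3.] -- far from y, so W and b must be learned
```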
2. Algorithm
Cost Function : MSE
• MSE : mean squared error

$$\text{cost}(W, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(H(x_i) - y_i\bigr)^2 = \text{MSE}$$
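Building on the earlier sketch (the mse helper below is hypothetical, not from the slide), the cost can be computed directly:

```python
def mse(x, y, W, b):
    """Mean squared error of H(x) = Wx + b over the dataset."""
    errors = hypothesis(x, W, b) - y
    return float((errors ** 2).mean())

print(mse(x, y, 1.0, 0.0))  # cost at the initial guess W = 1, b = 0
print(mse(x, y, 2.0, 1.0))  # 0.0 at the true parameters
```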
2. Algorithm
How to minimize MSE : Gradient descent
• Gradient descent : the process of iteratively computing gradients and updating variable values to reduce the cost (the update rule is written out after this list)
• Process
1. Start at an arbitrary point, or initialize.
2. Compute the gradient of the function at the current position.
3. Move in the opposite direction of the gradient by a small step, determined by the learning rate.
4. Recalculate the gradient at the new position and move again.
5. Repeat these steps iteratively to find the optimal position (the minimum).
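In symbols, each step moves the parameters against the gradient; the standard rule (a textbook formulation, consistent with the slides) is:

$$\theta \leftarrow \theta - \alpha \,\nabla_{\theta}\,\text{cost}(\theta)$$

where θ stands for the parameters (here W and b) and α is the learning rate.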
2. Algorithm
How to minimize MSE : Gradient descent
[Figure: gradient-descent steps on the cost curve over W — left: small learning rate (slow, small steps); right: large learning rate (large steps that can overshoot the minimum)]
2. Algorithm
Gradient descent in Simple Linear Regression
α : learning rate
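For the MSE cost defined earlier, the gradients and updates take the standard textbook form (a reconstruction consistent with the slides' notation, not copied from them):

$$\frac{\partial\,\text{cost}}{\partial W} = \frac{2}{n}\sum_{i=1}^{n}\bigl(Wx_i + b - y_i\bigr)\,x_i, \qquad \frac{\partial\,\text{cost}}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\bigl(Wx_i + b - y_i\bigr)$$

$$W \leftarrow W - \alpha\,\frac{\partial\,\text{cost}}{\partial W}, \qquad b \leftarrow b - \alpha\,\frac{\partial\,\text{cost}}{\partial b}$$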
2. Algorithm
Implementation
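A minimal Python sketch of the training loop, assuming the toy data and the 2/n gradient convention above (a plausible reconstruction, not the original code):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

def train(x, y, alpha=0.01, epochs=1000, W=1.0, b=0.0):
    """Fit y = Wx + b by batch gradient descent on the MSE cost."""
    n = len(x)
    for _ in range(epochs):
        pred = W * x + b
        # Gradients of MSE with respect to W and b (see the equations above).
        dW = (2.0 / n) * np.sum((pred - y) * x)
        db = (2.0 / n) * np.sum(pred - y)
        W -= alpha * dW
        b -= alpha * db
    return W, b

W, b = train(x, y, alpha=0.01, epochs=1000)
print(f"W = {W:.6f}, b = {b:.6f}")  # approaches W ≈ 2, b ≈ 1
```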
2. Algorithm
α : 0.01 (Just right)
epoch = 1000
W : 2.018600 ≈ 2
b : 0.957717 ≈ 1
2. Algorithm
α : 0.0001 (Too small)
epoch = 1000
W : 1.803907
b : 0.366944
2. Algorithm
α : 0.2 (Too large)
epoch = 200
W : divergent (∞)
b : divergent (∞)
Finding an appropriate α is crucial!
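The three regimes above can be reproduced by sweeping α with the hypothetical train() sketch from the implementation section:

```python
# Sweep the learning rate to see the three regimes from the slides.
for alpha, epochs in [(0.0001, 1000), (0.01, 1000), (0.2, 200)]:
    W, b = train(x, y, alpha=alpha, epochs=epochs)
    print(f"alpha={alpha}: W={W:.6f}, b={b:.6f}")
# alpha=0.0001 is still far from W=2, b=1 after 1000 epochs,
# alpha=0.01 converges, and alpha=0.2 diverges
# (NumPy reports overflow as W and b blow up).
```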
3. Advantage / Disadvantage
Advantage
• Training is fast.
• Training works well even on very large datasets.
• It can learn well even when there are many features relative to the number of samples, and it is easy to understand how predictions are made.
Disadvantage
• It is often unclear why the coefficients take the specific values they do.
• If the features in the dataset are strongly correlated with one another (multicollinearity), the coefficients become very difficult to interpret.
4. Q & A
Q / A
