zekeLabs
Linear Regression
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Agenda
● Deterministic vs Statistical Relations
● Introduction to Linear Regression
● Simple Linear Regression
● Model Evaluation
● Gradient Descent
● Polynomial Regression
● Bias and Variance
● Regularization
● Lasso Regression
● Ridge Regression
● Stochastic Gradient Descent
● Robust Regressors for data with outliers
Deterministic vs Statistical Relations
● Deterministic Relations
○ The data points fall exactly on the relationship
○ The relation can be written down as an exact formula
○ Example: Converting Celsius to Fahrenheit
● Statistical Relations
○ They exhibit a trend, but not a perfect relation
○ The data also exhibits some scatter
○ Example: Height vs Weight
Introduction to Linear Regression
● The simplest and most widely used regression model
● A baseline prediction: the average of the data
● Better predictions come from using additional information (features)
● The improvement is measured by residuals
● Finding the line of best fit
An Example
● Predict every tip with the average tip, $10, so residual = tip - 10

Meal | Tip Amount ($) | Residual | Residual Sq.
1    | 5              | -5       | 25
2    | 17             | 7        | 49
3    | 11             | 1        | 1
4    | 8              | -2       | 4
5    | 14             | 4        | 16
6    | 5              | -5       | 25
     |                | SSE      | 120
Better Prediction

Bill ($) | Tip Amount ($) | Residual | Residual Sq.
34       | 5              | 0.8495   | 0.7217
108      | 17             | 2.0307   | 4.1237
64       | 11             | 2.4635   | 6.0688
88       | 8              | -4.0453  | 16.3645
99       | 14             | 0.3465   | 0.1201
51       | 5              | -1.6359  | 2.6762
         |                | SSE      | 30.075
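As a rough sketch (assuming NumPy, which the slides do not name), the table above can be reproduced by fitting the least-squares line of tip vs. bill and comparing its SSE with the mean-only prediction:

```python
# Sketch: compare the mean-only prediction with the least-squares line
import numpy as np

bill = np.array([34, 108, 64, 88, 99, 51], dtype=float)
tip = np.array([5, 17, 11, 8, 14, 5], dtype=float)

# Mean-only model: predict the average tip ($10) for every meal
sse_mean = np.sum((tip - tip.mean()) ** 2)                 # 120.0

# Least-squares line: tip ≈ Θ0 + Θ1 * bill
theta1, theta0 = np.polyfit(bill, tip, deg=1)              # slope, intercept
sse_line = np.sum((tip - (theta0 + theta1 * bill)) ** 2)   # ≈ 30.075

print(f"SSE (mean only) = {sse_mean:.3f}")
print(f"SSE (best fit)  = {sse_line:.3f}")
```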
Simple Linear Regression
● One target variable and only one feature
● Follows the general form of a linear equation: h(x) = Θ0 + Θ1·x
‘Θ0’ is the intercept
‘Θ1’ is the slope of the line
● The fitted line is an estimate of the relationship in the population
Assumptions of Linear Regression
● The population line: yi=β0+β1xi+ϵi; E(Yi)=β0+β1xi
● E(Yi), at each value of xi, is a linear function of xi
● The errors ϵi are
○ Independent
○ Normally distributed
○ Of equal variance (denoted σ²)
Line of Best-Fit
● The best-fit line is the one with the lowest SSE
● SSE - the sum of squared residual errors: SSE = Σ(y - h(X))², where h(X) is the predicted value
● Squaring penalizes larger errors more heavily
Coefficient of Determination
SSR - "Regression sum of squares" = sum(Yh - Ymn)^2
SSE - "Error sum of squares" = sum(Y - Yh)^2
SSTO - "Total sum of squares" = SSR + SSE = sum(Y - Ymn)^2
R-squared = SSR/SSTO = 1 - (SSE/SSTO)
"R-squared×100 percent of the variation in y is 'explained by' the variation in
predictor x"
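As a quick worked check using the tip example above: SSTO = 120 (the SSE of the mean-only prediction) and SSE = 30.075 for the best-fit line, so R² = 1 - 30.075/120 ≈ 0.75; about 75% of the variation in tip amount is explained by the bill.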
The Cost Function
● The cost function is what we minimize to find the parameters
● The L2 norm (squared error) is the preferred cost function
● We use MSE (Mean Squared Error) as the cost function: MSE = (1/2m)·Σ(y - h(x))²
● MSE is the SSE averaged over the m data points
● Minimizing the SSE is the Least Squares Criterion
Normal Equation
● Derived by directly setting the gradient of the cost function to zero
● A simple equation, Θ = (XᵀX)⁻¹Xᵀy, but..
○ It is a closed-form solution
○ It performs well only when the number of features is small
○ The number of data points must always be greater than the number of variables
○ Better techniques are available when regularizing the model (see the sketch below)
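A minimal sketch of the normal equation (assuming NumPy), reusing the bill/tip numbers from the earlier example:

```python
# Sketch: closed-form normal equation, Θ = (XᵀX)⁻¹ Xᵀ y
import numpy as np

bill = np.array([34, 108, 64, 88, 99, 51], dtype=float)
tip = np.array([5, 17, 11, 8, 14, 5], dtype=float)

# Design matrix with a column of ones for the intercept Θ0
X = np.column_stack([np.ones_like(bill), bill])

# Solve (XᵀX)Θ = Xᵀy directly (np.linalg.solve avoids an explicit inverse)
theta = np.linalg.solve(X.T @ X, X.T @ tip)
print(theta)   # [Θ0, Θ1]: intercept and slope of the best-fit line
```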
Gradient Descent Algorithm
● Optimization is a big part of machine learning
● Gradient descent is a simple optimization procedure
● It finds the parameter values at the global minimum (the MSE cost is convex)
● Repeated update: Θ := Θ - α·∇J(Θ), where “alpha” (α) is the learning rate
Math behind GD
GDs Calculated (housing data, min-max standardized to Xs, Ys; current parameters a = 0.45, b = 0.75)

House size (X) | House price (Y) | Xs   | Ys   | Yh   | SSE    | -(Ys-Yh) | -(Ys-Yh)·Xs
1,100          | 199,000         | 0    | 0    | 0.45 | 0.2025 | 0.45     | 0
1,400          | 245,000         | 0.22 | 0.22 | 0.62 | 0.16   | 0.4      | 0.088
1,425          | 319,000         | 0.24 | 0.58 | 0.63 | 0.0025 | 0.05     | 0.012
1,550          | 240,000         | 0.33 | 0.2  | 0.7  | 0.25   | 0.5      | 0.165
1,600          | 312,000         | 0.37 | 0.55 | 0.73 | 0.0324 | 0.18     | 0.0666
1,700          | 279,000         | 0.44 | 0.39 | 0.78 | 0.1521 | 0.39     | 0.1716
1,700          | 310,000         | 0.44 | 0.54 | 0.78 | 0.0576 | 0.24     | 0.1056
1,875          | 308,000         | 0.57 | 0.53 | 0.88 | 0.1225 | 0.35     | 0.1995
2,350          | 405,000         | 0.93 | 1    | 1.14 | 0.0196 | 0.14     | 0.1302
2,450          | 324,000         | 1    | 0.61 | 1.2  | 0.3481 | 0.59     | 0.59

Column sums: SSE = 1.3473, -(Ys-Yh) = 3.300, -(Ys-Yh)·Xs = 1.545
MSE = 1.3473/(2·10) = 0.0673; dMSE/da = 3.300/10 = 0.330; dMSE/db = 1.545/10 = 0.154
Deep Dive
X = 1,400; Y = 245,000; a = 0.45; b = 0.75; m = total no. of data points = 10
Xs = (X - Xmin)/(Xmax - Xmin) = (1,400 - 1,100)/(2,450 - 1,100) = 0.22
Ys = (Y - Ymin)/(Ymax - Ymin) = (245 - 199)/(405 - 199) = 0.22   (prices in thousands)
Yh = a + b·Xs = 0.45 + 0.75·(0.22) = 0.62
SSEi = (Ys - Yh)² = (0.22 - 0.62)² = 0.16
Gradients: dMSE/da = -(Ys - Yh) = 0.4
dMSE/db = -(Ys - Yh)·Xs = 0.088
MSE = (1/2m)·Σ SSEi = 0.0673
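The numbers above can be reproduced with a short script; this is a sketch assuming NumPy, and the learning-rate value used in the final update step is an illustrative assumption, not from the slides:

```python
# Sketch: reproduce the table and Deep Dive numbers, then take one GD step
import numpy as np

size = np.array([1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450], dtype=float)
price = np.array([199000, 245000, 319000, 240000, 312000,
                  279000, 310000, 308000, 405000, 324000], dtype=float)

# Min-max scaling to [0, 1]
Xs = (size - size.min()) / (size.max() - size.min())
Ys = (price - price.min()) / (price.max() - price.min())

a, b, m = 0.45, 0.75, len(Xs)
Yh = a + b * Xs                                   # current predictions

mse = np.sum((Ys - Yh) ** 2) / (2 * m)            # ≈ 0.0673
grad_a = np.sum(-(Ys - Yh)) / m                   # dMSE/da ≈ 0.330
grad_b = np.sum(-(Ys - Yh) * Xs) / m              # dMSE/db ≈ 0.154

alpha = 0.1                                       # learning rate (assumed value)
a, b = a - alpha * grad_a, b - alpha * grad_b     # one gradient descent update
print(mse, grad_a, grad_b, a, b)
```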
Polynomial Regression
● Derives new features (powers of the original feature)
● Better at estimating values when the trend is nonlinear
● Predicts a curve rather than a simple straight line
● The fit is still linear in the derived feature space, e.g. a quadratic curve is a plane in the 2-D space (x, x²), so it is just a multiple regression (see the sketch below)
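A minimal polynomial-regression sketch, assuming scikit-learn; the degree and the synthetic data-generating curve below are illustrative assumptions:

```python
# Sketch: derive polynomial features, then fit an ordinary linear model on them
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(50, 1)), axis=0)
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(scale=0.5, size=50)  # nonlinear trend

# Still a linear model, but in the derived features [1, x, x^2]
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))   # prediction follows a curve, not a straight line
```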
Bias-Variance Tradeoff
The Bulls-Eye Diagram
Regularization
● Used to overcome the overfitting problem
● An overfitted model has high-variance estimates
● High-variance estimates are not good estimates
● Regularization trades a little bias for a large reduction in variance
● It works by limiting (shrinking) the parameters
● Different techniques exist to limit the parameters (L1 and L2 below)
L2 - Regularization
● Objective = RSS + α * (sum of squares of the coefficients)
○ α = 0: the objective becomes the same as simple linear regression
○ α = ∞: the coefficients will be zero
○ 0 < α < ∞: the coefficients lie somewhere between 0 and the simple-linear-regression values
● As the value of alpha increases, the model complexity reduces
● Though the coefficients become very, very small, they are NOT zero (see the Ridge sketch below)
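A minimal Ridge (L2) sketch, assuming scikit-learn; the synthetic data and alpha values are illustrative assumptions. It shows coefficients shrinking as alpha grows, without becoming exactly zero:

```python
# Sketch: Ridge (L2) coefficients shrink toward zero as alpha grows, but stay nonzero
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

print("alpha=0  :", LinearRegression().fit(X, y).coef_)   # plain least squares
for alpha in (1, 10, 100):
    print(f"alpha={alpha:<3}:", Ridge(alpha=alpha).fit(X, y).coef_)
```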
L1 - Regularization
● Objective = RSS + α * (sum of absolute values of the coefficients)
● For the same value of alpha, the lasso coefficients are much smaller than the ridge coefficients
● For the same alpha, lasso has a higher RSS (a poorer fit) than ridge regression
● Many coefficients are exactly zero even for very small values of alpha (see the Lasso sketch below)
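A matching Lasso (L1) sketch under the same assumptions (scikit-learn, synthetic data, illustrative alpha values); coefficients at or below the noise level are driven exactly to zero, and more follow as alpha grows:

```python
# Sketch: Lasso (L1) drives some coefficients exactly to zero (feature selection)
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

for alpha in (0.01, 0.1, 1.0):
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: {np.round(coef, 3)}  zero coefficients: {int(np.sum(coef == 0))}")
```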
L2 vs L1
● Key Differences
○ L2 Reg.: includes all (or none) of the features in the model
○ L1 Reg.: performs feature selection
● Typical Use Cases
○ L2 Reg.: majorly used to prevent overfitting
○ L1 Reg.: sparse solutions - modelling cases where the features are in the millions or more
● Presence of Highly Correlated Features
○ L2 Reg.: works well even in the presence of highly correlated features
○ L1 Reg.: arbitrarily selects any one feature among the highly correlated ones
Stochastic Gradient Descent
● A simple yet efficient approach for fitting linear models
● Supports out-of-core training
● Randomly select data and train the model on it
● Repeat the above step; the model keeps tuning (see the sketch below)
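A minimal out-of-core sketch, assuming scikit-learn's SGDRegressor; the mini-batch stream, batch size, and true coefficients below are illustrative assumptions:

```python
# Sketch: out-of-core training with partial_fit on a stream of random mini-batches
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
sgd = SGDRegressor(random_state=0)

for _ in range(200):                       # each pass sees a fresh random mini-batch
    X_batch = rng.normal(size=(32, 3))
    y_batch = X_batch @ np.array([1.5, -0.7, 2.0]) + rng.normal(scale=0.1, size=32)
    sgd.partial_fit(X_batch, y_batch)      # the model keeps tuning on every batch

print(sgd.coef_, sgd.intercept_)           # roughly recovers [1.5, -0.7, 2.0]
```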
Robust Regression
● Outliers have a serious impact on the estimation of the predictor
● Huber Regression vs Ridge Regression (see the sketch below)
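A minimal sketch contrasting the two, assuming scikit-learn; the synthetic data and the injected outliers are illustrative assumptions:

```python
# Sketch: Huber loss down-weights outliers; squared-error (Ridge) is pulled by them
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=50)   # true line: y = 2x + 1
y[:5] += 40                                                # inject a few large outliers

huber = HuberRegressor().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("Huber:", huber.coef_[0], huber.intercept_)   # typically stays near slope 2, intercept 1
print("Ridge:", ridge.coef_[0], ridge.intercept_)   # noticeably shifted by the outliers
```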
