zekeLabs
Linear Regression
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Agenda
● Deterministic vs Statistical Relations
● Introduction to Linear Regression
● Simple Linear Regression
● Model Evaluation
● Gradient Descent
● Polynomial Regression
● Bias and Variance
● Regularization
● Lasso Regression
● Ridge Regression
● Stochastic Gradient Descent
● Robust Regressors for data with outliers
Deterministic vs Statistical Relations
● Deterministic Relations
○ Data points fall exactly on the relation
○ The relation can be written as an exact formula
○ Example: converting Celsius to Fahrenheit
● Statistical Relations
○ Exhibit a trend, but not a perfect relation
○ The data also shows some scatter around the trend
○ Example: height vs weight
Introduction to Linear Regression
● The simplest and most widely used regression model
● Baseline prediction: the average of the target values
● Additional information (a feature) enables better predictions
● The improvement is measured by the residuals
● Goal: find the line of best fit
An Example
Predicting every tip by the mean tip, $10:

Meal | Tip Amount ($) | Residual | Residual Sq.
  1  |        5       |    -5    |      25
  2  |       17       |     7    |      49
  3  |       11       |     1    |       1
  4  |        8       |    -2    |       4
  5  |       14       |     4    |      16
  6  |        5       |    -5    |      25
     |                |   SSE    |     120
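A minimal sketch of the same computation in NumPy: predict every tip by the mean and sum the squared residuals.

```python
import numpy as np

# Tip amounts from the six meals above
tips = np.array([5, 17, 11, 8, 14, 5])

prediction = tips.mean()            # 10.0: the average tip
residuals = tips - prediction       # [-5, 7, 1, -2, 4, -5]
sse = np.sum(residuals ** 2)        # 120.0
print(prediction, sse)
```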
Better Prediction
Predicting the tip from the bill; the residuals below are consistent with the fitted line Tip ≈ -0.82 + 0.146 × Bill:

Bill ($) | Tip Amount ($) | Residual | Residual Sq.
   34    |        5       |  0.8495  |    0.7217
  108    |       17       |  2.0307  |    4.1237
   64    |       11       |  2.4635  |    6.0688
   88    |        8       | -4.0453  |   16.3645
   99    |       14       |  0.3465  |    0.1201
   51    |        5       | -1.6359  |    2.6762
         |                |   SSE    |   30.075
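The same fit can be reproduced with an ordinary least-squares polyfit; the printed coefficients are the ones implied by the residuals in the table.

```python
import numpy as np

bills = np.array([34, 108, 64, 88, 99, 51], dtype=float)
tips  = np.array([5, 17, 11, 8, 14, 5], dtype=float)

# Degree-1 least-squares fit: polyfit returns [theta1, theta0]
theta1, theta0 = np.polyfit(bills, tips, 1)
residuals = tips - (theta0 + theta1 * bills)
sse = np.sum(residuals ** 2)
print(theta0, theta1, sse)   # ≈ -0.82, 0.146, 30.075
```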
Simple Linear Regression
● One target variable and a single feature
● Follows the general form of a linear equation: y = Θ0 + Θ1·x
‘Θ0’ is the intercept
‘Θ1’ is the slope of the line
● The fitted line is an estimate, from sample data, of the population relationship
Assumptions of Linear Regression
● The population line: yi = β0 + β1·xi + εi, so E(Yi) = β0 + β1·xi
● E(Yi), at each value of xi, is a linear function of xi
● The errors εi are
○ Independent
○ Normally distributed
○ Of equal variance (denoted σ²)
Line of Best-Fit
● The best-fit line is the one with the smallest SSE
● SSE, the sum of squared residual errors: SSE = Σ(Y - h(X))², where h(X) is the predicted value
● Squaring penalizes larger errors more heavily
Coefficient of Determination
SSR - "Regression sum of squares" = sum(Yh - Ymn)^2
SSE - "Error sum of squares" = sum(Y - Yh)^2
SSTO - "Total sum of squares" = SSR + SSE = sum(Y - Ymn)^2
R-squared = SSR/SSTO = 1 - (SSE/SSTO)
"R-squared×100 percent of the variation in y is 'explained by' the variation in
predictor x"
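A short helper showing the R² identity; applied to the tip example above, 1 - 30.075/120 ≈ 0.749, so about 75% of the variation in tips is explained by the bill.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = SSR/SSTO = 1 - SSE/SSTO."""
    sse = np.sum((y - y_hat) ** 2)        # error sum of squares
    ssto = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1 - sse / ssto

# Tip example: SSTO = 120 (the mean-only model), SSE = 30.075 for the
# fitted line, so R^2 = 1 - 30.075/120 ≈ 0.749.
```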
The Cost Function
● The cost function is what we minimize to find the parameters
● The squared (L2) norm is the preferred cost function
● We use MSE (Mean Squared Error) as the cost function
● MSE is the SSE averaged over the data points (here with a convenient factor of 1/2m)
● Minimizing SSE is the Least Squares Criterion
Normal Equation
● Derived by setting the gradient of the cost directly to zero: θ = (XᵀX)⁻¹Xᵀy
● A simple equation, but...
○ A closed-form solution: no learning rate, no iterations
○ Performs well only when the number of features is small (inverting XᵀX is expensive)
○ The number of data points must always exceed the number of variables, or XᵀX is not invertible
○ Better techniques are available when regularizing the model
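A minimal NumPy sketch of the normal equation, reusing the bill/tip data; np.linalg.solve is used instead of an explicit inverse for numerical stability.

```python
import numpy as np

def normal_equation(X, y):
    # Prepend a column of ones so theta[0] is the intercept
    Xb = np.c_[np.ones(len(X)), X]
    # Solve (X^T X) theta = X^T y rather than forming the inverse
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

bills = np.array([34., 108., 64., 88., 99., 51.])
tips = np.array([5., 17., 11., 8., 14., 5.])
print(normal_equation(bills, tips))   # ≈ [-0.82, 0.146]
```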
Gradient Descent Algorithm
● Optimization is a big part of machine learning
● Gradient descent is a simple iterative optimization procedure
● For the convex MSE cost it converges to the global minimum
● Update rule: θ := θ - α·∇J(θ), where “alpha” is the learning rate controlling the step size
Math behind GD
Gradients Calculated
One pass over the housing data at a = 0.45, b = 0.75, with X and Y min-max scaled:

House size (X) | House price (Y) |  Xs  |  Ys  |  Yh  | SSE_i  | -(Ys-Yh) | -(Ys-Yh)·Xs
     1,100     |     199,000     |  0   |  0   | 0.45 | 0.2025 |   0.45   |   0
     1,400     |     245,000     | 0.22 | 0.22 | 0.62 | 0.16   |   0.4    |   0.088
     1,425     |     319,000     | 0.24 | 0.58 | 0.63 | 0.0025 |   0.05   |   0.012
     1,550     |     240,000     | 0.33 | 0.2  | 0.7  | 0.25   |   0.5    |   0.165
     1,600     |     312,000     | 0.37 | 0.55 | 0.73 | 0.0324 |   0.18   |   0.0666
     1,700     |     279,000     | 0.44 | 0.39 | 0.78 | 0.1521 |   0.39   |   0.1716
     1,700     |     310,000     | 0.44 | 0.54 | 0.78 | 0.0576 |   0.24   |   0.1056
     1,875     |     308,000     | 0.57 | 0.53 | 0.88 | 0.1225 |   0.35   |   0.1995
     2,350     |     405,000     | 0.93 | 1    | 1.14 | 0.0196 |   0.14   |   0.1302
     2,450     |     324,000     | 1    | 0.61 | 1.2  | 0.3481 |   0.59   |   0.59
                                  sum:          1.3473 |  3.300   |   1.545
                                  MSE = 0.0673; dMSE/da = 0.330; dMSE/db = 0.154

The last two columns are the per-point contributions to the gradients dMSE/da and dMSE/db; the bottom row averages them over the m = 10 points.
Deep Dive
X = 1,400; Y = 245,000; a = 0.45; b = 0.75; m = number of data points = 10
Xs = (X - Xmin)/(Xmax - Xmin) = (1,400 - 1,100)/(2,450 - 1,100) = 0.22
Ys = (Y - Ymin)/(Ymax - Ymin) = (245,000 - 199,000)/(405,000 - 199,000) = 0.22
Yh = a + b·Xs = 0.45 + 0.75 × 0.22 = 0.62
SSE_i = (Ys - Yh)² = (0.22 - 0.62)² = 0.16
Per-point gradient contributions:
dMSE/da: -(Ys - Yh) = 0.4
dMSE/db: -(Ys - Yh)·Xs = 0.088
MSE = (1/2m) × Σ SSE_i = 0.0673
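The same pass in NumPy, reproducing the numbers in the table; the learning rate alpha in the final update is an assumed value for illustration, not one from the slides.

```python
import numpy as np

size = np.array([1100, 1400, 1425, 1550, 1600, 1700,
                 1700, 1875, 2350, 2450], dtype=float)
price = np.array([199000, 245000, 319000, 240000, 312000,
                  279000, 310000, 308000, 405000, 324000], dtype=float)

# Min-max scale both columns to [0, 1], as in the table
Xs = (size - size.min()) / (size.max() - size.min())
Ys = (price - price.min()) / (price.max() - price.min())

a, b, m = 0.45, 0.75, len(Xs)
Yh = a + b * Xs
mse = np.sum((Ys - Yh) ** 2) / (2 * m)     # ≈ 0.0673
grad_a = np.mean(-(Ys - Yh))               # dMSE/da ≈ 0.330
grad_b = np.mean(-(Ys - Yh) * Xs)          # dMSE/db ≈ 0.154

alpha = 0.1                                # assumed learning rate
a, b = a - alpha * grad_a, b - alpha * grad_b   # one gradient-descent step
```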
Polynomial Regression
● Derives new features (powers of the original feature)
● Better at estimating values when the trend is nonlinear
● Predicts a curve rather than a straight line
● The model is still linear in its parameters: in the derived feature space it is just multiple linear regression (see the sketch below)
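A sketch using scikit-learn's PolynomialFeatures pipeline; the toy quadratic data here is assumed for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 1 + 2 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 0.3, 30)

# Degree-2 features [1, x, x^2] turn the curve fit into multiple linear regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[2.5]]))
```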
Bias-Variance Tradeoff
The Bulls-Eye Diagram
Regularization
● Used to overcome the overfitting problem
● An overfitted model has high-variance estimates
● High-variance estimates are not good estimates
● Regularization trades a little bias for lower variance
● Works by limiting (shrinking) the parameters
● Different techniques limit the parameters in different ways
L2 - Regularization
● Objective = RSS + α × (sum of squares of the coefficients)
○ α = 0: the objective becomes the same as simple linear regression
○ α = ∞: the coefficients are driven to zero
○ 0 < α < ∞: the coefficients lie between 0 and their simple-linear-regression values
● As the value of alpha increases, model complexity reduces
● Though the coefficients become very, very small, they are NOT exactly zero (illustrated below)
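A minimal ridge sketch with scikit-learn; the toy data (five features, two with true weight zero) is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(0, 0.1, 50)

# alpha is the regularization strength (the α in the objective above)
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)   # all coefficients shrunk toward zero, none exactly zero
```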
L1 - Regularization
● Objective = RSS + α × (sum of absolute values of the coefficients)
● For the same value of alpha, lasso coefficients are much smaller than ridge coefficients
● For the same alpha, lasso has a higher RSS (a poorer fit) than ridge regression
● Many coefficients become exactly zero, even for very small values of alpha (illustrated below)
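The lasso counterpart on the same kind of assumed toy data; with this alpha the uninformative coefficients typically collapse to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(0, 0.1, 50)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # the two zero-weight features are driven to exactly 0
```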
L2 vs L1
Aspect | L2 Reg. (Ridge) | L1 Reg. (Lasso)
Key Differences | Includes all (or none) of the features in the model | Performs feature selection
Typical Use Cases | Mainly used to prevent overfitting | Sparse solutions: modelling cases where the features number in the millions or more
Presence of Highly Correlated Features | Works well even in the presence of highly correlated features | Arbitrarily selects any one feature among the highly correlated ones
Stochastic Gradient Descent
● A simple yet efficient approach for fitting linear models
● Supports out-of-core training (data too large for memory)
● Randomly sample data, compute the gradient, and update the model
● Repeat the step above; the model keeps tuning as data streams in (see the sketch below)
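A sketch of out-of-core training with scikit-learn's SGDRegressor; the random mini-batches here stand in for chunks read from disk.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
sgd = SGDRegressor(learning_rate="constant", eta0=0.01)

# Out-of-core: feed the model one mini-batch at a time via partial_fit
for _ in range(200):
    Xb = rng.normal(size=(32, 3))        # stand-in for a chunk read from disk
    yb = Xb @ np.array([1.0, -2.0, 0.5])
    sgd.partial_fit(Xb, yb)

print(sgd.coef_)   # approaches [1.0, -2.0, 0.5] as batches stream in
```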
Robust Regression
● Outliers can have a serious impact on the estimated predictor
● Compare Huber Regression (robust loss) with Ridge Regression (squared loss), sketched below
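A sketch contrasting the two on data with injected outliers; the toy data and outlier offsets are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, 50)
y[:5] += 40                               # inject a few large outliers

huber = HuberRegressor().fit(X, y)        # down-weights the outliers
ridge = Ridge(alpha=1.0).fit(X, y)        # squared loss, pulled by the outliers
print(huber.coef_, ridge.coef_)           # Huber stays near the true slope of 2
```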