Lesson 14
Machine Learning – Linear Regression Assumptions
Kush Kulshrestha
Assumptions of Regression Analysis
Regression analysis rests on five assumptions:
1. There should be a linear relationship between the dependent variable (Y) and the independent variables (X). This
means that the change in Y due to a unit change in X should not depend on the value of X.
2. The residual terms should not be correlated with one another. Correlation among the residuals is called
autocorrelation.
3. The independent variables (Xs) should not be highly correlated with each other. If they are, we end up counting
the same effect multiple times. This is called multicollinearity.
4. The error term must have constant variance. This property is called homoscedasticity. Non-constant variance is
called heteroscedasticity.
5. The errors must be normally distributed.
1. Existence of Linear Relationship
There should exist a linear relationship between the dependent variable (Y) and independent variables (X). This
means that change in Y due to unit change in X should be independent of the value of X.
If this is not true, we end up fitting a linear model to nonlinear data, which results in a poorly fitting model.
How to fix:
Consider applying a nonlinear transformation to the dependent and/or independent variables if you can think of a
transformation that seems appropriate.
Another possibility to consider is adding another regressor that is a nonlinear function of one of the other variables.
For example, if you have regressed Y on X, and the graph of residuals versus predicted values suggests a parabolic
curve, then it may make sense to regress Y on both X and X^2 (i.e., X-squared). Adding a squared term is possible
even when X and/or Y have negative values, whereas a log transform is not. Higher-order terms of this kind (cubic,
etc.) might also be considered in some cases, but are generally not preferred.
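The idea of adding X^2 as a second regressor can be sketched as follows. This is a minimal illustration using only the Python standard library, not the method used in the lesson: the data are simulated, and the helper names (`solve`, `ols`, `sse`) and all coefficient values are assumptions made for the example. It fits Y on X alone and then on X and X^2, and compares the sum of squared residuals.

```python
import random

def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares coefficients for y regressed on the given columns (normal equations)."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * t for p, t in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

def sse(columns, y, beta):
    """Sum of squared residuals for the fitted coefficients."""
    total = 0.0
    for i, target in enumerate(y):
        predicted = sum(b * col[i] for b, col in zip(beta, columns))
        total += (target - predicted) ** 2
    return total

# Simulated data (assumed for illustration): a parabolic relationship plus noise.
# X takes negative values, so a log transform of X would not be available here.
random.seed(0)
x = [i / 10 - 3 for i in range(61)]
y = [1.0 + 2.0 * v + 1.5 * v * v + random.gauss(0, 0.5) for v in x]

ones = [1.0] * len(x)          # intercept column
x_sq = [v * v for v in x]      # the added nonlinear regressor

beta_lin = ols([ones, x], y)
beta_quad = ols([ones, x, x_sq], y)
sse_lin = sse([ones, x], y, beta_lin)
sse_quad = sse([ones, x, x_sq], y, beta_quad)

print("SSE, Y on X only:   ", round(sse_lin, 2))
print("SSE, Y on X and X^2:", round(sse_quad, 2))
print("Estimated X^2 coefficient:", round(beta_quad[2], 3))
```

On data like this the straight-line fit leaves a large, curved residual pattern, while the model with the squared term should recover a coefficient near the simulated value of 1.5.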
2. Correlation in the Residuals
This assumption mostly applies to time series regression models.
Serial correlation in the errors, that is, correlation between consecutive error terms, means that there is room
for improvement in the model. It can be checked with ACF (autocorrelation function) and PACF (partial
autocorrelation function) plots of the residuals.
In general, after fitting a time series model the residuals should be white noise, so they should show no
autocorrelation.
This can be tested using the Durbin-Watson test.
The Durbin-Watson statistic provides a test for significant residual autocorrelation at lag 1: the DW stat is
approximately equal to 2(1-a), where a is the lag-1 residual autocorrelation, so ideally it should be close to 2.0,
say between 1.4 and 2.6 for a sample size of 50.
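The relationship DW ≈ 2(1-a) can be checked numerically. The sketch below is an illustration under stated assumptions: the two series are simulated stand-ins for regression residuals (one white noise, one an AR(1) process with an assumed coefficient of 0.8), the residuals are treated as having zero mean, and the function names are chosen for this example.

```python
import random

def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def lag1_autocorr(resid):
    """Lag-1 autocorrelation of residuals that are assumed to have zero mean."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

random.seed(1)

# Stand-in residuals with no serial correlation.
white = [random.gauss(0, 1) for _ in range(500)]

# Stand-in residuals with strong positive serial correlation: e_t = 0.8 * e_{t-1} + noise.
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.8 * ar1[-1] + random.gauss(0, 1))

dw_white, a_white = durbin_watson(white), lag1_autocorr(white)
dw_ar1, a_ar1 = durbin_watson(ar1), lag1_autocorr(ar1)

for name, dw, a in [("white noise", dw_white, a_white), ("AR(1)", dw_ar1, a_ar1)]:
    print(f"{name}: DW = {dw:.3f}, 2(1-a) = {2 * (1 - a):.3f}")
```

For the uncorrelated series the statistic should land near 2, while positive serial correlation pulls it well below 2.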
3. Multicollinearity
Multicollinearity refers to the condition in which the independent (input) variables are highly correlated with each
other.
Check the correlation coefficients for all pairs of inputs to see whether multicollinearity is present in the
dataset.
According to Tabachnick & Fidell (1996), independent variables with a bivariate correlation of more than .70 should
not be included in multiple regression analysis.
If you do find high collinearity, it means that your parameter estimates are unstable. That is, small changes
(sometimes in the 4th significant figure) in your data can cause big changes in your parameter estimates
(sometimes even reversing their sign). This is a bad thing.
High correlations alone do not guarantee that multicollinearity is present. To confirm it, you have to check the
partial correlations between the input variables. A partial correlation above 0.4 together with a high correlation
indicates the presence of multicollinearity.
If two variables show multicollinearity, one of them should be excluded from the analysis, because it adds no
extra information and makes the regression coefficients unstable.
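The two checks described above, pairwise correlations against the .70 cutoff and a partial correlation against 0.4, can be sketched as follows. The three input variables are simulated, and the names `x1`, `x2`, `x3`, `pearson`, and `partial_corr` are assumptions for this example. The partial correlation uses the standard first-order formula for two variables given a third.

```python
import random
from itertools import combinations

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b after removing the linear effect of c."""
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5

# Simulated inputs (assumed for illustration): x2 is nearly a copy of x1, x3 is unrelated.
random.seed(2)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.3) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
features = {"x1": x1, "x2": x2, "x3": x3}

CORR_CUTOFF = 0.70      # bivariate cutoff quoted in the text
PARTIAL_CUTOFF = 0.4    # partial-correlation cutoff quoted in the text

flagged = []
for (name_a, a), (name_b, b) in combinations(features.items(), 2):
    r = pearson(a, b)
    print(f"corr({name_a}, {name_b}) = {r:.3f}")
    if abs(r) > CORR_CUTOFF:
        flagged.append((name_a, name_b))

# Confirm the flagged pair with the partial correlation, controlling for x3.
r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
partial_12 = partial_corr(r12, r13, r23)
print("flagged pairs:", flagged)
print(f"partial corr(x1, x2 | x3) = {partial_12:.3f}")
```

Here only the pair (`x1`, `x2`) should be flagged, and its partial correlation should also exceed the 0.4 cutoff, so one of the two would be dropped.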
4. Homoscedasticity
Violations of homoscedasticity (which are called "heteroscedasticity") make it difficult to gauge the true standard
deviation of the forecast errors, usually resulting in confidence intervals that are too wide or too narrow. In
particular, if the variance of the errors is increasing over time, confidence intervals for out-of-sample predictions
will tend to be unrealistically narrow.
Heteroscedasticity may also have the effect of giving too much weight to a small subset of the data (namely the
subset where the error variance was largest) when estimating coefficients.
To diagnose: look at a plot of residuals versus predicted values. Be alert for evidence of residuals that grow larger
either as a function of time or as a function of the predicted value.
To fix: If the dependent variable is strictly positive and if the residual-versus-predicted plot shows that the size of
the errors is proportional to the size of the predictions, a log transformation applied to the dependent variable may
be appropriate.
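The diagnosis and the log-transform fix can be illustrated numerically instead of with a plot. The sketch below rests on assumptions: the data are simulated with multiplicative errors (so Y is strictly positive and the error size grows with the prediction), and `spread_ratio` is a hypothetical helper that compares the residual standard deviation in the upper half of the predicted values with that in the lower half. A ratio near 1 is what constant variance would look like.

```python
import math
import random

def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

def spread_ratio(x, y):
    """Residual std. dev. for the larger half of predictions divided by the smaller half."""
    intercept, slope = fit_line(x, y)
    # Pair each prediction with its residual, then sort by predicted value.
    pairs = sorted((intercept + slope * a, b - (intercept + slope * a))
                   for a, b in zip(x, y))
    half = len(pairs) // 2
    low = std_dev([resid for _, resid in pairs[:half]])
    high = std_dev([resid for _, resid in pairs[half:]])
    return high / low

# Simulated data (assumed): Y = exp(0.5 + 0.3 X + error), so errors are multiplicative.
random.seed(3)
x = [random.uniform(0, 10) for _ in range(400)]
y = [math.exp(0.5 + 0.3 * a + random.gauss(0, 0.2)) for a in x]

raw_ratio = spread_ratio(x, y)
log_ratio = spread_ratio(x, [math.log(b) for b in y])

print(f"spread ratio, Y regressed on X:      {raw_ratio:.2f}")
print(f"spread ratio, log(Y) regressed on X: {log_ratio:.2f}")
```

In this simulated case the log transform also straightens the trend, so the raw-scale ratio reflects both the growing error variance and the curvature of the untransformed relationship.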
5. Normality of Residuals
Violations of normality create problems for determining whether model coefficients are significantly different from
zero and for calculating confidence intervals for forecasts. Calculation of confidence intervals and various
significance tests for coefficients are all based on the assumptions of normally distributed errors. If the error
distribution is significantly non-normal, confidence intervals may be too wide or too narrow.
The best test for normally distributed errors is a normal probability plot or normal quantile plot of the residuals. If
the distribution is normal, the points on such a plot should fall close to the diagonal reference line.
There are also a variety of statistical tests for normality. The Jarque-Bera test is the one we have used.
Here is an example of a bad-looking and a good-looking normal quantile plot:
[Figure: two normal quantile plots, not reproduced here]
In such cases, a nonlinear transformation of variables might cure both problems. In the case of the two normal
quantile plots above, the second model was obtained by applying a natural log transformation to the variables in the
first one.
The dependent and independent variables in a regression model do not need to be normally distributed by
themselves; only the prediction errors need to be normally distributed.
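The Jarque-Bera statistic mentioned above combines the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). Under normality it is approximately chi-square distributed with 2 degrees of freedom, so values above about 5.99 reject normality at the 5% level. The sketch below is an illustration on simulated errors, not the lesson's own data: `jarque_bera` is written out by hand for this example, and the log-normal sample is an assumed stand-in for a skewed error distribution that a natural log transformation repairs.

```python
import math
import random

def jarque_bera(values):
    """Jarque-Bera statistic computed from the sample skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

CRITICAL_5PCT = 5.99  # 95th percentile of the chi-square distribution with 2 d.o.f.

random.seed(4)
normal_errors = [random.gauss(0, 1) for _ in range(1000)]
skewed_errors = [math.exp(e) for e in normal_errors]   # log-normal, strongly right-skewed
logged_errors = [math.log(v) for v in skewed_errors]   # natural log undoes the skew

jb_normal = jarque_bera(normal_errors)
jb_skewed = jarque_bera(skewed_errors)
jb_logged = jarque_bera(logged_errors)

for name, jb in [("normal", jb_normal), ("skewed", jb_skewed), ("after log", jb_logged)]:
    verdict = "reject normality" if jb > CRITICAL_5PCT else "no evidence against normality"
    print(f"{name}: JB = {jb:.2f} ({verdict})")
```

A formal test like this complements the normal quantile plot and does not replace it, since the plot also shows where the distribution departs from normality.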
