Hutcheson, G. D. (2011). Ordinary Least-Squares Regression. In L. Moutinho and G. D. 
Hutcheson, The SAGE Dictionary of Quantitative Management Research. Pages 224-228. 
Ordinary Least-Squares Regression 
Introduction 
Ordinary least-squares (OLS) regression is a generalized linear modelling technique that may be used to 
model a single response variable which has been recorded on at least an interval scale. The technique may 
be applied to single or multiple explanatory variables and also categorical explanatory variables that have 
been appropriately coded. 
Key Features 
At a very basic level, the relationship between a continuous response variable (Y) and a continuous explanatory variable (X) may be represented using a line of best fit, where Y is predicted, at least to some extent, by X. If this relationship is linear, it may be appropriately represented mathematically using the straight-line equation 'Y = α + βx', as shown in Figure 1 (this line was computed using the least-squares procedure; see Ryan, 1997).
The relationship between variables Y and X is described using the equation of the line of best fit, with α indicating the value of Y when X is equal to zero (also known as the intercept) and β indicating the slope of the line (also known as the regression coefficient). The regression coefficient β describes the change in Y that is associated with a unit change in X. As can be seen from Figure 1, β only provides an indication of the average expected change (the observed data are scattered around the line), making it important to also interpret the confidence intervals for the estimate (the large-sample 95% two-tailed approximation of the confidence interval can be calculated as β ± 1.96 s.e. β).
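These quantities can be sketched numerically. The snippet below fits the line and the large-sample interval for β from first principles on a small invented dataset (in practice a statistics package would report these estimates directly):

```python
import numpy as np

def ols_line(x, y):
    """Fit Y = alpha + beta*X by least squares and return the estimates
    together with a large-sample 95% confidence interval for beta."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    alpha = y.mean() - beta * x.mean()
    resid = y - (alpha + beta * x)                 # deviations from the line
    s2 = np.sum(resid ** 2) / (n - 2)              # residual variance
    se_beta = np.sqrt(s2 / sxx)                    # standard error of beta
    ci = (beta - 1.96 * se_beta, beta + 1.96 * se_beta)
    return alpha, beta, se_beta, ci

# Invented data lying exactly on Y = 2 + 3X, so beta = 3 and alpha = 2.
alpha, beta, se_beta, ci = ols_line([1, 2, 3, 4, 5], [5, 8, 11, 14, 17])
```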
In addition to the model parameters and confidence intervals for β, it is useful to also have an indication of 
how well the model fits the data. Model fit can be determined by comparing the observed scores of Y (the 
values of Y from the sample of data) with the expected values of Y (the values of Y predicted by the 
regression equation). The difference between these two values (the deviation, or residual as it is also called) 
provides an indication of how well the model predicts each data point. Adding up the deviations for all the
data points after they have been squared (squaring removes the sign, so negative and positive deviations do not cancel) provides a simple
measure of the degree to which the data deviates from the model overall. The sum of all the squared 
residuals is known as the residual sum of squares (RSS) and provides a measure of model-fit for an OLS 
regression model. A poorly fitting model will deviate markedly from the data and will consequently have a 
relatively large RSS, whereas a good-fitting model will not deviate markedly from the data and will 
consequently have a relatively small RSS (a perfectly fitting model will have an RSS equal to zero, as there 
will be no deviation between observed and expected values of Y). It is important to understand how the RSS 
statistic (or the deviance, as it is also known; see Agresti, 1996, pages 96-97) operates, as it is used to
determine the significance of individual and groups of variables in a regression model. A graphical 
illustration of the residuals for a simple regression model is provided in Figure 2. Detailed examples of 
calculating deviances from residuals for null and simple regression models can be found in Hutcheson and 
Moutinho, 2008.
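As a toy numerical illustration of this calculation (the observed and predicted values below are invented, not taken from the chapter's data):

```python
import numpy as np

# Hypothetical observed Y and model-predicted Y for five data points.
y_obs = np.array([3.1, 4.0, 5.2, 6.1, 6.8])
y_hat = np.array([3.0, 4.1, 5.0, 6.0, 7.0])

resid = y_obs - y_hat        # deviations (residuals): observed minus expected
rss = np.sum(resid ** 2)     # residual sum of squares (the deviance)
```

A perfectly fitting model would give `rss` of zero; larger values indicate a poorer fit.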
The deviance is an important statistic as it enables the contribution made by explanatory variables to the prediction of the response variable to be determined. If, by adding a variable to the model, the deviance is greatly reduced, the added variable can be said to have had a large effect on the prediction of Y for that model. If, on the other hand, the deviance is not greatly reduced, the added variable can be said to have had a small effect on the prediction of Y for that model. The change in the deviance that results from the explanatory variable being added to the model is used to determine the significance of that variable's effect on the prediction of Y in that model. To assess the effect that a single explanatory variable has on the prediction of Y, one simply compares the deviance statistics before and after the variable has been added to the model. For a simple OLS regression model, the effect of the explanatory variable can be assessed by comparing the RSS statistic for the full regression model (Y = α + βx) with that for the null model (Y = α). The difference in deviance between the nested models can then be tested for significance using an F-test computed from the following equation.
F(df_p − df_{p+q}, df_{p+q}) = [(RSS_p − RSS_{p+q}) / (df_p − df_{p+q})] / [RSS_{p+q} / df_{p+q}]
where p represents the null model, Y = α, p+q represents the model Y = α + βx, and df are the degrees of 
freedom associated with the designated model. It can be seen from this equation that the F-statistic is simply 
based on the difference in the deviances between the two models as a fraction of the deviance of the full 
model, whilst taking account of the number of parameters. 
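The equation above can be sketched directly as a small function (a minimal sketch; the function name is invented, and in practice a statistics package computes this along with the p-value):

```python
def f_change(rss_p, df_p, rss_pq, df_pq):
    """F-statistic for the change in deviance between nested models:
    p is the smaller (e.g. null) model, p+q the larger model."""
    numerator = (rss_p - rss_pq) / (df_p - df_pq)   # deviance reduction per df
    denominator = rss_pq / df_pq                    # deviance of the full model per df
    return numerator / denominator

# Deviance values from the ice cream example later in the chapter:
f_stat = f_change(0.1255, 29, 0.0500, 28)           # about 42.28
```

The p-value can then be obtained from the F distribution with (df_p − df_{p+q}, df_{p+q}) degrees of freedom, for example via `scipy.stats.f.sf`.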
In addition to the model-fit statistics, the R-square statistic is also commonly quoted and provides a 
measure that indicates the percentage of variation in the response variable that is 'explained' by the model.
R-square, which is also known as the coefficient of multiple determination, is defined as 
R² = (RSS_null − RSS_model) / RSS_null
and basically gives the percentage of the deviance in the response variable that can be accounted for by 
adding the explanatory variable into the model. Although R-square is widely used, it will always increase as 
variables are added to the model (the deviance can only go down when additional variables are added to a 
model). One solution to this problem is to calculate an adjusted R-square statistic (R²a) which takes into
account the number of terms entered into the model and does not necessarily increase as more terms are
added. Adjusted R-square can be derived using the following equation
R²a = R² − [ k(1 − R²) / (n − k − 1) ]
where n is the number of cases used to construct the model and k is the number of terms in the model (not 
including the constant). 
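These two formulas can be sketched directly. The values below use the deviances quoted for the ice cream example, with n = 30 inferred from the null model's 29 degrees of freedom (function names are invented for illustration):

```python
def r_squared(rss_null, rss_model):
    """Proportion of the null model's deviance removed by the regression."""
    return (rss_null - rss_model) / rss_null

def adjusted_r_squared(r2, n, k):
    """Adjusted R-square: penalises R-square for the k terms in the model
    (not counting the constant), fitted on n cases."""
    return r2 - (k * (1 - r2)) / (n - k - 1)

r2 = r_squared(0.1255, 0.0500)                 # about 0.60
r2_adj = adjusted_r_squared(r2, n=30, k=1)     # slightly smaller than r2
```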
An example of simple OLS regression 
A simple OLS regression model with a single explanatory variable can be illustrated using the example of 
predicting ice cream sales given outdoor temperature (Koteswara, 1970). The model for this relationship
(calculated using software) is 
Ice cream consumption = 0.207 + 0.003 temperature. 
The parameter for α (0.207) indicates the predicted consumption when temperature is equal to zero. It 
should be noted that although the parameter α is required to make predictions of ice cream consumption at 
any given temperature, the prediction of consumption at a temperature of zero might be of limited 
usefulness, particularly when the observed data does not include a temperature of zero in its range
(predictions should only be made within the limits of the sampled values). The parameter β indicates that for 
each unit increase in temperature, ice cream consumption increases by 0.003 units. The significance of the 
relationship between temperature and ice cream consumption can be estimated by comparing the deviance 
statistics for the two nested models in the table below; one that includes temperature and one that does not. 
This difference in deviance can be assessed for significance using the F-statistic. 
Model                              deviance (RSS)   df   change in deviance   F-statistic   P-value
consumption = α                    0.1255           29
consumption = α + β temperature    0.0500           28   0.0755               42.28         <0.0001
On the basis of this analysis, outdoor temperature would appear to be significantly related to ice cream 
consumption with each unit increase in temperature being associated with an increase of 0.003 units in ice 
cream consumption. Using these statistics it is a simple matter to also compute the R-square statistic for this 
model, which is 0.0755/0.1255, or 0.60. Temperature “explains” 60% of the deviance in ice cream 
consumption (i.e., when temperature is added to the model, the deviance in the Y variable is reduced by 
60%). 
OLS regression with multiple explanatory variables 
The OLS regression model can be extended to include multiple explanatory variables by simply adding 
additional variables to the equation. The form of the model is the same as above with a single response 
variable (Y), but this time Y is predicted by multiple explanatory variables (X1 to X3). 
Y = α + β1X1 + β2X2 + β3X3 
The interpretation of the parameters (α and β) from the above model is basically the same as for the simple 
regression model above, but the relationship cannot now be graphed on a single scatter plot. α indicates the 
value of Y when all values of the explanatory variables are zero. Each β parameter indicates the average
change in Y that is associated with a unit change in X, whilst controlling for the other explanatory variables 
in the model. Model-fit can be assessed through comparing deviance measures of nested models. For 
example, the effect of variable X3 on Y in the model above can be calculated by comparing the nested models 
Y = α + β1X1 + β2X2 + β3X3 
Y = α + β1X1 + β2X2 
The change in deviance between these models indicates the effect that X3 has on the prediction of Y when the 
effects of X1 and X2 have been accounted for (it is, therefore, the unique effect that X3 has on Y after taking 
into account X1 and X2). The overall effect of all three explanatory variables on Y can be assessed by 
comparing the models 
Y = α + β1X1 + β2X2 + β3X3 
Y = α. 
The significance of the change in the deviance scores can be assessed through the calculation of the F-statistic 
using the equation provided above (these are, however, provided as a matter of course by most 
software packages). As with the simple OLS regression, it is a simple matter to compute the R-square 
statistics.
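This nested-model comparison can be sketched with simulated data (the dataset and variable roles below are invented for illustration, not the chapter's ice cream data; a statistics package would report these tests automatically):

```python
import numpy as np

def rss_of(X, y):
    """RSS of an OLS fit of y on the columns of X, with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return float(resid @ resid)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                      # columns play X1, X2, X3
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=40)

rss_full = rss_of(X, y)          # Y = α + β1X1 + β2X2 + β3X3
rss_red = rss_of(X[:, :2], y)    # Y = α + β1X1 + β2X2

# Unique effect of X3: change in deviance over the full model's deviance per df.
f_x3 = ((rss_red - rss_full) / 1) / (rss_full / (40 - 3 - 1))
```

Because X3 plays no part in generating y here, the change in deviance (and hence the F-statistic) should be small; repeating the comparison against the null model Y = α assesses all three variables jointly.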
An example of multiple OLS regression 
A multiple OLS regression model with three explanatory variables can be illustrated using the example from 
the simple regression model given above. In this example, the price of the ice cream and the average income 
of the neighbourhood are also entered into the model. This model is calculated as 
Ice cream consumption = 0.197 – 1.044 price + 0.033 income + 0.003 temperature. 
The parameter for α (0.197) indicates the predicted consumption when all explanatory variables are equal to 
zero. The β parameters indicate the average change in consumption that is associated with each unit increase 
in the explanatory variable. For example, for each unit increase in price, consumption goes down by 1.044 
units. The significance of the relationship between each explanatory variable and ice cream consumption can 
be estimated by comparing the deviance statistics for nested models. The table below shows the significance 
of each of the explanatory variables (shown by the change in deviance when that variable is removed from 
the model) in a form typically used by software (when only one parameter is assessed, the F-statistic is
equivalent to the square of the t-statistic (F = t²), which is often quoted in statistical output).
               change in deviance   df   F-statistic        P-value
price          0.002                1    F(1,26) = 1.567    0.222
income         0.011                1    F(1,26) = 7.973    0.009
temperature    0.082                1    F(1,26) = 60.252   <0.0001
residuals      0.035                26
Within the range of the data collected in this study, temperature and income appear to be significantly related 
to ice cream consumption. 
Conclusion 
OLS regression is one of the major techniques used to analyse data and forms the basis of many other
techniques (for example, ANOVA and generalized linear models; see Rutherford, 2001). The usefulness
of the technique can be greatly extended with the use of dummy variable coding to include grouped 
explanatory variables (see Hutcheson and Moutinho, 2008, for a discussion of the analysis of experimental 
designs using regression) and data transformation methods (see, for example, Fox, 2002). OLS regression is 
particularly powerful as it is relatively easy to check model assumptions such as linearity, constant
variance and the effect of outliers using simple graphical methods (see Hutcheson and Sofroniou, 1999).
Further Reading 
Agresti, A. (1996). An Introduction to Categorical Data Analysis. John Wiley and Sons, Inc. 
Fox, J. (2002). An R and S-Plus Companion to Applied Regression. London: Sage Publications. 
Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modeling for Management. Sage Publications. 
Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist. London: Sage Publications. 
Koteswara, R. K. (1970). Testing for the Independence of Regression Disturbances. Econometrica, 38: 97-117.
Rutherford, A. (2001). Introducing ANOVA and ANCOVA: a GLM approach. London: Sage Publications. 
Ryan, T. P. (1997). Modern Regression Methods. Chichester: John Wiley and Sons. 
Graeme Hutcheson 
Manchester University

More Related Content

What's hot

Econometrics ch3
Econometrics ch3Econometrics ch3
Econometrics ch3
Baterdene Batchuluun
 
Heteroscedasticity | Eonomics
Heteroscedasticity | EonomicsHeteroscedasticity | Eonomics
Heteroscedasticity | Eonomics
Transweb Global Inc
 
multiple linear regression in spss (procedure and output)
multiple linear regression in spss (procedure and output)multiple linear regression in spss (procedure and output)
multiple linear regression in spss (procedure and output)
Unexplord Solutions LLP
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
GunjanKhandelwal13
 
Binomial probability distributions
Binomial probability distributions  Binomial probability distributions
Binomial probability distributions
Long Beach City College
 
DUMMY VARIABLE REGRESSION MODEL
DUMMY VARIABLE REGRESSION MODELDUMMY VARIABLE REGRESSION MODEL
DUMMY VARIABLE REGRESSION MODEL
Arshad Ahmed Saeed
 
Basic concepts of_econometrics
Basic concepts of_econometricsBasic concepts of_econometrics
Basic concepts of_econometrics
SwapnaJahan
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysis
Mahak Vijayvargiya
 
6.5 central limit
6.5 central limit6.5 central limit
6.5 central limitleblance
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
Sohag Babu
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
Shameer P Hamsa
 
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
Muhammad Ali
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
Sabbir Tahmidur Rahman
 
SOME PROPERTIES OF ESTIMATORS - 552.ppt
SOME PROPERTIES OF ESTIMATORS - 552.pptSOME PROPERTIES OF ESTIMATORS - 552.ppt
SOME PROPERTIES OF ESTIMATORS - 552.ppt
dayashka1
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
Srikant001p
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
DevendraRavindraPati
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
Teachers Mitraa
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
Seth Anandaram Jaipuria College
 

What's hot (20)

Econometrics ch3
Econometrics ch3Econometrics ch3
Econometrics ch3
 
Heteroscedasticity | Eonomics
Heteroscedasticity | EonomicsHeteroscedasticity | Eonomics
Heteroscedasticity | Eonomics
 
multiple linear regression in spss (procedure and output)
multiple linear regression in spss (procedure and output)multiple linear regression in spss (procedure and output)
multiple linear regression in spss (procedure and output)
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
 
Binomial probability distributions
Binomial probability distributions  Binomial probability distributions
Binomial probability distributions
 
DUMMY VARIABLE REGRESSION MODEL
DUMMY VARIABLE REGRESSION MODELDUMMY VARIABLE REGRESSION MODEL
DUMMY VARIABLE REGRESSION MODEL
 
Basic concepts of_econometrics
Basic concepts of_econometricsBasic concepts of_econometrics
Basic concepts of_econometrics
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysis
 
6.5 central limit
6.5 central limit6.5 central limit
6.5 central limit
 
Panel slides
Panel slidesPanel slides
Panel slides
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
 
SOME PROPERTIES OF ESTIMATORS - 552.ppt
SOME PROPERTIES OF ESTIMATORS - 552.pptSOME PROPERTIES OF ESTIMATORS - 552.ppt
SOME PROPERTIES OF ESTIMATORS - 552.ppt
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
 

Similar to OLS chapter

Simple Regression.pptx
Simple Regression.pptxSimple Regression.pptx
Simple Regression.pptx
Victoria Bozhenko
 
Regression -Linear.pptx
Regression -Linear.pptxRegression -Linear.pptx
Regression -Linear.pptx
Gauravchaudhary214677
 
multiple Regression
multiple Regressionmultiple Regression
multiple Regression
Anniqah
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
Smarten Augmented Analytics
 
Ders 2 ols .ppt
Ders 2 ols .pptDers 2 ols .ppt
Ders 2 ols .ppt
Ergin Akalpler
 
Multiple Regression
Multiple RegressionMultiple Regression
Multiple Regression
Nicholas Manurung
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
Smarten Augmented Analytics
 
Quantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA ProgramQuantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA Program
Mohamed Farouk, CFA, CFTe I
 
Lecture - 8 MLR.pptx
Lecture - 8 MLR.pptxLecture - 8 MLR.pptx
Lecture - 8 MLR.pptx
iris765749
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Derek Kane
 
Diagnostic methods for Building the regression model
Diagnostic methods for Building the regression modelDiagnostic methods for Building the regression model
Diagnostic methods for Building the regression model
Mehdi Shayegani
 
Regression
RegressionRegression
Measuring Robustness on Generalized Gaussian Distribution
Measuring Robustness on Generalized Gaussian DistributionMeasuring Robustness on Generalized Gaussian Distribution
Measuring Robustness on Generalized Gaussian Distribution
IJERA Editor
 
Risi ottavio
Risi ottavioRisi ottavio
Risi ottavio
OttavioRisi
 
ML Module 3.pdf
ML Module 3.pdfML Module 3.pdf
ML Module 3.pdf
Shiwani Gupta
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
nszakir
 
Regression
RegressionRegression
Regression
nandini patil
 
manecohuhuhuhubasicEstimation-1.pptx
manecohuhuhuhubasicEstimation-1.pptxmanecohuhuhuhubasicEstimation-1.pptx
manecohuhuhuhubasicEstimation-1.pptx
asdfg hjkl
 
Regression with Time Series Data
Regression with Time Series DataRegression with Time Series Data
Regression with Time Series Data
Rizano Ahdiat R
 

Similar to OLS chapter (20)

Simple Regression.pptx
Simple Regression.pptxSimple Regression.pptx
Simple Regression.pptx
 
Regression -Linear.pptx
Regression -Linear.pptxRegression -Linear.pptx
Regression -Linear.pptx
 
multiple Regression
multiple Regressionmultiple Regression
multiple Regression
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
 
Chapter13
Chapter13Chapter13
Chapter13
 
Ders 2 ols .ppt
Ders 2 ols .pptDers 2 ols .ppt
Ders 2 ols .ppt
 
Multiple Regression
Multiple RegressionMultiple Regression
Multiple Regression
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
 
Quantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA ProgramQuantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA Program
 
Lecture - 8 MLR.pptx
Lecture - 8 MLR.pptxLecture - 8 MLR.pptx
Lecture - 8 MLR.pptx
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
 
Diagnostic methods for Building the regression model
Diagnostic methods for Building the regression modelDiagnostic methods for Building the regression model
Diagnostic methods for Building the regression model
 
Regression
RegressionRegression
Regression
 
Measuring Robustness on Generalized Gaussian Distribution
Measuring Robustness on Generalized Gaussian DistributionMeasuring Robustness on Generalized Gaussian Distribution
Measuring Robustness on Generalized Gaussian Distribution
 
Risi ottavio
Risi ottavioRisi ottavio
Risi ottavio
 
ML Module 3.pdf
ML Module 3.pdfML Module 3.pdf
ML Module 3.pdf
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
 
Regression
RegressionRegression
Regression
 
manecohuhuhuhubasicEstimation-1.pptx
manecohuhuhuhubasicEstimation-1.pptxmanecohuhuhuhubasicEstimation-1.pptx
manecohuhuhuhubasicEstimation-1.pptx
 
Regression with Time Series Data
Regression with Time Series DataRegression with Time Series Data
Regression with Time Series Data
 

Recently uploaded

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 

OLS chapter

values of Y from the sample of data) with the expected values of Y (the values of Y predicted by the regression equation). The difference between these two values (the deviation, or residual as it is also called) provides an indication of how well the model predicts each data point.

Adding up the deviations for all the data points after they have been squared (squaring removes negative deviations) provides a simple measure of the degree to which the data deviate from the model overall. The sum of all the squared residuals is known as the residual sum of squares (RSS) and provides a measure of model fit for an OLS regression model. A poorly fitting model will deviate markedly from the data and will consequently have a relatively large RSS, whereas a well-fitting model will not deviate markedly from the data and will consequently have a relatively small RSS (a perfectly fitting model will have an RSS equal to zero, as there will be no deviation between the observed and expected values of Y).

It is important to understand how the RSS statistic (or the deviance, as it is also known; see Agresti, 1996, pages 96-97) operates, as it is used to determine the significance of individual variables and groups of variables in a regression model. A graphical illustration of the residuals for a simple regression model is provided in Figure 2. Detailed examples of calculating deviances from residuals for null and simple regression models can be found in Hutcheson and Moutinho, 2008.
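The residual calculation described above can be made concrete in a few lines. The sketch below is illustrative only: it fits a least-squares line to a small invented data set (the values are hypothetical, not from the text) and then sums the squared residuals to obtain the RSS.

```python
# Invented data for illustration; not from the chapter's example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Closed-form least-squares estimates for the line Y = alpha + beta * X.
beta = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
       sum((xi - mean_x) ** 2 for xi in x)
alpha = mean_y - beta * mean_x

expected = [alpha + beta * xi for xi in x]            # values predicted by the model
residuals = [yi - ei for yi, ei in zip(y, expected)]  # observed minus expected
rss = sum(r ** 2 for r in residuals)                  # squaring removes the sign

print(alpha, beta, rss)
```

A perfectly fitting model would give residuals of zero everywhere and hence an RSS of zero; here the points scatter slightly around the line, so the RSS is small but positive.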
The deviance is an important statistic as it enables the contribution made by explanatory variables to the prediction of the response variable to be determined. If, by adding a variable to the model, the deviance is greatly reduced, the added variable can be said to have had a large effect on the prediction of Y for that model. If, on the other hand, the deviance is not greatly reduced, the added variable can be said to have had a small effect on the prediction of Y for that model. The change in deviance that results from the explanatory variable being added to the model is used to determine the significance of that variable's effect on the prediction of Y in that model.

To assess the effect that a single explanatory variable has on the prediction of Y, one simply compares the deviance statistics before and after the variable has been added to the model. For a simple OLS regression model, the effect of the explanatory variable can be assessed by comparing the RSS statistic for the full regression model (Y = α + βx) with that for the null model (Y = α). The difference in deviance between the nested models can then be tested for significance using an F-test computed from the following equation:

F(df_p − df_p+q, df_p+q) = [(RSS_p − RSS_p+q) / (df_p − df_p+q)] / [RSS_p+q / df_p+q]

where p represents the null model, Y = α, p+q represents the model Y = α + βx, and df are the degrees of freedom associated with the designated model. It can be seen from this equation that the F-statistic is simply based on the difference in deviance between the two models as a fraction of the deviance of the full model, whilst taking account of the number of parameters.

In addition to the model-fit statistics, the R-square statistic is also commonly quoted and provides a measure that indicates the percentage of variation in the response variable that is 'explained' by the model.
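The drop-in-deviance F-test can be sketched directly from its definition. This is a minimal illustration, not the author's code; the deviances plugged in are those quoted for the ice cream example later in the text (null model RSS = 0.1255 on 29 df, full model RSS = 0.0500 on 28 df).

```python
# Hedged sketch of the drop-in-deviance F-test for nested OLS models.
def f_statistic(rss_p, df_p, rss_pq, df_pq):
    """Compare the null model p against the larger nested model p+q."""
    change = (rss_p - rss_pq) / (df_p - df_pq)  # change in deviance per added parameter
    scale = rss_pq / df_pq                      # full-model deviance per degree of freedom
    return change / scale

# Deviances from the ice cream example: null model vs. model with temperature.
f = f_statistic(rss_p=0.1255, df_p=29, rss_pq=0.0500, df_pq=28)
print(round(f, 2))  # 42.28, matching the value reported in the example
```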
R-square, which is also known as the coefficient of multiple determination, is defined as

R² = (RSS_p − RSS_p+q) / RSS_p

(the deviance accounted for by the regression as a proportion of the total deviance) and basically gives the percentage of the deviance in the response variable that can be accounted for by adding the explanatory variable into the model. Although R-square is widely used, it will always increase as variables are added to the model (the deviance can only go down when additional variables are added to a model). One solution to this problem is to calculate an adjusted R-square statistic (R²a), which takes into account the number of terms entered into the model and does not necessarily increase as more terms are added. Adjusted R-square can be derived using the following equation:

R²a = R² − [k(1 − R²) / (n − k − 1)]

where n is the number of cases used to construct the model and k is the number of terms in the model (not including the constant).

An example of simple OLS regression

A simple OLS regression model with a single explanatory variable can be illustrated using the example of predicting ice cream sales given outdoor temperature (Koteswara, 1970). The model for this relationship
(calculated using software) is

Ice cream consumption = 0.207 + 0.003 temperature

The parameter for α (0.207) indicates the predicted consumption when temperature is equal to zero. It should be noted that although the parameter α is required to make predictions of ice cream consumption at any given temperature, the prediction of consumption at a temperature of zero might be of limited usefulness, particularly when the observed data do not include a temperature of zero in their range (predictions should only be made within the limits of the sampled values). The parameter β indicates that for each unit increase in temperature, ice cream consumption increases by 0.003 units.

The significance of the relationship between temperature and ice cream consumption can be estimated by comparing the deviance statistics for the two nested models in the table below, one that includes temperature and one that does not. This difference in deviance can be assessed for significance using the F-statistic.

Model                              deviance (RSS)   df   change in deviance   F-statistic   P-value
consumption = α                    0.1255           29   0.0755               42.28         <0.0001
consumption = α + β temperature    0.0500           28

On the basis of this analysis, outdoor temperature would appear to be significantly related to ice cream consumption, with each unit increase in temperature being associated with an increase of 0.003 units in ice cream consumption. Using these statistics it is a simple matter to also compute the R-square statistic for this model, which is 0.0755/0.1255, or 0.60. Temperature "explains" 60% of the deviance in ice cream consumption (i.e., when temperature is added to the model, the deviance in the Y variable is reduced by 60%).

OLS regression with multiple explanatory variables

The OLS regression model can be extended to include multiple explanatory variables by simply adding additional variables to the equation.
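The R-square and adjusted R-square statistics for the simple ice cream model can be recomputed from the quoted deviances. This is an illustrative sketch; n = 30 is inferred from the 29 degrees of freedom of the null model.

```python
# R-square statistics recomputed from the ice cream example's deviances.
def r_squared(rss_null, rss_model):
    """Proportion of the null deviance accounted for by the model."""
    return (rss_null - rss_model) / rss_null

def adjusted_r_squared(r2, n, k):
    """Penalise R-square for the k model terms, given n cases."""
    return r2 - k * (1 - r2) / (n - k - 1)

r2 = r_squared(rss_null=0.1255, rss_model=0.0500)  # 0.0755 / 0.1255
r2_adj = adjusted_r_squared(r2, n=30, k=1)

print(round(r2, 2))  # 0.6: temperature "explains" 60% of the deviance
```

As expected, the adjusted value is slightly below the raw R-square, since it discounts the improvement in fit that comes merely from adding a term.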
The form of the model is the same as above, with a single response variable (Y), but this time Y is predicted by multiple explanatory variables (X1 to X3):

Y = α + β1X1 + β2X2 + β3X3

The interpretation of the parameters (α and β) from the above model is basically the same as for the simple regression model above, but the relationship cannot now be graphed on a single scatter plot. α indicates the value of Y when all values of the explanatory variables are zero. Each β parameter indicates the average change in Y that is associated with a unit change in X, whilst controlling for the other explanatory variables in the model.

Model fit can be assessed through comparing the deviance measures of nested models. For example, the effect of variable X3 on Y in the model above can be calculated by comparing the nested models

Y = α + β1X1 + β2X2 + β3X3
Y = α + β1X1 + β2X2

The change in deviance between these models indicates the effect that X3 has on the prediction of Y when the effects of X1 and X2 have been accounted for (it is, therefore, the unique effect that X3 has on Y after taking into account X1 and X2). The overall effect of all three explanatory variables on Y can be assessed by comparing the models

Y = α + β1X1 + β2X2 + β3X3
Y = α

The significance of the change in the deviance scores can be assessed through the calculation of the F-statistic using the equation provided above (these are, however, provided as a matter of course by most software packages). As with simple OLS regression, it is a simple matter to compute the R-square statistics.
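The nested-model comparison for the unique effect of X3 can be sketched with numpy on simulated data. Everything below is invented for illustration (the coefficients, sample size and noise level are arbitrary choices, not from the text).

```python
import numpy as np

# Simulate three explanatory variables and a response with known coefficients.
rng = np.random.default_rng(0)
n = 50
X1, X2, X3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * X1 - 0.5 * X2 + 0.25 * X3 + rng.normal(scale=0.1, size=n)

def fit(design, y):
    """Least-squares fit; returns the coefficients and the RSS (deviance)."""
    coef, rss, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, float(rss[0])

full = np.column_stack([np.ones(n), X1, X2, X3])  # Y = a + b1*X1 + b2*X2 + b3*X3
reduced = np.column_stack([np.ones(n), X1, X2])   # Y = a + b1*X1 + b2*X2

coef, rss_full = fit(full, y)
_, rss_reduced = fit(reduced, y)

# Unique effect of X3: change in deviance (1 df) against the full model's
# deviance per residual degree of freedom (n cases minus 4 parameters).
f_x3 = (rss_reduced - rss_full) / (rss_full / (n - 4))
print(coef, f_x3)
```

Because X3 genuinely contributes to y in the simulation, dropping it inflates the deviance noticeably and the resulting F-statistic is large.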
An example of multiple OLS regression

A multiple OLS regression model with three explanatory variables can be illustrated using the example from the simple regression model given above. In this example, the price of the ice cream and the average income of the neighbourhood are also entered into the model. This model is calculated as

Ice cream consumption = 0.197 - 1.044 price + 0.033 income + 0.003 temperature

The parameter for α (0.197) indicates the predicted consumption when all explanatory variables are equal to zero. The β parameters indicate the average change in consumption that is associated with each unit increase in the explanatory variable. For example, for each unit increase in price, consumption goes down by 1.044 units.

The significance of the relationship between each explanatory variable and ice cream consumption can be estimated by comparing the deviance statistics for nested models. The table below shows the significance of each of the explanatory variables (shown by the change in deviance when that variable is removed from the model) in a form typically used by software (when only one parameter is assessed, the F-statistic is equivalent to the square of the t-statistic often quoted in statistical output, i.e. F = t²).

Variable       coefficient   change in deviance   df   F-value            P-value
price          -1.044        0.002                1    F(1,26) = 1.567    0.222
income         0.033         0.011                1    F(1,26) = 7.973    0.009
temperature    0.003         0.082                1    F(1,26) = 60.252   <0.0001
residuals                    0.035                26

Within the range of the data collected in this study, temperature and income appear to be significantly related to ice cream consumption.

Conclusion

OLS regression is one of the major techniques used to analyse data and forms the basis of many other techniques (for example ANOVA and the generalised linear models; see Rutherford, 2001).
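The F-values in the table can be approximately reconstructed from the deviance changes: each change (on 1 df) is divided by the residual deviance per residual degree of freedom. Small discrepancies from the quoted F-values arise because the deviances in the table are rounded.

```python
# Drop-in-deviance F-ratios recomputed from the table's rounded figures.
residual_deviance, residual_df = 0.035, 26
deviance_changes = {"price": 0.002, "income": 0.011, "temperature": 0.082}

f_ratios = {name: (change / 1) / (residual_deviance / residual_df)
            for name, change in deviance_changes.items()}

for name, f in f_ratios.items():
    print(name, round(f, 1))  # close to the quoted 1.567, 7.973 and 60.252
```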
The usefulness of the technique can be greatly extended with the use of dummy variable coding to include grouped explanatory variables (see Hutcheson and Moutinho, 2008, for a discussion of the analysis of experimental designs using regression) and data transformation methods (see, for example, Fox, 2002). OLS regression is particularly powerful as it is relatively easy to check the model assumptions, such as linearity, constant variance and the effect of outliers, using simple graphical methods (see Hutcheson and Sofroniou, 1999).

Further Reading

Agresti, A. (1996). An Introduction to Categorical Data Analysis. John Wiley and Sons, Inc.
Fox, J. (2002). An R and S-Plus Companion to Applied Regression. London: Sage Publications.
Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modeling for Management. Sage Publications.
Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist. London: Sage Publications.
Koteswara, R. K. (1970). Testing for the Independence of Regression Disturbances. Econometrica, 38: 97-117.
Rutherford, A. (2001). Introducing ANOVA and ANCOVA: a GLM approach. London: Sage Publications.
Ryan, T. P. (1997). Modern Regression Methods. Chichester: John Wiley and Sons.

Graeme Hutcheson
Manchester University