Multiple Regression Analysis Explained

Econometrics for Finance
By Asimamaw B.
Chapter 3: Multiple Linear Regression
Introduction
 In simple regression we study the relationship between a dependent
variable and a single explanatory (independent) variable; that is, we
assume that the dependent variable is influenced by only one
explanatory variable.
 However, many economic variables are influenced by several
factors or variables.
 For instance, in studies of investment decisions we examine the
relationship between the quantity invested (or the decision whether to
invest at all) and the interest rate, share prices, the exchange rate, etc.

 Multiple regression (MR) analysis is an extension of simple
regression (SR) analysis to cover cases in which the dependent variable
is hypothesized to depend on more than one explanatory variable.
 Much of the analysis in MR is a straightforward extension of SR.
 However, we will encounter two new problems.
 First, when evaluating the influence of a given
explanatory variable on the dependent variable, we now
have to face the problem of discriminating between
its effects and the effects of the other explanatory
variables.
 Second, we shall have to tackle the problem of model
specification.

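 MR models are models in which the dependent variable (or
regressand) depends on two or more explanatory variables, or
regressors.
 The multiple linear regression (MLR) population regression function
(PRF), with one dependent variable Y and k explanatory variables, is
given by:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$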

Motivation for multiple regression
 Incorporate more explanatory factors into the model
 Explicitly hold fixed other factors that would otherwise be
in the error term.
 Allow for more flexible functional forms.
Example_1: Wage equation

Example_2:
 Suppose that the average test score (avgscore) depends on funding
(expend), average family income (avginc), and other unobservable factors.
 It is written as:
$$\mathit{avgscore} = \beta_0 + \beta_1\,\mathit{expend} + \beta_2\,\mathit{avginc} + \varepsilon$$
 Now, let us consider the problem of explaining the effect of per-student
spending (expend) on the average standardized test score (avgscore) at
the high school level.

 The coefficient of interest for policy purposes is $\beta_1$, the ceteris paribus
effect of expend on avgscore.
 By including avginc explicitly in the model, we are able to control for
its effect on avgscore.
 This is likely to be important because average family income tends
to be correlated with per student spending: spending levels are often
determined by both property and local income taxes.
 In simple regression analysis, avginc would be included in the error
term, which would likely be correlated with expend, causing the OLS
estimator of $\beta_1$ in the two-variable model to be biased.
 From the previous equation:
 Per-student spending (expend) is likely to be correlated with
average family income at a given high school because of the way
schools are financed.
 Omitting average family income from the regression would lead to a
biased estimate of the effect of spending on average test
scores.
 In a simple regression model (SRM), the estimated effect of
per-student spending would partly include the effect of family income
on test scores.
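
The omitted-variable argument above can be illustrated with a small simulation. This is only a sketch under assumed values: the data are synthetic, the "true" coefficients (1 on expend, 2 on avginc) and the 0.5 link between income and spending are hypothetical choices, and the helper name `ols` is not from the slides.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y on the columns of X (X already holds a constant)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are assumptions)
avginc = rng.normal(size=n)
expend = 0.5 * avginc + rng.normal(size=n)                    # spending correlated with income
avgscore = 1.0 * expend + 2.0 * avginc + rng.normal(size=n)   # true effect of expend is 1

const = np.ones(n)
b_long = ols(np.column_stack([const, expend, avginc]), avgscore)   # avginc controlled for
b_short = ols(np.column_stack([const, expend]), avgscore)          # avginc left in the error

print("slope on expend, avginc included:", b_long[1])
print("slope on expend, avginc omitted: ", b_short[1])
```

In this design the short regression converges to 1 + 2 × Cov(expend, avginc) / Var(expend) = 1 + 2 × 0.5 / 1.25 = 1.8, so the simple regression attributes part of the income effect to spending, while the multiple regression recovers a value close to 1.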

Interpretation of the multiple regression model
 More generally,
$$\beta_j = \frac{\partial Y}{\partial X_j}$$
 The multiple linear regression model manages to hold the values of
other explanatory variables fixed even if, in reality, they are
correlated with the explanatory variable under consideration
 "Ceteris paribus"-interpretation
 Ceteris paribus is a Latin phrase that generally means "all other
things being equal."
 It has still to be assumed that unobserved factors do not change
if the explanatory variables are changed.
That is, the coefficient measures by how much the dependent variable
changes if the j-th independent variable is increased by one unit,
holding all other independent variables and the error term constant.

Example: Determinants of college GPA
Interpretation:
 Holding ACT fixed, another point of high school grade point
average is associated with another 0.453 points of college grade
point average.
 OR: If we compare two students with the same ACT, but the
hsGPA of student A is one point higher, we predict student A to
have a colGPA that is 0.453 higher than that of student B.
 Holding high school grade point average fixed, another 10
points on ACT are associated with less than one point on
college GPA.

 Given this introduction, in this chapter
 we will start our discussion with the assumptions of multiple
regression, and
 we will then proceed with the case of two explanatory variables
before generalizing the MRM to the case of k explanatory variables
using the matrix approach.
 Assumptions of the MLR
 In order to specify our multiple linear regression model and proceed
with the analysis of this model, some assumptions are required.
 These assumptions are:
i) Randomness of the error term: the variable $\varepsilon_i$ is a real
random variable.
ii) Zero mean of the error term: $E(\varepsilon_i) = 0$.
iii) Homoscedasticity: the variance of each $\varepsilon_i$ is the same for all the $X_i$
values, i.e. $E(\varepsilon_i^2) = \sigma_\varepsilon^2$ (constant).
iv) Normality of $\varepsilon_i$: the values of each $\varepsilon_i$ are normally distributed, i.e.
$\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$.
v) No auto or serial correlation: the values of $\varepsilon_i$ (corresponding to $X_i$)
are independent of the values of any other $\varepsilon_j$ (corresponding to $X_j$) for
$i \neq j$, i.e. $E(\varepsilon_i \varepsilon_j) = 0$.
vi) Independence of $\varepsilon_i$ and $X_i$: every disturbance term $\varepsilon_i$ is independent
of the explanatory variables, i.e. $E(X_{1i}\varepsilon_i) = 0$ and $E(X_{2i}\varepsilon_i) = 0$.
 This condition is automatically fulfilled if we assume that the
values of the X's are a set of fixed numbers in all (hypothetical)
samples.
vii) No perfect multicollinearity: the explanatory variables are not
perfectly linearly correlated.
viii) Correct specification of the model: the mathematical form is
correctly defined (linear or non-linear form and the number of equations).

 We cannot list all the assumptions exhaustively, but the above are
some of the basic assumptions that enable us to proceed with our
analysis.
A Model With Two Explanatory Variables
 In order to understand the nature of MRM easily, we start our analysis
with the case of two explanatory variables, then extend this to the case
of k-explanatory variables.
Estimation
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \qquad (1)$$
Equation (1) is the MLR model with two explanatory variables.
 The expected value of the above model is called the population regression
equation, i.e. $E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$, since $E(\varepsilon_i) = 0$,
where the $\beta_j$ are the population parameters. $\beta_0$ is referred to as the intercept,
and $\beta_1$ and $\beta_2$ are known as the regression slopes.
 Note that $\beta_2$, for example, measures the effect on E(Y) of a unit change
in $X_2$ when $X_1$ is held constant.

 Since the PRF is unknown to any investigator, it has to be estimated from
sample data.
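 Thus we use the sample regression function (SRF), or sample regression
equation (SRE), which we write as:
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + e_i \qquad (2)$$
 Given sample observations on $Y_i$, $X_{1i}$ and $X_{2i}$, we estimate (1) using the
method of ordinary least squares (OLS).
 From equation (2),
$$e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \qquad (3)$$
OLS method: minimize the sum of squared residuals, i.e. minimize $\sum e_i^2$.
F.O.C:
 We partially differentiate $\sum e_i^2$ with respect to $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ and set the
partial derivatives equal to zero.
1)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0$$
2)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) X_{1i} = 0$$
3)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) X_{2i} = 0$$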

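 Summing from 1 to n, the first-order conditions produce three
normal equations:
$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i} \qquad (a)$$
$$\sum X_{1i}Y_i = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i}X_{2i} \qquad (b)$$
$$\sum X_{2i}Y_i = \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{2i}X_{1i} + \hat{\beta}_2 \sum X_{2i}^2 \qquad (c)$$
From equation (a) we get
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 \qquad (d)$$
By substituting (d) in equations (b) and (c) and simplifying (using the
deviation forms $x_{1i} = X_{1i} - \bar{X}_1$, $x_{2i} = X_{2i} - \bar{X}_2$ and $y_i = Y_i - \bar{Y}$) we get
$$\sum x_{1i}y_i = \hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{1i}x_{2i}$$
$$\sum x_{2i}y_i = \hat{\beta}_1 \sum x_{2i}x_{1i} + \hat{\beta}_2 \sum x_{2i}^2$$
By rearranging,
$$\hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{1i}x_{2i} = \sum x_{1i}y_i$$
$$\hat{\beta}_1 \sum x_{2i}x_{1i} + \hat{\beta}_2 \sum x_{2i}^2 = \sum x_{2i}y_i$$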

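 $\hat{\beta}_1$ and $\hat{\beta}_2$ can easily be solved for using matrix algebra.
 We can rewrite the above two equations in matrix form as follows.
$$\begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{2i}x_{1i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \end{bmatrix}$$
 If we use Cramer's rule to solve the above system we obtain
$$\hat{\beta}_1 = \frac{\sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} \qquad (i)$$
$$\hat{\beta}_2 = \frac{\sum x_{1i}^2 \sum x_{2i}y_i - \sum x_{1i}x_{2i} \sum x_{1i}y_i}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2} \qquad (ii)$$
 We can also express $\hat{\beta}_1$ and $\hat{\beta}_2$ using variance and covariance notation.
$$\hat{\beta}_1 = \frac{Cov(X_1, Y)\,Var(X_2) - Cov(X_2, Y)\,Cov(X_1, X_2)}{Var(X_1)\,Var(X_2) - [Cov(X_1, X_2)]^2} \qquad (iii)$$
$$\hat{\beta}_2 = \frac{Var(X_1)\,Cov(X_2, Y) - Cov(X_1, X_2)\,Cov(X_1, Y)}{Var(X_1)\,Var(X_2) - [Cov(X_1, X_2)]^2} \qquad (iv)$$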

Example: given data on a dependent variable Y and two independent
variables ($X_1$ and $X_2$) as follows:
i) Estimate the parameters of the MLR model using the matrix approach.
ii) Interpret the estimates.
i    Y    X1   X2
1    13   10   2
2    15   12   3
3    16   15   4
4    20   17   5
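
One way to work this example is to apply the deviation-form Cramer's rule formulas (i) and (ii) directly. The sketch below assumes NumPy is available; the variable names (`S11`, `S1y`, and so on) are illustrative shorthand for the sums of squares and cross products, not notation from the slides.

```python
import numpy as np

Y = np.array([13.0, 15.0, 16.0, 20.0])
X1 = np.array([10.0, 12.0, 15.0, 17.0])
X2 = np.array([2.0, 3.0, 4.0, 5.0])

# Deviations from the sample means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Sums of squares and cross products in deviation form
S11, S22, S12 = x1 @ x1, x2 @ x2, x1 @ x2      # 29, 5, 12
S1y, S2y = x1 @ y, x2 @ y                      # 26, 11

det = S11 * S22 - S12 ** 2                     # common denominator of (i) and (ii)
b1 = (S1y * S22 - S2y * S12) / det             # formula (i)
b2 = (S11 * S2y - S12 * S1y) / det             # formula (ii)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()  # equation (d)

print(b0, b1, b2)  # 18.5 -2.0 7.0
```

Read as estimates, these say that holding $X_2$ fixed, a one-unit increase in $X_1$ is associated with a 2-unit fall in Y, and holding $X_1$ fixed, a one-unit increase in $X_2$ is associated with a 7-unit rise in Y. With only four observations and three parameters there is a single degree of freedom, so the numbers serve as an arithmetic illustration only.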

The Coefficient of Determination, 𝑹𝟐
 In the SRM, we introduced 𝑹𝟐 as a measure of the proportion of
variation in the dependent variable that is explained by variation in
the explanatory variable.
 In MLR the same measure is relevant, and the same formulas are
valid but now we talk of the proportion of variation in the
dependent variable explained by all explanatory variables included
in the model.
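 The coefficient of determination is:
$$R^2 = \frac{ESS}{TSS} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$
where lowercase letters denote deviations from sample means.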
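 In the present model of two explanatory variables (in deviation form):
$$\begin{aligned}
\sum e_i^2 &= \sum \left(y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}\right)^2 \\
&= \sum e_i \left(y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}\right) \\
&= \sum e_i y_i - \hat{\beta}_1 \sum x_{1i}e_i - \hat{\beta}_2 \sum x_{2i}e_i \\
&= \sum e_i y_i \quad \text{because } \sum x_{1i}e_i = 0 = \sum x_{2i}e_i \\
&= \sum y_i \left(y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}\right)
\end{aligned}$$
 Thus, $\sum e_i^2 = \sum y_i^2 - \hat{\beta}_1 \sum x_{1i}y_i - \hat{\beta}_2 \sum x_{2i}y_i$
 By rearranging, we get
$$\underbrace{\sum y_i^2}_{TSS\ \text{(total variation)}} = \underbrace{\hat{\beta}_1 \sum x_{1i}y_i + \hat{\beta}_2 \sum x_{2i}y_i}_{ESS\ \text{(explained variation)}} + \underbrace{\sum e_i^2}_{RSS\ \text{(unexplained variation)}}$$
 Therefore,
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i}y_i + \hat{\beta}_2 \sum x_{2i}y_i}{\sum y_i^2}$$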

 As in SRM, 𝑹𝟐 is also viewed as a measure of the prediction
ability of the model over the sample period, or as a measure
of how well the estimated regression fits the data.
Note: The value of $R^2$ is also equal to the squared sample
correlation coefficient between $Y_i$ and $\hat{Y}_i$.
 Since the sample correlation coefficient measures the
linear association between two variables, if $R^2$ is high,
there is a close association between the values
of $Y_i$ and the values predicted by the model, $\hat{Y}_i$.
 In this case, the model is said to “fit” the data well.
 If $R^2$ is low, there is a weak association between the values
of $Y_i$ and the values predicted by the model, $\hat{Y}_i$, and the
model does not fit the data well.
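
The equality between $R^2$ and the squared correlation of $Y_i$ with $\hat{Y}_i$ can be checked numerically. The sketch below reuses the four observations from the two-regressor example earlier in the chapter and assumes NumPy; the names `Y_fit`, `r2` and `r2_corr` are illustrative.

```python
import numpy as np

Y = np.array([13.0, 15.0, 16.0, 20.0])
# Columns: constant, X1, X2
X = np.column_stack([np.ones(4), [10, 12, 15, 17], [2, 3, 4, 5]])

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # OLS estimates
Y_fit = X @ beta_hat                           # fitted values
e = Y - Y_fit                                  # residuals

TSS = np.sum((Y - Y.mean()) ** 2)
RSS = np.sum(e ** 2)
r2 = 1.0 - RSS / TSS                           # R^2 = 1 - RSS/TSS
r2_corr = np.corrcoef(Y, Y_fit)[0, 1] ** 2     # squared correlation of Y and fitted Y

print(r2, r2_corr)
```

For these data TSS = 26 and RSS = 1, so both numbers come out at 25/26, roughly 0.96. The equality relies on the model containing an intercept.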

Adjusted Coefficient of Determination ($\bar{R}^2$)
 One difficulty with 𝑹𝟐 is that it can be made large by adding
more and more variables, even if the variables added have no
economic justification.
 Algebraically, it is the fact that as the variables are added the
sum of squared errors (RSS) goes down (it can remain
unchanged, but this is rare) and thus 𝑹𝟐
goes up.
 If the model contains 𝒏 − 𝟏 variables then 𝑹𝟐 = 𝟏
 The manipulation of model just to obtain a high 𝑹𝟐 is not wise.
 An alternative measure of goodness of fit, called the adjusted $R^2$
and often symbolized as $\bar{R}^2$, is usually reported by regression
programs.
 It is computed as:
$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - \frac{\sum e_i^2/(n-k)}{\sum y_i^2/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k}$$

 This measure does not always go up when a variable is
added, because of the degrees-of-freedom term $n - k$ that divides
RSS in the numerator.
 As the number of variables k increases, RSS goes down, but
so does $n - k$. The effect on $\bar{R}^2$ depends on the amount by
which RSS falls.
 While solving one problem, this corrected measure of
goodness of fit unfortunately introduces another one.
 It loses its interpretation: $\bar{R}^2$ is no longer the percentage of
variation explained. This modified $R^2$ is sometimes used, and
misused, as a device for selecting the appropriate set of
explanatory variables.
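
A short numerical sketch of the adjustment follows. The function name `adjusted_r2` and the $R^2$ values (0.900 and 0.901) are hypothetical, chosen only to show that a small gain in $R^2$ from an extra regressor can still lower $\bar{R}^2$; k counts all estimated parameters including the intercept, as in the formula above.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k estimated parameters (intercept included)."""
    if n <= k:
        raise ValueError("need more observations than parameters (n > k)")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Hypothetical case: adding a fourth parameter raises R^2 only from 0.900 to 0.901
before = adjusted_r2(0.900, n=30, k=3)
after = adjusted_r2(0.901, n=30, k=4)
print(before, after)   # the second value is smaller than the first
```

Here the fall in $n - k$ from 27 to 26 outweighs the small reduction in RSS, so the adjusted measure declines even though $R^2$ itself rises.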

General Linear Regression Model and Matrix
Approach
 So far we have discussed the regression models containing one
or two explanatory variables.
 Let us now generalize the model, assuming that it contains k
explanatory variables. It will be of the form
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
 There are k + 1 parameters to be estimated. The system of normal
equations consists of k + 1 equations, in which the unknowns are
the parameters $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \ldots, \hat{\beta}_k$ and the known terms are
the sums of squares and the sums of products of all variables
in the structural equation.
 Least squares estimators of the unknown parameters are obtained
by minimizing the sum of squared residuals ($\sum e_i^2$).

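$$\sum e_i^2 = \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i} - \cdots - \hat{\beta}_k X_{ki}\right)^2$$
with respect to $\hat{\beta}_j$, where $j = 0, 1, 2, \ldots, k$.
 The partial derivatives are equated to zero to obtain the normal equations.
1)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki}\right) = 0$$
2)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki}\right) X_{1i} = 0$$
3)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki}\right) X_{2i} = 0$$
...
k+1)
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_k} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki}\right) X_{ki} = 0$$
 The general form of the above equations (except for the first) may be written as:
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_j} = -2 \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki}\right) X_{ji} = 0$$
where $j = 1, 2, \ldots, k$.

The normal equations of the general linear regression model are:
$$\begin{aligned}
\sum Y_i &= n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i} + \cdots + \hat{\beta}_k \sum X_{ki} \\
\sum X_{1i}Y_i &= \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i}X_{2i} + \cdots + \hat{\beta}_k \sum X_{1i}X_{ki} \\
\sum X_{2i}Y_i &= \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{2i}X_{1i} + \hat{\beta}_2 \sum X_{2i}^2 + \cdots + \hat{\beta}_k \sum X_{2i}X_{ki} \\
&\;\;\vdots \\
\sum X_{ki}Y_i &= \hat{\beta}_0 \sum X_{ki} + \hat{\beta}_1 \sum X_{ki}X_{1i} + \hat{\beta}_2 \sum X_{ki}X_{2i} + \cdots + \hat{\beta}_k \sum X_{ki}^2
\end{aligned}$$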

 Solving the above normal equations one by one is algebraically
cumbersome.
 But we can solve them easily using matrix algebra.
 Hence in the next section we will discuss the matrix approach to
the linear regression model.
Matrix Approach to Linear Regression Model
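 The general linear regression model with k explanatory variables is
written in the form:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $i = 1, 2, 3, \ldots, n$; $\beta_0$ is the intercept; $\beta_1$ to $\beta_k$ are the partial
slope coefficients; $\varepsilon_i$ is the stochastic disturbance term; and i indexes the
i-th observation, n being the sample size.
 Since i represents the i-th observation, we have n equations, one for
each of the n observations on each variable.

$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_k X_{k2} + \varepsilon_2 \\
Y_3 &= \beta_0 + \beta_1 X_{13} + \beta_2 X_{23} + \beta_3 X_{33} + \cdots + \beta_k X_{k3} + \varepsilon_3 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_k X_{kn} + \varepsilon_n
\end{aligned}$$
 These equations are put in matrix form as:
$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ 1 & X_{13} & X_{23} & \cdots & X_{k3} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
 In short,
$$Y = X\beta + \varepsilon \qquad (m)$$
 The orders of the matrix and vectors involved are:
$Y$: $n \times 1$; $X$: $n \times (k+1)$; $\beta$: $(k+1) \times 1$; and $\varepsilon$: $n \times 1$.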

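 To derive the OLS estimators of β under the usual (classical)
assumptions mentioned earlier, we define two vectors $\hat{\beta}$ and $e$ as:
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} \quad \text{and} \quad e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{bmatrix}$$
 Thus we can write $Y = X\hat{\beta} + e$ and $e = Y - X\hat{\beta}$.
 Given the above, to find the OLS estimators of β we have to minimize $\sum e_i^2$.
$$\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + \cdots + e_n^2 = \begin{bmatrix} e_1 & e_2 & e_3 & \cdots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{bmatrix} = e'e$$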

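$$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}$$
 Since $\hat{\beta}'X'Y$ is a scalar (1 × 1), it is equal to its transpose:
$$\hat{\beta}'X'Y = Y'X\hat{\beta}$$
$$e'e = Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \qquad (r)$$
 Minimizing $e'e$ with respect to the elements of $\hat{\beta}$:
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}} = \frac{\partial (e'e)}{\partial \hat{\beta}} = \frac{\partial \left[Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}\right]}{\partial \hat{\beta}} = -2X'Y + 2X'X\hat{\beta}$$
since, for a symmetric matrix A and a vector z, $\partial(z'Az)/\partial z = 2Az$
(here $A = X'X$ and $z = \hat{\beta}$), and $\partial(z'c)/\partial z = c$ for a constant vector c.
 Equating the above expression to the null vector 0, we obtain
$$-2X'Y + 2X'X\hat{\beta} = 0$$
$$X'X\hat{\beta} = X'Y$$
 Therefore, $\hat{\beta} = (X'X)^{-1}X'Y$, which is a column vector of the OLS
estimators.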
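
The matrix result can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than a prescribed implementation: the function name `ols_matrix` and the six-observation data set are hypothetical, and the normal equations are solved with `np.linalg.solve` instead of forming the inverse explicitly, which is a numerical choice and not part of the derivation.

```python
import numpy as np

def ols_matrix(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Solve the normal equations X'X b = X'Y for b (X must have full column rank)."""
    XtX = X.T @ X
    XtY = X.T @ Y
    return np.linalg.solve(XtX, XtY)

# Hypothetical data: n = 6 observations, a constant plus two regressors
X = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 5.0],
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 8.0],
    [1.0, 8.0, 3.0],
    [1.0, 9.0, 7.0],
])
Y = np.array([4.0, 9.0, 8.0, 16.0, 12.0, 17.0])

beta_hat = ols_matrix(X, Y)
e = Y - X @ beta_hat          # residual vector e = Y - X beta_hat

print(beta_hat)
print(X.T @ e)                # numerically zero: the first-order conditions X'e = 0
```

The second printed vector restates the first-order conditions: at the OLS solution the residuals are orthogonal to every column of X, including the column of ones.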

Statistical Properties of the Parameters (Matrix Approach)
 We have seen, in simple linear regression that the OLS estimators
(𝜷𝟎 and 𝜷𝟏) satisfy the small sample property of an estimator i.e.
BLUE property.
 In multiple regression, the OLS estimators also satisfy the BLUE
property.
 Now we proceed to examine the desired properties of the
estimators in matrix notations:
A) Linearity:
 We know that $\hat{\beta} = (X'X)^{-1}X'Y$.
Let $W = (X'X)^{-1}X'$. Then
$$\hat{\beta} = WY \qquad (q)$$
 Since W is a matrix of fixed values, equation (q) shows that $\hat{\beta}$ is
linear in Y.

B) Unbiasedness
 For the OLS estimators to be unbiased, their expected values must
equal the corresponding true population parameters.
$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon)$$
$$\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon$$
since $(X'X)^{-1}X'X$ is the identity matrix, I.
 Now let us take the expectation:
$$E(\hat{\beta}) = E\left[\beta + (X'X)^{-1}X'\varepsilon\right] = E(\beta) + E\left[(X'X)^{-1}X'\varepsilon\right] = \beta + (X'X)^{-1}X'E(\varepsilon)$$
$$E(\hat{\beta}) = \beta$$
since $E(\varepsilon)$ is zero.
 Thus, the ordinary least squares estimators are unbiased.
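
Unbiasedness is a statement about repeated samples, which a small Monte Carlo experiment can illustrate. Everything in this sketch is assumed for illustration: the true parameter vector, the sample size, the number of replications and the random seed. The regressors are drawn once and then held fixed across samples, matching the fixed-X assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 5000
beta = np.array([1.0, 2.0, -0.5])        # hypothetical true parameters

# Design matrix drawn once and held fixed in all (hypothetical) samples
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
W = np.linalg.solve(X.T @ X, X.T)        # W = (X'X)^{-1} X'

estimates = np.empty((reps, beta.size))
for r in range(reps):
    eps = rng.normal(size=n)             # disturbances with E(eps) = 0
    Y = X @ beta + eps
    estimates[r] = W @ Y                 # beta_hat = W Y for this sample

print(estimates.mean(axis=0))            # close to the true beta
```

Individual estimates differ from sample to sample, but their average across replications settles near the true parameter vector, which is what $E(\hat{\beta}) = \beta$ asserts.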

C) Minimum Variance
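 Before showing that all the OLS estimators are best (possess the minimum
variance property), it is important to derive their variance.
 We know that
$$var(\hat{\beta}) = E\left[(\hat{\beta} - E(\hat{\beta}))(\hat{\beta} - E(\hat{\beta}))'\right] = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right]$$
$$= \begin{bmatrix}
E(\hat{\beta}_1 - \beta_1)^2 & E[(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2)] & \cdots & E[(\hat{\beta}_1 - \beta_1)(\hat{\beta}_k - \beta_k)] \\
E[(\hat{\beta}_2 - \beta_2)(\hat{\beta}_1 - \beta_1)] & E(\hat{\beta}_2 - \beta_2)^2 & \cdots & E[(\hat{\beta}_2 - \beta_2)(\hat{\beta}_k - \beta_k)] \\
\vdots & \vdots & & \vdots \\
E[(\hat{\beta}_k - \beta_k)(\hat{\beta}_1 - \beta_1)] & E[(\hat{\beta}_k - \beta_k)(\hat{\beta}_2 - \beta_2)] & \cdots & E(\hat{\beta}_k - \beta_k)^2
\end{bmatrix}$$
$$= \begin{bmatrix}
var(\hat{\beta}_1) & cov(\hat{\beta}_1, \hat{\beta}_2) & \cdots & cov(\hat{\beta}_1, \hat{\beta}_k) \\
cov(\hat{\beta}_2, \hat{\beta}_1) & var(\hat{\beta}_2) & \cdots & cov(\hat{\beta}_2, \hat{\beta}_k) \\
\vdots & \vdots & & \vdots \\
cov(\hat{\beta}_k, \hat{\beta}_1) & cov(\hat{\beta}_k, \hat{\beta}_2) & \cdots & var(\hat{\beta}_k)
\end{bmatrix}$$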

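 The above matrix is a symmetric matrix containing variances along its
main diagonal and covariances of the estimators everywhere else.
 This matrix is, therefore, called the variance-covariance matrix of the
least squares estimators of the regression slopes.
 Thus, $var(\hat{\beta}) = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right]$.
 From the proof of unbiasedness of $\hat{\beta}$, we have the following
relationship: $\hat{\beta} - \beta = (X'X)^{-1}X'\varepsilon$
 Thus, by substituting this into the above equation and using
$E(\varepsilon\varepsilon') = \sigma_\varepsilon^2 I$ (homoscedasticity and no serial correlation), we get
$$var(\hat{\beta}) = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}X'\,E(\varepsilon\varepsilon')\,X(X'X)^{-1} = \sigma_\varepsilon^2 (X'X)^{-1}X'X(X'X)^{-1}$$
Therefore, $var(\hat{\beta}) = \sigma_\varepsilon^2 (X'X)^{-1}$,
where the X's are in their absolute form (actual values).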
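 When the x's are in deviation form we can write the multiple regression in
matrix form as:
$$\hat{\beta} = (x'x)^{-1}x'y$$
where
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} \quad \text{and} \quad x'x = \begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} & \cdots & \sum x_{1i}x_{ki} \\ \sum x_{2i}x_{1i} & \sum x_{2i}^2 & \cdots & \sum x_{2i}x_{ki} \\ \vdots & \vdots & & \vdots \\ \sum x_{ki}x_{1i} & \sum x_{ki}x_{2i} & \cdots & \sum x_{ki}^2 \end{bmatrix}$$
Note: The above column vector $\hat{\beta}$ doesn't include the constant term $\hat{\beta}_0$.
Under such conditions the variances of the slope parameters in deviation form can
be written as: $var(\hat{\beta}) = \sigma_\varepsilon^2 (x'x)^{-1}$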

 The only unknown part in the variances and covariances of the estimators is
$\sigma_\varepsilon^2$.
 As we have seen in the SLR model, $\hat{\sigma}_\varepsilon^2 = \dfrac{\sum e_i^2}{n-2}$. For k parameters
including the constant parameter, $\hat{\sigma}_\varepsilon^2 = \dfrac{\sum e_i^2}{n-k}$.
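
Putting the two pieces together, the estimated variance-covariance matrix is $\hat{\sigma}_\varepsilon^2 (X'X)^{-1}$. The sketch below applies this to the four-observation, two-regressor example from earlier in the chapter, assuming NumPy; the names `cov_beta` and `std_err` are illustrative, and k here counts the intercept.

```python
import numpy as np

Y = np.array([13.0, 15.0, 16.0, 20.0])
# Columns: constant, X1, X2 (actual values, not deviations)
X = np.column_stack([np.ones(4), [10, 12, 15, 17], [2, 3, 4, 5]])
n, k = X.shape                                 # n = 4 observations, k = 3 parameters

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

sigma2_hat = (e @ e) / (n - k)                 # RSS / (n - k)
cov_beta = sigma2_hat * XtX_inv                # estimated var-cov matrix of beta_hat
std_err = np.sqrt(np.diag(cov_beta))           # standard errors of the estimates

print(sigma2_hat)
print(cov_beta)
```

For these data RSS = 1 and n − k = 1, so $\hat{\sigma}_\varepsilon^2 = 1$, and the lower-right 2 × 2 block of `cov_beta` coincides with the deviation-form result $\hat{\sigma}_\varepsilon^2 (x'x)^{-1}$, giving estimated variances of 5 and 29 for the two slopes.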
 Now it is time to see the minimum variance property.
 To show that all the 𝜷𝒊s in the 𝜷 vector are Best Estimators, we
have also to prove that their respective variances obtained
previously are the smallest amongst all other possible linear
unbiased estimators.
 We follow the same procedure as followed in case of single
explanatory variable model where, we first assumed an alternative
linear unbiased estimator and then it was established that its
variance is greater than the estimator of the regression model.

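Let us assume that $\beta^*$ is an alternative linear and unbiased estimator of β.
Suppose that $\beta^* = \left[(X'X)^{-1}X' + B\right]Y$,
where B is a matrix of known constants with the same dimensions as $(X'X)^{-1}X'$.
 Therefore, $\beta^* = \left[(X'X)^{-1}X' + B\right](X\beta + \varepsilon)$
$$\beta^* = (X'X)^{-1}X'(X\beta + \varepsilon) + B(X\beta + \varepsilon)$$
$$E(\beta^*) = E\left[(X'X)^{-1}X'(X\beta + \varepsilon)\right] + E\left[B(X\beta + \varepsilon)\right]$$
$$E(\beta^*) = \beta + BX\beta$$
 Since our assumption regarding the alternative $\beta^*$ is that it is
an unbiased estimator of β, $E(\beta^*)$ should be equal to β;
in other words, $BX\beta$ should be a null vector.
 Thus BX must equal 0 if $\beta^* = \left[(X'X)^{-1}X' + B\right]Y$ is to
be an unbiased estimator.
 Let us now find the variance of this alternative estimator.
$$var(\beta^*) = E\left[(\beta^* - E(\beta^*))(\beta^* - E(\beta^*))'\right] = E\left[(\beta^* - \beta)(\beta^* - \beta)'\right]$$
 With BX = 0 we have $\beta^* - \beta = \left[(X'X)^{-1}X' + B\right]\varepsilon$, so that
$$var(\beta^*) = E\left\{\left[(X'X)^{-1}X' + B\right]\varepsilon\varepsilon'\left[X(X'X)^{-1} + B'\right]\right\} = \sigma_\varepsilon^2\left[(X'X)^{-1} + (X'X)^{-1}X'B' + BX(X'X)^{-1} + BB'\right]$$
 Thus, because $BX = 0$ and $X'B' = (BX)' = 0$,
$$var(\beta^*) = \sigma_\varepsilon^2 (X'X)^{-1} + \sigma_\varepsilon^2 BB'$$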

 In other words, $var(\beta^*)$ exceeds $var(\hat{\beta})$ by the expression
$\sigma_\varepsilon^2 BB'$, which is a positive semi-definite matrix, and this proves
that $\hat{\beta}$ is the best estimator.
Coefficient of Determination in Matrix Form
 The coefficient of determination 𝑹𝟐 can be derived in matrix form
as follows.
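 We know that $e = Y - X\hat{\beta}$, so
$$\sum e_i^2 = e'e = Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}$$
$$\sum Y_i^2 = Y'Y$$
 We also know that $\sum y_i^2 = \sum Y_i^2 - \frac{1}{n}\left(\sum Y_i\right)^2$, which is the total
sum of squares (TSS).
 ESS = TSS − RSS
$$= \sum y_i^2 - \sum e_i^2$$
$$= Y'Y - \frac{1}{n}\left(\sum Y_i\right)^2 - \left[Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}\right]$$
 Since $X'X\hat{\beta} = X'Y$,
$$ESS = Y'Y - \frac{1}{n}\left(\sum Y_i\right)^2 - \left[Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'Y\right]$$
$$ESS = \hat{\beta}'X'Y - \frac{1}{n}\left(\sum Y_i\right)^2$$
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}'X'Y - \frac{1}{n}\left(\sum Y_i\right)^2}{Y'Y - \frac{1}{n}\left(\sum Y_i\right)^2} = \frac{\hat{\beta}'X'Y - n\bar{Y}^2}{Y'Y - n\bar{Y}^2}$$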
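
The matrix expression for $R^2$ can be evaluated directly. This sketch again uses the four observations of the earlier two-regressor example and assumes NumPy; the names `ess`, `tss` and `r2` are illustrative.

```python
import numpy as np

Y = np.array([13.0, 15.0, 16.0, 20.0])
# Columns: constant, X1, X2
X = np.column_stack([np.ones(4), [10, 12, 15, 17], [2, 3, 4, 5]])
n = len(Y)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

correction = n * Y.mean() ** 2                 # n * Ybar^2 = (sum of Y)^2 / n
ess = beta_hat @ (X.T @ Y) - correction        # beta_hat' X'Y - n * Ybar^2
tss = Y @ Y - correction                       # Y'Y - n * Ybar^2
r2 = ess / tss

print(ess, tss, r2)
```

Here $\hat{\beta}'X'Y = 1049$, $Y'Y = 1050$ and $n\bar{Y}^2 = 1024$, so ESS = 25, TSS = 26 and $R^2 = 25/26$, the same value obtained from $1 - RSS/TSS$.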

Hypothesis Testing in Multiple Regression Model
 In multiple regression models we will undertake two tests of
significance.
i) Individual significance of parameters of the model.
ii) Overall significance of the model
Test of Overall Significance
 Throughout the previous section we were concerned with testing
the significance of the estimated partial regression coefficients
individually, i.e. under the separate hypothesis that each of the true
population partial regression coefficients was zero.
 In this section we extend this idea to joint test of the relevance of
all the included explanatory variables.
 Now consider the following:

 In this case, the null hypothesis and alternative hypothesis will be
as follows:
$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$
$H_1$: At least one of the $\beta_j$ is non-zero.
 The null hypothesis is a joint hypothesis that $\beta_1, \beta_2, \beta_3, \ldots, \beta_k$ are
jointly or simultaneously equal to zero.
 A test of such a hypothesis is called a test of the overall significance
of the observed or estimated regression line, that is, whether Y is
linearly related to $X_1, X_2, X_3, \ldots, X_k$.
$$F = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)}$$
where, in the degrees of freedom, k denotes the number of estimated
parameters including the intercept.

 This implies that the computed value of F can be calculated either as a
ratio of ESS and RSS (each divided by its degrees of freedom) or of
$R^2$ and $1 - R^2$.
Test Procedures
Step 1: Formulate the hypotheses.
Step 2: Compute the F test statistic.
Step 3: Compare the computed F with the critical value of F which
leaves probability α in the upper tail of the F distribution with
$k - 1$ and $n - k$ degrees of freedom.
Step 4: Decision rule
 If the computed value of F is greater than the critical value of
F($k - 1$, $n - k$), then the parameters of the model are
jointly significant, or the dependent variable Y is linearly
related to the independent variables included in the model.
 Thus, reject $H_0$ in favour of $H_1$, implying overall significance.
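
The four steps can be sketched as a small helper. The function names and the inputs ($R^2 = 0.80$, n = 25, k = 3) are hypothetical; the critical value is passed in by hand, as it would be read from an F table, and 3.44 is the tabulated 5% value for F(2, 22).

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall-significance F statistic; k counts all parameters, intercept included."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

def overall_test(r2: float, n: int, k: int, f_critical: float):
    """Return the computed F and whether H0 (all slopes zero) is rejected."""
    f = f_statistic(r2, n, k)
    return f, f > f_critical

# Hypothetical inputs; f_critical is the 5% table value for F(2, 22)
f, reject = overall_test(0.80, n=25, k=3, f_critical=3.44)
print(f, reject)
```

With these inputs F = (0.80/2) / (0.20/22) = 44, far above 3.44, so the null hypothesis that all slope coefficients are zero would be rejected at the 5% level.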

MLR Applications: Example
 To help you understand the working of matrix algebra in
MLR, consider the following numerical example.
i    Y_i   X_1i   X_2i   X_3i
1    13    10     2      1
2    15    12     3      2
3    16    15     4      3
4    20    17     5      4
5    25    18     5      5
 Based on the above table and model, answer the following questions.
I. Estimate the parameters using the matrix approach.
II. Compute the variances of the parameter estimates.
III. Compute the coefficient of determination ($R^2$).
IV. Report the regression result and interpret it.
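
The questions above can be worked through with the matrix formulas of this chapter. The sketch assumes NumPy and assumes the intended model is linear in $X_1$, $X_2$ and $X_3$ with an intercept; variable names are illustrative.

```python
import numpy as np

Y = np.array([13.0, 15.0, 16.0, 20.0, 25.0])
X = np.column_stack([
    np.ones(5),              # constant
    [10, 12, 15, 17, 18],    # X1
    [2, 3, 4, 5, 5],         # X2
    [1, 2, 3, 4, 5],         # X3
])
n, k = X.shape               # n = 5 observations, k = 4 parameters (intercept included)

# I. Parameter estimates: beta_hat solves X'X b = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # approximately [26, -2, -0.5, 7.5]

# II. Variances: diagonal of sigma2_hat * (X'X)^{-1}
e = Y - X @ beta_hat
rss = e @ e
sigma2_hat = rss / (n - k)
var_beta = np.diag(sigma2_hat * np.linalg.inv(X.T @ X))

# III. Coefficient of determination
tss = np.sum((Y - Y.mean()) ** 2)
r2 = 1.0 - rss / tss

print("beta_hat:", np.round(beta_hat, 4))
print("variances:", np.round(var_beta, 4))
print("R^2:", round(float(r2), 4))
```

For these data RSS = 1 and TSS = 90.8, so $R^2$ is about 0.989. With five observations and four parameters only one degree of freedom remains, so the variances are large and the example is best read as practice in the matrix arithmetic rather than as an informative regression.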

Exercises and Assignment
Problem 1: Consider the data given below to fit a linear function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$
i     Y_i   X_1i   X_2i   X_3i
1     40    35     53     200
2     40    35     53     212
3     41    38     50     211
4     46    40     64     212
5     52    40     70     203
6     59    42     68     194
7     53    44     59     194
8     61    46     73     188
9     55    50     59     196
10    64    50     71     190

 Based on the above table and model, answer the following
questions.
i. Estimate the parameters using the matrix approach.
ii. Compute the variances of the parameter estimates.
iii. Compute the coefficient of determination ($R^2$).
iv. Report the regression result.
Problem 2: Consider the model:
𝑌𝑖 = 𝜷𝟎 + 𝜷𝟏𝑿𝟏𝒊 + 𝜷𝟐𝑿𝟐𝒊 + 𝜷𝟑𝑿𝟑𝒊 + 𝜺𝒊
 On the basis of the information given below, answer the following
questions.
$\sum Y_i^2 = 28{,}000$; $\sum X_{1i}^2 = 3200$; $\sum X_{2i}^2 = 7300$
$\sum Y_i = 800$; $\sum X_{1i} = 250$; $\sum X_{2i} = 400$
$\sum X_{1i}X_{2i} = 4300$; $\sum X_{1i}Y_i = 8400$; $\sum X_{2i}Y_i = 13500$
$n = 25$

A) Find the OLS estimates of the slope coefficients.
B) Compute the variance of the estimator of $\beta_2$.
C) Test the significance of the slope parameter $\beta_2$ at the
5% level of significance.
D) Compute $R^2$ and $\bar{R}^2$ and interpret the result.
E) Test the overall significance of the model.

Problem 3: A researcher is using data for a sample of 10 observations
to estimate the relation between consumption expenditure and income.
Preliminary analysis of the sample data produces the following:
$\sum x_i y_i = 700$; $\sum X_i = 100$; $\sum Y_i = 200$; $\sum x_i^2 = 1000$;
$\sum y_i^2 = 1400$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
A) Use the above information to compute OLS estimates of
the intercept and slope coefficients and interpret the
result.
B) Calculate the variance of the slope parameter.
C) Compute the value of $R^2$ (coefficient of determination) and
interpret the result.
D) Compute the 95% confidence interval for the slope
parameter.
E) Test the significance of the slope parameter at the 5% level
of significance using a t-test.
