2. Regression Analysis
• Regression analysis is a set of statistical processes for estimating the relationship between a dependent variable (the response) and one or more independent variables (also called explanatory variables).
• For example, when a series of Y values (such as the monthly sales of cameras over a period of years) is causally connected with a series of X values (the monthly advertising budget), it is useful to establish a relationship between X and Y in order to forecast Y.
@Ravindra Nath Shukla (PhD Scholar) ABV-IIITM
3. Types of Regression
• There are various types of regression.
• Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variable(s) on the dependent variable.
• Some important types of regression models are:
4. Linear Regression
• Linear regression attempts to model the relationship between a dependent variable (Y) and independent variable(s) (X) by fitting a linear equation to observed data.
• The case of one independent variable is called Simple Linear Regression; with more than one independent variable, the process is called Multiple Linear Regression.
5. A. Simple Linear Regression (SLR)
• Linear regression models describe the relationship between variables by fitting a line to the observed data.
• Linear regression uses a straight line, while logistic and non-linear regression use curved lines.
• SLR assumes that, at least to some extent, the behavior of one variable (Y) is the result of a functional relationship between the two variables (X and Y).
6. Objectives of Regression
1. Establish whether there is a relationship between two variables (X and Y).
2. Forecast new observations.
7. Classical Assumptions of Linear Regression
1. Linear relationship.
2. No autocorrelation: the error terms are not correlated with one another.
3. No multicollinearity: the independent variables are not correlated with each other.
4. Homoscedasticity: the variance of the error term is the same across all values of the independent variables.
5. Multivariate normality: the data should be normally distributed.
8. Mathematical expression of linear regression
• Assume Y is a dependent variable that depends on the realization of the independent variable X.
• We know the equation of a straight line is:

  Y = mX + c        (1)

• where m = gradient (slope), and c = y-intercept, the height at which the line crosses the y-axis.
[Figure: a straight line plotted against X, crossing the Y axis at intercept c]
9.
• For the SLR model, Equation (1) can be written as:

  Y = β0 + β1 X        (2)

• where β0 = y-intercept, the value where the regression line crosses the y-axis, and β1 = coefficient (slope) of X.
[Figure: the regression line plotted against X, crossing the Y axis at intercept β0]
10.
• The slope (coefficient of X) denotes the relationship between X and Y: for a unit change in X, Y changes by β1 units.
• In other words, β1 represents the sensitivity of Y to changes in X.
• In the equation Y = β0 + β1 X, β0 gives the value of Y when X = 0.
[Figure: the regression line Y = β0 + β1 X, crossing the Y axis at intercept β0]
11. Example
Consider the demand data given in the table below.

  Month (X)   Demand (Y)
      1            9
      2           15
      3           32
      4           48
      5           52
      6           60
      7           39
      8           65
      9           90
     10           93

[Figure: scatter plot of demand (Y) against month (X) with fitted trend line y = 2.7333 + 8.6485x]

• Here, the slope (coefficient of X) is β1 = 8.6485, which denotes that for a one-unit change in X, Y changes by 8.6485.
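The fitted slope and intercept above can be reproduced with a short plain-Python sketch of the ordinary-least-squares formulas (the variable names here are my own, not from the deck):

```python
# Reproduce the fitted line y = 2.7333 + 8.6485x for the demand data
# using the OLS formulas for slope and intercept.
X = list(range(1, 11))                       # months
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]  # demand

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# beta1 = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2)
beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
        sum((x - x_bar) ** 2 for x in X)
beta0 = y_bar - beta1 * x_bar                # intercept = Y_bar - beta1 * X_bar

print(round(beta0, 4), round(beta1, 4))      # 2.7333 8.6485
```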
12.
• In general, the data points do not all intersect or pass through the regression line.
• Rather, there is a random error component, which can be measured as the distance between the true value and the predicted value.
• The regression model must include this error term:

  Y = β0 + β1 X + εi        (3)

  where β0 + β1 X is the non-random component and εi is the random component.

• For sample data, Equation (3) can be written as:

  y = b0 + b1 x + εi        (4)

[Figure: scatter plot of demand (Y) against month (X); the points deviate from the fitted line y = b0 + b1x + εi]
13. ORDINARY LEAST SQUARES (OLS)
• To minimize the error (random component), SLR uses the OLS method and calculates the values of β̂0 and β̂1 as:

  β̂1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²  =  Cov(X, Y) / Var(X),   sums over i = 1, …, n

• The intercept is then

  β̂0 = Ȳ − β̂1 X̄
14. Standard error of slope and intercept

  SE(Intercept) = √( Σ εi² / (n − 2) ) × √( 1/n + X̄² / Σ (Xi − X̄)² )

  SE(Slope) = √( Σ εi² / (n − 2) ) / √( Σ (Xi − X̄)² )

  where the sums run over i = 1, …, n.
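Applied to the demand data from the earlier example, these formulas can be sketched in plain Python (the results should agree with the worked solution later in the deck up to rounding):

```python
# Standard errors of the OLS slope and intercept for the demand data.
# Both use s = sqrt(SSE / (n - 2)), the residual standard error.
from math import sqrt

X = list(range(1, 11))
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

Sxx = sum((x - x_bar) ** 2 for x in X)
beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / Sxx
beta0 = y_bar - beta1 * x_bar

sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(X, Y))
s = sqrt(sse / (n - 2))                      # residual standard error

se_slope = s / sqrt(Sxx)
se_intercept = s * sqrt(1 / n + x_bar ** 2 / Sxx)

print(round(se_slope, 3), round(se_intercept, 3))  # 1.207 7.489
```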
15. Explanation of the OLS method
• To minimize the error (random component), SLR uses the OLS method.
• The method of least squares gives the best equation under the assumptions stated below:
1. The regression model is linear in the regression parameters.
2. The explanatory variable, X, is assumed to be non-random or non-stochastic (i.e., X is deterministic).
3. The conditional expected value of the error terms (residuals), E(εi | Xi), is zero.
4. In the case of time-series data, the error terms are uncorrelated, that is, Cov(εi, εj) = 0 for all i ≠ j.
5. The variance of the errors, Var(εi | Xi), is constant for all values of Xi (homoscedasticity).
6. The error terms, εi, follow a normal distribution.
16.
• In ordinary least squares, the objective is to find the optimal values of β0 and β1 that minimize the sum of squared errors (SSE) given in Eq. (6):

  SSE = Σ εi² = Σ (Yi − β0 − β1 Xi)²,   sums over i = 1, …, n        (6)
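A quick numeric illustration (my own sketch) that the OLS estimates do minimize SSE on the demand data: perturbing either coefficient away from the fitted values only increases the sum of squared errors.

```python
# SSE(b0, b1) = sum((Yi - b0 - b1*Xi)^2); the OLS estimates minimize it.
X = list(range(1, 11))
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]

def sse(b0, b1):
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))

b0_ols, b1_ols = 2.7333, 8.6485   # OLS estimates from the example
best = sse(b0_ols, b1_ols)

# Any (non-trivial) perturbation of either coefficient gives a larger SSE.
for db in (-0.5, -0.1, 0.1, 0.5):
    assert sse(b0_ols + db, b1_ols) > best
    assert sse(b0_ols, b1_ols + db) > best
```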
17.
• To find the optimal values of β0 and β1 that minimize SSE, we equate the partial derivatives of SSE with respect to β0 and β1 to zero [Eqs. (7) and (9)]:

  ∂SSE/∂β0 = −2 Σ (Yi − β0 − β1 Xi) = 0        (7)

• Solving Eq. (7) for β0, the estimated value of β0 is given by

  β̂0 = Ȳ − β̂1 X̄        (8)

• Differentiating SSE with respect to β1, we get

  ∂SSE/∂β1 = −2 Σ Xi (Yi − β0 − β1 Xi) = 0        (9)

• Substituting the value of β0 from Eq. (8) into Eq. (9), we get:
18.
• Thus, the value of β1 is given by

  β̂1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²  =  Cov(X, Y) / Var(X),   sums over i = 1, …, n
21.
• The Excel functions give b = 8.6485 and a = 2.7333.
• Use them in the equation Y = a + bX to make a forecast.
• For example, for period 11 (X = 11), Forecast = 2.7333 + 11 × 8.6485 ≈ 97.87.
• Similarly, for period 12, Forecast = 2.7333 + 12 × 8.6485 ≈ 106.52.
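The forecasts can be reproduced with a small sketch using the fitted coefficients:

```python
# Forecast demand for future periods with the fitted line Y = a + bX.
a, b = 2.7333, 8.6485          # intercept and slope from the example

def forecast(x):
    return a + b * x

print(round(forecast(11), 2))  # 97.87
print(round(forecast(12), 2))  # 106.52
```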
22. Coefficient of Determination
• The coefficient of determination (R²), where R is the coefficient of correlation, is a measure of the variability in the dependent variable that is accounted for by the regression line.
• Calculation:

  R² = SSR / SST = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = 1 − SSE / SST
23.
• where Yi = actual value of Y, Ŷi = estimated value of Y, and Ȳ = mean value of Y.
• The coefficient of determination always falls between 0 and 1.
• For example, if r = 0.8, the coefficient of determination is r² = 0.64, meaning that 64% of the variation in Y is due to variation in X.
• The remaining 36% of the variation in Y is due to other variables.
• If the coefficient of determination is low, multiple regression analysis may be used to account for all variables affecting the dependent variable Y.
24. Solution: Coefficient of determination and standard errors

  Time (X)   Demand (Y)      Ŷ        (Y − Ŷ)²    (Y − Ȳ)²    (Ŷ − Ȳ)²
      1           9       11.3815       5.67      1705.69     1514.65
      2          15       20.03        25.30      1246.09      916.2729
      3          32       28.6785      11.03       334.89      467.4893
      4          48       37.327      113.91         5.29      168.2987
      5          52       45.9755      36.29         2.89       18.7013
      6          60       54.624       28.90        94.09       18.69698
      7          39       63.2725     589.15       127.69      168.2858
      8          65       71.921       47.90       216.09      467.4676
      9          90       80.5695      88.93      1576.09      916.2426
     10          93       89.218       14.30      1823.29     1514.611
  Means: X̄ = 5.5, Ȳ = 50.3    Totals: SSE = 961.41, SST = 7132.1, SSR = 6170.72

  where SST = Σ(Y − Ȳ)², SSR = Σ(Ŷ − Ȳ)², and SSE = Σε² = Σ(Y − Ŷ)².

  R² = SSR / SST = 6170.72 / 7132.1 = 0.8652

  R² = 1 − SSE / SST = 1 − 961.41 / 7132.1 = 0.8652

  SE(Intercept) = √(961.41 / 8) × √(1/10 + 30.25/82.5) = 10.96 × 0.683 = 7.488

  SE(Slope) = √(961.41 / 8) / √82.5 = 1.2069
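Both ways of computing R² can be verified on the demand data with a short sketch (SSR/SST and 1 − SSE/SST must agree for an OLS fit with an intercept):

```python
# Verify R² for the demand data two ways: SSR/SST and 1 - SSE/SST.
X = list(range(1, 11))
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

sst = sum((y - y_bar) ** 2 for y in Y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # residual variation

r2_a = ssr / sst
r2_b = 1 - sse / sst
print(round(r2_a, 4), round(r2_b, 4))  # 0.8652 0.8652
```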
25. Exercise
1. From the following data of a bookstore, ABC, derive the regression equation for the effect of purchases on sales of books. Also, calculate the standard errors and the coefficient of determination.
27.
• The coefficient of determination:

  R² = SSR / SST = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)² = 2391.51 / 2868 = 0.8339

• Standard errors:

  SE(Intercept) = √(476.49 / 8) × √(1/10 + 8100/6380) = 7.718 × 1.172 = 9.045

  SE(Slope) = √(476.49 / 8) / √6380 = 0.0967

     X      Y       Ŷ       (Ŷ − Ȳ)²   (Y − Ȳ)²   (Y − Ŷ)²
     91     71     70.61       0.38         1         0.15
     97     75     74.29      18.42        25         0.50
    108     69     81.04     121.83         1       144.90
    121     97     89.01     361.35       729        63.85
     67     70     55.90     198.92         0       198.92
    124     91     90.85     434.67       441         0.02
     51     39     46.08     571.95       961        50.19
     73     61     59.58     108.68        81         2.03
    111     80     82.88     165.82       100         8.28
     57     47     49.76     409.50       529         7.64
  Means: X̄ = 90, Ȳ = 70    Totals: SSR = 2391.51, SST = 2868.00, SSE = 476.49

  where SSR = Σ(Ŷ − Ȳ)², SST = Σ(Y − Ȳ)², and SSE = Σε² = Σ(Y − Ŷ)².
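The exercise solution can be checked with the same plain-Python OLS sketch (purchases as X and sales as Y, following the exercise statement; the small differences from the slide's rounded intermediate figures are rounding):

```python
# Check the bookstore exercise: slope, intercept, and R² from the raw data.
X = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]   # purchases
Y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]       # sales
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n              # 90 and 70

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

sst = sum((y - y_bar) ** 2 for y in Y)
sse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))
r2 = 1 - sse / sst

print(round(b1, 4), round(b0, 2), round(r2, 4))    # 0.6132 14.81 0.8339
```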
28. B. Multiple Linear Regression (MLR)
• MLR predicts an outcome (dependent variable) based on several independent variables simultaneously.
• Why is this important?
• Behavior is rarely a function of just one variable; it is instead influenced by many variables. So the idea is that we should be able to obtain a more accurate predicted score by using multiple variables to predict our outcome.
29. [Figure: multiple regression as a many-to-one mapping. Several independent variables — e.g., km travelled (x1) and no. of deliveries (x2) — predict one dependent variable, travel time (y). With four IVs and one DV there are 10 pairwise relationships to consider, including IV-to-IV relationships (potential multicollinearity).]
30.
• The functional form of MLR is given by

  Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + εi

• The variable Y is the dependent variable (response variable or outcome variable);
• X1, X2, ..., Xk are independent variables (predictor variables or explanatory variables);
• β0 is a constant;
• β1, β2, ..., βk are called the partial regression coefficients corresponding to the explanatory variables X1, X2, ..., Xk respectively; and
• εi is the error term (or residual).
31.
• If εi = 0, then

  E(Yi) = β0 + β1 X1i + β2 X2i + ... + βk Xki

• The estimated value of Y will be

  Ŷi = b0 + b1 X1i + b2 X2i + ... + bk Xki

• In MLR, each coefficient is interpreted as the estimated change in Y corresponding to a unit change in one independent variable (e.g., X1), when all other variables (X2, X3, ..., Xk) are held constant.
33. New Considerations
• Adding more independent variables to a multiple regression does not mean the regression will be "better" or offer better predictions; in fact, it can make things worse. This is called OVERFITTING.
• The addition of more independent variables creates more relationships among them. Not only are the independent variables potentially related to the dependent variable, they are also potentially related to each other. When this happens, it is called MULTICOLLINEARITY.
34.
• The ideal is for all of the independent variables to be correlated with the dependent variable but NOT with each other.
• Because of multicollinearity and overfitting, a fair amount of prep work is required before conducting multiple regression analysis:
  - Correlations
  - Scatter plots
  - Simple regressions
35. Steps in MLR
1. Generate a list of potential variables; independent(s) and dependent
2. Collect data on the variables
3. Check the relationships between each independent variable and the
dependent variable using scatterplots and correlations
4. Check the relationships among the independent variables using
scatterplots and correlations
5. Conduct simple linear regressions for each IV/DV pair (Optional).
6. Use the non-redundant independent variables in the analysis to find
the best fitting model
7. Use the best fitting model to make predictions about the dependent
variable.
36. Example
Aditya Delivery Service (ADS) offers same-day delivery for letters, packages, and other small courier parcels. They use Google Maps to group individual deliveries into one trip to reduce time and fuel costs. Some trips involve more than one delivery.
The ADS company wants to estimate how long a delivery will take based on three factors:
1) the total distance of the trip in kilometers (KMs),
2) the number of deliveries that must be made during the trip, and
3) the daily price of petrol.
37.
In this case, we can predict the total travel time using the distance traveled, the number of deliveries on each trip, and the daily petrol price.

Step 1: Generate a list of potential variables, independent(s) and dependent.
Step 2: Collect data on the variables.
To conduct this analysis, a random sample of 10 past trips was taken, recording four pieces of information for each trip:

  Distance Travelled (Kms), (X1)   No. of Deliveries (X2)   Petrol Price ($), (X3)   Travel Time (hrs), (Y)
             89                             4                       3.84                      7
             66                             1                       3.19                      5.4
             78                             3                       3.78                      6.6
            111                             6                       3.89                      7.4
             44                             1                       3.57                      4.8
             77                             3                       3.57                      6.4
             80                             3                       3.03                      7
             66                             2                       3.51                      5.6
            109                             5                       3.54                      7.3
             76                             3                       3.25                      6.4
38. Step 3: Scatterplot IV to DV
Step 4: Scatterplot IV to IV

[Figure: pairwise scatter plots with fitted trend lines —
  Travel time (Y) vs distance travelled (X1): y = 0.0403x + 3.1856, R² = 0.8615
  Travel time (Y) vs no. of deliveries (X2): y = 0.4983x + 4.8454, R² = 0.8399
  Travel time (Y) vs petrol price (X3): no trend line shown
  No. of deliveries (X2) vs distance travelled (X1): y = 0.0763x − 2.97, R² = 0.9137
  Petrol price (X3) vs distance travelled (X1), and petrol price (X3) vs no. of deliveries (X2): no trend line shown]
39.
• Step 5: Conduct simple linear regressions for each IV/DV pair (optional).
• Step 6: Use the non-redundant independent variables in the analysis to find the best-fitting model.
  - In our example, Step 3 suggests that petrol price (X3) is only weakly related to Y, so including it risks overfitting.
  - Either X1 or X2 is redundant, as X1 and X2 show multicollinearity.
• Step 7: Use the best-fitting model to make predictions about the dependent variable.
40. Excel: Multiple regression
For practice purposes, we solve our example above considering one response variable Y and three predictor variables X1, X2, and X3 (the data are as in the table of Step 2).
41. SUMMARY OUTPUT

  Regression Statistics
  Multiple R           0.946
  R Square             0.895
  Adjusted R Square    0.842
  Standard Error       0.345
  Observations         10

  ANOVA
                df      SS      MS       F      Significance F
  Regression     3    6.056   2.019   16.991        0.002
  Residual       6    0.713   0.119
  Total          9    6.769

                                   Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept                           6.211          2.321        2.677    0.037     0.533     11.890
  Distance Travelled (Kms), (X1)      0.014          0.022        0.636    0.548    -0.040      0.068
  No. of Deliveries (X2)              0.383          0.300        1.277    0.249    -0.351      1.117
  Petrol Price ($), (X3)             -0.607          0.527       -1.152    0.293    -1.895      0.682

Regression Equation:

  Ŷi = 6.211 + 0.014 X1i + 0.383 X2i − 0.607 X3i
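The Excel coefficients can be reproduced without Excel by solving the normal equations (XᵀX)b = Xᵀy directly; this is my own self-contained sketch (the small Gaussian-elimination helper is not from the deck):

```python
# Reproduce the Excel multiple-regression coefficients by solving the
# normal equations (X'X) b = X'y with a small Gaussian elimination.
X1 = [89, 66, 78, 111, 44, 77, 80, 66, 109, 76]
X2 = [4, 1, 3, 6, 1, 3, 3, 2, 5, 3]
X3 = [3.84, 3.19, 3.78, 3.89, 3.57, 3.57, 3.03, 3.51, 3.54, 3.25]
Y  = [7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4]

# Design matrix with an intercept column of ones.
rows = [[1.0, a, b, c] for a, b, c in zip(X1, X2, X3)]

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

k = 4
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(k)]
b = solve(XtX, Xty)

print([round(v, 3) for v in b])  # ≈ [6.211, 0.014, 0.383, -0.607]
```

With numpy available, `numpy.linalg.lstsq` on the same design matrix gives the same coefficients more robustly.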
42. Manual solution: Multiple regression
Suppose, for practice purposes, we use Y as the dependent variable and X1 and X2 as independent variables:

  Travel Time (hrs), (Y)   Distance Travelled (Kms), (X1)   No. of Deliveries (X2)
          7                            89                            4
          5.4                          66                            1
          6.6                          78                            3
          7.4                         111                            6
          4.8                          44                            1
          6.4                          77                            3
          7                            80                            3
          5.6                          66                            2
          7.3                         109                            5
          6.4                          76                            3
47. Step 4: Place b0, b1, and b2 in the estimated linear regression equation:

  Ŷ = 3.733 + 0.02622 X1 + 0.1840 X2
48. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9335
R Square 0.8714
Adjusted R Square 0.8347
Standard Error 0.3526
Observations 10
ANOVA
df SS MS F Significance F
Regression 2 5.8985 2.9493 23.7161 0.0008
Residual 7 0.8705 0.1244
Total 9 6.769
Coefficients Standard Error t Stat P-value
Intercept 3.7322 0.8870 4.2077 0.0040
X Variable 1 0.0262 0.0200 1.3101 0.2315
X Variable 2 0.1840 0.2509 0.7335 0.4871
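The two-predictor coefficients can be reproduced manually with the standard deviation-sum formulas for two regressors (a sketch of the manual solution; the Sjk names are my own shorthand for the deviation sums):

```python
# Two-predictor OLS by the manual deviation-sum formulas:
#   b1 = (S22*S1y - S12*S2y) / (S11*S22 - S12^2),
#   b2 = (S11*S2y - S12*S1y) / (S11*S22 - S12^2),
#   b0 = Y_bar - b1*X1_bar - b2*X2_bar.
X1 = [89, 66, 78, 111, 44, 77, 80, 66, 109, 76]
X2 = [4, 1, 3, 6, 1, 3, 3, 2, 5, 3]
Y  = [7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4]
n = len(Y)
m1, m2, my = sum(X1) / n, sum(X2) / n, sum(Y) / n

S11 = sum((a - m1) ** 2 for a in X1)
S22 = sum((b - m2) ** 2 for b in X2)
S12 = sum((a - m1) * (b - m2) for a, b in zip(X1, X2))
S1y = sum((a - m1) * (y - my) for a, y in zip(X1, Y))
S2y = sum((b - m2) * (y - my) for b, y in zip(X2, Y))

den = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / den
b2 = (S11 * S2y - S12 * S1y) / den
b0 = my - b1 * m1 - b2 * m2

print(round(b0, 4), round(b1, 4), round(b2, 4))  # 3.7322 0.0262 0.184
```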
49. Steps for manual solution: Multiple regression
Suppose we have the following dataset with one response variable Y and two predictor variables, X1 and X2.