MBMG-7104/ ITHS-2202/ IMAS-3101/ IMHS-3101
@Ravindra Nath Shukla (PhD Scholar) ABV-IIITM
Regression Analysis
➢ Regression analysis is a set of statistical processes for estimating the relationship between a dependent variable (the response) and one or more independent variables (also called explanatory variables).
➢ For example, when a series of Y values (such as the monthly sales of cameras over a period of years) is causally connected with a series of X values (the monthly advertising budget), it is beneficial to establish a relationship between X and Y in order to forecast Y.
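As a minimal sketch of what "establishing a relationship between X and Y" means computationally (the data here are hypothetical, and the least-squares formulas used are the ones derived later in the deck):

```python
# Hypothetical illustration: advertising budget (X) and camera sales (Y).
# The numbers are made up; Y is constructed exactly as Y = 3 + 2X, so the
# fitted line should recover slope 2 and intercept 3.
X = [1, 2, 3, 4, 5]
Y = [3 + 2 * x for x in X]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Least-squares slope and intercept (formulas derived later in the deck).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
        sum((x - mean_x) ** 2 for x in X)
intercept = mean_y - slope * mean_x
print(slope, intercept)  # 2.0 3.0
```

Once the line is fitted, forecasting Y for a new X is just `intercept + slope * x`.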
Types of Regression
➢ There are various types of regression.
➢ Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variable(s) on the dependent variable.
➢ Some important types of regression models are:
Linear Regression
➢ Linear regression attempts to model the relationship between the dependent (Y) and independent (X) variables by fitting a linear equation to observed data.
➢ The case of one independent variable is called Simple Linear Regression; for more than one independent variable, the process is called Multiple Linear Regression.
A. Simple Linear Regression (SLR)
➢ Linear regression models describe the relationship between variables by fitting a line to the observed data.
➢ Linear regression uses a straight line, while logistic and non-linear regression use curved lines.
➢ SLR assumes that, at least to some extent, the behavior of one variable (Y) is the result of a functional relationship between the two variables (X and Y).
Objectives of Regression
1. Establish whether there is a relationship between two variables (X and Y).
2. Forecast new observations.
Classical Assumptions of Linear Regression
1. Linear relationship.
2. No autocorrelation: the error terms are not correlated with one another.
3. No multicollinearity: the independent variables are not correlated with each other.
4. Homoscedasticity: the variance of the error term is the same across all values of the independent variables.
5. Multivariate normality: the data should be normally distributed.
Mathematical Expression of Linear Regression
• Assume Y is a dependent variable that depends on the realization of the independent variable X.
• We know the equation of a straight line is:

    Y = mX + c        (1)

• where m = gradient or slope, and
• c = y-intercept, the height at which the line crosses the y-axis.

[Figure: straight line Y = mX + c crossing the y-axis at intercept c]
• For the SLR model, Equation (1) can be written as:

    Y = β0 + β1X        (2)

• where β0 = y-intercept = the value where the regression line crosses the y-axis, and
• β1 = coefficient or slope of X.

[Figure: regression line Y = β0 + β1X with intercept β0 on the y-axis]
• The slope or coefficient of X denotes the relationship between X and Y: for a unit change in X, Y changes by the value of the coefficient of X.
• In other words, β1 represents the sensitivity of Y to changes in X.
• In the equation Y = β0 + β1X, β0 gives the value of the variable Y when X = 0.
Example
Consider the demand data given in the table below.

Month: Independent Variable (X):   1   2   3   4   5   6   7   8   9   10
Demand: Dependent Variable (Y):    9  15  32  48  52  60  39  65  90  93

[Figure: scatter plot of Demand (Y) vs. Month (X) with fitted trend line y = 2.7333 + 8.6485x]
• Here, the slope or coefficient of X is β1 = 8.6485, which denotes that for a one-unit change in X, Y changes by 8.6485.

[Figure: the same demand data with the fitted regression line, labelled y = b0 + b1x + εi]
• In general, the data points do not all lie on, or pass through, the regression line.
• Rather, there is some error or random component, which can be measured as the distance between the true value and the predicted value.
• The regression model must include these error terms as:

    Y = β0 + β1X + εi        (3)

  where β0 + β1X is the non-random component and εi is the random component.
• For sample data, Equation (3) can be written as:

    y = b0 + b1x + εi        (4)
ORDINARY LEAST SQUARES (OLS)
• To minimize the error or random component, SLR uses the OLS method and calculates the values of β̂0 and β̂1 as follows.
• The slope or coefficient of X:

    β̂1 = Σi(Xi − X̄)(Yi − Ȳ) / Σi(Xi − X̄)² = Cov(X, Y) / Var(X)

• The intercept:

    β̂0 = Ȳ − β̂1 X̄
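As a quick check, the two OLS estimators can be computed directly in code; on the demand data from the deck's worked example they reproduce β1 = 8.6485 and β0 = 2.7333:

```python
# Demand example from the deck: months 1..10 and observed demand.
X = list(range(1, 11))
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]

n = len(X)
mean_x = sum(X) / n   # 5.5
mean_y = sum(Y) / n   # 50.3

# beta1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
        sum((x - mean_x) ** 2 for x in X)
# beta0 = Ybar - beta1 * Xbar
beta0 = mean_y - beta1 * mean_x

print(round(beta1, 4), round(beta0, 4))  # 8.6485 2.7333
```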
Standard error of slope and intercept:

    SE(Intercept β0) = √( Σεi² / (n − 2) ) × √( 1/n + X̄² / Σi(Xi − X̄)² )

    SE(Slope β1) = √( Σεi² / (n − 2) × 1 / Σi(Xi − X̄)² )
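A short sketch of these standard-error formulas, applied to the same demand data used in the deck's worked example:

```python
import math

# Demand data from the worked example.
X = list(range(1, 11))
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]
n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

sxx = sum((x - mean_x) ** 2 for x in X)   # Σ(Xi - Xbar)^2 = 82.5
beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / sxx
beta0 = mean_y - beta1 * mean_x

# Residual sum of squares, divided by n - 2 degrees of freedom.
sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(X, Y))
s2 = sse / (n - 2)

se_intercept = math.sqrt(s2) * math.sqrt(1 / n + mean_x ** 2 / sxx)
se_slope = math.sqrt(s2 / sxx)
print(round(se_intercept, 3), round(se_slope, 4))  # 7.489 1.2069
```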
Explanation of the OLS method
• To minimize the error or random component, SLR uses the OLS method.
• The method of least squares gives the best equation under the assumptions stated below:
1. The regression model is linear in the regression parameters.
2. The explanatory variable, X, is assumed to be non-random or non-stochastic (i.e., X is deterministic).
3. The conditional expected value of the error terms or residuals, E(εi | Xi), is zero.
4. In the case of time-series data, the error terms are uncorrelated, that is, Cov(εi, εj) = 0 for all i ≠ j.
5. The variance of the errors, Var(εi | Xi), is constant for all values of Xi (homoscedasticity).
6. The error terms, εi, follow a normal distribution.
• In ordinary least squares, the objective is to find the optimal values of β0 and β1 that minimize the Sum of Squared Errors (SSE), given in Eq. (6) as:

    SSE = Σi εi² = Σi (Yi − β0 − β1Xi)²        (6)
• To find the optimal values of β0 and β1 that minimize SSE, we equate the partial derivatives of SSE with respect to β0 and β1 to zero [Eqs. (7) and (9)]:

    ∂SSE/∂β0 = −2 Σi (Yi − β0 − β1Xi) = 0        (7)

• Solving Eq. (7) for β0, the estimated value of β0 is given by:

    β̂0 = Ȳ − β̂1 X̄        (8)

• Differentiating SSE with respect to β1, we get:

    ∂SSE/∂β1 = −2 Σi Xi (Yi − β0 − β1Xi) = 0        (9)

• Substituting the value of β0 from Eq. (8) into Eq. (9), we get:
• Thus, the value of β1 is given by:

    β̂1 = Σi(Xi − X̄)(Yi − Ȳ) / Σi(Xi − X̄)² = Cov(X, Y) / Var(X)
Solution (Manual Method):

  X    Y    dx = X − X̄   dy = Y − Ȳ   (X − X̄)²   (Y − Ȳ)²    (X − X̄)(Y − Ȳ)
  1    9      −4.5         −41.3        20.25     1705.69        185.85
  2   15      −3.5         −35.3        12.25     1246.09        123.55
  3   32      −2.5         −18.3         6.25      334.89         45.75
  4   48      −1.5          −2.3         2.25        5.29          3.45
  5   52      −0.5           1.7         0.25        2.89         −0.85
  6   60       0.5           9.7         0.25       94.09          4.85
  7   39       1.5         −11.3         2.25      127.69        −16.95
  8   65       2.5          14.7         6.25      216.09         36.75
  9   90       3.5          39.7        12.25     1576.09        138.95
 10   93       4.5          42.7        20.25     1823.29        192.15
 X̄ = 5.5, Ȳ = 50.3;  Σ(X − X̄)² = 82.5;  Σ(Y − Ȳ)² = 7132.1;  Σ(X − X̄)(Y − Ȳ) = 713.5

β1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 713.5 / 82.5 = 8.6485
β0 = Ȳ − β1 X̄ = 50.3 − 8.6485 × 5.5 = 2.733

Regression equation: Y = β0 + β1X = 2.733 + 8.6485X
Example: Consider the demand data given in the table below.

Time: Independent Variable (X):    1   2   3   4   5   6   7   8   9  10
Demand: Dependent Variable (Y):    9  15  32  48  52  60  39  65  90  93

Solution (Excel):
Slope (b) = 8.65
Intercept (a) = 2.73

[Figure: scatter plot of Demand (Y) vs. Time (X) with fitted trend line y = 8.6485x + 2.7333, R² = 0.8652]
➢ The Excel functions give b = 8.65 and a = 2.73.
➢ Use them in the equation Y = a + bX to make a forecast.
➢ For example, for period 11 (X = 11), Forecast = 2.7333 + 11 × 8.6485 = 97.87.
➢ Similarly, for period 12, Forecast = 2.7333 + 12 × 8.6485 = 106.52.
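The forecasting step is a one-liner; a minimal sketch using the fitted coefficients:

```python
# Fitted coefficients from the demand example (unrounded values).
a, b = 2.7333, 8.6485

def forecast(x):
    # Y = a + bX
    return a + b * x

print(round(forecast(11), 2))  # 97.87
print(round(forecast(12), 2))  # 106.52
```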
Coefficient of Determination
➢ The coefficient of determination (R²), where R is the value of the coefficient of correlation, is a measure of the variability in the dependent variable that is accounted for by the regression line.
➢ Calculation:

    R² = SSR / SST = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = 1 − SSE / SST

➢ where Yi = actual value of Y, Ŷi = estimated value of Y, and Ȳ = mean value of Y.
➢ The coefficient of determination always falls between 0 and 1.
➢ For example, if r = 0.8, the coefficient of determination is r² = 0.64, meaning that 64% of the variation in Y is due to variation in X.
➢ The remaining 36% of the variation in Y is due to other variables.
➢ If the coefficient of determination is low, multiple regression analysis may be used to account for additional variables affecting the dependent variable Y.
Solution: Coefficient of determination and standard errors

  X    Y      Ŷ         ε² = (Y − Ŷ)²   (Y − Ȳ)²   (Ŷ − Ȳ)²
  1     9   11.3815          5.67        1705.69    1514.65
  2    15   20.03           25.30        1246.09     916.27
  3    32   28.6785         11.03         334.89     467.49
  4    48   37.327         113.91           5.29     168.30
  5    52   45.9755         36.29           2.89      18.70
  6    60   54.624          28.90          94.09      18.70
  7    39   63.2725        589.15         127.69     168.29
  8    65   71.921          47.90         216.09     467.47
  9    90   80.5695         88.93        1576.09     916.24
 10    93   89.218          14.30        1823.29    1514.61
 X̄ = 5.5, Ȳ = 50.3;  SSE = Σε² = 961.41;  SST = Σ(Y − Ȳ)² = 7132.1;  SSR = Σ(Ŷ − Ȳ)² = 6170.72

R² = SSR / SST = 6170.72 / 7132.1 = 0.8652
R² = 1 − SSE / SST = 1 − 961.41 / 7132.1 = 0.8652

SE(Intercept β0) = √( Σε² / (n − 2) ) × √( 1/n + X̄² / Σ(X − X̄)² )
                 = √( 961.41 / 8 ) × √( 1/10 + 30.25 / 82.5 ) = 10.96 × 0.683 = 7.488

SE(Slope β1) = √( Σε² / (n − 2) × 1 / Σ(X − X̄)² )
             = √( 961.41 / 8 × 1 / 82.5 ) = 1.2069
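Both identities for R² can be verified in code on the demand example:

```python
# Reproduce R^2 for the demand example using both identities:
# R^2 = SSR / SST and R^2 = 1 - SSE / SST.
X = list(range(1, 11))
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
        sum((x - mean_x) ** 2 for x in X)
beta0 = mean_y - beta1 * mean_x
Y_hat = [beta0 + beta1 * x for x in X]

sst = sum((y - mean_y) ** 2 for y in Y)               # total sum of squares
ssr = sum((yh - mean_y) ** 2 for yh in Y_hat)         # regression (explained)
sse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))   # error (residual)

print(round(ssr / sst, 4), round(1 - sse / sst, 4))  # 0.8652 0.8652
```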
Exercise
1. From the following data of a book store ABC, derive the regression equation for the effect of purchases on the sales of books. Also, calculate the standard errors and the coefficient of determination.

Solution 1
β1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 3900 / 6360 = 0.6132
β0 = Ȳ − β1 X̄ = 70 − 0.6132 × 90 = 14.812

Regression equation: Y = β0 + β1X = 14.812 + 0.6132X

The coefficient of determination:
R² = SSR / SST = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)² = 2391.51 / 2868 = 0.8339
Standard Errors:

SE(Intercept β0) = √( Σε² / (n − 2) ) × √( 1/n + X̄² / Σ(X − X̄)² )
                 = √( 476.49 / 8 ) × √( 1/10 + 8100 / 6360 ) = 7.718 × 1.172 = 9.045

SE(Slope β1) = √( Σε² / (n − 2) × 1 / Σ(X − X̄)² )
             = √( 476.49 / 8 × 1 / 6360 ) = 0.0968
X Y ๐‘Œ (๐‘Œ โˆ’ ๐‘Œ)2 (๐‘Œ โˆ’ ๐‘Œ)2
(๐‘Œ โˆ’ ๐‘Œ)2
91 71 70.61 0.38 1 0.15
97 75 74.29 18.42 25 0.50
108 69 81.04 121.83 1 144.90
121 97 89.01 361.35 729 63.85
67 70 55.90 198.92 0 198.92
124 91 90.85 434.67 441 0.02
51 39 46.08 571.95 961 50.19
73 61 59.58 108.68 81 2.03
111 80 82.88 165.82 100 8.28
57 47 49.76 409.50 529 7.64
90 70 2391.51 2868.00 476.49
๐‘‹ ๐‘Œ ๐‘†๐‘†๐‘…
= ๐‘Œ โˆ’ ๐‘Œ
2
๐‘†๐‘†๐‘‡
= ๐‘Œ โˆ’ ๐‘Œ 2
๐‘†๐‘†๐ธ = ๐œ€2
= ๐‘Œ โˆ’ ๐‘Œ
2
28
@Ravindra Nath Shukla (PhD Scholar) ABV-IIITM
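A sketch verifying the exercise solution from the raw data in the table (pure Python, no libraries):

```python
# Book-store exercise data: purchases (X) and sales (Y), from the table.
X = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
Y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n   # 90, 70

sxx = sum((x - mean_x) ** 2 for x in X)                        # 6360
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))   # 3900
beta1 = sxy / sxx
beta0 = mean_y - beta1 * mean_x

sst = sum((y - mean_y) ** 2 for y in Y)
sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(X, Y))
r2 = 1 - sse / sst

print(round(beta1, 4), round(beta0, 2), round(r2, 4))  # 0.6132 14.81 0.8339
```

Note the intercept comes out as 14.811 with the unrounded slope; the deck's 14.812 uses the rounded slope 0.6132.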
A. Multiple Linear Regression (MLR)
• Predicting an outcome (dependent variable) based upon several independent variables simultaneously.
• Why is this important?
• Behavior is rarely a function of just one variable; it is instead influenced by many variables. So the idea is that we should be able to obtain a more accurate predicted score by using multiple variables to predict our outcome.
[Figure: many-to-one diagram for multiple regression — independent variables such as km travelled (x1) and no. of deliveries (x2) all point to the dependent variable, travel time (y). With several IVs there is potential multicollinearity among them; with 4 IVs and 1 DV there are 10 pairwise relationships to consider.]
➢ The functional form of MLR is given by:

    Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi

➢ The variable Y is the dependent variable (response variable or outcome variable);
➢ X1, X2, ..., Xk are independent variables (predictor variables or explanatory variables);
➢ β0 is a constant;
➢ β1, β2, ..., βk are called the partial regression coefficients corresponding to the explanatory variables X1, X2, ..., Xk respectively; and
➢ εi is the error term (or residual).
➢ If εi = 0, then

    E(Yi) = β0 + β1X1i + β2X2i + ... + βkXki

➢ The estimated value of Y will be

    Ŷ = b0 + b1X1i + b2X2i + ... + bkXki

➢ In MLR each coefficient is interpreted as the estimated change in Y corresponding to a unit change in one independent variable (e.g., X1), when all other variables (X2, X3, ..., Xk) are held constant.
Simple vs. Multiple Regression
New Considerations
➢ Adding more independent variables to a multiple regression procedure does not mean the regression will be "better" or offer better predictions; in fact, it can make things worse. This is called OVERFITTING.
➢ The addition of more independent variables creates more relationships among them. So not only are the independent variables potentially related to the dependent variable, they are also potentially related to each other. When this happens, it is called MULTICOLLINEARITY.
➢ The ideal is for all of the independent variables to be correlated with the dependent variable but NOT with each other.
➢ Because of multicollinearity and overfitting, a fair amount of prep-work is required before conducting multiple regression analysis:
❖ Correlations
❖ Scatter plots
❖ Simple regressions
Steps in MLR
1. Generate a list of potential variables; independent(s) and dependent
2. Collect data on the variables
3. Check the relationships between each independent variable and the
dependent variable using scatterplots and correlations
4. Check the relationships among the independent variables using
scatterplots and correlations
5. Conduct simple linear regressions for each IV/DV pair (Optional).
6. Use the non-redundant independent variables in the analysis to find
the best fitting model
7. Use the best fitting model to make predictions about the dependent
variable.
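Steps 3 and 4 come down to checking correlations. A minimal sketch of that screening on the ADS trip data from the delivery example that follows (the variable names are mine; a cutoff such as |r| > 0.8 for flagging IV/IV pairs is a common rule of thumb, not from the deck):

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

# ADS delivery data from the example that follows.
x1 = [89, 66, 78, 111, 44, 77, 80, 66, 109, 76]    # distance travelled (km)
x2 = [4, 1, 3, 6, 1, 3, 3, 2, 5, 3]                # number of deliveries
y  = [7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4]  # travel time (hrs)

print(round(pearson_r(x1, y), 3))   # 0.928  IV/DV check (step 3)
print(round(pearson_r(x2, y), 3))   # 0.916  IV/DV check (step 3)
print(round(pearson_r(x1, x2), 3))  # 0.956  IV/IV check (step 4): flags multicollinearity
```

The high correlation between x1 and x2 is exactly the multicollinearity the deck points out in Step 6.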
Example
Aditya Delivery Service (ADS) offers same-day delivery for letters, packages, and other small courier parcels. They use Google Maps to group individual deliveries into one trip to reduce time and fuel costs. Some trips include more than one delivery.
The ADS company wants to estimate how long a delivery will take based on three factors:
1) the total distance of the trip in kilometers (KMs),
2) the number of deliveries that must be made during the trip, and
3) the daily price of petrol.
In this case, we can predict the total travel time using the distance traveled, the number of deliveries on each trip, and the daily petrol price.

Step 1: Generate a list of potential variables; independent(s) and dependent.
Step 2: Collect data on the variables.
To conduct this analysis, a random sample of 10 past trips was taken, with four pieces of information recorded for each trip:

Distance Travelled (Kms), (X1) | No. of Deliveries (X2) | Petrol Price ($), (X3) | Travel Time (hrs), (Y)
             89                |           4            |          3.84          |         7
             66                |           1            |          3.19          |         5.4
             78                |           3            |          3.78          |         6.6
            111                |           6            |          3.89          |         7.4
             44                |           1            |          3.57          |         4.8
             77                |           3            |          3.57          |         6.4
             80                |           3            |          3.03          |         7
             66                |           2            |          3.51          |         5.6
            109                |           5            |          3.54          |         7.3
             76                |           3            |          3.25          |         6.4
Step 3: Scatterplot each IV against the DV.

[Figure: scatter plots of Travel Time (hrs), (Y) against each IV —
 vs. Distance Travelled (Kms), (X1): fitted line y = 0.0403x + 3.1856, R² = 0.8615;
 vs. No. of Deliveries (X2): fitted line y = 0.4983x + 4.8454, R² = 0.8399;
 vs. Petrol Price ($), (X3).]

Step 4: Scatterplot each IV against the other IVs.

[Figure: scatter plots among the IVs —
 No. of Deliveries (X2) vs. Distance Travelled (X1): fitted line y = 0.0763x − 2.97, R² = 0.9137;
 Petrol Price (X3) vs. Distance Travelled (X1);
 Petrol Price (X3) vs. No. of Deliveries (X2).]
• Step 5: Conduct simple linear regressions for each IV/DV pair (optional).
• Step 6: Use the non-redundant independent variables in the analysis to find the best-fitting model.
  ▪ In our example, Step 3 suggests that Petrol Price ($), (X3) is redundant; including it risks overfitting.
  ▪ Either X1 or X2 is also redundant, as X1 and X2 show multicollinearity.
• Step 7: Use the best-fitting model to make predictions about the dependent variable.
Excel: Multiple regression
For practice purposes, we solve the example above considering one response variable Y and three predictor variables X1, X2, and X3 (the same 10-trip dataset given earlier).
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.946
R Square            0.895
Adjusted R Square   0.842
Standard Error      0.345
Observations        10

ANOVA
             df     SS      MS      F        Significance F
Regression    3    6.056   2.019   16.991    0.002
Residual      6    0.713   0.119
Total         9    6.769

                                 Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept                            6.211         2.321        2.677    0.037     0.533     11.890
Distance Travelled (Kms), (X1)       0.014         0.022        0.636    0.548    −0.040      0.068
No. of Deliveries (X2)               0.383         0.300        1.277    0.249    −0.351      1.117
Petrol Price ($), (X3)              −0.607         0.527       −1.152    0.293    −1.895      0.682
Regression Equation:

    Ŷi = 6.211 + 0.014 X1i + 0.383 X2i − 0.607 X3i
Manual solution: Multiple regression
Suppose, for practice purposes, we use Y as the dependent variable and X1 and X2 as independent variables.

Step 1: Calculate X1², X2², X1Y, X2Y and X1X2.

 Travel Time (hrs), (Y) | Distance (Kms), (X1) | Deliveries (X2) |  X1²  | X2² |  X1Y   |  X2Y  | X1X2
          7             |          89          |        4        |  7921 |  16 |  623   |  28   |  356
          5.4           |          66          |        1        |  4356 |   1 |  356.4 |   5.4 |   66
          6.6           |          78          |        3        |  6084 |   9 |  514.8 |  19.8 |  234
          7.4           |         111          |        6        | 12321 |  36 |  821.4 |  44.4 |  666
          4.8           |          44          |        1        |  1936 |   1 |  211.2 |   4.8 |   44
          6.4           |          77          |        3        |  5929 |   9 |  492.8 |  19.2 |  231
          7             |          80          |        3        |  6400 |   9 |  560   |  21   |  240
          5.6           |          66          |        2        |  4356 |   4 |  369.6 |  11.2 |  132
          7.3           |         109          |        5        | 11881 |  25 |  795.7 |  36.5 |  545
          6.4           |          76          |        3        |  5776 |   9 |  486.4 |  19.2 |  228
 SUM:    63.9           |         796          |       31        | 66960 | 119 | 5231.3 | 209.5 | 2742
 Mean:    6.39          |          79.6        |        3.1      |
Step 2: Calculate the regression sums (deviation form), with n = 10, ΣX1 = 796, ΣX2 = 31, ΣY = 63.9:

Σx1² = ΣX1² − (ΣX1)²/n = 66960 − (796)²/10 = 66960 − 63361.6 = 3598.4
Σx2² = ΣX2² − (ΣX2)²/n = 119 − (31)²/10 = 119 − 96.1 = 22.9
Σx1y = ΣX1Y − (ΣX1)(ΣY)/n = 5231.3 − (796 × 63.9)/10 = 5231.3 − 5086.44 = 144.86
Σx2y = ΣX2Y − (ΣX2)(ΣY)/n = 209.5 − (31 × 63.9)/10 = 209.5 − 198.09 = 11.41
Σx1x2 = ΣX1X2 − (ΣX1)(ΣX2)/n = 2742 − (796 × 31)/10 = 2742 − 2467.6 = 274.4
Step 3: Calculate b0, b1, and b2.

b1 = [ (Σx2²)(Σx1y) − (Σx1x2)(Σx2y) ] / [ (Σx1²)(Σx2²) − (Σx1x2)² ]
   = [ (22.9 × 144.86) − (274.4 × 11.41) ] / [ (3598.4 × 22.9) − (274.4)² ]
   = (3317.29 − 3130.90) / (82403.36 − 75295.36)
   = 186.39 / 7108.0 = 0.02622

b2 = [ (Σx1²)(Σx2y) − (Σx1x2)(Σx1y) ] / [ (Σx1²)(Σx2²) − (Σx1x2)² ]
   = [ (3598.4 × 11.41) − (274.4 × 144.86) ] / 7108.0
   = (41057.74 − 39749.58) / 7108.0
   = 1308.16 / 7108.0 = 0.1840

b0 = Ȳ − b1X̄1 − b2X̄2 = 6.39 − (0.02622 × 79.6) − (0.1840 × 3.1)
   = 6.39 − 2.087 − 0.570 = 3.733
Step 4: Place b0, b1, and b2 in the estimated linear regression equation:

    Ŷ = 3.733 + 0.02622 X1 + 0.1840 X2
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9335
R Square            0.8714
Adjusted R Square   0.8347
Standard Error      0.3526
Observations        10

ANOVA
             df     SS       MS       F         Significance F
Regression    2    5.8985   2.9493   23.7161    0.0008
Residual      7    0.8705   0.1244
Total         9    6.769

               Coefficients  Standard Error  t Stat   P-value
Intercept         3.7322         0.8870       4.2077   0.0040
X Variable 1      0.0262         0.0200       1.3101   0.2315
X Variable 2      0.1840         0.2509       0.7335   0.4871
Steps for manual solution: Multiple regression
Suppose we have the following dataset with one response variable Y and two predictor variables X1 and X2, with n = 8.

Step 1: Calculate X1², X2², X1Y, X2Y and X1X2.
   ΣX1² = 38767, ΣX2² = 2823, ΣX1Y = 101895, ΣX2Y = 25364, ΣX1X2 = 9859
   ΣX1 = 555, ΣX2 = 145, ΣY = 1452
Step 2: Calculate the regression sums (deviation form):
   Σx1² = 263.875, Σx2² = 194.875, Σx1y = 1162.5, Σx2y = −953.5, Σx1x2 = −200.375
Step 3: Calculate b0, b1, and b2.
Step 4: Place b0, b1, and b2 in the estimated linear regression equation.
Regression analysis:  Simple Linear Regression Multiple Linear Regression

More Related Content

What's hot

Chapter10
Chapter10Chapter10
Chapter10
rwmiller
ย 

What's hot (20)

Normal Distribution.pptx
Normal Distribution.pptxNormal Distribution.pptx
Normal Distribution.pptx
ย 
Regression
RegressionRegression
Regression
ย 
Survival analysis 1
Survival analysis 1Survival analysis 1
Survival analysis 1
ย 
Business Statistics Chapter 9
Business Statistics Chapter 9Business Statistics Chapter 9
Business Statistics Chapter 9
ย 
Chap10 hypothesis testing ; additional topics
Chap10 hypothesis testing ; additional topicsChap10 hypothesis testing ; additional topics
Chap10 hypothesis testing ; additional topics
ย 
Statistics lecture 8 (chapter 7)
Statistics lecture 8 (chapter 7)Statistics lecture 8 (chapter 7)
Statistics lecture 8 (chapter 7)
ย 
Testing a Claim About a Standard Deviation or Variance
Testing a Claim About a Standard Deviation or VarianceTesting a Claim About a Standard Deviation or Variance
Testing a Claim About a Standard Deviation or Variance
ย 
The t Test for Two Independent Samples
The t Test for Two Independent SamplesThe t Test for Two Independent Samples
The t Test for Two Independent Samples
ย 
Chapter10
Chapter10Chapter10
Chapter10
ย 
T test
T testT test
T test
ย 
Basics of Hypothesis Testing
Basics of Hypothesis TestingBasics of Hypothesis Testing
Basics of Hypothesis Testing
ย 
Testing a Claim About a Proportion
Testing a Claim About a ProportionTesting a Claim About a Proportion
Testing a Claim About a Proportion
ย 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
ย 
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik JakartaSurvival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
ย 
Hypothesis
HypothesisHypothesis
Hypothesis
ย 
Confidence Interval Estimation
Confidence Interval EstimationConfidence Interval Estimation
Confidence Interval Estimation
ย 
Testing Hypothesis
Testing HypothesisTesting Hypothesis
Testing Hypothesis
ย 
Basics of Hypothesis Testing
Basics of Hypothesis Testing  Basics of Hypothesis Testing
Basics of Hypothesis Testing
ย 
Mann Whitney U test
Mann Whitney U testMann Whitney U test
Mann Whitney U test
ย 
Chi sqr
Chi sqrChi sqr
Chi sqr
ย 

Similar to Regression analysis: Simple Linear Regression Multiple Linear Regression

Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Neeraj Bhandari
ย 
Regression analysis
Regression analysisRegression analysis
Regression analysis
saba khan
ย 
Business Quantitative - Lecture 2
Business Quantitative - Lecture 2Business Quantitative - Lecture 2
Business Quantitative - Lecture 2
saark
ย 

Similar to Regression analysis: Simple Linear Regression Multiple Linear Regression (20)

Bba 3274 qm week 6 part 1 regression models
Bba 3274 qm week 6 part 1 regression modelsBba 3274 qm week 6 part 1 regression models
Bba 3274 qm week 6 part 1 regression models
ย 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
ย 
Reg
RegReg
Reg
ย 
REGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HEREREGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HERE
ย 
Regression
Regression Regression
Regression
ย 
Regression analysis
Regression analysisRegression analysis
Regression analysis
ย 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JM
ย 
Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Regression Analysis by Muthama JM
Regression Analysis by Muthama JM
ย 
Chapter III.pptx
Chapter III.pptxChapter III.pptx
Chapter III.pptx
ย 
Rsh qam11 ch04 ge
Rsh qam11 ch04 geRsh qam11 ch04 ge
Rsh qam11 ch04 ge
ย 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help
ย 
Regression analysis
Regression analysisRegression analysis
Regression analysis
ย 
Regression
RegressionRegression
Regression
ย 
Regression analysis algorithm
Regression analysis algorithm Regression analysis algorithm
Regression analysis algorithm
ย 
Lesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing dataLesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing data
ย 
Business Quantitative - Lecture 2
Business Quantitative - Lecture 2Business Quantitative - Lecture 2
Business Quantitative - Lecture 2
ย 
Regression
RegressionRegression
Regression
ย 
Regression
RegressionRegression
Regression
ย 
ML Module 3.pdf
ML Module 3.pdfML Module 3.pdf
ML Module 3.pdf
ย 
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
Data Approximation in Mathematical Modelling Regression Analysis and Curve Fi...
ย 

More from Ravindra Nath Shukla (9)

Capital Budgeting : capital budgeting decision
Capital Budgeting : capital budgeting decisionCapital Budgeting : capital budgeting decision
Capital Budgeting : capital budgeting decision
ย 
Market Efficiency.pptx
Market Efficiency.pptxMarket Efficiency.pptx
Market Efficiency.pptx
ย 
Communication Extra classes.pptx
Communication  Extra classes.pptxCommunication  Extra classes.pptx
Communication Extra classes.pptx
ย 
PPT BC class.pptx
PPT BC class.pptxPPT BC class.pptx
PPT BC class.pptx
ย 
Flexible Budget.pptx
Flexible Budget.pptxFlexible Budget.pptx
Flexible Budget.pptx
ย 
Buerocratic model
Buerocratic modelBuerocratic model
Buerocratic model
ย 
Discharge of contract
Discharge of contractDischarge of contract
Discharge of contract
ย 
Market Potential of Organic food Product
Market Potential  of Organic food ProductMarket Potential  of Organic food Product
Market Potential of Organic food Product
ย 
Marketing : Concept, Tools and Retention
Marketing : Concept, Tools and RetentionMarketing : Concept, Tools and Retention
Marketing : Concept, Tools and Retention
ย 

Recently uploaded

Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
daisycvs
ย 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
daisycvs
ย 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Regression analysis: Simple Linear Regression Multiple Linear Regression

Classical Assumptions of Linear Regression

1. Linear relationship: the relationship between X and Y is linear.
2. No autocorrelation: the error terms are not correlated with one another.
3. No multicollinearity: the independent variables are not correlated with each other.
4. Homoscedasticity: the variance of the error term is the same across all values of the independent variables.
5. Multivariate normality: the residuals should be normally distributed.
Mathematical Expression of Linear Regression

๏ต Assume Y is a dependent variable that depends on the realization of the independent variable X.
๏ต We know the equation of a simple straight line is:

    Y = mX + c        (1)

๏ต where m = gradient or slope, and
๏ต c = y-intercept, the height at which the line crosses the y-axis.
๏ต For the SLR model, Equation (1) can be written as:

    Y = ฮฒ0 + ฮฒ1X        (2)

๏ต where ฮฒ0 = y-intercept = the value where the regression line crosses the y-axis, and
๏ต ฮฒ1 = coefficient or slope of X.
๏ต The slope or coefficient of X denotes the relationship between X and Y.
๏ต For a unit change in X, Y changes by ฮฒ1 units.
๏ต In other words, ฮฒ1 represents the sensitivity of Y to changes in X.
๏ต In the equation Y = ฮฒ0 + ฮฒ1X, ฮฒ0 gives the value of the variable Y when X = 0.
Example

Consider the demand data given in the table below.

    Month, Independent Variable (X):   1   2   3   4   5   6   7   8   9   10
    Demand, Dependent Variable (Y):    9  15  32  48  52  60  39  65  90   93

๏ต Fitting a line to these data gives y = 2.7333 + 8.6485x.
๏ต Here, the slope or coefficient of X is ฮฒ1 = 8.6485, which denotes that for one unit change in X, Y changes by 8.6485 units.
๏ต In general, the data points do not all fall on the regression line.
๏ต Rather, there is some error or random component, which can be measured as the distance between the true value and the predicted value.
๏ต The regression model must include this error term:

    Y = ฮฒ0 + ฮฒ1X + ฮตi        (3)

๏ต Here ฮฒ0 + ฮฒ1X is the non-random component and ฮตi is the random component.
๏ต For sample data, Equation (3) can be written as:

    y = b0 + b1x + ฮตi        (4)
ORDINARY LEAST SQUARES (OLS)

๏ต To minimize the error or random component, SLR uses the OLS method to calculate the values of ฮฒฬ‚0 and ฮฒฬ‚1 as:

    ฮฒฬ‚1 = ฮฃ(Xi − Xฬ„)(Yi − Ȳ) / ฮฃ(Xi − Xฬ„)²  =  Cov(X, Y) / Var(X)

๏ต The intercept:

    ฮฒฬ‚0 = Ȳ − ฮฒฬ‚1Xฬ„
Standard Error of Slope and Intercept

    SE(ฮฒฬ‚0) = √( ฮฃฮตi² / (n − 2) ) × √( 1/n + Xฬ„² / ฮฃ(Xi − Xฬ„)² )

    SE(ฮฒฬ‚1) = √( ( ฮฃฮตi² / (n − 2) ) / ฮฃ(Xi − Xฬ„)² )
Explanation of the OLS Method

๏ต To minimize the error or random component, SLR uses the OLS method.
๏ต The method of least squares gives the best equation under the assumptions stated below:

1. The regression model is linear in the regression parameters.
2. The explanatory variable, X, is assumed to be non-random or non-stochastic (i.e., X is deterministic).
3. The conditional expected value of the error terms or residuals, E(ฮตi | Xi), is zero.
4. In the case of time-series data, the error terms are uncorrelated, that is, Cov(ฮตi, ฮตj) = 0 for all i ≠ j.
5. The variance of the errors, Var(ฮตi | Xi), is constant for all values of Xi (homoscedasticity).
6. The error terms, ฮตi, follow a normal distribution.
๏ต In ordinary least squares, the objective is to find the optimal values of ฮฒ0 and ฮฒ1 that minimize the Sum of Squared Errors (SSE) given in Eq. (6):

    SSE = ฮฃฮตi² = ฮฃ(Yi − ฮฒ0 − ฮฒ1Xi)²        (6)
๏ต To find the optimal values of ฮฒ0 and ฮฒ1 that minimize SSE, we equate the partial derivatives of SSE with respect to ฮฒ0 and ฮฒ1 to zero [Eqs. (7) and (9)]:

    ∂SSE/∂ฮฒ0 = −2 ฮฃ(Yi − ฮฒ0 − ฮฒ1Xi) = 0        (7)

๏ต Solving Eq. (7) for ฮฒ0, the estimated value of ฮฒ0 is given by

    ฮฒฬ‚0 = Ȳ − ฮฒฬ‚1Xฬ„        (8)

๏ต Differentiating SSE with respect to ฮฒ1, we get

    ∂SSE/∂ฮฒ1 = −2 ฮฃ Xi(Yi − ฮฒ0 − ฮฒ1Xi) = 0        (9)

๏ต Substituting the value of ฮฒ0 from Eq. (8) into Eq. (9), we get the estimator of ฮฒ1.
๏ต Thus, the value of ฮฒฬ‚1 is given by

    ฮฒฬ‚1 = ฮฃ(Xi − Xฬ„)(Yi − Ȳ) / ฮฃ(Xi − Xฬ„)²  =  Cov(X, Y) / Var(X)
Solution (Manual Method)

    X    Y    dx = X−Xฬ„   dy = Y−Ȳ   (X−Xฬ„)²   (Y−Ȳ)²    (X−Xฬ„)(Y−Ȳ)
    1    9      −4.5      −41.3      20.25    1705.69     185.85
    2    15     −3.5      −35.3      12.25    1246.09     123.55
    3    32     −2.5      −18.3       6.25     334.89      45.75
    4    48     −1.5       −2.3       2.25       5.29       3.45
    5    52     −0.5        1.7       0.25       2.89      −0.85
    6    60      0.5        9.7       0.25      94.09       4.85
    7    39      1.5      −11.3       2.25     127.69     −16.95
    8    65      2.5       14.7       6.25     216.09      36.75
    9    90      3.5       39.7      12.25    1576.09     138.95
    10   93      4.5       42.7      20.25    1823.29     192.15

    Xฬ„ = 5.5, Ȳ = 50.3;  ฮฃ(X−Xฬ„)² = 82.5,  ฮฃ(Y−Ȳ)² = 7132.1,  ฮฃ(X−Xฬ„)(Y−Ȳ) = 713.5

ฮฒ1 = ฮฃ(X−Xฬ„)(Y−Ȳ) / ฮฃ(X−Xฬ„)² = 713.5 / 82.5 = 8.6485
ฮฒ0 = Ȳ − ฮฒ1Xฬ„ = 50.3 − 8.6485 × 5.5 = 2.733

Regression equation: Y = ฮฒ0 + ฮฒ1X = 2.733 + 8.6485X
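The manual method above can be sketched in a few lines of Python (an illustrative sketch using only the standard library; the variable names are ours, and the data are the month/demand values from the example):

```python
# OLS by hand: beta1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]
n = len(X)

x_bar = sum(X) / n                                            # 5.5
y_bar = sum(Y) / n                                            # 50.3

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))    # 713.5
sxx = sum((x - x_bar) ** 2 for x in X)                        # 82.5

beta1 = sxy / sxx                                             # ≈ 8.6485
beta0 = y_bar - beta1 * x_bar                                 # ≈ 2.7333

# Forecast for a new period, e.g. period 11 (as on the forecasting slide)
forecast_11 = beta0 + beta1 * 11                              # ≈ 97.87
```

Running this reproduces the table's ฮฒ1 = 8.6485 and ฮฒ0 = 2.733.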
Solution (Excel)

Applying Excel's built-in functions (e.g., SLOPE and INTERCEPT) to the same data gives slope b = 8.65 and intercept a = 2.73; the fitted trendline is y = 8.6485x + 2.7333 with R² = 0.8652.
๏ƒ˜ The Excel functions give b = 8.65 and a = 2.73.
๏ƒ˜ Use them in the equation Y = a + bX to make a forecast.
๏ƒ˜ For example, for period 11 (X = 11): Forecast = 2.7333 + 11 × 8.6485 = 97.87.
๏ƒ˜ Similarly, for period 12: Forecast = 2.7333 + 12 × 8.6485 = 106.52.
Coefficient of Determination

๏ƒ˜ The coefficient of determination (R², where R is the coefficient of correlation) measures the proportion of the variability in the dependent variable that is accounted for by the regression line.
๏ƒ˜ Calculation:

    R² = SSR / SST = ฮฃ(ลถi − Ȳ)² / ฮฃ(Yi − Ȳ)² = 1 − SSE / SST
๏ƒ˜ where Yi = actual value of Y, ลถ = estimated value of Y, and Ȳ = mean value of Y.
๏ƒ˜ The coefficient of determination always falls between 0 and 1.
๏ƒ˜ For example, if r = 0.8, the coefficient of determination is r² = 0.64, meaning that 64% of the variation in Y is due to variation in X.
๏ƒ˜ The remaining 36% of the variation in Y is due to other variables.
๏ƒ˜ If the coefficient of determination is low, multiple regression analysis may be used to account for all the variables affecting the dependent variable Y.
Solution: Coefficient of Determination and Standard Errors

    X    Y     ลถ         (ลถ−Ȳ)²     (Y−Ȳ)²     ฮต² = (Y−ลถ)²
    1    9     11.3815   1514.65    1705.69       5.67
    2    15    20.0303    916.27    1246.09      25.30
    3    32    28.6788    467.49     334.89      11.03
    4    48    37.3273    168.30       5.29     113.91
    5    52    45.9758     18.70       2.89      36.29
    6    60    54.6242     18.70      94.09      28.90
    7    39    63.2727    168.29     127.69     589.15
    8    65    71.9212    467.47     216.09      47.90
    9    90    80.5697    916.24    1576.09      88.93
    10   93    89.2182   1514.61    1823.29      14.30

    SSR = ฮฃ(ลถ−Ȳ)² = 6170.72,  SST = ฮฃ(Y−Ȳ)² = 7132.1,  SSE = ฮฃฮต² = 961.41

R² = SSR / SST = 6170.72 / 7132.1 = 0.8652
R² = 1 − SSE / SST = 1 − 961.41 / 7132.1 = 0.8652

SE(ฮฒฬ‚0) = √(961.41 / 8) × √(1/10 + 30.25/82.5) = 10.96 × 0.683 = 7.488
SE(ฮฒฬ‚1) = √( (961.41 / 8) × 1/82.5 ) = 1.2069
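The R² and standard-error computations above can be sketched in plain Python (a sketch with our own variable names, reusing the demand data):

```python
import math

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [9, 15, 32, 48, 52, 60, 39, 65, 90, 93]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Fit the line first (same OLS estimates as before)
sxx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

# Sums of squares
sse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))   # ≈ 961.41
sst = sum((y - y_bar) ** 2 for y in Y)                # 7132.1
ssr = sum((yh - y_bar) ** 2 for yh in Y_hat)          # ≈ 6170.7
r_squared = ssr / sst                                 # ≈ 0.8652 (= 1 - sse/sst)

# Standard errors, with residual variance estimate SSE / (n - 2)
mse = sse / (n - 2)
se_slope = math.sqrt(mse / sxx)                                       # ≈ 1.2069
se_intercept = math.sqrt(mse) * math.sqrt(1 / n + x_bar ** 2 / sxx)   # ≈ 7.489
```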
Exercise

1. From the following data for a book store, ABC, derive the regression equation for the effect of purchases (X) on sales of books (Y). Also calculate the standard errors and the coefficient of determination.

    Purchases (X):  91  97  108  121  67  124  51  73  111  57
    Sales (Y):      71  75   69   97  70   91  39  61   80  47
Solution 1

ฮฒ1 = ฮฃ(X−Xฬ„)(Y−Ȳ) / ฮฃ(X−Xฬ„)² = 3900 / 6360 = 0.6132
ฮฒ0 = Ȳ − ฮฒ1Xฬ„ = 70 − 0.6132 × 90 = 14.812

Regression equation: Y = ฮฒ0 + ฮฒ1X = 14.812 + 0.6132X
๏ต The coefficient of determination:

    R² = SSR / SST = ฮฃ(ลถ−Ȳ)² / ฮฃ(Y−Ȳ)² = 2391.51 / 2868 = 0.8339

๏ต Standard errors:

    SE(ฮฒฬ‚0) = √(476.49 / 8) × √(1/10 + 8100/6360) = 7.718 × 1.172 = 9.045
    SE(ฮฒฬ‚1) = √( (476.49 / 8) × 1/6360 ) = 0.0968

    X     Y     ลถ        (ลถ−Ȳ)²    (Y−Ȳ)²    (Y−ลถ)²
    91    71    70.61      0.38       1        0.15
    97    75    74.29     18.42      25        0.50
    108   69    81.04    121.83       1      144.90
    121   97    89.01    361.35     729       63.85
    67    70    55.90    198.92       0      198.92
    124   91    90.85    434.67     441        0.02
    51    39    46.08    571.95     961       50.19
    73    61    59.58    108.68      81        2.03
    111   80    82.88    165.82     100        8.28
    57    47    49.76    409.50     529        7.64

    Xฬ„ = 90, Ȳ = 70;  SSR = ฮฃ(ลถ−Ȳ)² = 2391.51,  SST = ฮฃ(Y−Ȳ)² = 2868,  SSE = ฮฃ(Y−ลถ)² = 476.49
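The exercise solution can be checked with a short script (a sketch; X and Y are taken from the solution table, names are ours):

```python
# Verify the book-store regression: Y = 14.812 + 0.6132 X
X = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]   # purchases
Y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]       # sales
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n              # 90, 70

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))   # 3900
sxx = sum((x - x_bar) ** 2 for x in X)                       # 6360
b1 = sxy / sxx                                     # ≈ 0.6132
b0 = y_bar - b1 * x_bar                            # ≈ 14.81

sst = sum((y - y_bar) ** 2 for y in Y)                       # 2868
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))    # ≈ 476.5
r_squared = 1 - sse / sst                          # ≈ 0.834
```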
B. Multiple Linear Regression (MLR)

• Predicting an outcome (dependent variable) based upon several independent variables simultaneously.
• Why is this important?
• Behavior is rarely a function of just one variable; it is instead influenced by many variables. So the idea is that we should be able to obtain a more accurate predicted score by using multiple variables to predict our outcome.
[Diagram: multiple regression is a many-to-one relationship — several independent variables (e.g., km travelled (x1) and no. of deliveries (x2)) jointly predict one dependent variable (travel time (y)). With four IVs and one DV there are 10 pairwise relationships to consider, including potential multicollinearity among the IVs.]
๏ƒ˜ The functional form of MLR is given by

    Yi = ฮฒ0 + ฮฒ1X1i + ฮฒ2X2i + ... + ฮฒkXki + ฮตi

๏ƒ˜ The variable Y is the dependent variable (response variable or outcome variable);
๏ƒ˜ X1, X2, …, Xk are independent variables (predictor variables or explanatory variables);
๏ƒ˜ ฮฒ0 is a constant;
๏ƒ˜ ฮฒ1, ฮฒ2, …, ฮฒk are called the partial regression coefficients corresponding to the explanatory variables X1, X2, …, Xk respectively; and
๏ƒ˜ ฮตi is the error term (or residual).
๏ƒ˜ If ฮตi = 0, then

    E(Y) = ฮฒ0 + ฮฒ1X1i + ฮฒ2X2i + ... + ฮฒkXki

๏ƒ˜ The estimated value of Y will be

    ลถ = b0 + b1X1i + b2X2i + ... + bkXki

๏ƒ˜ In MLR, each coefficient is interpreted as the estimated change in Y corresponding to a unit change in one independent variable (e.g., X1), when all other variables (X2, X3, …, Xk) are held constant.
Simple vs. Multiple Regression
New Considerations

๏ƒ˜ Adding more independent variables to a multiple regression procedure does not mean the regression will be "better" or offer better predictions; in fact, it can make things worse. This is called OVERFITTING.
๏ƒ˜ The addition of more independent variables creates more relationships among them. So not only are the independent variables potentially related to the dependent variable, they are also potentially related to each other. When this happens, it is called MULTICOLLINEARITY.
๏ƒ˜ The ideal is for all of the independent variables to be correlated with the dependent variable but NOT with each other.
๏ƒ˜ Because of multicollinearity and overfitting, a fair amount of prep work is required before conducting a multiple regression analysis:
๏ถ Correlations
๏ถ Scatter plots
๏ถ Simple regressions
Steps in MLR

1. Generate a list of potential variables: independent(s) and dependent.
2. Collect data on the variables.
3. Check the relationships between each independent variable and the dependent variable using scatterplots and correlations.
4. Check the relationships among the independent variables using scatterplots and correlations.
5. Conduct simple linear regressions for each IV/DV pair (optional).
6. Use the non-redundant independent variables in the analysis to find the best-fitting model.
7. Use the best-fitting model to make predictions about the dependent variable.
Example

Aditya Delivery Service (ADS) offers same-day delivery for letters, packages, and other small courier parcels. They use Google Maps to group individual deliveries into one trip to reduce time and fuel costs. Some trips include more than one delivery. The ADS company wants to estimate how long a delivery will take based on three factors:
1) the total distance of the trip in kilometers (km),
2) the number of deliveries that must be made during the trip, and
3) the daily price of petrol.
Step 1: Generate a list of potential variables, independent(s) and dependent.
Step 2: Collect data on the variables.

In this case, we can predict the total travel time using the distance travelled, the number of deliveries on each trip, and the daily petrol price. To conduct this analysis, a random sample of 10 past trips was taken and four pieces of information recorded for each trip:

    Distance Travelled (km), X1   No. of Deliveries, X2   Petrol Price ($), X3   Travel Time (hrs), Y
    89                            4                       3.84                   7
    66                            1                       3.19                   5.4
    78                            3                       3.78                   6.6
    111                           6                       3.89                   7.4
    44                            1                       3.57                   4.8
    77                            3                       3.57                   6.4
    80                            3                       3.03                   7
    66                            2                       3.51                   5.6
    109                           5                       3.54                   7.3
    76                            3                       3.25                   6.4
Step 3: Scatterplot each IV against the DV.
[Charts: travel time vs. distance travelled, trendline y = 0.0403x + 3.1856, R² = 0.8615; travel time vs. no. of deliveries, trendline y = 0.4983x + 4.8454, R² = 0.8399; travel time vs. petrol price shows no clear linear pattern.]

Step 4: Scatterplot the IVs against each other.
[Charts: no. of deliveries vs. distance travelled, trendline y = 0.0763x − 2.97, R² = 0.9137, indicating strong correlation between X1 and X2; petrol price vs. distance and petrol price vs. deliveries show no strong pattern.]
๏ต Step 5: Conduct simple linear regressions for each IV/DV pair (optional).
๏ต Step 6: Use the non-redundant independent variables in the analysis to find the best-fitting model.
๏‚ง In our example, Step 3 suggests that Petrol Price ($), X3, is redundant; keeping it risks overfitting.
๏‚ง Either X1 or X2 is also redundant, as X1 and X2 show multicollinearity.
๏ต Step 7: Use the best-fitting model to make predictions about the dependent variable.
Excel: Multiple Regression

For practice purposes, we solve the example above considering one response variable Y and the three predictor variables X1, X2, and X3 (data as in Step 2).
SUMMARY OUTPUT

Regression Statistics
    Multiple R          0.946
    R Square            0.895
    Adjusted R Square   0.842
    Standard Error      0.345
    Observations        10

ANOVA
                df    SS      MS      F        Significance F
    Regression  3     6.056   2.019   16.991   0.002
    Residual    6     0.713   0.119
    Total       9     6.769

                                  Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
    Intercept                     6.211          2.321             2.677    0.037      0.533      11.890
    Distance Travelled (km), X1   0.014          0.022             0.636    0.548     −0.040       0.068
    No. of Deliveries, X2         0.383          0.300             1.277    0.249     −0.351       1.117
    Petrol Price ($), X3          −0.607         0.527            −1.152    0.293     −1.895       0.682

Regression equation: ลถi = 6.211 + 0.014X1i + 0.383X2i − 0.607X3i
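Assuming NumPy is available, the Excel output above can be reproduced with a least-squares solve (a sketch; the data and column roles follow the ADS example, variable names are ours):

```python
import numpy as np

X1 = [89, 66, 78, 111, 44, 77, 80, 66, 109, 76]                      # distance (km)
X2 = [4, 1, 3, 6, 1, 3, 3, 2, 5, 3]                                  # deliveries
X3 = [3.84, 3.19, 3.78, 3.89, 3.57, 3.57, 3.03, 3.51, 3.54, 3.25]    # petrol price ($)
Y = np.array([7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4])         # travel time (hrs)

# Design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones(len(Y)), X1, X2, X3])

# Ordinary least squares: minimize ||A b - Y||^2
coef, _, _, _ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2, b3 = coef                        # ≈ 6.211, 0.014, 0.383, -0.607

Y_hat = A @ coef
r_squared = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)  # ≈ 0.895
```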
Manual Solution: Multiple Regression

Suppose, for practice purposes, we use Y as the dependent variable and X1 and X2 as the independent variables:

    Travel Time (hrs), Y   Distance Travelled (km), X1   No. of Deliveries, X2
    7                      89                            4
    5.4                    66                            1
    6.6                    78                            3
    7.4                    111                           6
    4.8                    44                            1
    6.4                    77                            3
    7                      80                            3
    5.6                    66                            2
    7.3                    109                           5
    6.4                    76                            3
Step 1: Calculate X1², X2², X1Y, X2Y, and X1X2.

    Y     X1    X2    X1²     X2²   X1Y     X2Y    X1X2
    7     89    4     7921    16    623     28     356
    5.4   66    1     4356    1     356.4   5.4    66
    6.6   78    3     6084    9     514.8   19.8   234
    7.4   111   6     12321   36    821.4   44.4   666
    4.8   44    1     1936    1     211.2   4.8    44
    6.4   77    3     5929    9     492.8   19.2   231
    7     80    3     6400    9     560     21     240
    5.6   66    2     4356    4     369.6   11.2   132
    7.3   109   5     11881   25    795.7   36.5   545
    6.4   76    3     5776    9     486.4   19.2   228

    Sums:  ฮฃY = 63.9, ฮฃX1 = 796, ฮฃX2 = 31, ฮฃX1² = 66960, ฮฃX2² = 119, ฮฃX1Y = 5231.3, ฮฃX2Y = 209.5, ฮฃX1X2 = 2742
    Means: Ȳ = 6.39, Xฬ„1 = 79.6, Xฬ„2 = 3.1
Step 2: Calculate the regression sums.

    ฮฃx1²  = ฮฃX1² − (ฮฃX1)²/n    = 66960 − 796²/10        = 66960 − 63361.6 = 3598.4
    ฮฃx2²  = ฮฃX2² − (ฮฃX2)²/n    = 119 − 31²/10           = 119 − 96.1 = 22.9
    ฮฃx1y  = ฮฃX1Y − (ฮฃX1)(ฮฃY)/n = 5231.3 − 796 × 63.9/10 = 5231.3 − 5086.44 = 144.86
    ฮฃx2y  = ฮฃX2Y − (ฮฃX2)(ฮฃY)/n = 209.5 − 31 × 63.9/10   = 209.5 − 198.09 = 11.41
    ฮฃx1x2 = ฮฃX1X2 − (ฮฃX1)(ฮฃX2)/n = 2742 − 796 × 31/10   = 2742 − 2467.6 = 274.4
Calculation of b0, b1, and b2

๏ต The formula to calculate b1 is:

    b1 = [ (ฮฃx2²)(ฮฃx1y) − (ฮฃx1x2)(ฮฃx2y) ] / [ (ฮฃx1²)(ฮฃx2²) − (ฮฃx1x2)² ]

๏ต The formula to calculate b2 is:

    b2 = [ (ฮฃx1²)(ฮฃx2y) − (ฮฃx1x2)(ฮฃx1y) ] / [ (ฮฃx1²)(ฮฃx2²) − (ฮฃx1x2)² ]

๏ต The formula to calculate b0 is:

    b0 = Ȳ − b1Xฬ„1 − b2Xฬ„2
Step 3: Calculate b0, b1, and b2.

    b1 = [22.9 × 144.86 − 274.4 × 11.41] / [3598.4 × 22.9 − 274.4²]
       = (3317.29 − 3130.90) / (82403.36 − 75295.36) = 186.39 / 7108.00 = 0.02622

    b2 = [3598.4 × 11.41 − 274.4 × 144.86] / 7108.00
       = (41057.74 − 39749.58) / 7108.00 = 1308.16 / 7108.00 = 0.18404

    b0 = Ȳ − b1Xฬ„1 − b2Xฬ„2 = 6.39 − 0.02622 × 79.6 − 0.18404 × 3.1 = 6.39 − 2.087 − 0.570 = 3.733
Step 4: Place b0, b1, and b2 in the estimated linear regression equation:

    ลถ = 3.733 + 0.02622X1 + 0.1840X2
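Steps 1 through 4 above can be sketched as a short script (plain Python; variable names are ours, data from the ADS example):

```python
# Two-predictor regression via the "regression sums" (normal equations) method
Y = [7, 5.4, 6.6, 7.4, 4.8, 6.4, 7, 5.6, 7.3, 6.4]
X1 = [89, 66, 78, 111, 44, 77, 80, 66, 109, 76]
X2 = [4, 1, 3, 6, 1, 3, 3, 2, 5, 3]
n = len(Y)

# Step 2: corrected sums of squares and cross-products
s11 = sum(x * x for x in X1) - sum(X1) ** 2 / n                    # 3598.4
s22 = sum(x * x for x in X2) - sum(X2) ** 2 / n                    # 22.9
s1y = sum(x * y for x, y in zip(X1, Y)) - sum(X1) * sum(Y) / n     # 144.86
s2y = sum(x * y for x, y in zip(X2, Y)) - sum(X2) * sum(Y) / n     # 11.41
s12 = sum(a * b for a, b in zip(X1, X2)) - sum(X1) * sum(X2) / n   # 274.4

# Step 3: solve the two normal equations
den = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / den                                 # ≈ 0.02622
b2 = (s11 * s2y - s12 * s1y) / den                                 # ≈ 0.18404
b0 = sum(Y) / n - b1 * sum(X1) / n - b2 * sum(X2) / n              # ≈ 3.733
```

The results agree with both the hand computation and the Excel summary output that follows.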
SUMMARY OUTPUT

Regression Statistics
    Multiple R          0.9335
    R Square            0.8714
    Adjusted R Square   0.8347
    Standard Error      0.3526
    Observations        10

ANOVA
                df    SS       MS       F         Significance F
    Regression  2     5.8985   2.9493   23.7161   0.0008
    Residual    7     0.8705   0.1244
    Total       9     6.769

                   Coefficients   Standard Error   t Stat   P-value
    Intercept      3.7322         0.8870           4.2077   0.0040
    X Variable 1   0.0262         0.0200           1.3101   0.2315
    X Variable 2   0.1840         0.2509           0.7335   0.4871
Steps for Manual Solution: Multiple Regression

Suppose we have a dataset with one response variable Y and two predictor variables X1 and X2 (n = 8).
Step 1: Calculate X1², X2², X1Y, X2Y, and X1X2.
Step 2: Calculate the regression sums.

    ฮฃX1² = 38767, ฮฃX2² = 2823, ฮฃX1Y = 101895, ฮฃX2Y = 25364, ฮฃX1X2 = 9859
    ฮฃX1 = 555, ฮฃX2 = 145, ฮฃY = 1452, n = 8
Step 3: Calculate b0, b1, and b2.

    ฮฃx1² = 263.875,  ฮฃx2² = 194.875,  ฮฃx1y = 1162.5,  ฮฃx2y = −953.5,  ฮฃx1x2 = −200.375

    b1 = [194.875 × 1162.5 − (−200.375)(−953.5)] / [263.875 × 194.875 − (−200.375)²]
       = (226542.19 − 191057.56) / (51422.64 − 40150.14) = 35484.63 / 11272.5 = 3.148

    b2 = [263.875 × (−953.5) − (−200.375)(1162.5)] / 11272.5
       = (−251604.81 + 232935.94) / 11272.5 = −18668.87 / 11272.5 = −1.656

    b0 = Ȳ − b1Xฬ„1 − b2Xฬ„2 = 1452/8 − 3.148 × 555/8 + 1.656 × 145/8 = 181.5 − 218.39 + 30.02 = −6.87
Step 4: Place b0, b1, and b2 in the estimated linear regression equation:

    ลถ = −6.87 + 3.148X1 − 1.656X2
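Steps 3 and 4 can be computed directly from the regression sums given in Step 2 (a sketch in plain Python; the raw data table is not reproduced here, only its sums, so only the totals appear as inputs):

```python
# Coefficients from the regression sums of the n = 8 dataset
n = 8
sum_x1, sum_x2, sum_y = 555, 145, 1452

s11, s22 = 263.875, 194.875            # corrected sums of squares
s1y, s2y, s12 = 1162.5, -953.5, -200.375

# Solve the two normal equations
den = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / den     # ≈ 3.148
b2 = (s11 * s2y - s12 * s1y) / den     # ≈ -1.656
b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n   # ≈ -6.87
```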