Multiple regression analysis is used to understand the relationship between a dependent variable and multiple independent variables. It allows us to determine the effect of independent variables such as age and work experience on a dependent variable such as income. The analysis examines the coefficient of determination (R²), the F-test, the t-test, and the classical assumptions of normality, no multicollinearity, homoskedasticity, and no autocorrelation. Properly conducted, multiple regression can be used to predict the value of a dependent variable from the values of the independent variables.
Multiple Regression
Multiple regression analysis is used:
To determine the effect of several independent variables, X1, X2, ..., Xk, on a dependent variable Y.
To predict a value of the dependent variable based on the values of the independent variables X1, X2, ..., Xk.
Model
General Model:
Yi = β0 + β1X1i + β2X2i + β3X3i + ... + βkXki + εi
i = observation number (i = 1, 2, ..., n)
k = number of independent variables
β0, β1, β2, ..., βk = parameters/regression coefficients
X1, X2, ..., Xk = independent variables
Y = dependent variable
ε = error term
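The deck demonstrates everything in SPSS; as a minimal sketch, the same general model can be fitted in Python with statsmodels. The data below are simulated and all names and values are illustrative, not from the slides.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                      # two independent variables X1, X2
beta = np.array([2.0, 1.5, -0.5])                # true beta0, beta1, beta2
y = beta[0] + X @ beta[1:] + rng.normal(size=n)  # Y = b0 + b1*X1 + b2*X2 + error

X_design = sm.add_constant(X)                    # add the intercept column for beta0
results = sm.OLS(y, X_design).fit()              # ordinary least squares fit
print(results.summary())                         # coefficients, R^2, F-test, t-tests
```

The `results` object from this sketch is reused in the later diagnostic examples.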
Examination of Regression
Coefficient of determination (R²)
Hypothesis tests:
F-test for the overall regression model
t-test for each regression coefficient
Classical assumption tests:
Normality test
Multicollinearity test
Homoskedasticity test
Autocorrelation test
Coefficient of Determination (R²)
R² is used to measure the strength of association between the dependent and independent variables.
R² is interpreted as the proportion of variance in the dependent variable that is explained by the independent variables.
The property of R²: 0 ≤ R² ≤ 1.
The larger R², the better.
Use adjusted R² in multiple regression.
Example:
In a wine price study (the independent variable is growing-season temperature), R² was found to be 0.80. This means eighty percent of the variance in price can be explained by growing-season temperature.
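For reference (a standard formula, not shown on the slide), adjusted R² penalizes the number of predictors, with n observations and k independent variables:

Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)

Adding a variable always raises plain R², but raises adjusted R² only when the variable improves the fit by more than chance would.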
F-Test
The F-test is used to test whether the independent variables in the model are jointly significant.
Hypothesis test:
H0: β1 = β2 = β3 = ... = βk = 0
H1: at least one βi ≠ 0
where k = number of independent variables
For the F-test output, see the ANOVA (Analysis of Variance) table. If:
Sig. ≥ 0.05 → accept H0: the independent variables jointly have no significant effect on the dependent variable.
Sig. < 0.05 → reject H0: the independent variables jointly have a significant effect on the dependent variable → the model is good.
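Continuing the earlier Python sketch (`results` is the fitted statsmodels object from the model slide), the overall F-test can be read directly:

```python
# Overall significance: F-statistic and its p-value for H0: all slopes are zero.
print("F-statistic:", results.fvalue)
print("p-value:", results.f_pvalue)
if results.f_pvalue < 0.05:
    print("Reject H0: the independent variables are jointly significant.")
else:
    print("Accept H0: no joint significance at the 5% level.")
```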
t-Test
The t-test is used to check the significance of the individual (partial) regression coefficients in the multiple linear regression model.
Hypothesis test:
H0: βi = 0
H1: βi ≠ 0
where i = 1, 2, 3, ..., k and k = number of independent variables
In the Coefficients table (in the regression output), find the Sig. (significance) value for each independent variable:
If Sig. ≥ 0.05 → accept H0: the independent variable has no significant effect on the dependent variable.
If Sig. < 0.05 → reject H0: the independent variable has a significant effect on the dependent variable.
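The per-coefficient t-tests can be read from the same fitted object (a sketch; names follow the earlier simulated example):

```python
# Partial t-tests: one H0: beta_i = 0 per design column (including the constant).
for name, t, p in zip(results.model.exog_names, results.tvalues, results.pvalues):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: t = {t:.3f}, p = {p:.4f} -> {verdict} at the 5% level")
```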
Basic Assumptions for Regression
There are four main classical assumptions:
Normality
No multicollinearity
Homoskedasticity
No autocorrelation
Classical Assumptions Test in SPSS
1. Normality. Detector: Normal P-P Plot of Regression Standardized Residual.
   The plot should show the data points forming a linear pattern, spread approximately along the diagonal line.
2. Homoskedasticity. Detector: Scatter plot.
   If the scatter plot of the standardized residual (*ZRESID) against the standardized predicted value (*ZPRED) forms no specific pattern → the residual variance is constant → homoskedastic.
   If the scatter plot forms some pattern → heteroskedastic → the residual variance is not constant.
3. Multicollinearity. Detector: VIF (Variance Inflation Factor), with values from 1 to ∞.
   • VIF ≤ 10 → there is no multicollinearity
   • VIF > 10 → there is multicollinearity
   Detector: TOL (Tolerance), with values from 0 to 1.
   • TOL ≈ 0 → there is multicollinearity
   • TOL ≈ 1 → there is no multicollinearity
Classical Assumptions Test in SPSS (continued)
3. Multicollinearity. Detector: Eigenvalues.
   • Eigenvalue ≈ 0 → there is multicollinearity
   • Eigenvalue ≈ 1 → there is no multicollinearity
   Detector: Condition Index (CI).
   • CI > 15 → there is multicollinearity
   • CI ≤ 15 → there is no multicollinearity
4. Autocorrelation. Detector: Durbin-Watson (DW), with values from 0 to 4.
   Compare the DW from the output with the dL and dU values from the Durbin-Watson table, using these conditions:
   • DW < dL → positive autocorrelation
   • dL ≤ DW ≤ dU → no conclusion
   • dU < DW < 4 - dU → no autocorrelation
   • 4 - dU ≤ DW ≤ 4 - dL → no conclusion
   • DW > 4 - dL → negative autocorrelation
Multicollinearity
Multicollinearity: a linear relationship exists among the independent variables.
For instance:
Yi = β0 + β1X1 + β2X2 + β3X3 + ui
Y : Consumption
X1 : Total income
X2 : Income from salary
X3 : Income from non-salary sources
Total income (X1) = income from salary (X2) + non-salary income (X3)
→ multicollinearity exists
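This example can be verified numerically: because X1 is an exact linear combination of X2 and X3, the design matrix loses a rank and OLS has no unique solution. A minimal sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x2 = rng.normal(size=50)              # income from salary
x3 = rng.normal(size=50)              # income from non-salary sources
x1 = x2 + x3                          # total income: exact linear combination
X = np.column_stack([np.ones(50), x1, x2, x3])
print(np.linalg.matrix_rank(X))       # prints 3, not 4: perfect multicollinearity
```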
Multicollinearity
Consequences of Multicollinearity:
The variances of the estimated coefficients are inflated.
The confidence intervals are wide (a high variance means a high standard error, which means a wide confidence interval).
R² is high, but many variables are individually not significant.
Multicollinearity Detection
1. Eigenvalues and Condition Index (CI)
Multicollinearity exists in the regression equation if an eigenvalue is approximately zero (0).
The relationship between the eigenvalues and the Condition Index (CI):
CI = √(eigenvalue_max / eigenvalue_min)
• If CI > 15 → there is multicollinearity
• If CI ≤ 15 → there is no multicollinearity
Multicollinearity Detection
2. Variance Inflation Factor (VIF)
VIF_j = 1 / (1 - R_j²),  j = 1, 2, ..., k
k = number of independent variables
R_j² is the coefficient of determination from regressing independent variable j on the other independent variables.
• If VIF > 10 → there is multicollinearity
• If VIF ≤ 10 → there is no multicollinearity
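statsmodels ships a helper that computes this VIF directly; a sketch, where `X_design` is the constant-augmented matrix from the earlier simulated example:

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

for j in range(1, X_design.shape[1]):        # column 0 is the constant; skip it
    vif = variance_inflation_factor(X_design, j)
    flag = "multicollinearity" if vif > 10 else "no multicollinearity"
    print(f"X{j}: VIF = {vif:.2f} -> {flag}")
```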
Multicollinearity Detection
3. Tolerance (TOL)
VIF is related to TOL as follows:
TOL_j = 1 / VIF_j = 1 - R_j²
• If TOL is approximately 1 → there is no multicollinearity
Handling Multicollinearity
Delete a variable that has a strong linear relationship with the other variables.
– Commonly used.
– Be careful when deleting a variable.
Transform the variables.
Add more samples/observations.
Homoskedasticity
A homoskedastic error is one that has constant variance → the basic assumption.
A heteroskedastic error is one that has a nonconstant variance.
Heteroskedasticity occurs when the variance of the error is not constant.
Heteroskedasticity is more commonly a problem for cross-section data sets, although a time-series model can also have a non-constant variance.
Examination of Homoskedasticity
Graph Method
Principle: check the pattern of the squared residuals (ui²) against the predicted values of Yi.
The steps:
Run the regression model.
Make a scatter plot of ui² against the predicted Yi, as sketched below.
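A sketch of this graph method in Python, reusing the fitted `results` object from the model slide:

```python
import matplotlib.pyplot as plt

fitted = results.fittedvalues            # predicted Y_i
resid_sq = results.resid ** 2            # squared residuals u_i^2
plt.scatter(fitted, resid_sq)
plt.xlabel("Predicted Y")
plt.ylabel("Squared residual")
plt.title("Homoskedasticity check: no pattern means constant variance")
plt.show()
```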
Handling of Heteroskedasticity
Transform the data using logarithms. The objective of this transformation is to compress the scale of the variables so that the error variance becomes small and does not differ much between observation groups. The model becomes:
Ln Yj = β0 + β1 Ln Xj + uj
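A minimal sketch of this log-log re-specification, assuming strictly positive variables (the data here are simulated and purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=100)                   # positive regressor
y = 3.0 * x**1.2 * rng.lognormal(sigma=0.2, size=100)  # multiplicative error
log_X = sm.add_constant(np.log(x))
log_fit = sm.OLS(np.log(y), log_X).fit()               # Ln Y = b0 + b1 Ln X + u
print(log_fit.params)                                  # b1 is interpreted as an elasticity
```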
Autocorrelation
Autocorrelation: correlation of a variable with itself, across observations at different times or for different individuals.
It is commonly found in time series data, where current values are influenced by values from earlier periods, for example data on weight, salary/wages, etc.
One detection method: inspect the pattern of the relationship between the residuals (ui) and an independent variable or time (X).
Autocorrelation Pattern
[Figure: two scatter plots of residuals ui against time/X, panels (1) and (2).]
Diagram (1) shows a cyclical pattern, whereas diagram (2) shows a linear trend. Both patterns indicate autocorrelation.
Autocorrelation Detection
Use the Durbin-Watson (DW) statistic: compare the DW from the SPSS output with the dL and dU values from the Durbin-Watson table.
The rules:
• DW < dL → positive autocorrelation
• dL ≤ DW ≤ dU → no conclusion
• dU < DW < 4 - dU → no autocorrelation
• 4 - dU ≤ DW ≤ 4 - dL → no conclusion
• DW > 4 - dL → negative autocorrelation
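A sketch of the same decision rule in Python; `durbin_watson` is a statsmodels helper, but dL and dU must still be looked up in a Durbin-Watson table for your n and k (the values below are the ones quoted later in these slides, used here for illustration):

```python
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(results.resid)    # residuals from the fitted OLS model
dL, dU = 0.6972, 1.6413              # table values for k = 2, alpha = 0.05
if dw < dL:
    print("Positive autocorrelation")
elif dw <= dU or (4 - dU) <= dw <= (4 - dL):
    print("No conclusion")
elif dw < 4 - dU:
    print("No autocorrelation")
else:
    print("Negative autocorrelation")
```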
Application
A company has sales-staff data consisting of age, income, and work experience. The director wants to know whether there is any relationship between age and work experience on one side and the income of the sales staff on the other. Besides that, the company also wants to build a multiple regression model to predict income based on age and work experience.
Regression model:
Yi = b0 + b1X1 + b2X2 + ui
Y : Income
X1 : Age
X2 : Work experience
b0, b1, b2 : Parameters
ui : Residual
Based on the data in file multiple_regression1.sav, we will find the multiple regression equation, y = b0 + b1x1 + b2x2, and conduct hypothesis tests to determine whether the regression coefficients are significant or not.
The steps:
1. Open file multiple_regression1.sav.
2. Click Analyze → Regression → Linear.
3. In the Linear Regression dialog, move variable Income to the Dependent box, then variables Age and Experience to the Independent(s) box.
4. In the Method section, select Enter.
5. Click the Statistics button, then check Estimates, Model fit, Collinearity Diagnostics, and Durbin-Watson.
6. Click Continue.
7. Click Plots.
8. In the Linear Regression: Plots dialog, under Standardized Residual Plots, select Normal probability plot. Then move *ZRESID (standardized residual) into the Y box and *ZPRED (standardized predicted value) into the X box.
9. Click Continue, then OK.
An equivalent Python sketch follows.
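For comparison, a hedged Python sketch of the same workflow; reading the .sav file uses pandas.read_spss (which requires the pyreadstat package), and the column names Income, Age, Experience are assumed to match the file:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

df = pd.read_spss("multiple_regression1.sav")
X = sm.add_constant(df[["Age", "Experience"]])
fit = sm.OLS(df["Income"], X).fit()

print(fit.summary())                               # estimates, R^2, F- and t-tests
print("Durbin-Watson:", durbin_watson(fit.resid))  # autocorrelation diagnostic
for j, name in enumerate(X.columns[1:], start=1):  # collinearity diagnostics
    print(name, "VIF =", variance_inflation_factor(X.values, j))
```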
Output Interpretation
1. Coefficient of Determination (R²)
Adjusted R² shows that 92.7% of the variance in Income can be explained by changes in Experience and Age.
2. Autocorrelation Test
Durbin-Watson (DW) value = 1.497.
From the DW table, with k = 2 independent variables and α = 0.05, we find dL = 0.6972; dU = 1.6413; 4 - dU = 2.3587; 4 - dL = 3.3028.
Since dL (0.6972) ≤ DW (1.497) ≤ dU (1.6413) → no conclusion.
Output Interpretation
F-Test:
Hypothesis test:
H0: b1 = b2 = 0
H1: at least one of b1, b2 ≠ 0
From the ANOVA table:
Sig. (p-value) = 0.000 < α = 0.05 → reject H0: b1 and b2 are not both zero; the independent variables jointly influence the dependent variable significantly → the model is good.
Output Interpretation
t-Test
– To test whether each regression coefficient is significant or not, see the Coefficients table.
From the Coefficients table:
– Variable Age (X1):
Hypothesis test:
H0: b1 = 0
H1: b1 ≠ 0
Sig.: 0.000 < 0.05 → significant: the variable Age affects Income significantly.
Output Interpretation
Regression equation:
From the Coefficients table:
Y = -10360.5 + 1201.098 X1 + 1663.516 X2
where: Y = Income
X1 = Age
X2 = Work experience
Interpretation of the regression parameters:
-10360.5 → intercept; the value of Y if X1 and X2 are zero.
+1201.098 → for every one-unit increase in X1, Y increases by 1201.098 units.
+1663.516 → for every one-unit increase in X2, Y increases by 1663.516 units.
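As a worked example (the input values here are made up for illustration), a 30-year-old salesperson with 5 years of work experience would have a predicted income of:

Y = -10360.5 + 1201.098(30) + 1663.516(5) = -10360.5 + 36032.94 + 8317.58 ≈ 33990.02 units.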
4. Normality Test:
See the Normal P-P Plot of Regression Standardized Residual → the data points spread approximately along the diagonal line and form a linear pattern → the data distribution is normal.
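As a closely related check (not the SPSS P-P plot itself, but the same idea), a Q-Q plot of the residuals can be drawn from the Python sketch's fitted application model:

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

sm.qqplot(fit.resid, line="45", fit=True)  # points near the line suggest normality
plt.show()
```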
Output Interpretation
5. Homoskedasticity Test
See the Scatterplot output: the dispersion of the data points does not form any specific pattern → the residual variance is constant → homoskedastic.
Output Interpretation
6. Multicollinearity Test
See the Coefficients table, Collinearity Statistics columns, for the variables Age and Experience:
The value of VIF = 1.377 < 10 → no multicollinearity.
The value of TOL = 0.726, approximately 1 → no multicollinearity.