Multiple regression analysis is used to understand the relationship between a dependent variable and multiple independent variables. It allows us to determine the effect of independent variables such as age and work experience on a dependent variable such as income. The analysis examines the coefficient of determination (R²), the F-test, the t-test, and the classical assumptions of normality, no multicollinearity, homoskedasticity, and no autocorrelation. Properly conducted, multiple regression can be used to predict the value of a dependent variable from the values of the independent variables.
Multiple Regression
Multiple regression analysis is used:
To determine the effect of several independent variables, X1, X2, ..., Xk, on a dependent variable Y.
To predict a value of the dependent variable based on the values of the independent variables X1, X2, ..., Xk.
Model
General Model:
Yi = β0 + β1X1i + β2X2i + β3X3i + ... + βkXki + εi
i = observation number (i = 1, 2, ..., n)
k = number of independent variables
β0, β1, β2, ..., βk = parameters/regression coefficients
X1, X2, ..., Xk = independent variables
Y = dependent variable
ε = error term
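The deck demonstrates everything in SPSS; as a minimal sketch, the same general model can be fitted in Python with statsmodels. The data below are simulated and all names and values are illustrative, not from the slides.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                      # two independent variables X1, X2
beta = np.array([2.0, 1.5, -0.5])                # true beta0, beta1, beta2
y = beta[0] + X @ beta[1:] + rng.normal(size=n)  # Y = b0 + b1*X1 + b2*X2 + error

X_design = sm.add_constant(X)                    # add the intercept column for beta0
results = sm.OLS(y, X_design).fit()              # ordinary least squares fit
print(results.summary())                         # coefficients, R^2, F-test, t-tests
```

The `results` object from this sketch is reused in the later diagnostic examples.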
Examination of Regression
Coefficient of determination (R²)
Hypothesis tests:
F-test for the overall regression model
t-test for each regression coefficient
Classical assumption tests:
Normality test
Multicollinearity test
Homoskedasticity test
Autocorrelation test
Coefficient of Determination (R²)
R² is used to measure the strength of association between the dependent and independent variables.
R² is interpreted as the proportion of variance in the dependent variable that is explained by the independent variables.
The property of R²: 0 ≤ R² ≤ 1.
The larger R², the better.
Use adjusted R² in multiple regression.
Example:
In a wine price study (the independent variable is growing-season temperature), R² was found to be 0.80. This means eighty percent of the variance in price can be explained by growing-season temperature.
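For reference (a standard formula, not shown on the slide), adjusted R² penalizes the number of predictors, with n observations and k independent variables:

Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)

Adding a variable always raises plain R², but raises adjusted R² only when the variable improves the fit by more than chance would.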
F-Test
The F-test is used to test whether the independent variables in the model are jointly significant.
Hypothesis test:
H0: β1 = β2 = β3 = ... = βk = 0
H1: at least one βi ≠ 0
where k = number of independent variables
For the F-test output, see the ANOVA (Analysis of Variance) table. If:
Sig. ≥ 0.05 → accept H0: the independent variables jointly have no significant effect on the dependent variable.
Sig. < 0.05 → reject H0: the independent variables jointly have a significant effect on the dependent variable → the model is good.
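Continuing the earlier Python sketch (`results` is the fitted statsmodels object from the model slide), the overall F-test can be read directly:

```python
# Overall significance: F-statistic and its p-value for H0: all slopes are zero.
print("F-statistic:", results.fvalue)
print("p-value:", results.f_pvalue)
if results.f_pvalue < 0.05:
    print("Reject H0: the independent variables are jointly significant.")
else:
    print("Accept H0: no joint significance at the 5% level.")
```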
t-Test
The t-test is used to check the significance of the individual (partial) regression coefficients in the multiple linear regression model.
Hypothesis test:
H0: βi = 0
H1: βi ≠ 0
where i = 1, 2, 3, ..., k and k = number of independent variables
In the Coefficients table (in the regression output), find the Sig. (significance) value for each independent variable:
If Sig. ≥ 0.05 → accept H0: the independent variable has no significant effect on the dependent variable.
If Sig. < 0.05 → reject H0: the independent variable has a significant effect on the dependent variable.
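The per-coefficient t-tests can be read from the same fitted object (a sketch; names follow the earlier simulated example):

```python
# Partial t-tests: one H0: beta_i = 0 per design column (including the constant).
for name, t, p in zip(results.model.exog_names, results.tvalues, results.pvalues):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: t = {t:.3f}, p = {p:.4f} -> {verdict} at the 5% level")
```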
Basic Assumptions for Regression
There are four main classical assumptions:
Normality
No multicollinearity
Homoskedasticity
No autocorrelation
Classical Assumptions Test in SPSS
1. Normality. Detector: Normal P-P Plot of Regression Standardized Residual.
   The plot should show the data points forming a linear pattern, spread approximately along the diagonal line.
2. Homoskedasticity. Detector: Scatter plot.
   If the scatter plot of the standardized residual (*ZRESID) against the standardized predicted value (*ZPRED) forms no specific pattern → the residual variance is constant → homoskedastic.
   If the scatter plot forms some pattern → heteroskedastic → the residual variance is not constant.
3. Multicollinearity. Detector: VIF (Variance Inflation Factor), with values from 1 to ∞.
   • VIF ≤ 10 → there is no multicollinearity
   • VIF > 10 → there is multicollinearity
   Detector: TOL (Tolerance), with values from 0 to 1.
   • TOL ≈ 0 → there is multicollinearity
   • TOL ≈ 1 → there is no multicollinearity
Classical Assumptions Test in SPSS (continued)
3. Multicollinearity. Detector: Eigenvalues.
   • Eigenvalue ≈ 0 → there is multicollinearity
   • Eigenvalue ≈ 1 → there is no multicollinearity
   Detector: Condition Index (CI).
   • CI > 15 → there is multicollinearity
   • CI ≤ 15 → there is no multicollinearity
4. Autocorrelation. Detector: Durbin-Watson (DW), with values from 0 to 4.
   Compare the DW from the output with the dL and dU values from the Durbin-Watson table, using these conditions:
   • DW < dL → positive autocorrelation
   • dL ≤ DW ≤ dU → no conclusion
   • dU < DW < 4 - dU → no autocorrelation
   • 4 - dU ≤ DW ≤ 4 - dL → no conclusion
   • DW > 4 - dL → negative autocorrelation
Multicollinearity
Multicollinearity: a linear relationship exists among the independent variables.
For instance:
Yi = β0 + β1X1 + β2X2 + β3X3 + ui
Y : Consumption
X1 : Total income
X2 : Income from salary
X3 : Income from non-salary sources
Total income (X1) = income from salary (X2) + non-salary income (X3)
→ multicollinearity exists
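This example can be verified numerically: because X1 is an exact linear combination of X2 and X3, the design matrix loses a rank and OLS has no unique solution. A minimal sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x2 = rng.normal(size=50)              # income from salary
x3 = rng.normal(size=50)              # income from non-salary sources
x1 = x2 + x3                          # total income: exact linear combination
X = np.column_stack([np.ones(50), x1, x2, x3])
print(np.linalg.matrix_rank(X))       # prints 3, not 4: perfect multicollinearity
```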
Multicollinearity
Consequences of Multicollinearity:
The variances of the estimated coefficients are inflated.
The confidence intervals are wide (a high variance means a high standard error, which means a wide confidence interval).
R² is high, but many variables are individually not significant.
Multicollinearity Detection
1. Eigenvalues and Condition Index (CI)
Multicollinearity exists in the regression equation if an eigenvalue is approximately zero (0).
The relationship between the eigenvalues and the Condition Index (CI):
CI = √(eigenvalue_max / eigenvalue_min)
• If CI > 15 → there is multicollinearity
• If CI ≤ 15 → there is no multicollinearity
Multicollinearity Detection
2. Variance Inflation Factor (VIF)
VIF_j = 1 / (1 - R_j²),  j = 1, 2, ..., k
k = number of independent variables
R_j² is the coefficient of determination from regressing independent variable j on the other independent variables.
• If VIF > 10 → there is multicollinearity
• If VIF ≤ 10 → there is no multicollinearity
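statsmodels ships a helper that computes this VIF directly; a sketch, where `X_design` is the constant-augmented matrix from the earlier simulated example:

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

for j in range(1, X_design.shape[1]):        # column 0 is the constant; skip it
    vif = variance_inflation_factor(X_design, j)
    flag = "multicollinearity" if vif > 10 else "no multicollinearity"
    print(f"X{j}: VIF = {vif:.2f} -> {flag}")
```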
Multicollinearity Detection
3. Tolerance (TOL)
VIF is related to TOL as follows:
TOL_j = 1 / VIF_j = 1 - R_j²
• If TOL is approximately 1 → there is no multicollinearity
Handling Multicollinearity
Delete a variable that has a strong linear relationship with the other variables.
– Commonly used.
– Be careful when deleting a variable.
Transform the variables.
Add more samples/observations.
Homoskedasticity
A homoskedastic error is one that has constant variance → the basic assumption.
A heteroskedastic error is one that has a nonconstant variance.
Heteroskedasticity occurs when the variance of the error is not constant.
Heteroskedasticity is more commonly a problem for cross-section data sets, although a time-series model can also have a non-constant variance.
Examination of Homoskedasticity
Graph Method
Principle: check the pattern of the squared residuals (ui²) against the predicted values of Yi.
The steps:
Run the regression model.
Make a scatter plot of ui² against the predicted Yi, as sketched below.
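A sketch of this graph method in Python, reusing the fitted `results` object from the model slide:

```python
import matplotlib.pyplot as plt

fitted = results.fittedvalues            # predicted Y_i
resid_sq = results.resid ** 2            # squared residuals u_i^2
plt.scatter(fitted, resid_sq)
plt.xlabel("Predicted Y")
plt.ylabel("Squared residual")
plt.title("Homoskedasticity check: no pattern means constant variance")
plt.show()
```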
Handling of Heteroskedasticity
Transform the data using logarithms. The objective of this transformation is to compress the scale of the variables so that the error variance becomes small and does not differ much between observation groups. The model becomes:
Ln Yj = β0 + β1 Ln Xj + uj
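A minimal sketch of this log-log re-specification, assuming strictly positive variables (the data here are simulated and purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=100)                   # positive regressor
y = 3.0 * x**1.2 * rng.lognormal(sigma=0.2, size=100)  # multiplicative error
log_X = sm.add_constant(np.log(x))
log_fit = sm.OLS(np.log(y), log_X).fit()               # Ln Y = b0 + b1 Ln X + u
print(log_fit.params)                                  # b1 is interpreted as an elasticity
```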
Autocorrelation
Autocorrelation: correlation of a variable with itself, across observations at different times or for different individuals.
It is commonly found in time series data, where current values are influenced by values from earlier periods, for example data on weight, salary/wages, etc.
One detection method: inspect the pattern of the relationship between the residuals (ui) and an independent variable or time (X).
Autocorrelation Pattern
[Figure: two scatter plots of residuals ui against time/X, panels (1) and (2).]
Diagram (1) shows a cyclical pattern, whereas diagram (2) shows a linear trend. Both patterns indicate autocorrelation.
Autocorrelation Detection
Use the Durbin-Watson (DW) statistic: compare the DW from the SPSS output with the dL and dU values from the Durbin-Watson table.
The rules:
• DW < dL → positive autocorrelation
• dL ≤ DW ≤ dU → no conclusion
• dU < DW < 4 - dU → no autocorrelation
• 4 - dU ≤ DW ≤ 4 - dL → no conclusion
• DW > 4 - dL → negative autocorrelation
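A sketch of the same decision rule in Python; `durbin_watson` is a statsmodels helper, but dL and dU must still be looked up in a Durbin-Watson table for your n and k (the values below are the ones quoted later in these slides, used here for illustration):

```python
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(results.resid)    # residuals from the fitted OLS model
dL, dU = 0.6972, 1.6413              # table values for k = 2, alpha = 0.05
if dw < dL:
    print("Positive autocorrelation")
elif dw <= dU or (4 - dU) <= dw <= (4 - dL):
    print("No conclusion")
elif dw < 4 - dU:
    print("No autocorrelation")
else:
    print("Negative autocorrelation")
```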
Application
A company has sales-staff data consisting of age, income, and work experience. The director wants to know whether there is any relationship between age and work experience on one side and the income of the sales staff on the other. Besides that, the company also wants to build a multiple regression model to predict income based on age and work experience.
Regression model:
Yi = b0 + b1X1 + b2X2 + ui
Y : Income
X1 : Age
X2 : Work experience
b0, b1, b2 : Parameters
ui : Residual
Based on the data in file multiple_regression1.sav, we will find the multiple regression equation, y = b0 + b1x1 + b2x2, and conduct hypothesis tests to determine whether the regression coefficients are significant or not.
The steps:
1. Open file multiple_regression1.sav.
2. Click Analyze → Regression → Linear.
3. In the Linear Regression dialog, move variable Income to the Dependent box, then variables Age and Experience to the Independent(s) box.
4. In the Method section, select Enter.
5. Click the Statistics button, then check Estimates, Model fit, Collinearity Diagnostics, and Durbin-Watson.
6. Click Continue.
7. Click Plots.
8. In the Linear Regression: Plots dialog, under Standardized Residual Plots, select Normal probability plot. Then move *ZRESID (standardized residual) into the Y box and *ZPRED (standardized predicted value) into the X box.
9. Click Continue, then OK.
An equivalent Python sketch follows.
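For comparison, a hedged Python sketch of the same workflow; reading the .sav file uses pandas.read_spss (which requires the pyreadstat package), and the column names Income, Age, Experience are assumed to match the file:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

df = pd.read_spss("multiple_regression1.sav")
X = sm.add_constant(df[["Age", "Experience"]])
fit = sm.OLS(df["Income"], X).fit()

print(fit.summary())                               # estimates, R^2, F- and t-tests
print("Durbin-Watson:", durbin_watson(fit.resid))  # autocorrelation diagnostic
for j, name in enumerate(X.columns[1:], start=1):  # collinearity diagnostics
    print(name, "VIF =", variance_inflation_factor(X.values, j))
```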
Output Interpretation
1. Coefficient of Determination (R²)
Adjusted R² shows that 92.7% of the variance in Income can be explained by changes in Experience and Age.
2. Autocorrelation Test
Durbin-Watson (DW) value = 1.497.
From the DW table, with k = 2 independent variables and α = 0.05, we find dL = 0.6972; dU = 1.6413; 4 - dU = 2.3587; 4 - dL = 3.3028.
Since dL (0.6972) ≤ DW (1.497) ≤ dU (1.6413) → no conclusion.
Output Interpretation
F-Test:
Hypothesis test:
H0: b1 = b2 = 0
H1: at least one of b1, b2 ≠ 0
From the ANOVA table:
Sig. (p-value) = 0.000 < α = 0.05 → reject H0: b1 and b2 are not both zero; the independent variables jointly influence the dependent variable significantly → the model is good.
Output Interpretation
t-Test
– To test whether each regression coefficient is significant or not, see the Coefficients table.
From the Coefficients table:
– Variable Age (X1):
Hypothesis test:
H0: b1 = 0
H1: b1 ≠ 0
Sig.: 0.000 < 0.05 → significant: the variable Age affects Income significantly.
Output Interpretation
Regression equation:
From the Coefficients table:
Y = -10360.5 + 1201.098 X1 + 1663.516 X2
where: Y = Income
X1 = Age
X2 = Work experience
Interpretation of the regression parameters:
-10360.5 → intercept; the value of Y if X1 and X2 are zero.
+1201.098 → for every one-unit increase in X1, Y increases by 1201.098 units.
+1663.516 → for every one-unit increase in X2, Y increases by 1663.516 units.
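As a worked example (the input values here are made up for illustration), a 30-year-old salesperson with 5 years of work experience would have a predicted income of:

Y = -10360.5 + 1201.098(30) + 1663.516(5) = -10360.5 + 36032.94 + 8317.58 ≈ 33990.02 units.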
4. Normality Test:
See the Normal P-P Plot of Regression Standardized Residual → the data points spread approximately along the diagonal line and form a linear pattern → the data distribution is normal.
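As a closely related check (not the SPSS P-P plot itself, but the same idea), a Q-Q plot of the residuals can be drawn from the Python sketch's fitted application model:

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

sm.qqplot(fit.resid, line="45", fit=True)  # points near the line suggest normality
plt.show()
```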
Output Interpretation
5. Homoskedasticity Test
See the Scatterplot output: the dispersion of the data points does not form any specific pattern → the residual variance is constant → homoskedastic.
Output Interpretation
6. Multicollinearity Test
See the Coefficients table, Collinearity Statistics columns, for the variables Age and Experience:
The value of VIF = 1.377 < 10 → no multicollinearity.
The value of TOL = 0.726, approximately 1 → no multicollinearity.