4. Assumptions of Linear Regression
Linear relationship
Multivariate normality
No or little multicollinearity
No auto-correlation
Homoscedasticity
6. What is multiple linear regression?
Multiple linear regression attempts to model
the relationship between two or more explanatory
variables and a response variable by fitting a
linear equation to observed data. Every value of
each independent variable x is associated with a
value of the dependent variable y.
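As a minimal sketch, the fitted coefficients can be obtained from the normal equations (X'X)b = X'y. The solver and toy data below (two made-up predictors whose true coefficients are 1, 2 and 3) are purely illustrative, not the model from this presentation.

```python
# Sketch: multiple linear regression via the normal equations (X'X)b = X'y,
# solved with Gaussian elimination. Toy data only; values are made up.

def fit_ols(X, y):
    """X includes a leading intercept column of 1s; returns coefficients b."""
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Toy data generated from y = 1 + 2*x1 + 3*x2, so the fit should recover [1, 2, 3]
X = [[1, x1, x2] for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]]
y = [1 + 2 * x1 + 3 * x2 for _, x1, x2 in X]
print(fit_ols(X, y))  # coefficients close to [1.0, 2.0, 3.0]
```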
7. When do you use regression analysis?
What is regression analysis used for?
In statistics, regression analysis is a
statistical process for estimating the
relationships among variables. It
includes many techniques for
modelling and analysing several
variables when the focus is on the
relationship between a dependent
variable and one or more independent
variables.
Regression analysis is also used to
understand which among the
independent variables are related to the
dependent variable, and to explore the
forms of these relationships. In
restricted circumstances, regression
analysis can be used to infer causal
relationships between the independent
and dependent variables.
8. Multiple Linear Regression:
DIAGNOSTICS
RMSE (root mean squared error)
Lower values of RMSE indicate a
better fit. RMSE is a good
measure of how accurately the
model predicts the response,
and it is the most important
criterion for fit if the main
purpose of the model is
prediction.
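For reference, RMSE is the square root of the mean squared residual; a minimal sketch on made-up numbers:

```python
# Sketch: RMSE = sqrt(mean of squared residuals). Values are made up.
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]
print(rmse(actual, predicted))  # 0.5, since every residual is +/- 0.5
```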
The meaning of R²
The value R² is a fraction
between 0.0 and 1.0, and has
no units. An R² value of 0.0
means that knowing X does not
help you predict Y: there is no
linear relationship between X
and Y, and the best-fit line is a
horizontal line going through the
mean of all Y values. When
R² equals 1.0, all points lie
exactly on a straight line with no
scatter, and knowing X lets you
predict Y perfectly.
Adjusted R²
The adjusted R² is a
modified version of R² that
has been adjusted for the number of
predictors in the model.
The adjusted R² increases
only if a new term improves the
model more than would be expected
by chance. It decreases when a
predictor improves the model by less
than expected by chance.
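Both quantities can be computed directly from the residuals; the sketch below uses hypothetical values of y and its predictions, and assumes p predictors were fitted:

```python
# Sketch of R² and adjusted R² on made-up data; p = number of predictors.

def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    # Penalise for the number of predictors p given n observations
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y     = [2.0, 4.0, 6.0, 8.0, 10.0]
y_hat = [2.2, 3.8, 6.1, 7.9, 10.0]
r2 = r_squared(y, y_hat)
print(round(r2, 4))                                # 0.9975
print(round(adjusted_r_squared(r2, 5, 2), 4))      # 0.995
```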
11. Multiple Linear Regression:
PROCEDURE & DIMENSIONS
To obtain the best-fitting model, I used a
significance level of 0.05 to remove
non-significant and multicollinear variables.
12. Multiple Linear Regression:
STEPWISE REGRESSION
Combines forward and backward selection.
At each step, variables may be entered or removed if they meet
certain criteria.
Useful for developing the best prediction equation from the
smallest number of variables.
Redundant predictors are removed.
Computer-driven, and therefore somewhat controversial.
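A full stepwise routine both enters and removes variables; the sketch below implements only the forward (entry) half, using adjusted R² as the entry criterion instead of the significance tests most packages use. The predictor names and data are made up for illustration.

```python
# Hedged sketch of forward selection (the entry half of stepwise regression):
# at each step, enter the candidate that most improves adjusted R², and stop
# when no candidate gives a meaningful improvement.

def ols_fit(X, y):
    """Least squares via normal equations + Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

def adj_r2(X, y):
    b = ols_fit(X, y)
    y_hat = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((a - h) ** 2 for a, h in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    n, p = len(y), len(X[0]) - 1
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)

def forward_select(predictors, y):
    chosen, current = [], float("-inf")
    improved = True
    while improved:
        improved = False
        best_name, best_score = None, current
        for name in predictors:
            if name in chosen:
                continue
            cols = chosen + [name]
            X = [[1.0] + [predictors[c][i] for c in cols]
                 for i in range(len(y))]
            score = adj_r2(X, y)
            if score > best_score + 1e-9:   # require a real improvement
                best_name, best_score = name, score
        if best_name is not None:
            chosen.append(best_name)
            current = best_score
            improved = True
    return chosen

predictors = {
    "x1":    [0, 1, 2, 3, 4, 5, 6, 7],
    "x2":    [1, 0, 2, 1, 3, 2, 4, 3],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6],   # unrelated filler values
}
y = [1 + 2 * a - b for a, b in zip(predictors["x1"], predictors["x2"])]
print(forward_select(predictors, y))  # ['x1', 'x2']; noise is never entered
```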
13. Multiple Linear Regression:
MODEL SUMMARY
The model was built in 9 steps using stepwise selection.
The partial R² at each step is shown in Picture 1.2.
14. Multiple Linear Regression:
Rules for AIC, BIC & Mallows's C(p)
The model with the smaller AIC is considered the better-fitting model, and AIC
can be negative. We chose the model using the criterion of lower AIC
(-230.2E+4).
If BIC is positive, the saturated model (i.e. the model with one parameter
for every case; the BIC of a saturated model equals 0) is preferred, i.e. the
more complex model is better. When BIC is negative, the current model is
preferred; the more negative the BIC, the better the fit.
Condition index: the condition index is calculated using a factor analysis
on the independent variables. Values of 10-30 indicate moderate
multicollinearity among the regression variables; values > 30 indicate strong
multicollinearity.
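As an illustration of the "lower AIC wins, and AIC can be negative" rule, the sketch below uses the common least-squares form AIC = n*ln(SSE/n) + 2k, where k is the number of fitted parameters; the residuals are made up.

```python
# Sketch: comparing two least-squares models by AIC (made-up residuals).
import math

def aic(residuals, k):
    """AIC = n*ln(SSE/n) + 2k; k = number of fitted parameters."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return n * math.log(sse / n) + 2 * k

# Model A: 3 parameters, larger residuals; Model B: 5 parameters, smaller ones
resid_a = [0.9, -1.1, 0.8, -0.7, 1.0, -0.9, 0.6, -0.8]
resid_b = [0.3, -0.2, 0.4, -0.3, 0.2, -0.4, 0.3, -0.2]
aic_a, aic_b = aic(resid_a, 3), aic(resid_b, 5)
print(aic_a, aic_b)  # model B has the smaller (here negative) AIC, so B wins
```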
15. Multiple Linear Regression:
RULE OF THUMB FOR INTERPRETATION OF R²
The relation between this model's R² and adjusted R²
is shown in Picture 1.2 below.
.00 = no linear relationship
.10 = small (R ≈ .3)
.25 = moderate (R ≈ .5)
.50 = strong (R ≈ .7)
1.00 = perfect linear relationship
In our model R² is 0.9103, which is close to 1.00, so the
model shows a very strong linear relationship.
Picture 1.2
16. Multiple Linear Regression:
MODEL SUMMARY
The result is significant, meaning the difference is large,
so we reject the null hypothesis (H0) in favour of the
alternative.
In statistics, the Bayesian information criterion (BIC)
or Schwarz criterion (also SBC, SBIC) is a criterion
for model selection among a finite set of models.
18. Multiple Linear Regression:
SELECTED MODEL ANOVA
Here the F and p values satisfy the rules, and the
error (residual) is close to zero, so the model is good.
There is no autocorrelation.
19. Multiple Linear Regression:
DURBIN-WATSON -- AUTOCORRELATION
Linear regression analysis requires that there be little or no autocorrelation
in the data. Autocorrelation occurs when the residuals are not independent
of each other: the value of y(x+1) is not independent of the value of y(x).
We can test the linear regression model for autocorrelation with the
Durbin-Watson test. Durbin-Watson's d tests the null hypothesis that the
residuals are not linearly autocorrelated. While d can assume values
between 0 and 4, values around 2 indicate no autocorrelation. As a rule of
thumb, values of 1.5 < d < 2.5 show that there is no autocorrelation in the
data. However, the Durbin-Watson test only analyses linear autocorrelation,
and only between direct neighbours, which are first-order effects.
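The statistic itself is simple to compute from the residuals: d is the sum of squared successive differences divided by the sum of squared residuals. A sketch on made-up residuals whose signs alternate, which drives d above 2:

```python
# Sketch of the Durbin-Watson statistic on made-up residuals:
# d = sum((e_t - e_{t-1})^2) / sum(e_t^2)

def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

resid = [0.5, -0.6, 0.4, -0.3, 0.5, -0.4, 0.3, -0.5]  # alternating signs
print(durbin_watson(resid))  # well above 2.5: suggests negative autocorrelation
```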
20. Multiple Linear Regression:
SELECTED MODEL SUMMARY
If the chi-square value is greater than or equal to the
critical value, there is a significant difference between
the groups we are studying. That is, the difference
between the actual data and the expected data (which
assumes the groups aren't different) is probably too
great to be attributed to chance, so we conclude that
our sample supports the hypothesis of a difference.
Critical value: the 95th percentile of the chi-squared
distribution with 400 DF is 447.63246783.
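The critical value quoted above can be reproduced to about two decimal places with the Wilson-Hilferty approximation to chi-squared quantiles; the sketch below is an approximation, not an exact inverse CDF.

```python
# Sketch: Wilson-Hilferty approximation to a chi-squared quantile:
# chi2_p(df) ~ df * (1 - 2/(9*df) + z_p * sqrt(2/(9*df)))^3,
# where z_0.95 ~ 1.644854 is the standard-normal 95th percentile.
import math

def chi2_quantile_wh(df, z):
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

print(chi2_quantile_wh(400, 1.644854))  # approx 447.63, matching the slide
```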
Here all diagnostics are satisfied as per the rules,
so we conclude that the model is the best-fitting model.
RMSE is an absolute
measure of fit.
Use PRESS to assess your model's predictive
ability. Usually, the smaller the PRESS value, the
better the model's predictive ability. PRESS is used
to calculate the predicted R², which is usually more
intuitive to interpret.
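PRESS is the sum of squared leave-one-out (deleted) residuals. For a simple one-predictor regression they can be computed without refitting, using the leverage shortcut e_i / (1 - h_ii); the data below are made up, and predicted R² would then be 1 - PRESS/SS_tot.

```python
# Sketch of PRESS for a simple (one-predictor) regression, via the
# leave-one-out shortcut e_i / (1 - h_ii) with leverage
# h_ii = 1/n + (x_i - x_bar)^2 / Sxx. Data values are made up.

def press_simple(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    press = 0.0
    for xi, yi in zip(x, y):
        e = yi - (intercept + slope * xi)          # ordinary residual
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx      # leverage of point i
        press += (e / (1.0 - h)) ** 2              # squared deleted residual
    return press

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.1, 2.8, 4.2, 4.9, 6.1]
print(press_simple(x, y))  # small value: good predictive ability on this data
```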
24. Our process is easy
Data Cleaning → Data Mining Techniques → Interpretation
25. Multiple Linear Regression:
After completion of the model, the predictor
variables are as follows:
Widely Available
Quantity purchase option
Quality comparable to Branded items
Good shelf life
Quality Conscious
Brand Loyalty
Age_group
Monthly_Spent
Education
28. Multiple Linear Regression:
INTERPRETATION
Widely Available
When responses are strong-positive or
weak-negative, the weight increases;
otherwise it decreases.
Quantity purchase option
As respondents' judgement on Q2 varies,
Total_weight decreases in step (the two
are directly proportional).
Quality comparable to Branded items
The weight depends on the firmness of the
respondents' decision: if they agree, the
weight decreases; otherwise it increases.
Good shelf life
When responses are strong-positive or
weak-negative, the weight increases;
otherwise it decreases.
Quality Conscious
When responses are strong-positive or
strong-negative, the weight increases;
otherwise it decreases.
Brand Loyalty
When responses are strong-positive or
weak-negative, the weight increases;
otherwise it decreases.
29. Multiple Linear Regression:
INTERPRETATION
Age_group
If respondents are younger or older, the
weight is high; otherwise the weight
decreases.
Monthly_Spent
As respondents' monthly spend increases,
the weight also increases.
Education
If the respondent is a graduate or
postgraduate, the weight is high; if the
respondent is inter/SSC or professional,
the weight is low.
30. Credits
Special thanks to Mr K. Venkat Rao, Director of
Reachout Business Analytics Pvt Ltd, for his
guidance and support throughout my career.