Quantitative Research Technique
Multiple Regression Analysis
Selection of Predictor Variables
Confidence and Prediction Interval
Dinesh Pudasaini (CRN 071MSI604)
1
Goal
• Develop a statistical model that can predict the values of a
dependent (response) variable from the values of the
independent (explanatory) variables.
• In many situations, more than one independent variable
may be useful in predicting the value of a dependent
variable. We then use multiple regression.
2
Introduction
Simple Regression:
A statistical model that utilizes one quantitative independent
variable "X" to predict the quantitative dependent variable
"Y."
Multiple Regression:
A statistical model that utilizes two or more quantitative and
qualitative explanatory variables (x1,..., xk) to predict a
quantitative dependent variable Y.
3
Simple vs. Multiple
• Simple Regression
• β represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the
single independent variable.
• Multiple regression
• βi represents the unit change in Y per unit change in Xi.
• Takes into account the effect of the other Xi's: each βi is a
partial effect, holding the other predictors constant.
4
Multiple Regression Models
Multiple regression models divide into two families:
• Linear: linear (first-order), dummy variable, interaction,
polynomial, square root, log, reciprocal
• Non-linear: exponential
5
Linear Model
• Relationship between one dependent & two or more
independent variables is a linear function:

Y = β0 + β1X1 + β2X2 + · · · + βPXP + ε

where Y is the dependent (response) variable, X1, . . . , XP are the
independent (explanatory) variables, β0 is the population
Y-intercept, β1, . . . , βP are the population slopes, and ε is the
random error.
6
Linear Model
• The error terms ε are mutually independent and identically
distributed, with mean 0 and constant variance σ2
• This is so because the observations y1, y2, . . . , yn are a random
sample: they are mutually independent, and hence the error
terms are also mutually independent
• The distribution of the error term is independent of the joint
distribution of x1, x2, . . . , xk
7
Method of Least Squares
• We use the least-squares method to fit a linear function to the
data.
• b0, b1, b2, b3, . . . , bk are the sample estimates of the
coefficients ß0, ß1, ß2, ß3, . . . , ßk
• The least-squares method chooses the b's that make the sum of
squares of the residuals as small as possible.
• The least-squares estimates are the values that minimize the
quantity Σ(yi − ŷi)2, the sum of squared deviations between the
observed and fitted values.
8
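The least-squares fit described above can be sketched in Python with NumPy; the data and the "true" coefficient values below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: y depends on two predictors x1 and x2.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
eps = rng.normal(0, 1.0, n)          # random error: mean 0, constant variance
y = 2.0 + 1.5 * x1 - 0.8 * x2 + eps  # true beta0 = 2.0, beta1 = 1.5, beta2 = -0.8

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])

# The least-squares estimates b minimize the sum of squared residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # b0, b1, b2 should land close to (2.0, 1.5, -0.8)
```

With only sampling noise in the data, the estimates b recover the population coefficients up to a small error.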
Standard Error of Estimate and
Coefficient of Multiple Determination
• The observed variability of the responses about this fitted
model is measured by the variance s2 = SSE / (n − k − 1),
and the regression standard error of estimate is
s = √(SSE / (n − k − 1))
Coefficient of Multiple Determination
When the null hypothesis is rejected, a relationship between Y and
the X variables exists. Its strength is measured by R2
9
Coefficient of Multiple Determination.
• Sum of squares due to error
SSE = Σ(yi − ŷi)2
• Sum of squares due to regression
SSR = Σ(ŷi − ȳ)2
• Total sum of squares
SST = Σ(yi − ȳ)2
• Obviously, SST = SSR + SSE.
• The ratio SSR/SST represents the proportion of the total variation in
y explained by the regression model.
• This ratio, denoted by R2, is called the coefficient of multiple
determination.
10
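The three sums of squares and R2 can be computed directly from a fit; a minimal sketch on an invented toy dataset:

```python
import numpy as np

# Tiny hypothetical dataset (illustrative numbers only).
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])
X = np.column_stack([np.ones(5),
                     [1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 1.0, 4.0, 3.0, 5.0]])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sse = np.sum((y - y_hat) ** 2)         # sum of squares due to error
ssr = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares

r2 = ssr / sst                         # coefficient of multiple determination
print(sse, ssr, sst, r2)
```

Because the model includes an intercept, the decomposition SST = SSR + SSE holds exactly (up to floating-point error).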
Adjusted Coefficient of Multiple
Determination.
• R2 is sensitive to the magnitudes of n and k in small samples.
If k is large relative to n, the model tends to fit the data very
well. In the extreme case, if n = k + 1, the model would fit
the data exactly.
• A better goodness-of-fit measure is the adjusted R2:

Adjusted R2 = 1 − [(n − 1) / (n − k − 1)] (1 − R2)
            = 1 − [SSE / (n − k − 1)] / [SST / (n − 1)]
11
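A small sketch of the adjusted-R2 formula above (the numbers plugged in are illustrative only):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n-1)/(n-k-1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# With n = 25 observations and k = 4 predictors, an R^2 of 0.90
# shrinks slightly once the number of predictors is penalized:
print(adjusted_r2(0.90, n=25, k=4))  # 1 - (24/20) * 0.10 = 0.88
```

The adjustment always pulls R2 downward (for k ≥ 1), which is why it is the better measure when comparing models with different numbers of predictors.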
Hypothesis Tests in Multiple Linear
Regression
• Three types of hypothesis tests can be carried out for multiple
linear regression models:
• First test (significance of regression): checks the
significance of the whole regression model.
• Second test: checks the significance of individual regression
coefficients.
• Third test: simultaneously checks the significance of a
subset of the regression coefficients.
12
F-test for the overall fit of the model
H0: β1 = β2 = . . . = βk = 0 versus Ha: at least one βi ≠ 0.
The test statistic is F = MSR/MSE = (SSR/k) / (SSE/(n − k − 1)),
and H0 is rejected when F exceeds the critical value Fα; k, n−k−1.
13
Test for Significance of Regression
14
Significance tests for βi
H0: βi = 0 versus Ha: βi ≠ 0, tested with t = bi / se(bi), which
has a t distribution with n − k − 1 degrees of freedom when H0 is true.
15
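The overall F-test and the per-coefficient t-tests can be sketched as follows, assuming SciPy is available; the simulated data and coefficient values are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 40, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # x2 is truly irrelevant here

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

sse = resid @ resid
sst = np.sum((y - y.mean()) ** 2)
ssr = sst - sse
df_e = n - k - 1

# Overall F-test of H0: beta1 = beta2 = 0
F = (ssr / k) / (sse / df_e)
p_overall = stats.f.sf(F, k, df_e)

# Partial t-tests: t_i = b_i / se(b_i), with se from the (X'X)^-1 diagonal
mse = sse / df_e
cov_b = mse * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_b))
t = b / se
p_t = 2 * stats.t.sf(np.abs(t), df_e)
print(F, p_overall, t, p_t)
```

With a strong true effect on x1, the overall F is large and its P-value is near zero, while the individual t-tests separate the relevant predictor from the irrelevant one.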
ANOVA for Regression
• Analysis of Variance (ANOVA) consists of
calculations that provide information about levels of
variability within a regression model and form a basis
for tests of significance.
16
Example
A TV industry analyst wants to build a statistical model for
predicting the number of subscribers that a cable station can
expect.
Y = Number of cable subscribers (SUSCRIB).
X1 = Advertising rate which the station charges local advertisers for one minute
of prime-time space (ADRATE).
X2 = Kilowatt power of the station’s non-cable signal (KILOWATT).
X3 = Number of families living in the station’s area of dominant influence
(ADI), a geographical division of radio and TV audiences (APIPOP).
X4 = Number of competing stations in the ADI (COMPETE).
17
Example (contd….)
18
Multiple Regression Equation
• Based on the partial t-test, the variables signal and compete
are the least significant variables in our model.
• Let’s drop the least significant variables one at a time.
19
Multiple Regression Equation
Y = 562.15 - 5.44x1 - 20.01x2
where: x1 = temperature [degrees F]
x2 = attic insulation [inches]
20
Multiple Regression Equation
• The variable Compete is the next variable to remove.
21
Multiple Regression Prediction
• All the variables in the model are statistically significant,
therefore our final model is:
• Final Model
22
Multicollinearity
• High correlation between the X (independent) variables.
• The coefficients then measure a combined effect.
• Leads to unstable coefficients, depending on which X variables are in the model
• Some multicollinearity always exists; it is a matter of degree
• Example: Using both total number of rooms and number of
bedrooms as explanatory variables in same model
• In many non-experimental situations in business,
economics, and the social and biological sciences, the
independent variables tend to be correlated among
themselves.
23
Detecting Multicollinearity
• Examine the correlation matrix
– Determine whether the correlations between pairs of X
variables are larger than their correlations with Y
• A few remedies
– Obtain new sample data
– Eliminate one of the correlated X variables
24
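Beyond eyeballing the correlation matrix, a standard numeric check (not shown on the slide) is the variance inflation factor, VIFj = 1 / (1 − Rj2), where Rj2 comes from regressing xj on the other predictors. A sketch using the rooms/bedrooms example with simulated data:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).
    VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing x_j on the other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical example: total rooms is a near-duplicate of bedrooms.
rng = np.random.default_rng(2)
bedrooms = rng.integers(1, 6, 100).astype(float)
rooms = bedrooms + rng.normal(0, 0.5, 100)
other = rng.normal(size=100)              # an unrelated predictor
X = np.column_stack([rooms, bedrooms, other])
print(vif(X))  # first two VIFs large, third near 1
```

A common rule of thumb treats VIF above roughly 5–10 as a sign of problematic multicollinearity, which is exactly what the rooms/bedrooms pair produces here.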
Finding the Best Multiple Regression
Equation
• Use common sense and practical considerations to
include or exclude variables.
• Consider the P-value.
• Consider equations with high values of adjusted R2
and try to include only a few variables.
• For a given number of predictor (x) variables,
select the equation with the largest value of adjusted
R2.
Selection of Predictor Variable
Stepwise regression
26
Statement of problem
• A common problem is that there is a large set of candidate
predictor variables.
• Goal is to choose a small subset from the larger set so that the
resulting regression model is simple yet has good predictive
ability.
Example: Cement data
• Response y: heat evolved in calories during hardening of cement on a per
gram basis
• Predictor x1: % of tricalcium aluminate
• Predictor x2: % of tricalcium silicate
• Predictor x3: % of tetracalcium alumino ferrite
• Predictor x4: % of dicalcium silicate
27
Two basic methods
of selecting predictors
• Stepwise regression: Enter and remove predictors, in a
stepwise manner, until there is no justifiable reason to enter or
remove more.
• Best subsets regression: Select the subset of predictors that do
the best at meeting some well-defined objective criterion.
28
Stepwise regression: the idea
• Start with no predictors in the “stepwise model.”
• At each step, enter or remove a predictor based on partial F-
tests (equivalently, t-tests).
• Stop when no more predictors can be justifiably entered or
removed from the stepwise model.
1. Specify an Alpha-to-Enter (αE = 0.15) significance level.
2. Specify an Alpha-to-Remove (αR = 0.15) significance level.
29
Stepwise regression:
Step #1
1. Fit each of the one-predictor models, that is, regress y on x1,
regress y on x2, … regress y on xp-1.
2. The first predictor put in the stepwise model is the predictor that
has the smallest t-test P-value (below αE = 0.15).
3. If no P-value is below 0.15, stop.
Step #2
1. Suppose x1 was the "best" one predictor.
2. Fit each of the two-predictor models with x1 in the model, that is,
regress y on (x1, x2), regress y on (x1, x3), …, and y on (x1, xp-1).
3. The second predictor put in the stepwise model is the predictor that
has the smallest t-test P-value (below αE = 0.15).
4. If no P-value is below 0.15, stop.
30
Stepwise regression:
Step #2 (continued)
1. Suppose x2 was the “best” second predictor.
2. Step back and check P-value for β1 = 0. If the P-value for
β1 = 0 has become not significant (above αR = 0.15),
remove x1 from the stepwise model.
Step#3
1. Suppose both x1 and x2 made it into the two-predictor
stepwise model.
2. Fit each of the three-predictor models with x1 and x2 in the
model, that is, regress y on (x1, x2, x3), regress y on (x1, x2,
x4), …, and regress y on (x1, x2, xp-1).
31
Stepwise regression:
Step #3 (continued)
1. The third predictor put in the stepwise model is the predictor
that has the smallest t-test P-value (below αE = 0.15).
2. If no P-value is below 0.15, stop.
3. Step back and check the P-values for β1 = 0 and β2 = 0. If either
P-value has become not significant (above αR = 0.15),
remove that predictor from the stepwise model.
Stopping the procedure
The procedure is stopped when adding an additional predictor
does not yield a t-test P-value below αE = 0.15.
32
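The stepwise procedure above can be sketched as a small function. `stepwise` and `p_values` are hypothetical helper names, and this is a simplified illustration of the enter/remove logic rather than a production implementation:

```python
import numpy as np
from scipy import stats

def p_values(X, y):
    """Two-sided t-test P-values for each coefficient (intercept first)."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    mse = resid @ resid / (n - p)
    se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
    return b, 2 * stats.t.sf(np.abs(b / se), n - p)

def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15):
    """Forward stepwise selection with backward checks, as in the slides.
    Returns the indices of the selected columns of X."""
    n, p = X.shape
    selected = []
    while True:
        # Entry step: try each remaining predictor, keep the smallest P-value.
        best_j, best_p = None, 1.0
        for j in (j for j in range(p) if j not in selected):
            cols = np.column_stack([np.ones(n), X[:, selected + [j]]])
            _, pv = p_values(cols, y)
            if pv[-1] < best_p:
                best_j, best_p = j, pv[-1]
        if best_j is None or best_p >= alpha_enter:
            break  # no predictor can be justifiably entered: stop
        selected.append(best_j)
        # Removal step: drop any predictor whose P-value rose above alpha_remove.
        while True:
            cols = np.column_stack([np.ones(n), X[:, selected]])
            _, pv = p_values(cols, y)
            worst = int(np.argmax(pv[1:]))  # skip the intercept
            if pv[1:][worst] > alpha_remove and len(selected) > 1:
                selected.pop(worst)
            else:
                break
    return selected

# Hypothetical data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=80)
print(sorted(stepwise(X, y)))  # columns 0 and 2 should be included
```

Note that with αE = 0.15, a spurious predictor can occasionally slip in, which is one reason the slides recommend combining stepwise output with common sense and adjusted R2.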
Prediction and Confidence
Intervals
33
Confidence intervals are intervals constructed about the
predicted value of y, at a given level of x, that measure the
accuracy of the mean response of all the individuals in the
population.
Prediction intervals are intervals constructed about the
predicted value of y that measure the accuracy of a single
individual's predicted value.
34
35
Confidence Interval (for the mean response at x0): ŷ0 ± tα/2 · s · √h0
Prediction Interval (for a single new observation at x0): ŷ0 ± tα/2 · s · √(1 + h0)
where s is the regression standard error and h0 = x0′(X′X)−1x0 is the
leverage of x0; for simple regression, h0 = 1/n + (x0 − x̄)2 / Σ(xi − x̄)2.
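For simple regression the two intervals can be computed as follows; the data are invented, and `h` is the leverage term at the chosen point x0:

```python
import numpy as np
from scipy import stats

# Illustrative simple-regression data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n = len(x)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s = np.sqrt(resid @ resid / (n - 2))       # regression standard error
t_star = stats.t.ppf(0.975, n - 2)         # 95% critical value

x0 = 3.5
y0 = b[0] + b[1] * x0
h = 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)  # leverage

ci = (y0 - t_star * s * np.sqrt(h),     y0 + t_star * s * np.sqrt(h))      # mean response
pi = (y0 - t_star * s * np.sqrt(1 + h), y0 + t_star * s * np.sqrt(1 + h))  # new value
print(ci, pi)
```

Because the prediction interval carries the extra "1 +" term for the variability of a single new observation, it is always wider than the confidence interval at the same x0.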
Example
• Suppose we want to estimate the average weight of an adult
male in a city. We draw a random sample of 1,000 men from a
population of 1,000,000 men and weigh them. We find that the
average man in our sample weighs 180 pounds, and the
standard deviation of the sample is 30 pounds. What is the
95% confidence interval?
Solution:
• Identify a sample statistic. Since we are trying to estimate the
mean weight in the population, we choose the mean weight in
our sample (180) as the sample statistic.
• Select a confidence level. We are working with a 95%
confidence level.
36
Example Contd….
• Find the margin of error.
Find standard error.
The standard error (SE) of the mean is:
SE = s / sqrt( n ) = 30 / sqrt(1000) = 30/31.62 = 0.95
Find critical value.
• The critical value is a factor used to compute the margin of
error. To express the critical value as a t score(t*)
Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 95/100 = 0.05
– Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 =
0.975
– Find the degrees of freedom(df): df = n - 1 = 1000 - 1 =
999 37
Example Contd..
– The critical value is the t score having 999 degrees of
freedom and a cumulative probability equal to 0.975. From
the t distribution table, we find that the critical value is
1.96.
• Note: We might also have expressed the critical value as a z-
score, since the sample size is large.
• Compute margin of error (ME): ME = critical value * standard
error = 1.96 * 0.95 = 1.86
• The range of the confidence interval = sample statistic ±
margin of error, and the uncertainty is denoted by the
confidence level. The 95% confidence interval is
180 ± 1.86 pounds, i.e., (178.14, 181.86).
38
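The worked example can be verified in a few lines, using SciPy for the t critical value:

```python
import math
from scipy import stats

n, sample_mean, s = 1000, 180.0, 30.0

se = s / math.sqrt(n)               # standard error of the mean, ~0.95
t_star = stats.t.ppf(0.975, n - 1)  # critical value for df = 999, ~1.96
me = t_star * se                    # margin of error, ~1.86

print(sample_mean - me, sample_mean + me)  # ~ (178.14, 181.86)
```

The slide's hand computation rounds se to 0.95 and the critical value to 1.96; the exact calculation lands on the same interval to two decimal places.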
Questions
• Explain the linear multiple regression model.
• How can predictor variables be selected using stepwise
regression analysis?
• Suppose we want to estimate the average weight of an adult
male in a city. We draw a random sample of 1,000 men from a
population of 1,000,000 men and weigh them. We find that the
average man in our sample weighs 180 pounds, and the
standard deviation of the sample is 30 pounds. What is the
95% confidence interval?
39
Thank You
40
  • 1. Quantitative Research Technique Multiple Regression Analysis Selection of Predictor Variables Confidence and Prediction Interval Dinesh Pudasaini (CRN 071MSI604) 1
  • 2. Goal • Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the Independent (explanatory) variables. • In many situations, more than one independent variable may be useful in predicting the value of a dependent variable. We then use multiple regression. 2
  • 3. Introduction Simple Regression A statistical model that utilizes one quantitative independent variable “X” to predict the quantitative dependent variable “Y.” Multiple Regression: A statistical model that utilizes two or more quantitative and qualitative explanatory variables (x1,..., xk) to predict a quantitative dependent variable Y. 3
  • 4. Simple vs. Multiple • Simple Regression •  represents the unit change in Y per unit change in X . • Does not take into account any other variable besides single independent variable. • Multiple regression • i represents the unit change in Y per unit change in Xi. • Takes into account the effect of other i s. 4
  • 6. Linear Model • Relationship between one dependent & two or more independent variables is a linear function Dependent (response) variable Independent (explanatory) variables Population slopes Population Y-intercept Random error            P P X X X Y  2 2 1 1 0 6
  • 7. Linear Model • The error terms Ɛ are mutually independent and identically distributed, with mean = 0 and constant variances • This is so, because the observations y1, y2, . . . ,yn are a random sample, they are mutually independent and hence the error terms are also mutually independent • The distribution of the error term is independent of the joint distribution of x i, x 2, . . . , x k 7
  • 8. Method of Least Squares • we use the least-squares method to fit a linear function to the data. • bo,b1, b2, b3 . . . , bk are the sample estimates of the coefficients ß0,ß1, ß2, ß3 . . . , ßk • The least-squares method chooses the b’s that make the sum of squares of the residuals as small as possible. • The least-squares estimates are the values that minimize the quantity. 8
  • 9. Standard Error of Estimate and Coefficient of Multiple Determination • The observed variability of the responses about this fitted model is measured by the variance and the regression standard error of estimate is Coefficient of Multiple Determination When null hypothesis is rejected, a relationship between Y and the X variables exists. Strength measured by R2 9
  • 10. Coefficient of Multiple Determination. • Sum of squares due to error SSE = • Sum of squares due to regression SSR = • Total sum of squares SST = • Obviously, • The ratio SSR/SST represents the proportion of the total variation in y explained by the regression model. • This ratio, denoted by R2, is called the coefficient of multiple determination. 10
  • 11. Adjusted Coefficient of Multiple Determination. • R2 is sensitive to the magnitudes of n and k in small samples. If k is large relative to n, the model tends to fit the data very well. In the extreme case, if n = k+1, the model would exactly fit the data. • A better goodness of fit measure is the adjusted R2 Adjusted R2= 1 – (n-1/n-k-1) (1-R2) » 1- SSE/(n-k-1)/SST/(n-1) 11
  • 12. Hypothesis Tests in Multiple Linear Regression • Three types of hypothesis tests can be carried out for multiple linear regression models: • First Test for significance of regression: This test checks the significance of the whole regression model. • Second Test: This test checks the significance of individual regression coefficients. • Third Test: This test can be used to simultaneously check the significance of a number of regression coefficients. 12
  • 13. F-test for the overall fit of the model 13
  • 14. Test for Significance of Regression 14
  • 16. ANOVA for Regression • Analysis of Variance (ANOVA) consists of calculations that provide information about levels of variability within a regression model and form a basis for tests of significance. 16
  • 17. Example A TV industry analyst wants to build a statistical model for predicting the number of subscribers that a cable station can expect. Y = Number of cable subscribers (SUSCRIB). X1 = Advertising rate which the station charges local advertisers for one minute of prim time space (ADRATE). X2 = Kilowatt power of the station’s non-cable signal (KILOWATT). X3 = Number of families living in the station’s area of dominant influence (ADI), a geographical division of radio and TV audiences (APIPOP). X4 = Number of competing stations in the ADI (COMPETE). 17
  • 19. Multiple Regression Equation • Based on the partial t-test, the variables signal and compete are the least significant variables in our model. • Let’s drop the least significant variables one at a time. 19
  • 20. Multiple Regression Equation Y = 562.15 - 5.44x1 - 20.01x2 where: x1 = temperature [degrees F] x2 = attic insulation [inches] 20
  • 21. Multiple Regression Equation • The variable Compete is the next variable to get rid of. 21
  • 22. Multiple Regression Prediction • All the variables in the model are statistically significant, therefore our final model is: • Final Model 22
  • 23. Multicollinearity • High correlation between X variables (Independent variables). • Coefficients measure combined effect. • Leads to unstable coefficients depending on X variables in model • Always exists; matter of degree • Example: Using both total number of rooms and number of bedrooms as explanatory variables in same model • In many non-experimental situations in business, economics, and the social and biological sciences, the independent variables tend to be correlated among themselves. 23
  • 24. Detecting Multicollinearity • Examine correlation matrix – Determines if the Correlations between pairs of X variables are more than with Y variable • Few remedies – Obtain new sample data – Eliminate one correlated X variable 24
  • 25. Finding the Best Multiple Regression Equation • Use common sense and practical considerations to include or exclude variables. • Consider the P-value. • Consider equations with high values of adjusted R2 and try to include only a few variables. • For a given number of predictor (x) variables, select the equation with the largest value of adjusted R2.
  • 26. Selection of Predictor Variables: Stepwise Regression 26
  • 27. Statement of problem • A common problem is that there is a large set of candidate predictor variables. • The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability. Example: Cement data • Response y: heat evolved in calories during hardening of cement, on a per-gram basis • Predictor x1: % of tricalcium aluminate • Predictor x2: % of tricalcium silicate • Predictor x3: % of tetracalcium alumino ferrite • Predictor x4: % of dicalcium silicate 27
  • 28. Two basic methods of selecting predictors • Stepwise regression: Enter and remove predictors, in a stepwise manner, until there is no justifiable reason to enter or remove more. • Best subsets regression: Select the subset of predictors that do the best at meeting some well-defined objective criterion. 28
  • 29. Stepwise regression: the idea • Start with no predictors in the “stepwise model.” • At each step, enter or remove a predictor based on partial F-tests (that is, the t-tests). • Stop when no more predictors can be justifiably entered into or removed from the stepwise model. 1. Specify an Alpha-to-Enter (αE = 0.15) significance level. 2. Specify an Alpha-to-Remove (αR = 0.15) significance level. 29
  • 30. Stepwise regression: Step #1 1. Fit each of the one-predictor models, that is, regress y on x1, regress y on x2, … regress y on xp-1. 2. The first predictor put in the stepwise model is the predictor that has the smallest t-test P-value (below αE = 0.15). 3. If no P-value is below 0.15, stop: no predictor enters the model. Step #2 1. Suppose x1 was the “best” one predictor. 2. Fit each of the two-predictor models with x1 in the model, that is, regress y on (x1, x2), regress y on (x1, x3), …, and y on (x1, xp-1). 3. The second predictor put in the stepwise model is the predictor that has the smallest t-test P-value (below αE = 0.15). 4. If no P-value is below 0.15, stop. 30
  • 31. Stepwise regression: Step #2 (continued) 1. Suppose x2 was the “best” second predictor. 2. Step back and check P-value for β1 = 0. If the P-value for β1 = 0 has become not significant (above αR = 0.15), remove x1 from the stepwise model. Step#3 1. Suppose both x1 and x2 made it into the two-predictor stepwise model. 2. Fit each of the three-predictor models with x1 and x2 in the model, that is, regress y on (x1, x2, x3), regress y on (x1, x2, x4), …, and regress y on (x1, x2, xp-1). 31
  • 32. Stepwise regression: Step #3 (continued) 1. The third predictor put in the stepwise model is the predictor that has the smallest t-test P-value (below αE = 0.15). 2. If no P-value is below 0.15, stop. 3. Step back and check the P-values for β1 = 0 and β2 = 0. If either P-value has become not significant (above αR = 0.15), remove that predictor from the stepwise model. Stopping the procedure The procedure is stopped when adding an additional predictor does not yield a t-test P-value below αE = 0.15. 32
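  The procedure in slides 29–32 can be sketched in code. The version below is a simplified forward-stepwise selector with backward checks, using the coefficient t-test P-values and αE = αR = 0.15 as on the slides; it is an illustrative sketch, not a production implementation (real packages handle ties, degenerate fits, and cycling more carefully).

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Fit OLS with an intercept; return the t-test p-values of the
    slope coefficients (intercept excluded)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - A.shape[1]
    s2 = resid @ resid / df                      # residual variance
    cov = s2 * np.linalg.inv(A.T @ A)            # covariance of estimates
    t = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t), df)[1:]     # two-sided p-values, no intercept

def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15):
    """Forward stepwise selection with backward removal checks."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # Try entering each remaining predictor; pick the smallest p-value.
        best_p, best_j = 1.0, None
        for j in remaining:
            p = ols_pvalues(X[:, selected + [j]], y)[-1]
            if p < best_p:
                best_p, best_j = p, j
        if best_p >= alpha_enter:
            break  # no predictor can be justifiably entered
        selected.append(best_j)
        remaining.remove(best_j)
        # Step back: drop any predictor whose p-value is no longer significant.
        p = ols_pvalues(X[:, selected], y)
        for idx in sorted(np.where(p >= alpha_remove)[0], reverse=True):
            remaining.append(selected.pop(idx))
    return selected
```

  On synthetic data where y truly depends on only two of three candidate predictors, the procedure selects exactly those two.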
  • 34. Confidence intervals are intervals constructed about the predicted value of y, at a given value of x, that measure the accuracy of the estimated mean response of all the individuals in the population. Prediction intervals are intervals constructed about the predicted value of y that measure the accuracy of a single individual’s predicted value. 34
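  For simple linear regression, the two intervals differ only in the extra “+1” under the square root, which is why the prediction interval for a single individual is always wider than the confidence interval for the mean response. A sketch of the standard formulas (the data here are made up for illustration):

```python
import numpy as np
from scipy import stats

def intervals(x, y, x0, conf=0.95):
    """Half-widths of the confidence interval for the mean response and the
    prediction interval for a single new observation at x0, for simple
    linear regression of y on x."""
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # slope
    b0 = y.mean() - b1 * x.mean()                        # intercept
    y_hat = b0 + b1 * x
    s = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))      # residual std. error
    t = stats.t.ppf(1 - (1 - conf) / 2, n - 2)           # critical t value
    h = 1 / n + (x0 - x.mean()) ** 2 / sxx               # leverage term at x0
    fit = b0 + b1 * x0
    ci = t * s * np.sqrt(h)        # mean response of the population
    pi = t * s * np.sqrt(1 + h)    # single individual's value (wider)
    return fit, ci, pi
```

  Both intervals are narrowest at the mean of x and widen as x0 moves away from it.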
  • 36. Example • Suppose we want to estimate the average weight of an adult male in a city. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval? Solution: • Identify a sample statistic. Since we are trying to estimate the mean weight in the population, we choose the mean weight in our sample (180) as the sample statistic. • Select a confidence level. We are working with a 95% confidence level. 36
  • 37. Example Contd. • Find the margin of error. – Find the standard error. The standard error (SE) of the mean is: SE = s / sqrt(n) = 30 / sqrt(1000) = 30 / 31.62 = 0.95 – Find the critical value. The critical value is a factor used to compute the margin of error; we express it as a t-score (t*). – Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 0.95 = 0.05 – Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975 – Find the degrees of freedom (df): df = n - 1 = 1000 - 1 = 999 37
  • 38. Example Contd. – The critical value is the t-score having 999 degrees of freedom and a cumulative probability equal to 0.975. From the t-distribution table, we find that the critical value is 1.96. • Note: Because the sample size is large, we could equally have expressed the critical value as a z-score. • Compute the margin of error (ME): ME = critical value * standard error = 1.96 * 0.95 = 1.86 • The confidence interval is the sample statistic ± the margin of error, and the uncertainty is denoted by the confidence level: the 95% confidence interval is 180 ± 1.86. 38
  • 39. Questions • Explain the linear multiple regression model. • How can predictor variables be selected using stepwise regression analysis? • Suppose we want to estimate the average weight of an adult male in a city. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval? 39