University of South Florida
Linear Regression Concepts
Dr. S. Shivendu

Objectives
01 Identify the mathematical basis of linear regression.
02 Differentiate statistical inferences about relationships based on regression output.
03 Analyze the concepts of p-value, hypothesis testing, and confidence intervals, and their interpretation.

Agenda
• Regression Analysis: Introduction
• Linear Regression: Concepts
• Assumptions: Concepts
• Coefficient Confidence Intervals: Concepts
• Prediction Confidence Intervals: Concepts

Models
• A mathematical model is a mathematical expression of some phenomenon
• Models describe relationships between variables
• Two types: deterministic models and probabilistic models

Deterministic Models
Hypothesize exact relationships.
Suitable when the relationship is certain and known.
Example: Force is exactly mass times acceleration
  • F = m·a

Probabilistic Models
The relationship is not certain, and not all factors that affect the outcome are known.
Hypothesize two components:
  • A deterministic component and a random error
Example: Sales volume (y) is 10 times advertising spending (x) plus random error
  • $y = 10x + \varepsilon$
  • The random error may be due to factors other than advertising
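The two components of the example can be sketched in code. This is only an illustration: the slide does not say how the random error is distributed, so the sketch assumes a normal error with mean 0, and the function name `simulate_sales`, the `noise_sd` value, and the advertising figures are all hypothetical.

```python
import random

def simulate_sales(advertising, noise_sd=5.0, seed=0):
    """Return y = 10x + random error for each advertising value x."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [10 * x + rng.gauss(0.0, noise_sd) for x in advertising]

advertising = [1, 2, 3, 4, 5]                   # hypothetical spending levels
deterministic = [10 * x for x in advertising]   # exact part of the model
observed = simulate_sales(advertising)          # exact part plus random error

# The random error is whatever the deterministic part fails to explain.
errors = [y - d for y, d in zip(observed, deterministic)]
print(errors)
```

Setting `noise_sd=0.0` removes the random component and leaves the deterministic relationship y = 10x.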

Regression Models
Answers: “What is the relationship between the variables?”
Equations used:
  • One numerical dependent (response) variable
  • One or more numerical or categorical independent (explanatory) variables
Used mainly for estimating the strength of the relationship and for prediction

Regression Modeling Steps
1. Hypothesize the deterministic relationship between the response (dependent) variable and one or more explanatory (independent) variables in the population.
2. Specify the probability distribution of the random error term. Estimate the standard deviation of the error.
3. Estimate the unknown model parameters.
4. Interpret the estimated parameters.
What is a parameter?

Model Specification is Based on Theory
• Theory of field (e.g., Sociology)
• Mathematical theory
• Previous research
• “Common sense”

Types of Regression Models
Regression models
• Simple (1 explanatory variable): linear or non-linear
• Multiple (2+ explanatory variables): linear or non-linear

Linear Regression Models
The relationship between the variables is a linear function:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
• $y$: dependent (response) variable
• $x$: independent (explanatory) variable
• $\beta_0$: population y-intercept
• $\beta_1$: population slope
• $\varepsilon$: random error

Population Linear Regression Model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
$$E(y) = \beta_0 + \beta_1 x$$
[Figure: observed values around the population line $E(y) = \beta_0 + \beta_1 x$ on x and y axes; $\varepsilon_i$ = random error]

Sample Linear Regression Model
$$y_i = \hat\beta_0 + \hat\beta_1 x_i + \hat\varepsilon_i$$
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$$
[Figure: fitted sample line on x and y axes, showing an observed value, an unsampled observation, and $\hat\varepsilon_i$ = random error]

Estimating Parameters: Least Squares Method
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Evaluate model
• Use model for prediction and estimation

Scattergram
[Figure: scatter plot of y against x, both axes running from 0 to 60]
• Plot of all $(x_i, y_i)$ pairs
• Suggests how well the model will fit

Thinking Challenge
[Figure: scatter plot of y against x, both axes running from 0 to 60]
• How would you draw a line through the points?
• How would you determine which line fits best?

Least Squares
• “Best fit” means the differences between actual y values and estimated or predicted y values are a minimum
• But positive differences offset negative ones
• Least Squares minimizes the Sum of the Squared Differences (SSE):
$$\sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} \hat\varepsilon_i^2$$
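To see why the differences are squared, the sketch below compares two candidate lines on a small data set. The data values and both lines are assumptions chosen for illustration, not taken from the slides. Both lines have raw differences that cancel to (essentially) zero, so only the squared differences tell them apart.

```python
def residuals(x, y, b0, b1):
    """Differences between actual y and the line's predicted y."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(x, y, b0, b1):
    """Sum of squared differences for the line y = b0 + b1*x."""
    return sum(e * e for e in residuals(x, y, b0, b1))

# Illustrative data (an assumption, not taken from the slides).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

flat_line = (2.0, 0.0)     # horizontal line through the mean of y
tilted_line = (-0.1, 0.7)  # candidate line with a positive slope

for b0, b1 in (flat_line, tilted_line):
    raw_sum = sum(residuals(x, y, b0, b1))
    print(f"b0={b0}, b1={b1}: raw sum={raw_sum:.4f}, SSE={sse(x, y, b0, b1):.4f}")
```

For these data the flat line has an SSE of 6 and the tilted line an SSE of about 1.1, so the tilted line fits better even though the raw differences sum to zero in both cases.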
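
Least Squares Graphically
[Figure: four observed points and the fitted line $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ on x and y axes, with the residuals $\hat\varepsilon_1$ to $\hat\varepsilon_4$ marked]
$$y_2 = \hat\beta_0 + \hat\beta_1 x_2 + \hat\varepsilon_2$$
$$\text{LS minimizes } \sum_{i=1}^{n} \hat\varepsilon_i^2 = \hat\varepsilon_1^2 + \hat\varepsilon_2^2 + \hat\varepsilon_3^2 + \hat\varepsilon_4^2$$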

Coefficient Equations
Prediction equation:
$$\hat y = \hat\beta_0 + \hat\beta_1 x$$
Slope:
$$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
y-intercept:
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
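The slope and y-intercept formulas translate directly into code. The following is a minimal sketch: the function name `fit_line` and the data values are assumptions for illustration, since the slides do not list raw data.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) from the shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx                 # slope: SS_xy / SS_xx
    b0 = sum_y / n - b1 * sum_x / n    # y-intercept: y_bar - b1 * x_bar
    return b0, b1

# Illustrative data (an assumption; the slides do not list raw data).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

b0, b1 = fit_line(x, y)
print(f"y_hat = {b0:.1f} + {b1:.1f}x")
```

With these values SS_xy = 7 and SS_xx = 10, so the slope is 0.7 and the intercept is 2 - 0.7(3) = -0.1.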

Interpretation of Coefficients
Slope ($\hat\beta_1$)
  • Estimated y changes by $\hat\beta_1$ for each 1-unit increase in x
  • If $\hat\beta_1 = 2$, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)
Y-intercept ($\hat\beta_0$)
  • The average value of y when x = 0
  • If $\hat\beta_0 = 4$, then average Sales (y) is expected to be 4 when Advertising (x) is 0

Parameter Estimation Computer Output
Parameter Estimates
                    Parameter   Standard   T for H0:
Variable    DF      Estimate    Error      Param=0     Prob>|T|
INTERCEP    1       -0.1000     0.6350     -0.157      0.8849
ADVERT      1        0.7000     0.1914      3.656      0.0354
The INTERCEP estimate is $\hat\beta_0$ and the ADVERT estimate is $\hat\beta_1$, so
$$\hat y = -0.1 + 0.7x$$

Coefficient Interpretation Solution
Slope ($\hat\beta_1$)
  • Sales Volume (y) is expected to increase by 0.7 units for each $1 increase in Advertising (x)
Y-intercept ($\hat\beta_0$)
  • Average value of Sales Volume (y) is -0.10 units when Advertising (x) is 0
  • Difficult to explain to marketing manager
  • Expect some sales without advertising

Probability Distribution of Random Error
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Evaluate model
• Use model for prediction and estimation

Linear Regression Assumptions
• The mean of the probability distribution of the error, ε, is 0
• The probability distribution of the error, ε, is approximately normal
• The probability distribution of the error has a constant variance
• Errors are independent

Error Probability Distribution
[Figure: error probability distributions at $x_1$, $x_2$, and $x_3$ around the line $E(y) = \beta_0 + \beta_1 x$, on x and y axes]

Random Error Variation
• Variation of actual y from predicted y, $\hat y$
• Measured by the standard error of the regression model: the sample standard deviation of $\varepsilon$, denoted $s$
• Affects several factors, such as parameter significance and prediction accuracy
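
Variation Measures
[Figure: an observed point $(x_i, y_i)$, the fitted line $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, and the mean $\bar y$ on x and y axes]
Each sum of squares adds the following term over all observations:
• Unexplained sum of squares (SSE): $(y_i - \hat y_i)^2$
• Total sum of squares: $(y_i - \bar y)^2$
• Explained sum of squares: $(\hat y_i - \bar y)^2$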
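The three variation measures can be computed side by side. This sketch uses illustrative data and a fitted line that are assumptions, not values from the slides. For a least squares line with an intercept, the total sum of squares equals the explained plus the unexplained sum of squares.

```python
# Illustrative data and fitted line (assumptions, not taken from the slides).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7   # least squares estimates for these data

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)                # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained

print(f"total={sst:.2f}  explained={ssr:.2f}  unexplained={sse:.2f}")
```

Here the total of 6 splits into about 4.9 explained by the line and about 1.1 left unexplained.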

Estimation of Variance of Error $\sigma^2$
$$s^2 = \frac{SSE}{n-2} \quad \text{where} \quad SSE = \sum (y_i - \hat y_i)^2$$
$$s = \sqrt{s^2} = \sqrt{\frac{SSE}{n-2}}$$
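A short sketch of the variance estimate follows. The helper name `error_variance`, the data, and the fitted coefficients are assumptions for illustration. The divisor is n - 2 because two parameters, the intercept and the slope, were estimated from the data.

```python
import math

def error_variance(x, y, b0, b1):
    """Return (SSE, s^2, s) for the fitted line y_hat = b0 + b1*x."""
    n = len(x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)   # two degrees of freedom are used by b0 and b1
    return sse, s2, math.sqrt(s2)

# Illustrative data and fitted line (assumptions, not taken from the slides).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

sse, s2, s = error_variance(x, y, b0=-0.1, b1=0.7)
print(f"SSE={sse:.4f}  s^2={s2:.4f}  s={s:.4f}")
```

With SSE of about 1.1 and n = 5, the estimate is s² = 1.1 / 3, or roughly 0.367, and s is roughly 0.606.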

Residual Analysis
The residual for observation i, $e_i$, is the difference between its observed and predicted value:
$$e_i = Y_i - \hat Y_i$$
Check the assumptions of regression by examining the residuals:
  • Examine for linearity assumption
  • Evaluate independence assumption
  • Evaluate normal distribution assumption
  • Examine for constant variance for all levels of X (homoscedasticity)

Residual Analysis for Linearity
[Figure: two cases, "Not Linear" and "Linear", each showing Y against x and the residuals against x]

Residual Analysis for Independence
[Figure: residuals plotted against X for "Not Independent" and "Independent" cases]

Check for Normality
• Examine the Stem-and-Leaf Display of the Residuals
• Examine the Boxplot of the Residuals
• Examine the Histogram of the Residuals
• Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
When using a normal probability plot, normal errors will display approximately in a straight line.
[Figure: normal probability plot, Percent (0 to 100) against Residual (-3 to 3)]
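The coordinates behind a normal probability plot can be sketched as follows. The residual values are assumed for illustration, and the plotting positions (i - 0.5) / n are one common convention rather than anything the slides specify. Each sorted residual is paired with the standard normal quantile it would be expected to match.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n: a common convention, assumed here.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Illustrative residuals (assumed values, not taken from the slides).
residuals = [0.4, -0.3, 0.0, -0.7, 0.6]

for q, e in normal_plot_points(residuals):
    print(f"quantile={q:6.2f}  residual={e:5.2f}")
```

Plotting the residuals against the quantiles gives the plot described above; points lying close to a straight line are consistent with normal errors.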

Residual Analysis for Equal Variance
[Figure: two cases, "Non-constant variance" and "Constant variance", each showing Y against x and the residuals against x]

Interpreting the Model - Testing for Significance
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Interpret model

Test of Slope Coefficient
• Shows if there is a linear relationship between x and y
• Involves population slope $\beta_1$
• Hypotheses:
  • $H_0: \beta_1 = 0$ (No Linear Relationship)
  • $H_a: \beta_1 \neq 0$ (Linear Relationship)
• Theoretical basis is sampling distribution of slope

Sampling Distribution of Sample Slopes
[Figure: population line with Sample 1 and Sample 2 lines on x and y axes; sampling distribution of $\hat\beta_1$ around $\beta_1$ with standard error $S_{\hat\beta_1}$]
All possible sample slopes:
  • Sample 1: 2.5
  • Sample 2: 1.6
  • Sample 3: 1.8
  • Sample 4: 2.1
  • ...
Very large number of sample slopes

Slope Coefficient Test Statistic
$$t = \frac{\hat\beta_1}{S_{\hat\beta_1}}, \qquad df = n - 2$$
where
$$S_{\hat\beta_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

Test of Slope Coefficient Computer Output
Parameter Estimates
                    Parameter   Standard   T for H0:
Variable    DF      Estimate    Error      Param=0     Prob>|T|
INTERCEP    1       -0.1000     0.6350     -0.157      0.8849
ADVERT      1        0.7000     0.1914      3.656      0.0354
In the ADVERT row, the Estimate is $\hat\beta_1$, the Standard Error is $S_{\hat\beta_1}$, T for H0 is $t = \hat\beta_1 / S_{\hat\beta_1}$, and Prob>|T| is the P-value.
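The test statistic and P-value in the ADVERT row can be approximately reproduced with the standard library alone. This is a sketch under stated assumptions: the sample size n = 5 is not given in the slides (it is chosen because df = 3 gives a P-value close to the printed one), and the helper names `t_pdf` and `two_sided_p` are hypothetical. The P-value is obtained by numerically integrating the Student's t density.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=20000):
    """P(|T| > |t_stat|), integrating the density with Simpson's rule."""
    b = abs(t_stat)
    h = b / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)      # the density is symmetric about 0

# Estimate and standard error from the ADVERT row; n = 5 is an assumption.
b1, s_b1, n = 0.7000, 0.1914, 5
t_stat = b1 / s_b1
p_value = two_sided_p(t_stat, df=n - 2)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

The results agree with the printed 3.656 and 0.0354 up to rounding of the standard error. Because the P-value is below 0.05, H0: β1 = 0 would be rejected at the 5% level.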

Prediction with Regression Models
Types of predictions
  • Point estimates
  • Interval estimates
What is predicted?
  • Population mean response E(y) for given x
    • Point on population regression line
  • Individual response (y) for given x

Confidence Interval Estimate for Mean Value of y at $x = x_p$
$$\hat y \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}}$$
$df = n - 2$
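A worked sketch of this interval follows. The data, the function name `mean_response_interval`, and the choice of a 95% level are assumptions for illustration. The critical value 3.182 is the t-table entry for α/2 = 0.025 with 3 degrees of freedom, and S is computed as the square root of SSE / (n - 2).

```python
import math

def mean_response_interval(x, y, x_p, t_crit):
    """Confidence interval for the mean of y at x = x_p (simple linear regression)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Deviation forms of SS_xx and SS_xy; equivalent to the shortcut formulas.
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_p
    half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

# Illustrative data (an assumption, not taken from the slides).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
T_CRIT = 3.182   # t-table value for alpha/2 = 0.025 with df = n - 2 = 3

low, high = mean_response_interval(x, y, x_p=4, t_crit=T_CRIT)
print(f"95% CI for mean y at x = 4: {low:.3f} to {high:.3f}")
```

Calling the function with `x_p=3` (the mean of x) and then `x_p=5` shows the interval widening as `x_p` moves away from the mean.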

Factors Affecting Interval Width
• Level of confidence ($1 - \alpha$)
  • Width increases as confidence increases
• Data dispersion ($s$)
  • Width increases as variation increases
• Sample size
  • Width decreases as sample size increases
• Distance of $x_p$ from mean $\bar x$
  • Width increases as distance increases

Prediction Interval of Individual Value of y at $x = x_p$
$$\hat y \pm t_{\alpha/2}\, S \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}}$$
$df = n - 2$
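The only difference between the prediction interval for an individual y and the confidence interval for the mean of y is the extra 1 under the square root. The sketch below compares the two half-widths using summary values that are assumptions for illustration (n = 5, mean of x = 3, SS_xx = 10, SSE = 1.1, fitted line -0.1 + 0.7x); the function name `interval_half_widths` is hypothetical.

```python
import math

def interval_half_widths(s, n, x_p, x_bar, ss_xx, t_crit):
    """Half-widths of the mean-response and individual-prediction intervals."""
    distance_term = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    mean_half = t_crit * s * math.sqrt(distance_term)
    pred_half = t_crit * s * math.sqrt(1 + distance_term)  # extra 1: spread of a single y
    return mean_half, pred_half

# Illustrative summary values (assumptions, not taken from the slides).
n, x_bar, ss_xx = 5, 3.0, 10.0
s = math.sqrt(1.1 / 3)      # s = sqrt(SSE / (n - 2)) with SSE = 1.1
t_crit = 3.182              # t-table value for alpha/2 = 0.025, df = 3
x_p = 4
y_hat = -0.1 + 0.7 * x_p    # point prediction from the fitted line

mean_half, pred_half = interval_half_widths(s, n, x_p, x_bar, ss_xx, t_crit)
print(f"mean of y:    {y_hat - mean_half:.3f} to {y_hat + mean_half:.3f}")
print(f"individual y: {y_hat - pred_half:.3f} to {y_hat + pred_half:.3f}")
```

The prediction interval is always the wider of the two, because it accounts for the variation of a single observation around the mean as well as the uncertainty in the estimated line.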

Key Takeaway
• The statistical interpretation is the value proposition of the linear regression model
• The statistical interpretation depends on assumptions of the linear model being met
• Understanding outliers is critical for drawing meaningful inferences from the linear regression model