U N I V E R S I T Y O F S O U T H F L O R I D A //
Linear Regression Concepts
Dr. S. Shivendu
Objectives
01 Identify the mathematical basis of linear regression.
02 Differentiate statistical inferences about relationships based on regression output.
03 Analyze the concepts of p-value, hypothesis testing, and confidence intervals, and their interpretation.
Agenda
• Regression Analysis: Introduction
• Linear Regression: Concepts
• Assumptions: Concepts
• Coefficient Confidence Intervals: Concepts
• Prediction Confidence Intervals: Concepts
Models
A mathematical model is a mathematical expression of some phenomenon.
Models describe relationships between variables.
Two types:
• Deterministic models
• Probabilistic models
Deterministic Models
Hypothesize exact relationships.
Suitable when the relationship is certain and known.
Example: Force is exactly mass times acceleration
• F = m·a
Probabilistic Models
The relationship is not certain, and not all factors that affect the outcome are known.
Hypothesize two components:
• Deterministic and random error
Example: Sales volume (y) is 10 times advertising spending (x) plus random error
• y = 10x + ε
• The random error may be due to factors other than advertising
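The contrast between the two kinds of model can be sketched in a few lines of Python. This is only an illustration: the advertising levels, the normal error distribution, and the value `error_sd=5.0` are assumptions, since the slide says only that a random error exists.

```python
import random

def deterministic_sales(advertising):
    """Deterministic component of the slide's example: y = 10x."""
    return 10 * advertising

def probabilistic_sales(advertising, rng, error_sd=5.0):
    """Deterministic component plus a random error term.

    The normal error with standard deviation error_sd is an assumption;
    the slide only says that a random error exists.
    """
    return deterministic_sales(advertising) + rng.gauss(0.0, error_sd)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
spending = [1, 2, 3, 4, 5]  # hypothetical advertising levels
exact = [deterministic_sales(x) for x in spending]
observed = [probabilistic_sales(x, rng) for x in spending]
errors = [y - e for y, e in zip(observed, exact)]  # the random error part
```

The `exact` values always fall on the line y = 10x, while the `observed` values scatter around it by the amounts in `errors`.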
Regression Models
Answer the question: "What is the relationship between the variables?"
Equations used:
• One numerical dependent (response) variable
• One or more numerical or categorical independent (explanatory) variables
Used mainly for estimating the strength of the relationship and for prediction.
Regression Modeling Steps
• Hypothesize the deterministic relationship between the response (dependent) variable and one or more explanatory (independent) variables in the population
• Specify the probability distribution of the random error term. Estimate the standard deviation of the error
• Estimate unknown model parameters
• Interpret the estimated parameters
What is a parameter?
Model Specification is Based on Theory
• Theory of field (e.g., Sociology)
• Mathematical theory
• Previous research
• "Common sense"
Types of Regression Models
Regression models
• Simple (1 explanatory variable): linear or non-linear
• Multiple (2+ explanatory variables): linear or non-linear
Linear Regression Models
Relationship between variables is a linear function:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
• \(y\): dependent (response) variable
• \(x\): independent (explanatory) variable
• \(\beta_0\): population y-intercept
• \(\beta_1\): population slope
• \(\varepsilon\): random error
Population Linear Regression Model
[Figure: observed values scattered around the population regression line; axes x and y; \(\varepsilon_i\) = random error.]
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
\[ E(y) = \beta_0 + \beta_1 x \]
Sample Linear Regression Model
[Figure: sample regression line with an observed value and an unsampled observation; axes x and y; \(\hat{\varepsilon}_i\) = random error.]
\[ y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{\varepsilon}_i \]
\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
Estimating Parameters: Least Squares Method
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Evaluate model
• Use model for prediction and estimation
Scattergram
[Figure: scatter plot; x and y axes each run from 0 to 60.]
• Plot of all \((x_i, y_i)\) pairs
• Suggests how well the model will fit
Thinking Challenge
[Figure: scatter plot; x and y axes each run from 0 to 60.]
• How would you draw a line through the points?
• How would you determine which line fits best?
Least Squares
"Best fit" means the differences between actual y values and estimated or predicted y values are a minimum
• Positive differences offset negative ones, so the differences are squared:
\[ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 \]
Least Squares minimizes the Sum of the Squared Differences (SSE)
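A small sketch may help show why the differences are squared before they are summed. The five data points and the three candidate lines are hypothetical and are not taken from the slides.

```python
def errors(x, y, intercept, slope):
    """Differences between actual y and the y predicted by a candidate line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def sse(x, y, intercept, slope):
    """Sum of squared differences (SSE) for a candidate line."""
    return sum(e ** 2 for e in errors(x, y, intercept, slope))

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

# A flat line at the mean of y: the raw differences cancel out to zero,
# even though the line clearly ignores the upward trend.
flat_sum = sum(errors(x, y, 2.0, 0.0))
flat_sse = sse(x, y, 2.0, 0.0)

# Two sloped candidate lines, compared by SSE.
candidate_sse = sse(x, y, 0.0, 0.7)
better_sse = sse(x, y, -0.1, 0.7)
```

Here the plain sum of differences for the flat line is 0, which would wrongly suggest a perfect fit, while its SSE is 6. The two sloped lines have SSE values of 1.15 and 1.10, so the second is the better fit by the least squares criterion.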
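Least Squares Graphically
[Figure: fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) through four points, with the errors \(\hat{\varepsilon}_1, \hat{\varepsilon}_2, \hat{\varepsilon}_3, \hat{\varepsilon}_4\) marked between each point and the line; axes x and y.]
\[ y_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 + \hat{\varepsilon}_2 \]
LS minimizes
\[ \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2 \]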
Coefficient Equations
Prediction Equation
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
Slope
\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \]
y-intercept
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
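The coefficient equations translate directly into code. The sketch below is a minimal illustration on a hypothetical five-point dataset (not from the slides); the function name `least_squares` is also just a label chosen here.

```python
def least_squares(x, y):
    """Return (intercept, slope) using the SS_xy / SS_xx formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # SS_xy = sum(x*y) - (sum x)(sum y)/n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # SS_xx = sum(x^2) - (sum x)^2/n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n  # y-bar minus slope times x-bar
    return intercept, slope

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = least_squares(x, y)
```

For this data SS_xy = 37 - 15·10/5 = 7 and SS_xx = 55 - 15²/5 = 10, so the slope is 0.7 and the intercept is 2 - 0.7·3 = -0.1.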
Interpretation of Coefficients
Slope (\(\hat{\beta}_1\))
• Estimated y changes by \(\hat{\beta}_1\) for each 1-unit increase in x
• If \(\hat{\beta}_1 = 2\), then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)
Y-Intercept (\(\hat{\beta}_0\))
• The average value of y when x = 0
• If \(\hat{\beta}_0 = 4\), then Average Sales (y) is expected to be 4 when Advertising (x) is 0
Parameter Estimation Computer Output
Parameter Estimates
                    Parameter   Standard   T for H0:
Variable    DF      Estimate    Error      Param=0     Prob>|T|
INTERCEP    1       -0.1000     0.6350     -0.157      0.8849
ADVERT      1        0.7000     0.1914      3.656      0.0354
The Estimate column gives \(\hat{\beta}_0\) (INTERCEP row) and \(\hat{\beta}_1\) (ADVERT row):
\[ \hat{y} = -.1 + .7x \]
Coefficient Interpretation Solution
Slope (\(\hat{\beta}_1\))
• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)
Y-Intercept (\(\hat{\beta}_0\))
• Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0
  - Difficult to explain to marketing manager
  - Expect some sales without advertising
Probability Distribution of Random Error
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Evaluate model
• Use model for prediction and estimation
Linear Regression Assumptions
• The mean of the probability distribution of the error, ε, is 0
• The probability distribution of the error, ε, is approximately normal
• The probability distribution of the error has a constant variance
• Errors are independent
Error Probability Distribution
[Figure: the line \(E(y) = \beta_0 + \beta_1 x\) with error distributions shown at \(x_1\), \(x_2\), and \(x_3\); axes x and y.]
Random Error Variation
• Variation of actual y from predicted y, \(\hat{y}\)
• Measured by the standard error of the regression model: the sample standard deviation of \(\hat{\varepsilon}\), denoted s
• Affects several factors, such as parameter significance and prediction accuracy
Variation Measures
[Figure: fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\), an observed point \((x_i, y_i)\), and the mean \(\bar{y}\); axes x and y.]
• Total sum of squares: \((y_i - \bar{y})^2\)
• Unexplained sum of squares or SSE: \((y_i - \hat{y}_i)^2\)
• Explained sum of squares: \((\hat{y}_i - \bar{y})^2\)
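The three variation measures can be computed side by side. The sketch below sums each term over all observations of a hypothetical dataset (not from the slides). It also checks that total = explained + unexplained, a standard identity for a least squares line with an intercept that the slide itself does not state.

```python
def fit_line(x, y):
    """Least squares intercept and slope from SS_xy / SS_xx."""
    n = len(x)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum(y) / n - slope * sum(x) / n
    return intercept, slope

def variation_measures(x, y):
    """Return (total, unexplained, explained) sums of squares."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # SSE
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    return total, unexplained, explained

# Hypothetical data, not taken from the slides.
total, unexplained, explained = variation_measures([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
```

For this data the total sum of squares is 6, of which 4.9 is explained by the line and 1.1 is left unexplained.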
Estimation of Variance of Error σ²
\[ s^2 = \frac{SSE}{n-2} \quad \text{where} \quad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
\[ s = \sqrt{s^2} = \sqrt{\frac{SSE}{n-2}} \]
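As a rough illustration of the variance estimate, the sketch below computes s² and s for a hypothetical five-point dataset. The data are an assumption, chosen because their least squares line is ŷ = -0.1 + 0.7x, the same estimates that appear in the computer output earlier in the deck.

```python
import math

def regression_standard_error(x, y, intercept, slope):
    """Return (s_squared, s), where s^2 = SSE / (n - 2)."""
    n = len(x)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_squared = sse / (n - 2)  # n - 2 because two parameters were estimated
    return s_squared, math.sqrt(s_squared)

# Hypothetical data whose fitted line is y-hat = -0.1 + 0.7x.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
s_squared, s = regression_standard_error(x, y, -0.1, 0.7)
```

Here SSE = 1.1 and n - 2 = 3, so s² is about 0.367 and s is about 0.606.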
Residual Analysis
The residual for observation i, \(e_i\), is the difference between its observed and predicted value:
\[ e_i = Y_i - \hat{Y}_i \]
Check the assumptions of regression by examining the residuals
• Examine for linearity assumption
• Evaluate independence assumption
• Evaluate normal distribution assumption
• Examine for constant variance for all levels of X (homoscedasticity)
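Computing the residuals is the first step of any of these checks. The sketch below does so for a hypothetical dataset and fitted line (both are assumptions, not from the slides); plotting the residuals against x would be the usual next step and is left out here.

```python
def residuals(x, y, intercept, slope):
    """e_i = observed Y_i minus predicted Y_i for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data and its least squares line y-hat = -0.1 + 0.7x.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
e = residuals(x, y, -0.1, 0.7)

# For a least squares line with an intercept the residuals sum to zero
# (up to floating-point rounding), so the sum alone says nothing about
# fit quality; the pattern of residuals against x is what matters.
residual_sum = sum(e)
points_to_plot = list(zip(x, e))  # (x, residual) pairs for a residual plot
```

The residuals here are approximately 0.4, -0.3, 0.0, -0.7, and 0.6.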
Residual Analysis for Linearity
[Figure: panels labeled "Not Linear" and "Linear"; each case shows Y against x and the residuals against x.]
Residual Analysis for Independence
[Figure: residuals plotted against X in panels labeled "Not Independent" and "Independent".]
Check for Normality
• Examine the Stem-and-Leaf Display of the Residuals
• Examine the Boxplot of the Residuals
• Examine the Histogram of the Residuals
• Construct a Normal Probability Plot of the Residuals
Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line
[Figure: normal probability plot; Percent (0 to 100) against Residual (-3 to 3).]
Residual Analysis for Equal Variance
[Figure: panels labeled "Non-constant variance" and "Constant variance"; each case shows Y against x and the residuals against x.]
Interpreting the Model - Testing for Significance
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Interpret model
Test of Slope Coefficient
• Shows if there is a linear relationship between x and y
• Involves population slope \(\beta_1\)
• Hypotheses:
  - \(H_0\): \(\beta_1 = 0\) (No Linear Relationship)
  - \(H_a\): \(\beta_1 \neq 0\) (Linear Relationship)
• Theoretical basis is sampling distribution of slope
Sampling Distribution of Sample Slopes
[Figure: population line with Sample 1 and Sample 2 lines; axes x and y. A second panel shows the sampling distribution of \(\hat{\beta}_1\), labeled with \(\beta_1\) and \(S_{\hat{\beta}_1}\).]
All Possible Sample Slopes
• Sample 1: 2.5
• Sample 2: 1.6
• Sample 3: 1.8
• Sample 4: 2.1
• And so on: a very large number of sample slopes
Slope Coefficient Test Statistic
\[ t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}}, \qquad df = n - 2 \]
where
\[ S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \]
Test of Slope Coefficient Computer Output
Parameter Estimates
                    Parameter   Standard   T for H0:
Variable    DF      Estimate    Error      Param=0     Prob>|T|
INTERCEP    1       -0.1000     0.6350     -0.157      0.8849
ADVERT      1        0.7000     0.1914      3.656      0.0354
In the ADVERT row, Estimate is \(\hat{\beta}_1\), Standard Error is \(S_{\hat{\beta}_1}\), T is \(t = \hat{\beta}_1 / S_{\hat{\beta}_1}\), and Prob>|T| is the P-Value.
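The ADVERT row can be reproduced by hand. The sketch below uses a hypothetical five-point dataset, an assumption chosen because it yields the same slope estimate, standard error, and t value as the output; the deck does not show the underlying data. The critical value 3.182 is the standard t-table entry for a two-sided test at α = 0.05 with 3 degrees of freedom, and the choice of α is also an assumption.

```python
import math

def slope_t_test(x, y):
    """Return (slope, se_slope, t_stat, df) for H0: beta_1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))       # standard error of the regression
    se_slope = s / math.sqrt(ss_xx)    # standard error of the slope
    return slope, se_slope, slope / se_slope, n - 2

# Hypothetical data consistent with the ADVERT row of the output.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
slope, se_slope, t_stat, df = slope_t_test(x, y)

T_CRITICAL = 3.182  # t-table value for alpha/2 = 0.025, df = 3 (assumed alpha = 0.05)
reject_h0 = abs(t_stat) > T_CRITICAL
```

The computed t of about 3.656 exceeds 3.182, so H0 is rejected at the 5% level, which agrees with the reported P-value of 0.0354 being below 0.05.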
Prediction with Regression Models
Types of predictions
• Point estimates
• Interval estimates
What is predicted?
• Population mean response E(y) for given x
  - Point on population regression line
• Individual response (y) for given x
Confidence Interval Estimate for Mean Value of y at \(x = x_p\)
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]
df = n - 2
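A worked sketch of this interval follows. The dataset, the point x_p = 4, and the 95% confidence level are all assumptions made for illustration; 3.182 is the t-table value for α/2 = 0.025 with 3 degrees of freedom.

```python
import math

def mean_response_interval(x, y, x_p, t_crit):
    """Confidence interval for the mean value of y at x = x_p."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = intercept + slope * x_p
    half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

# Hypothetical data; 3.182 is t for alpha/2 = 0.025 with df = n - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
lower, upper = mean_response_interval(x, y, x_p=4, t_crit=3.182)
```

With ŷ = 2.7 at x_p = 4, the interval is roughly 2.7 ± 1.06, or about (1.64, 3.76).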
Factors Affecting Interval Width
• Level of confidence (1 - α)
  - Width increases as confidence increases
• Data dispersion (s)
  - Width increases as variation increases
• Sample size
  - Width decreases as sample size increases
• Distance of \(x_p\) from mean \(\bar{x}\)
  - Width increases as distance increases
Prediction Interval of Individual Value of y at \(x = x_p\)
\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]
df = n - 2
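The only difference from the interval for the mean response is the extra 1 under the square root, which a short sketch makes concrete. As before, the dataset, x_p = 4, and the t value 3.182 (95% confidence, 3 degrees of freedom) are assumptions for illustration.

```python
import math

def interval_half_widths(x, y, x_p, t_crit):
    """Return (y_hat, mean_half_width, prediction_half_width) at x = x_p."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = intercept + slope * x_p
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    mean_half = t_crit * s * math.sqrt(core)            # interval for E(y)
    prediction_half = t_crit * s * math.sqrt(1 + core)  # interval for one y
    return y_hat, mean_half, prediction_half

# Hypothetical data; 3.182 is t for alpha/2 = 0.025 with df = 3.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
y_hat, mean_half, prediction_half = interval_half_widths(x, y, 4, 3.182)
```

At x_p = 4 the prediction interval is roughly 2.7 ± 2.20, about twice as wide as the 2.7 ± 1.06 interval for the mean, because an individual response carries its own random error in addition to the uncertainty in the fitted line.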
Key Takeaway
• The statistical interpretation is the value proposition of the linear regression model
• The statistical interpretation depends on assumptions of the linear model being met
• Understanding outliers is critical for drawing meaningful inferences from the linear regression model
U N I V E R S I T Y O F S O U T H F L O R I D A //
You have reached the end
of the presentation.

More Related Content

Similar to Linear Regression

lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
MoinPasha12
 
lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
Bhavik2002
 
Regression
RegressionRegression
Regression
Sauravurp
 
Corrleation and regression
Corrleation and regressionCorrleation and regression
Corrleation and regression
Pakistan Gum Industries Pvt. Ltd
 
Introduction to Statistical Methods
Introduction to Statistical MethodsIntroduction to Statistical Methods
Introduction to Statistical Methods
Michael770443
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
saba khan
 
Regression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with exampleRegression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with example
shivshankarshiva98
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
Mansi Rastogi
 
Chapter13
Chapter13Chapter13
Chapter13
rwmiller
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
TanyaWadhwani4
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
akashayosha
 
ders 5 hypothesis testing.pptx
ders 5 hypothesis testing.pptxders 5 hypothesis testing.pptx
ders 5 hypothesis testing.pptx
Ergin Akalpler
 
10.Analysis of Variance.ppt
10.Analysis of Variance.ppt10.Analysis of Variance.ppt
10.Analysis of Variance.ppt
AbdulhaqAli
 
Pampers CaseIn an increasingly competitive diaper market, P&G’.docx
Pampers CaseIn an increasingly competitive diaper market, P&G’.docxPampers CaseIn an increasingly competitive diaper market, P&G’.docx
Pampers CaseIn an increasingly competitive diaper market, P&G’.docx
bunyansaturnina
 
U1.4-RVDistributions.ppt
U1.4-RVDistributions.pptU1.4-RVDistributions.ppt
U1.4-RVDistributions.ppt
Sameeraasif2
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
simonithomas47935
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceEstimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
Long Beach City College
 
Correlation
CorrelationCorrelation
Regression for class teaching
Regression for class teachingRegression for class teaching
Regression for class teaching
Pakistan Gum Industries Pvt. Ltd
 
Linear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec domsLinear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec doms
Babasab Patil
 

Similar to Linear Regression (20)

lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
 
lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
 
Regression
RegressionRegression
Regression
 
Corrleation and regression
Corrleation and regressionCorrleation and regression
Corrleation and regression
 
Introduction to Statistical Methods
Introduction to Statistical MethodsIntroduction to Statistical Methods
Introduction to Statistical Methods
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with exampleRegression.ppt basic introduction of regression with example
Regression.ppt basic introduction of regression with example
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
 
Chapter13
Chapter13Chapter13
Chapter13
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
 
ders 5 hypothesis testing.pptx
ders 5 hypothesis testing.pptxders 5 hypothesis testing.pptx
ders 5 hypothesis testing.pptx
 
10.Analysis of Variance.ppt
10.Analysis of Variance.ppt10.Analysis of Variance.ppt
10.Analysis of Variance.ppt
 
Pampers CaseIn an increasingly competitive diaper market, P&G’.docx
Pampers CaseIn an increasingly competitive diaper market, P&G’.docxPampers CaseIn an increasingly competitive diaper market, P&G’.docx
Pampers CaseIn an increasingly competitive diaper market, P&G’.docx
 
U1.4-RVDistributions.ppt
U1.4-RVDistributions.pptU1.4-RVDistributions.ppt
U1.4-RVDistributions.ppt
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceEstimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
 
Correlation
CorrelationCorrelation
Correlation
 
Regression for class teaching
Regression for class teachingRegression for class teaching
Regression for class teaching
 
Linear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec domsLinear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec doms
 

More from Michael770443

Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Discrete Choice Model - Part 2
Discrete Choice Model - Part 2
Michael770443
 
Discrete Choice Model
Discrete Choice ModelDiscrete Choice Model
Discrete Choice Model
Michael770443
 
Categorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisCategorical Data and Statistical Analysis
Categorical Data and Statistical Analysis
Michael770443
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
Michael770443
 
Classification
ClassificationClassification
Classification
Michael770443
 
Segmentation: Clustering and Classification
Segmentation: Clustering and ClassificationSegmentation: Clustering and Classification
Segmentation: Clustering and Classification
Michael770443
 
Overview of Statistical Concepts
Overview of Statistical ConceptsOverview of Statistical Concepts
Overview of Statistical Concepts
Michael770443
 

More from Michael770443 (7)

Discrete Choice Model - Part 2
Discrete Choice Model - Part 2Discrete Choice Model - Part 2
Discrete Choice Model - Part 2
 
Discrete Choice Model
Discrete Choice ModelDiscrete Choice Model
Discrete Choice Model
 
Categorical Data and Statistical Analysis
Categorical Data and Statistical AnalysisCategorical Data and Statistical Analysis
Categorical Data and Statistical Analysis
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Classification
ClassificationClassification
Classification
 
Segmentation: Clustering and Classification
Segmentation: Clustering and ClassificationSegmentation: Clustering and Classification
Segmentation: Clustering and Classification
 
Overview of Statistical Concepts
Overview of Statistical ConceptsOverview of Statistical Concepts
Overview of Statistical Concepts
 

Recently uploaded

C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
Kavitha Krishnan
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 

Recently uploaded (20)

C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 

Linear Regression

  • 1. U N I V E R S I T Y O F S O U T H F L O R I D A // Linear Regression Concepts Dr. S. Shivendu
  • 2. U N I V E R S I T Y O F S O U T H F L O R I D A // 2 Objectives Linear Regression Concepts Identify the mathematical basis of linear regression. 01 Differentiate statistical inferences about relationships based on regression output. 02 Analyze the concepts of p-value, hypothesis testing, and confidence intervals, and their interpretation. 03
  • 3. U N I V E R S I T Y O F S O U T H F L O R I D A // 3 Agenda Linear Regression Concepts Regression Analysis Introduction Linear Regression Concepts Assumptions Concepts Coefficient Confidence Intervals Concepts Prediction Confidence Intervals Concepts
  • 4. U N I V E R S I T Y O F S O U T H F L O R I D A // 4 Models A mathematical model is a mathematical expression of some phenomenon Describe relationships between variables Deterministic Models Probabilistic Models
  • 5. U N I V E R S I T Y O F S O U T H F L O R I D A // 5 Deterministic Models Hypothesize exact relationships. Suitable when the relationship is certain and known. Example: Force is exactly mass times acceleration  F = m·a
  • 6. U N I V E R S I T Y O F S O U T H F L O R I D A // 6 The relationship is not certain and all factors that impact the outcome are not known Hypothesize two components Probabilistic Models  Deterministic and random error Example: Sales volume (y) is 10 times advertising spending (x) + random error  y = 10x +   The random error may be due to factors other than advertising
  • 7. U N I V E R S I T Y O F S O U T H F L O R I D A // 7 Regression Models Answers: “What is the relationship between the variables?” Equations used: One numerical dependent (response) variable Used mainly for estimating the strength of the relationship and for prediction One or more numerical or categorical independent (explanatory) variables
  • 8. U N I V E R S I T Y O F S O U T H F L O R I D A // 8 Regression Modeling Steps Hypothesize the deterministic relationship between the response variable (dependent variable) and one or more explanatory (independent variables) in the Population Specify probability distribution of random error term. Estimate the standard deviation of the error Estimate unknown model parameters Interpret the estimated parameters? What is a parameter?
  • 9. U N I V E R S I T Y O F S O U T H F L O R I D A // 9 Model Specification is Based on Theory Theory of field (e.g., Sociology) Mathematical theory Previous research “Common sense”
  • 10. U N I V E R S I T Y O F S O U T H F L O R I D A // 10 Types of Regression Models Simple 1 Explanatory Variable Regression Models 2+ Explanatory Variables Multiple Linear Linear Non- Linear Non- Linear
  • 11. U N I V E R S I T Y O F S O U T H F L O R I D A // 11 Linear Regression Models Relationship between variables is a linear function y  Dependent (Response) Variable  x  = + + Population y - intercept Participation Slope Random Error Independent (Explanatory) Variable 0 1
  • 12. U N I V E R S I T Y O F S O U T H F L O R I D A // 12 Population Linear Regression Model y x 0 1 i i i y x         0 1 E y x     Observed value Observed value i = Random error
  • 13. U N I V E R S I T Y O F S O U T H F L O R I D A // 13 Sample Linear Regression Model y x 0 1 ˆ ˆ ˆ i i i y x       0 1 ˆ ˆ ˆi i y x     Unsampled observation i = Random error Observed value ^
  • 14. U N I V E R S I T Y O F S O U T H F L O R I D A // 14 Estimating Parameters: Least Squares Method Hypothesize deterministic component Estimate unknown model parameters Regression Modeling Steps Specify probability distribution of random error term Evaluate model Use model for prediction and estimation
  • 15. U N I V E R S I T Y O F S O U T H F L O R I D A // 15 Scattergram [Figure: scatter plot of y versus x, both axes from 0 to 60] Plot of all (xi, yi) pairs. Suggests how well the model will fit.
  • 16. U N I V E R S I T Y O F S O U T H F L O R I D A // 16 Thinking Challenge [Figure: scatter plot of y versus x, both axes from 0 to 60] How would you draw a line through the points? How would you determine which line fits best?
  • 17. U N I V E R S I T Y O F S O U T H F L O R I D A // 17 Least Squares “Best fit” means the differences between the actual y values and the estimated or predicted y values are a minimum. Positive differences offset negative ones, so the differences are squared: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$. Least Squares minimizes the Sum of the Squared Differences (SSE).
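  • 18. U N I V E R S I T Y O F S O U T H F L O R I D A // 18 Least Squares Graphically [Figure: four points around a fitted line, y versus x, with the errors $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$ marked] Observed point: $y_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 + \hat{\varepsilon}_2$. Fitted line: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$.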
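To see what "minimizes the sum of squared errors" means in practice, the sketch below compares the SSE of three candidate lines on a small hypothetical data set. The five (x, y) points are an assumption, not data from the slides; the pair (-0.1, 0.7) is the least squares solution for those points, and the other two lines are arbitrary alternatives.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared differences between observed y and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

candidates = {
    "least squares": (-0.1, 0.7),
    "flatter line":  (0.5, 0.5),
    "steeper line":  (-1.0, 1.0),
}
for name, (b0, b1) in candidates.items():
    print(f"{name:14s} b0={b0:+.1f} b1={b1:.1f}  SSE={sse(b0, b1, xs, ys):.2f}")
```

Any other intercept and slope give an SSE at least as large as that of the least squares line.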
  • 19. U N I V E R S I T Y O F S O U T H F L O R I D A // 19 Coefficient Equations Prediction equation: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. Slope: $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$. y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
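The coefficient equations translate directly into code. The sketch below applies them to a hypothetical five-point data set (an assumption, not data from the slides); the function name `fit_line` is likewise only illustrative.

```python
def fit_line(xs, ys):
    """Return (b0, b1) from the SS_xy / SS_xx coefficient equations."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b1 = ss_xy / ss_xx                     # slope
    b0 = sum(ys) / n - b1 * sum(xs) / n    # intercept: y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0:.1f} + {b1:.1f} x")
```

For these points SS_xy = 7 and SS_xx = 10, so the slope is 0.7 and the intercept is 2 - 0.7(3) = -0.1.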
  • 20. U N I V E R S I T Y O F S O U T H F L O R I D A // 20 Interpretation of Coefficients Slope ($\hat{\beta}_1$): Estimated y changes by $\hat{\beta}_1$ for each 1-unit increase in x. If $\hat{\beta}_1 = 2$, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x). Y-Intercept ($\hat{\beta}_0$): The average value of y when x = 0. If $\hat{\beta}_0 = 4$, then Average Sales (y) is expected to be 4 when Advertising (x) is 0.
  • 21. U N I V E R S I T Y O F S O U T H F L O R I D A // 21 Parameter Estimation Computer Output
    Parameter Estimates
    Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
    INTERCEP   1    -0.1000              0.6350           -0.157              0.8849
    ADVERT     1     0.7000              0.1914            3.656              0.0354
    The INTERCEP estimate is $\hat{\beta}_0$ and the ADVERT estimate is $\hat{\beta}_1$, so $\hat{y} = -.1 + .7x$.
  • 22. U N I V E R S I T Y O F S O U T H F L O R I D A // 22 Coefficient Interpretation Solution Slope (estimate of β1): Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x). Y-Intercept (estimate of β0): Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0. Difficult to explain to marketing manager. Expect some sales without advertising.
  • 23. U N I V E R S I T Y O F S O U T H F L O R I D A // 23 Probability Distribution of Random Error Hypothesize deterministic component Estimate unknown model parameters Regression Modeling Steps Specify probability distribution of random error term Evaluate model Use model for prediction and estimation
  • 24. U N I V E R S I T Y O F S O U T H F L O R I D A // 24 Linear Regression Assumptions The mean of the probability distribution of the error, ε, is 0. The probability distribution of the error, ε, is approximately normal. The probability distribution of the error has a constant variance. Errors are independent.
  • 25. U N I V E R S I T Y O F S O U T H F L O R I D A // 25 Error Probability Distribution [Figure: probability distributions of the error at x1, x2, and x3 along the line E(y) = β0 + β1x; axes x and y]
  • 26. Random Error Variation Variation of actual y from predicted y, $\hat{y}$. Measured by the standard error of the regression model: the sample standard deviation of $\hat{\varepsilon}$, s. Affects several factors like parameter significance and prediction accuracy.
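  • 27. U N I V E R S I T Y O F S O U T H F L O R I D A // 27 Variation Measures [Figure: fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, y versus x, with an observed $y_i$ at $x_i$ and the mean $\bar{y}$ marked] Unexplained sum of squares or SSE: $\sum (y_i - \hat{y}_i)^2$. Total sum of squares: $\sum (y_i - \bar{y})^2$. Explained sum of squares: $\sum (\hat{y}_i - \bar{y})^2$.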
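For a least squares line, the total sum of squares splits exactly into the explained and unexplained parts. The sketch below checks this on a hypothetical five-point data set (an assumption, not data from the slides); `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

b0, b1 = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)
fitted = [b0 + b1 * x for x in xs]

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # unexplained
ss_explained = sum((f - y_bar) ** 2 for f in fitted)     # explained
ss_total = sum((y - y_bar) ** 2 for y in ys)             # total

print(f"total={ss_total:.2f}  explained={ss_explained:.2f}  unexplained={sse:.2f}")
```

The decomposition holds only for the least squares fit; for an arbitrary line the explained and unexplained parts need not add up to the total.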
  • 28. U N I V E R S I T Y O F S O U T H F L O R I D A // 28 Estimation of Variance of Error $\sigma^2$: $s^2 = \frac{SSE}{n-2}$, where $SSE = \sum (y_i - \hat{y}_i)^2$. Standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{SSE}{n-2}}$.
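As a worked instance of the variance estimate, the sketch below computes SSE, s², and s for a hypothetical five-point data set (an assumption, not data from the slides), using the least squares coefficients for those points.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7          # least squares estimates for these points

n = len(xs)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s2 = sse / (n - 2)          # two degrees of freedom are used by b0 and b1
s = math.sqrt(s2)

print(f"SSE={sse:.2f}  s^2={s2:.4f}  s={s:.4f}")
```

The divisor is n - 2 rather than n because two parameters were estimated from the same data.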
  • 29. U N I V E R S I T Y O F S O U T H F L O R I D A // 29 Residual Analysis The residual for observation i, $e_i$, is the difference between its observed and predicted value: $e_i = Y_i - \hat{Y}_i$. Check the assumptions of regression by examining the residuals: examine for linearity assumption; evaluate independence assumption; evaluate normal distribution assumption; examine for constant variance for all levels of X (homoscedasticity).
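Residual analysis starts from the residuals themselves. The sketch below computes them for a hypothetical five-point data set (an assumption, not data from the slides) and prints a crude text bar for each, as a stand-in for the residual plots a statistics package would draw.

```python
# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7          # least squares estimates for these points

# e_i = observed value minus predicted value
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    bar = "#" * round(abs(e) * 10)     # bar length proportional to |e_i|
    sign = "-" if e < 0 else "+"
    print(f"x={x}  e={e:+.2f}  {sign}{bar}")
```

With only five points no pattern can be judged reliably; on real data one would look for curvature, trends over time, or a fan shape in plots of this kind.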
  • 30. U N I V E R S I T Y O F S O U T H F L O R I D A // 30 Residual Analysis for Linearity [Figure: two cases, Not Linear and Linear; each shows Y versus x and residuals versus x]
  • 31. U N I V E R S I T Y O F S O U T H F L O R I D A // 31 Residual Analysis for Independence [Figure: plots of residuals versus X, labeled Not Independent and Independent]
  • 32. U N I V E R S I T Y O F S O U T H F L O R I D A // 32 Check for Normality Examine the Stem-and-Leaf Display of the Residuals. Examine the Boxplot of the Residuals. Examine the Histogram of the Residuals. Construct a Normal Probability Plot of the Residuals.
  • 33. U N I V E R S I T Y O F S O U T H F L O R I D A // 33 Residual Analysis for Normality When using a normal probability plot, normal errors will approximately display in a straight line. [Figure: normal probability plot, Percent (0 to 100) versus Residual (-3 to 3)]
  • 34. U N I V E R S I T Y O F S O U T H F L O R I D A // 34 Residual Analysis for Equal Variance [Figure: two cases, Non-constant variance and Constant variance; each shows Y versus x and residuals versus x]
  • 35. U N I V E R S I T Y O F S O U T H F L O R I D A // 35 Interpreting the Model - Testing for Significance Hypothesize deterministic component Estimate unknown model parameters Regression Modeling Steps Specify probability distribution of random error term Interpret model
  • 36. U N I V E R S I T Y O F S O U T H F L O R I D A // 36 Test of Slope Coefficient Shows if there is a linear relationship between x and y. Involves population slope $\beta_1$. Hypotheses: H0: $\beta_1 = 0$ (No Linear Relationship); Ha: $\beta_1 \neq 0$ (Linear Relationship). Theoretical basis is sampling distribution of slope.
  • 37. U N I V E R S I T Y O F S O U T H F L O R I D A // 37 Sampling Distribution of Sample Slopes [Figure: population line with Sample 1 and Sample 2 lines, y versus x] All Possible Sample Slopes: Sample 1: 2.5, Sample 2: 1.6, Sample 3: 1.8, Sample 4: 2.1, and so on. Very large number of sample slopes. [Figure: sampling distribution of $\hat{\beta}_1$, with $\beta_1$ and the standard error $S_{\hat{\beta}_1}$ marked]
  • 38. U N I V E R S I T Y O F S O U T H F L O R I D A // 38 Slope Coefficient Test Statistic $t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}} = \frac{\hat{\beta}_1}{s / \sqrt{SS_{xx}}}$, with df = n – 2, where $SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$.
  • 39. U N I V E R S I T Y O F S O U T H F L O R I D A // 39 Test of Slope Coefficient Computer Output
    Parameter Estimates
    Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
    INTERCEP   1    -0.1000              0.6350           -0.157              0.8849
    ADVERT     1     0.7000              0.1914            3.656              0.0354
    In the ADVERT row, the Parameter Estimate is $\hat{\beta}_1$, the Standard Error is $S_{\hat{\beta}_1}$, T for H0 is $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$, and Prob>|T| is the P-Value.
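The slope test can be traced by hand. The sketch below uses a hypothetical five-point data set (an assumption, not data from the slides) that gives the same slope, standard error, t statistic, and p-value as the ADVERT row, up to rounding. Because the Python standard library has no t-distribution, the two-sided p-value is obtained by integrating the Student t density numerically with Simpson's rule; `t_pdf` and `two_sided_p` are illustrative helpers, not library functions.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=10_000):
    """P(|T| > |t_stat|), integrating the density on [0, |t_stat|] by Simpson's rule."""
    a = abs(t_stat)
    h = a / steps                          # steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = 2 * total * h / 3            # P(-a < T < a), by symmetry
    return 1 - central

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)  # algebraically equal to sum(x^2) - (sum x)^2 / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(ss_xx)               # standard error of the slope
t_stat = b1 / se_b1
p_value = two_sided_p(t_stat, n - 2)

print(f"b1={b1:.4f}  se={se_b1:.4f}  t={t_stat:.3f}  p={p_value:.4f}")
```

At a 5% significance level the p-value is below 0.05, so H0: β1 = 0 would be rejected for this data.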
  • 40. U N I V E R S I T Y O F S O U T H F L O R I D A // 40 Prediction with Regression Models Types of predictions: point estimates; interval estimates. What is predicted? Population mean response E(y) for given x (a point on the population regression line); individual response (y) for given x.
  • 41. U N I V E R S I T Y O F S O U T H F L O R I D A // 41 Confidence Interval Estimate for Mean Value of y at x = $x_p$: $\hat{y} \pm t_{\alpha/2} \, S \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$, with df = n – 2.
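A worked instance of the confidence interval for the mean response follows. It is a sketch under stated assumptions: the five data points are hypothetical, the chosen value x_p = 4 is arbitrary, and the critical value 3.182 is the standard t-table entry for a 95% interval with 3 degrees of freedom, entered by hand rather than computed.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7          # least squares estimates for these points
t_crit = 3.182              # t-table value for alpha/2 = 0.025, df = 3 (assumed lookup)

n = len(xs)
x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

x_p = 4
y_hat = b0 + b1 * x_p
half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
ci_low, ci_high = y_hat - half_width, y_hat + half_width

print(f"y_hat={y_hat:.2f}  95% CI for mean y: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval describes where the average of y at x_p = 4 is likely to lie, not where a single new observation will fall.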
  • 42. U N I V E R S I T Y O F S O U T H F L O R I D A // 42 Factors Affecting Interval Width Level of confidence (1 – α): width increases as confidence increases. Data dispersion (s): width increases as variation increases. Sample size: width decreases as sample size increases. Distance of $x_p$ from mean $\bar{x}$: width increases as distance increases.
  • 43. U N I V E R S I T Y O F S O U T H F L O R I D A // 43 Prediction Interval of Individual Value of y at x = $x_p$: $\hat{y} \pm t_{\alpha/2} \, S \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$, with df = n – 2.
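The prediction interval differs from the confidence interval for the mean only by the extra 1 under the square root. The sketch below evaluates it for a hypothetical five-point data set (an assumption, not data from the slides) at several values of x_p, to show how the width grows with distance from the mean of x. The critical value 3.182 is the standard t-table entry for 95% and 3 degrees of freedom, entered by hand; `prediction_interval` is an illustrative helper.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7          # least squares estimates for these points
t_crit = 3.182              # t-table value for alpha/2 = 0.025, df = 3 (assumed lookup)

n = len(xs)
x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

def prediction_interval(x_p):
    """95% prediction interval for an individual y at x = x_p."""
    y_hat = b0 + b1 * x_p
    half = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half, y_hat + half

for x_p in (3, 4, 5):
    low, high = prediction_interval(x_p)
    print(f"x_p={x_p}  PI=({low:.3f}, {high:.3f})  width={high - low:.3f}")
```

The interval is narrowest at x_p equal to the mean of x and widens on either side, and it is always wider than the confidence interval for the mean at the same x_p.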
  • 44. U N I V E R S I T Y O F S O U T H F L O R I D A // 44 Key Takeaway The statistical interpretation is the value proposition of the linear regression model. The statistical interpretation depends on assumptions of the linear model being met. Understanding outliers is critical for drawing meaningful inferences from the linear regression model.
  • 45. U N I V E R S I T Y O F S O U T H F L O R I D A // You have reached the end of the presentation.