Linear regression
Copyright © 2012 Pearson Canada Inc., Toronto, Ontario
Slide 8- 1
The Linear Model
• Correlation says “There seems to be a linear association between these two variables,” but it doesn’t tell us what that association is.
• We can say more about the linear relationship between two quantitative variables with a model.
• A model simplifies reality to help us understand underlying patterns and relationships.
Slide 8- 2
The Linear Model (cont.)
• The linear model is just an equation of a straight line through the data.
  ◦ The points in the scatterplot don’t all line up, but a straight line can summarize the general pattern.
  ◦ The linear model can help us understand how the values are associated.
Slide 8- 3
Residuals
• The model won’t be perfect, regardless of the line we draw.
• Some points will be above the line and some will be below.
• The estimate made from a model is the predicted value (denoted as $\hat{y}$).
Slide 8- 4
Residuals (cont.)
• The difference between the observed value and its associated predicted value is called the residual.
• To find the residuals, we always subtract the predicted value from the observed one:
$$\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}$$
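The subtraction above can be sketched in a few lines of Python. The data points and the line $\hat{y} = 5 + 1 \cdot x$ are made up for illustration (they are not from the slides); only the rule residual = observed − predicted comes from the text.

```python
# Hypothetical observations and a hypothetical line y-hat = 5 + 1*x
# (illustrative values only, not taken from the slides).
observed_x = [10.0, 20.0, 30.0]
observed_y = [18.0, 22.0, 40.0]


def predict(x, b0=5.0, b1=1.0):
    """Predicted value y-hat for the illustrative line."""
    return b0 + b1 * x


# residual = observed - predicted
residuals = [y - predict(x) for x, y in zip(observed_x, observed_y)]

for x, y, e in zip(observed_x, observed_y, residuals):
    if e > 0:
        verdict = "prediction too small"
    elif e < 0:
        verdict = "prediction too big"
    else:
        verdict = "exact"
    print(f"x={x}: observed={y}, predicted={predict(x)}, residual={e} ({verdict})")
```

The order of subtraction matters: swapping it would flip the sign of every residual and reverse the reading of over- and underestimates.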
Slide 8- 5
Fat versus Protein: An Example
• The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:
[Figure: scatterplot of total fat versus protein]
Slide 8- 6
Residuals (cont.)
• A negative residual means the predicted value’s too big (an overestimate).
• A positive residual means the predicted value’s too small (an underestimate).
Why Do We Need Regression Models?
Regression models have many applications in industry and science.
Regression models can predict future trends, like population growth.
Regression models can be used to estimate costs and profits for businesses.
Regression models can help medical care professionals determine how much medication to prescribe for their patients.
Slide 8- 8
“Best Fit” Means Least Squares
• Some residuals are positive, others are negative, and, on average, they cancel each other out.
• So, we can’t assess how well the line fits by adding up all the residuals.
• Similar to what we did with deviations, we square the residuals and add the squares.
• The smaller the sum, the better the fit.
• The line of best fit is the line for which the sum of the squared residuals is smallest.
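A small sketch of why the squares are needed, using four made-up data points (an assumption for illustration). A flat line through the mean of y and the least squares line both have residuals that add up to zero, yet their sums of squared residuals are very different.

```python
# Hypothetical data, chosen so the arithmetic is easy to follow.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]


def residuals_for(b0, b1):
    """Residuals y - (b0 + b1*x) for a candidate line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(b0, b1):
    return sum(e ** 2 for e in residuals_for(b0, b1))


# Candidate 1: a flat line at the mean of y (4.75).
flat_sum = sum(residuals_for(4.75, 0.0))
flat_ssr = sum_squared_residuals(4.75, 0.0)

# Candidate 2: the least squares line for these points, y-hat = 0 + 1.9x.
ls_sum = sum(residuals_for(0.0, 1.9))
ls_ssr = sum_squared_residuals(0.0, 1.9)

# Candidate 3: a nearby line, y-hat = 0 + 2x, for comparison.
near_ssr = sum_squared_residuals(0.0, 2.0)

print(f"flat line:      sum={flat_sum:.2f}, sum of squares={flat_ssr:.2f}")
print(f"least squares:  sum={ls_sum:.2f}, sum of squares={ls_ssr:.2f}")
print(f"nearby line:    sum of squares={near_ssr:.2f}")
```

The plain sums cannot tell the flat line from the least squares line, while the sums of squares (18.75 versus about 0.70) clearly can.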
Slide 8- 9
The Least Squares Line
• We write our model as
$$\hat{y} = b_0 + b_1 x$$
• This model says that our predictions from our model follow a straight line.
• If the model is a good one, the data values will scatter closely around it.
Slide 8- 10
The Least Squares Line (cont.)
• In our model, we have a slope ($b_1$):
  ◦ The slope is built from the correlation and the standard deviations:
$$b_1 = r\,\frac{s_y}{s_x}$$
  ◦ Our slope is always in units of y per unit of x.
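As a rough illustration of the slope formula, the sketch below computes $r$, $s_x$ and $s_y$ for four made-up points and combines them into $b_1$. The data and the helper name `correlation` are assumptions for the example, not part of the slides.

```python
import statistics

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]


def correlation(xs, ys):
    """Correlation r as the average product of z-scores (divisor n - 1)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


r = correlation(xs, ys)
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)

# Slope = correlation times (SD of y over SD of x): units of y per unit of x.
b1 = r * s_y / s_x

print(f"r = {r:.4f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, b1 = {b1:.4f}")
```

Because $r$ has no units, the ratio $s_y / s_x$ is what gives the slope its units of y per unit of x.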
Slide 8- 11
The Least Squares Equation
• The formulas for $b_1$ and $b_0$ are:
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
algebraic equivalent:
$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$
and
$$b_0 = \bar{y} - b_1 \bar{x}$$
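The two forms of the slope formula can be checked against each other numerically. This is a minimal sketch on four made-up points (an assumption for illustration); the variable names are hypothetical.

```python
# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Definition form: sum of cross-deviations over sum of squared x-deviations.
b1_deviation = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    / sum((x - x_bar) ** 2 for x in xs)
)

# Algebraically equivalent form that uses only raw sums.
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x ** 2 for x in xs)
b1_raw_sums = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Intercept: the line passes through the point (x_bar, y_bar).
b0 = y_bar - b1_deviation * x_bar

print(f"b1 (deviations) = {b1_deviation:.4f}")
print(f"b1 (raw sums)   = {b1_raw_sums:.4f}")
print(f"b0              = {b0:.4f}")
```

The raw-sums form is convenient for hand calculation from a table of totals; the deviation form makes it easier to see what the slope measures.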
Slide 8- 12
Fat versus Protein: An Example
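• The regression line for the Burger King data fits the data well:
[Figure: scatterplot of total fat versus protein with the regression line]
  ◦ The equation is $\widehat{\text{fat}} = 6.8 + 0.97\,\text{protein}$.
  ◦ The predicted fat content for a BK Broiler chicken sandwich (protein = 30) is 6.8 + 0.97(30) = 35.9 grams of fat.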
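The prediction above is a direct substitution into the fitted line. A minimal sketch, using the coefficients 6.8 and 0.97 from the slide; the function name `predicted_fat` is a hypothetical label for the example.

```python
def predicted_fat(protein, b0=6.8, b1=0.97):
    """Predicted total fat (grams) from the fitted line fat-hat = 6.8 + 0.97 * protein."""
    return b0 + b1 * protein


# BK Broiler chicken sandwich: protein value of 30, as on the slide.
broiler_prediction = predicted_fat(30)
print(round(broiler_prediction, 1))  # 35.9
```

The same function should only be trusted for protein values inside the range covered by the menu items used to fit the line.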
Slide 8- 13
The Least Squares Line (cont.)
• Since regression and correlation are closely related, we need to check the same conditions for regressions as we did for correlations:
  ◦ Quantitative Variables Condition
  ◦ Straight Enough Condition
  ◦ Outlier Condition
Slide 8- 14
How Big Can Predicted Values Get?
• r cannot be bigger than 1 (in absolute value), so each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was.
• This property of the linear model is called regression to the mean; the line is called the regression line.
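One way to see where this property comes from is to substitute the slope and intercept formulas into the model. The short derivation below is a sketch using only the formulas already given, $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$; the z-score notation $z_x$, $z_{\hat{y}}$ is introduced here for convenience.

```latex
\begin{align*}
\hat{y} &= b_0 + b_1 x = \bar{y} - b_1 \bar{x} + b_1 x \\
\hat{y} - \bar{y} &= b_1 (x - \bar{x}) = r\,\frac{s_y}{s_x}\,(x - \bar{x}) \\
\frac{\hat{y} - \bar{y}}{s_y} &= r\,\frac{x - \bar{x}}{s_x}
\quad\Longrightarrow\quad z_{\hat{y}} = r\, z_x
\end{align*}
```

Since $|r| \le 1$, the predicted value is never more standard deviations from $\bar{y}$ than $x$ is from $\bar{x}$. For example, with $r = 0.83$, an $x$ that is 2 standard deviations above its mean gives a predicted $y$ that is 1.66 standard deviations above its mean.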
Slide 8- 15
Residuals Revisited
• The linear model assumes that the relationship between the two variables is a perfect straight line. The residuals are the part of the data that hasn’t been modeled.
Data = Model + Residual
or (equivalently)
Residual = Data – Model
Or, in symbols,
$$e = y - \hat{y}$$
Slide 8- 16
Residuals Revisited (cont.)
• Residuals help us to see whether the model makes sense.
• When a regression model is appropriate, nothing interesting should be left behind.
• After we fit a regression model, we usually plot the residuals in the hope of finding… nothing.
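A numeric companion to the residual plot, sketched on made-up data (an assumption for illustration). For a least squares line with an intercept, the residuals always sum to zero and have no linear association with x, so those two checks pass by construction; what the plot is really for is spotting leftover shape, such as a bend or a fan.

```python
# Hypothetical, roughly linear data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]


def fit_line(xs, ys):
    """Least squares intercept and slope from the deviation formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

x_bar = sum(xs) / len(xs)
residual_sum = sum(residuals)
# Cross-product of x-deviations and residuals: zero means no leftover linear trend.
residual_x_crossproduct = sum((x - x_bar) * e for x, e in zip(xs, residuals))

print([round(e, 3) for e in residuals])
print(f"sum of residuals: {residual_sum:.2e}")
print(f"cross-product with x: {residual_x_crossproduct:.2e}")
```

Printing the residuals in x order is a crude substitute for the plot: a run of same-signed values at one end or in the middle would be the "something" we hope not to find.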
Slide 8- 17
Residuals Revisited (cont.)
• The residuals for the BK menu regression look appropriately boring:
[Figure: residual plot for the BK menu regression]
Slide 8- 18
$R^2$: The Variation Accounted For
• The variation in the residuals is the key to assessing how well the model fits.
• In the BK menu items example, total fat has a standard deviation of 16.4 grams. The standard deviation of the residuals is 9.2 grams.
Slide 8- 19
$R^2$: The Variation Accounted For (cont.)
• If the correlation were 1.0 and the model predicted the fat values perfectly, the residuals would all be zero and have no variation.
• As it is, the correlation is 0.83, not perfection.
• However, we did see that the model residuals had less variation than total fat alone.
• We can determine how much of the variation is accounted for by the model and how much is left in the residuals.
Slide 8- 20
$R^2$: The Variation Accounted For (cont.)
• The squared correlation, $r^2$, gives the fraction of the data’s variance accounted for by the model.
• Thus, $1 - r^2$ is the fraction of the original variance left in the residuals.
• For the BK model, $r^2 = 0.83^2 = 0.69$, so 31% of the variability in total fat has been left in the residuals.
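The 69% / 31% split can be cross-checked against the two standard deviations quoted for the BK data (16.4 grams for total fat, 9.2 grams for the residuals). The sketch below uses only those numbers and the correlation 0.83 from the slides; the variable names are hypothetical, and the two routes are expected to agree only up to rounding of the inputs.

```python
sd_total_fat = 16.4   # standard deviation of total fat (grams), from the slides
sd_residuals = 9.2    # standard deviation of the residuals (grams), from the slides
r = 0.83              # correlation between fat and protein, from the slides

# Fraction of the original variance still left in the residuals.
fraction_left = (sd_residuals / sd_total_fat) ** 2

# Fraction accounted for by the model, computed two ways.
r_squared_from_sds = 1 - fraction_left
r_squared_from_r = r ** 2

print(f"variance left in residuals: {fraction_left:.2f}")
print(f"R^2 from standard deviations: {r_squared_from_sds:.2f}")
print(f"R^2 from r: {r_squared_from_r:.2f}")
```

Note that the comparison is between variances (squared standard deviations), not between the standard deviations themselves: 9.2 / 16.4 is about 0.56, but its square is about 0.31.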
Slide 8- 21
$R^2$: The Variation Accounted For (cont.)
• All regression analyses include this statistic, although by tradition, it is written $R^2$ (pronounced “R-squared”). An $R^2$ of 0 means that none of the variance in the data is in the model; all of it is still in the residuals.
• When interpreting a regression model you need to Tell what $R^2$ means.
  ◦ In the BK example, 69% of the variation in total fat is accounted for by the model.
Slide 8- 22
How Big Should $R^2$ Be?
• $R^2$ is always between 0% and 100%. What makes a “good” $R^2$ value depends on the kind of data you are analyzing and on what you want to do with it.
• The standard deviation of the residuals can give us more information about the usefulness of the regression by telling us how much scatter there is around the line.
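The slides do not give a formula for the standard deviation of the residuals. A common convention for a line fitted with one predictor, assumed here, is $s_e = \sqrt{\sum e^2 / (n - 2)}$. The sketch below applies it to four made-up points.

```python
import math

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
n = len(xs)

# Least squares fit from the deviation formulas.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    / sum((x - x_bar) ** 2 for x in xs)
)
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Assumed convention: divide by n - 2 because two coefficients were estimated.
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"standard deviation of residuals: {s_e:.3f}")
```

Unlike $R^2$, this value is in the units of y, so it says directly how far a typical point sits from the line.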
Slide 8- 23
Regression Assumptions and Conditions (cont.)
• It’s a good idea to check linearity again after computing the regression, when we can examine the residuals.
• You should also check for outliers, which could change the regression.
• If the data seem to clump or cluster in the scatterplot, that could be a sign of trouble worth looking into further.
Slide 8- 24
Regression Assumptions and Conditions (cont.)
• If the scatterplot is not straight enough, stop here.
  ◦ You can’t use a linear model for just any two variables, even if they are related.
  ◦ They must have a linear association or the model won’t mean a thing.
• Some nonlinear relationships can be saved by re-expressing the data to make the scatterplot more linear.
Slide 8- 25
What Can Go Wrong?
• Don’t fit a straight line to a nonlinear relationship.
• Beware extraordinary points (y-values that stand off from the linear pattern or extreme x-values).
• Don’t extrapolate beyond the data; the linear model may no longer hold outside of the range of the data.
• Don’t infer that x causes y just because there is a good linear model for their relationship; association is not causation.
• Don’t choose a model based on $R^2$ alone.
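The first and last warnings can be illustrated together. In the sketch below, the data follow the curve $y = x^2$ exactly (a made-up example, not from the slides). A straight line fitted to them still earns a high $R^2$, but the residuals show an obvious bend.

```python
# Hypothetical curved data: y = x^2 for x = 0, 1, ..., 10.
xs = [float(x) for x in range(11)]
ys = [x ** 2 for x in xs]
n = len(xs)

# Least squares line from the deviation formulas.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    / sum((x - x_bar) ** 2 for x in xs)
)
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# R^2 as 1 minus (residual sum of squares / total sum of squares).
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_residual / ss_total

print(f"R^2 = {r_squared:.3f}")
print([round(e, 1) for e in residuals])
```

The residuals are positive at both ends and negative in the middle, a pattern that a residual plot would expose immediately even though $R^2$ is above 90%.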

More Related Content

Similar to Linear regression

notes on correlational research
notes on correlational researchnotes on correlational research
notes on correlational researchSiti Ishark
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with Regoodwintx
 
Case2_Best_Model_Final
Case2_Best_Model_FinalCase2_Best_Model_Final
Case2_Best_Model_FinalEric Esajian
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?Smarten Augmented Analytics
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMSAli T. Lotia
 
Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...
Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...
Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...Solventure
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
Analysis and Estimation of Child Mortality and the Influence of Maternal Care...
Analysis and Estimation of Child Mortality and the Influence of Maternal Care...Analysis and Estimation of Child Mortality and the Influence of Maternal Care...
Analysis and Estimation of Child Mortality and the Influence of Maternal Care...IRJET Journal
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionMohamad Sahil
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxoreo10
 
Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?Gaetan Lion
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipRithish Kumar
 
Building a Regression Model using SPSS
Building a Regression Model using SPSSBuilding a Regression Model using SPSS
Building a Regression Model using SPSSZac Bodner
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSOsama Yousaf
 
Logistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationLogistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationAsadJaved304231
 

Similar to Linear regression (20)

notes on correlational research
notes on correlational researchnotes on correlational research
notes on correlational research
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
 
Case Study of Petroleum Consumption With R Code
Case Study of Petroleum Consumption With R CodeCase Study of Petroleum Consumption With R Code
Case Study of Petroleum Consumption With R Code
 
Case2_Best_Model_Final
Case2_Best_Model_FinalCase2_Best_Model_Final
Case2_Best_Model_Final
 
Correlation
CorrelationCorrelation
Correlation
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 
Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...
Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...
Chapter 2 - Benchmarking and the Best Practice Frontier in the Supply Chain T...
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
Analysis and Estimation of Child Mortality and the Influence of Maternal Care...
Analysis and Estimation of Child Mortality and the Influence of Maternal Care...Analysis and Estimation of Child Mortality and the Influence of Maternal Care...
Analysis and Estimation of Child Mortality and the Influence of Maternal Care...
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
 
Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationship
 
200994363
200994363200994363
200994363
 
Econometrics
EconometricsEconometrics
Econometrics
 
SEM
SEMSEM
SEM
 
Building a Regression Model using SPSS
Building a Regression Model using SPSSBuilding a Regression Model using SPSS
Building a Regression Model using SPSS
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSS
 
Logistic regression and analysis using statistical information
Logistic regression and analysis using statistical informationLogistic regression and analysis using statistical information
Logistic regression and analysis using statistical information
 

More from suncil0071

Conditional probability
Conditional probabilityConditional probability
Conditional probabilitysuncil0071
 
Basics of probability
Basics of probabilityBasics of probability
Basics of probabilitysuncil0071
 
Cluster and multistage sampling
Cluster and multistage samplingCluster and multistage sampling
Cluster and multistage samplingsuncil0071
 
Stratified sampling
Stratified samplingStratified sampling
Stratified samplingsuncil0071
 
Simple random sampling
Simple random samplingSimple random sampling
Simple random samplingsuncil0071
 
Simple random sampling
Simple random samplingSimple random sampling
Simple random samplingsuncil0071
 
Ideas to understand
Ideas to understandIdeas to understand
Ideas to understandsuncil0071
 
Understanding randomness
Understanding randomnessUnderstanding randomness
Understanding randomnesssuncil0071
 
Measures of central tendency mean
Measures of central tendency  meanMeasures of central tendency  mean
Measures of central tendency meansuncil0071
 
Histograms & stem plots
Histograms  & stem plotsHistograms  & stem plots
Histograms & stem plotssuncil0071
 

More from suncil0071 (11)

Conditional probability
Conditional probabilityConditional probability
Conditional probability
 
Basics of probability
Basics of probabilityBasics of probability
Basics of probability
 
Who’s who
Who’s whoWho’s who
Who’s who
 
Cluster and multistage sampling
Cluster and multistage samplingCluster and multistage sampling
Cluster and multistage sampling
 
Stratified sampling
Stratified samplingStratified sampling
Stratified sampling
 
Simple random sampling
Simple random samplingSimple random sampling
Simple random sampling
 
Simple random sampling
Simple random samplingSimple random sampling
Simple random sampling
 
Ideas to understand
Ideas to understandIdeas to understand
Ideas to understand
 
Understanding randomness
Understanding randomnessUnderstanding randomness
Understanding randomness
 
Measures of central tendency mean
Measures of central tendency  meanMeasures of central tendency  mean
Measures of central tendency mean
 
Histograms & stem plots
Histograms  & stem plotsHistograms  & stem plots
Histograms & stem plots
 

Recently uploaded

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Recently uploaded (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Linear regression

  • 1. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 1 The Linear Model  Correlation says “There seems to be a linear association between these two variables,” but it doesn’t tell what that association is.  We can say more about the linear relationship between two quantitative variables with a model.  A model simplifies reality to help us understand underlying patterns and relationships.
  • 2. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 2 The Linear Model (cont.)  The linear model is just an equation of a straight line through the data.  The points in the scatterplot don’t all line up, but a straight line can summarize the general pattern.  The linear model can help us understand how the values are associated.
  • 3. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 3 Residuals  The model won’t be perfect, regardless of the line we draw.  Some points will be above the line and some will be below.  The estimate made from a model is the predicted value (denoted as ).ˆy
  • 4. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 4 Residuals (cont.)  The difference between the observed value and its associated predicted value is called the residual.  To find the residuals, we always subtract the predicted value from the observed one: ˆresidual observed predicted y y= − = −
  • 5. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 5 Fat versus Protein: An Example  The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:
  • 6. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 6 Residuals (cont.)  A negative residual means the predicted value’s too big (an overestimate).  A positive residual means the predicted value’s too small (an underestimate).
  • 7. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Why do we need them? Regression models have many applications in industry and science. Regression models can predict future trends, like population growth. Regression models can be used to estimate costs and profits for businesses. Regression models can help medical care professionals determine how much medication to prescribe for their patients.
  • 8. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 8 “Best Fit” Means Least Squares  Some residuals are positive, others are negative, and, on average, they cancel each other out.  So, we can’t assess how well the line fits by adding up all the residuals.  Similar to what we did with deviations, we square the residuals and add the squares.  The smaller the sum, the better the fit.  The line of best fit is the line for which the sum of the squared residuals is smallest.
  • 9. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 9 The Least Squares Line  We write our model as  This model says that our predictions from our model follow a straight line.  If the model is a good one, the data values will scatter closely around it. 0 1 ˆy b b x= +
  • 10. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 10 The Least Squares Line (cont.)  In our model, we have a slope (b1):  The slope is built from the correlation and the standard deviations:  Our slope is always in units of y per unit of x. 1 y x s b r s =
  • 11. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario 11 The Least Squares Equation  The formulas for b1 and b0 are: algebraic equivalent: ∑ ∑ ∑ ∑ ∑ − − = n x x n yx xy b 2 2 1 )( ∑ ∑ − −− = 21 )( ))(( xx yyxx b xbyb 10 −= and
  • 12. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 12 Fat versus Protein: An Example  The regression line for the Burger King data fits the data well:  The equation is  The predicted fat content for a BK Broiler chicken sandwich is 6.8 + 0.97(30) = 35.9 grams of fat.
  • 13. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 13 The Least Squares Line (cont.)  Since regression and correlation are closely related, we need to check the same conditions for regressions as we did for correlations:  Quantitative Variables Condition  Straight Enough Condition  Outlier Condition
  • 14. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 14 How Big Can Predicted Values Get?  r cannot be bigger than 1 (in absolute value), so each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was.  This property of the linear model is called regression to the mean; the line is called the regression line.
  • 15. Copyright © 2012 Pearson Canada Inc., Toronto, Ontario Slide 8- 15 Residuals Revisited  The linear model assumes that the relationship between the two variables is a perfect straight line. The residuals are the part of the data that hasn’t been modeled. Data = Model + Residual or (equivalently) Residual = Data – Model Or, in symbols, ˆe y y= −
  • 16. Residuals Revisited (cont.): Residuals help us to see whether the model makes sense. When a regression model is appropriate, nothing interesting should be left behind. After we fit a regression model, we usually plot the residuals in the hope of finding… nothing.
  • 17. Residuals Revisited (cont.): The residuals for the BK menu regression look appropriately boring.
  • 18. $R^2$, the Variation Accounted For: The variation in the residuals is the key to assessing how well the model fits. In the BK menu items example, total fat has a standard deviation of 16.4 grams. The standard deviation of the residuals is 9.2 grams.
  • 19. $R^2$, the Variation Accounted For (cont.): If the correlation were 1.0 and the model predicted the fat values perfectly, the residuals would all be zero and have no variation. As it is, the correlation is 0.83, which is not perfection. However, we did see that the model residuals had less variation than total fat alone. We can determine how much of the variation is accounted for by the model and how much is left in the residuals.
  • 20. $R^2$, the Variation Accounted For (cont.): The squared correlation, $r^2$, gives the fraction of the data’s variance accounted for by the model. Thus, $1 - r^2$ is the fraction of the original variance left in the residuals. For the BK model, $r^2 = 0.83^2 = 0.69$, so 31% of the variability in total fat has been left in the residuals.
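The 69% / 31% split can be cross-checked against the two standard deviations quoted earlier for the BK example (16.4 grams for total fat, 9.2 grams for the residuals). The sketch below uses only those rounded figures and the correlation of 0.83, so the two routes agree only approximately; the variable names are made up for the example.

```python
r = 0.83            # correlation for the BK data, as quoted in the slides
sd_total_fat = 16.4 # standard deviation of total fat (grams)
sd_residuals = 9.2  # standard deviation of the residuals (grams)

r_squared = r ** 2            # fraction of variance accounted for by the model
left_over = 1 - r_squared     # fraction of variance left in the residuals

# Same quantity estimated from the variances (squared standard deviations)
left_over_from_sds = (sd_residuals / sd_total_fat) ** 2

print(f"r^2                    = {r_squared:.2f}")
print(f"1 - r^2                = {left_over:.2f}")
print(f"var(resid) / var(fat)  = {left_over_from_sds:.2f}")
```

The small gap between the last two values comes from rounding in the quoted figures and from the choice of denominator used for the residual standard deviation.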
  • 21. $R^2$, the Variation Accounted For (cont.): All regression analyses include this statistic, although by tradition it is written $R^2$ (pronounced “R-squared”). An $R^2$ of 0 means that none of the variance in the data is in the model; all of it is still in the residuals. When interpreting a regression model, you need to Tell what $R^2$ means. In the BK example, 69% of the variation in total fat is accounted for by the model.
  • 22. How Big Should $R^2$ Be? $R^2$ is always between 0% and 100%. What makes a “good” $R^2$ value depends on the kind of data you are analyzing and on what you want to do with it. The standard deviation of the residuals can give us more information about the usefulness of the regression by telling us how much scatter there is around the line.
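A minimal sketch of the standard deviation of the residuals, assuming the common convention of dividing the sum of squared residuals by n − 2 (the slide does not state the denominator). The residual values are hypothetical, chosen only to illustrate the calculation.

```python
import math

def residual_sd(residuals):
    """Standard deviation of residuals, using n - 2 degrees of freedom (assumed)."""
    n = len(residuals)
    return math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Hypothetical residuals from a line fitted to five points
residuals = [-0.6, 2.2, -3.0, 1.8, -0.4]

s_e = residual_sd(residuals)
print(f"typical scatter around the line: about {s_e:.2f} units of y")
```

Because the result is in the units of y, it can be compared directly with the size of prediction error that would matter in practice.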
  • 23. Regression Assumptions and Conditions (cont.): It’s a good idea to check linearity again after computing the regression, when we can examine the residuals. You should also check for outliers, which could change the regression. If the data seem to clump or cluster in the scatterplot, that could be a sign of trouble worth looking into further.
  • 24. Regression Assumptions and Conditions (cont.): If the scatterplot is not straight enough, stop here. You can’t use a linear model for just any two variables, even if they are related. They must have a linear association or the model won’t mean a thing. Some nonlinear relationships can be saved by re-expressing the data to make the scatterplot more linear.
  • 25. What Can Go Wrong? Don’t fit a straight line to a nonlinear relationship. Beware extraordinary points (y-values that stand off from the linear pattern or extreme x-values). Don’t extrapolate beyond the data; the linear model may no longer hold outside of the range of the data. Don’t infer that x causes y just because there is a good linear model for their relationship; association is not causation. Don’t choose a model based on $R^2$ alone.