SIMPLE REGRESSION
MODEL
Regression analysis is a predictive modelling technique that
investigates the relationship between a dependent (target)
variable and one or more independent (predictor) variables.
This technique is used for forecasting, time series
modelling and investigating cause-and-effect relationships
between the variables.
The simple regression model is also called the two-variable
(bivariate) linear model because it relates the
two variables x and y.
y     dependent (explained, response, predicted) variable; regressand
x     independent (explanatory, control, predictor) variable; regressor
ε     error term or disturbance
β1    intercept coefficient or constant term
β2    slope coefficient
Table 2.1 Terminology for simple regression
REGRESSION VERSUS CORRELATION
Correlation analysis: the primary objective is to measure the strength or
degree of linear association between two variables.
Regression analysis: the primary objective is to estimate or predict the
average value of one variable on the basis of the fixed values of other
variables.
Least Squares Principle
The method of least squares estimates the parameters β1 and β2 by
minimizing the sum of squared differences between the observations and
the line in the scatter diagram.
The intercept and slope of this line, the line that best fits
the data under the least squares principle, are b1 and
b2, the least squares estimates of β1 and β2. The fitted line
is then
$\hat{y}_i = b_1 + b_2 x_i$
The vertical distances from each point to the fitted line
are the least squares residuals. They are given by
$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$
Figure. The relationship among y, $\hat{e}_i$ and the fitted regression line
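The minimization behind the least squares principle can be written out as a short derivation. This is a sketch of the standard argument, not something the slides spell out; the symbol S for the sum of squared residuals is introduced here for illustration.

```latex
% Sum of squared residuals as a function of the candidate intercept and slope
S(b_1, b_2) = \sum_{i=1}^{N} (y_i - b_1 - b_2 x_i)^2

% First-order conditions (normal equations)
\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{N} (y_i - b_1 - b_2 x_i) = 0
\qquad
\frac{\partial S}{\partial b_2} = -2 \sum_{i=1}^{N} x_i (y_i - b_1 - b_2 x_i) = 0

% Solving the first condition for b_1 and substituting into the second
b_1 = \bar{y} - b_2 \bar{x},
\qquad
b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}
```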
$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b_1 = \bar{y} - b_2 \bar{x}$
where $\bar{y} = \sum y_i / N$ and $\bar{x} = \sum x_i / N$ are the sample means of the
observations on y and x.
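As a minimal sketch, the two formulas for the intercept b1 and the slope b2 can be computed directly in plain Python. The function name least_squares_fit and the small data set are hypothetical, chosen so that the points lie exactly on y = 1 + 2x.

```python
def least_squares_fit(x, y):
    """Return (b1, b2): intercept and slope of the least squares line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n  # sample mean of x
    y_bar = sum(y) / n  # sample mean of y
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b2 = sxy / sxx            # slope
    b1 = y_bar - b2 * x_bar   # intercept
    return b1, b2


# Illustrative data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
b1, b2 = least_squares_fit(x, y)
print(b1, b2)
```

Because the illustrative points have no scatter, the fit recovers the intercept 1 and the slope 2; with real data the residuals would be non-zero.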
Elasticity measures the proportional change in one economic variable
in response to a change in another.
The elasticity of a variable y with respect to another variable x is
$\varepsilon = \frac{\text{percentage change in } y}{\text{percentage change in } x} = \frac{\Delta y / y}{\Delta x / x} = \frac{\Delta y}{\Delta x} \times \frac{x}{y}$
Evaluating at the sample means and using the slope estimate b2 for Δy/Δx gives
$\hat{\varepsilon} = b_2 \frac{\bar{x}}{\bar{y}} = 10.21 \times \frac{19.60}{283.57} = 0.71$
We estimate that a 1% increase in weekly household income will
lead, on average, to a 0.71% increase in weekly household
expenditure on food.
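The arithmetic of the food expenditure example can be checked in a few lines. The function name elasticity_at_means is hypothetical; the three numbers are the ones quoted in the slide.

```python
def elasticity_at_means(b2, x_bar, y_bar):
    """Elasticity of y with respect to x, evaluated at the sample means."""
    return b2 * x_bar / y_bar


# Values from the slide: slope estimate, mean income, mean food expenditure
eps = elasticity_at_means(10.21, 19.60, 283.57)
print(round(eps, 2))  # 0.71
```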
Assessing the Fit of Regression Models
Three statistics are used in Ordinary Least Squares (OLS) regression to
evaluate model fit:
• R-squared,
• the overall F-test,
• the Root Mean Square Error (RMSE).
All three are based on two sums of squares: Sum of Squares Total (SST)
and Sum of Squares Error (SSE). SST measures how far the data are
from the mean, and SSE measures how far the data are from the
model’s predicted values. Different combinations of these two values
provide different information about how the regression model compares
to the mean model.
Sum of Squares Total (SST) is a measure of the total sample
variation in the $y_i$; that is, it measures how spread out the $y_i$ are in
the sample
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Sum of Squares Error (SSE) measures the sample variation of the $y_i$
around the fitted values $\hat{y}_i$, that is, the variation the model
leaves unexplained
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The R-squared of the regression (the coefficient of
determination) is defined as
$R^2 = 1 - \frac{SSE}{SST}$
R2 is the ratio of the explained variation (SST − SSE) to the total variation;
thus, it is interpreted as the fraction of the sample variation in y that is
explained by x.
For an OLS fit that includes an intercept, the value of R2 is always between
zero and one.
Adjusted R-squared incorporates the model’s degrees of freedom.
• Adjusted R-squared will decrease as predictors are added if the increase in
model fit does not make up for the loss of degrees of freedom.
• Adjusted R-squared should always be used with models with more than one
predictor variable. It is interpreted as the proportion of total variance that
is explained by the model.
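A minimal sketch of these quantities in plain Python follows. It takes SSE to be the sum of squared residuals, so that R-squared = 1 − SSE/SST. The adjusted R-squared formula used, 1 − (1 − R²)(n − 1)/(n − k − 1), is the standard one and is not stated in the slides. The function name fit_statistics and the data are hypothetical; the fitted values come from the least squares line ŷ = 2.2 + 0.6x for x = 1, ..., 5.

```python
def fit_statistics(y, y_hat, k=1):
    """Return (SST, SSE, R2, adjusted R2) for observations y and fitted values y_hat.

    k is the number of predictors (1 for simple regression).
    """
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # unexplained variation
    r2 = 1 - sse / sst
    # Standard adjustment for degrees of freedom (an assumption, not from the slides)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return sst, sse, r2, adj_r2


# Illustrative data: y observed at x = 1..5, fitted values from y_hat = 2.2 + 0.6x
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
sst, sse, r2, adj_r2 = fit_statistics(y, y_hat)
print(sst, sse, r2, adj_r2)
```

For this data SST is 6 and SSE is about 2.4, so R-squared is about 0.6 while the adjusted value is lower, about 0.47, because one degree of freedom is spent on the predictor.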
The F-test
An F test is used to test the significance of R, the multiple correlation
coefficient. The hypotheses are H0: ρ = 0 and H1: ρ ≠ 0, where ρ represents
the population correlation coefficient for multiple correlation.
The formula for the F test is
$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$
where n is the number of data groups (x1, x2, . . . , y), that is, the number
of observations, and k is the number of independent variables. The degrees of
freedom are d.f.N. = k and d.f.D. = n - k - 1.
A significant F-test indicates that the observed R-squared is reliable and is
not a spurious result of oddities in the data set.
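As a hedged sketch, the F statistic can be computed from R-squared, n and k as below. The function name f_statistic and the inputs (R² = 0.6, n = 5, k = 1) are hypothetical. The numerator degrees of freedom are taken as k, which is the standard convention for this form of the statistic.

```python
def f_statistic(r2, n, k):
    """Return (F, df_numerator, df_denominator) for the overall F-test."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must satisfy 0 <= r2 < 1")
    df_num = k           # one degree of freedom per independent variable
    df_den = n - k - 1   # observations minus estimated coefficients
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    f = (r2 / df_num) / ((1 - r2) / df_den)
    return f, df_num, df_den


# Illustrative inputs: simple regression on 5 observations with R-squared 0.6
f, df_num, df_den = f_statistic(0.6, 5, 1)
print(f, df_num, df_den)
```

Here F is about 4.5 on 1 and 3 degrees of freedom; whether that is significant is judged against the F distribution with those degrees of freedom.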
Root Mean Square Error (RMSE) is the square root of the
variance of the residuals.
It indicates the absolute fit of the model to the data, that is, how close
the observed data points are to the model’s predicted values.
RMSE is a good measure of how accurately the model predicts the
response, and it is the most important criterion for fit if the main
purpose of the model is prediction.
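A minimal sketch of the RMSE calculation follows. The slides do not say which divisor the residual variance uses; this example assumes n − k − 1 (residual degrees of freedom), and some software divides by n instead. The function name rmse and the data are hypothetical, with fitted values from the line ŷ = 2.2 + 0.6x for x = 1, ..., 5.

```python
import math


def rmse(y, y_hat, k=1):
    """Square root of the residual variance, using n - k - 1 degrees of freedom.

    The divisor n - k - 1 is an assumption; dividing by n is also common.
    """
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))


# Illustrative data: observations and fitted values from y_hat = 2.2 + 0.6x
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
value = rmse(y, y_hat)
print(value)
```

The result is in the same units as y, which is what makes RMSE a measure of absolute rather than relative fit.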
The p-value is the probability of obtaining a test statistic at least
as extreme as the one that was actually observed, assuming that the
null hypothesis is true. The null hypothesis is commonly rejected when
the p-value is less than a chosen significance level, often 0.05 or 0.01,
corresponding respectively to a 5% or 1% chance of rejecting the null
hypothesis when it is true.
The p-value for each term tests the null hypothesis that the
coefficient is equal to zero (no effect).
A low p-value (< 0.05) indicates that you can reject the null
hypothesis. In other words, a predictor that has a low p-value is likely
to be a meaningful addition to your model because changes in the
predictor's value are related to changes in the response variable.
Conversely, a larger (insignificant) p-value means the data do not provide
enough evidence that changes in the predictor are associated with changes
in the response.
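The definition of the p-value can be illustrated with a permutation test on the slope. This is a swapped-in technique chosen because it needs only the standard library; it is not the t-test that regression software normally reports for each coefficient. The function names, the data, the number of permutations and the seed are all hypothetical.

```python
import random


def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffled data sets whose |slope| is at least the observed |slope|.

    Shuffling y breaks any link with x, which mimics the null hypothesis
    that the slope coefficient is zero.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            count += 1
    # Add one to both counts so the estimate is never exactly zero
    return (count + 1) / (n_perm + 1)


# Illustrative data with a strong linear relationship
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
p = permutation_p_value(x, y)
print(p)
```

For data this strongly related, almost no shuffle reproduces a slope as steep as the observed one, so the estimated p-value falls well below 0.05 and the null hypothesis of a zero slope would be rejected.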
