Linear regression analysis predicts the value of a dependent variable from the value of an independent variable. It finds the coefficients of the regression equation that minimize the sum of squared errors between observed and predicted values; these coefficients are estimated by the method of least squares. The slope and intercept of the regression line can be interpreted, and the model can be used to predict values that fall within the observed range of the independent variable.
This chapter introduces simple linear regression. Simple linear regression finds the linear relationship between a dependent variable (Y) and a single independent variable (X). It estimates the regression coefficients (intercept and slope) that best predict Y from X using the least squares method. The chapter provides an example of predicting house prices from square footage. It explains how to interpret the regression coefficients and make predictions. Key outputs like the coefficient of determination (r-squared), standard error, and assumptions of the regression model are also introduced. Residual analysis is discussed as a way to check if the assumptions are met.
This presentation introduces regression analysis. It discusses key concepts such as dependent and independent variables, simple and multiple regression, and linear and nonlinear regression models. It also covers different types of regression including simple linear regression, cross-sectional vs time series data, and methods for building regression models like stepwise regression and forward/backward selection. Examples are provided to demonstrate calculating regression equations using the least squares method and computing deviations from mean values.
This document provides an overview of multiple regression analysis. It defines multiple regression, explains how to interpret regression coefficients and outputs, and discusses best practices for variable selection and assessing assumptions. Examples are provided on how to conduct multiple regression in SPSS to analyze customer survey data from two restaurants. Advanced topics like multicollinearity and dummy variables are also mentioned.
This document provides an overview of simple linear regression. It defines regression as determining the statistical relationship between variables where changes in one variable depend on changes in another. Regression analysis is used for prediction and exploring relationships between dependent and independent variables. The key aspects covered include:
- The dependent variable changes in response to the independent variable.
- Lines of regression show the relationship between the variables.
- The method of least squares is used to determine the line of best fit that minimizes the error between predicted and actual values.
- Linear regression models take the form of y = a + bx and are used for tasks like prediction and determining impact of independent variables.
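The y = a + bx model and least squares method listed above can be sketched in a few lines of Python; the x and y values below are made up for illustration:

```python
# Least squares fit of y = a + b*x (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope b = Sxy / Sxx; intercept a = mean_y - b * mean_x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx
a = mean_y - b * mean_x

print(f"y = {a:.3f} + {b:.3f}x")
```

The same slope and intercept can then be plugged into a + b*x to predict y for any x inside the observed range.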
A two-way ANOVA analyzes the influence of two independent variables on a single dependent variable. It tests for main effects of each independent variable as well as interactions between the variables. The independent variables are categorical and the dependent variable is measured on an interval or ratio scale. It compares sums of squares and mean squares to determine whether the means of observations grouped by each factor differ significantly. An example tests the effect of gender and age on test scores, with gender and age as independent variables and test score as the dependent variable.
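As a rough sketch of the sums-of-squares bookkeeping described above, the following computes main-effect and interaction sums of squares for a balanced two-way layout in plain Python. The gender/age test-score data are invented for illustration, not taken from the document's example:

```python
# Two-way ANOVA sketch for a balanced 2x2 design: factors gender and age
# group, r = 3 test scores per cell (made-up data).
data = {
    ("M", "young"): [78, 82, 80],
    ("M", "old"):   [70, 74, 72],
    ("F", "young"): [85, 88, 86],
    ("F", "old"):   [77, 80, 79],
}
genders, ages, r = ["M", "F"], ["young", "old"], 3

scores = [y for cell in data.values() for y in cell]
grand = sum(scores) / len(scores)

def level_mean(factor_index, level):
    vals = [y for key, cell in data.items() if key[factor_index] == level for y in cell]
    return sum(vals) / len(vals)

# Sums of squares for main effects, interaction, and error.
ss_gender = r * len(ages) * sum((level_mean(0, g) - grand) ** 2 for g in genders)
ss_age = r * len(genders) * sum((level_mean(1, a) - grand) ** 2 for a in ages)
cell_means = {k: sum(v) / r for k, v in data.items()}
ss_cells = r * sum((m - grand) ** 2 for m in cell_means.values())
ss_inter = ss_cells - ss_gender - ss_age
ss_within = sum((y - cell_means[k]) ** 2 for k, v in data.items() for y in v)
ss_total = sum((y - grand) ** 2 for y in scores)

df_within = len(data) * (r - 1)          # 4 cells x (3 - 1) = 8
ms_within = ss_within / df_within
f_gender = (ss_gender / 1) / ms_within   # df = levels - 1 = 1 per factor
f_age = (ss_age / 1) / ms_within
print(f"F(gender)={f_gender:.2f}  F(age)={f_age:.2f}")
```

Each F ratio would then be compared against an F distribution with the corresponding degrees of freedom to decide significance.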
Correlation and Regression Analysis using SPSS and Microsoft Excel (Setia Pramana)
This document discusses correlation and linear regression analysis. It covers correlation coefficients, linear relationships between variables, assumptions of linear regression, and using SPSS and Excel to conduct correlation and regression analyses. Pearson and Spearman correlation coefficients are introduced as measures of the linear association between two continuous variables. Simple and multiple linear regression models are explained as tools to predict an outcome variable from one or more predictor variables.
This document describes using logistic regression to analyze data on smoking, matches use, and lung cancer while adjusting for potential confounding. It presents sample data stratified by smoking and matches use, then develops a logistic regression model with smoking and matches as predictors. The model indicates smoking significantly increases lung cancer risk but matches use does not modify this relationship. The document concludes by noting logistic regression can simultaneously adjust for multiple variables and derive coefficient estimates using maximum likelihood.
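A minimal sketch of maximum-likelihood fitting for such a logistic model, using made-up counts and a plain gradient-ascent loop (real analyses would use Newton-Raphson or a statistics package; nothing here reproduces the document's actual study data):

```python
import math

# Toy logistic regression with two binary predictors (smoking, matches use)
# and a binary outcome (lung cancer). Counts are invented for illustration.
# Each row: (smoking, matches, cancer).
rows = [(1, 1, 1)] * 20 + [(1, 1, 0)] * 10 + [(1, 0, 1)] * 15 + [(1, 0, 0)] * 10 \
     + [(0, 1, 1)] * 5 + [(0, 1, 0)] * 20 + [(0, 0, 1)] * 4 + [(0, 0, 0)] * 16

def log_likelihood(b0, b1, b2):
    ll = 0.0
    for s, m, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * s + b2 * m)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Gradient ascent on the log-likelihood (small fixed step; the likelihood
# is concave, so this climbs toward the maximum-likelihood estimates).
b0 = b1 = b2 = 0.0
lr = 0.01
ll_start = log_likelihood(b0, b1, b2)
for _ in range(2000):
    g0 = g1 = g2 = 0.0
    for s, m, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * s + b2 * m)))
        resid = y - p
        g0 += resid
        g1 += resid * s
        g2 += resid * m
    b0 += lr * g0
    b1 += lr * g1
    b2 += lr * g2

print(f"intercept={b0:.2f}  smoking={b1:.2f}  matches={b2:.2f}")
```

With counts like these, the smoking coefficient comes out clearly positive while the matches coefficient stays near zero, mirroring the document's conclusion that adjusting for both predictors isolates the smoking effect.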
The Simple Regression presentation is a partial fulfillment of the requirements in PA 297 Research for Public Administrators, presented by Atty. Gayam, Dr. Cabling, and Mr. Cagampang.
The document presents the results of a simple linear regression analysis conducted by a black belt to predict the number of calls answered (dependent variable) from staffing levels (independent variable), using data collected over 240 samples in a call center. The fitted model explained 83.4% of the variation in calls answered. Notable outliers and leverage points were identified that could weaken the estimated relationship between calls answered and staffing.
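The 83.4% figure is a coefficient of determination (R-squared). A sketch of how such a figure is computed, on hypothetical staffing/calls data rather than the study's 240 samples:

```python
# R-squared: the share of total variation in y explained by the fitted
# line (hypothetical staffing/calls data, not the call-center study's).
xs = [10, 12, 14, 16, 18, 20]          # staffing level
ys = [120, 150, 160, 195, 210, 230]    # calls answered

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

sst = sum((y - my) ** 2 for y in ys)                       # total variation
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
r2 = 1 - sse / sst
print(f"R-squared = {r2:.3f}")
```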
Chapter 5, Part 2: Sampling Distributions for Counts and Proportions (Binomial ...) (nszakir)
Mathematics, Statistics, Sampling Distributions for Counts and Proportions, Binomial Distributions for Sample Counts, Binomial Distributions in Statistical Sampling, Binomial Mean and Standard Deviation, Sample Proportions, Normal Approximation for Counts and Proportions, Binomial Formula
This document provides an introduction to basic statistics and regression analysis. It defines regression as relating to or predicting one variable based on another. Regression analysis is useful for economics and business. The document outlines the objectives of understanding simple linear regression, regression coefficients, and merits and demerits of regression analysis. It describes types of regression including simple and multiple regression. Key concepts explained in more detail include regression lines, regression equations, regression coefficients, and the difference between correlation and regression. Examples are provided to demonstrate calculating regression equations using different methods.
A full lecture presentation on ANOVA.
Areas covered include:
a. definition and purpose of ANOVA
b. one-way ANOVA
c. factorial ANOVA
d. multiple ANOVA
e. MANOVA
f. post-hoc tests and their types
g. an easy step-by-step process for calculating a post-hoc test
This document discusses multiple linear regression analysis. It begins by defining a multiple regression equation that describes the relationship between a response variable and two or more explanatory variables. It notes that multiple regression allows prediction of a response using more than one predictor variable. The document outlines key elements of multiple regression including visualization of relationships, statistical significance testing, and evaluating model fit. It provides examples of interpreting multiple regression output and using the technique to predict outcomes.
The document summarizes a study on applying ordinal logistic regression to analyze a proposed new integrated education plan takaful (Islamic insurance) product. The study used a questionnaire distributed to 410 respondents to collect data on demographics and preferences. Ordinal logistic regression and correlation analyses found high acceptance of the integrated plan among all income levels. The proposed plan combines multiple riders into one affordable plan.
Chapter 10: Correlation and Regression
10.2: Regression
Introduction to correlation and regression analysis (Farzad Javidanrad)
This document provides an introduction to correlation and regression analysis. It defines key concepts like variables, random variables, and probability distributions. It discusses how correlation measures the strength and direction of a linear relationship between two variables. Correlation coefficients range from -1 to 1, with values closer to these extremes indicating stronger correlation. The document also introduces determination coefficients, which measure the proportion of variance in one variable explained by the other. Regression analysis builds on correlation to study and predict the average value of one variable based on the values of other explanatory variables.
Chapter 4: Probability
4.3: Complements and Conditional Probability, and Bayes' Theorem
The document discusses simple linear regression analysis. It provides definitions and formulas for simple linear regression, including that the regression equation is y = a + bx. An example is shown of using the stepwise method to determine if there is a significant relationship between number of absences (x) and grades (y) for students. The analysis finds a significant negative relationship, meaning more absences correlated with lower grades. Formulas are provided for calculating the slope, intercept, and testing significance of the regression model.
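The slope, intercept, and significance test described can be sketched as follows; the absences/grades values are hypothetical, and the resulting t statistic would be compared against a t table with n - 2 degrees of freedom:

```python
import math

# Simple linear regression of grades (y) on absences (x), with a t test
# for the slope: t = b / SE(b), SE(b) = s / sqrt(Sxx), s = sqrt(SSE/(n-2)).
# Data are hypothetical.
xs = [0, 1, 2, 3, 5, 6, 8]     # number of absences
ys = [92, 90, 85, 84, 78, 74, 65]  # grades

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))          # standard error of the estimate
t = b / (s / math.sqrt(sxx))
print(f"slope b = {b:.3f}, t = {t:.2f}")  # compare |t| to t(alpha/2, n-2)
```

A large negative t here matches the document's finding: more absences go with lower grades.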
This document provides an overview of logistic regression. It begins by defining logistic regression as a specialized form of regression used when the dependent variable is dichotomous while the independent variables can be of any type. It notes logistic regression allows prediction of discrete variables from continuous and discrete predictors without assumptions about variable distributions. The document then discusses why logistic regression is used when assumptions of other regressions like normality and equal variance are violated. It also outlines how to perform and interpret logistic regression including assessing model fit. Finally, it provides an example research question and hypotheses about predicting solar panel adoption using household income and mortgage as predictors.
This document defines and explains various types of regression analysis including linear, logistic, polynomial, stepwise, ridge and lasso regression. It discusses the key differences between correlation and regression. It also covers topics such as the least squares method, R-squared/coefficient of determination, adjusted R-squared, limitations of regression analysis and applications of regression analysis.
The document discusses various statistical concepts including range, mean deviation, variance, and standard deviation. It provides formulas and steps to calculate each measure. The range is the difference between the highest and lowest values. Mean deviation measures the average absolute deviation from the mean. Variance is the average of the squared deviations from the mean, and standard deviation is the square root of the variance, representing the typical distance of values from the mean. Examples are given to demonstrate calculating each measure for both ungrouped and grouped data.
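The four measures can be computed directly for a small ungrouped data set; the values below are illustrative:

```python
import math

# Range, mean deviation, variance, and standard deviation for a small
# ungrouped data set (illustrative values).
data = [4, 8, 6, 5, 7]

n = len(data)
rng = max(data) - min(data)                      # range = 8 - 4
m = sum(data) / n                                # mean
mean_dev = sum(abs(x - m) for x in data) / n     # mean (absolute) deviation
variance = sum((x - m) ** 2 for x in data) / n   # population variance
std_dev = math.sqrt(variance)                    # standard deviation

print(rng, mean_dev, variance, std_dev)
```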
This document provides an overview of simple linear regression analysis. It discusses estimating regression coefficients using the least squares method, interpreting the regression equation, assessing model fit using measures like the standard error of the estimate and coefficient of determination, testing hypotheses about regression coefficients, and using the regression model to make predictions.
Correlation and regression analysis are statistical methods used to determine if a relationship exists between variables and describe the nature of that relationship. A scatter plot graphs the independent and dependent variables and allows visualization of any trends in the data. The correlation coefficient measures the strength and direction of the linear relationship between variables, ranging from -1 to 1. Regression finds the linear "best fit" line that minimizes the residuals and can be used to predict dependent variable values.
Simple Linear Regression: Step-By-Step (Dan Wellisch)
This presentation was given to our meetup group (https://www.meetup.com/Chicago-Technology-For-Value-Based-Healthcare-Meetup/) on 9/26/2017. Our group is focused on technology applied to healthcare in order to create better healthcare.
- Regression analysis is a statistical technique for modeling relationships between variables, where one variable is dependent on the others. It allows predicting the average value of the dependent variable based on the independent variables.
- The key assumptions of regression models are that the error terms are normally distributed with zero mean and constant variance, and are independent of each other.
- Linear regression specifies that the dependent variable is a linear combination of the parameters, though the independent variables need not be linearly related. In simple linear regression with one independent variable, the least squares estimates of the intercept and slope are calculated to minimize the sum of squared errors.
Linear Regression, Multiple Regression and ANOVA (Mansi Rastogi)
This document provides an overview of simple linear regression analysis. It defines key concepts such as the regression line, slope, intercept, and error term. The learning objectives are to predict dependent variable values from independent variables, interpret regression coefficients, evaluate assumptions, and make inferences. An example uses house price data to fit a linear regression model with square footage as the independent variable. The slope is interpreted as the change in house price associated with an additional square foot. A t-test is used to infer whether square footage significantly affects price.
Linear regression and correlation analysis ppt @ bec doms (Babasab Patil)
This document introduces linear regression and correlation analysis. It discusses calculating and interpreting the correlation coefficient and linear regression equation to determine the relationship between two variables. It covers scatter plots, the assumptions of regression analysis, and using regression to predict and describe relationships in data. Key terms introduced include the correlation coefficient, linear regression model, explained and unexplained variation, and the coefficient of determination.
Correlation by Neeraj Bhandari (Surkhet, Nepal)
The regression coefficients are 0.8 and 0.2.
The coefficient of correlation r is the geometric mean of the two regression coefficients:
r = √(0.8 × 0.2) = √0.16 = 0.4
Therefore, the value of the coefficient of correlation is 0.4.
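The geometric-mean relationship holds because the two regression coefficients multiply to r squared; a quick numeric check on illustrative data:

```python
import math

# Check the identity behind the worked example above: the regression
# coefficient of y on x (b_yx) times that of x on y (b_xy) equals r^2,
# so r is their geometric mean. Data are illustrative.
xs = [1, 2, 3, 4, 5]
ys = [2, 2, 4, 5, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

b_yx = sxy / sxx          # slope of the regression of y on x
b_xy = sxy / syy          # slope of the regression of x on y
r = sxy / math.sqrt(sxx * syy)

print(f"b_yx={b_yx:.3f}, b_xy={b_xy:.3f}, r={r:.3f}")
```

Note that r carries the common sign of the two regression coefficients, so the positive square root applies only when both coefficients are positive, as in the 0.8 and 0.2 example.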
This chapter summary covers simple linear regression models. Key topics include determining the simple linear regression equation, measures of variation such as total, explained, and unexplained sums of squares, assumptions of the regression model including normality, homoscedasticity and independence of errors. Residual analysis is discussed to examine linearity and assumptions. The coefficient of determination, standard error of estimate, and Durbin-Watson statistic are also introduced.
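The Durbin-Watson statistic mentioned above can be computed directly from the residual series; the residuals below are illustrative:

```python
# Durbin-Watson statistic from a residual series (illustrative residuals):
# d = sum((e_t - e_{t-1})^2) / sum(e_t^2). Values near 2 suggest no
# first-order autocorrelation; values near 0 or 4 suggest positive or
# negative autocorrelation of the errors.
resid = [1.2, 0.8, -0.3, -1.1, -0.4, 0.6, 1.0, -0.9]

num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
dw = num / den
print(f"Durbin-Watson d = {dw:.2f}")
```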
- The document discusses simple linear regression analysis and how to use it to predict a dependent variable (y) based on an independent variable (x).
- Key points covered include the simple linear regression model, estimating regression coefficients, evaluating assumptions, making predictions, and interpreting results.
- Examples are provided to demonstrate simple linear regression analysis using data on house prices and sizes.
- The document discusses simple linear regression analysis and how it can be used to predict a dependent variable (e.g. house prices) based on an independent variable (e.g. house size).
- Key outputs of linear regression include the slope, intercept, and r-squared value. The slope and intercept define the linear regression line that best fits the data. R-squared indicates how well the regression line represents the data.
1. Linear Regression
OR: HOW TO EXPLAIN THE LEAST SQUARES METHOD TO A COMPLETE BEGINNER
dr Dragoljub
(my version, in "Indian English")
2. Learning Objectives
In this chapter, you learn:
How to use regression analysis to predict the value of a dependent variable based on an independent variable
The meaning of the regression coefficients b0 and b1
How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
To make inferences about the slope and correlation coefficient
To estimate mean values and predict individual values
13-2
3. Correlation vs. Regression
DCOVA
A scatter plot can be used to show the relationship between two variables
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
Correlation is only concerned with the strength of the relationship
No causal effect is implied by correlation
Scatter plots were first presented in Ch. 2; correlation was first presented in Ch. 3
4. Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
5. Simple Linear Regression Model
Only one independent variable, X
The relationship between X and Y is described by a linear function
Changes in Y are assumed to be related to changes in X
9. Simple Linear Regression Model
Yi = β0 + β1·Xi + εi
where:
Yi = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
Xi = independent variable
εi = random error term
β0 + β1·Xi is the linear component; εi is the random error component
11. Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
Ŷi = b0 + b1·Xi
where:
Ŷi = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
Xi = value of X for observation i
12. The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Yi and Ŷi:
min Σ(Yi − Ŷi)² = min Σ(Yi − (b0 + b1·Xi))²
13. Finding the Least Squares Equation
The coefficients b0 and b1, and other regression results in this chapter, will be found using Excel
Formulas are shown in the text for those who are interested
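Although the slides defer the computation to Excel, the least squares estimates are easy to reproduce by hand. A minimal Python sketch using the chapter's house price data (variable names are my own):

```python
# Least squares estimates for the house price example (prices in $1000s):
#   b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2),  b0 = Ybar - b1 * Xbar

sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # Y

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

ssx = sum((x - x_bar) ** 2 for x in sqft)                          # sum of squares of X
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))  # cross-products

b1 = sxy / ssx           # slope
b0 = y_bar - b1 * x_bar  # intercept

print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```

The same values appear in the Excel output on the later slides, so this is just a hand check of the formulas.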
14. Interpretation of the Slope and the Intercept
b0 is the estimated average value of Y when the value of X is zero
b1 is the estimated change in the average value of Y as a result of a one-unit increase in X
15. Simple Linear Regression Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
16. Simple Linear Regression Example: Data

House Price in $1000s (Y)   Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
21. Simple Linear Regression Example: Excel Output

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error      41.33032
Observations        10

ANOVA          df          SS            MS           F        Significance F
Regression      1    18934.9348    18934.9348    11.0848        0.01039
Residual        8    13665.5652     1708.1957
Total           9    32600.5000

              Coefficients  Standard Error   t Stat    P-value    Lower 95%   Upper 95%
Intercept        98.24833        58.03348    1.69296   0.12892    -35.57720   232.07386
Square Feet       0.10977         0.03297    3.32938   0.01039      0.03374     0.18580
22. Simple Linear Regression Example: Graphical Representation
House price model: scatter plot and prediction line, with slope = 0.10977 and intercept = 98.248
house price = 98.24833 + 0.10977 (square feet)
23. Simple Linear Regression Example: Interpretation of b0
house price = 98.24833 + 0.10977 (square feet)
b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
Because a house cannot have a square footage of 0, b0 has no practical application here
24. Simple Linear Regression Example: Interpreting b1
house price = 98.24833 + 0.10977 (square feet)
b1 estimates the change in the average value of Y as a result of a one-unit increase in X
Here, b1 = 0.10977 tells us that the mean price of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size
25. Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098(2000)
            = 317.85
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
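The prediction step can be sketched in a few lines, using the rounded coefficients quoted on the slide (the helper function name is my own, not part of the chapter's Excel workflow):

```python
# Prediction with the fitted line from the slide, price in $1000s:
#   house price = 98.25 + 0.1098 * (square feet)
B0, B1 = 98.25, 0.1098

def predict_price(square_feet):
    """Predicted house price in $1000s for a house of the given size."""
    return B0 + B1 * square_feet

# 2,000 sq. ft. lies inside the observed range (1,100 to 2,450 sq. ft.),
# so this is interpolation, not extrapolation.
print(round(predict_price(2000), 2))  # 317.85, i.e. $317,850
```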
26. Simple Linear Regression Example: Making Predictions
When using a regression model for prediction, only predict within the relevant range of the data
The relevant range of observed X values is appropriate for interpolation
Do not try to extrapolate beyond the range of observed X's
27. Measures of Variation
Total variation is made up of two parts: SST = SSR + SSE
SST = total sum of squares: SST = Σ(Yi − Ȳ)²
SSR = regression sum of squares: SSR = Σ(Ŷi − Ȳ)²
SSE = error sum of squares: SSE = Σ(Yi − Ŷi)²
where:
Ȳ = mean value of the dependent variable
Yi = observed value of the dependent variable
Ŷi = predicted value of Y for the given Xi value
28. Measures of Variation (continued)
SST = total sum of squares (Total Variation): measures the variation of the Yi values around their mean Ȳ
SSR = regression sum of squares (Explained Variation): variation attributable to the relationship between X and Y
SSE = error sum of squares (Unexplained Variation): variation in Y attributable to factors other than X
30. Coefficient of Determination, r²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called r-squared and is denoted r²:
r² = SSR / SST = regression sum of squares / total sum of squares
note: 0 ≤ r² ≤ 1
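The decomposition can be verified numerically on the chapter's data. The sketch below (my own code, not the textbook's) recomputes SST, SSR, SSE, and r²:

```python
# r^2 = SSR / SST for the house price data
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in sqft]                       # predicted values
sst = sum((y - y_bar) ** 2 for y in price)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))   # unexplained variation

r_squared = ssr / sst
print(round(ssr, 4), round(sst, 1), round(r_squared, 5))  # 18934.9348 32600.5 0.58082
```

SST = SSR + SSE holds here up to floating-point rounding, matching the slide's decomposition.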
31. Examples of Approximate r² Values
r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X
32. Examples of Approximate r² Values
0 < r² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X
33. Examples of Approximate r² Values
r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X)
34. Simple Linear Regression Example: Coefficient of Determination, r² in Excel
r² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082
58.08% of the variation in house prices is explained by variation in square feet
From the Excel output (Regression Statistics): Multiple R = 0.76211, R Square = 0.58082, Adjusted R Square = 0.52842, Standard Error = 41.33032, Observations = 10 (the ANOVA and coefficient tables are the same as on slide 21)
35. Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by
S_YX = √( SSE / (n − 2) ) = √( Σᵢ₌₁ⁿ (Yi − Ŷi)² / (n − 2) )
where:
SSE = error sum of squares
n = sample size
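The formula can be checked against the Excel value with a short hand computation (my own sketch, same data as before):

```python
import math

# Standard error of the estimate: S_YX = sqrt(SSE / (n - 2))
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))  # error sum of squares
s_yx = math.sqrt(sse / (n - 2))
print(round(s_yx, 5))  # 41.33032, the 'Standard Error' in the Excel output
```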
36. Simple Linear Regression Example: Standard Error of Estimate in Excel
From the Excel output: S_YX = 41.33032 (the "Standard Error" entry in the Regression Statistics table; the rest of the output is the same as on slide 21)
37. Comparing Standard Errors
S_YX is a measure of the variation of observed Y values from the regression line: a small S_YX means the points lie close to the line, a large S_YX means they scatter widely around it
The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data
i.e., S_YX = $41.33K is moderately small relative to house prices in the $200K - $400K range
38. Assumptions of Regression: L.I.N.E.
Linearity: the relationship between X and Y is linear
Independence of Errors: error values are statistically independent
Normality of Error: error values are normally distributed for any given value of X
Equal Variance (also called homoscedasticity): the probability distribution of the errors has constant variance
39. Residual Analysis
The residual for observation i, ei, is the difference between its observed and predicted value:
ei = Yi − Ŷi
Check the assumptions of regression by examining the residuals:
Examine for the linearity assumption
Evaluate the independence assumption
Evaluate the normal distribution assumption
Examine for constant variance at all levels of X (homoscedasticity)
Graphical analysis of residuals: can plot residuals vs. X
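The residuals themselves are simple to compute. The sketch below (my own code) reproduces the first entry of the Excel residual output shown later in the chapter (about −6.92):

```python
# Residuals e_i = Y_i - Yhat_i for the house price data
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(sqft, price)]
print(round(residuals[0], 2))      # -6.92 for the first house
# With an intercept, least squares residuals always sum to zero (up to rounding):
print(abs(sum(residuals)) < 1e-9)  # True
```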
40. Residual Analysis for Linearity
If the relationship is not linear, the residuals plotted against X show a systematic curved pattern; if the relationship is linear, the residuals scatter randomly around zero
42. Checking for Normality
Examine the stem-and-leaf display of the residuals
Examine the boxplot of the residuals
Examine the histogram of the residuals
Construct a normal probability plot of the residuals
43. Residual Analysis for Normality
When using a normal probability plot (percent vs. residual), normally distributed errors will display approximately in a straight line
44. Residual Analysis for Equal Variance
Plotting the residuals against X shows whether the variance is constant: a fan or funnel shape indicates non-constant variance, while an even band around zero indicates constant variance
45. Simple Linear Regression Example: Excel Residual Output

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
 1            251.92316               -6.923162
 2            273.87671               38.12329
 3            284.85348               -5.853484
 4            304.06284                3.937162
 5            218.99284              -19.99284
 6            268.38832              -49.38832
 7            356.20251               48.79749
 8            367.17929              -43.17929
 9            254.6674                64.33264
10            284.85348              -29.85348

The residuals do not appear to violate any regression assumptions
46. Measuring Autocorrelation: The Durbin-Watson Statistic
Used when data are collected over time, to detect whether autocorrelation is present
Autocorrelation exists if residuals in one time period are related to residuals in another period
47. Autocorrelation
Autocorrelation is correlation of the errors (residuals) over time
Here, the residuals show a cyclic pattern, not a random one; cyclical patterns are a sign of positive autocorrelation
Autocorrelation violates the regression assumption that residuals are random and independent
48. The Durbin-Watson Statistic
The Durbin-Watson statistic is used to test for autocorrelation
H0: residuals are not correlated
H1: positive autocorrelation is present
D = Σᵢ₌₂ⁿ (ei − ei₋₁)² / Σᵢ₌₁ⁿ ei²
The possible range is 0 ≤ D ≤ 4
D should be close to 2 if H0 is true
D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation
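The statistic is straightforward to compute from a residual series. A small sketch (the function name is my own) illustrating the 0-to-4 range:

```python
# Durbin-Watson: D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2
def durbin_watson(residuals):
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals (negative autocorrelation) push D toward 4:
print(durbin_watson([1, -1, 1, -1]))             # 3.0
# Slowly drifting residuals (positive autocorrelation) push D toward 0:
print(durbin_watson([1, 1, 1, -1, -1, -1]) < 2)  # True
```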
49. Testing for Positive Autocorrelation
H0: positive autocorrelation does not exist
H1: positive autocorrelation is present
Calculate the Durbin-Watson test statistic D (the Durbin-Watson statistic can be found using Excel or Minitab)
Find the values dL and dU from the Durbin-Watson table (for sample size n and number of independent variables k)
Decision rule: reject H0 if D < dL; the test is inconclusive if dL ≤ D ≤ dU; do not reject H0 if D > dU
51. Testing for Positive Autocorrelation (continued)
Example with n = 25. Excel/PHStat output:
Durbin-Watson Calculations
Sum of Squared Difference of Residuals   3296.18
Sum of Squared Residuals                 3279.98
Durbin-Watson Statistic                  1.00494
D = Σᵢ₌₂ⁿ (ei − ei₋₁)² / Σᵢ₌₁ⁿ ei² = 3296.18 / 3279.98 = 1.00494
52. Testing for Positive Autocorrelation (continued)
Here, n = 25 and there is k = 1 independent variable
Using the Durbin-Watson table, dL = 1.29 and dU = 1.45
Decision: reject H0, since D = 1.00494 < dL = 1.29
Conclude that significant positive autocorrelation exists
53. Inferences About the Slope
The standard error of the regression slope coefficient (b1) is estimated by
S_b1 = S_YX / √SSX = S_YX / √( Σ(Xi − X̄)² )
where:
S_b1 = estimate of the standard error of the slope
S_YX = √( SSE / (n − 2) ) = standard error of the estimate
54. Inferences About the Slope: t Test
t test for a population slope: is there a linear relationship between X and Y?
Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test statistic:
t_STAT = (b1 − β1) / S_b1, with d.f. = n − 2
where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_b1 = standard error of the slope
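The t statistic for the slope can be reproduced end to end on the chapter's data (my own sketch):

```python
import math

# t test for the slope, H0: beta1 = 0, house price data, d.f. = n - 2 = 8
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope
t_stat = (b1 - 0) / s_b1          # hypothesized slope under H0 is 0

print(round(s_b1, 5), round(t_stat, 3))  # 0.03297 3.329
```

The statistic exceeds the critical value 2.3060 (d.f. = 8, α = .05 two-tail), matching the rejection of H0 on the following slides.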
55. Inferences About the Slope: t Test Example
Using the house price data (house price in $1000s vs. square feet, n = 10, as on slide 16), the estimated regression equation is:
house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098
Is there a relationship between the square footage of the house and its sales price?
56. Inferences About the Slope: t Test Example
H0: β1 = 0; H1: β1 ≠ 0
From the Excel output:
              Coefficients  Standard Error   t Stat    P-value
Intercept        98.24833        58.03348    1.69296   0.12892
Square Feet       0.10977         0.03297    3.32938   0.01039
t_STAT = (b1 − β1) / S_b1 = (0.10977 − 0) / 0.03297 = 3.32938
57. Inferences About the Slope: t Test Example
Test statistic: t_STAT = 3.329
H0: β1 = 0; H1: β1 ≠ 0; d.f. = 10 − 2 = 8
With α/2 = .025 in each tail, the critical values are ±t_α/2 = ±2.3060
Decision: reject H0, since t_STAT = 3.329 > 2.3060
There is sufficient evidence that square footage affects house price
58. Inferences About the Slope: t Test Example
H0: β1 = 0; H1: β1 ≠ 0
From the Excel output, the p-value for Square Feet is 0.01039
Decision: reject H0, since p-value < α
There is sufficient evidence that square footage affects house price
59. F Test for Significance
F test statistic: F_STAT = MSR / MSE
where
MSR = SSR / k
MSE = SSE / (n − k − 1)
F_STAT follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
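The F statistic follows directly from the sums of squares. The sketch below (my own code) also checks the standard simple-regression identity F = t², which the slides rely on implicitly:

```python
import math

# F test for significance: F = MSR / MSE, with k = 1 independent variable
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n, k = len(sqft), 1
x_bar, y_bar = sum(sqft) / n, sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / ssx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in sqft]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
print(round(f_stat, 2))  # 11.08, with 1 and 8 degrees of freedom

# In simple regression the F and t tests are equivalent: F = t^2
t_stat = b1 / (math.sqrt(mse) / math.sqrt(ssx))
print(abs(f_stat - t_stat ** 2) < 1e-9)  # True
```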
60. F-Test for Significance: Excel Output
F_STAT = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848, with 1 and 8 degrees of freedom
ANOVA          df          SS            MS           F        Significance F
Regression      1    18934.9348    18934.9348    11.0848        0.01039
Residual        8    13665.5652     1708.1957
Total           9    32600.5000
The p-value for the F test is Significance F = 0.01039
61. F Test for Significance (continued)
H0: β1 = 0; H1: β1 ≠ 0; α = .05; df1 = 1, df2 = 8
Test statistic: F_STAT = MSR / MSE = 11.08
Critical value: F.05 = 5.32
Decision: reject H0 at α = 0.05, since F_STAT = 11.08 > 5.32
Conclusion: there is sufficient evidence that house size affects selling price
62. Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope:
b1 ± t_α/2 · S_b1, with d.f. = n − 2
Excel printout for house prices:
              Coefficients  Standard Error   t Stat    P-value    Lower 95%   Upper 95%
Intercept        98.24833        58.03348    1.69296   0.12892    -35.57720   232.07386
Square Feet       0.10977         0.03297    3.32938   0.01039      0.03374     0.18580
At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
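The interval can be recomputed from b1 and S_b1; the critical value 2.3060 (t table, d.f. = 8) is the one used on the slides:

```python
import math

# 95% confidence interval for the slope: b1 ± t_{.025} * S_b1, d.f. = n - 2 = 8
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ssx)  # standard error of the slope

t_crit = 2.3060  # t_{.025} with 8 degrees of freedom, from a t table
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(lower, 4), round(upper, 4))  # 0.0337 0.1858
# The interval excludes 0, so the slope is significant at the .05 level
```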
63. Confidence Interval Estimate for the Slope (continued)
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size
This 95% confidence interval does not include 0
Conclusion: there is a significant relationship between house price and square feet at the .05 level of significance
64. t Test for a Correlation Coefficient
Hypotheses:
H0: ρ = 0 (no correlation between X and Y)
H1: ρ ≠ 0 (correlation exists)
Test statistic (with n − 2 degrees of freedom):
t_STAT = (r − ρ) / √( (1 − r²) / (n − 2) )
where:
r = +√r² if b1 > 0
r = −√r² if b1 < 0
65. t-test For A Correlation Coefficient (continued)
Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, d.f. = 10 − 2 = 8
t_STAT = (r − ρ) / √( (1 − r²) / (n − 2) ) = (.762 − 0) / √( (1 − .762²) / (10 − 2) ) = 3.329
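Since b1 > 0 here, r = +√r², and the correlation t statistic matches the slope t statistic computed earlier. A quick check (my own sketch):

```python
import math

# t test for the correlation coefficient; r = +sqrt(r^2) because b1 > 0
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
ssx = sum((x - x_bar) ** 2 for x in sqft)
sst = sum((y - y_bar) ** 2 for y in price)

r = sxy / math.sqrt(ssx * sst)  # Pearson's r
t_stat = (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))
print(round(r, 5), round(t_stat, 3))  # 0.76211 3.329
```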
66. t-test For A Correlation Coefficient (continued)
t_STAT = (.762 − 0) / √( (1 − .762²) / (10 − 2) ) = 3.329
d.f. = 10 − 2 = 8; with α/2 = .025 in each tail, the critical values are ±2.3060
Decision: reject H0, since t_STAT = 3.329 > 2.3060
Conclusion: there is evidence of a linear association at the 5% level of significance
67. Estimating Mean Values and Predicting Individual Values
Goal: form intervals around Ŷ to express uncertainty about the value of Y for a given Xi
A confidence interval for the mean of Y, given Xi, is an interval around the prediction line Ŷ = b0 + b1·Xi
A prediction interval for an individual Y, given Xi, is wider, since it must also cover the scatter of individual observations around the line
68. Confidence Interval for the Average Y, Given X
Confidence interval estimate for the mean value of Y given a particular Xi
Confidence interval for μ_Y|X=Xi:
Ŷ ± t_α/2 · S_YX · √hi
where hi = 1/n + (Xi − X̄)² / SSX = 1/n + (Xi − X̄)² / Σ(Xi − X̄)²
The size of the interval varies according to the distance of Xi from the mean, X̄
69. Prediction Interval for an Individual Y, Given X
Confidence interval estimate for an individual value of Y given a particular Xi
Confidence interval for Y_X=Xi:
Ŷ ± t_α/2 · S_YX · √(1 + hi)
The extra term adds to the interval width to reflect the added uncertainty for an individual case
70. Estimation of Mean Values: Example
Confidence interval estimate for μ_Y|X=Xi: find the 95% confidence interval for the mean price of 2,000-square-foot houses
Predicted price Ŷi = 317.85 ($1000s)
Ŷ ± t_0.025 · S_YX · √( 1/n + (Xi − X̄)² / Σ(Xi − X̄)² ) = 317.85 ± 37.12
The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900
71. Estimation of Individual Values: Example
Prediction interval estimate for Y_X=Xi: find the 95% prediction interval for an individual house with 2,000 square feet
Predicted price Ŷi = 317.85 ($1000s)
Ŷ ± t_0.025 · S_YX · √( 1 + 1/n + (Xi − X̄)² / Σ(Xi − X̄)² ) = 317.85 ± 102.28
The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070
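Both interval half-widths can be reproduced from the formulas on the preceding slides (my own sketch; t_{.025} = 2.3060 with d.f. = 8, as used on the slides):

```python
import math

# 95% CI for the mean of Y and 95% PI for an individual Y at X = 2000 sq. ft.
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / ssx
b0 = y_bar - b1 * x_bar
s_yx = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                     for x, y in zip(sqft, price)) / (n - 2))

x_i = 2000
h_i = 1 / n + (x_i - x_bar) ** 2 / ssx   # distance term from slide 68
t_crit = 2.3060                          # t_{.025}, d.f. = 8

ci_half = t_crit * s_yx * math.sqrt(h_i)      # half-width, mean response
pi_half = t_crit * s_yx * math.sqrt(1 + h_i)  # half-width, individual response
print(round(ci_half, 2), round(pi_half, 2))   # 37.12 102.28
# The prediction interval is always wider than the confidence interval
```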
72. Finding Confidence and Prediction Intervals in Excel
From Excel, use PHStat | regression | simple linear regression …
Check the "confidence and prediction interval for X =" box and enter the X value and confidence level desired
73. Finding Confidence and Prediction Intervals in Excel (continued)
The PHStat output shows the input values, the confidence interval estimate for μ_Y|X=Xi, and the prediction interval estimate for Y_X=Xi
74. Pitfalls of Regression Analysis
Lacking an awareness of the assumptions underlying least-squares regression
Not knowing how to evaluate the assumptions
Not knowing the alternatives to least-squares regression if a particular assumption is violated
Using a regression model without knowledge of the subject matter
Extrapolating outside the relevant range
75. Strategies for Avoiding the Pitfalls of Regression
Start with a scatter plot of X vs. Y to observe a possible relationship
Perform residual analysis to check the assumptions
Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
Use a histogram, stem-and-leaf display, boxplot, or normal probability plot of the residuals to uncover possible non-normality
76. Strategies for Avoiding the Pitfalls of Regression (continued)
If there is violation of any assumption, use alternative methods or models
If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
Avoid making predictions or forecasts outside the relevant range
77. Chapter Summary
Introduced types of regression models
Reviewed assumptions of regression and correlation
Discussed determining the simple linear regression equation
Described measures of variation
Discussed residual analysis
Addressed measuring autocorrelation
78. Chapter Summary (continued)
Described inference about the slope
Discussed correlation -- measuring the strength of the association
Addressed estimation of mean values and prediction of individual values
Discussed possible pitfalls in regression and recommended strategies to avoid them