
# Chapter 10

## on May 22, 2008

## Chapter 10 Presentation Transcript

• Chapter 10: Correlation and Regression
• Correlation & Regression
• Correlation is a statistical method used to determine whether a relationship between variables exists.
• Regression is the statistical method used to describe the nature of the relationship between variables: that is, positive or negative, linear or nonlinear.
• Independent and Dependent Variables
• There are two types of variables in a regression analysis:
• The independent variable is the variable in regression that can be controlled or manipulated.
• The dependent variable is the variable that cannot be controlled or manipulated.
• Scatter Plot
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
• The scatter plot is a visual way to describe the nature of the relationship between the independent and dependent variables.
• Analyzing a Scatter Plot
• After the plot is drawn, it should be analyzed to determine which type of relationship, if any, exists.
• A positive relationship exists when both variables increase or decrease together.
• Ex. ______________________________________
• A negative relationship exists when one variable decreases while the other increases.
• Ex. ______________________________________
• Example: Construct a scatter plot of the data and discuss any trends that you see.
• Yield of Wheat vs. Rainfall

| Rainfall, x (inches) | Yield of wheat, y (bushels per acre) |
|---|---|
| 13.1 | 54.4 |
| 15.9 | 71.3 |
| 10.3 | 44.5 |
| 8.8 | 41.6 |
| 18.6 | 80.6 |
| 11.3 | 52.2 |
| 7.2 | 28.7 |
| 12.9 | 62.5 |

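The slides presumably showed this scatter plot as an image. As a rough stand-in, the sketch below draws the same eight (rainfall, yield) points on a character grid using only the Python standard library. The function name `ascii_scatter`, the grid size, and the `*` marker are illustrative choices, not part of the original example. It assumes the x values are not all equal and the y values are not all equal, since otherwise the scaling divides by zero.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Return text rows plotting (x, y) pairs on a character grid.

    Row 0 is the top of the plot (largest y); column 0 is the left edge (smallest x).
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value linearly onto the grid's column and row indices.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = mark
    # A "|" marks the y-axis on every row; the last row is the x-axis.
    return ["|" + "".join(row_chars) for row_chars in grid] + ["+" + "-" * width]


# Wheat data, pairing the values in the order they are listed in the table.
rainfall = [13.1, 15.9, 10.3, 8.8, 18.6, 11.3, 7.2, 12.9]       # x, inches
wheat_yield = [54.4, 71.3, 44.5, 41.6, 80.6, 52.2, 28.7, 62.5]  # y, bushels per acre

for line in ascii_scatter(rainfall, wheat_yield):
    print(line)
```

The points generally rise from left to right, which suggests a positive linear relationship between rainfall and wheat yield.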
• Example: Prepare a scatter plot of the data and discuss any trends.
• Description: Ice cream consumption was measured over 30 four-week periods from March 18, 1951 to July 11, 1953. The purpose of the study was to determine if ice cream consumption depends on the variables price, income, or temperature. The variables Lag-temp and Year have been added to the original data.
• Correlation Coefficient
• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
• The symbol for the sample correlation coefficient is r.
• The symbol for the population correlation coefficient is ρ (the Greek letter rho).
• Correlation Coefficient (cont’d.)
• The range of the correlation coefficient is from -1 to +1.
• If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
• If there is a strong negative linear relationship between the variables, the value of r will be close to -1.
• Correlation Coefficient (cont’d.)
• When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0.
[Figure: number line for r running from -1 (strong negative linear relationship) through 0 (no linear relationship) to +1 (strong positive linear relationship)]
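• Formula for the Correlation Coefficient r

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$

• where n is the number of data pairs.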
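A direct translation of the computational formula r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]) into Python, as a sketch. The function name `correlation_coefficient` is hypothetical, and it assumes two equal-length lists of numbers in which neither variable is constant (a constant variable makes the denominator zero).

```python
import math


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r via the computational (sums) formula.

    Assumes xs and ys have the same length (n >= 2) and that neither list is
    constant; a constant list makes the denominator zero.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 data pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: points on a rising line
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: points on a falling line
```

The two usage lines illustrate the endpoints of the range: when every point lies exactly on a line, r is exactly +1 or -1.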
• Possible Relationships Between Variables
• There is a direct cause-and-effect relationship between the variables; that is, x causes y.
• There is a reverse cause-and-effect relationship between the variables; that is, y causes x.
• The relationship between the variables may be caused by a third variable; that is, y may appear to cause x, but in reality z causes x.
• Possible Relationships Between Variables (cont’d.)
• There may be a complexity of interrelationships among many variables; that is, x may cause y, but w, t, and z fit into the picture as well.
• The relationship may be coincidental: although a researcher may find a relationship between x and y, common sense may show otherwise.
• Interpretation of Relationships
• When the null hypothesis H0: ρ = 0 is rejected (that is, when r is significant), the researcher must consider all of these possibilities and select the relationship between the variables that the study supports. Remember, correlation does not necessarily imply causation.
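The slides refer to rejecting the null hypothesis without showing the test. A common approach, assumed here, tests H0: ρ = 0 with the statistic t = r√((n − 2)/(1 − r²)) on n − 2 degrees of freedom and compares |t| with a two-tailed critical value read from a t table. The function names and the example numbers (r = -0.862 from n = 7 pairs, critical value 2.571 for d.f. = 5 at α = 0.05) are illustrative, not taken from the slides.

```python
import math


def t_statistic_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if abs(r) >= 1:
        raise ValueError("|r| must be less than 1 for this formula")
    return r * math.sqrt((n - 2) / (1 - r * r))


def reject_null(r, n, t_critical):
    """Two-tailed decision: reject H0 when |t| exceeds the tabled critical value."""
    return abs(t_statistic_for_r(r, n)) > t_critical


# Illustrative values; the critical value is supplied by hand from a t table.
t = t_statistic_for_r(-0.862, 7)
print(f"t = {t:.2f}, reject H0: {reject_null(-0.862, 7, 2.571)}")
```

Passing the critical value in keeps the sketch to the standard library, which has no t distribution; with SciPy available one could look the value up programmatically instead.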
• Example
• A medical researcher wishes to determine how the dosage (in milligrams) of a drug affects the heart rate of the patient. The data for seven patients are given here. Draw the scatter plot for the variables and compute the correlation coefficient.

| Drug dosage, x (mg) | Heart rate, y |
|---|---|
| 0.50 | 82 |
| 0.40 | 80 |
| 0.35 | 88 |
| 0.30 | 92 |
| 0.25 | 93 |
| 0.20 | 90 |
| 0.125 | 95 |

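A common way to compute r by hand is to build a worksheet table with columns x, y, xy, x², and y², total each column, and substitute the totals into the formula for r. The sketch below does the same arithmetic in plain Python for the drug dosage data, assuming the values pair up in the order they are listed; the variable names are illustrative.

```python
import math

# Drug dosage data, pairing the values in the order they are listed.
dosage = [0.50, 0.40, 0.35, 0.30, 0.25, 0.20, 0.125]  # x, milligrams
heart_rate = [82, 80, 88, 92, 93, 90, 95]             # y

n = len(dosage)
# Column totals of the x, y, xy, x^2, y^2 worksheet table.
sum_x = sum(dosage)
sum_y = sum(heart_rate)
sum_xy = sum(x * y for x, y in zip(dosage, heart_rate))
sum_x2 = sum(x * x for x in dosage)
sum_y2 = sum(y * y for y in heart_rate)

# Substitute the totals into the computational formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sum x = {sum_x:.3f}, sum y = {sum_y}, sum xy = {sum_xy:.3f}")
print(f"sum x^2 = {sum_x2:.6f}, sum y^2 = {sum_y2}")
print(f"r = {r:.3f}")
```

The totals come to Σx = 2.125, Σy = 620, Σxy = 184.525, Σx² = 0.740625, and Σy² = 55106, giving r ≈ -0.862: a strong negative linear relationship, with larger doses going with lower heart rates in these data.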
• Example
• A researcher wishes to determine whether there is a relationship between the age (in years) of grocery store cash registers and monthly maintenance cost. The data follow. Draw the scatter plot for the variables and compute the correlation coefficient.

| Age, x (years) | Monthly maintenance cost, y |
|---|---|
| 4 | 87 |
| 6 | 90 |
| 2 | 83 |
| 1 | 65 |
| 3 | 70 |
| 4 | 90 |
| 2 | 75 |

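Working the formula for r by hand on the cash-register data, pairing the values in the order they are listed, gives the following (a worked check added here, not part of the original slides):

```latex
\begin{aligned}
&n = 7,\quad \sum x = 22,\quad \sum y = 560,\quad \sum xy = 1839,\quad \sum x^2 = 86,\quad \sum y^2 = 45408\\[4pt]
&r = \frac{7(1839) - (22)(560)}{\sqrt{\left[7(86) - 22^2\right]\left[7(45408) - 560^2\right]}}
   = \frac{553}{\sqrt{(118)(4256)}} \approx 0.780
\end{aligned}
```

A value this close to +1 indicates a fairly strong positive linear relationship between a register's age and its maintenance cost.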
• Steps in Regression Analysis
• First: Collect the data.
• Second: Construct a scatter plot to see if there is any linear relationship between the variables.
• Third: Compute the value of the correlation coefficient.
• Steps in Regression Analysis (cont’d.)
• Fourth: If the value of the correlation coefficient is significant, then determine the equation of the regression line, which is the data’s line of best fit. Note: Determining the regression line when r is not significant is meaningless.
• The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data.
• The line of best fit is the line that minimizes the sum of the squared residuals.
• The closer the points fit the regression line, the higher the absolute value of r and the closer r will be to -1 or 1.
• When all points fall directly on the line, r will equal 1 or -1 and this indicates a perfect linear relationship between the variables.
• The values y - y’ are called residuals. The residual is the difference between the actual (observed) value y and the predicted value y’. For the least-squares line, the sum of the residuals is always zero. The regression line determined by the formulas is the line that best fits the points of the observed data. This line is also called the least-squares line because the sum of the squares of the residuals computed using the regression line is the smallest possible value.
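Both properties of the least-squares line (its residuals sum to zero, and no other line gives a smaller sum of squared residuals) can be checked numerically. This sketch assumes the usual least-squares formulas for the slope a and intercept b of y’ = ax + b and reuses the drug dosage data from this chapter, paired in the order listed. It computes the residuals y - y’, then nudges each coefficient away from its least-squares value to show the sum of squares gets larger. The helper names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope a and intercept b of the least-squares line y' = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


def residuals(xs, ys, a, b):
    """Residuals y - y' for each observed point, using the line y' = a*x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


def sum_of_squared_residuals(xs, ys, a, b):
    return sum(e * e for e in residuals(xs, ys, a, b))


# Drug dosage (x, mg) and heart rate (y), paired in the order listed.
dosage = [0.50, 0.40, 0.35, 0.30, 0.25, 0.20, 0.125]
heart_rate = [82, 80, 88, 92, 93, 90, 95]

a, b = least_squares_line(dosage, heart_rate)
best = sum_of_squared_residuals(dosage, heart_rate, a, b)

# The residuals sum to zero, up to floating-point rounding.
print(f"sum of residuals: {sum(residuals(dosage, heart_rate, a, b)):.2e}")

# Moving the slope or intercept off its least-squares value raises the sum of squares.
for da, db in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    worse = sum_of_squared_residuals(dosage, heart_rate, a + da, b + db)
    print(f"a{da:+}, b{db:+}: {worse:.4f} > {best:.4f} -> {worse > best}")
```

Every comparison prints True: the squared-residual sum is a bowl-shaped function of a and b, and the least-squares formulas locate its bottom.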
• Recall from algebra that the equation of a line can be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
• In statistics, the equation of the regression line is y’ = ax + b, where a is the slope, b is the y-intercept, and y’ is the predicted value of y for a given value of x.
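The slides do not list the formulas for a and b. The standard least-squares formulas, written in the y’ = ax + b form (slope a, y-intercept b), are shown below. Some textbooks write the line as y’ = a + bx instead, which swaps the roles of the two letters.

```latex
a = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2},
\qquad
b = \frac{\sum y - a\sum x}{n} = \bar{y} - a\,\bar{x}
```

The numerator of a is the same quantity as the numerator of r, so the slope always has the same sign as the correlation coefficient.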
• Example: Drug dosage vs. Heart Rate Data Revisited
• Find the regression equation for the drug dosage and heart rate data.
• Use the equation of the regression line to predict the heart rate of a patient given a dosage of 0.27 milligrams.
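A sketch of both steps for the drug data: fit y’ = ax + b with the standard least-squares formulas (assumed here, since the slides omit them), then substitute x = 0.27. It pairs the values in the order listed in the data table, and per step four it presumes r has already been found significant. The helper name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope a and intercept b of the least-squares line y' = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


# Drug dosage (x, mg) and heart rate (y), paired in the order listed.
dosage = [0.50, 0.40, 0.35, 0.30, 0.25, 0.20, 0.125]
heart_rate = [82, 80, 88, 92, 93, 90, 95]

a, b = least_squares_line(dosage, heart_rate)
predicted = a * 0.27 + b  # substitute x = 0.27 mg into y' = ax + b

print(f"y' = {a:.3f}x + {b:.3f}")
print(f"predicted heart rate at 0.27 mg: {predicted:.2f}")
```

With these data the sketch gives a ≈ -38.617 and b ≈ 100.294, so y’ ≈ -38.617x + 100.294, and a predicted heart rate of about 89.87 at a dosage of 0.27 mg.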
• Example: Age of Register vs. Repair Cost Data Revisited
• Find the regression equation for the age of the cash register and repair cost data.
• Use the equation of the regression line to predict the repair cost of a register that is 4 years old.
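A worked version for the cash-register data, pairing the values in the order listed and using the standard least-squares formulas for the slope a and intercept b of y’ = ax + b (assumed, since the slides omit them). The column totals are n = 7, Σx = 22, Σy = 560, Σxy = 1839, Σx² = 86; intermediate values are carried at full precision and rounded only for display.

```latex
\begin{aligned}
a &= \frac{7(1839) - (22)(560)}{7(86) - 22^2} = \frac{553}{118} \approx 4.686\\
b &= \frac{560 - \left(\tfrac{553}{118}\right)(22)}{7} \approx 65.271\\
y' &\approx 4.686x + 65.271, \qquad y'\big|_{x=4} \approx 84.02
\end{aligned}
```

So a 4-year-old register has a predicted cost of about 84.02, in whatever units the cost column uses (the slides do not state them).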
• The Coefficient of Determination
• The coefficient of determination is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable. The symbol for the coefficient of determination is r².
• r² = explained variation / total variation
• The coefficient of nondetermination is the measure of the unexplained variation. It is found by subtracting the coefficient of determination from 1.
• Coefficient of Nondetermination = 1 - r²
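The sketch below splits the variation in y into its explained and unexplained parts using the standard definitions, which the slides do not spell out: explained variation Σ(y’ - ȳ)², unexplained variation Σ(y - y’)², and total variation Σ(y - ȳ)². For a least-squares line the first two add up to the third, so explained/total and unexplained/total are r² and 1 - r². It reuses the drug dosage data, paired in the order listed; the function name is hypothetical.

```python
def variation_split(xs, ys):
    """Return (explained, unexplained, total) variation for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope a and intercept b, written with deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    predicted = [a * x + b for x in xs]
    explained = sum((p - mean_y) ** 2 for p in predicted)           # sum of (y' - y-bar)^2
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # sum of (y - y')^2
    total = sum((y - mean_y) ** 2 for y in ys)                      # sum of (y - y-bar)^2
    return explained, unexplained, total


dosage = [0.50, 0.40, 0.35, 0.30, 0.25, 0.20, 0.125]  # x, mg
heart_rate = [82, 80, 88, 92, 93, 90, 95]             # y

explained, unexplained, total = variation_split(dosage, heart_rate)
r_squared = explained / total
print(f"coefficient of determination r^2 = {r_squared:.4f}")
print(f"coefficient of nondetermination 1 - r^2 = {unexplained / total:.4f}")
```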
• Example
• Determine the percentage of explained variation and unexplained variation in the heart rate data.
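Working from the column totals of the heart-rate data (n = 7, Σx = 2.125, Σy = 620, Σxy = 184.525, Σx² = 0.740625, Σy² = 55106, pairing the values in the order listed), r² is the square of the correlation coefficient:

```latex
r^2 = \frac{\left[7(184.525) - (2.125)(620)\right]^2}{\left[7(0.740625) - 2.125^2\right]\left[7(55106) - 620^2\right]}
    = \frac{(-25.825)^2}{(0.66875)(1342)} \approx 0.743,
\qquad 1 - r^2 \approx 0.257
```

So roughly 74.3% of the variation in heart rate is explained by the regression line on dosage, and roughly 25.7% is unexplained.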
• Example
• Determine the coefficient of determination for the cash register data and explain what it means.
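Working from the column totals of the cash-register data (n = 7, Σx = 22, Σy = 560, Σxy = 1839, Σx² = 86, Σy² = 45408, pairing the values in the order listed):

```latex
r^2 = \frac{\left[7(1839) - (22)(560)\right]^2}{\left[7(86) - 22^2\right]\left[7(45408) - 560^2\right]}
    = \frac{553^2}{(118)(4256)} = \frac{305809}{502208} \approx 0.609,
\qquad 1 - r^2 \approx 0.391
```

In words, about 60.9% of the variation in maintenance cost is explained by the register's age through the regression line, and about 39.1% is due to other factors or chance.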