Correlation and regression

54,106 views

Published on

Published in: Technology
1 Comment
50 Likes
Statistics
Notes
No Downloads
Views
Total views
54,106
On SlideShare
0
From Embeds
0
Number of Embeds
30,639
Actions
Shares
0
Downloads
1,540
Comments
1
Likes
50
Embeds 0
No embeds

No notes for slide
  • Typically, in the summer as the temperature increases people are thirstier. Consider the two numerical variables, temperature and water consumption. We would expect the higher the temperature, the more water a given person would consume. Thus we would say that in the summer time, temperature and water consumption are positively correlated.
  • (The data is shown in the table with the temperature placed in increasing order.)
  • This graph helps us visualize what appears to be a somewhat linear relationship between temperature and the amount of water one drinks.
  • Direction of the Association: The association can be either positive or negative. Positive Correlation: as the x variable increases so does the y variable. Example: In the summer, as the temperature increases, so does thirst. Negative Correlation: as the x variable increases, the y variable decreases. Example: As the price of an item increases, the number of items sold decreases.
  • Strength of the Association:  The strength of the linear association is measured by the sample Correlation Coefficient, r.  r can be any value from –1 to +1.    The closer r is to one (in magnitude) the stronger the linear association.   If r equals zero, then there is no linear association between the two variables. 
  • *  No other values of r have precise definitions of strength. See the chart below. Note:  All of the values in the second table are positive. Thus the associations are positive. The same strength interpretations hold for negative values of r, only the direction interpretations of the association would change.
  • To describe or model a set of data with one dependent variable and one (or more) independent variables. To predict or estimate the values of the dependent variable based on given value(s) of the independent variable(s). To control or administer standards from a useable statistical relationship.
  • Simple: only one independent variable Linear in the Independent Variable:  the independent variable only appears to the first power.
  • Regression analysis tries to fit a model to one response variable based on one or more explanatory variables. In most cases, there will be error. See the graph below for an example of Simple Linear Regression. The actual data points (x,y) are the blue dots The Least Squares linear model (regression line) of the response (y) variable based on the the explanatory (x) variable is shown in black.   The errors (residuals) for each data point to its predicted value are the vertical lines shown in red. The goal, in general, is to minimize the errors from the actual data to the regression line. The least squares line minimizes the sum of the square of the errors.
  • If you were planning an outdoor party in the summer time, you might need to estimate how many soft drinks to buy. In your planning, you will, of course, need to know how many people will be attending. However, as you determine the number of soft drinks for each person, you might want to also consider how hot it will be. The temperatures have been forecasted for around 95 degrees on the day of the outdoor party. (In real life, you would probably make an estimate based on your previous experience and make a logical increase based on the temperature.) We will use some hypothetical data to determine how thirsty people might be. We will use the Temperature/Water example data for this.
  •  
  •  
  •  
  • And we cannot expect it be accurate at such low temperatures, because the sample of data was taken in the summer time and the temperatures range from 75 to 99 degrees Fahrenheit. Thus the model only predicts for temperatures in approximately that range. 
  • Solution: Substitute the value of x=95 (degrees F) into the regression equation y=1.5*x - 96.9  and solve for y (water consumption). y=1.5*x - 96.9 if x=95, then y=1.5*95 - 96.9 = 45.6 ounces.   Note: Since 95 degrees F is in the range of values that were used to find the regression equation, then we can use this equation to predict the water consumption. If the desired prediction had been for 50 degrees F, then the model should not be used, since the value of x does not fall in the range of predictability for this model. 
  • Note: This means that approximately 7% of the variation in the water consumption is not explained by the temperature. So perhaps there is another variable that accounts for the remaining portion of the variation.  Can you think of a reasonable variable? Multiple Regression is the name of the method that would include more than one independent variable to predict amount of water people would drink.
  • Predicting the Solar MaximumPlanning for satellite orbits and space missions often require advance knowledge of solar activity levels. NASA scientists are using new techniques to predict sunspot maxima years in advance. Click here for the latest predictions for the current solar cycle. Click on the image for current predictions
  • Compare the height and arm span of students in the class, except for one student. Test for the strength of association between the two variables. Measure the arm span of the one student and ask the class to predict the student's height.
  • Have students search the Internet or other sources for data on basketball players. In particular, find the number of points scored in a game and the number of minutes played in that same game. Based on the data, could the number of points scored in a game be predicted by the amount of time a player plays in a game? Have students suggest alternative ways to set up the situation that would provide better prediction capabilities and ask them to find the data to check their theories. (Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.)  
  • Correlation and regression

    1. 1. Introduction to Correlation and Regression ECONOMICS OF ICMAP, ICAP, MA-ECONOMICS, B.COM. FINANCIAL ACCOUNTING OF ICMAP STAGE 1,3,4 ICAP MODULE B, B.COM, BBA, MBA & PIPFA. COST ACCOUNTING OF ICMAP STAGE 2,3 ICAP MODULE D, BBA, MBA & PIPFA. CONTACT: KHALID AZIZ 0322-3385752 R-1173,ALNOOR SOCIETY, BLOCK 19,F.B.AREA, KARACHI, PAKISTAN
    2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Linear Correlation </li></ul><ul><li>Regression </li></ul><ul><ul><li>Simple Linear Regression </li></ul></ul><ul><ul><li>Using the TI-83 </li></ul></ul><ul><ul><li>Model/Formulas </li></ul></ul>
    3. 3. Outline continued <ul><li>Applications </li></ul><ul><ul><li>Real-life Applications </li></ul></ul><ul><ul><li>Practice Problems </li></ul></ul><ul><li>Internet Resources </li></ul><ul><ul><li>Applets </li></ul></ul><ul><ul><li>Data Sources </li></ul></ul>
    4. 4. Correlation <ul><li>Correlation </li></ul><ul><ul><li>A measure of association between two numerical variables. </li></ul></ul><ul><li>Example (positive correlation) </li></ul><ul><ul><li>Typically, in the summer as the temperature increases people are thirstier. </li></ul></ul>
    5. 5. Specific Example <ul><li>For seven random summer days, a person recorded the temperature and their water consumption , during a three-hour period spent outside.   </li></ul>Temperature (F) Water Consumption (ounces) 75 16 83 20 85   25 85 27 92 32 97 48 99 48
    6. 6. How would you describe the graph?
    7. 7. How “strong” is the linear relationship?
    8. 8. Measuring the Relationship <ul><li>Pearson’s Sample Correlation Coefficient, r </li></ul><ul><ul><li>measures the direction and the strength of the linear association between two numerical paired variables. </li></ul></ul>
    9. 9. Direction of Association <ul><li>Positive Correlation </li></ul><ul><li>Negative Correlation </li></ul>
    10. 10. Strength of Linear Association r value Interpretation 1 perfect positive linear relationship 0 no linear relationship -1 perfect negative linear relationship
    11. 11. Strength of Linear Association
    12. 12. Other Strengths of Association r value Interpretation 0.9 strong association 0.5 moderate association 0.25 weak association
    13. 13. Other Strengths of Association
    14. 14. Formula       = the sum         n = number of paired items       x i = input variable y i = output variable  x = x-bar = mean of x ’s y = y-bar = mean of y ’s s x = standard deviation of x ’s s y = standard deviation of y ’s
    15. 15. Regression <ul><li>Regression </li></ul><ul><ul><li>Specific statistical methods for finding the “line of best fit” for one response (dependent) numerical variable based on one or more explanatory (independent) variables. </li></ul></ul>
    16. 16. Curve Fitting vs. Regression <ul><li>Regression </li></ul><ul><ul><li>Includes using statistical methods to assess the &quot;goodness of fit&quot; of the model.  (ex. Correlation Coefficient) </li></ul></ul>
    17. 17. Regression: 3 Main Purposes <ul><li>To describe (or model) </li></ul><ul><li>To predict ( or estimate) </li></ul><ul><li>To control (or administer) </li></ul>
    18. 18. Simple Linear Regression <ul><li>Statistical method for finding </li></ul><ul><ul><li>the “line of best fit” </li></ul></ul><ul><ul><li>for one response (dependent) numerical variable </li></ul></ul><ul><ul><li>based on one explanatory (independent) variable.   </li></ul></ul>
    19. 19. Least Squares Regression <ul><li>GOAL - minimize the sum of the square of the errors of the data points. </li></ul><ul><li>This minimizes the Mean Square Error </li></ul>
    20. 20. Example <ul><li>Plan an outdoor party. </li></ul><ul><li>Estimate number of soft drinks to buy per person, based on how hot the weather is. </li></ul><ul><li>Use Temperature/Water data and regression . </li></ul>
    21. 21. Steps to Reaching a Solution <ul><li>Draw a scatterplot of the data. </li></ul>
    22. 22. Steps to Reaching a Solution <ul><li>Draw a scatterplot of the data. </li></ul><ul><li>Visually, consider the strength of the linear relationship. </li></ul>
    23. 23. Steps to Reaching a Solution <ul><li>Draw a scatterplot of the data. </li></ul><ul><li>Visually, consider the strength of the linear relationship. </li></ul><ul><li>If the relationship appears relatively strong, find the correlation coefficient as a numerical verification. </li></ul>
    24. 24. Steps to Reaching a Solution <ul><li>Draw a scatterplot of the data. </li></ul><ul><li>Visually, consider the strength of the linear relationship. </li></ul><ul><li>If the relationship appears relatively strong, find the correlation coefficient as a numerical verification. </li></ul><ul><li>If the correlation is still relatively strong, then find the simple linear regression line. </li></ul>
    25. 25. Our Next Steps <ul><li>Learn to Use the TI-83 for Correlation and Regression. </li></ul><ul><li>Interpret the Results (in the Context of the Problem). </li></ul>
    26. 26. Finding the Solution: TI-83 <ul><li>Using the TI- 83 graphing calculator </li></ul><ul><ul><li>Turn on the calculator diagnostics. </li></ul></ul><ul><ul><li>Enter the data. </li></ul></ul><ul><ul><li>Graph a scatterplot of the data. </li></ul></ul><ul><ul><li>Find the equation of the regression line and the correlation coefficient. </li></ul></ul><ul><ul><li>Graph the regression line on a graph with the scatterplot. </li></ul></ul>
    27. 27. Preliminary Step <ul><li>Turn the Diagnostics On. </li></ul><ul><ul><li>Press 2nd 0 (for Catalog). </li></ul></ul><ul><ul><li>Scroll down to DiagnosticOn . The marker points to the right of the words. </li></ul></ul><ul><ul><li>Press ENTER . Press ENTER again. </li></ul></ul><ul><ul><li>The word Done should appear on the right hand side of the screen. </li></ul></ul>
    28. 28. Example Temperature (F) Water Consumption (ounces) 75 16 83 20 85   25 85 27 92 32 97 48 99 48
    29. 29. 1. Enter the Data into Lists <ul><li>Press STAT . </li></ul><ul><li>Under EDIT , select 1: Edit . </li></ul><ul><li>Enter x-values (input) into L1 </li></ul><ul><li>Enter y-values (output) into L2 . </li></ul><ul><li>After data is entered in the lists, go to  2nd MODE to quit and return to the home screen. </li></ul><ul><li>Note: If you need to clear out a list, for example list 1, place the cursor on L1  then hit CLEAR and ENTER . </li></ul>
    30. 30. 2. Set up the Scatterplot. <ul><li>Press  2nd Y= (STAT PLOTS). </li></ul><ul><li>Select 1: PLOT 1 and hit  ENTER . </li></ul><ul><li>Use the arrow keys to move the cursor down to On and hit  ENTER . </li></ul><ul><li>Arrow down to Type: and select the first graph under Type. </li></ul><ul><li>Under Xlist: Enter L1 . </li></ul><ul><li>Under Ylist: Enter L2 . </li></ul><ul><li>Under Mark: select any of these. </li></ul>
    31. 31. 3. View the Scatterplot <ul><li>Press 2nd MODE to quit and return to the home screen. </li></ul><ul><li>To plot the points, press  ZOOM and select 9: ZoomStat . </li></ul><ul><li>The scatterplot will then be graphed. </li></ul>
    32. 32. 4. Find the regression line. <ul><li>Press  STAT . </li></ul><ul><li>Press CALC . </li></ul><ul><li>Select 4: LinReg(ax + b) . </li></ul><ul><li>Press 2nd 1 (for List 1) </li></ul><ul><li>Press the comma key , </li></ul><ul><li>Press 2nd 2 (for List 2) </li></ul><ul><li>Press  ENTER .   </li></ul>
    33. 33. 5. Interpreting and Visualizing <ul><li>Interpreting the result: </li></ul><ul><li>y = ax + b </li></ul><ul><ul><li>The value of a is the slope </li></ul></ul><ul><ul><li>The value of b is the y-intercept </li></ul></ul><ul><ul><li>r is the correlation coefficient </li></ul></ul><ul><ul><li>r 2 is the coefficient of determination </li></ul></ul>
    34. 34. 5. Interpreting and Visualizing <ul><li>Write down the equation of the line in slope intercept form. </li></ul><ul><li>Press  Y= and enter the equation under Y1. (Clear all other equations.)  </li></ul><ul><li>Press  GRAPH and the line will be graphed through the data points. </li></ul>
    35. 35. Questions ???
    36. 36. Interpretation in Context <ul><li>Regression Equation: </li></ul><ul><li>y=1.5*x - 96.9 </li></ul><ul><ul><li>Water Consumption = </li></ul></ul><ul><ul><li>1.5*Temperature - 96.9 </li></ul></ul><ul><ul><li>  </li></ul></ul>
    37. 37. Interpretation in Context <ul><li>Slope = 1.5 (ounces)/(degrees F) </li></ul><ul><ul><li>for each 1 degree F increase in temperature, you expect an increase of 1.5 ounces of water drank. </li></ul></ul><ul><ul><li>  </li></ul></ul>
    38. 38. Interpretation in Context <ul><li>y-intercept = -96.9 </li></ul><ul><ul><li>For this example, when the temperature is 0 degrees F, then a person would drink about -97 ounces of water. </li></ul></ul><ul><ul><li>That does not make any sense! </li></ul></ul><ul><ul><li>Our model is not applicable for x=0.   </li></ul></ul>
    39. 39. Prediction Example <ul><li>Predict the amount of water a person would drink when the temperature is 95 degrees F. </li></ul><ul><li>Solution: Substitute the value of x=95 (degrees F) into the regression equation and solve for y (water consumption). If x=95, y=1.5*95 - 96.9 = 45.6 ounces.   </li></ul>
    40. 40. Strength of the Association: r 2 <ul><li>Coefficient of Determination – r 2 </li></ul><ul><li>General Interpretation: The coefficient of determination tells the percent of the variation in the response variable that is explained (determined) by the model and the explanatory variable.   </li></ul>
    41. 41. Interpretation of r 2 <ul><li>Example: r 2 =92.7%. </li></ul><ul><li>Interpretation: </li></ul><ul><ul><li>Almost 93% of the variability in the amount of water consumed is explained by outside temperature using this model. </li></ul></ul><ul><ul><li>Note: Therefore 7% of the variation in the amount of water consumed is not explained by this model using temperature. </li></ul></ul>
    42. 42. Questions ???
    43. 43. Simple Linear Regression Model <ul><li>The model for </li></ul><ul><li>simple linear regression is </li></ul><ul><li>There are mathematical assumptions behind the concepts that </li></ul><ul><li>we are covering today. </li></ul>
    44. 44. Formulas <ul><li>Prediction Equation: </li></ul>
    45. 45. Real Life Applications <ul><li>Cost Estimating for Future Space Flight Vehicles (Multiple Regression) </li></ul>
    46. 46. Nonlinear Application <ul><li>Predicting when Solar Maximum Will Occur </li></ul><ul><li>http://science.msfc.nasa.gov/ssl/pad/ </li></ul><ul><li>solar/predict.htm </li></ul>
    47. 47. Real Life Applications <ul><li>Estimating Seasonal Sales for Department Stores (Periodic) </li></ul>
    48. 48. Real Life Applications <ul><li>Predicting Student Grades Based on Time Spent Studying </li></ul>
    49. 49. Real Life Applications <ul><li>. . . </li></ul><ul><li>What ideas can you think of? </li></ul><ul><li>What ideas can you think of that your students will relate to? </li></ul>
    50. 50. Practice Problems <ul><li>Measure Height vs. Arm Span </li></ul><ul><li>Find line of best fit for height. </li></ul><ul><li>Predict height for one student not in data set. Check predictability of model. </li></ul>
    51. 51. Practice Problems <ul><li>Is there any correlation between shoe size and height? </li></ul><ul><li>Does gender make a difference in this analysis? </li></ul>
    52. 52. Practice Problems <ul><li>Can the number of points scored in a basketball game be predicted by </li></ul><ul><ul><li>The time a player plays in the game? </li></ul></ul><ul><ul><li>By the player’s height? </li></ul></ul><ul><li>Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.) </li></ul>
    53. 53. JOIN KHALID AZIZ <ul><li>ECONOMICS OF ICMAP, ICAP, MA-ECONOMICS, B.COM. </li></ul><ul><li>FINANCIAL ACCOUNTING OF ICMAP STAGE 1,3,4 ICAP MODULE B, B.COM, BBA, MBA & PIPFA. </li></ul><ul><li>COST ACCOUNTING OF ICMAP STAGE 2,3 ICAP MODULE D, BBA, MBA & PIPFA. </li></ul><ul><li>CONTACT: </li></ul><ul><li>0322-3385752 </li></ul><ul><li>0312-2302870 </li></ul><ul><li>0300-2540827 </li></ul><ul><li>R-1173,ALNOOR SOCIETY, BLOCK 19,F.B.AREA, KARACHI, PAKISTAN </li></ul>

    ×