Stats For Life Module7 Oc

  • page 506 of text
  • Relate a scatter plot to the algebraic plotting of number pairs (x,y).
  • page 509 of text
  • page 507 of text Explain to students the difference between the ‘paired’ data of this chapter and the investigation of two groups of data in Chapter 8.
  • page 512 of text If using a graphics calculator for demonstration, it will be an easy exercise to switch the x and y values to show that the value of r will not change.
  • Although the term nonparametric strongly suggests that the test is not based on a parameter, there are some nonparametric tests that do depend on a parameter such as the median, but they don't require a particular distribution. Although distribution-free is a more accurate description, the term nonparametric is more commonly used.
  • Nonparametric methods can be applied to a wide variety of situations because they do not have the more rigid requirements associated with parametric methods. In particular, nonparametric methods do not require normally distributed populations. Unlike parametric methods, nonparametric methods can often be applied to nonnumerical data, such as the genders of survey respondents. Nonparametric methods usually involve simpler computations than the corresponding parametric methods and are therefore easier to understand and apply.
  • Nonparametric methods may appear to waste information in cases where exact numerical data are converted to a qualitative form. For example, in the nonparametric sign test, weight losses by dieters are recorded simply as negative signs; the actual magnitudes of the weight losses are ignored. Nonparametric tests are not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) before we reject a null hypothesis.
  • When the requirements of population distributions are satisfied, nonparametric tests are generally less efficient than their parametric counterparts, but the reduced efficiency can be compensated for by an increased sample size. For example, Section 13-6 will deal with a concept called rank correlation, which has an efficiency rating of 0.91 when compared to the linear correlation presented in Chapter 9. This means that, all other things being equal, nonparametric rank correlation requires 100 sample observations to achieve the same results as 91 sample observations analysed through parametric linear correlation, assuming the stricter requirements for using the parametric method are met. Table 13-1 lists the nonparametric methods covered in this chapter, along with the corresponding parametric approach and efficiency rating. Table 13-1 shows that several nonparametric tests have efficiency ratings above 0.90, so the lower efficiency might not be a critical factor in choosing between parametric and nonparametric methods. More important is that we avoid using the parametric tests when their required assumptions are not satisfied.
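The arithmetic behind the 0.91 efficiency rating can be sketched in a few lines. This is only an illustration of the ratio described in the note above; the function names are made up for this example and do not come from the text.

```python
# Illustrative helpers (hypothetical names) for reading an efficiency rating.
# An efficiency of 0.91 means 100 nonparametric observations do the work of
# about 91 parametric observations, all other things being equal.

def parametric_equivalent(n_nonparametric, efficiency):
    """Parametric sample size matched by n_nonparametric observations."""
    return efficiency * n_nonparametric


def nonparametric_needed(n_parametric, efficiency):
    """Nonparametric sample size needed to match n_parametric observations."""
    return n_parametric / efficiency


print(round(parametric_equivalent(100, 0.91), 2))
print(round(nonparametric_needed(91, 0.91), 2))
```

In practice the second result would be rounded up to a whole number of observations.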

    1. 1. Statistics for the Life Sciences Module 7: Correlation and Simple Linear Regression MATH-0072 Instructor: Nicole Rabe
    2. 2. Acknowledgements
    3. 3. Homework ~ Reading <ul><li>Correlation Coefficient : pp. 457-463 & pg 468 </li></ul><ul><li>Simple Linear Regression: pp. 474-477 </li></ul>
    4. 4. Overview <ul><li>Paired Data </li></ul><ul><li>Is there a relationship? </li></ul><ul><li>If so, what is the equation? </li></ul><ul><li>Use that equation for prediction. </li></ul>Pearson Education © 2004
    5. 5. Definition <ul><li>A correlation exists between two variables when one of them is related to the other in some way. </li></ul>Pearson Education © 2004
    6. 6. Definition <ul><li>A Scatterplot (or scatter diagram) is a graph in which the paired ( x, y ) sample data are plotted with a horizontal x- axis and a vertical y- axis. Each individual ( x, y ) pair is plotted as a single point. </li></ul>Pearson Education © 2004
    7. 7. Definition <ul><ul><li>The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample. </li></ul></ul>Pearson Education © 2004
    8. 8. Assumptions <ul><li>1. The sample of paired data ( x, y ) is a random sample. </li></ul><ul><li>2. The pairs of ( x, y ) data have a bivariate normal distribution. </li></ul>Bivariate=relating to or involving two variables Pearson Education © 2004
    9. 9. Correlation <ul><li>Indicates direction of relationship </li></ul><ul><ul><li>r>0 = positive </li></ul></ul><ul><ul><li>r<0 = negative </li></ul></ul><ul><li>Perfect correlation, r=1 or -1, occurs when points lie exactly on a straight line </li></ul>
    10. 10. Correlation <ul><li>r has a value between -1 and +1: </li></ul><ul><li>Correlation has no units </li></ul>[Figure: scatterplots for r = -1, -0.7, -0.4, 0, 0.3, 0.8 and 1. At r = -1 and r = 1 the points fall exactly on a straight line; at r = 0 there is no linear relationship (uncorrelated).]
    11. 11. Be Aware <ul><li>Use correlation only if you have two quantitative variables </li></ul><ul><ul><li>There is an association between gender and weight but there isn’t a correlation between gender and weight! </li></ul></ul><ul><li>Use correlation only if the relationship is linear </li></ul><ul><li>Beware of outliers! </li></ul>
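    12. 12. Correlation “r” $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ Here $\sum$ means “add these terms for all the individuals”. Begin by standardizing the values: e.g. x = height in cm and y = weight in kg, these data exist for n people, and $\bar{x}$ and $s_x$ are the mean and standard deviation of the n heights (likewise $\bar{y}$ and $s_y$ for the n weights).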
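The standardize-then-sum recipe on this slide can be sketched as follows. The heights and weights are made-up values for illustration only, and the sketch assumes the usual sample standard deviation (dividing by n - 1).

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation, dividing by n - 1.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    # Standardize each x and y, multiply the pairs, add the products
    # for all individuals, then divide by n - 1.
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: heights in cm and weights in kg for n = 5 people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

r = correlation(heights_cm, weights_kg)
print(round(r, 3))
```

A value of r close to +1 for these made-up numbers reflects the strong positive linear pattern built into them.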
    13. 13. Correlation in EXCEL <ul><li>Open file called </li></ul><ul><ul><li>Correlation unit6–rabe06.xls </li></ul></ul><ul><li>Lengths of two bones in five fossil specimens of the extinct beast Archaeopteryx </li></ul>What type of relationship is shown above? = Strong positive linear
    14. 14. Summary: Properties of the Linear Correlation Coefficient r <ul><li>1. –1 ≤ r ≤ 1 </li></ul><ul><li>2. The value of r does not change if all values of either variable are converted to a different scale. </li></ul><ul><li>3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change. </li></ul><ul><li>4. r measures the strength of a linear relationship. </li></ul>Pearson Education © 2004
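Properties 2 and 3 are easy to check numerically. The sketch below uses made-up heights and weights and a shortcut form of r that is algebraically equivalent to the standardized-sum definition; the unit conversion (cm to inches) stands in for "a different scale".

```python
from math import sqrt


def correlation(xs, ys):
    # Shortcut form of r: sum of cross-products of deviations divided by
    # the square root of the product of the sums of squared deviations.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

r_original = correlation(heights_cm, weights_kg)

# Property 2: convert heights from cm to inches; r should not change.
heights_in = [h / 2.54 for h in heights_cm]
r_rescaled = correlation(heights_in, weights_kg)

# Property 3: interchange x and y; r should not change.
r_swapped = correlation(weights_kg, heights_cm)

print(round(r_original, 6), round(r_rescaled, 6), round(r_swapped, 6))
```

All three printed values should agree, apart from floating-point rounding.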
    15. 15. Regression (r²)
    16. 16. Regression (r²) <ul><li>Both linear regression and correlation analysis can be used to describe the strength of a linear relationship between two continuous variables. </li></ul><ul><li>However, linear regression allows us to estimate the equation describing the relationship. </li></ul><ul><li>To do this, it is necessary to declare one of the variables to be: </li></ul><ul><ul><li>dependent variable (denoted Y) </li></ul></ul><ul><ul><li>independent variable (denoted X). </li></ul></ul>
    17. 17. Regression vs. correlation <ul><li>We want to predict the value of y given a value of x </li></ul><ul><li>Unlike correlation, regression requires an explanatory variable and a response variable </li></ul>
    18. 18. Regression <ul><li>Definition </li></ul><ul><li>Regression Equation </li></ul>The regression equation expresses a relationship between x (called the independent variable , predictor variable or explanatory variable) and y (called the dependent variable or response variable). Pearson Education © 2004
    19. 19. y = a + bx, where y = dependent variable, x = independent variable, a = intercept (value we would expect y to have if x is 0), b = slope. Plot y (response variable) on the vertical axis. Plot x (explanatory variable) on the horizontal axis. Fit the line to the distribution of points. Slope is the amount by which y changes when x increases by one unit.
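One common way to "fit the line" is the least-squares criterion. The slide does not name a fitting method, so least squares is an assumption here, and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per one-unit increase in x
    a = y_bar - b * x_bar    # intercept: fitted value of y when x is 0
    return a, b


# Hypothetical data: x = height in cm (explanatory), y = weight in kg (response).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

a, b = fit_line(heights_cm, weights_kg)
print(round(a, 2), round(b, 2))
```

With these numbers the intercept is negative, a reminder that the value at x = 0 lies far outside the observed heights and need not be physically meaningful.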
    20. 20. Prediction using regression analysis <ul><li>Can use a regression line to predict the response y for a specific value of the explanatory variable x </li></ul><ul><li>However, one must expect error in using any regression equation </li></ul><ul><ul><li>Error = observed value – predicted value </li></ul></ul>
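Prediction and the error term (observed minus predicted) can be sketched as below. The line is fitted by least squares, which is an assumption, and the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

a, b = fit_line(heights_cm, weights_kg)


def predict(x):
    # Predicted response for a given value of the explanatory variable.
    return a + b * x


# Error (residual) = observed value - predicted value, one per data point.
errors = [y - predict(x) for x, y in zip(heights_cm, weights_kg)]

print(round(predict(175), 2))
print([round(e, 2) for e in errors])
```

For a least-squares line the errors sum to zero, so individual errors are more informative than their total.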
    21. 21. Regression in EXCEL <ul><li>Water testing: </li></ul><ul><ul><li>Form a dye by chemical reaction with dissolved pollutant </li></ul></ul><ul><ul><li>Pass a light through solution and measure its “absorbance” </li></ul></ul><ul><ul><li>To calibrate – measure known solution & use regression to relate absorbance to pollutant concentration </li></ul></ul><ul><ul><li>Do above daily </li></ul></ul>
    22. 22. Fraction of the Variation Explained: r² = (variance of predicted values) / (variance of observed values). The square of the correlation is the fraction of the variation in the values of y that is explained by the regression of y on x.
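The two descriptions of r² on this slide can be compared numerically. The sketch assumes a least-squares line (for which the identity holds) and uses made-up data.

```python
from math import sqrt
from statistics import variance


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def correlation(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [150, 160, 170, 180, 190]
ys = [52, 58, 69, 75, 88]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]

# Route 1: variance of predicted values over variance of observed values.
r2_from_variances = variance(predicted) / variance(ys)

# Route 2: square of the correlation coefficient.
r2_from_r = correlation(xs, ys) ** 2

print(round(r2_from_variances, 4), round(r2_from_r, 4))
```

Both routes should print the same fraction, apart from floating-point rounding.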
    23. 23. Can you use a Tape Measure to WEIGH a bear? <ul><li>Case Study: </li></ul><ul><ul><li>Researchers anesthetise 8 male bears to obtain data on age, gender, length, and weight </li></ul></ul><ul><ul><li>Most bears are heavy (duhhhh!) and heavy to lift when out in the wild! </li></ul></ul><ul><ul><li>Can we determine the weight of a bear from other measurements using regression analysis? </li></ul></ul>
    24. 24. Dataset <ul><li>Just using your eyes - is there a relationship? </li></ul><ul><li>Is it a strong linear relationship? </li></ul><ul><li>Estimate the correlation r and the regression r² </li></ul><ul><li>Next Page shows the result…. </li></ul>
    25. 25. Best line fit was Exponential not linear... Dataset provided by Triola, Biostatistics - Bears.xls feel free to try some of the variables
    26. 26. Play Time! <ul><li>http://cwx.prenhall.com/bookbind/pubbooks/esm_larson_elemstats_2/chapter9/deluxe.html </li></ul><ul><li>Move points on scatterplot by using mouse and dragging </li></ul><ul><li>Witness the impact on the r² value </li></ul><ul><li>Now drag a point far away from the line – see how an outlier can change or weaken r² </li></ul>Regression Statistical Applet
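One common way to fit an exponential curve y = A·e^(kx) is to regress ln(y) on x with ordinary least squares and then transform back. The sketch below assumes that approach; it is not necessarily how the fit on this slide was produced, and the data are synthetic values generated from a known curve, not the Bears.xls measurements.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def fit_exponential(xs, ys):
    """Fit y = A * exp(k * x) by a straight-line fit of ln(y) against x."""
    log_ys = [log(y) for y in ys]      # requires every y to be positive
    intercept, slope = fit_line(xs, log_ys)
    return exp(intercept), slope       # A = e^intercept, k = slope


# Synthetic data generated from y = 2 * exp(0.5 * x), for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [2 * exp(0.5 * x) for x in xs]

A, k = fit_exponential(xs, ys)
print(round(A, 4), round(k, 4))
```

Because the synthetic points lie exactly on the curve, the fit recovers the generating values of A and k; real measurements would scatter around the fitted curve.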
    27. 27. Practice Regression <ul><li>Practice manual calculation of correlation on Femur/Humerus Dataset or Olympic Runners Dataset </li></ul><ul><li>Case Study (pg 506) and do Analyzing the Data: a, b, d, & e </li></ul>
    28. 28. Parametric vs. non-parametric tests Introduction Only!!!!
    29. 29. Definitions <ul><li>Parametric tests require assumptions about the nature or shape of the populations involved </li></ul><ul><li>Nonparametric tests do not require such assumptions </li></ul><ul><li>Nonparametric tests are often called distribution-free tests </li></ul>
    30. 30. Advantages of Nonparametric Methods <ul><li>Can be applied to a wide variety of situations because they do not have the more rigid requirements associated with parametric methods. </li></ul><ul><li>Can often be applied to nonnumerical data, such as the genders of survey respondents. </li></ul><ul><li>Usually involve simpler computations than the corresponding parametric methods and are therefore easier to understand and apply. </li></ul>
    31. 31. Disadvantages of Nonparametric Methods <ul><li>May appear to waste information in cases where exact numerical data are converted to a qualitative form. </li></ul><ul><li>Not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) before we reject a null hypothesis. </li></ul>
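The "wasted information" point can be illustrated with the sign test mentioned in the slide notes, where each dieter's weight change is reduced to a plus or minus sign. The sketch below computes an exact two-sided binomial p-value, which is one standard way to carry out a sign test and may differ from the table-based procedure in the text. The weight changes are made up.

```python
from math import comb


def sign_test_p_value(changes):
    """Two-sided exact sign test: are + and - signs equally likely?"""
    positives = sum(1 for c in changes if c > 0)
    negatives = sum(1 for c in changes if c < 0)
    n = positives + negatives          # zeros carry no sign and are dropped
    k = min(positives, negatives)      # count of the less frequent sign
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical weight changes in kg (negative = weight lost).
changes_kg = [-2.1, -0.5, -3.0, 1.2, -1.8, -0.7, -2.4, 0.0, -1.1, -0.9]

p_value = sign_test_p_value(changes_kg)
print(round(p_value, 4))
```

Only the signs enter the calculation: a loss of 0.5 kg counts exactly the same as a loss of 3.0 kg, which is the information a parametric test on the magnitudes would use.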
    32. 32. Comparison of Parametric and Nonparametric Tests
