- 1. Statistics for the Life Sciences Module 7: Correlation and Simple Linear Regression MATH-0072 Instructor: Nicole Rabe
- 2. Acknowledgements
- 3. Homework ~ Reading
  - Correlation Coefficient: pp. 457-463 & p. 468
  - Simple Linear Regression: pp. 474-477
- 4. Overview (Pearson Education © 2004)
  - Paired Data
  - Is there a relationship?
  - If so, what is the equation?
  - Use that equation for prediction.
- 5. Definition
  - A correlation exists between two variables when one of them is related to the other in some way.
- 6. Definition
  - A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x-axis and a vertical y-axis. Each individual (x, y) pair is plotted as a single point.
- 7. Definition
  - The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.
- 8. Assumptions
  1. The sample of paired data (x, y) is a random sample.
  2. The pairs of (x, y) data have a bivariate normal distribution (bivariate = relating to or involving two variables).
- 9. Correlation
  - Indicates direction of relationship
    - r > 0 = positive
    - r < 0 = negative
  - Perfect correlation, r = 1 or -1, occurs when points lie exactly on a straight line
- 10. Correlation
  - r has a value between -1 and +1
  - Correlation has no units
  - [Figure: scatterplots for r = -1, -0.7, -0.4, 0, 0.3, 0.8, and 1. At r = -1 and r = 1 the points fall exactly on a straight line; at r = 0 there is no linear relationship (uncorrelated).]
- 11. Be Aware
  - Use correlation only if you have two quantitative variables
    - There is an association between gender and weight, but there isn’t a correlation between gender and weight!
  - Use correlation only if the relationship is linear
  - Beware of outliers!
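- 12. Correlation “r”
  - Begin by standardizing the values, e.g. x = height in cm and y = weight in kg, where these data exist for n people
  - $\bar{x}$ and $s_x$ are the mean and standard deviation of the n heights (likewise $\bar{y}$ and $s_y$ for the n weights)
  - $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
  - $\sum$ means “add these terms for all the individuals”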
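The calculation described on this slide can be sketched in Python. This is a minimal illustration, assuming the standard sample formula: standardize each x and y with its mean and sample standard deviation, multiply the standardized pairs, add them up, and divide by n - 1. The height and weight values are hypothetical and are not taken from the course files.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Linear correlation r: sum of products of standardized values over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data for five people (illustration only).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

r = correlation(heights_cm, weights_kg)
print(round(r, 3))
```

Because the values are standardized first, the units (cm, kg) cancel out, which is why r has no units.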
- 13. Correlation in EXCEL
  - Open file called
    - Correlation unit6–rabe06.xls
  - Lengths of two bones in five fossil specimens of the extinct beast Archaeopteryx
  - What type of relationship is shown in the scatterplot? Strong positive linear
- 14. Summary: Properties of the Linear Correlation Coefficient r
  1. –1 ≤ r ≤ 1
  2. The value of r does not change if all values of either variable are converted to a different scale.
  3. The value of r is not affected by the choice of x and y: interchange x and y and the value of r will not change.
  4. r measures the strength of a linear relationship.
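Properties 1 to 3 can be checked numerically. The sketch below uses hypothetical height and weight values and an algebraically equivalent form of r (sum of cross-products divided by the square root of the product of the sums of squares). The unit conversion factors are the only other assumptions.

```python
from math import sqrt

def correlation(xs, ys):
    """Linear correlation r computed from sums of squares and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (illustration only).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

r = correlation(heights_cm, weights_kg)

# Property 2: converting to a different scale (cm to inches, kg to pounds) leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_rescaled = correlation(heights_in, weights_lb)

# Property 3: interchanging x and y leaves r unchanged.
r_swapped = correlation(weights_kg, heights_cm)

print(r, r_rescaled, r_swapped)
```

The three printed values agree up to floating-point rounding, and all lie between -1 and 1 (property 1).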
- 15. Regression (r²)
- 16. Regression (r²)
  - Both linear regression and correlation analysis can be used to describe the strength of a linear relationship between two continuous variables.
  - However, linear regression allows us to estimate the equation describing the relationship.
  - To do this, it is necessary to declare one of the variables to be the dependent variable and the other the independent variable:
    - dependent variable (denoted Y)
    - independent variable (denoted X)
- 17. Regression vs. correlation
  - We want to predict the value of y given a value of x
  - Unlike correlation, regression requires an explanatory variable and a response variable
- 18. Regression
  - Definition: Regression Equation
  - The regression equation expresses a relationship between x (called the independent variable, predictor variable, or explanatory variable) and y (called the dependent variable or response variable).
- 19. y = a + bx, where:
  - y = dependent variable
  - x = independent variable
  - a = intercept (value we would expect y to have if x is 0)
  - b = slope
  - Plot y (response variable) on the vertical axis
  - Plot x (explanatory variable) on the horizontal axis
  - Fit the line to the distribution of points
  - Slope is the amount by which y changes when x increases by one unit
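One common way to fit the line is ordinary least squares. The slide does not name the fitting criterion, so treat least squares as an assumption here. The sketch below computes the slope b and intercept a of y = a + bx for hypothetical data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per one-unit increase in x
    a = y_bar - b * x_bar    # intercept: expected y when x is 0
    return a, b

# Hypothetical paired data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

The fitted line always passes through the point of means (x̄, ȳ), which is what the intercept formula a = ȳ - b·x̄ expresses.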
- 20. Prediction using regression analysis
  - Can use a regression line to predict the response y for a specific value of the explanatory variable x
  - However, one must allow for error in using any regression equation
    - Error = observed value – predicted value
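Prediction and its error can be sketched as follows. The intercept and slope values and the data are hypothetical, chosen only to show how Error = observed value - predicted value is computed for each point.

```python
def predict(a, b, x):
    """Predicted response for explanatory value x on the line y = a + b*x."""
    return a + b * x

def errors(a, b, xs, ys):
    """Error for each point: observed value minus predicted value."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]

# Hypothetical fitted coefficients and data (illustration only).
a, b = 0.05, 1.99
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

errs = errors(a, b, xs, ys)
for x, y, e in zip(xs, ys, errs):
    print(f"x={x}: observed={y}, predicted={predict(a, b, x):.2f}, error={e:+.2f}")
```

A positive error means the observed point lies above the line, and a negative error means it lies below.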
- 21. Regression in EXCEL
  - Water testing:
    - Form a dye by chemical reaction with dissolved pollutant
    - Pass a light through the solution and measure its “absorbance”
    - To calibrate: measure a known solution & use regression to relate absorbance to pollutant concentration
    - Do the above daily
- 22. Fraction of the Variation Explained
  - r² = variance of predicted values / variance of observed values
  - The square of the correlation is the fraction of the variation in the values of y that is explained by the regression of y on x.
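The ratio on this slide can be verified numerically. This sketch assumes a least-squares line (the slides do not name the fitting method) and hypothetical data. It computes the variance of the predicted values over the variance of the observed values and compares the result with the square of r.

```python
from math import sqrt
from statistics import variance

# Hypothetical paired data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line y = a + b*x and its predictions.
b = sxy / sxx
a = y_bar - b * x_bar
predicted = [a + b * x for x in xs]

# r² as the slide defines it: variance of predicted over variance of observed.
r_squared = variance(predicted) / variance(ys)

# Correlation r, for comparison.
r = sxy / sqrt(sxx * syy)

print(r_squared, r ** 2)
```

The two printed numbers agree up to rounding, which is the sense in which the square of the correlation is the fraction of variation in y explained by the regression on x.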
- 23. Can you use a Tape Measure to WEIGH a bear?
  - Case Study:
    - Researchers anesthetise 8 male bears to obtain data on age, gender, length, and weight
    - Most bears are heavy (duhhhh!) and hard to lift when out in the wild!
    - Can we determine the weight of a bear from other measurements using regression analysis?
- 24. Dataset
  - Just using your eyes: is there a relationship?
  - Is it a strong linear relationship?
  - Estimate the correlation r and the regression r²
  - Next page shows the result...
- 25. Best line fit was exponential, not linear. Dataset provided by Triola, Biostatistics (Bears.xls); feel free to try some of the variables.
- 26. Play Time! (Regression Statistical Applet)
  - http://cwx.prenhall.com/bookbind/pubbooks/esm_larson_elemstats_2/chapter9/deluxe.html
  - Move points on the scatterplot by dragging them with the mouse
  - Witness the impact on the r² value
  - Now drag a point far away from the line and see how an outlier can change or weaken r²
- 27. Practice Regression
  - Practice manual calculation of correlation on the Femur/Humerus Dataset or the Olympic Runners Dataset
  - Case Study (p. 506): do Analyzing the Data: a, b, d, & e
- 28. Parametric vs. nonparametric tests: Introduction Only!
- 29. Definitions
  - Parametric tests require assumptions about the nature or shape of the populations involved
  - Nonparametric tests do not require such assumptions
  - Nonparametric tests are often called distribution-free
- 30. Advantages of Nonparametric Methods
  - Can be applied to a wide variety of situations because they do not have the more rigid requirements associated with parametric methods.
  - Can often be applied to nonnumerical data, such as the genders of survey respondents.
  - Usually involve simpler computations than the corresponding parametric methods and are therefore easier to understand and apply.
- 31. Disadvantages of Nonparametric Methods
  - May appear to waste information in cases where exact numerical data are converted to a qualitative form.
  - Not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) before we reject a null hypothesis.
- 32. Comparison of Parametric and Nonparametric Tests