# Statistics And Correlation

1. Statistics / Correlation Research
2.
   - After a research project has been carried out, what are the results?
   - For quantitative data, the results are a bunch of numbers.
   - Now what? What do the numbers look like, and what do they mean?
   - Statistical analysis allows us to:
     - Summarize the data
     - Represent the data in meaningful ways
     - Determine whether our data are meaningful
3.
   - Many forms of research
     - Many forms of data
     - Variety of dependent variables
   - Data can take one of four forms.
   - Four measurement scales:
     - Nominal
     - Ordinal
     - Interval
     - Ratio
4.
   - Nominal scale:
     - Simplest form of measurement: you give something a name.
     - Qualitative scale of measurement
   - Assigns participants to a category based on a physical or psychological characteristic rather than a numerical score.
   - E.g.,
     - Male vs. female; eye color
     - Intelligence levels: smart vs. dull
   - Data are determined by strict categories.
   - Only allows for crude comparisons of results.
   - Can really only be used for qualitative comparisons.
5.
   - Ordinal scales
     - Ranking system: data are ranked from highest to lowest.
   - Show relative rankings but say nothing about the size of the differences between the rankings.
   - Do not assume that the intervals between rankings are equal.
   - E.g., ranking the 10 smartest kids
   - E.g., college football rankings
   - Problem: no absolute magnitude
   - Makes it difficult to make comparisons
6.
   - Interval scales: numeric scores without an absolute zero
   - Show not only the relative ranks of scores, but also equal distances between the scores.
   - Interval: ordering with equal intervals
   - E.g., IQ scores: the difference between 100 and 120 is the same as the difference between 60 and 80.
   - Problem: no absolute zero
     - There is no meaningful IQ score of 0.
     - Does not allow for ratio comparisons. E.g., an IQ of 120 does not mean twice as smart as an IQ of 60.
7.
   - Ratio scales: numeric scores with an absolute zero point
   - All of the properties of the other scales, plus a meaningful zero point.
   - Allows you to make ratio comparisons
     - i.e., is one twice as much as another?
   - E.g., number of correct answers on an exam.
   - E.g., number of friends a person has.
8.
   - Nominal and ordinal scales are discrete or categorical.
   - Interval and ratio scales are continuous scales.
   - NOIR (Nominal → Ordinal → Interval → Ratio): increasing levels of resolution
   - Most observable behaviors are measured on a ratio scale.
   - Most psychological constructs are measured on an interval scale.
   - It is important to recognize which scale of measurement is being used.
     - Nominal and ordinal data require different statistical analyses than interval or ratio data.
9.
   - After data collection is finished, the data must be summarized. What do they look like?
   - Start by exploring the data. Look at individual scores.
   - Frequency distributions show us the collection of individual scores.
   - Simple frequency distributions list all possible score values and then indicate how often each occurs.
   - Allows us to make sense of the individual scores.
10. Grouped frequency distribution: raw data are combined into equal-sized groups.
11. Histogram: a frequency distribution in graphical form, drawn as a bar graph.
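The frequency distributions described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 23 exam scores listed on the median slide; the function names and the bin width of 10 are choices made for this example. The simple distribution lists only scores that actually occur (a full table would also show possible scores with a frequency of 0), and the grouped distribution is printed as a text histogram, one `*` per score.

```python
from collections import Counter

# The 23 exam scores listed on the median slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]


def simple_frequency(data):
    """Map each observed score to its frequency, in ascending score order."""
    counts = Counter(data)
    return {score: counts[score] for score in sorted(counts)}


def grouped_frequency(data, width, start):
    """Count scores in equal-width bins; each key is a bin's lower bound.

    Bins that contain no scores are omitted.
    """
    bins = Counter(start + ((x - start) // width) * width for x in data)
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    for score, freq in simple_frequency(scores).items():
        print(f"{score:>3}: {freq}")
    # Text histogram of the grouped distribution: one '*' per score.
    for low, freq in grouped_frequency(scores, width=10, start=50).items():
        print(f"{low:>3}-{low + 9:<3} {'*' * freq}")
```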
12.
    - Numeric summaries that condense information
      - Numbers that are used to make comparisons
      - Numbers that portray relationships or associations
    - Two main types of statistics:
      - Descriptive statistics
      - Inferential statistics
13.
    - Descriptive statistics: summarize results
      - Central tendency
      - Variability
    - Inferential statistics: used to determine whether relationships or differences between samples are statistically significant
14.
    - Central tendency: what is the "heart" of the data?
    - Three measures of central tendency:
    - Mean: the average
      - Add up all the scores and divide by the number of scores.
    - Median: the middle score
      - Line up all the scores in order and find the middle one.
    - Mode: the most common score
      - Which score occurs most often?
15.
    - Simply add up all of the scores and divide by the number of scores in the sample.
    - The statistic for a sample is X-bar:
    - $\bar{X} = \frac{\sum X}{n}$
16. $\bar{X} = \frac{\sum X}{n} = \frac{1822}{23} \approx 79.22$
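The worked mean above can be checked directly. A minimal sketch, assuming the 23 exam scores listed on the median slide (their sum is 1822); the helper name `mean` is chosen for this example and simply applies X-bar = ΣX / n.

```python
# The 23 exam scores listed on the median slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]


def mean(data):
    """Sample mean: the sum of the scores divided by the number of scores (n)."""
    return sum(data) / len(data)


x_bar = mean(scores)
print(f"sum = {sum(scores)}, n = {len(scores)}, mean = {x_bar:.2f}")
# prints: sum = 1822, n = 23, mean = 79.22
```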
17.
    - Pros and cons of using the mean
    - Pros
      - Summarizes data in a way that is easy to understand
      - Uses all the data
      - Used in many statistical applications
    - Cons
      - Affected by extreme values
      - E.g., if Robert had scored 0, the mean would drop to 74.
    - E.g., average salary at a company:
      - 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 20,000; 390,000
      - Mean = \$44,167
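The salary example can be reproduced with Python's standard `statistics` module. This sketch uses only the salaries listed on the slide and also reports the median, to show how much a single extreme salary of 390,000 pulls the mean above what a typical employee earns.

```python
from statistics import mean, median

# Salaries from the slide: ten at 12,000, one at 20,000, one at 390,000.
salaries = [12_000] * 10 + [20_000, 390_000]

print(f"mean   = {mean(salaries):,.0f}")    # 44,167: pulled up by one extreme salary
print(f"median = {median(salaries):,.0f}")  # 12,000: unaffected by the extreme salary
```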
18.
    - Median: the middle score in the data; half of the scores are above it and half are below it.
      - Rank the scores and find the one in the middle.
    - 50 56 66 68 70 72 76 76 76 78 78 78 78 80 80 86 86 86 88 96 98 100 100
    - Example: with 23 scores, the median is the 12th score, 78.
    - If there is an even number of scores, the median is the average of the two middle scores.
      - E.g., 9, 9, 10, 10: the median is 9.5.
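The median rule above, including the even-count case, can be sketched as a small function. This is a minimal illustration using the 23 scores from this slide; the standard library's `statistics.median` implements the same rule.

```python
# The 23 exam scores listed on this slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]


def median(data):
    """Middle score of the sorted data; with an even count, the mean of the two middle scores."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(scores))          # 78  (the 12th of 23 sorted scores)
print(median([10, 10, 9, 9]))  # 9.5 (average of 9 and 10)
```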
19.
    - Pros and cons of using the median
    - Pros
      - Not affected by extreme values
      - Always exists
      - Easy to compute
    - Cons
      - Doesn't use all of the data values
      - Categories must be properly ordered
    - The mean is almost always preferred. Exception: the data are skewed (not distributed symmetrically) or contain extreme scores.
20.
    - Mode: the most common score in the data
    - The mode is 78 (it occurs four times).
21.
    - Pros and cons of using the mode
    - Pros
      - Fairly easy to compute
      - Not affected by extreme values
    - Cons
      - Sometimes not very descriptive of the data
      - Not necessarily unique: two modes = bimodal; more than two modes = multimodal
      - Doesn't use all values
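Because the mode is not necessarily unique, it helps to compute every most-common value rather than just one. A short sketch using `statistics.multimode` (Python 3.8+) on the exam scores, plus a small hypothetical bimodal list:

```python
from statistics import multimode

# The 23 exam scores listed on the median slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]

print(multimode(scores))           # [78]: occurs four times
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]: two modes, so bimodal
```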
22. Examples: shoe size, height
23.
    - Variability: how spread out are the data?
    - Measures of variability
      - Range
      - Variance
      - Standard deviation ("average variability")
    - Range: the simplest variability statistic, the high score minus the low score.
    - Standard deviation: a measure of the variation, or spread, of individual measurements; it indicates how far, on average, the scores fall from the mean.
24.
    - The larger the standard deviation, the more spread out the scores are.
    - The smaller the standard deviation, the closer the scores are to the mean.
25.
    - Computing the SD:
      1. Subtract the mean from each score.
         - E.g., a score of 100 when the mean is 80: 100 - 80 = 20
      2. Square each of those differences.
      3. Add up the squared differences. This is the "sum of squares."
      4. Divide the sum of squares by the number of scores in the sample minus one (n - 1). This is the variance.
      5. Take the square root of the variance. This is the standard deviation.
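The five steps above translate line for line into code. A minimal sketch, assuming the 23 exam scores from the median slide; the helper name `sample_sd` is hypothetical, and it should agree with the standard library's `statistics.stdev`, which also divides by n - 1. The range is included for comparison.

```python
import math

# The 23 exam scores listed on the median slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]


def sample_sd(data):
    """Sample standard deviation, following the five steps on the slide."""
    n = len(data)
    m = sum(data) / n
    deviations = [x - m for x in data]      # 1. subtract the mean from each score
    squared = [d * d for d in deviations]   # 2. square each difference
    sum_of_squares = sum(squared)           # 3. sum of squares
    variance = sum_of_squares / (n - 1)     # 4. divide by n - 1: the variance
    return math.sqrt(variance)              # 5. square root: the standard deviation


score_range = max(scores) - min(scores)     # range: high score minus low score
print(f"range = {score_range}, SD = {sample_sd(scores):.2f}")
# prints: range = 50, SD = 12.76
```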
26.
    - Data are usually spread around the mean in both directions.
      - Some scores are higher than the mean, some are lower.
    - The frequency distribution of the scores tells us where the scores land relative to the mean.
    - Ideally, some scores are higher, some are lower, and most are in the middle.
    - The normal distribution: the bell curve
27.
    - As sample size increases, the sample's distribution more closely resembles the population's, and the distribution of sample means approaches the normal distribution (the central limit theorem).
    - Importance of the normal distribution
      - Symmetrical
      - Mean, median, and mode are all the same
      - The farther a score is from the mean, the less likely it is to occur
      - Probabilities can be calculated
28.
    - We can assume that many human traits and behaviors follow the normal distribution.
    - Some people are high in a trait, some are low, but most people are in the middle.
    - E.g., personality traits, memory ability, musical capabilities
    - People tend to think about such traits categorically, which is erroneous.
29.
    - Percentiles: all data points are arranged in order, and a particular data point is compared to the population.
      - E.g., an IQ score of 130
    - A percentile reflects the percentage of scores that fall below your data point of interest.
      - An IQ score of 130 (2 SDs above a mean of 100, with SD = 15) is at about the 98th percentile.
    - For normally distributed data, percentiles map directly onto standard deviations from the mean.
30.
    - 0 SD (at the mean): 50th percentile
    - 1 SD above the mean: 84th percentile
    - 2 SDs above the mean: about the 98th percentile (97.7)
    - 3 SDs above the mean: about the 99.9th percentile
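The SD-to-percentile correspondence comes from the cumulative distribution function of the normal curve. A minimal sketch using `statistics.NormalDist` (Python 3.8+); the IQ scaling of mean 100 and SD 15 is an assumption stated in the code.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def percentile_from_z(z):
    """Percentage of a normal distribution that lies below z standard deviations."""
    return 100 * standard_normal.cdf(z)


for z in (0, 1, 2, 3):
    print(f"{z} SD -> {percentile_from_z(z):.1f}th percentile")
# prints 50.0, 84.1, 97.7, and 99.9

# Assumption: IQ scores scaled to mean 100 and SD 15, so 130 is 2 SDs above the mean.
iq = NormalDist(mu=100, sigma=15)
print(f"IQ 130 -> {100 * iq.cdf(130):.1f}th percentile")  # 97.7
```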
31.
    - Inferential statistics: more advanced statistics that reveal whether differences are meaningful.
    - Take into account both central tendency (usually the mean) and variability.
    - Determine how likely it is that differences this large would arise by chance alone.
    - If that probability is very low, we say that the difference is statistically significant.
    - Science holds strict criteria for determining significance.
32.
    - α (alpha): the probability of committing a Type I error (rejecting the null hypothesis when it is actually true).
    - α is normally set at 0.05.
      - When the null hypothesis is true, there is only a 5% chance of committing a Type I error.
    - We can compute the probability (the p value) of observing differences at least as large as ours if chance alone were at work.
      - If that probability is less than 0.05, the results are statistically significant.
    - Many types of inferential statistics
      - t test
      - Analysis of variance (ANOVA)
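To make the t test concrete, here is a minimal sketch of Student's independent-samples t statistic with pooled variance, using only the standard library. The two groups are hypothetical, and the critical value 2.776 is the standard two-tailed table value for α = 0.05 with 4 degrees of freedom; a real analysis would normally get an exact p value from a statistics package (for example SciPy's `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Student's independent-samples t statistic (equal variances assumed) and its df."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: each group's sample variance weighted by its df.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / std_error, df


# Hypothetical exam scores for two groups (illustration only).
control = [70, 75, 80]
treatment = [85, 90, 95]

t, df = pooled_t(control, treatment)
# Two-tailed critical value for alpha = 0.05 and df = 4, from a standard t table.
T_CRITICAL = 2.776
print(f"t({df}) = {t:.2f}; significant at alpha = .05: {abs(t) > T_CRITICAL}")
# prints: t(4) = -3.67; significant at alpha = .05: True
```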
33.
    - Visually representing the data can make it more understandable, for you as well as for anyone else looking at your results.
    - The horizontal axis is the X-axis.
    - The vertical axis is the Y-axis.
    - The best graph is the one that makes the data clearest.
34.
    - Stem-and-leaf plots: each score is divided into two parts, a stem and a leaf.
      - The leaf is the last digit of the score.
      - The stem is the remaining digit(s).
      - E.g., 49 would have 4 as the stem and 9 as the leaf.
    - Making a stem-and-leaf plot is like making a table.
35. Stem-and-leaf display:

    | Stem | Leaf |
    | ---: | :--- |
    | 5 | 6 |
    | 6 | 068 |
    | 7 | 0266688888 |
    | 8 | 006668 |
    | 9 | 68 |
    | 10 | 00 |
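A stem-and-leaf display can be generated by splitting each score with `divmod(score, 10)`, so the leaf is the last digit and the stem is everything before it. This minimal sketch applies the rule to the 23 exam scores listed on the median slide; the function name is hypothetical, it assumes non-negative whole-number scores, and stems with no scores are skipped.

```python
from collections import defaultdict

# The 23 exam scores listed on the median slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]


def stem_and_leaf(data):
    """Group sorted scores by stem (all digits but the last); the last digit is the leaf."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # e.g. 49 -> stem 4, leaf 9
        plot[stem].append(leaf)
    return dict(plot)


for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```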
36. Much of the time, a plot of the means is useful.
37. Line graphs are especially important for repeated-measures designs.
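38.
    - Box plots show the median and the distribution of the scores (the middle half of the scores lies between the first and third quartiles).
    - They also show outliers. A common box-plot convention (Tukey's) flags scores more than 1.5 × the interquartile range beyond the quartiles; another rule of thumb flags scores more than 3 standard deviations from the mean.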
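The two outlier rules can give different answers on the same data. A minimal sketch, assuming the 23 exam scores from the median slide and using `statistics.quantiles` with its default "exclusive" method for the quartiles (other software may compute quartiles slightly differently, which shifts the fences). With these scores, Tukey's fences flag the score of 50, while no score lies more than 3 SDs from the mean.

```python
from statistics import mean, quantiles, stdev

# The 23 exam scores listed on the median slide.
scores = [50, 56, 66, 68, 70, 72, 76, 76, 76, 78, 78, 78, 78,
          80, 80, 86, 86, 86, 88, 96, 98, 100, 100]


def box_plot_summary(data):
    """Quartiles plus outliers under Tukey's 1.5 x IQR rule."""
    q1, q2, q3 = quantiles(data, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}


summary = box_plot_summary(scores)

# The alternative rule: scores more than 3 SDs from the mean.
m, s = mean(scores), stdev(scores)
beyond_3_sd = [x for x in scores if abs(x - m) > 3 * s]

print(summary)      # quartiles 72, 78, 86; outliers [50]
print(beyond_3_sd)  # []
```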
39.
    - Keys to making figures:
      - Keep it simple.
      - Nothing is "required" for making figures.
      - The purpose is to better illustrate the results.
    - Don't "lie" with figures. Axes should be set to an appropriate range.
40.
    - Correlational research investigates the relationships between two variables. E.g.:
      - Is there a relationship between poverty levels and crime?
      - Attachment level in children and future behavior
      - Is the number of hours husbands spend watching sports associated with wives' marital satisfaction?
      - Are basketball players' heights associated with the number of points they score?
41.
    - Establishes the relationship between the variables:
      - Whether it exists
      - The strength of the relationship
    - Correlation can be used as a method for conducting research, or as a tool within the research.
42.
    - Correlation does not mean causation.
      - E.g., ice cream sales are significantly correlated with murder rates, and also with shark attacks (all of them rise in hot weather).
      - The number of cavities in elementary school children and vocabulary size have a strong positive correlation (both increase with age).
      - Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter).
43.
    - A correlation can reflect causation, but correlational research is not designed to assess that.
    - Possible meanings of a correlation:
      - Causation: changes in X cause changes in Y.
      - Common response: changes in both X and Y are caused by some unobserved variable.
      - Confounding: Y is influenced by some other variable whose effects cannot be separated from those of X.
44.
    - Correlation simply measures relationships.
    - Correlation coefficients are scaled so that they range from -1 to +1.
    - The most common method is the Pearson product-moment correlation coefficient.
      - Represented by r
    - Strength of the correlation
      - The closer r is to +1 or -1, the stronger the correlation.
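A minimal sketch of the Pearson product-moment coefficient computed from its definition: the sum of cross-products of deviations, divided by the square root of the product of the two sums of squares. The data are hypothetical toy values chosen to hit the two extremes of the -1 to +1 scale; Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # cross-products
    ss_x = sum((a - mean_x) ** 2 for a in x)                          # sum of squares of x
    ss_y = sum((b - mean_y) ** 2 for b in y)                          # sum of squares of y
    return sum_xy / math.sqrt(ss_x * ss_y)


# Toy data (hypothetical) showing the two extremes of r.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect positive relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect negative relationship
```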
45.
    - Positive correlations: as X increases, Y increases.
      - E.g., horsepower and speed
      - The value of the correlation represents the strength of the relationship.
      - +1 represents a perfect positive relationship.
      - 0.9 is an extremely high correlation; 0.2 is much weaker.
    - Zero correlations: as X increases, Y shows no consistent change.
      - Values around 0
      - E.g., length of hair and test scores
46.
    - Negative correlations: as X increases, Y decreases.
      - E.g., horsepower and miles per gallon
    - Important: the sign of a correlation tells only the direction of the relationship, not its strength. A correlation of -0.8 is stronger than one of +0.3.
    - One way to view correlations is graphically.
    - Scatterplots: graphs that plot pairs of scores, one variable on the X-axis and the other on the Y-axis.
47. Figure: Positive Correlation, Concurrent Change in the Same Direction
48. Figure: Negative Correlation, Concurrent Change in Opposite Directions
49.
    - Scatterplots also allow you to see outliers.
    - Most correlation coefficients assess a linear relationship.
    - Some relationships are more complex.
    - E.g., the Yerkes-Dodson law: performance rises with arousal up to a point and then declines (an inverted U).
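Because Pearson's r measures only linear association, it can miss a strong curved relationship entirely. The sketch below assumes hypothetical inverted-U data in the spirit of the Yerkes-Dodson law: performance is completely determined by arousal, yet r comes out as exactly 0. This is why it is worth looking at a scatterplot before relying on r.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(ss_x * ss_y)


# Hypothetical inverted-U data: performance peaks at moderate arousal
# (illustration only, not real measurements).
arousal = [-3, -2, -1, 0, 1, 2, 3]
performance = [9 - a * a for a in arousal]

print(pearson_r(arousal, performance))  # 0.0: r misses the strong curved relationship
```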