Chapter 11 Psrm

Published in: Technology, Economy & Finance

  1. 1. Chapter 11: Central Tendency, Dispersion, Statistical Inference, Hypothesis Testing
  2. 2. Description <ul><li>We can describe data in a number of ways: </li></ul><ul><ul><li>We could describe every observation, or every value in a data set (but this would be overwhelming and mostly unhelpful) </li></ul></ul><ul><ul><li>Alternatively, we could summarize the data: </li></ul></ul><ul><ul><ul><li>Graphical summaries </li></ul></ul></ul><ul><ul><ul><ul><li>Bar graphs, pie graphs, dot plots, etc. </li></ul></ul></ul></ul><ul><ul><ul><li>Statistical summaries </li></ul></ul></ul><ul><ul><ul><ul><li>Frequency distributions </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Descriptive statistics </li></ul></ul></ul></ul>
  3. 3. Description <ul><li>Frequency distributions </li></ul><ul><ul><li>A table that shows the number of observations having each value of a variable </li></ul></ul><ul><ul><li>May include other statistics such as the relative frequency (proportion), percentage, missing values, or odds ratios </li></ul></ul><ul><li>Descriptive statistics </li></ul><ul><ul><li>Describe a large amount of data with just one number </li></ul></ul>
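As a rough illustration of the frequency distribution described above, the sketch below tabulates counts, proportions, and percentages. The function name `frequency_table` is hypothetical, and the values are borrowed from the second example data set used later in the chapter.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, proportion, percentage) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, c / n, 100 * c / n) for v, c in sorted(counts.items())]


# Illustrative data only (assumption: reusing the chapter's data set #2).
table = frequency_table([1, 4, 5, 5, 10])

print(f"{'value':>5} {'count':>5} {'proportion':>10} {'percent':>8}")
for value, count, proportion, percent in table:
    print(f"{value:>5} {count:>5} {proportion:>10.2f} {percent:>7.1f}%")
```

Each row reports one distinct value of the variable, so the counts sum to the number of observations and the proportions sum to 1.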
  4. 4. Description <ul><li>Two classes of descriptive statistics </li></ul><ul><ul><li>Central tendency </li></ul></ul><ul><ul><li>Dispersion </li></ul></ul>
  5. 5. Central Tendency <ul><li>Measures of central tendency </li></ul><ul><ul><li>Describe the typical case in a data set or distribution </li></ul></ul><ul><ul><li>Three statistics </li></ul></ul><ul><ul><ul><li>Mode </li></ul></ul></ul><ul><ul><ul><li>Median </li></ul></ul></ul><ul><ul><ul><li>Mean </li></ul></ul></ul>
  6. 6. Central Tendency <ul><li>Mode </li></ul><ul><ul><li>Indicates the most common observation </li></ul></ul><ul><ul><li>Simply count the number of times you observe each value </li></ul></ul><ul><ul><li>Mode is resistant to outliers </li></ul></ul><ul><ul><ul><li>By definition, the mode cannot be an outlier </li></ul></ul></ul><ul><ul><ul><li>Describes only a single value in the data </li></ul></ul></ul>
  7. 7. Central Tendency <ul><li>Median </li></ul><ul><ul><li>Describes the middle value in an ordered set of values </li></ul></ul><ul><ul><li>Important to rank order the observations first </li></ul></ul><ul><ul><li>Position of the median in the ordered list = ( N + 1)/2 </li></ul></ul><ul><ul><li>With an even number of observations, average the two middle values </li></ul></ul><ul><ul><li>Resistant to outliers: by definition, the median is not an outlier </li></ul></ul><ul><ul><li>Includes only one value </li></ul></ul>
  8. 8. Central Tendency <ul><li>Mean </li></ul><ul><ul><li>Describes the average value </li></ul></ul><ul><ul><li>Mean = (∑ Y )/ N </li></ul></ul><ul><ul><li>Mean is not resistant to outliers </li></ul></ul><ul><ul><ul><li>Outliers will pull the mean up or down, sometimes significantly </li></ul></ul></ul><ul><ul><li>Computed using all values </li></ul></ul>
  9. 9. Central Tendency <ul><li>Compute the mode, median, and mean for each of these data sets: </li></ul><ul><li>Data set #1 ( Y for i = 1 to 5): 5, 5, 5, 5, 5 </li></ul><ul><li>Data set #2 ( Y for i = 1 to 5): 1, 4, 5, 5, 10 </li></ul>
  10. 10. Central Tendency <ul><li>Data set #1 </li></ul><ul><ul><li>Mode = 5 </li></ul></ul><ul><ul><li>Median = 5 </li></ul></ul><ul><ul><li>Mean = 5 </li></ul></ul><ul><li>Data set #2 </li></ul><ul><ul><li>Mode = 5 </li></ul></ul><ul><ul><li>Median = 5 </li></ul></ul><ul><ul><li>Mean = 5 </li></ul></ul><ul><li>Clearly, the two data sets are not identical </li></ul><ul><li>But the measures of central tendency hide the difference </li></ul>
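The three measures of central tendency can be checked with a short sketch using Python's standard `statistics` module. The helper name `central_tendency` is an assumption made for illustration.

```python
import statistics

data_set_1 = [5, 5, 5, 5, 5]
data_set_2 = [1, 4, 5, 5, 10]


def central_tendency(values):
    """Return the mode, median, and mean of a list of numbers."""
    return {
        "mode": statistics.mode(values),      # most common value
        "median": statistics.median(values),  # middle value after sorting
        "mean": statistics.mean(values),      # sum of values divided by N
    }


summary_1 = central_tendency(data_set_1)
summary_2 = central_tendency(data_set_2)
print("Data set #1:", summary_1)
print("Data set #2:", summary_2)
```

Both summaries come out the same even though the underlying values differ, which is the point of the exercise.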
  11. 11. Dispersion <ul><li>What we need is some way to differentiate between data set #1 and data set #2. </li></ul><ul><li>The typical values in each data set were the same. </li></ul><ul><li>We need a measure that describes the other values in the data sets. </li></ul><ul><li>Measures of dispersion indicate how the other values vary around the typical value. </li></ul>
  12. 12. Dispersion <ul><li>Measures of dispersion </li></ul><ul><ul><li>Range </li></ul></ul><ul><ul><li>Variance </li></ul></ul><ul><ul><li>Standard deviation </li></ul></ul>
  13. 13. Dispersion <ul><li>Range </li></ul><ul><ul><li>One of the simplest measures of dispersion is the range. </li></ul></ul><ul><ul><li>Range = Y maximum – Y minimum </li></ul></ul><ul><ul><li>Describes the extremes of the data around the typical case. </li></ul></ul>
  14. 14. Dispersion <ul><li>Variance </li></ul><ul><ul><li>The variance takes into account all of the values in the data set. </li></ul></ul><ul><ul><li>There are two formulas to calculate the variance: </li></ul></ul><ul><ul><ul><li>One formula for the sample </li></ul></ul></ul><ul><ul><ul><li>One formula for the population </li></ul></ul></ul><ul><ul><li>The only difference is that we subtract 1 from the sample size in the sample version of the equation. </li></ul></ul>
  15. 16. Dispersion <ul><li>Standard deviation </li></ul><ul><ul><li>The standard deviation also takes into account all of the values in the data set. </li></ul></ul><ul><ul><li>There are also two formulas to calculate the standard deviation: </li></ul></ul><ul><ul><ul><li>One formula for the sample </li></ul></ul></ul><ul><ul><ul><li>One formula for the population </li></ul></ul></ul><ul><ul><li>Like variance, the only difference is that we subtract 1 from the sample size in the sample version of the equation. </li></ul></ul>
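The formula slides themselves are not in this transcript. As a reference sketch, the standard textbook definitions are given below, writing the observations as Y_i and the number of observations as N to match the notation used for the mean. The symbols s², σ², Ȳ, and μ are assumptions, since the original slides' notation is not available.

```latex
% Sample variance and sample standard deviation (divide by N - 1)
s^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1},
\qquad
s = \sqrt{s^2}

% Population variance and population standard deviation (divide by N)
\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N},
\qquad
\sigma = \sqrt{\sigma^2}
```

In each case the standard deviation is the square root of the corresponding variance.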
  16. 18. Dispersion <ul><li>Compute the range, sample variance, and sample standard deviation for each of these data sets: </li></ul><ul><li>Data set #1 ( Y for i = 1 to 5): 5, 5, 5, 5, 5 </li></ul><ul><li>Data set #2 ( Y for i = 1 to 5): 1, 4, 5, 5, 10 </li></ul>
  17. 19. Dispersion <ul><li>Data set #1 </li></ul><ul><ul><li>Range = 0 </li></ul></ul><ul><ul><li>Variance = 0 </li></ul></ul><ul><ul><li>Standard deviation = 0 </li></ul></ul><ul><li>Measures of dispersion indicate that the data sets are not the same. </li></ul><ul><li>Data set #2 </li></ul><ul><ul><li>Range = 9 </li></ul></ul><ul><ul><li>Variance = 10.5 </li></ul></ul><ul><ul><li>Standard deviation = 3.24 </li></ul></ul>
  18. 20. Dispersion <ul><li>Now try calculating the population versions of the variance and standard deviation for data set #2. </li></ul><ul><li>Data set #2 </li></ul><ul><ul><li>Variance = ? </li></ul></ul><ul><ul><li>Standard deviation = ? </li></ul></ul>
  19. 21. Dispersion <ul><li>As you can see, the population variance and standard deviation are slightly smaller than the sample versions. </li></ul><ul><li>Dividing by N – 1 instead of N in the sample version corrects for the tendency of a sample to understate the spread of the population. </li></ul><ul><li>Data set #2 </li></ul><ul><ul><li>Variance = 8.4 </li></ul></ul><ul><ul><li>Standard deviation = 2.90 </li></ul></ul>
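The dispersion exercises can be reproduced with the standard `statistics` module, which offers separate sample (`variance`, `stdev`) and population (`pvariance`, `pstdev`) functions. The helper name `dispersion` is an assumption made for illustration.

```python
import statistics

data_set_1 = [5, 5, 5, 5, 5]
data_set_2 = [1, 4, 5, 5, 10]


def dispersion(values):
    """Return the range plus sample and population variance and SD."""
    return {
        "range": max(values) - min(values),
        "sample_variance": statistics.variance(values),       # divides by N - 1
        "sample_sd": statistics.stdev(values),
        "population_variance": statistics.pvariance(values),  # divides by N
        "population_sd": statistics.pstdev(values),
    }


spread_1 = dispersion(data_set_1)
spread_2 = dispersion(data_set_2)

for name, spread in (("Data set #1", spread_1), ("Data set #2", spread_2)):
    print(name)
    for measure, value in spread.items():
        print(f"  {measure}: {value:.2f}")
```

For data set #2 the squared deviations from the mean sum to 42, so the sample variance is 42 / 4 and the population variance is 42 / 5.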
  20. 22. Dispersion <ul><li>Variance and standard deviation </li></ul><ul><ul><li>Variance is used in many different statistical applications. </li></ul></ul><ul><ul><li>The standard deviation is used more often than the variance to summarize data because the standard deviation is in the same units as the mean. </li></ul></ul><ul><ul><li>If data sets #1 and #2 describe miles per gallon, we could say that in data set #2 we have a mean of 5 miles per gallon and a sample standard deviation of about 3.24 miles per gallon. </li></ul></ul>
  21. 23. Statistical Inference <ul><ul><li>The normal distribution is our first choice in most cases because it has such wonderful properties: </li></ul></ul><ul><ul><ul><li>Distribution is symmetrical around the mean </li></ul></ul></ul><ul><ul><ul><li>Percentage of cases associated with standard deviations </li></ul></ul></ul><ul><ul><ul><li>Can identify probability of values under the curve </li></ul></ul></ul><ul><ul><ul><li>A linear combination of normally distributed variables is itself distributed normally </li></ul></ul></ul><ul><ul><ul><li>Central limit theorem </li></ul></ul></ul><ul><ul><ul><li>Normal distribution is symmetric and mesokurtic </li></ul></ul></ul><ul><ul><li>Great flexibility in using the normal distribution </li></ul></ul>
  22. 24. Statistical Inference
  23. 25. Statistical Inference <ul><li>We can calculate a z score for every observation in the data set. </li></ul><ul><li>The z score allows us to compare each observation to the rest of the data set, relative to the mean. </li></ul><ul><li>z score, or z of X = ( X – μ ) / σ </li></ul>
  24. 26. Statistical Inference <ul><li>Example : </li></ul><ul><li>μ = 64; σ = 2.4; X i = 70 or more </li></ul><ul><li>z = ( X – μ ) / σ </li></ul>
  25. 27. Statistical Inference <ul><li>Example : </li></ul><ul><li>μ = 64; σ = 2.4; X i = 70 or more </li></ul><ul><li>z = ( X – μ ) / σ </li></ul><ul><li>z = (70 – 64) / 2.4 </li></ul><ul><li>z = (6) / 2.4 </li></ul><ul><li>z = 2.5 for 70 contacts </li></ul><ul><li>p = .0062, or 0.62% </li></ul>
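The z score example can be verified with a minimal sketch that uses `statistics.NormalDist` from the standard library for the upper-tail probability. The function name `z_score` is an assumption, and the calculation assumes the variable is normally distributed, as the slides do.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


z = z_score(70, mu=64, sigma=2.4)

# Probability of observing a value of 70 or more under a normal curve,
# i.e. the area in the upper tail beyond z.
p_upper = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"p = {p_upper:.4f}")
```

The probability is the area to the right of z because the example asks about 70 contacts or more.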
  26. 28. Hypothesis Testing <ul><li>How do you test hypotheses with statistics? </li></ul><ul><li>Comparing the means of two groups </li></ul><ul><ul><li>Consider an experiment </li></ul></ul><ul><li>Research hypothesis: X̄1 ≠ X̄2 </li></ul><ul><li>Null hypothesis: X̄1 = X̄2 </li></ul>
  27. 29. Hypothesis Testing <ul><li>Type 1 error </li></ul><ul><ul><li>State of the world: Research hypothesis is false </li></ul></ul><ul><ul><li>Incorrect rejection of null </li></ul></ul><ul><li>Type 2 error </li></ul><ul><ul><li>State of the world: Research hypothesis is true </li></ul></ul><ul><ul><li>Incorrect acceptance of null </li></ul></ul>
  28. 30. Hypothesis Testing <ul><li>Hypothesis : College students are less likely to read political news stories than are other voting-age citizens. </li></ul><ul><li>X̄ = 5; μ = 10; σ = 2; n = 25 </li></ul><ul><li>z = ( X̄ – μ ) / ( σ / √ n ) </li></ul>
  29. 31. Hypothesis Testing <ul><li>Hypothesis : College students are less likely to read political news stories than are other voting-age citizens. </li></ul><ul><li>X̄ = 5; μ = 10; σ = 2; n = 25 </li></ul><ul><li>z = ( X̄ – μ ) / ( σ / √ n ) </li></ul><ul><li>z = (5 – 10) / (2 / √25) </li></ul><ul><li>z = (–5) / (.4) </li></ul><ul><li>z = –12.5 </li></ul>
  30. 32. Hypothesis Testing <ul><li>Hypothesis : College students are less likely to read political news stories than are other voting-age citizens. </li></ul><ul><li>95% confidence </li></ul><ul><li>z critical = 1.96 </li></ul><ul><li>z = ( X̄ – μ ) / ( σ / √ n ) </li></ul><ul><li>z = (5 – 10) / (2 / √25) </li></ul><ul><li>z = (–5) / (.4) </li></ul><ul><li>z = –12.5 </li></ul>
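A minimal sketch of this z test follows. It compares the absolute value of the test statistic with the two-tailed critical value at 95% confidence. The function name `one_sample_z` is an assumption, and the critical value is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """z statistic for a sample mean against a known population mean and SD."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error


z = one_sample_z(sample_mean=5, mu=10, sigma=2, n=25)

# Two-tailed critical value at 95% confidence: 2.5% in each tail.
z_critical = NormalDist().inv_cdf(0.975)

reject_null = abs(z) > z_critical

print(f"z = {z:.1f}")
print(f"z critical = {z_critical:.2f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

Because the test statistic is far larger in magnitude than the critical value, the null hypothesis of no difference is rejected in this example.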
  31. 33. Hypothesis Testing <ul><li>Hypothesis : College students rate liberal candidates higher than do the rest of the voting population. </li></ul><ul><li>X̄ = 52; μ = 50; σ = 5; n = 25 </li></ul><ul><li>t = ( X̄ – μ ) / ( σ / √ n ) </li></ul>
  32. 34. Hypothesis Testing <ul><li>Hypothesis : College students rate liberal candidates higher than do the rest of the voting population. </li></ul><ul><li>X̄ = 52; μ = 50; σ = 5; n = 25 </li></ul><ul><li>t = ( X̄ – μ ) / ( σ / √ n ) </li></ul><ul><li>t = (52 – 50) / (5 / √25) </li></ul><ul><li>t = (2) / (1) </li></ul><ul><li>t = 2 </li></ul>
  33. 35. Hypothesis Testing <ul><li>Hypothesis : College students rate liberal candidates higher than do the rest of the voting population. </li></ul><ul><li>Two-tailed test; .05 level; n – 1 df </li></ul><ul><li>t critical = 2.064 </li></ul><ul><li>t = ( X̄ – μ ) / ( σ / √ n ) </li></ul><ul><li>t = (52 – 50) / (5 / √25) </li></ul><ul><li>t = (2) / (1) </li></ul><ul><li>t = 2 </li></ul>
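The t test can be sketched the same way. Python's standard library has no t distribution, so the critical value of 2.064 is taken directly from the slide (two-tailed, .05 level, 24 degrees of freedom) instead of being computed. The function name `one_sample_t` is an assumption.

```python
from math import sqrt


def one_sample_t(sample_mean, mu, sd, n):
    """t statistic for a sample mean against a hypothesized population mean."""
    standard_error = sd / sqrt(n)
    return (sample_mean - mu) / standard_error


t = one_sample_t(sample_mean=52, mu=50, sd=5, n=25)

# Critical value quoted on the slide: two-tailed test, .05 level, df = n - 1 = 24.
T_CRITICAL = 2.064

reject_null = abs(t) > T_CRITICAL

print(f"t = {t:.2f}")
print(f"t critical = {T_CRITICAL}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

Here the test statistic falls just short of the critical value, so under these figures the null hypothesis would not be rejected at the .05 level.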
