- 1. PROBABILITY & SAMPLES: THE DISTRIBUTION OF SAMPLE MEANS Behavioral Statistics Summer 2017 Dr. Germano
- 2. What we’ve learned so far… Thus far, we have been talking about probabilities for a single event (n = 1) In Chapter 5… Z-scores help us determine a score’s exact position in a distribution in terms of standard deviations from the mean In Chapter 6… If the variable is normally distributed, we can use the z-score to determine exact probabilities for obtaining any individual score (Figure: normal curve with 68.26% of scores within 1 standard deviation of the mean, 95.44% within 2, and 99.73% within 3)
- 3. Samples and Populations • Typically, samples are much larger than n = 1 • How can we move from considering the probability of a single score to considering the probability of a group of scores? • Find some value that is a representative value of that sample, and convert that into a z-score to represent the sample. • What single value could we use to represent a group of scores? • The mean (‘typical’/ ‘central’) Now we can begin to think about the probability of obtaining a certain sample from the population (vs. a single score)
- 4. Issues with Samples Sampling Error • The natural discrepancy – or amount of error – that exists between a sample statistic and the corresponding population parameter Samples are variable • Different samples from the same population will not be exactly the same
- 5. Issues with Samples Samples provide an incomplete picture of the population While blindfolded, you pick 4 marbles (your sample) from one of these jars (population) If you picked 4 black marbles in a row, which jar would you say they came from? Jar A: very low probability they came from this one Jar B: very high probability they came from this one
- 6. THE DISTRIBUTION OF SAMPLE MEANS
- 7. Distribution of Sample Means The set of sample means from all the possible random samples of a specific size (n) selected from a specific population • This distribution has well-defined (and predictable) characteristics that are specified in the Central Limit Theorem (CLT) • This collection of all sample means follows a pattern that allows us to predict characteristics of any one sample • Much like the z-score distribution allows us to predict characteristics of any one score from a normally distributed variable
- 8. Sampling Distribution • A distribution of statistics obtained by selecting all the possible samples of a specific size from a population • Distribution of statistics vs. distribution of scores
- 9. Creating a Sampling Distribution 1. Start with a population (µ, σ) 2. Randomly sample from the population (with each sample having equal n) repeatedly until every possible sample has been selected 3. Each time, calculate the mean (M) for your sample 4. Create a distribution of these sample means (M)
- 10. Example 7.1 Step 1 is to start with a population • Figure 7.1 is a frequency distribution histogram for a population of 4 scores: 2, 4, 6, 8
- 11. Example 7.1 Step 2 is to randomly sample from the population (equal n’s) until every possible sample has been selected • Table 7.1 lists all possible samples of n = 2 scores that can be obtained from the population presented in Figure 7.1 • Note that the table lists random samples. • This requires sampling with replacement, so it is possible to select the same score twice. Step 3 is to calculate the mean (M) for each sample
- 12. Example 7.1 Step 4 is to create a distribution of these sample means (M) • Figure 7.2 shows the distribution of 16 sample means from Table 7.1
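The four steps of Example 7.1 can be sketched in a few lines of Python. This is only an illustration of the enumeration, assuming ordered samples drawn with replacement (which is what yields 16 samples); the variable names are made up for the sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # Step 1: the four scores from Figure 7.1
n = 2                      # sample size used in Table 7.1

# Step 2: sampling with replacement, so every ordered pair is a possible sample.
samples = list(product(population, repeat=n))

# Step 3: the mean (M) of each sample, kept exact with Fraction.
sample_means = [Fraction(sum(s), n) for s in samples]

# Step 4: frequency distribution of the sample means.
distribution = Counter(sample_means)

print(len(samples), "possible samples")
for m in sorted(distribution):
    print(f"M = {float(m):.0f}: {'*' * distribution[m]}")
```

The printed rows form a rough text histogram of the sample means, peaked at M = 5.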
- 13. Characteristics of a Sampling Distribution 1. Most sample means (M) should be clustered around μ 2. The distribution should be relatively normally distributed 3. The larger the sample size (n), the closer the sample means will approximate μ
- 14. What can we do with this distribution? Make statements about the probability of obtaining any one sample mean • Since we have a distribution of all possible samples, we can answer: • What is the probability of obtaining a sample with a mean greater than 7? • p(M > 7) = 1/16 = 0.063 • What proportion of all possible sample means have a value less than 5? • p(M < 5) = 6/16 or 3/8 = 0.375
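The two probabilities above can be checked by counting sample means directly. The sketch below rebuilds the 16 sample means from the population 2, 4, 6, 8 (n = 2, with replacement); the helper name `proportion` is hypothetical and exists only for this illustration.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
# All 16 sample means for n = 2, sampling with replacement.
means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

def proportion(condition):
    """Proportion of all possible sample means that satisfy `condition`."""
    hits = sum(1 for m in means if condition(m))
    return Fraction(hits, len(means))

p_greater_7 = proportion(lambda m: m > 7)
p_less_5 = proportion(lambda m: m < 5)

print(p_greater_7, float(p_greater_7))  # 1/16 0.0625
print(p_less_5, float(p_less_5))        # 3/8 0.375
```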
- 15. Is the Sampling Distribution Useful? YES • Typically when we conduct research, we deal with very large populations and it is not realistic to believe we will be able to measure every possible sample How is the sampling distribution useful? • If all sampling distributions of the mean follow a similar mathematical pattern (the Central Limit Theorem), we will know how the distribution will behave without actually creating it. • Then, we can still make claims about the likelihood of our one sample considering all possible samples
- 16. The Central Limit Theorem (CLT) For any population with a mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of $\sigma/\sqrt{n}$ and will approach a normal distribution as n approaches infinity
- 17. The Central Limit Theorem (CLT) For any population with a mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of $\sigma/\sqrt{n}$ and will approach a normal distribution as n approaches infinity (shape, central tendency, variability) • Serves as a cornerstone for inferential statistics • Describes the sampling distribution of means from any population
- 18. The Central Limit Theorem (CLT) For any population with a mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of $\sigma/\sqrt{n}$ and will approach a normal distribution as n approaches infinity (shape, central tendency, variability) • The mean, μ, is the Expected Value of M • The standard deviation, $\sigma/\sqrt{n}$, is the Standard Error of M
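As a small check of the central tendency and variability claims, the sketch below reuses the tiny population from Example 7.1 (2, 4, 6, 8 with n = 2), where every possible sample can be listed. It says nothing about shape, and the variable names are chosen only for this illustration.

```python
from itertools import product
from math import isclose, sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]
n = 2

mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation

# Mean of every possible sample of size n (sampling with replacement).
means = [fmean(sample) for sample in product(population, repeat=n)]

expected_value = fmean(means)   # mean of the distribution of sample means
standard_error = pstdev(means)  # SD of the distribution of sample means

print(expected_value == mu)                        # True
print(isclose(standard_error, sigma / sqrt(n)))    # True
print(round(standard_error, 3))                    # 1.581
```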
- 19. Shape of the Sample Distribution • The shape of the distribution of sample means tends to be normal • It is almost perfectly normal if either: A. The population from which the samples are obtained is normal B. The sample size is n = 30 or more
- 20. The Expected Value of M The mean of the distribution of sample means is always equal to the mean of the population of scores (μ) • If two (or more) samples are selected from the same population, the two samples probably will have different means. • Although the samples will have different means, you should expect the sample means to be close to the population mean • M is an unbiased statistic; on average, it accurately describes the population mean • Thus, the average value of all possible sample means will equal exactly the population parameter
- 21. The Standard Error of M (σM) The standard deviation of the distribution of sample means • σM = standard distance between M and μ • Two general purposes: 1. Describes the distribution of sample means • A measure of how much difference is expected from one sample to another 2. Measures how well an individual sample mean represents an entire distribution • Provides a measure of how much distance is reasonable to expect between M and μ • The magnitude of σM is determined by: 1. The size of the sample (n), and 2. The standard deviation (σ) of the population
- 22. The Magnitude of σM 1. The influence of n In general, as n increases, the error between M and μ decreases (the inverse is also true: as n decreases, the error increases) Law of Large Numbers: the larger the n, the more probable it is that M will be close to μ
- 23. The Magnitude of σM 2. The influence of σ • Large n = smaller error; small n = larger error • Consider σ as the “starting point” for standard error • When n = 1: • We have one score (X) • The sample mean: M = X • Standard error (σM) = standard distance between X and μ • Therefore, σM = σ • In the situation with the largest possible standard error, it is equal to the population standard deviation
- 24. The Magnitude of σM 2. The influence of σ (continued) • What should happen to the standard error as we get more information (as n increases)? • It should become smaller in a way that takes into account how much information we have
- 25. The Magnitude of σM Table 7.2 Calculations for the points shown in Figure 7.3. Again, notice that the size of the standard error decreases as the size of the sample increases. $\sigma_M = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^2}{n}}$
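To see how quickly the standard error shrinks as n grows, here is a minimal sketch of the formula. The value σ = 10 and the sample sizes are assumptions picked for illustration, not the entries of Table 7.2; both function names are hypothetical.

```python
from math import isclose, sqrt

sigma = 10  # assumed population SD, for illustration only

def standard_error(sigma, n):
    """Standard error of M: sigma divided by the square root of n."""
    return sigma / sqrt(n)

def standard_error_from_variance(variance, n):
    """Equivalent form: square root of (variance / n)."""
    return sqrt(variance / n)

for n in (1, 4, 25, 100):
    se = standard_error(sigma, n)
    # Both forms of the formula should agree.
    assert isclose(se, standard_error_from_variance(sigma ** 2, n))
    print(f"n = {n:>3}: standard error = {se:.2f}")
```

With n = 1 the standard error equals σ itself, and it falls to one tenth of σ by n = 100.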
- 26. Three Different Distributions a) Original population of IQ scores • Has its own shape, mean, and SD b) Sample of n = 25 selected from population • Also has its own shape, mean, and SD c) Distribution of sample means obtained from all possible random samples of specific size (n = 25) • Expected Value of M = μ = 100 • Standard Error of M = $\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3$ • This distribution also has its own shape, mean and SD
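The difference between (b) one sample and (c) the distribution of sample means can be illustrated with a simulation. This sketch assumes IQ scores are normally distributed with μ = 100 and σ = 15, and it approximates "all possible samples" with 5,000 random samples; the seed and repetition count are arbitrary choices.

```python
import random
from statistics import fmean, pstdev, stdev

rng = random.Random(0)      # fixed seed so the run is repeatable
mu, sigma, n = 100, 15, 25  # population values and sample size from the slide

# (b) One sample of n = 25 scores: it has its own mean and SD.
one_sample = [rng.gauss(mu, sigma) for _ in range(n)]
print("one sample:   mean =", round(fmean(one_sample), 1),
      " SD =", round(stdev(one_sample), 1))

# (c) Approximate the distribution of sample means with many samples.
sample_means = [
    fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(5000)
]
print("sample means: mean =", round(fmean(sample_means), 1),
      " SD =", round(pstdev(sample_means), 1))
```

The SD of the simulated sample means should land near the standard error of 3, far below the population SD of 15.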
- 27. PROBABILITY AND THE DISTRIBUTION OF SAMPLE MEANS
- 28. Recap Sampling Distribution of the Mean • Collection of all possible samples’ means • Approximately normal at n = 30 or if from a normal population • Mean (expected value of M) equals the population mean • Standard deviation (standard error of M) equals: $\sigma_M = \frac{\sigma}{\sqrt{n}}$
- 29. Probability and Sample Means • Now we have a distribution of sample means that is normally distributed • We can find the probability of obtaining a sample with any M if we know the likelihood of all possible samples • The z-score value obtained for a sample mean can be used with the unit normal table (in your textbooks) to obtain probabilities • The procedures for computing z-scores and finding probabilities for sample means are essentially the same as we used for individual scores
- 30. Z-scores • For an individual score Gives the exact position of a score in a distribution in relation to the mean (by describing the number of standard deviations from the mean): $z = \frac{X - \mu}{\sigma}$ • For a sample mean Gives the exact position of a sample mean in the distribution of sample means in relation to the population mean (by describing the number of standard errors from the mean): $z = \frac{M - \mu}{\sigma_M}$
- 31. Now we can find probabilities… The population of SAT scores is normally distributed with μ = 500 and σ = 100. If I randomly sample n = 25, what is the probability the sample mean will be greater than M = 540? Or, to restate as a proportion question: Out of all the possible sample means, what proportion have values greater than 540? • Based on the information from the CLT, we know that the sampling distribution of the mean: • Is normal because the population of SAT scores is normal • Has an expected value of M = 500 because μ = 500 • For n = 25, has a standard error of $\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$
- 32. Here is the distribution of sample means What is my next step? • Compute the z-score of M = 540: $z = \frac{M - \mu}{\sigma_M} = \frac{540 - 500}{20} = \frac{40}{20} = 2.00$ • Use the Unit Normal Table to find the proportion in the tail for z = 2.00
- 33. Now answer the question The population of SAT scores is normally distributed with μ = 500 and σ = 100. If I randomly sample n = 25, what is the probability the sample mean will be greater than M = 540? Or, to restate as a proportion question: Out of all the possible sample means, what proportion have values greater than 540? • If I randomly sample 25 people from the population, 2.28% of the time they will have a mean SAT score above 540 • Or: out of all the possible sample means, a proportion of 0.0228 have values greater than 540
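The SAT example can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of the Unit Normal Table; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25       # population values and sample size
standard_error = sigma / sqrt(n)  # 100 / 5 = 20.0

# z-score locating M = 540 within the distribution of sample means.
z = (540 - mu) / standard_error   # 2.0

# Proportion in the upper tail beyond z, from the standard normal CDF.
p = 1 - NormalDist().cdf(z)

print(z)            # 2.0
print(round(p, 4))  # 0.0228
```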
- 34. Now you try it: We have a normal distribution of SAT scores with μ = 500 and σ = 100. If I randomly sample n = 25 from the population: • What is p(M > 550)? $z = \frac{M - \mu}{\sigma_M} = \frac{M - \mu}{\sigma/\sqrt{n}} = \frac{550 - 500}{100/\sqrt{25}} = \frac{50}{100/5} = \frac{50}{20} = 2.50$ • After looking up z = 2.50 in the Unit Normal Table, which column has the information I need? p(M > 550) = 0.0062
- 35. Now you try it: We have a normal distribution of SAT scores with μ = 500 and σ = 100. If I randomly sample n = 25 from the population: • What is p(470 < M < 520)? For M = 470: $z = \frac{M - \mu}{\sigma/\sqrt{n}} = \frac{470 - 500}{100/\sqrt{25}} = \frac{-30}{100/5} = \frac{-30}{20} = -1.50$ For M = 520: $z = \frac{520 - 500}{100/\sqrt{25}} = \frac{20}{100/5} = \frac{20}{20} = 1.00$ • After looking up both z-scores, what information do I need? p(470 < M < 520) = (0.4332 + 0.3413) = 0.7745
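Both practice answers can be checked by modeling the distribution of sample means directly as a normal distribution with mean μ and standard deviation σ/√n. This is a sketch using `statistics.NormalDist` (Python 3.8+) rather than the Unit Normal Table; the variable names are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25

# Distribution of sample means: mean mu, standard error sigma / sqrt(n) = 20.
sampling_dist = NormalDist(mu, sigma / sqrt(n))

# p(M > 550): area in the upper tail.
p_above_550 = 1 - sampling_dist.cdf(550)

# p(470 < M < 520): area between the two sample means.
p_between = sampling_dist.cdf(520) - sampling_dist.cdf(470)

print(round(p_above_550, 4))  # 0.0062
print(round(p_between, 4))    # 0.7745
```

Working in raw SAT units this way skips the explicit z-score step but gives the same proportions.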
- 37. Differences in Error Sampling Error • A sample will not typically provide an exact estimate of the population • 50% of samples will overestimate μ, 50% will underestimate μ Standard Error • A way to estimate how much sampling error exists • Standard deviation of the sampling distribution of the mean • Large standard error = less accurate sample estimations = more sampling error
- 38. LOOKING AHEAD TO INFERENTIAL STATISTICS
- 39. Looking ahead • Natural differences exist between statistics and parameters • Samples are not perfect representatives and there will always be some error • Sampling error of M • There will always be some amount of uncertainty when trying to generalize to a population from a sample
- 40. How can we use these concepts to help draw inferences? • We have a population • All students in the class • We know how this population performs • Population μ and σ on a typical test • We can sample from this population • Randomly sample n = 5 students • Give them some treatment • Special study sessions • And see if they have a mean noticeably different than the population • If the sample scores noticeably higher than typical, we have evidence that these study sessions ‘work’
- 41. The Point for Inferential Statistics If I know the distribution of all possible means… then I can make judgments about whether an event is unlikely or atypical • Is an event likely to occur by chance given how all possible events occur? • Or is an event unlikely and thus attributed to some other factor than chance? • (i.e., treatment, intervention, etc.)