- 1. Statistics 1 (FPN) – Question Pool
- 2. Welcome to Success Formula Question Pool
  - Disclaimers
    - All slides and their materials are the property of Success Formula.
    - You get exclusive, free, personal access when you buy the course the slides were made for.
    - The slides are individually marked, and Success Formula can track which user they belong to.
    - No part of this slide deck may be reproduced, distributed, or transmitted (hereafter referred to together as “Shared”) in any form or by any means, including sharing the material on platforms such as StudyDrive.
    - If slides are Shared, Success Formula may take legal action against the sharing party in line with European and Dutch copyright law.
  - Error Bounty
    - If you find any mistake in this slide deck, let us know and we will refund you the cost of the slides.
    - Only the first person to report the mistake gets the refund.
- 3. Introduction question (example of the slide layout: question topic, the question, difficulty, answers, correct answer)
  - Question: Some people seem to like Breaking Bad, others like Prison Break. What is the percentage of people that watch TV?
  - A. The Walking Dead
  - B. Depends on the year
  - C. All of them
  - D. Answer D because it is the best answer
  - Answer: C
- 4. Significance level: *** Always use a significance level of 0.05 unless otherwise specified ***
- 5. Stats1 – Question Pool Probability Theory
- 6. 1. Probability Theory
  - Question: Florian wants to show Julian a new magic trick. As part of the trick, Julian has to pull a card out of a 52-card deck 3 times in a row, each time keeping the card before pulling the next one. There are 26 red cards and 26 black cards. Which statement is incorrect?
  - A. The probability that, out of the three chosen cards, there is at least one red card or at least one black card is equal to 1
  - B. The outcome of the 2nd trial will influence the outcome of the 3rd trial
  - C. The probability of picking a queen of hearts equals the probability of picking a queen of hearts given that in the previous trial Julian picked a 7 of spades
  - D. The sample space is all the possible combinations of cards that can be drawn in a sample of 3
  - Answer: C
- 7. 1E. Probability Theory: Solution
  - A. Correct. Every card in the deck is either red or black, so any three cards contain at least one red card or at least one black card. The probability is therefore equal to 1.
  - B. Correct. Julian does not put a card back after picking it, so the outcome of each trial influences the next one (the events are dependent).
  - C. Incorrect. P(QH) = P(QH/7S) would hold only if the events were independent, that is, if Julian put his chosen card back in the deck after every trial. Without replacement, P(QH) = 1/52 while P(QH/7S) = 1/51.
  - D. Correct. Julian picks 3 cards in total, so every possible combination of 3 cards is included in the sample space.
- 8. 2. Probability Theory
  - Question: Suppose that 2 dice are rolled at the same time. Calculate the following probabilities:
    - P(A): The sum of the two numbers is equal to 1
    - P(B): The sum of the two numbers is equal to 5
    - P(C): The sum of the two numbers is less than 13
  - A. P(A) = 0.5, P(B) = 0.23, P(C) = 0
  - B. P(A) = 0, P(B) = 0.111, P(C) = 1
  - C. P(A) = 1, P(B) = 0.12, P(C) = 0
  - D. The probabilities cannot be calculated
  - Answer: B
- 9. 2E. Probability Theory: Solution
  - Sample space (36 equally likely outcomes):
    (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
    (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
    (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
    (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
    (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
    (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
  - P(A): No combination of 2 dice can give a sum equal to 1, since dice do not have the number 0.
    - The smallest possible sum is 2, from the combination (1,1)
    - P(A) = 0
  - P(B): We need to identify the combinations in the sample space that yield a sum of 5. There are 4 of them: (1,4), (2,3), (3,2) and (4,1).
    - General formula: $P(E) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes}}$
    - $P(B) = \frac{4}{36} = \frac{1}{9} \approx 0.111$
  - P(C): The combination with the largest sum is (6,6), with a sum of 12.
    - This means that all possible combinations yield a sum lower than 13
    - P(C) is the probability of the entire sample space
    - P(C) = 1
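The counting argument above can be checked by brute force. The following is a minimal sketch, assuming two fair six-sided dice; the helper name `prob` is made up for this illustration and is not part of the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate on an (a, b) outcome."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


p_a = prob(lambda o: sum(o) == 1)  # impossible: the smallest sum is 2
p_b = prob(lambda o: sum(o) == 5)  # (1,4), (2,3), (3,2), (4,1)
p_c = prob(lambda o: sum(o) < 13)  # always true: the largest sum is 12

print(p_a, float(p_b), p_c)
```

Using `Fraction` keeps the result exact (1/9) before it is rounded to 0.111.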
- 10. 3. Probability Theory
  - Question: An experiment has four mutually exclusive outcomes, A, B, C, and D. If P(A) = 0.33, P(B) = 0.17, P(C) = 0.43, P(D) = 0.07, which of the following statements must be true?
  - A. All of the events are independent of each other
  - B. The marginal probability of A equals the conditional probability of A given D
  - C. The joint probability of C and B is equal to 0
  - D. None of the alternatives is correct
  - Answer: C
- 11. 3E. Probability Theory: Solution
  - A. Incorrect. The 4 events are mutually exclusive, so they cannot happen at the same time. Knowing that one of them occurred tells us that the others did not, so the events must be dependent on each other.
  - B. Incorrect. P(A) = P(A/D) holds only when the 2 events are independent of one another. Here A and D are mutually exclusive, so P(A/D) = 0 while P(A) = 0.33.
  - C. Correct. Our events are mutually exclusive, meaning that they cannot happen at the same time: P(C and B) = 0.
  - D. Incorrect. C is the correct statement.
- 12. 4. Probability Theory
  - Question: Suppose we conduct a random experiment and two events, A and B, are independent. Which of the following rules can we use to prove the relationship between A and B?
  - A. P(A and B) = 0
  - B. P(A and B) = P(A) × P(B/A)
  - C. P(A or B) = P(A) + P(B) – P(A and B)
  - D. P(A) = P(A/B)
  - Answer: D
- 13. 4E. Probability Theory: Solution
  - A. Incorrect. P(A and B) = 0 is the rule for spotting disjoint events. It shows that the two events cannot happen at the same time.
  - B. Incorrect. P(A and B) = P(A) × P(B/A) is the general multiplication rule.
  - C. Incorrect. P(A or B) = P(A) + P(B) – P(A and B) is the general addition rule.
  - D. Correct. P(A) = P(A/B) is a rule for spotting independent events. It shows that the probability of event A is not influenced by the occurrence of event B.
- 14. 5. Probability Theory
  - Question: A recent survey showed that 45% of Success Formula students prefer to visit Tapijn park to relax after a long day of studying. Also, 27% of UM students like to go to both Tapijn park and the city center to relax. Finally, the survey showed that 40% of students said that they don’t visit the city center for some time off. Based on the above data, determine the following probabilities:
    - a. P(A): the probability that a randomly selected UM student visits Tapijn given that he/she also visits the city center
    - b. P(B): the probability that a randomly selected UM student visits Tapijn or visits the city center
  - A. P(A) = 0.45, P(B) = 0.27
  - B. P(A) = 0.88, P(B) = 0
  - C. P(A) = 0.18, P(B) = 0.85
  - D. P(A) = 0.45, P(B) = 0.78
  - Answer: D
- 15. 5E. Probability Theory: Solution
  - Given: P(Tapijn) = 0.45, P(Tapijn and City) = 0.27, P(not City) = 0.4, so P(City) = 1 − P(not City) = 1 − 0.4 = 0.6
  - For P(A) we are looking for P(Tapijn/City)
    - We can first check whether these 2 events are independent, using the rule for spotting independence: P(A and B) = P(A) × P(B)
    - 0.45 × 0.6 = 0.27 = P(Tapijn and City), so Tapijn and City are independent
    - P(Tapijn/City) = P(Tapijn)
    - P(A) = 0.45
  - For P(B) we want P(Tapijn or City)
    - The joint probability of these events is not equal to 0, thus the events are non-disjoint
    - We can use the general formula: P(B) = P(Tapijn) + P(City) − P(Tapijn and City)
    - P(B) = 0.45 + 0.6 – 0.27
    - P(B) = 0.78
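As a quick check of the two results above, the conditional probability can also be computed directly from its definition, P(Tapijn/City) = P(Tapijn and City) / P(City), without first testing for independence. This is only a sketch; the variable names are illustrative.

```python
import math

p_tapijn = 0.45    # P(Tapijn)
p_both = 0.27      # P(Tapijn and City)
p_not_city = 0.40  # P(not City)

p_city = 1 - p_not_city  # complement rule

# Definition of conditional probability.
p_tapijn_given_city = p_both / p_city

# General addition rule for non-disjoint events.
p_tapijn_or_city = p_tapijn + p_city - p_both

# Independence check: P(A and B) == P(A) * P(B), up to floating-point error.
independent = math.isclose(p_both, p_tapijn * p_city)

print(round(p_tapijn_given_city, 2), round(p_tapijn_or_city, 2), independent)
```

Both routes agree here because the events happen to be independent; the definition-based route also works when they are not.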
- 16. 6. Probability Theory
  - Question: Suppose one runs a random experiment with 3 events (A, B, C). Events A and B are disjoint; C is independent of A and dependent on B. P(B) = 0.3, P(C/B) = 0.135, P(C/A) = 0.48, P(C and A) = 0.16. Calculate the following probabilities:
    - a. P(C)
    - b. P(A and B)
    - c. P(B or C)
    - d. P(A or B)
  - A. P(C) = 0.48, P(A and B) = 0, P(B or C) = 0.74, P(A or B) = 0.63
  - B. P(C) = 0.48, P(A and B) = 0.0405, P(B or C) = 0.78, P(A or B) = 0
  - C. P(C) = 0.48, P(A and B) = 0, P(B or C) = 0.63, P(A or B) = 0.74
  - D. P(C) = 0.48, P(A and B) = 0.73, P(B or C) = 0.86, P(A or B) = 0.63
  - Answer: A
- 17. 6E. Probability Theory: Solution
  - Graph: Venn diagram of events A, B and C
  - a. Since events A and C are independent, we can say:
    - P(C) = P(C/A)
    - P(C) = 0.48
  - b. We know that events A and B are disjoint, and we also see that there is no intersection in the graph:
    - P(A and B) = 0
  - c. P(B or C) = P(B) + P(C) – P(B and C)
    - We do not have P(B and C), but we can find it using the multiplication rule
    - P(B and C) = P(B) × P(C/B) = 0.3 × 0.135 = 0.0405
    - P(B or C) = 0.3 + 0.48 − 0.0405 = 0.7395 ≈ 0.74
  - d. Since A and B are disjoint events, we use the special form of the addition rule:
    - P(A or B) = P(A) + P(B)
    - We can calculate P(A) using the multiplication rule for independent events: P(C and A) = P(A) × P(C)
    - P(A) = 0.16 / 0.48 ≈ 0.33
    - P(A or B) = 0.33 + 0.3 = 0.63
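The four steps above chain several rules together, so it can help to see them in one place. A minimal sketch with illustrative variable names:

```python
p_b = 0.3            # P(B)
p_c_given_b = 0.135  # P(C/B)
p_c_given_a = 0.48   # P(C/A)
p_c_and_a = 0.16     # P(C and A)

# a. A and C are independent, so P(C) = P(C/A).
p_c = p_c_given_a

# b. A and B are disjoint, so they cannot occur together.
p_a_and_b = 0.0

# c. General addition rule, with P(B and C) from the multiplication rule.
p_b_and_c = p_b * p_c_given_b
p_b_or_c = p_b + p_c - p_b_and_c

# d. Independence gives P(C and A) = P(A) * P(C); then add disjoint events.
p_a = p_c_and_a / p_c
p_a_or_b = p_a + p_b

print(p_c, p_a_and_b, round(p_b_or_c, 2), round(p_a_or_b, 2))
```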
- 18. 7. Probability Theory
  - Question: Remco decides to investigate which Dutch delicacy is most preferred by students in Maastricht. He writes down his results in the following table. Calculate the following probabilities:
    1. The probability that we randomly select a student who likes fries, given that they are a male
    2. The probability that we randomly select a student who is a female, given they like fries
    3. The probability that the student likes bitterballen

    |        | Fries | Bitterballen | Stroopwaffles | Total |
    |--------|-------|--------------|---------------|-------|
    | Male   | 40    | 35           | 40            | 115   |
    | Female | 20    | 30           | 35            | 85    |
    | Total  | 60    | 65           | 80            | 200   |

  - A. P(1) = 66.67%, P(2) = 34.78%, P(3) = 32.5%
  - B. P(1) = 20%, P(2) = 66.67%, P(3) = 17.5%
  - C. P(1) = 34.78%, P(2) = 33.33%, P(3) = 32.5%
  - D. P(1) = 34.78%, P(2) = 23.52%, P(3) = 17.5%
  - Answer: C
- 19. 7E. Probability Theory: Solution
  - P(1) = P(Fries/Male)
    - It is a conditional probability, so we are not working within the entire sample space
    - The condition indicates the denominator
    - $P(1) = \frac{40}{115} = 34.78\%$
  - P(2) = P(Female/Fries)
    - $P(2) = \frac{20}{60} = 33.33\%$
  - P(3) = P(Bitterballen)
    - It is the marginal probability within the entire sample space
    - $P(3) = \frac{65}{200} = 32.5\%$
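The rule "the condition indicates the denominator" can be made concrete by storing the table cells and deriving the totals from them. This is a small sketch; the dictionary layout and helper names are choices made for this illustration only.

```python
from fractions import Fraction

# Cell counts from Remco's table (rows: gender, columns: snack).
counts = {
    "Male": {"Fries": 40, "Bitterballen": 35, "Stroopwaffles": 40},
    "Female": {"Fries": 20, "Bitterballen": 30, "Stroopwaffles": 35},
}


def row_total(gender):
    return sum(counts[gender].values())


def col_total(snack):
    return sum(row[snack] for row in counts.values())


grand_total = sum(row_total(gender) for gender in counts)

# Conditional probabilities: divide by the total of the conditioning group.
p1 = Fraction(counts["Male"]["Fries"], row_total("Male"))     # P(Fries/Male)
p2 = Fraction(counts["Female"]["Fries"], col_total("Fries"))  # P(Female/Fries)

# Marginal probability: divide by the whole sample.
p3 = Fraction(col_total("Bitterballen"), grand_total)         # P(Bitterballen)

for p in (p1, p2, p3):
    print(f"{float(p):.2%}")
```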
- 20. 8. Probability Theory
  - Question: Refer to the table from the previous question. Which of the following statements is correct?

    |        | Fries | Bitterballen | Stroopwaffles | Total |
    |--------|-------|--------------|---------------|-------|
    | Male   | 40    | 35           | 40            | 115   |
    | Female | 20    | 30           | 35            | 85    |
    | Total  | 60    | 65           | 80            | 200   |

  - A. The probability P(Bitterballen/Female) is not evaluated across the entire sample space
  - B. The events of randomly picking someone who is a female and of randomly picking someone who likes stroopwaffles are disjoint
  - C. The marginal probability P(Fries) is equal to the conditional probability P(Fries/Male)
  - D. The events of randomly picking a male and randomly picking someone who likes stroopwaffles are independent
  - Answer: A
- 21. 8E. Probability Theory: Solution
  - A. Correct. P(Bitterballen/Female) is not evaluated across the entire sample space. Conditional probabilities are evaluated across a subset of the entire sample space, in this case across the subset of females.
  - B. Incorrect. We can see from the table that there are females who prefer stroopwaffles (n = 35), so these 2 events can happen at the same time (not disjoint).
  - C. Incorrect. P(Fries) ≠ P(Fries/Male): $P(\text{Fries}) = \frac{60}{200} = 0.3$ and $P(\text{Fries}/\text{Male}) = \frac{40}{115} \approx 0.35$
  - D. Incorrect. P(Male) ≠ P(Male/Stroopwaffles): $P(\text{Male}) = \frac{115}{200} = 0.575$ and, using the column total of 80 shown in the table, $P(\text{Male}/\text{Stroopwaffles}) = \frac{40}{80} = 0.5$
- 22. 9. Probability Theory
  - Question: The probability that a person you meet at random in the street wears eyeglasses is 0.55. When meeting 4 random people, what is the probability that the number of people you meet wearing eyeglasses is 3 or higher?
  - A. P(X ≥ 3) = 0.392
  - B. P(X ≥ 3) = 0.346
  - C. P(X ≥ 3) = 0.092
  - D. The probability cannot be calculated because we do not have the sample size
  - Answer: A
- 23. 9E. Probability Theory: Solution (tree diagram with 4 levels, one per person met: each branch is either G, wears glasses, with probability 0.55, or NG, no glasses, with probability 0.45)
- 24. 9E. Probability Theory: Solution (continued)
  - Find the right combinations. Since we are looking for the probability of meeting 3 or more people with glasses in our sample of 4, the right combinations are the following:
    - G-G-G-G
    - G-G-G-NG
    - G-G-NG-G
    - G-NG-G-G
    - NG-G-G-G
  - Calculate the probabilities. We multiply the probabilities along each combination:
    - G-G-G-G → 0.55 × 0.55 × 0.55 × 0.55 = 0.092
    - G-G-G-NG → 0.55 × 0.55 × 0.55 × 0.45 = 0.075
    - G-G-NG-G → 0.55 × 0.55 × 0.45 × 0.55 = 0.075
    - G-NG-G-G → 0.55 × 0.45 × 0.55 × 0.55 = 0.075
    - NG-G-G-G → 0.45 × 0.55 × 0.55 × 0.55 = 0.075
  - Sum them up. We add all of the probabilities we just calculated to find the overall probability of meeting 3 or more people with glasses, P(X ≥ 3):
    - 0.092 + 0.075 + 0.075 + 0.075 + 0.075 = 0.392 (summing the unrounded terms gives 0.391)
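The five combinations above are exactly the terms of a binomial probability with n = 4 and p = 0.55. The sketch below uses the binomial formula as a shortcut; note that this formula is an assumption brought in for illustration, since the slides themselves enumerate the branches by hand.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): choose k successes out of n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 4, 0.55

p_exactly_3 = binomial_pmf(3, n, p)  # the four branches with one NG
p_exactly_4 = binomial_pmf(4, n, p)  # the single G-G-G-G branch
p_at_least_3 = p_exactly_3 + p_exactly_4

print(round(p_at_least_3, 3))
```

Without intermediate rounding the result is about 0.391; the slide's 0.392 comes from rounding each branch to three decimals before adding.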
- 25. 10. Probability Theory
  - Question: Given the following probability distribution, what is the approximate variance of X?

    | X | P(x) |
    |---|------|
    | 0 | 0.4  |
    | 1 | 0.8  |
    | 2 | 0.32 |
    | 3 | 0.15 |
    | 4 | 0.54 |

  - A. 4.05
  - B. -1.66
  - C. 7.38
  - D. 15.52
  - Answer: D
- 26. 10E. Probability Theory: Solution
  - First, we need to calculate the expected value in order to use it in the formula for the variance:
    - $\mu_x = \sum P(x) \cdot x = 0 \times 0.4 + 1 \times 0.8 + 2 \times 0.32 + 3 \times 0.15 + 4 \times 0.54 = 4.05$
  - We can now calculate the variance using the formula $\sigma_x^2 = \sum P(x) \cdot (x - \mu_x)^2$:
    - $\sigma_x^2 = 0.4(0 - 4.05)^2 + 0.8(1 - 4.05)^2 + 0.32(2 - 4.05)^2 + 0.15(3 - 4.05)^2 + 0.54(4 - 4.05)^2$
    - $\sigma_x^2 = 6.56 + 7.44 + 1.34 + 0.17 + 0.00135$
    - $\sigma_x^2 \approx 15.51$, which matches option D (15.52) up to rounding
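The two formulas above translate directly into sums over the table rows. The sketch below simply reproduces the slide's arithmetic. One thing it makes visible is that the listed P(x) values add up to 2.21 rather than 1, so the table is not a valid probability distribution as printed; the code makes no attempt to correct that.

```python
xs = [0, 1, 2, 3, 4]
ps = [0.4, 0.8, 0.32, 0.15, 0.54]  # P(x) values exactly as listed on the slide

# Expected value: sum of P(x) * x.
mean = sum(p * x for x, p in zip(xs, ps))

# Variance: sum of P(x) * (x - mean)^2.
variance = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))

total_probability = sum(ps)  # a valid distribution would give 1 here

print(round(mean, 2), round(variance, 2), round(total_probability, 2))
```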
- 27. Stats1 – Question Pool Probability Distribution
- 28. 1. Probability Distribution
  - Question: Thomas takes a standardized test as part of his university application. Standardized tests allow comparisons to be made regarding student achievement. When he received his results, he was told that his Z-score was -0.28. However, he is not sure whether that is a good or bad result. Given that the test scores are normally distributed, what can he conclude from the result?
  - A. He did better than half of the participants
  - B. He did worse than half of the participants
  - C. He did worse than 28% of the participants
  - D. Nothing can be said because we do not have the standard deviation and the mean
  - Answer: B
- 29. 1E. Probability Distribution: Solution
  - Thomas has a Z-score of -0.28, which means that he scored 0.28 standard deviations below the mean. The negative sign indicates the direction relative to the mean. In a normal distribution the mean is also the median, with 50% of the scores below it and 50% above it. Since Thomas is on the left side of the mean, we can say that he performed worse than half of the test takers.
- 30. 1E. Probability Distribution (figure: normal curve centred on µ, with 50% of the area on each side and Z = -0.28 marked just left of the mean)
- 31. 2. Probability Distribution
  - Question: Lea decides to investigate the average income distribution in her hometown. She observes that the majority of households have a low to middle income and a small minority have a high income. Which of the following statements is correct?
  - A. Scores located within 1 standard deviation to the left and right of the mean make up 68% of the entire data set
  - B. A household with an income of 2.3 standard deviations above the mean is in the top 2.5% of the population
  - C. The variable in question is a discrete variable
  - D. None of the above statements is correct
  - Answer: D
- 32. 2E. Probability Distribution: Solution
  - From the description, we can understand that the distribution of average income is right-skewed rather than normal.
  - Alternatives A and B are wrong because they rely on the rule of thumb (68%-95%-99.7%), which can only be used for normal distributions.
  - The third alternative is wrong because average income can take infinitely many possible values, so the variable is continuous.
- 33. 3. Probability Distribution
  - Question: Alexandra decides to measure extraversion scores of students at Success Formula. The scores are well modeled by a normal distribution with a mean of 72 and a standard deviation of 14. What is the probability that a randomly selected person scores between 66 and 76 for extraversion?
  - A. 28.05%
  - B. 61.41%
  - C. 32.98%
  - D. 40.82%
  - Answer: A
- 34. 3E. Probability Distribution: Solution
  - Calculate the z-scores: $z_1 = \frac{76 - 72}{14} = 0.29$ and $z_2 = \frac{66 - 72}{14} = -0.43$
  - Look up the probabilities in the z-table: $z_1 = 0.29 \rightarrow 61.41\%$ and $z_2 = -0.43 \rightarrow 33.36\%$
  - Calculate the probability that the score is between 66 and 76: $61.41\% - 33.36\% = 28.05\%$
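The z-table lookup above can be reproduced with Python's standard library. This is a sketch, assuming `statistics.NormalDist` as a stand-in for the printed z-table; the z-scores are rounded to two decimals first, because that is what a table lookup does.

```python
from statistics import NormalDist

mu, sigma = 72, 14
standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-scores rounded to two decimals, as when reading a z-table.
z_upper = round((76 - mu) / sigma, 2)  # 0.29
z_lower = round((66 - mu) / sigma, 2)  # -0.43

# cdf gives the left-sided probability, like the table does.
p_upper = standard_normal.cdf(z_upper)
p_lower = standard_normal.cdf(z_lower)
p_between = p_upper - p_lower

print(f"{p_between:.2%}")
```

Skipping the rounding step and calling `NormalDist(72, 14).cdf(...)` directly gives a slightly different value (about 27.8%), which is why table-based answers can differ in the last digits.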
- 35. 4. Probability Distribution
  - Question: Suppose that Alexandra measures extraversion scores for a different population with a mean of 80 and a standard deviation of 9. What is the probability that a randomly selected person scores higher than 91?
  - A. 73.89%
  - B. 11.12%
  - C. 40.57%
  - D. 55.63%
  - Answer: B
- 36. 4E. Probability Distribution: Solution
  - Calculate the z-score: $z = \frac{x - \mu}{\sigma} = \frac{91 - 80}{9} = 1.22$
  - Look up the probability in the z-table: $z = 1.22 \rightarrow 0.8888$ (this is the left-sided probability)
  - Calculate the probability that the score is higher than 91 (right-sided probability): $1 - 0.8888 = 0.1112 \rightarrow 11.12\%$
- 37. 5. Probability Distribution
  - Question: According to the Central Limit Theorem:
  - A. The sample distribution becomes normal if there is a sufficient sample size (n > 25)
  - B. The sampling distribution becomes normal only when the population distribution is normal
  - C. Regardless of the shape of the population distribution, the sampling distribution will always be normal
  - D. As the sample size increases, the sample mean and standard deviation will be closer in value to the population mean µ and standard deviation σ
  - Answer: D
- 38. 5E. Probability Distribution: Solution
  - A. Incorrect. It is not the sample distribution that approaches normality when the sample is sufficiently large. It is the sampling distribution.
  - B. Incorrect. The sampling distribution is indeed normal when the population distribution is normal, but it also approaches normality whenever the sample size is sufficiently large, regardless of the population’s shape.
  - C. Incorrect. The sampling distribution is not always normal. For a small sample size, its shape is similar to that of the population distribution and not necessarily normal. For a large sample size, it becomes approximately normal.
  - D. Correct. As the sample size becomes larger, the sample mean and the sample variance become approximately equal to those of the population.
- 39. 6. Probability Distribution
  - Question: Maja plans to study the effects of Omega-3 supplements on antisocial behaviour. She develops a questionnaire which will be filled in by her participants before and after a 2-month trial during which subjects take daily Omega-3 supplements. However, she has trouble recruiting a large number of participants. Given that the sample size is not large enough, which of the following statements is incorrect?
  - A. The sample mean is a biased estimator of the population mean
  - B. The shape of the sampling distribution will be similar to that of the population distribution
  - C. The standard error will probably be too high
  - D. There is a high risk of unreliable statements about population parameters
  - Answer: A
- 40. 6E. Probability Distribution: Solution
  - A. This statement is incorrect. Bias does not depend on the size of the sample. With a small sample the estimate may be imprecise, but if we use the right estimator for the population parameter, the estimate is still unbiased. An estimate is biased when the estimator or the sampling procedure is not appropriate (e.g., no random sample).
  - B. Correct. Since Maja has a small sample size, the sampling distribution has a shape similar to that of the population distribution and is not necessarily normal.
  - C. Correct. Based on the C.L.T., the standard error is σ/√n, so the smaller the sample size, the greater the standard error.
  - D. Correct. Larger sample sizes allow more reliable statements about population parameters than small sample sizes do.
- 41. 6E. Probability Distribution: Key terms
  - Estimator: something that is used in statistics to estimate a fact about the population. → The sample mean is an estimator of the population mean.
  - Bias: the difference between the expected value of the estimator and the true value of the parameter. → The $\bar{X}$ of a simple random sample is always unbiased.
  - Efficiency: the accuracy of the sample mean. → The larger the sample size, the smaller the standard error. → The smaller the standard error, the more efficient the estimate.
- 42. 7. Probability Distribution
  - Question: Alexithymia is a personality trait which features an inability to describe, identify and experience emotions. In a population of people with borderline alexithymia, emotional intelligence scores have a mean of 57 and a standard deviation of 15. The population distribution is skewed to the right. Darian takes a simple random sample of 32. What is the probability that the sample mean will be between 55 and 60?
  - A. 74.86%
  - B. 13.11%
  - C. 64.42%
  - D. The probability cannot be calculated because the population distribution is skewed
  - Answer: C
- 43. 7E. Probability Distribution: Solution
  - Given: µ = 57, σ = 15, n = 32 → the Central Limit Theorem applies (n > 25)
  - Calculate the Z-scores:
    - $z_1 = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{60 - 57}{15 / \sqrt{32}} = 1.13$
    - $z_2 = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{55 - 57}{15 / \sqrt{32}} = -0.75$
  - Look up the probabilities in the z-table: $z_1 = 1.13 \rightarrow 87.08\%$ and $z_2 = -0.75 \rightarrow 22.66\%$
  - Calculate the probability that the sample mean is between 55 and 60: $87.08\% - 22.66\% = 64.42\%$
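The key difference from the earlier z-score questions is the denominator: for a sample mean, the spread is the standard error σ/√n rather than σ. A minimal sketch, again assuming `statistics.NormalDist` in place of the z-table and relying on the slides' rule that the Central Limit Theorem applies for n > 25:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 57, 15, 32

# Standard deviation of the sampling distribution of the mean.
standard_error = sigma / sqrt(n)

# By the Central Limit Theorem, the sample mean is approximately normal.
sampling_distribution = NormalDist(mu, standard_error)

z_upper = (60 - mu) / standard_error
z_lower = (55 - mu) / standard_error
p_between = sampling_distribution.cdf(60) - sampling_distribution.cdf(55)

print(round(z_upper, 2), round(z_lower, 2), f"{p_between:.1%}")
```

Because no z-scores are rounded here, the probability differs slightly from the table-based 64.42%.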
- 44. 8. Probability Distribution
  - Question: A certain variable follows a normal population distribution. The population mean is equal to 23.48 and the standard deviation is equal to 4.657. The probability that the sample mean is higher than 24 equals 25.14%. Calculate the sample size.
  - A. 49
  - B. 24
  - C. 36
  - D. The sample size cannot be calculated
  - Answer: C
- 45. 8E. Probability Distribution: Solution
  - Given: µ = 23.48, σ = 4.657, $P(\bar{x} > 24) = 25.14\%$
  - We need to find the Z-score for which the probability of having a sample mean higher than 24 equals 25.14%
    - Since it is a right-sided probability, we need to subtract it from 1 (the table gives left-sided probabilities)
    - 1 − 0.2514 = 0.7486
    - We can find 0.7486 in the table; it belongs to the z-score of 0.67
  - We can use the Z-formula:
    - $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
    - $0.67 = \frac{24 - 23.48}{4.657 / \sqrt{n}} = \frac{0.52 \times \sqrt{n}}{4.657}$
    - $\sqrt{n} = \frac{0.67 \times 4.657}{0.52} = 6$
    - $n = 6^2 = 36$
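The same steps can be run in reverse order in code: convert the right-sided probability to a z-score, then solve the Z-formula for n. This sketch assumes `NormalDist.inv_cdf` as the equivalent of reading the z-table backwards.

```python
from statistics import NormalDist

mu, sigma = 23.48, 4.657
threshold = 24
right_tail = 0.2514  # P(sample mean > 24)

# The table (and inv_cdf) work with left-sided probabilities.
z = NormalDist().inv_cdf(1 - right_tail)

# Rearranging z = (x_bar - mu) / (sigma / sqrt(n)) for n.
sqrt_n = z * sigma / (threshold - mu)
n = round(sqrt_n**2)

print(round(z, 2), round(sqrt_n, 2), n)
```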
- 46. 9. Probability Distribution
  - Question: Eero develops a new brand of cherry soda and has decided on a specific bottle design. The contents of the soda bottles are normally distributed with a mean of 400 ml and a standard deviation of 7 ml. There is an 8.38% chance that the average contents of a 4-pack will exceed how many ml?
  - A. 400.12
  - B. 404.83
  - C. 407.31
  - D. 400.60
  - Answer: B
- 47. 9E. Probability Distribution: Solution
  - We know that the contents of the soda bottles are normally distributed, thus we can use the Z-table
  - $P(\bar{x} > ?) = 8.38\%$ (right-sided probability) ⇔ 1 – 0.0838 = 0.9162 ⇔ Z = 1.38
  - $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  - $1.38 = \frac{\bar{x} - 400}{7 / \sqrt{4}}$
  - $\bar{x} = 400 + 1.38 \times 3.5 = 400 + 4.83$
  - $\bar{x} = 404.83$
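Here the unknown is the cut-off value itself rather than the sample size. A short sketch of the same derivation, assuming `NormalDist.inv_cdf` in place of the reverse table lookup:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 7, 4  # ml, ml, bottles per pack
right_tail = 0.0838       # P(pack average > cutoff)

# Left-sided probability -> z-score.
z = NormalDist().inv_cdf(1 - right_tail)

# Rearranging Z = (x_bar - mu) / (sigma / sqrt(n)) for x_bar.
standard_error = sigma / sqrt(n)
cutoff = mu + z * standard_error

print(round(z, 2), round(cutoff, 2))
```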
- 48. 10. Probability Distribution
  - Question: Leonie wishes to investigate experiences of homelessness in Maastricht. However, there is no list of homeless people in the city. She decides to use a non-random sampling method known as snowball sampling instead. Leonie meets one homeless person who participates in her research and also puts her in contact with other homeless people in the area whom they know. Using this method she is able to gather 178 participants. Which of the following statements pertaining to the population mean estimator is true?
  - A. The estimator is unbiased and efficient
  - B. The estimator is unbiased and not efficient
  - C. The estimator is biased and efficient
  - D. The estimator is biased and not efficient
  - Answer: C
- 49. 10E. Probability Distribution: Solution
  - Leonie is using a non-random sampling method, meaning that her sample is not random. As a result, her sample mean is not an appropriate estimator of the population mean, which makes her estimator biased. Bias has nothing to do with the sample size.
  - Leonie has a sample size of 178 participants, which is a sufficiently large sample (C.L.T.). Thus, her estimator for the population mean will indeed be efficient: as the sample size increases, the standard error decreases.
- 50. Stats1 – Question Pool Hypothesis Testing
- 51. 1. Hypothesis Testing
  - Question: A researcher claims that he was able to develop a drug that enhances human attention. He will test this hypothesis by recruiting 80 individuals with Attention Deficit Disorder (ADD). He divides his sample evenly into 2 groups and makes sure that the groups are matched in their attention levels. He continues by administering the drug only to group 1, keeping group 2 as a control. Finally, all participants across both groups have to complete an Attention Test, with higher scores indicating worse attention. What are the researcher’s null and alternative hypotheses?
  - A. H0: µ1 = µ2, Hα: µ1 ≠ µ2
  - B. H0: µ1 ≠ µ2, Hα: µ1 < µ2
  - C. H0: µ1 = µ2, Hα: µ1 > µ2
  - D. H0: µ1 = µ2, Hα: µ1 < µ2
  - Answer: D
- 52. 1E. Hypothesis Testing: Solution
  - A. Incorrect. The alternative hypothesis indicates a two-sided test (Hα: µ1 ≠ µ2). The researcher wants to test the hypothesis that the drug enhances human attention, so we are looking for a one-sided test.
  - B. Incorrect. The null hypothesis always states that there is no significant relationship in our data. In this case, it is the hypothesis that the drug will not have an effect on the mean of group 1 (H0: µ1 = µ2).
  - C. Incorrect. This alternative hypothesis states that the mean of group 1 should be higher than that of group 2 after the drug administration. However, higher scores mean worse attention levels. Since the researcher expects the drug to be beneficial, we should expect group 1 to have better attention levels than group 2, and thus lower scores.
  - D. Correct. The alternative hypothesis claims that group 2 will have worse attention relative to group 1, as seen from their higher test scores.
- 53. Answers Question Refer back to the example in question one. The researcher is informed that the population of people with ADD is skewed to the right. Which of the following statements is correct? 52 A. The researcher can still test his hypothesis because normality is not a necessary condition B. The researcher can still test his hypothesis because his sample size is large enough C. The researcher cannot test his hypothesis because there is no normality in the population D. The researcher cannot test his hypothesis because his sample size is not large enough Answer: B 2. Hypothesis Testing
- 54. 2E. Hypothesis Testing Question Refer back to the example in question one. The researcher is informed that the population of people with ADD is skewed to the right. Which of the following statements is correct? 53 Solution A. Incorrect. In order to be able to test our hypothesis, we need to make sure that we are working with a normal distribution B. Correct. The researcher can indeed do the test because he has a large enough sample size, meaning that the central limit theorem applies (= the sampling distribution approximates a normal distribution as the sample size gets larger, regardless of the population distribution) C. Incorrect. Since the central limit theorem applies, we do not need to worry about the skewed population distribution D. Incorrect. The sample size is large enough. The cut-off for the central limit theorem to apply is n ≥ 25
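The central limit theorem argument can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the seed, and the number of repetitions are assumptions chosen for the demo, and n = 40 mirrors the group size in the drug example. It compares the skewness of the raw population draws with the skewness of the sample means.

```python
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score of the values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population: exponential with mean 1
population_draws = [rng.expovariate(1.0) for _ in range(5000)]

# Sampling distribution of the mean for samples of size n = 40
n = 40
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(5000)
]

skew_population = skewness(population_draws)
skew_means = skewness(sample_means)
print(skew_population, skew_means)
```

The sample means should be far less skewed than the population itself, which is why the test can still be carried out.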
- 55. Answers Question Florian believes that a new Artificial Intelligence teaching method can influence student ratings compared to using human tutors. He is however unsure about what this influence can look like because, despite the AI’s greater efficiency, students might still prefer human interaction during their tutorials. Florian then takes a SRS of 27 students from a population of students with a mean rating of µ=30,2 and a standard deviation of σ=16. The sample of students take a lesson from the AI system and then give it a rating with a mean of 24,5. Can Florian conclude that the mean rating of the AI system is significantly different from the mean of the normal method? 54 A. Yes, we reject the null hypothesis with the p-value of 0.0322 B. Yes, we reject the null hypothesis with the p-value of 0.0644 C. No, we cannot reject the null hypothesis with the p-value of 0.0322 D. No, we cannot reject the null hypothesis with the p-value of 0.0644 Answer: D 3. Hypothesis Testing
- 56. 3E. Hypothesis Testing Question Florian believes that a new Artificial Intelligence teaching method can influence student ratings compared to using human tutors. He is however unsure about what this influence can look like because, despite the AI's greater efficiency, students might still prefer human interaction during their tutorials. Florian then takes a SRS of 27 students from a population of students with a mean rating of µ = 30.2 and a standard deviation of σ = 16. The sample of students take a lesson from the AI system and then give it a rating with a mean of 24.5. The significance level is 5%. Can Florian conclude that the mean rating of the AI system is significantly different from the mean of the normal method?
  Data: H0: µ = 30.2, Hα: µ ≠ 30.2 (2-tailed test); α = 0.05; µ = 30.2; σ = 16; n = 27; $\bar{x}$ = 24.5
  Solution
  • The sample size is large enough (n = 27), so we can continue with the test
  • We can use the Z formula to calculate Zobs: $Z_{obs} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{24.5 - 30.2}{16 / \sqrt{27}} = -1.85$
  • Using the Z-table, we see that Zobs = −1.85 corresponds to a one-tailed p-value of 0.0322
  • Since we have a 2-tailed test, we need to double the p-value: 0.0322 × 2 = 0.0644
  • We then compare the p-value to α: 0.0644 > 0.05
  • The p-value is larger than α, thus the null hypothesis cannot be rejected
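The same calculation can be sketched in Python with the standard library. The function name `z_test_two_sided` is made up for this illustration. Because the code does not round Z to two decimals before the lookup, its p-value differs slightly from the table-based 0.0644, but the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a z-test with known sigma."""
    se = sigma / sqrt(n)                 # standard error of the mean
    z = (sample_mean - mu0) / se
    p = 2 * NormalDist().cdf(-abs(z))    # double the tail area
    return z, p

z_obs, p_value = z_test_two_sided(24.5, 30.2, 16, 27)
print(round(z_obs, 2), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "cannot reject H0")
```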
- 57. Answers Question Suppose that for a two-sided test, an experimenter decides to have a significance level of 0.10. Which of the following statements is incorrect? 56 A. The Z-critical is going to be equal to ±1.65 B. The probability of a type 1 error is equal to 10% C. If the null hypothesis is rejected at this level, then it will also be rejected at α=0.05 D. With the current significance level, there is a lower probability of not rejecting a false null hypothesis compared to a significance level of 0.05 Answer: C 4. Hypothesis Testing
- 58. 4E. Hypothesis Testing Question Suppose that for a two-sided test, an experimenter decides to have a significance level of 0.10. Which of the following statements is incorrect? Solution A. Correct. In the case of a two-sided test with α = 10%, the Z-critical becomes ±1.65 B. Correct. The probability of a type 1 error is always equal to the significance level of the study • Type 1 error = α = 10% C. Incorrect. If the null hypothesis is rejected at α = 10%, it does not necessarily mean that it will be rejected at α = 5% • E.g., a p-value equal to 0.07 is smaller than 0.10, however it is not smaller than 0.05. Thus, the H0 would be rejected at α = 10% but not at α = 5% D. Correct. By increasing the significance level, we make the decision criterion more lenient, which makes a type 2 error less likely. However, we simultaneously increase the risk of a false positive, that is, rejecting a true null hypothesis (Figure: normal curve with 90% in the middle and 5% in each tail)
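A short sketch of the two ideas in this solution: the two-sided critical value for a given α, and the fact that rejecting at a lenient α does not imply rejecting at a stricter one. The helper names and the p-value of 0.07 are illustrative assumptions, not part of the exercise. The exact critical value for α = 0.10 is about 1.645, which Z-tables round to 1.65.

```python
from statistics import NormalDist

def z_critical_two_sided(alpha):
    """Positive critical z for a two-sided test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject(p_value, alpha):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha

print(round(z_critical_two_sided(0.10), 3))
print(round(z_critical_two_sided(0.05), 3))

p_example = 0.07  # hypothetical p-value
print(reject(p_example, 0.10), reject(p_example, 0.05))
```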
- 59. Answers Question A questionnaire has been constructed to measure the level of psychopathy for incarcerated individuals. The population is normally distributed with a mean of 44 and a standard deviation of 12. A researcher wants to check the hypothesis that the population mean is different, so she draws a SRS of 23 individuals. The sample mean is 53. What are the boundaries of a 90% confidence interval based on this specific sample? 58 A. [48.87, 57.13] B. [48.14, 56.90] C. [43.89, 54.96] D. [49.63, 52.47] Answer: A 5. Hypothesis Testing
- 60. 5E. Hypothesis Testing Question A questionnaire has been constructed to measure the level of psychopathy for incarcerated individuals. The population is normally distributed with a mean of 44 and a standard deviation of 12. A researcher wants to check the hypothesis that the population mean is different, so she draws a SRS of 23 individuals. The sample mean is 53. What are the boundaries of a 90% confidence interval based on this specific sample?
  Solution
  Data: H0: µ = 44, Hα: µ ≠ 44; µ = 44; σ = 12; n = 23; $\bar{X}$ = 53; Zc = 1.65 (because it is a 90% CI)
  • Formula: $\bar{X}_{obs} \pm Z_c \times \frac{\sigma}{\sqrt{n}} = 53 \pm 1.65 \times \frac{12}{\sqrt{23}}$
  • Lower bound: $53 - 1.65 \times \frac{12}{\sqrt{23}} = 53 - 1.65 \times 2.5 = 48.87$
  • Upper bound: $53 + 1.65 \times \frac{12}{\sqrt{23}} = 53 + 1.65 \times 2.5 = 57.13$
  • 90% CI: [48.87, 57.13]
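The interval can be reproduced with a few lines of Python. This is a minimal sketch; the function name is an assumption, and the critical value 1.65 is passed in explicitly so that the result matches the Z-table rounding used in the solution.

```python
from math import sqrt

def z_confidence_interval(sample_mean, sigma, n, z_crit):
    """Confidence interval for the mean when sigma is known."""
    margin = z_crit * sigma / sqrt(n)   # margin of error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(53, 12, 23, z_crit=1.65)
print(round(lower, 2), round(upper, 2))
```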
- 61. Answers Question Suppose we have a 95% Confidence Interval [37.2, 42.5]. Calculate the sample mean and the standard error. A. $\bar{X}$ = 40.05, SE = 3.39 B. $\bar{X}$ = 38.74, SE = 4.63 C. $\bar{X}$ = 39.85, SE = 1.35 D. $\bar{X}$ = 41.40, SE = 2.22 Answer: C 6. Hypothesis Testing
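- 62. 6E. Hypothesis Testing Question Suppose we have a 95% Confidence Interval [37.2, 42.5]. Calculate the sample mean and the standard error.
  Data: α = 5%; Zc = 1.96; CI [37.2, 42.5]; general form $\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$
  Sample Mean
  • Lower bound: $37.2 = \bar{x} - 1.96 \times \frac{\sigma}{\sqrt{n}}$, so $1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 37.2$
  • Upper bound: $42.5 = \bar{x} + 1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} + (\bar{x} - 37.2)$
  • $2\bar{x} = 42.5 + 37.2 = 79.7$, so $\bar{x} = \frac{79.7}{2} = 39.85$
  Standard Error
  • Confidence interval: $\bar{x}_{obs} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$
  • From the previous calculations we can see that $1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 37.2$
  • We already found the sample mean, so we can use it to calculate the fraction: $1.96 \times \frac{\sigma}{\sqrt{n}} = 39.85 - 37.2 = 2.65$, so $\frac{\sigma}{\sqrt{n}} = \frac{2.65}{1.96} = 1.35$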
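Since a z-based confidence interval is symmetric around the sample mean, both quantities can be recovered directly from the bounds. A minimal sketch (the function name is illustrative):

```python
def mean_and_se_from_ci(lower, upper, z_crit):
    """Recover the sample mean and standard error from a symmetric CI."""
    mean = (lower + upper) / 2             # midpoint of the interval
    se = (upper - lower) / (2 * z_crit)    # half-width divided by z
    return mean, se

mean, se = mean_and_se_from_ci(37.2, 42.5, z_crit=1.96)
print(round(mean, 2), round(se, 2))
```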
- 63. Answers Question Going back to the example of the previous question, what can be said about the null hypothesis, given that the population mean is equal to 36.05? 62 A. The null hypothesis is accepted B. The null hypothesis is rejected C. The null hypothesis cannot be rejected D. Nothing can be said about the null hypothesis with the current data Answer: B 7. Hypothesis Testing
- 64. 7E. Hypothesis Testing Question Going back to the example of the previous question, what can be said about the null hypothesis, given that the population mean is equal to 36.05? Solution A. Incorrect. When doing a hypothesis test, we can either reject the null hypothesis or fail to reject it, but we can never accept the null hypothesis. We cannot conclude that the null hypothesis is true merely because we did not find evidence to reject it B. Correct. For our 2-tailed test, the population mean (36.05) is not included in the 95% CI [37.2, 42.5], so the null hypothesis is rejected C. Incorrect. Since the population mean is not included in the confidence interval, the null hypothesis is rejected D. Incorrect. Statement B is correct.
- 65. 7E. Hypothesis Testing Confidence Interval • A confidence interval is an interval estimate of µ • It shows the range of values that the population mean probably falls between: $\bar{X} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$ Interpretation Example: 95% Confidence Interval • If we draw infinitely many samples and compute a confidence interval from each, then 95% of those CIs contain the population mean µ Hypothesis Testing • We can use the confidence interval to see if the null hypothesis is rejected or not for a two-tailed test • If the population mean from the null hypothesis is located inside the interval, then the null hypothesis cannot be rejected because the specific value is a possible population mean • If the population mean from the null hypothesis is not located inside the interval, the null hypothesis is rejected
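The repeated-sampling interpretation can be illustrated with a simulation. Everything numeric here (population mean 50, σ = 10, n = 25, 2000 repetitions, the seed) is a hypothetical choice for the demo. The share of intervals that contain µ should land close to 95%.

```python
import random
from math import sqrt
from statistics import fmean

rng = random.Random(42)                        # fixed seed for repeatability
mu, sigma, n, z_crit = 50.0, 10.0, 25, 1.96    # hypothetical population and design
se = sigma / sqrt(n)

repetitions = 2000
covered = 0
for _ in range(repetitions):
    # Draw one sample and build its 95% confidence interval
    sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
    if sample_mean - z_crit * se <= mu <= sample_mean + z_crit * se:
        covered += 1

coverage = covered / repetitions
print(coverage)
```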
- 66. Answers Question Tobias investigates the effects of participative leadership on satisfaction levels within employees.The sample mean is equal to 73.8. The boundaries of the 95% confidence interval are [71.4, 76.5]. Calculate the margin of error and the standard error. 65 A. ME = 5.7, SE = 1.22 B. ME = 2.4, SE = 1.22 C. ME = 2.9, SE = 3.91 D. ME = 2.4, SE = 4.75 Answer: B 8. Hypothesis Testing
- 67. 8E. Hypothesis Testing Question Tobias investigates the effects of participative leadership on satisfaction levels within employees. The sample mean is equal to 73.8. The boundaries of the 95% confidence interval are [71.4, 76.5]. Calculate the margin of error and the standard error.
  Solution
  Data: $\bar{X}$ = 73.8; 95% CI → [71.4, 76.5]; Zcritical = 1.96
  • Margin of error: from $\bar{X} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$ we have $\bar{X} - Z_c \times \frac{\sigma}{\sqrt{n}} = 71.4$, so $Z_c \times \frac{\sigma}{\sqrt{n}} = \bar{X} - 71.4 = 73.8 - 71.4 = 2.4$
  • Standard error: $Z_c \times \frac{\sigma}{\sqrt{n}} = 2.4$, so $\frac{\sigma}{\sqrt{n}} = \frac{2.4}{Z_c} = \frac{2.4}{1.96} = 1.22$
- 68. Answers Question Kian is the HR manager for Success Formula. He noticed that the employees are lately having more stress than usual, so he decides to evaluate their stress levels using a measurement scale (fewer points = less stress). On average, the 26 employees had a stress score of 83 with a standard deviation of 17. Kian then decided to implement a mindfulness program with the goal of reducing stress scores by 8 points. The significance level is 5%. What is the power of the test, given that the mindfulness program works as Kian was expecting? A. 0.7734 B. 0.2266 C. 0.6066 D. 0.7123 Answer: A 9. Hypothesis Testing
- 69. 9E. Hypothesis Testing Question Kian is the HR manager for Success Formula. He noticed that the employees are lately having more stress than usual, so he decides to evaluate their stress levels using a measurement scale (fewer points = less stress). On average, the 26 employees had a stress score of 83 with a standard deviation of 17. Kian then decided to implement a mindfulness program with the goal of reducing stress scores by 8 points. The significance level is 5%. What is the power of the test, given that the mindfulness program works as Kian was expecting?
  Data: H0: µ = 83, Hα: µ < 83; Zc = −1.65; α = 0.05; n = 26; σ = 17; µ = 83; µ(new) = 75
  Answer
  • Find the critical value: $Z_c = \frac{X_c - \mu}{\sigma / \sqrt{n}}$, so $-1.65 = \frac{X_c - 83}{17 / \sqrt{26}}$, which gives $-5.49 = X_c - 83 \Rightarrow X_c = 77.51$. H0 is rejected when the sample mean falls below 77.51
  • Solve for Z under the new mean: $Z = \frac{X_c - \mu(new)}{\sigma / \sqrt{n}} = \frac{77.51 - 75}{17 / \sqrt{26}} = 0.75$
  • Find β, the probability of not rejecting H0 when µ = 75. Using the Z-table, P(Z < 0.75) = 0.7734, so β = P(Z > 0.75) = 1 − 0.7734 = 0.2266
  • To calculate the power we use the formula Power = 1 − β: Power = 1 − 0.2266 = 0.7734
- 70. 9E. Hypothesis Testing Type II Error • Definition: We fail to reject a false null hypothesis • Measured by β • Calculation: find the critical value where H0 would be rejected: $Z_c = \frac{X_c - \mu_0}{\sigma / \sqrt{n}}$ → solve for $X_c$; then $Z = \frac{X_c - \mu_{new}}{\sigma / \sqrt{n}}$ → solve for Z, then look up the probability of not rejecting H0 (β) Power • Definition: The probability that we are able to reject a false null hypothesis • Calculation: Power = 1 − β Illustration (figure not reproduced)
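The two-step calculation above can be sketched for a left-tailed z-test as follows. The function name and the example numbers (µ0 = 100, true mean 94, σ = 15, n = 36) are hypothetical and chosen only to show the mechanics. For a left-tailed test the rejection region lies below the critical value, so power is the area below X_c under the alternative mean and β is the area above it.

```python
from math import sqrt
from statistics import NormalDist

def power_left_tailed_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Return (x_crit, beta, power) for a left-tailed z-test."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    # Step 1: critical sample mean under H0 (inv_cdf(alpha) is negative)
    x_crit = mu0 + std_normal.inv_cdf(alpha) * se
    # Step 2: probability of landing in the rejection region under mu_alt
    power = std_normal.cdf((x_crit - mu_alt) / se)
    return x_crit, 1 - power, power

x_crit, beta, power = power_left_tailed_z(mu0=100, mu_alt=94, sigma=15, n=36)
print(round(x_crit, 2), round(beta, 4), round(power, 4))
```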
- 71. Answers Question Suppose Micheal is conducting an experiment on fear conditioning. He uses a sample of 65 participants and a significance level of 5%. Before he begins, he wants to make sure that the probability of rejecting a true null hypothesis is as small as possible. Which of the following statements is correct? 70 A. He should increase his sample size B. He should increase the effect size C. He should increase the significance level D. None of the above Answer: D 10. Hypothesis Testing
- 72. 10E. Hypothesis Testing Question Suppose Micheal is conducting an experiment on fear conditioning. He uses a sample of 65 participants and a significance level of 5%. Before he begins, he wants to make sure that the probability of rejecting a true null hypothesis is as small as possible. Which of the following statements is correct? Solution A. Incorrect. By increasing the sample size, we decrease the standard error and thus the probability of not rejecting a false null hypothesis (Type II error) B. Incorrect. Increasing the effect size is difficult in real life since researchers do not have any control over it. Theoretically, the higher the effect size, the lower the probability of failing to reject a false null hypothesis (Type II error) C. Incorrect. By increasing the significance level, it becomes easier to reject a null hypothesis. We increase the probability of rejecting a true H0 (Type I error) D. Correct. None of the above alternatives is correct. Rejecting a true null hypothesis is the Type I error and its probability is measured by α. We can reduce this probability by reducing α, but this increases the probability of a Type II error (not recommended)
- 73. Stats1 - Question Pool T-tests
- 74. Answers Question A randomly drawn sample of 60 university students undergo exam training. Before the training, their mean score on a practice exam was 68. After the training, their mean score improved by 7 points. What (t-)test would you employ to check if the exam training had a significant effect? 73 A. One-sample t-test B. Paired samples t-test C. Independent samples t-test D. Two-sample t-test Answer: B 1. T-tests
- 75. 1E. T-tests Question A randomly drawn sample of 60 university students undergo exam training. Before the training, their mean score on a practice exam was 68. After the training, their mean score improved by 7 points. What (t-)test would you employ to check if the exam training had a significant effect? Solution A. Incorrect, we compare two dependent samples, not one sample against the population. B. Correct, the groups are paired since we test the same sample twice (before and after exam training). C. Incorrect, the two groups are not independent, they are dependent. D. Incorrect, a two-sample t-test is an independent t-test. The groups were dependent, not independent.
- 76. Answers Question When testing a null hypothesis about a single population mean, a t-test is usually performed rather than a z-test. A t-test is more likely to be employed because… 75 A. A t-test has more power than a z-test, leading to a more reliable result. B. Quantitative variables can only be analysed with t-tests. C. Z-tests are more prone to type I errors, which are to be avoided. D. In practice, the standard deviation of a population is rarely known. Answer: D 2. T-tests
- 77. 2E. T-tests T-tests When to use a t-test? When we cannot use z-scores because σ (the population standard deviation) is unknown • We have to estimate both parameters • We use an extra estimate (Sx) in place of σ • The t-distribution is more dispersed than the z-distribution • A t-test is less powerful than a z-test Z-tests A z-test measures how many standard errors the sample mean ($\bar{X}$) differs from the hypothesized value of the population mean (µ) • Makes use of the z-distribution • More powerful than a t-test • Most of the time it cannot be used, since in reality we rarely know the parameters of the population
- 78. Answers Question A researcher is interested in the effect of wearing red lipstick on the score at minigolf. They ask 40 people to wear red lipstick while playing 18 holes on the minigolf course. 70 people played the same 18 holes without wearing red lipstick. The dependent variable is the obtained score after the 18 holes (a lower score is considered to be better). The red lipstick condition had a mean score of 47.5 and a standard deviation of 4.3. The no-red lipstick condition had a mean score of 62 and a standard deviation of 9.2. Which test should the researcher use to test the null hypothesis that the score at minigolf is not affected by wearing red lipstick? A. An independent samples t-test, assuming unequal population variances. B. An independent samples t-test, assuming equal population variances. C. A paired samples t-test. D. A one-sample t-test. Answer: A 3. T-tests
- 79. 3E. T-tests Question A researcher is interested in the effect of wearing red lipstick on the score at minigolf. They ask 40 people to wear red lipstick while playing 18 holes on the minigolf course. 70 people played the same 18 holes without wearing red lipstick. The dependent variable is the obtained score after the 18 holes (a lower score is considered to be better). The red lipstick condition had a mean score of 47.5 and a standard deviation of 4.3. The no-red lipstick condition had a mean score of 62 and a standard deviation of 9.2. Which test should the researcher use to test the null hypothesis that the score at minigolf is not affected by wearing red lipstick? Solution A. Correct. The 2 groups are independent, and we compare their samples. The goal of the test is to check if the 2 samples come from populations with equal means. We see that the rule of thumb (Smallest SD × 2 > Bigger SD) does not hold (4.3 × 2 = 8.6 < 9.2) and the groups do not have equal sample sizes. This means we have to do the t-test without assuming equal variances B. Incorrect. We cannot assume equal variances because the rule of thumb is violated and the group sizes are not equal C. Incorrect. A paired samples t-test requires matched groups or a within-subject design. D. Incorrect. A one-sample t-test is used when we have 1 population and want to check if its mean is equal to a specific value.
- 80. 3E. T-tests
  | Assumption | T-Test Concerned | How to Determine | What if Violated |
  |---|---|---|---|
  | Normality | All T-Tests | 1. Histogram of sample scores looks normal 2. Sample size is large (Central Limit Theorem) | Can't do T-test |
  | Quantitative | All T-Tests | Dependent variable is quantitative | Can't do T-test |
  | Dependent Groups | Paired T-Test | The groups are matched | Two-Sample T-test |
  | Independent Groups | Two-Sample T-Test | Two separate groups are measured | Paired T-test |
  | Equal Variance | Two-Sample T-Test | 1. One sample SD is not 2x bigger than the other (Rule of Thumb) 2. Levene's Test is not significant 3. The sample sizes are equal | Two-Sample T-test not assuming equal variance has to be used → less powerful |
- 81. Answers Question The effect of Ritalin on test performance is tested. 31 participants received a Ritalin pill while another 31 participants received a placebo. The test performance is assumed to be good if the score on the test is high. The null hypothesis is that exam performance is the same both under Ritalin and placebo, while the alternative hypothesis is that Ritalin leads to better test performance. The table below presents the group statistics, computed by SPSS (equal variances assumed). What statement is incorrect? A. The means of the two populations are very similar. However, a visual inspection of the group statistics is not enough to reject the null hypothesis. B. The equal variances assumption is violated, thus we should not interpret the test C. The equal variances assumption is not violated, thus we can interpret the test D. During the t-test, we should compute the weighted average of the two standard deviations Answer: B 4. T-tests
  Group statistics (Test score)
  | Condition | N | Mean | Std. Deviation | Std. Error Mean |
  |---|---|---|---|---|
  | placebo | 31 | 10.1182 | 1.9463 | .1699 |
  | Ritalin | 31 | 10.9374 | 2.2824 | .4099 |
- 82. 4E. T-tests Question The effect of Ritalin on test performance is tested. 31 participants received a Ritalin pill while another 31 participants received a placebo. The test performance is assumed to be good if the score on the test is high. The null hypothesis is that exam performance is the same both under Ritalin and placebo, while the alternative hypothesis is that Ritalin leads to better test performance. The table below presents the group statistics, computed by SPSS. What statement is incorrect? Solution A. Correct. Sample means are random variables, meaning they change depending on the sample. Thus, in order to draw conclusions about the populations, we need to test whether the difference between the means is indeed significant. B. Incorrect. The equal variances assumption is not violated. We can check this using the rule of thumb (biggest SD < smallest SD × 2): 2.2824 < 1.9463 × 2 = 3.8926 C. Correct. Using the rule of thumb, we can see that the smallest SD multiplied by 2 is bigger than the bigger SD (Ritalin group), thus the assumption is not violated D. Correct. Since the equal variances assumption is not violated, the 2 standard deviations estimate the same population standard deviation. By computing their weighted average (pooled SD), we have the best estimate of σ
  Group statistics (Test score)
  | Condition | N | Mean | Std. Deviation | Std. Error Mean |
  |---|---|---|---|---|
  | placebo | 31 | 10.1182 | 1.9463 | .1699 |
  | Ritalin | 31 | 10.9374 | 2.2824 | .4099 |
- 83. 4E. T-test Checking Equal Variances Assumption We can check the assumption in 2 ways: 1. Rule of Thumb: the smaller SD × 2 should be larger than the bigger SD 2. Levene's Test: if the test is significant, the variances are unequal (H0: $\sigma_1^2 = \sigma_2^2$) Violation of Assumption If this assumption is violated, we can continue with the t-test if the sample sizes of both samples are approximately equal Special case If there is a violation AND the samples differ in size, we can do the t-test, but only with the following formula: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, where $\mu_1 - \mu_2 = 0$ under H0: $\mu_1 = \mu_2$
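The rule of thumb is easy to express as a one-line check. The function name below is made up for illustration; the example values are the standard deviations from the lipstick, Ritalin, ice cream, and anxiety questions in this pool.

```python
def equal_variances_plausible(sd_a, sd_b):
    """Rule of thumb: the smaller SD times 2 must exceed the bigger SD."""
    smaller, larger = sorted((sd_a, sd_b))
    return 2 * smaller > larger

print(equal_variances_plausible(4.3, 9.2))        # lipstick question
print(equal_variances_plausible(1.9463, 2.2824))  # Ritalin question
print(equal_variances_plausible(4.2, 1.9))        # ice cream question
print(equal_variances_plausible(2.61, 3.38))      # anxiety question
```

This covers only the rule of thumb. Levene's Test needs the raw scores and is not sketched here.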
- 84. Answers Question Natalia is a memory researcher and as part of her pilot study, she wishes to test the differences in memory recall between severe anxiety patients and controls. She suspects that anxiety patients will have different memory recall scores compared to controls. After a memory test, she compares the scores of the groups. The anxiety group has a mean of 12.6 and a standard deviation of 3.38. The control group has a mean of 13.4 and a standard deviation of 2.61. There are 70 participants in total, equally divided into the 2 groups. What can Natalia conclude about the null hypothesis? A. The null hypothesis is not rejected with 0.10 ≤ p-value ≤ 0.15 B. The null hypothesis is rejected with 0.01 ≤ p-value ≤ 0.05 C. The null hypothesis is not rejected with 0.20 ≤ p-value ≤ 0.30 D. The null hypothesis is rejected with 0.02 ≤ p-value ≤ 0.025 Answer: C 5. T-tests
- 85. 5E. T-tests Question Natalia is a memory researcher and as part of her pilot study, she wishes to test the differences in memory recall between severe anxiety patients and controls. She suspects that anxiety patients will have different memory recall scores compared to controls. After a memory test, she compares the scores of the groups. The control group has a mean of 13.4 and a standard deviation of 2.61. The anxiety group has a mean of 12.6 and a standard deviation of 3.38. There are 70 participants in total, equally divided into the 2 groups. What can Natalia conclude about the null hypothesis?
  Data: H0: µ1 = µ2, Hα: µ1 ≠ µ2; n1 = n2 = 35; $\bar{X}_1$ = 13.4, $\bar{X}_2$ = 12.6; S1 = 2.61, S2 = 3.38
  Solution
  • Check the equal variances assumption with the rule of thumb (Bigger SD < Smallest SD × 2): 3.38 < 2.61 × 2, i.e. 3.38 < 5.22 (true) → equal variances assumed
  • Since equal variances are assumed, we need to calculate the pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}} = \sqrt{\frac{34 \times 2.61^2 + 34 \times 3.38^2}{34 + 34}} = 3.02$
  • Next, we need to calculate the Tobs: $T = \frac{\bar{X}_1 - \bar{X}_2}{s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{13.4 - 12.6}{3.02 \times \sqrt{\frac{1}{35} + \frac{1}{35}}} = \frac{0.8}{3.02 \times 0.239} = 1.11$
  • Using the t-table, we see that the one-tailed p-value is between 0.10 and 0.15. For a 2-tailed test, we need to double these values: 0.20 ≤ p-value ≤ 0.30, so the null hypothesis is not rejected
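The pooled standard deviation and Tobs from this solution can be reproduced with a short sketch. The function names are illustrative. The p-value lookup is left to the t-table, since the Python standard library has no t-distribution.

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """Weighted average of two sample SDs (equal variances assumed)."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1)))

def pooled_t(mean1, mean2, s1, n1, s2, n2):
    """Independent samples t statistic with the pooled SD."""
    sp = pooled_sd(s1, n1, s2, n2)
    return (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))

sp = pooled_sd(2.61, 35, 3.38, 35)
t_obs = pooled_t(13.4, 12.6, 2.61, 35, 3.38, 35)
print(round(sp, 2), round(t_obs, 2))
```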
- 86. Answers Question An ice cream company has two new potential flavours ready for the market. They developed a tastiness scale scored from 0 to 30. 40 volunteers tasted flavour A and another 25 volunteers tasted flavour B. The obtained values are: $\bar{X}_A$ = 22.8, $\bar{X}_B$ = 28, $S_A$ = 4.2 and $S_B$ = 1.9. What is the 90% Confidence Interval corresponding to this t-test? A. [-6.52, -3.88] B. [-6.34, -4.59] C. [-6.50, -4.0] D. [-7.29, -3.91] Answer: A 6. T-tests
- 87. 6E. T-tests Question An ice cream company has two new potential flavours ready for the market. They developed a tastiness scale scored from 0 to 30. 40 volunteers tasted flavour A and another 25 volunteers tasted flavour B. The obtained values are: $\bar{X}_A$ = 22.8, $\bar{X}_B$ = 28, $S_A$ = 4.2 and $S_B$ = 1.9. What is the 90% Confidence Interval corresponding to this t-test?
  Data: nA = 40; nB = 25; $\bar{X}_A$ = 22.8; $\bar{X}_B$ = 28; $S_A$ = 4.2; $S_B$ = 1.9
  Solution
  • We are dealing with 2 independent groups, thus we should use an independent samples t-test
  • We have to decide if the assumption of equal variances is violated, in order to use the correct formulas. Rule of thumb (Smallest SD × 2 > Bigger SD): 1.9 × 2 > 4.2, i.e. 3.8 > 4.2 (not true)
  • The equal variances assumption is violated, thus we use the special case of the t-test, with df = smallest n − 1 = 24 and a critical value of 1.711 for a 90% CI
  • $(\bar{X}_A - \bar{X}_B) \pm t_c \times \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = (22.8 - 28) \pm 1.711 \times \sqrt{\frac{4.2^2}{40} + \frac{1.9^2}{25}} = -5.2 \pm 1.711 \times \sqrt{0.5854} = -5.2 \pm 1.711 \times 0.77 = -5.2 \pm 1.32$
  • CI: [−6.52, −3.88]
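- 88. 6E. T-tests Confidence Interval: General Formula
  • Observed $\bar{X} \pm t_c \times$ Standard Error
  Example: Two-Sample T-Test. N = 20 (both conditions), $\bar{X}_A$ = −2.1, $\bar{X}_B$ = −3.5, $s_A$ = 2.05 and $s_B$ = 1.89. What is the 95% CI?
  • $(\bar{X}_A - \bar{X}_B) \pm t_c \times \left(s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$
  • $s_p = \sqrt{\frac{19 \times 2.05^2 + 19 \times 1.89^2}{19 + 19}} = 1.97$
  • $1.4 \pm 2.04 \times \left(1.97 \times \sqrt{\frac{1}{20} + \frac{1}{20}}\right) = [0.13, 2.67]$
  Standard Errors of the Different T-Tests
  • One-Sample T-test: $\frac{s}{\sqrt{n}}$
  • Paired Sample T-test: $\frac{s_d}{\sqrt{n}}$
  • Two-Sample T-test: $s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, with pooled standard deviation $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$
  • Two-Sample T-test, equal variance not assumed: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$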
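The two-sample example on this slide can be checked with the sketch below. Function names are illustrative, and the critical value 2.04 is taken from the slide and passed in as a parameter rather than computed. The unpooled standard error is included for comparison and evaluated on the ice cream data from question 6.

```python
from math import sqrt

def pooled_se(s1, n1, s2, n2):
    """Standard error of the mean difference, equal variances assumed."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1)))
    return sp * sqrt(1 / n1 + 1 / n2)

def unpooled_se(s1, n1, s2, n2):
    """Standard error of the mean difference, equal variances not assumed."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def confidence_interval(diff, se, t_crit):
    """Observed difference plus/minus critical value times standard error."""
    return diff - t_crit * se, diff + t_crit * se

lower, upper = confidence_interval(
    -2.1 - (-3.5), pooled_se(2.05, 20, 1.89, 20), t_crit=2.04
)
print(round(lower, 2), round(upper, 2))
print(round(unpooled_se(4.2, 40, 1.9, 25), 2))
```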
- 89. Answers Question Suppose we are testing the null hypothesis that the population mean is equal to a specific value and the test is right sided. Refer to the SPSS output. Which of the following statements is correct? A. The null hypothesis is rejected for a significance level of 2.5% B. The null hypothesis is not rejected for a significance level of 5% C. The degrees of freedom were found by taking the smallest sample size and subtracting 1 D. None of the alternatives is correct Answer: A 7. T-tests
  One-Sample Test (Test Value = 570)
  | | t | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper |
  |---|---|---|---|---|---|---|
  | Test score | 2.139 | 29 | 0.041 | 20.333 | 0.89 | 39.77 |
- 90. 7E. T-tests Question Suppose we are testing the null hypothesis that the population mean is equal to a specific value and the test is right sided. Refer to the SPSS output. Which of the following statements is correct? Solution A. Correct. The SPSS output gives the p-value for a two-sided test. However, we have a one-tailed test (a right-sided test means that the alternative hypothesis has the (>) symbol). Since the observed t is positive, it lies in the direction of the alternative, so we divide the p-value by two (0.041/2 = 0.0205). We can now see that the corrected p-value is smaller than 0.025, thus the H0 is rejected at α = 2.5% B. Incorrect. The corrected p-value is smaller than 0.05 as well. Thus, the H0 is rejected at α = 5% as well. C. Incorrect. Since we have a one-sample t-test, the formula for the degrees of freedom is N − 1. It is for an independent samples t-test not assuming equal variances that we take the smallest n and subtract 1 for the df D. Incorrect. A is the correct one
  One-Sample Test (Test Value = 570)
  | | t | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper |
  |---|---|---|---|---|---|---|
  | Test score | 2.139 | 29 | 0.041 | 20.333 | 0.89 | 39.77 |
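Converting a two-tailed SPSS p-value into a one-tailed one depends on whether the observed t points in the direction of the alternative hypothesis. The helper below is a hypothetical sketch of that rule, with "greater" standing for a right-sided alternative and "less" for a left-sided one.

```python
def one_tailed_p(p_two_tailed, t_obs, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative: "greater" (right-sided) or "less" (left-sided).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_direction = t_obs > 0 if alternative == "greater" else t_obs < 0
    # Halve the p-value only when t lies on the side of the alternative
    return p_two_tailed / 2 if in_direction else 1 - p_two_tailed / 2

p_right = one_tailed_p(0.041, 2.139, "greater")
print(p_right, p_right < 0.025)
```

With the same output but a left-sided alternative, the one-tailed p-value would be 1 − 0.0205 = 0.9795, and H0 would not be rejected.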
- 91. Answers Question A researcher wants to test whether ethnic background influences IQ scores of Dutch primary school children. They draw a sample of 50 children with grandparents of Turkish origin and another 50 children with Dutch grandparents. Each child of Turkish descent is matched for age and sex with a Dutch one. The group data are summarized in the table below. A paired sample t-test was used to test this hypothesis. Which of the following tests could have yielded the same result? A. An independent t-test, assuming equal population variances. B. An independent t-test, assuming unequal population variances. C. A one-sample t-test, conducted for the difference in IQ score between matched children. D. None of the answers above. Answer: C 8. T-tests
  | | Mean | N | Std. Deviation | Std. Error Mean |
  |---|---|---|---|---|
  | Turkish | 98.657 | 50 | 10.0023 | 1.6523 |
  | Dutch | 103.203 | 50 | 14.5602 | 2.2436 |
- 92. 8E. T-tests Question A researcher wants to test whether ethnic background influences IQ scores of Dutch primary school children. They draw a sample of 50 children with grandparents of Turkish origin and another 50 children with Dutch grandparents. Each child of Turkish descent is matched for age and sex with a Dutch one. The group data are summarized in the table below. A paired sample t-test was used to test this hypothesis. Which of the following tests could have yielded the same result? Solution A. Incorrect, the two groups are matched, so they are dependent, not independent. B. Incorrect, the two groups are matched, so they are dependent, not independent. C. Correct, a paired samples t-test checks whether the mean of the differences within the matched pairs is zero. The 2 tests have the same calculations, thus if one computes the difference for each pair and then performs a one-sample t-test on the differences, they would get the same result. D. Incorrect. The answer is C
  | | Mean | N | Std. Deviation | Std. Error Mean |
  |---|---|---|---|---|
  | Turkish | 98.657 | 50 | 10.0023 | 1.6523 |
  | Dutch | 103.203 | 50 | 14.5602 | 2.2436 |
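The equivalence in answer C can be made concrete in code: a paired t statistic is a one-sample t statistic computed on the pairwise differences, tested against 0. The scores below are invented purely for illustration, because the raw IQ data are not given in the question.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0=0.0):
    """t = (sample mean - mu0) / (s / sqrt(n))."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))

def paired_t(first, second):
    """Paired t statistic: a one-sample t-test on the pairwise differences."""
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, mu0=0.0)

# Hypothetical matched scores (not from the slides)
group_1 = [10, 12, 9, 14, 11]
group_2 = [8, 11, 9, 10, 9]
t_paired = paired_t(group_1, group_2)
print(round(t_paired, 4))
```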
- 93. Answers Question Inspect the given output. Which answer is correct? A. Levene's Test is not significant, therefore equal variances can be assumed. B. The Tobs is equal to -2.845 C. According to the t-table, the null hypothesis is rejected D. All answers are correct. Answer: D 9. T-tests (SPSS output table not reproduced)
- 94. 9E. T-tests Question Inspect the given output. Which answer is correct? Solution A. Correct. Levene's Test has the null hypothesis that the population variances are equal ($\sigma_1^2 = \sigma_2^2$). Since we can see that the p-value is a lot larger than 0.05 (p-value = 0.582), we can say that the null hypothesis is not rejected and that there is no violation of the equal variances assumption B. Correct. We can calculate the Tobs by dividing the mean difference ($\bar{x}_1 - \bar{x}_2 = -14.00$) by the Std. Error difference ($s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.92$). This will give us -2.845 C. Correct. The null hypothesis in this case is rejected because the value 0 is not located in the 95% CI, meaning that the population difference between the 2 groups cannot be 0 (SPSS output table not reproduced)
- 95. Answers Question Florian is the GM of Success Formula and has recently heard that colour can influence learning performances and outcomes. He was informed that research has shown that the colour blue leads to better performances in tests and better recall. The classes at SF however are painted in white. Florian decides to test if indeed the colour blue leads to better results compared to white. He gathers 38 students and assigns them to 2 groups. The groups are matched together in regards to skill, age, motivation and more. One group takes the class in a room painted white, while the second group in a room painted blue. The test score means afterwards are compared. The population distribution of difference scores is normal. Florian gets the following SPSS output. Which statement is correct? 10. T-tests
  Paired Differences
  | | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | T | df | Sig. (2-tailed) |
  |---|---|---|---|---|---|---|---|---|
  | Pair 1. White - Blue | -.579 | 2.524 | .579 | -1.795 | .637 | -1.000 | 18 | .331 |
- 96. Answers Question Florian is the GM of Success Formula and has recently heard that colour can influence learning performances and outcomes. He was informed that research has shown that the colour blue leads to better performances in tests and better recall. The classes at SF however are painted in white. Florian decides to test if indeed the colour blue leads to better results compared to white. He gathers 38 students and assigns them to 2 groups. The groups are matched together in regards to skill, age, motivation and more. One group takes the class in a room painted white, while the second group in a room painted blue. The test score means afterwards are compared. The population distribution of difference scores is normal. Florian gets the following SPSS output. Which statement is correct? 95 A. There is a probability of 0.331 that the H0 is true B. The researcher might be making a Type I error C. The researcher might be making a Type II error. D. Since the TOBS is not located within the 95% CI, the null hypothesis can be rejected Answer: C 10. T-tests
- 97. 10E. T-tests Question Florian is the GM of Success Formula and has recently heard that colour can influence learning performances and outcomes. He was informed that research has shown that the colour blue leads to better performances in tests and better recall. The classes at SF however are painted in white. Florian decides to test if indeed the colour blue leads to better results compared to white. He gathers 38 students and assigns them to 2 groups. The groups are matched together in regards to skill, age, motivation and more. One group takes the class in a room painted white, while the second group in a room painted blue. The test score means afterwards are compared. The population distribution of difference scores is normal. Florian gets the following SPSS output (next slide). Which statement is correct? Solution A. Incorrect. The p-value is 0.331 and it is defined as the probability that our data (or more extreme data) would have occurred, given that the null hypothesis is true. The p-value does not give the probability that H0 is true. It is a conditional probability, with the condition that H0 is true. B. Incorrect. A Type 1 error is defined as rejecting a true null hypothesis. However, our p-value is larger than 0.05, thus we did not reject the null hypothesis in the first place. The probability that we are making a Type 1 error in this case is 0%. C. Correct. A Type 2 error is defined as not rejecting a false null hypothesis. Since the p-value is larger than our significance level, we did not reject H0, but there is always the chance that H0 is in fact false and we made an error. D. Incorrect. When using the CI to decide whether H0 is rejected for a paired samples t-test, we need to see if the value 0 is located in the interval, not the Tobs. This is because the null hypothesis states that there is no difference.
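The reasoning in answer D can be made concrete by rebuilding the standard error, the t statistic and the 95% CI from the values in the SPSS output. This is a sketch, not the SPSS procedure itself: the number of pairs (19, from 38 matched students and df = 18) and the critical value 2.101 (two-sided, df = 18, read from a t-table) are inputs assumed here.

```python
import math

# Values from the paired-samples SPSS output (White - Blue).
mean_diff = -0.579
sd_diff = 2.524
n_pairs = 19            # assumed: 38 matched students form 19 pairs, df = 18

# Assumed table value: two-sided 5% critical t for df = 18.
T_CRIT_DF18 = 2.101

se_mean = sd_diff / math.sqrt(n_pairs)       # Std. Error Mean
t_obs = mean_diff / se_mean                  # observed t
ci_lower = mean_diff - T_CRIT_DF18 * se_mean
ci_upper = mean_diff + T_CRIT_DF18 * se_mean

# H0 (no difference) is rejected only if 0 lies outside the interval.
reject_h0 = not (ci_lower <= 0 <= ci_upper)
print(f"SE = {se_mean:.3f}, t = {t_obs:.2f}, "
      f"CI = [{ci_lower:.3f}, {ci_upper:.3f}], reject H0: {reject_h0}")
```

Because 0 lies inside the interval, H0 is not rejected, which is why only a Type 2 error remains possible.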
- 98. 10E. T-tests Type I Error The null hypothesis is true but we reject it. → Measured by α. Graphical Illustration Type II Error The null hypothesis is false but we fail to reject it. → Measured by β.
- 99. Stats I – Question Pool ANOVA
- 100. Answers Question ANOVA assumes the following statistical model: 𝑌𝑖𝑗 = 𝜇 + 𝛼𝑖 + 𝜀𝑖𝑗, in which Yij denotes the score of person j in group i. Choose the incorrect statement from below: A. µ1= Yij - 𝜀𝑖𝑗 represents the mean of group 1 B. εij has a different value for each individual participant, regardless of treatment effects. C. µ is a variable effect, specific to each participant. D. If there is no treatment effect, αi is equal among all participants. Answer: C 1. ANOVA
- 101. 1E. ANOVA Question ANOVA assumes the following statistical model: 𝑌𝑖𝑗 = 𝜇 + 𝛼𝑖 + 𝜀𝑖𝑗, in which Yij denotes the score of person j in group i. Choose the incorrect statement from below: Solution A. Correct. The difference between an individual score and the group mean indicates the unexplained variation caused by factors that are not controlled. It can be written as 𝜀𝑖𝑗 = Yij − 𝜇𝑖 ⇔ 𝜇𝑖 = Yij − 𝜀𝑖𝑗 B. Correct. Individual differences are uncontrollable factors that make the scores of participants within the same group diverge. For each participant, regardless of the treatment effects, the individual differences/residual factors are different. C. Incorrect, µ is a constant effect. It refers to the factors that are the same in all conditions. It stays the same for each subject. D. Correct, if there is no treatment effect, 𝛂𝐢 (for all participants) = 0.
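- 102. 1E. ANOVA Main Formula: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $Y_{ij}$ is the score of participant j in group i, $\mu$ is the constant effect, $\alpha_i$ is the effect of group i, and $\varepsilon_{ij}$ is the effect of the remaining factors of participant j in group i (error). Sum of Squares: $\sum_i \sum_j (Y_{ij} - \bar{Y})^2 = \sum_i n_i (\bar{Y}_i - \bar{Y})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$, that is, Total sum of squares (TSS) = Between group sum of squares (SSG) + Within group sum of squares (SSE).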
- 103. 1E. ANOVA Example: 3 different conditions (G1, G2, G3) with 3 participants each (P1, P2, P3). Data: G1 = 1, 2, 3; G2 = 3, 4, 5; G3 = 5, 6, 7. Preparation: What is the mean of each group? $\bar{Y}_1 = (1+2+3)/3 = 2$, $\bar{Y}_2 = (3+4+5)/3 = 4$, $\bar{Y}_3 = (5+6+7)/3 = 6$. What is the total mean? $\bar{Y} = (2+4+6)/3 = 4$. SSG (Between Groups): $SSG = \sum_i n_i (\bar{Y}_i - \bar{Y})^2 = 3(2-4)^2 + 3(4-4)^2 + 3(6-4)^2 = 24$. Tip: alternative notation $\alpha_i = \mu_i - \mu$. Here $\mu_i = \bar{Y}_i$ (mean of a single group) and $\mu = \bar{Y}$ (total mean). SSW (Within Groups): $SSW = \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-4)^2 + \dots + (7-6)^2 = 6$. Tip: alternative notation $\varepsilon_{ij} = Y_{ij} - \mu_i$. Here $\mu_i$ is the same as $\bar{Y}_i$. Both describe the mean of a single group.
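The worked example can be reproduced step by step. The sketch below uses the nine scores from the example table (G1 = 1, 2, 3; G2 = 3, 4, 5; G3 = 5, 6, 7) and also confirms that the total sum of squares equals SSG + SSW; the variable names are illustrative only.

```python
# Scores from the worked example: 3 groups with 3 participants each.
groups = {
    "G1": [1, 2, 3],
    "G2": [3, 4, 5],
    "G3": [5, 6, 7],
}

all_scores = [y for scores in groups.values() for y in scores]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {name: sum(scores) / len(scores) for name, scores in groups.items()}

# Between-group sum of squares: n_i * (group mean - grand mean)^2, summed over groups.
ssg = sum(len(scores) * (group_means[name] - grand_mean) ** 2
          for name, scores in groups.items())

# Within-group sum of squares: squared deviations from each score's own group mean.
ssw = sum((y - group_means[name]) ** 2
          for name, scores in groups.items() for y in scores)

# Total sum of squares: squared deviations from the grand mean.
tss = sum((y - grand_mean) ** 2 for y in all_scores)

print(ssg, ssw, tss)  # 24.0 6.0 30.0
```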
- 104. Answers Question Participants were asked to memorise a list of words. They were divided into several groups, each using a different memorization technique. 60 minutes later, the experimenter assessed how many words they could still remember (the dependent variable RECALL in the output). Which statement is correct? A. The experimental setting had 3 conditions. B. The total variance equals 4.91 C. The ANOVA test is significant (𝛂= 5%). D. All answers are correct. Answer: D 2. ANOVA Output values: Sum of Squares: Between Groups 41.566, Within Groups 41.850, Total 83.416; Mean Square: Between Groups 20.783, Within Groups 2.790
- 105. 2E. ANOVA Question Participants were asked to memorise a list of words. They were divided into several groups, each using a different memorization technique. 60 minutes later, the experimenter assessed how many words they could still remember (the dependent variable RECALL in the output). Which statement is correct? Solution A. Correct. The degrees of freedom between groups is given by the formula 𝑘 − 1. → Degrees of freedom for “between groups” is equal to “number of groups minus 1” (k-1). In our case we had 3 conditions so df = (3-1) = 2 B. Correct. The total variance can be found by the formula $MS_{total} = \frac{SST}{df_T} = \frac{83.416}{17} = 4.91$ C. Correct. The ANOVA SPSS output has a p-value of 0.006 for an F = 7.447. The p-value is smaller than the significance level of 5%, thus the test is significant. D. Yes, they are all correct. Output values: Sum of Squares: Between Groups 41.566, Within Groups 41.850, Total 83.416; Mean Square: Between Groups 20.783, Within Groups 2.790
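As a cross-check of answers A to C, the rest of the ANOVA table can be rebuilt from the two sums of squares. This is a sketch under the assumptions stated in the solution (k = 3 conditions and a total df of 17); the F value comes out near 7.45, in line with the 7.447 that SPSS reports from unrounded values.

```python
# Values read from the ANOVA output in this question.
ss_between = 41.566
ss_within = 41.850
k = 3            # number of conditions (from the solution)
df_total = 17    # N - 1, as used in the total-variance formula

df_between = k - 1
df_within = df_total - df_between
ss_total = ss_between + ss_within

ms_between = ss_between / df_between   # MSG
ms_within = ss_within / df_within      # MSE
ms_total = ss_total / df_total         # the "total variance"
f_obs = ms_between / ms_within

print(f"MSG = {ms_between:.3f}, MSE = {ms_within:.3f}, "
      f"total variance = {ms_total:.2f}, F = {f_obs:.2f}")
```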
- 106. Answers Question A sample of n= 35 participants was randomly selected from UM students pool. A baseline assessment rated their arachnophobia. After undergoing 2 sessions of exposure therapy (to spiders), their arachnophobia was measured again with the same scale. The researcher wants to see if the 2 sessions of exposure therapy had a significant effect. Should an ANOVA test be performed on this data set? 105 A. Yes, the normality assumptions hold since the sample size is big enough. B. Yes, the equal variances assumptions is met because 35 participants were tested both times. C. No, The independence assumption is violated. D. Yes, the data is quantitative as their phobia is rated on scale. Answer: C 3. ANOVA
- 107. 3E. ANOVA Question A sample of n= 35 participants was randomly selected from UM students pool. A baseline assessment rated their arachnophobia. After undergoing 2 sessions of exposure therapy (to spiders), their arachnophobia was measured again with the same scale. The researcher wants to see if the 2 sessions of exposure therapy had a significant effect. Should an ANOVA test be performed on this data set? Solution A. The statement about normality is correct, but the main requirement for an ANOVA, independent groups, is violated. Thus, an ANOVA is not the suitable test here. B. Incorrect, the same sample is tested twice (baseline and after exposure). We are not comparing independent groups. C. Correct, the same sample is tested twice (baseline and after exposure). We are not comparing independent groups. D. The statement about the data is correct, but the main requirement for an ANOVA, independent groups, is violated. Thus, an ANOVA is not the suitable test here.
- 108. Answers Question An experiment on the effect of listening to music on information retention is performed. A total sample of 75 is divided into three equally large groups. All three groups are asked to memorize a list of words while either (a) listening to Vivaldi, (b) listening to AC/DC, or (c) listening to crickets singing. An analysis of variance is performed. It is concluded that the null hypothesis cannot be rejected. What statement is correct? A. MSG and MSE are both unbiased estimators of the error variance. B. Since the null hypothesis is true, then the difference between groups is as large as difference within groups. C. There is no group effect. D. All are correct Answer: D 4. ANOVA
- 109. 4E. ANOVA Question An experiment on the effect of listening to music on information retention is performed. A total sample of 75 is divided into three equally large groups. All three groups are asked to memorize a list of words while either (a) listening to Vivaldi, (b) listening to AC/DC, or (c) listening to crickets singing. An analysis of variance is performed. It is concluded that the null hypothesis cannot be rejected. What statement is correct? Solution A. Correct. When H0 is not rejected, the differences between groups are attributed to uncontrolled factors (error). In that case MSG is an unbiased estimator of the error variance. MSE is an unbiased estimator of the error variance in any case. B. Correct. The difference between groups is measured by MSG while the difference within groups is measured by MSE. In the case of a true null hypothesis, both MSE and MSG are unbiased estimators of the error variance, thus MSE ≈ MSG. C. Correct. The H0 for ANOVA states that the means of all groups are equal, meaning that there is no treatment effect. D. Correct
- 110. 4E. ANOVA Unbiased Estimator • MSE is an unbiased estimator of error variance. Pooled Variance • Since we already have the assumption that all populations have equal variance, we can take the weighted average of the estimates: $S_p^2 = \frac{(N_1 - 1) S_1^2 + (N_2 - 1) S_2^2 + \dots + (N_k - 1) S_k^2}{(N_1 - 1) + (N_2 - 1) + \dots + (N_k - 1)}$ Conclusion • $MSE = S_p^2$ • Accurate and efficient error estimate.
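The claim that MSE equals the pooled variance can be verified numerically. The sketch below reuses the small three-group data set from the earlier sum-of-squares example (an illustrative choice, not part of this question) and computes both quantities independently.

```python
from statistics import variance

# Illustrative data: the three groups from the earlier worked example.
groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]

# Pooled variance: sample variances weighted by their degrees of freedom.
numerator = sum((len(g) - 1) * variance(g) for g in groups)
denominator = sum(len(g) - 1 for g in groups)
pooled_variance = numerator / denominator

# MSE: within-group sum of squares divided by N - k.
n_total = sum(len(g) for g in groups)
k = len(groups)
sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
mse = sse / (n_total - k)

print(pooled_variance, mse)
```

Both values coincide because the denominator of the pooled variance, the sum of the group degrees of freedom, is exactly N - k.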
- 111. 4E. ANOVA Random Variables: MSG and MSE count as random variables. MSE and MSG as Estimators of Error Variance: If there is no group effect (H0 is true), MSE as well as MSG count as unbiased estimators of the error variance. Relation of MSE and MSG: MSE is the error (or noise). MSG is the error + the effect of the group. If H0 is true and there is no effect of the group, MSE and MSG will be approximately equal. Another way to phrase this: the difference between groups is as large as the difference within groups.
- 112. Answers Question Synesthesia is a perceptual phenomenon in which there is an experience of 2 sensory/cognitive pathways. Synesthesia has been linked to enhanced memory skills due to the increased number of associations available. Anton wonders if there is a difference in memory recall between different synesthesia types. He gathers 120 participants and within his sample, there are 4 different synesthesia types. Each group has an equal number of participants. After a memorization period, Anton gives his participants a memory test. Following an ANOVA, SSG = 167.91 and SSE = 1760.88. What can be concluded? A. H0 not rejected with p-value > 0.05 B. H0 rejected with 0.025 ≤ 𝑝 − 𝑣𝑎𝑙𝑢𝑒 ≤ 0.05 C. H0 rejected with 0.01 ≤ 𝑝 − 𝑣𝑎𝑙𝑢𝑒 ≤ 0.025 D. H0 not rejected because Fobs < Fcritical Answer: C 5. ANOVA
- 113. 5E. ANOVA Question Synesthesia is a perceptual phenomenon in which there is an experience of 2 sensory/cognitive pathways. Synesthesia has been linked to enhanced memory skills due to the increased number of associations available. Anton wonders if there is a difference in memory recall between different synesthesia types. He gathers 120 participants and within his sample, there are 4 different synesthesia types. Each group has an equal number of participants. After a memorization period, Anton gives his participants a memory test. Following an ANOVA, SSG = 167.91 and SSE = 1760.88. What can be concluded? Solution • Calculate the degrees of freedom: $df(G) = k - 1 = 4 - 1 = 3$, $df(E) = N - k = 120 - 4 = 116$ • Calculate the Mean Squares: $MS(G) = \frac{SSG}{df(G)} = \frac{167.91}{3} = 55.97$, $MS(E) = \frac{SSE}{df(E)} = \frac{1760.88}{116} = 15.18$ • Calculate the F-value: $F = \frac{MS(G)}{MS(E)} = \frac{55.97}{15.18} = 3.687$ • By taking a look at the F-table we see that for α = 0.05, $F_c(3, 116) = 2.70$, which means the null hypothesis is rejected: $F_{obs} > F_c$ • On the next pages we see that for α = 0.01, $F_c = 3.98$, which $F_{obs}$ does not exceed, while it does exceed the critical value for α = 0.025, so 0.01 ≤ p-value ≤ 0.025
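The steps of this solution can be sketched as a short script. The two critical values are the ones quoted from the F-table in the solution and are treated here as given inputs, not computed.

```python
# Values given in the question.
ss_between = 167.91    # SSG
ss_within = 1760.88    # SSE
n_total = 120
k = 4                  # synesthesia types

# Critical values as quoted from the F-table in the solution.
F_CRIT_05 = 2.70       # alpha = 0.05, df = (3, 116)
F_CRIT_01 = 3.98       # alpha = 0.01

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_obs = ms_between / ms_within

print(f"df = ({df_between}, {df_within}), MSG = {ms_between:.2f}, "
      f"MSE = {ms_within:.2f}, F = {f_obs:.3f}")
print("Reject H0 at 5%:", f_obs > F_CRIT_05)
print("Reject H0 at 1%:", f_obs > F_CRIT_01)
```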
- 114. Answers Question Based on the ANOVA output, which of the following statements is correct? A. The scores on the dependent variable likely vary due to residual effects only. B. The scores on the dependent variable likely vary due to residual effects and group effect. C. The scores on the dependent variable likely vary due to group effect only D. The scores on the dependent variable likely do not vary due to residual effects nor due to the group effect. Answer: B 6. ANOVA ANOVA output: Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4843, Sig. = ?; Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034; Total: Sum of Squares = 1756, df = 59
- 115. 6E. ANOVA Question Based on the ANOVA output, which of the following statements is correct? Solution • Using the F-table, we can see that for α = 0.05, $F_c(1, 58) = 4.03$ • The Fobs is bigger than the Fc, meaning that the null hypothesis is rejected • There is an overall treatment effect, thus not all group means are the same • However, error cannot be controlled for, so it is always there. Scores likely vary due to treatment/group effect AND error/residual factors. ANOVA output: Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4843, Sig. = ?; Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034; Total: Sum of Squares = 1756, df = 59
- 116. Answers Question Maja conducted a study with 5 conditions and recruited 30 participants in total. Choose the correct statement: A. F = 21.801, not significant B. F = 17.474, not significant C. F = 19.625, significant D. F = 18.926, significant Answer: D 7. ANOVA ANOVA output (all other cells hidden): Within Groups Sum of Squares = 2244.500; Total Sum of Squares = 9041.367
- 117. 7E. ANOVA Question Maja conducted a study with 5 conditions and recruited 30 participants in total. Choose the correct statement: Solution 1) Calculate the SS(G): $SST = SSG + SSE$, so $SSG = SST - SSE = 9041.367 - 2244.5 = 6796.867$ 2) Calculate the degrees of freedom: $df(G) = k - 1 = 5 - 1 = 4$, $df(E) = N - k = 30 - 5 = 25$, $df(T) = N - 1 = 30 - 1 = 29$ 3) Calculate the Mean Squares: $MS(G) = \frac{SSG}{df(G)} = \frac{6796.867}{4} = 1699.217$, $MS(E) = \frac{SSE}{df(E)} = \frac{2244.5}{25} = 89.780$ 4) Calculate the F-value: $F = \frac{MSG}{MSE} = \frac{1699.217}{89.780} = 18.926$ 5) Use the F-table to reach your decision: $F_c(4, 25) = 2.76 \Rightarrow F_{obs} > F_c \Rightarrow$ significant. Given in the output: Within Groups Sum of Squares = 2244.500; Total Sum of Squares = 9041.367
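The five steps above translate directly into code. This is a minimal sketch that fills in the hidden cells of the table; the critical value 2.76 is the one quoted from the F-table in step 5 and is taken as given.

```python
# The only two cells visible in the output, plus the design from the question.
ss_total = 9041.367
ss_within = 2244.5     # SSE
k = 5                  # conditions
n_total = 30           # participants

F_CRIT = 2.76          # F-table value for df = (4, 25), alpha = 0.05

ss_between = ss_total - ss_within          # step 1
df_between = k - 1                         # step 2
df_within = n_total - k
df_total = n_total - 1
ms_between = ss_between / df_between       # step 3
ms_within = ss_within / df_within
f_obs = ms_between / ms_within             # step 4
significant = f_obs > F_CRIT               # step 5

print(f"SSG = {ss_between:.3f}, MSG = {ms_between:.3f}, "
      f"MSE = {ms_within:.3f}, F = {f_obs:.3f}, significant: {significant}")
```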
- 118. Answers Question Micheal is a sports enthusiast. He wants to investigate which form of exercise leads to better concentration. He recruits 75 participants and assigns them randomly to 3 groups (cardio, weights, crossfit). He later measures their concentration levels and compares the means of the groups. Given that Micheal ended up rejecting the null hypothesis, which of the following is correct? A. There is no difference in concentration levels between groups B. Micheal can confidently say that cardio is better than weights C. Micheal needs an extra statistical analysis D. There is no treatment effect Answer: C 8. ANOVA
- 119. 8E. ANOVA Question Micheal is a sports enthusiast. He wants to investigate which form of exercise leads to better concentration. He recruits 75 participants and assigns them randomly to 3 groups (cardio, weights, crossfit). He later measures their concentration levels and compares the means of the groups. Given that Micheal ended up rejecting the null hypothesis, which of the following is correct? Solution A. Incorrect. The null hypothesis states that all group means are the same (no treatment effect). By rejecting the null hypothesis we conclude that not all group means are the same. B. Incorrect. By rejecting the null hypothesis, we know that not all group means are the same, however we do not know where the difference is exactly (i.e., between which groups). C. Correct. If we want to uncover the exact nature of the group difference, we need to conduct multiple comparisons. D. Incorrect. The null hypothesis was rejected, thus there is a treatment effect.
- 120. Answers Question Micheal did conduct multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output? 9. ANOVA Multiple comparisons (LSD), Dependent Variable: Concentration scores. Each entry lists (I) Group vs (J) Group: Mean Difference; Std. Error; Sig.; 95% Confidence Interval [Lower Bound, Upper Bound]. Cardio vs Weights: 0.1762; 0.5102; 0.730; [-.8338, 1.1861]. Cardio vs Crossfit: 1.4606; 0.5470; 0.009; [.3778, 2.5435]. Weights vs Cardio: -.1762; 0.5102; 0.730; [-1.1861, .8338]. Weights vs Crossfit: 1.2844; 0.5696; 0.026; [.1569, 2.4119]. Crossfit vs Cardio: -1.4606; 0.5470; 0.009; [-2.5435, -.3778]. Crossfit vs Weights: -1.2844; 0.5696; 0.026; [-2.4119, -.1569].
- 121. Answers Question Micheal did conduct multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output? 120 A. There are 2 statistically significant comparisons B. There is 1 statistically significant comparison C. All three comparisons are statistically significant D. None of the comparisons reaches significance Answer: B 9. ANOVA
- 122. 9E. ANOVA Question Micheal did conduct multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output? Family-wise Type 1 error: With multiple comparisons, the α-values of the individual comparisons add up. Hence, the chance of making at least one Type I error increases. Solution • While the output does show 2 comparisons that reach significance (cardio-crossfit, weights-crossfit), no Bonferroni correction has been applied for the family-wise Type 1 error. • By applying the Bonferroni correction (multiply each p-value by the number of comparisons), we see that only the comparison between cardio and crossfit remains significant (0.009 × 3 = 0.027, below 0.05, whereas 0.026 × 3 = 0.078, above 0.05). Bonferroni Correction 1. Multiply the p-value by the number of comparisons, or 2. Divide the significance level by the number of comparisons. Number of comparisons: k(k-1)/2
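The Bonferroni step can be sketched as follows, using the three unique LSD p-values from the SPSS output. The dictionary layout and names are illustrative only; capping adjusted p-values at 1 is a common convention assumed here.

```python
ALPHA = 0.05

# Unadjusted LSD p-values for the three unique pairwise comparisons.
p_values = {
    ("Cardio", "Weights"): 0.730,
    ("Cardio", "Crossfit"): 0.009,
    ("Weights", "Crossfit"): 0.026,
}

k = 3                                   # number of groups
n_comparisons = k * (k - 1) // 2        # k(k-1)/2 pairwise comparisons

# Option 1: multiply each p-value by the number of comparisons (capped at 1).
adjusted = {pair: min(1.0, p * n_comparisons) for pair, p in p_values.items()}
significant = [pair for pair, p in adjusted.items() if p < ALPHA]

# Option 2 (equivalent decision): compare raw p-values with alpha / m.
alpha_per_test = ALPHA / n_comparisons
significant_alt = [pair for pair, p in p_values.items() if p < alpha_per_test]

for pair, p in adjusted.items():
    print(pair, f"adjusted p = {p:.3f}")
print("Significant after correction:", significant)
```

Both routes lead to the same decision, which is why the slide presents them as alternatives.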
- 123. Answers Question Given that the groups have equal sample sizes and the following output, which statement is correct? A. The normality assumption was violated, so the test should not have been done B. An independent samples t-test could be done instead of ANOVA C. MSE is smaller than MSG, hence the treatment effects are significant D. If the test is significant, multiple comparisons are the necessary next step Answer: B 10. ANOVA ANOVA output: Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4843, Sig. = ?; Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034; Total: Sum of Squares = 1756, df = 59
- 124. 10E. ANOVA Question Given that the groups have equal sample sizes, which statement is correct, given the following output? Solution A. Incorrect. We can see that our sample size is 60 (N-1 = 59 → N = 60). Given that each group has 30 participants, the CLT can be applied, thus the test is robust against a normality violation. B. Correct. Since we have just 2 groups, an independent samples t-test would be equivalent to this ANOVA. C. Incorrect. It might be that MSE is smaller than MSG, thus F is bigger than 1, but we always have to rely on the p-value, which tells us whether the result is actually significant. D. Incorrect. Since we only have two groups, if the test is significant, we can immediately tell between which groups there is a difference, thus it is not necessary to conduct multiple comparisons. However, if we want to see what the difference looks like, we can still carry them out. ANOVA output: Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4843, Sig. = ?; Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034; Total: Sum of Squares = 1756, df = 59
- 125. Stats1 – Question Pool Proportions, Entire Distributions
- 126. Answers Question Florian is the new general manager at Success Formula, replacing Michalina. Success Formula offers courses in Psychology, Business Economics and Law. During the time Michalina was GM, 60% of the student population at SF attended Business Economics courses, 25% Psychology courses and 15% Law courses. After an intense marketing campaign, Florian believes that this year, things will be different. In a simple random sample of 275 students, 145 of them chose B/E courses, 75 chose psychology and 55 chose law. Based on the data, Florian wants to test whether the population distribution of field choice has changed or is still the same as during Michalina’s reign as GM. Does the result from the sample give sufficient evidence? A. No, the null hypothesis is not rejected with the observed value of the statistic test equal to 1.23 B. Yes, the null hypothesis is rejected with the observed value of the statistic test equal to 7.57 C. No, the null hypothesis is not rejected with the observed value of the statistic test equal to 2.50 D. Yes, the null hypothesis is rejected with the observed value of the statistic test equal to 9.93 Answer: B 1. Proportions and Entire Distributions
- 127. 1E. Proportions and Entire Distributions Question Florian is the new general manager at Success Formula, replacing Michalina. Success Formula offers courses in Psychology, Business Economics and Law. During the time Michalina was GM, 60% of the student population at SF attended Business Economics courses, 25% Psychology courses and 15% Law courses. After an intense marketing campaign, Florian believes that this year, things will be different. In a simple random sample of 275 students, 145 of them chose B/E courses, 75 chose psychology and 55 chose law. Based on the data, Florian wants to test whether the population distribution of field choice has changed or is still the same as during Michalina’s reign as GM. Does the result from the sample give sufficient evidence? Solution • We see that we have only 1 categorical variable (field of study) which has more than 2 levels (3) • We want to see how well the sample distribution fits a specific model • We have to use the χ² Goodness of Fit Test
- 128. 1E. Proportions and Entire Distributions Data Model: BE (60%) - Psy (25%) - Law (15%), N = 275. Observed counts (students): Business/Economics 145, Psychology 75, Law 55. H0: Distribution within sample fits the model. Ha: Distribution within sample does not fit the model. Solution • Calculate Expected Counts [$EC = N \times p(e)$] • B/E: 275 × 0.6 = 165 • Psy: 275 × 0.25 = 68.75 • Law: 275 × 0.15 = 41.25 • Calculate the chi-square: $\chi^2 = \sum \frac{(OC - EC)^2}{EC}$, $\chi^2 = \frac{(145 - 165)^2}{165} + \frac{(75 - 68.75)^2}{68.75} + \frac{(55 - 41.25)^2}{41.25}$, $\chi^2 = 2.42 + 0.57 + 4.58$, $\chi^2 = 7.57$ • Check the χ² table (df = 3 - 1 = 2) for the p-value: 0.02 ≤ p-value ≤ 0.025. The p-value is lower than 0.05, thus the H0 that the distribution within the sample fits the model is rejected.
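The goodness-of-fit calculation above can be sketched in a few lines. One assumption to flag: the p-value line uses the closed form exp(-χ²/2), which holds only for df = 2 (as in this question); for other degrees of freedom a table or a statistics library would be needed.

```python
import math

# Observed counts and the model proportions from the question.
observed = {"B/E": 145, "Psychology": 75, "Law": 55}
model = {"B/E": 0.60, "Psychology": 0.25, "Law": 0.15}

n = sum(observed.values())
expected = {field: n * p for field, p in model.items()}   # EC = N * p(e)

# Chi-square statistic: sum of (OC - EC)^2 / EC over all categories.
chi2 = sum((observed[f] - expected[f]) ** 2 / expected[f] for f in observed)
df = len(observed) - 1

# Upper-tail probability; this closed form is valid for df = 2 only.
p_value = math.exp(-chi2 / 2)

print(f"expected = {expected}")
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

Summing the unrounded terms gives about 7.58 rather than 7.57; the difference comes from rounding each term before adding and does not affect the decision.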
- 129. 1E. Proportions and Entire Distributions When to Use: Data type: categorical data → Check how well a proposed proportion distribution fits with an observed one. H0: The distribution within the sample fits our expectation. Ha: The distribution within the sample does not fit our expectation. Formula: $\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$ Assumptions • Categorical Data • Expected Counts > 5, with EC = N*p(e) Degrees of Freedom: df = # of cells – 1. Example: Nationality of class with proposed proportions Dutch 0.2, German 0.5, Belgian 0.2, French 0.1 gives df = 4-1 = 3
- 130. Answers Question Andreia has been researching the effectiveness of dialectical behavior therapy (DBT), a type of cognitive behavioural therapy, for the development of healthy ways to cope with stress and emotion regulation. She wonders whether DBT has different efficiency levels for different types of populations. She decides to take two samples, one of people exhibiting eating disorders and one of people with substance use disorders. After several sessions, Andreia and her team, note for each subject if there was improvement or not. Andreia is the first researcher to conduct such a study, so she does not know how the different disorders can have an effect on improvement. What can be concluded? 2. Proportions and Entire Distributions Improvement by disorder (Yes / No / Total): Eating Disorders 148 / 112 / 260; Substance use Disorders 173 / 102 / 275; Total 321 / 214 / 535
- 131. Answers Question Andreia has been researching the effectiveness of dialectical behavior therapy (DBT), a type of cognitive behavioural therapy, for the development of healthy ways to cope with stress and emotion regulation. She wonders whether DBT has different efficiency levels for different types of populations. She decides to take two samples, one of people exhibiting eating disorders and one of people with substance use disorders. After several sessions, Andreia and her team, note for each subject if there was improvement or not. Andreia is the first researcher to conduct such a study, so she does not know how the different disorders can have an effect on improvement. What can be concluded? 130 A. The null hypothesis is not rejected with the observed value of the statistic test equal to 0.98 B. The null hypothesis is rejected with the observed value of the statistic test equal to 1.36 C. The null hypothesis is not rejected with the observed value of the statistic test equal to -1.36 D. The null hypothesis is rejected with the observed value of the statistic test equal to -2.71 Answer: C 2. Proportions and Entire Distributions
- 132. 2E. Proportions and Entire Distributions Data • We now compare 2 independent samples • The dependent variable is dichotomous • We have to use a 2 proportion z-test. $H_0: \pi_1 = \pi_2$, $H_a: \pi_1 \neq \pi_2$. $p_1 = \frac{x_1}{n_1} = \frac{148}{260} = 0.57$, $p_2 = \frac{x_2}{n_2} = \frac{173}{275} = 0.63$, $\pi = \frac{x_1 + x_2}{n_1 + n_2} = \frac{148 + 173}{260 + 275} = 0.6$ Solution • Calculate the Z: $Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\pi (1 - \pi)} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, $Z = \frac{0.57 - 0.63}{\sqrt{0.6 (1 - 0.6)} \times \sqrt{\frac{1}{260} + \frac{1}{275}}}$, $Z = \frac{-0.06}{0.49 \times 0.09} = -1.36$ • Look at the Z-table for the p-value: P-value(z = -1.36) = 0.0869 • Double the p-value since it is a two-tailed test: 2 × 0.0869 = 0.1738 > 0.05. The null hypothesis cannot be rejected.
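The z-test above can be sketched with the standard library only. The helper `normal_cdf` is an illustrative name built on `math.erf`, not something from the slides. Note that carrying unrounded intermediate values gives z of about -1.41 rather than the -1.36 obtained on the slide by rounding the two square roots to 0.49 and 0.09; the decision not to reject H0 is the same either way.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Improved / total per sample, from the contingency table.
x1, n1 = 148, 260      # eating disorders
x2, n2 = 173, 275      # substance use disorders

p1 = x1 / n1
p2 = x2 / n2
pooled = (x1 + x2) / (n1 + n2)     # pooled proportion under H0

# Standard error under H0 and the z statistic.
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value: double the tail area beyond |z|.
p_two_sided = 2 * normal_cdf(-abs(z))

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, pooled = {pooled:.3f}")
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
print("Reject H0 at 5%:", p_two_sided < 0.05)
```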
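- 133. 2E. Proportions and Entire Distributions When to Use: Comparing the proportion of two groups (categorical data). $H_0: p_1 = p_2$; $H_a: p_1 \neq p_2$ (two-sided); $H_a: p_1 < p_2$ or $H_a: p_1 > p_2$ (one-sided). Assumptions: • Categorical variables → dichotomous • Independent groups • Normality - always violated - Central Limit Theorem. Formulas and Application: Z-score = $\frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE}$. Estimate: $\hat{p}_1 - \hat{p}_2$. SE (for z-test): $\sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$ (the worked solution on the previous slide instead pools both samples into a single proportion under H0). Confidence Interval: $(p_1 - p_2) \pm Z_c \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$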
- 134. Answers Question Refer back to the previous question. What is the 95% confidence interval? A. [0.063, 0.015] B. [-0.143, 0.023] C. [-0.053, 0.090] D. [1.678, 3.683] Answer: B 3. Proportions and Entire Distributions
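- 135. 3E. Proportions and Entire Distributions Question Refer back to the previous question. What is the 95% confidence interval? Solution $(p_1 - p_2) \pm Z_c \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$, $(0.57 - 0.63) \pm 1.96 \sqrt{\frac{0.57 \times 0.43}{260} + \frac{0.63 \times 0.37}{275}}$, $-0.06 \pm 1.96 \times 0.042$, which gives approximately [−0.143, 0.023] when the unrounded proportions are carried through.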
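The interval can be recomputed from the raw counts as a check. This is a sketch that uses the unpooled standard error from the CI formula and the usual 1.96 critical value; with unrounded proportions the bounds come out near -0.143 and 0.023.

```python
import math

# Improved / total per sample, from the previous question.
x1, n1 = 148, 260
x2, n2 = 173, 275
Z_CRIT = 1.96          # two-sided 95% critical value

p1 = x1 / n1
p2 = x2 / n2
diff = p1 - p2

# Unpooled standard error, as in the confidence-interval formula.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = Z_CRIT * se

ci_lower = diff - margin
ci_upper = diff + margin

print(f"diff = {diff:.3f}, SE = {se:.3f}")
print(f"95% CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
print("0 inside interval:", ci_lower <= 0 <= ci_upper)
```

Since 0 lies inside the interval, the CI agrees with the z-test: the null hypothesis of equal proportions is not rejected.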
- 136. Answers Question Nik wants to see if there is an association between the presence of neuroscientific evidence (1=no, 2=yes) and juror verdicts (not guilty=1, not guilty due to insanity=2, guilty=3). What can be concluded based on the table? 4. Proportion and Entire Distribution Verdict by Neuroscientific Evidence (No / Yes / Total): Not Guilty 32 / 29 / 61; Not Guilty due to insanity 55 / 61 / 116; Guilty 10 / 13 / 23; Total 97 / 103 / 200
- 137. Answers Question Nik wants to see if there is an association between the presence of neuroscientific evidence (1=no, 2=yes) and juror verdicts (not guilty=1, not guilty due to insanity=2, guilty=3). What can be concluded based on the table? A. The null hypothesis is not rejected with the observed value of the statistic test equal to 0.67 B. The null hypothesis is rejected with the observed value of the statistic test equal to 1.30 C. The null hypothesis is not rejected with the observed value of the statistic test equal to 0.20 D. The null hypothesis is rejected with the observed value of the statistic test equal to 0.65 Answer: A 4. Proportion and Entire Distribution
- 138. 4E. Proportion and Entire Distribution Data • We want to study the relationship of two categorical variables • We use a contingency table • We use the chi-square test for contingency tables. Expected Counts: $EC = \frac{\text{row total} \times \text{column total}}{N}$. Observed counts with expected counts in parentheses (No / Yes): Not Guilty 32 (29.585) / 29 (31.415); Not Guilty due to Insanity 55 (56.26) / 61 (59.740); Guilty 10 (11.155) / 13 (11.845). Solution • Calculate the chi-square: $\chi^2 = \sum \frac{(OC - EC)^2}{EC}$, $\chi^2 = \frac{(32 - 29.585)^2}{29.585} + \frac{(55 - 56.26)^2}{56.26} + \frac{(10 - 11.155)^2}{11.155} + \frac{(29 - 31.415)^2}{31.415} + \frac{(61 - 59.740)^2}{59.740} + \frac{(13 - 11.845)^2}{11.845}$, $\chi^2 = 0.197 + 0.028 + 0.119 + 0.186 + 0.026 + 0.113$, $\chi^2 = 0.669 \approx 0.67$ • Calculate df: $df = (\#rows - 1) \times (\#columns - 1) = (3 - 1) \times (2 - 1) = 2$ • Check the p-value: The p-value is greater than 0.25, thus the null hypothesis cannot be rejected.
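The contingency-table calculation can be reproduced as below. This is a sketch using only the observed counts; as an assumption to flag, the p-value uses the closed form exp(-χ²/2), which is valid only because df = 2 here.

```python
import math

# Observed counts: rows = verdict, columns = neuroscientific evidence (No, Yes).
observed = [
    [32, 29],   # Not Guilty
    [55, 61],   # Not Guilty due to insanity
    [10, 13],   # Guilty
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic summed over all six cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid for df = 2 only.
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.2f}")
```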