- 1. CHAPTER 11 The t Test for Two Related Samples
- 2. Stephens, Atkins, & Kingston (2009) Does cursing focus attention on pain (thereby increasing it), or does it distract from the pain (thereby decreasing it)?

  | | Swearing | No Swearing |
  |---|---|---|
  | Group A | 1st | 2nd |
  | Group B | 2nd | 1st |

  • Recorded the length of time each group was able to keep their hands in the ice water under each condition
  • Now we have two sets of data for each participant (each underwent Conditions 1 and 2)
  We are looking for within-subjects differences now: Was the participant able to keep his hand in the bowl for a longer time when he swore, or when he didn’t swear?
- 3. CHAPTER 11.1 Introduction to Repeated-Measures Designs
- 4. Repeated Measures Design A research design in which a single sample of individuals is measured more than once on the same dependent variable • (a.k.a. Within-Subjects Design) • The same subjects are used in all of the treatment conditions We are comparing two sets of data, but now it all comes from the same group of participants • Comparing depression levels before and after therapy • Comparing test scores before and after specialized instruction
- 5. A Major Advantage of Repeated Measures • Instead of working with two groups of participants who may be substantially different from each other • You are working with identical participants in both treatment conditions, eliminating individual differences [Figure: two groups made up of different individuals, contrasted with the same eight individuals appearing in both treatment conditions]
- 6. How do we obtain two sets of data?
  Independent-Measures (Between Subjects) Design
  • From two completely separate groups of participants: A separate sample of individuals is used for each treatment condition (or population)
  • e.g., Men vs. Women, Teaching Strategy A vs. B
  Repeated-Measures (Within Subjects) Design
  • From the same group of participants: A single sample of individuals is measured more than once on the DV
  • Obtain pre- and post-intervention scores
  • e.g., Depression level before and after therapy
- 7. Matched-Subjects Design Each individual in one sample is matched with a subject in the other sample. • The matching is done so that the two individuals are equivalent (or nearly equivalent) • On IQ, GPA, Gender, Ability, Chronic Illness, etc. • Any specific variable that the researcher would like to control • Ensures that the participants are equivalent (matched) on one (or more) variable(s) [Figure: two people matched on Gender, Race, Height, Weight, Career]
- 8. Related Samples Design
  Repeated Measures
  • Two sets of scores (2 samples)
  • Each “pre-test” score (the first sample) is directly related to the “post-test” score (the second sample)
  • [Figure: Pre-test → Treatment → Post-test for the same individual (Peyton Manning)]
  Matched Samples
  • [Figure: matched individuals, one Untreated (“Pre-test”) and one Treated (“Post-test”)]
- 9. Data for a Related Samples Design • For both the repeated-measures and matched-subjects design, the data will look like this: • For each participant, there are two data points (scores)
- 10. In SPSS: This is what Variable View should look like: This is what Data View should look like: • Each line (row) in the data sheet represents one participant. • For each participant, there should be two scores
- 11. CHAPTER 11.2 The t-Statistic for A Repeated-Measures Research Design
- 12. Data for a Repeated-Measures Study • We are interested in the differences, if any, between before and after scores. • Data comes from difference scores obtained for each participant • Provides a measure of the difference between X1 and X2: $D = X_2 - X_1$

  | Person | Before Meds (X1) | After Meds (X2) | Difference D |
  |---|---|---|---|
  | A | 215 | 210 | -5 |
  | B | 221 | 242 | 21 |
  | C | 196 | 219 | 23 |
  | D | 203 | 228 | 25 |

  ΣD = 64, MD = 16
  The sign of each D score tells you the direction of the change: “-” = a decrease in score, “+” = an increase in score
- 13. The Repeated-Measures t Test A comparison between two groups of scores from identical (or matched) individuals. Is there a difference between two treatment conditions for the general population? • Now we are interested in the population of difference scores • And so our calculation of the mean (M) requires the ΣD: $M_D = \dfrac{\Sigma D}{n}$

  | Person | Before Meds (X1) | After Meds (X2) | Difference D |
  |---|---|---|---|
  | A | 215 | 210 | -5 |
  | B | 221 | 242 | 21 |
  | C | 196 | 219 | 23 |
  | D | 203 | 228 | 25 |

  ΣD = 64, MD = 16
  *Note the use of the subscript “D” to identify this as the mean of the difference scores.
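As a small illustration of the difference-score calculation on the slide above, the following Python sketch recomputes D, ΣD, and MD for the four-person medication example. The function name `difference_scores` and the variable names are illustrative choices, not part of the original slides.

```python
# Before/after scores for persons A-D, taken from the slide's table.
before = [215, 221, 196, 203]  # X1
after = [210, 242, 219, 228]   # X2


def difference_scores(x1, x2):
    """Return D = X2 - X1 for each participant (hypothetical helper)."""
    if len(x1) != len(x2):
        raise ValueError("each participant needs exactly two scores")
    return [second - first for first, second in zip(x1, x2)]


d_scores = difference_scores(before, after)
sum_d = sum(d_scores)           # sum of the difference scores
mean_d = sum_d / len(d_scores)  # M_D = sum of D divided by n

print(d_scores, sum_d, mean_d)
```

A negative D (person A) marks a decrease from before to after; the positive values mark increases.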
- 14. The Hypotheses • We are using the sample of difference scores to infer information about the general population • Is there a difference between two treatment conditions for the general population? • Null: There is no effect of the treatment, $H_0: \mu_D = 0$ • Alternative: There is an effect of treatment, $H_1: \mu_D \neq 0$
- 15. The Hypotheses If μ is the mean of the population, and we are now examining the mean of the difference scores (D), then the notation for μ is now μD. We are testing whether or not the sample mean of the difference scores is representative of the population mean of the difference scores. Therefore, our hypotheses are:
  • Two-tailed: $H_0: \mu_D = 0$ (Null: There is no effect of the treatment), $H_1: \mu_D \neq 0$ (Alternative: There is an effect of treatment)
  • One-tailed (increase): $H_0: \mu_D \le 0$, $H_1: \mu_D > 0$
  • One-tailed (decrease): $H_0: \mu_D \ge 0$, $H_1: \mu_D < 0$
- 16. The General Situation for the Repeated Measures Hypothesis Test We are working with difference scores
- 17. The t Statistics
  Single-sample t statistic: $t = \dfrac{M - \mu}{s_M}$
  • Numerator: Actual difference between the sample mean and the hypothesized population mean
  • Denominator: Amount of expected error (estimated amount of error expected when we use a sample mean to represent a population mean)
  Related-samples t statistic: $t = \dfrac{M_D - \mu_D}{s_{M_D}}$
  • Numerator: Actual difference between the sample mean of difference scores and the hypothesized population mean of difference scores
  • Denominator: Amount of expected error (estimated amount of error expected when you use a sample mean of difference scores to represent a population mean of difference scores)
- 18. The t Statistic Used to test hypotheses about an unknown population mean μ when the value of σ is unknown: $t = \dfrac{M_D - \mu_D}{s_{M_D}}$, where $s_{M_D} = \sqrt{\dfrac{s^2}{n}}$. Same structure as the single-sample t statistic, except this uses the estimated standard error of the difference scores
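To make the structure of the related-samples t statistic concrete, here is a minimal Python sketch that builds it from a list of difference scores. The function name `related_samples_t` is an assumption made for illustration, and the example reuses the four difference scores from the medication table earlier in the deck.

```python
import math


def related_samples_t(d_scores, mu_d=0.0):
    """Return (t, df) for a list of difference scores (hypothetical helper).

    t = (M_D - mu_D) / s_MD, with s_MD = sqrt(s^2 / n) and df = n - 1.
    """
    n = len(d_scores)
    if n < 2:
        raise ValueError("need at least two difference scores")
    m_d = sum(d_scores) / n
    ss = sum((d - m_d) ** 2 for d in d_scores)  # sum of squared deviations
    variance = ss / (n - 1)                     # sample variance s^2
    std_error = math.sqrt(variance / n)         # estimated standard error
    if std_error == 0:
        raise ValueError("all difference scores are identical; t is undefined")
    return (m_d - mu_d) / std_error, n - 1


t_value, df = related_samples_t([-5, 21, 23, 25])
print(round(t_value, 2), df)
```

The numerator is the observed mean difference minus the hypothesized one (zero under the null), and the denominator is the estimated standard error of the mean difference.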
- 19. CHAPTER 11.3 Hypothesis Tests and Effect Size for the Repeated Measures Design
- 20. Test Procedures
  1. State the hypotheses and select the α level: $H_0: \mu_D = 0$ (no effect), $H_1: \mu_D \neq 0$ (there is an effect), α = .01
  2. Locate the critical region by calculating df and then using the t Distribution Table in your textbook (p.703): $df = n - 1$
  3. Compute the test statistic: $t = \dfrac{M_D - \mu_D}{s_{M_D}}$
  4. Make a decision
- 21. Two-Tailed Tests: “Difference” Hypotheses • If the first set of scores should have a different mean than the second • Example: If you are testing whether room temperature (IV – the treatment) changes test performance (DV), then you expect the mean of the difference scores for the sample to be different from zero (0). However, the direction is uncertain • You have two critical values (a negative and a positive value), and your calculated t statistic must fall in one of the tails (the critical region) of the distribution • $H_0: \mu_D = 0$, $H_1: \mu_D \neq 0$
- 22. One-Tailed Tests: Directional Hypotheses • If the second set of scores should have a higher mean than the first (an increase) • Example: If you are testing to see if classical music (the IV – the treatment) increases scores on a math exam, then you should expect the mean of the difference scores for your sample to be greater than zero (0) • You have one critical value (a positive value), and your calculated t statistic must fall in the tail (the critical region) of the distribution • $H_0: \mu_D \le 0$, $H_1: \mu_D > 0$
- 23. One-Tailed Tests: Directional Hypotheses • If the second set of scores should have a lower mean than the first (a decrease) • Example: If you are testing to see if an overheated room (the IV) lowers students’ attention spans (the DV), then you should expect the mean of the difference scores to be less than zero (0) • You have one critical value (a negative value), and your calculated t statistic must fall in the tail (the critical region) of the distribution • $H_0: \mu_D \ge 0$, $H_1: \mu_D < 0$
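The three decision rules (two-tailed, one-tailed increase, one-tailed decrease) can be sketched as a single Python function. The name `in_critical_region` and the `tail` labels are hypothetical, the critical value must still be looked up in a t table for the chosen α and df, and treating "beyond the critical value" as a strict inequality is an assumption about the boundary case.

```python
def in_critical_region(t_obtained, t_crit, tail="two"):
    """Return True if the obtained t falls in the critical region.

    t_crit is the table value for the chosen alpha and df; its sign is
    ignored here and applied according to the tail. Hypothetical helper.
    """
    t_crit = abs(t_crit)
    if tail == "two":    # H1: mu_D != 0, two critical values (+/-)
        return abs(t_obtained) > t_crit
    if tail == "upper":  # H1: mu_D > 0, one positive critical value
        return t_obtained > t_crit
    if tail == "lower":  # H1: mu_D < 0, one negative critical value
        return t_obtained < -t_crit
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


# Illustrative inputs only: 3.355 is the two-tailed value for
# alpha = .01 with df = 8, as used in the chapter's worked example.
print(in_critical_region(6.00, 3.355, tail="two"))
print(in_critical_region(-1.20, 3.355, tail="two"))
```

Note that a large t in the wrong direction never rejects a one-tailed null: `in_critical_region(6.00, 3.355, tail="lower")` is `False`.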
- 24. Measuring Effect Size
  Cohen’s d: $d = \dfrac{M_D}{s}$
  • Produces a standardized measure of mean difference
  • numerator: observed difference between the means
  • denominator: difference expected to occur by chance
  r2: $r^2 = \dfrac{t^2}{t^2 + df}$
  • measures how much of the variability in scores can be explained by the treatment effects
  Confidence Intervals: $\mu_D = M_D \pm t \, s_{M_D}$
  • estimates the size of the population mean difference between the two populations or treatment conditions
- 25. Example 11.1 (Elliot & Niesta, 2008) • Does the color red increase men’s attraction to women? • Set of 30 women’s photographs • 15 mounted on white • 15 mounted on red • 1 “test photograph” appears twice, once on red and once on white • What is the rating of the test photograph in both conditions on a 12-point scale?
- 26. • n = 9 • There were 9 men who rated the “target” photograph twice (2 conditions of the IV) • For each participant, there are two scores: • Rating of “target” photograph on white background • Rating of “target” photograph on red background

  | Participant | White Background | Red Background | D | D2 |
  |---|---|---|---|---|
  | A | 6 | 9 | +3 | 9 |
  | B | 8 | 9 | +1 | 1 |
  | C | 7 | 10 | +3 | 9 |
  | D | 7 | 11 | +4 | 16 |
  | E | 8 | 11 | +3 | 9 |
  | F | 6 | 9 | +3 | 9 |
  | G | 5 | 11 | +6 | 36 |
  | H | 10 | 11 | +1 | 1 |
  | I | 8 | 11 | +3 | 9 |

  ∑D = 27, ∑D2 = 99
  $SS = \Sigma D^2 - \dfrac{(\Sigma D)^2}{n} = 99 - \dfrac{(27)^2}{9} = 99 - \dfrac{729}{9} = 99 - 81 = 18$
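The sum-of-squares calculation on this slide can be checked with a short Python sketch. It computes SS for the nine difference scores two ways: with the computational formula used on the slide and with the definitional formula (sum of squared deviations from MD). Variable names are illustrative.

```python
# Difference scores (red minus white) for participants A-I, from the slide.
d_scores = [3, 1, 3, 4, 3, 3, 6, 1, 3]
n = len(d_scores)

sum_d = sum(d_scores)                    # sum of D
sum_d_sq = sum(d * d for d in d_scores)  # sum of D squared

# Computational formula: SS = sum(D^2) - (sum(D))^2 / n
ss_computational = sum_d_sq - sum_d ** 2 / n

# Definitional formula: SS = sum((D - M_D)^2)
m_d = sum_d / n
ss_definitional = sum((d - m_d) ** 2 for d in d_scores)

print(sum_d, sum_d_sq, ss_computational, ss_definitional)
```

Both routes give the same SS, which is a quick way to catch arithmetic slips when working by hand.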
- 27. Repeated Measures t-Test: Step 1 State the hypotheses and select the α level • $H_0: \mu_D = 0$ (There is no difference between the two colors) • $H_1: \mu_D \neq 0$ (There is a difference between the two colors) • Let’s set α = .01
- 28. Repeated Measures t-Test: Step 2 Locate the critical region • $df = n - 1 = 9 - 1 = 8$ • Look for .01 proportion in two tails combined
- 29. Repeated Measures t-Test: Step 3 Compute the test statistic, $t = \dfrac{M_D - \mu_D}{s_{M_D}}$ (data as on slide 26: n = 9, ∑D = 27, ∑D2 = 99, SS = 18). Think of this computation as having 3 steps:
  1. Compute variance: $s^2 = \dfrac{SS}{n - 1} = \dfrac{18}{8} = 2.25$
- 30. Repeated Measures t-Test: Step 3 Compute the test statistic, $t = \dfrac{M_D - \mu_D}{s_{M_D}}$ (data as on slide 26: n = 9, ∑D = 27, ∑D2 = 99, SS = 18). Think of this computation as having 3 steps:
  1. Compute variance: $s^2 = 2.25$
  2. Use the variance to compute the estimated standard error: $s_{M_D} = \sqrt{\dfrac{s^2}{n}} = \sqrt{\dfrac{2.25}{9}} = \sqrt{0.25} = 0.50$
- 31. Repeated Measures t-Test: Step 3 Compute the test statistic, $t = \dfrac{M_D - \mu_D}{s_{M_D}}$ (data as on slide 26: n = 9, ∑D = 27, ∑D2 = 99, SS = 18). Think of this computation as having 3 steps:
  1. Compute variance: $s^2 = 2.25$
  2. Use the variance to compute the estimated standard error: $s_{M_D} = 0.50$
  3. Compute the t statistic: $t = \dfrac{3 - 0}{0.50} = \dfrac{3}{0.5} = 6.00$
- 32. Repeated Measures t-Test: Step 4 Make a decision • Our obtained t = 6.00 is in the critical region • Reject the null hypothesis: The background color has a significant effect on the judged attractiveness of the woman in the test photograph.
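The full walkthrough for Example 11.1 (variance, estimated standard error, t statistic, decision) can be sketched in Python as follows. The critical value 3.355 is hard-coded as the t-table entry for df = 8 with .01 in the two tails combined, which is the value used later in the deck's confidence interval. Variable names are illustrative.

```python
import math

# Difference scores (red minus white) for the nine participants.
d_scores = [3, 1, 3, 4, 3, 3, 6, 1, 3]
n = len(d_scores)
df = n - 1

m_d = sum(d_scores) / n                                      # M_D
ss = sum(d * d for d in d_scores) - sum(d_scores) ** 2 / n   # SS

variance = ss / (n - 1)              # step 1: s^2 = SS / (n - 1)
std_error = math.sqrt(variance / n)  # step 2: s_MD = sqrt(s^2 / n)
t_obtained = (m_d - 0) / std_error   # step 3: mu_D = 0 under H0

# Step 4: two-tailed decision at alpha = .01 (table value, assumed input).
T_CRIT = 3.355
reject_null = abs(t_obtained) > T_CRIT

print(variance, std_error, t_obtained, reject_null)
```

If a statistics library is available, the same t value should also come out of a paired-samples routine run on the raw white and red ratings; the hand-built version above is kept to the standard library so each step stays visible.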
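- 33. Related Samples t-Test: Effect Size
  • Cohen’s d: $d = \dfrac{M_D}{s} = \dfrac{3.00}{\sqrt{2.25}} = \dfrac{3.0}{1.5} = 2.00$
  • r2: $r^2 = \dfrac{t^2}{t^2 + df} = \dfrac{6^2}{6^2 + 8} = \dfrac{36}{36 + 8} = \dfrac{36}{44} = 0.818$
  • 81.8% of the variability in scores can be explained by the treatment effects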
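Both effect-size measures follow directly from quantities already computed in the example. A minimal Python sketch, with the inputs (MD = 3.00, s² = 2.25, t = 6.00, df = 8) copied from the worked example and variable names chosen for illustration:

```python
import math

# Values from the worked example.
m_d = 3.00       # mean difference score
variance = 2.25  # sample variance of the difference scores
t = 6.00         # obtained t statistic
df = 8           # degrees of freedom

# Cohen's d: mean difference divided by the sample standard deviation.
cohens_d = m_d / math.sqrt(variance)

# r squared: proportion of variability accounted for by the treatment.
r_squared = t ** 2 / (t ** 2 + df)

print(cohens_d, round(r_squared, 3))
```

Here r² works out to 36/44, so roughly 82% of the variability in the difference scores is accounted for by the treatment.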
- 34. Related Samples t-Test: Effect Size • Confidence Intervals: $\mu_D = M_D \pm t \, s_{M_D} = 3 \pm 3.355(0.50) = 3 \pm 1.6775$ • Our confidence interval is: $3 - 1.6775 = 1.3225$ to $3 + 1.6775 = 4.6775$ • We are 99% confident that for the general population of men, changing the background color from white to red increases the average attractiveness rating for the woman in the photograph by between 1.323 and 4.678 points.
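The interval arithmetic can be sketched in Python as below. The inputs are the example's MD = 3.00 and estimated standard error 0.50, plus the table value t = 3.355 for 99% confidence with df = 8; the variable names are illustrative.

```python
# Values from the worked example; T_CRIT is the t-table entry for
# df = 8 with .01 in the two tails combined (99% confidence).
m_d = 3.00
std_error = 0.50
T_CRIT = 3.355

margin = T_CRIT * std_error  # t * s_MD
lower = m_d - margin
upper = m_d + margin

print(f"99% CI for mu_D: {lower:.4f} to {upper:.4f}")
```

Because the whole interval lies above zero, it agrees with the decision to reject the null hypothesis at α = .01.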
- 35. Reporting the Results Changing the background color from white to red increased the attractiveness rating of the woman in the photograph by an average of M = 3.00 points with SD = 1.50. The treatment effect was statistically significant, t(8) = 6.00, p < .01, d = 2.00. When using SPSS, you will have the exact probability, which you would report instead of “p < .01”. In this case SPSS displays p = .000, which means the probability is smaller than .001 and is reported as: t(8) = 6.00, p < .001, d = 2.00. If you are stating r2 to describe the effect size, then you would state: …The treatment effect was statistically significant, t(8) = 6.00, p < .001, r2 = .818
- 36. CHAPTER 11.4 Uses and Assumptions for Related-Samples t Tests
- 37. The Advantages of Repeated-Measures Design • Requires fewer participants than the Independent Samples Design • Each individual is measured twice • A more efficient use of participants’ time • Able to observe changes over time • Each individual is measured once and then again at a later time • Allows a longitudinal observation • Eliminates problems caused by individual differences • E.g., age, IQ, gender, personality, environmental differences
- 38. The Issue of Individual Differences • Eliminates the concern that any changes between groups could be attributed to something other than the treatment • Potential confounds that may “muddy” the results
- 39. A Disadvantage of the Repeated Measures Design • Time related factors • Changes in environment may produce changes between T1 and T2 • For developmental research: • Puberty causes changes in attitudes and behaviors between T1 and T2 • Attrition due to illness or death • Order effects • Participation during T1 influences performance in T2 • Practice effects: You are now familiar with the measurement utilized in the study, and therefore may be better the second time • Fatigue (mental = boredom; physical = tiredness) may decrease scores in T2 How do we control for these potential confounds?
- 40. Counterbalancing Participants randomly divided into two groups • Group 1 receives Treatment A first, then Treatment B • Sees photo on red background first, then on white background • Group 2 receives Treatment B first, then Treatment A • Sees photo on white background first, then on red background Goal: • Distribute any outside effects evenly over the two treatments. • Practice effects are distributed equally between treatments
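One way to carry out the random split for counterbalancing is sketched below in Python. The function name `counterbalance`, the group labels, and the participant IDs are all hypothetical; the slides specify only that participants are randomly divided into two groups that receive the treatments in opposite orders.

```python
import random


def counterbalance(participants, seed=None):
    """Randomly split participants into two treatment orders.

    Returns a dict with 'A_then_B' and 'B_then_A' lists. With an odd
    number of participants the second group gets the extra person.
    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)  # seed only to make the split reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A_then_B": shuffled[:half], "B_then_A": shuffled[half:]}


# Illustrative participant IDs; treatment A = red first, B = white first.
groups = counterbalance(["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"], seed=1)
print(groups)
```

Because order is assigned at random, any practice or fatigue effect is expected to fall on both treatments about equally rather than systematically favoring whichever condition comes second.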
- 41. Assumptions of the Related Samples t Test There are two basic assumptions: 1. The observations within each treatment condition must be independent 2. The population distribution of difference scores (D values) must be normal (the “Normality” assumption)