1. What is a codebook? Who might create one, what would she or he include in it, and what purpose or purposes would it serve?
2. Explain the distinction between variable labels and value labels in an electronic dataset.
3. True or False: The linear regression model cannot handle curvilinear relationships between independent and dependent variables.
4. True or False: By convention, if we conduct a statistical hypothesis test and obtain a p-value of .3, we would reject the null hypothesis.
5. Suppose you have two SPSS datasets. The first contains the variables ID, X1, X2, and X3 for participants 1 through 100; the second contains the variables ID, X4, X5, and X6 for the same 100 participants. Suppose that the datasets are named EvalPre.sav and EvalPost.sav, and are saved on your computer in following file location:
C:\Documents and Settings\Evaluation\EvalData\
And suppose, finally, that you want to combine these datasets to create a new dataset, to be named EvalPrePost.sav, containing ID and X1 through X6 for all 100 participants. What SPSS syntax would you use to accomplish this?
6. A research team is studying cognitive decline in old age. They collect data on 300 people between the ages of 75 and 95 years. One of the key variables is a measure of one particular aspect of cognitive functioning: Executive function (named EXFUNC in the dataset). For this study it is measured using a test that produces values ranging from 0 to 100, with higher values representing better executive function. The investigators fit a linear regression model to their data and obtain the following estimated model:
EXFUNCi = 161.73 – 1.05AGEi + ei
According to this model, by how many points does the typical score on the executive function scale decline between age 80 and 90?
7. Suppose your boss gives you a dataset and asks you to run frequencies on the variables X1 and X4, and descriptive statistics on the variables X2, X3, X5, and X6. What SPSS syntax would you use to accomplish this task? (Please present only the command(s) that generate the frequencies and descriptive statistics.)
8. Suppose your dataset has a variable, X1, that was derived from a questionnaire item with a response options ranging from Strongly Disagree (coded 1) to Strongly Agree (coded 5). Because the wording of this item runs in the opposite direction of the wording of several related items, you want to create a reverse-coded version of this variable on which Strongly Disagree will be coded 5 while Strongly Agree will be coded 1. What SPSS syntax would you use to accomplish this task?
9. An investigator interested in regional differences in breastfeeding attitudes and practices conducts a national survey. The survey includes a multi-item instrument measuring breastfeeding attitudes. The resulting breastfeeding attitudes scale takes values ranging from 1 to 5, with higher numbers representing more favorable attitudes toward breastfeeding. This scale score is name ...
A Critique of the Proposed National Education Policy Reform
1. What is a codebook Who might create one, what would she or h.docx
1. 1. What is a codebook? Who might create one, what would she
or he include in it, and what purpose or purposes would it
serve?
2. Explain the distinction between variable labels and value
labels in an electronic dataset.
3. True or False: The linear regression model cannot handle
curvilinear relationships between independent and dependent
variables.
4. True or False: By convention, if we conduct a statistical
hypothesis test and obtain a p-value of .3, we would reject the
null hypothesis.
5. Suppose you have two SPSS datasets. The first contains the
variables ID, X1, X2, and X3 for participants 1 through 100; the
second contains the variables ID, X4, X5, and X6 for the same
100 participants. Suppose that the datasets are named
EvalPre.sav and EvalPost.sav, and are saved on your computer
in following file location:
C:Documents and SettingsEvaluationEvalData
And suppose, finally, that you want to combine these datasets to
create a new dataset, to be named EvalPrePost.sav, containing
ID and X1 through X6 for all 100 participants. What SPSS
syntax would you use to accomplish this?
6. A research team is studying cognitive decline in old age.
They collect data on 300 people between the ages of 75 and 95
years. One of the key variables is a measure of one particular
aspect of cognitive functioning: Executive function (named
EXFUNC in the dataset). For this study it is measured using a
2. test that produces values ranging from 0 to 100, with higher
values representing better executive function. The investigators
fit a linear regression model to their data and obtain the
following estimated model:
EXFUNCi = 161.73 – 1.05AGEi + ei
According to this model, by how many points does the typical
score on the executive function scale decline between age 80
and 90?
7. Suppose your boss gives you a dataset and asks you to run
frequencies on the variables X1 and X4, and descriptive
statistics on the variables X2, X3, X5, and X6. What SPSS
syntax would you use to accomplish this task? (Please present
only the command(s) that generate the frequencies and
descriptive statistics.)
8. Suppose your dataset has a variable, X1, that was derived
from a questionnaire item with a response options ranging from
Strongly Disagree (coded 1) to Strongly Agree (coded 5).
Because the wording of this item runs in the opposite direction
of the wording of several related items, you want to create a
reverse-coded version of this variable on which Strongly
Disagree will be coded 5 while Strongly Agree will be coded 1.
What SPSS syntax would you use to accomplish this task?
9. An investigator interested in regional differences in
breastfeeding attitudes and practices conducts a national survey.
The survey includes a multi-item instrument measuring
breastfeeding attitudes. The resulting breastfeeding attitudes
scale takes values ranging from 1 to 5, with higher numbers
representing more favorable attitudes toward breastfeeding.
This scale score is named BRATT in the dataset. The dataset
also includes a variable named region which takes the following
values: 1 = Northeast, 2 = Southeast, 3 = Midwest, 4 =
3. Southwest, 5 = Rocky Mountains, and 6 = West Coast. The
investigator creates a set of dummy variables and runs a linear
regression model with the breastfeeding attitudes scale as the
dependent variable. The estimated model is
BRATTi = 3.68 + 0.57NORTHEASTi – 0.13SOUTHEASTi –
0.41SOUTWESTi – 0.04ROCKIESi + 0.77WESTCOASTi + ei
According to this model, what is the mean score on the
breastfeeding attitudes scale for respondents residing in the
Midwest?
10. Suppose you have a dataset with items X7, X8, X9, and
X10 and you want to sum these items to create a scale score
with variable name SumX. What SPSS syntax would you use to
accomplish this task?
11. Suppose you have a dataset that contains a variable, named
BMI, that gives measured BMI values for 100 adults. And
suppose that you want to create a new variable, BMI3CAT, that
categorizes participants as normal, overweight, or obese
according to the following scheme.
BMI
BMI3CAT
< 25
1
≥ 25 and < 30
2
≥ 30
3
What SPSS syntax would you use to accomplish this?
12. Some researchers claim that exclusive breastfeeding of
infants from birth to six months of age can boost a child’s
4. intelligence. Others are skeptical and believe that previous
findings to this effect may be attributable to confounding by
maternal socioeconomic status. That is, higher socioeconomic
status mothers may be more likely to practice exclusive
breastfeeding through six months of age; and high material
socioeconomic status may contribute to the development of
intelligence in the child through mechanisms other than
breastfeeding. An investigator studying these issues has a
dataset on 1422 mother-child dyads. The dataset contains the
following key variables: CHILDIQ, the intelligence of the child
as measured via the Stanford-Binet IQ test age age 6 years, with
higher scores indicating greater intelligence; BRSTFD, the
mother’s self-report of whether or not she breastfed the child
exclusively through six months of age (0 = no, 1 = yes); and
MOMSES, an index of maternal socioeconomic status derived
from information about educational attainment, income, and her
own parents’ occupations. The investigator finds that the mean
score on Stanford-Binet IQ test was 107.32 among the 454
children who were exclusively breastfed for six months; and
102.55 for the 968 children who were not exclusively breast fed
through six months of age. Thus, the average breastfed child
had an IQ 4.77 points higher than the average non-breastfed
child. To determine the extent to which this difference could
attributable to confounding by maternal socioeconomic status
rather than to an actual effect of breastfeeding on intelligence,
the investigator next runs the following linear regression model:
If the difference is due in part to confounding by maternal SES,
compare to the raw difference of 4.77?
13. True or False: A boxplot is a useful way of examining the
distribution of a dichotomous variable.
5. 14. Suppose you have a dataset that includes the variable
BMI3CAT as described in question 8, and you wish to create a
bar chart that shows how many people fall into the three
categories: normal, overweight, and obese. What SPSS syntax
would you use to obtain that bar chart?
15. Suppose you are trying to use linear regression analysis to
determine whether the effect of one variable, X1, on another
variable, Y, depends upon the value taken by a third variable,
X2. What type of term should you include in your regression
model?
A. A curvilinear term
B. A logistic term
C. An interaction term
D. An orthogonal term
E. None of the above
16. True or False: A cross-tabulation is a useful way of
examining how two categorical variables are related.
17. Suppose you have obtained from SPSS the correlation
matrix in Appendix 1. According to the information in this
matrix, which two variables exhibit the strongest linear
relationship?
18. In your own words, what is “confounding” and why is it
sometimes a problem in observational studies?
19. Suppose that your agency has been evaluating an
intervention using a posttest-only control group design. There
were 155 people in the treatment group and 140 in the control
group. The outcome variable is continuous and, according to
boxplots, appears to follow a bell-shaped distribution with
similar variances in the treatment and control groups. What
statistical test would be most appropriate for testing the null
6. hypothesis of no intervention effect?
20. Suppose your agency has been evaluating an intervention
using a one-group pretest-posttest design with 20 participants.
The focal variable is continuous but, in looking at the boxplots,
you see that it is skewed heavily to the left both before and
after the intervention. What statistical test would be most
appropriate for testing the null hypothesis of no intervention
effect (i.e., no change from before to after the intervention)?
21. Suppose you wish to include a nominal variable that takes
four different values as an independent variable in a logistic
regression model. How many dummy variables should you
include in your logistic regression model in order to accomplish
this?
22. True or False: ANCOVA is often used to test the null
hypothesis of no intervention effect in the context of an impact
evaluation using pretest-posttest control-group design with a
continuous dependent variable.
23. Suppose your agency has been evaluating an intervention
using a one-group pretest-posttest design with 200 participants.
The focal variable is dichotomous. What statistical test would
be most appropriate for testing the null hypothesis of no
intervention effect (i.e., no change from before to after the
intervention)?
24. A linear regression model with a single dummy variable
predicting a continuous dependent variable is equivalent to
which of the following statistical tests?
A. Fisher’s Exact Test
B. Independent samples t-test (unequal variances version)
C. Paired t-test
D. Mann-Whitney U test
7. E. Independent samples t-test (equal variances version)
25. Suppose that your agency has been evaluating an
intervention using a posttest-only control group design. There
were 95 people in the treatment group and 111 in the control
group. The outcome variable takes the following three ordinal
values: normal, overweight, and obese. What statistical test
would be most appropriate for testing the null hypothesis of no
intervention effect?
26. True or False: When running a two-sample t-test, if the
Levene’s test gives a p-value of .023, you should look at the
equal variances rather than the unequal variances version of the
t-test.
27. True or False: Before you run a logistic regression model,
you must first use a COMPUTE command to enact the log-odds
or logit transformation on the dichotomous dependent variable.
28. Suppose you want to conduct a chi-square test of
independence and Fisher’s exact test for the relationship
between two dichotomous variables. The first variable, named
TREAT, is coded 0 for control group members and 1 for
treatment group members. The second variable, named
POSTSMOKE, indicates smoking status assessed one month
after the intervention being evaluated; it is coded 0 for
participants who were not smoking, and 1 for participants who
were smoking, at that time. What SPSS syntax would you use
to obtain these hypothesis tests?
29. Suppose you want to conduct a paired t-test to see if post-
intervention knowledge scores, KNOWPOST, differ
significantly on average from pre-intervention knowledge
scores, KNOWPRE, in a dataset containing information on
people exposed to the intervention. What SPSS syntax would
you use to obtain the paired t-test?
8. 30. What do you get when you exponentiate a logistic
regression coefficient?
A. A relative risk
B. A hazard ratio
C. A quadratic term
D. An odds ratio
E. An intercept
31. Suppose you are want to conduct a Mann-Whitney U test to
test for an effect of an intervention on a continuous but highly
skewed outcome in a small scale experiment using a posttest-
only control group design. The variable TREAT is coded 0 for
control group members and 1 for treatment group members.
The continuous but skewed outcome variable is a measure of
blood glucose; in the dataset it is named GLUCOSE. What
SPSS syntax would you use to obtain the Mann-Whitney U test?
32. Suppose that the analysis of data from an impact evaluation
using a posttest-only control group design with a dichotomous
outcome variable, SICKPOST, resulted in the SPSS output
appearing in Appendix 2. How would you quantify the
estimated effect of the intervention in terms of a relative risk,
and do the statistical tests provide support for the effectiveness
of this intervention?
33. Suppose that you a part of a team that has been analyzing
data from an impact evaluation that a one-group pretest-posttest
design. Measures of social support were obtained before and
after the intervention using the same instrument. They are
continuous. The pre-intervention social support variable
appears in the dataset as SSPRE, and the post-intervention
appears as SSPOST. A paired t-test was conducted and you
have been presented with the output appearing in Appendix 3.
How would you quantify the estimated effect of the intervention
9. in terms of a mean difference, and do the statistical tests
provide support for the effectiveness of this intervention?
34. Suppose you have a dataset containing survey data on 783
adolescents. For each adolescent, you have a variable
EVERSEX that indicates whether she or he has ever had sexual
intercourse (0 for no, 1 for yes). You’re interested in how the
likelihood of sexual activity varies in relation to three
variables: perceived parental disapproval of teen sexual
activity, perceived peer norms valuing sexual activity, and age.
Perceived parental disapproval is a scale score derived from
multiple Likert-type questionnaire items, and is named PARDIS
in your dataset. Perceived peer norms is also a scale score, and
is named PEERNORM. Age is a continuous variable computed
as the difference between the date of data collection and each
respondent’s date of birth, and is named AGE in your dataset.
What SPSS syntax would you use to run a logistic regression
model with these three independent variables predicting
EVERSEX?
35. Suppose that your team has been analyzing data from an
impact evaluation that used a posttest-only control group
design. There were 227 participants in the treatment group, and
236 in the control group. The key outcome is a scale score
measuring social support. In the dataset, the experimental
group variable is named TREAT, and takes the value 0 for
control group members and 1 for treatment group members;
while the outcome variable is named SOCSUPP. A member of
your team used SPSS to conduct a two-sample t-test, and you
have been presented with the output appearing in Appendix 4.
How would you quantify the estimated effect of the intervention
in terms of a mean difference, and does the statistical test
provide support for the effectiveness of the intervention?
36. One more question. Suppose your team has been analyzing
data from an impact evaluation that used a one-group pretest-
10. posttest design. In the evaluation 50 participants were
classified as either sick or not sick both before and after the
intervention. The pre-intervention measure appears in the
dataset as a variable named SICKPRE, taking values 0 for not
sick and 1 for sick. The post-intervention measure uses the
same coding scheme and is named SICKPOST. A member of
your team has used McNemar’s test to see whether the
proportion of participants who were sick declined significantly
over the course of the intervention, and you have been presented
with the output appearing in Appendix 5. Looking at this
output, did the proportion classified as sick increase or decrease
over the course of the intervention, and was this change
statistically significant?
Appendix 1. SPSS Output for Question 17.
X1
X2
X3
X1
Pearson Correlation
1
.303**
-.441**
12. 500
500
500
** Correlation is significant at the .01 level (2-tailed).
* Correlation is significant at the .05 level (2-tailed).
Appendix 2. SPSS Output for Question 32.
TREAT*SICKPOST Crosstabulation
SICKPOST
0.00 Not Sick
1.00 Sick
Total
TREAT
0.00 Control
Count
10
90
100
% within TREAT
10.0%
90.0%
100.0%
13. 1.00 Treated
Count
30
70
100
% within TREAT
30.0%
70.0%
100.0%
Total
Count
40
160
200
% within TREAT
20.0%
80.0%
100.0%
Chi-Square Tests
Value
df
Asymp. Sig.
(2-sided)
Exact Sig.
(2-sided)
Exact Sig.
15. a. 0 cells (.0%) have expected counts less than 5. The
minimum expected count is 10.00.
b. Computed only for a 2x2 table.
Appendix 3. SPSS Output for Question 33
Paired Samples Statistics
Mean
N
Std. Deviation
Std. Error Mean
Pair 1
SSPOST
18.5150
200
3.51876
.24881
SSPRE
18.0200
200
3.45678
.24443