Please write up and interpret the results for the following repeated measures ANOVA, using the Activity 6.sav data set. Score_0 through Score_12 are the repeated measure (7 levels) and gender is a fixed factor. Discuss especially both main effects and the presence/absence of an interaction between the two.
All of the relevant data is given below.
Within-Subjects Factors
Measure: MEASURE_1
Score   Dependent Variable
1       Score_0
2       Score_2
3       Score_4
4       Score_6
5       Score_8
6       Score_10
7       Score_12
Between-Subjects Factors
             Value Label   N
Gender   F   Female        8
         M   Male          4
Descriptive Statistics
                  Gender   Mean    Std. Deviation   N
Pre-test score    Female   28.25   8.172            8
                  Male     32.25   19.432           4
                  Total    29.58   12.221           12
Week 2 score      Female   29.75   6.319            8
                  Male     39.75   13.889           4
                  Total    33.08   10.113           12
Week 4 score      Female   33.63   5.181            8
                  Male     39.00   16.432           4
                  Total    35.42   9.885            12
Week 6 score      Female   35.88   6.556            8
                  Male     35.25   17.802           4
                  Total    35.67   10.671           12
Week 8 score      Female   39.38   5.370            8
                  Male     41.00   16.633           4
                  Total    39.92   9.718            12
Week 10 score     Female   44.88   5.743            8
                  Male     47.25   13.961           4
                  Total    45.67   8.690            12
Week 12 score     Female   48.38   8.518            8
                  Male     53.25   13.793           4
                  Total    50.00   10.189           12
Multivariate Tests (a)
Effect                                Value    F            Hypothesis df   Error df   Sig.
Score            Pillai's Trace       .961     20.439 (b)   6.000           5.000      .002
                 Wilks' Lambda        .039     20.439 (b)   6.000           5.000      .002
                 Hotelling's Trace    24.526   20.439 (b)   6.000           5.000      .002
                 Roy's Largest Root   24.526   20.439 (b)   6.000           5.000      .002
Score * Gender   Pillai's Trace       .491     .804 (b)     6.000           5.000      .607
                 Wilks' Lambda        .509     .804 (b)     6.000           5.000      .607
                 Hotelling's Trace    .965     .804 (b)     6.000           5.000      .607
                 Roy's Largest Root   .965     .804 (b)     6.000           5.000      .607
a. Design: Intercept + Gender
Within Subjects Design: Score
b. Exact statistic
Mauchly's Test of Sphericity (a)
Measure: MEASURE_1
                                                                  Epsilon (b)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
Score                    .001          56.876               20   .000   .441                 .674          .167
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept + Gender
Within Subjects Design: Score
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
Tests of Within-Subjects Effects
Measure: MEASURE_1
Source                                Type III Sum of Squares   df       Mean Square   F        Sig.
Score            Sphericity Assumed   3246.536                  6        541.089       20.609   .000
                 Greenhouse-Geisser   3246.536                  2.646    1227.164      20.609   .000
                 Huynh-Feldt          3246.536                  4.045    802.659       20.609   .000
                 Lower-bound          3246.536                  1.000    3246.536      20.609   .001
Score * Gender   Sphericity Assumed   182.155                   6        30.359        1.156    .342
                 Greenhouse-Geisser   182.155                   2.646    68.853        1.156    .341
                 Huynh-Feldt          182.155                   4.045    45.035        1.156    .344
                 Lower-bound          182.155                   1.000    182.155       1.156    .307
Error(Score)     Sphericity Assumed   1575.321                  60       26.255
                 Greenhouse-Geisser   1575.321                  26.456   59.546
                 Huynh-Feldt          1575.321                  40.447   38.948
                 Lower-bound          1575.321                  10.000   15.
Order 4   266.594   10   26.659
Order 5   88.403    10   8.840
Order 6   27.330    10   2.733
Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average
Source      Type III Sum of Squares   df   Mean Square   F         Sig.
Intercept   114349.339                1    114349.339    188.733   .000
Gender      290.720                   1    290.720       .480      .504
Error       6058.804                  10   605.880
a. Is the assumption of sphericity violated? How can you tell?
What does this mean in the context of interpreting the results?
Mauchly's Test of Sphericity (a)
Measure: MEASURE_1
                                                                  Epsilon (b)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
Score                    .001          56.876               20   .000   .441                 .674          .167
Tests the null hypothesis that the error covariance matrix of the
orthonormalized transformed dependent variables is
proportional to an identity matrix.
a. Design: Intercept + Gender
Within Subjects Design: Score
b. May be used to adjust the degrees of freedom for the
averaged tests of significance. Corrected tests are displayed in
the Tests of Within-Subjects Effects table.
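The table above shows the results of Mauchly's Test of
Sphericity, which tests one of the assumptions of the ANOVA
with repeated measures, namely sphericity (equal variances of
the differences between all pairs of repeated measures). This
table is worth checking because the assumption is commonly
violated. Here Mauchly's W = .001, χ²(20) = 56.876, p < .001.
Since the p-value is less than .05, I conclude that the variances
of the differences between time points differ significantly.
Therefore, the condition of sphericity has not been met. In the
context of interpreting the results, this means the
sphericity-assumed F-tests for the within-subjects effects are too
liberal, so a corrected test (for example Greenhouse-Geisser,
ε = .441) should be reported instead.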
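As a rough illustration of what the correction does, the Greenhouse-Geisser adjustment multiplies both degrees of freedom by epsilon. The sketch below uses the rounded values from the tables above; the function name is hypothetical and not part of SPSS.

```python
# Minimal sketch: Greenhouse-Geisser corrected degrees of freedom.
# Values are copied from the SPSS tables above; names are illustrative only.
EPSILON_GG = 0.441   # Greenhouse-Geisser epsilon (rounded, from Mauchly's table)
DF_EFFECT = 6        # sphericity-assumed df for Score (7 levels - 1)
DF_ERROR = 60        # sphericity-assumed df for Error(Score)


def corrected_df(epsilon, df_effect, df_error):
    """Scale both degrees of freedom by epsilon; the F value itself is unchanged."""
    return epsilon * df_effect, epsilon * df_error


df1, df2 = corrected_df(EPSILON_GG, DF_EFFECT, DF_ERROR)
print(f"Corrected test: F({df1:.3f}, {df2:.3f})")
```

The error df computed this way differs slightly from the 26.456 that SPSS prints, presumably because SPSS uses the unrounded epsilon.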
b. Is there a main effect of gender? If so, explain the effect. Use
post hoc tests when necessary (or explain why they are not
required in this specific case).
In this case, there is no main effect of gender, F(1, 10) = .480,
p = .504, which is not significant at the 5% level. Because the
effect is not significant, there is no need for a post hoc test.
Moreover, even if the effect were significant, a post hoc test
would not be required because gender has only two categories,
so the F-test itself already compares them. Post hoc tests are
needed only when a factor has more than two levels.
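The F value for gender can be checked by hand from the Tests of Between-Subjects Effects table, since F is the ratio of the effect mean square to the error mean square. A minimal sketch, with a hypothetical helper name:

```python
# Minimal sketch: reproduce the between-subjects F ratio for Gender.
# Mean squares are copied from the Tests of Between-Subjects Effects table.
MS_GENDER = 290.720
MS_ERROR = 605.880


def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of effect mean square to error mean square."""
    return ms_effect / ms_error


f_gender = f_ratio(MS_GENDER, MS_ERROR)
print(f"F(1, 10) = {f_gender:.3f}")
```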
c. Is there a main effect of time (i.e. an increase in scores from
Week 0 to Week 12)? If so, explain the effect. Use post hoc
tests when necessary (or explain why they are not required in
this specific case). Examine the output carefully and give as
much detail as possible in your findings.
Tests of Within-Subjects Effects
Measure: MEASURE_1
Source                       Type III Sum of Squares   df      Mean Square   F        Sig.
Score   Sphericity Assumed   3246.536                  6       541.089       20.609   .000
        Greenhouse-Geisser   3246.536                  2.646   1227.164      20.609   .000
        Huynh-Feldt          3246.536                  4.045   802.659       20.609   .000
4   15.250*   2.711   .000   9.209   21.291
5   10.625*   2.065   .000   6.024   15.226
6   4.750*    1.705   .019   .952    8.548
Based on estimated marginal means
a. Adjustment for multiple comparisons: Least Significant
Difference (equivalent to no adjustments).
*. The mean difference is significant at the .05 level.
The main effect of score is significant at the 5% level of
significance. From the table, I am able to ascertain the F-value
for the score factor, its associated significance level, and the
effect size (Partial Eta Squared). Because my data violated the
assumption of sphericity, I examine the values in the
Greenhouse-Geisser row (if sphericity had not been violated, I
would have looked under the Sphericity Assumed row). Thus, I
can report that when using an ANOVA with repeated measures
with a Greenhouse-Geisser correction, the mean scores differed
statistically significantly across weeks, F(2.646, 26.456) =
20.609, p < .001.
In addition, in looking at the Pairwise Comparisons table
above, I recognize the labels associated with score in the
experiment from the Within-Subjects Factors table. This is a
table which gives the significance level of differences between
the individual time points. It can be seen that there was a
significant difference in scores from the pre-test to Week 12.
The p-values indicate which pairs of time points differ
significantly.
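Where Partial Eta Squared is not printed in the output, it can be approximated from the sums of squares in the Tests of Within-Subjects Effects table as SS_effect / (SS_effect + SS_error). The sketch below assumes that standard definition; the function name is hypothetical.

```python
# Minimal sketch: partial eta squared from Type III sums of squares.
# Sums of squares are copied from the Tests of Within-Subjects Effects table.
SS_SCORE = 3246.536          # Score
SS_INTERACTION = 182.155     # Score * Gender
SS_ERROR = 1575.321          # Error(Score)


def partial_eta_squared(ss_effect, ss_error):
    """Proportion of effect-plus-error variability attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


eta_score = partial_eta_squared(SS_SCORE, SS_ERROR)
eta_interaction = partial_eta_squared(SS_INTERACTION, SS_ERROR)
print(f"Score: {eta_score:.3f}, Score * Gender: {eta_interaction:.3f}")
```

The sums of squares are the same in every correction row, so this value does not depend on which sphericity correction is reported.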
LAB 2:
Descriptive Statistics
Descriptive statistics are numerical estimates that organize and
summarize or present the data.
For quantitative variables (scale):
Mean with standard deviation is used to summarize non-skewed
scale variables
Median with range or interquartile range is used to summarize
skewed scale variables
The three steps to evaluate the normality assumption are:
Compare the statistics values (mean versus median)
Obtain the histogram with normal curve
Obtain the Box-Whiskers plot
For this class,
If there are any extreme outliers, median with range should be
used to summarize the variable of interest
If there are any regular outliers, you need to base your
decision regarding the best measure (mean with SD or median
with range) to summarize the variable of interest on the shape
of the histogram
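The decision rule above can be sketched in code. This is only an illustration under stated assumptions: it uses the common boxplot convention that regular outliers lie more than 1.5 IQR beyond the quartiles and extreme outliers more than 3 IQR beyond them, and Python's "inclusive" quartile method, which may differ slightly from the hinges SPSS uses. The glucose values and the `histogram_looks_normal` flag are hypothetical.

```python
import statistics


def classify_outliers(values):
    """Return (regular, extreme) outliers using 1.5*IQR and 3*IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    regular, extreme = [], []
    for v in values:
        distance = max(q1 - v, v - q3)  # distance outside the box (<= 0 if inside)
        if distance > 3 * iqr:
            extreme.append(v)
        elif distance > 1.5 * iqr:
            regular.append(v)
    return regular, extreme


def choose_summary(values, histogram_looks_normal):
    """Apply the class rule: extreme outliers or a skewed histogram -> median."""
    _, extreme = classify_outliers(values)
    if extreme or not histogram_looks_normal:
        return "median (range)"
    return "mean ± SD"


# Hypothetical glucose readings with one very large value.
glucose = [4.8, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 12.0]
print(classify_outliers(glucose))
print(choose_summary(glucose, histogram_looks_normal=True))
```

The histogram judgment stays a manual input here because the handout treats it as a visual check.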
Introduction
For qualitative variable (nominal or ordinal)
Frequency distributions (number with percentages) are used to
summarize qualitative variables
Descriptive statistics for multiple groups:
Use Split file option in SPSS to obtain the measures of central
tendency and the measures of variation for quantitative
variables.
After you split your file by the grouping variable, you should
follow the previous steps to select the most appropriate
measures to summarize your variable of interest.
Please note that you have to un-split the data before running
further analysis
Use Crosstabs option in SPSS to obtain the frequency
distributions (number with percentages) for the qualitative
variables
Types of variables
Continuous (Quantitative) Variables
Qualitative (Categorical) Variables
Nominal/ Ordinal
Interval/Ratio
Number and Percent
N (%)
Normal distribution
1- Statistics {Mean and Median}
2- Histogram with Normal Curve
3- Box-Whiskers Plot
number of people from Cornwall, Ontario, Canada who attended
a lifestyle intervention program (Coronary Health Improvement
Project, or better known as CHIP) consisting of a series of
lectures and personal counseling sessions every day for a five
day period. This data set consists of a number of demographic
and clinical variables. Create the lifestyle dataset using the
table below and answer the following questions.
Variable View
Data View
Table 1
Variable
Mean ± SD
Median (Range)
N (%)
Age
Exercise
Smoke
Weight
Glucose
Question1: Summarize using the most appropriate measure the
following variables presented in the Table 1.1. Choose either
the mean, median or n(%) as the most appropriate measure for
each variable.
Answer: 1.1
For quantitative variables:
Step1: Compare statistics values (mean versus median) for all
variables
Step2: Evaluate the normal curve
Answer: 1.1
For quantitative variables:
Step3: Evaluate the Box - Whiskers Plot
Normality Assumption Checklist
Variable   Statistics (mean vs. median)   Histogram with normal curve   Box-Whiskers Plot                           Decision
Age        Close                          Normal                        One regular outlier (no extreme outliers)   Mean ± SD
Weight     Close                          Normal                        No outliers                                 Mean ± SD
Glucose    Close                          Skewed                        Extreme outlier                             Median (Range)
For qualitative variables
Question1: Summarize using the most appropriate measure the
following variables presented in the Table 1.1.
Question2: Summarize using the most appropriate measure the
following variables presented in the Table 1.2.
Step1: Split file by grouping variable (Gender)
Question2: Summarize using the most appropriate measure the
following variables presented in the Table 1.2.
Step2: Compare statistics values (mean versus median) for all
variables
Question2: Summarize using the most appropriate measure the
following variables presented in the Table 1.2.
Step3: Evaluate the normal curve
Male
Female
Question2: Summarize using the most appropriate measure the
following variables presented in the Table 1.2.
Step4: Evaluate the Box - Whiskers Plot
Male
Female
Question2: Summarize using the most appropriate measure the
following variables presented in the Table 1.2.
Note: Un-split the data file before running further analysis
Question3: Summarize using the most appropriate measure the
following variables presented in the Table 1.3.
Qualitative Variables
Male : n (%)
Female : n (%)
Frame Small
Medium
Large
Exercise: None
Mild
Moderate
Vigorous
Question3: Summarize using the most appropriate measure the
following variables presented in the Table 1.3.
Step1: Use Crosstabs option in SPSS to obtain the frequency
distributions for the qualitative variable by the grouping
variable
Question3: Summarize using the most appropriate measure the
following variables presented in the Table 1.3.
Qualitative Variables
Male : n (%)
Female : n (%)
Frame Small
0 (0.0)
0 (0.0)
Medium
4 (40.0)
6 (60.0)
Large
0 (0.0)
Males n =
Females n =
Quantitative Variables
Mean ± SD Median (Range)
Mean ± SD Median (Range)
Age
Baseline Weight
Baseline Glucose
Quantitative Variables   Males (n = 10)   Females (n = 10)
Age                      56.1 ± 14.3      47.1 ± 9.6
Baseline Weight          189.3 ± 35.8     150.5 (132)
Baseline Glucose         5.5 (5)          5.01 (1.7)
Values are Mean ± SD or Median (Range).
Lab Practice
1. How many single males are in this study? 6
MARITAL STATUS * GENDER Crosstabulation
                                     MALE     FEMALE   Total
SINGLE     Count                     6        12       18
           % within MARITAL STATUS   33.3%    66.7%    100.0%
           % within GENDER           3.6%     4.7%     4.2%
MARRIED    Count                     145      201      346
           % within MARITAL STATUS   41.9%    58.1%    100.0%
           % within GENDER           86.8%    77.9%    81.4%
DIVORCED   Count                     14       20       34
           % within MARITAL STATUS   41.2%    58.8%    100.0%
           % within GENDER           8.4%     7.8%     8.0%
WIDOWED    Count                     2        25       27
           % within MARITAL STATUS   7.4%     92.6%    100.0%
           % within GENDER           1.2%     9.7%     6.4%
Total      Count                     167      258      425
           % within MARITAL STATUS   39.3%    60.7%    100.0%
           % within GENDER           100.0%   100.0%   100.0%
2. What percent of baseline non-smokers are females? 61.7%
BASELINE SMOKING STATUS * GENDER Crosstabulation
                                                MALE     FEMALE   Total
SMOKER       Count                              18       19       37
             % within BASELINE SMOKING STATUS   48.6%    51.4%    100.0%
             % within GENDER                    10.8%    7.3%     8.7%
NON-SMOKER   Count                              149      240      389
             % within BASELINE SMOKING STATUS   38.3%    61.7%    100.0%
             % within GENDER                    89.2%    92.7%    91.3%
Total        Count                              167      259      426
             % within BASELINE SMOKING STATUS   39.2%    60.8%    100.0%
             % within GENDER                    100.0%   100.0%   100.0%
3. What percent of females are widowed? 9.7%
4. What is the median baseline weight? 171.50
Statistics
BASELINE WEIGHT (lbs)
N Valid     426
N Missing   0
Median      171.50
5. What is the males' mean baseline pulse? 69.32
Statistics
BASELINE PULSE (RESTING)
         N Valid   N Missing   Mean
MALE     167       0           69.32
FEMALE   259       0           75.16
6. What percent of the subjects have baseline glucose of 5 or
less? 33.3%
7. Create the difference in cholesterol from baseline to six weeks
and indicate the mean difference in cholesterol: -0.7007 ± 0.749
COMPUTE Chol_diff=chol2 - chol.
EXECUTE.
Statistics
Chol_diff
N Valid          426
N Missing        0
Mean             -.7007
Std. Deviation   .74867
Questions 8 - 9: Classify the study subjects BMI at baseline into
the categories indicated in Table 4.1
Table 1
BMI at Baseline   Description
≤ 18.499          Underweight
18.5 – 24.999     Normal
25 – 29.999       Overweight
≥ 30              Obesity
8. Indicate the percent of the subjects who are overweight: 41.1%
9. What percent of the female population have normal weight?
29.3%
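The classification in Table 1 can be sketched as a small function. This is an illustration only: it uses half-open cut-offs (below 18.5, below 25, below 30), which is an assumption made so that values falling between the printed bounds, such as 18.4995, still receive a category. The sample BMI values are hypothetical.

```python
from collections import Counter


def bmi_category(bmi):
    """Map a baseline BMI value to the Table 1 description."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obesity"


def category_percentages(bmi_values):
    """Percent of subjects in each category, rounded to one decimal place."""
    counts = Counter(bmi_category(b) for b in bmi_values)
    total = len(bmi_values)
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Hypothetical BMI values, not taken from the study data.
sample_bmi = [17.9, 22.4, 26.1, 27.8, 31.5]
print(category_percentages(sample_bmi))
```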
% within GENDER
100.0%
100.0%
100.0%
10. Number in study (N) Total: 426 Males: 167 Females: 259
GENDER
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   MALE     167         39.2      39.2            39.2
        FEMALE   259         60.8      60.8            100.0
        Total    426         100.0     100.0
11. BMI at baseline
Statistics
BMI_baseline
MALE     N Valid          167
         N Missing        0
         Mean             28.9953
         Std. Deviation   4.91355
         Minimum          20.88
         Maximum          49.02
FEMALE   N Valid          259
        % within MARITAL STATUS            5.6%     4.6%     14.7%    0.0%     5.2%
Total   Count                              18       346      34       26       424
        % within BASELINE EXERCISE LEVEL   4.2%     81.6%    8.0%     6.1%     100.0%
        % within MARITAL STATUS            100.0%   100.0%   100.0%   100.0%   100.0%
14. What is the best measure to summarize baseline systolic
blood pressure (mmHg)?
Median (Range): 130 (192)
Statistics
BASELINE SYSTOLIC BLOOD PRESSURE (mmHg)
N Valid          426
N Missing        0
Mean             131.14
Median           130.00
Std. Deviation   21.293
Range            192
15. Indicate which of the following are true or false:
I. In SPSS, the Transform → Compute procedure is used to create a
qualitative variable from a quantitative variable. (F)
II. A pie chart can be used to display qualitative variables only.
(T)
III. If an outlier is added to a dataset, the mean will change
more than the median. (T)
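Statement III can be checked with a quick calculation. The weights below are hypothetical values chosen only to illustrate the point.

```python
import statistics

# Hypothetical weights (lbs); 400 is an artificial outlier added for illustration.
weights = [150, 160, 170, 180, 190]
with_outlier = weights + [400]

# How far each summary moves once the outlier is added.
mean_shift = abs(statistics.mean(with_outlier) - statistics.mean(weights))
median_shift = abs(statistics.median(with_outlier) - statistics.median(weights))

print(f"Mean shifts by {mean_shift:.2f}, median shifts by {median_shift:.2f}")
```

In this example the mean moves far more than the median, which is why the median is preferred when extreme outliers are present.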
STAT LAB MIDTERM REVIEW SHEET
Following are some of the key topics covered in the first half of
STAT Lab. Use this sheet as a study guide for the midterm.
· Quantitative variables:
· Be able to compute and compare mean and median of a set of
data
· Be able to construct a histogram given a set of data
· Be able to construct a box plot given a set of data
· Be able to assess the normal distribution using statistics,
histogram and boxplot
· Be able to identify outliers
· Be able to summarize/describe a variable using the most
appropriate measure
· Qualitative variables:
· Be able to summarize qualitative variables using number and
percentage
· Be able to construct a pie chart given a set of data
· Be able to interpret row and column percentages
· How to run the following procedures in SPSS:
· Be able to run split file procedure
· Be able to run crosstabs procedure
· Be able to create a new variable using transform → compute
procedure
· Be able to create a new variable using transform → Recode
into different variables procedure
· How to interpret SPSS output of the following:
· Frequency tables
· Crosstabs
· Histograms
· Boxplots
· Pie charts
Lab Practice
1. How many single males are in this study?
_________________________________
2. What percent of baseline non-smokers are females?
_______________________________
3. What percent of females are widowed?
______________________________________
4. What is the median baseline weight?
________________________________________
5. What is the males mean baseline pulse?
_________________________________________
6. What percent of the subjects have baseline glucose of 5 or
less? ______________________
7. Create difference in cholesterol from baseline to six-weeks
and indicate the mean difference in cholesterol
_____________________________________________________
______________
Questions 8 - 9: Classify the study subjects BMI at baseline into
the categories indicated in Table 4.1
Table 1
BMI at Baseline
Description
≤ 18.499
Underweight
18.5 – 24.999
Normal
25 - 29.999
Overweight
≥ 30
Obesity
8. Indicate the percent of the subjects with overweight
_____________________
9. What percent of the female population have normal weight?
____________________
10. Number in study (N) Total……….. Males ………..
Females………………..
11. BMI at baseline Males : Mean ……… SD……….
Min……….Max………
Females: Mean ………
SD………. Min……….Max………
12. What percent of those who do not exercise at baseline are
males?
13. What percent of singles practice moderate level of baseline
exercise?
14. What is the best measure to summarize baseline systolic
blood pressure (mmHg)?
15. Indicate which of the following are true or false:
I. In SPSS, the Transform → Compute procedure is used to create a
qualitative variable from a quantitative variable
II. A pie chart can be used to display qualitative variables only.
III. If an outlier is added to a dataset, the mean will change
more than the median.
STAT 509 Lab 1 Assignment
Instructions:
1. Due Date
· April 14 @ 11:50pm
2. Assignment submission:
· When you submit an electronic file (assignment) you must use
the following format: This is for: LAST NAME (in upper case),
your first name, the course code, activity/assignment number
and file format
· For example: SMITHJohn-STAT509-A1.doc.
Assignment:
Following is the dictionary for a data set. The data were collected
on a number of people from Cornwall, Ontario, Canada who
attended a lifestyle intervention program (Coronary Health
Improvement Project, better known as CHIP) consisting of a
series of lectures and personal counseling sessions every day for
a five-day period. This data set consists of a number of
demographic and clinical variables. In order for data to be
easily analyzed, it must first be entered into a computer
database.
· Create the Lifestyle dataset for the variables below:
Id
Age
Sex
4
2
2
172
136
4.90
In Table 1.1, summarize the lifestyle data using the most
appropriate measure.
Table 1.1
Comment by Microsoft account: Choose only one cell out of 3
according to the variable type: quantitative, a) normal-looking
histogram or b) non-normal-looking histogram; qualitative,
a) nominal or b) ordinal. (-2 pt)
Variable
Mean ± SD
Median (Range)
N (%)
Age
51.60
20%
Gender
1.50(1)
20% Comment by Microsoft account: Gender, Exercise, Smoke
and Frame are Qualitative variables. Thus you need to obtain
count (n) and % for each category and report in this column:
N(%). (-2 pt)
Exercise
2.10
1.50(3)
20%
Smoke
2
2(0)
20%
Frame
2.50
20%
Weight
20%
Sysbp
20%
Glucose
20%
a) To Obtain Descriptive Statistics for the Quantitative
Variables (Mean or Median)
· From the menus choose:
Analyze → Descriptive Statistics → Frequencies → Select the
quantitative variables → Click Statistics for descriptive
statistics for quantitative variables and select mean, median,
standard deviation, and range → Click Continue then OK
b) To Obtain Histograms
· From the menus choose:
Analyze → Descriptive Statistics → Frequencies → Enter Appropriate
Variables → Click Charts → Select Histograms → Check Display Normal
Curve box → Continue then OK
c) To complete Table 1.1 for the qualitative variables, use
frequency tables:
· From the menus choose:
Construct a pie chart for each of the following variables:
· Gender
· Body Frame Size
a) To Obtain Pie Charts for Frequencies
· From the menus choose:
Analyze → Descriptive Statistics → Frequencies → Charts and
select the specific chart type → Continue then OK
Please note: Submit completed Table 1.1 and the two pie charts
Percent
Gender
Sex 2.0 1.0 1.0 2.0 2.0 1.0 1.0 2.0 2.0 2.0 1.0
1.0 2.0 1.0 2.0 1.0 2.0 2.0 1.0 1.0
Frame 2.0 2.0 3.0 2.0 3.0 2.0 2.0 3.0 3.0 2.0
3.0 3.0 2.0 3.0 2.0 3.0 3.0 2.0 3.0 2.0
LAB 2
Descriptive Statistics
Name____ _____
Following is a dictionary for a data set. The data were collected
on a number of people from Cornwall, Ontario, Canada who attended
a lifestyle intervention program (Coronary Health Improvement
Project, better known as CHIP) consisting of a series of
lectures and personal counseling sessions every day for a
five-day period. This data set consists of a number of demographic
and clinical variables. All variables ending in '2' were measured
6 weeks after the baseline variables with the same name.
The data can be found on the desktop.
Name       Description
ID         ID
AGE        AGE (yrs)
SEX        GENDER
           Value   Label
           1       MALE
           2       FEMALE
MARITAL    MARITAL STATUS
           Value   Label
           1       SINGLE
           2       MARRIED
           3       DIVORCED
           4       WIDOWED
EXERCISE   BASELINE EXERCISE LEVEL
           Value   Label
           1       NONE
           2       MILD
           3       MODERATE
           4       VIGOROUS
SMOKE      BASELINE SMOKING STATUS
           Value   Label
PUL2       PULSE (resting) - 6 weeks
LDL2       LDL CHOLESTEROL (mmol/l) - 6 weeks
HDL2       HDL CHOLESTEROL (mmol/l) - 6 weeks
LAB 2
Descriptive Statistics of Combinations of Variables
Objective:
The purpose of this lab is to summarize some of the variables in
the corn1 dataset.
1. Summarize using the most appropriate measure the following
variables presented in Table 2.1. Choose either the mean,
median or n(%) as the most appropriate measure for each
variable.
Table 2.1
Variables
Mean ± SD
Median (Range)
N (%)
Age
52.24±11.60102
Height
65.4155±3.67614
Baseline Weight
175.446±39.4006
37 (8.69%)
389 (91.31%)
To complete Table 2.1, refer to Lab 1 to obtain some descriptive
statistics as well as histograms of relevant variables.
Now run the descriptive and histograms of relevant variables
by:
· From the menus choose:
Analyze → Descriptive Statistics → Frequencies → Enter
Appropriate Variables → Click Charts → Select Histograms →
Display Normal Curve → Continue → OK
Now run the Box and Whisker Plot of relevant variables by:
· From the menus choose:
Graphs → Chart Builder → Frequencies → Gallery → Select Boxplot →
drag 1-D Boxplot → Select a variable → OK
Submit completed Table 2.1
2. Above you recorded a summary of some variables for
the entire population in Table 2.1. However, most of us are
aware that certain physical characteristics are often somewhat
different in females than in males, so it is usually more
informative to stratify (group) information by gender. On the
next page are two tables in which you can record some summary
information that describes this group of people by gender.
2a. Summarize using the most appropriate measure the
following variables presented in Table 2.2
Table 2.2
Quantitative Variables          Males (n = 167)      Females (n = 259)
Age                             52.9281±11.6514      51.7946±11.56913
Height                          68.6467±2.69555      63.332±2.54376
Baseline Weight                 194.5928±35.71611    157(200)
Baseline Pulse                  69.3174±10.72048     76(64)
Baseline Cholesterol (mmol/l)   5.4990±1.12798       5.66(8.72)
Baseline Glucose                5.4(13.1)            5.2(12)
Baseline Systolic BP            133.5629±18.97606    124(192)
Values are Mean ± SD or Median (Range).
You will be able to complete the above table by splitting the
file by the qualitative variable and then running descriptives on
the quantitative variables:
To Split a Data File for Analysis
· From the menus choose:
Data → Split File → Select Compare groups → Select sex as the
grouping variable → Click OK.
Now run the descriptives and histograms of relevant variables
by:
· From the menus choose:
Analyze → Descriptive Statistics → Frequencies → Enter
Appropriate Variables → Click Charts → Select Histograms →
Display Normal Curve → Continue → OK
Now run the Box and Whisker Plot of relevant variables by:
· From the menus choose:
Graphs → Chart Builder → Frequencies → Gallery → Select Boxplot →
drag 1-D Boxplot → Select
a variable → OK
Please Note that you have to un-split the data before running
further analysis
Submit completed Table 2.2
2b. Summarize using the most appropriate measure the
following variables presented in Table 2.3
Table 2.3
Qualitative Variables      Males: n (%)   Females: n (%)
Marital Status: Single     6 (3.6%)       12 (4.7%)
                Married    145 (86.8%)    201 (77.9%)
                Divorced   14 (8.4%)      20 (7.8%)
· From the menus choose:
Analyze → Descriptive Statistics → Crosstabs → Put marital, exercise
and smoke as row variables → Put
sex as the column variable → Click Cells and select column
percentages → Click Continue
then OK
Completing the two tables in this manner gives the researcher
an informative ‘picture’ of the type of people in the study.
Submit Completed Table 2.3
LAB 3
Creating New Variables Name_____________________
Objective:
The purpose of this lab is to explore different ways of creating
or adding new variables to a dataset.
In this 6-week life-style changing program, many people
experienced a change in body mass index (BMI). There are
several statistics describing the change in BMI that would be
interesting such as mean BMI change, maximum BMI gain, and
maximum BMI loss. To investigate these statistics and several
others you will need to first create this new variable.
1. Create new variables called BMI at baseline (BMI1) and
BMI after 6 weeks (BMI2), and then calculate the change in BMI
(BMIchange). To do this, calculate BMI using the following
information. Note: the following formula is for weight in pounds
(lb) and height in inches (in). If you use weight in kilograms
and height in meters, the constant 703 is dropped.
BMI1 = (weight at baseline × 703) / Height²
BMI2 = (weight after 6 weeks × 703) / Height²
BMI Change = BMI2 – BMI1
To compute new data values based on numeric transformations
of existing variables:
· From the menu choose:
Transform
Compute variable
· Enter the name of the target variable BMI1
· Enter the numeric expression… (weight at baseline × 703) /
Height²
· Click ok
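The same calculation can be sketched outside SPSS. The function names and the sample weights and heights below are hypothetical; the metric version illustrates the note that the constant 703 is not needed for kilograms and meters.

```python
def bmi_imperial(weight_lb, height_in):
    """BMI from weight in pounds and height in inches (uses the constant 703)."""
    return weight_lb * 703 / height_in ** 2


def bmi_metric(weight_kg, height_m):
    """BMI from weight in kilograms and height in meters (no constant needed)."""
    return weight_kg / height_m ** 2


def bmi_change(weight_baseline_lb, weight_6wk_lb, height_in):
    """BMI Change = BMI2 - BMI1; a negative value means BMI went down."""
    bmi1 = bmi_imperial(weight_baseline_lb, height_in)
    bmi2 = bmi_imperial(weight_6wk_lb, height_in)
    return bmi2 - bmi1


# Hypothetical subject: 175 lb at baseline, 169 lb after 6 weeks, 65 in tall.
print(round(bmi_imperial(175, 65), 2))
print(round(bmi_change(175, 169, 65), 2))
```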
After you have created BMI1, create BMI2 and change in BMI,
create a frequency table for change in BMI. Use the frequency
table to answer the following questions:
a) For change in BMI, find:
             Mean   Standard Deviation   Minimum   Maximum
BMI_CHANGE   -.98   .61                  -2.92     2.15
Extreme Outliers (if any): 2.15
b) What percent of the population reduced BMI? 94.8%
c) What is the mean change in BMI for the females? -0.94 ±
0.62
Statistics
BMI_CHANGE
         N Valid   N Missing   Mean      Std. Deviation
MALE     167       0           -1.0524   .60359
FEMALE   259       0           -.9378    .61568
Please Note that you have to un-split the data before running
further analysis
2. A) Often it is useful to examine a variable such as baseline
BMI (a quantitative variable) by recoding it into several
meaningful categories (a qualitative variable).
To Recode the Values of a Variable into a New Variable
· From the menus choose:
· Transform
Recode Into Different Variables...
· Select baseline BMI into the box as the “Input Variable”
· For the “Output Variable”, name it BaselineBMI_CAT and
click “Change”.
· Next Click “Old and New Values”.
· In this new box, under Old Value, choose "Range Lowest
through value" and put 20 in there, because you want all the
values lowest through 20. Under New Value, put in a 1 for
category 1, and click "Add."
· Next, under Old Value, choose "Range ____ through ____"
and put 20.01 in the first box and 24 in the second, because you
want a range between 20.01 and 24. Under New Value, put in a
2 for category 2, and click "Add."
· Next, under Old Value, choose "Range ____ through ____"
and put 24.01 in the first box and 27 in the second, because you
want a range between 24.01 and 27. Under New Value, put in
a 3 for category 3, and click "Add."
· Lastly, under Old Value, choose "Range value through
Highest" and put 27.01 in there, because you want all the values
from 27.01 to the highest. Under New Value, put in a 4 for
category 4, and click "Add."
· Click “continue” then “ok”.
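The recode steps above amount to a simple lookup on the cut-offs 20, 24 and 27. A minimal sketch follows, with a hypothetical function name. One difference is worth noting: in the dialog a value such as 20.005 falls between "through 20" and "20.01 through 24" and matches no range, whereas this sketch assigns it to category 2.

```python
def recode_bmi(bmi):
    """Recode baseline BMI into categories 1-4 using the cut-offs 20, 24 and 27."""
    if bmi <= 20:
        return 1   # lowest through 20
    if bmi <= 24:
        return 2   # 20.01 through 24
    if bmi <= 27:
        return 3   # 24.01 through 27
    return 4       # 27.01 through highest


# Hypothetical BMI values used only to show the mapping.
for value in (19.5, 22.0, 26.3, 31.0):
    print(value, "->", recode_bmi(value))
```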
B) Summarize the BMI categories variable
BMI_BASELINE_CAT
               Frequency   Valid Percent
UNDER WEIGHT   9           2.1
NORMAL         71          16.7
OVER WEIGHT    92          21.6
OBESE          254         59.6
3. What percent of those who reduced their BMI are males?
39.6%
(Out of the 404 who reduced BMI, 160 are males,
(160/404)*100=39.6%).
BMI_CHANGE_CAT * GENDER Crosstabulation
GENDER
100.0%
100.0%
4. What percent of the males increased their BMI? 3%
(Out of the 167 males, 5 increased their BMI,
(5/167)*100=3.0%).
BMI_CHANGE_CAT * GENDER Crosstabulation
                                     MALE     FEMALE   Total
REDUCED     Count                    160      244      404
            % within BMI_CHANGE_CAT  39.6%    60.4%    100.0%
            % within GENDER          95.8%    94.2%    94.8%
NO CHANGE   Count                    2        6        8
            % within BMI_CHANGE_CAT  25.0%    75.0%    100.0%
            % within GENDER          1.2%     2.3%     1.9%
INCREASED   Count                    5        9        14
            % within BMI_CHANGE_CAT  35.7%    64.3%    100.0%
            % within GENDER          3.0%     3.5%     3.3%
Total       Count                    167      259      426
            % within BMI_CHANGE_CAT  39.2%    60.8%    100.0%
            % within GENDER          100.0%   100.0%   100.0%
LAB 3:
Creating New Variables
[Figure: "Creating New Variables" diagram relating the types of original variables, the procedures (Transform, Compute), and the type of new variables.]
To recode the values of BMI change (quantitative) into BMI
change as a qualitative variable
Now label each category
Reading the BMI_change_CAT * GENDER crosstabulation:
The row percentage (% within BMI_change_CAT): out of the 404 who
reduced BMI, 160 are males, (160/404)*100=39.6%.
The column percentage (% within GENDER): out of the 167 males, 5
increased their BMI, (5/167)*100=3.0%.
The overall column percentage: out of all 426 people, 404
reduced their BMI, (404/426)*100=94.8%.
The overall row percentage: out of all 426 people, 259 were
females, (259/426)*100=60.8%.
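The four percentages described above can be recomputed from the cell counts alone. The sketch below is one way to do that in Python. The nested dictionary `counts` and the helper `pct` are illustrative names; the counts are copied from the crosstabulation.

```python
# Cell counts from the BMI_change_CAT * GENDER crosstabulation.
counts = {
    "Reduced":   {"Male": 160, "Female": 244},
    "No change": {"Male": 2,   "Female": 6},
    "Increased": {"Male": 5,   "Female": 9},
}

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Marginal totals: one per BMI-change category (rows) and per gender (columns).
row_totals = {cat: sum(cells.values()) for cat, cells in counts.items()}
col_totals = {g: sum(cells[g] for cells in counts.values())
              for g in ("Male", "Female")}
grand_total = sum(row_totals.values())

# Row percentage: share of males among those who reduced BMI (160/404).
row_pct = pct(counts["Reduced"]["Male"], row_totals["Reduced"])
# Column percentage: share of males who increased BMI (5/167).
col_pct = pct(counts["Increased"]["Male"], col_totals["Male"])
# Overall percentages taken from the margins (404/426 and 259/426).
reduced_overall = pct(row_totals["Reduced"], grand_total)
female_overall = pct(col_totals["Female"], grand_total)

print(row_pct, col_pct, reduced_overall, female_overall)
```

The denominator decides which percentage you get: a row total gives "% within BMI_change_CAT", a column total gives "% within GENDER", and the grand total gives the overall figures.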
Introduction
The main purpose of this lab is to explore different ways to
create new variables or to manipulate existing variables.
The Transform > Compute procedure in SPSS is used to create a
quantitative variable from one or more other quantitative variables.
The Transform > Recode into Different Variables procedure in SPSS is
used to create a qualitative variable from a quantitative
variable.
Cumulative percentage is a way of describing frequency
distributions. It is also used to identify the percent of people
under, at, or above a certain cut-off point of a scale variable,
for example, the percent of the population who weigh less
than 200 lbs.
Example: Use the corn1 dataset (Lab 2) to answer the following
questions:
What percent of the subjects have a body mass index (BMI) of 20
or more at baseline?
Answer:
Step 1: Compute the BMI from two quantitative variables.
From the menus choose:
Transform > Compute Variable
Enter the name of the target variable: BMI_baseline
Enter the numeric expression: (weight at baseline × 703) / height²
Click OK.
Please note that no spaces are allowed in the new variable name.
Please select each variable name from the variable list, because
it might be different from the variable label.
The BMI formula in SPSS should look like this:
(weight*703)/height**2
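The same expression can be checked by hand or in a few lines of Python. This is only a sketch: the function name `bmi_baseline` and the example person are hypothetical, and it assumes weight is in pounds and height is in inches, which is the unit system the 703 factor is normally used with.

```python
def bmi_baseline(weight_lb, height_in):
    """BMI from weight in pounds and height in inches (assumed units)."""
    # ** binds more tightly than /, so this is (weight*703) / (height**2),
    # matching the SPSS expression (weight*703)/height**2.
    return (weight_lb * 703) / height_in ** 2

# Hypothetical subject: 150 lb, 65 in.
example = bmi_baseline(150, 65)
print(round(example, 2))
```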
In the Compute Variable dialog:
Enter the name of the new variable (with no spaces).
Select the original variables from the list.
Use the keypad in the dialog exclusively to complete the formula.
Step 2: Obtain the cumulative frequencies for the BMI at baseline.
From the menus choose:
Analyze > Descriptive Statistics > Frequencies...
(make sure the frequency tables item is checked)
Step 3: Interpret the cumulative percent.
Total percent of the population who had a BMI less than 20 = 2.1%
Total percent of the population who had a BMI of 20 or more = 100 - 2.1
= 97.9%
The percent of the population who had a BMI of exactly 20.01
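To make the cumulative-percent reasoning concrete, here is a small Python sketch. The BMI values are hypothetical (they are not the corn1 data) and the function name `cumulative_percent` is illustrative. It builds a cumulative percent column like the one in a frequency table, then uses the complement rule from Step 3.

```python
from collections import Counter

def cumulative_percent(values):
    """Return (value, cumulative percent) pairs in ascending order."""
    freq = Counter(values)
    n = len(values)
    running = 0
    table = []
    for value in sorted(freq):
        running += freq[value]
        table.append((value, round(100 * running / n, 1)))
    return table

# Hypothetical baseline BMI values, for illustration only.
bmi = [18.0, 19.5, 20.0, 22.3, 22.3, 25.1, 27.4, 30.2, 31.0, 33.8]
table = cumulative_percent(bmi)

# Cumulative percent at the largest value below the cut-off of 20.
below_20 = max((p for v, p in table if v < 20), default=0.0)
# Complement rule: percent at or above the cut-off.
at_or_above_20 = round(100 - below_20, 1)

print(below_20, at_or_above_20)
```

With the lab's figures the same rule gives 100 - 2.1 = 97.9%.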
Classify the study subjects' BMI at baseline into the categories
indicated in Table 1.1.
Answer:
Step 1: Use the Transform > Recode procedure to create BMI groups
at baseline (qualitative) from BMI at baseline (quantitative).
Use the “Recode into Different Variables” option in order to avoid
overwriting the original variable. See the next slide.
Indicate the percent of the subjects who are overweight:
21.6%
What percent of the female population have normal weight?