Missing Parts I don’t think you understood the assignment.docx
Missing Parts:
I don’t think you understood the assignment. I am looking at it, and all I see is where you entered SAS code, and that’s it. For the SAS code you entered, I’d like to see some results, such as the things I am about to mention:
Part I)
1. (2 pts.) Import the data into your software. Be sure to check that your data looks exactly like the original data before proceeding!
2. (2 pts.) For BOTH of your original quantitative variables, create TWO categorized versions based upon cutoffs of your choice. One binary version and one multi-level version with 3-5 groups. Use numbers for the new variables to represent the groups. No group should have less than 10% of the overall sample. Be sure you define your groups so that they do not overlap and you do not miss any observations.
• In SPSS this can be done using TRANSFORM and RECODE INTO DIFFERENT VARIABLE.
• In SAS you need to use a DATA step with IF-THEN statements to create the new variables.
3. (2 pts.) Create translations which provide the range of values for the variables created in Question 3.
• In SPSS this is done in the variable view using the “Values” column.
• In SAS you need to create the formats using PROC FORMAT and then assign those formats to the appropriate variables using a DATA step.
4. (3 pts.) Label all variables with descriptive titles.
• In SPSS this is done in the variable view using the “Label” column.
• In SAS you need to use a DATA step which includes a LABEL statement.
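The recoding rule in Question 2 (numbered groups, no overlaps, no missed observations, at least 10% of the sample in each group) can be sketched as follows. This is a minimal illustration in Python rather than the SAS DATA step or SPSS RECODE the assignment asks for; the `ages` list, the cutoffs, and the function names `recode` and `group_shares` are all hypothetical and are not taken from the assignment data.

```python
from collections import Counter


def recode(value, cutoffs):
    """Map a number to group 1, 2, ... using a list of cutoffs.

    Group 1 holds values below the smallest cutoff and the last group holds
    values at or above the largest one. Every value lands in exactly one
    group, so groups cannot overlap and no observation is missed.
    """
    return 1 + sum(value >= cut for cut in cutoffs)


def group_shares(values, cutoffs):
    """Return the share of the sample that falls into each group number."""
    counts = Counter(recode(v, cutoffs) for v in values)
    return {g: counts[g] / len(values) for g in range(1, len(cutoffs) + 2)}


# Hypothetical ages, invented for illustration only.
ages = [20, 23, 27, 29, 31, 35, 38, 40, 44, 47,
        52, 56, 58, 63, 67, 71, 74, 78, 80, 85]

binary_shares = group_shares(ages, [40])         # 2 groups: <40 and >=40
multi_shares = group_shares(ages, [30, 45, 60])  # 4 groups

for name, shares in [("binary", binary_shares), ("multi-level", multi_shares)]:
    meets_rule = all(share >= 0.10 for share in shares.values())
    print(name, shares, "every group >= 10%:", meets_rule)
```

Using one half-open comparison per cutoff is what guarantees the groups neither overlap nor leave gaps; the same idea carries over to IF-THEN/ELSE IF chains in SAS.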
I didn’t need to see all the code I’m looking at; I expected to see the results in a table. I’ve done similar exercises, and that’s not how they look.
PART II)
Part 2: Descriptive Summary of Each Variable
5. (6 pts.) Calculate the sample size, sample mean, sample median, sample standard deviation, min, max, Q1, Q3, and 95% confidence interval for the population mean for your two quantitative variables. Provide the software output containing these results in your solution.
6. (6 pts.) Construct a histogram, boxplot, and QQ-plot for your two quantitative variables. Provide only the graphs in your solution.
7. (8 pts.) Construct a frequency table for each of the four variables created in Question 3.
8. (6 pts.) Provide a brief discussion of the distribution of your two main variables using as much of the information in Questions 5-7 as possible (and yet remain as concise as possible).
Where did you do all these calculations? I didn’t see anything. I did see a histogram; that’s all I saw. Where’s the boxplot? Where’s the QQ-plot? There was no graph. Also, you didn’t provide any discussion.
PART III)
Part 3: Case QQ - Using the two quantitative variables
9. (2 pts.) Construct a scatterplot. Provide only this plot in your solution.
10. (2 pts.) Regardless of whether it is appropriate, calculate Pearson’s correlation coefficient. Provide the output containing the estimate and the p-value.
11. (3 pts.) Regardless of whether it is appropriate, conduct a simple linear regression analysis. In your solution provide output containing: overall ANOVA table, table of parameter estimates (slope/intercept) including confidence intervals, and the value of the coefficient of determination, R². Provide a histogram and normal probability plot of the residuals as well as a scatterplot of the residuals (Y) vs. the predicted values (X)
12. (6 pts.) Provide a brief discussion of the relationship between your two quantitative variables using as much of the information in Questions 9-11 as possible.
• If linear regression and correlation are appropriate, provide interpretations of the correlation coefficient, the slope, and the coefficient of determination
• If these are not appropriate, discuss why using as much support as possible from your output and discuss whether Spearman’s rank correlation would be appropriate
First, where’s the scatterplot? It’s missing. Where did you calculate “r”? Very incomplete.
Part 4: Case CC
13. (8 pts.) Construct all four combinations of two-way tables which investigate the relationship between your two variables. Your solution must include the two-way table, the row and column percentages, and the appropriate chi-squared statistic with p-value.
14. (6 pts.) Provide a summary of the conclusions of the appropriate chi-squared test for each of the combinations in Question 13 including a discussion of the distribution of your response within the levels of the explanatory variable. Be sure to address any concerns with using these tests.
I am going to stop here. What you have is completely different from what the teacher asked for. I’m completely unsatisfied, and frankly, I cannot release any funds for that.
Below is an example from one of my homework assignments. Take a look at it.
The SAS System
The FREQ Procedure

High blood pressure
HBP     Frequency   Percent   Cumulative Frequency   Cumulative Percent
0       768         76.80     768                    76.80
1       232         23.20     1000                   100.00

Smoking Status
SMOKE   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1       501         50.10     501                    50.10
2       245         24.50     746                    74.60
3       254         25.40     1000                   100.00

The MEANS Procedure
Analysis Variable : AGE Age (years)

High blood pressure   N Obs   N     Mean         Minimum      Lower Quartile   Median       Upper Quartile   Std Dev
0                     768     768   43.8424479   20.0000000   29.0000000       40.0000000   56.0000000       17.669939

Analysis Variable : AGE Age (years)

Smoking Status   N Obs   N     Mean         Minimum      Lower Quartile   Median   Upper Quartile   Std Dev
1                501     501   47.6467066   20.0000000   30.000000        ...

The FREQ Procedure
Table of HBP by SMOKE
Each cell lists Frequency, Percent, Row Pct, Col Pct.

HBP (High blood pressure)   SMOKE (Smoking Status)
                            Never    Former   Current   Total
No      Frequency           398      155      215       768
        Percent             39.80    15.50    21.50     76.80
        Row Pct             51.82    20.18    27.99
        Col Pct             79.44    63.27    84.65
Yes     Frequency           103      ...
Total   Frequency                                       1000
        Percent                                         100.00
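
To show how the entries in the HBP by SMOKE table relate to one another, the sketch below re-derives the row percentages, column percentages, and a chi-squared statistic from the counts printed in the SAS output above. It is a Python illustration only, not the PROC FREQ code that produced the output. The "Yes" row is reconstructed here by subtracting the "No" row from the SMOKE frequency totals, which is an assumption that the two tables describe the same 1000 observations.

```python
import math

# Counts copied from the SAS output above.
col_totals = {"Never": 501, "Former": 245, "Current": 254}  # SMOKE frequencies
no_row = {"Never": 398, "Former": 155, "Current": 215}      # HBP = No row

# Assumption: the HBP = Yes row is whatever remains in each column.
yes_row = {k: col_totals[k] - no_row[k] for k in col_totals}

observed = {"No": no_row, "Yes": yes_row}
row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
n = sum(row_totals.values())

chi2 = 0.0
for r, cells in observed.items():
    for c, count in cells.items():
        expected = row_totals[r] * col_totals[c] / n
        chi2 += (count - expected) ** 2 / expected
        row_pct = 100 * count / row_totals[r]
        col_pct = 100 * count / col_totals[c]
        print(f"{r:>3} {c:<8} n={count:<4} row%={row_pct:6.2f} col%={col_pct:6.2f}")

df = (len(observed) - 1) * (len(col_totals) - 1)

# With exactly 2 degrees of freedom the chi-squared upper-tail probability
# has the closed form exp(-x / 2), so no statistics library is needed here.
assert df == 2
p_value = math.exp(-chi2 / 2)
print(f"chi-squared = {chi2:.2f}, df = {df}, p-value = {p_value:.2g}")
```

The row percentages describe the distribution of smoking status within each blood pressure group, and the column percentages describe blood pressure status within each smoking group, which is the comparison the discussion questions ask for.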
Part 2: Descriptive Summary of Each Variable
5. (6 pts.) Calculate the sample size, sample mean, sample
median, sample standard deviation, min, max, Q1,
Q3, and 95% confidence interval for the population mean for
your two quantitative variables. Provide the
software output containing these results in your solution.
6. (6 pts.) Construct a histogram, boxplot, and QQ-plot for your
two quantitative variables. Provide only the
graphs in your solution.
7. (8 pts.) Construct a frequency table for each of the four
variables created in Question 3.
8. (6 pts.) Provide a brief discussion of the distribution of your
two main variables using as much of the
information in Questions 5-7 as possible (and yet remain as
concise as possible).
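
As a rough guide to what the output for Question 5 should contain, the following Python sketch computes the same list of summaries for one variable. It is not the SAS or SPSS procedure the assignment requires. The `ages` values are hypothetical, the t critical value is hard-coded from a t table for 9 degrees of freedom, and the quartile convention used by `statistics.quantiles` may differ slightly from the one your software uses.

```python
import math
import statistics


def describe(values, t_crit):
    """Summary statistics plus a CI for the mean: mean +/- t_crit * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    q1, _, q3 = statistics.quantiles(values, n=4)
    half_width = t_crit * sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Hypothetical ages, invented for illustration only.
ages = [20, 27, 29, 35, 38, 40, 43, 50, 56, 61]

# Two-sided 95% t critical value for n - 1 = 9 degrees of freedom (t table).
T_CRIT_DF9 = 2.262

summary = describe(ages, T_CRIT_DF9)
for key, value in summary.items():
    print(f"{key:>8}: {value}")
```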
Part 3: Case QQ - Using the two quantitative variables
9. (2 pts.) Construct a scatterplot. Provide only this plot in your
solution.
10. (2 pts.) Regardless of whether it is appropriate, calculate
Pearson’s correlation coefficient. Provide the
output containing the estimate and the p-value.
11. (3 pts.) Regardless of whether it is appropriate, conduct a
simple linear regression analysis. In your solution
provide output containing: overall ANOVA table, table of
parameter estimates (slope/intercept) including
confidence intervals, and the value of the coefficient of
determination, R². Provide a histogram and normal
probability plot of the residuals as well as a scatterplot of the
residuals (Y) vs. the predicted values (X)
12. (6 pts.) Provide a brief discussion of the relationship
between your two quantitative variables using as much
of the information in Questions 9-11 as possible.
• If linear regression and correlation are appropriate, provide
interpretations of the correlation
coefficient, the slope, and the coefficient of determination
• If these are not appropriate, discuss why using as much
support as possible from your output and
discuss whether Spearman’s rank correlation would be
appropriate
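
The quantities requested in Questions 10 and 11 are tied together by a few sums of squares, which the sketch below makes explicit. It is a Python illustration with hypothetical `x` and `y` values, not the SAS or SPSS output the assignment asks for, and it leaves out the p-values and confidence intervals, which need a t distribution.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y on x, with Pearson's r and the residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)

    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r * r,  # in simple linear regression, R^2 equals r^2
        "predicted": predicted,
        "residuals": residuals,
    }


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_regression(x, y)
print("slope:", fit["slope"], "intercept:", fit["intercept"])
print("r:", fit["r"], "R-squared:", fit["r_squared"])
print("residuals:", fit["residuals"])
```

Plotting `residuals` against `predicted` gives the residual scatterplot from Question 11; a curved or fan-shaped pattern there is the kind of evidence Question 12 asks you to cite when linear regression is not appropriate.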
Part 4: Case CC
13. (8 pts.) Construct all four combinations of two-way tables
which investigate the relationship between your
two variables. Your solution must include the two-way table,
the row and column percentages, and the
appropriate chi-squared statistic with p-value.
14. (6 pts.) Provide a summary of the conclusions of the
appropriate chi-squared test for each of the
combinations in Question 13 including a discussion of the
distribution of your response within the levels of
the explanatory variable. Be sure to address any concerns with
using these tests.
Part 5: Case CQ
15. (2 pts.) Construct side-by-side boxplots of your chosen
response variable by the binary version of your
explanatory variable. Provide only the graph in your solution.
16. (4 pts.) Conduct a two-sample t-test. Provide all output
given by your software in your solution.
17. (6 pts.) Provide a conclusion about the relationship between
your two variables based upon the results in
Questions 15 and 16. Be sure to state which t-test you are using
and interpret the confidence interval for the
difference between the two population means. Address any
concerns with using this test.
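
Question 17 asks which t-test is being used, so it helps to see how the pooled (equal variances) and Welch (unequal variances) versions differ. The Python sketch below computes both test statistics and their degrees of freedom for two hypothetical groups; it does not compute p-values or confidence intervals, which your statistical software reports directly.

```python
import math
import statistics


def two_sample_t(a, b):
    """Return pooled and Welch t statistics with their degrees of freedom."""
    n1, n2 = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances

    # Pooled version: assumes both populations share one variance.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Welch version: each group keeps its own variance, and the degrees of
    # freedom come from the Welch-Satterthwaite approximation.
    se_sq = v1 / n1 + v2 / n2
    t_welch = diff / math.sqrt(se_sq)
    df_welch = se_sq ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return {
        "t_pooled": t_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": t_welch,
        "df_welch": df_welch,
    }


# Hypothetical response values for the two levels of a binary variable.
group_1 = [43, 38, 51, 47, 40, 45, 36]
group_2 = [55, 61, 49, 58, 63]

result = two_sample_t(group_1, group_2)
for key, value in result.items():
    print(f"{key:>10}: {value}")
```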
Part 6: Case CQ
18. (2 pts.) Construct side-by-side boxplots of your chosen
response variable by the multi-level version of your
explanatory variable. Provide only the graph in your solution.
19. (4 pts.) Conduct an ANOVA. Provide only the basic
ANOVA table in your solution.
20. (6 pts.) Provide a conclusion about the relationship between
your two variables based upon the results in
Questions 18 and 19. Address any concerns with using this test.
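
The basic ANOVA table in Question 19 splits the total variation into a between-groups part and a within-groups part. The Python sketch below builds those entries for three hypothetical groups; the data and the function name `one_way_anova` are invented for illustration, and the p-value for the F statistic is left to your statistical software.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, and F for one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": ms_between / ms_within,
    }


# Hypothetical response values for three levels of a multi-level variable.
groups = [[21, 22, 23], [22, 23, 24], [25, 26, 27]]

table = one_way_anova(groups)
for key, value in table.items():
    print(f"{key:>10}: {value}")
```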
Part 7: Summary
21. (10 pts.) Compare the results of the different ways of
investigating the relationship between your two
variables. Did all of the conclusions agree? What type(s) of
error(s) might you have made for each of the
tests? What are the strengths and limitations of each method?
Learning Objectives
CO-4: Distinguish among different measurement scales, choose
the appropriate descriptive and inferential statistical
methods based on these distinctions, and interpret the results.
LO 4.3: Using appropriate numerical measures and/or visual
displays, describe the distribution of a categorical variable in
context.
LO 4.4: Using appropriate graphical displays and/or numerical
measures, describe the distribution of a quantitative variable
in context: a) describe the overall pattern, b) describe striking
deviations from the pattern.
LO 4.7: Define and describe the features of the distribution of
one quantitative variable (shape, center, spread, outliers).
LO 4.8: Define and calculate the sample mean of a quantitative
variable.
LO 4.9: Define and calculate the sample median of a
quantitative variable.
LO 4.10: Choose the appropriate measures for a quantitative
variable based upon the shape of the distribution.
LO 4.12: Define and calculate Q1, Q3, and the IQR for one
quantitative variable.
LO 4.13: Define and calculate the standard deviation and
variance of one quantitative variable.
LO 4.14: Define and interpret measures of position (percentiles,
quartiles, the five-number summary, z-scores).
LO 4.18: Compare and contrast distributions (of quantitative
data) from two or more groups, and produce a brief summary,
interpreting your findings in context.
LO 4.19: For a data analysis situation involving two variables,
identify the role of each variable in the scenario.
LO 4.20: Classify a data analysis situation involving two
variables according to the “role-type classification.”
LO 4.21: For a data analysis situation involving two variables,
determine the appropriate graphical display(s) and/or
numerical measure(s) that should be used to summarize the
data.
LO 4.23: Given a contingency table (two-way table), interpret
the information it reveals about the association between two
categorical variables by calculating and comparing conditional
percentages.
LO 4.25: Describe the relationship displayed in a scatterplot
including: a) the overall pattern, b) striking deviations from the
pattern.
LO 4.26: Explain the limitations of Pearson’s correlation
coefficient (r) as a measure of the association between two
quantitative variables.
LO 4.27: In the special case of a linear relationship, interpret
Pearson’s correlation coefficient (r) in context.
LO 4.28: In the special case of a linear relationship, interpret
the slope of the regression line and use the regression line to
make predictions.
LO 4.29: Determine and use the correct point estimates for
specified population parameters.
LO 4.30: Interpret confidence intervals for population
parameters in context.
LO 4.36: For a data analysis situation involving two variables,
carry out the appropriate inferential method for examining
relationships between the variables and draw the correct
conclusions in context.
LO 4.38: In a given context, determine the appropriate standard
method for comparing groups and provide the correct
conclusions given the appropriate software output.
LO 4.39: In a given context, set up the appropriate null and
alternative hypotheses for comparing groups.
LO 4.41: Based upon the output for a two-sample t-test,
correctly interpret in context the appropriate confidence interval
for the difference between population means.
LO 4.42: Based upon the output for a two-sample t-test,
determine whether to use the results assuming equal variances or
those assuming unequal variances.
LO 4.43: In a given context, determine the appropriate standard
method for examining the relationship between two
categorical variables. Given the appropriate software output,
choose the correct p-value and provide the correct
conclusions in context.
LO 4.44: In a given context, set up the appropriate null and
alternative hypotheses for examining the relationship between
two categorical variables.
LO 4.45: In a given context, set up the appropriate null and
alternative hypotheses for examining the relationship between
two quantitative variables.
LO 4.46: In a given context, determine the appropriate standard
method for examining the relationship between two
quantitative variables and interpret the results provided in the
appropriate software output in context.
LO 4.47: For simple linear regression models, interpret the
coefficient of determination in context.
CO-5: Determine preferred methodological alternatives to
commonly used statistical methods when assumptions are not
met.
LO 5.1: For a data analysis situation involving two variables,
determine the appropriate alternative (non-parametric)
method when assumptions of our standard methods are not met.
LO 5.2: Recognize situations in which Spearman’s rank
correlation is a more appropriate measure of the relationship
between two quantitative variables.
CO-6: Apply basic concepts of probability, random variation,
and commonly used statistical probability distributions.
LO 6.1: Explain the meaning of the term distribution in
statistics.
LO 6.3: Use histograms and QQ-plots (or Normal Probability
Plots) to visually assess the normality of distributions of
quantitative variables.
LO 6.4: Relate the probability of an event to the likelihood of
this event occurring.
LO 6.5: Apply the relative frequency approach to estimate the
probability of an event.
LO 6.9: Apply logic or probability rules to calculate conditional
probabilities, P(A|B), and interpret them in context.
LO 6.19: Identify and distinguish between a parameter and a
statistic.
LO 6.27: Explain what the p-value is and how it is used to draw
conclusions.
LO 6.28: Define a Type I and Type II error in general and in the
context of specific scenarios.
LO 6.30: Use a confidence interval to determine the correct
conclusion to the associated two-sided hypothesis test.
CO-7: Use statistical software to analyze public health data.
LO 7.0: Apply other software skills such as labeling variables,
creating translations for coded categorical variables, and
creating new variables.
LO 7.1: View a dataset in EXCEL, text editor, or other
spreadsheet or statistical software.
LO 7.2: Given data containing a categorical variable, use
software to create a frequency distribution, bar chart, and pie
chart.
LO 7.3: Given data containing a quantitative variable, use
software to create a histogram, boxplot, and QQ-plot (or PP-
Plot).
LO 7.4: Given data containing a quantitative variable, use
software to calculate numerical summaries.
LO 7.5: Given appropriate data involving two variables, use
software to construct side-by-side boxplots.
LO 7.6: Given appropriate data involving two variables, use
software to calculate numerical measures for a quantitative
variable within two or more groups.
LO 7.7: Given appropriate data involving two variables, use
software to construct a two-way table.
LO 7.8: Given appropriate data involving two variables, use
software to construct a scatterplot.
LO 7.9: Given appropriate data involving two variables, use
software to calculate the value of Pearson’s correlation
coefficient.
LO 7.10: Given appropriate data involving two variables, use
software to calculate the value of the coefficient of
determination (R-squared).
LO 7.11: In the special case of a linear relationship, use
software to construct the simple linear regression equation.
LO 7.15: Given appropriate data, use software to conduct a two-
sample t-test.
LO 7.16: Given appropriate data, use software to conduct an
ANOVA.
LO 7.17: Given appropriate data, use software to conduct a chi-
square test of independence of two categorical variables.
LO 7.18: Given appropriate data, use software to conduct a test
for statistical significance of Pearson’s correlation
coefficient.
LO 7.19: Given appropriate data, use software to conduct a test
for statistical significance of the slope of the simple linear
regression equation.
CO-8: Develop presentations based on statistical analyses for
both public health professionals and educated lay
audiences.
There are no learning sub-objectives for this course objective.
This course objective will be assessed through group and
individual assignments during the semester.