Convenience shopping
STAT-S301
Fall 2019
Question Set 1
1. Get to know your scientific question (Chapter 1)
(a) Identify the variable of interest.
(b) Identify the population(s) and sample(s).
(c) Identify the parameter(s) and statistic(s).
(d) What is the scientific question? Is this Descriptive Statistics or Inferential Statistics?
2. Get to know your data (Chapter 1)
(a) Identify the types of your data: nominal data, ordinal data or quantitative data.
(b) Identify the types of your data: time series data or cross-sectional data.
(c) Identify the source of your data: primary data or secondary data. Do you think the data is
reliable? Are there possible issues with your data?
3. Calculate descriptive statistics in Excel (Chapter 3)
(a) Calculate the statistics for your variable of interest, such as sample mean (x̄), median, mode,
variance (s²), and standard deviation (s).
(b) Identify two different groups based on the qualitative data. Calculate the above statistics for
each group to compare.
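As a cross-check on the Excel results, the statistics in 3(a) and the group comparison in 3(b) can be sketched in Python. This is a minimal illustration, not the required method: it uses only the first ten (Sales, DayWeek) rows of the data sheet, and the helper name `describe` is an assumption made for this example. In Excel, the corresponding worksheet functions are AVERAGE, MEDIAN, MODE.SNGL, VAR.S and STDEV.S.

```python
import statistics

# First ten rows of the data sheet: (Sales in dollars, DayWeek).
rows = [
    (2273, "no"), (2080, "no"), (2074, "no"), (2641, "yes"), (1765, "no"),
    (2017, "no"), (2041, "no"), (1791, "no"), (2479, "yes"), (2509, "yes"),
]


def describe(values):
    """Return the sample statistics asked for in 3(a) (hypothetical helper)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every value when no value repeats.
        "modes": statistics.multimode(values),
        # Sample variance and standard deviation use the divisor n - 1.
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }


# Split the sales figures by the qualitative variable, as in 3(b).
groups = {}
for sales, day_week in rows:
    groups.setdefault(day_week, []).append(sales)

summary = {label: describe(values) for label, values in groups.items()}
for label, stats in summary.items():
    print(label, stats)
```

With the full data set, the same grouping gives the side-by-side comparison that 3(b) asks for.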
4. Display your data with charts and graphs in Excel (Chapter 2)
(a) Construct displays that best describe your qualitative variable (e.g. bar chart, pie chart); and
describe the distribution.
(b) Construct displays that best describe your variable of interest and describe its distribution. (Use:
Frequency distribution tables, histograms and/or the empirical rule to discuss normality, symmetry
and skewness)
(c) Construct displays that best describe the relationship/association between two quantitative
variables (the variable of interest as the dependent variable, y, and another quantitative
variable as the independent variable, x); and describe the relationship.
5. Distributions (Chapters 5-6)
(a) Consider the distribution of your quantitative data in 4(b). Would it be appropriate to use the
Binomial or Normal distribution to model your data? Why or why not? Hint: The binomial
distribution models success/failure discrete data while the normal distribution is for bell-
shaped continuous data.
Question Set 2
1. Construct a confidence interval for a population mean (Chapter 8)
(a) Do you need to make assumptions in order to perform the procedure of constructing a
confidence interval? If so, what assumptions need to be made? If not, why?
(b) Construct a confidence interval for the average sales.
i. Should you use a z-interval or a t-interval? Why?
ii. Compute the necessary sample statistics for constructing a confidence interval.
iii. Find the margin of error of the confidence interval at confidence levels of 92% and 95%,
respectively.
iv. Calculate these two confidence intervals.
(c) Someone believes that the average sales is 2421 Dollars. Does the sample support the claim?
Explain if you have different conclusions using the above two confidence intervals. (You must
discuss in terms of accuracy and precision.)
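The effect of the confidence level on the margin of error can be sketched as follows. This is a rough illustration under stated assumptions: it uses only the first ten Sales values from the data sheet, and it uses a normal (z) critical value because Python's standard library has no t quantile. When the standard deviation is estimated from a small sample, a t critical value (for example from Excel's T.INV.2T) would take the place of z. The function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# First ten Sales values (dollars) from the data sheet.
sales = [2273, 2080, 2074, 2641, 1765, 2017, 2041, 1791, 2479, 2509]


def z_interval(values, confidence):
    """Return (lower, upper, margin) for a z-based interval around the sample mean."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, divisor n - 1
    # Two-sided critical value: area (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


low92, high92, margin92 = z_interval(sales, 0.92)
low95, high95, margin95 = z_interval(sales, 0.95)
print(f"92%: ({low92:.2f}, {high92:.2f}), margin {margin92:.2f}")
print(f"95%: ({low95:.2f}, {high95:.2f}), margin {margin95:.2f}")
```

Both intervals share the same center (the sample mean). The higher confidence level gives the wider interval, which is the trade-off 1(c) asks you to discuss.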
2. Conduct a hypothesis test for a population mean (Chapter 9)
(a) Do you need to make assumptions in order to perform the
procedure of conducting a hypothesis
test? If so, what assumptions need to be made? If not, why?
(b) Using α = 0.07 perform a hypothesis test to determine if the
average sales is higher than 2350
Dollars.
i. Write down the hypotheses.
ii. Calculate the test statistic, critical values and p-value.
iii. Describe your decision of the test and make a conclusion
based on the context.
3. Compare two population means (Chapter 10)
(a) Do you need to make assumptions in order to perform the
procedure of conducting a hypothesis
test or constructing a confidence interval? If so, what
assumptions need to be made? If not,
why?
(b) Using α = 0.04 perform a hypothesis test to determine if the
mean Sales Dollars of the two
groups identified by your qualitative variable are different. We
cannot assume equal variances.
List the results of all key steps before you reach your
conclusion, such as the hypotheses, test
statistic, critical value(s) and/or p-value. (Use the Data Analysis
Toolpak in Excel.)
(c) Find the 90% confidence interval to estimate the average
difference in sales between the two
populations according to the qualitative variable.
(d) Interpret the above confidence interval.
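When equal variances cannot be assumed, the two-sample comparison uses the Welch form of the t statistic. The Data Analysis Toolpak reports it under "t-Test: Two-Sample Assuming Unequal Variances". The sketch below shows the arithmetic behind that output. It is only an illustration: the two groups come from the first ten rows of the data sheet, and the function name `welch` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Sales (dollars) from the first ten rows of the data sheet, split by DayWeek.
weekday = [2641, 2479, 2509]                          # DayWeek = yes
weekend = [2273, 2080, 2074, 1765, 2017, 2041, 1791]  # DayWeek = no


def welch(a, b):
    """Return (t statistic, approximate degrees of freedom) for unequal variances."""
    va = variance(a) / len(a)  # squared standard error of the first mean
    vb = variance(b) / len(b)  # squared standard error of the second mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch(weekday, weekend)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The degrees of freedom always fall between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2. The critical value and p-value then come from the t distribution with that df.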
Question Set 3
1. Building a Simple Linear Regression Model: Preprocess.
(a) Identify all quantitative variables from the dataset.
(b) Construct a Scatter Plot to show the relationship between
Sales Dollars (Y ) and each
independent variable. Calculate the sample correlation
coefficients for all pairs. Describe the
association.
(c) Which pair has the strongest linear association?
(d) Write down the general formula for the Simple Linear
Regression Model between Y and X.
(Write the formula using the general parameter notation β0 and β1.
What should be capitalized or
lowercase? What should be added, if anything?)
2. Describe the linear relationship between Sales Dollars (Y)
and the variable you answered in
1(c) (above) as x.
(a) Calculate the slope and y-intercept of the least squares
regression line using Excel. Write
down the linear equation.
(b) Interpret the regression slope.
(c) What percentage of the total variation in y can be explained
by this independent variable x?
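The least squares slope, intercept and coefficient of determination can be sketched as below, as a cross-check on Excel's SLOPE, INTERCEPT and RSQ. This is a minimal illustration under stated assumptions: it uses Volume as x purely as an example (not as the answer to which variable to choose), only the first ten rows of the data sheet, and a hypothetical helper name `fit_line`.

```python
# First ten rows of the data sheet: Volume (gallons) as x, Sales (dollars) as y.
volume = [3111, 2588, 3464, 3749, 2811, 2808, 3477, 2280, 3493, 3638]
sales = [2273, 2080, 2074, 2641, 1765, 2017, 2041, 1791, 2479, 2509]


def fit_line(x, y):
    """Return (intercept b0, slope b1, r squared) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # slope: change in predicted y per unit of x
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    r2 = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return b0, b1, r2


b0, b1, r2 = fit_line(volume, sales)
print(f"y-hat = {b0:.2f} + {b1:.4f} x, r^2 = {r2:.3f}")
```

Multiplying r² by 100 gives the percentage of total variation in y explained by x, which is what 2(c) asks for.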
3. Use the regression model to predict Sales (Y ).
(a) What is the predicted sales with 3250 ______? (Fill in the blank
with units and
name of the independent variable you chose.)
(b) Calculate the 93% confidence interval for the average Sales
Dollars (Y) with 3250 ______
and interpret. (Fill in the blank with units and name of the
independent
variable you chose.)
(c) Calculate the 93% prediction interval for a SINGLE sales (Y)
with 3250 ______ and
interpret. (Fill in the blank with units and name of the
independent variable you chose.)
4. Is there a linear relationship between Y and X?
(a) Test the significance of the slope of the regression equation.
Use α = 0.09.
i. Write down the hypotheses.
ii. What is the p-value?
iii. Describe your decision.
(b) Develop a 90% confidence interval for the population slope.
Does this confidence interval include 0?
(c) State your conclusion. (Hint: You may need to recalculate
the Regression analysis:
Data → Data Analysis → Regression → Confidence level.)
5. Check the assumptions for regression analysis. Make
necessary plots in Excel to justify and
include them in your answers.
(a) Is the relationship between the dependent and independent
variables linear? Which plot
should you check?
(b) Do the residuals exhibit some pattern across values for the
independent variable? Which plot
should you check?
(c) Is the variation of the dependent variable the same across all
values of the independent variable?
Which plot should you check?
(d) Do the residuals follow the normal probability distribution?
Which plot should you check?
(e) Conclusion: Are the results from the regression analysis
reliable?
Question Set 4
1. Model 1: Develop a multiple regression model to predict the
Sales (Y ) using all the other
variables of interest as listed above. (Round all numerical
answers to two decimal places as
needed.)
(a) Identify qualitative variable(s) from the list of variables of
interest, if there is any, and create
a dummy variable in Excel. (Note: use Excel function =IF() and
use alphabetical order
to assign values 0 and 1)
(b) Perform a multiple regression with the Data Analysis
Toolpak in Excel, and write down the
regression equation for Model 1. (Enter in Excel the confidence
level given in question 1(e).
Note: Excel requires that the independent variables be located
in adjacent columns)
(c) Explain the variation of the dependent variable after
accounting for the effects of the other
independent variables:
i. What percentage of total variation in the Sales (Y ) can be
explained by Model 1?
ii. What is the value of the adjusted multiple coefficient of
determination, R²_A?
(d) Is the overall regression model significant using α = 0.07?
State the hypotheses and your
conclusion.
(e) Which independent variables are significant predictors using
α = 0.005 or confidence level
99.5%? Which are not significant? (After accounting for the
effects of the other independent
variables)
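The dummy-variable step in 1(a) can be sketched as follows. In Excel, a formula such as =IF(E2="yes",1,0) does the same job, where the cell reference E2 is an assumption about where DayWeek sits in your sheet. The Python below is only an illustration of the alphabetical-order rule, and `dummy_encode` is a hypothetical helper name.

```python
def dummy_encode(labels):
    """Map a two-level qualitative variable to 0/1 in alphabetical order."""
    levels = sorted(set(labels))
    if len(levels) != 2:
        raise ValueError("expected exactly two distinct levels")
    # Alphabetically first level -> 0, second level -> 1.
    mapping = {levels[0]: 0, levels[1]: 1}
    return [mapping[label] for label in labels], mapping


# DayWeek values from the first five rows of the data sheet.
day_week = ["no", "no", "no", "yes", "no"]
dummy, mapping = dummy_encode(day_week)
print(mapping)
print(dummy)
```

Because "no" sorts before "yes", weekend days are coded 0 and weekdays 1. The DayWeek coefficient is then read as the difference in predicted Sales for a weekday relative to a weekend day, holding the other variables fixed.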
2. Develop a second multiple regression model (Model 2) using
ONE step of the “backward
elimination method”. (Remember: variables should be removed
one at a time, and the regression
analysis, i.e. coefficients, R², p-values, etc., must be
recalculated at each step.) (Round all
numerical answers to two decimal places as needed.)
(a) Which variable should you remove from Model 1? Why?
(b) Perform a multiple regression with the Data Analysis
Toolpak in Excel, and write down the
regression equation for Model 2. (Enter in Excel the confidence
level given in question 2(e).
Note: Excel requires that the independent variables be located
in adjacent columns)
(c) Explaining the variation of the dependent variable:
i. What percentage of total variation in the Sales (Y ) can be
explained by Model 2? How does
this compare with the percentage you obtained with Model 1?
ii. What is the value of the adjusted multiple coefficient of
determination, R²_A? How does this
compare with the one you obtained with Model 1?
(d) Is the overall regression model (Model 2) significant using α
= 0.04?
(e) Are all the independent variables in Model 2 significant
predictors using α = 0.01 or confidence
level 99% after accounting for the effects of the other
independent variables?
(f) Prediction:
i. Is Model 2 better than Model 1?
ii. Predict the sales (Y) with DayWeek = yes; Volume
(Gallons) = 2931; Washes = 76; Price
(cents) = 145.7 using “the best” model (between Model 1 and
Model 2). NOTE: you may or
may not need to use all given values.
(g) Interpret regression coefficients.
i. Interpret the coefficient of Washes.
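When comparing Model 1 and Model 2, the adjusted coefficient of determination penalizes each extra predictor: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below illustrates the formula only. The R² value and sample size in it are hypothetical placeholders, not results from the case study data.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values for illustration only.
n = 30          # assumed number of observations
r2_model = 0.80  # assumed R squared, taken to be the same for both models

full = adjusted_r2(r2_model, n, k=4)     # e.g. four predictors
reduced = adjusted_r2(r2_model, n, k=3)  # one predictor removed
print(f"k=4: {full:.4f}, k=3: {reduced:.4f}")
```

If removing a variable barely changes R², the adjusted value goes up. That is one reason backward elimination can favor the smaller model.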
3. Check the assumptions for regression analysis for the model
you have chosen. Make necessary
plots in Excel to justify.
(a) Is the relationship between the dependent and independent
variables linear?
(b) Do the residuals exhibit some patterns across values of the
independent variables?
(c) Are the variations of the dependent variable the same across
all values of the independent
variables?
(d) Do the residuals follow the normal probability distribution?
(e) Conclusion: Are the results from the regression analysis
reliable?
CASE STUDY ASSESSMENT RUBRIC (60 points available)
Criteria Level of Achievement1
5 4* 3 2** 1
Identification
of Scientific
Question(s)
and Data
Exploration
(15 points
available)
Report includes the following:
1. Identifies scientific question(s)
clearly.
2. Uses appropriate descriptive
statistics to display the main features
of the case study data.
3. Uses appropriate charts to display
the main features of the case study
data.
4. Clearly explores the distribution of
the target (response) variable in
relation to the potential explanatory
variables in the data.
(15 points) (12 points)
Report has TWO of the following issues:
1. Scientific question(s) is(are) not
clearly identified.
2. Uses only descriptive statistics but
no charts.
3. Uses only charts but no descriptive
statistics.
4. Did not explore the distribution of
the target (response) variable in
relation to the potential explanatory
variables in the data.
(9 points) (6 points)
Report has THREE of the following
issues:
1. Scientific question(s) is(are) not
clearly identified.
2. Uses only descriptive statistics but
no charts.
3. Uses only charts but no
descriptive statistics.
4. Did not explore the distribution of
the target (response) variable in
relation to the potential
explanatory variables in the data.
(3 points)
Estimation of
Population
Parameters
and Testing
Research
Hypotheses
(20 points
available)
Report includes the following:
1. Used the sample data to compute
point estimate(s) and construct
confidence interval(s) for the
parameter(s) of interest.
2. Also conducted hypothesis test(s) to
compare the average of the target
variable among different population
groups.
3. The assumption(s) needed to
construct the confidence
interval/perform the statistical tests
are clearly stated and checked.
4. Made clear comments about the
relevance of these
estimates/hypothesis tests to answer
the scientific question(s).
(20 points) (16 points)
Report has TWO of the following issues:
1. Did not compute point estimate(s)
and/or construct confidence
interval(s) for the parameter(s) of
interest.
2. Did not conduct appropriate
hypothesis test(s).
3. The assumption(s) needed to
construct the confidence
interval/perform the statistical tests
are NOT clearly stated and/or
checked.
4. Failed to make clear comments
about the relevance of these
estimates/hypothesis tests to answer
the scientific question(s).
(12 points) (8 points)
Report has THREE of the following
issues:
1. Did not compute point estimate(s)
and/or construct confidence
interval(s) for the parameter(s) of
interest.
2. Did not conduct appropriate
hypothesis test(s).
3. The assumption(s) needed to
construct the confidence
interval/perform the statistical
tests are NOT clearly stated
and/or checked.
4. Failed to make clear comments
about the relevance of these
estimates/hypothesis tests to
answer the scientific question(s).
(4 points)
Predictive
Models
(25 points
available)
Report includes the following:
1. Studied the correlation between the
response variable and each of the
potential explanatory variables
(using correlation coefficient and/or
scatter plots).
2. Tried different regression models to
explain and predict the response
variable based on the explanatory
variables.
3. Chose the best model using
appropriate model selection criteria.
4. Made clear and relevant
interpretations of the results of the
chosen model (significance of the
overall model, significance of
explanatory variables, the amount of
variation in the response variable
explained by the model,
interpretation of regression
coefficients).
5. Used appropriate plots to check the
assumptions for regression analysis
and commented on the reliability of
the regression results/predictions.
6. Used the regression results to
answer the scientific question (made
necessary predictions).
(25 points)
(20 points)
Report has TWO or THREE of the
following issues:***
1. The correlation between the
response variable and some
potential explanatory variables is
explored (uses only correlation
coefficient or only scatter plots).
2. Fitted one single regression model
(instead of trying several models) to
explain and predict the response
variable based on the explanatory
variables.
3. Wrong choice for the best model (or
didn’t use appropriate model
selection criteria).
4. Made unclear and irrelevant
interpretations of the results of the
chosen model or didn’t interpret
some of the regression results
(significance of the overall model,
significance of explanatory
variables, the amount of variation in
the response variable explained by
the model, interpretation of
regression coefficients).
5. Failed to use appropriate plots to
check the assumptions for
regression analysis and/or didn’t
comment on the reliability of the
regression results/predictions.
6. The regression results are not
utilized to answer the scientific
question (or didn’t make necessary
predictions).
(10 points)
Report has FOUR or FIVE of the
following issues:***
1. The correlation between the
response variable and some
potential explanatory variables is
explored (uses only correlation
coefficient or only scatter plots).
2. Fitted one single regression model
(instead of trying several models)
to explain and predict the
response variable based on the
explanatory variables.
3. Wrong choice for the best model
(or didn’t use appropriate model
selection criteria).
4. Made unclear and irrelevant
interpretations of the results of the
chosen model or didn’t interpret
some of the regression results
(significance of the overall model,
significance of explanatory
variables, the amount of variation
in the response variable explained
by the model, interpretation of
regression coefficients).
5. Failed to use appropriate plots to
check the assumptions for
regression analysis and/or didn’t
comment on the reliability of the
regression results/predictions.
6. The regression results are not
utilized to answer the scientific
question (or didn’t make
necessary predictions).
(5 points)
4* Exhibits some characteristics of “5” and some of “3”
2** Exhibits some characteristics that fall somewhere between
“3” and “1”
*** Whether a report falls under TWO or THREE (FOUR or FIVE)
issues is determined by the seriousness of the issues
Sheet 1
Sales (Dollars)  Volume (Gallons)  Washes  Price (cents)  DayWeek
2273             3111              261     144.2          no
2080             2588              280     136.3          no
2074             3464              280     140.8          no
2641             3749              76      138.1          yes
1765             2811              268     137.3          no
2017             2808              420     140.7          no
2041             3477              153     152.5          no
1791             2280              259     147.4          no
2479             3493              167     137.3          yes
2509             3638              192     158.2          yes
2129             3009              201     136.4          no
2423             2330              35      140.8          no
2454             3472              296     155.4          yes
2210             3472              278     142.7          yes
2855             3948              542     137.4          yes
2407             3587              217     163.1          yes
2189             2416              135     136.7          no
3022             3382              666     137.2          yes
2677             2925              337     138.5          yes
2066             3862              113     154.3          yes
1995             2560              256     140.2          no
2093             3196              193     136.9          no
2215             2793              167     136.8          no
2460             3610              311     144.6          yes
2247             3730              206     154.9          yes
2301             2925              215     137.3          no
2442             3801              193     136.4          yes
2668             3851              249     138.1          yes
1998             3757              20      139.6          no
2499             3577              367     142.7          yes
2422             3670              348     137.6          yes
2004             3572              213     156.7          yes
2434             3003              2...
• Washes: number of vehicle washes sold per day
• DayWeek: day of the week, weekday (YES) or weekend day (NO)
• Price: daily average price of one gallon of gasoline, in U.S.
cents (regardless of gas type