HW1_STAT206.pdf
Statistical Inference II: J. Lee Assignment 1
Problem 1. Suppose the day after the Drexel-Northeastern basketball game, a poll of 1000 Drexel students
was conducted and it was determined that 850 out of the 1000 watched the game (live or on television).
Assume that this was a simple random sample and that the Drexel undergraduate population is 20000.
(a) Generate an unbiased estimate of the true proportion of Drexel undergraduate students who watched
the game.
(b) What is your estimated standard error for the proportion estimate in (a)?
(c) Give a 95% confidence interval for the true proportion of Drexel undergraduate students who watched
the game.
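The arithmetic for Problem 1 can be organized as in the sketch below. It assumes the estimated standard error of Corollary B in Rice Section 7.3.2, s_p̂² = p̂(1 − p̂)/(n − 1) · (1 − n/N), and a normal-approximation interval with multiplier 1.96. The function name `proportion_estimate` is illustrative only and does not come from the course materials.

```python
import math

def proportion_estimate(successes, n, N, z=1.96):
    """Point estimate, estimated standard error, and CI for a proportion under srs."""
    p_hat = successes / n
    # Assumed form (Rice, Corollary B): unbiased variance estimate
    # including the finite population correction (1 - n/N).
    var_hat = p_hat * (1 - p_hat) / (n - 1) * (1 - n / N)
    se = math.sqrt(var_hat)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Numbers from Problem 1: 850 of 1000 sampled, population of 20000.
p_hat, se, ci = proportion_estimate(850, 1000, 20000)
print(p_hat, round(se, 4), tuple(round(x, 3) for x in ci))
```

Setting `N` very large recovers the familiar formula without the finite population correction.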
Problem 2. (Exercise 18 in Chapter 7 of Rice) From independent surveys of two populations, 90% confidence
intervals for the population means are constructed. What is the probability that neither interval
contains the respective population mean? That both do?
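A worked version of the computation, assuming each interval covers its mean with probability exactly 0.9 and using the independence of the two surveys:

```latex
P(\text{neither covers}) = (1 - 0.9)^2 = 0.01,
\qquad
P(\text{both cover}) = (0.9)^2 = 0.81
```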
Problem 3. (Exercise 23 in Chapter 7 of Rice)
(a) Show that the standard error of an estimated proportion is largest when p = 1/2.
(b) Use this result and Corollary B of Section 7.3.2 (also on page 17 of the lecture notes) to conclude that
the quantity
$$\frac{1}{2}\sqrt{\frac{N-n}{N(n-1)}}$$
is a conservative estimate of the standard error of p̂ no matter what the value of p may be.
(c) Use the central limit theorem to conclude that the interval
$$\hat{p} \pm \sqrt{\frac{N-n}{N(n-1)}}$$
contains p with probability at least .95.
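One possible outline of the argument for Problem 3 is sketched below. It assumes that Corollary B gives the estimated standard error s_p̂ with s_p̂² = p̂(1 − p̂)/(n − 1) · (1 − n/N), and that the 95% normal multiplier is 1.96.

```latex
\begin{align*}
\frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;&\Rightarrow\; p = \tfrac{1}{2},
  \qquad\text{so } p(1-p) \le \tfrac{1}{4} \text{ for all } p,\\
s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})}\,\sqrt{\frac{N-n}{N(n-1)}}
  \;&\le\; \frac{1}{2}\sqrt{\frac{N-n}{N(n-1)}},\\
\hat{p} \pm 1.96\, s_{\hat{p}} \;&\subseteq\; \hat{p} \pm \sqrt{\frac{N-n}{N(n-1)}}.
\end{align*}
```

The last line uses 1.96 · (1/2) < 1, so the wider interval covers p at least as often as the approximate 95% interval.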
HW2_STAT206.pdf
Statistical Inference II: J. Lee Assignment 2
Problem 1. The following data set represents the number of NBA games in January 2016 watched by 10
randomly selected students in STAT 206.
7, 0, 4, 2, 2, 1, 0, 1, 2, 3
(a) What is the sample mean?
(b) Calculate the sample variance.
(c) Estimate the mean number of NBA games watched by a student in January 2016.
(d) Estimate the standard error of the estimated mean.
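A minimal sketch of the computations in Problem 1, using only the Python standard library. It assumes the sample variance uses the divisor n − 1 and, because the class size N is not given, it omits the finite population correction from the standard error.

```python
import math
import statistics

games = [7, 0, 4, 2, 2, 1, 0, 1, 2, 3]
n = len(games)

mean = statistics.mean(games)      # sample mean, also the estimate of the population mean
var = statistics.variance(games)   # sample variance with divisor n - 1
# Assumption: no finite population correction, since N is not stated.
se = math.sqrt(var / n)

print(mean, var, round(se, 4))
```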
Problem 2. True or false? Tell me why for the false statements.
(a) The center of a 95% confidence interval for the population mean is a random variable.
(b) A 95% confidence interval for µ contains the sample mean with probability .95.
(c) A 95% confidence interval contains 95% of the population.
(d) Out of one hundred 95% confidence intervals for µ, 95 will contain µ.
Problem 3. An investigator quantifies her uncertainty about the estimate of a population mean by reporting
$\bar{X} \pm s_{\bar{X}}$. What size confidence interval is this?
Problem 4. For a random sample of size n from a population of size N, consider the following as an
estimate of µ:
$$\bar{X}_c = \sum_{i=1}^{n} c_i X_i,$$
where the $c_i$ are fixed numbers and $X_1, \ldots, X_n$ are the sample. Find a condition on the $c_i$ such that the
estimate is unbiased.
Problem 5. A sample of size 100 has sample mean $\bar{X} = 10$. Suppose we know that the population
standard deviation is σ = 5. Find a 95% confidence interval for the population mean µ.
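The interval in Problem 5 can be computed as in the sketch below. It assumes a known σ, the normal multiplier 1.96, and a population large enough that the finite population correction can be ignored. The helper `z_interval` is a hypothetical name.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """Normal-theory interval xbar +/- z * sigma / sqrt(n) for a known sigma."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Numbers from Problem 5: n = 100, sample mean 10, sigma = 5.
low, high = z_interval(10, 5, 100)
print(round(low, 2), round(high, 2))
```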
Problem 6. Suppose we know that the population standard deviation is σ = 5. How large should a
sample be to estimate the population mean µ with a margin of error not exceeding 0.5?
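A sketch of the sample-size calculation for Problem 6. The problem does not state a confidence level, so 95% (multiplier 1.96) is assumed here; the finite population correction is also ignored. The function name `sample_size` is illustrative.

```python
import math

def sample_size(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

# Assumption: 95% confidence, since the problem leaves the level unstated.
n_needed = sample_size(5, 0.5)
print(n_needed)
```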
Problem 7. You flip a fair coin n times and keep track of the
sample mean, $\bar{X}^{(n)}$ (the fraction of heads among the n flips).
Of course, when n is very large, you expect that the random variable
$\bar{X}^{(n)}$ will be very close to 0.5 (since the coin is fair).
(a) Use the Central Limit Theorem to estimate how large n must
be in order for you to be 95% confident
that $\bar{X}^{(n)}$ is between 0.45 and 0.55.
(b) Use Chebyshev's inequality to obtain a number K such that
you can guarantee that if n is at least K,
then the probability that $\bar{X}^{(n)}$ is between 0.45 and 0.55 is at least 0.95.
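The two bounds in Problem 7 can be compared with the sketch below. It uses exact rational arithmetic to avoid rounding issues, takes 1.96 as the 95% normal multiplier, and uses the per-flip variance p(1 − p) = 1/4. The variable names are illustrative.

```python
from fractions import Fraction
import math

p = Fraction(1, 2)        # fair coin
eps = Fraction(5, 100)    # want |Xbar - 0.5| <= 0.05
alpha = Fraction(5, 100)  # allowed failure probability
z = Fraction(196, 100)    # assumed 95% normal multiplier

var_one_flip = p * (1 - p)  # 1/4

# (a) CLT: need z * sqrt(var / n) <= eps, i.e. n >= z^2 * var / eps^2
n_clt = math.ceil(z ** 2 * var_one_flip / eps ** 2)

# (b) Chebyshev: P(|Xbar - p| >= eps) <= var / (n * eps^2) <= alpha
k_chebyshev = math.ceil(var_one_flip / (alpha * eps ** 2))

print(n_clt, k_chebyshev)
```

The Chebyshev bound is a guarantee for any distribution with that variance, which is why it asks for a much larger n than the CLT approximation.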
Rice_HW1.pdf
Dr. Jinwook Lee
Survey Sampling
(Ref: Ch 7.1-7.3.3 in Rice)
Introduction
Many applications of statistics involve inference on a fixed and finite
population: estimating population parameters and quantifying the
accuracy of those estimates. Typically the estimates and their
accuracies are generated via some sort of “random” sampling of the
population. This lecture describes the appropriate probability
(and hence statistical) models for the results of random sampling.
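Population Parameters
(a) A population is a class of things/elements, and we denote its
size by N. We assume that associated with each element is a
number $x_i$, which is the characteristic of interest. So a population
is: $\{x_1, x_2, \ldots, x_N\}$
(b) Population mean is: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
(c) Population total is: $\tau = \sum_{i=1}^{N} x_i = N\mu$
(d) Population variance is: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$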
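The population parameters can be computed directly from a list of values, as in the sketch below. It assumes the definitions used in Rice Chapter 7, in particular that the population variance uses the divisor N (not N − 1). The population values and the function name are hypothetical.

```python
def population_parameters(xs):
    """Return (mean, total, variance) of a finite population."""
    N = len(xs)
    total = sum(xs)                                    # tau
    mean = total / N                                   # mu
    variance = sum((x - mean) ** 2 for x in xs) / N    # sigma^2, divisor N (assumed)
    return mean, total, variance

# Hypothetical population of N = 5 values.
mu, tau, sigma2 = population_parameters([1, 2, 2, 4, 6])
print(mu, tau, sigma2)

# For a 0/1 population the mean is the proportion p and the variance is p(1 - p).
p, _, var01 = population_parameters([1, 1, 0, 0, 0])
print(p, var01)
```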
Population Parameters
(e) An important special case is when all of the $x_i$’s are 0 or 1.
• The population consists of those having or not having a
particular characteristic.
- $x_i = 1$ if element i has the characteristic
- $x_i = 0$ otherwise
• Refer to the population mean as the population proportion and
denote it by p.
• In this case the variance is: $\sigma^2 = p(1 - p)$
Sampling
Definition. For a population of size N, we say that a random
sample of size n is a simple random sample (srs) if
(i) Sampling is done without replacement.
(ii) All $\binom{N}{n}$ (“N choose n” or “N combination n”) subsets of size n
in the population are equally likely to be chosen.
Remark. Actually carrying out a srs can be very hard to do in
practice.
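When a complete list of the population is available, a simple random sample can be drawn in software. The sketch below uses Python's `random.sample`, which samples without replacement. The population labels, the seed, and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population labelled 1..20.
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=0)
print(sample)
```

This covers only the mechanical draw. The practical difficulty noted in the remark (getting a complete list and reaching every selected unit) is not something code can solve.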
Sampling
Example.
Expectation and Variance for srs
Suppose $X_1, X_2, \ldots, X_n$ are random variables representing a srs
from a population. They are not independent (due to the sampling
without replacement), but they do have a common mean and
variance.
Proposition. Suppose $X_1, X_2, \ldots, X_n$ is a srs from a
population of size N, mean µ, and variance σ². Then, for each i,
$E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$.
Sample Mean as Estimate
Definitions.
(a) Suppose that $X_1, X_2, \ldots, X_n$ denote a srs of size n. Then the
sample mean is defined as: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
(b) In the case where population values are 0 or 1, the
sample proportion is: $\hat{p} = \bar{X}$
(c) For a srs $X_1, X_2, \ldots, X_n$ from a population of size N, a natural
estimate of the population total τ is given by: $T = N\bar{X}$
Remarks. The sample mean, the sample proportion, and the
population total estimate are natural estimates for the respective
population parameters: the mean µ, the proportion p, and the total τ.
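MSE/Bias/Variance of Estimators
Remark. As in prediction, we want to quantify how good the
estimates $\bar{X}$, T, $\hat{p}$ are in estimating the parameters µ, τ, p.
It is natural to do this via mean-squared error (MSE), which can be
defined for any estimator.
Definition. The bias of the estimate $\hat{\mu}$ is defined as: $E(\hat{\mu}) - \mu$
Remark. $\mathrm{MSE}(\hat{\mu}) = E[(\hat{\mu} - \mu)^2] = \mathrm{Var}(\hat{\mu}) + \mathrm{Bias}(\hat{\mu})^2$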
Sample Means are Unbiased
Corollary. Suppose that $X_1, X_2, \ldots, X_n$ is a srs from a population
with mean µ and total τ. Then
(a) $\bar{X}$ is an unbiased estimate for µ.
(b) T is an unbiased estimate for τ.
(c) if the population consists of 0s and 1s, the sample proportion
$\hat{p}$ is an unbiased estimate for p.
Sample Means are Unbiased
Remarks.
Variance of sample mean from a srs
Theorem. Suppose that $X_1, X_2, \ldots, X_n$ are random variables
corresponding to a srs from a population of size N with
mean µ and variance σ². Then
$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(1 - \frac{n-1}{N-1}\right)$
Remarks.
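For a small population, the variance of the sample mean can be checked exactly by enumerating every possible sample. The sketch below assumes the theorem takes the standard form given in Rice Section 7.3.1, Var(X̄) = (σ²/n)(1 − (n − 1)/(N − 1)), with σ² computed using the divisor N. The population values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_var_of_sample_mean(xs, n):
    """Variance of the sample mean over all equally likely size-n subsets."""
    means = [Fraction(sum(c), n) for c in combinations(xs, n)]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)

def formula_var(xs, n):
    """(sigma^2 / n) * (1 - (n - 1)/(N - 1)), the assumed form of the theorem."""
    N = len(xs)
    mu = Fraction(sum(xs), N)
    sigma2 = sum((x - mu) ** 2 for x in xs) / N
    return sigma2 / n * (1 - Fraction(n - 1, N - 1))

xs = [1, 2, 2, 4, 6]   # hypothetical population, N = 5
enumerated = exact_var_of_sample_mean(xs, 3)
predicted = formula_var(xs, 3)
print(enumerated, predicted, enumerated == predicted)
```

Exact fractions are used so that the comparison is an equality rather than a floating-point approximation.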
Variance of sample mean from a srs
Notation and Terminology.
Corollary 1.
Corollary 2.
Estimation of Population Variance
Remark. Typically one does not know the mean or the variance
of a population; that is why one is sampling and doing
estimation. The standard errors of $\bar{X}$ and T depend on the
underlying population standard deviation, and the standard error
of $\hat{p}$ depends on the population proportion p, the very parameter
we are trying to estimate.
Recall.
Estimation of Population Variance
Theorem. Suppose $X_1, X_2, \ldots, X_n$ is a srs from a population of
size N with a mean of µ and variance of σ². Then we have:
Unbiased estimation of standard errors
We want to use the previous results to derive unbiased
estimates for the standard errors of our 3 estimators.
Notation. Let $s_{\bar{X}}$ denote the estimate of the standard error $\sigma_{\bar{X}}$.
Corollary A.
Corollary A´.
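Unbiased estimation of standard errors
Corollary B. An unbiased estimate of $\mathrm{Var}(\hat{p})$ is
$s_{\hat{p}}^2 = \frac{\hat{p}(1-\hat{p})}{n-1}\left(1 - \frac{n}{N}\right)$
Terminology. We refer to $s_{\bar{X}}$, $s_T$, $s_{\hat{p}}$ as estimated standard
errors. For each, the squared value is an unbiased estimate of
the corresponding variance.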
Population Parameters, Estimates, Std. Errors
Summary
CLT Approximation for srs
• In the case of sampling with replacement, one could invoke the CLT
to derive the approximate sampling distribution for reasonably
sized n, i.e., n ≥ 25.
• There is a generalization of the CLT which applies to the
case where one has a srs. It essentially says that the CLT applies if
– the sample size n is large enough and
– the sampling fraction, n/N, is small enough
CLT Result for srs
Introduction to Confidence Intervals
Recall.
Remark. It is desirable to have a more direct statement
quantifying the accuracy of the estimate. One standard way of
doing this is via confidence intervals.
Overview of Confidence Intervals
Definition.
Suppose θ is a (general) population parameter and $X_1, X_2, \ldots, X_n$
is a srs. Then a 100(1−α)% confidence interval is
(a) a random interval I computed from the srs, such that
(b) we know in advance that $P(\theta \in I) \ge 1 - \alpha$.
Overview of Confidence Intervals
Remarks.
• Once the data is collected and a confidence interval is
computed, the parameter is either in or out of the confidence
interval; there is no longer any probability.
– Hence the term confidence interval (as opposed
to probability interval).
• One potentially helpful interpretation is that if one were to
collect srs’s and compute 95% confidence intervals over and
over again (say, 1000 times), approximately 95% of those
intervals would contain the true parameter.
– We simply do not know which ones!
• Typical values for α are .10, .05, and .01, resulting in 90, 95, and
99 percent confidence intervals.
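The repeated-sampling interpretation above can be illustrated by simulation. The sketch below draws many simple random samples from a synthetic population and records how often the interval X̄ ± 1.96 s_X̄ contains the true mean. It assumes the estimated standard error s_X̄² = (s²/n)(1 − n/N) with s² using the divisor n − 1. The population, seeds, sample size, and function name are arbitrary choices for illustration.

```python
import math
import random

def coverage(population, n, reps=2000, z=1.96, seed=1):
    """Fraction of repeated srs intervals that contain the population mean."""
    rng = random.Random(seed)
    N = len(population)
    mu = sum(population) / N
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)          # srs, without replacement
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        # Assumed estimated standard error with finite population correction.
        se = math.sqrt(s2 / n * (1 - n / N))
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps

# Synthetic population of 5000 integer values between 0 and 10.
pop_rng = random.Random(0)
population = [pop_rng.randint(0, 10) for _ in range(5000)]
rate = coverage(population, 100)
print(rate)
```

The printed rate should land near 0.95, although any single interval either contains µ or does not.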
Specifics of Confidence Intervals
Confidence Intervals for srs
• Confidence intervals for µ
• Confidence intervals for p