Assignment #9
First, we recall some definitions that will be helpful in answering questions 1-3.
A population parameter is a single value that describes a population characteristic (such as center, spread, or location).
EXAMPLES:
· The proportion $p$ of adults in the United States who worry about money
· The mean lifetime $\mu$ of a certain brand of computer hard disks
· The lower quartile $q_1$ of a population of incomes
· The standard deviation $\sigma$ of the nicotine content per cigarette produced by a certain manufacturer
In real life, population parameters are usually unknown. An important objective of statistical inference is to use information obtained from a random sample or samples (depending on the design of the study) to estimate parameters and to test claims made about them.
A statistic is a number computed from the sample data only. Its value must not depend on any unknown population parameter.
Statistics are used as numerical estimates of population parameters.
Example: A random sample of 1500 national adults shows that 33% of Americans worry about money. The margin of error is +/- 3 percentage points.
Statistics have variation. Different random samples of size $n$ from the same population will usually yield different values of the same statistic. This is called sampling variability.
The sampling distribution of a statistic is the distribution of the values taken by the statistic over all possible random samples of the same size from a given population.
What do we look for in a sampling distribution?
Bias: A statistic is unbiased if its sampling distribution has a mean that is equal to the true value of the parameter being estimated by that statistic.
Variability: How much variation is there in the sampling distribution?
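The two criteria above can be written compactly. In the sketch below, $\theta$ stands for a generic parameter and $\hat{\theta}$ for the statistic that estimates it; these symbols are assumptions introduced here for illustration and are not used elsewhere in the assignment.

```latex
\[
  \operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta ,
  \qquad
  \hat{\theta} \text{ is unbiased} \iff E(\hat{\theta}) = \theta .
\]
\[
  \text{Variability is commonly summarized by }
  \operatorname{SD}(\hat{\theta}) = \sqrt{E\!\left[\bigl(\hat{\theta} - E(\hat{\theta})\bigr)^{2}\right]} .
\]
```

In a simulation, $E(\hat{\theta})$ and $\operatorname{SD}(\hat{\theta})$ are approximated by the mean and standard deviation of the simulated values of the statistic.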
The goal of this assignment is to simulate the sampling distribution of some statistics.
Question 1:
An urn contains 50 beads. The beads are identical in shape and have one of two colors: blue or orange. We would like to estimate the proportion $p$ of blue beads. We select a sample of 10 beads without replacement. The relevant statistic is the sample proportion $\hat{p}$ of blue beads (i.e., the number of blue beads in the sample divided by 10).
For the purpose of the simulation exercise, we will assume that the urn contains exactly 15 blue beads or, equivalently, that the proportion of blue beads is $p = 0.30$.
i) Select 100 samples of size 10 from the urn.
ii) Compute the sample proportion $\hat{p}$ of blue beads for each of the 100 samples found in (i).
iii) Make a histogram of the values of $\hat{p}$ found in (ii) (that is, the approximate sampling distribution of $\hat{p}$).
iv) Find the summary statistics of the 100 values of $\hat{p}$.
v) Use the histogram and the summary statistics to describe the approximate sampling distribution of $\hat{p}$.
vi) Is $\hat{p}$ an unbiased estimator of $p$? Hint: Evaluate the difference between $p = 0.30$ and the mean value.
vii) Based on the histogram, estimate the probability that $\hat{p} > 0.40$ and the probability that $\hat{p} < 0.20$.
Rcmdr instructions: (No data set needed)
Load the package Rcmdr (refer to the general instructions on
the first page) with the command library(Rcmdr).
· For (i) and (ii): To select 100 random samples and to compute
the sample proportions, choose
· Distributions → Discrete distributions → Hypergeometric
distribution → Sample from hypergeometric distribution
· Enter 15 (the number of blue beads in the urn) in the “m” box
· Enter 35 (the number of orange beads in the urn) in the “n”
box
· Enter 1 (selecting 1 bead at a time) in the “k” box
· Enter 100 in the “Number of samples” box
· Enter 10 in the “Number of columns” box
· Select Sample means in the Add to data set list (it’s actually
the default selection)
· Click OK
Click on the View data set to see the result of the simulation.
The 1 signifies “blue” and 0 “orange”. Each row is a random
sample of size 10 and its mean (last column) is the sample
proportion (do you see why?).
· To find the histogram of the 100 sample proportions:
· Choose Graphs → Histogram
· Pick the “mean” variable
· Click on Options and
· select “Percentages” from Axis Scaling
· in the x-label box, enter Sample proportion (n=10)
· Leave the y-label box empty
· Insert a title in the Graph title field (e.g., Approximate
Sampling Distribution of the Sample Proportion)
· Click OK. The output appears in a separate graph window.
Copy and paste it into your Word document.
· To find the summary statistics:
· Choose Statistics → Summaries → Numerical summaries
· Pick the “mean” variable
· Click on Options and choose Mean, Standard deviation and
quantiles (these are the percentiles) and deselect everything
else.
· Click OK. The output will be found in the Output part of the R
Commander window. Copy and paste.
Plain R instructions (Important: skip if you used Rcmdr)
Starting R: Double click on the R icon. The R console appears.
Copy and paste the program in the box below to the R console
and press Enter. Note the R code is in blue. The comments
follow the # sign. Copy them along with the code; R does not execute them.
n=10 #sample size
b=15 #number of blue beads
o=35 #number of orange beads
k=1 #number of beads selected from the box at a time
phat=c() #storage for the 100 phat values we will be generating below
for (i in 1:100){
y=rhyper(n, b, o, k) #selects a random sample of size n (1 = blue, 0 = orange)
phat[i]=mean(y) #computes the proportion of blue beads in the sample
}
hist(phat, labels=TRUE, col="grey", main="Sampling distribution of phat") #gives the histogram with counts on top of each bin
summary(phat) #gives the 5-number summary and the mean of phat
sd(phat) #gives the standard deviation of phat
length(phat) #gives the number of phat values we simulated
The output consists of two parts:
a) The histogram of the 100 sample proportions that R
simulated
b) The summary statistics of the 100 sample proportions
Copy and paste the output and answer the questions.
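A note on the plain R script for question 1: rhyper(n, b, o, k) with k=1 returns n independent single-bead draws, so each bead is in effect drawn from the full urn. The following is a minimal sketch, not part of the assignment, of one way to draw each sample of 10 beads strictly without replacement using sample(). The names urn, reps and phat_wor, and the seed, are assumptions introduced here for illustration.

```r
# Sketch: draw every sample of n beads without replacement from an explicit urn.
set.seed(1)                          # assumed seed, only so the run is reproducible
urn  <- c(rep(1, 15), rep(0, 35))    # 1 = blue, 0 = orange (50 beads in total)
n    <- 10                           # sample size
reps <- 100                          # number of samples

# Each replicate draws n distinct beads and records the proportion of blue beads.
phat_wor <- replicate(reps, mean(sample(urn, n, replace = FALSE)))

summary(phat_wor)                    # 5-number summary and mean
sd(phat_wor)                         # standard deviation of the simulated proportions
```

Both approaches give sample proportions centered near 0.30; they can differ in spread because drawing without replacement from a small urn reduces the variability of the sample proportion.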
Question 2:
Repeat question 1, but this time we select without replacement a
sample of size 20.
Rcmdr instructions:
· Repeat the Rcmdr instructions for question 1, except for the
number of columns. That line becomes: Enter 20 in the “Number
of columns” box
Plain R instructions: (skip if you are using Rcmdr)
· Replace the line n=10 with n=20 in the plain R script
provided for question 1.
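Instead of editing the script by hand, the simulation can be wrapped in a function that takes the sample size as an argument. This is only a sketch under that assumption: simulate_phat is a hypothetical helper name, not something the assignment defines, and it reuses the same rhyper() call as the question 1 script.

```r
# Hypothetical helper (not in the assignment): the question 1 simulation
# with the sample size n passed as an argument.
simulate_phat <- function(n, reps = 100, b = 15, o = 35, k = 1) {
  phat <- numeric(reps)              # storage for the simulated proportions
  for (i in seq_len(reps)) {
    y <- rhyper(n, b, o, k)          # n draws, each 1 (blue) or 0 (orange)
    phat[i] <- mean(y)               # sample proportion of blue beads
  }
  phat
}

set.seed(2)                          # assumed seed, only for reproducibility
phat10 <- simulate_phat(10)          # question 1 setting
phat20 <- simulate_phat(20)          # question 2 setting
c(sd10 = sd(phat10), sd20 = sd(phat20))
```

The histogram and summary commands from question 1 can then be applied to phat10 and phat20 separately.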
Question 3:
Assume your boss has asked you to estimate the proportion of
blue beads in the urn described above. Based on your findings
in questions 1 and 2, which of the two sampling distributions
would you prefer to work with? Explain your choice.
In questions 4-6, we have R simulate confidence intervals for a
normal population mean.
Question 4: (R)
i) Generate 25 samples of size 16 from a normal population with
mean $\mu = 460$ and standard deviation $\sigma = 100$. (nothing to take to your Word file)
ii) For each sample found in (i), construct a 95% confidence
interval for the population mean.
iii) Verify by hand, for sample 1 only, the results obtained by
R. Note that the sample mean is the midpoint of the confidence
interval.
iv) How many intervals contain $\mu$? Would you expect all 25 confidence intervals to contain $\mu$? Explain your answer.
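For the hand check in (iii), the interval can be written out as follows. This sketch assumes the known-$\sigma$ interval with $z^{*} = 1.96$, which is what the R script for this question uses; $\bar{x}$ denotes the mean of the sample being checked.

```latex
\[
  \bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}}
  \;=\; \bar{x} \pm 1.96 \cdot \frac{100}{\sqrt{16}}
  \;=\; \bar{x} \pm 1.96 \cdot 25
  \;=\; \bar{x} \pm 49 .
\]
```

The sample mean is recovered from the R output as the midpoint of the two bounds, $\bar{x} = (\text{lcb} + \text{ucb})/2$.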
R instructions (We are not using the Rcmdr package for this
problem)
Starting R: Double click on the R icon. The R console appears.
Copy and paste the program in the box below to the R console
and press Enter. Note the R code is in blue. The comments
follow the # sign. Take them along as they are not executable.
mu=460;sigma=100;n=16 #we are declaring the constants in the simulation
k=25 #number of samples we will be selecting in this simulation
se=sigma/sqrt(n) #the yardstick, aka standard error of the sample mean
lcb=c() #we are reserving a column for the lower confidence bound
ucb=c() #we are reserving a column for the upper confidence bound
#We are going to generate k samples of size n from a normal population with mean mu and standard deviation sigma
for (i in 1:k)
{
xbar=mean(rnorm(n,mu,sigma)) #we are selecting a sample of size n and computing its mean
lcb[i]=xbar-1.96*se #formula for the lower confidence bound
ucb[i]=xbar+1.96*se #formula for the upper confidence bound
}
#we print the confidence intervals
ci=data.frame(lcb,ucb) #we create a data frame that holds the lcb and ucb obtained from each sample (the row number is the sample number)
ci #print the data frame
#we plot the confidence intervals
matplot(rbind(lcb,ucb), rbind(1:k,1:k), type="l", lty=1) #plots the ci's as line segments
abline(v=mu) #adds a vertical line that represents the population mean
The output consists of two parts:
a) the 25 confidence intervals (lower confidence bound, upper
confidence bound) to be found in the R console
b) a graphical representation of the intervals (in a separate
window)
Copy and paste the output and answer the questions.
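Counting the intervals that contain the population mean can be done by eye from the plot, or with a short check like the sketch below. It regenerates its own 25 intervals with the same constants as the script above; the seed and the name covers are assumptions added here for illustration.

```r
# Sketch: count how many of the k simulated 95% intervals contain mu.
set.seed(3)                                   # assumed seed, only for reproducibility
mu <- 460; sigma <- 100; n <- 16; k <- 25     # same constants as the assignment script
se <- sigma / sqrt(n)                         # standard error of the sample mean

xbar <- replicate(k, mean(rnorm(n, mu, sigma)))  # k sample means
lcb  <- xbar - 1.96 * se                      # lower confidence bounds
ucb  <- xbar + 1.96 * se                      # upper confidence bounds

covers <- lcb <= mu & mu <= ucb               # TRUE when the interval contains mu
sum(covers)                                   # number of intervals that contain mu
which(!covers)                                # sample numbers of intervals that miss mu
```

The same two lines (covers and sum(covers)) can be run directly after the assignment script, since it defines lcb, ucb and mu under the same names.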
Question 5: (R)
Repeat question 4, but this time use an 80% confidence level.
Instructions: Since we are dealing with a new confidence level,
modify the $z^{*}$ value accordingly in the lcb[i] and ucb[i] equations in the
program above and run it again.
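If a normal table is not at hand, the critical value for a given confidence level can be obtained from qnorm(). The sketch below is one way to do it; the names conf_level and z_star are assumptions introduced here.

```r
# Sketch: z* for a two-sided interval leaves (1 - level)/2 in each tail.
conf_level <- c(0.80, 0.95)
z_star <- qnorm(1 - (1 - conf_level) / 2)
round(z_star, 3)   # 1.282 1.960
```

The value for 0.95 matches the 1.96 used in the question 4 script, which is a quick check that the formula is being applied correctly.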
Question 6: (R)
Based on the simulations you conducted in questions 4 and 5,
what are the differences between 80% and 95% confidence
intervals for a population mean?