Inferences on Sample Proportions from BITS Pilani Course

BITS Pilani, Pilani Campus
Course No: MTH F113
Probability and Statistics

Chapter 9: Inferences on Proportions
Sumanta Pasari

What is Sample Proportion?
Example 1:
• Suppose a student guesses at the answer on every question in a
300-question examination. If he gets 60 questions correct, then
his proportion of correct guesses is 60/300 = 0.20.
• If he gets 75 questions correct, then his proportion of correct
guesses is 75/300 = 0.25.
• Thus, the proportion of correct guesses is simply the number of
correct guesses divided by the total number of questions.

What is Sample Proportion?
Examples:
• Babies: Is the proportion of Indian babies born male
different from 0.50?
• Handedness: Are more than 80% of Indians right handed?
• Ice cream: Is the percentage of “creamery” customers who
prefer chocolate ice cream over vanilla less than 70%?
• Students: Is the proportion of Indian students who go
abroad during their BTech less than 0.20?

We would like to estimate “population
proportion”, somehow using random
samples.
So, we have a population of interest; a particular
trait (a distinguishing characteristic or quality) is
being studied, and each member of the
population can be classified as either having the
trait or not (like, Bernoulli trials).
Let’s Define…

Let us draw a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from the
population, where
$X_i = 1$, if the i-th member of the sample has the trait
$X_i = 0$, if the i-th member of the sample does not have the trait
Then $X = \sum_{i=1}^{n} X_i$ gives the number of objects in the
sample with the trait, and the statistic $X/n$ gives the
proportion of the sample with the trait. Note that $X$ is a
binomial RV with parameters $n$ (known) and $p$.
Let’s Define…


The statistic that estimates the parameter $p$, the proportion of a
population that has some property, is the sample proportion
$\hat{p} = \frac{X}{n} = \frac{\text{number in sample with the trait (success)}}{\text{sample size}}$
Properties:
(i) As the sample size increases ($n$ large), the sampling
distribution of $\hat{p}$ becomes approximately normal (WHY?)
(ii) The mean of $\hat{p}$ is $p$, and the variance of $\hat{p}$ is $\frac{p(1-p)}{n}$ (WHY?)
(iii) Can we get estimators of $p$? Point and interval estimators
Sample Proportion

   
Note that
$\hat{p} = \frac{X}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$,
where each $X_i$ is an independent point binomial (Bernoulli) RV,
that is, $P[X_i = 1] = p$ and $P[X_i = 0] = 1 - p$.

Sample Proportion
$x_i$       1      0
$f(x_i)$    p      1-p
$E[X_i] = 1(p) + 0(1-p) = p$
$\mathrm{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = p - p^2 = p(1-p)$
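
The two "WHY?" questions about the mean and variance of the sample proportion can be answered from these Bernoulli moments. The following is a short derivation, assuming the $X_i$ are independent and share the same success probability $p$:

```latex
\begin{align*}
E[\hat{p}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{np}{n} = p, \\
\operatorname{Var}(\hat{p}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{n\,p(1-p)}{n^2}
            = \frac{p(1-p)}{n}.
\end{align*}
```

The first line uses only linearity of expectation; the second line needs independence so that the variance of the sum is the sum of the variances.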

We just obtained the point estimator of p. Is it
unbiased? Does it have small variance?
Now to obtain CI on p, we would like to get a
known statistic. Can the central limit theorem be
useful?
What are the requirements for CLT?
Estimators for p
$\hat{p} = \frac{X}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$
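
One way to explore the questions about unbiasedness, variance and the CLT is a small simulation. The sketch below is only illustrative: the function name `simulate_sample_proportions`, the number of repetitions and the seed are assumptions made here, and the values p = 0.2, n = 300 simply echo the guessing example.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of n Bernoulli(p) trials and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.2, 300  # illustrative values; reps and seed below are arbitrary choices
p_hats = simulate_sample_proportions(p, n, reps=2000)

# The empirical mean should be close to p (unbiasedness) and the
# empirical variance close to p(1-p)/n.
print("mean of p-hat     :", statistics.fmean(p_hats))
print("variance of p-hat :", statistics.pvariance(p_hats))
print("theory p(1-p)/n   :", p * (1 - p) / n)
```

A histogram of `p_hats` would also show the roughly bell-shaped form that the CLT suggests for large n.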
  


 
 
 
 
 
Note that for large $n$ (using CLT),
$\hat{p} \approx N\!\left(p, \frac{p(1-p)}{n}\right)$, so that $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \approx N(0, 1)$.
Taking two points symmetrically about the origin, we get
$P\!\left[-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right] \approx 1 - \alpha$.
Here $1 - \alpha$ is known as the confidence level.
Confidence Interval on p
Interval Estimate = Point Estimate ± Margin of Error

   
 
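Rearranging the probability statement gives
$P\!\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right] \approx 1 - \alpha$.
As $p$ is unknown, the above confidence bounds are not statistics. So replace $p$ by its
unbiased estimator $\hat{p}$, and then the CI on $p$ having confidence level $1 - \alpha$ is
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
The endpoints of the confidence interval,
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ and $\hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$,
are called confidence limits.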
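
The interval formula can be sketched in a few lines of Python. The function name `proportion_ci` and its signature are hypothetical choices made for this illustration; the critical value is taken from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for p: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    if not 0 <= x <= n or n <= 0:
        raise ValueError("need 0 <= x <= n and n > 0")
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example: 60 successes out of 300 trials at the default 95% level.
low, high = proportion_ci(60, 300)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

Because this interval relies on the normal approximation, it is best read as a large-sample sketch; for small n or for p near 0 or 1 it can be unreliable.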

 

 

 
 

We can be $100(1-\alpha)\%$ sure that $\hat{p}$ and $p$ differ by
at most $d$, where $d$ is given by
$d = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
Thus, the sample size for estimating $p$, when a prior
estimate is available, is
$n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{d^2}$.
Sample Size for Estimating p

It can be shown (Exercise 9, page 327) that the value of
$\hat{p}(1-\hat{p}) \le \frac{1}{4}$.
Thus, the sample size for estimating $p$, when a
prior estimate is not available, is
$n = \frac{z_{\alpha/2}^2}{4d^2}$.
Sample Size for Estimating p
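
Both sample-size formulas can be wrapped in one helper. This is a minimal sketch: the name `sample_size` and the `prior` argument are assumptions made for illustration, and the result is rounded up because n must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size(d, confidence=0.95, prior=None):
    """Smallest n so that p_hat is within d of p at the given confidence level.

    prior=None uses the worst case p(1-p) <= 1/4; otherwise prior*(1-prior).
    """
    if d <= 0:
        raise ValueError("d must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    pq = 0.25 if prior is None else prior * (1 - prior)
    return ceil(z * z * pq / (d * d))


# Illustrative values: margin 0.05 at 90% confidence.
print(sample_size(0.05, confidence=0.90))             # no prior estimate
print(sample_size(0.05, confidence=0.90, prior=0.3))  # prior estimate 0.3
```

The version without a prior never gives a smaller n than the version with one, since p(1-p) cannot exceed 1/4.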

Problem Solving
Ex 9.1 (Ex. 2, Section 9.1, Page 326)
A study of electromechanical protection devices used in
electrical power systems showed that of 193 devices
that failed when tested, 75 were due to mechanical
part failures.
a) Find a point estimate for p, the proportion of failures
that are due to mechanical failures.
b) Find a 95% confidence interval on p.
c) How large a sample is required to estimate p to within
0.03 with 95% confidence?

Random variable X= number of failed devices which were due
to mechanical failure among 193 failed devices.
X has approx. normal dist with mean = 193p,
variance=193p(1-p).
a) Point estimate for p: $\hat{p}_{obs} = x/n = 75/193 = 0.3886$.
b) The 95% confidence interval on p is
$\frac{75}{193} \pm z_{0.025}\sqrt{\frac{75}{193}\left(1 - \frac{75}{193}\right)\Big/193} = (0.3198, 0.4574)$.
c) With the prior estimate:
$n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{d^2} = \frac{1.96^2\,(0.389)(0.611)}{0.03^2} \approx 1015$.
Without using the prior estimate:
$n = \frac{z_{\alpha/2}^2}{4d^2} = \frac{1.96^2}{(4)(0.03^2)} \approx 1068$.
Problem Solving
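
The arithmetic of Ex 9.1 can be checked numerically. The script below is only a verification sketch; it uses the table value 1.96 for z_{0.025} and the rounded prior estimate 0.389, and the variable names are chosen here for illustration.

```python
from math import ceil, sqrt

x, n = 75, 193   # mechanical failures among failed devices
z = 1.96         # table value z_{0.025}
d = 0.03         # required margin for part (c)

# (a) point estimate and (b) 95% confidence interval
p_hat = x / n
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

# (c) sample size with the rounded prior estimate, and without a prior
n_prior = ceil(z**2 * 0.389 * 0.611 / d**2)
n_no_prior = ceil(z**2 / (4 * d**2))

print(round(p_hat, 4), tuple(round(v, 4) for v in ci), n_prior, n_no_prior)
```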

Ex 9.2 (Ex. 33, Page 333)
A survey of mining companies is to be conducted to
estimate p, the proportion of companies that
anticipate hiring either graduating seniors or
experienced engineers during the coming year.
a) How large a sample is required to estimate p
within 0.04 with 94% confidence?
b) A sample of size 500 yields 105 companies that
plan to hire such engineers. Find the point
estimate of p. Also find 94% confidence interval for
p.
Problem Solving

(a) (estimate unknown)
$n = \frac{z_{0.03}^2}{(4)(0.04^2)} = \frac{1.88^2}{(4)(0.04^2)} = 552.25$, so take $n = 553$.
Suppose we already know $0.2 \le p \le 0.35$.
Then we can improve n by maximizing
p(1-p) on this interval to get
$n = \left(\frac{1.88}{0.04}\right)^2 \max\{p(1-p) : 0.2 \le p \le 0.35\}$
$= \left(\frac{1.88}{0.04}\right)^2 \max\{(0.2)(0.8), (0.35)(0.65)\}$
$= 502.5475$, so take $n = 503$.
Problem Solving

(b) $\hat{p}_{obs} = 105/500 = 0.210$.
The 94% C.I. for p is
$\hat{p}_{obs} - z_{0.03}\sqrt{\hat{p}_{obs}(1-\hat{p}_{obs})/n} \le p \le \hat{p}_{obs} + z_{0.03}\sqrt{\hat{p}_{obs}(1-\hat{p}_{obs})/n}$,
i.e., $0.176 \le p \le 0.244$.
Problem Solving
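
A quick numerical check of Ex 9.2 is sketched below. It takes z_{0.03} from `statistics.NormalDist` instead of the table value 1.88, so intermediate numbers differ slightly in the later decimals; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(1 - 0.06 / 2)  # z_{0.03}, roughly 1.88
d = 0.04

# (a) sample size with no information about p, then with 0.2 <= p <= 0.35.
# p(1-p) is increasing on [0, 0.5], so its maximum on that interval is at p = 0.35.
n_a = ceil(z**2 / (4 * d**2))
n_bounded = ceil(z**2 * 0.35 * 0.65 / d**2)

# (b) point estimate and 94% confidence interval from 105 out of 500
p_hat = 105 / 500
margin = z * sqrt(p_hat * (1 - p_hat) / 500)
low, high = p_hat - margin, p_hat + margin

print(n_a, n_bounded, round(p_hat, 3), round(low, 3), round(high, 3))
```

With these inputs the interval limits come out near 0.176 and 0.244.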

Problem Solving
HW 9.1
A major metropolitan newspaper selected a random sample of 1,600
readers from their list of 100,000 subscribers. They asked whether the
paper should increase its coverage of local news. It was found that 640
readers out of the random sample of 1600 readers wanted more local
news.
(a) What is the point estimate for p, the proportion of readers who
would like more coverage of local news?
(b) Find 95% and 99% confidence intervals on p.
(c) How large a sample is required to estimate p to within 0.05
with 99% confidence?

Problem Solving
HW 9.2
To examine the proportion of left-handed professional baseball players,
we choose a random sample of size 59 from this population. It was
observed that there are 15 left-handed baseball players in the given
random sample.
(a) What is the point estimate for p, the proportion of left-handed
baseball players?
(b) Find a 95% confidence interval on p and interpret the result.

Problem Solving
HW 9.3
The cure rate for a standard treatment of a disease is 45%. Dr. Sen has
perfected a primitive treatment which he claims is much better. As
evidence, he says that he has used his new treatment on 50 patients
with the disease and cured 25 of them. What do you think? Is this new
treatment better? Use a 95% confidence interval to answer the question
[Ans: (0.36, 0.64)].
Does this example give some notion of hypothesis testing on
proportion?

Problem Solving
HW 9.4
The advocacy group of a company took a random sample of 1000
consumers who recently purchased their MP3 player and found that 400
were satisfied with their purchase. Find a 95% confidence interval for p,
the proportion of customers that are satisfied with the MP3 player of
that company.
HW 9.5
Experimenters injected a growth hormone gene into thousands of carp
eggs. Of the 400 carp that grew from these eggs, 20 incorporated the
gene into their DNA (Science News, May 20, 1989). Calculate a 95%
confidence interval for the proportion of carp that would incorporate
the gene into their DNA. [Ans.: (.03,.07)]

A hypothesis test about the value of a
population proportion p must take one of the
following three forms (where p0 is the
hypothesized value of the population proportion):
One-tailed (lower tail): $H_0: p \ge p_0$, $H_a: p < p_0$
One-tailed (upper tail): $H_0: p \le p_0$, $H_a: p > p_0$
Two-tailed: $H_0: p = p_0$, $H_a: p \ne p_0$
Hypothesis Testing on Proportions

9.2 Hypothesis testing on proportions
Let p0 be the null value of p.
Then
H0:p=p0 H0:p=p0 H0:p=p0
H1:p>p0 H1:p<p0 H1:p≠p0
Right-tailed test Left-tailed test Two-tailed test

 
 
Test statistic for testing $H_0: p = p_0$:
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

• Rejection Rule: P-Value Approach
Reject H0 if P-value < α (e.g., α = 0.05)
• Rejection Rule: Critical Value Approach
H0: p ≤ p0: Reject H0 if $z > z_{\alpha}$
H0: p ≥ p0: Reject H0 if $z < -z_{\alpha}$
H0: p = p0: Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

Note that the test statistic is a logical choice because it
compares the unbiased estimator of p with the null value
of p0.
 
 
Test statistic for testing $H_0: p = p_0$:
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

Steps of Hypothesis Testing
Step 1. Develop the null and alternative hypotheses on the proportion;
calculate the observed value of the test statistic.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test statistic.
Step 4. Based on α, identify the critical values (or use the value of the
test statistic to compute the P-value).
Step 5. Reject H0 if the calculated test statistic value falls in the
rejection region (or, equivalently, reject H0 if P-value < α).
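
The steps above can be sketched as a small Python function for the large-sample test on one proportion. The name `one_proportion_ztest` and the `alternative` labels ("greater", "less", "two-sided") are hypothetical conventions chosen for this illustration, not notation from the course.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ztest(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0 using Z = (p_hat - p0)/sqrt(p0(1-p0)/n)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":      # H1: p > p0 (right-tailed)
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # H1: p < p0 (left-tailed)
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # H1: p != p0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Example: 59 successes in 100 trials, testing H0: p = 0.5 against H1: p > 0.5.
alpha = 0.05
z, p_value = one_proportion_ztest(59, 100, 0.5, alternative="greater")
print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The standard error uses p0 rather than the sample proportion, because the test statistic is evaluated under the null hypothesis.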

Ex 9.3 For a Christmas and New Year’s week, the
National Safety Council estimated that 500 people
would be killed and 25,000 injured on the nation’s
roads. The NSC claimed that 50% of the accidents
would be caused by drunk driving. To verify, a sample
of 120 accidents showed that 67 were caused by drunk
driving. Use these data to test the NSC’s claim at α =
0.05
Sol.: Discussed in class from two approaches, P-
value and critical value
Problem Solving
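
Since the solution was only discussed in class, the following is a hedged sketch of how both approaches might be carried out for Ex 9.3. It assumes a two-tailed test (H0: p = 0.5 against H1: p ≠ 0.5), which is one reading of "test the NSC's claim"; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 67, 120   # accidents caused by drunk driving, sample size
p0 = 0.50        # NSC's claimed proportion
alpha = 0.05

p_hat = x / n
z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Critical value approach (two-tailed, assumed): reject if |z_obs| > z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z_obs) > z_crit

# P-value approach: reject if P-value < alpha
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
reject_by_p_value = p_value < alpha

print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}, P-value = {p_value:.3f}")
print("reject H0:", reject_by_critical_value, reject_by_p_value)
```

Under these assumptions the observed statistic is about 1.28, which does not exceed 1.96, so the data do not contradict the NSC's claim at the 5% level.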

Ex 9.4 A marketing company claims that it receives 8%
responses from its mailing. To test this claim, a
random sample of 500 were surveyed with 25
responses. Test at the α = 0.05 significance level.
Sol.: Discussed in class from two approaches, P-
value and critical value
Problem Solving
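
As the solution was discussed only in class, here is a hedged outline of the computation for Ex 9.4, assuming a two-tailed test ($H_0: p = 0.08$ against $H_1: p \ne 0.08$) and table values for the standard normal distribution:

```latex
\begin{align*}
\hat{p}_{obs} &= \frac{25}{500} = 0.05, \\
z_{obs} &= \frac{\hat{p}_{obs} - p_0}{\sqrt{p_0(1-p_0)/n}}
         = \frac{0.05 - 0.08}{\sqrt{(0.08)(0.92)/500}}
         \approx \frac{-0.03}{0.01213} \approx -2.47, \\
\text{P-value} &= 2\,P(Z \le -2.47) \approx 2(0.0068) \approx 0.014.
\end{align*}
```

Under this assumption, $|z_{obs}| > z_{0.025} = 1.96$ and the P-value is below 0.05, so both approaches lead to rejecting $H_0$.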

Ex 9.5 (Ex. 10) A poll of investment analysts taken
earlier suggests that a majority of these individuals
think that the dominant issue affecting the future of
the solar energy industry is falling energy prices. A
new survey is being taken to see if this is still the case.
Let p denote the proportion of investment analysts
holding this opinion.
(a) Set up the appropriate null and alternative
hypothesis.
Problem Solving

b)When the survey is conducted, 59 of the 100
analysts sampled agreed that the major issue is
falling energy prices. Is this sufficient to allow us
to reject H0? Explain based on P-value of the test.
c) Interpret your results in the context of this
problem.
Problem Solving

Solution
p = proportion of persons from the population who
feel falling energy prices is a dominant issue for
future of solar energy.
a) H0: p≤ 0.5 Vs H1: p > 0.5
b) These can be replaced by
H0: p = 0.5 Vs H1: p > 0.5. To test these, we have used
a sample with n = 100 and $\hat{p}_{obs} = x/n = 0.59$.
$z_{obs} = \frac{\hat{p}_{obs} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.09}{\sqrt{0.25/100}} = 1.8$.
P-value $= P(Z \ge 1.8) = 0.0359$.
The P-value is relatively small. Hence we reject H0.
c) A majority of investment analysts think that falling
energy prices is the dominant issue in the future of
solar energy.
Solution (Ex. 9.5)

HW 9.6 (Ex. 15) A battery-operated digital pressure
monitor is being developed for use in calibrating
pneumatic pressure gauges in the field. It is thought that
95% of the readings it gives lie within 0.01 lb/in^2 of the
true reading. In a series of 100 tests, the gauge is
subjected to a pressure of 10,000 lb/in^2. A test is
considered to be a success if the reading lies within
10,000 ± 0.01 lb/in^2. We want to test
H0: p = 0.95 Vs H1: p ≠ 0.95
at the α = 0.05 level.
Problem Solving

a) What are the critical points for the test?
b) When the data are gathered, it is found that 98 out
of 100 readings were successful. Can H0 be rejected at
α=0.05 level? To what type of error are you now
subject?
Problem Solving

Hints:
H0: p = 0.95
H1: p ≠ 0.95
Test statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{\hat{p} - 0.95}{\sqrt{(0.95)(0.05)/100}} = \frac{10(\hat{p} - 0.95)}{0.2179}$.
Under H0, Z has approximately the standard normal distribution.
Since α = 0.05, $z_{0.025} = 1.96$. Can we get the critical values?

HW 9.7 (Ex. 14).
Opponents of the construction of a dam on the New river
claim that less than half of the residents living along the
river are in favour of its construction. A survey is conducted
to gain support for this point of view.
a) Set up the appropriate null and alternate hypotheses.
b) Find the critical point for an α=0.1 level test.
c) Of 500 people surveyed, 230 favour the construction. Is this
sufficient evidence to justify the claim of the opponents?
d) To what type of error are you now subject? Discuss the
practical consequences of making such an error.
Problem Solving
