6.00
Statistical Estimation
October 6, 2023 1
Objectives
At the end of this session the student will be able to:
• Describe statistical inference and estimation
• Differentiate between point and interval estimation
• Compute appropriate confidence intervals for population means and proportions and interpret the findings
• Describe methods of sample size calculation
Outline
• Definition of statistical inference and estimation
• Point estimation
• Interval estimation
• Sample size calculation

• Descriptive statistics help investigators describe and summarize data.
• Probability and sampling distribution concepts are needed to evaluate data using statistical methods.
• Without probability and sampling distribution theory:
  - we could not make statements about populations without studying everyone in the population
  - studying everyone in the population is an undesirable and often impossible task
Statistical inference
Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample that has been drawn from that population.
• The two primary methods for making inferences are estimation and hypothesis testing.
Statistical Estimation
Estimation is the process of determining a likely value for a population parameter based on information collected from a sample.
• It is the use of sample statistics to estimate population parameters.
• Researchers are usually interested in estimates of many quantities, such as totals, averages and proportions.
E.g. an estimate of the proportion of smokers among all people aged 15 to 24 in the population.
Types of Estimation
1. Point Estimation
2. Interval Estimation
1. Point Estimation
• A single numerical value is used to estimate the corresponding population parameter.
• The point estimate always lies within the corresponding interval estimate.
• x̄ is an estimator of the population mean μ.
• s is an estimator of the population standard deviation σ.
• p is an estimator of the population proportion π.
• r is an estimator of the population correlation coefficient ρ.
From a single sample we can calculate a sample statistic to estimate a single parameter (a point estimate).
Point estimate for the population mean μ is
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Point estimate for the population proportion is given by
$p = \frac{x}{n}$
where x is the total number of successes (events).
Properties of a Good Estimator
a. Unbiasedness
• A sample statistic whose mean (over repeated samples) is equal to the population parameter it estimates is unbiased.
• The sample mean is an unbiased estimator of the population mean μ; the sample median is also unbiased for μ when the distribution is symmetric.
b. Minimum variance
• An estimator which has the minimum standard error is a good estimator.
• For a symmetric distribution such as the normal, the mean has the smaller standard error.
• If the distribution is strongly skewed, the median may have the smaller standard error.
c. Consistency
• As sample size increases, variation of the estimator from the true population value decreases.
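The point estimates and the consistency property described above can be illustrated with a small Python sketch. It is only an illustration: the function names, the toy data, and the normal population with mean 50 and SD 10 are assumptions, not part of the slides.

```python
import random
import statistics

def point_estimates(values, successes, n_trials):
    """Return (sample mean, sample SD, sample proportion) as point estimates."""
    x_bar = statistics.mean(values)      # estimates mu
    s = statistics.stdev(values)         # estimates sigma (n - 1 divisor)
    p = successes / n_trials             # estimates pi
    return x_bar, s, p

def spread_of_sample_mean(n, reps=1000, mu=50.0, sigma=10.0, seed=1):
    """Simulated SD of the sample mean over `reps` samples of size n.

    mu and sigma describe a hypothetical normal population.
    """
    rng = random.Random(seed)
    means = [statistics.mean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(reps)]
    return statistics.stdev(means)

print(point_estimates([2, 4, 6], successes=3, n_trials=10))
for n in (10, 100):
    # The spread of the sample mean shrinks as n grows (consistency).
    print(n, round(spread_of_sample_mean(n), 2))
```

The simulated spread for n = 100 should come out close to one third of the spread for n = 10, in line with the σ/√n behaviour of the standard error.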
2. Interval estimation
Interval estimation is a statement that a population parameter has a value lying between two specified limits.
• An interval estimate provides more information about a population characteristic than a point estimate.
• The value of the sample statistic varies from sample to sample, so reporting only a single estimated value of the parameter is generally not sufficient.
• We need to take into account the sample-to-sample variation of the statistic.
• A confidence interval defines an interval within which the true population parameter is likely to fall (interval estimate).
Interval estimate (confidence interval): consists of two numbers, a lower limit and an upper limit, which serve as the bounding values within which the parameter is expected to lie with a certain degree of confidence.
• A confidence interval therefore takes into account the sample-to-sample variation of the statistic and gives a measure of precision.
• The general formula used to calculate a confidence interval is Estimate ± k × Standard Error, where k is called the reliability coefficient.
• Confidence intervals express the inherent uncertainty in any medical study by giving upper and lower bounds for the anticipated true underlying population parameter.
• Most commonly 95% confidence intervals are calculated; however, 90% and 99% confidence intervals are sometimes used.
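As a rough sketch of the general formula Estimate ± k × Standard Error, the Python below obtains k from the standard normal distribution for a chosen confidence level. The function names and the numbers in the usage line are illustrative assumptions.

```python
from statistics import NormalDist

def reliability_coefficient(confidence):
    """z value that leaves (1 - confidence)/2 in each tail of the standard normal."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def confidence_interval(estimate, k, standard_error):
    """Return (lower, upper) = estimate -/+ k * standard_error."""
    margin = k * standard_error
    return estimate - margin, estimate + margin

for level in (0.90, 0.95, 0.99):
    print(level, round(reliability_coefficient(level), 3))

# Hypothetical numbers: estimate 10, k = 2, standard error 1.5
print(confidence_interval(10, 2, 1.5))
```

When a t-distribution is more appropriate than the normal, k would instead come from a t table; only the source of k changes, not the form of the interval.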
A (1−α)100% confidence interval for an unknown population mean and population proportion is given as follows:
For estimating a mean:
$\left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$
If σ is unknown, it can be estimated by s.
For estimating a proportion:
$\left[p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right]$
 The 95% confidence interval is interpreted in such a way that,
under the conditions assumed for the underlying distribution, you are 95% confident that the interval contains the true parameter. In repeated sampling, about 95% of intervals constructed this way would contain it.
• A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval includes the population parameter.
• A 99% CI is wider than a 95% CI; the extra width means that we can be more certain that the interval contains the population parameter.
• To obtain higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).
• For a given confidence level (e.g. 90%, 95%, 99%), the width of the confidence interval depends on the standard error of the estimate, which in turn depends on the sample size and the standard deviation.
Sample size: the larger the sample size, the narrower the confidence interval and the more precise our estimate.
• Lack of precision means that in repeated sampling the values of the sample statistic are spread out or scattered.
• The result of sampling is then not repeatable.
• You can make the precision as high as you want by taking a large enough sample.
• The margin of error decreases as √n increases.
Standard deviation: the more variation among the individual values, the wider the confidence interval and the less precise the estimate.
• As sample size increases, the standard error (not the SD of the individual values) decreases.
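The effect of sample size and standard deviation on the margin of error can be checked numerically. This is a minimal sketch; the SD of 20 and the sample sizes are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def margin_of_error(z, sd, n):
    """Half-width of the interval: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)

# Quadrupling n halves the margin of error (hypothetical SD = 20, z = 1.96).
for n in (25, 100, 400):
    print(n, round(margin_of_error(1.96, 20.0, n), 2))

# Doubling the SD doubles the margin of error at a fixed n.
print(round(margin_of_error(1.96, 40.0, 100), 2))
```

Because the margin shrinks with √n rather than n, each further gain in precision costs proportionally more observations.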
Confidence Intervals for
• A single population mean
• A single population proportion
• Difference between two population means
• Difference between two population proportions
1) C.I. for a population mean (normally distributed)
a) Known variance (large sample size)
• A 100(1−α)% C.I. for μ is $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
• α is chosen by the researcher; the most common values of α are 0.05, 0.01, 0.001 and 0.1.
Example
• A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals.
• He assumes that strength scores are approximately normally distributed with a variance of 144.
• A sample of 15 subjects who participated in the experiment yielded a mean of 84.3.
Solution: σ = √144 = 12 and Z0.005 = 2.58, so the 99% C.I. is 84.3 ± 2.58 × (12/√15) = 84.3 ± 8.0 = (76.3, 92.3).
⇒ We are 99% confident that the population mean is between 76.3 and 92.3.

• E.g. 2. A random sample of 100 cancer patients treated with a new drug has a mean survival time of 46.9 months.
• If the SD of the population is 43.3 months, find a 95% confidence interval for the population mean.
Solution: 46.9 ± 1.96 × (43.3/√100) = 46.9 ± 8.5 = (38.4 to 55.4 months)
• Hence, we are 95% confident that the limits (38.4, 55.4) contain the mean survival time in the population from which the sample arose.
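The two known-variance examples can be reproduced with a short Python sketch. The helper name `z_interval_mean` is an assumption; it takes z from the standard normal distribution rather than from a printed table, so the last digit may differ slightly from hand calculation before rounding.

```python
import math
from statistics import NormalDist

def z_interval_mean(x_bar, sigma, n, confidence=0.95):
    """C.I. for a mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Cancer survival example: mean 46.9, sigma 43.3, n = 100, 95% confidence
lo, hi = z_interval_mean(46.9, 43.3, 100, 0.95)
print(round(lo, 1), round(hi, 1))

# Muscle strength example: mean 84.3, variance 144 (sigma = 12), n = 15, 99% confidence
lo, hi = z_interval_mean(84.3, 12.0, 15, 0.99)
print(round(lo, 1), round(hi, 1))
```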
b) Unknown variance (small sample size, n < 30)
A 100(1−α)% C.I. for μ is $\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$
• The t distribution density curve is bell shaped and symmetrical about zero.
• There is a different curve for each df (which depends on the sample size); for very large df the curve is very close to Z.
The Z-test is applied when:
• the distribution is normal and the population standard deviation σ is known, or
• the sample size n is large (n ≥ 30) and σ is unknown (taking s as an estimator of σ).
But what happens when n < 30 and σ is unknown?
• We use a t-distribution, which depends on the number of degrees of freedom (df).
• The t-distribution is a theoretical probability distribution (i.e. its total area is 100 percent) and is defined by a mathematical function.
• The distribution is symmetrical, bell-shaped and similar to the normal but more spread out.
• As the df decrease, the t-distribution becomes increasingly spread out compared with the normal.
• The sample standard deviation s is used as an estimate of σ (the standard deviation of the population, which is unknown) and is a logical substitute.
• For large sample sizes (n ≥ 30), the t and Z curves are so close together that it matters little which you use.
Degrees of Freedom
• It is defined as the number of values which are free to vary after imposing a certain restriction on your data.
Example: If 3 scores have a mean of 10, how many of the scores can be freely chosen?
Solution: The first and the second scores could be chosen freely (e.g. 8 and 12, 9 and 5, 7 and 15, etc.).
But the third score is then fixed, because the three scores must sum to 30 (10, 16 and 8 respectively).
• Hence, there are two degrees of freedom.
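The degrees-of-freedom example amounts to one line of arithmetic, sketched here in Python (the function name `last_score` is illustrative): once the mean and all but one score are chosen, the remaining score is determined.

```python
def last_score(free_scores, n, mean):
    """Given n - 1 freely chosen scores and a fixed mean, return the forced last score."""
    return n * mean - sum(free_scores)

for pair in [(8, 12), (9, 5), (7, 15)]:
    print(pair, "->", last_score(pair, n=3, mean=10))
```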
Table of t-distributions
The table of t-distribution shows values of t for
selected areas under the t curve.
Different values of df appear in the first column.
The table is adapted for efficient use for either one or
two-tailed tests.
E.g. 1) If df = 8, 5% of t scores are above what value?
Solution:
• Look along the row labeled “one tail” to the value 0.05.
• The intersection of the 0.05 column and the row with 8 in the df column gives the value t = 1.86.
E.g. 2) Find t if n = 13 and 95% of t scores are between −t and +t.
Solution: df = 13 − 1 = 12. If 95% of t scores are between −t and +t, then 5% are in the two tails.
• Look at the table along the row labeled “two tail” to the value 0.05.
• The intersection of this 0.05 column and the row with 12 in the df column gives t = 2.179.
E.g. 3) If df = 5, what is the probability that a t score is above 2.02 or below −2.02?
Solution:
• Two tails are implied. Look along the “df = 5” row to find the entry 2.02.
• The probability is 0.10.
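The three table look-ups can be cross-checked without a printed table. The sketch below integrates the t density numerically with Simpson's rule using only the standard library; the function names and the number of integration steps are assumptions, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the curve is symmetric about zero

print(round(upper_tail(1.86, 8), 3))        # one tail, df = 8
print(round(2 * upper_tail(2.179, 12), 3))  # two tails, df = 12
print(round(2 * upper_tail(2.02, 5), 3))    # two tails, df = 5
```

The printed probabilities should agree with the table answers (0.05, 0.05 and 0.10) up to the rounding of the tabulated t values.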
E.g. The mean pulse rate and standard deviation of a random sample of 9 first-year male medical students were 68.7 and 8.67 beats per minute respectively (assume a normal distribution).
• Find a 95% C.I. for the population mean.
Solution: with α = 0.05 and df = n − 1 = 8, the table value is t(α/2, n−1) = t(0.025, 8) = 2.31, and s/√n = 8.67/√9 = 2.89.
Therefore, the 95% C.I. for μ = 68.7 ± (2.31 × 2.89) = 68.7 ± 6.7 = (62.0 to 75.4) beats per minute.
We are 95% confident that the population mean (μ) lies between 62.0 and 75.4 beats per minute.
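The pulse-rate interval can be reproduced in a few lines of Python. The helper name is illustrative, and the critical value 2.306 is the table entry t(0.025, 8) that the slide rounds to 2.31; it is passed in explicitly because the standard library has no t quantile function.

```python
import math

def t_interval_mean(x_bar, s, n, t_crit):
    """C.I. for a mean with unknown sigma: x_bar -/+ t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)
    return x_bar - t_crit * se, x_bar + t_crit * se

# 9 students, mean 68.7, SD 8.67, t(0.025, 8) = 2.306 from a t table
lo, hi = t_interval_mean(68.7, 8.67, 9, t_crit=2.306)
print(round(lo, 1), round(hi, 1))
```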
3) C.I. for a population proportion (large sample size)
• p = 123/300 = 0.41 is a point estimate of π.
• α = 0.05 ⇒ Z0.025 = 1.96
Example 2: An epidemiologist is worried about the ever-increasing trend of malaria in a certain locality and wants to estimate the proportion of persons infected during the peak malaria transmission period.
• He takes a random sample of 150 persons in that locality during the peak transmission period and finds that 60 of them are positive for malaria.
Find: a) 95% b) 90% c) 99% confidence intervals for the proportion of infected people in that locality during the peak malaria transmission period.
Solution:
• Sample proportion = 60/150 = 0.4, with standard error √(0.4 × 0.6/150) = 0.04
a) A 95% C.I. for the population proportion = 0.4 ± 1.96 (0.04) = 0.4 ± 0.078 = (0.322, 0.478).
b) A 90% C.I. = 0.4 ± 1.64 (0.04) = 0.4 ± 0.066 = (0.334, 0.466).
c) A 99% C.I. = 0.4 ± 2.57 (0.04) = 0.4 ± 0.103 = (0.297, 0.503).
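The malaria example can be checked at all three confidence levels with the sketch below. The function name is an assumption, and z is taken from the standard normal distribution (1.645, 1.960, 2.576) rather than the two-decimal table values, which does not change the limits at three decimals.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample C.I. for a proportion: p -/+ z * sqrt(p(1-p)/n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p - z * se, p + z * se

# 60 malaria-positive persons out of 150 sampled
for level in (0.90, 0.95, 0.99):
    lo, hi = proportion_interval(60, 150, level)
    print(f"{level:.0%}: ({lo:.3f}, {hi:.3f})")
```

The widening of the interval from 90% to 99% shows the trade-off between confidence and precision for a fixed sample.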
Sample size determination
How large a sample should be taken from the larger population to have a representative sample?
It depends on:
• objective of the study;
• design of the study;
• how different or dispersed the population is;
• accuracy of the measurements to be made;
• degree of precision required for generalization;
• degree of confidence with which to conclude;
• availability of resources.
Sample size determination
• Given the confidence interval
$\text{mean (proportion)} \pm z_{\alpha/2} \cdot \text{s.e.}$
• Hence the absolute precision, denoted by d, is given as
$d = z_{\alpha/2} \cdot \text{s.e.}$
• where s.e. is the standard error of the estimator of the parameter of interest.
Estimating a single population mean
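The formula for this slide did not survive in the extracted text. Assuming the standard error of the mean is σ/√n, the precision relation d = z × s.e. can be solved for n, giving n = (z σ / d)². The Python sketch below applies that derived relation; the function name and the example values (σ = 15, d = 3) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, d, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= d, i.e. n = (z * sigma / d)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / d) ** 2)

print(sample_size_mean(sigma=15, d=3))     # hypothetical SD and precision
print(sample_size_mean(sigma=15, d=1.5))   # halving d roughly quadruples n
```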
Sample size for single population proportion
• If the study is to be conducted on a single population, then we need to answer the following:
1. What is the probability of the event occurring?
2. How much error is tolerable, or how much precision do we need?
3. How confident do we need to be that the true population value falls within the confidence interval?
Single population proportion
• Let p denote the proportion of successes; then
$n = \frac{z_{\alpha/2}^{2}\, p(1-p)}{d^{2}}$
Where:
• n is the minimum sample size
• p is an estimate of the prevalence rate for the population
• d is the margin of sampling error tolerated
• Zα/2 is the standard normal value at the (1−α)100% confidence level; α is most often 5%
Point to be considered
Example
1. A hospital administrator wishes to know what proportion of discharged patients are unhappy with the care received during hospitalization. If a 95% confidence interval is desired to estimate the proportion within 5%, how large a sample should be drawn?
With no prior estimate of the proportion, take p = 0.5, which gives the largest value of p(1−p):
$n = \frac{Z^{2}\,p(1-p)}{d^{2}} = \frac{(1.96)^{2}(0.5 \times 0.5)}{(0.05)^{2}} = 384.2 \approx 385$ patients
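The hospital example can be reproduced with a short Python sketch of n = Z² p(1−p)/d². The function name is illustrative, and the second call uses a hypothetical prior estimate of p = 0.2 to show how a known prevalence reduces the required sample.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """Minimum n for estimating a proportion within +/- d, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

print(sample_size_proportion(p=0.5, d=0.05))  # hospital example, no prior estimate
print(sample_size_proportion(p=0.2, d=0.05))  # hypothetical prior estimate of 20%
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below d.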


Editor's Notes

  • #9 Population parameter: describes the underlying (unknown) distribution of the variable of interest for a population. Sample statistic: an estimate of the population parameter obtained from a sample.
  • #14 Usually we only have a sample and do not know the entire population. E.g. a point estimate of 0.94 for a population proportion: it is not reasonable to assume that the population proportion is exactly 0.94, because the probability of getting a sample statistic exactly equal to the corresponding population parameter is usually quite small. It may be reasonable to assume that 0.94 is close to the population proportion. We use a point estimate to obtain an interval estimate.
  • #31 Example: Data on systolic blood pressure for 199 patients give a mean value of 125.8 mmHg. Assume that the standard deviation for this patient population is known to be 20 mmHg. Construct a 95 percent confidence interval for the population mean. Solution: α = 0.05 ⇒ Zα/2 = 1.96, so the interval is 125.8 ± 1.96 × 20/√199. The 95% CI is (123.0, 128.6) mmHg. We are 95% sure that the average systolic blood pressure for similar patients is between 123.0 and 128.6 mmHg.
  • #44 Example: In a study of preeclampsia, Kaminski and Rechberger found the mean systolic blood pressure of 10 healthy, nonpregnant women to be 119 with a standard deviation of 2.1. A. What is the estimated standard error of the mean? B. Construct the 99% confidence interval for the mean of the population from which the 10 subjects may be presumed to be a random sample. C. What is the precision of the estimate? D. What assumptions are necessary for the validity of the confidence interval you constructed? Answers: A. 2.1/√10 = 0.664 ≈ 0.66. B. The 99% CI is (116.8, 121.2). C. Precision = 3.250 × 0.664 = 2.16. D. The population is normally distributed, and the 10 subjects represent a random sample from this population.
  • #64 Heterogeneity: need a larger sample to study a more diverse population. Desired precision: need a larger sample to get a smaller error. Sampling design: smaller if stratified, larger if cluster. Nature of analysis: complex multivariate statistics need larger samples. Accuracy: depends upon sample size, not on the ratio of sample to population.