Confidence Intervals and Sample Size
Assist.Prof. TOL BUNKEA, MD,MSc-Epidemiology
(London School of Hygiene and Tropical Medicine, UK)
Head of Epidemiology Unit of National Centre for Parasitology, Entomology and
Malarial Control (CNM, MoH)
Lecturer of Epidemiology and Biostatistics (NIPH, UHS, UP)
Tel: 016 690 999
Email: tolbunkea@ymail.com
Objectives:
At the end of this session you will be able to:
1. Distinguish between a sample statistic and a population parameter
2. Find the confidence interval for the mean when σ is unknown
3. Interpret a confidence interval in the estimation of the population
parameter
4. Describe factors influencing the width of the confidence interval
5. Explain reasons for using confidence intervals
Outline
① Population and Samples
② Point Estimates and Confidence Intervals
③ Effect of sample size on Confidence Intervals
④ Confidence interval for mean and proportion
Introduction
Statistical inference is the process by which we
draw conclusions about a population from data
collected on a sample.
Population and Samples
• Census
–Everyone in the population
• E.g. all Cambodian residents
• Population
–is a set of persons (or objects) having a
common observable characteristic.
–the entire collection of units about
which we would like information.
• Sample
–is a representative subset (subgroup) of a
population.
–the collection of units we actually measure
• Example:
–We may want to know how many persons in a
community
• have quit smoking,
• have health insurance, or
• plan to vote for a certain candidate.
• In medical research
–Population
• All patients who are candidates for a treatment
–Sample
• All candidates for the treatment who volunteer
for your study
• Infer results from the volunteers (sample) to the other
candidates for the same treatment (population).
–We usually obtain information on an
appropriate sample of the community and
generalize from it to the entire population.
• The way the sample is selected, not its size,
determines whether we may draw
appropriate inferences about a population.
• The primary reason for selecting a sample
from a population is to draw inferences about
that population.
• Statistical inference is the process by which
we infer population properties from sample
properties.
• A major component: Parameter and Statistic
Parameters
• Are fixed values (the "truth"), but we rarely
know them because it is often difficult
to obtain measures from the entire
population.
• The true value we hope to obtain.
Statistics
• Are known values computed from a sample; they
are random variables because they differ from
sample to sample.
• An estimate of the parameter based on observed
information in the sample
• Statistics have associated error
– The study of statistics is about estimating that
error
– The central limit theorem tells us how much error to
expect in our sample estimates (i.e. sample
statistics)
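The difference between a fixed parameter and a statistic that varies from sample to sample can be seen in a small simulation. The sketch below is only an illustration: the population of 100,000 incomes, its mean of 3000 and its spread of 800 are invented values, not data from the lecture.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated annual incomes (arbitrary units).
population = [random.gauss(3000, 800) for _ in range(100_000)]

# Parameter: one fixed value that describes the whole population.
mu = statistics.fmean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [
    statistics.fmean(random.sample(population, 2000)) for _ in range(5)
]

print(f"parameter (population mean): {mu:.1f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: statistic = {xbar:.1f}, error = {xbar - mu:+.1f}")
```

Each run of the loop draws a new sample of 2000, so the five sample means differ slightly from each other and from the population mean, which stays the same throughout.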
Example
• A study is conducted to estimate the true mean
annual income of all adult residents of Cambodia.
• The study randomly selects 2000 adult residents of
Cambodia. What are the population? Sample? Parameter? Statistic?
• The population consists of all adult residents of
Cambodia.
• The sample is the 2000 residents in the study.
• The parameter is the true mean annual income of
all adult residents of Cambodia.
• The statistic is the mean annual income of the 2000 residents in
this sample.
Example
• A survey is carried out at a university to estimate the
proportion of undergraduate students who drive to campus to
attend classes.
• One thousand students are randomly selected and asked
whether they drive or not to campus to attend classes.
– What are the population? Sample? Parameter? Statistic?
• The population is all of the undergraduates at that university.
• The sample is the group of 1000 undergraduate students
surveyed.
• The parameter is the true proportion of all undergraduate
students at that university who drive to campus to attend
classes.
• The statistic is the proportion of the 1000 sampled
undergraduates who drive to campus to attend classes.
Statistical methods to make inferences about the
population from the sample
One aspect of inferential statistics is estimation,
which is the process of estimating the value of a
parameter from information obtained from a
sample.
Point and Interval Estimates
• Since the populations from which these values
were obtained are large, these values are only
estimates of the true parameters and are
derived from data collected from samples.
• The statistical procedures for estimating the
population mean, proportion, variance, and
standard deviation will be explained.
• An important question in estimation is that of
sample size.
• How large should the sample be in order to
make an accurate estimate?
• This question is not easy to answer since the
size of the sample depends on several factors,
such as the accuracy desired and the
probability of making a correct estimate.
• The question of sample size will be explained.
Confidence Intervals for the Mean
When σ Is Known and Sample Size
• Suppose a college president wishes to estimate the
average age of students attending classes this semester.
• The president could select a random sample of 100
students and find the average age of these students, say,
22.3 years.
• From the sample mean, the president could infer that the
average age of all the students is 22.3 years.
• This type of estimate is called a point estimate.
• You might ask why other measures of central
tendency, such as the median and mode, are not
used to estimate the population mean.
• The reason is that the means of samples vary less
than other statistics (such as medians and modes)
when many samples are selected from the same
population.
• Therefore, the sample mean is the best estimate of
the population mean.
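The claim that sample means vary less than sample medians can be checked by simulation. The sketch below assumes a normally distributed population with an invented mean of 22 and standard deviation of 3; for heavily skewed or heavy-tailed populations the comparison can come out differently.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

def spread_of(estimator, n=25, repeats=2000):
    """Standard deviation of an estimator across many samples of size n."""
    estimates = []
    for _ in range(repeats):
        # Hypothetical normal population: mean 22, standard deviation 3.
        sample = [random.gauss(22.0, 3.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)

sd_of_means = spread_of(statistics.mean)
sd_of_medians = spread_of(statistics.median)

print(f"spread of sample means:   {sd_of_means:.3f}")
print(f"spread of sample medians: {sd_of_medians:.3f}")
```

Under these assumptions the medians scatter more widely around the population mean than the means do, which is the sense in which the mean is the better estimator here.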
• Sample measures (i.e., statistics) are used to
estimate population measures (i.e., parameters).
• These statistics are called estimators.
• As previously stated, the sample mean is a better
estimator of the population mean than the sample
median or sample mode.
• A good estimator should satisfy the three properties
described now.
Confidence Intervals
• The sample mean will be, for the most part,
somewhat different from the population
mean due to sampling error.
• Therefore, you might ask a second question:
How good is a point estimate?
• The answer is that there is no way of knowing
how close a particular point estimate is to the
population mean.
• This answer places some doubt on the
accuracy of point estimates.
• For this reason, statisticians prefer another
type of estimate, called an interval estimate.
• In an interval estimate, the parameter is specified as being
between two values.
• For example, an interval estimate for the average age of all
students might be
26.9 < µ < 27.7, or
27.3 ± 0.4 years.
• Either the interval contains the parameter or it does not.
• A degree of confidence (usually a percent) can be assigned
before an interval estimate is made.
• For instance, you may wish to be 95% confident that the
interval contains the true population mean.
• Another question then arises.
Why 95%? Why not 99 or 99.5%?
• If you desire to be more confident, such as 99 or
99.5% confident, then you must make the interval
larger.
• For example, a 99% confidence interval for the mean
age of college students might be
26.7 < µ < 27.9, or
27.3 ± 0.6.
• Hence, a tradeoff occurs.
• To be more confident that the interval contains the
true population mean, you must make the interval
wider.
• Intervals constructed in this way are called
confidence intervals.
• Three common confidence intervals are used:
the 90, the 95, and the 99% confidence intervals.
The central limit theorem states that when the sample
size is large, approximately 95% of the sample means
from samples of the same size taken from a population will fall
within ±1.96 standard errors of the population mean,
that is, within
$$\mu \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
Hence, you can be 95% confident that the population
mean is contained within the interval
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
when the values of the variable are normally distributed in the
population or the sample size is large.
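As a worked illustration of the 1.96 standard error rule, the sketch below computes a 95% interval around a sample mean when σ is treated as known. The sample mean of 22.3 and sample size of 100 echo the college-age example; the value σ = 2.0 is an assumption made only for this sketch, since the slides do not give one.

```python
import math

def mean_ci_known_sigma(xbar, sigma, n, z=1.96):
    """Interval xbar ± z * sigma / sqrt(n) for a mean with known sigma."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# xbar and n follow the student-age example; sigma = 2.0 is hypothetical.
low, high = mean_ci_known_sigma(xbar=22.3, sigma=2.0, n=100)
print(f"95% CI for the mean age: {low:.2f} to {high:.2f}")
```

With these inputs the standard error is 2.0/√100 = 0.2, so the margin is 1.96 × 0.2 = 0.392 years on either side of the sample mean.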
Point and Interval Estimates
A point estimate is a single number, computed from
sample information, that is used to estimate
the population parameter.
• Estimation is one of the main purposes of
statistics.
• The basic idea is that we take a sample of data and
use it to make inferences about the population of
interest.
• Interval estimation involves the calculation of
confidence intervals for some statistic (e.g. a
proportion or an average)
• A confidence interval estimate is a range of values
constructed from sample data so that the population
parameter is likely to occur within that range at a
specified probability.
• A range of values within which, we believe, the true
parameter lies with high probability.
• The specified probability is called the level of
confidence.
• Point estimation is a form of statistical inference.
• In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.
Example:
• A random sample of the treatment costs of 32 patients
is taken from a local hospital. Find a point
estimate for the population mean µ.
• The point estimate for the population mean µ
of treatment cost is the sample mean, $74.22.
Confidence Intervals
• Describes the precision of the estimate.
• The CI represents a range of values on either
side of the estimate.
• The narrower the CI, the more precise the
point estimate.
Example
• A large bag of 500 red, green and blue marbles:
– You want to know the percentage of green marbles
but don’t want to count every marble.
– Shake up the bag and select 50 marbles to give an
estimate of the percentage of green marbles
• Sample of 50 marbles:
– 15 green marbles, 10 red marbles, 25 blue marbles
– Based on the sample, we estimate that 30% (15 out of 50) of the marbles
are green
– 30% = point estimate
Example
• How confident are we in this estimate?
– The actual percentage of green marbles could be higher
or lower, i.e. the sample of 50 may not reflect the
distribution in the entire bag of marbles
• Can calculate a confidence interval to
determine the degree of uncertainty.
• How do you calculate a confidence interval?
• Can do so by hand or use a statistical program
– Epi Info, SAS, STATA, SPSS and Episheet are common
statistical programs
• The most commonly used confidence interval is the 95%
interval
– A 95% CI means that the method used to build the range
captures the true population value in about 95% of repeated samples
• Assume that the 95% CI for our bag of marbles
example is 17-43%
• We estimated that 30% of the marbles are green:
– The CI tells us that the true percentage of green marbles
is plausibly between 17 and 43%
– Intervals built this way miss the true percentage
in about 5% of samples
• If we want less chance of error we could
calculate a 99% confidence interval
– A 99% CI will have only a 1% chance of error but
will have a wider range
– The 99% CI for green marbles is 13-47%
• If a higher chance of error is acceptable we
could calculate a 90% confidence interval
– The 90% CI for green marbles is 19-41%
• Very narrow CIs indicate a very precise estimate.
• We can get a more precise estimate by taking a larger
sample
– 100-marble sample with 33 green marbles
• Point estimate is 33%
• 95% confidence interval is about 24-42% (18 points wide, rather than
26 points for the 17-43% of the original sample)
– 200-marble sample with 56 green marbles
• Point estimate is 28%
• 95% confidence interval is about 22-34%
• The CI becomes narrower as the sample size
increases
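The intervals quoted for the 50-marble sample can be reproduced with the usual normal-approximation formula for a proportion, p ± z × √(p(1 − p)/n). The sketch below assumes that formula was used; the slides do not say which method produced the figures, and the function name is made up for illustration.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided multiplier, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 15 green marbles in a sample of 50, at three confidence levels.
for conf in (0.90, 0.95, 0.99):
    low, high = proportion_ci(15, 50, conf)
    print(f"{conf:.0%} CI: {low:.0%} to {high:.0%}")
```

Rounded to whole percentages this gives 19% to 41%, 17% to 43% and 13% to 47%, matching the 90%, 95% and 99% intervals stated for the sample of 50 marbles.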
Formula of Confidence Intervals
• 95% CI for a mean or proportion
x̄ ± 1.96 × SE(x̄)
• 95% CI for a rate, rate ratio, SMR or odds ratio
Rate ÷/× error factor
Rate ratio ÷/× error factor
Odds ratio ÷/× error factor
SMR ÷/× error factor
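The "÷/×" notation means the lower limit is the estimate divided by the error factor and the upper limit is the estimate multiplied by it. The slides do not define the error factor, so the sketch below assumes one common definition, EF = exp(1.96 × SE of the log estimate), and for a rate assumes SE(log rate) = 1/√(number of events). The event count and person-years are invented.

```python
import math

def ci_from_error_factor(estimate, se_log, z=1.96):
    """Return (lower, upper, error_factor) for a ratio-type estimate.

    Assumes the error factor is exp(z * standard error of the log estimate).
    """
    error_factor = math.exp(z * se_log)
    return estimate / error_factor, estimate * error_factor, error_factor

# Hypothetical data: 30 events observed over 1000 person-years.
events, person_years = 30, 1000
rate = events / person_years
se_log_rate = 1 / math.sqrt(events)  # assumed SE of log(rate)

low, high, ef = ci_from_error_factor(rate, se_log_rate)
print(f"rate = {rate:.3f} per person-year, error factor = {ef:.2f}")
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Because the interval is built on the log scale, it is not symmetric around the estimate: the upper limit lies further from the rate than the lower limit does.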
• Statisticians can calculate a range (interval) in which we can be
fairly sure (confident) that the “true value” lies.
– For example, we may be interested in blood pressure (BP)
reduction with antihypertensive treatment.
– From a sample of treated patients we can work out the
mean change in BP.
• However, this will only be the mean for our particular sample.
• If we took another group of patients we would not expect to
get exactly the same value, because chance can also affect the
change in BP.
• The CI gives the range in which the true value (i.e. the mean
change in BP if we treated an infinite number of patients) is
likely to be.
Interpretation of CI
• We can be 95% confident that the true mean
cholesterol of the population (parameter) lies within
the interval from 194.3 to 198.7.
• We are 95% confident that the true mean
cholesterol of population (parameter) is between
194.3 and 198.7
• The interpretation of CI always relates to a
parameter, and never a statistic.
What precisely do we mean by 95% confident?
• Suppose we were to repeatedly sample from the
population, and calculate a 95% CI for each sample.
• 95% of those 95% CI would capture the true value of
the population.
• Suppose we take a random sample of 10 students
from a high school and obtain their score of Math
Exam.
• These 10 students had a mean of 12 with a
corresponding 95% CI (11, 15)
Interpretation of CI
• We can be 95% confident that the population
mean Math score for students in this school
lies between 11 and 15.
• In repeated sampling, 95% of the 95% CIs
calculated in this manner would capture the
true mean Math score of students in this
school.
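The repeated-sampling meaning of "95% confident" can be demonstrated by simulation. The sketch below assumes a normal population with an invented true mean of 12 and standard deviation of 2, draws many samples of 10, and counts how often the interval x̄ ± 1.96 σ/√n captures the true mean. Treating σ as known is a simplification made for the sketch.

```python
import math
import random

random.seed(3)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 12.0, 2.0, 10  # hypothetical true mean, SD and sample size
Z = 1.96
REPEATS = 10_000

captured = 0
for _ in range(REPEATS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    margin = Z * SIGMA / math.sqrt(N)  # sigma treated as known
    # Does this sample's interval contain the true mean?
    if xbar - margin <= MU <= xbar + margin:
        captured += 1

coverage = captured / REPEATS
print(f"share of intervals capturing the true mean: {coverage:.1%}")
```

The share comes out close to 95%. Any single interval either contains the true mean or it does not; the 95% describes the long-run success rate of the procedure.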
Wrong Interpretation of CI
• 95% of students in this school have Math score
that lie between 11 and 15.
• We can be 95% confident that the sample
mean Math score of the 10 students lies
between 11 and 15.
• In repeated sampling, 95% of the intervals will
capture the sample mean.
Example
• What is the complication rate of thoracoscopy at GHS? How to
interpret?
• Using 3 years of data from GHS, 52 patients
had a thoracoscopy; of these, 4 patients had a complication,
a complication rate of 7.7% (95% CI = 2.5%, 17.5%).
Interpretation:
• Based on our sample data, we are 95% confident that the
"true" complication rate at GHS is between 2.5% and 17.5%.
• Another interpretation:
– if we were to take 100 additional samples and compute a 95% CI
from each, about 95 of the 100 intervals would contain
the "true" complication rate.
Example
• The statistics professors at a university want
to estimate the average statistics anxiety
score for all of their undergraduate students.
• It would be too time consuming and costly to
give every undergraduate student at the
university their statistics anxiety survey.
• Instead, they take a random sample of 50
undergraduate students at the university and
administer their survey.
Example cont.
• Using the data collected from the sample, they
construct a 95% confidence interval for the mean
statistics anxiety score in the population of all
university undergraduate students.
• They are using x̄ to estimate μ.
• If the 95% confidence interval for μ is 26 to 32, then
we could say,
“we are 95% confident that the mean statistics anxiety
score of undergraduate students at this university is
between 26 and 32.”
• In other words, we are 95% confident that
26 ≤ μ ≤ 32. This may also be written as
[29, 95% CI: 26, 32] or [29, 95% CI: 26 to 32]
Confidence Intervals
• A range computed using sample statistics to
estimate an unknown population parameter
with a given level of confidence.
• A range (or interval) of values used to
estimate the true value of a population
parameter.
Factors Affecting Confidence Interval Estimates
The factors that determine the width of a
confidence interval are:
1. The sample size, n.
2. The variability in the population,
usually σ estimated by s.
3. The desired level of confidence.
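How each factor changes the width can be read off the formula width = 2 × z × s/√n. The sketch below uses invented values (s = 10, n = 100) purely to show the direction and size of each effect.

```python
import math

def ci_width(s, n, z=1.96):
    """Full width of the interval xbar ± z * s / sqrt(n)."""
    return 2 * z * s / math.sqrt(n)

base = ci_width(s=10, n=100)             # hypothetical starting point
four_times_n = ci_width(s=10, n=400)     # larger sample
double_s = ci_width(s=20, n=100)         # more variable population
higher_conf = ci_width(s=10, n=100, z=2.576)  # 99% instead of 95%

print(f"baseline width:        {base:.2f}")
print(f"n multiplied by 4:     {four_times_n:.2f}")
print(f"s doubled:             {double_s:.2f}")
print(f"99% confidence level:  {higher_conf:.2f}")
```

Quadrupling the sample size only halves the width, because n enters through its square root, while doubling the variability doubles the width.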
Sample Size
• Sample size determination is closely related to
statistical estimation.
• Quite often, you ask, How large a sample is
necessary to make an accurate estimate?
• The answer is not simple, since it depends on three
things:
1. the maximum error of the estimate,
2. the population standard deviation, and
3. the degree of confidence.
• For example, how close to the true mean do you
want to be (2 units, 5 units, etc.), and how confident
do you wish to be (90, 95, 99%, etc.)?
• For the purpose of this chapter, it will be assumed
that the population standard deviation of the
variable is known or has been estimated from a
previous study.
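The three inputs named above (maximum error E, population standard deviation σ, and degree of confidence) combine in the standard formula n = (z × σ / E)², rounded up to the next whole number. The slide that presumably showed the formula did not survive extraction, so the sketch below assumes this standard form; the σ and E values are invented.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, max_error, confidence=0.95):
    """Smallest n so that z * sigma / sqrt(n) does not exceed max_error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / max_error) ** 2)

# Hypothetical inputs: sigma = 10 units.
print(sample_size_for_mean(sigma=10, max_error=2, confidence=0.95))
print(sample_size_for_mean(sigma=10, max_error=2, confidence=0.99))
print(sample_size_for_mean(sigma=10, max_error=1, confidence=0.95))
```

Asking for more confidence or a smaller maximum error both increase the required sample size; halving the error roughly quadruples it.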
Example
[0.76, 95% CI: 0.701, 0.819] or [0.76, 95% CI: 0.701 to 0.819]
• Thus, a 95% confidence interval for a mean is
calculated as follows:
x̄ ± 1.96 × SE(x̄), where SE(x̄) = s/√n
• If we took thousands of samples, and for each
sample calculated the mean and associated 95%
confidence interval, we would expect 95% of
these confidence intervals to include the
population mean.
Confidence Interval for a Mean
Exercise
• The interpretation of the confidence interval
in this statement is (B)
• Sometimes we may wish to use other
confidence intervals such as 90% or 99%
confidence intervals.
• For a 99% confidence interval the value 1.96
used in the formula for a 95% confidence
interval becomes 2.58.
• For a 90% confidence interval the value 1.96
in the formula used previously becomes
1.645 (often rounded to 1.65).
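The multipliers for different confidence levels come from the standard normal distribution: each is the point that leaves half of the remaining probability in each tail. A short sketch using Python's standard library (the helper name is made up for illustration):

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    # For 95% confidence this is the 97.5th percentile of N(0, 1).
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: multiplier = {z_multiplier(conf):.3f}")
```

To three decimals the multipliers are 1.645, 1.960 and 2.576, which are the values usually rounded to 1.65, 1.96 and 2.58.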
Confidence Interval for a Mean
2_Lecture 2_Confidence_Interval_3.pdf
2_Lecture 2_Confidence_Interval_3.pdf
2_Lecture 2_Confidence_Interval_3.pdf
2_Lecture 2_Confidence_Interval_3.pdf
2_Lecture 2_Confidence_Interval_3.pdf

More Related Content

Similar to 2_Lecture 2_Confidence_Interval_3.pdf

Similar to 2_Lecture 2_Confidence_Interval_3.pdf (20)

Sampling
SamplingSampling
Sampling
 
Sampling
SamplingSampling
Sampling
 
Sample Size Determination
Sample Size DeterminationSample Size Determination
Sample Size Determination
 
Basic stat analysis using excel
Basic stat analysis using excelBasic stat analysis using excel
Basic stat analysis using excel
 
unit 10 Sampling presentation L- short.ppt
unit 10 Sampling presentation L- short.pptunit 10 Sampling presentation L- short.ppt
unit 10 Sampling presentation L- short.ppt
 
Brm chap-4 present-updated
Brm chap-4 present-updatedBrm chap-4 present-updated
Brm chap-4 present-updated
 
SAMPLING Theory.ppt
SAMPLING Theory.pptSAMPLING Theory.ppt
SAMPLING Theory.ppt
 
Probability_and_Statistics_lecture_notes_1.pptx
Probability_and_Statistics_lecture_notes_1.pptxProbability_and_Statistics_lecture_notes_1.pptx
Probability_and_Statistics_lecture_notes_1.pptx
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
De-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statisticsDe-Mystifying Stats: A primer on basic statistics
De-Mystifying Stats: A primer on basic statistics
 
inferencial statistics
inferencial statisticsinferencial statistics
inferencial statistics
 
Presentation1
Presentation1Presentation1
Presentation1
 
statistical inference.pptx
statistical inference.pptxstatistical inference.pptx
statistical inference.pptx
 
Statistics for Data Analytics
Statistics for Data AnalyticsStatistics for Data Analytics
Statistics for Data Analytics
 
Statistics four
Statistics fourStatistics four
Statistics four
 
statistics introduction.ppt
statistics introduction.pptstatistics introduction.ppt
statistics introduction.ppt
 
TREATMENT OF DATA_Scrd.pptx
TREATMENT OF DATA_Scrd.pptxTREATMENT OF DATA_Scrd.pptx
TREATMENT OF DATA_Scrd.pptx
 
COM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptxCOM 201_Inferential Statistics_18032022.pptx
COM 201_Inferential Statistics_18032022.pptx
 
2RM2 PPT.pptx
2RM2 PPT.pptx2RM2 PPT.pptx
2RM2 PPT.pptx
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical research
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 

Recently uploaded (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 

2_Lecture 2_Confidence_Interval_3.pdf

  • 1. Confidence intervals and Sample Size Assist.Prof. TOL BUNKEA, MD,MSc-Epidemiology (London School of Hygiene and Tropical Medicine, UK) Head of Epidemiology Unit of National Centre for Parasitology, Entomology and Malarial Control (CNM, MoH) Lecturer of Epidemiology and Biostatistics (NIPH, UHS, UP) Tel: 016 690 999 Email: tolbunkea@ymail.com,
  • 2. Objec&ves: At the end of this session you will be able to: 1. Distinguish between sample statistic and population parameter 2. Find the confidence interval for the mean when s is unknown 3. Interpret confidence interval in the estimation of the population parameter. 4. Describe factors influencing the width of the confidence interval 5. Explain reasons for using confidence interval
  • 3. Outline ① Population and Samples ② Point Es3mates and Confidence Intervals ③ Effect of sample size on Confidence Intervals ④ Confidence interval for mean and proportion 3
  • 4. Introduction Statistical inference is the process by which we draw conclusions about a population from data collected on a sample. 4
  • 5. Population and Samples • Census –Everyone in popula7on • Eg. All Cambodian residents • Popula+on –is a set of persons (or objects) having a common observable characteris7c. –the en7re collec7on of units about which we would like informa7on.
  • 6. • Sample –is a representative subject (subgroup) of a population. –the collection of units we actually measure • Example: –If we want to know many persons in a community • have quit smoking or • have health insurance or • plan to vote for a certain candidate, Population and Samples
  • 7. • In medical research –Population • All patients who are candidates for treatment –Sample • All patients who are candidates for treatment and who volunteer for your study • Infer results from volunteers (sample) to other candidates for the same treatment (population). –We usually obtain information on an appropriate sample of the community and generalize from it to the entire population. Population and Samples
  • 8. • The way the sample is selected, not its size, determines whether we may draw appropriate inferences about a population. • The primary reason for selecting a sample from a population is to draw inferences about that population. • Statistical inference is the process by which we infer population properties from sample properties. • A major component: Parameter and Statistic Population and Samples
  • 10. Population and Samples Parameters –Are fixed values “truth”, but we rarely know them because it is often difficult to obtain measures from the entire population. –The true value we hope to obtain.
  • 11. Population and Samples Statistics –Are known values computed from a sample; they are random variables because they differ from sample to sample. –An estimate of the parameter based on observed information in the sample –Statistics have associated error • The study of statistics is about estimating that error • Central limit theorem tells us how much error to expect in our sample estimates (i.e. sample statistics)
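The contrast between a fixed parameter and a varying statistic can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the population is simulated, and the income values and their spread are made up.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated annual incomes (arbitrary units).
population = [random.gauss(5000, 1500) for _ in range(100_000)]
parameter = statistics.fmean(population)  # fixed, but unknown in practice

# Five random samples of 2000 people each give five different statistics.
sample_means = [
    statistics.fmean(random.sample(population, 2000)) for _ in range(5)
]

print(f"parameter (population mean): {parameter:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.1f}  (error {m - parameter:+.1f})")
```

Each sample mean lands close to the parameter but none equals it exactly, which is the "associated error" the slide refers to.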
  • 12. Example • A study is conducted to estimate the true mean annual income of all adult residents of Cambodia. • The study randomly selects 2000 adult residents of Cambodia. What are the population? Sample? Parameter? Statistic? • The population consists of all adult residents of Cambodia. • The sample is the 2000 residents in the study. • The parameter is the true mean annual income of all adult residents of Cambodia. • The statistic is the mean of the 2000 residents in this sample.
  • 13. Example • A survey is carried out at a university to estimate the proportion of undergraduate students who drive to campus to attend classes. • One thousand students are randomly selected and asked whether they drive or not to campus to attend classes. – What are the population? Sample? Parameter? Statistic? • The population is all of the undergraduates at that university. • The sample is the group of 1000 undergraduate students surveyed. • The parameter is the true proportion of all undergraduate students at that university who drive to campus to attend classes. • The statistic is the proportion of the 1000 sampled undergraduates who drive to campus to attend classes.
  • 14.
  • 15.
  • 16.
  • 17. Statistical methods to make inferences about the population from the sample
  • 18. One aspect of inferential statistics is estimation, which is the process of estimating the value of a parameter from information obtained from a sample. Point and Interval Estimates
  • 19. • Since the populations from which these values were obtained are large, these values are only estimates of the true parameters and are derived from data collected from samples. • The statistical procedures for estimating the population mean, proportion, variance, and standard deviation will be explained. • An important question in estimation is that of sample size. Point and Interval Estimates
  • 20. • How large should the sample be in order to make an accurate estimate? • This question is not easy to answer since the size of the sample depends on several factors, such as the accuracy desired and the probability of making a correct estimate. • The question of sample size will be explained. Point and Interval Estimates
  • 21. Confidence Intervals for the Mean When σ Is Known and Sample Size
  • 22. Confidence Intervals for the Mean When σ Is Known and Sample Size • Suppose a college president wishes to estimate the average age of students attending classes this semester. • The president could select a random sample of 100 students and find the average age of these students, say, 22.3 years. • From the sample mean, the president could infer that the average age of all the students is 22.3 years. • This type of estimate is called a point estimate.
  • 23. Confidence Intervals for the Mean When σ Is Known and Sample Size • You might ask why other measures of central tendency, such as the median and mode, are not used to estimate the population mean. • The reason is that the means of samples vary less than other statistics (such as medians and modes) when many samples are selected from the same population. • Therefore, the sample mean is the best estimate of the population mean.
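The claim that sample means vary less than sample medians can be checked with a rough simulation. This sketch assumes a normally distributed population of student ages (mean 22.3, standard deviation 4.0, both hypothetical); for strongly skewed or heavy-tailed populations the comparison may come out differently.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

def spread_of_estimator(estimator, n=25, repeats=2000):
    """Standard deviation of an estimator across repeated samples of size n."""
    values = []
    for _ in range(repeats):
        # Hypothetical normal population of ages: mean 22.3, sd 4.0.
        sample = [random.gauss(22.3, 4.0) for _ in range(n)]
        values.append(estimator(sample))
    return statistics.stdev(values)

sd_of_means = spread_of_estimator(statistics.mean)
sd_of_medians = spread_of_estimator(statistics.median)

print(f"spread of sample means:   {sd_of_means:.3f}")
print(f"spread of sample medians: {sd_of_medians:.3f}")
```

Under these assumptions the medians scatter more widely around the true value than the means do, which is why the mean is preferred as the estimator here.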
  • 24. Confidence Intervals for the Mean When σ Is Known and Sample Size • Sample measures (i.e., statistics) are used to estimate population measures (i.e., parameters). • These statistics are called estimators. • As previously stated, the sample mean is a better estimator of the population mean than the sample median or sample mode. • A good estimator should satisfy the three properties described now.
  • 25. Confidence Intervals for the Mean When σ Is Known and Sample Size
  • 26. Confidence Intervals • The sample mean will be, for the most part, somewhat different from the population mean due to sampling error. • Therefore, you might ask a second question: How good is a point estimate? • The answer is that there is no way of knowing how close a particular point estimate is to the population mean.
  • 27. Confidence Intervals • This answer places some doubt on the accuracy of point estimates. • For this reason, statisticians prefer another type of estimate, called an interval estimate.
  • 28. Confidence Intervals • In an interval estimate, the parameter is specified as being between two values. • For example, an interval estimate for the average age of all students might be 26.9 < µ < 27.7, or 27.3 ± 0.4 years. • Either the interval contains the parameter or it does not. • A degree of confidence (usually a percent) can be assigned before an interval estimate is made. • For instance, you may wish to be 95% confident that the interval contains the true population mean. • Another question then arises. Why 95%? Why not 99 or 99.5%?
  • 29. Confidence Intervals • If you desire to be more confident, such as 99 or 99.5% confident, then you must make the interval larger. • For example, a 99% confidence interval for the mean age of college students might be 26.7 < µ < 27.9, or 27.3 ± 0.6. • Hence, a tradeoff occurs. • To be more confident that the interval contains the true population mean, you must make the interval wider.
  • 30. Confidence Intervals • Intervals constructed in this way are called confidence intervals. • Three common confidence intervals are used: the 90, the 95, and the 99% confidence intervals.
  • 31. Confidence Intervals The central limit theorem states that when the sample size is large, approximately 95% of the means of samples of the same size taken from a population will fall within ±1.96 standard errors of the population mean, that is, within $\mu \pm 1.96\,\sigma/\sqrt{n}$.
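A simulation can illustrate this statement even when the population itself is not normal. The sketch below assumes a skewed (exponential) population with mean 10, for which the standard deviation also equals 10; the population, sample size and number of repeats are all illustrative choices, not values from the slides.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

mu = 10.0       # population mean of the assumed exponential population
sigma = 10.0    # for an exponential distribution the sd equals the mean
n = 100         # size of each sample
standard_error = sigma / math.sqrt(n)

repeats = 5000
inside = 0
for _ in range(repeats):
    sample_mean = statistics.fmean(
        random.expovariate(1 / mu) for _ in range(n)
    )
    # Count sample means that fall within mu ± 1.96 standard errors.
    if abs(sample_mean - mu) <= 1.96 * standard_error:
        inside += 1

fraction_inside = inside / repeats
print(f"fraction of sample means within ±1.96 SE: {fraction_inside:.3f}")
```

The fraction should come out close to 0.95, even though individual values in this population are far from normally distributed.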
  • 32. Confidence Intervals Hence, you can be 95% confident that the population mean is contained within that interval when the values of the variable are normally distributed in the population.
  • 35. Point and Interval Estimates A point estimate is a single number, computed from sample information, that is used to estimate the population parameter.
  • 36. • Estimation is one of the main purposes of statistics. • The basic idea is that we take a sample of data and use it to make inferences about the population of interest. • Interval estimation involves the calculation of confidence intervals for some parameter (for example, a proportion or an average). Point and Interval Estimates
  • 37. Point and Interval Estimates • A confidence interval estimate is a range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability. • A range of values within which, we believe, the true parameter lies with high probability. • The specified probability is called the level of confidence. • Point estimate is a form of statistical inference. • In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
  • 38. Example: • A random sample of the treatment costs of 32 patients is taken from a local hospital. Find a point estimate for the population mean µ. • The point estimate for the population mean µ of treatment cost is the sample mean, $74.22. Point and Interval Estimates
  • 39. Confidence Intervals • Describes the precision of the estimate. • The CI represents a range of values on either side of the estimate. • The narrower the CI, the more precise the point estimate.
  • 40. Example • A large bag of 500 red, green and blue marbles: – You want to know the percentage of green marbles but don’t want to count every marble. – Shake up the bag and select 50 marbles to give an estimate of the percentage of green marbles • Sample of 50 marbles: – 15 green marbles, 10 red marbles, 25 blue marbles – Based on sample we conclude that 30% (15 out of 50) marbles are green – 30% = point estimate
  • 41. Example • How confident are we in this estimate? – Actual percentage of green marbles could be higher or lower, ie. sample of 50 may not reflect distribution in entire bag of marbles • Can calculate a confidence interval to determine the degree of uncertainty. • How do you calculate a confidence interval? • Can do so by hand or use a statistical program – Epi Info, SAS, STATA, SPSS and Episheet are common statistical programs
  • 42. • Most commonly used confidence interval is the 95% interval –95% CI indicates that our estimated range has a 95% chance of containing the true population value • Assume that the 95% CI for our bag of marbles example is 17-43% • We estimated that 30% of the marbles are green: –CI tells us that the true percentage of green marbles is most likely between 17 and 43% –There is a 5% chance that this range (17-43%) does not contain the true percentage of green marbles Confidence Intervals
  • 43. Confidence Intervals • If we want less chance of error we could calculate a 99% confidence interval –A 99% CI will have only a 1% chance of error but will have a wider range –99% CI for green marbles is 13-47% • If a higher chance of error is acceptable we could calculate a 90% confidence interval –90% CI for green marbles is 19-41%
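The marble intervals can be reproduced by hand with the normal-approximation formula for a proportion, p ± z × √(p(1 − p)/n). The sketch below assumes that this is the method behind the slide's numbers, and uses the rounded multipliers 1.65, 1.96 and 2.58 for the 90%, 95% and 99% levels; the function name is hypothetical.

```python
import math

def proportion_ci(successes, n, z):
    """Normal-approximation interval for a proportion: p ± z * sqrt(p(1-p)/n)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

# 15 green marbles in a sample of 50, at three confidence levels.
for level, z in [(90, 1.65), (95, 1.96), (99, 2.58)]:
    low, high = proportion_ci(15, 50, z)
    print(f"{level}% CI: {low:.0%} to {high:.0%}")
```

Rounded to whole percentages this gives 19% to 41%, 17% to 43% and 13% to 47%, so the interval widens as the confidence level rises.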
  • 44. • Very narrow CIs indicate a very precise estimate. • Can get a more precise estimate by taking a larger sample –100 marble sample with 33 green marbles • Point estimate is 33% • 95% confidence interval is 24-42% (rather than 17-43% for original sample) –200 marble sample with 56 green marbles • Point estimate is 28% • 95% confidence interval is 22-34% • CI becomes narrower as the sample size increases Confidence Intervals
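How quickly the interval narrows can be seen by holding the sample proportion fixed and changing only the sample size. This sketch assumes a proportion of 30% throughout (the original marble estimate) and the normal-approximation margin 1.96 × √(p(1 − p)/n); the function name is hypothetical.

```python
import math

def half_width(p, n, z=1.96):
    """Margin of error of a normal-approximation CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.30  # assumed sample proportion, held fixed for comparison
for n in (50, 200, 800):
    margin = half_width(p, n)
    print(f"n = {n:3d}: 30% ± {margin:.1%}")
```

Because the margin depends on 1/√n, quadrupling the sample size only halves the width of the interval, so gains in precision become more expensive as the sample grows.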
  • 45.
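  • 46. Formula of Confidence Intervals • 95% CI for a mean or proportion: $\bar{x} \pm 1.96 \times \mathrm{SE}(\bar{x})$ • 95% CI for a rate, rate ratio, SMR or odds ratio: divide and multiply the estimate by the error factor – Rate ÷ error factor to Rate × error factor – Rate ratio ÷ error factor to Rate ratio × error factor – Odds ratio ÷ error factor to Odds ratio × error factor – SMR ÷ error factor to SMR × error factor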
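The slide does not say how the error factor is obtained. A common textbook convention, assumed here, is error factor = exp(1.96 × SE of the log estimate), with SE(log rate) = 1/√d for d events and SE(log rate ratio) = √(1/d₁ + 1/d₀). The sketch below applies that convention to made-up event counts and person-years; the function names are hypothetical.

```python
import math

def error_factor(se_log, z=1.96):
    """Error factor assumed as exp(z * standard error of the log estimate)."""
    return math.exp(z * se_log)

def rate_ci(events, person_years):
    """Rate with 95% CI: rate ÷ EF to rate × EF, using SE(log rate) = 1/sqrt(d)."""
    rate = events / person_years
    ef = error_factor(1 / math.sqrt(events))
    return rate, rate / ef, rate * ef

def rate_ratio_ci(events_1, pyears_1, events_0, pyears_0):
    """Rate ratio with 95% CI, using SE(log RR) = sqrt(1/d1 + 1/d0)."""
    ratio = (events_1 / pyears_1) / (events_0 / pyears_0)
    ef = error_factor(math.sqrt(1 / events_1 + 1 / events_0))
    return ratio, ratio / ef, ratio * ef

# Hypothetical data: 30 events in 1000 person-years vs 15 in 1000 person-years.
rate, rate_low, rate_high = rate_ci(30, 1000)
rr, rr_low, rr_high = rate_ratio_ci(30, 1000, 15, 1000)
print(f"rate: {rate:.3f} (95% CI {rate_low:.3f} to {rate_high:.3f})")
print(f"rate ratio: {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
```

Dividing and multiplying by the same factor makes the interval symmetric on the log scale rather than on the original scale, which suits ratio measures that cannot go below zero.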
  • 47. • Statisticians can calculate a range (interval) in which we can be fairly sure (confident) that the “true value” lies. – For example, we may be interested in blood pressure (BP) reduction with antihypertensive treatment. – From a sample of treated patients we can work out the mean change in BP. • However, this will only be the mean for our particular sample. • If we took another group of patients we would not expect to get exactly the same value, because chance can also affect the change in BP. • The CI gives the range in which the true value (i.e. the mean change in BP if we treated an infinite number of patients) is likely to be. Confidence Intervals
  • 48.
  • 49. Interpretation of CI • We can be 95% confident that the true mean cholesterol of population (parameter) lies within this interval 194.3 and 198.7. • We are 95% confident that the true mean cholesterol of population (parameter) is between 194.3 and 198.7 • The interpretation of CI always relates to a parameter, and never a statistic.
  • 50. What precisely do we mean by 95% confident? • Suppose we were to repeatedly sample from the population, and calculate a 95% CI for each sample. • 95% of those 95% CI would capture the true value of the population. • Suppose we take a random sample of 10 students from a high school and obtain their score of Math Exam. • These 10 students had a mean of 12 with a corresponding 95% CI (11, 15)
  • 51. Interpretation of CI • We can be 95% confident that the population mean Math score for students in this school lies between 11 and 15. • In repeated sampling, 95% of the 95% CIs calculated in this manner would capture the true mean Math score of students in this school.
  • 52. Wrong Interpretation of CI • 95% of students in this school have Math scores that lie between 11 and 15. • We can be 95% confident that the sample mean Math score of the 10 students lies between 11 and 15. • In repeated sampling, 95% of the intervals will capture the sample mean.
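The difference between the correct and the wrong repeated-sampling statements can be checked by simulation. This sketch assumes a normal population of Math scores with true mean 12 and a known standard deviation of 3 (both hypothetical), and builds each interval as x̄ ± 1.96 × σ/√n. For every interval it checks two things: whether it captures the true mean, and whether it captures the mean of a second, independent sample.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

true_mean, sigma, n = 12.0, 3.0, 10   # hypothetical population and sample size
margin = 1.96 * sigma / math.sqrt(n)  # half-width of each 95% interval
repeats = 10_000

captures_parameter = 0
captures_other_sample_mean = 0
for _ in range(repeats):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    other = [random.gauss(true_mean, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    low, high = x_bar - margin, x_bar + margin
    if low <= true_mean <= high:
        captures_parameter += 1
    if low <= statistics.fmean(other) <= high:
        captures_other_sample_mean += 1

parameter_rate = captures_parameter / repeats
other_mean_rate = captures_other_sample_mean / repeats
print(f"intervals capturing the true mean:         {parameter_rate:.3f}")
print(f"intervals capturing another sample's mean: {other_mean_rate:.3f}")
```

About 95% of the intervals capture the parameter, but noticeably fewer capture another sample's mean, which is why the interpretation must refer to the parameter and not to a statistic.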
  • 53. Example • What is the complication rate of thoracoscopy at GHS? How to interpret? • Using 3 years of data from GHS there were 52 patients who had a thoracoscopy; of these, 4 patients had a complication, a complication rate of 7.7% (95% CI = 2.5%, 17.5%). Interpretation: • Based on our sample data, we are 95% confident that the "true" complication rate at GHS is between 2.5% and 17.5%. • Another interpretation: – if we were to take 100 additional samples and calculate a 95% CI from each, about 95 of the 100 intervals would contain the "true" complication rate.
  • 54.
  • 55. Example • The statistics professors at a university want to estimate the average statistics anxiety score for all of their undergraduate students. • It would be too time consuming and costly to give every undergraduate student at the university their statistics anxiety survey. • Instead, they take a random sample of 50 undergraduate students at the university and administer their survey.
  • 56. Example cont. • Using the data collected from the sample, they construct a 95% confidence interval for the mean statistics anxiety score in the population of all university undergraduate students. • They are using x̅ to estimate μ. • If the 95% confidence interval for μ is 26 to 32, then we could say, “we are 95% confident that the mean statistics anxiety score of undergraduate students at this university is between 26 and 32.” • In other words, we are 95% confident that 26≤μ≤32. This may also be written as [29, 95% CI: 26, 32] or [29, 95% CI: 26 to 32]
  • 57. Confidence Intervals • A range computed using sample statistics to estimate an unknown population parameter with a given level of confidence. • A range (or interval) of values used to estimate the true value of a population parameter.
  • 58. Factors Affecting Confidence Interval Estimates The factors that determine the width of a confidence interval are: 1. The sample size, n. 2. The variability in the population, usually σ estimated by s. 3. The desired level of confidence.
  • 59. Sample Size • Sample size determination is closely related to statistical estimation. • Quite often, you ask, How large a sample is necessary to make an accurate estimate? • The answer is not simple, since it depends on three things: 1. the maximum error of the estimate, 2. the population standard deviation, and 3. the degree of confidence.
  • 60. Sample Size • For example, how close to the true mean do you want to be (2 units, 5 units, etc.), and how confident do you wish to be (90, 95, 99%, etc.)? • For the purpose of this chapter, it will be assumed that the population standard deviation of the variable is known or has been estimated from a previous study.
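The three ingredients listed above combine in the usual textbook formula for the minimum sample size when estimating a mean, n = (z × σ / E)², rounded up to the next whole number. The formula is not shown in this part of the transcript, so treat it as an assumption; the function name and the input values (σ = 5, maximum errors of 2 and 1 unit) are also hypothetical.

```python
import math

def required_sample_size(sigma, max_error, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= max_error, i.e. ceil((z*sigma/E)^2)."""
    return math.ceil((z * sigma / max_error) ** 2)

# Hypothetical inputs: population sd of 5 units.
print(required_sample_size(sigma=5, max_error=2))          # 95% confidence
print(required_sample_size(sigma=5, max_error=1))          # tighter error, 95%
print(required_sample_size(sigma=5, max_error=2, z=2.58))  # 99% confidence
```

Asking for a smaller maximum error or a higher degree of confidence both push the required sample size up, and the result is always rounded up because a fraction of a subject cannot be sampled.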
  • 70.
  • 71. Example [0.76, 95% CI: 0.701, 0.819] or [0.76, 95% CI: 0.701 to 0.819]
  • 72. • Thus, a 95% confidence interval for a mean is calculated as follows: $\bar{x} \pm 1.96 \times \mathrm{SE}(\bar{x})$ • If we took thousands of samples, and for each sample calculated the mean and associated 95% confidence interval, we would expect 95% of these confidence intervals to include the population mean. Confidence Interval for a Mean
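As a worked illustration, the calculation can be sketched from summary statistics, taking the standard error of the mean as s/√n. The sample size (100) and mean age (22.3) echo the college example earlier in the deck; the standard deviation of 4.0 is a made-up value, and the function name is hypothetical.

```python
import math

def mean_ci(sample_mean, sample_sd, n, z=1.96):
    """Large-sample CI for a mean: x_bar ± z * (sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# n = 100 students with mean age 22.3; the sd of 4.0 is an assumed value.
low, high = mean_ci(sample_mean=22.3, sample_sd=4.0, n=100)
print(f"95% CI for the mean age: {low:.2f} to {high:.2f}")
```

The multiplier 1.96 is appropriate for large samples; the default can be overridden through the `z` argument when another confidence level is wanted.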
  • 73. Exercise • The interpretation of the confidence interval in this statement is (B) Confidence Interval for a Mean
  • 74. • Sometimes we may wish to use other confidence intervals such as 90% or 99% confidence intervals. • For a 99% confidence interval the value 1.96 used in the formula for a 95% confidence interval becomes 2.58. • For a 90% confidence interval the value 1.96 in the formula used previously becomes 1.65. Confidence Interval for a Mean
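These multipliers come from the standard normal distribution: for a confidence level C, the multiplier is the point that leaves (1 − C)/2 in each tail. A minimal sketch using Python's standard library (the function name is hypothetical):

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

z90 = z_multiplier(0.90)
z95 = z_multiplier(0.95)
z99 = z_multiplier(0.99)
for level, z in [(90, z90), (95, z95), (99, z99)]:
    print(f"{level}% confidence: z = {z:.3f}")
```

The 90% value is about 1.645, which textbooks commonly round to 1.65 as on the slide.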