SlideShare a Scribd company logo
1 of 45
Download to read offline
PROBABILITY
GR Miyambu
School of Science & Technology
Department of Statistical Sciences
Types of Probability
Types of Probability
Frequency
What is the probability of a
randomly chosen person
dying in the next year?
Model-based
What is the probability of a
child being affected by
cystic fibrosis given one of
the parents is a carrier of
the disease?
Subjective
What is the probability that
a particular patient has
heart disease given they
have chest pain?
Properties of Probability
The three types of probability all have the following
properties.
1. All probabilities lie between 0 and 1.
2. When the outcome can never happen the probability is 0.
3. When the outcome will definitely happen the probability
is 1.
Diagnostic Test
• A diagnostic test is any approach used to gather clinical
information for the purpose of making a clinical decision
(i.e., diagnosis).
• Some examples of diagnostic tests include X-rays, biopsies,
pregnancy tests, medical histories, and results from
physical examinations.
• From a statistical point of view there are two points to keep
in mind:
1. the clinical decision-making process is based on
probability;
2. the goal of a diagnostic test is to move the estimated
probability of disease toward either end of the
probability scale (i.e., 0 rules out disease, 1 confirms the
disease).
Uses of Diagnostic Test
• In making a diagnosis, a clinician first establishes a possible set of
diagnostic alternatives and then attempts to reduce these by
progressively ruling out specific diseases or conditions.
• Alternatively, the clinician may have a strong hunch that the patient
has one particular disease and he then sets about confirming it.
• Given a particular diagnosis, a good diagnostic test should indicate
either that the disease is very unlikely or that it is very probable.
• In a practical sense it is important to realize that a diagnostic test is
useful only if the result influences patient management since, if the
management is the same for two different conditions, there is little
point in trying strenuously to distinguish between them.
Analysis of Diagnostic Test
Disease No Disease
Test Positive a (true positives) b (false positives)
Test Negative c (false negatives) d (true negatives)
Gold Standard
• The "Gold Standard" is the method used to obtain a definitive diagnosis for a
particular disease; it may be biopsy, surgery, autopsy or an acknowledged standard.
• Gold Standards are used to define true disease status against which the results of a
new diagnostic test are compared.
• Here are a number of definitive diagnostic tests that will confirm whether or not
you have the disease.
• Some of these are quite invasive and this is a major reason why new diagnostic
procedures are being developed.
Target Disorder Gold Standard
Breast cancer Excisional biopsy
Prostate cancer Transrectal biopsy
Coronar stenosis Coronary angiography
Myocardial infarction Catheterization
Strep throat Throat culture
Sensitivity and specificity
• Many diagnostic test results are given in the form of a
continuous variable (that is one that can take any value
within a given range), such as diastolic blood pressure or
haemoglobin level.
• However, for ease of discussion we will first assume that
these have been divided into positive or negative results.
• For example, a positive diagnostic result of ‘hypertension’ is
a diastolic blood pressure greater than 90 mmHg;
• whereas for ‘anaemia’, a haemoglob
in level less than 10 g/ d
l is
required.
• For every diagnostic procedure (which may involve a
laboratory test of a sample taken) there is a set of
fundamental questions that should be asked.
Sensitivity and specificity
• First, if the d
isease is present, what is the prob
a b
ility that the test
result will be positive?
• This leads to the notion of the sensitivity of the test.
• Secon d
, if the d
isease is a b
sent, what is the pro b
a b
ility that the test
result will be negative?
• This question refers to the specificity of the test.
• These questions can b
e answere donly if it is known what the ‘true’
diagnosis is.
• In the case of organic d
isease this can b
e d
etermine d b
y b
iopsy or, for
example, an expensive an drisky proce d
ure such as angiography for
heart disease.
• In other situations it may b
e b
y ‘expert’ opinion. Such tests provi d
e
the so-called ‘gold standard’.
Example
Diagnosis of heart
disease
• Consider the results of an
assay of N-terminal pro-brain
natriuretic peptide (NT-
proBNP) for diagnosis of heart
failure in a general population
survey in those over 45 years
of age and in patients with
existing diagnosis of heart
failure obtained by Hobbs et
al (2002) and summarised in
the table.
• Heart failure was identified
when NT-proBNP >36 pmol/l.
NT-proBNP
(pmol/l)
Confirmed Diagnosis of
Heart Failure
Present Absent Total
(D+) (D-)
> 36
Positive
(T+) 35 (a) 7 (b) 42
Negative
(T-) 68 (c) 300 (d) 368
Total 103 307 410
Sensitivity and Specificity
• The prevalence of heart failure in these subjects is (a + c)/(a + b + c + d)
• P(D+)=??
• The sensitivity of a test is theproportion of those with thedisease who also have a
positive test result.
• The sensitivity is a/(a + c)=??
• N o
w sensitivity is the pr o
bability o
f a p o
s it iv
e test result (e v
ent T
+) g
iv
en that the d
isease is
present (event D+) and can be written as p(T+|D+)=??, where the ‘|’ is read as ‘given’.
• The specificity of the test is the proportion of those without disease who give a
negative test result.
• Thus the specificity is d/(b + d)=??
• N o
w specificity is the pr o
bability o
f a ne g
ativ
e test result (e v
ent T
-) g
iv
en that the d
isease is absent
(event D-) and can be written as p(T-|D-)=??
• Since sensitivity is conditional on the disease being present, and specificity on the
disease being absent, in theory, they are unaffected by disease prevalence.
Sensitivity and Specificity
• Sensitivity and specificity are useful statistics because they will
yield consistent results for the diagnostic test in a variety of
patient groups with different disease prevalences.
• This is an important point; sensitivity and specificity are
characteristics of the test, not the population to which the test
is applied.
• Although indeed they are independent of disease prevalence, in
practice if the disease is very rare, the accuracy with which one
can estimate the sensitivity will be limited.
• Two other terms in common use are: the false negative rate (or
probability of a false negative) which is given by c/(a + c) =
1 - Sensitivity, and the false positive rate (or probability of
a false positive) or b/(b + d) = 1 - Specificity.
• Since sensitivity = 1 - Probability(false negative) and specificity =
1 - Probability(false positive), a possibly useful mnemonic to
recall this is that ‘sensitivity’ and ‘negative’ have ‘n’s in them
and ‘specificity’ and ‘positive’ have ‘p’s in them.
Summary of Definitions of Sensitivity and Specificity
Test result
True diagnosis
Disease present Disease absent
Positive Sensitivity
Probability of a false
positive
Negative
Probability of a false
negative
Specificity
Rates Assuming a Predicted Condition
• The predictive value refers to the likelihood for determining an outbreak or
non-outbreak of an infectious disease based on early warning results.
• Predictive values can be classified into the positive predictive value (PPV) and
the negative predictive value (PNV).
• Positive predictive value is the proportion of individuals with positive test
results that are correctly diagnosed and actually have the disease.
𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑝𝑝 𝐷𝐷 + 𝑇𝑇 + =
𝑎𝑎
𝑎𝑎 + 𝑏𝑏
• Negative predictive value is the proportion of individuals with negative test
results that are correctly diagnosed and do not have the disease.
𝑁𝑁𝑁𝑁𝑁𝑁 = 𝑝𝑝 𝐷𝐷 − 𝑇𝑇 − =
𝑑𝑑
𝑐𝑐 + 𝑑𝑑
• False Omission Rate is the proportion of the individuals with a negative test
result for which the true condition is positive.
𝐹𝐹𝐹𝐹𝐹𝐹 =
𝑐𝑐
𝑐𝑐 + 𝑑𝑑
• The false discovery rate is the proportion of the individuals with a positive
test result for which the true condition is negative.
𝐹𝐹𝐹𝐹𝐹𝐹 =
𝑏𝑏
𝑎𝑎 + 𝑏𝑏
Whole Table Rates
• P r
evalence is the proportion of a population who have a
specific characteristic in a given time period.
• The prevalence may be estimated from the table if all the
individuals are randomly sampled from the population.
𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 =
𝑎𝑎 + 𝑐𝑐
𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑
• The Accu racy o
r P ro
p orti o
n C o
rrectly Cla ss
ifie d reflects
the total proportion of individuals that are correctly
classified.
𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴 𝑃𝑃𝑃𝑃𝑃𝑃 =
𝑎𝑎 + 𝑑𝑑
𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑
• The proportion incorrectly classified reflects the total
proportion of individuals that are incorrectly classified.
𝑃𝑃𝑃𝑃𝑃𝑃 =
𝑏𝑏 + 𝑐𝑐
𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑
Likelihood Ratio
• The clear simplicity of diagnostic test data, particularly when presented as a 2 x 2
table, is confounded by many ways of reporting the results.
• The likelihood ratio (LR) is a simple measure combining sensitivity and specificity
• We have positive likelihood ratio (LR+) defined as
𝐿𝐿𝐿𝐿 +=
𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆
1 − 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆
=
𝑇𝑇𝑇𝑇𝑇𝑇
𝐹𝐹𝐹𝐹𝐹𝐹
• This gives a ratio of the test being positive for patients with disease compared with those without
disease. Aim to be much greater than 1 for a good test.
• And negative likelihood ratio (LR-) defined as
𝐿𝐿𝐿𝐿 −=
1 − 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆
𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆
=
𝐹𝐹𝐹𝐹𝐹𝐹
𝑇𝑇𝑇𝑇𝑇𝑇
• This gives a ratio of the test being negative for patients with disease compared with those without
disease. Aim to be considerably less than 1 for a good test.
• The likelihood odds ratio is the ratio of the positive likelihood ratio to the negative
likelihood ratio.
• In some calculation methods, ½ is added to all counts before the calculation of LOR, to avoid
dividing by 0.
𝐿𝐿𝐿𝐿𝐿𝐿 =
𝐿𝐿𝐿𝐿 +
𝐿𝐿𝐿𝐿 −
Distributions
Types of Distributions
• Binomial
• Poisson
• Normal
The Normal Distribution
Properties of Normal Distribution
How do we use the Normal distribution?
• The Normal probability distribution can be used to calculate
the probability of different values occurring.
• We could be interested in: what is the probability of being
within 1 standard deviation of the mean (or outside it)?
• We can use a Normal distribution table which tells us the
probability of being outside this value.
• The Normal distribution also has other uses in statistics and
is often used as an approximation to the Binomial and
Poisson distributions.
Populations and Samples
• In the statistical sense a population is a theoretical concept used
to describe an entire group of individuals in whom we are
interested.
• Examples are the population of all patients with diabetes
mellitus, or the population of all middle-aged men.
• Parameters are quantities used to describe characteristics of
such populations.
• Thus the proportion of diabetic patients with nephropathy, or
the mean blood pressure of middle-aged men, are characteristics
describing the two populations.
• Generally, it is costly and labour intensive to study the entire
population.
• Therefore we collect data on a sample of individuals from the
population who we believe are representative of that
population, that is, they have similar characteristics to the
individuals in the population.
• We then use them to draw conclusions, technically make
inferences, about the population as a whole.
Populations and Samples
Populations and Samples
• The process is represented schematically in the figure.
• So, samples are taken from populations to provide estimates
of population parameters
• It is important to note that although the study populations
are unique, samples are not as we could take more than one
sample from the target population if we wished.
• Thus for middle-aged men there is only one normal range
for blood pressure.
• However, one investigator taking a random sample from a
population of middle-aged men and measuring their blood
pressure may obtain a different normal range from another
investigator who takes a different random sample from the
same population of such men.
• By studying only some of the population we have
introduced a sampling error.
Populations and Samples
Sample
• In some circumstances the sample may consist of all the members of a specifically
defined population.
• For practical reasons, this is only likely to be the case if the population of interest is
not too large.
• If all members of the population can be assessed, then the estimate of the
parameter concerned is derived from information obtained on all members and so
its value will be the population parameter itself.
• In this idealised situation we know all about the population as we have examined all
its members and the parameter is estimated with no bias.
• The dotted arrow in Figure 6.1 connecting the population ellipse to population
parameter box illustrates this.
• However, this situation will rarely be the case so, in practice, we take a sample which
is often much smaller in size than the population under study.
Sample
• Ideally we should aim for a random sample.
• A list of all individuals from the population is drawn up (the
sampling frame), and individuals are selected randomly
from this list, that is, every possible sample of a given size in
the population has an equal chance of being chosen.
• Sometimes, there may be difficulty in constructing this list
or we may have to ‘make-do’ with those subjects who
happen to be available or what is termed a convenience
sample.
• Essentially if we take a random sample then we obtain an
unbiased estimate of the corresponding population
parameter, whereas a convenience sample may provide a
biased estimate but by how much we will not know
Properties of the distribution of sample
means
• The mean of all the sample means will be the same as the
population mean.
• The standard deviation of all the sample means is known as
the standard error (SE) of the mean or SEM.
• Given a large enough sample size, the distribution of sample
means, will be roughly Normal regardless of the distribution
of the variable.
Properties of standard errors
• The standard error (SE) is a measure of the precision
of a sample estimate.
• It provides a measure of how far from the true value
in the population the sample estimate is likely to be.
• All standard errors have the following interpretation:
• A large standard error indicates that the estimate is
imprecise.
• A small standard error indicates that the estimate is
precise.
• The standard error is reduced, that is, we obtain a more
precise estimate, if the size of the sample is increased.
Standard errors
Worked example: Standard error of a mean
– birthweight of preterm infants
• Simpson (2004) reported the birthweights of 98 infants who
were born prematurely, for which n = 98, ̅
𝑥𝑥 = 1.31 kg,
s = 0.42 kg and 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥 =? ?
• The standard error provides a measure of the precision of
our sample estimate of the population mean birthweight
Worked example: Standard error of a
proportion – acupuncture and headache
• Melchart et al (2005) give the proportion who responded to
acupuncture treatment in 124 patients with tension type
headache as p = 0.46.
• We assume the numbers who respond have a Binomial
distribution and from Table 6.3 we find the standard error is
𝑆𝑆𝑆𝑆 𝑝𝑝 =? ?
Worked example: Standard error of a rate –
cadaveric heart donors
• The study of Wight et al (2004) gave the number of organ
donations calculated over a two-year period as r = 1.82 per
day.
• We assume the number of donations follows a Poisson
distribution and from Table 6.3 we find the standard error is
𝑆𝑆𝑆𝑆 𝑟𝑟 =? ?
Standard Errors of Differences
Example: Difference in means – physiotherapy for patients with lung
disease
• Griffiths et al (2000) report the results of a randomised
controlled trial to compare a pulmonary rehabilitation
programme (Intervention) with standard medical management
(Control) for the treatment of chronic obstructive pulmonary
disease.
• One outcome measure was the walking capacity (distance
walked in metres from a standardised test) of the patient
assessed 6 weeks after randomisation.
• Further suppose such measurements can be assumed to follow a
Normal distribution.
• The results from the 184 patients are expressed using the group
means and standard deviations (SD) as follows:
nInt = 93, ̅
𝑥𝑥Int = 211, SD(xInt ) = sInt = 118
nCon = 91, ̅
𝑥𝑥Con = 123, SD(xCon ) = sCon = 99.
• From these data d = xInt - xCon = 211 - 123 = 88 m and the
corresponding standard error is 𝑺𝑺𝑺𝑺 �
𝒅𝒅 =
Worked example: Difference in proportions – post-natal
urinary incontinence
• The results of randomised controlled trial conducted by
Glazener et al (2001) to assess the effect of nurse
assessment with reinforcement of pelvic floor muscle
training exercises and bladder training (Intervention)
compared with standard management (Control) among
women with persistent incontinence three months
postnatally are summarised in Table 6.2.
Confidence intervals for an estimate
• A confidence interval defines a range of values within which
our population parameter is likely to lie
• Such an interval for the population mean 𝜇𝜇 is defined by
̅
𝑥𝑥 − 1,96 × 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥 𝑡𝑡𝑡𝑡 ̅
𝑥𝑥 + 1,96 × 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥
and, in this case, is termed a 95% confidence interval as it
includes the multiplier 1.96.
Confidence intervals for an mean
Confidence Intervals for Differences
• To calculate a confidence interval for a difference in
means, for example d = 𝜇𝜇A - 𝜇𝜇B, the same structure for
the confidence interval of a single mean is used but
with ̅
𝑥𝑥 replaced by ̅
𝑥𝑥1 - ̅
𝑥𝑥2 and SE( x) replaced by SE( x1 -
x2).
• Algebraic expressions for these standard errors are
given in Table 6.4 (Section 6.10).
• Thus the 95% CI is given by
̅
𝑥𝑥1 − ̅
𝑥𝑥2 − 1,96 × 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥1 − ̅
𝑥𝑥2 𝑡𝑡𝑡𝑡 ̅
𝑥𝑥1 − ̅
𝑥𝑥2 + 1,96 × 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥1 − ̅
𝑥𝑥2
More Accurate Confidence Intervals for a Proportion
• To use this method, we first need to calculate three
quantities: 𝐴𝐴 = 2𝑟𝑟 + 𝑧𝑧2; 𝐵𝐵 = 𝑧𝑧 𝑧𝑧24𝑟𝑟(1 − 𝑝𝑝); and, 𝐶𝐶 =
2(𝑛𝑛 + 𝑧𝑧2)
Where 𝑧𝑧 is from the standard normal table
• The recommended confidence interval is given by
(𝐴𝐴 − 𝐵𝐵)
𝐶𝐶
𝑡𝑡𝑡𝑡
(𝐴𝐴 + 𝐵𝐵)
𝐶𝐶
• When there are no observed events, r = 0 and hence 𝑝𝑝 =
0
𝑛𝑛
= 0, the recommended CI simplifies to
0 𝑡𝑡𝑡𝑡
𝑧𝑧2
(𝑛𝑛 + 𝑧𝑧2)
• While when r = n so that p = 1, the CI becomes
𝑛𝑛
(𝑛𝑛 + 𝑧𝑧2)
𝑡𝑡𝑡𝑡 1
Probability.pdf.pdf and Statistics for R
Probability.pdf.pdf and Statistics for R

More Related Content

Similar to Probability.pdf.pdf and Statistics for R

Epidemiological Approaches for Evaluation of diagnostic tests.pptx
Epidemiological Approaches for Evaluation of diagnostic tests.pptxEpidemiological Approaches for Evaluation of diagnostic tests.pptx
Epidemiological Approaches for Evaluation of diagnostic tests.pptxBhoj Raj Singh
 
8 screening.pptxscreening.pptxscreening.
8 screening.pptxscreening.pptxscreening.8 screening.pptxscreening.pptxscreening.
8 screening.pptxscreening.pptxscreening.Wasihun Aragie
 
7 Lecture 8 Outbrek invesitgation.pptx
7 Lecture 8 Outbrek invesitgation.pptx7 Lecture 8 Outbrek invesitgation.pptx
7 Lecture 8 Outbrek invesitgation.pptxmergawekwaya
 
Evidence-based diagnosis
Evidence-based diagnosisEvidence-based diagnosis
Evidence-based diagnosisHesham Gaber
 
Screening and diagnostic testing
Screening and diagnostic  testingScreening and diagnostic  testing
Screening and diagnostic testingamitakashyap1
 
Evidence based diagnosis
Evidence based diagnosisEvidence based diagnosis
Evidence based diagnosisHesham Al-Inany
 
Statistics basics for oncologist kiran
Statistics basics for oncologist kiranStatistics basics for oncologist kiran
Statistics basics for oncologist kiranKiran Ramakrishna
 
Disease screening and screening test validity
Disease screening and screening test validityDisease screening and screening test validity
Disease screening and screening test validityTampiwaChebani
 
OAJ presentation final draft
OAJ presentation final draftOAJ presentation final draft
OAJ presentation final draftBrian Eisen
 
Epide 8.pptx epidomology assignment year one
Epide 8.pptx epidomology assignment year oneEpide 8.pptx epidomology assignment year one
Epide 8.pptx epidomology assignment year oneGetahunAlega
 
VALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptx
VALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptxVALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptx
VALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptxShaliniPattanayak
 
Validity of a screening test
Validity of a screening testValidity of a screening test
Validity of a screening testdrkulrajat
 

Similar to Probability.pdf.pdf and Statistics for R (20)

Disease screening
Disease screeningDisease screening
Disease screening
 
Epidemiological Approaches for Evaluation of diagnostic tests.pptx
Epidemiological Approaches for Evaluation of diagnostic tests.pptxEpidemiological Approaches for Evaluation of diagnostic tests.pptx
Epidemiological Approaches for Evaluation of diagnostic tests.pptx
 
H.Assessment Bates Chapter#02.pptx
H.Assessment Bates Chapter#02.pptxH.Assessment Bates Chapter#02.pptx
H.Assessment Bates Chapter#02.pptx
 
Epcm l18-19 assessing tests
Epcm  l18-19 assessing testsEpcm  l18-19 assessing tests
Epcm l18-19 assessing tests
 
8 screening.pptxscreening.pptxscreening.
8 screening.pptxscreening.pptxscreening.8 screening.pptxscreening.pptxscreening.
8 screening.pptxscreening.pptxscreening.
 
7 Lecture 8 Outbrek invesitgation.pptx
7 Lecture 8 Outbrek invesitgation.pptx7 Lecture 8 Outbrek invesitgation.pptx
7 Lecture 8 Outbrek invesitgation.pptx
 
Evidence-based diagnosis
Evidence-based diagnosisEvidence-based diagnosis
Evidence-based diagnosis
 
Screening and diagnostic testing
Screening and diagnostic  testingScreening and diagnostic  testing
Screening and diagnostic testing
 
Evidence based diagnosis
Evidence based diagnosisEvidence based diagnosis
Evidence based diagnosis
 
H.Assessment Bates Chapter#02.pptx
H.Assessment Bates Chapter#02.pptxH.Assessment Bates Chapter#02.pptx
H.Assessment Bates Chapter#02.pptx
 
Statistics basics for oncologist kiran
Statistics basics for oncologist kiranStatistics basics for oncologist kiran
Statistics basics for oncologist kiran
 
Disease screening and screening test validity
Disease screening and screening test validityDisease screening and screening test validity
Disease screening and screening test validity
 
OAJ presentation final draft
OAJ presentation final draftOAJ presentation final draft
OAJ presentation final draft
 
chapter2-191105204556.pdf
chapter2-191105204556.pdfchapter2-191105204556.pdf
chapter2-191105204556.pdf
 
Clinical epidemiology
Clinical epidemiologyClinical epidemiology
Clinical epidemiology
 
Screening of diseases
Screening of diseasesScreening of diseases
Screening of diseases
 
SCREENING.pptx
SCREENING.pptxSCREENING.pptx
SCREENING.pptx
 
Epide 8.pptx epidomology assignment year one
Epide 8.pptx epidomology assignment year oneEpide 8.pptx epidomology assignment year one
Epide 8.pptx epidomology assignment year one
 
VALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptx
VALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptxVALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptx
VALIDITY AND RELIABLITY OF A SCREENING TEST seminar 2.pptx
 
Validity of a screening test
Validity of a screening testValidity of a screening test
Validity of a screening test
 

Recently uploaded

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 

Recently uploaded (20)

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

Probability.pdf.pdf and Statistics for R

  • 1. PROBABILITY GR Miyambu School of Science & Technology Department of Statistical Sciences
  • 2. Types of Probability Types of Probability Frequency What is the probability of a randomly chosen person dying in the next year? Model-based What is the probability of a child being affected by cystic fibrosis given one of the parents is a carrier of the disease? Subjective What is the probability that a particular patient has heart disease given they have chest pain?
  • 3. Properties of Probability The three types of probability all have the following properties. 1. All probabilities lie between 0 and 1. 2. When the outcome can never happen the probability is 0. 3. When the outcome will definitely happen the probability is 1.
  • 4. Diagnostic Test • A diagnostic test is any approach used to gather clinical information for the purpose of making a clinical decision (i.e., diagnosis). • Some examples of diagnostic tests include X-rays, biopsies, pregnancy tests, medical histories, and results from physical examinations. • From a statistical point of view there are two points to keep in mind: 1. the clinical decision-making process is based on probability; 2. the goal of a diagnostic test is to move the estimated probability of disease toward either end of the probability scale (i.e., 0 rules out disease, 1 confirms the disease).
  • 5. Uses of Diagnostic Test • In making a diagnosis, a clinician first establishes a possible set of diagnostic alternatives and then attempts to reduce these by progressively ruling out specific diseases or conditions. • Alternatively, the clinician may have a strong hunch that the patient has one particular disease and he then sets about confirming it. • Given a particular diagnosis, a good diagnostic test should indicate either that the disease is very unlikely or that it is very probable. • In a practical sense it is important to realize that a diagnostic test is useful only if the result influences patient management since, if the management is the same for two different conditions, there is little point in trying strenuously to distinguish between them.
  • 6. Analysis of Diagnostic Test Disease No Disease Test Positive a (true positives) b (false positives) Test Negative c (false negatives) d (true negatives)
  • 7. Gold Standard • The "Gold Standard" is the method used to obtain a definitive diagnosis for a particular disease; it may be biopsy, surgery, autopsy or an acknowledged standard. • Gold Standards are used to define true disease status against which the results of a new diagnostic test are compared. • Here are a number of definitive diagnostic tests that will confirm whether or not you have the disease. • Some of these are quite invasive and this is a major reason why new diagnostic procedures are being developed. Target Disorder Gold Standard Breast cancer Excisional biopsy Prostate cancer Transrectal biopsy Coronar stenosis Coronary angiography Myocardial infarction Catheterization Strep throat Throat culture
  • 8. Sensitivity and specificity • Many diagnostic test results are given in the form of a continuous variable (that is one that can take any value within a given range), such as diastolic blood pressure or haemoglobin level. • However, for ease of discussion we will first assume that these have been divided into positive or negative results. • For example, a positive diagnostic result of ‘hypertension’ is a diastolic blood pressure greater than 90 mmHg; • whereas for ‘anaemia’, a haemoglob in level less than 10 g/ d l is required. • For every diagnostic procedure (which may involve a laboratory test of a sample taken) there is a set of fundamental questions that should be asked.
  • 9. Sensitivity and specificity • First, if the d isease is present, what is the prob a b ility that the test result will be positive? • This leads to the notion of the sensitivity of the test. • Secon d , if the d isease is a b sent, what is the pro b a b ility that the test result will be negative? • This question refers to the specificity of the test. • These questions can b e answere donly if it is known what the ‘true’ diagnosis is. • In the case of organic d isease this can b e d etermine d b y b iopsy or, for example, an expensive an drisky proce d ure such as angiography for heart disease. • In other situations it may b e b y ‘expert’ opinion. Such tests provi d e the so-called ‘gold standard’.
  • 10. Example Diagnosis of heart disease • Consider the results of an assay of N-terminal pro-brain natriuretic peptide (NT- proBNP) for diagnosis of heart failure in a general population survey in those over 45 years of age and in patients with existing diagnosis of heart failure obtained by Hobbs et al (2002) and summarised in the table. • Heart failure was identified when NT-proBNP >36 pmol/l. NT-proBNP (pmol/l) Confirmed Diagnosis of Heart Failure Present Absent Total (D+) (D-) > 36 Positive (T+) 35 (a) 7 (b) 42 Negative (T-) 68 (c) 300 (d) 368 Total 103 307 410
  • 11. Sensitivity and Specificity • The prevalence of heart failure in these subjects is (a + c)/(a + b + c + d) • P(D+)=?? • The sensitivity of a test is theproportion of those with thedisease who also have a positive test result. • The sensitivity is a/(a + c)=?? • N o w sensitivity is the pr o bability o f a p o s it iv e test result (e v ent T +) g iv en that the d isease is present (event D+) and can be written as p(T+|D+)=??, where the ‘|’ is read as ‘given’. • The specificity of the test is the proportion of those without disease who give a negative test result. • Thus the specificity is d/(b + d)=?? • N o w specificity is the pr o bability o f a ne g ativ e test result (e v ent T -) g iv en that the d isease is absent (event D-) and can be written as p(T-|D-)=?? • Since sensitivity is conditional on the disease being present, and specificity on the disease being absent, in theory, they are unaffected by disease prevalence.
  • 12. Sensitivity and Specificity • Sensitivity and specificity are useful statistics because they will yield consistent results for the diagnostic test in a variety of patient groups with different disease prevalences. • This is an important point; sensitivity and specificity are characteristics of the test, not the population to which the test is applied. • Although indeed they are independent of disease prevalence, in practice if the disease is very rare, the accuracy with which one can estimate the sensitivity will be limited. • Two other terms in common use are: the false negative rate (or probability of a false negative) which is given by c/(a + c) = 1 - Sensitivity, and the false positive rate (or probability of a false positive) or b/(b + d) = 1 - Specificity. • Since sensitivity = 1 - Probability(false negative) and specificity = 1 - Probability(false positive), a possibly useful mnemonic to recall this is that ‘sensitivity’ and ‘negative’ have ‘n’s in them and ‘specificity’ and ‘positive’ have ‘p’s in them.
  • 13. Summary of Definitions of Sensitivity and Specificity Test result True diagnosis Disease present Disease absent Positive Sensitivity Probability of a false positive Negative Probability of a false negative Specificity
  • 14. Rates Assuming a Predicted Condition • The predictive value refers to the likelihood for determining an outbreak or non-outbreak of an infectious disease based on early warning results. • Predictive values can be classified into the positive predictive value (PPV) and the negative predictive value (PNV). • Positive predictive value is the proportion of individuals with positive test results that are correctly diagnosed and actually have the disease. 𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑝𝑝 𝐷𝐷 + 𝑇𝑇 + = 𝑎𝑎 𝑎𝑎 + 𝑏𝑏 • Negative predictive value is the proportion of individuals with negative test results that are correctly diagnosed and do not have the disease. 𝑁𝑁𝑁𝑁𝑁𝑁 = 𝑝𝑝 𝐷𝐷 − 𝑇𝑇 − = 𝑑𝑑 𝑐𝑐 + 𝑑𝑑 • False Omission Rate is the proportion of the individuals with a negative test result for which the true condition is positive. 𝐹𝐹𝐹𝐹𝐹𝐹 = 𝑐𝑐 𝑐𝑐 + 𝑑𝑑 • The false discovery rate is the proportion of the individuals with a positive test result for which the true condition is negative. 𝐹𝐹𝐹𝐹𝐹𝐹 = 𝑏𝑏 𝑎𝑎 + 𝑏𝑏
  • 15. Whole Table Rates • P r evalence is the proportion of a population who have a specific characteristic in a given time period. • The prevalence may be estimated from the table if all the individuals are randomly sampled from the population. 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑎𝑎 + 𝑐𝑐 𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑 • The Accu racy o r P ro p orti o n C o rrectly Cla ss ifie d reflects the total proportion of individuals that are correctly classified. 𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴 𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑎𝑎 + 𝑑𝑑 𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑 • The proportion incorrectly classified reflects the total proportion of individuals that are incorrectly classified. 𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑏𝑏 + 𝑐𝑐 𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑
  • 16. Likelihood Ratio • The clear simplicity of diagnostic test data, particularly when presented as a 2 x 2 table, is confounded by many ways of reporting the results. • The likelihood ratio (LR) is a simple measure combining sensitivity and specificity • We have positive likelihood ratio (LR+) defined as 𝐿𝐿𝐿𝐿 += 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 1 − 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 = 𝑇𝑇𝑇𝑇𝑇𝑇 𝐹𝐹𝐹𝐹𝐹𝐹 • This gives a ratio of the test being positive for patients with disease compared with those without disease. Aim to be much greater than 1 for a good test. • And negative likelihood ratio (LR-) defined as 𝐿𝐿𝐿𝐿 −= 1 − 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 = 𝐹𝐹𝐹𝐹𝐹𝐹 𝑇𝑇𝑇𝑇𝑇𝑇 • This gives a ratio of the test being negative for patients with disease compared with those without disease. Aim to be considerably less than 1 for a good test. • The likelihood odds ratio is the ratio of the positive likelihood ratio to the negative likelihood ratio. • In some calculation methods, ½ is added to all counts before the calculation of LOR, to avoid dividing by 0. 𝐿𝐿𝐿𝐿𝐿𝐿 = 𝐿𝐿𝐿𝐿 + 𝐿𝐿𝐿𝐿 −
  • 17. Distributions Types of Distributions • Binomial • Poisson • Normal
  • 19. Properties of Normal Distribution
  • 20. How do we use the Normal distribution? • The Normal probability distribution can be used to calculate the probability of different values occurring. • We could be interested in: what is the probability of being within 1 standard deviation of the mean (or outside it)? • We can use a Normal distribution table which tells us the probability of being outside this value. • The Normal distribution also has other uses in statistics and is often used as an approximation to the Binomial and Poisson distributions.
  • 21. Populations and Samples • In the statistical sense a population is a theoretical concept used to describe an entire group of individuals in whom we are interested. • Examples are the population of all patients with diabetes mellitus, or the population of all middle-aged men. • Parameters are quantities used to describe characteristics of such populations. • Thus the proportion of diabetic patients with nephropathy, or the mean blood pressure of middle-aged men, are characteristics describing the two populations. • Generally, it is costly and labour intensive to study the entire population. • Therefore we collect data on a sample of individuals from the population who we believe are representative of that population, that is, they have similar characteristics to the individuals in the population. • We then use them to draw conclusions, technically make inferences, about the population as a whole.
  • 23. Populations and Samples • The process is represented schematically in the figure. • So, samples are taken from populations to provide estimates of population parameters • It is important to note that although the study populations are unique, samples are not as we could take more than one sample from the target population if we wished. • Thus for middle-aged men there is only one normal range for blood pressure. • However, one investigator taking a random sample from a population of middle-aged men and measuring their blood pressure may obtain a different normal range from another investigator who takes a different random sample from the same population of such men. • By studying only some of the population we have introduced a sampling error.
  • 25. Sample • In some circumstances the sample may consist of all the members of a specifically defined population. • For practical reasons, this is only likely to be the case if the population of interest is not too large. • If all members of the population can be assessed, then the estimate of the parameter concerned is derived from information obtained on all members and so its value will be the population parameter itself. • In this idealised situation we know all about the population as we have examined all its members and the parameter is estimated with no bias. • The dotted arrow in Figure 6.1 connecting the population ellipse to population parameter box illustrates this. • However, this situation will rarely be the case so, in practice, we take a sample which is often much smaller in size than the population under study.
  • 26. Sample • Ideally we should aim for a random sample. • A list of all individuals from the population is drawn up (the sampling frame), and individuals are selected randomly from this list, that is, every possible sample of a given size in the population has an equal chance of being chosen. • Sometimes, there may be difficulty in constructing this list or we may have to ‘make-do’ with those subjects who happen to be available or what is termed a convenience sample. • Essentially if we take a random sample then we obtain an unbiased estimate of the corresponding population parameter, whereas a convenience sample may provide a biased estimate but by how much we will not know
  • 27. Properties of the distribution of sample means • The mean of all the sample means will be the same as the population mean. • The standard deviation of all the sample means is known as the standard error (SE) of the mean or SEM. • Given a large enough sample size, the distribution of sample means, will be roughly Normal regardless of the distribution of the variable.
  • 28. Properties of standard errors • The standard error (SE) is a measure of the precision of a sample estimate. • It provides a measure of how far from the true value in the population the sample estimate is likely to be. • All standard errors have the following interpretation: • A large standard error indicates that the estimate is imprecise. • A small standard error indicates that the estimate is precise. • The standard error is reduced, that is, we obtain a more precise estimate, if the size of the sample is increased.
  • 30. Worked example: Standard error of a mean – birthweight of preterm infants • Simpson (2004) reported the birthweights of 98 infants who were born prematurely, for which n = 98, ̅ 𝑥𝑥 = 1.31 kg, s = 0.42 kg and 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥 =? ? • The standard error provides a measure of the precision of our sample estimate of the population mean birthweight
  • 31. Worked example: Standard error of a proportion – acupuncture and headache • Melchart et al (2005) give the proportion who responded to acupuncture treatment in 124 patients with tension type headache as p = 0.46. • We assume the numbers who respond have a Binomial distribution and from Table 6.3 we find the standard error is 𝑆𝑆𝑆𝑆 𝑝𝑝 =? ?
  • 32. Worked example: Standard error of a rate – cadaveric heart donors • The study of Wight et al (2004) gave the number of organ donations calculated over a two-year period as r = 1.82 per day. • We assume the number of donations follows a Poisson distribution and from Table 6.3 we find the standard error is 𝑆𝑆𝑆𝑆 𝑟𝑟 =? ?
  • 33. Standard Errors of Differences
  • 34. Example: Difference in means – physiotherapy for patients with lung disease • Griffiths et al (2000) report the results of a randomised controlled trial to compare a pulmonary rehabilitation programme (Intervention) with standard medical management (Control) for the treatment of chronic obstructive pulmonary disease. • One outcome measure was the walking capacity (distance walked in metres from a standardised test) of the patient assessed 6 weeks after randomisation. • Further suppose such measurements can be assumed to follow a Normal distribution. • The results from the 184 patients are expressed using the group means and standard deviations (SD) as follows: nInt = 93, ̅ 𝑥𝑥Int = 211, SD(xInt ) = sInt = 118 nCon = 91, ̅ 𝑥𝑥Con = 123, SD(xCon ) = sCon = 99. • From these data d = xInt - xCon = 211 - 123 = 88 m and the corresponding standard error is 𝑺𝑺𝑺𝑺 � 𝒅𝒅 =
  • 35. Worked example: Difference in proportions – post-natal urinary incontinence • The results of randomised controlled trial conducted by Glazener et al (2001) to assess the effect of nurse assessment with reinforcement of pelvic floor muscle training exercises and bladder training (Intervention) compared with standard management (Control) among women with persistent incontinence three months postnatally are summarised in Table 6.2.
  • 36. Confidence intervals for an estimate • A confidence interval defines a range of values within which our population parameter is likely to lie • Such an interval for the population mean 𝜇𝜇 is defined by ̅ 𝑥𝑥 − 1,96 × 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥 𝑡𝑡𝑡𝑡 ̅ 𝑥𝑥 + 1,96 × 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥 and, in this case, is termed a 95% confidence interval as it includes the multiplier 1.96.
  • 38.
  • 39.
  • 40. Confidence Intervals for Differences • To calculate a confidence interval for a difference in means, for example d = 𝜇𝜇A - 𝜇𝜇B, the same structure for the confidence interval of a single mean is used but with ̅ 𝑥𝑥 replaced by ̅ 𝑥𝑥1 - ̅ 𝑥𝑥2 and SE( x) replaced by SE( x1 - x2). • Algebraic expressions for these standard errors are given in Table 6.4 (Section 6.10). • Thus the 95% CI is given by ̅ 𝑥𝑥1 − ̅ 𝑥𝑥2 − 1,96 × 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥1 − ̅ 𝑥𝑥2 𝑡𝑡𝑡𝑡 ̅ 𝑥𝑥1 − ̅ 𝑥𝑥2 + 1,96 × 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥1 − ̅ 𝑥𝑥2
  • 41.
  • 42.
  • 43. More Accurate Confidence Intervals for a Proportion • To use this method, we first need to calculate three quantities: 𝐴𝐴 = 2𝑟𝑟 + 𝑧𝑧2; 𝐵𝐵 = 𝑧𝑧 𝑧𝑧24𝑟𝑟(1 − 𝑝𝑝); and, 𝐶𝐶 = 2(𝑛𝑛 + 𝑧𝑧2) Where 𝑧𝑧 is from the standard normal table • The recommended confidence interval is given by (𝐴𝐴 − 𝐵𝐵) 𝐶𝐶 𝑡𝑡𝑡𝑡 (𝐴𝐴 + 𝐵𝐵) 𝐶𝐶 • When there are no observed events, r = 0 and hence 𝑝𝑝 = 0 𝑛𝑛 = 0, the recommended CI simplifies to 0 𝑡𝑡𝑡𝑡 𝑧𝑧2 (𝑛𝑛 + 𝑧𝑧2) • While when r = n so that p = 1, the CI becomes 𝑛𝑛 (𝑛𝑛 + 𝑧𝑧2) 𝑡𝑡𝑡𝑡 1