This document provides an introduction to biostatistics in health. It discusses:
- How data is collected through instruments which have limitations and human biases. Statistics help extract meaningful information from large amounts of raw data.
- Key concepts including populations, samples, variables, and different measurement scales. Variables can be qualitative taking categories like gender, or quantitative measured on interval/ratio scales.
- Descriptive statistics help summarize and present data through tables, graphs, and measures of central tendency and spread. Inferential statistics are used to draw conclusions beyond the sample studied.
- The importance of biostatistics in health fields like understanding diagnostic tests, clinical trials, epidemiology, and evidence-based practice. Statistics under
2. Introduction
• In all walks of life, especially health care, we are under
constant threat of being overwhelmed by data. But unless we
are going to practice on the basis of hunches, we need these
data to make good decisions and so optimize our treatment or
policy plans. So, our first objective must be to understand a
little about data.
• Data do not arise by magic; data collection is determined by a
human, with all his or her prejudices and proneness to error.
We rely on instruments, such as questionnaires, mercury
barometers and spectrophotometers, to measure and record
variables of interest, such as people's ages or blood pressures,
or amounts of chlorofluorocarbons in the atmosphere. But
all instruments have limitations on their accuracy.
3. Introduction (contd.)
• Humans tend to accept a body of raw data at face
value; a computer print-out of numbers can appear
to be an unalterable truth. It may, in fact, be junk.
Subsequent statistical processing will not recycle junk
into truth (the "garbage in, garbage out" principle).
• The next problem is: how do we extract meaningful
information from masses of raw data? (How do we
see the forest, when the trees keep getting in the
way?)
4. Definition
• Statistics
- A science that deals with principles & methods for
the collection, presentation, analysis &
interpretation of data
- A science that deals with the study of aggregates
or totals, i.e. the "population"
• Bio-Statistics
- Statistical methods used in the fields of Health,
Biology, Medicine, Public Health
5. Statistical thinking would one
day be as necessary a
qualification for good citizenship
as the ability to read and write
H.G. WELLS, 1903
7. Data and variables
• A variable is a characteristic of a population which can take
different values. The population might be a human
population; variables of interest in this population might be
age, income, number of children and education level. One
could envisage a population of teaching hospitals and perhaps
be interested in such variables as annual budget, number of
medical students and number of X-rays taken per year.
• The Health Commission might be interested in a population of
small chemical processing plants and wish to measure
variables related to the health of employees, for example,
fresh air recirculation times, or number of emergency dousing
showers per factory.
8. Data and variables (contd.)
• Data are measurements collected on a variable as a result of
taking observations.
• Often, data will have associated units of measurement.
• Most often, due to time and resource constraints, we are
dealing with data collected on a subset or sample of a
population.
• Data may be classified as discrete if the variable can
take only a countable set of distinct values (for example,
number of children), or continuous if the variable can
(at least within a certain range) take any value
along the number line, for example, height, plasma
cholesterol level, blood pressure.
9. Measurement Scales
• Depending on the nature of the variable, we have different
measurement scales.
• Firstly, not all data are numbers, though they are often
represented as numbers. For example, we may choose to code
males as "1" and females as "2", especially when entering the
data into a computer. In this example, the numbers 1 and 2
have none of the usual properties of numbers.
• Two observations are equal if both are female (or both male)
and non-equal otherwise. Such a scale is called nominal or
categorical, and because of this lack of mathematical
properties it is called the "weakest" scale.
10. Measurement Scales (contd.)
• A slightly stronger scale, one that uses the mathematical
notion of ordering, is the ordinal scale. An example of a
variable measured on an ordinal scale might be the position
of an examination candidate in the results order of merit. The
number of the ranked position tells us nothing about how
much difference exists between successive positions on the
scale.
• Interval scale: it possesses strong mathematical properties
due to the fact that in this scale equal differences between
points represent equal differences in the measured quantity.
For example, the difference between 12 metres and 11
metres is the same as the difference between 4 metres and 3
metres.
11. Measurement Scales (contd.)
• The ratio scale is considered a refinement of the interval
scale. In this scale, the order and size of interval are
important, but the ratio between two measures also has
meaning. This occurs when there is a true zero point
associated with the scale.
• For example,
- Temperature in degrees Celsius (interval scale)
- Temperature in kelvin (ratio scale)
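The interval/ratio distinction can be illustrated with a small sketch. The two temperatures below are made-up values chosen only for illustration; the conversion constant 273.15 is the standard Celsius-to-kelvin offset.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15

# Two illustrative (hypothetical) temperatures in degrees Celsius.
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on the ratio scale (kelvin, true zero).
ratio_c = t2_c / t1_c  # 2.0, but 20 degrees C is not "twice as hot" as 10 degrees C
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

The difference is the same on both scales, while the ratio changes with the choice of zero point, which is why ratios are interpreted only on a scale with a true zero.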
12. Simplification of data
• The stronger measures can always be "collapsed" down to form a
weaker measure, but not vice versa. E.g. heights of students.
• When data are simplified or summarised, information is always lost,
but understanding may be gained.
• Here, if we measure heights on enough people, we may be
overwhelmed by the number of individual measurements, but being
able to say that x% are tall and (100 - x)% are short may give a useful
insight into this aspect of the population.
• Of course, using this nominal scale of measuring height we are no
longer able to say what an individual's actual height is.
• The reason we stress scales of measure is that the information
content of data depends on the scale, and different descriptive
techniques and different statistical tests are appropriate to different
scales.
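A minimal sketch of collapsing a stronger scale into a weaker one, using the height example. The heights and the 170 cm cutoff are assumptions made up for illustration, not values from the slides.

```python
# Hypothetical heights in centimetres (ratio scale).
heights_cm = [152.0, 160.5, 171.2, 168.0, 181.4, 175.0, 158.3, 190.1]

# Assumed cutoff; any threshold would serve the illustration.
TALL_CUTOFF_CM = 170.0

# Collapse each measurement into one of two categories.
labels = ["tall" if h >= TALL_CUTOFF_CM else "short" for h in heights_cm]

pct_tall = 100 * labels.count("tall") / len(labels)
print(f"{pct_tall:.0f}% tall, {100 - pct_tall:.0f}% short")

# The reverse step is impossible: from the label "tall" alone we cannot
# recover whether the person measured 171.2 cm or 190.1 cm.
```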
13. Descriptive statistics
• Descriptive statistics includes methods for presenting
and summarising data.
• These allow us to digest and understand large
quantities of data, and to communicate important
aspects of our research effectively to others.
14. Descriptive statistics
Frequency Distributions and Data Presentation
• If you arrange your raw data so that the scores
on a variable of interest are in order of
magnitude, that is, you rank the data, and
then indicate by means of a table or graph
how often a score occurs, then you will have
constructed a frequency distribution: a tally
of the scores.
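The tally described above can be sketched in a few lines. The scores are invented for illustration; `collections.Counter` from the Python standard library does the counting.

```python
from collections import Counter

# Hypothetical raw scores on some variable of interest.
scores = [3, 1, 2, 3, 2, 3, 5, 1, 3, 2]

# Count how often each score occurs.
freq = Counter(scores)

# Present the distribution in order of magnitude, as a table with a
# simple text bar acting as the "graph".
print("score | frequency")
for value in sorted(freq):
    print(f"{value:>5} | {'#' * freq[value]} ({freq[value]})")
```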
17. Inferential statistics
• The inferential approach helps to decide whether the
outcome of the study is a result of factors planned
within the design of the study or determined by chance.
• The two approaches are often used sequentially:
first the data are described with descriptive
statistics, and then inferential statistics are used
to make inferences about the likelihood that the
outcome was due to chance.
18. Need for learning Biostatistical principles for
health/research workers
1. A knowledge of Statistics is required to understand the rationale
on which diagnostic, prognostic and therapeutic decisions are or
should be based
2. To appreciate that Medicine is highly dependent on concepts of
probability
3. Within their competence, health workers need to interpret
laboratory tests and bedside observations & measurements in
the light of a knowledge of physiological, observer and
instrument variation
19. Need for learning Biostatistics contd.
4. Health workers must know and understand the Statistical &
Epidemiological facts about the etiology and prognosis of the
diseases that they treat in order to give the best advice to their
patients about how to avoid or limit the effects of these
diseases
5. Health workers are the primary generators of the data on which
Health Statistics are based. Therefore they need to know how
data can be and should be used, both for the benefit of their
own practice and for the organization and delivery of health
care
20. Need for learning Biostatistics contd.
6. Health managers need to know how to interpret and draw
inferences from the indicators that describe health levels &
trends etc.
7. The study of Statistics helps to foster in students the critical and
deductive faculties that they need throughout their studies and
in their professional work throughout their careers
8. Knowledge of Statistics helps in understanding & evaluating
medical literature/reports so that they keep abreast of
developments in their profession
21. Statistical applications in Health
• Normal values of a characteristic
- How to classify an individual as healthy
or sick?
- Needs treatment or not?
• Based on the "normal" values of certain
clinical, laboratory & other measurements
"Normal" here is a statistical concept and
depends on the distribution of the characteristic
in the "Population"
22. Statistical applications in Health
• Most often in Health research, we need to
know whether the level of a parameter is
the same in two or more groups
E.g. Is the birth weight the same in males & females?
Is it the same in babies born to educated &
uneducated mothers?
• We can answer this only by statistical means
(Testing of hypotheses)
23. Statistical applications in Health
• Sometimes, the interest could be the relation
between 2 variables, e.g. weeks of gestation
& birth weight of the newborn
• We wish to know whether a change in one
brings a change in the other
• Statistical methods that help in this are:
- Exploratory: Scatter plot
- Quantification: Correlation Coefficient
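As a sketch of the quantification step, the Pearson correlation coefficient can be computed directly from its definition. The gestation and birth weight values below are made up for illustration and are not real data; the helper name `pearson_r` is likewise just a label chosen here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: weeks of gestation and birth weight in grams.
gestation_weeks = [34, 35, 36, 37, 38, 39, 40]
birth_weight_g = [2150, 2400, 2600, 2850, 3050, 3250, 3400]

r = pearson_r(gestation_weeks, birth_weight_g)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relation, and a value near 0 indicates little linear relation; it does not by itself show that one variable causes the other.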
24. Statistical applications in Health
• Quantifying the influence of one factor on another
- How much birth weight can be gained if the delivery
can be postponed by a week?
- How likely is the new therapy to improve clinical outcomes?
- What is the risk of breast cancer in a woman if her mother
had a history of the same? (Regression Analysis)
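The first question above can be sketched with a simple least-squares line, where the slope estimates the average change in birth weight per additional week of gestation. The data are invented for illustration, and `fit_line` is a hypothetical helper written here; a real analysis would also need to consider confounding and uncertainty in the estimate.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: weeks of gestation and birth weight in grams.
gestation_weeks = [34, 35, 36, 37, 38, 39, 40]
birth_weight_g = [2150, 2400, 2600, 2850, 3050, 3250, 3400]

slope, intercept = fit_line(gestation_weeks, birth_weight_g)
print(f"estimated gain: {slope:.0f} g per additional week")
print(f"predicted weight at 37 weeks: {intercept + slope * 37:.0f} g")
```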
25. Statistical applications in Health
If our interest is in seeing the agreement between two measurements of the same
characteristic (note: agreement ≠ correlation)
• BP measurements by a physician & a trained nurse
• Interpretation of slides by 2 pathologists
• Smoking status by interview & presence of cotinine in urine
(Agreement analysis methods)
• Exploratory plots
• Quantification: Kappa statistic
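For two raters classifying the same items into categories, one common form of the kappa statistic is Cohen's kappa, which compares the observed agreement with the agreement expected by chance. The sketch below assumes that form; the pathologists' readings are invented for illustration.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical labels on the same items.

    Note: undefined (division by zero) if chance agreement equals 1,
    e.g. when both raters use a single category for every item.
    """
    n = len(rater1)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal counts.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_chance = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical slide readings: M = malignant, B = benign.
pathologist_a = ["M", "M", "M", "M", "B", "B", "B", "B", "B", "B"]
pathologist_b = ["M", "M", "M", "B", "B", "B", "B", "B", "B", "M"]

kappa = cohen_kappa(pathologist_a, pathologist_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 slides, but part of that agreement would be expected by chance alone, so kappa is lower than the raw 80% agreement.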
26. Statistical applications in Health
• Agreement between 2 diagnostic tests
E.g. Diagnosis of TB by sputum culture vs PCR
- Sensitivity
- Specificity
- Positive predictive value (PV+)
- Negative predictive value (PV-)
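The four measures follow from a 2x2 table of test result against a reference standard. The sketch below assumes one test (for example, culture) is treated as the reference; the counts are invented for illustration and `diagnostic_measures` is a hypothetical helper name.

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy measures from a 2x2 table.

    tp/fn: diseased subjects testing positive/negative.
    fp/tn: non-diseased subjects testing positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives among the diseased
        "specificity": tn / (tn + fp),  # negatives among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among test positives
        "npv": tn / (tn + fn),          # non-diseased among test negatives
    }

# Hypothetical counts: 100 diseased and 100 non-diseased subjects.
measures = diagnostic_measures(tp=80, fp=10, fn=20, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the group tested.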
27. Statistical applications in Health
• Development of new drugs and treatment modalities
- What is the tolerable dose of a new drug?
- What are the pharmacokinetics of the drug?
- What is the effect of the new drug/treatment in treating a condition?
- How comparable is the new drug/treatment with the existing
drug(s)/treatment(s)?
• Issues: Inclusion and exclusion criteria, confounding, bias, blinding,
randomization etc. (Clinical Trials: Therapeutic & Prophylactic)
28. Statistical applications in Health
• Ensuring maximum benefit of diagnosis and
treatment with minimum cost, based on
available resources
(Health Economics & Operational Research)
29. Statistical applications in Health
• Indices on population characteristics
- Birth rate
- Death rate
- Fertility rate
- Reproduction rate
- Infant mortality, maternal mortality etc.
(Vital statistics)
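Several of these indices are simple rates per 1000. A minimal sketch, using the usual definitions (crude birth and death rates per 1000 mid-year population, infant mortality per 1000 live births) and invented counts for a hypothetical area in one year:

```python
def rate_per_1000(events: int, denominator: int) -> float:
    """Number of events per 1000 units of the denominator."""
    return 1000 * events / denominator

# Hypothetical annual counts for one area.
midyear_population = 100_000
live_births = 2_500
deaths = 800
infant_deaths = 75  # deaths under one year of age

crude_birth_rate = rate_per_1000(live_births, midyear_population)
crude_death_rate = rate_per_1000(deaths, midyear_population)
infant_mortality_rate = rate_per_1000(infant_deaths, live_births)

print(f"crude birth rate: {crude_birth_rate:.1f} per 1000 population")
print(f"crude death rate: {crude_death_rate:.1f} per 1000 population")
print(f"infant mortality rate: {infant_mortality_rate:.1f} per 1000 live births")
```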
30. Statistical applications in Health
• Population dynamics
- Growth
- Rural-urban components
- Migration
- Age composition
- Expectation of life at birth
(Demography)
31. Statistical applications in Health
• Estimating the magnitude of various diseases
• Distribution with respect to age, place, time
• Identifying the possible causative factors
• Principles of different study designs for different objectives
• Control of confounding, bias
• Role of any interactions
(Epidemiological methods)
32. Statistical applications in Health
• Estimating the potency and relative potency of
drugs
• What is the dose at which 50% of subjects
respond? (LD50, ED50)
• E.g. If the relative potency of a drug (A) compared
to another drug (B) is 1.5, then 1 unit of drug A is
equivalent to 1.5 units of drug B
(Biological Assays)
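The relative potency example is a one-line dose conversion. A small sketch, assuming the convention stated on the slide (a relative potency of 1.5 means 1 unit of A is equivalent to 1.5 units of B); the function name is made up for illustration.

```python
def equivalent_dose_of_b(dose_a: float, relative_potency_a_vs_b: float) -> float:
    """Dose of drug B producing the same effect as the given dose of drug A."""
    return dose_a * relative_potency_a_vs_b

# With relative potency 1.5, each unit of A corresponds to 1.5 units of B.
for dose_a in (1.0, 2.0, 10.0):
    dose_b = equivalent_dose_of_b(dose_a, 1.5)
    print(f"{dose_a} units of A ~ {dose_b} units of B")
```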
33. Statistical applications in Health
• Maintaining the quality of:
- Drugs
- Laboratory instruments
- Surgical instruments
(Quality Control analysis)
34. Statistical applications in Health
• Synthesis of data from separate but similar,
comparable studies to obtain a quantitative
summary of pooled results
• The aim is to integrate the findings, pool the data
and find the overall trend of results
(Meta-analysis, Systematic Review)
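One common way to pool results, not named on the slide and shown here only as an illustration, is a fixed-effect inverse-variance average: each study's estimate is weighted by the inverse of its squared standard error, so more precise studies count for more. The study estimates below are invented.

```python
from math import sqrt

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect estimates (e.g. mean differences) from three studies.
study_estimates = [0.30, 0.10, 0.25]
study_std_errors = [0.10, 0.20, 0.05]

pooled, pooled_se = pool_fixed_effect(study_estimates, study_std_errors)
print(f"pooled estimate = {pooled:.3f} (SE {pooled_se:.3f})")
```

This sketch assumes the studies estimate the same underlying effect; when studies differ substantially, other pooling models are generally considered.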
36. Statistical applications in Health
• Use of current best evidence derived from published
clinical & epidemiological research in management of
patients, with due attention to:
- Balance of risks & benefits of diagnostic tests
- Alternative treatment regimens
- Each patient's unique circumstances, including baseline
risk, co-morbidities & personal preferences
(Evidence Based Medicine)
37. Statistical applications in Health
Application of best available evidence in setting
public health policies and practices:
Evidence may be from epidemiologic,
demographic, sociologic, economic etc. sources
Implementation of public health policies,
programs & practices requires good evidence on
feasibility, efficacy, efficiency, cost and acceptability
to the target population
(Evidence Based Public Health)
38. Whatever the branch of Statistical
applications:
• What sample size is needed to arrive at a valid
conclusion?
• What is the role of chance in the observed
findings?
• Is the observed result due to some other factor?
• Is there any interplay of different factors?
• Can the observed result be generalized?
39. Role of statistics in Health Sciences &
Health Care Delivery
• Statistical methods are applied consciously or
subconsciously in health care delivery at the
community and individual patient levels
• At the community level:
- Monitor & assess the health situation & trends
- Predict the likely outcome of an intervention
• At the individual patient level:
- To arrive at the most likely diagnosis
- To predict the prognostic course
- To evaluate the relative efficacy of different modes
of treatment
40. • Practical convenience dictates that we study a set
of items or individuals from a larger aggregate or
population about which we wish to know
• E.g. What proportion of primary school children in
Delhi have their first molar tooth erupted?
• Study all primary school going children and count
how many had the first molar erupted
- Needs enormous time and resources
- May not even be necessary
41. • Alternatively, a subset or a portion of the total
primary school children can be studied and
the results observed in the small set can be
projected to the total children
• The "Total" or "Aggregate" we are talking about
is called the "Population" and the part or
subset is referred to as the "Sample"
• Thus the essence of all Statistics is:
- Study only a portion and project the results to
a target total
42. • In the process of studying only a part and
guessing about the total:
- The part or subset is called the "Sample"
- The target of our investigation or the
total/aggregate is called the "Population"
- In the molar eruption example:
• All primary school going children (in Delhi) are our
"Population"
• The selected subset of primary school going children forms
our "Sample"
43. E.g. 2. If we wish to know the proportion of anemic
pregnant women in a village:
• All currently pregnant women in the village form
the "Population"
• A selected small set of pregnant women (from the
same village) will form the "Sample"
E.g. 3. If we wish to know the birth weight of newborns
in a clinic:
• All babies born in the clinic form the "Population"
• Selected newborn babies form the "Sample"
44. Population vs Sample
• Note that the results of a sample are of interest only if they tell
us about the population from which the sample is drawn
• Intuitively, the bigger the Sample, the more confident we
are about the applicability of results to the population
• The more the Sample is representative of the Population
(Total), the more confident we are
• So for a given investigation, clear definition of Population &
Sample and careful selection of the Sample are very
important
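The idea of studying a part and projecting to the total can be sketched with a small simulation. The population below is artificial (10,000 children, 60% with an erupted first molar, a figure made up for illustration), and samples are drawn by simple random sampling with Python's standard `random` module.

```python
import random

def sample_proportion(population, n, rng):
    """Proportion of True values in a simple random sample of size n."""
    sample = rng.sample(population, n)
    return sum(sample) / n

# Artificial population: True = first molar erupted (assumed 60%).
population = [True] * 6000 + [False] * 4000
true_proportion = sum(population) / len(population)

rng = random.Random(42)  # fixed seed so the run is repeatable
print(f"population proportion: {true_proportion:.3f}")
for n in (25, 100, 400, 1600):
    estimate = sample_proportion(population, n, rng)
    print(f"sample of {n:>4}: estimated proportion {estimate:.3f}")
```

Larger random samples tend to give estimates closer to the population value, but a sample that is not representative of the population can mislead regardless of its size.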
45. In the investigation on molar eruption in primary
school children, the characteristic of interest is
whether a child has an erupted molar (Yes/No)
Similarly, in the birth weights example, the
characteristic of interest is the birth weight of the
newborns (800 g to 4500 g)
Such characteristics, which are likely to vary from
person to person, are called "Variables"
Because the characteristic is likely to vary from person
to person, we have to study it based on a selected
group of persons (Sample)
46. If the characteristic is the same for all individuals in the
"Population", there is no need to study it. Just knowing the
status of one will tell all. In other words, such a
characteristic is not a "Variable"
Molar eruption status can take 2 values: Yes/No
Anemic status of mothers can take 2 values: Yes/No
Birth weight of newborns can take any value within
some range (say 800 to 4500 g)
The variable blood group can take the values A, B,
AB or O
Though all of these vary, notice that the types of possible
values are different
47. Variables that can take only a few values, and that are not
measurable but are only attributes or qualities, are called
"Qualitative Variables"
In our examples, molar eruption status, anemic
status and blood group are "Qualitative Variables"
Birth weight of newborns is quantifiable and can take
a wide range of possible values, say between 800 and
4500 g. This characteristic can be measured and
has units of measurement. Variables of this type
are called "Quantitative Variables"
48. Qualitative Variables
Since Qualitative variables can take only a few values or categories,
they are also called Categorical variables
For example, any individual can have either A or B or AB or O
blood group, but no in-between values
Examples: Race, smoking status, gender etc.
Types: Nominal and Ordinal variables
49. Quantitative Variables
Quantitative variables, as the name indicates, can take
any value on a continuous spectrum.
For convenience we measure them in some units
(rounding off), however fine those units are (height in
metres, cm, mm etc.)
But the actual values can lie anywhere on a continuous spectrum.
So they are also called Continuous variables
Examples: Blood sugar, serum creatinine, age, income,
height, weight etc.