statistics introduction.ppt

Introduction
• In all walks of life, especially health care, we are under
constant threat of being overwhelmed by data. But unless we
are going to practice on the basis of hunches, we need these
data to make good decisions and so optimize our treatment or
policy plans. So, our first objective must be to understand a
little about data.
• Data do not arise by magic; data collection is determined by a
human, with all his or her prejudices and proneness to error.
We rely on instruments, such as questionnaires, mercury
barometers and spectrophotometers, to measure and record
variables of interest, such as people’s ages or blood pressures,
or amounts of chloro-fluorocarbons in the atmosphere. But
all instruments have limitations on their accuracy.

Introduction (contd..)
• Humans tend to accept a body of raw data at face
value; a computer print-out of numbers can appear
to be an unalterable truth. It may, in fact, be junk.
Subsequent statistical processing will not recycle junk
into truth (the “garbage in, garbage out” principle).
• The next problem is: how do we extract meaningful
information from masses of raw data? (How do we
see the forest, when the trees keep getting in the
way?)

Definition
• Statistics
– A science that deals with principles & methods for
the collection, presentation, analysis &
interpretation of data
– A science that deals with the study of aggregates
or totals or ‘populations’
• Bio-Statistics
– Statistical methods used in the fields of Health,
Biology, Medicine, Public Health

Statistical thinking would one
day be as necessary a
qualification for good citizenship
as the ability to read and write
H.G. WELLS, 1903

Data and variables
• A variable is a characteristic of a population which can take
different values. The population might be a human
population; variables of interest in this population might be
age, income, number of children and education level. One
could envisage a population of teaching hospitals and perhaps
be interested in such variables as annual budget, number of
medical students and number of X-rays taken per year.
• The Health Commission might be interested in a population of
small chemical processing plants and wish to measure
variables related to the health of employees – for example,
fresh air recirculation times, or number of emergency dousing
showers per factory.

Data and variables (contd..)
• Data are measurements collected on a variable as a result of
taking observations.
• Often, data will have associated units of measurement.
• Most often, due to time and resource constraints, we are
dealing with data collected on a subset or sample of a
population.
• Data may be classified as being discrete if the variable can
take only a countable set of distinct values (for example,
number of children) or continuous if the
variable can (at least within a certain range) take any value
along the number line, for example, height, plasma
cholesterol level, blood pressure.

Measurement Scales
• Depending on the nature of the variable, we have different
measurement scales.
• Firstly, not all data are numbers, though they are often
represented as numbers. For example, we might choose to code males as “1” and
females as “2”, especially when entering the data into a
computer. In this example, the numbers 1 and 2 have none of
the usual properties of numbers.
• Two observations are equal if both are female (or both male)
and non-equal otherwise. Such a scale is called nominal or
categorical, and because of this lack of mathematical
properties it is called the “weakest” scale.

Measurement Scales (contd..)
• A slightly stronger scale, one that uses the mathematical
notion of ordering is the ordinal scale. An example of a
variable measured on an ordinal scale might be the position
of an examination candidate in the results order of merit. The
number of the ranked position tells us nothing about how
much difference exists between successive positions on the
scale.
• Interval scale: It possesses strong mathematical properties
due to the fact that in this scale equal differences between
points represent equal differences in the measured quantity.
For example, the difference between 12 metres and 11
metres is the same as the difference between 4 metres and 3
metres.

Measurement Scales (contd..)
• The ratio scale is considered a refinement of the interval
scale. In this scale, the order and size of interval are
important, but the ratio between two measures also has
meaning. This occurs when there is a true zero point
associated with the scale.
• For example,
– temperature in degrees Celsius (interval scale)
– temperature in kelvin (ratio scale).
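The difference between the two temperature scales can be checked with a few lines of arithmetic. The sketch below is only an illustration: the two temperatures are hypothetical, and `celsius_to_kelvin` is a helper name introduced here. Differences agree on both scales, but the ratio is meaningful only on the kelvin scale, which has a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15

# Hypothetical temperatures, chosen only for illustration.
a_celsius, b_celsius = 20.0, 10.0
a_kelvin = celsius_to_kelvin(a_celsius)
b_kelvin = celsius_to_kelvin(b_celsius)

# Differences are the same on both scales (interval property).
diff_celsius = a_celsius - b_celsius
diff_kelvin = a_kelvin - b_kelvin

# Ratios differ: "twice as hot" in Celsius is an artefact of the
# arbitrary zero point; only the kelvin ratio is physically meaningful.
ratio_celsius = a_celsius / b_celsius
ratio_kelvin = a_kelvin / b_kelvin

print(f"difference: {diff_celsius:.2f} (Celsius), {diff_kelvin:.2f} (kelvin)")
print(f"ratio:      {ratio_celsius:.3f} (Celsius), {ratio_kelvin:.3f} (kelvin)")
```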

Simplification of data
• The stronger measures can always be “collapsed” down to form a
weaker measure but not vice versa. Eg. the heights of students can be
collapsed into “tall” and “short”.
• When data are simplified or summarised, information is always lost, but
understanding may be gained.
• Here, if we measure heights on enough people, we may be
overwhelmed by the number of individual measurements, but being
able to say that x% are tall and (100 - x)% are short may give a useful
insight into this aspect of the population.
• Of course, using this nominal scale of measuring height we are no
longer able to say what an individual’s actual height is.
• The reason we stress scales of measure is that the information
content of data depends on the scale, and different descriptive
techniques and different statistical tests are appropriate to different
scales.
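Collapsing heights into “tall” and “short” can be sketched as follows. The heights and the 170 cm cut-off are assumptions made up for this illustration; they do not come from any real data set.

```python
# Hypothetical heights in centimetres (ratio scale).
heights_cm = [150, 158, 162, 167, 171, 175, 180, 186]

# Arbitrary cut-off, assumed only for this example.
TALL_CUTOFF_CM = 170

def collapse(height_cm, cutoff=TALL_CUTOFF_CM):
    """Reduce a measured height to a nominal category."""
    return "tall" if height_cm >= cutoff else "short"

labels = [collapse(h) for h in heights_cm]

# Summary on the weaker scale: x% tall and (100 - x)% short.
percent_tall = 100 * labels.count("tall") / len(labels)
percent_short = 100 - percent_tall

print(f"{percent_tall:.1f}% tall, {percent_short:.1f}% short")
```

Once only `labels` is kept, the individual heights cannot be recovered, which is the loss of information described above.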

Descriptive statistics
• Descriptive statistics includes methods for presenting
and summarising data.
• These allow us to digest and understand large
quantities of data, and to effectively communicate to
others important aspects of our research.

Descriptive statistics
Frequency Distributions and Data Presentation
• If you arrange your raw data so that the scores
on a variable of interest are in order of
magnitude, that is, you rank the data, and
then indicate by means of a table or graph
how often a score occurs, then you will have
constructed a frequency distribution – a tally
of the scores.

Table
grams/day    frequency    relative frequency
0-9          125          125/1000 = 0.125
10-19        250          250/1000 = 0.250
20-29        400          400/1000 = 0.400
30-39        150          150/1000 = 0.150
40-59         50           50/1000 = 0.050
60-99         25           25/1000 = 0.025
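The relative frequencies in the table are each class frequency divided by the total number of observations. A minimal sketch of that calculation, using the class counts from the table (the variable names are introduced here for illustration):

```python
# Class intervals (grams/day) and frequencies taken from the table.
class_counts = [
    ("0-9", 125),
    ("10-19", 250),
    ("20-29", 400),
    ("30-39", 150),
    ("40-59", 50),
    ("60-99", 25),
]

# Total number of observations across all classes.
total = sum(count for _, count in class_counts)

# Relative frequency = class frequency / total.
relative = {label: count / total for label, count in class_counts}

for label, count in class_counts:
    print(f"{label:>6} {count:>5} {relative[label]:.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no class has been missed.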

Inferential statistics
• The inferential approach helps to decide whether the
outcome of the study is a result of factors planned
within design of the study or determined by chance.
• The two approaches are often used sequentially in
that first data are described with descriptive
statistics, and then additional statistical
manipulations are done to make inferences about
the likelihood that the outcome was
due to chance through inferential statistics.

Need for learning Biostatistical principles for
health/research workers
1. A knowledge of Statistics is required to understand the rationale
on which diagnostic, prognostic and therapeutic decisions are or
should be based
2. To appreciate that Medicine is highly dependent on concepts of
probability
3. Within their competence, health workers need to interpret
laboratory tests and bedside observations & measurements in
the light of a knowledge of physiological, observer and
instrument variation

Need for learning Biostatistics contd.
4. Health workers must know and understand the Statistical &
Epidemiological facts about the etiology and prognosis of the
diseases that they treat in order to give the best advice to their
patients about how to avoid or limit the effects of these
diseases
5. Health workers are the primary generators of the data on which
Health Statistics are based. Therefore they need to know how
data can be and should be used, both for the benefit of their
own practice and for the organization and delivery of health
care

Need for learning Biostatistics contd.
6. Health managers need to know how to interpret and draw
inferences from the indicators that describe health levels &
trends etc.
7. The study of Statistics helps to foster in students the critical and
deductive faculties that they need throughout their studies and
in their professional work throughout their careers
8. Knowledge of Statistics helps in understanding & evaluating
medical literature/ reports so that they keep abreast of
developments in their profession

Statistical applications in Health
• Normal values of a characteristic
• How to classify an individual as healthy
or sick?
• Needs treatment or not ?
• Based on the ‘normal’ values of certain
clinical, laboratory & other measurements
‘Normal’ here is a statistical concept and
depends on the distribution of the characteristic
in the ‘Population’

• Most often in Health research, we need to
know whether the level of a parameter is the
same between two or more groups
Eg. Is the birth weight the same in males & females?
Is it the same in babies born to educated &
uneducated mothers?
• We can answer this only by statistical means
(Testing of hypotheses)

• Sometimes, the interest could be the relation
between 2 variables – Eg. Weeks of gestation
& Birth weight of the newborn
• We wish to know whether a change in one
brings a change in the other
• Statistical methods that help in this are:
• Exploratory - Scatter plot
• Quantification – Correlation Coefficient
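As a rough illustration of the quantification step, the Pearson correlation coefficient can be computed directly from its definition. The gestation and birth weight values below are hypothetical, and `pearson_r` is a helper name introduced for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration only.
gestation_weeks = [34, 36, 37, 38, 39, 40]
birth_weight_g = [2100, 2500, 2800, 3000, 3200, 3400]

r = pearson_r(gestation_weeks, birth_weight_g)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relation, and a value near 0 indicates little or no linear relation.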

• Quantifying the influence of one factor on another
• How much birth weight can be gained if the delivery
can be postponed by a week?
• How likely is the new therapy to improve clinical outcomes?
• What is the risk of breast cancer in a woman, if her mother
had a history of the same? (Regression Analysis)
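For the birth weight question, one simple approach is a least-squares straight line, whose slope estimates the change in birth weight per extra week of gestation. This is a sketch under stated assumptions: the data are hypothetical, `fit_line` is a name introduced here, and a straight-line model is assumed. Questions about the risk of a yes/no outcome, such as breast cancer, need other regression models and are not covered by this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, made up for illustration only.
gestation_weeks = [34, 36, 37, 38, 39, 40]
birth_weight_g = [2100, 2500, 2800, 3000, 3200, 3400]

slope, intercept = fit_line(gestation_weeks, birth_weight_g)

# The slope is the estimated gain in grams per additional week.
print(f"estimated gain: {slope:.0f} g per week of gestation")
```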

If our interest is in seeing the agreement between two measurements of the same
characteristic (note: agreement ≠ correlation)
• BP measurements by a physician & a trained nurse
• Interpretation of slides by 2 pathologists
• Smoking status by interview & presence of cotinine in urine
(Agreement analysis methods)
• Exploratory plots
• Quantification – Kappa statistic
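The Kappa statistic compares the observed proportion of agreement with the agreement expected by chance. A minimal sketch of Cohen's kappa for two raters follows; the 2 x 2 table of slide readings is hypothetical and `cohen_kappa` is a helper name introduced here.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of subjects placed in category i by
    rater 1 and in category j by rater 2.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / total
    # Chance agreement from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical readings of 100 slides by 2 pathologists
# (rows: pathologist 1, columns: pathologist 2; positive, negative).
slides = [[40, 10],
          [5, 45]]

kappa = cohen_kappa(slides)
print(f"kappa = {kappa:.2f}")
```

Kappa equals 1 for perfect agreement and 0 when agreement is no better than chance.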

• Agreement between 2 diagnostic tests
Eg. Diagnosis of TB by sputum culture vs PCR
• Sensitivity
• Specificity
• Positive Predictive value (PV+)
• Negative predictive value (PV-)
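These four measures all come from the 2 x 2 table of test result against true disease status. The sketch below assumes one test (say sputum culture) is treated as the reference standard; the counts are hypothetical and `diagnostic_measures` is a name introduced for illustration.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Accuracy measures from true/false positives and negatives."""
    return {
        # Proportion of diseased subjects the test detects.
        "sensitivity": tp / (tp + fn),
        # Proportion of disease-free subjects the test clears.
        "specificity": tn / (tn + fp),
        # Proportion of positive results that are truly diseased (PV+).
        "ppv": tp / (tp + fp),
        # Proportion of negative results that are truly disease-free (PV-).
        "npv": tn / (tn + fn),
    }

# Hypothetical counts: new test compared against the reference standard.
measures = diagnostic_measures(tp=80, fp=10, fn=20, tn=90)

for name, value in measures.items():
    print(f"{name:<12} {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the group tested.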

• Development of new drugs and treatment modalities
• What is the tolerable dose of a new drug?
• What are the pharmaco-kinetics of the drug?
• What is the effect of new drug/treatment in treating a condition?
• How comparable is the new drug/treatment with the existing
drug(s)/ treatment(s)?
• Issues: Inclusion and exclusion criteria, confounding, bias, blinding,
randomization etc. (Clinical Trials – Therapeutic & Prophylactic)

• Ensuring maximum benefit of diagnosis and
treatment with minimum cost, based on
available resources
(Health Economics & Operational Research)

• Indices on population characteristics
• Birth rate
• Death rate
• Fertility rate
• Reproduction rate
• Infant mortality, maternal mortality etc.
(Vital statistics)

• Population dynamics
• Growth
• Rural – Urban components
• Migration
• Age composition
• Expectation of life at birth
(Demography)

• Estimating the magnitude of various diseases
• Distribution with respect to age, place, time
• Identifying the possible causative factors
• Principles of different study designs for different objectives
• Control of Confounding, Bias
• Role of any interactions
(Epidemiological methods)

• Estimating the potency and relative potency of
drugs
• What is the dose at which 50% of subjects
respond? (LD50, ED50)
• Eg. If the relative potency of a drug (A) compared
to another drug (B) is 1.5, then 1 unit of drug A is
equivalent to 1.5 units of drug B
(Biological Assays)

• Maintaining the quality of:
• Drugs
• Laboratory Instruments
• Surgical Instruments
(Quality Control analysis)

• Synthesis of data from separate but similar,
comparable studies to have a quantitative
summary of pooled results
• Aim is to integrate the findings, pool the data
and find overall trend of results
Meta-analysis (Systematic Review)

• Use of current best evidence derived from published
clinical & epidemiological research in management of
patients, with due attention to:
• Balance of risks & benefits of diagnostic tests
• Alternative treatment regimens
• Each patient’s unique circumstances, including baseline
risk, co-morbidities & personal preferences
(Evidence Based Medicine)

Application of best available evidence in setting
public health policies and practices:
Evidence may be from epidemiologic,
demographic, sociologic, economic etc. sources
Implementation of public health policies,
programs & practices requires good evidence on
feasibility, efficacy, efficiency, cost, acceptability
to the target population
(Evidence Based Public Health)

Whatever the branch of Statistical
applications:
• What sample size is needed to arrive at a valid
conclusion?
• What is the role of chance in the observed
findings?
• Is the observed result due to some other factor?
• Is there any inter-play of different factors?
• Can the observed result be generalized?

Role of statistics in Health Sciences &
Health Care Delivery
• Statistical methods are applied consciously or
subconsciously in health care delivery at the
community and individual patient levels
• At the Community level:
– Monitor & assess the health situation & trends
– Predict the likely outcome of an intervention
• At individual patient level:
– To arrive at the most likely diagnosis
– To predict the prognostic course
– To evaluate the relative efficacy of different modes
of treatment

• Practical convenience dictates that we study a set
of items or individuals from a larger aggregate or
population about which we wish to know
• Eg. What proportion of primary school children in
Delhi have their first molar tooth erupted ?
• Study all primary school going children and count
how many had the first molar erupted
- Needs enormous time, resources
- May not be necessary also

• Alternatively, a subset or a portion of the total
primary school children can be studied and
the results observed in the small set can be
projected to the total children
• The ‘Total’ or ‘Aggregate’ we are talking about
is called the ‘Population’ and the part or
subset is referred to as the ‘Sample’
• Thus the essence of all Statistics is:
• Study only a portion and project the results to
a target total

• In the process of studying only a part and
guessing about the total:
– Part or subset is called ‘Sample’
– The target of our investigation or the
total/aggregate is called the ‘Population’
– In the molar eruption example:
• All primary school going children (in Delhi) are our
‘Population’
• Selected subset of primary school going children forms
our ‘Sample’

E.g. 2. If we wish to know the proportion of anemic
pregnant women in a village:
• All currently pregnant women in the village form
the ‘Population’
• Selected small set of pregnant women (from the
same village) will form the ‘Sample’
E.g. 3. If we wish to know the birth weight of newborns
in a clinic:
• All babies born in the clinic form the ‘Population’
• Selected newborn babies form the ‘Sample’

Population vs Sample
• Note that results of sample are of interest only if they tell
us about the population from which the sample is drawn
• Intuitively, the bigger the Sample, the more confident we
are about the applicability of results to the population
• The more the Sample is representative of the Population
(Total), the more confident we are
• So for a given investigation, clear definition of Population &
Sample and careful selection of the Sample are very
important
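The intuition that a bigger Sample gives more confidence can be made concrete with the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch assumes simple random sampling from a large population; the proportion 0.3 is hypothetical and `standard_error` is a helper name introduced here.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion p based on n subjects,
    assuming simple random sampling from a large population."""
    return sqrt(p * (1 - p) / n)

# Hypothetical true proportion, for illustration only.
p = 0.3

for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(p, n):.4f}")
```

Quadrupling the sample size halves the standard error, so precision improves with sample size but with diminishing returns. Sample size does not fix a Sample that is unrepresentative of the Population.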

In the investigation on molar eruption in primary
school children, the characteristic of interest is
whether a child has an erupted molar (Yes/ No)
Similarly, in the birth weights example, the
characteristic of interest is the birth weight of the
newborns (800 gms – 4500 gms)
Such characteristics which are likely to vary from
person to person are called ‘Variables’
Because the characteristic is likely to vary from person
to person we have to study it based on a selected
group of persons (Sample)

If the characteristic is same for all individuals in the
‘Population’, there is no need to study. Just knowing the
status of one will tell all. In other words, such a
characteristic is not a ‘Variable’
Molar eruption status can take 2 values: Yes/No
Anemic status of mothers can take 2 values: Yes/No
Birth weight of newborns can take any value within
some range (say 800 – 4500 gms)
The variable Blood group can take possible values as A, B,
AB or O
Though they all vary, notice that the types of possible values
are different

Variables that can take only a few values, and that are not
measurable but are only attributes or qualities, are called
‘Qualitative Variables’
In our examples molar eruption status, anemic
status, Blood group are ‘Qualitative Variables’
Birth weight of newborns is quantifiable and can take
a wide range of possible values between say 800-
4500 gms. This characteristic can be measured and
has some units of measurement. Variables of this type
are called ‘Quantitative Variables’

Qualitative Variables
Since Qualitative variables can take only a few values or categories,
they are also called Categorical variables
For example, any individual can have either A or B or AB or O
blood group, but no in between values
Examples: Race, smoking status, gender etc.
Nominal, Ordinal Variables

Quantitative Variables
Quantitative variables, as the name indicates, take numerical values.
Many of them can take any value on a continuous spectrum.
For convenience sake we measure them in some units
(rounding off) – however fine those units are (height in
metres, cms, mm etc.)
But the actual values can lie on a continuous spectrum.
So such variables are also called Continuous variables
Examples: Blood sugar, serum creatinine, age, income,
height, weight etc.

Variables
• Qualitative
– Nominal (Eg. Blood group)
– Ordinal (Eg. Stage of disease)
• Quantitative
– Discrete (Eg. Viral load)
– Continuous (Eg. Temperature)

So for any given investigation we
should be very clear about:
• Target population
• Sample
• Variable(s) under study
