CONTENTS
Measures of dispersion
Normal distribution curve
Probability or p value
Tests of significance
Parametric
Non-parametric test
One tailed
Two tailed
Conclusion
References
INTRODUCTION
Branch of statistics applied to biological or medical sciences
Statistics
Italian word statista – statesman
German word statistik – political state
Science of statistics originated from
Government records
Mathematics
Father of health statistics –
JOHN GRAUNT (1620-1674)
Biostatistics
Bio – the part involving biology
Statistics – involves accumulation, tracking,
analysis, and application of data
Biostatistics is the method of collection,
organization, analysis, tabulation and
interpretation of data related to living
organisms and human beings
TERMINOLOGY
Constant
Quantities that do not vary, e.g. in biostatistics the mean and
standard deviation are considered constant for a population
Variable
A name denoting a condition, occurrence, or effect that can
assume different values is a variable.
Population
Population includes all persons, events and objects under study. It
may be finite or infinite.
Sample
Defined as a part of a population generally selected so as to be
representative of the population whose variables are under study
Individual entities that form the focus of the study are termed sampling units.
The list of sampling units is known as the sampling list or sampling frame.
Parameter
It is a constant that describes a population, e.g. in a college
80% of the students are girls, or the average age of dental patients in 2010.
Such a value describes the population, hence it is a parameter.
Statistic
A statistic is a constant that describes a sample, e.g. in a sample of
100 students from the same college, 80% are girls. This 80% is a
statistic, as it describes the sample
DATA
A collective recording of observations, either numerical or
otherwise, is called data
Observations can be collected by recording
the number of patients who have a removable prosthesis
or the age or gender of the patient
In each case, an observation is made of a
characteristic that varies from person to person; this is
called a variable
QUALITATIVE DATA
Data collected on the basis of an attribute or quality, such as sex, type of occlusion
or presence of a cavity.
In such data there is no notion of magnitude or size of an attribute, as
it cannot be measured.
What varies is the number of persons having the same attribute, and this
is counted
E.g. out of 100 people, 75 have Class I occlusion, 15 have Class II occlusion and
10 have Class III occlusion.
Class I, II and III are attributes that cannot be measured in figures; only the number
of people having each can be determined
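Since qualitative data can only be counted, they are usually summarized as the number and percentage of persons with each attribute. A minimal Python sketch of that summary for the occlusion example; the list of 100 individual records is constructed purely to reproduce the stated counts and is not real data.

```python
from collections import Counter

# Hypothetical individual records reproducing the example:
# 75 Class I, 15 Class II and 10 Class III occlusions among 100 people.
occlusion = ["Class I"] * 75 + ["Class II"] * 15 + ["Class III"] * 10

counts = Counter(occlusion)  # number of people having each attribute
total = sum(counts.values())
percentages = {cls: 100 * n / total for cls, n in counts.items()}

for cls in ("Class I", "Class II", "Class III"):
    print(f"{cls}: {counts[cls]} people ({percentages[cls]:.0f}%)")
```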
QUANTITATIVE DATA
Data collected through measurement, such as arch length and width measured
with calipers, or fluoride concentration in the water supply.
Here the attribute has a magnitude; both the value of the attribute and the
number of persons having it vary.
It may be
Continuous – the variable can take any value in a given range, including decimals or fractions.
Discrete – the variable under observation takes only fixed values such as whole numbers (e.g. DMF teeth)
E.g. freeway space varies for every patient. It is a quantity with a different
value for each individual and is measurable. It is continuous, as it can take any
value between 2 and 4 mm, such as 2.10, 2.55 or 3.07 mm.
Source of data
The main sources for collection of data are
Experiments
Surveys
Records
Experiments
Experiments are performed to collect data for
investigations and research by one or more
workers.
Records
Records maintained routinely in registers
and books over a long period of time
provide ready-made data.
Surveys
Carried out for epidemiological studies in the field
by trained teams to find the incidence or prevalence of
health or disease in a community.
These epidemiological studies can be either
descriptive or analytical
Descriptive epidemiological study
Descriptive epidemiological studies measure a disease in terms of
magnitude, using
Cross-sectional studies
Longitudinal studies
Incidence of a disease is obtained from a longitudinal
study
Prevalence of a disease is obtained from a cross-sectional
study
Cross-sectional study
Exposure and effect are measured at the same time, so we
get the relationship between a disease and other variables
of interest as they exist at one point in time
E.g., in a cross-sectional study of oral cancer, data on age,
sex, occupation, habits and tobacco usage can also be
collected during the same survey
Longitudinal study
A study conducted over a long period of time is a longitudinal study
It is done on samples drawn from the population, with
observations made at periodic intervals
Longitudinal studies are useful
For studying the natural history of disease and its outcome
For identifying risk factors associated with disease
For calculating the incidence rate of disease.
Analytical epidemiological study
A cause of disease may be an event, condition or
characteristic, or a combination of these factors, that plays
an important role in producing the disease.
Before establishing a factor as a cause, several observations
have to be made on the so-called ‘exposure’.
These observations make up the procedure of analytical
epidemiology
Cohort study
The approach of beginning with exposure and searching for effects
prospectively in time is referred to as a cohort study
It is an observational study that examines the relationship between
a purported cause and the subsequent risk of developing disease.
Distinguishing features of a cohort study
The group of persons to be studied is defined in terms of characteristics
manifest prior to the appearance of the disease under investigation
The study groups are observed over a period of time to determine the
frequency of disease among them
Case-control study
Beginning with disease and searching for causes in the
past is referred to as a case-control study.
This is done alongside another group of individuals who
do not have the condition, called the controls
Case-control studies are primarily used to assess risks
and to study causes of diseases.
SAMPLING DESIGN
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multiphase sampling
Pathfinder surveys
SIMPLE RANDOM SAMPLING
Each and every unit in the population has an equal
chance of being included in the sample.
Selection of units is determined by chance.
To ensure randomness, either of these methods
can be chosen
Lottery method
Table of random numbers
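In practice a computer's pseudo-random number generator can stand in for the lottery method or a table of random numbers. A minimal Python sketch, assuming a hypothetical sampling frame of 500 patient record numbers; `simple_random_sample` is an illustrative helper, and `random.Random(seed).sample` draws without replacement so every unit has the same chance of selection.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: record numbers 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

Fixing the seed makes the draw reproducible for checking; leaving it out gives a fresh draw each time.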
SYSTEMATIC RANDOM SAMPLING
One unit is selected at random, and additional units are
then selected at evenly spaced intervals until a sample
of the required size has been formed
This method is used when a complete list of the
population is available
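The selection rule can be sketched in Python as follows, assuming a hypothetical list of 100 patients and a sampling interval k = N / n (population size divided by sample size, rounded down), which is the usual choice but is not stated in the text. `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th unit."""
    k = len(frame) // n                       # sampling interval (assumes n <= len(frame))
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return frame[start::k][:n]                # n evenly spaced units

frame = [f"P{i:03d}" for i in range(1, 101)]  # hypothetical list of 100 patients
chosen = systematic_sample(frame, 10, seed=1)
print(chosen)
```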
STRATIFIED RANDOM SAMPLING
The population to be sampled is subdivided into groups known as
strata, such that each group is homogeneous in the characteristic
under study
A simple random sample is chosen from each stratum. This type
of sampling is used when the population is heterogeneous with
regard to the characteristic under study.
E.g., to find the prevalence of DMF teeth in different age groups,
the age groups form the strata and a random sample is chosen
from each stratum, i.e. each age group
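A minimal sketch of drawing a simple random sample from each stratum, assuming hypothetical age-group strata of made-up sizes and an equal number of units per stratum. Proportional allocation (sample size per stratum proportional to stratum size) is a common alternative that this sketch does not implement.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical sampling frame already split into age-group strata
strata = {
    "under 15 years": [f"A{i}" for i in range(200)],
    "15-34 years": [f"B{i}" for i in range(150)],
    "35 years and above": [f"C{i}" for i in range(120)],
}
sample = stratified_sample(strata, 10, seed=7)
for group, units in sample.items():
    print(group, units)
```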
CLUSTER SAMPLING
This method is used when the population forms natural
groups or clusters, such as villages, wards or schools.
First a sample of clusters is selected and then all
units in each of the selected clusters are surveyed.
This method is simpler and involves less time and cost,
but gives a higher standard error.
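A sketch of the two steps (random choice of whole clusters, then every unit within them), assuming hypothetical villages as clusters; `cluster_sample` is an illustrative name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then survey every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # step 1: pick clusters
    return {name: clusters[name] for name in chosen}   # step 2: take all their units

# Hypothetical villages and their residents
villages = {
    "Village A": ["a1", "a2", "a3"],
    "Village B": ["b1", "b2"],
    "Village C": ["c1", "c2", "c3", "c4"],
    "Village D": ["d1", "d2", "d3"],
}
surveyed = cluster_sample(villages, 2, seed=3)
print(surveyed)
```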
MULTIPHASE SAMPLING
Part of the information is collected from the whole sample
and part from a sub-sample.
E.g., all children in a school are surveyed, and only those
with an oral health problem are selected in the second
phase. In the third phase, only those who need treatment
are selected. Thus by the third and fourth phases the
sub-samples become smaller and smaller
PATHFINDER SURVEY
Pathfinder surveys can be either pilot or national, depending on the
number and type of sampling sites and the age groups included.
A national pathfinder survey incorporates sufficient examination sites
to cover all important subgroups of the population that may have
differing disease levels or treatment needs, and at least three of the
age groups or index ages
This type of survey design is suitable for collection of data for planning
and monitoring of services in all countries whatever the level of
disease, availability of resources, or complexity of services.
In a large country with many geographic and
population subdivisions and a complex service
structure, a larger number of sampling sites is
needed.
The basic principle of using index ages and
standard samples in each site within a
stratified approach, however, remains valid.
SAMPLE SIZE
The bigger the sample, the higher the precision of the
estimates obtained from it.
E.g., if a field survey is conducted to estimate the
prevalence rate of a disease, the sample size is
calculated by the formula
$n = \dfrac{z_{\alpha}^{2}\, p\,(1-p)}{L^{2}}$
n – sample size
p – approximate prevalence rate of the disease
L – permissible error in the estimate of p
zα – standard normal value for the chosen probability level (1.96 for 95% confidence)
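The formula can be evaluated directly. A minimal Python sketch, assuming p and L are both expressed as proportions (L as an absolute error) and z = 1.96 for 95% confidence; the 30% prevalence and 5-point error are hypothetical inputs, and `prevalence_sample_size` is an illustrative name. The result is rounded up because a fraction of a subject cannot be examined.

```python
import math

def prevalence_sample_size(p, L, z=1.96):
    """n = z^2 * p * (1 - p) / L^2, rounded up to the next whole subject.

    p : anticipated prevalence (proportion between 0 and 1)
    L : permissible absolute error in estimating p (same scale as p)
    z : standard normal value; 1.96 corresponds to 95% confidence
    """
    return math.ceil(z ** 2 * p * (1 - p) / L ** 2)

# Hypothetical survey: expected prevalence 30%, error of +/- 5 percentage points
print(prevalence_sample_size(0.30, 0.05))  # 323
```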
SAMPLING ERRORS
There are 2 types of errors
Sampling error – due to the sampling process; it can
arise because of
Faulty sampling design
Small sample size
Non-sampling error
Coverage error – due to non-response or non-cooperation
of the informant
Observational error – due to interviewer bias, imperfect
experimental technique, or an interaction of both
Processing error – due to errors in statistical analysis
Data presentation
Statistical data, once collected, should be systematically
arranged and presented
To arouse interest of readers
For data reduction
To bring out important points clearly and strikingly
For easy grasp and meaningful conclusions
To facilitate further analysis
To facilitate communication
The two main types of data presentation are
Tabulation
Graphic representation with charts and
diagrams
TABULATION
It is the most common method
Data are presented in the form of columns
and rows
It can be of the following types
Simple tables
Frequency distribution tables
Frequency distribution table
In a frequency distribution table, the data are first
split into convenient groups (class intervals) and
the number of items (frequency) that occurs in
each group is shown in an adjacent column.
NO. OF CAVITIES    NO. OF PATIENTS
0-3                100
3-6                67
6-9                32
9 & ABOVE          20
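Building such a table from raw counts is a simple grouping step. A Python sketch with hypothetical cavity counts for 12 patients (not the data behind the table); because the class limits in the table overlap (0-3, 3-6), the sketch assumes each class includes its lower limit and excludes its upper limit. `frequency_table` is an illustrative name.

```python
from collections import Counter

def frequency_table(values, width=3, open_from=9):
    """Count observations in classes of the given width plus an open top class.

    Assumes each class includes its lower limit and excludes its upper limit,
    so a patient with exactly 3 cavities falls in 3-6.
    """
    table = Counter()
    for v in values:
        if v >= open_from:
            label = f"{open_from} & above"
        else:
            lo = (v // width) * width
            label = f"{lo}-{lo + width}"
        table[label] += 1
    return table

# Hypothetical raw data: number of cavities in 12 patients
cavities = [0, 1, 2, 2, 4, 5, 3, 7, 8, 6, 9, 11]
table = frequency_table(cavities)
for label in ("0-3", "3-6", "6-9", "9 & above"):
    print(f"{label:<10}{table[label]}")
```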
TYPES OF CHARTS AND DIAGRAMS
Bar chart
Histogram
Frequency polygon
Line diagram
Pie diagram
Spot map or map diagram or cartograms
BAR CHART
The length of the bars, drawn vertically or horizontally, is proportional to
the frequency of the variable.
Used to represent qualitative data
A suitable scale is chosen
Bars are usually equally spaced
They are of three types
Simple bar
Multiple bar
Component bar
SIMPLE BAR CHART
Represents only one variable
[Figure: simple bar chart of NO OF CD PATIENTS per quarter (JAN-MARCH, APRIL-JUNE, JULY-SEP, OCT-DEC); y-axis 0 to 600]
MULTIPLE BAR CHART
Two or more variables are grouped together
[Figure: multiple bar chart of CD PATIENTS, RPD PATIENTS and FPD PATIENTS per quarter (JAN-MARCH to OCT-DEC); y-axis 0 to 1000]
COMPONENT BAR CHART
Bars are divided into two or more parts,
each part representing a certain item and
proportional to the magnitude of that item
[Figure: component bar chart of PATIENTS TO PROSTHODONTICS and PATIENTS TO OTHER DEPARTMENTS per quarter (JAN-MARCH to OCT-DEC); y-axis 0 to 8000]
HISTOGRAM
Pictorial presentation of a frequency distribution
Used to depict quantitative data of continuous type
Consists of a series of adjacent rectangles
Class intervals are given on the horizontal axis and frequencies on the vertical axis
The area of each rectangle is proportional to the frequency
[Figure: histogram of Number of carious lesions, class intervals 0 to 3, 3 to 6, 6 to 9, 9 to 12, 12 to 15, 15 to 18, 18 to 21, 21 to 24, 24 to 27; bar heights 75, 45, 40, 32, 43, 22, 34, 29, 38; y-axis 0 to 80]
FREQUENCY POLYGON
Represents the frequency distribution of
quantitative data.
Obtained by joining the midpoints of the tops of the histogram
blocks (at the height of each frequency) with straight
lines, usually forming a polygon
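The polygon's vertices can be computed from the class midpoints. A Python sketch, assuming the bar heights shown in the histogram figure (75, 45, 40, 32, 43, 22, 34, 29, 38) belong to the class intervals 0 to 3 through 24 to 27 in that order.

```python
# Class intervals 0-3, 3-6, ..., 24-27 and the frequencies read from the
# histogram (assumed to be listed in the same order as the intervals).
intervals = [(lo, lo + 3) for lo in range(0, 27, 3)]
frequencies = [75, 45, 40, 32, 43, 22, 34, 29, 38]

# Each vertex sits at the midpoint of a class interval,
# at the height of that class's frequency.
vertices = [((lo + hi) / 2, f) for (lo, hi), f in zip(intervals, frequencies)]

for x, y in vertices:
    print(f"({x}, {y})")
```

Plotting these points and joining them with straight lines gives the frequency polygon; the ends are often closed down to the horizontal axis at the midpoints of the empty classes on either side.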
LINE DIAGRAM
Line diagrams are used to show the trend of
events with the passage of time
[Figure: line diagram of NO OF IMPLANTS PLACED IN 2010, by month (JAN to DEC); y-axis 0 to 16]
PIE CHART
In this, the frequencies of the groups are shown as segments of a
circle
Used to represent qualitative data
The angle of each segment denotes the frequency
The angle is calculated by
Angle (degrees) = No. of observations in the specific group × 360 / Total
observations in all groups
[Figure: pie chart with segments for prostho, cons, perio, ortho and pedo]
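A short Python sketch of the angle formula, applied to the occlusion example from the qualitative data section (75, 15 and 10 of 100 people); `pie_angles` is an illustrative helper. The angles always add up to 360°.

```python
def pie_angles(counts):
    """Angle of each segment = observations in the group x 360 / total observations."""
    total = sum(counts.values())
    return {group: n * 360 / total for group, n in counts.items()}

angles = pie_angles({"Class I": 75, "Class II": 15, "Class III": 10})
print(angles)  # {'Class I': 270.0, 'Class II': 54.0, 'Class III': 36.0}
```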
CARTOGRAMS
Spot map or map diagram
These maps are prepared to show the
geographic distribution of frequencies of
characteristics
MEASURES OF STATISTICAL AVERAGES OR CENTRAL TENDENCY
The average value in a distribution is the one central value
around which all the other observations are
concentrated
The average value helps
to find the most characteristic value of a set of measurements
to find which group is better off by comparing the average of
one group with that of the other
The most commonly used averages are
Mean
Median
Mode
Objectives of central tendency
• To condense entire mass of data
• To facilitate comparison
A good measure of central tendency
o Should be easy to understand and compute
o Should be based on each and every item in the series
o Should not be affected by extreme variations
o Should be capable of further statistical computations
o Should have sampling stability
MEAN
Refers to the arithmetic mean
It is the sum of all the observations
divided by the total number of observations (n)
Denoted by x̄ for a sample and µ for a population
$\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
Advantage – it is easy to calculate
Disadvantage – it is influenced by extreme values
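A minimal Python sketch of the formula, using hypothetical measurements in mm; adding a single extreme value shows the stated disadvantage. `arithmetic_mean` is an illustrative name.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations (n)."""
    return sum(values) / len(values)

depths = [2, 3, 3, 4, 3]               # hypothetical measurements (mm)
print(arithmetic_mean(depths))         # 3.0
print(arithmetic_mean(depths + [15]))  # 5.0: one extreme value pulls the mean up
```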
MEDIAN
When all the observations are arranged either
in ascending or descending order, the
middle observation is known as the median
If the number of observations is even, the average of the
two middle values is taken
The median is a better indicator of the central value when
extreme values are present, as it is not affected by them
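A Python sketch of the median rule for odd and even numbers of observations, using hypothetical values; the second call adds an extreme value of 15 and the median does not move. Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 3, 3, 4, 3]))      # 3   (odd n: the middle value)
print(median([2, 3, 3, 4, 3, 15]))  # 3.0 (even n: average of 3 and 3)
```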
MODE
The most frequently occurring observation in a
set of data is called the mode
Not often used in medical statistics.
Example
Number of decayed teeth in 10 children
2,2,4,1,3,0,10,2,3,
Mode = 2 (occurs 3 times)
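A Python sketch that counts frequencies with `collections.Counter` and reports every most-frequent value, since a data set can have more than one mode. It uses the nine values exactly as printed in the example, although the heading mentions 10 children; `modes` is an illustrative name.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, and that frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3]  # values as listed in the example
print(modes(decayed))                    # ([2], 3)
```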
BIOLOGICAL VARIABILITY
It is the natural difference that occurs between
individuals due to age, gender and other
inherent attributes
This difference is small, occurs by
chance and is within certain accepted
biological limits
e.g. vertical dimension may vary from patient
to patient
REAL VARIABILITY
Such variability is greater than the normal
biological limits
The cause of the difference is not inherent or
natural but is due to some external factor
e.g. the difference in incidence of cancer between
smokers and non-smokers may be due to
excessive smoking and not due to chance
alone
EXPERIMENTAL VARIABILITY
It occurs due to the experimental study
It is of three types
Observer error
The investigator may alter some information or not record the
measurement correctly
Instrumental error
This is due to defects in the measuring instrument
Both observer and instrumental errors are called non-sampling
errors
Sampling error or error of bias
This is the error that occurs when the samples are not chosen at
random from the population.
Thus the sample does not truly represent the population
Measures of variation or dispersion
Biological data collected by measurement
show variation
Dispersion – the degree of spread or variation of a
variable about a central value.
e.g. the BP of an individual can show variation even
if taken by a standardized method and measured
by the same person.
Thus one should know what the normal
variation is and how to measure it.
Measures of dispersion are mainly used
To determine the reliability of an average
To serve as a basis for control of variability
To compare two or more series in relation to
their variability
To facilitate further statistical analysis
The various measures of variation or
dispersion are
Range
Mean or average deviation
Standard deviation
Co-efficient of variation
RANGE
It is the simplest measure
Defined as the difference between the
highest and the lowest values in a sample
Defines the normal limits of a biological
characteristic, e.g. freeway space ranges
between 2 and 4 mm
Not satisfactory, as it is based on the two extreme
values only
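A one-function Python sketch, assuming hypothetical freeway-space readings in mm; note that the result depends only on the two extreme readings. `value_range` is an illustrative name.

```python
def value_range(values):
    """Difference between the highest and the lowest observation."""
    return max(values) - min(values)

freeway = [2.1, 2.55, 3.07, 3.6, 2.8]  # hypothetical readings (mm)
print(value_range(freeway))
```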
MEAN DEVIATION
It is the average of the differences or
deviations of the observations from the mean, ignoring the + or – sign
Denoted by MD
$MD = \dfrac{\sum \lvert X - \bar{x} \rvert}{n}$
X = observation
x̄ = mean
n = number of observations
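A Python sketch of the formula with hypothetical data: the mean of 2, 4, 4, 4, 5, 5, 7, 9 is 5, the absolute deviations sum to 12, so MD = 12 / 8 = 1.5. `mean_deviation` is an illustrative name.

```python
def mean_deviation(values):
    """Average absolute deviation from the mean: sum(|X - mean|) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

print(mean_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```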
STANDARD DEVIATION
Also called the root mean square deviation
It is an improvement over the mean deviation and the measure most commonly used
in statistical analysis
Denoted by SD or s for a sample and σ for a population
Given by the formula
$SD = \sqrt{\dfrac{\sum (X - \bar{x})^{2}}{n}}$, with n − 1 in place of n when the SD is estimated from a sample
The greater the standard deviation, the greater the magnitude of
dispersion from the mean
A small standard deviation means a high degree of uniformity of
the observations
Usually, measurements beyond the range of mean ± 2 SD are considered
rare or unusual in a normal distribution
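A Python sketch of the formula, with a switch between n (population σ) and n − 1 (sample s); the data are hypothetical, `standard_deviation` is an illustrative name, and the results agree with Python's `statistics.pstdev` and `statistics.stdev`.

```python
import math

def standard_deviation(values, sample=True):
    """Root mean square deviation from the mean.

    Divides by n - 1 for a sample (s) and by n for a whole population (sigma).
    """
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(squared / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, sample=False))  # 2.0
print(standard_deviation(data))                # slightly larger, about 2.14
```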
Uses of Standard Deviation
It summarizes the deviation of a large
distribution from its mean.
It helps in finding a suitable sample size, e.g. a
greater deviation indicates the need for a larger
sample to draw meaningful conclusions
It helps in the calculation of the standard error, which
helps us determine whether the difference
between two samples is due to chance or is real
COEFFICIENT OF VARIATION
It is used to compare attributes measured in two
different units, e.g. height
and weight
Denoted by CV
CV = SD × 100 / Mean
and is expressed as a percentage
The higher the CV, the greater the variation in the series of
data.
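A Python sketch comparing the variability of height and weight, which are in different units; the five patients' measurements are hypothetical, the sample SD is assumed, and `coefficient_of_variation` is an illustrative name.

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD x 100 / mean, expressed as a percentage (sample SD assumed)."""
    return statistics.stdev(values) * 100 / statistics.mean(values)

height = [160, 165, 170, 175, 180]  # hypothetical heights (cm)
weight = [55, 60, 70, 80, 85]       # hypothetical weights (kg) of the same patients
print(f"Height CV: {coefficient_of_variation(height):.1f}%")
print(f"Weight CV: {coefficient_of_variation(weight):.1f}%")
```

Here weight shows the larger CV, so it is the more variable attribute even though the raw SDs are in different units.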
Normal distribution or normal curve
Because so much physiological variation occurs in any
observation, it is
necessary
To define normal limits
To determine the chances of an observation being normal
To determine the proportion of observations that lie within a
given range
The normal distribution or normal curve, the distribution most commonly
used in statistics, helps us find these
A large number of observations grouped with a narrow class
interval gives a smooth frequency curve; for many biological measurements this curve is the normal curve
It has the following characteristics
Bell shaped
Bilaterally symmetrical
Frequency increases from one side, reaches its highest point and
decreases exactly the way it increased
The highest point denotes the mean, median and mode, which
coincide
The maximum number of observations is at the value of the variable
corresponding to the mean, and the number of observations
gradually decreases on either side, with few observations at
the extreme points
The area under the curve between any two points, which corresponds to the
number of observations between those two values of the variate, can be
found in terms of the relationship between the mean and the
standard deviation:
Mean ± 1 SD includes 68.27% of all observations. Such
observations are fairly common
Mean ± 2 SD includes 95.45% of all observations; by
convention, values beyond this range are uncommon or rare. The
chance of a value lying outside this range is 100 − 95.45 = 4.55%.
Mean ± 3 SD includes 99.73% of all observations. Values beyond this range are very rare;
the chance of a value lying outside it is only 0.27%
This relationship is used for fixing confidence intervals
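These percentages can be reproduced from the normal distribution itself. A Python sketch using the identity P(|Z| < k) = erf(k / √2) for a standard normal variable Z, with `math.erf` from the standard library; `proportion_within` is an illustrative name.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = 100 * proportion_within(k)
    print(f"mean +/- {k} SD: {inside:.2f}% inside, {100 - inside:.2f}% outside")
```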
These limits on either side of the mean are called confidence
limits
The shape of a frequency distribution curve may vary depending on the
mean and SD, so it becomes necessary to standardize it.
E.g., if one study has an SD of 3 and another an SD of 2, it becomes
difficult to compare them
The normal curve is therefore standardized by using the standard
deviation as the unit to place any measurement with reference to the mean.
The curve that emerges through this procedure is called the standard
normal curve
Relative or standard normal deviate
When a variable X follows a normal distribution
with mean x̄ and standard deviation S, the
relative or standard normal deviate Z is
given by
$Z = \dfrac{X - \bar{x}}{S}$, i.e. Z = (Observation − Mean) / SD
Values of Z for several values of X form a
normal distribution with mean 0 and SD 1
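A Python sketch of the Z formula, assuming hypothetical observations; converting every observation gives values whose mean is 0 and SD is 1. The population SD (`statistics.pstdev`) is used here, and `z_score` is an illustrative name.

```python
import statistics

def z_score(x, mean, sd):
    """Standard normal deviate: (observation - mean) / SD."""
    return (x - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
m, s = statistics.mean(data), statistics.pstdev(data)  # mean 5, SD 2
z = [z_score(x, m, s) for x in data]
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(z), statistics.pstdev(z))
```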
Probability or p value
Probability is the chance of occurrence of an
event or a combination of events.
It is denoted by p for a sample and P for a
population
In various tests of significance we are often
interested to know whether the observed
difference between 2 samples is due to chance
(sampling variation) or is real.
Here the probability or p value is used
P ranges from 0 to 1
0 = there is no chance that the observed difference
is due to sampling variation
1 = it is absolutely certain that the observed difference
between 2 samples is due to sampling variation
However, such extreme values are rare.
P = 0.4 means the chance that the difference is due to
sampling variation is 4 in 10
The chance that it is not due to sampling variation will be
6 in 10
The essence of any test of significance is to find the p
value and draw an inference
If the p value is 0.05 or more
it is customary to accept that the difference is due to chance
(sampling variation).
The observed difference is said to be statistically not significant.
If the p value is less than 0.05
the observed difference is taken to be not due to chance but due to
some external factor.
The observed difference here is said to be statistically
significant.
From the shape of the normal curve
We know that about 95% of observations lie within mean ± 2 SD. Thus
the probability of a value lying outside this range is about 5%
From probability tables
The p value is also determined from probability tables in the case of
Student's t test or the chi-square test
By the area under the normal curve
Here the standard normal deviate (z) is calculated
Corresponding to the z value, the area under the curve between the mean and z is
determined (A)
Probability is given by p = 2(0.5 − A)
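Instead of reading A from a normal table, it can be computed. A Python sketch, assuming A is the area between the mean and z (the convention under which p = 2(0.5 − A) holds); `area_mean_to_z` and `two_sided_p` are illustrative names. For z = 1.96 it gives p close to 0.05.

```python
import math

def area_mean_to_z(z):
    """Area A under the standard normal curve between the mean (0) and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def two_sided_p(z):
    """p = 2(0.5 - A): probability of a deviate at least this far out on either side."""
    return 2 * (0.5 - area_mean_to_z(z))

for z in (1.0, 1.96, 2.58):
    print(f"z = {z}: A = {area_mean_to_z(z):.4f}, p = {two_sided_p(z):.4f}")
```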
References
Soben Peter. Essentials of Preventive and Community Dentistry, 2nd edition.
G. N. Prabhakara. Biostatistics.
T. Bhaskara Rao. Methods of Biostatistics.
Parametric tests
Parametric tests are those tests in which certain assumptions are
made about the population
The population from which the sample is drawn has a normal distribution
The variances of the samples do not differ significantly
The observations are truly numerical, so arithmetic procedures
such as addition, division and multiplication can be used
Since these tests make assumptions about the population
parameters, they are called parametric tests.
They are usually used to test differences between groups
They include:
Student's t test (paired or unpaired)
ANOVA
Test of significance between two means
Non-parametric tests
In many biological investigations the research worker may not know the
nature of the distribution or other required values of the population.
Also, some biological measurements may not be true numerical values,
so arithmetic procedures are not possible.
In such cases distribution-free or non-parametric tests are used, in
which no assumptions are made about the population parameters, e.g.
Mann-Whitney test
Chi-square test
Phi coefficient test
Fisher's exact test
Sign test
Friedman's test
Two tailed test
This test determines whether there is a difference between the two groups without
specifying whether the difference is higher or lower
It includes both ends or tails of the normal distribution
Such a test is called a two tailed test
If the objective is to conclude whether or not 2 samples are from the same
population, without considering the direction of the difference between means,
a two tailed test is used
E.g., when one wants to know whether the mean IQ of malnourished children differs
from that of well-nourished children, without specifying whether it is more or less
One tailed test
In a test of significance, when one wants to know specifically whether the difference
between the two groups is higher or lower, i.e. the direction (plus or minus side)
is specified,
one tail of the distribution is excluded
If the objective is to conclude whether or not the mean of one sample is larger than
that of the other, a one tailed test is used
E.g., if one wants to know whether malnourished children have a lower mean IQ than
well-nourished children, the higher side of the distribution is excluded
Such a test of significance is called a one tailed test
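A Python sketch contrasting the two approaches for the same standard normal deviate, assuming a hypothetical z = −1.80 for the difference in mean IQ (malnourished minus well-nourished); the function names and the `alternative` labels are illustrative, not from any particular statistics package.

```python
import math

def upper_tail(z):
    """Area of the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value(z, alternative="two-sided"):
    """Two tailed: both extremes count. One tailed: only the specified direction."""
    if alternative == "two-sided":
        return 2 * upper_tail(abs(z))
    if alternative == "greater":
        return upper_tail(z)   # P(Z >= z)
    if alternative == "less":
        return upper_tail(-z)  # P(Z <= z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

z = -1.80  # hypothetical deviate: malnourished group scores lower
print(f"two tailed p = {p_value(z):.3f}")          # 0.072
print(f"one tailed p = {p_value(z, 'less'):.3f}")  # 0.036
```

With the conventional 0.05 cut-off, the same data would be significant under the one tailed test but not under the two tailed test, which is why the direction must be decided before the data are examined.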