Good
morning
BIOSTATISTICS
PRESENTED BY
R.PRIYA DARSHINI
1 ST YEAR M.D.S
DEPARTMENT OF
PROSTHODONTICS
CONTENTS
Introduction
Terminology
Data
Source of data
Collection of data
Sampling and sampling design
Sample size
Errors in sampling
Presentation of data
Measures of central tendency
CONTENTS
Measures of dispersion
Normal distribution curve
Probability or p value
Tests of significance
Parametric
Non – parametric test
One tailed
Two tailed
Conclusion
References
INTRODUCTION
Branch of statistics applied to biological or medical
sciences
Statistics
Italian word statista – statesman
German word statistik – political state
Science of statistics originated from
Government records
Mathematics
Father of health statistics –
JOHN GRAUNT(1620-1674)
Biostatistics
Bio – part involves biology
Statistics – involves accumulation, tracking,
analysis, and application of data
Biostatistics is the method of collecting,
organizing, analyzing, tabulating and
interpreting data related to living
organisms and human beings
TERMINOLOGY
Constant
Quantities that do not vary e.g. in biostatistics, mean,
standard deviation are considered constant for a population
Variable
A name denoting a condition, occurrence, or effect that can
assume different values is a variable.
TERMINOLOGY
Population
Population includes all persons, events and objects under study. It
may be finite or infinite.
Sample
Defined as a part of a population generally selected so as to be
representative of the population whose variables are under study
Individual entities that form focus of the study are termed sampling units.
List of sampling units is known as sampling list or sampling frame.
TERMINOLOGY
Parameter
It is a constant that describes a population e.g. in a college
there are 80% girls, or the average age of dental patients in 2010.
This describes the population, hence it is a parameter.
Statistic
Statistic is a constant that describes the sample e.g. out of a sample of
100 students of the same college, 80% are girls. This 80% will be a
statistic as it describes the sample
TERMINOLOGY
Attribute
A characteristic based on which the population
can be described into categories or class e.g.
gender, caste, religion.
DATA
A collective recording of observations either numerical or
otherwise is called Data
Observations can be collected by recording
No of cases who have removable prosthesis
Or by the age or gender of the patient
In any of the cases a certain observation is made of a
characteristic which varies from person to person – this is
called a Variable
TYPES OF DATA
Data is of two types
Qualitative data
Quantitative data
QUALITATIVE DATA
Data collected on basis of attributes or quality like sex, occlusion, cavity
etc.
In such data there is no notion of magnitude or size of an attribute as
the same cannot be measured.
The number of persons having the same attribute is variable and is
counted
e.g. out of 100 people 75 have class I occlusion, 15 have class II occlusion and
10 have class III occlusion.
Class I, II and III are attributes, which cannot be measured in figures; only the number of people
having them can be determined
QUANTITATIVE DATA
Data is collected through measurement, e.g. using calipers, like arch length, width,
fluoride concentration in water supply etc.
In this the attribute has a magnitude; both the attribute and the number of
persons having the attribute vary.
It may be
Continuous - variable can take any value in a given range, decimal or fractional.
Discrete – variable under observation takes fixed values like whole numbers. (DMF teeth)
E.g. Freeway space. It varies for every patient. It is a quantity with a different
value for each individual and is measurable. It is continuous as it can take any
value between 2 and 4 like it can be 2.10 or 2.55 or 3.07 etc.
Source of data
The main sources for collection of data
Experiments
Surveys
Records
SOURCE OF DATA
Experiments
Experiments are performed to collect data for
investigations and research by one or more
workers.
Records
Records are maintained as a routine in registers
and books over a long period of time and
provide readymade data.
Surveys
Carried out for Epidemiological studies in the field
by trained teams to find incidence or prevalence of
health or disease in a community.
These epidemiological studies can be either
Descriptive or Analytical
Descriptive epidemiological
study
Descriptive epidemiological studies use
Cross – sectional study
Longitudinal study
for measuring a disease in terms of magnitude
Incidence of a disease is obtained from longitudinal
study
Prevalence of a disease is obtained from cross sectional
study
Cross – sectional study
Measurement of exposure and effect are made at
the same time. So, we get relationship between a
disease and other variables of interest as they exist
at one point of time
E.g., In cross- sectional study of oral cancer we can
also collect data during their survey on age, sex,
occupation, habits and tobacco usage
Longitudinal study
A study conducted over a long period of time is a longitudinal study
These are done on samples drawn from population and
observations made at periodic intervals
Longitudinal studies are useful
For studying the natural history of disease and its outcome
For identifying risk factors associated with disease
For calculating the incidence rate of disease.
Analytical epidemiological
study
A cause of disease is an event, condition or
characteristic, or a combination of these factors, that plays an
important role in production of disease.
Before accepting a factor as a cause, several observations
have to be made on the so called ‘exposure’
These observations comprise the procedure of analytical
epidemiology
Cohort study
Approach of beginning with exposure and searching for effects in a
prospective manner in time is referred to as Cohort study
Is an observational study which attempts to study relationship between
purported cause and subsequent risk of developing disease.
Distinguishing features of cohort study
Group of persons to be studied are defined in terms of characteristics manifest
prior to appearance of disease under investigation
Study groups are observed over a period of time to determine frequency of
disease among them
Case – control study
Beginning with disease and searching for causes in the
past is referred to as case – control study.
This is done along with another group of individuals who
do not have the condition, called the control
Case – control studies are primarily used to assess risks
and to study causes of diseases.
Data collected through
Primary source
Secondary source
SAMPLING
Selection of sampling
Purposive sample
Random sample
Advantages ( acc to R.A.Fisher)
Adaptability
Speed
Economy
Enhanced scientific approach
SAMPLING DESIGN
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multiphase sampling
Pathfinder surveys
SIMPLE RANDOM SAMPLING
Each and every unit in the population has an equal
chance of being included in the sample.
Selection of unit is determined by chance.
To ensure randomness any of the methods
can be chosen
Lottery method
Table of random numbers
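The lottery method can be imitated in code. The following is a minimal sketch, assuming a hypothetical sampling frame of 100 numbered patients; the names `simple_random_sample` and `sampling_frame` are illustrative and do not come from the presentation.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: patient IDs 1..100
sampling_frame = list(range(1, 101))
sample = simple_random_sample(sampling_frame, 10, seed=1)
print(sample)
```

Each run with a different seed gives a different sample, which is the sense in which selection is determined by chance.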
SYSTEMATIC RANDOM
SAMPLING
Selecting one unit at random and then
selecting additional units at evenly spaced
interval till sample of required size has been
formed
This method is used when complete list of
population is available
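The procedure of a random start followed by evenly spaced units might be sketched as below. This is an illustration under stated assumptions: the frame is a complete list, the interval is taken as N // n, and the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(sampling_frame, n, seed=None):
    """Pick a random start, then every k-th unit, where k = N // n."""
    k = len(sampling_frame) // n      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start within the first interval
    return [sampling_frame[start + i * k] for i in range(n)]

# Hypothetical complete list of the population: IDs 1..100
frame = list(range(1, 101))
sys_sample = systematic_sample(frame, 10, seed=1)
print(sys_sample)
```

Only the first unit is chosen by chance; the remaining units follow from the interval.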
STRATIFIED RANDOM
SAMPLING
Population to be sampled is subdivided into groups known as
strata, such that each group is homogeneous in characteristic
A simple random sample is chosen from each stratum. This type
of sampling is used when the population is heterogeneous with
regard to the characteristic under study.
E.g., to know the prevalence of DMF teeth in different age groups,
then age groups form the strata and the random sample should
be chosen from each stratum i.e., the age group
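Drawing a simple random sample from each stratum could look like the sketch below. The age groups, the ID ranges and the equal allocation of five units per stratum are assumptions made for illustration only.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

# Hypothetical strata: child IDs grouped by age group
strata = {
    "5-6 years": list(range(0, 40)),
    "12 years": list(range(40, 100)),
    "15 years": list(range(100, 150)),
}
strat_sample = stratified_sample(strata, 5, seed=1)
print(strat_sample)
```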
CLUSTER SAMPLING
This method is used when the population forms natural
groups or clusters, such as villages, wards, school
children etc.
First a sample of clusters is selected and then all
units in each of the selected clusters are surveyed.
This method is simpler and involves less time and cost,
but gives a higher standard error.
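The two steps of cluster sampling (select clusters, then survey every unit in them) can be sketched as follows. The village names and members are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random, then keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: villages and the children living in each
villages = {
    "village A": ["a1", "a2", "a3"],
    "village B": ["b1", "b2"],
    "village C": ["c1", "c2", "c3", "c4"],
    "village D": ["d1"],
}
surveyed = cluster_sample(villages, 2, seed=1)
print(surveyed)
```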
MULTIPHASE SAMPLING
Part of information is collected from whole sample
and part from sub-sample.
E.g., all children in a school are surveyed in the first phase and
only those with oral health problems are selected in the second
phase. In the 3rd phase only those who need treatment are
selected. Thus by the 3rd and 4th phases the sub-samples become
smaller and smaller
PATHFINDER SURVEY
Pathfinder surveys can be either pilot or national, depending on the
number and type of sampling sites and the age groups included.
A national pathfinder survey incorporates sufficient examination sites
to cover all important subgroups of the population that may have
differing disease levels or treatment needs, and at least three of the
age groups or index ages
This type of survey design is suitable for collection of data for planning
and monitoring of services in all countries whatever the level of
disease, availability of resources, or complexity of services.
In a large country with many geographic and
population subdivisions and a complex service
structure, a larger number of sampling sites is
needed.
The basic principle of using index ages and
standard samples at each site within a
stratified approach, however, remains valid.
SAMPLE SIZE ???
Bigger the sample, higher the precision of estimates
of sample.
E.g., if field survey is conducted to estimate the
prevalence rate of a disease, the sample size is
calculated by the formula
$n = \dfrac{z_\alpha^2 \, p \,(1 - p)}{L^2}$
n – sample size
p – approximate prevalence rate of disease
L – permissible error in estimation of p
$z_\alpha$ – normal value for probability level.
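As a worked illustration of the formula, the sketch below computes n for an assumed prevalence of 20% and a permissible error of 5 percentage points. The value 1.96 is the standard normal value for a 95% probability level; the input numbers and the function name are assumptions, not figures from the presentation.

```python
import math

def sample_size_for_prevalence(p, L, z_alpha=1.96):
    """n = z_alpha^2 * p * (1 - p) / L^2, rounded up to a whole subject."""
    return math.ceil(z_alpha ** 2 * p * (1 - p) / L ** 2)

# Assumed inputs: prevalence about 20%, permissible error 5 percentage points
n = sample_size_for_prevalence(p=0.20, L=0.05)
print(n)  # 246
```

Halving the permissible error L roughly quadruples the required sample size, because L enters the formula squared.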
SAMPLING ERRORS
2 types of errors
Sampling error – due to sampling process and could
arise because of
Faulty sampling design
Small size of sample
Non – sampling error
Coverage error – due to non-response or non-cooperation
of informant
Observational error – due to interviewer bias or imperfect
experimental technique or interaction of both
Processing error – due to errors in statistical analysis
Data presentation
Statistical data once collected should be systematically
arranged and presented
To arouse interest of readers
For data reduction
To bring out important points clearly and strikingly
For easy grasp and meaningful conclusions
To facilitate further analysis
To facilitate communication
Two main types of data presentation are
Tabulation
Graphic representation with charts and
diagrams
TABULATION
It is the most common method
Data presentation is in the form of columns
and rows
It can be of the following types
Simple tables
Frequency distribution tables
SIMPLE TABLE
MONTH        NO. OF PATIENTS AT VDC, BVRM
JANUARY      2800
FEBRUARY     3000
MARCH        2500
Frequency distribution table
In a frequency distribution table, the data is first
split into convenient groups ( class interval ) and
the number of items ( frequency ) which occurs in
each group is shown in adjacent column.
NO. OF CAVITIES    NO. OF PATIENTS
0-3                100
3-6                67
6-9                32
9 & ABOVE          20
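Grouping raw observations into class intervals can be sketched as below. The cavity counts are hypothetical, and the sketch assumes half-open intervals (a value of 3 falls in the 3-6 class), since the table itself does not say where boundary values belong.

```python
from collections import Counter

def frequency_table(values, width, last_lower):
    """Count values in classes [lower, lower + width); the last class is open-ended."""
    counts = Counter()
    for v in values:
        lower = min((v // width) * width, last_lower)
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical number of cavities recorded in 13 patients
cavities = [0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
table = frequency_table(cavities, width=3, last_lower=9)
print(table)  # {0: 4, 3: 3, 6: 3, 9: 3}
```

The keys are the lower limits of the classes 0-3, 3-6, 6-9 and 9 & above.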
GRAPHIC REPRESENTATION
Charts and diagrams
Useful method of presenting statistical data
Powerful impact on imagination of the people
TYPES OF CHARTS AND DIAGRAMS
Bar chart
Histogram
Frequency polygon
Line diagram
Pie diagram
Spot map or map diagram or cartograms
BAR CHART
Length of bars drawn vertical or horizontal is proportional to
frequency of variable.
Used to represent qualitative data
Suitable scale is chosen
Bars usually equally spaced
They are of three types
simple bar
Multiple bar
Component bar
SIMPLE BAR CHART
Represents only one variable
[Figure: simple bar chart of the number of CD patients by quarter (JAN-MARCH, APRIL-JUNE, JULY-SEP, OCT-DEC)]
MULTIPLE BAR CHART
two or more variables are grouped together
[Figure: multiple bar chart of CD, RPD and FPD patients by quarter (JAN-MARCH, APRIL-JUNE, JULY-SEP, OCT-DEC)]
COMPONENT BAR CHART
bars are divided into two parts
each part representing certain item and
proportional to magnitude of that item
[Figure: component bar chart of patients to Prosthodontics and patients to other departments by quarter (JAN-MARCH, APRIL-JUNE, JULY-SEP, OCT-DEC)]
HISTOGRAM
Pictorial presentation of frequency distribution
Used to depict quantitative data of continuous type
Consists of a series of rectangles
Class intervals are given on the horizontal axis and frequencies on the vertical axis
Area of rectangle is proportional to the frequency
[Figure: histogram of number of carious lesions; class intervals 0 to 3, 3 to 6, 6 to 9, 9 to 12, 12 to 15, 15 to 18, 18 to 21, 21 to 24, 24 to 27; data labels as extracted: 75, 45, 40, 32, 43, 22, 34, 29, 38]
FREQUENCY POLYGON
Represents frequency distribution of
quantitative data.
obtained by joining midpoints of histogram
blocks at the height of frequency by straight
lines usually forming a polygon
LINE DIAGRAM
Line diagrams are used to show the trends of
events with the passage of time
[Figure: line diagram of the number of implants placed in 2010, by month (JAN to DEC)]
PIE CHART
In this, frequencies of the groups are shown as segments of a
circle
Used to represent qualitative data
Degree of angle denotes the frequency
Angle is calculated by
(No. of observations in specific group × 360) / total
observations in all groups
[Figure: pie chart with segments for prostho, cons, perio, ortho and pedo]
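The angle calculation for a pie chart can be sketched as below. The patient counts per department are hypothetical; only the department labels come from the slide.

```python
def pie_angles(counts):
    """Angle of each segment = observations in group * 360 / total observations."""
    total = sum(counts.values())
    return {group: n * 360 / total for group, n in counts.items()}

# Hypothetical patient counts per department
counts = {"prostho": 50, "cons": 30, "perio": 10, "ortho": 6, "pedo": 4}
angles = pie_angles(counts)
print(angles)
```

With these counts, half of all patients attend prostho, so its segment takes 180 of the 360 degrees.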
CARTOGRAMS
Spot map or map diagram
These maps are prepared to show
geographic distribution of frequencies of
characteristics
MEASURES OF STATISTICAL AVERAGES OR CENTRAL
TENDENCY
Average value in a distribution is the one central value
around which all the other observations are
concentrated
Average value helps
to find most characteristic value of a set of measurements
to find which group is better off by comparing the average of
one group with that of the other
The most commonly used averages are
Mean
Median
Mode
Objectives of central tendency
• To condense entire mass of data
• To facilitate comparison
A good measure of central tendency
o Should be easy to understand and compute
o Should be based on each and every item in series
o Should not be affected by extreme variations
o Should be capable of further statistical computations
o Should have sampling stability
MEAN
Refers to arithmetic mean
It is the summation of all the observations
divided by the total number of observations (n)
Denoted by $\bar{x}$ for sample and µ for population
$\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
Advantages – it is easy to calculate
Disadvantages – influenced by extreme values
MEDIAN
When all the observation are arranged either
in ascending order or descending order, the
middle observation is known as median
In case of even number the average of the
two middle values is taken
Median is better indicator of central value as
it is not affected by the extreme values
MODE
Most frequently occurring observation in a
data is called mode
Not often used in medical statistics.
Example
Number of decayed teeth in 10 children
2,2,4,1,3,0,10,2,3,
Mode = 2 ( 3 Times)
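The three averages can be checked with Python's standard `statistics` module. The sketch below uses only the nine values that appear on the slide (the tenth is not given) and also shows how the single extreme value 10 pulls the mean upward while leaving the median unchanged.

```python
import statistics

# The nine values listed on the slide (the tenth value is not given there)
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3]

mean = statistics.mean(decayed_teeth)
median = statistics.median(decayed_teeth)
mode = statistics.mode(decayed_teeth)
print(mean, median, mode)  # 3 2 2

# Dropping the extreme value 10 changes the mean but not the median
without_extreme = [x for x in decayed_teeth if x != 10]
mean_without = statistics.mean(without_extreme)      # 2.125
median_without = statistics.median(without_extreme)  # 2.0
```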
VARIABILITY
Types of variability
There are three types of variability
Biological variability
Real variability
Experimental variability
BIOLOGICAL VARIABILITY
It is the natural difference which occurs in
individuals due to age, gender and other
attributes which are inherent
This difference is small and occurs by
chance and is within certain accepted
biological limits
e.g. vertical dimension may vary from patient
to patient
REAL VARIABILITY
Such variability is more than the normal
biological limits
The cause of difference is not inherent or
natural and is due to some external factors
e.g. difference in incidence of cancer among
smokers and non smokers may be due to
excessive smoking and not due to chance
only
EXPERIMENTAL VARIABILITY
it occurs due to the experimental study
they are of three types
Observer error
the investigator may alter some information or not record the
measurement correctly
Instrumental error
this is due to defects in the measuring instrument
both the observer and the instrument error are called non sampling
error
Sampling error or errors of bias
this is the error which occurs when the samples are not chosen at
random from population.
Thus the sample does not truly represent the population
Measures of variation or dispersion
Biological data collected by measurement
shows variation
Dispersion – degree of spread or variation of
variable about central value.
e.g. BP of an individual can show variation even
if taken by standardized method and measured
by the same person.
Thus one should know what is the normal
variation and how to measure it.
Mainly used
To determine reliability of an average
To serve as basis for control of variability
To compare two or more series in relation to
their variability
Facilitate further statistical analysis
The various measures of variation or
dispersion are
Range
Mean or average deviation
Standard deviation
Co-efficient of variation
RANGE
It is the simplest
Defined as the difference between the
highest and the lowest figures in a sample
Defines the normal limits of a biological
characteristic e.g. freeway space ranges
between 2-4 mm
Not satisfactory as based on two extreme
values only
MEAN DEVIATION
It is the average of the differences or
deviations from the mean in any distribution
ignoring the + or – sign
Denoted by MD
$MD = \dfrac{\sum |x - \bar{x}|}{n}$
x = observation
$\bar{x}$ = mean
n = no of observations
STANDARD DEVIATION
Also called root mean square deviation
It is an Improvement over mean deviation used most commonly
in statistical analysis
Denoted by SD or s for sample and σ for a population
Given by the formula
$SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$, with n – 1 used in place of n when the SD is estimated from a sample
Greater the standard deviation, greater will be the magnitude of
dispersion from mean
Small standard deviation means a high degree of uniformity of
the observations
Usually measurement beyond the range of ± 2 SD are considered
rare or unusual in any distribution
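The range, mean deviation and standard deviation can be computed side by side to see how they relate. This is a minimal sketch; the five freeway space values (in mm) are hypothetical.

```python
import math

def dispersion(values):
    """Return range, mean deviation and SD (divisor n and divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)
    return {
        "range": max(values) - min(values),
        "mean_deviation": sum(abs(x - mean) for x in values) / n,
        "sd_population": math.sqrt(squared / n),    # divisor n
        "sd_sample": math.sqrt(squared / (n - 1)),  # divisor n - 1
    }

# Hypothetical freeway space measurements in mm
freeway_space_mm = [2.0, 2.5, 3.0, 3.5, 4.0]
result = dispersion(freeway_space_mm)
print(result)
```

For these values the range is 2.0, the mean deviation 0.6, and the SD about 0.71 (divisor n) or 0.79 (divisor n – 1).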
Uses of Standard Deviation
It summarizes the deviation of a large
distribution from its mean.
It helps in finding the suitable size of sample e.g.
greater deviation indicates the need for larger
sample to draw meaningful conclusions
It helps in calculation of standard error which
helps us to determine whether the difference
between two samples is by chance or real
COEFFICIENT OF VARIATION
It is used to compare attributes having two
different units of measurement e.g. height
and weight
Denoted by CV
CV = SD X 100 / Mean
and is expressed as percentage
Higher CV, greater is the variation in series of
data.
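Because the CV is a percentage, it allows height and weight to be compared even though their units differ. A minimal sketch with hypothetical measurements, using the sample standard deviation from the standard `statistics` module:

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD * 100 / mean, expressed as a percentage."""
    return statistics.stdev(values) * 100 / statistics.mean(values)

# Hypothetical measurements for the same five people
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 1), round(cv_weight, 1))  # 9.3 22.6
```

Both series have the same SD in their own units, yet weight shows the greater relative variation because its mean is smaller.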
Normal distribution or normal curve
So much of physiologic variation occurs in any
observation
Necessary to
Define normal limits
Determine the chances of an observation being normal
To determine the proportion of observation that lie within a
given range
Normal distribution or normal curve used most
commonly in statistics helps us to find these
Large number of observations with a narrow class
interval gives a frequency curve called the normal curve
It has the following characteristics
Bell shaped
Bilaterally symmetrical
Frequency increases from one side reaches its highest and
decreases exactly the way it had increased
The highest point denotes mean, median and mode which
coincide
Maximum no observations is at value of variable
corresponding to mean and the no of observations
gradually decreases on either side with few observations at
the extreme points
Area under curve between any 2 points which correspond to
no of observations between any two values of variate can be
found in terms of a relationship between the mean and the
standard deviation as
Mean ± 1 SD includes 68.27% of all observations. Such
observations are fairly common
Mean ± 2 SD includes 95.45% of all observations, i.e. by
convention values beyond this range are uncommon or rare. Their
chance of being normal is 100 – 95.45%, i.e. only 4.55%.
Mean ± 3 SD includes 99.73%. Values beyond this range are very rare. Their
chance of being normal is only 0.27%
This relationship is used for fixing confidence intervals
These limits on either side of measurement are called confidence
limits
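The percentages quoted for mean ± 1, 2 and 3 SD can be reproduced from the error function in Python's `math` module, since the fraction of a normal distribution within k SD of the mean equals erf(k / √2). The function name is illustrative.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))

coverage = {k: round(100 * fraction_within(k), 2) for k in (1, 2, 3)}
print(coverage)  # {1: 68.27, 2: 95.45, 3: 99.73}
```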
the look of frequency distribution curve may vary depending on
mean and SD . thus it becomes necessary to standardize it.
Eg- One study has SD as 3 and other has SD as 2,thus it becomes
difficult to compare them
Thus normal curve is standardized by using the unit of standard
deviation to place any measurement with reference to mean.
The curve that emerges through this procedure is called standard
normal curve
Relative or standard normal deviation
When variable X follows a normal distribution
with mean $\bar{x}$ and standard deviation S, then the
relative or standard normal deviate Z is
given by
$Z = \dfrac{x - \bar{x}}{S}$, i.e. Z = (Observation – Mean) / SD
Values of Z for several values of X form
normal distribution with mean 0 and SD 1
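Standardization can be illustrated with a short sketch. The measurements are hypothetical; after conversion, the Z values have mean 0 and SD 1 regardless of the original units.

```python
import statistics

def z_score(x, mean, sd):
    """Standard normal deviate: (observation - mean) / SD."""
    return (x - mean) / sd

# Hypothetical measurements in mm
values = [2.0, 2.5, 3.0, 3.5, 4.0]
mean = statistics.mean(values)
sd = statistics.pstdev(values)  # SD with divisor n
z_values = [z_score(x, mean, sd) for x in values]

print(z_score(4.0, 3.0, 0.5))  # 2.0, i.e. two SDs above the mean
```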
Probability or p value
Probability is the chance of occurrence of any
event or permutation combination.
It is denoted by p for sample and P for
population
In various tests of significance we are often
interested to know whether the observed
difference between 2 samples is real or
due to chance (sampling variation).
Here the probability or p value is used
Probability
P ranges from 0 to 1
0 = there is no chance that the observed difference
is due to sampling variation
1 = it is absolutely certain that observed difference
between 2 samples is due to sampling variation
However such extreme values are rare.
P = 0.4 i.e. chances that the difference is due to
sampling variation is 4 in 10
Chances that it is not due to sampling variation will be
6 in 10
Probability
The essence of any test of significance is to find out p
value and draw inference
If p value is 0.05 or more
it is customary to accept that difference is due to chance
(sampling variation) .
The observed difference is said to be statistically not significant.
If p value is less than 0.05
observed difference is not due to chance but due to the role of some
external factors.
The observed difference here is said to be statistically
significant.
From shape of normal curve
We know that 95% observation lie within mean ± 2SD . Thus
probability of value more or less than this range is 5%
From probability tables
p value is also determined by probability tables in case of
student t test or chi square test
By area under normal curve
Here (z) standard normal deviate is calculated
Corresponding to z values the area under the curve is
determined (A)
Probability is given by 2(0.5 - A)
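The area method can be sketched in a few lines. Here A is taken to be the area under the standard normal curve between the mean and z, which is what printed tables of the normal curve usually list; that reading of A is an assumption, as are the function names.

```python
import math

def area_mean_to_z(z):
    """Area A under the standard normal curve between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def two_tailed_p(z):
    """p = 2 * (0.5 - A): the area in both tails beyond ±|z|."""
    return 2 * (0.5 - area_mean_to_z(z))

p = two_tailed_p(1.96)
print(round(area_mean_to_z(1.96), 3), round(p, 3))  # 0.475 0.05
```

A standard normal deviate of 1.96 therefore sits exactly at the customary 0.05 boundary.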
References
Soben Peter; Essentials of Preventive and
Community Dentistry, second edition.
G. N. Prabhakara; Biostatistics
T. Bhaskara Rao; Methods of Biostatistics
Tests of significance
Classified as
Parametric tests
Non – parametric tests
Can also be divided into
One tailed
Two tailed
Parametric tests
Parametric tests are those tests in which certain assumptions are
made about the population
Population from which sample is drawn has normal distribution
The variances of sample do not differ significantly
The observations found are truly numerical thus arithmetic procedure
such as addition, division, and multiplication can be used
Since these tests make assumptions about the population
parameters, they are called parametric tests.
These are usually used to test the difference
They are:
Student t test( paired or unpaired)
ANOVA
Test of significance between two means
Non – parametric tests
In many biological investigation the research worker may not know the
nature of distribution or other required values of the population.
Also some biological measurements may not be true numerical values
hence arithmetic procedures are not possible in such cases.
In such cases distribution free or non parametric tests are used in
which no assumption are made about the population parameters e.g.
Mann Whitney test
Chi square test
Pi coefficient test
Fisher's Exact test
Sign Test
Friedman's Test
Two tailed test
This test determines if there is a difference between the two groups without
specifying whether difference is higher or lower
It includes both ends or tails of the normal distribution
Such test is called Two tailed test
If the objective is to conclude that 2 samples are from same population or
not, without considering the direction of difference between means, then two
tailed test is used,
Eg., when one wants to know if mean IQ in malnourished children is different
from well nourished children but does not specify if it is more or less
One tailed test
In the test of significance when one wants to specifically know if the difference
between the two groups is higher or lower . i.e., the direction plus or minus side
is specified.
Then one tail of the distribution is excluded
If the objective is to conclude that the mean of one of the sample is larger than
the other or not, one tailed test is used
E.g., if one wants to know if mal nourished children have less mean IQ than well
nourished then higher side of the distribution will be excluded
Such test of significance is called one tailed test
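The practical difference between the two approaches shows up in the p value. The sketch below assumes a hypothetical standard normal deviate of z = -1.80 for the difference in mean IQ (malnourished minus well nourished); the function names are illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_two_tailed(z):
    """Both tails: a difference in either direction counts."""
    return 2 * upper_tail(abs(z))

def p_one_tailed_lower(z):
    """Lower tail only: the higher side of the distribution is excluded."""
    return upper_tail(-z)

z = -1.80  # hypothetical deviate
p_two = p_two_tailed(z)
p_one = p_one_tailed_lower(z)
print(round(p_two, 3), round(p_one, 3))  # 0.072 0.036
```

With the same data, the one tailed p value falls below 0.05 while the two tailed p value does not, which is why the direction must be specified before the data are examined.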
BIOSTATISTI                             CS.pptx
BIOSTATISTI                             CS.pptx
BIOSTATISTI                             CS.pptx

BIOSTATISTI CS.pptx

  • 1.
  • 2.
    BIOSTATISTICS PRESENTED BY R.PRIYA DARSHINI 1ST YEAR M.D.S DEPARTMENT OF PROSTHODONTICS
  • 3.
    CONTENTS Introduction Terminology Data Source of data Collectionof data Sampling and sampling design Sample size Errors in sampling Presentation of data Measures of central tendency
  • 4.
    CONTENTS Measures of dispersion Normaldistribution curve Probability or p value Tests of significance Parametric Non – parametric test One tailed Two tailed Conclusion References
  • 5.
    INTRODUCTION Branch of statisticsapplied to biological or medical sciences Statistics Italian word statista – statesman German word statistik – political state Science of statistics originated from Government records Mathematics Father of health statistics – JOHN GRAUNT(1620-1674)
  • 6.
    Biostatistics Bio – partinvolves biology Statistics – involves accumulation, tracking, analysis, and application of data Biostatistics is the method of collection, organizing, analyzing, tabulating and interpretation of datas related to living organisms and human beings
  • 7.
    TERMINOLOGY Constant Quantities that donot vary e.g. in biostatistics, mean, standard deviation are considered constant for a population Variable A name denoting a condition, occurrence, or effect that can assume different values is a variable.
  • 8.
    TERMINOLOGY Population Population includes allpersons, events and objects under study. it may be finite or infinite. Sample Defined as a part of a population generally selected so as to be representative of the population whose variables are under study Individual entities that form focus of the study are termed sampling units. List of sampling units is known as sampling list or sampling frame.
  • 9.
    TERMINOLOGY Parameter It is aconstant that describes a population e.g. in a college there are 80% girls or average age of dental patients in2010. This describes the population, hence it is a parameter. Statistic Statistic is a constant that describes the sample e.g. out of 100 students of the same college 80% girls. This 80% will be statistic as it describes the sample
  • 10.
    TERMINOLOGY Attribute A characteristic basedon which the population can be described into categories or class e.g. gender, caste, religion.
  • 11.
    DATA A collective recordingof observations either numerical or otherwise is called Data Observations can be collected by recording No of cases who have removable prosthesis Or by the age or gender of the patient In any of the cases a certain observation is made of a characteristic which varies from person to person – this is called a Variable
  • 12.
    TYPES OF DATA Datais of two types Qualitative data Quantitative data
  • 13.
    QUALITATIVE DATA Data collectedon basis of attributes or quality like sex, occlusion, cavity etc. In such data there is no notion of magnitude or size of an attribute as the same cannot be measured. The number of person having the same attribute are variable and are measured e.g. like out of 100 people 75 have class I occlusion, 15 have class II occlusion and 10 have class III occlusion. Class I II III are attributes , which cannot be measured in figures, only no of people having it can be determined
  • 14.
    QUANTITATIVE DATA Data iscollected through measurement using calipers, like arch length, width, flouride concentration in water supply etc. In this the attribute has a magnitude, both the attribute and the number of persons having the attribute vary. It may be Continous - variable can take any value in a given range, decimal or fractional. Discrete – variable under observation takes fixed values like whole numbers. (DMF teeth) E.g Freeway space. It varies for every patient. It is a quantity with a different value for each individual and is measurable. It is continuous as it can take any value between 2 and 4 like it can be 2.10 or 2.55 or 3.07 etc.
  • 15.
    Source of data Themain sources for collection of data Experiments Surveys Records
  • 16.
    SOURCE OF DATA Experiments Experimentsare performed to collect data for investigations and research by one or more workers. Records Records are maintained as a routine in registers and books over a long period of time provides readymade data.
  • 17.
    Surveys Carried out forEpidemiological studies in the field by trained teams to find incidence or prevalence of health or disease in a community. These epidemiological studies can be either Descriptive or Analytical
  • 18.
    Descriptive epidemiological study Descriptive epidemiologicalstudies uses Cross – sectional study Longitudinal study for measuring of a disease in terms of magnitude Incidence of a disease is obtained from longitudinal study Prevalence of a disease is obtained from cross sectional study
  • 19.
    Cross – sectionalstudy Measurement of exposure and effect are made at the same time. So, we get relationship between a disease and other variables of interest as they exist at one point of time E.g., In cross- sectional study of oral cancer we can also collect data during their survey on age, sex, occupation, habits and tobacco usage
  • 20.
    Longitudinal study Study conductedover long period of time, it is longitudinal study These are done on samples drawn from population and observations made at periodic intervals Longitudinal studies are useful For studying the natural history of disease and its outcome For identifying risk factors associated with disease For calculating the incidence rate of disease.
  • 21.
    Analytical epidemiological study Cause ofdisease, referred to as event, condition or characteristic or combination of these factors play an important role in production of disease. Before ensuring a factor as a cause, several observations have to be made on the so called ‘exposure’ These observations comprise the procedure of analytical epidemiology
  • 22.
    Cohort study Approach ofbeginning with exposure and searching for effects in a prospective manner in time is referred to as Cohort study Is an observational study which attempts to study relationship between purported cause and subsequent risk of developing disease. Distinguishing features of cohort study Group of persons to be studied are defined in terms of characteristics manifest prior to appearance of disease under investigation Study groups are observed over a period of time to determine frequency of disease among them
  • 23.
    Case – controlstudy Beginning with disease and searching for causes in the past is referred to as case – control study. This is done along with another group of individuals who have not fallen prey of the condition, called the control Case – control studies are primarily used to assess risks and to study causes of diseases.
  • 24.
    Data collected through Primarysource Secondary source
  • 25.
    SAMPLING Selection of sampling Purposivesample Random sample Advantages ( acc to R.A.Fisher) Adaptability Speed Economy Enhanced scientific approach
  • 26.
    SAMPLING DESIGN Simple randomsampling Systemic random sampling Stratified random sampling Cluster sampling Multiphase sampling Pathfinder surveys
  • 27.
    SIMPLE RANDOM SAMPLING: Each and every unit in the population has an equal chance of being included in the sample. The selection of a unit is determined by chance. To ensure randomness, either of these methods can be chosen: lottery method; table of random numbers.
  • 28.
    SYSTEMATIC RANDOM SAMPLING: One unit is selected at random, and additional units are then selected at evenly spaced intervals until a sample of the required size has been formed. This method is used when a complete list of the population is available.
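The selection rule on this slide (one random start, then evenly spaced units) can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample`, the rule of taking the interval as list length divided by sample size, and the register of 100 patient numbers are assumptions, not part of the presentation.

```python
import random

def systematic_sample(units, sample_size, rng=None):
    """Pick one random start, then take every k-th unit from the list."""
    if not 0 < sample_size <= len(units):
        raise ValueError("sample_size must be between 1 and len(units)")
    rng = rng or random.Random()
    interval = len(units) // sample_size   # sampling interval k (assumed rule)
    start = rng.randrange(interval)        # random start within the first interval
    return [units[start + i * interval] for i in range(sample_size)]

# Hypothetical sampling frame: patient register numbers 1..100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, random.Random(0))
print(sample)
```

Only the starting point is random; every later unit follows from it, which is why a complete list of the population is needed first.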
  • 29.
    STRATIFIED RANDOM SAMPLING: The population to be sampled is subdivided into groups known as strata, such that each group is homogeneous in the characteristic of interest. A simple random sample is then chosen from each stratum. This type of sampling is used when the population is heterogeneous with regard to the characteristic under study. E.g., to know the prevalence of DMF teeth in different age groups, the age groups form the strata and a random sample is chosen from each stratum, i.e. each age group.
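As a rough sketch of the age-group example, the code below groups units into strata and draws a simple random sample from each. The names `stratified_sample` and `age_group`, the two age bands, and the list of 40 children are hypothetical and serve only to illustrate the idea.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng=None):
    """Group units into strata, then draw a simple random sample from each."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, min(per_stratum, len(members)))
            for name, members in strata.items()}

def age_group(child):
    """Hypothetical strata: two age bands."""
    _name, age = child
    return "5-9" if age < 10 else "10-14"

# Hypothetical population: 40 children aged 5 to 14.
children = [(f"child{i}", 5 + i % 10) for i in range(40)]
chosen = stratified_sample(children, age_group, 3, random.Random(1))
for group, members in chosen.items():
    print(group, members)
```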
  • 30.
    CLUSTER SAMPLING: This method is used when the population forms natural groups or clusters, such as villages, wards, or school children. First a sample of clusters is selected, and then all units in each of the selected clusters are surveyed. This method is simpler and involves less time and cost, but gives a higher standard error.
  • 31.
    MULTIPHASE SAMPLING: Part of the information is collected from the whole sample and part from a sub-sample. E.g., all children in a school are surveyed, and only those with oral health problems are selected in the second phase. In the third phase, only those who need treatment are selected. Thus, by the third and fourth phases the sub-samples become smaller and smaller.
  • 32.
    PATHFINDER SURVEY: Pathfinder surveys can be either pilot or national, depending on the number and type of sampling sites and the age groups included. A national pathfinder survey incorporates sufficient examination sites to cover all important subgroups of the population that may have differing disease levels or treatment needs, and at least three of the age groups or index ages. This type of survey design is suitable for collecting data for the planning and monitoring of services in all countries, whatever the level of disease, availability of resources, or complexity of services.
  • 33.
    In a large country with many geographic and population subdivisions and a complex service structure, a larger number of sampling sites is needed. The basic principle of using index ages and standard samples at each site within a stratified approach, however, remains valid.
  • 34.
    SAMPLE SIZE: The bigger the sample, the higher the precision of the estimates obtained from it. E.g., if a field survey is conducted to estimate the prevalence rate of a disease, the sample size is calculated by the formula $n = z_\alpha^2 \, p (1 - p) / L^2$, where n = sample size, p = approximate prevalence rate of the disease, L = permissible error in the estimation of p, and $z_\alpha$ = normal value for the probability level.
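A small worked example of the sample size formula may help. In the sketch below, the function name, the two-sided 95% level used to obtain z, and the example values (p = 0.10, L = 0.02) are assumptions chosen for illustration; they do not come from the slide.

```python
import math
from statistics import NormalDist

def sample_size_for_prevalence(p, permissible_error, confidence=0.95):
    """n = z^2 * p * (1 - p) / L^2, rounded up to a whole subject."""
    # Assumption: z is the two-sided normal value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / permissible_error ** 2
    return math.ceil(n)

# Hypothetical survey: expected prevalence 10%, permissible error 2 percentage points.
n = sample_size_for_prevalence(0.10, 0.02)
print(n)  # 865
```

Halving the permissible error L roughly quadruples the required sample, since L enters the formula squared.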
  • 35.
    SAMPLING ERRORS: There are 2 types of errors. Sampling error: due to the sampling process; it can arise because of a faulty sampling design or a small sample size. Non-sampling error: coverage error (due to non-response or non-cooperation of the informant); observational error (due to interviewer bias, imperfect experimental technique, or the interaction of both); processing error (due to errors in statistical analysis).
  • 36.
    Data presentation: Statistical data, once collected, should be systematically arranged and presented: to arouse the interest of readers; for data reduction; to bring out important points clearly and strikingly; for easy grasp and meaningful conclusions; to facilitate further analysis; to facilitate communication.
  • 37.
    The two main types of data presentation are: tabulation; graphic representation with charts and diagrams.
  • 38.
    TABULATION: It is the most common method. Data are presented in the form of columns and rows. Tables can be of the following types: simple tables; frequency distribution tables.
  • 39.
    SIMPLE TABLE
    MONTH       NO. OF PATIENTS AT VDC, BVRM
    JANUARY     2800
    FEBRUARY    3000
    MARCH       2500
  • 40.
    Frequency distribution table: In a frequency distribution table, the data are first split into convenient groups (class intervals), and the number of items (frequency) occurring in each group is shown in the adjacent column.
    NO. OF CAVITIES     NO. OF PATIENTS
    0-3                 100
    3-6                 67
    6-9                 32
    9 & ABOVE           20
  • 41.
    GRAPHIC REPRESENTATION: Charts and diagrams are a useful method of presenting statistical data and have a powerful impact on the imagination of people.
  • 42.
    TYPES OF CHARTS AND DIAGRAMS: Bar chart; histogram; frequency polygon; line diagram; pie diagram; spot map (map diagram or cartogram).
  • 43.
    BAR CHART: The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable. Used to represent qualitative data. A suitable scale is chosen. Bars are usually equally spaced. They are of three types: simple bar; multiple bar; component bar.
  • 44.
    SIMPLE BAR CHART: Represents only one variable. [Bar chart: number of CD patients per quarter (Jan-March, April-June, July-Sep, Oct-Dec); frequency axis 0 to 600.]
  • 45.
    MULTIPLE BAR CHART: Two or more variables are grouped together. [Bar chart: CD, RPD, and FPD patients per quarter (Jan-March, April-June, July-Sep, Oct-Dec); frequency axis 0 to 1000.]
  • 46.
    COMPONENT BAR CHART: Bars are divided into parts, each part representing a certain item and proportional to the magnitude of that item. [Bar chart: patients to Prosthodontics and patients to other departments per quarter (Jan-March, April-June, July-Sep, Oct-Dec); frequency axis 0 to 8000.]
  • 47.
    HISTOGRAM: A pictorial presentation of a frequency distribution. Used to depict quantitative data of the continuous type. Consists of a series of rectangles, with the class intervals given on the horizontal axis. The area of each rectangle is proportional to the frequency. [Histogram: number of carious lesions in nine class intervals from 0 to 3 through 24 to 27; bar labels 75, 45, 40, 32, 43, 22, 34, 29, 38; frequency axis 0 to 80.]
  • 48.
    FREQUENCY POLYGON: Represents the frequency distribution of quantitative data. It is obtained by joining the midpoints of the histogram blocks, at the height of the frequency, by straight lines, usually forming a polygon.
  • 49.
    LINE DIAGRAM: Line diagrams are used to show the trends of events with the passage of time. [Line diagram: number of implants placed in 2010, by month (Jan to Dec); frequency axis 0 to 16.]
  • 50.
    PIE CHART: Here the frequencies of the groups are shown as segments of a circle. Used to represent qualitative data. The angle of each segment denotes the frequency. The angle is calculated as (number of observations in the specific group × 360) / (total observations in all groups). [Pie chart segments: prostho, cons, perio, ortho, pedo.]
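The angle rule for a pie chart is simple arithmetic, sketched below. The department counts are invented for illustration only; the slide gives the segment names but no figures.

```python
def pie_angles(counts):
    """Angle of each segment = observations in group * 360 / total observations."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total number of observations must be positive")
    return {group: count * 360 / total for group, count in counts.items()}

# Hypothetical patient counts per department (not taken from the slide).
counts = {"prostho": 50, "cons": 30, "perio": 20, "ortho": 60, "pedo": 40}
angles = pie_angles(counts)
for group, angle in angles.items():
    print(f"{group}: {angle:.0f} degrees")
```

With these counts the total is 200 patients, so 50 prostho patients take up 50 × 360 / 200 = 90 degrees, a quarter of the circle.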
  • 51.
    CARTOGRAMS Spot map ormap diagram These maps are prepared to show geographic distribution of frequencies of characteristics
  • 52.
    MEASURES OF STATISTICAL AVERAGES OR CENTRAL TENDENCY: The average value in a distribution is the one central value around which all the other observations are concentrated. The average value helps to find the most characteristic value of a set of measurements, and to find which group is better off by comparing the average of one group with that of the other. The most commonly used averages are: mean; median; mode.
  • 53.
    Objectives of central tendency: • To condense the entire mass of data • To facilitate comparison. A good measure of central tendency: o Should be easy to understand and compute o Should be based on each and every item in the series o Should not be affected by extreme variations o Should be capable of further statistical computation o Should have sampling stability
  • 54.
    MEAN: Refers to the arithmetic mean. It is the sum of all the observations divided by the total number of observations (n). Denoted by $\bar{x}$ for a sample and µ for a population. $\bar{x} = (x_1 + x_2 + x_3 + \dots + x_n) / n$. Advantage: it is easy to calculate. Disadvantage: it is influenced by extreme values.
  • 55.
    MEDIAN: When all the observations are arranged in either ascending or descending order, the middle observation is known as the median. When the number of observations is even, the average of the two middle values is taken. The median is a better indicator of the central value when there are extreme values, as it is not affected by them.
  • 56.
    MODE: The most frequently occurring observation in a data set is called the mode. Not often used in medical statistics. Example: number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, Mode = 2 (occurs 3 times).
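The three averages can be compared on the decayed-teeth example using Python's standard `statistics` module. The sketch uses only the nine values actually listed on the slide, which is an assumption about how to treat the incomplete list.

```python
import statistics

# The nine values listed on the MODE slide (number of decayed teeth per child).
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3]

mean_value = statistics.mean(decayed_teeth)      # sum of observations / n
median_value = statistics.median(decayed_teeth)  # middle value after sorting
mode_value = statistics.mode(decayed_teeth)      # most frequent value

print(mean_value, median_value, mode_value)  # 3 2 2
```

The single child with 10 decayed teeth pulls the mean up to 3, while the median and mode stay at 2, which illustrates why the mean is said to be influenced by extreme values.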
  • 57.
    VARIABILITY: Types of variability. There are three types of variability: biological variability; real variability; experimental variability.
  • 58.
    BIOLOGICAL VARIABILITY: It is the natural difference that occurs between individuals due to age, gender, and other inherent attributes. This difference is small, occurs by chance, and is within certain accepted biological limits, e.g. vertical dimension may vary from patient to patient.
  • 59.
    REAL VARIABILITY: Such variability is more than the normal biological limits. The cause of the difference is not inherent or natural and is due to some external factor, e.g. the difference in the incidence of cancer among smokers and non-smokers may be due to excessive smoking and not due to chance alone.
  • 60.
    EXPERIMENTAL VARIABILITY: It occurs due to the experimental study and is of three types. Observer error: the investigator may alter some information or not record the measurement correctly. Instrumental error: this is due to defects in the measuring instrument. Both observer and instrumental errors are called non-sampling errors. Sampling error or error of bias: this is the error that occurs when the samples are not chosen at random from the population, so the sample does not truly represent the population.
  • 61.
    Measures of variation or dispersion: Biological data collected by measurement show variation. Dispersion is the degree of spread or variation of a variable about a central value, e.g. the BP of an individual can show variation even if taken by a standardized method and measured by the same person. Thus one should know what the normal variation is and how to measure it.
  • 62.
    Measures of dispersion are mainly used: to determine the reliability of an average; to serve as a basis for the control of variability; to compare two or more series in relation to their variability; to facilitate further statistical analysis.
  • 63.
    The various measures of variation or dispersion are: range; mean or average deviation; standard deviation; coefficient of variation.
  • 64.
    RANGE: It is the simplest measure. Defined as the difference between the highest and the lowest figures in a sample. Defines the normal limits of a biological characteristic, e.g. freeway space ranges between 2 and 4 mm. Not satisfactory, as it is based on the two extreme values only.
  • 65.
    MEAN DEVIATION: It is the average of the deviations from the mean in any distribution, ignoring the + or - sign. Denoted by MD. $MD = \sum |X - \bar{x}| / n$, where X = observation, $\bar{x}$ = mean, n = number of observations.
  • 66.
    STANDARD DEVIATION: Also called root mean square deviation. It is an improvement over mean deviation and is the measure used most commonly in statistical analysis. Denoted by SD or s for a sample and σ for a population. Given by the formula $SD = \sqrt{\sum (x - \bar{x})^2 / n}$, with $n - 1$ used in place of $n$ for a sample. The greater the standard deviation, the greater the magnitude of dispersion from the mean. A small standard deviation means a high degree of uniformity of the observations. Usually, measurements beyond the range of ± 2 SD are considered rare or unusual in any distribution.
  • 67.
    Uses of standard deviation: It summarizes the deviation of a large distribution from its mean. It helps in finding a suitable sample size, e.g. greater deviation indicates the need for a larger sample to draw meaningful conclusions. It helps in the calculation of the standard error, which helps us to determine whether the difference between two samples is by chance or real.
  • 68.
    COEFFICIENT OF VARIATION: It is used to compare attributes having two different units of measurement, e.g. height and weight. Denoted by CV. CV = (SD × 100) / Mean, expressed as a percentage. The higher the CV, the greater the variation in the series of data.
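The four measures of dispersion described on the preceding slides can be computed side by side. In this sketch the data set, the function name `dispersion_summary`, and the choice of the sample SD (n - 1 divisor) for the CV are assumptions made for illustration.

```python
import math
import statistics

def dispersion_summary(values):
    """Range, mean deviation, standard deviation and coefficient of variation."""
    mean = statistics.fmean(values)
    sd_sample = statistics.stdev(values)          # divisor n - 1
    return {
        "range": max(values) - min(values),
        "mean_deviation": sum(abs(v - mean) for v in values) / len(values),
        "sd_population": statistics.pstdev(values),   # divisor n
        "sd_sample": sd_sample,
        "cv_percent": sd_sample * 100 / mean,     # assumption: CV from the sample SD
    }

# Hypothetical measurements (not from the presentation).
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(data)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For these eight values the mean is 5, the range is 7, the mean deviation is 1.5, and the SD is 2 with divisor n or about 2.14 with divisor n - 1.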
  • 69.
    Normal distribution or normal curve: A great deal of physiologic variation occurs in any observation, so it is necessary to define normal limits, to determine the chances of an observation being normal, and to determine the proportion of observations that lie within a given range. The normal distribution or normal curve, used most commonly in statistics, helps us to find these. A large number of observations with a narrow class interval gives a frequency curve called the normal curve.
  • 70.
    It has the following characteristics: bell shaped; bilaterally symmetrical; the frequency increases from one side, reaches its highest point, and decreases exactly the way it had increased; the highest point denotes the mean, median, and mode, which coincide; the maximum number of observations is at the value of the variable corresponding to the mean, and the number of observations gradually decreases on either side, with few observations at the extreme points.
  • 71.
    The area under the curve between any 2 points, which corresponds to the number of observations between any two values of the variate, can be found in terms of a relationship between the mean and the standard deviation. Mean ± 1 SD includes 68.27% of all observations; such observations are fairly common. Mean ± 2 SD includes 95.45% of all observations; by convention, values beyond this range are uncommon or rare, and the chance of a normal observation falling outside it is 100 - 95.45 = 4.55% only. Mean ± 3 SD includes 99.73% of all observations; values beyond this range are very rare, and the chance of a normal observation falling outside it is 0.27% only. This relationship is used for fixing confidence intervals.
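The percentages quoted on this slide can be checked numerically from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `area_within` is hypothetical.

```python
from statistics import NormalDist

def area_within(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    inside = area_within(k) * 100
    print(f"mean ± {k} SD: {inside:.2f}% inside, {100 - inside:.2f}% outside")
```

The proportions do not depend on the particular mean or SD, which is why the same three figures apply to any normally distributed measurement.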
  • 72.
    These limits on either side of the measurement are called confidence limits. The look of the frequency distribution curve may vary depending on the mean and SD, so it becomes necessary to standardize it. E.g., if one study has an SD of 3 and another has an SD of 2, it becomes difficult to compare them. The normal curve is therefore standardized by using the unit of standard deviation to place any measurement with reference to the mean. The curve that emerges through this procedure is called the standard normal curve.
  • 74.
    Relative or standard normal deviate: When a variable X follows a normal distribution with mean $\bar{x}$ and standard deviation S, the relative or standard normal deviate Z is given by $Z = (x - \bar{x}) / S$, i.e. Z = (Observation - Mean) / SD. The values of Z for several values of X form a normal distribution with mean 0 and SD 1.
  • 75.
    Probability or p value: Probability is the chance of occurrence of any event or permutation combination. It is denoted by p for a sample and P for a population. In various tests of significance we are often interested to know whether the observed difference between 2 samples is real or due to chance (sampling variation). Here the probability or p value is used.
  • 76.
    Probability: P ranges from 0 to 1. 0 = there is no chance that the observed difference is due to sampling variation. 1 = it is absolutely certain that the observed difference between 2 samples is due to sampling variation. However, such extreme values are rare. P = 0.4 means the chance that the difference is due to sampling variation is 4 in 10, and the chance that it is not due to sampling variation is 6 in 10.
  • 77.
    Probability: The essence of any test of significance is to find the p value and draw an inference. If the p value is 0.05 or more, it is customary to accept that the difference is due to chance (sampling variation); the observed difference is said to be statistically not significant. If the p value is less than 0.05, the observed difference is taken to be due not to chance but to the role of some external factor; the observed difference is then said to be statistically significant.
  • 78.
    From the shape of the normal curve: we know that about 95% of observations lie within mean ± 2 SD, so the probability of a value above or below this range is about 5%. From probability tables: the p value is read from probability tables in the case of the Student t test or chi-square test. By the area under the normal curve: here the standard normal deviate (z) is calculated, the area under the curve (A) corresponding to the z value is determined, and the probability is given by 2(0.5 - A).
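The area-under-the-curve route to a p value can be sketched as follows. The code assumes that A is the area between the mean and z, as in a standard normal table of that type; the function names and the example numbers (observation 130, mean 120, SD 5) are hypothetical.

```python
from statistics import NormalDist

def z_score(observation, mean, sd):
    """Standard normal deviate: z = (observation - mean) / SD."""
    return (observation - mean) / sd

def two_tailed_p(z):
    """p = 2 * (0.5 - A), with A assumed to be the area between the mean and z."""
    area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5   # A
    return 2 * (0.5 - area_mean_to_z)

# Hypothetical example: observation 130, mean 120, SD 5.
z = z_score(130, 120, 5)
p = two_tailed_p(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z = 2, so p is about 0.0455, which matches the 4.55% of observations lying beyond mean ± 2 SD; by the 0.05 convention this would be called statistically significant.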
  • 79.
    References: Soben Peter, Essentials of Preventive and Community Dentistry, second edition. G. N. Prabhakara, Biostatistics. T. Bhaskara Rao, Methods of Biostatistics.
  • 81.
    Tests of significance: Classified as parametric tests and non-parametric tests. They can also be divided into one-tailed and two-tailed tests.
  • 82.
    Parametric tests: Parametric tests are those in which certain assumptions are made about the population: the population from which the sample is drawn has a normal distribution; the variances of the samples do not differ significantly; the observations are truly numerical, so arithmetic procedures such as addition, division, and multiplication can be used. Since these tests make assumptions about the population parameters, they are called parametric tests. They are usually used to test a difference. They are: Student t test (paired or unpaired); ANOVA; test of significance between two means.
  • 83.
    Non-parametric tests: In many biological investigations the research worker may not know the nature of the distribution or other required values of the population. Also, some biological measurements may not be true numerical values, so arithmetic procedures are not possible. In such cases distribution-free or non-parametric tests are used, in which no assumptions are made about the population parameters, e.g. Mann-Whitney test; chi-square test; phi coefficient test; Fisher's exact test; sign test; Friedman's test.
  • 84.
    Two-tailed test: This test determines whether there is a difference between the two groups without specifying whether the difference is higher or lower. It includes both ends or tails of the normal distribution, and is therefore called a two-tailed test. If the objective is to conclude whether or not 2 samples are from the same population, without considering the direction of the difference between the means, a two-tailed test is used. E.g., when one wants to know whether the mean IQ of malnourished children is different from that of well-nourished children, without specifying whether it is more or less.
  • 85.
    One-tailed test: Used in a test of significance when one wants to know specifically whether the difference between the two groups is higher or lower, i.e. the direction (plus or minus side) is specified. One tail of the distribution is then excluded. If the objective is to conclude whether or not the mean of one sample is larger than that of the other, a one-tailed test is used. E.g., if one wants to know whether malnourished children have a lower mean IQ than well-nourished children, the higher side of the distribution is excluded. Such a test of significance is called a one-tailed test.