BIOSTATISTICS
PRESENTED BY: DR KAJOL BHAVSAR
GUIDED BY: DR JASUMA RAI
INDEX
• INTRODUCTION
• DEFINITIONS
• PRINCIPLES OF BIOSTATISTICS IN RESEARCH
• PRINCIPLES OF BIOSTATISTICS IN EPIDEMIOLOGY
• TYPES OF DATA
• SCALES OF DATA MEASUREMENT
• PROBABILITY DISTRIBUTION AND P VALUE
• CONFIDENCE INTERVAL
• MEASURES OF CENTRAL TENDENCY
• MEASURES OF DISPERSION
• NORMAL CURVE
• SKEWED DISTRIBUTION
• TEST OF SIGNIFICANCE
• CORRELATION
• REGRESSION
INTRODUCTION
• Analysis & interpretation of data is done using
biostatistics.
• Francis Galton is considered the “Father of Biostatistics”.
Sir Francis Galton was the first to use statistical tools to study
differences among human populations. He also pioneered
the use of questionnaires and surveys for collecting data
on human communities.
DEFINITIONS
• Statistics is the science of compiling, classifying and tabulating
numerical data and expressing the results in mathematical and
graphical form.
• Biostatistics is the branch of statistics concerned with the
mathematical facts and data related to biological events.
• A variable is a characteristic of an object that can be measured or
categorized.
• Data are a set of values of one or more variables recorded on one or
more individuals.
PRINCIPLES OF BIOSTATISTICS IN
RESEARCH
• Collecting, analyzing and interpreting data are
essential components of biomedical research and
biostatistics. Two principles follow from this.
• The first is to choose the right statistical test for the computer
to perform, based on the nature of the data derived from one's
own research.
• The second is to judge whether an analysis was performed
appropriately when reviewing and interpreting
others' research.
PRINCIPLES OF BIOSTATISTICS IN
EPIDEMIOLOGY
• To test whether difference between two populations is
real or a chance occurrence
• To study correlation between attributes in same
population
• To evaluate efficacy of vaccines, sera etc.
• To measure mortality and morbidity
• To fix priorities in public health programs.
TYPES OF DATA
Qualitative
Data
• Characteristics that cannot be
expressed as a numerical value.
• Example: Gender: Male/Female,
OHI: Good/Fair/Poor
Quantitative
Data
• Can be naturally expressed in
numerical value.
• Example: Age in years, Pocket depth
in mm
DATA
DISCRETE DATA
A random variable
whose value is obtained
by counting( only takes
certain values)
Eg: No. of decayed teeth,
Size of family & no. of
erupted permanent
teeth
CONTINUOUS DATA
A random variable
whose value is obtained
by measuring (can take
any value)
Eg: Pocket depth,
Amount of blood loss,
Temperature
SCALES OF DATA MEASUREMENT
A scale of measurement is how variables are defined and categorised
NOMINAL
ORDINAL
INTERVAL
RATIO
NOMINAL SCALE
• Simplest type of data, in which the values are unordered categories
• Consists of names/categories/classes of things-mutually exclusive
• No quantitative relationship among the categories
• Example:
• Gender (Male/Female)
• Blood group(A,B,AB,O)
• Speciality area in dentistry
ORDINAL SCALE
• Categories can be ordered or ranked
• Ranks subjects based on their relative standing on a specific
attribute
• The amount of the difference between any two categories ,
though they can be ordered , is not quantified.
• Examples:
• Post surgery pain: No pain/Mild pain/Moderate pain/Severe pain
• Tooth mobility
INTERVAL SCALE
• Observations can be ordered, and precise differences
between units of measure exist. However, there is no
meaningful absolute zero
• Difference between variables is meaningful whereas
ratio isn’t
• Example:
• IQ scale
• Temperature (in °C or °F)
RATIO SCALE
• Similar to interval scale but there exists a true absolute
zero
• Difference between variables and ratio are meaningful.
• Examples:
• Treatment cost in Rupees
• Attachment loss in mm
• Inter condylar distance
• Weight in kgs
PROBABILITY DISTRIBUTION
The probability distribution of a discrete random variable
is a table, graph, formula, or other device used to specify
all possible values of a discrete random variable along
with their respective probabilities.
It is a way to enumerate the different values the variable can
have, and how frequently each value appears in the
population.
• The actual frequency distribution is approximated by a
theoretical curve that is used as a probability
distribution. Egs: binomial and normal
• Most statistical analyses in health research use one of
a few common probability distributions.
• For eg: suppose a die is thrown 10 times. The
probability of getting a 2 on any one throw is ⅙, so the
number of 2s in the 10 throws follows a binomial
distribution with n = 10 and p = ⅙.
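The die example can be tabulated directly. The following is a minimal sketch, assuming "success" means rolling a 2 and that the throws are independent; the function name `binomial_pmf` is illustrative and not part of the original slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 1 / 6  # ten throws; success = rolling a 2

# Probability of each possible count of 2s (0 through 10)
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(distribution):
    print(f"P(exactly {k} twos in {n} throws) = {prob:.4f}")
```

The probabilities over all possible counts add up to 1, which is what makes the table a probability distribution.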
PROBABILITY DISTRIBUTION
• The binomial distribution has 2 parameters, n and p. It occurs
when a fixed no. of subjects is observed, the
characteristic is dichotomous in nature (two possible values) and
each subject has the same probability (p) of having one value and (1-
p) of having the other value.
• The statistical inference then involves finding out the value of p in
the population, based on an observation of a selected sample.
• The normal distribution, on the other hand, is a mathematical curve
defined by 2 quantities, μ and σ.
• The former represents the mean of the values of the variable, and the latter, the
standard deviation.
Binomial distribution-It is a
type of frequency
distribution used when there
are exactly two outcomes
from each trial(success or
failure)
Normal distribution- A
probability distribution that
is symmetric about the mean,
showing that data near the
mean are more frequent in
occurrence than the data far
from the mean.
DIFFERENCE BETWEEN BINOMIAL &
NORMAL DISTRIBUTION
BINOMIAL DISTRIBUTION
• The number of tests is fixed, and each
test is independent. So the probability
of success in each trial is exactly the
same
• Discrete data
• The distribution is based on 2
outcomes of success or failure
NORMAL DISTRIBUTION
• The number of tests can be infinite
• Continuous data
• The distribution is based on the
values in a dataset
EXAMPLE OF BINOMIAL DISTRIBUTION
NORMAL CURVE/NORMAL
DISTRIBUTION
• When data are collected from a very large no. of people and a frequency
distribution is made with narrow class intervals, the resulting curve is
smooth and symmetrical, and it is called the normal curve.
• A distribution of this nature or shape is called a normal
distribution or Gaussian distribution.
CHARACTERISTICS OF NORMAL CURVE
• Smooth, symmetrical bell shaped curve
• The total area under the curve is 1
• For the standard normal curve, the mean is 0 and the standard deviation is 1
• The mean, median and mode coincide at the
centre. The maximum no. of observations is at the
centre, and the frequency gradually decreases towards the extremities
on either side
SKEWED DISTRIBUTION
• A frequency distribution is considered asymmetrical if the frequencies
are not equally distributed on both the sides of central value
SKEWED DISTRIBUTION
P VALUE
• A p value is the probability that the computed value of a test statistic is
at least as extreme as a specified value of the test statistic when the null
hypothesis is true. Thus, the p value is the smallest value of α for which
we can reject a null hypothesis.
• It is the probability that a difference at least as extreme as the one found in
the observed data would have occurred by chance when the null hypothesis is
true
• In other words, it is the probability of getting a difference at least as large as
the one observed, due to chance alone, when there is truly no difference
• A high P value (e.g. 0.8, 0.6 or 0.1) indicates a high probability of getting
the observed result by chance, which favors the null hypothesis
• A low P value (e.g. 0.001 or 0.01; by convention any value below 0.05)
indicates a low probability of getting the observed result by chance, so
the null hypothesis is rejected.
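As an illustration of the definition, a p value can be computed exactly for a simple binomial experiment. This is a minimal sketch with hypothetical data (15 heads in 20 coin tosses, null hypothesis p = 0.5, one-sided alternative); the function name and the numbers are assumptions, not taken from the slides.

```python
from math import comb

def upper_tail_p_value(successes, n, p_null):
    """P(X >= successes) when X ~ Binomial(n, p_null), i.e. under H0."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )

# Hypothetical experiment: 15 heads observed in 20 tosses of a coin.
# H0: the coin is fair (p = 0.5); H1: heads are more likely (p > 0.5).
p_value = upper_tail_p_value(successes=15, n=20, p_null=0.5)

alpha = 0.05  # conventional significance level
reject_null = p_value < alpha

print(f"p value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```

The p value here is the chance of seeing a result at least this extreme if the null hypothesis were true; it is not the probability that the null hypothesis is true.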
WHEN TO USE P VALUES? ALMOST
ALWAYS
• Valid calculation of a p-value requires minimal assumptions, fewer than
any other statistical method.
• The “assumptions” need not be merely assumed, but can be assured by
a properly designed experiment (randomization + non-parametric test)
P-VALUES AND STATISTICAL
SIGNIFICANCE EXPLAINED
HYPOTHESIS TESTING
• It is a process of deciding statistically whether the
findings are real or by chance.
• Research in which an independent variable is
manipulated is called experimental hypothesis testing
research
• Research in which an independent variable is not
manipulated is called non-experimental hypothesis
testing research
• The use of null hypothesis is quite frequent
DIFFERENCE BETWEEN NULL AND
ALTERNATIVE HYPOTHESIS
NULL HYPOTHESIS
• There is no difference between
two groups
• It is denoted by H0
• It is followed by an ‘equal to’ sign
• The researcher tries to disprove the
null hypothesis.
ALTERNATIVE HYPOTHESIS
• There is a relationship between
two selected variables in a study
• It is denoted by H1 or Ha
• It is followed by a ‘not equal to’,
‘greater than’ or ‘less than’ sign.
• The researcher tries to prove the
alternative hypothesis
STEPS IN HYPOTHESIS TESTING PROCEDURE
CONFIDENCE INTERVAL
• A confidence interval is a statistical measure used to indicate the range
of estimates within which an unknown statistical parameter is likely to
fall. If the parameter is the population mean, the confidence interval is
an estimate of possible values of the population mean.
• A confidence interval is determined through use of observed (sample)
data and is calculated at a selected confidence level (chosen prior to the
computation of the confidence interval). This confidence level, such as a
95% confidence level, indicates the reliability of the estimation
procedure; it is not the degree of certainty that the computed confidence
interval contains the true value of the parameter being studied.
Specifically, the confidence level indicates the proportion of confidence
intervals, that when constructed given the chosen confidence level over
an infinite number of independent trials, will contain the true value of the
parameter.
CONFIDENCE INTERVAL
• For example, if 100 confidence intervals are computed at a 95%
confidence level, it is expected that 95 of these 100 confidence intervals
will contain the true value of the given parameter; it does not say
anything about individual confidence intervals. If 1 of these 100
confidence intervals is selected, we cannot say that there is a 95%
chance it contains the true value of the parameter – this is a common
misconception. The selected confidence interval will either contain or
will not contain the true value, but we cannot say anything about the
probability of a specific confidence interval containing the true value of
the parameter.
• Confidence intervals are typically written as (some value) ± (a range).
The range can be written as an actual value or a percentage. It can also
be written as simply the range of values. For example, the following are
all equivalent confidence intervals:
• 20.6 ±0.887
• or
• 20.6 ±4.3%
• or
• [19.713 – 21.487]
CALCULATING CONFIDENCE INTERVAL
• The calculation described here applies to normally
distributed data with an unknown mean, but known standard
deviation. It does not cover confidence intervals for data with
an unknown mean and unknown standard deviation.
• Calculating a confidence interval involves determining the
sample mean, X̄, and the population standard deviation, σ, if
possible. If the population standard deviation cannot be used,
then the sample standard deviation, s, can be used when the
sample size is greater than 30
• For a sample size greater than 30, the population standard
deviation and the sample standard deviation will be similar.
• Depending on which standard deviation is known, the equation
used to calculate the confidence interval differs.
FORMULA FOR CI
CI = X̄ ± Z × σ/√n
Where Z is the Z-value for the chosen confidence level, X̄ is the
sample mean, σ is the standard deviation, and n is the sample
size. Assuming the following with a confidence level of 95%:
The confidence interval is:
Z VALUES FOR CI
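The Z-values for common confidence levels can be recomputed from the standard normal distribution, and the interval X̄ ± Z × σ/√n follows from them. This is a minimal sketch using Python's standard library; the function names and the inputs (sample mean 50, σ = 10, n = 100) are hypothetical and are not the values used in the original slide.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

def z_confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Interval sample_mean +/- Z * sigma / sqrt(n), for known sigma."""
    margin = z_value(confidence_level) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Z-values for commonly used confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: Z = {z_value(level):.3f}")

# Hypothetical inputs, chosen only for illustration
low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

This applies only when the population standard deviation is known (or the sample is large enough for s to stand in for σ, as stated above).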
CONFIDENCE INTERVAL AND P-VALUES
CALCULATION OF CI
• Confidence Interval: 20.6 ±0.887 (±4.3%) [19.713 – 21.487]
• STEPS:
DIFFERENCE BETWEEN CONFIDENCE
INTERVAL AND P VALUE
• Confidence Interval
• A confidence interval calculated for a
measure of treatment effect shows the
range within which the true treatment
effect is likely to lie.
• Confidence intervals are preferable to p-
values, as they tell us the range of
possible effect sizes compatible with
the data, and thus provide clinically
relevant information.
• P value
• A p-value is calculated to assess whether
differences between treatments are likely
to have occurred simply through chance,
or whether they are likely to represent a
genuine effect.
P-values simply provide a cut-off beyond
which we assert that the findings are
‘statistically significant’ (by convention, this is
p<0.05)
MEASURES OF CENTRAL TENDENCY-
OBJECTIVES
• It is a central value around which other values are distributed
• To condense entire mass of the data
• To facilitate comparison
MEASURES OF CENTRAL TENDENCY
ARITHMETIC MEAN
• Most frequently used measure of central tendency is arithmetic mean
• Most widely used & dependent on value of every observation
• Balance point of distribution
• Mean = sum of all observations / total no. of observations
• The no. of missing teeth in 5 school children are 2,2,1,4 and 1
• Mean no. of missing teeth = (2+2+1+4+1)/5 = 10/5 = 2
MERITS AND DEMERITS OF MEAN
MERITS
• Based on all observations
• Easy to calculate and
understand
• Least affected by sampling
fluctuation, hence more
stable
• No need to arrange data
DEMERITS
• Can be used only for
quantitative data
• Affected by extreme
observations
• Unrealistic
• Graphical presentation is
not possible
MEDIAN
• Mid value of series
• Positional average
• Half of the observations have a value larger than median and the rest
have a value smaller than median
• Arrange all the observations in ascending or descending order
• If the number of observations (n) is odd, the median is the
middle value: find the value at position (n+1)/2
• If the number of observations is even, the median is the
average of the two middle values, at positions n/2 and n/2 + 1
MEDIAN
• Missing teeth in 5 school children are 2,3,1,4 and 9
• Arranging them in order: 1,2,3,4,9
• The median is 3
• Missing teeth in 6 school children are 2,3,1,4,5 and 9
• Arrange them in order: 1,2,3,4,5,9
• The median is 3.5
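The mean and median examples can be checked with Python's standard `statistics` module. This is a minimal sketch using the missing-teeth counts from the slides; the helper `median_by_position` is an illustrative name and simply applies the odd/even position rule.

```python
from statistics import mean, median

def median_by_position(values):
    """Median via the position rule on the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]             # position (n+1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # positions n/2, n/2+1

teeth_mean_example = [2, 2, 1, 4, 1]   # mean example
teeth_odd = [2, 3, 1, 4, 9]            # median example, 5 children
teeth_even = [2, 3, 1, 4, 5, 9]        # median example, 6 children

print("mean:", mean(teeth_mean_example))
print("median (odd n):", median_by_position(teeth_odd))
print("median (even n):", median_by_position(teeth_even))
```

The extreme value 9 pulls the mean of the second series up to 3.8, while the median stays at the middle observation.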
PROPERTIES OF MEDIAN
• Uniqueness- As is true with the mean, there is
only one median for a given set of data
• Simplicity- The median is easy to calculate
• It is not drastically affected by extreme values
as is the mean
MERITS AND DEMERITS OF MEDIAN
MERITS
• Not affected by extreme
observations
• Graphical presentation is possible
DEMERITS
• Affected more by sampling
fluctuations
• Equal importance to all data
• Need to arrange data: ascending
or descending
MODE
• Most frequently occurring observation in a series
• If a series has 2 items with highest frequency – bimodal distribution
• Not affected by extreme values
• For moderately skewed distributions, approximately: Mode = 3 × median − 2 × mean
MODE
The given table shows an ordered array of ages of subjects.
A count of the ages in the table reveals that the age 53 occurs most frequently (17 times). The
mode for this population of ages is 53.
MERITS AND DEMERITS OF MODE
MERITS
• Not affected by extreme
observations
• Both for quantitative and
qualitative data
DEMERITS
• Not rigidly defined
• Cannot be used for further
mathematical calculation
MEASURES OF DISPERSION
• The measures of dispersion help to know how widely the observations
are spread on either side of the average
• Dispersion is the degree of spread or variation of the variable about a
central value
• Most common measures of dispersion used in dental science are:
RANGE, MEAN DEVIATION AND STANDARD DEVIATION
RANGE
• It is the difference between the highest and lowest values
• This measure gives no information about the values that lie between
the extreme values
MEAN DEVIATION
• It is the average of the absolute deviations from the arithmetic mean
• The mean deviation is one way of measuring how closely the individual
scores in the data set cluster around the mean
MEAN DEVIATION
• The no. of decayed teeth in 6 school children are 1,3,5,7,9,11
respectively
• Mean (X̄) = (1+3+5+7+9+11)/6 = 36/6 = 6
• Absolute deviations from the mean: 5, 3, 1, 1, 3, 5; their sum is 18
• MD = 18/6 = 3
The average distance between each child’s caries status and the mean caries
status of the population is 3 units
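The same calculation can be written out step by step. This is a minimal sketch using the decayed-teeth counts from the example; `mean_deviation` is an illustrative name.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

decayed_teeth = [1, 3, 5, 7, 9, 11]

arithmetic_mean = sum(decayed_teeth) / len(decayed_teeth)
absolute_deviations = [abs(x - arithmetic_mean) for x in decayed_teeth]
md = mean_deviation(decayed_teeth)

print("mean:", arithmetic_mean)
print("absolute deviations:", absolute_deviations)
print("mean deviation:", md)
```

Taking absolute values is what "ignoring the algebraic signs" means: without it the deviations from the mean would always sum to zero.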
LIMITATIONS OF MEAN DEVIATION
• Though based on all values, it ignores the algebraic signs of deviations –
limited sense
• Rarely used in modern statistics-replaced with standard deviation
STANDARD DEVIATION
• It is the most important and widely used measure of studying
dispersion
• Also known as root mean square deviation because it is square
root of the mean of the squared deviations from arithmetic mean
• Greater the standard deviation , greater will be the magnitude of
dispersion from the mean
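The "root mean square deviation" description translates directly into code. This is a minimal sketch reusing the decayed-teeth counts from the mean deviation example; it also shows the sample version (dividing by n − 1), which is an addition not discussed in the slides and is what `statistics.stdev` computes.

```python
from math import sqrt
from statistics import pstdev, stdev

def root_mean_square_deviation(values):
    """Square root of the mean of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values))

decayed_teeth = [1, 3, 5, 7, 9, 11]

sd_population = root_mean_square_deviation(decayed_teeth)  # divides by n
sd_sample = stdev(decayed_teeth)                           # divides by n - 1

print(f"SD (divide by n):     {sd_population:.3f}")
print(f"SD (divide by n - 1): {sd_sample:.3f}")
```

Squaring removes the signs of the deviations without discarding them, which is the merit listed for the standard deviation over the mean deviation.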
MERITS OF STANDARD DEVIATION
• Most widely used measure of dispersion
• Based on all values of observation
• Signs of deviation not discarded, instead eliminated
by squaring
WHAT IS STANDARD ERROR?
• The standard error (SE) is the standard deviation of the
sampling distribution of the sample mean.
• In practice it is estimated from a particular sample
as the sample standard deviation divided by the
square root of the sample size: SE = SD/√n.
NEED OF STANDARD ERROR
• Used as an instrument in testing a given
hypothesis.
• It provides an idea about the unreliability of
the sample
• S.E. helps in determining the limits within
which the values are expected to lie
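A short sketch of how the standard error of the mean is usually estimated, assuming the common formula SE = SD/√n with the sample standard deviation. The data are the decayed-teeth counts from the mean deviation example, reused here purely for illustration; `standard_error` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

decayed_teeth = [1, 3, 5, 7, 9, 11]

se = standard_error(decayed_teeth)
print(f"mean = {mean(decayed_teeth)}, SE = {se:.3f}")
```

A larger sample gives a smaller SE for the same spread, which is why SE is read as a measure of how unreliable the sample mean is.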
TEST OF SIGNIFICANCE
• It deals with techniques to find out how far the difference between the
groups is due to sampling variation.
• Start with the assumption that the null hypothesis is true.
• There are 2 types of tests:
• -Parametric tests: applied when the data are normally
distributed.
• -Non parametric tests: applied when the data are not normally
distributed.
RELATIVE RISK
• Relative risk is the ratio of the risk of developing a disease among
subjects with the risk factor to the risk of developing the disease among
subjects without the risk factor.
• We represent the relative risk from a prospective study symbolically as
RR = [a/(a+b)] / [c/(c+d)]
CLASSIFICATION OF A SAMPLE OF SUBJECTS
WITH RESPECT TO DISEASE STATUS & RISK
FACTOR
RISK FACTOR | DISEASE PRESENT | DISEASE ABSENT | TOTAL AT RISK
Present | a | b | a+b
Absent | c | d | c+d
Total | a+c | b+d | n
The data resulting from a prospective study in which the dependent variable and the risk factor
are both dichotomous may be displayed in a 2×2 contingency table. The risk of the development
of the disease among the subjects with the risk factor is a/(a+b), and the risk of the development of the
disease among the subjects without the risk factor is c/(c+d).
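The two risks and their ratio can be computed from the cell counts a, b, c, d of the 2×2 table. This is a minimal sketch; the counts used (20, 80, 10, 90) are hypothetical and chosen only for illustration.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    a, b: diseased / not diseased among subjects WITH the risk factor
    c, d: diseased / not diseased among subjects WITHOUT the risk factor
    """
    risk_with_factor = a / (a + b)
    risk_without_factor = c / (c + d)
    return risk_with_factor / risk_without_factor

# Hypothetical prospective study: 100 exposed and 100 unexposed subjects
rr = relative_risk(a=20, b=80, c=10, d=90)
print(f"relative risk = {rr:.2f}")
```

A relative risk above 1 means the disease is more frequent among subjects with the risk factor; a value of 1 means no difference in risk.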
ODDS RATIO
• The odds for success are the ratio of the probability of success to the
probability of failure.
• We use this definition of odds to define two odds that we can calculate
from data displayed in the table:
• 1. The odds of being a case (having the disease) to being a control (not
having the disease) among subjects with the risk factor:
[a/(a+b)]/[b/(a+b)] = a/b
• 2. The odds of being a case (having the disease) to being a control (not
having the disease) among subjects without the risk factor:
[c/(c+d)]/[d/(c+d)] = c/d
• The odds ratio is the ratio of these two odds: OR = (a/b)/(c/d) = ad/bc
RISK FACTOR | CASES | CONTROLS | TOTAL
PRESENT | a | b | a+b
ABSENT | c | d | c+d
TOTAL | a+c | b+d | n
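The odds in each row of the table, and the ratio of the two (the odds ratio, ad/bc), can be sketched as follows. The counts are hypothetical and the function name `odds_ratio` is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 case-control table.

    a, b: cases / controls among subjects WITH the risk factor
    c, d: cases / controls among subjects WITHOUT the risk factor
    """
    odds_with_factor = a / b       # [a/(a+b)] / [b/(a+b)]
    odds_without_factor = c / d    # [c/(c+d)] / [d/(c+d)]
    return odds_with_factor / odds_without_factor

# Hypothetical counts, for illustration only
or_estimate = odds_ratio(a=20, b=80, c=10, d=90)
print(f"odds ratio = {or_estimate:.2f}")
```

The result equals the cross-product (a × d)/(b × c), which is the form usually quoted for case-control studies.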
PARAMETRIC TESTS:
• Unpaired t-test (Student's t-test for two independent samples)
• This test is applied to unpaired data of independent observations made
on individuals of 2 different or separate groups or samples drawn from 2
populations, to test if the difference between the means is real or can be
attributed to sampling variability.
• Criteria for applying ‘t’ test:
• -Sample must be randomly selected
• -Data must be quantitative
• -Sample size should be <30
PARAMETRIC TEST
• Paired sample t-test
• -used in the context of paired samples, in studies comparing differences in
outcome before and after an intervention or in studies on paired organs
• -this test is used when a measurement on one entity is related to a
measurement on the other, i.e. when observations are dependent.
PARAMETRIC TEST
• Independent samples t-test
• -When comparison of 2 independent groups on a continuous outcome
is required
• -used in case control studies
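The test statistic for two independent groups can be sketched as below. This assumes the pooled-variance (equal variances) form of the t-test, which is one common variant and may not be the one intended by the slides; the two groups of measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical pocket depths (mm) in two independent groups
group_a = [4, 5, 6, 7, 8]
group_b = [1, 2, 3, 4, 5]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The computed t is then compared with the tabulated critical value for the same degrees of freedom (2.306 for 8 degrees of freedom at the two-sided 5% level).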
PARAMETRIC TEST
• Analysis of variance(ANOVA)
• It is used when comparison of more than two independent groups on a
continuous outcome is required
• This is to test whether the mean of the outcome variables in the different
groups are same
• Many situations involve collecting data on 3 or more groups of
individuals , with the objective of determining whether any true
differences in mean performance exist among conditions under the
study
PARAMETRIC AND NON PARAMETRIC TEST
Comparisons | Hypothesis tested | Parametric test | Hypothesis tested | Non-parametric test
Single group | Sample mean not different from population mean | One sample t test (<30), Z test (>30) | Sample median not different from population median | Chi square test
Two independent samples | 2 population means are equal | Unpaired t-test | Two population medians are equal | Mann Whitney test
Two related samples | Mean difference = 0 | Paired t-test | Median difference = 0 | Wilcoxon’s signed rank test
Three or more independent samples | All population means are equal | ANOVA | All population medians are equal | Kruskal Wallis test
PARAMETRIC AND NON PARAMETRIC
TESTS
Comparisons | Hypothesis tested | Parametric test | Hypothesis tested | Non-parametric test
Three or more dependent samples | Difference between means of each population group is equal | Repeated measures ANOVA | Difference between medians of each population group is equal | Mann Whitney test for each group
Relation between two continuous variables | For normal data | Pearson’s correlation coefficient | For non-normal data or ordinal data | Spearman’s correlation coefficient
PARAMETRIC TEST
• One sample: t-test, Z-test
• Two sample, independent samples: two group t-test, Z-test
• Two sample, paired samples: paired t-test
NON PARAMETRIC TEST
• One sample: Chi-square, Kolmogorov-Smirnov test, Runs test
• Two sample, paired samples: Sign test, Wilcoxon test, McNemar, Chi square test
• Two sample, independent samples: Chi-square, Mann-Whitney, Kolmogorov-Smirnov test
QUALITATIVE
• Chi-Square test- Tests association between categorical variables
• McNemar test- Counterpart of the paired t-test for paired categorical data
• Fisher's exact test- used when an expected value is <5 in any cell of a
chi square contingency table.
NON PARAMETRIC TEST
• The Chi square test for qualitative data
• -developed by Karl Pearson
• When data is measured in terms of attributes or qualities , and it is
intended to test whether difference in distribution of attributes in
different groups is due to sampling variation or not , the Chi square test
is applied.
• Used to test significance of difference between two proportions and can
be used when there are more than 2 groups to be compared .
• Table: Demographic data and periodontal parameters. FMPS, full-mouth
plaque score; FMBS, full-mouth percentage bleeding score. Data were
presented as mean ± SD or percentage; p<0.05 indicates a significant difference
between groups.
Since the data distributions were not normal, non-parametric tests were used to evaluate the analysis. The Chi
Square test was used in this table.
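How the Chi square statistic is built from a contingency table can be sketched as follows: expected counts come from the row and column totals, and the statistic sums (observed − expected)² / expected over all cells. The 2×2 counts are hypothetical and unrelated to the table referred to above.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = two groups, columns = attribute present / absent
table = [[20, 30],
         [30, 20]]

chi2, dof = chi_square_statistic(table)
print(f"chi-square = {chi2:.2f} with {dof} degree(s) of freedom")
```

The statistic is compared with the tabulated critical value (3.841 for 1 degree of freedom at the 5% level); when any expected count is below 5, Fisher's exact test is the usual alternative.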
CORRELATION
• It is the relationship between two sets of variables
• The magnitude or degree of relationship between two variables is called the
correlation coefficient and is denoted by ‘r’
• The correlation coefficient ranges from -1 to +1, i.e. -1 ≤ r ≤ +1
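A minimal sketch of how ‘r’ is computed, assuming Pearson's product-moment formula (the coefficient listed for normally distributed data in the table of tests). The data and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]    # increases exactly with x
y_falling = [10, 8, 6, 4, 2]   # decreases exactly with x

r_positive = pearson_r(x, y_rising)
r_negative = pearson_r(x, y_falling)
print(f"r (rising) = {r_positive:.2f}, r (falling) = {r_negative:.2f}")
```

The two series mark the ends of the range: a perfect positive linear relationship and a perfect negative one.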
REGRESSION
• It is a statistical method for studying the relationship between a single
dependent variable and one or more independent variables
• It is used to find the association between two variables when there are many
variables
• Regression allows prediction of the dependent variable, but prediction
alone does not establish causation
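For a single independent variable, the relationship is usually summarised by a fitted straight line. This is a minimal sketch assuming ordinary least squares, which the slides do not name explicitly; the data and the function name are hypothetical.

```python
def least_squares_line(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: x = independent variable, y = dependent variable
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

slope, intercept = least_squares_line(x, y)
predicted_at_5 = intercept + slope * 5   # estimate y for a new x value
print(f"y = {intercept:.1f} + {slope:.1f} * x; predicted y at x = 5: {predicted_at_5:.1f}")
```

The fitted line is used to estimate the dependent variable from a given value of the independent variable.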
COMPARISON OF CORRELATION &
REGRESSION
CORRELATION
• Correlation is a statistical measure
that determines the association
between the two variables
• It is used to represent a linear
relationship between two
variables.
• There is no difference between
dependent and independent
variables
• To find a numerical value
expressing the relationship
between the variables
REGRESSION
• Regression describes how to
numerically relate an independent
variable to a dependent variable
• To estimate one variable based on
another
• Both the variables are different
• To estimate the values of random
variables based on the values of
fixed variables
EXAMPLE OF CORRELATION
CONCLUSION
• Research is a quest for knowledge through diligent search or
investigation or experimentation aimed at discovery and interpretation
of new knowledge
• Scientific method is a systematic body of procedures and techniques
applied in carrying out investigation or experimentation targeted at
obtaining new knowledge
• Research and scientific methods may be considered a course of critical
inquiry leading to the discovery of facts or information, which increases our
understanding of human health and disease.
REFERENCES
• 1. Soben Peter, Essentials of Public Health Dentistry, 6th Edition, Arya
Medi Publishing House Pvt. Ltd,
• 2. Wayne W. Daniel, Biostatistics Basic Concepts and Methodology for
the Health Sciences, 9th Edition, Wiley India Pvt. Ltd, 2014
THANK YOU

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

BIOSTATISTICS.pptx

  • 1. BIOSTATISTICS PRESENTED BY: DR KAJOL BHAVSAR GUIDED BY: DR JASUMA RAI
  • 2. INDEX • INTRODUCTION • DEFINITIONS • PRINCIPLES OF BIOSTATISTICS IN RESEARCH • PRINCIPLES OF BIOSTATISTICS IN EPIDEMIOLOGY • TYPES OF DATA • SCALES OF DATA MEASUREMENT • PROBABILITY DISTRIBUTION AND P VALUE • CONFIDENCE INTERVAL • MEASURES OF CENTRAL TENDENCY • MEASURES OF DISPERSION • NORMAL CURVE • SKEWED DISTRIBUTION • TEST OF SIGNIFICANCE • CORRELATION • REGRESSION
  • 3. INTRODUCTION • Analysis and interpretation of data are done using biostatistics. • Francis Galton is considered the “Father of Biostatistics”. He was the first to use statistical tools to study differences among human populations, and he introduced the use of questionnaires and surveys for collecting data on human communities.
  • 4. DEFINITIONS • Variable: a characteristic of an object that can be measured or categorized. • Data: a set of values of one or more variables recorded on one or more individuals. • Statistics: the science of compiling, classifying and tabulating numerical data and expressing the results in mathematical and graphical form. • Biostatistics: the branch of statistics concerned with mathematical facts and data related to biological events.
  • 5. PRINCIPLES OF BIOSTATISTICS IN RESEARCH • Collecting, analyzing and interpreting data are essential components of biomedical research and biostatistics. • Investigators and interpreting clinicians need the basics of biostatistics for two reasons. • First, to choose the right statistical test for the computer to perform, based on the nature of the data derived from one's own research. • Second, to judge whether an analysis was performed appropriately when reviewing and interpreting others' research.
  • 6. PRINCIPLES OF BIOSTATISTICS IN EPIDEMIOLOGY • To test whether difference between two populations is real or a chance occurrence • To study correlation between attributes in same population • To evaluate efficacy of vaccines, sera etc. • To measure mortality and morbidity • To fix priorities in public health programs.
  • 7. TYPES OF DATA Qualitative Data • Characteristics that cannot be expressed as numerical value. Eg: Gender: Male /Female • OHI: Good/Fair/Poor Quantitative Data • Can be naturally expressed in numerical value. • Example: Age in years, Pocket depth in mm
  • 8. DATA • DISCRETE DATA: a random variable whose value is obtained by counting (it takes only certain values). Eg: No. of decayed teeth, size of family, no. of erupted permanent teeth • CONTINUOUS DATA: a random variable whose value is obtained by measuring (it can take any value within a range). Eg: Pocket depth, amount of blood loss, temperature
  • 9. SCALES OF DATA MEASUREMENT • A scale of measurement describes how variables are defined and categorised. • The four scales are: Nominal, Ordinal, Interval, Ratio
  • 10. NOMINAL SCALE • Simplest type of data, in which the values are unordered categories • Consists of names/categories/classes of things-mutually exclusive • No quantitative relationship among the categories • Example: • Gender (Male/Female) • Blood group(A,B,AB,O) • Speciality area in dentistry
  • 11. ORDINAL SCALE • Categories can be ordered or ranked • Ranks subjects based on their relative standing on a specific attribute • The amount of the difference between any two categories , though they can be ordered , is not quantified. • Examples: • Post surgery pain: No pain/Mild pain/Moderate pain/Severe pain • Tooth mobility
  • 12. INTERVAL SCALE • Observations can be ordered, and precise differences between units of measure exist. However, there is no meaningful absolute zero • Differences between values are meaningful, whereas ratios are not • Example: • IQ scale • Temperature in °C or °F
  • 13. RATIO SCALE • Similar to interval scale but there exists a true absolute zero • Difference between variables and ratio are meaningful. • Examples: • Treatment cost in Rupees • Attachment loss in mm • Inter condylar distance • Weight in kgs
  • 14. PROBABILITY DISTRIBUTION • The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of the variable along with their respective probabilities. It enumerates the different values the variable can take and how frequently each value appears in the population. • The actual frequency distribution is approximated by a theoretical curve that is used as a probability distribution. Eg: binomial and normal • Most statistical analyses in health research use one of these common probability distributions. • Example: suppose a die is thrown 10 times. The probability of getting a 2 on any one throw is ⅙, so the number of 2s in 10 throws follows a binomial distribution with n = 10 and p = ⅙.
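The die example on this slide can be sketched in a few lines of Python. This is only an illustration using the standard library; the helper name binomial_pmf is hypothetical and does not come from the slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Die thrown 10 times; "success" means getting a 2 (p = 1/6), as on the slide.
n, p = 10, 1 / 6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(number of 2s = {k}) = {prob:.4f}")
```

The probabilities over k = 0 to 10 add up to 1, which is what makes the list a probability distribution.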
  • 15. PROBABILITY DISTRIBUTION • The binomial distribution has 2 parameters, n and p. It arises when a fixed number of subjects (n) is observed, the characteristic is dichotomous (two possible values), and each subject has the same probability p of having one value and (1 − p) of having the other. • Statistical inference then involves estimating the value of p in the population from an observed sample. • The normal distribution, on the other hand, is a mathematical curve described by 2 quantities, m and s. • The former is the mean of the variable and the latter its standard deviation.
  • 16. Binomial distribution-It is a type of frequency distribution used when there are exactly two outcomes from each trial(success or failure) Normal distribution- A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than the data far from the mean.
  • 17. DIFFERENCE BETWEEN BINOMIAL & NORMAL DISTRIBUTION • BINOMIAL DISTRIBUTION: • The number of tests is fixed, and each test is independent, so the probability of success in each trial is exactly the same • Discrete data • The distribution is based on 2 outcomes, success or failure • NORMAL DISTRIBUTION: • The number of tests can be infinite • Continuous data • The distribution is based on the values in a dataset
  • 18. EXAMPLE OF BINOMIAL DISTRIBUTION
  • 19. NORMAL CURVE/NORMAL DISTRIBUTION • When data are collected from a very large no. of people and a frequency distribution is made with narrow class intervals, the resulting curve is smooth and symmetrical and is called the normal curve. • A distribution of this nature or shape is called the normal distribution or Gaussian distribution.
  • 20. CHARACTERISTICS OF NORMAL CURVE • Smooth, symmetrical, bell-shaped curve • The total area under the curve is 1 • For the standard normal curve, the mean is 0 and the standard deviation is 1 • Mean, median and mode coincide at the centre • The maximum number of observations is at the centre, and the number gradually decreases towards the extremities on either side
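The properties listed above can be checked numerically. The sketch below uses statistics.NormalDist from the Python standard library; the helper area_within is a hypothetical name, and the proportions it prints are standard properties of the normal curve rather than figures taken from the slides.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Symmetry: half of the total area lies on each side of the mean.
print(f"Area below the mean: {std_normal.cdf(0):.2f}")

for k in (1, 1.96, 2, 3):
    print(f"Area within ±{k} SD: {area_within(k):.4f}")
```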
  • 21. SKEWED DISTRIBUTION • A frequency distribution is considered asymmetrical if the frequencies are not equally distributed on both the sides of central value
  • 23. P VALUE • A p value is the probability that the computed value of a test statistic is at least as extreme as a specified value of the test statistic when the null hypothesis is true. Thus, the p value is the smallest value of α for which we can reject a null hypothesis. • It is the probability that a difference at least as extreme as the one found in the observed data would have occurred by chance when the null hypothesis is true. • A high p value (e.g. 0.8, 0.6 or 0.1) means the observed result is quite likely under the null hypothesis, so the null hypothesis is not rejected. • A low p value (e.g. 0.001 or 0.01, conventionally anything below 0.05) means the observed result is unlikely to have arisen by chance under the null hypothesis, so the null hypothesis is rejected.
  • 24. WHEN TO USE P VALUES? ALMOST ALWAYS • A valid calculation of a p value requires minimal assumptions, fewer than any other statistical method. • The “assumptions” need not be merely assumed; they can be assured by a properly designed experiment (randomization + non-parametric test)
  • 26. HYPOTHESIS TESTING • It is a process of deciding statistically whether the findings are real or due to chance. • Research in which an independent variable is manipulated is called experimental hypothesis-testing research • Research in which an independent variable is not manipulated is called non-experimental hypothesis-testing research • The use of the null hypothesis is quite frequent
  • 27. DIFFERENCE BETWEEN NULL AND ALTERNATIVE HYPOTHESIS • NULL HYPOTHESIS: • States that there is no difference between two groups • Denoted by H0 • Written with an ‘equal to’ sign • The researcher tries to disprove the null hypothesis • ALTERNATIVE HYPOTHESIS: • States that there is a relationship between the two selected variables in a study • Denoted by H1 or Ha • Written with a ‘not equal to’, ‘greater than’ or ‘less than’ sign • The researcher tries to prove the alternative hypothesis
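As a rough illustration of the definition above, a two-sided p value can be computed from a test statistic. The sketch assumes the test statistic is a z score that follows the standard normal curve when the null hypothesis is true; the function name two_sided_p and the value z = 2.5 are hypothetical and not taken from the slides.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability of a z score at least as extreme as the observed one
    (in either direction) when the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

observed_z = 2.5  # hypothetical test statistic
p_value = two_sided_p(observed_z)
alpha = 0.05      # conventional significance level

print(f"p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```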
  • 28. STEPS IN HYPOTHESIS TESTING PROCEDURE
  • 29. CONFIDENCE INTERVAL • A confidence interval is a statistical measure used to indicate the range of estimates within which an unknown statistical parameter is likely to fall. If the parameter is the population mean, the confidence interval is an estimate of possible values of the population mean. • A confidence interval is determined through use of observed (sample) data and is calculated at a selected confidence level (chosen prior to the computation of the confidence interval). This confidence level, such as a 95% confidence level, indicates the reliability of the estimation procedure; it is not the degree of certainty that the computed confidence interval contains the true value of the parameter being studied. Specifically, the confidence level indicates the proportion of confidence intervals, that when constructed given the chosen confidence level over an infinite number of independent trials, will contain the true value of the parameter.
  • 30. CONFIDENCE INTERVAL • For example, if 100 confidence intervals are computed at a 95% confidence level, it is expected that 95 of these 100 confidence intervals will contain the true value of the given parameter; this says nothing about any individual confidence interval. If 1 of these 100 confidence intervals is selected, we cannot say that there is a 95% chance it contains the true value of the parameter; this is a common misconception. The selected confidence interval either contains or does not contain the true value, and we cannot attach a probability to a specific interval containing it. • Confidence intervals are typically written as (some value) ± (a range). The range can be written as an actual value or a percentage. It can also be written as simply the range of values. For example, the following are all equivalent confidence intervals: • 20.6 ± 0.887 • 20.6 ± 4.3% • [19.713 – 21.487]
  • 31. CALCULATING CONFIDENCE INTERVAL • The method described here gives confidence intervals for normally distributed data with an unknown mean but known standard deviation. It does not cover data where both the mean and the standard deviation are unknown. • Calculating a confidence interval involves determining the sample mean, X̄, and the population standard deviation, σ, if possible. If the population standard deviation is not available, the sample standard deviation, s, can be used when the sample size is greater than 30. • For a sample size greater than 30, the population standard deviation and the sample standard deviation will be similar. • Depending on which standard deviation is known, the equation used to calculate the confidence interval differs.
  • 32. FORMULA FOR CI • CI = X̄ ± Z × σ/√n • where Z is the Z-value for the chosen confidence level (1.96 for 95%), X̄ is the sample mean, σ is the standard deviation, and n is the sample size. • Assuming the following with a confidence level of 95%: The confidence interval is:
  • 35. CALCULATION OF CI • Confidence Interval: 20.6 ±0.887 (±4.3%) [19.713 – 21.487] • STEPS:
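The calculation can be sketched as follows, assuming a known standard deviation and the formula mean ± Z × σ/√n. The inputs passed to the hypothetical helper mean_ci are made up for illustration, because the values behind the slide's result are not given; only the final block uses the slide's reported interval.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sigma, n, level=0.95):
    """Confidence interval for a mean when the standard deviation is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

# Hypothetical inputs chosen only for illustration (not the slide's data).
low, high, margin = mean_ci(sample_mean=22.8, sigma=2.7, n=100)
print(f"{low:.3f} to {high:.3f} (margin {margin:.3f})")

# The three equivalent ways of writing the interval reported on the slide.
centre, half_width = 20.6, 0.887
print(f"{centre} ± {half_width}")
print(f"{centre} ± {100 * half_width / centre:.1f}%")
print(f"[{centre - half_width:.3f} – {centre + half_width:.3f}]")
```

A larger sample size n or a lower confidence level narrows the interval, since both reduce the margin Z × σ/√n.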
  • 36. DIFFERENCE BETWEEN CONFIDENCE INTERVAL AND P VALUE • Confidence Interval • A confidence interval calculated for a measure of treatment effect shows the range within which the true treatment effect is likely to lie. • Confidence intervals are preferable to p- values, as they tell us the range of possible effect sizes compatible with the data, and thus provide clinically relevant information. • P value • A p-value is calculated to assess whether differences between treatments are likely to have occurred simply through chance, or whether they are likely to represent a genuine effect. P-values simply provide a cut-off beyond which we assert that the findings are ‘statistically significant’ (by convention, this is p<0.05)
  • 37. MEASURES OF CENTRAL TENDENCY- OBJECTIVES • It is a central value around which other values are distributed • To condense entire mass of the data • To facilitate comparison
  • 39. ARITHMETIC MEAN • The most frequently used measure of central tendency • Depends on the value of every observation • Balance point of the distribution • Mean = sum of all observations / total no. of observations • Example: the no. of missing teeth in 5 school children are 2, 2, 1, 4 and 1 • Mean no. of missing teeth = (2+2+1+4+1)/5 = 10/5 = 2
  • 40. MERITS AND DEMERITS OF MEAN • MERITS: • Based on all observations • Easy to calculate and understand • Least affected by sampling fluctuation, hence more stable • No need to arrange data • DEMERITS: • Can be used only for quantitative data • Affected by extreme observations • Unrealistic • Graphical presentation is not possible
  • 41. MEDIAN • Mid value of a series • Positional average • Half of the observations have a value larger than the median and the rest have a value smaller than the median • Arrange all the observations in ascending or descending order • If the number of observations (n) is odd, the median is the middle value, i.e. the value at position (n+1)/2 • If the number of observations is even, the median is the average of the two middle values
  • 42. MEDIAN • Missing teeth in 5 school children are 2, 3, 1, 4 and 9 • Arranging them in order: 1, 2, 3, 4, 9 • The median is 3 • Missing teeth in 6 school children are 2, 3, 1, 4, 5 and 9 • Arranging them in order: 1, 2, 3, 4, 5, 9 • The median is 3.5
  • 43. PROPERTIES OF MEDIAN • Uniqueness- As is true with the mean, there is only one median for a given set of data • Simplicity- The median is easy to calculate • It is not drastically affected by extreme values as is the mean
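The median examples, and the point that the median resists extreme values better than the mean, can be checked with Python's statistics module. The list with_outlier is a hypothetical variation on the slide data, added only to show the effect of one extreme observation.

```python
from statistics import mean, median

# Missing-teeth counts from the median slide.
teeth_5 = [2, 3, 1, 4, 9]
teeth_6 = [2, 3, 1, 4, 5, 9]

print("Median of 5 values:", median(teeth_5))
print("Median of 6 values:", median(teeth_6))
print("Mean of 5 values:", mean(teeth_5))

# Hypothetical variation: replace 9 with a far more extreme value.
with_outlier = [2, 3, 1, 4, 40]
print("Median with outlier:", median(with_outlier))
print("Mean with outlier:", mean(with_outlier))
```

The median stays at the middle observation while the mean is pulled towards the extreme value.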
  • 44. MERITS AND DEMERITS OF MEDIAN • MERITS: • Not affected by extreme observations • Graphical presentation is possible • DEMERITS: • Affected more by sampling fluctuations • Equal importance to all data • Need to arrange data in ascending or descending order
  • 45. MODE • Most frequently occurring observation in a series • If a series has 2 items with the highest frequency, the distribution is bimodal • Not affected by extreme values • Empirical relation for moderately skewed distributions: Mode = 3 × median − 2 × mean
  • 46. MODE • The table on this slide shows an ordered array of the ages of subjects. • A count of the ages reveals that the age 53 occurs most frequently (17 times), so the mode for this population of ages is 53.
  • 47. MERITS AND DEMERITS OF MODE • MERITS: • Not affected by extreme observations • Can be used for both quantitative and qualitative data • DEMERITS: • Not rigidly defined • Cannot be used for further mathematical calculation
  • 48. MEASURES OF DISPERSION • The measures of dispersion help to show how widely the observations are spread on either side of the average • Dispersion is the degree of spread or variation of the variable about a central value • The most common measures of dispersion used in dental science are: RANGE, MEAN DEVIATION AND STANDARD DEVIATION
  • 49. RANGE • It is the difference between the highest and lowest values • This measure gives no information about the values that lie between the extreme values
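A short sketch of the mode, reusing the missing-teeth counts from the arithmetic mean slide (2, 2, 1, 4, 1). statistics.multimode returns every value that ties for the highest frequency, which makes a bimodal series easy to spot. The helper empirical_mode is a hypothetical name for the relation Mode = 3 × median − 2 × mean, which is an approximation rather than an exact identity.

```python
from statistics import mean, median, multimode

missing_teeth = [2, 2, 1, 4, 1]

# All values sharing the highest frequency, in order of first appearance.
modes = multimode(missing_teeth)
print("Mode(s):", modes)
print("Bimodal" if len(modes) == 2 else f"{len(modes)} mode(s)")

def empirical_mode(values):
    """Approximate mode from the relation 3 * median - 2 * mean."""
    return 3 * median(values) - 2 * mean(values)

print("Empirical estimate:", empirical_mode(missing_teeth))
```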
  • 50. MEAN DEVIATION • It is the average of the deviation from the arithmetic mean • The mean deviation is one way of measuring how closely the individual scores in the data set cluster around the mean
  • 51. MEAN DEVIATION • The no. of decayed teeth in 6 school children are 1, 3, 5, 7, 9 and 11 respectively • Mean (X̄) = (1+3+5+7+9+11)/6 = 36/6 = 6 • Absolute deviations from the mean: 5, 3, 1, 1, 3, 5 (sum = 18) • MD = 18/6 = 3 • The average distance between each child's caries status and the mean caries status of the group is 3 units
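The worked example can be reproduced step by step. This is a minimal sketch; mean_deviation is a hypothetical helper name, and the data are the decayed-teeth counts from the slide.

```python
from statistics import mean

def mean_deviation(values):
    """Average absolute distance of the observations from their mean."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)

decayed_teeth = [1, 3, 5, 7, 9, 11]

print("Mean:", mean(decayed_teeth))
print("Mean deviation:", mean_deviation(decayed_teeth))
```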
  • 52. LIMITATIONS OF MEAN DEVIATION • Though based on all values, it ignores the algebraic signs of deviations – limited sense • Rarely used in modern statistics-replaced with standard deviation
  • 53. STANDARD DEVIATION • It is the most important and widely used measure of dispersion • Also known as root mean square deviation because it is the square root of the mean of the squared deviations from the arithmetic mean • The greater the standard deviation, the greater the magnitude of dispersion from the mean
  • 54. MERITS OF STANDARD DEVIATION • Most widely used measure of dispersion • Based on all values of observation • Signs of deviation not discarded, instead eliminated by squaring
  • 55. WHAT IS STANDARD ERROR? • The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly the mean. • The term is also used for an estimate of that standard deviation derived from the particular sample used to compute the estimate.
  • 56. NEED OF STANDARD ERROR • Used as an instrument in testing a given hypothesis. • It gives an idea of the unreliability of the sample • S.E. helps in determining the limits within which the values are expected to lie
  • 57. TEST OF SIGNIFICANCE • It is a technique for judging how far the difference between groups is due to sampling variation. • Start with the assumption that the null hypothesis is true. • There are 2 types of tests: • Parametric tests: applied when the data are normally distributed. • Non-parametric tests: applied when the data are not normally distributed.
  • 58. RELATIVE RISK • Relative risk is the ratio of the risk of developing a disease among subjects with the risk factor to the risk of developing the disease among subjects without the risk factor. • We represent the relative risk from a prospective study symbolically as RR = [a/(a+b)] / [c/(c+d)], where a, b, c and d are the cell counts of the 2×2 table of risk factor by disease status.
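To connect standard deviation and standard error, the sketch below reuses the decayed-teeth counts from the mean deviation slide. It assumes the usual estimate SE = s/√n for the standard error of the mean, a formula the slides do not state explicitly; standard_error is a hypothetical helper name.

```python
from math import sqrt
from statistics import pstdev, stdev

decayed_teeth = [1, 3, 5, 7, 9, 11]

# Root mean square deviation, dividing by n (treats the data as a population).
population_sd = pstdev(decayed_teeth)

# Sample standard deviation, dividing by n - 1.
sample_sd = stdev(decayed_teeth)

def standard_error(values):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

print(f"Population SD: {population_sd:.3f}")
print(f"Sample SD: {sample_sd:.3f}")
print(f"Standard error of the mean: {standard_error(decayed_teeth):.3f}")
```

The standard error shrinks as the sample size grows, even when the standard deviation of the observations stays the same.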
  • 59. CLASSIFICATION OF A SAMPLE OF SUBJECTS WITH RESPECT TO DISEASE STATUS & RISK FACTOR
        RISK FACTOR    DISEASE PRESENT    DISEASE ABSENT    TOTAL AT RISK
        Present        a                  b                 a+b
        Absent         c                  d                 c+d
        Total          a+c                b+d               n
    • The data resulting from a prospective study in which the dependent variable and the risk factor are both dichotomous may be displayed in a 2×2 contingency table. • The risk of developing the disease among subjects with the risk factor is a/(a+b), and the risk of developing the disease among subjects without the risk factor is c/(c+d).
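A minimal sketch of the relative risk calculation from the 2×2 table, using the two risks defined above. The function name relative_risk and the cell counts are hypothetical and chosen only for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk among subjects with the risk factor divided by the risk
    among subjects without it, using the cells of the 2x2 table."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed
# subjects developed the disease.
rr = relative_risk(a=30, b=70, c=10, d=90)
print(f"Relative risk = {rr:.2f}")
```

A relative risk of 1 means the risk is the same with and without the risk factor; values above 1 mean a higher risk among subjects who have it.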
  • 60. ODDS RATIO • The odds for success are the ratio of the probability of success to the probability of failure. • We use this definition of odds to define two odds that we can calculate from data displayed in the table: • 1. The odds of being a case (having the disease) to being a control (not having the disease) among subjects with the risk factor: [a/(a+b)]/[b/(a+b)] = a/b • 2. The odds of being a case (having the disease) to being a control (not having the disease) among subjects without the risk factor: [c/(c+d)]/[d/(c+d)] = c/d • The odds ratio is the ratio of these two odds: OR = (a/b)/(c/d) = ad/bc
        RISK FACTOR    CASES    CONTROLS    TOTAL
        Present        a        b           a+b
        Absent         c        d           c+d
        Total          a+c      b+d         n
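The two odds and their ratio can be sketched the same way. The function name odds_ratio and the cell counts are hypothetical; the arithmetic follows the a/b and c/d definitions on the slide.

```python
def odds_ratio(a, b, c, d):
    """Odds of being a case among subjects with the risk factor (a/b)
    divided by the odds among subjects without it (c/d)."""
    odds_with_factor = a / b
    odds_without_factor = c / d
    return odds_with_factor / odds_without_factor

# Hypothetical counts: 30 cases and 70 controls with the risk factor,
# 10 cases and 90 controls without it.
oratio = odds_ratio(a=30, b=70, c=10, d=90)
print(f"Odds ratio = {oratio:.2f}")
```

The same value comes from the cross-product ad/bc, which is often the quicker way to compute it by hand.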
  • 61. PARAMETRIC TESTS • Unpaired t-test (two-sample t-test) • This test is applied to unpaired data of independent observations made on individuals of 2 different or separate groups or samples drawn from 2 populations, to test whether the difference between the means is real or can be attributed to sampling variability. • Criteria for applying the ‘t’ test: • Samples must be randomly selected • Data must be quantitative • Sample size should be small (<30)
  • 62. PARAMETRIC TEST • Paired sample t-test • Used in the context of paired samples, in studies comparing differences in outcome before and after an intervention or in studies on paired organs • This test is used when the measurement on one entity is related to the measurement on the other, i.e. when the observations are dependent.
  • 63. PARAMETRIC TEST • Independent samples t-test • -When comparison of 2 independent groups on a continuous outcome is required • -used in case control studies
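For two independent groups, the t statistic can be computed by hand as a rough sketch. This version assumes equal variances in the two groups (the classical pooled form); the function name pooled_t and the pocket-depth values are hypothetical. The standard library has no t distribution, so the p value would still need a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Independent samples t statistic and degrees of freedom,
    assuming equal variances in the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df

# Hypothetical pocket depths (mm) in two independent groups.
test_group = [2.1, 2.5, 2.8, 3.0, 3.6]
control_group = [3.2, 3.9, 4.1, 4.4, 4.9]

t_stat, df = pooled_t(test_group, control_group)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```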
  • 64. PARAMETRIC TEST • Analysis of variance (ANOVA) • It is used when comparison of more than two independent groups on a continuous outcome is required • It tests whether the means of the outcome variable in the different groups are the same • Many situations involve collecting data on 3 or more groups of individuals, with the objective of determining whether any true differences in mean performance exist among the conditions under study
  • 65. PARAMETRIC AND NON PARAMETRIC TESTS
        Comparison | Hypothesis tested (parametric) | Parametric test | Hypothesis tested (non-parametric) | Non-parametric test
        Single group | Sample mean not different from population mean | One sample t test (n<30), Z test (n>30) | Sample median not different from population median | Chi square test
        Two independent samples | 2 population means are equal | Unpaired t-test | Two population medians are equal | Mann Whitney test
        Two related samples | Mean difference = 0 | Paired t-test | Median difference = 0 | Wilcoxon's signed rank test
        Three or more independent samples | All population means are equal | ANOVA | All population medians are equal | Kruskal Wallis test
  • 66. PARAMETRIC AND NON PARAMETRIC TESTS
        Comparison | Hypothesis tested (parametric) | Parametric test | Hypothesis tested (non-parametric) | Non-parametric test
        Three or more dependent samples | Difference between means of each population group is equal | Repeated measures ANOVA | Difference between medians of each population group is equal | Friedman test
        Relation between two continuous variables | For normal data | Pearson's correlation coefficient | For non-normal data or ordinal data | Spearman's correlation coefficient
  • 67. PARAMETRIC AND NON PARAMETRIC TESTS BY NUMBER OF SAMPLES
        Parametric test, one sample: t-test, Z-test
        Parametric test, two sample: Independent samples: two group t-test, Z-test; Paired samples: paired t-test
        Non parametric test, one sample: Chi-square, Kolmogorov-Smirnov test, Runs test
        Non parametric test, two sample: Paired samples: Sign test, Wilcoxon test, McNemar test, Chi-square test; Independent samples: Chi-square, Mann-Whitney, Kolmogorov-Smirnov test
  • 68. QUALITATIVE • Chi-square test: tests association between categorical variables • McNemar test: alternative to the paired t-test for categorical variables • Fisher's exact test: used when the expected value is <5 in any cell of a chi-square contingency table.
  • 69. NON PARAMETRIC TEST • The Chi square test for qualitative data • -developed by Karl Pearson • When data is measured in terms of attributes or qualities , and it is intended to test whether difference in distribution of attributes in different groups is due to sampling variation or not , the Chi square test is applied. • Used to test significance of difference between two proportions and can be used when there are more than 2 groups to be compared .
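A sketch of the chi-square calculation for a 2×2 table of attributes, comparing observed counts with the counts expected if the two groups did not differ. The counts and the function name chi_square_2x2 are hypothetical. The p value shortcut relies on the fact that, with 1 degree of freedom, a chi-square value is the square of a standard normal value, so it applies only to 2×2 tables; no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Valid for 1 degree of freedom only (2x2 table).
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p_value

# Hypothetical counts: rows are two groups, columns are attribute present/absent.
observed = [[30, 70], [10, 90]]
chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```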
  • 70. • Demographic data and periodontal parameters. FMPS, full-mouth plaque score; FMBS, full-mouth percentage bleeding score Data were presented as mean ± SD, percentage p<0.05; the significant difference between groups Since the data distributions were not normal, non-parametric tests were used to evaluate the analysis. The Chi Square test was used in this table.
  • 71. CORRELATION • It is the relationship between two sets of variables • The magnitude or degree of relationship between two variables is called the correlation coefficient and is denoted by ‘r’ • The correlation coefficient ranges from −1 to +1, i.e. −1 ≤ r ≤ +1
  • 72. REGRESSION • It is a statistical method for studying the relationship between a single dependent variable and one or more independent variables • It is used to find the association between two variables when there are many variables • Regression allows prediction of one variable from another; prediction on its own does not establish causation
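The correlation coefficient r can be computed directly from its definition. This is a minimal sketch of Pearson's r; the function name pearson_r and the paired values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations on the same subjects.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

Values near +1 or −1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.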
  • 73. COMPARISON OF CORRELATION & REGRESSION • CORRELATION: • Correlation is a statistical measure that determines the association between two variables • It is used to represent a linear relationship between two variables • There is no distinction between dependent and independent variables • Purpose: to find a numerical value expressing the relationship between the variables • REGRESSION: • Regression describes how to numerically relate an independent variable to a dependent variable • It is used to estimate one variable based on another • The two variables play different roles (dependent and independent) • Purpose: to estimate the values of random variables based on the values of fixed variables
  • 75. CONCLUSION • Research is a quest for knowledge through diligent search, investigation or experimentation aimed at the discovery and interpretation of new knowledge • Scientific method is a systematic body of procedures and techniques applied in carrying out investigation or experimentation targeted at obtaining new knowledge • Research and scientific methods may be considered a course of critical inquiry leading to the discovery of facts or information, which increases our understanding of human health and disease.
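To illustrate the regression side of the comparison, the sketch below fits a straight line by least squares and uses it to estimate the dependent variable from the independent one. The function name fit_line and the data are hypothetical.

```python
def fit_line(x, y):
    """Least squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")

# Estimate y for a new value of x.
new_x = 6
print(f"Predicted y at x = {new_x}: {intercept + slope * new_x:.2f}")
```

Unlike the correlation coefficient, the fitted line changes if x and y are swapped, which reflects the different roles of the two variables in regression.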
  • 76. REFERENCES • 1. Soben Peter, Essentials of Public Health Dentistry, 6th Edition, Arya Medi Publishing House Pvt. Ltd. • 2. Wayne W. Daniel, Biostatistics: Basic Concepts and Methodology for the Health Sciences, 9th Edition, Wiley India Pvt. Ltd, 2014

Editor's Notes

  1. It is important for the investigator and interpreting clinician to understand basics of biostatistics for 2 reasons.