TESTS OF SIGNIFICANCE
Tests of significance deal with techniques for judging how far the difference between the estimates of different samples is due to sampling variation.
Standard error (S.E.) of mean = $\frac{S.D.}{\sqrt{n}}$
Standard error (S.E.) of proportion = $\sqrt{\frac{pq}{n}}$, where q = 1 − p
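The two standard-error formulas can be sketched in Python as follows. The function names and the input figures are hypothetical and chosen only for illustration; p is assumed to be expressed as a fraction between 0 and 1.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean: S.D. divided by the square root of n."""
    return sd / math.sqrt(n)


def se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1.0 - p
    return math.sqrt(p * q / n)


# Hypothetical figures, for illustration only
se_m = se_mean(sd=10.0, n=100)        # 10 / sqrt(100)
se_p = se_proportion(p=0.5, n=100)    # sqrt(0.5 * 0.5 / 100)
print(se_m, se_p)
```

Both errors shrink as the sample size grows, which is why larger samples give more stable estimates.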
2. CONTENTS
• Introduction
• History
• Data
• Measures of Central tendency
• Measures of Dispersion
• Normal Distribution
• Hypothesis and types of errors
4. INTRODUCTION
• Statistics - It is the science of compiling, classifying &
tabulating numerical data and expressing the results in a
mathematical/graphical form.
• Biostatistics - It is that branch of statistics concerned with
mathematical facts and data relating to biological events.
5. Application and uses of Biostatistics
• In Physiology and Anatomy:
1) To define the limits of normality for variables such
as height, weight or blood pressure in a
population.
2) Variation beyond the natural limits may be
pathological, i.e. abnormal due to the play of certain
external factors.
3) To find the difference between the means and
proportions of normal values at two places or in different
periods
6. • In Pharmacology:
1) To find the action of the drug
2) To compare the action of two different drugs or two
successive dosages of the same drug
3) To find the relative potency of a new drug with respect
to a standard drug
7. • In Medicine:
1) To compare the efficacy of particular drug, operation or
line of treatment
2) To find an association between two attributes such as
cancer and smoking
3) To identify signs and symptoms of a disease
• In Community Medicine and Public Health:
1) To test usefulness of sera and vaccines in the field
2) In epidemiological studies – the role of causative factors
is statistically tested.
8. HISTORY
• In 1925, Ronald Fisher advanced the idea of statistical
hypothesis testing, which he called “tests of significance”, in
his publication Statistical Methods for Research Workers.
• He suggested a probability of one in twenty (0.05) as a
convenient cutoff level to reject the null hypothesis.
• In 1933, Jerzy Neyman and Egon Pearson called this
cutoff the significance level, which they named α.
9. • Tests of significance are the mathematical methods by
which the probability of an observed difference occurring by chance is
found.
• The difference may be between the means or proportions of the
sample and the universe, or between the estimates of
experimental and control groups
10. DATA
• A collective recording of observations either numeric or
otherwise is called data
• Understanding the data is crucial in biostatistics, since
the type of data determines the selection of appropriate
test of significance.
12. Qualitative Data or Categorical Data
• This data exists in mutually exclusive categories. It deals
with attributes or qualities of sampling units.
Nominal Data:
• Categorical variables that have neither measurement
scales nor direction.
Examples:
• Recording of blood groups, hair color, marital status
• Reasons for extraction of teeth
1) Caries 2) periodontitis 3) therapeutic 4) others
13. Ordinal (ranked) data:
• Characterized in terms of more than two categories that have
a clearly implied direction (order), but the data are not measured on a
measurement scale
Examples:
• Severity of pain perceived by the patient
1) No pain 2) mild pain 3) moderate pain 4) severe pain
• Most popular persons on social media
• Best books of 2019
14. Dichotomous data: (Binary Variable)
• The variable can have only 2 values
Examples:
• Gender : Male / Female
• Exam results : Pass / Fail
• Do you have caries: yes/ No
15. Quantitative / Numerical Data
• Observations follow a direction and are quantified on a
scale of measurement
• Continuous data not only show the position of different
observations relative to each other but also show the
extent to which one observation differs from another
• It enables the investigators to make more detailed
inferences than do nominal / ordinal data
16. Discrete Data:
• When the variable under observation takes only fixed
values like whole numbers, the data is discrete
Example:
• DMFT score (0-32)
• Number of students in a class
17. Continuous Data:
• If the variable can take any value in a given range,
decimal or fractional, the data is continuous
Example:
• BMI, Height, B.P, arch length, mesio-distal width of
erupted teeth
• Depending upon the source, data can be divided into
primary data and secondary data
18. Primary Data :
• Obtained directly from the source
• It is first hand information
• Data obtained by means of questionnaire, interviews or
clinical experiments
Secondary Data :
• Obtained from pre-existing records
• It is second hand information
• Data obtained from government and hospital records
19. Measures of Central tendency
• It is the central value around which the other values are
distributed.
• Also known as statistical averages
Should satisfy following properties
It should
1) Be easy to understand and compute
2) Be based on each and every item in the series
3) Not be affected by extreme observations
4) Have sampling stability
20. • Mean – mathematical estimate
• Median – positional estimate
• Mode – based on frequency
21. 1) Mean/ Arithmetic Mean/ Arithmetic Average
• Obtained by adding all the individual observations
and dividing by the total number of observations
• Mean = $\frac{\sum x_i}{n}$
• Eg: No. of decayed teeth in a group of 10 children aged
5 years are 2,2,4,1,3,0,5,2,3,4
• Mean = $\frac{2+2+4+1+3+0+5+2+3+4}{10} = \frac{26}{10}$
• Mean = 2.6
22. 2) Median
• When all the observations of a variable are arranged
in either ascending or descending order, the middle
observation is known as Median.
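23. • Eg: No. of visits to a dentist by 10 patients in one
year: 13,8,4,3,5,2,8,1,7,4
First arrange them in order:
1,2,3,4,4,5,7,8,8,13
With an even number of observations (10), the median is the
mean of the two middle (5th and 6th) values:
Median = $\frac{4+5}{2}$ = 4.5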
24. 3) Mode
• Mode or modal value is that value in a series of
observations that occurs with the greatest frequency.
• Eg: Age at eruption of the canine is 6,6,5,7,8,6,7,5
Mode = 6
• When mode is ill-defined
Mode = 3 Median – 2 mean
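The three worked examples above can be checked with Python's standard `statistics` module. This is only a sketch; the variable names and the helper `empirical_mode` are illustrative, and the helper simply encodes the rule Mode = 3 Median – 2 Mean given for an ill-defined mode.

```python
import statistics

# Data taken from the three examples in the slides
decayed_teeth = [2, 2, 4, 1, 3, 0, 5, 2, 3, 4]
dentist_visits = [13, 8, 4, 3, 5, 2, 8, 1, 7, 4]
eruption_age = [6, 6, 5, 7, 8, 6, 7, 5]

mean_value = statistics.mean(decayed_teeth)       # sum of values / n
median_value = statistics.median(dentist_visits)  # middle of the sorted values
mode_value = statistics.mode(eruption_age)        # most frequent value


def empirical_mode(median: float, mean: float) -> float:
    """Approximate mode when it is ill-defined: 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


print(mean_value, median_value, mode_value)  # 2.6 4.5 6
```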
25. Measures of Dispersion/ Measures of variability/
Measures of variation or scatter
• Dispersion is the degree of spread / variation of the
variable about a central value
Uses:
• Determine reliability of an average
• Serve as a basis of control of variability
• Comparison of 2 or more series
• Facilitate further statistical analysis
26. i) Range
• Difference between maximum and minimum values
• Simplest method
• Gives no information about the values that lie between
the extreme values
• Subjected to fluctuations from sample to sample
27. ii) Mean Deviation
• The average of the deviations from the arithmetic
mean, ignoring the + and – signs
M.D. = $\frac{\sum |X_i - \bar{X}|}{n}$
Σ = sum of
$\bar{X}$ = arithmetic mean
$X_i$ = value of each observation in the data
n = no. of observations in the data
28. iii) Standard Deviation
• Most important and widely used measure of variation
• Also known as root mean square deviation
• It is square root of the mean of the squared deviations
from arithmetic mean
• Greater the deviation – greater the magnitude of
dispersion from mean
• Small standard deviation – higher degree of uniformity
of the observations.
29. S.D. = $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$
Steps:
• Calculate the mean, $\bar{X}$
• Find the deviation of each individual observation from the mean, $X_i - \bar{X}$
• Square these deviations and add them up, $\sum (X_i - \bar{X})^2$
• Divide the result by the total no. of observations, n (or n-1
if the sample size is less than 30)
• Then obtain the square root. This gives the standard deviation
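The steps above can be followed one by one in a short Python sketch. It reuses the decayed-teeth counts from the mean example. The function name and the `small_sample` flag are illustrative choices, not part of the slides.

```python
import math


def standard_deviation(values, small_sample=False):
    """Standard deviation computed step by step.

    Divides by n, or by n - 1 when small_sample is True
    (the slides suggest n - 1 for samples of fewer than 30).
    """
    n = len(values)
    mean = sum(values) / n                           # step 1: mean
    deviations = [x - mean for x in values]          # step 2: deviations
    sum_of_squares = sum(d ** 2 for d in deviations) # step 3: square and add
    divisor = n - 1 if small_sample else n           # step 4: divide
    return math.sqrt(sum_of_squares / divisor)       # step 5: square root


decayed_teeth = [2, 2, 4, 1, 3, 0, 5, 2, 3, 4]
sd_n = standard_deviation(decayed_teeth)
sd_n_minus_1 = standard_deviation(decayed_teeth, small_sample=True)
print(round(sd_n, 3), round(sd_n_minus_1, 3))
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap narrows as the sample grows.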
30. Uses:
• Summarizes the deviations of a large distribution
• Indicates whether the variation from mean is by chance
or real
• Helps in finding standard error, suitable sample size
• S.D. is only interpretable as a summary measure for
variables having approximately symmetric distributions
31. Normal Curve /Gaussian Distribution / Normal
distribution
• When data is collected from a very large number of
people and a frequency distribution is made with narrow
class intervals, the resulting curve is smooth and
symmetric and it is called a normal curve.
• In a normal curve,
a) Mean ± 1 S.D. covers 68.3% of the observations
b) Mean ± 2 S.D. covers 95.4% of the observations
c) Mean ± 3 S.D. covers 99.7% of the observations
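These three coverage figures can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is illustrative.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Fraction of a normal distribution lying within mean ± k S.D."""
    standard_normal = NormalDist()  # mean 0, S.D. 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Mean ± {k} S.D. covers {coverage(k) * 100:.1f}% of the observations")
```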
33. Standard Normal Curve
• Bell Shape
• Perfectly Symmetrical
• Max. number of observations is at the mean and the
number of observations gradually decrease on either side
with few observations at the extreme points
• Total area under the curve = 1
• Mean = 0
• S.D. = 1
34. • All the 3 measures of central tendency, the mean,
median and mode coincide
• As a rough guide, if the mean is at least twice the S.D.
(mean ≥ 2 S.D.), the values can be taken as approximately normally
distributed
35. Skewness
• Skewness is a measure of the degree of asymmetry (a longer
tail on one side) of a frequency distribution
36. Probability
• Probability may be defined as relative frequency or
probable chance of occurrence
• Probability is usually expressed by the symbol ‘p’.
• It ranges from zero (0) to one (1).
• When p = 0, it means there is no chance of an event
happening, i.e. its occurrence is impossible.
Eg. The chance of survival after rabies is zero or nil.
37. • If p = 1, it means the chance of an event happening
is 100%.
Eg. The chance of survival after sandfly fever is 100%
• The P-value can be more than or less than α (the level of
significance) depending on the data; when the P-value is less than α,
the result is statistically significant
38. • The level of significance is usually fixed at
5% (0.05)
1% (0.01)
0.5% (0.005)
0.1% (0.001)
• Maximum desirable is the 5% level
P between 0.05 and 0.01 = statistically significant
P < 0.01 = highly statistically significant
P < 0.001 or 0.005 = very highly significant
39. Hypothesis
• Can be defined as tentative prediction or explanation of
the relationship between 2 or more variables
Null Hypothesis:
• States that there is no real (true) difference between the
means (or proportions) of the groups being compared.
• Generally symbolized as H0
40. Alternative Hypothesis:
• It states that the sample result is different i.e., greater or
smaller than the hypothetical value of population.
• Generally symbolized as H1
• Eg: weight gain / loss due to new feeding regimen
• 1. Zone of Acceptance
2. Zone of Rejection
41. Zone of Acceptance
• If the result of a sample falls in the plain (unshaded) area, i.e. within
the mean ± 1.96 SE, H0 is accepted; hence this area
is called the zone of acceptance for the null hypothesis
Zone of Rejection
• If the result of a sample falls outside the plain area, in the
shaded area, i.e. beyond the mean ± 1.96 SE, it is
significantly different from the universe value. H0 is
rejected and H1 is accepted. This area is called the zone of
rejection for H0
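The two zones amount to a simple interval check around the universe mean. The sketch below assumes the 1.96 SE limits quoted above (the 5% level); the function name and the numbers are hypothetical.

```python
def in_zone_of_acceptance(sample_value, universe_mean, se, z_limit=1.96):
    """True if the sample result lies within universe mean ± z_limit * SE."""
    lower = universe_mean - z_limit * se
    upper = universe_mean + z_limit * se
    return lower <= sample_value <= upper


# Hypothetical universe mean of 100 with SE of 1, so limits are 98.04 to 101.96
accept_h0 = in_zone_of_acceptance(sample_value=101.0, universe_mean=100.0, se=1.0)
reject_h0 = not in_zone_of_acceptance(sample_value=103.0, universe_mean=100.0, se=1.0)
print(accept_h0, reject_h0)  # True True
```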
43. Types of Errors
Type -I Error
• Rejection of a null hypothesis which should have been
accepted (it is actually true)
• Denoted by α
Type – II Error
• Accepting a null hypothesis which should have been
rejected (it is actually false)
• Denoted by β
44. Tests of Significance
• Can be broadly classified into 2 types
1. Parametric tests (or) standard tests of hypothesis
2. Non-parametric tests (or) distribution-free tests of
hypothesis
45. PARAMETRIC TESTS
• A parametric test is a statistical test that makes
assumptions about the parameters of the population
distribution(s) from which one's data is drawn.
46. When to use a parametric test?
• Subjects should be randomly selected
• Data should be normally distributed
• Homogeneity of variances
47. • The important parametric tests are:
1) z-test
2) t-test
3) ANOVA
4) Pearson correlation coefficient
48. Z - Test
• This is one of the most frequently used tests in research studies.
• The z-test is based on the normal probability distribution
and is used for judging the significance of several
statistical measures, particularly the mean.
• The z-test is used when the sample size is greater than 30; it is the test of
significance for large samples
$Z = \frac{\text{observation} - \text{mean}}{\text{SD}}$
49. Pre-requisites to apply z- test
• Sample must be selected randomly
• Data must be quantitative
• Variable is assumed to follow normal distribution in
the population
• Sample size must be greater than 30. If the SD of the
population is known, the z-test can be applied even when the sample
size is less than 30.
50. • Z-test for means has two applications
1) To test the significance of the difference between the
sample mean ($\bar{X}$) and a known value of the population mean ($\mu$).
$Z = \frac{\text{sample mean} - \text{population mean}}{\text{SE of sample mean}}$
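A minimal sketch of this first application follows, assuming SE of the sample mean = SD/√n as defined at the start. The function names and the figures are hypothetical. The P-value is read from the standard normal distribution with `statistics.NormalDist` instead of a printed table.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, population_mean, sd, n):
    """Z = (sample mean - population mean) / SE of sample mean."""
    se = sd / math.sqrt(n)
    return (sample_mean - population_mean) / se


def two_tailed_p(z):
    """Probability of a Z at least this extreme in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical large sample: n = 100, mean 52, SD 10, population mean 50
z = one_sample_z(sample_mean=52.0, population_mean=50.0, sd=10.0, n=100)
p = two_tailed_p(z)
print(z, round(p, 4))  # 2.0 0.0455
```

Here P is below 0.05, so at the 5% level the difference would be called statistically significant.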
51. 2) To test the significance of the difference between 2 sample
means or between experimental and control sample means.
$Z = \frac{\text{observed difference between 2 sample means}}{\text{SE of difference between 2 sample means}}$
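The second application can be sketched in the same way. The slides do not spell out the SE of the difference; the sketch assumes the usual large-sample formula, the square root of (SD1²/n1 + SD2²/n2). The function name and all figures are hypothetical.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z = observed difference between 2 sample means / SE of that difference.

    Assumes SE of difference = sqrt(sd1^2 / n1 + sd2^2 / n2),
    the usual large-sample formula (not stated in the slides).
    """
    se_difference = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_difference


# Hypothetical experimental and control groups of 100 subjects each
z_two = two_sample_z(mean1=105.0, sd1=15.0, n1=100,
                     mean2=100.0, sd2=20.0, n2=100)
print(z_two)  # 2.0
```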
52. One-tailed and Two-tailed Z - tests
• Z values on either side of the mean are written as
-Z / +Z
• A value larger than the mean gives +Z
• A value smaller than the mean gives -Z
53. One-tailed Z - Test
• In the test of significance when one wants to specifically
know if the difference between the two groups is higher
or lower i.e the direction plus or minus side is specified.
• Then one end or tail of the distribution is excluded.
• Eg. if one wants to know if malnourished children have
less mean IQ than well nourished, then higher side of the
distribution will be excluded
• Such test of significance is called one tailed test
54. Two-tailed Z - Test
• This test determines if there is a difference between the
two groups without specifying whether difference is
higher or lower.
• It includes both ends (tails) of the normal distribution.
Such a test is called a two-tailed test.
• Eg: When one wants to know if mean IQ in malnourished
children is different from well nourished children but does
not specify if it is more or less.
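The practical difference between the two tests is how much of the normal curve counts as "extreme". The sketch below compares the one-tailed and two-tailed P-values for the same Z. The Z value is hypothetical and is not derived from any real IQ data; the function names are illustrative.

```python
from statistics import NormalDist


def p_one_tailed_lower(z):
    """One-tailed P: area in the lower tail only (testing for 'less than')."""
    return NormalDist().cdf(z)


def p_two_tailed(z):
    """Two-tailed P: area in both tails beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_observed = -1.8  # hypothetical Z, for illustration only
p_one = p_one_tailed_lower(z_observed)
p_two = p_two_tailed(z_observed)
print(round(p_one, 4), round(p_two, 4))
```

For a symmetric distribution the two-tailed P is twice the one-tailed P, so the same Z can be significant at the 5% level in a one-tailed test and not in a two-tailed test.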
55. t - Test
• In the case of samples of fewer than 30 observations, the Z value will not
follow the normal distribution
• Hence the Z-test will not give the correct level of
significance
• In such cases Student's t-test is used
• It was given by W. S. Gosset, whose pen name was
Student. So, it is also called Student's test.
56. There are two types of student t Test
1. Unpaired t test
2. Paired t test
57. Criteria for applying t - test
• Random samples
• Quantitative data
• Variable normally distributed
• Sample size less than 30
58. Unpaired test
• Applied to unpaired data of independent observation
made on individuals of 2 separate groups or samples
drawn from the population
• To test if the difference between the 2 means is real or it
can be due to sampling variability
59. Steps in unpaired t- test:
• As per the null hypothesis, assume that there is no real difference
between the means of the 2 samples
• Calculate the means of the two samples
• Calculate the observed difference between the means of the 2 samples,
$\bar{X}_1 - \bar{X}_2$
• Calculate the standard error of the difference between the two means, which is given by
$SE = SD\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where SD is the combined (pooled) standard deviation of the two samples
60. • $t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$
• Determine the degrees of freedom, which for one sample is one less than
the no. of observations in the sample (n - 1).
• For 2 samples, the combined degrees of freedom
will be df = (n1 – 1) + (n2 – 1) = n1 + n2 - 2
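The steps of the unpaired t-test can be sketched as below. The sketch assumes the pooled form of the test: the combined SD is taken from the summed squared deviations of both samples divided by n1 + n2 - 2, which matches the combined degrees of freedom given above. The function name and the data are hypothetical.

```python
import math


def unpaired_t(sample1, sample2):
    """Return (t, df) for two independent samples using a pooled SD."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sum of squared deviations within each sample
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((ss1 + ss2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)  # SE of the difference
    return (mean1 - mean2) / se, df


# Hypothetical measurements from two separate groups
group_a = [10, 12, 14, 16, 18]
group_b = [8, 10, 12, 14, 16]
t_unpaired, df_unpaired = unpaired_t(group_a, group_b)
print(round(t_unpaired, 3), df_unpaired)
```

The resulting t is then compared with the t table at the computed degrees of freedom. With 8 degrees of freedom the two-tailed 5% critical value is 2.306, so a t of about 1.0 would not be significant.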
61. Paired t - test
• It is applied to paired data of observation from one
sample only .
• The individual gives a pair of observation i.e.
observation before and after taking a drug
Examples:
• Pulse rate before and after exertion
• Plaque scores before and after using oral hygiene aid
62. Steps in paired t- test:
• As per the null hypothesis, assume that there is no real
difference between the means before and after the
experiment
• Calculate the difference in each pair of observations, i.e.
before and after, $x_1 - x_2 = X$, and the mean of these differences, $\bar{X}$
• Calculate $SE = \frac{SD}{\sqrt{n}}$, where SD is the standard deviation of the differences
• Determine $t = \frac{\bar{X}}{SE}$
63. • Determine the degrees of freedom. Since there is one
sample, df = n - 1
• Refer to the t table and find the probability of the t value
corresponding to the degrees of freedom
• P < 0.05 states that the difference is significant
• P > 0.05 states that the difference is not significant
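A short sketch of the paired t-test steps follows. The plaque scores are hypothetical, and the SD of the differences is computed with n - 1 in the denominator because the sample is small. The probability lookup is left to the t table, as in the steps above.

```python
import math


def paired_t(before, after):
    """Return (t, df) for paired observations on the same subjects."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_difference = sum(differences) / n
    # SD of the differences, using n - 1 for a small sample
    sd = math.sqrt(sum((d - mean_difference) ** 2 for d in differences) / (n - 1))
    se = sd / math.sqrt(n)
    return mean_difference / se, n - 1


# Hypothetical plaque scores before and after using an oral hygiene aid
plaque_before = [3, 4, 5, 4, 4]
plaque_after = [2, 2, 2, 2, 2]
t_paired, df_paired = paired_t(plaque_before, plaque_after)
print(round(t_paired, 3), df_paired)
```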
64. ANOVA (Analysis of Variance)
• Investigations may not always be confined to
comparison of 2 samples only
• In such cases where more than 2 samples are used
ANOVA can be used
• Also when measurements are influenced by several
factors playing their role e.g. factors affecting retention
of a denture, ANOVA can be used.
• ANOVA helps to decide which factors are more
important
65. • Indications: To compare more than two sample means
Criteria for applying ANOVA:
• Randomly selected samples from the corresponding
populations
• Quantitative data
• Variables are normally distributed
67. One way ANOVA
When the design includes only one independent variable (e.g.,
treatment group), the technique applied is called one-way
ANOVA
Eg:
1. Compare control group with three different doses of
aspirin in rats
2. Effect of supplementation of vitamin C in each subject
before, during and after the treatment.
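The slides do not give the ANOVA calculation. As a rough sketch, one-way ANOVA compares the variation between group means with the variation within groups and reports their ratio, F. The function name and the four groups below (a control and three doses) are hypothetical values chosen so that the arithmetic is easy to follow.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical responses: control group and three dose groups
control, dose_1, dose_2, dose_3 = [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]
f_ratio, df_b, df_w = one_way_anova_f([control, dose_1, dose_2, dose_3])
print(f_ratio, df_b, df_w)  # 5.0 3 8
```

The F value is then referred to an F table at the two degrees of freedom to find its probability.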
68. Two way ANOVA
• Used to determine the effect of two nominal predictor
variables on a continuous outcome variable.
• A two-way ANOVA test analyzes the effect of the
independent variables on the expected outcome along
with their relationship to the outcome itself.
69. Multi way ANOVA
• Three or more factors affect the result or outcomes
between the groups
70. Knowledge, Attitude, and Perceived Barriers toward
Evidence-Based Practice among Dental and Medical
Academicians and Private Practitioners in Pune: A
Comparative Cross-sectional Study
71. Pearson’s correlation coefficient
• Relationship or association between two quantitatively
measured or continuous variables
• Eg : Height and weight, temperature and pulse, age and vital
capacity, etc..
• The extent of relationship of two quantitative variables is
measured by Pearson’s correlation coefficient. It is denoted
by letter ‘r’.
• -1 ≤ r ≤ +1
72. Types of correlation
• Perfect positive correlation, r = +1
• Perfect negative correlation r = -1
• Absolutely no correlation, r = 0
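The slides do not give the formula for r. The sketch below assumes the standard product-moment formula: the sum of the products of deviations divided by the square root of the product of the two sums of squared deviations. The function name and the height and weight values are hypothetical, chosen to show the two perfect cases.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


heights = [150, 160, 170, 180]
r_positive = pearson_r(heights, [50, 60, 70, 80])  # weight rises with height
r_negative = pearson_r(heights, [80, 70, 60, 50])  # weight falls with height
print(r_positive, r_negative)  # 1.0 -1.0
```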
| | Z-test | t-test | ANOVA | Pearson correlation coefficient |
|---|---|---|---|---|
| Type of data | Continuous data | Continuous data | Independent variable: qualitative (nominal); dependent variable: quantitative (continuous) | Continuous data |
| Sample size | > 30 | < 30 | Large enough | - |
| Types | 1) One-tailed Z-test 2) Two-tailed Z-test | 1) Paired t-test 2) Unpaired t-test | 1) One-way ANOVA 2) Two-way ANOVA 3) Multi-way ANOVA | 1) Perfect positive correlation 2) Perfect negative correlation 3) No correlation |
| Application | To compare the differences between the proportions | To compare the means of two independent or two related samples | To compare the means of 3 or more independent samples | To determine the relationship or association between two quantitatively measured or continuous variables |
| Examples | Proportion of patients surviving in a treated group differs from that in an untreated group | 1) Unpaired t-test: compare the mean systolic blood pressure of male and female participants 2) Paired t-test: plaque scores before and after using an oral hygiene aid | Effect of supplementation of vitamin C in each subject before, during and after the treatment | Correlation between diastolic and systolic blood pressure |
75. CONCLUSION
• Tests of significance play an important role in conveying
the results of any research and thus the choice of an
appropriate statistical test is very important as it decides
the fate of outcome of the study.
• Hence the emphasis placed on tests of significance in
clinical research must be tempered with an
understanding that they are tools for analyzing data and
should never be used as a substitute for knowledgeable
interpretation of outcomes.