PRESENTED BY,
Dr. Sushi Kadanakuppe
II year PG student
Dept of Preventive & Community Dentistry
Oxford Dental College & Hospital
BIOSTATISTICS
 INTRODUCTION
 BASIC CONCEPTS
Data
Distributions
 DESCRIPTIVE STATISTICS
Displaying data
Frequency distribution tables.
Graphs or pictorial presentation of data.
Tables.
Numerical summary of data
Measure of central tendency
Measure of dispersion.
 ANALYTICAL OR INFERENTIAL
STATISTICS
 The nature and purpose of statistical inference
 The process of testing hypotheses
a. False-positive & false-negative errors.
b. The null hypothesis & alternative hypothesis
c. The alpha level & p value
d. Variation in individual observations and in
multiple samples.
 Tests of statistical significance
 Choosing an appropriate statistical test
 Making inferences from continuous
(parametric) data.
 Making inferences from ordinal data.
 Making inferences from dichotomous and
nominal (nonparametric) data.
 CONCLUSION
INTRODUCTION
The worker with human material will find the
statistical method of great value and will have
even more need for it than will the laboratory
worker.
Claude Bernard (1927)¹, a French physiologist of
the nineteenth century and a pioneer in laboratory
research, writes: “We compile statistics only when
we cannot possibly help it. Statistics yield
probability, never certainty, and can bring forth
only conjectural sciences.”
The worker with human material, however, can
seldom control the environment, nor can he bring about
drastic changes in his subjects quickly,
particularly if he is studying chronic disease.
The variability of human material, plus the fact
that time allows the introduction of many
additional factors which may contribute to a
disease process, leaves the worker with
quantitative data affected by a multiplicity of
factors.
Statistical methods become necessary, probability
becomes of great interest, and conjecture based
upon statistical probability may show a way to
break the chain of causation of a disease even
before all factors entering into the production of
the disease are clearly understood.
Yule (1950)² has defined statistics as “methods
specially adapted to the elucidation of quantitative
data affected by a multiplicity of causes”.
Fully half the work in biostatistics involves
common sense in the selection and interpretation
of data. The magic of numbers is no substitute.
Bernard points with derision at a German author
who measured the salivary output of one
submaxillary and one parotid gland in a dog for
one hour.¹
This author then proceeded to deduce the output of
all salivary glands, right and left, and finally the
output of saliva of a man per kilogram per day.
The result, of course, was a very top-heavy
structure built upon a set of observations entirely
too small for the purpose.
Work of this sort explains the jibes which so often
ricochet upon better statisticians. Such mistakes
can be avoided.
Statisticians also suffer because they are so often
content merely to collect and analyze data as an
end in itself without the purpose or hope of
producing new knowledge or a new concept.
Conant (1947), in his book On Understanding
Science, makes it very clear that new concepts
must alternate with the collection of data if an
advance in our knowledge is to occur.³
DEFINITION
Statistics is a scientific field that deals with the
collection, classification, description, analysis,
interpretation, and presentation of data.⁴
• Descriptive statistics
• Analytical statistics
• Vital statistics
a. Descriptive statistics concerns the summary
measures of data for a sample of a population.
b. Analytical statistics concerns the use of data
from a sample of a population to make
inferences about the population.
c. Vital statistics is the ongoing collection by
government agencies of data relating to events
such as births, deaths, marriages, divorces and
health- and disease-related conditions deemed
reportable by local health authorities.
USES
Biostatistics is a powerful ally in the quest for
the truth that infuses a set of data and waits to be
told.
a. Statistics is a scientific method that uses theory
and probability to aid in the evaluation and
interpretation of measurements and data
obtained by other methods.
b. Statistics provides a powerful reinforcement for
other determinants of scientific causality.
c. Statistical reasoning, albeit unintentional or
subconscious, is involved in all scientific clinical
judgments, especially with preventive
medicine/dentistry and clinical
medicine/dentistry becoming increasingly
quantitative.
BASIC CONCEPTS
DATA
Definition: Data are the basic building blocks of
statistics and refer to the individual values
presented, measured, or observed.
a. Population vs sample. Data can be derived
from a total population or a sample.
1. A population is the universe of units or values
being studied. It can consist of individuals,
objects, events, observations, or any other
grouping.
2. A sample is a selected part of a population.
The following are some of the common types of
samples:
a) Simple random sample
b) Systematic sample
c) Stratified sample
d) Cluster sample
e) Nonrandomly selected, or convenience, sample
b. Ungrouped vs grouped
1. Ungrouped data are presented or observed individually.
An example of ungrouped data is the following list of
weights (in pounds) for six men: 140, 150, 150, 150,
160, and 160.
2. Grouped data are presented as groups of
identical values together with their frequencies.
An example of grouped data is the following list of
weights for the six men noted above: 140 lb (one man),
150 lb (three men), and 160 lb (two men).
c. Quantitative vs qualitative
1. Quantitative data are numerical, or based on
numbers.
An example of quantitative data is height measured
in inches.
2. Qualitative data are nonnumerical, or based on
a categorical scale.
An example of qualitative data is height measured
in terms of short, medium, and tall.
d. Discrete vs continuous
1.Discrete data or categorical data are data for
which distinct categories and a limited number of
possible values exist.
An example of discrete data is the number of
children in a family, that is, two or three children,
but not 2.5 children.
All qualitative data are discrete.
Categorical data are further classified into
two types:
• nominal scale
• ordinal scale.
Nominal scale:
A variable measured on a nominal scale is
characterized by named categories having no
particular order.
For example,
 patient gender (male/female),
 reason for dental visit (checkup, routine
treatment, emergency), and
 use of fluoridated water (yes/no) are all
categorical variables measured on a nominal scale.
Within each of these scales, an individual subject
may belong to only one level, and one level does
not mean something greater than any other level.
Ordinal scale
Ordinal scale data are variables whose categories
possess a meaningful order.
For example,
 Severity of periodontal disease (0=none, 1=mild,
2=moderate, 3=severe) and
 Length of time spent in a dental office waiting
room (1= less than 15 min, 2= 15 to less than 30
minutes, 3= 30 minutes or more) are variables
measured on ordinal scales.
2. Continuous data or measurement data are data
for which there are an unlimited number of
possible values.
An example of continuous data is an individual’s
weight, which may actually be 159.232872…lb
but is reported as 159 lb.
• Measurement data can be characterized by an
interval scale or a ratio scale.
• If the continuous scale has a true 0 point, the
variables derived from it can be called ratio
variables. The Kelvin temperature scale is a ratio
scale, because 0 degrees on this scale is absolute 0.
• The centigrade temperature scale is a continuous
scale but not a ratio scale, because 0 degrees on
this scale does not mean the absence of heat. So
this becomes an example of an interval scale, as
zero is only a reference point.
e. The quality of measured data is defined in
terms of the data’s accuracy, validity, precision,
and reliability.
1. Accuracy refers to the extent that the
measurement measures the true value of what is
under study.
2. Validity refers to the extent that the measurement
measures what it is supposed to measure.
3. Precision refers to the extent that the
measurement is detailed.
4. Reliability refers to the extent that the
measurement is stable and dependable.
Dental health professionals have a variety of
uses for data⁵:
• For designing a health care program or facility
• For evaluating the effectiveness of an oral hygiene
education program
• For determining the treatment needs of a specific
population
• For proper interpretation of the scientific
literature.
DISTRIBUTIONS
Definition. A distribution is a complete summary of
frequencies or proportions of a characteristic for a series of
data from a sample or population.
Types of distributions
• Binomial distribution
• Uniform distribution
• Skewed distribution
• Normal distribution
• Log-normal distribution
• Poisson distribution
a. Binomial distribution is a distribution of
possible outcomes from a series of data
characterized by two mutually exclusive
categories.
b. Uniform distribution, also called rectangular
distribution, is a distribution in which all events
occur with equal frequency.
c. Skewed distribution is a distribution that is
asymmetric.
1. A skewed distribution with a tail among the
lower values is skewed to the left, or negatively
skewed.
2. A skewed distribution with a tail among the
higher values is skewed to the right, or positively
skewed.
d. Normal distribution, also called Gaussian
distribution, is a continuous, symmetric, bell-
shaped distribution and can be defined by a
number of measures.
e. Log-normal distribution is a skewed distribution
when graphed using an arithmetic scale but a
normal distribution when graphed using a
logarithmic scale.
f. Poisson distribution is used to describe the
occurrence of rare events in a large population.
[Figures: examples of a normal distribution, a skewed distribution, and a binomial distribution.]
DESCRIPTIVE STATISTICS
Descriptive statistical techniques enable the
researchers to numerically describe and
summarize a set of data.
Data can be displayed in the following ways:
Frequency distribution tables.
Graphs or pictorial presentation of data.
Tables.
Numerical summary of data
Measure of central tendency
Measure of dispersion.
I DISPLAYING DATA
Data can be displayed in the following ways:
Frequency distribution tables.
Graphs or pictorial presentation of data.
Tables.
Frequency Distribution Tables
To better explain the data that have been collected,
the data values are often organized and presented
in a table termed a frequency distribution table.
This type of data display shows each value that
occurs in the data set and how often each value
occurs.
In addition to providing a sense of the shape of a
variable’s distribution, these displays provide the
researcher with an opportunity to screen the data
values for incorrect or impossible values, a first
step in the process known as “cleaning the data”.⁵
• The data values are first arranged in order from
lowest to highest value (an array).
• The frequency with which each value occurs is
then tabulated.
• The frequency of occurrence for each data point is
expressed in four ways:
1. The actual count or frequency
2. The relative frequency (percent of the total
number of values).
3. Cumulative frequency (total number of
observations equal to or less than the value)
4. Cumulative relative frequency (the percent of
observations equal to or less than the value)
commonly referred to as percentile.
Exam score   Frequency     %     Cumulative frequency   Cumulative %
    56           1        3.0              1                 3.0
    57           1        3.0              2                 6.1
    63           1        3.0              3                 9.1
    65           2        6.1              3                15.2
    66           1        3.0              3                18.2
    68           2        6.1              5                24.2
    69           2        6.1              6                30.3
    70           2        3.0              8                36.4
    71           1        3.0             10                42.4
    72           1        6.1             11                45.5
    74           2        3.0             12                48.5
    75           1        3.0             14                54.5
    76           3        6.1             15                63.6
    77           2        9.1             16                69.7
    78           1        6.1             18                72.7
    79           1        3.0             21                75.8
    80           2        3.0             23                84.8
    81           3        3.0             24                87.9

Frequency distribution table for exam scores
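As an illustration of the four ways of expressing frequency, the following Python sketch builds a small frequency distribution table; the score list is an invented example, not the full data set behind the table above.

```python
from collections import Counter

# Invented exam scores, for illustration only
scores = [56, 57, 63, 65, 65, 66, 68, 68, 69, 70, 70, 72]

counts = Counter(scores)          # actual count (frequency) of each score
n = len(scores)
cumulative = 0

print("Score  Freq     %  CumFreq  Cum%")
for value in sorted(counts):                  # array: values from lowest to highest
    freq = counts[value]
    cumulative += freq                        # cumulative frequency
    rel = 100 * freq / n                      # relative frequency (percent of total)
    cum_rel = 100 * cumulative / n            # cumulative relative frequency (percentile)
    print(f"{value:>5} {freq:>5} {rel:>5.1f} {cumulative:>8} {cum_rel:>5.1f}")
```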
• Instead of displaying each individual value in a
data set, the frequency distribution for a variable
can group values of the variable into consecutive
intervals.
• Then the number of observations belonging to an
interval is counted.
Exam scores   Number of students    %
56-61                 2             6
62-65                 3             9
66-69                 5            15
70-73                 4            12
74-77                 7            21
78-81                 7            21
82-85                 3             9
86-89                 2             6

Grouped frequency distribution of exam scores
Although the data are condensed in a useful
fashion, some information is lost.
The frequency of occurrence of an individual data
point cannot be obtained from a grouped
frequency distribution.
For example, in the above presentation of data,
seven students scored between 74 and 77, but the
number of students who scored 75 is not shown
here.
Graphic or pictorial presentation of data
Graphic or pictorial presentations of data are useful
in simplifying the presentation and enhancing the
comprehension of data.
All graphs, figures, and other pictures should have
clearly stated and informative titles, and all axes
and keys should be clearly labeled, including the
appropriate units of measurement.
Visual aids can take many forms; some basic
methods of presenting data are described below.
1. Pie chart
A pie chart is a pictorial representation of the
proportional divisions of a sample or population,
with the divisions represented as parts of a whole
circle.
[Pie chart: dental caries in xerostomia patients, with the circle divided among cervical, occlusal, and root caries (sectors of 39%, 42%, and 19%).]
2. Venn diagram
A Venn diagram shows the degrees of overlap and
exclusivity for two or more characteristics or factors
within a sample or population (in which case each
characteristic is represented by a whole circle)
or
for a characteristic or factor among two or more samples
or populations (in which case each sample or population
is represented by a whole circle).
The sizes of the circles (or other symbols) need not
be equal and may represent the relative size for
each factor or population.
3. Bar diagram
 A bar diagram is a tool for comparing categories
of mutually exclusive discrete data.
 The different categories are indicated on one
axis, the frequency of data in each category is
indicated on the other axis, and the lengths of
the bars compare the categories.
 Because the data categories are discrete, the bars
can be arranged in any order with spaces
between them.
[Bar diagram: dental caries in xerostomia patients, with separate bars for cervical, occlusal, and root caries and the frequency of each shown on the other axis.]
4. Histogram
A histogram is a special form of bar diagram that
represents categories of continuous and ordered
data.
The data are adjacent to each other on the x-axis
(abscissa), and there is no intervening space. The
frequency of data in each category is depicted on
the y-axis (ordinate), and the width of the bar
represents the interval of each category.
[Histogram of age for xerostomia subjects: number of subjects on the y-axis for adjacent age intervals of 5 to 10, 10 to 15, 15 to 20, 20 to 25, and 25 to 30 years on the x-axis.]
5. Epidemic curve
An epidemic curve is a histogram that depicts the
time course of an illness, disease, abnormality,
or condition in a defined population and in a
specified location and time period.
The time intervals are indicated on the x-axis,
and the number of cases during each time
interval is indicated on the y-axis.
An epidemic curve can help an investigator
determine such outbreak characteristics as the
peak of disease occurrence (mode), a possible
incubation or latency period, and the type of
disease propagation.
6. Frequency polygon
A frequency polygon is a representation of the
distribution of categories of continuous and
ordered data and, in this respect, is similar to a
histogram.
The x-axis depicts the categories of data, and the
y-axis depicts the frequency of data in each
category.
In a frequency polygon, however, the frequency is
plotted against the midpoint of each category, and
a line is drawn through each of these plotted
points.
The frequency polygon can be more useful than the
histogram because several frequency distributions
can be plotted easily on one graph.
Frequency polygon showing cancer mortality by age group
and sex
7. Cumulative frequency graph
A cumulative frequency graph also is a
representation of the distribution of continuous
and ordered data.
In this case, however, the frequency of data in
each category represents the sum of the data
from that category and from the preceding
categories.
The x-axis depicts the categories of data, and the y-
axis is the cumulative frequency of data,
sometimes given as a percentage ranging from 0%
to 100%.
The cumulative frequency graph is useful in
calculating distribution by percentile, including
the median, which is the category of data that
occurs at the cumulative frequency of 50%.
Medical examiner reported (MER) in St. Louis for the years 1979,
1980, & 1981
8. Box plot
 A box plot is a representation of the quartiles
[25%, 50% (median), and 75%] and the range of
a continuous and ordered data set.
 The y-axis can be arithmetic or logarithmic.
 Box plots can be used to compare the different
distributions of data values.
Distribution of weights of patients from hospital A and hospital B
9. Spot map
A spot map, also called a geographic coordinate
chart, is a map of an area with the location of
each case of an illness, disease, abnormality, or
condition identified by a spot or other symbol on
the map.
A spot map often is used in an outbreak setting
and can help an investigator determine the
distribution of cases and characterize an
outbreak if the population at risk is distributed
evenly over the area.
Distribution of Lyme disease cases in Canada from 1977 to 1989
TABLES
In addition to graphs, data are often summarized in
tables. When material is presented in tabular form,
the table should be able to stand alone; that is,
correctly presented material in tabular form should
be understandable even if the written discussion of
the data is not read.
A major concern in the presentation of both figures
and tables is readability.
Tables and figures must be clearly understood and
clearly labeled so that the reader is aided by the
information rather than confused.
Suggestions for the display of data in graphic or
tabular form⁵:
1. The contents of a table as a whole and the items
in each separate column should be clearly and
fully defined. The unit of measurement must be
included.
2. If the table includes rates, the basis on which
they are measured must be clearly stated- death
rate percent, per thousand, per million, as the
case may be.
3. Rates or proportions should not be given alone
without any information as to the numbers of
observations on which they are based. By giving
only rates of observations and omitting the
actual number of observations, we are excluding
the basic data.
4. Where percentages are used, it must be clearly
indicated that these are not absolute numbers.
Rather than combine too many figures in one
table, it is often best to divide the material into
two or three small tables.
5. Full particulars of any exclusion of observations
from a collected series must be given. The
reasons for and the criteria of exclusions must be
clearly defined, perhaps in a footnote.
II NUMERICAL SUMMARY OF DATA
Although graphs and frequency distribution tables
can enhance our understanding of the nature of a
variable, rarely do these techniques alone suffice
to describe the variable. A more formal numerical
summary of the variable is usually required for the
full presentation of a data set.
To adequately describe a variable’s values, three
summary measures are needed:
1. The sample size.
2. A measure of central tendency
3. A measure of dispersion.
 The sample size is simply the total number of
observations in the group and is symbolized by the
letter N or n.
 A measure of central tendency or location
describes the middle (or typical) value in a data
set.
 A measure of dispersion or spread quantifies
the degree to which values in a group vary from
one another.
Measures of Central Tendency
Whenever one wishes to evaluate the outcome of a
study, it is crucial that the attributes of the sample
that could have influenced it be described.
Three statistics, the mode, median, and mean,
provide a means of describing the “typical”
individual within a sample.
These statistics are frequently referred to as
“measures of central tendency”.
 Measures of central tendency are characteristics that
describe the middle or most commonly occurring values in
a series.
 They tell us the point about which items have a tendency
to cluster. Such a measure is considered as the most
representative figure for the entire mass of data.
 They are used as summary measures for the series. The
series can consist of a sample of observations or a total
population, and the values can be grouped or ungrouped.
Measure of central tendency is also known as statistical
average.
1. Mode
The mode of a data set is that value that occurs
with the greatest frequency.
A series may have no mode (i.e., no value occurs
more than once) or it may have several modes
(i.e., several values occur with equally high
frequency, greater than that of the other values in the series).
Whenever there are two nonadjacent scores with
the same frequency and they are the highest in the
distribution, each score may be referred to as the
‘mode’ and the distribution is ‘bimodal’.
In a truly bimodal distribution, the population
contains two sub-groups, each of which has a
different distribution that peaks at a different
point.
More than one mode can also be produced
artificially by what is known as digit preference,
when observers tend to favor certain numbers over
others.
For example, persons who measure blood pressure
values tend to favor even numbers, particularly
those ending in 0 (e.g., 120 mm Hg).
Calculation: The mode is calculated by
determining which value or values occur most in a
series.
Example: consider the following data. Patients
who had received routine periodontal scaling were
given a common pain-relieving drug and were
asked to record the minutes to 100% pain relief.
Note that “minutes to pain relief” is a continuous
variable that is measured on the ratio scale. The
patients recorded the following data:
Minutes to 100% pain relief:
15 14 10 18 8 10 12 16 10 8 13
First, make an array, that is, arrange the values in
ascending order:
8 8 10 10 10 12 13 14 15 16 18
By inspection, we already know two descriptive
measures belonging to this data: N=11 and
mode=10.
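A minimal Python sketch of the same calculation, using the standard library’s statistics module on the pain-relief data above:

```python
from statistics import multimode

# Minutes to 100% pain relief, from the example above
minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]

array = sorted(minutes)       # arrange the values in ascending order
print(len(array))             # N = 11
print(multimode(array))       # [10] -- the value that occurs most often
```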
Application and characteristics
1. The primary value of the mode lies in its ease of
computation and in its convenience as a quick
indicator of a central value in a distribution.
2. The mode is useful in practical epidemiological
work, such as determining the peak of disease
occurrence in the investigation of a disease.
3. The mode is the most difficult measure of central
tendency to manipulate mathematically, that is, it
is not amenable to algebraic treatment; no
analytic concepts are based on the mode.
4. It is also the least reliable because with successive
samplings from the same population the
magnitude of the mode fluctuates significantly
more than the median or mean.
It is possible, for example, that a change in just
one score can substantially change the value of the
modal score.
2. Median (P50)
The median is the value that divides the
distribution of data points into two equal parts,
that is, the value at which 50% of the data points
lie above it and 50% lie below it.
The median is the middle of the quartiles (the
values that divide the series into quarters) and the
middle of the percentiles (the values that divide
the series into defined percentages).
Calculation:
a) In a series with an odd number of values, the
values in the series are arranged from lowest to
highest, and the value that divides the series in
half is the median.
b) In a series with even number of values, the two
values that divide the series in half are determined,
and the arithmetic mean of these values is the
median.
c) An alternative method for calculating the median
is to determine the 50% value on a cumulative
frequency curve.
Example: In the above example of data series of minutes to
100% pain relief,
8 8 10 10 10 12 13 14 15 16 18
determine which value cuts the array into equal portions. In
this array, there are five data points below 12 and there are
five data points above 12. Thus the median is 12.
8 8 10 10 10 12 13 14 15 16 18
⇑
Median
If the number of observations is even, unlike the
preceding example, simply take the midpoint of
the two values that would straddle the center of
the data set.
Consider the following data set with N = 10:
8 8 10 10 10 13 14 15 16 18
Median = (10 + 13) / 2 = 11.5
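The two median calculations above can be sketched in Python; statistics.median handles both the odd-N and even-N cases:

```python
from statistics import median

odd_n = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]   # N = 11
even_n = [8, 8, 10, 10, 10, 13, 14, 15, 16, 18]      # N = 10

print(median(odd_n))    # 12 -- the middle value of the array
print(median(even_n))   # 11.5 -- the mean of the two middle values, 10 and 13
```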
Applications and characteristics:
1.The median is not sensitive to one or more extreme
values in a series; therefore, in a series with an
extreme value, the median is a more representative
measure of central tendency than the arithmetic
mean.
2. It is not frequently used in sampling statistics. In
terms of sampling fluctuation, the median is
superior to the mode but less stable than the mean.
For this reason, and because the median does not
possess convenient algebraic properties, it is not
used as often as the mean.
3. Median is a positional average and is used only in
the context of qualitative phenomena, for example,
in estimating intelligence, etc., which are often
encountered in sociological fields.
4. Median is not useful where items need to be
assigned relative importance and weights.
5. The median is used in cumulative frequency
graphs and in survival analysis.
3. Arithmetic Mean
The arithmetic mean, or simply, the mean, is the
sum of all values in a series divided by the actual
number of values in a series.
The symbol for the mean is a capital letter X with a
bar above it, X̄, or “X-bar”.
Calculation:
The arithmetic mean is determined as
X̄ = ΣX / N
Example:
Using the minutes to pain relief, N = 11 and
ΣX = 134. Therefore
X̄ = 134 / 11 = 12.2 min
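The same arithmetic in a short Python sketch:

```python
minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]

total = sum(minutes)               # sum of all values = 134
n = len(minutes)                   # N = 11
mean = total / n                   # X-bar = 134 / 11
print(round(mean, 1))              # 12.2 min
```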
Properties of the Mean
1. The mean of a sample is an unbiased estimator
of the mean of the population from which it came.
2. The mean is the mathematical expectation. As
such, it is different from the mode, which is the
value observed most often.
3. The sum of the squared deviations of the
observations from the mean is smaller than the
sum of the squared deviations from any other
number.
4. The sum of the squared deviations from the mean
is fixed for a given set of observations. This
property is not unique to the mean, but it is a
necessary property of any good measure of central
tendency.
Applications and characteristics:
1. The arithmetic mean is useful when performing
analytic manipulation. With the exception of a
situation where extreme scores occur in the
distribution, the mean is generally the best
measure of central tendency.
 The values of mean tend to fluctuate least from
sample to sample.
 It is amenable to algebraic treatment and it
possesses known mathematical relationships with
other statistics.
 Hence, it is used in further statistical
calculations. Thus, in most situations the mean is
more likely to be used than either the mode or the
median.
2. The mean can be conceptualized as a fulcrum
such that the distribution of scores around it is in
perfect balance. Since the scores above and below
the mean are in perfect balance, it follows that the
algebraic sum of the deviations of these scores
from the mean is 0.
3. Whereas the median counts each score, no matter
what its magnitude, as only one score, the mean
takes into account the absolute magnitude of the
score. The median, therefore, does not balance the
halves of the distribution except when the
distribution is exactly symmetrical; in which case
the mean and the median have identical values.
4. Another way of contrasting the median and the
mean is to compare their values when the
distribution of scores is not symmetrical.
Curve (a) is positively skewed; that is, the curve
tails off to the right. In this case the mean is larger
than the median because of the influence of the few
very high scores. Thus these high scores are
sufficient to balance off the several lower scores.
The median does not balance the distribution
because the magnitude of the scores is not included
in the computation.
Curve (b) is negatively skewed; that is, the curve
tails off to the left. Now the mean is smaller than
the median because of the effect of the few very
small scores.
5. It suffers from some limitations viz., it is unduly
affected by extreme items; it may not coincide
with the actual value of an item in a series; and it may
lead to wrong impressions, particularly when the
item values are not given along with the average.
Let’s refer again to the group of values in which
one patient recorded a rather extreme, for this
group, value:
8 8 10 10 10 12 13 14 15 16 58
The adjusted mean, somewhat larger than the
original mean of 12.2, is calculated as follows:
X̄ = 174 / 11 = 15.8 min
The calculation of the mean is correct, but is its use
appropriate for this data set?
By definition the mean should describe the middle of the
data set.
However, for this data set the mean of 15.8 is larger than
most (9 out of 11!) of the values in the group.
Not exactly a picture of the middle!
In this case the median (12 minutes) is the better choice for
the measure of central tendency and should be used.
However, the mean is better than other averages,
especially in economic and social studies where
direct quantitative measurements are possible.
4. Geometric mean
The geometric mean is the nth root of the product
of the values in a series of n values:
Geometric mean (G.M.) = (X1 × X2 × … × XN)^(1/N)
where G.M. = geometric mean, N = number of items,
and Π denotes the conventional product notation.
For instance, the geometric mean of the numbers 4,
6, and 9 is worked out as
G.M. = (4 × 6 × 9)^(1/3) = 216^(1/3) = 6
Applications and characteristics
1. The geometric mean is more useful and
representative than the arithmetic mean when
describing a series of reciprocal or fractional
values. The most frequently used application of
this average is in the determination of average
percent of change i.e., it is often used in the
preparation of index numbers or when we deal in
ratios.
2. The geometric mean can be used only for
positive values.
3. It is more difficult to calculate than the
arithmetic mean.
5. Harmonic mean
Harmonic mean is defined as the reciprocal of the
average of reciprocals of the values of items of a
series. Symbolically, we can express it as under:
Harmonic mean (H.M.) = 1 / [ Σ(1/Xi) / N ] = N / Σ(1/Xi)
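A minimal Python sketch of both averages, using the standard library’s statistics module; the geometric-mean example is the 4, 6, 9 series from above, while the harmonic-mean inputs (40 and 60) are invented for illustration:

```python
from statistics import geometric_mean, harmonic_mean

# Geometric mean of 4, 6, and 9: cube root of 216
print(geometric_mean([4, 6, 9]))    # 6.0 (may show a tiny floating-point error)

# Harmonic mean of two invented rates, e.g. 40 and 60 units per hour
print(harmonic_mean([40, 60]))      # 2 / (1/40 + 1/60) = 48.0
```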
Applications and characteristics:
1. Harmonic mean is of limited application,
particularly in cases where time and rate are
involved.
2. The harmonic mean gives largest weight to the
smallest item and smallest weight to the largest
item.
3. As such it is used in cases like time and motion
study where time is variable and distance constant.
Measures Of Dispersion
Measures of central tendency provide useful
information about the typical performance for a
group of data. To understand the data more
completely, it is necessary to know how the
members of the data set arrange themselves
about the central or typical value.
The following questions must be answered:
How spread out are the data points?
How stable are the values in the group?
The descriptive tools known as measures of
dispersion answer these questions by quantifying
the variability of the values within a group.
Hence, they are the characteristics that are used to
describe the spread, variation, and scatter of a
series of values.
The series can consist of observations or a total
population, and the values can be grouped or
ungrouped.
This can be done by calculating measures based
on percentiles or measures based on the mean.⁶
Measures of dispersion based on percentiles
1. Percentiles, which are sometimes called quantiles,
are the
percentage of observations below the point
indicated when all of the observations are ranked
in descending order.
The median, discussed above, is the 50th
percentile.
The 75th
percentile is the point below which 75%
of the observations lie, while the 25th
percentile is
the point below which 25% of the observations lie.
2. Range
The range is the difference between the highest and
lowest values in a series.
Range = Maximum – Minimum.
More usual, however, is the interpretation of the
range as simply the statement of the minimum and
maximum values:
Range = (Minimum, Maximum)
For the sample of minutes to 100% pain relief,
8 8 10 10 10 12 13 14 15 16 18
Range = (8, 18) or Range = 18-8 = 10 min
 The overall range reflects the distance between
the highest and the lowest value in the data set.
In this example it is 10 min.
 In the same example, the 75th and 25th percentiles
are 15 and 10 respectively, and the distance
between them is 5 min.
This difference is called the interquartile range
(sometimes abbreviated Q3 − Q1).
Because of central clumping, the interquartile
range is usually considerably smaller than half the
size of the overall range of values.
The advantage of using percentiles is that they can
be applied to any set of continuous data, even if
the data do not form any known distribution.
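A short Python sketch of the range and interquartile-range calculations above, assuming NumPy is available; note that different percentile conventions can give slightly different quartile values for a sample this small:

```python
import numpy as np

minutes = np.array([8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18])

print(minutes.min(), minutes.max())        # range endpoints: (8, 18)
print(minutes.max() - minutes.min())       # overall range: 10 min

q1, q3 = np.percentile(minutes, [25, 75])  # 25th and 75th percentiles
print(q1, q3, q3 - q1)                     # interquartile range = Q3 - Q1
```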
Application and characteristics
1. The range is used to measure data spread.
2. The range presents the exact lower and upper
boundaries of a set of data points and thus quickly
lends perspective regarding the variable’s
distribution.
3. The range is usually reported along with the
sample median (not the mean).
4. The range provides no information concerning the
scatter within the series.
5. The range can be deemed unstable because it is
affected by one extremely high score or one
extremely low value. Also, only two values are
considered, and these happen to be the extreme
scores of the distribution. The measure of spread
known as standard deviation addresses this
disadvantage of the range.
Measures of dispersion based on the mean
Mean deviation, variance, and standard deviation
are three measures of dispersion based on the
mean.
Although mean deviation is seldom used, a
discussion of it provides a better understanding of
the concept of dispersion.
1. Mean deviation
Because the mean has several advantages, it might
seem logical to measure dispersion by taking the
“average deviation” from the mean. That proves to
be useless, because the sum of the deviations from
the mean is 0.
However, this inconvenience can easily be solved
by computing the mean deviation, which is the
average of the absolute value of the deviations
from the mean, as shown in the following formula:
Mean deviation = Σ |X − X̄| / N
Because the mean deviation does not have
mathematical properties that enable many
statistical tests to be based on it, the formula has
not come into popular use.
Instead, the variance has become the fundamental
measure of dispersion in statistics that are based
on the normal distribution.
2. Variance
The variance is the sum of the squared deviations
from the mean divided by the number of values in
the series minus 1.
Variance is symbolized by s² or V.
s² = Σ(X − X̄)² / (N − 1)
Σ(X − X̄)² is called the sum of squares.
In the above formula, the squaring solves the
problem that the deviations from the mean add up
to 0.
Dividing by N-1 (called degrees of freedom),
instead of dividing by N, is necessary for the
sample variance to be an unbiased estimator of the
population variance.
The numerator of the variance (i.e., the sum of the
squared deviations of the observations from the
mean) is an extremely important entity in
statistics. It is usually called either the sum of
squares (abbreviated SS) or the total sum of
squares (TSS).
The TSS measures the total amount of variation in
a set of observations.
Properties of the variance
1. When the denominator of the equation for
variance is expressed as the number of
observations minus 1 (N-1), the variance of a
random sample is an unbiased estimator of the
variance of the population from which it was
taken.
2. The variance of the sum of two independently
sampled variables is equal to the sum of the
variances.
3. The variance of the difference between two
independently sampled variables is equal to the
sum of their individual variances as well.
Application and characteristics
1. The principal use of the variance is in calculating
the standard deviation.
2. The variance is mathematically unwieldy, and its
value falls outside the range of observed values in
a data set.
3. The variance is generally of greater importance
to statisticians than to researchers, students, and
clinicians trying to understand the fruits of data
collection.
We should note that the sample variance is a
squared term, not so easy to fathom in relation to
the sample mean.
Thus the square root of the variance, the standard
deviation, is desirable.
3. Standard deviation (s or SD)
The standard deviation is a measure of the
variability among the individual values within a
group.
Loosely defined, it is a description of the average
distance of individual observations from the group
mean.
Conceptualizing the s, or any of the measures of
variance, is more difficult than understanding the
concept of central tendency.
From one point of view, however, the s is similar
to the mean; that is, it represents the mean of the
squared deviations.
Taking the mean and the standard deviation
together, a sample can be described in terms of its
average score and in terms of its average variation.
If more samples were taken from the same
population it would be possible to predict with
some accuracy the average score of these samples
and also the amount of variation.
The mathematical derivation of the standard
deviation is presented here in some detail because
the intermediate steps in its calculation (1) create a
theme (called “sum of squares”) that is repeated
over and over in statistical arithmetic and (2)
create the quantity known as the sample variance.
Calculation:

Step                                            Mathematical term   Label
1. Calculate the mean of the group.             X̄ = ΣX / N          Sample mean
2. Subtract the mean from each value X.         (X − X̄)             Deviation from the mean
3. Square each deviation from the mean.         (X − X̄)²            Squared deviation from the mean
4. Add the squared deviations from the mean.    Σ(X − X̄)²           Sum of squares (SS)
5. Divide the sum of squares by (N − 1).        SS / (N − 1)        Variance (s²)
6. Find the square root of the variance.        √(s²)               Standard deviation (SD or s)
The above table presents the calculation of the standard
deviation for our sample of minutes to 100% pain relief.
We now have two sets of complete sample description for
our example.
                              Sample Description 1    Sample Description 2
Sample size                   N = 11                  N = 11
Measure of central tendency   Median = 12 min         X̄ = 12.2 min
Measure of spread             Range = (8, 18)         SD = 3.31 min
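The six steps above can be traced in a short Python sketch on the pain-relief data:

```python
minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]

n = len(minutes)                                  # sample size, N = 11
mean = sum(minutes) / n                           # step 1: sample mean
deviations = [x - mean for x in minutes]          # step 2: deviations from the mean
squared = [d ** 2 for d in deviations]            # step 3: squared deviations
ss = sum(squared)                                 # step 4: sum of squares (SS)
variance = ss / (n - 1)                           # step 5: variance, s^2 = SS / (N - 1)
sd = variance ** 0.5                              # step 6: standard deviation

print(round(mean, 1), round(sd, 2))               # 12.2 3.31
```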
The standard deviation is reported along with the
sample mean, usually in the following format:
mean ± SD.
This format serves as a pertinent reminder that the
SD measures the variability of values surrounding
the middle of the data set.
It also leads us to the practical application of the
concepts of mean and standard deviation shown in
the following rules of thumb:
X̄ ± 1 SD encompasses approximately 68% of the
values in a group.
X̄ ± 2 SD encompasses approximately 95% of the
values in a group.
X̄ ± 3 SD encompasses approximately 99% of the
values in a group.
These rules of thumb are useful when deciding
whether to report the mean ± SD or the median
and range as the appropriate descriptive statistics
for a group of data points.
If roughly 95% of the values in a group are
contained in the interval X̄ ± 2 SD, researchers
tend to use mean ± SD. Otherwise the median and
the range are perhaps more appropriate.
Applications and characteristics
1. The standard deviation is extremely important in
sampling theory, in correlational analysis, in
estimating reliability of measures, and in
determining relative position of an individual
within a distribution of scores and between
distributions of scores.
2. The standard deviation is the most widely used
estimate of variation because of its known
algebraic properties and its amenability to use
with other statistics.
3. It also provides a better estimate of variation in
the population than the other indexes.
4. The numerical value of standard deviation is
likely to fluctuate less from sample to sample than
the other indexes.
5. In certain circumstances, quantitative probability
statements that characterize a series, a sample of
observations, or a total population can be derived
from the standard deviation of the series, sample,
or population.
6. When the standard deviation of any sample is
small, the sample mean is close to any individual
value.
7. When standard deviation of a random sample is
small, the sample mean is likely to be close to the
mean of all the data in the population.
8. The standard error of the mean decreases when
the sample size increases.
4. Coefficient of variation
The coefficient of variation is the ratio of the
standard deviation of a series to the arithmetic
mean of the series.
The coefficient of variation is unitless and is
expressed as a percentage.
Application and characteristics
1. The co efficient of variation is used to compare the
relative variation, or spread, of the distributions of
different series, samples, or populations or of the
distributions of different characteristics of a single series.
2. The coefficient of variation can be used only for
characteristics that are based on a scale with a true zero
value.
Calculation:
The coefficient of variation (CV) is calculated as
CV (%) = (SD / X̄) × 100
For example,
In a typical medical school, the mean weight of
100 fourth-year medical students is 140 lb, with a
standard deviation of 28 lb.
CV (%) = 28 / 140 × 100 = 20%
The coefficient of variation for weight is 28 lb
divided by 140 lb, or 20%.
THE NORMAL DISTRIBUTION
The majority of measurements of continuous data
in medicine and biology tend to approximate the
theoretical distribution that is known as the normal
distribution and is also called the gaussian
distribution (named after Johann Karl Gauss, the
person who best described it).⁶
• The normal distribution is one of the most
frequently used distributions in biomedical and
dental research.
• The normal distribution is a population frequency
distribution.
• It is characterized by a bell-shaped curve that is
unimodal and is symmetric around the mean of the
distribution.
• The normal curve depends on only two
parameters: the population mean and the
population standard deviation.
• In order to discuss the area under the normal curve
in terms of easily seen percentages of the
population distribution, the normal distribution
has been standardized to the standard normal distribution, in
which the population mean is 0 and the population
standard deviation is 1.
• The area under the normal curve can be
segmented starting with the mean in the center (on
the x axis) and moving by increments of 1 SD
above and below the mean.
[Figure: a standard normal distribution (mean = 0, SD = 1) showing the percentage of the area under the curve in each increment of SD: 34.13% between the mean and 1 SD, 13.59% between 1 SD and 2 SD, and 2.27% beyond 2 SD, on each side of the mean.]
• The total area beneath the normal curve is 1, or
100% of the observations in the population
represented by the curve.
• As indicated in the figure, the portion of the area
under the curve between the mean and 1 SD is
34.13% of the total area.
• The same area is found between the mean and one
unit below the mean.
Moving 2 SD more above the mean cuts off an
additional 13.59% of the area, and moving a total
of 3 SD above the mean cuts off another 2.27%.
The theory of the standard normal distribution
leads us, therefore, to the following property of a
normally distributed variable:
Exactly 68.26% of the observations lie within 1 SD
of the mean.
Exactly 95.45% of the observations lie within 2 SD
of the mean.
Exactly 99.73% of the observations lie within 3 SD
of the mean.
Virtually all of the observations are contained
within 3 SD of the mean. This is the justification
used by those who label values outside of the
interval X ± 3 SD as “outliers” or unlikely
values.
Incidentally, the number of standard deviations
away from the mean is called the z score.
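A Python sketch of these areas and of a z score, assuming SciPy is available; the first percentage printed (68.27) differs from the 68.26% quoted above only by rounding:

```python
from scipy.stats import norm

# Fraction of a normal population lying within 1, 2, and 3 SD of the mean
for z in (1, 2, 3):
    area = norm.cdf(z) - norm.cdf(-z)
    print(z, round(100 * area, 2))       # 68.27, 95.45, 99.73 (percent)

# Z score: how many standard deviations an observation lies from the mean
mean, sd = 12.2, 3.31                    # pain-relief example values
x = 18
print(round((x - mean) / sd, 2))         # about 1.75 SD above the mean
```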
Problems In Analyzing A Frequency Distribution
In a normal distribution, the following holds true:
mean = median = mode.
In an observed data set, there may be skewness,
kurtosis, and extreme values, in which case the
measures of central tendency may not follow this
pattern.
Skewness and Kurtosis
1.Skewness.
A horizontal stretching of a frequency distribution
to one side or the other, so that one tail of
observations is longer and has more observations
than the other tail, is called skewness.
When a histogram or frequency polygon has a
longer tail on the left side of the diagram, the
distribution is said to be skewed to the left.
If a distribution is skewed, the mean moves farther
in the direction of the long tail than does the
median, because the mean is more heavily
influenced by extreme values.
A quick way to get an approximate idea of whether
or not a frequency distribution is skewed is to
compare the mean and the median. If these two
measures are close to each other, the distribution
is probably not skewed.
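A minimal sketch of this mean-versus-median check, using invented, positively skewed data:

```python
from statistics import mean, median

# Invented, positively skewed data: a few very large values pull the mean upward
values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 20]
print(mean(values), median(values))   # 4.3 versus 2.5 -> mean well above median, skewed right
```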
2. Kurtosis.
Kurtosis is characterized by a vertical stretching of
the frequency distribution.
It is a measure of the peakedness of a probability
distribution.
As shown in the figure, a kurtotic distribution can
look more peaked or more flattened than the
bell-shaped normal distribution.
A normal distribution has zero kurtosis.
• Significant skewness or kurtosis can be detected
by statistical tests that reveal that the observed
data do not form a normal distribution. Many
statistical tests require that the data they analyze
be normally distributed, and the tests may not be
valid if they are used to compare very abnormal
distributions.
• Kurtosis is seldom discussed as a problem in the
medical literature, although skewness is frequently
observed and is treated as a problem.
3. Extreme values (Outliers)
One of the most perplexing problems for the
analysis of data is how to treat a value that is
abnormally far above or below the mean.
However, before analyzing the data set, the
investigator would want to be sure that this item of
data was legitimate and would check the original
source of data. Although the value is an outlier, it
may well be correct.
ANALYTICAL OR INFERENTIAL STATISTICS
THE NATURE AND PURPOSE OF
STATISTICAL INFERENCE
As stated earlier, it is often impossible to study
each member of a population. Instead, we select a
sample from the population and from that sample
attempt to generalize to the population as a whole.
The process of generalizing sample results to a
population is termed statistical inference and is
the end product of formal statistical hypothesis
testing.
Inference means the drawing of conclusions from
data.
Statistical inference can be defined as the drawing
of conclusions from quantitative or qualitative
information using the methods of statistics to
describe and arrange the data and to test suitable
hypotheses.
Differences Between Deductive Reasoning And
Inductive Reasoning
Because data do not come with their own
interpretation, the interpretation must be put into
the data by inductive reasoning (from Latin,
meaning “to lead into”). This approach to
reasoning is less familiar to most people than is
deductive reasoning (from Latin, meaning “to
lead out from”), which is learned from
mathematics, particularly from geometry.
Deductive reasoning proceeds from the general
(i.e., from assumptions, from propositions, and
from formulas considered true) to the specific (i.e.,
to specific members belonging to the general
category).
Consider, for example, the following two
propositions: (1) All Americans believe in
democracy. (2) This person is an American. If
both propositions are true, then the following
deduction must be true: This person believes in
democracy.
Deductive reasoning is of special use in science
once hypotheses are formed. Using deductive
reasoning, an investigator says, If this hypothesis
is true, then the following prediction or
predictions also must be true.
If the data are inconsistent with the predictions
from the hypothesis, they force a rejection or
modification of the hypothesis. If the data are
consistent with the hypothesis, they cannot prove
that the hypothesis is true, although they do lend
support to the hypothesis.
To reiterate, even if the data are consistent with
the hypothesis, they do not prove the hypothesis.
Physicians often proceed from formulas accepted
as true and from observed data to determine the
values that variables must have in a certain clinical
situation. For example, if the amount of a
medication that can be safely given per kilogram
of body weight (a constant) is known, then it is
simple to calculate how much of that medication
can be given to a patient weighing 50 kg.
This is deductive reasoning, because it proceeds
from the general (a constant and a formula) to the
specific (the patient).
Inductive reasoning, in contrast, seeks to find valid
generalizations and general principles from data.
Statistics, the quantitative aid to inductive
reasoning, proceeds from the specific (that is, from
data) to the general (that is, to formulas or
conclusions about the data).
For example, by sampling a population and
determining both the age and the blood pressure of
the persons in the sample (the specific data), an
investigator using statistical methods can
determine the general relationship between age
and blood pressure (e.g., that, on the average,
blood pressure increases with age).
Differences Between Mathematics And Statistics
The differences between mathematics and statistics
can be illustrated by showing that they form the
basis for very different approaches to the same
basic equation:
y = mx + b
This equation is the formula for a straight line in
analytic geometry. It is also the formula for simple
regression analysis in statistics, although the
letters used and their order customarily are
different.
In the mathematical formula above, the b is a
constant, and it stands for the y-intercept (i.e., the
value of y when the variable x equals 0). The value
m is also a constant, and it stands for the slope (the
amount of change in y for a unit increase in the
value of x).
The important thing to notice is that in
mathematics, one of the variables (either x or y) is
unknown (i.e., to be calculated), while the formula
and the constants are known.
In statistics, however, just the reverse is true: the
variables, x and y, are known for all observations,
and the investigator usually wishes to determine
whether or not there is a linear (straight line)
relationship between x and y, by estimating the
slope and the intercept. This can be done using the
form of analysis called linear regression, which is
discussed later.
As a general rule, what is known in statistics is
unknown in mathematics, and vice versa. In
statistics, the investigator starts from the specific
observations (data) to induce or estimate the
general relationships between variables.
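A sketch of this estimation step, assuming SciPy is available; the age and blood-pressure values are invented for illustration only:

```python
from scipy.stats import linregress

# Invented (age, systolic blood pressure) pairs
age = [25, 35, 45, 55, 65, 75]
bp = [118, 121, 127, 133, 138, 144]

# x and y are known for every observation; the slope (m) and intercept (b)
# of y = mx + b are what the analysis estimates from the data.
result = linregress(age, bp)
print(round(result.slope, 2), round(result.intercept, 1))   # roughly 0.53 and 103.5
```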
Probability
The probability of a specified event is the fraction,
or proportion, of all possible events of a specified
type in a sequence of almost unlimited random
trials under similar conditions.
The probability of an event is the likelihood the
event will occur; it can never be greater than 1
(100%) or less than 0 (0%).
Applications and characteristics
1. The probability values in a population are
distributed in a definable manner that can be used
to analyze the population.
2. Probability values that do not follow a
distribution can be analyzed using nonparametric
methods.
Calculation
The probability of an event is determined as
P (A) = A / N
Where P (A) = the probability of event A
occurring; A = the number of times that event A
actually occurs; and N = the total number of
events during which event A can occur.
Example: A medical student performs
venipunctures on 1000 patients and is successful
on 800 in the first attempt. Assuming that all other
factors are equal (i.e., random selection of
patients), the probability that the next
venipuncture will be successful on the first
attempt is 80%.
Rules
a. Additive rule
1. Definition. The additive rule applies when
considering the probability of one of at least two
mutually exclusive events occurring, which is
calculated by adding together the probability value
of each event.
2. Calculation. The probability of only one of two
mutually exclusive events is determined as
P (A or B) = P (A) + P (B)
Where P (A or B) = the probability of event A or
event B occurring.
3. Example. About 6.3% of all medical students are
black, and 5.5% are Hispanic.
The probability that a medical student will be
either black or Hispanic is 6.3% plus 5.5%, or
11.8%.
b. Multiplicative rule.
1. Definition. The multiplicative rule applies
when considering the probability of at least two
independent events occurring together, which is
calculated by multiplying the probability values
for the events.
2. Calculation. The probability of two independent
events occurring together is determined as
P (A and B) = P (A) × P (B)
Where P (A and B) = the probability of both event
A and event B occurring.
3. Example. About 6.3% of all medical students are
black and 36.1% of all students are women.
Assuming race and sex are independent selection
factors, the percentage of students who are black
women should be about 6.3% multiplied by
36.1%, or 2.3%.
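Both rules in a short Python sketch, using the percentages from the examples above:

```python
# Additive rule: mutually exclusive events (black OR Hispanic)
p_black, p_hispanic = 0.063, 0.055
print(round(p_black + p_hispanic, 3))    # 0.118 -> 11.8%

# Multiplicative rule: independent events (black AND woman)
p_black, p_woman = 0.063, 0.361
print(round(p_black * p_woman, 3))       # 0.023 -> about 2.3%
```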
THE PROCESS OF TESTING
HYPOTHESES
Hypotheses are predictions about what the examination of
appropriate data will show. The following discussion
introduces the basic concepts underlying the usual tests of
statistical significance.
These tests determine the probability that a finding (such
as a difference between means or proportions) represents a
true deviation from what was expected (i.e., from the
model, which is often a null hypothesis that there will be
no difference between the means or proportions).
 False Positive And False Negative Errors
Science is based on the following set of principles
1. Previous experience serves as the basis for
developing hypotheses;
2. hypotheses serve as the basis for developing
predictions;
3. and predictions must be subjected to
experimental or observational testing.
In deciding whether data are consistent or
inconsistent with the hypotheses, investigators are
subject to two types of error.
They could assert that the data support a
hypothesis when in fact the hypothesis is false;
this would be a false-positive error, which is also
called an alpha error or a type I error.
Conversely, they could assert that the data do not
support the hypothesis when in fact the hypothesis
is true; this would be a false-negative error, which
is also called a beta error or a type II error.
Based on the knowledge that scientists become
attached to their own hypotheses and based on the
conviction that proof in science, as in the courts,
must be “beyond a reasonable doubt”,
investigators have historically been particularly
careful to avoid the false-positive error.
Probably this is best for theoretical science.
In medicine, however, where a false-negative error
in a diagnostic test may mean missing a disease
until it is too late to institute therapy and where a
false-negative error in the study of a medical
intervention may mean overlooking an effective
treatment, investigators cannot feel comfortable
about false-negative errors either.
 The Null Hypothesis And The Alternative
Hypothesis
The process of significance testing involves three
basic steps:
(1) Asserting the null hypothesis,
(2) Establishing the alpha level, and
(3) Rejecting or failing to reject a null hypothesis
The first step consists of asserting the null
hypothesis, which is the hypothesis that there is no
real (true) difference between means or
proportions of the groups being compared or that
there is no real association between two
continuous variables. It may seem strange to begin
the process by asserting that something is not true,
but it is far easier to reject an assertion than to
prove something is true.
If the data are not consistent with the hypothesis,
the hypothesis can be rejected.
If the data are consistent with a hypothesis, this
still does not prove the hypothesis, because other
hypotheses may fit the data equally well.
The second step is to determine the probability of
being in error if the null hypothesis is rejected.
This step requires that the investigator establish an
alpha level, as described below.
If the p value is found to be greater than the alpha
level, the investigator fails to reject the null
hypothesis. If, however, the p value is found to be
less than or equal to the alpha level, the next step
is to reject the null hypothesis and to accept the
alternative hypothesis, which is the hypothesis
that there is in fact a real difference or association.
Although it may seem awkward, this process is
now standard in medical science and has yielded
considerable scientific benefits.
Statistical tests begin with the statement of the
hypothesis itself, but stated in the form of a null
hypothesis.
For example, consider again the group of patients
who tested the new pain-relieving drug, drug A,
and recorded their number of minutes to 100%
pain relief. Suppose that a similar sample of
patients tested another drug, drug B, in the same
way, and investigators wished to know if one
group of patients experienced total pain relief
more quickly than the other group.
In this case, the null hypothesis would be stated in
this way: “there is no difference in time to 100%
pain relief between the two pain-relieving drugs A
and B”. The null hypothesis is one of no
difference, no effect, no association, and serves as
a reference point for the statistical test.
In symbols, the null hypothesis is referred to as H0.
In the comparison of the two drugs A and B, we can state the H0 in terms of there being no difference in the average number of minutes to pain relief between drugs A and B, or H0: XA = XB.
The alternative is that the means of the two drugs are not equal. This is an expression of the alternative hypothesis H1.
Null hypothesis H0: XA = XB
Alternative hypothesis H1: XA ≠ XB
 The Alpha Level And P Value
Before doing any calculations to test the null
hypothesis, the investigator must establish a
criterion called the alpha level, which is the
maximum probability of making a false-positive
error that the investigator is willing to accept.
By custom, the level of alpha is usually set at
p = 0.05. This says that the investigator is willing
to run a 5% risk (but no more) of being in error
when asserting that the treatment and control
groups truly differ.
In choosing an alpha level, the investigator inserts
value judgment into the process. However, when
that is done before the data are collected, at least
the post hoc bias of being tempted to adjust the
alpha level to make the data show statistical
significance is avoided.
The p value obtained by a statistical test (such as
the t-test) gives the probability that the observed
difference could have been obtained by chance
alone, given random variation and a single test of
the null hypothesis.
Usually, if the observed p value is ≤ 0.05, members
of the scientific community who read about an
investigation will accept the difference as being
real.
Although setting alpha at ≤ 0.05 is somewhat
arbitrary, that level has become so customary that
it is wise to provide explanations for choosing
another alpha level or for choosing not to perform
tests of significance at all, which may be the best
approach in some descriptive studies.
The p value is the final arithmetic answer that is
calculated by a statistical test of a hypothesis.
Its magnitude informs the researcher as to the
validity of the H0, that is, whether to accept or
reject the H0 as worth keeping.
The p value is crucial for drawing the proper
conclusions about a set of data.
So what numerical value of p should be used as the
dividing line for acceptance or rejection of the H0?
Here is the decision rule for the observed value of
p and the decision regarding the H0:
If p ≤ 0.05, reject the H0
If p > 0.05, accept the H0
If the observed probability is less than or equal to
0.05 (5%), the null hypothesis is rejected, that is,
the observed outcome is judged to be incompatible
with the notion of “no difference” or “no effect”,
and the alternative hypothesis is adopted.
In this case, the results are said to be “statistically
significant”.
If the observed probability is greater than 0.05
(5%), the decision is to accept the null hypothesis,
and the results are called “not statistically
significant” or simply NS, the notation often used
in tables.
Statistical Versus Clinical Significance
The distinction between statistical significance and
clinical or practical significance is worth
mentioning.
For example, in the statistical test of the
H0: XA = XB for two drug groups, let's assume that
the observed probability is p = 0.01, a value that is
less than the dividing line of 0.05 or 5%.
This would lead the investigator to reject the H0 and
to conclude that the results are
"significant at p = 0.01", that is, one drug caused
total pain relief significantly faster, on average,
than the other drug at p = 0.01.
But if the actual difference in the group means is
itself clinically meaningless or negligible, the
statistical significance may be considered real yet
not useful.
According to Dr. Horowitz,
Statistical significance, “is a mathematical
expression of the degree of confidence that an
observed difference between groups represents a
real difference – that a zero response would not
occur if the study were repeated, and that the
study is not merely due to chance”.
On the other hand, “clinical significance is a
judgment made by the researcher or reader that
differences in response to intervention observed
between groups are important for health”.
“It is a subjective evaluation of the test”, continues
Dr. Horowitz, based on clinical experience and
familiarity with the “disease or condition being
measured”.
 Variation In Individual Observations And In
Multiple Samples
Most tests of significance relate to a difference
between means or proportions.
They help investigators decide whether an
observed difference is real, which in statistical
terms is defined as whether the difference is
greater than would be expected by chance alone.
Inspecting the means to see if they were different
is inadequate because it is not known whether the
observed difference was unusual or whether a
difference that large might have been found
infrequently if the experiment were repeated.
To generalize beyond the particular subjects in the
single study, the investigators must know the
extent to which the differences discovered in the
study are reliable.
The estimate of reliability is given by the standard
error, which is not the same as the standard
deviation.
Standard Deviation And Standard Error
A normal distribution could be completely
described by its mean and standard deviation. This
information is useful in describing individual
observations (raw data),
but it is not useful in determining how close a
sample mean from research data is to the mean
for the underlying population (which is also called
the true mean or the population mean). This
determination must be made on the basis of the
standard error.
The standard error is related to the standard
deviation, but it differs from the standard
deviation in important ways.
Basically, the standard error is the standard
deviation of a population of sample means, rather
than of individual observations.
The standard error therefore refers to the variability
of sample means rather than of individual
observations, so it provides an idea of how variable
a single estimate of the mean from one set of
research data is likely to be.
If 100 samples of the same size were drawn from the
underlying population and the mean of each were
calculated, the frequency distribution of the 100
different means could be plotted, treating each mean
as a single observation.
These sample means will form a truly normal
(gaussian) frequency distribution, the mean of
which would be very close to the true mean for the
underlying population.
More important for this discussion, the standard
deviation of this distribution of sample means is
called the standard error of the mean; it describes
how far a single sample mean is likely to fall from
the true mean of the underlying population.
The standard error is a parameter that enables the
investigator to do two things that are central to the
function of statistics.
 One is to estimate the probable amount of error
around a quantitative assertion.
 The other is to perform tests of statistical
significance.
If only the standard deviation and sample size of
one research sample are known, however, the
standard deviation can be converted to a standard
error so that these functions can be pursued.
An unbiased estimate of the standard error can be
obtained from the standard deviation of a single
research sample if the standard deviation was
originally calculated using the degrees of freedom
(N - 1) in the denominator.
The formula for converting a standard deviation
(SD) to a standard error (SE) is as follows:
Standard error = SE = SD / √N
The larger the sample size (N), the smaller the
standard error, and the better the estimate of the
population mean.
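A minimal sketch of this conversion, using a hypothetical sample (SD = 12, N = 36) purely to show the arithmetic:

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Convert a standard deviation to a standard error: SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)

# Hypothetical sample: SD of 12 units based on 36 observations.
print(standard_error(12.0, 36))  # 2.0 -- a larger N gives a smaller SE
```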
At any given point on the x-axis, the height of the
bell-shaped curve of the sample means represents
the relative probability that a single sample mean
would fall at that point.
Most of the time, the sample mean would be near
the true mean. Less often, it would be farther
away.
In the medical literature, means or proportions are
often reported either as the mean plus or minus 1
SD or as the mean plus or minus 1 SE.
Reported data must be examined carefully to
determine whether the SD or the SE is shown.
Either is acceptable in theory, because an SD can
be converted to an SE and vice versa if the sample
size is known.
However, many journals have a policy stating
whether the SD or SE must be reported. The
sample size should also be shown.
Confidence Intervals
Whereas the SD shows the variability of individual
observations, the SE shows the variability of
means.
Whereas the mean plus or minus 1.96 SD estimates
the range in which 95% of individual observations
would be expected to fall, the mean plus or minus
1.96 SE estimates the range in which 95% of the
means of repeated samples of the same size
would be expected to fall.
Moreover, if the value for the mean plus or minus
1.96 SE is known, it can be used to calculate the
95% confidence interval, which is the range of
values in which the investigator can be 95%
confident that the true mean of the underlying
population falls.
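The same arithmetic can be sketched for a 95% confidence interval around a mean; the mean, SD, and sample size below are hypothetical values chosen only for illustration.

```python
import math

def confidence_interval_95(mean: float, sd: float, n: int) -> tuple[float, float]:
    """95% CI for a mean: mean +/- 1.96 * SE, where SE = SD / sqrt(N)."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical sample: mean 120, SD 15, N 100.
low, high = confidence_interval_95(120.0, 15.0, 100)
print(f"95% CI: {low:.2f} to {high:.2f}")  # about 117.06 to 122.94
```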
Tests Of Statistical Significance
The science of biostatistics has given us a large
number of tests that can be applied to public
health data. An understanding of the tests will
guide an individual toward the efficient collection
of data that will meet the assumptions of the
statistical procedures particularly well.
The tests allow investigators to compare two
parameters, such as means or proportions, and to
determine whether the difference between them is
statistically significant.
The various t-tests (the one-tailed Student's t-test,
the two-tailed Student's t-test, and the paired t-test)
compare differences between means, while
z-tests compare differences between proportions.
All of these tests make comparisons possible by
calculating the appropriate form of a ratio, which
is called a critical ratio because it permits the
investigator to make a decision.
This is done by comparing the ratio obtained from
whatever test is performed (e.g., a t- test) with the
values in the appropriate statistical table (e.g., a
table of t values) for the observed number of
degrees of freedom.
Before individual tests are discussed in detail, the
concepts of critical ratios and degrees of freedom
are defined.
Critical Ratios
Critical ratios are a class of tests of statistical
significance that depend on dividing some
parameter (such as a difference between means)
by the standard error (SE) of that parameter.
The general formula for tests of statistical significance is as
follows:
Critical Ratio = Parameter / (SE of that parameter)
When applied to the Student's t-test, the formula
becomes:
Critical Ratio = t = (Difference between two means) / (SE of the difference between two means)
When applied to a z-test, the formula becomes:
Critical Ratio = z = (Difference between two proportions) / (SE of the difference between two proportions)
The value of the critical ratio (e.g., t or z) is then
looked up in the appropriate table (of t or z) to
determine the corresponding value of p.
For any critical ratio, the larger the ratio, the more
likely that the difference between means or
proportions is due to more than just random
variation (i.e., the more likely it is that the
difference can be considered statistically
significant and, hence, real).
Unless the total sample size is small (say, under
30), the finding of a critical ratio of greater than
about 2 usually indicates that the difference is real
and enables the investigator to reject the null
hypothesis.
The statistical tables adjust the critical ratios for
the sample size by means of the degrees of
freedom.
Degrees of Freedom
The term “degrees of freedom” refers to the
number of observations that are free to vary.
The Idea Behind The Degrees Of Freedom
A degree of freedom is lost every time a mean is
calculated.
Why should this be?
Before putting on a pair of gloves, a person has the
freedom to decide whether to begin with left or the
right glove. However, once the person puts on the
first glove, he or she loses the freedom to decide
which glove to put on last.
If centipedes put on shoes, they would have a
choice to make for the first 99 shoes but not for
the 100th
shoe. Right at the end, the freedom to
choose (vary) is restricted.
In statistics, if there are two observed values, only
one estimate of the variation between them is
possible.
Something has to serve as the basis against which
other observations are compared.
The mean is the most “solid” estimate of the
expected value of a variable, so it is assumed to be
“fixed”.
This implies that the numerator of the mean (the
sum of the individual observations, or the sum of xi),
which is based on N observations, is also fixed.
Once N – 1 observations (each of which was,
presumably, free to vary) have been added up, the
last observation is not free to vary, because the
total of the N observations must add up to the
sum of xi.
For this reason, 1 degree of freedom is lost each
time a mean is calculated. The proper average of a
sum of squares when calculated from an observed
sample, therefore, is the sum of squares divided by
the degrees of freedom (N - 1).
Hence, for simplicity, the degrees of freedom for
any test are considered to be the total sample size
minus 1 degree of freedom for each mean that is
calculated. In the Student's t-test, 2 degrees of
freedom are lost because two means are calculated
(one mean for each group whose means are to be
compared).
The general formula for degrees of freedom for the
Student's two-group t-test is N1 + N2 – 2,
where N1 is the sample size in the first group and
N2 is the sample size in the second group.
Use of t- test
In medical research, t- tests are among the three or
four most commonly used statistical tests
(Emerson and Colditz 1983)6.
The purpose of t- test is to compare the means of a
continuous variable in two research samples in
order to determine whether or not the difference
between the two observed means exceeds the
difference that would be expected by chance from
random sampling.
Sample population and Sizes
If two different samples come from two different
groups (e.g., a group of men and a group of
women), the Student’s t- test is used.
If the two samples come from the same group
(e.g., pretreatment and post- treatment values for
the same study subjects), the paired t- test is used.
Both types of t-tests depend on certain
assumptions, including the assumption that the
data in the continuous variable are normally
distributed (i.e., have a bell-shaped distribution).
Very seldom, however, will observed data be
perfectly normally distributed. Does this invalidate
the t-test? Fortunately, it does not. There is a
convenient theorem that rescues the t-test (and
much of statistics as well).
The central limit theorem can be derived
theoretically or observed by experimentation.
According to the theorem, for reasonably large
samples (say, 30 or more observations in each
sample), the distribution of the means of many
samples is normal (gaussian), even though the data
in individual samples may have skewness,
kurtosis, or unevenness.
Because the critical theoretical requirement for the
t-test is that the sample means be normally
distributed, a t-test may be computed on almost
any set of continuous data, as long as the
observations can be considered a random sample
and the sample size is reasonably large.
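The theorem can also be observed by simulation. The sketch below draws repeated samples from a deliberately skewed (exponential) population and shows that the sample means nonetheless cluster symmetrically around the true population mean; the sample sizes and the choice of population are illustrative only.

```python
import random
import statistics

# Simulate the central limit theorem: draw many samples from a skewed
# (exponential) population and examine the distribution of sample means.
random.seed(0)
sample_size = 30
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(1000)
]

# The individual observations are skewed, but the 1,000 sample means cluster
# symmetrically around the true population mean of 1.0.
print(round(statistics.mean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3))  # close to 1 / sqrt(30), about 0.18
```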
The t-distribution
The t distribution was described by William Gosset,
who used the pseudonym “Student” when he
wrote the description.
The t distribution looks similar to normal
distribution, except that its tails are somewhat
wider and its peak is slightly less high, depending
on the sample size.
The t distribution is necessary because when
sample sizes are small, the observed estimates of
the mean and variance are subject to considerable
error.
The larger the sample size is, the smaller the errors
are, and the more the t distribution looks like the
normal distribution. In the case of an infinite
sample size, the two distributions are identical.
For practical purposes, when the combined sample
size of the two groups being compared is larger
than 120, the difference between the normal
distribution and the t distribution is negligible.
Student’s t test
There are two types of Student’s t test:
the one-tailed and
the two-tailed type.
The calculations are the same, but the
interpretation of the resulting t differs somewhat.
The common features will be discussed before the
differences are outlined.
Calculation of the value of t.
In both types of Student’s t test, t is calculated by
taking the observed differences between the means
of the two groups (the numerator) and dividing
this difference by the standard error of the
difference between the means of the two groups
(the denominator).
Before t can be calculated, then, the standard error
of the difference between the means (SED) must
be determined.
The basic formula for this is the square root of the
sum of the respective population variances, each
divided by its own sample size.
When the Student's t-test is used to test the null hypothesis
in research involving an experimental group and a control
group, it usually takes the general form of the following
equation:
t = (xE – xC – 0) / √( s²p [(1 / NE) + (1 / NC)] )
df = NE + NC – 2
The 0 in the numerator of the equation for t was
added for correctness, because the t-test
determines if the difference between the means is
significantly different from 0.
However, because the 0 does not affect the
calculations in any way, it is usually omitted from
t-test formulas.
The same formula, recast in terms that apply to any
two independent samples (e.g., samples of men
and women), is as follows:
t = (x1 – x2 – 0) / √( s²p [(1 / N1) + (1 / N2)] )
df = N1 + N2 – 2
in which x1 is the mean of the first sample, x2 is
the mean of the second sample, s²p is the pooled
estimate of the variance, N1 is the size of the first
sample, N2 is the size of the second sample, and df
is the degrees of freedom.
The 0 in the numerator indicates that the null
hypothesis states that the difference between the
means will not be significantly different from 0.
The df is needed to enable the investigator to refer
to the correct line in the table of the values of t and
their relationship to p.
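As a rough illustration of the pooled-variance formula above, the following Python sketch computes t and the degrees of freedom for two small hypothetical samples of minutes to total pain relief; the data and function name are invented for illustration.

```python
import math

def students_t(x1: list[float], x2: list[float]) -> tuple[float, int]:
    """Two-sample Student's t-test using a pooled variance estimate."""
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    s2_pooled = (ss1 + ss2) / (n1 + n2 - 2)        # pooled variance
    se_diff = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    t = (m1 - m2 - 0) / se_diff                    # the 0 is the null difference
    return t, n1 + n2 - 2                          # t statistic and degrees of freedom

# Hypothetical minutes to total pain relief for drugs A and B.
drug_a = [22, 25, 30, 28, 24, 27]
drug_b = [31, 35, 29, 33, 36, 34]
t_value, df = students_t(drug_a, drug_b)
print(f"t = {t_value:.2f} on {df} degrees of freedom")
```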
The t test is designed to help investigators
distinguish “explained variation” from
“unexplained variation” (random error, or
chance).
These concepts are like “signal” and “background
noise” in radio broadcast engineering. Listeners
who are searching for a particular station on their
radio dial will find background noise on almost
every radio frequency.
When they reach the station that they want to hear,
they may not notice the background noise, since
the signal is so much stronger than this noise.
In medical studies, the particular factor that is
being investigated is similar to the radio signal,
and random error is similar to background noise.
Statistical analysis helps distinguish one from the
other by comparing their strengths.
If the variation caused by the factor of interest is
considerably larger than the variation caused by
random factors (i.e., if in the t-test the ratio is
approximately 1.96 or greater), the effect of the factor of
interest becomes detectable above the statistical
“noise” of random factors.
Interpretation of the results
If the value of t is large, the p value will be small,
because it is unlikely that a large t ratio will be
obtained by chance alone. If the p value is 0.05 or
less, it is customary to assume that there is a real
difference. Conceptually, the p value is the
probability of being in error if the null hypothesis
of no difference between the means is rejected and
the alternative hypothesis of a true difference is
accepted.
• One-Tailed and Two-Tailed t-Tests
• These tests are sometimes called the one-sided test and the two-sided test.
• In the two-tailed test, alpha is equally divided at the ends of the two tails of the distribution. The two-tailed test is generally recommended, because differences in either direction are usually important to document.
For example, it is obviously important to know if a
new treatment is significantly better than a
standard or placebo treatment, but it is also
important to know if a new treatment is
significantly worse and should therefore be
avoided.
In this situation, the two-tailed test provides an
accepted criterion for when a difference shows the
new treatment to be either better or worse.
Sometimes, however, only a one-tailed test is
needed.
Suppose, for example, that a new therapy is known
to cost much more than the currently used therapy.
Obviously, it would not be used if it were worse
than the current therapy, but it would also not be
used if it were merely as good as the current
therapy.
Under these circumstances, some investigators consider it
acceptable to use a one-tailed test. When this occurs, the
5% rejection region for the null hypothesis is all put on
one tail of the distribution, instead of being evenly
divided between the extremes of the two tails.
In the one-tailed test, the null hypothesis
nonrejection region extends only to 1.645 standard
errors above the “no difference” point of 0.
In the two-tailed test, it extends to 1.96 standard
errors above and below the “no difference” point.
This makes the one-tailed test more statistically
sensitive – that is, more able to detect a significant
difference if it is in the expected direction. Many investigators
dislike one-tailed tests, because they believe that if
an intervention is significantly worse than the
standard therapy, that should be documented
scientifically. Most reviewers and editors require
that the use of a one-tailed significance test be
justified.
Paired t- test
In many medical studies, individuals are followed
over time to see if there is a change in the value of
some continuous variable. Typically, this occurs in
a “before and after” experiment, such as one testing
to see if there was a drop in average blood pressure
following treatment or to see if there was a drop in
weight following the use of a special diet. In this
type of comparison, an individual patient serves as
his or her own control.
The appropriate statistical test for this kind of data
is the paired t-test. The paired t-test is more robust
than the Student’s t-test because it considers the
variation from only one group of people, whereas
the Student’s t-test considers variation from two
groups.
Any variation that is detected in the paired t-test is
attributable to the intervention or to changes over
time in the same person.
Calculation of the value of t
To calculate a paired t-test, a new variable is
created. This variable, called d, is the difference
between the values before and after the
intervention for each individual studied.
The paired t-test is a test of the null hypothesis
that, on the average, the difference is equal to 0,
which is what would be expected if there were no
change over time.
Using the symbol d to indicate the mean
observed difference between the before and after
values, the formula for the paired t-test is as
follows:
tpaired
= tp
= d – 0
Standard error of d
= d – 0
sd
2
N
df = N – 1
But in the paired t-test, because only one mean is
calculated (d) , only one degree of freedom is
lost; therefore, the formula for the degrees of
freedom is N – 1.
Interpretation of the results
If the value of t is large, the p value will be small,
because it is unlikely that a large t ratio will be
obtained by chance alone. If the p value is 0.05 or
less, it is customary to assume that there is a real
difference (i.e., that the null hypothesis of no
difference can be rejected).
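A minimal sketch of this calculation, using hypothetical before and after systolic blood pressure readings for the same subjects; the data and the helper name are illustrative only.

```python
import math

def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired t-test: t = (mean difference - 0) / SE of the mean difference."""
    d = [a - b for a, b in zip(after, before)]     # change for each subject
    n = len(d)
    d_bar = sum(d) / n
    s2_d = sum((v - d_bar) ** 2 for v in d) / (n - 1)
    se_d = math.sqrt(s2_d / n)
    return (d_bar - 0) / se_d, n - 1               # t statistic and df = N - 1

# Hypothetical systolic blood pressures before and after treatment.
before = [150, 148, 160, 155, 162, 149]
after = [142, 145, 150, 151, 155, 147]
t_value, df = paired_t(before, after)
print(f"t = {t_value:.2f} on {df} degrees of freedom")
```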
Use of z-tests
In contrast to t-tests, which compare differences
between means, z-tests compare differences
between proportions.
In medicine, examples of proportions that are
frequently studied are sensitivity, specificity,
positive predictive value, risks, percentages of
people with a given symptom, percentages of
people who are ill, and percentages of ill people
who survive their illness.
Frequently, the goal of research is to see if the
proportion of patients surviving in a treated group
differs from that in an untreated group. This can
be evaluated using a z-test for proportions.
Calculation of the value of z
As discussed earlier, z is calculated by taking the
observed difference between the two proportions
(the numerator) and dividing it by the standard
error of the difference between the two
proportions (the denominator).
For purposes of illustration, assume that research is
being conducted to see if the proportion of patients
surviving in a treated group is greater than that in
an untreated group.
For each group, if p is the proportion of successes
(survivals), then 1 – p is the proportion of failures
(nonsurvivals).
If N represents the size of the group on which the
proportion is based, the parameters of the
proportion can be expressed as follows:
Variance (proportion) = p (1 – p) / N
Standard error (proportion) = SEp = √[ p (1 – p) / N ]
95% confidence interval = 95% CI = p ± 1.96 SEp
If there is a 0.60 (60%) survival rate following a given
treatment, the calculations of SEp and the 95% CI of the
proportion, based on a sample of 100 study subjects, would
be as follows:
SEp = √[ (0.6) (0.4) / 100 ] = √(0.24 / 100) = 0.49 / 10 = 0.049
95% CI = 0.6 ± (1.96) (0.049)
= 0.6 ± 0.096
= between 0.6 – 0.096 and 0.6 + 0.096
= 0.504, 0.696
Now that there is a way to obtain the standard error
of a proportion, the standard error of the
difference between proportions also can be
obtained, and the equation for the z-test can be
expressed as follows:
z = (p1 – p2 – 0) / √( p̄ (1 – p̄) [(1/ N1) + (1/ N2)] )
in which p1 is the proportion of the first sample, p2
is the proportion of the second sample, N1 is the
size of the first sample, N2 is the size of the second
sample, and p̄ is the mean proportion of
successes in all observations combined. The 0 in
the numerator indicates that the null hypothesis
states that the difference between the proportions
will not be significantly different from 0.
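A minimal sketch of the two-proportion z-test, using hypothetical survival counts; p̄ here is the pooled (mean) proportion used for the standard error, as in the formula above.

```python
import math

def z_test_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """z-test for the difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                  # mean proportion overall
    se_diff = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2 - 0) / se_diff

# Hypothetical survival counts: 60 of 100 treated vs 45 of 100 untreated.
z = z_test_proportions(60, 100, 45, 100)
print(f"z = {z:.2f}")  # compare with 1.96 for significance at alpha = 0.05
```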
Interpretation of results
Note that the above formula for z is similar to the
formula for t in the Student’s t-test, as described
earlier. However, because the variance and the
standard error of the proportion are based on a
theoretical distribution (the binomial
distribution, approximated by the z distribution), the z
distribution is used instead of the t distribution in
determining whether the difference is statistically
significant. When the z ratio is large (as when the t
ratio is large), the difference is more likely to be
real.
The computations for the z tests appear different
from the computations for the chi-square test, but
when the same data are set up as a 2 × 2 table,
technically the computations for the two tests are
identical. Most people find it easier to do a chi-
square test than do a z-test for proportions.
Choosing An Appropriate
Statistical Test
A variety of statistical tests can be used to analyze
the relationship between two or more variables.
Bivariate analysis is the analysis of the
relationship between one independent (possibly
causal) variable and one dependent (outcome)
variable, whereas multivariable analysis is
the analysis of the relationship of more than one
independent variable to a single dependent
variable.
Statistical tests should be chosen only after the
types of clinical data to be analyzed and the basic
research design have been established. In general,
the analytic approach should begin with a study of
the individual variables, including their
distributions and outliers, and with a search for
errors. Then bivariate analysis can be done to test
hypotheses and probe for relationships. Only after
these procedures have been done carefully should
multivariable analysis be attempted.
Among the factors involved in choosing an
appropriate statistical test are the goals and
research design of the study and the type of data
being gathered.
In some studies the investigators are interested in
descriptive information, such as the sensitivity or
specificity of a laboratory assay, in which case
there may be no reason to perform a test of
statistical significance.
In other studies, the investigators are interested in
determining whether the difference between two
means is real, in which case testing for statistical
significance is appropriate.
The types of variables and the research designs set
the limits to statistical analysis and determine
which tests are appropriate. An investigator’s
knowledge of the types of variables (continuous
data, ordinal data, dichotomous data and nominal
data) and appropriate statistical tests is analogous
to a painter’s knowledge of the types of media
(oils, tempera, water colors, and so forth) and the
appropriate brushes and techniques to be used.
If the research design involves before and after
comparisons in the same study subjects or
involves comparisons of matched pairs of study
subjects, a paired test of statistical significance-
such as the paired t-test, the Wilcoxon matched
pairs signed-ranks test, or the McNemar chi-
square test- would be appropriate. Moreover, if the
sampling procedure in a study is not random,
statistical tests that assume random sampling, such
as most of the parametric tests, may not be valid.
Making inferences from continuous
(parametric) data
If the study involves two continuous variables, the
following questions may be answered:
(1) is there a real relationship between the
variables or not?
(2) If there is real relationship, is it a positive or
negative linear relationship (a straight-line
relationship), or is it more complex?
(3) If there is a real relationship, how strong is it?
(4) How likely is the relationship to be
generalizable?
The best way to answer these questions is first to
plot the continuous data on a joint distribution
graph and then to perform correlation analysis and
simple linear regression analysis.
The Joint Distribution Graph
Taking the example of a sample of elderly
xerostomia patients, does the number of root
caries increase with increasing amounts of sugar
in the diet (number of servings per day)? In this
instance, data are recorded on a single group of
subjects, and each subject constitutes a pair of
measures (number of servings per day of sugar
and number of root caries). Commonly, any pair
of variables entered into a correlation analysis is
given the names x and y.
This data can be plotted on a joint distribution
graph, as shown in fig. The data do not form a
perfectly straight line, but they do appear to lie
along a straight line, going from the lower left to
the upper right on the graph, and all of these
observations but one are fairly close to the line.
As indicated in fig, the correlation between two
variables, labeled x and y, can range from
nonexistent to strong. If the value of y increases as
x increases, the correlation is positive; if y
decreases as x increases, the correlation is
negative.
It appears from the graph that the correlation between
amount of sugar and dental caries is strong and is
positive.
Therefore, based on fig, the answer to the first
question above is that there is a real relationship
between amount of sugar and dental caries. The
graph, however, does not reveal the probability
that such a relationship could have occurred by
chance. The answer to the second question is that
the relationship is positive and is linear. The graph
does not provide quantitative information about
how strong the association is (although it looks
strong to the eye).
To answer these questions more precisely, it is
necessary to use the techniques of correlation and
simple linear regression. Neither the graph nor
these techniques, however, can answer the
question of how generalizable the findings are.
The Pearson Correlation Coefficient
Even without plotting the observations for two
variables (variable x and variable y) on a graph,
the extent of their linear relationship can be
determined by calculating the Pearson product-
moment correlation coefficient, which is given the
symbol r and is referred to as the r value.
This statistic varies from –1 to +1, going through
0. A finding of –1 indicates that the two variables
have a perfect negative linear relationship; +1
indicates that they have a perfect positive linear
relationship; and 0 indicates that the two variables
are totally independent of each other. The r value
is rarely found to be –1 or +1.
Frequently, there is an imperfect correlation
between the two variables, resulting in r values
between 0 and 1 or between 0 and –1. Because the
Pearson correlation coefficient is strongly
influenced by extreme values, the value of r can
only be trusted when the distribution of each of
the two variables to be correlated is approximately
normal (i.e., without severe skewness or extreme
outlier values).
As is the case in every test of significance, for a
fixed level of strength of association, the larger the
sample size, the more likely it is to be statistically
significant.
A weak correlation in a large sample might be
statistically significant, despite the fact that it was
not etiologically or clinically important.
There is no perfect statistical way to estimate
clinical importance, but with continuous variables
a valuable concept is the strength of the
association, measured by the square of the
correlation coefficient, or r².
The r² value is the proportion of variation in y
explained by x (or vice versa). It is an important
parameter in advanced statistics.
Looking at the strength of association is analogous
to looking at the size and clinical importance of an
observed difference.
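A minimal sketch of the calculation of r and r², using a hypothetical set of paired observations on sugar intake and root caries; the numbers are invented purely for illustration.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Hypothetical data: daily servings of sugar vs number of root caries.
sugar = [1, 2, 3, 4, 5, 6]
caries = [0, 1, 1, 2, 3, 4]
r = pearson_r(sugar, caries)
print(f"r = {r:.2f}, r-squared = {r * r:.2f}")
```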
Linear Regression Analysis
Linear regression is related to correlation analysis,
but it produces two parameters that can be directly
related to the data (i.e., the slope and the
intercept). Linear regression seeks to quantify the
linear relationship that may exist between an
independent variable x and a dependent variable y.
Recall that the formula for a straight line, as
expressed in statistics, is y=a+bx. The y is the
value of an observation on the y-axis; x is the
value of the same observation on the x-axis; a is
the regression constant (the value of y when the
value of x is 0); and b is the slope (the change in
the value of y for a unit change in the value of x).
Linear regression is used to estimate two
parameters: the slope of the line (b) and the y-
intercept (a).
Most fundamental is the slope, which determines
the strength of the impact of variable x on y. For
example, the slope can tell how much weight will
increase, on the average, for each additional
centimeter of height.
Linear regression analysis enables investigators
to predict the value of y from the values that x
takes.
In other words, the formula for linear regression is
a form of statistical modeling, and the adequacy of
the model is determined by how closely the value
of y can be predicted from other data in the model.
Just as it is possible to set confidence intervals
around parameters such as means and proportions,
it is possible to set confidence intervals around the
slope and the intercept, using computations based
on linear regression formulas. Most statistical
computer programs perform these computations;
the details are within the scope of advanced statistics.
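A minimal sketch of simple linear regression, estimating the intercept (a) and slope (b) by least squares; the height and weight data are hypothetical.

```python
def linear_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept (a) and slope (b) for the line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return a, b

# Hypothetical data: height (cm) as x and weight (kg) as y.
height = [160, 165, 170, 175, 180]
weight = [55, 60, 66, 70, 77]
a, b = linear_regression(height, weight)
print(f"weight ~ {a:.1f} + {b:.2f} * height")
```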
Making Inferences From Ordinal Data
Many medical data are ordinal data, which are
ranked from the lowest value to the highest value
but are not measured on an exact scale. In some
cases, investigators will assume that ordinal data
meet the criteria for continuous (measurement)
data and will treat the ordinal data as though they
had been obtained from a measurement scale.
For example, if the patient’s satisfaction with the
care in a given hospital were being studied, the
investigators might assume that the conceptual
distance between “very satisfied” (coded as a 3)
and “fairly satisfied” (coded as a 2) is equal to the
distance between “fairly satisfied” (coded as a 2)
and “unsatisfied” (coded as a 1).
If the investigators are willing to make these
assumptions, the data can be analyzed using the
parametric statistical methods such as t-tests,
analysis of variance, and analysis of the Pearson
correlation coefficient. However, clinical
investigators sometimes make this assumption even
when its appropriateness is questionable, because
the statistics are easier to obtain and are more
likely to produce statistical significance.
If the investigator is unwilling to make such
assumptions, statistics for discrete (nonparametric)
data, such as a chi-square test, can be used.
However, analysis using chi-square would require
discarding the information about the rank of each
observation. Fortunately, there are a number of
bivariate statistical tests for ordinal data that can be
used.
The Mann-Whitney U Test
It is one of the best-known non-parametric
significance tests. It was proposed, apparently
independently, by Mann and Whitney (1947) and
Wilcoxon (1945), and therefore is sometimes also
called the Mann-Whitney-Wilcoxon (MWW)
test or the Wilcoxon rank-sum test.
In statistics the Mann-Whitney U test is a test for
assessing whether the medians of two
samples of observations are the same. The null
hypothesis is that the two samples are drawn from
a single population, and therefore that the medians
are equal. It requires the two samples to be
independent, and the observations to be ordinal or
continuous measurements, i.e. one can at least say,
of any two observations, which is the greater.
The test for ordinal data that is similar to the
Student’s t-test is the Mann-Whitney U test, also
called the Wilcoxon rank-sum test. U, like t,
designates a probability distribution. In the Mann-
Whitney test, all of the observations in a study of
two samples are ranked numerically from the
smallest to the largest, without regard to whether
the observations came from the first sample (e.g.,
the control group) or from the second sample (e.g.,
the experimental group).
Next, the observations from the first sample are
identified, the ranks in this sample are summed,
and the average rank for the first sample and the
variance of those ranks are determined. The
process is repeated for the observations from the
second sample. If the null hypothesis is true (i.e.,
if there is no real difference between the two
samples), the average ranks of the two samples
should be similar.
If the average rank of one sample is considerably
greater or considerably smaller than that of the
other sample, the null hypothesis probably can be
rejected, but a test of significance is needed to be
sure.
Because the U statistic is tedious to calculate,
a t-test performed on the ranks can be done instead
and will yield very similar results. This test uses
the ranked data and divides the difference between the
two average ranks (which forms the numerator) by
the square root of the pooled variance of the two
rank lists. The degrees of freedom equals the sum
of the sample sizes of the two groups minus 2.
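Assuming a Python environment with SciPy available, the Mann-Whitney U test can be carried out directly; the two sets of ordinal scores below are hypothetical.

```python
from scipy.stats import mannwhitneyu  # assumes SciPy is installed

# Hypothetical ordinal scores (e.g., patient satisfaction) for two groups.
control = [2, 3, 3, 1, 2, 4, 2]
treated = [4, 5, 3, 5, 4, 4, 5]

u_statistic, p_value = mannwhitneyu(control, treated, alternative="two-sided")
print(f"U = {u_statistic}, p = {p_value:.3f}")
```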
The Wilcoxon Matched-Pairs Signed-Ranks Test
The test is named for Frank Wilcoxon (1892–1965),
who proposed both this test and the rank-sum test
for two independent samples (Wilcoxon, 1945).
Like the t-test, the Wilcoxon test involves
comparisons of differences between
measurements, so it requires that the data are
measured at an interval level of measurement.
However it does not require assumptions about the
form of the distribution of the measurements. It
should therefore be used whenever the
distributional assumptions that underlie the t-test
cannot be satisfied.
The rank-order test that is comparable to the
paired t-test is the Wilcoxon matched-pairs
signed-ranks test. In this test, all of the
observations in a study of two samples are ranked
numerically from the largest to the smallest,
without regard to whether the observations came
from the first sample (e.g., the pretreatment
sample) or from the second sample (e.g., the post
treatment sample).
After pairs of data are identified (e.g., pretreatment
and post treatment sample), the difference in rank
is identified for each pair. If in a given pair the
pretreatment observation scored 7 ranks higher
than the post treatment observation, the difference
would be noted as –7. If in another pair the
pretreatment observation scored 5 ranks lower
than the post treatment observation, the difference
would be noted as +5.
Each pair would be scored in this way. If the null
hypothesis is true (i.e., if there is no real difference
between the samples), the sum of the positive
scores and negative scores should be close to 0. If
the average difference is considerably different
from 0, the null hypothesis can be rejected.
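Assuming SciPy is available, the Wilcoxon matched-pairs signed-ranks test can be sketched as follows; the pretreatment and post-treatment scores are hypothetical.

```python
from scipy.stats import wilcoxon  # assumes SciPy is installed

# Hypothetical pretreatment and post-treatment scores for the same subjects.
pre = [8, 7, 6, 9, 8, 7, 10, 9]
post = [6, 5, 6, 7, 6, 6, 8, 7]

statistic, p_value = wilcoxon(pre, post)
print(f"W = {statistic}, p = {p_value:.3f}")
```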
The Kruskal-Wallis Test
If the investigators in a study involving continuous
data want to compare the means of three or more
groups simultaneously, the appropriate test is a
one-way analysis of variance (a one-way
ANOVA), usually called an F-test. The
comparable test for ordinal data is called the
Kruskal-Wallis one-way ANOVA.
As in the Mann-Whitney U test, in the Kruskal-
Wallis test all of the data are ranked numerically,
and the rank values are summed in each of the
groups to be compared.
The Kruskal-Wallis test seeks to determine if the
average ranks from three or more groups differ
from one another more than would be expected by
chance alone.
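Assuming SciPy is available, a Kruskal-Wallis test on three hypothetical groups of ordinal scores might look like the following sketch.

```python
from scipy.stats import kruskal  # assumes SciPy is installed

# Hypothetical ordinal pain scores in three treatment groups.
group_a = [2, 3, 4, 3, 2]
group_b = [4, 5, 5, 4, 6]
group_c = [1, 2, 2, 3, 1]

h_statistic, p_value = kruskal(group_a, group_b, group_c)
print(f"H = {h_statistic:.2f}, p = {p_value:.3f}")
```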
The Sign Test
The sign test can be used to test the hypothesis that
there is "no difference" between two continuous
distributions X and Y.
Sometimes an experimental intervention produces
positive results in many areas, but few if any of
the individual outcome variables show a
statistically significant improvement.
In this case, the sign test can be extremely helpful
in comparing the results in the experimental group
with those in the control group. If the null
hypothesis is true (i.e., there is no real difference
between the groups), then, by chance, for half of
the outcome variables the experimental group
should perform better, and for half of the outcome
variables the control group should perform better.
The only data needed for the sign test are the
record of whether, on the average, the
experimental subjects or the control subjects
scored “better” on each outcome variable (by what
amount is not important).
If the average score in the experimental group is
better, the result is recorded as a plus sign (+); if
the average score in the control group is better, the
result is scored as a minus sign (-); and if the
average score in the two groups is exactly the
same, no result is recorded and the variable is
omitted from the analysis.
For the sign test, “better” can be determined from a
continuous variable, an ordinal variable, a
dichotomous variable, a clinical score, or a
component of a score. Because under the null
hypothesis, the expected proportion of plus signs
is 0.5 and of minus signs is 0.5, the test compares
the observed proportion of successes with the
expected value of 0.5.
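Because the sign test reduces to a binomial calculation with an expected proportion of 0.5, it can be sketched in a few lines of Python; the 9-to-1 split of plus and minus signs below is hypothetical.

```python
from math import comb

def sign_test_p(plus: int, minus: int) -> float:
    """Two-sided sign test: probability of a split at least this extreme
    when plus and minus signs are equally likely (p = 0.5)."""
    n = plus + minus
    k = max(plus, minus)
    # Probability of k or more of the more common sign, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical study: the experimental group scored better on 9 of 10
# outcome variables (ties are omitted).
print(f"p = {sign_test_p(9, 1):.3f}")
```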
Making Inferences From Dichotomous
And Nominal (Nonparametric) Data
The chi-square test, the Fisher exact probability
test, and the McNemar chi-square test can be used
in the bivariate analysis of dichotomous
nonparametric data. Usually, the data are first
arranged in a 2×2 table, and the goal is to test the
null hypothesis that the variables are independent.
The 2×2 Contingency Table
The contingency table is used to determine
whether the distribution of one variable is
conditionally dependent (contingent) upon the
other variable.
More specifically, a 2×2 contingency table is one
that has two cells in each direction.
In a contingency table, a cell is a specific location
in the matrix created by the two variables whose
relationship is being studied. Each cell shows the
observed number, the expected number, and the
percentage of study subjects in each treatment
group who lived or died.
If there are more than two cells in each direction of
a contingency table, the table is called an R × C
table, where R stands for the number of rows and
C stands for the number of columns. Although the
principles of the chi-square test are valid for R × C
tables, the discussion below focuses on 2×2 tables.
The Chi-Square Test Of Independence
After t-tests, the most basic and common form of
standard analysis in the medical literature is the
chi-square test of the independence of two
variables in a contingency table (Emerson and
Colditz 1983).
The chi-square test is an example of a common
approach to statistical analysis known as
statistical modeling, which seeks to develop a
statistical expression (the model) that predicts the
behavior of a dependent variable on the basis of
knowledge of one or more independent variables.
The process of comparing the observed counts
with the expected counts- that is, of comparing O
with E- is called a goodness of fit test, because the
goal is to see how well the observed counts in a
contingency table “fit” the counts expected on the
basis of the model. Usually, the model in such a
table is the null hypothesis that the two variables
are independent of each other.
If the chi-square value is small, the fit is good and
the null hypothesis is not rejected. If, however, the
chi-square value is large, the data do not fit the
hypothesis well.
Calculation Of The Chi-Square Value
Once the observed (O) and expected (E) counts are
known, the chi-square (χ2
) value can be calculated.
One of two methods can be used, depending on the
size:
Method for large numbers
Method for Small Numbers
Method for large numbers
The investigators begin by calculating the
chi-square value for each cell in the table, using
the following formula:
(O – E)² / E
Here, the numerator is the square of the deviation
of the observed count in a given cell from the
count that would be expected in that cell if the null
hypothesis were true.
This is similar to the numerator of the variance,
which is expressed as (xi – x̄)², where xi represents
the observed value and x̄ (the mean) is the
expected value. However, whereas the
denominator for variance is the degrees of
freedom (N – 1), the denominator for chi-square is
the expected number (E).
To obtain the total chi-square value for a 2×2 table,
the investigators then add up the chi-square values
for the four cells:
χ² = Σ [ (O – E)² / E ]
Thus, the basic statistical method for measuring the
total amount of variation in a data set, the total
sum of squares (TSS), is rewritten for the chi-square
test as the sum of (O – E)².
Method for Small Numbers
Because the chi-square test is based on the normal
approximation of the binomial distribution (which
is discontinuous), many statisticians believe that a
correction for continuity is needed in the equation
for calculating chi-square, while others believe
that this is unnecessary.
The correction, originally described by F. Yates
and called the Yates correction for continuity,
makes little difference if the numbers in the table
are large, but in tables with small numbers it
probably is worth doing.
The only change in the chi-square test formula
given above is that in the continuity corrected chi-
square test, the number 0.5 is subtracted from the
absolute value of the (O -E) in each cell before
squaring. The formula is as follows:
Yates χ² = Σ [ ( |O – E| – 0.5 )² / E ]
Clearly, the use of this formula reduces the size of
the chi-square value somewhat and reduces the
chance of finding a statistically significant
difference, so that correction for continuity makes
the test more conservative.
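A minimal sketch of the chi-square calculation for a 2×2 table, with and without the Yates correction; the cell counts are hypothetical and the function name is illustrative.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> float:
    """Chi-square for a 2x2 table [[a, b], [c, d]] from observed vs expected counts."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed[i][j] - expected)
            if yates:
                deviation -= 0.5          # Yates correction for continuity
            chi2 += deviation ** 2 / expected
    return chi2

# Hypothetical table: survived/died in treated vs untreated groups.
print(round(chi_square_2x2(30, 10, 20, 20), 2))              # uncorrected
print(round(chi_square_2x2(30, 10, 20, 20, yates=True), 2))  # Yates-corrected
```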
Determination of Degrees of Freedom
The term degrees of freedom refers to the number
of observations that can be considered to be free to
vary. According to the null hypothesis, the best
estimate of the expected distribution of counts in
the cells of a contingency table is provided by the
row and column totals.
Therefore, the row and column totals are
considered to be fixed, as is the mean in
calculating a variance.
An observed count can be entered “freely” into only
one of the cells of a 2×2 table; once that count is
fixed, the row and column totals determine the counts
in the remaining cells, so a 2×2 table has only 1
degree of freedom.
Multivariable Analysis
Statistical models that have one outcome variable
but include more than one independent variable
are generally called multivariable models.
Multivariable models are intuitively attractive to
investigators, but this attractiveness should not
tempt them to ignore the basic principles of good
research design and analysis, because
multivariable analysis also has many limitations.
The methodology and interpretation of findings in
this type of analysis are difficult for most
physicians, despite the fact that the methods and
results of multivariable analysis are reported
frequently in the medical literature and their use is
increasing (Concato, Feinstein, and Holford
1993)6.
Their conceptual attractiveness and the availability
of high-speed computers contribute to making
these models popular. In order to be intelligent
consumers of the medical literature, health care
professionals should understand how to interpret
the findings of multivariable analysis as they are
presented in the literature.
The General Linear Model
The multivariable equation, with one dependent
variable and one or more independent variables, is
usually called the general linear model. The model
is “general” because there are many variations
regarding the types of variables for y and xi as well
as the number of x variables that can be used. The
model is “linear” because it is a linear
combination of the xi terms.
For the xi variables, a variety of transformations
(e.g., the square of x, the cube of x, the square root
of x, or the logarithm of x) could be used, and the
combination of terms would still be linear, so that
the model would remain linear. What cannot happen
if the model is to remain linear is for any of the
coefficients (the bi terms) to be a square, a square
root, a logarithm, or another transformation.
Numerous procedures for multivariable analysis
are based on the general linear model. These
include methods with such imposing terms as
analysis of variance (ANOVA), analysis of
covariance (ANCOVA), multiple linear regression
analysis, multiple logistic regression, the log-
linear model, and discriminant function analysis.
The choice of which procedure to use depends
primarily on whether the dependent and
independent variables are continuous,
dichotomous, nominal, or ordinal. Knowing that
the procedures are all variations of the same theme
(the general linear model) helps to make them less
confusing.
Analysis of variance (ANOVA)
If the dependent variable is continuous and all of
the independent variables are categorical (i.e.,
nominal, dichotomous, or ordinal), the correct
multivariable technique is analysis of variance
(ANOVA).
One-way ANOVA and N-way ANOVA are
discussed briefly. Both the techniques are based
on the general linear model and can be used to
analyze the results of an experimental study. If the
design includes only one independent variable
(e.g., treatment), the technique is called one-way
analysis, regardless of how many different
treatment groups are present. If it includes more
than one independent variable (e.g., treatment, age
group, and gender), the technique is called N-way
ANOVA.
One-Way ANOVA (The F-Test)
Suppose a team of investigators wanted to study
the effects of drugs A and B on blood pressure.
They might randomly allocate hypertensive
patients into four treatment groups: those taking
drug A alone, those taking drug B alone, those
taking drugs A and B in combination, and those
taking a placebo.
The investigators would measure systolic blood
pressure before and after treatment in each patient
and calculate a difference score (posttreatment
systolic pressure minus pretreatment systolic
pressure) for each study subject. This difference
score would become the outcome variable. They
would then calculate a mean difference score for
each of the four treatment groups (i.e., the three
drug groups and the one placebo group) so that
these mean scores could be compared in a test of
statistical significance.
The investigators would want to determine whether
the difference in blood pressure found in one or
more of the drug groups was large enough to be
clinically important, assuming it was a drop. For
example, a drop in mean systolic blood pressure
from 150 mm Hg to 148 mm Hg would be too
small to be clinically useful. If the results were not
clinically useful, there would be little point in
looking for an appropriate test of significance.
If, however, one or more of the groups showed a
clinically important drop in blood pressure, the
investigators would want to determine whether the
difference was likely to have occurred by chance
alone. To do this, an appropriate statistical test of
significance is needed.
The Student’s t-test could be used to compare each
pair of groups, but this would require six different
t-tests: each of the three drug groups (A, B, and
AB) versus the placebo group; drug A group
versus drug B group; drug A group versus drug
combination AB group; and drug B group versus
drug combination AB group. This raises the
problem of multiple hypotheses and multiple
associations.
Even if the investigators decide that the primary
comparison should be each drug or drug
combination with the placebo, this would still
leave three hypotheses to test instead of just one.
Moreover, if two or three groups did significantly
better than the placebo group, it would be
necessary to determine if one effective drug was
significantly better than the others.
There are numerous complex ways of handling the
problem of multiple associations, but the best
approach in cases such as this is to begin by
performing an F-test, which is the first step of
ANOVA. The F-test is a kind of “super t-test” that
allows the investigators to compare more than two
means simultaneously.
The null hypothesis for the F-test in the previous
example is that the mean change in blood pressure
(d̄) will be the same for all four groups
(d̄A = d̄B = d̄AB = d̄P), indicating that all samples
were from the same population and that any
differences between the means are due to chance
variation.
In creating the F-test (F is for Fisher), Sir Ronald
Fisher reasoned that if two different methods
could be found to estimate the variance and if all
of the samples came from the same population,
these two different estimates of variance should be
similar. He therefore developed two measures of
the variance of the observations.
One is called between-groups variance and is
based on the variation between (or among) the
means. The other is called within-groups variance
and is based on the variation within each group-
i.e., variation around a single group mean. In
ANOVA, these two measures of variance are also
called the between-groups mean square and the
within-groups mean square.
The ratio of the two measures of variance can therefore be
expressed as follows:
F ratio = Between-groups variance / Within-groups variance = Between-groups mean square / Within-groups mean square
If the F-ratio is fairly close to 1.0, the two
estimates of variance are similar, and the null
hypothesis that all of the means came from the
same underlying population is not rejected. If the
ratio is much larger than 1.0, there must have been
some force, attributable to group differences,
pushing the means apart, and the null hypothesis of
no difference is rejected.
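Assuming SciPy is available, the F-test (one-way ANOVA) for the four hypothetical treatment groups described above might be sketched as follows; the difference scores are invented purely for illustration.

```python
from scipy.stats import f_oneway  # assumes SciPy is installed

# Hypothetical changes in systolic blood pressure (posttreatment minus
# pretreatment, in mm Hg) for the four treatment groups described above.
drug_a = [-12, -15, -10, -14, -11]
drug_b = [-8, -9, -11, -7, -10]
drug_ab = [-18, -20, -16, -19, -17]
placebo = [-1, 0, -2, 1, -1]

f_statistic, p_value = f_oneway(drug_a, drug_b, drug_ab, placebo)
print(f"F = {f_statistic:.1f}, p = {p_value:.4f}")
```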
N-Way ANOVA
The goal of ANOVA, stated in the simplest terms,
is to explain (to “model”) the total variation found
in a study.
If only one independent variable is tested in a
model and that variable happens to be gender, the
total amount of variation must be explained in
terms of how much variation is due to gender and
how much is not. Any variation (SS) that is not
due to the model (gender) is considered to be error
(residual) variation.
If two independent variables are tested in a model
and those variables happen to be treatment and
gender, the total amount of variation must be
explained in terms of how much variation is due to
each of the following: the independent effect of
treatment, the independent effect of gender, the
interaction between (joint effect of) treatment and
gender, and error (residual) variation.
If more than two variables are tested, the analysis
becomes increasingly complicated, but the
underlying logic remains the same. As long as the
research design is balanced - that is, there are equal
numbers of observations in all of the study
groups - ANOVA can be used to analyze the
individual and joint effects of the independent
variables and to partition the total variation into
the various component parts.
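As a sketch of how such a two-way model might be specified in practice, the following Python fragment (assuming the pandas and statsmodels packages, with hypothetical column names and made-up data) fits treatment, gender, and their interaction, and partitions the variation accordingly:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical balanced data: each row is one subject's change in systolic
# blood pressure, treatment group, and gender.
df = pd.DataFrame({
    "bp_change": [-12, -14, -9, -11, -2, 0, -3, 1],
    "treatment": ["A", "A", "A", "A", "placebo", "placebo", "placebo", "placebo"],
    "gender":    ["M", "F", "M", "F", "M", "F", "M", "F"],
})

# The model partitions total variation into treatment, gender, their
# interaction, and residual (error) variation.
model = smf.ols("bp_change ~ C(treatment) * C(gender)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))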
Analysis of covariance (ANCOVA)
Analysis of variance (ANOVA) and analysis of
covariance (ANCOVA) are methods for evaluating
studies in which the dependent variable is
continuous. If the independent variables are all of
the categorical type (nominal or dichotomous), then
ANOVA is used.
However, if some of the independent variables are
categorical and some are continuous, then
ANCOVA is appropriate.
ANCOVA would be used, for example, in a study
in which the goal was to test the effects of
hypertensive drugs on systolic blood pressure (a
continuous variable that is the dependent variable
here) and the independent variables were age (a
continuous variable) and treatment (a categorical
variable with four levels-i.e., those treated with
drug A, those treated with drug B, those treated
with both A and B, and those treated with a
placebo).
The ANCOVA procedure adjusts the dependent
variable on the basis of the continuous
independent variable or variables, and it then does
an N-Way ANOVA on the adjusted dependent
variable. In the above example, the ANCOVA
procedure would remove the effect of age from the
analysis of the effect of the drugs on systolic
blood pressure.
Controlling for age means that (artificially) all of
the study subjects are made the same age. Suppose
that the mean systolic blood pressure in the study
group is 150 mm Hg at an average age of 50 years.
The first step (and this is all done by the computer
packages that have ANCOVA) is to do a simple
regression between age and blood pressure, which
shows that the blood pressure increases, say, an
average of 1 mm Hg for each year of age over 50
years and decreases an average of 1 mm Hg for
each year of age under 50. Thus, if a subject’s age
is 59, then 9 mm Hg would be subtracted from that
subject’s current blood pressure to arrive at the
adjusted blood pressure.
If another subject’s age is 35, then 15 mm Hg
would be added to that subject’s current blood
pressure to arrive at the adjusted value. If a
subject’s age is 50, no adjustment is necessary,
because that subject is already at the population
mean age. ANCOVA can adjust the dependent
variable for several continuous independent
variables (called covariates) at the same time.
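The age adjustment described above can be sketched in a few lines of Python. The ages and blood pressures below are hypothetical, and the subtraction mirrors the procedure in the text: estimate the slope from a simple regression of blood pressure on age, then pull every subject to the reference (mean) age of 50 years:

import numpy as np

# Hypothetical ages (years) and systolic blood pressures (mm Hg).
age = np.array([59, 35, 50, 62, 41, 55])
sbp = np.array([162, 138, 151, 165, 144, 156])

# Step 1: simple regression of blood pressure on age gives the slope
# (mm Hg of change per year of age).
slope, intercept = np.polyfit(age, sbp, 1)

# Step 2: remove the age effect by adjusting every subject to the
# reference age (the population mean age of 50 in the text's example).
reference_age = 50
adjusted_sbp = sbp - slope * (age - reference_age)
print(f"slope = {slope:.2f} mm Hg per year of age")
print(adjusted_sbp.round(1))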
Multiple Linear Regression
If the dependent variable and all of the
independent variables are continuous, the correct
type of multi-variable analysis is multiple linear
regression. There are several computerized
methods of analyzing the data in a multiple linear
regression. Probably the most common method is
called stepwise linear regression.
The investigator either chooses which variable to
begin with (i.e., to enter first in the analysis) or
else instructs the computer to start by entering the
one variable that has the strongest association with
the dependent variable. In either case, when only
the first variable has been entered, the result is a
simple regression analysis.
Next, the second variable is entered according to
the investigator's instructions. The explanatory
strength of the variable entered - that is, the r² -
changes as each new variable is entered. The
"stepping" continues until none of the remaining
independent variables meets the predetermined
criterion for being entered (e.g., p is ≤ 0.1 or the
increase in r² is ≥ 0.01) or until all of the variables
have been entered. When the stepping stops, the
analysis is complete.
In addition to watching for the statistical
significance of the overall equation and of each
variable entered, the investigator keeps a close
watch on the overall r² for each step, which is the
proportion of variation the model has explained so
far.
In multiple regression equations that are
statistically significant, the increase in the total r²
after each step, compared with the total r² after the
previous step, indicates how much additional
variation is explained by the variable just entered.
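A minimal sketch of forward stepwise selection, using only the increase-in-r² criterion mentioned above, might look like the following in Python; the helper functions, the variable names, and the use of the 0.01 threshold as the sole stopping rule are assumptions for illustration:

import numpy as np

def r_squared(X, y):
    # Proportion of variation in y explained by a least-squares fit on X.
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

def forward_stepwise(X, y, names, min_gain=0.01):
    # Enter predictors one at a time, always taking the one that raises r^2
    # the most, and stop when no remaining predictor adds at least min_gain.
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        r2, j = max((r_squared(X[:, selected + [k]], y), k) for k in remaining)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
        print(f"entered {names[j]}: cumulative r^2 = {r2:.3f}")
    return [names[j] for j in selected]

With this criterion, the stepping stops as soon as the best remaining variable raises the overall r² by less than 0.01, which matches the stopping rule quoted above.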
References
1. C. Bernard, An Introduction to the Study of
Experimental Medicine.
2. Daniel McCann, 'Dental research: the clinical trial
formula', JADA, April 1990, 384-392.
3. J. M. Dunning, Principles of Dental Public Health,
fourth edition, 1986.
4. National Medical Series, Preventive Medicine &
Public Health, second edition, 1992.
5. G. M. Gluck, W. M. Morganstein, Jong's Community
Dental Health, fifth edition, 2003.
6. J. F. Jekel, D. L. Katz, Epidemiology, Biostatistics
and Preventive Medicine, second edition, 2001.
7. Cynthia M. Pine, Community Oral Health, first
edition, 1997.
8. Park's Textbook of Preventive and Social
Medicine, eighteenth edition, 2006.
9. C. R. Kothari, Research Methodology - Methods &
Techniques, second edition, 2006.
10. Mahajan, Biostatistics, sixth edition, 2006.
11. B. Burt, S. Eklund, Dentistry, Dental Practice & the
Community, sixth edition, 2005.

  • 17.
    USES Biostatistics is apowerful ally in the quest for the truth that infuses a set of data and waits to be told. • Statistics is a scientific method that uses theory and probability to aid in the evaluation and interpretation of measurements and data obtained by other methods.
  • 18.
    b. Statistics providesa powerful reinforcement for other determinants of scientific causality. c. Statistical reasoning, albeit unintentional or subconscious, is involved in all scientific clinical judgments, especially with preventive medicine/dentistry and clinical medicine/dentistry becoming increasingly quantitative.
  • 19.
  • 20.
    DATA Definition: Data arethe basic building blocks of statistics and refers to the individual values presented, measured, or observed.
  • 21.
    a. Population vssample. Data can be derived from a total population or a sample. 1. A population is the universe of units or values being studied. It can consist of individuals, objects, events, observations, or any other grouping. 2. A sample is a selected part of a population.
  • 22.
    The following aresome of the common types of samples: a) Simple random sample b) Systematic selected sample c) Stratified selected sample d) Cluster selected sample e) Nonrandomly selected, or convenience sample.
  • 23.
    b. Ungrouped vsgrouped 1. Ungrouped data are presented or observed individually. An example of ungrouped data is the following list of weights (in pounds) for six men: 140, 150, 150, 150, 160, and 160. 2. Grouped data are presented in groups consisting of identical data by frequency. An example of grouped data is the following list of weights for the six men noted above: 140 lb (one man), 150 lb (three men), and 160 lb (two men).
  • 24.
    c. Quantitative vsqualitative 1. Quantitative data are numerical, or based on numbers. An example of quantitative data is height measured in inches. 2. Qualitative data are nonnumerical, or based on a categorical scale. An example of qualitative data is height measured in terms of short, medium, and tall.
  • 25.
    d. Discrete vscontinuous 1.Discrete data or categorical data are data for which distinct categories and a limited number of possible values exist. An example of discrete data is the number of children in a family, that is, two or three children, but not 2.5 children. All qualitative data are discrete.
  • 26.
    Categorical data arefurther classified into two types: • nominal scale • ordinal scale.
  • 27.
    Nominal scale: A variablemeasured on a nominal scale is characterized by named categories having no particular order. For example,  patient gender (male/female),  reason for dental visit (checkup, routine treatment, emergency), and  use of fluoridated water (yes/no) are all categorical variables measured on a nominal scale.
  • 28.
    Within each ofthese scales, an individual subject may belong to only one level, and one level does not mean something greater than any other level.
  • 29.
    Ordinal scale Ordinal scaledata are variables whose categories possess a meaningful order. For example,  Severity of periodontal disease (0=none, 1=mild, 2=moderate, 3=severe) and  Length of time spent in a dental office waiting room (1= less than 15 min, 2= 15 to less than 30 minutes, 3= 30 minutes or more) are variables measured on ordinal scales.
  • 30.
    2. Continuous dataor measurement data are data for which there are an unlimited number of possible values. An example of continuous data is an individual’s weight, which may actually be 159.232872…lb but is reported as 159 lb.
  • 31.
    • Measurement datacan be characterized by interval scale ratio scale • If the continuous scale has a true 0 point, the variables derived from it can be called ratio variables. The Kelvin temperature scale is a ratio scale, because 0 degrees on this scale is absolute 0.
  • 32.
    • The centigradetemperature scale is a continuous scale but not a ratio scale, because 0 degrees on this scale does not mean the absence of heat. So this becomes an example of an interval scale, as zero is only a reference point.
  • 33.
    e. The qualityof measured data is defined in terms of the data’s accuracy, validity, precision, and reliability. 1. Accuracy refers to the extent that the measurement measures the true value of what is under study. 2. Validity refers to the extent that the measurement measures what it is supposed to measure.
  • 34.
    3. Precision refersto the extent that the measurement is detailed. 4. Reliability refers to the extent that the measurement is stable and dependable.
  • 35.
    Dental health professionalshave a variety of uses for data5 : • For designing a health care program or facility • For evaluating the effectiveness of an oral hygiene education program • For determining the treatment needs of a specific population • For proper interpretation of the scientific literature.
  • 36.
    DISTRIBUTIONS Definition. A distributionis a complete summary of frequencies or proportions of a characteristic for a series of data from a sample or population. Types of distributions • Binomial distribution • Uniform distribution • Skewed distribution • Normal distribution • Log-normal distribution • Poisson distribution
  • 37.
    a. Binomial distributionis a distribution of possible outcomes from a series of data characterized by two mutually exclusive categories. b. Uniform distribution, also called rectangular distribution, is a distribution in which all events occur with equal frequency.
  • 38.
    c. Skewed distributionis a distribution that is asymmetric. 1. A skewed distribution with a tail among the lower values being characterized is skewed to the left, or negatively skewed. 2. A skewed distribution with a tail among the higher values being characterized is skewed to the right, or positively skewed.
  • 39.
    d. Normal distribution,also called Gaussian distribution, is a continuous, symmetric, bell- shaped distribution and can be defined by a number of measures. e. Log-normal distribution is a skewed distribution when graphed using an arithmetic scale but a normal distribution when graphed using a logarithmic scale. f. Poisson distribution is used to describe the occurrence of rare events in a large population.
  • 40.
  • 41.
  • 44.
  • 45.
    Descriptive statistical techniquesenable the researchers to numerically describe and summarize a set of data.
  • 46.
    Data can bedisplayed by the following ways: Frequency distribution tables. Graphs or pictorial presentation of data. Tables. Numerical summary of data Measure of central tendency Measure of dispersion.
  • 47.
    I DISPLAYING DATA Datacan be displayed by the following ways: Frequency distribution tables. Graphs or pictorial presentation of data. Tables.
  • 48.
    Frequency Distribution Tables Tobetter explain the data that have been collected, the data values are often organized and presented in a table termed a frequency distribution table. This type of data display shows each value that occurs in the data set and how often each value occurs.
  • 49.
    In addition toproviding the sense of the shape of a variable’s distribution, these displays provide the researcher with an opportunity to screen the data values for incorrect or impossible values, a first step in the process known as “cleaning the data”5
  • 50.
    • The datavalues are first arranged in order from lowest to highest value (an array). • The frequency with which each value occurs is then tabulated.
  • 51.
    • The frequencyof occurrence for each data point is expressed in four ways: 1. The actual count or frequency 2. The relative frequency (percent of the total number of values). 3. Cumulative frequency (total number of observations equal to or less than the value) 4. Cumulative relative frequency (the percent of observations equal to or less than the value) commonly referred to as percentile.
  • 52.
    Exam Scores Frequency %cumulative frequency cumulative % 56 1 3.0 1 3.0 57 1 3.0 2 6.1 63 1 3.0 3 9.1 65 2 6.1 3 15.2 66 1 3.0 3 18.2 68 2 6.1 5 24.2 69 2 6.1 6 30.3 70 2 3.0 8 36.4 71 1 3.0 10 42.4 72 1 6.1 11 45.5 74 2 3.0 12 48.5 75 1 3.0 14 54.5 76 3 6.1 15 63.6 77 2 9.1 16 69.7 78 1 6.1 18 72.7 79 1 3.0 21 75.8 80 2 3.0 23 84.8 81 3 3.0 24 87.9 Frequency Distribution Table for exam scores
  • 53.
    • Instead ofdisplaying each individual value in a data set, the frequency distribution for a variable can group values of the variable into consecutive intervals. • Then the number of observations belonging to an interval is counted.
  • 54.
    Exam scores Numberof students % 56-61 2 6 62-65 3 9 66-69 5 15 70-73 4 12 74-77 7 21 78-81 7 21 82-85 3 9 86-89 2 6 Grouped frequency distribution of exam scores
  • 55.
    Although the dataare condensed in a useful fashion, some information is lost. The frequency of occurrence of an individual data point cannot be obtained from a grouped frequency distribution. For example, in the above presentation of data, seven students scored between 74 and 77, but the number of students who scored 75 is not shown here.
  • 56.
    Graphic or pictorialpresentation of data Graphic or pictorial presentations of data are useful in simplifying the presentation and enhancing the comprehension of data. All graphs, figures, and other pictures should have clearly stated and informative titles, and all axes and keys should be clearly labeled, including the appropriate units of measurement.
  • 57.
    Visual aids cantake many forms; some basic methods of presenting data are described below. 1. Pie chart A pie chart is a pictorial representation of the proportional divisions of a sample or population, with the divisions represented as parts of a whole circle.
  • 58.
    cervical caries Occlusal caries Rootcaries Dental caries in xerostomia patients 39% 42% 19%
  • 59.
    2. Venn diagram AVenn diagram shows the degrees of overlap and exclusivity for two or more characteristics or factors within a sample or population (in which case each characteristic is represented by a whole circle) or for a characteristic or factor among two or more samples or populations (in which case each sample or population is represented by a whole circle).
  • 60.
    The sizes ofthe circles (or other symbols) need not be equal and may represent the relative size for each factor or population.
  • 62.
    3. Bar diagram A bar diagram is a tool for comparing categories of mutually exclusive discrete data.  The different categories are indicated on one axis, the frequency of data in each category is indicated on the other axis, and the lengths of the bars compare the categories.  Because the data categories are discrete, the bars can be arranged in any order with spaces between them.
  • 63.
    Dental caries inXerostomia Patients 0 20 40 60 80 cervical caries Occlusal caries Rootcaries Series1
  • 64.
    4. Histogram A histogramis a special form of bar diagram that represents categories of continuous and ordered data. The data are adjacent to each other on the x-axis (abscissa), and there is no intervening space. The frequency of data in each category is depicted on the y-axis (ordinate), and the width of the bar represents the interval of each category.
  • 65.
    0 10 20 30 40 50 No of Subjects 5to 10 years 10 to 15 15 to 20 20 to 25 25 to 30 Histogram of age for xerostomia subjects
  • 66.
    5. Epidemic curve Anepidemic curve is a histogram that depicts the time course of an illness, disease, abnormality, or condition in a defined population and in a specified location and time period. The time intervals are indicated on the x-axis, and the number of cases during each time interval is indicated on the y-axis.
  • 67.
    An epidemic curvecan help an investigator determine such outbreak characteristics as the peak of disease occurrence (mode), a possible incubation or latency period, and the type of disease propagation.
  • 69.
    6. Frequency polygon Afrequency polygon is a representation of the distribution of categories of continuous and ordered data and, in this respect, is similar to a histogram. The x-axis depicts the categories of data, and the y-axis depicts the frequency of data in each category.
  • 70.
    In a frequencypolygon, however, the frequency is plotted against the midpoint of each category, and a line is drawn through each of these plotted points. The frequency polygon can be more useful than the histogram because several frequency distributions can be plotted easily on one graph.
  • 71.
    Frequency polygon showingcancer mortality by age group and sex
  • 72.
    7. Cumulative frequencygraph A cumulative frequency graph also is a representation of the distribution of continuous and ordered data. In this case, however, the frequency of data in each category represents the sum of the data from that category and from the preceding categories.
  • 73.
    The x-axis depictsthe categories of data, and the y- axis is the cumulative frequency of data, sometimes given as a percentage ranging from 0% to 100%. The cumulative frequency graph is useful in calculating distribution by percentile, including the median, which is the category of data that occurs at the cumulative frequency of 50%.
  • 74.
    Medical examiner reported(MER) in St. Louis for the years 1979, 1980, & 1981
  • 75.
    8. Box plot A box plot is a representation of the quartiles [25%, 50% (median), and 75%] and the range of a continuous and ordered data set.  The y-axis can be arthimetic or logarithmic.  Box plots can be used to compare the different distributions of data values.
  • 76.
    Distribution of weightsof patients from hospital A and hospital B
  • 77.
    9. Spot map Aspot map, also called a geographic coordinate chart, is a map of an area with the location of each case of an illness, disease, abnormality, or condition identified by a spot or other symbol on the map. A spot map often is used in an outbreak setting and can help an investigator determine the distribution of cases and characterize an outbreak if the population at risk is distributed evenly over the area.
  • 78.
    Distribution of Lymedisease cases in Canada from 1977 to 1989
  • 79.
    TABLES In addition tographs, data are often summarized in tables. When material is presented in tabular form, the table should be able to stand alone; that is, correctly presented material in tabular form should be understandable even if the written discussion of the data is not read.
  • 80.
    A major concernin the presentation of both figures and tables is readability. Tables and figures must be clearly understood and clearly labeled so that the reader is aided by the information rather than confused.
  • 81.
    Suggestions for thedisplay of data in graphic or tabular form5 : 1. The contents of a table as a whole and the items in each separate column should be clearly and fully defined. The unit of measurement must be included. 2. If the table includes rates, the basis on which they are measured must be clearly stated- death rate percent, per thousand, per million, as the case may be.
  • 82.
    3. Rates orproportions should not be given alone without any information as to the numbers of observations on which they are based. By giving only rates of observations and omitting the actual number of observations, we are excluding the basic data.
  • 83.
    4. Where percentagesare used, it must be clearly indicated that these are not absolute numbers. Rather than combine too many figures in one table, it is often best to divide the material into two or three small tables. 5. Full particulars of any exclusion of observations from a collected series must be given. The reasons for and the criteria of exclusions must be clearly defined, perhaps in a footnote.
  • 84.
    II NUMERICAL SUMMARYOF DATA Although graphs and frequency distribution tables can enhance our understanding of the nature of a variable, rarely do these techniques alone suffice to describe the variable. A more formal numerical summary of the variable is usually required for the full presentation of a data set.
  • 85.
    To adequately describea variable’s values, three summary measures are needed: 1. The sample size. 2. A measure of central tendency 3. A measure of dispersion.
  • 86.
     The samplesize is simply the total number of observations in the group and is symbolized by the letter N or n.  A measure of central tendency or location describes the middle (or typical) value in a data set.  A measure of dispersion or spread quantifies the degree to which values in a group vary from one another.
  • 87.
  • 88.
    Whenever one wishesto evaluate the outcome of study, it is crucial that the attributes of the sample that could have influenced it be described. Three statistics, the mode, median, and mean, provide a means of describing the “typical” individual within a sample. These statistics are frequently referred to as “measures of central tendency”.
  • 89.
     Measures ofcentral tendency are characteristics that describe the middle or most commonly occurring values in a series.  They tell us the point about which items have a tendency to cluster. Such a measure is considered as the most representative figure for the entire mass of data.  They are used as summary measures for the series. The series can consist of a sample of observations or a total population, and the vales can be grouped or ungrouped. Measure of central tendency is also known as statistical average.
  • 90.
    1. Mode The modeof a data set is that value that occurs with the greatest frequency. A series may have no mode (i.e., no value occurs more than once) or it may have several modes (i.e., several values equally occur at a higher frequency than the other values in the series).
  • 91.
    Whenever there aretwo nonadjacent scores with the same frequency and they are the highest in the distribution, each score may be referred to as the ‘mode’ and the distribution is ‘bimodal’. In truly bimodal distribution, the population contains two sub-groups, each of which has a different distribution that peaks at a different point.
  • 92.
    More than onemode can also be produced artificially by what is known as digit preference, when observers tend to favor certain numbers over others. For example, persons who measure blood pressure values tend to favor even numbers, particularly those ending in 0 (e.g., 120 mm Hg).
  • 93.
    Calculation: The modeis calculated by determining which value or values occur most in a series.
  • 94.
    Example: consider thefollowing data. Patients who had received routine periodontal scaling were given a common pain-relieving drug and were asked to record the minutes to 100% pain relief. Note that “minutes to pain relief” is a continuous variable that is measured on the ratio scale. The patients recorded the following data:
  • 95.
    Minutes to 100%pain relief: 15 14 10 18 8 10 12 16 10 8 13 First, make an array, that is, arrange the values in ascending order: 8 8 10 10 10 12 13 14 15 16 18 By inspection, we already know two descriptive measures belonging to this data: N=11 and mode=10.
  • 96.
    Application and characteristics 1.The primary value of the mode lies in its ease of computation and in its convenience as a quick indicator of a central value in a distribution. 2. The mode is useful in practical epidemiological work, such as determining the peak of disease occurrence in the investigation of a disease.
  • 97.
    3. The modeis the most difficult measure of central tendency to manipulate mathematically, that is, it is not amenable to algebraic treatment; no analytic concepts are based on the mode.
  • 98.
    4. It isalso the least reliable because with successive samplings from the same population the magnitude of the mode fluctuates significantly more than the median or mean. It is possible, for example, that a change in just one score can substantially change the value of the modal score.
  • 99.
    2. Median P50 Themedian is the value that divides the distribution of data points into two equal parts, that is, the value at which 50% of the data points lie above it and 50% lie below it. The median is the middle of the quartiles (the values that divide the series into quarters) and the middle of the percentiles (the values that divide the series into defined percentages).
  • 100.
    Calculation: a) In aseries with an odd number of values, the values in the series are arranged from lowest to highest, and the value that divides the series in half is the median. b) In a series with even number of values, the two values that divide the series in half are determined, and the arithmetic mean of these values is the median. c) An alternative method for calculating the median is to determine the 50% value on a cumulative frequency curve.
  • 101.
    Example: In theabove example of data series of minutes to 100% pain relief, 8 8 10 10 10 12 13 14 15 16 18 determine which value cuts the array into equal portions. In this array, there are five data points below 12 and there are five data points above 12. Thus the median is 12. 8 8 10 10 10 12 13 14 15 16 18 ⇑ Median
  • 102.
    If the numberof observations is even, unlike the preceding example, simply take the midpoint of the two values that would straddle the center of the data set. Consider the following data set with N=10: 8 8 10 10 10 13 14 15 16 18 ⇑ Median = 10+13  = 11.5 2
  • 103.
    Applications and characteristics: 1.Themedian is not sensitive to one or more extreme values in a series; therefore, in a series with an extreme value, the median is a more representative measure of central tendency than the arithmetic mean.
  • 104.
    2. It isnot frequently used in sampling statistics. In terms of sampling fluctuation, the median is superior to the mode but less stable than the mean. For this reason, and because the median does not possess convenient algebraic properties, it is not used as often as the mean. 3. Median is a positional average and is used only in the context of qualitative phenomena, for example, in estimating intelligence, etc., which are often encountered in sociological fields.
  • 105.
    4. Median isnot useful where items need to be assigned relative importance and weights. 5. The median is used in cumulative frequency graphs and in survival analysis.
  • 106.
    3. Arithmetic Mean Thearithmetic mean, or simply, the mean, is the sum of all values in a series divided by the actual number of values in a series. The symbol for the mean is a capital letter X with a bar above it:Χ or “X-bar”.
  • 107.
    Calculation: The arithmetic meanis determined as Χ = ∑ X / N
  • 108.
    Example: Using the minutesto pain relief, N = 11 and ∑ X = 134. Therefore Χ = 134 / 11 = 12.2 min
  • 109.
    Properties of theMean 1. The mean of a sample is an unbiased estimator of the mean of the population from which it came. 2. The mean is the mathematical expectation. As such, it is different from the mode, which is the value observed most often.
  • 110.
    3. The sumof the squared deviations of the observations from the mean is smaller than the sum of the squared deviations from any other number. 4. The sum of the squared deviations from the mean is fixed for a given set of observations. This property is not unique to the mean, but it is a necessary property of any good measure of central tendency.
  • 111.
    Applications and characteristics: 1.The arithmetic mean is useful when performing analytic manipulation. With the exception of a situation where extreme scores occur in the distribution, the mean is generally the best measure of central tendency.  The values of mean tend to fluctuate least from sample to sample.
  • 112.
     It isamenable to algebraic treatment and it possesses known mathematical relationships with other statistics.  Hence, it is used in further statistical calculations. Thus, in most situations the mean is more likely to be used than either the mode or the median.
  • 113.
    2. The meancan be conceptualized as a fulcrum such that the distribution of scores around it is in perfect balance. Since the scores above and below the mean are in perfect balance, it follows that the algebraic sum of the observations of these scores from the mean is 0.
  • 114.
    3. Whereas themedian counts each score, no matter what its magnitude, as only one score, the mean takes into account the absolute magnitude of the score. The median, therefore, does not balance the halves of the distribution except when the distribution is exactly symmetrical; in which case the mean and the median have identical values.
  • 115.
    4. Another wayof contrasting the median and the mean is to compare their values when the distribution of scores is not symmetrical.
  • 116.
    Curve (a) ispositively skewed; that is, the curve tails off to the right. In this case the mean is larger than the median because of the influence of the few very high scores. Thus these high scores are sufficient to balance off the several lower scores. The median does not balance the distribution because the magnitude of the scores is not included in the computation. xP50
  • 117.
    Curve (b) isnegatively skewed; that is, the curve tails off to the left. Now the mean is smaller than the median because of the effect of the few very small scores. xP50
  • 118.
    5. It suffersfrom some limitations viz., it is unduly affected by extreme items; it may not coincide with actual value of an item in a series, and it may lead to wrong impressions, particularly when the item values are not given the average.
  • 119.
    Let’s refer againto the group of values in which one patient recorded a rather extreme, for this group, value: 8 8 10 10 10 12 13 14 15 16 58 The adjusted mean, somewhat larger than the original mean of 12.2, is calculated as follows: X = 174 / 11 = 15.8 min
  • 120.
    The calculation ofthe mean is correct, but is its use appropriate for this data set? By definition the mean should describe the middle of the data set. However, for this data set the mean of 15.8 is larger than most (9 out of 11!) of the values in the group. Not exactly a picture of the middle! In this case the median (12 minutes) is the better choice for the measure of central tendency and should be used.
  • 121.
    However, mean isbetter than other averages, especially in economic and social studies where direct quantitative measurements are possible.
  • 122.
    4. Geometric mean Thegeometric mean is the nth root of the product of the values in a series of n values. Geometric mean (or G.M.) = n π XN Where, G.M. = geometric mean, N = number of items, π = Conventional product notation For instance, the geometric mean of the numbers, 4, 6, and 9 is worked out as G.M.= 3 4.6.9 = 6
  • 123.
    Applications and characteristics 1.The geometric mean is more useful and representative than the arithmetic mean when describing a series of reciprocal or fractional values. The most frequently used application of this average is in the determination of average percent of change i.e., it is often used in the preparation of index numbers or when we deal in ratios.
  • 124.
    2. The geometricmean can be used only for positive values. 3. It is more difficult to calculate than the arithmetic mean.
  • 125.
    5. Harmonic mean Harmonicmean is defined as the reciprocal of the average of reciprocals of the values of items of a series. Symbolically, we can express it as under: Σ Rec X i Harmonic mean (H.M.) = Rec.  N
  • 126.
    Applications and characteristics: 1.Harmonic mean is of limited application, particularly in cases where time and rate are involved. 2. The harmonic mean gives largest weight to the smallest item and smallest weight to the largest item. 3. As such it is used in cases like time and motion study where time is variable and distance constant.
  • 127.
  • 128.
    Measures of centraltendency provide useful information about the typical performance for a group of data. To understand the data more completely, it is necessary to know how the members of the data set arrange themselves about the central or typical value.
  • 129.
    The following questionsmust be answered: How spread out are the data points? How stable are the values in the group?
  • 130.
    The descriptive toolsknown as measures of dispersion answer these questions by quantifying the variability of the values within a group. Hence, they are the characteristics that are used to describe the spread, variation, and scatter of a series of values. The series can consist of observations or a total population, and the values can be grouped or ungrouped.
  • 131.
    This can bedone by calculating measures based on percentiles or measures based on the mean6 . Measures of dispersion based on percentiles 1. Percentiles which are sometimes called quantiles, are the percentage of observations below the point indicated when all of the observations are ranked in descending order.
  • 132.
    The median, discussedabove, is the 50th percentile. The 75th percentile is the point below which 75% of the observations lie, while the 25th percentile is the point below which 25% of the observations lie.
  • 133.
    2. Range The rangeis the difference between the highest and lowest values in a series. Range = Maximum – Minimum. More usual, however, is the interpretation of the range as simply the statement of the minimum and maximum values: Range = (Minimum, Maximum)
  • 134.
    For the sampleof minutes to 100% pain relief, 8 8 10 10 10 12 13 14 15 16 58 Range = (8, 18) or Range = 18-8 = 10 min
  • 135.
     The overallrange reflects the distance between the highest and the lowest value in the data set. In this example it is 10 min.  In the same example, the 75th and 25th percentiles are 15 and 10 respectively and the distance between them is 5 min. This difference is called the interquartile range (sometimes abbreviated Q3 -Q1 ). Because of central clumping, the interquartile range is usually considerably smaller than half the size of the overall range of values.
  • 136.
    The advantage ofusing percentiles is that they can be applied to any set of continuous data, even if the data do not form any known distribution.
  • 137.
    Application and characteristics 1.The range is used to measure data spread. 2. The range presents the exact lower and upper boundaries of a set of data points and thus quickly lends perspective regarding the variable’s distribution.
  • 138.
    3. The rangeis usually reported along with the sample median (not the mean). 4. The range provides no information concerning the scatter within the series. 5. The range can be deemed unstable because it is affected by one extremely high score or one extremely low value. Also, only two values are considered, and these happen to be the extreme scores of the distribution. The measure of spread known as standard deviation addresses this disadvantage of the range.
  • 139.
    Measures of dispersionbased on the mean Mean deviation, variance, and standard deviation are three measures of dispersion based on the mean. Although mean deviation is seldom used, a discussion of it provides a better understanding of the concept of dispersion.
  • 140.
    1. Mean deviation Becausethe mean has several advantages, it might seem logical to measure dispersion by taking the “average deviation” from the mean. That proves to be useless, because the sum of the deviations from the mean is 0.
  • 141.
    However, this inconveniencecan easily be solved by computing the mean deviation, which is the average of the absolute value of the deviations from the mean, as shown in the following formula: Mean deviation = ∑ (X - X)  N
  • 142.
    Because the meandeviation does not have mathematical properties that enable many statistical tests to be based on it, the formula has not come into popular use. Instead, the variance has become the fundamental measure of dispersion in statistics that are based on the normal distribution.
  • 143.
    2. Variance The varianceis the sum of the squared deviations from the mean divided by the number of values in the series minus 1. Variance is symbolized by s2 or V. s2= Σ (X - X)2 / N-1 Σ (X - X)2 is called sum of squares.
  • 144.
    In the aboveformula, the squaring solves the problem that the deviations from the mean add up to 0. Dividing by N-1 (called degrees of freedom), instead of dividing by N, is necessary for the sample variance to be an unbiased estimator of the population variance.
  • 145.
    The numerator ofthe variance (i.e., the sum of the squared deviations of the observations from the mean) is an extremely important entity in statistics. It is usually called either the sum of squares (abbreviated SS) or the total sum of squares (TSS). The TSS measures the total amount of variation in a set of observations.
  • 146.
    Properties of thevariance 1. When the denominator of the equation for variance is expressed as the number of observations minus 1 (N-1), the variance of a random sample is an unbiased estimator of the variance of the population from which it was taken.
  • 147.
    2. The varianceof the sum of two independently sampled variables is equal to the sum of the variances. 3. The variance of the difference between two independently sampled variables is equal to the sum of their individual variances as well.
  • 148.
    Application and characteristics 1.The principal use of the variance is in calculating the standard deviation. 2. The variance is mathematically unwieldy, and its value falls outside the range of observed values in a data set. 3. The variance is generally of greater importance to statisticians than to researchers, students, and clinicians trying to understand the fruits of data collection.
  • 149.
    We should notethat the sample variance is a squared term, not so easy to fathom in relation to the sample mean. Thus the square root of the variance, the standard deviation, is desirable.
  • 150.
    3. Standard deviation(s or SD) The standard deviation is a measure of the variability among the individual values within a group. Loosely defined, it is a description of the average distance of individual observations from the group mean.
  • 151.
    Conceptualizing the s,or any of the measures of variance, is more difficult than understanding the concept of central tendency. From one point of view, however, the s is similar to the mean; that is; it represents the mean of the squared deviations.
  • 152.
    Taking the meanand the standard deviation together, a sample can be described in terms of its average score and in terms of its average variation. If more samples were taken from the same population it would be possible to predict with some accuracy the average score of these samples and also the amount of variation.
  • 153.
    The mathematical derivationof the standard deviation is presented here in some detail because the intermediate steps in its calculation (1) create a theme (called “sum of squares”) that is repeated over and over in statistical arithmetic and (2) create the quantity known as the sample variance.
  • 154.
    Calculation: STEPS MATHEMATICAL TERM LABEL 1. Calculatethe mean X of the group X = Σ X / N Sample mean 2. Subtract the mean from each value X. (X - X) Deviation from the mean 3. Square each deviation from the mean. (X - X)2 Squared deviation from the mean. 4. Add the squared deviations from the mean. Σ (X - X)2 Sum of squares (ss)
  • 155.
    5. Divide the sumof squares by (N-1). ss / (N -1) Variance (s2 ) 6. Find the square root of the variance. s2 Standard deviation (SD or s) The above table presents the calculation of the standard deviation for our sample of minutes to 100% pain relief.
  • 156.
    We now havetwo sets of complete sample description for our example. Sample Description 1 Sample Description 2 Sample size N = 11 N = 11 Measure of central tendency Median = 12 min X = 12.2 minutes Measure of spread Range = (8, 18) SD = 3.31
  • 157.
    The standard deviationis reported along with the sample mean, usually in the following format: mean ± SD. This format serves as a pertinent reminder that the SD measures the variability of values surrounding the middle of the data set.
  • 158.
    It also leadsus to the practical application of the concepts of mean and standard deviation shown in the following rules of thumb: X ± 1 SD encompasses approximately 68% of the values in a group. X ± 2 SD encompasses approximately 95% of the values in a group. X ± 3 SD encompasses approximately 99% of the values in a group.
  • 159.
    These rules ofthumb are useful when deciding whether to report the mean ± SD or the median and range as the appropriate descriptive statistics for a group of data points. If roughly 95% of the values in a group are contained in the interval X ± 2SD, researchers tend to use mean ± SD. Otherwise the median and the range are perhaps more appropriate.
  • 160.
    Applications and characteristics 1.The standard deviation is extremely important in sampling theory, in co relational analysis, in estimating reliability of measures, and in determining relative position of an individual within a distribution of scores and between distributions of scores.
  • 161.
    2. The standarddeviation is the most widely used estimate of variation because of its known algebraic properties and its amenability to use with other statistics.
  • 162.
    3. It alsoprovides a better estimate of variation in the population than the other indexes. 4. The numerical value of standard deviation is likely to fluctuate less from sample to sample than the other indexes.
  • 163.
    5. In certaincircumstances, quantitative probability statements that characterize a series, a sample of observations, or a total population can be derived from the standard deviation of the series, sample, or population. 6. When the standard deviation of any sample is small, the sample mean is close to any individual value.
  • 164.
    7. When standarddeviation of a random sample is small, the sample mean is likely to be close to the mean of all the data in the population. 8. The standard deviation decreases when the sample size increases.
  • 165.
    4. Coefficient ofvariation The coefficient of variation is the ratio of the standard deviation of a series to the arithmetic mean of the series. The coefficient of variation is unit less and is expressed as a percentage.
  • 166.
    Application and characteristics 1.The co efficient of variation is used to compare the relative variation, or spread, of the distributions of different series, samples, or populations or of the distributions of different characteristics of a single series. 2. The coefficient of variation can be used only for characteristics that are based on a scale with a true zero value.
  • 167.
    Calculation: The coefficient ofvariation (CV) is calculated as CV (%) = SD / X × 100
  • 168.
    For example, In atypical medical school, the mean weight of 100 fourth-year medical students is 140 lb, with a standard deviation of 28 lb. CV (%) = 28 / 140 × 100 = 20% The coefficient of variation for weight is 28 lb divided by 140 lb, or 20%.
  • 169.
    THE NORMAL DISTRIBUTION Themajority of measurements of continuous data in medicine and biology tend to approximate the theoretical distribution that is known as the normal distribution and is also called the gaussian distribution (named after Johann Karl Gauss, the person who best described it)6.
  • 170.
    • The normaldistribution is one of the most frequently used distributions in biomedical and dental research. • The normal distribution is a population frequency distribution. • It is characterized by a bell-shaped curve that is unimodal and is symmetric around the mean of the distribution.
  • 171.
    • The normalcurve depends on only two parameters: the population mean and the population standard deviation. • In order to discuss the area under the normal curve in terms of easily seen percentages of the population distribution, the normal distribution has been standardized to the normal distribution in which the population mean is 0 and the population standard deviation is 1.
  • 172.
    • The areaunder the normal curve can be segmented starting with the mean in the center (on the x axis) and moving by increments of 1 SD above and below the mean.
  • 173.
    Figure shows astandard normal distribution (mean = 0; SD= 1) and the percentages of area under the curve at each increment of SD. 34.13% 13.59% 2.27%.2.27%. 13.59% 34.13%
  • 174.
    • The totalarea beneath the normal curve is 1, or 100% of the observations in the population represented by the curve. • As indicated in the figure, the portion of the area under the curve between the mean and 1 SD is 34.13% of the total area. • The same area is found between the mean and one unit below the mean.
  • 175.
    Moving 2 SDmore above the mean cuts off an additional 13.59% of the area, and moving a total of 3 SD above the mean cuts off another 2.27%.
  • 176.
    The theory ofthe standard normal distribution leads us, therefore, to the following property of a normally distributed variable: Exactly 68.26% of the observations lie within 1 SD of the mean. Exactly 95.45% of the observations lie within 2 SD of the mean. Exactly 99.73% of the observations lie within 3 SD of the mean.
  • 177.
    Virtually all ofthe observations are contained within 3 SD of the mean. This is the justification used by those who label values outside of the interval X ± 3 SD as “outliers” or unlikely values. Incidentally, the number of standard deviations away from the mean is called Z score.
  • 178.
    Problems In AnalyzingA Frequency Distribution In a normal distribution, the following holds true: mean =median =mode. In an observed data set, there may be skewness, kurtosis, and extreme values, in which case the measures of central tendency may not follow this pattern.
  • 179.
    Skewness and Kurtosis 1.Skewness. Ahorizontal stretching of a frequency distribution to one side or the other, so that one tail of observations is longer and has more observations than the other tail, is called skewness.
  • 181.
    When a histogramor frequency polygon has a longer tail on the left side of the diagram, the distribution is said to be skewed to the left. If a distribution is skewed, the mean moves farther in the direction of the long tail than does the median, because the mean is more heavily influenced by extreme values.
  • 182.
    . A quickway to get an approximate idea of whether or not a frequency distribution is skewed is to compare the mean and the median. If these two measures are close to each other, the distribution is probably not skewed.
  • 183.
    2.Kurtosis. It is characterizedby a vertical stretching of the frequency distribution. It is the measure of the peakedness of a probability distribution. As shown in the figure kurtotic distribution could look more peaked or could look more flattened than the bell shaped normal distribution. A normal distribution has zero kurtosis.
  • 185.
    • Significant skewnessor kurtosis can be detected by statistical tests that reveal that the observed data do not form a normal distribution. Many statistical tests require that the data they analyze be normally distributed, and the tests may not be valid if they are used to compare very abnormal distributions. • Kurtosis is seldom discussed as a problem in the medical literature, although skewness is frequently observed and is treated as a problem.
  • 186.
    3. Extreme values(Outliers) One of the most perplexing problems for the analysis of data is how to treat a value that is abnormally far above or below the mean. However, before analyzing the data set, the investigator would want to be sure that this item of data was legitimate and would check the original source of data. Although the value is an outlier, it may probably be correct.
  • 187.
    PRESENTED BY, Dr. SushiKadanakuppe II year PG student Dept of Preventive & Community Dentistry Oxford Dental College & Hospital
  • 188.
     ANALYTICAL ORINFERENTIAL STATISTICS  The nature and purpose of statistical inference  The process of testing hypothesis a. False-positive & false-negative errors. b. The null hypothesis & alternative hypothesis c. The alpha level & p value d. Variation in individual observations and in multiple samples.
  • 189.
     Tests ofstatistical significance  Choosing an appropriate statistical test  Making inferences from continuous (parametric) data.  Making inferences from ordinal data.  Making inferences from dichotomous and nominal (nonparametric) data.  REFERENCES
  • 190.
THE NATURE AND PURPOSE OF STATISTICAL INFERENCE
As stated earlier, it is often impossible to study each member of a population. Instead, we select a sample from the population and from that sample attempt to generalize to the population as a whole. The process of generalizing sample results to a population is termed statistical inference and is the end product of formal statistical hypothesis testing.
Inference means the drawing of conclusions from data. Statistical inference can be defined as the drawing of conclusions from quantitative or qualitative information using the methods of statistics to describe and arrange the data and to test suitable hypotheses.
Differences Between Deductive Reasoning And Inductive Reasoning
Because data do not come with their own interpretation, the interpretation must be put into the data by inductive reasoning (from Latin, meaning "to lead into"). This approach to reasoning is less familiar to most people than is deductive reasoning (from Latin, meaning "to lead out from"), which is learned from mathematics, particularly from geometry.
Deductive reasoning proceeds from the general (i.e., from assumptions, propositions, and formulas considered true) to the specific (i.e., to specific members belonging to the general category). Consider, for example, the following two propositions: (1) All Americans believe in democracy. (2) This person is an American. If both propositions are true, then the following deduction must be true: this person believes in democracy.
Deductive reasoning is of special use in science once hypotheses are formed. Using deductive reasoning, an investigator says: if this hypothesis is true, then the following prediction or predictions also must be true.
If the data are inconsistent with the predictions from the hypothesis, they force a rejection or modification of the hypothesis. If the data are consistent with the hypothesis, they cannot prove that the hypothesis is true, although they do lend support to it. To reiterate, even if the data are consistent with the hypothesis, they do not prove the hypothesis.
Physicians often proceed from formulas accepted as true and from observed data to determine the values that variables must have in a certain clinical situation. For example, if the amount of a medication that can be safely given per kilogram of body weight (a constant) is known, then it is simple to calculate how much of that medication can be given to a patient weighing 50 kg. This is deductive reasoning, because it proceeds from the general (a constant and a formula) to the specific (the patient).
Inductive reasoning, in contrast, seeks to find valid generalizations and general principles from data. Statistics, the quantitative aid to inductive reasoning, proceeds from the specific (that is, from data) to the general (that is, to formulas or conclusions about the data).
For example, by sampling a population and determining both the age and the blood pressure of the persons in the sample (the specific data), an investigator using statistical methods can determine the general relationship between age and blood pressure (e.g., that, on average, blood pressure increases with age).
Differences Between Mathematics And Statistics
The differences between mathematics and statistics can be illustrated by showing that they form the basis for very different approaches to the same basic equation: y = mx + b.
This equation is the formula for a straight line in analytic geometry. It is also the formula for simple regression analysis in statistics, although the letters used and their order customarily are different.
In the mathematical formula above, b is a constant, and it stands for the y-intercept (i.e., the value of y when the variable x equals 0). The value m is also a constant, and it stands for the slope (the amount of change in y for a unit increase in the value of x). The important thing to notice is that in mathematics, one of the variables (either x or y) is unknown (i.e., to be calculated), while the formula and the constants are known.
In statistics, however, just the reverse is true: the variables, x and y, are known for all observations, and the investigator usually wishes to determine whether there is a linear (straight-line) relationship between x and y, by estimating the slope and the intercept. This can be done using the form of analysis called linear regression, which is discussed later.
As a general rule, what is known in statistics is unknown in mathematics, and vice versa. In statistics, the investigator starts from specific observations (data) to induce or estimate the general relationships between variables.
Probability
The probability of a specified event is the fraction, or proportion, of all possible events of a specified type in a sequence of almost unlimited random trials under similar conditions. The probability of an event is the likelihood that the event will occur; it can never be greater than 1 (100%) or less than 0 (0%).
Applications and characteristics
1. The probability values in a population are distributed in a definable manner that can be used to analyze the population.
2. Probability values that do not follow a known distribution can be analyzed using nonparametric methods.
Calculation
The probability of an event is determined as P(A) = A / N, where P(A) = the probability of event A occurring; A = the number of times that event A actually occurs; and N = the total number of events during which event A can occur.
Example: A medical student performs venipunctures on 1000 patients and is successful on 800 in the first attempt. Assuming that all other factors are equal (i.e., random selection of patients), the probability that the next venipuncture will be successful on the first attempt is 80%.
Rules
a. Additive rule
1. Definition. The additive rule applies when considering the probability of one of at least two mutually exclusive events occurring, which is calculated by adding together the probability values of the events.
2. Calculation. The probability of only one of two mutually exclusive events is determined as P(A or B) = P(A) + P(B), where P(A or B) = the probability of event A or event B occurring.
3. Example. About 6.3% of all medical students are black, and 5.5% are Hispanic. The probability that a medical student will be either black or Hispanic is 6.3% plus 5.5%, or 11.8%.
b. Multiplicative rule
1. Definition. The multiplicative rule applies when considering the probability of at least two independent events occurring together, which is calculated by multiplying the probability values for the events.
2. Calculation. The probability of two independent events occurring together is determined as P(A and B) = P(A) × P(B), where P(A and B) = the probability of both event A and event B occurring.
3. Example. About 6.3% of all medical students are black and 36.1% of all students are women. Assuming race and sex are independent selection factors, the percentage of students who are black women should be about 6.3% multiplied by 36.1%, or 2.3%.
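The short Python sketch below simply reproduces these probability calculations; the figures (80% venipuncture success, 6.3%, 5.5%, 36.1%) are taken from the examples above, and the helper function name is illustrative only.

# Basic probability and the additive/multiplicative rules,
# using the example figures given in the text.

def probability(successes: int, trials: int) -> float:
    """P(A) = A / N."""
    return successes / trials

# Venipuncture example: 800 first-attempt successes out of 1000 patients.
p_success = probability(800, 1000)          # 0.80

# Additive rule (mutually exclusive events): P(A or B) = P(A) + P(B).
p_black, p_hispanic = 0.063, 0.055
p_black_or_hispanic = p_black + p_hispanic  # 0.118

# Multiplicative rule (independent events): P(A and B) = P(A) * P(B).
p_woman = 0.361
p_black_woman = p_black * p_woman           # ~0.023

print(f"P(success)        = {p_success:.2f}")
print(f"P(black or Hisp.) = {p_black_or_hispanic:.3f}")
print(f"P(black woman)    = {p_black_woman:.3f}")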
THE PROCESS OF TESTING HYPOTHESES
Hypotheses are predictions about what the examination of appropriate data will show. The following discussion introduces the basic concepts underlying the usual tests of statistical significance. These tests determine the probability that a finding (such as a difference between means or proportions) represents a true deviation from what was expected (i.e., from the model, which is often a null hypothesis that there will be no difference between the means or proportions).
 False-Positive And False-Negative Errors
Science is based on the following set of principles:
1. Previous experience serves as the basis for developing hypotheses;
2. hypotheses serve as the basis for developing predictions;
3. and predictions must be subjected to experimental or observational testing.
In deciding whether data are consistent or inconsistent with the hypotheses, investigators are subject to two types of error.
They could assert that the data support a hypothesis when in fact the hypothesis is false; this would be a false-positive error, which is also called an alpha error or a type I error. Conversely, they could assert that the data do not support the hypothesis when in fact the hypothesis is true; this would be a false-negative error, which is also called a beta error or a type II error.
Based on the knowledge that scientists become attached to their own hypotheses, and on the conviction that proof in science, as in the courts, must be "beyond a reasonable doubt", investigators have historically been particularly careful to avoid the false-positive error.
Probably this is best for theoretical science. In medicine, however, where a false-negative error in a diagnostic test may mean missing a disease until it is too late to institute therapy, and where a false-negative error in the study of a medical intervention may mean overlooking an effective treatment, investigators cannot feel comfortable about false-negative errors either.
 The Null Hypothesis And The Alternative Hypothesis
The process of significance testing involves three basic steps: (1) asserting the null hypothesis, (2) establishing the alpha level, and (3) rejecting or failing to reject the null hypothesis.
The first step consists of asserting the null hypothesis, which is the hypothesis that there is no real (true) difference between the means or proportions of the groups being compared, or that there is no real association between two continuous variables. It may seem strange to begin the process by asserting that something is not true, but it is far easier to reject an assertion than to prove that something is true.
If the data are not consistent with a hypothesis, the hypothesis can be rejected. If the data are consistent with a hypothesis, this still does not prove the hypothesis, because other hypotheses may fit the data equally well.
The second step is to determine the probability of being in error if the null hypothesis is rejected. This step requires that the investigator establish an alpha level, as described below.
If the p value is found to be greater than the alpha level, the investigator fails to reject the null hypothesis. If, however, the p value is found to be less than or equal to the alpha level, the next step is to reject the null hypothesis and to accept the alternative hypothesis, which is the hypothesis that there is in fact a real difference or association. Although it may seem awkward, this process is now standard in medical science and has yielded considerable scientific benefits.
Statistical tests begin with the statement of the hypothesis itself, but stated in the form of a null hypothesis. For example, consider again the group of patients who tested the new pain-relieving drug, drug A, and recorded their number of minutes to 100% pain relief. Suppose that a similar sample of patients tested another drug, drug B, in the same way, and investigators wished to know whether one group of patients experienced total pain relief more quickly than the other group.
In this case, the null hypothesis would be stated as: "there is no difference in time to 100% pain relief between the two pain-relieving drugs A and B". The null hypothesis is one of no difference, no effect, no association, and serves as a reference point for the statistical test.
In symbols, the null hypothesis is referred to as H0. In the comparison of the two drugs A and B, we can state H0 in terms of there being no difference in the average number of minutes to pain relief between drugs A and B, or H0: x̄A = x̄B. The alternative is that the means of the two drugs are not equal. This is an expression of the alternative hypothesis, H1.
Null hypothesis H0: x̄A = x̄B
Alternative hypothesis H1: x̄A ≠ x̄B
 The Alpha Level And P Value
Before doing any calculations to test the null hypothesis, the investigator must establish a criterion called the alpha level, which is the maximum probability of making a false-positive error that the investigator is willing to accept.
By custom, the level of alpha is usually set at p = 0.05. This says that the investigator is willing to run a 5% risk (but no more) of being in error when asserting that the treatment and control groups truly differ. In choosing an alpha level, the investigator inserts a value judgment into the process. However, when that is done before the data are collected, at least the post hoc bias of being tempted to adjust the alpha level to make the data show statistical significance is avoided.
The p value obtained by a statistical test (such as the t-test) gives the probability that the observed difference could have been obtained by chance alone, given random variation and a single test of the null hypothesis. Usually, if the observed p value is ≤ 0.05, members of the scientific community who read about an investigation will accept the difference as being real.
Although setting alpha at 0.05 is somewhat arbitrary, that level has become so customary that it is wise to provide explanations for choosing another alpha level or for choosing not to perform tests of significance at all, which may be the best approach in some descriptive studies.
The p value is the final arithmetic answer that is calculated by a statistical test of a hypothesis. Its magnitude informs the researcher about the validity of H0, that is, whether to accept or reject H0 as worth keeping. The p value is crucial for drawing the proper conclusions about a set of data.
So what numerical value of p should be used as the dividing line for acceptance or rejection of H0? Here is the decision rule for the observed value of p and the decision regarding H0:
If p ≤ 0.05, reject H0.
If p > 0.05, accept H0.
If the observed probability is less than or equal to 0.05 (5%), the null hypothesis is rejected; that is, the observed outcome is judged to be incompatible with the notion of "no difference" or "no effect", and the alternative hypothesis is adopted. In this case, the results are said to be "statistically significant".
If the observed probability is greater than 0.05 (5%), the decision is to accept the null hypothesis, and the results are called "not statistically significant" or simply NS, the notation often used in tables.
Statistical Versus Clinical Significance
The distinction between statistical significance and clinical (practical) significance is worth mentioning. For example, in the statistical test of H0: x̄A = x̄B for two drug groups, assume that the observed probability is p = 0.01, a value that is less than the dividing line of 0.05 (5%).
This would lead the investigator to reject H0 and to conclude that the results are "significant at p = 0.01", that is, that one drug caused total pain relief significantly faster, on average, than the other drug. But if the actual difference in the group means is itself clinically meaningless or negligible, the statistical significance may be real yet not useful.
According to Dr. Horowitz, statistical significance "is a mathematical expression of the degree of confidence that an observed difference between groups represents a real difference – that a zero response would not occur if the study were repeated, and that the result is not merely due to chance".
On the other hand, "clinical significance is a judgment made by the researcher or reader that differences in response to intervention observed between groups are important for health". "It is a subjective evaluation of the test", continues Dr. Horowitz, based on clinical experience and familiarity with the "disease or condition being measured".
 Variation In Individual Observations And In Multiple Samples
Most tests of significance relate to a difference between means or proportions. They help investigators decide whether an observed difference is real, which in statistical terms is defined as whether the difference is greater than would be expected by chance alone.
Inspecting the means to see whether they differ is inadequate, because it is not known whether the observed difference is unusual or whether a difference that large might be found frequently if the experiment were repeated.
To generalize beyond the particular subjects in a single study, the investigators must know the extent to which the differences discovered in the study are reliable. The estimate of this reliability is given by the standard error, which is not the same as the standard deviation.
Standard Deviation And Standard Error
A normal distribution can be completely described by its mean and standard deviation. This information is useful in describing individual observations (raw data), but it is not useful in determining how close a sample mean from research data is to the mean of the underlying population (which is also called the true mean or the population mean). That determination must be made on the basis of the standard error.
The standard error is related to the standard deviation, but it differs from it in important ways. Basically, the standard error is the standard deviation of a population of sample means, rather than of individual observations. Therefore, the standard error refers to the variability of means rather than of individual observations, and it provides an idea of how variable a single estimate of the mean from one set of research data is likely to be.
Imagine that 100 different samples were drawn from the same population and that the frequency distribution of the 100 different means was plotted, treating each mean as a single observation. These sample means would form a nearly normal (gaussian) frequency distribution, the mean of which would be very close to the true mean of the underlying population.
More important for this discussion, the standard deviation of this distribution of sample means is called the standard error of the distribution (the standard error of the mean); it is smaller than the standard deviation of the individual observations, because it equals the standard deviation divided by the square root of the sample size.
The standard error is a parameter that enables the investigator to do two things that are central to the function of statistics:
 One is to estimate the probable amount of error around a quantitative assertion.
 The other is to perform tests of statistical significance.
In practice only the standard deviation and sample size of a single research sample are known; the standard deviation can, however, be converted to a standard error so that these functions can be pursued.
An unbiased estimate of the standard error can be obtained from the standard deviation of a single research sample if the standard deviation was originally calculated using the degrees of freedom (N − 1) in the denominator. The formula for converting a standard deviation (SD) to a standard error (SE) is as follows:
Standard error = SE = SD / √N
The larger the sample size (N), the smaller the standard error, and the better the estimate of the population mean. At any given point on the x-axis, the height of the bell-shaped curve of the sample means represents the relative probability that a single sample mean would fall at that point. Most of the time, the sample mean would be near the true mean; less often, it would be farther away.
In the medical literature, means or proportions are often reported either as the mean plus or minus 1 SD or as the mean plus or minus 1 SE. Reported data must be examined carefully to determine whether the SD or the SE is shown. Either is acceptable in theory, because an SD can be converted to an SE and vice versa if the sample size is known. However, many journals have a policy stating whether the SD or the SE must be reported. The sample size should also be shown.
Confidence Intervals
Whereas the SD shows the variability of individual observations, the SE shows the variability of means. Whereas the mean plus or minus 1.96 SD estimates the range in which 95% of individual observations would be expected to fall, the mean plus or minus 1.96 SE estimates the range in which 95% of the means of repeated samples of the same size would be expected to fall.
Moreover, if the value for the mean plus or minus 1.96 SE is known, it can be used to calculate the 95% confidence interval, which is the range of values in which the investigator can be 95% confident that the true mean of the underlying population falls.
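As a rough illustration of these formulas, the following Python sketch converts a sample standard deviation to a standard error and builds the 95% confidence interval around the mean; the numerical values are invented for illustration.

import math

# Illustrative sample summary (hypothetical values).
mean, sd, n = 120.0, 15.0, 100      # e.g., mean systolic BP, its SD, and the sample size

se = sd / math.sqrt(n)              # SE = SD / sqrt(N)
ci_low  = mean - 1.96 * se          # 95% CI = mean +/- 1.96 SE
ci_high = mean + 1.96 * se

print(f"SE = {se:.2f}")                           # 1.50
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")  # (117.06, 122.94)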
TESTS OF STATISTICAL SIGNIFICANCE
The science of biostatistics has given us a large number of tests that can be applied to public health data. An understanding of these tests guides an investigator toward the efficient collection of data that will meet the assumptions of the statistical procedures particularly well.
The tests allow investigators to compare two parameters, such as means or proportions, and to determine whether the difference between them is statistically significant.
The various t-tests (the one-tailed Student's t-test, the two-tailed Student's t-test, and the paired t-test) compare differences between means, while z-tests compare differences between proportions. All of these tests make comparisons possible by calculating the appropriate form of a ratio, which is called a critical ratio because it permits the investigator to make a decision.
This is done by comparing the ratio obtained from whatever test is performed (e.g., a t-test) with the values in the appropriate statistical table (e.g., a table of t values) for the observed number of degrees of freedom. Before individual tests are discussed in detail, the concepts of critical ratios and degrees of freedom are defined.
Critical Ratios
Critical ratios are a class of tests of statistical significance that depend on dividing some parameter (such as a difference between means) by the standard error (SE) of that parameter.
The general formula for such tests of statistical significance is as follows:
Critical ratio = parameter / SE of that parameter
When applied to the Student's t-test, the formula becomes:
Critical ratio = t = (difference between two means) / (SE of the difference between the two means)
When applied to a z-test, the formula becomes:
Critical ratio = z = (difference between two proportions) / (SE of the difference between the two proportions)
The value of the critical ratio (e.g., t or z) is then looked up in the appropriate table (of t or z) to determine the corresponding value of p. For any critical ratio, the larger the ratio, the more likely it is that the difference between means or proportions is due to more than just random variation (i.e., the more likely it is that the difference can be considered statistically significant and, hence, real).
Unless the total sample size is small (say, under 30), the finding of a critical ratio greater than about 2 usually indicates that the difference is real and enables the investigator to reject the null hypothesis. The statistical tables adjust the critical ratios for the sample size by means of the degrees of freedom.
Degrees Of Freedom
The term "degrees of freedom" refers to the number of observations (N) that are free to vary. A degree of freedom is lost every time a mean is calculated. Why should this be?
Before putting on a pair of gloves, a person has the freedom to decide whether to begin with the left or the right glove. However, once the person puts on the first glove, he or she loses the freedom to decide which glove to put on last. If centipedes put on shoes, they would have a choice to make for the first 99 shoes but not for the 100th shoe. Right at the end, the freedom to choose (vary) is restricted.
In statistics, if there are two observed values, only one estimate of the variation between them is possible. Something has to serve as the basis against which other observations are compared. The mean is the most "solid" estimate of the expected value of a variable, so it is assumed to be "fixed".
This implies that the numerator of the mean (the sum of the individual observations, or the sum of xi), which is based on N observations, is also fixed. Once N − 1 observations (each of which was, presumably, free to vary) have been added up, the last observation is not free to vary, because the total of the N observations must add up to the sum of xi.
For this reason, 1 degree of freedom is lost each time a mean is calculated. The proper average of a sum of squares calculated from an observed sample, therefore, is the sum of squares divided by the degrees of freedom (N − 1).
Hence, for simplicity, the degrees of freedom for any test are considered to be the total sample size minus 1 degree of freedom for each mean that is calculated. In the Student's t-test, 2 degrees of freedom are lost because two means are calculated (one for each group whose means are to be compared).
The general formula for the degrees of freedom for the Student's two-group t-test is N1 + N2 − 2, where N1 is the sample size in the first group and N2 is the sample size in the second group.
Use of t-Tests
In medical research, t-tests are among the three or four most commonly used statistical tests (Emerson and Colditz 1983)6. The purpose of a t-test is to compare the means of a continuous variable in two research samples in order to determine whether or not the difference between the two observed means exceeds the difference that would be expected by chance from random samples.
Sample Populations and Sizes
If the two samples come from two different groups (e.g., a group of men and a group of women), the Student's t-test is used. If the two samples come from the same group (e.g., pretreatment and post-treatment values for the same study subjects), the paired t-test is used.
Both types of t-test depend on certain assumptions, including the assumption that the data in the continuous variable are normally distributed (i.e., have a bell-shaped distribution). Very seldom, however, will observed data be perfectly normally distributed. Does this invalidate the t-test? Fortunately, it does not. There is a convenient theorem that rescues the t-test (and much of statistics as well).
The central limit theorem can be derived theoretically or observed by experimentation. According to the theorem, for reasonably large samples (say, 30 or more observations in each sample), the distribution of the means of many samples is normal (gaussian), even though the data in individual samples may have skewness, kurtosis, or unevenness.
Because the critical theoretical requirement for the t-test is that the sample means be normally distributed, a t-test may be computed on almost any set of continuous data, if the observations can be considered a random sample and the sample size is reasonably large.
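A small simulation (a sketch, not taken from the original text) illustrates the central limit theorem: even when individual observations come from a skewed distribution, the means of repeated samples of 30 or more observations are distributed approximately normally.

import random
import statistics

random.seed(1)

# Draw many samples of size 30 from a skewed (exponential) distribution
# and look at the distribution of the sample means.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(30))
    for _ in range(1000)
]

print(f"mean of sample means = {statistics.mean(sample_means):.3f}")   # close to 1.0
print(f"SD of sample means   = {statistics.stdev(sample_means):.3f}")  # close to 1/sqrt(30), about 0.18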
The t-Distribution
The t distribution was described by William Gosset, who used the pseudonym "Student" when he wrote the description.
The t distribution looks similar to the normal distribution, except that its tails are somewhat wider and its peak is slightly less high, depending on the sample size. The t distribution is necessary because, when sample sizes are small, the observed estimates of the mean and variance are subject to considerable error.
The larger the sample size, the smaller the errors, and the more the t distribution looks like the normal distribution. In the case of an infinite sample size, the two distributions are identical. For practical purposes, when the combined sample size of the two groups being compared is larger than 120, the difference between the normal distribution and the t distribution is negligible.
Student's t-Test
There are two types of Student's t-test: the one-tailed and the two-tailed type. The calculations are the same, but the interpretation of the resulting t differs somewhat. The common features will be discussed before the differences are outlined.
Calculation of the value of t. In both types of Student's t-test, t is calculated by taking the observed difference between the means of the two groups (the numerator) and dividing this difference by the standard error of the difference between the means of the two groups (the denominator).
Before t can be calculated, then, the standard error of the difference between the means (SED) must be determined. The basic formula for this is the square root of the sum of the respective population variances, each divided by its own sample size.
When the Student's t-test is used to test the null hypothesis in research involving an experimental group and a control group, it usually takes the general form of the following equation:
t = (x̄E − x̄C − 0) / √( s²p [(1/NE) + (1/NC)] )
df = NE + NC − 2
The 0 in the numerator of the equation for t was added for correctness, because the t-test determines whether the difference between the means is significantly different from 0. However, because the 0 does not affect the calculations in any way, it is usually omitted from t-test formulas.
The same formula, recast in terms that apply to any two independent samples (e.g., samples of men and women), is as follows:
t = (x̄1 − x̄2 − 0) / √( s²p [(1/N1) + (1/N2)] )
df = N1 + N2 − 2
in which x̄1 is the mean of the first sample, x̄2 is the mean of the second sample, s²p is the pooled estimate of the variance, N1 is the size of the first sample, N2 is the size of the second sample, and df is the degrees of freedom.
The 0 in the numerator indicates that the null hypothesis states that the difference between the means will not be significantly different from 0. The df is needed to enable the investigator to refer to the correct line in the table of the values of t and their relationship to p.
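A minimal Python sketch of the pooled-variance Student's t-test described by this formula is shown below; the two groups of values are hypothetical, and in practice the same result could be obtained with a standard routine such as scipy.stats.ttest_ind.

import math

def pooled_t_test(x1, x2):
    """Student's two-sample t-test with a pooled variance estimate.

    Returns (t, degrees of freedom); the p value would then be read from
    a t table (or computed from the t distribution) for df degrees of freedom.
    """
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    # Pooled variance: the two sums of squares divided by the pooled degrees of freedom.
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    s2p = (ss1 + ss2) / (n1 + n2 - 2)
    se_diff = math.sqrt(s2p * (1 / n1 + 1 / n2))
    t = (m1 - m2 - 0) / se_diff
    return t, n1 + n2 - 2

# Hypothetical minutes to 100% pain relief for drug A and drug B.
drug_a = [22, 25, 19, 30, 27, 24, 21, 26]
drug_b = [28, 31, 27, 35, 30, 29, 33, 32]
t, df = pooled_t_test(drug_a, drug_b)
print(f"t = {t:.2f} with {df} degrees of freedom")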
The t-test is designed to help investigators distinguish "explained variation" from "unexplained variation" (random error, or chance). These concepts are like "signal" and "background noise" in radio broadcast engineering. Listeners who are searching for a particular station on their radio dial will find background noise on almost every radio frequency.
When they reach the station that they want to hear, they may not notice the background noise, because the signal is so much stronger than the noise. In medical studies, the particular factor that is being investigated is similar to the radio signal, and random error is similar to background noise.
Statistical analysis helps distinguish one from the other by comparing their strengths. If the variation caused by the factor of interest is considerably larger than the variation caused by random factors (i.e., if in the t-test the ratio is greater than approximately 1.96), the effect of the factor of interest becomes detectable above the statistical "noise" of random factors.
Interpretation of the results. If the value of t is large, the p value will be small, because it is unlikely that a large t ratio will be obtained by chance alone. If the p value is 0.05 or less, it is customary to assume that there is a real difference. Conceptually, the p value is the probability of being in error if the null hypothesis of no difference between the means is rejected and the alternative hypothesis of a true difference is accepted.
One-Tailed and Two-Tailed t-Tests
These tests are sometimes called the one-sided and the two-sided tests.
In the two-tailed test, alpha is divided equally between the two tails of the distribution. The two-tailed test is generally recommended, because differences in either direction are usually important to document.
For example, it is obviously important to know whether a new treatment is significantly better than a standard or placebo treatment, but it is also important to know whether a new treatment is significantly worse and should therefore be avoided. In this situation, the two-tailed test provides an accepted criterion for when a difference shows the new treatment to be either better or worse.
Sometimes, however, only a one-tailed test is needed. Suppose, for example, that a new therapy is known to cost much more than the currently used therapy. Obviously, it would not be used if it were worse than the current therapy, but it would also not be used if it were merely as good as the current therapy.
Under these circumstances, some investigators consider it acceptable to use a one-tailed test. When this is done, the 5% rejection region for the null hypothesis is all put in one tail of the distribution, instead of being evenly divided between the extremes of the two tails.
In the one-tailed test, the null hypothesis nonrejection region extends only to 1.645 standard errors above the "no difference" point of 0. In the two-tailed test, it extends to 1.96 standard errors above and below the "no difference" point.
This makes the one-tailed test more sensitive, that is, more able to detect a significant difference if it is in the expected direction. Many investigators dislike one-tailed tests, because they believe that if an intervention is significantly worse than the standard therapy, this should be documented scientifically. Most reviewers and editors require that the use of a one-tailed significance test be justified.
Paired t-Test
In many medical studies, individuals are followed over time to see whether there is a change in the value of some continuous variable. Typically, this occurs in a "before and after" experiment, such as one testing to see whether there was a drop in average blood pressure following treatment, or a drop in weight following the use of a special diet. In this type of comparison, an individual patient serves as his or her own control.
The appropriate statistical test for this kind of data is the paired t-test. The paired t-test is more sensitive than the Student's t-test because it considers the variation from only one group of people, whereas the Student's t-test considers variation from two groups. Any variation that is detected in the paired t-test is attributable to the intervention or to changes over time in the same person.
Calculation of the value of t. To calculate a paired t-test, a new variable is created. This variable, called d, is the difference between the values before and after the intervention for each individual studied.
The paired t-test is a test of the null hypothesis that, on the average, the difference is equal to 0, which is what would be expected if there were no change over time. Using the symbol d̄ to indicate the mean observed difference between the before and after values, the formula for the paired t-test is as follows:
t(paired) = tp = (d̄ − 0) / (standard error of d̄) = (d̄ − 0) / √(s²d / N)
df = N − 1
In the paired t-test, because only one mean (d̄) is calculated, only 1 degree of freedom is lost; therefore, the formula for the degrees of freedom is N − 1.
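The following sketch applies the paired t-test formula above to hypothetical before-and-after blood pressure readings; scipy.stats.ttest_rel would give the same t value.

import math
import statistics

def paired_t_test(before, after):
    """Paired t-test: t = (mean difference - 0) / sqrt(s_d^2 / N), with df = N - 1."""
    d = [b - a for b, a in zip(before, after)]   # per-subject before-minus-after differences
    n = len(d)
    d_bar = statistics.mean(d)
    s2_d = statistics.variance(d)                # sample variance, N - 1 in the denominator
    t = (d_bar - 0) / math.sqrt(s2_d / n)
    return t, n - 1

# Hypothetical systolic blood pressure before and after treatment.
before = [152, 148, 160, 155, 149, 158, 151, 162]
after  = [141, 143, 150, 149, 145, 150, 146, 152]
t, df = paired_t_test(before, after)
print(f"paired t = {t:.2f} with {df} degrees of freedom")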
Interpretation of the results. If the value of t is large, the p value will be small, because it is unlikely that a large t ratio will be obtained by chance alone. If the p value is 0.05 or less, it is customary to assume that there is a real difference (i.e., that the null hypothesis of no difference can be rejected).
Use of z-Tests
In contrast to t-tests, which compare differences between means, z-tests compare differences between proportions. In medicine, examples of proportions that are frequently studied are sensitivity, specificity, positive predictive value, risks, percentages of people with a given symptom, percentages of people who are ill, and percentages of ill people who survive their illness.
Frequently, the goal of research is to see whether the proportion of patients surviving in a treated group differs from that in an untreated group. This can be evaluated using a z-test for proportions.
Calculation of the value of z. As discussed earlier, z is calculated by taking the observed difference between the two proportions (the numerator) and dividing it by the standard error of the difference between the two proportions (the denominator).
For purposes of illustration, assume that research is being conducted to see whether the proportion of patients surviving in a treated group is greater than that in an untreated group. For each group, if p is the proportion of successes (survivals), then 1 − p is the proportion of failures (nonsurvivals).
If N represents the size of the group on which the proportion is based, the parameters of the proportion are as follows:
Variance (proportion) = p(1 − p) / N
Standard error (proportion) = SEp = √[ p(1 − p) / N ]
95% confidence interval = 95% CI = p ± 1.96 SEp
If there is a 0.60 (60%) survival rate following a given treatment, the calculations of SEp and the 95% CI of the proportion, based on a sample of 100 study subjects, would be as follows:
SEp = √[ (0.6)(0.4) / 100 ] = √(0.24 / 100) = 0.49 / 10 = 0.049
95% CI = 0.6 ± (1.96)(0.049) = 0.6 ± 0.096, that is, between 0.6 − 0.096 and 0.6 + 0.096, or from 0.504 to 0.696.
Now that there is a way to obtain the standard error of a proportion, the standard error of the difference between proportions also can be obtained, and the equation for the z-test can be expressed as follows:
z = (p1 − p2 − 0) / √( p̄(1 − p̄) [(1/N1) + (1/N2)] )
in which p1 is the proportion of the first sample, p2 is the proportion of the second sample, N1 is the size of the first sample, N2 is the size of the second sample, and p̄ is the mean proportion of successes in all observations combined. The 0 in the numerator indicates that the null hypothesis states that the difference between the proportions will not be significantly different from 0.
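The sketch below reproduces the worked survival-proportion example and the z-test formula above; the 0.60 survival rate and the sample size of 100 are taken from the text, while the second, untreated comparison group is hypothetical.

import math

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p(1 - p) / N)."""
    return math.sqrt(p * (1 - p) / n)

# Worked example from the text: 60% survival in a sample of 100.
p, n = 0.60, 100
se = se_proportion(p, n)                       # 0.049
ci = (p - 1.96 * se, p + 1.96 * se)            # (0.504, 0.696)
print(f"SEp = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

def z_test_proportions(x1, n1, x2, n2):
    """z-test for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)              # pooled proportion of successes
    se_diff = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2 - 0) / se_diff

# Hypothetical comparison: 60 of 100 survive with treatment vs 45 of 100 without.
z = z_test_proportions(60, 100, 45, 100)
print(f"z = {z:.2f}")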
Interpretation of the results. Note that the above formula for z is similar to the formula for t in the Student's t-test, as described earlier. However, because the variance and the standard error of the proportion are based on a theoretical distribution (the normal, or z, approximation to the binomial distribution), the z distribution is used instead of the t distribution in determining whether the difference is statistically significant. When the z ratio is large (as when the t ratio is large), the difference is more likely to be real.
The computations for the z-test appear different from the computations for the chi-square test, but when the same data are set up as a 2 × 2 table, the computations for the two tests are in fact identical. Most people find it easier to do a chi-square test than a z-test for proportions.
CHOOSING AN APPROPRIATE STATISTICAL TEST
A variety of statistical tests can be used to analyze the relationship between two or more variables. Bivariate analysis is the analysis of the relationship between one independent (possibly causal) variable and one dependent (outcome) variable, whereas multivariable analysis is the analysis of the relationship of more than one independent variable to a single dependent variable.
Statistical tests should be chosen only after the types of clinical data to be analyzed and the basic research design have been established. In general, the analytic approach should begin with a study of the individual variables, including their distributions and outliers, and with a search for errors. Then bivariate analysis can be done to test hypotheses and probe for relationships. Only after these procedures have been done carefully should multivariable analysis be attempted.
Among the factors involved in choosing an appropriate statistical test are the goals and research design of the study and the type of data being gathered. In some studies the investigators are interested in descriptive information, such as the sensitivity or specificity of a laboratory assay, in which case there may be no reason to perform a test of statistical significance.
In other studies, the investigators are interested in determining whether the difference between two means is real, in which case testing for statistical significance is appropriate.
The types of variables and the research design set the limits to statistical analysis and determine which tests are appropriate. An investigator's knowledge of the types of variables (continuous data, ordinal data, dichotomous data, and nominal data) and the appropriate statistical tests is analogous to a painter's knowledge of the types of media (oils, tempera, watercolors, and so forth) and the appropriate brushes and techniques to be used.
If the research design involves before-and-after comparisons in the same study subjects, or comparisons of matched pairs of study subjects, a paired test of statistical significance would be appropriate, such as the paired t-test, the Wilcoxon matched-pairs signed-ranks test, or the McNemar chi-square test. Moreover, if the sampling procedure in a study is not random, statistical tests that assume random sampling, such as most of the parametric tests, may not be valid.
Making inferences from continuous (parametric) data
If the study involves two continuous variables, the following questions may be answered: (1) Is there a real relationship between the variables or not? (2) If there is a real relationship, is it a positive or negative linear (straight-line) relationship, or is it more complex? (3) If there is a real relationship, how strong is it? (4) How likely is the relationship to be generalizable?
The best way to answer these questions is first to plot the continuous data on a joint distribution graph and then to perform correlation analysis and simple linear regression analysis.
The Joint Distribution Graph
Taking the example of a sample of elderly xerostomia patients: does the number of root caries lesions increase with increasing amounts of sugar in the diet (number of servings per day)? In this instance, data are recorded on a single group of subjects, and each subject contributes a pair of measures (number of servings of sugar per day and number of root caries lesions). Commonly, any pair of variables entered into a correlation analysis is given the names x and y.
These data can be plotted on a joint distribution graph, as shown in the figure. The data do not form a perfectly straight line, but they do appear to lie along a straight line going from the lower left to the upper right of the graph, and all of the observations but one are fairly close to the line.
As indicated in the figure, the correlation between two variables, labeled x and y, can range from nonexistent to strong. If the value of y increases as x increases, the correlation is positive; if y decreases as x increases, the correlation is negative.
It appears from the graph that the correlation between amount of sugar and dental caries is strong and positive.
Therefore, based on the figure, the answer to the first question above is that there is a real relationship between amount of sugar and dental caries. The graph, however, does not reveal the probability that such a relationship could have occurred by chance. The answer to the second question is that the relationship is positive and linear. The graph does not provide quantitative information about how strong the association is (although it looks strong to the eye).
To answer these questions more precisely, it is necessary to use the techniques of correlation and simple linear regression. Neither the graph nor these techniques, however, can answer the question of how generalizable the findings are.
The Pearson Correlation Coefficient
Even without plotting the observations for two variables (variable x and variable y) on a graph, the extent of their linear relationship can be determined by calculating the Pearson product-moment correlation coefficient, which is given the symbol r and is referred to as the r value.
This statistic varies from −1 to +1, passing through 0. A finding of −1 indicates that the two variables have a perfect negative linear relationship; +1 indicates that they have a perfect positive linear relationship; and 0 indicates that the two variables are totally independent of each other. The r value is rarely found to be −1 or +1.
Frequently, there is an imperfect correlation between the two variables, resulting in r values between 0 and 1 or between 0 and −1. Because the Pearson correlation coefficient is strongly influenced by extreme values, the value of r can be trusted only when the distribution of each of the two variables to be correlated is approximately normal (i.e., without severe skewness or extreme outlier values).
As is the case in every test of significance, for a fixed strength of association, the larger the sample size, the more likely the result is to be statistically significant. A weak correlation in a large sample might be statistically significant despite not being etiologically or clinically important.
There is no perfect statistical way to estimate clinical importance, but with continuous variables a valuable concept is the strength of the association, measured by the square of the correlation coefficient, or r².
The r² value is the proportion of variation in y explained by x (or vice versa). It is an important parameter in advanced statistics. Looking at the strength of association is analogous to looking at the size and clinical importance of an observed difference.
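A short sketch, using invented sugar-intake and root-caries counts in the spirit of the xerostomia example, shows how r and r² are obtained; scipy.stats.pearsonr would return the same coefficient together with a p value.

import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: servings of sugar per day (x) and number of root caries lesions (y).
sugar  = [1, 2, 2, 3, 4, 5, 6, 7]
caries = [0, 1, 2, 2, 3, 5, 5, 7]

r = pearson_r(sugar, caries)
print(f"r  = {r:.2f}")        # strength and direction of the linear relationship
print(f"r2 = {r * r:.2f}")    # proportion of variation in y explained by x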
Linear Regression Analysis
Linear regression is related to correlation analysis, but it produces two parameters that can be related directly to the data: the slope and the intercept. Linear regression seeks to quantify the linear relationship that may exist between an independent variable x and a dependent variable y.
Recall that the formula for a straight line, as expressed in statistics, is y = a + bx. Here y is the value of an observation on the y-axis; x is the value of the same observation on the x-axis; a is the regression constant (the value of y when the value of x is 0); and b is the slope (the change in the value of y for a unit change in the value of x).
Linear regression is used to estimate two parameters: the slope of the line (b) and the y-intercept (a). Most fundamental is the slope, which determines the strength of the impact of variable x on y. For example, the slope can tell how much weight increases, on average, for each additional centimeter of height.
Linear regression analysis enables investigators to predict the value of y from the values that x takes. In other words, the formula for linear regression is a form of statistical modeling, and the adequacy of the model is determined by how closely the value of y can be predicted from the other data in the model.
Just as it is possible to set confidence intervals around parameters such as means and proportions, it is possible to set confidence intervals around the slope and the intercept, using computations based on the linear regression formulas. Most statistical computer programs perform these computations, which belong to the scope of advanced statistics.
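Continuing with the same invented data, the least-squares estimates of the slope (b) and intercept (a) of y = a + bx can be computed as sketched below; a statistical package (e.g., scipy.stats.linregress) would also supply the confidence intervals and significance tests mentioned above.

def linear_regression(x, y):
    """Least-squares estimates of the intercept a and slope b in y = a + bx."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical data: servings of sugar per day (x) and number of root caries lesions (y).
sugar  = [1, 2, 2, 3, 4, 5, 6, 7]
caries = [0, 1, 2, 2, 3, 5, 5, 7]

a, b = linear_regression(sugar, caries)
print(f"y = {a:.2f} + {b:.2f}x")                  # fitted regression line
print(f"predicted caries at x = 4: {a + b * 4:.1f}")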
Making inferences from ordinal data
Many medical data are ordinal data, which are ranked from the lowest value to the highest value but are not measured on an exact scale. In some cases, investigators will assume that ordinal data meet the criteria for continuous (measurement) data and will treat the ordinal data as though they had been obtained from a measurement scale.
For example, if patients' satisfaction with the care in a given hospital were being studied, the investigators might assume that the conceptual distance between "very satisfied" (coded as 3) and "fairly satisfied" (coded as 2) is equal to the distance between "fairly satisfied" (coded as 2) and "unsatisfied" (coded as 1).
If the investigators are willing to make these assumptions, the data can be analyzed using parametric statistical methods such as t-tests, analysis of variance, and analysis of the Pearson correlation coefficient. Sometimes, however, clinical investigators make this assumption even when it is of questionable validity, because the parametric statistics are easier to obtain and are more likely to produce statistical significance.
If the investigator is unwilling to make such assumptions, statistics for discrete (nonparametric) data, such as a chi-square test, can be used. However, analysis using chi-square would require discarding the information about the rank of each observation. Fortunately, there are a number of bivariate statistical tests for ordinal data that can be used instead.
The Mann-Whitney U Test
This is one of the best-known nonparametric significance tests. It was proposed, apparently independently, by Mann and Whitney (1947) and Wilcoxon (1945), and is therefore sometimes also called the Mann-Whitney-Wilcoxon (MWW) test or the Wilcoxon rank-sum test.
In statistics, the Mann-Whitney U test is a test for assessing whether the medians of two samples of observations are the same. The null hypothesis is that the two samples are drawn from a single population, and therefore that the medians are equal. It requires the two samples to be independent and the observations to be ordinal or continuous measurements, i.e., one can at least say, of any two observations, which is the greater.
The test for ordinal data that is analogous to the Student's t-test is the Mann-Whitney U test, also called the Wilcoxon rank-sum test. U, like t, designates a probability distribution. In the Mann-Whitney test, all of the observations in a study of two samples are ranked numerically from the smallest to the largest, without regard to whether the observations came from the first sample (e.g., the control group) or from the second sample (e.g., the experimental group).
Next, the observations from the first sample are identified, the ranks in this sample are summed, and the average rank and the variance of those ranks are determined. The process is repeated for the observations from the second sample. If the null hypothesis is true (i.e., if there is no real difference between the two samples), the average ranks of the two samples should be similar.
If the average rank of one sample is considerably greater or smaller than that of the other sample, the null hypothesis probably can be rejected, but a test of significance is needed to be sure.
Because the U method of calculation is tedious, a t-test performed on the ranks can be done instead and will yield very similar results: the difference between the two average ranks (the numerator) is divided by the standard error of that difference, estimated from the pooled variance of the two sets of ranks. The degrees of freedom equal the sum of the sample sizes of the two groups minus 2.
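Assuming SciPy is available, a Mann-Whitney U test on two small ordinal samples can be run as sketched below; the scores are hypothetical.

from scipy.stats import mannwhitneyu

# Hypothetical ordinal scores (e.g., pain scores) for control and experimental groups.
control      = [3, 4, 2, 5, 4, 3, 5]
experimental = [2, 1, 3, 2, 1, 2, 3]

u, p = mannwhitneyu(control, experimental, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")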
The Wilcoxon Matched-Pairs Signed-Ranks Test
The test is named for Frank Wilcoxon (1892–1965), who proposed it together with the rank-sum test for two independent samples (Wilcoxon, 1945). Like the t-test, the Wilcoxon test involves comparisons of differences between measurements, so it requires that the data be measured at an interval level of measurement.
However, it does not require assumptions about the form of the distribution of the measurements. It should therefore be used whenever the distributional assumptions that underlie the t-test cannot be satisfied.
The rank-order test that is comparable to the paired t-test is the Wilcoxon matched-pairs signed-ranks test. In this test, all of the observations in a study of two samples are ranked numerically from the largest to the smallest, without regard to whether the observations came from the first sample (e.g., the pretreatment sample) or from the second sample (e.g., the post-treatment sample).
After the pairs of data are identified (e.g., the pretreatment and post-treatment values for the same subject), the difference in rank is identified for each pair. If in a given pair the pretreatment observation scored 7 ranks higher than the post-treatment observation, the difference would be noted as −7. If in another pair the pretreatment observation scored 5 ranks lower than the post-treatment observation, the difference would be noted as +5.
Each pair is scored in this way. If the null hypothesis is true (i.e., if there is no real difference between the samples), the sum of the positive scores and negative scores should be close to 0. If the average difference is considerably different from 0, the null hypothesis can be rejected.
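A matching sketch for paired data, again assuming SciPy, applies the Wilcoxon matched-pairs signed-ranks test to hypothetical pre- and post-treatment scores.

from scipy.stats import wilcoxon

# Hypothetical pre- and post-treatment scores for the same eight subjects.
pre  = [10, 12, 9, 14, 11, 13, 10, 15]
post = [8, 11, 8, 10, 9, 12, 7, 12]

stat, p = wilcoxon(pre, post)
print(f"Wilcoxon statistic = {stat}, p = {p:.3f}")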
The Kruskal-Wallis Test
If the investigators in a study involving continuous data want to compare the means of three or more groups simultaneously, the appropriate test is a one-way analysis of variance (a one-way ANOVA), usually called an F-test. The comparable test for ordinal data is the Kruskal-Wallis one-way ANOVA.
As in the Mann-Whitney U test, in the Kruskal-Wallis test all of the data are ranked numerically, and the rank values are summed in each of the groups to be compared. The Kruskal-Wallis test seeks to determine whether the average ranks of three or more groups differ from one another more than would be expected by chance alone.
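For three or more groups of ordinal data, the Kruskal-Wallis test can be run as in this sketch (hypothetical scores, SciPy assumed).

from scipy.stats import kruskal

# Hypothetical ordinal scores in three independent groups.
group_a = [2, 3, 4, 3, 5]
group_b = [4, 5, 6, 5, 7]
group_c = [1, 2, 2, 3, 2]

h, p = kruskal(group_a, group_b, group_c)
print(f"H = {h:.2f}, p = {p:.3f}")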
The Sign Test
The sign test can be used to test the hypothesis that there is "no difference" between two continuous distributions X and Y. Sometimes an experimental intervention produces positive results on many outcome variables, but few if any of the individual outcome variables show a statistically significant improvement.
In this case, the sign test can be extremely helpful in comparing the results in the experimental group with those in the control group. If the null hypothesis is true (i.e., there is no real difference between the groups), then, by chance, for half of the outcome variables the experimental group should perform better, and for half of the outcome variables the control group should perform better.
The only data needed for the sign test are a record of whether, on average, the experimental subjects or the control subjects scored "better" on each outcome variable (by what amount is not important).
If the average score in the experimental group is better, the result is recorded as a plus sign (+); if the average score in the control group is better, the result is recorded as a minus sign (−); and if the average scores in the two groups are exactly the same, no result is recorded and the variable is omitted from the analysis.
For the sign test, "better" can be determined from a continuous variable, an ordinal variable, a dichotomous variable, a clinical score, or a component of a score. Because, under the null hypothesis, the expected proportion of plus signs is 0.5 and of minus signs is 0.5, the test compares the observed proportion of successes with the expected value of 0.5.
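Because under the null hypothesis plus and minus signs are each expected half the time, the sign test reduces to an exact binomial test of the observed number of plus signs against a proportion of 0.5, as in this sketch (the counts are hypothetical).

from math import comb

# Hypothetical result: the experimental group scored better on 9 of 12
# outcome variables (ties omitted), the control group on the other 3.
plus_signs, total = 9, 12

# Exact two-sided binomial probability of a result at least this extreme when p = 0.5.
k = max(plus_signs, total - plus_signs)
p_value = sum(comb(total, i) for i in range(k, total + 1)) * 2 / 2 ** total
print(f"sign test p = {p_value:.3f}")   # about 0.146 for 9 of 12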
Making Inferences From Dichotomous And Nominal (Nonparametric) Data
The chi-square test, the Fisher exact probability test, and the McNemar chi-square test can be used in the bivariate analysis of dichotomous nonparametric data. Usually, the data are first arranged in a 2×2 table, and the goal is to test the null hypothesis that the variables are independent.
The 2×2 Contingency Table
A contingency table is used to determine whether the distribution of one variable is conditionally dependent (contingent) upon the other variable. A 2×2 contingency table is one that has two cells in each direction.
In a contingency table, a cell is a specific location in the matrix created by the two variables whose relationship is being studied. In the example of a treatment trial, each cell would show the observed number, the expected number, and the percentage of study subjects in each treatment group who lived or died.
If there are more than two cells in each direction of a contingency table, the table is called an R × C table, where R stands for the number of rows and C stands for the number of columns. Although the principles of the chi-square test are valid for R × C tables, the discussion below focuses on 2×2 tables.
  • 380.
    The Chi-Square TestOf Independence After t-tests, the most basic and common form of standard analysis in the medical literature is the chi-square test of the independence of two variables in a contingency table (Emerson and Colditz 1983).
  • 381.
    The chi-square testis an example of a common approach to statistical analysis known as statistical modeling, which seeks to develop a statistical expression (the model) that predicts the behavior of a dependent variable on the basis of knowledge of one or more independent variables.
  • 382.
    The process ofcomparing the observed counts with the expected counts- that is, of comparing O with E- is called a goodness of fit test, because the goal is to see how well the observed counts in a contingency table “fit” the counts expected on the basis of the model. Usually, the model in such a table is the null hypothesis that the two variables are independent of each other.
  • 383.
    If the chi-squarevalue is small, the fit is good and the null hypothesis is not rejected. If, however, the chi-square value is large, the data do not fit the hypothesis well.
Calculation Of The Chi-Square Value
Once the observed (O) and expected (E) counts are known, the chi-square (χ²) value can be calculated. One of two methods can be used, depending on the size of the counts: the method for large numbers or the method for small numbers.
Method for large numbers
The investigators begin by calculating the chi-square value for each cell in the table, using the following formula: (O − E)² / E
Here, the numerator is the square of the deviation of the observed count in a given cell from the count that would be expected in that cell if the null hypothesis were true.
This is similar to the numerator of the variance, which is expressed as (xi − x̄)², where xi represents the observed value and x̄ (the mean) is the expected value. However, whereas the denominator for the variance is the degrees of freedom (N − 1), the denominator for chi-square is the expected number (E).
To obtain the total chi-square value for a 2×2 table, the investigators then add up the chi-square values for the four cells: χ² = Σ (O − E)² / E
Thus, the basic statistical method for measuring the total amount of variation in a data set, the total sum of squares (TSS), is rewritten for the chi-square test as the sum of (O − E)².
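To make the arithmetic concrete, here is a minimal Python sketch (the counts are hypothetical and not taken from the slides) that derives the expected counts from the row and column totals and sums (O − E)²/E over the four cells of a 2×2 table:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for a 2x2 table laid out as [[a, b], [c, d]].
    The expected count for each cell is (row total x column total) / grand total."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical trial: 30 of 50 treated subjects improved, versus 20 of 50 controls
print(round(chi_square_2x2(30, 20, 20, 30), 2))  # 4.0, which exceeds 3.84, the critical value for 1 df at alpha = 0.05
```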
Method for Small Numbers
Because the chi-square test is based on the normal approximation of the binomial distribution (which is discontinuous), many statisticians believe that a correction for continuity is needed in the equation for calculating chi-square, while others believe that this is unnecessary.
The correction, originally described by F. Yates and called the Yates correction for continuity, makes little difference if the numbers in the table are large, but in tables with small numbers it is probably worth doing.
The only change in the chi-square test formula given above is that, in the continuity-corrected chi-square test, the number 0.5 is subtracted from the absolute value of (O − E) in each cell before squaring. The formula is as follows:
Yates χ² = Σ (|O − E| − 0.5)² / E
Clearly, the use of this formula somewhat reduces the size of the chi-square value and reduces the chance of finding a statistically significant difference, so the correction for continuity makes the test more conservative.
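Continuing the same hypothetical sketch, the only change for the Yates correction is the per-cell term, which subtracts 0.5 from |O − E| before squaring:

```python
def yates_chi_square_2x2(a, b, c, d):
    """Continuity-corrected chi-square for a 2x2 table [[a, b], [c, d]]:
    0.5 is subtracted from |O - E| in each cell before squaring."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return chi2

# Same hypothetical counts as before: the corrected value (3.24) is smaller than the
# uncorrected value (4.0), which illustrates why the correction makes the test more conservative
print(round(yates_chi_square_2x2(30, 20, 20, 30), 2))
```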
Determination of Degrees of Freedom
The term degrees of freedom refers to the number of observations that can be considered to be free to vary. According to the null hypothesis, the best estimate of the expected distribution of counts in the cells of a contingency table is provided by the row and column totals.
Therefore, the row and column totals are considered to be fixed, just as the mean is fixed when calculating a variance. Once these totals are fixed, an observed count can be entered “freely” into only one of the cells of a 2×2 table; the counts in the remaining cells are then determined, so a 2×2 table has only 1 degree of freedom.
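(As a standard check on this reasoning, not shown on the original slides: for an R × C contingency table the degrees of freedom are (R − 1) × (C − 1), so a 2×2 table has (2 − 1) × (2 − 1) = 1 degree of freedom.)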
Statistical models that have one outcome variable but include more than one independent variable are generally called multivariable models. Multivariable models are intuitively attractive to investigators, but their availability should not tempt investigators to ignore the basic principles of good research design and analysis, because multivariable analysis also has many limitations.
The methodology and interpretation of findings in this type of analysis are difficult for most physicians, despite the fact that the methods and results of multivariable analysis are reported frequently in the medical literature and their use is increasing (Concato, Feinstein, and Holford 1993)6.
Their conceptual attractiveness and the availability of high-speed computers contribute to making these models popular. In order to be intelligent consumers of the medical literature, health care professionals should understand how to interpret the findings of multivariable analysis as they are presented in the literature.
The General Linear Model
The multivariable equation, with one dependent variable and one or more independent variables, is usually called the general linear model. The model is “general” because there are many variations regarding the types of variables for y and xi, as well as the number of x variables that can be used. The model is “linear” because it is a linear combination of the xi terms.
For the xi variables, a variety of transformations (e.g., the square of x, the cube of x, the square root of x, or the logarithm of x) could be used, and the combination of terms would still be linear, so the model would remain linear. What cannot happen, if the model is to remain linear, is for any of the coefficients (the bi terms) to be a square, a square root, a logarithm, or another transformation.
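For reference, the model is usually written in the following form (the notation here is an assumed, standard one rather than a reproduction from the slides): y = a + b1x1 + b2x2 + … + bkxk + error, where y is the dependent variable, the xi are the independent variables, and the bi are coefficients estimated from the data. Replacing an xi with, say, log(xi) keeps the model linear in the bi terms, whereas squaring or otherwise transforming a bi would not.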
Numerous procedures for multivariable analysis are based on the general linear model. These include methods with such imposing names as analysis of variance (ANOVA), analysis of covariance (ANCOVA), multiple linear regression analysis, multiple logistic regression, the log-linear model, and discriminant function analysis.
The choice of which procedure to use depends primarily on whether the dependent and independent variables are continuous, dichotomous, nominal, or ordinal. Knowing that the procedures are all variations of the same theme (the general linear model) helps to make them less confusing.
Analysis of variance (ANOVA)
If the dependent variable is continuous and all of the independent variables are categorical (i.e., nominal, dichotomous, or ordinal), the correct multivariable technique is analysis of variance (ANOVA).
One-way ANOVA and N-way ANOVA are discussed briefly below. Both techniques are based on the general linear model and can be used to analyze the results of an experimental study. If the design includes only one independent variable (e.g., treatment), the technique is called one-way ANOVA, regardless of how many different treatment groups are present. If it includes more than one independent variable (e.g., treatment, age group, and gender), the technique is called N-way ANOVA.
One-Way ANOVA (The F-Test)
Suppose a team of investigators wanted to study the effects of drugs A and B on blood pressure. They might randomly allocate hypertensive patients into four treatment groups: those taking drug A alone, those taking drug B alone, those taking drugs A and B in combination, and those taking a placebo.
The investigators would measure systolic blood pressure before and after treatment in each patient and calculate a difference score (posttreatment systolic pressure minus pretreatment systolic pressure) for each study subject. This difference score would become the outcome variable. They would then calculate a mean difference score for each of the four treatment groups (i.e., the three drug groups and the one placebo group) so that these mean scores could be compared in a test of statistical significance.
The investigators would want to determine whether the difference in blood pressure found in one or more of the drug groups (assuming it was a drop) was large enough to be clinically important. For example, a drop in mean systolic blood pressure from 150 mm Hg to 148 mm Hg would be too small to be clinically useful. If the results were not clinically useful, there would be little point in looking for an appropriate test of significance.
If, however, one or more of the groups showed a clinically important drop in blood pressure, the investigators would want to determine whether the difference was likely to have occurred by chance alone. To do this, an appropriate statistical test of significance is needed.
The Student’s t-test could be used to compare each pair of groups, but this would require six different t-tests: each of the three drug groups (A, B, and AB) versus the placebo group; the drug A group versus the drug B group; the drug A group versus the drug combination AB group; and the drug B group versus the drug combination AB group. This raises the problem of multiple hypotheses and multiple associations.
Even if the investigators decided that the primary comparison should be of each drug or drug combination with the placebo, this would still leave three hypotheses to test instead of just one. Moreover, if two or three groups did significantly better than the placebo group, it would be necessary to determine whether one effective drug was significantly better than the others.
There are numerous complex ways of handling the problem of multiple associations, but the best approach in cases such as this is to begin by performing an F-test, which is the first step of ANOVA. The F-test is a kind of “super t-test” that allows the investigators to compare more than two means simultaneously.
The null hypothesis for the F-test in the previous example is that the mean change in blood pressure (d) will be the same for all four groups (dA = dB = dAB = dP), indicating that all of the samples came from the same population and that any differences between the means are due to chance variation.
In creating the F-test (F is for Fisher), Sir Ronald Fisher reasoned that if two different methods could be found to estimate the variance, and if all of the samples came from the same population, these two different estimates of variance should be similar. He therefore developed two measures of the variance of the observations.
One is called the between-groups variance and is based on the variation between (or among) the group means. The other is called the within-groups variance and is based on the variation within each group, that is, the variation around a single group mean. In ANOVA, these two measures of variance are also called the between-groups mean square and the within-groups mean square.
The ratio of the two measures of variance can therefore be expressed as follows: F ratio = between-groups variance / within-groups variance = between-groups mean square / within-groups mean square
If the F ratio is fairly close to 1.0, the two estimates of variance are similar, and the null hypothesis that all of the means came from the same underlying population is not rejected. If the ratio is much larger than 1.0, there must have been some force, attributable to group differences, pushing the means apart, and the null hypothesis of no difference is rejected.
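To show how the two mean squares combine into the F ratio, here is a short Python sketch of a one-way ANOVA; the difference scores below are hypothetical and are not data from the slides:

```python
def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA: the between-groups mean square
    divided by the within-groups mean square."""
    k = len(groups)                           # number of groups
    n_total = sum(len(g) for g in groups)     # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: variation of observations around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)         # between-groups mean square
    ms_within = ss_within / (n_total - k)     # within-groups mean square
    return ms_between / ms_within

# Hypothetical change-in-systolic-pressure scores for drug A, drug B, A+B, and placebo
groups = [[-12, -10, -14, -11], [-8, -9, -7, -10], [-15, -13, -16, -14], [-1, 0, -2, 1]]
print(round(one_way_anova_f(groups), 1))  # far above 1.0, so the null hypothesis of equal means would be rejected
```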
N-Way ANOVA
The goal of ANOVA, stated in the simplest terms, is to explain (to “model”) the total variation found in a study.
If only one independent variable is tested in a model and that variable happens to be gender, the total amount of variation must be explained in terms of how much variation is due to gender and how much is not. Any variation (SS) that is not due to the model (gender) is considered to be error (residual) variation.
If two independent variables are tested in a model and those variables happen to be treatment and gender, the total amount of variation must be explained in terms of how much variation is due to each of the following: the independent effect of treatment, the independent effect of gender, the interaction between (i.e., the joint effect of) treatment and gender, and error.
If more than two variables are tested, the analysis becomes increasingly complicated, but the underlying logic remains the same. As long as the research design is balanced, that is, there are equal numbers of observations in all of the study groups, ANOVA can be used to analyze the individual and joint effects of the independent variables and to partition the total variation into its various component parts.
Analysis of covariance (ANCOVA)
Analysis of variance (ANOVA) and analysis of covariance (ANCOVA) are methods for evaluating studies in which the dependent variable is continuous. If the independent variables are all of the categorical type (nominal or dichotomous), then ANOVA is used. However, if some of the independent variables are categorical and some are continuous, then ANCOVA is appropriate.
ANCOVA would be used, for example, in a study whose goal was to test the effects of antihypertensive drugs on systolic blood pressure (a continuous variable, which is the dependent variable here) and in which the independent variables were age (a continuous variable) and treatment (a categorical variable with four levels, i.e., those treated with drug A, those treated with drug B, those treated with both A and B, and those treated with a placebo).
The ANCOVA procedure adjusts the dependent variable on the basis of the continuous independent variable or variables, and it then performs an N-way ANOVA on the adjusted dependent variable. In the above example, the ANCOVA procedure would remove the effect of age from the analysis of the effect of the drugs on systolic blood pressure.
Controlling for age means that (artificially) all of the study subjects are made the same age. Suppose that the mean systolic blood pressure in the study group is 150 mm Hg at an average age of 50 years.
The first step (and all of this is done by the computer packages that offer ANCOVA) is to do a simple regression between age and blood pressure, which might show that blood pressure increases, say, an average of 1 mm Hg for each year of age over 50 years and decreases an average of 1 mm Hg for each year of age under 50. Thus, if a subject’s age is 59, then 9 mm Hg would be subtracted from that subject’s current blood pressure to arrive at the adjusted blood pressure.
If another subject’s age is 35, then 15 mm Hg would be added to that subject’s current blood pressure to arrive at the adjusted value. If a subject’s age is 50, no adjustment is necessary, because that subject is already at the population mean age. ANCOVA can adjust the dependent variable for several continuous independent variables (called covariates) at the same time.
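As a rough sketch of the age-adjustment step described above (the slope of 1 mm Hg per year and the 50-year reference age are the hypothetical values used in the text; in a real ANCOVA the slope would be estimated by regressing blood pressure on age):

```python
def age_adjusted_bp(systolic_bp, age, slope=1.0, reference_age=50):
    """Adjust a systolic blood pressure to a common reference age, assuming the pressure
    changes by `slope` mm Hg for each year of age away from the reference age."""
    return systolic_bp - slope * (age - reference_age)

# A 59-year-old with 160 mm Hg is adjusted down by 9 mm Hg; a 35-year-old with 140 mm Hg is adjusted up by 15 mm Hg
print(age_adjusted_bp(160, 59))  # 151.0
print(age_adjusted_bp(140, 35))  # 155.0
```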
Multiple Linear Regression
If the dependent variable and all of the independent variables are continuous, the correct type of multivariable analysis is multiple linear regression. There are several computerized methods of analyzing the data in a multiple linear regression. Probably the most common method is called stepwise linear regression.
The investigator either chooses which variable to begin with (i.e., which to enter first in the analysis) or else instructs the computer to start by entering the one variable that has the strongest association with the dependent variable. In either case, when only the first variable has been entered, the result is a simple regression analysis.
Next, the second variable is entered according to the investigator’s instructions. The explanatory strength of the model, that is, the r², changes as each new variable is entered. The “stepping” continues until none of the remaining independent variables meets the predetermined criterion for being entered (e.g., p ≤ 0.1 or an increase in r² of ≥ 0.01) or until all of the variables have been entered. When the stepping stops, the analysis is complete.
In addition to watching for the statistical significance of the overall equation and of each variable entered, the investigator keeps a close watch on the overall r² at each step, which is the proportion of variation the model has explained so far.
In multiple regression equations that are statistically significant, the increase in the total r² after each step, compared with the total r² after the previous step, indicates how much additional variation is explained by the variable just entered.
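A minimal Python sketch of forward stepwise selection driven by the increase in r² follows; the variable names, the simulated data, and the 0.01 threshold are illustrative assumptions (real statistical packages also track the p-value of each candidate variable):

```python
import numpy as np

def r_squared(columns, y):
    """r-squared of an ordinary least-squares fit of y on the given columns (plus an intercept)."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

def forward_stepwise(y, candidates, min_gain=0.01):
    """Enter, one at a time, the candidate variable that increases r-squared the most;
    stop when no remaining variable adds at least min_gain."""
    selected, current_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        trial = {name: r_squared([candidates[s] for s in selected] + [col], y)
                 for name, col in remaining.items()}
        best = max(trial, key=trial.get)
        if trial[best] - current_r2 < min_gain:
            break
        selected.append(best)
        current_r2 = trial[best]
        del remaining[best]
        print(f"entered {best}: total r^2 = {current_r2:.3f}")
    return selected

# Hypothetical data: blood pressure modeled from age, weight, and a pure-noise variable
rng = np.random.default_rng(0)
age = rng.uniform(30, 70, 100)
weight = rng.uniform(50, 100, 100)
noise = rng.normal(size=100)
bp = 100 + 0.8 * age + 0.3 * weight + rng.normal(scale=5.0, size=100)

forward_stepwise(bp, {"age": age, "weight": weight, "noise": noise})
```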
References
1. C. Bernard, An Introduction to the Study of Experimental Medicine.
2. Daniel McCann, ‘Dental research: The clinical trial formula’, JADA 1990 Apr, 384-392.
3. J. M. Dunning, Principles of Dental Public Health, fourth edition, 1986.
4. National Medical Series, Preventive Medicine & Public Health, second edition, 1992.
5. G. M. Gluck, W. M. Morganstein, Jong’s Community Dental Health, fifth edition, 2003.
6. J. F. Jekel, D. L. Katz, Epidemiology, Biostatistics and Preventive Medicine, second edition, 2001.
7. Cynthia M. Pine, Community Oral Health, first edition, 1997.
8. Park’s Textbook of Preventive and Social Medicine, eighteenth edition, 2006.
9. C. R. Kothari, Research Methodology: Methods & Techniques, second edition, 2006.
10. Mahajan, Biostatistics, sixth edition, 2006.
11. B. Burt, Eklund, Dentistry, Dental Practice & The Community, sixth edition, 2005.