STATISTICAL ANALYSIS
Princy Francis M
Ist Yr MSc(N)
JMCON
DEFINITION
• Statistical analysis is the organisation and analysis of
quantitative or qualitative data using statistical procedures,
including both descriptive and inferential statistics.
• It’s the science of collecting, exploring and presenting
large amounts of data to discover underlying patterns and
trends.
DEFINITION
• Statistics is a branch of science that deals with the collection, organisation and analysis of data and the drawing of inferences from samples to the whole population.
• A sample is a small portion of the population that truly represents the population with respect to the characteristic under study.
PURPOSES
• Summarize
• Explore the meaning of deviations in data
• Compare or contrast descriptively
• Test the proposed relationships in a theoretical model
• Infer that the findings from the sample are indicative of the population
• Examine causality.
• Predict or infer from the sample to a theoretical model.
ELEMENTS OF STATISTICAL ANALYSIS
• Understand the complex relationships among the correlates of the disease under study.
• Start the analysis with simple comparisons of proportions and means.
• Let clinical and biological considerations guide the interpretation of results.
STATISTICAL MEASURES
• Mean
• Mode
• Median
• Interquartile Range
• Standard Deviation
MEAN
• The mean (arithmetic mean) is the sum of all observations divided by the number of observations.
Example
• Mean of 10, 20, 30, 40 = (10 + 20 + 30 + 40) / 4 = 100 / 4 = 25
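A minimal Python sketch of the same calculation; `arithmetic_mean` is an illustrative name, not something from the source.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of empty data is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([10, 20, 30, 40]))  # 25.0
```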
MEDIAN
• When all the observations are arranged in ascending or descending order of magnitude, the middle one is the median.
• For raw data with n observations, if n is odd, the median is the $\left[\frac{n+1}{2}\right]$th item.
• If n is even, the median is the mean of the $\left(\frac{n}{2}\right)$th item and the $\left(\frac{n}{2}+1\right)$th item.
Example: The median of the data 10, 20, 30 is the $\left[\frac{3+1}{2}\right]$ = 2nd item, i.e. 20.
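The odd/even rule can be sketched directly in Python; `median` here is an illustrative helper (the standard library's `statistics.median` should give the same results).

```python
def median(values):
    """Median by the (n+1)/2 rule for odd n, or the mean of the two middle items for even n."""
    ordered = sorted(values)  # arrange in ascending order of magnitude first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # the (n+1)/2 th item, converted to a 0-based index
    # the (n/2)th and (n/2 + 1)th items, converted to 0-based indices
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([10, 20, 30]))      # 20
print(median([10, 20, 30, 40]))  # 25.0
```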
MODE
• The mode is the value in a series that occurs more frequently than any other value.
• For grouped data,
$M_0 = L_0 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times c$
where $L_0$ is the lower limit of the modal class, $c$ is the class interval, $\Delta_1$ is the difference between the frequency of the modal class and that of the preceding class, and $\Delta_2$ is the difference between the frequency of the modal class and that of the following class.
Example: The mode of the data 80, 90, 86, 80, 72, 80, 96 is 80 (it occurs three times).
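A small Python sketch of both cases. `raw_mode` and `grouped_mode` are illustrative names, and the grouped-data figures in the usage line are hypothetical, chosen only to exercise the formula.

```python
from collections import Counter


def raw_mode(values):
    """Most frequent value(s) in ungrouped data; all tied values are returned."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


def grouped_mode(lower_limit, class_width, f_modal, f_before, f_after):
    """M0 = L0 + (delta1 / (delta1 + delta2)) * c for grouped data."""
    delta1 = f_modal - f_before  # modal frequency minus preceding class frequency
    delta2 = f_modal - f_after   # modal frequency minus following class frequency
    return lower_limit + delta1 / (delta1 + delta2) * class_width


print(raw_mode([80, 90, 86, 80, 72, 80, 96]))  # [80]
# Hypothetical grouped data: modal class 20-30 (frequency 12),
# preceding class frequency 8, following class frequency 6.
print(grouped_mode(20, 10, 12, 8, 6))  # 24.0
```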
INTERQUARTILE RANGE
• The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
• IQR = Q3 − Q1.
Example
Interquartile range of the data 30, 20, 40, 60, 50 (in ascending order: 20, 30, 40, 50, 60; n = 5)
• $Q_1 = \left[\frac{n+1}{4}\right]$th item = 1.5th item = 20 + 0.5 × (30 − 20) = 25
• $Q_3 = 3\left[\frac{n+1}{4}\right]$th item = 4.5th item = 50 + 0.5 × (60 − 50) = 55
• IQR = 55 − 25 = 30
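Python's `statistics.quantiles` with `method="exclusive"` (its default) places quartiles at the (n + 1)p positions and interpolates linearly, so it should reproduce the worked example. Other packages may use a different quartile convention and give slightly different values for small samples.

```python
import statistics

data = [30, 20, 40, 60, 50]
# quantiles() sorts the data itself; "exclusive" uses the (n+1)/4 and 3(n+1)/4 positions.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q3, q3 - q1)  # 25.0 55.0 30.0
```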
STANDARD DEVIATION
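• The standard deviation is the most useful and most widely used measure of dispersion.
• The standard deviation is defined as the positive square root of the arithmetic mean of the squared deviations of the observations from their arithmetic mean.
• The standard deviation is denoted by σ.
STANDARD DEVIATION Formula
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
For a sample, the divisor n − 1 is used instead of n: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
EXAMPLE
• Standard deviation of the data 10, 20, 30, 40, 50, where n = 5 and $\bar{x}$ = 30
• Σ(x − $\bar{x}$)² = 400 + 100 + 0 + 100 + 400 = 1000
• s = √(1000/4) = √250 = 15.811 (sample divisor n − 1)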
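The standard library keeps the two divisors apart: `statistics.stdev` uses n − 1 (sample standard deviation) and `statistics.pstdev` uses n (population standard deviation).

```python
import statistics

data = [10, 20, 30, 40, 50]
print(round(statistics.stdev(data), 3))   # sample SD, divisor n - 1: 15.811
print(round(statistics.pstdev(data), 3))  # population SD, divisor n: 14.142
```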
[Figure: Standard normal distribution curve showing the mean, median, interquartile range and standard deviation]
TYPES
• PARAMETRIC STATISTICAL ANALYSIS
• NONPARAMETRIC STATISTICAL ANALYSIS
PARAMETRIC STATISTICAL ANALYSIS
• The most commonly used type of statistical analysis.
• This analysis is referred to as parametric statistical analysis because the findings are inferred to the parameters of a normally distributed population.
• Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.
ASSUMPTIONS
• The assumption of normality, which specifies that the means of the sample groups are normally distributed
• The assumption of equal variance, which specifies that the populations from which the samples are drawn have equal variances
• The data can be treated as random samples
NONPARAMETRIC STATISTICAL ANALYSIS
• Nonparametric statistical analyses are also called distribution-free techniques.
• They can be used in studies that do not meet the first two assumptions (normality and equal variance).
• Most nonparametric techniques are not as powerful as their parametric counterparts.
• If the distribution of the sample is skewed to one side, or the distribution is unknown because the sample size is small, nonparametric statistical techniques are used.
• Nonparametric tests are used to analyse ordinal and categorical data.
EXPLORATORY DATA ANALYSIS AND
CONFIRMATORY DATA ANALYSIS
• Described by John Tukey
• Exploratory data analysis is used to obtain a preliminary indication of the nature of the data and to search the data for hidden structures or models.
• Confirmatory data analysis involves traditional inferential statistics, which can be used to make an inference about a population or a process based on evidence from the study sample.
STATISTICAL ANALYSIS DECISION MAKING
• Two-group comparison
  - Mean, parametric: Independent two-sample t test
  - Mean, nonparametric: Mann-Whitney U test
  - Percentage: Chi-square test
• One-group comparison
  - Single mean: One-sample t test
  - Mean difference, parametric: Paired t test
  - Mean difference, nonparametric: Wilcoxon signed rank test
• Comparison of more than two groups
  - Mean, parametric: ANOVA
  - Mean, nonparametric: Kruskal-Wallis test
  - Percentage: Chi-square test
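The decision chart can be expressed as a simple lookup table. This is only a sketch: the keys such as "two_groups" and "mean_difference" are hypothetical labels introduced here, and the function encodes the chart alone, without checking whether the data actually satisfy the assumptions of the chosen test.

```python
# (design, measure) -> (parametric test, nonparametric test), following the chart.
# The chi-square test is itself nonparametric, so it appears in both slots.
CHART = {
    ("one_group", "single_mean"): ("One-sample t test", None),
    ("one_group", "mean_difference"): ("Paired t test", "Wilcoxon signed rank test"),
    ("two_groups", "mean"): ("Independent two-sample t test", "Mann-Whitney U test"),
    ("two_groups", "percentage"): ("Chi-square test", "Chi-square test"),
    ("more_than_two_groups", "mean"): ("ANOVA", "Kruskal-Wallis test"),
    ("more_than_two_groups", "percentage"): ("Chi-square test", "Chi-square test"),
}


def choose_test(design, measure, parametric=True):
    """Return the test suggested by the chart for a design/measure pair."""
    parametric_test, nonparametric_test = CHART[(design, measure)]
    test = parametric_test if parametric else nonparametric_test
    if test is None:
        raise ValueError("the chart lists no nonparametric option for this design")
    return test


print(choose_test("two_groups", "mean", parametric=False))  # Mann-Whitney U test
print(choose_test("more_than_two_groups", "mean"))          # ANOVA
```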
PARAMETRIC STATISTICAL ANALYSIS
• Student's t-test
• Z test
• Analysis of variance (ANOVA)
Student's t-test
• Developed by W. S. Gosset, who published under the pseudonym "Student"
• Student's t-test is used to test the null hypothesis that there is no difference between two means
• One-sample t-test
• Independent Two Sample T Test (the unpaired t-test)
• The paired t-test
One-sample t-test
• To test whether a sample mean (as an estimate of a population mean) differs significantly from a given population mean.
• The mean of one sample is compared with the population mean:
$t = \frac{\bar{x} - \mu}{S / \sqrt{n}}$
where $\bar{x}$ = sample mean, $\mu$ = population mean, S = sample standard deviation and n = sample size (degrees of freedom = n − 1).
Example
A random sample of size 20 from a normal population gives a sample mean of 40 and a standard deviation of 6. Test the hypothesis that the population mean is 44, i.e. check whether there is any difference between the two means.
• H0: There is no significant difference between the sample mean and the population mean
• H1: There is a significant difference between the sample mean and the population mean
$\bar{x}$ = 40, μ = 44, n = 20 and S = 6
• |t| calculated = |40 − 44| / (6/√20) = 4 / 1.342 = 2.981
• t table value (df = 19, 5% level, two-tailed) = 2.093
• t calculated > t table value; Reject H0.
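A standard-library sketch of the same calculation from summary statistics. `one_sample_t` is an illustrative name, and the critical value is hard-coded from the t table quoted in the example rather than computed (computing it would need a library such as SciPy).

```python
import math


def one_sample_t(sample_mean, pop_mean, sd, n):
    """t = (x_bar - mu) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))


t = one_sample_t(40, 44, 6, 20)
t_table = 2.093  # two-tailed 5% critical value for df = 19
print(round(abs(t), 3))  # 2.981
print("Reject H0" if abs(t) > t_table else "Accept H0")  # Reject H0
```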
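Independent Two Sample T Test (the unpaired t-test)
• To test whether the population means estimated by two independent samples differ significantly.
• A typical setting is two different groups with the same mean at the start of a study, whose means are compared at the end.
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where $\bar{x}_1 - \bar{x}_2$ is the difference between the means of the two groups, $S_1$ and $S_2$ are the standard deviations of the two groups, and $n_1$ and $n_2$ are the group sizes (degrees of freedom = $n_1 + n_2 - 2$).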
Example
Hb levels of 5 males are 10, 11, 12.5, 10.5, 12 and of 5 females are 10, 17.5, 14.2, 15, 14.1. Test whether there is any significant difference between the Hb values.
• H0: There is no significant difference between the Hb levels
• H1: There is a significant difference between the Hb levels
X1 | X2 | X1 − x̄1 | X2 − x̄2 | (X1 − x̄1)² | (X2 − x̄2)²
10 | 10 | −1.2 | −4.16 | 1.44 | 17.3056
11 | 17.5 | −0.2 | 3.34 | 0.04 | 11.1556
12.5 | 14.2 | 1.3 | 0.04 | 1.69 | 0.0016
10.5 | 15 | −0.7 | 0.84 | 0.49 | 0.7056
12 | 14.1 | 0.8 | −0.06 | 0.64 | 0.0036
Σ = 56 | 70.8 | | | 4.3 | 29.172
• $\bar{x}_1$ = 56/5 = 11.2, $\bar{x}_2$ = 70.8/5 = 14.16, $S_1^2$ = 4.3/4 = 1.075, $S_2^2$ = 29.172/4 = 7.293
• t = (11.2 − 14.16) / √{[(4 × 1.075 + 4 × 7.293) / 8] × (1/5 + 1/5)} = −2.96 / √(4.184 × 0.4) ≈ −2.288
• |t| calculated = 2.288, t table value (df = 8, 5% level, two-tailed) = 2.306
• t calculated < t table value; Accept H0.
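A sketch of the pooled-variance formula applied to the raw Hb values. `pooled_t` is an illustrative name; `statistics.variance` uses the n − 1 divisor, matching S1² and S2² in the example, and the critical value is the t-table value for df = 8.

```python
import math
import statistics


def pooled_t(x1, x2):
    """Independent two-sample t with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = statistics.variance(x1), statistics.variance(x2)  # divisor n - 1
    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(x1) - statistics.mean(x2)) / standard_error


male = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]
t = pooled_t(male, female)
t_table = 2.306  # two-tailed 5% critical value for df = 8
print(round(t, 3))  # -2.288
print("Reject H0" if abs(t) > t_table else "Accept H0")  # Accept H0
```

Because |t| (about 2.288) falls just below 2.306, the difference is not significant at the 5% level.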
The paired t-test
• To test whether the population means estimated by two dependent samples differ significantly.
• A usual setting for the paired t-test is when measurements are made on the same subjects before and after a treatment.
$t = \frac{\bar{d}}{S_d / \sqrt{n}}$
where $\bar{d}$ is the mean difference, $S_d$ denotes the standard deviation of the differences and n is the number of pairs (degrees of freedom = n − 1).
Example
Systolic BP of 5 patients before and after a drug therapy is:
Before: 160, 150, 170, 130, 140
After: 140, 110, 120, 140, 130
Test whether there is any significant difference between the BP levels.
• H0: There is no significant difference between the BP levels before and after the drug
• H1: There is a significant difference between the BP levels before and after the drug
Before | After | d | d − d̄ | (d − d̄)²
160 | 140 | 20 | −2 | 4
150 | 110 | 40 | 18 | 324
170 | 120 | 50 | 28 | 784
130 | 140 | −10 | −32 | 1024
140 | 130 | 10 | −12 | 144
Σ | | 110 | | 2280
• $\bar{d}$ = 110/5 = 22, $S_d$ = √(2280/4) = 23.875
• t calculated = 22 / (23.875/√5) = 22 / 10.677 = 2.060, t table value (df = 4, 5% level, two-tailed) = 2.776
• t calculated < t table value; Accept H0.
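A standard-library sketch of the paired t-test on the BP data; the differences are taken as before minus after, and the two-tailed 5% critical value for df = 4 (2.776) is hard-coded from a t table.

```python
import math
import statistics

before = [160, 150, 170, 130, 140]
after = [140, 110, 120, 140, 130]
d = [b - a for b, a in zip(before, after)]  # differences: 20, 40, 50, -10, 10
d_bar = statistics.mean(d)                   # mean difference: 22
s_d = statistics.stdev(d)                    # sqrt(2280 / 4), about 23.875
t = d_bar / (s_d / math.sqrt(len(d)))
t_table = 2.776  # two-tailed 5% critical value for df = 4
print(round(t, 3))  # 2.06
print("Reject H0" if abs(t) > t_table else "Accept H0")  # Accept H0
```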
Z test
Generally, z-tests are used with large sample sizes (n > 30), whereas t-tests are most helpful with smaller sample sizes (n < 30). Both methods assume a normal distribution of the data, but z-tests are most useful when the population standard deviation (σ) is known.
z = (x̄ − μ) / (σ / √n)
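A minimal sketch of the one-sample z statistic. The slides give no worked z example, so the figures below (x̄ = 52, μ = 50, σ = 10, n = 100) are hypothetical and serve only to show the arithmetic; `one_sample_z` is an illustrative name.

```python
import math


def one_sample_z(sample_mean, pop_mean, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)) for a known population SD sigma."""
    return (sample_mean - pop_mean) / (sigma / math.sqrt(n))


# Hypothetical figures for illustration only.
z = one_sample_z(52, 50, 10, 100)
print(z)  # 2.0
# Two-tailed test at the 5% level: reject H0 when |z| > 1.96.
print("Reject H0" if abs(z) > 1.96 else "Accept H0")  # Reject H0
```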
ANALYSIS OF VARIANCE (ANOVA)
• R. A. Fisher.
• Student's t-test is not suitable for comparing three or more groups at once.
• The purpose of ANOVA is to test whether there is any significant difference between the means of two or more groups.
• Analysis of variance is a systematic algebraic procedure that decomposes the overall variation in the responses observed in an experiment into its component sources of variation.
• Two variances are compared: (a) between-group variability, the variation existing between the samples, and (b) within-group variability, the variation existing within the samples.
• The within-group variability (error variance) is the variation that cannot be accounted for by the study design.
• The between-group variability (effect variance) reflects the effect of the treatment.
• A simplified formula for the F statistic is
$F = \frac{MST}{MSE}$
where MST is the mean square between the groups and MSE is the mean square within the groups.
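The slides give no worked ANOVA example, so this sketch borrows the three samples from the Kruskal-Wallis example purely to exercise the formula. It computes F = MST / MSE from the between-group and within-group sums of squares with k − 1 and N − k degrees of freedom; `one_way_anova_f` is an illustrative name, and the resulting F would still have to be compared with an F-table value.

```python
def one_way_anova_f(*groups):
    """F = MST / MSE for a one-way ANOVA on k independent groups."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between groups: group size times squared distance of the group mean from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviation of every value from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mst = ss_between / (k - 1)       # mean square between groups
    mse = ss_within / (n_total - k)  # mean square within groups
    return mst / mse


sample1 = [8, 10, 9, 12, 11, 13]
sample2 = [10, 9, 13, 14, 9, 16]
sample3 = [13, 8, 9, 13, 17, 15]
print(round(one_way_anova_f(sample1, sample2, sample3), 3))  # 0.779
```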
NONPARAMETRIC STATISTICAL ANALYSIS
• CHI-SQUARE TEST
• WILCOXON'S SIGNED RANK TEST
• MANN-WHITNEY U TEST
• KRUSKAL-WALLIS TEST
CHI-SQUARE TEST
• A test used to analyse categorical data
• The chi-square test is widely used in statistical decision making.
• The test was first used by Karl Pearson in 1900.
• The chi-square test compares frequencies and tests whether the observed data differ significantly from the expected data.
CHI-SQUARE TEST
It is calculated as the sum of the squared differences between the observed (O) and expected (E) frequencies (the deviations, d), each divided by the expected frequency:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Example
• Attack rates among those vaccinated and not vaccinated against measles are given in the following table. Test the association between vaccination and attack of measles.
Groups | Attacked | Not attacked
Vaccinated | 10 | 90
Not vaccinated | 26 | 74
• H0: There is no significant association between vaccination and attack of measles
• H1: There is a significant association between vaccination and attack of measles
• Chi-square table value (df = 1, 5% level) = 3.841, chi-square calculated value = 8.672
• χ² calculated > χ² table value; Reject H0.
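Oi | Ei | Oi − Ei | (Oi − Ei)² | (Oi − Ei)² / Ei
10 | 18 | −8 | 64 | 3.556
90 | 82 | 8 | 64 | 0.780
26 | 18 | 8 | 64 | 3.556
74 | 82 | −8 | 64 | 0.780
 | | | | Σ = 8.672
(Ei = row total × column total / grand total; e.g. for vaccinated and attacked, 100 × 36 / 200 = 18)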
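A standard-library sketch of Pearson's chi-square for a table of counts, without Yates' continuity correction, which matches the worked example. `chi_square` is an illustrative name; the table value 3.841 (df = 1, 5% level) is hard-coded.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: vaccinated, not vaccinated; columns: attacked, not attacked
chi2, df = chi_square([[10, 90], [26, 74]])
print(round(chi2, 3), df)  # 8.672 1
print("Reject H0" if chi2 > 3.841 else "Accept H0")  # Reject H0
```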
WILCOXON'S SIGNED RANK TEST
• Wilcoxon's signed rank test ranks the absolute differences between paired observations, attaches the sign of each difference to its rank, and compares the sums of the positive and negative ranks.
• It tests whether the differences observed in the values of a quantitative variable between two correlated samples (before-and-after design) are statistically significant or not.
• This test corresponds to the paired t test.
Method
• H0: There is no difference in the paired values, on average, between the two groups.
• H1: There is a difference in the paired values, on average, between the two groups.
• Compute the difference for each pair of values.
• Rank the differences from smallest to largest, ignoring the sign of the difference; tied values receive the average of their ranks.
• After ranking, attach the sign of each difference to its rank.
• T+ is the sum of the ranks with a positive sign and T− is the sum of the ranks with a negative sign. Wstat = T, the smaller of T+ and T−.
• Find the critical value of W from the Wilcoxon signed rank table.
• If Wstat ≤ Wcritical value; Reject H0.
EXAMPLE
• IQ values of 8 malnourished children aged 4 years, before and after being given a nutritious diet for 3 months, are given below
Before | 40 | 60 | 55 | 65 | 43 | 70 | 80 | 60
After | 50 | 80 | 50 | 70 | 40 | 60 | 90 | 85
• H0: There is no difference in the paired values
• H1: There is a difference in the paired values
Before | 40 | 60 | 55 | 65 | 43 | 70 | 80 | 60
After | 50 | 80 | 50 | 70 | 40 | 60 | 90 | 85
Difference | −10 | −20 | 5 | −5 | 3 | 10 | −10 | −15
Absolute difference | 10 | 20 | 5 | 5 | 3 | 10 | 10 | 15
Rank | 5 | 8 | 2.5 | 2.5 | 1 | 5 | 5 | 7
Signed rank | −5 | −8 | +2.5 | −2.5 | +1 | +5 | −5 | −7
• T+ = 2.5 + 1 + 5 = 8.5, T− = 5 + 8 + 2.5 + 5 + 7 = 27.5, T = 8.5
• Wstat = 8.5, Wcritical (n = 8) = 3
• Wstat > Wcritical value; Accept H0.
• Using a normal approximation (treating T as approximately normally distributed), the test statistic is
$Z = \frac{|T - m| - 0.5}{SD}$
where T = the smaller of T+ and T−, m = mean sum of ranks = $\frac{n(n+1)}{4}$ and $SD = \sqrt{\frac{n(n+1)(2n+1)}{24}}$
• If Z < 1.96, H0 is accepted, and if Z > 1.96, H0 is rejected (5% level of significance).
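A standard-library sketch of the signed rank procedure and its normal approximation. Differences are taken as before minus after; zero differences are dropped, which is a common convention not stated in the slides. Tied absolute differences get average ranks through the count "values below, plus the average position within the tie". `wilcoxon_signed_rank` is an illustrative name, and the critical value 3 is the table value quoted in the example.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Return T+, T- and the normal-approximation Z for paired data."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # zero differences dropped
    abs_diffs = [abs(d) for d in diffs]

    def rank(v):
        # Absolute differences smaller than v, plus the average position within any tie
        return sum(x < v for x in abs_diffs) + (sum(x == v for x in abs_diffs) + 1) / 2

    t_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    t_minus = sum(rank(abs(d)) for d in diffs if d < 0)
    n = len(diffs)
    t = min(t_plus, t_minus)
    m = n * (n + 1) / 4                             # mean sum of ranks
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (abs(t - m) - 0.5) / sd
    return t_plus, t_minus, z


before = [40, 60, 55, 65, 43, 70, 80, 60]
after = [50, 80, 50, 70, 40, 60, 90, 85]
t_plus, t_minus, z = wilcoxon_signed_rank(before, after)
print(t_plus, t_minus)  # 8.5 27.5
w_critical = 3  # table value quoted in the example (n = 8)
print("Reject H0" if min(t_plus, t_minus) <= w_critical else "Accept H0")  # Accept H0
print(round(z, 2))  # 1.26
```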
MANN-WHITNEY U TEST
• For testing whether two independent samples, with respect to a quantitative variable, come from the same population or not.
• Also known as Wilcoxon's rank sum test.
• It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
• This test is a nonparametric alternative to the t test for two independent samples.
METHOD
H0: The average values in the two groups are the same
H1: The average values in the two groups are different
• Let n1 be the sample size of one group and n2 the sample size of the second group. Rank all the values in the two groups taken together; tied values are given the same (average) rank.
• The rank sum R of each group is taken, and U is calculated for each group as
$U = R - \frac{n(n+1)}{2}$
where n is the size of that group.
• Both U1 and U2 are calculated, and the smaller value is taken as Ustat. The critical value of U is obtained from the Mann-Whitney U table.
• If Ustat ≤ Ucritical value; Reject H0.
Example
Treatment A | Treatment B
3 | 9
4 | 7
2 | 5
6 | 10
2 | 6
5 | 8
• H0: The average values in the 2 treatments are the same
• H1: The average values in the 2 treatments are different
Ustat = Rank Sum − n(n + 1)/2
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Value | 2 | 2 | 3 | 4 | 5 | 5 | 6 | 6 | 7 | 8 | 9 | 10
Rank | 1.5 | 1.5 | 3 | 4 | 5.5 | 5.5 | 7.5 | 7.5 | 9 | 10 | 11 | 12
• Rank sum of A = 1.5 + 1.5 + 3 + 4 + 5.5 + 7.5 = 23; rank sum of B = 5.5 + 7.5 + 9 + 10 + 11 + 12 = 55
• UA = 23 − 21 = 2, UB = 55 − 21 = 34, so Ustat = 2 (the smaller value)
• Ucritical (n1 = n2 = 6) = 5
• Ustat < Ucritical value; Reject H0.
• Assuming that the ranks are randomly distributed in the two groups, the test statistic is
$Z = \frac{|m - T| - 0.5}{SD}$
where T = the smaller of T1 and T2, T1 = sum of the ranks of the smaller group, $T_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - T_1$, m = mean sum of ranks = $\frac{n_1(n_1 + n_2 + 1)}{2}$ and $SD = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
• If Z < 1.96, H0 is accepted
• If Z > 1.96, H0 is rejected at the 5% level of significance
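A standard-library sketch of the rank sums and U statistics for the treatment example. Tied values receive average ranks; `mann_whitney_u` is an illustrative name, and the critical value 5 is the table value quoted in the example.

```python
def mann_whitney_u(group_a, group_b):
    """Rank sums and U statistics for two independent samples (ties get average ranks)."""
    pooled = group_a + group_b

    def rank(v):
        # Values below v, plus the average position within any block of tied values
        return sum(x < v for x in pooled) + (sum(x == v for x in pooled) + 1) / 2

    r_a = sum(rank(v) for v in group_a)
    r_b = sum(rank(v) for v in group_b)
    n_a, n_b = len(group_a), len(group_b)
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    return r_a, r_b, u_a, u_b


treatment_a = [3, 4, 2, 6, 2, 5]
treatment_b = [9, 7, 5, 10, 6, 8]
r_a, r_b, u_a, u_b = mann_whitney_u(treatment_a, treatment_b)
print(r_a, r_b, u_a, u_b)  # 23.0 55.0 2.0 34.0
u_stat = min(u_a, u_b)
u_critical = 5  # table value quoted in the example (n1 = n2 = 6)
print("Reject H0" if u_stat <= u_critical else "Accept H0")  # Reject H0
```

As a check, UA + UB should always equal n1 × n2 (here 2 + 34 = 36).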
KRUSKAL-WALLIS TEST
• The Kruskal-Wallis test is a nonparametric counterpart of the analysis of variance.
• It is used to compare several independent samples.
• It tests whether several independent samples of a quantitative variable come from the same population or not.
• It corresponds to one-way analysis of variance among parametric methods.
• It analyses whether there is any difference in the median values of three or more independent samples.
• The data values are ranked in increasing order, the rank sums are calculated, and then the test statistic H is calculated:
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$
where n is the total of the sample sizes in all the groups, $n_i$ is the size of the ith group, $R_i$ is the sum of the ranks in the ith group and k is the number of groups.
Method
H0: The average values in the different groups are the same
H1: The average values in the different groups are different
• Rank all the values, taking all the groups together.
• The chi-square table with k − 1 degrees of freedom is used to get the table value at the 5% level of significance.
• If Hstat > table value; Reject H0.
Example
Sample 1 | Sample 2 | Sample 3
8 | 10 | 13
10 | 9 | 8
9 | 13 | 9
12 | 14 | 13
11 | 9 | 17
13 | 16 | 15
• H0: The average values in the three groups are the same
• H1: The average values in the three groups are different
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
Value | 8 | 8 | 9 | 9 | 9 | 9 | 10 | 10 | 11 | 12 | 13 | 13 | 13 | 13 | 14 | 15 | 16 | 17
Tied rank | 1.5 | 1.5 | 4.5 | 4.5 | 4.5 | 4.5 | 7.5 | 7.5 | 9 | 10 | 12.5 | 12.5 | 12.5 | 12.5 | 15 | 16 | 17 | 18
(The four 9s occupy positions 3 to 6, so each receives (3 + 4 + 5 + 6)/4 = 4.5.)
Sample 1 | Rank 1 | Sample 2 | Rank 2 | Sample 3 | Rank 3
8 | 1.5 | 10 | 7.5 | 13 | 12.5
10 | 7.5 | 9 | 4.5 | 8 | 1.5
9 | 4.5 | 13 | 12.5 | 9 | 4.5
12 | 10 | 14 | 15 | 13 | 12.5
11 | 9 | 9 | 4.5 | 17 | 18
13 | 12.5 | 16 | 17 | 15 | 16
 | Σ = 45 | | Σ = 61 | | Σ = 65
• H = [12/(18 × 19)] × [(45²/6) + (61²/6) + (65²/6)] − 3 × 19 = (12/342) × 1661.83 − 57 = 58.31 − 57 = 1.31
• Hcalculated = 1.31, χ² table value (df = 2, 5% level) = 5.99
• Hstat < χ² table value; Accept H0
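A standard-library sketch of the H statistic without a correction for ties. Tied values receive average ranks (for example, the four 9s share positions 3 to 6 and each receives rank 4.5). `kruskal_wallis_h` is an illustrative name, and the χ² table value 5.99 (df = 2, 5% level) is hard-coded.

```python
def kruskal_wallis_h(*groups):
    """Rank sums and the H statistic (no tie correction) for k independent samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)

    def rank(v):
        # Values below v, plus the average position within any block of tied values
        return sum(x < v for x in pooled) + (sum(x == v for x in pooled) + 1) / 2

    rank_sums = [sum(rank(v) for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return rank_sums, h


rank_sums, h = kruskal_wallis_h(
    [8, 10, 9, 12, 11, 13],
    [10, 9, 13, 14, 9, 16],
    [13, 8, 9, 13, 17, 15],
)
print(rank_sums)    # [45.0, 61.0, 65.0]
print(round(h, 2))  # 1.31
chi2_table = 5.99   # 5% critical value with k - 1 = 2 degrees of freedom
print("Reject H0" if h > chi2_table else "Accept H0")  # Accept H0
```

The rank sums should always add up to n(n + 1)/2, here 18 × 19 / 2 = 171, which is a quick check on the ranking.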
WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS
FOR OUTCOMES BY USING STATISTICAL ANALYSIS
• Testing null hypothesis
• Determining the probability of type I and type II error
• Calculating and reporting tests of effect size
• Ensuring data meet the fundamental assumptions of the statistical
test
TESTING NULL HYPOTHESIS
• When attempting to determine whether an outcome is related to a cause, it is necessary to know whether the outcomes or results could have occurred by chance alone.
• This cannot be done with certainty, but researchers can determine the probability of obtaining the observed results if the null hypothesis were true.
• Accepting a null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship).
• Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.
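This probability is commonly reported as a p-value: the probability, if H0 were true, of a result at least as extreme as the one observed. A minimal sketch, assuming the test statistic follows a standard normal distribution under H0 (as with the z test or the large-sample approximations to the Wilcoxon and Mann-Whitney tests); `two_sided_p_from_z` is an illustrative name. It relies on the identity 2(1 − Φ(|z|)) = erfc(|z|/√2).

```python
import math


def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal Z, computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


print(round(two_sided_p_from_z(1.96), 3))  # 0.05
```

A p-value below 0.05 corresponds to |z| > 1.96, the two-tailed 5% cut-off used throughout the examples.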
DETERMINING THE PROBABILITY OF
TYPE I AND TYPE II ERROR
• Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated.
• This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome.
• The relationship between Type I and Type II error is inverse: for a given sample size, as one is controlled, the risk of the other increases.
• Both types of error should be minimized.
CALCULATING AND REPORTING TESTS
OF EFFECT SIZE
• Effect size refers to how much impact the intervention
or variable is expected to have on the outcome.
• Large effect sizes enhance confidence in the findings. When a treatment exerts a dramatic effect, the validity of the findings is less likely to be called into question.
• On the other hand, when effect sizes are very small,
then the potential for effects from extraneous
ENSURING DATA MEET THE FUNDAMENTAL
ASSUMPTIONS OF THE STATISTICAL TEST
• Data analysis is based on many assumptions about the nature of the data, the statistical procedures that are used to conduct the analysis, and the match between the data and the procedure.
• If an assumption is violated, the result can be an inaccurate estimate of the real relationship.
• Inaccurate conclusions lead to error, which in turn affects the validity of a study.
RESOURCES FOR STATISTICAL
ANALYSIS PROGRAM
• Packaged computer programs can perform the data analysis and provide the results of the analysis as a computer printout.
• SPSS, SAS and Biomedical Data Processing (BMDP)
• If the analyses selected are inappropriate for the data, the computer program is often unable to detect the error and proceeds to perform the analysis anyway.
STATISTICAL ANALYSIS SYSTEM (SAS)
• Comprehensive software originally developed at North Carolina State University.
• The software is divided into many modules, and its licensing is flexible, based on the functions needed.
• The system contains a very large variety of statistical methods and is the software of choice of many major businesses, including much of the pharmaceutical industry.
• SAS has also developed PC SAS, which is compatible with the personal computer and has a user-friendly Windows interface.
PITFALLS OF STATISTICAL ANALYSIS
• Statistics can be used, intentionally or unintentionally, to reach faulty conclusions. Misleading information is unfortunately the norm in advertising; drug companies, for example, are well known for presenting misleading information.
• Data dredging
• Survey questions
It is therefore important to understand not just the numbers but the meaning behind the numbers. Statistics is a tool, not a substitute for in-depth reasoning and analysis.
APPLICATION OF STATISTICAL ANALYSIS
IN NURSING FIELD
• To analyze a trend in the vital statistics of a particular patient.
• Research in nursing processes and procedures
• A statistical analysis of patient outcomes
• Trends in nursing
JOURNAL ABSTRACT
Use of Statistical Analysis in The New England Journal of Medicine
• A sorting of the statistical methods used by authors of the 760 research
and review articles in Volumes 298 to 301 of The New England Journal of
Medicine indicates that a reader who is conversant with descriptive
statistics (percentages, means, and standard deviations) has statistical
access to 58 per cent of the articles. Understanding t-tests increases this
access to 67 per cent.
• The addition of contingency tables gives statistical access to 73 per cent of
the articles.
• Familiarity with each additional statistical method gradually increases the
percentage of accessible articles.
• Original Articles use statistical techniques more extensively than other
articles in the Journal.
Statistical analysis and design in marketing
journal articles
• The use of statistical analysis in 922 articles from the 1980 through 1985
issues of the Journal of The Academy of Marketing Science (JAMS), the
Journal of Marketing (JM), the Journal of Marketing Research (JMR), and
the Journal of Consumer Research (JCR) was analyzed.
• A reader with no statistical background can understand 31, 56, 9, and 21
percent of the articles respectively in these four journals.
• Knowledge of regression and analysis of variance is important in
comprehending many of the articles.
• 38 percent of the JAMS articles and 25, 57 and 56 percent, respectively, of
the other three journals make use of these statistical techniques.
ASSIGNMENT
• The mean and standard deviation of weight (kg) of 100 school-going children (A) and 100 children not going to school (B), aged 5 years, in slum areas are given below. Which test is used to find the statistical significance?
Population | Sample size | Mean (kg) | SD (kg)
A | 100 | 17.4 | 3
B | 100 | 13.2 | 2.5
Statistical  analysis

Statistical analysis

  • 2.
  • 3.
    DEFINITION • Statistical analysisis the organisation and analysis of quantitative or qualitative data using statistical procedures, including both descriptive and inferential statistics. • It’s the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends.
  • 4.
    DEFINITION • Statistics isa branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population. • Sample is a small portion of population which truly represents the population with respect to the study characteristic of the population.
  • 5.
    PURPOSES • Summarize • Explorethe meaning of deviations in data • Compare or contrast descriptively • Test the proposed relationships in a theoretical model • Infer that the findings from sample are indicative • Examine causality. • Predict or infer from the sample to a theoretical model.
  • 6.
    ELEMENTS OF STATISTICALANALYSIS Understand the complex relationship among the correlates of the disease under study. The analysis should start with simple comparison of proportions and means. Interpretation of result should be guided by clinical and biological consideration.
  • 7.
    STATISTICAL MEASURES  Mean Mode  Median  Interquartile Range  Standard Deviation.
  • 8.
    MEAN • The meanis the average of all numbers
  • 9.
    Example • Mean of10, 20, 30, 40 25
  • 10.
    MEDIAN • When allthe observations are arranged in ascending or descending orders of magnitude, the middle one is the median. • For raw data, If n is the total number of observations, the value of the [ 𝑛+1 2 ] th item will be called median . • if n is the even number, the mean of n/2th item and [ 𝑛 2 + 1] th item will be median. Example : Median of given data 10, 20, 30 is 20
  • 11.
    MODE • The Modeis the value of a series which appears most frequently than any other . • For grouped data, Mode, M0 = L0 +{ 𝛥1 𝛥1+𝛥2 } x c Where, L0 is lower limit of modal class, C is class interval 𝛥1 is difference between modal frequency and its preceding class ∆2 is difference between modal frequency and following class frequency. Example: mode of given data 80, 90, 86, 80, 72, 80, 96 is 80
  • 12.
    INTERQUARTILE RANGE • Theinterquartile range (IQR), is a measure of statistical dispersion, being equal to the difference between 75th and 25th percentiles, or between upper and lower quartiles. • IQR = Q3 − Q1.
  • 13.
    Example Interquartile range offollowing data 30, 20, 40, 60 , 50 • Q1 =[ 𝑛+1 4 ]th item = 1.5th item = 20+ 0.5 (30-20) = 25 • Q3 = 3[ 𝑛+1 4 ]th item = 50 +0.5x (60-50) = 55. • IQR = 30
  • 14.
    STANDARD DEVIATION • Thestandard deviation is the most useful and most popular measure of dispersion. • The standard deviation is defined as the positive square root of the arithmetic mean of the square of the deviations of given observations from their arithmetic mean. • The standard deviation is denoted by ‘𝜎 ’.
  • 15.
  • 16.
    EXAMPLE • Standard deviationof data 10, 20, 30, 40, 50 where n= 5 , 𝑥 = 30 • 𝜎 = √1000/4 = √250 = 15. 811
  • 17.
    STANDARD NORMAL DISTRIBUTIONCURVE AND MEAN, MEDIAN, INTERQUARTILE RANGE AND STANDARD DEVIATION
  • 18.
    TYPES • PARAMETRIC STATISTICALANALYSIS • NONPARAMETRIC STATISTICAL ANALYSIS
  • 19.
    PARAMETRIC STATISTICAL ANALYSIS •Most commonly used type of statistical analysis. • This analysis is referred to as parametric statistical analysis because the findings are inferred to the parameters of a normally distributed populations. • Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.
  • 20.
    ASSUMPTIONS • The assumptionof normality which specifies that the means of the sample group are normally distributed • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal. • The data can be treated as random samples
  • 21.
    NONPARAMETRIC STATISTICAL ANALYSIS •Nonparametric statistical analysis or distribution free techniques • It can be used in studies that do not meet the first two assumptions. • Most nonparametric techniques are not as powerful as their parametric counter parts.
  • 22.
    • If thedistribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric statistical techniques are used. • Non-parametric tests are used to analyse ordinal and categorical data.
  • 23.
    EXPLORATORY DATA ANALYSISAND CONFIRMATORY DATA ANALYSIS • John Tukey • Exploratory data analysis to obtain a preliminary indication of the nature of the data and to search data for hidden structure or models. • Confirmatory data analysis involves traditional inferential statistics , which you can use to make an inference about a population or a process based on evidence from the study sample.
  • 24.
    STATISTICAL ANALYSIS DECISIONMAKING Two group comparison Mean Parametric Independent 2 sample t test Nonparametric Mann Witney U test Percentage Chi-Square Test One group comparison Mean Single mean One sample t test Mean difference Parametric Paired t test Non parametric Wilcoxan Signed Scale test More than 2 group comparison Mean Parametric ANOVA Non parametric Kruskal Walli’s test Percentage Chi square test
  • 25.
  • 26.
    Student's t-test • Developedby Prof.W.S.Gossett • Student's t-test is used to test the null hypothesis that there is no difference between the means of the two groups • One-sample t-test • Independent Two Sample T Test (the unpaired t-test) • The paired t-test
  • 27.
    One-sample t-test • Totest if a sample mean (as an estimate of a population mean) differs significantly from a given population mean. • The mean of one sample is compared with population mean where 𝑥 = sample mean, u = population mean and S = standard deviation, n = sample size
  • 28.
    Example A random sampleof size 20 from a normal population gives a sample mean of 40, standard deviation of 6. Test the hypothesis is population mean is 44. Check whether there is any difference between mean. • H0: There is no significant difference between sample mean and population mean • H1: There is no significant difference between sample mean and population mean mean = 40 , 𝜇 = 44, n = 20 and S = 6
  • 29.
    • tcalculated =2.981 • t table value = 2.093 • tcalculated > t table value ; Reject H0.
  • 30.
    Independent Two SampleT Test (the unpaired t-test) • To test if the population means estimated by two independent samples differ significantly. • Two different samples with same mean at initial point and compare mean at the end
  • 31.
    t = 𝑥1− 𝑥2 𝑛1−1𝑆1 2+ 𝑛2−1 𝑆2 2 𝑛1+𝑛2−2 1 𝑛1 + 1 𝑛2 Where 𝑥1 - 𝑥2 is the difference between the means of the two groups and S denotes the standard deviation.
  • 32.
    Example Mean Hb levelof 5 male are 10, 11, 12.5, 10.5, 12 and 5 female are 10, 17.5, 14.2,15 and 14.1 . Test whether there is any significant difference between Hb values. • H0: There is no significant difference between Hb Level • H1: There is no significant difference between Hb level. t = 𝑥1− 𝑥2 𝑛1−1 𝑆1 2+ 𝑛2−1 𝑆2 2 𝑛1+𝑛2−2 1 𝑛1 + 1 𝑛2
  • 33.
    • 𝑥1 =11.2 , 𝑥2 =14.16 , 𝑆1 2 = 1.075, 𝑆2 2 = 7.293 • tcalculated = 2.287, t table = 2.306, tcalculated > t table value ; reject H0. X1 X2 X1 - 𝑥1 X2 - 𝑥2 (X1 - 𝑥1)2 (X2 - 𝑥2)2 10 11 12.5 10.5 12 10 17.5 14.2 15 14.1 -1.2 - 0.2 1.3 -0.7 0.8 -4.16 3.34 0.04 0.84 -0.06 1.44 0.04 1.69 0.49 0.64 17.305 11.156 0.0016 0.706 0.0036 Σ = 56 70.8 4.3 29.172
  • 34.
    The paired t-test •To test if the population means estimated by two dependent samples differ significantly . • A usual setting for paired t-test is when measurements are made on the same subjects before and after a treatment. where 𝑑 is the mean difference and Sd denotes the standard deviation of the difference.
  • 35.
    Example Systolic BP of5 patients before and after a drug therapy is Before 160, 150, 170, 130, 140 After 140, 110, 120, 140, 130 Test whether there is any significant difference between BP level. • H0: There is no significant difference between BP Level before and after drug • H1: There is no significant difference between BP level before and after drug
  • 36.
    • 𝑑 =22, Sd = 23.875 • tcalculated = 2.060, t table = 2.567, tcalculated < t table value ; Accept H0. Before After d d- 𝑑 (d- 𝑑 )2 160 150 170 130 140 140 110 120 140 130 20 40 50 -10 10 -2 18 28 -32 -12 4 324 784 1024 144 𝛴𝑑 = 110 2280
  • 37.
    Z test Generally, z-testsare used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller sample size (n < 30). Both methods assume a normal distribution of the data, but the z-tests are most useful when the standard deviation is known. z = (x – μ) / (σ / √n)
  • 38.
    ANALYSIS OF VARIANCE(ANOVA) • R. A. Fischer. • The Student's t-test cannot be used for comparison of three or more groups. • The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups. • The analysis of variance is the systematic algebraic procedure of decomposing the overall variation in the responses observed in an experiment into variation. • Two variances – (a) between-group variability and (b) within-group variability that is variation existing between the samples and variations existing within the sample. • The within-group variability (error variance) is the variation that cannot be accounted for in the study design. • The between-group (or effect variance) is the result of treatment
  • 39.
    • A simplifiedformula for the F statistic is where MST is the mean squares between the groups and MSE is the mean squares within groups
NONPARAMETRIC STATISTICAL ANALYSIS
• Chi-square test
• Wilcoxon's signed rank test
• Mann-Whitney U test
• Kruskal-Wallis test
CHI-SQUARE TEST
• Tests to analyse categorical data.
• The chi-square test is widely used in statistical decision making.
• The test was first used by Karl Pearson in 1900.
• The chi-square test compares observed frequencies with expected frequencies and tests whether the observed data differ significantly from the expected data.
CHI-SQUARE TEST
It is calculated as the sum, over all cells, of the squared difference between the observed (O) and expected (E) frequencies (the deviation, d), divided by the expected frequency:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Example
• Attack rates among those vaccinated and not vaccinated against measles are given in the following table. Test the association between vaccination and attack of measles.
Groups            Attacked   Not attacked
Vaccinated        10         90
Not vaccinated    26         74
• H0: There is no significant association between vaccination and attack of measles.
• H1: There is a significant association between vaccination and attack of measles.
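Oi    Ei    Oi - Ei   (Oi - Ei)²   (Oi - Ei)²/Ei
10    18    -8        64           3.556
90    82    8         64           0.780
26    18    8         64           3.556
74    82    -8        64           0.780
                                   Σ = 8.672
• Each expected frequency is (row total × column total)/grand total; for example, E = (100 × 36)/200 = 18 for vaccinated and attacked.
• Chi-square table value (df = 1, 5% level) = 3.841; chi-square calculated value = 8.672
• χ² calculated > χ² table value; reject H0.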
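The measles example can be reproduced with the minimal Python sketch below, which derives each expected frequency as (row total × column total)/grand total and sums (O - E)²/E over all cells. The function name `chi_square` is hypothetical; it works for any r × c table of counts but applies no continuity (Yates) correction, matching the hand calculation above.

```python
def chi_square(table):
    """Pearson chi-square statistic and df for a table given as rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under H0 (no association)
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: vaccinated, not vaccinated; columns: attacked, not attacked
measles = [[10, 90], [26, 74]]
chi2, df = chi_square(measles)
print(f"chi-square = {chi2:.3f}, df = {df}")
print("Reject H0" if chi2 > 3.841 else "Accept H0")  # 3.841: table value for df = 1 at 5%
```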
THE WILCOXON'S SIGNED RANK TEST
• Wilcoxon's signed rank test ranks the absolute differences between paired observations, attaches the sign of each difference to its rank, and compares the sum of the positive ranks with the sum of the negative ranks.
• It tests whether the differences observed in the values of a quantitative variable between two correlated samples (before-and-after design) are statistically significant.
• This test is the nonparametric counterpart of the paired t-test.
Method
• H0: There is no difference in the paired values, on average, between the two groups.
• H1: There is a difference in the paired values, on average, between the two groups.
• Compute the difference for each pair of values.
• Rank the differences from smallest to largest without considering their sign; tied values receive the average of the ranks they would occupy.
• After ranking, attach the sign of the corresponding difference to each rank.
• Compute T+ (sum of the ranks with positive sign) and T- (sum of the ranks with negative sign). Wstat = T, the smaller of T+ and T-.
• Find the W critical value from the Wilcoxon signed rank table.
• If Wstat ≤ W critical value, reject H0.
EXAMPLE
• IQ values of 8 malnourished children aged 4 years, before and after giving a nutritious diet for 3 months, are given below:
Before: 40 60 55 65 43 70 80 60
After:  50 80 50 70 40 60 90 85
• H0: There is no difference in the paired values
• H1: There is a difference in the paired values
Before               40   60   55   65   43   70   80   60
After                50   80   50   70   40   60   90   85
Difference          -10  -20    5   -5    3   10  -10  -25
Absolute difference  10   20    5    5    3   10   10   25
Rank                  5    7  2.5  2.5    1    5    5    8
• T+ = 2.5 + 1 + 5 = 8.5, T- = 5 + 7 + 2.5 + 5 + 8 = 27.5, so T = 8.5
• Wstat = 8.5; W critical value (n = 8, 5% level, two-tailed) = 3
• Wstat > W critical value; accept H0.
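The ranking steps of the Wilcoxon signed rank test can be sketched in Python as follows. The helper names are hypothetical; tied absolute differences receive the average of the ranks they occupy, as in the example, and zero differences are dropped (a common convention that is assumed here; the example has none).

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (T+, T-, T) where T is the smaller of the two signed-rank sums."""
    d = [b - a for b, a in zip(before, after)]
    d = [x for x in d if x != 0]  # drop zero differences (assumed convention)
    ranks = average_ranks([abs(x) for x in d])
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return t_plus, t_minus, min(t_plus, t_minus)


before = [40, 60, 55, 65, 43, 70, 80, 60]
after = [50, 80, 50, 70, 40, 60, 90, 85]
t_plus, t_minus, t_stat = wilcoxon_signed_rank(before, after)
print(f"T+ = {t_plus}, T- = {t_minus}, Wstat = {t_stat}")
```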
• Assuming a normal distribution for the differences (large samples), the test statistic is $Z = \frac{|T - m| - 0.5}{SD}$, where T = the smaller of T+ and T-, $m = \frac{n(n+1)}{4}$ is the mean sum of ranks and $SD = \sqrt{\frac{n(n+1)(2n+1)}{24}}$.
• If Z < 1.96, H0 is accepted; if Z > 1.96, H0 is rejected at the 5% level of significance.
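A minimal Python sketch of the large-sample Z statistic above. The function name `wilcoxon_z` is hypothetical. Applying it to the IQ example (n = 8, T = 8.5) is for illustration only, since with so few pairs the exact table value is normally preferred.

```python
import math


def wilcoxon_z(t, n):
    """Normal approximation for the signed rank statistic T with n pairs."""
    m = n * (n + 1) / 4                             # mean sum of ranks
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # standard deviation of the rank sum
    return (abs(t - m) - 0.5) / sd                  # 0.5 is the continuity correction


z = wilcoxon_z(t=8.5, n=8)
print(f"Z = {z:.3f}")
print("Reject H0 at 5%" if z > 1.96 else "Accept H0 at 5%")
```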
MANN-WHITNEY U TEST
• Tests whether two independent samples, with respect to a quantitative variable, come from the same population.
• It is equivalent to Wilcoxon's rank sum test.
• It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
• This test is the nonparametric alternative to the t-test for two independent samples.
METHOD
• H0: The average values in the two groups are the same.
• H1: The average values in the two groups are different.
• Let n1 and n2 be the sample sizes of the two groups. Rank all the values of the two groups taken together; tied values are given the same (average) rank.
• Take the rank sum R of each group and calculate U for that group as $U = R - \frac{n(n+1)}{2}$, where n is the size of that group.
• Both U1 and U2 are calculated and the smaller value is taken as Ustat. The U critical value is obtained from the Mann-Whitney U test table.
• If Ustat ≤ U critical value, reject H0.
Example
Treatment A: 3, 4, 2, 6, 2, 5
Treatment B: 9, 7, 5, 10, 6, 8
• H0: The average values in the 2 treatments are the same
• H1: The average values in the 2 treatments are different
Ustat = Rank sum - {n(n + 1)/2}
Position  1    2    3  4  5    6    7    8    9  10  11  12
Value     2    2    3  4  5    5    6    6    7  8   9   10
Rank      1.5  1.5  3  4  5.5  5.5  7.5  7.5  9  10  11  12
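• Rank sum of A = 3 + 4 + 1.5 + 7.5 + 1.5 + 5.5 = 23; rank sum of B = 11 + 9 + 5.5 + 12 + 7.5 + 10 = 55; n(n + 1)/2 = (6 × 7)/2 = 21
• UA = 23 - 21 = 2, UB = 55 - 21 = 34, so Ustat = 2 (the lower value)
• U critical value (n1 = n2 = 6, 5% level, two-tailed) = 5
• Ustat < U critical value; reject H0.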
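The Mann-Whitney calculation can be sketched in Python as below. Helper names are hypothetical; the groups are the Treatment A and B values from the example, ranked together with average ranks for ties, and U for each group is its rank sum minus n(n + 1)/2.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (R_A, R_B, U_A, U_B, Ustat) for two independent samples."""
    ranks = average_ranks(a + b)  # rank both groups taken together
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    u_a = r_a - len(a) * (len(a) + 1) / 2
    u_b = r_b - len(b) * (len(b) + 1) / 2
    return r_a, r_b, u_a, u_b, min(u_a, u_b)


treatment_a = [3, 4, 2, 6, 2, 5]
treatment_b = [9, 7, 5, 10, 6, 8]
r_a, r_b, u_a, u_b, u_stat = mann_whitney_u(treatment_a, treatment_b)
print(f"R_A = {r_a}, R_B = {r_b}, U_A = {u_a}, U_B = {u_b}, Ustat = {u_stat}")
```

As a check, U_A + U_B always equals n1 × n2 (here 2 + 34 = 36).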
• Assuming that the ranks are randomly distributed in the two groups, the test statistic is $Z = \frac{|m - T| - 0.5}{SD}$, where T = the smaller of T1 and T2, T1 = sum of the ranks of the smaller group, $T_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - T_1$, $m = \frac{n_1(n_1 + n_2 + 1)}{2}$ is the mean sum of ranks and $SD = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$.
• If Z < 1.96, H0 is accepted; if Z > 1.96, H0 is rejected at the 5% level of significance.
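A minimal Python sketch of this normal approximation, following the slide's definitions of T1, T2, m and SD. The function name `mann_whitney_z` is hypothetical; the usage applies it to the treatment example (n1 = n2 = 6, T1 = 23) for illustration only, since such small groups are usually judged from the U table.

```python
import math


def mann_whitney_z(t1, n1, n2):
    """Normal approximation; t1 is the rank sum of the smaller group (size n1)."""
    t2 = (n1 + n2) * (n1 + n2 + 1) / 2 - t1  # T2 as defined on the slide
    t = min(t1, t2)
    m = n1 * (n1 + n2 + 1) / 2               # mean sum of ranks
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (abs(m - t) - 0.5) / sd           # 0.5 is the continuity correction


z = mann_whitney_z(t1=23, n1=6, n2=6)
print(f"Z = {z:.3f}")
print("Reject H0 at 5%" if z > 1.96 else "Accept H0 at 5%")
```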
KRUSKAL-WALLIS TEST
• The Kruskal-Wallis test is a nonparametric test for analysing variance among several independent samples.
• It tests whether several independent samples of a quantitative variable come from the same population.
• It corresponds to one-way analysis of variance among the parametric methods.
• It analyses whether there is any difference in the median values of three or more independent samples.
• The data values are ranked in increasing order, the rank sums are calculated, and the test statistic is computed as $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$, where n is the total of the sample sizes in all the groups, $n_i$ is the size of the ith group, $R_i$ is the sum of the ranks in the ith group and k is the number of groups.
Method
• H0: The average values in the different groups are the same.
• H1: The average values in the different groups are different.
• Rank all the values, taking all the groups together.
• Use the chi-square table with k - 1 degrees of freedom to get the table value at the 5% level of significance.
• If Hstat > table value, reject H0; otherwise accept H0.
Example
Sample 1: 8, 10, 9, 12, 11, 13
Sample 2: 10, 9, 13, 14, 9, 16
Sample 3: 13, 8, 9, 13, 17, 15
• H0: The average values in the three groups are the same
• H1: The average values in the three groups are different
Position   1    2    3    4    5    6    7    8    9   10  11    12    13    14    15  16  17  18
Value      8    8    9    9    9    9    10   10   11  12  13    13    13    13    14  15  16  17
Tied rank  1.5  1.5  4.5  4.5  4.5  4.5  7.5  7.5  9   10  12.5  12.5  12.5  12.5  15  16  17  18
Sample 1  Rank 1  Sample 2  Rank 2  Sample 3  Rank 3
8         1.5     10        7.5     13        12.5
10        7.5     9         4.5     8         1.5
9         4.5     13        12.5    9         4.5
12        10      14        15      13        12.5
11        9       9         4.5     17        18
13        12.5    16        17      15        16
          Σ = 45            Σ = 61            Σ = 65
• H = {12/(18 × 19) × [(45²/6) + (61²/6) + (65²/6)]} - 3 × 19 = 58.31 - 57 = 1.31
• H calculated = 1.31; χ² table value (df = 2, 5% level) = 5.99
• H calculated < χ² table value; accept H0.
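The H calculation above can be checked with the minimal Python sketch below (helper names hypothetical). It ranks the pooled values with average ranks for ties and applies the formula for H without a correction for ties, as in the hand calculation.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, rank sums) using H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # rank all groups taken together
    n = len(pooled)
    rank_sums, total, start = [], 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        rank_sums.append(r)
        total += r ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, rank_sums


samples = [[8, 10, 9, 12, 11, 13], [10, 9, 13, 14, 9, 16], [13, 8, 9, 13, 17, 15]]
h, rank_sums = kruskal_wallis_h(samples)
print(f"rank sums = {rank_sums}, H = {h:.2f}")
print("Reject H0" if h > 5.99 else "Accept H0")  # 5.99: chi-square table value, df = 2, 5%
```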
WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS FOR OUTCOMES BY USING STATISTICAL ANALYSIS
• Testing the null hypothesis
• Determining the probability of type I and type II errors
• Calculating and reporting tests of effect size
• Ensuring that the data meet the fundamental assumptions of the statistical test
TESTING NULL HYPOTHESIS
• When attempting to determine whether an outcome is related to a cause, it is necessary to know whether the outcome could have occurred by chance alone.
• This cannot be done with certainty, but researchers can determine the probability of obtaining the observed results if the null hypothesis were true.
• Accepting the null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship).
• Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.
DETERMINING THE PROBABILITY OF TYPE I AND TYPE II ERROR
• Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated: a type I error is rejecting a true null hypothesis, and a type II error is accepting a false one.
• This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome.
• The relationship between type I and type II error is inverse: for a given sample size, as the risk of one is reduced, the risk of the other increases.
• Both types of error should be minimised.
CALCULATING AND REPORTING TESTS OF EFFECT SIZE
• Effect size refers to how much impact the intervention or variable is expected to have on the outcome.
• Large effect sizes enhance confidence in the findings: when a treatment exerts a dramatic effect, the validity of the findings is less open to question.
• On the other hand, when effect sizes are very small, the potential for effects from extraneous variables to explain the findings is greater.
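The slides do not name a particular effect size measure. As one common example (an assumption, not the author's stated method), Cohen's d for two independent groups divides the difference in means by the pooled standard deviation. The function name and the data below are hypothetical, for illustration only.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    # Pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


# Hypothetical data, for illustration only
d = cohens_d([1, 2, 3], [3, 4, 5])
print(f"Cohen's d = {d:.2f}")
```

The sign only shows which group has the larger mean; the magnitude (here 2 pooled SDs) is what is usually reported.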
ENSURING DATA MEET THE FUNDAMENTAL ASSUMPTIONS OF THE STATISTICAL TEST
• Data analysis is based on many assumptions about the nature of the data, the statistical procedures used to conduct the analysis and the match between the data and the procedure.
• If an assumption is violated, the result can be an inaccurate estimate of the real relationship.
• Inaccurate conclusions lead to error, which in turn affects the validity of a study.
RESOURCES FOR STATISTICAL ANALYSIS PROGRAM
• Packaged computer programs can perform the data analysis and provide the results on a computer printout.
• Examples: SPSS, SAS and Biomedical Data Processing (BMDP).
• If the analyses selected are inappropriate for the data, the computer program is often unable to detect the error and proceeds to perform the analysis anyway.
STATISTICAL ANALYSIS SYSTEM (SAS)
• Comprehensive software developed at North Carolina State University.
• The software is divided into many modules, and its licensing is flexible, based on the functions needed.
• It contains a very large variety of statistical methods and is the software of choice for many major businesses; it is widely used in the pharmaceutical industry.
• SAS has also developed PC SAS, which runs on personal computers and has a user-friendly Windows interface.
PITFALLS OF STATISTICAL ANALYSIS
• Statistics can be used, intentionally or unintentionally, to reach faulty conclusions. Misleading information is unfortunately the norm in advertising; drug companies, for example, are well known to indulge in misleading information.
• Data dredging
• Survey questions
It is therefore important to understand not just the numbers but the meaning behind the numbers. Statistics is a tool, not a substitute for in-depth reasoning and analysis.
APPLICATION OF STATISTICAL ANALYSIS IN NURSING FIELD
• To analyse trends in the vital statistics of a particular patient
• Research in nursing processes and procedures
• Statistical analysis of patient outcomes
• Trends in nursing
JOURNAL ABSTRACT
Use of Statistical Analysis in The New England Journal of Medicine
• A sorting of the statistical methods used by authors of the 760 research and review articles in Volumes 298 to 301 of The New England Journal of Medicine indicates that a reader who is conversant with descriptive statistics (percentages, means and standard deviations) has statistical access to 58 per cent of the articles. Understanding t-tests increases this access to 67 per cent.
• The addition of contingency tables gives statistical access to 73 per cent of the articles.
• Familiarity with each additional statistical method gradually increases the percentage of accessible articles.
• Original Articles use statistical techniques more extensively than other articles in the Journal.
Statistical analysis and design in marketing journal articles
• The use of statistical analysis in 922 articles from the 1980 through 1985 issues of the Journal of the Academy of Marketing Science (JAMS), the Journal of Marketing (JM), the Journal of Marketing Research (JMR) and the Journal of Consumer Research (JCR) was analysed.
• A reader with no statistical background can understand 31, 56, 9 and 21 per cent of the articles, respectively, in these four journals.
• Knowledge of regression and analysis of variance is important in comprehending many of the articles.
• 38 per cent of the JAMS articles and 25, 57 and 56 per cent, respectively, of the articles in the other three journals make use of these statistical techniques.
ASSIGNMENT
• The mean and standard deviation of weight (kg) of 100 school-going children (A) and 100 children not going to school (B), aged 5 years, in slum areas are given below. Which test should be used to find the statistical significance?
Group   Sample size   Mean (kg)   SD (kg)
A       100           17.4        3
B       100           13.2        2.5