Parametric &
Non-Parametric tests
Parametric tests
• For the most accurate results, use the most powerful statistical
test that the type of data permits
• Many tests suitable for quantitative data make strong assumptions
about the distribution of the variables in the populations compared
• Tests which make such distributional assumptions about the variable
being analyzed are called 'parametric tests'
• On the other hand, with fairly large sample sizes, many of the
assumptions for the parametric tests may hold approximately
• In general, parametric tests are more powerful in detecting
differences between populations when the underlying assumptions
hold
Dept. Of Biostats,SJMC, Bangalore
Independent vs. paired samples
• Paired samples: Each observation in the first group has a
corresponding observation in the second group (corresponding
observations are typically not independent!)
• Independent samples : Observations in each of the two groups
are not related to each other
Comparison of means
Student’s t-test
Student’s t-test (Independent t-test)
• Used to assess the statistical significance of the difference
between two population means
Assumptions
• Sample observations are random and independent
• Outcome variable must be continuous and normally
distributed
• The variance of the outcome variable is the same in
the two groups (homogeneity of variance)
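As a rough illustration of the independent t-test under the equal-variance assumption, the sketch below computes the pooled-variance t statistic with the Python standard library only. The function name `pooled_t` and the two small groups of numbers are made up for illustration; they are not data from these slides.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

# Hypothetical values, for illustration only
group_a = [10, 12, 14]
group_b = [13, 15, 17, 19]
t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f} on {df} df")
```

The statistic would then be compared with the tabulated t value for n1 + n2 − 2 degrees of freedom.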
One group of observations
(One sample t-test)
Compare the mean of a single group of observations with a
specified value.
Example data:
Comparison of mean dietary intake of a particular group of
individuals with the recommended daily intake.
Data: Average daily energy intake (ADEI) over 10 days of 11
healthy women (Manocha et al., 1986)
Example for independent t-test:
A study was done to investigate the nature of lung
destruction in cigarette smokers before the development
of marked emphysema. Two lung destructive index
measurements were made on the lungs of lifelong
nonsmokers and smokers who died suddenly. These
indices are as given below.
Can we conclude that the smokers have generally greater
lung damage as measured than nonsmokers?
Example for paired t-test:
In a study of the effect of a particular drug on pulse rate,
8 patients were observed before and after the administration of
the drug.
Is the drug effective in changing the pulse rate?
Patient Before drug After drug
1 58 66
2 65 69
3 68 75
4 70 68
5 66 73
6 75 75
7 62 68
8 72 69
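The paired t-test for this table can be sketched as follows, using only the standard library. The test works on the within-patient differences (after − before). The critical value 2.365 is the usual two-sided 5% table value for 7 degrees of freedom; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

before = [58, 65, 68, 70, 66, 75, 62, 72]
after = [66, 69, 75, 68, 73, 75, 68, 69]

# Work with the within-patient differences (after - before)
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

T_CRIT = 2.365  # two-sided 5% critical value of t with 7 df, from standard tables
print(f"mean difference = {mean(diffs):.3f}, t = {t_paired:.3f}, df = {n - 1}")
print("reject H0" if abs(t_paired) > T_CRIT else "do not reject H0")
```

With these eight patients the statistic is about 2.17 on 7 df, which is below 2.365, so this sketch does not reject H0 at the two-sided 5% level.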
Tests of association
By
JOHN MICHAEL RAJ
Chi-square test
• The Chi-square test can be used for two applications
• Independence between two variables
• The null hypothesis for this test is that the variables are
independent (i.e. that there is no statistical association)
• The alternative hypothesis is that there is a statistical
relationship or association between the two variables
• Test for equality of proportions between two or more
groups
• The null hypothesis for this test is that the proportions are
equal in all groups
• The alternative hypothesis is that the proportions are not
equal (test for a difference in either direction)
Chi-square Test
• Step 1: Hypothesis (always two-sided):
H0: Independent
HA: Not independent
• Step 2: Calculate the test statistic:
$\chi^2_{calc} = \sum_i \frac{(O_i - E_i)^2}{E_i} \sim \chi^2$ with $df = (r-1)(c-1)$
• Step 3: Calculate the p-value
p-value = $P(\chi^2_{df} > \chi^2_{calc})$
• Step 4: Draw a conclusion
• p-value < α: reject independence
• p-value > α: do not reject independence
Test statistic
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
• Where the $O_i$ are the observed frequencies and
the $E_i$ are the expected counts
• Basically, the deviation between expected and
observed frequencies is computed
• Expected frequencies are calculated from the
row and column marginal totals
Dept. of Biostats, CMC, Vellore
Testing for Independence: Example
• Contingency tables (cross-classified tables)
can be used
• E.g.:

                     Hypertension
Type II diabetes     Yes      No
Yes                    5      57
No                    51    2105

• How to view the association?
• Proportions of the groups will help in
comparison
Expected frequency
Observed counts, with expected counts in parentheses:

                     Hypertension
Type II diabetes     Yes          No               Total
Yes                  5 (1.6)      57 (60.4)           62
No                   51 (54.4)    2105 (2101.6)     2156
Total                56           2162              2218

Expected count = RT × CT / N, e.g. (62 × 56)/2218 = 1.6
Decision making
• If χ²calc ≥ χ²tab at (r−1)(c−1) df, then the null hypothesis is rejected
$\chi^2_{calc} = \sum_i \frac{(O_i - E_i)^2}{E_i} = 7.95$
χ²tab = 3.84 at 1 df (α = 0.05)
• χ²calc ≥ χ²tab at 1 df, so the null is rejected, concluding that
there is an association between type II diabetes and
hypertension.
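The expected counts and the chi-square statistic for the diabetes and hypertension table can be reproduced with a short standard-library sketch. The variable names are illustrative. The p-value line relies on the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom.

```python
import math

observed = [[5, 57],      # type II diabetes: yes
            [51, 2105]]   # type II diabetes: no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Expected count for each cell = row total * column total / N
expected = [[rt * ct / n_total for ct in col_totals] for rt in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df_chi = (len(observed) - 1) * (len(observed[0]) - 1)

# For 1 df only: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, df = {df_chi}, p = {p_value:.4f}")
```

The computed statistic is about 7.95 on 1 df, which exceeds 3.84.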
A few notes before applying
a nonparametric test!
➢In practice, of course, no distribution is exactly
Normal. Fortunately, our usual methods for
inference about population means (the one-sample
and two-sample t procedures and analysis of
variance) are quite robust.
➢That is, the results of inference are not very
sensitive to moderate lack of Normality, especially
when the samples are reasonably large.
➢The problem is serious if plots suggest that the data are
clearly not Normal, especially when we have only a
few observations.
Steps before opting for
a nonparametric test!
➢If lack of Normality is due to outliers, it may be
legitimate to remove the outlier
➢Sometimes we can transform our data so that
their distribution is more nearly Normal
➢In some settings, other standard distributions
replace the Normal distributions as models for
the overall pattern in the population
➢Modern bootstrap methods and permutation
tests do not require Normality or any other
specific form of sampling distribution.
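To make the idea of a permutation test concrete, here is a minimal sketch of an exact sign-flip permutation test applied to the pulse-rate data from the paired t-test example. It assumes that, under H0, each before/after difference is equally likely to be positive or negative. The choice of the sum of differences as the test statistic and all variable names are illustrative, not taken from the slides.

```python
from itertools import product

before = [58, 65, 68, 70, 66, 75, 62, 72]
after = [66, 69, 75, 68, 73, 75, 68, 69]
diffs = [a - b for a, b in zip(after, before)]

observed_sum = abs(sum(diffs))

# Under H0 each difference is equally likely to be positive or negative,
# so enumerate all 2**8 = 256 possible sign assignments.
count_extreme = 0
total = 0
for signs in product((1, -1), repeat=len(diffs)):
    total += 1
    if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed_sum:
        count_extreme += 1

p_perm = count_extreme / total
print(f"{count_extreme} of {total} sign patterns are as extreme: p = {p_perm:.3f}")
```

No Normality assumption enters the calculation; the reference distribution is built from the data themselves.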
Why do I advocate parametric tests in a
class on nonparametric statistics?
Easy interpretation.
Can tolerate mild to moderate violation of
assumptions when the sample is sufficiently large.
Rank-based nonparametric tests are only about 95% as
efficient as their parametric counterparts when the
parametric assumptions are satisfied.
Introduction
❑Make no assumptions about the form of the population
distribution, hence the name "distribution-free tests"
❑Answer the same sort of questions as the parametric
tests: for most parametric tests (PT) there is an
alternative non-parametric test (NP)
When are non-parametric tests used?
Assumptions of parametric test are violated
❑Non-normal or skewed
❑Unequal variance
❑Data is on an ordinal scale
❑Very few observations
Assigning Ranks
❑Arrange the data in ascending (or descending) order
❑Assign rank 1 to the first item, rank 2 to the second
item, and similarly for the rest of the items
Example
Assigning ranks for the following 5 scores: 6, 9, 8, 3, 4

Original score   Arranged score   Rank
6                3                1
9                4                2
8                6                3
3                8                4
4                9                5
How to Handle Ties
Example
Calculate the ranks for the following scores

Original score   Arranged score   Rank
4                3                1
9                4                2.5
3                4                2.5
4                5                4
8                7                5
10               8                6
7                9                7
5                10               8

Tied scores share the average of the ranks they occupy: (2+3)/2 = 2.5
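The ranking rule, including average ranks for ties, can be sketched in a few lines of Python. The function name `average_ranks` is illustrative; it returns the ranks in the order of the original scores rather than the arranged scores.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of equal values starting at this sorted position
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

scores = [4, 9, 3, 4, 8, 10, 7, 5]
print(average_ranks(scores))
```

For the eight scores above, the two 4s each receive rank 2.5, matching the table.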
Mann Whitney U Test
Mann-Whitney U Test
❑Also known as the Wilcoxon rank-sum test
❑Alternative to the parametric independent t-test
❑Tests whether two independent groups have been drawn
from the same population
Assumptions:
❑The two samples are selected independently and at random
from their respective populations
❑Variable of interest is continuous
❑Measurement scale is at least ordinal.
Mann-Whitney U Test
Compare the distribution of scores on a quantitative variable
obtained from two independent groups.
BMI of young adults
Men: 18.19 23.79 25.76 21.20 15.79 26.45 29.85
26.66 17.58 25.86 21.54 23.75 22.83
Women: 18.86 25.86 16.54 18.87 17.87 18.73 15.75
17.77 17.46 18.28 30.47 30.03
H0: There is no difference between the two
population distributions
Ha: There is a difference between the two population
distributions
Procedure

Men (n=13)              Women (n=12)
BMI (kg/m²)   Rank      BMI (kg/m²)   Rank
18.19          8        18.86         11
23.79         17        25.86         19.5
25.76         18        16.54          3
21.20         13        18.87         12
15.79          2        17.87          7
26.45         21        18.73         10
17.58          5        15.75          1
29.85         23        17.77          6
26.66         22        17.46          4
25.86         19.5      18.28          9
21.54         14        30.47         25
23.75         16        30.03         24
22.83         15
sum=193.5               sum=131.5

Ranking tied observations: 25.86 occurs once in each group,
so both values receive the average rank (19+20)/2 = 19.5
Test statistic: U = Min(U1, U2)
U1 = 53.5, U2 = 102.5 (check: U1 + U2 = n1 × n2 = 13 × 12 = 156)
U = 53.5
Rule: The calculated value should be less than or equal to the
critical value to reject the null hypothesis
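The rank sums and U statistics for the BMI data can be checked with the sketch below. It assumes the common definition U1 = n1·n2 + n1(n1+1)/2 − R1, and likewise for U2, which the slides do not state explicitly. The helper `rank_sum` is an illustrative name. Under this definition U1 + U2 always equals n1·n2.

```python
men = [18.19, 23.79, 25.76, 21.20, 15.79, 26.45, 29.85,
       26.66, 17.58, 25.86, 21.54, 23.75, 22.83]
women = [18.86, 25.86, 16.54, 18.87, 17.87, 18.73, 15.75,
         17.77, 17.46, 18.28, 30.47, 30.03]

def rank_sum(sample, pooled):
    """Sum of average ranks of `sample` within the pooled, sorted data."""
    ordered = sorted(pooled)
    total = 0.0
    for x in sample:
        first = ordered.index(x) + 1          # lowest rank held by value x
        last = first + ordered.count(x) - 1   # highest rank held by value x
        total += (first + last) / 2           # ties share the average rank
    return total

n1, n2 = len(men), len(women)
pooled = men + women
r1, r2 = rank_sum(men, pooled), rank_sum(women, pooled)

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u_stat = min(u1, u2)
print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u_stat}")
```

The smaller of the two values, U = 53.5, is the one compared with the tabulated critical value for n1 = 13 and n2 = 12.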
Wilcoxon Signed – Rank Test
Wilcoxon Signed-Rank Test
❑An alternative to the parametric paired t-test
❑Used to compare two samples that are not independent,
e.g. a variable measured in each subject before and
after an intervention
Assumptions
❑Samples must be paired
❑Pairs are randomly selected from the larger population
❑Measurements should be continuous
Example
A drug was designed to lower systolic blood pressure.
The systolic blood pressure was measured before and
after administration of the drug. Is the drug
effective in lowering systolic blood pressure?
Procedure
Systolic blood pressure (Difference = After − Before; ranks are
assigned to the absolute differences)

Subject   Before   After   Difference   Rank   Sign
1         170      175        5          2      +
2         168      171        3          1      +
3         199      178      -21          6      -
4         183      152      -31          9      -
5         178      159      -19          5      -
6         208      183      -25          7      -
7         194      176      -18          4      -
8         186      159      -27          8      -
9         156      145      -11          3      -
10        210      177      -33         10      -

Sum of positive ranks: W+ = 2 + 1 = 3
Sum of negative ranks: W- = 52
Test statistic: W = Min(W+, W-) = 3
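The signed-rank calculation can be reproduced with the short sketch below. It drops zero differences, which is the usual convention, although none occur here. It also relies on the absolute differences all being distinct in this data set, so no tie handling is included. Variable names are illustrative.

```python
before = [170, 168, 199, 183, 178, 208, 194, 186, 156, 210]
after = [175, 171, 178, 152, 159, 183, 176, 159, 145, 177]

diffs = [a - b for a, b in zip(after, before)]
nonzero = [d for d in diffs if d != 0]  # zero differences are conventionally dropped

# Rank the absolute differences (no ties occur in this data set)
abs_sorted = sorted(abs(d) for d in nonzero)
rank_of = {value: i + 1 for i, value in enumerate(abs_sorted)}

w_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
w_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
w_stat = min(w_plus, w_minus)
print(f"W+ = {w_plus}, W- = {w_minus}, W = {w_stat}")
```

As a check, W+ and W- together add up to n(n+1)/2 = 55 for the 10 subjects.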
Decision making
How to decide the significance?
We reject H0 because the calculated value (CV) W = 3 is less than
the tabulated value (TV) of 8.

Parametric & Non-Parametric tests: SPSS Workshop
