Wilcoxon Rank-Sum Test
and
Type I and Type II Errors
Lakshmi. M. B
Sr. No. 9219708
S3, MTech CSE
Sahrdaya College of Engineering and Technology, Kodakara
October 6, 2020
Lakshmi. M. B Big Data Analytics Seminar 1/14
Wilcoxon Rank-Sum Test
Non-parametric hypothesis test
Checks whether two populations are identically distributed.
Null hypothesis: the 2 populations are identically distributed.
Expectation under the null hypothesis: when the pooled observations
are ordered, the two samples are evenly intermixed.
The steps involved in the Wilcoxon rank-sum test are:
1 Rank the set of observations from the 2 groups as if they
came from one large group.
2 Sum the assigned ranks for at least one population's
sample.
Example:
Let the 2 populations be pop1 and pop2, with
independent random samples of size n1 and n2
respectively.
Total number of observations, N = n1 + n2
Step 1:
Smallest observation receives rank 1
Second smallest observation receives rank 2
.
.
.
Largest observation receives rank N.
Tied observations each receive a rank equal to the average of the
ranks they span.
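The ranking rule above can be sketched in R as follows. The sample values are hypothetical and chosen only so that ties occur; `pop1_sample` and `pop2_sample` are illustrative names, not part of the original slides.

```r
# Hypothetical samples, chosen only to illustrate the ranking step.
pop1_sample <- c(12, 15, 15, 20)
pop2_sample <- c(9, 12, 14)

# Pool the two samples as if they came from one large group.
pooled <- c(pop1_sample, pop2_sample)
N <- length(pooled)

# rank() gives the smallest value rank 1 and the largest rank N.
# ties.method = "average" gives tied observations the average of
# the ranks they span (the two 12s share ranks 2 and 3, so each gets 2.5).
ranks <- rank(pooled, ties.method = "average")

print(data.frame(value = pooled, rank = ranks))

# The ranks always add up to N(N + 1)/2, which is a quick sanity check.
print(sum(ranks) == N * (N + 1) / 2)
```

Averaging tied ranks keeps the total of all ranks at N(N + 1)/2, so ties do not favour either sample.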
Step 2:
Ranks are used instead of the raw values to avoid specific
assumptions about the shape of the distribution.
If the distribution of pop1 is shifted to the right of pop2, then,
for samples of comparable size, the rank-sum of the pop1 sample
should be larger than the rank-sum of the pop2 sample.
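A minimal sketch of the rank-sum step, assuming two hypothetical samples of equal size in which the first sits to the right of the second. The variable names and values are illustrative only.

```r
# Hypothetical samples: pop1_sample tends to take larger values.
pop1_sample <- c(21, 24, 27, 30, 33)
pop2_sample <- c(18, 20, 22, 25, 26)

n1 <- length(pop1_sample)
n2 <- length(pop2_sample)
N <- n1 + n2

# Rank the pooled observations, then sum the ranks within each sample.
ranks <- rank(c(pop1_sample, pop2_sample))
rank_sum_pop1 <- sum(ranks[seq_len(n1)])
rank_sum_pop2 <- sum(ranks[n1 + seq_len(n2)])

print(rank_sum_pop1)  # 35
print(rank_sum_pop2)  # 20

# The two rank-sums together account for every rank from 1 to N.
print(rank_sum_pop1 + rank_sum_pop2 == N * (N + 1) / 2)
```

Because the two rank-sums always total N(N + 1)/2, knowing one of them is enough, which is why the ranks need to be summed for only one sample.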
The Wilcoxon rank-sum test determines the significance of the
observed rank-sums.
wilcox.test(): ranks the observations, determines the
respective rank-sums, and then determines the probability of
observing rank-sums of that magnitude under the null hypothesis.
Example: wilcox.test(x, y, conf.int = TRUE)
More robust than the t-test when the data are not normally
distributed, because it relies only on ranks.
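As a hedged illustration of the call above, the sketch below runs wilcox.test() on two small hypothetical samples and reads the main fields of the result. The data are invented for the example. The W statistic that R reports is the rank-sum of the first sample minus n1(n1 + 1)/2, which the last lines check by hand.

```r
# Hypothetical samples with no ties, so R can use the exact test.
x <- c(21, 24, 27, 30, 33)
y <- c(18, 20, 22, 25, 26)

result <- wilcox.test(x, y, conf.int = TRUE)

print(result$statistic)  # W statistic
print(result$p.value)    # two-sided p-value
print(result$conf.int)   # confidence interval for the location shift
print(result$estimate)   # estimated difference in location

# Recompute W from the pooled ranks: rank-sum of x minus n1(n1 + 1)/2.
n1 <- length(x)
rank_sum_x <- sum(rank(c(x, y))[seq_len(n1)])
w_from_ranks <- rank_sum_x - n1 * (n1 + 1) / 2
print(w_from_ranks)  # 20
```

A small p-value would indicate that rank-sums this extreme are unlikely if the two populations were identically distributed.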
Type I and Type II Errors
There are 2 types of errors in a hypothesis test.
They are:-
1 Type I Error
2 Type II Error
Type I Error:
Rejection of the null hypothesis when the null hypothesis is true.
Its probability is denoted by α, the significance level of the test.
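A small simulation can make the meaning of α concrete. This is a sketch under stated assumptions: both samples are drawn from the same standard normal distribution (so the null hypothesis is true by construction), and the seed, sample size and number of repetitions are arbitrary illustrative choices.

```r
set.seed(42)      # fixed seed so the sketch is reproducible
alpha <- 0.05     # chosen significance level
n_reps <- 2000    # number of simulated experiments
n <- 20           # observations per sample

# Both samples come from the same distribution, so every rejection
# of the null hypothesis here is a Type I error.
p_values <- replicate(n_reps, wilcox.test(rnorm(n), rnorm(n))$p.value)

# Fraction of experiments in which the null hypothesis was wrongly rejected.
type1_rate <- mean(p_values < alpha)
print(type1_rate)
```

The observed rejection rate should land close to the chosen α, up to simulation noise.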
Type II Error:
Failure to reject (loosely, "acceptance" of) the null hypothesis
when the null hypothesis is false.
Its probability is denoted by β; the power of the test is 1 − β.
By selecting an appropriate significance level, the probability
of committing a Type I error can be fixed before any data is
collected or analyzed.
The probability of committing a Type II error is more difficult to
determine.
If 2 population means are truly not equal, the probability of
committing a Type II error will depend on how far apart the
means truly are.
To reduce the probability of a Type II error to a reasonable
level, increase the sample size.
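The dependence of β on the sample size and on the true difference in means can be sketched with R's power.t.test() from the stats package. The helper `beta_for` and all numeric values (difference in means, standard deviation, sample sizes, target power) are assumptions made for illustration, and the calculation is for a two-sample t-test rather than the rank-sum test.

```r
alpha <- 0.05  # significance level, fixed before looking at any data

# Hypothetical helper: Type II error probability (beta) of a two-sample
# t-test, computed as 1 - power. n is the size of each sample, delta is
# the true difference in means, and the common standard deviation is 1.
beta_for <- function(n, delta) {
  1 - power.t.test(n = n, delta = delta, sd = 1, sig.level = alpha)$power
}

beta_small_n   <- beta_for(n = 10,  delta = 0.5)
beta_large_n   <- beta_for(n = 100, delta = 0.5)
beta_far_apart <- beta_for(n = 10,  delta = 1.5)

print(c(small_n = beta_small_n,
        large_n = beta_large_n,
        far_apart = beta_far_apart))

# Sample size per group needed to reach a power of 0.8 (beta = 0.2)
# when the means truly differ by 0.5 standard deviations.
n_needed <- power.t.test(delta = 0.5, sd = 1, sig.level = alpha,
                         power = 0.8)$n
print(ceiling(n_needed))
```

With α held fixed, β falls both when the samples get larger and when the true means lie further apart.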
