On some interesting features and an
                    application of rank correlation

                                         Kushal Kr. Dey
                                    Indian Statistical Institute
                                D.Basu Memorial Award Talk 2011




Kushal Kr. Dey [1.5 pt] Indian Statistical Institute D.Basu Memorial Award Talk 2011
                                                               On some interesting features and an application of rank correla
List of contents




          1   Historical overview of rank correlation.
          2   Some properties of rank correlation.
          3   A practical example of rank correlation.




Historical Overview—Correlation



              In 1886, Sir Francis Galton coined the term correlation,
              writing that the
                   "length of a human arm is said to be correlated with
                   that of the leg, because a person with long arm has
                   usually long legs and conversely."

              Galton wanted a measure of correlation that takes value +1
              for perfect correspondence, 0 for independence, and -1 for
              perfect inverse correspondence.




Historical overview—contd.


              Karl Pearson, a student of Galton, built on this idea and
              formulated his "product-moment" measure of correlation in
              1896:

                  $$ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \qquad (1) $$

              where $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, and $S_{xx}$, $S_{yy}$
              are defined analogously.
              Spearman observed that for characteristics that are not
              quantitatively measurable, the Pearsonian measure fails to
              capture the association. This motivated him to use rank-based
              methods and to develop his rank correlation coefficient in
              1904. ["The proof and measurement of association between
              two things" by C. Spearman in The American Journal of
              Psychology (1904)].



Historical overview contd.


              In 1938, two years after the death of Pearson, Maurice
              Kendall, a British scientist, while working on psychological
              experiments, came up with a new measure of correlation
              popularly known as Kendall’s τ. ["A new measure of rank
              correlation", M. Kendall, Biometrika (1938)].
              The next few years saw extensive research in this area by
              Kendall, Daniels, Hoeffding and others.
              In 1954, Goodman and Kruskal modified Kendall’s coefficient
              to handle ties. ["Measures of
              association for cross classifications", Part I, L.A. Goodman and
              W.H. Kruskal, J. Amer. Statist. Assoc. (1954)].




Daniels’ generalized correlation coefficient


              H.E. Daniels of Cambridge University, a close associate of
              Kendall, proposed a measure in 1944 that unifies Pearson’s r,
              Spearman’s ρ and Kendall’s τ ["The relation between
              measures of correlation in the universe of sample
              permutations", H.E. Daniels, Biometrika (1944)].
              Consider n data points $(X_i, Y_i)$, $i = 1, \dots, n$. To each
              pair $(X_i, X_j)$ assign a score $a_{ij}$ with $a_{ij} = -a_{ji}$ and
              $a_{ii} = 0$; similarly, assign a score $b_{ij}$ to each pair
              $(Y_i, Y_j)$. Daniels’ generalized coefficient D is then given by

                  $$ D = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} b_{ij}}
                              {\left(\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^{2} \cdot \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}^{2}\right)^{1/2}} \qquad (2) $$




Daniels’ generalized coefficient contd.




       Special cases
              Put $a_{ij} = X_j - X_i$ and $b_{ij} = Y_j - Y_i$ to get Pearson’s r.

              Put $a_{ij} = \mathrm{Rank}(X_j) - \mathrm{Rank}(X_i)$ and
              $b_{ij} = \mathrm{Rank}(Y_j) - \mathrm{Rank}(Y_i)$ to get Spearman’s ρ.

              Put $a_{ij} = \mathrm{sgn}(X_j - X_i)$ and $b_{ij} = \mathrm{sgn}(Y_j - Y_i)$ to get
              Kendall’s τ.
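
              The three special cases can be checked numerically. The sketch
              below is a minimal illustration, not code from the talk: the
              function names are made up for this example, and the data are
              assumed to contain no ties.

```python
import math


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def ranks(values):
    """Ranks 1..n (1 = smallest). Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def daniels_coefficient(x, y, score):
    """Generalized coefficient D with a_ij = score(x_i, x_j), b_ij = score(y_i, y_j)."""
    n = len(x)
    numerator = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])
            b = score(y[i], y[j])
            numerator += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return numerator / math.sqrt(sum_a2 * sum_b2)


def pearson_r(x, y):
    # a_ij = X_j - X_i, b_ij = Y_j - Y_i
    return daniels_coefficient(x, y, lambda u, v: v - u)


def spearman_rho(x, y):
    # Same score as Pearson, applied to the ranks
    return daniels_coefficient(ranks(x), ranks(y), lambda u, v: v - u)


def kendall_tau(x, y):
    # a_ij = sgn(X_j - X_i), b_ij = sgn(Y_j - Y_i)
    return daniels_coefficient(x, y, lambda u, v: sign(v - u))
```

              For the toy data x = [1, 2, 3, 4, 5], y = [2, 1, 4, 3, 5], two of
              the ten pairs are discordant, so kendall_tau gives about 0.6, and
              spearman_rho gives about 0.8.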




Alternative expression for τ and ρ

              First, define $d_{ij}$ to be +1 when rank j (j > i) precedes
              rank i in the second ranking, and zero otherwise.
              Kendall’s τ can then be written as

                  $$ \tau = 1 - \frac{4Q}{n(n-1)} \qquad (3) $$

              where $Q = \sum_{i<j} d_{ij}$ is the total score and n is the total
              number of elements in the sample.
              Similarly, Spearman’s ρ can be written as

                  $$ \rho = 1 - \frac{12V}{n(n^{2}-1)} \qquad (4) $$

              where $V = \sum_{i<j} (j - i)\, d_{ij}$ is the sum of inversions weighted
              by the numerical difference between the ranks inverted. This
              difference is called the weight of the inversion.
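
              Formulas (3) and (4) can be sketched in code. The example assumes
              the objects are listed in the order of the first ranking, so the
              second ranking is a sequence of ranks; the function name is
              illustrative only, and ties are assumed absent.

```python
def tau_rho_from_inversions(second_ranking):
    """Compute Q, V, tau and rho from the inversions of the second ranking.

    second_ranking[k] is the rank given by the second judge to the object
    ranked k + 1 by the first judge (a permutation of 1..n, no ties).
    """
    n = len(second_ranking)
    q = 0  # number of inversions (total score Q)
    v = 0  # inversions weighted by the difference of the inverted ranks (V)
    for a in range(n):
        for b in range(a + 1, n):
            if second_ranking[a] > second_ranking[b]:
                q += 1
                v += second_ranking[a] - second_ranking[b]
    tau = 1 - 4 * q / (n * (n - 1))
    rho = 1 - 12 * v / (n * (n * n - 1))
    return q, v, tau, rho
```

              For the sequence [2, 1, 4, 3, 5] there are two inversions, each of
              weight 1, so Q = 2 and V = 2. A fully reversed ranking gives
              τ = ρ = −1.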

An interesting result




              We simulated large samples from bivariate normal
              distributions and plotted the mean values of
              Spearman’s ρ and Kendall’s τ against Pearson’s r. We
              obtained the following graph.




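The graph

              [Figure: mean values of Spearman’s ρ and Kendall’s τ plotted
              against Pearson’s r for simulated bivariate normal samples.]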
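
              A simulation of this kind could be set up roughly as follows.
              This is only a sketch: the sample size, number of replicates and
              seed are arbitrary choices made here, not the settings used for
              the talk, and the function names are illustrative.

```python
import math
import random


def ranks(values):
    """Ranks 1..n (1 = smallest); continuous data, so ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def kendall_tau(x, y):
    """Kendall's tau as (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[j] - x[i]) * (y[j] - y[i])
            score += (product > 0) - (product < 0)
    return 2 * score / (n * (n - 1))


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (no ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def simulate_mean_rank_correlations(r, n=200, replicates=40, seed=0):
    """Return (mean rho, mean tau) over simulated BVN(0, 0, 1, 1, r) samples."""
    rng = random.Random(seed)
    rho_sum = tau_sum = 0.0
    for _ in range(replicates):
        x, y = [], []
        for _ in range(n):
            v, w = rng.gauss(0, 1), rng.gauss(0, 1)
            # (V * sqrt(1 - r^2) + W * r, W) has unit variances and correlation r
            x.append(v * math.sqrt(1 - r * r) + w * r)
            y.append(w)
        rho_sum += spearman_rho(x, y)
        tau_sum += kendall_tau(x, y)
    return rho_sum / replicates, tau_sum / replicates
```

              Calling simulate_mean_rank_correlations(r) over a grid of r values
              and plotting the two means against r gives curves of the kind
              shown in the graph.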




Relation of τ and ρ with r for BVN

              In 1907, Pearson, in his book ["On Further Methods of
              Determining Correlation", Karl Pearson, Biometric Series IV
              (1907)], established the following relation between
              Spearman’s ρ and his r for the bivariate normal distribution:

                  $$ r = 2 \sin\left(\frac{\pi}{6}\,\rho\right) \qquad (5) $$

              Cramér, in 1946, also established a relation between Kendall’s
              τ and Pearson’s r for the bivariate normal:

                  $$ r = \sin\left(\frac{\pi}{2}\,\tau\right) \qquad (6) $$

              Relation (6) can be shown to hold for any elliptical
              distribution; relation (5) is specific to the normal case and
              need not hold for other elliptical families.
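
              Relations (5) and (6), together with their inverses, translate
              directly into code. The function names below are illustrative;
              the inputs are population values for a bivariate normal
              distribution.

```python
import math


def r_from_spearman_rho(rho):
    """Relation (5): r = 2 sin(pi * rho / 6)."""
    return 2 * math.sin(math.pi * rho / 6)


def r_from_kendall_tau(tau):
    """Relation (6): r = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)


def spearman_rho_from_r(r):
    """Inverse of (5): rho = (6 / pi) * arcsin(r / 2)."""
    return 6 / math.pi * math.asin(r / 2)


def kendall_tau_from_r(r):
    """Inverse of (6): tau = (2 / pi) * arcsin(r)."""
    return 2 / math.pi * math.asin(r)
```

              For example, τ = 1/3 corresponds to r = sin(π/6) = 0.5, and
              ρ = 1 corresponds to r = 2 sin(π/6) = 1.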


Relation between Kendall’s τ and r for bivariate
 normal
              Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be a sample drawn from
              BVN(0, 0, 1, 1, r). Then Kendall’s τ computed from the data is
              an unbiased estimator of

                  $$ 2P\{(X_1 - X_2)(Y_1 - Y_2) > 0\} - 1 = 2P(Z_1 Z_2 > 0) - 1 \qquad (7) $$

              where $(Z_1, Z_2) = (X_1 - X_2,\; Y_1 - Y_2)$ is bivariate normal with
              zero means, variances 2 and covariance 2r (that is, correlation r).
              Note that

                  $$ (Z_1, Z_2) \overset{d}{=} \sqrt{2}\,\bigl(V\sqrt{1-r^{2}} + Wr,\; W\bigr) $$

              where V and W are independent standard normal variables.
              Since $(Z_1, Z_2)$ is symmetric about (0, 0), expression (7) equals

                  $$ 4P(Z_1 > 0,\, Z_2 > 0) - 1 = 4P\bigl(V\sqrt{1-r^{2}} + Wr > 0,\; W > 0\bigr) - 1 \qquad (8) $$

              Use the polar transformation on (V, W) and evaluate this
              expression to get $\frac{2}{\pi}\sin^{-1} r$.
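
              The polar step can be spelled out as follows. This is a sketch of
              one way to carry it out; the angle $\alpha = \sin^{-1} r$ is
              notation introduced here for convenience and does not appear in
              the slides.

```latex
% Sketch of the polar-coordinate step; \alpha = \sin^{-1} r is notation introduced here.
\begin{align*}
V &= R\cos\theta,\quad W = R\sin\theta,
   \qquad \theta \sim \mathrm{Uniform}(-\pi, \pi] \text{ independent of } R,\\
r &= \sin\alpha,\quad \sqrt{1-r^{2}} = \cos\alpha,
   \qquad \alpha \in [-\pi/2,\ \pi/2],\\
V\sqrt{1-r^{2}} + Wr &= R\cos(\theta - \alpha) > 0
   \iff \theta \in (\alpha - \pi/2,\ \alpha + \pi/2),\\
W > 0 &\iff \theta \in (0, \pi),\\
P\bigl(V\sqrt{1-r^{2}} + Wr > 0,\ W > 0\bigr)
  &= P\bigl(0 < \theta < \alpha + \pi/2\bigr)
   = \frac{\alpha + \pi/2}{2\pi}
   = \frac14 + \frac{\sin^{-1} r}{2\pi},\\
4\Bigl(\frac14 + \frac{\sin^{-1} r}{2\pi}\Bigr) - 1 &= \frac{2}{\pi}\sin^{-1} r .
\end{align*}
```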


Relation between Spearman’s ρ and r for bivariate
 normal

              We now sketch a proof of the relationship
              between Pearson’s r and Spearman’s ρ for the bivariate normal
              distribution.
              Let $R(X_i)$ and $R(Y_i)$ be the ranks of $X_i$ and $Y_i$. Define
              $H(t) = I\{t > 0\}$. Then, observe that

                  $$ R(X_i) = \sum_{j=1}^{n} H(X_i - X_j) + 1 \qquad (9) $$

              Note that Spearman’s ρ is Pearson’s correlation coefficient
              between $R(X_i)$ and $R(Y_i)$, which is

                  $$ \frac{h - \frac{1}{4}\, n(n-1)^{2}}{\frac{1}{12}\, n(n^{2}-1)} $$

              where $h = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} H(X_i - X_j)\, H(Y_i - Y_k)$.
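
              Identity (9) and the expression for ρ in terms of h can be
              verified on a small example. The sketch below is illustrative:
              the function names are made up here, the data are assumed free of
              ties, and the triple sum is computed naively.

```python
def H(t):
    """Indicator of t > 0."""
    return 1 if t > 0 else 0


def ranks_via_H(values):
    """Identity (9): R(X_i) = sum_j H(X_i - X_j) + 1 (no ties)."""
    return [sum(H(vi - vj) for vj in values) + 1 for vi in values]


def spearman_via_h(x, y):
    """Spearman's rho from the triple sum h."""
    n = len(x)
    h = sum(
        H(x[i] - x[j]) * H(y[i] - y[k])
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )
    return (h - n * (n - 1) ** 2 / 4) / (n * (n * n - 1) / 12)


def spearman_via_rank_differences(x, y):
    """Textbook form 1 - 6 * sum(d^2) / (n (n^2 - 1)), for comparison."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_via_H(x), ranks_via_H(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

              For x = [2.5, 0.1, 3.7, 1.2, 9.0] and y = [1.0, 0.5, 4.2, 3.3, 8.8]
              the ranks are [3, 1, 4, 2, 5] and [2, 1, 4, 3, 5], h = 29, and
              both functions give ρ of about 0.9.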


Proof continued




       Case 1
              If i, j, k are distinct, then $(X_i - X_j,\; Y_i - Y_k)$ is bivariate
              normal with zero means, variances 2 and correlation r/2,
              that is, BVN(0, 0, 2, 2, r/2).
              $E\{H(X_i - X_j)\, H(Y_i - Y_k)\}$ reduces to the integral of the
              probability density over the positive quadrant.
              Following the same technique as in the case of τ, we can check
              that this integral is $\frac{1}{2}\left(1 - \frac{1}{\pi}\cos^{-1}\frac{r}{2}\right)$.




Proof continued



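       Case 2
              If j = k ≠ i, then $(X_i - X_j,\; Y_i - Y_k)$ is distributed as
              BVN(0, 0, 2, 2, r) and the above expectation reduces to
              $\frac{1}{2}\left(1 - \frac{1}{\pi}\cos^{-1} r\right)$. Terms with i = j or i = k
              vanish because H(0) = 0. Then,

                  $$ E\left[\frac{h - \frac{1}{4}\, n(n-1)^{2}}{\frac{1}{12}\, n(n^{2}-1)}\right]
                     = \frac{6}{\pi}\left(\frac{n-2}{n+1}\sin^{-1}\frac{r}{2} + \frac{1}{n+1}\sin^{-1} r\right) \qquad (10) $$

              As n goes to infinity, the R.H.S. reduces to $\frac{6}{\pi}\sin^{-1}\frac{r}{2}$.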
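
              The step from the two cases to (10) is a matter of counting
              terms. In the sketch below, $p_1$ and $p_2$ are shorthand
              introduced here for the two quadrant probabilities; only terms
              with i different from both j and k contribute, since H(0) = 0.

```latex
% Sketch: counting the nonzero terms of h; p_1 and p_2 are shorthand introduced here.
\begin{align*}
p_1 &= \tfrac12\Bigl(1 - \tfrac1\pi\cos^{-1}\tfrac r2\Bigr)
     = \tfrac14 + \tfrac{1}{2\pi}\sin^{-1}\tfrac r2
     && (i, j, k \text{ distinct: } n(n-1)(n-2) \text{ terms}),\\
p_2 &= \tfrac12\Bigl(1 - \tfrac1\pi\cos^{-1} r\Bigr)
     = \tfrac14 + \tfrac{1}{2\pi}\sin^{-1} r
     && (j = k \neq i:\ n(n-1) \text{ terms}),\\
E(h) &= n(n-1)(n-2)\,p_1 + n(n-1)\,p_2\\
     &= \tfrac14\, n(n-1)^{2}
        + \frac{n(n-1)}{2\pi}\Bigl[(n-2)\sin^{-1}\tfrac r2 + \sin^{-1} r\Bigr],\\
\frac{E(h) - \tfrac14\, n(n-1)^{2}}{\tfrac{1}{12}\, n(n^{2}-1)}
     &= \frac{6}{\pi}\Bigl[\frac{n-2}{n+1}\sin^{-1}\tfrac r2 + \frac{1}{n+1}\sin^{-1} r\Bigr].
\end{align*}
```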




Reason for approximate linear relationship between
 Spearman’s ρ and Pearson’s r for BVN



              As observed from the graph, Spearman’s ρ for the bivariate
              normal is almost linearly related to Pearson’s r. This may
              be attributed to the expansion

                  $$ \rho = \frac{6}{\pi}\sin^{-1}\frac{r}{2}
                          = \frac{6}{\pi}\left(\frac{r}{2} + \frac{1}{6}\cdot\frac{r^{3}}{8} + \dots\right)
                          = \frac{3}{\pi}\,r + \text{terms very small compared to the first-order term}
                          \approx \frac{3}{\pi}\,r $$

              For Kendall’s τ, a similar expansion shows
              that τ is a convex function of r on the interval [0, 1].
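
              Both claims can be checked numerically on a grid of r values. The
              sketch below is illustrative: the grid spacing is an arbitrary
              choice, and the names are made up for this example.

```python
import math


def rho_bvn(r):
    """Population Spearman's rho for a bivariate normal with correlation r."""
    return 6 / math.pi * math.asin(r / 2)


def tau_bvn(r):
    """Population Kendall's tau for a bivariate normal with correlation r."""
    return 2 / math.pi * math.asin(r)


# Equally spaced grid on [0, 1]
grid = [k / 100 for k in range(101)]

# Largest gap between rho and its linear approximation (3 / pi) * r
max_gap = max(abs(rho_bvn(r) - 3 * r / math.pi) for r in grid)

# Convexity check: second differences of tau on the grid are non-negative
second_differences = [
    tau_bvn(grid[k - 1]) - 2 * tau_bvn(grid[k]) + tau_bvn(grid[k + 1])
    for k in range(1, len(grid) - 1)
]
tau_is_convex = all(d >= -1e-12 for d in second_differences)
```

              The gap is largest at r = 1, where ρ = 1 and 3/π is about 0.955,
              so the linear approximation is off by less than 0.05 throughout
              [0, 1].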




Kendall’s comparative assessment of τ and ρ


              Kendall in his paper admitted that ρ can take about
              $(n^{3} - n)/6$ values between −1 and +1, whereas τ can take only
              about $(n^{2} - n)/2$ values in
              the range, but according to him, this does not seriously affect
              the sensitivity of τ.
              Both Kendall’s τ and Spearman’s ρ computed from the
              sample have asymptotically normal distributions.
              But Kendall showed using simulation experiments that the
              distribution for his correlation coefficient is surprisingly close
              to normal even for small values of n, which is not the case for
              Spearman’s correlation.
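
              For small n, the number of distinct values each coefficient can
              take may be counted by brute force. This is a sketch with an
              illustrative function name; it enumerates all n! permutations, so
              it is only practical for small n.

```python
from itertools import permutations


def distinct_value_counts(n):
    """Return (number of distinct tau values, number of distinct rho values).

    tau is a one-to-one function of the inversion count Q, and rho is a
    one-to-one function of the sum of squared rank differences, so it is
    enough to count the distinct values of those two integers.
    """
    q_values = set()
    d2_values = set()
    for perm in permutations(range(1, n + 1)):
        q = sum(
            1
            for a in range(n)
            for b in range(a + 1, n)
            if perm[a] > perm[b]
        )
        d2 = sum((perm[i] - (i + 1)) ** 2 for i in range(n))
        q_values.add(q)
        d2_values.add(d2)
    return len(q_values), len(d2_values)
```

              For n = 4 this gives 7 values for τ and 11 for ρ, close to
              (n² − n)/2 = 6 and (n³ − n)/6 = 10; the gap between the two
              counts widens quickly as n grows.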




Bias properties of Kendall’s τ and Spearman’s ρ


              Consider a finite population of size N. Let $\rho_N$ and $\tau_N$ be
              Spearman’s and Kendall’s rank correlation coefficients
              computed from the entire population.
              Suppose that we draw a simple random sample of size n without
              replacement from that population and compute
              Spearman’s $\rho_n$ and Kendall’s $\tau_n$ from the sample.
              Then $\tau_n$ is an unbiased estimator of $\tau_N$, but $\rho_n$ is a
              biased estimator of $\rho_N$.
              If the population size N tends to infinity, the expected value of
              $\rho_n$ tends to $\frac{1}{n+1}\{3\tau_N + (n-2)\rho_N\}$, where n is the
              size of the sample.
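
              The limiting expectation makes the size of the bias easy to
              compute. The sketch below uses illustrative function names; the
              bivariate normal values used in the example come from the
              relations τ = (2/π) sin⁻¹ r and ρ = (6/π) sin⁻¹(r/2).

```python
import math


def expected_sample_rho(tau_pop, rho_pop, n):
    """Limiting expected value of the sample Spearman's rho for sample size n."""
    return (3 * tau_pop + (n - 2) * rho_pop) / (n + 1)


def bias_of_sample_rho(tau_pop, rho_pop, n):
    """Bias of the sample rho; algebraically equal to 3 (tau - rho) / (n + 1)."""
    return expected_sample_rho(tau_pop, rho_pop, n) - rho_pop


# Example: bivariate normal population with r = 0.5
r = 0.5
tau_pop = 2 / math.pi * math.asin(r)       # equals 1/3
rho_pop = 6 / math.pi * math.asin(r / 2)   # roughly 0.48
bias_n10 = bias_of_sample_rho(tau_pop, rho_pop, 10)
bias_n1000 = bias_of_sample_rho(tau_pop, rho_pop, 1000)
```

              Since the bias is 3(τ − ρ)/(n + 1), it shrinks at rate 1/n and its
              sign is that of τ − ρ; in this example the sample ρ is biased
              towards zero.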




Small sample distribution of τ, ρ and r

              It is well known that for a simple random sample of size n
              drawn from a bivariate normal distribution, under the
              assumption of zero correlation, Pearson’s r satisfies

                  $$ \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \sim t_{n-2} \qquad (11) $$

              But the distribution of r for small samples from a normal
              distribution with non-zero correlation, and from non-normal
              distributions, is not tractable.
              τ and ρ are distribution-free statistics in the sense that their
              distributions do not depend on the distribution of the data so
              long as X and Y are independent and continuous. Consequently,
              their distributions under the hypothesis of independence of X and
              Y can be tabulated.
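
              Such a table can be produced by enumeration. Under independence
              (and no ties), every permutation of the second ranking is equally
              likely, so the exact null distribution of τ follows from counting
              inversions. This is a sketch with an illustrative function name,
              practical only for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def kendall_tau_null_distribution(n):
    """Exact distribution of Kendall's tau under independence, for sample size n.

    Returns a dict mapping each attainable tau (as a Fraction) to its probability.
    """
    counts = Counter()
    for perm in permutations(range(n)):
        # Q = number of inversions of the permutation
        q = sum(
            1
            for a in range(n)
            for b in range(a + 1, n)
            if perm[a] > perm[b]
        )
        tau = 1 - Fraction(4 * q, n * (n - 1))
        counts[tau] += 1
    total = sum(counts.values())
    return {tau: Fraction(c, total) for tau, c in sorted(counts.items())}
```

              For n = 4 there are 24 equally likely permutations; τ = 1 has
              probability 1/24 and τ = 0 has probability 6/24 = 1/4.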

Asymptotic normality of r , ρ and τ



              Note that each of Pearson’s r, Spearman’s ρ and Kendall’s τ
              computed from bivariate data is asymptotically normally
              distributed.
              Asymptotic normality of Pearson’s r can be derived using
              Central Limit Theorem applied to various bivariate sample
              moments.
              Asymptotic normality of Spearman’s ρ follows from
              asymptotic normality of linear rank statistics.
              Asymptotic normality of Kendall’s τ follows from asymptotic
              normality of U-statistics.




A practical application of rank correlation



              Recently, the Ministry of Human Resource Development
              (MHRD) considered giving weightage to the marks scored in
              the 10+2 Board exams for admission to engineering colleges
              in India.
              The raw scores across the Boards are not comparable. So,
              they wanted help in this regard from the Indian Statistical
              Institute.
              The use of percentile ranks of students based on their
              aggregate scores was recommended by Indian Statistical
              Institute.
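
              The idea behind percentile ranks can be illustrated with a short
              sketch. The definition used here (percentage of students in the
              same board scoring at or below a given student) is an assumption
              made for illustration, as the slides do not spell out the exact
              definition, and the scores are made-up numbers.

```python
def percentile_ranks(scores):
    """Percentile rank of each score within its own board.

    Assumed definition: percentage of students in the board whose
    aggregate score is less than or equal to the given score.
    """
    n = len(scores)
    return [100.0 * sum(1 for other in scores if other <= s) / n for s in scores]


# Hypothetical aggregate scores from two boards with different marking scales
board_a = [50, 60, 70, 80]
board_b = [300, 360, 420, 480]

ranks_a = percentile_ranks(board_a)
ranks_b = percentile_ranks(board_b)
```

              The raw scores of the two hypothetical boards are on different
              scales, but both lists of percentile ranks come out as
              [25.0, 50.0, 75.0, 100.0], which is what makes students from
              different boards comparable.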




The Data


              Indian Statistical Institute was provided data from 4 boards
              (namely, ICSE, CBSE, West Bengal Board and
              Tamil Nadu Board) for two consecutive years, 2008 and 2009.
              Though the recommendation from Indian Statistical Institute
              was to use the aggregate score of a student for computing the
              percentile rank of the student (and that recommendation was
              favorably accepted by MHRD), a statistically interesting
              question is what happens if we consider various subject scores
              separately instead of the aggregate score.
              We intend to investigate this issue under some appropriate
              assumptions.




The Model
              For convenience, let us consider only two subjects, namely
              Mathematics and Physics.
              Let us denote the observed scores of a student in Mathematics
              and Physics by X_M and X_P. Assume the existence of
              unobserved merit variables W_M and W_P such that the scores
              in the two subjects are related to them as

                                       X_M ≈ g_M(W_M),    X_P ≈ g_P(W_P)                                    (12)

              W_M and W_P may be treated as attributes of the student
              which depend on the knowledge and understanding of Maths
              and Physics respectively, and also on other factors like
              schooling, intelligence etc.
              g_M and g_P relate to the examination procedures corresponding
              to the two subjects. They may vary across the boards.
Formulation of the model




              Two students may obtain different scores in Mathematics and
              Physics because of a difference in their merit variables W_M
              and W_P, or because of a difference in the examination
              procedures g_M and g_P across the boards.

              We now lay down our assumptions about W_M, W_P, g_M
              and g_P.




Assumptions of the model



       Assumption 1
              The functions g_M and g_P are monotonically increasing. This
              implies that the scores are expected to increase from less
              meritorious to more meritorious students in each of
              the two subjects.

       Assumption 2
              The joint distribution of (W_M, W_P) for the students is the
              same in different boards.




How Assumptions can be checked


              Imagine a common test in Mathematics and Physics taken by
              students of all the boards.
              Under Assumption 1, the Mathematics score in the common test
              would be a monotone function of the Mathematics score in the
              board examination, as both are monotone functions of the same
              merit variable. (The same holds for Physics scores.)
              This can be tested by using Spearman’s ρ and Kendall’s τ
              statistics.
              Under Assumption 2, the Mathematics and Physics scores in the
              common test would have the same joint distribution in the
              subpopulations corresponding to different boards.
              This can be tested using any non-parametric test for equality
              of bivariate distributions.
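
As a rough illustration of the check for Assumption 1, the sketch below computes Spearman's ρ and Kendall's τ for paired scores in plain Python. The function names and the six score pairs are hypothetical, made up for this example and not taken from the talk. Ties receive average ranks for ρ, and τ is the simple tau-a version, which is adequate only when ties are rare.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1
            elif prod < 0:
                s -= 1
    return s / (n * (n - 1) / 2)

# Hypothetical scores of six students (not data from the talk).
board_maths = [35, 48, 52, 67, 71, 90]    # board examination
common_maths = [20, 31, 30, 55, 60, 88]   # imagined common test, one pair swapped

rho = spearman_rho(board_maths, common_maths)
tau = kendall_tau(board_maths, common_maths)
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```

Values of ρ and τ close to 1 would be consistent with a monotone relation between the two scores. How close is close enough is a matter for a formal test, which this sketch does not attempt.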


Is there a way to check the validity of these assumptions using currently available data?




How assumptions can be checked without a common test



              According to Assumption 2, the dependence between merits
              in Physics and Mathematics should be similar in all the
              boards.
              By Assumption 1, the rank correlation between Physics and
              Mathematics scores in a particular board should not depend on
              the board-specific monotone functions g_M and g_P, since
              increasing transformations leave ranks unchanged.
              Therefore, the rank correlation between Physics and
              Mathematics scores should be the same across the boards.
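
The invariance used in this argument can be seen in a small simulation. The sketch below is only an illustration under assumed ingredients: the latent merits are drawn as a correlated Gaussian pair, the scores are taken to equal g(W) exactly rather than approximately, and the board-specific maps (linear for a hypothetical board A, logistic for a hypothetical board B) are arbitrary strictly increasing functions. None of these choices come from the talk.

```python
import math
import random

def ranks(values):
    """1-based ranks, assuming no ties (continuous values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Latent merits (W_M, W_P): an assumed correlated Gaussian pair.
rng = random.Random(0)
n = 500
w_m = [rng.gauss(0, 1) for _ in range(n)]
w_p = [0.7 * m + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0, 1) for m in w_m]

# Hypothetical board A: linear marking schemes.
def g_m_a(w):
    return 50 + 15 * w

def g_p_a(w):
    return 40 + 12 * w

# Hypothetical board B: logistic marking schemes squeezed into (0, 100).
def g_m_b(w):
    return 100 / (1 + math.exp(-w))

def g_p_b(w):
    return 100 / (1 + math.exp(-(1.5 * w - 0.5)))

rho_latent = spearman_rho(w_m, w_p)
rho_a = spearman_rho([g_m_a(w) for w in w_m], [g_p_a(w) for w in w_p])
rho_b = spearman_rho([g_m_b(w) for w in w_m], [g_p_b(w) for w in w_p])
print(rho_latent, rho_a, rho_b)  # the three values coincide
```

Here both boards share the same latent sample, so the three rank correlations agree exactly. With separate samples from the same joint distribution they would agree only up to sampling error, which is why a formal comparison is needed for real data.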




Rank correlation between Physics & Maths for different boards and years




Rank correlation Physics & Chemistry




       Figure: Rank correlation between Physics and Chemistry marks over
       years

Bar chart of rank correlation: Chemistry & Maths




       Figure: Rank correlation between Chemistry and Maths marks over years


Subject percentile graph WBHS 2008




Variation of a subject across a board same year




Inference from the data analysis
              Between-board variation is significantly higher than
              within-board variation across the two years.
              Visibly, the correlation is high in the Tamil Nadu Board,
              whereas it is low in the CBSE Board.
              If we interpret the data available as a large sample from a
              larger hypothetical population, the rank correlation computed
              for a board in a particular year will have an approximately
              normal distribution.
              So we can use these rank correlation values to carry out an
              ANOVA-type statistical analysis to see whether they differ
              significantly across different boards and across
              different years. When this is done, the rank correlations
              appear to differ significantly across different boards.
              This essentially implies a breakdown of Assumption 2.
              Study of the rank correlation brings out this fact even without
              scores of a common test.
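
One way to make such a comparison concrete is sketched below. It is not the ANOVA-type analysis of the talk, whose details are not given here, but a stand-in built on the same approximate-normality idea. Each rank correlation r is mapped by Fisher's transformation z = atanh(r), the variance of z is taken as 1.06/(n - 3) (a commonly quoted approximation for Spearman's coefficient, assumed here), and the weighted sum of squares Q is compared with a chi-square critical value on k - 1 degrees of freedom. The function name, the correlations and the sample sizes are invented for illustration and are not the board data.

```python
import math

def homogeneity_statistic(correlations, sizes, var_factor=1.06):
    """Return (Q, degrees of freedom) for testing equality of k rank correlations.

    Assumes independent samples and Var(atanh(r_i)) ~ var_factor / (n_i - 3).
    Under equality, Q is approximately chi-square with k - 1 degrees of freedom.
    """
    z = [math.atanh(r) for r in correlations]
    weights = [(n - 3) / var_factor for n in sizes]
    z_bar = sum(w * v for w, v in zip(weights, z)) / sum(weights)
    q = sum(w * (v - z_bar) ** 2 for w, v in zip(weights, z))
    return q, len(z) - 1

CHI2_95_DF3 = 7.815  # upper 5% point of chi-square with 3 degrees of freedom

# Made-up rank correlations for four hypothetical boards in one year.
made_up_r = [0.85, 0.55, 0.70, 0.70]
made_up_n = [10000, 10000, 10000, 10000]

q, df = homogeneity_statistic(made_up_r, made_up_n)
print(f"Q = {q:.1f} on {df} df; exceeds 5% critical value: {q > CHI2_95_DF3}")
```

With samples as large as examination cohorts, even modest differences in rank correlation give a very large Q, so the practical size of the differences deserves attention alongside the test result.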
Acknowledgement




       I would like to express my gratitude to my mentors for this
       project, Prof. Probal Chaudhuri and Prof. Debasis Sengupta,
       for their immense co-operation. I would also like to thank all those
       who have been associated with this work in some way or the other.




Thank You




Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching AptitudeUGC NET Exam Paper 1- Unit 1:Teaching Aptitude
UGC NET Exam Paper 1- Unit 1:Teaching Aptitude
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 

Rank correlation- some features and an application

  • 1. On some interesting features and an application of rank correlation. Kushal Kr. Dey, Indian Statistical Institute. D. Basu Memorial Award Talk 2011.
  • 2. List of contents: 1. Historical overview of rank correlation. 2. Some properties of rank correlation. 3. A practical example of rank correlation.
  • 3. Historical Overview: Correlation. In 1886, Sir Francis Galton coined the term correlation, writing that the "length of a human arm is said to be correlated with that of the leg, because a person with long arm has usually long legs and conversely." Galton wanted a measure of correlation that takes the value +1 for perfect correspondence, 0 for independence, and -1 for perfect inverse correspondence.
  • 5. Historical overview, contd. Karl Pearson, a student of Galton, worked on his idea and formulated his "product moments" measure of correlation in 1896: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ (1). Spearman observed that for characteristics that are not quantitatively measurable, the Pearsonian measure fails to measure the association. This motivated him to use rank-based methods for association and to develop his rank correlation coefficient in 1904 ["The proof and measurement of association between two things", C. Spearman, The American Journal of Psychology (1904)].
  • 7. Historical overview, contd. In 1938, two years after the death of Pearson, Maurice Kendall, a British scientist, while working on psychological experiments, came up with a new measure of correlation popularly known as Kendall’s τ ["A new measure of rank correlation", M. Kendall, Biometrika (1938)]. The next few years saw extensive research in this area by Kendall, Daniels, Hoeffding and others. In 1954, Goodman and Kruskal proposed a modification of Kendall’s coefficient for the case of ties ["Measures of association for cross classifications", Part I, L.A. Goodman and W.H. Kruskal, J. Amer. Statist. Assoc. (1954)].
  • 9. Daniels’ generalized correlation coefficient. H.E. Daniels of Cambridge University, a close associate of Kendall, proposed a measure in 1944 to unify Pearson’s r, Spearman’s ρ and Kendall’s τ ["The relation between measures of correlation in the universe of sample permutations", H.E. Daniels, Biometrika (1944)]. Consider n data points $(X_i, Y_i)$, $i = 1, \ldots, n$. To each pair $(X_i, X_j)$ of X values we allot a score $a_{ij}$ with $a_{ij} = -a_{ji}$ and $a_{ii} = 0$; similarly, we allot a score $b_{ij}$ to the pair $(Y_i, Y_j)$. Daniels’ generalized coefficient D is then given by $D = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}}{\left( \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2 \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij}^2 \right)^{1/2}}$ (2).
  • 11. Daniels’ generalized coefficient, contd. Special cases: put $a_{ij} = X_j - X_i$ and $b_{ij} = Y_j - Y_i$ to get Pearson’s r; put $a_{ij} = \mathrm{Rank}(X_j) - \mathrm{Rank}(X_i)$ and $b_{ij} = \mathrm{Rank}(Y_j) - \mathrm{Rank}(Y_i)$ to get Spearman’s ρ; put $a_{ij} = \mathrm{sgn}(X_j - X_i)$ and $b_{ij} = \mathrm{sgn}(Y_j - Y_i)$ to get Kendall’s τ.
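The three special cases of the generalized coefficient can be tried out with a minimal Python sketch. The function names and the data below are illustrative assumptions (the data are made up and are not from the talk), and ties are assumed absent.

```python
import math


def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def difference(u, v):
    """Score v - u: gives Pearson's r on raw data, Spearman's rho on ranks."""
    return v - u


def sign(u, v):
    """Score sgn(v - u): gives Kendall's tau."""
    return (v > u) - (v < u)


def generalized_coefficient(x, y, score):
    """D = sum(a_ij * b_ij) / sqrt(sum(a_ij^2) * sum(b_ij^2)) over all i, j."""
    n = len(x)
    num = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])
            b = score(y[i], y[j])
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / math.sqrt(sum_a2 * sum_b2)


# Made-up illustrative data, not taken from the talk.
x = [1.2, 3.4, 2.2, 5.1, 4.0]
y = [2.0, 3.1, 2.9, 4.8, 5.5]

pearson_r = generalized_coefficient(x, y, difference)
spearman_rho = generalized_coefficient(ranks(x), ranks(y), difference)
kendall_tau = generalized_coefficient(x, y, sign)
print(pearson_r, spearman_rho, kendall_tau)
```

Passing the score as a function keeps the double sum identical across the three coefficients, which is the point of the unification: only the pairwise score changes.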
  • 12. Alternative expression for τ and ρ. First, we define $d_{ij}$ to be +1 when the rank j (j > i) precedes the rank i in the second ranking, and zero otherwise. We can write Kendall’s τ as $\tau = 1 - \frac{4Q}{n(n-1)}$ (3), where $Q = \sum_{i<j} d_{ij}$ is the total score and n is the total number of elements in the sample. Similarly, we can write Spearman’s ρ as $\rho = 1 - \frac{12V}{n(n^2-1)}$ (4), where $V = \sum_{i<j} (j-i) d_{ij}$ is the sum of inversions weighted by the numerical difference between the ranks inverted. This difference is called the weight of the inversion.
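As a rough illustration of the inversion-based expressions (3) and (4), the sketch below assumes the items are listed in the order of the first ranking, so that the input is simply the sequence of their ranks in the second ranking (a permutation of 1, ..., n without ties). The function names are hypothetical.

```python
def inversion_statistics(second_ranking):
    """Return (Q, V): the number of inversions and their weighted sum.

    An inversion is a pair in which the larger rank j appears before the
    smaller rank i; its weight is j - i.
    """
    n = len(second_ranking)
    q = 0
    v = 0
    for a in range(n):
        for b in range(a + 1, n):
            earlier, later = second_ranking[a], second_ranking[b]
            if earlier > later:
                q += 1
                v += earlier - later
    return q, v


def tau_and_rho(second_ranking):
    """Kendall's tau and Spearman's rho from equations (3) and (4)."""
    n = len(second_ranking)
    q, v = inversion_statistics(second_ranking)
    tau = 1 - 4 * q / (n * (n - 1))
    rho = 1 - 12 * v / (n * (n * n - 1))
    return tau, rho


# One adjacent swap in an otherwise identical ranking of five items.
print(tau_and_rho([1, 2, 3, 5, 4]))
# Identical and fully reversed rankings give the extreme values.
print(tau_and_rho([1, 2, 3, 4]), tau_and_rho([4, 3, 2, 1]))
```

The weighted sum V equals half the usual sum of squared rank differences, which is why (4) agrees with the familiar formula for ρ.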
  • 14. An interesting result. We simulated large samples from a bivariate normal distribution and plotted the mean values of Spearman’s ρ and Kendall’s τ against Pearson’s r. We obtained the following graph.
  • 15. The graph.
  • 16. Relation of τ and ρ with r for BVN. In 1907, Pearson, in his book ["On Further Methods of Determining Correlation", Karl Pearson, Biometric Series IV (1907)], established the following relation between Spearman’s ρ and his r for the bivariate normal distribution: $r = 2 \sin\left(\frac{\pi}{6} \rho\right)$ (5). Cramér, in 1946, also established a relation between Kendall’s τ and Pearson’s r for the bivariate normal: $r = \sin\left(\frac{\pi}{2} \tau\right)$ (6). However, it is easy to show that the above two relations hold for any elliptic distribution.
  • 19. Relation between Kendall’s τ and r for bivariate normal. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample drawn from BVN(0, 0, 1, 1, r). Then Kendall’s τ computed from the data is an unbiased estimator of $2P((X_1 - X_2)(Y_1 - Y_2) > 0) - 1 = 2P(Z_1 Z_2 > 0) - 1$ (7), where $(Z_1, Z_2) = (X_1 - X_2, Y_1 - Y_2)$ is bivariate normal with zero means, variances 2 and covariance 2r (that is, correlation r). Note that $(Z_1, Z_2) \overset{d}{=} \sqrt{2}\,(V\sqrt{1-r^2} + Wr,\; W)$, where V and W are independent standard normal variables. Since $(Z_1, Z_2)$ is symmetric about (0, 0), $2P(Z_1 Z_2 > 0) - 1 = 4P(Z_1 > 0, Z_2 > 0) - 1 = 4P(V\sqrt{1-r^2} + Wr > 0,\; W > 0) - 1$ (8). Use the polar transformation on (V, W) and evaluate this probability to get $\frac{2}{\pi} \sin^{-1} r$.
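The closed form $\frac{2}{\pi} \sin^{-1} r$ can be checked numerically. The following is a small Monte Carlo sketch, not the simulation used in the talk: the function name, the seed and the number of replications are arbitrary choices, and the pairs are generated with the same construction $(V\sqrt{1-r^2} + Wr, W)$ used in the derivation.

```python
import math
import random


def simulate_tau_parameter(r, n_pairs, seed=1):
    """Estimate 2 * P((X1 - X2) * (Y1 - Y2) > 0) - 1 for BVN(0, 0, 1, 1, r)."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    concordant = 0
    for _ in range(n_pairs):
        # Two independent draws (X, Y) with unit variances and correlation r.
        v1, w1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v2, w2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1, y1 = v1 * s + w1 * r, w1
        x2, y2 = v2 * s + w2 * r, w2
        if (x1 - x2) * (y1 - y2) > 0:
            concordant += 1
    return 2.0 * concordant / n_pairs - 1.0


r = 0.5
estimate = simulate_tau_parameter(r, 40000)
theory = 2.0 / math.pi * math.asin(r)
print(estimate, theory)
```

With 40000 replications the Monte Carlo standard error is roughly 0.005, so the estimate should sit close to the theoretical value of 1/3 for r = 0.5.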
  • 21. Relation between Spearman’s ρ and r for bivariate normal. We now sketch a proof of the relationship between Pearson’s r and Spearman’s ρ for the bivariate normal distribution. Let $R(X_i)$ and $R(Y_i)$ be the ranks of $X_i$ and $Y_i$. Define $H(t) = I_{\{t > 0\}}$. Then observe that $R(X_i) = \sum_{j=1}^{n} H(X_i - X_j) + 1$ (9). Note that Spearman’s ρ is the Pearson correlation coefficient between $R(X_i)$ and $R(Y_i)$, which is $\frac{h - \frac{1}{4} n(n-1)^2}{\frac{1}{12} n(n^2-1)}$, where $h = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} H(X_i - X_j) H(Y_i - Y_k)$.
  • 22. Proof continued. Case 1: If i, j, k are distinct, then $(X_i - X_j, Y_i - Y_k)$ is bivariate normal with zero means, variances 2 and correlation $\frac{r}{2}$. $E\{H(X_i - X_j) H(Y_i - Y_k)\}$ reduces to the integral of the probability density over the positive quadrant. Following the same technique as in the case of τ, we can check that this integral is $\frac{1}{2}\left(1 - \frac{1}{\pi} \cos^{-1} \frac{r}{2}\right)$.
  • 23. Proof continued. Case 2: If $i \neq j = k$, then $(X_i - X_j, Y_i - Y_k)$ is bivariate normal with zero means, variances 2 and correlation r, and the above expectation reduces to $\frac{1}{2}\left(1 - \frac{1}{\pi} \cos^{-1} r\right)$. Terms with j = i or k = i vanish because H(0) = 0. Then $E\left[\frac{h - \frac{1}{4} n(n-1)^2}{\frac{1}{12} n(n^2-1)}\right] = \frac{6}{\pi}\left(\frac{n-2}{n+1} \sin^{-1} \frac{r}{2} + \frac{1}{n+1} \sin^{-1} r\right)$ (10). As n goes to infinity, the R.H.S. reduces to $\frac{6}{\pi} \sin^{-1} \frac{r}{2}$.
  • 24. Reason for approximate linear relationship between Spearman’s ρ and Pearson’s r for BVN. As observed from the graph, Spearman’s ρ for the bivariate normal is almost linearly related to Pearson’s r. This may be attributed to the fact that $\rho = \frac{6}{\pi} \sin^{-1} \frac{r}{2} = \frac{6}{\pi}\left(\frac{r}{2} + \frac{1}{6} \cdot \frac{r^3}{8} + \ldots\right) = \frac{3}{\pi} r + \text{terms very small compared to the first-order term} \approx \frac{3}{\pi} r$. For Kendall’s τ, using a similar expansion, we can also show that τ is a convex function of r in the interval [0, 1].
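The size of the neglected higher-order terms can be checked on a grid. The sketch below, with hypothetical function names and an arbitrary grid of 101 points, compares the exact bivariate normal value of ρ with the linear term $\frac{3}{\pi} r$ and checks the convexity of τ in r through second differences.

```python
import math


def rho_from_r(r):
    """Spearman's rho for the bivariate normal: (6 / pi) * arcsin(r / 2)."""
    return (6.0 / math.pi) * math.asin(r / 2.0)


def tau_from_r(r):
    """Kendall's tau for the bivariate normal: (2 / pi) * arcsin(r)."""
    return (2.0 / math.pi) * math.asin(r)


grid = [k / 100.0 for k in range(101)]

# Largest gap between rho and its first-order term on [0, 1].
max_gap = max(abs(rho_from_r(r) - 3.0 * r / math.pi) for r in grid)

# Second differences of tau; non-negative values indicate convexity.
second_differences = [
    tau_from_r(grid[k + 1]) - 2.0 * tau_from_r(grid[k]) + tau_from_r(grid[k - 1])
    for k in range(1, len(grid) - 1)
]
print(max_gap, min(second_differences))
```

The gap grows with r and is largest at r = 1, where it equals $1 - \frac{3}{\pi}$, about 0.045.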
  • 25. Kendall’s comparative assessment of τ and ρ. Kendall in his paper admitted that ρ can take $\frac{n^3 - n}{6}$ values between −1 and +1, whereas τ can take only $\frac{n^2 - n}{2}$ values in that range, but according to him this does not seriously affect the sensitivity of τ. Both Kendall’s τ and Spearman’s ρ computed from the sample have asymptotically normal distributions. But Kendall showed using simulation experiments that the distribution of his correlation coefficient is surprisingly close to normal even for small values of n, which is not the case for Spearman’s correlation.
  • 26. Bias properties of Kendall’s τ and Spearman’s ρ. Consider a finite population, and let ρ and τ denote Spearman’s and Kendall’s rank correlation coefficients computed from the entire population. Suppose that we draw a simple random sample without replacement from that population and compute Spearman’s ρ and Kendall’s τ from the sample. Then the sample τ is an unbiased estimator of the population τ, but the sample ρ is a biased estimator of the population ρ. If the population size N tends to infinity, the expected value of the sample ρ goes to $\frac{1}{n+1}\{3\tau + (n-2)\rho\}$, where n is the size of the sample and τ and ρ on the right-hand side are the population values.
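The unbiasedness of the sample τ can be verified exactly on a toy population by enumerating every simple random sample without replacement. The six-point population below is a made-up example without ties, and the function names are hypothetical.

```python
from itertools import combinations


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(pairs):
    """Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(pairs)
    score = sum(
        sign(x2 - x1) * sign(y2 - y1)
        for (x1, y1), (x2, y2) in combinations(pairs, 2)
    )
    return 2 * score / (n * (n - 1))


def spearman_rho(pairs):
    """Spearman's rho from within-sample ranks (assumes no ties)."""
    n = len(pairs)
    xs = sorted(p[0] for p in pairs)
    ys = sorted(p[1] for p in pairs)
    d2 = sum((xs.index(x) - ys.index(y)) ** 2 for x, y in pairs)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up finite population of N = 6 (x, y) pairs.
population = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
n = 3
samples = list(combinations(population, n))  # all 20 samples of size 3

pop_tau = kendall_tau(population)
pop_rho = spearman_rho(population)
mean_tau = sum(kendall_tau(s) for s in samples) / len(samples)
mean_rho = sum(spearman_rho(s) for s in samples) / len(samples)
print(pop_tau, mean_tau)  # both 0.6 up to rounding
print(pop_rho, mean_rho)  # about 0.829 versus 0.7
```

In this example the average of the sample τ over all 20 samples reproduces the population value, while the average of the sample ρ falls short of the population ρ.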
  • 27. Small sample distribution of τ, ρ and r. It is well known that for a simple random sample of size n drawn from a bivariate normal distribution, under the assumption of zero correlation, Pearson’s r satisfies $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$ (11). But the distribution of r for small samples from a normal distribution with non-zero correlation, and from non-normal distributions, is not tractable. τ and ρ are distribution-free statistics in the sense that their distributions do not depend on the distribution of the data so long as X and Y are independent. Consequently, their distributions under the hypothesis of independence of X and Y can be tabulated.
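The tabulation mentioned above can be sketched by brute force for small n. The code assumes continuous data (no ties), so that under independence every relative ordering of the Y ranks is equally likely; the function name is hypothetical and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def null_distribution_tau(n):
    """Exact distribution of Kendall's tau under independence, no ties."""
    counts = Counter()
    for perm in permutations(range(n)):
        # Q = number of inversions of the permutation.
        q = sum(
            1
            for a in range(n)
            for b in range(a + 1, n)
            if perm[a] > perm[b]
        )
        tau = Fraction(n * (n - 1) - 4 * q, n * (n - 1))
        counts[tau] += 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


for tau, prob in null_distribution_tau(4).items():
    print(tau, prob)
```

Enumeration costs n! permutations, so this approach is only practical for small n; for larger samples the normal approximation is the usual route.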
  • 28. Asymptotic normality of r, ρ and τ. Note that each of Pearson’s r, Spearman’s ρ and Kendall’s τ computed from bivariate data is asymptotically normally distributed. Asymptotic normality of Pearson’s r can be derived using the Central Limit Theorem applied to various bivariate sample moments. Asymptotic normality of Spearman’s ρ follows from asymptotic normality of linear rank statistics. Asymptotic normality of Kendall’s τ follows from asymptotic normality of U-statistics.
  • 29. List of contents: Historical overview of rank correlation. Some properties of rank correlation. A practical example of rank correlation.
  • 30. A practical application of rank correlation. Recently, the Ministry of Human Resource Development (MHRD) considered giving weightage to the marks scored in the 10+2 Board exams for admission to engineering colleges in India. The raw scores across the Boards are not comparable, so MHRD sought help in this regard from the Indian Statistical Institute. The Indian Statistical Institute recommended the use of percentile ranks of students based on their aggregate scores.
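A minimal sketch of within-board percentile ranks is given below. The talk does not spell out the exact definition, so the code assumes one common convention: the percentage of students in the same board whose aggregate score is at most the student's own. The scores and the function name are made up for illustration.

```python
from bisect import bisect_right


def percentile_ranks(scores):
    """Percentage of students in the board scoring at most each score.

    This "at most" convention is an assumption; other definitions exist.
    """
    ordered = sorted(scores)
    n = len(scores)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


# Hypothetical aggregate scores in two boards with different marking scales.
board_a = [410, 455, 390, 480]
board_b = [300, 350, 280, 365]

print(percentile_ranks(board_a))  # [50.0, 75.0, 25.0, 100.0]
print(percentile_ranks(board_b))  # [50.0, 75.0, 25.0, 100.0]
```

Although the raw scores in the two hypothetical boards are on different scales, students in the same relative position receive the same percentile rank, which is what makes the ranks comparable across boards.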
  • 32. The Data. The Indian Statistical Institute was provided data from 4 boards (namely ICSE, CBSE, West Bengal Board and Tamil Nadu Board) for two consecutive years, 2008 and 2009. Though the recommendation from the Indian Statistical Institute was to use the aggregate scores of a student for computing the percentile rank of the student (and that recommendation was favorably accepted by MHRD), a statistically interesting question is what happens if we consider the various subject scores separately instead of the aggregate score. We intend to investigate this issue under some appropriate assumptions.
  • 35. The Model. For convenience, let us consider only two subjects, namely Mathematics and Physics. Let us denote the observed scores of a student in Mathematics and Physics by $X_M$ and $X_P$. Assume the existence of unobserved merit variables $W_M$ and $W_P$ such that the scores in the two subjects are related as $X_M \approx g_M(W_M)$, $X_P \approx g_P(W_P)$ (12). $W_M$ and $W_P$ may be treated as attributes of the student which depend on the knowledge and understanding of Maths and Physics respectively, and also on other factors like schooling, intelligence etc. $g_M$ and $g_P$ relate to the examination procedure corresponding to the two subjects. They may vary across the boards.
  • 38. Formulation of the model. Two students may obtain different scores in Mathematics and Physics because of the difference in their merit variables $W_M$ and $W_P$, or due to the difference in the examination procedures $g_M$ and $g_P$ across the boards. It is time that we lay down our assumptions about $W_M$, $W_P$, $g_M$ and $g_P$.
  • 39. Assumptions of the model. Assumption 1: The functions $g_P$ and $g_M$ are monotonically increasing. This implies that the scores are expected to increase from less meritorious to more meritorious students in each of the two subjects. Assumption 2: The joint distribution of $(W_P, W_M)$ for the students is the same in different boards.
  • 40. How assumptions can be checked. Imagine a common test in Mathematics and Physics taken by students of all the boards. The Mathematics score in the common test would be a monotone function of the Mathematics score in the board examination, as both are monotone functions of the same merit variable (the same holds for Physics scores). This can be tested by using Spearman’s ρ and Kendall’s τ statistics. Mathematics and Physics scores in the common test would have the same distribution in the subpopulations corresponding to different boards. This can be tested using any non-parametric test for equality of bivariate distributions.
  • 41. Is there a way to check the validity of these assumptions using currently available data?
  • 42. How assumptions can be checked without a common test. According to Assumption 2, the dependence between merits in Physics and Mathematics should be similar in all the boards. The rank correlation between Physics and Mathematics scores in a particular board should not depend on the board-specific monotone functions $g_M$ and $g_P$. Therefore, the rank correlation between Physics and Mathematics scores should be the same across the boards.
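The invariance that this argument relies on can be illustrated with a short sketch: applying different strictly increasing functions to the same merit values leaves Kendall’s τ unchanged. The merit values and the four board-specific functions below are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / number of pairs, no ties."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        dx = (x[j] > x[i]) - (x[j] < x[i])
        dy = (y[j] > y[i]) - (y[j] < y[i])
        score += dx * dy
    return 2 * score / (n * (n - 1))


# Hypothetical unobserved merits of six students.
merit_math = [-1.2, 0.3, 0.8, -0.4, 1.5, 0.1]
merit_phys = [-0.7, 0.5, 0.2, -0.9, 1.1, 0.6]


# Hypothetical strictly increasing examination functions for two boards.
def g_math_board_a(w):
    return 100.0 / (1.0 + math.exp(-w))


def g_phys_board_a(w):
    return 50.0 + 20.0 * w


def g_math_board_b(w):
    return 60.0 + 5.0 * w ** 3


def g_phys_board_b(w):
    return 40.0 + 30.0 * math.atan(w)


tau_merit = kendall_tau(merit_math, merit_phys)
tau_board_a = kendall_tau(
    [g_math_board_a(w) for w in merit_math],
    [g_phys_board_a(w) for w in merit_phys],
)
tau_board_b = kendall_tau(
    [g_math_board_b(w) for w in merit_math],
    [g_phys_board_b(w) for w in merit_phys],
)
print(tau_merit, tau_board_a, tau_board_b)
```

All three values coincide because τ depends only on the ordering of the observations, and a strictly increasing transformation preserves that ordering. The same holds for Spearman’s ρ.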
  • 43. Rank correlation between Physics & Maths for different boards and years.
  • 44. Rank correlation, Physics & Chemistry. Figure: Rank correlation between Physics and Chemistry marks over years.
  • 45. Bar chart of rank correlation, Chemistry & Maths. Figure: Rank correlation between Chemistry and Maths marks over years.
  • 46. Subject percentile graph, WBHS 2008.
  • 47. Variation of a subject across a board, same year.
  • 48. Inference from the data analysis Between-board variation is significantly higher than within-board variation across the two years. Visibly, there is high correlation in the Tamil Nadu Board, whereas low correlation is observed in the CBSE Board. If we interpret the available data as a large sample from a larger hypothetical population, the rank correlation computed for a board in a particular year will have an approximately normal distribution. So, we can use these rank correlation values to carry out an ANOVA-type statistical analysis to see whether the values differ significantly across boards and across years. When this is done, the rank correlation appears to differ significantly across boards. This essentially implies a breakdown of Assumption 2. Study of the rank correlation brings out this fact even without the scores of a common test.
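One possible form of such an ANOVA-type analysis is a two-way layout without replication, with boards as rows and years as columns. The sketch below is an assumption about how the analysis could be set up, not the procedure actually used in the talk, and the table of rank correlations is made up purely for illustration (the board labels are placeholders). It computes the F statistics for the board and year effects; these would then be compared with critical values of the F distribution with the reported degrees of freedom.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication.

    table[i][j] is the value for row level i (board) and column level j (year).
    Returns F statistics and degrees of freedom for rows and columns.
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    # Sums of squares for boards (rows), years (columns), and residual error.
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_boards": (ss_rows / df_rows) / ms_error,
        "F_years": (ss_cols / df_cols) / ms_error,
        "df_boards": (df_rows, df_error),
        "df_years": (df_cols, df_error),
    }


# Hypothetical rank correlations for four boards over two years
# (illustrative numbers only, not the values shown in the talk).
rank_corr = {
    "Board A": [0.80, 0.82],
    "Board B": [0.55, 0.53],
    "Board C": [0.68, 0.70],
    "Board D": [0.62, 0.61],
}

result = two_way_anova(list(rank_corr.values()))
print(result)
```

A large F statistic for boards together with a small one for years would correspond to the pattern described on the slide: variation between boards dominating variation within a board across years.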
  • 53. Acknowledgement I would like to express my gratitude towards my mentors for this project, Prof. Probal Chaudhuri and Prof. Debasis Sengupta, for their immense co-operation. I would also like to thank all those who have been associated with this work in some way or the other.
  • 54. Thank You