Rank correlation: some features and an application

This presentation was given by Kushal Kumar Dey, a B.Stat (undergraduate) student at the Indian Statistical Institute, Kolkata, for the D. Basu Memorial Award.


1. On some interesting features and an application of rank correlation
Kushal Kr. Dey, Indian Statistical Institute. D. Basu Memorial Award Talk 2011.

2. List of contents
   1. Historical overview of rank correlation.
   2. Some properties of rank correlation.
   3. A practical example of rank correlation.

3. Historical overview: correlation
In 1888, Sir Francis Galton introduced the term "co-relation" (now correlation), illustrating it as follows: the length of the human arm is said to be correlated with that of the leg, because a person with a long arm usually has long legs, and conversely. Galton wanted a measure of correlation that takes the value +1 for perfect correspondence, 0 for independence, and -1 for perfect inverse correspondence.

5. Historical overview (contd.)
Karl Pearson, a protégé of Galton, developed this idea and formulated his "product-moment" measure of correlation in 1896:
$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} \tag{1} $$
Spearman observed that for characteristics that cannot be measured quantitatively, the Pearsonian measure cannot be used to quantify association. This motivated him to use rank-based methods for association and to develop his rank correlation coefficient in 1904 ["The proof and measurement of association between two things", C. Spearman, The American Journal of Psychology (1904)].

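A minimal Python sketch of formula (1) follows. It assumes that $S_{xy}$, $S_{xx}$ and $S_{yy}$ are the sums of products and squares of deviations from the sample means, since the slide does not define them. The helper name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation r = S_xy / sqrt(S_xx * S_yy),
    with S_xy, S_xx, S_yy taken about the sample means (assumed definition)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# A perfect increasing linear relation gives +1; reversing one variable gives -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```

Note that r uses the raw values. Spearman's objection is exactly that such values may not exist for qualitative characteristics.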
7. Historical overview (contd.)
In 1938, two years after Pearson's death, Maurice Kendall, a British statistician, proposed a new measure of correlation while working on psychological experiments. It is now popularly known as Kendall's τ ["A new measure of rank correlation", M. Kendall, Biometrika (1938)]. The next few years saw extensive research in this area by Kendall, Daniels, Hoeffding and others. In 1954, Goodman and Kruskal proposed a modification of Kendall's coefficient for the case of ties ["Measures of association for cross classifications", Part I, L. A. Goodman and W. H. Kruskal, J. Amer. Statist. Assoc. (1954)].

9. Daniels' generalized correlation coefficient
H. E. Daniels of Cambridge University, a close associate of Kendall, proposed a measure in 1944 that unifies Pearson's r, Spearman's ρ and Kendall's τ ["The relation between measures of correlation in the universe of sample permutations", H. E. Daniels, Biometrika (1944)]. Consider n data points $(X_i, Y_i)$, $i = 1, 2, \ldots, n$. To each pair of X's, $(X_i, X_j)$, allot a score $a_{ij}$ with $a_{ij} = -a_{ji}$ and $a_{ii} = 0$. Similarly, allot a score $b_{ij}$ to each pair $(Y_i, Y_j)$. Daniels' generalized coefficient D is then
$$ D = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, b_{ij}}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2 \cdot \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}^2\right)^{1/2}} \tag{2} $$

11. Daniels' generalized coefficient (contd.): special cases
Put $a_{ij} = X_j - X_i$ and $b_{ij} = Y_j - Y_i$ to get Pearson's r.
Put $a_{ij} = \mathrm{Rank}(X_j) - \mathrm{Rank}(X_i)$ and $b_{ij} = \mathrm{Rank}(Y_j) - \mathrm{Rank}(Y_i)$ to get Spearman's ρ.
Put $a_{ij} = \operatorname{sgn}(X_j - X_i)$ and $b_{ij} = \operatorname{sgn}(Y_j - Y_i)$ to get Kendall's τ.

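The three special cases can be checked numerically with the Python sketch below. It implements formula (2) with a pluggable antisymmetric score function. The function names (`daniels_D`, `diff`, `sign_diff`) and the data are hypothetical, and ties are assumed absent.

```python
import math

def sgn(t):
    """Sign function: +1, 0 or -1."""
    return (t > 0) - (t < 0)

def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def daniels_D(xs, ys, score):
    """Daniels' coefficient (2) with a_ij = score(x_i, x_j), b_ij = score(y_i, y_j).
    `score` must be antisymmetric, so that a_ij = -a_ji and a_ii = 0."""
    num = saa = sbb = 0.0
    n = len(xs)
    for i in range(n):
        for j in range(n):
            a = score(xs[i], xs[j])
            b = score(ys[i], ys[j])
            num += a * b
            saa += a * a
            sbb += b * b
    return num / math.sqrt(saa * sbb)

def diff(u, v):
    return v - u          # a_ij = X_j - X_i

def sign_diff(u, v):
    return sgn(v - u)     # a_ij = sgn(X_j - X_i)

x = [1.2, 3.4, 2.2, 5.0, 4.1]
y = [2.0, 2.9, 3.5, 6.1, 4.0]
r = daniels_D(x, y, diff)                    # Pearson's r
rho = daniels_D(ranks(x), ranks(y), diff)    # Spearman's rho
tau = daniels_D(x, y, sign_diff)             # Kendall's tau
print(r, rho, tau)
```

For this data, the ranks are (1, 3, 2, 5, 4) and (1, 2, 3, 5, 4), so ρ = 0.9. Only one of the ten pairs is discordant, so τ = 0.8.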
12. Alternative expressions for τ and ρ
List the objects in the order of the second ranking, labelling each by its rank in the first ranking. For $i < j$, define $d_{ij} = 1$ when rank j precedes rank i in the second ranking (an inversion), and $d_{ij} = 0$ otherwise. Kendall's τ can then be written as
$$ \tau = 1 - \frac{4Q}{n(n-1)} \tag{3} $$
where $Q = \sum_{i<j} d_{ij}$ is the total score (the number of inversions) and n is the number of elements in the sample. Similarly, Spearman's ρ can be written as
$$ \rho = 1 - \frac{12V}{n(n^2-1)} \tag{4} $$
where $V = \sum_{i<j} (j-i)\, d_{ij}$ is the sum of the inversions, each weighted by the numerical difference between the two ranks inverted. This difference is called the weight of the inversion.

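Expressions (3) and (4) can be checked by brute force. The sketch below computes τ and ρ from the inversion counts Q and V and compares them with the usual concordance and $\sum d^2$ formulas over every ordering of five objects. The representation of a ranking as a tuple `order` of first-ranking labels is an assumption made for illustration.

```python
from itertools import permutations

def tau_rho_by_inversions(order):
    """order: first-ranking labels 1..n listed in the order of the second ranking.
    d_ij = 1 when label j (> i) comes before label i; see (3) and (4)."""
    n = len(order)
    pos = {label: k for k, label in enumerate(order)}
    Q = V = 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if pos[j] < pos[i]:      # d_ij = 1: an inversion
                Q += 1
                V += j - i           # weight of the inversion
    tau = 1 - 4 * Q / (n * (n - 1))
    rho = 1 - 12 * V / (n * (n * n - 1))
    return tau, rho

def tau_rho_direct(order):
    """The same coefficients from the concordance count and from sum of d^2."""
    n = len(order)
    second_rank = {label: k + 1 for k, label in enumerate(order)}
    s = [second_rank[i] for i in range(1, n + 1)]   # second rank of object i
    conc = sum(1 if s[j] > s[i] else -1 for i in range(n) for j in range(i + 1, n))
    tau = 2 * conc / (n * (n - 1))
    d2 = sum((i + 1 - s[i]) ** 2 for i in range(n))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return tau, rho

# Both routes agree on every ranking of 5 objects.
for order in permutations(range(1, 6)):
    a, b = tau_rho_by_inversions(order), tau_rho_direct(order)
    assert abs(a[0] - b[0]) < 1e-12 and abs(a[1] - b[1]) < 1e-12

print(tau_rho_by_inversions((3, 1, 2)))   # tau = -1/3, rho = -1/2
```

The agreement reflects the identity $\sum d^2 = 2V$, which turns Spearman's usual formula $1 - 6\sum d^2 / (n(n^2-1))$ into (4).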
14. An interesting result
We simulated large samples from bivariate normal distributions with a range of correlations and plotted the mean values of Spearman's ρ and Kendall's τ against Pearson's r. We obtained the following graph.

15. The graph
[Figure: mean Spearman's ρ and mean Kendall's τ plotted against Pearson's r for simulated bivariate normal samples.]

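The simulation described in slide 14 can be approximated with the sketch below. It uses only the Python standard library, draws BVN(0, 0, 1, 1, r) samples, and prints the mean ρ and τ for several values of r instead of plotting them. The sample size, number of replications and seed are hypothetical choices, not the ones used for the original graph.

```python
import math
import random

def ranks(v):
    """Ranks 1..n (continuous data, so ties have probability zero)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r

def spearman(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(x, y):
    n = len(x)
    s = sum(((x[j] > x[i]) - (x[j] < x[i])) * ((y[j] > y[i]) - (y[j] < y[i]))
            for i in range(n) for j in range(i + 1, n))
    return 2 * s / (n * (n - 1))

def bvn_sample(n, r, rng):
    """n draws from BVN(0, 0, 1, 1, r) via Y = rX + sqrt(1 - r^2) Z."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1) for x in xs]
    return xs, ys

def mean_rho_tau(r, n=150, reps=40, seed=1):
    """Monte Carlo means of sample rho and tau (hypothetical n, reps, seed)."""
    rng = random.Random(seed)
    rho_sum = tau_sum = 0.0
    for _ in range(reps):
        x, y = bvn_sample(n, r, rng)
        rho_sum += spearman(x, y)
        tau_sum += kendall(x, y)
    return rho_sum / reps, tau_sum / reps

results = {r: mean_rho_tau(r) for r in [0.0, 0.25, 0.5, 0.75, 0.95]}
for r, (rho, tau) in results.items():
    print(f"r = {r:.2f}   mean rho = {rho:.3f}   mean tau = {tau:.3f}")
```

The O(n²) Kendall loop keeps the sketch dependency-free. For large n, an O(n log n) merge-sort count of discordant pairs is the usual alternative.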
16. Relation of τ and ρ with r for the BVN
In 1907, Pearson ["On Further Methods of Determining Correlation", Karl Pearson, Biometric Series IV (1907)] established the following relation between Spearman's ρ and his r for the bivariate normal distribution:
$$ r = 2\sin\left(\frac{\pi}{6}\,\rho\right) \tag{5} $$
Cramér, in 1946, also established a relation between Kendall's τ and Pearson's r for the bivariate normal:
$$ r = \sin\left(\frac{\pi}{2}\,\tau\right) \tag{6} $$
Relation (6) in fact holds for every elliptical distribution, with r the correlation parameter. Relation (5), however, is specific to the bivariate normal and does not carry over to elliptical distributions in general.

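One way to use (5) and (6) is to convert a population (or estimated) rank correlation into a value of r under the bivariate normal model. The Python sketch below implements both relations and their inverses and checks that they round-trip. All function names are hypothetical.

```python
import math

def r_from_spearman(rho):
    """Relation (5): r = 2 sin(pi * rho / 6)."""
    return 2 * math.sin(math.pi * rho / 6)

def r_from_kendall(tau):
    """Relation (6): r = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)

def spearman_from_r(r):
    """Inverse of (5): rho = (6 / pi) arcsin(r / 2)."""
    return 6 / math.pi * math.asin(r / 2)

def kendall_from_r(r):
    """Inverse of (6): tau = (2 / pi) arcsin(r)."""
    return 2 / math.pi * math.asin(r)

for r in [-0.9, -0.3, 0.0, 0.5, 0.8]:
    assert abs(r_from_spearman(spearman_from_r(r)) - r) < 1e-12
    assert abs(r_from_kendall(kendall_from_r(r)) - r) < 1e-12

print(r_from_spearman(0.5), r_from_kendall(0.5))
```

Both maps fix -1, 0 and +1, but for intermediate values r exceeds τ noticeably, while r and ρ stay close.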
19. Relation between Kendall's τ and r for the bivariate normal
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample drawn from BVN(0, 0, 1, 1, r). Then Kendall's τ computed from the data is an unbiased estimator of
$$ 2P\big((X_1 - X_2)(Y_1 - Y_2) > 0\big) - 1 = 2P(Z_1 Z_2 > 0) - 1 \tag{7} $$
where $(Z_1, Z_2) = (X_1 - X_2,\, Y_1 - Y_2) \sim \mathrm{BVN}(0, 0, 2, 2, r)$. Note that $(Z_1, Z_2) \overset{d}{=} \sqrt{2}\,\big(V\sqrt{1-r^2} + Wr,\; W\big)$, where V and W are independent standard normal variables. Since $(Z_1, Z_2)$ is symmetric about (0, 0),
$$ 2P(Z_1 Z_2 > 0) - 1 = 4P(Z_1 > 0, Z_2 > 0) - 1 = 4P\big(V\sqrt{1-r^2} + Wr > 0,\; W > 0\big) - 1 \tag{8} $$
Applying a polar transformation to (V, W) and evaluating this probability gives $\frac{2}{\pi}\sin^{-1} r$.

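The polar-transformation step at the end of slide 19 can be written out as follows. The substitution $r = \sin\varphi$ is a convenience introduced here, not notation from the slide. The key facts are that the angle θ of (V, W) is uniform on $[0, 2\pi)$, and that for $\varphi \in [-\pi/2, \pi/2]$ the two angular conditions intersect in $(0, \varphi + \pi/2)$.

```latex
$$
\begin{aligned}
&\text{Let } \varphi = \sin^{-1} r \in [-\tfrac{\pi}{2}, \tfrac{\pi}{2}],\ \text{so } \sqrt{1-r^2} = \cos\varphi;\quad
V = R\cos\theta,\ W = R\sin\theta,\ \theta \sim \mathrm{Uniform}[0, 2\pi).\\
&V\sqrt{1-r^2} + Wr = R(\cos\theta\cos\varphi + \sin\theta\sin\varphi) = R\cos(\theta - \varphi),\\
&\{V\sqrt{1-r^2} + Wr > 0,\ W > 0\} = \{\theta \in (\varphi - \tfrac{\pi}{2}, \varphi + \tfrac{\pi}{2})\} \cap \{\theta \in (0, \pi)\}
 = \{0 < \theta < \varphi + \tfrac{\pi}{2}\},\\
&P\big(V\sqrt{1-r^2} + Wr > 0,\ W > 0\big) = \frac{\pi/2 + \varphi}{2\pi} = \frac14 + \frac{\sin^{-1} r}{2\pi},\\
&4P\big(V\sqrt{1-r^2} + Wr > 0,\ W > 0\big) - 1 = \frac{2}{\pi}\sin^{-1} r,
 \quad\text{equivalently}\quad r = \sin\!\left(\frac{\pi}{2}\tau\right).
\end{aligned}
$$
```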
21. Relation between Spearman's ρ and r for the bivariate normal
We now sketch a proof of the relationship between Pearson's r and Spearman's ρ for the bivariate normal distribution. Let $R(X_i)$ and $R(Y_i)$ be the ranks of $X_i$ and $Y_i$, and define $H(t) = I\{t > 0\}$. Then
$$ R(X_i) = \sum_{j=1}^{n} H(X_i - X_j) + 1 \tag{9} $$
Spearman's ρ is Pearson's correlation coefficient between $R(X_i)$ and $R(Y_i)$, which works out to
$$ \rho = \frac{h - \frac14\, n(n-1)^2}{\frac{1}{12}\, n(n^2-1)}, \qquad h = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} H(X_i - X_j)\, H(Y_i - Y_k). $$

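The expression for ρ in terms of the triple sum h is easy to get wrong, so here is a small Python check. It compares it against Spearman's coefficient computed from ranks built with (9). The data are hypothetical continuous values, so ties do not occur.

```python
import random

def H(t):
    """H(t) = I{t > 0}."""
    return 1 if t > 0 else 0

def spearman_via_h(x, y):
    """rho = (h - n(n-1)^2 / 4) / (n(n^2-1) / 12), with h the triple sum."""
    n = len(x)
    h = sum(H(x[i] - x[j]) * H(y[i] - y[k])
            for i in range(n) for j in range(n) for k in range(n))
    return (h - n * (n - 1) ** 2 / 4) / (n * (n * n - 1) / 12)

def spearman_via_ranks(x, y):
    """Ranks from (9), then the usual 1 - 6 sum d^2 / (n(n^2-1)) formula."""
    n = len(x)
    rx = [sum(H(x[i] - x[j]) for j in range(n)) + 1 for i in range(n)]
    ry = [sum(H(y[i] - y[j]) for j in range(n)) + 1 for i in range(n)]
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rng = random.Random(0)
x = [rng.random() for _ in range(12)]
y = [xi + 0.3 * rng.random() for xi in x]
print(spearman_via_h(x, y), spearman_via_ranks(x, y))
```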
22. Proof continued
Case 1. If i, j, k are distinct, then $(X_i - X_j,\, Y_i - Y_k)$ is distributed as BVN(0, 0, 2, 2, r/2): each component has variance 2, and the only shared term gives covariance $\mathrm{Cov}(X_i, Y_i) = r$. The expectation $E\{H(X_i - X_j)\, H(Y_i - Y_k)\}$ reduces to the integral of the probability density over the positive quadrant. Following the same technique as in the case of τ, this integral equals $\frac12\left(1 - \frac1\pi\cos^{-1}\frac r2\right)$.

23. Proof continued
Case 2. If $j = k \neq i$, then $(X_i - X_j,\, Y_i - Y_k)$ is distributed as BVN(0, 0, 2, 2, r), and the expectation reduces to $\frac12\left(1 - \frac1\pi\cos^{-1} r\right)$. Terms with $j = i$ or $k = i$ vanish, since H(0) = 0. Hence
$$ E\left[\frac{h - \frac14\, n(n-1)^2}{\frac{1}{12}\, n(n^2-1)}\right] = \frac{6}{\pi}\left[\frac{n-2}{n+1}\sin^{-1}\frac{r}{2} + \frac{1}{n+1}\sin^{-1} r\right] \tag{10} $$
As n goes to infinity, the right-hand side tends to $\frac{6}{\pi}\sin^{-1}\frac{r}{2}$, which is relation (5).

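Formula (10) is exact for every n, not only in the limit. A Monte Carlo sketch in Python can check this for a small sample size, where (10) and its limit differ visibly. The choices n = 5, r = 0.6, the number of replications and the seed are all hypothetical.

```python
import math
import random

def ranks(v):
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r

def spearman(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def expected_spearman_bvn(n, r):
    """Right-hand side of (10): exact E(rho) for a BVN sample of size n."""
    return 6 / math.pi * ((n - 2) / (n + 1) * math.asin(r / 2) + math.asin(r) / (n + 1))

def mc_mean_spearman(n, r, reps=20000, seed=2):
    """Monte Carlo estimate of E(rho) under BVN(0, 0, 1, 1, r)."""
    rng = random.Random(seed)
    s = math.sqrt(1 - r * r)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [r * xi + s * rng.gauss(0, 1) for xi in x]
        total += spearman(x, y)
    return total / reps

n, r = 5, 0.6
mc = mc_mean_spearman(n, r)
exact = expected_spearman_bvn(n, r)
limit = 6 / math.pi * math.asin(r / 2)
print(f"Monte Carlo {mc:.4f}   formula (10) {exact:.4f}   n -> infinity limit {limit:.4f}")
```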
24. Reason for the approximately linear relationship between Spearman's ρ and Pearson's r for the BVN
As the graph shows, Spearman's ρ for the bivariate normal is almost linearly related to Pearson's r. This may be attributed to the fact that
$$ \rho = \frac{6}{\pi}\sin^{-1}\frac{r}{2} = \frac{6}{\pi}\left(\frac{r}{2} + \frac16\cdot\frac{r^3}{8} + \cdots\right) = \frac{3}{\pi}\,r + \text{(terms very small compared with the first-order term)} \approx \frac{3}{\pi}\,r. $$
For Kendall's τ, a similar expansion gives $\tau = \frac{2}{\pi}\sin^{-1} r = \frac{2}{\pi}\left(r + \frac{r^3}{6} + \cdots\right)$. All the coefficients in this series are positive, so τ is a convex function of r on the interval [0, 1].

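How close the linear approximation is can be quantified with the Python sketch below. It scans a grid of r values and reports the largest gap between $\frac{6}{\pi}\sin^{-1}(r/2)$ and $\frac{3}{\pi}r$. It also checks that second differences of $\frac{2}{\pi}\sin^{-1} r$ on [0, 1) are non-negative, which is consistent with (though not a proof of) convexity. The grid spacing is an arbitrary choice.

```python
import math

def spearman_bvn(r):
    return 6 / math.pi * math.asin(r / 2)

def kendall_bvn(r):
    return 2 / math.pi * math.asin(r)

# Largest deviation of rho from its linear approximation 3r/pi on [-1, 1].
grid = [k / 100 for k in range(-100, 101)]
max_gap = max(abs(spearman_bvn(r) - 3 / math.pi * r) for r in grid)
print(f"max |rho - 3r/pi| on [-1, 1]: {max_gap:.4f}")

# Second differences of tau on [0, 0.99]; non-negative values indicate convexity.
h = 0.01
second_diffs = [kendall_bvn(r + h) - 2 * kendall_bvn(r) + kendall_bvn(r - h)
                for r in [k / 100 for k in range(1, 99)]]
print(min(second_diffs) >= 0)
```

The gap grows monotonically in |r| and peaks at r = ±1, where it equals $1 - 3/\pi \approx 0.045$.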
25. Kendall's comparative assessment of τ and ρ
Kendall, in his paper, acknowledged that ρ can take roughly $(n^3 - n)/6$ distinct values between -1 and +1, whereas τ can take only about $(n^2 - n)/2$ values in that range. According to him, however, this does not seriously affect the sensitivity of τ. Both Kendall's τ and Spearman's ρ computed from the sample are asymptotically normally distributed. However, Kendall showed using simulation experiments that the distribution of his coefficient is surprisingly close to normal even for small values of n, which is not the case for Spearman's coefficient.

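The difference in the number of attainable values can be seen by enumerating all n! rankings for small n. The Python sketch below does this with exact fractions, so that equal values are not split by rounding. It prints the counts next to the two approximations quoted above.

```python
from itertools import permutations
from fractions import Fraction

def distinct_values(n):
    """Numbers of distinct values of Spearman's rho and Kendall's tau over all n! rankings."""
    rhos, taus = set(), set()
    for s in permutations(range(1, n + 1)):
        d2 = sum((i + 1 - si) ** 2 for i, si in enumerate(s))
        rhos.add(1 - Fraction(6 * d2, n * (n * n - 1)))
        conc = sum(1 if s[j] > s[i] else -1 for i in range(n) for j in range(i + 1, n))
        taus.add(Fraction(2 * conc, n * (n - 1)))
    return len(rhos), len(taus)

counts = {n: distinct_values(n) for n in range(3, 8)}
for n, (n_rho, n_tau) in counts.items():
    print(n, n_rho, n_tau, (n ** 3 - n) // 6, (n ** 2 - n) // 2)
```

For τ the count is exactly $n(n-1)/2 + 1$, one value for each possible number of inversions. For ρ it is bounded by $(n^3 - n)/6 + 1$, since $\sum d^2$ is always even.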
26. Bias properties of Kendall's τ and Spearman's ρ
Consider a finite population, and let ρ′ and τ′ be Spearman's and Kendall's rank correlation coefficients computed from the entire population. Suppose we draw a simple random sample without replacement from that population and compute Spearman's ρ and Kendall's τ from the sample. Then τ is an unbiased estimator of τ′, but ρ is a biased estimator of ρ′. If the population size N tends to infinity, the expected value of Spearman's ρ tends to
$$ \frac{1}{n+1}\left\{3\tau' + (n-2)\rho'\right\} $$
where n is the sample size.

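Under simple random sampling without replacement, every subset of size n is equally likely. The exact expectation of a sample statistic is therefore its average over all such subsets. The Python sketch below computes this average for a tiny hypothetical population of N = 4 pairs with samples of size n = 3. It shows τ matching τ′ exactly while ρ does not match ρ′.

```python
from itertools import combinations

def ranks(v):
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r

def spearman(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(x, y):
    n = len(x)
    s = sum(((x[j] > x[i]) - (x[j] < x[i])) * ((y[j] > y[i]) - (y[j] < y[i]))
            for i in range(n) for j in range(i + 1, n))
    return 2 * s / (n * (n - 1))

def exact_sampling_means(x_pop, y_pop, n):
    """Exact E(rho) and E(tau) under SRSWOR: average over all C(N, n) samples."""
    rho_sum = tau_sum = 0.0
    count = 0
    for s in combinations(range(len(x_pop)), n):
        xs = [x_pop[i] for i in s]
        ys = [y_pop[i] for i in s]
        rho_sum += spearman(xs, ys)
        tau_sum += kendall(xs, ys)
        count += 1
    return rho_sum / count, tau_sum / count

x_pop = [1, 2, 3, 4]      # hypothetical population, N = 4
y_pop = [2, 1, 4, 3]
mean_rho, mean_tau = exact_sampling_means(x_pop, y_pop, 3)
print(mean_tau, kendall(x_pop, y_pop))     # both 1/3: tau is unbiased
print(mean_rho, spearman(x_pop, y_pop))    # 0.5 versus 0.6: rho is biased
```

Unbiasedness of τ follows because each population pair appears in the same number of samples, so averaging the pairwise concordance signs over samples reproduces their population average.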
27. Small-sample distributions of τ, ρ and r
It is well known that for a simple random sample of size n drawn from a bivariate normal distribution with zero correlation, Pearson's r satisfies
$$ \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \tag{11} $$
However, the small-sample distribution of r is far less tractable for normal distributions with non-zero correlation and for non-normal distributions. τ and ρ are distribution-free statistics, in the sense that their distributions do not depend on the distribution of the data as long as X and Y are independent and continuous. Consequently, their distributions under the hypothesis of independence of X and Y can be tabulated.

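Because the null distribution does not depend on the data's distribution, it can be tabulated once for each n. Under independence of continuous X and Y, all n! relative orderings are equally likely. The Python sketch below enumerates them to give the exact null distribution of Kendall's τ. The function name is hypothetical.

```python
from itertools import permutations
from collections import Counter
from fractions import Fraction

def kendall_null_distribution(n):
    """Exact distribution of Kendall's tau when X and Y are independent and continuous:
    every ordering of the Y-ranks against the X-ranks has probability 1/n!."""
    counts = Counter()
    for s in permutations(range(n)):
        conc = sum(1 if s[j] > s[i] else -1 for i in range(n) for j in range(i + 1, n))
        counts[Fraction(2 * conc, n * (n - 1))] += 1
    total = sum(counts.values())
    return {tau: Fraction(c, total) for tau, c in sorted(counts.items())}

dist = kendall_null_distribution(4)
for tau, p in dist.items():
    print(f"tau = {float(tau):+.3f}   P = {p}")
```

For n = 4, the frequencies of τ = -1, -2/3, ..., +1 are 1, 3, 5, 6, 5, 3, 1 out of 24. The distribution is symmetric about zero, as expected under independence.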
28. Asymptotic normality of r, ρ and τ
Each of Pearson's r, Spearman's ρ and Kendall's τ computed from bivariate data is asymptotically normally distributed. For Pearson's r (assuming finite fourth moments), this can be derived by applying the Central Limit Theorem to the bivariate sample moments. For Spearman's ρ, it follows from the asymptotic normality of linear rank statistics. For Kendall's τ, it follows from the asymptotic normality of U-statistics.

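In practice, the normal approximation for τ under independence uses a standard result not stated on the slide: $E(\tau) = 0$ and $\mathrm{Var}(\tau) = \frac{2(2n+5)}{9n(n-1)}$, exactly, for every n. The Python sketch below confirms the variance formula by exact enumeration for small n and then forms the approximate z-statistic. The helper names are hypothetical.

```python
from itertools import permutations
from fractions import Fraction

def exact_null_variance_tau(n):
    """Var(tau) under independence by enumerating all n! orderings (E(tau) = 0)."""
    total = Fraction(0)
    count = 0
    for s in permutations(range(n)):
        conc = sum(1 if s[j] > s[i] else -1 for i in range(n) for j in range(i + 1, n))
        tau = Fraction(2 * conc, n * (n - 1))
        total += tau * tau
        count += 1
    return total / count

def null_variance_formula(n):
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))

def z_statistic(tau, n):
    """Standardized tau, approximately N(0, 1) for large n under independence."""
    return float(tau) / float(null_variance_formula(n)) ** 0.5

for n in range(2, 8):
    assert exact_null_variance_tau(n) == null_variance_formula(n)
print(z_statistic(0.4, 30))
```

Comparing the z-value with standard normal quantiles gives the usual large-sample test of independence based on τ.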
30. A practical application of rank correlation
Recently, the Ministry of Human Resource Development (MHRD) considered giving weightage to the marks scored in the 10+2 Board examinations for admission to engineering colleges in India. Raw scores are not comparable across Boards, so the Ministry sought help from the Indian Statistical Institute. The Institute recommended using percentile ranks of students based on their aggregate scores.

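A minimal sketch of within-board percentile ranking appears below. The slides do not give the exact definition used. This Python sketch assumes a student's percentile rank is 100 times the fraction of students in the same board (and year) whose aggregate score is at most the student's own. The record layout and the data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def percentile_ranks(records):
    """records: list of (student_id, board, aggregate_score).
    Returns {student_id: percentile rank within the student's own board}, using the
    assumed definition 100 * (# in board with score <= own score) / board size."""
    by_board = defaultdict(list)
    for _, board, score in records:
        by_board[board].append(score)
    for scores in by_board.values():
        scores.sort()
    result = {}
    for sid, board, score in records:
        scores = by_board[board]
        result[sid] = 100.0 * bisect_right(scores, score) / len(scores)
    return result

# Hypothetical data: the boards mark on different scales,
# but percentile ranks put their students on a common footing.
records = [
    ("a1", "Board A", 410), ("a2", "Board A", 455), ("a3", "Board A", 380), ("a4", "Board A", 470),
    ("b1", "Board B", 72.5), ("b2", "Board B", 88.0), ("b3", "Board B", 64.0), ("b4", "Board B", 91.5),
]
print(percentile_ranks(records))
```

Because the percentile rank depends only on a student's position within the board, it is unchanged by any increasing rescaling of that board's marks.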
32. The data
The Indian Statistical Institute was provided with data from 4 boards (ICSE, CBSE, the West Bengal Board and the Tamil Nadu Board) for two consecutive years, 2008 and 2009. The Institute recommended computing a student's percentile rank from the aggregate score, and MHRD accepted that recommendation. Still, a statistically interesting question is what happens if the individual subject scores are considered separately instead of the aggregate. We investigate this question under some appropriate assumptions.

35. The model
For convenience, consider only two subjects, Mathematics and Physics. Denote a student's observed scores in Mathematics and Physics by $X_M$ and $X_P$. Assume the existence of unobserved merit variables $W_M$ and $W_P$ such that the scores in the two subjects are related to them as
$$ X_M \approx g_M(W_M), \qquad X_P \approx g_P(W_P) \tag{12} $$
$W_M$ and $W_P$ may be treated as attributes of the student. They depend on the student's knowledge and understanding of Mathematics and Physics respectively, and also on other factors such as schooling and intelligence. The functions $g_M$ and $g_P$ reflect the examination procedures for the two subjects, and they may vary across boards.

38. Formulation of the model
Two students may obtain different scores in Mathematics and Physics either because their merit variables $W_M$ and $W_P$ differ, or because the examination procedures $g_M$ and $g_P$ differ across their boards. We now state our assumptions about $W_M$, $W_P$, $g_M$ and $g_P$.

39. Assumptions of the model
Assumption 1. The functions $g_P$ and $g_M$ are monotonically increasing. This implies that, in each of the two subjects, scores are expected to increase from less meritorious to more meritorious students.
Assumption 2. The joint distribution of $(W_P, W_M)$ among students is the same in all boards.

40. How the assumptions can be checked
Imagine a common test in Mathematics and Physics taken by students of all the boards.
Under Assumption 1, a student's Mathematics score in the common test would be a monotone function of his or her Mathematics score in the board examination, since both are monotone functions of the same merit variable. The same holds for Physics scores. This can be tested using Spearman's ρ and Kendall's τ.
Under Assumption 2, the joint distribution of Mathematics and Physics scores in the common test would be the same in the subpopulations corresponding to different boards. This can be tested using any non-parametric test for equality of bivariate distributions.

  41. Is there a way to check the validity of these assumptions using currently available data?
  42. How the assumptions can be checked without a common test: According to Assumption 2, the dependence between merit in Physics and merit in Mathematics should be the same in all the boards. Since rank correlation is unchanged when each variable is transformed by a strictly increasing function, Assumption 1 implies that the rank correlation between Physics and Mathematics scores in a particular board should not depend on the board-specific functions $g_M$ and $g_P$. Therefore, the rank correlation between Physics and Mathematics scores should be approximately the same across the boards.
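A small simulation can illustrate why board-specific scoring should not affect the rank correlation. This sketch assumes, purely for illustration, that $(W_M, W_P)$ is standard bivariate normal with correlation 0.6 and that two hypothetical boards use the increasing scoring functions noted in the comments; none of these choices come from the talk. Spearman's ρ is computed with the no-ties formula $1 - 6\sum d_i^2 / (n(n^2-1))$.

```python
import math
import random

def ranks(xs):
    """1-based ranks; assumes no ties (true with probability 1 for continuous data)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)), valid when there are no ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def merit_sample(n, corr, rng):
    """Draw (W_M, W_P) from a standard bivariate normal with correlation corr."""
    wm, wp = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        wm.append(z1)
        wp.append(corr * z1 + math.sqrt(1 - corr ** 2) * z2)
    return wm, wp

rng = random.Random(1)
wm, wp = merit_sample(500, 0.6, rng)

# Two hypothetical boards with different increasing scoring functions (g_M, g_P).
board_a = ([40 + 12 * w for w in wm], [100 / (1 + math.exp(-w)) for w in wp])
board_b = ([math.exp(w) for w in wm], [w ** 3 + w for w in wp])

rho_merit = spearman_rho(wm, wp)
rho_a = spearman_rho(*board_a)
rho_b = spearman_rho(*board_b)
print(rho_merit, rho_a, rho_b)  # identical: increasing g leaves the ranks unchanged
```

Here both boards score the same simulated students, so the three values coincide exactly. In real data each board examines a different set of students, so the board-wise values can agree only up to sampling variation, which is why a formal comparison of the board-wise rank correlations is needed.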
  43. Figure: Rank correlation between Physics & Maths for different boards and years
  44. Rank correlation, Physics & Chemistry: Figure: Rank correlation between Physics and Chemistry marks over years
  45. Bar chart of rank correlation, Chemistry & Maths: Figure: Rank correlation between Chemistry and Maths marks over years
  46. Figure: Subject percentile graph, WBHS 2008
  47. Figure: Variation of a subject across boards in the same year
  48. Inference from the data analysis: The between-board variation in rank correlation is significantly higher than the within-board variation across the two years. Visibly, the correlation is high for the Tamil Nadu Board and low for the CBSE Board. If we regard the available data as a large sample from a larger hypothetical population, the rank correlation computed for a board in a particular year has an approximately normal distribution. We can therefore use these rank correlation values in an ANOVA-type analysis to test whether they differ significantly across boards and across years. When this is done, the differences in rank correlation across boards turn out to be significant. This indicates a breakdown of Assumption 2. The study of rank correlation brings out this fact even without the scores of a common test.
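One possible form of the ANOVA-type analysis is a two-way ANOVA without replication on a board × year table of rank correlations, treating each board-year value as a single approximately normal observation. The talk does not specify its exact design, so this layout and the helper name `two_way_anova` are assumptions. The sketch returns F statistics only; p-values can be read from an F table or computed with, for example, `scipy.stats.f.sf(F, df1, df2)`.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication (hypothetical helper).

    table[i][j] is the rank correlation for board i in year j.
    Returns (F_board, df_board, F_year, df_year, df_error).
    """
    b, t = len(table), len(table[0])
    if b < 2 or t < 2 or any(len(row) != t for row in table):
        raise ValueError("need a rectangular table with at least 2 boards and 2 years")
    grand = sum(sum(row) for row in table) / (b * t)
    board_means = [sum(row) / t for row in table]
    year_means = [sum(table[i][j] for i in range(b)) / b for j in range(t)]
    # Sums of squares for boards, years, and the total.
    ss_board = t * sum((m - grand) ** 2 for m in board_means)
    ss_year = b * sum((m - grand) ** 2 for m in year_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    # With one value per cell, the board x year interaction serves as the error term.
    ss_error = ss_total - ss_board - ss_year
    df_board, df_year = b - 1, t - 1
    df_error = df_board * df_year
    ms_error = ss_error / df_error
    if ms_error <= 0:
        raise ValueError("residual mean square is zero; F is undefined")
    f_board = (ss_board / df_board) / ms_error
    f_year = (ss_year / df_year) / ms_error
    return f_board, df_board, f_year, df_year, df_error
```

For b boards and the two years considered, `table` has b rows of two values each, and the board effect is judged significant when `F_board` exceeds the critical value of F with (b − 1, (b − 1)(t − 1)) degrees of freedom. With only two years the error term has just b − 1 degrees of freedom, so the test has limited power for the year effect.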
  53. Acknowledgement: I would like to express my gratitude to my mentors for this project, Prof. Probal Chaudhuri and Prof. Debasis Sengupta, for their immense cooperation. I would also like to thank all those who have been associated with this work in some way or other.
  54. Thank You
