# Different Kinds of Distance and Statistical Distance

A short introduction to distance and statistical distance, which is at the core of multivariate analysis. It covers some simple concepts about distances and statistical distance.

• which are the two null statistical distances?

1. 1. WELCOME TO MY PRESENTATION ON STATISTICAL DISTANCE
2. 2. Md. Menhazul Abedin M.Sc. Student Dept. of Statistics Rajshahi University Mob: 01751385142 Email: menhaz70@gmail.com
3. 3. Objectives • To know the meaning of statistical distance, and its relation to and difference from general or Euclidean distance
4. 4. Content • Definition of Euclidean distance • Concept & intuition of statistical distance • Definition of statistical distance • Necessity of statistical distance • Concept of Mahalanobis distance (population & sample) • Distribution of Mahalanobis distance • Mahalanobis distance in R • Acknowledgement
5. 5. Euclidean Distance from origin [Figure: the point (X, Y) plotted against the X and Y axes, with the origin at (0,0)]
6. 6. Euclidean Distance [Figure: point P(X, Y) and origin O(0,0)] By Pythagoras, $d(O, P) = \sqrt{X^2 + Y^2}$
7. 7. Euclidean Distance Specific point
8. 8. We see two specific points in each picture. Our problem is to determine the length between the two points. But how? Assume that these pictures are placed in two-dimensional space and the points are joined by a straight line.
9. 9. Let the 1st point be $(x_1, y_1)$ and the 2nd point be $(x_2, y_2)$; then the distance is $D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. What happens when the dimension is three?
10. 10. Distance in $R^3$
11. 11. Distance is given by • Points are $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$: $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$
12. 12. For p dimensions it can be written as the following expression, called the Euclidean distance: for $P = (x_1, x_2, \ldots, x_p)$ and $Q = (y_1, y_2, \ldots, y_p)$, $d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$
13. 13. 12/12/2016 Properties of Euclidean Distance and Mathematical Distance • The usual human concept of distance is Euclidean distance. • Each coordinate contributes equally to the distance: for $P = (x_1, \ldots, x_p)$ and $Q = (y_1, \ldots, y_p)$, $d(P, Q) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_p - y_p)^2}$. • Mathematicians, generalizing its three properties, 1) $d(P,Q) = d(Q,P)$, 2) $d(P,Q) = 0$ if and only if $P = Q$, and 3) $d(P,Q) \le d(P,R) + d(R,Q)$ for all $R$, define distance on any set.
14. 14. [Figure: the points P(X1,Y1), Q(X2,Y2) and R(Z1,Z2)]
15. 15. Taxicab Distance: Notion [Figure: street grid. Red: Manhattan distance. Green: diagonal, straight-line distance. Blue, yellow: equivalent Manhattan distances.]
16. 16. • The Manhattan distance is the simple sum of the horizontal and vertical components, whereas the diagonal distance might be computed by applying the Pythagorean Theorem .
17. 17. • Red: Manhattan distance. • Green: diagonal, straight-line distance. • Blue, yellow: equivalent Manhattan distances.
18. 18. • Manhattan distance: 12 units • Diagonal or straight-line distance or Euclidean distance: $\sqrt{6^2 + 6^2} = 6\sqrt{2}$ • We observe that the Euclidean distance is less than the Manhattan distance.
19. 19. Taxicab/Manhattan distance: Definition [Figure: the points (p1,p2) and (q1,q2), with horizontal leg │p1-q1│ and vertical leg │p2-q2│]
20. 20. Manhattan Distance • The taxicab distance between (p1,p2) and (q1,q2) is │p1-q1│+│p2-q2│
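
The two definitions can be compared directly in R. This is a minimal sketch; the points `p` and `q` are illustrative values chosen here to match the 6-by-6 grid example, not data from the slides.

```r
# Two illustrative points (assumed values, matching the 6-by-6 grid example)
p <- c(0, 0)
q <- c(6, 6)

manhattan <- sum(abs(p - q))        # |p1 - q1| + |p2 - q2|
euclidean <- sqrt(sum((p - q)^2))   # straight-line distance

# The same two values through stats::dist()
d_man <- as.numeric(dist(rbind(p, q), method = "manhattan"))
d_euc <- as.numeric(dist(rbind(p, q), method = "euclidean"))

c(manhattan = manhattan, euclidean = euclidean)
```

Both routes give 12 for the Manhattan distance and $6\sqrt{2}$ for the Euclidean distance.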
21. 21. Relationship between Manhattan & Euclidean distance. [Figure: street grid with one route of 7 blocks and one route of 6 blocks]
22. 22. Relationship between Manhattan & Euclidean distance. • It now seems that the distance from A to C is 7 blocks, while the distance from A to B is 6 blocks. • Unless we choose to go off-road, B is now closer to A than C. • Taxicab distance is sometimes equal to Euclidean distance, but otherwise it is greater than Euclidean distance: Euclidean distance ≤ Taxicab distance. Is this always true? Does it hold in n dimensions?
23. 23. Proof: absolute values guarantee non-negative values; then apply the addition property of inequality.
24. 24. Continued………..
25. 25. Continued………..
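
The equations on the proof slides did not survive in this transcript. A sketch of the standard two-dimensional argument, which is an assumption about what the slides showed, with $a = x_1 - y_1$ and $b = x_2 - y_2$ introduced here for brevity:

```latex
\begin{aligned}
(|a| + |b|)^2 &= a^2 + b^2 + 2|a||b| \\
              &\ge a^2 + b^2 \qquad \text{since } 2|a||b| \ge 0, \\
\Rightarrow \quad \sqrt{a^2 + b^2} &\le |a| + |b| ,
\end{aligned}
```

that is, Euclidean distance ≤ taxicab distance, with equality exactly when $a = 0$ or $b = 0$.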
26. 26. For high dimension • It also holds in the high-dimensional case. • $\sum_i |x_i - y_i|^2 \le \sum_i |x_i - y_i|^2 + 2\sum_{i<j} |x_i - y_i||x_j - y_j| = \left(\sum_i |x_i - y_i|\right)^2$, which implies $\sqrt{\sum_i (x_i - y_i)^2} \le \sum_i |x_i - y_i|$, i.e. $d_E \le d_T$
27. 27. Statistical Distance • Weight coordinates subject to a great deal of variability less heavily than those that are not highly variable. [Figure: scatter of data with two points at the same distance from the origin. Which one is nearer to the data set?]
28. 28. • Here variability along the x1 axis > variability along the x2 axis. • Is the same distance from the origin equally meaningful? Ans: No. • But how do we take the different variability into account? Ans: Give different weights to the axes.
29. 29. Statistical Distance for Uncorrelated Data: for $P = (x_1, x_2)$ and $O = (0, 0)$, standardize each coordinate, $x_1^* = x_1/\sqrt{s_{11}}$ and $x_2^* = x_2/\sqrt{s_{22}}$, so that the weights are $1/s_{11}$ and $1/s_{22}$: $d(O, P) = \sqrt{(x_1^*)^2 + (x_2^*)^2} = \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}}$
30. 30. All points that have coordinates $(x_1, x_2)$ and are a constant squared distance $c^2$ from the origin must satisfy $\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2$. But how do we choose c? One choice is the c for which 95% of the observations fall inside this region. Note that $s_{11} > s_{22} \Rightarrow \frac{1}{s_{11}} < \frac{1}{s_{22}}$.
31. 31. Ellipse of Constant Statistical Distance for Uncorrelated Data [Figure: ellipse centred at 0 in the (x1, x2) plane, crossing the x1 axis at $\pm c\sqrt{s_{11}}$ and the x2 axis at $\pm c\sqrt{s_{22}}$]
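32. 32. • This expression can be generalized to the statistical distance from an arbitrary point $P = (x_1, x_2)$ to any fixed point $Q = (y_1, y_2)$: $d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}}}$. For p dimensions: $d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}$
33. 33. Remark: 1) The distance of P to the origin O is obtained by setting all $y_i = 0$. 2) If all $s_{ii}$ are equal, the Euclidean distance formula is appropriate.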
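
The variance-weighted distance for uncorrelated coordinates is easy to try out in R. This is a minimal sketch: the helper `stat_dist`, the variances in `s` and the points `P` and `Q` are all illustrative assumptions, not values from the slides.

```r
# Hypothetical sample variances and points, for illustration only
s <- c(4, 1)          # s11, s22
P <- c(2, 1)
Q <- c(0, 0)

# Statistical distance for uncorrelated coordinates:
# each squared difference is divided by that coordinate's variance
stat_dist <- function(x, y, s) sqrt(sum((x - y)^2 / s))

d_stat  <- stat_dist(P, Q, s)          # sqrt(4/4 + 1/1)
d_eucl  <- sqrt(sum((P - Q)^2))        # ordinary Euclidean distance
d_equal <- stat_dist(P, Q, c(1, 1))    # equal unit variances: reduces to Euclidean

# Same quantity (squared) from stats::mahalanobis() with a diagonal covariance
d2_pkg <- as.numeric(mahalanobis(P, Q, diag(s)))

c(d_stat = d_stat, d_eucl = d_eucl, d_equal = d_equal, d2_pkg = d2_pkg)
```

The more variable first coordinate contributes less to `d_stat` than it does to `d_eucl`, which is the weighting idea described above.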
34. 34. Scatter Plot for Correlated Measurements
35. 35. • How do you measure the statistical distance of the above data set? • Ans: First make it uncorrelated. • But why and how? • Ans: Rotate the axes, keeping the origin fixed.
36. 36. Scatter Plot for Correlated Measurements
37. 37. Rotation of axes keeping origin fixed [Figure: the point P(x1, x2) shown against the original axes and the axes rotated through the angle 𝜃, with construction points O, M, R, N and Q]
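38. 38. $x = OM = OR - MR = \tilde{x}_1\cos\theta - \tilde{x}_2\sin\theta$ ... (i) $y = MP = QR + NP = \tilde{x}_1\sin\theta + \tilde{x}_2\cos\theta$ ... (ii), where $(x, y)$ are the original coordinates of P and $(\tilde{x}_1, \tilde{x}_2)$ are its coordinates on the rotated axes
39. 39. • The solution of the above equations: $\tilde{x}_1 = x\cos\theta + y\sin\theta$, $\tilde{x}_2 = -x\sin\theta + y\cos\theta$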
40. 40. Choice of 𝜃 • Which 𝜃 will you choose, and how? • Data matrix → centered data matrix → covariance matrix of the data → eigenvectors. • 𝜃 = the angle between the 1st eigenvector and [1,0], or the angle between the 2nd eigenvector and [0,1].
41. 41. Why the angle between the 1st eigenvector and [1,0], or the angle between the 2nd eigenvector and [0,1]? Ans: Let B be a (p by p) positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ and associated normalized eigenvectors $e_1, e_2, \ldots, e_p$. Then $\max_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_1$, attained when $x = e_1$, and $\min_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_p$, attained when $x = e_p$.
42. 42. $\max_{x \perp e_1, e_2, \ldots, e_k} \frac{x'Bx}{x'x} = \lambda_{k+1}$, attained when $x = e_{k+1}$, $k = 1, 2, \ldots, p - 1$.
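
The maximization result can be checked numerically. The sketch below assumes a small positive definite matrix `B` made up for illustration; `rayleigh` is a helper name introduced here for the ratio x'Bx / x'x.

```r
# Hypothetical 2 x 2 symmetric positive definite matrix
B <- matrix(c(4, 1,
              1, 3), nrow = 2, byrow = TRUE)
eg <- eigen(B)    # for a symmetric matrix, eigenvalues come back in decreasing order

# Ratio x'Bx / x'x for a direction x
rayleigh <- function(x) as.numeric(t(x) %*% B %*% x / sum(x^2))

r_first <- rayleigh(eg$vectors[, 1])   # attains the largest eigenvalue
r_last  <- rayleigh(eg$vectors[, 2])   # attains the smallest eigenvalue

# Random directions stay between the smallest and largest eigenvalue
set.seed(1)
r_random <- apply(matrix(rnorm(2000), ncol = 2), 1, rayleigh)
range(r_random)
```

In data terms, with B the covariance matrix, the first eigenvector is the direction of largest variance, which is why the axes are rotated onto the eigenvectors.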
43. 43. Choice of 𝜃
    #### Exercise 16, page(309). Heights in inches (x) & Weights in pounds (y).
    #### An Introduction to Statistics and Probability, M. Nurul Islam
    x=c(60,60,60,60,62,62,62,64,64,64,66,66,66,66,68,68,68,70,70,70);x
    y=c(115,120,130,125,130,140,120,135,130,145,135,170,140,155,150,160,175,180,160,175);y
    plot(x,y)
    ## the two lines below need cdata (the mean centred data), which is created on the next slide
    V=eigen(cov(cdata))$vectors;V
    as.matrix(cdata)%*%V
44. 44. data=data.frame(x,y);data
    as.matrix(data)
    colMeans(data)
    xmv=c(rep(64.8,20));xmv      ### x mean vector
    ymv=c(rep(144.5,20));ymv     ### y mean vector
    meanmatrix=cbind(xmv,ymv);meanmatrix
    cdata=data-meanmatrix;cdata  ### mean centred data
    plot(cdata)
    abline(h=0,v=0)
    cor(cdata)
45. 45. ##################
    cov(cdata)
    eigen(cov(cdata))
    xx1=c(1,0);xx1
    xx2=c(0,1);xx2
    vv1=eigen(cov(cdata))$vectors[,1];vv1
    vv2=eigen(cov(cdata))$vectors[,2];vv2
46. 46. ################
    theta = acos( sum(xx1*vv1) / ( sqrt(sum(xx1 * xx1)) * sqrt(sum(vv1 * vv1)) ) );theta
    theta = acos( sum(xx2*vv2) / ( sqrt(sum(xx2 * xx2)) * sqrt(sum(vv2 * vv2)) ) );theta
    ###############
    xx=cdata[,1]*cos(1.41784)+cdata[,2]*sin(1.41784);xx
    yy=-cdata[,1]*sin(1.41784)+cdata[,2]*cos(1.41784);yy
    plot(xx,yy)
    abline(h=0,v=0)
47. 47. V=eigen(cov(cdata))$vectors;V
    tdata=as.matrix(cdata)%*%V;tdata   ### transformed data
    cov(tdata)
    round(cov(tdata),14)
    cor(tdata)
    plot(tdata)
    abline(h=0,v=0)
    round(cor(tdata),16)
48. 48. ################ comparison of both methods ############
    comparison=tdata - as.matrix(cbind(xx,yy));comparison
    round(comparison,4)
49. 49. ########### using package: md from original data #####
    md=mahalanobis(data,colMeans(data),cov(data),inverted=F);md   ## md = mahalanobis distance
    ######## mahalanobis distance from transformed data ########
    tmd=mahalanobis(tdata,colMeans(tdata),cov(tdata),inverted=F);tmd
    ###### comparison ############
    md-tmd
50. 50. Mahalanobis distance : Manually
    mu=colMeans(tdata);mu
    incov=solve(cov(tdata));incov
    md1=t(tdata[1,]-mu)%*%incov%*%(tdata[1,]-mu);md1
    md2=t(tdata[2,]-mu)%*%incov%*%(tdata[2,]-mu);md2
    md3=t(tdata[3,]-mu)%*%incov%*%(tdata[3,]-mu);md3
    ## ...
    md20=t(tdata[20,]-mu)%*%incov%*%(tdata[20,]-mu);md20
    ## md from the package and md computed manually are equal
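51. 51. tdata
    s1=sd(tdata[,1]);s1
    s2=sd(tdata[,2]);s2
    xstar=c(tdata[,1])/s1;xstar
    ystar=c(tdata[,2])/s2;ystar
    md1=sqrt((-1.46787309)^2 + (0.1484462)^2);md1
    md2=sqrt((-1.22516896)^2 + (0.6020111)^2);md2
    ## ...
    Not equal to the distances above. Why? mahalanobis() returns the squared distance, while a square root is taken here. The data in tdata are already mean centred, so the mean is accounted for; squaring these values reproduces md.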
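52. 52. Statistical Distance under Rotated Coordinate System: for $O = (0, 0)$ and $P = (\tilde{x}_1, \tilde{x}_2)$, $d(O, P) = \sqrt{\frac{\tilde{x}_1^2}{\tilde{s}_{11}} + \frac{\tilde{x}_2^2}{\tilde{s}_{22}}}$, where $\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta$ and $\tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta$, so that $d(O, P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$. Here $\tilde{s}_{11}$ and $\tilde{s}_{22}$ are the sample variances of the rotated coordinates.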
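53. 53. • After some manipulation, this can be written in terms of the original variables, where the coefficients $a_{11}$, $a_{12}$ and $a_{22}$ depend on 𝜃 and on the sample variances and covariance.
54. 54. Proof • $\tilde{s}_{11} = \frac{1}{n-1}\sum(\tilde{x}_1 - \bar{\tilde{x}}_1)^2 = \frac{1}{n-1}\sum(x_1\cos\theta + x_2\sin\theta - \bar{x}_1\cos\theta - \bar{x}_2\sin\theta)^2 = \cos^2(\theta)\,s_{11} + 2\sin\theta\cos\theta\,s_{12} + \sin^2(\theta)\,s_{22}$ • $\tilde{s}_{22} = \frac{1}{n-1}\sum(\tilde{x}_2 - \bar{\tilde{x}}_2)^2 = \frac{1}{n-1}\sum(-x_1\sin\theta + x_2\cos\theta + \bar{x}_1\sin\theta - \bar{x}_2\cos\theta)^2 = \cos^2(\theta)\,s_{22} - 2\sin\theta\cos\theta\,s_{12} + \sin^2(\theta)\,s_{11}$
55. 55. Continued: $d(O, P) = \sqrt{\frac{(x_1\cos\theta + x_2\sin\theta)^2}{\tilde{s}_{11}} + \frac{(-x_1\sin\theta + x_2\cos\theta)^2}{\tilde{s}_{22}}}$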
56. 56. Continued………….
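
The final step of the derivation is not legible in this transcript. Expanding the two squares in the rotated-coordinate distance and collecting terms gives the following coefficients. This is a reconstruction from the expressions on the preceding slides, not a copy of the original slide:

```latex
\begin{aligned}
d^2(O,P) &= a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2, \\
a_{11} &= \frac{\cos^2\theta}{\tilde{s}_{11}} + \frac{\sin^2\theta}{\tilde{s}_{22}}, \qquad
a_{22} = \frac{\sin^2\theta}{\tilde{s}_{11}} + \frac{\cos^2\theta}{\tilde{s}_{22}}, \\
a_{12} &= \sin\theta\cos\theta\left(\frac{1}{\tilde{s}_{11}} - \frac{1}{\tilde{s}_{22}}\right).
\end{aligned}
```

Substituting the expressions for $\tilde{s}_{11}$ and $\tilde{s}_{22}$ from the proof slide writes each coefficient in terms of $s_{11}$, $s_{12}$, $s_{22}$ and $\theta$ only.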
57. 57. General Statistical Distance: for $P = (x_1, x_2, \ldots, x_p)$, $O = (0, 0, \ldots, 0)$ and $Q = (y_1, y_2, \ldots, y_p)$, $d(O, P) = \sqrt{a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + \cdots + 2a_{p-1,p}x_{p-1}x_p}$ and $d(P, Q) = \sqrt{a_{11}(x_1 - y_1)^2 + a_{22}(x_2 - y_2)^2 + \cdots + a_{pp}(x_p - y_p)^2 + 2a_{12}(x_1 - y_1)(x_2 - y_2) + 2a_{13}(x_1 - y_1)(x_3 - y_3) + \cdots + 2a_{p-1,p}(x_{p-1} - y_{p-1})(x_p - y_p)}$
58. 58. • The above distances are completely determined by the coefficients (weights) $a_{ik}$; $i, k = 1, 2, \ldots, p$. These can be arranged in a rectangular array $A = (a_{ik})$, and this array (matrix) must be symmetric positive definite.
59. 59. Why positive definite? Let A be a positive definite matrix. Then A = C’C for some nonsingular C, and X’AX = X’C’CX = (CX)’(CX) = Y’Y with Y = CX. It obeys all the distance properties, so X’AX is a (squared) distance. A different A gives a different distance.
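60. 60. • Why a positive definite matrix? • Ans: Spectral decomposition: the spectral decomposition of a k×k symmetric matrix A is given by $A = \sum_{i=1}^{k} \lambda_i e_i e_i'$ • where $(\lambda_i, e_i)$, $i = 1, 2, \ldots, k$, are the pairs of eigenvalues and eigenvectors, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$. If A is positive definite, then every $\lambda_i > 0$ and A is invertible.
61. 61. [Figure: plot showing the eigenvectors $e_1$, $e_2$ and the eigenvalues $\lambda_1$, $\lambda_2$]
62. 62. • Suppose p = 2. The distance from the origin is $d^2(O, P) = x'Ax = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2$. By spectral decomposition, $x'Ax = \lambda_1(x'e_1)^2 + \lambda_2(x'e_2)^2$. [Figure: ellipse of constant distance c in the (X1, X2) plane, with half-axes $c/\sqrt{\lambda_1}$ and $c/\sqrt{\lambda_2}$]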
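
The spectral decomposition can be verified with `eigen()`. This is a minimal sketch; the matrix `A` is a made-up symmetric positive definite example, and `A_rebuilt` and `A_inv` are names introduced here.

```r
# Hypothetical symmetric positive definite matrix
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
eg <- eigen(A)
lambda <- eg$values     # eigenvalues, largest first
E <- eg$vectors         # normalized eigenvectors as columns

# Rebuild A as the sum of lambda_i * e_i e_i'
A_rebuilt <- lambda[1] * outer(E[, 1], E[, 1]) +
             lambda[2] * outer(E[, 2], E[, 2])

# With all lambda_i > 0 the same eigenvectors give the inverse
A_inv <- (1 / lambda[1]) * outer(E[, 1], E[, 1]) +
         (1 / lambda[2]) * outer(E[, 2], E[, 2])

all.equal(A_rebuilt, A)
all.equal(A_inv, solve(A))
```

The inverse built from the reciprocal eigenvalues only exists because every eigenvalue is strictly positive, which is the invertibility point made above.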
63. 63. Another property is Thus We use this property in Mahalanobis distance
64. 64. Necessity of Statistical Distance [Figure: a cluster of points with its center of gravity and another point marked]
65. 65. • Consider the Euclidean distances from the point Q to the point P and to the origin O. • Obviously d(P,Q) > d(Q,O). • But P appears to be more like the points in the cluster than does the origin. • If we take into account the variability of the points in the cluster and measure distance by statistical distance, then Q will be closer to P than to O.
66. 66. Mahalanobis distance • The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance from a common point. It is a unitless measure introduced by P. C. Mahalanobis in 1936
67. 67. Intuition of Mahalanobis Distance • Recall the equation $d(O, P) = \sqrt{x'Ax}$, so that $d^2(O, P) = x'Ax$, where $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$
68. 68. Intuition of Mahalanobis Distance: $d(O, P) = \sqrt{x'Ax}$, $d^2(O, P) = x'Ax$, where $x' = (x_1, x_2, x_3, \ldots, x_p)$ and $A = (a_{ik})$ is the p×p matrix of weights
69. 69. Intuition of Mahalanobis Distance: $d^2(P, Q) = (x - y)'A(x - y)$, where $x' = (x_1, x_2, \ldots, x_p)$, $y' = (y_1, y_2, \ldots, y_p)$ and $A = (a_{ik})$ is the p×p matrix of weights
70. 70. Mahalanobis Distance • Mahalanobis used the inverse of the covariance matrix, $\Sigma^{-1}$, instead of A. • Thus $d^2(O, P) = x'\Sigma^{-1}x$ ... (1) • And he used $\mu$ (the center of gravity) instead of y: $d^2(P, Q) = (x - \mu)'\Sigma^{-1}(x - \mu)$ ... (2)
71. 71. Mahalanobis Distance • The above equations are nothing but the Mahalanobis distance. • For example, suppose we took a single observation from a bivariate population with Variable X and Variable Y, and that our two variables had the following characteristics
72. 72. • For a single observation, X = 410 and Y = 400, the Mahalanobis distance for that single value is computed as:
73. 73. • 1.825
74. 74. • Therefore, our single observation would have a distance of 1.825 standardized units from the mean (mean is at X = 500, Y = 500). • If we took many such observations, graphed them and colored them according to their Mahalanobis values, we can see the elliptical Mahalanobis regions come out
75. 75. • The points are actually distributed along two primary axes:
76. 76. If we calculate Mahalanobis distances for each of these points and shade them according to their distance value, we see clear elliptical patterns emerge:
77. 77. • We can also draw actual ellipses at regions of constant Mahalanobis values. [Figure: nested ellipses containing 68%, 95% and 99.7% of the observations]
78. 78. • Which ellipse do you choose? Ans: Use the 68-95-99.7 rule. 1) About two-thirds (68%) of the points should be within 1 unit of the origin (along the axis). 2) About 95% should be within 2 units. 3) About 99.7% should be within 3 units.
79. 79. If normal
80. 80. Sample Mahalanobis Distance • The sample Mahalanobis distance is made by replacing $\Sigma$ by $S$ and $\mu$ by $\bar{X}$ • i.e. $(X - \bar{X})' S^{-1} (X - \bar{X})$
81. 81. Distribution of Mahalanobis distance: for a sample, $(X - \bar{X})' S^{-1} (X - \bar{X}) \le \chi^2_p(\alpha)$
82. 82. Distribution of Mahalanobis distance: Let $X_1, X_2, X_3, \ldots, X_n$ be independent observations from any population with mean $\mu$ and finite (nonsingular) covariance $\Sigma$. Then • $\sqrt{n}(\bar{X} - \mu)$ is approximately $N_p(0, \Sigma)$ and • $n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu)$ is approximately $\chi^2_p$ for n − p large. This is nothing but the central limit theorem.
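
One way to pick an ellipse from the chi-square distribution is sketched below. Everything in it is an assumption made for illustration: the data are simulated bivariate normal values, and `probs`, `cutoff` and `coverage` are names introduced here.

```r
set.seed(42)
n <- 5000
p <- 2
X <- matrix(rnorm(n * p), ncol = p)       # simulated bivariate normal data (assumption)

# Squared sample Mahalanobis distance of every row from the sample mean
D2 <- mahalanobis(X, colMeans(X), cov(X))

probs  <- c(0.68, 0.95, 0.997)
cutoff <- qchisq(probs, df = p)           # thresholds on the squared distance

# Share of observations that fall inside each ellipse D2 <= cutoff
coverage <- sapply(cutoff, function(cc) mean(D2 <= cc))
rbind(probs, cutoff, coverage)
```

The cutoffs apply to the squared distance, so the matching radii in Mahalanobis units are `sqrt(cutoff)`; they depend on the dimension p.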
83. 83. Mahalanobis distance in R
    ########### Mahalanobis Distance ##########
    x=rnorm(100);x
    dm=matrix(x,nrow=20,ncol=5,byrow=F);dm   ## dm = data matrix
    cm=colMeans(dm);cm                       ## cm = column means
    cov=cov(dm);cov                          ## cov = covariance matrix
    incov=solve(cov);incov                   ## incov = inverse of covariance matrix
84. 84. Mahalanobis distance in R
    ####### MAHALANOBIS DISTANCE : MANUALLY ######
    ## Mahalanobis distance of first observation
    ob1=dm[1,];ob1                     ## first observation
    mv1=ob1-cm;mv1                     ## deviation of first observation from center of gravity
    md1=t(mv1)%*%incov%*%mv1;md1       ## mahalanobis distance of first observation from center of gravity
85. 85. Mahalanobis distance in R
    ## Mahalanobis distance of second observation
    ob2=dm[2,];ob2                     ## second observation
    mv2=ob2-cm;mv2                     ## deviation of second observation from center of gravity
    md2=t(mv2)%*%incov%*%mv2;md2       ## mahalanobis distance of second observation from center of gravity
    ## ...
86. 86. Mahalanobis distance in R
    ## ...
    ## Mahalanobis distance of 20th observation
    ob20=dm[20,];ob20                  ## 20th observation (row 20; dm has only 5 columns)
    mv20=ob20-cm;mv20                  ## deviation of 20th observation from center of gravity
    md20=t(mv20)%*%incov%*%mv20;md20   ## mahalanobis distance of 20th observation from center of gravity
87. 87. Mahalanobis distance in R
    ####### MAHALANOBIS DISTANCE : PACKAGE ########
    md=mahalanobis(dm,cm,cov,inverted=F);md   ## md = mahalanobis distance
    md=mahalanobis(dm,cm,cov);md
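
The twenty separate manual computations can also be done in one vectorized step. This is a sketch, not the slides' own code: it regenerates a 20 × 5 matrix with a fixed seed (an assumption added for reproducibility) and uses `S` instead of `cov` as the variable name so the `cov()` function is not masked.

```r
set.seed(123)                           # seed added here so the run is reproducible
dm <- matrix(rnorm(100), nrow = 20, ncol = 5)
cm <- colMeans(dm)                      # center of gravity
S  <- cov(dm)                           # sample covariance matrix

centred <- sweep(dm, 2, cm)             # subtract the column means from every row

# Row i of (centred %*% solve(S)) * centred sums to (x_i - cm)' S^{-1} (x_i - cm)
md_manual  <- rowSums((centred %*% solve(S)) * centred)
md_package <- mahalanobis(dm, cm, S)

all.equal(md_manual, md_package)
```

Both vectors hold squared distances, one per observation, so they can be compared element by element.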
88. 88. Another example
    x <- matrix(rnorm(100*3), ncol = 3)
    Sx <- cov(x)
    D2 <- mahalanobis(x, colMeans(x), Sx)
89. 89. plot(density(D2, bw = 0.5), main="Squared Mahalanobis distances, n=100, p=3")
    qqplot(qchisq(ppoints(100), df = 3), D2, main = expression("Q-Q plot of Mahalanobis" * ~D^2 * " vs. quantiles of" * ~ chi[3]^2))
    abline(0, 1, col = 'gray')
    ??mahalanobis
90. 90. Acknowledgement Prof . Mohammad Nasser . Richard A. Johnson & Dean W. Wichern . & others
91. 91. THANK YOU ALL
92. 92. Necessity of Statistical Distance [Figure labels: at home, mother; in the mess, female maid; student in the mess]