Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Traditional vs Nontraditional Methods for Network Analytics - Ernesto Estrada

531 views

Published on

COMPLEX NETWORKS: THEORY, METHODS, AND APPLICATIONS (2ND EDITION)
May 16-20, 2016

Published in: Science
  • Be the first to comment

  • Be the first to like this

Traditional vs Nontraditional Methods for Network Analytics - Ernesto Estrada

  1. 1. 1 Professor Ernesto Estrada Department of Mathematics & Statistics University of Strathclyde Glasgow, UK www.estradalab.org
  2. 2. 2
  3. 3. 3
  4. 4. Measure what is measurable, and make measurable what is not so. Galileo Galilei
  5. 5. Professor Ernesto Estrada Department of Mathematics & Statistics University of Strathclyde Glasgow, UK www.estradalab.org
  6. 6.    kkp ~  kk ekp / ~    !k ke kp kk                 2 2 2 2 1 k kk k ekp  
  7. 7.    kkp ~  !k ke kp kk 
  8. 8. Poisson Exponential Gamma Power-law Log-normal Stretched exponential Stumpf & Ingram, Europhys. Lett. 71 (2005) 152.
  9. 9.     kekp kk / ~            2 ~ 22 2//ln   k e kp mk Stumpf & Ingram, Europhys. Lett. 71 (2005) 152.
  10. 10. The qualitative interpretation of the skew is complicated. For example, a zero value indicates that the tails on both sides of the mean balance out, which is the case for a symmetric distribution, but is also true for an asymmetric distribution where the asymmetries even out, such as one tail being long but thin, and the other being short but fat. Further, in multimodal distributions and discrete distributions, skewness is also difficult to interpret. Importantly, the skewness does not determine the relationship of mean and median. Wikipedia
  11. 11. • Collatz-Sinogowitz Index (1957)   kGCS  1 • Bell Index (1992)   2 2 11         i i i i k n k n GVar • Albertson Index (1997)       Eji ji kkGirr , Unknown upper bounds
  12. 12. i i k p 1           ji ij kk p 11 2 1 Estrada: Phys. Rev. E 82, 066102 (2010)          ji ij kk p 1 ˆ ijijij pp ˆ
  13. 13. 2 11 211 2                    ji jiji ij kk kkkk                   Eij jiEij ij kk G 2 11 2 
  14. 14. AKL          otherwise,0 ,~for1 ,for ji jik L i ij 2/12/1 2/12/1     KAKI KLKL           otherwise,0 ,~for 1 ,for1 ji kk ji ji ijL
  15. 15.        .2 2 11 , 2/1 Gn kkn G Eji ji T         L
  16. 16.   2 1 n Gn       12 2~    nn Gn G     0~1  G
  17. 17. n  210      n i j j j T j i n 1 1 1 1 cos          j jj T nG  2 cos11  L             j i jj T iG 2 11   L
  18. 18. jjjx  cos1 jjjy  sin1     n j jx nn n G 1 2 12 ~ j 01  j jx jy y x
  19. 19. 8k 16k pnG ,   017.0~ G  034.0~ G
  20. 20.                      1227.2 27.0 18.1 18.1 nn n k k BA 8k 16k   132.0~ G   108.0~ G
  21. 21. S. cereviciaeD. melanogaster C. elegansH. pylori   148.0~ G   294.0~ G E. coli   241.0~ G   323.0~ G   383.0~ G Stretched exponential Log-normal     kekp kk / ~            2 ~ 22 2//ln   k e kp mk
  22. 22. Professor Ernesto Estrada Department of Mathematics & Statistics University of Strathclyde Glasgow, UK www.estradalab.org
  23. 23. 1,305 mi. Stanley Milgram 1933-1984
  24. 24. CLUSTERINGDISTANCE
  25. 25. 5ij d 
  26. 26. i i Ci nodeofrelationsetransitivpossibleofnumbertotal nodeofrelationsetransitivofnumber     1 2 2/1     ii i ii i i kk t kk t C  i iC n C 1
  27. 27. d 3.65 0.79 2.99 0.00027 1847 0.74C
  28. 28. d 2.65 0.28 2.25 0.05 10.5 0.69C 33
  29. 29. d 18.7 0.08 12.4 0.0005 926 0.3C 34
  30. 30. D. J. Watts S. H. Strogatz 35
  31. 31. There is a high probability that the shortest path connecting nodes p and q goes through the most connected nodes of the network. Sending the ‘information’ to the most connected nodes (hubs) of the network will increase the chances of reaching the target. The sender does not know the global structure of the network! 36
  32. 32. But also, there is a high probability that the most connected nodes of the network are involved in a high number of transitive relations. Sending the ‘information’ to the most connected nodes (hubs) of the network will increase the chances of getting lost. 37
  33. 33. Definition: A route is a walk of length l, which is any sequence of (not necessarily different) nodes v1,…, vl such as for each i =1,…,l there is a link from vl to vl+1. Definition: The communicability between two nodes in a network is defined as a function of the total number of routes connecting them, giving more importance to the shorter than to the longer ones. 38 Hypothesis: The information not only flows through the shortest paths but by mean of any available route.
  34. 34. 39  0 # of walks in steps from topq l l G c l p q     The between the nodes p and q is then mathematically defined as where cl should:  makes the series convergent  gives more weight to the shorter than to the longer walks
  35. 35.           VpVp pfpfV 2 2         Eqp qfpAf , Definition: Let . The adjacency operator is a bounded operator on defined as . Theorem: (Harary) The number of walks of length l between the nodes p and q in a network is equal to .  pq l A  V2 40
  36. 36. The communicability function can be written as:   qjpj l j l n j lpq l l lpq cAcG ,, 0 10        41 Let be a nonincreasing ordering of the eigenvalues of A, and let be the eigenvector associated with . n  21 j  j
  37. 37. 42     j ee l A G n j qjpjpq A l pq l pq       1 ,, 0 !             n j j qjpj pq l pq ll pq AIAG 1 ,,1 0 1    1 10   
  38. 38.          1 1 1 1 2 n n pq n j j j G K p q e e p p                 1 1 2 1 1 1 1 . n n pq n j j j n n e G K e p p n e e n ne ne               pq nG K   
  39. 39. 2cos 1 j j n            2 sin 1 1 j jp p n n        2cos 1 1 2cos 1 1 2 2 sin sin 1 1 1 1 2 sin sin . 1 1 1 jn n pq n j jn n j jp jq G P e n n n n jp jq e n n n                                                
  40. 40.       sin sin 1/ 2 cos cos               2cos 2cos 1 1 1 1 1 cos cos 1 1 1 1 j jn n n pq n j j p q j p q G P e e n n n n                                    1,2, ,j n  1/ nj  ,0        2cos 2cos 0 0 1 1 cos cospq nG P p q e d p q e d                 / 1j n  
  41. 41.    xcos 0 1 cos e d I x            2 2pq n p q p qG P I I         1 1 12 2 0n n n nG P I I    
  42. 42. x xx x xx x x xo oo oooo oo o 0 5 10 15 20 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 Subject Secondlargestprincipalvalue x x x x x x x x oo oo o oo o 0 5 10 15 20 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 x o o Subject Secondlargestprincipalvalue   pqpq AWWG 2/12/1 exp ~  
  43. 43. : Stroke : Decreased communicability : Stroke : Increased communicability Images courtesy of Dr. Jonathan J. Crofts.
  44. 44. Professor Ernesto Estrada Department of Mathematics & Statistics University of Strathclyde Glasgow, UK www.estradalab.org
  45. 45.  Local metrics provide a measurement of a structural property of a single node  Designed to characterise  Functional role – what part does this node play in system dynamics?  Structural importance – how important is this node to the structural characteristics of the system?
  46. 46.     j ee l A G n j pjpp A l pp l pp       1 2 , 0 ! 1) 3.026 2) 1.641 3) 1.641 4) 3.127 5) 2.284 6) 2.284 7) 1.592 8) 1.592 1 2 3 4 5 6 7 8
  47. 47. 1 2 3 4 56 7 8 9  8,6,5,3,11 S   65.451 SpGpp  9,7,4,22 S   70.452 SpGpp
  48. 48. Essential 54
  49. 49. 55 Rank Protein Essential ? 1 YCR035C Y 2 YIL062C Y ... ... ...
  50. 50. Estrada: Proteomics 6 (2009) 35-40 0 10 20 30 40 50 60 70PercentageofEssentialProteinsIdentified
  51. 51. 57 Consider that two nodes p and q are trying to communicate with each other by sending information both ways through the network. ppG Quantifies the amount of ‘information’ that is sent by p, wanders around the network, and returns to the origin, the node p. pqG Quantifies the amount of ‘information’ that is sent by p, wanders around the network, and arrives to its destination, the node q. We know that:
  52. 52. 2 def pq pp qq pqG G G    The goal of communication is to maximise the amount of information that arrives to its destination by minimising that which is lost. 58
  53. 53. Theorem: The function is a Euclidean distance.pq 59     T pq p q p qe   Λ φ φ φ φ 1 2 0 0 0 0 0 0 n                Λ       1 2 p n p p p                 φ Proof:
  54. 54. 60             /2 /2 /2 /2 /2 /2 2 T pq p q p q T p q p q T p q p q p q e e e e e e                Λ Λ Λ Λ Λ Λ φ φ φ φ φ φ φ φ x x x x x x Proof (cont.):
  55. 55. 61 pq pqd p q
  56. 56. 62 p q 1P p q 2P     09.5 1, , 2/1    Pji Eji ij     80.4 2, , 2/1    Pji Eji ij Remark. The shortest route based on the communicability distance is the shortest path that avoids the nodes with the highest ‘cliquishness’ in the graph.
  57. 57. 8 1.56 10  8 3.30 10  
  58. 58. Fraction of nodes removed
  59. 59. The routes minimizing the communicability distance are the shortest ones that avoid the hubs of the network. Then, they can be useful to avoid possible collapse of the networks due to the removal of the hubs.
  60. 60. 67 Essential
  61. 61.    q qp n , 1     q qpd n , 1 68
  62. 62. 69 large ppG Wikipedia
  63. 63. 70 qqpp pq def pq GG G  Theorem. The function is the cosine of the Euclidean angle spanned by the position vectors of the nodes p and q. pq qqpp pq qp qp pq GG G xx xx 11 coscos        Definition:
  64. 64. 71             a b cR 2 2 2 4 1 11  AT ea   1  AT esb   sesc AT    Theorem. The communicability distance induces an embedding of a network into an (n-1)-dimensional Euclidean sphere of radius: Estrada et al.: Discrete Appl. Math. 176 (2014) 53-77. Estrada & Hatano, SIAM Rev. (2016) in press. ATT essC 211    A ediags   Remark. The communicability distance matrix C is circum- Euclidean:
  65. 65. 72            2/1 2/1 2/1 1             2/1 0 2/1 2             2/1 2/1 2/1 3  jjjA   
  66. 66. 73 0.08.0 6.0 0.0 7.0 0.0 0.1 0.1 x y z 2  3  1  0.0 0.1 2  3  1  0.1 0.1
  67. 67. 74 2  1  3  x y z 0.1 0.0 0.1 0.1 0.1 0.0 0.1 0.2 0.0 O 1x  2x  3x 
  68. 68. 75 px  qx  O ppp Gx   pq pq qqq Gx   Estrada & Hatano: Arxiv (2014) 1412.7388. Estrada: Lin. Alg. Appl. 436 (2012) 4317-4328. qppq xxG  
  69. 69. Professor Ernesto Estrada Department of Mathematics & Statistics University of Strathclyde Glasgow, UK www.estradalab.org
  70. 70. 1V 2V 1 2V V V  0 0T B A B        1j n j    
  71. 71. m m b fr 1 Problem:
  72. 72.   sinh 0Bipartivity Tr A  Bipartivity No odd cycles
  73. 73.         exp sinh coshTr A Tr A Tr A       exp coshBipartivity Tr A Tr A            EE EE ATr ATr b even n j j n j j s      1 1 exp cosh exp cosh  
  74. 74.    even even s s EE EE a b G b G e EE EE a b        Theorem. G G e e b a
  75. 75. 1.000sb  0.829sb  0.769sb  0.721sb  0.692sb  0.645sb  0.645sb  0.597sb 
  76. 76.         1 1 1 1 cosh sinh cosh sinh n n j j j j e n n j j j j b                          1 1 exp exp exp exp n j j e n j j Tr A b Tr A          
  77. 77. 1.000sb  0.658sb  0.538sb  0.462sb  0.383sb  0.289sb  0.289sb  0.194sb 
  78. 78. Hervibores Grass Parasitoids Hyper-hyper-parasitoids Hyper-parasitoids 0.766sb  0.532eb 
  79. 79. j = 1 j = 2 j = 3 j = 4 j = 5 j = 6
  80. 80.      ˆ exp cosh sinhpq pq pq G A A A           ˆ sinh 0pq pq G A     p q p q  ˆ cosh 0pq pq G A    0 and in different partitionsˆ 0 and in the same partitions pq p q G p q   
  81. 81. 1 3 7 4 6 10 8 11 9 12 5 2
  82. 82.                                                    38.919.872.673.680.636.628.777.610.705.704.773.8 19.81.1083.609.711.766.673.862.598.729.728.702.9 72.683.603.817.487.543.504.731.632.456.665.528.7 73.609.717.405.862.550.505.742.562.630.456.629.7 80.611.787.562.517.893.310.768.596.562.632.498.7 36.666.643.550.593.337.777.639.568.542.531.662.5 28.773.804.705.710.777.638.936.680.673.672.619.8 77.662.531.642.568.539.536.637.793.350.543,566.6 10.798.732.462.696.568.580.693.317.862.587.511.7 05.729.756.630.462.642.573.650.562.505.817.409.7 04.728.765.556.632.431.672.643.587.517.403.883.6 73.802.928.729.798.75.6219.866.611.709.783.61.10 ˆG
  83. 83.                                        011111000000 101111000000 110111000000 111011000000 111101000000 111110000000 000000011111 000000101111 000000110111 000000111011 000000111101 000000111110 ~ G
  84. 84. 10 12 11 8 7 9 2 3 4 1 5 6  1 1,2,3,4,5,6C   2 6,7,8,9,10,11,12C  1 3 7 4 6 10 8 11 9 12 5 2
  85. 85. 2 3 10 4 1211 1 5 6 87 9 1 3 7 4 6 10 8 11 9 12 5 2
  86. 86. Professor Ernesto Estrada Department of Mathematics & Statistics University of Strathclyde Glasgow, UK www.estradalab.org
  87. 87. VS  VS 2/1  S Let and represents the number of edges with exactly one endpoint in S.     1 2 min , ,0 2n S S V G S V S S          
  88. 88.    1 2 1 1 22 2 G           N  321(Alon-Milman): Let 21  
  89. 89. A Ramanujan graph n = 80,  = 1/4 The Petersen graph n = 10,  = 1
  90. 90. S S   SS  bottleneck
  91. 91. MySQL 30.7 93.6 56.5 St Marks St Martin
  92. 92.  t   t   t   t 
  93. 93. iiG i              800,628,332 1 ! 1032 0 iiiiii k ii k ii AAA k A G
  94. 94. i i i 1/2 1/6 1/24 i 1/40320
  95. 95.  iNk       1 1             n j kkk jNiNis i,1 
  96. 96.   k jpj n p n j ijk iN  , 1 1 ,          n p k jqj n q n j pj n p k pN 1 , 1 1 , 1            n p k jqj n q n j pj k jpj n p n j ij k is 1 , 1 1 , , 1 1 ,  
  97. 97.   . 1 1 ,1,1 1 ,1,1 1 1 1,1,1 1 1,1,1            n p n q qp n p pi n p n q k qp n p k pi k is     2 1 ,1 1 1 ,1,1             n j p n p n q qp    in p p i k is ,1 1 ,1 ,1      k Remark: The eigenvector centrality of a given node represents the probability of selecting at random a walk of infinite length that has started at that node.
  98. 98. 1 2       n j jijiiiG 2 2 ,1 2 ,1 expexp   1 2 ,1 exp  iiiG  2 ln 2 1 ln 1 ,1    iii G i,1ln  iiGln
  99. 99.   5.0 1 2 ,1 ,1 exp lnln        ii i i G  
  100. 100.   iiii GVi  1 2 ,1,1 exp,0ln 
  101. 101.   iiii GVi  1 2 ,1,1 exp,0ln 
  102. 102.   iiii GVi  1 2 ,1,1 exp,0ln 
  103. 103. andVpp  ,0ln ,1 Vqq  ,0ln ,1
  104. 104. Chordless cycle of length 15
  105. 105. MySQL 3 105.1   St Marks St Martin 3 1090.2   67.1

×