Presentation

  1. 1. Dynamics in large scale networks John Clements Supervised by: Dr. Babak Farzad, Dr. Henryk Fukś Brock University jc09xs@brocku.ca February 01 2016 John Clements (Brock University) Dynamics in large scale networks February 01 2016 1 / 65
  2. 2. Table of Contents
     1 Introduction
         Definitions
     2 A Brief History of Large network dynamics
         Patterns in the removal of nodes from large networks.
         Network properties
     3 High Clustering
     4 Node expiration
         Connectivity and node expiration.
         Degree
         Clustering Coefficient
         Conclusions
     5 The server merger
         Graphical evolution
         The servers before the merger
         The merger.
         Degree differences.
     6 Graph motifs
         Finding bounds on the 3 node subgraphs.
  4. 4. Graph Theory Definition
     Graph: A graph G is an ordered pair (V(G), E(G)) consisting of a set V(G) of vertices and a set E(G) of edges that form connections between them.
  5. 5. Network analysis Definitions
     Degree: The degree of a vertex v in a graph G, denoted $k_G(v)$, is the number of edges of G incident with v.[?]
     Clustering coefficient: The clustering coefficient of a node v is:
     $c_v = \frac{2T(v)}{k_v(k_v - 1)}$
     where T(v) is the number of triangles (i.e. pairs of connected neighbors) v is involved in. The clustering coefficient of a node of degree 0 or 1 is set to 0.[?]
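The clustering coefficient definition above can be sketched in a few lines of Python. This is only an illustration: the adjacency-dictionary representation and the small example graph are assumptions, not part of the original analysis.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Return c_v = 2 T(v) / (k_v (k_v - 1)) for node v.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Nodes of degree 0 or 1 get a coefficient of 0, as in the definition.
    """
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # T(v): number of neighbour pairs that are themselves connected.
    triangles = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * triangles / (k * (k - 1))

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficients = {v: clustering_coefficient(adj, v) for v in adj}
```

In this toy graph node a has three neighbours but only one connected pair among them, so its coefficient is 1/3, while b and c sit in a closed triangle and score 1.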
  7. 7. A brief overview of large network dynamics
     There is an enormous number of studies and analyses of real-world large networks, including nearly every type of online network. Examples: six degrees of separation, the actor network.
     These studies range from single snapshots and painstakingly gathered survey data to event-based dynamic studies. Most of them do not account for the removal of nodes: the node removal process has rarely been studied empirically and has been incorporated into very few dynamic network models.
     Many dynamic models have been proposed, and these often include an edge removal process; models that use a node removal process are much rarer.
     Another area important to us is network modeling: many studies of real-world networks propose a model or provide best fits of one.
     Models
       • Nodal attribute models
       • Exponential random graphs
  8. 8. The two datasets.
     The businesses competing for ad space on Google and Bing.
       • We looked at the removal or lapse process for businesses in a network of businesses competing for ad space on Google and Bing.
     The network of friendships among avatars in the MMOFPS PlanetSide 2.
       • We looked for patterns in the removal or expiration of avatars in the massively multiplayer online game (MMOG) PlanetSide 2.
       • Look at the merger of two servers.
       • Examine the removal or lapse process for avatars, looking for a simple rule.
     Why did we choose these networks in particular?
       • We thought we could find patterns in the removal or lapse of these nodes.
       • Competition example
  9. 9. Crawler overview.
     1 Gather a list of active avatar IDs from the server we want to crawl. Add them to the queue of IDs to check.
     2 Get the friend lists of all avatar IDs in the queue from the API. If a request succeeds, remove that ID from the queue and add it to the list of visited IDs.
     3 Go through each friend list and identify which IDs are valid. Save each of these valid friend relationships to the edge set.
     4 While there are IDs in the queue, go to step two.
     5 Record the edge list in a SQL table.
     6 Gather the avatar attributes for each of the IDs found in the crawl and record them in a SQL table.
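The crawl loop in steps 1 to 4 can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions, not the crawler that was actually used: `fetch_friend_list` and `is_valid` are hypothetical callables standing in for the game API and the ID validity check, newly discovered friends are assumed to be added to the queue, and a failed request is simply skipped rather than retried. The SQL steps are left out.

```python
from collections import deque

def crawl(seed_ids, fetch_friend_list, is_valid):
    """Breadth-first crawl of a friendship network.

    fetch_friend_list(avatar_id) -> iterable of friend ids, or None on failure
    is_valid(avatar_id)          -> True if the id should be kept
    Both callables are hypothetical placeholders for the real API.
    """
    queue = deque(seed_ids)
    queued = set(seed_ids)   # everything that has ever been put on the queue
    visited = set()          # ids whose friend list was fetched successfully
    edges = set()            # undirected edges stored as sorted pairs

    while queue:
        avatar_id = queue.popleft()
        friends = fetch_friend_list(avatar_id)
        if friends is None:
            # Assumption of this sketch: give up on failed requests.
            continue
        visited.add(avatar_id)
        for friend_id in friends:
            if friend_id == avatar_id or not is_valid(friend_id):
                continue
            edges.add(tuple(sorted((avatar_id, friend_id))))
            if friend_id not in queued:
                queued.add(friend_id)
                queue.append(friend_id)
    return visited, edges

# Made-up friend lists standing in for API responses; id 99 is "invalid".
fake_api = {1: [2, 3], 2: [1], 3: [1, 4, 99], 4: [3]}
visited, edges = crawl([1], fake_api.get, lambda i: i in fake_api)
```

Storing each edge as a sorted pair means a friendship reported from both sides is recorded once.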
  10. 10. PlanetSide 2 avatar attributes
      The PlanetSide 2 servers

      Server location   Datasets
                        7 days   44 days
      US East           EW       Emerald
      US West           CW       Connery
      EU                MW       Miller

      Our data is drawn from three PlanetSide 2 servers:
        • Connery, the west coast server.
        • Emerald, the server created from the merger of Waterson and Mattherson.
        • Miller, an EU server.
      The available avatar attributes depend on when they were gathered. Common to both datasets:
        • Id
        • Name
  11. 11. Exclusive
      Avatars online in the last 44 days, stored in Connery, Emerald and Miller.
        • Includes the server merger
        • Starts on the 23rd of May.
      Avatars online in the last 7 days, referred to by CW, EW and MW.
        • Outfit Id
        • Outfit size
        • Creation date
        • Login count
        • Last login date
        • Total time played and time played by month
        • Number of kills and deaths by month.
  12. 12. Correlation between attributes
      Average attribute correlation matrix for CW.

                    Degree   CC       Br      Kills   Deaths  K/D     Time    Outfit Size
      Degree        1.000
      CC           -0.005    1.000
      Br            0.305    0.059    1.000
      Kills         0.210    0.006    0.499   1.000
      Deaths        0.206    0.001    0.445   0.794   1.000
      K/D           0.103    0.004    0.403   0.324   0.194   1.000
      Time          0.246    0.004    0.510   0.792   0.892   0.280   1.000
      Outfit Size   0.024   -0.033    0.088   0.003   0.065   0.008   0.056   1.000
  13. 13. Google Ad network visualization
  14. 14. High Clustering
      Clustering coefficient in the PlanetSide 2 snapshots. Advertisement networks.
  16. 16. Avatar states
      Active vs inactive
        • An avatar with activity after the previous snapshot is active.
        • An inactive avatar is any avatar that is not active but is seen active again in the future.
      Avatar states
        • A new avatar is any avatar created after the previous snapshot.
        • A dead or abandoned avatar is any avatar that never returns from inactivity.
        • A third group consists of immediately abandoned (IA) new avatars.
  17. 17. Small world for long diameters

                  Advertisement network    Random graphs
                  Bing      Google         Bing               Google
      Diameter    7         8              3                  4
      APL         2.528     2.752          2.945 (0.00180)    5.108 (0.00696)

      Table: The diameter and average path length of the competition network
  18. 18. The clustering coefficient distribution of Google (panels: Full, Without the spike)
  19. 19. The clustering coefficient distribution of Bing (panels: Full, Without the spike)
  20. 20. Emerald, August 18th: a typical distribution of avatar clustering coefficient (panels: Full, Without the spike)
  22. 22. Edges connecting failed companies. Compared with edges in random subgraphs.
  23. 23. Edge dynamics

                              CW       EW       MW
      Edge formation
        existing ↔ existing   29.97%   29.23%   25.15%
        existing ↔ new         4.74%    5.27%    4.85%
        existing ↔ IA          3.31%    3.71%    3.32%
        new ↔ new              0.45%    0.53%    0.56%
        new ↔ IA               0.30%    0.41%    0.39%
        IA ↔ IA                0.23%    0.29%    0.30%
      Edge deletion
        One removed           53.15%   54.70%   52.01%
        Both removed           4.30%    5.02%    4.54%
        Broken                 4.06%    4.23%    4.69%
        Unstable               0.29%    0.30%    0.26%
  24. 24. Power law degree distribution
      The impact of the degree
        • Power law: $x^{-\alpha}$
        • Exponential: $e^{-\lambda x}$
        • Power law with exponential cutoff: $x^{-\alpha} e^{-\lambda x}$
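One way to compare the three candidate forms above against a degree sequence is to normalise each over a finite range of degrees and compare log-likelihoods. The sketch below is only an illustration of that idea: the parameter values, the degree sample and the support 1..100 are made up, and nothing here is claimed to be the fitting procedure used in the study.

```python
import math

def normalized_pmf(weight, k_min, k_max):
    """Turn an unnormalised weight function into a pmf on k_min..k_max."""
    ks = range(k_min, k_max + 1)
    weights = [weight(k) for k in ks]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}

def log_likelihood(degrees, pmf):
    """Sum of log-probabilities of the observed degrees under `pmf`."""
    return sum(math.log(pmf[k]) for k in degrees)

alpha, lam = 2.0, 0.1  # illustrative parameter values, not fitted
candidates = {
    "power law": lambda k: k ** -alpha,
    "exponential": lambda k: math.exp(-lam * k),
    "power law with cutoff": lambda k: k ** -alpha * math.exp(-lam * k),
}

degrees = [1, 1, 1, 2, 2, 3, 5, 8]  # made-up degree sample
scores = {
    name: log_likelihood(degrees, normalized_pmf(weight, 1, 100))
    for name, weight in candidates.items()
}
```

A higher (less negative) score means the candidate form assigns more probability to the observed degrees; in practice the parameters would be fitted rather than fixed.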
  25. 25. Degree of failed companies (panels: Bing, Google)
  26. 26. Degree of dead avatars
  27. 27. Clustering coefficient distribution of failed companies (panels: Bing, Google)
  28. 28. The normalized battle rank distribution.
  29. 29. Avatar state by size of outfit.
  30. 30. Creation date.
  31. 31. Conclusions
        • Generally, the nodes that were removed from both networks were peripheral, in unimportant positions.
        • But none of the patterns we did find were strong indicators in the end.
  33. 33. We initially started collecting the PlanetSide 2 dataset to capture the merger of two servers.
        • This is the first time that a server merger has been captured and studied.
        • It provides an easily studied analog to real-world mergers of populations.
  34. 34. Reading the graphs
      These images were created in Gephi [?] with the ForceAtlas 2 layout. Node size scales linearly with degree, and colour is assigned by the following table.
      Colour key: one colour per combination of origin (Waterson, Mattherson, Neither) and faction (NC, TR, VS).
  35. 35. Merger: Waterson June 23
  36. 36. Merger: Mattherson June 23
  37. 37. June 30th
  38. 38. July 14th
  39. 39. August 4th
  40. 40. August 18th
  41. 41. September 15
  42. 42. November 17th.
  43. 43. December 17th.
  44. 44. February 23rd.
  45. 45. Assortativity
      Assortativity measures the tendency of nodes to be connected to nodes similar to themselves in some way. The assortativity coefficient is defined as follows:
      $r = \frac{\sum_i e_{i,i} - \sum_i a_i^2}{1 - \sum_i a_i^2}$
      where $e_{i,j}$ is the fraction of edges connecting vertices of type i to vertices of type j, and $a_i$ is the fraction of edges connecting to a vertex of type i. The minimum is:
      $r_{min} = \frac{-\sum_i a_i^2}{1 - \sum_i a_i^2}$
      which occurs when $e_{i,i} = 0$ for all i.
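The assortativity coefficient can be computed directly from an edge list and a node-to-type map. This is a minimal sketch: it assumes an undirected graph and counts each edge in both directions so that the mixing fractions are symmetric, and the node labels and types in the example are invented for illustration.

```python
from collections import Counter

def assortativity(edges, node_type):
    """Assortativity coefficient r for a categorical node attribute.

    edges     : iterable of (u, v) pairs of an undirected graph
    node_type : dict mapping each node to its type (e.g. server of origin)
    Undefined (division by zero) when every edge end has the same type.
    """
    e = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so e is symmetric.
        e[(node_type[u], node_type[v])] += 1
        e[(node_type[v], node_type[u])] += 1
    total = sum(e.values())
    types = {t for pair in e for t in pair}

    trace = sum(e[(t, t)] for t in types) / total               # sum_i e_ii
    a = {t: sum(e[(t, s)] for s in types) / total for t in types}
    sum_a2 = sum(x * x for x in a.values())                     # sum_i a_i^2
    return (trace - sum_a2) / (1 - sum_a2)

# Hypothetical example with two origins, "W" and "M".
origin = {1: "W", 2: "W", 3: "M", 4: "M"}
r_same = assortativity([(1, 2), (3, 4)], origin)    # only same-origin edges
r_cross = assortativity([(1, 3), (2, 4)], origin)   # only cross-origin edges
```

With only same-origin edges the coefficient reaches 1; with only cross-origin edges it reaches the minimum, which is -1 for two equally sized groups.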
  46. 46. The assortativity by origin.
  47. 47. Degree difference of cross origin edges.
  48. 48. Degree difference of Mattherson to Waterson edges.
  50. 50. Graph motif
      Finding the potential motifs in a Barabási-Albert graph.
      Definition: The graph motifs of a network are patterns that occur significantly more often in it than expected in an ensemble of networks[?]. The significance of a motif is measured with the simple Z score.
      Significance:
      $Z = \frac{M - \bar{M}_r}{\sigma_r}$
      where M is the number of subgraphs in the network, and $\bar{M}_r$ and $\sigma_r$ are the mean and standard deviation of the number found in the ensemble.
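As a small worked example of the Z score, the function below takes the subgraph count observed in the network and the counts from each ensemble member. Using the population standard deviation is an assumption of this sketch (a sample standard deviation would also be defensible), and the numbers are made up.

```python
from statistics import mean, pstdev

def motif_z_score(observed, ensemble_counts):
    """Z = (M - mean(M_r)) / sigma_r for one subgraph type."""
    mu = mean(ensemble_counts)
    sigma = pstdev(ensemble_counts)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("ensemble counts have zero variance; Z is undefined")
    return (observed - mu) / sigma

# Made-up counts: 10 occurrences in the network, 4 and 6 in two random graphs.
z = motif_z_score(10, [4, 6])
```

Here the ensemble mean is 5 and the standard deviation is 1, so the observed count sits 5 standard deviations above the ensemble.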
  51. 51. History of network motifs
      Introduced by Shen-Orr et al. Most research has focused on efficiently finding motifs, with tools such as:
        • FANMOD 2006
        • KAVOSH 2009
      In 2013 Johan Ugander found the extremal bounds on the potential subgraphs found in any network by its density.
  54. 54. Significance profile example
  55. 55. The Barabási-Albert algorithm
      The algorithm takes two parameters: N, the number of nodes in the final graph, and m, the number of edges each new node forms to existing nodes.
        • Create a graph with m unconnected nodes.
        • While there are fewer than N nodes in the network, add a node with m edges to existing nodes.
        • The probability of choosing an existing node is proportional to its degree. [?]
      $P(v) = \frac{k_v}{\sum_{i \in V(G)} k_i}$ (1)
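The growth procedure above can be sketched as follows. Two details are assumptions of this sketch rather than statements from the slides: the first added node is linked to all m initial nodes (their degrees are all zero, so degree-proportional choice is undefined at that step), and the m targets of each new node are required to be distinct.

```python
import random

def barabasi_albert(N, m, seed=None):
    """Grow a graph to N nodes by preferential attachment; return its edge list."""
    rng = random.Random(seed)
    edges = []
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` picks a node with probability proportional to its degree.
    stubs = []
    targets = list(range(m))  # assumption: first new node joins all initial nodes
    for new in range(m, N):
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
        # Choose m distinct existing nodes for the next arrival.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        targets = list(chosen)
    return edges

ba_edges = barabasi_albert(50, 3, seed=1)
```

Because every added node contributes exactly m edges, the finished graph has m(N - m) edges, which is where the density formula for a completed BA graph comes from.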
  56. 56. Random graph ensemble
        • Traditionally the ensemble consists of random graphs with the same degree distribution as the original network.
        • However, this method results in some correlations that arise from the degree distribution itself.
        • So we used an ensemble of $G_{n,p}$ random graphs with the same density as the original.
      $G_{n,p}$ random graph: There are two parameters, n and p. Generate a graph with n nodes and, for every pair of nodes, add an edge independently with probability p. The expected density of such a graph is equal to p.
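A $G_{n,p}$ graph as described above takes only a few lines to generate. The sketch below is illustrative; the parameter values in the example are arbitrary and the helper names are not from the original work.

```python
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Every unordered pair gets an edge independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def density(n, edges):
    """Fraction of the n(n-1)/2 possible edges that are present."""
    return 2 * len(edges) / (n * (n - 1))

sample = gnp(200, 0.05, seed=7)
sample_density = density(200, sample)  # fluctuates around p = 0.05
```

Averaged over many graphs the density equals p, which is why matching p to the density of the original network gives an ensemble of the same expected density.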
  58. 58. The 4 undirected triads

      Possible triads   Probability
      Empty             $(1 - p)^3$
      One edge          $3p(1 - p)^2$
      Open triad        $3p^2(1 - p)$
      Triangle          $p^3$

      The density of a completed BA graph is:
      $p = \frac{2m(N - m)}{N(N - 1)}$ (2)
      So we can easily compute the expected number of each triad in a $G_{n,p}$ random graph:
      $\binom{N}{3}(1 - p)^3$, $\binom{N}{3}3p(1 - p)^2$, $\binom{N}{3}3p^2(1 - p)$, $\binom{N}{3}p^3$
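The expected triad counts follow mechanically from the density. The sketch below evaluates them for one arbitrary choice of N and m; the function names are illustrative.

```python
from math import comb

def ba_density(N, m):
    """Density of a completed BA graph: p = 2m(N - m) / (N(N - 1))."""
    return 2 * m * (N - m) / (N * (N - 1))

def expected_triads(N, p):
    """Expected number of each 3-node subgraph in a G(n, p) graph with n = N."""
    triples = comb(N, 3)
    return {
        "empty": triples * (1 - p) ** 3,
        "one edge": triples * 3 * p * (1 - p) ** 2,
        "open triad": triples * 3 * p ** 2 * (1 - p),
        "triangle": triples * p ** 3,
    }

counts = expected_triads(10, ba_density(10, 2))
```

The four probabilities sum to 1, so the four expected counts always add up to the total number of node triples, $\binom{N}{3}$.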
  59. 59. Empirical tests
      The triangles created by low values of m.
  60. 60. Probabilistic bounds.
      The graph at time t depends entirely on the parameters N and m. Since we start from an empty graph and add m edges with every node, the number of edges at any given time must be:
      $|E(G_t)| = m(t - m)$ (3)
      As a result, many other graph parameters can be calculated at any given step, such as density:
      $D = \frac{2m(t - m)}{t(t - 1)}$ (4)
      Examples of edge probabilities and bounds.
  61. 61. Additive bounds.
        • So when N is greater than $\frac{8m^2 - 1 + \sqrt{16m^3 - 16m^2 + 1}}{8m - 2}$, the probability of an edge is greater in the BA graph than in a $G_{n,p}$ graph.
        • We can simply count how many of each subgraph are added at each step and so find bounds, at least for small motifs.
        • These are compared with the expected number of any 3-node subgraph in our ensemble, which is simply $\binom{N}{3}P$, where P is the probability of such a subgraph in a $G_{n,p}$ random graph.
  62. 62. Bounds on triads in the BA graph vs the expected number in the ensemble.
  63. 63. Triangles
      As we know, at each timestep we add m edges and so at most $\binom{m}{2}$ triangles, and at least $\binom{m}{2}$ open triads are created at each step after the second.
      Barabási-Albert: the upper bound on the number of triangle subgraphs is:
      $\sum_{t=m+1}^{N} \binom{m}{2} = \frac{1}{2} m(m - 1)(N - m)$ (5)
      Random graph ensemble: the expected number of triangle subgraphs in a $G_{n,p}$ random graph is:
      $\binom{N}{3} p^3 = \frac{4}{3} \frac{(N - 2)(N - m)^3 m^3}{(N - 1)^2 N^2}$ (6)
  64. 64. Triangles
      Trivially:
      $\frac{1}{2} m(m - 1)(N - m) \ge \frac{4}{3} \frac{(N - 2)(N - m)^3 m^3}{(N - 1)^2 N^2}$
      for all $1 < m < N$.
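The two triangle quantities can be compared numerically. The sketch below evaluates the BA upper bound and the $G_{n,p}$ expectation for a range of parameters; the ranges are arbitrary, and at m = 1 the upper bound is 0 because a single new edge cannot close a triangle.

```python
from math import comb

def ba_triangle_upper_bound(N, m):
    """At most C(m, 2) triangles per added node, over N - m added nodes."""
    return comb(m, 2) * (N - m)

def gnp_expected_triangles(N, m):
    """Expected triangles in G(n, p) with the density of a completed BA graph."""
    p = 2 * m * (N - m) / (N * (N - 1))
    return comb(N, 3) * p ** 3

# Collect every (N, m) with 2 <= m < N < 60 where the bound falls below the
# expectation; the list stays empty over this range.
violations = [
    (N, m)
    for N in range(3, 60)
    for m in range(2, N)
    if ba_triangle_upper_bound(N, m) < gnp_expected_triangles(N, m)
]
```

Since the upper bound stays above the ensemble expectation, this additive bound alone cannot rule the triangle out as a motif.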
  65. 65. Open Triad.
      Random graph ensemble: the expected number of open triads in a $G_{n,p}$ random graph is:
      $\binom{N}{3} 3p^2(1 - p) = \frac{2(N - 2)(N - m)^2 m^2 (N^2 - 2Nm + 2m^2 - N)}{(N - 1)^2 N^2}$ (5)
      Barabási-Albert: the lower bound on the number of open triads in a BA graph satisfies:
      $\frac{1}{2} m(m + 1)(N - m) \ge \frac{2(N - 2)(N - m)^2 m^2 (N^2 - 2Nm + 2m^2 - N)}{(N - 1)^2 N^2}$ (6)
  66. 66. Open Triad. Solution
      Therefore, when m and N are chosen such that $m \ge \frac{N}{2} - 1$, the open triad will be a motif of the resulting graph.
  67. 67. One Edge.
      Barabási-Albert: the minimum number of subgraphs containing a single edge is:
      $\sum_{t=m+1}^{N} \left[ \binom{t - 1}{2} - \binom{t - 1 - m}{2} \right] = \frac{1}{2} m(N - 2)(N - m)$ (5)
      Random graph ensemble: the expected number of subgraphs containing exactly one edge in a $G_{n,p}$ random graph is:
      $\binom{N}{3} 3p(1 - p)^2 = \frac{(N - 2)(N - m) m (N^2 - 2Nm + 2m^2 - N)^2}{(N - 1)^2 N^2}$ (6)
  68. 68. One Edge.
      So the maximum number of one edge subgraphs in the BA graph is greater than the expected number in the $G_{n,p}$ when:
      $m < \frac{1}{2}N + \frac{1}{2}\sqrt{(\sqrt{2} - 1)N^2 + (2 - \sqrt{2})N}$ (5)
      $m > \frac{1}{2}N - \frac{1}{2}\sqrt{(\sqrt{2} - 1)N^2 + (2 - \sqrt{2})N}$ (6)
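The one-edge comparison can be checked numerically. The sketch below evaluates the BA sum term by term, the $G_{n,p}$ expectation, and the two crossover values of m; it assumes the radicand is $(\sqrt{2} - 1)N^2 + (2 - \sqrt{2})N$, which is what setting the two expressions equal gives, and N = 100 is an arbitrary choice.

```python
from math import comb, sqrt

def ba_one_edge_count(N, m):
    """Sum over added nodes t of C(t-1, 2) - C(t-1-m, 2)."""
    return sum(comb(t - 1, 2) - comb(t - 1 - m, 2) for t in range(m + 1, N + 1))

def gnp_expected_one_edge(N, m):
    """Expected one-edge triads in G(n, p) at the density of a completed BA graph."""
    p = 2 * m * (N - m) / (N * (N - 1))
    return comb(N, 3) * 3 * p * (1 - p) ** 2

def crossover_values(N):
    """Values of m at which the BA count equals the G(n, p) expectation."""
    root = sqrt((sqrt(2) - 1) * N ** 2 + (2 - sqrt(2)) * N)
    return (N - root) / 2, (N + root) / 2

low, high = crossover_values(100)                                            # about 17.6 and 82.4
inside = ba_one_edge_count(100, 20) > gnp_expected_one_edge(100, 20)         # m between the values
outside_low = ba_one_edge_count(100, 10) < gnp_expected_one_edge(100, 10)    # m below the lower value
outside_high = ba_one_edge_count(100, 90) < gnp_expected_one_edge(100, 90)   # m above the upper value
```

For N = 100 the BA count exceeds the expectation only for m between roughly 18 and 82, matching the two crossover values.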
  69. 69. Empty.
      Barabási-Albert: the upper bound on the number of empty triads in a BA graph is:
      $\binom{m}{3} + \sum_{t=m+1}^{N} \binom{t - 1 - m}{2} = \frac{1}{6}(N - 2)(N^2 - 3Nm + 3m^2 - N)$ (7)
      Random graph ensemble: the expected number of empty triads in a $G_{n,p}$ graph is:
      $\binom{N}{3}(1 - p)^3 = \frac{1}{6} \frac{(N - 2)(N^2 - 2Nm + 2m^2 - N)^3}{(N - 1)^2 N^2}$ (8)
  70. 70. Empty.
      For all $N > 5$ the upper bound is less than the expected value of empty subgraphs in the ensemble.
  71. 71. Final bounds
      Probabilistic bounds
      When the difference between m and N is such that
      $N \ge \frac{8m^2 - \sqrt{16m^3 - 16m^2 + 1} - 1}{8m - 2}$
      holds, then the triangle and empty triads will never be a motif.
  72. 72. Final bounds
      Additive bounds
      The open triad will be a motif of a BA graph whenever:
      $m \ge \frac{N}{2} - 1$ for any $N > 2$.
      The single edge triad can only be a motif when:
      $\frac{1}{2}N - \frac{1}{2}\sqrt{(\sqrt{2} - 1)N^2 + (2 - \sqrt{2})N} < m < \frac{1}{2}N + \frac{1}{2}\sqrt{(\sqrt{2} - 1)N^2 + (2 - \sqrt{2})N}$
      for any valid N.
  73. 73. Future work
      The formation of a server's social network: Continuing from the server merger, we have records of the formation of several new servers that could be investigated, letting us learn how the original servers' structure came to be.
      Identifying the social structure of players: Continuing from the work on the removal of players from the PlanetSide 2 network and of companies from the ad network, it would be very helpful to have a better way of identifying the parent players or companies, which could potentially change the networks' structure greatly.
  75. 75. Future work: Additional database analysis.
      By their very nature, large datasets will always leave unanswered questions. There are a huge number of potential relationships between the networks, the actors and their removal that we did not have time to test; for example, how many of the removed avatars had a typo in their name. This suggests some future work that seems interesting but is either outside the scope of large network analysis or simply something we did not have time to do.