The spectre of the spectrum


Published on

Slides from a talk I just gave at IU. N

Published in: Education, Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

The spectre of the spectrum

  1. 1. The Spectre of the Spectrum An empirical study of the ? spectra of large networks David F. Gleich     Purdue University Indiana University Thanks to Ali Pinar, Jaideep Ray,Tammy Kolda, Bloomington, IN C. Seshadhri, Rich Lehoucq @ Sandia and Supported by Sandia’s John von Neumann Jure Leskovec and Michael Mahoney @ Stanford postdoctoral fellowship and the DOE for helpful discussions. Office of Science’s Graphs project.David Gleich (Purdue) IU Seminar 1/51
  2. 2. Spectral density plots Log(Density) ~ CountEigenvalue Eigenvalues Eigenvalue index 0 2 Scree Plot Empirical Spectral Measure David Gleich (Purdue) IU Seminar 2/51
  3. 3. There’s information inside the spectra Log(Density) ~ Count Eigenvalues 0 2 Words in dictionary definitions Internet router network 111k vertices, 2.7M edges 192k vetices, 1.2M edges These figures show the normalized Laplacian. Banerjee and Jost (2009) also noted such shapes in the spectra.David Gleich (Purdue) IU Seminar 3/51
  4. 4. OverviewGraphs and their matricesData for our experimentsIssues with computing spectraMany examples of graph spectraA curious property around the eigenvalue oneComputing spectra for large networksOngoing studiesDavid Gleich (Purdue) IU Seminar 4/51
  5. 5. Why are we interested in the spectra?ModelingProperties Moments of the adjacencyRegularitiesAnomaliesNetwork Comparison Fay et al. 2010 – Weighted Spectral Density The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodesDavid Gleich (Purdue) IU Seminar 5/51
  6. 6. Matrices from graphsAdjacency matrix Random walk matrix                                                  if               Modularity matrix                                                                          Laplacian matrix                       Not covered Signless Laplacian matrix                 Incidence matrix                            (It is incidentally discussed) Seidel matrixNormalized Laplacian matrix Heat Kernel                                                                       Everything is undirected. Mostly connected components only too.David Gleich (Purdue) IU Seminar 6/51
  7. 7. Erdős–Rényi Semi-circlesBased on Wigner’s semi-circle law.The eigenvalues of the Countadjacency matrix forn=1000, averagedover 10 trials Warning AdjacencySemi-circle with outlier matrix!if average degree islarge enough. Eigenvalue Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14)David Gleich (Purdue) IU Seminar 7/51
  8. 8. Previous resultsFarkas et al. Significant deviation from the semi- circle law for the adjacency matrixMihail and Papadimitriou Leading eigenvalues of the adjacency matrix obey a power-law based on the degree-sequenceChung et al. Normalized Laplacian still obeys a semi-circle law if min-degree largeBanerjee and Jost Study of types of patterns that emerge in evolving graph models – explain many features of the spectraDavid Gleich (Purdue) IU Seminar 8/51
  9. 9. In comparison to other empiric studiesWe use “exact” computation of spectra, instead of approximation.We study “all” of the standard matrices over a range of large networks.Our “large” is bigger.We look at a few random graph models preferential attachment random powerlaw copying model forest fire modelDavid Gleich (Purdue) IU Seminar 9/51
  10. 10. Why you ISSUES WITH should be COMPUTING very SPECTRA careful with eigenvalues.David Gleich (Purdue) IU Seminar 10
  11. 11. Matlab!Always a great starting point. My desktop has 24GB of RAM (less than $2500 now!)24GB/8 bytes (per double) = 3 billion numbers ~ 50,000-by-50,000 matrixPossibilities D = eig(A) – needs twice the memory for A,D [V,D] = eig(A) – needs three times the memory for A,D,VThese limit us to ~38000 and ~31000 respectively.David Gleich (Purdue) IU Seminar 11/51
  12. 12. Bugs – Matlabeig(A)Returns incorrect eigenvectors Seems to be the result of a bug in Intel’s MKL library. Fixed in R2011aDavid Gleich (Purdue) IU Seminar 12/51
  13. 13. Bug – ScaLAPACK defaultsudo apt-get install scalapack-openmpiAllocate 36000x36000 local matrixRun on 4 processorsCode crashes Fixed in upcoming Debian libscalapackDavid Gleich (Purdue) IU Seminar 13/51
  14. 14. Bug – LAPACKScalapack MRRRCompare standard lapack/blas to atlas performanceResult: correct output from atlasResult: incorrect output from lapackHypothesis: lapack contains a known bug that’s apparently in the default ubuntu lapackDavid Gleich (Purdue) IU Seminar 14/51
  15. 15. MoralAlways test your software.Extensively.David Gleich (Purdue) IU Seminar 15/51
  16. 16. David Gleich (Purdue) IU Seminar 16
  17. 17. EXAMPLESDavid Gleich (Purdue) IU Seminar 17
  18. 18. Data sources SNAP Various 100s-100,000s SNAP-p2p Gnutella Network 5-60k, ~30 inst. SNAP-as-733 Autonomous Sys. ~5,000, 733 inst. SNAP-caida Router networks ~20,000, ~125 inst. Pajek Various 100s-100,000s Models Copying Model 1k-100k 9 inst. 324 gs Pref. Attach 1k-100k 9 inst. 164 gs Forest Fire 1k-100k 9 inst. 324 gs Mine Various 2k-500k Newman Various Arenas Various Porter Facebook 100 schools, 5k-60k IsoRank, Natalie Protein-Protein <10k , 4 graphs Thanks to all who make data availableDavid Gleich (Purdue) IU Seminar 18/51
  19. 19. Big graphsArxiv 86376 1035126 Co-authorDblp 93156 356290 Co-authorDictionary(*) 111982 2750576 Word defns.Internet(*) 124651 414428 RoutersItdk0304 190914 1215220 Routersp2p-gnu(*) 62561 295756 Peer-to-peerPatents(*) 230686 1109898 CitationsRoads 126146 323900 RoadsWordnet(*) 75606 240036 Word*) 325729 2994268 Web (*) denotes that this is a weakly connected component of a directed graph.David Gleich (Purdue) IU Seminar 19/51
  20. 20. A $8,000 matrix computation 925 nodes and 7400 processors on Redsky for 10 hours normalized Laplacian matrixDavid Gleich (Purdue) IU Seminar 20/51
  21. 21. Indiana’s Facebook Network Warning Warning Adjacency Laplacian marix! marix! Data from Mason Porter. Aka, the start of a $50,000,000,000 graph.David Gleich (Purdue) IU Seminar 21/51
  22. 22. Yes! These are cases where we have multiple instances of the same graph.David Gleich (Purdue) IU Seminar 22/51
  23. 23. Already known? Just the facebook spectra.David Gleich (Purdue) IU Seminar 23/51
  24. 24. Already known? I soon realized I was searching for “spectre” instead of spectrum, oops.David Gleich (Purdue) IU Seminar 24/51
  25. 25. Spikes? Repeated rows Identical rows grow the null-space. Banerjee and JostUnit eigenvalue Motif doubling and joining small                                          repeated graphs will tend to cause eigenvalues and null vectors. Banerjee and Jost explained how evolving graphs should produce repeated eigenvaluesDavid Gleich (Purdue) IU Seminar 25/51
  26. 26. Combining EigenvaluesIf A has an eigenvector with a zero component, then“A + B” (as in the figure) has the same eigenvalue with eigenvector extended with zeros on B. Bannerjee and Jost observed this for the normalized Laplacian.David Gleich (Purdue) IU Seminar 26/51
  27. 27. Spikes! 1.33 (two!) 1.5, 0.5 1.5 0.565741 1.833 1.767592 0.725708 1.607625 1.5 (two)David Gleich (Purdue) IU Seminar 27/51
  28. 28. Classic spectra From NASA: Gleich (Purdue) IU Seminar 28/51
  29. 29. Random power law Random power law 12500 vertices, 500 (2.*) / 400 (1.8) min degreeGenerate a power lawdegree distribution.Produce a randomgraph with aprescribed degreedistribution using theBayati-Kim-Saberiprocedure.David Gleich (Purdue) IU Seminar 29/51
  30. 30. Preferential AttachmentStart graph with a k-node clique. Add a new node andconnect to k random nodes, chosen proportional to degree. Semi-circle in log-space!David Gleich (Purdue) IU Seminar 30/51
  31. 31. Copying modelStart graph with a k-node clique. Add a new node and pick aparent uniformly at random. Copy edges of parent and makean error with probability     Obvious follow up here: does a random sample with the same degree distribution show the same thing?David Gleich (Purdue) IU Seminar 31/51
  32. 32. Forest Fire modelsStart graph with a k-node clique. Add a new node and pick a parent uniformly at random. Do a random “bfs’/”forest fire” and link to all nodes “burned”David Gleich (Purdue) IU Seminar 32/51
  33. 33. Reality vs. graph modelsDavid Gleich (Purdue) IU Seminar 33/51
  34. 34. Where is this going?We can computespectra for largenetworks ifneeded.Study relationshipwith known power-laws in spectraEigenvectorlocalizationDirected LaplaciansDavid Gleich (Purdue) IU Seminar 34/51
  35. 35. Just the degree distribution? NoDavid Gleich (Purdue) IU Seminar 35/51
  36. 36. Facebook is not a copying modelDavid Gleich (Purdue) IU Seminar 36/51
  37. 37. Why is one separated sometimes?David Gleich (Purdue) IU Seminar 37/51
  38. 38. Separation in copying, forest-fire Note, the axes on these fiures aren’t comparable – different plotting scales – but the shapes are.David Gleich (Purdue) IU Seminar 38/51
  39. 39. Forest-fire plotsDavid Gleich (Purdue) IU Seminar 39/51
  40. 40. Strong separation in random powerlawDavid Gleich (Purdue) IU Seminar 40/51
  41. 41. Not edge-density aloneDavid Gleich (Purdue) IU Seminar 41/51
  42. 42. Not edge-density aloneDavid Gleich (Purdue) IU Seminar 42/51
  43. 43. COMPUTING SPECTRA OF LARGE NETWORKSDavid Gleich (Purdue) IU Seminar 43
  44. 44. Redsky, Hopper I, Hopper II, and a Cielo testbed. Details if time.David Gleich (Purdue) IU Seminar 44/51
  45. 45. Eigenvalues with ScaLAPACKMostly the same approach as in LAPACK1.  Reduce to tridiagonal form (most time consuming part)2.  Distribute tridiagonals to all processors3.  Each processor finds all eigenvalues4.  Each processor computes a subset of eigenvectors ScaLAPACK’s 2d block cyclic storageI’m actually using the MRRR algorithm,where steps 3 and 4 are better and faster MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel.David Gleich (Purdue) IU Seminar 45/51
  46. 46. Estimating the density directly    and have the same eigenvalue inertia if     is non- singular. Eigenvalue inertia = (p,n,z) Positive eigenvalues Negative eigenvalues Zero eigenvalues If       is diagonal, inertia is easy to compute     has inertia (n-1,0,1)        has inertia (sum(      ), sum(      ),…) This is an old trick in linear algebra. I know that Fay et al. used it in their weighted spectral density.David Gleich (Purdue) IU Seminar 46/51
  47. 47. AlternativesUse ARPACK to get extremaUse ARPACK to get interior around      via the folded spectrum                     Large nearly repeated sets of eigenvalues will make this tricky. Farkas et al. used this approach. Figure from somewhere on the web… sorry!David Gleich (Purdue) IU Seminar 47/51
  48. 48. Adding MPI tasks vs. using threadsMost math libraries have threaded versions (Intel MKL, AMD ACML)Is it better to use threads or MPI tasks?It depends. Cray libsci Intel MKL Threads Ranks TimeThreads Ranks Time-T Time-E 1 64 1412.51 36 1271.4 339.0 4 16 1881.44 9 1058.1 456.6 16 4 Omitted. Normalized Laplacian for 36k-by-36k co-author graph of CondMatDavid Gleich (Purdue) IU Seminar 48/51
  49. 49. Weak Parallel Scaling Time               Time in hours Good strong scaling up to 325,000 vertices Estimated time for 500,000 nodes 9 hours with 925 nodes (7400 procs)          David Gleich (Purdue) IU Seminar 49/51
  50. 50. Code will be available eventually. Image from good financial cents.David Gleich (Purdue) IU Seminar 50 of <Total>
  51. 51. Same density, but somewhat different Both have a mean degree of 3.8David Gleich (Purdue) IU Seminar 51/51
  52. 52. GRAPHS As well as things we AND THEIR already know about graph spectra. MATRICESDavid Gleich (Purdue) IU Seminar 52
  53. 53. Spectral bounds from Gerschgorin$$-d_{max} le lambda(mA) le d_{max}$$$$0 le lambda(mL) le 2 d_{max}$$$$0 le lambda(mat{tilde{L}}) le 2$$ (from a slightly different approach)David Gleich (Purdue) IU Seminar 53/51
  54. 54. ScaLAPACKLAPACK with distributed memory dense matricesScalapack uses a 2d block-cyclic dense matrix distributionDavid Gleich (Purdue) IU Seminar 54/51
  55. 55. usroads Connected component from the us road networkDavid Gleich (Purdue) IU Seminar 55/51
  56. 56. Movies of spectra…As these models evolve, what do the spectra look like?David Gleich (Purdue) IU Seminar 56/51
  57. 57. Gnutella large vs. resampledDavid Gleich (Purdue) IU Seminar 57/51
  58. 58. David Gleich (Purdue) IU Seminar 58 of <Total>
  59. 59. ModelsPreferential AttachmentStart graph with a k-node clique. Add a new node and connect to k random nodes, chosen proportional to degree.Copying modelStart graph with a k-node clique. Add a new node and pick a parent uniformly at random. Copy edges of parent and make an error with probability $$alpha$$Forest FireStart graph with a k-node clique. Add a new node and pick a parent uniformly at random. Do a random “bfs’/”forest fire” and link to all nodes “burned”David Gleich (Purdue) IU Seminar 59/51
  60. 60. Nullspaces of the adjacency matrix                                         So unit eigenvalues of the normalized Laplacian are null- vectors of the adjacency matrix.David Gleich (Purdue) IU Seminar 60/51