The Spectre of the Spectra


Published on

The slides from my presentation at SIAM's CSE 2011 meeting about graph spectra and the information inside them. This material is still preliminary and you should check with me for the latest information about it.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

The Spectre of the Spectra

  1. 1. Tweet Along @dgleich The Spectre of the Spectrum An empirical study of the ? spectra of large networks David F. Gleich   Sandia National Laboratories SIAM CSE 2011 Thanks to Ali Pinar, Jaideep Ray, Tammy Kolda, 28 February 2011 C. Seshadhri, Rich Lehoucq @ Sandia and Supported by Sandia’s John von Neumann Jure Leskovec and Michael Mahoney @ Stanford postdoctoral fellowship and the DOE for helpful discussions. Office of Science’s Graphs project.Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.2/24/2011 SIAM CSE 2011 1/36
  2. 2. 2/24/2011 SIAM CSE 2011 2/28
  3. 3. Tweet Along @dgleichThere’s information inside the spectra Words in dictionary definitions Internet router network 111k vertices, 2.7M edges 192k vetices, 1.2M edges These figures show the normalized Laplacian. Banerjee and Jost (2009) also noted such shapes in the spectra.2/24/2011 SIAM CSE 2011 3/28
  4. 4. Tweet Along @dgleichWhy are we interested in the spectra?ModelingProperties Moments of the adjacencyAnomaliesRegularitiesNetwork Comparison Fay et al. 2010 – Weighted Spectral Density The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodes2/24/2011 SIAM CSE 2011 4/28
  5. 5. Tweet Along @dgleichMatrices from graphsAdjacency matrix Random walk matrix     if   Modularity matrix     Laplacian matrix  Not covered Signless Laplacian matrix  Incidence matrix  (It is incidentally discussed) Seidel matrixNormalized Laplacian matrix Heat Kernel   Everything is undirected. Mostly connected components only too.2/24/2011 SIAM CSE 2011 5/28
  6. 6. Tweet Along @dgleichErdős–Rényi Semi-circlesBased on Wigner’s semi-circle law.The eigenvalues of the Countadjacency matrix forn=1000, averagedover 10 trialsSemi-circle with outlierif average degree islarge enough. Eigenvalue Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14)2/24/2011 SIAM CSE 2011 6/28
  7. 7. Tweet Along @dgleichPrevious resultsFarkas et al. : Significant deviation from the semi-circle law for the adjacency matrixMihail and Papadimitriou : Leading eigenvalues of the adjacency matrix obey a power-law based on the degree-sequenceChung et al. : Normalized Laplacian still obeys a semi-circle law if min-degree largeBanerjee and Jost : Study of types of patterns that emerge in evolving graph models – explain many features of the spectra2/24/2011 SIAM CSE 2011 7/28
  8. 8. Tweet Along @dgleichIn comparison to other empiric studiesWe use “exact” computation of spectra, instead of approximation.We study “all” of the standard matrices over a range of large networks.Our “large” is bigger.We look at a few different random graph models.2/24/2011 SIAM CSE 2011 8/28
  9. 9. Tweet Along @dgleichData sources SNAP Various 100s-100,000s SNAP-p2p Gnutella Network 5-60k, ~30 inst. SNAP-as-733 Autonomous Sys. ~5,000, 733 inst. SNAP-caida Router networks ~20,000, ~125 inst. Pajek Various 100s-100,000s Models Copying Model 1k-100k 9 inst. 324 gs Pref. Attach 1k-100k 9 inst. 164 gs Forest Fire 1k-100k 9 inst. 324 gs Mine Various 2k-500k Newman Various Arenas Various Porter Facebook 100 schools, 5k-60k IsoRank, Natalie Protein-Protein <10k , 4 graphs Thanks to all who make data available2/24/2011 SIAM CSE 2011 9/28
  10. 10. Tweet Along @dgleichBig graphsArxiv 86376 1035126 Co-authorDblp 93156 356290 Co-authorDictionary(*) 111982 2750576 Word defns.Internet(*) 124651 414428 RoutersItdk0304 190914 1215220 Routersp2p-gnu(*) 62561 295756 Peer-to-peerPatents(*) 230686 1109898 CitationsRoads 126146 323900 RoadsWordnet(*) 75606 240036 Word 325729 2994268 Web (*) denotes that this is a weakly connected component of a directed graph.2/24/2011 SIAM CSE 2011 10/28
  11. 11. Tweet Along @dgleichModelsPreferential AttachmentStart graph with a k-node clique. Add a new node and connect to k random nodes, chosen proportional to degree.Copying modelStart graph with a k-node clique. Add a new node and pick a parent uniformly at random. Copy edges of parent and make an error with probability  Forest FireStart graph with a k-node clique. Add a new node and pick a parent uniformly at random. Do a random “bfs’/”forest fire” and link to all nodes “burned”2/24/2011 SIAM CSE 2011 11/28
  12. 12. Tweet Along @dgleich COMPUTING SPECTRA OF LARGE NETWORKS2/24/2011 SIAM CSE 2011 12
  13. 13. Tweet Along @dgleich Redsky, Hopper I, Hopper II, and a Cielo testbed. Details if time.2/24/2011 SIAM CSE 2011 13/28
  14. 14. Tweet Along @dgleichEigenvalues with ScaLAPACKMostly the same approach as in LAPACK1. Reduce to tridiagonal form (most time consuming part)2. Distribute tridiagonals to all processors3. Each processor finds all eigenvalues4. Each processor computes a subset of eigenvectorsI’m actually using the MRRR algorithm,where steps 3 and 4 are better and faster MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel.2/24/2011 SIAM CSE 2011 14/28
  15. 15. Tweet Along @dgleichAlternativesUse ARPACK to get extremaUse ARPACK to get interior around   via the folded spectrum  Large nearly repeated sets of eigenvalues will make this tricky. Farkas et al. used this approach. Figure from somewhere on the web… sorry!2/24/2011 SIAM CSE 2011 15/28
  16. 16. Tweet Along @dgleichAdding MPI tasks vs. using threadsMost math libraries have threaded versions (Intel MKL, AMD ACML)Is it better to use threads or MPI tasks?It depends. Cray libsci Intel MKL Threads Ranks TimeThreads Ranks Time-T Time-E 1 64 1412.51 36 1271.4 339.0 4 16 1881.44 9 1058.1 456.6 16 4 Omitted. Normalized Laplacian for 36k-by-36k co-author graph of CondMat2/24/2011 SIAM CSE 2011 16/28
  17. 17. Tweet Along @dgleich Weak Parallel Scaling Time  Time in hours Good strong scaling up to 325,000 vertices Estimated time for 500,000 nodes 9 hours with 925 nodes (7400 procs)   2/24/2011 SIAM CSE 2011 17/28
  18. 18. Tweet Along @dgleich EXAMPLES2/24/2011 SIAM CSE 2011 18
  19. 19. Tweet Along @dgleichA $8,000 matrix computation 325729 nodes and 2994268 edges 500 nodes and 4000 processors on Redsky for 5 hours x 2 for normalized Laplacian/adjacency matrix2/24/2011 SIAM CSE 2011 19/28
  20. 20. Tweet Along @dgleich Yes! These are cases where we have multiple instances of the same graph.2/24/2011 SIAM CSE 2011 20/28
  21. 21. Tweet Along @dgleichAlready known? Just the facebook spectra.2/24/2011 SIAM CSE 2011 21/28
  22. 22. Tweet Along @dgleichAlready known? I soon realized I was searching for “spectre” instead of spectrum, oops.2/24/2011 SIAM CSE 2011 22/28
  23. 23. Tweet Along @dgleichSpikes? Repeated rows Identical rows grow the null-space. Banerjee and JostUnit eigenvalue Motif doubling and joining small  graphs will tend to cause repeated eigenvalues and null vectors. Banerjee and Jost explained how evolving graphs should produce repeated eigenvalues2/24/2011 SIAM CSE 2011 23/28
  24. 24. Tweet Along @dgleichCopying model Obvious follow up here: does a random sample with the same degree distribution show the same thing?2/24/2011 SIAM CSE 2011 24/28
  25. 25. Tweet Along @dgleichForest Fire models2/24/2011 SIAM CSE 2011 25/28
  26. 26. Tweet Along @dgleichPreferential Attachment Semi-circle in log-space!2/24/2011 SIAM CSE 2011 26/28
  27. 27. Tweet Along @dgleichWhere is this going?We can computespectra for largenetworks ifneeded.Study relationshipwith known power-laws in spectraEigenvectorlocalizationDirected Laplacians2/24/2011 SIAM CSE 2011 27/28
  28. 28. Tweet Along @dgleichNullspaces of the adjacency matrix So unit eigenvalues of the normalized Laplacian are null- vectors of the adjacency matrix.2/24/2011 SIAM CSE 2011 28/28
  29. 29. Code will be available eventually. Image from good financial cents.2/24/2011 SIAM CSE 2011 29 of <Total>