The document summarizes a study on computing and analyzing the spectra of large network graphs. It discusses computing the spectra of real-world and model networks with thousands to hundreds of thousands of nodes using parallel computing techniques. It presents examples of spectra for networks like the Internet router network, social networks, and preferential attachment models. The study aims to better understand properties like spikes, repeated eigenvalues, and relationships to known power laws from analyzing the spectra of large empirical and model networks.
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
The Spectre of the Spectra
1. Tweet Along @dgleich
The Spectre of the Spectrum
An empirical study of the
?
spectra of large networks
David F. Gleich
Sandia National Laboratories
SIAM CSE 2011
Thanks to Ali Pinar, Jaideep Ray, Tammy Kolda, 28 February 2011
C. Seshadhri, Rich Lehoucq @ Sandia
and Supported by Sandia’s John von Neumann
Jure Leskovec and Michael Mahoney @ Stanford postdoctoral fellowship and the DOE
for helpful discussions. Office of Science’s Graphs project.
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin
Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2/24/2011 SIAM CSE 2011 1/36
3. Tweet Along @dgleich
There’s information inside the spectra
Words in dictionary definitions Internet router network
111k vertices, 2.7M edges 192k vetices, 1.2M edges
These figures show the normalized Laplacian. Banerjee and Jost (2009) also noted such shapes in the spectra.
2/24/2011 SIAM CSE 2011 3/28
4. Tweet Along @dgleich
Why are we interested in the spectra?
Modeling
Properties
Moments of the adjacency
Anomalies
Regularities
Network Comparison
Fay et al. 2010 – Weighted Spectral Density
The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodes
2/24/2011 SIAM CSE 2011 4/28
5. Tweet Along @dgleich
Matrices from graphs
Adjacency matrix Random walk matrix
if Modularity matrix
Laplacian matrix
Not covered
Signless Laplacian matrix
Incidence matrix
(It is incidentally discussed)
Seidel matrix
Normalized Laplacian matrix
Heat Kernel
Everything is undirected. Mostly connected components only too.
2/24/2011 SIAM CSE 2011 5/28
6. Tweet Along @dgleich
Erdős–Rényi Semi-circles
Based on Wigner’s
semi-circle law.
The eigenvalues of the
Count
adjacency matrix for
n=1000, averaged
over 10 trials
Semi-circle with outlier
if average degree is
large enough.
Eigenvalue
Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14)
2/24/2011 SIAM CSE 2011 6/28
7. Tweet Along @dgleich
Previous results
Farkas et al. : Significant deviation from the
semi-circle law for the adjacency matrix
Mihail and Papadimitriou : Leading eigenvalues
of the adjacency matrix obey a power-law
based on the degree-sequence
Chung et al. : Normalized Laplacian still
obeys a semi-circle law if min-degree large
Banerjee and Jost : Study of types of patterns
that emerge in evolving graph models –
explain many features of the spectra
2/24/2011 SIAM CSE 2011 7/28
8. Tweet Along @dgleich
In comparison to other empiric studies
We use “exact” computation of spectra,
instead of approximation.
We study “all” of the standard matrices
over a range of large networks.
Our “large” is bigger.
We look at a few different random graph
models.
2/24/2011 SIAM CSE 2011 8/28
9. Tweet Along @dgleich
Data sources
SNAP Various 100s-100,000s
SNAP-p2p Gnutella Network 5-60k, ~30 inst.
SNAP-as-733 Autonomous Sys. ~5,000, 733 inst.
SNAP-caida Router networks ~20,000, ~125 inst.
Pajek Various 100s-100,000s
Models Copying Model 1k-100k 9 inst. 324 gs
Pref. Attach 1k-100k 9 inst. 164 gs
Forest Fire 1k-100k 9 inst. 324 gs
Mine Various 2k-500k
Newman Various
Arenas Various
Porter Facebook 100 schools, 5k-60k
IsoRank, Natalie Protein-Protein <10k , 4 graphs
Thanks to all who make data available
2/24/2011 SIAM CSE 2011 9/28
10. Tweet Along @dgleich
Big graphs
Arxiv 86376 1035126 Co-author
Dblp 93156 356290 Co-author
Dictionary(*) 111982 2750576 Word defns.
Internet(*) 124651 414428 Routers
Itdk0304 190914 1215220 Routers
p2p-gnu(*) 62561 295756 Peer-to-peer
Patents(*) 230686 1109898 Citations
Roads 126146 323900 Roads
Wordnet(*) 75606 240036 Word relation
web-nb.edu 325729 2994268 Web
(*) denotes that this is a weakly connected component of a directed graph.
2/24/2011 SIAM CSE 2011 10/28
11. Tweet Along @dgleich
Models
Preferential Attachment
Start graph with a k-node clique. Add a new node and
connect to k random nodes, chosen proportional to degree.
Copying model
Start graph with a k-node clique. Add a new node and pick a
parent uniformly at random. Copy edges of parent and
make an error with probability
Forest Fire
Start graph with a k-node clique. Add a new node and pick a
parent uniformly at random. Do a random “bfs’/”forest fire”
and link to all nodes “burned”
2/24/2011 SIAM CSE 2011 11/28
12. Tweet Along @dgleich
COMPUTING
SPECTRA OF
LARGE NETWORKS
2/24/2011 SIAM CSE 2011 12
13. Tweet Along @dgleich
Redsky, Hopper I, Hopper II, and a Cielo testbed. Details if time.
2/24/2011 SIAM CSE 2011 13/28
14. Tweet Along @dgleich
Eigenvalues with ScaLAPACK
Mostly the same approach as in LAPACK
1. Reduce to tridiagonal form
(most time consuming part)
2. Distribute tridiagonals to
all processors
3. Each processor finds
all eigenvalues
4. Each processor computes a
subset of eigenvectors
I’m actually using the MRRR algorithm,
where steps 3 and 4 are better and faster
MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel.
2/24/2011 SIAM CSE 2011 14/28
15. Tweet Along @dgleich
Alternatives
Use ARPACK to get extrema
Use ARPACK to get interior around via the folded spectrum
Large nearly
repeated sets of
eigenvalues will
make this tricky.
Farkas et al. used this approach. Figure from somewhere on the web… sorry!
2/24/2011 SIAM CSE 2011 15/28
16. Tweet Along @dgleich
Adding MPI tasks vs. using threads
Most math libraries have threaded versions
(Intel MKL, AMD ACML)
Is it better to use threads or MPI tasks?
It depends.
Cray libsci
Intel MKL
Threads Ranks Time
Threads Ranks Time-T Time-E
1 64 1412.5
1 36 1271.4 339.0
4 16 1881.4
4 9 1058.1 456.6
16 4 Omitted.
Normalized Laplacian for 36k-by-36k co-author graph of CondMat
2/24/2011 SIAM CSE 2011 16/28
17. Tweet Along @dgleich
Weak Parallel Scaling
Time
Time in hours
Good strong
scaling up to
325,000 vertices
Estimated time for
500,000 nodes
9 hours with
925 nodes
(7400 procs)
2/24/2011 SIAM CSE 2011 17/28
19. Tweet Along @dgleich
A $8,000 matrix computation
325729 nodes and 2994268 edges
500 nodes and 4000 processors on Redsky for 5 hours x 2 for normalized Laplacian/adjacency matrix
2/24/2011 SIAM CSE 2011 19/28
20. Tweet Along @dgleich
Yes!
These are cases where we have multiple instances of the same graph.
2/24/2011 SIAM CSE 2011 20/28
22. Tweet Along @dgleich
Already known?
I soon realized I was searching for “spectre” instead of spectrum, oops.
2/24/2011 SIAM CSE 2011 22/28
23. Tweet Along @dgleich
Spikes? Repeated rows
Identical rows grow the null-space.
Banerjee and Jost
Unit eigenvalue Motif doubling and joining small
graphs will tend to cause repeated
eigenvalues and null vectors.
Banerjee and Jost explained how evolving graphs should produce repeated eigenvalues
2/24/2011 SIAM CSE 2011 23/28
24. Tweet Along @dgleich
Copying model
Obvious follow up here: does a random sample with the same degree distribution show the same thing?
2/24/2011 SIAM CSE 2011 24/28
27. Tweet Along @dgleich
Where is this going?
We can compute
spectra for large
networks if
needed.
Study relationship
with known power-
laws in spectra
Eigenvector
localization
Directed Laplacians
2/24/2011 SIAM CSE 2011 27/28
28. Tweet Along @dgleich
Nullspaces of the adjacency matrix
So unit eigenvalues of the normalized Laplacian are null-
vectors of the adjacency matrix.
2/24/2011 SIAM CSE 2011 28/28
29. Code will be available eventually. Image from good financial cents.
2/24/2011 SIAM CSE 2011 29 of <Total>