SlideShare a Scribd company logo
1 of 29
Download to read offline
Tweet Along @dgleich


                                                             The Spectre of the Spectrum
                                                                                                An empirical study of the




                             ?
                                                                                                spectra of large networks


                                                                                                     David F. Gleich
                                  
                                                                                        Sandia National Laboratories

                                                                                                             SIAM CSE 2011
  Thanks to Ali Pinar, Jaideep Ray, Tammy Kolda,                                                            28 February 2011
  C. Seshadhri, Rich Lehoucq @ Sandia
  and                                                                                     Supported by Sandia’s John von Neumann
  Jure Leskovec and Michael Mahoney @ Stanford                                                postdoctoral fellowship and the DOE
  for helpful discussions.                                                                       Office of Science’s Graphs project.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin
                       Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2/24/2011                                                      SIAM CSE 2011                                                               1/36
2/24/2011   SIAM CSE 2011   2/28
Tweet Along @dgleich



There’s information inside the spectra




      Words in dictionary definitions                                  Internet router network
       111k vertices, 2.7M edges                                      192k vetices, 1.2M edges
             These figures show the normalized Laplacian. Banerjee and Jost (2009) also noted such shapes in the spectra.
2/24/2011                                         SIAM CSE 2011                                                      3/28
Tweet Along @dgleich



Why are we interested in the spectra?
Modeling

Properties
  Moments of the adjacency

Anomalies

Regularities

Network Comparison
  Fay et al. 2010 – Weighted Spectral Density
  The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodes
2/24/2011                                            SIAM CSE 2011                                                    4/28
Tweet Along @dgleich



Matrices from graphs
Adjacency matrix                          Random walk matrix
                                           
 
      if                                  Modularity matrix
                                           
                                           

Laplacian matrix
                                          Not covered
                                          Signless Laplacian matrix
 
                                          Incidence matrix
                                           (It is incidentally discussed)

                                          Seidel matrix
Normalized Laplacian matrix
                                          Heat Kernel
 
 
                             Everything is undirected. Mostly connected components only too.
2/24/2011              SIAM CSE 2011                                                    5/28
Tweet Along @dgleich



Erdős–Rényi Semi-circles
Based on Wigner’s
  semi-circle law.

The eigenvalues of the




                                           Count
adjacency matrix for
n=1000, averaged
over 10 trials

Semi-circle with outlier
if average degree is
large enough.
                                                          Eigenvalue

                     Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14)
2/24/2011                            SIAM CSE 2011                                                    6/28
Tweet Along @dgleich



Previous results
Farkas et al. : Significant deviation from the
  semi-circle law for the adjacency matrix

Mihail and Papadimitriou : Leading eigenvalues
  of the adjacency matrix obey a power-law
  based on the degree-sequence

Chung et al. : Normalized Laplacian still
  obeys a semi-circle law if min-degree large

Banerjee and Jost : Study of types of patterns
  that emerge in evolving graph models –
  explain many features of the spectra
2/24/2011                   SIAM CSE 2011                       7/28
Tweet Along @dgleich



In comparison to other empiric studies
We use “exact” computation of spectra,
 instead of approximation.

We study “all” of the standard matrices
 over a range of large networks.

Our “large” is bigger.

We look at a few different random graph
 models.



2/24/2011                  SIAM CSE 2011                  8/28
Tweet Along @dgleich



Data sources
 SNAP               Various                   100s-100,000s
 SNAP-p2p           Gnutella Network          5-60k, ~30 inst.
 SNAP-as-733        Autonomous Sys.           ~5,000, 733 inst.
 SNAP-caida         Router networks           ~20,000, ~125 inst.
 Pajek              Various                   100s-100,000s
 Models             Copying Model             1k-100k 9 inst. 324 gs
                    Pref. Attach              1k-100k 9 inst. 164 gs
                    Forest Fire               1k-100k 9 inst. 324 gs
 Mine               Various                   2k-500k
 Newman             Various
 Arenas             Various
 Porter             Facebook                  100 schools, 5k-60k
 IsoRank, Natalie   Protein-Protein           <10k , 4 graphs
                                                 Thanks to all who make data available
2/24/2011                     SIAM CSE 2011                                       9/28
Tweet Along @dgleich



Big graphs
Arxiv           86376             1035126                          Co-author
Dblp            93156             356290                           Co-author
Dictionary(*)   111982            2750576                          Word defns.
Internet(*)     124651            414428                           Routers
Itdk0304        190914            1215220                          Routers
p2p-gnu(*)      62561             295756                           Peer-to-peer
Patents(*)      230686            1109898                          Citations
Roads           126146            323900                           Roads
Wordnet(*)      75606             240036                           Word relation
web-nb.edu      325729            2994268                          Web



                         (*) denotes that this is a weakly connected component of a directed graph.
2/24/2011                  SIAM CSE 2011                                                     10/28
Tweet Along @dgleich



Models
Preferential Attachment
Start graph with a k-node clique. Add a new node and
  connect to k random nodes, chosen proportional to degree.

Copying model
Start graph with a k-node clique. Add a new node and pick a
  parent uniformly at random. Copy edges of parent and
  make an error with probability  

Forest Fire
Start graph with a k-node clique. Add a new node and pick a
  parent uniformly at random. Do a random “bfs’/”forest fire”
  and link to all nodes “burned”
2/24/2011                 SIAM CSE 2011                      11/28
Tweet Along @dgleich




                COMPUTING
                 SPECTRA OF
            LARGE NETWORKS




2/24/2011               SIAM CSE 2011                   12
Tweet Along @dgleich




                   Redsky, Hopper I, Hopper II, and a Cielo testbed. Details if time.
2/24/2011   SIAM CSE 2011                                                      13/28
Tweet Along @dgleich



Eigenvalues with ScaLAPACK
Mostly the same approach as in LAPACK

1. Reduce to tridiagonal form
   (most time consuming part)
2. Distribute tridiagonals to
    all processors
3. Each processor finds
   all eigenvalues
4. Each processor computes a
   subset of eigenvectors

I’m actually using the MRRR algorithm,
where steps 3 and 4 are better and faster
                       MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel.
2/24/2011                     SIAM CSE 2011                                                   14/28
Tweet Along @dgleich



Alternatives
Use ARPACK to get extrema

Use ARPACK to get interior around                 via the folded spectrum
 




                                                                      Large nearly
                                                                      repeated sets of
                                                                      eigenvalues will
                                                                      make this tricky.
                        Farkas et al. used this approach. Figure from somewhere on the web… sorry!
2/24/2011                    SIAM CSE 2011                                                   15/28
Tweet Along @dgleich



Adding MPI tasks vs. using threads
Most math libraries have threaded versions
   (Intel MKL, AMD ACML)
Is it better to use threads or MPI tasks?

It depends.

                                                              Cray libsci
                 Intel MKL
                                               Threads          Ranks             Time
Threads Ranks        Time-T   Time-E
                                               1                64                1412.5
1           36       1271.4   339.0
                                               4                16                1881.4
4           9        1058.1   456.6
                                               16               4                 Omitted.

                                   Normalized Laplacian for 36k-by-36k co-author graph of CondMat
2/24/2011                      SIAM CSE 2011                                                16/28
Tweet Along @dgleich



    Weak Parallel Scaling
                                    Time  
Time in hours




                                    Good strong
                                      scaling up to
                                      325,000 vertices

                                    Estimated time for
                                      500,000 nodes
                                      9 hours with
                                      925 nodes
                                      (7400 procs)
                 
    2/24/2011       SIAM CSE 2011                          17/28
Tweet Along @dgleich




            EXAMPLES




2/24/2011        SIAM CSE 2011                   18
Tweet Along @dgleich



A $8,000 matrix computation




                                                                           325729 nodes and 2994268 edges
            500 nodes and 4000 processors on Redsky for 5 hours x 2 for normalized Laplacian/adjacency matrix
2/24/2011                               SIAM CSE 2011                                                   19/28
Tweet Along @dgleich




            Yes!


                     These are cases where we have multiple instances of the same graph.
2/24/2011          SIAM CSE 2011                                                  20/28
Tweet Along @dgleich



Already known?




                                 Just the facebook spectra.
2/24/2011        SIAM CSE 2011                       21/28
Tweet Along @dgleich



Already known?




                  I soon realized I was searching for “spectre” instead of spectrum, oops.
2/24/2011        SIAM CSE 2011                                                      22/28
Tweet Along @dgleich



Spikes?                              Repeated rows
                                     Identical rows grow the null-space.

                                     Banerjee and Jost
Unit eigenvalue                      Motif doubling and joining small
                                     graphs will tend to cause repeated
                                     eigenvalues and null vectors.




                  Banerjee and Jost explained how evolving graphs should produce repeated eigenvalues
2/24/2011                      SIAM CSE 2011                                                   23/28
Tweet Along @dgleich



Copying model




            Obvious follow up here: does a random sample with the same degree distribution show the same thing?
2/24/2011                                SIAM CSE 2011                                                    24/28
Tweet Along @dgleich



Forest Fire models




2/24/2011       SIAM CSE 2011                 25/28
Tweet Along @dgleich



Preferential Attachment




            Semi-circle in log-space!


2/24/2011                SIAM CSE 2011                 26/28
Tweet Along @dgleich



Where is this going?
We can compute
spectra for large
networks if
needed.

Study relationship
with known power-
laws in spectra

Eigenvector
localization

Directed Laplacians
2/24/2011             SIAM CSE 2011                 27/28
Tweet Along @dgleich



Nullspaces of the adjacency matrix
 

So unit eigenvalues of the normalized Laplacian are null-
  vectors of the adjacency matrix.




2/24/2011                  SIAM CSE 2011                       28/28
Code will be available eventually. Image from good financial cents.
2/24/2011                               SIAM CSE 2011                             29 of <Total>

More Related Content

Similar to The Spectre of the Spectra

The spectre of the spectrum
The spectre of the spectrumThe spectre of the spectrum
The spectre of the spectrumDavid Gleich
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDCAstroAtom
 
Emc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyEmc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyFabio Porto
 
Sgg crest-presentation-final
Sgg crest-presentation-finalSgg crest-presentation-final
Sgg crest-presentation-finalmarpierc
 
Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling Leila Jalali
 
Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...
Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...
Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...Larry Smarr
 
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...High Performance Cyberinfrastructure Required for Data Intensive Scientific R...
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...Larry Smarr
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)Hajime Sasaki
 
ENERGY EFFICIENCY IN AD HOC NETWORKS
ENERGY EFFICIENCY IN AD HOC NETWORKSENERGY EFFICIENCY IN AD HOC NETWORKS
ENERGY EFFICIENCY IN AD HOC NETWORKSijasuc
 
Coupling Australia’s Researchers to the Global Innovation Economy
Coupling Australia’s Researchers to the Global Innovation EconomyCoupling Australia’s Researchers to the Global Innovation Economy
Coupling Australia’s Researchers to the Global Innovation EconomyLarry Smarr
 
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineThe Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineLarry Smarr
 
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...Larry Smarr
 
Active metamaterials
Active metamaterialsActive metamaterials
Active metamaterialsMichael Singh
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraLarry Smarr
 
The Computational Microscope Images Biomolecular Machines and Nanodevices - K...
The Computational Microscope Images Biomolecular Machines and Nanodevices - K...The Computational Microscope Images Biomolecular Machines and Nanodevices - K...
The Computational Microscope Images Biomolecular Machines and Nanodevices - K...TCBG
 

Similar to The Spectre of the Spectra (20)

The spectre of the spectrum
The spectre of the spectrumThe spectre of the spectrum
The spectre of the spectrum
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDC
 
Emc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyEmc 2013 Big Data in Astronomy
Emc 2013 Big Data in Astronomy
 
Sgg crest-presentation-final
Sgg crest-presentation-finalSgg crest-presentation-final
Sgg crest-presentation-final
 
Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling
 
Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...
Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...
Shrinking the Planet—How Dedicated Optical Networks are Transforming Computat...
 
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...High Performance Cyberinfrastructure Required for Data Intensive Scientific R...
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)
 
ENERGY EFFICIENCY IN AD HOC NETWORKS
ENERGY EFFICIENCY IN AD HOC NETWORKSENERGY EFFICIENCY IN AD HOC NETWORKS
ENERGY EFFICIENCY IN AD HOC NETWORKS
 
Coupling Australia’s Researchers to the Global Innovation Economy
Coupling Australia’s Researchers to the Global Innovation EconomyCoupling Australia’s Researchers to the Global Innovation Economy
Coupling Australia’s Researchers to the Global Innovation Economy
 
PhD Defense
PhD DefensePhD Defense
PhD Defense
 
NoSQL
NoSQLNoSQL
NoSQL
 
130 133
130 133130 133
130 133
 
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineThe Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
 
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
 
Topology ppt
Topology pptTopology ppt
Topology ppt
 
Active metamaterials
Active metamaterialsActive metamaterials
Active metamaterials
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated Era
 
The Computational Microscope Images Biomolecular Machines and Nanodevices - K...
The Computational Microscope Images Biomolecular Machines and Nanodevices - K...The Computational Microscope Images Biomolecular Machines and Nanodevices - K...
The Computational Microscope Images Biomolecular Machines and Nanodevices - K...
 

More from David Gleich

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisDavid Gleich
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksDavid Gleich
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networksDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 

More from David Gleich (20)

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and more
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

The Spectre of the Spectra

  • 1. Tweet Along @dgleich The Spectre of the Spectrum An empirical study of the ? spectra of large networks David F. Gleich   Sandia National Laboratories SIAM CSE 2011 Thanks to Ali Pinar, Jaideep Ray, Tammy Kolda, 28 February 2011 C. Seshadhri, Rich Lehoucq @ Sandia and Supported by Sandia’s John von Neumann Jure Leskovec and Michael Mahoney @ Stanford postdoctoral fellowship and the DOE for helpful discussions. Office of Science’s Graphs project. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. 2/24/2011 SIAM CSE 2011 1/36
  • 2. 2/24/2011 SIAM CSE 2011 2/28
  • 3. Tweet Along @dgleich There’s information inside the spectra Words in dictionary definitions Internet router network 111k vertices, 2.7M edges 192k vetices, 1.2M edges These figures show the normalized Laplacian. Banerjee and Jost (2009) also noted such shapes in the spectra. 2/24/2011 SIAM CSE 2011 3/28
  • 4. Tweet Along @dgleich Why are we interested in the spectra? Modeling Properties Moments of the adjacency Anomalies Regularities Network Comparison Fay et al. 2010 – Weighted Spectral Density The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodes 2/24/2011 SIAM CSE 2011 4/28
  • 5. Tweet Along @dgleich Matrices from graphs Adjacency matrix Random walk matrix       if   Modularity matrix       Laplacian matrix   Not covered Signless Laplacian matrix   Incidence matrix   (It is incidentally discussed) Seidel matrix Normalized Laplacian matrix Heat Kernel     Everything is undirected. Mostly connected components only too. 2/24/2011 SIAM CSE 2011 5/28
  • 6. Tweet Along @dgleich Erdős–Rényi Semi-circles Based on Wigner’s semi-circle law. The eigenvalues of the Count adjacency matrix for n=1000, averaged over 10 trials Semi-circle with outlier if average degree is large enough. Eigenvalue Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14) 2/24/2011 SIAM CSE 2011 6/28
  • 7. Tweet Along @dgleich Previous results Farkas et al. : Significant deviation from the semi-circle law for the adjacency matrix Mihail and Papadimitriou : Leading eigenvalues of the adjacency matrix obey a power-law based on the degree-sequence Chung et al. : Normalized Laplacian still obeys a semi-circle law if min-degree large Banerjee and Jost : Study of types of patterns that emerge in evolving graph models – explain many features of the spectra 2/24/2011 SIAM CSE 2011 7/28
  • 8. Tweet Along @dgleich In comparison to other empiric studies We use “exact” computation of spectra, instead of approximation. We study “all” of the standard matrices over a range of large networks. Our “large” is bigger. We look at a few different random graph models. 2/24/2011 SIAM CSE 2011 8/28
  • 9. Tweet Along @dgleich Data sources SNAP Various 100s-100,000s SNAP-p2p Gnutella Network 5-60k, ~30 inst. SNAP-as-733 Autonomous Sys. ~5,000, 733 inst. SNAP-caida Router networks ~20,000, ~125 inst. Pajek Various 100s-100,000s Models Copying Model 1k-100k 9 inst. 324 gs Pref. Attach 1k-100k 9 inst. 164 gs Forest Fire 1k-100k 9 inst. 324 gs Mine Various 2k-500k Newman Various Arenas Various Porter Facebook 100 schools, 5k-60k IsoRank, Natalie Protein-Protein <10k , 4 graphs Thanks to all who make data available 2/24/2011 SIAM CSE 2011 9/28
  • 10. Tweet Along @dgleich Big graphs Arxiv 86376 1035126 Co-author Dblp 93156 356290 Co-author Dictionary(*) 111982 2750576 Word defns. Internet(*) 124651 414428 Routers Itdk0304 190914 1215220 Routers p2p-gnu(*) 62561 295756 Peer-to-peer Patents(*) 230686 1109898 Citations Roads 126146 323900 Roads Wordnet(*) 75606 240036 Word relation web-nb.edu 325729 2994268 Web (*) denotes that this is a weakly connected component of a directed graph. 2/24/2011 SIAM CSE 2011 10/28
  • 11. Tweet Along @dgleich Models Preferential Attachment Start graph with a k-node clique. Add a new node and connect to k random nodes, chosen proportional to degree. Copying model Start graph with a k-node clique. Add a new node and pick a parent uniformly at random. Copy edges of parent and make an error with probability   Forest Fire Start graph with a k-node clique. Add a new node and pick a parent uniformly at random. Do a random “bfs’/”forest fire” and link to all nodes “burned” 2/24/2011 SIAM CSE 2011 11/28
  • 12. Tweet Along @dgleich COMPUTING SPECTRA OF LARGE NETWORKS 2/24/2011 SIAM CSE 2011 12
  • 13. Tweet Along @dgleich Redsky, Hopper I, Hopper II, and a Cielo testbed. Details if time. 2/24/2011 SIAM CSE 2011 13/28
  • 14. Tweet Along @dgleich Eigenvalues with ScaLAPACK Mostly the same approach as in LAPACK 1. Reduce to tridiagonal form (most time consuming part) 2. Distribute tridiagonals to all processors 3. Each processor finds all eigenvalues 4. Each processor computes a subset of eigenvectors I’m actually using the MRRR algorithm, where steps 3 and 4 are better and faster MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel. 2/24/2011 SIAM CSE 2011 14/28
  • 15. Tweet Along @dgleich Alternatives Use ARPACK to get extrema Use ARPACK to get interior around   via the folded spectrum   Large nearly repeated sets of eigenvalues will make this tricky. Farkas et al. used this approach. Figure from somewhere on the web… sorry! 2/24/2011 SIAM CSE 2011 15/28
  • 16. Tweet Along @dgleich Adding MPI tasks vs. using threads Most math libraries have threaded versions (Intel MKL, AMD ACML) Is it better to use threads or MPI tasks? It depends. Cray libsci Intel MKL Threads Ranks Time Threads Ranks Time-T Time-E 1 64 1412.5 1 36 1271.4 339.0 4 16 1881.4 4 9 1058.1 456.6 16 4 Omitted. Normalized Laplacian for 36k-by-36k co-author graph of CondMat 2/24/2011 SIAM CSE 2011 16/28
  • 17. Tweet Along @dgleich Weak Parallel Scaling Time   Time in hours Good strong scaling up to 325,000 vertices Estimated time for 500,000 nodes 9 hours with 925 nodes (7400 procs)   2/24/2011 SIAM CSE 2011 17/28
  • 18. Tweet Along @dgleich EXAMPLES 2/24/2011 SIAM CSE 2011 18
  • 19. Tweet Along @dgleich A $8,000 matrix computation 325729 nodes and 2994268 edges 500 nodes and 4000 processors on Redsky for 5 hours x 2 for normalized Laplacian/adjacency matrix 2/24/2011 SIAM CSE 2011 19/28
  • 20. Tweet Along @dgleich Yes! These are cases where we have multiple instances of the same graph. 2/24/2011 SIAM CSE 2011 20/28
  • 21. Tweet Along @dgleich Already known? Just the facebook spectra. 2/24/2011 SIAM CSE 2011 21/28
  • 22. Tweet Along @dgleich Already known? I soon realized I was searching for “spectre” instead of spectrum, oops. 2/24/2011 SIAM CSE 2011 22/28
  • 23. Tweet Along @dgleich Spikes? Repeated rows Identical rows grow the null-space. Banerjee and Jost Unit eigenvalue Motif doubling and joining small   graphs will tend to cause repeated eigenvalues and null vectors. Banerjee and Jost explained how evolving graphs should produce repeated eigenvalues 2/24/2011 SIAM CSE 2011 23/28
  • 24. Tweet Along @dgleich Copying model Obvious follow up here: does a random sample with the same degree distribution show the same thing? 2/24/2011 SIAM CSE 2011 24/28
  • 25. Tweet Along @dgleich Forest Fire models 2/24/2011 SIAM CSE 2011 25/28
  • 26. Tweet Along @dgleich Preferential Attachment Semi-circle in log-space! 2/24/2011 SIAM CSE 2011 26/28
  • 27. Tweet Along @dgleich Where is this going? We can compute spectra for large networks if needed. Study relationship with known power- laws in spectra Eigenvector localization Directed Laplacians 2/24/2011 SIAM CSE 2011 27/28
  • 28. Tweet Along @dgleich Nullspaces of the adjacency matrix   So unit eigenvalues of the normalized Laplacian are null- vectors of the adjacency matrix. 2/24/2011 SIAM CSE 2011 28/28
  • 29. Code will be available eventually. Image from good financial cents. 2/24/2011 SIAM CSE 2011 29 of <Total>