  • 1. Large-Scale Network Analysis with the Boost Graph Libraries
    Douglas Gregor
    Open Systems Lab, Indiana University
    [email_address]
  • 2. What are the BGLs?
    • A collection of libraries for computation on graphs/networks.
      • Graph data structures
      • Graph algorithms
      • Graph input/output
    • Common design
      • Flexibility/customizability throughout
      • Obsessed with performance
      • Common interfaces throughout the collection
    • All open source, freely available online
  • 3. The BGL Family
    • The Original (sequential) BGL
    • BGL-Python
    • The Parallel BGL
    • Parallel BGL-Python
  • 4. The Original BGL
    • The largest and most mature BGL
      • ~7 years of research and development
      • Many users, contributors outside of the OSL
      • Steadily evolving
    • Written in C++
      • Generic
      • Highly customizable
      • Efficient (both storage and execution)
  • 5. BGL: Graph Data Structures
    • Graphs:
      • adjacency_list : highly configurable with user-specified containers for vertices and edges
      • adjacency_matrix
      • compressed_sparse_row
    • Adaptors:
      • subgraphs, filtered graphs, reverse graphs
      • LEDA and Stanford GraphBase
    • Or, use your own…
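The compressed_sparse_row type trades mutability for compact storage. As a rough illustration of the layout (not the BGL class or its API), here is a pure-Python sketch that packs a directed edge list into two flat arrays; the names build_csr and out_neighbors are hypothetical, and vertices are assumed to be the integers 0..n-1.

```python
def build_csr(n, edges):
    """Pack directed (u, v) edges into CSR form.

    Returns (row_start, targets): the out-neighbors of u are
    targets[row_start[u]:row_start[u + 1]].
    """
    row_start = [0] * (n + 1)
    for u, _ in edges:
        row_start[u + 1] += 1              # count the out-degree of each source
    for u in range(n):
        row_start[u + 1] += row_start[u]   # prefix sum turns counts into offsets
    targets = [0] * len(edges)
    next_slot = row_start[:-1]             # slicing copies: next free slot per row
    for u, v in edges:
        targets[next_slot[u]] = v
        next_slot[u] += 1
    return row_start, targets


def out_neighbors(row_start, targets, u):
    """Out-neighbors of u, read straight out of the packed array."""
    return targets[row_start[u]:row_start[u + 1]]
```

Because every row is a contiguous slice, adding an edge after construction means rebuilding both arrays; that is the usual reason to prefer a mutable adjacency_list while a graph is still changing.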
  • 6. Original BGL: Algorithms
    • Searches (breadth-first, depth-first, A*)
    • Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
    • All-pairs shortest paths (Johnson, Floyd-Warshall)
    • Minimum spanning tree (Kruskal, Prim)
    • Components (connected, strongly connected, biconnected)
    • Maximum cardinality matching
    • Max-flow (Edmonds-Karp, push-relabel)
    • Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
    • Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
    • Betweenness centrality
    • PageRank
    • Isomorphism
    • Vertex coloring
    • Transitive closure
    • Dominator tree
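To recall what one entry in this catalog computes, here is a textbook Dijkstra sketch in plain Python using a binary heap with lazy deletion. It is not the BGL's dijkstra_shortest_paths and does not mimic its interface (property maps, visitors); the dictionary-of-lists adjacency format is an assumption made for the example, and edge weights must be non-negative.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest paths with non-negative weights.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    Returns (dist, pred) dictionaries for every reachable vertex.
    """
    dist = {source: 0}
    pred = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in adj.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd   # relax edge (u, v)
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred
```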
  • 7. Task: Biconnected Components
    [Figure: input graph and output graph]
    Articulation points: B G A
  • 8. Define a Graph Type
    • Determine vertex/edge properties:
        struct Vertex { string name; };
        struct Edge { int bicomponent; };
    • Determine the graph type:
        typedef adjacency_list</*OutEdgeListS=*/ vecS,
                               /*VertexListS=*/ vecS,
                               /*DirectedS=*/ undirectedS,
                               /*VertexProperty=*/ Vertex,
                               /*EdgeProperty=*/ Edge> Graph;
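For readers approaching from the Python side, the same design decisions (one record per vertex, one per edge, vector-like storage, undirected edges) can be mirrored with dataclasses. This is a hypothetical pure-Python analogue for illustration only; UndirectedGraph, add_vertex and add_edge are made-up names, not BGL or BGL-Python API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str = ""


@dataclass
class Edge:
    bicomponent: int = -1   # filled in later by a components algorithm


@dataclass
class UndirectedGraph:
    """Vertices live in a list (like vecS), so a vertex descriptor is its index.
    Each edge's properties are stored once and shared by both endpoints."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)      # (u, v, Edge) triples
    incident: list = field(default_factory=list)   # per-vertex edge indices

    def add_vertex(self, name):
        self.vertices.append(Vertex(name))
        self.incident.append([])
        return len(self.vertices) - 1

    def add_edge(self, u, v):
        self.edges.append((u, v, Edge()))
        e = len(self.edges) - 1
        self.incident[u].append(e)
        if u != v:                      # a self-loop is listed only once
            self.incident[v].append(e)
        return e
```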
  • 9. Read in a GraphViz DOT File
    • Build an empty graph:
        Graph g;
    • Map vertex properties:
        dynamic_properties dyn;
        dyn.property("node_id", get(&Vertex::name, g));
    • Read in the GraphViz graph:
        ifstream in("biconnected_components.dot");
        read_graphviz(in, g, dyn);
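read_graphviz handles the full DOT grammar. To show what this step produces (vertex names keyed by node ID, plus an edge list), here is a deliberately tiny reader that understands only undirected `X -- Y;` lines and ignores everything else. read_simple_dot is a hypothetical helper, not part of any BGL; a real DOT file with attributes or subgraphs needs a real parser.

```python
import re

# Matches statements like:  A -- B;   or   "node 1" -- "node 2";
EDGE_RE = re.compile(r'^\s*("[^"]*"|\w+)\s*--\s*("[^"]*"|\w+)\s*;?\s*$')


def read_simple_dot(text):
    """Return (names, edges): vertex names in first-seen order and a list
    of (u, v) index pairs. Lines that are not bare edge statements
    (graph header, braces, attribute lines) are skipped."""
    names, index, edges = [], {}, []

    def vertex(token):
        name = token.strip('"')          # drop optional DOT quoting
        if name not in index:
            index[name] = len(names)     # assign the next vertex index
            names.append(name)
        return index[name]

    for line in text.splitlines():
        m = EDGE_RE.match(line)
        if m:
            edges.append((vertex(m.group(1)), vertex(m.group(2))))
    return names, edges
```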
  • 10. Run Biconnected Components
    • Keep track of the articulation points:
        vector<Graph::vertex_descriptor> art_points;
    • Compute biconnected components:
        biconnected_components(g, get(&Edge::bicomponent, g),
                               back_inserter(art_points));
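The library does the work in a single call. For readers curious what that call computes, the sketch below is the classic Hopcroft-Tarjan depth-first algorithm in pure Python: it labels every edge with a component number and collects the articulation points, the same two outputs the call above writes into Edge::bicomponent and art_points. It is an illustrative reimplementation, not the BGL source; it assumes an undirected graph on vertices 0..n-1 and uses recursion, so very deep graphs would need an iterative version.

```python
def biconnected_components(n, edges):
    """edges is a list of (u, v) pairs. Returns (component, art_points),
    where component[i] is the bicomponent number of edges[i]."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))

    disc = [-1] * n            # DFS discovery time of each vertex
    low = [0] * n              # earliest discovery time reachable from the subtree
    component = [-1] * len(edges)
    art_points = set()
    stack = []                 # edges of the component under construction
    counter = [0, 0]           # [next discovery time, next component number]

    def visit(u, parent_edge):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v, i in adj[u]:
            if i == parent_edge:
                continue
            if disc[v] == -1:                  # tree edge
                stack.append(i)
                children += 1
                visit(v, i)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:          # u separates v's subtree
                    if parent_edge is not None or children > 1:
                        art_points.add(u)      # the root needs two DFS children
                    while True:                # pop one whole bicomponent
                        e = stack.pop()
                        component[e] = counter[1]
                        if e == i:
                            break
                    counter[1] += 1
            elif disc[v] < disc[u]:            # back edge to an ancestor
                stack.append(i)
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == -1:
            visit(s, None)
    return component, sorted(art_points)
```

For two triangles joined by a bridge, the two vertices at the ends of the bridge are the articulation points, and the bridge forms a bicomponent of its own.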
  • 11. Output results
    • Attach the bicomponent number to the “label” property of edges:
        dyn.property("label", get(&Edge::bicomponent, g));
    • Write results to another GraphViz file:
        ofstream out("bc_out.dot");
        write_graphviz(out, g, dyn);
    • Show articulation points:
        cout << "Articulation points: ";
        for (size_t i = 0; i < art_points.size(); ++i) {
          cout << g[art_points[i]].name << ' ';
        }
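The output of this step is ordinary DOT text with one label attribute per edge. The hypothetical helper below produces equivalent text by hand, which can be handy for checking results without a C++ toolchain; it assumes vertex names contain no double quotes, and its exact formatting is not claimed to match the BGL writer's.

```python
def write_dot_with_labels(names, edges, labels):
    """Render an undirected graph as DOT, attaching labels[i] as the
    "label" attribute of edges[i]."""
    lines = ["graph G {"]
    for (u, v), label in zip(edges, labels):
        # Quote vertex names so spaces in names stay valid DOT identifiers
        lines.append(f'  "{names[u]}" -- "{names[v]}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"
```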
  • 13. Original BGL Summary
    • The original BGL is large, stable, efficient
      • Lots of algorithms, graph types
      • Peer-reviewed code with many users, nightly regression testing, etc.
      • Performance comparable to FORTRAN.
    • Who should use the BGL?
      • Programmers comfortable with C++
      • Users with graph sizes from tens of vertices to millions of vertices
  • 14. BGL-Python
    • Python is ideal for rapid prototyping:
      • It’s a scripting language (no compiler)
      • Dynamically typed means less typing for you
      • Easy to use: you already know Python…
    • BGL-Python provides access to the BGL from within Python
      • Similar interfaces to C++ BGL
      • Easier to learn than C++
      • Great for scripting, GUI applications
      • Built-in documentation: help(bgl.dijkstra_shortest_paths)
  • 15. Example: Biconnected Components
        import boost.graph as bgl  # Pull in the BGL bindings
        g = bgl.Graph.read_graphviz("biconnected_components.dot")

        # Compute biconnected components and articulation points
        bicomponent = g.edge_property_map('int')
        art_points = bgl.biconnected_components(g, bicomponent)

        # Save results with bicomponent numbers as edge labels
        g.edge_properties['label'] = bicomponent
        g.write_graphviz("biconnected_components_out.dot")

        print "Articulation points: ",
        node_id = g.vertex_properties['node_id']
        for v in art_points:
            print node_id[v], ' ',
        print ""
  • 16. Wrapping the BGL in Python
    • BGL-Python is not a…
      • “port”
      • reimplementation
    • BGL-Python wraps the C++ BGL
      • Python calls translate to C++ calls
      • C++ can call back into Python
    • Most of the speed of C++
    • Most of the flexibility of Python
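Callbacks are what keep wrapped algorithms customizable: in the C++ BGL, algorithms invoke user-supplied visitor objects at well-defined events, and a call back into Python lets a Python object play that role. The pure-Python sketch below shows the pattern, with event names modeled on the BGL's BFS visitor events (discover_vertex, examine_edge, tree_edge, finish_vertex); the classes and the breadth_first_search function are illustrative, not BGL-Python's actual API.

```python
from collections import deque


class BFSVisitor:
    """Default visitor: every event is a no-op; subclasses override what they need."""
    def discover_vertex(self, u): pass
    def examine_edge(self, u, v): pass
    def tree_edge(self, u, v): pass
    def finish_vertex(self, u): pass


def breadth_first_search(adj, source, visitor):
    """Plain BFS over adj (vertex -> list of neighbors) that calls back
    into the visitor at each event."""
    seen = {source}
    visitor.discover_vertex(source)
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            visitor.examine_edge(u, v)
            if v not in seen:            # first time v is reached
                seen.add(v)
                visitor.tree_edge(u, v)
                visitor.discover_vertex(v)
                queue.append(v)
        visitor.finish_vertex(u)


class DistanceRecorder(BFSVisitor):
    """User code plugged into the algorithm: records hop distance via tree edges."""
    def __init__(self, source):
        self.dist = {source: 0}

    def tree_edge(self, u, v):
        self.dist[v] = self.dist[u] + 1
```

The algorithm never needs to know what the visitor computes, which is why the same traversal can serve many different analyses.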
  • 17. Performance: Shortest Paths
  • 18. BGL-Python Summary
    • BGL-Python is all about tradeoffs:
      • More gradual learning curve
      • Faster time-to-solution
      • Lower performance
    • Our typical approach:
      • Prototype in Python to get your ideas down
      • Port to C++ when performance matters
  • 20. The Parallel BGL
    • A version of the C++ BGL for computational clusters
      • Distributed memory for huge graphs
      • Parallel processing for improved performance
    • An active research project
    • Closely related to the original BGL
      • Parallelizing BGL programs should be “easy”
  • 21. Parallel BGL: Distributed Graphs
    [Figure: A simple, directed graph… distributed across 3 processors.]
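In a distributed graph, every process needs a cheap way to find which processor owns a given vertex. The sketch below assumes a simple row-wise scheme in which each vertex, together with its outgoing edges, lives on exactly one processor, and it uses a contiguous block distribution. It illustrates the idea only; the Parallel BGL's own distribution classes are not shown, and the function names are hypothetical.

```python
import bisect


def block_distribution(n, p):
    """Split vertices 0..n-1 into p nearly equal contiguous blocks.
    Returns the block start offsets (length p + 1)."""
    base, extra = divmod(n, p)
    starts = [0]
    for r in range(p):
        # The first `extra` processors each take one additional vertex
        starts.append(starts[-1] + base + (1 if r < extra else 0))
    return starts


def owner(starts, v):
    """Rank of the processor that stores vertex v (binary search on block starts)."""
    return bisect.bisect_right(starts, v) - 1


def to_local(starts, v):
    """Map a global vertex index to (owning rank, local index on that rank)."""
    r = owner(starts, v)
    return r, v - starts[r]
```

With 8 vertices on 3 processors, the blocks hold 3, 3 and 2 vertices, and global vertex 7 is local vertex 1 on processor 2.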
  • 22. Parallel Graph Algorithms
    • Breadth-first search
    • Eager Dijkstra’s single-source shortest paths
    • Crauser et al. single-source shortest paths
    • Depth-first search
    • Minimum spanning tree (Boruvka, Dehne & Götz)
    • Connected components
    • Strongly connected components
    • Biconnected components
    • PageRank
    • Graph coloring
    • Fruchterman-Reingold layout
    • Max-flow (Dinic’s)
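PageRank appears in both the sequential and the parallel algorithm lists. For reference, here is the standard power iteration in sequential pure Python; the damping factor 0.85, the fixed iteration count and the even redistribution of rank from dangling vertices are common conventions assumed for this sketch, not necessarily the BGL defaults. In a distributed version, each processor would update only the ranks of the vertices it owns and exchange contributions along edges that cross processors.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank. adj maps every vertex (including sinks)
    to its list of out-neighbors. Returns a dict of ranks summing to 1."""
    nodes = list(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread over all vertices
        dangling = sum(rank[u] for u in nodes if not adj[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if adj[u]:
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share       # u passes an equal share to each out-neighbor
        rank = new
    return rank
```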
  • 23. Performance: Sparse graphs
  • 24. Scalability (~547k vertices/node)
    [Figure: small-world graph; up to 70M vertices, 1B edges]
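Taking the slide's figures at face value, and assuming the largest run kept the same per-node load (an assumption; the slide gives no node count), the implied machine size and average degree work out as follows.

```latex
\frac{70 \times 10^{6}\ \text{vertices}}{547 \times 10^{3}\ \text{vertices/node}} \approx 128\ \text{nodes},
\qquad
\frac{10^{9}\ \text{edges}}{70 \times 10^{6}\ \text{vertices}} \approx 14.3\ \text{edges per vertex}
```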
  • 25. Performance vs. CGMgraph
    [Figure: Erdős-Rényi graph, 96k vertices, 10M edges; annotations: 17x, 30x]
  • 26. Parallel BGL Summary
    • The Parallel BGL is built for huge graphs
      • Millions to hundreds of millions of nodes
      • Distributed-memory parallel processing on clusters
      • Future work will permit larger graphs…
    • Parallel programming has a learning curve
      • Parallel graph algorithms are much harder to write
      • Distributed graph manipulation can be tricky
    • Parallel BGL is an active research library
  • 27. Distributed Graph Layout
  • 28. Parallel BGL in Python
    • Preliminary support for the Parallel BGL in Python
      • Just import boost.graph.distributed
      • Similar interface to sequential BGL-Python
    • Several options for usage with MPI:
      • Straight MPI: mpirun -np 2 python script.py
      • pyMPI: allows interactive use of the interpreter
    • Initially used to prototype our distributed Fruchterman-Reingold implementation.
  • 29. Porting for Performance
  • 30. Which BGL is Right for You?
    • Is any BGL right for you?
    • Depends on how large your networks are:
      • Up to 1/2 million vertices, any BGL will do
      • C++ BGL can push to a couple million vertices
      • For tens of millions or larger, Parallel BGL only
    • Other considerations:
      • You can prototype in Python, port to C++
      • Algorithm authors might prefer the original BGL
      • Parallelism is very hard to manage
  • 31. Conclusion
    • The Boost Graph Library family is a collection of full-featured graph libraries
      • All are flexible, customizable, efficient
      • Easy to port from Python to C++
      • Can port from sequential to parallel
      • Always growing, improving
    • Is one of the BGLs right for you?
      • A typical “build or buy” decision
  • 32. For More Information…
    • (Original) Boost Graph Library http://www.boost.org/libs/graph/doc
    • Parallel Boost Graph Library http://www.osl.iu.edu/research/pbgl
    • Python Bindings for (Parallel) BGL http://www.osl.iu.edu/~dgregor/bgl-python
    • Contact us!
      • Douglas Gregor <[email_address].iu.edu>
      • Andrew Lumsdaine <[email_address]>
  • 33. Other BGL Variants
    • QuickGraph (C#) http://www.codeproject.com/cs/miscctrl/quickgraph.asp
    • Ruby Graph Library http://rubyforge.org/projects/rgl/
    • Rooster Graph (Scheme) http://savannah.nongnu.org/projects/rgraph/
    • RBGL (an R interface to the C++ BGL) http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
    • Disclaimer: These are all separate projects. We do not maintain them.
  • 34. Comparative Performance