  1. Large-Scale Network Analysis with the Boost Graph Libraries
     Douglas Gregor, Open Systems Lab, Indiana University, [email_address]
  2. What are the BGLs?
     - A collection of libraries for computation on graphs/networks:
       - Graph data structures
       - Graph algorithms
       - Graph input/output
     - Common design:
       - Flexibility/customizability throughout
       - Obsessed with performance
       - Common interfaces throughout the collection
     - All open source, freely available online
  3. The BGL Family
     - The original (sequential) BGL
     - BGL-Python
     - The Parallel BGL
     - Parallel BGL-Python
  4. The Original BGL
     - The largest and most mature BGL
       - ~7 years of research and development
       - Many users and contributors outside of the OSL
       - Steadily evolving
     - Written in C++
       - Generic
       - Highly customizable
       - Efficient (both storage and execution)
  5. BGL: Graph Data Structures
     - Graphs:
       - adjacency_list: highly configurable, with user-specified containers for vertices and edges
       - adjacency_matrix
       - compressed_sparse_row
     - Adaptors:
       - subgraphs, filtered graphs, reverse graphs
       - LEDA and Stanford GraphBase
     - Or, use your own…
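A compressed sparse row (CSR) layout keeps every out-edge in one array grouped by source vertex, plus an offset array recording where each vertex's edges begin, which is why it is compact for static graphs. The sketch below builds that layout from an unsorted edge list with a counting pass, a prefix sum, and a scatter pass. The names `CsrGraph`, `build_csr`, and `out_degree` are hypothetical illustrations, not the interface of the BGL's compressed_sparse_row graph.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal CSR layout for a directed graph on vertices 0..n-1.
struct CsrGraph {
    std::vector<std::size_t> row_start;  // v's edges live in [row_start[v], row_start[v+1])
    std::vector<std::size_t> targets;    // target vertex of each edge, grouped by source
};

// Build CSR from an unsorted edge list: count, prefix-sum, scatter.
CsrGraph build_csr(std::size_t n,
                   const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    CsrGraph g;
    g.row_start.assign(n + 1, 0);
    // Pass 1: count the out-degree of each source (shifted by one slot).
    for (const auto& e : edges) ++g.row_start[e.first + 1];
    // Pass 2: prefix sum turns degrees into starting offsets.
    for (std::size_t v = 0; v < n; ++v) g.row_start[v + 1] += g.row_start[v];
    // Pass 3: scatter targets, tracking the next free slot per source.
    g.targets.resize(edges.size());
    std::vector<std::size_t> next(g.row_start.begin(), g.row_start.end() - 1);
    for (const auto& e : edges) g.targets[next[e.first]++] = e.second;
    return g;
}

// Out-degree is just the width of a vertex's row.
std::size_t out_degree(const CsrGraph& g, std::size_t v) {
    return g.row_start[v + 1] - g.row_start[v];
}
```

The trade-off versus adjacency_list is the usual one: CSR is cheap to store and fast to scan, but adding an edge after construction means shifting the arrays.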
  6. Original BGL: Algorithms
     - Searches (breadth-first, depth-first, A*)
     - Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
     - All-pairs shortest paths (Johnson, Floyd-Warshall)
     - Minimum spanning tree (Kruskal, Prim)
     - Components (connected, strongly connected, biconnected)
     - Maximum cardinality matching
     - Max-flow (Edmonds-Karp, push-relabel)
     - Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
     - Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
     - Betweenness centrality
     - PageRank
     - Isomorphism
     - Vertex coloring
     - Transitive closure
     - Dominator tree
  7. Task: Biconnected Components
     [Figure: input graph and output graph. Articulation points: B G A]
  8. Define a Graph Type
     - Determine vertex/edge properties:
         struct Vertex { string name; };
         struct Edge { int bicomponent; };
     - Determine the graph type:
         typedef adjacency_list</*OutEdgeListS=*/ vecS,
                                /*VertexListS=*/ vecS,
                                /*DirectedS=*/ undirectedS,
                                /*VertexProperty=*/ Vertex,
                                /*EdgeProperty=*/ Edge> Graph;
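To make the typedef concrete, here is a hypothetical, heavily simplified stand-in (`SimpleUndirectedGraph` is not a BGL type) for what an adjacency_list with vecS storage, undirectedS, and bundled Vertex/Edge properties keeps: vectors of vertices and edges, so descriptors are plain integer indices, and a property struct attached to every vertex and edge. The real adjacency_list is far more general; this only sketches the idea under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Vertex { std::string name; };
struct Edge { int bicomponent = -1; };

// Hypothetical stand-in: vector storage means descriptors are indices,
// and each vertex/edge carries its bundled property struct.
class SimpleUndirectedGraph {
public:
    struct EdgeRecord { std::size_t source, target; Edge props; };

    std::size_t add_vertex(Vertex props) {
        vertices_.push_back(std::move(props));
        incident_.emplace_back();  // empty incidence list for the new vertex
        return vertices_.size() - 1;
    }

    std::size_t add_edge(std::size_t u, std::size_t v, Edge props = {}) {
        edges_.push_back({u, v, props});
        std::size_t e = edges_.size() - 1;
        incident_[u].push_back(e);  // undirected: the edge is visible
        if (u != v) incident_[v].push_back(e);  // from both endpoints
        return e;
    }

    // Bundled-property access, analogous to g[v].name and g[e].bicomponent.
    Vertex& vertex(std::size_t v) { return vertices_[v]; }
    Edge& edge(std::size_t e) { return edges_[e].props; }
    std::size_t degree(std::size_t v) const { return incident_[v].size(); }

private:
    std::vector<Vertex> vertices_;
    std::vector<EdgeRecord> edges_;
    std::vector<std::vector<std::size_t>> incident_;  // edge indices per vertex
};
```

Separate vertex() and edge() accessors are used because both descriptor types are plain std::size_t here, so a single overloaded operator[] would be ambiguous; the BGL avoids this with distinct descriptor types.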
  9. Read in a GraphViz DOT File
     - Build an empty graph:
         Graph g;
     - Map vertex properties:
         dynamic_properties dyn;
         dyn.property("node_id", get(&Vertex::name, g));
     - Read in the GraphViz graph:
         ifstream in("biconnected_components.dot");
         read_graphviz(in, g, dyn);
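The "node_id" mapping is what lets the reader turn DOT node names into vertices. The toy parser below sketches that idea for a tiny subset of DOT: it is a hypothetical helper (`parse_simple_dot`, `ParsedGraph`), not read_graphviz, and it assumes one whitespace-separated `X -- Y;` edge statement per line, creating each vertex the first time its name appears and skipping every other line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parsed result: node_id strings indexed by vertex, plus an edge list.
struct ParsedGraph {
    std::vector<std::string> names;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};

// Hypothetical helper: understands only "X -- Y;" lines; headers, braces,
// node statements, and anything else are skipped. Attributes after Y are ignored.
ParsedGraph parse_simple_dot(std::istream& in) {
    ParsedGraph g;
    std::map<std::string, std::size_t> index;  // node_id -> vertex index
    auto vertex_for = [&](const std::string& id) -> std::size_t {
        auto it = index.find(id);
        if (it != index.end()) return it->second;
        index.emplace(id, g.names.size());  // first mention creates the vertex
        g.names.push_back(id);
        return g.names.size() - 1;
    };
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string a, op, b;
        if (!(tokens >> a >> op >> b) || op != "--") continue;
        if (!b.empty() && b.back() == ';') b.pop_back();  // drop statement terminator
        std::size_t u = vertex_for(a);
        std::size_t v = vertex_for(b);
        g.edges.emplace_back(u, v);
    }
    return g;
}
```

Real DOT files allow quoted IDs, edge chains, subgraphs, and attribute lists, which is exactly why the slide delegates parsing to read_graphviz.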
  10. Run Biconnected Components
     - Keep track of the articulation points:
         vector<Graph::vertex_descriptor> art_points;
     - Compute biconnected components:
         biconnected_components(g, get(&Edge::bicomponent, g),
                                back_inserter(art_points));
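Biconnected components are classically computed with a single depth-first search that tracks each vertex's discovery time and "low-point" (the earliest discovery time reachable from its subtree via one back edge), in the style of Hopcroft and Tarjan. The standalone sketch below numbers each edge's bicomponent and collects articulation points, mirroring what the call above fills in. The names `BiconnectedFinder` and `BiconnectedResult` are hypothetical; this is not the BGL's implementation, it recurses once per tree edge, and it leaves self-loops unlabeled (-1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct BiconnectedResult {
    std::vector<int> edge_component;               // bicomponent per edge; -1 for self-loops
    std::vector<std::size_t> articulation_points;  // in increasing vertex order
    int num_components = 0;
};

// Hypothetical helper: undirected edge list over vertices 0..n-1.
class BiconnectedFinder {
public:
    BiconnectedFinder(std::size_t n, std::vector<std::pair<std::size_t, std::size_t>> edges)
        : edges_(std::move(edges)), adj_(n), disc_(n, 0), low_(n, 0), is_art_(n, false) {
        result_.edge_component.assign(edges_.size(), -1);
        for (std::size_t e = 0; e < edges_.size(); ++e) {
            adj_[edges_[e].first].push_back(e);   // each undirected edge is
            adj_[edges_[e].second].push_back(e);  // incident to both endpoints
        }
    }

    BiconnectedResult run() {
        for (std::size_t v = 0; v < adj_.size(); ++v)
            if (disc_[v] == 0) dfs(v, kNoEdge);  // one DFS tree per connected component
        for (std::size_t v = 0; v < adj_.size(); ++v)
            if (is_art_[v]) result_.articulation_points.push_back(v);
        return result_;
    }

private:
    static constexpr std::size_t kNoEdge = static_cast<std::size_t>(-1);

    void dfs(std::size_t u, std::size_t parent_edge) {
        disc_[u] = low_[u] = ++time_;  // discovery times start at 1; 0 means unvisited
        std::size_t children = 0;
        for (std::size_t e : adj_[u]) {
            if (e == parent_edge) continue;  // don't re-traverse the tree edge
            std::size_t w = edges_[e].first == u ? edges_[e].second : edges_[e].first;
            if (disc_[w] == 0) {  // tree edge
                stack_.push_back(e);
                ++children;
                dfs(w, e);
                low_[u] = std::min(low_[u], low_[w]);
                if (low_[w] >= disc_[u]) {
                    // Nothing below w reaches above u: pop one bicomponent.
                    if (parent_edge != kNoEdge) is_art_[u] = true;
                    std::size_t top;
                    do {
                        top = stack_.back();
                        stack_.pop_back();
                        result_.edge_component[top] = result_.num_components;
                    } while (top != e);
                    ++result_.num_components;
                }
            } else if (disc_[w] < disc_[u]) {  // back edge to an ancestor
                stack_.push_back(e);
                low_[u] = std::min(low_[u], disc_[w]);
            }
        }
        // A DFS root is an articulation point only if it has 2+ tree children.
        if (parent_edge == kNoEdge && children > 1) is_art_[u] = true;
    }

    std::vector<std::pair<std::size_t, std::size_t>> edges_;
    std::vector<std::vector<std::size_t>> adj_;  // incident edge indices per vertex
    std::vector<std::size_t> disc_, low_;
    std::vector<bool> is_art_;
    std::vector<std::size_t> stack_;  // edges of not-yet-closed bicomponents
    std::size_t time_ = 0;
    BiconnectedResult result_;
};
```

For two triangles joined by a bridge (0-1-2, bridge 2-3, then 3-4-5), this yields three bicomponents and articulation points 2 and 3.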
  11. Output results
     - Attach the bicomponent number to the "label" property of edges:
         dyn.property("label", get(&Edge::bicomponent, g));
     - Write results to another GraphViz file:
         ofstream out("bc_out.dot");
         write_graphviz(out, g, dyn);
     - Show articulation points:
         cout << "Articulation points: ";
         for (size_t i = 0; i < art_points.size(); ++i) {
           cout << g[art_points[i]].name << ' ';
         }
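Attaching the bicomponent map as the "label" property means each edge in the output file carries its component number as a DOT attribute. The writer below sketches what such output can look like; `write_labeled_dot` is a hypothetical helper, not a BGL function, it quotes every node name without escaping, and it does not reproduce write_graphviz's exact formatting.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: emit an undirected DOT graph whose edges carry a
// "label" attribute (e.g., a bicomponent number). Names are not escaped.
void write_labeled_dot(std::ostream& out,
                       const std::vector<std::string>& names,
                       const std::vector<std::pair<std::size_t, std::size_t>>& edges,
                       const std::vector<int>& labels) {
    out << "graph G {\n";
    for (const auto& name : names) out << "  \"" << name << "\";\n";  // node statements
    for (std::size_t e = 0; e < edges.size(); ++e) {
        out << "  \"" << names[edges[e].first] << "\" -- \""
            << names[edges[e].second] << "\" [label=" << labels[e] << "];\n";
    }
    out << "}\n";
}
```

GraphViz tools such as `dot` then render each label next to its edge, which is how the output graph colors or annotates bicomponents.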
  12. Task: Biconnected Components
     [Figure: input graph and output graph. Articulation points: B G A]
  13. Original BGL Summary
     - The original BGL is large, stable, and efficient
       - Lots of algorithms and graph types
       - Peer-reviewed code with many users, nightly regression testing, etc.
       - Performance comparable to FORTRAN
     - Who should use the BGL?
       - Programmers comfortable with C++
       - Users with graph sizes from tens of vertices to millions of vertices
  14. BGL-Python
     - Python is ideal for rapid prototyping:
       - It's a scripting language (no compiler)
       - Dynamic typing means less typing for you
       - Easy to use: you already know Python…
     - BGL-Python provides access to the BGL from within Python
       - Interfaces similar to the C++ BGL
       - Easier to learn than C++
       - Great for scripting and GUI applications
       - Built-in help, e.g. help(bgl.dijkstra_shortest_paths)
  15. Example: Biconnected Components
        import boost.graph as bgl  # Pull in the BGL bindings
        g = bgl.Graph.read_graphviz("biconnected_components.dot")
        # Compute biconnected components and articulation points
        bicomponent = g.edge_property_map('int')
        art_points = bgl.biconnected_components(g, bicomponent)
        # Save results with bicomponent numbers as edge labels
        g.edge_properties['label'] = bicomponent
        g.write_graphviz("biconnected_components_out.dot")
        print "Articulation points: ",
        node_id = g.vertex_properties['node_id']
        for v in art_points:
            print node_id[v], ' ',
        print ""
  16. Wrapping the BGL in Python
     - BGL-Python is neither a "port" nor a reimplementation
     - BGL-Python wraps the C++ BGL
       - Python calls translate to C++ calls
       - C++ can call back into Python
     - Most of the speed of C++
     - Most of the flexibility of Python
  17. Performance: Shortest Paths [figure]
  18. BGL-Python Summary
     - BGL-Python is all about tradeoffs:
       - More gradual learning curve
       - Faster time-to-solution
       - Lower performance
     - Our typical approach:
       - Prototype in Python to get your ideas down
       - Port to C++ when performance matters
  20. The Parallel BGL
     - A version of the C++ BGL for computational clusters
       - Distributed memory for huge graphs
       - Parallel processing for improved performance
     - An active research project
     - Closely related to the original BGL
       - Parallelizing BGL programs should be "easy"
  21. Parallel BGL: Distributed Graphs
     [Figure: A simple, directed graph… distributed across 3 processors.]
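Distributing a graph across processors starts with a rule mapping each global vertex to an owner and to a position in that owner's local storage. The sketch below shows one common scheme, a block distribution that splits n vertices into p contiguous, nearly equal blocks. `BlockDistribution` and its members are hypothetical illustrations, not the Parallel BGL's distribution API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block distribution of n global vertices over p processors.
// The first (n % p) blocks get one extra vertex.
struct BlockDistribution {
    std::size_t n;  // total number of vertices
    std::size_t p;  // number of processors

    std::size_t block_size(std::size_t proc) const {
        return n / p + (proc < n % p ? 1 : 0);
    }
    // First global vertex stored on processor proc.
    std::size_t block_start(std::size_t proc) const {
        std::size_t extra = proc < n % p ? proc : n % p;
        return proc * (n / p) + extra;
    }
    // Which processor stores global vertex v.
    std::size_t owner(std::size_t v) const {
        std::size_t big = n / p + 1;          // size of the larger blocks
        std::size_t cutoff = (n % p) * big;   // vertices covered by larger blocks
        if (v < cutoff) return v / big;
        return n % p + (v - cutoff) / (n / p);
    }
    // Position of v in its owner's local vertex storage.
    std::size_t local_index(std::size_t v) const { return v - block_start(owner(v)); }
};
```

With 10 vertices on 3 processors, the blocks hold 4, 3, and 3 vertices, so vertex 7 lives on processor 2 at local index 0. Any algorithm touching vertex v must either run on owner(v) or send it a message, which is the source of the communication costs discussed in the following slides.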
  22. Parallel Graph Algorithms
     - Breadth-first search
     - Eager Dijkstra's single-source shortest paths
     - Crauser et al. single-source shortest paths
     - Depth-first search
     - Minimum spanning tree (Boruvka, Dehne & Götz)
     - Connected components
     - Strongly connected components
     - Biconnected components
     - PageRank
     - Graph coloring
     - Fruchterman-Reingold layout
     - Max-flow (Dinic's)
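As an illustration of how the first entry, breadth-first search, can be organized in distributed memory, here is a single-process simulation of a level-synchronous BFS. Each of p simulated processors owns the vertices with `v % p == rank` (a hypothetical cyclic distribution), only expands vertices it owns, and sends newly reached vertices to their owners; messages are delivered at the end of each level, like a superstep barrier. `simulated_distributed_bfs` is a hypothetical helper, not the Parallel BGL's BFS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated distributed BFS: returns hop distances from source (-1 = unreached).
std::vector<int> simulated_distributed_bfs(const std::vector<std::vector<std::size_t>>& adj,
                                           std::size_t source, std::size_t p) {
    const std::size_t n = adj.size();
    std::vector<int> dist(n, -1);  // conceptually partitioned among owners
    auto owner = [p](std::size_t v) { return v % p; };

    // frontier[r]: vertices processor r expands at the current level.
    std::vector<std::vector<std::size_t>> frontier(p);
    dist[source] = 0;
    frontier[owner(source)].push_back(source);

    for (int level = 0;; ++level) {
        // Each processor scans its own frontier and sends neighbors to their owners.
        std::vector<std::vector<std::size_t>> inbox(p);
        for (std::size_t r = 0; r < p; ++r)
            for (std::size_t u : frontier[r])
                for (std::size_t w : adj[u]) inbox[owner(w)].push_back(w);
        // Barrier: each owner keeps only unvisited vertices as its next frontier.
        bool any = false;
        for (std::size_t r = 0; r < p; ++r) {
            frontier[r].clear();
            for (std::size_t w : inbox[r]) {
                if (dist[w] == -1) {
                    dist[w] = level + 1;
                    frontier[r].push_back(w);
                    any = true;
                }
            }
        }
        if (!any) break;  // no processor discovered anything: BFS is done
    }
    return dist;
}
```

The per-level barrier keeps the logic simple, but it also means the number of communication rounds equals the graph's BFS depth, one reason parallel graph algorithms are harder to tune than their sequential counterparts.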
  23. Performance: Sparse graphs [figure]
  24. Scalability (~547k vertices/node)
     [Figure: small-world graph, up to 70M vertices and 1B edges]
  25. Performance vs. CGMgraph
     [Figure: Erdős-Rényi graph with 96k vertices and 10M edges; annotations 17x and 30x]
  26. Parallel BGL Summary
     - The Parallel BGL is built for huge graphs
       - Millions to hundreds of millions of nodes
       - Distributed-memory parallel processing on clusters
       - Future work will permit larger graphs…
     - Parallel programming has a learning curve
       - Parallel graph algorithms are much harder to write
       - Distributed graph manipulation can be tricky
     - The Parallel BGL is an active research library
  27. Distributed Graph Layout [figure]
  28. Parallel BGL in Python
     - Preliminary support for the Parallel BGL in Python
       - Just import boost.graph.distributed
       - Interface similar to sequential BGL-Python
     - Several options for usage with MPI:
       - Straight MPI: mpirun -np 2 python script.py
       - pyMPI: allows interactive use of the interpreter
     - Initially used to prototype our distributed Fruchterman-Reingold implementation
  29. Porting for Performance [figure]
  30. Which BGL is Right for You?
     - Is any BGL right for you?
     - Depends on how large your networks are:
       - Up to 1/2 million vertices: any BGL will do
       - The C++ BGL can push to a couple million vertices
       - Tens of millions or larger: Parallel BGL only
     - Other considerations:
       - You can prototype in Python, then port to C++
       - Algorithm authors might prefer the original BGL
       - Parallelism is very hard to manage
  31. Conclusion
     - The Boost Graph Library family is a collection of full-featured graph libraries
       - All are flexible, customizable, and efficient
       - Easy to port from Python to C++
       - Can port from sequential to parallel
       - Always growing and improving
     - Is one of the BGLs right for you?
       - A typical "build or buy" decision
  32. For More Information…
     - (Original) Boost Graph Library: http://www.boost.org/libs/graph/doc
     - Parallel Boost Graph Library: http://www.osl.iu.edu/research/pbgl
     - Python Bindings for (Parallel) BGL: http://www.osl.iu.edu/~dgregor/bgl-python
     - Contact us!
       - Douglas Gregor <[email_address].iu.edu>
       - Andrew Lumsdaine <[email_address]>
  33. Other BGL Variants
     - QuickGraph (C#): http://www.codeproject.com/cs/miscctrl/quickgraph.asp
     - Ruby Graph Library: http://rubyforge.org/projects/rgl/
     - Rooster Graph (Scheme): http://savannah.nongnu.org/projects/rgraph/
     - RBGL (an R interface to the C++ BGL): http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
     - Disclaimer: These are all separate projects. We do not maintain them.
  34. Comparative Performance [figure]