    • Large-Scale Network Analysis with the Boost Graph Libraries. Douglas Gregor, Open Systems Lab, Indiana University [email_address]
    • What are the BGLs?
      • A collection of libraries for computation on graphs/networks.
        • Graph data structures
        • Graph algorithms
        • Graph input/output
      • Common design
        • Flexibility/customizability throughout
        • Obsessed with performance
        • Common interfaces throughout the collection
      • All open source, freely available online
    • The BGL Family
      • The Original (sequential) BGL
      • BGL-Python
      • The Parallel BGL
      • Parallel BGL-Python
    • The Original BGL
      • The largest and most mature BGL
        • ~7 years of research and development
        • Many users, contributors outside of the OSL
        • Steadily evolving
      • Written in C++
        • Generic
        • Highly customizable
        • Efficient (both storage and execution)
    • BGL: Graph Data Structures
      • Graphs:
        • adjacency_list : highly configurable with user-specified containers for vertices and edges
        • adjacency_matrix
        • compressed_sparse_row
      • Adaptors:
        • subgraphs, filtered graphs, reverse graphs
        • LEDA and Stanford GraphBase
      • Or, use your own…
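To make the storage trade-off between these graph types concrete, here is a minimal sketch of the compressed-sparse-row idea in plain Python. It does not use the BGL; `build_csr` and `out_neighbours` are hypothetical names chosen for illustration, and the layout shown is the textbook one rather than a description of the library's internals.

```python
def build_csr(num_vertices, edges):
    """Build compressed-sparse-row arrays for a directed graph.

    Returns (row_start, targets). The out-neighbours of vertex v are
    targets[row_start[v]:row_start[v + 1]].
    """
    # Count the out-degree of every vertex, shifted by one slot.
    row_start = [0] * (num_vertices + 1)
    for u, _ in edges:
        row_start[u + 1] += 1
    # A prefix sum turns the counts into start offsets.
    for v in range(num_vertices):
        row_start[v + 1] += row_start[v]
    # Place each target using a moving write cursor per row.
    cursor = row_start[:-1].copy()
    targets = [0] * len(edges)
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
    return row_start, targets


def out_neighbours(csr, v):
    """Slice the target array for vertex v."""
    row_start, targets = csr
    return targets[row_start[v]:row_start[v + 1]]
```

Two flat arrays hold the whole graph, which is compact and cache-friendly, but inserting an edge later means rebuilding them. A per-vertex container layout such as an adjacency list trades some of that compactness for cheap edits.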
    • Original BGL: Algorithms
      • Searches (breadth-first, depth-first, A*)
      • Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
      • All-pairs shortest paths (Johnson, Floyd-Warshall)
      • Minimum spanning tree (Kruskal, Prim)
      • Components (connected, strongly connected, biconnected)
      • Maximum cardinality matching
      • Max-flow (Edmonds-Karp, push-relabel)
      • Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
      • Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
      • Betweenness centrality
      • PageRank
      • Isomorphism
      • Vertex coloring
      • Transitive closure
      • Dominator tree
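As a reminder of what one entry in this list does, the following is a small, self-contained sketch of Dijkstra's single-source shortest paths in plain Python. It is not the BGL implementation and does not mirror its interface; the function name and the dictionary-based graph format are assumptions made for the example.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source.

    adj maps a vertex to a list of (neighbour, weight) pairs; all weights
    must be non-negative. Unreachable vertices are absent from the result.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

For example, `dijkstra({"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": [("D", 5)]}, "A")` reaches B through C at cost 3 rather than directly at cost 4.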
    • Task: Biconnected Components [Figure: input graph and output graph. Articulation points: B, G, A]
    • Define a Graph Type
      • Determine vertex/edge properties: struct Vertex { string name; }; struct Edge { int bicomponent; };
      • Determine the graph type: typedef adjacency_list< /*OutEdgeListS=*/ vecS, /*VertexListS=*/ vecS, /*DirectedS=*/ undirectedS, /*VertexProperty=*/ Vertex, /*EdgeProperty=*/ Edge> Graph;
    • Read in a GraphViz DOT File
      • Build an empty graph: Graph g;
      • Map vertex properties: dynamic_properties dyn; dyn.property("node_id", get(&Vertex::name, g));
      • Read in the GraphViz graph: ifstream in("biconnected_components.dot"); read_graphviz(in, g, dyn);
    • Run Biconnected Components
      • Keep track of the articulation points: vector<Graph::vertex_descriptor> art_points;
      • Compute biconnected components: biconnected_components (g, get(&Edge::bicomponent, g), back_inserter(art_points));
    • Output results
      • Attach bicomponent number to the "label" property of edges: dyn.property("label", get(&Edge::bicomponent, g));
      • Write results to another GraphViz file: ofstream out("bc_out.dot"); write_graphviz(out, g, dyn);
      • Show articulation points: cout << "Articulation points: "; for (size_t i = 0; i < art_points.size(); ++i) { cout << g[art_points[i]].name << ' '; }
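For readers who want to see what the library call computes, here is a minimal plain-Python sketch of the classic depth-first, low-point approach to biconnected components. It is an illustration only: it does not use the BGL, the function name and return format are assumptions, and it expects a simple undirected graph given as neighbour lists. The edge-to-number mapping plays the role of the `bicomponent` edge property above.

```python
def biconnected_components(adj):
    """Label edges by biconnected component and find articulation points.

    adj maps each vertex to a list of neighbours (simple, undirected graph).
    Returns (component, art_points): component maps frozenset({u, v}) to a
    component number, art_points is a sorted list of vertices.
    Recursive, so very deep graphs may hit Python's recursion limit.
    """
    disc, low = {}, {}      # discovery time and low-point per vertex
    edge_stack = []         # edges seen since the last finished component
    component = {}
    art_points = set()
    counter = [0, 0]        # [next discovery time, next component number]

    def visit(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:                 # tree edge
                children += 1
                edge_stack.append((u, v))
                visit(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:         # v's subtree cannot bypass u
                    if parent is not None:
                        art_points.add(u)
                    while True:               # pop one whole component
                        e = edge_stack.pop()
                        component[frozenset(e)] = counter[1]
                        if e == (u, v):
                            break
                    counter[1] += 1
            elif disc[v] < disc[u]:           # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:   # a root needs two subtrees
            art_points.add(u)

    for s in adj:
        if s not in disc:
            visit(s, None)
    return component, sorted(art_points)
```

On a hypothetical graph made of two triangles A-B-C and D-E-F joined by the edge B-D, this reports three components and the articulation points B and D. That graph is invented for the example and is not the one pictured on the slides.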
    • Original BGL Summary
      • The original BGL is large, stable, efficient
        • Lots of algorithms, graph types
        • Peer-reviewed code with many users, nightly regression testing, etc.
        • Performance comparable to FORTRAN.
      • Who should use the BGL?
        • Programmers comfortable with C++
        • Users with graph sizes from tens of vertices to millions of vertices
    • BGL-Python
      • Python is ideal for rapid prototyping:
        • It’s a scripting language (no compiler)
        • Dynamically typed means less typing for you
        • Easy to use: you already know Python…
      • BGL-Python provides access to the BGL from within Python
        • Similar interfaces to C++ BGL
        • Easier to learn than C++
        • Great for scripting, GUI applications
        • help(bgl.dijkstra_shortest_paths)
    • Example: Biconnected Components
        import boost.graph as bgl  # Pull in the BGL bindings
        g = bgl.Graph.read_graphviz("biconnected_components.dot")
        # Compute biconnected components and articulation points
        bicomponent = g.edge_property_map('int')
        art_points = bgl.biconnected_components(g, bicomponent)
        # Save results with bicomponent numbers as edge labels
        g.edge_properties['label'] = bicomponent
        g.write_graphviz("biconnected_components_out.dot")
        print "Articulation points: ",
        node_id = g.vertex_properties['node_id']
        for v in art_points:
            print node_id[v], ' ',
        print ""
    • Wrapping the BGL in Python
      • BGL-Python is not a…
        • “port”
        • reimplementation
      • BGL-Python wraps the C++ BGL
        • Python calls translate to C++ calls
        • C++ can call back into Python
      • Most of the speed of C++
      • Most of the flexibility of Python
    • Performance: Shortest Paths
    • BGL-Python Summary
      • BGL-Python is all about tradeoffs:
        • More gradual learning curve
        • Faster time-to-solution
        • Lower performance
      • Our typical approach:
        • Prototype in Python to get your ideas down
        • Port to C++ when performance matters
    • The Parallel BGL
      • A version of the C++ BGL for computational clusters
        • Distributed memory for huge graphs
        • Parallel processing for improved performance
      • An active research project
      • Closely related to the original BGL
        • Parallelizing BGL programs should be “easy”
    • Parallel BGL: Distributed Graphs [Figure: a simple, directed graph… distributed across 3 processors.]
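One way to picture a graph spread over several processors is a block distribution, where each process owns a contiguous range of vertices together with their out-edges. The sketch below is plain Python and only illustrates that idea; the slide does not say which distribution the Parallel BGL uses, so the scheme and all function names here are assumptions.

```python
def block_size(num_vertices, num_procs):
    """Vertices per process, rounded up so every vertex has an owner."""
    return -(-num_vertices // num_procs)  # ceiling division


def block_owner(v, num_vertices, num_procs):
    """Rank of the process that owns global vertex v."""
    return v // block_size(num_vertices, num_procs)


def to_local(v, num_vertices, num_procs):
    """Translate a global vertex id into (owner rank, local index)."""
    block = block_size(num_vertices, num_procs)
    return v // block, v % block


def distribute(num_vertices, edges, num_procs):
    """Give each process the out-edges of the vertices it owns."""
    local_edges = [[] for _ in range(num_procs)]
    for u, v in edges:
        local_edges[block_owner(u, num_vertices, num_procs)].append((u, v))
    return local_edges
```

With 9 vertices on 3 processes, vertex 7 becomes local index 1 on rank 2. An edge such as (4, 8) is stored on rank 1, but its target lives on rank 2, which is why following edges in a distributed graph involves communication.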
    • Parallel Graph Algorithms
      • Breadth-first search
      • Eager Dijkstra’s single-source shortest paths
      • Crauser et al. single-source shortest paths
      • Depth-first search
      • Minimum spanning tree (Boruvka, Dehne & Götz)
      • Connected components
      • Strongly connected components
      • Biconnected components
      • PageRank
      • Graph coloring
      • Fruchterman-Reingold layout
      • Max-flow (Dinic’s)
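To give a feel for how the first algorithm in this list can work on a distributed graph, here is a single-process simulation of a level-synchronous breadth-first search in plain Python. It is a sketch under stated assumptions: vertices are numbered 0..n-1, vertex v is owned by rank `v % num_procs` (chosen only for simplicity), and message passing is modelled with per-rank lists instead of MPI. The Parallel BGL's own implementation may differ.

```python
def distributed_bfs(adj, source, num_procs):
    """Simulate level-synchronous BFS across num_procs ranks.

    adj is a list of neighbour lists indexed by vertex id.
    Returns a dict mapping each reachable vertex to its BFS level.
    """
    def owner(v):
        return v % num_procs

    level = {}                               # stands in for per-rank level maps
    inbox = [[] for _ in range(num_procs)]   # one message queue per rank
    inbox[owner(source)].append(source)
    depth = 0
    while any(inbox):
        outbox = [[] for _ in range(num_procs)]
        for rank in range(num_procs):        # each rank handles only its own vertices
            for v in inbox[rank]:
                if v in level:
                    continue                 # already discovered at an earlier level
                level[v] = depth
                for w in adj[v]:
                    outbox[owner(w)].append(w)   # "send" w to the rank that owns it
        inbox = outbox                       # message exchange acts as the barrier
        depth += 1
    return level
```

The result does not depend on the number of ranks: `distributed_bfs([[1, 2], [3], [3], [4], []], 0, 3)` and the same call with one rank both assign levels 0, 1, 1, 2, 3 to vertices 0 through 4.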
    • Performance: Sparse graphs
    • Scalability (~547k vertices/node): small-world graph, up to 70M vertices and 1B edges
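The figures on this slide imply a machine size and an average degree, which a few lines of arithmetic can recover. This is an inference from the quoted numbers, assuming the per-node load stays fixed at about 547k vertices as the graph grows; the function name is hypothetical.

```python
import math


def nodes_needed(total_vertices, vertices_per_node):
    """Smallest node count that keeps the per-node load at or below the target."""
    return math.ceil(total_vertices / vertices_per_node)


# Figures quoted on the slide (all approximate).
largest_graph = 70_000_000     # "up to 70M vertices"
per_node = 547_000             # "~547k vertices/node"
edges = 1_000_000_000          # "1B edges"

nodes = nodes_needed(largest_graph, per_node)
average_degree = edges / largest_graph
```

With these inputs `nodes` comes out at 128 and `average_degree` at roughly 14 edges per vertex, so the largest run would correspond to about 128 cluster nodes if the assumption holds.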
    • Performance vs. CGM graph [Figure: Erdős-Rényi graph, 96k vertices, 10M edges; chart labels: 17x, 30x]
    • Parallel BGL Summary
      • The Parallel BGL is built for huge graphs
        • Millions to hundreds of millions of nodes
        • Distributed-memory parallel processing on clusters
        • Future work will permit larger graphs…
      • Parallel programming has a learning curve
        • Parallel graph algorithms much harder to write
        • Distributed graph manipulation can be tricky
      • Parallel BGL is an active research library
    • Distributed Graph Layout
    • Parallel BGL in Python
      • Preliminary support for the Parallel BGL in Python
        • Just import boost.graph.distributed
        • Similar interface to sequential BGL-Python
      • Several options for usage with MPI:
        • Straight MPI: mpirun -np 2 python script.py
        • pyMPI: allows interactive use of the interpreter
      • Initially used to prototype our distributed Fruchterman-Reingold implementation.
    • Porting for Performance
    • Which BGL is Right for You?
      • Is any BGL right for you?
      • Depends on how large your networks are:
        • Up to 1/2 million vertices, any BGL will do
        • C++ BGL can push to a couple million vertices
        • For tens of millions or larger, Parallel BGL only
      • Other considerations:
        • You can prototype in Python, port to C++
        • Algorithm authors might prefer the original BGL
        • Parallelism is very hard to manage
    • Conclusion
      • The Boost Graph Library family is a collection of full-featured graph libraries
        • All are flexible, customizable, efficient
        • Easy to port from Python to C++
        • Can port from sequential to parallel
        • Always growing, improving
      • Is one of the BGLs right for you?
        • A typical “build or buy” decision
    • For More Information…
      • (Original) Boost Graph Library http://www.boost.org/libs/graph/doc
      • Parallel Boost Graph Library http://www.osl.iu.edu/research/pbgl
      • Python Bindings for (Parallel) BGL http://www.osl.iu.edu/~dgregor/bgl-python
      • Contact us!
        • Douglas Gregor < [email_address] . iu . edu >
        • Andrew Lumsdaine < [email_address] >
    • Other BGL Variants
      • QuickGraph (C#) http://www.codeproject.com/cs/miscctrl/quickgraph.asp
      • Ruby Graph Library http://rubyforge.org/projects/rgl/
      • Rooster Graph (Scheme) http://savannah.nongnu.org/projects/rgraph/
      • RBGL (an R interface to the C++ BGL) http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
      • Disclaimer: These are all separate projects. We do not maintain them.
    • Comparative Performance