Slides from my talk at the Mathworks summit at Stanford


- 1. A tale of two Matlab libraries for graph algorithms: MatlabBGL and gaimc. David F. Gleich, Purdue University
- 2. The Setting: recursive spectral graph partitioning
- 3. The Setting: recursive spectral graph partitioning. To store an m×n sparse matrix M, Matlab uses compressed column format [Gilbert et al.]. Matlab never stores a 0 value in a sparse matrix. It always “re-compresses” the data structure in these cases. If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in the figure. [Figure: compressed sparse row (rp, ci, ai) and compressed sparse column (cp, ri, ai) storage of a 6-vertex example graph and its adjacency matrix.] Most graph algorithms are designed to work with out-edge lists instead of in-edge lists. Before running an algo…
- 4. The Setting: recursive spectral graph partitioning. A = load_adjacency_matrix; L = diag(sum(A,2)) - A; [V,D] = eigs(L,2,'SA'); f = V(:,2); A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);*
- 5. The Setting: recursive spectral graph partitioning. A = load_adjacency_matrix; L = diag(sum(A,2)) - A; [V,D] = eigs(L,2,'SA'); f = V(:,2); A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);* *Warning: Can do much better than this split!
- 6. The Problem: disconnected components
- 7. The Problem: disconnected components. C = components(A); ??? Undefined function or method 'components' for input arguments of type 'double'.
- 8. The Problem: disconnected components. *Warning: Strictly speaking, this isn’t a problem. However, it’s inefficient to solve larger eigenproblems than required. C = components(A); ??? Undefined function or method 'components' for input arguments of type 'double'.
- 9. The Rescue: disconnected components. MESHPART toolkit by John Gilbert and Shang-Hua Teng. C = components(A); Uses Matlab’s dmperm function.
- 10. The Failed Rescue: disconnected components. C = components(A); caused Matlab to randomly crash. I wanted a fast max-flow routine too.
- 11. Matlab and the Boost graph library: MatlabBGL
- 12. The Recoup: working recursive spectral partitioning code using the Boost graph library in C++, including a max-flow heuristic extension. The Boost graph library has a components function and many other graph algorithms. Boost has a “generic” graph data type.
- 13. The Idea: add graph algorithms to Matlab naturally using the Boost graph library
- 14. The Plan: graph data type = Matlab sparse matrix; results = “natural” Matlab types
- 15. The Plan: A = load_adjacency_matrix; d = bfs(A,1); d = dijkstra(A,size(A,1)); T = mst(A); c = components(A); F = maxflow(A,s,t); test_dag(A); [flag,K] = test_planarity(A);
- 16. The Plan: suitable for large problems (10 million edges, circa 2006), so avoid copying data
- 17. The Catch: Boost graph type versus Matlab sparse type (compressed sparse column). vertices(G) corresponds to 1:n; edges(G) to [i,j,w] = find(A); num_vertices(G) to size(A,1); out_edges(G,v) to [~,j,w] = find(A(v,:)); adjacent(G,v) to [~,j] = find(A(v,:)).
- 18. … graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in the figure. [Figure: a 6-vertex example graph with adjacency matrix rows (0 16 13 0 0 0), (0 0 10 12 0 0), (0 4 0 0 14 0), (0 0 9 0 0 20), (0 0 0 7 0 4), (0 0 0 0 0 0). Compressed sparse row: rp = 1 3 5 7 9 11 11; ci = 2 3 3 4 2 5 3 6 4 6; ai = 16 13 10 12 4 14 9 20 7 4. Compressed sparse column: cp = 1 1 3 6 8 9 11; ri = 1 3 1 2 4 2 5 3 4 5; ai = 16 4 13 10 9 12 7 14 20 4.] Most graph algorithms are designed to work with out-edge lists instead of …
- 19. The Compromise: make a transpose when it’s required, but let “smart” users bypass it
- 20. The Details: … BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example. Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions. [Figure: MatlabBGL architecture, with layers labeled M code, mex code, extern C code, and C++ code. On the Matlab side, dfs, bfs, and mst operate on a sparse matrix; libmbgl and Boost operate on a CSR graph with dfs, bfs, and prim mst.]
- 21. MatlabBGL Version 1.0: released April 2006 on the Matlab File Exchange. July ‘06: v2.0 added visitors. April ‘07: v2.1, 64-bit Matlab. April ‘08: v3.0, performance improvement. Oct ‘08: v4.0, planarity testing, layout, structural zeros. Jan ‘12: v5.0 update forthcoming?
- 22. Impact: Downloaded over 20,000 times. Used in over 10 publications by others, including a PNAS article on brain topology. Identified numerous bugs in the Boost graph library.
- 23. Impact
- 24. Network Partitioning … and now for a demo …
- 25. The Devil of the Details: … BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example. Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions. [Figure: MatlabBGL architecture, with layers labeled M code, mex code, extern C code, and C++ code.] Compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode. Compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode. Let’s illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.
- 26. The Devil of the Details: Hard to keep up with changes in Matlab. Hard for users to compile themselves (changes in Boost and changes in Matlab). Hard to play around with new algorithms. Mathworks graph library in the bioinformatics toolbox.
- 27. graph algorithms in matlab code: gaimc
- 28. A vision: function n=my1norm(x), n = 0; for i=1:numel(x), n=n+abs(x(i)); end. x = randn(1e7,1); tic, n1=my1norm(x); toc (Elapsed time is 0.16 seconds). tic, n1 = norm(x,1); toc; (Elapsed time is 0.32 seconds). Note: R2007b on 64-bit linux.
- 29. A vision: function n=my1norm(x), n = 0; for i=1:numel(x), n=n+abs(x(i)); end. x = randn(1e7,1); tic, n1=my1norm(x); toc (Elapsed time is 0.16 seconds). tic, n1 = norm(x,1); toc; (Elapsed time is 0.32 seconds). Note: R2007b on 64-bit linux.
- 30. A vision: function n=my1norm(x), n = 0; for i=1:numel(x), n=n+abs(x(i)); end. x = randn(1e7,1); tic, n1=my1norm(x); toc (Elapsed time is 0.15 seconds). tic, n1 = norm(x,1); toc; (Elapsed time is 0.1 seconds). Note: R2011a on 64-bit osx.
- 31. Quite impressed: get within spitting distance of vectorized performance using Matlab for loops; even faster than some things in python
- 32. Another idea: implement graph algorithms in pure Matlab code; should only be “somewhat” slower; much more portable
- 33. More problems: function calls make things REALLY slow (unless the function is built-in, e.g. abs). mst and dijkstra need a heap; a heap in Matlab?
- 34. Problem specifics: function n=my1normfunc(x), n = 0; for i=1:numel(x), n=n+myabs(x(i)); end. function a=myabs(a), if a<0, a=-a; end. tic, n1=my1norm(x); toc (Elapsed time is 0.15 seconds). tic, n1 = my1normfunc(x); toc; (Elapsed time is 3.16 seconds). Note: R2011a on 64-bit osx.
- 35. A heap in Matlab code. More generally speaking, algorithms written in Fortran are excellent candidates for the Matlab just-in-time compiler. Old reference: D. K. Kahaner, Algorithm 561: Fortran implementation of heap programs, ACM TOMS, 1980. Background text (cropped on the slide): the heap implementation is inspired by Kahaner; a heap is a binary tree where smaller elements are …; binary trees are commonly stored as arrays (or vectors); the Matlab heap will consist of four arrays and one …; T stores the identifiers of the items in the heap, and T(1) is the id of the element at the root of the heap tree. [Figure 6.3: Binary trees as arrays.]
- 36. Graph access, take 1: Simple, efficient neighbor access. At = A'; [v,~,w] = find(At(:,u));
- 37. Graph access, take 2: Complicated neighbor access. [i,j,w] = find(A); [ai,aj,a] = indexed2csr(i,j,w,size(A,1)); v = aj(ai(u):ai(u+1)-1);
- 38. Graph access. bfs, take 1: At=A'; for w=find(At(:,v))' … tic, d=bfs(A,1), toc (Elapsed time 0.05 seconds). bfs, take 2: indexed2csr(A); for ci=rp(v):rp(v+1)-1 … tic, d=bfs(A,1), toc (Elapsed time 0.007 seconds).
- 39. Graph access. bfs, take 1: At=A'; for w=find(At(:,v))' … tic, d=bfs(A,1), toc (Elapsed time 0.05 seconds). bfs, take 2: indexed2csr(A); for ci=rp(v):rp(v+1)-1 … tic, d=bfs(A,1), toc (Elapsed time 0.007 seconds).
- 40. gaimc: convert input to CSR arrays; run graph algorithms on CSR arrays. Includes bfs, clustering coefficients, core numbers, cosine knn, dfs, dijkstra, floyd warshall, mst, strong components, and bipartite_matching (thanks to Ying Wang).
- 41. The pudding: function s=mysumsq(x), s = 0; for i=1:numel(x), s = s + x(i)^2; end. x = randn(1e7,1); tic, s1 = mysumsq(x); toc; tic, s2 = x'*x; toc. Background text (cropped on the slide): … instances of a random symmetric graph with average degree … and 10000 vertices. The aggregated results of all these tests are shown in the figure. [Figure 6.4: Performance of the gaimc library. Bar chart of slowdown, Standard versus Fast, for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs.]
- 42. The pudding changes: function s=mysumsq(x), s = 0; for i=1:numel(x), s = s + x(i)^2; end. x = randn(1e7,1); tic, s1 = mysumsq(x); toc; tic, s2 = x'*x; toc. [Figure 6.4: Performance of the gaimc library. Two bar charts of slowdown, Standard versus Fast, for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs; the second chart uses a larger slowdown scale.]
- 43. Afterward: “putting the graph into Matlab”. Matlab could just as easily have been called “Graphlab” with a few extra functions. It’s a great environment to play with graphs as matrices.
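
The gaimc approach in slides 36 to 40 (convert the sparse matrix to CSR arrays once, then loop over plain arrays) can be sketched roughly as follows. This is a minimal illustration, not gaimc's actual code: the array names rp and ci follow the slides, but building them with find, accumarray, and cumsum is an assumption made here, and queue, head, and tail are hypothetical names. The graph is the 6-vertex example from the storage-format slides.

```matlab
% Example graph from the storage-format slides (6 vertices, directed, weighted).
A = sparse([1 1 2 2 3 3 4 4 5 5], ...
           [2 3 3 4 2 5 3 6 4 6], ...
           [16 13 10 12 4 14 9 20 7 4], 6, 6);
n = size(A,1);

% Build CSR arrays (assumed construction, not the library's own routine).
% find on the transpose enumerates the entries of A row by row, because
% Matlab enumerates sparse matrices column by column.
[ci, ri] = find(A');                         % ci: column in A, ri: row in A
rp = cumsum([1; accumarray(ri, 1, [n 1])]);  % row pointers, length n+1

% Breadth-first search from vertex u; the inner loop only indexes arrays.
u = 1;
d = -ones(n,1);                  % -1 marks "not reached yet"
d(u) = 0;
queue = zeros(n,1);              % preallocated FIFO queue of vertex ids
head = 1;
tail = 1;
queue(1) = u;
while head <= tail
    v = queue(head);
    head = head + 1;
    for k = rp(v):rp(v+1)-1      % positions of the out-edges of v
        w = ci(k);
        if d(w) < 0
            d(w) = d(v) + 1;
            tail = tail + 1;
            queue(tail) = w;
        end
    end
end
disp(d')                         % distances from vertex 1: 0 1 1 2 2 3
```

The row pointers and column indices computed here agree with the rp and ci arrays shown in the compressed sparse row figure, and the upper loop bound is rp(v+1)-1 because rp(v+1) already points at the first edge of the next vertex.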
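
One level of the spectral split from slides 4 and 5 can be tried on a toy graph as below. This is a sketch under stated assumptions: the graph (two triangles joined by one edge) and the names edges, order, left, and right are made up for illustration, and a dense eig call stands in for the slides' eigs(L,2,'SA') because the example is tiny. As the slides warn, a sign-based split is only a starting point.

```matlab
% Hypothetical test graph: two triangles joined by the single edge 3-4.
edges = [1 2; 2 3; 1 3; 4 5; 5 6; 4 6; 3 4];
n = 6;
A = sparse(edges(:,1), edges(:,2), 1, n, n);
A = A + A';                        % make the adjacency matrix symmetric

L = diag(sum(A,2)) - A;            % combinatorial graph Laplacian D - A
[V, D] = eig(full(L));             % dense eigensolve is fine at this size
[~, order] = sort(diag(D));        % eigenvalues in ascending order
f = V(:, order(2));                % eigenvector of the 2nd smallest eigenvalue

left  = find(f >= 0);              % vertices on one side of the split
right = find(f <  0);              % vertices on the other side
A1 = A(left, left);                % induced subgraph, first part
A2 = A(right, right);              % induced subgraph, second part
```

For this graph the split separates the two triangles, so A1 and A2 each hold one triangle; which triangle ends up in left depends on the sign of the eigenvector the solver returns. Recursing on A1 and A2 is where a connected-components check becomes useful, since a disconnected piece makes the eigenproblem larger than it needs to be.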


I wasted many hours doing this myself about 7 or 8 years ago.

Gaimc is a great idea for those who want pure Matlab.