Sparse Data Structures for Weighted Bipartite
                 Matching
                        E. Jason Riedy
           ...
Use Sparse Matrix Optimizations...




Take a fixed, simple algorithm: Auction alg. for matchings
    Repeated iterations o...
Where’s the Time Going?



Auction algorithm: Iterative, greedy algorithm bipartite matching:
          1. An unmatched ro...
Time linear in entries examined...
Number of entries examined is problem-dependent.

           4.5
            4
        ...
Expensive Inner Loop!
1.3 GHz Itanium 2
10


 8


 6


 4


 2


 0
  100    150        200    250       300     350   400...
Verifying...
Using kcachegrind (N.Nethercote and J.Weidendorfer) and valgrind
(J. Seward).
And Locating...


No obvious culprits in the instructions...
And Locating...


But considering cache effects!
Auction’s Inner Loop



                                 Price
Index
Entry



        value = entry - price
        save l...
Auction’s Inner Loop


Same accesses as sparse matrix-vector multiplication!
                                             ...
Performance Through Blocking?




(Images swiped from Berkeley’s BeBOP group.)
Performance Through Blocking?




(Images swiped from Berkeley’s BeBOP group.)
Performance Through Blocking?




More entries, but 1.5× performance on Pentium 3!
(Images swiped from Berkeley’s BeBOP gr...
Blocking Speeds Some Matches
Finite element matrix from Vavasis (in UF collection):
Blocking Speeds Some Matches
Blocking Speeds Some Matches
Observations



A blocked graph data structure may provide additional
performance if:
    you iterate over whole rows,
   ...
Upcoming SlideShare
Loading in …5
×

Sparse Data Structures for Weighted Bipartite Matching

1,311 views

Published on

A stab at optimizing the inner loop for auctiion-based, sparse bipartite matching.

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,311
On SlideShare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
Downloads
13
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Sparse Data Structures for Weighted Bipartite Matching

  1. 1. Sparse Data Structures for Weighted Bipartite Matching E. Jason Riedy Dr. James Demmel (. . . and thanks to the BeBOP group) SIAM Workshop on Combinatorial Scientific Computing 2004
  2. 2. Use Sparse Matrix Optimizations... Take a fixed, simple algorithm: Auction alg. for matchings Repeated iterations over a sparse graph. What’s expensive, and is there anything we can do about it? Take an idea from optimizing sparse matrix-vector products. A little speed-up in some cases, but there are more ideas available...
  3. 3. Where’s the Time Going? Auction algorithm: Iterative, greedy algorithm bipartite matching: 1. An unmatched row i finds a “most profitable” column j π(i) = maxj b(i, j) − p(i) 2. Row i places a bid for column j. Bid price raised until j is no longer the best choice. (Min. increment µ) 3. Highest bid gets the matching (i, j).
  4. 4. Time linear in entries examined... Number of entries examined is problem-dependent. 4.5 4 3.5 3 time (s) 2.5 2 1.5 1 0.5 0 0 5e+06 1e+07 1.5e+07 2e+07 2.5e+07 3e+07 number of entries examined
  5. 5. Expensive Inner Loop! 1.3 GHz Itanium 2 10 8 6 4 2 0 100 150 200 250 300 350 400 450 cycles per entry
  6. 6. Verifying... Using kcachegrind (N.Nethercote and J.Weidendorfer) and valgrind (J. Seward).
  7. 7. And Locating... No obvious culprits in the instructions...
  8. 8. And Locating... But considering cache effects!
  9. 9. Auction’s Inner Loop Price Index Entry value = entry - price save largest...
  10. 10. Auction’s Inner Loop Same accesses as sparse matrix-vector multiplication! Price Index Entry y += a(i,j) * x(j)
  11. 11. Performance Through Blocking? (Images swiped from Berkeley’s BeBOP group.)
  12. 12. Performance Through Blocking? (Images swiped from Berkeley’s BeBOP group.)
  13. 13. Performance Through Blocking? More entries, but 1.5× performance on Pentium 3! (Images swiped from Berkeley’s BeBOP group.)
  14. 14. Blocking Speeds Some Matches Finite element matrix from Vavasis (in UF collection):
  15. 15. Blocking Speeds Some Matches
  16. 16. Blocking Speeds Some Matches
  17. 17. Observations A blocked graph data structure may provide additional performance if: you iterate over whole rows, the graph / matrix has runs of columns, and you’re willing to use an automated tuning system. Maximizing the runs: linear arrangement. Hard, but there may be cheap heuristics. Only worth-while if you’re performing many iterations. (For mat-vec, often > 50 computations of Ax.)

×