Dynamic Load-balancing On Graphics Processors

    Dynamic Load-balancing On Graphics Processors - Presentation Transcript

    1. On Dynamic Load Balancing on Graphics Processors
      Daniel Cederman and Philippas Tsigas
      Chalmers University of Technology
    2. Overview
      Motivation
      Methods
      Experimental evaluation
      Conclusion
    3. The problem setting
      Work
      Offline
      Task
      Task
      Task
      Task
      Task
      Task
      Task
      Online
      Task
      Task
      Task
      Task
    4. Static Load Balancing
      Processor
      Processor
      Processor
      Processor
    5. Static Load Balancing
      Processor
      Processor
      Processor
      Processor
      Task
      Task
      Task
      Task
    6. Static Load Balancing
      Processor
      Processor
      Processor
      Processor
      Task
      Task
      Task
      Task
    7. Static Load Balancing
      Processor
      Processor
      Processor
      Processor
      Task
      Task
      Task
      Task
      Subtask
      Subtask
      Subtask
      Subtask
    8. Static Load Balancing
      Processor
      Processor
      Processor
      Processor
      Task
      Task
      Task
      Task
      Subtask
      Subtask
      Subtask
      Subtask
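The static load balancing slides are diagrams only. One way to read them is that each task is bound to a processor up front, so any imbalance in task cost stays where it lands. The sketch below illustrates that reading in C++; the round-robin rule, the function name `static_loads`, and the integer costs are assumptions made for illustration, not details from the presentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign task i to processor (i % processors) and return the total
// cost each processor ends up with. Nothing is moved afterwards,
// which is what makes the assignment "static".
std::vector<int> static_loads(const std::vector<int>& costs,
                              std::size_t processors) {
  std::vector<int> load(processors, 0);
  for (std::size_t i = 0; i < costs.size(); ++i) {
    load[i % processors] += costs[i];
  }
  return load;
}
```

With one expensive task among many cheap ones, a single processor carries most of the cost while the others sit idle, which is the situation dynamic load balancing is meant to avoid.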
    9. Dynamic Load Balancing
      Processor
      Processor
      Processor
      Processor
      Task
      Task
      Task
      Task
      Subtask
      Subtask
      Subtask
      Subtask
    10. Task sharing
      Check condition
      Work done?
      Done
      Task Set
      Acquire Task
      Try to get task
      Task
      Got task?
      No, retry
      Task
      Task
      Perform task
      Task
      New tasks?
      No, continue
      Add Task
      Task
      Add task
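The task sharing flow chart above (check condition, acquire task, perform task, add task) can be sketched as a worker loop. This is a minimal C++ illustration under stated assumptions: the `Task` type (sum a range of integers, splitting large ranges), the mutex-guarded `TaskSet`, the `pending` counter used as the "work done?" condition, and all names are hypothetical and not taken from the presentation.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>

struct Task { int lo, hi; };  // hypothetical task: sum the integers in [lo, hi)

class TaskSet {               // shared task set; a plain mutex is an assumption
 public:
  void add(Task t) {
    std::lock_guard<std::mutex> guard(mutex_);
    tasks_.push_back(t);
  }
  bool try_get(Task& t) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (tasks_.empty()) return false;
    t = tasks_.front();
    tasks_.pop_front();
    return true;
  }
 private:
  std::mutex mutex_;
  std::deque<Task> tasks_;
};

struct Shared {
  TaskSet tasks;
  std::atomic<int> pending{0};      // tasks added but not yet finished
  std::atomic<long long> sum{0};
};

void submit(Shared& s, Task t) {
  s.pending.fetch_add(1);           // count the task before it becomes visible
  s.tasks.add(t);
}

void worker_loop(Shared& s) {
  const int kLeaf = 16;             // ranges up to this size are summed directly
  Task t;
  while (s.pending.load() > 0) {        // "Work done?"
    if (!s.tasks.try_get(t)) continue;  // "Got task?" -> "No, retry"
    if (t.hi - t.lo > kLeaf) {          // "New tasks?" -> "Add task"
      int mid = t.lo + (t.hi - t.lo) / 2;
      submit(s, Task{t.lo, mid});
      submit(s, Task{mid, t.hi});
    } else {                            // "Perform task"
      long long part = 0;
      for (int i = t.lo; i < t.hi; ++i) part += i;
      s.sum.fetch_add(part);
    }
    s.pending.fetch_sub(1);             // the acquired task is finished
  }
}
```

Several threads can run `worker_loop` on the same `Shared` object. Because children are counted before their parent is marked finished, `pending` cannot reach zero while work remains.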
    11. System Model
      CUDA
      Global Memory
      Gather and scatter
      Compare-And-Swap
      Fetch-And-Inc
      Multiprocessors
      Maximum number of concurrent thread blocks
      Global Memory
      Multi-processor
      Multi-processor
      Multi-processor
      Thread Block
      Thread Block
      Thread Block
      Thread Block
      Thread Block
      Thread Block
      Thread Block
      Thread Block
      Thread Block
    12. Synchronization
      Blocking
      Uses mutual exclusion to only allow one process at a time to access the object.
      Lock-free
      Multiple processes can access the object concurrently. At least one operation in a set of concurrent operations finishes in a finite number of its own steps.
      Wait-free
      Multiple processes can access the object concurrently. Every operation finishes in a finite number of its own steps.
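The three progress guarantees above can be contrasted on a very small shared object. The sketch below uses C++ `std::atomic` as a stand-in for the compare-and-swap and fetch-and-increment primitives listed on the System Model slide; the class names are hypothetical, and the wait-free label on the last class assumes that fetch-and-add is a single hardware atomic instruction on the target platform.

```cpp
#include <atomic>
#include <cassert>

// Blocking: a spin lock built from compare-and-swap. A thread that
// stalls while holding the lock stops everyone else.
class LockedCounter {
 public:
  int increment() {
    int expected = 0;
    while (!lock_.compare_exchange_weak(expected, 1)) expected = 0;
    int result = ++value_;
    lock_.store(0);
    return result;
  }
 private:
  std::atomic<int> lock_{0};
  int value_ = 0;
};

// Lock-free: a CAS retry loop. A failed CAS means another thread
// changed the value, so some operation always makes progress, but an
// individual thread may retry an unbounded number of times.
class LockFreeMax {
 public:
  void update(int candidate) {
    int current = value_.load();
    while (current < candidate &&
           !value_.compare_exchange_strong(current, candidate)) {
      // compare_exchange_strong reloaded `current`; re-check and retry.
    }
  }
  int get() const { return value_.load(); }
 private:
  std::atomic<int> value_{0};
};

// Wait-free (under the assumption stated in the lead-in): every call
// finishes in a bounded number of steps, with no retry loop.
class FetchAddCounter {
 public:
  int increment() { return value_.fetch_add(1) + 1; }
 private:
  std::atomic<int> value_{0};
};
```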
    13. Load Balancing Methods
      Blocking Task Queue
      Non-blocking Task Queue
      Task Stealing
      Static Task List
    14. Blocking queue
      Free
      TB 1
      Head
      TB 2
      Tail
      TB n
    15. Blocking queue
      Free
      TB 1
      Head
      TB 2
      Tail
      TB n
    16. Blocking queue
      Free
      TB 1
      Head
      TB 2
      T1
      Tail
      TB n
    17. Blocking queue
      Free
      TB 1
      Head
      TB 2
      T1
      Tail
      TB n
    18. Blocking queue
      Free
      TB 1
      Head
      TB 2
      T1
      Tail
      TB n
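The blocking queue slides show a lock word (drawn as "Free"), head and tail positions, and thread blocks TB 1 to TB n contending for them. A minimal C++ sketch of that shape follows, with `std::atomic` standing in for the GPU's compare-and-swap. The class name, the fixed capacity, and the integer task type are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

template <std::size_t Capacity>
class BlockingQueue {
 public:
  // Returns false when the queue is full.
  bool enqueue(int task) {
    lock();
    bool ok = (tail_ - head_) < Capacity;
    if (ok) {
      slots_[tail_ % Capacity] = task;
      ++tail_;
    }
    unlock();
    return ok;
  }

  // Returns false when the queue is empty.
  bool dequeue(int& task) {
    lock();
    bool ok = head_ != tail_;
    if (ok) {
      task = slots_[head_ % Capacity];
      ++head_;
    }
    unlock();
    return ok;
  }

 private:
  enum : int { kFree = 0, kTaken = 1 };

  // Spin until the lock word changes from kFree to kTaken.
  void lock() {
    int expected = kFree;
    while (!lock_.compare_exchange_weak(expected, kTaken,
                                        std::memory_order_acquire)) {
      expected = kFree;
    }
  }
  void unlock() { lock_.store(kFree, std::memory_order_release); }

  std::atomic<int> lock_{kFree};
  std::size_t head_ = 0;   // next position to dequeue from
  std::size_t tail_ = 0;   // next position to enqueue to
  int slots_[Capacity] = {};
};
```

Every operation, including a failed dequeue on an empty queue, has to take the lock first, so all thread blocks serialize on one memory word.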
    19. Non-blocking Queue
      TB 1
      TB 1
      Head
      TB 2
      TB 2
      T1
      T2
      T3
      T4
      Tail
      TB n
      Reference
      P. Tsigas and Y. Zhang, A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems [SPAA 01]
    20. Non-blocking Queue
      TB 1
      TB 1
      Head
      TB 2
      TB 2
      T1
      T2
      T3
      T4
      Tail
      TB n
    21. Non-blocking Queue
      TB 1
      TB 1
      Head
      TB 2
      TB 2
      T1
      T2
      T3
      T4
      Tail
      TB n
    22. Non-blocking Queue
      TB 1
      TB 1
      Head
      TB 2
      TB 2
      T1
      T2
      T3
      T4
      Tail
      TB n
    23. Non-blocking Queue
      TB 1
      TB 1
      Head
      TB 2
      TB 2
      T1
      T2
      T3
      T4
      T5
      Tail
      TB n
    24. Non-blocking Queue
      TB 1
      TB 1
      Head
      TB 2
      TB 2
      T1
      T2
      T3
      T4
      T5
      Tail
      TB n
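To make the idea of a queue without locks concrete, here is a deliberately simplified C++ sketch in which thread blocks claim array slots with compare-and-swap. It is not the queue from the referenced paper: each slot is used only once, so slot reuse, which a circular array queue has to handle, is left out. The class name, the `kEmpty` marker, and the restriction to non-negative integer tasks are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

template <std::size_t Capacity>
class OneShotLockFreeQueue {
 public:
  OneShotLockFreeQueue() {
    for (std::size_t i = 0; i < Capacity; ++i) slots_[i].store(kEmpty);
  }

  // Tasks must be non-negative. Returns false once all slots are used.
  bool enqueue(int task) {
    for (;;) {
      std::size_t t = tail_.load();
      if (t >= Capacity) return false;
      int expected = kEmpty;
      bool claimed = slots_[t].compare_exchange_strong(expected, task);
      // Whether we filled the slot or another thread did, help move
      // the tail past it; a failed CAS here means it already moved.
      std::size_t seen = t;
      tail_.compare_exchange_strong(seen, t + 1);
      if (claimed) return true;
    }
  }

  // Returns false when no task is currently available.
  bool dequeue(int& task) {
    for (;;) {
      std::size_t h = head_.load();
      if (h >= Capacity) return false;
      int value = slots_[h].load();
      if (value == kEmpty) return false;   // slot h not filled yet
      // Only one thread can advance head from h, so only one gets it.
      if (head_.compare_exchange_strong(h, h + 1)) {
        task = value;
        return true;
      }
    }
  }

 private:
  enum : int { kEmpty = -1 };
  std::atomic<int> slots_[Capacity];
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};
```

A thread only retries when its compare-and-swap lost to another thread that completed an enqueue or dequeue, so a stalled thread block cannot prevent the others from making progress.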
    25. Task stealing
      T1
      TB 1
      T3
      T2
      TB 2
      TB n
      Reference
      Arora N. S., Blumofe R. D., Plaxton C. G., Thread Scheduling for Multiprogrammed Multiprocessors [SPAA 98]
    26. Task stealing
      T1
      T4
      TB 1
      T3
      T2
      TB 2
      TB n
    27. Task stealing
      T1
      T4
      T5
      TB 1
      T3
      T2
      TB 2
      TB n
    28. Task stealing
      T1
      T4
      TB 1
      T3
      T2
      TB 2
      TB n
    29. Task stealing
      T1
      TB 1
      T3
      T2
      TB 2
      TB n
    30. Task stealing
      TB 1
      T3
      T2
      TB 2
      TB n
    31. Task stealing
      TB 1
      T2
      TB 2
      TB n
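In the task stealing slides each thread block has its own collection of tasks and takes tasks from another block when its own runs dry. The sketch below is a C++ rendering of the bounded deque from the referenced Arora, Blumofe and Plaxton paper: the owner pushes and pops at the bottom, thieves take from the top, and a tag packed next to the top index guards against reuse after a reset. The class name, the fixed capacity, the integer task type, and the use of `-1` for "nothing obtained" are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

template <std::uint32_t Capacity>
class StealDeque {
 public:
  enum : int { kNone = -1 };  // returned when no task was obtained

  bool push_bottom(int task) {            // owner only
    std::uint32_t b = bot_.load();
    if (b == Capacity) return false;
    slots_[b].store(task);
    bot_.store(b + 1);
    return true;
  }

  int pop_bottom() {                      // owner only
    std::uint32_t b = bot_.load();
    if (b == 0) return kNone;
    --b;
    bot_.store(b);
    int task = slots_[b].load();
    std::uint64_t old_age = age_.load();
    if (b > top_of(old_age)) return task; // no thief can reach slot b
    // At most one task is left: reset the deque and bump the tag.
    bot_.store(0);
    std::uint64_t new_age = pack(tag_of(old_age) + 1, 0);
    if (b == top_of(old_age) &&
        age_.compare_exchange_strong(old_age, new_age)) {
      return task;                        // won the race for the last task
    }
    age_.store(new_age);
    return kNone;                         // a thief took it first
  }

  int steal() {                           // any other thread
    std::uint64_t old_age = age_.load();
    std::uint32_t b = bot_.load();
    if (b <= top_of(old_age)) return kNone;
    int task = slots_[top_of(old_age)].load();
    std::uint64_t new_age = pack(tag_of(old_age), top_of(old_age) + 1);
    if (age_.compare_exchange_strong(old_age, new_age)) return task;
    return kNone;                         // lost a race; caller may retry
  }

 private:
  static std::uint32_t top_of(std::uint64_t age) {
    return static_cast<std::uint32_t>(age);
  }
  static std::uint32_t tag_of(std::uint64_t age) {
    return static_cast<std::uint32_t>(age >> 32);
  }
  static std::uint64_t pack(std::uint32_t tag, std::uint32_t top) {
    return (static_cast<std::uint64_t>(tag) << 32) | top;
  }

  std::atomic<int> slots_[Capacity];
  std::atomic<std::uint32_t> bot_{0};
  std::atomic<std::uint64_t> age_{0};     // tag in the high half, top in the low half
};
```

The owner touches only `bot_` in the common case, so thread blocks mostly work on private data and contend only when a deque is nearly empty.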
    32. Static Task List
      In
      T1
      T2
      T3
      T4
    33. Static Task List
      In
      T1
      TB 1
      T2
      TB 2
      T3
      TB 3
      T4
      TB 4
    34. Static Task List
      In
      Out
      T1
      TB 1
      T2
      TB 2
      T3
      TB 3
      T4
      TB 4
    35. Static Task List
      In
      Out
      T1
      T5
      TB 1
      T2
      TB 2
      T3
      TB 3
      T4
      TB 4
    36. Static Task List
      In
      Out
      T1
      T5
      TB 1
      T2
      T6
      TB 2
      T3
      TB 3
      T4
      TB 4
    37. Static Task List
      In
      Out
      T1
      T5
      TB 1
      T2
      T6
      TB 2
      T3
      T7
      TB 3
      T4
      TB 4
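One reading of the static task list slides is that each thread block processes the entry of the "In" list matching its own index, appends any new tasks to the "Out" list, and that "Out" becomes the next "In" once every block is done. The C++ sketch below follows that reading. Reserving output slots with fetch-and-add, the hypothetical task type (an integer `n` that splits into two halves until it reaches 1), and the sequential loop standing in for one round of thread blocks are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct PassResult {
  int leaves;   // tasks that produced no further work
  int passes;   // number of in -> out rounds executed
};

PassResult run_static_list(std::vector<int> in) {
  std::atomic<int> leaves{0};
  int passes = 0;
  while (!in.empty()) {
    // Worst case: every task splits into two new tasks.
    std::vector<int> out(2 * in.size());
    std::atomic<std::size_t> out_count{0};

    // Stand-in for one round: on a GPU each iteration would be a
    // separate thread block, with block i statically owning in[i].
    for (std::size_t block = 0; block < in.size(); ++block) {
      int n = in[block];
      if (n <= 1) {
        leaves.fetch_add(1);
        continue;
      }
      // Reserve two consecutive output slots without a lock.
      std::size_t slot = out_count.fetch_add(2);
      out[slot] = n / 2;
      out[slot + 1] = n - n / 2;
    }

    out.resize(out_count.load());
    in = std::move(out);   // the out list becomes the next in list
    ++passes;
  }
  return PassResult{leaves.load(), passes};
}
```

No block ever removes a task from a shared structure here; the only shared write is the counter that hands out positions in the out list.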
    38. Octree Partitioning
      Bandwidth bound
    39. Octree Partitioning
      Bandwidth bound
    40. Octree Partitioning
      Bandwidth bound
    41. Octree Partitioning
      Bandwidth bound
    42. Four-in-a-row
      Computation intensive
    43. Graphics Processors
      8800GT
      14 Multiprocessors
      57 GB/sec bandwidth
      9600GT
      8 Multiprocessors
      57 GB/sec bandwidth
    44. Blocking Queue – Octree/9600GT
    45. Blocking Queue – Octree/8800GT
    46. Blocking Queue – Four-in-a-row
    47. Non-blocking Queue – Octree/9600GT
    48. Non-blocking Queue – Octree/8800GT
    49. Non-blocking Queue – Four-in-a-row
    50. Task stealing – Octree/9600GT
    51. Task stealing – Octree/8800GT
    52. Task stealing – Four-in-a-row
    53. Static List
    54. Octree Comparison
    55. Previous work
      Korch M., Rauber T., A comparison of task pools for dynamic load balancing of irregular algorithms, Concurrency and Computation: Practice & Experience, 16, 2003
      Heirich A., Arvo J., A competitive analysis of load balancing strategies for parallel ray tracing, Journal of Supercomputing, 12, 1998
      Foley T., Sugerman J., KD-tree acceleration structures for a GPU raytracer, Graphics Hardware 2005
    56. Conclusion
      Synchronization plays a significant role in dynamic load-balancing
      Lock-free data structures and synchronization scale well and also look promising for general-purpose GPU programming
      Locks perform poorly
      It is good that atomic operations such as CAS (compare-and-swap) and FAA (fetch-and-add) have been introduced in the new GPUs
      Work stealing could outperform static load balancing
    57. Thank you!
      http://www.cs.chalmers.se/~dcs

    + daceddaced, 3 months ago


    © All Rights Reserved
