
Accelerator-Aware Task Synchronization for Real-Time Systems (ISORC14)

This work is motivated by the need to synchronize task executions
where tasks might use semaphores to protect their critical sections
and run over accelerators. In particular, the Priority Ceiling
Protocol is extended to manage priority inversion caused by
accelerator usage. By recognizing the difference between an
accelerator and a semaphore, higher-priority tasks are less likely to be
blocked by lower-priority tasks due to their requests for an
accelerator. In particular, blocking that will not contribute to any
deadlock and/or chained blocking is allowed in a managed way, with
the objective of maximizing the utilization of accelerators. A series
of experiments is then conducted to derive insights into task
synchronization when accelerators might be used.

  1. ACCELERATOR-AWARE TASK SYNCHRONIZATION FOR REAL-TIME SYSTEMS • Yu-Chen Wu, Che-Wei Chang, Tei-Wei Kuo, Chi-Sheng Shih • Department of Computer Science and Information Engineering, National Taiwan University, Taiwan • Department of Computer Science and Information Engineering, Chang Gung University, Taiwan
  2. INTRODUCTION National Taiwan University, Taiwan 2
  3. INTRODUCTION (1/2) • Heterogeneous computing is highly important to various system designs • Various accelerators: GPU, DSP, ASIC, etc. • 42 of the top 500 supercomputer systems are equipped with GPUs (November 2013) [1] • Growing popularity of FPGAs • IBM's Liquid Metal [2] and kova [3] projects • Altera and Xilinx support OpenCL • References: [1] http://www.top500.org [2] IBM Liquid Metal, 2013. http://researcher.watson.ibm.com/researcher/view_project.php?id=122 [3] H. P. Hofstee. The Big Deal about Big Data – a perspective from IBM Research, 2013. http://www.nas-conference.org/NAS-2013/conference%20speech/NAS%20XiAn%20Big%20Data.pdf
  4. INTRODUCTION (2/2) • The trend of heterogeneous computing unavoidably leads to complicated task synchronization problems • Deadlock issues • Performance issues • Real-time issues
  5. RELATED WORK
  6. RELATED WORK • Priority Ceiling Protocol (PCP) • Classic priority-ceiling-based protocol • Extensions • SRP • MPCP • MSRP • RWPCP • Little consideration is given to maximizing utilization! • k-Exclusion Real-Time Locking Protocol for Multi-GPU Systems • Reserve Inheritance with X server (RIX) protocol • GPU management • Resolves the priority inversion problem that arises when tasks receive services from the X server • No consideration of hard real-time performance • The k-Exclusion Real-Time Locking Protocol and the RIX protocol target different problems from ours.
  7. THE SYSTEM MODEL
  8. THE SYSTEM MODEL (1/2): HARDWARE PLATFORM • Target platforms • A single-core CPU • Multiple kinds of accelerators • Each kind of accelerator has multiple units • Non-preemptive allocation • Mutually exclusive allocation • Is the target platform realistic? • In some specific applications, an FPGA implementation outperforms a dual-core Intel Xeon processor at 2.6 GHz [1][2] • 500× performance • 30% of the energy consumption • The Freescale C29x crypto coprocessor • A single-core embedded processor • 2 (C292) or 3 (C293) security engines • The target platform is an extension. • References: [1] T. Todman, G. Constantinides, S. J. E. Wilton, O. Mencer, W. Luk, and P. Y. K. Cheung. Reconfigurable computing: architectures and design methods. IEE Proceedings - Computers and Digital Techniques, 152(2):193–207, Mar. 2005. [2] N. Telle, W. Luk, and R. Cheung. Customising hardware designs for elliptic curve cryptography. In A. Pimentel and S. Vassiliadis, editors, Computer Systems: Architectures, Modeling, and Simulation, volume 3133 of Lecture Notes in Computer Science, pages 274–283. Springer Berlin Heidelberg, 2004.
  9. THE SYSTEM MODEL (2/2): SOFTWARE • Tasks are statically assigned priorities • The priority of task Ti is Pi • Px-1 > Px > Px+1 • Tasks are scheduled by the rate-monotonic algorithm • Jobs on accelerators are issued by tasks on the CPU • They are called subtasks • The associated requests are called accelerator requests • Synchronous usage: a task suspends its CPU-side execution until its subtask finishes • An accelerator is locked once it is allocated and released after the synchronization • A request consists of the requested accelerator type and the number of units • e.g., two security engines
  10. MOTIVATIONS AND CHALLENGES
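The following minimal Python sketch shows this task model. It is only an illustration. The names `AcceleratorRequest`, `Task`, and `rate_monotonic_priorities` are hypothetical. It also assumes that a larger number means a higher priority, so P1 > P2 > P3 becomes 3 > 2 > 1.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AcceleratorRequest:
    """A request for `units` instances of one accelerator type, e.g., two security engines."""
    acc_type: str
    units: int


@dataclass
class Task:
    """A periodic task; here a larger `priority` value means a higher priority."""
    name: str
    priority: int
    period: int
    requests: list[AcceleratorRequest] = field(default_factory=list)


def rate_monotonic_priorities(tasks: list[Task]) -> None:
    """Assign fixed priorities in rate-monotonic order: shorter period, higher priority."""
    ordered = sorted(tasks, key=lambda t: t.period)
    for rank, task in enumerate(ordered):
        task.priority = len(ordered) - rank  # the shortest period gets the largest value


tasks = [
    Task("T1", 0, 10, [AcceleratorRequest("security_engine", 2)]),
    Task("T2", 0, 20),
    Task("T3", 0, 40),
]
rate_monotonic_priorities(tasks)
print([(t.name, t.priority) for t in tasks])  # [('T1', 3), ('T2', 2), ('T3', 1)]
```

Python's `sorted` is stable, so tasks with equal periods keep their input order. The slides do not say how such ties are broken.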
  11. MOTIVATIONS • A naive idea: apply the traditional protocols • Manage accelerators as semaphores • If we apply protocols that adopt the concept of priority ceilings • The utilization of accelerators might be unacceptably low • This situation never arises in systems without accelerators • All subtask executions might be serialized!
  12. AN EXAMPLE (A PCP SCHEDULE) • Should the system block T1's request for accelerator A1 while T2 is using A2? • Will the request of T1 cause a deadlock? • Is it common for a task to hold a GPU and then request an FPGA? • If this behavior never occurs, deadlocks never occur among accelerator requests. [Figure: PCP schedule of tasks T1, T2, T3 using accelerators A1, A2 and semaphores S1, S2]
  13. A MORE REASONABLE SCHEDULE [Figure: a more reasonable schedule of tasks T1, T2, T3 compared with the PCP schedule, using accelerators A1, A2 and semaphores S1, S2, with events "lock A2", "lock A1", "free A2", "free A1 and lock A2"]
  14. CHALLENGES • How to improve the utilization of accelerators? • How to guarantee • Deadlock freedom • Bounded blocking duration
  15. OUR APPROACH
  16. IDEAS • Should we manage accelerators as normal resources? • In many applications, there is no deadlock among accelerator requests • Do we still need strict policies such as the PCP? • If we relax the rules for managing accelerators… • Could we achieve higher utilization? • Could more jobs meet their deadlines?
  17. THE ASSUMPTIONS • A task never holds one type of accelerator and then requests another type of accelerator (because of synchronous access). • All critical sections are properly nested. • An accelerator request can be nested in a critical section. • A critical section can never be initiated by a subtask on an accelerator. • When a job terminates (completes or misses its deadline), it must release all of the resources it obtained.
  18. THE CONCEPT OF ACCELERATOR-AWARE SYNCHRONIZATION PROTOCOL (ASP) • Basic idea • Managing accelerators as semaphores might cause some unnecessary blockings • These unnecessary blockings are caused by overly strict policies • The problem is therefore: "which blockings are necessary, and which should be considered unnecessary?" • We solve the problem through joint management • Strict policies for resources that might incur a deadlock • Semaphore allocation • Relaxed policies for resources that never incur a deadlock • Accelerator allocation
  19. NECESSARY BLOCKINGS AND UNNECESSARY BLOCKINGS • Necessary blocking • If the blocking is removed, the request might cause • A deadlock • Unacceptable priority inversions • Unnecessary blocking • Even if the blocking is removed • The system is still deadlock-free • The resulting priority inversions remain reasonable/acceptable [Figure: PCP schedule of tasks T1, T2, T3 using accelerators A1, A2 and semaphores S1, S2, annotated "lock A2", "lock A1", "free A2", "free A1 and lock A2"]
  20. PRIORITY BARS • Definition: • Each accelerator type owns a priority bar • It equals the highest priority among the tasks that might request accelerators of the type and whose needs cannot be satisfied by the currently available accelerators of the type • The bars are defined to control the number of priority inversions • Notation: Ψ(A1), the priority bar of the accelerator type A1
  21. AN EXAMPLE OF PRIORITY BARS • Consider an accelerator type A with 3 instances. • Three tasks T1, T2, and T3 request 1, 2, and 3 instances of A, respectively, where P1 > P2 > P3. • In the initial state, Ψ(A) is set to the lowest priority. • Case 1: If T2 is currently using two instances, only T3 might be blocked in the future, because the one remaining instance still satisfies T1. Thus, Ψ(A) is P3. • Case 2: If T3 is using all 3 instances, both T1 and T2 might be blocked. Thus, Ψ(A) is P1.
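The sketch below reproduces the example. It relies on three assumptions. First, as Cases 1 and 2 imply, tasks that currently hold instances of the type are left out when the bar is computed. Second, a hypothetical sentinel `LOWEST = 0` stands for "the lowest priority". Third, priorities are integers where larger means higher (P1 = 3, P2 = 2, P3 = 1). The name `priority_bar` is illustrative.

```python
LOWEST = 0  # hypothetical sentinel: below every task priority (task priorities are >= 1)


def priority_bar(total_units, demands, holdings):
    """Priority bar of one accelerator type (sketch).

    total_units -- number of instances of the type
    demands     -- {priority: instances requested} for tasks that may use the type
    holdings    -- {priority: instances held} for tasks currently holding the type
    """
    available = total_units - sum(holdings.values())
    # Tasks that are not holding the type and whose need exceeds the free instances.
    might_block = [p for p, need in demands.items()
                   if p not in holdings and need > available]
    return max(might_block, default=LOWEST)


demands = {3: 1, 2: 2, 1: 3}  # T1, T2, T3 request 1, 2, 3 instances
print(priority_bar(3, demands, {}))      # 0: initial state, the lowest priority
print(priority_bar(3, demands, {2: 2}))  # 1: Case 1, Psi(A) = P3
print(priority_bar(3, demands, {1: 3}))  # 3: Case 2, Psi(A) = P1
```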
  22. THE RULES FOR ACCELERATORS • An accelerator request of task Ti is granted if all three conditions below are satisfied; otherwise, the request is blocked: 1. The number of available instances of the requested accelerator type is no less than the number of requested instances. 2. Pi is no less than the priority bars of all accelerator types currently used by other tasks. 3. Pi is no less than the priority of every task that has been blocked due to its accelerator request; or, if the request is granted, the priority bar of the requested accelerator type will be lower than the priority of every such blocked task.
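The three conditions can be read as a single admission check. This is a hypothetical encoding, not the authors' implementation, and it makes these assumptions:

- Priorities are integers, with larger meaning higher.
- Per-type state is kept in plain dictionaries keyed by task priority.
- `LOWEST = 0` stands for the lowest priority.
- The priority bar is computed as in the example on slide 21, leaving out tasks that already hold the type.

The "if the request is granted" clause of rule 3 is evaluated by recomputing the bar as if the units had been handed out.

```python
LOWEST = 0  # hypothetical sentinel: below every task priority (task priorities are >= 1)


def priority_bar(total, demands, holdings):
    """Highest priority among non-holding tasks whose need exceeds the free instances."""
    available = total - sum(holdings.values())
    waiting = [p for p, need in demands.items() if p not in holdings and need > available]
    return max(waiting, default=LOWEST)


def grant_accelerator(pi, acc_type, units, total, demands, holdings, blocked):
    """Return True if a request by the task with priority `pi` passes all three rules.

    total[t]    -- number of instances of accelerator type t
    demands[t]  -- {priority: instances that task requests} for tasks that may use t
    holdings[t] -- {priority: instances currently held} for type t
    blocked     -- priorities of tasks blocked due to their accelerator requests
    """
    held = holdings.get(acc_type, {})
    # Rule 1: enough free instances of the requested type.
    if total[acc_type] - sum(held.values()) < units:
        return False
    # Rule 2: pi is no less than the bar of every type currently used by other tasks.
    for t, h in holdings.items():
        used_by_others = any(p != pi for p in h)
        if used_by_others and pi < priority_bar(total[t], demands[t], h):
            return False
    # Rule 3, first alternative: pi is no less than every blocked task's priority.
    if all(pi >= b for b in blocked):
        return True
    # Rule 3, second alternative: after granting, the bar stays below every blocked task.
    after = dict(held)
    after[pi] = after.get(pi, 0) + units
    bar_after = priority_bar(total[acc_type], demands[acc_type], after)
    return all(bar_after < b for b in blocked)


total = {"A": 3, "B": 1}
demands = {"A": {3: 1, 2: 2, 1: 3}, "B": {3: 1, 1: 1}}
# T3 (priority 1) holds the only B, so Psi(B) = 3 shields T1 from a chained blocking.
holdings = {"A": {}, "B": {1: 1}}
print(grant_accelerator(2, "A", 2, total, demands, holdings, set()))  # False (rule 2)
print(grant_accelerator(3, "A", 1, total, demands, holdings, set()))  # True
```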
  23. THE RULES FOR SEMAPHORES • A request of task Ti to lock semaphore Sj is granted by the ASP if and only if both of the following conditions are met: 1. Pi is higher than the priority ceilings of all semaphores currently locked by other tasks (as in the PCP). 2. Pi is higher than the priorities of all tasks running on accelerators, or π(Sj) is lower than the priorities of all tasks currently in their accelerator subtask parts. • If a task is blocked, the standard priority inheritance protocol (PIP) is adopted.
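The following sketches the semaphore check under the same convention (integer priorities, larger = higher). It assumes that π(Sj) denotes the priority ceiling of Sj, that is, the highest priority of the tasks that may lock it. `grant_semaphore` and its parameters are hypothetical names, and the sketch does not model priority inheritance on blocking.

```python
def grant_semaphore(pi, sem, ceiling, locked_by, on_accelerator):
    """ASP semaphore rule (sketch); larger number = higher priority.

    ceiling[s]     -- priority ceiling of semaphore s (assumed meaning of pi(S_j))
    locked_by[s]   -- priority of the task currently holding s
    on_accelerator -- priorities of tasks whose subtasks are running on accelerators
    """
    # Condition 1 (PCP): pi exceeds the ceilings of all semaphores locked by other tasks.
    if any(pi <= ceiling[s] for s, owner in locked_by.items() if owner != pi):
        return False
    # Condition 2: pi exceeds every task on an accelerator,
    # or the ceiling of the requested semaphore is below all of them.
    return (all(pi > q for q in on_accelerator)
            or all(ceiling[sem] < q for q in on_accelerator))


ceiling = {"S1": 3, "S2": 2}  # hypothetical ceilings
print(grant_semaphore(2, "S2", ceiling, {}, {3}))  # True: ceiling of S2 is below 3
print(grant_semaphore(2, "S1", ceiling, {}, {3}))  # False
```

With this check, a higher-priority task that returns from its accelerator subtask cannot be blocked by a semaphore it may later need.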
  24. PROPERTIES • The ASP guarantees deadlock freedom • A task suffers at most one accelerator blocking • A task suffers at most one semaphore blocking • Any blocking chain in the system consists of at most two blockings [Figure: a blocking chain consisting of two blockings]
  25. TRADEOFF BETWEEN TASK BLOCKING AND ACCELERATOR UTILIZATION • Can we further improve the utilization? Consider… • Does every task have a strict timing constraint? • If tasks can tolerate more blocking, the system utilization might be further improved • Idea: classify tasks into two kinds • Critical tasks • Tasks associated with hard deadlines • The case discussed so far • Ordinary tasks • Tasks without strict timing constraints • No deadline • Soft real-time • Priorities of critical tasks are always higher than priorities of ordinary tasks
  26. CRITICAL ACCELERATORS • Critical accelerators • Concept: accelerators that might incur direct or indirect blockings • These accelerators must be managed carefully
  27. AN EXTENDED ASP • For any task requesting a semaphore, the rules are the same as in the original ASP. • For any task requesting a critical accelerator, the rules are the same as in the original ASP. • For any task requesting accelerators that are not critical accelerators • The request is granted if the number of available instances of the requested accelerator type is enough to satisfy it (i.e., available resources ≥ needs).
  28. THE EXPERIMENTS
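The extended protocol only changes how non-critical accelerator types are admitted. In this hypothetical sketch, the original ASP accelerator check is passed in as a callable (`original_asp_rule`) because it depends on state not shown here. Semaphore requests would keep using the original ASP rules unchanged.

```python
from typing import Callable


def grant_extended(acc_type: str, units: int, available: dict[str, int],
                   critical: set[str], original_asp_rule: Callable[[], bool]) -> bool:
    """Extended ASP accelerator rule (sketch).

    Critical accelerator types go through the original ASP rules; any other type
    is granted whenever enough instances are free (available >= needs).
    """
    if acc_type in critical:
        return original_asp_rule()
    return available[acc_type] >= units


available = {"GPU": 1, "DSP": 2}
print(grant_extended("DSP", 2, available, {"GPU"}, lambda: False))  # True: non-critical
print(grant_extended("GPU", 1, available, {"GPU"}, lambda: False))  # False: ASP rules deny
```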
  29. THE PARAMETERS • #tasks: 6 critical tasks and 24 ordinary tasks • #types of accelerators: 10 • #units of each accelerator type: 1~3 (uniform random) • #binary semaphores: 10
  30. THE PARAMETERS OF TASKS • #accelerator requests: 1~4 (uniform random) • #semaphore requests: 0~4 (uniform random) • Relative deadline: 5000~1000000 cycles (uniform random) • Period: equal to the relative deadline • We simulate the performance with synthesized workloads based on • A Generic Avionics Platform (GAP): a real system [15] • The Parboil benchmark: a benchmark for GPGPU computing [18] • References: [15] C. Locke, D. Vogel, and T. Mesler. Building a predictable avionics platform in Ada: a case study. In Proceedings of the Twelfth Real-Time Systems Symposium, pages 181–189, 1991. [18] J. A. Stratton, C. Rodrigues, I. J. Sung, N. Obeid, L. W. Chang, N. Anssari, G. D. Liu, and W. W. Hwu. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing, 2012.
  31. THE SIMULATION • Goal: observe the following performance metrics under various total workloads of the accelerators and the CPU • Deadline satisfaction ratio: $\text{deadline satisfaction ratio} = \frac{\text{number of jobs that meet their deadlines}}{\text{number of jobs}}$ • Utilizations • For each task, the following settings are generated randomly from uniform distributions: • $\frac{\text{total execution time}}{\text{relative deadline}} \in [10\%, 40\%]$ • $\frac{\text{total accelerator execution time}}{\text{total execution time}} \in [80\%, 95\%]$ • These settings are based on the Parboil benchmark • Platform: Intel Core i5-760, AMD Radeon HD7700, and 8 GB memory • The expected total accelerator utilization of the task set ranged from 30% to 100%. [Figure: period/relative deadline, total execution time (40%), total accelerator execution time (80%)]
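Below is one way to draw tasks from these uniform ranges and compute the deadline satisfaction ratio. It is an illustration built on assumptions. The `SyntheticTask` fields and the `synth_task` helper are hypothetical, and the period equals the relative deadline as listed for the task parameters. The sketch does not scale the task set to a target total accelerator utilization (30% to 100%).

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticTask:
    relative_deadline: int  # cycles
    period: int             # equal to the relative deadline
    exec_time: float        # total execution time (CPU + accelerator), cycles
    acc_exec_time: float    # part of exec_time spent on accelerators, cycles
    n_acc_requests: int
    n_sem_requests: int


def synth_task(rng: random.Random) -> SyntheticTask:
    """Draw one task from the uniform ranges of the simulation setup."""
    deadline = rng.randint(5_000, 1_000_000)
    exec_time = deadline * rng.uniform(0.10, 0.40)  # execution time / relative deadline
    acc_exec = exec_time * rng.uniform(0.80, 0.95)  # accelerator time / execution time
    return SyntheticTask(
        relative_deadline=deadline,
        period=deadline,
        exec_time=exec_time,
        acc_exec_time=acc_exec,
        n_acc_requests=rng.randint(1, 4),
        n_sem_requests=rng.randint(0, 4),
    )


def deadline_satisfaction_ratio(met: int, total: int) -> float:
    """Number of jobs that meet their deadlines divided by the number of jobs."""
    if total == 0:
        raise ValueError("no jobs were released")
    return met / total


rng = random.Random(2014)
task_set = [synth_task(rng) for _ in range(30)]  # e.g., 6 critical + 24 ordinary tasks
print(deadline_satisfaction_ratio(45, 50))  # 0.9
```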
  32. THE RESULT OF SIMULATION: DEADLINE SATISFACTION RATIOS • PCP here denotes a multiple-instance PCP • X-SYSTEM: the deadline satisfaction ratio of all tasks under protocol X • X-CT: the deadline satisfaction ratio of critical tasks under protocol X
  33. THE RESULT OF SIMULATION: UTILIZATION • CPU: the utilization of the CPU • ACC: the utilization of the accelerators
  34. CONCLUSIONS • The experimental results show that the ASP can • Provide similar or better control of priority inversions • Substantially improve system utilization • Managing different types of resources with different policies can improve system utilization by avoiding unnecessary blockings • The concept could be applied to other protocols that adopt the concept of priority ceilings
  35. Q & A
  36. THANK YOU
  37. THE PRIORITY INHERITANCE RULE • A task might be blocked by multiple tasks • Goal: let lower-priority tasks release occupied accelerators as soon as possible • When task Ti is blocked, a task inherits Pi if both of the following conditions are satisfied • A subtask of the task has finished, so the task is ready to synchronize • The task has the highest priority among the tasks ready to synchronize
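The following sketches how the inheriting task is chosen. It assumes that the candidates passed in are the lower-priority tasks currently blocking Ti whose subtasks have finished, so they are ready to synchronize. `pick_inheritor` is a hypothetical name, and priorities are integers with larger meaning higher.

```python
from typing import Optional


def pick_inheritor(blocked_priority: int,
                   ready_to_sync: dict[str, int]) -> Optional[tuple[str, int]]:
    """Return (task name, inherited priority), or None if no candidate is ready.

    ready_to_sync -- {task name: priority} of blocking tasks whose subtasks have
                     finished and that are waiting to synchronize
    """
    if not ready_to_sync:
        return None
    # Only the highest-priority task that is ready to sync inherits Pi.
    name = max(ready_to_sync, key=ready_to_sync.get)
    return name, max(ready_to_sync[name], blocked_priority)


print(pick_inheritor(5, {"T3": 2, "T4": 1}))  # ('T3', 5)
```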
  38. SCHEDULABILITY TEST • For N tasks {T1, T2, ..., TN} with priorities P1 > P2 > ... > PN, each task Ti is schedulable if the following condition is satisfied: • Li,j: the length of the jth critical section of task Ti • Ci,j: the execution time of the jth subtask of task Ti
  39. THE WORST CASE • The priority bar of the accelerator used in a chained blocking is lower • The bar cannot protect higher-priority tasks • A high-priority task is blocked because of another accelerator
  40. ABOUT THE IMPLEMENTATION • Rule 3 for accelerator allocation • Two variables maintain the range of priorities • The range spans from the highest to the lowest priority of the tasks that have been blocked due to accelerator requests • The request is granted if one of the following is satisfied • Pi ≥ the highest priority • The pre-calculated priority ceiling < the lowest priority
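The two-variable check can be sketched as a small class. `BlockedRange` is a hypothetical name. It keeps the set of blocked priorities and recomputes the highest and lowest values whenever a task blocks or unblocks, so each rule-3 check costs only two comparisons. The parameter `bar_after_grant` stands for the pre-calculated value that the slide compares with the lowest blocked priority. Priorities are integers with larger meaning higher.

```python
class BlockedRange:
    """Highest/lowest priorities of tasks blocked due to accelerator requests (sketch)."""

    def __init__(self):
        self._blocked = set()
        self.highest = None  # the two variables consulted by rule 3
        self.lowest = None

    def _refresh(self):
        self.highest = max(self._blocked, default=None)
        self.lowest = min(self._blocked, default=None)

    def block(self, priority):
        self._blocked.add(priority)
        self._refresh()

    def unblock(self, priority):
        self._blocked.discard(priority)
        self._refresh()

    def rule3_ok(self, pi, bar_after_grant):
        """Grant if pi >= highest blocked priority, or bar_after_grant < lowest one."""
        if self.highest is None:  # no task is blocked, so rule 3 holds trivially
            return True
        return pi >= self.highest or bar_after_grant < self.lowest


r = BlockedRange()
r.block(4)
r.block(2)
print(r.rule3_ok(3, 1), r.rule3_ok(3, 2))  # True False
```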
  41. OTHER ASSUMPTIONS (MISC) • A request gets either all of the necessary units or nothing • All of the obtained units are released together after use • e.g., get 2 units and then release 2 units
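These two assumptions amount to all-or-nothing acquisition and all-at-once release. Below is a minimal single-type pool with hypothetical names (`AcceleratorPool`, `try_acquire`, `release`). It is a sketch, not a description of the authors' simulator.

```python
class AcceleratorPool:
    """All-or-nothing allocation of identical units of one accelerator type (sketch)."""

    def __init__(self, units: int):
        self.free = units
        self.held: dict[str, int] = {}

    def try_acquire(self, task: str, units: int) -> bool:
        # A request gets all of the units it asks for, or none of them.
        if units > self.free:
            return False
        self.free -= units
        self.held[task] = self.held.get(task, 0) + units
        return True

    def release(self, task: str) -> int:
        # Every unit the task obtained is released together after use.
        units = self.held.pop(task, 0)
        self.free += units
        return units


pool = AcceleratorPool(3)
print(pool.try_acquire("T2", 2), pool.try_acquire("T3", 2))  # True False
print(pool.release("T2"), pool.free)  # 2 3
```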
  42. CRITICAL SEMAPHORES AND CRITICAL ACCELERATORS • Critical semaphores • The semaphores that might be requested by any critical task • Critical accelerators • All of the accelerators that might be requested by any critical task or requested within critical sections protected by critical semaphores
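Both sets can be computed offline from per-task usage information. The sketch assumes that each task is described by a hypothetical `TaskUsage` record. The record lists the semaphores the task may lock, the accelerator types it may request, and the accelerator types it requests inside each semaphore's critical section. `critical_resources` is likewise an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class TaskUsage:
    critical: bool
    semaphores: set[str] = field(default_factory=set)    # semaphores the task may lock
    accelerators: set[str] = field(default_factory=set)  # accelerator types it may request
    # accelerator types requested inside the critical section guarded by each semaphore
    nested: dict[str, set[str]] = field(default_factory=dict)


def critical_resources(tasks: list[TaskUsage]) -> tuple[set[str], set[str]]:
    """Return (critical semaphores, critical accelerator types)."""
    crit_sems: set[str] = set()
    for t in tasks:
        if t.critical:
            crit_sems |= t.semaphores
    crit_accs: set[str] = set()
    for t in tasks:
        if t.critical:
            crit_accs |= t.accelerators
        # Accelerators requested inside critical sections of critical semaphores,
        # by any task (critical or ordinary).
        for sem, accs in t.nested.items():
            if sem in crit_sems:
                crit_accs |= accs
    return crit_sems, crit_accs


tasks = [
    TaskUsage(True, {"S1"}, {"GPU"}),
    TaskUsage(False, {"S1", "S2"}, {"DSP", "FPGA", "ASIC"}, {"S1": {"DSP"}, "S2": {"FPGA"}}),
]
print(critical_resources(tasks))  # ({'S1'}, {'GPU', 'DSP'}) up to set ordering
```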
  43. THE SIMULATION 2 • Goal: observe the following performance metrics under various degrees of accelerator contention • Deadline satisfaction ratios • Utilizations • The total CPU utilization of the task set is fixed • We vary the ratio of the accelerator execution time to the total execution time of each task from 10% to 90% • This synthesizes different degrees of accelerator contention
  44. THE RESULT OF SIMULATION 2: DEADLINE SATISFACTION RATIOS
  45. THE RESULT OF SIMULATION 2: UTILIZATION