Accelerator-Aware Task Synchronization for Real-Time Systems (ISORC14)

This work is motivated by the need to synchronize the execution of tasks
that may use semaphores to protect their critical sections and also run
jobs on accelerators. In particular, the Priority Ceiling Protocol is
extended to manage the priority inversion caused by accelerator usage.
By recognizing the difference between an accelerator and a semaphore,
higher-priority tasks are less likely to be blocked by lower-priority
tasks' requests for an accelerator. In particular, blocking that cannot
contribute to any deadlock or chained blocking is allowed in a managed
way, with the objective of maximizing the utilization of accelerators.
A series of experiments is then conducted to derive insights into task
synchronization when accelerators may be used.


Transcript

  • 1. ACCELERATOR-AWARE TASK SYNCHRONIZATION FOR REAL-TIME SYSTEMS Yu-Chen Wu, Che-Wei Chang, Tei-Wei Kuo, Chi-Sheng Shih Department of Computer Science and Information Engineering, National Taiwan University, Taiwan Department of Computer Science and Information Engineering, Chang Gung University, Taiwan
  • 2. INTRODUCTION National Taiwan University, Taiwan 2
  • 3. INTRODUCTION(1/2) • Heterogeneous computing is highly important to various system designs • Various accelerator supports: GPU, DSP, ASIC, etc. • 42 of the top 500 supercomputer systems are equipped with GPUs (November 2013) [1] • Growing popularity of FPGAs • Projects Liquid Metal [2] and kova [3] of IBM • Altera and Xilinx support OpenCL National Taiwan University, Taiwan 3 [1] http://www.top500.org [2] IBM Liquid Metal, 2013. http://researcher.watson.ibm.com/researcher/view_project.php?id=122. [3] H. P. Hofstee. The Big Deal about Big Data – a perspective from IBM Research, 2013. http://www.nas-conference.org/NAS-2013/conference%20speech/NAS%20XiAn%20Big%20Data.pdf
  • 4. INTRODUCTION(2/2) • The trend of heterogeneous computing unavoidably leads to complicated task synchronization problems • Deadlock issues • Performance issues • Real-time issues National Taiwan University, Taiwan 4
  • 5. RELATED WORK National Taiwan University, Taiwan 5
  • 6. RELATED WORK • Priority Ceiling Protocol (PCP) • Classic priority-ceiling-based protocol • Extensions • SRP • MPCP • MSRP • RWPCP • Little consideration of maximizing utilization! • k-Exclusion Real-Time Locking Protocol for Multi-GPU Systems • Reserve Inheritance with X server (RIX) protocol • GPU management • Resolves the priority inversion problem when receiving services from the X server • No consideration of hard real-time performance • The targets of the k-Exclusion Real-Time Locking Protocol and the RIX protocol are different from ours. National Taiwan University, Taiwan 6
  • 7. THE SYSTEM MODEL National Taiwan University, Taiwan 7
  • 8. THE SYSTEM MODEL(1/2) HARDWARE PLATFORM • Target platforms • A single-core CPU • Multiple kinds of accelerators • Each kind of accelerator has multiple units • Non-preemptable allocation • Mutually exclusive allocation • Is the target platform realistic? • In some specific applications, an FPGA implementation outperforms a dual-core Intel Xeon processor at 2.6 GHz [1][2] • 500X performance • 30% energy consumption • The Freescale C29x crypto coprocessor • A single-core embedded processor • 2/3 (C292/C293) security engines • The target platform is an extension. 8 [1] T. Todman, G. Constantinides, S. J. E. Wilton, O. Mencer, W. Luk, and P. Y. K. Cheung. Reconfigurable computing: architectures and design methods. Computers and Digital Techniques, IEE Proceedings, 152(2):193–207, Mar 2005. [2] N. Telle, W. Luk, and R. Cheung. Customising hardware designs for elliptic curve cryptography. In A. Pimentel and S. Vassiliadis, editors, Computer Systems: Architectures, Modeling, and Simulation, volume 3133 of Lecture Notes in Computer Science, pages 274–283. Springer Berlin Heidelberg, 2004. National Taiwan University, Taiwan
  • 9. THE SYSTEM MODEL(2/2) SOFTWARE • Tasks are statically assigned priorities • The priority of the task Ti is Pi • Px-1 > Px > Px+1 • They are scheduled by the Rate-Monotonic algorithm • Jobs on accelerators are issued by tasks on the CPU • They are called subtasks • The associated requests are called accelerator requests • Synchronous usage: a task suspends its CPU-side execution until its subtask is finished • An accelerator is locked once it is allocated, and it is released after a synchronization. • A request consists of the requested type and number. • e.g., two security engines 9 National Taiwan University, Taiwan
  • 10. MOTIVATIONS AND CHALLENGES National Taiwan University, Taiwan 10
  • 11. MOTIVATIONS • A naive idea: applying the traditional protocols • Managing accelerators as semaphores • If we apply protocols by adopting the concept of priority ceiling • The utilization of accelerators might be unacceptably low. • This situation never happens in systems without accelerators • All subtask executions might be serialized! National Taiwan University, Taiwan 11
  • 12. AN EXAMPLE (A PCP SCHEDULE) • Should the system block T1 in requesting the accelerator A1 when T2 is using A2? • Will the request of T1 cause a deadlock? • Is it common that a task holds a GPU and then requests some FPGA? • Deadlocks never occur among accelerator requests if such behavior never occurs. National Taiwan University, Taiwan 12 [Gantt chart of the PCP schedule of T1, T2, T3 over accelerators A1, A2 and semaphores S1, S2]
  • 13. A MORE REASONABLE SCHEDULE National Taiwan University, Taiwan 13 [Gantt charts of T1, T2, T3 comparing the PCP schedule with a more reasonable one (lock A2, lock A1, free A2, free A1 and lock A2)]
  • 14. CHALLENGES • How to improve the utilization of accelerators? • How to guarantee • Deadlock freedom • Bounded blocking duration National Taiwan University, Taiwan 14
  • 15. OUR APPROACH National Taiwan University, Taiwan 15
  • 16. IDEAS • Should we manage accelerators as normal resources? • In many applications, there is no deadlock among accelerator requests • Do we still need strict policies such as the PCP? • If we relax the rules in managing accelerators… • Could we have a higher utilization? • Could more jobs meet their deadlines? 16 National Taiwan University, Taiwan
  • 17. THE ASSUMPTIONS • A task never holds one type of accelerator and then requests another type of accelerator (because of synchronous access). • All critical sections are properly nested. • An accelerator request can be nested in a critical section. • A critical section can never be initiated by a subtask on an accelerator. • When a job terminates (completes or misses its deadline), it must release all of the resources it has obtained. National Taiwan University, Taiwan 17
  • 18. THE CONCEPT OF THE ACCELERATOR-AWARE SYNCHRONIZATION PROTOCOL (ASP) • Basic idea • Managing accelerators as semaphores might cause some unnecessary blockings. • These unnecessary blockings are caused by a strict policy • Now our problem is “what kind of blocking is necessary and what should be considered unnecessary” • We solve the problem by joint management • Strict policies for resources that might incur a deadlock • Semaphore allocation • Relaxed policies for resources that never incur a deadlock • Accelerator allocation National Taiwan University, Taiwan 18
  • 19. NECESSARY BLOCKINGS AND UNNECESSARY BLOCKINGS • Necessary blocking • If the blocking is removed, then the request might cause • A deadlock • Some unacceptable priority inversions • Unnecessary blocking • Even if the blocking is removed, • the system is still deadlock-free • We have reasonable/acceptable priority inversions National Taiwan University, Taiwan 19 [Gantt chart of the PCP schedule of T1, T2, T3 (lock A2, lock A1, free A2, free A1 and lock A2)]
  • 20. PRIORITY BARS • Definition: • Each accelerator type owns a priority bar • It equals the highest priority of the tasks which might request accelerators of the type while the currently available accelerators of the type cannot satisfy the needs of each of those tasks. • The bars are defined so as to control the number of priority inversions • Notation: Ψ(A1), the priority bar of the accelerator type A1 National Taiwan University, Taiwan 20
  • 21. AN EXAMPLE OF PRIORITY BARS • Consider an accelerator type A which has 3 instances. • Three tasks T1, T2, T3 will request 1, 2, and 3 instances of A, respectively, where P1 > P2 > P3. • At the initial state, Ψ(A) should be assigned the lowest priority. • Case 1: If T2 is using two instances now, only T3 might be blocked in the future. Thus, Ψ(A) is P3 now. • Case 2: If T3 is using all of the 3 instances, Ψ(A) is P1 now. National Taiwan University, Taiwan 21
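The two cases above can be sketched in a few lines; this is our own illustration, not code from the paper. The `priority_bar` helper, the task tuples, and the numeric encoding (a larger number means a higher priority) are all assumptions made for the sketch:

```python
def priority_bar(tasks, available, lowest_priority=0):
    """Sketch of Psi(A): the highest priority among tasks that might
    still request this accelerator type but whose demand exceeds the
    currently available instances (larger number = higher priority)."""
    unsatisfiable = [prio for prio, need, holding in tasks
                     if not holding and need > available]
    return max(unsatisfiable, default=lowest_priority)

# Three tasks T1, T2, T3 request 1, 2, 3 instances of A (3 instances total).
P1, P2, P3 = 3, 2, 1
# Case 1: T2 holds two instances, one is free -> only T3 is unsatisfiable.
case1 = priority_bar([(P1, 1, False), (P2, 2, True), (P3, 3, False)], available=1)
# Case 2: T3 holds all three instances -> both T1 and T2 are unsatisfiable.
case2 = priority_bar([(P1, 1, False), (P2, 2, False), (P3, 3, True)], available=0)
```

With these inputs, `case1` evaluates to P3 and `case2` to P1, matching the two cases on the slide.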
  • 22. THE RULES FOR ACCELERATORS • For any accelerator request of a task Ti, the request will be granted if all of the three conditions shown below are satisfied; otherwise, the request should be blocked: 1. The number of the available instances of the requested accelerator type is no less than the number of the requested instances. 2. Pi is no less than the priority bars of all accelerator types which are currently used by other tasks. 3. Pi is no less than any of the priorities of the tasks which have been blocked due to their accelerator requests. Or, if the request is granted, then the priority bar of the requested accelerator type is less than any of the priorities of the tasks which have been blocked, due to their accelerator requests. National Taiwan University, Taiwan 22
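The three conditions can be read as a single grant predicate. The sketch below is a hypothetical rendering of the rule (function name, parameter shapes, and the convention that a larger number means a higher priority are our assumptions):

```python
def may_grant_accelerator(prio, requested, available,
                          bars_in_use, blocked_prios, bar_after_grant):
    """Sketch of the ASP accelerator rule: grant only if all three
    conditions hold (larger number = higher priority)."""
    # Condition 1: enough available instances of the requested type.
    if requested > available:
        return False
    # Condition 2: Pi is no less than the priority bar of every
    # accelerator type currently used by other tasks.
    if any(prio < bar for bar in bars_in_use):
        return False
    # Condition 3: Pi is no less than every blocked task's priority,
    # or granting keeps the type's priority bar below all of them.
    if blocked_prios:
        not_jumping = all(prio >= p for p in blocked_prios)
        bar_stays_low = all(bar_after_grant < p for p in blocked_prios)
        if not (not_jumping or bar_stays_low):
            return False
    return True
```

For example, a request is refused when another type's bar exceeds the requester's priority (condition 2), but a low-priority request can still be granted past blocked tasks as long as the resulting bar stays below their priorities (second half of condition 3).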
  • 23. THE RULES FOR SEMAPHORES • For any task Ti to lock a semaphore Sj, it will be granted by the ASP if and only if both of the following two conditions are met: 1. Pi is larger than all of the priority ceilings of the semaphores which are currently locked by other tasks. (PCP) 2. Pi is larger than all of the priorities of the tasks which are running on accelerators, or π(Sj) is lower than all of the priorities of the tasks which are currently at their accelerator subtask parts. • If a task is blocked, we adopt the standard priority inheritance protocol (PIP) National Taiwan University, Taiwan 23
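Analogously, the semaphore rule can be sketched as a predicate. Again this is a hypothetical helper, not the paper's code; π(Sj) is passed in as `ceiling_sj`, and a larger number means a higher priority:

```python
def may_lock_semaphore(prio, ceilings_locked_by_others,
                       accel_task_prios, ceiling_sj):
    """Sketch of the ASP semaphore rule (larger number = higher priority)."""
    # Condition 1 (PCP): Pi exceeds the ceilings of all semaphores
    # currently locked by other tasks.
    if any(prio <= c for c in ceilings_locked_by_others):
        return False
    # Condition 2: Pi exceeds the priorities of all tasks currently at
    # their accelerator subtask parts, or pi(Sj) is below all of them.
    if accel_task_prios:
        above_all = all(prio > p for p in accel_task_prios)
        ceiling_low = all(ceiling_sj < p for p in accel_task_prios)
        if not (above_all or ceiling_low):
            return False
    return True
```

Condition 2 is what makes the rule accelerator-aware: a lock whose ceiling is below every task currently on an accelerator cannot block those tasks when they synchronize, so it may proceed.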
  • 24. PROPERTIES • The ASP guarantees being deadlock-free • A task has at most one accelerator blocking. • A task has at most one semaphore blocking. • Any blocking chain of a system consists of at most two blockings. National Taiwan University, Taiwan 24 blocking blocking
  • 25. TRADEOFF BETWEEN TASK BLOCKING AND ACCELERATOR UTILIZATION • Can we further improve the utilization? Think about… • Does each task have a strict time constraint? • If each task can accept more blocking, then the system utilization might be further improved • Idea: classify tasks into two kinds • Critical tasks • Tasks which are associated with hard deadlines. • We discussed them previously. • Ordinary tasks • Tasks that don’t have strict time constraints. • Have no deadline • Soft real-time • Priorities of critical tasks are always higher than priorities of ordinary tasks National Taiwan University, Taiwan 25
  • 26. CRITICAL ACCELERATORS • Critical accelerators • Concept: accelerators that might incur direct/indirect blockings • These accelerators must be managed carefully National Taiwan University, Taiwan 26
  • 27. AN EXTENDED ASP • For any task to request a semaphore, the rule is the same as in the original ASP. • For any task to request a critical accelerator, the rule is the same as in the original ASP • For any task to request accelerators which are not critical accelerators • A request is granted if the number of the available instances of the requested accelerator type is enough to satisfy the request (i.e., available resources ≥ needs). National Taiwan University, Taiwan 27
  • 28. THE EXPERIMENTS National Taiwan University, Taiwan 28
  • 29. THE PARAMETERS • #critical tasks: 6 • #ordinary tasks: 24 • #types of accelerators: 10 • #units of each type of accelerator: 1~3 (uniform random) • #binary semaphores: 10 National Taiwan University, Taiwan 29
  • 30. THE PARAMETERS OF TASKS • #accelerator requests: 1~4 (uniform random) • #semaphore requests: 0~4 (uniform random) • relative deadline: 5000~1000000 cycles (uniform random) • period: equal to the relative deadline • We simulate the performance with synthesized workloads which refer to • A Generic Avionics Platform (GAP): a real system [15] • The benchmark Parboil: a benchmark for GPGPU computing [18] National Taiwan University, Taiwan 30 [15] C. Locke, D. Vogel, and T. Mesler. Building a predictable avionics platform in Ada: a case study. In Real-Time Systems Symposium, 1991. Proceedings., Twelfth, pages 181–189, 1991. [18] J. A. Stratton, C. Rodrigues, I. J. Sung, N. Obeid, L. W. Chang, N. Anssari, G. D. Liu, and W. W. Hwu. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing, 2012.
  • 31. THE SIMULATION • To observe the following performance metrics under the various total workloads of the accelerators and the CPU • The deadline satisfaction ratio = (the number of jobs that meet deadlines) / (the number of jobs) • The utilizations • For each task, the following settings are generated randomly with a uniform distribution: • (total execution time) / (relative deadline) = 10%~40% • (total accelerator execution time) / (total execution time) = 80%~95% • The benchmark Parboil is referenced for these settings • Platform: Intel Core i5-760, AMD Radeon HD 7700, and 8 GB memory • The expected total accelerator utilization of the task set was from 30% to 100%. National Taiwan University, Taiwan 31 [Diagram: period/relative deadline, total execution time (40%), total accelerator execution time (80%)]
  • 32. THE RESULT OF SIMULATION - DEADLINE SATISFACTION RATIOS • The PCP is a multiple-instance PCP • X-SYSTEM: the deadline satisfaction ratios of all tasks under the X protocol • X-CT: the deadline satisfaction ratios of critical tasks under the X protocol National Taiwan University, Taiwan 32
  • 33. THE RESULT OF SIMULATION - UTILIZATION • CPU: the utilization of the CPU • ACC: the utilization of the accelerators National Taiwan University, Taiwan 33
  • 34. CONCLUSIONS • The experiment results show that the ASP can • Provide similar or better capability to control priority inversions • Greatly improve the system utilization • Managing different types of resources with different policies could improve the system utilization by avoiding unnecessary blocking. • The concept could be applied to other protocols that adopt the concept of priority ceiling National Taiwan University, Taiwan 34
  • 35. Q & A National Taiwan University, Taiwan 35
  • 36. THANK YOU
  • 37. THE PRIORITY INHERITANCE RULE • A task might be blocked by multiple tasks • Let lower-priority tasks release occupied accelerators as soon as possible • When the task Ti is blocked • A task inherits Pi if all of the following conditions are satisfied • A subtask of the task is finished, and the task is ready to sync • The task has the highest priority among the tasks ready to sync National Taiwan University, Taiwan 37
  • 38. SCHEDULABILITY TEST • For N tasks {T1, T2, ..., TN } with priorities P1>P2> ... >PN, each task Ti is schedulable if the following condition is satisfied: • Li,j : the length of the jth critical section of the task Ti • Ci,j : the execution time of the jth subtask of the task Ti National Taiwan University, Taiwan 38
  • 39. THE WORST CASE • The priority bar of an accelerator used in a chained blocking is lower. • The bar cannot protect higher-priority tasks • A high-priority task is blocked because of another accelerator. National Taiwan University, Taiwan 39
  • 40. ABOUT THE IMPLEMENTATION • Rule 3 for accelerator allocation • Two variables maintain the scope of priorities • The scope is from the highest priority to the lowest priority of the tasks which have been blocked due to accelerator requests • Grant the request if one of the following is satisfied • Pi >= the highest priority • Pre-calculated priority ceiling < the lowest priority National Taiwan University, Taiwan 40
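The two-variable bookkeeping described on this slide might look like the sketch below (the class and method names are ours, and a larger number again means a higher priority):

```python
class BlockedPriorityScope:
    """Track the range of priorities of tasks that have been blocked
    due to accelerator requests (rule 3 bookkeeping sketch)."""

    def __init__(self):
        self.highest = None  # highest blocked priority seen so far
        self.lowest = None   # lowest blocked priority seen so far

    def record_blocked(self, prio):
        """Widen the scope when a task is blocked on an accelerator request."""
        self.highest = prio if self.highest is None else max(self.highest, prio)
        self.lowest = prio if self.lowest is None else min(self.lowest, prio)

    def passes_rule3(self, prio, precomputed_ceiling):
        """Grant if Pi >= the highest blocked priority, or the
        pre-calculated ceiling is below the lowest blocked priority."""
        if self.highest is None:  # no task has been blocked yet
            return True
        return prio >= self.highest or precomputed_ceiling < self.lowest
```

This keeps the rule-3 check O(1) per request: only the two boundary priorities are stored, rather than the whole set of blocked tasks.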
  • 41. OTHER ASSUMPTIONS (MISC) • A request gets all of the necessary units or nothing. • All of the obtained units are released after the usage. • Get 2 units and then release 2 units National Taiwan University, Taiwan 41
  • 42. CRITICAL SEMAPHORES AND CRITICAL ACCELERATORS • Critical semaphores • The semaphores which might be requested by any critical task. • Critical accelerators • All of the accelerators that might be requested by any critical task or requested within critical sections protected by critical semaphores. National Taiwan University, Taiwan 42
  • 43. THE SIMULATION 2 • To observe the following performance metrics under various degrees of accelerator contention • The deadline satisfaction ratios • Utilizations • The total CPU utilization of the task set is fixed • We adjust the ratio of the accelerator execution time to the total execution time of each task, ranging from 10% to 90%. • This synthesizes different degrees of accelerator contention National Taiwan University, Taiwan 43
  • 44. THE RESULT OF SIMULATION 2 - DEADLINE SATISFACTION RATIOS National Taiwan University, Taiwan 44
  • 45. THE RESULT OF SIMULATION 2 - UTILIZATION National Taiwan University, Taiwan 45
