Fault tolerant real-time scheduling

  1. Quasi-static fault-tolerant scheduling schemes for energy-efficient hard real-time systems
     • Wei Tongquan, CS Department of East China Normal University, China
     • Piyush Mishra, GE Global Research, Niskayuna, NY 12309, USA
     • Kaijie Wu, ECE Department of University of Illinois, Chicago, IL 60607, USA
     • Junlong Zhou, CS Department of East China Normal University, China
     Journal of Systems and Software, 2012
     Reza Ramezani
  2. A Unified Approach for Fault Tolerance and Dynamic Power Management in Fixed-Priority Real-Time Embedded Systems
     • Ying Zhang, Senior Software Engineer, Research and Development Department, Guidant Corporation, St. Paul, MN, USA
     • Krishnendu Chakrabarty, Department of Electrical and Computer Engineering, Duke University, Durham, USA
     IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25, no. 1 (2006): 111-125
  3. Overview
     • Primaries
     • Checkpointing & Response Time
     • Reliability: the best fault tolerance count?
     • Feasibility Analysis
     • Offline Application Level Voltage Scaling
     • Offline Task Level Voltage Scaling
     • Online DVS by Using Slacks
     • Previous Work (Ying Zhang, Krishnendu Chakrabarty, 2006)
     • Results
     • Suggestion
  4. Primaries
  5. Features
     • Fault Tolerance Scheduling
       ◦ Transient faults
       ◦ Fast detection
       ◦ Fault occurrences at runtime, handled by checkpointing and state restoration
     • Dynamic Voltage Scaling (DVS)
     • Offline Scheduling
       ◦ Application Level Voltage Scaling (A-DVS)
       ◦ Task Level Voltage Scaling (T-DVS)
     • Online Scheduling
       ◦ Using slacks
     • Exact Rate-Monotonic Characterization
       ◦ Used for feasibility analysis instead of iteratively deriving the response time of each task.
  6. Online DVS Outline
     • The offline task schedules are adapted to the runtime behavior of fault occurrences by:
       ◦ (1) Pre-computing the maximum slack requirements for the processor to dynamically slow down and saving them in a lookup table.
       ◦ (2) Retrieving the stored slack requirements at runtime and comparing them with the cumulative slack generated so far.
       ◦ (3) Dynamically scaling down the processor speed when the generated slack is equal to or greater than the stored slack requirement.
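The three steps of the online DVS outline can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the speed levels, the slack thresholds in `SLACK_TABLE`, and the function name `select_speed` are all hypothetical.

```python
# Step 1 (offline): hypothetical lookup table. For each lower speed level
# (fraction of full speed) it stores the minimum cumulative slack, in ms,
# required before the processor may slow down to that level.
# The numbers are invented for illustration only.
SLACK_TABLE = [
    (0.50, 12.0),  # slowest speed, largest slack requirement
    (0.75, 5.0),
]


def select_speed(cumulative_slack_ms, current_speed=1.0, table=SLACK_TABLE):
    """Steps 2 and 3 (online): compare the generated slack with the stored
    requirements and return the slowest speed whose requirement is met."""
    for speed, required_slack in table:
        if speed < current_speed and cumulative_slack_ms >= required_slack:
            return speed
    return current_speed  # not enough slack: keep the current speed


print(select_speed(3.0))   # 1.0
print(select_speed(5.0))   # 0.75
print(select_speed(20.0))  # 0.5
```

Keeping the table sorted from slowest to fastest speed means the first match is also the most energy-saving choice, so the runtime cost is a single short scan.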
  7. System Architecture
  8. System Architecture (2)
  9. Checkpointing & Response Time
  10. Checkpoint count
      • Fault-tolerant computing refers to the correct execution of user programs and system software in the presence of faults.
      • Fault tolerance is typically achieved in real-time systems through online fault detection, checkpointing, and rollback recovery.
      • Checkpointing increases the task execution time, and in the absence of faults, it might cause a missed deadline for a task that completes on time without checkpointing.
      • Frequent checkpointing reduces re-execution time due to faults but increases task execution time, and vice versa.
      • Therefore, the checkpointing interval, i.e., the duration between two consecutive checkpoints, must be carefully chosen to balance checkpointing cost with the re-execution time.
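The trade-off described in the checkpoint count slide can be made concrete with a small Python sketch. The cost model here is an assumption chosen for illustration and is not taken from the slides: a task with fault-free execution time `E` is split into `n` equal intervals, each followed by a checkpoint of cost `C`, and each of `k` faults forces re-execution of at most one interval. Rollback overhead is ignored.

```python
def worst_case_time(E, C, k, n):
    """Assumed model: execution time + checkpointing cost + re-execution.

    E: fault-free execution time, C: cost of one checkpoint,
    k: number of faults to tolerate, n: number of checkpoint intervals.
    """
    return E + n * C + k * E / n


def best_interval_count(E, C, k, n_max=1000):
    """Brute-force search for the interval count that minimizes the
    assumed worst-case time."""
    return min(range(1, n_max + 1), key=lambda n: worst_case_time(E, C, k, n))


n_best = best_interval_count(E=100, C=1, k=4)
print(n_best, worst_case_time(100, 1, 4, n_best))  # 20 140.0
```

With too few intervals the re-execution term dominates, and with too many the checkpointing term does. Under this model the minimum sits near the square root of `k * E / C`.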
  11. Fault occurrences count
      • Relation between fault occurrences count and fault arrival rate
        ◦ k is the number of fault occurrences to be tolerated.
        ◦ Given a fault arrival rate λ and a task execution interval t, the mean number of faults that arrive during the interval is λt.
          - If k is much smaller than λt, a sophisticated fault-tolerant scheme with its associated overhead is not appropriate.
          - If k is much larger than λt, a fault-tolerant scheme that provides a deterministic real-time guarantee may not exist.
        ◦ To target a system with reasonable real-time performance and fault tolerance, the value of k can be taken to be a small multiple of λt, e.g., 2λt ≤ k ≤ 3λt.
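As a rough numerical illustration of the rule 2λt ≤ k ≤ 3λt, the Python sketch below computes the integer range for k and the probability that no more than k faults arrive in the interval. The Poisson arrival model and the function names are assumptions added here; the slide only states that the mean number of faults is λt.

```python
from math import ceil, exp, factorial, floor


def k_range(lam, t):
    """Integer values of k satisfying 2*lam*t <= k <= 3*lam*t.

    Returns (low, high); the range is empty if low > high, which happens
    when lam*t is very small.
    """
    mean = lam * t
    return ceil(2 * mean), floor(3 * mean)


def prob_at_most_k(lam, t, k):
    """P(faults <= k) in an interval of length t, assuming Poisson arrivals."""
    mean = lam * t
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))


low, high = k_range(lam=0.5, t=3.0)  # mean number of faults is 1.5
print(low, high)                     # 3 4
print(prob_at_most_k(0.5, 3.0, low))
```

Under the Poisson assumption, choosing k at the low end of the range already covers most fault scenarios, which is one way to read the slide's "small multiple" advice.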
  12. Checkpointing
  13. Fault placement
  14. Fault placement
  15. Task response time
  16. Task response time
  17. Reliability: the best fault tolerance count?
  18. Reliability
  19. Task Reliability
  20. Task Reliability (2)
  21. Task Reliability (3)
  22. Feasibility Analysis
  23. Exact Characterization of RMA (ECRMA)
      • Critical Instant
        ◦ The worst case behavior of RMA occurs when all tasks in a task set are instantiated simultaneously and are ready for execution immediately after initiation.
        ◦ It has been shown that a schedule of independent periodic tasks is feasible if the first instance of each task is schedulable when it is instantiated at a critical instant (Lehoczky et al., 1989).
  24. Exact Characterization of RMA (ECRMA) (2)
  25. Exact Characterization of RMA (ECRMA) (3)
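The critical-instant argument leads to the exact rate-monotonic test of Lehoczky et al. (1989): task i is schedulable if, at some scheduling point t up to its period, the cumulative demand of task i and all higher-priority tasks fits within t. The Python sketch below covers the fault-free case only, assumes deadlines equal to periods and integer time units, and uses a hypothetical function name; it does not include the checkpointing and fault-recovery terms that the slides add.

```python
def rm_feasible(tasks):
    """Exact rate-monotonic test at the critical instant (fault-free case).

    tasks: list of (C, T) pairs with integer worst-case execution time C
    and integer period T; deadlines are assumed equal to periods.
    """
    tasks = sorted(tasks, key=lambda ct: ct[1])  # shorter period = higher priority
    for i, (_, T_i) in enumerate(tasks):
        considered = tasks[: i + 1]  # task i and all higher-priority tasks
        # Scheduling points: multiples of each period that do not exceed T_i.
        points = sorted(
            {m * T_j for _, T_j in considered for m in range(1, T_i // T_j + 1)}
        )

        def demand(t):
            # Work released in [0, t]; -(-t // T_j) is an integer ceiling.
            return sum(C_j * -(-t // T_j) for C_j, T_j in considered)

        if not any(demand(t) <= t for t in points):
            return False  # the first instance of task i misses its deadline
    return True


print(rm_feasible([(1, 4), (2, 6), (3, 12)]))  # True
print(rm_feasible([(2, 4), (3, 6), (3, 12)]))  # False
```

Only a finite set of scheduling points needs checking, which is why this test can replace the iterative response-time computation mentioned in the Features slide.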
  26. Offline Application Level Voltage Scaling
  27. Application level voltage scaling (A-DVS)
  28. A-DVS algorithm
  29. A-DVS algorithm (2)
      • Some Considerations
        ◦ The binary search based A-DVS algorithm is valid only if the energy consumption is monotonic with respect to frequency/voltage changes.
        ◦ When the processor static power consumption and the context switching overhead are considered, the monotonicity does not hold.
        ◦ In this case, there exists a critical processor speed below which scaling down the processor speed increases the energy consumption instead of reducing it.
        ◦ The minimum voltage level, low, is therefore initialized to the level corresponding to the processor critical speed.
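The considerations above can be illustrated with a binary search over discrete speed levels whose lower bound starts at the critical speed. This Python sketch is an assumption-laden illustration, not the A-DVS algorithm from the slides: the function name, the `is_feasible` callback, and the toy utilization check stand in for the real feasibility checking algorithm, and feasibility is assumed to be monotonic in speed.

```python
def lowest_feasible_level(levels, is_feasible, critical_speed):
    """Binary search for the lowest feasible speed at or above the critical speed.

    levels: available speeds in ascending order.
    is_feasible: callable(speed) -> bool, assumed monotonic (if a speed is
                 feasible, every higher speed is feasible too).
    Returns the chosen speed, or None if no level is feasible.
    """
    # Levels below the critical speed would increase energy, so skip them.
    candidates = [s for s in levels if s >= critical_speed]
    low, high = 0, len(candidates) - 1
    best = None
    while low <= high:
        mid = (low + high) // 2
        if is_feasible(candidates[mid]):
            best = candidates[mid]
            high = mid - 1  # try an even lower speed
        else:
            low = mid + 1   # need a higher speed
    return best


levels = [0.2, 0.4, 0.6, 0.8, 1.0]
toy_check = lambda s: 0.5 / s <= 1.0  # placeholder: utilization 0.5 at full speed
print(lowest_feasible_level(levels, toy_check, critical_speed=0.3))  # 0.6
print(lowest_feasible_level(levels, toy_check, critical_speed=0.7))  # 0.8
```

The second call shows the effect of the critical speed: 0.6 would be feasible, but it lies below the critical speed, so the search never considers it.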
  30. Feasibility Checking Algorithm (FCA)
  31. Feasibility Checking Algorithm (FCA)
  32. Offline Task Level Voltage Scaling
  33. Task level voltage scaling (T-DVS)
  34. T-DVS algorithm
  35. T-DVS algorithm (2)
  36. T-DVS algorithm (3)
  37. T-DVS Consideration
  38. Schedulability Checking Algorithm (SCA)
  39. Online DVS by Using Slacks
  40. Online reevaluation of DVS policies
      • Offline scheduling assumes that all tasks exhibit the worst case execution time and all faults occur during the checkpointing.
      • The runtime behavior of task execution and fault occurrences can vary significantly.
      • At runtime, not all tasks execute up to their worst case execution times and not all faults occur during task executions.
      • Hence, the slack generated at runtime can be used to dynamically scale down the processor speed to save energy.
      • The online reevaluation of DVS policies can save significant energy by using the slack generated by uncertainties in fault occurrence.
  41. Reevaluation of DVS at application level
  42. Reevaluation of DVS at application level (2)
  43. Reevaluation of DVS at application level (3)
  44.
  45.
  46. Dynamic ADVS Algorithm
  47. Dynamic ADVS Algorithm (2)
  48. Example
  49. Reevaluation of DVS policies at task level
  50. Reevaluation of T-DVS (D-TDVS)
  51. Previous Work (Ying Zhang, Krishnendu Chakrabarty, 2006)
  52. Feasibility Analysis
  53. Feasibility of a task set under fault-free conditions
  54. Tolerating k Faults in Each Task
  55. Fault Tolerance With DVS
  56. Fault Tolerance With DVS (2)
  57. Fault Tolerance With DVS (3)
  58. Heuristic Method Based on GA
  59. Heuristic Method Based on GA (2)
      • Init function
        ◦ Initializes the search space (chromosome population).
        ◦ One chromosome is initially generated using the computationally feasible application-level speed scaling method.
        ◦ The other chromosomes are generated randomly.
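The Init function described above might look like the following Python sketch. The chromosome encoding (one speed level per task), the function name, and the parameters are assumptions for illustration; the slides do not show how chromosomes are represented.

```python
import random


def init_population(num_tasks, speed_levels, app_level_speed, pop_size, seed=None):
    """Build the initial chromosome population.

    Assumed encoding: a chromosome is a list with one speed level per task.
    The first chromosome assigns every task the application-level speed;
    the remaining ones are drawn uniformly at random from speed_levels.
    """
    rng = random.Random(seed)
    population = [[app_level_speed] * num_tasks]  # seeded chromosome
    while len(population) < pop_size:
        population.append([rng.choice(speed_levels) for _ in range(num_tasks)])
    return population


pop = init_population(num_tasks=3, speed_levels=[0.5, 0.75, 1.0],
                      app_level_speed=0.75, pop_size=4, seed=1)
print(pop[0])    # [0.75, 0.75, 0.75]
print(len(pop))  # 4
```

Seeding one chromosome with a known feasible solution means the search starts from something at least as good as the application-level result, while the random chromosomes provide diversity.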
  60. Heuristic Method Based on GA
  61. Results
  62. Experiments
  63. Processors
  64. Task Sets
  65. Application level results on Transmeta Crusoe
  66. Task level results on Transmeta Crusoe
  67. Application level results on Intel XScale
  68. Task level results on Intel XScale
  69. Real life implementation
      • The energy consumption of the system board excludes the processor time.
  70. Suggestion
      • The scheduler can tolerate at least k faults and then tries to apply DVS by using slacks.
      • Tolerate more than k faults by increasing the processor speed when more than k faults occur.
