Fault tolerant real-time scheduling

    Presentation Transcript

    • Quasi-static fault-tolerant scheduling schemes for energy-efficient hard real-time systems 1
      • Wei Tongquan, CS Department of East China Normal University, China
      • Piyush Mishra, GE Global Research, Niskayuna, NY 12309, USA
      • Kaijie Wu, ECE Department of University of Illinois, Chicago, IL 60607, USA
      • Junlong Zhou, CS Department of East China Normal University, China
      • Journal of Systems and Software, 2012
      • Reza Ramezani
    • A Unified Approach for Fault Tolerance and Dynamic Power Management in Fixed-Priority Real-Time Embedded Systems 2
      • Ying Zhang, a Senior Software Engineer with the Research and Development Department, Guidant Corporation, St. Paul, MN, USA
      • Krishnendu Chakrabarty, Department of Electrical and Computer Engineering, Duke University, Durham, USA
      • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25, no. 1 (2006): 111-125
    • Overview 3
      ◦ Primaries
      ◦ Checkpointing & Response Time
      ◦ Reliability: the best fault tolerance count?
      ◦ Feasibility Analysis
      ◦ Offline Application Level Voltage Scaling
      ◦ Offline Task Level Voltage Scaling
      ◦ Online DVS by Using Slacks
      ◦ Previous Work (Ying Zhang, Krishnendu Chakrabarty, 2006)
      ◦ Results
      ◦ Suggestion
    • Primaries 4
    • Features 5
      • Fault Tolerance Scheduling
        ◦ Transient faults
        ◦ Fast detection
        ◦ Faults occurring at runtime are handled by checkpointing and state restoration.
      • Dynamic Voltage Scaling (DVS)
      • Offline Scheduling
        ◦ Application Level Voltage Scaling (A-DVS)
        ◦ Task Level Voltage Scaling (T-DVS)
      • Online Scheduling
        ◦ Using slacks
      • Exact Rate-Monotonic Characterization
        ◦ Used for feasibility analysis instead of iteratively deriving the response time of each task.
    • Online DVS Outline 6
      • The offline task schedules are adapted to the runtime behavior of fault occurrences in three steps:
        ◦ (1) Pre-compute the maximum slack requirements for the processor to dynamically slow down, and save them in a lookup table.
        ◦ (2) At runtime, retrieve the stored slack requirements and compare them with the cumulative slack generated so far.
        ◦ (3) Dynamically scale down the processor speed when the generated slack is equal to or greater than the stored slack requirement.
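The three steps above can be sketched as a table lookup. This is a minimal illustration only: the table contents, the normalized speeds, and the name `select_speed` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical table computed offline: (slack required in ms, normalized speed).
# Rows are sorted by ascending slack requirement; more slack permits a lower speed.
SLACK_TABLE = [(0.0, 1.0), (2.0, 0.8), (5.0, 0.6), (9.0, 0.4)]


def select_speed(cumulative_slack, table=SLACK_TABLE):
    """Return the lowest speed whose stored slack requirement is met."""
    thresholds = [required for required, _ in table]
    # Index of the last row whose requirement is <= the generated slack.
    idx = bisect_right(thresholds, cumulative_slack) - 1
    if idx < 0:
        # No requirement is met (e.g. negative slack): stay at full speed.
        return 1.0
    return table[idx][1]
```

With this table, `select_speed(5.0)` selects the 0.6 level because the generated slack equals the stored requirement, while `select_speed(4.9)` stays at 0.8.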
    • System Architecture 7
    • System Architecture (2) 8
    • Checkpointing & Response Time 9
    • Checkpoint count 10
      ◦ Fault-tolerant computing refers to the correct execution of user programs and system software in the presence of faults.
      ◦ Fault tolerance is typically achieved in real-time systems through online fault detection, checkpointing, and rollback recovery.
      ◦ Checkpointing increases the task execution time; in the absence of faults, it might cause a missed deadline for a task that would complete on time without checkpointing.
      ◦ Frequent checkpointing reduces the re-execution time due to faults but increases the task execution time, and vice versa.
      ◦ Therefore, the checkpointing interval, i.e., the duration between two consecutive checkpoints, must be carefully chosen to balance the checkpointing cost against the re-execution time.
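The trade-off described on this slide can be made concrete with a small numeric sketch. The cost model below is a simplifying assumption, not the formula from the paper: a task with fault-free execution time `E` is split into `n` equal segments, each segment ends with a checkpoint costing `C`, and each of up to `k` faults forces one segment of length `E / n` to be re-executed.

```python
def worst_case_time(E, C, k, n):
    """Worst-case time under the assumed model: work + checkpoints + re-execution."""
    return E + n * C + k * E / n


def best_segment_count(E, C, k, max_n=1000):
    """Brute-force the segment count that minimizes the assumed worst-case time."""
    return min(range(1, max_n + 1), key=lambda n: worst_case_time(E, C, k, n))
```

For `E = 100`, `C = 2`, and `k = 2`, the search picks 10 segments, which matches the continuous minimizer sqrt(k * E / C) of this particular model. Fewer segments make each rollback longer; more segments add checkpoint overhead even when no fault occurs.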
    • Fault occurrences count 11
      • Relation between the fault occurrences count and the fault arrival rate
        ◦ k is the number of fault occurrences to be tolerated.
        ◦ For a fault arrival rate λ and a task execution interval t, the mean number of faults that arrive during the interval is λt.
          - If k is much smaller than λt, a sophisticated fault-tolerant scheme with its associated overhead is not appropriate.
          - If k is much larger than λt, a fault-tolerant scheme that provides a deterministic real-time guarantee may not exist.
        ◦ To target a system with reasonable real-time performance and fault tolerance, the value of k can be taken to be a small multiple of λt, e.g., 2λt ≤ k ≤ 3λt.
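As a worked illustration of the rule of thumb 2λt ≤ k ≤ 3λt, the sketch below computes the integer range for k and, under the added assumption that fault arrivals follow a Poisson process (the slide gives only the mean λt), the probability that no more than k faults arrive in the interval. The function names are hypothetical.

```python
import math


def fault_count_range(rate, interval):
    """Integer bounds on k such that 2*rate*interval <= k <= 3*rate*interval."""
    mean = rate * interval
    return math.ceil(2 * mean), math.floor(3 * mean)


def prob_at_most_k_faults(rate, interval, k):
    """P(faults <= k) in the interval, assuming Poisson arrivals with mean rate*interval."""
    mean = rate * interval
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))
```

For example, with `rate = 0.5` faults per time unit and `interval = 4`, the mean is 2 faults and `fault_count_range` yields k between 4 and 6. When λt is small the range can be empty (the lower bound exceeds the upper bound), so a caller would need to handle that case.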
    • Checkpointing 12
    • Fault placement 13
    • Fault placement 14
    • Task response time 15
    • Task response time 16
    • Reliability The best fault tolerance count? 17
    • Reliability 18
    • Task Reliability 19
    • Task Reliability (2) 20
    • Task Reliability (3) 21
    • Feasibility Analysis 22
    • Exact Characterization of RMA (ECRMA) 23
      • Critical Instant
        ◦ The worst-case behavior of RMA occurs when all tasks in a task set are instantiated simultaneously and are ready for execution immediately after initiation.
        ◦ It has been shown that a schedule of independent periodic tasks is feasible if the first instance of each task is schedulable when it is instantiated at a critical instant (Lehoczky et al., 1989).
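A minimal sketch of the exact rate-monotonic test of Lehoczky et al. (1989) for the fault-free case follows. It assumes deadlines equal to periods and integer task parameters, and it does not include the checkpointing and fault-recovery terms covered on the later slides. The function name and the `(C, T)` tuple layout are choices made for this example.

```python
import math


def rm_exact_feasible(tasks):
    """tasks: list of (C, T) pairs, worst-case execution time and period.

    Starting from a critical instant, task i is schedulable if at some
    scheduling point t <= T_i the cumulative demand of task i and all
    higher-priority tasks does not exceed t.
    """
    tasks = sorted(tasks, key=lambda ct: ct[1])  # shorter period = higher priority
    for i, (_, T_i) in enumerate(tasks):
        higher = tasks[: i + 1]
        # Scheduling points: multiples of higher-priority periods up to T_i.
        points = sorted(
            {k * T_j for _, T_j in higher for k in range(1, int(T_i // T_j) + 1)}
        )
        demand_met = any(
            sum(C_j * math.ceil(t / T_j) for C_j, T_j in higher) <= t
            for t in points
        )
        if not demand_met:
            return False
    return True
```

For the task set `[(1, 4), (2, 6), (3, 12)]` the lowest-priority task first meets its demand at t = 12 (demand 10), so the set is feasible. The set `[(2, 4), (3, 6)]` fails at both scheduling points of the second task (t = 4 and t = 6).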
    • Exact Characterization of RMA (ECRMA) (2) 24
    • Exact Characterization of RMA (ECRMA) (3) 25
    • Offline Application Level Voltage Scaling 26
    • Application level voltage scaling (A-DVS) 27
    • A-DVS algorithm 28
    • A-DVS algorithm (2) 29
      • Some Considerations
        ◦ The binary-search-based A-DVS algorithm is valid only if the energy consumption is monotonic with respect to frequency/voltage changes.
        ◦ When the processor's static power consumption and the context switching overhead are considered, this monotonicity does not hold.
        ◦ In this case, there exists a critical processor speed below which scaling down the processor speed increases the energy consumption instead of reducing it.
        ◦ The minimum voltage level, low, is therefore initialized to the level corresponding to the processor's critical speed.
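The considerations above suggest a binary search over discrete speed levels whose lower bound starts at the critical speed. The sketch below is an illustration under stated assumptions, not the A-DVS algorithm from the slides: `levels`, `is_feasible`, and `critical_speed` are hypothetical inputs, and feasibility is assumed to be monotonic (a task set feasible at one speed is feasible at every higher speed).

```python
def lowest_feasible_level(levels, is_feasible, critical_speed):
    """Return the lowest feasible speed level at or above the critical speed.

    levels: normalized speeds in ascending order.
    is_feasible: callable(speed) -> bool, assumed monotonic in speed.
    Returns None if no candidate level is feasible.
    """
    # Levels below the critical speed are excluded because slowing down
    # further would increase energy consumption rather than reduce it.
    candidates = [s for s in levels if s >= critical_speed]
    lo, hi = 0, len(candidates) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_feasible(candidates[mid]):
            best = candidates[mid]
            hi = mid - 1  # try a slower level
        else:
            lo = mid + 1  # need a faster level
    return best
```

As a usage example, with levels `[0.2, 0.4, 0.6, 0.8, 1.0]`, a critical speed of 0.4, and a toy feasibility check `lambda s: 50 / s <= 100`, the search settles on 0.6. Raising the critical speed to 0.7 returns 0.8 even though 0.6 would meet the deadline.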
    • Feasibility Checking Algorithm (FCA) 30
    • Feasibility Checking Algorithm (FCA) 31
    • Offline Task Level Voltage Scaling 32
    • Task level voltage scaling (T-DVS) 33
    • T-DVS algorithm 34
    • T-DVS algorithm (2) 35
    • T-DVS algorithm (3) 36
    • T-DVS Consideration 37
    • Schedulability Checking Algorithm (SCA) 38
    • Online DVS by Using Slacks 39
    • Online reevaluation of DVS policies 40
      ◦ Offline scheduling assumes that all tasks exhibit the worst-case execution time and all faults occur during the checkpointing.
      ◦ The runtime behavior of task execution and fault occurrences can vary significantly.
      ◦ At runtime, not all tasks execute up to their worst-case execution times and not all faults occur during task executions.
      ◦ Hence, the slack generated at runtime can be used to dynamically scale down the processor speed to save energy.
      ◦ The online reevaluation of DVS policies can save significant energy by using the slack generated due to uncertainties in fault occurrence.
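To illustrate how runtime slack arises, the sketch below sums the difference between each completed job's worst-case budget (execution plus fault-recovery allowance, at full speed) and the time it actually used, then stretches the remaining worst-case work over the extra time. The proportional stretching rule and both function names are assumptions made for this example; the slides do not specify the reevaluation rule at this point.

```python
def generated_slack(completed):
    """completed: list of (worst_case_budget, actual_time) pairs at full speed."""
    return sum(budget - actual for budget, actual in completed)


def scaled_speed(remaining_worst_case, slack, min_speed=0.0):
    """Normalized speed that spreads the remaining work over work + slack."""
    if remaining_worst_case <= 0:
        return min_speed
    speed = remaining_worst_case / (remaining_worst_case + max(slack, 0.0))
    # Clamp to the allowed range; 1.0 is full speed.
    return max(min_speed, min(1.0, speed))
```

For completed jobs `[(10, 6), (8, 8), (12, 7)]` the generated slack is 9 time units; with 21 units of worst-case work remaining, the sketch would run the processor at 21 / 30 = 0.7 of full speed.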
    • Reevaluation of DVS at application level 41
    • Reevaluation of DVS at application level (2) 42
    • Reevaluation of DVS at application level (3) 43
    • 44
    • 45
    • Dynamic ADVS Algorithm 46
    • Dynamic ADVS Algorithm (2) 47
    • Example 48
    • Reevaluation of DVS policies at task level 49
    • Reevaluation of T-DVS (D-TDVS) 50
    • Previous Work (Ying Zhang, Krishnendu Chakrabarty, 2006) 51
    • Feasibility Analysis 52
    • Feasibility of a task set under fault-free conditions 53
    • Tolerating k Faults in Each Task 54
    • Fault Tolerance With DVS 55
    • Fault Tolerance With DVS (2) 56
    • Fault Tolerance With DVS (3) 57
    • Heuristic Method Based on GA 58
    • Heuristic Method Based on GA (2) 59
      • Init function
        ◦ Initializes the search space (chromosome population).
        ◦ One chromosome is initially generated using the computationally feasible application-level speed scaling method.
        ◦ The other chromosomes are generated randomly.
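The Init function described above might look like the following sketch. The chromosome encoding (one speed level per task) and all parameter names are assumptions for illustration; the slide states only that one chromosome comes from the application-level method and the rest are random.

```python
import random


def init_population(num_tasks, speed_levels, app_level_speed, pop_size, seed=None):
    """Build the initial GA population.

    Each chromosome is a list with one speed level per task (assumed encoding).
    The first chromosome assigns the application-level speed to every task;
    the remaining chromosomes are drawn uniformly at random.
    """
    if pop_size < 1:
        raise ValueError("pop_size must be at least 1")
    rng = random.Random(seed)
    population = [[app_level_speed] * num_tasks]
    while len(population) < pop_size:
        population.append([rng.choice(speed_levels) for _ in range(num_tasks)])
    return population
```

Seeding the population with the application-level solution means the search starts with at least one candidate that is known to be feasible, while the random chromosomes supply diversity.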
    • Heuristic Method Based on GA 60
    • Results 61
    • Experiments 62
    • Processors 63
    • Task Sets 64
    • Application level results on Transmeta Crusoe 65
    • Task level results on Transmeta Crusoe 66
    • Application level results on Intel XScale 67
    • Task level results on Intel XScale 68
    • Real life implementation 69
      ◦ The energy consumption of the system board excludes the processor time.
    • Suggestion 70
      ◦ The scheduler tolerates at least k faults and then applies DVS using the generated slack.
      ◦ When more than k faults occur, the additional faults can be tolerated by increasing the processor speed.