10. Resource Management



  1. Sandeep Kumar Poonia, Head of Dept. CS/IT; B.E., M.Tech., UGC-NET; LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
  2. Resource Management – outline:
     ◦ Introduction
     ◦ Desirable Features of a Good Global Scheduling Algorithm
     ◦ Task Assignment Approach
     ◦ Load-Balancing Approach
     ◦ Load-Sharing Approach
     Sandeep Kumar Poonia, 11/10/2013
  3. Resource Management: A resource can be logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions of a distributed operating system is to assign processes to the nodes (resources) of the distributed system such that resource usage, response time, network congestion, and scheduling overhead are optimized.
  4. Resource Management – scheduling techniques in distributed systems:
     ◦ Task Assignment Approach
     ◦ Load-Balancing Approach
     ◦ Load-Sharing Approach
  5. Resource Management: In the task assignment approach, each process submitted by a user for processing is viewed as a collection of related tasks, and these tasks are scheduled to suitable nodes so as to improve performance.
  6. Resource Management: In the load-balancing approach, all processes submitted by the users are distributed among the nodes of the system so as to equalize the workload among the nodes.
  7. Resource Management: The load-sharing approach simply attempts to conserve the ability of the system to perform work by ensuring that no node is idle while processes wait to be processed.
  8. Resource Management – desirable features of a good global scheduling algorithm:
     ◦ No a priori knowledge about the processes
     ◦ Dynamic in nature
     ◦ Quick decision-making capability
     ◦ Balanced system performance and scheduling overhead
     ◦ Stability
     ◦ Scalability
     ◦ Fault tolerance
     ◦ Fairness of service
  9. Resource Management
     ◦ No a priori knowledge about the processes: Scheduling algorithms that operate on information about the characteristics and resource requirements of the processes place an extra burden on the users, who must provide this information when submitting their processes for execution.
     ◦ Dynamic in nature: Process assignment decisions should be dynamic, i.e., based on the current load of the system and not on some static policy. The scheduling algorithm should also have the flexibility to migrate a process more than once, because the initial decision to place a process on a particular node may have to be revised later to adapt to the new system load.
  10. Resource Management
     ◦ Quick decision-making capability: Heuristic methods requiring less computational effort (and hence less time) while providing near-optimal results are preferable to exhaustive (optimal) solution methods.
     ◦ Balanced system performance and scheduling overhead: Algorithms that provide near-optimal system performance with a minimum of global state information (such as CPU load) gathering overhead are desirable. As the amount of global state information collected increases, the overhead increases while the usefulness of the information decreases, due both to the aging of the information being gathered and to the lower scheduling frequency caused by the cost of gathering and processing the extra information.
  11. Resource Management
     ◦ Stability: Fruitless migration of processes, known as processor thrashing, must be prevented. For example, nodes n1 and n2 may both observe that node n3 is idle and offload a portion of their work to it, each unaware of the other's offloading decision. If n3 becomes overloaded as a result, it may in turn start transferring its processes to other nodes. This thrashing is caused by scheduling decisions being made at each node independently of the decisions made by other nodes.
  12. Resource Management
     ◦ Scalability: A scheduling algorithm should scale well as the number of nodes increases. An algorithm that makes scheduling decisions by first inquiring about the workload of all nodes and then selecting the most lightly loaded node has poor scalability; it works fine only when there are few nodes in the system. The inquirer receives a flood of replies almost simultaneously, and the time required to process the reply messages for making a node selection becomes too long as the number of nodes (N) increases. The resulting network traffic also quickly consumes network bandwidth. A simple approach is to probe only m of the N nodes when selecting a node.
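The limited-probing idea can be sketched in a few lines. This is a minimal Python sketch, not from the slides: the node list, the load metric (smaller is lighter), and the function name are all illustrative assumptions.

```python
import random

def pick_lightest_by_probing(loads, m, rng=None):
    """Probe only m of the N nodes, chosen at random, and return the
    index of the most lightly loaded node among those probed."""
    rng = rng or random.Random(0)
    probed = rng.sample(range(len(loads)), m)   # m distinct nodes out of N
    return min(probed, key=lambda node: loads[node])

loads = [7, 2, 9, 4, 1, 8]                      # hypothetical per-node loads
node = pick_lightest_by_probing(loads, m=3)     # decision based on 3 probes, not 6
```

Setting m = N degenerates into the poorly scaling "ask everyone" scheme the slide warns about; a small fixed m keeps message traffic constant per decision.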
  13. Resource Management
     ◦ Fault tolerance: A good scheduling algorithm should not be disabled by the crash of one or more nodes of the system. Also, if the nodes are partitioned into two or more groups due to link failures, the algorithm should be able to function properly for the nodes within a group. Algorithms that have decentralized decision-making capability and consider only available nodes in their decisions have better fault tolerance.
  14. Resource Management
     ◦ Fairness of service: Global scheduling policies that blindly attempt to balance the load on all the nodes of the system are not good from the point of view of fairness of service, because in any load-balancing scheme heavily loaded nodes obtain all the benefits while lightly loaded nodes suffer poorer response times than in a stand-alone configuration. A fair strategy improves the response time of the former without unduly affecting the latter. Hence load balancing has to be replaced by the concept of load sharing: a node shares some of its resources as long as its own users are not significantly affected.
  15. Resource Management – task assignment approach, assumptions:
     1. A process has already been split up into pieces called tasks. The split occurs along natural boundaries (such as a method), so that each task has integrity in itself and data transfers among the tasks are minimized.
     2. The amount of computation required by each task and the speed of each CPU are known.
     3. The cost of processing each task on every node is known (derived from assumption 2).
     4. The IPC cost between every pair of tasks is known; it is 0 for tasks assigned to the same node. It is usually estimated by an analysis of the static program: if two tasks communicate n times and the average time for each inter-task communication is t, then the IPC cost for the two tasks is n * t.
     5. Precedence relationships among the tasks are known.
     6. Reassignment of tasks is not possible.
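Assumption 4 amounts to a one-line formula. The helper below is a sketch with hypothetical names; the zero-cost case for co-located tasks follows the assumption above.

```python
def ipc_cost(n_messages, avg_time_per_message, same_node=False):
    """IPC cost between two tasks: n * t, or 0 when both run on the same node."""
    return 0 if same_node else n_messages * avg_time_per_message

# Two tasks exchanging 10 messages averaging 3 time units each:
cost = ipc_cost(10, 3)        # 30 time units when the tasks sit on different nodes
```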
  16. Resource Management – goals of task assignment:
     ◦ Minimization of IPC costs
     ◦ Quick turnaround time for the complete process
     ◦ A high degree of parallelism
     ◦ Efficient utilization of system resources in general
     These goals often conflict. For example, minimizing IPC costs tends to assign all tasks of a process to a single node, while efficient utilization of system resources tries to distribute the tasks evenly among the nodes. Similarly, while quick turnaround time and a high degree of parallelism encourage parallel execution of the tasks, the precedence relationships among the tasks limit their parallel execution. Also note that with m tasks and q nodes there are q^m possible assignments of tasks to nodes. In practice, the actual number of possible assignments may be less than q^m, because certain tasks cannot be assigned to certain nodes due to their specific requirements (e.g., they need a certain amount of memory or a certain data file).
  17. Resource Management – example: two nodes {n1, n2} and six tasks {t1, t2, t3, t4, t5, t6}. There are two task assignment parameters: the task execution cost (xab, the cost of executing task a on node b) and the inter-task communication cost (cij, the cost of communication between tasks i and j).

     Inter-task communication costs:
          t1  t2  t3  t4  t5  t6
     t1    0   6   4   0   0  12
     t2    6   0   8  12   3   0
     t3    4   8   0   0  11   0
     t4    0  12   0   0   0   0
     t5    0   3  11   0   0   0
     t6   12   0   0   0   0   0

     Execution costs:
          n1  n2
     t1    5  10
     t2    2   ∞
     t3    4   4
     t4    6   3
     t5    5   2
     t6    ∞   4

     Task t6 cannot be executed on node n1 and task t2 cannot be executed on node n2 (cost ∞), since the resources they need are not available on those nodes.
  18. Resource Management – comparing assignments:
     1) Serial assignment: tasks t1, t2, t3 on node n1; tasks t4, t5, t6 on node n2.
        Execution cost x = x11 + x21 + x31 + x42 + x52 + x62 = 5 + 2 + 4 + 3 + 2 + 4 = 20
        Communication cost c = c14 + c15 + c16 + c24 + c25 + c26 + c34 + c35 + c36
                             = 0 + 0 + 12 + 12 + 3 + 0 + 0 + 11 + 0 = 38
        Total cost = 58
     2) Optimal assignment: tasks t1, t2, t3, t4, t5 on node n1; task t6 on node n2.
        Execution cost x = x11 + x21 + x31 + x41 + x51 + x62 = 5 + 2 + 4 + 6 + 5 + 4 = 26
        Communication cost c = c16 + c26 + c36 + c46 + c56 = 12 + 0 + 0 + 0 + 0 = 12
        Total cost = 38
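The two totals can be checked mechanically. This Python sketch transcribes the example tables above (INF marks the forbidden placements of t2 and t6; tasks and nodes are 0-indexed here):

```python
INF = float("inf")

# Execution costs x[task][node] and symmetric communication costs c[i][j],
# transcribed from the example tables above.
x = [[5, 10], [2, INF], [4, 4], [6, 3], [5, 2], [INF, 4]]
c = [[0, 6, 4, 0, 0, 12],
     [6, 0, 8, 12, 3, 0],
     [4, 8, 0, 0, 11, 0],
     [0, 12, 0, 0, 0, 0],
     [0, 3, 11, 0, 0, 0],
     [12, 0, 0, 0, 0, 0]]

def total_cost(assign):
    """Total cost of an assignment; assign[i] is the node (0 or 1) of task i.
    Communication cost is only paid between tasks on different nodes."""
    exec_cost = sum(x[i][assign[i]] for i in range(len(assign)))
    comm_cost = sum(c[i][j]
                    for i in range(len(assign))
                    for j in range(i + 1, len(assign))
                    if assign[i] != assign[j])
    return exec_cost + comm_cost

serial  = [0, 0, 0, 1, 1, 1]   # t1-t3 on n1, t4-t6 on n2 -> 58
optimal = [0, 0, 0, 0, 0, 1]   # t1-t5 on n1, t6 on n2    -> 38
```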
  19. Resource Management: Optimal assignments are found by first creating a static assignment graph. In this graph, the weights of the edges joining pairs of task nodes represent inter-task communication costs. The weight on the edge joining a task node to node n1 represents the execution cost of that task on node n2, and vice versa. We then determine a minimum cutset in this graph. A cutset is a set of edges whose removal partitions the nodes of the graph into two disjoint subsets, one reachable from n1 and the other reachable from n2; each task node is reachable from either n1 or n2. The weight of a cutset is the sum of the weights of its edges, which sums up the execution and communication costs of the corresponding assignment. An optimal assignment is therefore found by finding a minimum cutset.
  20. Resource Management (figure: static assignment graph and its minimum cutset)
  21. Resource Management – a taxonomy of load-balancing algorithms:
     ◦ Load-balancing algorithms are Static or Dynamic
     ◦ Static algorithms are Deterministic or Probabilistic
     ◦ Dynamic algorithms are Centralized or Distributed
     ◦ Distributed algorithms are Cooperative or Noncooperative
  22. Resource Management – Static versus Dynamic
     ◦ Static algorithms use only information about the average behavior of the system
     ◦ Static algorithms ignore the current state or load of the nodes in the system
     ◦ Dynamic algorithms collect state information and react to changes in the system state
     ◦ Static algorithms are much simpler
     ◦ Dynamic algorithms can give significantly better performance
  23. Resource Management – Deterministic versus Probabilistic
     ◦ Deterministic algorithms use information about the properties of the nodes and the characteristics of the processes to be scheduled
     ◦ Probabilistic algorithms use information about static attributes of the system (e.g., number of nodes, processing capability, topology) to formulate simple process placement rules
     ◦ The deterministic approach is difficult to optimize; the probabilistic approach has poor performance
  24. Resource Management – Centralized versus Distributed
     ◦ The centralized approach collects state information at a server node, which makes the assignment decisions
     ◦ The distributed approach distributes the decision-making entities over a predefined set of nodes
     ◦ Centralized algorithms can make efficient decisions but have lower fault tolerance
     ◦ Distributed algorithms avoid the bottleneck of collecting state information and react faster
  25. Resource Management – Cooperative versus Noncooperative
     ◦ In noncooperative algorithms, entities act autonomously and make scheduling decisions independently of other entities
     ◦ In cooperative algorithms, the distributed entities cooperate with each other
     ◦ Cooperative algorithms are more complex and involve larger overhead
     ◦ The stability of cooperative algorithms is better
  26. Resource Management – issues in designing load-balancing algorithms:
     ◦ Load estimation policy: determines how to estimate the workload of a node
     ◦ Process transfer policy: determines whether to execute a process locally or remotely
     ◦ State information exchange policy: determines how to exchange load information among nodes
     ◦ Location policy: determines to which node a transferable process should be sent
     ◦ Priority assignment policy: determines the priority of execution of local and remote processes
     ◦ Migration limiting policy: determines the total number of times a process can migrate
  27. Resource Management – load estimation
     ◦ To balance the workload on all the nodes of the system, it is necessary to decide how to measure the workload of a particular node
     ◦ Some measurable parameters (with time- and node-dependent factors) are: the total number of processes on the node; the resource demands of these processes; the instruction mixes of these processes; and the architecture and speed of the node's processor
     ◦ Several load-balancing algorithms use the total number of processes as the load estimate, because it is simple and cheap to obtain
  28. Resource Management: In some cases the true load can vary widely depending on the remaining service time, which can be estimated in several ways:
     ◦ The memoryless method assumes that all processes have the same expected remaining service time, independent of the time used so far
     ◦ The past-repeats method assumes that the remaining service time is equal to the time used so far
     ◦ The distribution method states that if the distribution of service times is known, a process's remaining service time is the expected remaining time conditioned on the time already used
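The first two estimators are trivial to state in code. This is a sketch with hypothetical names; the distribution method is omitted, since it needs the full service-time distribution as input.

```python
def remaining_memoryless(time_used, mean_service_time):
    """Memoryless: expected remaining time is the same no matter how long
    the process has already run."""
    return mean_service_time

def remaining_past_repeats(time_used):
    """Past-repeats: the remaining service time equals the time used so far."""
    return time_used
```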
  29. Resource Management
     ◦ None of the previous methods can be used in modern systems, because of periodically running processes and daemons
     ◦ An acceptable load estimation policy in such systems is to measure the CPU utilization of the nodes
     ◦ CPU utilization is defined as the number of CPU cycles actually executed per unit of real time
     ◦ It can be measured by setting up a timer that periodically checks the CPU state (idle/busy)
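The timer-based measurement reduces to counting busy samples. In this sketch the periodic busy/idle readings are just a list of booleans; a real system would obtain them from the kernel.

```python
def cpu_utilization(samples):
    """Estimate CPU utilization as the fraction of periodic timer samples
    that found the CPU busy. `samples` holds True (busy) or False (idle),
    one entry per timer tick."""
    return sum(samples) / len(samples)

# 10 timer interrupts, CPU found busy at 6 of them:
u = cpu_utilization([True] * 6 + [False] * 4)
```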
  30. Resource Management – process transfer policy
     ◦ Most algorithms use a threshold policy to decide whether a node is lightly loaded or heavily loaded
     ◦ The threshold value is a limiting value of a node's workload, which can be determined by:
       - Static policy: a predefined threshold value for each node, depending on its processing capability
       - Dynamic policy: the threshold value is calculated from the average workload and a predefined constant
     ◦ Below the threshold value a node accepts processes to execute; above the threshold value a node tries to transfer processes to a lightly loaded node
  31. Resource Management
     ◦ A single-threshold policy may lead to an unstable algorithm, because an underloaded node can become overloaded right after a process migrates to it
     ◦ To reduce this instability, the double-threshold policy, also known as the high-low policy, has been proposed: a high mark and a low mark divide a node's load into overloaded, normal, and underloaded regions, instead of the single threshold's two regions (overloaded and underloaded)
  32. Resource Management – double-threshold policy
     ◦ When a node is in the overloaded region, new local processes are sent to run remotely, and requests to accept remote processes are rejected
     ◦ When a node is in the normal region, new local processes run locally, and requests to accept remote processes are rejected
     ◦ When a node is in the underloaded region, new local processes run locally, and requests to accept remote processes are accepted
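The three regions and the decisions they imply can be encoded directly. This is a sketch; the threshold values and load units are illustrative assumptions.

```python
def region(load, low, high):
    """Classify a node's load under the double-threshold (high-low) policy."""
    if load > high:
        return "overloaded"
    if load < low:
        return "underloaded"
    return "normal"

def run_new_process_locally(load, low, high):
    return region(load, low, high) != "overloaded"   # only overloaded nodes send work away

def accept_remote_process(load, low, high):
    return region(load, low, high) == "underloaded"  # only underloaded nodes accept work
```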
  33. Resource Management – location policies
     ◦ Threshold method: the policy selects a random node and checks whether that node is able to receive the process; if so, the process is transferred. If the node rejects it, another node is selected at random. This continues until the probe limit is reached.
     ◦ Shortest method: L distinct nodes are chosen at random, and each is polled to determine its load. The process is transferred to the node with the minimum load value, unless that node's workload prohibits it from accepting the process. A simple improvement is to stop probing as soon as a node with zero load is encountered.
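Both location policies are easy to sketch in Python. The randomness is seeded so the example is reproducible; `can_accept`, the load values, and the function names are illustrative assumptions.

```python
import random

def threshold_policy(loads, can_accept, probe_limit, rng=None):
    """Randomly probe nodes until one can accept the process or the
    probe limit is reached; return the chosen node, or None on failure."""
    rng = rng or random.Random(1)
    for _ in range(probe_limit):
        node = rng.randrange(len(loads))
        if can_accept(loads[node]):
            return node
    return None

def shortest_policy(loads, l, rng=None):
    """Poll l distinct random nodes and pick the least loaded one,
    stopping early if an idle (zero-load) node is found."""
    rng = rng or random.Random(1)
    best = None
    for node in rng.sample(range(len(loads)), l):
        if loads[node] == 0:
            return node            # the suggested improvement: stop at zero load
        if best is None or loads[node] < loads[best]:
            best = node
    return best
```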
  34. Resource Management – bidding method
     ◦ Nodes contain managers (to send processes) and contractors (to receive processes)
     ◦ Managers broadcast a request for bids; contractors respond with bids (prices based on the capacity of the contractor node), and the manager selects the best offer
     ◦ The winning contractor is notified and asked whether it accepts the process for execution or not
     ◦ Full autonomy for the nodes regarding scheduling
     ◦ Large communication overhead
     ◦ It is difficult to decide on a good pricing policy
  35. Resource Management – pairing
     ◦ In contrast to the former methods, the pairing policy reduces the variance of load only between pairs of nodes
     ◦ Each node asks some randomly chosen node to form a pair with it
     ◦ If it receives a rejection, it randomly selects another node and tries to pair again
     ◦ Two nodes that differ greatly in load are temporarily paired with each other, and migration starts
     ◦ The pair is broken as soon as the migration is over
     ◦ A node only tries to find a partner if it has at least two processes
  36. Resource Management – state information exchange
     ◦ Dynamic policies require frequent exchange of state information, but these extra messages have two opposing effects: increasing the number of messages yields more accurate scheduling decisions, but it also raises the queuing time of messages
     ◦ State information policies can be the following: periodic broadcast; broadcast when state changes; on-demand exchange; exchange by polling
  37. Resource Management
     ◦ Periodic broadcast: each node broadcasts its state information after the elapse of every T units of time. Problems: heavy traffic, fruitless messages, and poor scalability, since the information exchange is too large for networks with many nodes
     ◦ Broadcast when state changes: avoids fruitless messages by broadcasting the state only when a process arrives or departs. A further improvement is to broadcast only when the state switches to another region (double-threshold policy)
  38. Resource Management
     ◦ On-demand exchange: a node broadcasts a State-Information-Request message when its state switches from the normal region to either the underloaded or the overloaded region. On receiving this message, the other nodes reply with their own state information. A further improvement is that only those nodes reply that are useful to the requesting node
     ◦ Exchange by polling: to avoid the poor scalability of broadcast messages, a partner node is searched for by polling the other nodes one by one, until an appropriate one is found or the poll limit is reached
  39. Resource Management – priority assignment policies
     ◦ Selfish: local processes are given higher priority than remote processes. Worst response time performance of the three policies.
     ◦ Altruistic: remote processes are given higher priority than local processes. Best response time performance of the three policies.
     ◦ Intermediate: when the number of local processes is greater than or equal to the number of remote processes, local processes are given higher priority than remote processes; otherwise, remote processes are given higher priority.
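The three policies collapse into one small decision function. This is a sketch; the function and parameter names are illustrative.

```python
def local_has_priority(policy, n_local, n_remote):
    """Return True when local processes outrank remote ones under the policy."""
    if policy == "selfish":
        return True                    # locals always win
    if policy == "altruistic":
        return False                   # remotes always win
    if policy == "intermediate":
        return n_local >= n_remote     # majority side wins
    raise ValueError("unknown policy: " + policy)
```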
  40. Resource Management – migration limiting policy: determines the total number of times a process can migrate
     ◦ Uncontrolled: a remote process arriving at a node is treated just like a process originating at that node, so a process may be migrated any number of times
     ◦ Controlled: avoids the instability of the uncontrolled policy by using a migration count parameter to fix a limit on the number of times a process can migrate. Under an irrevocable migration policy the migration count is fixed at 1; for long-running processes the migration count must be greater than 1 to adapt to dynamically changing states
  41. Resource Management
     ◦ Drawbacks of the load-balancing approach: attempting to equalize the workload on all the nodes is not an appropriate objective, since gathering exact state information generates large overhead; moreover, exact load balancing is not achievable, because the number of processes on a node fluctuates continually and temporary imbalance among the nodes exists at every moment
     ◦ Basic ideas of the load-sharing approach: it is necessary and sufficient to prevent nodes from being idle while some other nodes have more than two processes; load sharing is much simpler than load balancing, since it only attempts to ensure that no node is idle while a heavily loaded node exists; the priority assignment policy and the migration limiting policy are the same as for the load-balancing algorithms
  42. Resource Management
     ◦ Since load-sharing algorithms simply attempt to avoid idle nodes, it is sufficient to know whether a node is busy or idle
     ◦ Thus these algorithms normally employ the simplest load estimation policy: counting the total number of processes
     ◦ In modern systems, where several processes may exist permanently even on an idle node, algorithms measure CPU utilization to estimate the load of a node
  43. Resource Management
     ◦ Load-sharing algorithms normally use an all-or-nothing strategy: the threshold value of all the nodes is fixed at 1
     ◦ A node becomes a receiver node when it has no process, and a sender node when it has more than one process
     ◦ To avoid wasting processing power on nodes that momentarily have zero processes, some load-sharing algorithms use a threshold value of 2 instead of 1
     ◦ When CPU utilization is used as the load estimation policy, the double-threshold policy should be used as the process transfer policy
  44. Resource Management – the location policy decides whether the sender node or the receiver node of the process takes the initiative to search for a suitable node in the system:
     ◦ Sender-initiated location policy: the sender node decides where to send the process; heavily loaded nodes search for lightly loaded nodes
     ◦ Receiver-initiated location policy: the receiver node decides from where to get a process; lightly loaded nodes search for heavily loaded nodes
  45. Resource Management
     ◦ Sender-initiated location policy: when a node becomes overloaded, it either broadcasts a message or randomly probes the other nodes one by one to find a node that is able to receive remote processes. When broadcasting, a suitable node is known as soon as a reply arrives
     ◦ Receiver-initiated location policy: when a node becomes underloaded, it either broadcasts a message or randomly probes the other nodes one by one to indicate its willingness to receive remote processes
     ◦ Receiver-initiated policies require a preemptive process migration facility, since scheduling decisions are usually made at process departure epochs
  46. Resource Management – experience with location policies
     ◦ Both policies give substantial performance advantages over the situation in which no load sharing is attempted
     ◦ The sender-initiated policy is preferable at light to moderate system loads
     ◦ The receiver-initiated policy is preferable at high system loads
     ◦ The sender-initiated policy provides better performance when the cost of transferring a process is significantly higher under the receiver-initiated policy than under the sender-initiated policy, due to the preemptive transfer of processes
  47. Resource Management
     ◦ In load-sharing algorithms it is not necessary for the nodes to periodically exchange state information; a node only needs to know the state of other nodes when it is either underloaded or overloaded
     ◦ Broadcast when state changes: under a sender-initiated/receiver-initiated location policy, a node broadcasts a State-Information-Request when it becomes overloaded/underloaded. This is called the broadcast-when-idle policy when a receiver-initiated policy is used with a fixed threshold value of 1
     ◦ Poll when state changes: in large networks a polling mechanism is used, which randomly asks different nodes for state information until an appropriate one is found or the poll limit is reached. This is called the poll-when-idle policy when a receiver-initiated policy is used with a fixed threshold value of 1
  48. Resource Management – summary
     ◦ The resource manager of a distributed system schedules processes to optimize a combination of resource usage, response time, network congestion, and scheduling overhead
     ◦ Three different approaches have been discussed: the task assignment approach deals with the assignment of tasks so as to minimize inter-process communication costs and improve the turnaround time of the complete process, taking some constraints into account; in the load-balancing approach, process assignment decisions attempt to equalize the average workload on all the nodes of the system; in the load-sharing approach, process assignment decisions attempt to keep all the nodes busy if there are sufficient processes in the system for all the nodes
  49. Resource Management – component faults
     ◦ Transient faults occur once and then disappear. E.g., a bird flying through the beam of a microwave transmitter may cause lost bits on some network; if the operation is retried, it may work.
     ◦ Intermittent faults occur, vanish, reappear, and so on. E.g., a loose contact on a connector.
     ◦ Permanent faults continue to exist until the faulty component is repaired. E.g., burnt-out chips, software bugs, and disk head crashes.
  50. Resource Management – there are two types of processor faults:
     1. Fail-silent faults: a faulty processor just stops and does not respond
     2. Byzantine faults: a faulty processor continues to run but gives wrong answers
  51. Resource Management – synchronous systems: a system that, when working, always responds to a message within a known finite bound is said to be synchronous; otherwise, it is asynchronous.
  52. Resource Management – there are three kinds of fault tolerance approaches:
     1. Information redundancy: extra bits allow recovery from garbled bits (e.g., Hamming codes)
     2. Time redundancy: do the operation again (using atomic transactions; useful for transient and intermittent faults)
     3. Physical redundancy: add extra components. There are two ways to organize the extra physical equipment: active replication (use all the components at the same time) and primary backup (use the backup if the primary fails)
  53. Resource Management (figure: triple modular redundancy – each of the three stages A, B, C is triplicated as A1–A3, B1–B3, C1–C3, with voters V1–V9 between successive stages)
  54. Resource Management
     ◦ A system is said to be k-fault tolerant if it can survive faults in k components and still meet its specifications
     ◦ k+1 processors can tolerate k fail-stop faults: if k of them fail, the one left can do the work. But 2k+1 processors are needed to tolerate k Byzantine faults, because even if k processors send out wrong replies, there are still k+1 processors giving the correct answer, so a correct answer can be obtained by majority vote
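The 2k+1 argument is just majority voting over the replies. This is a sketch; the reply values are illustrative.

```python
from collections import Counter

def majority_vote(replies):
    """Return the answer given by a strict majority of processors, or None
    when no answer reaches a majority."""
    answer, count = Counter(replies).most_common(1)[0]
    return answer if count > len(replies) // 2 else None

# 2k+1 = 3 processors tolerate k = 1 Byzantine fault:
result = majority_vote([42, 42, 99])   # the one faulty reply is outvoted
```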
  55. Resource Management – the primary-backup protocol:
     1. The client sends a request to the primary
     2. The primary does the work
     3. The primary sends an update to the backup
     4. The backup does the work
     5. The backup sends an acknowledgement to the primary
     6. The primary sends the reply to the client
  56. Resource Management – backward recovery uses checkpoints. In the checkpointing method, two undesirable situations can occur:
     ◦ Lost message: the state of process Pi indicates that it has sent a message m to process Pj, but Pj has no record of receiving this message
     ◦ Orphan message: the state of process Pj is such that it has received a message m from process Pi, but the state of Pi is such that it has never sent the message m to Pj
  57. Resource Management
     ◦ A strongly consistent set of checkpoints is a set of local checkpoints such that there is no orphan or lost message
     ◦ A consistent set of checkpoints is a set of local checkpoints such that there is no orphan message
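Given the send and receive records captured in a set of local checkpoints, both anomalies can be detected with set differences. This is a sketch; the message identifiers are hypothetical.

```python
def message_anomalies(sent, received):
    """Return (lost, orphan) for a set of local checkpoints:
    lost   = messages recorded as sent but not as received;
    orphan = messages recorded as received but not as sent."""
    return sent - received, received - sent

lost, orphan = message_anomalies({"m1", "m2"}, {"m1", "m3"})
# strongly consistent iff both sets are empty; consistent iff orphan is empty
```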
  58. Resource Management (figure: processes Pi and Pj with their current checkpoints; Pi sends message m to Pj, after which a failure occurs)
  59. Resource Management
     ◦ A process Pi needs to take a checkpoint only if there is another process Pj that has taken a checkpoint that includes the receipt of a message from Pi, and Pi has not recorded the sending of this message
     ◦ In this way no orphan messages are generated
  60. Resource Management: In asynchronous checkpointing, each process takes its checkpoints independently, without any coordination.
  61. Resource Management: Synchronous checkpoints are established at longer intervals, while asynchronous checkpoints are taken at shorter intervals; that is, within one synchronous period there are several asynchronous periods.
  62. Resource Management – the two-army problem: Two blue armies must reach agreement to attack a red army; if one blue army attacks by itself, it will be slaughtered. They can communicate only over an unreliable channel: by sending a messenger who is subject to capture by the red army. Under these conditions they can never reach agreement on attacking.
  63. Resource Management: Now assume the communication is perfect but the processors are not. This classical problem is called the Byzantine generals problem: there are N generals, M of whom are traitors. Can the loyal generals reach agreement?
  64. Resource Management (figure: the Byzantine generals problem with N = 4 and M = 1 – loyal generals 1, 2, and 4 report their true values 1, 2, and 4, while traitor general 3 sends different values x, y, and z to generals 1, 2, and 4)
  65. Resource Management
     ◦ After the first round: 1 got (1,2,x,4); 2 got (1,2,y,4); 3 got (1,2,3,4); 4 got (1,2,z,4)
     ◦ After the second round: 1 got (1,2,y,4), (a,b,c,d), (1,2,z,4); 2 got (1,2,x,4), (e,f,g,h), (1,2,z,4); 4 got (1,2,x,4), (1,2,y,4), (i,j,k,l)
     ◦ Taking majorities: 1 got (1,2,_,4); 2 got (1,2,_,4); 4 got (1,2,_,4)
     ◦ So all the good generals know that 3 is the bad guy
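The per-position majority that exposes general 3 can be replayed in a few lines. This is a sketch of the slide's example; a, b, c, d stand for the arbitrary values the traitor relays in the second round.

```python
from collections import Counter

def elementwise_majority(vectors):
    """Majority value at each position across the received vectors;
    '_' marks positions where no value has a strict majority."""
    result = []
    for column in zip(*vectors):
        value, count = Counter(column).most_common(1)[0]
        result.append(value if count > len(column) // 2 else "_")
    return result

# The three vectors loyal general 1 holds after the second round:
seen_by_1 = [(1, 2, "y", 4), ("a", "b", "c", "d"), (1, 2, "z", 4)]
majority_1 = elementwise_majority(seen_by_1)
```

Positions 1, 2, and 4 agree in two of the three vectors, so the loyal values survive; only position 3 (the traitor) yields no majority.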
  66. Resource Management
     ◦ If there are m faulty processors, agreement can be achieved only if 2m+1 correctly functioning processors are present, for a total of 3m+1
     ◦ With n = 3 and m = 1, agreement cannot be reached
  67. Resource Management (figure: n = 3, m = 1 – loyal generals 1 and 2 exchange their values, while traitor general 3 sends x to 1 and y to 2)
     ◦ After the first round: 1 got (1,2,x); 2 got (1,2,y); 3 got (1,2,3)
     ◦ After the second round: 1 got (1,2,y), (a,b,c); 2 got (1,2,x), (d,e,f)
     ◦ There is no majority, so an agreement cannot be reached