Introduction
• A resource can be logical, such as a shared file, or physical, such as a CPU.
• The set of available resources in a distributed system acts like a single virtual system.
Cont…
• Resource manager:
  – Controls the assignment of resources to processes.
  – Routes processes to suitable nodes of the system so that resource usage, response time, network congestion, and scheduling overhead are optimized.
Techniques
Task assignment approach:
• Each process submitted by a user for processing is viewed as a collection of related tasks.
• These tasks are scheduled to suitable nodes to improve performance.
Cont…
Load-balancing approach:
• All the processes submitted by the users are distributed among the nodes of the system.
• The workload is equalized among the nodes.
Cont…
Load-sharing approach:
• Attempts to conserve the ability of the system to perform work by assuring that no node is idle while processes wait to be processed.
Desirable features of a good scheduling algorithm
• No a priori knowledge about the processes.
• Dynamic in nature.
• Quick decision-making capability.
• Balanced system performance.
• Stability.
• Scalability.
• Fault tolerance.
• Fairness of service.
Task assignment approach
• A process is considered to be composed of multiple tasks.
• The goal is to find an optimal assignment policy for the tasks of an individual process.
Cont…
Assumptions:
1. A process has already been split into pieces called tasks.
2. The amount of computation required by each task and the speed of each processor are known.
3. The cost of processing each task on every node of the system is known.
4. The IPC costs between every pair of tasks are known.
Cont…
5. Other constraints, such as the resource requirements of the tasks and the resources available at each node, are also known.
6. Reassignment of the tasks is generally not possible.
Cont…
• Goals:
  – Minimization of IPC costs.
  – Quick turnaround time for the complete process.
  – A high degree of parallelism.
  – Efficient utilization of system resources in general.
• These goals often conflict with each other. For example, spreading tasks across many nodes increases parallelism but also increases IPC costs.
Cont…
• Two task assignment parameters:
  – Task execution cost.
  – Inter-task communication cost.
Cont… : Example
• Total tasks = 6
• Total nodes = 2
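The slide's cost tables are not reproduced in this text, so the sketch below uses hypothetical execution and IPC costs for 6 tasks on 2 nodes. It shows one straightforward way to find an optimal assignment under the stated assumptions: the total cost of an assignment is the execution cost of each task on its node plus the IPC cost of every communicating pair placed on different nodes, and all assignments are enumerated. The function names `assignment_cost` and `best_assignment` are illustrative, not from the source.

```python
import itertools
import math

INF = math.inf  # marks a task that cannot run on a given node


def assignment_cost(assign, exec_cost, ipc_cost):
    """Execution cost of every task on its assigned node, plus the IPC
    cost of every pair of communicating tasks placed on different nodes."""
    total = sum(exec_cost[task][node] for task, node in enumerate(assign))
    for (a, b), cost in ipc_cost.items():
        if assign[a] != assign[b]:
            total += cost
    return total


def best_assignment(exec_cost, ipc_cost, num_nodes):
    """Try all num_nodes ** num_tasks assignments and keep the cheapest."""
    best, best_cost = None, INF
    for assign in itertools.product(range(num_nodes), repeat=len(exec_cost)):
        cost = assignment_cost(assign, exec_cost, ipc_cost)
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost


# Hypothetical costs for 6 tasks on 2 nodes: exec_cost[task][node].
exec_cost = [[5, 10], [2, INF], [4, 4], [6, 3], [5, 2], [INF, 4]]
# Hypothetical IPC costs; pairs not listed do not communicate.
ipc_cost = {(0, 1): 6, (0, 2): 4, (0, 5): 12, (1, 2): 8, (1, 3): 12,
            (1, 4): 3, (2, 4): 11, (3, 4): 5}
example_assign, example_cost = best_assignment(exec_cost, ipc_cost, num_nodes=2)
print(example_assign, example_cost)
```

Exhaustive search is only practical for small examples like this one (2^6 = 64 assignments); it is used here for clarity, not as a recommended algorithm for larger task graphs.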
Load balancing approach
• Load-balancing algorithms are also known as load-leveling algorithms.
  – They are based on the intuition that an evenly balanced load leads to better resource utilization.
• The algorithm tries to balance the total system load by transparently transferring workload from heavily loaded nodes to lightly loaded nodes.
• Goal: maximize the total system throughput.
Static vs. Dynamic
• Static algorithms:
  – Use only information about the average behavior of the system, ignoring its current state.
  – Simpler, because there is no need to maintain and process system state information.
  – Do not react to the current system state.
Cont…
• Dynamic algorithms:
  – React to the system state, which changes dynamically.
  – Able to avoid those states with unnecessarily poor performance.
  – More complex than static algorithms.
Deterministic vs. Probabilistic
• Both are static load-balancing algorithms.
• Deterministic algorithms:
  – Use information about the properties of the nodes and the characteristics of the processes.
  – Difficult to optimize and cost more to implement.
Cont…
• Probabilistic algorithms:
  – Use information about static attributes of the system, such as the number of nodes and the processing capability of each node.
  – Easier to implement.
  – Tend to perform poorly.
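As one simple illustration of a probabilistic static algorithm, the sketch below assumes the only static attribute used is each node's relative processing capacity (the capacities are hypothetical). Each new process is sent to a node chosen with probability proportional to that capacity; the current load is never consulted, which is what makes the scheme static.

```python
import random


def pick_node(capacities, rng=random):
    """Choose a node with probability proportional to its static
    processing capacity; no current-load information is used."""
    nodes = list(capacities)
    weights = [capacities[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


capacities = {"A": 4, "B": 2, "C": 1}  # hypothetical relative CPU speeds
rng = random.Random(42)
counts = {n: 0 for n in capacities}
for _ in range(7000):
    counts[pick_node(capacities, rng)] += 1
print(counts)  # expect roughly a 4:2:1 split
```

Because the choice ignores the actual load, a fast node can still receive work while it is temporarily swamped, which is one reason such algorithms tend to perform poorly.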
Centralized vs. Distributed
• Centralized algorithm:
  – The responsibility of scheduling physically resides on a single node.
  – System state information is collected at a single node, at which all the scheduling decisions are made.
    • This node is known as the centralized server node.
Cont…
  – Problem: reliability.
    • If the centralized server fails, all scheduling in the system ceases.
  – Solution: replicate the server on k + 1 nodes if it is to survive k faults.
Cont…
• Distributed algorithms:
  – The work involved in making process assignment decisions is physically distributed among the various nodes of the system.
  – Avoids the bottleneck of collecting state information at a single node.
  – Allows the scheduler to react quickly to dynamic changes in the system state.
Cont…
  – The algorithm is composed of entities known as local controllers.
  – Each entity is responsible for making scheduling decisions for the processes of its own node.
Cooperative vs. Non-cooperative
• Non-cooperative algorithms:
  – Individual entities act as autonomous entities and make scheduling decisions independently of the actions of other entities.
Cont…
• Cooperative algorithms:
  – Distributed entities cooperate with each other to make scheduling decisions.
  – More complex and involve larger overhead than non-cooperative algorithms.
Issues in designing load-balancing algorithms
• Load estimation policy
• Process transfer policy
• State information exchange policy
• Location policy
• Priority assignment policy
• Migration-limiting policy
Cont…
• Local process: a process that is processed at its originating node.
• Remote process: a process that is processed at a node different from the one on which it originated.
Load Estimation Policies
• Estimation based on parameters such as:
  1. Total number of processes on the node at the time of load estimation.
  2. Resource demands of these processes.
  3. Instruction mixes of these processes.
  4. Architecture and speed of the node’s processor.
Cont…
• The sum of the remaining service times of all the processes on a node can serve as a measure of the node’s workload.
• Issue: how to estimate the remaining service time of the processes?
Cont…
• Solutions:
  1. Memoryless method
    • Assumes that all processes have the same expected remaining service time, independent of the time used so far.
    • Reduces load estimation to counting the total number of processes.
Cont…
  2. Pastrepeats
    • Assumes that the remaining service time of a process is equal to the time it has used so far.
  3. Distribution method
    • If the distribution of service times is known, a process’s remaining service time is estimated as the expected remaining time conditioned on the time already used.
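The three estimation methods above can be sketched as functions of the time each process on a node has used so far. For the distribution method, the sketch assumes the service-time distribution is approximated by a hypothetical sample of past service times, and computes E[S − a | S > a] from that sample; returning 0 when no sample exceeds the elapsed time is also an assumption.

```python
def load_memoryless(elapsed_times):
    """Memoryless: every process has the same expected remaining time,
    so the load reduces to the number of processes."""
    return len(elapsed_times)


def load_pastrepeats(elapsed_times):
    """Pastrepeats: remaining time of each process = time used so far."""
    return sum(elapsed_times)


def expected_remaining(elapsed, service_samples):
    """Distribution method, using an empirical sample of service times:
    mean of (S - elapsed) over samples with S > elapsed."""
    longer = [s - elapsed for s in service_samples if s > elapsed]
    # Assumption: if no sample ran this long, estimate zero remaining time.
    return sum(longer) / len(longer) if longer else 0.0


def load_distribution(elapsed_times, service_samples):
    return sum(expected_remaining(a, service_samples) for a in elapsed_times)


elapsed = [1.0, 4.0, 10.0]        # hypothetical time used by 3 processes
samples = [2, 3, 5, 8, 20]        # hypothetical past service times
print(load_memoryless(elapsed), load_pastrepeats(elapsed),
      load_distribution(elapsed, samples))
```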
Process Transfer Policies
• Load-balancing algorithms use the threshold policy to decide whether a node is lightly or heavily loaded.
• The threshold value of a node:
  – is the limiting value of its workload;
  – is used to decide whether the node is lightly or heavily loaded.
Cont…
• Methods to determine the threshold value of a node:
  1. Static policy
    • Each node has a predefined threshold value depending on its processing capability.
    • This value does not vary with dynamic changes in the workload at local or remote nodes.
    • No state information is exchanged among the nodes to decide this value.
Cont…
  2. Dynamic policy
    • The threshold value of node i is calculated as the product of the average workload of all the nodes and a predefined constant c_i.
    • Nodes exchange state information using one of the state information exchange policies.
    • Gives a more realistic threshold value for each node.
    • Involves overhead in the exchange of state information.
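The dynamic policy computes T_i = c_i × (average load of all nodes). A minimal sketch, assuming the node loads have already been gathered by some state information exchange policy; the node names, loads, and constants are hypothetical:

```python
def dynamic_thresholds(loads, c):
    """Threshold of node i = c[i] * average workload of all nodes."""
    avg = sum(loads.values()) / len(loads)
    return {node: c[node] * avg for node in loads}


loads = {"A": 2, "B": 6, "C": 4}          # hypothetical current loads
c = {"A": 1.0, "B": 1.5, "C": 1.25}       # hypothetical per-node constants
print(dynamic_thresholds(loads, c))       # {'A': 4.0, 'B': 6.0, 'C': 5.0}
```

A node with a larger c_i (for example, a faster machine) is allowed to carry proportionally more load before it is considered heavily loaded.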
Cont…
• Most load-balancing algorithms use a single threshold and thus have only overloaded and underloaded regions.
Cont… : Single-threshold policy
• A node accepts new processes (either local or remote) based on its load:
  – Accepts if its load is below the threshold value.
  – Rejects if its load is above the threshold value.
• This can make scheduling algorithms unstable: a node whose load is below the threshold when it agrees to accept a remote process may be above the threshold by the time the process arrives, leading to fruitless transfers.
Cont…
  – A node should transfer one or more of its processes to another node only if such transfers greatly improve the performance of the rest of its local processes.
  – A node should accept remote processes only if its load is such that the added workload of processing these incoming processes does not significantly affect the service to its local ones.
Cont… : Double-threshold policy
• Also known as the high-low policy.
• Uses two threshold values: a high mark and a low mark.
• Three regions:
  – Overloaded
  – Normal
  – Underloaded
Cont…
• Overloaded region:
  – New local processes are sent to be run remotely, and requests to accept remote processes are rejected.
• Normal region:
  – New local processes run locally, and requests to accept remote processes are rejected.
Cont…
• Underloaded region:
  – New local processes run locally, and requests to accept remote processes are accepted.
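The three regions and their rules can be captured in a small decision function. This is a sketch under one assumption not stated in the slides: a load exactly equal to the low or high mark counts as normal.

```python
from enum import Enum


class Region(Enum):
    UNDERLOADED = "underloaded"
    NORMAL = "normal"
    OVERLOADED = "overloaded"


def region(load, low, high):
    """Classify a node's load against the low and high marks.
    Assumption: load == low or load == high is treated as normal."""
    if load < low:
        return Region.UNDERLOADED
    if load > high:
        return Region.OVERLOADED
    return Region.NORMAL


def decide(load, low, high):
    """Return (run_new_local_processes_locally, accept_remote_requests)."""
    r = region(load, low, high)
    if r is Region.OVERLOADED:
        return False, False   # send new local work away, refuse remote work
    if r is Region.NORMAL:
        return True, False    # keep local work, refuse remote work
    return True, True         # keep local work, accept remote work
```

The gap between the two marks is what gives stability: a node that accepts a remote process while underloaded usually lands in the normal region, not straight into the overloaded one.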
Location policies : Threshold
• A destination node is selected at random.
• A check is made to determine whether transferring the process to that node would place it in a state that prohibits it from accepting remote processes.
  – If not, the process is transferred to the selected node, which must execute the process regardless of its state when the process actually arrives.
Cont…
• If the check indicates that the selected node would be in a state that prohibits it from accepting remote processes, another node is selected at random and probed in the same manner.
• A static probe limit Lp bounds the number of nodes probed; if no suitable node is found within Lp probes, the process is executed at its originating node.
• The performance with a small probe limit (e.g., 3 or 5) is almost as good as the performance with a large probe limit (e.g., 20).
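A minimal sketch of the Threshold location policy. The callback `can_accept` is hypothetical: it stands in for the probe message and the remote node's check. The sketch also assumes probed nodes are distinct and that the caller excludes the originating node from `candidates`.

```python
import random


def threshold_location(candidates, can_accept, probe_limit=3, rng=random):
    """Probe up to probe_limit (Lp) distinct random nodes.

    can_accept(node): True if the transfer would not push the node into a
    state where it refuses remote processes.
    Returns the chosen node, or None to run the process locally."""
    for node in rng.sample(candidates, min(probe_limit, len(candidates))):
        if can_accept(node):
            return node  # this node must run the process when it arrives
    return None


loads = {"n1": 5, "n2": 0, "n3": 7}  # hypothetical loads
chosen = threshold_location(list(loads), lambda n: loads[n] < 2,
                            probe_limit=3, rng=random.Random(1))
print(chosen)
```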
Cont… : Shortest
• Lp distinct nodes are chosen at random, and each is polled in turn to determine its load.
• The process is transferred to the node with the minimum load, unless that node’s load prohibits it from accepting remote processes.
Cont…
• If none of the polled nodes can accept the process, it is executed at its originating node.
• Probing stops as soon as a node with zero load is encountered.
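The Shortest policy differs from Threshold in that it polls all Lp nodes (unless it finds an idle one) and picks the least loaded. In this sketch, `get_load` is a hypothetical stand-in for polling a node, and `accept_limit` is an assumed rule for "can accept remote processes" (load strictly below the limit).

```python
import random


def shortest_location(candidates, get_load, accept_limit,
                      probe_limit=3, rng=random):
    """Poll probe_limit (Lp) distinct random nodes and pick the least
    loaded one, stopping early at a node with zero load.
    Returns None (run locally) if the best node cannot accept."""
    best, best_load = None, None
    for node in rng.sample(candidates, min(probe_limit, len(candidates))):
        load = get_load(node)
        if best is None or load < best_load:
            best, best_load = node, load
        if load == 0:
            break  # an idle node cannot be beaten
    if best is None or best_load >= accept_limit:
        return None
    return best


loads = {"a": 4, "b": 1, "c": 3}  # hypothetical loads
print(shortest_location(list(loads), loads.get, accept_limit=2))  # b
```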
Cont… : Bidding
• Each node in the network plays two roles: manager and contractor.
• The manager represents a node that has a process in need of a location to execute.
• The contractor represents a node that is able to accept remote processes.
Cont…
• To select a node for its process, a manager broadcasts a request-for-bids message to all other nodes in the system.
• The contractors return bids to the manager node.
• The manager transfers the process to the node with the best bid.
Cont…
• Problem:
  – A contractor may win many bids from many manager nodes at once and thus become overloaded.
• Solution:
  – After choosing the best bid, the manager sends a message to the owner of that bid and transfers the process only when the contractor acknowledges it.
Cont…
• Both manager and contractor are free to make their own decisions.
• Drawbacks of the bidding policy:
  – Communication overhead.
  – Difficulty in deciding on a good pricing policy.
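A sketch of both sides of the bidding policy. The pricing rule in `make_bid` is purely hypothetical (the slides leave the pricing policy open), and so is the choice to fall back to the next-best bid when a contractor declines the confirmation message; another valid design is to restart the whole bidding round.

```python
def make_bid(load, speed):
    """Contractor side. Hypothetical pricing policy:
    faster, less loaded nodes bid higher."""
    return speed / (1 + load)


def bidding_location(bids, confirm):
    """Manager side.

    bids: contractor -> bid value (higher is better).
    confirm(contractor): models the extra message sent to the winner; it
    returns False if the contractor can no longer take the process.
    Returns the contractor that acknowledged, or None to run locally."""
    for contractor in sorted(bids, key=bids.get, reverse=True):
        if confirm(contractor):
            return contractor
    return None


bids = {"n1": make_bid(load=3, speed=2), "n2": make_bid(load=0, speed=1)}
print(bidding_location(bids, confirm=lambda c: True))  # n2
```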
Cont… : Pairing
• This policy reduces the variance of loads only between pairs of nodes of the system.
• Two nodes that differ greatly in load are temporarily paired with each other.
• The load-balancing operation is carried out between the nodes belonging to the same pair.
Cont…
• A node tries to find a partner only if it has at least two processes; otherwise, migration from this node is never reasonable.
• The partner is selected at random.
• The pair is broken as soon as the process migration is over.
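A minimal sketch of how a node might form a pair. The parameter `min_gap` is a hypothetical stand-in for "differ greatly in load", and migrating half the load difference is an assumed way to even out the pair; `get_load` stands in for querying the candidate.

```python
import random


def try_pair(my_load, candidates, get_load, min_gap=2, rng=random):
    """Look for a random partner whose load is lower by at least min_gap.
    Returns (partner, processes_to_migrate) or None."""
    if my_load < 2:
        return None  # migration from this node is never reasonable
    pool = list(candidates)
    rng.shuffle(pool)  # partners are considered in random order
    for partner in pool:
        gap = my_load - get_load(partner)
        if gap >= min_gap:
            return partner, gap // 2  # even out the pair's loads
    return None  # no suitable partner; the pair is never formed


loads = {"b": 5, "c": 1}  # hypothetical loads of other nodes
print(try_pair(6, list(loads), loads.get))  # ('c', 2)
```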
State information exchange policies
1. Periodic broadcast
2. Broadcast when state changes
3. On-demand exchange
4. Exchange by polling
Cont… : Periodic broadcast
• Each node broadcasts its state information every t units of time.
• Generates heavy network traffic.
• Fruitless messages may be broadcast, for example by nodes whose state has not changed during the last t units of time.
• Scales poorly.
Cont… : Broadcast when state changes
• A node broadcasts its state information only when its state changes:
  – In the simplest form, whenever a process arrives at or departs from that node.
  – In a refined form, only when its state switches from the normal load region to either the underloaded region or the overloaded region; this form works with the two-threshold transfer policy.
Cont… : On-demand exchange
• A node broadcasts a StateInformationRequest message when its state switches from the normal load region to either the underloaded region or the overloaded region.
• Receiving nodes send their current state to the requesting node.
• This policy works with the two-threshold transfer policy.
Cont…
• The status of the requesting node is included in the StateInformationRequest message.
• If this status is:
  – Underloaded, only overloaded nodes respond to it.
  – Overloaded, only underloaded nodes respond to it.
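The response rule of on-demand exchange amounts to a symmetric filter: only a node in the opposite extreme region replies. A sketch, assuming statuses are represented as the hypothetical strings "underloaded", "normal", and "overloaded":

```python
def should_respond(requester_status, my_status):
    """An underloaded requester hears only from overloaded nodes, and
    vice versa; nodes in the normal region stay silent."""
    return {requester_status, my_status} == {"underloaded", "overloaded"}


def responders(requester_status, statuses):
    """Nodes that would reply to a StateInformationRequest."""
    return [node for node, status in statuses.items()
            if should_respond(requester_status, status)]


statuses = {"a": "normal", "b": "underloaded", "c": "overloaded"}
print(responders("overloaded", statuses))  # ['b']
```

Filtering at the receivers keeps reply traffic proportional to the number of useful partners rather than to the size of the system.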
Cont… : Exchange by polling
• No need for a node to exchange its state information with all other nodes in the system.
• When a node needs the cooperation of some other node for load balancing, it can search for a suitable partner by randomly polling the other nodes one by one.
Priority assignment policies
• Selfish:
  – Local processes are given higher priority than remote processes.
  – Yields the worst response time performance of the three policies.
    • Poor performance for remote processes.
    • Best response time for local processes.
Cont…
• Altruistic:
  – Remote processes are given higher priority than local processes.
  – Has the best response time performance of the three policies.
Cont…
• Intermediate:
  – The priority of processes depends on the number of local processes and the number of remote processes at the concerned node.
  – If the number of local processes is greater than or equal to the number of remote processes, local processes are given priority; otherwise, remote processes are.
  – Overall response time performance is much closer to that of the altruistic policy.
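The three priority assignment rules can be expressed as a function that reorders a node's ready queue. The queue representation, a list of `(pid, is_local)` pairs, is a hypothetical choice for illustration; the stable sort keeps the original order within each class.

```python
def favored(policy, num_local, num_remote):
    """Which class of processes gets higher priority at this node."""
    if policy == "selfish":
        return "local"
    if policy == "altruistic":
        return "remote"
    if policy == "intermediate":
        return "local" if num_local >= num_remote else "remote"
    raise ValueError(f"unknown policy: {policy}")


def order_ready_queue(queue, policy):
    """queue: list of (pid, is_local). Favored class first (stable sort)."""
    n_local = sum(1 for _, is_local in queue if is_local)
    local_first = favored(policy, n_local, len(queue) - n_local) == "local"
    # Key is False for the favored class, so it sorts ahead.
    return sorted(queue, key=lambda p: p[1] != local_first)


queue = [(1, True), (2, False), (3, True)]
print(order_ready_queue(queue, "altruistic"))  # [(2, False), (1, True), (3, True)]
```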
Migration-limiting policies
• Determine the total number of times a process should be allowed to migrate.
• Two migration-limiting policies:
  – Uncontrolled
  – Controlled
Cont…
• Uncontrolled:
  – A remote process arriving at a node is treated just like a process originating at the node.
  – A process may be migrated any number of times.
Cont…
• Controlled:
  – To overcome the instability problem of the uncontrolled policy, most systems treat remote processes differently from local processes and use a migration count parameter to fix a limit on the number of times a process may migrate.
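Both policies can be sketched with a per-process migration count. Here `max_migrations=None` models the uncontrolled policy and an integer models the controlled one; the `Process` class and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    migration_count: int = 0


def may_migrate(proc, max_migrations=None):
    """Uncontrolled: max_migrations is None, so no limit applies.
    Controlled: migrate only while the count is below the limit."""
    return max_migrations is None or proc.migration_count < max_migrations


def migrate(proc, max_migrations=None):
    """Record a migration if the policy allows it; report success."""
    if not may_migrate(proc, max_migrations):
        return False
    proc.migration_count += 1
    return True


p = Process(pid=1)
print(migrate(p, max_migrations=1), migrate(p, max_migrations=1))  # True False
```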
Load sharing approach
• For proper utilization of the resources of a distributed system, it is not necessary to balance the load on all the nodes.
• It is necessary and sufficient to prevent nodes from being idle while some other nodes have more than two processes.
Cont…
• This approach is often called dynamic load sharing instead of dynamic load balancing.
Issues in designing load-sharing algorithms
• Load-sharing algorithms do not attempt to balance the average workload on all the nodes of the system; they only attempt to ensure that no node is idle while another node is heavily loaded.
• The priority assignment policies and migration-limiting policies are the same as those for load-balancing algorithms. The other policies are described here.
Cont… : Load Estimation Policy
• It is sufficient to know whether a node is busy or idle.
• Methods for estimating load:
  – Count the total number of processes on a node.
  – Measure CPU utilization.
Cont… : Process transfer policies
• All-or-nothing strategy:
  – Uses the single-threshold policy with the threshold value of all nodes fixed at 1: a node accepts a remote process only when it has no processes, and tries to transfer a process as soon as it has more than one.
  – Drawback: loss of available processing power, because a node that becomes idle cannot immediately acquire new work even though processes are waiting at other nodes.
  – Solution: use a threshold value of 2 instead of 1.
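The all-or-nothing decision reduces to comparing the process count with a single threshold. The sketch below (function name and return strings are illustrative) shows how raising the threshold from 1 to 2 lets a node ask for remote work while its last process is still running, instead of only after it has gone idle.

```python
def all_or_nothing(num_processes, threshold=1):
    """Single-threshold load-sharing decision for one node."""
    if num_processes < threshold:
        return "accept"    # candidate for receiving a remote process
    if num_processes > threshold:
        return "transfer"  # candidate for sending a process away
    return "neither"


print([all_or_nothing(n) for n in range(3)])
print([all_or_nothing(n, threshold=2) for n in range(4)])
```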
Sender-initiated location policy
• Heavily loaded nodes search for lightly loaded nodes to which work may be transferred.
• When a node’s load exceeds the threshold value, it either broadcasts a message or randomly probes the other nodes one by one to find a lightly loaded node.
Cont…
• In the broadcast method, the presence or absence of a suitable receiver node is known as soon as the sender node receives reply messages from the other nodes.
Cont…
• In the random probing method, probing continues until either a suitable node is found or the number of probes reaches a static probe limit, Lp.
• A fixed probe limit scales better than the broadcast method.
Receiver-initiated location policy
• Lightly loaded nodes search for heavily loaded nodes from which work may be transferred.
• When a node’s load falls below the threshold value, it either broadcasts a message indicating its willingness to receive processes or randomly probes the other nodes one by one to find a heavily loaded node.
• In the broadcast method, a suitable node is found as soon as the receiver node receives reply messages from the other nodes.
Cont…
• In the random probing method, probing continues until either a suitable node is found or the number of probes reaches a static probe limit, Lp.
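Random probing is the same loop for both policies; only who starts it and what counts as a suitable partner differ. A sketch, assuming a single shared threshold for both roles and a hypothetical `get_load` that stands in for a probe message:

```python
import random


def find_partner(my_load, peers, get_load, threshold, role,
                 probe_limit=3, rng=random):
    """Random probing with a static probe limit (Lp = probe_limit).

    role="sender":   an overloaded node (load > threshold) looks for a
                     node whose load is below the threshold.
    role="receiver": an underloaded node (load < threshold) looks for a
                     node whose load is above the threshold.
    Returns the partner node, or None if no probe succeeds."""
    if role not in ("sender", "receiver"):
        raise ValueError(f"unknown role: {role}")
    if role == "sender" and my_load <= threshold:
        return None  # not overloaded: nothing to send
    if role == "receiver" and my_load >= threshold:
        return None  # not underloaded: no spare capacity
    for peer in rng.sample(peers, min(probe_limit, len(peers))):
        load = get_load(peer)
        if (role == "sender" and load < threshold) or \
           (role == "receiver" and load > threshold):
            return peer
    return None


loads = {"a": 0, "b": 5, "c": 2}  # hypothetical loads of other nodes
print(find_partner(6, list(loads), loads.get, threshold=2, role="sender"))    # a
print(find_partner(0, list(loads), loads.get, threshold=2, role="receiver"))  # b
```

In a receiver-initiated system the process found this way is typically already running on the busy node, which is why such transfers can be preemptive and more expensive.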
Cont…
• Both sender-initiated and receiver-initiated policies offer substantial performance advantages over the situation in which no load sharing is attempted.
• Sender-initiated policies are preferable at light to moderate system loads, while receiver-initiated policies are preferable at high system loads.
Cont…
• If the cost of process transfer under receiver-initiated policies is significantly greater than under sender-initiated policies, due to the preemptive transfer of processes, sender-initiated policies provide uniformly better performance.
State information exchange policy
1. Broadcast when state changes.
2. Poll when state changes.