1. Genetic Algorithm For Process Scheduling InDistributed Operating System Adhokshaj Mishra Department of Computer Science and Engineering, University Institute of Engineering and Technology, CSJM University Kanpur, INDIA Email: adhokshajmishra@indidigilabs.in Ankur Verma Department of Computer Science and Engineering, University Institute of Engineering and Technology, CSJM University Kanpur, INDIA Email: ankurverma@indidigilabs.in
2. AbstractThe problem of process scheduling in distributed system is one ofthe important and challenging area of research in computerengineering. Scheduling in distributed operating system has animportant role in overall system performance. Process scheduling indistributed system can be defined as allocating processes toprocessor so that total execution time will be minimized, utilizationof processors will be maximized and load balancing will bemaximized. The scheduling in distributed system is known as NP-Complete problem even in the best conditions, and methods basedon heuristic search have been proposed to obtain optimal andsuboptimal solutions. Genetic algorithm is one of the widely usedtechniques for constrain optimization. Genetic algorithm is basicallysearch algorithm based on natural selection and natural genetics. Inthis paper, using the power of genetic algorithms, we solve thisproblem considering load balancing efficiently.Keywords: Genetic Algorithm, Distributed Systems, Load Balancing 1. IntroductionThe computational complicated process cannot be executed on thecomputing machine in an accepted interval time. Therefore, theymust be divided into small sub-process. The sub-process can beexecuted either in the expensive multiprocessor or in thedistributed system. Distributed system is preferred due to betterratio of cost per performance. Scheduling in distributed operatingsystems is a critical factor in overall system performance. Processscheduling in a distributed operating system can be stated asallocating processes to processors so that total execution time will
3. be minimized, utilization of processors will be maximized, and loadbalancing will be maximized. Process scheduling in distributedsystem is done in two phases: in first phase processes aredistributed on computers and in second processes execution orderon each processor must be determined.The methods used to solve scheduling problem in distributedcomputing system can be classified into three categories graphtheory based approaches, mathematical models based methods andheuristic techniques.Heuristic algorithm can be classified into three categories iterativeimprovement algorithms, the probabilistic optimization algorithmsand constructive heuristics. Heuristic can obtain sub optimalsolution in ordinary situations and optimal solution in particulars.The first phase of process scheduling in a distributed system isprocess distribution on computer. The critical aspects of this phaseare load balancing. Recently created processes may be overloadedheavily while the others are under loaded or idle. The mainobjectives of load balancing are to speared load on processorsequally, maximizing processors utilization and minimizing totalexecution time.The second phase of process scheduling in distributed computingsystem is process execution ordering on each processor. Geneticalgorithm used for this phase. Genetic algorithm is guided randomsearch method which mimics the principles of evolution and naturalgenetics. Genetic algorithms search optimal solution from entiresolution space. They often can obtain reasonable solution in allsituations. Nevertheless, their main drawback is to spend much timefor schedule. Hence, we propose a modified genetic algorithm toovercome from drawback through this paper.
4. In this paper using the power of genetic algorithms we solve thisproblem. Process distribution on different processor done based onprocessors load. The proposed algorithm maps each schedule with achromosome that shows the execution order of all existing processon processors. The fittest chromosomes are selected to reproduceoffspring: chromosomes which their corresponding schedules haveless total execution time, better load balance and processorutilization. We assume that the distributed system processes arenon uniform and non-preemptive, that is the processors may bedifferent and a processor completes current process beforeexecuting a new one the load balancing mechanism used in thispaper only schedule process without process migration. 2. Preliminaries 2.1 System and Process ModelThe system used for simulation is loosely coupled non-uniformsystem, all tasks are non-pre-emptive and no process migration isassumed. The process scheduling problem considered in this paperis based on the deterministic model. A distributed system with mprocessors, m>1 should be modeled as follows:P= {p1, p2, p3...pm} is the set of processors in the distributed system.Each processor can only execute one process at each moment; aprocessor completes current process before executing a new one,and a process cannot be moved to another processor duringexecution. R is an m × m matrix, where the element ruv 1≤ u, v ≤ m ofR, is the communication delay rate between Pu and Pv. H is an m × mmatrix, where the element huv 1≤ u, v ≤m of H, is the time required totransmit a unit of data from Pu and Pv. It is obvious that huu=0 andruu=0.
5. T= {t1, t2, t3…tm} is the set of processes to execute. A is an n x mmatrix, where the element aij 1 ≤i ≤n, 1 ≤ j ≤ m of A, is the executiontime of process ti on processor pj. In homogeneous distributedsystems the execution time of an individual process on allprocessors is equal, that means: 1 ≤i ≤ n; ai1 = ai2 = ai3 = … = aim. D isa linear matrix, where the element di 1 ≤ i ≤ n of D, is the datavolume for process ti to be transmitted, when the process ti is to beexecuted on a remote processor.F is a linear matrix, where the element fi 1 ≤ i ≤ n of F is the targetprocessor that is selected for the process ti to be executed on. C is alinear matrix, where the element ci 1 ≤ i ≤ n of C, is the processorthat the process ti is presented on just now.The problem of process scheduling is to assign for each processtia processor fiP so that the total execution time is minimized,utilization of processors is maximized, and load balancing will bemaximized. In such systems, there are finite numbers of processes,each having a process number and an execution time and placed in aprocess pool from which processes are assigned to processors. Themain objective is to find a schedule with minimum cost. Thefollowing definitions are also needed:Definition 1The processor load for each processor is the sum of processesexecution times allocated to that processor. However, as theprocessors may not always be idle when a chromosome (schedule)is evaluated, the current existing load on individual processor mustalso be taken into account. Therefore: . . = , + =1 , …….. (1) =1
6. Definition 2The length or maxspan of schedule T is the maximal finishing time ofall the processes or maximum load. Also, communication cost (CC)to spread recently created processes on processors must becomputed: Maxspan(T) = max(Load(pi)) 1≤i≤ Number of processors …(2) . CC T = (rc i f i + hc i f i X di ) ……. (3) =1Definition 3The Processor utilization for each processor is obtained by dividingthe sum of processing times by maxspan, and the average ofprocessors utilization is obtained by dividing the sum of allutilizations by number of processors: ( ) ( ) = ……….. (4) . ( =1 ( )) = . …….. (5)Definition 4Number of Acceptable Processor Queues (NoAPQ): We must definethresholds for light and heavy load on processors. If the processescompletion time of a processor (by adding the current system loadand those contributed by the new processes) is within the light and
7. heavy thresholds, this processor queue will be acceptable. If it isabove the heavy threshold or below the light-threshold, then it isunacceptable, but what is important is average of number ofacceptable processors queues, which is achievable by: = . …… (6)Definition 5A Queue associated with every processor, shows the processes thatprocessor has to execute. The execution order of processes on eachprocessor is based on queues.The Proposed Genetic AlgorithmGenetic algorithms, as powerful and broadly applicable stochasticsearch and optimization techniques, are the most widely knowntypes of evolutionary computation methods today. In general, agenetic algorithm has five basic components as follows: 1. An encoding method that is a genetic representation (genotype) of solutions to the program. 2. A way to create an initial population of individuals (chromosomes). 3. An evaluation function, rating solutions in terms of their fitness, and a selection mechanism. 4. The genetic operators (crossover and mutation) that alter the genetic composition of offspring during reproduction. 5. Values for the parameters of genetic algorithm.Genotype
8. In the GA-Based algorithms each chromosome corresponds to asolution to the problem. The genetic representation of individuals iscalled Genotype. In this paper a chromosome consists of an array ofn digits, where n is the number of processes. Indexes show processnumbers and a digit can take any one of the 1...m values, whichshows the processor that the process is assigned to. If more thanone process is assigned to the same processor, the left to-right orderdetermines their execution order on that processor.Initial PopulationAs discussed before, the main objective of GA is to find a schedulewith optimal cost while load-balancing; processors utilization andcost of communication are considered. We take into account allobjectives in following equation. The fitness function of a Schedule Tis defined as follows: ( ) () = ……. (7) ( ())Where 0 α, β, γ, θ ≤ 1 are control parameters to control effect ofeach part according to special cases and their default value is one.This equation shows that a fitter solution (Schedule) has lessmaxspan, less communication cost, higher processor utilization andhigher Average number of acceptable processor queues.SelectionThe selection process used here is based on spinning the roulettewheel, which each chromosome in the population has a slot sized inproportion to its fitness. Each time we require an offspring, a simplespin of the weighted roulette wheel gives a parent chromosome. Theprobability pi that a parent Ti will be selected is given by:
9. ( ) = ……. (8) =1 ( )Where F(Ti) is the fitness of chromosome Ti.CrossoverCrossover is generally used to exchange portions between strings.Crossover is not always affected; the invocation of the crossoverdepends on the probability of the crossover Pc. We haveimplemented two crossover operators. The GA uses one of them,which is decided randomly.Single-Point CrossoverThis operator randomly selects a point, called Crossover point, on theselected chromosomes, then swaps the bottom halves aftercrossover point, including the gene at the crossover point andgenerate two new chromosomes called children.Proposed CrossoverThis operator randomly selects points on the selectedchromosomes, then for each child non-selected genes are taken fromone parent and selected genes from the other.MutationMutation is used to change the genes in a chromosome. Mutationreplaces the value of a gene with a new value from defined domainfor that gene. Mutation is not always affected, the invocation of theMutation depend on the probability of the Mutation Pm. We haveimplemented two mutation operators. The GA uses one of them,which is decided randomly.
10. First Mutation OperatorThis operator randomly selects two points on the selectedchromosome, and then generates a chromosome by swapping thegenes at the selected points.Second Mutation OperatorThe other approach is to check if any jobs could be swappedbetween processors which would result in a lower make span. If wewant to test every possible swap, it would be computationally veryintensive, and in larger problems would take an unfeasible amountof time. It also seems unreasonable to consider swapping processeson processors which their load is significantly below the make span,therefore we try to swap processes between overloaded and underloaded processors. This concept can be implemented as follows: 1. First, select a processor, say pv, which has maximum finish time. 2. Second, select a processor, say pu, which has minimum finish time. 3. Third, try to transfer a process from pv to pu or swap a single pair of processes between pv and pu that improves the make span of both processors the most. 4. This process is repeated until no improvement is possible.Replacement StrategyWith genetic operators (crossover, mutation) are applied onselected patterns T1, T2 two new chromosomes T’ and T” aregenerated. These chromosomes are added to new temporarypopulation. By repeating this operation, a new temporarypopulation with size of 2*POPSIZE is generated. After that fitter
11. chromosomes are selected from current population and newtemporary population, at last selected chromosomes made newpopulation and algorithm restarts.Termination ConditionWe can apply multiple choices for termination condition: Maxnumber of generation, algorithm convergence, and equal fitness forfittest selected chromosomes in respective iterations.The Structure of Proposed Genetic AlgorithmOur proposed GA-Based algorithm starts with a generation ofindividuals. A certain fitness function is used to evaluate the fitnessof each individual. Good individuals survive after selectionaccording to the fitness of individuals. Then the survived individualsreproduce offspring through crossover and mutation operators. Thisprocess iterates until termination condition is satisfied. It isConsiderable to say that parameters such as pc, pm, POPSIZE,NOGEN, α, β, γ and θ must be determined before GA is started. Thealgorithm is as below:Procedure GA-based algorithmBegin Initialize P (k): {Create an initial population} Evaluation P (k): {evaluate all individuals in the population} Repeat For i=1 to 2*POPSIZE do Select 2 chromosomes as parent1 and parent2 from population Child1 and Child2←Crossover (parent1, parent2); Child1←Mutation (Child1); Child2←Mutation (Child2); Add (new temp population, Child1, Child2);
12. End For Make (new population, new temp population, population); Population = new Population; While (not termination condition); Select best chromosome in population as solution and return it;EndConclusionsScheduling in distributed operating systems has a significant role inoverall system performance and throughput. The scheduling indistributed systems is known as an NP-complete problem even inthe best conditions. We have presented and evaluated new GA-Based method to solve this problem. This algorithm considers multiobjectives in its solution evaluation and solves the schedulingproblem in a way that simultaneously minimizes maxspan andcommunication cost, and maximizes average processor utilizationand load-balance. Most existing approaches tend to focus on one ofthe objectives. Experimental results prove that our proposedalgorithm tend to focus on all of the objectives simultaneously andoptimize them.References 1. A Genetic Algorithm for Process Scheduling in Distributed Operating Systems considering Load Balancing, by M. Nikravan and M. H. Kashani 2. A Modified Genetic Algorithm for Process Scheduling in Distributed System, by Vinay Harsora and Dr. Apurva Shah, International Journal of Computer Applications, AIT – 2011
Be the first to comment