Optimization Final Report
Optimization of the Time Scheduling Problem
Using Genetic Algorithm in MATLAB
Yichen Sun
Washington University in St. Louis, April 2016
Abstract
This paper discusses the use of a genetic algorithm to optimize a daily time management problem.
The goal is to maximize the efficiency of finishing a number of tasks in a given time span,
where each time block is rated by working efficiency and each task is assigned a "value". The problem
is approached with a genetic algorithm in which a chromosome is a permutation of half-hour time
blocks. Because the MATLAB genetic algorithm toolbox could not solve our problem, I wrote a complete set of
routines and improved it several times. A simple 3-task, 8-hour scheduling example is presented
with an accurate solution, and a more complicated real-life example with 8 tasks over 126 hours
is discussed in detail.
Introduction
Our schedules are often filled with different types of
tasks: some are urgent but not important, like
cooking tomorrow's meals; some are important
but not urgent, like preparing for a future interview; and
some are both important and urgent, like
cramming for an imminent exam. According to a
survey by Bruce K. Britton and Abraham
Tesser,[1] good time management leads to feelings of
self-efficacy, which allow, and indeed support, more efficient
cognitive processing, more positive affective responses,
and more persevering behavior; the survey also
shows a positive relationship between time-management
attitudes and skills and grade point average.
It is therefore important and practical to arrange our schedule
efficiently, so that important and urgent tasks
are finished early and long-term tasks are
fitted into suitable time slots to avoid procrastination. Smart
mobile schedule planners such as Timely, and
plenty of GTD (Getting Things Done) apps on the market,
all aim to help people make better
schedules. Solving this time scheduling
optimization problem helps us use our time better and
keep our working efficiency high according to our own
design rules. However, according to Mladen Janković,[2]
making a schedule is an NP-hard problem.
Such problems can be solved exactly with a heuristic search
algorithm, but only
for simple cases. For more complex inputs and
requirements, finding a reasonably good solution can
take a long time, or may even be impossible. This is why
we adopted a genetic algorithm. Another difference between our
model and other scheduling problems lies in the form of the
chromosomes. In our model, we break time into
consecutive half-hour blocks, and it is hard to
encode and decode such a time sequence as binary strings
and perform crossover and mutation as in
traditional genetic algorithms. We therefore
adopt a permutation-type chromosome to avoid
creating nonsensical chromosomes after crossover and
mutation.
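To see why a naive one-point crossover can break a permutation chromosome, consider the following minimal sketch. The two parents and the cut point are made-up values chosen only for illustration.

```matlab
% Two hypothetical parent permutations of five time blocks
parentA = [1 2 3 4 5];
parentB = [5 4 3 2 1];
cut = 2;                                   % illustrative crossover point

% Naive one-point crossover: head of A joined to tail of B
child = [parentA(1:cut) parentB(cut+1:end)];

% A valid schedule must contain every block exactly once
isValid = isequal(sort(child), 1:numel(parentA));
fprintf('child = %s, valid permutation: %d\n', mat2str(child), isValid);
```

Here the child repeats blocks 1 and 2 and loses blocks 4 and 5, so it no longer describes a schedule. An order-preserving crossover avoids this by only rearranging genes that the parent already holds.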
In our optimization model, any arrangement of
time blocks is a chromosome, and we try to maximize the fitness
value that our design rules assign to the chromosome.
Background/Related Work
Previous research on order-based crossover and
scheduling problems is extensive, and we also
referred to papers giving pseudocode for
permutation-type genetic algorithms. Our scheduling
problem is similar to a resource-constrained
scheduling problem[3], in which the number of
possible courses of action and the number of
ways to allocate resources quickly become
overwhelming on a project. In our model, time is the
constrained resource, and we need to allocate it
properly to achieve maximum efficiency. Optimization
methods can be heuristic or
deterministic[3]. Heuristic methods may
not guarantee finding the exact solution
even if it exists, but typically assure
some degree of optimality analytically, while
deterministic methods operate the same way each
time for a given problem. We want to solve scheduling
problems of different scales and avoid excessive
computation time when scheduling multiple tasks over
long periods, so we adopt a genetic
algorithm to search for a heuristic solution in its
search space.
We also referred to the model used in a typical
knapsack problem[4], in which every item has a
value and a weight. Similarly, in our model
every task is given a value and an assumed
working length, which are used to calculate
the fitness value of any time arrangement. Unlike
a knapsack problem, we also assign
efficiency values to all time blocks, so that these
values help the genetic algorithm
determine which job to finish first. That is, we add
another "degree of freedom" to a traditional
knapsack problem: we care not only about which jobs are
finished during a fixed amount of time, but also about
the sequence in which the jobs are processed.
Another important feature of our problem is that the
time blocks are not coded into binary strings; instead, the
block numbers from 1 to N are shuffled into a random order. A crossover
technique called "Single Point Preservation" has been
introduced[5] for permutation populations, which swaps the
values of two random elements in a chromosome.
However, we discuss later in the "Results" section how
we improved on this crossover technique to make our
genetic algorithm work.
Model Summary
In our model, for N different tasks to complete, we
assign each task an "importance" and an "urgency", both
from 1 to N. A higher number indicates higher
importance or urgency, and we sum the two
to define the "value" of the task. If two tasks
have the same "value", we add 0.5 to
the one with higher urgency. This step
mimics the formulation of a knapsack
problem.
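The value assignment above can be sketched as follows. The ratings are hypothetical, and the handling of more than two tied tasks (only the most urgent one is raised) is an assumption, since the text only covers the two-task case.

```matlab
% Hypothetical ratings for N = 3 tasks (1 = lowest, N = highest)
importance = [3 2 1];
urgency    = [2 3 1];

value = importance + urgency;              % raw "value" of each task

% Tie-break: among tasks sharing a value, add 0.5 to the most urgent one
for v = unique(value)
    tied = find(value == v);
    if numel(tied) > 1
        [~, k] = max(urgency(tied));
        value(tied(k)) = value(tied(k)) + 0.5;
    end
end
disp(value)
```

Tasks 1 and 2 both sum to 5, so task 2, which is more urgent, ends with a value of 5.5.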
We then assign an integer to each time block
indicating our working efficiency within that period.
For example, if we are planning for a period of 8
hours from 8am to 4pm, we have 16 time blocks of
0.5 hour, and we may create an efficiency vector of
16 elements like [3 3 5 5 5 5 5 5 2 2 1 1 3 3 4 4].
This vector expresses that we are most
productive from 9am to 12pm, are sleepy from
1pm to 2pm, and gradually regain efficiency after
2pm. Without an efficiency vector, every period of
time would be practically the same, and there would be
nothing to optimize.
We then formulate the crux of our optimization
model: the genotype. To schedule K jobs in
a period of N hours, we number the time blocks
from 1 to 2N consecutively and shuffle these 2N
numbers into any order to form a genotype.
Suppose the K jobs require a total of M time blocks to
finish, i.e. M/2 hours; then the first M numbers in any
genotype are the time blocks in which the K jobs are
worked on, with the first job taking the leading entries,
the second job the entries that follow, and so on. We
multiply the "value" of each job
by the efficiency values of all the time blocks it occupies,
and sum over all K jobs to obtain the
fitness value of the genotype.
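The decoding and the basic fitness sum can be sketched as below. The job values, job lengths and genotype are illustrative, the variable names are not taken from the appendix, and the sketch deliberately leaves out the extra rule in the appendix fitness function that penalizes non-consecutive blocks.

```matlab
% Illustrative data: 3 jobs over 8 hours (16 half-hour blocks)
eff       = [3 3 5 5 5 5 5 5 2 2 1 1 3 3 4 4];   % efficiency of each block
jobValue  = [6 10 4];                            % "value" of each job
jobBlocks = [8 5 1];                             % blocks each job needs (M = 14)
genotype  = [3 4 5 6 7 8 15 16 1 2 13 14 9 10 11 12];  % an arbitrary permutation

% Job k owns genotype entries lastIdx(k)-jobBlocks(k)+1 .. lastIdx(k)
lastIdx = cumsum(jobBlocks);
fit = 0;
for k = 1:numel(jobValue)
    blocks = genotype(lastIdx(k)-jobBlocks(k)+1 : lastIdx(k));
    fit = fit + jobValue(k) * sum(eff(blocks));
end
fprintf('Fitness without any design rules: %d\n', fit);   % prints 376
```

The last two genotype entries (blocks 11 and 12) lie beyond the first M = 14 positions, so they stay unused and do not affect the fitness.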
With everything ready, we follow the steps
introduced in [6][7] to define and code our genetic
algorithm. We first generate a random initial
population of N genotypes and select N mating parents by roulette
wheel selection. Crossover and mutation in our algorithm
are illustrated in Figure 1 and Figure 2.
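Roulette wheel selection can be sketched compactly with a cumulative sum. The fitness values here are invented for illustration; in the actual algorithm they would come from the fitness function.

```matlab
% Hypothetical fitness values of a population of 5 genotypes
fitVal = [10 30 20 25 15];
nParents = numel(fitVal);

% Cumulative distribution: each genotype gets a slice proportional to its fitness
cdf = cumsum(fitVal) / sum(fitVal);

% Each spin draws u in (0,1) and picks the first slice whose upper edge is >= u
winners = zeros(1, nParents);
for i = 1:nParents
    u = rand();
    winners(i) = find(cdf >= u, 1, 'first');
end
disp(winners)   % indices of the selected mating parents (random)
```

Genotype 2 owns the widest slice (0.1 to 0.4), so it is the most likely to be drawn, but weaker genotypes can still be selected.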
Figure 1 Illustration of Crossover
In crossover, we rearrange the part of one parent after
the crossover point according to the order in which the
corresponding elements appear in the other mating parent.
We then apply mutation, with a given mutation
probability, to every genotype produced by crossover by simply swapping two
random elements within it.
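A minimal sketch of this order-based crossover and swap mutation follows. The parents and crossover point are hypothetical. The sketch follows the description in the text; the appendix code sorts the matching positions in descending order, which as far as I can tell places the tail genes in the reverse of the other parent's order.

```matlab
% Hypothetical parents (permutations of 8 time blocks) and crossover point
p1 = [1 2 3 4 5 6 7 8];
p2 = [8 6 4 2 7 5 3 1];
cp = 5;                                    % genes cp..end are rearranged

% Tail of p1, reordered to follow the order in which those genes occur in p2
tail1  = p2(ismember(p2, p1(cp:end)));
child1 = [p1(1:cp-1) tail1];

% Same operation with the roles of the parents exchanged
tail2  = p1(ismember(p1, p2(cp:end)));
child2 = [p2(1:cp-1) tail2];

% Swap mutation: exchange two randomly chosen loci
loc = randperm(numel(child1), 2);
mutant = child1;
mutant(loc) = mutant(fliplr(loc));
```

Both children and the mutant remain permutations of 1 to 8, because every step only reorders genes that are already present.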
After iterating for MaxIt cycles, we examine the best
fitness in each generation and check how the
overall quality of the population has evolved through the
selection of good schemata and the elimination
of bad ones.
Our algorithm searches for a
heuristic solution in a discrete search space
and returns an approximately optimal solution.
In the next section we discuss our
results and how to improve them further.
Results
We first optimize a simple
problem consisting of three tasks and a search space of
8 hours. The three tasks have "values" [6,10,4] and
required numbers of time blocks [8,5,1], and our
efficiency vector is [3 3 5 5 5 5 5 5 2 2 1 1 3 3 4 4].
With a population N = 30, mutation rate = 0.001
and maximum generation = 300, we obtain the
convergence plot shown in Figure 3.
The best fitness oscillates a lot, and the final schedule
returned is
[3,4,14,10,15,16,12,13,7,8,5,6,11,1,9,2], with a
fitness value of 264. The improvement of the overall
population from generation to generation is not
obvious because the population is small and
convergence is reached quickly. Still, the result
shows that we work on the first job in blocks
3, 4, 10, 12, 13, 14, 15 and 16, on the second job in blocks
5, 6, 7, 8 and 11, and on the last job in block 1. This
result is acceptable.
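As a sanity check, the following sketch recomputes the fitness of the reported schedule. It uses the data given above and the contiguity rule as I read it in the appendix fitness function: a block that does not directly follow the previous block of the same job counts as zero efficiency. The variable names are my own.

```matlab
eff       = [3 3 5 5 5 5 5 5 2 2 1 1 3 3 4 4];
jobValue  = [6 10 4];
jobBlocks = [8 5 1];
schedule  = [3 4 14 10 15 16 12 13 7 8 5 6 11 1 9 2];

lastIdx = cumsum(jobBlocks);
total = 0;
for k = 1:numel(jobValue)
    first = lastIdx(k) - jobBlocks(k) + 1;
    for j = first:lastIdx(k)
        % A block that does not directly follow the previous block of the
        % same job contributes nothing
        if j > first && schedule(j) - schedule(j-1) ~= 1
            continue
        end
        total = total + jobValue(k) * eff(schedule(j));
    end
end
fprintf('Fitness of the reported schedule: %d\n', total);   % prints 264
```

The three jobs contribute 102, 150 and 12 respectively, which reproduces the reported 264. Without the contiguity rule the same schedule would score 384.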
To get a better view of how our genetic algorithm
behaves, we work on a much more complex
scheduling problem consisting of 8 tasks, 36*7=252 (G)
total time blocks (36 half-hour blocks per day over 7 days)
and 115 total working blocks.
We use an initial population of
500 (N) to provide enough variation,
because each genome is
252 genes long. We then run our algorithm for 5000
and 30000 cycles respectively; the results are shown in
Figures 4 to 7.
Figure 2 Illustration of Mutation
Figure 3 Convergence Plot of Best Fitness and Mean Fitness, N=30, G=16
Figure 4 Convergence Plot of Best Fitness and Mean Fitness, N=500, G=252, Iter=5000
Figure 5 Best Fitness at Every 100 Generations, N=500, G=252, Iter=5000
From Figure 4 to Figure 7, we can see that our genetic
algorithm improves the overall fitness from
generation to generation. According to Figure 7, the improvement
is most rapid in the first 5000 cycles;
however, the overall fitness
is still slowly improving even after 30000
generations. For comparison, we construct another approximately
optimal chromosome by fitting tasks with higher
"values" into time blocks of higher efficiency
without applying any design rules; its fitness
value is 3568. Our current design rules are
incorporated into the fitness function only by penalizing
the breakup of a single continuous task.
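The simple fitting method can be sketched as a greedy assignment. This is only an illustration on the small 3-task example, not the code that produced the value 3568, and the variable names are my own.

```matlab
eff       = [3 3 5 5 5 5 5 5 2 2 1 1 3 3 4 4];   % example efficiency vector
jobValue  = [6 10 4];
jobBlocks = [8 5 1];

[~, blockOrder] = sort(eff, 'descend');        % most efficient blocks first
[~, jobOrder]   = sort(jobValue, 'descend');   % most valuable jobs first

assigned = cell(1, numel(jobValue));           % blocks given to each job
next = 1;
for k = jobOrder
    assigned{k} = blockOrder(next : next + jobBlocks(k) - 1);
    next = next + jobBlocks(k);
end

% Genotype layout used in this paper: job 1's blocks, then job 2's, and so
% on, followed by the unused blocks
genotype = [assigned{:} blockOrder(next:end)];

% Fitness without any design rules (no contiguity penalty)
greedyFit = 0;
for k = 1:numel(jobValue)
    greedyFit = greedyFit + jobValue(k) * sum(eff(assigned{k}));
end
fprintf('Greedy fitness: %d\n', greedyFit);    % prints 420
```

Because this baseline ignores the rule against breaking up a task, its fitness is best read as a rough reference value rather than a score the genetic algorithm can reach under the full fitness function.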
Comparing the highest fitness value attained by our
genetic algorithm with that of the simple fitting
algorithm, we see that improvement is still
needed to achieve a higher fitness value and faster
convergence. In fact, 30000 generations may take
about 2 hours on an ordinary PC.
Before this stage, we also tested the "Single
Point Preservation" crossover described in
reference[5]. However, this technique is essentially
a mutation, so it manipulates the genotype only
weakly and adds little
variety when producing offspring. Not
surprisingly, hardly any improvement was seen
with it. We also tried to
introduce a good "seed", i.e. a good initial
genotype generated by a simple fitting method.
This effort failed because, during the first thousands
of generations, the rest of the initial population only
dilutes the good "seed", and the overall convergence
is no faster than that of a traditional genetic
algorithm.
Continuing Work
We could possibly improve our code by sorting the
generated offspring at the end of each iteration and
preserving the best genotype from crossover and
mutation, to check whether faster convergence can be
achieved. Furthermore, the design of the
fitness function could be improved to accelerate
selection by creating larger differences between
genotypes.
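One possible reading of the first idea is a simple elitism step, sketched below. The populations and fitness values are made up, and replacing the worst offspring with the best parent is my assumption about how the best genotype would be preserved.

```matlab
% Hypothetical parent and offspring populations (rows are genotypes) with
% made-up fitness values; in the real loop these come from the fitness function
parents      = [1 2 3 4; 2 1 4 3; 4 3 2 1];
parentFit    = [12 30 18];
offspring    = [3 4 1 2; 1 3 2 4; 2 4 1 3];
offspringFit = [15 9 22];

% Elitism: overwrite the worst offspring with the best parent
[bestFit, iBest] = max(parentFit);
[~, iWorst]      = min(offspringFit);
if bestFit > offspringFit(iWorst)
    offspring(iWorst, :) = parents(iBest, :);
    offspringFit(iWorst) = bestFit;
end
disp(offspringFit)
```

With such a step the best fitness can never decrease from one generation to the next, which would remove the oscillation seen in the small example.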
In short, this paper has shown how our algorithm
works and what improvements have been made.
We see the potential of genetic algorithms for solving
problems with huge populations and will continue our
work to make the algorithm converge faster and
outperform the existing good "seed".
Besides what we are currently working on, our
prospective future work is as follows:
1. Dynamic scheduling, so that whenever a new
project and its constraints are added, a new schedule
is returned quickly;
2. Restricting the scheduling space to a shorter time span,
such as the next 2 days, so that we are likely to attain faster
convergence and a better schedule;
3. Extending our code into automatic scheduling
software for portable planning.
Figure 6 Convergence Plot of Best Fitness and Mean Fitness, N=500, G=252, Iter=30000
Figure 7 Best Fitness at Every 100 Generations, N=500, G=252, Iter=30000
References
[1] Britton B K, Tesser A. Effects of time-management practices on college grades [J]. Journal of Educational Psychology, 1991, 83(3): 405.
[2] Janković M. Making a Class Schedule Using a Genetic Algorithm. http://www.codeproject.com/Articles/23111/Making-a-Class-Schedule-Using-a-Genetic-Algorithm
[3] Wall M B. A genetic algorithm for resource-constrained scheduling [D]. Massachusetts Institute of Technology, 1996.
[4] Knapsack problem. https://en.wikipedia.org/wiki/Knapsack_problem
[5] Rice O, Nyman R. Efficiently Vectorized Code for Population Based Optimization Algorithms [J]. RN, 2013, 13: 09.
[6] Arora J. Introduction to optimum design [M]. Academic Press, 2004.
[7] Guo C, Yang X. A programming of genetic algorithm in matlab7.0 [J]. Modern Applied Science, 2011, 5(1): p230.
Appendix (MATLAB Code)
【Main_scheduling】
clc
clear all
%% Read data from existing Excel schedule
filename = 'Schedule Data.xlsx';
sheet1 = 1;
sheet2 = 2;
sheet3 = 3;
xlRange1 = 'B2:AK8';
xlRange2 = 'H2:H9';
xlRange3 = 'K2:K9';
Data = xlsread(filename,sheet1,xlRange1);
Eff_facm = xlsread(filename,sheet2,xlRange1);
% Compress the efficiency factor matrix into one dimension for easy calculation.
% Eff_facv is the efficiency factor vector in the same dimension as a genotype.
Eff_facv = reshape(Eff_facm',1,[]);
days = size(Data,1);        % How many days are we to schedule
timeblocks = size(Data,2);  % How many time blocks in a day are we to schedule
Value = xlsread(filename,sheet3,xlRange2);   % Import value data of the jobs to be scheduled
TimeSum = xlsread(filename,sheet3,xlRange3); % Import number of time blocks each job takes
%% Setting parameters
MaxIt = 30000;  % Maximum number of iterations
PerMut = 0.001; % Probability of each genome mutating; we swap the gene values at two loci inside the mutated genotype
N = 500;        % Individuals in population; N needs to be even (N/2 should be an integer)
G = days*timeblocks; % Genome length = total number of available time blocks
%% Evaluate a good initial genotype generated by a traditional algorithm
good = Initial_Good_Genotype()
BBBBB = fitness(good,Eff_facv,Value,TimeSum)
%% Initialize the population; every genome has length G
Pop = zeros(N,G);
for i = 1:N
    Pop(i,:) = randperm(G);
end
for iga = 1:MaxIt % Start of iterations
    W = Roulette_Wheel_Selection(Pop,N,G,Eff_facv,Value,TimeSum);
    Pop2 = crossover(Pop,W,N,G);
    Pop2 = mutation(Pop2,W,N,G,PerMut);
    Pop = Pop2;
    for i = 1:N % Calculate fitness in this loop
        genome = Pop(i,:); % Genotype whose fitness is evaluated in the current iteration
        FitVal(i) = fitness(genome,Eff_facv,Value,TimeSum); % Measure fitness
    end
    fprintf('Gen: %d Mean Fitness: %d Best Fitness: %d\n',iga,round(mean(FitVal)),round(max(FitVal))) % Print stats
    % Write values into vectors MaxFit, MeanFit and Iteration for plotting
    MaxFit(iga) = max(FitVal);
    MeanFit(iga) = mean(FitVal);
    Iteration(iga) = iga;
    for i = 1:MaxIt/100
        if iga == 100*i
            Hundreds_Value(i) = max(FitVal);
        end
    end
end
% FitVal was computed after the last crossover and mutation, so it describes
% the final population; pick out the best genotype to display the final result
for i = 1:N
    if FitVal(i) == max(FitVal)
        BestGenotype = Pop(i,:);
    end
end
disp('Best Genome:') % Write Text to Console
disp(BestGenotype) % Display Best Genome
figure
plot(Iteration,MaxFit,'b',Iteration,MeanFit,'g')
title('Convergence of Best Fitness and Mean Fitness')
xlabel('Number of Iteration')
ylabel('Fitness Value')
hold on
figure(2)
plot(Hundreds_Value)
title('Best Fitness in Every Hundredth Iterations')
xlabel('Number of One Hundred Iterations')
ylabel('Fitness Value')
【Fitness】
function [M] = fitness(X,Y,Z,W)
% X: genotype vector [1*G], Y: efficiency factor vector [1*G],
% Z: value of each job [1*K], W: number of time blocks each job takes [1*K]
lenZ = length(Z);
M = 0;
WW = 0;
for i = 1:lenZ
    SumEff = 0;
    WW = WW + W(i); % Index of the last genotype entry belonging to job i
    for j = (WW-W(i)+1):WW
        % If the i-th job has two nonconsecutive time blocks, the latter
        % one (and only this one block) has 0 efficiency
        if (j > (WW-W(i)+1)) && ((X(j)-X(j-1)) ~= 1)
            Y(X(j)) = 0; % SumEff=SumEff*0.5;
        end
        SumEff = SumEff + Y(X(j));
    end
    M = M + Z(i)*SumEff;
end
【Roulette Wheel Selection】
function W = Roulette_Wheel_Selection(Pop,N,G,Eff_facv,Value,TimeSum)
for i = 1:N % Begin: compute cumulative distribution
    AA = Pop(i,:);
    if i == 1
        FCD(i) = fitness(AA,Eff_facv,Value,TimeSum); % Cumulative fitness of the first genotype only
    else
        FCD(i) = FCD(i-1) + fitness(AA,Eff_facv,Value,TimeSum); % Sum up fitness values
    end
end % End: compute cumulative distribution
for i = 1:N
    FCD(i) = FCD(i)/FCD(end); % Normalize cumulative distribution
end
for i = 1:N % Create N mating parents for producing offspring according to their fitness
    randval = rand(1); % Spin the roulette wheel; returns a value between 0 and 1
    for k = 1:N
        if k == 1
            if randval >= 0 && randval <= FCD(k)
                W(i) = k; % Index of the winning genotype in the population; this is the first slice
            end
        else
            if randval >= FCD(k-1) && randval <= FCD(k)
                W(i) = k; % The other (N-1) slices; W stands for "Winner"
            end
        end
    end
end
【Crossover】
function Pop2 = crossover(Pop,W,N,G)
for i = 1:N/2
    idx = (i-1)*2+1; % First winner of the pair
    mate1 = Pop(W(idx),:);
    mate2 = Pop(W(idx+1),:);
    CP = round(rand(1)*(G-1)+1); % Round to nearest integer to find the crossover point
    tempmate = mate1; % Untouched copy of mate1
    xs = zeros(1,G);
    ys = zeros(1,G);
    for j = 1:(G-CP+1)
        xs(j) = find(mate2==mate1(CP+j-1)); % Position in mate2 of each tail gene of mate1
    end
    xx = sort(xs,'descend'); % Unused zero entries sort to the end
    for j = 1:(G-CP+1)
        mate1(CP+j-1) = mate2(xx(j));
    end
    for j = 1:(G-CP+1)
        ys(j) = find(tempmate==mate2(CP+j-1)); % Position in the original mate1 of each tail gene of mate2
    end
    yx = sort(ys,'descend');
    for j = 1:(G-CP+1)
        mate2(CP+j-1) = tempmate(yx(j));
    end
    Pop2(idx,:) = mate1;
    Pop2(idx+1,:) = mate2;
end
【Mutation】
function Pop2 = mutation(Pop2,W,N,G,PerMut)
for i = 1:N
    idx = rand(1) < PerMut; % Mutation flag: true (1) means we perform the swap below
    if idx == 1
        Loc1 = round(rand(1)*(G-1)+1); % Swap location 1
        Loc2 = round(rand(1)*(G-1)+1); % Swap location 2
        Hold = Pop2(i,Loc1);           % Hold value 1
        Pop2(i,Loc1) = Pop2(i,Loc2);   % Value 1 = value 2
        Pop2(i,Loc2) = Hold;           % Value 2 = holder
    end
end
【Initial_Good_Genotype】
function Eff5 = Initial_Good_Genotype()
N = 36;
eff5 = [3 4 5 6 7 8 15 16 17 18 21 22 23 24];
eff4 = [25 26 27 28];
len5 = length(eff5);
len4 = length(eff4);
temp = 0;
for i = 1:7
    for j = 1:len5
        Eff5(temp+j) = eff5(j)+36*(i-1);
    end
    temp = temp+len5;
end
for i = 1:7
    for j = 1:len4
        Eff5(temp+j) = eff4(j)+36*(i-1);
    end
    temp = temp+len4;
end
for i = 1:252
    if any(Eff5==i) == 0 % any(Eff5==i) is 1 if block i is already placed, otherwise 0
        temp = temp+1;
        Eff5(temp) = i;
    end
end