Introduction to Genetic Algorithms

An introduction to Genetic Algorithms, with an example application to the NP-hard Set-Covering Problem.

  1. Genetic Algorithms: A Concise Introduction. Arber Borici, March 30, 2010
  2. Outline
     - GAs: Powerful Metaheuristics
     - The Canonical Algorithm
     - Hybrid GAs
     - Areas of Application
       - Integer Programming
       - Multi-Objective Optimization
     - An Example: Set-Covering Problem
       - Application to binary image compression
  3. What is a GA?
     - A GA is an 'intelligent' stochastic search metaheuristic that simulates evolutionary natural selection
       - Intelligent? Stochastic search
     - Developed by J. Holland in 1975 at the University of Michigan
     - Advanced variations: Simple, Steady-State, Hybrid, etc.
     - Genetic Programming (Koza, 1990)
  4. GA Procedure
     - Start with a random or predefined population
     - Individuals reproduce, mutate, and die
     - Each individual has a relative fitness
     - Genes from highly fit individuals survive into the next generation
       - Fitness is the crux of GAs
     - Run the GA for N generations or until some criteria are met
       - An infinite run would yield some optimal result
  5. Some Terminology
     - Chromosomes as simple data structures
  6. GA Pseudocode
        Generate initial population P
        Evaluate fitness for each individual
        REPEAT
          - Select parents from P
          - Recombine selected parents
          - Evaluate offspring fitness
          - Replace low-fit individuals with offspring
        UNTIL some criteria are met
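
The pseudocode above translates almost line for line into code. Below is a minimal runnable sketch in Python; the bit-string representation, binary tournament selection, one-point crossover, and the particular parameter values are illustrative assumptions, not part of the original slides. The fitness argument is any function mapping a 0/1 list to a number to be maximized.

    import random

    def run_ga(fitness, n_genes, pop_size=100, generations=200,
               p_crossover=0.9, p_mutation=0.01):
        # Generate initial population P: random bit strings
        population = [[random.randint(0, 1) for _ in range(n_genes)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            scored = [(fitness(ind), ind) for ind in population]

            def tournament():
                # Select parents from P: binary tournament on fitness
                a, b = random.sample(scored, 2)
                return a[1] if a[0] >= b[0] else b[1]

            offspring = []
            while len(offspring) < pop_size:
                p1, p2 = tournament(), tournament()
                # Recombine selected parents (one-point crossover)
                if random.random() < p_crossover:
                    cut = random.randint(1, n_genes - 1)
                    p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
                # Mutate: flip each bit with a small probability
                for child in (p1, p2):
                    offspring.append([1 - g if random.random() < p_mutation else g
                                      for g in child])
            # Replace low-fit individuals with the offspring, keeping the best parent (elitism)
            best = max(scored, key=lambda s: s[0])[1]
            population = offspring[:pop_size - 1] + [best]
        return max(population, key=fitness)
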
  7. Generating the Population
     - Problem-dependent
       - Merely random individuals
       - Some predefined configuration
       - An admixture of individuals
     - Each individual has a uniform structure
       - Integer-based representation
       - Alleles as groups of bits
       - Individual = chromosome
  8. The Fitness Function
     - Suppose you are searching for a solution
     - You start with a random population
     - The fitness function could be the "distance" between the current solution and the final solution
       - The smaller the distance, the higher the fitness
     - Low-fit individuals will eventually die
     - Highly fit individuals will reproduce, i.e. transfer genes to offspring
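
As a concrete toy instance of such a distance-based fitness, here is a sketch that scores a chromosome by how close it is to a known target bit string; the target and the Hamming-distance measure are illustrative choices, not from the slides.

    TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

    def fitness(individual):
        # Hamming distance to the target; a smaller distance means a higher fitness
        distance = sum(a != b for a, b in zip(individual, TARGET))
        return len(TARGET) - distance

Plugged into the run_ga sketch above as run_ga(fitness, n_genes=len(TARGET)), the population would be expected to converge on TARGET.
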
  9. Fitness (cont'd)
     - Fitness determines the strength of the GA
     - Relative fitness is computationally suitable
     - The Schema Theorem (Holland, 1975):
       - Each generation, highly fit individuals survive and produce offspring with even higher fitness
       - The fittest individual of the last generation shall be the solution to the search problem
     - Fitness functions vary in complexity and computability
  10. Inside the Loop: Selection
     - Selection is done at every generation
       - A stochastic operation
       - Two individuals are chosen at a time until a pool of offspring is constructed
     - Selection determines who will live and who will die
       - Offspring are first evaluated for fitness
     - Highly fit offspring will replace low-fit individuals
  11. Inside the Loop: Selection
     - Several strategies:
       - Ranking
       - Roulette wheel
       - Tournament
       - Stochastic remainder sampling
       - Stochastic uniform sampling
     - Strategies are chosen on both empirical and theoretical grounds
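
For illustration, minimal sketches of two of these strategies; the function names and the tournament size k=2 are assumptions, and the roulette wheel assumes non-negative fitness values.

    import random

    def roulette_wheel(population, fitnesses):
        # Selection probability is proportional to relative fitness
        total = sum(fitnesses)
        pick = random.uniform(0, total)
        running = 0.0
        for individual, f in zip(population, fitnesses):
            running += f
            if running >= pick:
                return individual
        return population[-1]

    def tournament(population, fitnesses, k=2):
        # Draw k individuals at random and keep the fittest of them
        contenders = random.sample(range(len(population)), k)
        return population[max(contenders, key=lambda i: fitnesses[i])]
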
  12. Inside the Loop: Recombination
     - Parents recombine to produce offspring (crossover):
       - Gene regions are exchanged
       - Crossover locations are chosen heuristically or at random
       - Note the swapped genes in the figure
       - Offspring have a different (usually better) fitness
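
A minimal sketch of one-point crossover and bit-flip mutation on list-of-bits chromosomes; the choice of operators and the default mutation rate are illustrative.

    import random

    def one_point_crossover(parent1, parent2):
        # Exchange the gene regions to the right of a randomly chosen cut point
        cut = random.randint(1, len(parent1) - 1)
        return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

    def mutate(chromosome, p_mutation=0.01):
        # Flip each bit independently with a small probability
        return [1 - g if random.random() < p_mutation else g for g in chromosome]
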
  13. Inside the Loop: Replacement
     - Several strategies:
       - Replace extrema
       - Stochastic replacement
       - Crowding (replace the most similar)
       - Replace just the parents
     - The choice rests on theoretical and empirical judgment
  14. Ending the Loop: Criteria
     - Number of generations
       - Apocalypse time at a predefined generation number
     - (Sub-)optimal solution
       - A satisfying solution has been attained
       - Requires careful judgment
     - Problem: premature convergence
  15. Premature Convergence
     - Crossover and mutation too effective
       - No gene variation; quick convergence
       - The likelihood of finding new individuals (solutions) decreases
     - Two approaches:
       - De Jong's crowding: replace the most similar solutions
       - Goldberg's fitness scaling: decrease the fitness of the most similar individuals
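
One common way to realize the "penalize similar individuals" idea is fitness sharing. Below is a minimal sketch, assuming binary chromosomes, Hamming distance as the similarity measure, and an arbitrary niche radius sigma; it illustrates the general idea rather than the exact scaling scheme named on the slide.

    def shared_fitness(population, fitnesses, sigma=3.0):
        # Divide each raw fitness by a niche count, so individuals in crowded
        # regions of the search space are penalized and diversity is preserved
        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))
        scaled = []
        for ind, fit in zip(population, fitnesses):
            niche = sum(1.0 - hamming(ind, other) / sigma
                        for other in population
                        if hamming(ind, other) < sigma)
            scaled.append(fit / niche)  # niche >= 1 because each individual counts itself
        return scaled
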
  16. Premature Convergence
     - [Figure (CodeProject): the objective is to attain some maximum value; premature convergence gets stuck at certain local niches]
  17. General Parameters
     - Population size
       - Problem-dependent
       - A small size does not necessarily imply degraded optimality
     - Number of generations
       - What is a good stopping time?
     - Crossover and mutation probabilities
       - Naturally, mutation has a minute rate
       - In simulations, rates vary per empirical judgment
  18. Some Properties
     - Implicit parallelism
       - The GA implicitly searches many solutions at the same time (schemata of chromosomes)
       - Binary alphabets offer the largest number of schemata
     - Standalone GAs often provide sub-optimal solutions for very large problem sizes
       - Optimality is expected only in the limit
  19. Hybrid GAs
     - GAs may be combined with local search heuristics
       - Local search is applied to candidate solutions supplied by the GA
       - If a niche is reached, it has the highest fitness
       - Much like hill-climbing
       - Great impact on efficiency and speed
     - Employed in hard optimization problems
  20. Applications
     - Optimization problems
       - Single- and multi-objective
       - Constrained and unconstrained
       - E.g. NP-hard problems (TSP, SPP, SCP, scheduling, etc.); economics (a wealth of optimization problems)
     - Artificial and biological systems
       - Gene profiling
       - Computational creativity
  21. Case Study: Set-Covering Problem
     - SCP is a classical CS problem
       - Discrete combinatorial optimization
       - Applicable to many real-world problems
     - SCP is NP-hard
     - SCP is formulated as a typical cost-minimization problem
     - Real-world areas: bus/airline scheduling, resource allocation, nurse scheduling, etc.
  22. SCP: Formulation
     - Given a binary X-by-Y matrix M, cover ALL the rows with the smallest number of columns
       - At minimum cost, if columns have some corresponding cost
     - Mathematically:

           Min Z = Σ (col = 1 .. Y) c_col x_col
           s.t.  M x ≥ 1         (every row must be covered by at least one chosen column)
                 x ∈ {0, 1}^Y,  M ∈ {0, 1}^(X×Y)

     - If column col is chosen, then x_col = 1
  23. SCP: Example
     - Which columns cover all seven rows?

           columns:  1  2  3  4  5
           row 1:    1  1  0  0  0
           row 2:    1  1  0  0  1
           row 3:    0  1  1  0  0
           row 4:    0  0  0  1  0
           row 5:    0  0  0  1  0
           row 6:    1  0  0  0  1
           row 7:    0  0  1  0  0
  24. Example
     - Solution: use columns 1, 3, and 4

           columns:  1  2  3  4  5
           row 1:    1  1  0  0  0
           row 2:    1  1  0  0  1
           row 3:    0  1  1  0  0
           row 4:    0  0  0  1  0
           row 5:    0  0  0  1  0
           row 6:    1  0  0  0  1
           row 7:    0  0  1  0  0
  25. Example
     - The same matrix, with the chosen columns highlighted: columns 1, 3, and 4 together cover every row
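
A quick way to verify such a cover programmatically; a small sketch using the example matrix above (the helper name and 0-based column indices are illustrative).

    EXAMPLE_MATRIX = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
    ]

    def covers_all_rows(matrix, chosen_columns):
        # A set of columns is a cover if every row has a 1 in at least one chosen column
        return all(any(row[c] for c in chosen_columns) for row in matrix)

    print(covers_all_rows(EXAMPLE_MATRIX, [0, 2, 3]))  # columns 1, 3, 4 (0-based) -> True
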
  26. GA Approach
     - The GA's purpose: select those columns whose union covers every row (i.e. yields the all-ones vector)
     - How do we model a chromosome?
     - What parameter values do we choose?
       - Crossover rate = ?
       - Mutation rate = ?
       - Type of GA?
       - Any local search heuristics?
       - Fitness?
  27. SCP Chromosome
     - Chromosome = a vector u over the columns
       - u[i] = 1 means the i-th column of the matrix is considered for the overall covering
     - [Figure: the example matrix with a candidate chromosome u over columns 1..5]
  28. SCP Chromosome
     - Same definition, with an example chromosome labeled "Not optimal"
  29. SCP Chromosome
     - Same definition, with an example chromosome labeled "Minimal" (columns 1, 3, and 4)
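
A minimal sketch of this encoding, assuming list-of-bits chromosomes and reusing EXAMPLE_MATRIX and covers_all_rows from the sketch after slide 25; the helper names are illustrative.

    def chosen_columns(u):
        # Decode a chromosome: the indices whose bit is set are the selected columns
        return [i for i, bit in enumerate(u) if bit == 1]

    def is_feasible(matrix, u):
        # A chromosome is feasible if its selected columns cover every row
        return covers_all_rows(matrix, chosen_columns(u))

    print(is_feasible(EXAMPLE_MATRIX, [1, 0, 1, 1, 0]))  # the minimal cover {1, 3, 4} -> True
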
  30. SCP: GA Parameters
     - Generate a fixed number of binary vectors u
       - Population size depends on the SCP matrix size
       - Empirically, 60 to 300 individuals
     - The mutation rate should be very small
       - Usually pm = 0.1% to pm = 1%
       - A higher pm may destroy highly fit individuals
     - Crossover rate
       - Unless other operators are used, pc = 1 - pm
  31. Fitness Function
     - We are trying to minimize the total cost of the selected columns
       - If C(i) > C(j), where i and j are individuals, then F(i) < F(j)
       - The most highly fit individual has the smallest cost
     - That could serve as the basic (weak) fitness function
     - In real-world problems (e.g. bus scheduling) more constraints are involved: fares, unions, gas, and so forth
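
A sketch of one possible fitness of this kind, assuming unit column costs by default and adding a penalty for uncovered rows so that infeasible chromosomes always rank below feasible ones; the penalty scheme is an illustrative assumption, not taken from the slides.

    def scp_fitness(matrix, u, costs=None):
        # Lower cost -> higher fitness; uncovered rows incur a large penalty
        n_cols = len(matrix[0])
        costs = costs or [1] * n_cols
        selected = [i for i, bit in enumerate(u) if bit == 1]
        total_cost = sum(costs[i] for i in selected)
        uncovered = sum(1 for row in matrix if not any(row[c] for c in selected))
        penalty = uncovered * sum(costs)   # dominates any saving from dropping needed columns
        return -(total_cost + penalty)     # negate so the GA maximizes as usual
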
  32. SCP: GA Type
     - A canonical GA will converge prematurely
       - Eventually, crossover will cause the vectors u to become indistinguishable
     - Strategy: use a hybrid GA
       - Some established local search heuristic
       - E.g. the Nelder-Mead simplex search (Numerical Recipes in C, 1989)
     - Computationally expensive for large matrices!
       - Consider parallel GAs
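
One simple local improvement often applied to SCP chromosomes (not the simplex method named on the slide) is a greedy repair-and-prune step: add columns until every row is covered, then drop redundant columns. A minimal sketch under that assumption, for feasible instances where every row can be covered by at least one column:

    def local_improve(matrix, u, costs=None):
        n_cols = len(matrix[0])
        costs = costs or [1] * n_cols
        u = list(u)
        # Repair: for each uncovered row, add the cheapest column that covers it
        for row in matrix:
            if not any(row[c] and u[c] for c in range(n_cols)):
                candidates = [c for c in range(n_cols) if row[c]]
                u[min(candidates, key=lambda c: costs[c])] = 1
        # Prune: drop any column whose rows are already covered by other selected columns
        for c in range(n_cols):
            if u[c]:
                u[c] = 0
                if not all(any(row[j] and u[j] for j in range(n_cols)) for row in matrix):
                    u[c] = 1  # the column was needed; restore it
        return u
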
  33. SCP: Binary Image Compression
     - A binary image is a collection of black and white pixels
       - Represented as a matrix of zeros and ones
       - By convention, zero represents a white pixel, while one represents a black pixel
     - Consider partitioning the image into 8x8 matrices (or blocks)
       - Compress the image by compressing each block
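
A minimal sketch of that partitioning step, assuming the image arrives as a list of 0/1 rows and is padded with white pixels when its dimensions are not multiples of the block size; the padding choice and function name are assumptions.

    def split_into_blocks(image, block_size=8):
        # Pad the binary image with zeros (white) so its dimensions are multiples
        # of block_size, then cut it into block_size x block_size tiles
        rows = len(image)
        cols = len(image[0]) if rows else 0
        padded_rows = -(-rows // block_size) * block_size   # ceiling division
        padded_cols = -(-cols // block_size) * block_size
        padded = [[image[r][c] if r < rows and c < cols else 0
                   for c in range(padded_cols)] for r in range(padded_rows)]
        blocks = []
        for r in range(0, padded_rows, block_size):
            for c in range(0, padded_cols, block_size):
                blocks.append([row[c:c + block_size]
                               for row in padded[r:r + block_size]])
        return blocks
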
  34. Compressing Blocks
     - The block comprises 0s and clusters of 1s
       - How can we efficiently encode those clusters?
     - Possible encodings (from the figure):
       - One line
       - Two points
       - One rectangle
  35. Compressing Blocks
     - Problem: group the 1s using a number of geometric shapes, such that the total cost (expressed in bits) is at a minimum
       - First, convert each block into an SCP matrix whose columns are all possible geometric shapes
       - Then, choose those columns that cover all rows
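
As an illustration of that conversion, here is a sketch that builds the SCP matrix for a single square block using only solid rectangles of 1-pixels as candidate shapes and a placeholder bit cost per shape; the slides also allow lines and point pairs, so this is a simplified, assumed version.

    def block_to_scp(block):
        # Rows of the SCP matrix = coordinates of the 1-pixels in the block.
        # Columns = candidate rectangles containing only 1-pixels; the cost is an
        # illustrative bit count for encoding a rectangle by its two corners.
        n = len(block)
        ones = [(r, c) for r in range(n) for c in range(n) if block[r][c] == 1]
        row_index = {p: i for i, p in enumerate(ones)}
        columns, costs = [], []
        for r1 in range(n):
            for c1 in range(n):
                for r2 in range(r1, n):
                    for c2 in range(c1, n):
                        cells = [(r, c) for r in range(r1, r2 + 1)
                                        for c in range(c1, c2 + 1)]
                        if all(block[r][c] == 1 for r, c in cells):
                            col = [0] * len(ones)
                            for cell in cells:
                                col[row_index[cell]] = 1
                            columns.append(col)
                            costs.append(4 * 3)  # two corners, 3 bits per coordinate in an 8x8 block
        return ones, columns, costs
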
  36. Conclusions
     - GAs are powerful search metaheuristics
       - Survival-of-the-fittest paradigm
       - Implicit parallelism: the Schema Theorem
       - Not random! Stochasticity lies in mutation and crossover
     - The GA crux: modeling the fitness function
       - The fitness model determines GA capability
       - Careful empirical/theoretical considerations
     - GAs are computationally expensive
       - Consider distributing GA computations
