Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions

Epistasis correlation is a measure that estimates the strength of interactions between problem variables. This paper presents an empirical study of epistasis correlation on a large number of random problem instances of NK landscapes with nearest neighbor interactions. The results are analyzed with respect to the performance of hybrid variants of two evolutionary algorithms: (1) the genetic algorithm with uniform crossover and (2) the hierarchical Bayesian optimization algorithm.
http://medal.cs.umsl.edu/files/2011002.pdf

  1. Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions
     Martin Pelikan, Missouri Estimation of Distribution Algorithms Laboratory (MEDAL), University of Missouri, St. Louis, MO
     http://medal.cs.umsl.edu/
     pelikan@cs.umsl.edu
     Download MEDAL Report No. 2011002: http://medal.cs.umsl.edu/files/2011002.pdf
     Martin Pelikan: Analysis of Epistasis Correlation on NK Landscapes
  2. Motivation
     Problem difficulty measures: Important for understanding and estimating problem difficulty. Should be useful in designing, choosing, and setting up optimization algorithms. Most past work considers only a few isolated instances.
     This study: Focuses on measures of epistasis (variable interactions). Analyzes epistasis measures on a large number of instances of nearest-neighbor NK landscapes. Compares the measures with the actual performance of hybrid GA. Complements last year's GECCO paper on other measures.
  3. Outline
     (1) Epistasis. (2) Epistasis variance and epistasis correlation. (3) NK landscapes. (4) Experiments. (5) Conclusions and future work.
  4. Epistasis
     Epistasis refers to interactions between problem variables: the effect of one variable depends on the values of other variable(s). In biology, the effect of one gene on the phenotype is affected by another gene.
     Why should we care? Absence of epistasis indicates a simple, linear problem. Epistasis may make a problem more difficult.
  5. Critical View on Epistasis
     Criticism: Epistasis is of little use unless we understand its nature. There exist many easy problems with high epistasis. There exist many hard problems with little epistasis. Epistasis is difficult to measure using finite samples.
     Examples of epistasis in a difficult problem: needle in a haystack; deceptive problem.
     Example of epistasis in a simple problem: onemax with an additional contribution of the optimum.
  6. Linear Fitness Approximation
     Assume candidate solutions are n-bit binary strings and a population $P$ of $N$ solutions.
     $P_i(v_i)$ denotes the solutions in $P$ with value $v_i \in \{0, 1\}$ in position $i$, and $N_i(v_i)$ is the number of solutions in $P_i(v_i)$.
     $f_i(v_i)$ approximates the contribution of $v_i$ to fitness, where $\bar{f}(P)$ is the average fitness of the solutions in $P$:
     $$f_i(v_i) = \frac{1}{N_i(v_i)} \sum_{x \in P_i(v_i)} f(x) - \bar{f}(P)$$
     The fitness is then approximated as
     $$f_{lin}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} f_i(X_i) + \bar{f}(P)$$
  7. Epistasis Variance
     Epistasis variance (Davidor, 1990). In short: the sum of squared differences between $f$ and $f_{lin}$, normalized by the population size $N$. Epistasis variance $\xi_P(f)$ is defined as
     $$\xi_P(f) = \frac{1}{N} \sum_{x \in P} \left( f(x) - f_{lin}(x) \right)^2$$
  8. Epistasis Correlation
     Epistasis correlation (Rochet et al., 1997). In short: the correlation coefficient between $f$ and $f_{lin}$.
     Sum of squared differences between $f$ and its average $\bar{f}(P)$:
     $$s_P(f) = \sum_{x \in P} \left( f(x) - \bar{f}(P) \right)^2$$
     Sum of squared differences between $f_{lin}$ and its average $\bar{f}_{lin}(P)$:
     $$s_P(f_{lin}) = \sum_{x \in P} \left( f_{lin}(x) - \bar{f}_{lin}(P) \right)^2$$
     Epistasis correlation $\mathrm{epic}_P(f)$ is defined as
     $$\mathrm{epic}_P(f) = \frac{\sum_{x \in P} \left( f(x) - \bar{f}(P) \right) \left( f_{lin}(x) - \bar{f}_{lin}(P) \right)}{\sqrt{s_P(f)\, s_P(f_{lin})}}$$
  9. Evaluating Epistasis Measures
     Epistasis variance: Not invariant w.r.t. linear transformations of f. Not within a fixed range of values. Smaller epistasis variance indicates weaker epistasis.
     Epistasis correlation: Invariant w.r.t. linear transformations of f. Value is within the range [0, 1]. Greater epistasis correlation indicates weaker epistasis.
  10. Experiments: Algorithms
      Genetic algorithm (Holland, 1975): Uniform crossover. Bit-flip mutation. Tournament selection. Restricted tournaments for niching. Steepest ascent hill climber for local search.
      Hierarchical BOA (Pelikan et al., 2001): Variation by learning and sampling Bayesian networks with decision trees. Tournament selection. Restricted tournaments for niching. Steepest ascent hill climber for local search.
  11. Experiments: Problems
      NK landscapes with nearest neighbors: Defined on n-bit binary strings. Fitness is the sum of n subproblems of order k + 1. Subproblem i uses the ith variable and the following k variables. Neighborhoods wrap around (as on a circle). Subproblems are defined as lookup tables with entries generated at random from [0, 1).
      Example for n = 6 and k = 2:
      $$f(X_1, \ldots, X_6) = f_1(X_1, X_2, X_3) + f_2(X_2, X_3, X_4) + f_3(X_3, X_4, X_5) + f_4(X_4, X_5, X_6) + f_5(X_5, X_6, X_1) + f_6(X_6, X_1, X_2)$$
      Lookup table for the first subproblem (the remaining tables are elided on the slide):
      X1  X2  X3  f1(·)
      0   0   0   0.51
      0   0   1   0.18
      0   1   0   0.97
      0   1   1   0.68
      1   0   0   0.47
      1   0   1   0.73
      1   1   0   0.06
      1   1   1   0.41
  12. Experiments: Problems
      NK parameters: k ∈ {2, 3, 4, 5, 6}; n ∈ {20, 30, 40, 50, 60, 70, 80, 90, 100}. For each (n, k), we use 10,000 instances.
      Difficulty of nearest-neighbor NK landscapes: Difficulty grows with k. Polynomially solvable using dynamic programming. For larger n and k, hBOA outperforms GA.
  13. Results: Scatter Plot for hBOA
      Epistasis correlation decreases with k (expected). For any k, epistasis correlation does not seem to closely correspond to the actual problem difficulty.
  14. Results: Epistasis Correlation vs. n and k for hBOA
      [Figure 3: Epistasis correlation with respect to the number n of bits and the number k of neighbors for nearest-neighbor NK landscapes. Panel (a): epistasis correlation (y-axis, 0 to 1) against the number of bits, n, with one curve for each k from 2 to 6; epistasis correlation does not change with n. Panel (b): average epistasis correlation for n = 100 against the number of neighbors, k; epistasis correlation decreases with k.]
  15. Results: Problem Difficulty and Epistasis Correlation
      hBOA for n = 100 and k = 6:
      desc. of instances   DHC steps until optimum   epistasis correlation
      10% easiest          21364.1 (2929.9)          0.2349 (0.016)
      25% easiest          26787.7 (5261.6)          0.2351 (0.016)
      50% easiest          34276.6 (8833.1)          0.2348 (0.016)
      all instances        60774.8 (42442.8)         0.2344 (0.016)
      50% hardest          87272.9 (46049.2)         0.2339 (0.016)
      25% hardest          114418.9 (52085.3)        0.2340 (0.016)
      10% hardest          154912.8 (62794.1)        0.2341 (0.016)
      Table 1: Epistasis correlation for easy and hard instances for hBOA. The difficulty of instances is measured by the overall number of steps of the local searcher.
      GA (uniform) for n = 80 and k = 6:
      desc. of instances   DHC steps until optimum   epistasis correlation
      10% easiest          15208.7 (3718.2)          0.2358 (0.018)
      25% easiest          22427.3 (6968.4)          0.2358 (0.018)
      50% easiest          34855.5 (14722.9)         0.2353 (0.018)
      all instances        117021.4 (204462.0)       0.2344 (0.018)
      50% hardest          199187.4 (264378.2)       0.2335 (0.018)
      25% hardest          310451.2 (338773.5)       0.2330 (0.018)
      10% hardest          519430.9 (461122.7)       0.2324 (0.018)
      Table 2: Epistasis correlation for easy and hard instances for GA with uniform crossover. The difficulty of instances is measured by the overall number of steps of the local searcher.
      For fixed n and k, epistasis correlation changes only a little. Epistasis is stronger for more difficult problems, but the differences are nearly negligible.
  16. Conclusions and Future Work
      Conclusions: For NK landscapes, epistasis correlation is certainly not useless; it provided some input on problem difficulty. Epistasis correlation succeeded in providing a clear indication that problem difficulty increases with k. Epistasis correlation failed to capture the increase of problem difficulty with problem size. Epistasis correlation failed to provide a clear indication of problem difficulty for a fixed n and k.
      Future work: Compare different measures of problem difficulty. Identify problem features that these measures do not capture. Create new problem difficulty measures that provide better input for optimization practitioners.
      Key goals of these efforts: Tune the algorithm to the problem (parameters, operators). Choose the best optimization algorithm. Drive the design of new optimization algorithms.
  17. Acknowledgments
      NSF; NSF CAREER grant ECS-0547013. University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.
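
The definitions on the slides "Linear Fitness Approximation", "Epistasis Variance" and "Epistasis Correlation" can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names are hypothetical, the population is a plain list of bit lists, and the correlation is normalized by the square root of the product of the two sums of squares, as in a standard correlation coefficient. A position whose value never occurs in the sample gets a contribution of zero, which is an assumption made here to avoid dividing by zero.

```python
from itertools import product
from math import sqrt


def linear_approximation(population, fitness):
    """Build f_lin from a sample population (hypothetical helper)."""
    n = len(population[0])
    values = [fitness(x) for x in population]
    mean_f = sum(values) / len(values)
    contrib = []  # contrib[i][v] plays the role of f_i(v)
    for i in range(n):
        per_value = {}
        for v in (0, 1):
            matching = [fx for x, fx in zip(population, values) if x[i] == v]
            # Assumption: a value absent from the sample contributes 0.
            per_value[v] = sum(matching) / len(matching) - mean_f if matching else 0.0
        contrib.append(per_value)

    def f_lin(x):
        return sum(contrib[i][x[i]] for i in range(n)) + mean_f

    return f_lin


def epistasis_variance(population, fitness):
    """Mean squared difference between f and f_lin over the population."""
    f_lin = linear_approximation(population, fitness)
    return sum((fitness(x) - f_lin(x)) ** 2 for x in population) / len(population)


def epistasis_correlation(population, fitness):
    """Correlation coefficient between f and f_lin over the population."""
    f_lin = linear_approximation(population, fitness)
    f_vals = [fitness(x) for x in population]
    lin_vals = [f_lin(x) for x in population]
    mean_f = sum(f_vals) / len(f_vals)
    mean_lin = sum(lin_vals) / len(lin_vals)
    s_f = sum((a - mean_f) ** 2 for a in f_vals)
    s_lin = sum((b - mean_lin) ** 2 for b in lin_vals)
    if s_f == 0 or s_lin == 0:
        raise ValueError("correlation is undefined when f or f_lin is constant")
    cov = sum((a - mean_f) * (b - mean_lin) for a, b in zip(f_vals, lin_vals))
    return cov / sqrt(s_f * s_lin)


if __name__ == "__main__":
    # Enumerate all 3-bit strings and measure onemax, which has no epistasis.
    all_strings = [list(bits) for bits in product((0, 1), repeat=3)]
    print(epistasis_variance(all_strings, sum))
    print(epistasis_correlation(all_strings, sum))
```

On a fully enumerated onemax the linear approximation is exact, so the variance is zero and the correlation is one; a function with interacting bits, such as the product of two bits, gives a correlation below one.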
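
The nearest-neighbor NK landscape described on the slide "Experiments: Problems" can be sketched as follows. The helper names `make_nk_landscape` and `nk_fitness` are hypothetical, and two details are assumptions rather than facts from the slides: table entries are drawn uniformly from [0, 1), and the first variable of a subproblem is the most significant bit of the table index (which matches the row order of the example table for f1).

```python
import random


def make_nk_landscape(n, k, rng):
    """Create n lookup tables with 2**(k+1) entries each.

    Assumption: entries are drawn uniformly at random from [0, 1).
    """
    return [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]


def nk_fitness(x, tables, k):
    """Sum the n subproblems; subproblem i reads bits i, i+1, ..., i+k (wrapping)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        index = 0
        for j in range(k + 1):
            # Neighborhoods wrap around, as on a circle.
            index = (index << 1) | x[(i + j) % n]
        total += tables[i][index]
    return total


if __name__ == "__main__":
    rng = random.Random(2011)  # arbitrary seed, chosen for reproducibility
    tables = make_nk_landscape(6, 2, rng)
    print(nk_fitness([0, 1, 0, 1, 1, 0], tables, 2))
```

Passing an explicit `random.Random` instance keeps instance generation reproducible, which matters when many instances per (n, k) setting are compared.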
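
Both algorithms on the slide "Experiments: Algorithms" use a steepest ascent hill climber for local search, and the tables measure difficulty in hill-climber steps. A minimal sketch is given below, assuming that one step flips the single bit that improves fitness the most and that the search stops when no single flip improves fitness. The slides do not spell out these details, so the function `steepest_ascent` and the two toy fitness functions are illustrative only.

```python
def steepest_ascent(x, fitness):
    """Repeatedly flip the single best-improving bit; return (solution, fitness, steps)."""
    x = list(x)
    current = fitness(x)
    steps = 0
    while True:
        best_i, best_f = None, current
        for i in range(len(x)):
            x[i] ^= 1              # try flipping bit i
            candidate = fitness(x)
            x[i] ^= 1              # undo the flip
            if candidate > best_f:
                best_i, best_f = i, candidate
        if best_i is None:         # no single flip improves fitness: local optimum
            return x, current, steps
        x[best_i] ^= 1
        current = best_f
        steps += 1


def onemax(x):
    """Number of ones; a linear problem without epistasis."""
    return sum(x)


def trap(x):
    """A deceptive toy function: the optimum is all ones, but the slope leads to all zeros."""
    u = sum(x)
    return len(x) if u == len(x) else len(x) - 1 - u


if __name__ == "__main__":
    print(steepest_ascent([0, 0, 1, 0], onemax))
    print(steepest_ascent([1, 1, 0, 0], trap))
```

On onemax the climber reaches the global optimum, while on the deceptive function it is led to the all-zeros local optimum, which illustrates why local search alone is combined with a population-based algorithm in the hybrid variants.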
