1.
PhD Viva Voce
SPOT: A suboptimum- and proportion-based heuristic generation method for combinatorial optimization problems
25th February 2013, Hong Kong
PhD Candidate: Fan XUE
Supervisor: Dr. C.Y. CHAN
Co-supervisors: Dr. W.H. IP and Dr. C.F. CHEUNG
Department of Industrial and Systems Engineering (工業及系統工程學系), The Hong Kong Polytechnic University
2.
Outline
1 Introduction
2 The SPOT algorithm
3 Applications in two domains
4 Discussion and conclusions
Fan XUE: SPOT: A heuristic generation method (PhD viva)
4.
Introduction: Optimization
Optimization is ...
- Minimizing inconvenience (e.g., cost)
- Maximizing benefits (e.g., revenue)
A continuous example: Dido's problem
The legend of the foundation of Carthage (~800 BC) was recorded by a Roman poet (~29–19 BC). Queen Dido cut a bull hide into very narrow strips and circumscribed a maximal area of land in a semicircle, with the Mediterranean coast as a given edge.
Dido's problem (the isoperimetric problem): to enclose the maximum area with a given perimeter.
[Figures: Ruins of Carthage (from Wikipedia); a diagram of Dido's problem with variables b and f(x)]
5.
Introduction: Optimization
A discrete example: Tian's horse racing (田忌賽馬)
In the 340s BC, General Tian of the Kingdom of Qi raced horses with other members of the royal family several times. His guest Sun told him a better plan (Sima, 91 BC): with Sun's displacement strategy, Tian won 2 of the 3 races.
The two successful solvers of optimization problems became representatives of human rationality and intelligence.
Optimization: "soft" research, yet important.
[Figures: Ruins of the Ancestral Temple of Qi (from qidu.net); diagrams of Tian's usual races vs. Sun's strategy, matching faster and slower horses]
6.
Introduction: Combinatorial optimization
To look for an object in a finite (or possibly countably infinite) set (Papadimitriou & Steiglitz, 1982)
Typical objects: a number, a permutation or a graph structure (Blum & Roli, 2003)
Term "search space": the set of all objects
Term "optimum": the best object
Examples: the traveling salesman problem (TSP); flow shop scheduling (FSP)
[Figure: a TSP instance (from Wikipedia)]
7.
Introduction: Algorithms for difficult combinatorial optimization
Many problems, including TSP and FSP, are very difficult (NP-hard, Non-deterministic Polynomial-time hard) at large scale
Even NP-hard to approximate in polynomial time:
- TSP: when the approximation ratio < 220/219 (Papadimitriou & Vempala, 2006)
- FSP: when the approximation ratio < 5/4 (Williamson et al., 1997)
Exact algorithms guarantee to find the optimum
- Including branch-and-bound, integer (linear) programming-based, relaxation-based, exponential neighborhood search, ...
- Typically face explosive running time (when the problem size is large)
Heuristics concentrate on a (small) subset of the solution space and do not guarantee finding the optimum
- Including greedy construction, hyper-heuristics, random search, ...
- Typically return suboptimal solutions
This thesis focuses on heuristics.
8.
Introduction: Dominance among algorithms
Exact algorithms and heuristics are both indispensable (cf. the squeeze theorem)
One algorithm is called "dominated" iff it is inferior both in running time and in objective value; non-dominated = good
A few exact algorithms, a lot of heuristics
[Figure: examples of non-dominated and dominated heuristics for solving the 10,000-city Random Uniform Euclidean dataset (Johnson & McGeoch, 2002)]
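The dominance relation on the slide can be sketched as a small Pareto filter over (time, objective) pairs; the algorithm names and numbers below are illustrative, not data from the thesis.

```python
def non_dominated(algorithms):
    """Filter out dominated algorithms.

    Each entry is (name, running_time, objective_value); lower is better
    for both. An algorithm is dominated iff some other algorithm is no
    worse in both measures and strictly better in at least one.
    """
    result = []
    for name, t, obj in algorithms:
        dominated = any(
            (t2 <= t and o2 <= obj) and (t2 < t or o2 < obj)
            for _, t2, o2 in algorithms
        )
        if not dominated:
            result.append(name)
    return result

# Hypothetical heuristics: (name, seconds, tour length ratio to optimum)
algos = [
    ("greedy",   0.1, 1.25),   # very fast, poor tours
    ("2-opt",    1.0, 1.05),
    ("slow-bad", 5.0, 1.20),   # dominated by 2-opt: slower AND worse
    ("lin-ker", 10.0, 1.01),
]
print(non_dominated(algos))  # → ['greedy', '2-opt', 'lin-ker']
```

Note how "slow-bad" is the only entry removed: it loses to "2-opt" on both axes, while the other three trade time for quality.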
9.
Introduction: Hyper-heuristics
A hyper-heuristic is a search method or learning mechanism for selecting or generating heuristics to solve computational search problems (Burke et al., 2010)
Formally, a hyper-heuristic is a 3-tuple (ag, A, l), where:
- S: problem search space
- A: attribute space
- H: low-level heuristic (LLH) space
- R+ ∪ {0}: measurement
- ag: attribute generation
- l: learning
- π: getting performance
10.
Introduction: Hyper-heuristics (cont.)
Two subclasses of hyper-heuristics:
- Heuristic selection: dates back to Crowston et al.'s (1963) early trial; portfolio algorithms: SATzilla (Xu et al., 2008), Pearl Hunter (Chan et al., 2012)
- Heuristic generation: e.g., genetic programming (GP) (Koza, 1992)
Many hyper-heuristics were easily dominated, but some (e.g., SATzilla, Pearl Hunter) are strong
Two main challenges for hyper-heuristics:
- Computation time (preparation + learning + applying)
- Generalization: "the biggest challenge" (Burke et al., 2010)
11.
Introduction: No-Free-Lunch Theorems
If an algorithm aims at solving general problems (the latter challenge), there is a pitfall: the No-Free-Lunch Theorems (NFLs; Wolpert & Macready, 1997)
- Over all possible problems, no search algorithm is better than any other algorithm, on average.
- Over all possible search algorithms, no search problem is harder than any other problem, on average.
NFLs concern black-box optimization: no instance- or domain-specific information is considered.
Question: how to keep a safe distance from the pitfall of the NFLs?
12.
Introduction: Motivations
Answer: for an optimization algorithm, there are (at least) two options:
- To become a well-designed, problem domain-specific (ad hoc) algorithm (Burke et al., 2003)
- To discover and make use of domain-specific, problem data set-specific and/or instance-specific characteristics dynamically, from case to case
This thesis focuses on the latter.
[Figure: a VLSI drilling data set and a random data set, both in the TSP domain]
13.
Introduction: Forerunners -- fractals using instance-specific information
An example is measuring the length of the coastline of Great Britain by "self-similarity" (Mandelbrot, 1967):
L(G) = M·G^(1-D)
where M is a positive constant prefactor, G is the measurement scale and D is an instance-specific constant, the fractal dimension.
D    Geographic boundary
1.00 A straight line
1.25 The west coast of Great Britain
1.13 The Australian coast
1.02 The coast of South Africa, one of the smoothest in the atlas
[Figures: coastline of Britain (from Wikipedia); maps of Australia and South Africa (from ducksters.com)]
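Mandelbrot's law above can be evaluated directly; the constant M = 1 below is an arbitrary illustrative choice:

```python
def coastline_length(M, G, D):
    """Mandelbrot's empirical law L(G) = M * G**(1 - D): the measured
    length depends on the measuring scale G, at a rate set by the
    instance-specific fractal dimension D."""
    return M * G ** (1 - D)

# A straight line (D = 1.00) measures the same at every scale, while the
# west coast of Great Britain (D = 1.25) grows as the ruler G shrinks.
for G in (100.0, 10.0, 1.0):
    line = coastline_length(1.0, G, 1.00)
    coast = coastline_length(1.0, G, 1.25)
    print(f"G={G:>6}: straight line {line}, GB west coast {coast:.3f}")
```

The point of the forerunner example carries over to SPOT: a single instance-specific constant (here D), learned from a part of the instance, characterizes the whole.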
14.
Introduction: Forerunners -- sampling using instance-specific information
Sampling/estimation is another example: to select some part of a population in order to estimate something about the whole population
A sample is representative (correctly reflecting instance-specific information) only when it is:
- accurate (with a mean of errors no more than a given threshold)
- reproducible (with a variance of errors no more than another given threshold)
Well-known applications: census (with a sampling frame); production testing (see also this thesis)
[Figure: census (from ehow.com)]
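The two representativeness criteria on the slide can be sketched as checks on repeated samples; the thresholds and population below are illustrative assumptions, not values from the thesis.

```python
import random
import statistics

def sample_mean(population, k, rng):
    """Mean of a simple random sample of size k (without replacement)."""
    return statistics.mean(rng.sample(population, k))

def is_representative(population, k, runs, bias_tol, var_tol, seed=0):
    """Check the slide's two criteria over repeated samples:
    accurate      -> |mean of errors|    <= bias_tol
    reproducible  -> variance of errors  <= var_tol
    """
    rng = random.Random(seed)
    true_mean = statistics.mean(population)
    errors = [sample_mean(population, k, rng) - true_mean for _ in range(runs)]
    return (abs(statistics.mean(errors)) <= bias_tol
            and statistics.variance(errors) <= var_tol)

population = list(range(1000))
print(is_representative(population, k=100, runs=50, bias_tol=30.0, var_tol=2000.0))
```

Sampling the entire population trivially satisfies both criteria (all errors are zero), which is a useful sanity check for the implementation.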
15.
Introduction: Using instance-specific information to adjust algorithms
Employing instance-specific information as a "characteristic" to adjust algorithms is common in the literature, e.g., tour-merging
Typical ways of adjustment: algorithm selection; parameter optimization (aka parameter tuning); algorithm generation
In this thesis, define:
- Heuristic selection: "always returns at most ℵ0 possible heuristics"
- Heuristic generation: "returns more than ℵ0 possible heuristics for at least one (usually large) problem" (e.g., with ℵ0 objects)
This thesis focuses on heuristic generation.
16.
Introduction: Supervised learning
Supervised learning (or classification): the search for algorithms that reason from externally supplied records (or instances) to produce general hypotheses, which then make predictions about future instances (Kotsiantis, 2007)
Well-known methods include Naive Bayes, logistic regression, support vector machines, C4.5, neural networks, Ripper, ...
In this thesis, three representative methods are employed (to reduce model errors)
Color        Legs  Result
black        8     Spider
gray         8     Spider
gray         4     Lizard
white/black  4     Other
[Figure: supervised learning comic (from uq.edu.au)]
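The record table above can feed even the simplest supervised learner. This is a minimal single-attribute ("one rule") sketch, far cruder than the C4.5/Naive Bayes/Ripper methods the thesis actually uses, just to make the records-to-hypothesis-to-prediction loop concrete:

```python
from collections import Counter, defaultdict

# Training records from the slide's toy table: (attributes, class label).
records = [
    ({"color": "black",       "legs": 8}, "Spider"),
    ({"color": "gray",        "legs": 8}, "Spider"),
    ({"color": "gray",        "legs": 4}, "Lizard"),
    ({"color": "white/black", "legs": 4}, "Other"),
]

def train_one_rule(records, attribute):
    """Learn a single-attribute rule: map each value of the chosen
    attribute to the majority class seen with that value."""
    buckets = defaultdict(Counter)
    for attrs, label in records:
        buckets[attrs[attribute]][label] += 1
    return {value: counts.most_common(1)[0][0]
            for value, counts in buckets.items()}

rule = train_one_rule(records, "legs")
print(rule[8])  # eight legs -> predicted class for a future instance
```

Eight-legged records are unanimously "Spider", so the rule predicts "Spider" for any future eight-legged instance; four legs is ambiguous in this toy table, which is exactly why richer learners and more attributes are needed.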
17.
Section 2: THE SPOT ALGORITHM
18.
The SPOT algorithm: Objectives
SPOT: Suboptimum- and Proportion-based On-the-fly Training
Main objectives:
- On-the-fly learning based on a sampled proportion and its suboptima
- Using learned instance-specific information to generate (modify) heuristics
Supportive objectives:
- To standardize the input model
- A systematic way of compiling decision attributes
- An indicator to guide development and to predict the effectiveness approximately
- Several effective ways of using instance-specific information
[Figure: a proportion of an image of a leopard with spots (from worldwildlife.org)]
19.
The SPOT algorithm: The U/EA model
A combinatorial optimization problem is called Unconstrained and with Equinumerous Assignments (U/EA) iff:
- there are no hard constraints among different variables in the given problem
- all variables have equinumerous assignments in the given problem
Convenient for learning to predict the best assignment for each variable; a small proportion of variables (solved once) is sufficient
U/EA transformation: an injection (sometimes bijective) f is a U/EA transformation iff the domain of f is a problem domain and the image of f fulfills the U/EA model
[Figure: U/EA transformation of combinatorial optimization, from a given problem domain to a problem in the U/EA model; a U/EA cable (image from cooldrives.com)]
20.
The SPOT algorithm: Formalization
The SPOT algorithm is a 7-tuple (sam, S', T, Σ, ag, A, l), where:
- sam: sampling
- S': a proportion's (subproblem's) search space
- T: transformation to U/EA
- Σ: U/EA problem search space
- the remaining symbols are as in the model of hyper-heuristics
The SPOT algorithm is a hyper-heuristic
Proof: define a mapping ag'(s) = ag(T(sam(s))); then SPOT becomes the hyper-heuristic (ag', A, l).
21.
The SPOT algorithm: Development
Three main phases to develop a SPOT application
An indicator: the resemblance r of two solutions is
r = ||same assignments|| / ||variables||
Compatibility: an evaluation of the U/EA transformation
Given a set {r1, r2, ...} of average resemblances of suboptima of different instances (one average resemblance per instance, from at least two suboptima), a U/EA problem is compatible with SPOT if the following can generally be held:
- (Generality) ri ≈ rj ≈ ... ≈ r (σ → 0), where ri, rj ∈ {r1, r2, ...}
- (Approximability) r ≈ 1
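The resemblance indicator and the two compatibility conditions translate directly into code. The thresholds below (r_min, sigma_max) are illustrative assumptions; the thesis judges "≈ 1" and "σ → 0" per application rather than by fixed cutoffs:

```python
import statistics

def resemblance(sol_a, sol_b):
    """r = ||same assignments|| / ||variables|| for two solutions given
    as equal-length assignment vectors."""
    assert len(sol_a) == len(sol_b)
    same = sum(1 for a, b in zip(sol_a, sol_b) if a == b)
    return same / len(sol_a)

def compatible(avg_resemblances, r_min=0.7, sigma_max=0.1):
    """Sketch of the two compatibility checks (thresholds are
    hypothetical):
    Approximability: mean of {r_i} close to 1  (here: >= r_min)
    Generality:      sigma of {r_i} close to 0 (here: <= sigma_max)
    """
    mean = statistics.mean(avg_resemblances)
    sigma = statistics.stdev(avg_resemblances)
    return mean >= r_min and sigma <= sigma_max

print(resemblance([1, 2, 3, 4], [1, 2, 4, 3]))  # → 0.5
print(compatible([0.80, 0.78, 0.81]))           # → True
```

High, stable resemblance across instances is what licenses learning from a small sampled proportion: the suboptima largely agree on most variable assignments.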
22.
The SPOT algorithm: Attribute population
2p+3 attributes for the i-th assignment from one raw attribute, where p = ||variables in the subproblem||
An example with one raw attribute:
raw  a1  a2  a3  a4  ...
2    2   1   T   F   ...
3    3   2   F   T   ...
3    3   2   F   T   ...
23.
The SPOT algorithm: Supervised learning from a library
Learning methods from Weka 3.6:
- Three supervised learning methods: J48, NaiveBayes and JRip, implementing the C4.5, Naive Bayes and Ripper methods, respectively
- An attribute selection method: BestFirst search with the CfsSubsetEval evaluator
Parameters of the methods were kept at their default values
24.
The SPOT algorithm: Implementation in Java
The main program SPOT calls two components:
- SPOT_ML: the 3 + 1 learning methods
- PD_UEA2: extends problem domains from HyFlex
HyFlex is a hyper-heuristic development platform used in CHeSC 2011 (the 1st Cross-domain Heuristic Search Challenge), with 6 domains, tens of existing LLHs, and some best-so-far hyper-heuristic results
25.
Section 3: APPLICATIONS IN TWO DOMAINS
26.
Application I: The FSP domain
FSP: an overview
- To find a permutation of n jobs on m machines that minimizes the makespan
- Early research dates back to Johnson (1954)
- Applications: shop floors, Internet multimedia transfer, ...
- Top heuristics: NEH (Nawaz et al., 1983), ...
[Table: FSP benchmark instances and heuristics in HyFlex]
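The makespan objective above has a standard dynamic-programming evaluation for a permutation flow shop: a job may start on machine k only after it finishes on machine k-1 and after the previous job leaves machine k. A minimal sketch with invented processing times:

```python
def makespan(perm, proc):
    """Completion time of the last job on the last machine.
    proc[j][k] = processing time of job j on machine k; perm = job order."""
    m = len(proc[0])
    comp = [0.0] * m            # running completion times per machine
    for j in perm:
        for k in range(m):
            prev_machine = comp[k - 1] if k > 0 else 0.0
            # start after both the same machine frees up and the job
            # finishes on the previous machine
            comp[k] = max(comp[k], prev_machine) + proc[j][k]
    return comp[-1]

# 3 jobs on 2 machines (illustrative data, not a benchmark instance)
proc = [[3, 2], [1, 4], [2, 2]]
print(makespan([0, 1, 2], proc))  # → 11.0
print(makespan([1, 2, 0], proc))  # → 9.0 (a better permutation)
```

Heuristics such as NEH search the permutation space for orders like the second one, which overlaps machine work better and cuts the makespan.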
27.
Application I: Transformations
P1: Transformations
For each pair of jobs Ji, Jj in a permutation, define a 0-1 function "following"; a permutation such as (J3, J2, J1) can then be written as a 0-1 matrix
Reverse transformation: the summation of the i-th row gives Ji's reverse order
Compatibility:
- r = 79.58%: high approximability
- σ = 5.80%: high generality
[Figures: learning r in 4 test instances in FSP; predicting the instance-specific 0-1 matrix]
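The "following" transformation and its reverse can be sketched as follows (0-indexed jobs, so the slide's (J3, J2, J1) becomes the permutation [2, 1, 0]); this is a reading of the slide, not the thesis code:

```python
def to_following_matrix(perm, n):
    """U/EA transformation sketched on the slide: F[i][j] = 1 iff job i
    appears before (is followed by) job j in the permutation."""
    F = [[0] * n for _ in range(n)]
    for pos, i in enumerate(perm):
        for j in perm[pos + 1:]:
            F[i][j] = 1
    return F

def from_following_matrix(F):
    """Reverse transformation: the i-th row sum counts the jobs that
    follow job i, so sorting by decreasing row sum recovers the order."""
    n = len(F)
    return sorted(range(n), key=lambda i: -sum(F[i]))

F = to_following_matrix([2, 1, 0], 3)  # permutation (J3, J2, J1)
print(F)                                # → [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
print(from_following_matrix(F))         # → [2, 1, 0]
```

This recasts the permutation as n² independent 0-1 variables with equinumerous assignments, which is what makes per-variable prediction (the U/EA model) applicable.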
28.
Application I: Attribute definitions
5m+3 attributes for each job pair (Ji, Jj)
29.
Application I: Parameters
P2: Parameter determination
- n': size of subproblem := 30
- Reverse := weighted permutation (50% NEH)
- Sampling := random selection
[Tables: average test results of resemblance (n' = 30, 100 runs, significant values of p in bold); average test results (100 runs)]
30.
Application I: Modifying heuristics
Applying learning results back to LLHs
- The LLHs become constructive heuristics by setting parameter 1 to 1.0
- Random construction vs. construction guided by instance-specific information
31.
Application I: Individual results of new LLHs
- LLHs 0# and 1# were not significantly changed
- LLHs 5#, 6#, 9# and 10# were significantly changed
- 3 new non-dominated LLHs: 5', 6' and 9'
32.
Application I: Comparing to world-leading hyper-heuristics
- PHunter: Pearl Hunter (10 mins)
- SPOT-PH: SPOT + PH (new LLHs) (10 mins overall)
Observations:
- PH is generally improved; its score doubled and surpassed the best-known entries
- SPOT executed on-the-fly
[Figures/tables: score against best results in CHeSC 2011 (5x31 runs, higher = better); (median) time spent by SPOT (31 runs); a comparison on median makespan (31 runs)]
33.
Application II: The TSP domain (fast-forward)
P1: Transformation := predict the best two or more edge candidates in edge sets
Compatibility:
- r = 70.4%: moderate approximability
- σ = 21.7%: moderate to low generality
P2: Parameter determination
- n := 400
- Aspect ratio of subproblems := 1:1
- Sampling := rectangular subarea + random selection
P3: Applying back and runs: let local-search LLHs try the candidates suggested by SPOT first
Note: a light/general version of Xue et al. (2011)
[Figure: predicting promising candidates from a list of edge sets]
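The edge candidate sets that the LLHs try first are typically k-nearest-neighbour lists. A minimal sketch (brute-force distances, illustrative coordinates; production TSP codes use k-d trees for this):

```python
import math

def knn_candidates(points, k):
    """Build a k-nearest-neighbour candidate edge set for each city of a
    Euclidean TSP instance; local-search LLHs then try only these short
    edges first instead of all n-1 edges per city."""
    cand = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        cand.append([j for _, j in dists[:k]])
    return cand

cities = [(0, 0), (1, 0), (2, 0), (10, 10)]
print(knn_candidates(cities, 2))  # → [[1, 2], [0, 2], [1, 0], [2, 1]]
```

The slide's coverage figures (8-NN covering 99% of the optimum's edges vs. 97% for 5-NN) explain the trade-off: larger k rarely misses optimal edges but enlarges the neighbourhood every LLH move must scan.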
34.
Application II: Results in the TSP domain
Observations:
- Improved in some instances, worsened in some others
- The overall improvement is slight and statistically insignificant
Comparing to Xue et al. (2011):
- LLHs: 2/3-Opt vs. 5-Opt
- Candidate sets: 8-NN (99% coverage of the optimum's edges) vs. 5-NN (97%)
[Figures/tables: score against best results in CHeSC 2011 (5x31 runs, higher = better); a comparison on median tour length (31 runs)]
35.
Section 4: DISCUSSION AND CONCLUSIONS
36.
Discussion: The SPOT
SPOT: heuristic selection or heuristic generation?
- Usually a heuristic generation method
- A heuristic selection method in extreme cases, e.g., when only one Boolean attribute is designed (insufficient resolution for learning) and/or the domain is finite (e.g., 3×3 tic-tac-toe)
Assumptions for learning from parts: ri ≈ rj ≈ ... ≈ r_overall ≈ 1, for instances i, j, ...; flexibly indicated during development
SPOT can generate completely new LLHs, like genetic programming, instead of modifying existing LLHs
- See the permutations in the "direct" mode in FSP (though it was easily dominated)
[Figure: tic-tac-toe (from cornell.edu)]
37.
Discussion: The SPOT (cont.)
Advantages of SPOT
SPOT is supposed to be on-the-fly:
- Solving a subproblem (or subproblems): on-the-fly
- Learning on the results of the subproblem(s): on-the-fly
- Applying back to LLHs: on-the-fly
SPOT is supposed to be cross-domain (with U/EA models), but this depends on the U/EA transformation and the attributes in each application
Only a few parameters to determine in SPOT: the size of the proportion (key parameter), and the sampling and applying methods; two of them can be quickly determined by resemblances instead of trial-and-error tests
38.
Discussion: The development
Difficulties/drawbacks:
- Designing a compatible U/EA transformation "cable" (the key issue)
- Memory usage of stochastic learning (may exceed the 32-bit limit)
- Applying instance-specific information back to heuristics
SPOT's instance-specific information can also be applied to constructive heuristics and some exact algorithms, e.g., branching (Xue et al., 2011)
39.
Discussion: Errors in experiments
Possible sources of errors in tests:
- Number of test instances
- Source of test instances
- Unstable solutions
- Instrument error in measuring time in Java
- 32/64-bit environment of Java
- CPU and memory performance adjustment
- ...
So statistical techniques (statistical significance) are employed throughout to identify errors and deviations
40.
Discussion: The formal definitions
Definitions of heuristic selection and heuristic generation:
- Consistent with the existing taxonomy (Burke et al., 2010)
- Both concepts are extended; parameter tuning for heuristics now belongs to heuristic selection
Known issues:
- Countability trap: if a problem domain contains a strictly finite set of variables, each with a strictly finite domain (e.g., all atoms/quanta in our universe), can any heuristic generation method exist there?
- The concept of learning is not well defined; define L0: if ||H|| = 1, skip learning
- Meta-heuristics/heuristics versus hyper-heuristics under the new definition?
41.
Conclusion
This thesis presents:
- The SPOT heuristic generation approach for combinatorial optimization, with supportive models and indicators
- Formal definitions of the hyper-heuristic and its subclasses
The results of the tests were encouraging
Possible future work:
- Non-stochastic machine learning techniques
- Re-examining state-of-the-art LLHs
- A domain transformation map
- Further investigation of the formalization
42.
Contributions of the thesis
Contributions:
- A successful trial of learning instance-specific information from proportions and suboptima
- A successful trial of employing machine learning in optimizing difficult combinatorial optimization problems
- A trial of formalizing hyper-heuristics, and the first clear separation between heuristic selection and heuristic generation, to the best of our knowledge
Relations to previous publications:
- Extends Xue et al.'s (2011) algorithm to cross-domain
- Embeds Chan et al.'s (2012) Pearl Hunter in the learning preparation
43.
References
Blum, C., & Roli, A. (2003, September). Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3), 268–308. doi:10.1145/937503.937505
Burke, E. K., Kendall, G., Newall, J., Hart, E., Ross, P., & Schulenburg, S. (2003). Hyper-heuristics: An emerging direction in modern search technology. In F. Glover & G. Kochenberger (Eds.), Handbook of metaheuristics (Vol. 57, pp. 457–474). International Series in Operations Research & Management Science. Springer New York. doi:10.1007/0-306-48056-5_16
Burke, E. K., Hyde, M., Kendall, G., Ochoa, G., Ozcan, E., & Qu, R. (2010). Hyper-heuristics: A survey of the state of the art (Tech. Rep. No. NOTTCS-TR-SUB-0906241418-2747). School of Computer Science and Information Technology, University of Nottingham, Nottingham NG8 1BB, UK. Retrieved April 4, 2012, from http://www.cs.nott.ac.uk/~gxo/papers/hhsurvey.pdf
Chan, C. Y., Xue, F., Ip, W. H., & Cheung, C. F. (2012). A hyper-heuristic inspired by pearl hunting. In Y. Hamadi & M. Schoenauer (Eds.), Learning and intelligent optimization (pp. 349–353). Lecture Notes in Computer Science. Springer-Verlag. doi:10.1007/978-3-642-34413-8_26
Cohen, W. W. (1995). Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning (ML95) (pp. 115–123). Tahoe City, CA, USA. Retrieved April 4, 2012, from http://www.cs.cmu.edu/~wcohen/postscript/ml-95-ripper.ps
Crowston, W. B., Glover, F., Thompson, G. L., & Trawick, J. D. (1963). Probabilistic and parametric learning combinations of local job shop scheduling rules. ONR Research Memorandum. Pittsburgh, PA, USA: Defense Technical Information Center.
Hall, M. A. (1999, April). Correlation-based feature selection for machine learning (Doctoral dissertation, Department of Computer Science, University of Waikato, Hamilton, New Zealand). Retrieved April 4, 2012, from http://www.cs.waikato.ac.nz/~mhall/thesis.pdf
John, G. H., & Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI'95) (pp. 338–345). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. Retrieved April 4, 2012, from http://dl.acm.org/citation.cfm?id=2074158.2074196
Johnson, S. M. (1954). Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly, 1(1), 61–68. doi:10.1002/nav.3800010110
Johnson, D. S., & McGeoch, L. A. (2002). Experimental analysis of heuristics for the STSP. In G. Gutin & A. P. Punnen (Eds.), The traveling salesman problem and its variations (Chap. 9, pp. 369–443). New York, NY, USA: Kluwer Academic Publishers. doi:10.1007/b101971
Joslin, D. E., & Clements, D. P. (1999, May). "Squeaky wheel" optimization. Journal of Artificial Intelligence Research, 10(1), 353–373. Retrieved April 4, 2012, from http://www.jair.org/media/561/live-561-1759-jair.pdf
Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA, USA: MIT Press.
Mandelbrot, B. (1967, May). How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science, 156(3775), 636–638. doi:10.1126/science.156.3775.636
Nawaz, M., Enscore, E. E., & Ham, I. (1983). A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega, 11(1), 91–95. doi:10.1016/0305-0483(83)90088-9
Papadimitriou, C. H., & Steiglitz, K. (1982). Combinatorial optimization: Algorithms and complexity. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.
Papadimitriou, C. H., & Vempala, S. (2006, February). On the approximability of the traveling salesman problem. Combinatorica, 26(1), 101–120. doi:10.1007/s00493-006-0008-z
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Sima, Q. (2010). Records of the grand historian (2nd ed.) (Y.-W. Wang, Ed.). The Twenty-four Histories of Baina Edition. Taipei, Taiwan: Commercial Press (Taiwan), Ltd. (Original work published 91 BC)
Williamson, D. P., Hall, L. A., Hoogeveen, J. A., Hurkens, C. A. J., Lenstra, J. K., Sevast'janov, S. V., & Shmoys, D. B. (1997, April). Short shop schedules. Operations Research, 45(2), 288–294. doi:10.1287/opre.45.2.288
Wolpert, D. H., & Macready, W. G. (1997, April). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82. doi:10.1109/4235.585893
Xu, L., Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2008, June). SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1), 565–606. Retrieved April 4, 2012, from http://www.aaai.org/Papers/JAIR/Vol32/JAIR-3214.pdf
Xue, F., Chan, C. Y., Ip, W. H., & Cheung, C. F. (2011). A learning-based variable assignment weighting scheme for heuristic and exact searching in Euclidean traveling salesman problems. NETNOMICS: Economic Research and Electronic Networking, 12, 183–207. doi:10.1007/s11066-011-9064-7
44.
Appendix I: Coursework
Content                        Credits  Score
ISE6830 On rostering           3        A
ISE6831 On demand modeling     3        A
ISE552  On logistics           3        A
ELC6001, 6002 English          -        PASS
Credit transfer                6        -
Overall                        15       A (4.0)