PPSN XV: 15th International Conference on Parallel Problem Solving from Nature

Coimbra, Portugal, September 8–12, 2018

Session: Black Box Discrete Optimization Benchmarking (BB-DOB)

Saturday, 8 September, 14:00-15:30, Room 2.4

Aleš Zamuda, Goran Hrovat, Elena Lloret, Miguel Nicolau, Christine Zarges

- 1. Examples Implementing Black-Box Discrete Optimization Benchmarking Survey for BB-DOB@GECCO and BB-DOB@PPSN. PPSN XV: 15th International Conference on Parallel Problem Solving from Nature, Coimbra, Portugal, September 8–12, 2018. Session: Black Box Discrete Optimization Benchmarking (BB-DOB), Saturday, 8 September, 14:00–15:30, Room 2.4. Aleš Zamuda, Goran Hrovat, Elena Lloret, Miguel Nicolau, Christine Zarges. COST CA15140 ImAppNIO.
- 2. Outline: Introduction; Taxonomical Classes Identification Survey; Properties and Usage; Significant Instances Methodology; Experiments Setup; Performance Measures; Results Presentation Methods and Formats; Specific Examples Implementing BB-DOB; Conclusions.
- 3. Motivation. A taxonomical identification survey of the classes of discrete optimization challenges found in the literature: Black-Box Discrete Optimization Benchmarking (BB-DOB). It includes a proposed pipeline perspective for benchmarking, inspired by previous computational optimization competitions. Main topic: why certain classes, together with their properties (such as deception, separability, or the toy-problem label), should be included in this perspective. Guidelines are also discussed on how to select significant instances within these classes, the design of the experiments setup, performance measures, and presentation methods and formats.
- 4. Other Existing Benchmarks. Inspired by previous computational optimization competitions in continuous settings that used test functions for optimization application domains: single-objective (CEC 2005, 2013, 2014, 2015); constrained (CEC 2006, CEC 2007, CEC 2010); multi-modal (CEC 2010, SWEVO 2016); black-box, target value (BBOB 2009, COCO 2016); noisy optimization (BBOB 2009); large-scale (CEC 2008, CEC 2010); dynamic (CEC 2009, CEC 2014); real-world (CEC 2011); computationally expensive (CEC 2013, CEC 2015); learning-based (CEC 2015); multi-objective (CEC 2002, CEC 2007, CEC 2009, CEC 2014); bi-objective (CEC 2008); many-objective (CEC 2018). Also used for tuning/ranking/hyperheuristics; DEs are the usual winner algorithms.
- 5. Section: Taxonomical Classes Identification Survey.
- 6. Discrete Optimization Function Classes: Perspectives. Including grey-box knowledge: black → white (box); the more that is known about the problem, the better the algorithm can do. By representation (known knowledge) and budget cost (knowledge from new/online fitness calls): 1. Modality: unimodal, bimodal, multimodal, over GA fixed genotypes. 2. Programming representations: fixed vs. dynamic, using GP trees. 3. Real-world challenges modeling: for tailored problem representation. 4. Budget planning: for new problems.
- 7. Perspective 1: Fixed Genotype Functions – Modality. Pseudo-Boolean functions f : {0, 1}^n → R. A search point x* is a local optimum if f(x*) ≥ f(x) for all x with H(x*, x) = 1 (i.e., the direct Hamming neighbors of x*). Unimodal: there is a unique local optimum (e.g. OneMax). Weakly unimodal: all local optima have the same fitness. Multimodal: otherwise, i.e., not (weakly) unimodal (e.g. Trap). Bimodal: exactly two local optima (e.g. TwoMax); TwoMax generalizes to an arbitrary number of local optima.
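The modality definitions above can be made concrete with a small sketch (all function and helper names are illustrative, not from the slides): a local optimum is a point whose fitness is at least that of every direct Hamming neighbor.

```python
def onemax(x):
    """Unimodal pseudo-Boolean function: counts the 1-bits."""
    return sum(x)

def twomax(x):
    """Bimodal: local optima at the all-ones and all-zeros strings."""
    return max(sum(x), len(x) - sum(x))

def hamming_neighbors(x):
    """All points at Hamming distance 1 from x."""
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - y[i]
        yield tuple(y)

def is_local_optimum(f, x):
    """x is a local optimum if f(x) >= f(y) for every direct Hamming neighbor y."""
    return all(f(x) >= f(y) for y in hamming_neighbors(x))
```

For OneMax only the all-ones string passes this check, while for TwoMax both the all-ones and the all-zeros strings do.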
- 8. Perspective 1: Fixed Genotype Functions – More Properties. Other properties for Boolean functions: linear functions – the function value of a search point is a weighted sum of the values of its bits (e.g. OneMax); monotone functions – a mutation flipping at least one 0-bit into a 1-bit and no 1-bit into a 0-bit strictly increases the function value (e.g. OneMax); functions of unitation – the fitness depends only on the number of 1-bits in the search point (e.g. OneMax, TwoMax); separable functions – the fitness can be expressed as a sum of subfunctions depending on mutually disjoint sets of bits (e.g. OneMax, TwoMax).
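Two of these properties can be sketched directly (names are illustrative): a linear function is a weighted bit sum, and for small n one can check by enumeration whether a function is a function of unitation. OneMax is the linear function with all weights 1 and depends only on the number of 1-bits; a skewed weight vector breaks that property.

```python
from itertools import product

def linear(weights):
    """Linear pseudo-Boolean function: weighted sum of bits (OneMax = all weights 1)."""
    return lambda x: sum(w * b for w, b in zip(weights, x))

def is_function_of_unitation(f, n):
    """Check by enumeration (small n) that f depends only on the number of 1-bits."""
    values = {}
    for x in product((0, 1), repeat=n):
        u = sum(x)  # unitation: number of 1-bits
        if u in values and values[u] != f(x):
            return False
        values[u] = f(x)
    return True

onemax = linear([1, 1, 1, 1])   # a function of unitation
skewed = linear([1, 2, 3, 4])   # linear, but not a function of unitation
```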
- 9. Perspective 2: Sample Symbolic Regression Problems with GP. Genetic Programming (GP) has seen a recent effort towards standardization of benchmarks, particularly in the application areas of Symbolic Regression and Classification. These have mostly been artificial problems: a function is provided which allows the generation of input-output pairs for regression. Some of the most commonly used in recent GP literature include the sets defined by Keijzer (15 functions), Pagie (1 function), Korns (15 functions), and Vladislavleva (8 functions). F1: f(x1, x2) = exp(−(x1−1)^2) / (1.2 + (x2−2.5)^2); F2: f(x1, x2) = exp(−x1) x1^3 cos(x1) sin(x1) (cos(x1) sin^2(x1) − 1)(x2 − 5); F3: f(x1, ..., x5) = 10 / (5 + Σ_{i=1..5} (xi−3)^2); F4: f(x1, x2, x3) = 30 (x1−1)(x3−1) / (x2^2 (x1−10)); F5: f(x1, x2) = 6 sin(x1) cos(x2); F6: f(x1, x2) = (x1−3)(x2−3) + 2 sin((x1−4)(x2−4)); F7: f(x1, x2) = ((x1−3)^4 + (x2−3)^3 − (x2−3)) / ((x2−2)^4 + 10); F8: f(x1, x2) = 1/(1 + x1^−4) + 1/(1 + x2^−4); F9: f(x1, x2) = x1^4 − x1^3 + x2^2/2 − x2; F10: f(x1, x2) = 8 / (2 + x1^2 + x2^2); F11: f(x1, x2) = x1^3/5 + x2^3/2 − x2 − x1; F12: f(x1, ..., x10) = x1 x2 + x3 x4 + x5 x6 + x1 x7 x9 + x3 x6 x10; F13: f(x1, ..., x5) = −5.41 + 4.9 (x4 − x1 + x2/x5) / (3 x4); F14: f(x1, ..., x6) = x5 x6 / ((x1/x2)(x3/x4)); F15: f(x1, ..., x5) = 0.81 + 24.3 (2 x2 + 3 x3^2) / (4 x4^3 + 5 x5^4); F16: f(x1, ..., x5) = 32 − 3 (tan(x1)/tan(x2)) (tan(x3)/tan(x4)); F17: f(x1, ..., x5) = 22 − 4.2 (cos(x1) − tan(x2)) (tanh(x3)/sin(x4)); F18: f(x1, ..., x5) = x1 x2 x3 x4 x5; F19: f(x1, ..., x5) = 12 − 6 (tan(x1)/exp(x2)) (x3 − tan(x4)); F20: f(x1, ..., x10) = Σ_{i=1..5} 1/xi; F21: f(x1, ..., x5) = 2 − 2.1 cos(9.8 x1) sin(1.3 x5).
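Two of the listed target functions (F5 and F9) can be sketched as ordinary Python functions used to generate input-output training pairs, which is exactly how these artificial problems are consumed by a GP system (the grid of sample points below is an illustrative choice, not from the slides):

```python
import math

def f5(x1, x2):
    """F5: 6 sin(x1) cos(x2)."""
    return 6 * math.sin(x1) * math.cos(x2)

def f9(x1, x2):
    """F9: x1^4 - x1^3 + x2^2/2 - x2."""
    return x1**4 - x1**3 + x2**2 / 2 - x2

# Generating input-output pairs for regression on a small sample grid:
pairs = [((x1, x2), f9(x1, x2)) for x1 in (0.0, 1.0) for x2 in (0.0, 2.0)]
```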
- 10. Perspective 2: Dynamic Genotype Functions, GP – Guidelines. Guidelines on improving GP benchmarking by Nicolau et al.: careful definition of the input variable ranges; analysis of the range of the response variable(s); availability of exact train/test datasets; clear definition of function/terminal sets; publication of baseline performance for performance comparison; large test datasets for generalization performance analysis; clear definition of error measures for generalization performance analysis; introduction of controlled noise as a simulation of real-world data. Some real-world datasets have also been suggested and used in recent years, but problems have been detected with these as well; for real-world data, GP researchers mostly resort to UCI datasets.
- 11. Perspective 2: Dynamic Genotype Functions, GP – Example Classes. Scalable Genetic Programming (function classes), e.g. through gene-pool optimal mixing and input-space entropy-based building-block learning: to learn and exploit model structures in black-box optimization for possibly atomic representations of partial solutions, e.g. in dimension (number of input variables), but also through the definition of the function/terminal set (with a scalable number of inputs). Artificial problems: Order (a GP version of OneMax) and Trap (with problem size scalable as the maximum binary tree height). Applied: Boolean circuit design (Comparator, Even Parity, Majority, and Multiplexer). GP in other areas, most popular in recent years: gaming and automatic program synthesis, with a range of competitions and workshops, e.g. evolving levels for Super Mario Bros, the Virtual Creatures Contest (https://virtualcreatures.github.io/vc2018/), and The General Video Game AI Competition (http://www.gvgai.net/). But in these and other areas, there has not been a concerted effort to provide benchmark data/setups that can be freely and easily used by researchers.
- 12. Perspective 3: Tailored Problem Representation. In addition to the functions already mentioned, real-world challenges for optimization algorithms include, e.g.: knapsack problems (e.g. automatic summarization); routing (Traveling Salesperson Problem (TSP), Chinese Postman (CP), path planning); scheduling (including job shop and flow shop); bioinformatics (including sequencing, alignment, and protein folding); cryptography; and computer vision.
- 13. Perspective 4: Budget Planning During Optimizer Design. To bring solutions to industry that require process improvement after very latent evaluation of the process, an optimization algorithm might need to self-adjust to a black-box challenge sufficiently quickly. Such a process should account for the work of designing the optimizer approach itself, using the total number of fitness evaluations over a cycle of design and execution. Such a metric might also indirectly measure (in part) the ease and efficiency of use of an optimizer.
- 14. Perspective 4: New Algorithm Design. Application of an algorithm to a domain could be reflective: it can influence the quality of the produced results due to the limited fitness evaluation budget allowed for setting up the optimizer. A measure of performance inclination towards any function class should also be measured; this would yield a reflectivity measure (the influence of the design and tuning of an algorithm on its benchmarked evaluation) for an algorithm that might not be suitable for an arbitrary black-box challenge. Example: an optimizer that writes/adapts a successful optimizer for a black-box benchmark within a limited budget of communications to a benchmark descriptor. Namely, it should be avoided that the designer (human or automaton) can call the fitness function often enough during the design phase to extract the information to be optimized; this would yield a grey-box or even white-box generated optimizer, or, in the simplest case, unfairly save an encoded solution itself into the yielded optimizer code.
- 15. Section: Properties and Usage.
- 16. Properties and Usage. Use the 4 identified aspects as BB-DOB challenges, through representative instances of the function class types. Shifting and rotation of benchmarking functions is achieved by transforming the input structure coming from the optimizer into the input of the benchmarking function, applying the two mathematical transformations (shift and rotation) before each call to a fitness function in the benchmark. Defining such transformations might be easier for simpler genotypes, while for more advanced ones like trees this might be more challenging.
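For fixed-length binary genotypes, a common way to realize such a shift is to XOR the candidate with a fixed random mask before evaluation, which moves the optimum without changing the function's structure. A minimal sketch under that assumption (all names illustrative, not from the slides):

```python
import random

def xor_shifted(f, mask):
    """Wrap fitness f so that f is evaluated on (x XOR mask): the optimum moves."""
    return lambda x: f(tuple(b ^ m for b, m in zip(x, mask)))

def onemax(x):
    return sum(x)

random.seed(1)
mask = tuple(random.randint(0, 1) for _ in range(8))
shifted = xor_shifted(onemax, mask)
```

The optimum of the shifted OneMax sits at the bitwise complement of the mask, so an optimizer can no longer exploit the all-ones string as a known target.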
- 17. Section: Significant Instances Methodology.
- 18. Significant Instances Methodology: Openness Requirement. Methodology requirement: when preparing a benchmark, the formulation as well as the data should be accessible. Example: some domain-specific benchmarks might not be fully disclosed, which would not allow testing of other (new) algorithms. Explained: when designing a narrow application-domain optimizer, applying it to and assessing its performance on real-world challenges usually yields something like a domain-specific benchmark, which is not applicable for general black-box re-application of the same optimizer in a different domain. → Take into account existing and recognized benchmark sets when preparing a new discrete optimization benchmark, like the MIPLIB 2010 benchmark set (http://plato.asu.edu/bench.html).
- 19. Significant Instances Methodology: Connectivity Requirement. A new benchmark set should connect to existing optimizer providers: Gurobi (http://www.gurobi.com/downloads/download-center), CPLEX (https://www.ibm.com/analytics/data-science/prescriptive-analytics/cplex-optimizer), XPRESS (https://www.solver.com/xpress-solver-engine), Mosek (https://www.mosek.com/), SCIP (http://scip.zib.de/), CBC (https://projects.coin-or.org/Cbc), GLPK (https://www.gnu.org/software/glpk/), LP Solve (http://lpsolve.sourceforge.net/), and MATLAB (https://www.mathworks.com/products/optimization.html).
- 20. Significant Instances Methodology: General Guidelines. As a black-box benchmarking suite, the perspectives of the COCO platform could be followed. The functions referenced previously in this talk capture difficulties of combinatorial optimization problems in practice, and the referenced list at the same time strives to be comprehensible w.r.t. the classes of challenges, so that the resulting algorithm behaviors can be understood or interpreted when using the benchmark. The proposed instance list also aims to be scalable with problem size and non-trivial in the black-box optimization sense, i.e., to allow shifting the optimum to any point.
- 21. Significant Instances Methodology: Selecting Classes. We suggest choosing a few representative items from each function class. The instances chosen within a function class should thoroughly cover the underlying features of that class in terms of the challenges it represents. This should foster the benchmarking of optimization algorithms designed for black-box functions. The distribution of features in the chosen instances should be unbiased with regard to each function class as well as the benchmark as a whole. Select classes: at the workshop and until PPSN.
- 22. Section: Experiments Setup.
- 23. Experiments Setup: Outline. 1. Optimize the instance functions listed in the previous part. 2. A runtime budget should be used to determine the maximum number of function evaluations allowed; this suggestion is based on the design of algorithms with a fixed budget. 3. The number of independent runs of an optimizer on a specific test instance should be set based on the test instance's complexity. 4. The runtime of optimizers should be measured proportionally to the time it takes to execute the fitness functions.
- 24. Section: Performance Measures.
- 25. Performance Measures: Requirements. The benchmark should be practically relevant and theoretically accessible. A vital practicality measure: insight into the algorithms' mutual advantages and disadvantages. Accessibility: a public website with results submission. Measured performance: runtime, fixed-budget yield, fixed-precision yield.
- 26. Performance Measures: Analysis. Comparing algorithms: Friedman ranking and post-hoc procedures for significance (fixed budget); observing order-of-magnitude improvements (limited budget for new problems); fast incremental rating systems or a single-value performance mark. Deep statistics for tracking the evolution and behavior of the values of the population and memory within the algorithm, with trait visualization, e.g. graph plots for: fitness convergence, control parameters, optimizer population memory, and the inter-connectedness of evolved population members through generations (i.e. communication complexity analysis).
- 27. Section: Results Presentation Methods and Formats.
- 28. Results Tables: Content. The tables should present fitness function values attained at different stages of the optimization runs, at several cut-off points, summarized statistically as best, worst, median, average, and standard deviation values; based on these, further comparisons can usually be made between experiments. → Competitions could be launched by creating composite functions, e.g. one perspective-type competition per year, including some of the mentioned function classes.
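Computing one such table row is straightforward; a minimal sketch (function and key names are illustrative, and minimization is assumed, so best = min):

```python
import statistics

def cutoff_stats(fitness_values):
    """Best/worst/median/average/std of final fitness over independent runs
    at one cut-off point, assuming minimization."""
    return {
        "best": min(fitness_values),
        "worst": max(fitness_values),
        "median": statistics.median(fitness_values),
        "average": statistics.mean(fitness_values),
        "std": statistics.stdev(fitness_values),
    }

# One row of a results table: fitness of 5 independent runs at a cut-off point.
runs = [3.0, 1.0, 2.0, 4.0, 5.0]
row = cutoff_stats(runs)
```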
- 29. Results Tables: Compatibility. Compatibility for: generation of data output, procedures for post-processing, ranking, and presentation formats for the results. Furthermore, the evaluation results should be stored in a cloud-compatible format, possibly online as a structured database or archive, comprising the evaluation as well as its corresponding solution values for each cut-off point of each run, using a binary-compliant architecture to enable an Application Binary Interface (ABI) when re-using the experiments on different computers.
- 30. Specific Examples Implementing BB-DOB. Specific experiences from preparing discrete optimization benchmark challenges like feature selection in healthcare data mining or automatic text summarization.
- 31. Feature Selection for Logistic Linear Regression of Healthcare Records. The logistic linear regression model is a binary classification ML model: h0(x) = P(y = 1 | x1, x2, ..., xp) = g(β0 + β1 x1 + β2 x2 + ... + βp xp) = 1 / (1 + exp(−(β0 + β1 x1 + β2 x2 + ... + βp xp))), where x = (x1, x2, ..., xp) is an input of p features and the outcome h0(x) is the probability of y = 1. The variable y represents the two possible classes being predicted. The probability of y = 0 is obtained as P(y = 0 | x) = 1 − h0(x). The parameters of the model are β = (β0, β1, ..., βp) and are estimated during regression. The outcome is squeezed through the sigmoid function g(t) = 1 / (1 + exp(−t)), where t is the value to be transformed; the resulting interval (0, 1) is interpreted as a probability.
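The model above maps directly to a few lines of code; a minimal sketch with illustrative names:

```python
import math

def sigmoid(t):
    """g(t) = 1 / (1 + exp(-t)): squeezes t into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def h0(beta, x):
    """P(y = 1 | x) for logistic regression; beta = (beta0, beta1, ..., betap)."""
    t = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(t)

# P(y = 0 | x) is then simply 1 - h0(beta, x).
p = h0((0.0, 2.0), (1.0,))  # sigmoid(0 + 2*1) = sigmoid(2)
```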
- 32. Feature Selection for Logistic Linear Regression of Healthcare Records. Input features are required to be numerical; categorical input features must be transformed into dummy features. Solving a logistic linear model is treated as yet another optimization problem: estimating the parameters β of the model. Gradient descent optimization methods² search for the minimum of the cost function J(β): J(β) = (1/n) Σ_{i=1..n} [ (y^(i) − 1) log(1 − h0(x^(i))) − y^(i) log h0(x^(i)) ], where n is the number of training records, y^(i) the outcome, and x^(i) the vector of feature values of the i-th training record. ² Boris T. Polyak. “Some methods of speeding up the convergence of iteration methods”. USSR Computational Mathematics and Mathematical Physics 4.5 (1964), pp. 1–17; John Duchi, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization”. Journal of Machine Learning Research 12 (2011), pp. 2121–2159; Diederik Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. arXiv preprint arXiv:1412.6980 (2014).
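A minimal sketch of this cost and a plain gradient-descent step (the toy data, learning rate, and all names are illustrative; the cited methods are more refined variants of the same idea):

```python
import math

def predict(beta, x):
    """h0(x) = sigmoid(beta0 + beta1*x1 + ... + betap*xp)."""
    t = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-t))

def cost(beta, X, y):
    """J(beta) = (1/n) sum[(y_i - 1) log(1 - h) - y_i log(h)]."""
    n = len(X)
    return sum((yi - 1) * math.log(1 - predict(beta, xi))
               - yi * math.log(predict(beta, xi))
               for xi, yi in zip(X, y)) / n

def gradient_step(beta, X, y, lr):
    """One batch gradient-descent step on J(beta)."""
    n, p = len(X), len(X[0])
    grad = [0.0] * (p + 1)
    for xi, yi in zip(X, y):
        err = predict(beta, xi) - yi
        grad[0] += err / n
        for j in range(p):
            grad[j + 1] += err * xi[j] / n
    return [b - lr * g for b, g in zip(beta, grad)]

# Toy separable data: one feature, class boundary near x = 1.5.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0, 0, 1, 1]
beta = [0.0, 0.0]
for _ in range(200):
    beta = gradient_step(beta, X, y, lr=0.5)
```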
- 33. Feature Selection for Logistic Linear Regression of Healthcare Records. The input records use Healthcare Records from the Healthcare Cost and Utilization Project (HCUP) US National (Nationwide) Inpatient Sample (NIS) database from the year 2009³. This US NIS database contains 7,810,762 discharge records with personal characteristics of the patients, e.g. age, gender, race; administrative information, e.g. length of stay; and medical information, where diagnoses and procedures are coded in the International Statistical Classification of Diseases and Related Health Problems, 9th Rev., Clinical Modification (ICD-9-CM) format. D = 1205: in the benchmark, only age, gender, race, and diagnoses with at least 0.1% of admissions are chosen as relevant. ³ HCUP Nationwide Inpatient Sample (NIS). “Healthcare Cost and Utilization Project (HCUP)”. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/nisoverview.jsp (2009).
- 34. Feature Selection Tackling Over-fitting – Approaches (for Logistic Linear Regression). 1. Regularization adds a penalization term to the cost function to penalize high values of the parameters β, resulting in reduced over-fitting. The ℓ2 penalization term (Ridge⁴) reduces the parameters β, while the ℓ1 penalization term (LASSO⁵) causes many parameters β to become zero, keeping only a few parameters in the ML model: P_Ridge(β) = Σ_{j=1..p} βj^2 = ||β||2^2; P_LASSO(β) = Σ_{j=1..p} |βj| = ||β||1. LASSO penalization is therefore treated as an embedded feature selection method. Elastic Net⁶ combines Ridge and LASSO for cost penalization, with control parameters λR and λL representing the impact of the Ridge and LASSO penalizations: J_ElasticNet(β) = J(β) + λR P_Ridge(β) + λL P_LASSO(β). 2. Feature selection – use cross-validation to tackle over-fitting. ⁴ Arthur E. Hoerl and Robert W. Kennard. “Ridge regression: Biased estimation for nonorthogonal problems”. Technometrics 12.1 (1970), pp. 55–67. ⁵ Robert Tibshirani. “Regression shrinkage and selection via the lasso”. Journal of the Royal Statistical Society, Series B (Methodological) 58.1 (1996), pp. 267–288. ⁶ Hui Zou and Trevor Hastie. “Regularization and variable selection via the elastic net”. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 67.2 (2005), pp. 301–320.
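The three penalties can be sketched directly (names are illustrative; excluding the intercept β0 from the penalty is a common convention assumed here, not stated on the slide):

```python
def ridge_penalty(beta):
    """P_Ridge(beta) = sum of beta_j^2 over j = 1..p (intercept beta0 excluded)."""
    return sum(b * b for b in beta[1:])

def lasso_penalty(beta):
    """P_LASSO(beta) = sum of |beta_j| over j = 1..p."""
    return sum(abs(b) for b in beta[1:])

def elastic_net_cost(base_cost, beta, lam_r, lam_l):
    """J_ElasticNet = J(beta) + lam_r * P_Ridge + lam_l * P_LASSO."""
    return base_cost + lam_r * ridge_penalty(beta) + lam_l * lasso_penalty(beta)
```

Setting lam_l = 0 recovers pure Ridge, lam_r = 0 recovers pure LASSO.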
- 35. Feature Selection for Text Summarization: Preprocessing. For the abstract, a combination of sentences from the original text with selected sentences for full inclusion (extraction)⁷: the sentences of each document D are indexed. A statement consists of words (terms); words consist of letters (isAlpha(), byte-wise or another code). Text indexing for the computational model: the beginnings of statements are marked as text character indices, D = {s1, s2, s3, ...}, and a dictionary of the different words (terms) and their appearances in the text is composed, W = {w1, w2, w3, ...}. A symbol table (implemented using a hash table) is used to compare terms and build the dictionary. Possible preprocessing eliminations: short sentences or words are ignored. ⁷ Rasim M. Alguliev, Ramiz M. Aliguliyev, and Chingiz A. Mehdiyev. “Sentence selection for generic document summarization using an adaptive differential evolution algorithm”. Swarm and Evolutionary Computation 1.4 (2011), pp. 213–222.
- 36. 1 – Preprocessing: Precomputation. 1) During indexing, the number of occurrences of each term in the text is gathered, as well as the number of statements nk in which the k-th term wk appears. 2) For each term wk in the document, an inverse sentence frequency is calculated: isfk = log(n / nk), where n denotes the number of statements in the document. 3) To conclude preprocessing, for each term wk in each statement si, a weight is calculated: wi,k = tfi,k · isfk, where tfi,k is the number of occurrences (term frequency) of the term wk in the statement si.
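The three precomputation steps can be sketched as one function (names are illustrative; sentences are assumed to be already tokenized into term lists):

```python
import math

def tf_isf_weights(sentences):
    """Compute w[i][k] = tf(term k in sentence i) * log(n / n_k).

    sentences: list of token lists; returns the weight matrix and the
    sorted term vocabulary."""
    n = len(sentences)
    terms = sorted({t for s in sentences for t in s})
    # n_k: number of sentences containing term k.
    nk = {t: sum(1 for s in sentences if t in s) for t in terms}
    isf = {t: math.log(n / nk[t]) for t in terms}
    return [[s.count(t) * isf[t] for t in terms] for s in sentences], terms

sents = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
w, vocab = tf_isf_weights(sents)
```

A term such as "a" that appears in every sentence gets isf = log(1) = 0 and therefore zero weight everywhere, matching the intent of the inverse sentence frequency.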
- 37. 2 – Summary Fitness Evaluation (1/4). A sentence combination X is evaluated as a 0/1 knapsack problem: we want to include an optimal selection of statements in the final output, where the i-th sentence si is either selected (xi = 1) or unselected (xi = 0). a) The price of the knapsack (its fitness) should be maximized; the fitness represents a ratio between content coverage, V(X), and redundancy, R(X): f(X) = V(X) / R(X), subject to the constraint that the summary length stays within a tolerance of the target length L. Constraint handling between solutions: each feasible solution is better than any infeasible one; infeasible solutions are compared by constraint violation (lower is better); feasible solutions are compared by fitness (higher is better).
- 38. 2 – Summary Fitness Evaluation (2/4) b) Content coverage V(X) is computed as a double sum of similarities (defined at d)): V(X) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (sim(s_i, O) + sim(s_j, O)) x_{i,j}, where x_{i,j} denotes inclusion of both sentences s_i and s_j (x_{i,j} = 1 only if x_i = x_j = 1, otherwise 0), and O = (o_1, o_2, ..., o_m) is the vector of average term weights w_{i,k} over all i = 1..m distinct text terms: o_i = (\sum_{j=1}^{n} w_{i,j}) / n.
- 39. 2 – Summary Fitness Evaluation (3/4) c) Redundancy R(X) is likewise measured as a double sum of similarities (defined at d)) over all sentence pairs: R(X) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sim(s_i, s_j) x_{i,j}, where, again, x_{i,j} denotes inclusion of both sentences s_i and s_j (x_{i,j} = 1 only if x_i = x_j = 1, otherwise 0).
- 40. 2 – Summary Fitness Evaluation (4/4) d) The similarity between sentences s_i = [w_{i,1}, w_{i,2}, ..., w_{i,m}] and s_j = [w_{j,1}, w_{j,2}, ..., w_{j,m}] is computed as the cosine similarity: sim(s_i, s_j) = (\sum_{k=1}^{m} w_{i,k} w_{j,k}) / (\sqrt{\sum_{k=1}^{m} w_{i,k}^2} \cdot \sqrt{\sum_{k=1}^{m} w_{j,k}^2}), where w_{i,k} is the term weight (defined in step 3)) and m the number of all terms in the text. e) On conclusion: the selected sentences from the best-assessed combination are printed in the order in which they appear in the text.
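Putting steps b)–d) together, the fitness evaluation can be sketched as below. Sentences are represented by their term-weight vectors; the function names are illustrative, and the guard against R(X) = 0 (e.g. when fewer than two sentences are selected) is an added assumption, not stated on the slides.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two term-weight vectors (step d)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fitness(x, S, O):
    """f(X) = V(X) / R(X) for a 0/1 selection vector x (sketch).

    x: 0/1 selection per sentence; S: sentence weight vectors s_i;
    O: vector of average term weights.
    """
    n = len(S)
    V = R = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if x[i] and x[j]:                        # x_{i,j} = 1 iff both selected
                V += cosine_sim(S[i], O) + cosine_sim(S[j], O)
                R += cosine_sim(S[i], S[j])
    return V / R if R > 0 else V                     # guard: assumption, see above
```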
- 41. Introduction Taxonomical Classes Identification Survey Properties and Usage Significant Instances Methodology Experiments Setup Performance Measures Results Presentation Methods and Formats Specific Examples Implementing BB-DOB Conclusions
- 42. Conclusions Surveyed previous successes in benchmarking of optimization algorithms (CEC, GECCO). Listed toy problems as well as specific domain benchmarks: GP, knapsack, routing, scheduling, bioinformatics, and CV. Taxonomical identification survey: classes in discrete optimization, a perspective on Black-Box Discrete Optimization Benchmarking (BB-DOB). A benchmarking pipeline for BB-DOB, providing: properties, usage, and instances of the listed classes, experimental setup, performance measures, and formats for result presentation. Towards challenges for more general discrete optimization: algorithms able to tackle new unknown problems as a black box, and automated assessment of algorithm performance over these problems. Two specific example problem types using Feature Selection.
- 43. Future Work Once the WG3 ImAppNIO wiki8 lists sufficient benchmarking suggestions, it is expected that a benchmark code package (WG4) can be facilitated. Fostering of contributions is also expected through the BB-DOB workshop at PPSN 2018 and by inviting more researchers to contribute to the benchmark and to competitions that will provide black-box algorithms. 8 http://imappnio.dcs.aber.ac.uk/dokuwiki/doku.php?id=wg3
- 44. Acknowledgement. This article is based upon work from COST Action CA15140 ‘Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)’ and COST Action IC1406 ‘High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)’, supported by COST (European Cooperation in Science and Technology). The work is also supported in part by the Slovenian Research Agency, Programme Unit P2-0041. Thank you for your attention. Questions?
