Slides from my oral defense

  • View the problem space as all possible software artifacts. Use the given software artifact as the starting point and the correct version(s) of it as the goal point. Generate test cases to guide the evolution of the software artifact. Evolve the software artifacts to handle the test cases better (and ultimately perform better), and evolve the test cases to better find flaws in the software artifacts. A competitive coevolutionary arms race results.
  • Mantere's result is particularly relevant to the CASC system: the CASC system dynamically finds high-impact points in the program, identifying as many as it can.
  • ANTLR is a parser generator: it reads in a file specifying the grammar for a language and outputs a parser for that language in a specified programming language. The CASC system uses a parser that parses C++ and is also written in C++.
  • Only the relevant portions of code should be evolved. These sections are designated using comments specified in the CASC config file.
  • The initial variation phase copies the seed program and makes random variations to it, with the variation amount pulled from a Gaussian distribution. Each modified individual is added to the population; this repeats until the population is full.
  • Evaluate all individuals and assign fitness; the details of evaluation are discussed later.
  • Only specific nodes are considered for mutation (critical points): numeric constants and unmodified variables, identified dynamically.
  • Programs with compile-time errors, run-time errors, or execution time-outs are assigned an arbitrarily low fitness. There should be no compile-time errors, since all the CASC operators produce syntactically valid ASTs; as long as the seed is syntactically valid, no compile errors should occur, but the system still watches for them.
  • Trim populations back down to the specified size using reverse tournament selection.
  • E.g., number of generations, goal fitness reached, population converged on a maximum (low diversity), etc.
  • Chose to focus on the mutation operator because of the extraordinarily high ratio between search space size and population size. Mutation is the exploration variation operator, and a high exploration rate is necessary because of this ratio.
  • Takes around 2 weeks to complete a full run with the current resources
  • Highlight the best fitnesses and discuss. Highlight the standard deviations for A, B, and C and discuss the high variation in the endpoints. Highlight the D results and discuss.
  • The standard deviations shown are across the five runs for each experiment. They show that there was a lot of variation in the experiment endpoints; there is a near-infinite number of possible endpoints.
  • IEEE Transactions on Software Engineering: special issue on search-based software engineering

    1. Coevolutionary Automated Software Correction: A Proof of Concept
       Master's Oral Defense, September 8, 2008
       Josh Wilkerson
       Committee: Dr. Daniel Tauritz (Chair), Dr. Bruce McMillin, Dr. Thomas Weigert
    2. Motivation
       • In 2002 the National Institute of Standards and Technology stated [9]:
           • Software errors cost the U.S. economy $59.5 billion a year
           • Approximately 0.6% of gross domestic product
           • 30% of these costs could be removed by earlier, more effective software defect detection and an improved testing infrastructure
    3. Problem Statement
       • Software debugging:
           • Test the software
           • Locate the errors identified
           • Correct the errors
       • Time-consuming yet critical process
       • Many publications on automating the testing process
       • None that fully automate both the testing and correction phases
    4. The System Envisioned
    5. Most Related Work
       • Paolo Tonella [14] and Stefan Wappler [6,15,16]
           • Unit testing of object-oriented software
           • Used evolutionary methods
           • Focused only on testing; did nothing with correction
       • Timo Mantere [7,8]
           • Two-population testing system using genetic algorithms
           • Optimized program parameters through evolution
           • The more control the EA has over the program, the better the results
    6. Technical Background
       • Christopher Rosin [10,11] and John Cartlidge [1]
           • Extensive analysis of coevolution
           • Outline many potential problems that can occur during coevolution
       • Koza [2,3,4,5]
           • Popularized genetic programming in the 1990s
           • Father of modern genetic programming
    7. CASC Evolutionary Model
    8. CASC Evolutionary Model
    9. Parsing in the CASC System
       • The program population is based on the program to be corrected (the seed program)
    10. Parsing in the CASC System: Step 1
       • The ANTLR system is used to create parsing tools (done only once for each language)
       • The parser created is based on a provided grammar (C++)
       • The resulting parser is dependent on the ANTLR libraries
    11. Parsing in the CASC System: Step 2
       • The system reads in the source code for the program to correct
       • The code to evolve is extracted in preprocessing
    12. Parsing in the CASC System: Step 3
       • The preprocessed source code to evolve is provided to the parsing tools
    13. Parsing in the CASC System: Step 4
       • The parsing tools produce the Abstract Syntax Tree (AST) for the evolvable code
       • The AST produced is heavily dependent on the ANTLR libraries
       • These dependencies incur unnecessary computational cost
    14. Parsing in the CASC System: Step 5
       • The ANTLR AST is provided to the CASC AST translator
       • The AST translator removes the ANTLR dependencies from the AST
       • The result is a lightweight version of the AST
    15. Parsing in the CASC System: Step 6
       • The lightweight AST is provided to the CASC coevolutionary system
       • Copies of the AST are randomly modified
       • This is the initial variation phase
    16. CASC Evolutionary Model
    17. CASC Evolutionary Model
    18. CASC Evolutionary Model
    19. CASC Evolutionary Model
       • Reproduction
           • Parents selected using tournament selection
           • Uniform crossover with bias
           • Child subtrees of the program roots were used for crossover
       • Mutation
           • Each offspring has a chance to mutate
           • Only specific nodes are considered for program mutation
           • Genes to be mutated are altered based on a Gaussian distribution
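The Gaussian alteration of a mutated gene can be sketched as follows. This is an illustrative sketch only: the function name, the sigma parameter, and the use of `std::mt19937`/`std::normal_distribution` are assumptions, not CASC's actual implementation.

```cpp
#include <random>

// Hypothetical sketch of Gaussian mutation of a numeric gene (e.g. a
// numeric-constant node in the AST): the gene is perturbed by a step
// drawn from a zero-mean Gaussian, where sigma plays the role of the
// mutative proportion (amount of change the mutation incurs).
double mutateGene(double value, double sigma, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, sigma);
    return value + gauss(rng);  // add a Gaussian-distributed step
}
```

A larger sigma corresponds to the "enhanced proportion" configurations described later in the experimental setup.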
    20. CASC Evolutionary Model
    21. CASC Evolutionary Model
    22. CASC Evolutionary Model: Fitness Evaluation
       • For each individual:
           • Randomly select a set of (unique) opponents
           • Check the hash table to retrieve repeat-pairing results
           • Execute the program with the test case as input for each new pairing
           • Apply the fitness function to the program output and store the fitness for the trial
           • Set the individual's fitness as the average fitness across all trials
       • Program compilation is performed as needed
       • Program errors/time-outs result in arbitrarily low fitness
       • This is done in parallel, using the NIC-Cluster and MPI
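The repeat-pairing hash table in the loop above could be implemented as a simple memoization cache. The key layout, the `evaluate` callback, and all names below are assumptions for illustration (assuming non-negative ids), not the thesis's actual code.

```cpp
#include <functional>
#include <unordered_map>

// Hypothetical repeat-pairing cache: the fitness of each
// (program, test case) pairing is computed once and reused,
// so identical pairings are never re-executed.
struct PairingCache {
    std::unordered_map<long long, double> results;

    double fitness(int programId, int testId,
                   const std::function<double(int, int)>& evaluate) {
        // Pack the two non-negative ids into one 64-bit key.
        long long key = (static_cast<long long>(programId) << 32) | testId;
        auto it = results.find(key);
        if (it != results.end())
            return it->second;           // repeat pairing: reuse stored result
        double f = evaluate(programId, testId);
        results.emplace(key, f);         // new pairing: evaluate and store
        return f;
    }
};
```

Since program execution dominates the cost of an evaluation, skipping repeat pairings this way is a natural fit for the parallel evaluation described on the slide.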
    23. CASC Evolutionary Model
    24. CASC Evolutionary Model
    25. CASC Evolutionary Model
    26. Experimental Setup
       • Proof of concept
       • Correction of an insertion sort implementation
       • Test case: unsorted data array
    27. Experimental Setup
       • Fitness function scoring method
       • For each element x in the output data array:
           • For each element a before x in the array, decrement the score if x < a; increment it otherwise
           • For each element b after x in the array, decrement the score if x > b; increment it otherwise
       • Normalized to fall between 0 and 1
       • -1 assigned to programs with errors/time-outs
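A direct reading of this scoring method can be sketched as below. The normalization constant n*(n-1) (one ±1 contribution per ordered comparison, so a fully sorted array scores exactly n*(n-1)) is an inference from the description above, not the thesis's exact code, and the handling of n < 2 is an added assumption.

```cpp
#include <vector>

// Sketch of the pairwise sortedness fitness described on the slide,
// normalized to [0, 1]: 1.0 for a fully sorted array, 0.0 for a
// fully reverse-sorted one.
double sortedness(const std::vector<int>& data) {
    const int n = static_cast<int>(data.size());
    if (n < 2) return 1.0;                 // trivially sorted (assumption)
    int score = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < i; ++j)        // elements a before x = data[i]
            score += (data[i] < data[j]) ? -1 : 1;
        for (int j = i + 1; j < n; ++j)    // elements b after x = data[i]
            score += (data[i] > data[j]) ? -1 : 1;
    }
    const int maxScore = n * (n - 1);      // score of a fully sorted array
    return (score + maxScore) / (2.0 * maxScore);
}
```

For example, a sorted four-element array makes all 12 comparisons increment the score, giving (12 + 12) / 24 = 1.0.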
    28. Experimental Setup
       • Four seed programs used
           • Each has one common error and one unique error (of varying severity)
       • Four different configurations used
           • Mutation Rate: likelihood of an offspring being mutated
           • Mutative Proportion: amount of change mutation incurs

                                 Config 0   Config 1   Config 2   Config 3
           Mutation Rate         Moderate   High       Moderate   High
           Mutative Proportion   Moderate   Moderate   High       High
    29. Results
       • A total of 16 experiments per full run
       • High computational complexity and limited resources
       • Five full runs were completed, totaling 80 experiments
    30. Summary of Results
       • Run three of both the program A and B experiments found a solution in the initial population (these were omitted from the table)
       • 20% of the experiments (16) reported success

           Seed Program : Config.      Best (Std. Dev.)   Average (Std. Dev.)
           A : Base                    0.526 (0.262)      0.163 (0.157)
           A : Enhanced Rate           0.557 (0.283)      0.170 (0.166)
           A : Enhanced Proportion     0.537 (0.226)      0.196 (0.133)
           A : Enhanced Both           0.559 (0.255)      0.175 (0.153)
           B : Base                    0.965 (0.353)      0.275 (0.374)
           B : Enhanced Rate           0.975 (0.357)      0.276 (0.370)
           B : Enhanced Proportion     0.950 (0.432)      0.587 (0.458)
           B : Enhanced Both           0.959 (0.434)      0.415 (0.463)
           C : Base                    0.707 (0.224)      0.372 (0.196)
           C : Enhanced Rate           0.717 (0.224)      0.366 (0.179)
           C : Enhanced Proportion     0.716 (0.217)      0.369 (0.172)
           C : Enhanced Both           0.717 (0.224)      0.377 (0.181)
           D : Base                    1.0 (0.282)        -0.484 (0.535)
           D : Enhanced Rate           1.0 (0.948)        -0.568 (0.572)
           D : Enhanced Proportion     1.0 (0.946)        -0.554 (0.587)
           D : Enhanced Both           1.0 (0.946)        -0.601 (0.604)
    31. Summary of Results
       • 75% of the experiments reported above 0.7 fitness
       • (Results table repeated from slide 30)
    32. Summary of Results
       • There was a high amount of variation in the experiment endpoints
       • Large number of possible solutions for each seed program
       • (Results table repeated from slide 30, with the standard deviations of the best fitnesses for A, B, and C highlighted)
    33. Summary of Results
       • The seed program D experiments were the toughest for the system
       • The seeded error resulted in either a 0 or -1 fitness
       • Experiments were either hit or miss
       • (Results table repeated from slide 30)
    34. Discussion of False Positives
       • A number of the programs returned by successful experiments still have an error
       • For example, this is the evolvable section from a solution:

             for(m=0; m-1 < SIZE-1; m=m+1)
             {
                 for(n=m+1; n>0 && data[n] < data[n-1]; n=n-1)
                     Swap(data[n], data[n-1]);
             }

       • When m is SIZE-1, n is initialized to SIZE (an invalid array index)
       • Tough to catch
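For contrast, a conventional hand-written insertion sort (not the evolved program above) keeps the inner index in bounds: the outer loop stops before SIZE, so the inner index starts at most at SIZE-1 and data[n] is never read out of range.

```cpp
#include <cstddef>
#include <utility>

// Standard insertion sort for comparison with the evolved loop above.
// The outer index m runs over 1..size-1, and the inner index n starts
// at m, so data[n] never exceeds the last valid element.
void insertionSort(int* data, std::size_t size) {
    for (std::size_t m = 1; m < size; ++m)
        for (std::size_t n = m; n > 0 && data[n] < data[n - 1]; --n)
            std::swap(data[n], data[n - 1]);
}
```

The evolved solution passes every in-bounds test case, which is why an out-of-range read like this is hard for output-based fitness evaluation to catch.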
    35. Conclusion
       • The goal: demonstrate a proof-of-concept coevolutionary system for integrated automated software testing and correction
       • A prototype Coevolutionary Automated Software Correction system was introduced
       • 80 experiments were conducted
       • 16 successes, with 75% of best-of-experiment fitnesses reporting over 0.7 (out of 1.0)
       • These experiments indicate the validity of the CASC system concept
       • Further work is required to determine scalability
       • An article on this work has been submitted to IEEE TSE
    36. Work in Progress and Future Work
       • Evolve the complete parse tree
           • Preliminary results using the GP evolutionary model are favorable
       • Cut down on run-times
           • Add symmetric multiprocessing (server-client) functionality
           • More efficient compilation
           • Acquire additional computing resources (e.g., NSF TeraGrid)
       • Investigate the potential benefits of co-optimization [12,13]
    37. Work in Progress and Future Work
       • Implement adaptive parameter control
       • Investigate options for detecting errors like false positives
       • Parameter sensitivity analysis
    38. References
       [1] J. P. Cartlidge. Rules of Engagement: Competitive Coevolutionary Dynamics in Computational Systems. PhD thesis, University of Leeds, 2004.
       [2] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992.
       [3] J. R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA, 1994.
       [4] J. R. Koza. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann, 1999.
       [5] J. R. Koza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer Academic Publishers, 2003.
       [6] F. Lammermann and S. Wappler. Benefits of software measures for evolutionary white-box testing. In Proceedings of GECCO 2005 - the Genetic and Evolutionary Computation Conference, pages 1083–1084, Washington, DC, 2005. ACM Press.
    39. References
       [7] T. Mantere and J. T. Alander. Developing and testing structural light vision software by co-evolutionary genetic algorithm. In QSSE 2002, the Proceedings of the Second ASERC Workshop on Quantitative and Soft Computing Based Software Engineering, pages 31–37. Alberta Software Engineering Research Consortium (ASERC) and the Department of Electrical and Computer Engineering, University of Alberta, Feb. 2002.
       [8] T. Mantere and J. T. Alander. Testing digital halftoning software by generating test images and filters co-evolutionarily. In Proceedings of SPIE Vol. 5267, Intelligent Robots and Computer Vision XXI: Algorithms, Techniques, and Active Vision, pages 257–258. SPIE, Oct. 2003.
       [9] M. Newman. Software Errors Cost U.S. Economy $59.5 Billion Annually. NIST News Release, June 2002.
       [10] C. D. Rosin and R. K. Belew. Methods for competitive coevolution: Finding opponents worth beating. In L. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 373–380, San Francisco, CA, 1995. Morgan Kaufmann.
       [11] C. D. Rosin and R. K. Belew. New methods for competitive coevolution. Evolutionary Computation, 5(1):1–29, 1997.
    40. References
       [12] T. Service. Co-optimization: A generalization of coevolution. Master's thesis, Missouri University of Science and Technology, 2008.
       [13] T. Service and D. Tauritz. Co-optimization algorithms. In Proceedings of GECCO 2008 - the Genetic and Evolutionary Computation Conference, pages 387–388, 2008.
       [14] P. Tonella. Evolutionary testing of classes. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 119–128, Boston, MA, 2004. ACM Press.
       [15] S. Wappler and F. Lammermann. Using evolutionary algorithms for the unit testing of object-oriented software. In Proceedings of GECCO 2005 - the Genetic and Evolutionary Computation Conference, pages 1053–1060, Washington, DC, 2005. ACM Press.
       [16] S. Wappler and J. Wegener. Evolutionary unit testing of object-oriented software using strongly-typed genetic programming. In Proceedings of GECCO 2006 - the Genetic and Evolutionary Computation Conference, pages 1925–1932, Seattle, WA, 2006. ACM Press.
    41. Questions?
    42. Koza's GP Evolutionary Model (back to future work slide)
    43. Diversity in New Experiments
