  1. Do Tests Generated by AI Help Developers? Open Challenges, Applications and Opportunities Annibale Panichella, Ph.D. a.panichella@tudelft.nl @AnniPanic 1
  2. About Me Assistant Professor in Software Engineering at TU Delft 2
  3. The CISE Lab 3 https://www.ciselab.nl Dr. Annibale Panichella (Lab leader) Dr. Pouria Derakhshanfar (Post-doc) Imara van Dinten (Ph.D. Student) Mitchell Olsthoorn (Ph.D. Student) Leonhard Applis (Ph.D. Student) Team of 10 M.Sc. students
4. My Research Interests 4 Word-cloud from my research papers Research Topics: • Automated Test Generation • Crash Replication • Security Attack Generation • SE for Cyber-Physical Systems • Empirical Software Engineering • Testing for AI-based systems • …
  5. Why AI-based Testing? 5
6. Spot the Bug 6
    int balance = 1000;

    void decrease(int amount) {
        if (balance <= amount) {
            balance = balance - amount;
        } else {
            printf("Insufficient funds\n");
        }
    }

    void increase(int amount) {
        balance = balance + amount;
    }
7. Spot the Bug 7 (same code) It should be balance >= amount
8. Spot the Bug 8 (same code) It should be balance >= amount. What if the amount is negative?
9. Spot the Bug 9 (same code) It should be balance >= amount. What if the amount is negative? What if the sum is too large for int?
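For reference, here is a corrected sketch, transcribed to Java (which the rest of the deck uses). The guards below are one reasonable fix for the three issues on the slide, not the only possible one:

    public class Account {
        private int balance = 1000;

        void decrease(int amount) {
            if (amount < 0) throw new IllegalArgumentException("negative amount");
            if (balance >= amount) { // fixed: was balance <= amount
                balance = balance - amount;
            } else {
                System.out.println("Insufficient funds");
            }
        }

        void increase(int amount) {
            if (amount < 0) throw new IllegalArgumentException("negative amount");
            balance = Math.addExact(balance, amount); // throws ArithmeticException on int overflow
        }
    }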
10. Is That Easy? 10 Class = Pass2Verifier.java Project = Apache Commons BCEL
  11. Software Testing Is… 11 Slow Painful Boring Zzz… Necessary
12. AI for SE 12 Word cloud: Artificial Intelligence, Software Testing, AI-Based Software Engineering, Optimization, Search, Genetic Algorithms, Ant Colony, Test Case, Coverage, Assertions, Failures, Bugs, Machine Learning
13. The Master Algorithm 13 [P. Domingos 2015, "The Master Algorithm"]
    Tribe          | Origin            | Master Algorithm
    Symbolists     | Logic, philosophy | Inverse deduction
    Connectionists | Neuroscience      | Back-propagation
    Evolutionary   | Biology           | Evolutionary algorithms
    Bayesian       | Statistics        | Probabilistic inference
    Analogizers    | Psychology        | Kernel machines
The most used AI tribes in Software Testing
  14. A (Very) Brief Historical Overview 14
15. Historical Overview 15 Timeline: 1968-69 First SE Conference (NATO 1968, NATO 1969)
16. Historical Overview 16 Timeline: 1968-69 First SE Conference; 1976 Test Data Generation (symbolic execution) [Ramamoorthy et al., IEEE TSE 1976]
17. Symbolic Execution 17
Code Under Test:
    int foo(int v) { return 2 * v; }

    void method(int x, int y) {
        int z = foo(y);
        if (z == x)
            if (x > y + 10)
                printf("Error");
    }
Concrete State: x = 2, y = 1, z = 2
Symbolic State: x = x0, y = y0, z = 2*y0
Path Condition: 2*y0 == x0 and x0 <= y0+10
Find y0 and x0 that solve these equations/paths.
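To reach the "Error" branch, a symbolic executor negates the last constraint and solves 2*y0 == x0 and x0 > y0 + 10. A minimal sketch of that solving step: real tools hand the path condition to an SMT solver (e.g., Z3), while here we simply brute-force small integers:

    public class PathConditionSolver {
        public static void main(String[] args) {
            // Enumerate candidate inputs; an SMT solver would do this symbolically.
            for (int y0 = -1000; y0 <= 1000; y0++) {
                int x0 = 2 * y0;      // satisfies 2*y0 == x0 by construction
                if (x0 > y0 + 10) {   // negated branch condition: reaches "Error"
                    System.out.println("x = " + x0 + ", y = " + y0); // prints x = 22, y = 11
                    return;
                }
            }
        }
    }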
18. Symbolic AI [J. Haugeland, 1985] • Symbolic AI is often called GOFAI (Good Old-Fashioned Artificial Intelligence) • The overall idea is that many aspects of intelligence can be achieved by manipulating "symbols" with symbolic solvers • Pros: powerful and exact • Cons: • Not all problems can be modelled as symbolic equations • Not all formulas can be solved with exact methods • Path explosion problem in testing
19. Historical Overview 19 "Automatic generation of random self-checking test cases", D. L. Bird and C. U. Munoz, IBM Systems Journal. Timeline: 1968-69 First SE Conference; 1976 Test Data Generation (symbolic execution); 1982 Random Testing (data generation)
20. Random Testing 20
Program Under Test:
    class Triangle {
        int a, b, c; // sides
        String type = "NOT_TRIANGLE";
        void computeTriangleType(int a, int b, int c) {
            if (a == b) {
                if (b == c)
                    type = "EQUILATERAL";
                else
                    type = "ISOSCELES";
            } else {
                if (a == c) {
                    type = "ISOSCELES";
                } else {
                    if (b == c)
                        type = "ISOSCELES";
                    else
                        type = "SCALENE";
                }
            }
        }
    }
The simplest fuzzer (takes the number of inputs, with fixed upper and lower bounds):
    public class TestDataGenerator {
        static int lowerBound = -100;
        static int upperBound = 100;
        static int[] generate(int nData) {
            int[] data = new int[nData];
            for (int i = 0; i < nData; i++) {
                double value = lowerBound + Math.random() * (upperBound - lowerBound);
                data[i] = (int) Math.round(value);
            }
            return data;
        }
    }
Output: [-66, 59, -8], [91, 43, 36], [51, -76, -62], [74, 66, -40], …
It is fast and useful, but it does not generate complete test cases (only the test inputs)!
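As the slide notes, this is only input generation. A minimal driver (a sketch, assuming the Triangle and TestDataGenerator classes above sit in the same package) makes the missing piece visible: we can execute the program on random inputs, but nothing checks that the computed type is correct, i.e., there is no oracle:

    public class RandomTestingDriver {
        public static void main(String[] args) {
            for (int run = 0; run < 5; run++) {
                int[] in = TestDataGenerator.generate(3); // three random sides
                Triangle t = new Triangle();
                t.computeTriangleType(in[0], in[1], in[2]);
                // We can observe the output, but no assertion tells us
                // whether this output is the correct one for this input.
                System.out.println(java.util.Arrays.toString(in) + " -> " + t.type);
            }
        }
    }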
21. Historical Overview 21 Timeline: 1968-69 First SE Conference; 1976 Test Data Generation (symbolic execution); 1982 Random Testing (data generation); 1990 Numerical Optimization (data generation) [IEEE TSE 1990]
22. Historical Overview 22 Timeline: 1968-69 First SE Conference; 1976 Test Data Generation (symbolic execution); 1982 Random Testing (data generation); 1990 Numerical Optimization (data generation); 1999 Genetic Algorithms (data generation) [Pargas et al., IEEE TSE 1999]
23. The Test Case Generation Era 23 Timeline: 2004 Genetic Algorithms (test case generation) [P. Tonella, ISSTA 2004]; 2007 Randoop, random testing (test case generation); 2011 EvoSuite, genetic algorithm (test case generation)
24. Test Case Generation 24
Program Under Test: the Triangle class from slide 20.
Generated Test Suite (produced by the AI):
    @Test
    public void testTriangle_invalid1() {
        assertEquals(Triangle2.Type.INVALID, Triangle2.triangle(0, 0, 0));
    }

    @Test
    public void testTriangle_invalid2() {
        assertEquals(Triangle2.Type.INVALID, Triangle2.triangle(1, 1, 3));
    }

    @Test
    public void testTriangle_equilateral() {
        assertEquals(Triangle2.Type.EQUILATERAL, Triangle2.triangle(2, 2, 2));
    }

    @Test
    public void testTriangle_isoscele() {
        assertEquals(Triangle2.Type.ISOSCELES, Triangle2.triangle(3, 4, 3));
    }

    @Test
    public void testTriangle_scalene() {
        assertEquals(Triangle2.Type.SCALENE, Triangle2.triangle(3, 4, 5));
    }
25. The Test Case Generation Era 25 Timeline: 2004 Genetic Algorithms (test case generation); 2007 Randoop, random testing (test case generation); 2011 EvoSuite, genetic algorithm (test case generation); 2013 SBST tool competition; 2015 Many-objective GAs (test case generation) [A. Panichella et al., ICST 2015; A. Panichella et al., TSE 2018]. Many-objective evolutionary algorithms outperform state-of-the-art test case generation algorithms. Nowadays, many-objective algorithms are the core engine of many existing state-of-the-art tools (see next slide).
  26. Some Existing Tools… 26 BOTSING
  27. The good… 27
28. The Good… 28 Program Under Test → AI → Generated Test Suite (the five Triangle2 tests from slide 24) → Output for the Developer
29. The Good… 29 EMSE 2015: EvoSuite finds 1600 unknown bugs in 100 projects. ICSE-SEIP 2017: EvoSuite detects 56.40% of bugs on an industrial project. SBST Tool Competition 2017: generated tests achieve better coverage than manually-written tests.
30. The Good… 30 [Bar chart: frequency (0.0 to 0.9) of the test smells Eager Tests, Assertion Roulette, Indirect Testing, and Sensitive Equality in manually-written vs. generated tests.] Automatically generated test cases are shorter and contain fewer test smells than their manually-written counterparts. A. Panichella et al., "Test Smells 20 Years Later: Detectability, Validity, and Reliability", under review at EMSE. A. Panichella et al., "Revisiting test smells in automatically generated tests: limitations, pitfalls, and opportunities", ICSME 2020.
  31. The bad… 31
32. Are Generated Tests Readable? 32 G. Fraser et al., "Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study", TOSEM 2015. [Charts: testing time and comprehension, 0% to 100%.] Generated tests achieve higher structural coverage than manually created test suites, but they do not lead to finding faults more quickly if developers have to manually validate the tests.
33. Are Generated Tests Readable? 33 G. Fraser et al., "Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study", TOSEM 2015. There is no difference in the efficiency of debugging when it is supported by manual or by EvoSuite test cases.
34. Are Generated Tests Readable? 34 Live Demo…
  35. How to make the best of AI-based testing? 35
36. My Personal View 36 Test cases generated by AI methods do find many crashes and runtime exceptions, but generating effective (bug-detecting) tests is not the end of the story. Successful results come from: focusing on application domains that are too hard and complex to test by hand; oracles that are easy to validate; generating documentation.
  37. Let me show some examples… 37
38. Generating Test Documentation 38 TestDescriber: generated test + documentation. S. Panichella, A. Panichella, M. Beller, A. Zaidman, H. C. Gall, "The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation", ICSE 2016.
39. Generating Test Documentation 39 Main Steps in TestDescriber:
Production Code:
    public class Option {
        public Option(String opt, String longOpt, boolean hasArg, String descr)
                throws IllegalArgumentException {
            OptionValidator.validateOption(opt);
            this.opt = opt;
            this.longOpt = longOpt;
            if (hasArg) {
                this.numberOfArgs = 1;
            }
            this.description = descr;
        }
        ...
    }
40. Generating Test Documentation 40 Main Steps in TestDescriber: 1. Select the covered statements. Covered Code: the constructor above, restricted to the statements actually executed by the test.
41. Generating Test Documentation 41 Main Steps in TestDescriber: 1. Select the covered statements 2. Filter out Java keywords, etc. Covered Code after filtering: keywords and punctuation are dropped from the token stream (e.g., "this opt = opt", "if (hasArg) { false }").
42. Generating Test Documentation 42 Covered Code. Main Steps in TestDescriber: 1. Select the covered statements 2. Filter out Java keywords, etc. 3. Identifier splitting (camel case): e.g., "longOpt" becomes "long Opt", "hasArg" becomes "has Arg".
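The splitting step can be done with a simple regular expression; a minimal sketch (illustrative, not TestDescriber's actual implementation):

    import java.util.Arrays;

    public class IdentifierSplitter {
        static String[] split(String identifier) {
            // Split before each upper-case letter that follows a lower-case one.
            return identifier.split("(?<=[a-z])(?=[A-Z])");
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(split("longOpt"))); // [long, Opt]
            System.out.println(Arrays.toString(split("hasArg")));  // [has, Arg]
        }
    }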
43. Generating Test Documentation 43 Covered Code. Main Steps in TestDescriber: 1. Select the covered statements 2. Filter out Java keywords, etc. 3. Identifier splitting (camel case) 4. Abbreviation expansion (using external vocabularies): e.g., "opt" becomes "option", "descr" becomes "description".
44. Generating Test Documentation 44 Covered Code. Main Steps in TestDescriber: 1. Select the covered statements 2. Filter out Java keywords, etc. 3. Identifier splitting (camel case) 4. Abbreviation expansion (using external vocabularies) 5. Part-of-speech tagging: each word is labelled as NOUN, VERB, ADJ, etc.
45. Generating Test Documentation 45 Covered Code → Natural Language Sentences: The test case instantiates an "Option" with: option equal to "…"; long option equal to "…"; it has no argument; description equal to "…". An option-validator validates the instantiated object. The test asserts the following condition: "Option" has no argument.
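The final sentences come from filling natural-language templates with the processed words; a minimal sketch of that last step (the template wording is illustrative, not TestDescriber's actual phrasing):

    public class SummaryTemplate {
        static String describeConstructorCall(String className,
                                              String paramName, String value) {
            // One template per code construct; this one covers constructor calls.
            return "The test case instantiates an \"" + className + "\" with "
                    + paramName + " equal to \"" + value + "\".";
        }

        public static void main(String[] args) {
            System.out.println(describeConstructorCall("Option", "option", "f"));
            // The test case instantiates an "Option" with option equal to "f".
        }
    }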
46. How Do Test Case Summaries Impact the Number of Bugs Fixed by Developers? 46 Both groups had 45 minutes to fix each class. Participants WITHOUT TestDescriber summaries fixed 40% of the bugs; participants WITH TestDescriber summaries fixed 60%-80% of the bugs.
47. Test Documentation and Comprehension 47 [Chart: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries.] Without summaries: only 15% of participants considered the test cases "easy to understand", and 40% considered them incomprehensible. With summaries: 46% considered the test cases "easy to understand", and only 18% considered them incomprehensible.
48. Test Documentation and Comprehension 48 Roy et al., ASE 2020. Follow-up work that uses deep learning to post-process generated tests. DeepTC-Enhancer generates: • Test documentation • Test method names • Variable names
49. Easy-to-validate test oracles 49
  50. Finding Crashes 50
51. An Example 51 https://issues.apache.org/jira/browse/COLLECTIONS-70 Major bug for Apache Commons Collections: created in June 2005, solved in January 2006. A test case is always needed to help debugging.
52. The Botsing Project 52
Target Crash (Bug Name: ACC-70, Library: Apache Commons Collections):
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.commons.collections.list.TreeList$TreeListIterator.previous(TreeList.java:841)
        at java.util.Collections.get(Unknown Source)
        at java.util.Collections.iteratorBinarySearch(Unknown Source)
        at java.util.Collections.binarySearch(Unknown Source)
        at utils.queue.QueueSorted.put(QueueSorted.java:51)
        at framework.search.GraphSearch.solve(GraphSearch.java:53)
        at search.informed.BestFirstSearch.solve(BestFirstSearch.java:20)
        at Hlavni.main(Hlavni.java:66)
Buggy Code (if "parent" is null, this code triggers an exception):
    public Object previous() {
        ...
        if (next == null) {
            next = parent.root.get(nextIndex - 1);
        } else {
            next = next.previous();
        }
        Object value = next.getValue();
        ...
    }
Test generated by BOTSING:
    public void test0() throws Throwable {
        TreeList treeList0 = new TreeList();
        treeList0.add((Object) null);
        TreeList.TreeListIterator treeList_TreeListIterator0 = new TreeList.TreeListIterator(treeList0, 732);
        // Undeclared exception!
        treeList_TreeListIterator0.previous();
    }
53. The Botsing Project 53 Evolutionary loop: Initial Tests → Test Execution → Test Case Selection → Variants Generation (Evolutionary Algorithm). Test quality is measured using the "distance" between the target stack trace and the stack trace produced by a generated test. [Example: a target and a produced java.lang.IllegalArgumentException trace through AbstractHashedMap.<init>, AbstractLinkedMap.<init>, LinkedMap.<init>, TransformedMap.transformMap, and TransformedMap.putAll that share the same frames but differ in line numbers.]
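A minimal sketch of one plausible stack-trace distance, comparing traces frame by frame. Botsing's actual fitness (based on EvoCrash's crash distance) is more elaborate; this only illustrates the idea:

    import java.util.List;

    public class StackTraceDistance {

        record Frame(String className, String methodName, int lineNumber) {}

        // 0.0 means the produced trace matches the target exactly.
        static double distance(List<Frame> target, List<Frame> produced) {
            double d = 0.0;
            for (int i = 0; i < target.size(); i++) {
                if (i >= produced.size()) { d += 1.0; continue; } // frame missing entirely
                Frame t = target.get(i), p = produced.get(i);
                if (!t.className().equals(p.className())
                        || !t.methodName().equals(p.methodName())) {
                    d += 1.0;   // wrong class or method
                } else if (t.lineNumber() != p.lineNumber()) {
                    d += 0.5;   // right method, wrong line
                }
            }
            return d;
        }

        public static void main(String[] args) {
            List<Frame> target = List.of(new Frame("TreeList$TreeListIterator", "previous", 841));
            List<Frame> produced = List.of(new Frame("TreeList$TreeListIterator", "previous", 835));
            System.out.println(distance(target, produced)); // 0.5: right frame, wrong line
        }
    }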
54. Do Generated Tests Help Developers? 54 [Box plots: time to fix a bug (in s), with Botsing vs. without Botsing.] Generated tests help developers fix bugs significantly faster. Generated tests help developers locate bugs significantly faster. https://github.com/STAMP-project/botsing "Search-based crash reproduction and its impact on debugging", TSE 2018.
  55. Testing for Complex Systems (Testing the Untestable) 55
56. Advanced Driver Assistance Systems (ADAS) 56 Traffic Sign Recognition (TSR), Pedestrian Protection (PP), Lane Departure Warning (LDW), Automated Emergency Braking (AEB)
57. Feature Interactions 57 [Diagram: several pipelines of Sensors/Camera → Autonomous Feature → Actuator, each emitting actuator commands over time: braking (30%, 20%, …, 80%), acceleration (60%, 10%, …, 20%), steering (30%, 20%, …, 80%).]
58. Feature Interactions 58 [Diagram: multiple Sensors/Camera → Autonomous Feature → Actuator pipelines competing for the same actuator. Priority?]
59. Integration Components 59 Pedestrian Protection (PP), Autom. Emerg. Braking (AEB), Lane Dep. Warning (LDW). The integration is a rule set: each condition checks a specific feature-interaction situation and resolves potential conflicts that may arise under that condition.
60. Integration Components 60 Simplified Example. [Figure from the ICSE 2020 paper: a decision-tree diagram representing integration rules, and a table of safety requirements for AutoDrive: PP shall avoid collision with pedestrians by initiating emergency braking in case of impending collision with pedestrians; TSR shall stop the vehicle at the stop sign by initiating a full braking when a stop sign is detected; AEB shall avoid collision with vehicles by initiating emergency braking in case of impending collision with vehicles; ACC shall respect the safety distance.] Condition template: ⟨if op1 operator threshold⟩, e.g., speed(t) < speedLeadingCar(t), where t is the time stamp.
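A minimal sketch of such a rule set, following the condition template above. All signal names and threshold values here are illustrative assumptions, not the actual rules of the industrial system:

    public class IntegrationSketch {

        record Command(String feature, double braking) {} // braking in [0, 1]

        static final double PEDESTRIAN_DIST_THRESHOLD = 10.0; // meters, illustrative
        static final double SAFETY_DISTANCE = 25.0;           // meters, illustrative

        // Each condition checks one feature-interaction situation and resolves
        // the conflict by choosing which feature controls the actuator.
        static Command integrate(double distToPedestrian, double distToLeadingCar,
                                 Command pp, Command aeb, Command acc) {
            if (distToPedestrian < PEDESTRIAN_DIST_THRESHOLD) {
                return pp;   // impending pedestrian collision: PP overrides
            }
            if (distToLeadingCar < SAFETY_DISTANCE) {
                return aeb;  // impending vehicle collision: AEB overrides ACC
            }
            return acc;      // default: cruise control keeps driving
        }

        public static void main(String[] args) {
            Command cmd = integrate(8.0, 40.0,
                    new Command("PP", 1.0), new Command("AEB", 0.8), new Command("ACC", 0.0));
            System.out.println(cmd.feature() + " wins with braking " + cmd.braking());
            // PP wins: the pedestrian at 8 m is below the 10 m threshold.
        }
    }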
61. Testing Automated Driving Systems 61 Testing on the road is not practical → simulation-based testing.
62. Test Inputs 62 Environment: position and speed, road shape, traffic lights position and status, weather. Ego Car: initial position, initial speed. Car Under Test: initial position, initial speed.
63. Feature Interaction Failures 63 [Simulation scenario: stop sign, minimum distance, 50 km speed limit.]
64. Infinite Input Space 64
65. AI-Based Testing 65 Evolutionary loop: Initial Tests → Test Execution → Test Case Selection → Variants Generation (Evolutionary Algorithm).
66. AI-Based Testing 66 The loop starts from an initial set of tests (Test 1, Test 2).
67. AI-Based Testing 67 Test Execution: each test runs in the simulator, and we record the minimum distance within the simulation time window (Test 1: 2m, Test 2: 1m, Test 3: 1.5m).
68. AI-Based Testing 68 Test Case Selection: the best test case is the one closest to violating the safe distance (fitness). Here Test 2 (1m) beats Test 1 (2m).
69. AI-Based Testing 69 Variants Generation: the selected test (Test 2) is modified via mutation and/or crossover to produce new test variants, and the loop repeats.
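A minimal sketch of this whole loop, assuming a toy stand-in for the simulator (the minDistance formula below is a made-up surrogate; real tools execute each scenario in a driving simulator and use many objectives and richer variation operators):

    import java.util.*;

    public class EvolutionaryTester {

        record Scenario(double egoSpeed, double leadSpeed, double initialGap) {}

        static final Random RND = new Random(42);

        static Scenario randomScenario() {
            return new Scenario(20 + RND.nextDouble() * 30,  // ego speed, m/s
                                10 + RND.nextDouble() * 20,  // leading-car speed, m/s
                                5 + RND.nextDouble() * 50);  // initial gap, m
        }

        static Scenario mutate(Scenario s) {
            // Perturb each input dimension; keeps the variant close to its parent.
            return new Scenario(s.egoSpeed() + RND.nextGaussian(),
                                s.leadSpeed() + RND.nextGaussian(),
                                Math.max(1, s.initialGap() + RND.nextGaussian()));
        }

        // Toy fitness: smaller minimum distance = closer to a failure.
        static double minDistance(Scenario s) {
            return Math.max(0, s.initialGap() - (s.egoSpeed() - s.leadSpeed()) * 5);
        }

        public static void main(String[] args) {
            List<Scenario> population = new ArrayList<>();
            for (int i = 0; i < 10; i++) population.add(randomScenario());

            for (int gen = 0; gen < 100; gen++) {
                // Selection: keep the scenario closest to violating the safe distance.
                population.sort(Comparator.comparingDouble(EvolutionaryTester::minDistance));
                Scenario best = population.get(0);
                if (minDistance(best) == 0) {
                    System.out.println("Failure-exposing scenario found: " + best);
                    return;
                }
                // Variation: refill the population with mutants of the best.
                for (int i = 1; i < population.size(); i++)
                    population.set(i, mutate(best));
            }
        }
    }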
70. Search Objectives 70 Table I: safety requirements and failure distance functions for SafeDrive.
    PP  | No collision with pedestrians | FD1(i) is the distance between the ego car and the pedestrian at step i.
    AEB | No collision with cars | FD2(i) is the distance between the ego car and the leading car at step i.
    TSR | Stop at a stop sign | Let u(i) be the speed of the ego car at step i if a stop sign is detected, and u(i) = 0 if there is no stop sign. FD3(i) = 0 if u(i) >= 5 km/h; FD3(i) = 1/u(i) if u(i) != 0; otherwise FD3(i) = 1.
    TSR | Respect the speed limit | Let u'(i) be the difference between the speed of the ego car and the speed limit at step i if a speed-limit sign is detected, and u'(i) = 0 if there is no speed-limit sign. FD4(i) = 0 if u'(i) >= 10 km/h; FD4(i) = 1/u'(i) if u'(i) != 0; otherwise FD4(i) = 1.
    ACC | Respect the safety distance | FD5(i) is the absolute difference between the safety distance sd and FD2(i).
• For each safety requirement, we measure the distance to failing that requirement during the simulation.
• The problem is inherently many-objective.
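A minimal sketch of one of these failure distances (FD3, "stop at a stop sign") as read from the table above. Units and normalization are simplified assumptions; the paper's definition is the authoritative one:

    public class FailureDistance {

        // u = ego-car speed (km/h) at step i when a stop sign is detected;
        // callers pass u = 0 when no stop sign is visible.
        static double fd3(double u) {
            if (u >= 5.0) return 0.0;     // failure: the car did not stop at the sign
            if (u != 0.0) return 1.0 / u; // closer to 5 km/h = closer to failing
            return 1.0;                   // stopped (or no sign): maximal distance
        }

        public static void main(String[] args) {
            System.out.println(fd3(6.0)); // 0.0  -> failure exposed
            System.out.println(fd3(4.0)); // 0.25 -> nearly failing
            System.out.println(fd3(0.0)); // 1.0  -> far from failing
        }
    }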
71. Case Study 71 • Two case study systems from IEE (industrial partner) • Designed by experts • Manually tested for more than six months • Different rules to integrate feature actuator commands • Both systems consist of four self-driving features: Adaptive Cruise Control (ACC), Automated Emergency Braking (AEB), Traffic Sign Recognition (TSR), Pedestrian Protection (PP)
72. Many-Objective Search in Action 72 [Plot: distance to a failure (y-axis) over search time starting at 0 min, one line per potential failure type F1-F10.] Different types of (potential) failures with feature interactions. A zero distance means we found a test that exposes the failure.
73. Some Results 73 [Bar chart: number of discovered failures (0-10) over 12 hours of search, comparing coverage-based fuzzing with many-objective search.]
  74. Example of Failures 74
75. Feedback From Domain Experts 75 • The failures we found were due to undesired feature interactions • The failures were not previously known to the experts (new tests for regression testing) • We identified ways to improve the feature-interaction logic to avoid such failures
  76. Take-Away Message 76
  77. Do Tests Generated by AI Help Developers? Open Challenges, Applications and Opportunities Annibale Panichella, Ph.D. a.panichella@tudelft.nl @AnniPanic 77