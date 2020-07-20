
Automated Repair of Feature Interaction Failures in Automated Driving Systems

  1. Automated Repair of Feature Interaction Failures in Automated Driving Systems. Raja Ben Abdessalem, Annibale Panichella, Shiva Nejati, Lionel C. Briand, and Thomas Stifter
  2. Automated Driving Systems: Traffic Sign Recognition (TSR), Pedestrian Protection (PP), Lane Departure Warning (LDW), Automated Emergency Braking (AEB)
  3. Feature Interactions: each autonomous feature reads sensor / camera data and sends commands to the actuators over time, e.g., braking (30 %, 20 %, …, 80 %), acceleration (60 %, 10 %, …, 20 %), and steering (30 %, 20 %, …, 80 %). The features rely on different techniques (deep learning, neural networks, k-means).
  4. Integration Components: Pedestrian Protection (PP), Automated Emergency Braking (AEB), Lane Departure Warning (LDW). The integration is a rule set: each condition checks a specific feature interaction situation and resolves the conflicts that may arise under that condition.
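A minimal Python sketch of such an integration rule set, assuming a flat, ordered list of rules where the first rule whose condition holds decides which feature's actuator command is forwarded. The names (`Rule`, `Command`, `integrate`), the sensor keys, and the distance thresholds are all hypothetical; the industrial rule sets are far larger and nested.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Situation = Dict[str, float]  # hypothetical sensor readings at one time step


@dataclass(frozen=True)
class Command:
    """Actuator command proposed by one feature at one time step (percentages)."""
    feature: str
    braking: float
    acceleration: float


@dataclass(frozen=True)
class Rule:
    """An integration rule: if `condition` holds, forward the `winner` feature's command."""
    name: str
    condition: Callable[[Situation], bool]
    winner: str


def integrate(rules: List[Rule], situation: Situation,
              commands: Dict[str, Command]) -> Command:
    # Rules are checked in order, so a rule's position encodes its priority.
    for rule in rules:
        if rule.condition(situation):
            return commands[rule.winner]
    raise ValueError("no integration rule applies")


# Hypothetical thresholds in metres.
RULES = [
    Rule("pedestrian ahead", lambda s: s["pedestrian_distance"] < 10.0, "PP"),
    Rule("obstacle ahead", lambda s: s["obstacle_distance"] < 20.0, "AEB"),
    Rule("otherwise", lambda s: True, "ACC"),
]

COMMANDS = {
    "PP": Command("PP", braking=80.0, acceleration=0.0),
    "AEB": Command("AEB", braking=60.0, acceleration=0.0),
    "ACC": Command("ACC", braking=0.0, acceleration=30.0),
}

chosen = integrate(RULES, {"pedestrian_distance": 8.0, "obstacle_distance": 50.0}, COMMANDS)
print(chosen.feature)  # PP
```

Both the thresholds and the rule order decide the outcome, which is why a faulty interaction can be caused by a wrong threshold or by a wrong priority.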
  5. Testing Automated Driving Systems: on-the-road testing; simulation-based testing
  6. Simulation-Based Test Case: a test input is fed to the simulator (Matlab/Simulink), which runs the Software Under Test (SUT) and produces the test output.
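A toy Python stand-in for one simulation-based test case, only to show the shape of the loop: the test input sets up a scenario, the simulator advances in fixed time steps, a requirement is checked at every step, and the run stops at the first violation, reporting its time step and degree of violation. The kinematics, the 2 m requirement, and every name here are hypothetical; the real SUT runs in Matlab/Simulink.

```python
from dataclasses import dataclass
from typing import Optional

MIN_GAP = 2.0  # hypothetical requirement r: keep at least 2 m from the pedestrian
DT = 0.1       # simulated seconds per time step


@dataclass(frozen=True)
class TestInput:
    ego_speed: float       # m/s, initial speed of the ego vehicle
    pedestrian_gap: float  # m, initial distance to a stationary pedestrian
    brake_at_gap: float    # m, gap at which the toy SUT starts braking
    deceleration: float    # m/s^2, applied while braking


@dataclass(frozen=True)
class TestOutput:
    failed: bool
    step: Optional[int]  # time step u of the first violation, if any
    severity: float      # how far below MIN_GAP the gap fell, 0.0 if passed


def run_test(tc: TestInput, max_steps: int = 500) -> TestOutput:
    """Simulate step by step, check r each step, and stop at the first violation."""
    speed, gap = tc.ego_speed, tc.pedestrian_gap
    for u in range(max_steps):
        if gap <= tc.brake_at_gap:
            speed = max(0.0, speed - tc.deceleration * DT)
        gap -= speed * DT
        margin = gap - MIN_GAP  # a negative margin means r is violated
        if margin < 0.0:
            return TestOutput(failed=True, step=u, severity=-margin)
        if speed == 0.0:
            break  # vehicle stopped short of a stationary pedestrian
    return TestOutput(failed=False, step=None, severity=0.0)


early = run_test(TestInput(20.0, 30.0, brake_at_gap=10.0, deceleration=4.0))
late = run_test(TestInput(20.0, 30.0, brake_at_gap=5.0, deceleration=4.0))
safe = run_test(TestInput(10.0, 50.0, brake_at_gap=30.0, deceleration=8.0))
```

Both `early` and `late` fail, but braking later produces a larger violation, so a plain pass/fail verdict hides information that a repair search could exploit.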
  7. Case Study • Two case study systems from IEE (industrial partner) • Designed by experts • Manually tested for more than six months • Different rules to integrate the features' actuator commands • 700K eLOC • Two system-level test suites (≈30 min each) with failing tests • Both systems consist of four self-driving features: ACC, AEB, TSR, PP
  8. Feature Interaction Failures
  9. Program Repair: C. Le Goues et al., TSE 2012; Martinez and Monperrus, ISSTA 2016
  10. Genetic Programming: GP repeatedly generates variants of the faulty program, evaluates them against the test suite, and selects patches • Potential patches are generated using crossover (AST cuts) and mutation (AST changes) • The entire test suite is run against each generated patch • The patches with fewer failing test cases survive
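The evaluation and selection step described above can be sketched in Python as follows, assuming fitness is just the number of failing tests over the whole suite. `failing_tests`, `select_survivors`, and the toy "threshold" patches are hypothetical illustrations, not the implementation of any specific GP repair tool.

```python
from typing import Any, Callable, List, Sequence


def failing_tests(patch: Any, suite: Sequence[Any],
                  passes: Callable[[Any, Any], bool]) -> int:
    """Binary fitness: how many tests in the entire suite fail on this variant."""
    return sum(1 for tc in suite if not passes(patch, tc))


def select_survivors(population: List[Any], suite: Sequence[Any],
                     passes: Callable[[Any, Any], bool], size: int) -> List[Any]:
    """Keep the `size` variants with the fewest failing tests (ties keep input order)."""
    return sorted(population, key=lambda p: failing_tests(p, suite, passes))[:size]


# Toy usage: a "patch" is one threshold; a test passes if the threshold reaches it.
suite = [5.0, 10.0, 15.0, 20.0]
passes = lambda threshold, required: threshold >= required
survivors = select_survivors([8.0, 25.0, 12.0, 3.0], suite, passes, size=2)
print(survivors)  # [25.0, 12.0]
```

Every candidate costs one full suite run, and two candidates with the same failure count are indistinguishable no matter how severe their failures are.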
  11. Genetic Programming: Implicit Assumptions: • One-defect assumption • The patches require only a few line changes • Inexpensive test suites (a few seconds) • No guiding heuristics (a test either fails or passes). Automated Driving Systems: • Multiple defects in different locations • Up to 100 lines to change • Each test suite requires 30 min • Not all failures are equal (the intensity of the violation varies)
  12. ARIEL: Automated Repair of IntEgration ruLes for ADS
  13. ARIEL: ARIEL is a (1+1) Evolutionary Algorithm with an Archive (1 parent + 1 offspring) • Run the faulty program and compute the failure intensities (search objectives) • Generate only one patch, through customized fault localization and mutation • Add the offspring to the archive if it is better than the archive for at least one failure
      Excerpt from the paper (ICSE ’20, May 23-29, 2020, Seoul, South Korea):
      Algorithm 1: ARIEL
      Input: the faulty self-driving system (features f1, . . . , fn and their integration rule set); TS: test suite
      Result: a repaired rule set satisfying all tc ∈ TS
      1 begin
      2   Archive ← {faulty rule set}
      3   RUN-EVALUATE(faulty rule set, TS)
      4   while not(|Archive| == 1 and Archive satisfies all tc ∈ TS) do
      5     p ← SELECT-A-PARENT(Archive)   // random selection
      6     o ← GENERATE-PATCH(p, TS, objective scores)
      7     RUN-EVALUATE(o, TS)
      8     Archive ← UPDATE-ARCHIVE(Archive, o, objective scores)
      9 return Archive
      GENERATE-PATCH builds the offspring o from p by (1) fault localization (Equation 1) and (2) mutating the rule set in p; it is presented in subsection 3.2.1. Then, the offspring o is evaluated (line 7) by running the test suite TS, extracting the remaining failures, and computing their corresponding objective scores. Note that the severities of the failures are our search objectives to optimize; they are discussed in Section 3.2.3. The offspring o is added to the archive (line 8 of Algorithm 1) if it decreases the severity of the failures compared to the patches currently stored in the archive. The archive and its updating routine are described in detail in subsection …, and the algorithm returns the archive when the termination criteria are met.
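A minimal Python sketch of Algorithm 1's control flow. The archive update rule used here is our own assumption built from the one-line description on the slide: an offspring is accepted only if it strictly lowers the best archived severity of at least one failure, and archived patches it is no worse than on every failure are then dropped. The real UPDATE-ARCHIVE is defined in the paper. An iteration budget replaces the time budget, and `evaluate`, `nudge`, and the two-threshold toy problem are hypothetical.

```python
import random
from typing import Any, Callable, List, Tuple

Scores = Tuple[float, ...]  # one severity per test case; 0.0 means the test passes
Entry = Tuple[Any, Scores]  # (patch, its scores)


def update_archive(archive: List[Entry], patch: Any, scores: Scores) -> List[Entry]:
    """Assumed UPDATE-ARCHIVE: keep `patch` only if it lowers the best severity of
    at least one failure, then drop archived patches it is no worse than everywhere."""
    best = [min(s[i] for _, s in archive) for i in range(len(scores))]
    if not any(new < old for new, old in zip(scores, best)):
        return archive
    kept = [(p, s) for p, s in archive
            if not all(new <= old for new, old in zip(scores, s))]
    return kept + [(patch, scores)]


def ariel(faulty: Any,
          evaluate: Callable[[Any], Scores],
          generate_patch: Callable[[Any, Scores, random.Random], Any],
          budget: int, rng: random.Random) -> List[Entry]:
    archive: List[Entry] = [(faulty, evaluate(faulty))]                  # lines 2-3
    for _ in range(budget):
        if len(archive) == 1 and all(v == 0.0 for v in archive[0][1]):  # line 4
            break
        parent, parent_scores = rng.choice(archive)                      # line 5
        offspring = generate_patch(parent, parent_scores, rng)           # line 6
        archive = update_archive(archive, offspring, evaluate(offspring))  # lines 7-8
    return archive


# Toy problem: a patch is two thresholds; test 1 needs t1 >= 7, test 2 needs
# t2 <= 3, and each severity is the size of the violation.
def evaluate(patch: Any) -> Scores:
    t1, t2 = patch
    return (max(0.0, 7.0 - t1), max(0.0, t2 - 3.0))


def nudge(patch: Any, scores: Scores, rng: random.Random) -> Any:
    # ARIEL would use fault localization on `scores`; this toy nudges a random threshold.
    changed = list(patch)
    changed[rng.randrange(len(changed))] += rng.choice((-1.0, 1.0))
    return tuple(changed)


result = ariel((2.0, 9.0), evaluate, nudge, budget=10_000, rng=random.Random(0))
print(result)  # [((7.0, 3.0), (0.0, 0.0))]
```

Because each objective keeps its own best severity, a patch that reduces one failure is kept even if it leaves the other failures unchanged, which a single failing-test count would not reward.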
  14. Customized Fault Localization: fault localization (FL) formulae measure how suspicious (likely faulty) each statement in the production code is, based on the number of failing tests • Tarantula [Jones et al. 2002]: suspicious statements are covered mostly by failing tests • Our formula: failing tests have weights that are proportional to the severity of their failures
      Excerpt from the paper (Automated Repair of Integration Rules in Automated Driving Systems):
      Let TS_f be the set of failing test cases in TS, and denote by w_tc the weight (severity) of the failure of tc. We then compute the suspiciousness of each statement s as follows:
      $$\mathit{Susp}(s) = \frac{\dfrac{\sum_{tc \in TS_f} w_{tc} \cdot \mathit{cov}(tc, s)}{\sum_{tc \in TS_f} w_{tc}}}{\dfrac{\mathit{passed}(s)}{\mathit{total\_passed}} + \dfrac{\mathit{failed}(s)}{\mathit{total\_failed}}} \tag{1}$$
      where cov(tc, s) is 1 if tc executes s at some time step and 0 otherwise; passed(s) counts the number of passed test cases that have executed s at some time step; failed(s) counts the number of failed test cases that have executed s at some time step; and total_passed and total_failed denote the total numbers of passing and failing test cases, respectively. Note that Equation 1 is equivalent to the standard Tarantula formula if we let the weight (severity) of every failing test case be one (i.e., if w_tc = 1 for every tc ∈ TS_f):
      $$\mathit{Susp}(s) = \frac{\dfrac{\mathit{failed}(s)}{\mathit{total\_failed}}}{\dfrac{\mathit{passed}(s)}{\mathit{total\_passed}} + \dfrac{\mathit{failed}(s)}{\mathit{total\_failed}}} \tag{2}$$
      For each test case tc that fails at time step u and violates some requirement r, we define w_tc = |O(tc(u), r)|. That is, w_tc is the degree of violation caused by tc at the time step u when it fails. Hence, test cases that lead to more severe violations are assigned higher weights. Note that since we stop test cases as soon as they fail, each test case can violate at most one requirement.
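A Python sketch of Equation 1, assuming coverage is given as the set of statements each test executed at any time step and that the severities w_tc are already computed. The function and argument names are hypothetical, and returning 0.0 for a zero denominator is our own convention for statements no test executes.

```python
from typing import Dict, Iterable, Set


def suspiciousness(coverage: Dict[str, Set[str]],
                   failing_weights: Dict[str, float],
                   passing: Iterable[str]) -> Dict[str, float]:
    """Weighted Tarantula (Equation 1) for every executed statement.

    coverage:        test id -> statements the test executed at some time step
    failing_weights: failing test id -> w_tc (degree of violation)
    passing:         ids of the passing tests
    """
    passing = list(passing)
    total_passed, total_failed = len(passing), len(failing_weights)
    weight_sum = sum(failing_weights.values())
    statements = set().union(*coverage.values())
    scores: Dict[str, float] = {}
    for s in statements:
        passed_s = sum(1 for tc in passing if s in coverage[tc])
        failed_s = sum(1 for tc in failing_weights if s in coverage[tc])
        weighted = sum(w for tc, w in failing_weights.items() if s in coverage[tc])
        numerator = weighted / weight_sum if weight_sum else 0.0
        denominator = ((passed_s / total_passed if total_passed else 0.0)
                       + (failed_s / total_failed if total_failed else 0.0))
        scores[s] = numerator / denominator if denominator else 0.0
    return scores


COVERAGE = {"t1": {"r1", "r2"}, "t2": {"r2"}, "t3": {"r1", "r3"}}
weighted = suspiciousness(COVERAGE, {"t1": 3.0, "t2": 1.0}, ["t3"])
unit = suspiciousness(COVERAGE, {"t1": 1.0, "t2": 1.0}, ["t3"])  # Equation 2
```

With unit weights, rule `r1` scores 1/3 as in standard Tarantula; once the failure of `t1` is three times as severe as that of `t2`, `r1` rises to 0.5 because the severe failure executes it.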
  15. Customized Mutation: potential patches are generated using only two operators • Changing the thresholds in the rules (e.g., minimum distance between cars) • Shifting conditions within rule sets (changing the priorities of the checks/rules) • No deletion (legal and ethical constraints)
      [Figure 5: Illustrating the shift operator: (a) selecting bs and the path, and (b) applying the shift operator.]
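The two operators can be sketched in Python on a simplified rule set, assuming a flat ordered list of rules, each with a single numeric threshold. This flattening is our own simplification: the paper's shift operator works on nested conditions (Figure 5). `Rule`, `mutate_threshold`, `shift_rule`, and the ±10 % range are hypothetical. Neither operator removes a rule, matching the no-deletion constraint.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Rule:
    name: str
    threshold: float  # e.g., minimum distance (m) that triggers the rule


def mutate_threshold(rules: List[Rule], index: int, rng: random.Random,
                     max_change: float = 0.1) -> List[Rule]:
    """Operator 1: scale one threshold by a random factor in [1 - max_change, 1 + max_change]."""
    mutated = list(rules)  # copy, so the parent rule set is left untouched
    factor = 1.0 + rng.uniform(-max_change, max_change)
    mutated[index] = replace(rules[index], threshold=rules[index].threshold * factor)
    return mutated


def shift_rule(rules: List[Rule], index: int, new_position: int) -> List[Rule]:
    """Operator 2: move one rule to another position, i.e., change its priority."""
    mutated = list(rules)
    mutated.insert(new_position, mutated.pop(index))
    return mutated


RULES = [Rule("PP", 10.0), Rule("AEB", 20.0), Rule("ACC", 50.0)]
shifted = shift_rule(RULES, 2, 0)
tweaked = mutate_threshold(RULES, 1, random.Random(42))
print([r.name for r in shifted])  # ['ACC', 'PP', 'AEB']
```

Both operators return a new list of the same length, so every offspring keeps all the original rules and differs only in thresholds or order.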
  16. 16. Empirical Evaluation 16
  17. Setting • Benchmark: SafeDrive1 and SafeDrive2 from our industrial partner • Baselines: Genetic Programming (GP), Random Search (RS) • Parameters: GP with a population size of 10 patches; search time = 16 h; 50 repetitions
  18. Results [Plots: number of failing tests (#FailingTests) over search time (Time, 0 to 16 h) for GP, ARIEL, and Random; one panel each for SelfDrive1 and SelfDrive2.]
  19. Feedback From Domain Experts • We interviewed software engineers involved in the development of AutoDrive1 and AutoDrive2 • ARIEL produces patches that differ from those developers would write manually (developers would add more integration rules) • According to the developers, the patches generated by ARIEL are valid, understandable, useful, and optimal; moreover, the engineers could not have produced them manually. Based on expert judgement, the synthesized patches are superior to manually written ones.
  20. In Summary

