1
Simulator-based Explanation and Debugging of
Hazard-triggering Events
in DNN-based Safety-critical Systems
https://dl.acm.org/doi/10.1145/3569935
ACM Transactions on Software Engineering & Methodology (TOSEM)
Hazem Fahmy 1, Fabrizio Pastore 1, Lionel Briand 1,2, Thomas Stifter 3
1 University of Luxembourg, 2 University of Ottawa, 3 IEE S.A.
2
DNN-based Safety-critical Systems
Autonomous
Drones
Self-driving
Cars
Child
Detection
3
Simulator-based Training of DNNs
Training set
(simulator images)
DNN
Training
Fine-tuning
DNN
Testing
Trained
DNN
Fine-Tuned
DNN
Failures
Training set
(real-world images)
Test set
(real-world images)
Engineers need to characterize failure-inducing images,
either to improve DNNs (e.g., by selecting images for retraining) or to identify
countermeasures (e.g., using two cameras).
Manual inspection of images is expensive and error-prone.
4
Can we automatically generate
expressions
constraining simulator parameters
to explain DNN failures
observed with real-world data?
5
-28.2 < Head_Vert < -15.7
(Top)
-23.9 < Head_Hor < 0.7
(Left)
-10.8 < Head_Vert < 22.8
(Middle – Top)
7.3 < Head_Hor < 32.9
(Center – Right)
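A minimal sketch of how such an expression could be represented and checked in code. The dictionary layout and the helper name `matches` are illustrative assumptions, not part of SEDE. Only the parameter names and bounds come from the slide, and pairing the first two constraints into one expression is also an assumption.

```python
# Hypothetical representation: one expression = {parameter: (lower, upper)}.
# Bounds copied from the first two constraints on the slide (Top / Left);
# treating them as a single expression is an assumption.
EXPRESSION = {
    "Head_Vert": (-28.2, -15.7),
    "Head_Hor": (-23.9, 0.7),
}


def matches(config, expression):
    """Return True if every constrained parameter lies strictly inside its bounds."""
    return all(
        name in config and low < config[name] < high
        for name, (low, high) in expression.items()
    )


inside = {"Head_Vert": -20.0, "Head_Hor": -5.0}
outside = {"Head_Vert": 10.0, "Head_Hor": -5.0}
print(matches(inside, EXPRESSION))   # True
print(matches(outside, EXPRESSION))  # False
```

Parameters that the expression does not mention are left unconstrained, so a simulator configuration with additional parameters can still match.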
7
Simulator-based Explanations for DNN Errors (SEDE)
Real-world
Error-inducing images HUDD
Step 1. Identify root-cause clusters (RCCs)
Root
Cause
Clusters
(RCCs)
Simulator-based Explanation for DNN Errors (SEDE)
8
Simulator-based Explanations for DNN Errors (SEDE)
Real-world
Error-inducing images HUDD
Evolutionary
Algorithm
Simulator
Simulator
images
Configuration
Parameters
RCC Representative
Images
Step 1. Identify root-cause clusters (RCCs)
Step 2. Generate images associated with RCCs
RCCs
Step 2.1. Identify RCC Representative Images
PaiR
9
Step 2.1. Identify RCC Representative Images
(PaiR)
Generates a set of images that belong to the RCC and are diverse
Offspring
RCC medoid
Parents
O2. Diversity
O1. Cluster Membership
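The two objectives can be sketched as follows. The slide does not say how they are computed, so this assumes that each image is summarized by a feature vector and that Euclidean distance is used. The function names are hypothetical and the vectors are toy values.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (assumed representation)."""
    return math.dist(a, b)


def cluster_membership(individual, medoid):
    """O1 (lower is better): distance from the individual to the RCC medoid."""
    return distance(individual, medoid)


def diversity(individual, population):
    """O2 (higher is better): distance to the closest other individual."""
    others = [p for p in population if p is not individual]
    return min(distance(individual, p) for p in others)


# Toy data, for illustration only.
medoid = (0.0, 0.0)
population = [(0.1, 0.0), (0.0, 0.2), (0.1, 0.05)]
for individual in population:
    o1 = cluster_membership(individual, medoid)
    o2 = diversity(individual, population)
    print(individual, round(o1, 3), round(o2, 3))
```

An individual is preferable when it stays close to the medoid (O1) while remaining far from its nearest neighbour in the population (O2).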
11
Simulator-based Explanations for DNN Errors (SEDE)
Real-world
Error-inducing images HUDD
Evolutionary
Algorithms
Simulator
Simulator
images
Configuration
Parameters
Step 1. Identify root-cause clusters (RCCs)
Step 2. Generate images associated with RCCs
RCCs
PaiR
Unsafe images in the RCC
+ simulator parameters
RCC Representative
Images
Step 2.1. Identify RCC Representative Images
Step 2.2. Generate a set of failing images belonging to the cluster
12
Step 2.2. Generate a set of failing images
• Objective: generate failing images, each similar to one reference image in P1
(the population of RCC representative images from Step 2.1)
• to characterize the unsafe space of a cluster
• while leveraging the diversity in the population P1
P1 images
Failing images
RCC
Real unsafe
space
13
Simulator-based Explanations for DNN Errors (SEDE)
Real-world
Error-inducing images HUDD
Evolutionary
Algorithms
Simulator
Simulator
images
Configuration
Parameters
Step 1. Identify root-cause clusters (RCCs)
Step 2. Generate images associated with RCCs
RCCs
Step 2.3. Generate one non-failing image close to each failing image
PaiR
Non-failing images similar to failing
images + simulator parameters
Failing images in the RCC
+ simulator parameters
RCC Representative
Images
Step 2.1. Identify RCC Representative Images
Step 2.2. Generate a set of failing images belonging to the cluster
15
Simulator-based Explanations for DNN Errors (SEDE)
Real-world
Error-inducing images HUDD
Evolutionary
Algorithms
Simulator
Simulator
images
Configuration
Parameters
Safe images similar to unsafe images
+ simulator parameters
Unsafe images in the RCC
+ simulator parameters
Rule Extraction Algorithm (PART)
IF-THEN
Rules
Expressions
Generator
Explanation
Expression
Step 1. Identify root-cause clusters (RCCs)
Step 2. Generate images associated with RCCs
RCCs
Step 2.2. Generate a set of unsafe images belonging to the cluster
Step 2.3. Generate one non-failing image close to each unsafe image
Step 3. Generate expressions that characterize unsafe images
PaiR
RCC Representative
Images
Step 2.1. Identify RCC Representative Images
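Step 3 turns IF-THEN rules into interval expressions over simulator parameters. A minimal sketch of that conversion, assuming a rule is available as a list of (parameter, operator, threshold) conditions. This input format and both function names are hypothetical, and the example rule is made up, with thresholds borrowed from the Example Output slide.

```python
import math


def rule_to_expression(conditions):
    """Merge (parameter, operator, threshold) conditions into per-parameter intervals."""
    bounds = {}
    for name, op, value in conditions:
        low, high = bounds.get(name, (-math.inf, math.inf))
        if op in (">", ">="):
            low = max(low, value)    # keep the tightest lower bound
        elif op in ("<", "<="):
            high = min(high, value)  # keep the tightest upper bound
        else:
            raise ValueError(f"unsupported operator: {op}")
        bounds[name] = (low, high)
    return bounds


def format_expression(bounds):
    """Render intervals in the slide's style (strict vs. non-strict is dropped)."""
    return [f"{low} < {name} < {high}" for name, (low, high) in bounds.items()]


# Made-up rule for an unsafe cluster; not an actual PART output.
rule = [
    ("Head_Hor", ">", -4.25),
    ("Head_Hor", "<=", 36.5),
    ("Head_Vert", ">", 6.4),
    ("Head_Vert", "<=", 21.8),
]
print(format_expression(rule_to_expression(rule)))
# ['-4.25 < Head_Hor < 36.5', '6.4 < Head_Vert < 21.8']
```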
16
Example Output
-4.25 < Head_Hor < 36.5
(Center – Right)
6.4 < Head_Vert < 21.8
(Bottom)
HUDD-RCC
SEDE Failing Images
SEDE Expressions
SEDE Passing Images
18
Simulator-based Explanations for DNN Errors (SEDE)
Real-world
Error-inducing images HUDD
Rule Extraction Algorithm (PART)
IF-THEN
Rules
Expressions
Generator
Explanation
Expression
Configuration
Parameters
Unsafe
Improvement Set
Retrained
DNN
Inputs
Selection
Simulator
Retraining
Best DNN
Step 1. Identify root-cause clusters (RCCs)
Step 2. Generate images associated with RCCs
Step 4. Retrain the DNN (execute 10 times)
RCCs
Step 3. Generate expressions that characterize unsafe images
Evolutionary
Algorithms
Simulator
Simulator
images
Configuration
Parameters
Safe images similar to unsafe images
+ simulator parameters
Unsafe images in the RCC
+ simulator parameters
Step 2.2. Generate a set of unsafe images belonging to the cluster
Step 2.3. Generate one safe image for each unsafe image
PaiR
RCC Representative
Images
Step 2.1. Identify RCC Representative Images
19
Empirical
Evaluation
20
Research Questions
§ RQ1: How does PaiR fare, compared to alternative approaches, for the
generation of diverse images belonging to RCCs?
§ RQ2: Does SEDE generate images that are close to the center of each
RCC?
§ RQ3: Does SEDE generate, for each RCC, a set of images sharing similar
characteristics?
§ Necessary to generate meaningful explanations
§ RQ4: Do the RCC expressions identified by SEDE delimit an unsafe space?
§ RQ5: How does SEDE compare to traditional DNN accuracy improvement
practices?
§ Necessary to evaluate whether the images generated according to our expressions help
improve the accuracy of a DNN when it processes real-world images
21
Two open-source simulators from IEE
IEE Face-Simulator
(13 parameters)
IEE Human-Simulator
(21 parameters)
22
Subjects of the study
• Two head pose detection DNNs
• One trained with IEE Face-Simulator
• One trained with IEE Human-Simulator
• Both fine-tuned with IEE real-world dataset
• One face-landmarks detection DNN
• Trained with IEE Face-Simulator
• Fine-tuned with IEE real-world dataset
IEE-Faces
Generated Image
IEE-Humans
Simulator
IEE Real-world
Dataset
BIWI-Kinect
Dataset
33
RQ4. Do the expressions identified by SEDE delimit an
unsafe space?
• We aim to demonstrate that images matching our expressions
lead to low accuracy
• Experiment design:
• Generate 500 images for each RCC, according to SEDE
expressions
• Compute the percentage of correctly classified images
• Answer RQ4 positively if, for a large number of clusters,
the generated images have an accuracy that is significantly lower than the
accuracy observed with the test set
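The experiment design can be sketched as below. The slide does not state how configurations are drawn from an expression, so uniform sampling within the bounds is an assumption. Rendering the images and running the DNN are left out, and the predictions and labels at the end are toy placeholders. The expression reuses the bounds from the Example Output slide.

```python
import random


def sample_configurations(expression, n, seed=0):
    """Draw n parameter configurations uniformly within the expression bounds."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in expression.items()}
        for _ in range(n)
    ]


def accuracy(predictions, labels):
    """Percentage of correctly classified images."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Bounds from the Example Output slide; 500 images per RCC as in the design.
expression = {"Head_Hor": (-4.25, 36.5), "Head_Vert": (6.4, 21.8)}
configs = sample_configurations(expression, n=500)
print(len(configs))  # 500

# Placeholder predictions and labels standing in for DNN outputs on the
# images that a simulator would render from `configs`.
predictions = ["left", "left", "right", "top"]
labels = ["left", "right", "right", "top"]
print(accuracy(predictions, labels))  # 75.0
```

The accuracy obtained on the generated images would then be compared with the accuracy observed on the test set.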
34
RQ4. Summary Results
Do the RCC expressions identified by SEDE delimit an unsafe space?
-37%
-36%
-17%
35
RQ5. Is it possible to improve the DNN by leveraging the
unsafe expressions identified by SEDE?
• We aim to determine if images matching our expressions
may improve the accuracy of the DNN
• Experiment design:
• Retrain the DNN using 500 generated images per cluster
matching our expressions
• Measure the overall improvement of the DNN’s accuracy on the
test set
• Compare with HUDD and a random baseline
• Repeat the experiment 10 times
37
RQ5. Results
How does SEDE compare to traditional DNN accuracy improvement practices?
DNN     Original   Accuracy after retraining   SEDE gain over   Stat. sign.
        accuracy   SEDE     HUDD     RBL       best baseline    p-value   A12
FLD     80.06%     86.14%   79.94%   77.41%    +6.19%           1e-4      1.0
HPD-F   51.65%     56.15%   45.80%   44.33%    +10.35%          4e-4      0.94
HPD-H   51.03%     69.68%   60.65%   55.57%    +9.03%           1e-4      1.0
FLD: face-landmarks detection DNN. HPD-F, HPD-H: head pose detection DNNs trained with IEE Face-Simulator and IEE Human-Simulator. RBL: random baseline.
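The last columns of the table can be reproduced with a short sketch. The gain is taken to be the SEDE accuracy minus the best of the two baselines, which matches the HPD-F and HPD-H rows; for FLD the rounded table values give 6.20 rather than the reported 6.19, presumably because the slide used unrounded accuracies. A12 is the Vargha-Delaney effect size. The per-run accuracies below are toy values, since the slide only reports aggregates.

```python
def gain_over_best_baseline(sede, baselines):
    """Accuracy gain (percentage points) of SEDE over the best baseline."""
    return round(sede - max(baselines), 2)


def a12(xs, ys):
    """Vargha-Delaney A12: probability that a value from xs exceeds one from ys.

    Ties count as half. 0.5 means no difference; 1.0 means xs always wins.
    """
    greater = sum(x > y for x in xs for y in ys)
    ties = sum(x == y for x in xs for y in ys)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


print(gain_over_best_baseline(56.15, [45.80, 44.33]))  # HPD-F row: 10.35
print(gain_over_best_baseline(69.68, [60.65, 55.57]))  # HPD-H row: 9.03

# Toy per-run accuracies, for illustration only.
sede_runs = [86.0, 86.3, 85.9]
baseline_runs = [79.8, 80.1, 79.9]
print(a12(sede_runs, baseline_runs))  # 1.0
```

An A12 of 1.0, as reported for FLD and HPD-H, means that every SEDE run outperformed every run of the compared baseline.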
38
Can we automatically generate
explanations for DNN failures
as expressions
constraining simulator parameters?
SEDE. Full Approach
Research Questions
• RQ1: How does PaiR fare, compared to alternative approaches, for the
generation of diverse images belonging to RCCs?
• RQ2: Does SEDE generate images that are close to the center of each
RCC?
• Necessary to generate explanations that are related to each RCC
• RQ3: Does SEDE generate, for each RCC, a set of images sharing similar
characteristics?
• Necessary to generate meaningful explanations
• RQ4: Do the RCC expressions identified by SEDE delimit an unsafe space?
• Necessary to evaluate the effectiveness of our explanations
• RQ5: How does SEDE compare to traditional DNN accuracy improvement
practices?
• Necessary to evaluate whether the images generated according to our expressions can address
problems concerning real-world scenarios, compared to the state of the art
SEDE. RQs
https://github.com/SNTSVV/SEDE
https://doi.org/10.6084/m9.figshare.19467401
39
Simulator-based Explanation and Debugging of
Hazard-triggering Events
in DNN-based Safety-critical Systems
https://dl.acm.org/doi/10.1145/3569935
ACM Transactions on Software Engineering & Methodology (TOSEM)
Hazem Fahmy 1, Fabrizio Pastore 1, Lionel Briand 1,2, Thomas Stifter 3
1 University of Luxembourg, 2 University of Ottawa, 3 IEE S.A.
