Software Verification & Validation (SVV) Lab, University of Luxembourg
Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study
Fitash Ul Haq, Donghwan Shin, Shiva Nejati, and Lionel Briand
2020-10-25
Introduction
• Deep Neural Networks (DNNs) help accurately automate real-world tasks such as speech recognition and image classification
• DNNs are increasingly used in safety-critical autonomous systems, such as Automated Driving Systems (ADS)
• Ensuring the safety and reliability of DNN-based systems has therefore become a fundamental problem
Existing Testing Approaches
• Many DNN testing approaches have been proposed recently
• Two distinct modes of testing:
• Offline testing
• Online testing
Offline Testing
• Testing DNNs as stand-alone components
• DNNs are tested using (historical) data in an open-loop mode
[Figure: offline testing. Test data (image, label) is fed to the DNN; the prediction error is the difference between the DNN's prediction and the label]
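The open-loop procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's tooling: `offline_test`, the toy model, and the toy data are hypothetical, and each "image" is reduced to a single number. The point is that every (image, label) pair is evaluated independently, so a prediction never influences the next input.

```python
from typing import Callable, List, Sequence, Tuple

Image = float  # hypothetical stand-in for a camera image


def offline_test(predict: Callable[[Image], float],
                 test_data: Sequence[Tuple[Image, float]]) -> List[float]:
    """Open-loop offline testing: return the absolute prediction error for each test item.

    Each (image, label) pair is evaluated in isolation. The model's output is
    never fed back into the next input, so errors cannot accumulate.
    """
    errors = []
    for image, label in test_data:
        prediction = predict(image)  # DNN output for this image only
        errors.append(abs(prediction - label))
    return errors


# Hypothetical stand-ins: the "model" returns the image value plus a small bias.
toy_model = lambda image: image + 0.02
toy_data = [(0.10, 0.10), (-0.05, -0.05), (0.30, 0.25)]
print(offline_test(toy_model, toy_data))
```

The per-item errors can then be aggregated into a single prediction-error metric such as MAE or RMSE.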
Online Testing
• Testing DNNs embedded into a specific application
• DNNs are tested when embedded into an application environment in a
closed-loop mode
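[Figure: online testing. The DNN is embedded in a (virtual) ego car within an application environment whose mobile objects move over time; the DNN receives images, its predictions control the ego car, and safety violations are observed]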
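To illustrate why the closed loop matters, here is a deliberately simplified toy (our own assumption, not the simulator used in the study). The "image" is replaced by a pair (road curvature, lateral offset), and the ego car's lateral offset is updated from the model's steering command, so the model's next input depends on its previous outputs. All names and dynamics are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a camera image: (road curvature ahead, current lateral offset).
Observation = Tuple[float, float]


def online_test(predict: Callable[[Observation], float],
                curvatures: Sequence[float],
                gain: float = 1.0) -> float:
    """Closed-loop online testing in a toy one-dimensional lane model.

    At every step the model observes a state that depends on where its own
    earlier commands put the car, and the car's lateral offset is updated
    from the commanded steering. Returns the maximum absolute lateral offset
    reached during the run (a toy analogue of a lane-departure measure).
    """
    offset = 0.0
    max_offset = 0.0
    for curvature in curvatures:
        steering = predict((curvature, offset))
        # Any mismatch between the command and what the road requires moves the car sideways.
        offset += gain * (steering - curvature)
        max_offset = max(max_offset, abs(offset))
    return max_offset


ideal = lambda obs: obs[0] - 0.1 * obs[1]  # hypothetical reference policy: follow the road, recenter slowly
biased = lambda obs: ideal(obs) + 0.05     # same policy with a constant per-image error of 0.05
road = [0.0] * 100                         # hypothetical straight road, 100 steps
print(online_test(ideal, road))            # 0.0: the car stays on the lane center
print(online_test(biased, road))           # approaches 0.05 / 0.1 = 0.5
```

In this toy, the biased policy's error on any single observation is only 0.05, yet its lateral offset drifts toward ten times that value, because each error shifts the state that the next prediction is made from.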
Offline Testing vs. Online Testing?
• To date, offline testing has been studied far more than online testing
• There is limited insight into how these two DNN testing approaches compare with one another
• Do large prediction errors identified by offline testing always lead to safety violations detectable by online testing?
• Do the safety violations identified by online testing translate into large prediction errors in offline testing?
RQ1: How do offline and online testing results differ and complement each other?
Real-world vs. Simulated Data?
• Testing DNNs embedded into real and operational environments is
often very expensive, dangerous, and time-consuming
• To answer RQ1, we can rely on high-fidelity simulators that allow us
to specify and execute scenarios capturing various situations
• However, we do not know whether simulator-generated data are a reliable substitute for real-world data for the purpose of DNN testing
RQ0: Can we use simulator-generated data as a reliable substitute for real-world data for the purpose of DNN testing?
DNNs in ADS
• Although the investigated questions are relevant to all autonomous systems, this study focuses on DNNs in the context of ADS
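[Figure: DNNs in an ADS. Sensors such as a camera and lidar observe the environment; the DNN maps sensor data to actions such as the steering angle and brake & accelerate commands, which act on the environment and close a feedback loop]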
Offline Testing for ADS-DNN
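[Figure: offline testing for ADS-DNN. Test data, recorded either from a real car driven by a human driver or generated by a simulator from a domain model, is fed to the DNN, which produces predictions]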
Online Testing for ADS-DNN
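[Figure: online testing for ADS-DNN. A domain model configures the simulator; the simulator sends images to the DNN, the DNN returns steering angles, and the simulator updates the behaviors of the ego car and mobile objects over time]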
Domain Model for Simulator
• Captures the test input space
• Is based on the features observed in real-world datasets
• Each entity has multiple variables
• Additional constraints describe valid value assignments to the variables
• A (test) simulation scenario is determined by a vector of values assigned to the variables
[Figure: excerpt of the domain model]
Scenario
• Weather: type {sunny, fog, rainy, snowy}; visibility {low, medium, high}
• Road: type {straight, curve, spiral}; direction {left, right}; length {25, 50, 75, 100}; curveRadius {20, 30, …, 60}; numLanes {1, 2, 3}; …
• Car: speed {10, 20, …, 100}; oppositeLane, headlight, highBeam, foglight, infrontEgoCar: Boolean
• Environment: trees: Boolean; …
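A domain model like this can be encoded as a mapping from variables to allowed values, with a scenario being one value per variable. The sketch below is a hypothetical encoding of part of the excerpt; the slides do not list the actual constraints, so the single constraint used here (high beams require headlights) is invented purely for illustration. Random generation uses rejection sampling, and the optional `restrict` argument narrows variables to a subset of their values, e.g. sunny weather only.

```python
import random
from typing import Dict, List, Optional

# Hypothetical encoding of part of the domain model: variable name -> allowed values.
DOMAIN: Dict[str, List[object]] = {
    "weather.type": ["sunny", "fog", "rainy", "snowy"],
    "weather.visibility": ["low", "medium", "high"],
    "road.type": ["straight", "curve", "spiral"],
    "road.direction": ["left", "right"],
    "road.length": [25, 50, 75, 100],
    "road.curveRadius": list(range(20, 61, 10)),
    "road.numLanes": [1, 2, 3],
    "car.speed": list(range(10, 101, 10)),
    "car.headlight": [True, False],
    "car.highBeam": [True, False],
}


def is_valid(scenario: Dict[str, object]) -> bool:
    """Hypothetical constraint for illustration: high beams require the headlights to be on."""
    return not (scenario["car.highBeam"] and not scenario["car.headlight"])


def sample_scenario(rng: random.Random,
                    restrict: Optional[Dict[str, List[object]]] = None) -> Dict[str, object]:
    """Draw one scenario (a value for every variable) by rejection sampling.

    `restrict` replaces the allowed values of selected variables,
    e.g. {"weather.type": ["sunny"]} to mimic a restricted domain model.
    """
    domain = dict(DOMAIN)
    if restrict:
        domain.update(restrict)
    while True:
        scenario = {name: rng.choice(values) for name, values in domain.items()}
        if is_valid(scenario):  # keep only assignments that satisfy the constraints
            return scenario


rng = random.Random(0)
scenario = sample_scenario(rng, restrict={"weather.type": ["sunny"]})
print(scenario)
```

Rejection sampling is the simplest choice when constraints are few and most random assignments are valid; with tighter constraints, a constraint solver would be the more efficient option.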
Research Questions
• RQ0: Can we use simulator-generated data as a reliable alternative
source to real-world data?
• We configure the simulator to generate a dataset that resembles the
characteristics of a real-life dataset, and then compare the offline
testing results for these datasets
• RQ1: How do offline and online testing results differ and complement
each other?
• For the same simulator-generated datasets, we compare the offline
and online testing results
Subject DNN Models
• Two publicly available, widely used, pre-trained DNN-based steering angle prediction models: Autumn and Chauffeur
• Autumn consists of an image preprocessing module that computes the optical flow and a Convolutional Neural Network (CNN) that predicts steering angles
• Chauffeur consists of a CNN that extracts image features and a Recurrent Neural Network (RNN) that predicts steering angles from the previous 100 consecutive images
Real-world Dataset
• Sequences of [image, steering angle] pairs from the Udacity Challenge
[Figure: (a fragment of) the training data, and the actual steering angle (deg/25) vs. image ID for the testing data, i.e., 5614 labeled images for testing]
Prediction Errors
• Prediction errors of the DNN models on the real-world test dataset
• Prediction error is measured with two well-known metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE)
• The models are reasonably accurate on the real-world test dataset
Model       Reported RMSE   Our RMSE   Our MAE
Autumn      Not reported    0.049      0.034
Chauffeur   0.058           0.092      0.055
(Steering angles are in deg/25; e.g., Chauffeur's MAE of 0.055 means 1.375° on average)
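For reference, both metrics and the deg/25 unit conversion can be written out as below. This is a plain sketch of the standard MAE and RMSE formulas, not the authors' evaluation scripts; the function names and example values are hypothetical.

```python
import math
from typing import Sequence

DEG_PER_UNIT = 25.0  # angles are stored as deg/25, so multiply by 25 to get degrees


def _check(predicted: Sequence[float], actual: Sequence[float]) -> int:
    """Validate inputs and return their common length."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    return len(actual)


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Error: average of |prediction - label|."""
    n = _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the average squared error."""
    n = _check(predicted, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def to_degrees(value: float) -> float:
    """Convert a steering angle or error expressed in deg/25 into degrees."""
    return value * DEG_PER_UNIT


print(round(to_degrees(0.055), 3))  # 1.375, the Chauffeur MAE in degrees
```

RMSE is always at least as large as MAE and weighs large errors more heavily, which may explain why Chauffeur's RMSE (0.092) is noticeably larger than its MAE (0.055).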
RQ0: Overview
[Figure: RQ0 overview. The same DNN-based model is run on a real-world dataset (RD) and on a comparable simulator-generated dataset (SD), and MAE(RD) is compared with MAE(SD)]
RQ0: Replicate Real-world Dataset
• It is infeasible to generate an SD with exactly the same environmental properties and vehicle dynamics as in RD
• Instead, we say that an SD is comparable with a subsequence of RD if:
• the images have the same features (e.g., sunny weather)
• the steering angle difference per image is small enough on average
• We propose a two-step heuristic to generate SDs that are comparable with subsequences of RD
RQ0: Two-Step Heuristic (1/2)
• Step 1: Randomly generate SDs based on a domain model restricted
to the features observed in RD
• For example, the restricted domain model includes only sunny weather
since the test dataset has only sunny images
• This makes the simulator produce images that resemble those in the test dataset, to the extent possible
RQ0: Two-Step Heuristic (2/2)
• Step 2: For each SD, identify a comparable subsequence of RD
considering steering angles
• We obtain comparable dataset pairs with small-enough steering angle
differences
[Figure: step 2. The simulator-generated steering angles are searched against the human-generated steering angles of the real-world dataset to find a comparable subsequence whose difference is minimal (less than a small threshold)]
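The slides do not specify the exact search procedure or threshold, so the following is only a plausible sketch under stated assumptions: an exhaustive sliding-window search over the real-world steering angles, using the mean absolute per-image steering-angle difference as the distance. The function name, threshold, and example angles are hypothetical. The "same features" criterion is assumed to be handled by Step 1, not by this search.

```python
from typing import Optional, Sequence, Tuple


def find_comparable_subsequence(simulated: Sequence[float],
                                real: Sequence[float],
                                threshold: float) -> Optional[Tuple[int, float]]:
    """Exhaustive sliding-window search for the best-matching subsequence of `real`.

    Returns (start index in `real`, mean absolute steering-angle difference)
    for the window of len(simulated) consecutive real angles that is closest
    to `simulated`, or None if no window is closer than `threshold`.
    """
    n = len(simulated)
    if n == 0 or n > len(real):
        return None
    best_start, best_diff = 0, float("inf")
    for start in range(len(real) - n + 1):
        window = real[start:start + n]
        diff = sum(abs(s - r) for s, r in zip(simulated, window)) / n
        if diff < best_diff:
            best_start, best_diff = start, diff
    return (best_start, best_diff) if best_diff < threshold else None


real_angles = [0.0, 0.1, 0.3, 0.2, 0.0, -0.1]  # hypothetical human steering angles (deg/25)
sim_angles = [0.29, 0.21, 0.01]                 # hypothetical simulator steering angles (deg/25)
print(find_comparable_subsequence(sim_angles, real_angles, threshold=0.05))
```

The search costs O(len(real) × len(simulated)) per simulator-generated dataset, which is acceptable for test sets of a few thousand images.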
RQ0: Results (1/2)
• We identified 92 simulator-generated datasets that match subsequences of the Udacity real-world test dataset
• One of the comparable pairs is shown below:
[Figure: one comparable pair. Panels: steering angles (actual steering angle in deg/25 vs. image ID, real (human) vs. simulated) and images (real-world vs. simulator-generated)]
RQ0: Results (2/2)
• Distributions of MAE differences, i.e., |MAE(r) - MAE(s)|, where r and s are comparable real-world and simulator-generated datasets
[Figure: distributions of MAE differences for Autumn and Chauffeur, with a reference line at 0.10, i.e., 2.5° on average]
• For Autumn, 96.7% of the comparable pairs have an MAE difference below 0.1
• For Chauffeur, 68.5% of the comparable pairs have an MAE difference below 0.1
• Even when the MAE difference is larger than 0.1, MAE(s) is always greater than MAE(r)
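The two statistics reported above can be computed from a list of (MAE(r), MAE(s)) pairs as sketched below. The function names and the sample values are hypothetical and are not the study's data.

```python
from typing import Sequence, Tuple


def share_below(pairs: Sequence[Tuple[float, float]], threshold: float = 0.1) -> float:
    """Fraction of comparable pairs whose MAE difference |MAE(r) - MAE(s)| is below `threshold`."""
    if not pairs:
        raise ValueError("no comparable pairs given")
    below = sum(1 for mae_r, mae_s in pairs if abs(mae_r - mae_s) < threshold)
    return below / len(pairs)


def sim_always_worse(pairs: Sequence[Tuple[float, float]], threshold: float = 0.1) -> bool:
    """True if every pair with an MAE difference of at least `threshold` has MAE(s) > MAE(r)."""
    return all(mae_s > mae_r for mae_r, mae_s in pairs if abs(mae_r - mae_s) >= threshold)


# Hypothetical (MAE(r), MAE(s)) values in deg/25; not the study's data.
pairs = [(0.03, 0.05), (0.04, 0.20), (0.05, 0.06), (0.02, 0.03)]
print(share_below(pairs), sim_always_worse(pairs))
```

When `sim_always_worse` holds, large discrepancies err on the conservative side: the simulator-generated data make the model look worse, not better, than the real-world data.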
RQ0: Implications
• The prediction error differences between simulator-generated
datasets and real-life datasets are less than 0.1, on average, for both
Autumn and Chauffeur
• We can use simulator-generated datasets as a reliable alternative to
real-world datasets for testing these DNNs
RQ1: Setup (1/2)
• We randomly generate 50 scenarios and compare the offline and
online testing results for each of the simulator-generated datasets
• For offline testing, we use the MAE metric (i.e., prediction error)
• For online testing, we use the Maximum Distance from Center of Lane
(MDCL) metric to measure the lane departure degree (i.e., safety
violation)
• However, MAE and MDCL values cannot be compared directly, since they measure different quantities
RQ1: Setup (2/2)
• To determine whether the offline and online testing results are
consistent or not, we set threshold values for MAE and MDCL
• We interpret the offline testing result as acceptable if MAE < 0.1
(meaning the average prediction error < 2.5°)
• We interpret the online testing result as acceptable if MDCL < 1
(meaning the maximum departure < one meter)
• If the offline and online testing results are both acceptable or both unacceptable, we say that offline and online testing agree
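The agreement rule can be sketched as follows, assuming each scenario yields one MAE value (in deg/25) from offline testing and a trace of the ego car's distance from the lane center (in meters) from online testing. The function names, bucket labels, and sample numbers are hypothetical; only the two thresholds come from the setup above.

```python
from typing import Dict, Sequence, Tuple

MAE_THRESHOLD = 0.1   # deg/25, i.e., an average prediction error of 2.5 degrees
MDCL_THRESHOLD = 1.0  # meters from the center of the lane


def mdcl(distances_from_center: Sequence[float]) -> float:
    """Maximum Distance from Center of Lane over one simulation run (meters)."""
    return max(abs(d) for d in distances_from_center)


def summarize(results: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Share of scenarios in each (offline, online) acceptability combination.

    Each result is (MAE of the offline run, MDCL of the online run) for one scenario.
    """
    if not results:
        raise ValueError("no scenarios given")
    counts = {"agree: both acceptable": 0, "agree: both unacceptable": 0,
              "disagree: only offline acceptable": 0, "disagree: only online acceptable": 0}
    for mae, dist in results:
        offline_ok = mae < MAE_THRESHOLD
        online_ok = dist < MDCL_THRESHOLD
        if offline_ok and online_ok:
            counts["agree: both acceptable"] += 1
        elif not offline_ok and not online_ok:
            counts["agree: both unacceptable"] += 1
        elif offline_ok:
            counts["disagree: only offline acceptable"] += 1
        else:
            counts["disagree: only online acceptable"] += 1
    return {label: count / len(results) for label, count in counts.items()}


# Hypothetical per-scenario results; not the study's data.
results = [(0.05, mdcl([0.1, -0.4, 0.2])), (0.06, mdcl([0.5, 1.7])),
           (0.20, mdcl([-2.3, 0.8])), (0.08, 0.9)]
print(summarize(results))
```

The "only offline acceptable" bucket collects the scenarios in which offline testing is more optimistic than online testing: the prediction error looks acceptable, yet the car departs from its lane.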
RQ1: Results (1/2)
• Comparison between offline and online testing results
[Figure: MAE (x-axis) vs. MDCL (y-axis) per scenario for Autumn and Chauffeur, split into quadrants by the two thresholds; the quadrant percentages shown are 44%, 0%, 48%, and 8% for Autumn, and 34%, 0%, 48%, and 18% for Chauffeur]
RQ1: Results (2/2)
• One of the scenarios on which offline and online testing disagreed
[Figure: prediction error (deg) vs. image ID in this scenario, for the offline testing result and the online testing result]
RQ1: Implications
• Offline and online testing results differ in many cases
• Offline testing is more optimistic than online testing because it cannot observe the accumulation of errors over time: in the closed loop, each prediction affects the images the DNN receives next
• Online testing is preferable to offline testing for ADS-DNNs
Conclusion
• We showed that simulator-generated datasets yield DNN prediction errors similar to those obtained with real-world datasets
• We also found that many safety violations identified by online testing were not detected by offline testing
• As part of future work, we plan to investigate how to improve the performance of DNN-based ADS using offline and online testing results
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 

Recently uploaded (20)

OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 

Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study

  • 1. Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study
      • Fitash Ul Haq, Donghwan Shin, Shiva Nejati, and Lionel Briand
      • Software Verification & Validation, 2020-10-25
  • 2. Introduction
      • Deep Neural Networks (DNNs) can accurately automate real-world tasks such as speech recognition and image classification
      • DNNs are increasingly used in safety-critical autonomous systems, such as Automated Driving Systems (ADS)
      • Ensuring the safety and reliability of DNN-based systems has therefore become a fundamental problem
  • 3. Existing Testing Approaches
      • Many DNN testing approaches have been proposed recently
      • They fall into two distinct modes: offline testing and online testing
  • 4. Offline Testing
      • DNNs are tested as stand-alone components
      • DNNs are tested on (historical) data in an open-loop mode
      • [Figure: each test image is fed to the DNN, and its prediction is compared with the image's label to obtain the prediction error]
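The open-loop procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the DNN is any callable that maps an image to a predicted steering value and the test data is a list of (image, label) pairs; `offline_test` and the halving stand-in model are hypothetical. The key property is that each prediction is scored on its own and never influences the next input.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Image = TypeVar("Image")

def offline_test(
    model: Callable[[Image], float],
    test_data: Sequence[Tuple[Image, float]],
) -> list[float]:
    """Open-loop testing: feed each recorded image to the model and return
    the absolute prediction error against the image's recorded label."""
    errors = []
    for image, label in test_data:
        prediction = model(image)  # the prediction is not fed back anywhere
        errors.append(abs(prediction - label))
    return errors

# Hypothetical stand-in "DNN": images are plain floats and the model halves them.
data = [(0.2, 0.1), (0.4, 0.25), (-0.6, -0.3)]
print(offline_test(lambda x: x / 2, data))
```

Because the recorded images are fixed, the order of evaluation does not matter, which is exactly what separates this mode from the closed-loop mode on the next slide.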
  • 5. Online Testing
      • DNNs are tested while embedded in a specific application
      • DNNs are tested within an application environment in a closed-loop mode
      • [Figure: the DNN is embedded in a (virtual) ego car; the application environment, including mobile objects over time, supplies images; the DNN's predictions drive the car, and the outcome checked is whether a safety violation occurs]
  • 6. Offline Testing vs. Online Testing?
      • To date, offline testing has been studied far more than online testing
      • There is limited insight into how the two DNN testing approaches compare with one another
      • Do large prediction errors identified by offline testing always lead to safety violations detectable by online testing?
      • Do the safety violations identified by online testing translate into large prediction errors in offline testing?
      • RQ1: How do offline and online testing results differ and complement each other?
  • 7. Real-world vs. Simulated Data?
      • Testing DNNs embedded in real, operational environments is often very expensive, dangerous, and time-consuming
      • To answer RQ1, we can instead rely on high-fidelity simulators that let us specify and execute scenarios capturing various situations
      • However, we do not know whether simulator-generated data are a reliable substitute for real-world data for the purpose of DNN testing
      • RQ0: Can we use simulator-generated data as a reliable substitute for real-world data for the purpose of DNN testing?
  • 8. DNNs in ADS
      • Although the investigated questions are relevant to all autonomous systems, this study focuses on DNNs in the context of ADS
      • [Figure: the ADS observes the environment through sensors (camera, lidar, …); its DNNs output actions such as the steering angle and brake & accelerate commands; the environment returns feedback]
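  • 9. Offline Testing for ADS-DNN
      • [Figure: test data come either from a real car driven by a human driver or from a simulator configured by a domain model; the DNN produces predictions for the test data]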
  • 10. Online Testing for ADS-DNN
      • [Figure: a simulator configured by a domain model sends images to the DNN, which returns steering angles to the simulator; the result is the behavior of the ego car and mobile objects over time]
  • 11. Domain Model for Simulator
      • Captures the test input space
      • Based on the features observed in real-world datasets
      • Each entity has multiple variables
      • Additional constraints describe valid value assignments to the variables
      • A (test) simulation scenario is determined by a vector of values assigned to the variables
      • [Figure: domain model]
          Scenario
          Weather: type {sunny, fog, rainy, snowy}; visibility {low, medium, high}
          Road: type {straight, curve, spiral}; direction {left, right}; length {25, 50, 75, 100}; curveRadius {20, 30, …, 60}; numLanes {1, 2, 3}; …
          Car: speed {10, 20, …, 100}; oppositeLane, headlight, highBeam, foglight, infrontEgoCar: Boolean
          Environment: trees: Boolean; …
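To make "a scenario is a vector of values assigned to the variables" concrete, here is a minimal Python sketch of random scenario generation from this domain model. It assumes entities are flattened into dotted variable names (a hypothetical naming scheme), omits the "…" variables, and uses rejection sampling to honor constraints. The single constraint shown (high beams require headlights) is purely an illustrative assumption; the slide does not list the actual constraints.

```python
import random
from typing import Any, Callable

Scenario = dict[str, Any]

# Variable domains from the slide; the elided "…" variables are left out.
DOMAIN: dict[str, list[Any]] = {
    "weather.type": ["sunny", "fog", "rainy", "snowy"],
    "weather.visibility": ["low", "medium", "high"],
    "road.type": ["straight", "curve", "spiral"],
    "road.direction": ["left", "right"],
    "road.length": [25, 50, 75, 100],
    "road.curveRadius": list(range(20, 61, 10)),
    "road.numLanes": [1, 2, 3],
    "car.speed": list(range(10, 101, 10)),
    "car.oppositeLane": [True, False],
    "car.headlight": [True, False],
    "car.highBeam": [True, False],
    "car.foglight": [True, False],
    "car.infrontEgoCar": [True, False],
    "environment.trees": [True, False],
}

# Hypothetical constraint, for illustration only (not taken from the study).
CONSTRAINTS: list[Callable[[Scenario], bool]] = [
    lambda s: s["car.headlight"] or not s["car.highBeam"],
]

def sample_scenario(
    rng: random.Random,
    domain: dict[str, list[Any]] = DOMAIN,
    constraints: list[Callable[[Scenario], bool]] = CONSTRAINTS,
    max_tries: int = 1000,
) -> Scenario:
    """Rejection sampling: draw each variable uniformly from its domain and
    return the first assignment that satisfies every constraint."""
    for _ in range(max_tries):
        scenario = {var: rng.choice(values) for var, values in domain.items()}
        if all(check(scenario) for check in constraints):
            return scenario
    raise RuntimeError("no valid scenario found; constraints may be unsatisfiable")

# Restricting a domain (as in Step 1 of the RQ0 heuristic) is just a narrower value list.
sunny_only = {**DOMAIN, "weather.type": ["sunny"]}
print(sample_scenario(random.Random(42), domain=sunny_only))
```

Rejection sampling is the simplest choice here; with many tight constraints a constraint solver would waste fewer draws.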
  • 12. Research Questions
      • RQ0: Can we use simulator-generated data as a reliable alternative source to real-world data?
          • We configure the simulator to generate a dataset that resembles the characteristics of a real-life dataset, and then compare the offline testing results for these datasets
      • RQ1: How do offline and online testing results differ and complement each other?
          • For the same simulator-generated datasets, we compare the offline and online testing results
  • 13. Subject DNN Models
      • Two publicly available, widely used, pre-trained DNN-based steering angle prediction models: Autumn and Chauffeur
      • Autumn consists of an image preprocessing module that computes the optical flow and a Convolutional Neural Network (CNN) that predicts steering angles
      • Chauffeur consists of a CNN that extracts image features and a Recurrent Neural Network (RNN) that predicts steering angles from the previous 100 consecutive images
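Chauffeur's "previous 100 consecutive images" implies a sliding window of per-frame features. The Python sketch below shows that data flow with a bounded deque. It is a hypothetical wrapper, not Chauffeur's code: `extract` stands in for the CNN, `predict_seq` stands in for the RNN, and returning `None` until the window is full is our own assumption, since the slide does not say how the first frames are handled.

```python
from collections import deque
from typing import Any, Callable, List, Optional, Sequence

class SequenceSteeringModel:
    """Keeps the features of the last `window` frames and asks a sequence
    model for a steering angle once the window is full."""

    def __init__(
        self,
        extract: Callable[[Any], Sequence[float]],
        predict_seq: Callable[[List[Sequence[float]]], float],
        window: int = 100,
    ) -> None:
        if window < 1:
            raise ValueError("window must be positive")
        self.extract = extract          # stand-in for the feature-extracting CNN
        self.predict_seq = predict_seq  # stand-in for the steering-predicting RNN
        self.buffer: deque = deque(maxlen=window)

    def step(self, image: Any) -> Optional[float]:
        self.buffer.append(self.extract(image))  # the oldest frame drops out automatically
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough history yet (assumed behavior)
        return self.predict_seq(list(self.buffer))

# Toy usage with a window of 3: the "RNN" averages the buffered features.
model = SequenceSteeringModel(
    extract=lambda img: [float(img)],
    predict_seq=lambda feats: sum(f[0] for f in feats) / len(feats),
    window=3,
)
print([model.step(i) for i in [1, 2, 3, 4]])  # [None, None, 2.0, 3.0]
```

The same structure matters for online testing: in a closed loop, the window fills with images the model's own earlier steering helped produce.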
  • 14. Real-world Dataset
      • Sequences of [image, steering angle] pairs from the Udacity Challenge
      • The testing data contain 5,614 labeled images
      • [Figure: (a fragment of) the training data, and the actual steering angle (deg/25) plotted against image ID for the testing data]
  • 15. Prediction Errors
      • Prediction errors of the DNN models on the real-world testing dataset
      • The prediction error is computed with two well-known metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE)
      • The models are reasonably accurate on the real-world test dataset
          Model       Reported RMSE   Our RMSE   Our MAE
          Autumn      Not reported    0.049      0.034
          Chauffeur   0.058           0.092      0.055
      • Since steering angles are expressed in units of 25° (deg/25), Chauffeur's MAE of 0.055 means an error of 1.375° on average
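The two metrics and the degree conversion can be written out in Python as below. This is a sketch, assuming predictions and labels are equal-length sequences in the dataset's deg/25 units; `DEG_PER_UNIT` and the helper names are our own.

```python
import math
from typing import Sequence

DEG_PER_UNIT = 25.0  # assumption taken from the "deg/25" axis: 1.0 unit = 25 degrees

def _check(pred: Sequence[float], actual: Sequence[float]) -> None:
    if len(pred) != len(actual) or not pred:
        raise ValueError("need two non-empty sequences of equal length")

def mae(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Error: the average size of the per-image error."""
    _check(pred, actual)
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root Mean Square Error: weights large per-image errors more heavily."""
    _check(pred, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def to_degrees(value: float) -> float:
    return value * DEG_PER_UNIT

print(to_degrees(0.055))  # Chauffeur's MAE, roughly 1.375 degrees
print(to_degrees(0.1))    # the 0.1 threshold used later, roughly 2.5 degrees
```

RMSE is always at least as large as MAE on the same data, which is consistent with each model's RMSE exceeding its MAE in the table.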
  • 16. RQ0: Overview
      • [Figure: a real-world dataset (RD) and a comparable simulator-generated dataset (SD) are both fed to the DNN-based model, and MAE(RD) is compared with MAE(SD)]
  • 17. RQ0: Replicate Real-world Dataset
      • It is infeasible to generate an SD with exactly the same environmental properties and vehicle dynamics as the RD
      • Instead, we say an SD is comparable with a subsequence of the RD if:
          • the images have the same features (e.g., sunny weather), and
          • the steering angle difference per image is small enough on average
      • We propose a two-step heuristic to generate SDs that are comparable with subsequences of the RD
  • 18. RQ0: Two-Step Heuristic (1/2)
      • Step 1: Randomly generate SDs based on a domain model restricted to the features observed in the RD
          • For example, the restricted domain model includes only sunny weather, since the test dataset contains only sunny images
          • This lets us steer the simulator to resemble the characteristics of the images in the test dataset, to the extent possible
  • 19. RQ0: Two-Step Heuristic (2/2)
      • Step 2: For each SD, identify a comparable subsequence of the RD based on steering angles
      • This yields comparable dataset pairs with small-enough steering angle differences
      • [Figure: the simulator-generated steering angles are searched against the human-generated steering angles of the real-world dataset; the subsequence with the minimal difference (below a small threshold) is the comparable subsequence]
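Step 2 can be read as a sliding-window search. The Python sketch below is one possible reading, not the authors' implementation. It assumes "small enough on average" means the mean absolute per-image steering difference, and it treats `threshold` as a free parameter because the slides do not give its value. The brute-force scan costs O(n·m), which is cheap at the scale of a 5,614-image test set.

```python
import math
from typing import Optional, Sequence, Tuple

def find_comparable_subsequence(
    sim_angles: Sequence[float],
    real_angles: Sequence[float],
    threshold: float,
) -> Optional[Tuple[int, float]]:
    """Slide the simulator-generated steering sequence over the real one and
    return (start index, mean absolute difference) of the best-matching
    window, or None if even the best window is not below `threshold`."""
    m, n = len(sim_angles), len(real_angles)
    if m == 0 or m > n:
        return None
    best_start, best_diff = -1, math.inf
    for start in range(n - m + 1):
        window = real_angles[start:start + m]
        diff = sum(abs(s - r) for s, r in zip(sim_angles, window)) / m
        if diff < best_diff:
            best_start, best_diff = start, diff
    return (best_start, best_diff) if best_diff < threshold else None

# Toy data (not from the study): the simulated run matches real frames 2..4 exactly.
real = [0.0, 0.1, 0.3, 0.2, 0.0, -0.1]
print(find_comparable_subsequence([0.3, 0.2, 0.0], real, threshold=0.01))
```

An SD whose best window still exceeds the threshold is discarded, which is how the heuristic ends up with only a subset of generated datasets as comparable pairs.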
  • 20. RQ0: Results (1/2)
      • We identified 92 simulator-generated datasets that match subsequences of the Udacity real-life test dataset
      • One of the comparable pairs is shown in the figure
      • [Figure: real-world and simulator-generated images of the pair, and the actual steering angle (deg/25) vs. image ID for the real (human) and simulated steering angles]
  • 21. RQ0: Results (2/2)
      • Distributions of the MAE differences, i.e., |MAE(r) − MAE(s)|, where r and s are comparable real-world and simulator-generated datasets
      • [Figure: distributions of MAE differences for Autumn and Chauffeur, with a reference line at 0.10, i.e., 2.5° on average]
      • For Autumn, 96.7% of the comparable pairs have an MAE difference below 0.1
      • For Chauffeur, 68.5% of the comparable pairs have an MAE difference below 0.1
      • Whenever the MAE difference exceeds 0.1, MAE(s) is greater than MAE(r)
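The percentages on this slide are shares of comparable pairs whose MAE difference falls below 0.1. Here is a short Python sketch of that arithmetic, plus a check for the "MAE(s) is greater" observation. The helper names are hypothetical, and the MAE pairs in the usage lines are made up for illustration; they are not the study's data.

```python
from typing import Sequence, Tuple

MaePair = Tuple[float, float]  # (MAE(r), MAE(s)) for one comparable pair

def share_below(pairs: Sequence[MaePair], threshold: float = 0.1) -> float:
    """Fraction of pairs with |MAE(r) - MAE(s)| below the threshold."""
    if not pairs:
        raise ValueError("no pairs given")
    return sum(abs(r - s) < threshold for r, s in pairs) / len(pairs)

def sim_worse_when_far(pairs: Sequence[MaePair], threshold: float = 0.1) -> bool:
    """True if every pair whose difference reaches the threshold has MAE(s) > MAE(r)."""
    return all(s > r for r, s in pairs if abs(r - s) >= threshold)

# Illustrative values only.
pairs = [(0.03, 0.05), (0.04, 0.20), (0.02, 0.06), (0.05, 0.05)]
print(share_below(pairs))         # 0.75
print(sim_worse_when_far(pairs))  # True
```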
  • 22. RQ0: Implications
      • The prediction error differences between simulator-generated and real-life datasets are less than 0.1, on average, for both Autumn and Chauffeur
      • We can therefore use simulator-generated datasets as a reliable alternative to real-world datasets for testing these DNNs
  • 23. RQ1: Setup (1/2)
      • We randomly generate 50 scenarios and compare the offline and online testing results for each of the resulting simulator-generated datasets
      • For offline testing, we use the MAE metric (i.e., prediction error)
      • For online testing, we use the Maximum Distance from Center of Lane (MDCL) metric to measure the degree of lane departure (i.e., safety violation)
      • However, MAE and MDCL values cannot be compared directly, since they are different metrics
  • 24. RQ1: Setup (2/2)
      • To determine whether the offline and online testing results are consistent, we set threshold values for MAE and MDCL
      • The offline testing result is acceptable if MAE < 0.1 (i.e., the average prediction error is below 2.5°)
      • The online testing result is acceptable if MDCL < 1 (i.e., the maximum departure is below one meter)
      • If the offline and online testing results are both acceptable or both unacceptable, we say offline and online testing agree
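The agreement rule above reduces to two threshold checks. A minimal Python sketch follows. It assumes the simulator reports a signed lateral offset from the lane center (in meters) at each time step, which is our assumption; the slides do not say how MDCL is sampled. `mdcl` and `compare` are hypothetical helper names.

```python
from typing import Sequence, Tuple

MAE_THRESHOLD = 0.1   # average prediction error below 2.5 degrees (0.1 * 25)
MDCL_THRESHOLD = 1.0  # maximum departure below one meter

def mdcl(lateral_offsets_m: Sequence[float]) -> float:
    """Maximum Distance from Center of Lane over one run, given the signed
    lateral offset (meters) at each time step."""
    if not lateral_offsets_m:
        raise ValueError("empty trajectory")
    return max(abs(d) for d in lateral_offsets_m)

def compare(mae_value: float, mdcl_value: float) -> Tuple[bool, bool, bool]:
    """Return (offline acceptable, online acceptable, offline and online agree)."""
    offline_ok = mae_value < MAE_THRESHOLD
    online_ok = mdcl_value < MDCL_THRESHOLD
    return offline_ok, online_ok, offline_ok == online_ok

# A run with a small prediction error but a 1.3 m excursion: the two modes disagree.
print(compare(0.05, mdcl([0.2, -1.3, 0.4])))  # (True, False, False)
```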
  • 25. RQ1: Results (1/2)
      • Comparison between offline and online testing results
      • [Figure: MAE vs. MDCL for Autumn and Chauffeur, divided into quadrants by the MAE and MDCL thresholds; the quadrant shares are 44%, 0%, 48%, and 8% for Autumn and 34%, 0%, 48%, and 18% for Chauffeur]
  • 26. RQ1: Results (2/2)
      • One of the scenarios on which offline and online testing disagreed
      • [Figure: prediction error (deg) vs. image ID in this scenario, for the offline and the online testing results]
  • 27. RQ1: Implications
      • Offline and online testing results differ in many cases
      • Offline testing is more optimistic than online testing, because the accumulation of errors over time is not observed in offline testing
      • Online testing is preferable to offline testing for ADS-DNNs
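Why errors accumulate only in a closed loop can be shown with a deliberately crude toy in Python. This is not the study's simulator. It assumes a lateral offset `y`, an "ideal" label steering of `-gain * y` back toward the center, a hypothetical DNN that adds a constant `bias` to that label, and a prediction applied directly as the lateral change per step. Units are intentionally arbitrary. Every frame's prediction error equals `bias`, yet the offset converges toward `bias / gain`.

```python
from typing import Tuple

def simulate(bias: float, gain: float = 0.04, steps: int = 60) -> Tuple[float, float]:
    """Toy closed loop; returns (MAE over the run, MDCL over the run)."""
    y = 0.0
    errors, offsets = [], []
    for _ in range(steps):
        label = -gain * y                     # correction a perfect driver would apply
        prediction = label + bias             # per-frame prediction error == bias
        errors.append(abs(prediction - label))
        y += prediction                       # the output changes the next input
        offsets.append(y)
    return sum(errors) / steps, max(abs(o) for o in offsets)

mae, mdcl = simulate(bias=0.05)
print(f"MAE={mae:.3f}  MDCL={mdcl:.2f}")  # MAE=0.050  MDCL=1.14
```

With these made-up parameters the run passes the offline threshold (MAE < 0.1) but fails the online one (MDCL ≥ 1), mirroring the kind of disagreement reported above. Replaying recorded frames, as offline testing does, would only ever see the 0.05 error.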
  • 28. Conclusion
      • We showed that simulator-generated datasets yield DNN prediction errors similar to those obtained with real-world datasets
      • We also found that many safety violations identified by online testing were not detected by offline testing
      • As future work, we plan to investigate how to improve the performance of DNN-based ADS using offline and online testing results