Slides of a paper at ERTSS'2014 co-authored by Nicolas NAVET (University of Luxembourg), Shehnaz LOUVART (Renault), Jose VILLANUEVA (Renault), Sergio CAMPOY-MARTINEZ (Renault) and Jörn MIGGE (RealTime-at-Work). Early stage timing verification on CAN traditionally relies on simulation and schedulability analysis, also known as worst-case response time (WCRT) analysis. Despite recent progresses, the latter technique remains pessimistic especially in complex networking architectures with gateways and heterogeneous communication stacks. Indeed, there are practical cases where no exact WCRT analysis is available, and merely upper bounds on the response times can be derived, on the basis of which unnecessary conservative design choices may be made. Simulation, on the other hand, does not provide anyguarantees per se and, in the context of critical networks, should only be used along with an adequate methodology. In this paper, we argue for the use of quantiles of the response time distribution as performance
metrics providing an adjustable trade-off between safety and resource usage optimization. We discuss how the exact value of the quantile to consider should be chosen with regard to the criticality of the frames, and illustrate the approach on two typical automotive use-cases.
Timing verification of automotive communication architecture using quantile estimation
1. Timing verification of automotive communication
architecture using quantile estimation
Nicolas NAVET (Uni Lu), Shehnaz LOUVART (Renault), Jose
VILLANUEVA (Renault), Sergio CAMPOY-MARTINEZ
(Renault) and Jörn MIGGE (RealTime-at-Work).
ERTSS’2014 - Toulouse, February 5-7, 2014.
February 07, 2014
2. 1 Outline
Early-stage timing verification of wired automotive
buses – CAN-based communication architectures
Schedulability
analysis versus
simulation
Performance
metrics : the
case for
quantiles
derived by
simulation
2 typical
automotive
use-cases
ERTSS'2014
07/02/2014 - 2
3. 2 Automotive communication architectures
Increased bandwidth requirements & timing constraints
More complex & heterogeneous architectures with
black-box ECUs
Optimized CAN networks for higher bus loads:
priorities, frame offsets, gateways, communication
stacks, etc
Verification activity of higher importance today, higher
load levels calls for more accurate verification models
no margin for errors
Main performance metrics: frame response time =
communication latency
ERTSS'2014
07/02/2014 - 3
4. Schedulability analysis
“mathematic model of the
worst-case possible situation”
VS
Schedulability analysis :
Simulation
“mathematic reproduces the
“program that model of the
worst-case possible situation”
behavior of a system”
max number of
max number of
instances that can
instances arriving after
accumulate at critical
critical instants
instants
Upper bounds on the perf.
metrics Safe if model is correct
and assumptions met
Models close to real systems
Often pessimistic overdimensioning
Fine grained information
Might be a gap between
models and real systems!
unpredictably unsafe then
Worst-case response times are
out of reach! Occasional deadline
misses must be acceptable
ERTSS'2014
07/02/2014 - 4
5. RTaW : “enable designers to build provably
safe and optimized critical systems”
– Simulation and schedulability analysis for networks and ECU
CAN, CAN FD, Arinc825, Ethernet, FlexRay, AFDX, etc…
– OEM customers: Renault, PSA, Eurocopter, Astrium, ABB
− RTaW/Sim Starter edition can be
downloaded from www.realtimeatwork.com
− No black box software: all schedulability
analysis that are implemented are published
Used in this study
RTaW-Sim CAN simulator
with schedulability analysis
and configuration algorithms
ERTSS'2014
07/02/2014 - 5
6. 2
Metrics for the evaluation of
frame latencies: the case for
quantiles
7. Frame response time distribution
Upper-bound with
schedulability analysis
(actual) worst-case
response time (WCRT)
Probability
Simulation max.
Q1
Q2
Response time
Easily observable events
Testbed /
Simulation
Infrequent events
Long
Simulation
Rare events
Schedulability
analysis
Q1: pessimism of schedulability analysis ?!
Q2: distance between simulation max. and WCRT ?!
ERTSS'2014
07/02/2014 - 10
8. Using quantiles means accepting a controlled risk
Quantile Qn: P[ response time > Qn ] < 10-n
Probability
Upper-bound with
schedulability analysis
Simulation max.
Q4
Q5
Probability
< 10-5
one frame
every 100 000
Response time
No extrapolation here, won’t help to say anything about what is
too rare to be in simulation traces
ERTSS'2014
07/02/2014 - 11
9. Identifying both deadline and tolerable risks
Probability
Simulation max.
Q4
deadline Q
5
Response time
1.
2.
3.
4.
Identify frame deadline
Decide the tolerable risk target quantile
Simulate “sufficiently” long
If target quantile value is below deadline,
performance objective is met
ERTSS'2014
07/02/2014 - 12
10. 1) Quantiles vs average time between
deadline misses
Quantile
One frame
every …
Mean time to failure
Frame period = 10ms
Mean time to failure
Frame period = 500ms
Q3
1 000
10 s
8mn 20s
Q4
10 000
1mn 40s
≈ 1h 23mn
Q5
100 000
≈ 17mn
≈ 13h 53mn
Q6
1000 000
≈ 2h 46mn
≈ 5d 19h
…
…
…
Warning : successive failures in some cases might be
temporally correlated, this must be assessed!
Use of distributions of successive quantile overshoots, linear and
non-linear dependency analysis
ERTSS'2014
07/02/2014 - 13
11. 2) Determine the minimum simulation length
time needed for quantile convergence
reasonable # of values: a few tens …
Tool support can help here:
e.g. numbers in gray
should not be trusted
ERTSS'2014
[RTaW-sim screenshot]
Reasonable values for Q5 and Q6
(with periods <500ms) are obtained in
a few hours of simulation (with a highspeed simulation engine) – e.g. 2 hours
for a typical automotive setup
07/02/2014 - 14
13. Use-case 1: OBD2 request through a gateway
50% load – 500kbit/s
40% load – 500kbit/s
OBD2
request
Simulated
production
delay
response
Conservative assumptions:
FIFO, transmission errors
[RTaW-sim screenshot]
Time between the OBD2 request frame
and reception of the first answer frame
must not be greater than 50ms once every
1000 requests
ERTSS'2014
07/02/2014 - 16
14. Use-case 1: OBD2 request through a gateway
Time between the OBD2 request frame
and reception of the first answer frame
must not be greater than 50ms once every
1000 requests
Metrics
OBD
response
times
Min
Average
Q3
Q4
Q5
Q6
Max
31.94
34.29
46.55
49.31
53.45
55.32
56.57
Q3
Q4
Response time distribution
ERTSS'2014
07/02/2014 - 17
15. Use-case 2: end-to-end response time of a 10ms
control frame
10ms
T13 frame
Functional level impact: less than 1 frame every 106
above deadline=10ms is acceptable
Q6 = 8.9
max= 12.1
ERTSS'2014
07/02/2014 - 18
16. Concluding remarks
1
Timing verification techniques & tools should not
be trusted blindly
2
Simulation is well suited to systems that requires
timing guarantees but
Are not well amenable to schedulability analysis
Or can tolerate deadline misses with a controlled
level of risk
3
Some methodological aspects
Determine quantile wrt criticality, and simulation
length wrt to quantile
Simulator and models validation
High-performance simulation engine needed for
higher quantiles
ERTSS'2014
07/02/2014 - 19