Towards Functional Safety compliance of Matrix-Matrix Multiplication

Towards Functional Safety Compliance of Matrix-Matrix Multiplication for Machine Learning-based Autonomous Systems
Javier Fernández*, Jon Perez*, Irune Agirre*, Imanol Allende*, Jaume Abella, Francisco J. Cazorla

CONTENTS
01 CONTEXT
02 PROPOSED SOLUTION
03 EVALUATION
04 CONCLUSIONS

© 2021 IKERLAN. All rights reserved
CONTEXT
Introduction

Current standards
• IEC 61508 [1]
• IEC 61513
• EN 5012X
• ISO 26262

New standards
• ISO/PAS 21448:2019 (SOTIF)
• ANSI/UL 4600
• VDE-AR-E 2842-61

Machine learning (ML)
“Process where a machine/computer/system learns things that can be used to make it perform better in the future” [2]
• ML builds probabilistic models from training data sets, rather than from specifications, to make predictions and decisions.
• ML has made enormous progress, reaching near-human accuracy in several safety-related tasks.
• ML algorithms process large data volumes and need higher performance than traditional dependable embedded systems provide.

[1] IEC 61508 (1-7): Functional safety of E/E/PE safety related systems
[2] Data Mining: Practical Machine Learning Tools and Techniques

CONTEXT
Object Detection

YOLO (You Only Look Once):
• Multiscale object detector
• Based on the darknet library

Matrix-Matrix Multiplication (MMM):
• 67 % of the execution time [1]
• Backbone of Convolutional Neural Networks (CNNs)

Code subset: MMM variants offered by YOLO
1. General Matrix-Matrix Multiplication (GEMM)
2. cuBLAS
3. Processor-specific variants such as AVX

[1] Evaluation and mitigation of soft-errors in neural network based object detection in three GPU architectures

CONTEXT
Failures & Diagnostic Coverage

Classification of faults (source)
• Systematic: associated with the development process
• Random: associated with hardware errors, for example:
  • Electromagnetic interference
  • Component wear-out
  • Voltage drops

Classification of faults (frequency)
a) Permanent faults
b) Intermittent faults
c) Transient faults

[Figure: bit patterns of the activation weights to be mapped [2] and of the activation weights mapped [2]]

Diagnostic Coverage (DC)
“DC denotes the effectiveness of diagnostic techniques to detect dangerous errors” [1]

[1] Multi-core devices for safety-critical systems: a survey
[2] Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications

PROPOSED SOLUTION
Error avoidance

Strategy
• Usage of defensive programming
• Compliance with coding guidelines (MISRA C [1])

Analysis tool
• Polyspace

Violations found in the sequential MMM
• Basic numerical types are not defined with an explicit size and signedness.
• The correctness of input parameters is not checked.
• The desired precedence of operators within expressions is not made explicit.
• Input parameter pointers are not declared as const-qualified.

[Figures: MMM, violations by rule of MISRA C:2012; darknet, violations by rule of MISRA C:2012]

[1] MISRA C:2012 – Guidelines for the use of the C language in critical systems
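
The defensive-programming item above can be illustrated with a minimal Python sketch of a sequential MMM that checks its input parameters before computing. The function name `gemm_checked`, the flat row-major layout and the loop order are assumptions made for illustration; this is not the authors' code. The C-specific findings (fixed-width types, const-qualified pointers) have no direct Python counterpart and are not shown.

```python
from typing import List, Sequence


def gemm_checked(m: int, n: int, k: int,
                 a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Return C = A x B for row-major flat inputs A (m x k) and B (k x n)."""
    # Defensive checks: reject invalid parameters before touching any data.
    for name, dim in (("m", m), ("n", n), ("k", k)):
        if not isinstance(dim, int) or dim <= 0:
            raise ValueError(f"{name} must be a positive integer")
    if len(a) != m * k:
        raise ValueError("A does not hold m * k values")
    if len(b) != k * n:
        raise ValueError("B does not hold k * n values")

    c = [0.0] * (m * n)
    for i in range(m):              # outermost loop
        for p in range(k):          # middle loop
            a_ip = a[(i * k) + p]   # parentheses make the precedence explicit
            for j in range(n):      # innermost loop
                c[(i * n) + j] += a_ip * b[(p * n) + j]
    return c
```

Calling `gemm_checked(2, 2, 2, [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])` multiplies two 2x2 matrices, while a buffer whose length does not match the declared dimensions raises `ValueError` instead of being read out of bounds.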

PROPOSED SOLUTION
Error Detection

Objective
To guarantee safe execution of the matrix-matrix multiplication (MMM) during software deployment through the use of diagnostic techniques.

Proposal
a) Employ checksum algorithms as diagnostic techniques to compute an Execution Signature (ES) of all the values of the input and output matrices.
b) Provide a catalogue of checksums and evaluate the trade-off between DC and performance impact.

Checksum algorithms [1]
• XOR
• Two’s complement
• One’s complement
• Fletcher
• CRC

[Figure: $A_{3 \times 2} \times B_{2 \times 3} = C_{3 \times 3}$]

[1] The Effectiveness of Checksums for
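
As a rough illustration of how an Execution Signature could be accumulated, the sketch below implements four of the listed checksums over 32-bit words in Python (Fletcher is not shown). The word width, the function names, the use of `zlib.crc32` as the CRC and the conversion of values to IEEE-754 single-precision bit patterns are all assumptions; the slides do not state which variants the authors used.

```python
import struct
import zlib
from typing import Iterable

MASK32 = 0xFFFFFFFF


def float_bits(value: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def xor_checksum(words: Iterable[int]) -> int:
    es = 0
    for w in words:
        es ^= w & MASK32
    return es


def twos_complement_checksum(words: Iterable[int]) -> int:
    # Plain modular addition: carries out of bit 31 are discarded.
    es = 0
    for w in words:
        es = (es + (w & MASK32)) & MASK32
    return es


def ones_complement_checksum(words: Iterable[int]) -> int:
    # End-around carry: the carry out of bit 31 is added back in.
    es = 0
    for w in words:
        es += w & MASK32
        es = (es & MASK32) + (es >> 32)
    return es


def crc_checksum(words: Iterable[int]) -> int:
    # Standard-library CRC-32, fed one little-endian word at a time.
    es = 0
    for w in words:
        es = zlib.crc32(struct.pack("<I", w & MASK32), es)
    return es
```

A signature over a matrix row could then be computed as `xor_checksum(float_bits(x) for x in row)`; comparing it against a reference value is what flags a corrupted word.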

PROPOSED SOLUTION
Error Detection

Safety architectural patterns
a) Periodic diagnosis with design-time fixed data pattern(s)
b) Redundancy (with or without diversity)

[Figure: block diagram with the labels: Inputs (camera/simulation); Diagnostic (known inputs, known ES); Switch every N executions; MMM including checksums; ES from MMM including checksums; Compare every N executions; Majority voter; Voted output; Control action]
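
The two patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `majority_vote`, `PeriodicDiagnosis` and the idea of modelling each "MMM including checksums" channel as a callable that returns an ES are hypothetical names and simplifications, not the architecture as implemented by the authors.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of channels, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


class PeriodicDiagnosis:
    """Every `period` executions, run the channel on a fixed pattern and check its ES."""

    def __init__(self, channel: Callable[[Sequence[int]], int],
                 known_inputs: Sequence[int], known_es: int, period: int) -> None:
        self.channel = channel
        self.known_inputs = known_inputs
        self.known_es = known_es
        self.period = period
        self.executions = 0

    def run(self, inputs: Sequence[int]) -> int:
        """Execute the channel; raise if the periodic self-test fails."""
        self.executions += 1
        if self.executions % self.period == 0:
            # Switch to the design-time pattern and compare with the known ES.
            if self.channel(self.known_inputs) != self.known_es:
                raise RuntimeError("diagnostic ES mismatch")
        return self.channel(inputs)
```

With three redundant channels, `majority_vote` masks a single faulty result, whereas the periodic diagnosis needs only one channel but detects a fault only when the fixed pattern is exercised.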

EVALUATION
Set-up

Implementation
• Sequential MMM: R5 core implemented in a Zynq UltraScale+
• AVX-based MMM: Core i7 in a PC

Kind of matrices
$A_{M \times K} \times B_{K \times N} = C_{M \times N}$
Square matrices: dimension N x N

Performance experiments
• Square matrices: 80x80, 160x160 and 320x320
• Unbalanced matrices: M = 18; N = 230400; K = 64

DC experiments
• Square matrices: 20x20, 40x40 and 80x80
• L1: M = 32; N = 29; K = 144
• L2: M = 8; N = 900; K = 8
• L3: M = 15; N = 225; K = 48
• L59: M = 18; N = 900; K = 1024

EVALUATION
Performance Impact (PI)

Definition
Performance Impact (PI): $n = \frac{\text{Execution time}_X}{\text{Execution time}_Y}$

Experiments
• Adoption of MISRA C: X = code complying with MISRA C; Y = original code. Result: n = 1.01
• Adoption of MISRA C and code optimization: X = optimized code complying with MISRA C; Y = original code. Result: n = 0.95
• Checksums: X = code including checksum(s); Y = optimized code complying with MISRA C
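
The PI ratio defined on this slide can be computed with a short Python helper. The timing approach (best of several wall-clock runs with `time.perf_counter`) and the names `best_time` and `performance_impact` are assumptions for illustration; the slides do not describe how execution time was measured on the targets.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the shortest wall-clock time of `repeats` calls to fn, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def performance_impact(time_x: float, time_y: float) -> float:
    """PI: n = execution time of X / execution time of Y."""
    if time_y <= 0.0:
        raise ValueError("reference execution time must be positive")
    return time_x / time_y
```

A value of n above 1 means variant X is slower than the reference Y (n = 1.01 is a 1 % overhead), and a value below 1 means it is faster (n = 0.95 is 5 % less execution time).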

EVALUATION
Performance Impact (PI)

Sequential MMM

Remarks
1. The PI decreases as the size of the matrices increases, asymptotically approaching specific values.
2. Individual experiments confirm that the PI increases from the less intricate algorithms (XOR, 1’s complement, 2’s complement) to the more intricate ones (Fletcher, CRC).
3. The PI is smaller in the outermost loops (M, E) than in the internal loop (I).

EVALUATION
Performance Impact (PI)

AVX-based MMM

Remarks
1. Same tendency towards a fixed PI value as the size of the matrices increases.
2. The PI of the Fletcher and 1’s complement checksums increases with respect to the results obtained in the sequential MMM experiments.
Reason: lack of specific arithmetic operations in AVX
• 1’s complement requires adding all values to be checked and subsequently adding the carry bit back into the result.
• Fletcher requires a modulo operation.
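
To make the modulo requirement concrete, here is a minimal Python sketch of a Fletcher checksum. It uses the common Fletcher-16 variant (8-bit blocks, sums reduced modulo 255), which is an assumption; the slides do not say which block size the authors used.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums, each reduced modulo 255."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255   # the modulo operation mentioned above
        sum2 = (sum2 + sum1) % 255   # second sum makes the result order-sensitive
    return (sum2 << 8) | sum1
```

Both running sums need a reduction after every element, so a vectorized implementation has to emulate that step with several instructions rather than a single add or XOR.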

EVALUATION
Diagnostic Coverage (DC)

Fault injection experiment
• Exhaustive single-bit error injection in all bit positions of the values of matrices A and B

$DC = \frac{\text{Detected faults}}{\text{Injected faults}} \times 100\ (\%)$

• DC is independent of the platform on which the MMM is implemented.
• The percentage of the code protected depends on the loop in which the checksum is implemented.

[Figure: three panels, each showing A × B = C]
A) External loop injection experiment
B) Intermediate loop injection experiment
C) Internal loop injection experiment (100 %)
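
The exhaustive single-bit injection and the DC formula can be tried out on a toy model in Python. The sketch assumes 32-bit integer words and an XOR signature accumulated over every operand read in the innermost loop; the function names and the placement of the checksum are illustrative assumptions, not the authors' implementation, and the coverage values it produces are properties of this toy model rather than the results reported in the slides.

```python
from typing import List

WORD_BITS = 32


def inner_loop_xor_es(a: List[List[int]], b: List[List[int]]) -> int:
    """XOR every operand word read in the innermost loop of C = A x B."""
    m, k, n = len(a), len(b), len(b[0])
    es = 0
    for i in range(m):
        for j in range(n):
            for p in range(k):
                es ^= a[i][p] ^ b[p][j]
    return es


def diagnostic_coverage(a: List[List[int]], b: List[List[int]]) -> float:
    """Flip each bit of each word of A and B once; return detected/injected in %."""
    golden = inner_loop_xor_es(a, b)
    injected = 0
    detected = 0
    for matrix in (a, b):
        for row in matrix:
            for col in range(len(row)):
                for bit in range(WORD_BITS):
                    row[col] ^= 1 << bit      # inject the fault
                    injected += 1
                    if inner_loop_xor_es(a, b) != golden:
                        detected += 1
                    row[col] ^= 1 << bit      # restore the original word
    return 100.0 * detected / injected
```

In this model each word of A is read N times and each word of B is read M times, so a flipped bit cancels out of the XOR whenever that count is even. That is one simple way in which coverage can end up depending on the matrix dimensions.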

EVALUATION
Diagnostic Coverage (DC)

DC for Sequential MMMs

Remarks
1. XOR, 1’s complement and 2’s complement do not reach 100% DC in the internal loop.
2. DC is highly dependent on the dimensions of the matrices involved in the MMM.
3. All the checksum combinations reach 100% DC except for XOR_Fletcher and Two’s_Fletcher.
4. Evaluation of a layer extracted from YOLO:
   • 1’s complement -> 98.5 %
   • 2’s complement -> 96.9 %

EVALUATION
Diagnostic Coverage (DC)

DC for AVX MMMs

Remarks
1. The DC of the XOR (all loops), 1’s complement (E), 2’s complement (E), CRC (I) and Fletcher (I) checksums remains the same as for the sequential MMMs.
2. The DC of the rest of the checksums increases.
Reason: AVX instructions compute the ES of 8 data values in each iteration, so the percentage of protected data is higher than in the sequential implementation.

EVALUATION
Trade-off PI vs DC

Remarks
1. All combinations of checksums reach 100% DC except for XOR_Fletcher and Two’s_Fletcher.
2. Checksums that reach 100% DC:
• Fletcher (I)
• CRC (I)
• XOR_CRC
• One’s_Fletcher
• Fletcher_CRC

EVALUATION
Trade-off PI vs DC

Remarks
1. Checksums that reach 100% DC:
• Fletcher (I)
• CRC (I)
• All combinations of checksums

CONCLUSIONS
Conclusions & Future Work

Adaptation to coding guidelines
• It does not require a huge effort and the PI is negligible (lower than 1%).

Trade-off between DC and PI
• 100% DC is achieved with a PI between n = 1.001 and n = 2.97 for the largest matrix size in the sequential implementation, and between n = 1.001 and n = 6.70 in the AVX-based implementation.
• The performance impact depends strongly on the dimensions of the matrices.
• The selection of the most appropriate checksum combination depends on the dimensions of the matrices under consideration.

Future work
• DC evaluation with multiple-bit errors instead of single-bit errors
• Extend this work towards platform accelerators such as GPUs and FPGAs

IKERLAN
P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón
T. +34 943712400 F. +34 943796944
THANK YOU

NAME: JAVIER FERNÁNDEZ MUÑOZ
EMAIL: JAVIER.FERNANDEZ@IKERLAN.ES
