Fault Injection for Software Certification

Roberto Natella

Software risks in critical systems
Many industries are facing legal troubles because they are liable for accidents caused by computer faults.
The Toyota “unintended acceleration” is a relevant example of an accident caused by bad software quality and lack of fault tolerance.

Fault Injection Testing
Fault Injection is the process of deliberately introducing faults (from software and hardware components) to validate the fault-tolerance properties of a system.

Fault injection in the DO-178B/C safety standards
The standard recommends robustness test cases “... [able to] demonstrate the ability of the software to respond to abnormal inputs and conditions. Activities include:
○ Real and integer variables should be exercised using equivalence class selection of invalid values.
○ For time-related functions, such as filters, integrators and delays, test cases should be developed for arithmetic overflow protection mechanisms.
○ For state transitions, test cases should be developed to provoke transitions that are not allowed by the software requirements.”
○ ...
* RTCA DO-178B, Software considerations in airborne systems and equipment certification, Sec. 6.4.2.2
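To make the first activity concrete, the C sketch below selects one representative from each equivalence class of invalid values for an integer input and checks that every one of them is rejected without corrupting the stored state. The unit under test, set_flap_angle(), and its valid range [0, 40] are hypothetical; in a real campaign the classes would be derived from the software requirements.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical unit under test: accepts flap angles in [0, 40] degrees.
 * Returns 0 on success and -1 when the input is rejected. */
static int flap_angle = 0;

int set_flap_angle(int degrees)
{
    if (degrees < 0 || degrees > 40)
        return -1;              /* abnormal input: reject, keep old state */
    flap_angle = degrees;
    return 0;
}

/* One representative per invalid equivalence class: just below the
 * valid range, just above it, and the two extremes of the int type. */
static const int invalid_inputs[] = { -1, 41, INT_MIN, INT_MAX };

/* Returns how many invalid inputs were wrongly accepted or changed the
 * stored state; 0 means every abnormal input was handled robustly. */
int run_invalid_class_tests(void)
{
    size_t n = sizeof invalid_inputs / sizeof invalid_inputs[0];
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        int before = flap_angle;
        if (set_flap_angle(invalid_inputs[i]) == 0 || flap_angle != before)
            failures++;
    }
    return failures;
}
```

Picking representatives per class, rather than every value, is what keeps such robustness suites small while still exercising each kind of abnormal input.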

Fault injection in the ISO 26262 safety standard

Fault Injection in the NASA Software Safety Standards
● The NASA Software Safety Guidebook recommends fault injection for OTS (off-the-shelf) software components
○ Software fault injection (SFI) is a technique used to determine the robustness of the software, and can be used to understand the behavior of OTS software. It injects faults into the software and looks at the results (Did the fault propagate? Was the end result an undesirable outcome?). Basically, the intent is to determine if the software responds gracefully to the injected faults.
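The questions in the guidebook can be answered by comparing a faulty run against a fault-free (golden) run of the same workload. The sketch below assumes a hypothetical OTS routine, ots_average(), and an injector that corrupts one of its input samples; the three outcome labels are illustrative and are not taken from the NASA guidebook.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical OTS component: averages sensor samples and reports an
 * error (-1) instead of a result when a sample is not a finite number. */
int ots_average(const double *samples, size_t n, double *out)
{
    double sum = 0.0;

    if (samples == NULL || n == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(samples[i]))
            return -1;
        sum += samples[i];
    }
    *out = sum / (double)n;
    return 0;
}

typedef enum {
    NOT_PROPAGATED,      /* fault masked: same result as the golden run */
    HANDLED_GRACEFULLY,  /* the component detected and reported the fault */
    UNDESIRABLE_OUTCOME  /* a wrong result was silently delivered */
} sfi_outcome;

#define MAX_SAMPLES 64

/* Replaces sample `index` with `fault`, reruns the component, and compares
 * the outcome with the golden (fault-free) run on the original input. */
sfi_outcome inject_and_classify(const double *input, size_t n,
                                size_t index, double fault)
{
    double faulty[MAX_SAMPLES], golden_out, faulty_out;
    int golden_rc;

    assert(n <= MAX_SAMPLES && index < n);
    golden_rc = ots_average(input, n, &golden_out);
    assert(golden_rc == 0);                  /* golden run must succeed */
    (void)golden_rc;

    for (size_t i = 0; i < n; i++)
        faulty[i] = input[i];
    faulty[index] = fault;                   /* the injected fault */

    if (ots_average(faulty, n, &faulty_out) != 0)
        return HANDLED_GRACEFULLY;
    return faulty_out == golden_out ? NOT_PROPAGATED : UNDESIRABLE_OUTCOME;
}
```

For the input {1.0, 2.0, 3.0}, injecting NAN into the second sample is handled gracefully, injecting the same value 2.0 does not propagate, and injecting 100.0 silently yields a wrong average.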

Case study: FIN.X-RTOS
● FIN.X-RTOS is a real-time operating system from Leonardo/Finmeccanica, based on open-source software
● Objective of the project: to develop a Linux distribution compliant with the DO-178B recommendations
● Built upon a network of excellence between industries and universities

● Industrial product management, fully customizable
● Support for hard real-time on multi-core CPUs
● Guaranteed scalability (from embedded devices to high-performance systems, such as workstations and servers)
● No dependence on a commercial product or vendor
● Enhanced IDE for software development
● No export license restriction
● Full control of all source packages and build process (based on Gentoo Linux, a Linux meta-distribution)

FIN.X-RTOS overview

Real-time features of FIN.X-RTOS

Certification process of FIN.X-RTOS
Figure: Linux kernel (open source), FIN.X-RTOS, RTCA/DO-178B Level D.
● The DO-178B recommendations allow the reuse of “previously-developed software”, provided that safety evidence is produced from alternative sources such as additional testing and reverse engineering
● The functional requirements of the kernel were studied, documented, and tested (complying with level D of DO-178B)

Fault Injection in FIN.X-RTOS
Figure: injected faults come from three sources: user-space software (API misuse injection), device drivers (code mutation), and kernel APIs (API error injection).
● There are many potential cases of kernel API failures:
○ Resource exhaustion (e.g., allocation of I/O regions, pages, slabs, ...)
○ Hardware I/O errors
○ Resource busy (e.g., mutexes, pinned pages)
● Kernel API callers must check and handle errors
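The last point can be illustrated with a user-space analogue (not kernel code): the hypothetical allocator alloc_buffer() can be told to fail its N-th call, and the caller setup_device() checks every allocation, releases what it already obtained, and propagates -ENOMEM instead of dereferencing a NULL pointer. All names here are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical injectable allocator: the N-th call (0-based) returns NULL,
 * emulating resource exhaustion; -1 disables the injection. */
static int alloc_calls = 0;
static int fail_on_call = -1;

void inject_alloc_failure(int nth_call)
{
    alloc_calls = 0;
    fail_on_call = nth_call;
}

void *alloc_buffer(size_t size)
{
    if (alloc_calls++ == fail_on_call)
        return NULL;                    /* injected failure */
    return malloc(size);
}

struct dev_ctx { char *rx; char *tx; };

/* A robust caller: every allocation is checked, and on failure the
 * resources obtained so far are released before returning -ENOMEM. */
int setup_device(struct dev_ctx *ctx)
{
    ctx->rx = alloc_buffer(4096);
    if (ctx->rx == NULL)
        return -ENOMEM;

    ctx->tx = alloc_buffer(4096);
    if (ctx->tx == NULL) {
        free(ctx->rx);                  /* undo the partial setup */
        ctx->rx = NULL;
        return -ENOMEM;
    }
    return 0;
}

void teardown_device(struct dev_ctx *ctx)
{
    free(ctx->tx);
    free(ctx->rx);
    ctx->tx = ctx->rx = NULL;
}
```

Injecting a failure at each allocation in turn (call 0, then call 1) is exactly the kind of test that exposes a missing error path.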
Fault injection on kernel APIs
● The Linux kernel already includes a fault injector that forces erroneous return codes (to simulate failed memory allocations, I/O errors, ...)
A fault injector in the Linux kernel
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
    void *objp;

    /* Fault injection hook: when the failslab injector triggers,
     * the allocation fails as if memory were exhausted. */
    if (should_failslab(cachep, flags))
        return NULL;
    /* ... normal slab allocation, which sets objp ... */
    return objp;
}
The fault injector is programmed from user space. Examples:
• Fail with X% probability
• Fail 1 every X calls to the API
• Fail after X seconds
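The three example policies above can be expressed as a single decision function. The sketch below is a user-space approximation: struct fault_policy and its fields are hypothetical and do not mirror the kernel's implementation, and the elapsed time is passed in by the caller to keep the logic deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical policy mirroring the three examples; these fields are
 * illustrative and are not the kernel's own configuration attributes. */
struct fault_policy {
    int probability;       /* fail with X% probability (0 disables) */
    unsigned interval;     /* fail 1 every X calls (0 disables) */
    double after_seconds;  /* inject only after X seconds (< 0 disables) */
    unsigned long calls;   /* calls to the API observed so far */
};

/* Decides whether the current call to the injectable API should fail.
 * `elapsed_seconds` is supplied by the caller (e.g. time since start). */
bool should_fail(struct fault_policy *p, double elapsed_seconds)
{
    bool timed = p->after_seconds >= 0;

    p->calls++;
    if (timed && elapsed_seconds < p->after_seconds)
        return false;                    /* too early: never inject */
    if (p->interval > 0)
        return p->calls % p->interval == 0;
    if (p->probability > 0)
        return rand() % 100 < p->probability;
    return timed;                        /* time trigger alone: fail every call */
}
```

In the kernel itself, the failslab injector is configured through debugfs attributes such as probability, interval and times, which are described in the kernel's fault-injection documentation.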

Limitations of random fault injection
● Faults are injected with a “blind” (black-box) approach, with random timing
● However, this approach neglects the internal state of the system
○ Many tests are redundant: they are performed on the same state
○ Many important states may be missed by the tests
● The manual definition of test scenarios is not a feasible solution
○ Too much effort for a large system, and may still be inaccurate

The SABRINE approach
● Basic idea:
○ the internal state of an OS component (such as the FS) is given by the history of its interactions
○ we profile the history of interactions, and extract behavioral models of the OS component under test
○ based on the behavioral model, we perform distinct fault injections at each state, to efficiently cover different states of the target
Figure: example interactions of an OS component (ext3_dirty_inode, journal_dirty_metadata, kmem_cache_alloc).

Approach overview
Figure: the operating system is made of OS components 1 to N, one of which is the target OS component; user apps interact with it through system calls at the OS interface, and the hardware through interrupt requests. The approach has three phases:
Phase 1: monitoring
Phase 2: model learning
Phase 3: model-based testing

Pattern identification
● We get an execution log of the target OS component by running a workload and recording the function calls (interactions) made by the component
● The execution log is divided into sequences (i.e., the subset of interactions that happen during the same system call, interrupt request, or kernel task execution)
● Repeated identical sequences are grouped together (patterns)
● Patterns that are similar (even if not identical) are further grouped into clusters
Figure: excerpt of an execution log (fields include trace ID, operation ID, sequence ID, interaction type, called function and call point); the entries are grouped into sequences A, B and C.
...
OUT, pdflush, 428, 1, ll_rw_block, flush_commit_list:1f3eb
INJ, pdflush, 428, 1, kmem_cache_alloc, flush_commit_list:1f3eb
INJ, pdflush, 428, 1, kmem_cache_alloc, flush_commit_list:1f3eb
IN, close, 491, 1, reiserfs_file_release, __fput:c018efda
INJ, pdflush, 428, 1, generic_make_request, flush_commit_list:1f3eb
OUT, pdflush, 428, 1, __find_get_block, flush_commit_list:1f3cc
...
IN, close, 503, 1, reiserfs_file_release, __fput:c018efda
...
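The first two steps (splitting the log into sequences, then grouping identical sequences into patterns) can be sketched as follows. The sketch assumes that every log record carries a sequence identifier and the name of the called function; the record layout and the fixed capacities are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEQS 32
#define MAX_LEN  32

/* One logged interaction (hypothetical layout): the sequence it belongs to
 * and the function called by the component under test. */
struct record { int seq_id; const char *func; };

/* A sequence: the calls made during one system call, IRQ or kernel task. */
struct sequence { int seq_id; int len; const char *funcs[MAX_LEN]; };

/* Splits an interleaved log into sequences, keeping the call order. */
int split_sequences(const struct record *log, int n, struct sequence *out)
{
    int nseq = 0;

    for (int i = 0; i < n; i++) {
        int s = 0;
        while (s < nseq && out[s].seq_id != log[i].seq_id)
            s++;
        if (s == nseq) {                   /* first call of a new sequence */
            assert(nseq < MAX_SEQS);
            out[nseq].seq_id = log[i].seq_id;
            out[nseq].len = 0;
            nseq++;
        }
        assert(out[s].len < MAX_LEN);
        out[s].funcs[out[s].len++] = log[i].func;
    }
    return nseq;
}

static int same_calls(const struct sequence *a, const struct sequence *b)
{
    if (a->len != b->len)
        return 0;
    for (int i = 0; i < a->len; i++)
        if (strcmp(a->funcs[i], b->funcs[i]) != 0)
            return 0;
    return 1;
}

/* Counts the patterns, i.e. the distinct call lists among the sequences. */
int count_patterns(const struct sequence *seqs, int nseq)
{
    int patterns = 0;

    for (int i = 0; i < nseq; i++) {
        int seen = 0;
        for (int j = 0; j < i && !seen; j++)
            seen = same_calls(&seqs[i], &seqs[j]);
        if (!seen)
            patterns++;
    }
    return patterns;
}
```

Applied to a log shaped like the excerpt, one pdflush sequence and two close sequences with the same single call would give three sequences but only two patterns.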

Clustering
Sequence 1: ext3_dirty_inode, journal_start, kmem_cache_alloc, __getblk, journal_get_write_access, __alloc_pages, kmem_cache_alloc, kmem_cache_alloc, __brelse, journal_stop
Sequence 2: ext3_dirty_inode, journal_start, kmem_cache_alloc, __getblk, __alloc_pages, kmem_cache_alloc, journal_stop
Sequence 3: ll_rw_block, kmem_cache_alloc, kmem_cache_alloc, generic_make_request, __find_get_block
Figure: the sequences are grouped by clustering, and each cluster is turned into a finite state machine (two finite state machines in the example).

Clustering algorithm
1. For each pair of patterns, we compute a similarity score (Smith-Waterman algorithm)
○ It first searches for the best alignment between two patterns
○ The score is higher when there are many matching symbols and few gaps/mismatches
2. Similar patterns are grouped (spectral clustering)
○ Patterns are the nodes of a weighted graph, and the similarity score is the weight of the edge between two nodes
○ By cutting “weak” edges, the graph is split into partitions that are “strongly connected” (i.e., very similar patterns)
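A minimal Smith-Waterman scorer over patterns, where each symbol is a kernel function name, could look like the sketch below. The scoring parameters (match +2, mismatch -1, gap -1) are assumptions chosen for illustration, not the values used by SABRINE, and the spectral clustering step is not shown.

```c
#include <assert.h>
#include <string.h>

#define SW_MAX   64
#define MATCH     2   /* assumed scoring parameters */
#define MISMATCH (-1)
#define GAP      (-1)

static int max4(int a, int b, int c, int d)
{
    int m = a > b ? a : b;
    m = m > c ? m : c;
    return m > d ? m : d;
}

/* Smith-Waterman local alignment score between two patterns (arrays of
 * function names). Higher scores mean more similar patterns. */
int sw_score(const char **a, int n, const char **b, int m)
{
    static int h[SW_MAX + 1][SW_MAX + 1];
    int best = 0;

    assert(n <= SW_MAX && m <= SW_MAX);
    for (int i = 0; i <= n; i++)
        h[i][0] = 0;
    for (int j = 0; j <= m; j++)
        h[0][j] = 0;
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int s = strcmp(a[i - 1], b[j - 1]) == 0 ? MATCH : MISMATCH;
            h[i][j] = max4(0,                     /* start a new local alignment */
                           h[i - 1][j - 1] + s,   /* align a[i-1] with b[j-1] */
                           h[i - 1][j] + GAP,     /* gap in b */
                           h[i][j - 1] + GAP);    /* gap in a */
            if (h[i][j] > best)
                best = h[i][j];
        }
    }
    return best;
}
```

With these parameters, Sequence 1 and Sequence 2 of the clustering example score 11 (all seven calls of Sequence 2 match, at the cost of three gaps), while Sequence 1 and Sequence 3 score only 4 (the two consecutive kmem_cache_alloc calls), so the first two would be the ones grouped together.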

Examples of clusters
Cluster (EXT3) | Behavior | Context | # patterns
1 | gets and sets the file metadata | stat syscall | 6
2 | retrieves the file index block and stores it in memory, or updates it on the disk | open, unlink syscalls | 5
3 | copies file contents from disk to a cache, and modifies them | write syscall | 8
4 | modifies the contents of a file already in the disk cache | write syscall | 8
5 | copies a large amount of data from a file to a socket | sendfile syscall | 12
6 | copies a small amount of data from a file to a socket | sendfile syscall | 10
7 | flushes a small amount of data from the cache to the disk | pdflush kernel task | 19
8 | flushes a large amount of data from the cache to the disk | pdflush kernel task | 6
9 | updates file metadata to reflect that it has been memory-mapped | mmap2 syscall | 5

Component | EXT3 | ReiserFS | SCSI
# interactions | 34,784 | 97,341 | 27,311
# sequences | 432 | 239 | 1,307
# (distinct) sequences | 79 | 57 | 10
# clusters | 9 | 6 | 2
# test cases | 49 | 28 | 10

Behavioral modeling example
Figure: the state automaton built from an EXT3 cluster. The first pattern yields states 0 to 8, with transitions on ext3_dirty_inode, journal_start, kmem_cache_alloc, __getblk, journal_dirty_metadata, __brelse and journal_stop; the second pattern adds states 9 to 11 (alloc_pages, kmem_cache_alloc). Robustness test cases #1, #2 and #3 target injectable interactions of the automaton.
1. A partial state automaton is derived from the first pattern in the cluster
2. The automaton is extended with the second pattern (partially overlapping with the first pattern)
3. A robustness test case is generated for each injectable interaction in the automaton
Behavioral modeling
1. For each cluster, we obtain a behavioral model (kBehavior algorithm)
○ A Finite State Automaton (FSA) is incrementally extended with new transitions and states
○ Transitions represent interactions of the patterns
2. A robustness test case is generated for each injectable interaction included in the FSA
○ This allows injections to be performed in different contexts
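The two steps can be approximated with a much simpler learner than kBehavior: the sketch below merges the patterns of a cluster into a prefix tree (patterns share states only while their calls coincide) and emits one test case, a (state, function) pair, for each transition on an injectable function. The list of injectable functions is hypothetical. Unlike kBehavior and the automaton in the example, a prefix tree does not merge common suffixes, so it can yield more test cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRANS 64

/* One transition: a call to `func` moves the automaton from `from` to `to`. */
struct transition { int from, to; const char *func; };

struct fsa { int nstates, ntrans; struct transition t[MAX_TRANS]; };

/* Simplified stand-in for kBehavior: adds a pattern to a prefix tree,
 * reusing a transition when the same call already leaves the same state. */
void fsa_add_pattern(struct fsa *a, const char **calls, int len)
{
    int state = 0;

    if (a->nstates == 0)
        a->nstates = 1;                    /* state 0 is the initial state */
    for (int i = 0; i < len; i++) {
        int next = -1;
        for (int k = 0; k < a->ntrans && next < 0; k++)
            if (a->t[k].from == state && strcmp(a->t[k].func, calls[i]) == 0)
                next = a->t[k].to;
        if (next < 0) {                    /* new behavior: new state + transition */
            assert(a->ntrans < MAX_TRANS);
            next = a->nstates++;
            a->t[a->ntrans++] = (struct transition){ state, next, calls[i] };
        }
        state = next;
    }
}

/* Hypothetical set of functions whose error return can be forced. */
static const char *injectable[] = { "kmem_cache_alloc", "__getblk", "__alloc_pages" };

static int is_injectable(const char *func)
{
    for (size_t i = 0; i < sizeof injectable / sizeof injectable[0]; i++)
        if (strcmp(func, injectable[i]) == 0)
            return 1;
    return 0;
}

/* One robustness test case per injectable transition: the pair
 * (state reached so far, function whose return code is forced). */
int generate_test_cases(const struct fsa *a, struct transition *out)
{
    int n = 0;

    for (int k = 0; k < a->ntrans; k++)
        if (is_injectable(a->t[k].func))
            out[n++] = a->t[k];
    return n;
}
```

For Sequence 1 and Sequence 2 of the clustering example, the prefix tree has 13 transitions, 7 of which are calls to kmem_cache_alloc, __getblk or __alloc_pages, giving 7 test cases.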

Test execution
● The interactions of the component under test are initially profiled using kernel debugging tools (SystemTap)
● From the traces, we automatically generate a kernel injection module that keeps track of the OS state automaton at run time
● During robustness tests, the system is executed again with the same workload
● When the injector notices that an injectable function is invoked at a given state, it forces an erroneous return code from that function call
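The run-time part can be sketched as a small state tracker fed with every monitored call. The sketch below is a user-space illustration with hypothetical names (the actual injector is a generated kernel module): it follows the model's transitions and reports when the call targeted by the current test case, a given function at a given state, is reached, so that the caller can return an error code instead of running the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct transition { int from, to; const char *func; };

/* Hypothetical run-time tracker for one test case: inject once when
 * `target_func` is called while the model is in `target_state`. */
struct injector {
    const struct transition *model;
    int ntrans;
    int state;                 /* current state; -1 once off the model */
    int target_state;
    const char *target_func;
    bool fired;                /* at most one injection per test run */
};

/* Called on every monitored call; returns true when the caller should
 * force an erroneous return code instead of executing the call. */
bool on_call(struct injector *inj, const char *func)
{
    bool inject = false;
    int next = -1;

    if (inj->state < 0)
        return false;          /* behavior not covered by the model */
    if (!inj->fired && inj->state == inj->target_state &&
        strcmp(func, inj->target_func) == 0) {
        inject = true;
        inj->fired = true;
    }
    for (int k = 0; k < inj->ntrans && next < 0; k++)
        if (inj->model[k].from == inj->state && strcmp(inj->model[k].func, func) == 0)
            next = inj->model[k].to;
    inj->state = next;         /* follow the model (or leave it) */
    return inject;
}
```

A complete injector would also reset the tracked state at the start of each sequence (system call, interrupt request or kernel task), since the model describes one sequence at a time.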

Robustness vulnerabilities
We found two robustness vulnerabilities that affected the EXT3 and ReiserFS filesystems (radix_tree_node_alloc and __get_blk).
STACK FRAME AT THE TIME OF INJECTION:
 0 kmem_cache_alloc               <- call to the memory allocator: fault injection!
 1 radix_tree_node_alloc
 2 radix_tree_insert
 3 add_to_page_cache
 4 add_to_page_cache_lru
 5 mpage_readpages
 6 ext3_readpages                 <- call to the EXT3 filesystem
 7 __do_page_cache_readahead
 8 ondemand_readahead
 9 page_cache_async_readahead
10 generic_file_splice_read
11 do_splice_to
12 splice_direct_to_actor
13 do_splice_direct
14 do_sendfile
15 sys_sendfile64                 <- first called function (a system call)
16 sysenter_past_esp
Outcome: kernel crash!

Efficiency and reproducibility
● With random injection, thousands of tests are needed to hit the two robustness vulnerabilities
● With model-based injection, the same vulnerabilities can be found efficiently (only 77 tests are needed), and tests are highly reproducible
Figure: Random vs. SABRINE injection on the vulnerable functions __get_blk and radix_tree_node_alloc. EXT3: Random 29.0% and 3.8%, SABRINE 68.8% and 77.7%. ReiserFS: Random 0.2% and 9.4%, SABRINE 100.0% and 100.0%.

Overview of Fault Injection in Device Drivers
● Drivers come from third-party developers
● They are defect-prone (due to concurrency and hardware dependencies)
● If drivers fail, the OS should avoid an escalation (stalls, data corruptions, …)

Fault injection in device drivers
Figure: a safety-critical system, with applications running on the FIN.X-RTOS kernel and its device drivers:
1. a fault is injected into the driver's code
2. the device driver is in an error state
3. the error state propagates to the kernel

The code mutation approach
complex_routine(...) {
    ...
    /* Candidate fault "missing logical clause among several":
     * drop one of the && operands below. */
    if ((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5906) &&
        (off >= NIC_SRAM_STATS_BLK) &&
        (off < NIC_SRAM_TX_BUFFER_DESC)) {
        /* Candidate fault "missing variable initialization in a
         * complex IF construct": drop the assignment below. */
        *val = 0;
        return;
    }
    ...
}
Code mutation mimics software faults by making small “faulty” changes to the target code, to emulate programmers' omissions and mistakes.
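SAFE applies such changes automatically; the sketch below only shows, written by hand, what the two annotated faults could look like on a simplified and hypothetical version of the routine (the address constants, the is_rev_5906 flag and the placeholder device read are all assumptions).

```c
#include <assert.h>
#include <stdint.h>

#define STATS_BLK_START 0x300u  /* hypothetical address window */
#define TX_DESC_START   0x400u

/* Simplified, hypothetical version of the routine: on one chip revision,
 * reads inside [STATS_BLK_START, TX_DESC_START) must return 0. */
void read_mem(int is_rev_5906, uint32_t off, uint32_t *val)
{
    if (is_rev_5906 && off >= STATS_BLK_START && off < TX_DESC_START) {
        *val = 0;
        return;
    }
    *val = off * 2u;            /* placeholder for the real device read */
}

/* Mutant: "missing logical clause among several" (upper bound dropped),
 * so the special case also swallows reads above the window. */
void read_mem_missing_clause(int is_rev_5906, uint32_t off, uint32_t *val)
{
    if (is_rev_5906 && off >= STATS_BLK_START) {
        *val = 0;
        return;
    }
    *val = off * 2u;
}

/* Mutant: "missing variable initialization" (*val = 0 dropped), so the
 * caller keeps whatever value *val held before the call. */
void read_mem_missing_init(int is_rev_5906, uint32_t off, uint32_t *val)
{
    if (is_rev_5906 && off >= STATS_BLK_START && off < TX_DESC_START)
        return;
    *val = off * 2u;
}
```

With off = 0x500 the original returns 0xA00 while the missing-clause mutant returns 0; with off = 0x350 the missing-initialization mutant leaves *val untouched, so the caller reads a stale value.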

SAFE: SoftwAre Fault Emulator
● Automatic generation and execution of fault injection tests
● Seamless integration with the project under development
● Supports the injection of an extensive and realistic fault suite

Automating fault injection using the SAFE tool
Figure: the source code of the target component goes through source code analysis and code mutation, driven by a fault library, producing many mutated versions of the target component (e.g., if(a && b) { c=1; } becomes if(a && b) { c=2; }). The software under test (APP, LIB, MW, OS and DD layers) produces logs, which are turned into results.
1. The target component is replaced with a faulty version
2. The software is exercised under a real or simulated environment
3. These steps are iterated several times (one iteration per faulty version)
4. Dependability measures are computed from raw data

Examples of robustness issues in the FIN.X-RTOS kernel
● Fault injection found several robustness issues in the OS in handling faulty drivers
○ Not detectable through traditional testing techniques
Figure: a fault injected into the Ethernet device driver makes it write uninitialized data; this leads to the crash of an OS thread that manages periodic events, and then to a stall of OS services.

Feedback to developers
● The use of corrupted memory caused an illegal memory access exception
● When the sirq-timer kernel thread is killed, timer functions in the kernel can no longer be executed
● To avoid this situation, the kernel's exception handler should be modified to restart a kernel thread when an exception occurs, instead of terminating it
● In this way, the kernel could preserve the execution of other timer functions when a timer function fails due to a faulty driver

Fault injection on user-space interfaces
● The current trend of increasing application complexity raises the opportunities for bad data values to circulate within a system
● The POSIX OS system calls must gracefully deal with such exceptional conditions
○ Which COTS POSIX OS is the most robust?
○ Are errors detected and handled? How?


Robustness testing of POSIX OSs
Based on the IEEE 1003.1b standard, a tester process generates faulty inputs for the OS, using a set of predefined data types.
In the ideal case (the test oracle), the OS should return an error code from the syscall to the tester process; crashes, stalls, and wrong error codes are robustness failures.
The outcome of tests is classified according to the C.R.A.S.H. failure scale.

Robustness testing of POSIX OSs
Failure type | Description
Catastrophic | The OS state becomes corrupted; the machine crashes and reboots
Restart | The OS never returns from a system call; the calling process is stalled and needs to be restarted
Abort | The OS terminates the caller process in an abnormal way
Silent | The OS system call does not return an error code
Hindering | The OS system call returns a misleading error code
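A test driver has to map what it observed for each call onto this scale. The sketch below shows one possible mapping; struct observation and its fields are hypothetical, expected_error is assumed to be 0 when the oracle expects success, and the checks are ordered from the most to the least severe class so that each test is reported with its worst outcome.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CRASH_ROBUST,        /* expected behavior */
    CRASH_CATASTROPHIC,
    CRASH_RESTART,
    CRASH_ABORT,
    CRASH_SILENT,
    CRASH_HINDERING
} crash_class;

/* Hypothetical summary of what the test driver observed for one call. */
struct observation {
    bool machine_crashed;    /* OS state corrupted, machine rebooted */
    bool call_hung;          /* the system call never returned */
    bool caller_killed;      /* the tester process terminated abnormally */
    bool error_returned;     /* the system call reported an error */
    int  error_code;         /* the errno value, if an error was reported */
};

/* Maps an observation onto the C.R.A.S.H. scale. `invalid_input` and
 * `expected_error` come from the test oracle for the parameters used. */
crash_class classify(const struct observation *o, bool invalid_input,
                     int expected_error)
{
    if (o->machine_crashed)
        return CRASH_CATASTROPHIC;
    if (o->call_hung)
        return CRASH_RESTART;
    if (o->caller_killed)
        return CRASH_ABORT;
    if (invalid_input && !o->error_returned)
        return CRASH_SILENT;
    if (o->error_returned && o->error_code != expected_error)
        return CRASH_HINDERING;
    return CRASH_ROBUST;
}
```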


Robustness testing of POSIX OSs
System calls are invoked with combinations of both valid and invalid parameters (invalid memory addresses, non-existing paths, ...)


Robustness testing of POSIX OSs
Figure: test values for each parameter of write(int filedes, const void *buffer, size_t nbytes):
● File descriptor: FD_CLOSED, FD_OPEN_READ, FD_OPEN_WRITE, FD_NOEXIST, ...
● Memory buffer: BUF_SMALL_1, BUF_MED_PAGESIZE, BUF_LARGE_512MB, BUF_HUGE_2GB, ...
● Size: SIZE_1, SIZE_16, SIZE_PAGE, SIZE_PAGEx16plus1, ...
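Test cases are the combinations of these values. The sketch below enumerates the cross product for write() using only the values listed in the figure (the "..." entries are left out) and decodes a test index into one value per parameter; turning a name such as FD_CLOSED into an actual argument would be the job of per-type constructors, which are hypothetical and not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Test values from the figure; the "..." entries are omitted. */
static const char *fd_values[]   = { "FD_CLOSED", "FD_OPEN_READ",
                                     "FD_OPEN_WRITE", "FD_NOEXIST" };
static const char *buf_values[]  = { "BUF_SMALL_1", "BUF_MED_PAGESIZE",
                                     "BUF_LARGE_512MB", "BUF_HUGE_2GB" };
static const char *size_values[] = { "SIZE_1", "SIZE_16",
                                     "SIZE_PAGE", "SIZE_PAGEx16plus1" };

#define COUNT(arr) (sizeof (arr) / sizeof (arr)[0])

/* Returns the number of write() test cases (the cross product of the
 * value lists) and, if i is in range, writes the name of the i-th case. */
size_t write_test_case(size_t i, char *name, size_t cap)
{
    size_t total = COUNT(fd_values) * COUNT(buf_values) * COUNT(size_values);

    assert(name != NULL && cap > 0);
    if (i < total) {
        size_t s = i % COUNT(size_values);
        size_t b = (i / COUNT(size_values)) % COUNT(buf_values);
        size_t f = i / (COUNT(size_values) * COUNT(buf_values));
        snprintf(name, cap, "write(%s, %s, %s)",
                 fd_values[f], buf_values[b], size_values[s]);
    }
    return total;
}
```

With four values per parameter this already gives 4 × 4 × 4 = 64 test cases, which is why the value dictionaries are kept short and reused across system calls.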

Robustness testing of POSIX OSs
All OSs studied failed to provide correct error handling in a substantial portion of tests. Indeed, the POSIX standard does not require comprehensive exception reporting; but it seems likely that a growing number of applications will need it.

Robustness testing of POSIX OSs
● Invalid inputs that were often associated with a robustness failure:
○ 94.0% of invalid file pointers (excluding NULL)
○ 82.5% of NULL file pointers
○ 49.8% of invalid buffer pointers (excluding NULL)
○ 46.0% of NULL buffer pointers
○ 44.3% of MININT integer values
○ 36.3% of MAXINT integer values

Adding “state” to robustness testing
● The test plan includes an additional state variable S = {s1, s2, …, sn} that reflects the allocated OS resources and processes at the time of the test
● A State Setter program is used in conjunction with the robustness test driver

CUT State Model
● The states to be tested are defined using a state model of the Component Under Test (CUT)
● The state model should:
○ be easy to set and control by the tester
○ represent the state at a level of abstraction high enough to keep the number of test cases reasonably small
○ include those configurations that are the most influential on the component behavior
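Keeping the state model abstract matters because the number of states is the product of the number of levels of each attribute, and every state multiplies the robustness test suite. The sketch below enumerates the states of a hypothetical three-attribute model (with values borrowed from the file system model) by decoding a state index in mixed radix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical abstract state model: each attribute has a few levels. */
static const char *partition[]  = { "primary", "logical" };
static const char *fs_impl[]    = { "ext2", "ext3", "NTFS" };
static const char *free_space[] = { "low", "medium", "high" };

#define LEVELS(arr) ((int)(sizeof (arr) / sizeof (arr)[0]))

/* Number of abstract states: the product of the number of levels. */
int num_states(void)
{
    return LEVELS(partition) * LEVELS(fs_impl) * LEVELS(free_space);
}

/* Decodes state index k (mixed radix) into its attribute values. */
void describe_state(int k, char *out, size_t cap)
{
    int p, f, s;

    assert(k >= 0 && k < num_states());
    p = k % LEVELS(partition);
    f = (k / LEVELS(partition)) % LEVELS(fs_impl);
    s = k / (LEVELS(partition) * LEVELS(fs_impl));
    snprintf(out, cap, "%s/%s/%s", partition[p], fs_impl[f], free_space[s]);
}
```

Here 2 × 3 × 3 = 18 states; running the same suite of N robustness tests in each of them yields 18N test cases, so every additional attribute level has to be justified by its influence on the component.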

Modeling the File System state
FileSystem attributes | Type | Notes
Partition type | {Primary, Logical} | The partition on which the FS is installed
Partition size | Byte | Total partition space
Partition allocated | Byte | Used partition space
FS implementation | {ext2, ext3, NTFS} | The filesystem module to be loaded
Number of files | Integer | Must be preallocated before the tests
Number of directories | Integer | Must be preallocated before the tests
FS layout | {Balanced, Unbalanced} | Randomly generated (the probability that a new directory d_{k+1} is appended to d_j is (inversely) proportional to depth(d_j))
○ Attributes have been defined to cover a large set of test scenarios, and at the same time to keep the number of test cases low
◌ Numeric attributes: a subset of values is considered (e.g., free space = {low, medium, high})
◌ Categorical attributes (e.g., partition type = {primary, logical})
◌ Random attributes: values are defined in terms of random distributions (e.g., Balanced and Unbalanced are two distributions for FS layout)
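The random layouts can be generated incrementally, choosing the parent of each new directory with a probability proportional to a weight of its depth. In the sketch below, the weight 1/depth(d_j) favors shallow, bushy trees and the weight depth(d_j) favors deep chains; treating these as the Balanced and Unbalanced distributions is an assumption, as is the use of the C library's rand().

```c
#include <assert.h>
#include <stdlib.h>

/* Weight of a candidate parent directory given its depth (root depth = 1). */
static double weight(int depth, int inverse)
{
    return inverse ? 1.0 / depth : (double)depth;
}

/* Builds a random tree of n directories (index 0 is the root). Directory k
 * is appended to an existing directory j < k chosen with probability
 * proportional to weight(depth(j)). */
void random_layout(int n, int inverse, unsigned seed, int *parent, int *depth)
{
    assert(n >= 1);
    srand(seed);
    parent[0] = -1;
    depth[0] = 1;
    for (int k = 1; k < n; k++) {
        double total = 0.0, acc, r;
        int j = 0;

        for (int i = 0; i < k; i++)
            total += weight(depth[i], inverse);
        r = total * ((double)rand() / ((double)RAND_MAX + 1.0)); /* in [0, total) */
        acc = weight(depth[0], inverse);
        while (acc <= r && j < k - 1)      /* roulette-wheel selection */
            acc += weight(depth[++j], inverse);
        parent[k] = j;
        depth[k] = depth[j] + 1;
    }
}
```

Seeding the generator makes a layout reproducible, so a failing stateful test can be rerun on exactly the same directory tree.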

Modeling the File System state
OperationalProfile attributes | Type | Notes
Number of tasks performing I/O | Integer | Tasks performing random I/O
Average number of ops./s | Integer | I/O syscalls performed by the tasks
Ratio between read/write ops. | Float | Types of I/O syscalls performed by the tasks
● OperationalProfile defines the number and type of I/O operations running in the background
○ Processes are instantiated in order to exercise FS-internal resources (e.g., buffers, caches, locks)
● Attributes of an Item are selected according to random (user-defined) distributions
○ E.g., the name length and size of a file assume a value within a range, selected using a uniform distribution

Examples of failure distribution
Stateful tests caused restart failures that did not happen in the stateless tests.
Function | # Tests | Stateless: # Restart | Stateless: # Abort | Stateful: # Restart | Stateful: # Abort
access() | 3,986 | 0 | 4 | 1 | 4
dup2() | 3,954 | 0 | 0 | 1 | 0
lseek() | 3,977 | 0 | 0 | 0 | 0
mkfifo() | 3,870 | 0 | 5 | 1 | 5
mmap() | 4,003 | 0 | 0 | 0 | 0
open() | 3,988 | 0 | 8 | 40 | 8
read() | 3,924 | 0 | 253 | 1 | 253
unlink() | 500 | 0 | 1 | 0 | 1
write() | 3,989 | 0 | 68 | 4 | 68
Total | 32,191 | 0 | 339 | 48 | 339

Increased coverage for “corner-cases”
static struct dentry * real_lookup(...) { // fs/namei.c:478
    /* --- OMISSIS (DECLARATIONS) */
    mutex_lock(&dir->i_mutex);
    result = d_lookup(parent, name);    /* the cache lookup */
    if (!result) {
        /* --- OMISSIS (PERFORMS LOOK-UP) --- */
        mutex_unlock(&dir->i_mutex);
        return result;
    }
    /*
     * Uhhuh! Nasty case: the cache was re-populated while
     * we waited on the semaphore. Need to revalidate.
     */
    /* Code not covered before: */
    mutex_unlock(&dir->i_mutex);
    if (result->d_op && result->d_op->d_revalidate) {
        result = do_revalidate(result, nd);
        if (!result)
            result = ERR_PTR(-ENOENT);
    }
    return result;
}

Increased coverage for “corner-cases”
int try_to_free_buffers(struct page *page) { // fs/buffer.c:3057
    /* --- OMISSIS (declarations) --- */
    BUG_ON(!PageLocked(page));
    if (PageWriteback(page))
        return 0;
    if (mapping == NULL) { /* can this still happen? */
        ret = drop_buffers(page, &buffers_to_free);
        goto out;
    }
    /* --- OMISSIS (page writeback and deallocation) --- */
}
• I/O buffers used by a transaction are marked by “mapping == NULL”
• If free memory is low, the page cache management looks for pages that can be freed (checks with drop_buffers())
• It is a rare condition that happens under stress
Statement coverage improvement ranged between 0.49% and 15.11% (especially for journal- and driver-related code)

Statement coverage
Source file | Stateless robustness testing | Stress testing | Stateful robustness testing
fs/binfmt_elf.c | 319/850 (37.53%) | 331/850 (38.94%) | 332/850 (39.06%)
fs/buffer.c | 529/1320 (40.08%) | 553/1320 (41.89%) | 565/1320 (42.80%)
fs/dcache.c | 371/880 (42.16%) | 341/880 (38.75%) | 387/880 (43.98%)
fs/exec.c | 479/807 (59.36%) | 392/807 (48.57%) | 486/807 (60.22%)
fs/fs-writeback.c | 146/273 (53.48%) | 169/273 (61.90%) | 174/273 (63.74%)
fs/inode.c | 252/527 (47.82%) | 307/527 (58.25%) | 316/527 (59.96%)
fs/namei.c | 918/1392 (65.95%) | 626/1392 (44.97%) | 925/1392 (66.45%)
fs/select.c | 237/402 (58.96%) | 237/402 (58.96%) | 239/402 (59.45%)
fs/ext3/balloc.c | 384/556 (69.06%) | 385/556 (69.24%) | 398/556 (71.58%)
fs/ext3/dir.c | 140/219 (63.93%) | 143/219 (65.30%) | 144/219 (65.75%)
fs/ext3/ialloc.c | 181/337 (53.71%) | 186/337 (55.19%) | 189/337 (56.08%)
fs/ext3/inode.c | 719/1204 (59.72%) | 729/1204 (60.55%) | 737/1204 (61.21%)
fs/ext3/namei.c | 607/1088 (55.79%) | 654/1088 (60.11%) | 781/1088 (71.78%)
fs/jbd/checkpoint.c | 102/263 (38.78%) | 141/263 (53.61%) | 142/263 (53.99%)
fs/jbd/commit.c | 300/362 (82.87%) | 302/362 (83.43%) | 318/362 (87.85%)
fs/jbd/revoke.c | 108/228 (47.37%) | 105/228 (46.05%) | 116/228 (50.87%)
fs/jbd/transaction.c | 489/697 (70.16%) | 500/697 (71.74%) | 545/697 (78.19%)

Conclusion
● Residual faults are hidden in our software, and they will eventually manifest themselves during operation
● Software Fault Injection is a means to assess and mitigate their impact before releasing the product
● It is a reasonably mature technology that can now be adopted in complex software systems

Research publications
● Assessing Dependability with Software Fault Injection: A Survey. R. Natella, D. Cotroneo, H. Madeira. ACM Computing Surveys (CSUR), Vol. 48, No. 3, pp. 44:1-44:55, 2016
● Fault Injection for Software Certification. D. Cotroneo, R. Natella. IEEE Security & Privacy, Vol. 11, No. 4, pp. 38-45, July/August 2013
● On Fault Representativeness of Software Fault Injection. R. Natella, D. Cotroneo, J. Duraes, H. Madeira. IEEE Transactions on Software Engineering (TSE), Vol. 39, No. 1, pp. 80-96, 2013
● SABRINE: StAte-Based Robustness testIng of operatiNg systEms. D. Cotroneo, D. Di Leo, F. Fucci, R. Natella. Proc. 28th IEEE/ACM International Conference on Automated Software Engineering (ASE 2013)
● A Case Study on State-Based Robustness Testing of an Operating System for the Avionic Domain. D. Cotroneo, D. Di Leo, R. Natella, R. Pietrantuono. Proc. 30th International Conference on Computer Safety, Reliability and Security (SAFECOMP 2011)
