What activates a bug? A refinement of the Laprie terminology model.
1. WAP: What activates a bug? A refinement of the Laprie terminology model
Peter Tröger*, Lena Feinbube+, Matthias Werner*
26th IEEE International Symposium on Software Reliability Engineering
* Operating Systems Group, Technical University Chemnitz, Germany
+ Operating Systems and Middleware Group, Hasso-Plattner-Institute, Germany
2. peter.troeger@informatik.tu-chemnitz.de, ISSRE 2015
Describing Software Bugs
▶ 'Buggy' code produces an error only in the 'right' state
▶ Dormant design fault, activated by execution?
▶ Dormant design fault, activated for some state of argv[1]?
▶ Erroneous argument as external fault?
▶ Erroneous argument as propagating error?
▶ Mandelbug?
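#include <string.h>

#define BUFSIZE 256
int main(int argc, char **argv) {
    char buf[BUFSIZE];
    /* Unchecked copy: overflows buf if argv[1] holds BUFSIZE or more characters. */
    strcpy(buf, argv[1]);
}
[CWE ID 121]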
3. Terminology in Use
▶ Meta-study of 144 SE papers
▶ Different terminology models in use
▶ Orthogonal Defect Classification (ODC)
R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, and M.-Y. Wong, "Orthogonal defect classification - a concept for in-process measurements," IEEE Transactions on Software Engineering, vol. 18, no. 11, pp. 943–956, 1992.
▶ IEEE Software Engineering Glossary
J. Radatz, A. Geraci, and F. Katki, "IEEE Standard Glossary of Software Engineering Terminology," IEEE Std 610.12-1990, 1990.
▶ Laprie / Avizienis
A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, 2004.
▶ Binder
R. Binder, "Testing object-oriented systems: models, patterns, and tools," Addison-Wesley Professional, 2000.
▶ Cristian
F. Cristian, "Understanding fault-tolerant distributed systems," Communications of the ACM, vol. 34, pp. 56–78, 1991.
4. Goal
▶ Vocabulary is crucial
▶ Fault, error, failure, defect, bug, problem, recovery, outage, crash, ...
▶ Team communication, document writing
▶ Terminology model for software bugs
▶ Focus on state-related issues
▶ Approach: Refine the proven existing terminology
▶ Step 1: Unambiguous description of "the" Laprie model
▶ Step 2: Create system model for software specifics
▶ Step 3: Refine the Laprie model accordingly
[Figure: The dependability and security tree (Fig. 2 in Avizienis et al., 2004).]
5. Step 1: Failure Automaton
[Figure 1: Failure automaton with the classical Laprie terminology [6].
States: Normal, Dormant Fault, Active Fault / Latent Error, Detected Error, Outage.
Transition labels: External Fault, Internal Fault, Activation, Detection, Failure, Restoration, Error Handling.]
[quotes from Avizienis et al., 2001]
▶ "The delivery of incorrect service is a system outage."
▶ "A system failure is an event that occurs when the delivered service deviates from correct service."
▶ "Fault prevention: how to prevent the occurrence or introduction of faults."
▶ "Errors that are present but not detected are latent errors."
▶ "A fault is active when it produces an error, otherwise it is dormant. ... Most internal faults cycle between their dormant and active states."
▶ "A transition from incorrect service to correct service is service restoration."
6. Step 2: System Model for Software
▶ Recursive concept for n-tier systems
▶ Investigated layer I
▶ State vector sI: Correct or incorrect
▶ Incorrect sI (error) may be detectable and/or externally visible (failure)
▶ Input / output through environment
▶ Next state depends on environment
▶ Environment layer E
▶ State vector sE: Expected or unexpected
▶ Can progress on its own
[Figure 2: Abstract system model. Labels: Investigated Layer I, Environment Layer E, Internal, External, Input, Output.]
The choice of the right granularity of layers is a widely debated topic. Often, it is discussed with the unit-of-mitigation idea in mind [29]: The smallest acceptable granularity is the one where dependability strategies, such as spatial fault tolerance, are still implementable in the layer itself.
The failure of the highest investigated system layer is the one that ultimately becomes visible to the user, since it offers the service interface of the system. The user can be either [...]
[...] Laprie understanding. Error propagation happens from the environment up to the investigated layer. The failure of the environment layer therefore influences the fault conditions in the investigated layer.
It may be argued that environment layer failures can lead to a direct system failure, without prior error propagation through the investigated layer. We argue here that, given the assumptions above, this should also be interpreted as a case of (immediate) error propagation. One example is the crash of an application server (environment layer) that hosts a web application (investigated layer). In this case, propagation occurs in the form of an implicit termination of the running application as a distinct progress step.
It can also be argued that error propagation may happen inside the investigated layer as well. However, for such cases, a separate system model at a lower level of granularity can be defined.
B. System State
The possible states of a layer can be described as a state space $\Omega$. We interpret one state from this set as an arbitrarily complex vector of information, implemented in hardware or software. Or, to use the words of Avižienis et al. [5]:
"The total state of a given system is the set of the following states: computation, communication, stored information, interconnection, and physical condition."
Given a chosen granularity, both the investigated layer I and the environment layer E have current state vectors $s_I \in \Omega_I$ and $s_E \in \Omega_E$ at any discrete point in time. $\Omega_I$ and $\Omega_E$ may overlap in parts, for example when one physical memory location in a computer contributes to both state sets.
The investigated layer has a set of correct states $X_I \subset \Omega_I$ which lead to a non-failing operation of the system. The environment layer is typically a black box for the developer, so we denote $X_E \subset \Omega_E$ as its expected states, which just expresses an external observer assumption regardless of whether this is an internal error state for E. The current states $s_I$ and $s_E$ may or may not be in the set of correct resp. expected states.
Most software systems contain both volatile state and persistent state at any given time. Any restart or other kind of recovery resets only the volatile state, so the persistent state [...]
[...] outside world. Instead, multiple levels of input buffers and caches make the consumed input a part of the environment layer. Similarly, the generation of output by the investigated layer is modeled as a triggered state change in the environment layer, and not as a direct action. The investigated layer therefore has no input or output events of its own.
Both the environment and the investigated layer need a progress concept, meaning that their states evolve at discrete points in time. The most common approach to modeling state changes is to use discrete events, each standing for an 'atomic' execution step. The atomicity may, for example, refer to the processor hardware, to the semantics of the programming language (as with C sequence points), or to the execution model of the virtual runtime environment (as with loop-based PLC computers).
The choice of the next active state in I depends on the combination of $s_I$ and $s_E$, specifically at the moment when the execution step happens.
Progress in I potentially also changes the state in E, for example when system calls take place. Therefore, we assume mutual influence between the layers: a direct one from the investigated layer to the environment layer, and an indirect one from the environment layer to the investigated layer.
In E, the decision about the next state relies only on the current $s_E$.
The assumptions of our abstract system model are summarized in Table II.
Table II. State concept for the abstract system model.
Investigated layer states: $\Omega_I$
Current state: $s_I \in \Omega_I$
Correct states: $X_I \subset \Omega_I$
Incorrect states: $E_I = \Omega_I \setminus X_I$
Detectable incorrect states: $D_I \subset E_I$
Externally visible incorrect states: $F_I \subset E_I$
Investigated layer progress: $f_I : \Omega_I \times \Omega_E \to \Omega_I \times \Omega_E$, with $f_I : (s_I, s_E)(t) \mapsto (s_I, s_E)(t+1)$
Environment layer states: $\Omega_E$
Current state: $s_E \in \Omega_E$
Expected states: $X_E \subset \Omega_E$
Unexpected states: $E_E = \Omega_E \setminus X_E$
Environment layer progress: $f_E : \Omega_E \to \Omega_E$, with $f_E : s_E(t) \mapsto s_E(t+1)$
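The two progress functions of Table II can be made concrete with a toy model. The following is only a sketch under stated assumptions: the layer contents (`pending`, `stored`), the `CAPACITY` limit and all function names are hypothetical and not taken from the talk. It shows that the environment steps using its own state alone, while a step of the investigated layer reads both states and may change both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy layers; names and fields are illustrative only. */
typedef struct { size_t pending; } env_state;  /* s_E: input units waiting in E */
typedef struct { size_t stored; } inv_state;   /* s_I: units stored inside I */
typedef struct { inv_state sI; env_state sE; } joint_state;

enum { CAPACITY = 4 };  /* assumed limit separating correct from incorrect states */

/* Membership test for the correct states X_I. */
bool in_X_I(inv_state sI) { return sI.stored <= CAPACITY; }

/* Membership test for the expected states X_E (an observer's assumption). */
bool in_X_E(env_state sE) { return sE.pending <= CAPACITY; }

/* f_E: the environment progresses on its own, using only s_E. */
env_state f_E(env_state sE) {
    sE.pending += 1;  /* one more input unit arrives */
    return sE;
}

/* f_I: the next state of I depends on s_I and s_E, and the step also changes E. */
joint_state f_I(inv_state sI, env_state sE) {
    joint_state next;
    next.sI.stored = sI.stored + sE.pending;  /* deliberately no bounds check */
    next.sE.pending = 0;                      /* I consumes the input held in E */
    return next;
}

/* Usage sketch: five environment steps leave X_E; one step of I then leaves X_I. */
void demo_progress(void) {
    env_state sE = {0};
    inv_state sI = {0};
    for (int k = 0; k < 5; k++) sE = f_E(sE);
    assert(!in_X_E(sE));
    joint_state next = f_I(sI, sE);
    assert(!in_X_I(next.sI));
    assert(in_X_E(next.sE));
}
```

In this toy model the unexpected environment state is what drives the investigated layer into an incorrect state, which mirrors the indirect influence from E to I described above.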
7. Refined Terminology
▶ Refined definition of 'software fault':
Minimal set of code deviations from correct code, such that the execution of the deviating code can trigger an error.
▶ Fault Model: Description of possibilities for faulty code
▶ Fault Condition Model: Description of fault-enabling system states
▶ Fault Enabling: Change of system state to allow some error
▶ Fault Activation: Execution of faulty code leading to that error
8. Step 3: Refined Failure Automaton
[Figure 3: Failure automaton with fault activation conditions.]
States and their state conditions:
▶ Disabled Fault: $s_I \in X_I$, $s_E \in X_E$
▶ Dormant Fault: $s_I \in X_I$, $s_E \in \Omega_E$
▶ Active Fault / Latent Error: $s_I \in E_I$, $s_E \in \Omega_E$
▶ Detected Error: $s_I \in D_I$, $s_E \in \Omega_E$
▶ Outage: $s_I \in F_I$, $s_E \in \Omega_E$
Transition labels: EXECF, CON (Enabling), COFF (Disabling), EXECF (Activation), Deactivation, Detection, Mitigation, Restoration, Recovery, FAIL.
Some events may occur in more than one of the states: CON (fault condition is fulfilled now), COFF (fault condition is no longer fulfilled), EXECF (faulty code is executed), FAIL (failure).
EXECF: Event when faulty code is executed
CON: Event when activation condition is established
COFF: Event when activation condition is no longer given
EI: Incorrect states
DI: Detectable incorrect states
FI: Externally visible incorrect states
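The enabling and activation part of the refined automaton can be written as a small transition function. This is a hedged sketch, not the authors' formalization: it covers only the states Disabled Fault, Dormant Fault and Active Fault / Latent Error, and it assumes that an event without a labeled outgoing transition leaves the state unchanged. The type and function names are hypothetical. Detection, Deactivation, FAIL, Mitigation, Restoration and Recovery are left out because their endpoints cannot be read off the transcript.

```c
#include <assert.h>

/* Only the first three states of the refined automaton are modeled here. */
typedef enum { DISABLED_FAULT, DORMANT_FAULT, ACTIVE_FAULT } fault_state;

/* CON: fault condition fulfilled; COFF: no longer fulfilled; EXECF: faulty code executed. */
typedef enum { EV_CON, EV_COFF, EV_EXECF } fault_event;

fault_state fault_step(fault_state s, fault_event ev) {
    switch (s) {
    case DISABLED_FAULT:
        /* Assumption: executing faulty code while the fault is disabled has no effect. */
        return ev == EV_CON ? DORMANT_FAULT : DISABLED_FAULT;  /* CON = enabling */
    case DORMANT_FAULT:
        if (ev == EV_COFF) return DISABLED_FAULT;   /* disabling */
        if (ev == EV_EXECF) return ACTIVE_FAULT;    /* activation */
        return DORMANT_FAULT;
    case ACTIVE_FAULT:
    default:
        /* Later transitions are omitted in this sketch, so the state is kept. */
        return ACTIVE_FAULT;
    }
}

/* Usage sketch: activation needs enabling first, then execution of the faulty code. */
void demo_fault_step(void) {
    fault_state s = DISABLED_FAULT;
    s = fault_step(s, EV_EXECF);
    assert(s == DISABLED_FAULT);
    s = fault_step(s, EV_CON);
    assert(s == DORMANT_FAULT);
    s = fault_step(s, EV_EXECF);
    assert(s == ACTIVE_FAULT);
}
```

The point of the sketch is the ordering constraint: EXECF alone does not produce an error; it only does so after CON has moved the fault from disabled to dormant.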
9. Discussion
▶ Most Laprie concepts remain the same
▶ Fault prevention, fault removal and fault tolerance
▶ External physical faults may impact both sI and sE
▶ Fault handling in the original sense is now fault disabling or fault removal
▶ Might be interesting to focus on fault disabling
▶ Adding software dependencies makes fault disabling harder
▶ Activation conditions with unexpected environment states are key
▶ How to test that?
10. Describing Software Bugs
▶ Unexpected input: Fault enabling due to input received in the environment
▶ Race condition: Fault enabling / disabling due to timing of the environment
▶ Missing libraries: Fault enabling immediately on application start, no COFF
▶ Automatic variable initialization: Reduction of activation conditions
▶ Common-cause error: Same sE in multiple activation conditions
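#include <string.h>

#define BUFSIZE 256
int main(int argc, char **argv) {
    char buf[BUFSIZE];
    /* Unchecked copy: overflows buf if argv[1] holds BUFSIZE or more characters. */
    strcpy(buf, argv[1]);
}
[CWE ID 121]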
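For the strcpy example, the fault condition can be written down as a predicate over the environment state, here the argument string. This is a minimal sketch with hypothetical function names. Treating a missing argument (NULL) as fault-enabling is an assumption added here; the slides only speak of "some state of argv[1]".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUFSIZE 256

/* Fault condition: true when strcpy(buf, arg) into a BUFSIZE-byte buffer would misbehave. */
bool fault_enabled(const char *arg) {
    /* strcpy writes strlen(arg) + 1 bytes, so BUFSIZE or more characters overflow. */
    return arg == NULL || strlen(arg) >= BUFSIZE;
}

/* Executes the copy only while the fault is disabled; returns false otherwise. */
bool guarded_copy(char buf[BUFSIZE], const char *arg) {
    if (fault_enabled(arg)) return false;
    strcpy(buf, arg);
    return true;
}

/* Usage sketch: a long argument enables the fault, a short one does not. */
void demo_fault_condition(void) {
    char buf[BUFSIZE];
    char long_arg[BUFSIZE + 1];
    memset(long_arg, 'A', BUFSIZE);
    long_arg[BUFSIZE] = '\0';
    assert(fault_enabled(long_arg));
    assert(!guarded_copy(buf, long_arg));
    assert(guarded_copy(buf, "short"));
    assert(strcmp(buf, "short") == 0);
}
```

In the refined terminology, receiving the long argument corresponds to fault enabling (CON), and the unguarded strcpy call corresponds to fault activation (EXECF). The guard shown here is one possible way to keep the faulty copy from executing while the condition holds.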
11. Summary
▶ Refinement of the proven Laprie terminology model
▶ Separation of code defects and their enabling states
▶ Separation of investigated and environment layer
▶ Basic concepts (propagation, fault / error / failure) remain the same
▶ Fault model: Missing and defective code
▶ Fault condition model: System states enabling faults
▶ Error model: System states with activated faults that may lead to failure
12. [Figure 3 repeated: Failure automaton with fault activation conditions.]