1. A Language Support for Exhaustive Fault-Injection in Message-Passing System Models
Masaya Suzuki & Takuo Watanabe
Department of Computer Science
Tokyo Institute of Technology
MOD*2014, Bertinoro
2. About This Work
• Proposes a modeling language, Sandal, aimed at describing fault-prone distributed systems.
- Sandal provides a fixed set of features for describing faults and fault-handling actions
• timeout, message loss, shutdown

• Talk Outline
- Background
- Modeling Faults and Fault-Handling Actions
- Language Features of Sandal
- Case Study: 2PC
- Related Work, Future Work, and Summary
3. Research Background (1): Adaptive Distributed Systems
• Concurrent Context-Oriented Programming
• "A Reflective Approach to Actor-Based Concurrent Context-Oriented Systems" [Watanabe & Takeno, COP 2014]
- asynchronous context manipulation using reflection
• optimistic and pessimistic synchronization

• Verification of context manipulation mechanism
[Figure: observer O and actors A and B, annotated "context change info." and "cross-context message"]
4. Research Background (2): Modeling Human-Made Faults
• Verifying workflows, including recovery processes for human-made faults
• "A Model-Checking Based Approach to Robustness Analysis of Procedures under Human-Made Faults" [Nagatou & Watanabe, APBPM 2014]
- Modeling a system as a set of concurrent processes
- Injecting possible human-made fault actions into the model
• cf. HAZOP
- Model-checking the fault-injected model
- Applications
• Blood Testing, Radar Data Processing, etc.

• Modular fault description mechanism
5. Modular Description of Self-* Behaviors
• Generally, modeling/specification languages need good modularization mechanisms for describing/specifying self-* behaviors and/or non-functional behaviors such as:
- Faults, Fault-Handling Actions
- (Dynamic) Adaptation / Evolution / Self-Updating
- Context-Aware / Context-Oriented Behaviors
- Resource-Aware Actions
- (Application-Aware) Synchronizations
- Security / Safety-Related Behaviors

• cf. Advanced Modularization Mechanisms in Programming Languages: aspect-oriented (AOP), feature-oriented (FOP), and context-oriented (COP) programming, etc.
6. Motivation: Modeling a Faulty System
• From our experience building a complex service on a distributed system: testing alone was not satisfactory in some fault-prone environments
• We tried to borrow the idea of SFI (software fault injection) to describe an abstract model of the service that could then be model-checked.
7. Describing Faults (1)
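• A simple timeout action for a message reception (in Promela)

The original model (w/o timeout):

    ch ? var;
    if
    :: var == Done -> ...
    :: ...
    fi

A model with a timeout action:

    bool recv_timeout = false;
    if
    :: ch ? var;               /* the message arrives */
    :: recv_timeout = true;    /* or the reception times out */
    fi;
    if
    :: var == Done -> ...
    :: ...
    fi

- Note: Promela's timeout primitive cannot be used for this purpose, because it becomes executable only when no other statement in the whole system is executable; it models a global deadlock, not a timeout on a single reception.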
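The point of the nondeterministic if is that the model checker explores both branches, even when a message is already queued. A minimal Python sketch of that exploration, assuming a FIFO channel represented as a tuple of queued messages; recv_with_timeout and all_traces are hypothetical helpers, not part of Promela or Sandal:

```python
from typing import List, Optional, Tuple

Channel = Tuple[str, ...]  # queued messages, head first


def recv_with_timeout(channel: Channel) -> List[Tuple[Channel, Optional[str]]]:
    """Successors of one receive-or-timeout step.

    Mirrors `if :: ch ? var :: recv_timeout = true fi`: the receive branch
    is enabled only when a message is queued, while the timeout branch is
    always enabled. A value of None stands for "timed out".
    """
    successors: List[Tuple[Channel, Optional[str]]] = []
    if channel:
        successors.append((channel[1:], channel[0]))
    successors.append((channel, None))
    return successors


def all_traces(channel: Channel, steps: int) -> List[Tuple[str, ...]]:
    """Every sequence of outcomes of `steps` consecutive receive-or-timeout steps."""
    if steps == 0:
        return [()]
    traces: List[Tuple[str, ...]] = []
    for rest, value in recv_with_timeout(channel):
        label = "timeout" if value is None else value
        for tail in all_traces(rest, steps - 1):
            traces.append((label,) + tail)
    return traces


for trace in all_traces(("Done",), 2):
    print(trace)
```

With one queued Done message and two receptions, the sketch yields three runs: Done then timeout, timeout then Done, and two timeouts.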
8. Describing Faults (2)
• Unexpected-termination actions (each "if :: true; false :: true fi;" line below) must be inserted by hand wherever a process may terminate.
/* Termination point "if :: true; false :: true fi;": the first option
   executes false, which blocks forever (the process has effectively
   terminated); the second option continues normally. */
proctype Arbiter() {
    mtype resp;
    if :: true; false :: true fi;
    worker1_recv ! Ready;
    if :: true; false :: true fi;
    worker2_recv ! Ready;
    if :: true; false :: true fi;
    worker1_send ? resp;
    if :: true; false :: true fi;
    if
    :: resp == NotReady ->
        if :: true; false :: true fi;
        all_ready = false
    :: else
    fi;
    if :: true; false :: true fi;
    worker2_send ? resp;
    if :: true; false :: true fi;
    if
    :: resp == NotReady ->
        if :: true; false :: true fi;
        all_ready = false
    :: else
    fi;
    determined = true;
    if :: true; false :: true fi;
    if
    :: all_ready ->
        if :: true; false :: true fi;
        worker1_recv ! Commit;
        if :: true; false :: true fi;
        worker2_recv ! Commit
    :: else ->
        if :: true; false :: true fi;
        worker1_recv ! Abort;
        if :: true; false :: true fi;
        worker2_recv ! Abort
    fi
}

proctype Worker1() {
    mtype resp;
    if :: true; false :: true fi;
    worker1_recv ? resp;
    if :: true; false :: true fi;
    if
    :: worker1_ready = true;
        if :: true; false :: true fi;
        worker1_send ! Ready
    :: worker1_ready = false;
        if :: true; false :: true fi;
        worker1_send ! NotReady
    fi;
    if :: true; false :: true fi;
    worker1_recv ? worker1_resp
}

proctype Worker2() {
    ...
}
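Each termination point gives the model checker one more way for the process to end early. A minimal Python sketch of the resulting behaviors, assuming a straight-line process (one branch of Worker1, simplified to three statements); termination_traces is a hypothetical helper, not part of Promela or Sandal:

```python
from typing import List


def termination_traces(statements: List[str]) -> List[List[str]]:
    """Traces of a straight-line process with a termination point before
    every statement (hypothetical helper, not part of Promela or Sandal)."""
    traces: List[List[str]] = []
    for stop in range(len(statements)):
        # The "true; false" option was taken at the point before statements[stop]:
        # the process blocks there forever, i.e. it has terminated.
        traces.append(statements[:stop] + ["<terminated>"])
    # Every termination point took the plain "true" option: the process completes.
    traces.append(list(statements))
    return traces


# One branch of Worker1, simplified to a straight line.
worker1 = ["worker1_recv ? resp", "worker1_send ! Ready", "worker1_recv ? worker1_resp"]
for trace in termination_traces(worker1):
    print(trace)
```

On a straight-line path, n termination points therefore contribute n + 1 behaviors, each of which the checker combines with the interleavings of the other processes.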
9. Need for Modular Description Mechanism
• Manually inserting faults and fault-handling actions into a model is itself fault-prone.
• Modeling languages should provide features that support modular descriptions of faults and fault-handling actions.
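To make the contrast concrete, a Python sketch of inserting termination points mechanically instead of by hand, assuming a straight-line proctype body given as a list of statement strings. It only illustrates the idea behind a declarative marker such as @shutdown; Sandal itself translates to NuSMV, and inject_termination is a hypothetical helper, not part of its translator:

```python
from typing import List

# The termination-point snippet used in the hand-written Arbiter/Worker models.
TERMINATION_POINT = "if :: true; false :: true fi;"


def inject_termination(body: List[str]) -> List[str]:
    """Return a copy of a straight-line proctype body with a termination
    point woven in before every statement (illustrative only)."""
    woven: List[str] = []
    for stmt in body:
        woven.append(TERMINATION_POINT)
        woven.append(stmt)
    return woven


body = ["worker1_recv ? resp;", "worker1_send ! Ready;", "worker1_recv ? worker1_resp"]
print("\n".join(inject_termination(body)))
```

The fault description lives in one place (the helper) rather than being repeated at every statement, so it cannot be forgotten at one site.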
10. Current Contribution
• We designed and implemented Sandal, a modeling language aimed at describing fault-prone distributed systems.
• Case studies, including the two-phase commit (2PC) protocol, show the effectiveness of Sandal's language features.
11. Sandal
• A process-oriented modeling language with features for describing typical faults:
- unexpected process termination
- timeout in message reception
- random loss of messages

• Language processor (translator to NuSMV)
- Source code: https://github.com/draftcode/sandal
- Requirements:
• Go (http://golang.org) to build the translator
• NuSMV (http://nusmv.fbk.eu) to verify translated models
14. Unexpected Process Termination & Random Loss of Messages (1)
• @shutdown
- specifies that the process may terminate unexpectedly
• @drop
- specifies that the channel may lose messages
init {
    P0_0: PingProc(ping_to_pong_0, pong_to_ping_0) @shutdown,
    P1_0: PongProc(pong_to_ping_0, ping_to_pong_0) @shutdown,
    ping_to_pong_0: channel { Message } @drop,
    pong_to_ping_0: channel { Message } @drop,
}
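A minimal Python sketch of what @drop adds to the state space, assuming (this slide does not specify it) that a lost message simply disappears and the remaining messages keep their order. With n sends, the checker must consider all 2^n combinations of delivered and lost messages; delivered_sequences is a hypothetical helper:

```python
from itertools import product
from typing import List, Tuple


def delivered_sequences(messages: List[str], may_drop: bool) -> List[Tuple[str, ...]]:
    """All sequences a receiver may observe on one FIFO channel.

    Hypothetical model of a channel annotated with @drop: each sent
    message is either delivered in order or silently lost. Without
    @drop, exactly one sequence (everything, in order) is possible.
    """
    if not may_drop:
        return [tuple(messages)]
    observed: List[Tuple[str, ...]] = []
    # Each True/False vector says which messages survive the channel.
    for keep in product([True, False], repeat=len(messages)):
        observed.append(tuple(m for m, k in zip(messages, keep) if k))
    return observed


print(delivered_sequences(["Ping1", "Ping2"], may_drop=False))
print(delivered_sequences(["Ping1", "Ping2"], may_drop=True))
```

In the ping-pong model above, the empty sequence is the case that matters: a receiver waiting for a reply that was lost blocks forever unless it also has a timeout.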
18. Case Study: Experimental Result
Property to be checked:
$$(\mathtt{arbiter.determined} \land \lnot\,\mathtt{arbiter.all\_ready}) \rightarrow (\lnot(\mathtt{worker1.resp} = \mathtt{Commit}) \land \lnot(\mathtt{worker2.resp} = \mathtt{Commit}))$$
Result:
                     Time       LOC      Memory
No Fault             0.96 sec   51       26.4 MB
With Timeout         2.88 sec   51 (6)   21.8 MB
With Message Loss    2.11 sec   51 (8)   11.9 MB
With Termination     0.51 sec   51 (6)   17.1 MB
Arch Linux (Kernel 3.12.7) Intel Core i7-3370K @ 3.50GHz 16GB Memory
NuSMV 2.5.4 (CUDD 2.4.1 MiniSat2-070721), Spin 6.2.5
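To show what checking this property involves, a deliberately simplified Python sketch of explicit-state model checking for the Arbiter of the 2PC case study: each worker's vote is fixed nondeterministically at the start, the Arbiter reads both votes, sets determined, and sends its decision, and (when may_crash is set) it may terminate before any step. Workers never crash and channels are abstracted away. State, successors, invariant, and check are hypothetical names, and this breadth-first search is not how NuSMV (a symbolic checker) works:

```python
from collections import deque
from itertools import product
from typing import List, NamedTuple, Tuple


class State(NamedTuple):
    votes: Tuple[str, ...]  # worker votes, chosen nondeterministically at start
    pc: int                 # arbiter program counter (5 = finished)
    all_ready: bool
    determined: bool
    resp1: str              # decision delivered to worker 1 ("" = none yet)
    resp2: str              # decision delivered to worker 2
    crashed: bool           # arbiter terminated unexpectedly


def successors(s: State, may_crash: bool) -> List[State]:
    if s.crashed or s.pc == 5:
        return []
    nxt: List[State] = []
    if may_crash:  # a termination point before every arbiter step
        nxt.append(s._replace(crashed=True))
    decision = "Commit" if s.all_ready else "Abort"
    if s.pc in (0, 1):  # read the vote of worker pc + 1
        ok = s.votes[s.pc] == "Ready"
        nxt.append(s._replace(pc=s.pc + 1, all_ready=s.all_ready and ok))
    elif s.pc == 2:
        nxt.append(s._replace(pc=3, determined=True))
    elif s.pc == 3:
        nxt.append(s._replace(pc=4, resp1=decision))
    else:  # s.pc == 4
        nxt.append(s._replace(pc=5, resp2=decision))
    return nxt


def invariant(s: State) -> bool:
    # (determined and not all_ready) implies neither worker received Commit
    if s.determined and not s.all_ready:
        return s.resp1 != "Commit" and s.resp2 != "Commit"
    return True


def check(may_crash: bool) -> Tuple[bool, int]:
    """Breadth-first search over all reachable states; returns (holds, #states)."""
    init = [State(v, 0, True, False, "", "", False)
            for v in product(["Ready", "NotReady"], repeat=2)]
    seen = set(init)
    queue = deque(init)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return False, len(seen)
        for t in successors(s, may_crash):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True, len(seen)


print(check(may_crash=False))
print(check(may_crash=True))
```

The property holds in both runs; enabling termination grows this toy model from 24 to 44 reachable states, a small-scale version of the extra work fault injection imposes on the checker.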
19. Comparison (1): Time & Memory Footprint
Memory:
                     Sandal    Spin     NuSMV
No Fault             20.8 MB   128 MB   6.42 MB
With Timeout         21.2 MB   128 MB   6.64 MB
With Message Loss    25.2 MB   128 MB   6.82 MB
With Termination     12.7 MB   128 MB   6.57 MB

Time:
                     Sandal     Spin       NuSMV
No Fault             0.42 sec   0.87 sec   0.016 sec
With Timeout         0.50 sec   0.89 sec   0.018 sec
With Message Loss    0.95 sec   0.88 sec   0.025 sec
With Termination     0.21 sec   0.95 sec   0.015 sec
20. Comparison (2): Size of Models
• (n): number of lines modified or added relative to the "No Fault" version
LOC (diff)           Sandal    Promela   NuSMV
No Fault             45        28        178 (58)
With Timeout         48 (5)    37 (13)   180 (6)
With Message Loss    45 (2)    34 (10)   182 (14)
With Termination     45 (6)    41 (21)   179 (23)
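Summing the per-fault diff counts in the table gives one figure per language. A short Python sketch of that arithmetic; the numbers are copied from the three fault rows of the table and nothing else is assumed:

```python
# Diff counts from the table: lines modified/added relative to the
# "No Fault" version, for the three fault variants.
diffs = {
    "Sandal":  {"Timeout": 5,  "Message Loss": 2,  "Termination": 6},
    "Promela": {"Timeout": 13, "Message Loss": 10, "Termination": 21},
    "NuSMV":   {"Timeout": 6,  "Message Loss": 14, "Termination": 23},
}

totals = {lang: sum(per_fault.values()) for lang, per_fault in diffs.items()}
for lang, total in totals.items():
    print(f"{lang}: {total} lines changed in total")
```

This gives 13 changed lines for Sandal, against 44 for Promela and 43 for NuSMV.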
21. Related Work
• Automatic fault-injection tools targeting models
- MODIFI [Svenningsson et al., 2010]
- FSAP/NuSMV-SA [Bozzano et al., 2003]
• both target hardware faults
• modularization problem
• Model checking of message-based distributed systems
- Rebeca [Sirjani et al., 2004]
• AOP for modeling languages
- Aspect-Oriented Promela [Ohno & Kishi, 2008]
- Moxa [Yamada & Watanabe, 2005]
• Aspect-Oriented Extension of JML
22. Future Work
• Optimizing the Translator
- Abstraction Refinement, K-Induction, etc.
• AOP/FOP version of Sandal
- Model-level separation of concerns (parameterization?)
• Probabilistic Models for Faulty Behaviors

• Verifying Multi-Level Models of Self-* Systems
• Compositional Construction of Actor-Based Group-Wide Reflection [Watanabe, 2013]
- Self-* actions vs. base-level actions
[Figure: a group of objects and its meta-group]
23. Summary
• We propose a modeling language, Sandal, that provides features for describing faults and fault-handling actions
- timeout, random loss of messages, unexpected termination
• The case study (2PC protocol) shows the effectiveness of the language features.