For more details, see also our blog post: http://blog.zuehlke.com/is-it-safe-to-let-it-crash/
Critical software systems are bound to perform extensive error detection and exception handling. The corresponding source code is typically implemented in a defensive programming style. Because error handling is a cross-cutting concern, its code fragments are often not separable from the source code realizing the core functionality. Extending exception handling to further improve fault tolerance requires even more source code. The same applies when increasing software safety by proactively detecting hazardous situations. However, some residual vulnerability always remains, especially in complex and distributed systems. Producing more code ultimately results in more complexity while reducing readability and maintainability. This in turn inevitably leads to programming errors.
The programming language Erlang breaks new ground in handling fault-tolerance problems. Very lightweight processes enable straightforward concurrency, with communication based solely on message passing. Processes are able to monitor each other and, when a process terminates, restart it very swiftly. The exception handling method of choice for a worker process is to terminate itself if it is unable to handle the situation locally. Supervisor hierarchies ensure appropriate error responses by starting a different process or by starting a new instance of the terminated one.
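The worker and supervisor roles described above can be illustrated with a minimal sketch. It is written in Python rather than Erlang, uses a thread and a queue as a rough stand-in for an Erlang process and its mailbox, and is not the Erlang/OTP supervisor API. All names (`worker`, `supervise`, `demo`, `max_restarts`) are hypothetical and chosen only for illustration.

```python
import queue
import threading

STOP = object()  # sentinel asking the worker to shut down normally


def worker(mailbox, results):
    """Core logic only: no defensive error handling, so a bad message crashes it."""
    while True:
        msg = mailbox.get()
        if msg is STOP:
            return
        results.append(10 / msg)  # ZeroDivisionError or TypeError on bad input


def supervise(mailbox, results, max_restarts=3):
    """Run the worker and start a fresh instance each time it crashes.

    Returns the number of restarts performed. Raises RuntimeError once
    max_restarts is exhausted (an assumed, simplified restart limit).
    """
    restarts = 0
    while True:
        crash = []

        def monitored():
            try:
                worker(mailbox, results)
            except Exception as exc:  # the monitor, not the worker, sees the crash
                crash.append(exc)

        thread = threading.Thread(target=monitored)
        thread.start()
        thread.join()
        if not crash:
            return restarts  # worker terminated normally
        if restarts >= max_restarts:
            raise RuntimeError("restart limit reached") from crash[0]
        restarts += 1


def demo():
    mailbox, results = queue.Queue(), []
    for msg in (5, 0, 2, "x", 10, STOP):
        mailbox.put(msg)
    return supervise(mailbox, results), results
```

In this sketch the worker contains no error handling at all; the message that caused a crash is dropped and the restarted instance continues with the next one. Real Erlang processes are additionally isolated from each other in memory, which Python threads are not, so the sketch only conveys the control flow.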
This article discusses whether the let-it-crash paradigm for fault-tolerant systems may also be applicable to safety-related software projects. The industrial background of this article is a proof-of-concept project using Erlang to implement safety-related functionality in a close-to-reality scenario.