Robust Software – Dotting the I’s and Crossing the T’s
Chris Oldwood
ACCU Conference 2013
@chrisoldwood / gort@cix.co.uk
The I’s & T’s
• Robustness
• Handling Errors
• Safely Ignoring Errors
• Timeouts
• Unit Testing Failures
• Flexible Configuration
• Monitoring Clarity
Robustness
Stable in the face of unexpected behaviour
Pop Quiz – Exit Code?
struct UnhandledException {};  // illustrative exception type, never caught

int main(int argc, char* argv[])
{
    throw UnhandledException();
}
Exit Code Convention
program.exe
if %errorlevel% neq 0 (
    echo ERROR: Program failed
    exit /b 1
)
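Big Outer Try Block
#include <cstdlib>
#include <exception>

int DoUsefulWork(int argc, char* argv[]);  // the real work, defined elsewhere

int main(int argc, char* argv[])
{
    try
    {
        return DoUsefulWork(argc, argv);
    }
    catch (const std::exception& e)
    { /* Report failure, e.g. e.what() */ }
    catch (...)
    { /* Report failure */ }

    return EXIT_FAILURE;  // assume failure unless the work returned normally
}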
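The big outer try block can also be packaged as a reusable guard. The following is a minimal sketch under stated assumptions, not code from the talk. `RunGuarded` is a hypothetical helper that runs the real work and reports any escaping exception to `std::cerr`. It then returns `EXIT_FAILURE`, so the parent process always sees a non-zero exit code when something goes wrong.

```cpp
#include <cstdlib>
#include <exception>
#include <functional>
#include <iostream>

// Hypothetical helper: run the program's real work and turn any escaping
// exception into a reported failure plus a non-zero exit code.
int RunGuarded(const std::function<int()>& work)
{
    try
    {
        return work();
    }
    catch (const std::exception& e)
    {
        std::cerr << "ERROR: " << e.what() << '\n';
    }
    catch (...)
    {
        std::cerr << "ERROR: unknown exception\n";
    }
    // Assume failure by default: only a normal return from work() succeeds.
    return EXIT_FAILURE;
}
```

In a real program `main` would reduce to something like `return RunGuarded([&] { return DoUsefulWork(argc, argv); });`.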
Module Boundaries
HRESULT DoSomething(...)
{
    try
    {
        return Impl::DoSomething(...);
    }
    catch (const std::bad_alloc& e)
    { return E_OUTOFMEMORY; }
    catch (const std::exception& e)
    { return E_FAIL; }
    catch (...)
    { return E_UNEXPECTED; }
}
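The same translation can be written portably, without the Windows `HRESULT` types. The sketch below makes an assumption: the `ErrorCode` enum and `CallAtBoundary` are hypothetical stand-ins for a real module's error codes. It shows that no exception crosses the boundary and that the handlers run from most to least specific, as on the slide.

```cpp
#include <exception>
#include <new>

// Hypothetical portable stand-ins for HRESULT-style return codes.
enum ErrorCode
{
    ERR_OK = 0,
    ERR_OUT_OF_MEMORY,
    ERR_FAILED,
    ERR_UNEXPECTED
};

// Run the C++ implementation and map whatever it throws onto a plain error
// code, so the caller on the other side of the boundary never sees an exception.
template <typename Fn>
ErrorCode CallAtBoundary(Fn&& impl) noexcept
{
    try
    {
        impl();
        return ERR_OK;
    }
    catch (const std::bad_alloc&)      // most specific first
    {
        return ERR_OUT_OF_MEMORY;
    }
    catch (const std::exception&)
    {
        return ERR_FAILED;
    }
    catch (...)                        // anything else, e.g. non-std types
    {
        return ERR_UNEXPECTED;
    }
}
```

In a COM component the returned values would be `E_OUTOFMEMORY`, `E_FAIL` and `E_UNEXPECTED` instead.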
Exception Safety Guarantees
• None
• Basic
• Strong
• No Throw
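In C++ the basic guarantee (nothing leaks and objects stay in a valid state) typically comes from RAII. The sketch below assumes a hypothetical `Resource` type that counts its live instances, which makes any leak observable. Because `std::unique_ptr` owns the resource, its destructor still runs during stack unwinding when the function throws.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts how many instances are currently alive.
struct Resource
{
    static int live;
    Resource() { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// If this throws part-way through, the unique_ptr releases the Resource
// during stack unwinding: nothing leaks, though the overall operation
// has not necessarily left everything unchanged (that would be "strong").
void ProcessWithResource(bool fail)
{
    auto resource = std::make_unique<Resource>();  // owned by RAII from here on
    if (fail)
        throw std::runtime_error("processing failed");
    // ... use *resource ...
}
```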
Exception Unsafe Code
IServicePtr AcquireService()
{
    if (!m_service)
    {
        m_service = new Service();
        m_service.CreateInstance();
    }
    return m_service;
}
IServicePtr m_service;
Exception Safe Code
IServicePtr AcquireService()
{
    if (!m_service)
    {
        IServicePtr service = new Service();
        service.CreateInstance();
        m_service.swap(service);
    }
    return m_service;
}
IServicePtr m_service;
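Here is a runnable version of the create-then-swap fix. It makes one assumption: `std::shared_ptr` stands in for the COM smart pointer `IServicePtr`. `Service`, `Initialise` and the `failNextInit` knob are hypothetical. The knob only lets a test make the second construction phase fail. If initialisation throws, the member stays empty, so the next call simply tries again instead of handing out a half-built service.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical service with two-phase construction: the constructor is
// trivial and Initialise() does the real work, which may throw.
struct Service
{
    bool ready = false;

    void Initialise(bool fail)
    {
        if (fail)
            throw std::runtime_error("service initialisation failed");
        ready = true;
    }
};

class ServiceHolder
{
public:
    // Hypothetical test knob: make the next initialisation fail.
    bool failNextInit = false;

    std::shared_ptr<Service> AcquireService()
    {
        if (!m_service)
        {
            // Build and initialise a local instance first...
            auto service = std::make_shared<Service>();
            service->Initialise(failNextInit);

            // ...and only publish it once it is fully usable; swap cannot throw.
            m_service.swap(service);
        }
        return m_service;
    }

private:
    std::shared_ptr<Service> m_service;
};
```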
Forever is a Really Long Time
Handle completed = BeginAsyncOperation();
. . .
Wait(completed, INFINITE);
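With the standard library, a bounded wait might look like the sketch below. `WaitFor` is a hypothetical helper, and it assumes the asynchronous operation delivers its result through a `std::future`. It returns `std::nullopt` once the caller's patience runs out instead of blocking forever.

```cpp
#include <chrono>
#include <future>
#include <optional>

// Wait for an asynchronous result, but only as long as the caller is prepared
// to wait. Returns std::nullopt if the result is not ready in time (a deferred
// task, which wait_for does not start, is also treated as not ready).
template <typename T>
std::optional<T> WaitFor(std::future<T>& result, std::chrono::milliseconds timeout)
{
    if (result.wait_for(timeout) != std::future_status::ready)
        return std::nullopt;   // give up and let the caller decide what to do
    return result.get();
}
```

The timeout should reflect how long a user or system actor will actually wait. For example, there is no point waiting longer than the interval at which a fresh status message arrives anyway.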
Cancellable Operations
Handle completed = BeginAsyncOperation();
Handle aborted = GetAbortHandle();
Handle waitables[] = { aborted, completed };
. . .
Handle signalled = Wait(waitables, timeout);
if (signalled == aborted)
{
    . . .
}
Retries: immediate then queued
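The slide only names the policy, so the following is a hedged sketch of one way to realise it. It assumes a unit of work is a callable that throws on failure, and an in-memory `std::deque` stands in for whatever queue would really hold deferred work. `Work` and `RunWithRetries` are hypothetical names. A few fast retries cover transient glitches; if those all fail, the work is parked for a much later, slower retry.

```cpp
#include <deque>
#include <exception>
#include <functional>

// Hypothetical unit of work: succeeds, or throws to signal failure.
using Work = std::function<void()>;

// Try the work a few times straight away (fast retries). If it still fails,
// push it onto a queue to be retried much later (slow retries).
// Returns true if one of the immediate attempts succeeded.
bool RunWithRetries(const Work& work, int immediateAttempts, std::deque<Work>& retryQueue)
{
    for (int attempt = 1; attempt <= immediateAttempts; ++attempt)
    {
        try
        {
            work();
            return true;
        }
        catch (const std::exception&)
        {
            // Possibly transient: loop round and try again immediately.
        }
    }
    retryQueue.push_back(work);   // a specific blockage: come back to it later
    return false;
}
```

A real system would probably also log each failure and cap how many times queued work is retried.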
Unit Testing Failures
Testing Write+Rename Idiom
[Test]
public void OriginalFilePreservedOnException()
{
    var fakeIo = new FakeIo();
    fakeIo.Write = (file, buffer) =>
        { throw new IOException(); };
    var writer = new WriterService(fakeIo);
    var filename = "original.txt";

    Assert.Throws<IOException>(() => writer.WriteFile(filename));
    Assert.True(fakeIo.FileExists(filename));
    Assert.That(. . .);
}
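For comparison, here is a sketch of the idiom under test, written in C++17 with `std::filesystem`. It assumes the temporary file can live next to the target with a `.tmp` suffix. `WriteFn`, `DefaultWrite` and `WriteFileViaRename` are hypothetical names. The injectable write step plays a similar role to the `FakeIo` facade above: it lets a test simulate an I/O error such as a full disk. The original file is only replaced once the new content has been written and flushed successfully.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <ios>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Injectable write step, so a test can simulate I/O failures.
using WriteFn = std::function<void(std::ofstream&, const std::string&)>;

inline void DefaultWrite(std::ofstream& out, const std::string& content)
{
    out << content;
    if (!out)
        throw std::ios_base::failure("write failed");
}

// Write + rename: write to a temporary file beside the target, then rename it
// over the original only after the write has fully succeeded.
inline void WriteFileViaRename(const fs::path& target, const std::string& content,
                               const WriteFn& write = DefaultWrite)
{
    fs::path temp = target;
    temp += ".tmp";

    try
    {
        std::ofstream out(temp, std::ios::binary | std::ios::trunc);
        if (!out)
            throw std::ios_base::failure("cannot open temporary file");
        write(out, content);
        out.close();                        // flushes; a failure sets failbit
        if (!out)
            throw std::ios_base::failure("cannot finish temporary file");
    }
    catch (...)
    {
        std::error_code ignored;
        fs::remove(temp, ignored);          // tidy up; the original is untouched
        throw;
    }

    fs::rename(temp, target);               // replaces the original file
}
```

On POSIX filesystems the final rename is atomic. Other platforms may give weaker guarantees, so check before relying on it.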
Flexible Configuration
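The editor's notes for this slide say that endpoints and paths should never be hard-coded and should be configurable at different levels, while still defaulting sensibly. A layered lookup is one way to get both. The sketch below is hypothetical: the `Settings` class and its precedence order (explicit override, then environment variable, then built-in default) are assumptions, not the talk's design.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical layered settings lookup. An explicit override (e.g. from the
// command line or a per-machine file) wins, then an environment variable,
// then a sensible built-in default, so config files only need the exceptions.
class Settings
{
public:
    void Override(const std::string& key, const std::string& value)
    {
        m_overrides[key] = value;
    }

    std::string Get(const std::string& key, const std::string& defaultValue) const
    {
        if (auto it = m_overrides.find(key); it != m_overrides.end())
            return it->second;
        if (const char* env = std::getenv(key.c_str()))
            return env;
        return defaultValue;
    }

private:
    std::map<std::string, std::string> m_overrides;
};
```

With this arrangement a developer's workstation or a DR site can point at different endpoints without changing code, and the defaults keep production configuration small.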
Monitoring Clarity
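The notes ask for monitoring that is calm and considered rather than pages of errors. One possible technique, offered here as an assumption rather than anything prescribed in the talk, is to report each distinct error once and count the repeats. `ErrorReporter` is a hypothetical helper. It only tells the caller whether to log a message now, and it can produce a compact digest of everything seen.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper to keep error reporting calm: the first occurrence of
// each distinct message is logged in full, repeats are only counted.
class ErrorReporter
{
public:
    // Returns true if this message should be written to the log now.
    bool Record(const std::string& message)
    {
        return ++m_counts[message] == 1;
    }

    // One summary line per distinct message, e.g. for a periodic digest.
    std::vector<std::string> Summary() const
    {
        std::vector<std::string> lines;
        for (const auto& [message, count] : m_counts)
            lines.push_back(message + " (x" + std::to_string(count) + ")");
        return lines;
    }

private:
    std::map<std::string, int> m_counts;
};
```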
Release It!
Questions?
Blog:
http://chrisoldwood.blogspot.com
@chrisoldwood / gort@cix.co.uk

Editor's Notes

  • #2 Who am I
  • #3 Quick walkthrough of the schedule
  • #4 What do I mean by robustness? It’s not so much about reliability. Think of a chair: sitting on it, then standing on it, stacking it, etc., moving from the specified uses to the unknown ones.
  • #5 Why is it important? It’s the bedrock for sustainable development of new features. This isn’t over-engineering, just consideration of failures.
  • #6 What do some runtimes do when an unhandled exceptional failure occurs? Nothing! See QM #6
  • #7 The exit code convention is 0 for success. Note that’s “success == !true”, just for extra confusion. The parent can’t react and recover if you don’t give it the chance to. Exceptions only exist within languages; once you cross module boundaries, it’s back to return codes.
  • #8 Assume failure by default. Don’t assume the runtime will do the right thing. It’s int main(), not void main(), so always return an exit code.
  • #9 Required at any module boundary, e.g. a Win32 callback, COM component, WCF service, etc. Consider service recovery: shutdown may be worse. Beware the black hole effect.
  • #10 Recap the Abrahams exception safety guarantees. These apply equally to C#, Java, etc. The basic guarantee can be implemented with RAII in C++ and the Dispose pattern in C#, otherwise with a manual try/catch block.
  • #11 An example of real-world code that caused a process to fail all its work rapidly.
  • #12 When recovery is not foremost in the method, be exception agnostic. It’s still hard: a more recent example was slowly losing engines due to a subtle out-of-memory exception. Two-phase construction is a bad idea anyway; always prefer the constructor or a factory method to do it all.
  • #13 Don’t wait forever; there must be an upper limit on how long a user or system actor will actually wait. Don’t even start work if the user has already got bored. Status message example: one is received every 60 secs, so there’s no point waiting any longer.
  • #14 Infinite waits are acceptable when the operation can be cancelled through other means. Long-running operations should be cancellable to allow graceful termination/shutdown.
  • #15 Fast and slow retries: perhaps retry much later (queued) if there is a specific blockage.
  • #16 Test more than just the happy path (disks fill up, networks hang, access gets denied). If you expect automatic retry on a cluster failover, mock the service and simulate one to test recovery.
  • #17 Write + rename is equivalent to the earlier create + swap. Build facades to allow unit testing of I/O operations and simulating errors, e.g. running out of disk space.
  • #18 In-house production can be simpler as change is tightly controlled; development is where the action happens. Never hard-code anything: all service endpoints and paths must be configurable (at different levels). Testing often drives the need for flexibility due to shared resources, e.g. developers’ workstations. DR is also a driver, but the flexibility can be useful outside DR too (e.g. active/passive failover). But also default sensibly where possible to avoid bloated configuration files.
  • #19 Calm and considered: pages of errors and alarm bells make problems harder to diagnose. You’ll never dream up every possible failure, but you can design ways to allow for it.
  • #20 An excellent book, probably the best on the subject, with good case studies.