Generation 5 » What do you do when you’ve caught an exception?


What do you do when you’ve caught an exception?
Abort, Retry, Ignore
This article is a follow-up to “Don’t Catch Exceptions”, which advocates that exceptions should (in general) be passed up to a “unit of work”, that is, a fairly coarse-grained activity which can reasonably be failed, retried or ignored. A unit of work could be:
an entire program, for a command-line script,
a single web request in a web application,
the delivery of an e-mail message,
the handling of a single input record in a batch loading application,
rendering a single frame in a media player or a video game, or
an event handler in a GUI program.
The code around the unit of work may look something like this:
[01] try {
[02]     DoUnitOfWork();
[03] } catch(Exception e) {
[04]     ... examine exception and decide what to do ...
[05] }

For the most part, the code inside DoUnitOfWork() and the functions it calls lets exceptions propagate upward rather than catching them.
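Here is a minimal Python sketch of the same shape. It assumes a batch loader in which each input line is one unit of work. The names `parse_record`, `load_batch` and `InvalidRecord` are hypothetical. The point is that only the loop in `load_batch` catches anything: one bad record fails on its own, and the inner code simply raises.

```python
import logging

log = logging.getLogger("batch_loader")


class InvalidRecord(Exception):
    """Hypothetical exception for a record with the wrong shape."""


def parse_record(line):
    # No try/except here: problems are raised upward to the unit of work.
    fields = line.split(",")
    if len(fields) != 2:
        raise InvalidRecord(f"expected 2 fields, got {len(fields)}: {line!r}")
    name, age = fields
    return name.strip(), int(age)  # int() raises ValueError on bad input


def load_batch(lines):
    """Process each line as its own unit of work; return (loaded, failed_line_numbers)."""
    loaded, failed = [], []
    for number, line in enumerate(lines, start=1):
        try:
            loaded.append(parse_record(line))
        except Exception as e:  # the one catch, at the unit-of-work boundary
            log.warning("record %d skipped: %s", number, e)
            failed.append(number)
    return loaded, failed


if __name__ == "__main__":
    ok, bad = load_batch(["alice, 34", "bob", "carol, x", "dave, 41"])
    print(ok)   # [('alice', 34), ('dave', 41)]
    print(bad)  # [2, 3]
```

In this sketch the decision made in the catch is simply "log it and skip this record". The rest of the article is about when that is, and isn't, the right choice.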
To handle errors correctly, you need to answer a few questions, such as:
Was this error caused by a corrupted application state?
Did this error cause the application state to be corrupted?
Was this error caused by invalid input?
What do we tell the user, the developers and the system administrator?
Could this operation succeed if it was retried?
Is there something else we could do?
Although it’s good to depend on existing exception hierarchies (at least you won’t introduce new problems), the way that exceptions are defined and thrown inside the work unit should help the code on line [04] decide what to do. Such practices are the subject of a future article, which subscribers to our RSS feed will be the first to read.

The cause and effect of errors
There is a certain range of error conditions that are predictable, where it’s possible to detect the error and implement the correct response. As an application becomes more complex, the number of possible errors explodes, and it becomes impossible or unacceptably expensive to implement explicit handling of every condition.
What to do about unanticipated errors is a controversial topic. Two extreme positions are: (i) an unexpected error could be a sign that the application is corrupted, so the application should be shut down, and (ii) systems should bend but not break: we should be optimistic and hope for the best. Ultimately, there’s a trade-off between integrity and availability, and different systems make different choices. The ecosystem around Microsoft Windows, where people predominantly develop desktop applications, is inclined to give up the ghost when things go wrong: better to show a “blue screen of death” than to let the unpredictable happen. In the Unix ecosystem, which is more centered around server applications and custom scripts, the tendency is to soldier on in the face of adversity.
What’s at stake?
Desktop applications tend to fail when unexpected errors happen: users learn to save frequently. Some of the best applications, such as GNU Emacs and Microsoft Word, keep a running log of changes to minimize work lost to application and system crashes. Users accept the situation.

http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/[1/12/2014 8:27:44 PM]

On the other hand, it’s unreasonable for a server application that serves hundreds or millions of users to shut down on account of a cosmic ray. Embedded systems, in particular, function in a world where failure is frequent and the effects must be minimized. As we’ll see later, it would be a real bummer if the Engine Control Unit in your car left you stranded because your oxygen sensor quit working.
The following diagram illustrates the environment of a work unit in a typical application. (Although this application accesses network resources, we’re not thinking of it as a distributed application. We’re responsible for the correct behavior of the application running in a single address space, not for the correct behavior of a process swarm.)

[Diagram: the environment of a work unit in a typical application]

The input to the work unit is a potential source of trouble. The input could be invalid, or it could trigger a bug in the work unit or elsewhere in the system (the “system” encompasses everything in the diagram). Even if the input is valid, it could contain a reference to a corrupted resource elsewhere in the system. A corrupted resource could be a damaged data structure (such as a colored box in a database), or an otherwise malfunctioning part of the system (a crashed server or router on the network).
Data structures in the work unit itself are the least problematic for purposes of error handling, because they don’t outlive the work unit and don’t have any impact on future work units.
Static application data, on the other hand, persists after the work unit ends, and this has two possible consequences:
1. The current work unit can fail because a previous work unit caused a resource to be corrupted, and
2. The current work unit can corrupt a resource, causing a future work unit to fail.
Osterman’s argument that applications should crash on errors is based on this reality: an unanticipated failure is a sign that the application is in an unknown (and possibly bad) state, and can’t be trusted to be reliable in the future. Stopping the application and restarting it clears out the static state, eliminating any corruption in it.
Rebooting the application, however, might not free up corrupted resources inside the operating system. Both desktop and server applications suffer from operating system errors from time to time, and often can get immediate relief by rebooting the whole computer.
The “reboot” strategy runs out of steam when we cross the line from in-RAM state to persistent state: state that’s stored on disk or elsewhere on the network.

Once resources in the persistent world are corrupted, they need to be (i) lived with, or repaired by (ii) manual or (iii) automatic action.
In either world, a corrupted resource can have either a narrow (blue) or wide (orange) effect on the application. For instance, the user account record of an individual user could be damaged, which prevents that user from logging in. That’s bad, but it would hardly be catastrophic for a system that has 100,000 users. It’s best to ‘ignore’ this error, because a system-wide ‘abort’ would deny service to 99,999 other users; the problem can be corrected when the user complains, or when the problem is otherwise detected by the system administrator.
If, on the other hand, the cryptographic signing key that controls the authentication process were lost, nobody would be able to log in: that’s quite a problem. It’s the kind of problem that will be noticed, however, so aborting at the work unit level (the authenticated request) is enough to protect the integrity of the system while the administrators repair the problem.
Problems can happen at an intermediate scope as well. For instance, if the system has damage to a message file for Italian users, people who use the system in the Italian language could be locked out. If Italian speakers are 10% of the users, it’s best to keep the system running for others while you correct the problem.

Repair
There are several tools for dealing with corruption in persistent data stores. In a one-of-a-kind business system, a DBA may need to intervene occasionally to repair corruption. More common events can be handled by running scripts which detect and repair corruption, much like the fsck command in Unix or the chkdsk command in Windows. Corruption in the metadata of a filesystem can, potentially, cause a sequence of events which leads to massive data loss, so UNIX systems have historically run the fsck command on filesystems whenever the filesystem is in a questionable state (such as after a system crash or power failure). The time to do an fsck has become an increasing burden as disks have gotten larger, so modern UNIX systems use journaling filesystems that protect filesystem metadata with transactional semantics.

Release and Rollback
One role of an exception handler for a unit of work is to take steps to prevent
corruption. This involves the release of resources, putting data in a safe state, and,
when possible, the rollback of transactions.
Many kinds of persistent store support transactions, as can many in-memory data structures, but the most common transactional store that people use is the relational database. Although transactions don’t protect the database from all programming errors, they can ensure that neither expected nor unexpected exceptions will cause partially-completed work to remain in the database.
A classic example in pseudocode is the following:
[06] function TransferMoney(fromAccount,toAccount,amount) {
[07]     try {
[08]         BeginTransaction();
[09]         ChangeBalance(toAccount,amount);
[10]         ... something throws exception here ...
[11]         ChangeBalance(fromAccount,-amount);
[12]         CommitTransaction();
[13]     } catch(Exception e) {
[14]         RollbackTransaction(); throw;  // undo, then pass the error up
[15]     }
[16] }

In this (simplified) example, we’re transferring money from one bank account to another. An exception thrown at line [10] could be serious, since it would cause money to appear in toAccount without being removed from fromAccount. It’s bad enough if this happens by accident, but a clever cracker who finds a way to cause an exception at line [10] has discovered a way to steal money from the bank.
Fortunately we’re doing this financial transaction inside a database transaction. Everything done after BeginTransaction() is provisional: it doesn’t actually appear in the database until CommitTransaction() is called. When an exception happens, we call RollbackTransaction(), which makes it as if the first ChangeBalance() had never been called.
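You can try the same pattern with Python’s built-in sqlite3 module. This is a sketch under assumptions: the `accounts` table, `transfer_money()` and the `fail_midway` flag (which stands in for line [10]) are invented for illustration. After rolling back, the sketch re-raises the exception so that the caller still learns the transfer failed.

```python
import sqlite3


def transfer_money(conn, from_account, to_account, amount, fail_midway=False):
    """Move `amount` between two accounts inside a single transaction."""
    try:
        # In its default mode, sqlite3 implicitly opens a transaction before this UPDATE.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, to_account))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, from_account))
        conn.commit()
    except Exception:
        conn.rollback()  # discard the provisional credit to to_account
        raise            # re-raise so the unit of work can decide what to do


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()

    try:
        transfer_money(conn, "alice", "bob", 40, fail_midway=True)
    except RuntimeError as e:
        print("transfer failed:", e)
    print(balances(conn))  # {'alice': 100, 'bob': 0}

    transfer_money(conn, "alice", "bob", 40)
    print(balances(conn))  # {'alice': 60, 'bob': 40}
```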
As mentioned in the “Don’t Catch Exceptions” article, it often makes sense to do release, rollback and repair operations in a finally clause rather than in the unit-of-work catch clause, because that lets an individual subsystem take care of itself; this promotes encapsulation. However, in applications that use databases transactionally, it often makes sense to push transaction management out to the work unit.
Why? Complex database operations are often composed out of simpler database
operations that, themselves, should be done transactionally. To take an example,
imagine that somebody is opening a new account and funding it from an existing
account:
function OpenAndFundNewAccount(accountInformation,oldAccount,amount) {
    if (amount<MinimumAmount) {
        throw new InvalidInputException(
            "Attempted To Create Account With Balance Below Minimum"
        );
    }
    newAccount=CreateNewAccountRecords(accountInformation);
    TransferMoney(oldAccount,newAccount,amount);
}

It’s important that the TransferMoney operation be done transactionally, but it’s also
important that the whole OpenAndFundNewAccount operation be done transactionally
too, because we don’t want an account in the system to start with a zero balance.
A straightforward answer to this problem is to always do banking operations inside a
unit of work, and to begin, commit and roll back transactions at the work unit level:
AtmOutput ProcessAtmRequest(AtmInput in) {
    try {
        BeginTransaction();
        BankingOperation op=in.ParseOperation();
        var out=op.Execute();
        var atmOut=AtmOutput.Encode(out);
        CommitTransaction();
        return atmOut;
    }
    catch(Exception e) {
        RollbackTransaction();
        ... Complete Error Handling ...
    }
}

In this case, there might be a large number of functions that are used to manipulate the database internally, but these are only accessible to customers and bank tellers through a limited set of BankingOperations that are always executed in a transaction.
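Here is a sketch of that arrangement using Python’s sqlite3. It assumes an `accounts` table with a CHECK constraint that forbids negative balances, and it uses a hypothetical `process_request()` to play the role of ProcessAtmRequest(). The helper functions deliberately never begin or commit a transaction; only the unit of work does. As a result, a failure in the middle of OpenAndFundNewAccount also undoes the account creation.

```python
import sqlite3

MINIMUM_AMOUNT = 25  # hypothetical minimum opening balance


class InvalidInputException(Exception):
    pass


def create_new_account_records(conn, account_id):
    conn.execute("INSERT INTO accounts VALUES (?, 0)", (account_id,))
    return account_id


def transfer_money(conn, from_account, to_account, amount):
    # No BEGIN/COMMIT here: the enclosing unit of work owns the transaction.
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                 (amount, to_account))
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                 (amount, from_account))


def open_and_fund_new_account(conn, account_id, old_account, amount):
    if amount < MINIMUM_AMOUNT:
        raise InvalidInputException(
            "Attempted to create account with balance below minimum")
    new_account = create_new_account_records(conn, account_id)
    transfer_money(conn, old_account, new_account, amount)


def process_request(conn, operation, *args):
    """The unit of work: one transaction around one whole banking operation."""
    try:
        operation(conn, *args)
        conn.commit()
        return "ok"
    except Exception as e:
        conn.rollback()  # undoes every step, including the new account record
        return f"error: {e}"  # a real system would log details and show the user less


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts ("
                 "id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.commit()

    # Overdraws alice, so the CHECK constraint fails after carol's record was created.
    print(process_request(conn, open_and_fund_new_account, "carol", "alice", 500))
    print(process_request(conn, open_and_fund_new_account, "erin", "alice", 60))  # ok
    print(list(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
    # [('alice', 40), ('erin', 60)]: no half-created account for carol
```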

Notification
There are several parties that could be notified when something goes wrong with an
application, most commonly:
1. the end user,
2. the system administrator, and
3. the developers.
Sometimes, as in the case of a public-facing web application, #2 and #3 may overlap.
In desktop applications, #2 might not exist.
Let’s consider the end user first. The end user really needs to know (i) that something went wrong, and (ii) what they can do about it. Often errors are caused by user input. Ideally these errors are expected, so the system can tell the user specifically what went wrong. For instance:
try {
    ... process form information ...

    if (!IsWellFormedSSN(ssn))
        throw new InvalidInputException("You must supply a valid social security number");

    ... process form some more ...
} catch(InvalidInputException e) {
    DisplayError(e.Message);
}
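In Python, the same idea might look like the sketch below. `is_well_formed_ssn()` and its ###-##-#### format check are assumptions that stand in for IsWellFormedSSN(). The key detail is that the except clause names InvalidInputException only, so unexpected errors are never mistaken for user mistakes.

```python
import re


class InvalidInputException(Exception):
    """Raised for anticipated errors whose message is safe to show the user."""


# Assumption for this sketch: a well-formed SSN is written as ###-##-####.
_SSN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")


def is_well_formed_ssn(ssn):
    return _SSN.fullmatch(ssn) is not None


def process_form(form):
    ssn = form.get("ssn", "")
    if not is_well_formed_ssn(ssn):
        raise InvalidInputException("You must supply a valid social security number")
    return f"saved record for {form.get('name', 'unknown')}"


def handle_submission(form):
    try:
        return process_form(form)
    except InvalidInputException as e:
        # Only this anticipated error is caught here; anything else keeps
        # propagating to the unit of work's general handler.
        return f"Error: {e}"


if __name__ == "__main__":
    print(handle_submission({"name": "Ann", "ssn": "123-45-6789"}))  # saved record for Ann
    print(handle_submission({"name": "Ann", "ssn": "12-345"}))
    # Error: You must supply a valid social security number
```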

Other times, errors happen that are unexpected. Consider a common (and bad) practice that we see in database applications: programs that write queries without correctly escaping strings:
// Don't do this: string values are pasted into the SQL unescaped
dbConn.Execute(
    "INSERT INTO people (first_name,last_name) " +
    "VALUES ('" + firstName + "','" + lastName + "');"
);

This code is straightforward but dangerous, because a single quote in the firstName or lastName variable ends the string literal in the VALUES clause and enables an SQL injection attack. (I’d hope that you know better than to do this, but large projects worked on by large teams inevitably have problems of this order.) This code might even hold up well in testing, failing only in production when a person registers with
lastName="O'Reilly";

Now, the dbConn is going to throw something like a SqlException with the following
message:
SqlException.Message="Invalid SQL Statement:
    INSERT INTO people (first_name,last_name)
    VALUES ('Baba','O'Reilly');"

We could show that message to the end user, but it is worthless to most people. Worse than that, it’s harmful if the end user is a cracker who could take advantage of the error: it tells them the name of the affected table, the names of the columns, and the exact SQL code that they can inject something into. You might be better off showing users something like:

[Image: a friendly error page, such as Twitter’s “fail whale”]

and telling them that they’ve experienced an “Internal Server Error.” Even so, the discovery that a single quote can cause an “Internal Server Error” can be enough for a good cracker to sniff out the fault and develop an attack in the blind. What can we do? Warn the system administrators. The error handling system for a server application should log exceptions, stack trace and all. It doesn’t matter if you use the UNIX syslog mechanism, the Windows NT Event Log, or something that’s built into your server, like Apache’s error_log. Although logging systems are built into both Java and .Net, many developers find that Log4J and log4net are especially effective.
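Here is a sketch of that split in Python, using the standard logging module. The full exception and stack trace go to the log, while the user gets a generic message. The incident ID is an addition of this sketch rather than something the article prescribes. It makes the log entry easy to find when a user reports a problem.

```python
import logging
import uuid

log = logging.getLogger("webapp")


def handle_request(handler, request):
    """Run one request as a unit of work: log everything, tell the user little."""
    try:
        return 200, handler(request)
    except Exception:
        # Hypothetical incident ID so a user's report can be matched to the log entry.
        incident = uuid.uuid4().hex[:8]
        # Logger.exception() logs at ERROR level and appends the full traceback.
        log.exception("incident %s while handling %r", incident, request)
        return 500, f"Internal Server Error (incident {incident})"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(handle_request(lambda name: "hello " + name, "ann"))  # (200, 'hello ann')
    print(handle_request(lambda name: 1 / 0, "bob"))  # status 500; traceback goes to the log
```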
There really are two ways to use logs:
1. Detailed logging information is useful for debugging problems after the fact. For
instance, if a user reports a problem, you can look in the logs to understand the
origin of the problem, making it easy to debug problems that occur rarely: this
can save hours of time trying to understand the exact problem a user is
experiencing.
2. A second approach to logs is proactive: to regularly look at logs to detect problems before they get reported. In the example above, the SqlException would probably first be thrown by an innocent person who has an apostrophe in his or her name; if the error was detected that day and quickly fixed, a potential security hole could be closed long before it was exploited. Organizations that investigate all exceptions thrown by production web applications run the most secure and reliable applications.
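For the O’Reilly bug, the fix that such a log entry should prompt is to stop pasting values into SQL text and to use the driver’s parameter placeholders instead. The sketch below uses Python’s sqlite3 module to show both the failure and the fix; the `people` table and the function names are assumptions.

```python
import sqlite3


def insert_unsafely(conn, first_name, last_name):
    # The anti-pattern: values are pasted straight into the SQL text.
    conn.execute("INSERT INTO people (first_name, last_name) "
                 "VALUES ('" + first_name + "','" + last_name + "')")


def insert_safely(conn, first_name, last_name):
    # "?" placeholders send values separately from the SQL text,
    # so an apostrophe in a name is just data.
    conn.execute("INSERT INTO people (first_name, last_name) VALUES (?, ?)",
                 (first_name, last_name))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
    try:
        insert_unsafely(conn, "Baba", "O'Reilly")
    except sqlite3.OperationalError as e:
        print("unsafe insert failed:", e)  # SQLite's parse error; wording varies by version
    insert_safely(conn, "Baba", "O'Reilly")
    print(conn.execute("SELECT last_name FROM people").fetchall())  # [("O'Reilly",)]
```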
In the last decade it’s become quite common for desktop applications to send stack
traces back to the developers after a crash: usually they pop up a dialog box that
asks for permission first. Although developers of desktop applications can’t be as
proactive as maintainers of server applications, this is a useful tool for discovering
errors that escape testing, and to discover how commonly they occur in the field.

Retry I: Do it again!
Some errors are transient: that is, if you try to do the same operation later, the
operation may succeed. Here are a few common cases:
An attempt to write to a DVD-R could fail because the disk is missing from the
drive
A database transaction could fail when you commit it because of a conflict with another transaction: an attempt to do the transaction again could succeed
An attempt to deliver a mail message could fail because of problems with the network or destination mail server
A web crawler that crawls thousands (or millions) of sites will find that many of them are down at any given time: it needs to deal with this reasonably, rather than drop your site from its index because it happened to be down for a few hours
Transient errors are commonly associated with the internet and with remote servers;
errors are frequent because of the complexity of the internet, but they’re transitory
because problems are repaired by both automatic and human intervention. For
instance, if a hardware failure causes a remote web or email server to go down, it’s
likely that somebody is going to notice the problem and fix it in a few hours or days.
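When a program retries automatically instead of asking the user to, a common approach is exponential backoff with jitter: wait a little, then progressively longer, and give up after a fixed number of attempts. The article itself does not prescribe this technique. The sketch below assumes a hypothetical `TransientError` that the work unit raises for failures worth retrying.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that may clear up on their own."""


def retry(operation, attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying TransientError with exponential backoff.

    Any other exception propagates at once: retrying a bug will not fix it.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the unit of work decide
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random "jitter" keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


if __name__ == "__main__":
    calls = []

    def flaky_fetch():
        calls.append(1)
        if len(calls) < 3:
            raise TransientError("remote server unavailable")
        return "page contents"

    print(retry(flaky_fetch, base_delay=0.01))  # page contents (after two retries)
```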
One strategy for dealing with transient errors is to punt them back to the user: in a case
like this, we display an error message that tells the user that the problem might clear
up if they retry the operation. This is implicit in how web browsers work: sometimes
you try to visit a web page, you get an error message, then you hit reload and it’s all
OK. This strategy is particularly effective when the user could be aware that there’s a
problem with their internet connection and could do something about it: for instance,
they might discover that they’ve moved their laptop out of Wi-Fi range, or that the
DSL connection at their house has gone down for the weekend.
SMTP, the internet protocol for email, is one of the best examples of automated retry.
Compliant e-mail servers store outgoing mail in a queue: if an attempt to send mail to
a destination server fails, mail will stay in the queue for several days before reporting
failure to the user. Section 4.5.4 of RFC 2821 states:
The sender MUST delay retrying a particular destination after one
attempt has failed. In general, the retry interval SHOULD be at
least 30 minutes; however, more sophisticated and variable strategies
will be beneficial when the SMTP client can determine the reason for
non-delivery.
Retries continue until the message is transmitted or the sender gives
up; the give-up time generally needs to be at least 4-5 days. The
parameters to the retry algorithm MUST be configurable.
A client SHOULD keep a list of hosts it cannot reach and
corresponding connection timeouts, rather than just retrying queued
mail items.
Experience suggests that failures are typically transient (the target
system or its connection has crashed), favoring a policy of two
connection attempts in the first hour the message is in the queue,
and then backing off to one every two or three hours.
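Read as an algorithm, the quoted guidance yields a schedule like the one sketched below: one retry 30 minutes after the first failure (giving two attempts in the first hour), then one every two hours, with a give-up time of five days. The exact numbers are assumptions chosen from within the ranges the RFC describes. They are parameters because the RFC requires them to be configurable.

```python
from datetime import timedelta


def retry_offsets(first_retry=timedelta(minutes=30),
                  interval=timedelta(hours=2),
                  give_up=timedelta(days=5)):
    """Times after the first failed attempt at which delivery is retried.

    If the last retry also fails, the message is returned to the sender.
    """
    offsets = []
    t = first_retry
    while t <= give_up:
        offsets.append(t)
        t += interval
    return offsets


if __name__ == "__main__":
    schedule = retry_offsets()
    print(len(schedule))             # 60
    print(schedule[0], schedule[1])  # 0:30:00 2:30:00
    print(schedule[-1])              # 4 days, 22:30:00
```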

Practical mail servers use fsync() and other mechanisms to implement transactional semantics on the queue. The needs of reliability make it expensive to run an SMTP-compliant server, so e-mail spammers often use non-compliant servers that don’t correctly retry (if they’re going to send you 20 copies of the message anyway, who cares if only 15 get through?). Greylisting is a highly effective filtering strategy that tests the compliance of SMTP senders by forcing a retry.
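Greylisting turns the retry requirement into a filter. The sketch below assumes one common policy, though details vary between implementations. The first attempt from an unseen (client IP, sender, recipient) triplet gets a temporary rejection, and a retry that arrives after a minimum delay is accepted. The five-minute default and the reply texts are assumptions, and a real implementation would also expire old entries.

```python
import time


class Greylist:
    """Minimal greylisting sketch; the policy details here are assumptions."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.clock = clock          # injectable for testing
        self.first_seen = {}        # (ip, sender, recipient) -> first attempt time

    def check(self, client_ip, sender, recipient):
        key = (client_ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 OK"
        # A 4xx reply is a temporary failure: compliant senders queue and retry,
        # while senders that never retry are filtered out.
        return "451 Greylisted, please try again later"
```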

Retry II: If first you don’t succeed…
An alternate form of retry is to try something different. For instance, many programs
in the UNIX environment will look in many different places for a configuration file: if
the file isn’t in the first place tried, it will try the second place and so forth.
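Here is a sketch of that fallback search in Python. The file name `myapp.conf` and the three locations are hypothetical, since real programs document their own search order. The function remembers each failed attempt, so if every location fails, the final error says where the program looked.

```python
from pathlib import Path


def default_candidates():
    # Hypothetical search order for a hypothetical program called "myapp".
    return [
        Path("myapp.conf"),           # current directory
        Path.home() / ".myapp.conf",  # per-user settings
        Path("/etc/myapp.conf"),      # system-wide settings
    ]


def find_config(candidates=None):
    """Return (path, text) for the first candidate that can be read."""
    if candidates is None:
        candidates = default_candidates()
    tried = []
    for path in candidates:
        try:
            return path, path.read_text()
        except OSError as e:  # missing, unreadable, a directory, ...
            tried.append(f"{path}: {e.strerror}")
    raise FileNotFoundError("no config file found; tried:\n  " + "\n  ".join(tried))
```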
The online e-print server at arXiv.org has a system called AutoTex which automatically converts documents written in several dialects of TeX and LaTeX into PostScript and PDF files. AutoTex unpacks the files in a submission into a directory and uses chroot to run the document processing tools in a protected sandbox. It tries about ten different configurations until it finds one that successfully compiles the document.
In embedded applications, where availability is important, it’s common to fall back to a “safe mode” when normal operation is impossible. The Engine Control Unit in a modern car is a good example:

Since the 1970s, regulations in the United States have reduced emissions of hydrocarbons and nitrogen oxides from passenger automobiles by more than a hundredfold. The technology has many aspects, but the core of the system is an Engine Control Unit (ECU) that uses a collection of sensors to monitor the state of the engine. The ECU uses this information to adjust engine parameters (such as the quantity of fuel injected), balancing performance and fuel economy against environmental compliance. Because the condition of the engine, driving conditions and composition of fuel change over time, the ECU normally operates in a “closed-loop” mode that continually optimizes performance. When part of the system fails (for instance, the oxygen sensor), the ECU switches to an “open-loop” mode. Rather than leaving you stranded, it lights the “check engine” indicator and operates the engine with conservative assumptions that will get you home and to a repair shop.
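The same “degrade rather than stop” idea can be sketched in code. This is a toy, not real engine control: the sensor interface, the plausibility range and the fuel numbers are all invented for illustration. An implausible oxygen-sensor reading raises an exception. The handler responds by switching to a fixed, conservative open-loop setting and setting the check-engine flag instead of giving up.

```python
from dataclasses import dataclass


class SensorFault(Exception):
    """Hypothetical error for a missing or implausible sensor reading."""


@dataclass
class FuelDecision:
    fuel_ms: float  # injector pulse width in ms (illustrative only)
    closed_loop: bool
    check_engine: bool


# Invented numbers for illustration; not real calibration data.
BASE_FUEL_MS = 3.0
OPEN_LOOP_FUEL_MS = 3.6  # fixed, deliberately conservative (rich) setting


def read_lambda(sensor):
    """Return the oxygen sensor's air-fuel ratio reading (lambda), or raise SensorFault."""
    value = sensor()
    if value is None or not 0.5 <= value <= 1.5:
        raise SensorFault(f"implausible oxygen sensor reading: {value!r}")
    return value


def decide_fuel(sensor):
    try:
        lam = read_lambda(sensor)
    except SensorFault:
        # Safe mode: keep the engine running on conservative, open-loop settings.
        return FuelDecision(OPEN_LOOP_FUEL_MS, closed_loop=False, check_engine=True)
    # Closed loop (toy proportional correction): a lean reading (lambda > 1)
    # asks for more fuel, a rich one (lambda < 1) for less.
    return FuelDecision(BASE_FUEL_MS * lam, closed_loop=True, check_engine=False)
```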

Ignore?
One strength of exceptions, compared to the older return-value method of error handling, is that the default behavior of an exception is to abort, not to ignore. In general, that’s good, but there are a few cases where “ignore” is the best option. Ignoring an error makes sense when:
1. Security is not at stake,
2. there’s no alternative action available, and
3. the consequences of an abort are worse than the consequences of ignoring the error.
The first rule is important, because crackers will take advantage of system faults to attack a system. Imagine, for instance, a “smart card” chip embedded in a payment card. People have successfully extracted information from smart cards by fault injection: this could be anything from a power dropout to a bright flash of light on an exposed silicon surface. If you’re concerned that a system will be abused, it’s probably best to shut down when abnormal conditions are detected.
On the other hand, some operations are incidental to an application. Imagine, for instance, a dialog box that pops up when an application crashes and offers the user the choice of sending a stack trace to the vendor. If the attempt to send the stack trace fails, it’s best to ignore the failure: there’s no point in subjecting the user to an endless series of dialog boxes.
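Here is a sketch of such a deliberately ignored failure, using only Python’s standard library. The endpoint URL is a placeholder, and the plain-text upload format is an assumption. The except clause is narrow (network and I/O errors only), so a programming bug in the reporter itself would still surface.

```python
import logging
import urllib.request

log = logging.getLogger("crash_reporter")

# Placeholder endpoint; a real application would use its vendor's URL.
REPORT_URL = "https://crash-reports.example.com/submit"


def send_crash_report(stack_trace, url=REPORT_URL, timeout=5.0):
    """Try once to upload a stack trace; return True if it was accepted.

    Network failures are deliberately ignored (logged at DEBUG level only):
    the application has already failed, and retry dialogs would just pester
    the user.
    """
    request = urllib.request.Request(
        url,
        data=stack_trace.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except OSError as e:  # URLError, HTTPError and socket timeouts are all OSErrors
        log.debug("crash report not sent: %s", e)
        return False
```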
“Ignoring” often makes sense in the applications that matter the most and those that
matter the least.
For instance, media players and video games operate in a hostile environment where disks, the network, sound and controller hardware are uncooperative. The “unit of work” could be the rendering of an individual frame: it’s appropriate for entertainment devices to soldier on despite hardware defects, unplugged game controllers, network dropouts and corrupted inputs, since the consequences of failure are no worse than shutting the system down.
In the opposite case, high-value and high-risk systems should continue functioning no matter what happens. The software for a space probe, for instance, should never give up. Much like an automotive ECU, space probes default to a “safe mode” when contact with Earth is lost: frequently this strategy involves one or more reboots, but the goal is always to regain contact with controllers so that the mission has a chance at success.

Conclusion
It’s most practical to catch exceptions at the boundaries of relatively coarse “units of
work.” Although the handling of errors usually involves some amount of rollback
(restoring system state) and notification of affected people, the ultimate choices are
still what they were in the days of DOS: abort, retry, or ignore.
Correct handling of an error requires some thought about the cause of an error: was it
caused by bad input, corrupted application state, or a transient network failure? It’s
also important to understand the impact the error has on the application state and to
try to reduce it using mechanisms such as database transactions.
“Abort” is a logical choice when an error is likely to have caused corruption of the
application state, or if an error was probably caused by a corrupted state. Applications
that depend on network communications sometimes must “Retry” operations when
they are interrupted by network failures. Another form of “Retry” is to try a different
approach to an operation when the first approach fails. Finally, “Ignore” is appropriate
when “Retry” isn’t available and the cost of “Abort” is worse than soldiering on.
This article is one of a series on error handling. The next article in this series will describe practices for defining and throwing exceptions that give exception handlers good information for making decisions. Subscribers to our RSS Feed will be the first to read it.
Paul Houle on August 27th 2008 in Dot Net, Exceptions, Java, PHP, SQL

Comments (4)

Brandon Edens · 280 weeks ago

Change the nature of the game by using a programming language that supports something beyond
primitive exceptions, try/catch/finally, etc...
Try a condition system today:
http://www.gigamonkeys.com/book/beyond-exception-...

Paul Houle · 280 weeks ago

@Brandon,
that's neat stuff. I see some things in that chapter that are right along the lines that I'm thinking.
Could this behavior be easily emulated in a language like C# that supports lambdas and delegates?

web design company · 280 weeks ago

Throw it back

Generation 5 » Twitter Joins Me

[...] several other bloggers had hotlinked the copy of the twitter fail whale that was in my old “What do
you do if you catch an exception?” post.  It turns out that my copy of the whale currently ranks #1 in
Google Image Search.  [...]


Copyright © 2013 Generation 5.
Failure Free Cloud Computing ArchitecturesFailure Free Cloud Computing Architectures
Failure Free Cloud Computing Architectures
 
Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)Evolution of Monitoring and Prometheus (Dublin 2018)
Evolution of Monitoring and Prometheus (Dublin 2018)
 
Why software performance reduces with time?.pdf
Why software performance reduces with time?.pdfWhy software performance reduces with time?.pdf
Why software performance reduces with time?.pdf
 
System structure
System structureSystem structure
System structure
 
Computer integrated manufacturing
Computer integrated manufacturingComputer integrated manufacturing
Computer integrated manufacturing
 
Evolving role of Software,Legacy software,CASE tools,Process Models,CMMI
Evolving role of Software,Legacy software,CASE tools,Process Models,CMMIEvolving role of Software,Legacy software,CASE tools,Process Models,CMMI
Evolving role of Software,Legacy software,CASE tools,Process Models,CMMI
 
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
 
Program Aging and Service Crash
Program Aging and Service CrashProgram Aging and Service Crash
Program Aging and Service Crash
 
A Study Of Real-Time Embedded Software Systems And Real-Time Operating Systems
A Study Of Real-Time Embedded Software Systems And Real-Time Operating SystemsA Study Of Real-Time Embedded Software Systems And Real-Time Operating Systems
A Study Of Real-Time Embedded Software Systems And Real-Time Operating Systems
 
DevOps_SelfHealing
DevOps_SelfHealingDevOps_SelfHealing
DevOps_SelfHealing
 

More from Paul Houle

Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6
Paul Houle
 
Estimating the Software Product Value during the Development Process
Estimating the Software Product Value during the Development ProcessEstimating the Software Product Value during the Development Process
Estimating the Software Product Value during the Development Process
Paul Houle
 
Universal Standards for LEI and other Corporate Reference Data: Enabling risk...
Universal Standards for LEI and other Corporate Reference Data: Enabling risk...Universal Standards for LEI and other Corporate Reference Data: Enabling risk...
Universal Standards for LEI and other Corporate Reference Data: Enabling risk...
Paul Houle
 
Fixing a leaky bucket; Observations on the Global LEI System
Fixing a leaky bucket; Observations on the Global LEI SystemFixing a leaky bucket; Observations on the Global LEI System
Fixing a leaky bucket; Observations on the Global LEI System
Paul Houle
 
Cisco Fog Strategy For Big and Smart Data
Cisco Fog Strategy For Big and Smart DataCisco Fog Strategy For Big and Smart Data
Cisco Fog Strategy For Big and Smart Data
Paul Houle
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web work
Paul Houle
 
Ontology2 platform
Ontology2 platformOntology2 platform
Ontology2 platform
Paul Houle
 
Ontology2 Platform Evolution
Ontology2 Platform EvolutionOntology2 Platform Evolution
Ontology2 Platform Evolution
Paul Houle
 
Paul houle the supermen
Paul houle   the supermenPaul houle   the supermen
Paul houle the supermen
Paul Houle
 
Paul houle what ails enterprise search
Paul houle   what ails enterprise search Paul houle   what ails enterprise search
Paul houle what ails enterprise search
Paul Houle
 
Subjective Importance Smackdown
Subjective Importance SmackdownSubjective Importance Smackdown
Subjective Importance Smackdown
Paul Houle
 
Extension methods, nulls, namespaces and precedence in c#
Extension methods, nulls, namespaces and precedence in c#Extension methods, nulls, namespaces and precedence in c#
Extension methods, nulls, namespaces and precedence in c#
Paul Houle
 
Dropping unique constraints in sql server
Dropping unique constraints in sql serverDropping unique constraints in sql server
Dropping unique constraints in sql server
Paul Houle
 
Paul houle resume
Paul houle resumePaul houle resume
Paul houle resume
Paul Houle
 
Embrace dynamic PHP
Embrace dynamic PHPEmbrace dynamic PHP
Embrace dynamic PHP
Paul Houle
 
Once asynchronous, always asynchronous
Once asynchronous, always asynchronousOnce asynchronous, always asynchronous
Once asynchronous, always asynchronous
Paul Houle
 
Pro align snap 2
Pro align snap 2Pro align snap 2
Pro align snap 2
Paul Houle
 
Proalign Snapshot 1
Proalign Snapshot 1Proalign Snapshot 1
Proalign Snapshot 1
Paul Houle
 
Text wise technology textwise company, llc
Text wise technology   textwise company, llcText wise technology   textwise company, llc
Text wise technology textwise company, llc
Paul Houle
 
Tapir user manager
Tapir user managerTapir user manager
Tapir user manager
Paul Houle
 

More from Paul Houle (20)

Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6
 
Estimating the Software Product Value during the Development Process
Estimating the Software Product Value during the Development ProcessEstimating the Software Product Value during the Development Process
Estimating the Software Product Value during the Development Process
 
Universal Standards for LEI and other Corporate Reference Data: Enabling risk...
Universal Standards for LEI and other Corporate Reference Data: Enabling risk...Universal Standards for LEI and other Corporate Reference Data: Enabling risk...
Universal Standards for LEI and other Corporate Reference Data: Enabling risk...
 
Fixing a leaky bucket; Observations on the Global LEI System
Fixing a leaky bucket; Observations on the Global LEI SystemFixing a leaky bucket; Observations on the Global LEI System
Fixing a leaky bucket; Observations on the Global LEI System
 
Cisco Fog Strategy For Big and Smart Data
Cisco Fog Strategy For Big and Smart DataCisco Fog Strategy For Big and Smart Data
Cisco Fog Strategy For Big and Smart Data
 
Making the semantic web work
Making the semantic web workMaking the semantic web work
Making the semantic web work
 
Ontology2 platform
Ontology2 platformOntology2 platform
Ontology2 platform
 
Ontology2 Platform Evolution
Ontology2 Platform EvolutionOntology2 Platform Evolution
Ontology2 Platform Evolution
 
Paul houle the supermen
Paul houle   the supermenPaul houle   the supermen
Paul houle the supermen
 
Paul houle what ails enterprise search
Paul houle   what ails enterprise search Paul houle   what ails enterprise search
Paul houle what ails enterprise search
 
Subjective Importance Smackdown
Subjective Importance SmackdownSubjective Importance Smackdown
Subjective Importance Smackdown
 
Extension methods, nulls, namespaces and precedence in c#
Extension methods, nulls, namespaces and precedence in c#Extension methods, nulls, namespaces and precedence in c#
Extension methods, nulls, namespaces and precedence in c#
 
Dropping unique constraints in sql server
Dropping unique constraints in sql serverDropping unique constraints in sql server
Dropping unique constraints in sql server
 
Paul houle resume
Paul houle resumePaul houle resume
Paul houle resume
 
Embrace dynamic PHP
Embrace dynamic PHPEmbrace dynamic PHP
Embrace dynamic PHP
 
Once asynchronous, always asynchronous
Once asynchronous, always asynchronousOnce asynchronous, always asynchronous
Once asynchronous, always asynchronous
 
Pro align snap 2
Pro align snap 2Pro align snap 2
Pro align snap 2
 
Proalign Snapshot 1
Proalign Snapshot 1Proalign Snapshot 1
Proalign Snapshot 1
 
Text wise technology textwise company, llc
Text wise technology   textwise company, llcText wise technology   textwise company, llc
Text wise technology textwise company, llc
 
Tapir user manager
Tapir user managerTapir user manager
Tapir user manager
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 

What do you do when you’ve caught an exception?

As an application becomes more complex, the number of possible errors explodes, and it becomes impossible or unacceptably expensive to implement explicit handling of every condition. What to do about unanticipated errors is a controversial topic. Two extreme positions are: (i) an unexpected error could be a sign that the application is corrupted, so the application should be shut down, and (ii) systems should bend but not break: we should be optimistic and hope for the best. Ultimately, there’s a contradiction between integrity and availability, and different systems make different choices. The ecosystem around Microsoft Windows, where people predominantly develop desktop applications, is inclined to give up the ghost when things go wrong: better to show a “blue screen of death” than to let the unpredictable happen. In the Unix ecosystem, which is more centered around server applications and custom scripts, the tendency is to soldier on in the face of adversity.

What’s at stake?
Desktop applications tend to fail when unexpected errors happen, so users learn to save frequently. Some of the best applications, such as GNU Emacs and Microsoft Word, keep a running log of changes to minimize work lost to application and system crashes. Users accept the situation.

http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/[1/12/2014 8:27:44 PM]

On the other hand, it’s unreasonable for a server application that serves hundreds or millions of users to shut down on account of a cosmic ray. Embedded systems, in particular, function in a world where failure is frequent and its effects must be minimized. As we’ll see later, it would be a real bummer if the Engine Control Unit in your car left you stranded at home because your oxygen sensor quit working.

The following diagram illustrates the environment of a work unit in a typical application. (Although this application accesses network resources, we’re not thinking of it as a distributed application: we’re responsible for the correct behavior of the application running in a single address space, not for the correct behavior of a process swarm.)

The input to the work unit is a potential source of trouble. The input could be invalid, or it could trigger a bug in the work unit or elsewhere in the system (the “system” encompasses everything in the diagram). Even if the input is valid, it could contain a reference to a corrupted resource elsewhere in the system. A corrupted resource could be a damaged data structure (such as a colored box in a database), or an otherwise malfunctioning part of the system (a crashed server or router on the network).

Data structures in the work unit itself are the least problematic for purposes of error handling, because they don’t outlive the work unit and don’t have any impact on future work units. Static application data, on the other hand, persists after the work unit ends, and this has two possible consequences:
1. the current work unit can fail because a previous work unit caused a resource to be corrupted, and
2. the current work unit can corrupt a resource, causing a future work unit to fail.

Osterman’s argument that applications should crash on errors is based on this reality: an unanticipated failure is a sign that the application is in an unknown (and possibly bad) state, and it can’t be trusted to be reliable in the future. Stopping the application and restarting it clears out the static state, eliminating resource corruption.

Rebooting the application, however, might not free up corrupted resources inside the operating system. Both desktop and server applications suffer from operating system errors from time to time, and they can often get immediate relief by rebooting the whole computer.

The “reboot” strategy runs out of steam when we cross the line from in-RAM state to persistent state: state that’s stored on disks, or elsewhere on the network.
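Osterman’s remedy works because a restart discards every piece of state that outlived the failed work unit. The following is only a rough in-process analogue, not code from the article. A long-running loop can discard that state itself whenever a work unit fails unexpectedly, while treating expected input errors as harmless. `RestartingProcessor` and its cache are hypothetical. `IllegalArgumentException` stands in for an error that is known not to corrupt shared state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical processor whose cache is "static state": it outlives
// individual work units, so damage done by one unit can hurt the next.
class RestartingProcessor {
    private Map<String, Integer> cache = new HashMap<>();
    private int resets = 0;

    Map<String, Integer> cache() { return cache; }
    int resets() { return resets; }

    // Treats each input as one unit of work.
    void runAll(List<String> inputs, BiConsumer<String, Map<String, Integer>> unitOfWork) {
        for (String input : inputs) {
            try {
                unitOfWork.accept(input, cache);
            } catch (IllegalArgumentException e) {
                // Expected, input-level error: shared state is trusted; skip this unit.
                System.err.println("rejected " + input + ": " + e.getMessage());
            } catch (RuntimeException e) {
                // Unexpected error: shared state may be corrupt, so discard it,
                // the in-process equivalent of restarting the application.
                cache = new HashMap<>();
                resets++;
                System.err.println("state reset after " + input + ": " + e);
            }
        }
    }
}
```

Like a real restart, this clears only in-memory state. Anything the failed unit already wrote to disk or to a database is left untouched.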
Once resources in the persistent world are corrupted, they need to be (i) lived with, or repaired by (ii) manual or (iii) automatic action.

In either world, a corrupted resource can have either a narrow (blue) or wide (orange) effect on the application. For instance, the user account record of an individual user could be damaged, which prevents that user from logging in. That’s bad, but it would hardly be catastrophic for a system that has 100,000 users. It’s best to ‘ignore’ this error, because a system-wide ‘abort’ would deny service to 99,999 other users; the problem can be corrected when the user complains, or when the problem is otherwise detected by the system administrator.

If, on the other hand, the cryptographic signing key that controls the authentication process were lost, nobody would be able to log in: that’s quite a problem. It’s the kind of problem that will be noticed, however, so aborting at the work unit level (the authenticated request) is enough to protect the integrity of the system while the administrators repair the problem.

Problems can happen at an intermediate scope as well. For instance, if the system has damage to a message file for Italian users, people who use the system in the Italian language could be locked out. If Italian speakers are 10% of the users, it’s best to keep the system running for the others while you correct the problem.

Repair
There are several tools for dealing with corruption in persistent data stores. In a one-of-a-kind business system, a DBA may need to intervene occasionally to repair corruption. More common events can be handled by running scripts that detect and repair corruption, much like the fsck command in Unix or the chkdsk command in Windows.
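A repair script in this spirit typically does three things. It checks an invariant, fixes what can be fixed mechanically, and reports the rest to a human. The sketch below assumes a hypothetical schema in which each account caches a balance that should equal the sum of its ledger entries. Because the entries are treated as the source of truth, a drifted balance can be repaired automatically. An entry that points at a missing account is only reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ledger entry: the entries are the source of truth.
class LedgerEntry {
    final String accountId;
    final long amountCents;
    LedgerEntry(String accountId, long amountCents) {
        this.accountId = accountId;
        this.amountCents = amountCents;
    }
}

// Hypothetical account whose balance is a cached, denormalized total.
class Account {
    final String id;
    long balanceCents;
    Account(String id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }
}

class ConsistencyChecker {
    // Recomputes each balance from the ledger and repairs drift in place.
    // Problems that cannot be fixed mechanically are returned for a human.
    static List<String> checkAndRepair(Map<String, Account> accounts, List<LedgerEntry> ledger) {
        List<String> needsManualRepair = new ArrayList<>();
        Map<String, Long> expected = new HashMap<>();
        for (LedgerEntry e : ledger) {
            if (!accounts.containsKey(e.accountId)) {
                // No safe automatic fix: we don't know where this money belongs.
                needsManualRepair.add("orphan ledger entry for " + e.accountId);
                continue;
            }
            expected.merge(e.accountId, e.amountCents, Long::sum);
        }
        for (Account a : accounts.values()) {
            long correct = expected.getOrDefault(a.id, 0L);
            if (a.balanceCents != correct) {
                System.out.println("repairing " + a.id + ": " + a.balanceCents + " -> " + correct);
                a.balanceCents = correct; // automatic repair
            }
        }
        return needsManualRepair;
    }
}
```

Repairing silently only works when there is an unambiguous source of truth. Where there isn’t one, the script should hand the problem to a person rather than guess.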
Corruption in the metadata of a filesystem can potentially cause a sequence of events that leads to massive data loss. For that reason, UNIX systems have historically run fsck on a filesystem whenever it is in a questionable state (such as after a system crash or power failure). The time to do an fsck has become an increasing burden as disks have gotten larger, so modern UNIX systems use journaling filesystems that protect filesystem metadata with transactional semantics.

Release and Rollback
One role of an exception handler for a unit of work is to take steps to prevent corruption. This involves releasing resources, putting data in a safe state, and, when possible, rolling back transactions.

Many kinds of persistent store support transactions, and so can many in-memory data structures, but the most common transactional store that people use is the relational database. Although transactions don’t protect the database from all programming errors, they can ensure that neither expected nor unexpected exceptions will cause partially-completed work to remain in the database. A classic example in pseudo code is the following:

[06] function TransferMoney(fromAccount,toAccount,amount) {
[07]   try {
[08]     BeginTransaction();
[09]     ChangeBalance(toAccount,amount);
[10]     ... something throws exception here ...
[11]     ChangeBalance(fromAccount,-amount);
[12]     CommitTransaction();
[13]   } catch(Exception e) {
[14]     RollbackTransaction(); throw e;  // undo the partial work, then pass the error up
[15]   }
[16] }

In this (simplified) example, we’re transferring money from one bank account to another. An exception thrown at line [10] could be serious, since it would cause money to appear in toAccount without being removed from fromAccount. It’s bad enough if this happens by accident, but a clever cracker who finds a way to cause an exception at line [10] has discovered a way to steal money from the bank. Fortunately we’re doing this financial transaction inside a database transaction.
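The rollback guarantee can be demonstrated without a real database. The sketch below replaces the database with a tiny in-memory store that supports begin, commit and rollback. `TransactionalStore` is a hypothetical stand-in for BeginTransaction(), ChangeBalance(), CommitTransaction() and RollbackTransaction() in the pseudo code. The `failMidway` flag simulates “something throws exception here”. With a real database over JDBC, the corresponding calls are `Connection.setAutoCommit(false)`, `commit()` and `rollback()`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal, hypothetical in-memory store with transactional semantics.
class TransactionalStore {
    private Map<String, Long> committed = new HashMap<>();
    private Map<String, Long> working = null; // non-null while a transaction is open

    void put(String account, long balance) { committed.put(account, balance); }
    long balance(String account) { return committed.getOrDefault(account, 0L); }

    // Start a transaction by working on a private copy of the data.
    void begin() { working = new HashMap<>(committed); }

    void changeBalance(String account, long delta) {
        if (working == null) throw new IllegalStateException("no open transaction");
        working.merge(account, delta, Long::sum);
    }

    // Provisional changes become visible only here.
    void commit() { committed = working; working = null; }

    // Throw away every provisional change made since begin().
    void rollback() { working = null; }
}

class Bank {
    static void transferMoney(TransactionalStore db, String from, String to,
                              long amount, boolean failMidway) {
        db.begin();
        try {
            db.changeBalance(to, amount);
            if (failMidway) throw new IllegalStateException("something throws exception here");
            db.changeBalance(from, -amount);
            db.commit();
        } catch (RuntimeException e) {
            db.rollback();
            throw e; // let the unit of work decide what to do next
        }
    }
}
```

A failed transfer leaves both balances exactly as they were. The exception still reaches the caller, so the unit of work can decide whether to abort, retry or ignore.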
Everything done after BeginTransaction() is provisional: it doesn’t actually appear in the database until CommitTransaction() is called. When an exception happens, we call RollbackTransaction(), which makes it as if the first ChangeBalance() had never been called. As mentioned in the “Don’t Catch Exceptions” article, it often makes sense to do release, rollback and repairing operations in a finally clause rather than the unit-ofwork catch clause because it lets an individual subsystem take care of itself — this http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/[1/12/2014 8:27:44 PM]
  • 4. Generation 5 » What do you do when you’ve caught an exception? promotes encapsulation. However, in applications that use databases transactionally, it often makes sense to push transaction management out the the work unit. Why? Complex database operations are often composed out of simpler database operations that, themselves, should be done transactionally. To take an example, imagine that somebody is opening a new account and funding it from an existing account: [17] function OpenAndFundNewAccount(accountInformation,oldAccount,amount) { [18] if (amount<MinimumAmount) { [19] throw new InvalidInputException( [20] "Attempted To Create Account With Balance Below Minimum" [21] ); [22] } [23] newAccount=CreateNewAccountRecords(accountInformation); [24] TransferMoney(oldAccount,newAccount,amount);| [25] } It’s important that the TransferMoney operation be done transactionally, but it’s also important that the whole OpenAndFundNewAccount operation be done transactionally too, because we don’t want an account in the system to start with a zero balance. A straightforward answer to this problem is to always do banking operations inside a unit of work, and to begin, commit and roll back transactions at the work unit level: [26] AtmOutput ProcessAtmRequest(AtmInput in) { [27] try { [28] BeginTransaction(); [29] BankingOperation op=AtmInput.ParseOperation(); [30] var out=op.Execute(); [31] var atmOut=AtmOutput.Encode(out); [32] CommitTransaction(); [33] return atmOut; [34] } [35] catch(Exception e) { [36] RollbackTransaction(); [37] ... Complete Error Handling ... [38] } In this case, there might be a large number of functions that are used to manipulate the database internally, but these are only accessable to customers and bank tellers through a limited set of BankingOperations that are always executed in a transaction. Notification There are several parties that could be notified when something goes wrong with an application, most commonly: 1. the end user, 2. 
the system administrator, and 3. the developers. Sometimes, as in the case of a public-facing web application, #2 and #3 may overlap. In desktop applications, #2 might not exist.

Let’s consider the end user first. The end user really needs to know (i) that something went wrong, and (ii) what they can do about it. Often errors are caused by user input. Hopefully these errors are expected, so the system can tell the user specifically what went wrong, for instance:

try {
    ... process form information ...

    if (!IsWellFormedSSN(ssn))
        throw new InvalidInputException("You must supply a valid social security number");

    ... process form some more ...
} catch(InvalidInputException e) {
    DisplayError(e.Message);
}

Other times, errors happen that are unexpected. Consider a common (and bad) practice that we see in database applications: programs that write queries without correctly escaping strings:

// DANGEROUS: user input is pasted directly into the SQL text
dbConn.Execute(
    "INSERT INTO people (first_name,last_name) " +
    "VALUES ('" + firstName + "','" + lastName + "');"
);

This code is straightforward, but dangerous. A single quote in the firstName or lastName variable ends the string literal in the VALUES clause, and that enables an SQL injection attack. (I’d hope that you know better than to do this, but large projects worked on by large teams inevitably have problems of this order.) This code
might even hold up well in testing, failing only in production when a person registers with

lastName="O'Reilly";

Now, the dbConn is going to throw something like a SqlException with the following message:

SqlException.Message="Invalid SQL Statement:
INSERT INTO people (first_name,last_name)
VALUES ('Baba','O'Reilly');"

We could show that message to the end user, but it is worthless to most people. Worse, it’s harmful if the end user is a cracker who could take advantage of the error. It reveals the name of the affected table, the names of the columns, and the exact SQL code that they can inject something into. You might be better off showing users a generic error page and telling them that they’ve experienced an “Internal Server Error.” Even so, the discovery that a single quote can cause an “Internal Server Error” can be enough for a good cracker to sniff out the fault and develop an attack in the blind.

What can we do? Warn the system administrators. The error handling system for a server application should log exceptions, stack trace and all. It doesn’t matter if you use the UNIX syslog mechanism, the logging service in Windows NT, or something that’s built into your server, like Apache’s error_log. Although logging systems are built into both Java and .NET, many developers find that log4j and log4net are especially effective.

There really are two ways to use logs:

1. Detailed logging information is useful for debugging problems after the fact. For instance, if a user reports a problem, you can look in the logs to understand its origin. This makes it easy to debug problems that occur rarely, and it can save hours you would otherwise spend trying to reproduce the exact problem a user is experiencing.
2. A second approach to logs is proactive: to look at the logs regularly and detect problems before they get reported.
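The split between what the user sees and what the logs record can be sketched in a few lines of Java using the standard `java.util.logging` package. This is a minimal illustration under stated assumptions, not a framework. `UnitOfWorkRunner`, `run` and the request reference shown to the user are hypothetical. A real server would plug in its own logging setup (syslog, the Windows NT logging service, log4j, and so on).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class UnitOfWorkRunner {
    private static final Logger LOG = Logger.getLogger(UnitOfWorkRunner.class.getName());

    // Runs one unit of work and returns the message the end user should see.
    static String run(String requestId, Runnable work) {
        try {
            work.run();
            return "OK";
        } catch (RuntimeException e) {
            // Administrators and developers get everything: message plus stack trace.
            LOG.log(Level.SEVERE, "Unit of work failed, request " + requestId, e);
            // The user gets a generic message with no table, column or SQL details,
            // plus a (hypothetical) reference they can quote when reporting the problem.
            return "Internal Server Error (reference " + requestId + ")";
        }
    }

    public static void main(String[] args) {
        String shown = run("req-42", () -> {
            throw new IllegalStateException(
                "Invalid SQL Statement: INSERT INTO people ... VALUES ('Baba','O'Reilly');");
        });
        System.out.println(shown); // prints: Internal Server Error (reference req-42)
    }
}
```

Putting a reference in the user-facing message is one way to connect a user's report to the matching log entry. That supports the first, after-the-fact use of logs, while the full stack trace stays out of the cracker's view.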
In the O’Reilly example, the SqlException would probably first be triggered by an innocent person who has an apostrophe in his or her name. If the error were detected that day and quickly fixed, a potential security hole could be closed long before anyone exploited it. Organizations that investigate every exception thrown by their production web applications tend to run the most secure and reliable applications.

Over the last decade it’s become quite common for desktop applications to send stack traces back to the developers after a crash: usually they pop up a dialog box that asks for permission first. Developers of desktop applications can’t be as proactive as maintainers of server applications. Even so, these reports are a useful tool for discovering errors that escape testing and for learning how commonly they occur in the field.

Retry I: Do it again!

Some errors are transient: that is, if you try to do the same operation later, the operation may succeed. Here are a few common cases:

An attempt to write to a DVD-R could fail because the disc is missing from the drive.
A database transaction could fail when you commit it because of a conflict with another transaction; trying the transaction again could succeed.
An attempt to deliver a mail message could fail because of problems with the network or the destination mail server.
A web crawler that crawls thousands (or millions) of sites will find that many of them are down at any given time. It needs to deal with this reasonably, rather than drop your site from its index because it happened to be down for a few hours.

Transient errors are commonly associated with the internet and with remote servers. They are frequent because of the complexity of the internet, but they’re transitory because problems are repaired by both automatic and human intervention. For instance, if a hardware failure causes a remote web or email server to go down, it’s likely that somebody is going to notice the problem and fix it in a few hours or days.

One strategy for dealing with transient errors is to punt them back to the user. We display an error message that tells the user that the problem might clear up if they retry the operation. This is implicit in how web browsers work: sometimes you try to visit a web page, you get an error message, then you hit reload and it’s all OK. This strategy is particularly effective when the user could be aware of a problem with their internet connection and could do something about it. For instance, they might discover that they’ve moved their laptop out of Wi-Fi range, or that the DSL connection at their house has gone down for the weekend.

SMTP, the internet protocol for email, is one of the best examples of automated retry. Compliant e-mail servers store outgoing mail in a queue. If an attempt to send mail to a destination server fails, the mail will stay in the queue for several days before the server reports failure to the user. Section 4.5.4 of RFC 2821 states:

The sender MUST delay retrying a particular destination after one attempt has failed.
In general, the retry interval SHOULD be at least 30 minutes; however, more sophisticated and variable strategies will be beneficial when the SMTP client can determine the reason for non-delivery. Retries continue until the message is transmitted or the sender gives up; the give-up time generally needs to be at least 4-5 days. The parameters to the retry algorithm MUST be configurable. A client SHOULD keep a list of hosts it cannot reach and corresponding connection timeouts, rather than just retrying queued mail items. Experience suggests that failures are typically transient (the target system or its connection has crashed), favoring a policy of two connection attempts in the first hour the message is in the queue, and then backing off to one every two or three hours.

Practical mail servers use fsync() and other mechanisms to implement transactional semantics on the queue. The needs of reliability make it expensive to run an SMTP-compliant server, so e-mail spammers often use non-compliant servers that don’t correctly retry. (If they’re going to send you 20 copies of the message anyway, who cares if only 15 get through?) Greylisting is a highly effective filtering strategy that tests whether SMTP senders are compliant. It temporarily rejects mail from unfamiliar senders, which forces them to retry.

Retry II: If first you don’t succeed…

An alternate form of retry is to try something different. For instance, many programs in the UNIX environment will look in several different places for a configuration file: if the file isn’t in the first place tried, they try the second place, and so forth.

The online e-print server at arXiv.org has a system called AutoTex. It automatically converts documents written in several dialects of TeX and LaTeX into PostScript and PDF files. AutoTex unpacks the files in a submission into a directory and uses chroot to run the document processing tools in a protected sandbox. It tries about ten different configurations until it finds one that successfully compiles the document.
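The configuration-file search and AutoTex’s sequence of configurations share one shape: try the alternatives in order and stop at the first one that works. Below is a minimal Java sketch of that pattern. It assumes each alternative is a `Supplier` that signals failure by throwing an unchecked exception. `FallbackChain`, `firstSuccessful` and the file paths are hypothetical and are not taken from AutoTex or any UNIX program.

```java
import java.util.List;
import java.util.function.Supplier;

final class FallbackChain {
    // Tries each alternative in order and returns the first result that does not throw.
    // If every alternative fails, throws one exception that carries each failure
    // as a suppressed exception, so the unit-of-work handler can see why.
    static <T> T firstSuccessful(List<Supplier<T>> alternatives) {
        RuntimeException allFailed =
            new RuntimeException("all " + alternatives.size() + " alternatives failed");
        for (Supplier<T> alternative : alternatives) {
            try {
                return alternative.get();
            } catch (RuntimeException e) {
                allFailed.addSuppressed(e); // remember this failure and move on
            }
        }
        throw allFailed;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for "look for the configuration file in several places".
        List<Supplier<String>> places = List.of(
            () -> { throw new IllegalStateException("./app.conf not found"); },
            () -> { throw new IllegalStateException("~/.apprc not found"); },
            () -> "settings from /etc/app.conf");
        System.out.println(firstSuccessful(places)); // prints: settings from /etc/app.conf
    }
}
```

Each failure is recorded with `addSuppressed`. When every alternative fails, the single exception passed up to the unit of work therefore still explains why each attempt failed.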
In embedded applications, where availability is important, it’s common to fall back to a “safe mode” when normal operation is impossible. The Engine Control Unit in a modern car is a good example:
Since the 1970s, regulations in the United States have reduced emissions of hydrocarbons and nitrogen oxides from passenger automobiles more than a hundredfold. The technology has many aspects, but the core of the system is an Engine Control Unit (ECU). It uses a collection of sensors to monitor the state of the engine and adjusts engine parameters (such as the quantity of fuel injected) to balance performance and fuel economy with environmental compliance.

The condition of the engine, the driving conditions and the composition of the fuel all change over time, so the ECU normally operates in a “closed-loop” mode that continually optimizes performance. When part of the system fails (for instance, the oxygen sensor), the ECU switches to an “open-loop” mode. Rather than leaving you stranded, it lights the “check engine” indicator and operates the engine with conservative assumptions that will get you home and to a repair shop.

Ignore?

One strength of exceptions, compared to the older return-value method of error handling, is that the default behavior of an exception is to abort, not to ignore. In general, that’s good, but there are a few cases where “ignore” is the best option. Ignoring an error makes sense when:

1. security is not at stake,
2. there’s no alternative action available, and
3. the consequences of an abort are worse than the consequences of ignoring the error.

The first rule is important because crackers will take advantage of system faults to attack a system. Imagine, for instance, a “smart card” chip embedded in a payment card. People have successfully extracted information from smart cards by fault injection, which could be anything from a power dropout to a bright flash of light on an exposed silicon surface. If you’re concerned that a system will be abused, it’s probably best to shut down when abnormal conditions are detected.
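The three conditions, together with the two kinds of retry described earlier, can be folded into a single decision for the unit-of-work catch block. The Java below is a minimal sketch, not a prescription. `ErrorPolicy`, `ErrorFacts` and `Action` are hypothetical names, and the sketch assumes the handler can already answer each yes/no question about the error. In practice, answering those questions is the hard part.

```java
final class ErrorPolicy {
    enum Action { ABORT, RETRY, IGNORE }

    // Hypothetical summary of what the handler has learned about one error.
    static final class ErrorFacts {
        final boolean transientFailure;       // the same operation might succeed later
        final boolean alternativeAvailable;   // a different approach could be tried
        final boolean securityAtStake;        // the fault could be exploited by an attacker
        final boolean abortWorseThanIgnoring; // stopping costs more than carrying on

        ErrorFacts(boolean transientFailure, boolean alternativeAvailable,
                   boolean securityAtStake, boolean abortWorseThanIgnoring) {
            this.transientFailure = transientFailure;
            this.alternativeAvailable = alternativeAvailable;
            this.securityAtStake = securityAtStake;
            this.abortWorseThanIgnoring = abortWorseThanIgnoring;
        }
    }

    static Action decide(ErrorFacts f) {
        // Retry I (try the same thing later) or Retry II (try something different).
        if (f.transientFailure || f.alternativeAvailable) {
            return Action.RETRY;
        }
        // Ignore only if all three rules hold: rule 2 (no alternative) is
        // guaranteed by the check above; rules 1 and 3 are tested here.
        if (!f.securityAtStake && f.abortWorseThanIgnoring) {
            return Action.IGNORE;
        }
        // Otherwise keep the exception's default behavior.
        return Action.ABORT;
    }

    public static void main(String[] args) {
        // A fault on a smart card: security is at stake, so shut down.
        System.out.println(decide(new ErrorFacts(false, false, true, true)));  // prints: ABORT
        // A failed, non-critical side task with nothing else to try: carry on.
        System.out.println(decide(new ErrorFacts(false, false, false, true))); // prints: IGNORE
    }
}
```

The order of the checks is a design choice. A system worried about fault injection, like the smart card, might test `securityAtStake` first and abort before considering any retry.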
On the other hand, some operations are peripheral to an application. Imagine, for instance, a dialog box that pops up when an application crashes and offers the user the choice of sending a stack trace to the vendor. If the attempt to send the stack trace fails, it’s best to ignore the failure: there’s no point in subjecting the user to an endless series of dialog boxes.

“Ignoring” often makes sense in the applications that matter the most and in those that matter the least. For instance, media players and video games operate in a hostile environment where disks, the network, sound and controller hardware can be uncooperative. The “unit of work” could be the rendering of an individual frame. It’s appropriate for entertainment devices to soldier on despite hardware defects, unplugged game controllers, network dropouts and corrupted inputs, since the consequences of a glitch are no worse than those of shutting the system down.

In the opposite case, high-value and high-risk systems should continue functioning no matter what happens. The software for a space probe, for instance, should never give up. Much like an automotive ECU, space probes fall back to a “safe mode” when contact with Earth is lost. Frequently this strategy involves one or more reboots, but the goal is always to regain contact with controllers so that the mission has a chance at success.

Conclusion
It’s most practical to catch exceptions at the boundaries of relatively coarse “units of work.” Handling an error usually involves some amount of rollback (restoring system state) and notification of affected people. Even so, the ultimate choices are still what they were in the days of DOS: abort, retry, or ignore.

Correct handling of an error requires some thought about its cause: was it caused by bad input, corrupted application state, or a transient network failure? It’s also important to understand the impact the error has on the application state and to try to reduce it using mechanisms such as database transactions.

“Abort” is a logical choice when an error has likely corrupted the application state, or when an error was probably caused by a corrupted state. Applications that depend on network communications sometimes must “Retry” operations when they are interrupted by network failures. Another form of “Retry” is to try a different approach to an operation when the first approach fails. Finally, “Ignore” is appropriate when “Retry” isn’t available and the cost of “Abort” is worse than soldiering on.

This article is one of a series on error handling. The next article in this series will describe practices for defining and throwing exceptions that give exception handlers good information for making decisions. Subscribers to our RSS Feed will be the first to read it.

Paul Houle on August 27th 2008 in Dot Net, Exceptions, Java, PHP, SQL

Comments

Brandon Edens: Change the nature of the game by using a programming language that supports something beyond primitive exceptions, try/catch/finally, etc... Try a condition system today: http://www.gigamonkeys.com/book/beyond-exception-...

Paul Houle: @Brandon, that's neat stuff.
I see some things in that chapter that are right along the lines that I'm thinking. Could this behavior be easily emulated in a language like C# that supports lambdas and delegates?
Copyright © 2013 Generation 5.