Table of Contents
- Intended Audience
- Popular False Premise
- Always Know: Anything May Fail!
- Two Groups of Exceptional Conditions
- What I Mean by "Handling"?
- Ultimate Objectives of Error Handling
- Thoughtful Design and Planning vs. an Afterthought
- Consolidated Error Handling
- Exceptions Fundamentals
- Java Exceptions
- Noble Idea
- Myth: Checked Exceptions Add Safety
- Myth: Recoverable vs. Unrecoverable
- Myth: Unchecked Exceptions are Bad and Dangerous
- Reality: Correctly Handling Checked Exceptions is a Lot of Work
- Business Exceptions
- Avoid Using Exceptions to Communicate Business Data
- Use Unchecked/Runtime Exceptions
- Packaging Exception Classes
Most of today's cutting edge Java frameworks and latest specifications seem to be doing things exactly right. The Spring Framework, or even the EJB 3.x specification are just a couple of the most obvious examples. However, I have not yet seen a book or article written by a well-respected Java authority that would openly and forcefully explain the true essense of error handling, why some approaches are better and safer than others, and why projects like Spring, for example, are chosing their way of handling errors while shying away from one of the most hyped-up features of the Java language. Most publications, in fact, continue to promote the same principles, theories, and stereotypes that have proven to fail so miserably. Many egos are at stake, software engineers generally don't like to admit that they were wrong or that their well-intended solution didn't work very well after all. And I honestly think that it seriously hurts our industry.
If you have been frustrated or dissatisfied with the way your projects handle errors, if you have witnessed how teams struggle chasing subtle untraceable exceptions that provide no answers, and if you are genuinely looking for a better way of doing things, read on. I hope this article answers many of your questions and helps you get on the right track with your next project.
I welcome comments and critical opinions. I only have one favor to ask of anyone who wishes to dismiss my point of view: please read the whole article first and make sure your arguments are not already addressed. It took a great deal of effort to put together this article. I have really tried to deliver the information in the most comprehensive and logical way - based on hundreds of typical questions and arguments I have heard in the past. One of the frustrating experiences has been hearing or reading people bringing up arguments whose flaws have been, I believe, very meticulously exposed in this article.
I'd also like to note that I do not intend to convert anyone who is perfectly happy with the way they are doing things now, if they believe that their approaches, however different from mine, work well for them.
Amazingly, it is a commonly accepted and hardly disputed assumption that if a module may(!) produce an error, then an indication of that possibility must be somehow built into the module to let programmers know whether or not they should provide an error-handling strategy for that module. In other words, the common thinking is: "Let's see if the function tells us that an error may occur inside; if yes, we will decide how to handle the error." If you have just read this and did not immediately see the mind-boggling ridiculousness of that assumption, you may be one of the many who have fallen into the trap. It is this philosophy that makes software systems vulnerable by not at all "expecting" large sets of error conditions, and turns error-handling solutions into unmanageable, cumbersome, inefficient, and unreliable behemoths. I will try to do my best to explain.
It takes a 180-degree change of the view angle to see how simple and elegant error handling may - and must - be.
A subroutine (a.k.a "method", "function", etc.) in a programming language is the most basic form of abstraction. The main purpose of abstraction in software is to hide the implementation details of a particular functional unit, and provide simplicity of usage by exposing only a minimalistic interface via which the clients/callers may effectively invoke the abstracted functionality. Among many benefits, the concept allows to later modify the implementation details without affecting any code that depends on it. The very definition of abstraction implies that no entity that uses the abstracted functionality is in the position to make any assumptions about the implementation details of that unit and may only be aware of the exposed interface. Bingo! (Oh, and before someone even thinks about stating that silly, childish argument... No, it absolutely does not matter if the same programmer writes the client code and the function the code call!) Once something is abstracted, you are not supposed to make any assumptions of the abstracted details! Period.
When it comes to error handling in software, the only safe and correct assumption that may ever be made is that a failure may occur in absolutely every subroutine or module that exists! There are no functions for which a programmer has the right and responsibility to safely assume no possibility of failure. (Again, I don't want to hear any lame demagogy about functions that do nothing, contain no code, or simply return a constant, etc. That does not build up to a valid argument in favor of ridiculously complicating things by distinguishing between methods that may fail and those that may not.)
Even if the designer of the particular method thinks that the method is very simple and solid, it is not responsible (and not smart) to state or imply that it may never fail. A failure may happen due to a wide variety of reasons. The method may have bugs in it. The implementation of the method, or any underlying libraries it depends on may change in the future. Some low-level code that the method depends on may have subtle bugs or not designed to work as expected by a client - in certain environments. And so on. Of course, there are two different types of failures: system/software errors, and purposefully designed/signaled exceptional conditions whose intent is to reflect different types of legitimate business situations that disallow the successful execution of the "main" workflow. We will talk about these differences later. Some people would argue that the "possible failure" premise we mentioned in the beginning usually implies the latter type ("business" conditions.) It doesn't matter. The only safe and responsible way to program is to always KNOW that any method may potentially fail to perform its main task (if not due to a business condition then due to some other type of error) and design for both successful and negative outcomes. Assuming that some methods may fail and some may not will put you on a wrong track from the start.
- the few that actually matter; these are the conditions that must be handled individually due to the fact that the system is expected to (can and must) do something specific in case of such conditions - as defined by the actual needs of the application (requirements), and nothing else; such conditions may be further divided into classes that require (by design!) more specific individual handling; the application designer/programmer may identify and add new error types or exceptional conditions to this group at any stage of development;
- everything else, i.e. all other exceptional conditions that the system cannot and/or does not care to provide any form of recovery other than, perhaps, logging and graceful termination.
The term "handler", as used in this article, refers to a dedicated software module that implements an action to manage a specific class or group of exceptional conditions.
Logging of the most complete error information available should be one of the mandatory steps performed by an error handler. Each type of error should be handled (and, therefore, logged) no more than once within the sub-system where it has occurred. Sometimes it is helpful to intercept and log the error at the boundary of the component where the error occurred - before propagating it further to the client. That may be necessary if the component may be potentially deployed on a different server - to ensure that the error logs are available on that server, as well as to the client applications that would do their own logging.
In a proper design and analysis of any software module or system equal consideration must be given to the desired behavior of the system for both - successful execution and any types of failure. Depending on the system, the latter may be a simple single use case, or a set of quite elaborate use cases whose complexity may measure up to or even outweigh the complexity of the successful work flow.
Therefore, error handling strategies must be considered an essential part of the system requirements and design. Error handling may never be an afterthought, something that programmers may (or may forget to) add after the "main" work flow is implemented. Regardless of error types (e.g. critical unrecoverable system failures or predictable business conditions treated as errors), the system must be designed to accommodate for graceful and efficient handling of each such class of conditions.
As I have stated in the Introduction to this article, when approaching error handling in software development, perhaps, the most important thing to keep in mind is the fact that everything can potentially result in an error, and a failure must be always considered as a possible outcome of an operation. This means that any given method call may potentially produce an error of some sort. There are no operations whose successful execution may be unconditionally guaranteed at any time.
The purpose of error-handling design is not to determine what may fail (since anything may) but to determine how to appropriately divide the complete error set into the sub-sets relevant to the application's desired behavior, and where in the system each such class of failures must be treated. Fortunately, even although errors may occur in thousands of places, a typical application only needs to distinguish between a few types/classes of such conditions. This means that only a few error handling strategies normally need to be implemented per application, and all errors must simply be sorted out and properly directed to the appropriate handler.
Errors may not be reliably handled or controlled in a sporadic decentralized manner. Instead of being sprinkled all over the application, error handling must be consolidated to ensure clarity, reliability, coherence, and prevent a loss of any valuable information about the cause of the error.
Error handling is the job of a few dedicated modules that are designed to treat failures based solely on the application/business requirements.
In the agile development environment, each new requirement and exceptional case must be analyzed to determine which part of the system should be responsible for handling it. In a case of an exceptional error condition, the call stack must immediately unwind all the way to the dedicated handler that possesses the contextual knowledge and ability to handle the error properly. The error data must reach such handlers in its original form, unaltered, with its complete stack trace. The original error information may be supplemented with additional helpful data or message - but never shortened, replaced, or otherwise altered. This is the purpose of exceptions.
When the exception mechanism triggers an exception, the instance of that exception class is broadcast to all the methods on the chain of method calls that eventually resulted in the exception. This means that any of those methods can potentially be coded to catch and handle the exception, if necessary, while others can ignore it. Naturally, there is always one such method in the call chain that is better suited to handle the error than the rest. Which one? The correct answer can only be provided by the application designer, the programmer that is. No tool - a compiler or IDE - can accurately, with 100% certainty determine where in the call stack the given error must be caught and handled. The decision usually depends on the circumstances, such as the application's or use case requirements, etc. So, it is the programmer who has the responsibility to carefully consider all those factors. The error handling constructs must be implemented in the module that contains the sufficient contextual knowledge and ability to take the appropriate corrective action in response to the condition.
One thing that can be stated with certainty is that in the vast majority of cases, the most appropriate place to handle an error is NOT immediately at the origin of the error. The immediate caller of the method that triggers an exception rarely can and should do anything about the error - simply because it does not have enough contextual knowledge of the circumstances in which it itself was called. So, in the majority of cases, the role of the exception should be to efficiently and transparently carry the error signal through the call stack to the one method that has the intelligence to handle that exceptional condition.
catch and handle the exception itself, or to add it to its own signature, which, in turn, subjects each of its own callers to the same restrictions.
The noble idea behind checked exceptions was to introduce a harness mechanism that would remind/force developers to handle errors before they bubble up to the surface and break the application. This concept may be used with benefits in small tight sub-systems where it is, indeed, necessary to force the immediate caller to handle the error at its origin. However, as noted above, such cases, while valid and existent, make up only a small minority of all real-life situations. So, in a vast majority of cases, for proper handling, an exception absolutely must be propagated further up the call stack - to the only appropriate module that was designated to handle the exception. And this is where checked exceptions introduce more problems than they solve.
Read the interview with Anders Hejlsberg (the Chief Architect of the C# project) where the subject of checked exceptions is discussed in great detail. In that interview given back in 2003, Hejlsberg explains the most obvious pitfalls of the Java's implementation of checked exceptions, and why the C# committee had unanimously rejected the idea. He specifically talks about such issues as broken encapsulation, inappropriately forced contracts, negative effect on scalability and "versionabiliy" of systems built with checked exceptions.
Whether the authors of checked exceptions meant this or not, relying on the compiler to indicate which methods can throw exceptions (e.g. fail to do what's expected from it) led so many programmers to disregard the fact that anything may fail under certain circumstances and worry only about a small subset of all potential errors represented by checked exceptions - leaving huge holes in their applications. (I am not theorizing here: it is merely a fact of life, a pathetic reality that can be observed on thousands of Java projects.)
Many people - perhaps, even subconsciously - interpret the mere presence of checked exceptions in the language as a suggestion that some modules/methods may result in errors while others may not! So, they forget to handle anything other than what the compiler forces them to handle... Need I say more?
Sun Microsystems in their tutorial - is that checked exceptions are meant to indicate "recoverable" conditions, while unchecked exceptions represent only errors (programming errors, i.e. bugs.) Some less experienced (or less competent) programmers even conclude from the latter that since runtime exceptions are "unrecoverable", applications should not even do anything about them! (I only hope those guys are not employed by companies that build software for systems where human lives are at stake!) The "Recoverable vs. Unrecoverable" thesis is also suggested in one of my most favorite and respected books on Java - Joshua Bloch's "Effective Java", which is quite understood considering that the book was written by one of the authors of the JDK. With all fairness, Bloch also stresses that checked exceptions should be used with caution and not overused. He suggests to exercise the best judgement when deciding what should be considered recoverable, and when in doubt, opt for an unchecked exception. And that is the approach I use. However, the only conclusion I usually arrive at is that I am in no position to decide for the client how they would want and need to act upon the specific exception. An API defines how a client accesses the unit of functionality, and what results it should expect. No API can dictate how the client should use the result after the result has been obtained. Therefore, only the caller can and should decide what it wants to recover from, and what it does not care to recover from - based on its own needs and not on the implementation detail of the module it calls. On the other hand, any exceptional condition must be handled by a correctly designed system - at some point. The question is where and when. So, it is also very, very wrong to suggest that runtime exceptions are meant to be ignored.
It is absolutely incorrect to assume that all runtime exceptions should not be caught and allowed to propagate to the very "top" of the application. As I have stressed at the beginning of this article, for every exceptional condition that is required to be handled distinctly - by the system/business requirements - programmers must decide where to catch it and what to do once the condition is caught. This must be done strictly according to the actual needs of the application, not based on a compiler alert. All other errors must be allowed to freely propagate to the topmost handler where they would be logged and a graceful (perhaps, termination) action will be taken.
To properly utilize checked exceptions and ensure their propagation to the correct designated handlers in a typical real-world application, a great deal of developer awareness and competence is absolutely required. When checked exceptions are used, even a slightest sloppiness of the programmer inevitably results in chaos, unimaginable code clutter, redundancies and unnecessary complexity. This is at odds with the original intent of checked exceptions, which is to simplify error handling and make it easier for developers of all levels of expertise to know what and when to handle.
In reality, checked exceptions end up confusing programmers, prompting them to handle or swallow exceptions before they reach the proper handler, and ultimately introducing more problems than they solve.
If there is one thing that both camps (checked vs. unchecked) seemingly agree on is that exceptions should be primarily used to indicate exceptional conditions and should not be used to implement business logic. This guideline has been widely promoted by most publications, starting with the very popular and well-respected book by Joshua Bloch, "Effective Java". In reality, however, both camps tend to use exceptions to signal business conditions and direct business logic based on that. Personally, I think that occasionally using an exception to cut through the layers and quickly trigger an alternative business flow is perfectly acceptable, and may be very convenient. However, this should be done with caution and after evaluating all other options.
For example, here is a very typical use case. The client calls some User service to create a new user entry. If a new user is successfully created, a new object of some sort is returned. That may be a user ID, user entity, or some kind of result object that is appropriate according to the use case specification. Naturally, one well-expected legitimate scenario of this use case would be a situation when a user with the provided credentials already exists. Many API designers prefer to throw some sort of a UserAlreadyExistsException in such cases, and many argue that it should be a checked exception - in order to explicitly inform the caller of this potential outcome. Any other type of failure to create a new user would indicate a more serious system failure (or even programming error.)
Here's my decision-making process when designing APIs for use cases like the one described above. First, I identify any "legitimate" situations in which the service will not create a new user, e.g. "the user already exists", or, perhaps, "the max allowed participants exceeded/enrollment temporarily suspended", etc. I know that all other conditions would be represented with a generic service-specific runtime exception, e.g. "UserServiceException" (extends RuntimeException) whose only purpose is to signal the service's failure that is not related to a legitimate business situation, and to provide the most complete error info possible for the diagnostic purposes. Then, I look at the other - legitimate - cases of failure to create a new user and see if my service might have more information than just the binary "yes/no" to return to the client. If the service, upon its failure to create the user with the given data, has a whole lot of meaningful information, then it may be a good idea to have the API return a result object that the client would examine to see whether a user was created or whether the status of the returned object indicates one of the multiple meaningful conditions. In our example, "the user exists" condition may only mean one thing: an entry with the given primary key already exists. So, there is hardly anything else useful that can be returned to the client, and, perhaps throwing an exception is just fine. (Another option might be returning a null vs. a valid newly created object. But the latter may or may not be acceptable depending on the API specifications or specific development guidelines. If the null result may be ambiguous, such solution would not be good. Also, many guidelines discourage returning null for the sake of programming safety.) The bottom line is that throwing an exception in a case of a business condition is only one of several options, and, perhaps, more often than not, it is not the most preferred way. In case the analysis indicates that throwing a business exception is indeed the most convenient solution (and it may be), I would create a separate UserExistsException (that extends my generic UserServiceException), document it in the API's Javadoc, and place it into the method signature. Since it is a runtime exception, it will not force the very immediate caller method on the client to mess with it but will give the client programmer the freedom to decide where to place the try/catch construct for that particular exception without polluting any methods on the call stack in between. The client can simply implement a listener for this particular type of exception and not worry about it being a nuisance anywhere else. As I have said, many developers would prefer to make such business exception a checked exception, which is also a valid option, however I would vote for less burden on the client and providing more information via the method signature and Javadoc. These days, most competent developers would catch a checked business exception on the client the first thing - as soon as they call the API that enforces the contract - and convert it into a runtime exception so it can freely propagate to the dedicated listener elsewhere on the client.
It is reasonable to expect (and require) of each developer to understand the nature of any API they use. So, if the developer designs an application that has a need to implement the workflows for a successful user creation as well as for the situation when a user already exists, it is a full responsibility of that developer to understand how to use the "createUser" API.
If an exception does not represent a legitimate business condition, one should never put meaningful business data into the instance of the exception class! Generally, the information inside the exception is for diagnostics purposes only. In most cases, the only "business" information carried by an exception is the very fact of that particular exception being thrown. If you want to indicate a different type of condition, use a different exception class.
It may be somewhat appropriate in cases of business exceptions to provide a public method or two on the exception class that would allow the clients/handlers to extract some supplementary data. Programmers should use their best judgement to decide what kind of data could be conveyed to the client via an exception, and whether it is appropriate to use that data in the client's business logic. Personally, I believe that in most cases it is not very appropriate or fair to expect the client to build its business logic based on some potential attribute values stored in the instance of the thrown exception class. Exception classes should not contain any "error codes" that are required to be correctly resolved by the client's business logic via all sorts of ugly conditional constructs. This, to a large extent, defeats the purpose of exceptions that is, among other things, to eliminate the necessity in such conditional logic. Unfortunately, this ridiculous pattern of exception misuse is widely spread, and it is even promoted by code examples in many Java books and articles! The worst possible example of bad usage of exception data is when clients are expected to parse the detail message and act upon whatever string tokens are detected.
Exceptions must not carry any user-oriented messages inside them. That's another common form of exception abuse. The "detail message" obtained via the Exception.getMessage() method is not meant for the presentation purposes. It is for diagnostics. If you need to present a user-friendly message in the UI in a case of a particular exception, the appropriate message must be resolved on the client side as part of the handling of that particular exception class - by the dedicated handler. It is your presentation tier - and only the presentation tier - that should be aware of the locales and message resources. Therefore, it is absolutely incompetent to place your presentation-tier-specific messages or message constants into the exception classes, especially those originated in other tiers.
Any mappings between business exceptions and appropriate client actions or presentation details must be implemented on the client itself in a separate dedicated module. For a good example of this concept, I encourage you to take a close look at the Spring MVC's SimpleMappingExceptionResolver class and its usage.
NOTE: You should carefully read the previous chapters in this article before reading this!
Developers of the modules that have no business of handling a particular error as it passes through do not need to be aware of that error at all. They do not need to bother with implementing any code to handle that error. Instead, only one module in the whole application must be implemented to listen to that particular exception, catch it when it bubbles up, and handle it according to the application requirements. Combined with the mandatory common-sense understanding that anything might potentially fail, runtime exceptions become a super-efficient tool that allows to consolidate error handling in a few strategically placed modules, and with bullet-proof safety.
Like any technology or tool, unchecked exceptions may be mishandled, true. However, it is very easy to spot - during testing - a case when a particular run-time exception bubbles up way too close to the surface (or crashes the application!) instead of being handled earlier. A detection of such case indicates that a new handler must be introduced at a deeper level (as needed by the system) that will intercept that particular error before the more generic handler. No modifications to any business code would be required. It is significantly easier and less time-consuming than chasing a usually obscure problem caused by a mishandled checked exception.
While there are cases where a checked exception does no harm and serves good purpose, for the sake of consistency and to avoid error-handling chaos, I recommend using only unchecked exceptions. Unfortunately, the particular implementation of the checked exceptions concept in Java introduces so many possibilities and encouragements for doing things wrong that any benefits of exception-re-enforced contracts between methods (where they may help) pale compared to the overall damage this mechanism causes to the software projects all over the world, and, most importantly, to the very perception of the error handling task and the role of a programmer in it.
Checked JDK and 3rd-party exceptions could always be caught inside the implementations of the your classes, wrapped into module-specific run-time exceptions (usually, with an additional clarifying message), and re-thrown only to be caught and appropriately handled by the designated handler upstream.
When an error occurs within a function - regardless of whether the function declares a checked exception or not - that should never be a surprise to a developer! Just know that it may happen, and stop looking at the compiler to tell you whether it may or may not. It may. Period. End of story.
Programmers must never rely on the source of the potential error to tell them whether they should expect an error or not!!! The necessity for error handling must be unconditionally assumed by all programmers at any time. What this means is that the programmer must always KNOW and REMEMBER that any module or operation may potentially - in certain conditions - fail or misfire. Once this simple fact is understood, and the application programmer knows that the module her code depends on may fail, all she has to decide is WHERE in the application (not WHETHER!) she should handle that potential class of errors - without having to do something about the error immediately at its origin. This is what we should be teaching programmers! Instead, we are telling them: "Hey, we have this cool feature in Java that will tell you precisely when you need to worry about errors; in case a method signature declares an exception, you should pay attention..." Are we kidding ourselves? Does anyone see how horribly destructive - and foolish - such approach is? It is one of the most harmful but well-concealed philosophies that has ever been burnt into the minds of millions of programmers. I can't tell you how many times I have heard a developer scream with rage at his application exploding in his face: "What the ...!!! I have taken care of all the exceptions and it still blows up!" What they are saying is that they had caught all the checked exceptions in their code, but didn't even think about any other potential failures. Checked exceptions re-enforce this horrific mindset. Instead of consciously expecting negative results as a given possible outcome of any operation and thoughtfully designing against them, we wait for the compiler to tell us whether we should worry about a possible error or not! That is just ridiculous. The implication of checked exceptions (not intentional, of course) is that if the method does not throw a checked exception, there's nothing to worry about. We know that it is not true because checked exceptions are just a subset of all possible exceptions that may be thrown by the code that implements the operation. And if we know that, we understand that even if we handle the checked exceptions, at some point we must take care of everything else. And if so, we don't need checked exceptions. The problem is that many today's Java programmers raised on checked exceptions are not used to think that way. As the result, they often forget to handle the rest of the exceptions. And you can't really blame them, because - with checked exceptions - the error handling code is everywhere, literally in every single method, i.e. in thousands of places. It is virtually impossible to keep track of what and where is being handled. That is the reason why so many errors are not handled at all, countless errors are mishandled, and so much of valuable error information is lost completely before it reaches the proper place in the system where sense could be made of it, where it could and should be properly handled.
Fortunately, it's easy to build bullet-proof systems with only unchecked exceptions (much easier than ensuring proper handling of checked ones.) Any application, in reality, only needs to distinguish between a very limited set of exception classes that need to be handled separately by strategically placed dedicated handlers to which these classes of exceptions should propagate without being intercepted. Everything else should be allowed to bubble up to the "ultimate" top-level (the "toppest") handler that catches everything that was not handled before and handles it gracefully. (This means, once such [unchecked] exception is thrown, you don't have to worry about it anywhere in your code, knowing that it will be caught and handled by your top-level handler.) So, you normally start with creating such top-level handler that will catch everything inside the system.
For example, in a Spring MVC application that would be an exception resolver class configured to catch and handle the most generic Exception class, and, perhaps, resolving the error to the generic error page that politely apologizes to the user and informs the user that the system is currently not able to process the request, etc. Of course, in a case of such web application, there must also be a filter defined in the web.xml file that would ensure that any critical exceptions that may potentially occur in JSP tags on the pages would also be caught and the application redirected to an error page. Such filter is required because any JSP tag exceptions occur outside the Spring-managed controller and will not be dispatched to the resolver registered in the Spring application context.
Next, we need to identify the classes of conditions that must to be handled individually in a distinct manner. We decide where such events should be handled in our system - based on our system's requirements, and nothing else! For example, in Spring MVC web applications, often a single generic exception resolver class may be all you need to handle all your server-side exceptions! Such exception resolver would resolve all distinct exception classes by implementing the mappings between the exception classes and views (or, in the case of Spring WebFlow, flow states.) Based on the class of the exception thrown anywhere within the application, it will redirect the request to the appropriate view or state - possibly, after executing some logic, if necessary. The resolver would always fall back on the generic "catch-all" redirection, in case the caught exception class is not explicitly mapped to any view or state.
If, during testing, we discover that some exception propagates too far (e.g. to the topmost handler) instead of being handled earlier, well, we immediately know that another lower-level handler is required (or that the particular exception should be caught by one of the existing lower-level handlers.) That's it. Very simple. And very reliable.